[Qemu-devel] [PATCH 0/6] calculate downtime for postcopy live migration

* [Qemu-devel] [PATCH 0/6] calculate downtime for postcopy live migration
       [not found] <CGME20170414131735eucas1p21f1fcadf426789276f567191372f7794@eucas1p2.samsung.com>
@ 2017-04-14 13:17 ` Alexey Perevalov
       [not found]   ` <CGME20170414131738eucas1p28fe4896d7f42d8c5b23cb95312c41eca@eucas1p2.samsung.com>
                     ` (7 more replies)
  0 siblings, 8 replies; 38+ messages in thread
From: Alexey Perevalov @ 2017-04-14 13:17 UTC (permalink / raw)
  To: dgilbert, qemu-devel; +Cc: a.perevalov, i.maximets

This patch set calculates the postcopy downtime on the destination and
sends it back to the source machine, where it is reported in the
migration statistics (query-migrate).

It also adds traces that help track down which thread initiated a
page fault.

This patch set is based on the master branch of git://git.qemu-project.org/qemu.git;
the base commit is 372b3fe0b2ecdd39ba850e31c0c6686315c507af.

It includes the kernel-side header changes purely for convenience, so that the
patch set can be applied and tested until the kernel headers are synced.

Alexey Perevalov (6):
  userfault: add pid into uffd_msg & update UFFD_FEATURE_*
  util: introduce glib-helper.c
  migration: add UFFD_FEATURE_THREAD_ID feature support
  migration: calculate downtime on dst side
  migration: send postcopy downtime back to source
  migration: detailed traces for postcopy

 hw/block/xen_disk.c               |  10 +-
 include/glib-compat.h             | 352 --------------------------------------
 include/glib/glib-compat.h        | 352 ++++++++++++++++++++++++++++++++++++++
 include/glib/glib-helper.h        |  30 ++++
 include/migration/migration.h     |  18 +-
 include/migration/postcopy-ram.h  |   2 +-
 include/qemu/osdep.h              |   2 +-
 linux-headers/linux/userfaultfd.h |   5 +
 linux-user/main.c                 |   2 +-
 migration/migration.c             | 302 +++++++++++++++++++++++++++++++-
 migration/postcopy-ram.c          | 133 +++++++++++++-
 migration/qemu-file.c             |   1 -
 migration/savevm.c                |   2 +-
 migration/trace-events            |  15 +-
 scripts/clean-includes            |   2 +-
 util/Makefile.objs                |   1 +
 util/glib-helper.c                |  29 ++++
 17 files changed, 878 insertions(+), 380 deletions(-)
 delete mode 100644 include/glib-compat.h
 create mode 100644 include/glib/glib-compat.h
 create mode 100644 include/glib/glib-helper.h
 create mode 100644 util/glib-helper.c

-- 
1.8.3.1

* [Qemu-devel] [PATCH 1/6] userfault: add pid into uffd_msg & update UFFD_FEATURE_*
       [not found]   ` <CGME20170414131738eucas1p28fe4896d7f42d8c5b23cb95312c41eca@eucas1p2.samsung.com>
@ 2017-04-14 13:17     ` Alexey Perevalov
  0 siblings, 0 replies; 38+ messages in thread
From: Alexey Perevalov @ 2017-04-14 13:17 UTC (permalink / raw)
  To: dgilbert, qemu-devel; +Cc: a.perevalov, i.maximets

This commit duplicates the header changes from the Linux kernel patch
"userfaultfd: provide pid in userfault msg".

Signed-off-by: Alexey Perevalov <a.perevalov@samsung.com>
---
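Note: a minimal sketch of how a userfault reader might consume the new
field, assuming the layout and bit value shown in the diff below. The
names sketch_pagefault, SKETCH_UFFD_FEATURE_THREAD_ID and
sketch_fault_tid are hypothetical and exist only for illustration; real
code would use struct uffd_msg from the synced header. Returning 0 when
the feature was not negotiated is a convention of this sketch, not
something the kernel patch defines.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Bit value copied from this patch; it may differ from the value in
 * any released kernel header. */
#define SKETCH_UFFD_FEATURE_THREAD_ID (1u << 7)

/* Hypothetical local mirror of the patched pagefault payload. */
struct sketch_pagefault {
    uint64_t flags;
    uint64_t address;
    union {
        uint32_t ptid;
    } feat;
};

/*
 * Return the id of the faulting thread, or 0 (sketch convention) when
 * the thread-id feature was not negotiated and ptid cannot be trusted.
 */
static uint32_t sketch_fault_tid(const struct sketch_pagefault *pf,
                                 uint64_t negotiated_features)
{
    assert(pf != NULL);
    if (!(negotiated_features & SKETCH_UFFD_FEATURE_THREAD_ID)) {
        return 0;
    }
    return pf->feat.ptid;
}
```

Gating the read on the negotiated feature bits keeps the reader usable
on kernels that do not fill in the field.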
 linux-headers/linux/userfaultfd.h | 5 +++++
 1 file changed, 5 insertions(+)

diff --git a/linux-headers/linux/userfaultfd.h b/linux-headers/linux/userfaultfd.h
index 2ed5dc3..760f02f 100644
--- a/linux-headers/linux/userfaultfd.h
+++ b/linux-headers/linux/userfaultfd.h
@@ -77,6 +77,9 @@ struct uffd_msg {
 		struct {
 			__u64	flags;
 			__u64	address;
+			union {
+				__u32	ptid;
+			} feat;
 		} pagefault;
 
 		struct {
@@ -158,6 +161,8 @@ struct uffdio_api {
 #define UFFD_FEATURE_EVENT_MADVDONTNEED		(1<<3)
 #define UFFD_FEATURE_MISSING_HUGETLBFS		(1<<4)
 #define UFFD_FEATURE_MISSING_SHMEM		(1<<5)
+#define UFFD_FEATURE_EVENT_UNMAP		(1<<6)
+#define UFFD_FEATURE_THREAD_ID			(1<<7)
 	__u64 features;
 
 	__u64 ioctls;
-- 
1.8.3.1

* [Qemu-devel] [PATCH 2/6] util: introduce glib-helper.c
       [not found]   ` <CGME20170414131739eucas1p1ea9a6adcdbe8cfe45ac1ff582d28d873@eucas1p1.samsung.com>
@ 2017-04-14 13:17     ` Alexey Perevalov
  2017-04-14 16:05       ` Philippe Mathieu-Daudé
  2017-04-21 10:27       ` Peter Maydell
  0 siblings, 2 replies; 38+ messages in thread
From: Alexey Perevalov @ 2017-04-14 13:17 UTC (permalink / raw)
  To: dgilbert, qemu-devel; +Cc: a.perevalov, i.maximets

GLib lacks a g_int_cmp helper that compares integer values stored in
pointers. xen_disk.c introduced its own, and migration.c now needs the
same function, so it is logical to move it into a common place.
Further: maybe extend glib itself.

This commit also moves the existing glib-compat.h into the include/glib
folder for consolidation.

Signed-off-by: Alexey Perevalov <a.perevalov@samsung.com>
---
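Note: the helpers in this patch rely on the (a > b) - (a < b) idiom for
a three-way comparison. The sketch below illustrates that idiom on
plain 64-bit keys. To stay self-contained it uses qsort() from the C
library instead of GLib's g_tree_new_full(), and the names cmp_u64,
cmp_u64_qsort and sort_u64 are hypothetical, not part of the patch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/*
 * Three-way compare of two unsigned 64-bit values.
 * Returns 1 if a > b, -1 if a < b, and 0 if they are equal.
 * Comparing instead of subtracting avoids wrap-around and truncation
 * when the difference does not fit in an int.
 */
static int cmp_u64(uint64_t a, uint64_t b)
{
    return (a > b) - (a < b);
}

/* qsort() adapter: qsort passes pointers to the array elements. */
static int cmp_u64_qsort(const void *pa, const void *pb)
{
    return cmp_u64(*(const uint64_t *)pa, *(const uint64_t *)pb);
}

/* Sort an array of 64-bit keys in ascending order. */
static void sort_u64(uint64_t *keys, size_t n)
{
    assert(keys != NULL || n == 0);
    if (n > 1) {
        qsort(keys, n, sizeof(keys[0]), cmp_u64_qsort);
    }
}
```

A GCompareDataFunc differs only in taking the keys as gconstpointer
values plus an extra user_data argument.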
 hw/block/xen_disk.c        |  10 +-
 include/glib-compat.h      | 352 ---------------------------------------------
 include/glib/glib-compat.h | 352 +++++++++++++++++++++++++++++++++++++++++++++
 include/glib/glib-helper.h |  30 ++++
 include/qemu/osdep.h       |   2 +-
 linux-user/main.c          |   2 +-
 scripts/clean-includes     |   2 +-
 util/Makefile.objs         |   1 +
 util/glib-helper.c         |  29 ++++
 9 files changed, 417 insertions(+), 363 deletions(-)
 delete mode 100644 include/glib-compat.h
 create mode 100644 include/glib/glib-compat.h
 create mode 100644 include/glib/glib-helper.h
 create mode 100644 util/glib-helper.c

diff --git a/hw/block/xen_disk.c b/hw/block/xen_disk.c
index 456a2d5..36f6396 100644
--- a/hw/block/xen_disk.c
+++ b/hw/block/xen_disk.c
@@ -20,6 +20,7 @@
  */
 
 #include "qemu/osdep.h"
+#include "glib/glib-helper.h"
 #include <sys/ioctl.h>
 #include <sys/uio.h>
 
@@ -154,13 +155,6 @@ static void ioreq_reset(struct ioreq *ioreq)
     qemu_iovec_reset(&ioreq->v);
 }
 
-static gint int_cmp(gconstpointer a, gconstpointer b, gpointer user_data)
-{
-    uint ua = GPOINTER_TO_UINT(a);
-    uint ub = GPOINTER_TO_UINT(b);
-    return (ua > ub) - (ua < ub);
-}
-
 static void destroy_grant(gpointer pgnt)
 {
     PersistentGrant *grant = pgnt;
@@ -1191,7 +1185,7 @@ static int blk_connect(struct XenDevice *xendev)
     if (blkdev->feature_persistent) {
         /* Init persistent grants */
         blkdev->max_grants = max_requests * BLKIF_MAX_SEGMENTS_PER_REQUEST;
-        blkdev->persistent_gnts = g_tree_new_full((GCompareDataFunc)int_cmp,
+        blkdev->persistent_gnts = g_tree_new_full((GCompareDataFunc)g_int_cmp,
                                              NULL, NULL,
                                              batch_maps ?
                                              (GDestroyNotify)g_free :
diff --git a/include/glib-compat.h b/include/glib-compat.h
deleted file mode 100644
index 863c8cf..0000000
--- a/include/glib-compat.h
+++ /dev/null
@@ -1,352 +0,0 @@
-/*
- * GLIB Compatibility Functions
- *
- * Copyright IBM, Corp. 2013
- *
- * Authors:
- *  Anthony Liguori   <aliguori@us.ibm.com>
- *  Michael Tokarev   <mjt@tls.msk.ru>
- *  Paolo Bonzini     <pbonzini@redhat.com>
- *
- * This work is licensed under the terms of the GNU GPL, version 2 or later.
- * See the COPYING file in the top-level directory.
- *
- */
-
-#ifndef QEMU_GLIB_COMPAT_H
-#define QEMU_GLIB_COMPAT_H
-
-#include <glib.h>
-
-/* GLIB version compatibility flags */
-#if !GLIB_CHECK_VERSION(2, 26, 0)
-#define G_TIME_SPAN_SECOND              (G_GINT64_CONSTANT(1000000))
-#endif
-
-#if !GLIB_CHECK_VERSION(2, 28, 0)
-static inline gint64 qemu_g_get_monotonic_time(void)
-{
-    /* g_get_monotonic_time() is best-effort so we can use the wall clock as a
-     * fallback.
-     */
-
-    GTimeVal time;
-    g_get_current_time(&time);
-
-    return time.tv_sec * G_TIME_SPAN_SECOND + time.tv_usec;
-}
-/* work around distro backports of this interface */
-#define g_get_monotonic_time() qemu_g_get_monotonic_time()
-#endif
-
-#if defined(_WIN32) && !GLIB_CHECK_VERSION(2, 50, 0)
-/*
- * g_poll has a problem on Windows when using
- * timeouts < 10ms, so use wrapper.
- */
-#define g_poll(fds, nfds, timeout) g_poll_fixed(fds, nfds, timeout)
-gint g_poll_fixed(GPollFD *fds, guint nfds, gint timeout);
-#endif
-
-#if !GLIB_CHECK_VERSION(2, 30, 0)
-/* Not a 100% compatible implementation, but good enough for most
- * cases. Placeholders are only supported at the end of the
- * template. */
-static inline gchar *qemu_g_dir_make_tmp(gchar const *tmpl, GError **error)
-{
-    gchar *path = g_build_filename(g_get_tmp_dir(), tmpl ?: ".XXXXXX", NULL);
-
-    if (mkdtemp(path) != NULL) {
-        return path;
-    }
-    /* Error occurred, clean up. */
-    g_set_error(error, G_FILE_ERROR, g_file_error_from_errno(errno),
-                "mkdtemp() failed");
-    g_free(path);
-    return NULL;
-}
-#define g_dir_make_tmp(tmpl, error) qemu_g_dir_make_tmp(tmpl, error)
-#endif /* glib 2.30 */
-
-#if !GLIB_CHECK_VERSION(2, 31, 0)
-/* before glib-2.31, GMutex and GCond was dynamic-only (there was a separate
- * GStaticMutex, but it didn't work with condition variables).
- *
- * Our implementation uses GOnce to fake a static implementation that does
- * not require separate initialization.
- * We need to rename the types to avoid passing our CompatGMutex/CompatGCond
- * by mistake to a function that expects GMutex/GCond.  However, for ease
- * of use we keep the GLib function names.  GLib uses macros for the
- * implementation, we use inline functions instead and undefine the macros.
- */
-
-typedef struct CompatGMutex {
-    GOnce once;
-} CompatGMutex;
-
-typedef struct CompatGCond {
-    GOnce once;
-} CompatGCond;
-
-static inline gpointer do_g_mutex_new(gpointer unused)
-{
-    return (gpointer) g_mutex_new();
-}
-
-static inline void g_mutex_init(CompatGMutex *mutex)
-{
-    mutex->once = (GOnce) G_ONCE_INIT;
-}
-
-static inline void g_mutex_clear(CompatGMutex *mutex)
-{
-    g_assert(mutex->once.status != G_ONCE_STATUS_PROGRESS);
-    if (mutex->once.retval) {
-        g_mutex_free((GMutex *) mutex->once.retval);
-    }
-    mutex->once = (GOnce) G_ONCE_INIT;
-}
-
-static inline void (g_mutex_lock)(CompatGMutex *mutex)
-{
-    g_once(&mutex->once, do_g_mutex_new, NULL);
-    g_mutex_lock((GMutex *) mutex->once.retval);
-}
-#undef g_mutex_lock
-
-static inline gboolean (g_mutex_trylock)(CompatGMutex *mutex)
-{
-    g_once(&mutex->once, do_g_mutex_new, NULL);
-    return g_mutex_trylock((GMutex *) mutex->once.retval);
-}
-#undef g_mutex_trylock
-
-
-static inline void (g_mutex_unlock)(CompatGMutex *mutex)
-{
-    g_mutex_unlock((GMutex *) mutex->once.retval);
-}
-#undef g_mutex_unlock
-
-static inline gpointer do_g_cond_new(gpointer unused)
-{
-    return (gpointer) g_cond_new();
-}
-
-static inline void g_cond_init(CompatGCond *cond)
-{
-    cond->once = (GOnce) G_ONCE_INIT;
-}
-
-static inline void g_cond_clear(CompatGCond *cond)
-{
-    g_assert(cond->once.status != G_ONCE_STATUS_PROGRESS);
-    if (cond->once.retval) {
-        g_cond_free((GCond *) cond->once.retval);
-    }
-    cond->once = (GOnce) G_ONCE_INIT;
-}
-
-static inline void (g_cond_wait)(CompatGCond *cond, CompatGMutex *mutex)
-{
-    g_assert(mutex->once.status != G_ONCE_STATUS_PROGRESS);
-    g_once(&cond->once, do_g_cond_new, NULL);
-    g_cond_wait((GCond *) cond->once.retval, (GMutex *) mutex->once.retval);
-}
-#undef g_cond_wait
-
-static inline void (g_cond_broadcast)(CompatGCond *cond)
-{
-    g_once(&cond->once, do_g_cond_new, NULL);
-    g_cond_broadcast((GCond *) cond->once.retval);
-}
-#undef g_cond_broadcast
-
-static inline void (g_cond_signal)(CompatGCond *cond)
-{
-    g_once(&cond->once, do_g_cond_new, NULL);
-    g_cond_signal((GCond *) cond->once.retval);
-}
-#undef g_cond_signal
-
-static inline gboolean (g_cond_timed_wait)(CompatGCond *cond,
-                                           CompatGMutex *mutex,
-                                           GTimeVal *time)
-{
-    g_assert(mutex->once.status != G_ONCE_STATUS_PROGRESS);
-    g_once(&cond->once, do_g_cond_new, NULL);
-    return g_cond_timed_wait((GCond *) cond->once.retval,
-                             (GMutex *) mutex->once.retval, time);
-}
-#undef g_cond_timed_wait
-
-/* This is not a macro, because it didn't exist until 2.32.  */
-static inline gboolean g_cond_wait_until(CompatGCond *cond, CompatGMutex *mutex,
-                                         gint64 end_time)
-{
-    GTimeVal time;
-
-    /* Convert from monotonic to CLOCK_REALTIME.  */
-    end_time -= g_get_monotonic_time();
-    g_get_current_time(&time);
-    end_time += time.tv_sec * G_TIME_SPAN_SECOND + time.tv_usec;
-
-    time.tv_sec = end_time / G_TIME_SPAN_SECOND;
-    time.tv_usec = end_time % G_TIME_SPAN_SECOND;
-    return g_cond_timed_wait(cond, mutex, &time);
-}
-
-/* before 2.31 there was no g_thread_new() */
-static inline GThread *g_thread_new(const char *name,
-                                    GThreadFunc func, gpointer data)
-{
-    GThread *thread = g_thread_create(func, data, TRUE, NULL);
-    if (!thread) {
-        g_error("creating thread");
-    }
-    return thread;
-}
-#else
-#define CompatGMutex GMutex
-#define CompatGCond GCond
-#endif /* glib 2.31 */
-
-#if !GLIB_CHECK_VERSION(2, 32, 0)
-/* Beware, function returns gboolean since 2.39.2, see GLib commit 9101915 */
-static inline void g_hash_table_add(GHashTable *hash_table, gpointer key)
-{
-    g_hash_table_replace(hash_table, key, key);
-}
-#endif
-
-#ifndef g_assert_true
-#define g_assert_true(expr)                                                    \
-    do {                                                                       \
-        if (G_LIKELY(expr)) {                                                  \
-        } else {                                                               \
-            g_assertion_message(G_LOG_DOMAIN, __FILE__, __LINE__, G_STRFUNC,   \
-                                "'" #expr "' should be TRUE");                 \
-        }                                                                      \
-    } while (0)
-#endif
-
-#ifndef g_assert_false
-#define g_assert_false(expr)                                                   \
-    do {                                                                       \
-        if (G_LIKELY(!(expr))) {                                               \
-        } else {                                                               \
-            g_assertion_message(G_LOG_DOMAIN, __FILE__, __LINE__, G_STRFUNC,   \
-                                "'" #expr "' should be FALSE");                \
-        }                                                                      \
-    } while (0)
-#endif
-
-#ifndef g_assert_null
-#define g_assert_null(expr)                                                    \
-    do {                                                                       \
-        if (G_LIKELY((expr) == NULL)) {                                        \
-        } else {                                                               \
-            g_assertion_message(G_LOG_DOMAIN, __FILE__, __LINE__, G_STRFUNC,   \
-                                "'" #expr "' should be NULL");                 \
-        }                                                                      \
-    } while (0)
-#endif
-
-#ifndef g_assert_nonnull
-#define g_assert_nonnull(expr)                                                 \
-    do {                                                                       \
-        if (G_LIKELY((expr) != NULL)) {                                        \
-        } else {                                                               \
-            g_assertion_message(G_LOG_DOMAIN, __FILE__, __LINE__, G_STRFUNC,   \
-                                "'" #expr "' should not be NULL");             \
-        }                                                                      \
-    } while (0)
-#endif
-
-#ifndef g_assert_cmpmem
-#define g_assert_cmpmem(m1, l1, m2, l2)                                        \
-    do {                                                                       \
-        gconstpointer __m1 = m1, __m2 = m2;                                    \
-        int __l1 = l1, __l2 = l2;                                              \
-        if (__l1 != __l2) {                                                    \
-            g_assertion_message_cmpnum(                                        \
-                G_LOG_DOMAIN, __FILE__, __LINE__, G_STRFUNC,                   \
-                #l1 " (len(" #m1 ")) == " #l2 " (len(" #m2 "))", __l1, "==",   \
-                __l2, 'i');                                                    \
-        } else if (memcmp(__m1, __m2, __l1) != 0) {                            \
-            g_assertion_message(G_LOG_DOMAIN, __FILE__, __LINE__, G_STRFUNC,   \
-                                "assertion failed (" #m1 " == " #m2 ")");      \
-        }                                                                      \
-    } while (0)
-#endif
-
-#if !GLIB_CHECK_VERSION(2, 28, 0)
-static inline void g_list_free_full(GList *list, GDestroyNotify free_func)
-{
-    GList *l;
-
-    for (l = list; l; l = l->next) {
-        free_func(l->data);
-    }
-
-    g_list_free(list);
-}
-
-static inline void g_slist_free_full(GSList *list, GDestroyNotify free_func)
-{
-    GSList *l;
-
-    for (l = list; l; l = l->next) {
-        free_func(l->data);
-    }
-
-    g_slist_free(list);
-}
-#endif
-
-#if !GLIB_CHECK_VERSION(2, 26, 0)
-static inline void g_source_set_name(GSource *source, const char *name)
-{
-    /* This is just a debugging aid, so leaving it a no-op */
-}
-static inline void g_source_set_name_by_id(guint tag, const char *name)
-{
-    /* This is just a debugging aid, so leaving it a no-op */
-}
-#endif
-
-#if !GLIB_CHECK_VERSION(2, 36, 0)
-/* Always fail.  This will not include error_report output in the test log,
- * sending it instead to stderr.
- */
-#define g_test_initialized() (0)
-#endif
-#if !GLIB_CHECK_VERSION(2, 38, 0)
-#ifdef CONFIG_HAS_GLIB_SUBPROCESS_TESTS
-#error schizophrenic detection of glib subprocess testing
-#endif
-#define g_test_subprocess() (0)
-#endif
-
-
-#if !GLIB_CHECK_VERSION(2, 34, 0)
-static inline void
-g_test_add_data_func_full(const char *path,
-                          gpointer data,
-                          gpointer fn,
-                          gpointer data_free_func)
-{
-#if GLIB_CHECK_VERSION(2, 26, 0)
-    /* back-compat casts, remove this once we can require new-enough glib */
-    g_test_add_vtable(path, 0, data, NULL,
-                      (GTestFixtureFunc)fn, (GTestFixtureFunc) data_free_func);
-#else
-    /* back-compat casts, remove this once we can require new-enough glib */
-    g_test_add_vtable(path, 0, data, NULL,
-                      (void (*)(void)) fn, (void (*)(void)) data_free_func);
-#endif
-}
-#endif
-
-
-#endif
diff --git a/include/glib/glib-compat.h b/include/glib/glib-compat.h
new file mode 100644
index 0000000..863c8cf
--- /dev/null
+++ b/include/glib/glib-compat.h
@@ -0,0 +1,352 @@
+/*
+ * GLIB Compatibility Functions
+ *
+ * Copyright IBM, Corp. 2013
+ *
+ * Authors:
+ *  Anthony Liguori   <aliguori@us.ibm.com>
+ *  Michael Tokarev   <mjt@tls.msk.ru>
+ *  Paolo Bonzini     <pbonzini@redhat.com>
+ *
+ * This work is licensed under the terms of the GNU GPL, version 2 or later.
+ * See the COPYING file in the top-level directory.
+ *
+ */
+
+#ifndef QEMU_GLIB_COMPAT_H
+#define QEMU_GLIB_COMPAT_H
+
+#include <glib.h>
+
+/* GLIB version compatibility flags */
+#if !GLIB_CHECK_VERSION(2, 26, 0)
+#define G_TIME_SPAN_SECOND              (G_GINT64_CONSTANT(1000000))
+#endif
+
+#if !GLIB_CHECK_VERSION(2, 28, 0)
+static inline gint64 qemu_g_get_monotonic_time(void)
+{
+    /* g_get_monotonic_time() is best-effort so we can use the wall clock as a
+     * fallback.
+     */
+
+    GTimeVal time;
+    g_get_current_time(&time);
+
+    return time.tv_sec * G_TIME_SPAN_SECOND + time.tv_usec;
+}
+/* work around distro backports of this interface */
+#define g_get_monotonic_time() qemu_g_get_monotonic_time()
+#endif
+
+#if defined(_WIN32) && !GLIB_CHECK_VERSION(2, 50, 0)
+/*
+ * g_poll has a problem on Windows when using
+ * timeouts < 10ms, so use wrapper.
+ */
+#define g_poll(fds, nfds, timeout) g_poll_fixed(fds, nfds, timeout)
+gint g_poll_fixed(GPollFD *fds, guint nfds, gint timeout);
+#endif
+
+#if !GLIB_CHECK_VERSION(2, 30, 0)
+/* Not a 100% compatible implementation, but good enough for most
+ * cases. Placeholders are only supported at the end of the
+ * template. */
+static inline gchar *qemu_g_dir_make_tmp(gchar const *tmpl, GError **error)
+{
+    gchar *path = g_build_filename(g_get_tmp_dir(), tmpl ?: ".XXXXXX", NULL);
+
+    if (mkdtemp(path) != NULL) {
+        return path;
+    }
+    /* Error occurred, clean up. */
+    g_set_error(error, G_FILE_ERROR, g_file_error_from_errno(errno),
+                "mkdtemp() failed");
+    g_free(path);
+    return NULL;
+}
+#define g_dir_make_tmp(tmpl, error) qemu_g_dir_make_tmp(tmpl, error)
+#endif /* glib 2.30 */
+
+#if !GLIB_CHECK_VERSION(2, 31, 0)
+/* before glib-2.31, GMutex and GCond was dynamic-only (there was a separate
+ * GStaticMutex, but it didn't work with condition variables).
+ *
+ * Our implementation uses GOnce to fake a static implementation that does
+ * not require separate initialization.
+ * We need to rename the types to avoid passing our CompatGMutex/CompatGCond
+ * by mistake to a function that expects GMutex/GCond.  However, for ease
+ * of use we keep the GLib function names.  GLib uses macros for the
+ * implementation, we use inline functions instead and undefine the macros.
+ */
+
+typedef struct CompatGMutex {
+    GOnce once;
+} CompatGMutex;
+
+typedef struct CompatGCond {
+    GOnce once;
+} CompatGCond;
+
+static inline gpointer do_g_mutex_new(gpointer unused)
+{
+    return (gpointer) g_mutex_new();
+}
+
+static inline void g_mutex_init(CompatGMutex *mutex)
+{
+    mutex->once = (GOnce) G_ONCE_INIT;
+}
+
+static inline void g_mutex_clear(CompatGMutex *mutex)
+{
+    g_assert(mutex->once.status != G_ONCE_STATUS_PROGRESS);
+    if (mutex->once.retval) {
+        g_mutex_free((GMutex *) mutex->once.retval);
+    }
+    mutex->once = (GOnce) G_ONCE_INIT;
+}
+
+static inline void (g_mutex_lock)(CompatGMutex *mutex)
+{
+    g_once(&mutex->once, do_g_mutex_new, NULL);
+    g_mutex_lock((GMutex *) mutex->once.retval);
+}
+#undef g_mutex_lock
+
+static inline gboolean (g_mutex_trylock)(CompatGMutex *mutex)
+{
+    g_once(&mutex->once, do_g_mutex_new, NULL);
+    return g_mutex_trylock((GMutex *) mutex->once.retval);
+}
+#undef g_mutex_trylock
+
+
+static inline void (g_mutex_unlock)(CompatGMutex *mutex)
+{
+    g_mutex_unlock((GMutex *) mutex->once.retval);
+}
+#undef g_mutex_unlock
+
+static inline gpointer do_g_cond_new(gpointer unused)
+{
+    return (gpointer) g_cond_new();
+}
+
+static inline void g_cond_init(CompatGCond *cond)
+{
+    cond->once = (GOnce) G_ONCE_INIT;
+}
+
+static inline void g_cond_clear(CompatGCond *cond)
+{
+    g_assert(cond->once.status != G_ONCE_STATUS_PROGRESS);
+    if (cond->once.retval) {
+        g_cond_free((GCond *) cond->once.retval);
+    }
+    cond->once = (GOnce) G_ONCE_INIT;
+}
+
+static inline void (g_cond_wait)(CompatGCond *cond, CompatGMutex *mutex)
+{
+    g_assert(mutex->once.status != G_ONCE_STATUS_PROGRESS);
+    g_once(&cond->once, do_g_cond_new, NULL);
+    g_cond_wait((GCond *) cond->once.retval, (GMutex *) mutex->once.retval);
+}
+#undef g_cond_wait
+
+static inline void (g_cond_broadcast)(CompatGCond *cond)
+{
+    g_once(&cond->once, do_g_cond_new, NULL);
+    g_cond_broadcast((GCond *) cond->once.retval);
+}
+#undef g_cond_broadcast
+
+static inline void (g_cond_signal)(CompatGCond *cond)
+{
+    g_once(&cond->once, do_g_cond_new, NULL);
+    g_cond_signal((GCond *) cond->once.retval);
+}
+#undef g_cond_signal
+
+static inline gboolean (g_cond_timed_wait)(CompatGCond *cond,
+                                           CompatGMutex *mutex,
+                                           GTimeVal *time)
+{
+    g_assert(mutex->once.status != G_ONCE_STATUS_PROGRESS);
+    g_once(&cond->once, do_g_cond_new, NULL);
+    return g_cond_timed_wait((GCond *) cond->once.retval,
+                             (GMutex *) mutex->once.retval, time);
+}
+#undef g_cond_timed_wait
+
+/* This is not a macro, because it didn't exist until 2.32.  */
+static inline gboolean g_cond_wait_until(CompatGCond *cond, CompatGMutex *mutex,
+                                         gint64 end_time)
+{
+    GTimeVal time;
+
+    /* Convert from monotonic to CLOCK_REALTIME.  */
+    end_time -= g_get_monotonic_time();
+    g_get_current_time(&time);
+    end_time += time.tv_sec * G_TIME_SPAN_SECOND + time.tv_usec;
+
+    time.tv_sec = end_time / G_TIME_SPAN_SECOND;
+    time.tv_usec = end_time % G_TIME_SPAN_SECOND;
+    return g_cond_timed_wait(cond, mutex, &time);
+}
+
+/* before 2.31 there was no g_thread_new() */
+static inline GThread *g_thread_new(const char *name,
+                                    GThreadFunc func, gpointer data)
+{
+    GThread *thread = g_thread_create(func, data, TRUE, NULL);
+    if (!thread) {
+        g_error("creating thread");
+    }
+    return thread;
+}
+#else
+#define CompatGMutex GMutex
+#define CompatGCond GCond
+#endif /* glib 2.31 */
+
+#if !GLIB_CHECK_VERSION(2, 32, 0)
+/* Beware, function returns gboolean since 2.39.2, see GLib commit 9101915 */
+static inline void g_hash_table_add(GHashTable *hash_table, gpointer key)
+{
+    g_hash_table_replace(hash_table, key, key);
+}
+#endif
+
+#ifndef g_assert_true
+#define g_assert_true(expr)                                                    \
+    do {                                                                       \
+        if (G_LIKELY(expr)) {                                                  \
+        } else {                                                               \
+            g_assertion_message(G_LOG_DOMAIN, __FILE__, __LINE__, G_STRFUNC,   \
+                                "'" #expr "' should be TRUE");                 \
+        }                                                                      \
+    } while (0)
+#endif
+
+#ifndef g_assert_false
+#define g_assert_false(expr)                                                   \
+    do {                                                                       \
+        if (G_LIKELY(!(expr))) {                                               \
+        } else {                                                               \
+            g_assertion_message(G_LOG_DOMAIN, __FILE__, __LINE__, G_STRFUNC,   \
+                                "'" #expr "' should be FALSE");                \
+        }                                                                      \
+    } while (0)
+#endif
+
+#ifndef g_assert_null
+#define g_assert_null(expr)                                                    \
+    do {                                                                       \
+        if (G_LIKELY((expr) == NULL)) {                                        \
+        } else {                                                               \
+            g_assertion_message(G_LOG_DOMAIN, __FILE__, __LINE__, G_STRFUNC,   \
+                                "'" #expr "' should be NULL");                 \
+        }                                                                      \
+    } while (0)
+#endif
+
+#ifndef g_assert_nonnull
+#define g_assert_nonnull(expr)                                                 \
+    do {                                                                       \
+        if (G_LIKELY((expr) != NULL)) {                                        \
+        } else {                                                               \
+            g_assertion_message(G_LOG_DOMAIN, __FILE__, __LINE__, G_STRFUNC,   \
+                                "'" #expr "' should not be NULL");             \
+        }                                                                      \
+    } while (0)
+#endif
+
+#ifndef g_assert_cmpmem
+#define g_assert_cmpmem(m1, l1, m2, l2)                                        \
+    do {                                                                       \
+        gconstpointer __m1 = m1, __m2 = m2;                                    \
+        int __l1 = l1, __l2 = l2;                                              \
+        if (__l1 != __l2) {                                                    \
+            g_assertion_message_cmpnum(                                        \
+                G_LOG_DOMAIN, __FILE__, __LINE__, G_STRFUNC,                   \
+                #l1 " (len(" #m1 ")) == " #l2 " (len(" #m2 "))", __l1, "==",   \
+                __l2, 'i');                                                    \
+        } else if (memcmp(__m1, __m2, __l1) != 0) {                            \
+            g_assertion_message(G_LOG_DOMAIN, __FILE__, __LINE__, G_STRFUNC,   \
+                                "assertion failed (" #m1 " == " #m2 ")");      \
+        }                                                                      \
+    } while (0)
+#endif
+
+#if !GLIB_CHECK_VERSION(2, 28, 0)
+static inline void g_list_free_full(GList *list, GDestroyNotify free_func)
+{
+    GList *l;
+
+    for (l = list; l; l = l->next) {
+        free_func(l->data);
+    }
+
+    g_list_free(list);
+}
+
+static inline void g_slist_free_full(GSList *list, GDestroyNotify free_func)
+{
+    GSList *l;
+
+    for (l = list; l; l = l->next) {
+        free_func(l->data);
+    }
+
+    g_slist_free(list);
+}
+#endif
+
+#if !GLIB_CHECK_VERSION(2, 26, 0)
+static inline void g_source_set_name(GSource *source, const char *name)
+{
+    /* This is just a debugging aid, so leaving it a no-op */
+}
+static inline void g_source_set_name_by_id(guint tag, const char *name)
+{
+    /* This is just a debugging aid, so leaving it a no-op */
+}
+#endif
+
+#if !GLIB_CHECK_VERSION(2, 36, 0)
+/* Always fail.  This will not include error_report output in the test log,
+ * sending it instead to stderr.
+ */
+#define g_test_initialized() (0)
+#endif
+#if !GLIB_CHECK_VERSION(2, 38, 0)
+#ifdef CONFIG_HAS_GLIB_SUBPROCESS_TESTS
+#error schizophrenic detection of glib subprocess testing
+#endif
+#define g_test_subprocess() (0)
+#endif
+
+
+#if !GLIB_CHECK_VERSION(2, 34, 0)
+static inline void
+g_test_add_data_func_full(const char *path,
+                          gpointer data,
+                          gpointer fn,
+                          gpointer data_free_func)
+{
+#if GLIB_CHECK_VERSION(2, 26, 0)
+    /* back-compat casts, remove this once we can require new-enough glib */
+    g_test_add_vtable(path, 0, data, NULL,
+                      (GTestFixtureFunc)fn, (GTestFixtureFunc) data_free_func);
+#else
+    /* back-compat casts, remove this once we can require new-enough glib */
+    g_test_add_vtable(path, 0, data, NULL,
+                      (void (*)(void)) fn, (void (*)(void)) data_free_func);
+#endif
+}
+#endif
+
+
+#endif
diff --git a/include/glib/glib-helper.h b/include/glib/glib-helper.h
new file mode 100644
index 0000000..db740fb
--- /dev/null
+++ b/include/glib/glib-helper.h
@@ -0,0 +1,30 @@
+/*
+ * Helpers for GLIB
+ *
+ * This work is licensed under the terms of the GNU GPL, version 2 or later.
+ * See the COPYING file in the top-level directory.
+ *
+ */
+
+#ifndef QEMU_GLIB_HELPER_H
+#define QEMU_GLIB_HELPER_H
+
+
+#include "glib/glib-compat.h"
+
+#define GPOINTER_TO_UINT64(a) ((guint64) (a))
+
+/*
+ * return 1 if a > b, -1 if a < b, and 0 if equal
+ */
+gint g_int_cmp64(gconstpointer a, gconstpointer b,
+        gpointer __attribute__((unused)) user_data);
+
+/*
+ * return 1 if a > b, -1 if a < b, and 0 if equal
+ */
+int g_int_cmp(gconstpointer a, gconstpointer b,
+        gpointer __attribute__((unused)) user_data);
+
+#endif /* QEMU_GLIB_HELPER_H */
+
diff --git a/include/qemu/osdep.h b/include/qemu/osdep.h
index 122ff06..36f8a89 100644
--- a/include/qemu/osdep.h
+++ b/include/qemu/osdep.h
@@ -104,7 +104,7 @@ extern int daemon(int, int);
 #include "sysemu/os-posix.h"
 #endif
 
-#include "glib-compat.h"
+#include "glib/glib-compat.h"
 #include "qemu/typedefs.h"
 
 #ifndef O_LARGEFILE
diff --git a/linux-user/main.c b/linux-user/main.c
index 10a3bb3..7cea6bc 100644
--- a/linux-user/main.c
+++ b/linux-user/main.c
@@ -35,7 +35,7 @@
 #include "elf.h"
 #include "exec/log.h"
 #include "trace/control.h"
-#include "glib-compat.h"
+#include "glib/glib-compat.h"
 
 char *exec_path;
 
diff --git a/scripts/clean-includes b/scripts/clean-includes
index dd938da..b32b928 100755
--- a/scripts/clean-includes
+++ b/scripts/clean-includes
@@ -123,7 +123,7 @@ for f in "$@"; do
       ;;
     *include/qemu/osdep.h | \
     *include/qemu/compiler.h | \
-    *include/glib-compat.h | \
+    *include/glib/glib-compat.h | \
     *include/sysemu/os-posix.h | \
     *include/sysemu/os-win32.h | \
     *include/standard-headers/ )
diff --git a/util/Makefile.objs b/util/Makefile.objs
index c6205eb..0080712 100644
--- a/util/Makefile.objs
+++ b/util/Makefile.objs
@@ -43,3 +43,4 @@ util-obj-y += qdist.o
 util-obj-y += qht.o
 util-obj-y += range.o
 util-obj-y += systemd.o
+util-obj-y += glib-helper.o
diff --git a/util/glib-helper.c b/util/glib-helper.c
new file mode 100644
index 0000000..2557009
--- /dev/null
+++ b/util/glib-helper.c
@@ -0,0 +1,29 @@
+/*
+ * Implementation for GLIB helpers
+ * This file is intended to accumulate additional glib helper
+ * functions for later reuse.
+ *
+ * This work is licensed under the terms of the GNU GPL, version 2 or later.
+ * See the COPYING file in the top-level directory.
+
+ */
+
+#include "glib/glib-helper.h"
+
+gint g_int_cmp64(gconstpointer a, gconstpointer b,
+        gpointer __attribute__((unused)) user_data)
+{
+    guint64 ua = GPOINTER_TO_UINT64(a);
+    guint64 ub = GPOINTER_TO_UINT64(b);
+    return (ua > ub) - (ua < ub);
+}
+
+/*
+ * Return 1 if a > b, -1 if a < b, and 0 if they are equal.
+ */
+gint g_int_cmp(gconstpointer a, gconstpointer b,
+        gpointer __attribute__((unused)) user_data)
+{
+    return g_int_cmp64(a, b, user_data);
+}
+
-- 
1.8.3.1


* [Qemu-devel] [PATCH 3/6] migration: add UFFD_FEATURE_THREAD_ID feature support
       [not found]   ` <CGME20170414131739eucas1p27a3eed795ae545efff380d7c5f8358c3@eucas1p2.samsung.com>
@ 2017-04-14 13:17     ` Alexey Perevalov
  2017-04-21 10:24       ` Dr. David Alan Gilbert
  0 siblings, 1 reply; 38+ messages in thread
From: Alexey Perevalov @ 2017-04-14 13:17 UTC (permalink / raw)
  To: dgilbert, qemu-devel; +Cc: a.perevalov, i.maximets

The userfaultfd mechanism can report the ID of the faulting thread
when the client requests it through the UFFDIO_API ioctl.

Signed-off-by: Alexey Perevalov <a.perevalov@samsung.com>
---
 include/migration/postcopy-ram.h |  2 +-
 migration/migration.c            |  2 +-
 migration/postcopy-ram.c         | 12 ++++++------
 migration/savevm.c               |  2 +-
 4 files changed, 9 insertions(+), 9 deletions(-)

diff --git a/include/migration/postcopy-ram.h b/include/migration/postcopy-ram.h
index 8e036b9..809f6db 100644
--- a/include/migration/postcopy-ram.h
+++ b/include/migration/postcopy-ram.h
@@ -14,7 +14,7 @@
 #define QEMU_POSTCOPY_RAM_H
 
 /* Return true if the host supports everything we need to do postcopy-ram */
-bool postcopy_ram_supported_by_host(void);
+bool postcopy_ram_supported_by_host(MigrationIncomingState *mis);
 
 /*
  * Make all of RAM sensitive to accesses to areas that haven't yet been written
diff --git a/migration/migration.c b/migration/migration.c
index ad4036f..79f6425 100644
--- a/migration/migration.c
+++ b/migration/migration.c
@@ -802,7 +802,7 @@ void qmp_migrate_set_capabilities(MigrationCapabilityStatusList *params,
          * special support.
          */
         if (!old_postcopy_cap && runstate_check(RUN_STATE_INMIGRATE) &&
-            !postcopy_ram_supported_by_host()) {
+            !postcopy_ram_supported_by_host(NULL)) {
             /* postcopy_ram_supported_by_host will have emitted a more
              * detailed message
              */
diff --git a/migration/postcopy-ram.c b/migration/postcopy-ram.c
index dc80dbb..70f0480 100644
--- a/migration/postcopy-ram.c
+++ b/migration/postcopy-ram.c
@@ -60,13 +60,13 @@ struct PostcopyDiscardState {
 #include <sys/eventfd.h>
 #include <linux/userfaultfd.h>
 
-static bool ufd_version_check(int ufd)
+static bool ufd_version_check(int ufd, MigrationIncomingState *mis)
 {
     struct uffdio_api api_struct;
     uint64_t ioctl_mask;
 
     api_struct.api = UFFD_API;
-    api_struct.features = 0;
+    api_struct.features = UFFD_FEATURE_THREAD_ID;
     if (ioctl(ufd, UFFDIO_API, &api_struct)) {
         error_report("postcopy_ram_supported_by_host: UFFDIO_API failed: %s",
                      strerror(errno));
@@ -113,7 +113,7 @@ static int test_range_shared(const char *block_name, void *host_addr,
  * normally fine since if the postcopy succeeds it gets turned back on at the
  * end.
  */
-bool postcopy_ram_supported_by_host(void)
+bool postcopy_ram_supported_by_host(MigrationIncomingState *mis)
 {
     long pagesize = getpagesize();
     int ufd = -1;
@@ -136,7 +136,7 @@ bool postcopy_ram_supported_by_host(void)
     }
 
     /* Version and features check */
-    if (!ufd_version_check(ufd)) {
+    if (!ufd_version_check(ufd, mis)) {
         goto out;
     }
 
@@ -515,7 +515,7 @@ int postcopy_ram_enable_notify(MigrationIncomingState *mis)
      * Although the host check already tested the API, we need to
      * do the check again as an ABI handshake on the new fd.
      */
-    if (!ufd_version_check(mis->userfault_fd)) {
+    if (!ufd_version_check(mis->userfault_fd, mis)) {
         return -1;
     }
 
@@ -653,7 +653,7 @@ void *postcopy_get_tmp_page(MigrationIncomingState *mis)
 
 #else
 /* No target OS support, stubs just fail */
-bool postcopy_ram_supported_by_host(void)
+bool postcopy_ram_supported_by_host(MigrationIncomingState *mis)
 {
     error_report("%s: No OS support", __func__);
     return false;
diff --git a/migration/savevm.c b/migration/savevm.c
index 3b19a4a..f01e418 100644
--- a/migration/savevm.c
+++ b/migration/savevm.c
@@ -1360,7 +1360,7 @@ static int loadvm_postcopy_handle_advise(MigrationIncomingState *mis)
         return -1;
     }
 
-    if (!postcopy_ram_supported_by_host()) {
+    if (!postcopy_ram_supported_by_host(mis)) {
         postcopy_state_set(POSTCOPY_INCOMING_NONE);
         return -1;
     }
-- 
1.8.3.1


* [Qemu-devel] [PATCH 4/6] migration: calculate downtime on dst side
       [not found]   ` <CGME20170414131740eucas1p27eba648b990a93a627265c740e7ff118@eucas1p2.samsung.com>
@ 2017-04-14 13:17     ` Alexey Perevalov
  2017-04-21 12:00       ` Dr. David Alan Gilbert
  2017-04-25  8:24       ` [Qemu-devel] [PATCH 4/6] migration: calculate downtime on dst side Peter Xu
  0 siblings, 2 replies; 38+ messages in thread
From: Alexey Perevalov @ 2017-04-14 13:17 UTC (permalink / raw)
  To: dgilbert, qemu-devel; +Cc: a.perevalov, i.maximets

This patch calculates postcopy downtime per vCPU, both as a per-vCPU
total and as an overlapped value across all vCPUs.

The approach keeps a tree keyed by the page fault address. Each node
stores the t1-t2 interval between the page fault time and the page copy
time, together with a bit mask of the affected vCPUs.
For more implementation details, please see the comment above the
get_postcopy_total_downtime function.
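
To illustrate what "overlapped downtime" means, here is a minimal
standalone sketch. Note that it is not the algorithm used in this
patch: it swaps in a plain sweep over sorted start/end points with a
per-vCPU nesting counter, and it assumes at most 64 vCPUs so that one
uint64_t can hold the mask. The names Interval, Point and
overlapped_downtime are hypothetical.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

#define MAX_CPUS 64

typedef struct {
    int64_t begin;  /* page fault time, ms */
    int64_t end;    /* page copy time, ms */
    uint64_t cpus;  /* bit i set: vCPU i waited for this page */
} Interval;

typedef struct {
    int64_t tp;     /* point in time */
    int delta;      /* +1 for a start point, -1 for an end point */
    uint64_t cpus;
} Point;

static int cmp_point(const void *a, const void *b)
{
    const Point *pa = a, *pb = b;
    return (pa->tp > pb->tp) - (pa->tp < pb->tp);
}

/* Total time during which every one of ncpus vCPUs was blocked. */
static int64_t overlapped_downtime(const Interval *iv, size_t n, int ncpus)
{
    int depth[MAX_CPUS] = { 0 };  /* open intervals per vCPU */
    int blocked = 0;              /* vCPUs with depth > 0 */
    int64_t total = 0;
    size_t i, npoints = 2 * n;
    Point *pts;

    assert(ncpus > 0 && ncpus <= MAX_CPUS);
    if (n == 0) {
        return 0;
    }
    pts = malloc(npoints * sizeof(*pts));
    if (!pts) {
        return -1;
    }
    for (i = 0; i < n; i++) {
        pts[2 * i] = (Point){ iv[i].begin, +1, iv[i].cpus };
        pts[2 * i + 1] = (Point){ iv[i].end, -1, iv[i].cpus };
    }
    qsort(pts, npoints, sizeof(*pts), cmp_point);

    for (i = 0; i < npoints; i++) {
        int cpu;
        /* The span since the previous point counts if all were blocked. */
        if (i > 0 && blocked == ncpus) {
            total += pts[i].tp - pts[i - 1].tp;
        }
        for (cpu = 0; cpu < ncpus; cpu++) {
            if (!(pts[i].cpus & (UINT64_C(1) << cpu))) {
                continue;
            }
            if (pts[i].delta > 0) {
                if (depth[cpu]++ == 0) {
                    blocked++;
                }
            } else if (--depth[cpu] == 0) {
                blocked--;
            }
        }
    }
    free(pts);
    return total;
}
```

With three vCPUs blocked during [10,20] and [30,50] (vCPU 0), [18,35]
(vCPU 1) and [28,40] (vCPU 2), only the span [30,35] has all of them
blocked at once, so the overlapped downtime is 5 ms.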

Signed-off-by: Alexey Perevalov <a.perevalov@samsung.com>
---
 include/migration/migration.h |  14 +++
 migration/migration.c         | 280 +++++++++++++++++++++++++++++++++++++++++-
 migration/postcopy-ram.c      |  24 +++-
 migration/qemu-file.c         |   1 -
 migration/trace-events        |   9 +-
 5 files changed, 323 insertions(+), 5 deletions(-)

diff --git a/include/migration/migration.h b/include/migration/migration.h
index 5720c88..5d2c628 100644
--- a/include/migration/migration.h
+++ b/include/migration/migration.h
@@ -123,10 +123,24 @@ struct MigrationIncomingState {
 
     /* See savevm.c */
     LoadStateEntry_Head loadvm_handlers;
+
+    /*
+     *  Tree that tracks postcopy downtime. It is needed to
+     *  calculate the correct downtime across multiple VM
+     *  suspends. The key is the host page address and the
+     *  value is a DowntimeDuration.
+     *  NULL means the kernel couldn't provide the thread id,
+     *  so QEMU can't identify which vCPU raised the page fault.
+     */
+    GTree *postcopy_downtime;
 };
 
 MigrationIncomingState *migration_incoming_get_current(void);
 void migration_incoming_state_destroy(void);
+void mark_postcopy_downtime_begin(uint64_t addr, int cpu);
+void mark_postcopy_downtime_end(uint64_t addr);
+uint64_t get_postcopy_total_downtime(void);
+void destroy_downtime_duration(gpointer data);
 
 /*
  * An outstanding page request, on the source, having been received
diff --git a/migration/migration.c b/migration/migration.c
index 79f6425..5bac434 100644
--- a/migration/migration.c
+++ b/migration/migration.c
@@ -38,6 +38,8 @@
 #include "io/channel-tls.h"
 #include "migration/colo.h"
 
+#define DEBUG_VCPU_DOWNTIME 1
+
 #define MAX_THROTTLE  (32 << 20)      /* Migration transfer speed throttling */
 
 /* Amount of time to allocate to each "chunk" of bandwidth-throttled
@@ -77,6 +79,19 @@ static NotifierList migration_state_notifiers =
 
 static bool deferred_incoming;
 
+typedef struct {
+    int64_t begin;
+    int64_t end;
+    uint64_t *cpus; /* cpus bit mask array, QEMU bit functions support
+     bit operation on memory regions, but doesn't check out of range */
+} DowntimeDuration;
+
+typedef struct {
+    int64_t tp; /* point in time */
+    bool is_end;
+    uint64_t *cpus;
+} OverlapDowntime;
+
 /*
  * Current state of incoming postcopy; note this is not part of
  * MigrationIncomingState since it's state is used during cleanup
@@ -117,6 +132,13 @@ MigrationState *migrate_get_current(void)
     return &current_migration;
 }
 
+void destroy_downtime_duration(gpointer data)
+{
+    DowntimeDuration *dd = (DowntimeDuration *)data;
+    g_free(dd->cpus);
+    g_free(data);
+}
+
 MigrationIncomingState *migration_incoming_get_current(void)
 {
     static bool once;
@@ -138,10 +160,13 @@ void migration_incoming_state_destroy(void)
     struct MigrationIncomingState *mis = migration_incoming_get_current();
 
     qemu_event_destroy(&mis->main_thread_load_event);
+    if (mis->postcopy_downtime) {
+        g_tree_destroy(mis->postcopy_downtime);
+        mis->postcopy_downtime = NULL;
+    }
     loadvm_free_handlers(mis);
 }
 
-
 typedef struct {
     bool optional;
     uint32_t size;
@@ -1754,7 +1779,6 @@ static int postcopy_start(MigrationState *ms, bool *old_vm_running)
      */
     ms->postcopy_after_devices = true;
     notifier_list_notify(&migration_state_notifiers, ms);
-
     ms->downtime =  qemu_clock_get_ms(QEMU_CLOCK_REALTIME) - time_at_stop;
 
     qemu_mutex_unlock_iothread();
@@ -2117,3 +2141,255 @@ PostcopyState postcopy_state_set(PostcopyState new_state)
     return atomic_xchg(&incoming_postcopy_state, new_state);
 }
 
+#define SIZE_TO_KEEP_CPUBITS (1 + smp_cpus/sizeof(guint64))
+
+void mark_postcopy_downtime_begin(uint64_t addr, int cpu)
+{
+    MigrationIncomingState *mis = migration_incoming_get_current();
+    DowntimeDuration *dd;
+    if (!mis->postcopy_downtime) {
+        return;
+    }
+
+    dd = g_tree_lookup(mis->postcopy_downtime, (gpointer)addr); /* !!! cast */
+    if (!dd) {
+        dd = (DowntimeDuration *)g_new0(DowntimeDuration, 1);
+        dd->cpus = g_new0(guint64, SIZE_TO_KEEP_CPUBITS);
+        g_tree_insert(mis->postcopy_downtime, (gpointer)addr, (gpointer)dd);
+    }
+
+    if (cpu < 0) {
+        /* assume in this situation all vCPUs are sleeping */
+        int i;
+        for (i = 0; i < SIZE_TO_KEEP_CPUBITS; i++) {
+            dd->cpus[i] = ~(uint64_t)0u;
+        }
+    } else
+        set_bit(cpu, dd->cpus);
+
+    /*
+     *  overwrite previously set dd->begin, if that page already was
+     *     faulted on another cpu
+     */
+    dd->begin = qemu_clock_get_ms(QEMU_CLOCK_REALTIME);
+    trace_mark_postcopy_downtime_begin(addr, dd, dd->begin, cpu);
+}
+
+void mark_postcopy_downtime_end(uint64_t addr)
+{
+    MigrationIncomingState *mis = migration_incoming_get_current();
+    DowntimeDuration *dd;
+    if (!mis->postcopy_downtime) {
+        return;
+    }
+
+    dd = g_tree_lookup(mis->postcopy_downtime, (gpointer)addr);
+    if (!dd) {
+        /* error_report("Could not populate downtime duration completion time \n\
+                        There is no downtime duration for 0x%"PRIx64, addr); */
+        return;
+    }
+
+    dd->end = qemu_clock_get_ms(QEMU_CLOCK_REALTIME);
+    trace_mark_postcopy_downtime_end(addr, dd, dd->end);
+}
+
+struct downtime_overlay_cxt {
+    GPtrArray *downtime_points;
+    size_t number_of_points;
+};
+/*
+ * This function splits each DowntimeDuration, which is represented as a
+ * start/end pair, into separate points and adds them to the array
+ * so that it can be sorted later.
+ */
+static gboolean split_duration_and_fill_points(gpointer key, gpointer value,
+                                        gpointer data)
+{
+    struct downtime_overlay_cxt *ctx = (struct downtime_overlay_cxt *)data;
+    DowntimeDuration *dd = (DowntimeDuration *)value;
+    GPtrArray *interval = ctx->downtime_points;
+    if (dd->begin) {
+        OverlapDowntime *od_begin = g_new0(OverlapDowntime, 1);
+        od_begin->cpus = g_memdup(dd->cpus, sizeof(uint64_t) * SIZE_TO_KEEP_CPUBITS);
+        od_begin->tp = dd->begin;
+        od_begin->is_end = false;
+        g_ptr_array_add(interval, od_begin);
+        ctx->number_of_points += 1;
+    }
+
+    if (dd->end) {
+        OverlapDowntime *od_end = g_new0(OverlapDowntime, 1);
+        od_end->cpus = g_memdup(dd->cpus, sizeof(uint64_t) * SIZE_TO_KEEP_CPUBITS);
+        od_end->tp = dd->end;
+        od_end->is_end = true;
+        g_ptr_array_add(interval, od_end);
+        ctx->number_of_points += 1;
+    }
+
+    if (dd->end && dd->begin)
+        trace_split_duration_and_fill_points(dd->end - dd->begin, (uint64_t)key);
+    return FALSE;
+}
+
+#ifdef DEBUG_VCPU_DOWNTIME
+static gboolean calculate_per_cpu(gpointer key, gpointer value,
+                                  gpointer data)
+{
+    int *downtime_cpu = (int *)data;
+    DowntimeDuration *dd = (DowntimeDuration *)value;
+    int cpu_iter;
+    for (cpu_iter = 0; cpu_iter < smp_cpus; cpu_iter++) {
+        if (test_bit(cpu_iter, dd->cpus) && dd->end && dd->begin)
+            downtime_cpu[cpu_iter] += dd->end - dd->begin;
+    }
+    return FALSE;
+}
+#endif /* DEBUG_VCPU_DOWNTIME */
+
+static gint compare_downtime(gconstpointer a, gconstpointer b)
+{
+    DowntimeDuration *dda = (DowntimeDuration *)a;
+    DowntimeDuration *ddb = (DowntimeDuration *)b;
+    return dda->begin - ddb->begin;
+}
+
+static void destroy_overlap_downtime(gpointer data)
+{
+    OverlapDowntime *od = (OverlapDowntime *)data;
+    g_free(od->cpus);
+    g_free(data);
+}
+
+static int check_overlap(uint64_t *b)
+{
+    unsigned long zero_bit = find_first_zero_bit(b, BITS_PER_LONG * SIZE_TO_KEEP_CPUBITS);
+    return zero_bit >= smp_cpus;
+}
+
+/*
+ * This function calculates the downtime per vCPU and traces it.
+ *
+ *  It also calculates the total downtime as the overlap of the
+ *  intervals across all vCPUs.
+ *
+ *  The approach is as follows:
+ *  Initially intervals are represented in tree where key is
+ *  pagefault address, and values:
+ *   begin - page fault time
+ *   end   - page load time
+ *   cpus  - bit mask shows affected cpus
+ *
+ *  To calculate the overlap on all cpus, the intervals are converted
+ *  into an array of points in time (downtime_points). The size of the
+ *  array is 2 * number of nodes in the tree of intervals (2 array
+ *  elements per interval).
+ *  Each element is marked as the end (E) or the start (S) of an interval.
+ *  The overlap downtime is calculated for an SE pair only when
+ *  there is a sequence S(0..N)E(M) covering every vCPU.
+ *
+ * As an example, take 3 CPUs:
+ *
+ *      S1        E1           S1               E1
+ * -----***********------------xxx***************------------------------> CPU1
+ *
+ *             S2                E2
+ * ------------****************xxx---------------------------------------> CPU2
+ *
+ *                         S3            E3
+ * ------------------------****xxx********-------------------------------> CPU3
+ *
+ * We have the sequence S1,S2,E1,S3,S1,E2,E3,E1
+ * S2,E1 - doesn't match the condition: sequence S1,S2,E1 doesn't include CPU3
+ * S3,S1,E2 - the sequence includes all CPUs, so the overlap is S1,E2
+ * Legend of the picture: * - downtime per vCPU
+ *                        x - overlapped downtime
+ */
+uint64_t get_postcopy_total_downtime(void)
+{
+    MigrationIncomingState *mis = migration_incoming_get_current();
+    uint64_t total_downtime = 0; /* for total overlapped downtime */
+    const int intervals = g_tree_nnodes(mis->postcopy_downtime);
+    int point_iter, start_point_iter, i;
+    struct downtime_overlay_cxt dp_ctx = { 0 };
+    /*
+     * The array will contain 2 * intervals points, or fewer if
+     * some page faults have not been completed yet;
+     * the real count is in dp_ctx.number_of_points.
+     */
+    dp_ctx.downtime_points = g_ptr_array_new_full(2 * intervals,
+                                                     destroy_overlap_downtime);
+    if (!mis->postcopy_downtime) {
+        goto out;
+    }
+
+#ifdef DEBUG_VCPU_DOWNTIME
+    {
+        gint *downtime_cpu = g_new0(int, smp_cpus);
+        g_tree_foreach(mis->postcopy_downtime, calculate_per_cpu, downtime_cpu);
+        for (point_iter = 0; point_iter < smp_cpus; point_iter++)
+        {
+            trace_downtime_per_cpu(point_iter, downtime_cpu[point_iter]);
+        }
+        g_free(downtime_cpu);
+    }
+#endif /* DEBUG_VCPU_DOWNTIME */
+
+    /* make downtime points S/E from interval */
+    g_tree_foreach(mis->postcopy_downtime, split_duration_and_fill_points,
+                   &dp_ctx);
+    g_ptr_array_sort(dp_ctx.downtime_points, compare_downtime);
+
+    for (point_iter = 1; point_iter < dp_ctx.number_of_points;
+         point_iter++) {
+        OverlapDowntime *od = g_ptr_array_index(dp_ctx.downtime_points,
+                point_iter);
+        uint64_t *cur_cpus;
+        int smp_cpus_i = smp_cpus;
+        OverlapDowntime *prev_od = g_ptr_array_index(dp_ctx.downtime_points,
+                                                     point_iter - 1);
+        if (!od || !prev_od)
+            continue;
+        /* we need sequence SE */
+        if (!od->is_end || prev_od->is_end)
+            continue;
+
+        cur_cpus = g_memdup(od->cpus, sizeof(uint64_t) * SIZE_TO_KEEP_CPUBITS);
+        for (start_point_iter = point_iter - 1;
+             start_point_iter >= 0 && smp_cpus_i;
+             start_point_iter--, smp_cpus_i--) {
+            OverlapDowntime *t_od = g_ptr_array_index(dp_ctx.downtime_points,
+                                                      start_point_iter);
+            if (!t_od)
+                break;
+            /* should be S */
+            if (t_od->is_end)
+                break;
+
+            /* the points are sorted; an end point may not have
+             * occurred yet, but such points were omitted
+             * in split_duration_and_fill_points */
+            if (od->tp <= prev_od->tp) {
+                break;
+            }
+
+            for (i = 0; i < SIZE_TO_KEEP_CPUBITS; i++) {
+                cur_cpus[i] |= t_od->cpus[i];
+            }
+
+            /* check_overlap - test whether the bits of all
+             * smp_cpus vCPUs are set in cur_cpus */
+            if (check_overlap(cur_cpus)) {
+                total_downtime += od->tp - prev_od->tp;
+                /* situation when one S point represents all vCPU is possible */
+                break;
+            }
+        }
+        g_free(cur_cpus);
+    }
+    trace_get_postcopy_total_downtime(g_tree_nnodes(mis->postcopy_downtime),
+        total_downtime);
+out:
+    g_ptr_array_free(dp_ctx.downtime_points, TRUE);
+    return total_downtime;
+}
diff --git a/migration/postcopy-ram.c b/migration/postcopy-ram.c
index 70f0480..ea89f4e 100644
--- a/migration/postcopy-ram.c
+++ b/migration/postcopy-ram.c
@@ -23,8 +23,10 @@
 #include "migration/postcopy-ram.h"
 #include "sysemu/sysemu.h"
 #include "sysemu/balloon.h"
+#include <sys/param.h>
 #include "qemu/error-report.h"
 #include "trace.h"
+#include "glib/glib-helper.h"
 
 /* Arbitrary limit on size of each discard command,
  * keeps them around ~200 bytes
@@ -81,6 +83,11 @@ static bool ufd_version_check(int ufd, MigrationIncomingState *mis)
         return false;
     }
 
+    if (mis && UFFD_FEATURE_THREAD_ID & api_struct.features) {
+        mis->postcopy_downtime = g_tree_new_full(g_int_cmp64,
+                                         NULL, NULL, destroy_downtime_duration);
+    }
+
     if (getpagesize() != ram_pagesize_summary()) {
         bool have_hp = false;
         /* We've got a huge page */
@@ -404,6 +411,18 @@ static int ram_block_enable_notify(const char *block_name, void *host_addr,
     return 0;
 }
 
+static int get_mem_fault_cpu_index(uint32_t pid)
+{
+    CPUState *cpu_iter;
+
+    CPU_FOREACH(cpu_iter) {
+        if (cpu_iter->thread_id == pid)
+           return cpu_iter->cpu_index;
+    }
+    trace_get_mem_fault_cpu_index(pid);
+    return -1;
+}
+
 /*
  * Handle faults detected by the USERFAULT markings
  */
@@ -481,8 +500,10 @@ static void *postcopy_ram_fault_thread(void *opaque)
         rb_offset &= ~(qemu_ram_pagesize(rb) - 1);
         trace_postcopy_ram_fault_thread_request(msg.arg.pagefault.address,
                                                 qemu_ram_get_idstr(rb),
-                                                rb_offset);
+                                                rb_offset, msg.arg.pagefault.feat.ptid);
 
+        mark_postcopy_downtime_begin(msg.arg.pagefault.address,
+                            get_mem_fault_cpu_index(msg.arg.pagefault.feat.ptid));
         /*
          * Send the request to the source - we want to request one
          * of our host page sizes (which is >= TPS)
@@ -577,6 +598,7 @@ int postcopy_place_page(MigrationIncomingState *mis, void *host, void *from,
 
         return -e;
     }
+    mark_postcopy_downtime_end((uint64_t)host);
 
     trace_postcopy_place_page(host);
     return 0;
diff --git a/migration/qemu-file.c b/migration/qemu-file.c
index 195fa94..c9f3e47 100644
--- a/migration/qemu-file.c
+++ b/migration/qemu-file.c
@@ -547,7 +547,6 @@ size_t qemu_get_buffer_in_place(QEMUFile *f, uint8_t **buf, size_t size)
 int qemu_peek_byte(QEMUFile *f, int offset)
 {
     int index = f->buf_index + offset;
-
     assert(!qemu_file_is_writable(f));
     assert(offset < IO_BUF_SIZE);
 
diff --git a/migration/trace-events b/migration/trace-events
index 7372ce2..ab2e1e4 100644
--- a/migration/trace-events
+++ b/migration/trace-events
@@ -110,6 +110,12 @@ process_incoming_migration_co_end(int ret, int ps) "ret=%d postcopy-state=%d"
 process_incoming_migration_co_postcopy_end_main(void) ""
 migration_set_incoming_channel(void *ioc, const char *ioctype) "ioc=%p ioctype=%s"
 migration_set_outgoing_channel(void *ioc, const char *ioctype, const char *hostname)  "ioc=%p ioctype=%s hostname=%s"
+mark_postcopy_downtime_begin(uint64_t addr, void *dd, int64_t time, int cpu) "addr 0x%" PRIx64 " dd %p time %" PRId64 " cpu %d"
+mark_postcopy_downtime_end(uint64_t addr, void *dd, int64_t time) "addr 0x%" PRIx64 " dd %p time %" PRId64
+get_postcopy_total_downtime(int num, uint64_t total) "faults %d, total downtime %" PRIu64
+split_duration_and_fill_points(int64_t downtime, uint64_t addr) "downtime %" PRId64 " addr 0x%" PRIx64
+downtime_per_cpu(int cpu_index, int downtime) "downtime cpu[%d]=%d"
+source_return_path_thread_downtime(uint64_t downtime) "downtime %" PRIu64
 
 # migration/rdma.c
 qemu_rdma_accept_incoming_migration(void) ""
@@ -186,7 +192,7 @@ postcopy_ram_enable_notify(void) ""
 postcopy_ram_fault_thread_entry(void) ""
 postcopy_ram_fault_thread_exit(void) ""
 postcopy_ram_fault_thread_quit(void) ""
-postcopy_ram_fault_thread_request(uint64_t hostaddr, const char *ramblock, size_t offset) "Request for HVA=%" PRIx64 " rb=%s offset=%zx"
+postcopy_ram_fault_thread_request(uint64_t hostaddr, const char *ramblock, size_t offset, int pid) "Request for HVA=%" PRIx64 " rb=%s offset=%zx %d"
 postcopy_ram_incoming_cleanup_closeuf(void) ""
 postcopy_ram_incoming_cleanup_entry(void) ""
 postcopy_ram_incoming_cleanup_exit(void) ""
@@ -195,6 +201,7 @@ save_xbzrle_page_skipping(void) ""
 save_xbzrle_page_overflow(void) ""
 ram_save_iterate_big_wait(uint64_t milliconds, int iterations) "big wait: %" PRIu64 " milliseconds, %d iterations"
 ram_load_complete(int ret, uint64_t seq_iter) "exit_code %d seq iteration %" PRIu64
+get_mem_fault_cpu_index(uint32_t pid) "pid %u is not vCPU"
 
 # migration/exec.c
 migration_exec_outgoing(const char *cmd) "cmd=%s"
-- 
1.8.3.1


* [Qemu-devel] [PATCH 5/6] migration: send postcopy downtime back to source
       [not found]   ` <CGME20170414131740eucas1p28f240a4e6c78fb56be52f2641c3e5af6@eucas1p2.samsung.com>
@ 2017-04-14 13:17     ` Alexey Perevalov
  2017-04-24 17:26       ` Dr. David Alan Gilbert
  0 siblings, 1 reply; 38+ messages in thread
From: Alexey Perevalov @ 2017-04-14 13:17 UTC (permalink / raw)
  To: dgilbert, qemu-devel; +Cc: a.perevalov, i.maximets

Currently, to initiate a postcopy live migration, a request has to be
sent to the source machine, specifying the destination.

The user can query the migration status with the query-migrate QMP
command on the source machine, but postcopy downtime is evaluated on the
destination, so it has to be transmitted back to the source. The return
path socket was chosen for this purpose.
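
As an illustration of the payload, the downtime travels as a single
64-bit big-endian value. The sketch below shows that encoding in
isolation; put_be64 and get_be64 are hypothetical stand-ins for the
cpu_to_be64()/ldq_be_p() helpers that the patch itself uses, and the
message framing done by migrate_send_rp_message() is left out.

```c
#include <assert.h>
#include <stdint.h>

/* Store v into buf in big-endian (network) byte order. */
static void put_be64(uint8_t buf[8], uint64_t v)
{
    int i;
    for (i = 7; i >= 0; i--) {
        buf[i] = (uint8_t)(v & 0xff);
        v >>= 8;
    }
}

/* Load a big-endian 64-bit value from buf, whatever the host order. */
static uint64_t get_be64(const uint8_t buf[8])
{
    uint64_t v = 0;
    int i;
    for (i = 0; i < 8; i++) {
        v = (v << 8) | buf[i];
    }
    return v;
}
```

The destination would fill the 8-byte payload with put_be64() and the
source would read it back with get_be64(), so both sides agree on the
value regardless of host endianness.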

Signed-off-by: Alexey Perevalov <a.perevalov@samsung.com>
---
 include/migration/migration.h |  4 +++-
 migration/migration.c         | 20 ++++++++++++++++++--
 migration/postcopy-ram.c      |  1 +
 3 files changed, 22 insertions(+), 3 deletions(-)

diff --git a/include/migration/migration.h b/include/migration/migration.h
index 5d2c628..5535aa6 100644
--- a/include/migration/migration.h
+++ b/include/migration/migration.h
@@ -55,7 +55,8 @@ enum mig_rp_message_type {
 
     MIG_RP_MSG_REQ_PAGES_ID, /* data (start: be64, len: be32, id: string) */
     MIG_RP_MSG_REQ_PAGES,    /* data (start: be64, len: be32) */
-
+    MIG_RP_MSG_DOWNTIME,    /* downtime value from destination,
+                               calculated and sent in case of post copy */
     MIG_RP_MSG_MAX
 };
 
@@ -364,6 +365,7 @@ void migrate_send_rp_pong(MigrationIncomingState *mis,
                           uint32_t value);
 void migrate_send_rp_req_pages(MigrationIncomingState *mis, const char* rbname,
                               ram_addr_t start, size_t len);
+void migrate_send_rp_downtime(MigrationIncomingState *mis, uint64_t downtime);
 
 void ram_control_before_iterate(QEMUFile *f, uint64_t flags);
 void ram_control_after_iterate(QEMUFile *f, uint64_t flags);
diff --git a/migration/migration.c b/migration/migration.c
index 5bac434..3134e24 100644
--- a/migration/migration.c
+++ b/migration/migration.c
@@ -553,6 +553,19 @@ void migrate_send_rp_message(MigrationIncomingState *mis,
 }
 
 /*
+ * Send the postcopy migration downtime.
+ * Migration should be completed by the time this function
+ * is called.
+ */
+void migrate_send_rp_downtime(MigrationIncomingState *mis, uint64_t downtime)
+{
+    uint64_t buf;
+
+    buf = cpu_to_be64(downtime);
+    migrate_send_rp_message(mis, MIG_RP_MSG_DOWNTIME, sizeof(downtime), &buf);
+}
+
+/*
  * Send a 'SHUT' message on the return channel with the given value
  * to indicate that we've finished with the RP.  Non-0 value indicates
  * error.
@@ -1483,6 +1496,7 @@ static struct rp_cmd_args {
     [MIG_RP_MSG_PONG]           = { .len =  4, .name = "PONG" },
     [MIG_RP_MSG_REQ_PAGES]      = { .len = 12, .name = "REQ_PAGES" },
     [MIG_RP_MSG_REQ_PAGES_ID]   = { .len = -1, .name = "REQ_PAGES_ID" },
+    [MIG_RP_MSG_DOWNTIME]       = { .len =  8, .name = "DOWNTIME" },
     [MIG_RP_MSG_MAX]            = { .len = -1, .name = "MAX" },
 };
 
@@ -1613,6 +1627,10 @@ static void *source_return_path_thread(void *opaque)
             migrate_handle_rp_req_pages(ms, (char *)&buf[13], start, len);
             break;
 
+        case MIG_RP_MSG_DOWNTIME:
+            ms->downtime = ldq_be_p(buf);
+            break;
+
         default:
             break;
         }
@@ -1677,7 +1695,6 @@ static int postcopy_start(MigrationState *ms, bool *old_vm_running)
     int ret;
     QIOChannelBuffer *bioc;
     QEMUFile *fb;
-    int64_t time_at_stop = qemu_clock_get_ms(QEMU_CLOCK_REALTIME);
     bool restart_block = false;
     migrate_set_state(&ms->state, MIGRATION_STATUS_ACTIVE,
                       MIGRATION_STATUS_POSTCOPY_ACTIVE);
@@ -1779,7 +1796,6 @@ static int postcopy_start(MigrationState *ms, bool *old_vm_running)
      */
     ms->postcopy_after_devices = true;
     notifier_list_notify(&migration_state_notifiers, ms);
-    ms->downtime =  qemu_clock_get_ms(QEMU_CLOCK_REALTIME) - time_at_stop;
 
     qemu_mutex_unlock_iothread();
 
diff --git a/migration/postcopy-ram.c b/migration/postcopy-ram.c
index ea89f4e..42330fd 100644
--- a/migration/postcopy-ram.c
+++ b/migration/postcopy-ram.c
@@ -330,6 +330,7 @@ int postcopy_ram_incoming_cleanup(MigrationIncomingState *mis)
     }
 
     postcopy_state_set(POSTCOPY_INCOMING_END);
+    migrate_send_rp_downtime(mis, get_postcopy_total_downtime());
     migrate_send_rp_shut(mis, qemu_file_get_error(mis->from_src_file) != 0);
 
     if (mis->postcopy_tmp_page) {
-- 
1.8.3.1


* [Qemu-devel] [PATCH 6/6] migration: detailed traces for postcopy
       [not found]   ` <CGME20170414131741eucas1p2f34e11e4292fef1c50ef63bd3522ad04@eucas1p2.samsung.com>
@ 2017-04-14 13:17     ` Alexey Perevalov
  2017-04-17 13:32       ` Philippe Mathieu-Daudé
  2017-04-24 18:03       ` Dr. David Alan Gilbert
  0 siblings, 2 replies; 38+ messages in thread
From: Alexey Perevalov @ 2017-04-14 13:17 UTC (permalink / raw)
  To: dgilbert, qemu-devel; +Cc: a.perevalov, i.maximets

These traces help to track down the vCPU state during a page fault and
the sources of page faults.

This patch shows the thread's /proc status, stack and syscall files at
the moment of the page fault, which makes it possible to see who
initiated the page fault.
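
The idea of walking a thread's /proc files line by line can be sketched
on its own, outside of QEMU. This is an assumption-laden illustration
rather than the patch's code: foreach_proc_line, PrefixCount and
count_prefix are hypothetical names, and the callback simply counts
lines with a given prefix instead of emitting trace events.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L
#endif
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

typedef void (*line_handler)(const char *line, void *opaque);

typedef struct {
    const char *prefix;
    int hits;
} PrefixCount;

/* Sample callback: count the lines that start with a given prefix. */
static void count_prefix(const char *line, void *opaque)
{
    PrefixCount *pc = opaque;
    if (strncmp(line, pc->prefix, strlen(pc->prefix)) == 0) {
        pc->hits++;
    }
}

/*
 * Call cb for every non-empty line of /proc/<tid>/<name>, with the
 * trailing newline removed. Returns the number of lines handed to cb,
 * or -1 if the file could not be opened.
 */
static int foreach_proc_line(pid_t tid, const char *name,
                             line_handler cb, void *opaque)
{
    char path[128];
    char *line = NULL;
    size_t cap = 0;
    ssize_t len;
    int count = 0;
    FILE *f;

    snprintf(path, sizeof(path), "/proc/%d/%s", (int)tid, name);
    f = fopen(path, "r");
    if (!f) {
        return -1;
    }
    while ((len = getline(&line, &cap, f)) != -1) {
        /* getline keeps the newline; strip it before the callback. */
        if (len > 0 && line[len - 1] == '\n') {
            line[--len] = '\0';
        }
        if (len == 0) {
            continue;
        }
        cb(line, opaque);
        count++;
    }
    free(line);
    fclose(f);
    return count;
}
```

Checking the last character for '\n' before trimming avoids cutting a
real character off a final line that has no newline.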

Signed-off-by: Alexey Perevalov <a.perevalov@samsung.com>
---
 migration/postcopy-ram.c | 98 +++++++++++++++++++++++++++++++++++++++++++++++-
 migration/trace-events   |  6 +++
 2 files changed, 103 insertions(+), 1 deletion(-)

diff --git a/migration/postcopy-ram.c b/migration/postcopy-ram.c
index 42330fd..513633c 100644
--- a/migration/postcopy-ram.c
+++ b/migration/postcopy-ram.c
@@ -412,7 +412,91 @@ static int ram_block_enable_notify(const char *block_name, void *host_addr,
     return 0;
 }
 
-static int get_mem_fault_cpu_index(uint32_t pid)
+#define PROC_LEN 1024
+#define DEBUG_FAULT_PROCESS_STATUS 1
+
+#ifdef DEBUG_FAULT_PROCESS_STATUS
+
+static FILE *get_proc_file(const gchar *frmt, pid_t thread_id)
+{
+    FILE *f = NULL;
+    gchar *file_path = g_strdup_printf(frmt, thread_id);
+    if (file_path == NULL) {
+        error_report("Couldn't allocate path for %u", thread_id);
+        return NULL;
+    }
+    f = fopen(file_path, "r");
+    if (!f) {
+        error_report("can't open %s", file_path);
+    }
+
+    trace_get_proc_file(file_path);
+    g_free(file_path);
+    return f;
+}
+
+typedef void(*proc_line_handler)(const char *line);
+
+static void proc_line_cb(const char *line)
+{
+    /* trace_ functions are inline */
+    trace_proc_line_cb(line);
+}
+
+static void foreach_line_in_file(FILE *f, proc_line_handler cb)
+{
+    char *line = NULL;
+    ssize_t read;
+    size_t len = 0;
+
+    while ((read = getline(&line, &len, f)) != -1) {
+        /* workaround: the trace_ infrastructure already appends '\n'
+         * and getline() includes it in the line */
+        ssize_t str_len = strlen(line) - 1;
+        if (str_len <= 0)
+            continue;
+        line[str_len] = '\0';
+        cb(line);
+    }
+    free(line);
+}
+
+static void observe_thread_proc(const gchar *path_frmt, pid_t thread_id)
+{
+    FILE *f = get_proc_file(path_frmt, thread_id);
+    if (!f) {
+        error_report("can't read thread's proc");
+        return;
+    }
+
+    foreach_line_in_file(f, proc_line_cb);
+    fclose(f);
+}
+
+/*
+ * For convenience, enable the following trace events together:
+ * observe_thread_begin
+ * get_proc_file
+ * proc_line_cb
+ * observe_thread_end
+ */
+static void observe_thread(const char *msg, pid_t thread_id)
+{
+    trace_observe_thread_begin(msg);
+    observe_thread_proc("/proc/%d/status", thread_id);
+    observe_thread_proc("/proc/%d/syscall", thread_id);
+    observe_thread_proc("/proc/%d/stack", thread_id);
+    trace_observe_thread_end(msg);
+}
+
+#else
+static void observe_thread(const char *msg, pid_t thread_id)
+{
+}
+
+#endif /* DEBUG_FAULT_PROCESS_STATUS */
+
+static int get_mem_fault_cpu_index(pid_t pid)
 {
     CPUState *cpu_iter;
 
@@ -421,9 +505,20 @@ static int get_mem_fault_cpu_index(uint32_t pid)
            return cpu_iter->cpu_index;
     }
     trace_get_mem_fault_cpu_index(pid);
+    observe_thread("not a vCPU", pid);
+
     return -1;
 }
 
+static void observe_vcpu_state(void)
+{
+    CPUState *cpu_iter;
+    CPU_FOREACH(cpu_iter) {
+        observe_thread("vCPU", cpu_iter->thread_id);
+        trace_vcpu_state(cpu_iter->cpu_index, cpu_iter->running);
+    }
+}
+
 /*
  * Handle faults detected by the USERFAULT markings
  */
@@ -465,6 +560,7 @@ static void *postcopy_ram_fault_thread(void *opaque)
         }
 
         ret = read(mis->userfault_fd, &msg, sizeof(msg));
+        observe_vcpu_state();
         if (ret != sizeof(msg)) {
             if (errno == EAGAIN) {
                 /*
diff --git a/migration/trace-events b/migration/trace-events
index ab2e1e4..3a74f91 100644
--- a/migration/trace-events
+++ b/migration/trace-events
@@ -202,6 +202,12 @@ save_xbzrle_page_overflow(void) ""
 ram_save_iterate_big_wait(uint64_t milliconds, int iterations) "big wait: %" PRIu64 " milliseconds, %d iterations"
 ram_load_complete(int ret, uint64_t seq_iter) "exit_code %d seq iteration %" PRIu64
 get_mem_fault_cpu_index(uint32_t pid) "pid %u is not vCPU"
+observe_thread_status(int ptid, char *name, char *status) "host_tid %d %s %s"
+vcpu_state(int cpu_index, int is_running) "cpu %d running %d"
+proc_line_cb(const char *str) "%s"
+get_proc_file(const char *str) "opened %s"
+observe_thread_begin(const char *str) "%s"
+observe_thread_end(const char *str) "%s"
 
 # migration/exec.c
 migration_exec_outgoing(const char *cmd) "cmd=%s"
-- 
1.8.3.1

* Re: [Qemu-devel] [PATCH 2/6] util: introduce glib-helper.c
  2017-04-14 13:17     ` [Qemu-devel] [PATCH 2/6] util: introduce glib-helper.c Alexey Perevalov
@ 2017-04-14 16:05       ` Philippe Mathieu-Daudé
  2017-04-17  7:07         ` Alexey
  2017-04-21 10:01         ` Dr. David Alan Gilbert
  2017-04-21 10:27       ` Peter Maydell
  1 sibling, 2 replies; 38+ messages in thread
From: Philippe Mathieu-Daudé @ 2017-04-14 16:05 UTC (permalink / raw)
  To: Alexey Perevalov, dgilbert, qemu-devel; +Cc: i.maximets

Hi Alexey,

On 04/14/2017 10:17 AM, Alexey Perevalov wrote:
> glib lacks a g_int_cmp which compares pointer values; xen_disk.c
> introduced its own, and the same function is now required in
> migration.c, so it is logical to move it into a common place.
> Further: maybe extend glib.
>
> This commit also moves the existing glib-compat.h into the include/glib
> folder for consolidation purposes.

Can you split this into two commits? The first one only moving the files, 
the second one moving the function?

I'm not sure that naming it "g_int_cmp()" won't clash with a future 
_extended_ glib. What do you think about naming it "qemu_g_int_cmp()"?
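
For illustration, a three-way comparator of the kind discussed here could
look like the sketch below. It is only a sketch under stated assumptions:
plain C types stand in for gconstpointer/gpointer so that it builds without
GLib, sketch_ptr_cmp() and sketch_ptr_cmp_demo() are hypothetical names, and
it compares the full pointer width through uintptr_t, whereas the int_cmp()
removed from xen_disk.c goes through GPOINTER_TO_UINT.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical stand-in for a QEMU-prefixed comparator. The signature
 * mirrors GLib's three-argument compare callback, with const void * and
 * void * in place of gconstpointer and gpointer.
 */
int sketch_ptr_cmp(const void *a, const void *b, void *user_data)
{
    uintptr_t ua = (uintptr_t)a;
    uintptr_t ub = (uintptr_t)b;

    (void)user_data; /* unused, kept only for the callback signature */

    /* 1 if ua > ub, -1 if ua < ub, 0 if equal; no subtraction overflow */
    return (ua > ub) - (ua < ub);
}

/* Usage check: integer keys stored as pointer values order as expected. */
void sketch_ptr_cmp_demo(void)
{
    const void *lo = (const void *)(uintptr_t)1;
    const void *hi = (const void *)(uintptr_t)2;

    assert(sketch_ptr_cmp(hi, lo, NULL) == 1);
    assert(sketch_ptr_cmp(lo, hi, NULL) == -1);
    assert(sketch_ptr_cmp(lo, lo, NULL) == 0);
}
```

Giving the function a project prefix keeps it out of the g_ namespace, so a
later GLib release that adds a function with the same g_ name would not
collide with it.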

> Signed-off-by: Alexey Perevalov <a.perevalov@samsung.com>
> ---
>  hw/block/xen_disk.c        |  10 +-
>  include/glib-compat.h      | 352 ---------------------------------------------
>  include/glib/glib-compat.h | 352 +++++++++++++++++++++++++++++++++++++++++++++
>  include/glib/glib-helper.h |  30 ++++
>  include/qemu/osdep.h       |   2 +-
>  linux-user/main.c          |   2 +-
>  scripts/clean-includes     |   2 +-
>  util/Makefile.objs         |   1 +
>  util/glib-helper.c         |  29 ++++
>  9 files changed, 417 insertions(+), 363 deletions(-)
>  delete mode 100644 include/glib-compat.h
>  create mode 100644 include/glib/glib-compat.h
>  create mode 100644 include/glib/glib-helper.h
>  create mode 100644 util/glib-helper.c
>
> diff --git a/hw/block/xen_disk.c b/hw/block/xen_disk.c
> index 456a2d5..36f6396 100644
> --- a/hw/block/xen_disk.c
> +++ b/hw/block/xen_disk.c
> @@ -20,6 +20,7 @@
>   */
>
>  #include "qemu/osdep.h"
> +#include "glib/glib-helper.h"
>  #include <sys/ioctl.h>
>  #include <sys/uio.h>
>
> @@ -154,13 +155,6 @@ static void ioreq_reset(struct ioreq *ioreq)
>      qemu_iovec_reset(&ioreq->v);
>  }
>
> -static gint int_cmp(gconstpointer a, gconstpointer b, gpointer user_data)
> -{
> -    uint ua = GPOINTER_TO_UINT(a);
> -    uint ub = GPOINTER_TO_UINT(b);
> -    return (ua > ub) - (ua < ub);
> -}
> -
>  static void destroy_grant(gpointer pgnt)
>  {
>      PersistentGrant *grant = pgnt;
> @@ -1191,7 +1185,7 @@ static int blk_connect(struct XenDevice *xendev)
>      if (blkdev->feature_persistent) {
>          /* Init persistent grants */
>          blkdev->max_grants = max_requests * BLKIF_MAX_SEGMENTS_PER_REQUEST;
> -        blkdev->persistent_gnts = g_tree_new_full((GCompareDataFunc)int_cmp,
> +        blkdev->persistent_gnts = g_tree_new_full((GCompareDataFunc)g_int_cmp,
>                                               NULL, NULL,
>                                               batch_maps ?
>                                               (GDestroyNotify)g_free :
> diff --git a/include/glib-compat.h b/include/glib-compat.h
> deleted file mode 100644
> index 863c8cf..0000000
> --- a/include/glib-compat.h
> +++ /dev/null
> @@ -1,352 +0,0 @@
> -/*
> - * GLIB Compatibility Functions
> - *
> - * Copyright IBM, Corp. 2013
> - *
> - * Authors:
> - *  Anthony Liguori   <aliguori@us.ibm.com>
> - *  Michael Tokarev   <mjt@tls.msk.ru>
> - *  Paolo Bonzini     <pbonzini@redhat.com>
> - *
> - * This work is licensed under the terms of the GNU GPL, version 2 or later.
> - * See the COPYING file in the top-level directory.
> - *
> - */
> -
> -#ifndef QEMU_GLIB_COMPAT_H
> -#define QEMU_GLIB_COMPAT_H
> -
> -#include <glib.h>
> -
> -/* GLIB version compatibility flags */
> -#if !GLIB_CHECK_VERSION(2, 26, 0)
> -#define G_TIME_SPAN_SECOND              (G_GINT64_CONSTANT(1000000))
> -#endif
> -
> -#if !GLIB_CHECK_VERSION(2, 28, 0)
> -static inline gint64 qemu_g_get_monotonic_time(void)
> -{
> -    /* g_get_monotonic_time() is best-effort so we can use the wall clock as a
> -     * fallback.
> -     */
> -
> -    GTimeVal time;
> -    g_get_current_time(&time);
> -
> -    return time.tv_sec * G_TIME_SPAN_SECOND + time.tv_usec;
> -}
> -/* work around distro backports of this interface */
> -#define g_get_monotonic_time() qemu_g_get_monotonic_time()
> -#endif
> -
> -#if defined(_WIN32) && !GLIB_CHECK_VERSION(2, 50, 0)
> -/*
> - * g_poll has a problem on Windows when using
> - * timeouts < 10ms, so use wrapper.
> - */
> -#define g_poll(fds, nfds, timeout) g_poll_fixed(fds, nfds, timeout)
> -gint g_poll_fixed(GPollFD *fds, guint nfds, gint timeout);
> -#endif
> -
> -#if !GLIB_CHECK_VERSION(2, 30, 0)
> -/* Not a 100% compatible implementation, but good enough for most
> - * cases. Placeholders are only supported at the end of the
> - * template. */
> -static inline gchar *qemu_g_dir_make_tmp(gchar const *tmpl, GError **error)
> -{
> -    gchar *path = g_build_filename(g_get_tmp_dir(), tmpl ?: ".XXXXXX", NULL);
> -
> -    if (mkdtemp(path) != NULL) {
> -        return path;
> -    }
> -    /* Error occurred, clean up. */
> -    g_set_error(error, G_FILE_ERROR, g_file_error_from_errno(errno),
> -                "mkdtemp() failed");
> -    g_free(path);
> -    return NULL;
> -}
> -#define g_dir_make_tmp(tmpl, error) qemu_g_dir_make_tmp(tmpl, error)
> -#endif /* glib 2.30 */
> -
> -#if !GLIB_CHECK_VERSION(2, 31, 0)
> -/* before glib-2.31, GMutex and GCond was dynamic-only (there was a separate
> - * GStaticMutex, but it didn't work with condition variables).
> - *
> - * Our implementation uses GOnce to fake a static implementation that does
> - * not require separate initialization.
> - * We need to rename the types to avoid passing our CompatGMutex/CompatGCond
> - * by mistake to a function that expects GMutex/GCond.  However, for ease
> - * of use we keep the GLib function names.  GLib uses macros for the
> - * implementation, we use inline functions instead and undefine the macros.
> - */
> -
> -typedef struct CompatGMutex {
> -    GOnce once;
> -} CompatGMutex;
> -
> -typedef struct CompatGCond {
> -    GOnce once;
> -} CompatGCond;
> -
> -static inline gpointer do_g_mutex_new(gpointer unused)
> -{
> -    return (gpointer) g_mutex_new();
> -}
> -
> -static inline void g_mutex_init(CompatGMutex *mutex)
> -{
> -    mutex->once = (GOnce) G_ONCE_INIT;
> -}
> -
> -static inline void g_mutex_clear(CompatGMutex *mutex)
> -{
> -    g_assert(mutex->once.status != G_ONCE_STATUS_PROGRESS);
> -    if (mutex->once.retval) {
> -        g_mutex_free((GMutex *) mutex->once.retval);
> -    }
> -    mutex->once = (GOnce) G_ONCE_INIT;
> -}
> -
> -static inline void (g_mutex_lock)(CompatGMutex *mutex)
> -{
> -    g_once(&mutex->once, do_g_mutex_new, NULL);
> -    g_mutex_lock((GMutex *) mutex->once.retval);
> -}
> -#undef g_mutex_lock
> -
> -static inline gboolean (g_mutex_trylock)(CompatGMutex *mutex)
> -{
> -    g_once(&mutex->once, do_g_mutex_new, NULL);
> -    return g_mutex_trylock((GMutex *) mutex->once.retval);
> -}
> -#undef g_mutex_trylock
> -
> -
> -static inline void (g_mutex_unlock)(CompatGMutex *mutex)
> -{
> -    g_mutex_unlock((GMutex *) mutex->once.retval);
> -}
> -#undef g_mutex_unlock
> -
> -static inline gpointer do_g_cond_new(gpointer unused)
> -{
> -    return (gpointer) g_cond_new();
> -}
> -
> -static inline void g_cond_init(CompatGCond *cond)
> -{
> -    cond->once = (GOnce) G_ONCE_INIT;
> -}
> -
> -static inline void g_cond_clear(CompatGCond *cond)
> -{
> -    g_assert(cond->once.status != G_ONCE_STATUS_PROGRESS);
> -    if (cond->once.retval) {
> -        g_cond_free((GCond *) cond->once.retval);
> -    }
> -    cond->once = (GOnce) G_ONCE_INIT;
> -}
> -
> -static inline void (g_cond_wait)(CompatGCond *cond, CompatGMutex *mutex)
> -{
> -    g_assert(mutex->once.status != G_ONCE_STATUS_PROGRESS);
> -    g_once(&cond->once, do_g_cond_new, NULL);
> -    g_cond_wait((GCond *) cond->once.retval, (GMutex *) mutex->once.retval);
> -}
> -#undef g_cond_wait
> -
> -static inline void (g_cond_broadcast)(CompatGCond *cond)
> -{
> -    g_once(&cond->once, do_g_cond_new, NULL);
> -    g_cond_broadcast((GCond *) cond->once.retval);
> -}
> -#undef g_cond_broadcast
> -
> -static inline void (g_cond_signal)(CompatGCond *cond)
> -{
> -    g_once(&cond->once, do_g_cond_new, NULL);
> -    g_cond_signal((GCond *) cond->once.retval);
> -}
> -#undef g_cond_signal
> -
> -static inline gboolean (g_cond_timed_wait)(CompatGCond *cond,
> -                                           CompatGMutex *mutex,
> -                                           GTimeVal *time)
> -{
> -    g_assert(mutex->once.status != G_ONCE_STATUS_PROGRESS);
> -    g_once(&cond->once, do_g_cond_new, NULL);
> -    return g_cond_timed_wait((GCond *) cond->once.retval,
> -                             (GMutex *) mutex->once.retval, time);
> -}
> -#undef g_cond_timed_wait
> -
> -/* This is not a macro, because it didn't exist until 2.32.  */
> -static inline gboolean g_cond_wait_until(CompatGCond *cond, CompatGMutex *mutex,
> -                                         gint64 end_time)
> -{
> -    GTimeVal time;
> -
> -    /* Convert from monotonic to CLOCK_REALTIME.  */
> -    end_time -= g_get_monotonic_time();
> -    g_get_current_time(&time);
> -    end_time += time.tv_sec * G_TIME_SPAN_SECOND + time.tv_usec;
> -
> -    time.tv_sec = end_time / G_TIME_SPAN_SECOND;
> -    time.tv_usec = end_time % G_TIME_SPAN_SECOND;
> -    return g_cond_timed_wait(cond, mutex, &time);
> -}
> -
> -/* before 2.31 there was no g_thread_new() */
> -static inline GThread *g_thread_new(const char *name,
> -                                    GThreadFunc func, gpointer data)
> -{
> -    GThread *thread = g_thread_create(func, data, TRUE, NULL);
> -    if (!thread) {
> -        g_error("creating thread");
> -    }
> -    return thread;
> -}
> -#else
> -#define CompatGMutex GMutex
> -#define CompatGCond GCond
> -#endif /* glib 2.31 */
> -
> -#if !GLIB_CHECK_VERSION(2, 32, 0)
> -/* Beware, function returns gboolean since 2.39.2, see GLib commit 9101915 */
> -static inline void g_hash_table_add(GHashTable *hash_table, gpointer key)
> -{
> -    g_hash_table_replace(hash_table, key, key);
> -}
> -#endif
> -
> -#ifndef g_assert_true
> -#define g_assert_true(expr)                                                    \
> -    do {                                                                       \
> -        if (G_LIKELY(expr)) {                                                  \
> -        } else {                                                               \
> -            g_assertion_message(G_LOG_DOMAIN, __FILE__, __LINE__, G_STRFUNC,   \
> -                                "'" #expr "' should be TRUE");                 \
> -        }                                                                      \
> -    } while (0)
> -#endif
> -
> -#ifndef g_assert_false
> -#define g_assert_false(expr)                                                   \
> -    do {                                                                       \
> -        if (G_LIKELY(!(expr))) {                                               \
> -        } else {                                                               \
> -            g_assertion_message(G_LOG_DOMAIN, __FILE__, __LINE__, G_STRFUNC,   \
> -                                "'" #expr "' should be FALSE");                \
> -        }                                                                      \
> -    } while (0)
> -#endif
> -
> -#ifndef g_assert_null
> -#define g_assert_null(expr)                                                    \
> -    do {                                                                       \
> -        if (G_LIKELY((expr) == NULL)) {                                        \
> -        } else {                                                               \
> -            g_assertion_message(G_LOG_DOMAIN, __FILE__, __LINE__, G_STRFUNC,   \
> -                                "'" #expr "' should be NULL");                 \
> -        }                                                                      \
> -    } while (0)
> -#endif
> -
> -#ifndef g_assert_nonnull
> -#define g_assert_nonnull(expr)                                                 \
> -    do {                                                                       \
> -        if (G_LIKELY((expr) != NULL)) {                                        \
> -        } else {                                                               \
> -            g_assertion_message(G_LOG_DOMAIN, __FILE__, __LINE__, G_STRFUNC,   \
> -                                "'" #expr "' should not be NULL");             \
> -        }                                                                      \
> -    } while (0)
> -#endif
> -
> -#ifndef g_assert_cmpmem
> -#define g_assert_cmpmem(m1, l1, m2, l2)                                        \
> -    do {                                                                       \
> -        gconstpointer __m1 = m1, __m2 = m2;                                    \
> -        int __l1 = l1, __l2 = l2;                                              \
> -        if (__l1 != __l2) {                                                    \
> -            g_assertion_message_cmpnum(                                        \
> -                G_LOG_DOMAIN, __FILE__, __LINE__, G_STRFUNC,                   \
> -                #l1 " (len(" #m1 ")) == " #l2 " (len(" #m2 "))", __l1, "==",   \
> -                __l2, 'i');                                                    \
> -        } else if (memcmp(__m1, __m2, __l1) != 0) {                            \
> -            g_assertion_message(G_LOG_DOMAIN, __FILE__, __LINE__, G_STRFUNC,   \
> -                                "assertion failed (" #m1 " == " #m2 ")");      \
> -        }                                                                      \
> -    } while (0)
> -#endif
> -
> -#if !GLIB_CHECK_VERSION(2, 28, 0)
> -static inline void g_list_free_full(GList *list, GDestroyNotify free_func)
> -{
> -    GList *l;
> -
> -    for (l = list; l; l = l->next) {
> -        free_func(l->data);
> -    }
> -
> -    g_list_free(list);
> -}
> -
> -static inline void g_slist_free_full(GSList *list, GDestroyNotify free_func)
> -{
> -    GSList *l;
> -
> -    for (l = list; l; l = l->next) {
> -        free_func(l->data);
> -    }
> -
> -    g_slist_free(list);
> -}
> -#endif
> -
> -#if !GLIB_CHECK_VERSION(2, 26, 0)
> -static inline void g_source_set_name(GSource *source, const char *name)
> -{
> -    /* This is just a debugging aid, so leaving it a no-op */
> -}
> -static inline void g_source_set_name_by_id(guint tag, const char *name)
> -{
> -    /* This is just a debugging aid, so leaving it a no-op */
> -}
> -#endif
> -
> -#if !GLIB_CHECK_VERSION(2, 36, 0)
> -/* Always fail.  This will not include error_report output in the test log,
> - * sending it instead to stderr.
> - */
> -#define g_test_initialized() (0)
> -#endif
> -#if !GLIB_CHECK_VERSION(2, 38, 0)
> -#ifdef CONFIG_HAS_GLIB_SUBPROCESS_TESTS
> -#error schizophrenic detection of glib subprocess testing
> -#endif
> -#define g_test_subprocess() (0)
> -#endif
> -
> -
> -#if !GLIB_CHECK_VERSION(2, 34, 0)
> -static inline void
> -g_test_add_data_func_full(const char *path,
> -                          gpointer data,
> -                          gpointer fn,
> -                          gpointer data_free_func)
> -{
> -#if GLIB_CHECK_VERSION(2, 26, 0)
> -    /* back-compat casts, remove this once we can require new-enough glib */
> -    g_test_add_vtable(path, 0, data, NULL,
> -                      (GTestFixtureFunc)fn, (GTestFixtureFunc) data_free_func);
> -#else
> -    /* back-compat casts, remove this once we can require new-enough glib */
> -    g_test_add_vtable(path, 0, data, NULL,
> -                      (void (*)(void)) fn, (void (*)(void)) data_free_func);
> -#endif
> -}
> -#endif
> -
> -
> -#endif
> diff --git a/include/glib/glib-compat.h b/include/glib/glib-compat.h
> new file mode 100644
> index 0000000..863c8cf
> --- /dev/null
> +++ b/include/glib/glib-compat.h
> @@ -0,0 +1,352 @@
> +/*
> + * GLIB Compatibility Functions
> + *
> + * Copyright IBM, Corp. 2013
> + *
> + * Authors:
> + *  Anthony Liguori   <aliguori@us.ibm.com>
> + *  Michael Tokarev   <mjt@tls.msk.ru>
> + *  Paolo Bonzini     <pbonzini@redhat.com>
> + *
> + * This work is licensed under the terms of the GNU GPL, version 2 or later.
> + * See the COPYING file in the top-level directory.
> + *
> + */
> +
> +#ifndef QEMU_GLIB_COMPAT_H
> +#define QEMU_GLIB_COMPAT_H
> +
> +#include <glib.h>
> +
> +/* GLIB version compatibility flags */
> +#if !GLIB_CHECK_VERSION(2, 26, 0)
> +#define G_TIME_SPAN_SECOND              (G_GINT64_CONSTANT(1000000))
> +#endif
> +
> +#if !GLIB_CHECK_VERSION(2, 28, 0)
> +static inline gint64 qemu_g_get_monotonic_time(void)
> +{
> +    /* g_get_monotonic_time() is best-effort so we can use the wall clock as a
> +     * fallback.
> +     */
> +
> +    GTimeVal time;
> +    g_get_current_time(&time);
> +
> +    return time.tv_sec * G_TIME_SPAN_SECOND + time.tv_usec;
> +}
> +/* work around distro backports of this interface */
> +#define g_get_monotonic_time() qemu_g_get_monotonic_time()
> +#endif
> +
> +#if defined(_WIN32) && !GLIB_CHECK_VERSION(2, 50, 0)
> +/*
> + * g_poll has a problem on Windows when using
> + * timeouts < 10ms, so use wrapper.
> + */
> +#define g_poll(fds, nfds, timeout) g_poll_fixed(fds, nfds, timeout)
> +gint g_poll_fixed(GPollFD *fds, guint nfds, gint timeout);
> +#endif
> +
> +#if !GLIB_CHECK_VERSION(2, 30, 0)
> +/* Not a 100% compatible implementation, but good enough for most
> + * cases. Placeholders are only supported at the end of the
> + * template. */
> +static inline gchar *qemu_g_dir_make_tmp(gchar const *tmpl, GError **error)
> +{
> +    gchar *path = g_build_filename(g_get_tmp_dir(), tmpl ?: ".XXXXXX", NULL);
> +
> +    if (mkdtemp(path) != NULL) {
> +        return path;
> +    }
> +    /* Error occurred, clean up. */
> +    g_set_error(error, G_FILE_ERROR, g_file_error_from_errno(errno),
> +                "mkdtemp() failed");
> +    g_free(path);
> +    return NULL;
> +}
> +#define g_dir_make_tmp(tmpl, error) qemu_g_dir_make_tmp(tmpl, error)
> +#endif /* glib 2.30 */
> +
> +#if !GLIB_CHECK_VERSION(2, 31, 0)
> +/* before glib-2.31, GMutex and GCond was dynamic-only (there was a separate
> + * GStaticMutex, but it didn't work with condition variables).
> + *
> + * Our implementation uses GOnce to fake a static implementation that does
> + * not require separate initialization.
> + * We need to rename the types to avoid passing our CompatGMutex/CompatGCond
> + * by mistake to a function that expects GMutex/GCond.  However, for ease
> + * of use we keep the GLib function names.  GLib uses macros for the
> + * implementation, we use inline functions instead and undefine the macros.
> + */
> +
> +typedef struct CompatGMutex {
> +    GOnce once;
> +} CompatGMutex;
> +
> +typedef struct CompatGCond {
> +    GOnce once;
> +} CompatGCond;
> +
> +static inline gpointer do_g_mutex_new(gpointer unused)
> +{
> +    return (gpointer) g_mutex_new();
> +}
> +
> +static inline void g_mutex_init(CompatGMutex *mutex)
> +{
> +    mutex->once = (GOnce) G_ONCE_INIT;
> +}
> +
> +static inline void g_mutex_clear(CompatGMutex *mutex)
> +{
> +    g_assert(mutex->once.status != G_ONCE_STATUS_PROGRESS);
> +    if (mutex->once.retval) {
> +        g_mutex_free((GMutex *) mutex->once.retval);
> +    }
> +    mutex->once = (GOnce) G_ONCE_INIT;
> +}
> +
> +static inline void (g_mutex_lock)(CompatGMutex *mutex)
> +{
> +    g_once(&mutex->once, do_g_mutex_new, NULL);
> +    g_mutex_lock((GMutex *) mutex->once.retval);
> +}
> +#undef g_mutex_lock
> +
> +static inline gboolean (g_mutex_trylock)(CompatGMutex *mutex)
> +{
> +    g_once(&mutex->once, do_g_mutex_new, NULL);
> +    return g_mutex_trylock((GMutex *) mutex->once.retval);
> +}
> +#undef g_mutex_trylock
> +
> +
> +static inline void (g_mutex_unlock)(CompatGMutex *mutex)
> +{
> +    g_mutex_unlock((GMutex *) mutex->once.retval);
> +}
> +#undef g_mutex_unlock
> +
> +static inline gpointer do_g_cond_new(gpointer unused)
> +{
> +    return (gpointer) g_cond_new();
> +}
> +
> +static inline void g_cond_init(CompatGCond *cond)
> +{
> +    cond->once = (GOnce) G_ONCE_INIT;
> +}
> +
> +static inline void g_cond_clear(CompatGCond *cond)
> +{
> +    g_assert(cond->once.status != G_ONCE_STATUS_PROGRESS);
> +    if (cond->once.retval) {
> +        g_cond_free((GCond *) cond->once.retval);
> +    }
> +    cond->once = (GOnce) G_ONCE_INIT;
> +}
> +
> +static inline void (g_cond_wait)(CompatGCond *cond, CompatGMutex *mutex)
> +{
> +    g_assert(mutex->once.status != G_ONCE_STATUS_PROGRESS);
> +    g_once(&cond->once, do_g_cond_new, NULL);
> +    g_cond_wait((GCond *) cond->once.retval, (GMutex *) mutex->once.retval);
> +}
> +#undef g_cond_wait
> +
> +static inline void (g_cond_broadcast)(CompatGCond *cond)
> +{
> +    g_once(&cond->once, do_g_cond_new, NULL);
> +    g_cond_broadcast((GCond *) cond->once.retval);
> +}
> +#undef g_cond_broadcast
> +
> +static inline void (g_cond_signal)(CompatGCond *cond)
> +{
> +    g_once(&cond->once, do_g_cond_new, NULL);
> +    g_cond_signal((GCond *) cond->once.retval);
> +}
> +#undef g_cond_signal
> +
> +static inline gboolean (g_cond_timed_wait)(CompatGCond *cond,
> +                                           CompatGMutex *mutex,
> +                                           GTimeVal *time)
> +{
> +    g_assert(mutex->once.status != G_ONCE_STATUS_PROGRESS);
> +    g_once(&cond->once, do_g_cond_new, NULL);
> +    return g_cond_timed_wait((GCond *) cond->once.retval,
> +                             (GMutex *) mutex->once.retval, time);
> +}
> +#undef g_cond_timed_wait
> +
> +/* This is not a macro, because it didn't exist until 2.32.  */
> +static inline gboolean g_cond_wait_until(CompatGCond *cond, CompatGMutex *mutex,
> +                                         gint64 end_time)
> +{
> +    GTimeVal time;
> +
> +    /* Convert from monotonic to CLOCK_REALTIME.  */
> +    end_time -= g_get_monotonic_time();
> +    g_get_current_time(&time);
> +    end_time += time.tv_sec * G_TIME_SPAN_SECOND + time.tv_usec;
> +
> +    time.tv_sec = end_time / G_TIME_SPAN_SECOND;
> +    time.tv_usec = end_time % G_TIME_SPAN_SECOND;
> +    return g_cond_timed_wait(cond, mutex, &time);
> +}
> +
> +/* before 2.31 there was no g_thread_new() */
> +static inline GThread *g_thread_new(const char *name,
> +                                    GThreadFunc func, gpointer data)
> +{
> +    GThread *thread = g_thread_create(func, data, TRUE, NULL);
> +    if (!thread) {
> +        g_error("creating thread");
> +    }
> +    return thread;
> +}
> +#else
> +#define CompatGMutex GMutex
> +#define CompatGCond GCond
> +#endif /* glib 2.31 */
> +
> +#if !GLIB_CHECK_VERSION(2, 32, 0)
> +/* Beware, function returns gboolean since 2.39.2, see GLib commit 9101915 */
> +static inline void g_hash_table_add(GHashTable *hash_table, gpointer key)
> +{
> +    g_hash_table_replace(hash_table, key, key);
> +}
> +#endif
> +
> +#ifndef g_assert_true
> +#define g_assert_true(expr)                                                    \
> +    do {                                                                       \
> +        if (G_LIKELY(expr)) {                                                  \
> +        } else {                                                               \
> +            g_assertion_message(G_LOG_DOMAIN, __FILE__, __LINE__, G_STRFUNC,   \
> +                                "'" #expr "' should be TRUE");                 \
> +        }                                                                      \
> +    } while (0)
> +#endif
> +
> +#ifndef g_assert_false
> +#define g_assert_false(expr)                                                   \
> +    do {                                                                       \
> +        if (G_LIKELY(!(expr))) {                                               \
> +        } else {                                                               \
> +            g_assertion_message(G_LOG_DOMAIN, __FILE__, __LINE__, G_STRFUNC,   \
> +                                "'" #expr "' should be FALSE");                \
> +        }                                                                      \
> +    } while (0)
> +#endif
> +
> +#ifndef g_assert_null
> +#define g_assert_null(expr)                                                    \
> +    do {                                                                       \
> +        if (G_LIKELY((expr) == NULL)) {                                        \
> +        } else {                                                               \
> +            g_assertion_message(G_LOG_DOMAIN, __FILE__, __LINE__, G_STRFUNC,   \
> +                                "'" #expr "' should be NULL");                 \
> +        }                                                                      \
> +    } while (0)
> +#endif
> +
> +#ifndef g_assert_nonnull
> +#define g_assert_nonnull(expr)                                                 \
> +    do {                                                                       \
> +        if (G_LIKELY((expr) != NULL)) {                                        \
> +        } else {                                                               \
> +            g_assertion_message(G_LOG_DOMAIN, __FILE__, __LINE__, G_STRFUNC,   \
> +                                "'" #expr "' should not be NULL");             \
> +        }                                                                      \
> +    } while (0)
> +#endif
> +
> +#ifndef g_assert_cmpmem
> +#define g_assert_cmpmem(m1, l1, m2, l2)                                        \
> +    do {                                                                       \
> +        gconstpointer __m1 = m1, __m2 = m2;                                    \
> +        int __l1 = l1, __l2 = l2;                                              \
> +        if (__l1 != __l2) {                                                    \
> +            g_assertion_message_cmpnum(                                        \
> +                G_LOG_DOMAIN, __FILE__, __LINE__, G_STRFUNC,                   \
> +                #l1 " (len(" #m1 ")) == " #l2 " (len(" #m2 "))", __l1, "==",   \
> +                __l2, 'i');                                                    \
> +        } else if (memcmp(__m1, __m2, __l1) != 0) {                            \
> +            g_assertion_message(G_LOG_DOMAIN, __FILE__, __LINE__, G_STRFUNC,   \
> +                                "assertion failed (" #m1 " == " #m2 ")");      \
> +        }                                                                      \
> +    } while (0)
> +#endif
> +
> +#if !GLIB_CHECK_VERSION(2, 28, 0)
> +static inline void g_list_free_full(GList *list, GDestroyNotify free_func)
> +{
> +    GList *l;
> +
> +    for (l = list; l; l = l->next) {
> +        free_func(l->data);
> +    }
> +
> +    g_list_free(list);
> +}
> +
> +static inline void g_slist_free_full(GSList *list, GDestroyNotify free_func)
> +{
> +    GSList *l;
> +
> +    for (l = list; l; l = l->next) {
> +        free_func(l->data);
> +    }
> +
> +    g_slist_free(list);
> +}
> +#endif
> +
> +#if !GLIB_CHECK_VERSION(2, 26, 0)
> +static inline void g_source_set_name(GSource *source, const char *name)
> +{
> +    /* This is just a debugging aid, so leaving it a no-op */
> +}
> +static inline void g_source_set_name_by_id(guint tag, const char *name)
> +{
> +    /* This is just a debugging aid, so leaving it a no-op */
> +}
> +#endif
> +
> +#if !GLIB_CHECK_VERSION(2, 36, 0)
> +/* Always fail.  This will not include error_report output in the test log,
> + * sending it instead to stderr.
> + */
> +#define g_test_initialized() (0)
> +#endif
> +#if !GLIB_CHECK_VERSION(2, 38, 0)
> +#ifdef CONFIG_HAS_GLIB_SUBPROCESS_TESTS
> +#error schizophrenic detection of glib subprocess testing
> +#endif
> +#define g_test_subprocess() (0)
> +#endif
> +
> +
> +#if !GLIB_CHECK_VERSION(2, 34, 0)
> +static inline void
> +g_test_add_data_func_full(const char *path,
> +                          gpointer data,
> +                          gpointer fn,
> +                          gpointer data_free_func)
> +{
> +#if GLIB_CHECK_VERSION(2, 26, 0)
> +    /* back-compat casts, remove this once we can require new-enough glib */
> +    g_test_add_vtable(path, 0, data, NULL,
> +                      (GTestFixtureFunc)fn, (GTestFixtureFunc) data_free_func);
> +#else
> +    /* back-compat casts, remove this once we can require new-enough glib */
> +    g_test_add_vtable(path, 0, data, NULL,
> +                      (void (*)(void)) fn, (void (*)(void)) data_free_func);
> +#endif
> +}
> +#endif
> +
> +
> +#endif
> diff --git a/include/glib/glib-helper.h b/include/glib/glib-helper.h
> new file mode 100644
> index 0000000..db740fb
> --- /dev/null
> +++ b/include/glib/glib-helper.h
> @@ -0,0 +1,30 @@
> +/*
> + * Helpers for GLIB
> + *
> + * This work is licensed under the terms of the GNU GPL, version 2 or later.
> + * See the COPYING file in the top-level directory.
> + *
> + */
> +
> +#ifndef QEMU_GLIB_HELPER_H
> +#define QEMU_GLIB_HELPER_H
> +
> +
> +#include "glib/glib-compat.h"
> +
> +#define GPOINTER_TO_UINT64(a) ((guint64) (a))
> +
> +/*
> + * return 1 if a > b, -1 if a < b, and 0 if equal
> + */
> +gint g_int_cmp64(gconstpointer a, gconstpointer b,
> +        gpointer __attribute__((unused)) user_data);
> +
> +/*
> + * return 1 if a > b, -1 if a < b, and 0 if equal
> + */
> +gint g_int_cmp(gconstpointer a, gconstpointer b,
> +        gpointer __attribute__((unused)) user_data);
> +
> +#endif /* QEMU_GLIB_HELPER_H */
> +
> diff --git a/include/qemu/osdep.h b/include/qemu/osdep.h
> index 122ff06..36f8a89 100644
> --- a/include/qemu/osdep.h
> +++ b/include/qemu/osdep.h
> @@ -104,7 +104,7 @@ extern int daemon(int, int);
>  #include "sysemu/os-posix.h"
>  #endif
>
> -#include "glib-compat.h"
> +#include "glib/glib-compat.h"
>  #include "qemu/typedefs.h"
>
>  #ifndef O_LARGEFILE
> diff --git a/linux-user/main.c b/linux-user/main.c
> index 10a3bb3..7cea6bc 100644
> --- a/linux-user/main.c
> +++ b/linux-user/main.c
> @@ -35,7 +35,7 @@
>  #include "elf.h"
>  #include "exec/log.h"
>  #include "trace/control.h"
> -#include "glib-compat.h"
> +#include "glib/glib-compat.h"
>
>  char *exec_path;
>
> diff --git a/scripts/clean-includes b/scripts/clean-includes
> index dd938da..b32b928 100755
> --- a/scripts/clean-includes
> +++ b/scripts/clean-includes
> @@ -123,7 +123,7 @@ for f in "$@"; do
>        ;;
>      *include/qemu/osdep.h | \
>      *include/qemu/compiler.h | \
> -    *include/glib-compat.h | \
> +    *include/glib/glib-compat.h | \
>      *include/sysemu/os-posix.h | \
>      *include/sysemu/os-win32.h | \
>      *include/standard-headers/ )
> diff --git a/util/Makefile.objs b/util/Makefile.objs
> index c6205eb..0080712 100644
> --- a/util/Makefile.objs
> +++ b/util/Makefile.objs
> @@ -43,3 +43,4 @@ util-obj-y += qdist.o
>  util-obj-y += qht.o
>  util-obj-y += range.o
>  util-obj-y += systemd.o
> +util-obj-y += glib-helper.o
> diff --git a/util/glib-helper.c b/util/glib-helper.c
> new file mode 100644
> index 0000000..2557009
> --- /dev/null
> +++ b/util/glib-helper.c
> @@ -0,0 +1,29 @@
> +/*
> + * Implementation for GLIB helpers
> + * this file is intended to accumulate additional glib helper
> + * functions for later reuse
> + *
> + * This work is licensed under the terms of the GNU GPL, version 2 or later.
> + * See the COPYING file in the top-level directory.
> +
> + */
> +
> +#include "glib/glib-helper.h"
> +
> +gint g_int_cmp64(gconstpointer a, gconstpointer b,
> +        gpointer __attribute__((unused)) user_data)
> +{
> +    guint64 ua = GPOINTER_TO_UINT64(a);
> +    guint64 ub = GPOINTER_TO_UINT64(b);
> +    return (ua > ub) - (ua < ub);
> +}
> +
> +/*
> + * return 1 if a > b, -1 if a < b, and 0 if equal
> + */
> +gint g_int_cmp(gconstpointer a, gconstpointer b,
> +        gpointer __attribute__((unused)) user_data)
> +{
> +    return g_int_cmp64(a, b, user_data);
> +}
> +
>


* Re: [Qemu-devel] [PATCH 0/6] calculate downtime for postcopy live migration
  2017-04-14 13:17 ` [Qemu-devel] [PATCH 0/6] calculate downtime for postcopy live migration Alexey Perevalov
                     ` (5 preceding siblings ...)
       [not found]   ` <CGME20170414131741eucas1p2f34e11e4292fef1c50ef63bd3522ad04@eucas1p2.samsung.com>
@ 2017-04-17  2:32   ` no-reply
  2017-04-17  2:36   ` no-reply
  7 siblings, 0 replies; 38+ messages in thread
From: no-reply @ 2017-04-17  2:32 UTC (permalink / raw)
  To: a.perevalov; +Cc: famz, dgilbert, qemu-devel, i.maximets

Hi,

This series seems to have some coding style problems. See output below for
more information:

Type: series
Subject: [Qemu-devel] [PATCH 0/6] calculate downtime for postcopy live migration
Message-id: 1492175840-5021-1-git-send-email-a.perevalov@samsung.com

=== TEST SCRIPT BEGIN ===
#!/bin/bash

BASE=base
n=1
total=$(git log --oneline $BASE.. | wc -l)
failed=0

# Useful git options
git config --local diff.renamelimit 0
git config --local diff.renames True

commits="$(git log --format=%H --reverse $BASE..)"
for c in $commits; do
    echo "Checking PATCH $n/$total: $(git log -n 1 --format=%s $c)..."
    if ! git show $c --format=email | ./scripts/checkpatch.pl --mailback -; then
        failed=1
        echo
    fi
    n=$((n+1))
done

exit $failed
=== TEST SCRIPT END ===

Updating 3c8cf5a9c21ff8782164d1def7f44bd888713384
Switched to a new branch 'test'
73cecab migration: detailed traces for postcopy
ba44e58 migration: send postcopy downtime back to source
2187a65 migration: calculate downtime on dst side
313364f migration: add UFFD_FEATURE_THREAD_ID feature support
cd9e8e9 util: introduce glib-helper.c
e347334 userfault: add pid into uffd_msg & update UFFD_FEATURE_*

=== OUTPUT BEGIN ===
Checking PATCH 1/6: userfault: add pid into uffd_msg & update UFFD_FEATURE_*...
Checking PATCH 2/6: util: introduce glib-helper.c...
Checking PATCH 3/6: migration: add UFFD_FEATURE_THREAD_ID feature support...
Checking PATCH 4/6: migration: calculate downtime on dst side...
ERROR: spaces required around that '/' (ctx:VxV)
#121: FILE: migration/migration.c:2144:
+#define SIZE_TO_KEEP_CPUBITS (1 + smp_cpus/sizeof(guint64))
                                           ^

ERROR: braces {} are necessary for all arms of this statement
#138: FILE: migration/migration.c:2161:
+    if (cpu < 0) {
[...]
+    } else
[...]

WARNING: line over 80 characters
#165: FILE: migration/migration.c:2188:
+        /* error_report("Could not populate downtime duration completion time \n\

ERROR: unnecessary whitespace before a quoted newline
#165: FILE: migration/migration.c:2188:
+        /* error_report("Could not populate downtime duration completion time \n\

ERROR: Error messages should not contain newlines
#165: FILE: migration/migration.c:2188:
+        /* error_report("Could not populate downtime duration completion time \n\

WARNING: line over 80 characters
#191: FILE: migration/migration.c:2214:
+        od_begin->cpus = g_memdup(dd->cpus, sizeof(uint64_t) * SIZE_TO_KEEP_CPUBITS);

WARNING: line over 80 characters
#200: FILE: migration/migration.c:2223:
+        od_end->cpus = g_memdup(dd->cpus, sizeof(uint64_t) * SIZE_TO_KEEP_CPUBITS);

ERROR: braces {} are necessary for all arms of this statement
#207: FILE: migration/migration.c:2230:
+    if (dd->end && dd->begin)
[...]

WARNING: line over 80 characters
#208: FILE: migration/migration.c:2231:
+        trace_split_duration_and_fill_points(dd->end - dd->begin, (uint64_t)key);

ERROR: braces {} are necessary for all arms of this statement
#220: FILE: migration/migration.c:2243:
+        if (test_bit(cpu_iter, dd->cpus) && dd->end && dd->begin)
[...]

WARNING: line over 80 characters
#243: FILE: migration/migration.c:2266:
+    unsigned long zero_bit = find_first_zero_bit(b, BITS_PER_LONG * SIZE_TO_KEEP_CPUBITS);

ERROR: that open brace { should be on the previous line
#307: FILE: migration/migration.c:2330:
+        for (point_iter = 0; point_iter < smp_cpus; point_iter++)
+        {

ERROR: braces {} are necessary even for single statement blocks
#307: FILE: migration/migration.c:2330:
+        for (point_iter = 0; point_iter < smp_cpus; point_iter++)
+        {
+            trace_downtime_per_cpu(point_iter, downtime_cpu[point_iter]);
+        }

ERROR: braces {} are necessary for all arms of this statement
#328: FILE: migration/migration.c:2351:
+        if (!od || !prev_od)
[...]

ERROR: braces {} are necessary for all arms of this statement
#331: FILE: migration/migration.c:2354:
+        if (!od->is_end || prev_od->is_end)
[...]

ERROR: braces {} are necessary for all arms of this statement
#340: FILE: migration/migration.c:2363:
+            if (!t_od)
[...]

ERROR: braces {} are necessary for all arms of this statement
#343: FILE: migration/migration.c:2366:
+            if (t_od->is_end)
[...]

ERROR: suspect code indent for conditional statements (8, 11)
#409: FILE: migration/postcopy-ram.c:419:
+        if (cpu_iter->thread_id == pid)
+           return cpu_iter->cpu_index;

ERROR: braces {} are necessary for all arms of this statement
#409: FILE: migration/postcopy-ram.c:419:
+        if (cpu_iter->thread_id == pid)
[...]

WARNING: line over 80 characters
#424: FILE: migration/postcopy-ram.c:503:
+                                                rb_offset, msg.arg.pagefault.feat.ptid);

WARNING: line over 80 characters
#427: FILE: migration/postcopy-ram.c:506:
+                            get_mem_fault_cpu_index(msg.arg.pagefault.feat.ptid));

total: 14 errors, 7 warnings, 431 lines checked

Your patch has style problems, please review.  If any of these errors
are false positives report them to the maintainer, see
CHECKPATCH in MAINTAINERS.

Checking PATCH 5/6: migration: send postcopy downtime back to source...
Checking PATCH 6/6: migration: detailed traces for postcopy...
ERROR: braces {} are necessary for all arms of this statement
#65: FILE: migration/postcopy-ram.c:456:
+        if (str_len <= 0)
[...]

total: 1 errors, 0 warnings, 131 lines checked

Your patch has style problems, please review.  If any of these errors
are false positives report them to the maintainer, see
CHECKPATCH in MAINTAINERS.

=== OUTPUT END ===

Test command exited with code: 1
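
As an illustration of the two most frequent complaints above (spaces around
'/', braces on every arm of a statement), here is a small sketch in the style
checkpatch.pl accepts. It is not the code from the patch: smp_cpus is replaced
by a local stand-in, struct cpu_entry and find_cpu_index_sketch() are
hypothetical names, and returning -1 when no thread matches is an assumption.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Stand-in for QEMU's global smp_cpus (assumption for this sketch). */
static unsigned int smp_cpus = 4;

/* Spaces on both sides of '/', as checkpatch requires. */
#define SIZE_TO_KEEP_CPUBITS (1 + smp_cpus / sizeof(uint64_t))

/* Hypothetical reduced view of a vCPU: host thread id and CPU index. */
struct cpu_entry {
    int thread_id;
    int cpu_index;
};

/* Return the CPU index whose thread id equals pid, or -1 if none does. */
static int find_cpu_index_sketch(const struct cpu_entry *cpus, size_t n,
                                 int pid)
{
    size_t i;

    for (i = 0; i < n; i++) {
        /* Braces even around a single statement. */
        if (cpus[i].thread_id == pid) {
            return cpus[i].cpu_index;
        }
    }
    return -1;
}

/* Small self-check that doubles as a usage example. */
static void checkpatch_style_selfcheck(void)
{
    const struct cpu_entry cpus[] = { { 100, 0 }, { 101, 1 } };

    assert(find_cpu_index_sketch(cpus, 2, 101) == 1);
    assert(find_cpu_index_sketch(cpus, 2, 999) == -1);
}
```

The same brace rule applies to every "if"/"else" arm and to the "for" loop
flagged at migration/migration.c:2330, where the opening brace also has to
move up to the line of the "for" itself.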


---
Email generated automatically by Patchew [http://patchew.org/].
Please send your feedback to patchew-devel@freelists.org


* Re: [Qemu-devel] [PATCH 0/6] calculate downtime for postcopy live migration
  2017-04-14 13:17 ` [Qemu-devel] [PATCH 0/6] calculate downtime for postcopy live migration Alexey Perevalov
                     ` (6 preceding siblings ...)
  2017-04-17  2:32   ` [Qemu-devel] [PATCH 0/6] calculate downtime for postcopy live migration no-reply
@ 2017-04-17  2:36   ` no-reply
  7 siblings, 0 replies; 38+ messages in thread
From: no-reply @ 2017-04-17  2:36 UTC (permalink / raw)
  To: a.perevalov; +Cc: famz, dgilbert, qemu-devel, i.maximets

Hi,

This series failed automatic build test. Please find the testing commands and
their output below. If you have docker installed, you can probably reproduce it
locally.

Type: series
Subject: [Qemu-devel] [PATCH 0/6] calculate downtime for postcopy live migration
Message-id: 1492175840-5021-1-git-send-email-a.perevalov@samsung.com

=== TEST SCRIPT BEGIN ===
#!/bin/bash
set -e
git submodule update --init dtc
# Let docker tests dump environment info
export SHOW_ENV=1
export J=8
make docker-test-quick@centos6
make docker-test-mingw@fedora
make docker-test-build@min-glib
=== TEST SCRIPT END ===

Updating 3c8cf5a9c21ff8782164d1def7f44bd888713384
Switched to a new branch 'test'
73cecab migration: detailed traces for postcopy
ba44e58 migration: send postcopy downtime back to source
2187a65 migration: calculate downtime on dst side
313364f migration: add UFFD_FEATURE_THREAD_ID feature support
cd9e8e9 util: introduce glib-helper.c
e347334 userfault: add pid into uffd_msg & update UFFD_FEATURE_*

=== OUTPUT BEGIN ===
Submodule 'dtc' (git://git.qemu-project.org/dtc.git) registered for path 'dtc'
Cloning into '/var/tmp/patchew-tester-tmp-cvj9w8en/src/dtc'...
Submodule path 'dtc': checked out '558cd81bdd432769b59bff01240c44f82cfb1a9d'
  BUILD   centos6
make[1]: Entering directory '/var/tmp/patchew-tester-tmp-cvj9w8en/src'
  ARCHIVE qemu.tgz
  ARCHIVE dtc.tgz
  COPY    RUNNER
    RUN test-quick in qemu:centos6 
Packages installed:
SDL-devel-1.2.14-7.el6_7.1.x86_64
ccache-3.1.6-2.el6.x86_64
epel-release-6-8.noarch
gcc-4.4.7-17.el6.x86_64
git-1.7.1-4.el6_7.1.x86_64
glib2-devel-2.28.8-5.el6.x86_64
libfdt-devel-1.4.0-1.el6.x86_64
make-3.81-23.el6.x86_64
package g++ is not installed
pixman-devel-0.32.8-1.el6.x86_64
tar-1.23-15.el6_8.x86_64
zlib-devel-1.2.3-29.el6.x86_64

Environment variables:
PACKAGES=libfdt-devel ccache     tar git make gcc g++     zlib-devel glib2-devel SDL-devel pixman-devel     epel-release
HOSTNAME=9a6e87669ba4
TERM=xterm
MAKEFLAGS= -j8
HISTSIZE=1000
J=8
USER=root
CCACHE_DIR=/var/tmp/ccache
EXTRA_CONFIGURE_OPTS=
V=
SHOW_ENV=1
MAIL=/var/spool/mail/root
PATH=/usr/lib/ccache:/usr/lib64/ccache:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin
PWD=/
LANG=en_US.UTF-8
TARGET_LIST=
HISTCONTROL=ignoredups
SHLVL=1
HOME=/root
TEST_DIR=/tmp/qemu-test
LOGNAME=root
LESSOPEN=||/usr/bin/lesspipe.sh %s
FEATURES= dtc
DEBUG=
G_BROKEN_FILENAMES=1
CCACHE_HASHDIR=
_=/usr/bin/env

Configure options:
--enable-werror --target-list=x86_64-softmmu,aarch64-softmmu --prefix=/var/tmp/qemu-build/install
No C++ compiler available; disabling C++ specific optional code
Install prefix    /var/tmp/qemu-build/install
BIOS directory    /var/tmp/qemu-build/install/share/qemu
binary directory  /var/tmp/qemu-build/install/bin
library directory /var/tmp/qemu-build/install/lib
module directory  /var/tmp/qemu-build/install/lib/qemu
libexec directory /var/tmp/qemu-build/install/libexec
include directory /var/tmp/qemu-build/install/include
config directory  /var/tmp/qemu-build/install/etc
local state directory   /var/tmp/qemu-build/install/var
Manual directory  /var/tmp/qemu-build/install/share/man
ELF interp prefix /usr/gnemul/qemu-%M
Source path       /tmp/qemu-test/src
C compiler        cc
Host C compiler   cc
C++ compiler      
Objective-C compiler cc
ARFLAGS           rv
CFLAGS            -O2 -U_FORTIFY_SOURCE -D_FORTIFY_SOURCE=2 -g 
QEMU_CFLAGS       -I/usr/include/pixman-1   -I$(SRC_PATH)/dtc/libfdt -pthread -I/usr/include/glib-2.0 -I/usr/lib64/glib-2.0/include   -fPIE -DPIE -m64 -mcx16 -D_GNU_SOURCE -D_FILE_OFFSET_BITS=64 -D_LARGEFILE_SOURCE -Wstrict-prototypes -Wredundant-decls -Wall -Wundef -Wwrite-strings -Wmissing-prototypes -fno-strict-aliasing -fno-common -fwrapv  -Wendif-labels -Wno-missing-include-dirs -Wempty-body -Wnested-externs -Wformat-security -Wformat-y2k -Winit-self -Wignored-qualifiers -Wold-style-declaration -Wold-style-definition -Wtype-limits -fstack-protector-all
LDFLAGS           -Wl,--warn-common -Wl,-z,relro -Wl,-z,now -pie -m64 -g 
make              make
install           install
python            python -B
smbd              /usr/sbin/smbd
module support    no
host CPU          x86_64
host big endian   no
target list       x86_64-softmmu aarch64-softmmu
tcg debug enabled no
gprof enabled     no
sparse enabled    no
strip binaries    yes
profiler          no
static build      no
pixman            system
SDL support       yes (1.2.14)
GTK support       no 
GTK GL support    no
VTE support       no 
TLS priority      NORMAL
GNUTLS support    no
GNUTLS rnd        no
libgcrypt         no
libgcrypt kdf     no
nettle            no 
nettle kdf        no
libtasn1          no
curses support    no
virgl support     no
curl support      no
mingw32 support   no
Audio drivers     oss
Block whitelist (rw) 
Block whitelist (ro) 
VirtFS support    no
VNC support       yes
VNC SASL support  no
VNC JPEG support  no
VNC PNG support   no
xen support       no
brlapi support    no
bluez  support    no
Documentation     no
PIE               yes
vde support       no
netmap support    no
Linux AIO support no
ATTR/XATTR support yes
Install blobs     yes
KVM support       yes
HAX support       no
RDMA support      no
TCG interpreter   no
fdt support       yes
preadv support    yes
fdatasync         yes
madvise           yes
posix_madvise     yes
libcap-ng support no
vhost-net support yes
vhost-scsi support yes
vhost-vsock support yes
Trace backends    log
spice support     no 
rbd support       no
xfsctl support    no
smartcard support no
libusb            no
usb net redir     no
OpenGL support    no
OpenGL dmabufs    no
libiscsi support  no
libnfs support    no
build guest agent yes
QGA VSS support   no
QGA w32 disk info no
QGA MSI support   no
seccomp support   no
coroutine backend ucontext
coroutine pool    yes
debug stack usage no
GlusterFS support no
gcov              gcov
gcov enabled      no
TPM support       yes
libssh2 support   no
TPM passthrough   yes
QOM debugging     yes
lzo support       no
snappy support    no
bzip2 support     no
NUMA host support no
tcmalloc support  no
jemalloc support  no
avx2 optimization no
replication support yes
  GEN     x86_64-softmmu/config-devices.mak.tmp
mkdir -p dtc/libfdt
  GEN     aarch64-softmmu/config-devices.mak.tmp
mkdir -p dtc/tests
  GEN     config-host.h
  GEN     qemu-options.def
  GEN     qapi-types.h
  GEN     qmp-commands.h
  GEN     qapi-visit.h
  GEN     qapi-event.h
  GEN     x86_64-softmmu/config-devices.mak
  GEN     aarch64-softmmu/config-devices.mak
  GEN     qmp-marshal.c
  GEN     qapi-types.c
  GEN     qapi-visit.c
  GEN     qapi-event.c
  GEN     qmp-introspect.c
  GEN     qmp-introspect.h
  GEN     trace/generated-tcg-tracers.h
  GEN     trace/generated-helpers-wrappers.h
  GEN     trace/generated-helpers.h
  GEN     trace/generated-helpers.c
  GEN     module_block.h
  GEN     tests/test-qapi-types.h
  GEN     tests/test-qapi-visit.h
  GEN     tests/test-qmp-commands.h
  GEN     tests/test-qapi-event.h
  GEN     tests/test-qmp-introspect.h
  GEN     trace-root.h
  GEN     util/trace.h
  GEN     crypto/trace.h
  GEN     io/trace.h
  GEN     migration/trace.h
  GEN     block/trace.h
  GEN     backends/trace.h
  GEN     hw/block/trace.h
  GEN     hw/block/dataplane/trace.h
  GEN     hw/char/trace.h
  GEN     hw/intc/trace.h
  GEN     hw/net/trace.h
  GEN     hw/virtio/trace.h
  GEN     hw/audio/trace.h
  GEN     hw/misc/trace.h
  GEN     hw/usb/trace.h
  GEN     hw/scsi/trace.h
  GEN     hw/nvram/trace.h
  GEN     hw/display/trace.h
  GEN     hw/input/trace.h
  GEN     hw/timer/trace.h
  GEN     hw/dma/trace.h
  GEN     hw/sparc/trace.h
  GEN     hw/sd/trace.h
  GEN     hw/isa/trace.h
  GEN     hw/mem/trace.h
  GEN     hw/i386/trace.h
  GEN     hw/i386/xen/trace.h
  GEN     hw/9pfs/trace.h
  GEN     hw/ppc/trace.h
  GEN     hw/pci/trace.h
  GEN     hw/s390x/trace.h
  GEN     hw/vfio/trace.h
  GEN     hw/acpi/trace.h
  GEN     hw/arm/trace.h
  GEN     hw/alpha/trace.h
  GEN     hw/xen/trace.h
  GEN     ui/trace.h
  GEN     audio/trace.h
  GEN     net/trace.h
  GEN     target/arm/trace.h
  GEN     target/i386/trace.h
  GEN     target/mips/trace.h
  GEN     target/sparc/trace.h
  GEN     target/s390x/trace.h
  GEN     target/ppc/trace.h
  GEN     qom/trace.h
  GEN     linux-user/trace.h
  GEN     qapi/trace.h
  GEN     trace-root.c
  GEN     util/trace.c
  GEN     crypto/trace.c
  GEN     io/trace.c
  GEN     migration/trace.c
  GEN     block/trace.c
  GEN     backends/trace.c
  GEN     hw/block/trace.c
  GEN     hw/block/dataplane/trace.c
  GEN     hw/char/trace.c
  GEN     hw/intc/trace.c
  GEN     hw/net/trace.c
  GEN     hw/virtio/trace.c
  GEN     hw/audio/trace.c
  GEN     hw/misc/trace.c
  GEN     hw/usb/trace.c
  GEN     hw/scsi/trace.c
  GEN     hw/nvram/trace.c
  GEN     hw/display/trace.c
  GEN     hw/input/trace.c
  GEN     hw/timer/trace.c
  GEN     hw/dma/trace.c
  GEN     hw/sparc/trace.c
  GEN     hw/sd/trace.c
  GEN     hw/isa/trace.c
  GEN     hw/mem/trace.c
  GEN     hw/i386/trace.c
  GEN     hw/i386/xen/trace.c
  GEN     hw/9pfs/trace.c
  GEN     hw/ppc/trace.c
  GEN     hw/pci/trace.c
  GEN     hw/s390x/trace.c
  GEN     hw/vfio/trace.c
  GEN     hw/acpi/trace.c
  GEN     hw/arm/trace.c
  GEN     hw/alpha/trace.c
  GEN     hw/xen/trace.c
  GEN     ui/trace.c
  GEN     audio/trace.c
  GEN     net/trace.c
  GEN     target/arm/trace.c
  GEN     target/i386/trace.c
  GEN     target/mips/trace.c
  GEN     target/sparc/trace.c
  GEN     target/s390x/trace.c
  GEN     target/ppc/trace.c
  GEN     qom/trace.c
  GEN     linux-user/trace.c
  GEN     qapi/trace.c
  GEN     config-all-devices.mak
	 DEP /tmp/qemu-test/src/dtc/tests/dumptrees.c
	 DEP /tmp/qemu-test/src/dtc/tests/trees.S
	 DEP /tmp/qemu-test/src/dtc/tests/testutils.c
	 DEP /tmp/qemu-test/src/dtc/tests/value-labels.c
	 DEP /tmp/qemu-test/src/dtc/tests/asm_tree_dump.c
	 DEP /tmp/qemu-test/src/dtc/tests/truncated_property.c
	 DEP /tmp/qemu-test/src/dtc/tests/check_path.c
	 DEP /tmp/qemu-test/src/dtc/tests/overlay.c
	 DEP /tmp/qemu-test/src/dtc/tests/overlay_bad_fixup.c
	 DEP /tmp/qemu-test/src/dtc/tests/property_iterate.c
	 DEP /tmp/qemu-test/src/dtc/tests/subnode_iterate.c
	 DEP /tmp/qemu-test/src/dtc/tests/integer-expressions.c
	 DEP /tmp/qemu-test/src/dtc/tests/utilfdt_test.c
	 DEP /tmp/qemu-test/src/dtc/tests/path_offset_aliases.c
	 DEP /tmp/qemu-test/src/dtc/tests/add_subnode_with_nops.c
	 DEP /tmp/qemu-test/src/dtc/tests/dtbs_equal_unordered.c
	 DEP /tmp/qemu-test/src/dtc/tests/dtb_reverse.c
	 DEP /tmp/qemu-test/src/dtc/tests/dtbs_equal_ordered.c
	 DEP /tmp/qemu-test/src/dtc/tests/extra-terminating-null.c
	 DEP /tmp/qemu-test/src/dtc/tests/boot-cpuid.c
	 DEP /tmp/qemu-test/src/dtc/tests/incbin.c
	 DEP /tmp/qemu-test/src/dtc/tests/phandle_format.c
	 DEP /tmp/qemu-test/src/dtc/tests/path-references.c
	 DEP /tmp/qemu-test/src/dtc/tests/references.c
	 DEP /tmp/qemu-test/src/dtc/tests/string_escapes.c
	 DEP /tmp/qemu-test/src/dtc/tests/propname_escapes.c
	 DEP /tmp/qemu-test/src/dtc/tests/appendprop2.c
	 DEP /tmp/qemu-test/src/dtc/tests/appendprop1.c
	 DEP /tmp/qemu-test/src/dtc/tests/del_node.c
	 DEP /tmp/qemu-test/src/dtc/tests/del_property.c
	 DEP /tmp/qemu-test/src/dtc/tests/setprop.c
	 DEP /tmp/qemu-test/src/dtc/tests/set_name.c
	 DEP /tmp/qemu-test/src/dtc/tests/rw_tree1.c
	 DEP /tmp/qemu-test/src/dtc/tests/open_pack.c
	 DEP /tmp/qemu-test/src/dtc/tests/nopulate.c
	 DEP /tmp/qemu-test/src/dtc/tests/mangle-layout.c
	 DEP /tmp/qemu-test/src/dtc/tests/move_and_save.c
	 DEP /tmp/qemu-test/src/dtc/tests/sw_tree1.c
	 DEP /tmp/qemu-test/src/dtc/tests/nop_node.c
	 DEP /tmp/qemu-test/src/dtc/tests/nop_property.c
	 DEP /tmp/qemu-test/src/dtc/tests/setprop_inplace.c
	 DEP /tmp/qemu-test/src/dtc/tests/stringlist.c
	 DEP /tmp/qemu-test/src/dtc/tests/addr_size_cells.c
	 DEP /tmp/qemu-test/src/dtc/tests/notfound.c
	 DEP /tmp/qemu-test/src/dtc/tests/sized_cells.c
	 DEP /tmp/qemu-test/src/dtc/tests/char_literal.c
	 DEP /tmp/qemu-test/src/dtc/tests/get_alias.c
	 DEP /tmp/qemu-test/src/dtc/tests/node_offset_by_compatible.c
	 DEP /tmp/qemu-test/src/dtc/tests/node_check_compatible.c
	 DEP /tmp/qemu-test/src/dtc/tests/node_offset_by_phandle.c
	 DEP /tmp/qemu-test/src/dtc/tests/node_offset_by_prop_value.c
	 DEP /tmp/qemu-test/src/dtc/tests/parent_offset.c
	 DEP /tmp/qemu-test/src/dtc/tests/supernode_atdepth_offset.c
	 DEP /tmp/qemu-test/src/dtc/tests/get_path.c
	 DEP /tmp/qemu-test/src/dtc/tests/get_phandle.c
	 DEP /tmp/qemu-test/src/dtc/tests/getprop.c
	 DEP /tmp/qemu-test/src/dtc/tests/get_name.c
	 DEP /tmp/qemu-test/src/dtc/tests/path_offset.c
	 DEP /tmp/qemu-test/src/dtc/tests/subnode_offset.c
	 DEP /tmp/qemu-test/src/dtc/tests/find_property.c
	 DEP /tmp/qemu-test/src/dtc/tests/root_node.c
	 DEP /tmp/qemu-test/src/dtc/tests/get_mem_rsv.c
	 DEP /tmp/qemu-test/src/dtc/libfdt/fdt_overlay.c
	 DEP /tmp/qemu-test/src/dtc/libfdt/fdt_empty_tree.c
	 DEP /tmp/qemu-test/src/dtc/libfdt/fdt_addresses.c
	 DEP /tmp/qemu-test/src/dtc/libfdt/fdt_strerror.c
	 DEP /tmp/qemu-test/src/dtc/libfdt/fdt_rw.c
	 DEP /tmp/qemu-test/src/dtc/libfdt/fdt_sw.c
	 DEP /tmp/qemu-test/src/dtc/libfdt/fdt_ro.c
	 DEP /tmp/qemu-test/src/dtc/libfdt/fdt_wip.c
	 DEP /tmp/qemu-test/src/dtc/libfdt/fdt.c
	 DEP /tmp/qemu-test/src/dtc/util.c
	 DEP /tmp/qemu-test/src/dtc/fdtput.c
	 DEP /tmp/qemu-test/src/dtc/fdtget.c
	 DEP /tmp/qemu-test/src/dtc/fdtdump.c
	 LEX convert-dtsv0-lexer.lex.c
	 DEP /tmp/qemu-test/src/dtc/srcpos.c
make[1]: flex: Command not found
	 BISON dtc-parser.tab.c
make[1]: bison: Command not found
	 LEX dtc-lexer.lex.c
make[1]: flex: Command not found
	 DEP /tmp/qemu-test/src/dtc/treesource.c
	 DEP /tmp/qemu-test/src/dtc/livetree.c
	 DEP /tmp/qemu-test/src/dtc/fstree.c
	 DEP /tmp/qemu-test/src/dtc/flattree.c
	 DEP /tmp/qemu-test/src/dtc/dtc.c
	 DEP /tmp/qemu-test/src/dtc/data.c
	 DEP /tmp/qemu-test/src/dtc/checks.c
	CHK version_gen.h
	 LEX convert-dtsv0-lexer.lex.c
	 BISON dtc-parser.tab.c
make[1]: flex: Command not found
make[1]: bison: Command not found
	 LEX dtc-lexer.lex.c
make[1]: flex: Command not found
	UPD version_gen.h
	 DEP /tmp/qemu-test/src/dtc/util.c
	 LEX convert-dtsv0-lexer.lex.c
	 BISON dtc-parser.tab.c
make[1]: flex: Command not found
make[1]: bison: Command not found
	 LEX dtc-lexer.lex.c
make[1]: flex: Command not found
	 CC libfdt/fdt.o
	 CC libfdt/fdt_ro.o
	 CC libfdt/fdt_wip.o
	 CC libfdt/fdt_sw.o
	 CC libfdt/fdt_rw.o
	 CC libfdt/fdt_strerror.o
	 CC libfdt/fdt_empty_tree.o
	 CC libfdt/fdt_addresses.o
	 CC libfdt/fdt_overlay.o
	 AR libfdt/libfdt.a
ar: creating libfdt/libfdt.a
a - libfdt/fdt.o
a - libfdt/fdt_ro.o
a - libfdt/fdt_wip.o
a - libfdt/fdt_sw.o
a - libfdt/fdt_rw.o
a - libfdt/fdt_strerror.o
a - libfdt/fdt_empty_tree.o
a - libfdt/fdt_addresses.o
a - libfdt/fdt_overlay.o
	 LEX convert-dtsv0-lexer.lex.c
make[1]: flex: Command not found
	 LEX dtc-lexer.lex.c
make[1]: flex: Command not found
	 BISON dtc-parser.tab.c
make[1]: bison: Command not found
  CC      tests/qemu-iotests/socket_scm_helper.o
  GEN     qga/qapi-generated/qga-qapi-types.h
  GEN     qga/qapi-generated/qga-qapi-visit.h
  GEN     qga/qapi-generated/qga-qmp-commands.h
  GEN     qga/qapi-generated/qga-qapi-types.c
  GEN     qga/qapi-generated/qga-qmp-marshal.c
  GEN     qga/qapi-generated/qga-qapi-visit.c
  CC      trace-root.o
  CC      util/trace.o
  CC      crypto/trace.o
  CC      io/trace.o
  CC      migration/trace.o
  CC      block/trace.o
  CC      backends/trace.o
  CC      hw/block/trace.o
  CC      hw/block/dataplane/trace.o
  CC      hw/char/trace.o
  CC      hw/intc/trace.o
  CC      hw/net/trace.o
  CC      hw/virtio/trace.o
  CC      hw/audio/trace.o
  CC      hw/misc/trace.o
  CC      hw/usb/trace.o
  CC      hw/scsi/trace.o
  CC      hw/nvram/trace.o
  CC      hw/display/trace.o
  CC      hw/input/trace.o
  CC      hw/timer/trace.o
  CC      hw/dma/trace.o
  CC      hw/sparc/trace.o
  CC      hw/sd/trace.o
  CC      hw/isa/trace.o
  CC      hw/mem/trace.o
  CC      hw/i386/trace.o
  CC      hw/i386/xen/trace.o
  CC      hw/9pfs/trace.o
  CC      hw/ppc/trace.o
  CC      hw/pci/trace.o
  CC      hw/s390x/trace.o
  CC      hw/vfio/trace.o
  CC      hw/acpi/trace.o
  CC      hw/arm/trace.o
  CC      hw/alpha/trace.o
  CC      hw/xen/trace.o
  CC      ui/trace.o
  CC      audio/trace.o
  CC      net/trace.o
  CC      target/arm/trace.o
  CC      target/i386/trace.o
  CC      target/mips/trace.o
  CC      target/sparc/trace.o
  CC      target/s390x/trace.o
  CC      target/ppc/trace.o
  CC      qom/trace.o
  CC      linux-user/trace.o
  CC      qapi/trace.o
  CC      qmp-introspect.o
  CC      qapi-types.o
  CC      qapi-visit.o
  CC      qapi-event.o
  CC      qapi/qapi-visit-core.o
  CC      qapi/qapi-dealloc-visitor.o
  CC      qapi/qobject-input-visitor.o
  CC      qapi/qobject-output-visitor.o
  CC      qapi/qmp-registry.o
  CC      qapi/qmp-dispatch.o
  CC      qapi/string-input-visitor.o
  CC      qapi/string-output-visitor.o
  CC      qapi/opts-visitor.o
  CC      qapi/qapi-clone-visitor.o
  CC      qapi/qmp-event.o
  CC      qapi/qapi-util.o
  CC      qobject/qnull.o
  CC      qobject/qint.o
  CC      qobject/qstring.o
  CC      qobject/qdict.o
  CC      qobject/qlist.o
  CC      qobject/qfloat.o
  CC      qobject/qbool.o
  CC      qobject/qjson.o
  CC      qobject/qobject.o
  CC      qobject/json-lexer.o
  CC      qobject/json-streamer.o
  CC      qobject/json-parser.o
  CC      trace/control.o
  CC      util/osdep.o
  CC      trace/qmp.o
  CC      util/cutils.o
  CC      util/unicode.o
  CC      util/qemu-timer-common.o
  CC      util/bufferiszero.o
  CC      util/lockcnt.o
  CC      util/aiocb.o
  CC      util/async.o
  CC      util/thread-pool.o
  CC      util/qemu-timer.o
  CC      util/main-loop.o
  CC      util/aio-posix.o
  CC      util/iohandler.o
  CC      util/compatfd.o
  CC      util/event_notifier-posix.o
  CC      util/mmap-alloc.o
  CC      util/oslib-posix.o
  CC      util/qemu-openpty.o
  CC      util/memfd.o
  CC      util/qemu-thread-posix.o
  CC      util/envlist.o
  CC      util/path.o
  CC      util/module.o
  CC      util/host-utils.o
  CC      util/bitmap.o
  CC      util/bitops.o
  CC      util/hbitmap.o
  CC      util/fifo8.o
  CC      util/acl.o
  CC      util/error.o
  CC      util/qemu-error.o
  CC      util/id.o
  CC      util/iov.o
  CC      util/qemu-config.o
  CC      util/qemu-sockets.o
  CC      util/uri.o
  CC      util/notify.o
  CC      util/qemu-option.o
  CC      util/qemu-progress.o
  CC      util/keyval.o
  CC      util/hexdump.o
  CC      util/crc32c.o
  CC      util/uuid.o
  CC      util/throttle.o
  CC      util/getauxval.o
  CC      util/readline.o
  CC      util/qemu-coroutine.o
  CC      util/rcu.o
  CC      util/qemu-coroutine-lock.o
  CC      util/qemu-coroutine-io.o
  CC      util/qemu-coroutine-sleep.o
  CC      util/coroutine-ucontext.o
  CC      util/buffer.o
  CC      util/timed-average.o
  CC      util/base64.o
  CC      util/log.o
  CC      util/qdist.o
  CC      util/qht.o
  CC      util/range.o
  CC      util/systemd.o
  CC      util/glib-helper.o
  CC      crypto/pbkdf-stub.o
  CC      stubs/arch-query-cpu-def.o
  CC      stubs/arch-query-cpu-model-expansion.o
  CC      stubs/arch-query-cpu-model-comparison.o
  CC      stubs/arch-query-cpu-model-baseline.o
  CC      stubs/bdrv-next-monitor-owned.o
  CC      stubs/blk-commit-all.o
  CC      stubs/blockdev-close-all-bdrv-states.o
In file included from /tmp/qemu-test/src/include/glib/glib-helper.h:14,
                 from /tmp/qemu-test/src/util/glib-helper.c:12:
/tmp/qemu-test/src/include/glib/glib-compat.h: In function ‘qemu_g_dir_make_tmp’:
/tmp/qemu-test/src/include/glib/glib-compat.h:59: warning: implicit declaration of function ‘mkdtemp’
/tmp/qemu-test/src/include/glib/glib-compat.h:59: warning: nested extern declaration of ‘mkdtemp’
/tmp/qemu-test/src/include/glib/glib-compat.h:59: warning: comparison between pointer and integer
/tmp/qemu-test/src/include/glib/glib-compat.h:63: error: ‘errno’ undeclared (first use in this function)
/tmp/qemu-test/src/include/glib/glib-compat.h:63: error: (Each undeclared identifier is reported only once
/tmp/qemu-test/src/include/glib/glib-compat.h:63: error: for each function it appears in.)
make: *** [util/glib-helper.o] Error 1
make: *** Waiting for unfinished jobs....
tests/docker/Makefile.include:118: recipe for target 'docker-run' failed
make[1]: *** [docker-run] Error 2
make[1]: Leaving directory '/var/tmp/patchew-tester-tmp-cvj9w8en/src'
tests/docker/Makefile.include:149: recipe for target 'docker-run-test-quick@centos6' failed
make: *** [docker-run-test-quick@centos6] Error 2
=== OUTPUT END ===

Test command exited with code: 2
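
Reading the log above: util/glib-helper.c includes only glib/glib-helper.h,
so glib-compat.h is compiled without the system headers that qemu/osdep.h
normally pulls in first, and mkdtemp() and errno end up undeclared. Including
"qemu/osdep.h" as the first header in util/glib-helper.c, as QEMU source files
conventionally do, would likely resolve it. The standalone sketch below only
shows which standard headers the failing code depends on; make_tmp_dir_sketch()
is a hypothetical name and is not the body of qemu_g_dir_make_tmp().

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE   /* the QEMU build passes -D_GNU_SOURCE, see QEMU_CFLAGS */
#endif

#include <assert.h>
#include <errno.h>    /* errno: missing in the failing build */
#include <stdlib.h>   /* mkdtemp(): missing in the failing build */

/*
 * Create a temporary directory from a writable template ending in
 * "XXXXXX". Returns 0 on success, otherwise the errno set by mkdtemp().
 */
static int make_tmp_dir_sketch(char *tmpl)
{
    if (mkdtemp(tmpl) != NULL) {
        return 0;
    }
    return errno;
}

/* Exercise only failure paths so that no directory is left behind. */
static void make_tmp_dir_sketch_selfcheck(void)
{
    char no_placeholder[] = "/tmp/glib-helper-sketch";
    char missing_parent[] = "/nonexistent-dir-for-sketch/x-XXXXXX";

    assert(make_tmp_dir_sketch(no_placeholder) == EINVAL);
    assert(make_tmp_dir_sketch(missing_parent) == ENOENT);
}
```

With both headers visible, the implicit-declaration warnings and the
"errno undeclared" error shown for glib-compat.h lines 59 and 63 should not
occur.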


---
Email generated automatically by Patchew [http://patchew.org/].
Please send your feedback to patchew-devel@freelists.org


* Re: [Qemu-devel] [PATCH 2/6] util: introduce glib-helper.c
  2017-04-14 16:05       ` Philippe Mathieu-Daudé
@ 2017-04-17  7:07         ` Alexey
  2017-04-21 10:01         ` Dr. David Alan Gilbert
  1 sibling, 0 replies; 38+ messages in thread
From: Alexey @ 2017-04-17  7:07 UTC (permalink / raw)
  To: Philippe Mathieu-Daudé; +Cc: dgilbert, qemu-devel, i.maximets

Hi Philippe, 

On Fri, Apr 14, 2017 at 01:05:52PM -0300, Philippe Mathieu-Daudé wrote:
> Hi Alexey,
> 
> On 04/14/2017 10:17 AM, Alexey Perevalov wrote:
> >glib lacks a g_int_cmp that compares pointer values; xen_disk.c
> >introduced its own, and migration.c now needs the same function,
> >so it is logical to move it into a common place.
> >Further: maybe extend glib.
> >
> >This commit also moves the existing glib-compat.h into the include/glib
> >folder for consolidation.
> 
> Can you do this in two commits? First one moving files only, second
> move the function?
> 
Ok
> I'm not sure naming it "g_int_cmp()" won't clash with future
> _extended_ glib, what do you think about naming it
> "qemu_g_int_cmp()"?
> 
Why not, if it improves maintainability.
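
A minimal sketch of the renamed comparator, assuming the qemu_g_ prefix
suggested above. The name qemu_g_int_cmp is hypothetical, and plain C types
stand in for gconstpointer/gpointer so the snippet builds without glib.
Converting through uintptr_t rather than a fixed 64-bit type is also an
assumption here, made so the pointer-to-integer conversion has the same width
as a pointer on both 32-bit and 64-bit hosts:

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Three-way comparison of two pointer values.
 * Returns 1 if a > b, -1 if a < b, and 0 if they are equal.
 * Hypothetical name; the signature mirrors the shape of GCompareDataFunc.
 */
int qemu_g_int_cmp(const void *a, const void *b, void *user_data)
{
    /* uintptr_t can hold any object pointer value without truncation */
    uintptr_t ua = (uintptr_t)a;
    uintptr_t ub = (uintptr_t)b;

    (void)user_data; /* unused, kept only to match the callback shape */

    /* each comparison yields 0 or 1, so the difference is -1, 0 or 1 */
    return (ua > ub) - (ua < ub);
}
```

With glib available, the same body should work with gconstpointer and
gpointer in place of the plain C pointer types, and the function could then
be passed to g_tree_new_full() as in xen_disk.c.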

> >Signed-off-by: Alexey Perevalov <a.perevalov@samsung.com>
> >---
> > hw/block/xen_disk.c        |  10 +-
> > include/glib-compat.h      | 352 ---------------------------------------------
> > include/glib/glib-compat.h | 352 +++++++++++++++++++++++++++++++++++++++++++++
> > include/glib/glib-helper.h |  30 ++++
> > include/qemu/osdep.h       |   2 +-
> > linux-user/main.c          |   2 +-
> > scripts/clean-includes     |   2 +-
> > util/Makefile.objs         |   1 +
> > util/glib-helper.c         |  29 ++++
> > 9 files changed, 417 insertions(+), 363 deletions(-)
> > delete mode 100644 include/glib-compat.h
> > create mode 100644 include/glib/glib-compat.h
> > create mode 100644 include/glib/glib-helper.h
> > create mode 100644 util/glib-helper.c
> >
> >diff --git a/hw/block/xen_disk.c b/hw/block/xen_disk.c
> >index 456a2d5..36f6396 100644
> >--- a/hw/block/xen_disk.c
> >+++ b/hw/block/xen_disk.c
> >@@ -20,6 +20,7 @@
> >  */
> >
> > #include "qemu/osdep.h"
> >+#include "glib/glib-helper.h"
> > #include <sys/ioctl.h>
> > #include <sys/uio.h>
> >
> >@@ -154,13 +155,6 @@ static void ioreq_reset(struct ioreq *ioreq)
> >     qemu_iovec_reset(&ioreq->v);
> > }
> >
> >-static gint int_cmp(gconstpointer a, gconstpointer b, gpointer user_data)
> >-{
> >-    uint ua = GPOINTER_TO_UINT(a);
> >-    uint ub = GPOINTER_TO_UINT(b);
> >-    return (ua > ub) - (ua < ub);
> >-}
> >-
> > static void destroy_grant(gpointer pgnt)
> > {
> >     PersistentGrant *grant = pgnt;
> >@@ -1191,7 +1185,7 @@ static int blk_connect(struct XenDevice *xendev)
> >     if (blkdev->feature_persistent) {
> >         /* Init persistent grants */
> >         blkdev->max_grants = max_requests * BLKIF_MAX_SEGMENTS_PER_REQUEST;
> >-        blkdev->persistent_gnts = g_tree_new_full((GCompareDataFunc)int_cmp,
> >+        blkdev->persistent_gnts = g_tree_new_full((GCompareDataFunc)g_int_cmp,
> >                                              NULL, NULL,
> >                                              batch_maps ?
> >                                              (GDestroyNotify)g_free :
> >diff --git a/include/glib-compat.h b/include/glib-compat.h
> >deleted file mode 100644
> >index 863c8cf..0000000
> >--- a/include/glib-compat.h
> >+++ /dev/null
> >@@ -1,352 +0,0 @@
> >-/*
> >- * GLIB Compatibility Functions
> >- *
> >- * Copyright IBM, Corp. 2013
> >- *
> >- * Authors:
> >- *  Anthony Liguori   <aliguori@us.ibm.com>
> >- *  Michael Tokarev   <mjt@tls.msk.ru>
> >- *  Paolo Bonzini     <pbonzini@redhat.com>
> >- *
> >- * This work is licensed under the terms of the GNU GPL, version 2 or later.
> >- * See the COPYING file in the top-level directory.
> >- *
> >- */
> >-
> >-#ifndef QEMU_GLIB_COMPAT_H
> >-#define QEMU_GLIB_COMPAT_H
> >-
> >-#include <glib.h>
> >-
> >-/* GLIB version compatibility flags */
> >-#if !GLIB_CHECK_VERSION(2, 26, 0)
> >-#define G_TIME_SPAN_SECOND              (G_GINT64_CONSTANT(1000000))
> >-#endif
> >-
> >-#if !GLIB_CHECK_VERSION(2, 28, 0)
> >-static inline gint64 qemu_g_get_monotonic_time(void)
> >-{
> >-    /* g_get_monotonic_time() is best-effort so we can use the wall clock as a
> >-     * fallback.
> >-     */
> >-
> >-    GTimeVal time;
> >-    g_get_current_time(&time);
> >-
> >-    return time.tv_sec * G_TIME_SPAN_SECOND + time.tv_usec;
> >-}
> >-/* work around distro backports of this interface */
> >-#define g_get_monotonic_time() qemu_g_get_monotonic_time()
> >-#endif
> >-
> >-#if defined(_WIN32) && !GLIB_CHECK_VERSION(2, 50, 0)
> >-/*
> >- * g_poll has a problem on Windows when using
> >- * timeouts < 10ms, so use wrapper.
> >- */
> >-#define g_poll(fds, nfds, timeout) g_poll_fixed(fds, nfds, timeout)
> >-gint g_poll_fixed(GPollFD *fds, guint nfds, gint timeout);
> >-#endif
> >-
> >-#if !GLIB_CHECK_VERSION(2, 30, 0)
> >-/* Not a 100% compatible implementation, but good enough for most
> >- * cases. Placeholders are only supported at the end of the
> >- * template. */
> >-static inline gchar *qemu_g_dir_make_tmp(gchar const *tmpl, GError **error)
> >-{
> >-    gchar *path = g_build_filename(g_get_tmp_dir(), tmpl ?: ".XXXXXX", NULL);
> >-
> >-    if (mkdtemp(path) != NULL) {
> >-        return path;
> >-    }
> >-    /* Error occurred, clean up. */
> >-    g_set_error(error, G_FILE_ERROR, g_file_error_from_errno(errno),
> >-                "mkdtemp() failed");
> >-    g_free(path);
> >-    return NULL;
> >-}
> >-#define g_dir_make_tmp(tmpl, error) qemu_g_dir_make_tmp(tmpl, error)
> >-#endif /* glib 2.30 */
> >-
> >-#if !GLIB_CHECK_VERSION(2, 31, 0)
> >-/* before glib-2.31, GMutex and GCond was dynamic-only (there was a separate
> >- * GStaticMutex, but it didn't work with condition variables).
> >- *
> >- * Our implementation uses GOnce to fake a static implementation that does
> >- * not require separate initialization.
> >- * We need to rename the types to avoid passing our CompatGMutex/CompatGCond
> >- * by mistake to a function that expects GMutex/GCond.  However, for ease
> >- * of use we keep the GLib function names.  GLib uses macros for the
> >- * implementation, we use inline functions instead and undefine the macros.
> >- */
> >-
> >-typedef struct CompatGMutex {
> >-    GOnce once;
> >-} CompatGMutex;
> >-
> >-typedef struct CompatGCond {
> >-    GOnce once;
> >-} CompatGCond;
> >-
> >-static inline gpointer do_g_mutex_new(gpointer unused)
> >-{
> >-    return (gpointer) g_mutex_new();
> >-}
> >-
> >-static inline void g_mutex_init(CompatGMutex *mutex)
> >-{
> >-    mutex->once = (GOnce) G_ONCE_INIT;
> >-}
> >-
> >-static inline void g_mutex_clear(CompatGMutex *mutex)
> >-{
> >-    g_assert(mutex->once.status != G_ONCE_STATUS_PROGRESS);
> >-    if (mutex->once.retval) {
> >-        g_mutex_free((GMutex *) mutex->once.retval);
> >-    }
> >-    mutex->once = (GOnce) G_ONCE_INIT;
> >-}
> >-
> >-static inline void (g_mutex_lock)(CompatGMutex *mutex)
> >-{
> >-    g_once(&mutex->once, do_g_mutex_new, NULL);
> >-    g_mutex_lock((GMutex *) mutex->once.retval);
> >-}
> >-#undef g_mutex_lock
> >-
> >-static inline gboolean (g_mutex_trylock)(CompatGMutex *mutex)
> >-{
> >-    g_once(&mutex->once, do_g_mutex_new, NULL);
> >-    return g_mutex_trylock((GMutex *) mutex->once.retval);
> >-}
> >-#undef g_mutex_trylock
> >-
> >-
> >-static inline void (g_mutex_unlock)(CompatGMutex *mutex)
> >-{
> >-    g_mutex_unlock((GMutex *) mutex->once.retval);
> >-}
> >-#undef g_mutex_unlock
> >-
> >-static inline gpointer do_g_cond_new(gpointer unused)
> >-{
> >-    return (gpointer) g_cond_new();
> >-}
> >-
> >-static inline void g_cond_init(CompatGCond *cond)
> >-{
> >-    cond->once = (GOnce) G_ONCE_INIT;
> >-}
> >-
> >-static inline void g_cond_clear(CompatGCond *cond)
> >-{
> >-    g_assert(cond->once.status != G_ONCE_STATUS_PROGRESS);
> >-    if (cond->once.retval) {
> >-        g_cond_free((GCond *) cond->once.retval);
> >-    }
> >-    cond->once = (GOnce) G_ONCE_INIT;
> >-}
> >-
> >-static inline void (g_cond_wait)(CompatGCond *cond, CompatGMutex *mutex)
> >-{
> >-    g_assert(mutex->once.status != G_ONCE_STATUS_PROGRESS);
> >-    g_once(&cond->once, do_g_cond_new, NULL);
> >-    g_cond_wait((GCond *) cond->once.retval, (GMutex *) mutex->once.retval);
> >-}
> >-#undef g_cond_wait
> >-
> >-static inline void (g_cond_broadcast)(CompatGCond *cond)
> >-{
> >-    g_once(&cond->once, do_g_cond_new, NULL);
> >-    g_cond_broadcast((GCond *) cond->once.retval);
> >-}
> >-#undef g_cond_broadcast
> >-
> >-static inline void (g_cond_signal)(CompatGCond *cond)
> >-{
> >-    g_once(&cond->once, do_g_cond_new, NULL);
> >-    g_cond_signal((GCond *) cond->once.retval);
> >-}
> >-#undef g_cond_signal
> >-
> >-static inline gboolean (g_cond_timed_wait)(CompatGCond *cond,
> >-                                           CompatGMutex *mutex,
> >-                                           GTimeVal *time)
> >-{
> >-    g_assert(mutex->once.status != G_ONCE_STATUS_PROGRESS);
> >-    g_once(&cond->once, do_g_cond_new, NULL);
> >-    return g_cond_timed_wait((GCond *) cond->once.retval,
> >-                             (GMutex *) mutex->once.retval, time);
> >-}
> >-#undef g_cond_timed_wait
> >-
> >-/* This is not a macro, because it didn't exist until 2.32.  */
> >-static inline gboolean g_cond_wait_until(CompatGCond *cond, CompatGMutex *mutex,
> >-                                         gint64 end_time)
> >-{
> >-    GTimeVal time;
> >-
> >-    /* Convert from monotonic to CLOCK_REALTIME.  */
> >-    end_time -= g_get_monotonic_time();
> >-    g_get_current_time(&time);
> >-    end_time += time.tv_sec * G_TIME_SPAN_SECOND + time.tv_usec;
> >-
> >-    time.tv_sec = end_time / G_TIME_SPAN_SECOND;
> >-    time.tv_usec = end_time % G_TIME_SPAN_SECOND;
> >-    return g_cond_timed_wait(cond, mutex, &time);
> >-}
> >-
> >-/* before 2.31 there was no g_thread_new() */
> >-static inline GThread *g_thread_new(const char *name,
> >-                                    GThreadFunc func, gpointer data)
> >-{
> >-    GThread *thread = g_thread_create(func, data, TRUE, NULL);
> >-    if (!thread) {
> >-        g_error("creating thread");
> >-    }
> >-    return thread;
> >-}
> >-#else
> >-#define CompatGMutex GMutex
> >-#define CompatGCond GCond
> >-#endif /* glib 2.31 */
> >-
> >-#if !GLIB_CHECK_VERSION(2, 32, 0)
> >-/* Beware, function returns gboolean since 2.39.2, see GLib commit 9101915 */
> >-static inline void g_hash_table_add(GHashTable *hash_table, gpointer key)
> >-{
> >-    g_hash_table_replace(hash_table, key, key);
> >-}
> >-#endif
> >-
> >-#ifndef g_assert_true
> >-#define g_assert_true(expr)                                                    \
> >-    do {                                                                       \
> >-        if (G_LIKELY(expr)) {                                                  \
> >-        } else {                                                               \
> >-            g_assertion_message(G_LOG_DOMAIN, __FILE__, __LINE__, G_STRFUNC,   \
> >-                                "'" #expr "' should be TRUE");                 \
> >-        }                                                                      \
> >-    } while (0)
> >-#endif
> >-
> >-#ifndef g_assert_false
> >-#define g_assert_false(expr)                                                   \
> >-    do {                                                                       \
> >-        if (G_LIKELY(!(expr))) {                                               \
> >-        } else {                                                               \
> >-            g_assertion_message(G_LOG_DOMAIN, __FILE__, __LINE__, G_STRFUNC,   \
> >-                                "'" #expr "' should be FALSE");                \
> >-        }                                                                      \
> >-    } while (0)
> >-#endif
> >-
> >-#ifndef g_assert_null
> >-#define g_assert_null(expr)                                                    \
> >-    do {                                                                       \
> >-        if (G_LIKELY((expr) == NULL)) {                                        \
> >-        } else {                                                               \
> >-            g_assertion_message(G_LOG_DOMAIN, __FILE__, __LINE__, G_STRFUNC,   \
> >-                                "'" #expr "' should be NULL");                 \
> >-        }                                                                      \
> >-    } while (0)
> >-#endif
> >-
> >-#ifndef g_assert_nonnull
> >-#define g_assert_nonnull(expr)                                                 \
> >-    do {                                                                       \
> >-        if (G_LIKELY((expr) != NULL)) {                                        \
> >-        } else {                                                               \
> >-            g_assertion_message(G_LOG_DOMAIN, __FILE__, __LINE__, G_STRFUNC,   \
> >-                                "'" #expr "' should not be NULL");             \
> >-        }                                                                      \
> >-    } while (0)
> >-#endif
> >-
> >-#ifndef g_assert_cmpmem
> >-#define g_assert_cmpmem(m1, l1, m2, l2)                                        \
> >-    do {                                                                       \
> >-        gconstpointer __m1 = m1, __m2 = m2;                                    \
> >-        int __l1 = l1, __l2 = l2;                                              \
> >-        if (__l1 != __l2) {                                                    \
> >-            g_assertion_message_cmpnum(                                        \
> >-                G_LOG_DOMAIN, __FILE__, __LINE__, G_STRFUNC,                   \
> >-                #l1 " (len(" #m1 ")) == " #l2 " (len(" #m2 "))", __l1, "==",   \
> >-                __l2, 'i');                                                    \
> >-        } else if (memcmp(__m1, __m2, __l1) != 0) {                            \
> >-            g_assertion_message(G_LOG_DOMAIN, __FILE__, __LINE__, G_STRFUNC,   \
> >-                                "assertion failed (" #m1 " == " #m2 ")");      \
> >-        }                                                                      \
> >-    } while (0)
> >-#endif
> >-
> >-#if !GLIB_CHECK_VERSION(2, 28, 0)
> >-static inline void g_list_free_full(GList *list, GDestroyNotify free_func)
> >-{
> >-    GList *l;
> >-
> >-    for (l = list; l; l = l->next) {
> >-        free_func(l->data);
> >-    }
> >-
> >-    g_list_free(list);
> >-}
> >-
> >-static inline void g_slist_free_full(GSList *list, GDestroyNotify free_func)
> >-{
> >-    GSList *l;
> >-
> >-    for (l = list; l; l = l->next) {
> >-        free_func(l->data);
> >-    }
> >-
> >-    g_slist_free(list);
> >-}
> >-#endif
> >-
> >-#if !GLIB_CHECK_VERSION(2, 26, 0)
> >-static inline void g_source_set_name(GSource *source, const char *name)
> >-{
> >-    /* This is just a debugging aid, so leaving it a no-op */
> >-}
> >-static inline void g_source_set_name_by_id(guint tag, const char *name)
> >-{
> >-    /* This is just a debugging aid, so leaving it a no-op */
> >-}
> >-#endif
> >-
> >-#if !GLIB_CHECK_VERSION(2, 36, 0)
> >-/* Always fail.  This will not include error_report output in the test log,
> >- * sending it instead to stderr.
> >- */
> >-#define g_test_initialized() (0)
> >-#endif
> >-#if !GLIB_CHECK_VERSION(2, 38, 0)
> >-#ifdef CONFIG_HAS_GLIB_SUBPROCESS_TESTS
> >-#error schizophrenic detection of glib subprocess testing
> >-#endif
> >-#define g_test_subprocess() (0)
> >-#endif
> >-
> >-
> >-#if !GLIB_CHECK_VERSION(2, 34, 0)
> >-static inline void
> >-g_test_add_data_func_full(const char *path,
> >-                          gpointer data,
> >-                          gpointer fn,
> >-                          gpointer data_free_func)
> >-{
> >-#if GLIB_CHECK_VERSION(2, 26, 0)
> >-    /* back-compat casts, remove this once we can require new-enough glib */
> >-    g_test_add_vtable(path, 0, data, NULL,
> >-                      (GTestFixtureFunc)fn, (GTestFixtureFunc) data_free_func);
> >-#else
> >-    /* back-compat casts, remove this once we can require new-enough glib */
> >-    g_test_add_vtable(path, 0, data, NULL,
> >-                      (void (*)(void)) fn, (void (*)(void)) data_free_func);
> >-#endif
> >-}
> >-#endif
> >-
> >-
> >-#endif
> >diff --git a/include/glib/glib-compat.h b/include/glib/glib-compat.h
> >new file mode 100644
> >index 0000000..863c8cf
> >--- /dev/null
> >+++ b/include/glib/glib-compat.h
> >@@ -0,0 +1,352 @@
> >+/*
> >+ * GLIB Compatibility Functions
> >+ *
> >+ * Copyright IBM, Corp. 2013
> >+ *
> >+ * Authors:
> >+ *  Anthony Liguori   <aliguori@us.ibm.com>
> >+ *  Michael Tokarev   <mjt@tls.msk.ru>
> >+ *  Paolo Bonzini     <pbonzini@redhat.com>
> >+ *
> >+ * This work is licensed under the terms of the GNU GPL, version 2 or later.
> >+ * See the COPYING file in the top-level directory.
> >+ *
> >+ */
> >+
> >+#ifndef QEMU_GLIB_COMPAT_H
> >+#define QEMU_GLIB_COMPAT_H
> >+
> >+#include <glib.h>
> >+
> >+/* GLIB version compatibility flags */
> >+#if !GLIB_CHECK_VERSION(2, 26, 0)
> >+#define G_TIME_SPAN_SECOND              (G_GINT64_CONSTANT(1000000))
> >+#endif
> >+
> >+#if !GLIB_CHECK_VERSION(2, 28, 0)
> >+static inline gint64 qemu_g_get_monotonic_time(void)
> >+{
> >+    /* g_get_monotonic_time() is best-effort so we can use the wall clock as a
> >+     * fallback.
> >+     */
> >+
> >+    GTimeVal time;
> >+    g_get_current_time(&time);
> >+
> >+    return time.tv_sec * G_TIME_SPAN_SECOND + time.tv_usec;
> >+}
> >+/* work around distro backports of this interface */
> >+#define g_get_monotonic_time() qemu_g_get_monotonic_time()
> >+#endif
> >+
> >+#if defined(_WIN32) && !GLIB_CHECK_VERSION(2, 50, 0)
> >+/*
> >+ * g_poll has a problem on Windows when using
> >+ * timeouts < 10ms, so use wrapper.
> >+ */
> >+#define g_poll(fds, nfds, timeout) g_poll_fixed(fds, nfds, timeout)
> >+gint g_poll_fixed(GPollFD *fds, guint nfds, gint timeout);
> >+#endif
> >+
> >+#if !GLIB_CHECK_VERSION(2, 30, 0)
> >+/* Not a 100% compatible implementation, but good enough for most
> >+ * cases. Placeholders are only supported at the end of the
> >+ * template. */
> >+static inline gchar *qemu_g_dir_make_tmp(gchar const *tmpl, GError **error)
> >+{
> >+    gchar *path = g_build_filename(g_get_tmp_dir(), tmpl ?: ".XXXXXX", NULL);
> >+
> >+    if (mkdtemp(path) != NULL) {
> >+        return path;
> >+    }
> >+    /* Error occurred, clean up. */
> >+    g_set_error(error, G_FILE_ERROR, g_file_error_from_errno(errno),
> >+                "mkdtemp() failed");
> >+    g_free(path);
> >+    return NULL;
> >+}
> >+#define g_dir_make_tmp(tmpl, error) qemu_g_dir_make_tmp(tmpl, error)
> >+#endif /* glib 2.30 */
> >+
> >+#if !GLIB_CHECK_VERSION(2, 31, 0)
> >+/* before glib-2.31, GMutex and GCond was dynamic-only (there was a separate
> >+ * GStaticMutex, but it didn't work with condition variables).
> >+ *
> >+ * Our implementation uses GOnce to fake a static implementation that does
> >+ * not require separate initialization.
> >+ * We need to rename the types to avoid passing our CompatGMutex/CompatGCond
> >+ * by mistake to a function that expects GMutex/GCond.  However, for ease
> >+ * of use we keep the GLib function names.  GLib uses macros for the
> >+ * implementation, we use inline functions instead and undefine the macros.
> >+ */
> >+
> >+typedef struct CompatGMutex {
> >+    GOnce once;
> >+} CompatGMutex;
> >+
> >+typedef struct CompatGCond {
> >+    GOnce once;
> >+} CompatGCond;
> >+
> >+static inline gpointer do_g_mutex_new(gpointer unused)
> >+{
> >+    return (gpointer) g_mutex_new();
> >+}
> >+
> >+static inline void g_mutex_init(CompatGMutex *mutex)
> >+{
> >+    mutex->once = (GOnce) G_ONCE_INIT;
> >+}
> >+
> >+static inline void g_mutex_clear(CompatGMutex *mutex)
> >+{
> >+    g_assert(mutex->once.status != G_ONCE_STATUS_PROGRESS);
> >+    if (mutex->once.retval) {
> >+        g_mutex_free((GMutex *) mutex->once.retval);
> >+    }
> >+    mutex->once = (GOnce) G_ONCE_INIT;
> >+}
> >+
> >+static inline void (g_mutex_lock)(CompatGMutex *mutex)
> >+{
> >+    g_once(&mutex->once, do_g_mutex_new, NULL);
> >+    g_mutex_lock((GMutex *) mutex->once.retval);
> >+}
> >+#undef g_mutex_lock
> >+
> >+static inline gboolean (g_mutex_trylock)(CompatGMutex *mutex)
> >+{
> >+    g_once(&mutex->once, do_g_mutex_new, NULL);
> >+    return g_mutex_trylock((GMutex *) mutex->once.retval);
> >+}
> >+#undef g_mutex_trylock
> >+
> >+
> >+static inline void (g_mutex_unlock)(CompatGMutex *mutex)
> >+{
> >+    g_mutex_unlock((GMutex *) mutex->once.retval);
> >+}
> >+#undef g_mutex_unlock
> >+
> >+static inline gpointer do_g_cond_new(gpointer unused)
> >+{
> >+    return (gpointer) g_cond_new();
> >+}
> >+
> >+static inline void g_cond_init(CompatGCond *cond)
> >+{
> >+    cond->once = (GOnce) G_ONCE_INIT;
> >+}
> >+
> >+static inline void g_cond_clear(CompatGCond *cond)
> >+{
> >+    g_assert(cond->once.status != G_ONCE_STATUS_PROGRESS);
> >+    if (cond->once.retval) {
> >+        g_cond_free((GCond *) cond->once.retval);
> >+    }
> >+    cond->once = (GOnce) G_ONCE_INIT;
> >+}
> >+
> >+static inline void (g_cond_wait)(CompatGCond *cond, CompatGMutex *mutex)
> >+{
> >+    g_assert(mutex->once.status != G_ONCE_STATUS_PROGRESS);
> >+    g_once(&cond->once, do_g_cond_new, NULL);
> >+    g_cond_wait((GCond *) cond->once.retval, (GMutex *) mutex->once.retval);
> >+}
> >+#undef g_cond_wait
> >+
> >+static inline void (g_cond_broadcast)(CompatGCond *cond)
> >+{
> >+    g_once(&cond->once, do_g_cond_new, NULL);
> >+    g_cond_broadcast((GCond *) cond->once.retval);
> >+}
> >+#undef g_cond_broadcast
> >+
> >+static inline void (g_cond_signal)(CompatGCond *cond)
> >+{
> >+    g_once(&cond->once, do_g_cond_new, NULL);
> >+    g_cond_signal((GCond *) cond->once.retval);
> >+}
> >+#undef g_cond_signal
> >+
> >+static inline gboolean (g_cond_timed_wait)(CompatGCond *cond,
> >+                                           CompatGMutex *mutex,
> >+                                           GTimeVal *time)
> >+{
> >+    g_assert(mutex->once.status != G_ONCE_STATUS_PROGRESS);
> >+    g_once(&cond->once, do_g_cond_new, NULL);
> >+    return g_cond_timed_wait((GCond *) cond->once.retval,
> >+                             (GMutex *) mutex->once.retval, time);
> >+}
> >+#undef g_cond_timed_wait
> >+
> >+/* This is not a macro, because it didn't exist until 2.32.  */
> >+static inline gboolean g_cond_wait_until(CompatGCond *cond, CompatGMutex *mutex,
> >+                                         gint64 end_time)
> >+{
> >+    GTimeVal time;
> >+
> >+    /* Convert from monotonic to CLOCK_REALTIME.  */
> >+    end_time -= g_get_monotonic_time();
> >+    g_get_current_time(&time);
> >+    end_time += time.tv_sec * G_TIME_SPAN_SECOND + time.tv_usec;
> >+
> >+    time.tv_sec = end_time / G_TIME_SPAN_SECOND;
> >+    time.tv_usec = end_time % G_TIME_SPAN_SECOND;
> >+    return g_cond_timed_wait(cond, mutex, &time);
> >+}
> >+
> >+/* before 2.31 there was no g_thread_new() */
> >+static inline GThread *g_thread_new(const char *name,
> >+                                    GThreadFunc func, gpointer data)
> >+{
> >+    GThread *thread = g_thread_create(func, data, TRUE, NULL);
> >+    if (!thread) {
> >+        g_error("creating thread");
> >+    }
> >+    return thread;
> >+}
> >+#else
> >+#define CompatGMutex GMutex
> >+#define CompatGCond GCond
> >+#endif /* glib 2.31 */
> >+
> >+#if !GLIB_CHECK_VERSION(2, 32, 0)
> >+/* Beware, function returns gboolean since 2.39.2, see GLib commit 9101915 */
> >+static inline void g_hash_table_add(GHashTable *hash_table, gpointer key)
> >+{
> >+    g_hash_table_replace(hash_table, key, key);
> >+}
> >+#endif
> >+
> >+#ifndef g_assert_true
> >+#define g_assert_true(expr)                                                    \
> >+    do {                                                                       \
> >+        if (G_LIKELY(expr)) {                                                  \
> >+        } else {                                                               \
> >+            g_assertion_message(G_LOG_DOMAIN, __FILE__, __LINE__, G_STRFUNC,   \
> >+                                "'" #expr "' should be TRUE");                 \
> >+        }                                                                      \
> >+    } while (0)
> >+#endif
> >+
> >+#ifndef g_assert_false
> >+#define g_assert_false(expr)                                                   \
> >+    do {                                                                       \
> >+        if (G_LIKELY(!(expr))) {                                               \
> >+        } else {                                                               \
> >+            g_assertion_message(G_LOG_DOMAIN, __FILE__, __LINE__, G_STRFUNC,   \
> >+                                "'" #expr "' should be FALSE");                \
> >+        }                                                                      \
> >+    } while (0)
> >+#endif
> >+
> >+#ifndef g_assert_null
> >+#define g_assert_null(expr)                                                    \
> >+    do {                                                                       \
> >+        if (G_LIKELY((expr) == NULL)) {                                        \
> >+        } else {                                                               \
> >+            g_assertion_message(G_LOG_DOMAIN, __FILE__, __LINE__, G_STRFUNC,   \
> >+                                "'" #expr "' should be NULL");                 \
> >+        }                                                                      \
> >+    } while (0)
> >+#endif
> >+
> >+#ifndef g_assert_nonnull
> >+#define g_assert_nonnull(expr)                                                 \
> >+    do {                                                                       \
> >+        if (G_LIKELY((expr) != NULL)) {                                        \
> >+        } else {                                                               \
> >+            g_assertion_message(G_LOG_DOMAIN, __FILE__, __LINE__, G_STRFUNC,   \
> >+                                "'" #expr "' should not be NULL");             \
> >+        }                                                                      \
> >+    } while (0)
> >+#endif
> >+
> >+#ifndef g_assert_cmpmem
> >+#define g_assert_cmpmem(m1, l1, m2, l2)                                        \
> >+    do {                                                                       \
> >+        gconstpointer __m1 = m1, __m2 = m2;                                    \
> >+        int __l1 = l1, __l2 = l2;                                              \
> >+        if (__l1 != __l2) {                                                    \
> >+            g_assertion_message_cmpnum(                                        \
> >+                G_LOG_DOMAIN, __FILE__, __LINE__, G_STRFUNC,                   \
> >+                #l1 " (len(" #m1 ")) == " #l2 " (len(" #m2 "))", __l1, "==",   \
> >+                __l2, 'i');                                                    \
> >+        } else if (memcmp(__m1, __m2, __l1) != 0) {                            \
> >+            g_assertion_message(G_LOG_DOMAIN, __FILE__, __LINE__, G_STRFUNC,   \
> >+                                "assertion failed (" #m1 " == " #m2 ")");      \
> >+        }                                                                      \
> >+    } while (0)
> >+#endif
> >+
> >+#if !GLIB_CHECK_VERSION(2, 28, 0)
> >+static inline void g_list_free_full(GList *list, GDestroyNotify free_func)
> >+{
> >+    GList *l;
> >+
> >+    for (l = list; l; l = l->next) {
> >+        free_func(l->data);
> >+    }
> >+
> >+    g_list_free(list);
> >+}
> >+
> >+static inline void g_slist_free_full(GSList *list, GDestroyNotify free_func)
> >+{
> >+    GSList *l;
> >+
> >+    for (l = list; l; l = l->next) {
> >+        free_func(l->data);
> >+    }
> >+
> >+    g_slist_free(list);
> >+}
> >+#endif
> >+
> >+#if !GLIB_CHECK_VERSION(2, 26, 0)
> >+static inline void g_source_set_name(GSource *source, const char *name)
> >+{
> >+    /* This is just a debugging aid, so leaving it a no-op */
> >+}
> >+static inline void g_source_set_name_by_id(guint tag, const char *name)
> >+{
> >+    /* This is just a debugging aid, so leaving it a no-op */
> >+}
> >+#endif
> >+
> >+#if !GLIB_CHECK_VERSION(2, 36, 0)
> >+/* Always fail.  This will not include error_report output in the test log,
> >+ * sending it instead to stderr.
> >+ */
> >+#define g_test_initialized() (0)
> >+#endif
> >+#if !GLIB_CHECK_VERSION(2, 38, 0)
> >+#ifdef CONFIG_HAS_GLIB_SUBPROCESS_TESTS
> >+#error schizophrenic detection of glib subprocess testing
> >+#endif
> >+#define g_test_subprocess() (0)
> >+#endif
> >+
> >+
> >+#if !GLIB_CHECK_VERSION(2, 34, 0)
> >+static inline void
> >+g_test_add_data_func_full(const char *path,
> >+                          gpointer data,
> >+                          gpointer fn,
> >+                          gpointer data_free_func)
> >+{
> >+#if GLIB_CHECK_VERSION(2, 26, 0)
> >+    /* back-compat casts, remove this once we can require new-enough glib */
> >+    g_test_add_vtable(path, 0, data, NULL,
> >+                      (GTestFixtureFunc)fn, (GTestFixtureFunc) data_free_func);
> >+#else
> >+    /* back-compat casts, remove this once we can require new-enough glib */
> >+    g_test_add_vtable(path, 0, data, NULL,
> >+                      (void (*)(void)) fn, (void (*)(void)) data_free_func);
> >+#endif
> >+}
> >+#endif
> >+
> >+
> >+#endif
> >diff --git a/include/glib/glib-helper.h b/include/glib/glib-helper.h
> >new file mode 100644
> >index 0000000..db740fb
> >--- /dev/null
> >+++ b/include/glib/glib-helper.h
> >@@ -0,0 +1,30 @@
> >+/*
> >+ * Helpers for GLIB
> >+ *
> >+ * This work is licensed under the terms of the GNU GPL, version 2 or later.
> >+ * See the COPYING file in the top-level directory.
> >+ *
> >+ */
> >+
> >+#ifndef QEMU_GLIB_HELPER_H
> >+#define QEMU_GLIB_HELPER_H
> >+
> >+
> >+#include "glib/glib-compat.h"
> >+
> >+#define GPOINTER_TO_UINT64(a) ((guint64) (a))
> >+
> >+/*
> >+ * Return 1 if a > b, -1 if a < b, and 0 if equal
> >+ */
> >+gint g_int_cmp64(gconstpointer a, gconstpointer b,
> >+        gpointer __attribute__((unused)) user_data);
> >+
> >+/*
> >+ * Return 1 if a > b, -1 if a < b, and 0 if equal
> >+ */
> >+gint g_int_cmp(gconstpointer a, gconstpointer b,
> >+        gpointer __attribute__((unused)) user_data);
> >+
> >+#endif /* QEMU_GLIB_HELPER_H */
> >+
> >diff --git a/include/qemu/osdep.h b/include/qemu/osdep.h
> >index 122ff06..36f8a89 100644
> >--- a/include/qemu/osdep.h
> >+++ b/include/qemu/osdep.h
> >@@ -104,7 +104,7 @@ extern int daemon(int, int);
> > #include "sysemu/os-posix.h"
> > #endif
> >
> >-#include "glib-compat.h"
> >+#include "glib/glib-compat.h"
> > #include "qemu/typedefs.h"
> >
> > #ifndef O_LARGEFILE
> >diff --git a/linux-user/main.c b/linux-user/main.c
> >index 10a3bb3..7cea6bc 100644
> >--- a/linux-user/main.c
> >+++ b/linux-user/main.c
> >@@ -35,7 +35,7 @@
> > #include "elf.h"
> > #include "exec/log.h"
> > #include "trace/control.h"
> >-#include "glib-compat.h"
> >+#include "glib/glib-compat.h"
> >
> > char *exec_path;
> >
> >diff --git a/scripts/clean-includes b/scripts/clean-includes
> >index dd938da..b32b928 100755
> >--- a/scripts/clean-includes
> >+++ b/scripts/clean-includes
> >@@ -123,7 +123,7 @@ for f in "$@"; do
> >       ;;
> >     *include/qemu/osdep.h | \
> >     *include/qemu/compiler.h | \
> >-    *include/glib-compat.h | \
> >+    *include/glib/glib-compat.h | \
> >     *include/sysemu/os-posix.h | \
> >     *include/sysemu/os-win32.h | \
> >     *include/standard-headers/ )
> >diff --git a/util/Makefile.objs b/util/Makefile.objs
> >index c6205eb..0080712 100644
> >--- a/util/Makefile.objs
> >+++ b/util/Makefile.objs
> >@@ -43,3 +43,4 @@ util-obj-y += qdist.o
> > util-obj-y += qht.o
> > util-obj-y += range.o
> > util-obj-y += systemd.o
> >+util-obj-y += glib-helper.o
> >diff --git a/util/glib-helper.c b/util/glib-helper.c
> >new file mode 100644
> >index 0000000..2557009
> >--- /dev/null
> >+++ b/util/glib-helper.c
> >@@ -0,0 +1,29 @@
> >+/*
> >+ * Implementation for GLIB helpers
> >+ * This file is intended to accumulate additional glib functions
> >+ * for later reuse
> >+ *
> >+ * This work is licensed under the terms of the GNU GPL, version 2 or later.
> >+ * See the COPYING file in the top-level directory.
> >+
> >+ */
> >+
> >+#include "glib/glib-helper.h"
> >+
> >+gint g_int_cmp64(gconstpointer a, gconstpointer b,
> >+        gpointer __attribute__((unused)) user_data)
> >+{
> >+    guint64 ua = GPOINTER_TO_UINT64(a);
> >+    guint64 ub = GPOINTER_TO_UINT64(b);
> >+    return (ua > ub) - (ua < ub);
> >+}
> >+
> >+/*
> >+ * return 1 if a > b, -1 if a < b, and 0 if equal
> >+ */
> >+gint g_int_cmp(gconstpointer a, gconstpointer b,
> >+        gpointer __attribute__((unused)) user_data)
> >+{
> >+    return g_int_cmp64(a, b, user_data);
> >+}
> >+
> >
> 

-- 

BR
Alexey

* Re: [Qemu-devel] [PATCH 6/6] migration: detailed traces for postcopy
  2017-04-14 13:17     ` [Qemu-devel] [PATCH 6/6] migration: detailed traces for postcopy Alexey Perevalov
@ 2017-04-17 13:32       ` Philippe Mathieu-Daudé
  2017-04-24 18:03       ` Dr. David Alan Gilbert
  1 sibling, 0 replies; 38+ messages in thread
From: Philippe Mathieu-Daudé @ 2017-04-17 13:32 UTC (permalink / raw)
  To: Alexey Perevalov, dgilbert, qemu-devel; +Cc: i.maximets

Hi Alexey,

On 04/14/2017 10:17 AM, Alexey Perevalov wrote:
> It could help to track down vCPU state during page fault and
> page fault sources.
>
> This patch shows the proc status/stack/syscall files at the moment of the page fault;
> it is very useful to know who initiated the page fault.
>
> Signed-off-by: Alexey Perevalov <a.perevalov@samsung.com>
> ---
>  migration/postcopy-ram.c | 98 +++++++++++++++++++++++++++++++++++++++++++++++-
>  migration/trace-events   |  6 +++
>  2 files changed, 103 insertions(+), 1 deletion(-)
>
> diff --git a/migration/postcopy-ram.c b/migration/postcopy-ram.c
> index 42330fd..513633c 100644
> --- a/migration/postcopy-ram.c
> +++ b/migration/postcopy-ram.c
> @@ -412,7 +412,91 @@ static int ram_block_enable_notify(const char *block_name, void *host_addr,
>      return 0;
>  }
>
> -static int get_mem_fault_cpu_index(uint32_t pid)
> +#define PROC_LEN 1024

Remove PROC_LEN; it is not used.

> +#define DEBUG_FAULT_PROCESS_STATUS 1
> +
> +#ifdef DEBUG_FAULT_PROCESS_STATUS
> +
> +static FILE *get_proc_file(const gchar *frmt, pid_t thread_id)
> +{
> +    FILE *f = NULL;
> +    gchar *file_path = g_strdup_printf(frmt, thread_id);
> +    if (file_path == NULL) {
> +        error_report("Couldn't allocate path for %u", thread_id);
> +        return NULL;
> +    }
> +    f = fopen(file_path, "r");
> +    if (!f) {
> +        error_report("can't open %s", file_path);
> +    }
> +
> +    trace_get_proc_file(file_path);
> +    g_free(file_path);
> +    return f;
> +}
> +
> +typedef void(*proc_line_handler)(const char *line);
> +
> +static void proc_line_cb(const char *line)
> +{
> +    /* trace_ functions are inline */
> +    trace_proc_line_cb(line);
> +}
> +
> +static void foreach_line_in_file(FILE *f, proc_line_handler cb)
> +{
> +    char *line = NULL;
> +    ssize_t read;
> +    size_t len;

Please initialize len = 0.
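
A minimal sketch of the point above, not part of the patch: getline() treats
*lineptr and *n as a (buffer, capacity) pair, so it is safest to start from
NULL and 0. The names foreach_line(), line_handler and count_chars_cb() are
hypothetical and only serve the illustration. The sketch also uses getline()'s
return value instead of strlen(), and strips the trailing newline only when
one is actually present (the last line of a file may lack it).

```c
#define _POSIX_C_SOURCE 200809L   /* for getline() */
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>

/* Hypothetical callback type: receives one line plus caller state. */
typedef void (*line_handler)(const char *line, void *opaque);

/* Example callback: adds the length of each line to a size_t counter. */
static void count_chars_cb(const char *line, void *opaque)
{
    *(size_t *)opaque += strlen(line);
}

/*
 * Call cb for every non-empty line of f, without the trailing newline.
 * Returns the number of lines handed to cb.
 */
static size_t foreach_line(FILE *f, line_handler cb, void *opaque)
{
    char *line = NULL;  /* getline() allocates the buffer ... */
    size_t len = 0;     /* ... and tracks its capacity here */
    ssize_t nread;
    size_t count = 0;

    assert(f != NULL && cb != NULL);

    while ((nread = getline(&line, &len, f)) != -1) {
        /* Strip the newline only if present: the last line may lack one. */
        if (nread > 0 && line[nread - 1] == '\n') {
            line[--nread] = '\0';
        }
        if (nread == 0) {
            continue;   /* skip empty lines */
        }
        cb(line, opaque);
        count++;
    }
    free(line);
    return count;
}
```

In the patch itself the callback would simply forward the line to the trace
function; the opaque pointer is only there to make the sketch testable.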

> +
> +    while ((read = getline(&line, &len, f)) != -1) {
> +        /* workaround, trace_ infrastructure already insert \n
> +         * and getline includes it */
> +        ssize_t str_len = strlen(line) - 1;
> +        if (str_len <= 0)
> +            continue;
> +        line[str_len] = '\0';
> +        cb(line);
> +    }
> +    free(line);
> +}
> +
> +static void observe_thread_proc(const gchar *path_frmt, pid_t thread_id)
> +{
> +    FILE *f = get_proc_file(path_frmt, thread_id);
> +    if (!f) {
> +        error_report("can't read thread's proc");

I'm not sure this _error_ is useful; it may be noisy, e.g. on kernels compiled
without CONFIG_HAVE_ARCH_TRACEHOOK, where /proc/<tid>/syscall does not exist.
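
One way to keep this quiet, as a sketch under assumptions rather than a
definitive fix: record errno from fopen() and only report failures other than
ENOENT. The helper names open_optional_proc_file() and
proc_open_error_is_interesting() are hypothetical and do not exist in QEMU.

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>

/*
 * Hypothetical helper: open an optional /proc file.
 * Returns the stream, or NULL with *err set to the errno value from
 * fopen(), so the caller can decide whether the failure is worth a message.
 */
static FILE *open_optional_proc_file(const char *path, int *err)
{
    FILE *f;

    assert(path != NULL && err != NULL);

    *err = 0;
    f = fopen(path, "r");
    if (!f) {
        *err = errno;
    }
    return f;
}

/* Returns 1 if a failure is worth reporting, 0 if it is expected noise. */
static int proc_open_error_is_interesting(int err)
{
    /* ENOENT: the kernel does not provide this file at all. */
    return err != 0 && err != ENOENT;
}
```

With this split, a missing file is skipped silently and anything else (for
example a permission problem) can still be reported once.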

> +        return;
> +    }
> +
> +    foreach_line_in_file(f, proc_line_cb);
> +    fclose(f);
> +}
> +
> +/*
> + * for convenient tracing, enable these trace events:
> + * observe_thread_begin
> + * get_proc_file
> + * proc_line_cb
> + * observe_thread_end
> + */
> +static void observe_thread(const char *msg, pid_t thread_id)
> +{
> +    trace_observe_thread_begin(msg);
> +    observe_thread_proc("/proc/%d/status", thread_id);

Better use FMT_pid from "qemu/osdep.h" (it already contains the '%'):

     observe_thread_proc("/proc/" FMT_pid "/status", thread_id);
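
As a standalone illustration of the string concatenation involved (a sketch,
not QEMU code): SKETCH_FMT_pid below is a local stand-in for FMT_pid, under
the assumption that the real macro expands to a complete conversion such as
"%d" on most hosts, so no extra '%' is written in front of it. The helper
format_proc_path() is hypothetical.

```c
#include <assert.h>
#include <stdio.h>
#include <sys/types.h>

/*
 * Stand-in for QEMU's FMT_pid (assumption: like the real macro on most
 * hosts, it expands to a complete conversion such as "%d").
 */
#define SKETCH_FMT_pid "%d"

/*
 * Build "/proc/<tid>/<name>" into buf.
 * Returns 0 on success, -1 if the result did not fit.
 */
static int format_proc_path(char *buf, size_t size, pid_t tid,
                            const char *name)
{
    int n;

    assert(buf != NULL && name != NULL);

    /* Adjacent string literals are concatenated: "/proc/%d/%s". */
    n = snprintf(buf, size, "/proc/" SKETCH_FMT_pid "/%s", tid, name);
    return (n < 0 || (size_t)n >= size) ? -1 : 0;
}
```

Keeping the conversion inside the macro means the format string stays correct
on hosts where pid_t is not a plain int.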

> +    observe_thread_proc("/proc/%d/syscall", thread_id);
> +    observe_thread_proc("/proc/%d/stack", thread_id);
> +    trace_observe_thread_end(msg);
> +}
> +
> +#else
> +static void observe_thread(const char *msg, pid_t thread_id)
> +{
> +}
> +
> +#endif /* DEBUG_FAULT_PROCESS_STATUS */
> +
> +static int get_mem_fault_cpu_index(pid_t pid)
>  {
>      CPUState *cpu_iter;
>
> @@ -421,9 +505,20 @@ static int get_mem_fault_cpu_index(uint32_t pid)
>             return cpu_iter->cpu_index;
>      }
>      trace_get_mem_fault_cpu_index(pid);
> +    observe_thread("not a vCPU", pid);
> +
>      return -1;
>  }
>
> +static void observe_vcpu_state(void)
> +{
> +    CPUState *cpu_iter;
> +    CPU_FOREACH(cpu_iter) {
> +        observe_thread("vCPU", cpu_iter->thread_id);
> +        trace_vcpu_state(cpu_iter->running, cpu_iter->cpu_index);

You inverted the arguments; the trace event is declared as
vcpu_state(int cpu_index, int is_running):

trace_vcpu_state(cpu_iter->cpu_index, cpu_iter->running);

> +    }
> +}
> +
>  /*
>   * Handle faults detected by the USERFAULT markings
>   */
> @@ -465,6 +560,7 @@ static void *postcopy_ram_fault_thread(void *opaque)
>          }
>
>          ret = read(mis->userfault_fd, &msg, sizeof(msg));
> +        observe_vcpu_state();
>          if (ret != sizeof(msg)) {
>              if (errno == EAGAIN) {
>                  /*
> diff --git a/migration/trace-events b/migration/trace-events
> index ab2e1e4..3a74f91 100644
> --- a/migration/trace-events
> +++ b/migration/trace-events
> @@ -202,6 +202,12 @@ save_xbzrle_page_overflow(void) ""
>  ram_save_iterate_big_wait(uint64_t milliconds, int iterations) "big wait: %" PRIu64 " milliseconds, %d iterations"
>  ram_load_complete(int ret, uint64_t seq_iter) "exit_code %d seq iteration %" PRIu64
>  get_mem_fault_cpu_index(uint32_t pid) "pid %u is not vCPU"
> +observe_thread_status(int ptid, char *name, char *status) "host_tid %d %s %s"

The first argument should be "pid_t ptid", so the format should be
"host_tid " FMT_pid " %s %s". However, nothing in your code calls
trace_observe_thread_status()...

> +vcpu_state(int cpu_index, int is_running) "cpu %d running %d"
> +proc_line_cb(const char *str) "%s"
> +get_proc_file(const char *str) "opened %s"
> +observe_thread_begin(const char *str) "%s"
> +observe_thread_end(const char *str) "%s"
>
>  # migration/exec.c
>  migration_exec_outgoing(const char *cmd) "cmd=%s"
>

* Re: [Qemu-devel] [PATCH 2/6] util: introduce glib-helper.c
  2017-04-14 16:05       ` Philippe Mathieu-Daudé
  2017-04-17  7:07         ` Alexey
@ 2017-04-21 10:01         ` Dr. David Alan Gilbert
  1 sibling, 0 replies; 38+ messages in thread
From: Dr. David Alan Gilbert @ 2017-04-21 10:01 UTC (permalink / raw)
  To: Philippe Mathieu-Daudé; +Cc: Alexey Perevalov, qemu-devel, i.maximets

* Philippe Mathieu-Daudé (f4bug@amsat.org) wrote:
> Hi Alexey,
> 
> On 04/14/2017 10:17 AM, Alexey Perevalov wrote:
> > > glib lacks a g_int_cmp that compares values stored in pointers;
> > > xen_disk.c introduced its own, and the same function is now required
> > > in migration.c, so it is logical to move it into a common place.
> > > Further: maybe extend glib.
> > 
> > > Also this commit moves the existing glib-compat.h into the include/glib
> > > folder for consolidation purposes.
> 
> Can you do this in two commits? First one moving files only, second move the
> function?

Yes, if you're lucky and do it with git mv, then perhaps git will generate a nice
small commit with a rename in it rather than a vast delete/add.

> I'm not sure naming it "g_int_cmp()" won't clash with future _extended_
> glib, what do you think about naming it "qemu_g_int_cmp()"?

Also agreed.
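
For reference, a glib-free sketch of what the renamed helper could look like.
It is an illustration under assumptions, not the patch: sketch_qemu_int_cmp()
is a hypothetical name, plain C types stand in for gconstpointer/gpointer, and
the three-argument shape mirrors glib's GCompareDataFunc so it could be passed
to g_tree_new_full() after substituting the glib types.

```c
#include <stdint.h>

/*
 * Glib-free stand-in for the proposed qemu_g_int_cmp(): keys are
 * integers stored directly in the pointer value, as GTree callers do
 * with GUINT_TO_POINTER(). The signature mirrors GCompareDataFunc.
 */
static int sketch_qemu_int_cmp(const void *a, const void *b, void *user_data)
{
    uintptr_t ua = (uintptr_t)a;
    uintptr_t ub = (uintptr_t)b;

    (void)user_data;    /* unused, kept for the three-argument shape */

    /* Three-way compare without subtraction, so it cannot overflow. */
    return (ua > ub) - (ua < ub);
}
```

The qemu_ prefix keeps the symbol out of glib's g_ namespace, which is the
clash the rename is meant to avoid.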

Dave

> 
> > Signed-off-by: Alexey Perevalov <a.perevalov@samsung.com>
> > ---
> >  hw/block/xen_disk.c        |  10 +-
> >  include/glib-compat.h      | 352 ---------------------------------------------
> >  include/glib/glib-compat.h | 352 +++++++++++++++++++++++++++++++++++++++++++++
> >  include/glib/glib-helper.h |  30 ++++
> >  include/qemu/osdep.h       |   2 +-
> >  linux-user/main.c          |   2 +-
> >  scripts/clean-includes     |   2 +-
> >  util/Makefile.objs         |   1 +
> >  util/glib-helper.c         |  29 ++++
> >  9 files changed, 417 insertions(+), 363 deletions(-)
> >  delete mode 100644 include/glib-compat.h
> >  create mode 100644 include/glib/glib-compat.h
> >  create mode 100644 include/glib/glib-helper.h
> >  create mode 100644 util/glib-helper.c
> > 
> > diff --git a/hw/block/xen_disk.c b/hw/block/xen_disk.c
> > index 456a2d5..36f6396 100644
> > --- a/hw/block/xen_disk.c
> > +++ b/hw/block/xen_disk.c
> > @@ -20,6 +20,7 @@
> >   */
> > 
> >  #include "qemu/osdep.h"
> > +#include "glib/glib-helper.h"
> >  #include <sys/ioctl.h>
> >  #include <sys/uio.h>
> > 
> > @@ -154,13 +155,6 @@ static void ioreq_reset(struct ioreq *ioreq)
> >      qemu_iovec_reset(&ioreq->v);
> >  }
> > 
> > -static gint int_cmp(gconstpointer a, gconstpointer b, gpointer user_data)
> > -{
> > -    uint ua = GPOINTER_TO_UINT(a);
> > -    uint ub = GPOINTER_TO_UINT(b);
> > -    return (ua > ub) - (ua < ub);
> > -}
> > -
> >  static void destroy_grant(gpointer pgnt)
> >  {
> >      PersistentGrant *grant = pgnt;
> > @@ -1191,7 +1185,7 @@ static int blk_connect(struct XenDevice *xendev)
> >      if (blkdev->feature_persistent) {
> >          /* Init persistent grants */
> >          blkdev->max_grants = max_requests * BLKIF_MAX_SEGMENTS_PER_REQUEST;
> > -        blkdev->persistent_gnts = g_tree_new_full((GCompareDataFunc)int_cmp,
> > +        blkdev->persistent_gnts = g_tree_new_full((GCompareDataFunc)g_int_cmp,
> >                                               NULL, NULL,
> >                                               batch_maps ?
> >                                               (GDestroyNotify)g_free :
> > diff --git a/include/glib-compat.h b/include/glib-compat.h
> > deleted file mode 100644
> > index 863c8cf..0000000
> > --- a/include/glib-compat.h
> > +++ /dev/null
> > @@ -1,352 +0,0 @@
> > -/*
> > - * GLIB Compatibility Functions
> > - *
> > - * Copyright IBM, Corp. 2013
> > - *
> > - * Authors:
> > - *  Anthony Liguori   <aliguori@us.ibm.com>
> > - *  Michael Tokarev   <mjt@tls.msk.ru>
> > - *  Paolo Bonzini     <pbonzini@redhat.com>
> > - *
> > - * This work is licensed under the terms of the GNU GPL, version 2 or later.
> > - * See the COPYING file in the top-level directory.
> > - *
> > - */
> > -
> > -#ifndef QEMU_GLIB_COMPAT_H
> > -#define QEMU_GLIB_COMPAT_H
> > -
> > -#include <glib.h>
> > -
> > -/* GLIB version compatibility flags */
> > -#if !GLIB_CHECK_VERSION(2, 26, 0)
> > -#define G_TIME_SPAN_SECOND              (G_GINT64_CONSTANT(1000000))
> > -#endif
> > -
> > -#if !GLIB_CHECK_VERSION(2, 28, 0)
> > -static inline gint64 qemu_g_get_monotonic_time(void)
> > -{
> > -    /* g_get_monotonic_time() is best-effort so we can use the wall clock as a
> > -     * fallback.
> > -     */
> > -
> > -    GTimeVal time;
> > -    g_get_current_time(&time);
> > -
> > -    return time.tv_sec * G_TIME_SPAN_SECOND + time.tv_usec;
> > -}
> > -/* work around distro backports of this interface */
> > -#define g_get_monotonic_time() qemu_g_get_monotonic_time()
> > -#endif
> > -
> > -#if defined(_WIN32) && !GLIB_CHECK_VERSION(2, 50, 0)
> > -/*
> > - * g_poll has a problem on Windows when using
> > - * timeouts < 10ms, so use wrapper.
> > - */
> > -#define g_poll(fds, nfds, timeout) g_poll_fixed(fds, nfds, timeout)
> > -gint g_poll_fixed(GPollFD *fds, guint nfds, gint timeout);
> > -#endif
> > -
> > -#if !GLIB_CHECK_VERSION(2, 30, 0)
> > -/* Not a 100% compatible implementation, but good enough for most
> > - * cases. Placeholders are only supported at the end of the
> > - * template. */
> > -static inline gchar *qemu_g_dir_make_tmp(gchar const *tmpl, GError **error)
> > -{
> > -    gchar *path = g_build_filename(g_get_tmp_dir(), tmpl ?: ".XXXXXX", NULL);
> > -
> > -    if (mkdtemp(path) != NULL) {
> > -        return path;
> > -    }
> > -    /* Error occurred, clean up. */
> > -    g_set_error(error, G_FILE_ERROR, g_file_error_from_errno(errno),
> > -                "mkdtemp() failed");
> > -    g_free(path);
> > -    return NULL;
> > -}
> > -#define g_dir_make_tmp(tmpl, error) qemu_g_dir_make_tmp(tmpl, error)
> > -#endif /* glib 2.30 */
> > -
> > -#if !GLIB_CHECK_VERSION(2, 31, 0)
> > -/* before glib-2.31, GMutex and GCond was dynamic-only (there was a separate
> > - * GStaticMutex, but it didn't work with condition variables).
> > - *
> > - * Our implementation uses GOnce to fake a static implementation that does
> > - * not require separate initialization.
> > - * We need to rename the types to avoid passing our CompatGMutex/CompatGCond
> > - * by mistake to a function that expects GMutex/GCond.  However, for ease
> > - * of use we keep the GLib function names.  GLib uses macros for the
> > - * implementation, we use inline functions instead and undefine the macros.
> > - */
> > -
> > -typedef struct CompatGMutex {
> > -    GOnce once;
> > -} CompatGMutex;
> > -
> > -typedef struct CompatGCond {
> > -    GOnce once;
> > -} CompatGCond;
> > -
> > -static inline gpointer do_g_mutex_new(gpointer unused)
> > -{
> > -    return (gpointer) g_mutex_new();
> > -}
> > -
> > -static inline void g_mutex_init(CompatGMutex *mutex)
> > -{
> > -    mutex->once = (GOnce) G_ONCE_INIT;
> > -}
> > -
> > -static inline void g_mutex_clear(CompatGMutex *mutex)
> > -{
> > -    g_assert(mutex->once.status != G_ONCE_STATUS_PROGRESS);
> > -    if (mutex->once.retval) {
> > -        g_mutex_free((GMutex *) mutex->once.retval);
> > -    }
> > -    mutex->once = (GOnce) G_ONCE_INIT;
> > -}
> > -
> > -static inline void (g_mutex_lock)(CompatGMutex *mutex)
> > -{
> > -    g_once(&mutex->once, do_g_mutex_new, NULL);
> > -    g_mutex_lock((GMutex *) mutex->once.retval);
> > -}
> > -#undef g_mutex_lock
> > -
> > -static inline gboolean (g_mutex_trylock)(CompatGMutex *mutex)
> > -{
> > -    g_once(&mutex->once, do_g_mutex_new, NULL);
> > -    return g_mutex_trylock((GMutex *) mutex->once.retval);
> > -}
> > -#undef g_mutex_trylock
> > -
> > -
> > -static inline void (g_mutex_unlock)(CompatGMutex *mutex)
> > -{
> > -    g_mutex_unlock((GMutex *) mutex->once.retval);
> > -}
> > -#undef g_mutex_unlock
> > -
> > -static inline gpointer do_g_cond_new(gpointer unused)
> > -{
> > -    return (gpointer) g_cond_new();
> > -}
> > -
> > -static inline void g_cond_init(CompatGCond *cond)
> > -{
> > -    cond->once = (GOnce) G_ONCE_INIT;
> > -}
> > -
> > -static inline void g_cond_clear(CompatGCond *cond)
> > -{
> > -    g_assert(cond->once.status != G_ONCE_STATUS_PROGRESS);
> > -    if (cond->once.retval) {
> > -        g_cond_free((GCond *) cond->once.retval);
> > -    }
> > -    cond->once = (GOnce) G_ONCE_INIT;
> > -}
> > -
> > -static inline void (g_cond_wait)(CompatGCond *cond, CompatGMutex *mutex)
> > -{
> > -    g_assert(mutex->once.status != G_ONCE_STATUS_PROGRESS);
> > -    g_once(&cond->once, do_g_cond_new, NULL);
> > -    g_cond_wait((GCond *) cond->once.retval, (GMutex *) mutex->once.retval);
> > -}
> > -#undef g_cond_wait
> > -
> > -static inline void (g_cond_broadcast)(CompatGCond *cond)
> > -{
> > -    g_once(&cond->once, do_g_cond_new, NULL);
> > -    g_cond_broadcast((GCond *) cond->once.retval);
> > -}
> > -#undef g_cond_broadcast
> > -
> > -static inline void (g_cond_signal)(CompatGCond *cond)
> > -{
> > -    g_once(&cond->once, do_g_cond_new, NULL);
> > -    g_cond_signal((GCond *) cond->once.retval);
> > -}
> > -#undef g_cond_signal
> > -
> > -static inline gboolean (g_cond_timed_wait)(CompatGCond *cond,
> > -                                           CompatGMutex *mutex,
> > -                                           GTimeVal *time)
> > -{
> > -    g_assert(mutex->once.status != G_ONCE_STATUS_PROGRESS);
> > -    g_once(&cond->once, do_g_cond_new, NULL);
> > -    return g_cond_timed_wait((GCond *) cond->once.retval,
> > -                             (GMutex *) mutex->once.retval, time);
> > -}
> > -#undef g_cond_timed_wait
> > -
> > -/* This is not a macro, because it didn't exist until 2.32.  */
> > -static inline gboolean g_cond_wait_until(CompatGCond *cond, CompatGMutex *mutex,
> > -                                         gint64 end_time)
> > -{
> > -    GTimeVal time;
> > -
> > -    /* Convert from monotonic to CLOCK_REALTIME.  */
> > -    end_time -= g_get_monotonic_time();
> > -    g_get_current_time(&time);
> > -    end_time += time.tv_sec * G_TIME_SPAN_SECOND + time.tv_usec;
> > -
> > -    time.tv_sec = end_time / G_TIME_SPAN_SECOND;
> > -    time.tv_usec = end_time % G_TIME_SPAN_SECOND;
> > -    return g_cond_timed_wait(cond, mutex, &time);
> > -}
> > -
> > -/* before 2.31 there was no g_thread_new() */
> > -static inline GThread *g_thread_new(const char *name,
> > -                                    GThreadFunc func, gpointer data)
> > -{
> > -    GThread *thread = g_thread_create(func, data, TRUE, NULL);
> > -    if (!thread) {
> > -        g_error("creating thread");
> > -    }
> > -    return thread;
> > -}
> > -#else
> > -#define CompatGMutex GMutex
> > -#define CompatGCond GCond
> > -#endif /* glib 2.31 */
> > -
> > -#if !GLIB_CHECK_VERSION(2, 32, 0)
> > -/* Beware, function returns gboolean since 2.39.2, see GLib commit 9101915 */
> > -static inline void g_hash_table_add(GHashTable *hash_table, gpointer key)
> > -{
> > -    g_hash_table_replace(hash_table, key, key);
> > -}
> > -#endif
> > -
> > -#ifndef g_assert_true
> > -#define g_assert_true(expr)                                                    \
> > -    do {                                                                       \
> > -        if (G_LIKELY(expr)) {                                                  \
> > -        } else {                                                               \
> > -            g_assertion_message(G_LOG_DOMAIN, __FILE__, __LINE__, G_STRFUNC,   \
> > -                                "'" #expr "' should be TRUE");                 \
> > -        }                                                                      \
> > -    } while (0)
> > -#endif
> > -
> > -#ifndef g_assert_false
> > -#define g_assert_false(expr)                                                   \
> > -    do {                                                                       \
> > -        if (G_LIKELY(!(expr))) {                                               \
> > -        } else {                                                               \
> > -            g_assertion_message(G_LOG_DOMAIN, __FILE__, __LINE__, G_STRFUNC,   \
> > -                                "'" #expr "' should be FALSE");                \
> > -        }                                                                      \
> > -    } while (0)
> > -#endif
> > -
> > -#ifndef g_assert_null
> > -#define g_assert_null(expr)                                                    \
> > -    do {                                                                       \
> > -        if (G_LIKELY((expr) == NULL)) {                                        \
> > -        } else {                                                               \
> > -            g_assertion_message(G_LOG_DOMAIN, __FILE__, __LINE__, G_STRFUNC,   \
> > -                                "'" #expr "' should be NULL");                 \
> > -        }                                                                      \
> > -    } while (0)
> > -#endif
> > -
> > -#ifndef g_assert_nonnull
> > -#define g_assert_nonnull(expr)                                                 \
> > -    do {                                                                       \
> > -        if (G_LIKELY((expr) != NULL)) {                                        \
> > -        } else {                                                               \
> > -            g_assertion_message(G_LOG_DOMAIN, __FILE__, __LINE__, G_STRFUNC,   \
> > -                                "'" #expr "' should not be NULL");             \
> > -        }                                                                      \
> > -    } while (0)
> > -#endif
> > -
> > -#ifndef g_assert_cmpmem
> > -#define g_assert_cmpmem(m1, l1, m2, l2)                                        \
> > -    do {                                                                       \
> > -        gconstpointer __m1 = m1, __m2 = m2;                                    \
> > -        int __l1 = l1, __l2 = l2;                                              \
> > -        if (__l1 != __l2) {                                                    \
> > -            g_assertion_message_cmpnum(                                        \
> > -                G_LOG_DOMAIN, __FILE__, __LINE__, G_STRFUNC,                   \
> > -                #l1 " (len(" #m1 ")) == " #l2 " (len(" #m2 "))", __l1, "==",   \
> > -                __l2, 'i');                                                    \
> > -        } else if (memcmp(__m1, __m2, __l1) != 0) {                            \
> > -            g_assertion_message(G_LOG_DOMAIN, __FILE__, __LINE__, G_STRFUNC,   \
> > -                                "assertion failed (" #m1 " == " #m2 ")");      \
> > -        }                                                                      \
> > -    } while (0)
> > -#endif
> > -
> > -#if !GLIB_CHECK_VERSION(2, 28, 0)
> > -static inline void g_list_free_full(GList *list, GDestroyNotify free_func)
> > -{
> > -    GList *l;
> > -
> > -    for (l = list; l; l = l->next) {
> > -        free_func(l->data);
> > -    }
> > -
> > -    g_list_free(list);
> > -}
> > -
> > -static inline void g_slist_free_full(GSList *list, GDestroyNotify free_func)
> > -{
> > -    GSList *l;
> > -
> > -    for (l = list; l; l = l->next) {
> > -        free_func(l->data);
> > -    }
> > -
> > -    g_slist_free(list);
> > -}
> > -#endif
> > -
> > -#if !GLIB_CHECK_VERSION(2, 26, 0)
> > -static inline void g_source_set_name(GSource *source, const char *name)
> > -{
> > -    /* This is just a debugging aid, so leaving it a no-op */
> > -}
> > -static inline void g_source_set_name_by_id(guint tag, const char *name)
> > -{
> > -    /* This is just a debugging aid, so leaving it a no-op */
> > -}
> > -#endif
> > -
> > -#if !GLIB_CHECK_VERSION(2, 36, 0)
> > -/* Always fail.  This will not include error_report output in the test log,
> > - * sending it instead to stderr.
> > - */
> > -#define g_test_initialized() (0)
> > -#endif
> > -#if !GLIB_CHECK_VERSION(2, 38, 0)
> > -#ifdef CONFIG_HAS_GLIB_SUBPROCESS_TESTS
> > -#error schizophrenic detection of glib subprocess testing
> > -#endif
> > -#define g_test_subprocess() (0)
> > -#endif
> > -
> > -
> > -#if !GLIB_CHECK_VERSION(2, 34, 0)
> > -static inline void
> > -g_test_add_data_func_full(const char *path,
> > -                          gpointer data,
> > -                          gpointer fn,
> > -                          gpointer data_free_func)
> > -{
> > -#if GLIB_CHECK_VERSION(2, 26, 0)
> > -    /* back-compat casts, remove this once we can require new-enough glib */
> > -    g_test_add_vtable(path, 0, data, NULL,
> > -                      (GTestFixtureFunc)fn, (GTestFixtureFunc) data_free_func);
> > -#else
> > -    /* back-compat casts, remove this once we can require new-enough glib */
> > -    g_test_add_vtable(path, 0, data, NULL,
> > -                      (void (*)(void)) fn, (void (*)(void)) data_free_func);
> > -#endif
> > -}
> > -#endif
> > -
> > -
> > -#endif
> > diff --git a/include/glib/glib-compat.h b/include/glib/glib-compat.h
> > new file mode 100644
> > index 0000000..863c8cf
> > --- /dev/null
> > +++ b/include/glib/glib-compat.h
> > @@ -0,0 +1,352 @@
> > +/*
> > + * GLIB Compatibility Functions
> > + *
> > + * Copyright IBM, Corp. 2013
> > + *
> > + * Authors:
> > + *  Anthony Liguori   <aliguori@us.ibm.com>
> > + *  Michael Tokarev   <mjt@tls.msk.ru>
> > + *  Paolo Bonzini     <pbonzini@redhat.com>
> > + *
> > + * This work is licensed under the terms of the GNU GPL, version 2 or later.
> > + * See the COPYING file in the top-level directory.
> > + *
> > + */
> > +
> > +#ifndef QEMU_GLIB_COMPAT_H
> > +#define QEMU_GLIB_COMPAT_H
> > +
> > +#include <glib.h>
> > +
> > +/* GLIB version compatibility flags */
> > +#if !GLIB_CHECK_VERSION(2, 26, 0)
> > +#define G_TIME_SPAN_SECOND              (G_GINT64_CONSTANT(1000000))
> > +#endif
> > +
> > +#if !GLIB_CHECK_VERSION(2, 28, 0)
> > +static inline gint64 qemu_g_get_monotonic_time(void)
> > +{
> > +    /* g_get_monotonic_time() is best-effort so we can use the wall clock as a
> > +     * fallback.
> > +     */
> > +
> > +    GTimeVal time;
> > +    g_get_current_time(&time);
> > +
> > +    return time.tv_sec * G_TIME_SPAN_SECOND + time.tv_usec;
> > +}
> > +/* work around distro backports of this interface */
> > +#define g_get_monotonic_time() qemu_g_get_monotonic_time()
> > +#endif
> > +
> > +#if defined(_WIN32) && !GLIB_CHECK_VERSION(2, 50, 0)
> > +/*
> > + * g_poll has a problem on Windows when using
> > + * timeouts < 10ms, so use wrapper.
> > + */
> > +#define g_poll(fds, nfds, timeout) g_poll_fixed(fds, nfds, timeout)
> > +gint g_poll_fixed(GPollFD *fds, guint nfds, gint timeout);
> > +#endif
> > +
> > +#if !GLIB_CHECK_VERSION(2, 30, 0)
> > +/* Not a 100% compatible implementation, but good enough for most
> > + * cases. Placeholders are only supported at the end of the
> > + * template. */
> > +static inline gchar *qemu_g_dir_make_tmp(gchar const *tmpl, GError **error)
> > +{
> > +    gchar *path = g_build_filename(g_get_tmp_dir(), tmpl ?: ".XXXXXX", NULL);
> > +
> > +    if (mkdtemp(path) != NULL) {
> > +        return path;
> > +    }
> > +    /* Error occurred, clean up. */
> > +    g_set_error(error, G_FILE_ERROR, g_file_error_from_errno(errno),
> > +                "mkdtemp() failed");
> > +    g_free(path);
> > +    return NULL;
> > +}
> > +#define g_dir_make_tmp(tmpl, error) qemu_g_dir_make_tmp(tmpl, error)
> > +#endif /* glib 2.30 */
> > +
> > +#if !GLIB_CHECK_VERSION(2, 31, 0)
> > +/* before glib-2.31, GMutex and GCond was dynamic-only (there was a separate
> > + * GStaticMutex, but it didn't work with condition variables).
> > + *
> > + * Our implementation uses GOnce to fake a static implementation that does
> > + * not require separate initialization.
> > + * We need to rename the types to avoid passing our CompatGMutex/CompatGCond
> > + * by mistake to a function that expects GMutex/GCond.  However, for ease
> > + * of use we keep the GLib function names.  GLib uses macros for the
> > + * implementation, we use inline functions instead and undefine the macros.
> > + */
> > +
> > +typedef struct CompatGMutex {
> > +    GOnce once;
> > +} CompatGMutex;
> > +
> > +typedef struct CompatGCond {
> > +    GOnce once;
> > +} CompatGCond;
> > +
> > +static inline gpointer do_g_mutex_new(gpointer unused)
> > +{
> > +    return (gpointer) g_mutex_new();
> > +}
> > +
> > +static inline void g_mutex_init(CompatGMutex *mutex)
> > +{
> > +    mutex->once = (GOnce) G_ONCE_INIT;
> > +}
> > +
> > +static inline void g_mutex_clear(CompatGMutex *mutex)
> > +{
> > +    g_assert(mutex->once.status != G_ONCE_STATUS_PROGRESS);
> > +    if (mutex->once.retval) {
> > +        g_mutex_free((GMutex *) mutex->once.retval);
> > +    }
> > +    mutex->once = (GOnce) G_ONCE_INIT;
> > +}
> > +
> > +static inline void (g_mutex_lock)(CompatGMutex *mutex)
> > +{
> > +    g_once(&mutex->once, do_g_mutex_new, NULL);
> > +    g_mutex_lock((GMutex *) mutex->once.retval);
> > +}
> > +#undef g_mutex_lock
> > +
> > +static inline gboolean (g_mutex_trylock)(CompatGMutex *mutex)
> > +{
> > +    g_once(&mutex->once, do_g_mutex_new, NULL);
> > +    return g_mutex_trylock((GMutex *) mutex->once.retval);
> > +}
> > +#undef g_mutex_trylock
> > +
> > +
> > +static inline void (g_mutex_unlock)(CompatGMutex *mutex)
> > +{
> > +    g_mutex_unlock((GMutex *) mutex->once.retval);
> > +}
> > +#undef g_mutex_unlock
> > +
> > +static inline gpointer do_g_cond_new(gpointer unused)
> > +{
> > +    return (gpointer) g_cond_new();
> > +}
> > +
> > +static inline void g_cond_init(CompatGCond *cond)
> > +{
> > +    cond->once = (GOnce) G_ONCE_INIT;
> > +}
> > +
> > +static inline void g_cond_clear(CompatGCond *cond)
> > +{
> > +    g_assert(cond->once.status != G_ONCE_STATUS_PROGRESS);
> > +    if (cond->once.retval) {
> > +        g_cond_free((GCond *) cond->once.retval);
> > +    }
> > +    cond->once = (GOnce) G_ONCE_INIT;
> > +}
> > +
> > +static inline void (g_cond_wait)(CompatGCond *cond, CompatGMutex *mutex)
> > +{
> > +    g_assert(mutex->once.status != G_ONCE_STATUS_PROGRESS);
> > +    g_once(&cond->once, do_g_cond_new, NULL);
> > +    g_cond_wait((GCond *) cond->once.retval, (GMutex *) mutex->once.retval);
> > +}
> > +#undef g_cond_wait
> > +
> > +static inline void (g_cond_broadcast)(CompatGCond *cond)
> > +{
> > +    g_once(&cond->once, do_g_cond_new, NULL);
> > +    g_cond_broadcast((GCond *) cond->once.retval);
> > +}
> > +#undef g_cond_broadcast
> > +
> > +static inline void (g_cond_signal)(CompatGCond *cond)
> > +{
> > +    g_once(&cond->once, do_g_cond_new, NULL);
> > +    g_cond_signal((GCond *) cond->once.retval);
> > +}
> > +#undef g_cond_signal
> > +
> > +static inline gboolean (g_cond_timed_wait)(CompatGCond *cond,
> > +                                           CompatGMutex *mutex,
> > +                                           GTimeVal *time)
> > +{
> > +    g_assert(mutex->once.status != G_ONCE_STATUS_PROGRESS);
> > +    g_once(&cond->once, do_g_cond_new, NULL);
> > +    return g_cond_timed_wait((GCond *) cond->once.retval,
> > +                             (GMutex *) mutex->once.retval, time);
> > +}
> > +#undef g_cond_timed_wait
> > +
> > +/* This is not a macro, because it didn't exist until 2.32.  */
> > +static inline gboolean g_cond_wait_until(CompatGCond *cond, CompatGMutex *mutex,
> > +                                         gint64 end_time)
> > +{
> > +    GTimeVal time;
> > +
> > +    /* Convert from monotonic to CLOCK_REALTIME.  */
> > +    end_time -= g_get_monotonic_time();
> > +    g_get_current_time(&time);
> > +    end_time += time.tv_sec * G_TIME_SPAN_SECOND + time.tv_usec;
> > +
> > +    time.tv_sec = end_time / G_TIME_SPAN_SECOND;
> > +    time.tv_usec = end_time % G_TIME_SPAN_SECOND;
> > +    return g_cond_timed_wait(cond, mutex, &time);
> > +}
> > +
> > +/* before 2.31 there was no g_thread_new() */
> > +static inline GThread *g_thread_new(const char *name,
> > +                                    GThreadFunc func, gpointer data)
> > +{
> > +    GThread *thread = g_thread_create(func, data, TRUE, NULL);
> > +    if (!thread) {
> > +        g_error("creating thread");
> > +    }
> > +    return thread;
> > +}
> > +#else
> > +#define CompatGMutex GMutex
> > +#define CompatGCond GCond
> > +#endif /* glib 2.31 */
> > +
> > +#if !GLIB_CHECK_VERSION(2, 32, 0)
> > +/* Beware, function returns gboolean since 2.39.2, see GLib commit 9101915 */
> > +static inline void g_hash_table_add(GHashTable *hash_table, gpointer key)
> > +{
> > +    g_hash_table_replace(hash_table, key, key);
> > +}
> > +#endif
> > +
> > +#ifndef g_assert_true
> > +#define g_assert_true(expr)                                                    \
> > +    do {                                                                       \
> > +        if (G_LIKELY(expr)) {                                                  \
> > +        } else {                                                               \
> > +            g_assertion_message(G_LOG_DOMAIN, __FILE__, __LINE__, G_STRFUNC,   \
> > +                                "'" #expr "' should be TRUE");                 \
> > +        }                                                                      \
> > +    } while (0)
> > +#endif
> > +
> > +#ifndef g_assert_false
> > +#define g_assert_false(expr)                                                   \
> > +    do {                                                                       \
> > +        if (G_LIKELY(!(expr))) {                                               \
> > +        } else {                                                               \
> > +            g_assertion_message(G_LOG_DOMAIN, __FILE__, __LINE__, G_STRFUNC,   \
> > +                                "'" #expr "' should be FALSE");                \
> > +        }                                                                      \
> > +    } while (0)
> > +#endif
> > +
> > +#ifndef g_assert_null
> > +#define g_assert_null(expr)                                                    \
> > +    do {                                                                       \
> > +        if (G_LIKELY((expr) == NULL)) {                                        \
> > +        } else {                                                               \
> > +            g_assertion_message(G_LOG_DOMAIN, __FILE__, __LINE__, G_STRFUNC,   \
> > +                                "'" #expr "' should be NULL");                 \
> > +        }                                                                      \
> > +    } while (0)
> > +#endif
> > +
> > +#ifndef g_assert_nonnull
> > +#define g_assert_nonnull(expr)                                                 \
> > +    do {                                                                       \
> > +        if (G_LIKELY((expr) != NULL)) {                                        \
> > +        } else {                                                               \
> > +            g_assertion_message(G_LOG_DOMAIN, __FILE__, __LINE__, G_STRFUNC,   \
> > +                                "'" #expr "' should not be NULL");             \
> > +        }                                                                      \
> > +    } while (0)
> > +#endif
> > +
> > +#ifndef g_assert_cmpmem
> > +#define g_assert_cmpmem(m1, l1, m2, l2)                                        \
> > +    do {                                                                       \
> > +        gconstpointer __m1 = m1, __m2 = m2;                                    \
> > +        int __l1 = l1, __l2 = l2;                                              \
> > +        if (__l1 != __l2) {                                                    \
> > +            g_assertion_message_cmpnum(                                        \
> > +                G_LOG_DOMAIN, __FILE__, __LINE__, G_STRFUNC,                   \
> > +                #l1 " (len(" #m1 ")) == " #l2 " (len(" #m2 "))", __l1, "==",   \
> > +                __l2, 'i');                                                    \
> > +        } else if (memcmp(__m1, __m2, __l1) != 0) {                            \
> > +            g_assertion_message(G_LOG_DOMAIN, __FILE__, __LINE__, G_STRFUNC,   \
> > +                                "assertion failed (" #m1 " == " #m2 ")");      \
> > +        }                                                                      \
> > +    } while (0)
> > +#endif
> > +
> > +#if !GLIB_CHECK_VERSION(2, 28, 0)
> > +static inline void g_list_free_full(GList *list, GDestroyNotify free_func)
> > +{
> > +    GList *l;
> > +
> > +    for (l = list; l; l = l->next) {
> > +        free_func(l->data);
> > +    }
> > +
> > +    g_list_free(list);
> > +}
> > +
> > +static inline void g_slist_free_full(GSList *list, GDestroyNotify free_func)
> > +{
> > +    GSList *l;
> > +
> > +    for (l = list; l; l = l->next) {
> > +        free_func(l->data);
> > +    }
> > +
> > +    g_slist_free(list);
> > +}
> > +#endif
> > +
> > +#if !GLIB_CHECK_VERSION(2, 26, 0)
> > +static inline void g_source_set_name(GSource *source, const char *name)
> > +{
> > +    /* This is just a debugging aid, so leaving it a no-op */
> > +}
> > +static inline void g_source_set_name_by_id(guint tag, const char *name)
> > +{
> > +    /* This is just a debugging aid, so leaving it a no-op */
> > +}
> > +#endif
> > +
> > +#if !GLIB_CHECK_VERSION(2, 36, 0)
> > +/* Always fail.  This will not include error_report output in the test log,
> > + * sending it instead to stderr.
> > + */
> > +#define g_test_initialized() (0)
> > +#endif
> > +#if !GLIB_CHECK_VERSION(2, 38, 0)
> > +#ifdef CONFIG_HAS_GLIB_SUBPROCESS_TESTS
> > +#error schizophrenic detection of glib subprocess testing
> > +#endif
> > +#define g_test_subprocess() (0)
> > +#endif
> > +
> > +
> > +#if !GLIB_CHECK_VERSION(2, 34, 0)
> > +static inline void
> > +g_test_add_data_func_full(const char *path,
> > +                          gpointer data,
> > +                          gpointer fn,
> > +                          gpointer data_free_func)
> > +{
> > +#if GLIB_CHECK_VERSION(2, 26, 0)
> > +    /* back-compat casts, remove this once we can require new-enough glib */
> > +    g_test_add_vtable(path, 0, data, NULL,
> > +                      (GTestFixtureFunc)fn, (GTestFixtureFunc) data_free_func);
> > +#else
> > +    /* back-compat casts, remove this once we can require new-enough glib */
> > +    g_test_add_vtable(path, 0, data, NULL,
> > +                      (void (*)(void)) fn, (void (*)(void)) data_free_func);
> > +#endif
> > +}
> > +#endif
> > +
> > +
> > +#endif
> > diff --git a/include/glib/glib-helper.h b/include/glib/glib-helper.h
> > new file mode 100644
> > index 0000000..db740fb
> > --- /dev/null
> > +++ b/include/glib/glib-helper.h
> > @@ -0,0 +1,30 @@
> > +/*
> > + * Helpers for GLIB
> > + *
> > + * This work is licensed under the terms of the GNU GPL, version 2 or later.
> > + * See the COPYING file in the top-level directory.
> > + *
> > + */
> > +
> > +#ifndef QEMU_GLIB_HELPER_H
> > +#define QEMU_GLIB_HELPER_H
> > +
> > +
> > +#include "glib/glib-compat.h"
> > +
> > +#define GPOINTER_TO_UINT64(a) ((guint64) (a))
> > +
> > +/*
> > + * return 1 in case of a > b, -1 otherwise and 0 if equeal
> > + */
> > +gint g_int_cmp64(gconstpointer a, gconstpointer b,
> > +        gpointer __attribute__((unused)) user_data);
> > +
> > +/*
> > + * return 1 in case of a > b, -1 otherwise and 0 if equeal
> > + */
> > +int g_int_cmp(gconstpointer a, gconstpointer b,
> > +        gpointer __attribute__((unused)) user_data);
> > +
> > +#endif /* QEMU_GLIB_HELPER_H */
> > +
> > diff --git a/include/qemu/osdep.h b/include/qemu/osdep.h
> > index 122ff06..36f8a89 100644
> > --- a/include/qemu/osdep.h
> > +++ b/include/qemu/osdep.h
> > @@ -104,7 +104,7 @@ extern int daemon(int, int);
> >  #include "sysemu/os-posix.h"
> >  #endif
> > 
> > -#include "glib-compat.h"
> > +#include "glib/glib-compat.h"
> >  #include "qemu/typedefs.h"
> > 
> >  #ifndef O_LARGEFILE
> > diff --git a/linux-user/main.c b/linux-user/main.c
> > index 10a3bb3..7cea6bc 100644
> > --- a/linux-user/main.c
> > +++ b/linux-user/main.c
> > @@ -35,7 +35,7 @@
> >  #include "elf.h"
> >  #include "exec/log.h"
> >  #include "trace/control.h"
> > -#include "glib-compat.h"
> > +#include "glib/glib-compat.h"
> > 
> >  char *exec_path;
> > 
> > diff --git a/scripts/clean-includes b/scripts/clean-includes
> > index dd938da..b32b928 100755
> > --- a/scripts/clean-includes
> > +++ b/scripts/clean-includes
> > @@ -123,7 +123,7 @@ for f in "$@"; do
> >        ;;
> >      *include/qemu/osdep.h | \
> >      *include/qemu/compiler.h | \
> > -    *include/glib-compat.h | \
> > +    *include/glib/glib-compat.h | \
> >      *include/sysemu/os-posix.h | \
> >      *include/sysemu/os-win32.h | \
> >      *include/standard-headers/ )
> > diff --git a/util/Makefile.objs b/util/Makefile.objs
> > index c6205eb..0080712 100644
> > --- a/util/Makefile.objs
> > +++ b/util/Makefile.objs
> > @@ -43,3 +43,4 @@ util-obj-y += qdist.o
> >  util-obj-y += qht.o
> >  util-obj-y += range.o
> >  util-obj-y += systemd.o
> > +util-obj-y += glib-helper.o
> > diff --git a/util/glib-helper.c b/util/glib-helper.c
> > new file mode 100644
> > index 0000000..2557009
> > --- /dev/null
> > +++ b/util/glib-helper.c
> > @@ -0,0 +1,29 @@
> > +/*
> > + * Implementation for GLIB helpers
> > + * this file is intented to commulate and later reuse
> > + * additional glib functions
> > + *
> > + * This work is licensed under the terms of the GNU GPL, version 2 or later.
> > + * See the COPYING file in the top-level directory.
> > +
> > + */
> > +
> > +#include "glib/glib-helper.h"
> > +
> > +gint g_int_cmp64(gconstpointer a, gconstpointer b,
> > +        gpointer __attribute__((unused)) user_data)
> > +{
> > +    guint64 ua = GPOINTER_TO_UINT64(a);
> > +    guint64 ub = GPOINTER_TO_UINT64(b);
> > +    return (ua > ub) - (ua < ub);
> > +}
> > +
> > +/*
> > + * return 1 in case of a > b, -1 otherwise and 0 if equeal
> > + */
> > +gint g_int_cmp(gconstpointer a, gconstpointer b,
> > +        gpointer __attribute__((unused)) user_data)
> > +{
> > +    return g_int_cmp64(a, b, user_data);
> > +}
> > +
> > 
--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK


* Re: [Qemu-devel] [PATCH 3/6] migration: add UFFD_FEATURE_THREAD_ID feature support
  2017-04-14 13:17     ` [Qemu-devel] [PATCH 3/6] migration: add UFFD_FEATURE_THREAD_ID feature support Alexey Perevalov
@ 2017-04-21 10:24       ` Dr. David Alan Gilbert
  2017-04-21 15:22         ` Alexey
  0 siblings, 1 reply; 38+ messages in thread
From: Dr. David Alan Gilbert @ 2017-04-21 10:24 UTC (permalink / raw)
  To: Alexey Perevalov; +Cc: qemu-devel, i.maximets

* Alexey Perevalov (a.perevalov@samsung.com) wrote:
> Userfaultfd mechanism is able to provide process thread id,
> in case when client request it with UFDD_API ioctl.
> 
> Signed-off-by: Alexey Perevalov <a.perevalov@samsung.com>

There seem to be two parts to this:
  a) Adding the mis parameter to ufd_version_check
  b) Asking for the feature

Please split it into two patches.

Also....

> ---
>  include/migration/postcopy-ram.h |  2 +-
>  migration/migration.c            |  2 +-
>  migration/postcopy-ram.c         | 12 ++++++------
>  migration/savevm.c               |  2 +-
>  4 files changed, 9 insertions(+), 9 deletions(-)
> 
> diff --git a/include/migration/postcopy-ram.h b/include/migration/postcopy-ram.h
> index 8e036b9..809f6db 100644
> --- a/include/migration/postcopy-ram.h
> +++ b/include/migration/postcopy-ram.h
> @@ -14,7 +14,7 @@
>  #define QEMU_POSTCOPY_RAM_H
>  
>  /* Return true if the host supports everything we need to do postcopy-ram */
> -bool postcopy_ram_supported_by_host(void);
> +bool postcopy_ram_supported_by_host(MigrationIncomingState *mis);
>  
>  /*
>   * Make all of RAM sensitive to accesses to areas that haven't yet been written
> diff --git a/migration/migration.c b/migration/migration.c
> index ad4036f..79f6425 100644
> --- a/migration/migration.c
> +++ b/migration/migration.c
> @@ -802,7 +802,7 @@ void qmp_migrate_set_capabilities(MigrationCapabilityStatusList *params,
>           * special support.
>           */
>          if (!old_postcopy_cap && runstate_check(RUN_STATE_INMIGRATE) &&
> -            !postcopy_ram_supported_by_host()) {
> +            !postcopy_ram_supported_by_host(NULL)) {
>              /* postcopy_ram_supported_by_host will have emitted a more
>               * detailed message
>               */
> diff --git a/migration/postcopy-ram.c b/migration/postcopy-ram.c
> index dc80dbb..70f0480 100644
> --- a/migration/postcopy-ram.c
> +++ b/migration/postcopy-ram.c
> @@ -60,13 +60,13 @@ struct PostcopyDiscardState {
>  #include <sys/eventfd.h>
>  #include <linux/userfaultfd.h>
>  
> -static bool ufd_version_check(int ufd)
> +static bool ufd_version_check(int ufd, MigrationIncomingState *mis)
>  {
>      struct uffdio_api api_struct;
>      uint64_t ioctl_mask;
>  
>      api_struct.api = UFFD_API;
> -    api_struct.features = 0;
> +    api_struct.features = UFFD_FEATURE_THREAD_ID;
>      if (ioctl(ufd, UFFDIO_API, &api_struct)) {
>          error_report("postcopy_ram_supported_by_host: UFFDIO_API failed: %s",
>                       strerror(errno));

You're not actually using the 'mis' here - what I'd expected was
something that was going to check if the UFFDIO_API return said that it really
had the feature, and if so store a flag in the MIS somewhere.

Also, I'm not sure it's right to set 'api_struct.features' on the input - what
happens if this is run on an old kernel - we don't want postcopy to fail on
an old kernel without your feature.
I'm not 100% sure of the interface, but I think the way it works is you set
features = 0 before the call, and then check the api_struct.features in the
return - in the same way that I check for UFFD_FEATURE_MISSING_HUGETLBFS.
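
A rough sketch of the "check the returned features and store a flag" idea
described above. This is illustration only: EXAMPLE_FEATURE_THREAD_ID,
ExampleIncomingState and example_record_features() are hypothetical
stand-ins, not kernel or QEMU names, and the real bit value has to come from
<linux/userfaultfd.h>. It also assumes, as suggested above but not confirmed,
that the kernel reports its supported features in api_struct.features when
the ioctl returns.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for UFFD_FEATURE_THREAD_ID; the real definition
 * lives in <linux/userfaultfd.h>. */
#define EXAMPLE_FEATURE_THREAD_ID (UINT64_C(1) << 8)

/* Hypothetical slice of the incoming-migration state. */
typedef struct {
    bool have_thread_id;   /* kernel can report the faulting thread */
} ExampleIncomingState;

/*
 * Record optional features from the mask the kernel handed back.
 * 'state' may be NULL when the caller only wants the yes/no answer.
 * A missing optional feature is not treated as an error: postcopy can
 * still run, only the per-vCPU accounting is unavailable.
 */
bool example_record_features(uint64_t returned_features,
                             ExampleIncomingState *state)
{
    bool has_tid = (returned_features & EXAMPLE_FEATURE_THREAD_ID) != 0;

    if (state) {
        state->have_thread_id = has_tid;
    }
    return has_tid;
}
```

With this shape the caller would pass api_struct.features after a successful
ioctl, and an old kernel simply leaves the flag false instead of failing the
whole postcopy check.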

Dave

> @@ -113,7 +113,7 @@ static int test_range_shared(const char *block_name, void *host_addr,
>   * normally fine since if the postcopy succeeds it gets turned back on at the
>   * end.
>   */
> -bool postcopy_ram_supported_by_host(void)
> +bool postcopy_ram_supported_by_host(MigrationIncomingState *mis)
>  {
>      long pagesize = getpagesize();
>      int ufd = -1;
> @@ -136,7 +136,7 @@ bool postcopy_ram_supported_by_host(void)
>      }
>  
>      /* Version and features check */
> -    if (!ufd_version_check(ufd)) {
> +    if (!ufd_version_check(ufd, mis)) {
>          goto out;
>      }
>  
> @@ -515,7 +515,7 @@ int postcopy_ram_enable_notify(MigrationIncomingState *mis)
>       * Although the host check already tested the API, we need to
>       * do the check again as an ABI handshake on the new fd.
>       */
> -    if (!ufd_version_check(mis->userfault_fd)) {
> +    if (!ufd_version_check(mis->userfault_fd, mis)) {
>          return -1;
>      }
>  
> @@ -653,7 +653,7 @@ void *postcopy_get_tmp_page(MigrationIncomingState *mis)
>  
>  #else
>  /* No target OS support, stubs just fail */
> -bool postcopy_ram_supported_by_host(void)
> +bool postcopy_ram_supported_by_host(MigrationIncomingState *mis)
>  {
>      error_report("%s: No OS support", __func__);
>      return false;
> diff --git a/migration/savevm.c b/migration/savevm.c
> index 3b19a4a..f01e418 100644
> --- a/migration/savevm.c
> +++ b/migration/savevm.c
> @@ -1360,7 +1360,7 @@ static int loadvm_postcopy_handle_advise(MigrationIncomingState *mis)
>          return -1;
>      }
>  
> -    if (!postcopy_ram_supported_by_host()) {
> +    if (!postcopy_ram_supported_by_host(mis)) {
>          postcopy_state_set(POSTCOPY_INCOMING_NONE);
>          return -1;
>      }
> -- 
> 1.8.3.1
> 
--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK


* Re: [Qemu-devel] [PATCH 2/6] util: introduce glib-helper.c
  2017-04-14 13:17     ` [Qemu-devel] [PATCH 2/6] util: introduce glib-helper.c Alexey Perevalov
  2017-04-14 16:05       ` Philippe Mathieu-Daudé
@ 2017-04-21 10:27       ` Peter Maydell
  2017-04-21 15:10         ` Alexey
  1 sibling, 1 reply; 38+ messages in thread
From: Peter Maydell @ 2017-04-21 10:27 UTC (permalink / raw)
  To: Alexey Perevalov; +Cc: Dr. David Alan Gilbert, QEMU Developers, i.maximets

On 14 April 2017 at 14:17, Alexey Perevalov <a.perevalov@samsung.com> wrote:
> There is a lack of g_int_cmp which compares pointers value in glib,
> xen_disk.c introduced its own, so the same function now requires
> in migration.c. So logically to move it into common place.
> Futher: maybe extend glib.
>
> Also this commit moves existing glib-compat.h into util/glib
> folder for consolidation purpose.
>
> Signed-off-by: Alexey Perevalov <a.perevalov@samsung.com>

Hi; thanks for this patch. I have some comments below, mostly
aimed at improving the documentation in comments of what these
new header files and functions are for -- the bar for "how
much explanation do we need" moves up when a function is
moved from being local to a single file to being available
to all of QEMU.

> diff --git a/include/glib/glib-helper.h b/include/glib/glib-helper.h
> new file mode 100644
> index 0000000..db740fb
> --- /dev/null
> +++ b/include/glib/glib-helper.h
> @@ -0,0 +1,30 @@
> +/*
> + * Helpers for GLIB
> + *
> + * This work is licensed under the terms of the GNU GPL, version 2 or later.
> + * See the COPYING file in the top-level directory.
> + *
> + */

So glib-compat.h is for functions which exist in newer versions
of glib but not older ones. What's this header for? Ideally the
comment at the top of the file should make it clear what kinds
of functions go here rather than elsewhere.

Also, GLib is capitalized like that, and you should have a
Copyright line here.

> +
> +#ifndef QEMU_GLIB_HELPER_H
> +#define QEMU_GLIB_HELPER_H
> +
> +
> +#include "glib/glib-compat.h"

Nothing needs to include glib-compat.h directly, because osdep.h does.

> +
> +#define GPOINTER_TO_UINT64(a) ((guint64) (a))
> +
> +/*
> + * return 1 in case of a > b, -1 otherwise and 0 if equeal
> + */

Can we have a proper doc-comment-style comment, please,
since this is now a function available to all of QEMU?

> +gint g_int_cmp64(gconstpointer a, gconstpointer b,
> +        gpointer __attribute__((unused)) user_data);

What is this actually for? Looking at the original uses
I can tell that this is a GCompareDataFunc function, but
the comment should tell me that.

It also looks very fishy because the function name suggests
a 64 bit compare but gconstpointer may only be 32 bits.

I'm not sure it makes sense to specify the unused attribute
on the function prototype -- that is a property of the
implementation, not of the API exposed to callers, so it
should go on the function definition IMHO.
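
To make the points above concrete, here is a minimal sketch of a comparator
with a doc comment, a plain prototype, and the unused attribute only on the
definition. It is not the patch's code: example_uint64_cmp() is a
hypothetical name, plain C types stand in for the GLib typedefs so the
snippet is self-contained, and it assumes the keys are pointers to 64-bit
values (the patch instead stores the value in the pointer itself, which is
where the 32-bit concern comes from).

```c
#include <stdint.h>

/* Prototype as it would appear in the header: no attributes needed. */
int example_uint64_cmp(const void *a, const void *b, void *user_data);

/**
 * example_uint64_cmp:
 * @a: pointer to the first uint64_t key
 * @b: pointer to the second uint64_t key
 * @user_data: unused
 *
 * Three-way comparison with the shape of a GCompareDataFunc.
 * The keys are dereferenced, so the full 64 bits are compared even
 * on hosts where a pointer is only 32 bits wide.
 *
 * Returns: -1 if *a < *b, 0 if they are equal, 1 if *a > *b.
 */
int example_uint64_cmp(const void *a, const void *b,
                       void *user_data __attribute__((unused)))
{
    uint64_t ua = *(const uint64_t *)a;
    uint64_t ub = *(const uint64_t *)b;

    /* (ua > ub) - (ua < ub) yields 1, 0 or -1 without overflow. */
    return (ua > ub) - (ua < ub);
}
```

The __attribute__((unused)) spelling is a GCC/Clang extension, which matches
what the patch already uses.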

> +
> +/*
> + * return 1 in case of a > b, -1 otherwise and 0 if equeal
> + */

This is the same comment as above, so it doesn't explain
what the difference between the two functions is.

> +int g_int_cmp(gconstpointer a, gconstpointer b,
> +        gpointer __attribute__((unused)) user_data);
> +
> +#endif /* QEMU_GLIB_HELPER_H */
> +
> diff --git a/include/qemu/osdep.h b/include/qemu/osdep.h
> index 122ff06..36f8a89 100644
> --- a/include/qemu/osdep.h
> +++ b/include/qemu/osdep.h
> @@ -104,7 +104,7 @@ extern int daemon(int, int);
>  #include "sysemu/os-posix.h"
>  #endif
>
> -#include "glib-compat.h"
> +#include "glib/glib-compat.h"
>  #include "qemu/typedefs.h"
>
>  #ifndef O_LARGEFILE
> diff --git a/linux-user/main.c b/linux-user/main.c
> index 10a3bb3..7cea6bc 100644
> --- a/linux-user/main.c
> +++ b/linux-user/main.c
> @@ -35,7 +35,7 @@
>  #include "elf.h"
>  #include "exec/log.h"
>  #include "trace/control.h"
> -#include "glib-compat.h"
> +#include "glib/glib-compat.h"

osdep.h includes glib-compat.h so we should just delete the #include,
not change it.

This patch looks like it will break bsd-user compiles, because
bsd-user/main.c has the same unnecessary glib-compat.h include
and the patch doesn't change or delete it.

>
>  char *exec_path;
>
> diff --git a/scripts/clean-includes b/scripts/clean-includes
> index dd938da..b32b928 100755
> --- a/scripts/clean-includes
> +++ b/scripts/clean-includes
> @@ -123,7 +123,7 @@ for f in "$@"; do
>        ;;
>      *include/qemu/osdep.h | \
>      *include/qemu/compiler.h | \
> -    *include/glib-compat.h | \
> +    *include/glib/glib-compat.h | \
>      *include/sysemu/os-posix.h | \
>      *include/sysemu/os-win32.h | \
>      *include/standard-headers/ )
> diff --git a/util/Makefile.objs b/util/Makefile.objs
> index c6205eb..0080712 100644
> --- a/util/Makefile.objs
> +++ b/util/Makefile.objs
> @@ -43,3 +43,4 @@ util-obj-y += qdist.o
>  util-obj-y += qht.o
>  util-obj-y += range.o
>  util-obj-y += systemd.o
> +util-obj-y += glib-helper.o
> diff --git a/util/glib-helper.c b/util/glib-helper.c
> new file mode 100644
> index 0000000..2557009
> --- /dev/null
> +++ b/util/glib-helper.c
> @@ -0,0 +1,29 @@
> +/*
> + * Implementation for GLIB helpers
> + * this file is intented to commulate and later reuse
> + * additional glib functions

Did you mean "accumulate" ?

A more detailed description of which functions live in this
file would be useful -- these aren't actually GLib
functions, just utility routines that are useful to
code which uses GLib, as far as I can tell.

> + *
> + * This work is licensed under the terms of the GNU GPL, version 2 or later.
> + * See the COPYING file in the top-level directory.
> +

Stray blank line.

> + */

This is also missing the copyright line.

> +
> +#include "glib/glib-helper.h"

Every C file should start by including "qemu/osdep.h" as the
first thing it does.

> +
> +gint g_int_cmp64(gconstpointer a, gconstpointer b,
> +        gpointer __attribute__((unused)) user_data)
> +{
> +    guint64 ua = GPOINTER_TO_UINT64(a);
> +    guint64 ub = GPOINTER_TO_UINT64(b);
> +    return (ua > ub) - (ua < ub);
> +}
> +
> +/*
> + * return 1 in case of a > b, -1 otherwise and 0 if equeal
> + */
> +gint g_int_cmp(gconstpointer a, gconstpointer b,
> +        gpointer __attribute__((unused)) user_data)
> +{
> +    return g_int_cmp64(a, b, user_data);
> +}
> +
> --
> 1.8.3.1
>
>

thanks
-- PMM


* Re: [Qemu-devel] [PATCH 4/6] migration: calculate downtime on dst side
  2017-04-14 13:17     ` [Qemu-devel] [PATCH 4/6] migration: calculate downtime on dst side Alexey Perevalov
@ 2017-04-21 12:00       ` Dr. David Alan Gilbert
  2017-04-21 18:47         ` Alexey
  2017-04-22  9:49         ` [Qemu-devel] [PATCH 4/6] migration: calculate downtime on dst side (CPUMASK) Alexey
  2017-04-25  8:24       ` [Qemu-devel] [PATCH 4/6] migration: calculate downtime on dst side Peter Xu
  1 sibling, 2 replies; 38+ messages in thread
From: Dr. David Alan Gilbert @ 2017-04-21 12:00 UTC (permalink / raw)
  To: Alexey Perevalov; +Cc: qemu-devel, i.maximets

* Alexey Perevalov (a.perevalov@samsung.com) wrote:
> This patch provides downtime calculation per vCPU,
> as a summary and as a overlapped value for all vCPUs.
> 
> This approach just keeps tree with page fault addr as a key,
> and t1-t2 interval of pagefault time and page copy time, with
> affected vCPU bit mask.
> For more implementation details please see comment to
> get_postcopy_total_downtime function.
> 
> Signed-off-by: Alexey Perevalov <a.perevalov@samsung.com>
> ---
>  include/migration/migration.h |  14 +++
>  migration/migration.c         | 280 +++++++++++++++++++++++++++++++++++++++++-
>  migration/postcopy-ram.c      |  24 +++-
>  migration/qemu-file.c         |   1 -
>  migration/trace-events        |   9 +-
>  5 files changed, 323 insertions(+), 5 deletions(-)
> 
> diff --git a/include/migration/migration.h b/include/migration/migration.h
> index 5720c88..5d2c628 100644
> --- a/include/migration/migration.h
> +++ b/include/migration/migration.h
> @@ -123,10 +123,24 @@ struct MigrationIncomingState {
>  
>      /* See savevm.c */
>      LoadStateEntry_Head loadvm_handlers;
> +
> +    /*
> +     *  Tree for keeping postcopy downtime,
> +     *  necessary to calculate correct downtime, during multiple
> +     *  vm suspends, it keeps host page address as a key and
> +     *  DowntimeDuration as a data
> +     *  NULL means kernel couldn't provide process thread id,
> +     *  and QEMU couldn't identify which vCPU raise page fault
> +     */
> +    GTree *postcopy_downtime;
>  };
>  
>  MigrationIncomingState *migration_incoming_get_current(void);
>  void migration_incoming_state_destroy(void);
> +void mark_postcopy_downtime_begin(uint64_t addr, int cpu);
> +void mark_postcopy_downtime_end(uint64_t addr);
> +uint64_t get_postcopy_total_downtime(void);
> +void destroy_downtime_duration(gpointer data);
>  
>  /*
>   * An outstanding page request, on the source, having been received
> diff --git a/migration/migration.c b/migration/migration.c
> index 79f6425..5bac434 100644
> --- a/migration/migration.c
> +++ b/migration/migration.c
> @@ -38,6 +38,8 @@
>  #include "io/channel-tls.h"
>  #include "migration/colo.h"
>  
> +#define DEBUG_VCPU_DOWNTIME 1
> +
>  #define MAX_THROTTLE  (32 << 20)      /* Migration transfer speed throttling */
>  
>  /* Amount of time to allocate to each "chunk" of bandwidth-throttled
> @@ -77,6 +79,19 @@ static NotifierList migration_state_notifiers =
>  
>  static bool deferred_incoming;
>  
> +typedef struct {
> +    int64_t begin;
> +    int64_t end;
> +    uint64_t *cpus; /* cpus bit mask array, QEMU bit functions support
> +     bit operation on memory regions, but doesn't check out of range */
> +} DowntimeDuration;
> +
> +typedef struct {
> +    int64_t tp; /* point in time */
> +    bool is_end;
> +    uint64_t *cpus;
> +} OverlapDowntime;
> +
>  /*
>   * Current state of incoming postcopy; note this is not part of
>   * MigrationIncomingState since it's state is used during cleanup
> @@ -117,6 +132,13 @@ MigrationState *migrate_get_current(void)
>      return &current_migration;
>  }
>  
> +void destroy_downtime_duration(gpointer data)
> +{
> +    DowntimeDuration *dd = (DowntimeDuration *)data;
> +    g_free(dd->cpus);
> +    g_free(data);
> +}
> +
>  MigrationIncomingState *migration_incoming_get_current(void)
>  {
>      static bool once;
> @@ -138,10 +160,13 @@ void migration_incoming_state_destroy(void)
>      struct MigrationIncomingState *mis = migration_incoming_get_current();
>  
>      qemu_event_destroy(&mis->main_thread_load_event);
> +    if (mis->postcopy_downtime) {
> +        g_tree_destroy(mis->postcopy_downtime);
> +        mis->postcopy_downtime = NULL;
> +    }
>      loadvm_free_handlers(mis);
>  }
>  
> -
>  typedef struct {
>      bool optional;
>      uint32_t size;
> @@ -1754,7 +1779,6 @@ static int postcopy_start(MigrationState *ms, bool *old_vm_running)
>       */
>      ms->postcopy_after_devices = true;
>      notifier_list_notify(&migration_state_notifiers, ms);
> -

Stray deletion

>      ms->downtime =  qemu_clock_get_ms(QEMU_CLOCK_REALTIME) - time_at_stop;
>  
>      qemu_mutex_unlock_iothread();
> @@ -2117,3 +2141,255 @@ PostcopyState postcopy_state_set(PostcopyState new_state)
>      return atomic_xchg(&incoming_postcopy_state, new_state);
>  }
>  
> +#define SIZE_TO_KEEP_CPUBITS (1 + smp_cpus/sizeof(guint64))

Split out your cpu-sets so that you have an 'alloc_cpu_set',
a 'set bit', a 'set all bits', a 'dup', etc.
(I see Linux has cpumask.h that has a 'cpu_set' that's
basically the same thing, but we need something reasonably portable.)
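
As a sketch of what such a split-out helper set might look like (every name
below is hypothetical, nothing here is an existing QEMU or GLib API, and
plain libc allocation is used only to keep the snippet self-contained):

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical CPU-set type: a bit count plus the bitmap words. */
typedef struct {
    unsigned int ncpus;   /* number of valid bits */
    uint64_t words[];     /* bit i of the set lives in words[i / 64] */
} ExampleCpuSet;

static size_t example_cpu_set_nwords(unsigned int ncpus)
{
    return ((size_t)ncpus + 63) / 64;   /* round up to whole 64-bit words */
}

/* Allocate an empty set for 'ncpus' CPUs; NULL on allocation failure. */
ExampleCpuSet *example_cpu_set_alloc(unsigned int ncpus)
{
    size_t bytes = sizeof(ExampleCpuSet)
                   + example_cpu_set_nwords(ncpus) * sizeof(uint64_t);
    ExampleCpuSet *set = calloc(1, bytes);

    if (set) {
        set->ncpus = ncpus;
    }
    return set;
}

void example_cpu_set_set(ExampleCpuSet *set, unsigned int cpu)
{
    assert(cpu < set->ncpus);           /* catch out-of-range CPU indexes */
    set->words[cpu / 64] |= UINT64_C(1) << (cpu % 64);
}

bool example_cpu_set_test(const ExampleCpuSet *set, unsigned int cpu)
{
    assert(cpu < set->ncpus);
    return (set->words[cpu / 64] >> (cpu % 64)) & 1;
}

/* Set only the valid bits, so padding bits in the last word stay clear. */
void example_cpu_set_set_all(ExampleCpuSet *set)
{
    unsigned int cpu;

    for (cpu = 0; cpu < set->ncpus; cpu++) {
        example_cpu_set_set(set, cpu);
    }
}

/* Return an independent copy of 'src'; NULL on allocation failure. */
ExampleCpuSet *example_cpu_set_dup(const ExampleCpuSet *src)
{
    ExampleCpuSet *copy = example_cpu_set_alloc(src->ncpus);

    if (copy) {
        memcpy(copy->words, src->words,
               example_cpu_set_nwords(src->ncpus) * sizeof(uint64_t));
    }
    return copy;
}
```

Because the set is a single allocation, a plain free() releases it. The word
count is derived from the number of bits (64 per word) rather than from
sizeof(), so the sizing is explicit.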

> +void mark_postcopy_downtime_begin(uint64_t addr, int cpu)
> +{
> +    MigrationIncomingState *mis = migration_incoming_get_current();
> +    DowntimeDuration *dd;
> +    if (!mis->postcopy_downtime) {
> +        return;
> +    }
> +
> +    dd = g_tree_lookup(mis->postcopy_downtime, (gpointer)addr); /* !!! cast */
> +    if (!dd) {
> +        dd = (DowntimeDuration *)g_new0(DowntimeDuration, 1);
> +        dd->cpus = g_new0(guint64, SIZE_TO_KEEP_CPUBITS);
> +        g_tree_insert(mis->postcopy_downtime, (gpointer)addr, (gpointer)dd);
> +    }
> +
> +    if (cpu < 0) {
> +        /* assume in this situation all vCPUs are sleeping */
> +        int i;
> +        for (i = 0; i < SIZE_TO_KEEP_CPUBITS; i++) {
> +            dd->cpus[i] = ~(uint64_t)0u;
> +        }
> +    } else
> +        set_bit(cpu, dd->cpus);

QEMU coding style: use {} even on one-line blocks.

> +
> +    /*
> +     *  overwrite previously set dd->begin, if that page already was
> +     *     faulted on another cpu
> +     */
> +    dd->begin = qemu_clock_get_ms(QEMU_CLOCK_REALTIME);

OK, so this is making a decision that needs to be documented:
if one CPU was already paused at time (a), and we then see a second
CPU paused at time (b), the time we record only starts at (b) and
ignores the time from a..b - is that the way you want to do it?
As I say, it should be documented somewhere; it's probably worth
adding something to docs/migration.txt about how this measurement works.


> +    trace_mark_postcopy_downtime_begin(addr, dd, dd->begin, cpu);
> +}
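
On the question above of which start time to record when a second vCPU
faults on a page that is already outstanding, one alternative policy is to
keep the earliest fault time rather than overwrite it. The sketch below only
illustrates that alternative; it is not what the patch does, and
ExampleDowntime and example_mark_begin() are hypothetical names.

```c
#include <stdint.h>

/* Hypothetical, cut-down version of the per-page record. */
typedef struct {
    int64_t begin;   /* 0 means "no fault recorded yet" */
    int64_t end;
} ExampleDowntime;

/* Record a fault at 'now_ms', keeping the earliest start seen so far. */
void example_mark_begin(ExampleDowntime *dd, int64_t now_ms)
{
    if (dd->begin == 0 || now_ms < dd->begin) {
        dd->begin = now_ms;
    }
}
```

Neither policy is exact with a single shared start time: overwriting
under-counts the first vCPU, while keeping the earliest over-counts the
second. Whichever is chosen, that trade-off is what the documentation would
need to state.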
> +
> +void mark_postcopy_downtime_end(uint64_t addr)
> +{
> +    MigrationIncomingState *mis = migration_incoming_get_current();
> +    DowntimeDuration *dd;
> +    if (!mis->postcopy_downtime) {
> +        return;
> +    }
> +
> +    dd = g_tree_lookup(mis->postcopy_downtime, (gpointer)addr);
> +    if (!dd) {
> +        /* error_report("Could not populate downtime duration completion time \n\
> +                        There is no downtime duration for 0x%"PRIx64, addr); */

Error or no error - decide! Is this happening for pages that arrive before
they've been requested?

> +        return;
> +    }
> +
> +    dd->end = qemu_clock_get_ms(QEMU_CLOCK_REALTIME);
> +    trace_mark_postcopy_downtime_end(addr, dd, dd->end);
> +}
> +
> +struct downtime_overlay_cxt {
> +    GPtrArray *downtime_points;
> +    size_t number_of_points;
> +};

Why 'cxt'? If you mean it as an abbreviation of 'context', then we normally use ctxt.

> +/*
> + * This function split each DowntimeDuration, which represents as start/end
> + * pointand makes a points of it, then fill array with points,
> + * to sort it in future.
> + */
> +static gboolean split_duration_and_fill_points(gpointer key, gpointer value,
> +                                        gpointer data)
> +{
> +    struct downtime_overlay_cxt *ctx = (struct downtime_overlay_cxt *)data;
> +    DowntimeDuration *dd = (DowntimeDuration *)value;
> +    GPtrArray *interval = ctx->downtime_points;
> +    if (dd->begin) {
> +        OverlapDowntime *od_begin = g_new0(OverlapDowntime, 1);
> +        od_begin->cpus = g_memdup(dd->cpus, sizeof(uint64_t) * SIZE_TO_KEEP_CPUBITS);
> +        od_begin->tp = dd->begin;
> +        od_begin->is_end = false;
> +        g_ptr_array_add(interval, od_begin);
> +        ctx->number_of_points += 1;
> +    }
> +
> +    if (dd->end) {
> +        OverlapDowntime *od_end = g_new0(OverlapDowntime, 1);
> +        od_end->cpus = g_memdup(dd->cpus, sizeof(uint64_t) * SIZE_TO_KEEP_CPUBITS);
> +        od_end->tp = dd->end;
> +        od_end->is_end = true;
> +        g_ptr_array_add(interval, od_end);
> +        ctx->number_of_points += 1;
> +    }
> +
> +    if (dd->end && dd->begin)
> +        trace_split_duration_and_fill_points(dd->end - dd->begin, (uint64_t)key);

again, need {}'s

> +    return FALSE;
> +}
> +
> +#ifdef DEBUG_VCPU_DOWNTIME
> +static gboolean calculate_per_cpu(gpointer key, gpointer value,
> +                                  gpointer data)
> +{
> +    int *downtime_cpu = (int *)data;
> +    DowntimeDuration *dd = (DowntimeDuration *)value;
> +    int cpu_iter;
> +    for (cpu_iter = 0; cpu_iter < smp_cpus; cpu_iter++) {
> +        if (test_bit(cpu_iter, dd->cpus) && dd->end && dd->begin)
> +            downtime_cpu[cpu_iter] += dd->end - dd->begin;
> +    }
> +    return FALSE;
> +}
> +#endif /* DEBUG_VCPU_DOWNTIME */
> +
> +static gint compare_downtime(gconstpointer a, gconstpointer b)
> +{
> +    DowntimeDuration *dda = (DowntimeDuration *)a;
> +    DowntimeDuration *ddb = (DowntimeDuration *)b;
> +    return dda->begin - ddb->begin;
> +}
> +
> +static void destroy_overlap_downtime(gpointer data)
> +{
> +    OverlapDowntime *od = (OverlapDowntime *)data;
> +    g_free(od->cpus);
> +    g_free(data);
> +}
> +
> +static int check_overlap(uint64_t *b)
> +{
> +    unsigned long zero_bit = find_first_zero_bit(b, BITS_PER_LONG * SIZE_TO_KEEP_CPUBITS);

Line's too long.

> +    return zero_bit >= smp_cpus;

So this is really 'all cpus are blocked'?

> +}
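
If check_overlap() above really does mean "all vCPUs are blocked", a name
and body that say so directly might look like the sketch below. The function
name is hypothetical, and the explicit loop is just one way to write it
without depending on the width of 'long'.

```c
#include <stdbool.h>
#include <stdint.h>

/* True when bits 0..ncpus-1 of 'mask' are all set, i.e. every vCPU waits. */
bool example_all_cpus_blocked(const uint64_t *mask, unsigned int ncpus)
{
    unsigned int cpu;

    for (cpu = 0; cpu < ncpus; cpu++) {
        if (!(mask[cpu / 64] & (UINT64_C(1) << (cpu % 64)))) {
            return false;   /* found a vCPU that is still running */
        }
    }
    return true;
}
```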
> +
> +/*
> + * This function calculates downtime per cpu and trace it
> + *
> + *  Also it calculates total downtime as an interval's overlap,
> + *  for many vCPU.
> + *
> + *  The approach is following:
> + *  Initially intervals are represented in tree where key is
> + *  pagefault address, and values:
> + *   begin - page fault time
> + *   end   - page load time
> + *   cpus  - bit mask shows affected cpus
> + *
> + *  To calculate overlap on all cpus, intervals converted into
> + *  array of points in time (downtime_points), the size of
> + *  array is 2 * number of nodes in tree of intervals (2 array
> + *  elements per one in element of interval).
> + *  Each element is marked as end (E) or as start (S) of interval.
> + *  The overlap downtime will be calculated for SE, only in case
> + *  there is sequence S(0..N)E(M) for every vCPU.
> + *
> + * As example we have 3 CPU
> + *
> + *      S1        E1           S1               E1
> + * -----***********------------xxx***************------------------------> CPU1
> + *
> + *             S2                E2
> + * ------------****************xxx---------------------------------------> CPU2
> + *
> + *                         S3            E3
> + * ------------------------****xxx********-------------------------------> CPU3
> + *
> + * We have sequence S1,S2,E1,S3,S1,E2,E3,E1
> + * S2,E1 - doesn't match condition due to sequence S1,S2,E1 doesn't include CPU3
> + * S3,S1,E2 - sequenece includes all CPUs, in this case overlap will be S1,E2
                       ^ typo

> + * Legend of picture is following: * - means downtime per vCPU
> + *                                 x - means overlapped downtime
> + */
> +uint64_t get_postcopy_total_downtime(void)
> +{
> +    MigrationIncomingState *mis = migration_incoming_get_current();
> +    uint64_t total_downtime = 0; /* for total overlapped downtime */
> +    const int intervals = g_tree_nnodes(mis->postcopy_downtime);
> +    int point_iter, start_point_iter, i;
> +    struct downtime_overlay_cxt dp_ctx = { 0 };
> +    /*
> +     * array will contain 2 * interval points or less, if
> +     * it was not page fault finalization for page,
> +     * real count will be in ctx.number_of_points
> +     */
> +    dp_ctx.downtime_points = g_ptr_array_new_full(2 * intervals,
> +                                                     destroy_overlap_downtime);

Is the g_ptr_array giving you anything here over a plain-old C array of pointers?
You're not dynamically growing it.

> +    if (!mis->postcopy_downtime) {
> +        goto out;
> +    }
> +
> +#ifdef DEBUG_VCPU_DOWNTIME
> +    {
> +        gint *downtime_cpu = g_new0(int, smp_cpus);
> +        g_tree_foreach(mis->postcopy_downtime, calculate_per_cpu, downtime_cpu);
> +        for (point_iter = 0; point_iter < smp_cpus; point_iter++)
> +        {
> +            trace_downtime_per_cpu(point_iter, downtime_cpu[point_iter]);
> +        }
> +        g_free(downtime_cpu);
> +    }
> +#endif /* DEBUG_VCPU_DOWNTIME */

You might want to make that:
  if (TRACE_DOWNTIME_PER_CPU_ENABLED) {
  }

and remove the ifdef.

> +    /* make downtime points S/E from interval */
> +    g_tree_foreach(mis->postcopy_downtime, split_duration_and_fill_points,
> +                   &dp_ctx);
> +    g_ptr_array_sort(dp_ctx.downtime_points, compare_downtime);
> +
> +    for (point_iter = 1; point_iter < dp_ctx.number_of_points;
> +         point_iter++) {
> +        OverlapDowntime *od = g_ptr_array_index(dp_ctx.downtime_points,
> +                point_iter);
> +        uint64_t *cur_cpus;
> +        int smp_cpus_i = smp_cpus;
> +        OverlapDowntime *prev_od = g_ptr_array_index(dp_ctx.downtime_points,
> +                                                     point_iter - 1);
> +        if (!od || !prev_od)
> +            continue;

Why would that happen?

> +        /* we need sequence SE */
> +        if (!od->is_end || prev_od->is_end)
> +            continue;
> +
> +        cur_cpus = g_memdup(od->cpus, sizeof(uint64_t) * SIZE_TO_KEEP_CPUBITS);
> +        for (start_point_iter = point_iter - 1;
> +             start_point_iter >= 0 && smp_cpus_i;
> +             start_point_iter--, smp_cpus_i--) {

I think I see what you're doing in this loop, although it's a bit hairy;
I don't understand why we need to get prev_od if this loop is searching
backwards?

> +            OverlapDowntime *t_od = g_ptr_array_index(dp_ctx.downtime_points,
> +                                                      start_point_iter);
> +            if (!t_od)
> +                break;
> +            /* should be S */
> +            if (t_od->is_end)
> +                break;
> +
> +            /* points were sorted, it's possible when
> +             * end is not occured, but this points were ommited
> +             * in split_duration_and_fill_points */
> +            if (od->tp <= prev_od->tp) {

Why is this checking od and prev_od in this loop - isn't this
loop mainly t_od ?

> +                break;
> +            }
> +
> +            for (i = 0; i < SIZE_TO_KEEP_CPUBITS; i++) {
> +                cur_cpus[i] |= t_od->cpus[i];
> +            }
> +
> +            /* check_overlap - just count number of bits in cur_cpus,
> +             * and compare it with smp_cpus */
> +            if (check_overlap(cur_cpus)) {
> +                total_downtime += od->tp - prev_od->tp;
> +                /* situation when one S point represents all vCPU is possible */
> +                break;
> +            }
> +        }
> +        g_free(cur_cpus);
> +    }
> +    trace_get_postcopy_total_downtime(g_tree_nnodes(mis->postcopy_downtime),
> +        total_downtime);
> +out:
> +    g_ptr_array_free(dp_ctx.downtime_points, TRUE);
> +    return total_downtime;
> +}
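
For comparison, the 'all vCPUs blocked' time can also be computed with a
single sweep over the sorted S/E points, keeping a per-vCPU count of open
faults. This is only a sketch under my own assumptions: the names (Point,
total_overlap, MAX_CPUS) are made up, each point is attributed to exactly
one vCPU, and every S has a matching E.

```c
#include <stdint.h>
#include <stdlib.h>

#define MAX_CPUS 64 /* illustrative limit, not a QEMU constant */

typedef struct {
    int64_t tp;  /* point in time */
    int is_end;  /* 0 = fault start (S), 1 = page placed (E) */
    int cpu;     /* vCPU index the fault belongs to */
} Point;

/* Sort by time; at equal times put S before E so counts never go negative */
static int cmp_point(const void *a, const void *b)
{
    const Point *pa = a, *pb = b;

    if (pa->tp != pb->tp) {
        return (pa->tp > pb->tp) - (pa->tp < pb->tp);
    }
    return pa->is_end - pb->is_end;
}

/* Sum of the time spans during which all ncpus vCPUs have an open fault */
static int64_t total_overlap(Point *pts, size_t n, int ncpus)
{
    int open[MAX_CPUS] = { 0 }; /* open faults per vCPU */
    int blocked = 0;            /* vCPUs with at least one open fault */
    int64_t total = 0;
    size_t i;

    qsort(pts, n, sizeof(*pts), cmp_point);
    for (i = 0; i < n; i++) {
        /* the span since the previous point counts if everyone was blocked */
        if (i > 0 && blocked == ncpus) {
            total += pts[i].tp - pts[i - 1].tp;
        }
        if (pts[i].is_end) {
            if (--open[pts[i].cpu] == 0) {
                blocked--;
            }
        } else {
            if (open[pts[i].cpu]++ == 0) {
                blocked++;
            }
        }
    }
    return total;
}
```

With three vCPUs laid out like the picture in the comment, only the span
between the second S1 and E2 is counted.
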
> diff --git a/migration/postcopy-ram.c b/migration/postcopy-ram.c
> index 70f0480..ea89f4e 100644
> --- a/migration/postcopy-ram.c
> +++ b/migration/postcopy-ram.c
> @@ -23,8 +23,10 @@
>  #include "migration/postcopy-ram.h"
>  #include "sysemu/sysemu.h"
>  #include "sysemu/balloon.h"
> +#include <sys/param.h>
>  #include "qemu/error-report.h"
>  #include "trace.h"
> +#include "glib/glib-helper.h"
>  
>  /* Arbitrary limit on size of each discard command,
>   * keeps them around ~200 bytes
> @@ -81,6 +83,11 @@ static bool ufd_version_check(int ufd, MigrationIncomingState *mis)
>          return false;
>      }
>  
> +    if (mis && UFFD_FEATURE_THREAD_ID & api_struct.features) {

That's a very weird way of writing that test!  Also, I think you need
to still make this user-selectable given the complexity/cost.
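
The test does parse as intended, since & binds tighter than &&, but putting
the variable first and adding parentheses reads more naturally. Something
along these lines, as a sketch only: FEATURE_THREAD_ID is a stand-in for
UFFD_FEATURE_THREAD_ID, and want_downtime is a hypothetical user-selectable
switch that doesn't exist in the patch.

```c
#include <stdbool.h>
#include <stdint.h>

/* stand-in; the real constant comes from linux/userfaultfd.h */
#define FEATURE_THREAD_ID (1u << 8)

/*
 * Track downtime only when we have an incoming state, the user asked
 * for it, and the kernel reported the thread-id feature.
 */
static bool should_track_downtime(const void *mis, uint64_t features,
                                  bool want_downtime)
{
    return mis && want_downtime && (features & FEATURE_THREAD_ID);
}
```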

> +        mis->postcopy_downtime = g_tree_new_full(g_int_cmp64,
> +                                         NULL, NULL, destroy_downtime_duration);
> +    }
> +
>      if (getpagesize() != ram_pagesize_summary()) {
>          bool have_hp = false;
>          /* We've got a huge page */
> @@ -404,6 +411,18 @@ static int ram_block_enable_notify(const char *block_name, void *host_addr,
>      return 0;
>  }
>  
> +static int get_mem_fault_cpu_index(uint32_t pid)
> +{
> +    CPUState *cpu_iter;
> +
> +    CPU_FOREACH(cpu_iter) {
> +        if (cpu_iter->thread_id == pid)
> +           return cpu_iter->cpu_index;
> +    }
> +    trace_get_mem_fault_cpu_index(pid);
> +    return -1;
> +}
> +
>  /*
>   * Handle faults detected by the USERFAULT markings
>   */
> @@ -481,8 +500,10 @@ static void *postcopy_ram_fault_thread(void *opaque)
>          rb_offset &= ~(qemu_ram_pagesize(rb) - 1);
>          trace_postcopy_ram_fault_thread_request(msg.arg.pagefault.address,
>                                                  qemu_ram_get_idstr(rb),
> -                                                rb_offset);
> +                                                rb_offset, msg.arg.pagefault.feat.ptid);

Line length!

>  
> +        mark_postcopy_downtime_begin(msg.arg.pagefault.address,
> +                            get_mem_fault_cpu_index(msg.arg.pagefault.feat.ptid));
>          /*
>           * Send the request to the source - we want to request one
>           * of our host page sizes (which is >= TPS)
> @@ -577,6 +598,7 @@ int postcopy_place_page(MigrationIncomingState *mis, void *host, void *from,
>  
>          return -e;
>      }
> +    mark_postcopy_downtime_end((uint64_t)host);
>  
>      trace_postcopy_place_page(host);
>      return 0;
> diff --git a/migration/qemu-file.c b/migration/qemu-file.c
> index 195fa94..c9f3e47 100644
> --- a/migration/qemu-file.c
> +++ b/migration/qemu-file.c
> @@ -547,7 +547,6 @@ size_t qemu_get_buffer_in_place(QEMUFile *f, uint8_t **buf, size_t size)
>  int qemu_peek_byte(QEMUFile *f, int offset)
>  {
>      int index = f->buf_index + offset;
> -

Stray!

>      assert(!qemu_file_is_writable(f));
>      assert(offset < IO_BUF_SIZE);
>  
> diff --git a/migration/trace-events b/migration/trace-events
> index 7372ce2..ab2e1e4 100644
> --- a/migration/trace-events
> +++ b/migration/trace-events
> @@ -110,6 +110,12 @@ process_incoming_migration_co_end(int ret, int ps) "ret=%d postcopy-state=%d"
>  process_incoming_migration_co_postcopy_end_main(void) ""
>  migration_set_incoming_channel(void *ioc, const char *ioctype) "ioc=%p ioctype=%s"
>  migration_set_outgoing_channel(void *ioc, const char *ioctype, const char *hostname)  "ioc=%p ioctype=%s hostname=%s"
> +mark_postcopy_downtime_begin(uint64_t addr, void *dd, int64_t time, int cpu) "addr 0x%" PRIx64 " dd %p time %" PRId64 " cpu %d"
> +mark_postcopy_downtime_end(uint64_t addr, void *dd, int64_t time) "addr 0x%" PRIx64 " dd %p time %" PRId64
> +get_postcopy_total_downtime(int num, uint64_t total) "faults %d, total downtime %" PRIu64
> +split_duration_and_fill_points(int64_t downtime, uint64_t addr) "downtime %" PRId64 " addr 0x%" PRIx64
> +downtime_per_cpu(int cpu_index, int downtime) "downtime cpu[%d]=%d"
> +source_return_path_thread_downtime(uint64_t downtime) "downtime %" PRIu64
>  
>  # migration/rdma.c
>  qemu_rdma_accept_incoming_migration(void) ""
> @@ -186,7 +192,7 @@ postcopy_ram_enable_notify(void) ""
>  postcopy_ram_fault_thread_entry(void) ""
>  postcopy_ram_fault_thread_exit(void) ""
>  postcopy_ram_fault_thread_quit(void) ""
> -postcopy_ram_fault_thread_request(uint64_t hostaddr, const char *ramblock, size_t offset) "Request for HVA=%" PRIx64 " rb=%s offset=%zx"
> +postcopy_ram_fault_thread_request(uint64_t hostaddr, const char *ramblock, size_t offset, int pid) "Request for HVA=%" PRIx64 " rb=%s offset=%zx %d"
>  postcopy_ram_incoming_cleanup_closeuf(void) ""
>  postcopy_ram_incoming_cleanup_entry(void) ""
>  postcopy_ram_incoming_cleanup_exit(void) ""
> @@ -195,6 +201,7 @@ save_xbzrle_page_skipping(void) ""
>  save_xbzrle_page_overflow(void) ""
>  ram_save_iterate_big_wait(uint64_t milliconds, int iterations) "big wait: %" PRIu64 " milliseconds, %d iterations"
>  ram_load_complete(int ret, uint64_t seq_iter) "exit_code %d seq iteration %" PRIu64
> +get_mem_fault_cpu_index(uint32_t pid) "pid %u is not vCPU"
>  
>  # migration/exec.c
>  migration_exec_outgoing(const char *cmd) "cmd=%s"
> -- 
> 1.8.3.1
> 

Dave

--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK


* Re: [Qemu-devel] [PATCH 2/6] util: introduce glib-helper.c
  2017-04-21 10:27       ` Peter Maydell
@ 2017-04-21 15:10         ` Alexey
  2017-04-21 15:49           ` Peter Maydell
  0 siblings, 1 reply; 38+ messages in thread
From: Alexey @ 2017-04-21 15:10 UTC (permalink / raw)
  To: Peter Maydell; +Cc: i.maximets, Dr. David Alan Gilbert, QEMU Developers

Hello, thank you for such detailed comments,

On Fri, Apr 21, 2017 at 11:27:55AM +0100, Peter Maydell wrote:
> On 14 April 2017 at 14:17, Alexey Perevalov <a.perevalov@samsung.com> wrote:
> > There is a lack of g_int_cmp which compares pointers value in glib,
> > xen_disk.c introduced its own, so the same function now requires
> > in migration.c. So logically to move it into common place.
> > Futher: maybe extend glib.
> >
> > Also this commit moves existing glib-compat.h into util/glib
> > folder for consolidation purpose.
> >
> > Signed-off-by: Alexey Perevalov <a.perevalov@samsung.com>
> 
> Hi; thanks for this patch. I have some comments below, mostly
> aimed at improving the documentation in comments of what these
> new header files and functions are for -- the bar for "how
> much explanation do we need" moves up when a function is
> moved from being local to a single file to being available
> to all of QEMU.
> 
> > diff --git a/include/glib/glib-helper.h b/include/glib/glib-helper.h
> > new file mode 100644
> > index 0000000..db740fb
> > --- /dev/null
> > +++ b/include/glib/glib-helper.h
> > @@ -0,0 +1,30 @@
> > +/*
> > + * Helpers for GLIB
> > + *
> > + * This work is licensed under the terms of the GNU GPL, version 2 or later.
> > + * See the COPYING file in the top-level directory.
> > + *
> > + */
> 
> So glib-compat.h is for functions which exist in newer versions
> of glib but not older ones. What's this header for? Ideally the
> comment at the top of the file should make it clear what kinds
> of functions go here rather than elsewhere.
> 
> Also, GLib is capitalized like that, and you should have a
> Copyright line here.
> 
> > +
> > +#ifndef QEMU_GLIB_HELPER_H
> > +#define QEMU_GLIB_HELPER_H
> > +
> > +
> > +#include "glib/glib-compat.h"
> 
> Nothing needs to include glib-compat.h directly, because osdep.h does.
> 
> > +
> > +#define GPOINTER_TO_UINT64(a) ((guint64) (a))
> > +
> > +/*
> > + * return 1 in case of a > b, -1 otherwise and 0 if equeal
> > + */
> 
> Can we have a proper doc comment format comment, please,
> since this is now a function available to all of QEMU?
> 
> > +gint g_int_cmp64(gconstpointer a, gconstpointer b,
> > +        gpointer __attribute__((unused)) user_data);
> 
> What is this actually for? Looking at the original uses
> I can tell that this is a GCompareDataFunc function, but
> the comment should tell me that.
I looked at the comments on other functions in QEMU and didn't find
a common style, so I decided to keep it as is. Maybe I missed some
best practice here.


> 
> It also looks very fishy because the function name suggests
> a 64 bit compare but gconstpointer may only be 32 bits.
> 
> I'm not sure it makes sense to specify the unused attribute
> on the function prototype -- that is a property of the
> implementation, not of the API exposed to callers, so it
> should go on the function definition IMHO.
> 
> > +
> > +/*
> > + * return 1 in case of a > b, -1 otherwise and 0 if equeal
> > + */
> 
> This is the same comment as above, so it doesn't explain
> what the difference between the two functions is.
> 

Yes, it was copy-pasted.
After a mingw build check, I'm now thinking of using intptr_t as the type
for the comparison in this function, or even keeping gpointer and merging
these two functions into a single _direct_ one.
I saw that intptr_t is widely used in QEMU.

The intent of this function was to be a comparator for the case where client
code wants to keep integers in a pointer field. xen_disk.c uses UINT32, so it
wasn't a problem, but migration uses a 64-bit address (the kernel provides it
as __u64, i.e. unsigned long long), so on a 32-bit platform it's a problem.
Fortunately the userfaultfd handler is Linux-specific code,
and I'm going to keep just a cast there, like GUINT_TO_POINTER:

#define GUINT_TO_POINTER(u)     ((gpointer) ${glib_gpui_cast} (u))

On 64-bit architectures glib_gpui_cast is a 64-bit unsigned type
(gulong or guint64).

> > +int g_int_cmp(gconstpointer a, gconstpointer b,
> > +        gpointer __attribute__((unused)) user_data);
> > +
> > +#endif /* QEMU_GLIB_HELPER_H */
> > +
> > diff --git a/include/qemu/osdep.h b/include/qemu/osdep.h
> > index 122ff06..36f8a89 100644
> > --- a/include/qemu/osdep.h
> > +++ b/include/qemu/osdep.h
> > @@ -104,7 +104,7 @@ extern int daemon(int, int);
> >  #include "sysemu/os-posix.h"
> >  #endif
> >
> > -#include "glib-compat.h"
> > +#include "glib/glib-compat.h"
> >  #include "qemu/typedefs.h"
> >
> >  #ifndef O_LARGEFILE
> > diff --git a/linux-user/main.c b/linux-user/main.c
> > index 10a3bb3..7cea6bc 100644
> > --- a/linux-user/main.c
> > +++ b/linux-user/main.c
> > @@ -35,7 +35,7 @@
> >  #include "elf.h"
> >  #include "exec/log.h"
> >  #include "trace/control.h"
> > -#include "glib-compat.h"
> > +#include "glib/glib-compat.h"
> 
> osdep.h includes glib-compat.h so we should just delete the #include,
> not change it.
> 
> This patch looks like it will break bsd-user compiles, because
> bsd-user/main.c has the same unnecessary glib-compat.h include
> and the patch doesn't change or delete it.
> 
> >
> >  char *exec_path;
> >
> > diff --git a/scripts/clean-includes b/scripts/clean-includes
> > index dd938da..b32b928 100755
> > --- a/scripts/clean-includes
> > +++ b/scripts/clean-includes
> > @@ -123,7 +123,7 @@ for f in "$@"; do
> >        ;;
> >      *include/qemu/osdep.h | \
> >      *include/qemu/compiler.h | \
> > -    *include/glib-compat.h | \
> > +    *include/glib/glib-compat.h | \
> >      *include/sysemu/os-posix.h | \
> >      *include/sysemu/os-win32.h | \
> >      *include/standard-headers/ )
> > diff --git a/util/Makefile.objs b/util/Makefile.objs
> > index c6205eb..0080712 100644
> > --- a/util/Makefile.objs
> > +++ b/util/Makefile.objs
> > @@ -43,3 +43,4 @@ util-obj-y += qdist.o
> >  util-obj-y += qht.o
> >  util-obj-y += range.o
> >  util-obj-y += systemd.o
> > +util-obj-y += glib-helper.o
> > diff --git a/util/glib-helper.c b/util/glib-helper.c
> > new file mode 100644
> > index 0000000..2557009
> > --- /dev/null
> > +++ b/util/glib-helper.c
> > @@ -0,0 +1,29 @@
> > +/*
> > + * Implementation for GLIB helpers
> > + * this file is intented to commulate and later reuse
> > + * additional glib functions
> 
> Did you mean "accumulate" ?
> 
> More detailed description of what functions live in this
> file would be useful -- these aren't actually GLib
> functions, just utility routines that are useful to
> code which uses GLib, as far as I can tell.
> 
> > + *
> > + * This work is licensed under the terms of the GNU GPL, version 2 or later.
> > + * See the COPYING file in the top-level directory.
> > +
> 
> Stray blank line.
> 
> > + */
> 
> This is also missing the copyright line.
Yes, maybe it would have been better for me to ask before sending.
In util I found files with a reference to the GNU GPL, version 2, like
in this file, and I also found this:

 * This library is free software; you can redistribute it and/or
 * modify it under the terms of the GNU Lesser General Public
 * License as published by the Free Software Foundation; either
 * version 2 of the License, or (at your option) any later version.

So I just copied copyright reference from glib-compat.h.

> 
> > +
> > +#include "glib/glib-helper.h"
> 
> Every C file should start by including "qemu/osdep.h" as the
> first thing it does.
> 
> > +
> > +gint g_int_cmp64(gconstpointer a, gconstpointer b,
> > +        gpointer __attribute__((unused)) user_data)
> > +{
> > +    guint64 ua = GPOINTER_TO_UINT64(a);
> > +    guint64 ub = GPOINTER_TO_UINT64(b);
> > +    return (ua > ub) - (ua < ub);
> > +}
> > +
> > +/*
> > + * return 1 in case of a > b, -1 otherwise and 0 if equeal
> > + */
> > +gint g_int_cmp(gconstpointer a, gconstpointer b,
> > +        gpointer __attribute__((unused)) user_data)
> > +{
> > +    return g_int_cmp64(a, b, user_data);
> > +}
> > +
> > --
> > 1.8.3.1
> >
> >
> 
> thanks
> -- PMM
> 

-- 

BR
Alexey


* Re: [Qemu-devel] [PATCH 3/6] migration: add UFFD_FEATURE_THREAD_ID feature support
  2017-04-21 10:24       ` Dr. David Alan Gilbert
@ 2017-04-21 15:22         ` Alexey
  2017-04-24  8:03           ` Peter Xu
  2017-04-24  8:12           ` Peter Xu
  0 siblings, 2 replies; 38+ messages in thread
From: Alexey @ 2017-04-21 15:22 UTC (permalink / raw)
  To: Dr. David Alan Gilbert; +Cc: i.maximets, qemu-devel, aarcange

On Fri, Apr 21, 2017 at 11:24:54AM +0100, Dr. David Alan Gilbert wrote:
> * Alexey Perevalov (a.perevalov@samsung.com) wrote:
> > Userfaultfd mechanism is able to provide process thread id,
> > in case when client request it with UFDD_API ioctl.
> > 
> > Signed-off-by: Alexey Perevalov <a.perevalov@samsung.com>
> 
> There seem to be two parts to this:
>   a) Adding the mis parameter to ufd_version_check
>   b) Asking for the feature
> 
> Please split it into two patches.
> 
> Also....
> 
> > ---
> >  include/migration/postcopy-ram.h |  2 +-
> >  migration/migration.c            |  2 +-
> >  migration/postcopy-ram.c         | 12 ++++++------
> >  migration/savevm.c               |  2 +-
> >  4 files changed, 9 insertions(+), 9 deletions(-)
> > 
> > diff --git a/include/migration/postcopy-ram.h b/include/migration/postcopy-ram.h
> > index 8e036b9..809f6db 100644
> > --- a/include/migration/postcopy-ram.h
> > +++ b/include/migration/postcopy-ram.h
> > @@ -14,7 +14,7 @@
> >  #define QEMU_POSTCOPY_RAM_H
> >  
> >  /* Return true if the host supports everything we need to do postcopy-ram */
> > -bool postcopy_ram_supported_by_host(void);
> > +bool postcopy_ram_supported_by_host(MigrationIncomingState *mis);
> >  
> >  /*
> >   * Make all of RAM sensitive to accesses to areas that haven't yet been written
> > diff --git a/migration/migration.c b/migration/migration.c
> > index ad4036f..79f6425 100644
> > --- a/migration/migration.c
> > +++ b/migration/migration.c
> > @@ -802,7 +802,7 @@ void qmp_migrate_set_capabilities(MigrationCapabilityStatusList *params,
> >           * special support.
> >           */
> >          if (!old_postcopy_cap && runstate_check(RUN_STATE_INMIGRATE) &&
> > -            !postcopy_ram_supported_by_host()) {
> > +            !postcopy_ram_supported_by_host(NULL)) {
> >              /* postcopy_ram_supported_by_host will have emitted a more
> >               * detailed message
> >               */
> > diff --git a/migration/postcopy-ram.c b/migration/postcopy-ram.c
> > index dc80dbb..70f0480 100644
> > --- a/migration/postcopy-ram.c
> > +++ b/migration/postcopy-ram.c
> > @@ -60,13 +60,13 @@ struct PostcopyDiscardState {
> >  #include <sys/eventfd.h>
> >  #include <linux/userfaultfd.h>
> >  
> > -static bool ufd_version_check(int ufd)
> > +static bool ufd_version_check(int ufd, MigrationIncomingState *mis)
> >  {
> >      struct uffdio_api api_struct;
> >      uint64_t ioctl_mask;
> >  
> >      api_struct.api = UFFD_API;
> > -    api_struct.features = 0;
> > +    api_struct.features = UFFD_FEATURE_THREAD_ID;
> >      if (ioctl(ufd, UFFDIO_API, &api_struct)) {
> >          error_report("postcopy_ram_supported_by_host: UFFDIO_API failed: %s",
> >                       strerror(errno));
> 
> You're not actually using the 'mis' here - what I'd expected was
> something that was going to check if the UFFDIO_API return said that it really
> had the feature, and if so store a flag in the MIS somewhere.
> 
> Also, I'm not sure it's right to set 'api_struct.features' on the input - what
> happens if this is run on an old kernel - we don't want postcopy to fail on
> an old kernel without your feature.
> I'm not 100% sure of the interface, but I think the way it works is you set
> features = 0 before the call, and then check the api_struct.features in the
> return - in the same way that I check for UFFD_FEATURE_MISSING_HUGETLBFS.
> 
We need to ask the kernel for that feature.
Right, the kernel returns the available features:
uffdio_api.features = UFFD_API_FEATURES
but it also stores the requested features:
/* only enable the requested features for this uffd context */
 ctx->features = uffd_ctx_features(features);

So at the time when the process thread id is about to be sent,
the kernel checks whether it was requested:
+       if (features & UFFD_FEATURE_THREAD_ID)
+               msg.arg.pagefault.ptid = task_pid_vnr(current);

From the patch message:

 The process's thread id is provided when the user requests it
by setting the UFFD_FEATURE_THREAD_ID bit in uffdio_api.features.

UFFD_FEATURE_MISSING_HUGETLBFS looks like default, unconditional
behavior (I didn't find any usage of that define in the kernel).


> Dave
> 
> > @@ -113,7 +113,7 @@ static int test_range_shared(const char *block_name, void *host_addr,
> >   * normally fine since if the postcopy succeeds it gets turned back on at the
> >   * end.
> >   */
> > -bool postcopy_ram_supported_by_host(void)
> > +bool postcopy_ram_supported_by_host(MigrationIncomingState *mis)
> >  {
> >      long pagesize = getpagesize();
> >      int ufd = -1;
> > @@ -136,7 +136,7 @@ bool postcopy_ram_supported_by_host(void)
> >      }
> >  
> >      /* Version and features check */
> > -    if (!ufd_version_check(ufd)) {
> > +    if (!ufd_version_check(ufd, mis)) {
> >          goto out;
> >      }
> >  
> > @@ -515,7 +515,7 @@ int postcopy_ram_enable_notify(MigrationIncomingState *mis)
> >       * Although the host check already tested the API, we need to
> >       * do the check again as an ABI handshake on the new fd.
> >       */
> > -    if (!ufd_version_check(mis->userfault_fd)) {
> > +    if (!ufd_version_check(mis->userfault_fd, mis)) {
> >          return -1;
> >      }
> >  
> > @@ -653,7 +653,7 @@ void *postcopy_get_tmp_page(MigrationIncomingState *mis)
> >  
> >  #else
> >  /* No target OS support, stubs just fail */
> > -bool postcopy_ram_supported_by_host(void)
> > +bool postcopy_ram_supported_by_host(MigrationIncomingState *mis)
> >  {
> >      error_report("%s: No OS support", __func__);
> >      return false;
> > diff --git a/migration/savevm.c b/migration/savevm.c
> > index 3b19a4a..f01e418 100644
> > --- a/migration/savevm.c
> > +++ b/migration/savevm.c
> > @@ -1360,7 +1360,7 @@ static int loadvm_postcopy_handle_advise(MigrationIncomingState *mis)
> >          return -1;
> >      }
> >  
> > -    if (!postcopy_ram_supported_by_host()) {
> > +    if (!postcopy_ram_supported_by_host(mis)) {
> >          postcopy_state_set(POSTCOPY_INCOMING_NONE);
> >          return -1;
> >      }
> > -- 
> > 1.8.3.1
> > 
> --
> Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK
> 

-- 

BR
Alexey


* Re: [Qemu-devel] [PATCH 2/6] util: introduce glib-helper.c
  2017-04-21 15:10         ` Alexey
@ 2017-04-21 15:49           ` Peter Maydell
  2017-04-25 11:23             ` Dr. David Alan Gilbert
  0 siblings, 1 reply; 38+ messages in thread
From: Peter Maydell @ 2017-04-21 15:49 UTC (permalink / raw)
  To: Alexey; +Cc: i.maximets, Dr. David Alan Gilbert, QEMU Developers

On 21 April 2017 at 16:10, Alexey <a.perevalov@samsung.com> wrote:
> Hello, thank you for such detailed comments,
>
> On Fri, Apr 21, 2017 at 11:27:55AM +0100, Peter Maydell wrote:

>> Can we have a proper doc comment format comment, please,
>> since this is now a function available to all of QEMU?
>>
>> > +gint g_int_cmp64(gconstpointer a, gconstpointer b,
>> > +        gpointer __attribute__((unused)) user_data);
>>
>> What is this actually for? Looking at the original uses
>> I can tell that this is a GCompareDataFunc function, but
>> the comment should tell me that.
> I looked at the comments on other functions in QEMU and didn't find
> a common style, so I decided to keep it as is. Maybe I missed some
> best practice here.

See include/qemu/bitops.h for an example of the comment style.
More important than just the style is that the comment
should clearly explain the purpose of the function in detail.

Certainly many of our existing functions are poorly documented,
but we're trying to raise the bar gradually here.

> Yes, it was copy-pasted.
> After a mingw build check, I'm now thinking of using intptr_t as the type
> for the comparison in this function, or even keeping gpointer and merging
> these two functions into a single _direct_ one.
> I saw that intptr_t is widely used in QEMU.
>
> The intent of this function was to be a comparator for the case where client
> code wants to keep integers in a pointer field. xen_disk.c uses UINT32, so it
> wasn't a problem, but migration uses a 64-bit address (the kernel provides it
> as __u64, i.e. unsigned long long), so on a 32-bit platform it's a problem.

Code which tries to put a genuinely 64 bit value into a pointer
is buggy and needs to be fixed. I'm not clear if that is the
case here, or if the ABI from the kernel guarantees that the
value is really a pointer type and fits in uintptr_t / gpointer.

I don't think we need more than one of these functions.
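
As a rough sketch of what a single helper could look like, assuming the
stored values really are pointer-sized: the name ptr_value_cmp is made up,
and plain C types stand in for gconstpointer/gpointer so that the snippet
builds without GLib.

```c
#include <stdint.h>

/*
 * Compare two integers that are stored directly in pointer fields.
 * Same shape as a GCompareDataFunc: returns -1, 0 or 1 for a < b,
 * a == b and a > b. The pointers are never dereferenced.
 */
static int ptr_value_cmp(const void *a, const void *b, void *user_data)
{
    uintptr_t ua = (uintptr_t)a;
    uintptr_t ub = (uintptr_t)b;

    (void)user_data; /* unused; kept only to match the callback shape */
    return (ua > ub) - (ua < ub);
}
```

Marking user_data as unused lives in the definition here, so the prototype
in the header can stay free of attributes.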

>> This is also missing the copyright line.
> Yes, maybe it would have been better for me to ask before sending.
> In util I found files with a reference to the GNU GPL, version 2, like
> in this file, and I also found this:
>
>  * This library is free software; you can redistribute it and/or
>  * modify it under the terms of the GNU Lesser General Public
>  * License as published by the Free Software Foundation; either
>  * version 2 of the License, or (at your option) any later version.
>
> So I just copied copyright reference from glib-compat.h.

Yes, that's the license statement, which is fine. What is
missing is the copyright line, which in glib-compat.h looks
like:
 Copyright IBM, Corp. 2013

For code you write, you want either your personal or (more likely)
a Samsung copyright line -- check with your company about what
their preferred form is.

thanks
-- PMM


* Re: [Qemu-devel] [PATCH 4/6] migration: calculate downtime on dst side
  2017-04-21 12:00       ` Dr. David Alan Gilbert
@ 2017-04-21 18:47         ` Alexey
  2017-04-24 17:11           ` Dr. David Alan Gilbert
  2017-04-22  9:49         ` [Qemu-devel] [PATCH 4/6] migration: calculate downtime on dst side (CPUMASK) Alexey
  1 sibling, 1 reply; 38+ messages in thread
From: Alexey @ 2017-04-21 18:47 UTC (permalink / raw)
  To: Dr. David Alan Gilbert; +Cc: i.maximets, qemu-devel

Hello, David!


I apologize, I forgot to check the patches with the checkpatch.pl script.
I have now checked them and fixed the code style in the patches. I also
checked the files themselves: migration.c has code style errors, and
glib-compat.h too.
I could send patches for those to qemu-trivial, if you don't object.


On Fri, Apr 21, 2017 at 01:00:32PM +0100, Dr. David Alan Gilbert wrote:
> * Alexey Perevalov (a.perevalov@samsung.com) wrote:
> > This patch provides downtime calculation per vCPU,
> > as a summary and as a overlapped value for all vCPUs.
> > 
> > This approach just keeps tree with page fault addr as a key,
> > and t1-t2 interval of pagefault time and page copy time, with
> > affected vCPU bit mask.
> > For more implementation details please see comment to
> > get_postcopy_total_downtime function.
> > 
> > Signed-off-by: Alexey Perevalov <a.perevalov@samsung.com>
> > ---
> >  include/migration/migration.h |  14 +++
> >  migration/migration.c         | 280 +++++++++++++++++++++++++++++++++++++++++-
> >  migration/postcopy-ram.c      |  24 +++-
> >  migration/qemu-file.c         |   1 -
> >  migration/trace-events        |   9 +-
> >  5 files changed, 323 insertions(+), 5 deletions(-)
> > 
> > diff --git a/include/migration/migration.h b/include/migration/migration.h
> > index 5720c88..5d2c628 100644
> > --- a/include/migration/migration.h
> > +++ b/include/migration/migration.h
> > @@ -123,10 +123,24 @@ struct MigrationIncomingState {
> >  
> >      /* See savevm.c */
> >      LoadStateEntry_Head loadvm_handlers;
> > +
> > +    /*
> > +     *  Tree for keeping postcopy downtime,
> > +     *  necessary to calculate correct downtime, during multiple
> > +     *  vm suspends, it keeps host page address as a key and
> > +     *  DowntimeDuration as a data
> > +     *  NULL means kernel couldn't provide process thread id,
> > +     *  and QEMU couldn't identify which vCPU raise page fault
> > +     */
> > +    GTree *postcopy_downtime;
> >  };
> >  
> >  MigrationIncomingState *migration_incoming_get_current(void);
> >  void migration_incoming_state_destroy(void);
> > +void mark_postcopy_downtime_begin(uint64_t addr, int cpu);
> > +void mark_postcopy_downtime_end(uint64_t addr);
> > +uint64_t get_postcopy_total_downtime(void);
> > +void destroy_downtime_duration(gpointer data);
> >  
> >  /*
> >   * An outstanding page request, on the source, having been received
> > diff --git a/migration/migration.c b/migration/migration.c
> > index 79f6425..5bac434 100644
> > --- a/migration/migration.c
> > +++ b/migration/migration.c
> > @@ -38,6 +38,8 @@
> >  #include "io/channel-tls.h"
> >  #include "migration/colo.h"
> >  
> > +#define DEBUG_VCPU_DOWNTIME 1
> > +
> >  #define MAX_THROTTLE  (32 << 20)      /* Migration transfer speed throttling */
> >  
> >  /* Amount of time to allocate to each "chunk" of bandwidth-throttled
> > @@ -77,6 +79,19 @@ static NotifierList migration_state_notifiers =
> >  
> >  static bool deferred_incoming;
> >  
> > +typedef struct {
> > +    int64_t begin;
> > +    int64_t end;
> > +    uint64_t *cpus; /* cpus bit mask array, QEMU bit functions support
> > +     bit operation on memory regions, but doesn't check out of range */
> > +} DowntimeDuration;
> > +
> > +typedef struct {
> > +    int64_t tp; /* point in time */
> > +    bool is_end;
> > +    uint64_t *cpus;
> > +} OverlapDowntime;
> > +
> >  /*
> >   * Current state of incoming postcopy; note this is not part of
> >   * MigrationIncomingState since it's state is used during cleanup
> > @@ -117,6 +132,13 @@ MigrationState *migrate_get_current(void)
> >      return &current_migration;
> >  }
> >  
> > +void destroy_downtime_duration(gpointer data)
> > +{
> > +    DowntimeDuration *dd = (DowntimeDuration *)data;
> > +    g_free(dd->cpus);
> > +    g_free(data);
> > +}
> > +
> >  MigrationIncomingState *migration_incoming_get_current(void)
> >  {
> >      static bool once;
> > @@ -138,10 +160,13 @@ void migration_incoming_state_destroy(void)
> >      struct MigrationIncomingState *mis = migration_incoming_get_current();
> >  
> >      qemu_event_destroy(&mis->main_thread_load_event);
> > +    if (mis->postcopy_downtime) {
> > +        g_tree_destroy(mis->postcopy_downtime);
> > +        mis->postcopy_downtime = NULL;
> > +    }
> >      loadvm_free_handlers(mis);
> >  }
> >  
> > -
> >  typedef struct {
> >      bool optional;
> >      uint32_t size;
> > @@ -1754,7 +1779,6 @@ static int postcopy_start(MigrationState *ms, bool *old_vm_running)
> >       */
> >      ms->postcopy_after_devices = true;
> >      notifier_list_notify(&migration_state_notifiers, ms);
> > -
> 
> Stray deletion
> 
> >      ms->downtime =  qemu_clock_get_ms(QEMU_CLOCK_REALTIME) - time_at_stop;
> >  
> >      qemu_mutex_unlock_iothread();
> > @@ -2117,3 +2141,255 @@ PostcopyState postcopy_state_set(PostcopyState new_state)
> >      return atomic_xchg(&incoming_postcopy_state, new_state);
> >  }
> >  
> > +#define SIZE_TO_KEEP_CPUBITS (1 + smp_cpus/sizeof(guint64))
> 
> Split out your cpu-sets so that you have an 'alloc_cpu_set',
> a 'set bit' a 'set all bits', dup etc
> (I see Linux has cpumask.h that has a 'cpu_set' that's
> basically the same thing, but we need something portablish.)
> 
> > +void mark_postcopy_downtime_begin(uint64_t addr, int cpu)
> > +{
> > +    MigrationIncomingState *mis = migration_incoming_get_current();
> > +    DowntimeDuration *dd;
> > +    if (!mis->postcopy_downtime) {
> > +        return;
> > +    }
> > +
> > +    dd = g_tree_lookup(mis->postcopy_downtime, (gpointer)addr); /* !!! cast */
> > +    if (!dd) {
> > +        dd = (DowntimeDuration *)g_new0(DowntimeDuration, 1);
> > +        dd->cpus = g_new0(guint64, SIZE_TO_KEEP_CPUBITS);
> > +        g_tree_insert(mis->postcopy_downtime, (gpointer)addr, (gpointer)dd);
> > +    }
> > +
> > +    if (cpu < 0) {
> > +        /* assume in this situation all vCPUs are sleeping */
> > +        int i;
> > +        for (i = 0; i < SIZE_TO_KEEP_CPUBITS; i++) {
> > +            dd->cpus[i] = ~(uint64_t)0u;
> > +        }
> > +    } else
> > +        set_bit(cpu, dd->cpus);
> 
> Qemu coding style: Use {}'s even on one line blocks
> 
> > +
> > +    /*
> > +     *  overwrite previously set dd->begin, if that page already was
> > +     *     faulted on another cpu
> > +     */
> > +    dd->begin = qemu_clock_get_ms(QEMU_CLOCK_REALTIME);
> 
> OK, so this is making a decision that needs to be documented;
> that is that if one CPU was already paused at time (a), then a second
> CPU we see is paused at time (b), then the time  we record only starts
> at (b) and ignores the time from a..b  - is that the way you want to do it?
Yes, any time interval during which at least one vCPU is running isn't counted.
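
To illustrate that rule, here is a minimal standalone sketch. It is not the
code from this patch: it uses a plain sweep over sorted begin/end events
instead of the GTree and bit-mask walk, and every name in it
(BlockedInterval, SweepEvent, all_blocked_time) is hypothetical. It assumes
each interval has begin <= end and a cpu index in [0, ncpus).

```c
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical input: one blocked interval of one vCPU, times in ms. */
typedef struct {
    int64_t begin;  /* page fault time */
    int64_t end;    /* page placed time */
    int cpu;        /* index of the faulting vCPU */
} BlockedInterval;

typedef struct {
    int64_t tp;     /* point in time */
    int delta;      /* +1 for a begin point, -1 for an end point */
    int cpu;
} SweepEvent;

static int sweep_event_cmp(const void *a, const void *b)
{
    const SweepEvent *ea = a;
    const SweepEvent *eb = b;

    if (ea->tp != eb->tp) {
        return ea->tp < eb->tp ? -1 : 1;
    }
    /* at equal timestamps, process end points before begin points */
    return ea->delta - eb->delta;
}

/*
 * Returns the total time during which every vCPU was blocked at once.
 * Time with at least one running vCPU is not counted.
 * Returns -1 on allocation failure.
 */
int64_t all_blocked_time(const BlockedInterval *iv, size_t n, int ncpus)
{
    SweepEvent *ev;
    int *depth;     /* per-vCPU count of currently open intervals */
    int64_t total = 0, prev = 0;
    int blocked = 0; /* number of vCPUs with depth > 0 */
    size_t i;

    if (n == 0 || ncpus <= 0) {
        return 0;
    }
    ev = malloc(2 * n * sizeof(*ev));
    depth = calloc((size_t)ncpus, sizeof(*depth));
    if (!ev || !depth) {
        free(ev);
        free(depth);
        return -1;
    }
    for (i = 0; i < n; i++) {
        ev[2 * i] = (SweepEvent){ iv[i].begin, +1, iv[i].cpu };
        ev[2 * i + 1] = (SweepEvent){ iv[i].end, -1, iv[i].cpu };
    }
    qsort(ev, 2 * n, sizeof(*ev), sweep_event_cmp);

    for (i = 0; i < 2 * n; i++) {
        int cpu = ev[i].cpu;

        /* the span since the previous event counts only if all were blocked */
        if (blocked == ncpus) {
            total += ev[i].tp - prev;
        }
        prev = ev[i].tp;
        depth[cpu] += ev[i].delta;
        if (ev[i].delta > 0 && depth[cpu] == 1) {
            blocked++;
        } else if (ev[i].delta < 0 && depth[cpu] == 0) {
            blocked--;
        }
    }
    free(ev);
    free(depth);
    return total;
}
```

With three vCPUs blocked over [5,15] and [31,47] (CPU1), [12,33] (CPU2) and
[24,38] (CPU3), only the span 31..33 has all of them blocked, so only that
span is counted.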

> As I say, it should be documented somewhere; it's probably worth
> adding something to docs/migration.txt about how this measurement works.
> 
> 
> > +    trace_mark_postcopy_downtime_begin(addr, dd, dd->begin, cpu);
> > +}
> > +
> > +void mark_postcopy_downtime_end(uint64_t addr)
> > +{
> > +    MigrationIncomingState *mis = migration_incoming_get_current();
> > +    DowntimeDuration *dd;
> > +    if (!mis->postcopy_downtime) {
> > +        return;
> > +    }
> > +
> > +    dd = g_tree_lookup(mis->postcopy_downtime, (gpointer)addr);
> > +    if (!dd) {
> > +        /* error_report("Could not populate downtime duration completion time \n\
> > +                        There is no downtime duration for 0x%"PRIx64, addr); */
> 
> Error or no error - decide!   Is this happening for pages that arrive before
> they've been requested?
> 
> > +        return;
> > +    }
> > +
> > +    dd->end = qemu_clock_get_ms(QEMU_CLOCK_REALTIME);
> > +    trace_mark_postcopy_downtime_end(addr, dd, dd->end);
> > +}
> > +
> > +struct downtime_overlay_cxt {
> > +    GPtrArray *downtime_points;
> > +    size_t number_of_points;
> > +};
> 
> Why 'cxt' ? If you mean as an abbreviation to context, then we normally use ctxt.
> 
> > +/*
> > + * This function split each DowntimeDuration, which represents as start/end
> > + * pointand makes a points of it, then fill array with points,
> > + * to sort it in future.
> > + */
> > +static gboolean split_duration_and_fill_points(gpointer key, gpointer value,
> > +                                        gpointer data)
> > +{
> > +    struct downtime_overlay_cxt *ctx = (struct downtime_overlay_cxt *)data;
> > +    DowntimeDuration *dd = (DowntimeDuration *)value;
> > +    GPtrArray *interval = ctx->downtime_points;
> > +    if (dd->begin) {
> > +        OverlapDowntime *od_begin = g_new0(OverlapDowntime, 1);
> > +        od_begin->cpus = g_memdup(dd->cpus, sizeof(uint64_t) * SIZE_TO_KEEP_CPUBITS);
> > +        od_begin->tp = dd->begin;
> > +        od_begin->is_end = false;
> > +        g_ptr_array_add(interval, od_begin);
> > +        ctx->number_of_points += 1;
> > +    }
> > +
> > +    if (dd->end) {
> > +        OverlapDowntime *od_end = g_new0(OverlapDowntime, 1);
> > +        od_end->cpus = g_memdup(dd->cpus, sizeof(uint64_t) * SIZE_TO_KEEP_CPUBITS);
> > +        od_end->tp = dd->end;
> > +        od_end->is_end = true;
> > +        g_ptr_array_add(interval, od_end);
> > +        ctx->number_of_points += 1;
> > +    }
> > +
> > +    if (dd->end && dd->begin)
> > +        trace_split_duration_and_fill_points(dd->end - dd->begin, (uint64_t)key);
> 
> again, need {}'s
> 
> > +    return FALSE;
> > +}
> > +
> > +#ifdef DEBUG_VCPU_DOWNTIME
> > +static gboolean calculate_per_cpu(gpointer key, gpointer value,
> > +                                  gpointer data)
> > +{
> > +    int *downtime_cpu = (int *)data;
> > +    DowntimeDuration *dd = (DowntimeDuration *)value;
> > +    int cpu_iter;
> > +    for (cpu_iter = 0; cpu_iter < smp_cpus; cpu_iter++) {
> > +        if (test_bit(cpu_iter, dd->cpus) && dd->end && dd->begin)
> > +            downtime_cpu[cpu_iter] += dd->end - dd->begin;
> > +    }
> > +    return FALSE;
> > +}
> > +#endif /* DEBUG_VCPU_DOWNTIME */
> > +
> > +static gint compare_downtime(gconstpointer a, gconstpointer b)
> > +{
> > +    DowntimeDuration *dda = (DowntimeDuration *)a;
> > +    DowntimeDuration *ddb = (DowntimeDuration *)b;
> > +    return dda->begin - ddb->begin;
> > +}
> > +
> > +static void destroy_overlap_downtime(gpointer data)
> > +{
> > +    OverlapDowntime *od = (OverlapDowntime *)data;
> > +    g_free(od->cpus);
> > +    g_free(data);
> > +}
> > +
> > +static int check_overlap(uint64_t *b)
> > +{
> > +    unsigned long zero_bit = find_first_zero_bit(b, BITS_PER_LONG * SIZE_TO_KEEP_CPUBITS);
> 
> Line's too long.
> 
> > +    return zero_bit >= smp_cpus;
> 
> So this is really 'all cpus are blocked'?
Yes, that is the condition for it.
> 
> > +}
> > +
> > +/*
> > + * This function calculates downtime per cpu and trace it
> > + *
> > + *  Also it calculates total downtime as an interval's overlap,
> > + *  for many vCPU.
> > + *
> > + *  The approach is following:
> > + *  Initially intervals are represented in tree where key is
> > + *  pagefault address, and values:
> > + *   begin - page fault time
> > + *   end   - page load time
> > + *   cpus  - bit mask shows affected cpus
> > + *
> > + *  To calculate overlap on all cpus, intervals converted into
> > + *  array of points in time (downtime_points), the size of
> > + *  array is 2 * number of nodes in tree of intervals (2 array
> > + *  elements per one in element of interval).
> > + *  Each element is marked as end (E) or as start (S) of interval.
> > + *  The overlap downtime will be calculated for SE, only in case
> > + *  there is sequence S(0..N)E(M) for every vCPU.
> > + *
> > + * As example we have 3 CPU
> > + *
> > + *      S1        E1           S1               E1
> > + * -----***********------------xxx***************------------------------> CPU1
> > + *
> > + *             S2                E2
> > + * ------------****************xxx---------------------------------------> CPU2
> > + *
> > + *                         S3            E3
> > + * ------------------------****xxx********-------------------------------> CPU3
> > + *
> > + * We have sequence S1,S2,E1,S3,S1,E2,E3,E1
> > + * S2,E1 - doesn't match condition due to sequence S1,S2,E1 doesn't include CPU3
> > + * S3,S1,E2 - sequenece includes all CPUs, in this case overlap will be S1,E2
>                        ^ typo
> 
> > + * Legend of picture is following: * - means downtime per vCPU
> > + *                                 x - means overlapped downtime
> > + */
> > +uint64_t get_postcopy_total_downtime(void)
> > +{
> > +    MigrationIncomingState *mis = migration_incoming_get_current();
> > +    uint64_t total_downtime = 0; /* for total overlapped downtime */
> > +    const int intervals = g_tree_nnodes(mis->postcopy_downtime);
> > +    int point_iter, start_point_iter, i;
> > +    struct downtime_overlay_cxt dp_ctx = { 0 };
> > +    /*
> > +     * array will contain 2 * interval points or less, if
> > +     * it was not page fault finalization for page,
> > +     * real count will be in ctx.number_of_points
> > +     */
> > +    dp_ctx.downtime_points = g_ptr_array_new_full(2 * intervals,
> > +                                                     destroy_overlap_downtime);
> 
> Is the g_ptr_array giving you anything here over a plain-old C array of pointers?
> You're not dynamically growing it.
Yes, I know the upper bound of that array at that point, and GPtrArray is
probably a bit too heavy a structure here. OK, I'll use a plain array.
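
A rough sketch of what the plain-array variant could look like, outside of
QEMU and without glib. The names DowntimePoint and fill_sorted_points are
hypothetical; the sketch stores points by value, sizes the array to the known
upper bound of 2 * intervals, and treats a timestamp of 0 as "not recorded",
the same way the patch checks dd->begin and dd->end. The CPU masks are left
out to keep it short.

```c
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical, trimmed-down stand-in for OverlapDowntime. */
typedef struct {
    int64_t tp;     /* point in time */
    bool is_end;    /* true for E, false for S */
} DowntimePoint;

/* qsort() hands the comparator pointers to the array elements. */
static int downtime_point_cmp(const void *a, const void *b)
{
    const DowntimePoint *pa = a;
    const DowntimePoint *pb = b;

    /* avoid subtracting int64_t values and truncating the result to int */
    return (pa->tp > pb->tp) - (pa->tp < pb->tp);
}

/*
 * Splits intervals into S/E points and sorts them by time.
 * Returns the number of points stored in *out; the caller frees *out.
 * On allocation failure, or when there are no intervals, *out is NULL
 * and 0 is returned.
 */
size_t fill_sorted_points(const int64_t *begin, const int64_t *end,
                          size_t intervals, DowntimePoint **out)
{
    DowntimePoint *pts;
    size_t n = 0, i;

    *out = NULL;
    if (intervals == 0) {
        return 0;
    }
    /* upper bound is known up front: two points per interval */
    pts = malloc(2 * intervals * sizeof(*pts));
    if (!pts) {
        return 0;
    }
    for (i = 0; i < intervals; i++) {
        if (begin[i]) {
            pts[n++] = (DowntimePoint){ begin[i], false };
        }
        if (end[i]) {
            pts[n++] = (DowntimePoint){ end[i], true };
        }
    }
    qsort(pts, n, sizeof(*pts), downtime_point_cmp);
    *out = pts;
    return n;
}
```

The returned count plays the role of number_of_points, so no separate context
structure is needed just to track it.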

> 
> > +    if (!mis->postcopy_downtime) {
> > +        goto out;
> > +    }
> > +
> > +#ifdef DEBUG_VCPU_DOWNTIME
> > +    {
> > +        gint *downtime_cpu = g_new0(int, smp_cpus);
> > +        g_tree_foreach(mis->postcopy_downtime, calculate_per_cpu, downtime_cpu);
> > +        for (point_iter = 0; point_iter < smp_cpus; point_iter++)
> > +        {
> > +            trace_downtime_per_cpu(point_iter, downtime_cpu[point_iter]);
> > +        }
> > +        g_free(downtime_cpu);
> > +    }
> > +#endif /* DEBUG_VCPU_DOWNTIME */
> 
> You might want to make that:
>   if (TRACE_DOWNTIME_PER_CPU_ENABLED) {
>   }
> 
> and remove the ifdef.
> 
> > +    /* make downtime points S/E from interval */
> > +    g_tree_foreach(mis->postcopy_downtime, split_duration_and_fill_points,
> > +                   &dp_ctx);
> > +    g_ptr_array_sort(dp_ctx.downtime_points, compare_downtime);
> > +
> > +    for (point_iter = 1; point_iter < dp_ctx.number_of_points;
> > +         point_iter++) {
> > +        OverlapDowntime *od = g_ptr_array_index(dp_ctx.downtime_points,
> > +                point_iter);
> > +        uint64_t *cur_cpus;
> > +        int smp_cpus_i = smp_cpus;
> > +        OverlapDowntime *prev_od = g_ptr_array_index(dp_ctx.downtime_points,
> > +                                                     point_iter - 1);
> > +        if (!od || !prev_od)
> > +            continue;
> 
> Why would that happen?
The loop now runs only up to dp_ctx.number_of_points, so in this version
that looks impossible.
> 
> > +        /* we need sequence SE */
> > +        if (!od->is_end || prev_od->is_end)
> > +            continue;
> > +
> > +        cur_cpus = g_memdup(od->cpus, sizeof(uint64_t) * SIZE_TO_KEEP_CPUBITS);
> > +        for (start_point_iter = point_iter - 1;
> > +             start_point_iter >= 0 && smp_cpus_i;
> > +             start_point_iter--, smp_cpus_i--) {
> 
> I think I see what you're doing in this loop, although it's a bit hairy;
> I don't think I understand why we needed to get prev_od  if this loop is searching
> backwards?
It is there just for this condition,
if (!od->is_end || prev_od->is_end) {
    continue;
}
which skips every sequence other than SE, such as EE.
Do you think the following condition is more readable?
if (!(od->is_end && !prev_od->is_end)) {
    continue;
}

Also, prev_od is the point nearest to the end, so the time from that point
to the end is the interval of interest.
I depicted that in the picture.
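
One possible way to sidestep the readability question is a small named
predicate. This is only a sketch with hypothetical names (Point,
is_start_then_end, sum_se_gaps); it ignores the CPU masks and just shows
which adjacent pairs of a sorted point array pass the SE check.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for OverlapDowntime, without the cpus mask. */
typedef struct {
    int64_t tp;     /* point in time */
    bool is_end;    /* true for E, false for S */
} Point;

/* True only when an S point is immediately followed by an E point. */
static bool is_start_then_end(const Point *prev, const Point *cur)
{
    return !prev->is_end && cur->is_end;
}

/*
 * Sums cur->tp - prev->tp over all adjacent SE pairs of a time-sorted
 * array. SS, EE and ES pairs are skipped.
 */
int64_t sum_se_gaps(const Point *pts, size_t n)
{
    int64_t total = 0;
    size_t i;

    for (i = 1; i < n; i++) {
        if (!is_start_then_end(&pts[i - 1], &pts[i])) {
            continue;
        }
        total += pts[i].tp - pts[i - 1].tp;
    }
    return total;
}
```

The predicate is equivalent to both forms of the condition above, since
!(A && !B) is the same as !A || B.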
> 
> > +            OverlapDowntime *t_od = g_ptr_array_index(dp_ctx.downtime_points,
> > +                                                      start_point_iter);
> > +            if (!t_od)
> > +                break;
> > +            /* should be S */
> > +            if (t_od->is_end)
> > +                break;
> > +
> > +            /* points were sorted, it's possible when
> > +             * end is not occured, but this points were ommited
> > +             * in split_duration_and_fill_points */
> > +            if (od->tp <= prev_od->tp) {
> 
> Why is this checking od and prev_od in this loop - isn't this
> loop mainly t_od ?
Right, that code shouldn't be here.
> 
> > +                break;
> > +            }
> > +
> > +            for (i = 0; i < SIZE_TO_KEEP_CPUBITS; i++) {
> > +                cur_cpus[i] |= t_od->cpus[i];
> > +            }
> > +
> > +            /* check_overlap - just count number of bits in cur_cpus,
> > +             * and compare it with smp_cpus */
> > +            if (check_overlap(cur_cpus)) {
> > +                total_downtime += od->tp - prev_od->tp;
> > +                /* situation when one S point represents all vCPU is possible */
> > +                break;
> > +            }
> > +        }
> > +        g_free(cur_cpus);
> > +    }
> > +    trace_get_postcopy_total_downtime(g_tree_nnodes(mis->postcopy_downtime),
> > +        total_downtime);
> > +out:
> > +    g_ptr_array_free(dp_ctx.downtime_points, TRUE);
> > +    return total_downtime;
> > +}
> > diff --git a/migration/postcopy-ram.c b/migration/postcopy-ram.c
> > index 70f0480..ea89f4e 100644
> > --- a/migration/postcopy-ram.c
> > +++ b/migration/postcopy-ram.c
> > @@ -23,8 +23,10 @@
> >  #include "migration/postcopy-ram.h"
> >  #include "sysemu/sysemu.h"
> >  #include "sysemu/balloon.h"
> > +#include <sys/param.h>
> >  #include "qemu/error-report.h"
> >  #include "trace.h"
> > +#include "glib/glib-helper.h"
> >  
> >  /* Arbitrary limit on size of each discard command,
> >   * keeps them around ~200 bytes
> > @@ -81,6 +83,11 @@ static bool ufd_version_check(int ufd, MigrationIncomingState *mis)
> >          return false;
> >      }
> >  
> > +    if (mis && UFFD_FEATURE_THREAD_ID & api_struct.features) {
> 
> That's a very weird way of writing that test!  Also, I think you need
> to still make this user-selectable given the complexity/cost.
>
Like that?
{"execute": "migrate-set-capabilities" , "arguments":
{ "capabilities": [ { "capability": "calculate-postcopy-downtime", "state": true } ]
} }                                  ^^^^^^^^^^^^^^^^^^^^^^^^^^^

I tried to keep the heavy operations well away from the hot path (page
requesting and copying). The algorithm's complexity is
NumberOfPage*NumberOfvCPU, which is not that much in the hugepage case.
OK, since the feature is only conditionally available from the kernel anyway,
why not give the user the ability to choose.

> > +        mis->postcopy_downtime = g_tree_new_full(g_int_cmp64,
> > +                                         NULL, NULL, destroy_downtime_duration);
> > +    }
> > +
> >      if (getpagesize() != ram_pagesize_summary()) {
> >          bool have_hp = false;
> >          /* We've got a huge page */
> > @@ -404,6 +411,18 @@ static int ram_block_enable_notify(const char *block_name, void *host_addr,
> >      return 0;
> >  }
> >  
> > +static int get_mem_fault_cpu_index(uint32_t pid)
> > +{
> > +    CPUState *cpu_iter;
> > +
> > +    CPU_FOREACH(cpu_iter) {
> > +        if (cpu_iter->thread_id == pid)
> > +           return cpu_iter->cpu_index;
> > +    }
> > +    trace_get_mem_fault_cpu_index(pid);
> > +    return -1;
> > +}
> > +
> >  /*
> >   * Handle faults detected by the USERFAULT markings
> >   */
> > @@ -481,8 +500,10 @@ static void *postcopy_ram_fault_thread(void *opaque)
> >          rb_offset &= ~(qemu_ram_pagesize(rb) - 1);
> >          trace_postcopy_ram_fault_thread_request(msg.arg.pagefault.address,
> >                                                  qemu_ram_get_idstr(rb),
> > -                                                rb_offset);
> > +                                                rb_offset, msg.arg.pagefault.feat.ptid);
> 
> Line length!
> 
> >  
> > +        mark_postcopy_downtime_begin(msg.arg.pagefault.address,
> > +                            get_mem_fault_cpu_index(msg.arg.pagefault.feat.ptid));
> >          /*
> >           * Send the request to the source - we want to request one
> >           * of our host page sizes (which is >= TPS)
> > @@ -577,6 +598,7 @@ int postcopy_place_page(MigrationIncomingState *mis, void *host, void *from,
> >  
> >          return -e;
> >      }
> > +    mark_postcopy_downtime_end((uint64_t)host);
> >  
> >      trace_postcopy_place_page(host);
> >      return 0;
> > diff --git a/migration/qemu-file.c b/migration/qemu-file.c
> > index 195fa94..c9f3e47 100644
> > --- a/migration/qemu-file.c
> > +++ b/migration/qemu-file.c
> > @@ -547,7 +547,6 @@ size_t qemu_get_buffer_in_place(QEMUFile *f, uint8_t **buf, size_t size)
> >  int qemu_peek_byte(QEMUFile *f, int offset)
> >  {
> >      int index = f->buf_index + offset;
> > -
> 
> Stray!
> 
> >      assert(!qemu_file_is_writable(f));
> >      assert(offset < IO_BUF_SIZE);
> >  
> > diff --git a/migration/trace-events b/migration/trace-events
> > index 7372ce2..ab2e1e4 100644
> > --- a/migration/trace-events
> > +++ b/migration/trace-events
> > @@ -110,6 +110,12 @@ process_incoming_migration_co_end(int ret, int ps) "ret=%d postcopy-state=%d"
> >  process_incoming_migration_co_postcopy_end_main(void) ""
> >  migration_set_incoming_channel(void *ioc, const char *ioctype) "ioc=%p ioctype=%s"
> >  migration_set_outgoing_channel(void *ioc, const char *ioctype, const char *hostname)  "ioc=%p ioctype=%s hostname=%s"
> > +mark_postcopy_downtime_begin(uint64_t addr, void *dd, int64_t time, int cpu) "addr 0x%" PRIx64 " dd %p time %" PRId64 " cpu %d"
> > +mark_postcopy_downtime_end(uint64_t addr, void *dd, int64_t time) "addr 0x%" PRIx64 " dd %p time %" PRId64
> > +get_postcopy_total_downtime(int num, uint64_t total) "faults %d, total downtime %" PRIu64
> > +split_duration_and_fill_points(int64_t downtime, uint64_t addr) "downtime %" PRId64 " addr 0x%" PRIx64
> > +downtime_per_cpu(int cpu_index, int downtime) "downtime cpu[%d]=%d"
> > +source_return_path_thread_downtime(uint64_t downtime) "downtime %" PRIu64
> >  
> >  # migration/rdma.c
> >  qemu_rdma_accept_incoming_migration(void) ""
> > @@ -186,7 +192,7 @@ postcopy_ram_enable_notify(void) ""
> >  postcopy_ram_fault_thread_entry(void) ""
> >  postcopy_ram_fault_thread_exit(void) ""
> >  postcopy_ram_fault_thread_quit(void) ""
> > -postcopy_ram_fault_thread_request(uint64_t hostaddr, const char *ramblock, size_t offset) "Request for HVA=%" PRIx64 " rb=%s offset=%zx"
> > +postcopy_ram_fault_thread_request(uint64_t hostaddr, const char *ramblock, size_t offset, int pid) "Request for HVA=%" PRIx64 " rb=%s offset=%zx %d"
> >  postcopy_ram_incoming_cleanup_closeuf(void) ""
> >  postcopy_ram_incoming_cleanup_entry(void) ""
> >  postcopy_ram_incoming_cleanup_exit(void) ""
> > @@ -195,6 +201,7 @@ save_xbzrle_page_skipping(void) ""
> >  save_xbzrle_page_overflow(void) ""
> >  ram_save_iterate_big_wait(uint64_t milliconds, int iterations) "big wait: %" PRIu64 " milliseconds, %d iterations"
> >  ram_load_complete(int ret, uint64_t seq_iter) "exit_code %d seq iteration %" PRIu64
> > +get_mem_fault_cpu_index(uint32_t pid) "pid %u is not vCPU"
> >  
> >  # migration/exec.c
> >  migration_exec_outgoing(const char *cmd) "cmd=%s"
> > -- 
> > 1.8.3.1
> > 
> 
> Dave
> 
> --
> Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK
> 

-- 

BR
Alexey


* Re: [Qemu-devel] [PATCH 4/6] migration: calculate downtime on dst side (CPUMASK)
  2017-04-21 12:00       ` Dr. David Alan Gilbert
  2017-04-21 18:47         ` Alexey
@ 2017-04-22  9:49         ` Alexey
  2017-04-24 17:13           ` Dr. David Alan Gilbert
  1 sibling, 1 reply; 38+ messages in thread
From: Alexey @ 2017-04-22  9:49 UTC (permalink / raw)
  To: Dr. David Alan Gilbert; +Cc: i.maximets, qemu-devel

Hello David,
this mail is just for the CPUMASK discussion.

On Fri, Apr 21, 2017 at 01:00:32PM +0100, Dr. David Alan Gilbert wrote:
> * Alexey Perevalov (a.perevalov@samsung.com) wrote:
> > This patch provides downtime calculation per vCPU,
> > as a summary and as a overlapped value for all vCPUs.
> > 
> > This approach just keeps tree with page fault addr as a key,
> > and t1-t2 interval of pagefault time and page copy time, with
> > affected vCPU bit mask.
> > For more implementation details please see comment to
> > get_postcopy_total_downtime function.
> > 
> > Signed-off-by: Alexey Perevalov <a.perevalov@samsung.com>
> > ---
> >  include/migration/migration.h |  14 +++
> >  migration/migration.c         | 280 +++++++++++++++++++++++++++++++++++++++++-
> >  migration/postcopy-ram.c      |  24 +++-
> >  migration/qemu-file.c         |   1 -
> >  migration/trace-events        |   9 +-
> >  5 files changed, 323 insertions(+), 5 deletions(-)
> > 
> > diff --git a/include/migration/migration.h b/include/migration/migration.h
> > index 5720c88..5d2c628 100644
> > --- a/include/migration/migration.h
> > +++ b/include/migration/migration.h
> > @@ -123,10 +123,24 @@ struct MigrationIncomingState {
> >  
> >      /* See savevm.c */
> >      LoadStateEntry_Head loadvm_handlers;
> > +
> > +    /*
> > +     *  Tree for keeping postcopy downtime,
> > +     *  necessary to calculate correct downtime, during multiple
> > +     *  vm suspends, it keeps host page address as a key and
> > +     *  DowntimeDuration as a data
> > +     *  NULL means kernel couldn't provide process thread id,
> > +     *  and QEMU couldn't identify which vCPU raise page fault
> > +     */
> > +    GTree *postcopy_downtime;
> >  };
> >  
> >  MigrationIncomingState *migration_incoming_get_current(void);
> >  void migration_incoming_state_destroy(void);
> > +void mark_postcopy_downtime_begin(uint64_t addr, int cpu);
> > +void mark_postcopy_downtime_end(uint64_t addr);
> > +uint64_t get_postcopy_total_downtime(void);
> > +void destroy_downtime_duration(gpointer data);
> >  
> >  /*
> >   * An outstanding page request, on the source, having been received
> > diff --git a/migration/migration.c b/migration/migration.c
> > index 79f6425..5bac434 100644
> > --- a/migration/migration.c
> > +++ b/migration/migration.c
> > @@ -38,6 +38,8 @@
> >  #include "io/channel-tls.h"
> >  #include "migration/colo.h"
> >  
> > +#define DEBUG_VCPU_DOWNTIME 1
> > +
> >  #define MAX_THROTTLE  (32 << 20)      /* Migration transfer speed throttling */
> >  
> >  /* Amount of time to allocate to each "chunk" of bandwidth-throttled
> > @@ -77,6 +79,19 @@ static NotifierList migration_state_notifiers =
> >  
> >  static bool deferred_incoming;
> >  
> > +typedef struct {
> > +    int64_t begin;
> > +    int64_t end;
> > +    uint64_t *cpus; /* cpus bit mask array, QEMU bit functions support
> > +     bit operation on memory regions, but doesn't check out of range */
> > +} DowntimeDuration;
> > +
> > +typedef struct {
> > +    int64_t tp; /* point in time */
> > +    bool is_end;
> > +    uint64_t *cpus;
> > +} OverlapDowntime;
> > +
> >  /*
> >   * Current state of incoming postcopy; note this is not part of
> >   * MigrationIncomingState since it's state is used during cleanup
> > @@ -117,6 +132,13 @@ MigrationState *migrate_get_current(void)
> >      return &current_migration;
> >  }
> >  
> > +void destroy_downtime_duration(gpointer data)
> > +{
> > +    DowntimeDuration *dd = (DowntimeDuration *)data;
> > +    g_free(dd->cpus);
> > +    g_free(data);
> > +}
> > +
> >  MigrationIncomingState *migration_incoming_get_current(void)
> >  {
> >      static bool once;
> > @@ -138,10 +160,13 @@ void migration_incoming_state_destroy(void)
> >      struct MigrationIncomingState *mis = migration_incoming_get_current();
> >  
> >      qemu_event_destroy(&mis->main_thread_load_event);
> > +    if (mis->postcopy_downtime) {
> > +        g_tree_destroy(mis->postcopy_downtime);
> > +        mis->postcopy_downtime = NULL;
> > +    }
> >      loadvm_free_handlers(mis);
> >  }
> >  
> > -
> >  typedef struct {
> >      bool optional;
> >      uint32_t size;
> > @@ -1754,7 +1779,6 @@ static int postcopy_start(MigrationState *ms, bool *old_vm_running)
> >       */
> >      ms->postcopy_after_devices = true;
> >      notifier_list_notify(&migration_state_notifiers, ms);
> > -
> 
> Stray deletion
> 
> >      ms->downtime =  qemu_clock_get_ms(QEMU_CLOCK_REALTIME) - time_at_stop;
> >  
> >      qemu_mutex_unlock_iothread();
> > @@ -2117,3 +2141,255 @@ PostcopyState postcopy_state_set(PostcopyState new_state)
> >      return atomic_xchg(&incoming_postcopy_state, new_state);
> >  }
> >  
> > +#define SIZE_TO_KEEP_CPUBITS (1 + smp_cpus/sizeof(guint64))
> 
> Split out your cpu-sets so that you have an 'alloc_cpu_set',
> a 'set bit' a 'set all bits', dup etc
> (I see Linux has cpumask.h that has a 'cpu_set' that's
> basically the same thing, but we need something portablish.)
> 
Agreed, the way I'm working with the cpumask is a little bit naive.
Instead of setting an all-CPUs mask of the exact width
((1 << smp_cpus) - 1) when all vCPUs are sleeping, I just set every word
to ~0, because I didn't use functions like cpumask_and.
If you think this patch should use cpumask, then a cpumask patchset
(in a separate thread) should be introduced first, and this patchset
should be rebased on top of it.
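
As a starting point for that discussion, here is a rough, portable sketch of
such helpers in plain C. None of these names (vcpu_set_alloc, vcpu_set_bit,
vcpu_set_all, vcpu_set_dup, vcpu_set_or, vcpu_set_full) exist in QEMU today;
they are placeholders. The set is sized in bits rather than bytes, and
"set all" touches exactly ncpus bits, word by word, so it also works when
ncpus is larger than the width of one word.

```c
#include <limits.h>
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

#define VCPU_SET_WORD_BITS (sizeof(unsigned long) * CHAR_BIT)

/* Number of words needed for ncpus bits, at least one. */
static size_t vcpu_set_words(unsigned int ncpus)
{
    size_t words = (ncpus + VCPU_SET_WORD_BITS - 1) / VCPU_SET_WORD_BITS;

    return words ? words : 1;
}

/* Returns a zeroed set, or NULL on allocation failure. */
unsigned long *vcpu_set_alloc(unsigned int ncpus)
{
    return calloc(vcpu_set_words(ncpus), sizeof(unsigned long));
}

/* The caller must keep cpu below the ncpus used for allocation. */
void vcpu_set_bit(unsigned long *set, unsigned int cpu)
{
    set[cpu / VCPU_SET_WORD_BITS] |= 1UL << (cpu % VCPU_SET_WORD_BITS);
}

/* Sets exactly the low ncpus bits and clears the padding bits. */
void vcpu_set_all(unsigned long *set, unsigned int ncpus)
{
    size_t full = ncpus / VCPU_SET_WORD_BITS;
    unsigned int rem = ncpus % VCPU_SET_WORD_BITS;

    memset(set, 0, vcpu_set_words(ncpus) * sizeof(*set));
    memset(set, 0xff, full * sizeof(*set));
    if (rem) {
        set[full] = (1UL << rem) - 1;
    }
}

/* Returns a copy of the set, or NULL on allocation failure. */
unsigned long *vcpu_set_dup(const unsigned long *set, unsigned int ncpus)
{
    size_t bytes = vcpu_set_words(ncpus) * sizeof(*set);
    unsigned long *copy = malloc(bytes);

    if (copy) {
        memcpy(copy, set, bytes);
    }
    return copy;
}

void vcpu_set_or(unsigned long *dst, const unsigned long *src,
                 unsigned int ncpus)
{
    size_t i;

    for (i = 0; i < vcpu_set_words(ncpus); i++) {
        dst[i] |= src[i];
    }
}

/* True when all of the low ncpus bits are set; padding bits are ignored. */
bool vcpu_set_full(const unsigned long *set, unsigned int ncpus)
{
    size_t full = ncpus / VCPU_SET_WORD_BITS;
    unsigned int rem = ncpus % VCPU_SET_WORD_BITS;
    size_t i;

    for (i = 0; i < full; i++) {
        if (set[i] != ~0UL) {
            return false;
        }
    }
    if (rem) {
        unsigned long mask = (1UL << rem) - 1;

        return (set[full] & mask) == mask;
    }
    return true;
}
```

With helpers like these, the "cpu < 0" case would become a single
vcpu_set_all() call and check_overlap() would become vcpu_set_full().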


> > +void mark_postcopy_downtime_begin(uint64_t addr, int cpu)
> > +{
> > +    MigrationIncomingState *mis = migration_incoming_get_current();
> > +    DowntimeDuration *dd;
> > +    if (!mis->postcopy_downtime) {
> > +        return;
> > +    }
> > +
> > +    dd = g_tree_lookup(mis->postcopy_downtime, (gpointer)addr); /* !!! cast */
> > +    if (!dd) {
> > +        dd = (DowntimeDuration *)g_new0(DowntimeDuration, 1);
> > +        dd->cpus = g_new0(guint64, SIZE_TO_KEEP_CPUBITS);
> > +        g_tree_insert(mis->postcopy_downtime, (gpointer)addr, (gpointer)dd);
> > +    }
> > +
> > +    if (cpu < 0) {
> > +        /* assume in this situation all vCPUs are sleeping */
> > +        int i;
> > +        for (i = 0; i < SIZE_TO_KEEP_CPUBITS; i++) {
> > +            dd->cpus[i] = ~(uint64_t)0u;
> > +        }
> > +    } else
> > +        set_bit(cpu, dd->cpus);
> 
> Qemu coding style: Use {}'s even on one line blocks
> 
> > +
> > +    /*
> > +     *  overwrite previously set dd->begin, if that page already was
> > +     *     faulted on another cpu
> > +     */
> > +    dd->begin = qemu_clock_get_ms(QEMU_CLOCK_REALTIME);
> 
> OK, so this is making a decision that needs to be documented;
> that is that if one CPU was already paused at time (a), then a second
> CPU we see is paused at time (b), then the time  we record only starts
> at (b) and ignores the time from a..b  - is that the way you want to do it?
> As I say, it should be documented somewhere; it's probably worth
> adding something to docs/migration.txt about how this measurement works.
> 
> 
> > +    trace_mark_postcopy_downtime_begin(addr, dd, dd->begin, cpu);
> > +}
> > +
> > +void mark_postcopy_downtime_end(uint64_t addr)
> > +{
> > +    MigrationIncomingState *mis = migration_incoming_get_current();
> > +    DowntimeDuration *dd;
> > +    if (!mis->postcopy_downtime) {
> > +        return;
> > +    }
> > +
> > +    dd = g_tree_lookup(mis->postcopy_downtime, (gpointer)addr);
> > +    if (!dd) {
> > +        /* error_report("Could not populate downtime duration completion time \n\
> > +                        There is no downtime duration for 0x%"PRIx64, addr); */
> 
> Error or no error - decide!   Is this happening for pages that arrive before
> they've been requested?
> 
> > +        return;
> > +    }
> > +
> > +    dd->end = qemu_clock_get_ms(QEMU_CLOCK_REALTIME);
> > +    trace_mark_postcopy_downtime_end(addr, dd, dd->end);
> > +}
> > +
> > +struct downtime_overlay_cxt {
> > +    GPtrArray *downtime_points;
> > +    size_t number_of_points;
> > +};
> 
> Why 'cxt' ? If you mean as an abbreviation to context, then we normally use ctxt.
> 
> > +/*
> > + * This function split each DowntimeDuration, which represents as start/end
> > + * pointand makes a points of it, then fill array with points,
> > + * to sort it in future.
> > + */
> > +static gboolean split_duration_and_fill_points(gpointer key, gpointer value,
> > +                                        gpointer data)
> > +{
> > +    struct downtime_overlay_cxt *ctx = (struct downtime_overlay_cxt *)data;
> > +    DowntimeDuration *dd = (DowntimeDuration *)value;
> > +    GPtrArray *interval = ctx->downtime_points;
> > +    if (dd->begin) {
> > +        OverlapDowntime *od_begin = g_new0(OverlapDowntime, 1);
> > +        od_begin->cpus = g_memdup(dd->cpus, sizeof(uint64_t) * SIZE_TO_KEEP_CPUBITS);
> > +        od_begin->tp = dd->begin;
> > +        od_begin->is_end = false;
> > +        g_ptr_array_add(interval, od_begin);
> > +        ctx->number_of_points += 1;
> > +    }
> > +
> > +    if (dd->end) {
> > +        OverlapDowntime *od_end = g_new0(OverlapDowntime, 1);
> > +        od_end->cpus = g_memdup(dd->cpus, sizeof(uint64_t) * SIZE_TO_KEEP_CPUBITS);
> > +        od_end->tp = dd->end;
> > +        od_end->is_end = true;
> > +        g_ptr_array_add(interval, od_end);
> > +        ctx->number_of_points += 1;
> > +    }
> > +
> > +    if (dd->end && dd->begin)
> > +        trace_split_duration_and_fill_points(dd->end - dd->begin, (uint64_t)key);
> 
> again, need {}'s
> 
> > +    return FALSE;
> > +}
> > +
> > +#ifdef DEBUG_VCPU_DOWNTIME
> > +static gboolean calculate_per_cpu(gpointer key, gpointer value,
> > +                                  gpointer data)
> > +{
> > +    int *downtime_cpu = (int *)data;
> > +    DowntimeDuration *dd = (DowntimeDuration *)value;
> > +    int cpu_iter;
> > +    for (cpu_iter = 0; cpu_iter < smp_cpus; cpu_iter++) {
> > +        if (test_bit(cpu_iter, dd->cpus) && dd->end && dd->begin)
> > +            downtime_cpu[cpu_iter] += dd->end - dd->begin;
> > +    }
> > +    return FALSE;
> > +}
> > +#endif /* DEBUG_VCPU_DOWNTIME */
> > +
> > +static gint compare_downtime(gconstpointer a, gconstpointer b)
> > +{
> > +    DowntimeDuration *dda = (DowntimeDuration *)a;
> > +    DowntimeDuration *ddb = (DowntimeDuration *)b;
> > +    return dda->begin - ddb->begin;
> > +}
> > +
> > +static void destroy_overlap_downtime(gpointer data)
> > +{
> > +    OverlapDowntime *od = (OverlapDowntime *)data;
> > +    g_free(od->cpus);
> > +    g_free(data);
> > +}
> > +
> > +static int check_overlap(uint64_t *b)
> > +{
> > +    unsigned long zero_bit = find_first_zero_bit(b, BITS_PER_LONG * SIZE_TO_KEEP_CPUBITS);
> 
> Line's too long.
> 
> > +    return zero_bit >= smp_cpus;
> 
> So this is really 'all cpus are blocked'?
> 
> > +}
> > +
> > +/*
> > + * This function calculates downtime per cpu and trace it
> > + *
> > + *  Also it calculates total downtime as an interval's overlap,
> > + *  for many vCPU.
> > + *
> > + *  The approach is following:
> > + *  Initially intervals are represented in tree where key is
> > + *  pagefault address, and values:
> > + *   begin - page fault time
> > + *   end   - page load time
> > + *   cpus  - bit mask shows affected cpus
> > + *
> > + *  To calculate overlap on all cpus, intervals converted into
> > + *  array of points in time (downtime_points), the size of
> > + *  array is 2 * number of nodes in tree of intervals (2 array
> > + *  elements per one in element of interval).
> > + *  Each element is marked as end (E) or as start (S) of interval.
> > + *  The overlap downtime will be calculated for SE, only in case
> > + *  there is sequence S(0..N)E(M) for every vCPU.
> > + *
> > + * As example we have 3 CPU
> > + *
> > + *      S1        E1           S1               E1
> > + * -----***********------------xxx***************------------------------> CPU1
> > + *
> > + *             S2                E2
> > + * ------------****************xxx---------------------------------------> CPU2
> > + *
> > + *                         S3            E3
> > + * ------------------------****xxx********-------------------------------> CPU3
> > + *
> > + * We have sequence S1,S2,E1,S3,S1,E2,E3,E1
> > + * S2,E1 - doesn't match condition due to sequence S1,S2,E1 doesn't include CPU3
> > + * S3,S1,E2 - sequenece includes all CPUs, in this case overlap will be S1,E2
>                        ^ typo
> 
> > + * Legend of picture is following: * - means downtime per vCPU
> > + *                                 x - means overlapped downtime
> > + */
> > +uint64_t get_postcopy_total_downtime(void)
> > +{
> > +    MigrationIncomingState *mis = migration_incoming_get_current();
> > +    uint64_t total_downtime = 0; /* for total overlapped downtime */
> > +    const int intervals = g_tree_nnodes(mis->postcopy_downtime);
> > +    int point_iter, start_point_iter, i;
> > +    struct downtime_overlay_cxt dp_ctx = { 0 };
> > +    /*
> > +     * array will contain 2 * interval points or less, if
> > +     * it was not page fault finalization for page,
> > +     * real count will be in ctx.number_of_points
> > +     */
> > +    dp_ctx.downtime_points = g_ptr_array_new_full(2 * intervals,
> > +                                                     destroy_overlap_downtime);
> 
> Is the g_ptr_array giving you anything here over a plain-old C array of pointers?
> You're not dynamically growing it.
> 
> > +    if (!mis->postcopy_downtime) {
> > +        goto out;
> > +    }
> > +
> > +#ifdef DEBUG_VCPU_DOWNTIME
> > +    {
> > +        gint *downtime_cpu = g_new0(int, smp_cpus);
> > +        g_tree_foreach(mis->postcopy_downtime, calculate_per_cpu, downtime_cpu);
> > +        for (point_iter = 0; point_iter < smp_cpus; point_iter++)
> > +        {
> > +            trace_downtime_per_cpu(point_iter, downtime_cpu[point_iter]);
> > +        }
> > +        g_free(downtime_cpu);
> > +    }
> > +#endif /* DEBUG_VCPU_DOWNTIME */
> 
> You might want to make that:
>   if (TRACE_DOWNTIME_PER_CPU_ENABLED) {
>   }
> 
> and remove the ifdef.
> 
> > +    /* make downtime points S/E from interval */
> > +    g_tree_foreach(mis->postcopy_downtime, split_duration_and_fill_points,
> > +                   &dp_ctx);
> > +    g_ptr_array_sort(dp_ctx.downtime_points, compare_downtime);
> > +
> > +    for (point_iter = 1; point_iter < dp_ctx.number_of_points;
> > +         point_iter++) {
> > +        OverlapDowntime *od = g_ptr_array_index(dp_ctx.downtime_points,
> > +                point_iter);
> > +        uint64_t *cur_cpus;
> > +        int smp_cpus_i = smp_cpus;
> > +        OverlapDowntime *prev_od = g_ptr_array_index(dp_ctx.downtime_points,
> > +                                                     point_iter - 1);
> > +        if (!od || !prev_od)
> > +            continue;
> 
> Why would that happen?
> 
> > +        /* we need sequence SE */
> > +        if (!od->is_end || prev_od->is_end)
> > +            continue;
> > +
> > +        cur_cpus = g_memdup(od->cpus, sizeof(uint64_t) * SIZE_TO_KEEP_CPUBITS);
> > +        for (start_point_iter = point_iter - 1;
> > +             start_point_iter >= 0 && smp_cpus_i;
> > +             start_point_iter--, smp_cpus_i--) {
> 
> I think I see what you're doing in this loop, although it's a bit hairy;
> I don't think I understand why we needed to get prev_od  if this loop is searching
> backwards?
> 
> > +            OverlapDowntime *t_od = g_ptr_array_index(dp_ctx.downtime_points,
> > +                                                      start_point_iter);
> > +            if (!t_od)
> > +                break;
> > +            /* should be S */
> > +            if (t_od->is_end)
> > +                break;
> > +
> > +            /* points were sorted, it's possible when
> > +             * end is not occured, but this points were ommited
> > +             * in split_duration_and_fill_points */
> > +            if (od->tp <= prev_od->tp) {
> 
> Why is this checking od and prev_od in this loop - isn't this
> loop mainly t_od ?
> 
> > +                break;
> > +            }
> > +
> > +            for (i = 0; i < SIZE_TO_KEEP_CPUBITS; i++) {
> > +                cur_cpus[i] |= t_od->cpus[i];
> > +            }
> > +
> > +            /* check_overlap - just count number of bits in cur_cpus,
> > +             * and compare it with smp_cpus */
> > +            if (check_overlap(cur_cpus)) {
> > +                total_downtime += od->tp - prev_od->tp;
> > +                /* situation when one S point represents all vCPU is possible */
> > +                break;
> > +            }
> > +        }
> > +        g_free(cur_cpus);
> > +    }
> > +    trace_get_postcopy_total_downtime(g_tree_nnodes(mis->postcopy_downtime),
> > +        total_downtime);
> > +out:
> > +    g_ptr_array_free(dp_ctx.downtime_points, TRUE);
> > +    return total_downtime;
> > +}
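
For comparison, the "all vCPUs blocked" overlap described in the comment above can also be computed with a plain sweep over sorted start/end events. This is a minimal sketch under simplifying assumptions, not the patch's algorithm: each interval is assumed to belong to exactly one vCPU (the patch keeps a bitmask per faulting address), and FaultInterval, FaultEvent, MAX_CPUS and all_cpus_blocked_time() are hypothetical names.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

#define MAX_CPUS 64   /* arbitrary limit for this sketch */

typedef struct {
    uint64_t begin;   /* page fault time */
    uint64_t end;     /* page load time */
    int cpu;          /* index of the blocked vCPU */
} FaultInterval;

typedef struct {
    uint64_t tp;      /* point in time */
    int cpu;
    int delta;        /* +1 for a start (S), -1 for an end (E) */
} FaultEvent;

static int compare_events(const void *a, const void *b)
{
    const FaultEvent *ea = a;
    const FaultEvent *eb = b;

    if (ea->tp != eb->tp) {
        return ea->tp < eb->tp ? -1 : 1;
    }
    return ea->delta - eb->delta;   /* at equal times, ends sort first */
}

/* Total time during which every one of ncpus vCPUs had a fault pending. */
static uint64_t all_cpus_blocked_time(const FaultInterval *iv, size_t n,
                                      int ncpus)
{
    int open[MAX_CPUS] = { 0 };     /* pending faults per vCPU */
    int blocked = 0;                /* vCPUs with at least one pending fault */
    uint64_t total = 0;
    size_t nev = 0, i;
    FaultEvent *ev;

    if (n == 0 || ncpus <= 0 || ncpus > MAX_CPUS) {
        return 0;
    }
    ev = malloc(2 * n * sizeof(*ev));
    if (!ev) {
        return 0;
    }
    for (i = 0; i < n; i++) {
        /* skip unfinished faults and faults not caused by a vCPU */
        if (iv[i].end <= iv[i].begin || iv[i].cpu < 0 || iv[i].cpu >= ncpus) {
            continue;
        }
        ev[nev++] = (FaultEvent){ iv[i].begin, iv[i].cpu, +1 };
        ev[nev++] = (FaultEvent){ iv[i].end, iv[i].cpu, -1 };
    }
    qsort(ev, nev, sizeof(*ev), compare_events);

    for (i = 0; i < nev; i++) {
        /* 'blocked' still describes the gap between event i-1 and event i */
        if (i > 0 && blocked == ncpus) {
            total += ev[i].tp - ev[i - 1].tp;
        }
        if (ev[i].delta > 0) {
            if (open[ev[i].cpu]++ == 0) {
                blocked++;
            }
        } else {
            assert(open[ev[i].cpu] > 0);
            if (--open[ev[i].cpu] == 0) {
                blocked--;
            }
        }
    }
    free(ev);
    return total;
}
```

Feeding it intervals shaped like the three-CPU picture (S1,S2,E1,S3,S1,E2,E3,E1) yields only the S1..E2 stretch, which matches the 'x' region in the diagram.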
> > diff --git a/migration/postcopy-ram.c b/migration/postcopy-ram.c
> > index 70f0480..ea89f4e 100644
> > --- a/migration/postcopy-ram.c
> > +++ b/migration/postcopy-ram.c
> > @@ -23,8 +23,10 @@
> >  #include "migration/postcopy-ram.h"
> >  #include "sysemu/sysemu.h"
> >  #include "sysemu/balloon.h"
> > +#include <sys/param.h>
> >  #include "qemu/error-report.h"
> >  #include "trace.h"
> > +#include "glib/glib-helper.h"
> >  
> >  /* Arbitrary limit on size of each discard command,
> >   * keeps them around ~200 bytes
> > @@ -81,6 +83,11 @@ static bool ufd_version_check(int ufd, MigrationIncomingState *mis)
> >          return false;
> >      }
> >  
> > +    if (mis && UFFD_FEATURE_THREAD_ID & api_struct.features) {
> 
> That's a very weird way of writing that test!  Also, I think you need
> to still make this user-selectable given the complexity/cost.
> 
> > +        mis->postcopy_downtime = g_tree_new_full(g_int_cmp64,
> > +                                         NULL, NULL, destroy_downtime_duration);
> > +    }
> > +
> >      if (getpagesize() != ram_pagesize_summary()) {
> >          bool have_hp = false;
> >          /* We've got a huge page */
> > @@ -404,6 +411,18 @@ static int ram_block_enable_notify(const char *block_name, void *host_addr,
> >      return 0;
> >  }
> >  
> > +static int get_mem_fault_cpu_index(uint32_t pid)
> > +{
> > +    CPUState *cpu_iter;
> > +
> > +    CPU_FOREACH(cpu_iter) {
> > +        if (cpu_iter->thread_id == pid)
> > +           return cpu_iter->cpu_index;
> > +    }
> > +    trace_get_mem_fault_cpu_index(pid);
> > +    return -1;
> > +}
> > +
> >  /*
> >   * Handle faults detected by the USERFAULT markings
> >   */
> > @@ -481,8 +500,10 @@ static void *postcopy_ram_fault_thread(void *opaque)
> >          rb_offset &= ~(qemu_ram_pagesize(rb) - 1);
> >          trace_postcopy_ram_fault_thread_request(msg.arg.pagefault.address,
> >                                                  qemu_ram_get_idstr(rb),
> > -                                                rb_offset);
> > +                                                rb_offset, msg.arg.pagefault.feat.ptid);
> 
> Line length!
> 
> >  
> > +        mark_postcopy_downtime_begin(msg.arg.pagefault.address,
> > +                            get_mem_fault_cpu_index(msg.arg.pagefault.feat.ptid));
> >          /*
> >           * Send the request to the source - we want to request one
> >           * of our host page sizes (which is >= TPS)
> > @@ -577,6 +598,7 @@ int postcopy_place_page(MigrationIncomingState *mis, void *host, void *from,
> >  
> >          return -e;
> >      }
> > +    mark_postcopy_downtime_end((uint64_t)host);
> >  
> >      trace_postcopy_place_page(host);
> >      return 0;
> > diff --git a/migration/qemu-file.c b/migration/qemu-file.c
> > index 195fa94..c9f3e47 100644
> > --- a/migration/qemu-file.c
> > +++ b/migration/qemu-file.c
> > @@ -547,7 +547,6 @@ size_t qemu_get_buffer_in_place(QEMUFile *f, uint8_t **buf, size_t size)
> >  int qemu_peek_byte(QEMUFile *f, int offset)
> >  {
> >      int index = f->buf_index + offset;
> > -
> 
> Stray!
> 
> >      assert(!qemu_file_is_writable(f));
> >      assert(offset < IO_BUF_SIZE);
> >  
> > diff --git a/migration/trace-events b/migration/trace-events
> > index 7372ce2..ab2e1e4 100644
> > --- a/migration/trace-events
> > +++ b/migration/trace-events
> > @@ -110,6 +110,12 @@ process_incoming_migration_co_end(int ret, int ps) "ret=%d postcopy-state=%d"
> >  process_incoming_migration_co_postcopy_end_main(void) ""
> >  migration_set_incoming_channel(void *ioc, const char *ioctype) "ioc=%p ioctype=%s"
> >  migration_set_outgoing_channel(void *ioc, const char *ioctype, const char *hostname)  "ioc=%p ioctype=%s hostname=%s"
> > +mark_postcopy_downtime_begin(uint64_t addr, void *dd, int64_t time, int cpu) "addr 0x%" PRIx64 " dd %p time %" PRId64 " cpu %d"
> > +mark_postcopy_downtime_end(uint64_t addr, void *dd, int64_t time) "addr 0x%" PRIx64 " dd %p time %" PRId64
> > +get_postcopy_total_downtime(int num, uint64_t total) "faults %d, total downtime %" PRIu64
> > +split_duration_and_fill_points(int64_t downtime, uint64_t addr) "downtime %" PRId64 " addr 0x%" PRIx64
> > +downtime_per_cpu(int cpu_index, int downtime) "downtime cpu[%d]=%d"
> > +source_return_path_thread_downtime(uint64_t downtime) "downtime %" PRIu64
> >  
> >  # migration/rdma.c
> >  qemu_rdma_accept_incoming_migration(void) ""
> > @@ -186,7 +192,7 @@ postcopy_ram_enable_notify(void) ""
> >  postcopy_ram_fault_thread_entry(void) ""
> >  postcopy_ram_fault_thread_exit(void) ""
> >  postcopy_ram_fault_thread_quit(void) ""
> > -postcopy_ram_fault_thread_request(uint64_t hostaddr, const char *ramblock, size_t offset) "Request for HVA=%" PRIx64 " rb=%s offset=%zx"
> > +postcopy_ram_fault_thread_request(uint64_t hostaddr, const char *ramblock, size_t offset, int pid) "Request for HVA=%" PRIx64 " rb=%s offset=%zx %d"
> >  postcopy_ram_incoming_cleanup_closeuf(void) ""
> >  postcopy_ram_incoming_cleanup_entry(void) ""
> >  postcopy_ram_incoming_cleanup_exit(void) ""
> > @@ -195,6 +201,7 @@ save_xbzrle_page_skipping(void) ""
> >  save_xbzrle_page_overflow(void) ""
> >  ram_save_iterate_big_wait(uint64_t milliconds, int iterations) "big wait: %" PRIu64 " milliseconds, %d iterations"
> >  ram_load_complete(int ret, uint64_t seq_iter) "exit_code %d seq iteration %" PRIu64
> > +get_mem_fault_cpu_index(uint32_t pid) "pid %u is not vCPU"
> >  
> >  # migration/exec.c
> >  migration_exec_outgoing(const char *cmd) "cmd=%s"
> > -- 
> > 1.8.3.1
> > 
> 
> Dave
> 
> --
> Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK
> 

-- 

BR
Alexey


* Re: [Qemu-devel] [PATCH 3/6] migration: add UFFD_FEATURE_THREAD_ID feature support
  2017-04-21 15:22         ` Alexey
@ 2017-04-24  8:03           ` Peter Xu
  2017-04-24  8:12           ` Peter Xu
  1 sibling, 0 replies; 38+ messages in thread
From: Peter Xu @ 2017-04-24  8:03 UTC (permalink / raw)
  To: Alexey; +Cc: Dr. David Alan Gilbert, i.maximets, aarcange, qemu-devel

On Fri, Apr 21, 2017 at 06:22:12PM +0300, Alexey wrote:
> On Fri, Apr 21, 2017 at 11:24:54AM +0100, Dr. David Alan Gilbert wrote:
> > * Alexey Perevalov (a.perevalov@samsung.com) wrote:
> > > Userfaultfd mechanism is able to provide process thread id,
> > > in case when client request it with UFDD_API ioctl.
> > > 
> > > Signed-off-by: Alexey Perevalov <a.perevalov@samsung.com>
> > 
> > There seem to be two parts to this:
> >   a) Adding the mis parameter to ufd_version_check
> >   b) Asking for the feature
> > 
> > Please split it into two patches.
> > 
> > Also....
> > 
> > > ---
> > >  include/migration/postcopy-ram.h |  2 +-
> > >  migration/migration.c            |  2 +-
> > >  migration/postcopy-ram.c         | 12 ++++++------
> > >  migration/savevm.c               |  2 +-
> > >  4 files changed, 9 insertions(+), 9 deletions(-)
> > > 
> > > diff --git a/include/migration/postcopy-ram.h b/include/migration/postcopy-ram.h
> > > index 8e036b9..809f6db 100644
> > > --- a/include/migration/postcopy-ram.h
> > > +++ b/include/migration/postcopy-ram.h
> > > @@ -14,7 +14,7 @@
> > >  #define QEMU_POSTCOPY_RAM_H
> > >  
> > >  /* Return true if the host supports everything we need to do postcopy-ram */
> > > -bool postcopy_ram_supported_by_host(void);
> > > +bool postcopy_ram_supported_by_host(MigrationIncomingState *mis);
> > >  
> > >  /*
> > >   * Make all of RAM sensitive to accesses to areas that haven't yet been written
> > > diff --git a/migration/migration.c b/migration/migration.c
> > > index ad4036f..79f6425 100644
> > > --- a/migration/migration.c
> > > +++ b/migration/migration.c
> > > @@ -802,7 +802,7 @@ void qmp_migrate_set_capabilities(MigrationCapabilityStatusList *params,
> > >           * special support.
> > >           */
> > >          if (!old_postcopy_cap && runstate_check(RUN_STATE_INMIGRATE) &&
> > > -            !postcopy_ram_supported_by_host()) {
> > > +            !postcopy_ram_supported_by_host(NULL)) {
> > >              /* postcopy_ram_supported_by_host will have emitted a more
> > >               * detailed message
> > >               */
> > > diff --git a/migration/postcopy-ram.c b/migration/postcopy-ram.c
> > > index dc80dbb..70f0480 100644
> > > --- a/migration/postcopy-ram.c
> > > +++ b/migration/postcopy-ram.c
> > > @@ -60,13 +60,13 @@ struct PostcopyDiscardState {
> > >  #include <sys/eventfd.h>
> > >  #include <linux/userfaultfd.h>
> > >  
> > > -static bool ufd_version_check(int ufd)
> > > +static bool ufd_version_check(int ufd, MigrationIncomingState *mis)
> > >  {
> > >      struct uffdio_api api_struct;
> > >      uint64_t ioctl_mask;
> > >  
> > >      api_struct.api = UFFD_API;
> > > -    api_struct.features = 0;
> > > +    api_struct.features = UFFD_FEATURE_THREAD_ID;
> > >      if (ioctl(ufd, UFFDIO_API, &api_struct)) {
> > >          error_report("postcopy_ram_supported_by_host: UFFDIO_API failed: %s",
> > >                       strerror(errno));
> > 
> > You're not actually using the 'mis' here - what I'd expected was
> > something that was going to check if the UFFDIO_API return said that it really
> > had the feature, and if so store a flag in the MIS somewhere.
> > 
> > Also, I'm not sure it's right to set 'api_struct.features' on the input - what
> > happens if this is run on an old kernel - we don't want postcopy to fail on
> > an old kernel without your feature.
> > I'm not 100% sure of the interface, but I think the way it works is you set
> > features = 0 before the call, and then check the api_struct.features in the
> > return - in the same way that I check for UFFD_FEATURE_MISSING_HUGETLBFS.
> > 
> We need to ask kernel about that feature,
> right,
> kernel returns back available features
> uffdio_api.features = UFFD_API_FEATURES
> but it also stores requested features

I feel like this does not go against Dave's comment; maybe we just need
to send the UFFDIO_API ioctl twice? Like:

diff --git a/migration/postcopy-ram.c b/migration/postcopy-ram.c
index 85fd8d7..fd0905f 100644
--- a/migration/postcopy-ram.c
+++ b/migration/postcopy-ram.c
@@ -64,6 +64,7 @@ static bool ufd_version_check(int ufd)
 {
     struct uffdio_api api_struct;
     uint64_t ioctl_mask;
+    uint64_t features = 0;

     api_struct.api = UFFD_API;
     api_struct.features = 0;
@@ -92,6 +93,27 @@ static bool ufd_version_check(int ufd)
             return false;
         }
     }
+
+#ifdef UFFD_FEATURE_THREAD_ID
+    if (api_struct.features & UFFD_FEATURE_THREAD_ID) {
+        features |= UFFD_FEATURE_THREAD_ID;
+    }
+#endif
+
+    if (features) {
+        /*
+         * If there are new features to be enabled from userspace,
+         * trigger another UFFDIO_API ioctl.
+         */
+        api_struct.api = UFFD_API;
+        api_struct.features = features;
+        if (ioctl(ufd, UFFDIO_API, &api_struct)) {
+            error_report("UFFDIO_API failed to setup features: 0x%"PRIx64,
+                         features);
+            return false;
+        }
+    }
+
     return true;
 }
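
One caveat with issuing UFFDIO_API twice on the same descriptor: as far as I can tell from fs/userfaultfd.c, the handshake is accepted only once per userfaultfd context, so a second call on the same fd would likely fail with EINVAL. An alternative is to probe the supported features on a throwaway descriptor and then request only the supported subset on the descriptor used for real. The sketch below models that flow in plain C; ModelUffd, model_uffdio_api(), model_negotiate() and MODEL_FEATURE_THREAD_ID are hypothetical stand-ins (not kernel or QEMU symbols), and the model encodes only the behaviour described in this thread.

```c
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

/* Arbitrary bit chosen for the model; not the real UFFD_FEATURE_THREAD_ID. */
#define MODEL_FEATURE_THREAD_ID (1ULL << 0)

/* Hypothetical model of one userfaultfd context. */
typedef struct {
    uint64_t supported;   /* features this "kernel" knows about */
    uint64_t enabled;     /* features enabled by the handshake */
    bool api_done;        /* handshake already performed on this context */
} ModelUffd;

/*
 * Model of the handshake: the caller passes the requested features in
 * *features; on success the supported set is written back and the
 * requested set is stored in the context.
 */
static int model_uffdio_api(ModelUffd *ctx, uint64_t *features)
{
    uint64_t requested = *features;

    if (ctx->api_done) {
        return -EINVAL;           /* assumed: only one handshake per context */
    }
    if (requested & ~ctx->supported) {
        *features = 0;
        return -EINVAL;           /* unknown feature requested */
    }
    *features = ctx->supported;   /* report everything that is available */
    ctx->enabled = requested;     /* enable only what was asked for */
    ctx->api_done = true;
    return 0;
}

/*
 * Probe on a throwaway context with features = 0, then enable the
 * intersection of 'wanted' and the supported set on the real context.
 */
static int model_negotiate(ModelUffd *probe, ModelUffd *real,
                           uint64_t wanted, uint64_t *enabled)
{
    uint64_t features = 0;
    uint64_t request;
    int ret = model_uffdio_api(probe, &features);

    if (ret) {
        return ret;
    }
    request = features & wanted;
    *enabled = request;
    return model_uffdio_api(real, &request);
}
```

In this model an old kernel (no supported features) still completes the handshake with nothing enabled, so postcopy itself would not fail because of the optional feature.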

> /* only enable the requested features for this uffd context */
>  ctx->features = uffd_ctx_features(features);
> 
> so, at the time when process thread id is going to be sent
> kernel checks if it was requested
> +       if (features & UFFD_FEATURE_THREAD_ID)
> +               msg.arg.pagefault.ptid = task_pid_vnr(current);

I am slightly curious about why we need this if block; after all,
userspace should know whether the ptid is valid from the first
UFFDIO_API feature list...

Thanks,

> 
> from patch message:
> 
>  Process's thread id is being provided when user requeste it
> by setting UFFD_FEATURE_THREAD_ID bit into uffdio_api.features.
> 
> UFFD_FEATURE_MISSING_HUGETLBFS - look like default, unconditional
> behavior (I didn't find any usage of that define in kernel).

-- 
Peter Xu


* Re: [Qemu-devel] [PATCH 3/6] migration: add UFFD_FEATURE_THREAD_ID feature support
  2017-04-21 15:22         ` Alexey
  2017-04-24  8:03           ` Peter Xu
@ 2017-04-24  8:12           ` Peter Xu
  2017-04-24  8:38             ` Alexey
  1 sibling, 1 reply; 38+ messages in thread
From: Peter Xu @ 2017-04-24  8:12 UTC (permalink / raw)
  To: Alexey; +Cc: Dr. David Alan Gilbert, i.maximets, aarcange, qemu-devel

On Fri, Apr 21, 2017 at 06:22:12PM +0300, Alexey wrote:
> On Fri, Apr 21, 2017 at 11:24:54AM +0100, Dr. David Alan Gilbert wrote:
> > * Alexey Perevalov (a.perevalov@samsung.com) wrote:
> > > Userfaultfd mechanism is able to provide process thread id,
> > > in case when client request it with UFDD_API ioctl.
> > > 
> > > Signed-off-by: Alexey Perevalov <a.perevalov@samsung.com>
> > 
> > There seem to be two parts to this:
> >   a) Adding the mis parameter to ufd_version_check
> >   b) Asking for the feature
> > 
> > Please split it into two patches.
> > 
> > Also....
> > 
> > > ---
> > >  include/migration/postcopy-ram.h |  2 +-
> > >  migration/migration.c            |  2 +-
> > >  migration/postcopy-ram.c         | 12 ++++++------
> > >  migration/savevm.c               |  2 +-
> > >  4 files changed, 9 insertions(+), 9 deletions(-)
> > > 
> > > diff --git a/include/migration/postcopy-ram.h b/include/migration/postcopy-ram.h
> > > index 8e036b9..809f6db 100644
> > > --- a/include/migration/postcopy-ram.h
> > > +++ b/include/migration/postcopy-ram.h
> > > @@ -14,7 +14,7 @@
> > >  #define QEMU_POSTCOPY_RAM_H
> > >  
> > >  /* Return true if the host supports everything we need to do postcopy-ram */
> > > -bool postcopy_ram_supported_by_host(void);
> > > +bool postcopy_ram_supported_by_host(MigrationIncomingState *mis);
> > >  
> > >  /*
> > >   * Make all of RAM sensitive to accesses to areas that haven't yet been written
> > > diff --git a/migration/migration.c b/migration/migration.c
> > > index ad4036f..79f6425 100644
> > > --- a/migration/migration.c
> > > +++ b/migration/migration.c
> > > @@ -802,7 +802,7 @@ void qmp_migrate_set_capabilities(MigrationCapabilityStatusList *params,
> > >           * special support.
> > >           */
> > >          if (!old_postcopy_cap && runstate_check(RUN_STATE_INMIGRATE) &&
> > > -            !postcopy_ram_supported_by_host()) {
> > > +            !postcopy_ram_supported_by_host(NULL)) {
> > >              /* postcopy_ram_supported_by_host will have emitted a more
> > >               * detailed message
> > >               */
> > > diff --git a/migration/postcopy-ram.c b/migration/postcopy-ram.c
> > > index dc80dbb..70f0480 100644
> > > --- a/migration/postcopy-ram.c
> > > +++ b/migration/postcopy-ram.c
> > > @@ -60,13 +60,13 @@ struct PostcopyDiscardState {
> > >  #include <sys/eventfd.h>
> > >  #include <linux/userfaultfd.h>
> > >  
> > > -static bool ufd_version_check(int ufd)
> > > +static bool ufd_version_check(int ufd, MigrationIncomingState *mis)
> > >  {
> > >      struct uffdio_api api_struct;
> > >      uint64_t ioctl_mask;
> > >  
> > >      api_struct.api = UFFD_API;
> > > -    api_struct.features = 0;
> > > +    api_struct.features = UFFD_FEATURE_THREAD_ID;
> > >      if (ioctl(ufd, UFFDIO_API, &api_struct)) {
> > >          error_report("postcopy_ram_supported_by_host: UFFDIO_API failed: %s",
> > >                       strerror(errno));
> > 
> > You're not actually using the 'mis' here - what I'd expected was
> > something that was going to check if the UFFDIO_API return said that it really
> > had the feature, and if so store a flag in the MIS somewhere.
> > 
> > Also, I'm not sure it's right to set 'api_struct.features' on the input - what
> > happens if this is run on an old kernel - we don't want postcopy to fail on
> > an old kernel without your feature.
> > I'm not 100% sure of the interface, but I think the way it works is you set
> > features = 0 before the call, and then check the api_struct.features in the
> > return - in the same way that I check for UFFD_FEATURE_MISSING_HUGETLBFS.
> > 
> We need to ask kernel about that feature,
> right,
> kernel returns back available features
> uffdio_api.features = UFFD_API_FEATURES
> but it also stores requested features

I feel like this does not go against Dave's comment; maybe we just need
to send the UFFDIO_API ioctl twice? Like:

diff --git a/migration/postcopy-ram.c b/migration/postcopy-ram.c
index 85fd8d7..fd0905f 100644
--- a/migration/postcopy-ram.c
+++ b/migration/postcopy-ram.c
@@ -64,6 +64,7 @@ static bool ufd_version_check(int ufd)
 {
     struct uffdio_api api_struct;
     uint64_t ioctl_mask;
+    uint64_t features = 0;

     api_struct.api = UFFD_API;
     api_struct.features = 0;
@@ -92,6 +93,27 @@ static bool ufd_version_check(int ufd)
             return false;
         }
     }
+
+#ifdef UFFD_FEATURE_THREAD_ID
+    if (api_struct.features & UFFD_FEATURE_THREAD_ID) {
+        features |= UFFD_FEATURE_THREAD_ID;
+    }
+#endif
+
+    if (features) {
+        /*
+         * If there are new features to be enabled from userspace,
+         * trigger another UFFDIO_API ioctl.
+         */
+        api_struct.api = UFFD_API;
+        api_struct.features = features;
+        if (ioctl(ufd, UFFDIO_API, &api_struct)) {
+            error_report("UFFDIO_API failed to setup features: 0x%"PRIx64,
+                         features);
+            return false;
+        }
+    }
+
     return true;
 }

> /* only enable the requested features for this uffd context */
>  ctx->features = uffd_ctx_features(features);
> 
> so, at the time when process thread id is going to be sent
> kernel checks if it was requested
> +       if (features & UFFD_FEATURE_THREAD_ID)
> +               msg.arg.pagefault.ptid = task_pid_vnr(current);

(I am slightly curious about why we need this if block; after all,
 userspace should know whether the ptid field would be valid from the
 first UFFDIO_API ioctl, right?)

Thanks,

> 
> from patch message:
> 
>  Process's thread id is being provided when user requeste it
> by setting UFFD_FEATURE_THREAD_ID bit into uffdio_api.features.
> 
> UFFD_FEATURE_MISSING_HUGETLBFS - look like default, unconditional
> behavior (I didn't find any usage of that define in kernel).

-- 
Peter Xu


* Re: [Qemu-devel] [PATCH 3/6] migration: add UFFD_FEATURE_THREAD_ID feature support
  2017-04-24  8:12           ` Peter Xu
@ 2017-04-24  8:38             ` Alexey
  2017-04-24 17:10               ` Dr. David Alan Gilbert
  0 siblings, 1 reply; 38+ messages in thread
From: Alexey @ 2017-04-24  8:38 UTC (permalink / raw)
  To: Peter Xu; +Cc: i.maximets, aarcange, Dr. David Alan Gilbert, qemu-devel

On Mon, Apr 24, 2017 at 04:12:29PM +0800, Peter Xu wrote:
> On Fri, Apr 21, 2017 at 06:22:12PM +0300, Alexey wrote:
> > On Fri, Apr 21, 2017 at 11:24:54AM +0100, Dr. David Alan Gilbert wrote:
> > > * Alexey Perevalov (a.perevalov@samsung.com) wrote:
> > > > Userfaultfd mechanism is able to provide process thread id,
> > > > in case when client request it with UFDD_API ioctl.
> > > > 
> > > > Signed-off-by: Alexey Perevalov <a.perevalov@samsung.com>
> > > 
> > > There seem to be two parts to this:
> > >   a) Adding the mis parameter to ufd_version_check
> > >   b) Asking for the feature
> > > 
> > > Please split it into two patches.
> > > 
> > > Also....
> > > 
> > > > ---
> > > >  include/migration/postcopy-ram.h |  2 +-
> > > >  migration/migration.c            |  2 +-
> > > >  migration/postcopy-ram.c         | 12 ++++++------
> > > >  migration/savevm.c               |  2 +-
> > > >  4 files changed, 9 insertions(+), 9 deletions(-)
> > > > 
> > > > diff --git a/include/migration/postcopy-ram.h b/include/migration/postcopy-ram.h
> > > > index 8e036b9..809f6db 100644
> > > > --- a/include/migration/postcopy-ram.h
> > > > +++ b/include/migration/postcopy-ram.h
> > > > @@ -14,7 +14,7 @@
> > > >  #define QEMU_POSTCOPY_RAM_H
> > > >  
> > > >  /* Return true if the host supports everything we need to do postcopy-ram */
> > > > -bool postcopy_ram_supported_by_host(void);
> > > > +bool postcopy_ram_supported_by_host(MigrationIncomingState *mis);
> > > >  
> > > >  /*
> > > >   * Make all of RAM sensitive to accesses to areas that haven't yet been written
> > > > diff --git a/migration/migration.c b/migration/migration.c
> > > > index ad4036f..79f6425 100644
> > > > --- a/migration/migration.c
> > > > +++ b/migration/migration.c
> > > > @@ -802,7 +802,7 @@ void qmp_migrate_set_capabilities(MigrationCapabilityStatusList *params,
> > > >           * special support.
> > > >           */
> > > >          if (!old_postcopy_cap && runstate_check(RUN_STATE_INMIGRATE) &&
> > > > -            !postcopy_ram_supported_by_host()) {
> > > > +            !postcopy_ram_supported_by_host(NULL)) {
> > > >              /* postcopy_ram_supported_by_host will have emitted a more
> > > >               * detailed message
> > > >               */
> > > > diff --git a/migration/postcopy-ram.c b/migration/postcopy-ram.c
> > > > index dc80dbb..70f0480 100644
> > > > --- a/migration/postcopy-ram.c
> > > > +++ b/migration/postcopy-ram.c
> > > > @@ -60,13 +60,13 @@ struct PostcopyDiscardState {
> > > >  #include <sys/eventfd.h>
> > > >  #include <linux/userfaultfd.h>
> > > >  
> > > > -static bool ufd_version_check(int ufd)
> > > > +static bool ufd_version_check(int ufd, MigrationIncomingState *mis)
> > > >  {
> > > >      struct uffdio_api api_struct;
> > > >      uint64_t ioctl_mask;
> > > >  
> > > >      api_struct.api = UFFD_API;
> > > > -    api_struct.features = 0;
> > > > +    api_struct.features = UFFD_FEATURE_THREAD_ID;
> > > >      if (ioctl(ufd, UFFDIO_API, &api_struct)) {
> > > >          error_report("postcopy_ram_supported_by_host: UFFDIO_API failed: %s",
> > > >                       strerror(errno));
> > > 
> > > You're not actually using the 'mis' here - what I'd expected was
> > > something that was going to check if the UFFDIO_API return said that it really
> > > had the feature, and if so store a flag in the MIS somewhere.
> > > 
> > > Also, I'm not sure it's right to set 'api_struct.features' on the input - what
> > > happens if this is run on an old kernel - we don't want postcopy to fail on
> > > an old kernel without your feature.
> > > I'm not 100% sure of the interface, but I think the way it works is you set
> > > features = 0 before the call, and then check the api_struct.features in the
> > > return - in the same way that I check for UFFD_FEATURE_MISSING_HUGETLBFS.
> > > 
> > We need to ask kernel about that feature,
> > right,
> > kernel returns back available features
> > uffdio_api.features = UFFD_API_FEATURES
> > but it also stores requested features
> 
> I feel like this does not go against Dave's comment; maybe we just need
> to send the UFFDIO_API ioctl twice? Like:
Yes, the UFFDIO_API ioctl will fail on an old kernel if we request
e.g. UFFD_FEATURE_THREAD_ID or any other new feature.

So in general we need a per-feature request, for better error handling.

> 
> diff --git a/migration/postcopy-ram.c b/migration/postcopy-ram.c
> index 85fd8d7..fd0905f 100644
> --- a/migration/postcopy-ram.c
> +++ b/migration/postcopy-ram.c
> @@ -64,6 +64,7 @@ static bool ufd_version_check(int ufd)
>  {
>      struct uffdio_api api_struct;
>      uint64_t ioctl_mask;
> +    uint64_t features = 0;
> 
>      api_struct.api = UFFD_API;
>      api_struct.features = 0;
> @@ -92,6 +93,27 @@ static bool ufd_version_check(int ufd)
>              return false;
>          }
>      }
> +
> +#ifdef UFFD_FEATURE_THREAD_ID
> +    if (api_struct.features & UFFD_FEATURE_THREAD_ID) {
> +        features |= UFFD_FEATURE_THREAD_ID;
> +    }
> +#endif
> +
> +    if (features) {
> +        /*
> +         * If there are new features to be enabled from userspace,
> +         * trigger another UFFDIO_API ioctl.
> +         */
> +        api_struct.api = UFFD_API;
> +        api_struct.features = features;
> +        if (ioctl(ufd, UFFDIO_API, &api_struct)) {
> +            error_report("UFFDIO_API failed to setup features: 0x%"PRIx64,
> +                         features);
> +            return false;
> +        }
> +    }
> +
>      return true;
>  }
> 
> > /* only enable the requested features for this uffd context */
> >  ctx->features = uffd_ctx_features(features);
> > 
> > so, at the time when process thread id is going to be sent
> > kernel checks if it was requested
> > +       if (features & UFFD_FEATURE_THREAD_ID)
> > +               msg.arg.pagefault.ptid = task_pid_vnr(current);
> 
> (I am slightly curious about why we need this if block, after all
>  userspace should know whether the ptid field would be valid from the
>  first UFFDIO_API ioctl, right?)
If I understand your question correctly :) that condition was suggested
because page faulting is a performance-critical path (in general, not only
in the postcopy case); that's why it should be enabled from userspace,
and only for statistics/debug purposes.
It also looks like David wants this to be an optional, not always-on,
feature on the QEMU side too.

> 
> Thanks,
> 
> > 
> > from patch message:
> > 
> >  Process's thread id is being provided when user requeste it
> > by setting UFFD_FEATURE_THREAD_ID bit into uffdio_api.features.
> > 
> > UFFD_FEATURE_MISSING_HUGETLBFS - look like default, unconditional
> > behavior (I didn't find any usage of that define in kernel).
> 
> -- 
> Peter Xu
> 

-- 

BR
Alexey


* Re: [Qemu-devel] [PATCH 3/6] migration: add UFFD_FEATURE_THREAD_ID feature support
  2017-04-24  8:38             ` Alexey
@ 2017-04-24 17:10               ` Dr. David Alan Gilbert
  2017-04-25  7:55                 ` Alexey
  0 siblings, 1 reply; 38+ messages in thread
From: Dr. David Alan Gilbert @ 2017-04-24 17:10 UTC (permalink / raw)
  To: Alexey; +Cc: Peter Xu, i.maximets, aarcange, qemu-devel

* Alexey (a.perevalov@samsung.com) wrote:
> On Mon, Apr 24, 2017 at 04:12:29PM +0800, Peter Xu wrote:
> > On Fri, Apr 21, 2017 at 06:22:12PM +0300, Alexey wrote:
> > > On Fri, Apr 21, 2017 at 11:24:54AM +0100, Dr. David Alan Gilbert wrote:
> > > > * Alexey Perevalov (a.perevalov@samsung.com) wrote:
> > > > > Userfaultfd mechanism is able to provide process thread id,
> > > > > in case when client request it with UFDD_API ioctl.
> > > > > 
> > > > > Signed-off-by: Alexey Perevalov <a.perevalov@samsung.com>
> > > > 
> > > > There seem to be two parts to this:
> > > >   a) Adding the mis parameter to ufd_version_check
> > > >   b) Asking for the feature
> > > > 
> > > > Please split it into two patches.
> > > > 
> > > > Also....
> > > > 
> > > > > ---
> > > > >  include/migration/postcopy-ram.h |  2 +-
> > > > >  migration/migration.c            |  2 +-
> > > > >  migration/postcopy-ram.c         | 12 ++++++------
> > > > >  migration/savevm.c               |  2 +-
> > > > >  4 files changed, 9 insertions(+), 9 deletions(-)
> > > > > 
> > > > > diff --git a/include/migration/postcopy-ram.h b/include/migration/postcopy-ram.h
> > > > > index 8e036b9..809f6db 100644
> > > > > --- a/include/migration/postcopy-ram.h
> > > > > +++ b/include/migration/postcopy-ram.h
> > > > > @@ -14,7 +14,7 @@
> > > > >  #define QEMU_POSTCOPY_RAM_H
> > > > >  
> > > > >  /* Return true if the host supports everything we need to do postcopy-ram */
> > > > > -bool postcopy_ram_supported_by_host(void);
> > > > > +bool postcopy_ram_supported_by_host(MigrationIncomingState *mis);
> > > > >  
> > > > >  /*
> > > > >   * Make all of RAM sensitive to accesses to areas that haven't yet been written
> > > > > diff --git a/migration/migration.c b/migration/migration.c
> > > > > index ad4036f..79f6425 100644
> > > > > --- a/migration/migration.c
> > > > > +++ b/migration/migration.c
> > > > > @@ -802,7 +802,7 @@ void qmp_migrate_set_capabilities(MigrationCapabilityStatusList *params,
> > > > >           * special support.
> > > > >           */
> > > > >          if (!old_postcopy_cap && runstate_check(RUN_STATE_INMIGRATE) &&
> > > > > -            !postcopy_ram_supported_by_host()) {
> > > > > +            !postcopy_ram_supported_by_host(NULL)) {
> > > > >              /* postcopy_ram_supported_by_host will have emitted a more
> > > > >               * detailed message
> > > > >               */
> > > > > diff --git a/migration/postcopy-ram.c b/migration/postcopy-ram.c
> > > > > index dc80dbb..70f0480 100644
> > > > > --- a/migration/postcopy-ram.c
> > > > > +++ b/migration/postcopy-ram.c
> > > > > @@ -60,13 +60,13 @@ struct PostcopyDiscardState {
> > > > >  #include <sys/eventfd.h>
> > > > >  #include <linux/userfaultfd.h>
> > > > >  
> > > > > -static bool ufd_version_check(int ufd)
> > > > > +static bool ufd_version_check(int ufd, MigrationIncomingState *mis)
> > > > >  {
> > > > >      struct uffdio_api api_struct;
> > > > >      uint64_t ioctl_mask;
> > > > >  
> > > > >      api_struct.api = UFFD_API;
> > > > > -    api_struct.features = 0;
> > > > > +    api_struct.features = UFFD_FEATURE_THREAD_ID;
> > > > >      if (ioctl(ufd, UFFDIO_API, &api_struct)) {
> > > > >          error_report("postcopy_ram_supported_by_host: UFFDIO_API failed: %s",
> > > > >                       strerror(errno));
> > > > 
> > > > You're not actually using the 'mis' here - what I'd expected was
> > > > something that was going to check if the UFFDIO_API return said that it really
> > > > had the feature, and if so store a flag in the MIS somewhere.
> > > > 
> > > > Also, I'm not sure it's right to set 'api_struct.features' on the input - what
> > > > happens if this is run on an old kernel - we don't want postcopy to fail on
> > > > an old kernel without your feature.
> > > > I'm not 100% sure of the interface, but I think the way it works is you set
> > > > features = 0 before the call, and then check the api_struct.features in the
> > > > return - in the same way that I check for UFFD_FEATURE_MISSING_HUGETLBFS.
> > > > 
> > > We need to ask kernel about that feature,
> > > right,
> > > kernel returns back available features
> > > uffdio_api.features = UFFD_API_FEATURES
> > > but it also stores requested features
> > 
> > I feel like this does not go against Dave's comment; maybe we just need
> > to send the UFFDIO_API twice? Like:
> Yes, an ioctl with UFFDIO_API will fail on an old kernel if we request
> e.g. UFFD_FEATURE_THREAD_ID or another new feature.
> 
> So in general we need a per-feature request, for better error handling.

No, we don't need to - I think the way the kernel works is that you pass
features = 0 in, and it sets api_struct.features on the way out;
so if you always pass 0 in, you can then just check the features that
it returns.
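
As a rough illustration of that check (a sketch only, assuming the kernel
reports its supported features in api_struct.features on return; the EX_*
bit values and the ex_select_features() helper are hypothetical and are not
taken from <linux/userfaultfd.h> or QEMU), the caller can treat the
thread-id bit as optional so that an old kernel still passes:

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-ins for the UFFD_FEATURE_* bits; values are made up. */
#define EX_FEATURE_MISSING_HUGETLBFS (1ULL << 0)
#define EX_FEATURE_THREAD_ID         (1ULL << 1)

/*
 * 'reported' is the feature mask read back after the UFFDIO_API call.
 * Fails only when a required feature is missing; optional features are
 * used when present and silently skipped otherwise.
 */
static bool ex_select_features(uint64_t reported, uint64_t required,
                               uint64_t optional, uint64_t *enabled)
{
    if ((reported & required) != required) {
        return false;
    }
    *enabled = required | (reported & optional);
    return true;
}
```

With this shape, a kernel that reports no thread-id support still lets
postcopy proceed; only the downtime statistics would be unavailable.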

Dave

> 
> > 
> > diff --git a/migration/postcopy-ram.c b/migration/postcopy-ram.c
> > index 85fd8d7..fd0905f 100644
> > --- a/migration/postcopy-ram.c
> > +++ b/migration/postcopy-ram.c
> > @@ -64,6 +64,7 @@ static bool ufd_version_check(int ufd)
> >  {
> >      struct uffdio_api api_struct;
> >      uint64_t ioctl_mask;
> > +    uint64_t features = 0;
> > 
> >      api_struct.api = UFFD_API;
> >      api_struct.features = 0;
> > @@ -92,6 +93,27 @@ static bool ufd_version_check(int ufd)
> >              return false;
> >          }
> >      }
> > +
> > +#ifdef UFFD_FEATURE_THREAD_ID
> > +    if (api_struct.features & UFFD_FEATURE_THREAD_ID) {
> > +        features |= UFFD_FEATURE_THREAD_ID;
> > +    }
> > +#endif
> > +
> > +    if (features) {
> > +        /*
> > +         * If there are new features to be enabled from userspace,
> > +         * trigger another UFFDIO_API ioctl.
> > +         */
> > +        api_struct.api = UFFD_API;
> > +        api_struct.features = features;
> > +        if (ioctl(ufd, UFFDIO_API, &api_struct)) {
> > +            error_report("UFFDIO_API failed to setup features: 0x%"PRIx64,
> > +                         features);
> > +            return false;
> > +        }
> > +    }
> > +
> >      return true;
> >  }
> > 
> > > /* only enable the requested features for this uffd context */
> > >  ctx->features = uffd_ctx_features(features);
> > > 
> > > so, at the time when process thread id is going to be sent
> > > kernel checks if it was requested
> > > +       if (features & UFFD_FEATURE_THREAD_ID)
> > > +               msg.arg.pagefault.ptid = task_pid_vnr(current);
> > 
> > (I am slightly curious about why we need this if block, after all
> >  userspace should know whether the ptid field would be valid from the
> >  first UFFDIO_API ioctl, right?)
> > If I understand your question correctly: that condition was suggested
> > because page faulting is a performance-critical path (in general, not only
> > in the postcopy case), so the feature should be enabled from userspace
> > only for statistics/debug purposes.
> > It also looks like David wants this to be an optional, not always-on,
> > feature in QEMU too.
> 
> > 
> > Thanks,
> > 
> > > 
> > > from patch message:
> > > 
> > >  The process's thread id is provided when the user requests it
> > > by setting the UFFD_FEATURE_THREAD_ID bit in uffdio_api.features.
> > > 
> > > UFFD_FEATURE_MISSING_HUGETLBFS looks like default, unconditional
> > > behavior (I didn't find any usage of that define in the kernel).
> > 
> > -- 
> > Peter Xu
> > 
> 
> -- 
> 
> BR
> Alexey
--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK

* Re: [Qemu-devel] [PATCH 4/6] migration: calculate downtime on dst side
  2017-04-21 18:47         ` Alexey
@ 2017-04-24 17:11           ` Dr. David Alan Gilbert
  0 siblings, 0 replies; 38+ messages in thread
From: Dr. David Alan Gilbert @ 2017-04-24 17:11 UTC (permalink / raw)
  To: Alexey; +Cc: i.maximets, qemu-devel

* Alexey (a.perevalov@samsung.com) wrote:
> Hello, David!
> 
> 
> I apologize, I forgot to check the patches with the checkpatch.pl script. I have
> now checked them and fixed the code style in the patches. I also checked whole
> files: migration.c has code style errors, and glib-compat.h does too.
> I could send those patches to qemu-trivial, if you don't object.

Feel free to send style patches to trivial;  if they're right next
to a line you're changing then you can include them in the same patch
but if they're elsewhere do as you say with a trivial patch.

Dave

> 
> On Fri, Apr 21, 2017 at 01:00:32PM +0100, Dr. David Alan Gilbert wrote:
> > * Alexey Perevalov (a.perevalov@samsung.com) wrote:
> > > This patch provides downtime calculation per vCPU,
> > > as a summary and as a overlapped value for all vCPUs.
> > > 
> > > This approach just keeps tree with page fault addr as a key,
> > > and t1-t2 interval of pagefault time and page copy time, with
> > > affected vCPU bit mask.
> > > For more implementation details please see comment to
> > > get_postcopy_total_downtime function.
> > > 
> > > Signed-off-by: Alexey Perevalov <a.perevalov@samsung.com>
> > > ---
> > >  include/migration/migration.h |  14 +++
> > >  migration/migration.c         | 280 +++++++++++++++++++++++++++++++++++++++++-
> > >  migration/postcopy-ram.c      |  24 +++-
> > >  migration/qemu-file.c         |   1 -
> > >  migration/trace-events        |   9 +-
> > >  5 files changed, 323 insertions(+), 5 deletions(-)
> > > 
> > > diff --git a/include/migration/migration.h b/include/migration/migration.h
> > > index 5720c88..5d2c628 100644
> > > --- a/include/migration/migration.h
> > > +++ b/include/migration/migration.h
> > > @@ -123,10 +123,24 @@ struct MigrationIncomingState {
> > >  
> > >      /* See savevm.c */
> > >      LoadStateEntry_Head loadvm_handlers;
> > > +
> > > +    /*
> > > +     *  Tree for keeping postcopy downtime,
> > > +     *  necessary to calculate correct downtime, during multiple
> > > +     *  vm suspends, it keeps host page address as a key and
> > > +     *  DowntimeDuration as a data
> > > +     *  NULL means kernel couldn't provide process thread id,
> > > +     *  and QEMU couldn't identify which vCPU raise page fault
> > > +     */
> > > +    GTree *postcopy_downtime;
> > >  };
> > >  
> > >  MigrationIncomingState *migration_incoming_get_current(void);
> > >  void migration_incoming_state_destroy(void);
> > > +void mark_postcopy_downtime_begin(uint64_t addr, int cpu);
> > > +void mark_postcopy_downtime_end(uint64_t addr);
> > > +uint64_t get_postcopy_total_downtime(void);
> > > +void destroy_downtime_duration(gpointer data);
> > >  
> > >  /*
> > >   * An outstanding page request, on the source, having been received
> > > diff --git a/migration/migration.c b/migration/migration.c
> > > index 79f6425..5bac434 100644
> > > --- a/migration/migration.c
> > > +++ b/migration/migration.c
> > > @@ -38,6 +38,8 @@
> > >  #include "io/channel-tls.h"
> > >  #include "migration/colo.h"
> > >  
> > > +#define DEBUG_VCPU_DOWNTIME 1
> > > +
> > >  #define MAX_THROTTLE  (32 << 20)      /* Migration transfer speed throttling */
> > >  
> > >  /* Amount of time to allocate to each "chunk" of bandwidth-throttled
> > > @@ -77,6 +79,19 @@ static NotifierList migration_state_notifiers =
> > >  
> > >  static bool deferred_incoming;
> > >  
> > > +typedef struct {
> > > +    int64_t begin;
> > > +    int64_t end;
> > > +    uint64_t *cpus; /* cpus bit mask array, QEMU bit functions support
> > > +     bit operation on memory regions, but doesn't check out of range */
> > > +} DowntimeDuration;
> > > +
> > > +typedef struct {
> > > +    int64_t tp; /* point in time */
> > > +    bool is_end;
> > > +    uint64_t *cpus;
> > > +} OverlapDowntime;
> > > +
> > >  /*
> > >   * Current state of incoming postcopy; note this is not part of
> > >   * MigrationIncomingState since it's state is used during cleanup
> > > @@ -117,6 +132,13 @@ MigrationState *migrate_get_current(void)
> > >      return &current_migration;
> > >  }
> > >  
> > > +void destroy_downtime_duration(gpointer data)
> > > +{
> > > +    DowntimeDuration *dd = (DowntimeDuration *)data;
> > > +    g_free(dd->cpus);
> > > +    g_free(data);
> > > +}
> > > +
> > >  MigrationIncomingState *migration_incoming_get_current(void)
> > >  {
> > >      static bool once;
> > > @@ -138,10 +160,13 @@ void migration_incoming_state_destroy(void)
> > >      struct MigrationIncomingState *mis = migration_incoming_get_current();
> > >  
> > >      qemu_event_destroy(&mis->main_thread_load_event);
> > > +    if (mis->postcopy_downtime) {
> > > +        g_tree_destroy(mis->postcopy_downtime);
> > > +        mis->postcopy_downtime = NULL;
> > > +    }
> > >      loadvm_free_handlers(mis);
> > >  }
> > >  
> > > -
> > >  typedef struct {
> > >      bool optional;
> > >      uint32_t size;
> > > @@ -1754,7 +1779,6 @@ static int postcopy_start(MigrationState *ms, bool *old_vm_running)
> > >       */
> > >      ms->postcopy_after_devices = true;
> > >      notifier_list_notify(&migration_state_notifiers, ms);
> > > -
> > 
> > Stray deletion
> > 
> > >      ms->downtime =  qemu_clock_get_ms(QEMU_CLOCK_REALTIME) - time_at_stop;
> > >  
> > >      qemu_mutex_unlock_iothread();
> > > @@ -2117,3 +2141,255 @@ PostcopyState postcopy_state_set(PostcopyState new_state)
> > >      return atomic_xchg(&incoming_postcopy_state, new_state);
> > >  }
> > >  
> > > +#define SIZE_TO_KEEP_CPUBITS (1 + smp_cpus/sizeof(guint64))
> > 
> > Split out your cpu-sets so that you have an 'alloc_cpu_set',
> > a 'set bit' a 'set all bits', dup etc
> > (I see Linux has cpumask.h that has a 'cpu_set' that's
> > basically the same thing, but we need something portablish.)
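
A minimal sketch of what such helpers could look like, using only standard
C. The ExCpuSet type and the ex_cpu_set_* names are hypothetical (nothing
here is existing QEMU API), and the set is sized in whole 64-bit words
computed from the number of vCPUs:

```c
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define EX_BITS_PER_WORD 64

typedef struct {
    size_t ncpus;   /* number of valid bits */
    size_t nwords;  /* number of uint64_t words backing the set */
    uint64_t *bits;
} ExCpuSet;

static ExCpuSet *ex_cpu_set_alloc(size_t ncpus)
{
    ExCpuSet *s = malloc(sizeof(*s));

    if (!s) {
        return NULL;
    }
    s->ncpus = ncpus;
    s->nwords = (ncpus + EX_BITS_PER_WORD - 1) / EX_BITS_PER_WORD;
    /* Always allocate at least one word so bits is never NULL. */
    s->bits = calloc(s->nwords ? s->nwords : 1, sizeof(uint64_t));
    if (!s->bits) {
        free(s);
        return NULL;
    }
    return s;
}

static void ex_cpu_set_free(ExCpuSet *s)
{
    if (s) {
        free(s->bits);
        free(s);
    }
}

/* Out-of-range indexes are ignored instead of writing past the array. */
static void ex_cpu_set_bit(ExCpuSet *s, size_t cpu)
{
    if (cpu < s->ncpus) {
        s->bits[cpu / EX_BITS_PER_WORD] |= 1ULL << (cpu % EX_BITS_PER_WORD);
    }
}

static bool ex_cpu_set_test(const ExCpuSet *s, size_t cpu)
{
    return cpu < s->ncpus &&
           (s->bits[cpu / EX_BITS_PER_WORD] >> (cpu % EX_BITS_PER_WORD)) & 1;
}

static void ex_cpu_set_all(ExCpuSet *s)
{
    for (size_t cpu = 0; cpu < s->ncpus; cpu++) {
        ex_cpu_set_bit(s, cpu);
    }
}

/* True when every vCPU in the set is marked, i.e. "all cpus are blocked". */
static bool ex_cpu_set_is_full(const ExCpuSet *s)
{
    for (size_t cpu = 0; cpu < s->ncpus; cpu++) {
        if (!ex_cpu_set_test(s, cpu)) {
            return false;
        }
    }
    return true;
}

static ExCpuSet *ex_cpu_set_dup(const ExCpuSet *src)
{
    ExCpuSet *dst = ex_cpu_set_alloc(src->ncpus);

    if (dst) {
        memcpy(dst->bits, src->bits, src->nwords * sizeof(uint64_t));
    }
    return dst;
}
```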
> > 
> > > +void mark_postcopy_downtime_begin(uint64_t addr, int cpu)
> > > +{
> > > +    MigrationIncomingState *mis = migration_incoming_get_current();
> > > +    DowntimeDuration *dd;
> > > +    if (!mis->postcopy_downtime) {
> > > +        return;
> > > +    }
> > > +
> > > +    dd = g_tree_lookup(mis->postcopy_downtime, (gpointer)addr); /* !!! cast */
> > > +    if (!dd) {
> > > +        dd = (DowntimeDuration *)g_new0(DowntimeDuration, 1);
> > > +        dd->cpus = g_new0(guint64, SIZE_TO_KEEP_CPUBITS);
> > > +        g_tree_insert(mis->postcopy_downtime, (gpointer)addr, (gpointer)dd);
> > > +    }
> > > +
> > > +    if (cpu < 0) {
> > > +        /* assume in this situation all vCPUs are sleeping */
> > > +        int i;
> > > +        for (i = 0; i < SIZE_TO_KEEP_CPUBITS; i++) {
> > > +            dd->cpus[i] = ~(uint64_t)0u;
> > > +        }
> > > +    } else
> > > +        set_bit(cpu, dd->cpus);
> > 
> > Qemu coding style: Use {}'s even on one line blocks
> > 
> > > +
> > > +    /*
> > > +     *  overwrite previously set dd->begin, if that page already was
> > > +     *     faulted on another cpu
> > > +     */
> > > +    dd->begin = qemu_clock_get_ms(QEMU_CLOCK_REALTIME);
> > 
> > OK, so this is making a decision that needs to be documented;
> > that is that if one CPU was already paused at time (a), then a second
> > CPU we see is paused at time (b), then the time  we record only starts
> > at (b) and ignores the time from a..b  - is that the way you want to do it?
> Yes, a time interval during which at least one vCPU is running isn't counted.
> 
> > As I say, it should be documented somewhere; it's probably worth
> > adding something to docs/migration.txt about how this measurement works.
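
To make the rule concrete, here is a small sketch that counts only the time
during which every vCPU is blocked. It is not the patch's algorithm (which
searches backwards from each end point); it is a plain sweep over sorted
begin/end points with a per-vCPU counter, and it assumes at most 64 vCPUs so
that one uint64_t mask suffices. ExFault, ExPoint and ex_total_downtime()
are hypothetical names. As in the patch, a begin or end value of 0 means
"not recorded".

```c
#include <stdint.h>
#include <stdlib.h>

#define EX_MAX_CPUS 64

typedef struct {
    int64_t begin;  /* page fault time (ms); 0 = not recorded */
    int64_t end;    /* page placed time (ms); 0 = not recorded */
    uint64_t cpus;  /* bit i set: vCPU i was blocked on this page */
} ExFault;

typedef struct {
    int64_t tp;     /* point in time */
    int delta;      /* +1 for a begin point, -1 for an end point */
    uint64_t cpus;
} ExPoint;

static int ex_cmp_point(const void *a, const void *b)
{
    const ExPoint *pa = a;
    const ExPoint *pb = b;

    /* Compare explicitly; a subtracted int64_t may not fit in an int. */
    return (pa->tp > pb->tp) - (pa->tp < pb->tp);
}

static int64_t ex_total_downtime(const ExFault *faults, size_t n, int ncpus)
{
    int blocked[EX_MAX_CPUS] = { 0 }; /* open intervals per vCPU */
    int nblocked = 0;                 /* vCPUs with blocked[cpu] > 0 */
    int64_t total = 0;
    size_t npts = 0, i;
    ExPoint *pts;
    int cpu;

    if (n == 0 || ncpus <= 0 || ncpus > EX_MAX_CPUS) {
        return 0;
    }
    pts = calloc(2 * n, sizeof(*pts));
    if (!pts) {
        return 0;
    }
    for (i = 0; i < n; i++) {
        if (!faults[i].begin || !faults[i].end) {
            continue; /* fault never completed: skip it */
        }
        pts[npts++] = (ExPoint){ faults[i].begin, +1, faults[i].cpus };
        pts[npts++] = (ExPoint){ faults[i].end, -1, faults[i].cpus };
    }
    qsort(pts, npts, sizeof(*pts), ex_cmp_point);

    for (i = 0; i < npts; i++) {
        /* The span since the previous point counts only if all were blocked. */
        if (i > 0 && nblocked == ncpus) {
            total += pts[i].tp - pts[i - 1].tp;
        }
        for (cpu = 0; cpu < ncpus; cpu++) {
            if (!(pts[i].cpus & (1ULL << cpu))) {
                continue;
            }
            if (pts[i].delta > 0) {
                nblocked += (blocked[cpu]++ == 0);
            } else {
                nblocked -= (--blocked[cpu] == 0);
            }
        }
    }
    free(pts);
    return total;
}
```

Applied to a layout like the three-vCPU picture in the patch comment (CPU1
blocked twice, CPU2 and CPU3 once each), only the span where all three
intervals overlap contributes to the total.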
> > 
> > 
> > > +    trace_mark_postcopy_downtime_begin(addr, dd, dd->begin, cpu);
> > > +}
> > > +
> > > +void mark_postcopy_downtime_end(uint64_t addr)
> > > +{
> > > +    MigrationIncomingState *mis = migration_incoming_get_current();
> > > +    DowntimeDuration *dd;
> > > +    if (!mis->postcopy_downtime) {
> > > +        return;
> > > +    }
> > > +
> > > +    dd = g_tree_lookup(mis->postcopy_downtime, (gpointer)addr);
> > > +    if (!dd) {
> > > +        /* error_report("Could not populate downtime duration completion time \n\
> > > +                        There is no downtime duration for 0x%"PRIx64, addr); */
> > 
> > Error or no error - decide!   Is this happening for pages that arrive before
> > they've been requested?
> > 
> > > +        return;
> > > +    }
> > > +
> > > +    dd->end = qemu_clock_get_ms(QEMU_CLOCK_REALTIME);
> > > +    trace_mark_postcopy_downtime_end(addr, dd, dd->end);
> > > +}
> > > +
> > > +struct downtime_overlay_cxt {
> > > +    GPtrArray *downtime_points;
> > > +    size_t number_of_points;
> > > +};
> > 
> > Why 'cxt' ? If you mean as an abbreviation to context, then we normally use ctxt.
> > 
> > > +/*
> > > + * This function split each DowntimeDuration, which represents as start/end
> > > + * pointand makes a points of it, then fill array with points,
> > > + * to sort it in future.
> > > + */
> > > +static gboolean split_duration_and_fill_points(gpointer key, gpointer value,
> > > +                                        gpointer data)
> > > +{
> > > +    struct downtime_overlay_cxt *ctx = (struct downtime_overlay_cxt *)data;
> > > +    DowntimeDuration *dd = (DowntimeDuration *)value;
> > > +    GPtrArray *interval = ctx->downtime_points;
> > > +    if (dd->begin) {
> > > +        OverlapDowntime *od_begin = g_new0(OverlapDowntime, 1);
> > > +        od_begin->cpus = g_memdup(dd->cpus, sizeof(uint64_t) * SIZE_TO_KEEP_CPUBITS);
> > > +        od_begin->tp = dd->begin;
> > > +        od_begin->is_end = false;
> > > +        g_ptr_array_add(interval, od_begin);
> > > +        ctx->number_of_points += 1;
> > > +    }
> > > +
> > > +    if (dd->end) {
> > > +        OverlapDowntime *od_end = g_new0(OverlapDowntime, 1);
> > > +        od_end->cpus = g_memdup(dd->cpus, sizeof(uint64_t) * SIZE_TO_KEEP_CPUBITS);
> > > +        od_end->tp = dd->end;
> > > +        od_end->is_end = true;
> > > +        g_ptr_array_add(interval, od_end);
> > > +        ctx->number_of_points += 1;
> > > +    }
> > > +
> > > +    if (dd->end && dd->begin)
> > > +        trace_split_duration_and_fill_points(dd->end - dd->begin, (uint64_t)key);
> > 
> > again, need {}'s
> > 
> > > +    return FALSE;
> > > +}
> > > +
> > > +#ifdef DEBUG_VCPU_DOWNTIME
> > > +static gboolean calculate_per_cpu(gpointer key, gpointer value,
> > > +                                  gpointer data)
> > > +{
> > > +    int *downtime_cpu = (int *)data;
> > > +    DowntimeDuration *dd = (DowntimeDuration *)value;
> > > +    int cpu_iter;
> > > +    for (cpu_iter = 0; cpu_iter < smp_cpus; cpu_iter++) {
> > > +        if (test_bit(cpu_iter, dd->cpus) && dd->end && dd->begin)
> > > +            downtime_cpu[cpu_iter] += dd->end - dd->begin;
> > > +    }
> > > +    return FALSE;
> > > +}
> > > +#endif /* DEBUG_VCPU_DOWNTIME */
> > > +
> > > +static gint compare_downtime(gconstpointer a, gconstpointer b)
> > > +{
> > > +    DowntimeDuration *dda = (DowntimeDuration *)a;
> > > +    DowntimeDuration *ddb = (DowntimeDuration *)b;
> > > +    return dda->begin - ddb->begin;
> > > +}
> > > +
> > > +static void destroy_overlap_downtime(gpointer data)
> > > +{
> > > +    OverlapDowntime *od = (OverlapDowntime *)data;
> > > +    g_free(od->cpus);
> > > +    g_free(data);
> > > +}
> > > +
> > > +static int check_overlap(uint64_t *b)
> > > +{
> > > +    unsigned long zero_bit = find_first_zero_bit(b, BITS_PER_LONG * SIZE_TO_KEEP_CPUBITS);
> > 
> > Line's too long.
> > 
> > > +    return zero_bit >= smp_cpus;
> > 
> > So this is really 'all cpus are blocked'?
> Yes, that is the condition for it.
> > 
> > > +}
> > > +
> > > +/*
> > > + * This function calculates downtime per cpu and trace it
> > > + *
> > > + *  Also it calculates total downtime as an interval's overlap,
> > > + *  for many vCPU.
> > > + *
> > > + *  The approach is following:
> > > + *  Initially intervals are represented in tree where key is
> > > + *  pagefault address, and values:
> > > + *   begin - page fault time
> > > + *   end   - page load time
> > > + *   cpus  - bit mask shows affected cpus
> > > + *
> > > + *  To calculate overlap on all cpus, intervals converted into
> > > + *  array of points in time (downtime_points), the size of
> > > + *  array is 2 * number of nodes in tree of intervals (2 array
> > > + *  elements per one in element of interval).
> > > + *  Each element is marked as end (E) or as start (S) of interval.
> > > + *  The overlap downtime will be calculated for SE, only in case
> > > + *  there is sequence S(0..N)E(M) for every vCPU.
> > > + *
> > > + * As example we have 3 CPU
> > > + *
> > > + *      S1        E1           S1               E1
> > > + * -----***********------------xxx***************------------------------> CPU1
> > > + *
> > > + *             S2                E2
> > > + * ------------****************xxx---------------------------------------> CPU2
> > > + *
> > > + *                         S3            E3
> > > + * ------------------------****xxx********-------------------------------> CPU3
> > > + *
> > > + * We have sequence S1,S2,E1,S3,S1,E2,E3,E1
> > > + * S2,E1 - doesn't match condition due to sequence S1,S2,E1 doesn't include CPU3
> > > + * S3,S1,E2 - sequenece includes all CPUs, in this case overlap will be S1,E2
> >                        ^ typo
> > 
> > > + * Legend of picture is following: * - means downtime per vCPU
> > > + *                                 x - means overlapped downtime
> > > + */
> > > +uint64_t get_postcopy_total_downtime(void)
> > > +{
> > > +    MigrationIncomingState *mis = migration_incoming_get_current();
> > > +    uint64_t total_downtime = 0; /* for total overlapped downtime */
> > > +    const int intervals = g_tree_nnodes(mis->postcopy_downtime);
> > > +    int point_iter, start_point_iter, i;
> > > +    struct downtime_overlay_cxt dp_ctx = { 0 };
> > > +    /*
> > > +     * array will contain 2 * interval points or less, if
> > > +     * it was not page fault finalization for page,
> > > +     * real count will be in ctx.number_of_points
> > > +     */
> > > +    dp_ctx.downtime_points = g_ptr_array_new_full(2 * intervals,
> > > +                                                     destroy_overlap_downtime);
> > 
> > Is the g_ptr_array giving you anything here over a plain-old C array of pointers?
> > You're not dynamically growing it.
> Yes, I know the upper bound of that array at that time, and GPtrArray may
> be a slightly heavy structure here. OK, I'll use a plain array.
> 
> > 
> > > +    if (!mis->postcopy_downtime) {
> > > +        goto out;
> > > +    }
> > > +
> > > +#ifdef DEBUG_VCPU_DOWNTIME
> > > +    {
> > > +        gint *downtime_cpu = g_new0(int, smp_cpus);
> > > +        g_tree_foreach(mis->postcopy_downtime, calculate_per_cpu, downtime_cpu);
> > > +        for (point_iter = 0; point_iter < smp_cpus; point_iter++)
> > > +        {
> > > +            trace_downtime_per_cpu(point_iter, downtime_cpu[point_iter]);
> > > +        }
> > > +        g_free(downtime_cpu);
> > > +    }
> > > +#endif /* DEBUG_VCPU_DOWNTIME */
> > 
> > You might want to make that:
> >   if (TRACE_DOWNTIME_PER_CPU_ENABLED) {
> >   }
> > 
> > and remove the ifdef.
> > 
> > > +    /* make downtime points S/E from interval */
> > > +    g_tree_foreach(mis->postcopy_downtime, split_duration_and_fill_points,
> > > +                   &dp_ctx);
> > > +    g_ptr_array_sort(dp_ctx.downtime_points, compare_downtime);
> > > +
> > > +    for (point_iter = 1; point_iter < dp_ctx.number_of_points;
> > > +         point_iter++) {
> > > +        OverlapDowntime *od = g_ptr_array_index(dp_ctx.downtime_points,
> > > +                point_iter);
> > > +        uint64_t *cur_cpus;
> > > +        int smp_cpus_i = smp_cpus;
> > > +        OverlapDowntime *prev_od = g_ptr_array_index(dp_ctx.downtime_points,
> > > +                                                     point_iter - 1);
> > > +        if (!od || !prev_od)
> > > +            continue;
> > 
> > Why would that happen?
> The loop now runs only up to dp_ctx.number_of_points, so in this version it
> looks impossible.
> > 
> > > +        /* we need sequence SE */
> > > +        if (!od->is_end || prev_od->is_end)
> > > +            continue;
> > > +
> > > +        cur_cpus = g_memdup(od->cpus, sizeof(uint64_t) * SIZE_TO_KEEP_CPUBITS);
> > > +        for (start_point_iter = point_iter - 1;
> > > +             start_point_iter >= 0 && smp_cpus_i;
> > > +             start_point_iter--, smp_cpus_i--) {
> > 
> > I think I see what you're doing in this loop, although it's a bit hairy;
> > I don't think I understand why we needed to get prev_od  if this loop is searching
> > backwards?
> Just for the condition
> if (!od->is_end || prev_od->is_end) {
>     continue;
> }
> to skip any other sequences, like EE.
> Do you think the following condition is more readable?
> if (!(od->is_end && !prev_od->is_end))
>     continue;
> 
> Also, prev_od is the nearest point to the end, so the time from that point
> to the end is what matters.
> I depicted that.
> > 
> > > +            OverlapDowntime *t_od = g_ptr_array_index(dp_ctx.downtime_points,
> > > +                                                      start_point_iter);
> > > +            if (!t_od)
> > > +                break;
> > > +            /* should be S */
> > > +            if (t_od->is_end)
> > > +                break;
> > > +
> > > +            /* points were sorted, it's possible when
> > > +             * end is not occured, but this points were ommited
> > > +             * in split_duration_and_fill_points */
> > > +            if (od->tp <= prev_od->tp) {
> > 
> > Why is this checking od and prev_od in this loop - isn't this
> > loop mainly t_od ?
> right, that code shouldn't be here.
> > 
> > > +                break;
> > > +            }
> > > +
> > > +            for (i = 0; i < SIZE_TO_KEEP_CPUBITS; i++) {
> > > +                cur_cpus[i] |= t_od->cpus[i];
> > > +            }
> > > +
> > > +            /* check_overlap - just count number of bits in cur_cpus,
> > > +             * and compare it with smp_cpus */
> > > +            if (check_overlap(cur_cpus)) {
> > > +                total_downtime += od->tp - prev_od->tp;
> > > +                /* situation when one S point represents all vCPU is possible */
> > > +                break;
> > > +            }
> > > +        }
> > > +        g_free(cur_cpus);
> > > +    }
> > > +    trace_get_postcopy_total_downtime(g_tree_nnodes(mis->postcopy_downtime),
> > > +        total_downtime);
> > > +out:
> > > +    g_ptr_array_free(dp_ctx.downtime_points, TRUE);
> > > +    return total_downtime;
> > > +}
> > > diff --git a/migration/postcopy-ram.c b/migration/postcopy-ram.c
> > > index 70f0480..ea89f4e 100644
> > > --- a/migration/postcopy-ram.c
> > > +++ b/migration/postcopy-ram.c
> > > @@ -23,8 +23,10 @@
> > >  #include "migration/postcopy-ram.h"
> > >  #include "sysemu/sysemu.h"
> > >  #include "sysemu/balloon.h"
> > > +#include <sys/param.h>
> > >  #include "qemu/error-report.h"
> > >  #include "trace.h"
> > > +#include "glib/glib-helper.h"
> > >  
> > >  /* Arbitrary limit on size of each discard command,
> > >   * keeps them around ~200 bytes
> > > @@ -81,6 +83,11 @@ static bool ufd_version_check(int ufd, MigrationIncomingState *mis)
> > >          return false;
> > >      }
> > >  
> > > +    if (mis && UFFD_FEATURE_THREAD_ID & api_struct.features) {
> > 
> > That's a very weird way of writing that test!  Also, I think you need
> > to still make this user-selectable given the complexity/cost.
> >
> Like that?
> {"execute": "migrate-set-capabilities" , "arguments":
> { "capabilities": [ { "capability": "calculate-postcopy-downtime", "state": true } ]
> } }                                  ^^^^^^^^^^^^^^^^^^^^^^^^^^^
> 
> I tried to put the heavy operations well after the hot path (page requesting
> and copying). The algorithm complexity is NumberOfPages*NumberOfvCPUs; in the
> case of hugepages that is not so many.
> OK, if it's conditionally obtained from the kernel, why not give the user
> the ability to choose.
> 
> > > +        mis->postcopy_downtime = g_tree_new_full(g_int_cmp64,
> > > +                                         NULL, NULL, destroy_downtime_duration);
> > > +    }
> > > +
> > >      if (getpagesize() != ram_pagesize_summary()) {
> > >          bool have_hp = false;
> > >          /* We've got a huge page */
> > > @@ -404,6 +411,18 @@ static int ram_block_enable_notify(const char *block_name, void *host_addr,
> > >      return 0;
> > >  }
> > >  
> > > +static int get_mem_fault_cpu_index(uint32_t pid)
> > > +{
> > > +    CPUState *cpu_iter;
> > > +
> > > +    CPU_FOREACH(cpu_iter) {
> > > +        if (cpu_iter->thread_id == pid)
> > > +           return cpu_iter->cpu_index;
> > > +    }
> > > +    trace_get_mem_fault_cpu_index(pid);
> > > +    return -1;
> > > +}
> > > +
> > >  /*
> > >   * Handle faults detected by the USERFAULT markings
> > >   */
> > > @@ -481,8 +500,10 @@ static void *postcopy_ram_fault_thread(void *opaque)
> > >          rb_offset &= ~(qemu_ram_pagesize(rb) - 1);
> > >          trace_postcopy_ram_fault_thread_request(msg.arg.pagefault.address,
> > >                                                  qemu_ram_get_idstr(rb),
> > > -                                                rb_offset);
> > > +                                                rb_offset, msg.arg.pagefault.feat.ptid);
> > 
> > Line length!
> > 
> > >  
> > > +        mark_postcopy_downtime_begin(msg.arg.pagefault.address,
> > > +                            get_mem_fault_cpu_index(msg.arg.pagefault.feat.ptid));
> > >          /*
> > >           * Send the request to the source - we want to request one
> > >           * of our host page sizes (which is >= TPS)
> > > @@ -577,6 +598,7 @@ int postcopy_place_page(MigrationIncomingState *mis, void *host, void *from,
> > >  
> > >          return -e;
> > >      }
> > > +    mark_postcopy_downtime_end((uint64_t)host);
> > >  
> > >      trace_postcopy_place_page(host);
> > >      return 0;
> > > diff --git a/migration/qemu-file.c b/migration/qemu-file.c
> > > index 195fa94..c9f3e47 100644
> > > --- a/migration/qemu-file.c
> > > +++ b/migration/qemu-file.c
> > > @@ -547,7 +547,6 @@ size_t qemu_get_buffer_in_place(QEMUFile *f, uint8_t **buf, size_t size)
> > >  int qemu_peek_byte(QEMUFile *f, int offset)
> > >  {
> > >      int index = f->buf_index + offset;
> > > -
> > 
> > Stray!
> > 
> > >      assert(!qemu_file_is_writable(f));
> > >      assert(offset < IO_BUF_SIZE);
> > >  
> > > diff --git a/migration/trace-events b/migration/trace-events
> > > index 7372ce2..ab2e1e4 100644
> > > --- a/migration/trace-events
> > > +++ b/migration/trace-events
> > > @@ -110,6 +110,12 @@ process_incoming_migration_co_end(int ret, int ps) "ret=%d postcopy-state=%d"
> > >  process_incoming_migration_co_postcopy_end_main(void) ""
> > >  migration_set_incoming_channel(void *ioc, const char *ioctype) "ioc=%p ioctype=%s"
> > >  migration_set_outgoing_channel(void *ioc, const char *ioctype, const char *hostname)  "ioc=%p ioctype=%s hostname=%s"
> > > +mark_postcopy_downtime_begin(uint64_t addr, void *dd, int64_t time, int cpu) "addr 0x%" PRIx64 " dd %p time %" PRId64 " cpu %d"
> > > +mark_postcopy_downtime_end(uint64_t addr, void *dd, int64_t time) "addr 0x%" PRIx64 " dd %p time %" PRId64
> > > +get_postcopy_total_downtime(int num, uint64_t total) "faults %d, total downtime %" PRIu64
> > > +split_duration_and_fill_points(int64_t downtime, uint64_t addr) "downtime %" PRId64 " addr 0x%" PRIx64
> > > +downtime_per_cpu(int cpu_index, int downtime) "downtime cpu[%d]=%d"
> > > +source_return_path_thread_downtime(uint64_t downtime) "downtime %" PRIu64
> > >  
> > >  # migration/rdma.c
> > >  qemu_rdma_accept_incoming_migration(void) ""
> > > @@ -186,7 +192,7 @@ postcopy_ram_enable_notify(void) ""
> > >  postcopy_ram_fault_thread_entry(void) ""
> > >  postcopy_ram_fault_thread_exit(void) ""
> > >  postcopy_ram_fault_thread_quit(void) ""
> > > -postcopy_ram_fault_thread_request(uint64_t hostaddr, const char *ramblock, size_t offset) "Request for HVA=%" PRIx64 " rb=%s offset=%zx"
> > > +postcopy_ram_fault_thread_request(uint64_t hostaddr, const char *ramblock, size_t offset, int pid) "Request for HVA=%" PRIx64 " rb=%s offset=%zx %d"
> > >  postcopy_ram_incoming_cleanup_closeuf(void) ""
> > >  postcopy_ram_incoming_cleanup_entry(void) ""
> > >  postcopy_ram_incoming_cleanup_exit(void) ""
> > > @@ -195,6 +201,7 @@ save_xbzrle_page_skipping(void) ""
> > >  save_xbzrle_page_overflow(void) ""
> > >  ram_save_iterate_big_wait(uint64_t milliconds, int iterations) "big wait: %" PRIu64 " milliseconds, %d iterations"
> > >  ram_load_complete(int ret, uint64_t seq_iter) "exit_code %d seq iteration %" PRIu64
> > > +get_mem_fault_cpu_index(uint32_t pid) "pid %u is not vCPU"
> > >  
> > >  # migration/exec.c
> > >  migration_exec_outgoing(const char *cmd) "cmd=%s"
> > > -- 
> > > 1.8.3.1
> > > 
> > 
> > Dave
> > 
> > --
> > Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK
> > 
> 
> -- 
> 
> BR
> Alexey
--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK


* Re: [Qemu-devel] [PATCH 4/6] migration: calculate downtime on dst side (CPUMASK)
  2017-04-22  9:49         ` [Qemu-devel] [PATCH 4/6] migration: calculate downtime on dst side (CPUMASK) Alexey
@ 2017-04-24 17:13           ` Dr. David Alan Gilbert
  0 siblings, 0 replies; 38+ messages in thread
From: Dr. David Alan Gilbert @ 2017-04-24 17:13 UTC (permalink / raw)
  To: Alexey; +Cc: i.maximets, qemu-devel

* Alexey (a.perevalov@samsung.com) wrote:
> Hello David,
> this mail just for CPUMASK discussion.
> 
> On Fri, Apr 21, 2017 at 01:00:32PM +0100, Dr. David Alan Gilbert wrote:
> > * Alexey Perevalov (a.perevalov@samsung.com) wrote:
> > > This patch provides downtime calculation per vCPU,
> > > as a summary and as a overlapped value for all vCPUs.
> > > 
> > > This approach just keeps tree with page fault addr as a key,
> > > and t1-t2 interval of pagefault time and page copy time, with
> > > affected vCPU bit mask.
> > > For more implementation details please see comment to
> > > get_postcopy_total_downtime function.
> > > 
> > > Signed-off-by: Alexey Perevalov <a.perevalov@samsung.com>
> > > ---
> > >  include/migration/migration.h |  14 +++
> > >  migration/migration.c         | 280 +++++++++++++++++++++++++++++++++++++++++-
> > >  migration/postcopy-ram.c      |  24 +++-
> > >  migration/qemu-file.c         |   1 -
> > >  migration/trace-events        |   9 +-
> > >  5 files changed, 323 insertions(+), 5 deletions(-)
> > > 
> > > diff --git a/include/migration/migration.h b/include/migration/migration.h
> > > index 5720c88..5d2c628 100644
> > > --- a/include/migration/migration.h
> > > +++ b/include/migration/migration.h
> > > @@ -123,10 +123,24 @@ struct MigrationIncomingState {
> > >  
> > >      /* See savevm.c */
> > >      LoadStateEntry_Head loadvm_handlers;
> > > +
> > > +    /*
> > > +     *  Tree for keeping postcopy downtime,
> > > +     *  necessary to calculate correct downtime, during multiple
> > > +     *  vm suspends, it keeps host page address as a key and
> > > +     *  DowntimeDuration as a data
> > > +     *  NULL means kernel couldn't provide process thread id,
> > > +     *  and QEMU couldn't identify which vCPU raise page fault
> > > +     */
> > > +    GTree *postcopy_downtime;
> > >  };
> > >  
> > >  MigrationIncomingState *migration_incoming_get_current(void);
> > >  void migration_incoming_state_destroy(void);
> > > +void mark_postcopy_downtime_begin(uint64_t addr, int cpu);
> > > +void mark_postcopy_downtime_end(uint64_t addr);
> > > +uint64_t get_postcopy_total_downtime(void);
> > > +void destroy_downtime_duration(gpointer data);
> > >  
> > >  /*
> > >   * An outstanding page request, on the source, having been received
> > > diff --git a/migration/migration.c b/migration/migration.c
> > > index 79f6425..5bac434 100644
> > > --- a/migration/migration.c
> > > +++ b/migration/migration.c
> > > @@ -38,6 +38,8 @@
> > >  #include "io/channel-tls.h"
> > >  #include "migration/colo.h"
> > >  
> > > +#define DEBUG_VCPU_DOWNTIME 1
> > > +
> > >  #define MAX_THROTTLE  (32 << 20)      /* Migration transfer speed throttling */
> > >  
> > >  /* Amount of time to allocate to each "chunk" of bandwidth-throttled
> > > @@ -77,6 +79,19 @@ static NotifierList migration_state_notifiers =
> > >  
> > >  static bool deferred_incoming;
> > >  
> > > +typedef struct {
> > > +    int64_t begin;
> > > +    int64_t end;
> > > +    uint64_t *cpus; /* cpus bit mask array, QEMU bit functions support
> > > +     bit operation on memory regions, but doesn't check out of range */
> > > +} DowntimeDuration;
> > > +
> > > +typedef struct {
> > > +    int64_t tp; /* point in time */
> > > +    bool is_end;
> > > +    uint64_t *cpus;
> > > +} OverlapDowntime;
> > > +
> > >  /*
> > >   * Current state of incoming postcopy; note this is not part of
> > >   * MigrationIncomingState since it's state is used during cleanup
> > > @@ -117,6 +132,13 @@ MigrationState *migrate_get_current(void)
> > >      return &current_migration;
> > >  }
> > >  
> > > +void destroy_downtime_duration(gpointer data)
> > > +{
> > > +    DowntimeDuration *dd = (DowntimeDuration *)data;
> > > +    g_free(dd->cpus);
> > > +    g_free(data);
> > > +}
> > > +
> > >  MigrationIncomingState *migration_incoming_get_current(void)
> > >  {
> > >      static bool once;
> > > @@ -138,10 +160,13 @@ void migration_incoming_state_destroy(void)
> > >      struct MigrationIncomingState *mis = migration_incoming_get_current();
> > >  
> > >      qemu_event_destroy(&mis->main_thread_load_event);
> > > +    if (mis->postcopy_downtime) {
> > > +        g_tree_destroy(mis->postcopy_downtime);
> > > +        mis->postcopy_downtime = NULL;
> > > +    }
> > >      loadvm_free_handlers(mis);
> > >  }
> > >  
> > > -
> > >  typedef struct {
> > >      bool optional;
> > >      uint32_t size;
> > > @@ -1754,7 +1779,6 @@ static int postcopy_start(MigrationState *ms, bool *old_vm_running)
> > >       */
> > >      ms->postcopy_after_devices = true;
> > >      notifier_list_notify(&migration_state_notifiers, ms);
> > > -
> > 
> > Stray deletion
> > 
> > >      ms->downtime =  qemu_clock_get_ms(QEMU_CLOCK_REALTIME) - time_at_stop;
> > >  
> > >      qemu_mutex_unlock_iothread();
> > > @@ -2117,3 +2141,255 @@ PostcopyState postcopy_state_set(PostcopyState new_state)
> > >      return atomic_xchg(&incoming_postcopy_state, new_state);
> > >  }
> > >  
> > > +#define SIZE_TO_KEEP_CPUBITS (1 + smp_cpus/sizeof(guint64))
> > 
> > Split out your cpu-sets so that you have an 'alloc_cpu_set',
> > a 'set bit' a 'set all bits', dup etc
> > (I see Linux has cpumask.h that has a 'cpu_set' that's
> > basically the same thing, but we need something portablish.)
> > 
> Agreed, the way I'm working with the cpumask is a little bit naive.
> Instead of setting all_cpumask precisely, as ((1 << smp_cpus) - 1), when
> all vCPUs are sleeping, I just set all bits with ~0, because I didn't use
> functions like cpumask_and.
> If you think this patch should use cpumask, a cpumask patch set (in a
> separate thread) should be introduced first, and then this patch set
> should be rebased on top of it.

Yes, some functions like that - because then it makes this code
clearer; and there's less repetition.
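
A minimal sketch of what such helpers might look like, in plain C. All
names here (VcpuSet, vcpu_set_alloc and so on) are hypothetical and are
not taken from the patch or from QEMU; the only point is that the word
count is derived from bits per word, and that every access is range
checked.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Number of vCPU bits stored in one 64-bit word. */
#define VCPU_SET_WORD_BITS (sizeof(uint64_t) * CHAR_BIT)

/* Hypothetical portable vCPU set: one bit per vCPU index. */
typedef struct {
    unsigned int ncpus;   /* number of valid bits */
    size_t nwords;        /* number of 64-bit words backing the set */
    uint64_t *bits;
} VcpuSet;

/* Allocate an empty set able to hold 'ncpus' bits; NULL on failure. */
VcpuSet *vcpu_set_alloc(unsigned int ncpus)
{
    VcpuSet *s = malloc(sizeof(*s));
    if (!s) {
        return NULL;
    }
    s->ncpus = ncpus;
    s->nwords = (ncpus + VCPU_SET_WORD_BITS - 1) / VCPU_SET_WORD_BITS;
    s->bits = calloc(s->nwords ? s->nwords : 1, sizeof(uint64_t));
    if (!s->bits) {
        free(s);
        return NULL;
    }
    return s;
}

void vcpu_set_free(VcpuSet *s)
{
    if (s) {
        free(s->bits);
        free(s);
    }
}

/* Set the bit for one vCPU; out-of-range indexes are a caller bug. */
void vcpu_set_set(VcpuSet *s, unsigned int cpu)
{
    assert(cpu < s->ncpus);
    s->bits[cpu / VCPU_SET_WORD_BITS] |=
        UINT64_C(1) << (cpu % VCPU_SET_WORD_BITS);
}

bool vcpu_set_test(const VcpuSet *s, unsigned int cpu)
{
    assert(cpu < s->ncpus);
    return (s->bits[cpu / VCPU_SET_WORD_BITS] >>
            (cpu % VCPU_SET_WORD_BITS)) & 1;
}

/* Set exactly the 'ncpus' valid bits, leaving padding bits clear. */
void vcpu_set_set_all(VcpuSet *s)
{
    for (unsigned int cpu = 0; cpu < s->ncpus; cpu++) {
        vcpu_set_set(s, cpu);
    }
}

/* True when every valid vCPU bit is set. */
bool vcpu_set_is_full(const VcpuSet *s)
{
    for (unsigned int cpu = 0; cpu < s->ncpus; cpu++) {
        if (!vcpu_set_test(s, cpu)) {
            return false;
        }
    }
    return true;
}

/* Deep copy of a set; NULL on allocation failure. */
VcpuSet *vcpu_set_dup(const VcpuSet *s)
{
    VcpuSet *d = vcpu_set_alloc(s->ncpus);
    if (d) {
        memcpy(d->bits, s->bits, s->nwords * sizeof(uint64_t));
    }
    return d;
}
```

With something like this, "all vCPUs are blocked" becomes a
vcpu_set_is_full() check rather than a comparison against ~0.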

Dave

> 
> BR
> Alexey
--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK


* Re: [Qemu-devel] [PATCH 5/6] migration: send postcopy downtime back to source
  2017-04-14 13:17     ` [Qemu-devel] [PATCH 5/6] migration: send postcopy downtime back to source Alexey Perevalov
@ 2017-04-24 17:26       ` Dr. David Alan Gilbert
  2017-04-25  5:51         ` Alexey
  0 siblings, 1 reply; 38+ messages in thread
From: Dr. David Alan Gilbert @ 2017-04-24 17:26 UTC (permalink / raw)
  To: Alexey Perevalov; +Cc: qemu-devel, i.maximets

* Alexey Perevalov (a.perevalov@samsung.com) wrote:
> Right now, to initiate a postcopy live migration, one needs to
> send a request to the source machine and specify the destination.
> 
> The user can request migration status with the query-migrate QMP command
> on the source machine, but postcopy downtime is evaluated on the
> destination, so it has to be transmitted back to the source. The return
> path socket was chosen for this purpose.
> 
> Signed-off-by: Alexey Perevalov <a.perevalov@samsung.com>

That will break a migration from an older QEMU to a newer QEMU with this feature
since the old QEMU won't know the message type and fail with a
  'Received invalid message'

near the start of source_return_path_thread.

The simpler solution is to let the stat be read on the destination side
and not bother sending it backwards over the wire.
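
To illustrate the failure, here is a small self-contained sketch. It is
not the QEMU code: the enum names are hypothetical stand-ins, and it
assumes only that the receiver rejects any type that is zero or not
below its own compiled-in MAX. The new message type takes the value
that an old build still calls MAX, so the old side treats it as invalid.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Return-path message types as an OLD source would know them
 * (hypothetical names mirroring the order of the existing enum). */
enum {
    RP_OLD_INVALID = 0,
    RP_OLD_SHUT,
    RP_OLD_PONG,
    RP_OLD_REQ_PAGES_ID,
    RP_OLD_REQ_PAGES,
    RP_OLD_MAX
};

/* A NEW destination appends one type just before its own MAX. */
enum {
    RP_NEW_DOWNTIME = RP_OLD_MAX,
    RP_NEW_MAX
};

/* The receiver accepts a type only if it is below the MAX it was
 * built with; anything else is an "invalid message". */
bool rp_type_known(uint16_t type, uint16_t max)
{
    return type != RP_OLD_INVALID && type < max;
}
```

Here rp_type_known(RP_NEW_DOWNTIME, RP_OLD_MAX) is false while
rp_type_known(RP_NEW_DOWNTIME, RP_NEW_MAX) is true, which is the
old-source, new-destination mismatch described above.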

Dave

> ---
>  include/migration/migration.h |  4 +++-
>  migration/migration.c         | 20 ++++++++++++++++++--
>  migration/postcopy-ram.c      |  1 +
>  3 files changed, 22 insertions(+), 3 deletions(-)
> 
> diff --git a/include/migration/migration.h b/include/migration/migration.h
> index 5d2c628..5535aa6 100644
> --- a/include/migration/migration.h
> +++ b/include/migration/migration.h
> @@ -55,7 +55,8 @@ enum mig_rp_message_type {
>  
>      MIG_RP_MSG_REQ_PAGES_ID, /* data (start: be64, len: be32, id: string) */
>      MIG_RP_MSG_REQ_PAGES,    /* data (start: be64, len: be32) */
> -
> +    MIG_RP_MSG_DOWNTIME,    /* downtime value from destination,
> +                               calculated and sent in case of post copy */
>      MIG_RP_MSG_MAX
>  };
>  
> @@ -364,6 +365,7 @@ void migrate_send_rp_pong(MigrationIncomingState *mis,
>                            uint32_t value);
>  void migrate_send_rp_req_pages(MigrationIncomingState *mis, const char* rbname,
>                                ram_addr_t start, size_t len);
> +void migrate_send_rp_downtime(MigrationIncomingState *mis, uint64_t downtime);
>  
>  void ram_control_before_iterate(QEMUFile *f, uint64_t flags);
>  void ram_control_after_iterate(QEMUFile *f, uint64_t flags);
> diff --git a/migration/migration.c b/migration/migration.c
> index 5bac434..3134e24 100644
> --- a/migration/migration.c
> +++ b/migration/migration.c
> @@ -553,6 +553,19 @@ void migrate_send_rp_message(MigrationIncomingState *mis,
>  }
>  
>  /*
> + * Send postcopy migration downtime,
> + * at the moment of calling this function migration should
> + * be completed.
> + */
> +void migrate_send_rp_downtime(MigrationIncomingState *mis, uint64_t downtime)
> +{
> +    uint64_t buf;
> +
> +    buf = cpu_to_be64(downtime);
> +    migrate_send_rp_message(mis, MIG_RP_MSG_DOWNTIME, sizeof(downtime), &buf);
> +}
> +
> +/*
>   * Send a 'SHUT' message on the return channel with the given value
>   * to indicate that we've finished with the RP.  Non-0 value indicates
>   * error.
> @@ -1483,6 +1496,7 @@ static struct rp_cmd_args {
>      [MIG_RP_MSG_PONG]           = { .len =  4, .name = "PONG" },
>      [MIG_RP_MSG_REQ_PAGES]      = { .len = 12, .name = "REQ_PAGES" },
>      [MIG_RP_MSG_REQ_PAGES_ID]   = { .len = -1, .name = "REQ_PAGES_ID" },
> +    [MIG_RP_MSG_DOWNTIME]       = { .len =  8, .name = "DOWNTIME" },
>      [MIG_RP_MSG_MAX]            = { .len = -1, .name = "MAX" },
>  };
>  
> @@ -1613,6 +1627,10 @@ static void *source_return_path_thread(void *opaque)
>              migrate_handle_rp_req_pages(ms, (char *)&buf[13], start, len);
>              break;
>  
> +        case MIG_RP_MSG_DOWNTIME:
> +            ms->downtime = ldq_be_p(buf);
> +            break;
> +
>          default:
>              break;
>          }
> @@ -1677,7 +1695,6 @@ static int postcopy_start(MigrationState *ms, bool *old_vm_running)
>      int ret;
>      QIOChannelBuffer *bioc;
>      QEMUFile *fb;
> -    int64_t time_at_stop = qemu_clock_get_ms(QEMU_CLOCK_REALTIME);
>      bool restart_block = false;
>      migrate_set_state(&ms->state, MIGRATION_STATUS_ACTIVE,
>                        MIGRATION_STATUS_POSTCOPY_ACTIVE);
> @@ -1779,7 +1796,6 @@ static int postcopy_start(MigrationState *ms, bool *old_vm_running)
>       */
>      ms->postcopy_after_devices = true;
>      notifier_list_notify(&migration_state_notifiers, ms);
> -    ms->downtime =  qemu_clock_get_ms(QEMU_CLOCK_REALTIME) - time_at_stop;
>  
>      qemu_mutex_unlock_iothread();
>  
> diff --git a/migration/postcopy-ram.c b/migration/postcopy-ram.c
> index ea89f4e..42330fd 100644
> --- a/migration/postcopy-ram.c
> +++ b/migration/postcopy-ram.c
> @@ -330,6 +330,7 @@ int postcopy_ram_incoming_cleanup(MigrationIncomingState *mis)
>      }
>  
>      postcopy_state_set(POSTCOPY_INCOMING_END);
> +    migrate_send_rp_downtime(mis, get_postcopy_total_downtime());
>      migrate_send_rp_shut(mis, qemu_file_get_error(mis->from_src_file) != 0);
>  
>      if (mis->postcopy_tmp_page) {
> -- 
> 1.8.3.1
> 
--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK


* Re: [Qemu-devel] [PATCH 6/6] migration: detailed traces for postcopy
  2017-04-14 13:17     ` [Qemu-devel] [PATCH 6/6] migration: detailed traces for postcopy Alexey Perevalov
  2017-04-17 13:32       ` Philippe Mathieu-Daudé
@ 2017-04-24 18:03       ` Dr. David Alan Gilbert
  1 sibling, 0 replies; 38+ messages in thread
From: Dr. David Alan Gilbert @ 2017-04-24 18:03 UTC (permalink / raw)
  To: Alexey Perevalov; +Cc: qemu-devel, i.maximets

* Alexey Perevalov (a.perevalov@samsung.com) wrote:
> It could help to track down vCPU state during a page fault, as well as
> page fault sources.
> 
> This patch shows the thread's /proc status/stack/syscall files at the
> moment of the page fault; it's very interesting to know who initiated
> the page fault.

This is a LOT of debug code, almost none of it is postcopy specific,
so probably a question for generic tracing code; but I'll admit to
not being happy about the idea of putting this much code in for
this type of dumping; when it gets this desperate we just normally do
a special build.

However, some specific comments as well.

> Signed-off-by: Alexey Perevalov <a.perevalov@samsung.com>
> ---
>  migration/postcopy-ram.c | 98 +++++++++++++++++++++++++++++++++++++++++++++++-
>  migration/trace-events   |  6 +++
>  2 files changed, 103 insertions(+), 1 deletion(-)
> 
> diff --git a/migration/postcopy-ram.c b/migration/postcopy-ram.c
> index 42330fd..513633c 100644
> --- a/migration/postcopy-ram.c
> +++ b/migration/postcopy-ram.c
> @@ -412,7 +412,91 @@ static int ram_block_enable_notify(const char *block_name, void *host_addr,
>      return 0;
>  }
>  
> -static int get_mem_fault_cpu_index(uint32_t pid)
> +#define PROC_LEN 1024
> +#define DEBUG_FAULT_PROCESS_STATUS 1
> +
> +#ifdef DEBUG_FAULT_PROCESS_STATUS
> +
> +static FILE *get_proc_file(const gchar *frmt, pid_t thread_id)
> +{
> +    FILE *f = NULL;
> +    gchar *file_path = g_strdup_printf(frmt, thread_id);
> +    if (file_path == NULL) {
> +        error_report("Couldn't allocate path for %u", thread_id);
> +        return NULL;
> +    }

I was going to say that I thought g_strdup_printf couldn't
return NULL; but then I looked at the source - eww it can.

> +    f = fopen(file_path, "r");
> +    if (!f) {
> +        error_report("can't open %s", file_path);
> +    }
> +
> +    trace_get_proc_file(file_path);
> +    g_free(file_path);
> +    return f;
> +}
> +
> +typedef void(*proc_line_handler)(const char *line);
> +
> +static void proc_line_cb(const char *line)
> +{
> +    /* trace_ functions are inline */
> +    trace_proc_line_cb(line);
> +}
> +
> +static void foreach_line_in_file(FILE *f, proc_line_handler cb)
> +{
> +    char *line = NULL;
> +    ssize_t read;
> +    size_t len;
> +
> +    while ((read = getline(&line, &len, f)) != -1) {
> +        /* workaround, trace_ infrastructure already insert \n
> +         * and getline includes it */
> +        ssize_t str_len = strlen(line) - 1;
> +        if (str_len <= 0)
> +            continue;
> +        line[str_len] = '\0';
> +        cb(line);
> +    }
> +    free(line);
> +}
> +
> +static void observe_thread_proc(const gchar *path_frmt, pid_t thread_id)
> +{
> +    FILE *f = get_proc_file(path_frmt, thread_id);
> +    if (!f) {
> +        error_report("can't read thread's proc");
> +        return;
> +    }
> +
> +    foreach_line_in_file(f, proc_line_cb);

> +    fclose(f);
> +}
> +
> +/*
> + * for convinience tracing need to trace
> + * observe_thread_begin
> + * get_proc_file
> + * proc_line_cb
> + * observe_thread_end
> + */
> +static void observe_thread(const char *msg, pid_t thread_id)
> +{
> +    trace_observe_thread_begin(msg);
> +    observe_thread_proc("/proc/%d/status", thread_id);
> +    observe_thread_proc("/proc/%d/syscall", thread_id);
> +    observe_thread_proc("/proc/%d/stack", thread_id);

You could wrap that in something like:
  if (TRACE_PROC_LINE_CB_ENABLED) {

so it doesn't read all of the files and do all the allocation
to get to the point it realised no one cared.
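
Roughly what I mean, as a standalone sketch rather than QEMU code. The
trace_lines_enabled flag is a hypothetical stand-in for whatever the
trace backend exposes, and emit_line() stands in for the trace call.
The check comes before any reading or allocation.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical stand-in for a "is this trace event enabled" check. */
bool trace_lines_enabled;

/* Stand-in for the trace call: one line per event. */
void emit_line(const char *line)
{
    fprintf(stderr, "%s\n", line);
}

/*
 * Emit every non-empty line of 'f', but only if someone is listening.
 * Returns the number of lines emitted; when tracing is off the file
 * is not read at all.
 */
unsigned int dump_lines_if_enabled(FILE *f)
{
    char buf[1024];
    unsigned int n = 0;

    if (!trace_lines_enabled) {
        return 0;               /* nobody cares: skip all the work */
    }

    while (fgets(buf, sizeof(buf), f)) {
        /* Strip a trailing newline if present; do not assume one. */
        buf[strcspn(buf, "\n")] = '\0';
        if (buf[0] == '\0') {
            continue;
        }
        emit_line(buf);
        n++;
    }
    return n;
}
```

Using strcspn() here also avoids cutting off the last character of a
final line that has no trailing newline.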

Dave

> +    trace_observe_thread_end(msg);
> +}
> +
> +#else
> +static void observe_thread(const char *msg, pid_t thread_id)
> +{
> +}
> +
> +#endif /* DEBUG_FAULT_PROCESS_STATUS */
> +
> +static int get_mem_fault_cpu_index(pid_t pid)
>  {
>      CPUState *cpu_iter;
>  
> @@ -421,9 +505,20 @@ static int get_mem_fault_cpu_index(uint32_t pid)
>             return cpu_iter->cpu_index;
>      }
>      trace_get_mem_fault_cpu_index(pid);
> +    observe_thread("not a vCPU", pid);
> +
>      return -1;
>  }
>  
> +static void observe_vcpu_state(void)
> +{
> +    CPUState *cpu_iter;
> +    CPU_FOREACH(cpu_iter) {
> +        observe_thread("vCPU", cpu_iter->thread_id);
> +        trace_vcpu_state(cpu_iter->running, cpu_iter->cpu_index);
> +    }
> +}
> +
>  /*
>   * Handle faults detected by the USERFAULT markings
>   */
> @@ -465,6 +560,7 @@ static void *postcopy_ram_fault_thread(void *opaque)
>          }
>  
>          ret = read(mis->userfault_fd, &msg, sizeof(msg));
> +        observe_vcpu_state();
>          if (ret != sizeof(msg)) {
>              if (errno == EAGAIN) {
>                  /*
> diff --git a/migration/trace-events b/migration/trace-events
> index ab2e1e4..3a74f91 100644
> --- a/migration/trace-events
> +++ b/migration/trace-events
> @@ -202,6 +202,12 @@ save_xbzrle_page_overflow(void) ""
>  ram_save_iterate_big_wait(uint64_t milliconds, int iterations) "big wait: %" PRIu64 " milliseconds, %d iterations"
>  ram_load_complete(int ret, uint64_t seq_iter) "exit_code %d seq iteration %" PRIu64
>  get_mem_fault_cpu_index(uint32_t pid) "pid %u is not vCPU"
> +observe_thread_status(int ptid, char *name, char *status) "host_tid %d %s %s"
> +vcpu_state(int cpu_index, int is_running) "cpu %d running %d"
> +proc_line_cb(const char *str) "%s"
> +get_proc_file(const char *str) "opened %s"
> +observe_thread_begin(const char *str) "%s"
> +observe_thread_end(const char *str) "%s"
>  
>  # migration/exec.c
>  migration_exec_outgoing(const char *cmd) "cmd=%s"
> -- 
> 1.8.3.1
> 
--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK


* Re: [Qemu-devel] [PATCH 5/6] migration: send postcopy downtime back to source
  2017-04-24 17:26       ` Dr. David Alan Gilbert
@ 2017-04-25  5:51         ` Alexey
  0 siblings, 0 replies; 38+ messages in thread
From: Alexey @ 2017-04-25  5:51 UTC (permalink / raw)
  To: Dr. David Alan Gilbert; +Cc: i.maximets, qemu-devel

On Mon, Apr 24, 2017 at 06:26:31PM +0100, Dr. David Alan Gilbert wrote:
> * Alexey Perevalov (a.perevalov@samsung.com) wrote:
> > Right now, to initiate a postcopy live migration, one needs to
> > send a request to the source machine and specify the destination.
> > 
> > The user can request migration status with the query-migrate QMP command
> > on the source machine, but postcopy downtime is evaluated on the
> > destination, so it has to be transmitted back to the source. The return
> > path socket was chosen for this purpose.
> > 
> > Signed-off-by: Alexey Perevalov <a.perevalov@samsung.com>
> 
> That will break a migration from an older QEMU to a newer QEMU with this feature
> since the old QEMU won't know the message type and fail with a
>   'Received invalid message'
> 
> near the start of source_return_path_thread.
> 
> The simpler solution is to let the stat be read on the destination side
> and not bother sending it backwards over the wire.
Yes, the simplest solution is just to trace_ it, and I'll keep that in
this patch set.

It does look like the current code can't simply skip an unknown header_type.
Hmm, it's a binary protocol, so the receiver has to know the *length*, and
the length is not transmitted with header_type; it's hard-coded per header
type. So MIG_RP_MSG isn't extensible.
BTW, are you going to replace that protocol in the future?
I think it's even possible to keep the MIG_RP_MSG protocol as is and just
send an RP_METADATA message (header_type and field length, as a first
approximation) before opening the RP. But, again, an old QEMU won't know
about RP_METADATA and will fail. Or it could be JSON based; I have come
across JSON-based encapsulation for devices.


As a completely different alternative, I could suggest sending a request
every time the user issues query-migrate on the src, but in that case
MigrationIncomingState would have to live forever.

> 
> Dave
> 
> > ---
> >  include/migration/migration.h |  4 +++-
> >  migration/migration.c         | 20 ++++++++++++++++++--
> >  migration/postcopy-ram.c      |  1 +
> >  3 files changed, 22 insertions(+), 3 deletions(-)
> > 
> > diff --git a/include/migration/migration.h b/include/migration/migration.h
> > index 5d2c628..5535aa6 100644
> > --- a/include/migration/migration.h
> > +++ b/include/migration/migration.h
> > @@ -55,7 +55,8 @@ enum mig_rp_message_type {
> >  
> >      MIG_RP_MSG_REQ_PAGES_ID, /* data (start: be64, len: be32, id: string) */
> >      MIG_RP_MSG_REQ_PAGES,    /* data (start: be64, len: be32) */
> > -
> > +    MIG_RP_MSG_DOWNTIME,    /* downtime value from destination,
> > +                               calculated and sent in case of post copy */
> >      MIG_RP_MSG_MAX
> >  };
> >  
> > @@ -364,6 +365,7 @@ void migrate_send_rp_pong(MigrationIncomingState *mis,
> >                            uint32_t value);
> >  void migrate_send_rp_req_pages(MigrationIncomingState *mis, const char* rbname,
> >                                ram_addr_t start, size_t len);
> > +void migrate_send_rp_downtime(MigrationIncomingState *mis, uint64_t downtime);
> >  
> >  void ram_control_before_iterate(QEMUFile *f, uint64_t flags);
> >  void ram_control_after_iterate(QEMUFile *f, uint64_t flags);
> > diff --git a/migration/migration.c b/migration/migration.c
> > index 5bac434..3134e24 100644
> > --- a/migration/migration.c
> > +++ b/migration/migration.c
> > @@ -553,6 +553,19 @@ void migrate_send_rp_message(MigrationIncomingState *mis,
> >  }
> >  
> >  /*
> > + * Send postcopy migration downtime,
> > + * at the moment of calling this function migration should
> > + * be completed.
> > + */
> > +void migrate_send_rp_downtime(MigrationIncomingState *mis, uint64_t downtime)
> > +{
> > +    uint64_t buf;
> > +
> > +    buf = cpu_to_be64(downtime);
> > +    migrate_send_rp_message(mis, MIG_RP_MSG_DOWNTIME, sizeof(downtime), &buf);
> > +}
> > +
> > +/*
> >   * Send a 'SHUT' message on the return channel with the given value
> >   * to indicate that we've finished with the RP.  Non-0 value indicates
> >   * error.
> > @@ -1483,6 +1496,7 @@ static struct rp_cmd_args {
> >      [MIG_RP_MSG_PONG]           = { .len =  4, .name = "PONG" },
> >      [MIG_RP_MSG_REQ_PAGES]      = { .len = 12, .name = "REQ_PAGES" },
> >      [MIG_RP_MSG_REQ_PAGES_ID]   = { .len = -1, .name = "REQ_PAGES_ID" },
> > +    [MIG_RP_MSG_DOWNTIME]       = { .len =  8, .name = "DOWNTIME" },
> >      [MIG_RP_MSG_MAX]            = { .len = -1, .name = "MAX" },
> >  };
> >  
> > @@ -1613,6 +1627,10 @@ static void *source_return_path_thread(void *opaque)
> >              migrate_handle_rp_req_pages(ms, (char *)&buf[13], start, len);
> >              break;
> >  
> > +        case MIG_RP_MSG_DOWNTIME:
> > +            ms->downtime = ldq_be_p(buf);
> > +            break;
> > +
> >          default:
> >              break;
> >          }
> > @@ -1677,7 +1695,6 @@ static int postcopy_start(MigrationState *ms, bool *old_vm_running)
> >      int ret;
> >      QIOChannelBuffer *bioc;
> >      QEMUFile *fb;
> > -    int64_t time_at_stop = qemu_clock_get_ms(QEMU_CLOCK_REALTIME);
> >      bool restart_block = false;
> >      migrate_set_state(&ms->state, MIGRATION_STATUS_ACTIVE,
> >                        MIGRATION_STATUS_POSTCOPY_ACTIVE);
> > @@ -1779,7 +1796,6 @@ static int postcopy_start(MigrationState *ms, bool *old_vm_running)
> >       */
> >      ms->postcopy_after_devices = true;
> >      notifier_list_notify(&migration_state_notifiers, ms);
> > -    ms->downtime =  qemu_clock_get_ms(QEMU_CLOCK_REALTIME) - time_at_stop;
> >  
> >      qemu_mutex_unlock_iothread();
> >  
> > diff --git a/migration/postcopy-ram.c b/migration/postcopy-ram.c
> > index ea89f4e..42330fd 100644
> > --- a/migration/postcopy-ram.c
> > +++ b/migration/postcopy-ram.c
> > @@ -330,6 +330,7 @@ int postcopy_ram_incoming_cleanup(MigrationIncomingState *mis)
> >      }
> >  
> >      postcopy_state_set(POSTCOPY_INCOMING_END);
> > +    migrate_send_rp_downtime(mis, get_postcopy_total_downtime());
> >      migrate_send_rp_shut(mis, qemu_file_get_error(mis->from_src_file) != 0);
> >  
> >      if (mis->postcopy_tmp_page) {
> > -- 
> > 1.8.3.1
> > 
> --
> Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK
> 

-- 

BR
Alexey


* Re: [Qemu-devel] [PATCH 3/6] migration: add UFFD_FEATURE_THREAD_ID feature support
  2017-04-24 17:10               ` Dr. David Alan Gilbert
@ 2017-04-25  7:55                 ` Alexey
  2017-04-25 11:14                   ` Dr. David Alan Gilbert
  0 siblings, 1 reply; 38+ messages in thread
From: Alexey @ 2017-04-25  7:55 UTC (permalink / raw)
  To: Dr. David Alan Gilbert, aarcange; +Cc: i.maximets, qemu-devel, Peter Xu

+ Andrea Arcangeli

On Mon, Apr 24, 2017 at 06:10:02PM +0100, Dr. David Alan Gilbert wrote:
> * Alexey (a.perevalov@samsung.com) wrote:
> > On Mon, Apr 24, 2017 at 04:12:29PM +0800, Peter Xu wrote:
> > > On Fri, Apr 21, 2017 at 06:22:12PM +0300, Alexey wrote:
> > > > On Fri, Apr 21, 2017 at 11:24:54AM +0100, Dr. David Alan Gilbert wrote:
> > > > > * Alexey Perevalov (a.perevalov@samsung.com) wrote:
> > > > > > Userfaultfd mechanism is able to provide process thread id,
> > > > > > in case when client request it with UFDD_API ioctl.
> > > > > > 
> > > > > > Signed-off-by: Alexey Perevalov <a.perevalov@samsung.com>
> > > > > 
> > > > > There seem to be two parts to this:
> > > > >   a) Adding the mis parameter to ufd_version_check
> > > > >   b) Asking for the feature
> > > > > 
> > > > > Please split it into two patches.
> > > > > 
> > > > > Also....
> > > > > 
> > > > > > ---
> > > > > >  include/migration/postcopy-ram.h |  2 +-
> > > > > >  migration/migration.c            |  2 +-
> > > > > >  migration/postcopy-ram.c         | 12 ++++++------
> > > > > >  migration/savevm.c               |  2 +-
> > > > > >  4 files changed, 9 insertions(+), 9 deletions(-)
> > > > > > 
> > > > > > diff --git a/include/migration/postcopy-ram.h b/include/migration/postcopy-ram.h
> > > > > > index 8e036b9..809f6db 100644
> > > > > > --- a/include/migration/postcopy-ram.h
> > > > > > +++ b/include/migration/postcopy-ram.h
> > > > > > @@ -14,7 +14,7 @@
> > > > > >  #define QEMU_POSTCOPY_RAM_H
> > > > > >  
> > > > > >  /* Return true if the host supports everything we need to do postcopy-ram */
> > > > > > -bool postcopy_ram_supported_by_host(void);
> > > > > > +bool postcopy_ram_supported_by_host(MigrationIncomingState *mis);
> > > > > >  
> > > > > >  /*
> > > > > >   * Make all of RAM sensitive to accesses to areas that haven't yet been written
> > > > > > diff --git a/migration/migration.c b/migration/migration.c
> > > > > > index ad4036f..79f6425 100644
> > > > > > --- a/migration/migration.c
> > > > > > +++ b/migration/migration.c
> > > > > > @@ -802,7 +802,7 @@ void qmp_migrate_set_capabilities(MigrationCapabilityStatusList *params,
> > > > > >           * special support.
> > > > > >           */
> > > > > >          if (!old_postcopy_cap && runstate_check(RUN_STATE_INMIGRATE) &&
> > > > > > -            !postcopy_ram_supported_by_host()) {
> > > > > > +            !postcopy_ram_supported_by_host(NULL)) {
> > > > > >              /* postcopy_ram_supported_by_host will have emitted a more
> > > > > >               * detailed message
> > > > > >               */
> > > > > > diff --git a/migration/postcopy-ram.c b/migration/postcopy-ram.c
> > > > > > index dc80dbb..70f0480 100644
> > > > > > --- a/migration/postcopy-ram.c
> > > > > > +++ b/migration/postcopy-ram.c
> > > > > > @@ -60,13 +60,13 @@ struct PostcopyDiscardState {
> > > > > >  #include <sys/eventfd.h>
> > > > > >  #include <linux/userfaultfd.h>
> > > > > >  
> > > > > > -static bool ufd_version_check(int ufd)
> > > > > > +static bool ufd_version_check(int ufd, MigrationIncomingState *mis)
> > > > > >  {
> > > > > >      struct uffdio_api api_struct;
> > > > > >      uint64_t ioctl_mask;
> > > > > >  
> > > > > >      api_struct.api = UFFD_API;
> > > > > > -    api_struct.features = 0;
> > > > > > +    api_struct.features = UFFD_FEATURE_THREAD_ID;
> > > > > >      if (ioctl(ufd, UFFDIO_API, &api_struct)) {
> > > > > >          error_report("postcopy_ram_supported_by_host: UFFDIO_API failed: %s",
> > > > > >                       strerror(errno));
> > > > > 
> > > > > You're not actually using the 'mis' here - what I'd expected was
> > > > > something that was going to check if the UFFDIO_API return said that it really
> > > > > had the feature, and if so store a flag in the MIS somewhere.
> > > > > 
> > > > > Also, I'm not sure it's right to set 'api_struct.features' on the input - what
> > > > > happens if this is run on an old kernel - we don't want postcopy to fail on
> > > > > an old kernel without your feature.
> > > > > I'm not 100% sure of the interface, but I think the way it works is you set
> > > > > features = 0 before the call, and then check the api_struct.features in the
> > > > > return - in the same way that I check for UFFD_FEATURE_MISSING_HUGETLBFS.
> > > > > 
> > > > We need to ask kernel about that feature,
> > > > right,
> > > > kernel returns back available features
> > > > uffdio_api.features = UFFD_API_FEATURES
> > > > but it also stores requested features
> > > 
> > > I feel like this does not against Dave's comment, maybe we just need
> > > to send the UFFDIO_API twice? Like:
> > yes, ioctl with UFFDIO_API will fail on old kernel if we will request
> > e.g. UFFD_FEATURE_THREAD_ID or other new feature.
> > 
> > So in general way need a per feature request, for better error handling.
> 
> No, we don't need to - I think the way the kernel works is that you pass
> features = 0 in, and it sets api_struct.features on the way out;
> so if you always pass 0 in, you can then just check the features that
> it returns.
>
Without explicitly setting UFFD_FEATURE_THREAD_ID, the ptid will not be
sent back to user space.

Also, it's impossible to call the UFFDIO_API ioctl more than once, because
the internal state of the userfaultfd_ctx inside the kernel changes
UFFD_STATE_WAIT_API -> UFFD_STATE_RUNNING,
but the UFFDIO_API ioctl expects UFFD_STATE_WAIT_API
^^^

So it looks like there is no way to provide backward compatibility for old
kernels. I don't even know what to do with new kernels, because the
extension point should exist for new kernels too (e.g. I want to add a new
feature in the future, UFFD_FEATURE_ALLOW_PADDING, which would allow
UFFDIO_COPY with a smaller page size than was registered).
So what to do in this case: add a new UFFD feature, like
UFFD_FEATURE_ALLOW_CALL_API_AGAIN (to allow setting a non-persistent
feature like UFFD_FEATURE_THREAD_ID),

or just remove the condition in the kernel when sending the ptid.

Or maybe it's not even a problem: just close the ufd, reopen it and resend
UFFD_FEATURE_THREAD_ID.
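
A rough sketch of that last option, assuming Linux with
<linux/userfaultfd.h> available. The function names are hypothetical
and this is not what the patch currently does: probe the supported
features on a throwaway fd with features = 0, close it, then open a
fresh fd and request only the wanted bits that the kernel advertised.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <fcntl.h>
#include <stdint.h>
#include <sys/ioctl.h>
#include <sys/syscall.h>
#include <unistd.h>
#include <linux/userfaultfd.h>

/* Keep only the wanted feature bits that the kernel advertises. */
uint64_t uffd_pick_features(uint64_t supported, uint64_t wanted)
{
    return supported & wanted;
}

/* Open a userfaultfd; returns -1 with errno set on failure. */
int uffd_open(void)
{
    return (int)syscall(__NR_userfaultfd, O_CLOEXEC | O_NONBLOCK);
}

/*
 * Two-step negotiation: UFFDIO_API can only be issued once per fd,
 * so the first fd is used purely to read back the supported features.
 * Returns the configured fd, or -1 on failure; '*enabled' receives
 * the feature bits that were actually requested.
 */
int uffd_open_with_features(uint64_t wanted, uint64_t *enabled)
{
    struct uffdio_api api = { .api = UFFD_API, .features = 0 };
    uint64_t request;
    int probe, ufd;

    probe = uffd_open();
    if (probe < 0) {
        return -1;
    }
    if (ioctl(probe, UFFDIO_API, &api)) {
        close(probe);
        return -1;
    }
    close(probe);               /* this fd has left the WAIT_API state */

    /* On an old kernel the wanted bit is simply absent here. */
    request = uffd_pick_features(api.features, wanted);

    ufd = uffd_open();
    if (ufd < 0) {
        return -1;
    }
    api.api = UFFD_API;
    api.features = request;
    if (ioctl(ufd, UFFDIO_API, &api)) {
        close(ufd);
        return -1;
    }
    *enabled = request;
    return ufd;
}
```

A caller would pass UFFD_FEATURE_THREAD_ID as 'wanted' where the headers
define it, and only trust the ptid field if that bit comes back in
'*enabled'; on an old kernel postcopy still works, just without thread
ids.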

> Dave
> 
> > 
> > > 
> > > diff --git a/migration/postcopy-ram.c b/migration/postcopy-ram.c
> > > index 85fd8d7..fd0905f 100644
> > > --- a/migration/postcopy-ram.c
> > > +++ b/migration/postcopy-ram.c
> > > @@ -64,6 +64,7 @@ static bool ufd_version_check(int ufd)
> > >  {
> > >      struct uffdio_api api_struct;
> > >      uint64_t ioctl_mask;
> > > +    uint64_t features = 0;
> > > 
> > >      api_struct.api = UFFD_API;
> > >      api_struct.features = 0;
> > > @@ -92,6 +93,27 @@ static bool ufd_version_check(int ufd)
> > >              return false;
> > >          }
> > >      }
> > > +
> > > +#ifdef UFFD_FEATURE_THREAD_ID
> > > +    if (api_struct.features & UFFD_FEATURE_THREAD_ID) {
> > > +        features |= UFFD_FEATURE_THREAD_ID;
> > > +    }
> > > +#endif
> > > +
> > > +    if (features) {
> > > +        /*
> > > +         * If there are new features to be enabled from userspace,
> > > +         * trigger another UFFDIO_API ioctl.
> > > +         */
> > > +        api_struct.api = UFFD_API;
> > > +        api_struct.features = features;
> > > +        if (ioctl(ufd, UFFDIO_API, &api_struct)) {
> > > +            error_report("UFFDIO_API failed to setup features: 0x%"PRIx64,
> > > +                         features);
> > > +            return false;
> > > +        }
> > > +    }
> > > +
> > >      return true;
> > >  }
> > > 
> > > > /* only enable the requested features for this uffd context */
> > > >  ctx->features = uffd_ctx_features(features);
> > > > 
> > > > so, at the time when process thread id is going to be sent
> > > > kernel checks if it was requested
> > > > +       if (features & UFFD_FEATURE_THREAD_ID)
> > > > +               msg.arg.pagefault.ptid = task_pid_vnr(current);
> > > 
> > > (I am slightly curious about why we need this if block, after all
> > >  userspace should know whether the ptid field would be valid from the
> > >  first UFFDIO_API ioctl, right?)
> > If I correctly understand you question ) that condition was suggested,
> > due to page faulting is performance critical part (in general, not only postcopy
> > case ), that's why it should be enabled from userspace, 
> > only for statistics/debug purpose.
> > Also looks like David want to see that feature on QEMU as not always
> > feature too.
> > 
> > > 
> > > Thanks,
> > > 
> > > > 
> > > > from patch message:
> > > > 
> > > >  Process's thread id is being provided when user requeste it
> > > > by setting UFFD_FEATURE_THREAD_ID bit into uffdio_api.features.
> > > > 
> > > > UFFD_FEATURE_MISSING_HUGETLBFS - look like default, unconditional
> > > > behavior (I didn't find any usage of that define in kernel).
> > > 
> > > -- 
> > > Peter Xu
> > > 
> > 
> > -- 
> > 
> > BR
> > Alexey
> --
> Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK
> 

-- 

BR
Alexey


* Re: [Qemu-devel] [PATCH 4/6] migration: calculate downtime on dst side
  2017-04-14 13:17     ` [Qemu-devel] [PATCH 4/6] migration: calculate downtime on dst side Alexey Perevalov
  2017-04-21 12:00       ` Dr. David Alan Gilbert
@ 2017-04-25  8:24       ` Peter Xu
  2017-04-25 10:10         ` Alexey Perevalov
  1 sibling, 1 reply; 38+ messages in thread
From: Peter Xu @ 2017-04-25  8:24 UTC (permalink / raw)
  To: Alexey Perevalov; +Cc: dgilbert, qemu-devel, i.maximets

On Fri, Apr 14, 2017 at 04:17:18PM +0300, Alexey Perevalov wrote:

[...]

> +/*
> + * This function calculates downtime per cpu and trace it
> + *
> + *  Also it calculates total downtime as an interval's overlap,
> + *  for many vCPU.
> + *
> + *  The approach is following:
> + *  Initially intervals are represented in tree where key is
> + *  pagefault address, and values:
> + *   begin - page fault time
> + *   end   - page load time
> + *   cpus  - bit mask shows affected cpus
> + *
> + *  To calculate overlap on all cpus, intervals converted into
> + *  array of points in time (downtime_points), the size of
> + *  array is 2 * number of nodes in tree of intervals (2 array
> + *  elements per one in element of interval).
> + *  Each element is marked as end (E) or as start (S) of interval.
> + *  The overlap downtime will be calculated for SE, only in case
> + *  there is sequence S(0..N)E(M) for every vCPU.
> + *
> + * As example we have 3 CPU
> + *
> + *      S1        E1           S1               E1
> + * -----***********------------xxx***************------------------------> CPU1
> + *
> + *             S2                E2
> + * ------------****************xxx---------------------------------------> CPU2
> + *
> + *                         S3            E3
> + * ------------------------****xxx********-------------------------------> CPU3
> + *
> + * We have sequence S1,S2,E1,S3,S1,E2,E3,E1
> + * S2,E1 - doesn't match condition due to sequence S1,S2,E1 doesn't include CPU3
> + * S3,S1,E2 - sequenece includes all CPUs, in this case overlap will be S1,E2
> + * Legend of picture is following: * - means downtime per vCPU
> + *                                 x - means overlapped downtime
> + */
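
For readers following the quoted comment, here is a minimal sketch of the interval-overlap idea, written as a sweep over sorted S/E points. It is not the code from the patch: the names DowntimePoint and overlapped_downtime are hypothetical, and for simplicity each point carries a single vCPU index rather than the bit mask of affected cpus kept per page-fault address in the patch.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical type for this sketch: one start (S) or end (E) point. */
typedef struct {
    int64_t time;   /* timestamp of the event */
    int cpu;        /* vCPU index, 0-based */
    int is_start;   /* 1 = page fault (S), 0 = page loaded (E) */
} DowntimePoint;

/* Order points by time; at equal times process ends before starts. */
static int point_cmp(const void *a, const void *b)
{
    const DowntimePoint *pa = a, *pb = b;

    if (pa->time != pb->time) {
        return pa->time < pb->time ? -1 : 1;
    }
    return pa->is_start - pb->is_start;
}

/*
 * Sum the time during which every one of 'ncpus' vCPUs is inside a fault
 * interval.  Sorts 'pts' in place.
 */
int64_t overlapped_downtime(DowntimePoint *pts, size_t n, int ncpus)
{
    int *depth;         /* open intervals per vCPU */
    int blocked = 0;    /* vCPUs with at least one open interval */
    int64_t total = 0;
    size_t i;

    assert(ncpus > 0);
    depth = calloc((size_t)ncpus, sizeof(*depth));
    assert(depth != NULL);

    qsort(pts, n, sizeof(*pts), point_cmp);
    for (i = 0; i < n; i++) {
        assert(pts[i].cpu >= 0 && pts[i].cpu < ncpus);
        /* The span since the previous point counts only if all were blocked */
        if (i > 0 && blocked == ncpus) {
            total += pts[i].time - pts[i - 1].time;
        }
        if (pts[i].is_start) {
            if (depth[pts[i].cpu]++ == 0) {
                blocked++;
            }
        } else {
            if (--depth[pts[i].cpu] == 0) {
                blocked--;
            }
        }
    }
    free(depth);
    return total;
}
```

Fed with the sequence S1,S2,E1,S3,S1,E2,E3,E1 from the comment, only the span between the second S1 and E2 is counted, which matches the "x" region in the picture.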

Not sure whether I get the point in this patch... iiuc we defined the
downtime here as the period when all vcpus are halted, right?

If so, I have a few questions:

- will this algorithm consume lots of memory? since I see we have one
  trace object per fault page address

- do we need to protect the tree to make sure there's no insertion
  when doing the calculation?

- if the only thing we want here is the "total downtime", would the
  approach below work? (assuming N is the number of vcpus)

  a. define an array cpu_fault_addr[N] to store the current faulted address
     for each vcpu. When vcpu X is running, cpu_fault_addr[X] should
     be 0.

  b. when a page fault happens on vcpu A, set cpu_fault_addr[A] to the
     corresponding fault address.

  c. when a page copy finishes, loop over cpu_fault_addr[] to see
     whether it matches any element, and clear each matching element.

  Then, we can just measure the periods when cpu_fault_addr[] is all
  set (by tracing at both b. and c.). Can this work?
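
Steps a. to c. could be sketched roughly as follows. This is only an illustration under assumptions: DowntimeCtx, downtime_fault, downtime_page_copied and MAX_VCPUS are hypothetical names, timestamps are plain integers supplied by the caller, and fault addresses are assumed to be already rounded to page granularity so they can be compared directly.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define MAX_VCPUS 256           /* hypothetical upper bound for this sketch */

typedef struct {
    uint64_t cpu_fault_addr[MAX_VCPUS]; /* step a: 0 means vcpu is running */
    int nr_vcpus;
    int nr_blocked;             /* number of non-zero entries above */
    int64_t all_blocked_since;  /* valid while nr_blocked == nr_vcpus */
    int64_t total_downtime;     /* accumulated all-vcpus-blocked time */
} DowntimeCtx;

void downtime_init(DowntimeCtx *ctx, int nr_vcpus)
{
    assert(nr_vcpus > 0 && nr_vcpus <= MAX_VCPUS);
    memset(ctx, 0, sizeof(*ctx));
    ctx->nr_vcpus = nr_vcpus;
}

/* Step b: vcpu 'cpu' faulted on page 'addr' at time 'now'. */
void downtime_fault(DowntimeCtx *ctx, int cpu, uint64_t addr, int64_t now)
{
    assert(cpu >= 0 && cpu < ctx->nr_vcpus && addr != 0);
    /* Start the clock when the last running vcpu becomes blocked */
    if (ctx->cpu_fault_addr[cpu] == 0 &&
        ++ctx->nr_blocked == ctx->nr_vcpus) {
        ctx->all_blocked_since = now;
    }
    ctx->cpu_fault_addr[cpu] = addr;
}

/* Step c: page 'addr' was copied in at time 'now'. */
void downtime_page_copied(DowntimeCtx *ctx, uint64_t addr, int64_t now)
{
    int cpu;

    assert(addr != 0);
    for (cpu = 0; cpu < ctx->nr_vcpus; cpu++) {
        if (ctx->cpu_fault_addr[cpu] != addr) {
            continue;
        }
        /* Stop the clock when the first vcpu resumes */
        if (ctx->nr_blocked == ctx->nr_vcpus) {
            ctx->total_downtime += now - ctx->all_blocked_since;
        }
        ctx->cpu_fault_addr[cpu] = 0;
        ctx->nr_blocked--;
    }
}
```

Memory use here is one array slot per vcpu, independent of how many pages fault, which is the point of the suggestion.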

Thanks,

-- 
Peter Xu


* Re: [Qemu-devel] [PATCH 4/6] migration: calculate downtime on dst side
  2017-04-25  8:24       ` [Qemu-devel] [PATCH 4/6] migration: calculate downtime on dst side Peter Xu
@ 2017-04-25 10:10         ` Alexey Perevalov
  2017-04-25 10:25           ` Peter Xu
  0 siblings, 1 reply; 38+ messages in thread
From: Alexey Perevalov @ 2017-04-25 10:10 UTC (permalink / raw)
  To: Peter Xu; +Cc: dgilbert, qemu-devel, i.maximets

On 04/25/2017 11:24 AM, Peter Xu wrote:
> On Fri, Apr 14, 2017 at 04:17:18PM +0300, Alexey Perevalov wrote:
>
> [...]
>
>> +/*
>> + * This function calculates downtime per cpu and trace it
>> + *
>> + *  Also it calculates total downtime as an interval's overlap,
>> + *  for many vCPU.
>> + *
>> + *  The approach is following:
>> + *  Initially intervals are represented in tree where key is
>> + *  pagefault address, and values:
>> + *   begin - page fault time
>> + *   end   - page load time
>> + *   cpus  - bit mask shows affected cpus
>> + *
>> + *  To calculate overlap on all cpus, intervals converted into
>> + *  array of points in time (downtime_points), the size of
>> + *  array is 2 * number of nodes in tree of intervals (2 array
>> + *  elements per one in element of interval).
>> + *  Each element is marked as end (E) or as start (S) of interval.
>> + *  The overlap downtime will be calculated for SE, only in case
>> + *  there is sequence S(0..N)E(M) for every vCPU.
>> + *
>> + * As example we have 3 CPU
>> + *
>> + *      S1        E1           S1               E1
>> + * -----***********------------xxx***************------------------------> CPU1
>> + *
>> + *             S2                E2
>> + * ------------****************xxx---------------------------------------> CPU2
>> + *
>> + *                         S3            E3
>> + * ------------------------****xxx********-------------------------------> CPU3
>> + *
>> + * We have sequence S1,S2,E1,S3,S1,E2,E3,E1
>> + * S2,E1 - doesn't match condition due to sequence S1,S2,E1 doesn't include CPU3
>> + * S3,S1,E2 - sequenece includes all CPUs, in this case overlap will be S1,E2
>> + * Legend of picture is following: * - means downtime per vCPU
>> + *                                 x - means overlapped downtime
>> + */
> Not sure whether I get the point in this patch... iiuc we defined the
> downtime here as the period when all vcpus are halted, right?
>
> If so, I have a few questions:
>
> - will this algorithm consume lots of memory? since I see we have one
>    trace object per fault page address
I don't think it consumes too much. One DowntimeDuration takes
(assuming bitmap_try_new; in this patch set I used a pointer to a
uint64_t array to keep the bitmap,
but I'm going to use include/qemu/bitmap.h, which works with pointers to long)

2 * sizeof(int64_t) + ((smp_cpus + BITS_PER_BYTE * sizeof(long) - 1) /
(BITS_PER_BYTE * sizeof(long))) * sizeof(long)
so it's 16 + at least 4 bytes per page fault.
Let's assume we migrate 256 vCPUs and 256 GB of RAM, and that RAM is
backed by 4 KB pages, which is a really bad case:
16 + ((256 + 8 * 4 - 1) / (8 * 4)) * 4 = 48 bytes
(256 * 1024 * 1024 * 1024) / (4 * 1024) = 67108864 page faults at most; not
all of these pages will actually be faulted, due to
page pre-fetching.
67108864 * 48 = 3221225472 bytes (3 GB for that operation),
but I doubt anyone will use 4 KB pages for 256 GB; probably
2 MB or 1 GB huge pages will be chosen on x86, and on ARM or other
architectures the values could differ.
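
A small sketch to reproduce this estimate, with the bitmap rounded up to whole longs using integer division. The function names are hypothetical and the entry layout (two int64 timestamps plus a cpu bitmap) is taken from the description above, not from the final code. With that rounding a 256-vCPU entry comes to 48 bytes for both a 4-byte and an 8-byte long.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Size of one hypothetical downtime entry: two int64 timestamps (begin, end)
 * plus a cpu bitmap rounded up to a whole number of longs.
 * 'long_size' is passed in so that 32-bit and 64-bit hosts can be compared.
 */
size_t downtime_entry_size(unsigned smp_cpus, size_t long_size)
{
    size_t bits_per_long, nlongs;

    assert(long_size > 0);
    bits_per_long = CHAR_BIT * long_size;
    nlongs = (smp_cpus + bits_per_long - 1) / bits_per_long;
    return 2 * sizeof(int64_t) + nlongs * long_size;
}

/* Worst case: every guest page faults exactly once. */
uint64_t downtime_worst_case_bytes(uint64_t ram_bytes, uint64_t page_size,
                                   unsigned smp_cpus, size_t long_size)
{
    assert(page_size > 0);
    return (ram_bytes / page_size) * downtime_entry_size(smp_cpus, long_size);
}
```

For 256 GiB of RAM and 256 vCPUs this gives 3221225472 bytes with 4 KiB pages and 6291456 bytes with 2 MiB pages, so the page size dominates the estimate.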

>
> - do we need to protect the tree to make sure there's no insertion
>    when doing the calculation?
I asked the same question when I sent the RFC patches.
The answer is no, we don't need to, because right now
there is only one socket and one listen thread (it may be
required in the future, perhaps after the multifd patch set),
and the calculation is done synchronously right after migration completes.

>
> - if the only thing we want here is the "total downtime", whether
>    below would work? (assuming N is vcpu numbers)
>
>    a. define array cpu_fault_addr[N], to store current faulted address
>       for each vcpu. When vcpu X is running, cpu_fault_addr[X] should
>       be 0.
>
>    b. when page fault happens on vcpu A, setup cpu_fault_addr[A] with
>       corresponding fault address.
At this point we need to check whether all the other vCPUs are faulted too,
and if so, mark the current time as the start of total vCPU downtime.

>    c. when page copy finished, loop over cpu_fault_addr[] to see
>       whether that matches any, clear corresponding element if matched.
So when the page copy finishes and the total vCPU downtime mark is set,
yes, that interval is part of the total downtime.
>
>    Then, we can just measure the period when cpu_fault_addr[] is all
>    set (by tracing at both b. and c.). Can this work?
Yes, it works, but it's better to keep the time (cpu_fault_time);
the address is not important here, the reason for the pagefault doesn't matter.
Two vCPUs could fault on access to the same page; that's not a problem,
just store the time when each one faulted.
It looks like a better algorithm with lower complexity,
thanks a lot.


>
> Thanks,
>


-- 
Best regards,
Alexey Perevalov


* Re: [Qemu-devel] [PATCH 4/6] migration: calculate downtime on dst side
  2017-04-25 10:10         ` Alexey Perevalov
@ 2017-04-25 10:25           ` Peter Xu
  2017-04-25 10:47             ` Alexey Perevalov
  0 siblings, 1 reply; 38+ messages in thread
From: Peter Xu @ 2017-04-25 10:25 UTC (permalink / raw)
  To: Alexey Perevalov; +Cc: dgilbert, qemu-devel, i.maximets

On Tue, Apr 25, 2017 at 01:10:30PM +0300, Alexey Perevalov wrote:
> On 04/25/2017 11:24 AM, Peter Xu wrote:
> >On Fri, Apr 14, 2017 at 04:17:18PM +0300, Alexey Perevalov wrote:
> >
> >[...]
> >
> >>+/*
> >>+ * This function calculates downtime per cpu and trace it
> >>+ *
> >>+ *  Also it calculates total downtime as an interval's overlap,
> >>+ *  for many vCPU.
> >>+ *
> >>+ *  The approach is following:
> >>+ *  Initially intervals are represented in tree where key is
> >>+ *  pagefault address, and values:
> >>+ *   begin - page fault time
> >>+ *   end   - page load time
> >>+ *   cpus  - bit mask shows affected cpus
> >>+ *
> >>+ *  To calculate overlap on all cpus, intervals converted into
> >>+ *  array of points in time (downtime_points), the size of
> >>+ *  array is 2 * number of nodes in tree of intervals (2 array
> >>+ *  elements per one in element of interval).
> >>+ *  Each element is marked as end (E) or as start (S) of interval.
> >>+ *  The overlap downtime will be calculated for SE, only in case
> >>+ *  there is sequence S(0..N)E(M) for every vCPU.
> >>+ *
> >>+ * As example we have 3 CPU
> >>+ *
> >>+ *      S1        E1           S1               E1
> >>+ * -----***********------------xxx***************------------------------> CPU1
> >>+ *
> >>+ *             S2                E2
> >>+ * ------------****************xxx---------------------------------------> CPU2
> >>+ *
> >>+ *                         S3            E3
> >>+ * ------------------------****xxx********-------------------------------> CPU3
> >>+ *
> >>+ * We have sequence S1,S2,E1,S3,S1,E2,E3,E1
> >>+ * S2,E1 - doesn't match condition due to sequence S1,S2,E1 doesn't include CPU3
> >>+ * S3,S1,E2 - sequenece includes all CPUs, in this case overlap will be S1,E2
> >>+ * Legend of picture is following: * - means downtime per vCPU
> >>+ *                                 x - means overlapped downtime
> >>+ */
> >Not sure whether I get the point in this patch... iiuc we defined the
> >downtime here as the period when all vcpus are halted, right?
> >
> >If so, I have a few questions:
> >
> >- will this algorithm consume lots of memory? since I see we have one
> >   trace object per fault page address
> I don't think, it consumes too much, one DowntimeDuration
> takes (if I'm using bitmap_try_new, in this patch set I used pointer to
> uint64_t array to keep bitmap array,
> but I'm going to use include/qemu/bitmap.h, it works with pointers to long)
> 
> (2* int64 + (ROUND_UP((smp_cpus + BITS_PER_BYTE * sizeof(long) - 1 /
> (BITS_PER_BYTE * sizeof(long)))) * siezof(long)
> so it's about 16 + at least 4 bytes, per page fault,
> Lets assume we migration 256 vCPU and 256 Gb of ram and that ram is based on
> 4Kb pages - it's really bad case
> 16 + ((256 + 8 * 4 - 1) / ( 8 * 4 )) * 4 = 48 bytes
> (256 * 1024 * 1024 * 1024)/(4 * 1024) = 67108864 page faults, but not all of
> these pages will be pagefaulted, due to
> page pre-fetching
> 67108864 * 48 = 3221225472 bytes (3 Gb for that operation),
> but I have a doubt, who will use 4Kb pages for 256 Gb, probably
> 2Mb or 1G huge page will be chosen on x86, on ARM or other architecture it
> could be another values.

Hmm, it still looks big though...

> 
> >
> >- do we need to protect the tree to make sure there's no insertion
> >   when doing the calculation?
> I asked the same question when sent RFC patches,
> the answer here is no, we should not, due to right now,
> it's only one socket and one listen thread (maybe in future,
> it will be required, maybe after multi fd patch set),
> and calculation is doing synchronously right after migration complete.

Okay.

> 
> >
> >- if the only thing we want here is the "total downtime", whether
> >   below would work? (assuming N is vcpu numbers)
> >
> >   a. define array cpu_fault_addr[N], to store current faulted address
> >      for each vcpu. When vcpu X is running, cpu_fault_addr[X] should
> >      be 0.
> >
> >   b. when page fault happens on vcpu A, setup cpu_fault_addr[A] with
> >      corresponding fault address.
> at this time need to is fault happens for all another vCPU,
> and if it happens mark current time as total vCPU downtime start.
> 
> >   c. when page copy finished, loop over cpu_fault_addr[] to see
> >      whether that matches any, clear corresponding element if matched.
> so when page copy finished and mark for total vCPU is set,
> yes that interval is a part of total downtime.
> >
> >   Then, we can just measure the period when cpu_fault_addr[] is all
> >   set (by tracing at both b. and c.). Can this work?
> Yes, it works, but it's better to keep time - cpu_fault_time,
> address is not important here, it doesn't matter the reason of pagefault.

We still need the addresses, don't we? So that when we do COPY, we can check
the new page address against the stored ones, to know which vcpus' bits to
clear.

> 2 vCPU could fault due to access to one page, ok, it's not a problem, just
> store
> time when it was faulted.
> Looks like it's better algorithm, with lesser complexity,
> thank you a lot.

My pleasure. Thanks,

-- 
Peter Xu


* Re: [Qemu-devel] [PATCH 4/6] migration: calculate downtime on dst side
  2017-04-25 10:25           ` Peter Xu
@ 2017-04-25 10:47             ` Alexey Perevalov
  0 siblings, 0 replies; 38+ messages in thread
From: Alexey Perevalov @ 2017-04-25 10:47 UTC (permalink / raw)
  To: Peter Xu; +Cc: dgilbert, qemu-devel, i.maximets

On 04/25/2017 01:25 PM, Peter Xu wrote:
> On Tue, Apr 25, 2017 at 01:10:30PM +0300, Alexey Perevalov wrote:
>> On 04/25/2017 11:24 AM, Peter Xu wrote:
>>> On Fri, Apr 14, 2017 at 04:17:18PM +0300, Alexey Perevalov wrote:
>>>
>>> [...]
>>>
>>>> +/*
>>>> + * This function calculates downtime per cpu and trace it
>>>> + *
>>>> + *  Also it calculates total downtime as an interval's overlap,
>>>> + *  for many vCPU.
>>>> + *
>>>> + *  The approach is following:
>>>> + *  Initially intervals are represented in tree where key is
>>>> + *  pagefault address, and values:
>>>> + *   begin - page fault time
>>>> + *   end   - page load time
>>>> + *   cpus  - bit mask shows affected cpus
>>>> + *
>>>> + *  To calculate overlap on all cpus, intervals converted into
>>>> + *  array of points in time (downtime_points), the size of
>>>> + *  array is 2 * number of nodes in tree of intervals (2 array
>>>> + *  elements per one in element of interval).
>>>> + *  Each element is marked as end (E) or as start (S) of interval.
>>>> + *  The overlap downtime will be calculated for SE, only in case
>>>> + *  there is sequence S(0..N)E(M) for every vCPU.
>>>> + *
>>>> + * As example we have 3 CPU
>>>> + *
>>>> + *      S1        E1           S1               E1
>>>> + * -----***********------------xxx***************------------------------> CPU1
>>>> + *
>>>> + *             S2                E2
>>>> + * ------------****************xxx---------------------------------------> CPU2
>>>> + *
>>>> + *                         S3            E3
>>>> + * ------------------------****xxx********-------------------------------> CPU3
>>>> + *
>>>> + * We have sequence S1,S2,E1,S3,S1,E2,E3,E1
>>>> + * S2,E1 - doesn't match condition due to sequence S1,S2,E1 doesn't include CPU3
>>>> + * S3,S1,E2 - sequenece includes all CPUs, in this case overlap will be S1,E2
>>>> + * Legend of picture is following: * - means downtime per vCPU
>>>> + *                                 x - means overlapped downtime
>>>> + */
>>> Not sure whether I get the point in this patch... iiuc we defined the
>>> downtime here as the period when all vcpus are halted, right?
>>>
>>> If so, I have a few questions:
>>>
>>> - will this algorithm consume lots of memory? since I see we have one
>>>    trace object per fault page address
>> I don't think, it consumes too much, one DowntimeDuration
>> takes (if I'm using bitmap_try_new, in this patch set I used pointer to
>> uint64_t array to keep bitmap array,
>> but I'm going to use include/qemu/bitmap.h, it works with pointers to long)
>>
>> (2* int64 + (ROUND_UP((smp_cpus + BITS_PER_BYTE * sizeof(long) - 1 /
>> (BITS_PER_BYTE * sizeof(long)))) * siezof(long)
>> so it's about 16 + at least 4 bytes, per page fault,
>> Lets assume we migration 256 vCPU and 256 Gb of ram and that ram is based on
>> 4Kb pages - it's really bad case
>> 16 + ((256 + 8 * 4 - 1) / ( 8 * 4 )) * 4 = 48 bytes
>> (256 * 1024 * 1024 * 1024)/(4 * 1024) = 67108864 page faults, but not all of
>> these pages will be pagefaulted, due to
>> page pre-fetching
>> 67108864 * 48 = 3221225472 bytes (3 Gb for that operation),
>> but I have a doubt, who will use 4Kb pages for 256 Gb, probably
>> 2Mb or 1G huge page will be chosen on x86, on ARM or other architecture it
>> could be another values.
> Hmm, it looks still big though...
>
>>> - do we need to protect the tree to make sure there's no insertion
>>>    when doing the calculation?
>> I asked the same question when sent RFC patches,
>> the answer here is no, we should not, due to right now,
>> it's only one socket and one listen thread (maybe in future,
>> it will be required, maybe after multi fd patch set),
>> and calculation is doing synchronously right after migration complete.
> Okay.
>
>>> - if the only thing we want here is the "total downtime", whether
>>>    below would work? (assuming N is vcpu numbers)
>>>
>>>    a. define array cpu_fault_addr[N], to store current faulted address
>>>       for each vcpu. When vcpu X is running, cpu_fault_addr[X] should
>>>       be 0.
>>>
>>>    b. when page fault happens on vcpu A, setup cpu_fault_addr[A] with
>>>       corresponding fault address.
>> at this time need to is fault happens for all another vCPU,
>> and if it happens mark current time as total vCPU downtime start.
>>
>>>    c. when page copy finished, loop over cpu_fault_addr[] to see
>>>       whether that matches any, clear corresponding element if matched.
>> so when page copy finished and mark for total vCPU is set,
>> yes that interval is a part of total downtime.
>>>    Then, we can just measure the period when cpu_fault_addr[] is all
>>>    set (by tracing at both b. and c.). Can this work?
>> Yes, it works, but it's better to keep time - cpu_fault_time,
>> address is not important here, it doesn't matter the reason of pagefault.
> We still need the addresses? So that when we do COPY, we can check the
> new page address against these stored ones, to know which vcpus to
> clear the bit.
Frankly speaking, we do :) because there is no other way to determine
the vCPU at COPY time.

>
>> 2 vCPU could fault due to access to one page, ok, it's not a problem, just
>> store
>> time when it was faulted.
>> Looks like it's better algorithm, with lesser complexity,
>> thank you a lot.
> My pleasure. Thanks,
>


-- 
Best regards,
Alexey Perevalov


* Re: [Qemu-devel] [PATCH 3/6] migration: add UFFD_FEATURE_THREAD_ID feature support
  2017-04-25  7:55                 ` Alexey
@ 2017-04-25 11:14                   ` Dr. David Alan Gilbert
  2017-04-25 11:51                     ` Alexey Perevalov
  0 siblings, 1 reply; 38+ messages in thread
From: Dr. David Alan Gilbert @ 2017-04-25 11:14 UTC (permalink / raw)
  To: Alexey; +Cc: aarcange, i.maximets, qemu-devel, Peter Xu

* Alexey (a.perevalov@samsung.com) wrote:
> + Andrea Arcangeli
> 
> On Mon, Apr 24, 2017 at 06:10:02PM +0100, Dr. David Alan Gilbert wrote:
> > * Alexey (a.perevalov@samsung.com) wrote:
> > > On Mon, Apr 24, 2017 at 04:12:29PM +0800, Peter Xu wrote:
> > > > On Fri, Apr 21, 2017 at 06:22:12PM +0300, Alexey wrote:
> > > > > On Fri, Apr 21, 2017 at 11:24:54AM +0100, Dr. David Alan Gilbert wrote:
> > > > > > * Alexey Perevalov (a.perevalov@samsung.com) wrote:
> > > > > > > Userfaultfd mechanism is able to provide process thread id,
> > > > > > > in case when client request it with UFDD_API ioctl.
> > > > > > > 
> > > > > > > Signed-off-by: Alexey Perevalov <a.perevalov@samsung.com>
> > > > > > 
> > > > > > There seem to be two parts to this:
> > > > > >   a) Adding the mis parameter to ufd_version_check
> > > > > >   b) Asking for the feature
> > > > > > 
> > > > > > Please split it into two patches.
> > > > > > 
> > > > > > Also....
> > > > > > 
> > > > > > > ---
> > > > > > >  include/migration/postcopy-ram.h |  2 +-
> > > > > > >  migration/migration.c            |  2 +-
> > > > > > >  migration/postcopy-ram.c         | 12 ++++++------
> > > > > > >  migration/savevm.c               |  2 +-
> > > > > > >  4 files changed, 9 insertions(+), 9 deletions(-)
> > > > > > > 
> > > > > > > diff --git a/include/migration/postcopy-ram.h b/include/migration/postcopy-ram.h
> > > > > > > index 8e036b9..809f6db 100644
> > > > > > > --- a/include/migration/postcopy-ram.h
> > > > > > > +++ b/include/migration/postcopy-ram.h
> > > > > > > @@ -14,7 +14,7 @@
> > > > > > >  #define QEMU_POSTCOPY_RAM_H
> > > > > > >  
> > > > > > >  /* Return true if the host supports everything we need to do postcopy-ram */
> > > > > > > -bool postcopy_ram_supported_by_host(void);
> > > > > > > +bool postcopy_ram_supported_by_host(MigrationIncomingState *mis);
> > > > > > >  
> > > > > > >  /*
> > > > > > >   * Make all of RAM sensitive to accesses to areas that haven't yet been written
> > > > > > > diff --git a/migration/migration.c b/migration/migration.c
> > > > > > > index ad4036f..79f6425 100644
> > > > > > > --- a/migration/migration.c
> > > > > > > +++ b/migration/migration.c
> > > > > > > @@ -802,7 +802,7 @@ void qmp_migrate_set_capabilities(MigrationCapabilityStatusList *params,
> > > > > > >           * special support.
> > > > > > >           */
> > > > > > >          if (!old_postcopy_cap && runstate_check(RUN_STATE_INMIGRATE) &&
> > > > > > > -            !postcopy_ram_supported_by_host()) {
> > > > > > > +            !postcopy_ram_supported_by_host(NULL)) {
> > > > > > >              /* postcopy_ram_supported_by_host will have emitted a more
> > > > > > >               * detailed message
> > > > > > >               */
> > > > > > > diff --git a/migration/postcopy-ram.c b/migration/postcopy-ram.c
> > > > > > > index dc80dbb..70f0480 100644
> > > > > > > --- a/migration/postcopy-ram.c
> > > > > > > +++ b/migration/postcopy-ram.c
> > > > > > > @@ -60,13 +60,13 @@ struct PostcopyDiscardState {
> > > > > > >  #include <sys/eventfd.h>
> > > > > > >  #include <linux/userfaultfd.h>
> > > > > > >  
> > > > > > > -static bool ufd_version_check(int ufd)
> > > > > > > +static bool ufd_version_check(int ufd, MigrationIncomingState *mis)
> > > > > > >  {
> > > > > > >      struct uffdio_api api_struct;
> > > > > > >      uint64_t ioctl_mask;
> > > > > > >  
> > > > > > >      api_struct.api = UFFD_API;
> > > > > > > -    api_struct.features = 0;
> > > > > > > +    api_struct.features = UFFD_FEATURE_THREAD_ID;
> > > > > > >      if (ioctl(ufd, UFFDIO_API, &api_struct)) {
> > > > > > >          error_report("postcopy_ram_supported_by_host: UFFDIO_API failed: %s",
> > > > > > >                       strerror(errno));
> > > > > > 
> > > > > > You're not actually using the 'mis' here - what I'd expected was
> > > > > > something that was going to check if the UFFDIO_API return said that it really
> > > > > > had the feature, and if so store a flag in the MIS somewhere.
> > > > > > 
> > > > > > Also, I'm not sure it's right to set 'api_struct.features' on the input - what
> > > > > > happens if this is run on an old kernel - we don't want postcopy to fail on
> > > > > > an old kernel without your feature.
> > > > > > I'm not 100% sure of the interface, but I think the way it works is you set
> > > > > > features = 0 before the call, and then check the api_struct.features in the
> > > > > > return - in the same way that I check for UFFD_FEATURE_MISSING_HUGETLBFS.
> > > > > > 
> > > > > We need to ask kernel about that feature,
> > > > > right,
> > > > > kernel returns back available features
> > > > > uffdio_api.features = UFFD_API_FEATURES
> > > > > but it also stores requested features
> > > > 
> > > > I feel like this does not against Dave's comment, maybe we just need
> > > > to send the UFFDIO_API twice? Like:
> > > yes, ioctl with UFFDIO_API will fail on old kernel if we will request
> > > e.g. UFFD_FEATURE_THREAD_ID or other new feature.
> > > 
> > > So in general way need a per feature request, for better error handling.
> > 
> > No, we don't need to - I think the way the kernel works is that you pass
> > features = 0 in, and it sets api_struct.features on the way out;
> > so if you always pass 0 in, you can then just check the features that
> > it returns.
> >
> Without explicitly set UFFD_FEATURE_THREAD_ID, ptid will not sent back
> to user space.
> 
> Also it's impossible to call ioctl UFFD_API more than one time, due to
> internal state of userfault_ctx inside kernel is changing
> UFFD_STATE_WAIT_API -> UFFD_STATE_RUNNING, 
> but ioctl UFFD_API expects UFFD_STATE_WAIT_API
> ^^^
> 
> So looks like no way to provide backward compatibility for old kernels.
> I even don't know how to be with new kernels, because point of extension
> should be for new kernels (e.g. I want to add new feature in future,
> UFFD_FEATURE_ALLOW_PADDING which will allow UFFD_COPY for lesser page
> size than was registered).
> So how to be in this case, add new UFFD feature, like
> UFFD_FEATURE_ALLOW_CALL_API_AGAIN (allow set not always/persistent feature,
> like UFFD_FEATURE_THREAD_ID)
> 
> or just remove condition in kernel while sending ptid.
> 
> Or it's even not a problem, just close ufd/reopen and resend
> UFFD_FEATURE_THREAD_ID.

Yes, I think you'll have to do that;  so I guess you need to:
   a) Change ufd_version_check to open the ufd in the first place
      and remove the other syscalls that open it
   b) Make it pass features = 0 in to start with so that you can
      see if the kernel supports it
   c) Close the ufd and then do the API call again with the
      features you need.

Does that work?

Note we must never break QEMU working on old kernels.
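
A minimal sketch of steps a) to c), assuming the kernel behaves as described in this thread: UFFDIO_API with features = 0 succeeds on old kernels and reports the supported features on return, and UFFDIO_API can only be issued once per fd. The helper name uffd_open_with_features is hypothetical, this is not the QEMU code, and error reporting is left out.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <fcntl.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/ioctl.h>
#include <sys/syscall.h>
#include <unistd.h>
#include <linux/userfaultfd.h>

/* Open a fresh userfaultfd; returns -1 if the kernel or policy refuses. */
static int uffd_open(void)
{
    return (int)syscall(__NR_userfaultfd, O_CLOEXEC | O_NONBLOCK);
}

/*
 * Hypothetical helper: returns an fd on which UFFDIO_API has completed
 * with the subset of 'wanted' features that the kernel reports, or -1.
 * '*enabled' receives the feature bits that were actually requested.
 */
int uffd_open_with_features(uint64_t wanted, uint64_t *enabled)
{
    struct uffdio_api api = { .api = UFFD_API, .features = 0 };
    int fd;

    assert(enabled != NULL);
    *enabled = 0;

    /* a) + b): open and probe with features = 0 */
    fd = uffd_open();
    if (fd < 0) {
        return -1;
    }
    if (ioctl(fd, UFFDIO_API, &api)) {
        close(fd);
        return -1;
    }

    /* Drop optional features this kernel does not report */
    wanted &= api.features;
    if (!wanted) {
        return fd;      /* nothing extra to enable, keep the probed fd */
    }

    /* c): the API handshake is one-shot per fd, so reopen and ask again */
    close(fd);
    fd = uffd_open();
    if (fd < 0) {
        return -1;
    }
    api.api = UFFD_API;
    api.features = wanted;
    if (ioctl(fd, UFFDIO_API, &api)) {
        close(fd);
        return -1;
    }
    *enabled = wanted;
    return fd;
}
```

A caller would pass UFFD_FEATURE_THREAD_ID in 'wanted' where the headers define it and check '*enabled' before trusting the ptid field. On an old kernel the mask ends up empty and the function falls back to the plain handshake, so postcopy keeps working.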

Dave

> > Dave
> > 
> > > 
> > > > 
> > > > diff --git a/migration/postcopy-ram.c b/migration/postcopy-ram.c
> > > > index 85fd8d7..fd0905f 100644
> > > > --- a/migration/postcopy-ram.c
> > > > +++ b/migration/postcopy-ram.c
> > > > @@ -64,6 +64,7 @@ static bool ufd_version_check(int ufd)
> > > >  {
> > > >      struct uffdio_api api_struct;
> > > >      uint64_t ioctl_mask;
> > > > +    uint64_t features = 0;
> > > > 
> > > >      api_struct.api = UFFD_API;
> > > >      api_struct.features = 0;
> > > > @@ -92,6 +93,27 @@ static bool ufd_version_check(int ufd)
> > > >              return false;
> > > >          }
> > > >      }
> > > > +
> > > > +#ifdef UFFD_FEATURE_THREAD_ID
> > > > +    if (api_struct.features & UFFD_FEATURE_THREAD_ID) {
> > > > +        features |= UFFD_FEATURE_THREAD_ID;
> > > > +    }
> > > > +#endif
> > > > +
> > > > +    if (features) {
> > > > +        /*
> > > > +         * If there are new features to be enabled from userspace,
> > > > +         * trigger another UFFDIO_API ioctl.
> > > > +         */
> > > > +        api_struct.api = UFFD_API;
> > > > +        api_struct.features = features;
> > > > +        if (ioctl(ufd, UFFDIO_API, &api_struct)) {
> > > > +            error_report("UFFDIO_API failed to setup features: 0x%"PRIx64,
> > > > +                         features);
> > > > +            return false;
> > > > +        }
> > > > +    }
> > > > +
> > > >      return true;
> > > >  }
> > > > 
> > > > > /* only enable the requested features for this uffd context */
> > > > >  ctx->features = uffd_ctx_features(features);
> > > > > 
> > > > > so, at the time when process thread id is going to be sent
> > > > > kernel checks if it was requested
> > > > > +       if (features & UFFD_FEATURE_THREAD_ID)
> > > > > +               msg.arg.pagefault.ptid = task_pid_vnr(current);
> > > > 
> > > > (I am slightly curious about why we need this if block, after all
> > > >  userspace should know whether the ptid field would be valid from the
> > > >  first UFFDIO_API ioctl, right?)
> > > If I correctly understand you question ) that condition was suggested,
> > > due to page faulting is performance critical part (in general, not only postcopy
> > > case ), that's why it should be enabled from userspace, 
> > > only for statistics/debug purpose.
> > > Also looks like David want to see that feature on QEMU as not always
> > > feature too.
> > > 
> > > > 
> > > > Thanks,
> > > > 
> > > > > 
> > > > > from patch message:
> > > > > 
> > > > >  Process's thread id is being provided when user requeste it
> > > > > by setting UFFD_FEATURE_THREAD_ID bit into uffdio_api.features.
> > > > > 
> > > > > UFFD_FEATURE_MISSING_HUGETLBFS - look like default, unconditional
> > > > > behavior (I didn't find any usage of that define in kernel).
> > > > 
> > > > -- 
> > > > Peter Xu
> > > > 
> > > 
> > > -- 
> > > 
> > > BR
> > > Alexey
> > --
> > Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK
> > 
> 
> -- 
> 
> BR
> Alexey
--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK


* Re: [Qemu-devel] [PATCH 2/6] util: introduce glib-helper.c
  2017-04-21 15:49           ` Peter Maydell
@ 2017-04-25 11:23             ` Dr. David Alan Gilbert
  0 siblings, 0 replies; 38+ messages in thread
From: Dr. David Alan Gilbert @ 2017-04-25 11:23 UTC (permalink / raw)
  To: Peter Maydell; +Cc: Alexey, i.maximets, QEMU Developers

* Peter Maydell (peter.maydell@linaro.org) wrote:
> On 21 April 2017 at 16:10, Alexey <a.perevalov@samsung.com> wrote:
> > Hello, thank you for such a detailed comment,
> >
> > On Fri, Apr 21, 2017 at 11:27:55AM +0100, Peter Maydell wrote:
> 
> >> Can we have a proper doc comment format comment, please,
> >> since this is now a function available to all of QEMU?
> >>
> >> > +gint g_int_cmp64(gconstpointer a, gconstpointer b,
> >> > +        gpointer __attribute__((unused)) user_data);
> >>
> >> What is this actually for? Looking at the original uses
> >> I can tell that this is a GCompareDataFunc function, but
> >> the comment should tell me that.
> > I looked at other function comments in QEMU, didn't find
> > a common style, and decided to keep it as is. Maybe I missed some
> > best practice here.
> 
> See include/qemu/bitops.h for an example of the comment style.
> More important than just the style is that the comment
> should clearly explain the purpose of the function in detail.
> 
> Certainly many of our existing functions are poorly documented,
> but we're trying to raise the bar gradually here.
> 
> > Yes, it was copy-pasted.
> > Right now, after the mingw build check, I am thinking of using intptr_t as the
> > type for comparison in this function, or even keeping gpointer and merging
> > these two functions into _direct_.
> > I saw that intptr_t is widely used in QEMU.
> >
> > The intent of this function was a comparator for the case when client code
> > wants to keep integers in a pointer field. xen_disk.c uses UINT32, so it
> > wasn't a problem, but migration uses a 64-bit address (the kernel provides it
> > as __u64, long long), so on a 32-bit platform it's a problem.
> 
> Code which tries to put a genuinely 64 bit value into a pointer
> is buggy and needs to be fixed. I'm not clear if that is the
> case here, or if the ABI from the kernel guarantees that the
> value is really a pointer type and fits in uintptr_t / gpointer.

It's a (probably masked) HVA, so always a valid pointer.
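
A minimal sketch of such a single comparator, assuming the keys really are
pointer-sized values (as with an HVA). The names addr_to_key and
uintptr_key_cmp are hypothetical and are not part of the patch; plain C types
stand in for gconstpointer/gpointer so the sketch builds without glib:

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helper: turn an address reported as a 64-bit integer into a
 * pointer-sized key. The assert fails loudly if the value does not fit into
 * a pointer (the 32-bit host case discussed above) instead of truncating it.
 */
void *addr_to_key(uint64_t addr)
{
    assert(addr <= UINTPTR_MAX);
    return (void *)(uintptr_t)addr;
}

/*
 * Hypothetical comparator with the three-argument shape of GCompareDataFunc.
 * It orders keys by their integer value and returns a negative, zero or
 * positive result; user_data is unused.
 */
int uintptr_key_cmp(const void *a, const void *b, void *user_data)
{
    uintptr_t ka = (uintptr_t)a;
    uintptr_t kb = (uintptr_t)b;

    (void)user_data;
    /* Avoid subtraction, which could overflow or truncate. */
    return (ka > kb) - (ka < kb);
}
```

Because the signature matches GCompareDataFunc, a function like this could be
passed to g_tree_new_full() as the key comparison function.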

Dave

> I don't think we need more than one of these functions.
> 
> >> This is also missing the copyright line.
> > Yes, maybe it would have been better for me to ask before sending.
> > In util I found files that reference the GNU GPL, version 2, as
> > in this file; I also found this:
> >
> >  * This library is free software; you can redistribute it and/or
> >  * modify it under the terms of the GNU Lesser General Public
> >  * License as published by the Free Software Foundation; either
> >  * version 2 of the License, or (at your option) any later version.
> >
> > So I just copied the copyright reference from glib-compat.h.
> 
> Yes, that's the license statement, which is fine. What is
> missing is the copyright line, which in glib-compat.h looks
> like:
>  Copyright IBM, Corp. 2013
> 
> For code you write, you want either your personal or (more likely)
> a Samsung copyright line -- check with your company about what
> their preferred form is.
> 
> thanks
> -- PMM
--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK


* Re: [Qemu-devel] [PATCH 3/6] migration: add UFFD_FEATURE_THREAD_ID feature support
  2017-04-25 11:14                   ` Dr. David Alan Gilbert
@ 2017-04-25 11:51                     ` Alexey Perevalov
  0 siblings, 0 replies; 38+ messages in thread
From: Alexey Perevalov @ 2017-04-25 11:51 UTC (permalink / raw)
  To: Dr. David Alan Gilbert; +Cc: aarcange, i.maximets, qemu-devel, Peter Xu

On 04/25/2017 02:14 PM, Dr. David Alan Gilbert wrote:
> * Alexey (a.perevalov@samsung.com) wrote:
>> + Andrea Arcangeli
>>
>> On Mon, Apr 24, 2017 at 06:10:02PM +0100, Dr. David Alan Gilbert wrote:
>>> * Alexey (a.perevalov@samsung.com) wrote:
>>>> On Mon, Apr 24, 2017 at 04:12:29PM +0800, Peter Xu wrote:
>>>>> On Fri, Apr 21, 2017 at 06:22:12PM +0300, Alexey wrote:
>>>>>> On Fri, Apr 21, 2017 at 11:24:54AM +0100, Dr. David Alan Gilbert wrote:
>>>>>>> * Alexey Perevalov (a.perevalov@samsung.com) wrote:
>>>>>>>> The userfaultfd mechanism is able to provide the process thread id
>>>>>>>> when the client requests it with the UFFDIO_API ioctl.
>>>>>>>>
>>>>>>>> Signed-off-by: Alexey Perevalov <a.perevalov@samsung.com>
>>>>>>> There seem to be two parts to this:
>>>>>>>    a) Adding the mis parameter to ufd_version_check
>>>>>>>    b) Asking for the feature
>>>>>>>
>>>>>>> Please split it into two patches.
>>>>>>>
>>>>>>> Also....
>>>>>>>
>>>>>>>> ---
>>>>>>>>   include/migration/postcopy-ram.h |  2 +-
>>>>>>>>   migration/migration.c            |  2 +-
>>>>>>>>   migration/postcopy-ram.c         | 12 ++++++------
>>>>>>>>   migration/savevm.c               |  2 +-
>>>>>>>>   4 files changed, 9 insertions(+), 9 deletions(-)
>>>>>>>>
>>>>>>>> diff --git a/include/migration/postcopy-ram.h b/include/migration/postcopy-ram.h
>>>>>>>> index 8e036b9..809f6db 100644
>>>>>>>> --- a/include/migration/postcopy-ram.h
>>>>>>>> +++ b/include/migration/postcopy-ram.h
>>>>>>>> @@ -14,7 +14,7 @@
>>>>>>>>   #define QEMU_POSTCOPY_RAM_H
>>>>>>>>   
>>>>>>>>   /* Return true if the host supports everything we need to do postcopy-ram */
>>>>>>>> -bool postcopy_ram_supported_by_host(void);
>>>>>>>> +bool postcopy_ram_supported_by_host(MigrationIncomingState *mis);
>>>>>>>>   
>>>>>>>>   /*
>>>>>>>>    * Make all of RAM sensitive to accesses to areas that haven't yet been written
>>>>>>>> diff --git a/migration/migration.c b/migration/migration.c
>>>>>>>> index ad4036f..79f6425 100644
>>>>>>>> --- a/migration/migration.c
>>>>>>>> +++ b/migration/migration.c
>>>>>>>> @@ -802,7 +802,7 @@ void qmp_migrate_set_capabilities(MigrationCapabilityStatusList *params,
>>>>>>>>            * special support.
>>>>>>>>            */
>>>>>>>>           if (!old_postcopy_cap && runstate_check(RUN_STATE_INMIGRATE) &&
>>>>>>>> -            !postcopy_ram_supported_by_host()) {
>>>>>>>> +            !postcopy_ram_supported_by_host(NULL)) {
>>>>>>>>               /* postcopy_ram_supported_by_host will have emitted a more
>>>>>>>>                * detailed message
>>>>>>>>                */
>>>>>>>> diff --git a/migration/postcopy-ram.c b/migration/postcopy-ram.c
>>>>>>>> index dc80dbb..70f0480 100644
>>>>>>>> --- a/migration/postcopy-ram.c
>>>>>>>> +++ b/migration/postcopy-ram.c
>>>>>>>> @@ -60,13 +60,13 @@ struct PostcopyDiscardState {
>>>>>>>>   #include <sys/eventfd.h>
>>>>>>>>   #include <linux/userfaultfd.h>
>>>>>>>>   
>>>>>>>> -static bool ufd_version_check(int ufd)
>>>>>>>> +static bool ufd_version_check(int ufd, MigrationIncomingState *mis)
>>>>>>>>   {
>>>>>>>>       struct uffdio_api api_struct;
>>>>>>>>       uint64_t ioctl_mask;
>>>>>>>>   
>>>>>>>>       api_struct.api = UFFD_API;
>>>>>>>> -    api_struct.features = 0;
>>>>>>>> +    api_struct.features = UFFD_FEATURE_THREAD_ID;
>>>>>>>>       if (ioctl(ufd, UFFDIO_API, &api_struct)) {
>>>>>>>>           error_report("postcopy_ram_supported_by_host: UFFDIO_API failed: %s",
>>>>>>>>                        strerror(errno));
>>>>>>> You're not actually using the 'mis' here - what I'd expected was
>>>>>>> something that was going to check if the UFFDIO_API return said that it really
>>>>>>> had the feature, and if so store a flag in the MIS somewhere.
>>>>>>>
>>>>>>> Also, I'm not sure it's right to set 'api_struct.features' on the input - what
>>>>>>> happens if this is run on an old kernel - we don't want postcopy to fail on
>>>>>>> an old kernel without your feature.
>>>>>>> I'm not 100% sure of the interface, but I think the way it works is you set
>>>>>>> features = 0 before the call, and then check the api_struct.features in the
>>>>>>> return - in the same way that I check for UFFD_FEATURE_MISSING_HUGETLBFS.
>>>>>>>
>>>>>> Right, we need to ask the kernel about that feature.
>>>>>> The kernel returns the available features
>>>>>> uffdio_api.features = UFFD_API_FEATURES
>>>>>> but it also stores the requested features
>>>>> I feel like this does not go against Dave's comment; maybe we just need
>>>>> to send the UFFDIO_API twice? Like:
>>>> Yes, the UFFDIO_API ioctl will fail on an old kernel if we request
>>>> e.g. UFFD_FEATURE_THREAD_ID or another new feature.
>>>>
>>>> So in general we need a per-feature request, for better error handling.
>>> No, we don't need to - I think the way the kernel works is that you pass
>>> features = 0 in, and it sets api_struct.features on the way out;
>>> so if you always pass 0 in, you can then just check the features that
>>> it returns.
>>>
>> Without explicitly setting UFFD_FEATURE_THREAD_ID, ptid will not be sent back
>> to user space.
>>
>> Also, it's impossible to call the UFFDIO_API ioctl more than once, because the
>> internal state of userfaultfd_ctx inside the kernel changes
>> UFFD_STATE_WAIT_API -> UFFD_STATE_RUNNING,
>> but the UFFDIO_API ioctl expects UFFD_STATE_WAIT_API
>> ^^^
>>
>> So it looks like there is no way to provide backward compatibility for old
>> kernels. I don't even know what to do about new kernels, because the point of
>> extension should be for new kernels (e.g. I want to add a new feature in the
>> future, UFFD_FEATURE_ALLOW_PADDING, which would allow UFFDIO_COPY for a
>> smaller page size than was registered).
>> So what to do in this case: add a new UFFD feature, like
>> UFFD_FEATURE_ALLOW_CALL_API_AGAIN (allowing a non-persistent feature
>> like UFFD_FEATURE_THREAD_ID to be set),
>>
>> or just remove the condition in the kernel when sending ptid.
>>
>> Or maybe it's not even a problem: just close the ufd, reopen it, and resend
>> UFFD_FEATURE_THREAD_ID.
> Yes, I think you'll have to do that;  so I guess you need to:
>     a) Change ufd_version_check to open the ufd in the first place
>        and remove the other syscalls that open it
>     b) Make it pass features = 0 in to start with so that you can
>        see if the kernel supports it
>     c) Close the ufd and then do the API call again with the
>        features you need.
>
> Does that work?
Yes, I have already checked it: closing the ufd and reopening it before
requesting the features works.
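
The probe / close / reopen sequence from a) to c) above could look roughly
like this. It is only a sketch, not the code from the patch: uffd_open,
uffd_pick_features and uffd_open_with_features are hypothetical names, and
error reporting is left out:

```c
#define _GNU_SOURCE
#include <assert.h>
#include <fcntl.h>
#include <stdint.h>
#include <string.h>
#include <sys/ioctl.h>
#include <sys/syscall.h>
#include <unistd.h>
#include <linux/userfaultfd.h>

/* Hypothetical helper: open a fresh userfaultfd, returns -1 on failure. */
int uffd_open(void)
{
    return (int)syscall(__NR_userfaultfd, O_CLOEXEC | O_NONBLOCK);
}

/* Which of the optional features we want does the kernel actually offer? */
uint64_t uffd_pick_features(uint64_t supported, uint64_t wanted)
{
    return supported & wanted;
}

/*
 * Hypothetical helper: returns a userfaultfd that has completed the
 * UFFDIO_API handshake with as many of the 'wanted' features as the kernel
 * supports, or -1 on failure. *enabled receives the bits that were enabled.
 */
int uffd_open_with_features(uint64_t wanted, uint64_t *enabled)
{
    struct uffdio_api api;
    int ufd;

    assert(enabled != NULL);
    *enabled = 0;

    /* Probe: pass features = 0 so that old kernels do not reject the call. */
    ufd = uffd_open();
    if (ufd < 0) {
        return -1;
    }
    memset(&api, 0, sizeof(api));
    api.api = UFFD_API;
    api.features = 0;
    if (ioctl(ufd, UFFDIO_API, &api)) {
        close(ufd);
        return -1;
    }

    *enabled = uffd_pick_features(api.features, wanted);
    if (*enabled == 0) {
        /* Nothing extra to request; the probed fd is already usable. */
        return ufd;
    }

    /* The handshake cannot be repeated on the same fd, so reopen it. */
    close(ufd);
    ufd = uffd_open();
    if (ufd < 0) {
        *enabled = 0;
        return -1;
    }
    memset(&api, 0, sizeof(api));
    api.api = UFFD_API;
    api.features = *enabled;
    if (ioctl(ufd, UFFDIO_API, &api)) {
        close(ufd);
        *enabled = 0;
        return -1;
    }
    return ufd;
}
```

A caller would pass UFFD_FEATURE_THREAD_ID as 'wanted' (where the headers
define it) and check the bit in *enabled before relying on ptid. On an old
kernel the bit simply stays clear and postcopy keeps working without it.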

>
> Note we must never break QEMU working on old kernels.
>
> Dave
>
>>> Dave
>>>
>>>>> diff --git a/migration/postcopy-ram.c b/migration/postcopy-ram.c
>>>>> index 85fd8d7..fd0905f 100644
>>>>> --- a/migration/postcopy-ram.c
>>>>> +++ b/migration/postcopy-ram.c
>>>>> @@ -64,6 +64,7 @@ static bool ufd_version_check(int ufd)
>>>>>   {
>>>>>       struct uffdio_api api_struct;
>>>>>       uint64_t ioctl_mask;
>>>>> +    uint64_t features = 0;
>>>>>
>>>>>       api_struct.api = UFFD_API;
>>>>>       api_struct.features = 0;
>>>>> @@ -92,6 +93,27 @@ static bool ufd_version_check(int ufd)
>>>>>               return false;
>>>>>           }
>>>>>       }
>>>>> +
>>>>> +#ifdef UFFD_FEATURE_THREAD_ID
>>>>> +    if (api_struct.features & UFFD_FEATURE_THREAD_ID) {
>>>>> +        features |= UFFD_FEATURE_THREAD_ID;
>>>>> +    }
>>>>> +#endif
>>>>> +
>>>>> +    if (features) {
>>>>> +        /*
>>>>> +         * If there are new features to be enabled from userspace,
>>>>> +         * trigger another UFFDIO_API ioctl.
>>>>> +         */
>>>>> +        api_struct.api = UFFD_API;
>>>>> +        api_struct.features = features;
>>>>> +        if (ioctl(ufd, UFFDIO_API, &api_struct)) {
>>>>> +            error_report("UFFDIO_API failed to setup features: 0x%"PRIx64,
>>>>> +                         features);
>>>>> +            return false;
>>>>> +        }
>>>>> +    }
>>>>> +
>>>>>       return true;
>>>>>   }
>>>>>
>>>>>> /* only enable the requested features for this uffd context */
>>>>>>   ctx->features = uffd_ctx_features(features);
>>>>>>
>>>>>> so, at the time when process thread id is going to be sent
>>>>>> kernel checks if it was requested
>>>>>> +       if (features & UFFD_FEATURE_THREAD_ID)
>>>>>> +               msg.arg.pagefault.ptid = task_pid_vnr(current);
>>>>> (I am slightly curious about why we need this if block, after all
>>>>>   userspace should know whether the ptid field would be valid from the
>>>>>   first UFFDIO_API ioctl, right?)
>>>> If I understand your question correctly, that condition was suggested
>>>> because page faulting is a performance-critical path (in general, not only
>>>> in the postcopy case), so the feature should be enabled from userspace
>>>> only for statistics/debug purposes.
>>>> It also looks like David wants this to be an optional, not always-on,
>>>> feature in QEMU too.
>>>>
>>>>> Thanks,
>>>>>
>>>>>> from patch message:
>>>>>>
>>>>>>   The process's thread id is provided when the user requests it
>>>>>> by setting the UFFD_FEATURE_THREAD_ID bit in uffdio_api.features.
>>>>>>
>>>>>> UFFD_FEATURE_MISSING_HUGETLBFS looks like default, unconditional
>>>>>> behavior (I didn't find any usage of that define in the kernel).
>>>>> -- 
>>>>> Peter Xu
>>>>>
>>>> -- 
>>>>
>>>> BR
>>>> Alexey
>>> --
>>> Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK
>>>
>> -- 
>>
>> BR
>> Alexey
> --
> Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK
>
>
>


-- 
Best regards,
Alexey Perevalov


Thread overview: 38+ messages
     [not found] <CGME20170414131735eucas1p21f1fcadf426789276f567191372f7794@eucas1p2.samsung.com>
2017-04-14 13:17 ` [Qemu-devel] [PATCH 0/6] calculate downtime for postcopy live migration Alexey Perevalov
     [not found]   ` <CGME20170414131738eucas1p28fe4896d7f42d8c5b23cb95312c41eca@eucas1p2.samsung.com>
2017-04-14 13:17     ` [Qemu-devel] [PATCH 1/6] userfault: add pid into uffd_msg & update UFFD_FEATURE_* Alexey Perevalov
     [not found]   ` <CGME20170414131739eucas1p1ea9a6adcdbe8cfe45ac1ff582d28d873@eucas1p1.samsung.com>
2017-04-14 13:17     ` [Qemu-devel] [PATCH 2/6] util: introduce glib-helper.c Alexey Perevalov
2017-04-14 16:05       ` Philippe Mathieu-Daudé
2017-04-17  7:07         ` Alexey
2017-04-21 10:01         ` Dr. David Alan Gilbert
2017-04-21 10:27       ` Peter Maydell
2017-04-21 15:10         ` Alexey
2017-04-21 15:49           ` Peter Maydell
2017-04-25 11:23             ` Dr. David Alan Gilbert
     [not found]   ` <CGME20170414131739eucas1p27a3eed795ae545efff380d7c5f8358c3@eucas1p2.samsung.com>
2017-04-14 13:17     ` [Qemu-devel] [PATCH 3/6] migration: add UFFD_FEATURE_THREAD_ID feature support Alexey Perevalov
2017-04-21 10:24       ` Dr. David Alan Gilbert
2017-04-21 15:22         ` Alexey
2017-04-24  8:03           ` Peter Xu
2017-04-24  8:12           ` Peter Xu
2017-04-24  8:38             ` Alexey
2017-04-24 17:10               ` Dr. David Alan Gilbert
2017-04-25  7:55                 ` Alexey
2017-04-25 11:14                   ` Dr. David Alan Gilbert
2017-04-25 11:51                     ` Alexey Perevalov
     [not found]   ` <CGME20170414131740eucas1p27eba648b990a93a627265c740e7ff118@eucas1p2.samsung.com>
2017-04-14 13:17     ` [Qemu-devel] [PATCH 4/6] migration: calculate downtime on dst side Alexey Perevalov
2017-04-21 12:00       ` Dr. David Alan Gilbert
2017-04-21 18:47         ` Alexey
2017-04-24 17:11           ` Dr. David Alan Gilbert
2017-04-22  9:49         ` [Qemu-devel] [PATCH 4/6] migration: calculate downtime on dst side (CPUMASK) Alexey
2017-04-24 17:13           ` Dr. David Alan Gilbert
2017-04-25  8:24       ` [Qemu-devel] [PATCH 4/6] migration: calculate downtime on dst side Peter Xu
2017-04-25 10:10         ` Alexey Perevalov
2017-04-25 10:25           ` Peter Xu
2017-04-25 10:47             ` Alexey Perevalov
     [not found]   ` <CGME20170414131740eucas1p28f240a4e6c78fb56be52f2641c3e5af6@eucas1p2.samsung.com>
2017-04-14 13:17     ` [Qemu-devel] [PATCH 5/6] migration: send postcopy downtime back to source Alexey Perevalov
2017-04-24 17:26       ` Dr. David Alan Gilbert
2017-04-25  5:51         ` Alexey
     [not found]   ` <CGME20170414131741eucas1p2f34e11e4292fef1c50ef63bd3522ad04@eucas1p2.samsung.com>
2017-04-14 13:17     ` [Qemu-devel] [PATCH 6/6] migration: detailed traces for postcopy Alexey Perevalov
2017-04-17 13:32       ` Philippe Mathieu-Daudé
2017-04-24 18:03       ` Dr. David Alan Gilbert
2017-04-17  2:32   ` [Qemu-devel] [PATCH 0/6] calculate downtime for postcopy live migration no-reply
2017-04-17  2:36   ` no-reply
