All of lore.kernel.org
 help / color / mirror / Atom feed
* [Qemu-devel] [RFC 0/8] arm AioContext with its own timer stuff
@ 2013-07-21  8:42 Liu Ping Fan
  2013-07-21  8:42 ` [Qemu-devel] [RFC 1/8] timer: associate alarm_timer with AioContext Liu Ping Fan
                   ` (9 more replies)
  0 siblings, 10 replies; 63+ messages in thread
From: Liu Ping Fan @ 2013-07-21  8:42 UTC (permalink / raw)
  To: qemu-devel
  Cc: Kevin Wolf, Stefan Hajnoczi, Jan Kiszka, Alex Bligh,
	Anthony Liguori, Paolo Bonzini

Currently, the timers run on iothread within BQL, so virtio-block dataplane can not use throttle,
as Stefan Hajnoczi pointed out in his patches to port dataplane onto block layer.(Thanks, Stefan)
To enable this feature, I plan to enable timers to run on AioContext's thread.
And maybe in future, hpet can run with its dedicated thread too.

Also, I see Alex Bligh is on the same effort by another method,(it is a good idea) 
  "[RFC] aio/async: Add timed bottom-halves".
So I think it is better to post my series for discussion, although I have not thought
very clearly point, e.g. sigaction 

This series ports two parts of timer stuff onto AioContext: alarm timer and timer list.
Currently I worked based on Stefanha's git tree
	 https://github.com/stefanha/qemu.git dataplane-block-layer.

---
Open issue:
  The thread safe problem on timer list. To resolve that, I plan to adopt the bh model.
(Not sure about this, maybe Stefan's solution in another thread is better)  
Although leave most of the race issue unresolved, patch 4 has tried to fix one of
them as Jan Kiszka points out that vm_clock can be read outside BQL, thanks Jan :)



Liu Ping Fan (8):
  timer: associate alarm_timer with AioContext
  timer: pick out timer list info from QemuClock
  timer: make timers_state static
  timer: protect timers_state with lock
  timer: associate timer with AioContext
  timer: run timers on aio_poll
  block: associate BlockDriverState with AioContext
  block: enable throttle with aiocontext

 aio-posix.c                     |   2 +
 async.c                         |  12 ++
 block.c                         |  13 ++-
 cpus.c                          |  29 ++++-
 hw/block/dataplane/virtio-blk.c |   1 +
 include/block/aio.h             |  14 +++
 include/block/block.h           |   1 +
 include/block/block_int.h       |   1 +
 include/qemu/timer.h            |  24 +++-
 main-loop.c                     |   6 -
 qemu-timer.c                    | 237 +++++++++++++++++++++++++++-------------
 11 files changed, 253 insertions(+), 87 deletions(-)

-- 
1.8.1.4

^ permalink raw reply	[flat|nested] 63+ messages in thread

* [Qemu-devel] [RFC 1/8] timer: associate alarm_timer with AioContext
  2013-07-21  8:42 [Qemu-devel] [RFC 0/8] arm AioContext with its own timer stuff Liu Ping Fan
@ 2013-07-21  8:42 ` Liu Ping Fan
  2013-07-22  6:55   ` Jan Kiszka
  2013-07-21  8:42 ` [Qemu-devel] [RFC 2/8] timer: pick out timer list info from QemuClock Liu Ping Fan
                   ` (8 subsequent siblings)
  9 siblings, 1 reply; 63+ messages in thread
From: Liu Ping Fan @ 2013-07-21  8:42 UTC (permalink / raw)
  To: qemu-devel
  Cc: Kevin Wolf, Stefan Hajnoczi, Jan Kiszka, Alex Bligh,
	Anthony Liguori, Paolo Bonzini

We will arm each AioContext with its own timer stuff. As the first
step, we should make each AioContext with its own alarm_timer,
so they can raise the deadline independent.
Each thread with AioContext will have dedicated signal handler
to trigger it own alarm_timer. And all of the alarm timers are
linked on a QSLIST for clock reset.

Signed-off-by: Liu Ping Fan <pingfank@linux.vnet.ibm.com>
----- To fix ------------
sigaction(SIGALRM, &act, NULL) is process-wide, not thread?
I think what I need is an timerfd for each thread, right?
Will fix in next version.

---
 async.c              |   3 ++
 include/block/aio.h  |   1 +
 include/qemu/timer.h |   4 +-
 main-loop.c          |   4 --
 qemu-timer.c         | 106 +++++++++++++++++++++++++++++++++++++--------------
 5 files changed, 85 insertions(+), 33 deletions(-)

diff --git a/async.c b/async.c
index ba4072c..8209cea 100644
--- a/async.c
+++ b/async.c
@@ -26,6 +26,7 @@
 #include "block/aio.h"
 #include "block/thread-pool.h"
 #include "qemu/main-loop.h"
+#include "qemu/timer.h"
 
 DEFINE_TLS(AioContext*, thread_aio_context);
 
@@ -206,6 +207,7 @@ aio_ctx_finalize(GSource     *source)
     aio_set_event_notifier(ctx, &ctx->notifier, NULL, NULL);
     event_notifier_cleanup(&ctx->notifier);
     g_array_free(ctx->pollfds, TRUE);
+    alarm_timer_destroy(ctx->alarm_timer);
 }
 
 static GSourceFuncs aio_source_funcs = {
@@ -245,6 +247,7 @@ AioContext *aio_context_new(void)
     aio_set_event_notifier(ctx, &ctx->notifier, 
                            (EventNotifierHandler *)
                            event_notifier_test_and_clear, NULL);
+    ctx->alarm_timer = alarm_timer_create(ctx);
 
     return ctx;
 }
diff --git a/include/block/aio.h b/include/block/aio.h
index 04598b2..84537a2 100644
--- a/include/block/aio.h
+++ b/include/block/aio.h
@@ -73,6 +73,7 @@ typedef struct AioContext {
 
     /* Thread pool for performing work and receiving completion callbacks */
     struct ThreadPool *thread_pool;
+    struct qemu_alarm_timer *alarm_timer;
 } AioContext;
 
 /* Returns 1 if there are still outstanding AIO requests; 0 otherwise */
diff --git a/include/qemu/timer.h b/include/qemu/timer.h
index 9dd206c..4a72c99 100644
--- a/include/qemu/timer.h
+++ b/include/qemu/timer.h
@@ -57,7 +57,9 @@ void qemu_run_timers(QEMUClock *clock);
 void qemu_run_all_timers(void);
 void configure_alarms(char const *opt);
 void init_clocks(void);
-int init_timer_alarm(void);
+int init_timer_alarm(struct qemu_alarm_timer *t);
+struct qemu_alarm_timer *alarm_timer_create(AioContext *ctx);
+void alarm_timer_destroy(struct qemu_alarm_timer *alarm);
 
 int64_t cpu_get_ticks(void);
 void cpu_enable_ticks(void);
diff --git a/main-loop.c b/main-loop.c
index 5fbdd4a..4a94a52 100644
--- a/main-loop.c
+++ b/main-loop.c
@@ -131,10 +131,6 @@ int qemu_init_main_loop(void)
     GSource *src;
 
     init_clocks();
-    if (init_timer_alarm() < 0) {
-        fprintf(stderr, "could not initialize alarm timer\n");
-        exit(1);
-    }
 
     ret = qemu_signal_init();
     if (ret) {
diff --git a/qemu-timer.c b/qemu-timer.c
index 9500d12..32c70ed 100644
--- a/qemu-timer.c
+++ b/qemu-timer.c
@@ -65,6 +65,8 @@ struct QEMUTimer {
     int scale;
 };
 
+typedef struct qemu_alarm_timer qemu_alarm_timer;
+
 struct qemu_alarm_timer {
     char const *name;
     int (*start)(struct qemu_alarm_timer *t);
@@ -82,9 +84,43 @@ struct qemu_alarm_timer {
     /* Was the nearest deadline timer modified (possibly by another thread)? */
     QemuMutex timer_modified_lock;
     bool timer_modified;
+    AioContext *ctx;
+    /* protected by alarm_timer_list_lock */
+    QSLIST_ENTRY(qemu_alarm_timer) next_alarm_timer;
 };
 
-static struct qemu_alarm_timer *alarm_timer;
+static QSLIST_HEAD(, qemu_alarm_timer) \
+    alarm_timer_list = QSLIST_HEAD_INITIALIZER(alarm_timer_list);
+/* innermost lock */
+static QemuMutex alarm_timer_list_lock;
+
+struct qemu_alarm_timer *alarm_timer_create(AioContext *ctx)
+{
+    struct qemu_alarm_timer *t;
+
+    t = g_malloc0(sizeof(qemu_alarm_timer));
+    init_timer_alarm(t);
+    t->ctx = ctx;
+    qemu_mutex_lock(&alarm_timer_list_lock);
+    QSLIST_INSERT_HEAD(&alarm_timer_list, t, next_alarm_timer);
+    qemu_mutex_unlock(&alarm_timer_list_lock);
+    return t;
+}
+
+void alarm_timer_destroy(struct qemu_alarm_timer *t)
+{
+    struct qemu_alarm_timer *var, *tvar;
+
+    t->stop(t);
+    qemu_mutex_lock(&alarm_timer_list_lock);
+    QSLIST_FOREACH_SAFE(var, &alarm_timer_list, next_alarm_timer, tvar) {
+        if (tvar == t) {
+            QSLIST_REMOVE_AFTER(var, next_alarm_timer);
+        }
+    }
+    qemu_mutex_unlock(&alarm_timer_list_lock);
+    g_free(t);
+}
 
 static bool qemu_timer_expired_ns(QEMUTimer *timer_head, int64_t current_time)
 {
@@ -114,7 +150,10 @@ static int64_t qemu_next_clock_deadline(QEMUClock *clock, int64_t delta)
     return MIN(next, delta);
 }
 
-static int64_t qemu_next_alarm_deadline(void)
+/* Soon this will be fixed: till now, timer list is not associated with
+ * AioContext, so @ctx has no effect on deadline currently.
+ */
+static int64_t qemu_next_alarm_deadline(AioContext *ctx)
 {
     int64_t delta = INT64_MAX;
 
@@ -127,12 +166,23 @@ static int64_t qemu_next_alarm_deadline(void)
 
 static void qemu_rearm_alarm_timer(struct qemu_alarm_timer *t)
 {
-    int64_t nearest_delta_ns = qemu_next_alarm_deadline();
+    int64_t nearest_delta_ns = qemu_next_alarm_deadline(t->ctx);
     if (nearest_delta_ns < INT64_MAX) {
         t->rearm(t, nearest_delta_ns);
     }
 }
 
+static void qemu_rearm_alarm_timers(void)
+{
+    struct qemu_alarm_timer *t;
+
+    qemu_mutex_lock(&alarm_timer_list_lock);
+    QSLIST_FOREACH(t, &alarm_timer_list, next_alarm_timer) {
+        qemu_rearm_alarm_timer(t);
+    }
+    qemu_mutex_unlock(&alarm_timer_list_lock);
+}
+
 /* TODO: MIN_TIMER_REARM_NS should be optimized */
 #define MIN_TIMER_REARM_NS 250000
 
@@ -262,7 +312,7 @@ void qemu_clock_enable(QEMUClock *clock, bool enabled)
     bool old = clock->enabled;
     clock->enabled = enabled;
     if (enabled && !old) {
-        qemu_rearm_alarm_timer(alarm_timer);
+        qemu_rearm_alarm_timers();
     }
 }
 
@@ -355,6 +405,8 @@ void qemu_mod_timer_ns(QEMUTimer *ts, int64_t expire_time)
 {
     QEMUClock *clock = ts->clock;
     QEMUTimer **pt, *t;
+    AioContext *ctx = *tls_get_thread_aio_context();
+    struct qemu_alarm_timer *alarm_timer = ctx->alarm_timer;
 
     qemu_del_timer(ts);
 
@@ -485,6 +537,8 @@ uint64_t qemu_timer_expire_time_ns(QEMUTimer *ts)
 void qemu_run_all_timers(void)
 {
     bool timer_modified;
+    AioContext *ctx = *tls_get_thread_aio_context();
+    struct qemu_alarm_timer *alarm_timer = ctx->alarm_timer;
 
     alarm_timer->pending = false;
 
@@ -515,13 +569,15 @@ static void CALLBACK host_alarm_handler(PVOID lpParam, BOOLEAN unused)
 static void host_alarm_handler(int host_signum)
 #endif
 {
-    struct qemu_alarm_timer *t = alarm_timer;
+    AioContext *ctx = *tls_get_thread_aio_context();
+    struct qemu_alarm_timer *t = ctx->alarm_timer;
+
     if (!t)
 	return;
 
     t->expired = true;
     t->pending = true;
-    qemu_notify_event();
+    aio_notify(ctx);
 }
 
 #if defined(__linux__)
@@ -774,37 +830,30 @@ static void win32_rearm_timer(struct qemu_alarm_timer *t,
 
 #endif /* _WIN32 */
 
-static void quit_timers(void)
-{
-    struct qemu_alarm_timer *t = alarm_timer;
-    alarm_timer = NULL;
-    t->stop(t);
-}
-
 #ifdef CONFIG_POSIX
 static void reinit_timers(void)
 {
-    struct qemu_alarm_timer *t = alarm_timer;
-    t->stop(t);
-    if (t->start(t)) {
-        fprintf(stderr, "Internal timer error: aborting\n");
-        exit(1);
+    struct qemu_alarm_timer *t;
+
+    qemu_mutex_lock(&alarm_timer_list_lock);
+    QSLIST_FOREACH(t, &alarm_timer_list, next_alarm_timer) {
+        t->stop(t);
+        if (t->start(t)) {
+            fprintf(stderr, "Internal timer error: aborting\n");
+            exit(1);
+        }
+        qemu_rearm_alarm_timer(t);
     }
-    qemu_rearm_alarm_timer(t);
+    qemu_mutex_unlock(&alarm_timer_list_lock);
 }
 #endif /* CONFIG_POSIX */
 
-int init_timer_alarm(void)
+int init_timer_alarm(struct qemu_alarm_timer *t)
 {
-    struct qemu_alarm_timer *t = NULL;
     int i, err = -1;
 
-    if (alarm_timer) {
-        return 0;
-    }
-
     for (i = 0; alarm_timers[i].name; i++) {
-        t = &alarm_timers[i];
+        *t = alarm_timers[i];
 
         err = t->start(t);
         if (!err)
@@ -818,14 +867,15 @@ int init_timer_alarm(void)
 
     qemu_mutex_init(&t->timer_modified_lock);
 
-    atexit(quit_timers);
 #ifdef CONFIG_POSIX
     pthread_atfork(NULL, NULL, reinit_timers);
 #endif
-    alarm_timer = t;
     return 0;
 
 fail:
+    fprintf(stderr, "could not initialize alarm timer\n");
+    exit(1);
+
     return err;
 }
 
-- 
1.8.1.4

^ permalink raw reply related	[flat|nested] 63+ messages in thread

* [Qemu-devel] [RFC 2/8] timer: pick out timer list info from QemuClock
  2013-07-21  8:42 [Qemu-devel] [RFC 0/8] arm AioContext with its own timer stuff Liu Ping Fan
  2013-07-21  8:42 ` [Qemu-devel] [RFC 1/8] timer: associate alarm_timer with AioContext Liu Ping Fan
@ 2013-07-21  8:42 ` Liu Ping Fan
  2013-07-21  8:43 ` [Qemu-devel] [RFC 3/8] timer: make timers_state static Liu Ping Fan
                   ` (7 subsequent siblings)
  9 siblings, 0 replies; 63+ messages in thread
From: Liu Ping Fan @ 2013-07-21  8:42 UTC (permalink / raw)
  To: qemu-devel
  Cc: Kevin Wolf, Stefan Hajnoczi, Jan Kiszka, Alex Bligh,
	Anthony Liguori, Paolo Bonzini

To allow timer to run on the different backend threads,
we will arm the thread's AioContext with its own timer list.
So separate these kind of info from QemuClock.

Signed-off-by: Liu Ping Fan <pingfank@linux.vnet.ibm.com>
---
 qemu-timer.c | 106 ++++++++++++++++++++++++++++++++++++++---------------------
 1 file changed, 68 insertions(+), 38 deletions(-)

diff --git a/qemu-timer.c b/qemu-timer.c
index 32c70ed..0ee68dc 100644
--- a/qemu-timer.c
+++ b/qemu-timer.c
@@ -30,6 +30,7 @@
 
 #include "qemu/timer.h"
 #include "qemu/thread.h"
+#include "qemu/tls.h"
 #ifdef CONFIG_POSIX
 #include <pthread.h>
 #endif
@@ -44,11 +45,16 @@
 #define QEMU_CLOCK_REALTIME 0
 #define QEMU_CLOCK_VIRTUAL  1
 #define QEMU_CLOCK_HOST     2
+#define QEMU_CLOCK_MAXCNT 3
 
-struct QEMUClock {
+typedef struct TimerList {
     QEMUTimer *active_timers;
     QemuMutex active_timers_lock;
+} TimerList;
+
+static TimerList timer_list[QEMU_CLOCK_MAXCNT];
 
+struct QEMUClock {
     NotifierList reset_notifiers;
     int64_t last;
 
@@ -58,7 +64,7 @@ struct QEMUClock {
 
 struct QEMUTimer {
     int64_t expire_time;	/* in nanoseconds */
-    QEMUClock *clock;
+    TimerList *list;
     QEMUTimerCB *cb;
     void *opaque;
     QEMUTimer *next;
@@ -122,6 +128,13 @@ void alarm_timer_destroy(struct qemu_alarm_timer *t)
     g_free(t);
 }
 
+static TimerList *clock_to_timerlist(QEMUClock *clock)
+{
+    int type = clock->type;
+
+    return  &timer_list[type];
+}
+
 static bool qemu_timer_expired_ns(QEMUTimer *timer_head, int64_t current_time)
 {
     return timer_head && (timer_head->expire_time <= current_time);
@@ -131,17 +144,19 @@ static int64_t qemu_next_clock_deadline(QEMUClock *clock, int64_t delta)
 {
     int64_t expire_time, next;
     bool has_timer = false;
+    TimerList *tlist;
 
     if (!clock->enabled) {
         return delta;
     }
 
-    qemu_mutex_lock(&clock->active_timers_lock);
-    if (clock->active_timers) {
+    tlist = clock_to_timerlist(clock);
+    qemu_mutex_lock(&tlist->active_timers_lock);
+    if (tlist->active_timers) {
         has_timer = true;
-        expire_time = clock->active_timers->expire_time;
+        expire_time = tlist->active_timers->expire_time;
     }
-    qemu_mutex_unlock(&clock->active_timers_lock);
+    qemu_mutex_unlock(&tlist->active_timers_lock);
     if (!has_timer) {
         return delta;
     }
@@ -294,16 +309,25 @@ QEMUClock *rt_clock;
 QEMUClock *vm_clock;
 QEMUClock *host_clock;
 
+static void timer_list_init(TimerList *tlist)
+{
+    qemu_mutex_init(&tlist->active_timers_lock);
+    tlist->active_timers = NULL;
+}
+
 static QEMUClock *qemu_new_clock(int type)
 {
     QEMUClock *clock;
+    TimerList *tlist;
 
     clock = g_malloc0(sizeof(QEMUClock));
     clock->type = type;
     clock->enabled = true;
     clock->last = INT64_MIN;
     notifier_list_init(&clock->reset_notifiers);
-    qemu_mutex_init(&clock->active_timers_lock);
+    tlist = clock_to_timerlist(clock);
+    timer_list_init(tlist);
+
     return clock;
 }
 
@@ -319,10 +343,11 @@ void qemu_clock_enable(QEMUClock *clock, bool enabled)
 int64_t qemu_clock_has_timers(QEMUClock *clock)
 {
     bool has_timers;
+    TimerList *tlist = clock_to_timerlist(clock);
 
-    qemu_mutex_lock(&clock->active_timers_lock);
-    has_timers = !!clock->active_timers;
-    qemu_mutex_unlock(&clock->active_timers_lock);
+    qemu_mutex_lock(&tlist->active_timers_lock);
+    has_timers = !!tlist->active_timers;
+    qemu_mutex_unlock(&tlist->active_timers_lock);
     return has_timers;
 }
 
@@ -330,11 +355,12 @@ int64_t qemu_clock_expired(QEMUClock *clock)
 {
     bool has_timers;
     int64_t expire_time;
+    TimerList *tlist = clock_to_timerlist(clock);
 
-    qemu_mutex_lock(&clock->active_timers_lock);
-    has_timers = clock->active_timers;
-    expire_time = clock->active_timers->expire_time;
-    qemu_mutex_unlock(&clock->active_timers_lock);
+    qemu_mutex_lock(&tlist->active_timers_lock);
+    has_timers = tlist->active_timers;
+    expire_time = tlist->active_timers->expire_time;
+    qemu_mutex_unlock(&tlist->active_timers_lock);
 
     return has_timers && expire_time < qemu_get_clock_ns(clock);
 }
@@ -345,11 +371,12 @@ int64_t qemu_clock_deadline(QEMUClock *clock)
     int64_t delta = INT32_MAX;
     bool has_timers;
     int64_t expire_time;
+    TimerList *tlist = clock_to_timerlist(clock);
 
-    qemu_mutex_lock(&clock->active_timers_lock);
-    has_timers = clock->active_timers;
-    expire_time = clock->active_timers->expire_time;
-    qemu_mutex_unlock(&clock->active_timers_lock);
+    qemu_mutex_lock(&tlist->active_timers_lock);
+    has_timers = tlist->active_timers;
+    expire_time = tlist->active_timers->expire_time;
+    qemu_mutex_unlock(&tlist->active_timers_lock);
 
     if (has_timers) {
         delta = expire_time - qemu_get_clock_ns(clock);
@@ -366,7 +393,7 @@ QEMUTimer *qemu_new_timer(QEMUClock *clock, int scale,
     QEMUTimer *ts;
 
     ts = g_malloc0(sizeof(QEMUTimer));
-    ts->clock = clock;
+    ts->list = clock_to_timerlist(clock);
     ts->cb = cb;
     ts->opaque = opaque;
     ts->scale = scale;
@@ -381,11 +408,11 @@ void qemu_free_timer(QEMUTimer *ts)
 /* stop a timer, but do not dealloc it */
 void qemu_del_timer(QEMUTimer *ts)
 {
-    QEMUClock *clock = ts->clock;
+    TimerList *tlist = ts->list;
     QEMUTimer **pt, *t;
 
-    qemu_mutex_lock(&clock->active_timers_lock);
-    pt = &ts->clock->active_timers;
+    qemu_mutex_lock(&tlist->active_timers_lock);
+    pt = &tlist->active_timers;
     for(;;) {
         t = *pt;
         if (!t)
@@ -396,14 +423,14 @@ void qemu_del_timer(QEMUTimer *ts)
         }
         pt = &t->next;
     }
-    qemu_mutex_unlock(&clock->active_timers_lock);
+    qemu_mutex_unlock(&tlist->active_timers_lock);
 }
 
 /* modify the current timer so that it will be fired when current_time
    >= expire_time. The corresponding callback will be called. */
 void qemu_mod_timer_ns(QEMUTimer *ts, int64_t expire_time)
 {
-    QEMUClock *clock = ts->clock;
+    TimerList *tlist = ts->list;
     QEMUTimer **pt, *t;
     AioContext *ctx = *tls_get_thread_aio_context();
     struct qemu_alarm_timer *alarm_timer = ctx->alarm_timer;
@@ -411,8 +438,8 @@ void qemu_mod_timer_ns(QEMUTimer *ts, int64_t expire_time)
     qemu_del_timer(ts);
 
     /* add the timer in the sorted list */
-    qemu_mutex_lock(&clock->active_timers_lock);
-    pt = &clock->active_timers;
+    qemu_mutex_lock(&tlist->active_timers_lock);
+    pt = &tlist->active_timers;
     for(;;) {
         t = *pt;
         if (!qemu_timer_expired_ns(t, expire_time)) {
@@ -423,10 +450,10 @@ void qemu_mod_timer_ns(QEMUTimer *ts, int64_t expire_time)
     ts->expire_time = expire_time;
     ts->next = *pt;
     *pt = ts;
-    qemu_mutex_unlock(&clock->active_timers_lock);
+    qemu_mutex_unlock(&tlist->active_timers_lock);
 
     /* Rearm if necessary  */
-    if (pt == &ts->clock->active_timers) {
+    if (pt == &tlist->active_timers) {
         qemu_mutex_lock(&alarm_timer->timer_modified_lock);
         alarm_timer->timer_modified = true;
         qemu_mutex_unlock(&alarm_timer->timer_modified_lock);
@@ -442,15 +469,15 @@ void qemu_mod_timer(QEMUTimer *ts, int64_t expire_time)
 bool qemu_timer_pending(QEMUTimer *ts)
 {
     QEMUTimer *t;
-    QEMUClock *clock = ts->clock;
+    TimerList *tlist = ts->list;
 
-    qemu_mutex_lock(&clock->active_timers_lock);
-    for (t = ts->clock->active_timers; t != NULL; t = t->next) {
+    qemu_mutex_lock(&tlist->active_timers_lock);
+    for (t = tlist->active_timers; t != NULL; t = t->next) {
         if (t == ts) {
             break;
         }
     }
-    qemu_mutex_unlock(&clock->active_timers_lock);
+    qemu_mutex_unlock(&tlist->active_timers_lock);
     return t;
 }
 
@@ -463,22 +490,25 @@ void qemu_run_timers(QEMUClock *clock)
 {
     QEMUTimer *ts;
     int64_t current_time;
-   
+    TimerList *tlist;
+
     if (!clock->enabled)
         return;
 
     current_time = qemu_get_clock_ns(clock);
+    tlist = clock_to_timerlist(clock);
+
     for(;;) {
-        qemu_mutex_lock(&clock->active_timers_lock);
-        ts = clock->active_timers;
+        qemu_mutex_lock(&tlist->active_timers_lock);
+        ts = tlist->active_timers;
         if (!qemu_timer_expired_ns(ts, current_time)) {
-            qemu_mutex_unlock(&clock->active_timers_lock);
+            qemu_mutex_unlock(&tlist->active_timers_lock);
             break;
         }
         /* remove timer from the list before calling the callback */
-        clock->active_timers = ts->next;
+        tlist->active_timers = ts->next;
         ts->next = NULL;
-        qemu_mutex_unlock(&clock->active_timers_lock);
+        qemu_mutex_unlock(&tlist->active_timers_lock);
 
         /* run the callback (the timer list can be modified) */
         ts->cb(ts->opaque);
-- 
1.8.1.4

^ permalink raw reply related	[flat|nested] 63+ messages in thread

* [Qemu-devel] [RFC 3/8] timer: make timers_state static
  2013-07-21  8:42 [Qemu-devel] [RFC 0/8] arm AioContext with its own timer stuff Liu Ping Fan
  2013-07-21  8:42 ` [Qemu-devel] [RFC 1/8] timer: associate alarm_timer with AioContext Liu Ping Fan
  2013-07-21  8:42 ` [Qemu-devel] [RFC 2/8] timer: pick out timer list info from QemuClock Liu Ping Fan
@ 2013-07-21  8:43 ` Liu Ping Fan
  2013-07-22  6:36   ` Jan Kiszka
  2013-07-21  8:43 ` [Qemu-devel] [RFC 4/8] timer: protect timers_state with lock Liu Ping Fan
                   ` (6 subsequent siblings)
  9 siblings, 1 reply; 63+ messages in thread
From: Liu Ping Fan @ 2013-07-21  8:43 UTC (permalink / raw)
  To: qemu-devel
  Cc: Kevin Wolf, Stefan Hajnoczi, Jan Kiszka, Alex Bligh,
	Anthony Liguori, Paolo Bonzini

Signed-off-by: Liu Ping Fan <pingfank@linux.vnet.ibm.com>
---
 cpus.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/cpus.c b/cpus.c
index 61e86a8..4254ca9 100644
--- a/cpus.c
+++ b/cpus.c
@@ -112,7 +112,7 @@ typedef struct TimersState {
     int64_t dummy;
 } TimersState;
 
-TimersState timers_state;
+static TimersState timers_state;
 
 /* Return the virtual CPU time, based on the instruction counter.  */
 int64_t cpu_get_icount(void)
-- 
1.8.1.4

^ permalink raw reply related	[flat|nested] 63+ messages in thread

* [Qemu-devel] [RFC 4/8] timer: protect timers_state with lock
  2013-07-21  8:42 [Qemu-devel] [RFC 0/8] arm AioContext with its own timer stuff Liu Ping Fan
                   ` (2 preceding siblings ...)
  2013-07-21  8:43 ` [Qemu-devel] [RFC 3/8] timer: make timers_state static Liu Ping Fan
@ 2013-07-21  8:43 ` Liu Ping Fan
  2013-07-22  6:40   ` Jan Kiszka
  2013-07-21  8:43 ` [Qemu-devel] [RFC 5/8] timer: associate timer with AioContext Liu Ping Fan
                   ` (5 subsequent siblings)
  9 siblings, 1 reply; 63+ messages in thread
From: Liu Ping Fan @ 2013-07-21  8:43 UTC (permalink / raw)
  To: qemu-devel
  Cc: Kevin Wolf, Stefan Hajnoczi, Jan Kiszka, Alex Bligh,
	Anthony Liguori, Paolo Bonzini

In kvm mode, vm_clock may be read on AioContexts outside BQL(next
patch). This will make timers_state --the foundation of vm_clock
exposed to race condition. Using private lock to protect it.
Note in tcg mode, vm_clock still read inside BQL, so icount is
left without change.

Lock rule: private lock innermost, ie BQL->"this lock"

Signed-off-by: Liu Ping Fan <pingfank@linux.vnet.ibm.com>
---
 cpus.c | 27 +++++++++++++++++++++++----
 1 file changed, 23 insertions(+), 4 deletions(-)

diff --git a/cpus.c b/cpus.c
index 4254ca9..22df5fb 100644
--- a/cpus.c
+++ b/cpus.c
@@ -113,6 +113,8 @@ typedef struct TimersState {
 } TimersState;
 
 static TimersState timers_state;
+/* lock rule: innermost */
+static QemuMutex timers_state_lock;
 
 /* Return the virtual CPU time, based on the instruction counter.  */
 int64_t cpu_get_icount(void)
@@ -134,11 +136,15 @@ int64_t cpu_get_icount(void)
 /* return the host CPU cycle counter and handle stop/restart */
 int64_t cpu_get_ticks(void)
 {
+    int64_t ret;
+
     if (use_icount) {
         return cpu_get_icount();
     }
+    qemu_mutex_lock(&timers_state_lock);
     if (!timers_state.cpu_ticks_enabled) {
-        return timers_state.cpu_ticks_offset;
+        ret = timers_state.cpu_ticks_offset;
+        goto out;
     } else {
         int64_t ticks;
         ticks = cpu_get_real_ticks();
@@ -148,41 +154,53 @@ int64_t cpu_get_ticks(void)
             timers_state.cpu_ticks_offset += timers_state.cpu_ticks_prev - ticks;
         }
         timers_state.cpu_ticks_prev = ticks;
-        return ticks + timers_state.cpu_ticks_offset;
+        ret = ticks + timers_state.cpu_ticks_offset;
+        goto out;
     }
+out:
+    qemu_mutex_lock(&timers_state_lock);
+    return ret;
 }
 
 /* return the host CPU monotonic timer and handle stop/restart */
 int64_t cpu_get_clock(void)
 {
     int64_t ti;
+
+    qemu_mutex_lock(&timers_state_lock);
     if (!timers_state.cpu_ticks_enabled) {
-        return timers_state.cpu_clock_offset;
+        ti = timers_state.cpu_clock_offset;
     } else {
         ti = get_clock();
-        return ti + timers_state.cpu_clock_offset;
+        ti += timers_state.cpu_clock_offset;
     }
+    qemu_mutex_unlock(&timers_state_lock);
+    return ti;
 }
 
 /* enable cpu_get_ticks() */
 void cpu_enable_ticks(void)
 {
+    qemu_mutex_lock(&timers_state_lock);
     if (!timers_state.cpu_ticks_enabled) {
         timers_state.cpu_ticks_offset -= cpu_get_real_ticks();
         timers_state.cpu_clock_offset -= get_clock();
         timers_state.cpu_ticks_enabled = 1;
     }
+    qemu_mutex_unlock(&timers_state_lock);
 }
 
 /* disable cpu_get_ticks() : the clock is stopped. You must not call
    cpu_get_ticks() after that.  */
 void cpu_disable_ticks(void)
 {
+    qemu_mutex_lock(&timers_state_lock);
     if (timers_state.cpu_ticks_enabled) {
         timers_state.cpu_ticks_offset = cpu_get_ticks();
         timers_state.cpu_clock_offset = cpu_get_clock();
         timers_state.cpu_ticks_enabled = 0;
     }
+    qemu_mutex_unlock(&timers_state_lock);
 }
 
 /* Correlation between real and virtual time is always going to be
@@ -353,6 +371,7 @@ static const VMStateDescription vmstate_timers = {
 
 void configure_icount(const char *option)
 {
+    qemu_mutex_init(&timers_state_lock);
     vmstate_register(NULL, 0, &vmstate_timers, &timers_state);
     if (!option) {
         return;
-- 
1.8.1.4

^ permalink raw reply related	[flat|nested] 63+ messages in thread

* [Qemu-devel] [RFC 5/8] timer: associate timer with AioContext
  2013-07-21  8:42 [Qemu-devel] [RFC 0/8] arm AioContext with its own timer stuff Liu Ping Fan
                   ` (3 preceding siblings ...)
  2013-07-21  8:43 ` [Qemu-devel] [RFC 4/8] timer: protect timers_state with lock Liu Ping Fan
@ 2013-07-21  8:43 ` Liu Ping Fan
  2013-07-21  8:43 ` [Qemu-devel] [RFC 6/8] timer: run timers on aio_poll Liu Ping Fan
                   ` (4 subsequent siblings)
  9 siblings, 0 replies; 63+ messages in thread
From: Liu Ping Fan @ 2013-07-21  8:43 UTC (permalink / raw)
  To: qemu-devel
  Cc: Kevin Wolf, Stefan Hajnoczi, Jan Kiszka, Alex Bligh,
	Anthony Liguori, Paolo Bonzini

Currently, timers run on iothread inside QBL, this limits the usage
of timers in some case, e.g. virtio-blk-dataplane. In order to run
timers on private thread, we arm AioContext with three lists in
according to three QemuClock (and later, run timers in aio_poll).

Signed-off-by: Liu Ping Fan <pingfank@linux.vnet.ibm.com>

------ issue to fix ---
Note: before this patch, there should be another one to fix the race
issue by qemu_mod_timer() and _run_timers().  I plan to adopt the BH
method for timers to fix it.
---
 async.c              |  9 +++++++
 include/block/aio.h  | 13 ++++++++++
 include/qemu/timer.h | 20 ++++++++++++++++
 qemu-timer.c         | 67 +++++++++++++++++++++++++++++-----------------------
 4 files changed, 80 insertions(+), 29 deletions(-)

diff --git a/async.c b/async.c
index 8209cea..36df208 100644
--- a/async.c
+++ b/async.c
@@ -202,12 +202,16 @@ static void
 aio_ctx_finalize(GSource     *source)
 {
     AioContext *ctx = (AioContext *) source;
+    int i;
 
     thread_pool_free(ctx->thread_pool);
     aio_set_event_notifier(ctx, &ctx->notifier, NULL, NULL);
     event_notifier_cleanup(&ctx->notifier);
     g_array_free(ctx->pollfds, TRUE);
     alarm_timer_destroy(ctx->alarm_timer);
+    for (i = 0; i < QEMU_CLOCK_MAXCNT; i++) {
+        timer_list_finalize(&ctx->timer_list[i]);
+    }
 }
 
 static GSourceFuncs aio_source_funcs = {
@@ -239,6 +243,8 @@ void aio_notify(AioContext *ctx)
 AioContext *aio_context_new(void)
 {
     AioContext *ctx;
+    int i;
+
     ctx = (AioContext *) g_source_new(&aio_source_funcs, sizeof(AioContext));
     ctx->pollfds = g_array_new(FALSE, FALSE, sizeof(GPollFD));
     ctx->thread_pool = NULL;
@@ -248,6 +254,9 @@ AioContext *aio_context_new(void)
                            (EventNotifierHandler *)
                            event_notifier_test_and_clear, NULL);
     ctx->alarm_timer = alarm_timer_create(ctx);
+    for (i = 0; i < QEMU_CLOCK_MAXCNT; i++) {
+        timer_list_init(&ctx->timer_list[i]);
+    }
 
     return ctx;
 }
diff --git a/include/block/aio.h b/include/block/aio.h
index 84537a2..3f1a7b4 100644
--- a/include/block/aio.h
+++ b/include/block/aio.h
@@ -43,6 +43,18 @@ typedef struct AioHandler AioHandler;
 typedef void QEMUBHFunc(void *opaque);
 typedef void IOHandler(void *opaque);
 
+/* Related timer with AioContext */
+typedef struct QEMUTimer QEMUTimer;
+#define QEMU_CLOCK_MAXCNT 3
+
+typedef struct TimerList {
+    QEMUTimer *active_timers;
+    QemuMutex active_timers_lock;
+} TimerList;
+
+void timer_list_init(TimerList *tlist);
+void timer_list_finalize(TimerList *tlist);
+
 typedef struct AioContext {
     GSource source;
 
@@ -74,6 +86,7 @@ typedef struct AioContext {
     /* Thread pool for performing work and receiving completion callbacks */
     struct ThreadPool *thread_pool;
     struct qemu_alarm_timer *alarm_timer;
+    TimerList timer_list[QEMU_CLOCK_MAXCNT];
 } AioContext;
 
 /* Returns 1 if there are still outstanding AIO requests; 0 otherwise */
diff --git a/include/qemu/timer.h b/include/qemu/timer.h
index 4a72c99..a5acfe0 100644
--- a/include/qemu/timer.h
+++ b/include/qemu/timer.h
@@ -33,9 +33,14 @@ extern QEMUClock *vm_clock;
 extern QEMUClock *host_clock;
 
 int64_t qemu_get_clock_ns(QEMUClock *clock);
+/* qemu_clock_has_timers, qemu_clock_expired, qemu_clock_deadline
+ * run In tcg icount mode. There is only one AioContext i.e. qemu_aio_context.
+ * So we only count the timers on qemu_aio_context.
+ */
 int64_t qemu_clock_has_timers(QEMUClock *clock);
 int64_t qemu_clock_expired(QEMUClock *clock);
 int64_t qemu_clock_deadline(QEMUClock *clock);
+
 void qemu_clock_enable(QEMUClock *clock, bool enabled);
 void qemu_clock_warp(QEMUClock *clock);
 
@@ -45,6 +50,9 @@ void qemu_unregister_clock_reset_notifier(QEMUClock *clock,
 
 QEMUTimer *qemu_new_timer(QEMUClock *clock, int scale,
                           QEMUTimerCB *cb, void *opaque);
+QEMUTimer *aioctx_new_timer(QEMUClock *clock, int scale,
+                          QEMUTimerCB *cb, void *opaque, AioContext *ctx);
+
 void qemu_free_timer(QEMUTimer *ts);
 void qemu_del_timer(QEMUTimer *ts);
 void qemu_mod_timer_ns(QEMUTimer *ts, int64_t expire_time);
@@ -77,6 +85,18 @@ static inline QEMUTimer *qemu_new_timer_ms(QEMUClock *clock, QEMUTimerCB *cb,
     return qemu_new_timer(clock, SCALE_MS, cb, opaque);
 }
 
+static inline QEMUTimer *aioctx_new_timer_ns(QEMUClock *clock, QEMUTimerCB *cb,
+                                           void *opaque, AioContext *ctx)
+{
+    return aioctx_new_timer(clock, SCALE_NS, cb, opaque, ctx);
+}
+
+static inline QEMUTimer *aioctx_new_timer_ms(QEMUClock *clock, QEMUTimerCB *cb,
+                                           void *opaque, AioContext *ctx)
+{
+    return aioctx_new_timer(clock, SCALE_MS, cb, opaque, ctx);
+}
+
 static inline int64_t qemu_get_clock_ms(QEMUClock *clock)
 {
     return qemu_get_clock_ns(clock) / SCALE_MS;
diff --git a/qemu-timer.c b/qemu-timer.c
index 0ee68dc..1a0cbae 100644
--- a/qemu-timer.c
+++ b/qemu-timer.c
@@ -45,14 +45,6 @@
 #define QEMU_CLOCK_REALTIME 0
 #define QEMU_CLOCK_VIRTUAL  1
 #define QEMU_CLOCK_HOST     2
-#define QEMU_CLOCK_MAXCNT 3
-
-typedef struct TimerList {
-    QEMUTimer *active_timers;
-    QemuMutex active_timers_lock;
-} TimerList;
-
-static TimerList timer_list[QEMU_CLOCK_MAXCNT];
 
 struct QEMUClock {
     NotifierList reset_notifiers;
@@ -64,7 +56,9 @@ struct QEMUClock {
 
 struct QEMUTimer {
     int64_t expire_time;	/* in nanoseconds */
+    /* quick link to AioContext timer list */
     TimerList *list;
+    AioContext *ctx;
     QEMUTimerCB *cb;
     void *opaque;
     QEMUTimer *next;
@@ -128,11 +122,12 @@ void alarm_timer_destroy(struct qemu_alarm_timer *t)
     g_free(t);
 }
 
-static TimerList *clock_to_timerlist(QEMUClock *clock)
+static TimerList *clock_to_timerlist(QEMUClock *clock, AioContext *ctx)
 {
     int type = clock->type;
 
-    return  &timer_list[type];
+    assert(ctx);
+    return  &ctx->timer_list[type];
 }
 
 static bool qemu_timer_expired_ns(QEMUTimer *timer_head, int64_t current_time)
@@ -140,7 +135,8 @@ static bool qemu_timer_expired_ns(QEMUTimer *timer_head, int64_t current_time)
     return timer_head && (timer_head->expire_time <= current_time);
 }
 
-static int64_t qemu_next_clock_deadline(QEMUClock *clock, int64_t delta)
+static int64_t qemu_next_clock_deadline(QEMUClock *clock, int64_t delta,
+    AioContext *ctx)
 {
     int64_t expire_time, next;
     bool has_timer = false;
@@ -150,7 +146,7 @@ static int64_t qemu_next_clock_deadline(QEMUClock *clock, int64_t delta)
         return delta;
     }
 
-    tlist = clock_to_timerlist(clock);
+    tlist = clock_to_timerlist(clock, ctx);
     qemu_mutex_lock(&tlist->active_timers_lock);
     if (tlist->active_timers) {
         has_timer = true;
@@ -165,18 +161,15 @@ static int64_t qemu_next_clock_deadline(QEMUClock *clock, int64_t delta)
     return MIN(next, delta);
 }
 
-/* Soon this will be fixed: till now, timer list is not associated with
- * AioContext, so @ctx has no effect on deadline currently.
- */
 static int64_t qemu_next_alarm_deadline(AioContext *ctx)
 {
     int64_t delta = INT64_MAX;
 
     if (!use_icount) {
-        delta = qemu_next_clock_deadline(vm_clock, delta);
+        delta = qemu_next_clock_deadline(vm_clock, delta, ctx);
     }
-    delta = qemu_next_clock_deadline(host_clock, delta);
-    return qemu_next_clock_deadline(rt_clock, delta);
+    delta = qemu_next_clock_deadline(host_clock, delta, ctx);
+    return qemu_next_clock_deadline(rt_clock, delta, ctx);
 }
 
 static void qemu_rearm_alarm_timer(struct qemu_alarm_timer *t)
@@ -309,24 +302,27 @@ QEMUClock *rt_clock;
 QEMUClock *vm_clock;
 QEMUClock *host_clock;
 
-static void timer_list_init(TimerList *tlist)
+void timer_list_init(TimerList *tlist)
 {
     qemu_mutex_init(&tlist->active_timers_lock);
     tlist->active_timers = NULL;
 }
 
+void timer_list_finalize(TimerList *tlist)
+{
+    qemu_mutex_destroy(&tlist->active_timers_lock);
+    assert(!tlist->active_timers);
+}
+
 static QEMUClock *qemu_new_clock(int type)
 {
     QEMUClock *clock;
-    TimerList *tlist;
 
     clock = g_malloc0(sizeof(QEMUClock));
     clock->type = type;
     clock->enabled = true;
     clock->last = INT64_MIN;
     notifier_list_init(&clock->reset_notifiers);
-    tlist = clock_to_timerlist(clock);
-    timer_list_init(tlist);
 
     return clock;
 }
@@ -340,10 +336,14 @@ void qemu_clock_enable(QEMUClock *clock, bool enabled)
     }
 }
 
+/* qemu_clock_has_timers, qemu_clock_expired, qemu_clock_deadline
+  * run In tcg icount mode. There is only one AioContext i.e. qemu_aio_context.
+ * So we only count the timers on qemu_aio_context.
+*/
 int64_t qemu_clock_has_timers(QEMUClock *clock)
 {
     bool has_timers;
-    TimerList *tlist = clock_to_timerlist(clock);
+    TimerList *tlist = clock_to_timerlist(clock, qemu_get_aio_context());
 
     qemu_mutex_lock(&tlist->active_timers_lock);
     has_timers = !!tlist->active_timers;
@@ -355,7 +355,7 @@ int64_t qemu_clock_expired(QEMUClock *clock)
 {
     bool has_timers;
     int64_t expire_time;
-    TimerList *tlist = clock_to_timerlist(clock);
+    TimerList *tlist = clock_to_timerlist(clock, qemu_get_aio_context());
 
     qemu_mutex_lock(&tlist->active_timers_lock);
     has_timers = tlist->active_timers;
@@ -371,7 +371,7 @@ int64_t qemu_clock_deadline(QEMUClock *clock)
     int64_t delta = INT32_MAX;
     bool has_timers;
     int64_t expire_time;
-    TimerList *tlist = clock_to_timerlist(clock);
+    TimerList *tlist = clock_to_timerlist(clock, qemu_get_aio_context());
 
     qemu_mutex_lock(&tlist->active_timers_lock);
     has_timers = tlist->active_timers;
@@ -387,19 +387,26 @@ int64_t qemu_clock_deadline(QEMUClock *clock)
     return delta;
 }
 
-QEMUTimer *qemu_new_timer(QEMUClock *clock, int scale,
-                          QEMUTimerCB *cb, void *opaque)
+QEMUTimer *aioctx_new_timer(QEMUClock *clock, int scale,
+                          QEMUTimerCB *cb, void *opaque, AioContext *ctx)
 {
     QEMUTimer *ts;
 
     ts = g_malloc0(sizeof(QEMUTimer));
-    ts->list = clock_to_timerlist(clock);
+    ts->list = clock_to_timerlist(clock, ctx);
     ts->cb = cb;
     ts->opaque = opaque;
     ts->scale = scale;
+    ts->ctx = ctx;
     return ts;
 }
 
+QEMUTimer *qemu_new_timer(QEMUClock *clock, int scale,
+                          QEMUTimerCB *cb, void *opaque)
+{
+    return aioctx_new_timer(clock, scale, cb, opaque, qemu_get_aio_context());
+}
+
 void qemu_free_timer(QEMUTimer *ts)
 {
     g_free(ts);
@@ -491,12 +498,14 @@ void qemu_run_timers(QEMUClock *clock)
     QEMUTimer *ts;
     int64_t current_time;
     TimerList *tlist;
+    AioContext *ctx;
 
     if (!clock->enabled)
         return;
 
     current_time = qemu_get_clock_ns(clock);
-    tlist = clock_to_timerlist(clock);
+    ctx = *tls_get_thread_aio_context();
+    tlist = clock_to_timerlist(clock, ctx);
 
     for(;;) {
         qemu_mutex_lock(&tlist->active_timers_lock);
-- 
1.8.1.4

^ permalink raw reply related	[flat|nested] 63+ messages in thread

* [Qemu-devel] [RFC 6/8] timer: run timers on aio_poll
  2013-07-21  8:42 [Qemu-devel] [RFC 0/8] arm AioContext with its own timer stuff Liu Ping Fan
                   ` (4 preceding siblings ...)
  2013-07-21  8:43 ` [Qemu-devel] [RFC 5/8] timer: associate timer with AioContext Liu Ping Fan
@ 2013-07-21  8:43 ` Liu Ping Fan
  2013-07-21  9:55   ` Alex Bligh
  2013-07-21  8:43 ` [Qemu-devel] [RFC 7/8] block: associate BlockDriverState with AioContext Liu Ping Fan
                   ` (3 subsequent siblings)
  9 siblings, 1 reply; 63+ messages in thread
From: Liu Ping Fan @ 2013-07-21  8:43 UTC (permalink / raw)
  To: qemu-devel
  Cc: Kevin Wolf, Stefan Hajnoczi, Jan Kiszka, Alex Bligh,
	Anthony Liguori, Paolo Bonzini

Stop call timers in main loop and let each mini event-loop
run its own timers.

Signed-off-by: Liu Ping Fan <pingfank@linux.vnet.ibm.com>
---
 aio-posix.c | 2 ++
 main-loop.c | 2 --
 2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/aio-posix.c b/aio-posix.c
index b68eccd..29c2769 100644
--- a/aio-posix.c
+++ b/aio-posix.c
@@ -191,6 +191,8 @@ bool aio_poll(AioContext *ctx, bool blocking)
         progress = true;
     }
 
+    qemu_run_all_timers();
+
     if (progress && !blocking) {
         return true;
     }
diff --git a/main-loop.c b/main-loop.c
index 4a94a52..46a98a3 100644
--- a/main-loop.c
+++ b/main-loop.c
@@ -467,8 +467,6 @@ int main_loop_wait(int nonblocking)
     slirp_pollfds_poll(gpollfds, (ret < 0));
 #endif
 
-    qemu_run_all_timers();
-
     return ret;
 }
 
-- 
1.8.1.4

^ permalink raw reply related	[flat|nested] 63+ messages in thread

* [Qemu-devel] [RFC 7/8] block: associate BlockDriverState with AioContext
  2013-07-21  8:42 [Qemu-devel] [RFC 0/8] arm AioContext with its own timer stuff Liu Ping Fan
                   ` (5 preceding siblings ...)
  2013-07-21  8:43 ` [Qemu-devel] [RFC 6/8] timer: run timers on aio_poll Liu Ping Fan
@ 2013-07-21  8:43 ` Liu Ping Fan
  2013-07-21  8:43 ` [Qemu-devel] [RFC 8/8] block: enable throttle with aiocontext Liu Ping Fan
                   ` (2 subsequent siblings)
  9 siblings, 0 replies; 63+ messages in thread
From: Liu Ping Fan @ 2013-07-21  8:43 UTC (permalink / raw)
  To: qemu-devel
  Cc: Kevin Wolf, Stefan Hajnoczi, Jan Kiszka, Alex Bligh,
	Anthony Liguori, Paolo Bonzini

The default aio for BDS is qemu_aio_context, while for data plane,
it will has its own ctx. Relating BDS with AioContext can help
block layer to determine its running environment. Some stuff like
timers need such info, so that they can run in the correct
environment.

Signed-off-by: Liu Ping Fan <pingfank@linux.vnet.ibm.com>
---
 block.c                         | 6 ++++++
 hw/block/dataplane/virtio-blk.c | 1 +
 include/block/block.h           | 1 +
 include/block/block_int.h       | 1 +
 4 files changed, 9 insertions(+)

diff --git a/block.c b/block.c
index daf5717..c6b7b6c 100644
--- a/block.c
+++ b/block.c
@@ -315,6 +315,12 @@ BlockDriverState *bdrv_new(const char *device_name)
     return bs;
 }
 
+void bdrv_set_aio_context(BlockDriverState *bs, AioContext *ctx)
+{
+    assert(ctx);
+    bs->ctx = ctx;
+}
+
 void bdrv_add_close_notifier(BlockDriverState *bs, Notifier *notify)
 {
     notifier_list_add(&bs->close_notifiers, notify);
diff --git a/hw/block/dataplane/virtio-blk.c b/hw/block/dataplane/virtio-blk.c
index 5fde06f..ec477b0 100644
--- a/hw/block/dataplane/virtio-blk.c
+++ b/hw/block/dataplane/virtio-blk.c
@@ -378,6 +378,7 @@ void virtio_blk_data_plane_start(VirtIOBlockDataPlane *s)
     }
 
     s->ctx = aio_context_new();
+    bdrv_set_aio_context(s->bs, s->ctx);
 
     /* Set up guest notifier (irq) */
     if (k->set_guest_notifiers(qbus->parent, 1, true) != 0) {
diff --git a/include/block/block.h b/include/block/block.h
index cb44f27..23e743d 100644
--- a/include/block/block.h
+++ b/include/block/block.h
@@ -128,6 +128,7 @@ int bdrv_create(BlockDriver *drv, const char* filename,
     QEMUOptionParameter *options);
 int bdrv_create_file(const char* filename, QEMUOptionParameter *options);
 BlockDriverState *bdrv_new(const char *device_name);
+void bdrv_set_aio_context(BlockDriverState *bs, AioContext *ctx);
 void bdrv_make_anon(BlockDriverState *bs);
 void bdrv_swap(BlockDriverState *bs_new, BlockDriverState *bs_old);
 void bdrv_append(BlockDriverState *bs_new, BlockDriverState *bs_top);
diff --git a/include/block/block_int.h b/include/block/block_int.h
index cd0e0a8..801d1c1 100644
--- a/include/block/block_int.h
+++ b/include/block/block_int.h
@@ -299,6 +299,7 @@ struct BlockDriverState {
 
     QLIST_HEAD(, BdrvTrackedRequest) tracked_requests;
 
+    AioContext *ctx;
     /* long-running background operation */
     BlockJob *job;
 
-- 
1.8.1.4

^ permalink raw reply related	[flat|nested] 63+ messages in thread

* [Qemu-devel] [RFC 8/8] block: enable throttle with aiocontext
  2013-07-21  8:42 [Qemu-devel] [RFC 0/8] arm AioContext with its own timer stuff Liu Ping Fan
                   ` (6 preceding siblings ...)
  2013-07-21  8:43 ` [Qemu-devel] [RFC 7/8] block: associate BlockDriverState with AioContext Liu Ping Fan
@ 2013-07-21  8:43 ` Liu Ping Fan
  2013-07-21  9:53 ` [Qemu-devel] [RFC 0/8] arm AioContext with its own timer stuff Alex Bligh
  2013-07-25 12:05 ` Stefan Hajnoczi
  9 siblings, 0 replies; 63+ messages in thread
From: Liu Ping Fan @ 2013-07-21  8:43 UTC (permalink / raw)
  To: qemu-devel
  Cc: Kevin Wolf, Stefan Hajnoczi, Jan Kiszka, Alex Bligh,
	Anthony Liguori, Paolo Bonzini

Signed-off-by: Liu Ping Fan <pingfank@linux.vnet.ibm.com>
---
 block.c | 7 ++++++-
 1 file changed, 6 insertions(+), 1 deletion(-)

diff --git a/block.c b/block.c
index c6b7b6c..b9e6cc8 100644
--- a/block.c
+++ b/block.c
@@ -149,7 +149,12 @@ static void bdrv_block_timer(void *opaque)
 void bdrv_io_limits_enable(BlockDriverState *bs)
 {
     qemu_co_queue_init(&bs->throttled_reqs);
-    bs->block_timer = qemu_new_timer_ns(vm_clock, bdrv_block_timer, bs);
+    if (!bs->ctx) {
+        bs->block_timer = qemu_new_timer_ns(vm_clock, bdrv_block_timer, bs);
+    } else {
+        bs->block_timer = aioctx_new_timer_ns(vm_clock, bdrv_block_timer, bs,
+                        bs->ctx);
+    }
     bs->io_limits_enabled = true;
 }
 
-- 
1.8.1.4

^ permalink raw reply related	[flat|nested] 63+ messages in thread

* Re: [Qemu-devel] [RFC 0/8] arm AioContext with its own timer stuff
  2013-07-21  8:42 [Qemu-devel] [RFC 0/8] arm AioContext with its own timer stuff Liu Ping Fan
                   ` (7 preceding siblings ...)
  2013-07-21  8:43 ` [Qemu-devel] [RFC 8/8] block: enable throttle with aiocontext Liu Ping Fan
@ 2013-07-21  9:53 ` Alex Bligh
  2013-07-22  4:38   ` liu ping fan
  2013-07-25 12:05 ` Stefan Hajnoczi
  9 siblings, 1 reply; 63+ messages in thread
From: Alex Bligh @ 2013-07-21  9:53 UTC (permalink / raw)
  To: Liu Ping Fan, qemu-devel
  Cc: Kevin Wolf, Anthony Liguori, Jan Kiszka, Stefan Hajnoczi,
	Alex Bligh, Paolo Bonzini

Liu,

--On 21 July 2013 16:42:57 +0800 Liu Ping Fan <qemulist@gmail.com> wrote:

> Currently, the timers run on iothread within BQL, so virtio-block
> dataplane can not use throttle, as Stefan Hajnoczi pointed out in his
> patches to port dataplane onto block layer.(Thanks, Stefan) To enable
> this feature, I plan to enable timers to run on AioContext's thread. And
> maybe in future, hpet can run with its dedicated thread too.
>
> Also, I see Alex Bligh is on the same effort by another method,(it is a
> good idea)    "[RFC] aio/async: Add timed bottom-halves".

Stefan & Paolo did not like that method much, so I did a third method
(posted yesterday) suggested by Stefan which adds a clock to AioContext (to
which timers can be attached), deletes ALL the alarm_timer stuff (which was
very cathartic), uses timeouts on the g_poll, and adds ppoll where this is
available. Series at:
  http://lists.nongnu.org/archive/html/qemu-devel/2013-07/msg03334.html

I suspect this also overlaps with your code.

So now we have 3 methods to do similar things!

One advantage of my approach is that it removes more code than it adds
(by quite a margin). However, alarm timers could have been left in.
What's the advantage in giving an AioContext its own alarm timer as
opposed to just its own clock?

-- 
Alex Bligh

^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: [Qemu-devel] [RFC 6/8] timer: run timers on aio_poll
  2013-07-21  8:43 ` [Qemu-devel] [RFC 6/8] timer: run timers on aio_poll Liu Ping Fan
@ 2013-07-21  9:55   ` Alex Bligh
  2013-07-23  2:56     ` liu ping fan
  0 siblings, 1 reply; 63+ messages in thread
From: Alex Bligh @ 2013-07-21  9:55 UTC (permalink / raw)
  To: Liu Ping Fan, qemu-devel
  Cc: Kevin Wolf, Stefan Hajnoczi, Jan Kiszka, Alex Bligh,
	Anthony Liguori, Paolo Bonzini



--On 21 July 2013 16:43:03 +0800 Liu Ping Fan <qemulist@gmail.com> wrote:

> diff --git a/aio-posix.c b/aio-posix.c
> index b68eccd..29c2769 100644
> --- a/aio-posix.c
> +++ b/aio-posix.c
> @@ -191,6 +191,8 @@ bool aio_poll(AioContext *ctx, bool blocking)
>          progress = true;
>      }
>
> +    qemu_run_all_timers();
> +
>      if (progress && !blocking) {
>          return true;
>      }

I am told (by Stefan H) this approach is unsafe as existing timers may
not expect to be run within aio_poll.

Also, I suspect you need to change the value of progress if timers
run so bdrv draining terminates properly.

-- 
Alex Bligh

^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: [Qemu-devel] [RFC 0/8] arm AioContext with its own timer stuff
  2013-07-21  9:53 ` [Qemu-devel] [RFC 0/8] arm AioContext with its own timer stuff Alex Bligh
@ 2013-07-22  4:38   ` liu ping fan
  2013-07-22  6:28     ` Jan Kiszka
  2013-07-22  9:40     ` Alex Bligh
  0 siblings, 2 replies; 63+ messages in thread
From: liu ping fan @ 2013-07-22  4:38 UTC (permalink / raw)
  To: Alex Bligh
  Cc: Kevin Wolf, Stefan Hajnoczi, Jan Kiszka, qemu-devel,
	Anthony Liguori, Paolo Bonzini

On Sun, Jul 21, 2013 at 5:53 PM, Alex Bligh <alex@alex.org.uk> wrote:
> Liu,
>
>
> --On 21 July 2013 16:42:57 +0800 Liu Ping Fan <qemulist@gmail.com> wrote:
>
>> Currently, the timers run on iothread within BQL, so virtio-block
>> dataplane can not use throttle, as Stefan Hajnoczi pointed out in his
>> patches to port dataplane onto block layer.(Thanks, Stefan) To enable
>> this feature, I plan to enable timers to run on AioContext's thread. And
>> maybe in future, hpet can run with its dedicated thread too.
>>
>> Also, I see Alex Bligh is on the same effort by another method,(it is a
>> good idea)    "[RFC] aio/async: Add timed bottom-halves".
>
>
> Stefan & Paolo did not like that method much, so I did a third method
> (posted yesterday) suggested by Stefan which adds a clock to AioContext (to
> which timers can be attached), deletes ALL the alarm_timer stuff (which was
> very cathartic), uses timeouts on the g_poll, and adds ppoll where this is
> available. Series at:
>  http://lists.nongnu.org/archive/html/qemu-devel/2013-07/msg03334.html
>
> I suspect this also overlaps with your code.
>
> So now we have 3 methods to do similar things!
>
> One advantage of my approach is that it removes more code than it adds
> (by quite a margin). However, alarm timers could have been left in.
> What's the advantage in giving an AioContext its own alarm timer as
> opposed to just its own clock?
>
I read your second series, and try to summary the main different between us.
Please correct me, if I misunderstood something.
--1st. You try to create a separate QemuClock for AioContext.
    I think QemuClock is the clock event source and we have three
classic with fine definition. They should be qemu-wide for time
measurement.  On the other handler, timer is  a concept for timeout,
so it can be AioContext-related. So I have patch2&5.
--2nd. You want to substitute alarm_timer with timeout of poll.
    I try to trigger each specified thread when its deadline comes.
But unfortunately, the signal can not be delivered to the specified
thread directly, and I need timerfd for each AioContext. (If we can
have the equivalent on other platform)

Anything else?

Thanks and regards,
Pingfan

> --
> Alex Bligh

^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: [Qemu-devel] [RFC 0/8] arm AioContext with its own timer stuff
  2013-07-22  4:38   ` liu ping fan
@ 2013-07-22  6:28     ` Jan Kiszka
  2013-07-23  2:51       ` liu ping fan
  2013-07-22  9:40     ` Alex Bligh
  1 sibling, 1 reply; 63+ messages in thread
From: Jan Kiszka @ 2013-07-22  6:28 UTC (permalink / raw)
  To: liu ping fan
  Cc: Kevin Wolf, Stefan Hajnoczi, Alex Bligh, qemu-devel,
	Anthony Liguori, Paolo Bonzini

On 2013-07-22 06:38, liu ping fan wrote:
> On Sun, Jul 21, 2013 at 5:53 PM, Alex Bligh <alex@alex.org.uk> wrote:
>> Liu,
>>
>>
>> --On 21 July 2013 16:42:57 +0800 Liu Ping Fan <qemulist@gmail.com> wrote:
>>
>>> Currently, the timers run on iothread within BQL, so virtio-block
>>> dataplane can not use throttle, as Stefan Hajnoczi pointed out in his
>>> patches to port dataplane onto block layer.(Thanks, Stefan) To enable
>>> this feature, I plan to enable timers to run on AioContext's thread. And
>>> maybe in future, hpet can run with its dedicated thread too.
>>>
>>> Also, I see Alex Bligh is on the same effort by another method,(it is a
>>> good idea)    "[RFC] aio/async: Add timed bottom-halves".
>>
>>
>> Stefan & Paolo did not like that method much, so I did a third method
>> (posted yesterday) suggested by Stefan which adds a clock to AioContext (to
>> which timers can be attached), deletes ALL the alarm_timer stuff (which was
>> very cathartic), uses timeouts on the g_poll, and adds ppoll where this is
>> available. Series at:
>>  http://lists.nongnu.org/archive/html/qemu-devel/2013-07/msg03334.html
>>
>> I suspect this also overlaps with your code.
>>
>> So now we have 3 methods to do similar things!
>>
>> One advantage of my approach is that it removes more code than it adds
>> (by quite a margin). However, alarm timers could have been left in.
>> What's the advantage in giving an AioContext its own alarm timer as
>> opposed to just its own clock?
>>
> I read your second series, and try to summary the main different between us.
> Please correct me, if I misunderstood something.
> --1st. You try to create a separate QemuClock for AioContext.
>     I think QemuClock is the clock event source and we have three
> classic with fine definition. They should be qemu-wide for time
> measurement.  On the other handler, timer is  a concept for timeout,

Timers, as used in QEMU, are not only for "unimportant" and
unlikely-to-fire timeouts. They are also for potential high-rate, high
resolution events. Your last series neglects this. We may need two
versions of timers - or one interface that caters both use cases properly.

Jan

> so it can be AioContext-related. So I have patch2&5.
> --2nd. You want to substitute alarm_timer with timeout of poll.
>     I try to trigger each specified thread when its deadline comes.
> But unfortunately, the signal can not be delivered to the specified
> thread directly, and I need timerfd for each AioContext. (If we can
> have the equivalent on other platform)
> 
> Anything else?
> 
> Thanks and regards,
> Pingfan
> 
>> --
>> Alex Bligh

-- 
Siemens AG, Corporate Technology, CT RTC ITP SES-DE
Corporate Competence Center Embedded Linux

^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: [Qemu-devel] [RFC 3/8] timer: make timers_state static
  2013-07-21  8:43 ` [Qemu-devel] [RFC 3/8] timer: make timers_state static Liu Ping Fan
@ 2013-07-22  6:36   ` Jan Kiszka
  2013-07-22 17:40     ` [Qemu-devel] [Qemu-trivial] " Michael Tokarev
  0 siblings, 1 reply; 63+ messages in thread
From: Jan Kiszka @ 2013-07-22  6:36 UTC (permalink / raw)
  To: Liu Ping Fan
  Cc: Kevin Wolf, Stefan Hajnoczi, qemu-trivial, qemu-devel,
	Alex Bligh, Anthony Liguori, Paolo Bonzini

On 2013-07-21 10:43, Liu Ping Fan wrote:
> Signed-off-by: Liu Ping Fan <pingfank@linux.vnet.ibm.com>
> ---
>  cpus.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/cpus.c b/cpus.c
> index 61e86a8..4254ca9 100644
> --- a/cpus.c
> +++ b/cpus.c
> @@ -112,7 +112,7 @@ typedef struct TimersState {
>      int64_t dummy;
>  } TimersState;
>  
> -TimersState timers_state;
> +static TimersState timers_state;
>  
>  /* Return the virtual CPU time, based on the instruction counter.  */
>  int64_t cpu_get_icount(void)
> 

Could go in directly via trivial IMHO.

Reviewed-by: Jan Kiszka <jan.kiszka@siemens.com>

-- 
Siemens AG, Corporate Technology, CT RTC ITP SES-DE
Corporate Competence Center Embedded Linux

^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: [Qemu-devel] [RFC 4/8] timer: protect timers_state with lock
  2013-07-21  8:43 ` [Qemu-devel] [RFC 4/8] timer: protect timers_state with lock Liu Ping Fan
@ 2013-07-22  6:40   ` Jan Kiszka
  0 siblings, 0 replies; 63+ messages in thread
From: Jan Kiszka @ 2013-07-22  6:40 UTC (permalink / raw)
  To: Liu Ping Fan
  Cc: Kevin Wolf, Stefan Hajnoczi, qemu-devel, Alex Bligh,
	Anthony Liguori, Paolo Bonzini

On 2013-07-21 10:43, Liu Ping Fan wrote:
> In kvm mode, vm_clock may be read on AioContexts outside BQL(next
> patch). This will make timers_state --the foundation of vm_clock
> exposed to race condition. Using private lock to protect it.
> Note in tcg mode, vm_clock still read inside BQL, so icount is
> left without change.
> 
> Lock rule: private lock innermost, ie BQL->"this lock"
> 
> Signed-off-by: Liu Ping Fan <pingfank@linux.vnet.ibm.com>
> ---
>  cpus.c | 27 +++++++++++++++++++++++----
>  1 file changed, 23 insertions(+), 4 deletions(-)
> 
> diff --git a/cpus.c b/cpus.c
> index 4254ca9..22df5fb 100644
> --- a/cpus.c
> +++ b/cpus.c
> @@ -113,6 +113,8 @@ typedef struct TimersState {
>  } TimersState;
>  
>  static TimersState timers_state;
> +/* lock rule: innermost */
> +static QemuMutex timers_state_lock;
>  
>  /* Return the virtual CPU time, based on the instruction counter.  */
>  int64_t cpu_get_icount(void)
> @@ -134,11 +136,15 @@ int64_t cpu_get_icount(void)
>  /* return the host CPU cycle counter and handle stop/restart */
>  int64_t cpu_get_ticks(void)
>  {
> +    int64_t ret;
> +
>      if (use_icount) {

[ Some day we should introduce something like assert_bql_held() and add
it here, among other places. ]

>          return cpu_get_icount();
>      }
> +    qemu_mutex_lock(&timers_state_lock);
>      if (!timers_state.cpu_ticks_enabled) {
> -        return timers_state.cpu_ticks_offset;
> +        ret = timers_state.cpu_ticks_offset;
> +        goto out;

No need for goto here and below.

>      } else {
>          int64_t ticks;
>          ticks = cpu_get_real_ticks();
> @@ -148,41 +154,53 @@ int64_t cpu_get_ticks(void)
>              timers_state.cpu_ticks_offset += timers_state.cpu_ticks_prev - ticks;
>          }
>          timers_state.cpu_ticks_prev = ticks;
> -        return ticks + timers_state.cpu_ticks_offset;
> +        ret = ticks + timers_state.cpu_ticks_offset;
> +        goto out;
>      }
> +out:
> +    qemu_mutex_lock(&timers_state_lock);
> +    return ret;
>  }
>  
>  /* return the host CPU monotonic timer and handle stop/restart */
>  int64_t cpu_get_clock(void)
>  {
>      int64_t ti;
> +
> +    qemu_mutex_lock(&timers_state_lock);
>      if (!timers_state.cpu_ticks_enabled) {
> -        return timers_state.cpu_clock_offset;
> +        ti = timers_state.cpu_clock_offset;
>      } else {
>          ti = get_clock();
> -        return ti + timers_state.cpu_clock_offset;
> +        ti += timers_state.cpu_clock_offset;
>      }
> +    qemu_mutex_unlock(&timers_state_lock);
> +    return ti;
>  }
>  
>  /* enable cpu_get_ticks() */
>  void cpu_enable_ticks(void)
>  {
> +    qemu_mutex_lock(&timers_state_lock);
>      if (!timers_state.cpu_ticks_enabled) {
>          timers_state.cpu_ticks_offset -= cpu_get_real_ticks();
>          timers_state.cpu_clock_offset -= get_clock();
>          timers_state.cpu_ticks_enabled = 1;
>      }
> +    qemu_mutex_unlock(&timers_state_lock);
>  }
>  
>  /* disable cpu_get_ticks() : the clock is stopped. You must not call
>     cpu_get_ticks() after that.  */
>  void cpu_disable_ticks(void)
>  {
> +    qemu_mutex_lock(&timers_state_lock);
>      if (timers_state.cpu_ticks_enabled) {
>          timers_state.cpu_ticks_offset = cpu_get_ticks();
>          timers_state.cpu_clock_offset = cpu_get_clock();
>          timers_state.cpu_ticks_enabled = 0;
>      }
> +    qemu_mutex_unlock(&timers_state_lock);
>  }
>  
>  /* Correlation between real and virtual time is always going to be
> @@ -353,6 +371,7 @@ static const VMStateDescription vmstate_timers = {
>  
>  void configure_icount(const char *option)

[ Misnamed function, it's not only about icount stuff - but not an issue
of this patch. ]

>  {
> +    qemu_mutex_init(&timers_state_lock);
>      vmstate_register(NULL, 0, &vmstate_timers, &timers_state);
>      if (!option) {
>          return;
> 

Looks good except for the goto.

Jan

-- 
Siemens AG, Corporate Technology, CT RTC ITP SES-DE
Corporate Competence Center Embedded Linux

^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: [Qemu-devel] [RFC 1/8] timer: associate alarm_timer with AioContext
  2013-07-21  8:42 ` [Qemu-devel] [RFC 1/8] timer: associate alarm_timer with AioContext Liu Ping Fan
@ 2013-07-22  6:55   ` Jan Kiszka
  0 siblings, 0 replies; 63+ messages in thread
From: Jan Kiszka @ 2013-07-22  6:55 UTC (permalink / raw)
  To: Liu Ping Fan
  Cc: Kevin Wolf, Stefan Hajnoczi, qemu-devel, Alex Bligh,
	Anthony Liguori, Paolo Bonzini

On 2013-07-21 10:42, Liu Ping Fan wrote:
> We will arm each AioContext with its own timer stuff. As the first
> step, we should make each AioContext with its own alarm_timer,
> so they can raise the deadline independent.
> Each thread with AioContext will have dedicated signal handler
> to trigger it own alarm_timer. And all of the alarm timers are
> linked on a QSLIST for clock reset.
> 
> Signed-off-by: Liu Ping Fan <pingfank@linux.vnet.ibm.com>
> ----- To fix ------------
> sigaction(SIGALRM, &act, NULL) is process-wide, not thread?
> I think what I need is an timerfd for each thread, right?
> Will fix in next version.
> 
> ---
>  async.c              |   3 ++
>  include/block/aio.h  |   1 +
>  include/qemu/timer.h |   4 +-
>  main-loop.c          |   4 --
>  qemu-timer.c         | 106 +++++++++++++++++++++++++++++++++++++--------------
>  5 files changed, 85 insertions(+), 33 deletions(-)
> 
> diff --git a/async.c b/async.c
> index ba4072c..8209cea 100644
> --- a/async.c
> +++ b/async.c
> @@ -26,6 +26,7 @@
>  #include "block/aio.h"
>  #include "block/thread-pool.h"
>  #include "qemu/main-loop.h"
> +#include "qemu/timer.h"
>  
>  DEFINE_TLS(AioContext*, thread_aio_context);
>  
> @@ -206,6 +207,7 @@ aio_ctx_finalize(GSource     *source)
>      aio_set_event_notifier(ctx, &ctx->notifier, NULL, NULL);
>      event_notifier_cleanup(&ctx->notifier);
>      g_array_free(ctx->pollfds, TRUE);
> +    alarm_timer_destroy(ctx->alarm_timer);
>  }
>  
>  static GSourceFuncs aio_source_funcs = {
> @@ -245,6 +247,7 @@ AioContext *aio_context_new(void)
>      aio_set_event_notifier(ctx, &ctx->notifier, 
>                             (EventNotifierHandler *)
>                             event_notifier_test_and_clear, NULL);
> +    ctx->alarm_timer = alarm_timer_create(ctx);
>  
>      return ctx;
>  }
> diff --git a/include/block/aio.h b/include/block/aio.h
> index 04598b2..84537a2 100644
> --- a/include/block/aio.h
> +++ b/include/block/aio.h
> @@ -73,6 +73,7 @@ typedef struct AioContext {
>  
>      /* Thread pool for performing work and receiving completion callbacks */
>      struct ThreadPool *thread_pool;
> +    struct qemu_alarm_timer *alarm_timer;
>  } AioContext;
>  
>  /* Returns 1 if there are still outstanding AIO requests; 0 otherwise */
> diff --git a/include/qemu/timer.h b/include/qemu/timer.h
> index 9dd206c..4a72c99 100644
> --- a/include/qemu/timer.h
> +++ b/include/qemu/timer.h
> @@ -57,7 +57,9 @@ void qemu_run_timers(QEMUClock *clock);
>  void qemu_run_all_timers(void);
>  void configure_alarms(char const *opt);
>  void init_clocks(void);
> -int init_timer_alarm(void);
> +int init_timer_alarm(struct qemu_alarm_timer *t);
> +struct qemu_alarm_timer *alarm_timer_create(AioContext *ctx);
> +void alarm_timer_destroy(struct qemu_alarm_timer *alarm);
>  
>  int64_t cpu_get_ticks(void);
>  void cpu_enable_ticks(void);
> diff --git a/main-loop.c b/main-loop.c
> index 5fbdd4a..4a94a52 100644
> --- a/main-loop.c
> +++ b/main-loop.c
> @@ -131,10 +131,6 @@ int qemu_init_main_loop(void)
>      GSource *src;
>  
>      init_clocks();
> -    if (init_timer_alarm() < 0) {
> -        fprintf(stderr, "could not initialize alarm timer\n");
> -        exit(1);
> -    }
>  
>      ret = qemu_signal_init();
>      if (ret) {
> diff --git a/qemu-timer.c b/qemu-timer.c
> index 9500d12..32c70ed 100644
> --- a/qemu-timer.c
> +++ b/qemu-timer.c
> @@ -65,6 +65,8 @@ struct QEMUTimer {
>      int scale;
>  };
>  
> +typedef struct qemu_alarm_timer qemu_alarm_timer;
> +
>  struct qemu_alarm_timer {
>      char const *name;
>      int (*start)(struct qemu_alarm_timer *t);
> @@ -82,9 +84,43 @@ struct qemu_alarm_timer {
>      /* Was the nearest deadline timer modified (possibly by another thread)? */
>      QemuMutex timer_modified_lock;
>      bool timer_modified;
> +    AioContext *ctx;
> +    /* protected by alarm_timer_list_lock */
> +    QSLIST_ENTRY(qemu_alarm_timer) next_alarm_timer;
>  };
>  
> -static struct qemu_alarm_timer *alarm_timer;
> +static QSLIST_HEAD(, qemu_alarm_timer) \
> +    alarm_timer_list = QSLIST_HEAD_INITIALIZER(alarm_timer_list);
> +/* innermost lock */
> +static QemuMutex alarm_timer_list_lock;
> +
> +struct qemu_alarm_timer *alarm_timer_create(AioContext *ctx)
> +{
> +    struct qemu_alarm_timer *t;
> +
> +    t = g_malloc0(sizeof(qemu_alarm_timer));
> +    init_timer_alarm(t);
> +    t->ctx = ctx;
> +    qemu_mutex_lock(&alarm_timer_list_lock);
> +    QSLIST_INSERT_HEAD(&alarm_timer_list, t, next_alarm_timer);
> +    qemu_mutex_unlock(&alarm_timer_list_lock);
> +    return t;
> +}
> +
> +void alarm_timer_destroy(struct qemu_alarm_timer *t)
> +{
> +    struct qemu_alarm_timer *var, *tvar;
> +
> +    t->stop(t);
> +    qemu_mutex_lock(&alarm_timer_list_lock);
> +    QSLIST_FOREACH_SAFE(var, &alarm_timer_list, next_alarm_timer, tvar) {
> +        if (tvar == t) {
> +            QSLIST_REMOVE_AFTER(var, next_alarm_timer);
> +        }
> +    }
> +    qemu_mutex_unlock(&alarm_timer_list_lock);
> +    g_free(t);
> +}
>  
>  static bool qemu_timer_expired_ns(QEMUTimer *timer_head, int64_t current_time)
>  {
> @@ -114,7 +150,10 @@ static int64_t qemu_next_clock_deadline(QEMUClock *clock, int64_t delta)
>      return MIN(next, delta);
>  }
>  
> -static int64_t qemu_next_alarm_deadline(void)
> +/* Soon this will be fixed: till now, timer list is not associated with
> + * AioContext, so @ctx has no effect on deadline currently.
> + */
> +static int64_t qemu_next_alarm_deadline(AioContext *ctx)
>  {
>      int64_t delta = INT64_MAX;
>  
> @@ -127,12 +166,23 @@ static int64_t qemu_next_alarm_deadline(void)
>  
>  static void qemu_rearm_alarm_timer(struct qemu_alarm_timer *t)
>  {
> -    int64_t nearest_delta_ns = qemu_next_alarm_deadline();
> +    int64_t nearest_delta_ns = qemu_next_alarm_deadline(t->ctx);
>      if (nearest_delta_ns < INT64_MAX) {
>          t->rearm(t, nearest_delta_ns);
>      }
>  }
>  
> +static void qemu_rearm_alarm_timers(void)
> +{
> +    struct qemu_alarm_timer *t;
> +
> +    qemu_mutex_lock(&alarm_timer_list_lock);
> +    QSLIST_FOREACH(t, &alarm_timer_list, next_alarm_timer) {
> +        qemu_rearm_alarm_timer(t);
> +    }
> +    qemu_mutex_unlock(&alarm_timer_list_lock);
> +}
> +
>  /* TODO: MIN_TIMER_REARM_NS should be optimized */
>  #define MIN_TIMER_REARM_NS 250000
>  
> @@ -262,7 +312,7 @@ void qemu_clock_enable(QEMUClock *clock, bool enabled)
>      bool old = clock->enabled;
>      clock->enabled = enabled;
>      if (enabled && !old) {
> -        qemu_rearm_alarm_timer(alarm_timer);
> +        qemu_rearm_alarm_timers();
>      }
>  }
>  
> @@ -355,6 +405,8 @@ void qemu_mod_timer_ns(QEMUTimer *ts, int64_t expire_time)
>  {
>      QEMUClock *clock = ts->clock;
>      QEMUTimer **pt, *t;
> +    AioContext *ctx = *tls_get_thread_aio_context();
> +    struct qemu_alarm_timer *alarm_timer = ctx->alarm_timer;
>  
>      qemu_del_timer(ts);
>  
> @@ -485,6 +537,8 @@ uint64_t qemu_timer_expire_time_ns(QEMUTimer *ts)
>  void qemu_run_all_timers(void)
>  {
>      bool timer_modified;
> +    AioContext *ctx = *tls_get_thread_aio_context();
> +    struct qemu_alarm_timer *alarm_timer = ctx->alarm_timer;
>  
>      alarm_timer->pending = false;
>  
> @@ -515,13 +569,15 @@ static void CALLBACK host_alarm_handler(PVOID lpParam, BOOLEAN unused)
>  static void host_alarm_handler(int host_signum)
>  #endif
>  {
> -    struct qemu_alarm_timer *t = alarm_timer;
> +    AioContext *ctx = *tls_get_thread_aio_context();
> +    struct qemu_alarm_timer *t = ctx->alarm_timer;
> +
>      if (!t)
>  	return;
>  
>      t->expired = true;
>      t->pending = true;
> -    qemu_notify_event();
> +    aio_notify(ctx);
>  }
>  
>  #if defined(__linux__)
> @@ -774,37 +830,30 @@ static void win32_rearm_timer(struct qemu_alarm_timer *t,
>  
>  #endif /* _WIN32 */
>  
> -static void quit_timers(void)
> -{
> -    struct qemu_alarm_timer *t = alarm_timer;
> -    alarm_timer = NULL;
> -    t->stop(t);
> -}
> -
>  #ifdef CONFIG_POSIX
>  static void reinit_timers(void)
>  {
> -    struct qemu_alarm_timer *t = alarm_timer;
> -    t->stop(t);
> -    if (t->start(t)) {
> -        fprintf(stderr, "Internal timer error: aborting\n");
> -        exit(1);
> +    struct qemu_alarm_timer *t;
> +
> +    qemu_mutex_lock(&alarm_timer_list_lock);
> +    QSLIST_FOREACH(t, &alarm_timer_list, next_alarm_timer) {
> +        t->stop(t);
> +        if (t->start(t)) {
> +            fprintf(stderr, "Internal timer error: aborting\n");
> +            exit(1);
> +        }
> +        qemu_rearm_alarm_timer(t);
>      }
> -    qemu_rearm_alarm_timer(t);
> +    qemu_mutex_unlock(&alarm_timer_list_lock);
>  }
>  #endif /* CONFIG_POSIX */
>  
> -int init_timer_alarm(void)
> +int init_timer_alarm(struct qemu_alarm_timer *t)
>  {
> -    struct qemu_alarm_timer *t = NULL;
>      int i, err = -1;
>  
> -    if (alarm_timer) {
> -        return 0;
> -    }
> -
>      for (i = 0; alarm_timers[i].name; i++) {
> -        t = &alarm_timers[i];
> +        *t = alarm_timers[i];
>  
>          err = t->start(t);
>          if (!err)
> @@ -818,14 +867,15 @@ int init_timer_alarm(void)
>  
>      qemu_mutex_init(&t->timer_modified_lock);
>  
> -    atexit(quit_timers);
>  #ifdef CONFIG_POSIX
>      pthread_atfork(NULL, NULL, reinit_timers);
>  #endif
> -    alarm_timer = t;
>      return 0;
>  
>  fail:
> +    fprintf(stderr, "could not initialize alarm timer\n");
> +    exit(1);
> +
>      return err;
>  }
>  
> 

This goes into a similar direction like what I have here locally.
However, as said in a different thread, you must not remove the main
alarm timers from the main loop, rather just add the option for
additional alarm timers so that AIO etc. could define their private ones.

Jan

-- 
Siemens AG, Corporate Technology, CT RTC ITP SES-DE
Corporate Competence Center Embedded Linux

^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: [Qemu-devel] [RFC 0/8] arm AioContext with its own timer stuff
  2013-07-22  4:38   ` liu ping fan
  2013-07-22  6:28     ` Jan Kiszka
@ 2013-07-22  9:40     ` Alex Bligh
  2013-07-22 10:18       ` liu ping fan
  1 sibling, 1 reply; 63+ messages in thread
From: Alex Bligh @ 2013-07-22  9:40 UTC (permalink / raw)
  To: liu ping fan
  Cc: Kevin Wolf, Stefan Hajnoczi, Alex Bligh, Jan Kiszka, qemu-devel,
	Anthony Liguori, Paolo Bonzini

Liu,

--On 22 July 2013 12:38:02 +0800 liu ping fan <qemulist@gmail.com> wrote:

> I read your second series, and try to summary the main different between
> us. Please correct me, if I misunderstood something.
> --1st. You try to create a separate QemuClock for AioContext.
>     I think QemuClock is the clock event source and we have three
> classic with fine definition. They should be qemu-wide for time
> measurement.  On the other handler, timer is  a concept for timeout,
> so it can be AioContext-related. So I have patch2&5.

Yes and no. QEMUClock is not a clock source. QEMUClock /has/ a clock
source, and more than one QEMUClock can share the same clock source.

QEMUClock does have its own list of timers, and so if you only
want to run a subset of timers in aio_poll (which is I believe the
requirement so not to change the behaviour of existing timers),
you need a separate QEMUClock.

The approach I took (StefanH's idea) was to put a QEMUClock into
each AioContext. Arguably you might want to put a set of 3 QEMUClock's
into each AioContext, one for each clock source.

QEMUClock itself is very lightweight as far as I can tell.

> --2nd. You want to substitute alarm_timer with timeout of poll.

Correct.

>     I try to trigger each specified thread when its deadline comes.
> But unfortunately, the signal can not be delivered to the specified
> thread directly, and I need timerfd for each AioContext. (If we can
> have the equivalent on other platform)

Firstly, I can't see the advantage of keeping the alarm_timer stuff around
at all if we can delete it. Save, of course, that on systems that don't
have ppoll or equivalent you lose sub-millisecond timing by deleting them.

Secondly, I don't understand this point. Let's assume we keep alarm_timers.
The timers run from signals at the moment (under POSIX) and could easily
run in their own thread. They synchronise with the thread waiting on poll
by using the event notifiers. Why do you need to have multiple threads
dealing with timers? At most one new thread (and quite possibly zero)
should be sufficient, as all they need to do is write() to the correct
notifier FD, which will end the relevant poll. Of course if we delete
alarm_timers this is all irrelevant.

-- 
Alex Bligh

^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: [Qemu-devel] [RFC 0/8] arm AioContext with its own timer stuff
  2013-07-22  9:40     ` Alex Bligh
@ 2013-07-22 10:18       ` liu ping fan
  2013-07-23  2:53         ` liu ping fan
  2013-07-25 11:47         ` Stefan Hajnoczi
  0 siblings, 2 replies; 63+ messages in thread
From: liu ping fan @ 2013-07-22 10:18 UTC (permalink / raw)
  To: Alex Bligh
  Cc: Kevin Wolf, Stefan Hajnoczi, Jan Kiszka, qemu-devel,
	Anthony Liguori, Paolo Bonzini

On Mon, Jul 22, 2013 at 5:40 PM, Alex Bligh <alex@alex.org.uk> wrote:
> Liu,
>
>
> --On 22 July 2013 12:38:02 +0800 liu ping fan <qemulist@gmail.com> wrote:
>
>> I read your second series, and try to summary the main different between
>> us. Please correct me, if I misunderstood something.
>> --1st. You try to create a separate QemuClock for AioContext.
>>     I think QemuClock is the clock event source and we have three
>> classic with fine definition. They should be qemu-wide for time
>> measurement.  On the other handler, timer is  a concept for timeout,
>> so it can be AioContext-related. So I have patch2&5.
>
>
> Yes and no. QEMUClock is not a clock source. QEMUClock /has/ a clock
> source, and more than one QEMUClock can share the same clock source.
>
Supposed the case sync the time across iothread/vcpu/dataplane, the
vm_clock is a unique interface for all of them.

> QEMUClock does have its own list of timers, and so if you only
> want to run a subset of timers in aio_poll (which is I believe the
> requirement so not to change the behaviour of existing timers),
> you need a separate QEMUClock.
>
> The approach I took (StefanH's idea) was to put a QEMUClock into
> each AioContext. Arguably you might want to put a set of 3 QEMUClock's
> into each AioContext, one for each clock source.
>
> QEMUClock itself is very lightweight as far as I can tell.
>
I think the weight is not the point here, rather, the concept.
>
>> --2nd. You want to substitute alarm_timer with timeout of poll.
>
>
> Correct.
>
>
>>     I try to trigger each specified thread when its deadline comes.
>> But unfortunately, the signal can not be delivered to the specified
>> thread directly, and I need timerfd for each AioContext. (If we can
>> have the equivalent on other platform)
>
>
> Firstly, I can't see the advantage of keeping the alarm_timer stuff around
> at all if we can delete it. Save, of course, that on systems that don't
> have ppoll or equivalent you lose sub-millisecond timing by deleting them.
>
I do not think it very clearly. But the final aim is to associate an
eventfd with deadline, and attach the eventfd to g_poll.  For linux,
it can be easily achieved by timerfd,  for other posix, what I can
thought till now is to distinguish the source of timer in signal
handler, and revoke a eventfd (stands for deadline) on AioContext.
I am not sure whether this is better than timeout of poll.

> Secondly, I don't understand this point. Let's assume we keep alarm_timers.
> The timers run from signals at the moment (under POSIX) and could easily
> run in their own thread. They synchronise with the thread waiting on poll
> by using the event notifiers. Why do you need to have multiple threads
> dealing with timers? At most one new thread (and quite possibly zero)
> should be sufficient, as all they need to do is write() to the correct
> notifier FD, which will end the relevant poll. Of course if we delete

Different AioContext may have different deadline, so I separate the alarm_timers

Regards,
Pingfan
> alarm_timers this is all irrelevant.
>
> --
> Alex Bligh

^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: [Qemu-devel] [Qemu-trivial] [RFC 3/8] timer: make timers_state static
  2013-07-22  6:36   ` Jan Kiszka
@ 2013-07-22 17:40     ` Michael Tokarev
  0 siblings, 0 replies; 63+ messages in thread
From: Michael Tokarev @ 2013-07-22 17:40 UTC (permalink / raw)
  To: Jan Kiszka
  Cc: Kevin Wolf, Stefan Hajnoczi, qemu-trivial, Liu Ping Fan,
	qemu-devel, Alex Bligh, Anthony Liguori, Paolo Bonzini

22.07.2013 10:36, Jan Kiszka wrote:
> On 2013-07-21 10:43, Liu Ping Fan wrote:
>> -TimersState timers_state;
>> +static TimersState timers_state;

> Could go in directly via trivial IMHO.

It should not conflict with anything else.

Thanks, applied to the trivial patches queue.

/mjt

^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: [Qemu-devel] [RFC 0/8] arm AioContext with its own timer stuff
  2013-07-22  6:28     ` Jan Kiszka
@ 2013-07-23  2:51       ` liu ping fan
  2013-07-25 11:44         ` Stefan Hajnoczi
  0 siblings, 1 reply; 63+ messages in thread
From: liu ping fan @ 2013-07-23  2:51 UTC (permalink / raw)
  To: Jan Kiszka
  Cc: Kevin Wolf, Stefan Hajnoczi, Alex Bligh, qemu-devel,
	Anthony Liguori, Paolo Bonzini

On Mon, Jul 22, 2013 at 2:28 PM, Jan Kiszka <jan.kiszka@siemens.com> wrote:
> On 2013-07-22 06:38, liu ping fan wrote:
>> On Sun, Jul 21, 2013 at 5:53 PM, Alex Bligh <alex@alex.org.uk> wrote:
>>> Liu,
>>>
>>>
>>> --On 21 July 2013 16:42:57 +0800 Liu Ping Fan <qemulist@gmail.com> wrote:
>>>
>>>> Currently, the timers run on iothread within BQL, so virtio-block
>>>> dataplane can not use throttle, as Stefan Hajnoczi pointed out in his
>>>> patches to port dataplane onto block layer.(Thanks, Stefan) To enable
>>>> this feature, I plan to enable timers to run on AioContext's thread. And
>>>> maybe in future, hpet can run with its dedicated thread too.
>>>>
>>>> Also, I see Alex Bligh is on the same effort by another method,(it is a
>>>> good idea)    "[RFC] aio/async: Add timed bottom-halves".
>>>
>>>
>>> Stefan & Paolo did not like that method much, so I did a third method
>>> (posted yesterday) suggested by Stefan which adds a clock to AioContext (to
>>> which timers can be attached), deletes ALL the alarm_timer stuff (which was
>>> very cathartic), uses timeouts on the g_poll, and adds ppoll where this is
>>> available. Series at:
>>>  http://lists.nongnu.org/archive/html/qemu-devel/2013-07/msg03334.html
>>>
>>> I suspect this also overlaps with your code.
>>>
>>> So now we have 3 methods to do similar things!
>>>
>>> One advantage of my approach is that it removes more code than it adds
>>> (by quite a margin). However, alarm timers could have been left in.
>>> What's the advantage in giving an AioContext its own alarm timer as
>>> opposed to just its own clock?
>>>
>> I read your second series, and try to summary the main different between us.
>> Please correct me, if I misunderstood something.
>> --1st. You try to create a separate QemuClock for AioContext.
>>     I think QemuClock is the clock event source and we have three
>> classic with fine definition. They should be qemu-wide for time
>> measurement.  On the other handler, timer is  a concept for timeout,
>
> Timers, as used in QEMU, are not only for "unimportant" and
> unlikely-to-fire timeouts. They are also for potential high-rate, high

Sorry, but can not catch the point.  QemuTimer is used to trigger
something, so the process can be delayed for some reason. For high
resolution events, do we go directly to QEMUClock, which fall back on
host's hpet? And the use case is just for get the time stamp, NOT for
trigger?

Thx&regards,
Pingfan

> resolution events. Your last series neglects this. We may need two
> versions of timers - or one interface that caters both use cases properly.
>
> Jan
>
>> so it can be AioContext-related. So I have patch2&5.
>> --2nd. You want to substitute alarm_timer with timeout of poll.
>>     I try to trigger each specified thread when its deadline comes.
>> But unfortunately, the signal can not be delivered to the specified
>> thread directly, and I need timerfd for each AioContext. (If we can
>> have the equivalent on other platform)
>>
>> Anything else?
>>
>> Thanks and regards,
>> Pingfan
>>
>>> --
>>> Alex Bligh
>
> --
> Siemens AG, Corporate Technology, CT RTC ITP SES-DE
> Corporate Competence Center Embedded Linux

^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: [Qemu-devel] [RFC 0/8] arm AioContext with its own timer stuff
  2013-07-22 10:18       ` liu ping fan
@ 2013-07-23  2:53         ` liu ping fan
  2013-07-23 10:30           ` Paolo Bonzini
  2013-07-23 14:21           ` [Qemu-devel] [RFC 0/8] arm AioContext with its own timer stuff Alex Bligh
  2013-07-25 11:47         ` Stefan Hajnoczi
  1 sibling, 2 replies; 63+ messages in thread
From: liu ping fan @ 2013-07-23  2:53 UTC (permalink / raw)
  To: Alex Bligh
  Cc: Kevin Wolf, Stefan Hajnoczi, Jan Kiszka, qemu-devel,
	Anthony Liguori, Paolo Bonzini

On Mon, Jul 22, 2013 at 6:18 PM, liu ping fan <qemulist@gmail.com> wrote:
> On Mon, Jul 22, 2013 at 5:40 PM, Alex Bligh <alex@alex.org.uk> wrote:
>> Liu,
>>
>>
>> --On 22 July 2013 12:38:02 +0800 liu ping fan <qemulist@gmail.com> wrote:
>>
>>> I read your second series, and try to summary the main different between
>>> us. Please correct me, if I misunderstood something.
>>> --1st. You try to create a separate QemuClock for AioContext.
>>>     I think QemuClock is the clock event source and we have three
>>> classic with fine definition. They should be qemu-wide for time
>>> measurement.  On the other handler, timer is  a concept for timeout,
>>> so it can be AioContext-related. So I have patch2&5.
>>
>>
>> Yes and no. QEMUClock is not a clock source. QEMUClock /has/ a clock
>> source, and more than one QEMUClock can share the same clock source.
>>
> Supposed the case sync the time across iothread/vcpu/dataplane, the
> vm_clock is a unique interface for all of them.
>
>> QEMUClock does have its own list of timers, and so if you only
>> want to run a subset of timers in aio_poll (which is I believe the
>> requirement so not to change the behaviour of existing timers),
>> you need a separate QEMUClock.
>>
>> The approach I took (StefanH's idea) was to put a QEMUClock into
>> each AioContext. Arguably you might want to put a set of 3 QEMUClock's
>> into each AioContext, one for each clock source.
>>
>> QEMUClock itself is very lightweight as far as I can tell.
>>
> I think the weight is not the point here, rather, the concept.
>>
>>> --2nd. You want to substitute alarm_timer with timeout of poll.
>>
>>
>> Correct.
>>
>>
>>>     I try to trigger each specified thread when its deadline comes.
>>> But unfortunately, the signal can not be delivered to the specified
>>> thread directly, and I need timerfd for each AioContext. (If we can
>>> have the equivalent on other platform)
>>
>>
>> Firstly, I can't see the advantage of keeping the alarm_timer stuff around
>> at all if we can delete it. Save, of course, that on systems that don't
>> have ppoll or equivalent you lose sub-millisecond timing by deleting them.
>>
The scenior I can figure out is if adopting timeout of poll, then when
changing the deadline, we need to invoke poll, and set the new
timeout, right?

Regards,
Pingfan
> I do not think it very clearly. But the final aim is to associate an
> eventfd with deadline, and attach the eventfd to g_poll.  For linux,
> it can be easily achieved by timerfd,  for other posix, what I can
> thought till now is to distinguish the source of timer in signal
> handler, and revoke a eventfd (stands for deadline) on AioContext.
> I am not sure whether this is better than timeout of poll.
>
>> Secondly, I don't understand this point. Let's assume we keep alarm_timers.
>> The timers run from signals at the moment (under POSIX) and could easily
>> run in their own thread. They synchronise with the thread waiting on poll
>> by using the event notifiers. Why do you need to have multiple threads
>> dealing with timers? At most one new thread (and quite possibly zero)
>> should be sufficient, as all they need to do is write() to the correct
>> notifier FD, which will end the relevant poll. Of course if we delete
>
> Different AioContext may have different deadline, so I separate the alarm_timers
>
> Regards,
> Pingfan
>> alarm_timers this is all irrelevant.
>>
>> --
>> Alex Bligh

^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: [Qemu-devel] [RFC 6/8] timer: run timers on aio_poll
  2013-07-21  9:55   ` Alex Bligh
@ 2013-07-23  2:56     ` liu ping fan
  2013-07-23 14:22       ` Alex Bligh
  0 siblings, 1 reply; 63+ messages in thread
From: liu ping fan @ 2013-07-23  2:56 UTC (permalink / raw)
  To: Alex Bligh
  Cc: Kevin Wolf, Anthony Liguori, Jan Kiszka, qemu-devel,
	Stefan Hajnoczi, Paolo Bonzini

On Sun, Jul 21, 2013 at 5:55 PM, Alex Bligh <alex@alex.org.uk> wrote:
>
>
> --On 21 July 2013 16:43:03 +0800 Liu Ping Fan <qemulist@gmail.com> wrote:
>
>> diff --git a/aio-posix.c b/aio-posix.c
>> index b68eccd..29c2769 100644
>> --- a/aio-posix.c
>> +++ b/aio-posix.c
>> @@ -191,6 +191,8 @@ bool aio_poll(AioContext *ctx, bool blocking)
>>          progress = true;
>>      }
>>
>> +    qemu_run_all_timers();
>> +
>>      if (progress && !blocking) {
>>          return true;
>>      }
>
>
> I am told (by Stefan H) this approach is unsafe as existing timers may
> not expect to be run within aio_poll.
>
Have not figure out the reason, could you elaborate ?

Thanks
> Also, I suspect you need to change the value of progress if timers
> run so bdrv draining terminates properly.
>
Yes, you are right
> --
> Alex Bligh

^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: [Qemu-devel] [RFC 0/8] arm AioContext with its own timer stuff
  2013-07-23  2:53         ` liu ping fan
@ 2013-07-23 10:30           ` Paolo Bonzini
  2013-07-24  1:28             ` liu ping fan
  2013-07-23 14:21           ` [Qemu-devel] [RFC 0/8] arm AioContext with its own timer stuff Alex Bligh
  1 sibling, 1 reply; 63+ messages in thread
From: Paolo Bonzini @ 2013-07-23 10:30 UTC (permalink / raw)
  To: liu ping fan
  Cc: Kevin Wolf, Stefan Hajnoczi, Alex Bligh, Jan Kiszka, qemu-devel,
	Anthony Liguori

Il 23/07/2013 04:53, liu ping fan ha scritto:
> The scenior I can figure out is if adopting timeout of poll, then when
> changing the deadline, we need to invoke poll, and set the new
> timeout, right?

Yes, you need to call aio_notify so that poll is reinvoked.

Paolo

^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: [Qemu-devel] [RFC 0/8] arm AioContext with its own timer stuff
  2013-07-23  2:53         ` liu ping fan
  2013-07-23 10:30           ` Paolo Bonzini
@ 2013-07-23 14:21           ` Alex Bligh
  1 sibling, 0 replies; 63+ messages in thread
From: Alex Bligh @ 2013-07-23 14:21 UTC (permalink / raw)
  To: liu ping fan
  Cc: Kevin Wolf, Stefan Hajnoczi, Alex Bligh, Jan Kiszka, qemu-devel,
	Anthony Liguori, Paolo Bonzini



--On 23 July 2013 10:53:26 +0800 liu ping fan <qemulist@gmail.com> wrote:

>>> Firstly, I can't see the advantage of keeping the alarm_timer stuff
>>> around at all if we can delete it. Save, of course, that on systems
>>> that don't have ppoll or equivalent you lose sub-millisecond timing by
>>> deleting them.
>>>
> The scenior I can figure out is if adopting timeout of poll, then when
> changing the deadline, we need to invoke poll, and set the new
> timeout, right?

Are you asking how I deal with modification of the timeout when the
poll is already running? If so, I use the event notifier to end the
current poll. It will then (presumably) repoll with a recalculated
timeout.

-- 
Alex Bligh

^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: [Qemu-devel] [RFC 6/8] timer: run timers on aio_poll
  2013-07-23  2:56     ` liu ping fan
@ 2013-07-23 14:22       ` Alex Bligh
  0 siblings, 0 replies; 63+ messages in thread
From: Alex Bligh @ 2013-07-23 14:22 UTC (permalink / raw)
  To: liu ping fan
  Cc: Kevin Wolf, Anthony Liguori, Alex Bligh, Jan Kiszka, qemu-devel,
	Stefan Hajnoczi, Paolo Bonzini

Liu,

--On 23 July 2013 10:56:42 +0800 liu ping fan <qemulist@gmail.com> wrote:

>> I am told (by Stefan H) this approach is unsafe as existing timers may
>> not expect to be run within aio_poll.
>>
> Have not figure out the reason, could you elaborate ?

I asked Stefan H whether we could guarantee that it was safe to
call any timer routine within aio_poll. He said no, because they
might (reasonably) assume they were only being called from the
main loop. Therefore we'd have to audit any existing user. I suppose
an example of something unsafe might be calling aio_poll from inside
a timer routine.

Stefan thus recommended using a separate QEMUClock, and only running
that clock's timers.

-- 
Alex Bligh

^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: [Qemu-devel] [RFC 0/8] arm AioContext with its own timer stuff
  2013-07-23 10:30           ` Paolo Bonzini
@ 2013-07-24  1:28             ` liu ping fan
  2013-07-24  6:42               ` Paolo Bonzini
  0 siblings, 1 reply; 63+ messages in thread
From: liu ping fan @ 2013-07-24  1:28 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: Kevin Wolf, Stefan Hajnoczi, Alex Bligh, Jan Kiszka, qemu-devel,
	Anthony Liguori

On Tue, Jul 23, 2013 at 6:30 PM, Paolo Bonzini <pbonzini@redhat.com> wrote:
> Il 23/07/2013 04:53, liu ping fan ha scritto:
>> The scenior I can figure out is if adopting timeout of poll, then when
>> changing the deadline, we need to invoke poll, and set the new
>> timeout, right?
>
> Yes, you need to call aio_notify so that poll is reinvoked.
>
I try to list the difference between alarm_timer and timeout of poll.
It includes thread-affinity, resolution and easy-use.

Most of all,  thread-affinity
The main issue with alarm timer is the affinity of timer_t with
threads. For linux, SIGEV_THREAD_ID has been supported for a very long
time and we already associate the signal with the specified thread. So
the only issue is left for other unix, we can emulate the affinity by
using SIGEV_THREAD and repost the event to the specified thread.
As to timeout of poll, it has the affinity of threads.

Resolution:
alarm_timer provides higher resolution, but do we care about it?

easy-use:
The reset of the deadline as mentioned.

Finally, I admit timeout of poll will save large chunk of platform-related code.

Thanks and regards,
Pingfan

> Paolo

^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: [Qemu-devel] [RFC 0/8] arm AioContext with its own timer stuff
  2013-07-24  1:28             ` liu ping fan
@ 2013-07-24  6:42               ` Paolo Bonzini
  2013-07-24  7:31                 ` Alex Bligh
  2013-07-24  7:43                 ` liu ping fan
  0 siblings, 2 replies; 63+ messages in thread
From: Paolo Bonzini @ 2013-07-24  6:42 UTC (permalink / raw)
  To: liu ping fan
  Cc: Kevin Wolf, Stefan Hajnoczi, Alex Bligh, Jan Kiszka, qemu-devel,
	Anthony Liguori

Il 24/07/2013 03:28, liu ping fan ha scritto:
> On Tue, Jul 23, 2013 at 6:30 PM, Paolo Bonzini <pbonzini@redhat.com> wrote:
>> > Il 23/07/2013 04:53, liu ping fan ha scritto:
>>> >> The scenior I can figure out is if adopting timeout of poll, then when
>>> >> changing the deadline, we need to invoke poll, and set the new
>>> >> timeout, right?
>> >
>> > Yes, you need to call aio_notify so that poll is reinvoked.
>> >
> I try to list the difference between alarm_timer and timeout of poll.
> It includes thread-affinity, resolution and easy-use.
> 
> Most of all,  thread-affinity
> The main issue with alarm timer is the affinity of timer_t with
> threads. For linux, SIGEV_THREAD_ID has been supported for a very long
> time and we already associate the signal with the specified thread. So
> the only issue is left for other unix, we can emulate the affinity by
> using SIGEV_THREAD and repost the event to the specified thread.
> As to timeout of poll, it has the affinity of threads.
> 
> Resolution:
> alarm_timer provides higher resolution, but do we care about it?

With ppoll, is this true or just hearsay?

(Without ppoll, indeed setitimer has 1 us resolution while poll has 1
ms; too bad that select has other problems, because select has also 1 us
resolution).

Paolo

> easy-use:
> The reset of the deadline as mentioned.
> 
> Finally, I admit timeout of poll will save large chunk of platform-related code.

^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: [Qemu-devel] [RFC 0/8] arm AioContext with its own timer stuff
  2013-07-24  6:42               ` Paolo Bonzini
@ 2013-07-24  7:31                 ` Alex Bligh
  2013-07-24  7:43                   ` Paolo Bonzini
  2013-07-24  7:43                 ` liu ping fan
  1 sibling, 1 reply; 63+ messages in thread
From: Alex Bligh @ 2013-07-24  7:31 UTC (permalink / raw)
  To: Paolo Bonzini, liu ping fan
  Cc: Kevin Wolf, Alex Bligh, Jan Kiszka, Stefan Hajnoczi, qemu-devel,
	Anthony Liguori



--On 24 July 2013 08:42:26 +0200 Paolo Bonzini <pbonzini@redhat.com> wrote:

> With ppoll, is this true or just hearsay?
>
> (Without ppoll, indeed setitimer has 1 us resolution while poll has 1
> ms; too bad that select has other problems, because select has also 1 us
> resolution).

Most 'reasonable' POSIX compliant operating systems have ppoll and I would
have thought there is /better/ resolution there than relying not only
on signal, but also a pipe or eventfd plus the underlying poll().

If it was my comments you are referring to, my concern was mainly about
Windows (which I know very little about), as there does not appear
to be a nanosecond or even microsecond alternative to
WaitForMultipleObjects. However, articles like this:

 
http://social.msdn.microsoft.com/Forums/vstudio/en-US/e8a7cb1e-9edd-4ee3-982e-f66b7bf6ae44/improve-accuracy-waitforsingleobject

suggest that WaitFor{Single,Multiple}Objects can have pretty
appalling latency anyway (100ms!), and there's no evidence that's
limited by making one of the FDs (or objects) ready. In these
circumstances, I'd question whether we gain anything by worrying
about timer resolution.

-- 
Alex Bligh

^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: [Qemu-devel] [RFC 0/8] arm AioContext with its own timer stuff
  2013-07-24  7:31                 ` Alex Bligh
@ 2013-07-24  7:43                   ` Paolo Bonzini
  2013-07-24  8:01                     ` Alex Bligh
  2013-07-24  8:30                     ` liu ping fan
  0 siblings, 2 replies; 63+ messages in thread
From: Paolo Bonzini @ 2013-07-24  7:43 UTC (permalink / raw)
  To: Alex Bligh
  Cc: Kevin Wolf, Stefan Hajnoczi, Jan Kiszka, liu ping fan,
	qemu-devel, Anthony Liguori

Il 24/07/2013 09:31, Alex Bligh ha scritto:
> 
> 
> --On 24 July 2013 08:42:26 +0200 Paolo Bonzini <pbonzini@redhat.com> wrote:
> 
>> With ppoll, is this true or just hearsay?
>>
>> (Without ppoll, indeed setitimer has 1 us resolution while poll has 1
>> ms; too bad that select has other problems, because select has also 1 us
>> resolution).
> 
> Most 'reasonable' POSIX compliant operating systems have ppoll

Really?  I could find no manpages for any of Solaris and *BSD.

> and I would
> have thought there is /better/ resolution there than relying not only
> on signal, but also a pipe or eventfd plus the underlying poll().

I agree.

> If it was my comments you are referring to

The message I replied to was Ping Fan's, but perhaps you brought it up
first.  I don't know.

> , my concern was mainly about
> Windows (which I know very little about), as there does not appear
> to be a nanosecond or even microsecond alternative to
> WaitForMultipleObjects. However, articles like this:
> 
> http://social.msdn.microsoft.com/Forums/vstudio/en-US/e8a7cb1e-9edd-4ee3-982e-f66b7bf6ae44/improve-accuracy-waitforsingleobject
> 
> suggest that WaitFor{Single,Multiple}Objects can have pretty
> appalling latency anyway (100ms!), and there's no evidence that's
> limited by making one of the FDs (or objects) ready.

... especially when making one of the FDs ready would likely have the
same latency in some internal Windows thread that implements timers.

> In these
> circumstances, I'd question whether we gain anything by worrying
> about timer resolution.

Part of it should be fixed by os_setup_early_signal_handling.

This is corroborated by the fact that without
os_setup_early_signal_handling Wine always works, and Windows breaks.

Paolo

^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: [Qemu-devel] [RFC 0/8] arm AioContext with its own timer stuff
  2013-07-24  6:42               ` Paolo Bonzini
  2013-07-24  7:31                 ` Alex Bligh
@ 2013-07-24  7:43                 ` liu ping fan
  2013-07-24  7:54                   ` Paolo Bonzini
  1 sibling, 1 reply; 63+ messages in thread
From: liu ping fan @ 2013-07-24  7:43 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: Kevin Wolf, Stefan Hajnoczi, Alex Bligh, Jan Kiszka, qemu-devel,
	Anthony Liguori

On Wed, Jul 24, 2013 at 2:42 PM, Paolo Bonzini <pbonzini@redhat.com> wrote:
> Il 24/07/2013 03:28, liu ping fan ha scritto:
>> On Tue, Jul 23, 2013 at 6:30 PM, Paolo Bonzini <pbonzini@redhat.com> wrote:
>>> > Il 23/07/2013 04:53, liu ping fan ha scritto:
>>>> >> The scenior I can figure out is if adopting timeout of poll, then when
>>>> >> changing the deadline, we need to invoke poll, and set the new
>>>> >> timeout, right?
>>> >
>>> > Yes, you need to call aio_notify so that poll is reinvoked.
>>> >
>> I try to list the difference between alarm_timer and timeout of poll.
>> It includes thread-affinity, resolution and easy-use.
>>
>> Most of all,  thread-affinity
>> The main issue with alarm timer is the affinity of timer_t with
>> threads. For linux, SIGEV_THREAD_ID has been supported for a very long
>> time and we already associate the signal with the specified thread. So
>> the only issue is left for other unix, we can emulate the affinity by
>> using SIGEV_THREAD and repost the event to the specified thread.
>> As to timeout of poll, it has the affinity of threads.
>>
>> Resolution:
>> alarm_timer provides higher resolution, but do we care about it?
>
> With ppoll, is this true or just hearsay?
>
> (Without ppoll, indeed setitimer has 1 us resolution while poll has 1
> ms; too bad that select has other problems, because select has also 1 us
> resolution).
>
Paid some time to dig the kernel code, and find out that the
resolution lost by timeout of poll/select..etc is cause by the timeout
is a slack region.
See code in
do_poll()
   if (!poll_schedule_timeout(wait, TASK_INTERRUPTIBLE, to, slack))

Notice the slack param, it causes the lose of resolution.
The process default slack time  inherits from init_task and it is
      .timer_slack_ns = 50000, /* 50 usec default slack */
But we can fix it by PR_SET_TIMERSLACK to decrease it for select/poll/...

So ppoll with timerslack adjustment will meet our requirement. But
what about other non-linux system?

Regards,
Pingfan

> Paolo
>
>> easy-use:
>> The reset of the deadline as mentioned.
>>
>> Finally, I admit timeout of poll will save large chunk of platform-related code.
>

^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: [Qemu-devel] [RFC 0/8] arm AioContext with its own timer stuff
  2013-07-24  7:43                 ` liu ping fan
@ 2013-07-24  7:54                   ` Paolo Bonzini
  2013-07-24  8:06                     ` Alex Bligh
  2013-07-24 14:46                     ` [Qemu-devel] [PATCHv2a] [RFC 8/7 (really)] Add prctl(PR_SET_TIMERSLACK, 1, ...) to reduce timer slack Alex Bligh
  0 siblings, 2 replies; 63+ messages in thread
From: Paolo Bonzini @ 2013-07-24  7:54 UTC (permalink / raw)
  To: liu ping fan
  Cc: Kevin Wolf, Alex Bligh, Jan Kiszka, Stefan Hajnoczi, qemu-devel,
	Anthony Liguori

Il 24/07/2013 09:43, liu ping fan ha scritto:
> Paid some time to dig the kernel code, and find out that the
> resolution lost by timeout of poll/select..etc is cause by the timeout
> is a slack region.
> See code in
> do_poll()
>    if (!poll_schedule_timeout(wait, TASK_INTERRUPTIBLE, to, slack))
> 
> Notice the slack param, it causes the lose of resolution.
> The process default slack time  inherits from init_task and it is
>       .timer_slack_ns = 50000, /* 50 usec default slack */
> But we can fix it by PR_SET_TIMERSLACK to decrease it for select/poll/...

Right, good catch.  I just learnt about PR_SET_TIMERSLACK. :)

Alex, can you add it to your series?  (Note that you must set a timer
slack of 1, because 0 is interpreted as "default").

> So ppoll with timerslack adjustment will meet our requirement. But
> what about other non-linux system?

They might have their own mechanism similar to PR_SET_TIMERSLACK.

Paolo

^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: [Qemu-devel] [RFC 0/8] arm AioContext with its own timer stuff
  2013-07-24  7:43                   ` Paolo Bonzini
@ 2013-07-24  8:01                     ` Alex Bligh
  2013-07-24  8:19                       ` Paolo Bonzini
  2013-07-24  8:37                       ` Alex Bligh
  2013-07-24  8:30                     ` liu ping fan
  1 sibling, 2 replies; 63+ messages in thread
From: Alex Bligh @ 2013-07-24  8:01 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: Kevin Wolf, Stefan Hajnoczi, Alex Bligh, Jan Kiszka,
	liu ping fan, qemu-devel, Anthony Liguori

Paolo,

--On 24 July 2013 09:43:28 +0200 Paolo Bonzini <pbonzini@redhat.com> wrote:

>> Most 'reasonable' POSIX compliant operating systems have ppoll
>
> Really?  I could find no manpages for any of Solaris and *BSD.

OK I shall (re)research that then! I suppose select() / pselect() is
an alternative when there are few FDs.

>> , my concern was mainly about
>> Windows (which I know very little about), as there does not appear
>> to be a nanosecond or even microsecond alternative to
>> WaitForMultipleObjects. However, articles like this:
>>
>> http://social.msdn.microsoft.com/Forums/vstudio/en-US/e8a7cb1e-9edd-4ee3
>> -982e-f66b7bf6ae44/improve-accuracy-waitforsingleobject
>>
>> suggest that WaitFor{Single,Multiple}Objects can have pretty
>> appalling latency anyway (100ms!), and there's no evidence that's
>> limited by making one of the FDs (or objects) ready.
>
> ... especially when making one of the FDs ready would likely have the
> same latency in some internal Windows thread that implements timers.
>
>> In these
>> circumstances, I'd question whether we gain anything by worrying
>> about timer resolution.
>
> Part of it should be fixed by os_setup_early_signal_handling.
>
> This is corroborated by the fact that without
> os_setup_early_signal_handling Wine always works, and Windows breaks.

This:
  http://www.windowstimestamp.com/description
suggests that whilst WaitForMultipleEvents has a millisecond timeout, one 
can (see section 3.2) use these to wait for an object which is itself a 
timer and expires with - in this case - 100ns resolution which is probably 
enough.

Again I know nothing about Windows so this may be completely wrong.

-- 
Alex Bligh

^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: [Qemu-devel] [RFC 0/8] arm AioContext with its own timer stuff
  2013-07-24  7:54                   ` Paolo Bonzini
@ 2013-07-24  8:06                     ` Alex Bligh
  2013-07-24 14:46                     ` [Qemu-devel] [PATCHv2a] [RFC 8/7 (really)] Add prctl(PR_SET_TIMERSLACK, 1, ...) to reduce timer slack Alex Bligh
  1 sibling, 0 replies; 63+ messages in thread
From: Alex Bligh @ 2013-07-24  8:06 UTC (permalink / raw)
  To: Paolo Bonzini, liu ping fan
  Cc: Kevin Wolf, Alex Bligh, Jan Kiszka, Stefan Hajnoczi, qemu-devel,
	Anthony Liguori

Paolo,

--On 24 July 2013 09:54:57 +0200 Paolo Bonzini <pbonzini@redhat.com> wrote:

> Alex, can you add it to your series?  (Note that you must set a timer
> slack of 1, because 0 is interpreted as "default").

Sure, will do. I'm guessing I'll have to look for that inside configure
as well.

-- 
Alex Bligh

^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: [Qemu-devel] [RFC 0/8] arm AioContext with its own timer stuff
  2013-07-24  8:01                     ` Alex Bligh
@ 2013-07-24  8:19                       ` Paolo Bonzini
  2013-07-24  8:37                       ` Alex Bligh
  1 sibling, 0 replies; 63+ messages in thread
From: Paolo Bonzini @ 2013-07-24  8:19 UTC (permalink / raw)
  To: Alex Bligh
  Cc: Kevin Wolf, Stefan Hajnoczi, Jan Kiszka, liu ping fan,
	qemu-devel, Anthony Liguori

Il 24/07/2013 10:01, Alex Bligh ha scritto:
>>>
>>
>> Part of it should be fixed by os_setup_early_signal_handling.
>>
>> This is corroborated by the fact that without
>> os_setup_early_signal_handling Wine always works, and Windows breaks.
> 
> This:
>  http://www.windowstimestamp.com/description
> suggests that whilst WaitForMultipleEvents has a millisecond timeout,
> one can (see section 3.2) use these to wait for an object which is
> itself a timer and expires with - in this case - 100ns resolution which
> is probably enough.
> 
> Again I know nothing about Windows so this may be completely wrong.

This is roughly what the alarm timer code does on Windows.  I also don't
know much about the internals, I wouldn't worry too much.

Paolo

^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: [Qemu-devel] [RFC 0/8] arm AioContext with its own timer stuff
  2013-07-24  7:43                   ` Paolo Bonzini
  2013-07-24  8:01                     ` Alex Bligh
@ 2013-07-24  8:30                     ` liu ping fan
  1 sibling, 0 replies; 63+ messages in thread
From: liu ping fan @ 2013-07-24  8:30 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: Kevin Wolf, Stefan Hajnoczi, Alex Bligh, Jan Kiszka, qemu-devel,
	Anthony Liguori

[...]
>> http://social.msdn.microsoft.com/Forums/vstudio/en-US/e8a7cb1e-9edd-4ee3-982e-f66b7bf6ae44/improve-accuracy-waitforsingleobject
>>
>> suggest that WaitFor{Single,Multiple}Objects can have pretty
>> appalling latency anyway (100ms!), and there's no evidence that's
>> limited by making one of the FDs (or objects) ready.
>
> ... especially when making one of the FDs ready would likely have the
> same latency in some internal Windows thread that implements timers.
>
>> In these
>> circumstances, I'd question whether we gain anything by worrying
>> about timer resolution.
>
Does hpet emulation care about it?

> Part of it should be fixed by os_setup_early_signal_handling.
>
> This is corroborated by the fact that without
> os_setup_early_signal_handling Wine always works, and Windows breaks.
>
> Paolo
>

^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: [Qemu-devel] [RFC 0/8] arm AioContext with its own timer stuff
  2013-07-24  8:01                     ` Alex Bligh
  2013-07-24  8:19                       ` Paolo Bonzini
@ 2013-07-24  8:37                       ` Alex Bligh
  2013-07-24 11:28                         ` Paolo Bonzini
  1 sibling, 1 reply; 63+ messages in thread
From: Alex Bligh @ 2013-07-24  8:37 UTC (permalink / raw)
  To: Alex Bligh, Paolo Bonzini
  Cc: Kevin Wolf, Stefan Hajnoczi, Jan Kiszka, liu ping fan,
	qemu-devel, Anthony Liguori



--On 24 July 2013 09:01:22 +0100 Alex Bligh <alex@alex.org.uk> wrote:

>>> Most 'reasonable' POSIX compliant operating systems have ppoll
>>
>> Really?  I could find no manpages for any of Solaris and *BSD.
>
> OK I shall (re)research that then! I suppose select() / pselect() is
> an alternative when there are few FDs.

Looks like I was wrong. However, pselect support is pretty wide.

-- 
Alex Bligh

^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: [Qemu-devel] [RFC 0/8] arm AioContext with its own timer stuff
  2013-07-24  8:37                       ` Alex Bligh
@ 2013-07-24 11:28                         ` Paolo Bonzini
  0 siblings, 0 replies; 63+ messages in thread
From: Paolo Bonzini @ 2013-07-24 11:28 UTC (permalink / raw)
  To: Alex Bligh
  Cc: Kevin Wolf, Stefan Hajnoczi, Jan Kiszka, liu ping fan,
	qemu-devel, Anthony Liguori

Il 24/07/2013 10:37, Alex Bligh ha scritto:
> 
> 
> --On 24 July 2013 09:01:22 +0100 Alex Bligh <alex@alex.org.uk> wrote:
> 
>>>> Most 'reasonable' POSIX compliant operating systems have ppoll
>>>
>>> Really?  I could find no manpages for any of Solaris and *BSD.
>>
>> OK I shall (re)research that then! I suppose select() / pselect() is
>> an alternative when there are few FDs.
> 
> Looks like I was wrong. However, pselect support is pretty wide.

Yes, on the other hand we only recently switched from select() to poll().

I guess using ms resolution wouldn't be too bad for non-Linux.  After
all before dynticks support was added to the alarm timer, it used to use
/dev/rtc or /dev/hpet -- which is very precise but only has ms
resolution too.

We might not care about the slack either, in practice.  TCG timing sucks
anyway, and for KVM/Xen most relevant device models are in the kernel.

Paolo

^ permalink raw reply	[flat|nested] 63+ messages in thread

* [Qemu-devel] [PATCHv2a] [RFC 8/7 (really)] Add prctl(PR_SET_TIMERSLACK, 1, ...) to reduce timer slack
  2013-07-24  7:54                   ` Paolo Bonzini
  2013-07-24  8:06                     ` Alex Bligh
@ 2013-07-24 14:46                     ` Alex Bligh
  1 sibling, 0 replies; 63+ messages in thread
From: Alex Bligh @ 2013-07-24 14:46 UTC (permalink / raw)
  To: qemu-devel
  Cc: Kevin Wolf, Anthony Liguori, Alex Bligh, Stefan Hajnoczi,
	Paolo Bonzini, rth

Where supported, called prctl(PR_SET_TIMERSLACK, 1, ...) to
set one nanosecond timer slack to increase precision of timer
calls.

Signed-off-by: Alex Bligh <alex@alex.org.uk>
---
[ Additional patch on the the end of the PATCHv2 series - I'll
resend if I have further comments with this reordered so it's
next to the ppoll introduction ]

 configure    |   18 ++++++++++++++++++
 qemu-timer.c |    7 +++++++
 2 files changed, 25 insertions(+)

diff --git a/configure b/configure
index b491c00..c8e39bc 100755
--- a/configure
+++ b/configure
@@ -2817,6 +2817,21 @@ if compile_prog "" "" ; then
   ppoll=yes
 fi
 
+# check for prctl(PR_SET_TIMERSLACK , ... ) support
+prctl_pr_set_timerslack=no
+cat > $TMPC << EOF
+#include <sys/prctl.h>
+
+int main(void)
+{
+    prctl(PR_SET_TIMERSLACK, 1, 0, 0, 0);
+    return 0;
+}
+EOF
+if compile_prog "" "" ; then
+  prctl_pr_set_timerslack=yes
+fi
+
 # check for epoll support
 epoll=no
 cat > $TMPC << EOF
@@ -3811,6 +3826,9 @@ fi
 if test "$ppoll" = "yes" ; then
   echo "CONFIG_PPOLL=y" >> $config_host_mak
 fi
+if test "$prctl_pr_set_timerslack" = "yes" ; then
+  echo "CONFIG_PRCTL_PR_SET_TIMERSLACK=y" >> $config_host_mak
+fi
 if test "$epoll" = "yes" ; then
   echo "CONFIG_EPOLL=y" >> $config_host_mak
 fi
diff --git a/qemu-timer.c b/qemu-timer.c
index 4cba055..53cb62c 100644
--- a/qemu-timer.c
+++ b/qemu-timer.c
@@ -37,6 +37,10 @@
 #include <poll.h>
 #endif
 
+#ifdef CONFIG_PRCTL_PR_SET_TIMERSLACK
+#include <sys/prctl.h>
+#endif
+
 /***********************************************************/
 /* timers */
 
@@ -367,6 +371,9 @@ void init_clocks(void)
         vm_clock = qemu_new_clock(QEMU_CLOCK_VIRTUAL);
         host_clock = qemu_new_clock(QEMU_CLOCK_HOST);
     }
+#ifdef CONFIG_PRCTL_PR_SET_TIMERSLACK
+    prctl(PR_SET_TIMERSLACK, 1, 0, 0, 0);
+#endif
 }
 
 uint64_t qemu_timer_expire_time_ns(QEMUTimer *ts)
-- 
1.7.9.5

^ permalink raw reply related	[flat|nested] 63+ messages in thread

* Re: [Qemu-devel] [RFC 0/8] arm AioContext with its own timer stuff
  2013-07-23  2:51       ` liu ping fan
@ 2013-07-25 11:44         ` Stefan Hajnoczi
  2013-07-25 12:01           ` Jan Kiszka
  0 siblings, 1 reply; 63+ messages in thread
From: Stefan Hajnoczi @ 2013-07-25 11:44 UTC (permalink / raw)
  To: liu ping fan
  Cc: Kevin Wolf, Alex Bligh, Jan Kiszka, Stefan Hajnoczi, qemu-devel,
	Anthony Liguori, Paolo Bonzini

On Tue, Jul 23, 2013 at 10:51:06AM +0800, liu ping fan wrote:
> On Mon, Jul 22, 2013 at 2:28 PM, Jan Kiszka <jan.kiszka@siemens.com> wrote:
> > On 2013-07-22 06:38, liu ping fan wrote:
> >> On Sun, Jul 21, 2013 at 5:53 PM, Alex Bligh <alex@alex.org.uk> wrote:
> >>> Liu,
> >>>
> >>>
> >>> --On 21 July 2013 16:42:57 +0800 Liu Ping Fan <qemulist@gmail.com> wrote:
> >>>
> >>>> Currently, the timers run on iothread within BQL, so virtio-block
> >>>> dataplane can not use throttle, as Stefan Hajnoczi pointed out in his
> >>>> patches to port dataplane onto block layer.(Thanks, Stefan) To enable
> >>>> this feature, I plan to enable timers to run on AioContext's thread. And
> >>>> maybe in future, hpet can run with its dedicated thread too.
> >>>>
> >>>> Also, I see Alex Bligh is on the same effort by another method,(it is a
> >>>> good idea)    "[RFC] aio/async: Add timed bottom-halves".
> >>>
> >>>
> >>> Stefan & Paolo did not like that method much, so I did a third method
> >>> (posted yesterday) suggested by Stefan which adds a clock to AioContext (to
> >>> which timers can be attached), deletes ALL the alarm_timer stuff (which was
> >>> very cathartic), uses timeouts on the g_poll, and adds ppoll where this is
> >>> available. Series at:
> >>>  http://lists.nongnu.org/archive/html/qemu-devel/2013-07/msg03334.html
> >>>
> >>> I suspect this also overlaps with your code.
> >>>
> >>> So now we have 3 methods to do similar things!
> >>>
> >>> One advantage of my approach is that it removes more code than it adds
> >>> (by quite a margin). However, alarm timers could have been left in.
> >>> What's the advantage in giving an AioContext its own alarm timer as
> >>> opposed to just its own clock?
> >>>
> >> I read your second series, and try to summary the main different between us.
> >> Please correct me, if I misunderstood something.
> >> --1st. You try to create a separate QemuClock for AioContext.
> >>     I think QemuClock is the clock event source and we have three
> >> classic with fine definition. They should be qemu-wide for time
> >> measurement.  On the other handler, timer is  a concept for timeout,
> >
> > Timers, as used in QEMU, are not only for "unimportant" and
> > unlikely-to-fire timeouts. They are also for potential high-rate, high
> 
> Sorry, but can not catch the point.  QemuTimer is used to trigger
> something, so the process can be delayed for some reason. For high
> resolution events, do we go directly to QEMUClock, which fall back on
> host's hpet? And the use case is just for get the time stamp, NOT for
> trigger?

It's also unclear to me what you mean, Jan.  Which approach exactly are
you unhappy with?

We always invoke timer callbacks from the main loop, there is no
high-frequency high-priority timer code path, so I'm not sure what the
issue is.

Stefan

^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: [Qemu-devel] [RFC 0/8] arm AioContext with its own timer stuff
  2013-07-22 10:18       ` liu ping fan
  2013-07-23  2:53         ` liu ping fan
@ 2013-07-25 11:47         ` Stefan Hajnoczi
  1 sibling, 0 replies; 63+ messages in thread
From: Stefan Hajnoczi @ 2013-07-25 11:47 UTC (permalink / raw)
  To: liu ping fan
  Cc: Kevin Wolf, Stefan Hajnoczi, Alex Bligh, Jan Kiszka, qemu-devel,
	Anthony Liguori, Paolo Bonzini

On Mon, Jul 22, 2013 at 06:18:03PM +0800, liu ping fan wrote:
> On Mon, Jul 22, 2013 at 5:40 PM, Alex Bligh <alex@alex.org.uk> wrote:
> > Liu,
> >
> >
> > --On 22 July 2013 12:38:02 +0800 liu ping fan <qemulist@gmail.com> wrote:
> >
> >> I read your second series, and try to summary the main different between
> >> us. Please correct me, if I misunderstood something.
> >> --1st. You try to create a separate QemuClock for AioContext.
> >>     I think QemuClock is the clock event source and we have three
> >> classic with fine definition. They should be qemu-wide for time
> >> measurement.  On the other handler, timer is  a concept for timeout,
> >> so it can be AioContext-related. So I have patch2&5.
> >
> >
> > Yes and no. QEMUClock is not a clock source. QEMUClock /has/ a clock
> > source, and more than one QEMUClock can share the same clock source.
> >
> Supposed the case sync the time across iothread/vcpu/dataplane, the
> vm_clock is a unique interface for all of them.
> 
> > QEMUClock does have its own list of timers, and so if you only
> > want to run a subset of timers in aio_poll (which is I believe the
> > requirement so not to change the behaviour of existing timers),
> > you need a separate QEMUClock.
> >
> > The approach I took (StefanH's idea) was to put a QEMUClock into
> > each AioContext. Arguably you might want to put a set of 3 QEMUClock's
> > into each AioContext, one for each clock source.
> >
> > QEMUClock itself is very lightweight as far as I can tell.
> >
> I think the weight is not the point here, rather, the concept.
> >
> >> --2nd. You want to substitute alarm_timer with timeout of poll.
> >
> >
> > Correct.
> >
> >
> >>     I try to trigger each specified thread when its deadline comes.
> >> But unfortunately, the signal can not be delivered to the specified
> >> thread directly, and I need timerfd for each AioContext. (If we can
> >> have the equivalent on other platform)
> >
> >
> > Firstly, I can't see the advantage of keeping the alarm_timer stuff around
> > at all if we can delete it. Save, of course, that on systems that don't
> > have ppoll or equivalent you lose sub-millisecond timing by deleting them.
> >
> I do not think it very clearly. But the final aim is to associate an
> eventfd with deadline, and attach the eventfd to g_poll.  For linux,
> it can be easily achieved by timerfd,  for other posix, what I can
> thought till now is to distinguish the source of timer in signal
> handler, and revoke a eventfd (stands for deadline) on AioContext.
> I am not sure whether this is better than timeout of poll.

Using eventfds for timeouts is more heavy-weight.  It means a file
descriptor per timeout and more system calls.   We already need to do
the poll(2) system call, we might as well make use of the timeout
argument.

Stefan

^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: [Qemu-devel] [RFC 0/8] arm AioContext with its own timer stuff
  2013-07-25 11:44         ` Stefan Hajnoczi
@ 2013-07-25 12:01           ` Jan Kiszka
  0 siblings, 0 replies; 63+ messages in thread
From: Jan Kiszka @ 2013-07-25 12:01 UTC (permalink / raw)
  To: Stefan Hajnoczi
  Cc: Kevin Wolf, Alex Bligh, Stefan Hajnoczi, liu ping fan,
	qemu-devel, Anthony Liguori, Paolo Bonzini

On 2013-07-25 13:44, Stefan Hajnoczi wrote:
> On Tue, Jul 23, 2013 at 10:51:06AM +0800, liu ping fan wrote:
>> On Mon, Jul 22, 2013 at 2:28 PM, Jan Kiszka <jan.kiszka@siemens.com> wrote:
>>> On 2013-07-22 06:38, liu ping fan wrote:
>>>> On Sun, Jul 21, 2013 at 5:53 PM, Alex Bligh <alex@alex.org.uk> wrote:
>>>>> Liu,
>>>>>
>>>>>
>>>>> --On 21 July 2013 16:42:57 +0800 Liu Ping Fan <qemulist@gmail.com> wrote:
>>>>>
>>>>>> Currently, the timers run on iothread within BQL, so virtio-block
>>>>>> dataplane can not use throttle, as Stefan Hajnoczi pointed out in his
>>>>>> patches to port dataplane onto block layer.(Thanks, Stefan) To enable
>>>>>> this feature, I plan to enable timers to run on AioContext's thread. And
>>>>>> maybe in future, hpet can run with its dedicated thread too.
>>>>>>
>>>>>> Also, I see Alex Bligh is on the same effort by another method,(it is a
>>>>>> good idea)    "[RFC] aio/async: Add timed bottom-halves".
>>>>>
>>>>>
>>>>> Stefan & Paolo did not like that method much, so I did a third method
>>>>> (posted yesterday) suggested by Stefan which adds a clock to AioContext (to
>>>>> which timers can be attached), deletes ALL the alarm_timer stuff (which was
>>>>> very cathartic), uses timeouts on the g_poll, and adds ppoll where this is
>>>>> available. Series at:
>>>>>  http://lists.nongnu.org/archive/html/qemu-devel/2013-07/msg03334.html
>>>>>
>>>>> I suspect this also overlaps with your code.
>>>>>
>>>>> So now we have 3 methods to do similar things!
>>>>>
>>>>> One advantage of my approach is that it removes more code than it adds
>>>>> (by quite a margin). However, alarm timers could have been left in.
>>>>> What's the advantage in giving an AioContext its own alarm timer as
>>>>> opposed to just its own clock?
>>>>>
>>>> I read your second series, and try to summary the main different between us.
>>>> Please correct me, if I misunderstood something.
>>>> --1st. You try to create a separate QemuClock for AioContext.
>>>>     I think QemuClock is the clock event source and we have three
>>>> classic with fine definition. They should be qemu-wide for time
>>>> measurement.  On the other handler, timer is  a concept for timeout,
>>>
>>> Timers, as used in QEMU, are not only for "unimportant" and
>>> unlikely-to-fire timeouts. They are also for potential high-rate, high
>>
>> Sorry, but can not catch the point.  QemuTimer is used to trigger
>> something, so the process can be delayed for some reason. For high
>> resolution events, do we go directly to QEMUClock, which fall back on
>> host's hpet? And the use case is just for get the time stamp, NOT for
>> trigger?
> 
> It's also unclear to me what you mean, Jan.  Which approach exactly are
> you unhappy with?

"Timer is a concept for timeouts" is either an unfortunate term or
indicates a misunderstanding what timers are used for, also in QEMU. You
may currently see them only in the context of timeouts, but they are
also used for periodic or continuously reprogrammed one-shot guest timers.

Make sure that we can easily extend the timer architecture so that
selected timers can run against their own carrier threads while the rest
sticks with the mainloop or whatever best-effort context, and I'm fine.

> 
> We always invoke timer callbacks from the main loop, there is no
> high-frequency high-priority timer code path,

...yet...

> so I'm not sure what the
> issue is.
> 
> Stefan
> 

Jan

-- 
Siemens AG, Corporate Technology, CT RTC ITP SES-DE
Corporate Competence Center Embedded Linux

^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: [Qemu-devel] [RFC 0/8] arm AioContext with its own timer stuff
  2013-07-21  8:42 [Qemu-devel] [RFC 0/8] arm AioContext with its own timer stuff Liu Ping Fan
                   ` (8 preceding siblings ...)
  2013-07-21  9:53 ` [Qemu-devel] [RFC 0/8] arm AioContext with its own timer stuff Alex Bligh
@ 2013-07-25 12:05 ` Stefan Hajnoczi
  2013-07-25 12:21   ` Alex Bligh
  9 siblings, 1 reply; 63+ messages in thread
From: Stefan Hajnoczi @ 2013-07-25 12:05 UTC (permalink / raw)
  To: Liu Ping Fan
  Cc: Kevin Wolf, Stefan Hajnoczi, Jan Kiszka, qemu-devel, Alex Bligh,
	Anthony Liguori, Paolo Bonzini

On Sun, Jul 21, 2013 at 04:42:57PM +0800, Liu Ping Fan wrote:
> Currently, the timers run on iothread within BQL, so virtio-block dataplane can not use throttle,
> as Stefan Hajnoczi pointed out in his patches to port dataplane onto block layer.(Thanks, Stefan)
> To enable this feature, I plan to enable timers to run on AioContext's thread.
> And maybe in future, hpet can run with its dedicated thread too.
> 
> Also, I see Alex Bligh is on the same effort by another method,(it is a good idea) 
>   "[RFC] aio/async: Add timed bottom-halves".
> So I think it is better to post my series for discussion, although I have not thought
> very clearly point, e.g. sigaction 
> 
> This series ports two parts of timer stuff onto AioContext: alarm timer and timer list.
> Currently I worked based on Stefanha's git tree
> 	 https://github.com/stefanha/qemu.git dataplane-block-layer.
> 
> ---
> Open issue:
>   The thread safe problem on timer list. To resolve that, I plan to adopt the bh model.
> (Not sure about this, maybe Stefan's solution in another thread is better)  
> Although leave most of the race issue unresolved, patch 4 has tried to fix one of
> them as Jan Kiszka points out that vm_clock can be read outside BQL, thanks Jan :)

Thanks for sharing this series.

I think Paolo's idea of getting rid of alarm timers is a good one.  We
end up with less code and fewer system calls.  The only issue is timer
resolution and at least on Linux we can use ppoll(2).

Alex Bligh's series gives each AioContext its own rt_clock.  This avoids
the need for synchronization in the simple case.  If we require timer
access between threads then we really need to synchronize.

You pointed out in another email that vm_clock stops when the guest is
paused.  I think we can find a solution for I/O throttling and QED,
which use vm_clock in the block layer.  Note that block jobs already use
rt_clock.

I'm in favor of Alex Bligh's series rather than continuing to use alarm
timers.

Stefan

^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: [Qemu-devel] [RFC 0/8] arm AioContext with its own timer stuff
  2013-07-25 12:05 ` Stefan Hajnoczi
@ 2013-07-25 12:21   ` Alex Bligh
  2013-07-25 12:32     ` Jan Kiszka
  0 siblings, 1 reply; 63+ messages in thread
From: Alex Bligh @ 2013-07-25 12:21 UTC (permalink / raw)
  To: Stefan Hajnoczi, Liu Ping Fan
  Cc: Kevin Wolf, Stefan Hajnoczi, Alex Bligh, Jan Kiszka, qemu-devel,
	Anthony Liguori, Paolo Bonzini



--On 25 July 2013 14:05:30 +0200 Stefan Hajnoczi <stefanha@gmail.com> wrote:

> Alex Bligh's series gives each AioContext its own rt_clock.  This avoids
> the need for synchronization in the simple case.  If we require timer
> access between threads then we really need to synchronize.
>
> You pointed out in another email that vm_clock stops when the guest is
> paused.  I think we can find a solution for I/O throttling and QED,
> which use vm_clock in the block layer.  Note that block jobs already use
> rt_clock.

I would happily at a QEMUClock of each type to AioContext. They are after
all pretty lightweight.

-- 
Alex Bligh

^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: [Qemu-devel] [RFC 0/8] arm AioContext with its own timer stuff
  2013-07-25 12:21   ` Alex Bligh
@ 2013-07-25 12:32     ` Jan Kiszka
  2013-07-25 12:35       ` Paolo Bonzini
  2013-07-25 18:53       ` Alex Bligh
  0 siblings, 2 replies; 63+ messages in thread
From: Jan Kiszka @ 2013-07-25 12:32 UTC (permalink / raw)
  To: Alex Bligh
  Cc: Kevin Wolf, Stefan Hajnoczi, Stefan Hajnoczi, Liu Ping Fan,
	qemu-devel, Anthony Liguori, Paolo Bonzini

On 2013-07-25 14:21, Alex Bligh wrote:
> 
> 
> --On 25 July 2013 14:05:30 +0200 Stefan Hajnoczi <stefanha@gmail.com>
> wrote:
> 
>> Alex Bligh's series gives each AioContext its own rt_clock.  This avoids
>> the need for synchronization in the simple case.  If we require timer
>> access between threads then we really need to synchronize.
>>
>> You pointed out in another email that vm_clock stops when the guest is
>> paused.  I think we can find a solution for I/O throttling and QED,
>> which use vm_clock in the block layer.  Note that block jobs already use
>> rt_clock.
> 
> I would happily at a QEMUClock of each type to AioContext. They are after
> all pretty lightweight.

What's the point of adding tones of QEMUClock instances? Considering
proper abstraction, how are they different for each AioContext? Will
they run against different clock sources, start/stop at different times?
If the answer is "they have different timer list", then fix this
incorrect abstraction.

Jan

-- 
Siemens AG, Corporate Technology, CT RTC ITP SES-DE
Corporate Competence Center Embedded Linux

^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: [Qemu-devel] [RFC 0/8] arm AioContext with its own timer stuff
  2013-07-25 12:32     ` Jan Kiszka
@ 2013-07-25 12:35       ` Paolo Bonzini
  2013-07-25 12:38         ` Jan Kiszka
  2013-07-25 18:53       ` Alex Bligh
  1 sibling, 1 reply; 63+ messages in thread
From: Paolo Bonzini @ 2013-07-25 12:35 UTC (permalink / raw)
  To: Jan Kiszka
  Cc: Kevin Wolf, Alex Bligh, Stefan Hajnoczi, Liu Ping Fan,
	qemu-devel, Stefan Hajnoczi, Anthony Liguori

Il 25/07/2013 14:32, Jan Kiszka ha scritto:
> On 2013-07-25 14:21, Alex Bligh wrote:
>>
>>
>> --On 25 July 2013 14:05:30 +0200 Stefan Hajnoczi <stefanha@gmail.com>
>> wrote:
>>
>>> Alex Bligh's series gives each AioContext its own rt_clock.  This avoids
>>> the need for synchronization in the simple case.  If we require timer
>>> access between threads then we really need to synchronize.
>>>
>>> You pointed out in another email that vm_clock stops when the guest is
>>> paused.  I think we can find a solution for I/O throttling and QED,
>>> which use vm_clock in the block layer.  Note that block jobs already use
>>> rt_clock.
>>
>> I would happily at a QEMUClock of each type to AioContext. They are after
>> all pretty lightweight.
> 
> What's the point of adding tones of QEMUClock instances? Considering
> proper abstraction, how are they different for each AioContext? Will
> they run against different clock sources, start/stop at different times?
> If the answer is "they have different timer list", then fix this
> incorrect abstraction.

s/QEMUClock/QEMUTimerList/ ? :)

Paolo

^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: [Qemu-devel] [RFC 0/8] arm AioContext with its own timer stuff
  2013-07-25 12:35       ` Paolo Bonzini
@ 2013-07-25 12:38         ` Jan Kiszka
  2013-07-25 12:41           ` Stefan Hajnoczi
  2013-07-25 12:59           ` Paolo Bonzini
  0 siblings, 2 replies; 63+ messages in thread
From: Jan Kiszka @ 2013-07-25 12:38 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: Kevin Wolf, Alex Bligh, Stefan Hajnoczi, Liu Ping Fan,
	qemu-devel, Stefan Hajnoczi, Anthony Liguori

On 2013-07-25 14:35, Paolo Bonzini wrote:
> Il 25/07/2013 14:32, Jan Kiszka ha scritto:
>> On 2013-07-25 14:21, Alex Bligh wrote:
>>>
>>>
>>> --On 25 July 2013 14:05:30 +0200 Stefan Hajnoczi <stefanha@gmail.com>
>>> wrote:
>>>
>>>> Alex Bligh's series gives each AioContext its own rt_clock.  This avoids
>>>> the need for synchronization in the simple case.  If we require timer
>>>> access between threads then we really need to synchronize.
>>>>
>>>> You pointed out in another email that vm_clock stops when the guest is
>>>> paused.  I think we can find a solution for I/O throttling and QED,
>>>> which use vm_clock in the block layer.  Note that block jobs already use
>>>> rt_clock.
>>>
>>> I would happily at a QEMUClock of each type to AioContext. They are after
>>> all pretty lightweight.
>>
>> What's the point of adding tones of QEMUClock instances? Considering
>> proper abstraction, how are they different for each AioContext? Will
>> they run against different clock sources, start/stop at different times?
>> If the answer is "they have different timer list", then fix this
>> incorrect abstraction.
> 
> s/QEMUClock/QEMUTimerList/ ? :)

What do you mean? If the content of struct QEMUClock remained the same,
that would just paper over a design mistake.

Jan

-- 
Siemens AG, Corporate Technology, CT RTC ITP SES-DE
Corporate Competence Center Embedded Linux

^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: [Qemu-devel] [RFC 0/8] arm AioContext with its own timer stuff
  2013-07-25 12:38         ` Jan Kiszka
@ 2013-07-25 12:41           ` Stefan Hajnoczi
  2013-07-25 12:48             ` Jan Kiszka
  2013-07-25 12:59           ` Paolo Bonzini
  1 sibling, 1 reply; 63+ messages in thread
From: Stefan Hajnoczi @ 2013-07-25 12:41 UTC (permalink / raw)
  To: Jan Kiszka
  Cc: Kevin Wolf, Alex Bligh, Liu Ping Fan, qemu-devel,
	Stefan Hajnoczi, Anthony Liguori, Paolo Bonzini

On Thu, Jul 25, 2013 at 2:38 PM, Jan Kiszka <jan.kiszka@siemens.com> wrote:
> On 2013-07-25 14:35, Paolo Bonzini wrote:
>> Il 25/07/2013 14:32, Jan Kiszka ha scritto:
>>> On 2013-07-25 14:21, Alex Bligh wrote:
>>>>
>>>>
>>>> --On 25 July 2013 14:05:30 +0200 Stefan Hajnoczi <stefanha@gmail.com>
>>>> wrote:
>>>>
>>>>> Alex Bligh's series gives each AioContext its own rt_clock.  This avoids
>>>>> the need for synchronization in the simple case.  If we require timer
>>>>> access between threads then we really need to synchronize.
>>>>>
>>>>> You pointed out in another email that vm_clock stops when the guest is
>>>>> paused.  I think we can find a solution for I/O throttling and QED,
>>>>> which use vm_clock in the block layer.  Note that block jobs already use
>>>>> rt_clock.
>>>>
>>>> I would happily at a QEMUClock of each type to AioContext. They are after
>>>> all pretty lightweight.
>>>
>>> What's the point of adding tones of QEMUClock instances? Considering
>>> proper abstraction, how are they different for each AioContext? Will
>>> they run against different clock sources, start/stop at different times?
>>> If the answer is "they have different timer list", then fix this
>>> incorrect abstraction.
>>
>> s/QEMUClock/QEMUTimerList/ ? :)
>
> What do you mean? If the content of struct QEMUClock remained the same,
> that would just paper over a design mistake.

QEMUClock really isn't much more than a reference to a clock source,
an enabled/disabled flag, and a timer list.

The point of having one QEMUClock per AioContext is to allow each
thread's event loop to have its own timers, without synchronization.

What is the design mistake?  I only see poor naming.

Stefan

^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: [Qemu-devel] [RFC 0/8] arm AioContext with its own timer stuff
  2013-07-25 12:41           ` Stefan Hajnoczi
@ 2013-07-25 12:48             ` Jan Kiszka
  2013-07-25 13:02               ` Paolo Bonzini
  0 siblings, 1 reply; 63+ messages in thread
From: Jan Kiszka @ 2013-07-25 12:48 UTC (permalink / raw)
  To: Stefan Hajnoczi
  Cc: Kevin Wolf, Alex Bligh, Liu Ping Fan, qemu-devel,
	Stefan Hajnoczi, Anthony Liguori, Paolo Bonzini

On 2013-07-25 14:41, Stefan Hajnoczi wrote:
> On Thu, Jul 25, 2013 at 2:38 PM, Jan Kiszka <jan.kiszka@siemens.com> wrote:
>> On 2013-07-25 14:35, Paolo Bonzini wrote:
>>> Il 25/07/2013 14:32, Jan Kiszka ha scritto:
>>>> On 2013-07-25 14:21, Alex Bligh wrote:
>>>>>
>>>>>
>>>>> --On 25 July 2013 14:05:30 +0200 Stefan Hajnoczi <stefanha@gmail.com>
>>>>> wrote:
>>>>>
>>>>>> Alex Bligh's series gives each AioContext its own rt_clock.  This avoids
>>>>>> the need for synchronization in the simple case.  If we require timer
>>>>>> access between threads then we really need to synchronize.
>>>>>>
>>>>>> You pointed out in another email that vm_clock stops when the guest is
>>>>>> paused.  I think we can find a solution for I/O throttling and QED,
>>>>>> which use vm_clock in the block layer.  Note that block jobs already use
>>>>>> rt_clock.
>>>>>
>>>>> I would happily at a QEMUClock of each type to AioContext. They are after
>>>>> all pretty lightweight.
>>>>
>>>> What's the point of adding tones of QEMUClock instances? Considering
>>>> proper abstraction, how are they different for each AioContext? Will
>>>> they run against different clock sources, start/stop at different times?
>>>> If the answer is "they have different timer list", then fix this
>>>> incorrect abstraction.
>>>
>>> s/QEMUClock/QEMUTimerList/ ? :)
>>
>> What do you mean? If the content of struct QEMUClock remained the same,
>> that would just paper over a design mistake.
> 
> QEMUClock really isn't much more than a reference to a clock source,
> an enabled/disabled flag, and a timer list.
> 
> The point of having one QEMUClock per AioContext is to allow each
> thread's event loop to have its own timers, without synchronization.
> 
> What is the design mistake?  I only see poor naming.

The concept of clocks (with start/stop property) and active timers shall
not be mixed, they are independent. Just factor out a separate list of
running timers and pass that to the infrastructure that deals with them.
I attached them to a QEMUAlarmTimer instance, you will likely need a
different abstraction.

Jan

-- 
Siemens AG, Corporate Technology, CT RTC ITP SES-DE
Corporate Competence Center Embedded Linux

^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: [Qemu-devel] [RFC 0/8] arm AioContext with its own timer stuff
  2013-07-25 12:38         ` Jan Kiszka
  2013-07-25 12:41           ` Stefan Hajnoczi
@ 2013-07-25 12:59           ` Paolo Bonzini
  1 sibling, 0 replies; 63+ messages in thread
From: Paolo Bonzini @ 2013-07-25 12:59 UTC (permalink / raw)
  To: Jan Kiszka
  Cc: Kevin Wolf, Alex Bligh, Stefan Hajnoczi, Liu Ping Fan,
	qemu-devel, Stefan Hajnoczi, Anthony Liguori

Il 25/07/2013 14:38, Jan Kiszka ha scritto:
> On 2013-07-25 14:35, Paolo Bonzini wrote:
>> Il 25/07/2013 14:32, Jan Kiszka ha scritto:
>>> On 2013-07-25 14:21, Alex Bligh wrote:
>>>>
>>>>
>>>> --On 25 July 2013 14:05:30 +0200 Stefan Hajnoczi <stefanha@gmail.com>
>>>> wrote:
>>>>
>>>>> Alex Bligh's series gives each AioContext its own rt_clock.  This avoids
>>>>> the need for synchronization in the simple case.  If we require timer
>>>>> access between threads then we really need to synchronize.
>>>>>
>>>>> You pointed out in another email that vm_clock stops when the guest is
>>>>> paused.  I think we can find a solution for I/O throttling and QED,
>>>>> which use vm_clock in the block layer.  Note that block jobs already use
>>>>> rt_clock.
>>>>
>>>> I would happily at a QEMUClock of each type to AioContext. They are after
>>>> all pretty lightweight.
>>>
>>> What's the point of adding tones of QEMUClock instances? Considering
>>> proper abstraction, how are they different for each AioContext? Will
>>> they run against different clock sources, start/stop at different times?
>>> If the answer is "they have different timer list", then fix this
>>> incorrect abstraction.
>>
>> s/QEMUClock/QEMUTimerList/ ? :)
> 
> What do you mean? If the content of struct QEMUClock remained the same,
> that would just paper over a design mistake.

I mean that really the "clock" part of QEMUClock is really just

#define QEMU_CLOCK_REALTIME 0
#define QEMU_CLOCK_VIRTUAL  1
#define QEMU_CLOCK_HOST     2

and

    switch(clock_type) {
    case QEMU_CLOCK_REALTIME:
        return get_clock();
    default:
    case QEMU_CLOCK_VIRTUAL:
        if (use_icount) {
            return cpu_get_icount();
        } else {
            return cpu_get_clock();
        }
    case QEMU_CLOCK_HOST:
        now = get_clock_realtime();
        last = clock->last;
        clock->last = now;
        if (now < last) {
            notifier_list_notify(&clock->reset_notifiers, &now);
        }
        return now;
    }

Everything else in QEMUClock is just a list of timers.  You also cannot
put timers from different clocks in the same list.

Different AioContexts needs different lists of timers even for the same
source (for obvious reasons of thread-safety).  Different sources need
different lists of timers because they do not advance together.  The
most obvious example is vm_clock which can stop for an arbitrary amount
of time while other rt_clock timers can be executed.  It is a
fundamental invariant of QEMUClock that if the first timer in the list
hasn't expired, the following ones also haven't.

Paolo

^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: [Qemu-devel] [RFC 0/8] arm AioContext with its own timer stuff
  2013-07-25 12:48             ` Jan Kiszka
@ 2013-07-25 13:02               ` Paolo Bonzini
  2013-07-25 13:06                 ` Jan Kiszka
  0 siblings, 1 reply; 63+ messages in thread
From: Paolo Bonzini @ 2013-07-25 13:02 UTC (permalink / raw)
  To: Jan Kiszka
  Cc: Kevin Wolf, Alex Bligh, Stefan Hajnoczi, Liu Ping Fan,
	qemu-devel, Stefan Hajnoczi, Anthony Liguori

Il 25/07/2013 14:48, Jan Kiszka ha scritto:
> The concept of clocks (with start/stop property) and active timers shall
> not be mixed, they are independent. 

Are you referring to this in particular:

void pause_all_vcpus(void)
{
    CPUState *cpu = first_cpu;

    qemu_clock_enable(vm_clock, false);
...
}

void resume_all_vcpus(void)
{
    CPUState *cpu = first_cpu;

    qemu_clock_enable(vm_clock, true);
...
}

where _all_ the vm_clock timer lists need to be stopped at the same time?

Paolo

^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: [Qemu-devel] [RFC 0/8] arm AioContext with its own timer stuff
  2013-07-25 13:02               ` Paolo Bonzini
@ 2013-07-25 13:06                 ` Jan Kiszka
  2013-07-25 13:31                   ` Stefan Hajnoczi
  0 siblings, 1 reply; 63+ messages in thread
From: Jan Kiszka @ 2013-07-25 13:06 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: Kevin Wolf, Alex Bligh, Stefan Hajnoczi, Liu Ping Fan,
	qemu-devel, Stefan Hajnoczi, Anthony Liguori

On 2013-07-25 15:02, Paolo Bonzini wrote:
> Il 25/07/2013 14:48, Jan Kiszka ha scritto:
>> The concept of clocks (with start/stop property) and active timers shall
>> not be mixed, they are independent. 
> 
> Are you referring to this in particular:
> 
> void pause_all_vcpus(void)
> {
>     CPUState *cpu = first_cpu;
> 
>     qemu_clock_enable(vm_clock, false);
> ...
> }
> 
> void resume_all_vcpus(void)
> {
>     CPUState *cpu = first_cpu;
> 
>     qemu_clock_enable(vm_clock, true);
> ...
> }
> 
> where _all_ the vm_clock timer lists need to be stopped at the same time?

Still wrong abstraction: the vm_clock is stopped. And that implies that
no timer using this clock will fire anymore.

Jan

-- 
Siemens AG, Corporate Technology, CT RTC ITP SES-DE
Corporate Competence Center Embedded Linux

^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: [Qemu-devel] [RFC 0/8] arm AioContext with its own timer stuff
  2013-07-25 13:06                 ` Jan Kiszka
@ 2013-07-25 13:31                   ` Stefan Hajnoczi
  2013-07-25 14:01                     ` Jan Kiszka
  0 siblings, 1 reply; 63+ messages in thread
From: Stefan Hajnoczi @ 2013-07-25 13:31 UTC (permalink / raw)
  To: Jan Kiszka
  Cc: Kevin Wolf, Alex Bligh, Liu Ping Fan, qemu-devel,
	Stefan Hajnoczi, Anthony Liguori, Paolo Bonzini

On Thu, Jul 25, 2013 at 3:06 PM, Jan Kiszka <jan.kiszka@siemens.com> wrote:
> On 2013-07-25 15:02, Paolo Bonzini wrote:
>> Il 25/07/2013 14:48, Jan Kiszka ha scritto:
>>> The concept of clocks (with start/stop property) and active timers shall
>>> not be mixed, they are independent.
>>
>> Are you referring to this in particular:
>>
>> void pause_all_vcpus(void)
>> {
>>     CPUState *cpu = first_cpu;
>>
>>     qemu_clock_enable(vm_clock, false);
>> ...
>> }
>>
>> void resume_all_vcpus(void)
>> {
>>     CPUState *cpu = first_cpu;
>>
>>     qemu_clock_enable(vm_clock, true);
>> ...
>> }
>>
>> where _all_ the vm_clock timer lists need to be stopped at the same time?
>
> Still wrong abstraction: the vm_clock is stopped. And that implies that
> no timer using this clock will fire anymore.

Perhaps we should split QEMUClock into ClockSource and TimerList.
Instead of using the int type field in TimerList we keep a ClockSource
pointer.

A ClockSource can be enabled/disabled.

Then there's the question of ClockSource thread-safety.  Host clock is
weird because it notifies listeners when the clock source jumps.  vm
clock is hard because of icount.

It's also weird to think about multiple threads dispatching timer
callbacks while a ClockSource is disabled.  Basically, do we need a
guarantee that timer callbacks have completed and will not be invoked
in any thread after the disable function returns?

But assuming thread-safety is addressed, this would fit the design
you're suggesting.

Stefan

^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: [Qemu-devel] [RFC 0/8] arm AioContext with its own timer stuff
  2013-07-25 13:31                   ` Stefan Hajnoczi
@ 2013-07-25 14:01                     ` Jan Kiszka
  0 siblings, 0 replies; 63+ messages in thread
From: Jan Kiszka @ 2013-07-25 14:01 UTC (permalink / raw)
  To: Stefan Hajnoczi
  Cc: Kevin Wolf, Alex Bligh, Liu Ping Fan, qemu-devel,
	Stefan Hajnoczi, Anthony Liguori, Paolo Bonzini

On 2013-07-25 15:31, Stefan Hajnoczi wrote:
> On Thu, Jul 25, 2013 at 3:06 PM, Jan Kiszka <jan.kiszka@siemens.com> wrote:
>> On 2013-07-25 15:02, Paolo Bonzini wrote:
>>> Il 25/07/2013 14:48, Jan Kiszka ha scritto:
>>>> The concept of clocks (with start/stop property) and active timers shall
>>>> not be mixed, they are independent.
>>>
>>> Are you referring to this in particular:
>>>
>>> void pause_all_vcpus(void)
>>> {
>>>     CPUState *cpu = first_cpu;
>>>
>>>     qemu_clock_enable(vm_clock, false);
>>> ...
>>> }
>>>
>>> void resume_all_vcpus(void)
>>> {
>>>     CPUState *cpu = first_cpu;
>>>
>>>     qemu_clock_enable(vm_clock, true);
>>> ...
>>> }
>>>
>>> where _all_ the vm_clock timer lists need to be stopped at the same time?
>>
>> Still wrong abstraction: the vm_clock is stopped. And that implies that
>> no timer using this clock will fire anymore.
> 
> Perhaps we should split QEMUClock into ClockSource and TimerList.
> Instead of using the int type field in TimerList we keep a ClockSource
> pointer.
> 
> A ClockSource can be enabled/disabled.
> 
> Then there's the question of ClockSource thread-safety.  Host clock is
> weird because it notifies listeners when the clock source jumps.  

Detection can be handled with a per-clock lock, dispatching can continue
to run under BQL for now.

> vm clock is hard because of icount.

True.

> 
> It's also weird to think about multiple threads dispatching timer
> callbacks while a ClockSource is disabled.  Basically, do we need a
> guarantee that timer callbacks have completed and will not be invoked
> in any thread after the disable function returns?

We need to make sure that no timer callback dispatcher works with a
clock value that is past the one at which we stopped. The simple answer
to this is a lock...

> 
> But assuming thread-safety is addressed, this would fit the design
> you're suggesting.

Yes, this looks good.

Jan

-- 
Siemens AG, Corporate Technology, CT RTC ITP SES-DE
Corporate Competence Center Embedded Linux

^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: [Qemu-devel] [RFC 0/8] arm AioContext with its own timer stuff
  2013-07-25 12:32     ` Jan Kiszka
  2013-07-25 12:35       ` Paolo Bonzini
@ 2013-07-25 18:53       ` Alex Bligh
  2013-07-26  8:43         ` Stefan Hajnoczi
  2013-07-26 10:05         ` Jan Kiszka
  1 sibling, 2 replies; 63+ messages in thread
From: Alex Bligh @ 2013-07-25 18:53 UTC (permalink / raw)
  To: Jan Kiszka
  Cc: Kevin Wolf, Stefan Hajnoczi, Alex Bligh, Stefan Hajnoczi,
	Liu Ping Fan, qemu-devel, Anthony Liguori, Paolo Bonzini



--On 25 July 2013 14:32:59 +0200 Jan Kiszka <jan.kiszka@siemens.com> wrote:

>> I would happily at a QEMUClock of each type to AioContext. They are after
>> all pretty lightweight.
>
> What's the point of adding tones of QEMUClock instances? Considering
> proper abstraction, how are they different for each AioContext? Will
> they run against different clock sources, start/stop at different times?
> If the answer is "they have different timer list", then fix this
> incorrect abstraction.

Even if I fix the abstraction, there is a question of whether it is
necessary to have more than one timer list per AioContext, because
the timer list is fundamentally per clock-source. I am currently
just using QEMU_CLOCK_REALTIME as that's what the block drivers normally
want. Will block drivers ever want timers from a different clock source?

-- 
Alex Bligh

^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: [Qemu-devel] [RFC 0/8] arm AioContext with its own timer stuff
  2013-07-25 18:53       ` Alex Bligh
@ 2013-07-26  8:43         ` Stefan Hajnoczi
  2013-07-26  9:08           ` Alex Bligh
  2013-07-29  8:58           ` Kevin Wolf
  2013-07-26 10:05         ` Jan Kiszka
  1 sibling, 2 replies; 63+ messages in thread
From: Stefan Hajnoczi @ 2013-07-26  8:43 UTC (permalink / raw)
  To: Alex Bligh
  Cc: Kevin Wolf, Stefan Hajnoczi, Jan Kiszka, Liu Ping Fan,
	qemu-devel, Anthony Liguori, Paolo Bonzini

On Thu, Jul 25, 2013 at 07:53:33PM +0100, Alex Bligh wrote:
> 
> 
> --On 25 July 2013 14:32:59 +0200 Jan Kiszka <jan.kiszka@siemens.com> wrote:
> 
> >>I would happily at a QEMUClock of each type to AioContext. They are after
> >>all pretty lightweight.
> >
> >What's the point of adding tones of QEMUClock instances? Considering
> >proper abstraction, how are they different for each AioContext? Will
> >they run against different clock sources, start/stop at different times?
> >If the answer is "they have different timer list", then fix this
> >incorrect abstraction.
> 
> Even if I fix the abstraction, there is a question of whether it is
> necessary to have more than one timer list per AioContext, because
> the timer list is fundamentally per clock-source. I am currently
> just using QEMU_CLOCK_REALTIME as that's what the block drivers normally
> want. Will block drivers ever want timers from a different clock source?

block.c and block/qed.c use vm_clock because block drivers should not do
guest I/O while the vm is stopped.  This is especially true during live
migration where it's important to hand off the image file from the
source host to the destination host with good cache consistency.  The
source host is not allowed to modify the image file anymore once the
destination host has resumed the guest.

Block jobs use rt_clock because they aren't considered guest I/O.

Stefan

^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: [Qemu-devel] [RFC 0/8] arm AioContext with its own timer stuff
  2013-07-26  8:43         ` Stefan Hajnoczi
@ 2013-07-26  9:08           ` Alex Bligh
  2013-07-26  9:19             ` Paolo Bonzini
  2013-07-29  8:58           ` Kevin Wolf
  1 sibling, 1 reply; 63+ messages in thread
From: Alex Bligh @ 2013-07-26  9:08 UTC (permalink / raw)
  To: Stefan Hajnoczi
  Cc: Kevin Wolf, Stefan Hajnoczi, Alex Bligh, Jan Kiszka,
	Liu Ping Fan, qemu-devel, Anthony Liguori, Paolo Bonzini



--On 26 July 2013 10:43:45 +0200 Stefan Hajnoczi <stefanha@gmail.com> wrote:

> block.c and block/qed.c use vm_clock because block drivers should not do
> guest I/O while the vm is stopped.  This is especially true during live
> migration where it's important to hand off the image file from the
> source host to the destination host with good cache consistency.  The
> source host is not allowed to modify the image file anymore once the
> destination host has resumed the guest.
>
> Block jobs use rt_clock because they aren't considered guest I/O.

That's going to be 'interesting' if qemu-img uses a block driver that
uses a timer, as I don't think the vm timer is running when qemu-img
is running.

-- 
Alex Bligh

^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: [Qemu-devel] [RFC 0/8] arm AioContext with its own timer stuff
  2013-07-26  9:08           ` Alex Bligh
@ 2013-07-26  9:19             ` Paolo Bonzini
  0 siblings, 0 replies; 63+ messages in thread
From: Paolo Bonzini @ 2013-07-26  9:19 UTC (permalink / raw)
  To: Alex Bligh
  Cc: Kevin Wolf, Anthony Liguori, Stefan Hajnoczi, qemu-devel,
	Liu Ping Fan, Stefan Hajnoczi, Jan Kiszka

Il 26/07/2013 11:08, Alex Bligh ha scritto:
> 
> 
> --On 26 July 2013 10:43:45 +0200 Stefan Hajnoczi <stefanha@gmail.com>
> wrote:
> 
>> block.c and block/qed.c use vm_clock because block drivers should not do
>> guest I/O while the vm is stopped.  This is especially true during live
>> migration where it's important to hand off the image file from the
>> source host to the destination host with good cache consistency.  The
>> source host is not allowed to modify the image file anymore once the
>> destination host has resumed the guest.
>>
>> Block jobs use rt_clock because they aren't considered guest I/O.
> 
> That's going to be 'interesting' if qemu-img uses a block driver that
> uses a timer, as I don't think the vm timer is running when qemu-img
> is running.

Yes, it is, see stubs/cpu-get-clock.c:

#include "qemu-common.h"
#include "qemu/timer.h"

int64_t cpu_get_clock(void)
{
    return get_clock_realtime();
}


vm_clock becomes a synonym of rt_clock basically.

Paolo

^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: [Qemu-devel] [RFC 0/8] arm AioContext with its own timer stuff
  2013-07-25 18:53       ` Alex Bligh
  2013-07-26  8:43         ` Stefan Hajnoczi
@ 2013-07-26 10:05         ` Jan Kiszka
  2013-07-26 19:29           ` Alex Bligh
  1 sibling, 1 reply; 63+ messages in thread
From: Jan Kiszka @ 2013-07-26 10:05 UTC (permalink / raw)
  To: Alex Bligh
  Cc: Kevin Wolf, Stefan Hajnoczi, Stefan Hajnoczi, Liu Ping Fan,
	qemu-devel, Anthony Liguori, Paolo Bonzini

On 2013-07-25 20:53, Alex Bligh wrote:
> 
> 
> --On 25 July 2013 14:32:59 +0200 Jan Kiszka <jan.kiszka@siemens.com> wrote:
> 
>>> I would happily at a QEMUClock of each type to AioContext. They are
>>> after
>>> all pretty lightweight.
>>
>> What's the point of adding tones of QEMUClock instances? Considering
>> proper abstraction, how are they different for each AioContext? Will
>> they run against different clock sources, start/stop at different times?
>> If the answer is "they have different timer list", then fix this
>> incorrect abstraction.
> 
> Even if I fix the abstraction, there is a question of whether it is
> necessary to have more than one timer list per AioContext, because
> the timer list is fundamentally per clock-source.

So far. Once we support different handler thread for timers, there will
be more lists than clock sources.

Jan

-- 
Siemens AG, Corporate Technology, CT RTC ITP SES-DE
Corporate Competence Center Embedded Linux

^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: [Qemu-devel] [RFC 0/8] arm AioContext with its own timer stuff
  2013-07-26 10:05         ` Jan Kiszka
@ 2013-07-26 19:29           ` Alex Bligh
  0 siblings, 0 replies; 63+ messages in thread
From: Alex Bligh @ 2013-07-26 19:29 UTC (permalink / raw)
  To: Jan Kiszka
  Cc: Kevin Wolf, Stefan Hajnoczi, Alex Bligh, Stefan Hajnoczi,
	Liu Ping Fan, qemu-devel, Anthony Liguori, Paolo Bonzini

Jan,

--On 26 July 2013 12:05:06 +0200 Jan Kiszka <jan.kiszka@siemens.com> wrote:

>>>> I would happily at a QEMUClock of each type to AioContext. They are
>>>> after
>>>> all pretty lightweight.
>>>
>>> What's the point of adding tones of QEMUClock instances? Considering
>>> proper abstraction, how are they different for each AioContext? Will
>>> they run against different clock sources, start/stop at different times?
>>> If the answer is "they have different timer list", then fix this
>>> incorrect abstraction.
>>
>> Even if I fix the abstraction, there is a question of whether it is
>> necessary to have more than one timer list per AioContext, because
>> the timer list is fundamentally per clock-source.
>
> So far. Once we support different handler thread for timers, there will
> be more lists than clock sources.

Right.

So my PATCHv4 series breaks up QEMUClock into QEMUClock and QEMUTimerList:
   http://lists.nongnu.org/archive/html/qemu-devel/2013-07/msg04887.html

Unfortunately I've needed to keep qemu_new_timer AND
qemu_new_timer_timerlist about to avoid horrendous git stats
due to the consequent API changes (I did it, but you don't want to see
it), and these have several variants. Apart from that it wasn't too bad.

-- 
Alex Bligh

^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: [Qemu-devel] [RFC 0/8] arm AioContext with its own timer stuff
  2013-07-26  8:43         ` Stefan Hajnoczi
  2013-07-26  9:08           ` Alex Bligh
@ 2013-07-29  8:58           ` Kevin Wolf
  2013-07-29 10:22             ` Alex Bligh
                               ` (2 more replies)
  1 sibling, 3 replies; 63+ messages in thread
From: Kevin Wolf @ 2013-07-29  8:58 UTC (permalink / raw)
  To: Stefan Hajnoczi
  Cc: Stefan Hajnoczi, Alex Bligh, Jan Kiszka, Liu Ping Fan,
	qemu-devel, Anthony Liguori, Paolo Bonzini

Am 26.07.2013 um 10:43 hat Stefan Hajnoczi geschrieben:
> On Thu, Jul 25, 2013 at 07:53:33PM +0100, Alex Bligh wrote:
> > 
> > 
> > --On 25 July 2013 14:32:59 +0200 Jan Kiszka <jan.kiszka@siemens.com> wrote:
> > 
> > >>I would happily at a QEMUClock of each type to AioContext. They are after
> > >>all pretty lightweight.
> > >
> > >What's the point of adding tones of QEMUClock instances? Considering
> > >proper abstraction, how are they different for each AioContext? Will
> > >they run against different clock sources, start/stop at different times?
> > >If the answer is "they have different timer list", then fix this
> > >incorrect abstraction.
> > 
> > Even if I fix the abstraction, there is a question of whether it is
> > necessary to have more than one timer list per AioContext, because
> > the timer list is fundamentally per clock-source. I am currently
> > just using QEMU_CLOCK_REALTIME as that's what the block drivers normally
> > want. Will block drivers ever want timers from a different clock source?
> 
> block.c and block/qed.c use vm_clock because block drivers should not do
> guest I/O while the vm is stopped.  This is especially true during live
> migration where it's important to hand off the image file from the
> source host to the destination host with good cache consistency.  The
> source host is not allowed to modify the image file anymore once the
> destination host has resumed the guest.
> 
> Block jobs use rt_clock because they aren't considered guest I/O.

But considering your first paragraph, why is it safe to let block jobs
running while we're migrating? Do we really do that? It sounds unsafe to
me.

Kevin

^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: [Qemu-devel] [RFC 0/8] arm AioContext with its own timer stuff
  2013-07-29  8:58           ` Kevin Wolf
@ 2013-07-29 10:22             ` Alex Bligh
  2013-07-29 10:45             ` Paolo Bonzini
  2013-07-31  9:02             ` Stefan Hajnoczi
  2 siblings, 0 replies; 63+ messages in thread
From: Alex Bligh @ 2013-07-29 10:22 UTC (permalink / raw)
  To: Kevin Wolf, Stefan Hajnoczi
  Cc: Stefan Hajnoczi, Alex Bligh, Jan Kiszka, Liu Ping Fan,
	qemu-devel, Anthony Liguori, Paolo Bonzini



--On 29 July 2013 10:58:40 +0200 Kevin Wolf <kwolf@redhat.com> wrote:

> But considering your first paragraph, why is it safe to let block jobs
> running while we're migrating? Do we really do that? It sounds unsafe to
> me.

If I remember right, the sending end now does the bdrv_close before
the receiving end does the bdrv_open, and that's right at the
end of migration. It seems to me it should be safe to do block
jobs right up to the bdrv_close.

-- 
Alex Bligh

^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: [Qemu-devel] [RFC 0/8] arm AioContext with its own timer stuff
  2013-07-29  8:58           ` Kevin Wolf
  2013-07-29 10:22             ` Alex Bligh
@ 2013-07-29 10:45             ` Paolo Bonzini
  2013-07-31  9:02             ` Stefan Hajnoczi
  2 siblings, 0 replies; 63+ messages in thread
From: Paolo Bonzini @ 2013-07-29 10:45 UTC (permalink / raw)
  To: Kevin Wolf
  Cc: Alex Bligh, Stefan Hajnoczi, qemu-devel, Liu Ping Fan,
	Stefan Hajnoczi, Jan Kiszka, Anthony Liguori

Il 29/07/2013 10:58, Kevin Wolf ha scritto:
> Am 26.07.2013 um 10:43 hat Stefan Hajnoczi geschrieben:
>> On Thu, Jul 25, 2013 at 07:53:33PM +0100, Alex Bligh wrote:
>>>
>>>
>>> --On 25 July 2013 14:32:59 +0200 Jan Kiszka <jan.kiszka@siemens.com> wrote:
>>>
>>>>> I would happily at a QEMUClock of each type to AioContext. They are after
>>>>> all pretty lightweight.
>>>>
>>>> What's the point of adding tones of QEMUClock instances? Considering
>>>> proper abstraction, how are they different for each AioContext? Will
>>>> they run against different clock sources, start/stop at different times?
>>>> If the answer is "they have different timer list", then fix this
>>>> incorrect abstraction.
>>>
>>> Even if I fix the abstraction, there is a question of whether it is
>>> necessary to have more than one timer list per AioContext, because
>>> the timer list is fundamentally per clock-source. I am currently
>>> just using QEMU_CLOCK_REALTIME as that's what the block drivers normally
>>> want. Will block drivers ever want timers from a different clock source?
>>
>> block.c and block/qed.c use vm_clock because block drivers should not do
>> guest I/O while the vm is stopped.  This is especially true during live
>> migration where it's important to hand off the image file from the
>> source host to the destination host with good cache consistency.  The
>> source host is not allowed to modify the image file anymore once the
>> destination host has resumed the guest.
>>
>> Block jobs use rt_clock because they aren't considered guest I/O.
> 
> But considering your first paragraph, why is it safe to let block jobs
> running while we're migrating? Do we really do that? It sounds unsafe to
> me.

I think we should cancel them (synchronously) before the final
bdrv_drain_all().

Paolo

^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: [Qemu-devel] [RFC 0/8] arm AioContext with its own timer stuff
  2013-07-29  8:58           ` Kevin Wolf
  2013-07-29 10:22             ` Alex Bligh
  2013-07-29 10:45             ` Paolo Bonzini
@ 2013-07-31  9:02             ` Stefan Hajnoczi
  2 siblings, 0 replies; 63+ messages in thread
From: Stefan Hajnoczi @ 2013-07-31  9:02 UTC (permalink / raw)
  To: Kevin Wolf
  Cc: Stefan Hajnoczi, Alex Bligh, Jan Kiszka, Liu Ping Fan,
	qemu-devel, Anthony Liguori, Paolo Bonzini

On Mon, Jul 29, 2013 at 10:58:40AM +0200, Kevin Wolf wrote:
> Am 26.07.2013 um 10:43 hat Stefan Hajnoczi geschrieben:
> > On Thu, Jul 25, 2013 at 07:53:33PM +0100, Alex Bligh wrote:
> > > 
> > > 
> > > --On 25 July 2013 14:32:59 +0200 Jan Kiszka <jan.kiszka@siemens.com> wrote:
> > > 
> > > >>I would happily at a QEMUClock of each type to AioContext. They are after
> > > >>all pretty lightweight.
> > > >
> > > >What's the point of adding tones of QEMUClock instances? Considering
> > > >proper abstraction, how are they different for each AioContext? Will
> > > >they run against different clock sources, start/stop at different times?
> > > >If the answer is "they have different timer list", then fix this
> > > >incorrect abstraction.
> > > 
> > > Even if I fix the abstraction, there is a question of whether it is
> > > necessary to have more than one timer list per AioContext, because
> > > the timer list is fundamentally per clock-source. I am currently
> > > just using QEMU_CLOCK_REALTIME as that's what the block drivers normally
> > > want. Will block drivers ever want timers from a different clock source?
> > 
> > block.c and block/qed.c use vm_clock because block drivers should not do
> > guest I/O while the vm is stopped.  This is especially true during live
> > migration where it's important to hand off the image file from the
> > source host to the destination host with good cache consistency.  The
> > source host is not allowed to modify the image file anymore once the
> > destination host has resumed the guest.
> > 
> > Block jobs use rt_clock because they aren't considered guest I/O.
> 
> But considering your first paragraph, why is it safe to let block jobs
> running while we're migrating? Do we really do that? It sounds unsafe to
> me.

It is not safe:

1. Block jobs are not migration-aware and it therefore does not make
   sense to run them across live migration.

2. Running block jobs may modify the image file after the destination
   host has resumed the guest.

We simply forgot to add the check.  I'll try to send a patch for this
today.

Stefan

^ permalink raw reply	[flat|nested] 63+ messages in thread

end of thread, other threads:[~2013-07-31  9:02 UTC | newest]

Thread overview: 63+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2013-07-21  8:42 [Qemu-devel] [RFC 0/8] arm AioContext with its own timer stuff Liu Ping Fan
2013-07-21  8:42 ` [Qemu-devel] [RFC 1/8] timer: associate alarm_timer with AioContext Liu Ping Fan
2013-07-22  6:55   ` Jan Kiszka
2013-07-21  8:42 ` [Qemu-devel] [RFC 2/8] timer: pick out timer list info from QemuClock Liu Ping Fan
2013-07-21  8:43 ` [Qemu-devel] [RFC 3/8] timer: make timers_state static Liu Ping Fan
2013-07-22  6:36   ` Jan Kiszka
2013-07-22 17:40     ` [Qemu-devel] [Qemu-trivial] " Michael Tokarev
2013-07-21  8:43 ` [Qemu-devel] [RFC 4/8] timer: protect timers_state with lock Liu Ping Fan
2013-07-22  6:40   ` Jan Kiszka
2013-07-21  8:43 ` [Qemu-devel] [RFC 5/8] timer: associate timer with AioContext Liu Ping Fan
2013-07-21  8:43 ` [Qemu-devel] [RFC 6/8] timer: run timers on aio_poll Liu Ping Fan
2013-07-21  9:55   ` Alex Bligh
2013-07-23  2:56     ` liu ping fan
2013-07-23 14:22       ` Alex Bligh
2013-07-21  8:43 ` [Qemu-devel] [RFC 7/8] block: associate BlockDriverState with AioContext Liu Ping Fan
2013-07-21  8:43 ` [Qemu-devel] [RFC 8/8] block: enable throttle with aiocontext Liu Ping Fan
2013-07-21  9:53 ` [Qemu-devel] [RFC 0/8] arm AioContext with its own timer stuff Alex Bligh
2013-07-22  4:38   ` liu ping fan
2013-07-22  6:28     ` Jan Kiszka
2013-07-23  2:51       ` liu ping fan
2013-07-25 11:44         ` Stefan Hajnoczi
2013-07-25 12:01           ` Jan Kiszka
2013-07-22  9:40     ` Alex Bligh
2013-07-22 10:18       ` liu ping fan
2013-07-23  2:53         ` liu ping fan
2013-07-23 10:30           ` Paolo Bonzini
2013-07-24  1:28             ` liu ping fan
2013-07-24  6:42               ` Paolo Bonzini
2013-07-24  7:31                 ` Alex Bligh
2013-07-24  7:43                   ` Paolo Bonzini
2013-07-24  8:01                     ` Alex Bligh
2013-07-24  8:19                       ` Paolo Bonzini
2013-07-24  8:37                       ` Alex Bligh
2013-07-24 11:28                         ` Paolo Bonzini
2013-07-24  8:30                     ` liu ping fan
2013-07-24  7:43                 ` liu ping fan
2013-07-24  7:54                   ` Paolo Bonzini
2013-07-24  8:06                     ` Alex Bligh
2013-07-24 14:46                     ` [Qemu-devel] [PATCHv2a] [RFC 8/7 (really)] Add prctl(PR_SET_TIMERSLACK, 1, ...) to reduce timer slack Alex Bligh
2013-07-23 14:21           ` [Qemu-devel] [RFC 0/8] arm AioContext with its own timer stuff Alex Bligh
2013-07-25 11:47         ` Stefan Hajnoczi
2013-07-25 12:05 ` Stefan Hajnoczi
2013-07-25 12:21   ` Alex Bligh
2013-07-25 12:32     ` Jan Kiszka
2013-07-25 12:35       ` Paolo Bonzini
2013-07-25 12:38         ` Jan Kiszka
2013-07-25 12:41           ` Stefan Hajnoczi
2013-07-25 12:48             ` Jan Kiszka
2013-07-25 13:02               ` Paolo Bonzini
2013-07-25 13:06                 ` Jan Kiszka
2013-07-25 13:31                   ` Stefan Hajnoczi
2013-07-25 14:01                     ` Jan Kiszka
2013-07-25 12:59           ` Paolo Bonzini
2013-07-25 18:53       ` Alex Bligh
2013-07-26  8:43         ` Stefan Hajnoczi
2013-07-26  9:08           ` Alex Bligh
2013-07-26  9:19             ` Paolo Bonzini
2013-07-29  8:58           ` Kevin Wolf
2013-07-29 10:22             ` Alex Bligh
2013-07-29 10:45             ` Paolo Bonzini
2013-07-31  9:02             ` Stefan Hajnoczi
2013-07-26 10:05         ` Jan Kiszka
2013-07-26 19:29           ` Alex Bligh

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.