* [PATCH] qemu-kvm: introduce cpu_start/cpu_stop commands
@ 2010-11-22 23:00 ` Anthony Liguori
  0 siblings, 0 replies; 29+ messages in thread
From: Anthony Liguori @ 2010-11-22 23:00 UTC (permalink / raw)
  To: qemu-devel; +Cc: Chris Wright, kvm, Anthony Liguori

qemu-kvm vcpu threads don't respond to SIGSTOP/SIGCONT.  Instead of teaching
them to respond to these signals, introduce monitor commands that stop and start
individual vcpus.

The purpose of these commands is to implement CPU hard limits using an external
tool that watches CPU consumption and stops a vcpu as appropriate.

The monitor commands provide a more elegant solution than signals because they
ensure that a stopped vcpu isn't holding the qemu_mutex.

I'll reply to this note with an example tool.

Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>

diff --git a/hmp-commands.hx b/hmp-commands.hx
index ba6de28..827bd67 100644
--- a/hmp-commands.hx
+++ b/hmp-commands.hx
@@ -279,6 +279,24 @@ Resume emulation.
 ETEXI
 
     {
+        .name       = "cpu_start",
+        .args_type  = "cpu:i",
+        .params     = "[cpu]",
+        .help       = "start cpu emulation",
+        .user_print = monitor_user_noop,
+        .mhandler.cmd_new = do_vcpu_start,
+    },
+
+    {
+        .name       = "cpu_stop",
+        .args_type  = "cpu:i",
+        .params     = "[cpu]",
+        .help       = "stop cpu emulation",
+        .user_print = monitor_user_noop,
+        .mhandler.cmd_new = do_vcpu_stop,
+    },
+
+    {
         .name       = "gdbserver",
         .args_type  = "device:s?",
         .params     = "[device]",
diff --git a/qemu-kvm.c b/qemu-kvm.c
index 471306b..35121ed 100644
--- a/qemu-kvm.c
+++ b/qemu-kvm.c
@@ -1351,6 +1351,65 @@ static void pause_all_threads(void)
     }
 }
 
+static void vcpu_stop(int cpu)
+{
+    CPUState *env = first_cpu;
+
+    for (env = first_cpu; env; env = env->next_cpu) {
+        if (env->cpu_index == cpu) {
+            break;
+        }
+    }
+
+    if (env) {
+        if (env != cpu_single_env) {
+            env->stop = 1;
+            pthread_kill(env->kvm_cpu_state.thread, SIG_IPI);
+        } else {
+            env->stop = 0;
+            env->stopped = 1;
+            cpu_exit(env);
+        }
+
+        while (!env->stopped) {
+            qemu_cond_wait(&qemu_pause_cond);
+        }
+    }
+}
+
+static void vcpu_start(int cpu)
+{
+    CPUState *env = first_cpu;
+
+    assert(!cpu_single_env);
+
+    for (env = first_cpu; env; env = env->next_cpu) {
+        if (env->cpu_index == cpu) {
+            break;
+        }
+    }
+
+    if (env) {
+        env->stop = 0;
+        env->stopped = 0;
+        pthread_kill(env->kvm_cpu_state.thread, SIG_IPI);
+    }
+}
+
+int do_vcpu_stop(Monitor *mon, const QDict *qdict, QObject **ret_data)
+{
+    int vcpu = qdict_get_int(qdict, "cpu");
+    vcpu_stop(vcpu);
+    return 0;
+}
+
+int do_vcpu_start(Monitor *mon, const QDict *qdict, QObject **ret_data)
+{
+    int vcpu = qdict_get_int(qdict, "cpu");
+    vcpu_start(vcpu);
+    return 0;
+}
+
 static void resume_all_threads(void)
 {
     CPUState *penv = first_cpu;
diff --git a/sysemu.h b/sysemu.h
index 849dc8c..3ef68dd 100644
--- a/sysemu.h
+++ b/sysemu.h
@@ -61,6 +61,9 @@ void qemu_system_reset(void);
 void qemu_add_exit_notifier(Notifier *notify);
 void qemu_remove_exit_notifier(Notifier *notify);
 
+int do_vcpu_stop(Monitor *mon, const QDict *qdict, QObject **ret_data);
+int do_vcpu_start(Monitor *mon, const QDict *qdict, QObject **ret_data);
+
 void do_savevm(Monitor *mon, const QDict *qdict);
 int load_vmstate(const char *name);
 void do_delvm(Monitor *mon, const QDict *qdict);
-- 
1.7.0.4


^ permalink raw reply related	[flat|nested] 29+ messages in thread

* Re: [PATCH] qemu-kvm: introduce cpu_start/cpu_stop commands
  2010-11-22 23:00 ` [Qemu-devel] " Anthony Liguori
@ 2010-11-22 23:03   ` Anthony Liguori
  -1 siblings, 0 replies; 29+ messages in thread
From: Anthony Liguori @ 2010-11-22 23:03 UTC (permalink / raw)
  To: Anthony Liguori; +Cc: qemu-devel, Chris Wright, kvm

[-- Attachment #1: Type: text/plain, Size: 3938 bytes --]

On 11/22/2010 05:00 PM, Anthony Liguori wrote:
> qemu-kvm vcpu threads don't respond to SIGSTOP/SIGCONT.  Instead of teaching
> them to respond to these signals, introduce monitor commands that stop and start
> individual vcpus.
>
> The purpose of these commands is to implement CPU hard limits using an external
> tool that watches CPU consumption and stops a vcpu as appropriate.
>
> The monitor commands provide a more elegant solution than signals because they
> ensure that a stopped vcpu isn't holding the qemu_mutex.
>
> I'll reply to this note with an example tool.
>    

This is super rough but demonstrates the concept.  If you run it with '0 
50 100' it will cap VCPU 0 at 50%.

It's not the prettiest thing in the world but it's minimally invasive 
and seems to work well.

Regards,

Anthony Liguori

> Signed-off-by: Anthony Liguori<aliguori@us.ibm.com>
>
> diff --git a/hmp-commands.hx b/hmp-commands.hx
> index ba6de28..827bd67 100644
> --- a/hmp-commands.hx
> +++ b/hmp-commands.hx
> @@ -279,6 +279,24 @@ Resume emulation.
>   ETEXI
>
>       {
> +        .name       = "cpu_start",
> +        .args_type  = "cpu:i",
> +        .params     = "[cpu]",
> +        .help       = "start cpu emulation",
> +        .user_print = monitor_user_noop,
> +        .mhandler.cmd_new = do_vcpu_start,
> +    },
> +
> +    {
> +        .name       = "cpu_stop",
> +        .args_type  = "cpu:i",
> +        .params     = "[cpu]",
> +        .help       = "stop cpu emulation",
> +        .user_print = monitor_user_noop,
> +        .mhandler.cmd_new = do_vcpu_stop,
> +    },
> +
> +    {
>           .name       = "gdbserver",
>           .args_type  = "device:s?",
>           .params     = "[device]",
> diff --git a/qemu-kvm.c b/qemu-kvm.c
> index 471306b..35121ed 100644
> --- a/qemu-kvm.c
> +++ b/qemu-kvm.c
> @@ -1351,6 +1351,65 @@ static void pause_all_threads(void)
>       }
>   }
>
> +static void vcpu_stop(int cpu)
> +{
> +    CPUState *env = first_cpu;
> +
> +    for (env = first_cpu; env; env = env->next_cpu) {
> +        if (env->cpu_index == cpu) {
> +            break;
> +        }
> +    }
> +
> +    if (env) {
> +        if (env != cpu_single_env) {
> +            env->stop = 1;
> +            pthread_kill(env->kvm_cpu_state.thread, SIG_IPI);
> +        } else {
> +            env->stop = 0;
> +            env->stopped = 1;
> +            cpu_exit(env);
> +        }
> +
> +        while (!env->stopped) {
> +            qemu_cond_wait(&qemu_pause_cond);
> +        }
> +    }
> +}
> +
> +static void vcpu_start(int cpu)
> +{
> +    CPUState *env = first_cpu;
> +
> +    assert(!cpu_single_env);
> +
> +    for (env = first_cpu; env; env = env->next_cpu) {
> +        if (env->cpu_index == cpu) {
> +            break;
> +        }
> +    }
> +
> +    if (env) {
> +        env->stop = 0;
> +        env->stopped = 0;
> +        pthread_kill(env->kvm_cpu_state.thread, SIG_IPI);
> +    }
> +}
> +
> +int do_vcpu_stop(Monitor *mon, const QDict *qdict, QObject **ret_data)
> +{
> +    int vcpu = qdict_get_int(qdict, "cpu");
> +    vcpu_stop(vcpu);
> +    return 0;
> +}
> +
> +int do_vcpu_start(Monitor *mon, const QDict *qdict, QObject **ret_data)
> +{
> +    int vcpu = qdict_get_int(qdict, "cpu");
> +    vcpu_start(vcpu);
> +    return 0;
> +}
> +
>   static void resume_all_threads(void)
>   {
>       CPUState *penv = first_cpu;
> diff --git a/sysemu.h b/sysemu.h
> index 849dc8c..3ef68dd 100644
> --- a/sysemu.h
> +++ b/sysemu.h
> @@ -61,6 +61,9 @@ void qemu_system_reset(void);
>   void qemu_add_exit_notifier(Notifier *notify);
>   void qemu_remove_exit_notifier(Notifier *notify);
>
> +int do_vcpu_stop(Monitor *mon, const QDict *qdict, QObject **ret_data);
> +int do_vcpu_start(Monitor *mon, const QDict *qdict, QObject **ret_data);
> +
>   void do_savevm(Monitor *mon, const QDict *qdict);
>   int load_vmstate(const char *name);
>   void do_delvm(Monitor *mon, const QDict *qdict);
>    


[-- Attachment #2: main.c --]
[-- Type: text/x-csrc, Size: 5658 bytes --]

#define _XOPEN_SOURCE 500
#define _GNU_SOURCE
#include <stdio.h>
#include <fcntl.h>
#include <unistd.h>
#include <stdlib.h>
#include <stdbool.h>
#include <sys/types.h>
#include <signal.h>
#include <sys/time.h>
#include <sys/syscall.h>
#include <sys/socket.h>
#include <sys/un.h>
#include <string.h>
#include <stdarg.h>

#define USEC_PER_SEC 1000000ULL

static long get_cguest_time(const char *buffer)
{
    const char *ptr;
    int space_count = 0;

    for (ptr = buffer; *ptr && space_count != 42; ptr++) {
        if (*ptr == ' ') {
            space_count++;
        }
    }

    return strtol(ptr, NULL, 10);
}

static void tv_add(struct timeval *tv, suseconds_t usec)
{
    tv->tv_usec += usec;

    while (tv->tv_usec >= USEC_PER_SEC) {
        tv->tv_sec += 1;
        tv->tv_usec -= USEC_PER_SEC;
    }
}

static int tv_cmp(struct timeval *lhs, struct timeval *rhs)
{
    if (lhs->tv_sec == rhs->tv_sec) {
        if (lhs->tv_usec < rhs->tv_usec) {
            return -1;
        } else if (lhs->tv_usec > rhs->tv_usec) {
            return 1;
        }
        return 0;
    } else if (lhs->tv_sec < rhs->tv_sec) {
        return -1;
    } else if (lhs->tv_sec > rhs->tv_sec) {
        return 1;
    }
    return 0;
}

static void write_all(int fd, const void *buffer, size_t buffer_len)
{
    size_t offset = 0;

    while (offset < buffer_len) {
        ssize_t len;

        len = write(fd, buffer + offset, buffer_len - offset);
        if (len > 0) {
            offset += len;
        }
    }
}

static void read_reply(int fd, char *buffer, size_t buffer_len)
{
    size_t offset = 0;

    while (offset < buffer_len) {
        ssize_t len;

        len = read(fd, buffer + offset, buffer_len - offset);
        if (len > 0) {
            offset += len;
        }
        if (offset >= 8 &&
            memcmp("\n(qemu) ", buffer + (offset - 8), 8) == 0) {
            char *ptr;
            buffer[offset - 8] = 0;
            ptr = strchr(buffer, '\n');
            if (ptr == NULL) {
                buffer[0] = 0;
            } else {
                memmove(buffer, ptr + 1, offset - (ptr - buffer) - 1);
            }
            return;
        }
    }
}

static int monitor_fd;

static void monitor_command(const char *fmt, ...)
{
    char buffer[256];
    va_list ap;
    size_t len;
    
    va_start(ap, fmt);
    len = vsnprintf(buffer, sizeof(buffer), fmt, ap);
    va_end(ap);

    write_all(monitor_fd, buffer, len);
    write_all(monitor_fd, "\n", 1);
    read_reply(monitor_fd, buffer, sizeof(buffer));
}

static void monitor_command_response(char *rsp, size_t rsp_len,
                                     const char *fmt, ...)
{
    char buffer[256];
    va_list ap;
    size_t len;
    
    va_start(ap, fmt);
    len = vsnprintf(buffer, sizeof(buffer), fmt, ap);
    va_end(ap);

    write_all(monitor_fd, buffer, len);
    write_all(monitor_fd, "\n", 1);
    read_reply(monitor_fd, rsp, rsp_len);
}

static int vm_running = 1;

static void guest_start(int vcpu)
{
    if (!vm_running) {
        monitor_command("cpu_start %d", vcpu);
    }
    vm_running = 1;
}

static void guest_stop(int vcpu)
{
    if (vm_running) {
        monitor_command("cpu_stop %d", vcpu);
    }
    vm_running = 0;
}

static int find_pid(char *buffer, int vcpu)
{
    char *ptr = buffer;
    int i;

    for (i = 0; ptr && i < vcpu; i++) {
        ptr = strchr(ptr, '\n');
        if (ptr) {
            ptr++;
        }
    }

    if (ptr) {
        ptr = strstr(ptr, "thread_id=");
        if (ptr) {
            ptr += 10;
            return atoi(ptr);
        }
    }

    return 0;
}

int main(int argc, char **argv)
{
    int fd, pid, vcpu;
    char buffer[1024];
    long ticks_per_sec;
    long cguest_time_last = 0;
    struct timeval period_end;
    long cguest_ticks;
    long entitlement;
    long period;
    struct sockaddr_un addr;

    if (argc != 4) {
        fprintf(stderr, "usage: %s <vcpu> <limit-percent> <period-ms>\n", argv[0]);
        return 1;
    }

    vcpu = atoi(argv[1]);
    /* FIXME hack, does guest time get scaled with vcpu count? */
    entitlement = atoi(argv[2]) * 2;
    period = atoi(argv[3]);

    monitor_fd = socket(PF_UNIX, SOCK_STREAM, 0);
    if (monitor_fd == -1) {
        perror("socket");
        return 1;
    }
    addr.sun_family = AF_UNIX;
    snprintf(addr.sun_path, sizeof(addr.sun_path), "/tmp/monitor.sock");

    if (connect(monitor_fd, (struct sockaddr *)&addr, sizeof(addr)) == -1) {
        return 1;
    }

    read_reply(monitor_fd, buffer, sizeof(buffer));
    monitor_command_response(buffer, sizeof(buffer), "info cpus");
    pid = find_pid(buffer, vcpu);

    ticks_per_sec = sysconf(_SC_CLK_TCK);
    entitlement = (entitlement * ticks_per_sec) / 1000;
    period *= 1000;

    snprintf(buffer, sizeof(buffer), "/proc/%d/stat", pid);
    fd = open(buffer, O_RDONLY);
    if (fd < 0) {
        perror("open");
        return 1;
    }

    gettimeofday(&period_end, NULL);
    tv_add(&period_end, period);
    cguest_ticks = 0;

    while (1) {
        long cguest_time_now;
        struct timeval tv_now;
        ssize_t len;

        gettimeofday(&tv_now, NULL);
        len = pread(fd, buffer, sizeof(buffer) - 1, 0);
        if (len <= 0) {
            break;
        }
        buffer[len] = 0;
        cguest_time_now = get_cguest_time(buffer);

        if (cguest_time_last) {
            cguest_ticks += cguest_time_now - cguest_time_last;

            if (tv_cmp(&tv_now, &period_end) < 0) {
                if (cguest_ticks >= entitlement) {
                    guest_stop(vcpu);
                    cguest_ticks = 0;
                }
            } else {
                guest_start(vcpu);
                cguest_ticks = 0;
                tv_add(&tv_now, period);
                period_end = tv_now;
            }
        }

        cguest_time_last = cguest_time_now;
        usleep(10000); // 10ms
    }

    close(fd);

    return 0;
}

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH] qemu-kvm: introduce cpu_start/cpu_stop commands
  2010-11-22 23:00 ` [Qemu-devel] " Anthony Liguori
@ 2010-11-22 23:04   ` Chris Wright
  -1 siblings, 0 replies; 29+ messages in thread
From: Chris Wright @ 2010-11-22 23:04 UTC (permalink / raw)
  To: Anthony Liguori; +Cc: qemu-devel, Chris Wright, kvm

* Anthony Liguori (aliguori@us.ibm.com) wrote:
> qemu-kvm vcpu threads don't respond to SIGSTOP/SIGCONT.  Instead of teaching
> them to respond to these signals, introduce monitor commands that stop and start
> individual vcpus.

In the past, SIGSTOP has introduced time skew.  Have you verified this
isn't an issue?

thanks,
-chris

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH] qemu-kvm: introduce cpu_start/cpu_stop commands
  2010-11-22 23:04   ` [Qemu-devel] " Chris Wright
@ 2010-11-22 23:44     ` Anthony Liguori
  -1 siblings, 0 replies; 29+ messages in thread
From: Anthony Liguori @ 2010-11-22 23:44 UTC (permalink / raw)
  To: Chris Wright; +Cc: Anthony Liguori, qemu-devel, kvm

On 11/22/2010 05:04 PM, Chris Wright wrote:
> * Anthony Liguori (aliguori@us.ibm.com) wrote:
>    
>> qemu-kvm vcpu threads don't respond to SIGSTOP/SIGCONT.  Instead of teaching
>> them to respond to these signals, introduce monitor commands that stop and start
>> individual vcpus.
>>      
> In the past, SIGSTOP has introduced time skew.  Have you verified this
> isn't an issue?
>    

Time skew is a big topic.  Are you talking about TSC drift, pit/rtc/hpet 
drift, etc.?

It's certainly going to stress periodic interrupt catch up code.

Regards,

Anthony Liguori

> thanks,
> -chris
>    


^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH] qemu-kvm: introduce cpu_start/cpu_stop commands
  2010-11-22 23:44     ` [Qemu-devel] " Anthony Liguori
@ 2010-11-22 23:56       ` Chris Wright
  -1 siblings, 0 replies; 29+ messages in thread
From: Chris Wright @ 2010-11-22 23:56 UTC (permalink / raw)
  To: Anthony Liguori; +Cc: Chris Wright, Anthony Liguori, qemu-devel, kvm

* Anthony Liguori (aliguori@linux.vnet.ibm.com) wrote:
> On 11/22/2010 05:04 PM, Chris Wright wrote:
> >* Anthony Liguori (aliguori@us.ibm.com) wrote:
> >>qemu-kvm vcpu threads don't respond to SIGSTOP/SIGCONT.  Instead of teaching
> >>them to respond to these signals, introduce monitor commands that stop and start
> >>individual vcpus.
> >In the past, SIGSTOP has introduced time skew.  Have you verified this
> >isn't an issue?
> 
> Time skew is a big topic.  Are you talking about TSC drift,
> pit/rtc/hpet drift, etc?

Sorry to be vague, but it's been long enough that I don't recall
the details.  The guest kernel's clocksource affected how timekeeping
progressed across STOP/CONT (it was probably missing qemu-based timer ticks).
While this is not the same mechanism, it made me wonder if you'd tested against that.

> It's certainly going to stress periodic interrupt catch up code.

Heh, call it a feature for autotest ;)

thanks,
-chris

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH] qemu-kvm: introduce cpu_start/cpu_stop commands
  2010-11-22 23:56       ` [Qemu-devel] " Chris Wright
@ 2010-11-23  0:24         ` Anthony Liguori
  -1 siblings, 0 replies; 29+ messages in thread
From: Anthony Liguori @ 2010-11-23  0:24 UTC (permalink / raw)
  To: Chris Wright; +Cc: qemu-devel, kvm

On 11/22/2010 05:56 PM, Chris Wright wrote:
> * Anthony Liguori (aliguori@linux.vnet.ibm.com) wrote:
>    
>> On 11/22/2010 05:04 PM, Chris Wright wrote:
>>      
>>> * Anthony Liguori (aliguori@us.ibm.com) wrote:
>>>        
>>>> qemu-kvm vcpu threads don't response to SIGSTOP/SIGCONT.  Instead of teaching
>>>> them to respond to these signals, introduce monitor commands that stop and start
>>>> individual vcpus.
>>>>          
>>> In the past SIGSTOP has introduced time skew.  Have you verified this
>>> isn't an issue.
>>>        
>> Time skew is a big topic.  Are you talking about TSC drift,
>> pit/rtc/hpet drift, etc?
>>      
> Sorry to be vague, but it's been long enough that I don't recall
> the details.  The guest kernel's clocksource effected how timekeeping
> progressed across STOP/CONT (was probably missing qemu based timer ticks).
> While this is not the same, made me wonder if you'd tested against that.
>    

Yeah, it's definitely going to increase the likelihood of interrupt 
coalescing but only as much as a contended CPU would already.

QEMU will keep getting timer ticks but the guest won't process them in a 
timely fashion.

>> It's certainly going to stress periodic interrupt catch up code.
>>      
> Heh, call it a feature for autotest ;)
>    

Excellent idea :-)

Regards,

Anthony Liguori

> thanks,
> -chris
>    


^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [Qemu-devel] Re: [PATCH] qemu-kvm: introduce cpu_start/cpu_stop commands
  2010-11-22 23:04   ` [Qemu-devel] " Chris Wright
  (?)
  (?)
@ 2010-11-23  6:35   ` Avi Kivity
  -1 siblings, 0 replies; 29+ messages in thread
From: Avi Kivity @ 2010-11-23  6:35 UTC (permalink / raw)
  To: Chris Wright; +Cc: Anthony Liguori, qemu-devel, kvm

On 11/23/2010 01:04 AM, Chris Wright wrote:
> * Anthony Liguori (aliguori@us.ibm.com) wrote:
> >  qemu-kvm vcpu threads don't response to SIGSTOP/SIGCONT.  Instead of teaching
> >  them to respond to these signals, introduce monitor commands that stop and start
> >  individual vcpus.
>
> In the past SIGSTOP has introduced time skew.  Have you verified this
> isn't an issue.

Wouldn't we have the same problems with kernel cpu limits?  I'd say it 
only depends on the period of the controller, not on how it's implemented.

-- 
I have a truly marvellous patch that fixes the bug which this
signature is too narrow to contain.


^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [Qemu-devel] [PATCH] qemu-kvm: introduce cpu_start/cpu_stop commands
  2010-11-22 23:00 ` [Qemu-devel] " Anthony Liguori
@ 2010-11-23  6:41   ` Avi Kivity
  -1 siblings, 0 replies; 29+ messages in thread
From: Avi Kivity @ 2010-11-23  6:41 UTC (permalink / raw)
  To: Anthony Liguori; +Cc: qemu-devel, Chris Wright, kvm

On 11/23/2010 01:00 AM, Anthony Liguori wrote:
> qemu-kvm vcpu threads don't response to SIGSTOP/SIGCONT.  Instead of teaching
> them to respond to these signals, introduce monitor commands that stop and start
> individual vcpus.
>
> The purpose of these commands are to implement CPU hard limits using an external
> tool that watches the CPU consumption and stops the CPU as appropriate.
>
> The monitor commands provide a more elegant solution that signals because it
> ensures that a stopped vcpu isn't holding the qemu_mutex.
>

 From signal(7):

   The signals SIGKILL and SIGSTOP cannot be caught, blocked, or ignored.

Perhaps this is a bug in kvm?

If we could catch SIGSTOP, then it would be easy to unblock it only 
while running in guest context. It would then stop on exit to userspace.

Using monitor commands is fairly heavyweight for something as high 
frequency as this.  What control period do you see people using?  Maybe 
we should define USR1 for vcpu start/stop.

What happens if one vcpu is stopped while another is running?  Spin 
loops, synchronous IPIs will take forever.  Maybe we need to stop the 
entire process.

-- 
I have a truly marvellous patch that fixes the bug which this
signature is too narrow to contain.


^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH] qemu-kvm: introduce cpu_start/cpu_stop commands
  2010-11-22 23:00 ` [Qemu-devel] " Anthony Liguori
@ 2010-11-23  7:29   ` Gleb Natapov
  -1 siblings, 0 replies; 29+ messages in thread
From: Gleb Natapov @ 2010-11-23  7:29 UTC (permalink / raw)
  To: Anthony Liguori; +Cc: qemu-devel, Chris Wright, kvm

On Mon, Nov 22, 2010 at 05:00:18PM -0600, Anthony Liguori wrote:
> qemu-kvm vcpu threads don't response to SIGSTOP/SIGCONT.  Instead of teaching
> them to respond to these signals, introduce monitor commands that stop and start
> individual vcpus.
> 
> The purpose of these commands are to implement CPU hard limits using an external
> tool that watches the CPU consumption and stops the CPU as appropriate.
> 
> The monitor commands provide a more elegant solution that signals because it
> ensures that a stopped vcpu isn't holding the qemu_mutex.
> 
Do you really want to stop a vcpu while it holds a guest lock? Does the external
tool have enough info to make a smart decision about how to limit vcpu runtime?

> I'll reply to this note with an example tool.
> 
> Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
> 
> diff --git a/hmp-commands.hx b/hmp-commands.hx
> index ba6de28..827bd67 100644
> --- a/hmp-commands.hx
> +++ b/hmp-commands.hx
> @@ -279,6 +279,24 @@ Resume emulation.
>  ETEXI
>  
>      {
> +        .name       = "cpu_start",
> +        .args_type  = "cpu:i",
> +        .params     = "[cpu]",
> +        .help       = "start cpu emulation",
> +        .user_print = monitor_user_noop,
> +        .mhandler.cmd_new = do_vcpu_start,
> +    },
> +
> +    {
> +        .name       = "cpu_stop",
> +        .args_type  = "cpu:i",
> +        .params     = "[cpu]",
> +        .help       = "stop cpu emulation",
> +        .user_print = monitor_user_noop,
> +        .mhandler.cmd_new = do_vcpu_stop,
> +    },
> +
> +    {
>          .name       = "gdbserver",
>          .args_type  = "device:s?",
>          .params     = "[device]",
> diff --git a/qemu-kvm.c b/qemu-kvm.c
> index 471306b..35121ed 100644
> --- a/qemu-kvm.c
> +++ b/qemu-kvm.c
> @@ -1351,6 +1351,65 @@ static void pause_all_threads(void)
>      }
>  }
>  
> +static void vcpu_stop(int cpu)
> +{
> +    CPUState *env = first_cpu;
> +
> +    for (env = first_cpu; env; env = env->next_cpu) {
> +        if (env->cpu_index == cpu) {
> +            break;
> +        }
> +    }
> +
> +    if (env) {
> +        if (env != cpu_single_env) {
> +            env->stop = 1;
> +            pthread_kill(env->kvm_cpu_state.thread, SIG_IPI);
> +        } else {
> +            env->stop = 0;
> +            env->stopped = 1;
> +            cpu_exit(env);
> +        }
> +
> +        while (!env->stopped) {
> +            qemu_cond_wait(&qemu_pause_cond);
> +        }
> +    }
> +}
> +
> +static void vcpu_start(int cpu)
> +{
> +    CPUState *env = first_cpu;
> +
> +    assert(!cpu_single_env);
> +
> +    for (env = first_cpu; env; env = env->next_cpu) {
> +        if (env->cpu_index == cpu) {
> +            break;
> +        }
> +    }
> +
> +    if (env) {
> +        env->stop = 0;
> +        env->stopped = 0;
> +        pthread_kill(env->kvm_cpu_state.thread, SIG_IPI);
> +    }
> +}
> +
> +int do_vcpu_stop(Monitor *mon, const QDict *qdict, QObject **ret_data)
> +{
> +    int vcpu = qdict_get_int(qdict, "cpu");
> +    vcpu_stop(vcpu);
> +    return 0;
> +}
> +
> +int do_vcpu_start(Monitor *mon, const QDict *qdict, QObject **ret_data)
> +{
> +    int vcpu = qdict_get_int(qdict, "cpu");
> +    vcpu_start(vcpu);
> +    return 0;
> +}
> +
>  static void resume_all_threads(void)
>  {
>      CPUState *penv = first_cpu;
> diff --git a/sysemu.h b/sysemu.h
> index 849dc8c..3ef68dd 100644
> --- a/sysemu.h
> +++ b/sysemu.h
> @@ -61,6 +61,9 @@ void qemu_system_reset(void);
>  void qemu_add_exit_notifier(Notifier *notify);
>  void qemu_remove_exit_notifier(Notifier *notify);
>  
> +int do_vcpu_stop(Monitor *mon, const QDict *qdict, QObject **ret_data);
> +int do_vcpu_start(Monitor *mon, const QDict *qdict, QObject **ret_data);
> +
>  void do_savevm(Monitor *mon, const QDict *qdict);
>  int load_vmstate(const char *name);
>  void do_delvm(Monitor *mon, const QDict *qdict);
> -- 
> 1.7.0.4
> 
> --
> To unsubscribe from this list: send the line "unsubscribe kvm" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html

--
			Gleb.

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [Qemu-devel] [PATCH] qemu-kvm: introduce cpu_start/cpu_stop commands
  2010-11-23  6:41   ` Avi Kivity
@ 2010-11-23  8:16     ` Dor Laor
  -1 siblings, 0 replies; 29+ messages in thread
From: Dor Laor @ 2010-11-23  8:16 UTC (permalink / raw)
  To: Avi Kivity; +Cc: Anthony Liguori, Chris Wright, qemu-devel, kvm

On 11/23/2010 08:41 AM, Avi Kivity wrote:
> On 11/23/2010 01:00 AM, Anthony Liguori wrote:
>> qemu-kvm vcpu threads don't response to SIGSTOP/SIGCONT. Instead of
>> teaching
>> them to respond to these signals, introduce monitor commands that stop
>> and start
>> individual vcpus.
>>
>> The purpose of these commands are to implement CPU hard limits using
>> an external
>> tool that watches the CPU consumption and stops the CPU as appropriate.

Why not use cgroup for that?

>>
>> The monitor commands provide a more elegant solution that signals
>> because it
>> ensures that a stopped vcpu isn't holding the qemu_mutex.
>>
>
>  From signal(7):
>
> The signals SIGKILL and SIGSTOP cannot be caught, blocked, or ignored.
>
> Perhaps this is a bug in kvm?
>
> If we could catch SIGSTOP, then it would be easy to unblock it only
> while running in guest context. It would then stop on exit to userspace.
>
> Using monitor commands is fairly heavyweight for something as high
> frequency as this. What control period do you see people using? Maybe we
> should define USR1 for vcpu start/stop.
>
> What happens if one vcpu is stopped while another is running? Spin
> loops, synchronous IPIs will take forever. Maybe we need to stop the
> entire process.
>


^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [Qemu-devel] [PATCH] qemu-kvm: introduce cpu_start/cpu_stop commands
  2010-11-23  6:41   ` Avi Kivity
@ 2010-11-23 13:51     ` Anthony Liguori
  -1 siblings, 0 replies; 29+ messages in thread
From: Anthony Liguori @ 2010-11-23 13:51 UTC (permalink / raw)
  To: Avi Kivity; +Cc: Anthony Liguori, qemu-devel, Chris Wright, kvm

On 11/23/2010 12:41 AM, Avi Kivity wrote:
> On 11/23/2010 01:00 AM, Anthony Liguori wrote:
>> qemu-kvm vcpu threads don't response to SIGSTOP/SIGCONT.  Instead of 
>> teaching
>> them to respond to these signals, introduce monitor commands that 
>> stop and start
>> individual vcpus.
>>
>> The purpose of these commands are to implement CPU hard limits using 
>> an external
>> tool that watches the CPU consumption and stops the CPU as appropriate.
>>
>> The monitor commands provide a more elegant solution that signals 
>> because it
>> ensures that a stopped vcpu isn't holding the qemu_mutex.
>>
>
> From signal(7):
>
>   The signals SIGKILL and SIGSTOP cannot be caught, blocked, or ignored.
>
> Perhaps this is a bug in kvm?

I need to dig deeper then.

Maybe it's something about sending SIGSTOP to a process?

>
> If we could catch SIGSTOP, then it would be easy to unblock it only 
> while running in guest context. It would then stop on exit to userspace.

Yeah, that's not a bad idea.

> Using monitor commands is fairly heavyweight for something as high 
> frequency as this.  What control period do you see people using?  
> Maybe we should define USR1 for vcpu start/stop.
>
> What happens if one vcpu is stopped while another is running?  Spin 
> loops, synchronous IPIs will take forever.  Maybe we need to stop the 
> entire process.

It's the same problem if a VCPU is descheduled while another is 
running.  The problem with stopping the entire process is that a big 
motivation for this is to ensure that benchmarks have consistent results 
regardless of CPU capacity.  If you just monitor the full process, then 
one VCPU may dominate the entitlement resulting in very erratic 
benchmarking.

Regards,

Anthony Liguori



^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [Qemu-devel] [PATCH] qemu-kvm: introduce cpu_start/cpu_stop commands
  2010-11-23  8:16     ` Dor Laor
@ 2010-11-23 13:57       ` Anthony Liguori
  -1 siblings, 0 replies; 29+ messages in thread
From: Anthony Liguori @ 2010-11-23 13:57 UTC (permalink / raw)
  To: dlaor; +Cc: Avi Kivity, Chris Wright, qemu-devel, kvm

On 11/23/2010 02:16 AM, Dor Laor wrote:
> On 11/23/2010 08:41 AM, Avi Kivity wrote:
>> On 11/23/2010 01:00 AM, Anthony Liguori wrote:
>>> qemu-kvm vcpu threads don't response to SIGSTOP/SIGCONT. Instead of
>>> teaching
>>> them to respond to these signals, introduce monitor commands that stop
>>> and start
>>> individual vcpus.
>>>
>>> The purpose of these commands are to implement CPU hard limits using
>>> an external
>>> tool that watches the CPU consumption and stops the CPU as appropriate.
>
> Why not use cgroup for that?

This is a stop-gap.

The cgroup solution isn't perfect.  It doesn't know anything about guest 
time versus hypervisor time, so it can't account for just the guest time 
the way we do with this implementation.  Also, since it may deschedule 
the vcpu thread while it's holding the qemu_mutex, it may unfairly tax 
other vcpu threads by creating additional lock contention.

This is all solvable but if there's an alternative that just requires a 
small change to qemu, it's worth doing in the short term.

Regards,

Anthony Liguori

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [Qemu-devel] [PATCH] qemu-kvm: introduce cpu_start/cpu_stop commands
  2010-11-23 13:51     ` Anthony Liguori
@ 2010-11-23 14:00       ` Avi Kivity
  -1 siblings, 0 replies; 29+ messages in thread
From: Avi Kivity @ 2010-11-23 14:00 UTC (permalink / raw)
  To: Anthony Liguori; +Cc: Anthony Liguori, qemu-devel, Chris Wright, kvm

On 11/23/2010 03:51 PM, Anthony Liguori wrote:
> On 11/23/2010 12:41 AM, Avi Kivity wrote:
>> On 11/23/2010 01:00 AM, Anthony Liguori wrote:
>>> qemu-kvm vcpu threads don't response to SIGSTOP/SIGCONT.  Instead of 
>>> teaching
>>> them to respond to these signals, introduce monitor commands that 
>>> stop and start
>>> individual vcpus.
>>>
>>> The purpose of these commands are to implement CPU hard limits using 
>>> an external
>>> tool that watches the CPU consumption and stops the CPU as appropriate.
>>>
>>> The monitor commands provide a more elegant solution that signals 
>>> because it
>>> ensures that a stopped vcpu isn't holding the qemu_mutex.
>>>
>>
>> From signal(7):
>>
>>   The signals SIGKILL and SIGSTOP cannot be caught, blocked, or ignored.
>>
>> Perhaps this is a bug in kvm?
>
> I need to dig deeper than.

Signals are a bottomless pit.

> Maybe its something about sending SIGSTOP to a process?

AFAIK sending SIGSTOP to a process should stop all of its threads?  
SIGSTOPping a thread should also work.

>>
>> If we could catch SIGSTOP, then it would be easy to unblock it only 
>> while running in guest context. It would then stop on exit to userspace.
>
> Yeah, that's not a bad idea.

Except we can't.

>
>> Using monitor commands is fairly heavyweight for something as high 
>> frequency as this.  What control period do you see people using?  
>> Maybe we should define USR1 for vcpu start/stop.
>>
>> What happens if one vcpu is stopped while another is running?  Spin 
>> loops, synchronous IPIs will take forever.  Maybe we need to stop the 
>> entire process.
>
> It's the same problem if a VCPU is descheduled while another is running. 

We can fix that with directed yield or lock holder preemption 
prevention.  But if a vcpu is stopped by qemu, we suddenly can't.

> The problem with stopping the entire process is that a big motivation 
> for this is to ensure that benchmarks have consistent results 
> regardless of CPU capacity.  If you just monitor the full process, 
> then one VCPU may dominate the entitlement resulting in very erratic 
> benchmarking.

What's the desired behaviour?  Give each vcpu 300M cycles per second, or 
give a 2vcpu guest 600M cycles per second?

You could monitor threads separately but stop the entire process.  
Stopping individual threads will break apart as soon as they start 
taking locks.

-- 
error compiling committee.c: too many arguments to function


^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [Qemu-devel] [PATCH] qemu-kvm: introduce cpu_start/cpu_stop commands
  2010-11-23 14:00       ` Avi Kivity
@ 2010-11-23 14:24         ` Anthony Liguori
  -1 siblings, 0 replies; 29+ messages in thread
From: Anthony Liguori @ 2010-11-23 14:24 UTC (permalink / raw)
  To: Avi Kivity; +Cc: Anthony Liguori, qemu-devel, Chris Wright, kvm

On 11/23/2010 08:00 AM, Avi Kivity wrote:
>>>
>>> If we could catch SIGSTOP, then it would be easy to unblock it only 
>>> while running in guest context. It would then stop on exit to 
>>> userspace.
>>
>> Yeah, that's not a bad idea.
>
> Except we can't.

Yeah, I s:SIGSTOP:SIGUSR1:g.

>>
>>> Using monitor commands is fairly heavyweight for something as high 
>>> frequency as this.  What control period do you see people using?  
>>> Maybe we should define USR1 for vcpu start/stop.
>>>
>>> What happens if one vcpu is stopped while another is running?  Spin 
>>> loops, synchronous IPIs will take forever.  Maybe we need to stop 
>>> the entire process.
>>
>> It's the same problem if a VCPU is descheduled while another is running. 
>
> We can fix that with directed yield or lock holder preemption 
> prevention.  But if a vcpu is stopped by qemu, we suddenly can't.

That only works for spin locks.

Here's the scenario:

1) VCPU 0 drops to userspace and acquires qemu_mutex
2) VCPU 0 gets descheduled
3) VCPU 1 needs to drop to userspace and acquire qemu_mutex, gets 
blocked and yields
4) If we're lucky, VCPU 0 gets scheduled but it depends on how busy the 
system is

With CFS hard limits, once (2) happens, we're boned for (3) because (4) 
cannot happen.  By having QEMU know about (2), it can choose to run just 
a little bit longer in order to drop qemu_mutex such that (3) never happens.
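
The lock-holder hazard in steps (1)-(4) can be modeled in a few lines.  A toy
sketch (threads standing in for vcpus; none of this is QEMU code): a
cooperative stop request is honored only after qemu_mutex has been dropped,
which is exactly what a monitor command allows and SIGSTOP does not.

```python
import threading

# Toy model: a "vcpu" checks its stop request only after releasing
# qemu_mutex, so it can never be parked while holding the lock -- unlike
# SIGSTOP, which can freeze a thread at any instruction, including inside
# the critical section.  Names mirror the discussion, not real QEMU code.
qemu_mutex = threading.Lock()

def vcpu_exit_to_userspace(stop_requested, log):
    with qemu_mutex:                 # step (1): userspace work under lock
        log.append("emulating device under qemu_mutex")
    # The lock is already dropped when the stop request is honored, so
    # step (3) never blocks on a mutex held by a parked vcpu.
    if stop_requested.is_set():
        log.append("parked, qemu_mutex free")
        return "stopped"
    return "running"

stop = threading.Event()
stop.set()                           # the limiter asked this vcpu to stop
log = []
assert vcpu_exit_to_userspace(stop, log) == "stopped"
assert not qemu_mutex.locked()       # the parked vcpu holds no lock
```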

>
>> The problem with stopping the entire process is that a big motivation 
>> for this is to ensure that benchmarks have consistent results 
>> regardless of CPU capacity.  If you just monitor the full process, 
>> then one VCPU may dominate the entitlement resulting in very erratic 
>> benchmarking.
>
> What's the desired behaviour?  Give each vcpu 300M cycles per second, 
> or give a 2vcpu guest 600M cycles per second?

Each vcpu gets 300M cycles per second.

> You could monitor threads separately but stop the entire process.  
> Stopping individual threads will break apart as soon as they start 
> taking locks.

I don't think so..  PLE should work as expected.  It's no different than 
a normally contended system.

Regards,

Anthony Liguori



^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [Qemu-devel] [PATCH] qemu-kvm: introduce cpu_start/cpu_stop commands
  2010-11-23 14:24         ` Anthony Liguori
@ 2010-11-23 14:35           ` Avi Kivity
  -1 siblings, 0 replies; 29+ messages in thread
From: Avi Kivity @ 2010-11-23 14:35 UTC (permalink / raw)
  To: Anthony Liguori; +Cc: Anthony Liguori, qemu-devel, Chris Wright, kvm

On 11/23/2010 04:24 PM, Anthony Liguori wrote:
>
>>>
>>>> Using monitor commands is fairly heavyweight for something as high 
>>>> frequency as this.  What control period do you see people using?  
>>>> Maybe we should define USR1 for vcpu start/stop.
>>>>
>>>> What happens if one vcpu is stopped while another is running?  Spin 
>>>> loops, synchronous IPIs will take forever.  Maybe we need to stop 
>>>> the entire process.
>>>
>>> It's the same problem if a VCPU is descheduled while another is 
>>> running. 
>>
>> We can fix that with directed yield or lock holder preemption 
>> prevention.  But if a vcpu is stopped by qemu, we suddenly can't.
>
> That only works for spin locks.
>
> Here's the scenario:
>
> 1) VCPU 0 drops to userspace and acquires qemu_mutex
> 2) VCPU 0 gets descheduled
> 3) VCPU 1 needs to drop to userspace and acquire qemu_mutex, gets 
> blocked and yields
> 4) If we're lucky, VCPU 0 gets scheduled but it depends on how busy 
> the system is
>
> With CFS hard limits, once (2) happens, we're boned for (3) because 
> (4) cannot happen.  By having QEMU know about (2), it can choose to 
> run just a little bit longer in order to drop qemu_mutex such that (3) 
> never happens.

There's some support for futex priority inheritance, perhaps we can 
leverage that.  It's supposed to be for realtime threads, but perhaps we 
can hook the priority booster to directed yield.

It's really the same problem -- preempted lock holder -- only in 
userspace.  We should be able to use the same solution.

>
>>
>>> The problem with stopping the entire process is that a big 
>>> motivation for this is to ensure that benchmarks have consistent 
>>> results regardless of CPU capacity.  If you just monitor the full 
>>> process, then one VCPU may dominate the entitlement resulting in 
>>> very erratic benchmarking.
>>
>> What's the desired behaviour?  Give each vcpu 300M cycles per second, 
>> or give a 2vcpu guest 600M cycles per second?
>
> Each vcpu gets 300M cycles per second.
>
>> You could monitor threads separately but stop the entire process.  
>> Stopping individual threads will break apart as soon as they start 
>> taking locks.
>
> I don't think so..  PLE should work as expected.  It's no different 
> than a normally contended system.
>

PLE without directed yield is useless.  With directed yield, it may 
work, but if the vcpu is stopped, it becomes ineffective.

Directed yield allows the scheduler to follow a bouncing lock around by 
increasing the priority (or decreasing vruntime) of the immediate lock 
holder at the expense of waiters.  SIGSTOP may drop the priority of the 
lock holder to zero without giving PLE a way to adjust.
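
A toy runqueue model of that last point (purely illustrative, not the CFS
implementation): directed yield can only boost a lock holder that is still
schedulable, and a SIGSTOPped thread is simply not on the runqueue.

```python
# Toy CFS-style runqueue: task -> vruntime, lowest vruntime runs next.
# Demonstrates why directed yield is defeated once the holder is stopped.

def pick_next(runqueue):
    return min(runqueue, key=lambda t: runqueue[t]) if runqueue else None

def directed_yield(runqueue, waiter, holder, boost=10):
    """Donate some of the waiter's entitlement to the lock holder.

    Works only if the holder is still on the runqueue; a SIGSTOPped
    holder has been removed entirely, so there is nothing to boost.
    """
    if holder in runqueue:
        runqueue[holder] -= boost    # lower vruntime == runs sooner
        runqueue[waiter] += boost
        return True
    return False

rq = {"vcpu0": 100, "vcpu1": 95}
directed_yield(rq, waiter="vcpu1", holder="vcpu0")
assert pick_next(rq) == "vcpu0"      # yield worked: the holder runs next

del rq["vcpu0"]                      # SIGSTOP: holder off the runqueue
assert directed_yield(rq, "vcpu1", "vcpu0") is False
```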

-- 
error compiling committee.c: too many arguments to function


^ permalink raw reply	[flat|nested] 29+ messages in thread

end of thread, other threads:[~2010-11-23 14:36 UTC | newest]

Thread overview: 29+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2010-11-22 23:00 [PATCH] qemu-kvm: introduce cpu_start/cpu_stop commands Anthony Liguori
2010-11-22 23:00 ` [Qemu-devel] " Anthony Liguori
2010-11-22 23:03 ` Anthony Liguori
2010-11-22 23:03   ` [Qemu-devel] " Anthony Liguori
2010-11-22 23:04 ` Chris Wright
2010-11-22 23:04   ` [Qemu-devel] " Chris Wright
2010-11-22 23:44   ` Anthony Liguori
2010-11-22 23:44     ` [Qemu-devel] " Anthony Liguori
2010-11-22 23:56     ` Chris Wright
2010-11-22 23:56       ` [Qemu-devel] " Chris Wright
2010-11-23  0:24       ` Anthony Liguori
2010-11-23  0:24         ` [Qemu-devel] " Anthony Liguori
2010-11-23  6:35   ` Avi Kivity
2010-11-23  6:41 ` [Qemu-devel] " Avi Kivity
2010-11-23  6:41   ` Avi Kivity
2010-11-23  8:16   ` Dor Laor
2010-11-23  8:16     ` Dor Laor
2010-11-23 13:57     ` Anthony Liguori
2010-11-23 13:57       ` Anthony Liguori
2010-11-23 13:51   ` Anthony Liguori
2010-11-23 13:51     ` Anthony Liguori
2010-11-23 14:00     ` Avi Kivity
2010-11-23 14:00       ` Avi Kivity
2010-11-23 14:24       ` Anthony Liguori
2010-11-23 14:24         ` Anthony Liguori
2010-11-23 14:35         ` Avi Kivity
2010-11-23 14:35           ` Avi Kivity
2010-11-23  7:29 ` Gleb Natapov
2010-11-23  7:29   ` [Qemu-devel] " Gleb Natapov
