* [PATCH] Enhance perf to collect KVM guest os statistics from host side
@ 2010-03-16  5:27 Zhang, Yanmin
  2010-03-16  5:41 ` Avi Kivity
  2010-03-19  3:38 ` Zhang, Yanmin
  0 siblings, 2 replies; 390+ messages in thread

From: Zhang, Yanmin @ 2010-03-16  5:27 UTC (permalink / raw)
  To: Ingo Molnar, Peter Zijlstra
  Cc: Avi Kivity, Sheng Yang, linux-kernel, kvm, Marcelo Tosatti,
      Joerg Roedel, Jes Sorensen, Gleb Natapov, Zachary Amsden, ziteng.huang

From: Zhang, Yanmin <yanmin_zhang@linux.intel.com>

Based on the discussion in the KVM community, I worked out this patch to let
perf collect guest OS statistics from the host side. It was implemented with
the kind help of Ingo, Peter, and others; Sheng Yang pointed out a critical
bug and, along with others, provided good suggestions. I really appreciate
their help.

The patch adds a new subcommand, kvm, to perf:

	perf kvm top
	perf kvm record
	perf kvm report
	perf kvm diff

The new perf can profile the guest OS kernel but not guest OS user space;
it can, however, summarize guest user-space utilization per guest OS
instance. Below are some examples.
1) perf kvm top

[root@lkp-ne01 norm]# perf kvm --host --guest --guestkallsyms=/home/ymzhang/guest/kallsyms --guestmodules=/home/ymzhang/guest/modules top

--------------------------------------------------------------------------------------------------------------------------
   PerfTop:   16010 irqs/sec  kernel:59.1%  us: 1.5%  guest kernel:31.9%  guest us: 7.5%  exact:  0.0% [1000Hz cycles],  (all, 16 CPUs)
--------------------------------------------------------------------------------------------------------------------------

             samples    pcnt function                  DSO
             _______   _____ _________________________ _______________________

            38770.00   20.4% __ticket_spin_lock        [guest.kernel.kallsyms]
            22560.00   11.9% ftrace_likely_update      [kernel.kallsyms]
             9208.00    4.8% __lock_acquire            [kernel.kallsyms]
             5473.00    2.9% trace_hardirqs_off_caller [kernel.kallsyms]
             5222.00    2.7% copy_user_generic_string  [guest.kernel.kallsyms]
             4450.00    2.3% validate_chain            [kernel.kallsyms]
             4262.00    2.2% trace_hardirqs_on_caller  [kernel.kallsyms]
             4239.00    2.2% do_raw_spin_lock          [kernel.kallsyms]
             3548.00    1.9% do_raw_spin_unlock        [kernel.kallsyms]
             2487.00    1.3% lock_release              [kernel.kallsyms]
             2165.00    1.1% __local_bh_disable        [kernel.kallsyms]
             1905.00    1.0% check_chain_key           [kernel.kallsyms]
             1737.00    0.9% lock_acquire              [kernel.kallsyms]
             1604.00    0.8% tcp_recvmsg               [kernel.kallsyms]
             1524.00    0.8% mark_lock                 [kernel.kallsyms]
             1464.00    0.8% schedule                  [kernel.kallsyms]
             1423.00    0.7% __d_lookup                [guest.kernel.kallsyms]

If you want to show only host data, just don't pass the --guest parameter.
The headline includes the guest OS kernel and user-space percentages.
2) perf kvm record

[root@lkp-ne01 norm]# perf kvm --host --guest --guestkallsyms=/home/ymzhang/guest/kallsyms --guestmodules=/home/ymzhang/guest/modules record -f -a sleep 60
[ perf record: Woken up 15 times to write data ]
[ perf record: Captured and wrote 29.385 MB perf.data.kvm (~1283837 samples) ]

3) perf kvm report

3.1)
[root@lkp-ne01 norm]# perf kvm --host --guest --guestkallsyms=/home/ymzhang/guest/kallsyms --guestmodules=/home/ymzhang/guest/modules report --sort pid --showcpuutilization >norm.host.guest.report.pid

# Samples: 2453796285126
#
# Overhead       sys        us  guest sys  guest us          Command:  Pid
# ........  .....................
#
    43.67%     1.35%     0.01%     39.06%     3.26%   qemu-system-x86: 3913
     3.78%     3.58%     0.20%      0.00%     0.00%        tbench:13519
     3.69%     3.66%     0.03%      0.00%     0.00%    tbench_srv:13526

Some performance engineers want perf to show sys/us/guest_sys/guest_us per
KVM guest instance, which is really just a multi-threaded process; the
--showcpuutilization option used above does exactly that.

3.2)
[root@lkp-ne01 norm]# perf kvm --host --guest --guestkallsyms=/home/ymzhang/guest/kallsyms --guestmodules=/home/ymzhang/guest/modules report >norm.host.guest.report

# Samples: 2466991384118
#
# Overhead          Command                                                             Shared Object  Symbol
# ........  ...............  ........................................................................  ......
#
    29.11%  qemu-system-x86  [guest.kernel.kallsyms]  [g] __ticket_spin_lock
     5.88%       tbench_srv  [kernel.kallsyms]        [k] ftrace_likely_update
     5.76%           tbench  [kernel.kallsyms]        [k] ftrace_likely_update
     3.88%  qemu-system-x86  34c3255482               [u] 0x000034c3255482
     1.83%           tbench  [kernel.kallsyms]        [k] __lock_acquire
     1.81%       tbench_srv  [kernel.kallsyms]        [k] __lock_acquire
     1.38%       tbench_srv  [kernel.kallsyms]        [k] trace_hardirqs_off_caller
     1.37%           tbench  [kernel.kallsyms]        [k] trace_hardirqs_off_caller
     1.13%  qemu-system-x86  [guest.kernel.kallsyms]  [g] copy_user_generic_string
     1.04%       tbench_srv  [kernel.kallsyms]        [k] validate_chain
     1.00%           tbench  [kernel.kallsyms]        [k] trace_hardirqs_on_caller
     1.00%       tbench_srv  [kernel.kallsyms]        [k] trace_hardirqs_on_caller
     0.95%           tbench  [kernel.kallsyms]        [k] do_raw_spin_lock

[u] means the sample is in guest OS user space and [g] means it is in the
guest OS kernel; the rest of the output is self-explanatory. If a module
shows up as something like [ext4], it is a guest kernel module, because
native host kernel modules are reported with paths starting with
/lib/modules/XXX.

Below is the patch against the tip/master tree of March 15th.
Signed-off-by: Zhang Yanmin <yanmin_zhang@linux.intel.com>

---

diff -Nraup linux-2.6_tipmaster0315/arch/x86/include/asm/perf_event.h linux-2.6_tipmaster0315_perfkvm/arch/x86/include/asm/perf_event.h
--- linux-2.6_tipmaster0315/arch/x86/include/asm/perf_event.h	2010-03-16 08:59:11.533288951 +0800
+++ linux-2.6_tipmaster0315_perfkvm/arch/x86/include/asm/perf_event.h	2010-03-16 09:01:09.972117272 +0800
@@ -143,17 +143,10 @@ extern void perf_events_lapic_init(void)
  */
 #define PERF_EFLAGS_EXACT		(1UL << 3)

-#define perf_misc_flags(regs)				\
-({	int misc = 0;					\
-	if (user_mode(regs))				\
-		misc |= PERF_RECORD_MISC_USER;		\
-	else						\
-		misc |= PERF_RECORD_MISC_KERNEL;	\
-	if (regs->flags & PERF_EFLAGS_EXACT)		\
-		misc |= PERF_RECORD_MISC_EXACT;		\
-	misc; })
-
-#define perf_instruction_pointer(regs)	((regs)->ip)
+struct pt_regs;
+extern unsigned long perf_instruction_pointer(struct pt_regs *regs);
+extern unsigned long perf_misc_flags(struct pt_regs *regs);
+#define perf_misc_flags(regs)	perf_misc_flags(regs)

 #else
 static inline void init_hw_perf_events(void)		{ }
diff -Nraup linux-2.6_tipmaster0315/arch/x86/include/asm/ptrace.h linux-2.6_tipmaster0315_perfkvm/arch/x86/include/asm/ptrace.h
--- linux-2.6_tipmaster0315/arch/x86/include/asm/ptrace.h	2010-03-16 08:59:11.701271925 +0800
+++ linux-2.6_tipmaster0315_perfkvm/arch/x86/include/asm/ptrace.h	2010-03-16 09:01:09.972117272 +0800
@@ -167,6 +167,15 @@ static inline int user_mode(struct pt_re
 #endif
 }

+static inline int user_mode_cs(u16 cs)
+{
+#ifdef CONFIG_X86_32
+	return (cs & SEGMENT_RPL_MASK) == USER_RPL;
+#else
+	return !!(cs & 3);
+#endif
+}
+
 static inline int user_mode_vm(struct pt_regs *regs)
 {
 #ifdef CONFIG_X86_32
diff -Nraup linux-2.6_tipmaster0315/arch/x86/kernel/cpu/perf_event.c linux-2.6_tipmaster0315_perfkvm/arch/x86/kernel/cpu/perf_event.c
--- linux-2.6_tipmaster0315/arch/x86/kernel/cpu/perf_event.c	2010-03-16 08:59:12.225267457 +0800
+++ linux-2.6_tipmaster0315_perfkvm/arch/x86/kernel/cpu/perf_event.c	2010-03-16 09:03:02.343617673 +0800
@@ -1707,3 +1707,30 @@ void perf_arch_fetch_caller_regs(struct
 	local_save_flags(regs->flags);
 }
 EXPORT_SYMBOL_GPL(perf_arch_fetch_caller_regs);
+
+unsigned long perf_instruction_pointer(struct pt_regs *regs)
+{
+	unsigned long ip;
+
+	if (perf_guest_cbs && perf_guest_cbs->is_in_guest())
+		ip = perf_guest_cbs->get_guest_ip();
+	else
+		ip = instruction_pointer(regs);
+
+	return ip;
+}
+
+unsigned long perf_misc_flags(struct pt_regs *regs)
+{
+	int misc = 0;
+
+	if (perf_guest_cbs && perf_guest_cbs->is_in_guest()) {
+		misc |= perf_guest_cbs->is_user_mode() ?
+			PERF_RECORD_MISC_GUEST_USER :
+			PERF_RECORD_MISC_GUEST_KERNEL;
+	} else
+		misc |= user_mode(regs) ? PERF_RECORD_MISC_USER :
+			PERF_RECORD_MISC_KERNEL;
+
+	if (regs->flags & PERF_EFLAGS_EXACT)
+		misc |= PERF_RECORD_MISC_EXACT;
+
+	return misc;
+}
+
diff -Nraup linux-2.6_tipmaster0315/arch/x86/kvm/vmx.c linux-2.6_tipmaster0315_perfkvm/arch/x86/kvm/vmx.c
--- linux-2.6_tipmaster0315/arch/x86/kvm/vmx.c	2010-03-16 08:59:11.825295404 +0800
+++ linux-2.6_tipmaster0315_perfkvm/arch/x86/kvm/vmx.c	2010-03-16 09:01:09.976084492 +0800
@@ -26,6 +26,7 @@
 #include <linux/sched.h>
 #include <linux/moduleparam.h>
 #include <linux/ftrace_event.h>
+#include <linux/perf_event.h>
 #include "kvm_cache_regs.h"
 #include "x86.h"
@@ -3632,6 +3633,43 @@ static void update_cr8_intercept(struct
 	vmcs_write32(TPR_THRESHOLD, irr);
 }

+DEFINE_PER_CPU(int, kvm_in_guest) = {0};
+
+static void kvm_set_in_guest(void)
+{
+	percpu_write(kvm_in_guest, 1);
+}
+
+static int kvm_is_in_guest(void)
+{
+	return percpu_read(kvm_in_guest);
+}
+
+static int kvm_is_user_mode(void)
+{
+	int user_mode;
+
+	user_mode = user_mode_cs(vmcs_read16(GUEST_CS_SELECTOR));
+	return user_mode;
+}
+
+static unsigned long kvm_get_guest_ip(void)
+{
+	return vmcs_readl(GUEST_RIP);
+}
+
+static void kvm_reset_in_guest(void)
+{
+	if (percpu_read(kvm_in_guest))
+		percpu_write(kvm_in_guest, 0);
+}
+
+static struct perf_guest_info_callbacks kvm_guest_cbs = {
+	.is_in_guest		= kvm_is_in_guest,
+	.is_user_mode		= kvm_is_user_mode,
+	.get_guest_ip		= kvm_get_guest_ip,
+	.reset_in_guest		= kvm_reset_in_guest,
+};
+
 static void vmx_complete_interrupts(struct vcpu_vmx *vmx)
 {
 	u32 exit_intr_info;
@@ -3653,8 +3691,11 @@ static void vmx_complete_interrupts(stru
 	/* We need to handle NMIs before interrupts are enabled */
 	if ((exit_intr_info & INTR_INFO_INTR_TYPE_MASK) == INTR_TYPE_NMI_INTR &&
-	    (exit_intr_info & INTR_INFO_VALID_MASK))
+	    (exit_intr_info & INTR_INFO_VALID_MASK)) {
+		kvm_set_in_guest();
 		asm("int $2");
+		kvm_reset_in_guest();
+	}

 	idtv_info_valid = idt_vectoring_info & VECTORING_INFO_VALID_MASK;
@@ -4251,6 +4292,8 @@ static int __init vmx_init(void)
 	if (bypass_guest_pf)
 		kvm_mmu_set_nonpresent_ptes(~0xffeull, 0ull);

+	perf_register_guest_info_callbacks(&kvm_guest_cbs);
+
 	return 0;

 out3:
@@ -4266,6 +4309,8 @@ out:

 static void __exit vmx_exit(void)
 {
+	perf_unregister_guest_info_callbacks(&kvm_guest_cbs);
+
 	free_page((unsigned long)vmx_msr_bitmap_legacy);
 	free_page((unsigned long)vmx_msr_bitmap_longmode);
 	free_page((unsigned long)vmx_io_bitmap_b);
diff -Nraup linux-2.6_tipmaster0315/include/linux/perf_event.h linux-2.6_tipmaster0315_perfkvm/include/linux/perf_event.h
--- linux-2.6_tipmaster0315/include/linux/perf_event.h	2010-03-16 08:59:21.940168828 +0800
+++ linux-2.6_tipmaster0315_perfkvm/include/linux/perf_event.h	2010-03-16 09:01:09.976084492 +0800
@@ -288,11 +288,13 @@ struct perf_event_mmap_page {
 	__u64	data_tail;			/* user-space written tail */
 };

-#define PERF_RECORD_MISC_CPUMODE_MASK		(3 << 0)
+#define PERF_RECORD_MISC_CPUMODE_MASK		(7 << 0)
 #define PERF_RECORD_MISC_CPUMODE_UNKNOWN	(0 << 0)
 #define PERF_RECORD_MISC_KERNEL			(1 << 0)
 #define PERF_RECORD_MISC_USER			(2 << 0)
 #define PERF_RECORD_MISC_HYPERVISOR		(3 << 0)
+#define PERF_RECORD_MISC_GUEST_KERNEL		(4 << 0)
+#define PERF_RECORD_MISC_GUEST_USER		(5 << 0)

 #define PERF_RECORD_MISC_EXACT			(1 << 14)
 /*
@@ -446,6 +448,13 @@ enum perf_callchain_context {
 # include <asm/perf_event.h>
 #endif

+struct perf_guest_info_callbacks {
+	int (*is_in_guest) (void);
+	int (*is_user_mode) (void);
+	unsigned long (*get_guest_ip) (void);
+	void (*reset_in_guest) (void);
+};
+
 #ifdef CONFIG_HAVE_HW_BREAKPOINT
 #include <asm/hw_breakpoint.h>
 #endif
@@ -913,6 +922,12 @@ static inline void perf_event_mmap(struc
 	__perf_event_mmap(vma);
 }

+extern struct perf_guest_info_callbacks *perf_guest_cbs;
+extern int perf_register_guest_info_callbacks(
+		struct perf_guest_info_callbacks *);
+extern int perf_unregister_guest_info_callbacks(
+		struct perf_guest_info_callbacks *);
+
 extern void perf_event_comm(struct task_struct *tsk);
 extern void perf_event_fork(struct task_struct *tsk);
@@ -982,6 +997,11 @@ perf_sw_event(u32 event_id, u64 nr, int
 static inline void
 perf_bp_event(struct perf_event *event, void *data)			{ }

+static inline int perf_register_guest_info_callbacks
+(struct perf_guest_info_callbacks *cbs)				{ return 0; }
+static inline int perf_unregister_guest_info_callbacks
+(struct perf_guest_info_callbacks *cbs)				{ return 0; }
+
 static inline void perf_event_mmap(struct vm_area_struct *vma)		{ }
 static inline void perf_event_comm(struct task_struct *tsk)		{ }
 static inline void perf_event_fork(struct task_struct *tsk)		{ }
diff -Nraup linux-2.6_tipmaster0315/kernel/perf_event.c linux-2.6_tipmaster0315_perfkvm/kernel/perf_event.c
--- linux-2.6_tipmaster0315/kernel/perf_event.c	2010-03-16 08:59:55.108431543 +0800
+++ linux-2.6_tipmaster0315_perfkvm/kernel/perf_event.c	2010-03-16 09:01:09.980084394 +0800
@@ -2796,6 +2796,27 @@ void perf_arch_fetch_caller_regs(struct
 }

 /*
+ * We assume there is only KVM supporting the callbacks.
+ * Later on, we might change it to a list if there is
+ * another virtualization implementation supporting the callbacks.
+ */
+struct perf_guest_info_callbacks *perf_guest_cbs;
+
+int perf_register_guest_info_callbacks(struct perf_guest_info_callbacks *cbs)
+{
+	perf_guest_cbs = cbs;
+	return 0;
+}
+EXPORT_SYMBOL_GPL(perf_register_guest_info_callbacks);
+
+int perf_unregister_guest_info_callbacks(struct perf_guest_info_callbacks *cbs)
+{
+	perf_guest_cbs = NULL;
+	return 0;
+}
+EXPORT_SYMBOL_GPL(perf_unregister_guest_info_callbacks);
+
+/*
  * Output
  */
 static bool perf_output_space(struct perf_mmap_data *data, unsigned long tail,
@@ -3738,7 +3759,7 @@ void __perf_event_mmap(struct vm_area_st
 		.event_id  = {
 			.header = {
 				.type = PERF_RECORD_MMAP,
-				.misc = 0,
+				.misc = PERF_RECORD_MISC_USER,
 				/* .size */
 			},
 			/* .pid */
diff -Nraup linux-2.6_tipmaster0315/tools/perf/builtin-diff.c linux-2.6_tipmaster0315_perfkvm/tools/perf/builtin-diff.c
--- linux-2.6_tipmaster0315/tools/perf/builtin-diff.c	2010-03-16 08:59:54.736473543 +0800
+++ linux-2.6_tipmaster0315_perfkvm/tools/perf/builtin-diff.c	2010-03-16 10:13:14.620371938 +0800
@@ -33,7 +33,7 @@ static int perf_session__add_hist_entry(
 		return -ENOMEM;

 	if (hit)
-		he->count += count;
+		__perf_session__add_count(he, al, count);
 	return 0;
 }
@@ -225,6 +225,9 @@ int cmd_diff(int argc, const char **argv
 			input_new = argv[1];
 		} else
 			input_new = argv[0];
+	} else if (symbol_conf.guest_vmlinux_name || symbol_conf.guest_kallsyms) {
+		input_old = "perf.data.host";
+		input_new = "perf.data.guest";
 	}

 	symbol_conf.exclude_other = false;
diff -Nraup linux-2.6_tipmaster0315/tools/perf/builtin.h linux-2.6_tipmaster0315_perfkvm/tools/perf/builtin.h
--- linux-2.6_tipmaster0315/tools/perf/builtin.h	2010-03-16 08:59:54.692509868 +0800
+++ linux-2.6_tipmaster0315_perfkvm/tools/perf/builtin.h	2010-03-16 09:01:09.980084394 +0800
@@ -32,5 +32,6 @@ extern int cmd_version(int argc, const c
 extern int cmd_probe(int argc, const char **argv, const char *prefix);
 extern int cmd_kmem(int argc, const char **argv, const char *prefix);
 extern int cmd_lock(int argc, const char **argv, const char *prefix);
+extern int cmd_kvm(int argc, const char **argv, const char *prefix);

 #endif
diff -Nraup linux-2.6_tipmaster0315/tools/perf/builtin-kvm.c linux-2.6_tipmaster0315_perfkvm/tools/perf/builtin-kvm.c
--- linux-2.6_tipmaster0315/tools/perf/builtin-kvm.c	1970-01-01 08:00:00.000000000 +0800
+++ linux-2.6_tipmaster0315_perfkvm/tools/perf/builtin-kvm.c	2010-03-16 09:01:09.980084394 +0800
@@ -0,0 +1,123 @@
+#include "builtin.h"
+#include "perf.h"
+
+#include "util/util.h"
+#include "util/cache.h"
+#include "util/symbol.h"
+#include "util/thread.h"
+#include "util/header.h"
+#include "util/session.h"
+
+#include "util/parse-options.h"
+#include "util/trace-event.h"
+
+#include "util/debug.h"
+
+#include <sys/prctl.h>
+
+#include <semaphore.h>
+#include <pthread.h>
+#include <math.h>
+
+static char *file_name = NULL;
+static char name_buffer[256];
+
+int perf_host = 1;
+int perf_guest = 0;
+
+static const char * const kvm_usage[] = {
+	"perf kvm [<options>] {top|record|report|diff}",
+	NULL
+};
+
+static const struct option kvm_options[] = {
+	OPT_STRING('i', "input", &file_name, "file",
+		   "Input file name"),
+	OPT_STRING('o', "output", &file_name, "file",
+		   "Output file name"),
+	OPT_BOOLEAN(0, "guest", &perf_guest,
+		    "Collect guest os data"),
+	OPT_BOOLEAN(0, "host", &perf_host,
+		    "Collect host os data"),
+	OPT_STRING(0, "guestvmlinux", &symbol_conf.guest_vmlinux_name, "file",
+		   "file saving guest os vmlinux"),
+	OPT_STRING(0, "guestkallsyms", &symbol_conf.guest_kallsyms, "file",
+		   "file saving guest os /proc/kallsyms"),
+	OPT_STRING(0, "guestmodules", &symbol_conf.guest_modules, "file",
+		   "file saving guest os /proc/modules"),
+	OPT_END()
+};
+
+static int __cmd_record(int argc, const char **argv)
+{
+	int rec_argc, i = 0, j;
+	const char **rec_argv;
+
+	rec_argc = argc + 2;
+	rec_argv = calloc(rec_argc + 1, sizeof(char *));
+	rec_argv[i++] = strdup("record");
+	rec_argv[i++] = strdup("-o");
+	rec_argv[i++] = strdup(file_name);
+	for (j = 1; j < argc; j++, i++)
+		rec_argv[i] = argv[j];
+
+	BUG_ON(i != rec_argc);
+
+	return cmd_record(i, rec_argv, NULL);
+}
+
+static int __cmd_report(int argc, const char **argv)
+{
+	int rec_argc, i = 0, j;
+	const char **rec_argv;
+
+	rec_argc = argc + 2;
+	rec_argv = calloc(rec_argc + 1, sizeof(char *));
+	rec_argv[i++] = strdup("report");
+	rec_argv[i++] = strdup("-i");
+	rec_argv[i++] = strdup(file_name);
+	for (j = 1; j < argc; j++, i++)
+		rec_argv[i] = argv[j];
+
+	BUG_ON(i != rec_argc);
+
+	return cmd_report(i, rec_argv, NULL);
+}
+
+int cmd_kvm(int argc, const char **argv, const char *prefix __used)
+{
+	perf_host = perf_guest = 0;
+
+	argc = parse_options(argc, argv, kvm_options, kvm_usage,
+			     PARSE_OPT_STOP_AT_NON_OPTION);
+	if (!argc)
+		usage_with_options(kvm_usage, kvm_options);
+
+	if (!perf_host)
+		perf_guest = 1;
+
+	if (!file_name) {
+		if (perf_host && !perf_guest)
+			sprintf(name_buffer, "perf.data.host");
+		else if (!perf_host && perf_guest)
+			sprintf(name_buffer, "perf.data.guest");
+		else
+			sprintf(name_buffer, "perf.data.kvm");
+		file_name = name_buffer;
+	}
+
+	if (!strncmp(argv[0], "rec", 3)) {
+		return __cmd_record(argc, argv);
+	} else if (!strncmp(argv[0], "rep", 3)) {
+		return __cmd_report(argc, argv);
+	} else if (!strncmp(argv[0], "diff", 4)) {
+		return cmd_diff(argc, argv, NULL);
+	} else if (!strncmp(argv[0], "top", 3)) {
+		return cmd_top(argc, argv, NULL);
+	} else {
+		usage_with_options(kvm_usage, kvm_options);
+	}
+
+	return 0;
+}
diff -Nraup linux-2.6_tipmaster0315/tools/perf/builtin-record.c linux-2.6_tipmaster0315_perfkvm/tools/perf/builtin-record.c
--- linux-2.6_tipmaster0315/tools/perf/builtin-record.c	2010-03-16 08:59:54.896488489 +0800
+++ linux-2.6_tipmaster0315_perfkvm/tools/perf/builtin-record.c	2010-03-16 09:01:09.980084394 +0800
@@ -566,18 +566,58 @@ static int __cmd_record(int argc, const
 	post_processing_offset = lseek(output, 0, SEEK_CUR);

 	err = event__synthesize_kernel_mmap(process_synthesized_event,
-					    session, "_text");
+					    session, "/proc/kallsyms",
+					    "kernel.kallsyms",
+					    session->vmlinux_maps,
+					    "_text", PERF_RECORD_MISC_KERNEL);
 	if (err < 0) {
 		pr_err("Couldn't record kernel reference relocation symbol.\n");
 		return err;
 	}

-	err = event__synthesize_modules(process_synthesized_event, session);
+	err = event__synthesize_modules(process_synthesized_event,
+					session,
+					&session->kmaps,
+					PERF_RECORD_MISC_KERNEL);
 	if (err < 0) {
 		pr_err("Couldn't record kernel reference relocation symbol.\n");
 		return err;
 	}

+	if (perf_guest) {
+		/*
+		 * As for the guest kernel, when processing the record and
+		 * report subcommands we arrange the module mmaps prior to
+		 * the guest kernel mmap and trigger a dso preload, because
+		 * guest module symbols are loaded from the guest kallsyms
+		 * instead of /lib/modules/XXX/XXX. This avoids missing
+		 * symbols when the first address is in a module instead of
+		 * in the guest kernel.
+		 */
+		err = event__synthesize_modules(process_synthesized_event,
+						session,
+						&session->guest_kmaps,
+						PERF_RECORD_MISC_GUEST_KERNEL);
+		if (err < 0) {
+			pr_err("Couldn't record guest kernel reference relocation symbol.\n");
+			return err;
+		}
+
+		/*
+		 * We use _stext for the guest kernel because the guest
+		 * kernel's /proc/kallsyms has no _text.
+		 */
+		err = event__synthesize_kernel_mmap(process_synthesized_event,
+						    session,
+						    symbol_conf.guest_kallsyms,
+						    "guest.kernel.kallsyms",
+						    session->guest_vmlinux_maps,
+						    "_stext",
+						    PERF_RECORD_MISC_GUEST_KERNEL);
+		if (err < 0) {
+			pr_err("Couldn't record guest kernel reference relocation symbol.\n");
+			return err;
+		}
+	}
+
 	if (!system_wide && profile_cpu == -1)
 		event__synthesize_thread(target_pid, process_synthesized_event,
 					 session);
diff -Nraup linux-2.6_tipmaster0315/tools/perf/builtin-report.c linux-2.6_tipmaster0315_perfkvm/tools/perf/builtin-report.c
--- linux-2.6_tipmaster0315/tools/perf/builtin-report.c	2010-03-16 08:59:54.760470652 +0800
+++ linux-2.6_tipmaster0315_perfkvm/tools/perf/builtin-report.c	2010-03-16 10:40:24.102800324 +0800
@@ -104,7 +104,7 @@ static int perf_session__add_hist_entry(
 		return -ENOMEM;

 	if (hit)
-		he->count += data->period;
+		__perf_session__add_count(he, al, data->period);

 	if (symbol_conf.use_callchain) {
 		if (!hit)
@@ -428,6 +428,8 @@ static const struct option options[] = {
 		   "sort by key(s): pid, comm, dso, symbol, parent"),
 	OPT_BOOLEAN('P', "full-paths", &symbol_conf.full_paths,
 		    "Don't shorten the pathnames taking into account the cwd"),
+	OPT_BOOLEAN(0, "showcpuutilization", &symbol_conf.show_cpu_utilization,
+		    "Show sample percentage for different cpu modes"),
 	OPT_STRING('p', "parent", &parent_pattern, "regex",
 		   "regex filter to identify parent, see: '--sort parent'"),
 	OPT_BOOLEAN('x', "exclude-other", &symbol_conf.exclude_other,
diff -Nraup linux-2.6_tipmaster0315/tools/perf/builtin-top.c linux-2.6_tipmaster0315_perfkvm/tools/perf/builtin-top.c
--- linux-2.6_tipmaster0315/tools/perf/builtin-top.c	2010-03-16 08:59:54.760470652 +0800
+++ linux-2.6_tipmaster0315_perfkvm/tools/perf/builtin-top.c	2010-03-16 09:01:09.984084103 +0800
@@ -417,8 +417,9 @@ static double sym_weight(const struct sy
 }

 static long			samples;
-static long			userspace_samples;
+static long			kernel_samples, userspace_samples;
 static long			exact_samples;
+static long			guest_us_samples, guest_kernel_samples;
 static const char		CONSOLE_CLEAR[] = "^[[H^[[2J";

 static void __list_insert_active_sym(struct sym_entry *syme)
@@ -458,7 +459,10 @@ static void print_sym_table(void)
 	int printed = 0, j;
 	int counter, snap = !display_weighted ? sym_counter : 0;
 	float samples_per_sec = samples/delay_secs;
-	float ksamples_per_sec = (samples-userspace_samples)/delay_secs;
+	float ksamples_per_sec = kernel_samples/delay_secs;
+	float userspace_samples_per_sec = (userspace_samples)/delay_secs;
+	float guest_kernel_samples_per_sec = (guest_kernel_samples)/delay_secs;
+	float guest_us_samples_per_sec = (guest_us_samples)/delay_secs;
 	float esamples_percent = (100.0*exact_samples)/samples;
 	float sum_ksamples = 0.0;
 	struct sym_entry *syme, *n;
@@ -467,7 +471,8 @@ static void print_sym_table(void)
 	int sym_width = 0, dso_width = 0, max_dso_width;
 	const int win_width = winsize.ws_col - 1;

-	samples = userspace_samples = exact_samples = 0;
+	samples = userspace_samples = kernel_samples = exact_samples = 0;
+	guest_kernel_samples = guest_us_samples = 0;

 	/* Sort the active symbols */
 	pthread_mutex_lock(&active_symbols_lock);
@@ -498,10 +503,21 @@ static void print_sym_table(void)
 	puts(CONSOLE_CLEAR);

 	printf("%-*.*s\n", win_width, win_width, graph_dotted_line);
-	printf( "   PerfTop:%8.0f irqs/sec  kernel:%4.1f%%  exact: %4.1f%% [",
-		samples_per_sec,
-		100.0 - (100.0*((samples_per_sec-ksamples_per_sec)/samples_per_sec)),
-		esamples_percent);
+	if (!perf_guest) {
+		printf( "   PerfTop:%8.0f irqs/sec  kernel:%4.1f%%  exact: %4.1f%% [",
+			samples_per_sec,
+			100.0 - (100.0*((samples_per_sec-ksamples_per_sec)/samples_per_sec)),
+			esamples_percent);
+	} else {
+		printf( "   PerfTop:%8.0f irqs/sec  kernel:%4.1f%% us:%4.1f%%"
+			" guest kernel:%4.1f%% guest us:%4.1f%% exact: %4.1f%% [",
+			samples_per_sec,
+			100.0 - (100.0*((samples_per_sec-ksamples_per_sec)/samples_per_sec)),
+			100.0 - (100.0*((samples_per_sec-userspace_samples_per_sec)/samples_per_sec)),
+			100.0 - (100.0*((samples_per_sec-guest_kernel_samples_per_sec)/samples_per_sec)),
+			100.0 - (100.0*((samples_per_sec-guest_us_samples_per_sec)/samples_per_sec)),
+			esamples_percent);
+	}

 	if (nr_counters == 1 || !display_weighted) {
 		printf("%Ld", (u64)attrs[0].sample_period);
@@ -958,9 +974,20 @@ static void event__process_sample(const
 			return;
 		break;
 	case PERF_RECORD_MISC_KERNEL:
+		++kernel_samples;
 		if (hide_kernel_symbols)
 			return;
 		break;
+	case PERF_RECORD_MISC_GUEST_KERNEL:
+		++guest_kernel_samples;
+		break;
+	case PERF_RECORD_MISC_GUEST_USER:
+		++guest_us_samples;
+		/*
+		 * TODO: we don't process guest user from the host side
+		 * except simple counting.
+		 */
+		return;
 	default:
 		return;
 	}
diff -Nraup linux-2.6_tipmaster0315/tools/perf/Makefile linux-2.6_tipmaster0315_perfkvm/tools/perf/Makefile
--- linux-2.6_tipmaster0315/tools/perf/Makefile	2010-03-16 08:59:54.892460680 +0800
+++ linux-2.6_tipmaster0315_perfkvm/tools/perf/Makefile	2010-03-16 10:45:19.503860691 +0800
@@ -462,6 +462,7 @@ BUILTIN_OBJS += builtin-trace.o
 BUILTIN_OBJS += builtin-probe.o
 BUILTIN_OBJS += builtin-kmem.o
 BUILTIN_OBJS += builtin-lock.o
+BUILTIN_OBJS += builtin-kvm.o

 PERFLIBS = $(LIB_FILE)
diff -Nraup linux-2.6_tipmaster0315/tools/perf/perf.c linux-2.6_tipmaster0315_perfkvm/tools/perf/perf.c
--- linux-2.6_tipmaster0315/tools/perf/perf.c	2010-03-16 08:59:54.764469663 +0800
+++ linux-2.6_tipmaster0315_perfkvm/tools/perf/perf.c	2010-03-16 09:01:09.984084103 +0800
@@ -308,6 +308,7 @@ static void handle_internal_command(int
 		{ "probe",	cmd_probe,	0 },
 		{ "kmem",	cmd_kmem,	0 },
 		{ "lock",	cmd_lock,	0 },
+		{ "kvm",	cmd_kvm,	0 },
 	};
 	unsigned int i;
 	static const char ext[] = STRIP_EXTENSION;
diff -Nraup linux-2.6_tipmaster0315/tools/perf/perf.h linux-2.6_tipmaster0315_perfkvm/tools/perf/perf.h
--- linux-2.6_tipmaster0315/tools/perf/perf.h	2010-03-16 08:59:54.896488489 +0800
+++ linux-2.6_tipmaster0315_perfkvm/tools/perf/perf.h	2010-03-16 09:01:10.000116335 +0800
@@ -133,4 +133,6 @@ struct ip_callchain {
 	u64 ips[0];
 };

+extern int perf_host, perf_guest;
+
 #endif
diff -Nraup linux-2.6_tipmaster0315/tools/perf/util/event.c linux-2.6_tipmaster0315_perfkvm/tools/perf/util/event.c
--- linux-2.6_tipmaster0315/tools/perf/util/event.c	2010-03-16 08:59:54.864459297 +0800
+++ linux-2.6_tipmaster0315_perfkvm/tools/perf/util/event.c	2010-03-16 09:45:19.660852164 +0800
@@ -112,7 +112,7 @@ static int event__synthesize_mmap_events
 	event_t ev = {
 		.header = {
 			.type = PERF_RECORD_MMAP,
-			.misc = 0, /* Just like the kernel, see kernel/perf_event.c __perf_event_mmap */
+			.misc = PERF_RECORD_MISC_USER, /* Just like the kernel, see kernel/perf_event.c __perf_event_mmap */
 		},
 	};
 	int n;
@@ -158,11 +158,13 @@ static int event__synthesize_mmap_events
 }

 int event__synthesize_modules(event__handler_t process,
-			      struct perf_session *session)
+			      struct perf_session *session,
+			      struct map_groups *kmaps,
+			      unsigned int misc)
 {
 	struct rb_node *nd;

-	for (nd = rb_first(&session->kmaps.maps[MAP__FUNCTION]);
+	for (nd = rb_first(&kmaps->maps[MAP__FUNCTION]);
 	     nd; nd = rb_next(nd)) {
 		event_t ev;
 		size_t size;
@@ -173,7 +175,7 @@ int event__synthesize_modules(event__han
 		size = ALIGN(pos->dso->long_name_len + 1, sizeof(u64));
 		memset(&ev, 0, sizeof(ev));
-		ev.mmap.header.misc = 1; /* kernel uses 0 for user space maps, see kernel/perf_event.c __perf_event_mmap */
+		ev.mmap.header.misc = misc; /* kernel uses 0 for user space maps, see kernel/perf_event.c __perf_event_mmap */
 		ev.mmap.header.type = PERF_RECORD_MMAP;
 		ev.mmap.header.size = (sizeof(ev.mmap) -
 				       (sizeof(ev.mmap.filename) - size));
@@ -241,13 +243,17 @@ static int find_symbol_cb(void *arg, con

 int event__synthesize_kernel_mmap(event__handler_t process,
 				  struct perf_session *session,
-				  const char *symbol_name)
+				  const char *kallsyms_name,
+				  const char *mmap_name,
+				  struct map **maps,
+				  const char *symbol_name,
+				  unsigned int misc)
 {
 	size_t size;
 	event_t ev = {
 		.header = {
 			.type = PERF_RECORD_MMAP,
-			.misc = 1, /* kernel uses 0 for user space maps, see kernel/perf_event.c __perf_event_mmap */
+			.misc = misc, /* kernel uses PERF_RECORD_MISC_USER for user space maps, see kernel/perf_event.c __perf_event_mmap */
 		},
 	};
 	/*
@@ -257,16 +263,16 @@ int event__synthesize_kernel_mmap(event_
 	 */
 	struct process_symbol_args args = { .name = symbol_name, };

-	if (kallsyms__parse("/proc/kallsyms", &args, find_symbol_cb) <= 0)
+	if (kallsyms__parse(kallsyms_name, &args, find_symbol_cb) <= 0)
 		return -ENOENT;

 	size = snprintf(ev.mmap.filename, sizeof(ev.mmap.filename),
-			"[kernel.kallsyms.%s]", symbol_name) + 1;
+			"[%s.%s]", mmap_name, symbol_name) + 1;
 	size = ALIGN(size, sizeof(u64));
 	ev.mmap.header.size = (sizeof(ev.mmap) -
 			(sizeof(ev.mmap.filename) - size));
 	ev.mmap.pgoff = args.start;
-	ev.mmap.start = session->vmlinux_maps[MAP__FUNCTION]->start;
-	ev.mmap.len   = session->vmlinux_maps[MAP__FUNCTION]->end - ev.mmap.start ;
+	ev.mmap.start = maps[MAP__FUNCTION]->start;
+	ev.mmap.len   = maps[MAP__FUNCTION]->end - ev.mmap.start ;

 	return process(&ev, session);
 }
@@ -320,19 +326,25 @@ int event__process_lost(event_t *self, s
 	return 0;
 }

-int event__process_mmap(event_t *self, struct perf_session *session)
+static void event_set_kernel_mmap_len(struct map **maps, event_t *self)
 {
-	struct thread *thread;
-	struct map *map;
+	maps[MAP__FUNCTION]->start = self->mmap.start;
+	maps[MAP__FUNCTION]->end   = self->mmap.start + self->mmap.len;
+	/*
+	 * Be a bit paranoid here, some perf.data file came with
+	 * a zero sized synthesized MMAP event for the kernel.
+	 */
+	if (maps[MAP__FUNCTION]->end == 0)
+		maps[MAP__FUNCTION]->end = ~0UL;
+}

-	dump_printf(" %d/%d: [%#Lx(%#Lx) @ %#Lx]: %s\n",
-		    self->mmap.pid, self->mmap.tid, self->mmap.start,
-		    self->mmap.len, self->mmap.pgoff, self->mmap.filename);
+static int __event__process_mmap(event_t *self, struct perf_session *session)
+{
+	struct map *map;
+	static const char kmmap_prefix[] = "[kernel.kallsyms.";

-	if (self->mmap.pid == 0) {
-		static const char kmmap_prefix[] = "[kernel.kallsyms.";
+	if (self->mmap.filename[0] == '/') {

-		if (self->mmap.filename[0] == '/') {
 			char short_module_name[1024];
 			char *name = strrchr(self->mmap.filename, '/'), *dot;
@@ -348,9 +360,10 @@ int event__process_mmap(event_t *self, s
 				 "[%.*s]", (int)(dot - name), name);
 			strxfrchar(short_module_name, '-', '_');

-			map = perf_session__new_module_map(session,
+			map = map_groups__new_module(&session->kmaps,
 							   self->mmap.start,
-							   self->mmap.filename);
+							   self->mmap.filename,
+							   0);
 			if (map == NULL)
 				goto out_problem;

@@ -373,22 +386,94 @@ int event__process_mmap(event_t *self, s
 		if (kernel == NULL)
 			goto out_problem;

-		kernel->kernel = 1;
-		if (__perf_session__create_kernel_maps(session, kernel) < 0)
+		kernel->kernel = DSO_TYPE_KERNEL;
+		if (__map_groups__create_kernel_maps(&session->kmaps,
+				session->vmlinux_maps, kernel) < 0)
 			goto out_problem;

-		session->vmlinux_maps[MAP__FUNCTION]->start = self->mmap.start;
-		session->vmlinux_maps[MAP__FUNCTION]->end   = self->mmap.start + self->mmap.len;
-		/*
-		 * Be a bit paranoid here, some perf.data file came with
-		 * a zero sized synthesized MMAP event for the kernel.
-		 */
-		if (session->vmlinux_maps[MAP__FUNCTION]->end == 0)
-			session->vmlinux_maps[MAP__FUNCTION]->end = ~0UL;
-
-		perf_session__set_kallsyms_ref_reloc_sym(session, symbol_name,
-							 self->mmap.pgoff);
+		event_set_kernel_mmap_len(session->vmlinux_maps, self);
+		perf_session__set_kallsyms_ref_reloc_sym(session->vmlinux_maps,
+							 symbol_name,
+							 self->mmap.pgoff);
 	}
+	return 0;
+
+out_problem:
+	return -1;
+}
+
+static int __event__process_guest_mmap(event_t *self, struct perf_session *session)
+{
+	struct map *map;
+	static const char kmmap_prefix[] = "[guest.kernel.kallsyms.";
+
+	if (memcmp(self->mmap.filename, kmmap_prefix,
+		   sizeof(kmmap_prefix) - 1) == 0) {
+		const char *symbol_name = (self->mmap.filename +
+					   sizeof(kmmap_prefix) - 1);
+		/*
+		 * Should be there already, from the build-id table in
+		 * the header.
+		 */
+		struct dso *kernel = __dsos__findnew(&dsos__guest_kernel,
+						     "[guest.kernel.kallsyms]");
+		if (kernel == NULL)
+			goto out_problem;
+
+		kernel->kernel = DSO_TYPE_GUEST_KERNEL;
+		if (__map_groups__create_kernel_maps(&session->guest_kmaps,
+				session->guest_vmlinux_maps, kernel) < 0)
+			goto out_problem;
+
+		event_set_kernel_mmap_len(session->guest_vmlinux_maps, self);
+		perf_session__set_kallsyms_ref_reloc_sym(session->guest_vmlinux_maps,
+							 symbol_name,
+							 self->mmap.pgoff);
+		/*
+		 * preload dso of guest kernel and modules
+		 */
+		dso__load(kernel, session->guest_vmlinux_maps[MAP__FUNCTION], NULL);
+	} else if (self->mmap.filename[0] == '[') {
+		char *name;
+
+		map = map_groups__new_module(&session->guest_kmaps,
+					     self->mmap.start,
+					     self->mmap.filename,
+					     1);
+		if (map == NULL)
+			goto out_problem;
+
+		name = strdup(self->mmap.filename);
+		if (name == NULL)
+			goto out_problem;
+
+		map->dso->short_name = name;
+		map->end = map->start + self->mmap.len;
+	}
+
+	return 0;
+out_problem:
+	return -1;
+}
+
+int event__process_mmap(event_t *self, struct perf_session *session)
+{
+	struct thread *thread;
+	struct map *map;
+	u8 cpumode = self->header.misc & PERF_RECORD_MISC_CPUMODE_MASK;
+	int ret;
+
+	dump_printf(" %d/%d: [%#Lx(%#Lx) @ %#Lx]: %s\n",
+		    self->mmap.pid, self->mmap.tid, self->mmap.start,
+		    self->mmap.len, self->mmap.pgoff, self->mmap.filename);
+
+	if (self->mmap.pid == 0) {
+		if (cpumode == PERF_RECORD_MISC_GUEST_KERNEL)
+			ret = __event__process_guest_mmap(self, session);
+		else
+			ret = __event__process_mmap(self, session);
+		if (ret < 0)
+			goto out_problem;
 		return 0;
 	}
@@ -441,15 +526,33 @@ void thread__find_addr_map(struct thread
 	al->thread = self;
 	al->addr = addr;
+	al->cpumode = cpumode;

-	if (cpumode == PERF_RECORD_MISC_KERNEL) {
+	if (cpumode == PERF_RECORD_MISC_KERNEL && perf_host) {
 		al->level = 'k';
 		mg = &session->kmaps;
-	} else if (cpumode == PERF_RECORD_MISC_USER)
+	} else if (cpumode == PERF_RECORD_MISC_USER && perf_host) {
 		al->level = '.';
-	else {
-		al->level = 'H';
+	} else if (cpumode == PERF_RECORD_MISC_GUEST_KERNEL && perf_guest) {
+		al->level = 'g';
+		mg = &session->guest_kmaps;
+	} else {
+		/* TODO: We don't support guest user space. Might support late */
+		if (cpumode == PERF_RECORD_MISC_GUEST_USER && perf_guest)
+			al->level = 'u';
+		else
+			al->level = 'H';
 		al->map = NULL;
+
+		if ((cpumode == PERF_RECORD_MISC_GUEST_USER ||
+			cpumode == PERF_RECORD_MISC_GUEST_KERNEL) &&
+			!perf_guest)
+			al->filtered = true;
+		if ((cpumode == PERF_RECORD_MISC_USER ||
+			cpumode == PERF_RECORD_MISC_KERNEL) &&
+			!perf_host)
+			al->filtered = true;
+
 		return;
 	}
 try_again:
@@ -464,10 +567,18 @@ try_again:
 		 * "[vdso]" dso, but for now lets use the old trick of looking
 		 * in the whole kernel symbol list.
 		 */
-		if ((long long)al->addr < 0 && mg != &session->kmaps) {
+		if ((long long)al->addr < 0 &&
+		    mg != &session->kmaps &&
+		    cpumode == PERF_RECORD_MISC_KERNEL) {
 			mg = &session->kmaps;
 			goto try_again;
 		}
+		if ((long long)al->addr < 0 &&
+		    mg != &session->guest_kmaps &&
+		    cpumode == PERF_RECORD_MISC_GUEST_KERNEL) {
+			mg = &session->guest_kmaps;
+			goto try_again;
+		}
 	} else
 		al->addr = al->map->map_ip(al->map, al->addr);
 }
@@ -513,6 +624,7 @@ int event__preprocess_sample(const event

 	dump_printf(" ... thread: %s:%d\n", thread->comm, thread->pid);

+	al->filtered = false;
 	thread__find_addr_location(thread, session, cpumode, MAP__FUNCTION,
 				   self->ip.ip, al, filter);
 	dump_printf(" ...... dso: %s\n",
@@ -536,7 +648,6 @@ int event__preprocess_sample(const event
 	    !strlist__has_entry(symbol_conf.sym_list, al->sym->name))
 		goto out_filtered;

-	al->filtered = false;
 	return 0;

 out_filtered:
diff -Nraup linux-2.6_tipmaster0315/tools/perf/util/event.h linux-2.6_tipmaster0315_perfkvm/tools/perf/util/event.h
--- linux-2.6_tipmaster0315/tools/perf/util/event.h	2010-03-16 08:59:54.856460879 +0800
+++ linux-2.6_tipmaster0315_perfkvm/tools/perf/util/event.h	2010-03-16 09:01:10.000116335 +0800
@@ -119,10 +119,17 @@ int event__synthesize_thread(pid_t pid,
 void event__synthesize_threads(event__handler_t process,
 			       struct perf_session *session);
 int event__synthesize_kernel_mmap(event__handler_t process,
-				  struct perf_session *session,
-				  const char *symbol_name);
+				  struct perf_session *session,
+				  const char *kallsyms_name,
+				  const char *mmap_name,
+				  struct map **maps,
+				  const char *symbol_name,
+				  unsigned int misc);
+
 int event__synthesize_modules(event__handler_t process,
-			      struct perf_session *session);
+			      struct perf_session *session,
+			      struct map_groups *kmaps,
+			      unsigned int misc);

 int event__process_comm(event_t *self, struct perf_session *session);
 int event__process_lost(event_t *self, struct perf_session *session);
diff -Nraup linux-2.6_tipmaster0315/tools/perf/util/hist.c
linux-2.6_tipmaster0315_perfkvm/tools/perf/util/hist.c --- linux-2.6_tipmaster0315/tools/perf/util/hist.c 2010-03-16 08:59:54.880462306 +0800 +++ linux-2.6_tipmaster0315_perfkvm/tools/perf/util/hist.c 2010-03-16 10:44:18.228997471 +0800 @@ -8,6 +8,30 @@ struct callchain_param callchain_param = .min_percent = 0.5 }; +void __perf_session__add_count(struct hist_entry *he, + struct addr_location *al, + u64 count) +{ + he->count += count; + + switch (al->cpumode) { + case PERF_RECORD_MISC_KERNEL: + he->count_sys += count; + break; + case PERF_RECORD_MISC_USER: + he->count_us += count; + break; + case PERF_RECORD_MISC_GUEST_KERNEL: + he->count_guest_sys += count; + break; + case PERF_RECORD_MISC_GUEST_USER: + he->count_guest_us += count; + break; + default: + break; + } +} + /* * histogram, sorted on item, collects counts */ @@ -26,7 +50,6 @@ struct hist_entry *__perf_session__add_h .sym = al->sym, .ip = al->addr, .level = al->level, - .count = count, .parent = sym_parent, }; int cmp; @@ -48,6 +71,8 @@ struct hist_entry *__perf_session__add_h p = &(*p)->rb_right; } + __perf_session__add_count(&entry, al, count); + he = malloc(sizeof(*he)); if (!he) return NULL; @@ -462,7 +487,7 @@ size_t hist_entry__fprintf(struct hist_e u64 session_total) { struct sort_entry *se; - u64 count, total; + u64 count, total, count_sys, count_us, count_guest_sys, count_guest_us; const char *sep = symbol_conf.field_sep; size_t ret; @@ -472,15 +497,35 @@ size_t hist_entry__fprintf(struct hist_e if (pair_session) { count = self->pair ? self->pair->count : 0; total = pair_session->events_stats.total; + count_sys = self->pair ? self->pair->count_sys : 0; + count_us = self->pair ? self->pair->count_us : 0; + count_guest_sys = self->pair ? self->pair->count_guest_sys : 0; + count_guest_us = self->pair ? 
self->pair->count_guest_us : 0; } else { count = self->count; total = session_total; + count_sys = self->count_sys; + count_us = self->count_us; + count_guest_sys = self->count_guest_sys; + count_guest_us = self->count_guest_us; } - if (total) + if (total) { ret = percent_color_fprintf(fp, sep ? "%.2f" : " %6.2f%%", (count * 100.0) / total); - else + if (symbol_conf.show_cpu_utilization) { + ret += percent_color_fprintf(fp, sep ? "%.2f" : " %6.2f%%", + (count_sys * 100.0) / total); + ret += percent_color_fprintf(fp, sep ? "%.2f" : " %6.2f%%", + (count_us * 100.0) / total); + if (perf_guest) { + ret += percent_color_fprintf(fp, sep ? "%.2f" : " %6.2f%%", + (count_guest_sys * 100.0) / total); + ret += percent_color_fprintf(fp, sep ? "%.2f" : " %6.2f%%", + (count_guest_us * 100.0) / total); + } + } + } else ret = fprintf(fp, sep ? "%lld" : "%12lld ", count); if (symbol_conf.show_nr_samples) { @@ -576,6 +621,20 @@ size_t perf_session__fprintf_hists(struc fputs(" Samples ", fp); } + if (symbol_conf.show_cpu_utilization) { + if (sep) { + ret += fprintf(fp, "%csys", *sep); + ret += fprintf(fp, "%cus", *sep); + ret += fprintf(fp, "%cguest sys", *sep); + ret += fprintf(fp, "%cguest us", *sep); + } else { + ret += fprintf(fp, " sys "); + ret += fprintf(fp, " us "); + ret += fprintf(fp, " guest sys "); + ret += fprintf(fp, " guest us "); + } + } + if (pair) { if (sep) ret += fprintf(fp, "%cDelta", *sep); diff -Nraup linux-2.6_tipmaster0315/tools/perf/util/hist.h linux-2.6_tipmaster0315_perfkvm/tools/perf/util/hist.h --- linux-2.6_tipmaster0315/tools/perf/util/hist.h 2010-03-16 08:59:54.868491838 +0800 +++ linux-2.6_tipmaster0315_perfkvm/tools/perf/util/hist.h 2010-03-16 10:11:24.744056043 +0800 @@ -12,6 +12,9 @@ struct addr_location; struct symbol; struct rb_root; +void __perf_session__add_count(struct hist_entry *he, + struct addr_location *al, + u64 count); struct hist_entry *__perf_session__add_hist_entry(struct rb_root *hists, struct addr_location *al, struct symbol 
*parent, diff -Nraup linux-2.6_tipmaster0315/tools/perf/util/session.c linux-2.6_tipmaster0315_perfkvm/tools/perf/util/session.c --- linux-2.6_tipmaster0315/tools/perf/util/session.c 2010-03-16 08:59:54.888458734 +0800 +++ linux-2.6_tipmaster0315_perfkvm/tools/perf/util/session.c 2010-03-16 09:01:10.000116335 +0800 @@ -54,7 +54,12 @@ out_close: static inline int perf_session__create_kernel_maps(struct perf_session *self) { - return map_groups__create_kernel_maps(&self->kmaps, self->vmlinux_maps); + int ret; + ret = map_groups__create_kernel_maps(&self->kmaps, self->vmlinux_maps); + if (ret >= 0) + ret = map_groups__create_guest_kernel_maps(&self->guest_kmaps, + self->guest_vmlinux_maps); + return ret; } struct perf_session *perf_session__new(const char *filename, int mode, bool force) @@ -77,6 +82,7 @@ struct perf_session *perf_session__new(c self->cwdlen = 0; self->unknown_events = 0; map_groups__init(&self->kmaps); + map_groups__init(&self->guest_kmaps); if (mode == O_RDONLY) { if (perf_session__open(self, force) < 0) @@ -356,7 +362,8 @@ int perf_header__read_build_ids(struct p if (read(input, filename, len) != len) goto out; - if (bev.header.misc & PERF_RECORD_MISC_KERNEL) + if ((bev.header.misc & PERF_RECORD_MISC_CPUMODE_MASK) + == PERF_RECORD_MISC_KERNEL) head = &dsos__kernel; dso = __dsos__findnew(head, filename); @@ -519,26 +526,33 @@ bool perf_session__has_traces(struct per return true; } -int perf_session__set_kallsyms_ref_reloc_sym(struct perf_session *self, +int perf_session__set_kallsyms_ref_reloc_sym(struct map ** maps, const char *symbol_name, u64 addr) { char *bracket; enum map_type i; + struct ref_reloc_sym *ref; - self->ref_reloc_sym.name = strdup(symbol_name); - if (self->ref_reloc_sym.name == NULL) + ref = zalloc(sizeof(struct ref_reloc_sym)); + if (ref == NULL) return -ENOMEM; - bracket = strchr(self->ref_reloc_sym.name, ']'); + ref->name = strdup(symbol_name); + if (ref->name == NULL) { + free(ref); + return -ENOMEM; + } + + bracket = 
strchr(ref->name, ']'); if (bracket) *bracket = '\0'; - self->ref_reloc_sym.addr = addr; + ref->addr = addr; for (i = 0; i < MAP__NR_TYPES; ++i) { - struct kmap *kmap = map__kmap(self->vmlinux_maps[i]); - kmap->ref_reloc_sym = &self->ref_reloc_sym; + struct kmap *kmap = map__kmap(maps[i]); + kmap->ref_reloc_sym = ref; } return 0; diff -Nraup linux-2.6_tipmaster0315/tools/perf/util/session.h linux-2.6_tipmaster0315_perfkvm/tools/perf/util/session.h --- linux-2.6_tipmaster0315/tools/perf/util/session.h 2010-03-16 08:59:54.768472278 +0800 +++ linux-2.6_tipmaster0315_perfkvm/tools/perf/util/session.h 2010-03-16 09:04:50.827525867 +0800 @@ -16,16 +16,17 @@ struct perf_session { unsigned long size; unsigned long mmap_window; struct map_groups kmaps; + struct map_groups guest_kmaps; struct rb_root threads; struct thread *last_match; struct map *vmlinux_maps[MAP__NR_TYPES]; + struct map *guest_vmlinux_maps[MAP__NR_TYPES]; struct events_stats events_stats; struct rb_root stats_by_id; unsigned long event_total[PERF_RECORD_MAX]; unsigned long unknown_events; struct rb_root hists; u64 sample_type; - struct ref_reloc_sym ref_reloc_sym; int fd; int cwdlen; char *cwd; @@ -67,26 +68,12 @@ bool perf_session__has_traces(struct per int perf_header__read_build_ids(struct perf_header *self, int input, u64 offset, u64 file_size); -int perf_session__set_kallsyms_ref_reloc_sym(struct perf_session *self, +int perf_session__set_kallsyms_ref_reloc_sym(struct map ** maps, const char *symbol_name, u64 addr); void mem_bswap_64(void *src, int byte_size); -static inline int __perf_session__create_kernel_maps(struct perf_session *self, - struct dso *kernel) -{ - return __map_groups__create_kernel_maps(&self->kmaps, - self->vmlinux_maps, kernel); -} - -static inline struct map * - perf_session__new_module_map(struct perf_session *self, - u64 start, const char *filename) -{ - return map_groups__new_module(&self->kmaps, start, filename); -} - #ifdef NO_NEWT_SUPPORT static inline void 
perf_session__browse_hists(struct rb_root *hists __used, u64 session_total __used, diff -Nraup linux-2.6_tipmaster0315/tools/perf/util/sort.h linux-2.6_tipmaster0315_perfkvm/tools/perf/util/sort.h --- linux-2.6_tipmaster0315/tools/perf/util/sort.h 2010-03-16 08:59:54.780505450 +0800 +++ linux-2.6_tipmaster0315_perfkvm/tools/perf/util/sort.h 2010-03-16 09:46:38.997734739 +0800 @@ -44,6 +44,10 @@ extern enum sort_type sort__first_dimens struct hist_entry { struct rb_node rb_node; u64 count; + u64 count_sys; + u64 count_us; + u64 count_guest_sys; + u64 count_guest_us; struct thread *thread; struct map *map; struct symbol *sym; diff -Nraup linux-2.6_tipmaster0315/tools/perf/util/symbol.c linux-2.6_tipmaster0315_perfkvm/tools/perf/util/symbol.c --- linux-2.6_tipmaster0315/tools/perf/util/symbol.c 2010-03-16 08:59:54.784503211 +0800 +++ linux-2.6_tipmaster0315_perfkvm/tools/perf/util/symbol.c 2010-03-16 10:47:03.587519946 +0800 @@ -22,6 +22,8 @@ static void dsos__add(struct list_head * static struct map *map__new2(u64 start, struct dso *dso, enum map_type type); static int dso__load_kernel_sym(struct dso *self, struct map *map, symbol_filter_t filter); +static int dso__load_guest_kernel_sym(struct dso *self, struct map *map, + symbol_filter_t filter); static int vmlinux_path__nr_entries; static char **vmlinux_path; @@ -172,6 +174,7 @@ struct dso *dso__new(const char *name) self->loaded = 0; self->sorted_by_name = 0; self->has_build_id = 0; + self->kernel = DSO_TYPE_USER; } return self; @@ -388,12 +391,9 @@ int kallsyms__parse(const char *filename char *symbol_name; line_len = getline(&line, &n, file); - if (line_len < 0) + if (line_len < 0 || !line) break; - if (!line) - goto out_failure; - line[--line_len] = '\0'; /* \n */ len = hex2u64(line, &start); @@ -445,6 +445,7 @@ static int map__process_kallsym_symbol(v * map__split_kallsyms, when we have split the maps per module */ symbols__insert(root, sym); + return 0; } @@ -490,6 +491,15 @@ static int 
dso__split_kallsyms(struct ds *module++ = '\0'; if (strcmp(curr_map->dso->short_name, module)) { + if (curr_map != map && + self->kernel == DSO_TYPE_GUEST_KERNEL) { + /* + * We assume all symbols of a module are continuous in + * kallsyms, so curr_map points to a module and all its + * symbols are in its kmap. Mark it as loaded. + */ + dso__set_loaded(curr_map->dso, curr_map->type); + } curr_map = map_groups__find_by_name(kmaps, map->type, module); if (curr_map == NULL) { pr_debug("/proc/{kallsyms,modules} " @@ -511,13 +521,19 @@ static int dso__split_kallsyms(struct ds char dso_name[PATH_MAX]; struct dso *dso; - snprintf(dso_name, sizeof(dso_name), "[kernel].%d", - kernel_range++); + if (self->kernel == DSO_TYPE_GUEST_KERNEL) + snprintf(dso_name, sizeof(dso_name), "[guest.kernel].%d", + kernel_range++); + else + snprintf(dso_name, sizeof(dso_name), "[kernel].%d", + kernel_range++); dso = dso__new(dso_name); if (dso == NULL) return -1; + dso->kernel = self->kernel; + curr_map = map__new2(pos->start, dso, map->type); if (curr_map == NULL) { dso__delete(dso); @@ -541,6 +557,10 @@ discard_symbol: rb_erase(&pos->rb_node, } } + if (curr_map != map && + self->kernel == DSO_TYPE_GUEST_KERNEL) + dso__set_loaded(curr_map->dso, curr_map->type); + return count; } @@ -551,7 +571,10 @@ int dso__load_kallsyms(struct dso *self, return -1; symbols__fixup_end(&self->symbols[map->type]); - self->origin = DSO__ORIG_KERNEL; + if (self->kernel == DSO_TYPE_GUEST_KERNEL) + self->origin = DSO__ORIG_GUEST_KERNEL; + else + self->origin = DSO__ORIG_KERNEL; return dso__split_kallsyms(self, map, filter); } @@ -939,7 +962,7 @@ static int dso__load_sym(struct dso *sel nr_syms = shdr.sh_size / shdr.sh_entsize; memset(&sym, 0, sizeof(sym)); - if (!self->kernel) { + if (self->kernel == DSO_TYPE_USER) { self->adjust_symbols = (ehdr.e_type == ET_EXEC || elf_section_by_name(elf, &ehdr, &shdr, ".gnu.prelink_undo", @@ -971,7 +994,7 @@ static int dso__load_sym(struct dso *sel section_name = 
elf_sec__name(&shdr, secstrs); - if (self->kernel || kmodule) { + if (self->kernel != DSO_TYPE_USER || kmodule) { char dso_name[PATH_MAX]; if (strcmp(section_name, @@ -997,6 +1020,7 @@ static int dso__load_sym(struct dso *sel curr_dso = dso__new(dso_name); if (curr_dso == NULL) goto out_elf_end; + curr_dso->kernel = self->kernel; curr_map = map__new2(start, curr_dso, map->type); if (curr_map == NULL) { @@ -1007,7 +1031,10 @@ static int dso__load_sym(struct dso *sel curr_map->unmap_ip = identity__map_ip; curr_dso->origin = self->origin; map_groups__insert(kmap->kmaps, curr_map); - dsos__add(&dsos__kernel, curr_dso); + if (curr_dso->kernel == DSO_TYPE_GUEST_KERNEL) + dsos__add(&dsos__guest_kernel, curr_dso); + else + dsos__add(&dsos__kernel, curr_dso); dso__set_loaded(curr_dso, map->type); } else curr_dso = curr_map->dso; @@ -1228,6 +1255,8 @@ char dso__symtab_origin(const struct dso [DSO__ORIG_BUILDID] = 'b', [DSO__ORIG_DSO] = 'd', [DSO__ORIG_KMODULE] = 'K', + [DSO__ORIG_GUEST_KERNEL] = 'g', + [DSO__ORIG_GUEST_KMODULE] = 'G', }; if (self == NULL || self->origin == DSO__ORIG_NOT_FOUND) @@ -1246,8 +1275,10 @@ int dso__load(struct dso *self, struct m dso__set_loaded(self, map->type); - if (self->kernel) + if (self->kernel == DSO_TYPE_KERNEL) return dso__load_kernel_sym(self, map, filter); + else if (self->kernel == DSO_TYPE_GUEST_KERNEL) + return dso__load_guest_kernel_sym(self, map, filter); name = malloc(size); if (!name) @@ -1451,7 +1482,7 @@ static int map_groups__set_modules_path( static struct map *map__new2(u64 start, struct dso *dso, enum map_type type) { struct map *self = zalloc(sizeof(*self) + - (dso->kernel ? sizeof(struct kmap) : 0)); + (dso->kernel != DSO_TYPE_USER ? 
sizeof(struct kmap) : 0)); if (self != NULL) { /* * ->end will be filled after we load all the symbols @@ -1463,11 +1494,15 @@ static struct map *map__new2(u64 start, } struct map *map_groups__new_module(struct map_groups *self, u64 start, - const char *filename) + const char *filename, int guest) { struct map *map; - struct dso *dso = __dsos__findnew(&dsos__kernel, filename); + struct dso *dso; + if (!guest) + dso = __dsos__findnew(&dsos__kernel, filename); + else + dso = __dsos__findnew(&dsos__guest_kernel, filename); if (dso == NULL) return NULL; @@ -1475,16 +1510,20 @@ struct map *map_groups__new_module(struc if (map == NULL) return NULL; - dso->origin = DSO__ORIG_KMODULE; + if (guest) + dso->origin = DSO__ORIG_GUEST_KMODULE; + else + dso->origin = DSO__ORIG_KMODULE; map_groups__insert(self, map); return map; } -static int map_groups__create_modules(struct map_groups *self) +static int __map_groups__create_modules(struct map_groups *self, + const char * filename, int guest) { char *line = NULL; size_t n; - FILE *file = fopen("/proc/modules", "r"); + FILE *file = fopen(filename, "r"); struct map *map; if (file == NULL) @@ -1518,16 +1557,17 @@ static int map_groups__create_modules(st *sep = '\0'; snprintf(name, sizeof(name), "[%s]", line); - map = map_groups__new_module(self, start, name); + map = map_groups__new_module(self, start, name, guest); if (map == NULL) goto out_delete_line; - dso__kernel_module_get_build_id(map->dso); + if (!guest) + dso__kernel_module_get_build_id(map->dso); } free(line); fclose(file); - return map_groups__set_modules_path(self); + return 0; out_delete_line: free(line); @@ -1535,6 +1575,21 @@ out_failure: return -1; } +static int map_groups__create_modules(struct map_groups *self) +{ + int ret; + + ret = __map_groups__create_modules(self, "/proc/modules", 0); + if (ret >= 0) + ret = map_groups__set_modules_path(self); + return ret; +} + +static int map_groups__create_guest_modules(struct map_groups *self) +{ + return 
__map_groups__create_modules(self, symbol_conf.guest_modules, 1); +} + static int dso__load_vmlinux(struct dso *self, struct map *map, const char *vmlinux, symbol_filter_t filter) { @@ -1694,8 +1749,44 @@ out_fixup: return err; } +static int dso__load_guest_kernel_sym(struct dso *self, struct map *map, + symbol_filter_t filter) +{ + int err; + const char *kallsyms_filename; + /* + * if the user specified a vmlinux filename, use it and only + * it, reporting errors to the user if it cannot be used. + * Or use file guest_kallsyms inputted by user on commandline + */ + if (symbol_conf.guest_vmlinux_name != NULL) { + err = dso__load_vmlinux(self, map, + symbol_conf.guest_vmlinux_name, filter); + goto out_try_fixup; + } + + kallsyms_filename = symbol_conf.guest_kallsyms; + if (!kallsyms_filename) + return -1; + err = dso__load_kallsyms(self, kallsyms_filename, map, filter); + if (err > 0) + pr_debug("Using %s for symbols\n", kallsyms_filename); + +out_try_fixup: + if (err > 0) { + if (kallsyms_filename != NULL) + dso__set_long_name(self, strdup("[guest.kernel.kallsyms]")); + map__fixup_start(map); + map__fixup_end(map); + } + + return err; +} + LIST_HEAD(dsos__user); LIST_HEAD(dsos__kernel); +LIST_HEAD(dsos__guest_user); +LIST_HEAD(dsos__guest_kernel); static void dsos__add(struct list_head *head, struct dso *dso) { @@ -1742,6 +1833,8 @@ void dsos__fprintf(FILE *fp) { __dsos__fprintf(&dsos__kernel, fp); __dsos__fprintf(&dsos__user, fp); + __dsos__fprintf(&dsos__guest_kernel, fp); + __dsos__fprintf(&dsos__guest_user, fp); } static size_t __dsos__fprintf_buildid(struct list_head *head, FILE *fp, @@ -1771,7 +1864,19 @@ struct dso *dso__new_kernel(const char * if (self != NULL) { self->short_name = "[kernel]"; - self->kernel = 1; + self->kernel = DSO_TYPE_KERNEL; + } + + return self; +} + +struct dso *dso__new_guest_kernel(const char *name) +{ + struct dso *self = dso__new(name ?: "[guest.kernel.kallsyms]"); + + if (self != NULL) { + self->short_name = "[guest.kernel]"; + 
self->kernel = DSO_TYPE_GUEST_KERNEL; } return self; @@ -1796,6 +1901,15 @@ static struct dso *dsos__create_kernel(c return kernel; } +static struct dso *dsos__create_guest_kernel(const char *vmlinux) +{ + struct dso *kernel = dso__new_guest_kernel(vmlinux); + + if (kernel != NULL) + dsos__add(&dsos__guest_kernel, kernel); + return kernel; +} + int __map_groups__create_kernel_maps(struct map_groups *self, struct map *vmlinux_maps[MAP__NR_TYPES], struct dso *kernel) @@ -1955,3 +2069,24 @@ int map_groups__create_kernel_maps(struc map_groups__fixup_end(self); return 0; } + +int map_groups__create_guest_kernel_maps(struct map_groups *self, + struct map *vmlinux_maps[MAP__NR_TYPES]) +{ + struct dso *kernel = dsos__create_guest_kernel(symbol_conf.guest_vmlinux_name); + + if (kernel == NULL) + return -1; + + if (__map_groups__create_kernel_maps(self, vmlinux_maps, kernel) < 0) + return -1; + + if (symbol_conf.use_modules && map_groups__create_guest_modules(self) < 0) + pr_debug("Problems creating module maps, continuing anyway...\n"); + /* + * Now that we have all the maps created, just set the ->end of them: + */ + map_groups__fixup_end(self); + return 0; +} + diff -Nraup linux-2.6_tipmaster0315/tools/perf/util/symbol.h linux-2.6_tipmaster0315_perfkvm/tools/perf/util/symbol.h --- linux-2.6_tipmaster0315/tools/perf/util/symbol.h 2010-03-16 08:59:54.880462306 +0800 +++ linux-2.6_tipmaster0315_perfkvm/tools/perf/util/symbol.h 2010-03-16 10:37:03.880361568 +0800 @@ -63,10 +63,14 @@ struct symbol_conf { show_nr_samples, use_callchain, exclude_other, - full_paths; + full_paths, + show_cpu_utilization; const char *vmlinux_name, *field_sep; - char *dso_list_str, + const char *guest_vmlinux_name, + *guest_kallsyms, + *guest_modules; + char *dso_list_str, *comm_list_str, *sym_list_str, *col_width_list_str; @@ -95,6 +99,13 @@ struct addr_location { u64 addr; char level; bool filtered; + unsigned int cpumode; +}; + +enum dso_kernel_type { + DSO_TYPE_USER = 0, + DSO_TYPE_KERNEL, + 
DSO_TYPE_GUEST_KERNEL }; struct dso { @@ -104,7 +115,7 @@ struct dso { u8 adjust_symbols:1; u8 slen_calculated:1; u8 has_build_id:1; - u8 kernel:1; + enum dso_kernel_type kernel; u8 hit:1; u8 annotate_warned:1; unsigned char origin; @@ -119,6 +130,7 @@ struct dso { struct dso *dso__new(const char *name); struct dso *dso__new_kernel(const char *name); +struct dso *dso__new_guest_kernel(const char *name); void dso__delete(struct dso *self); bool dso__loaded(const struct dso *self, enum map_type type); @@ -131,7 +143,7 @@ static inline void dso__set_loaded(struc void dso__sort_by_name(struct dso *self, enum map_type type); -extern struct list_head dsos__user, dsos__kernel; +extern struct list_head dsos__user, dsos__kernel, dsos__guest_user, dsos__guest_kernel; struct dso *__dsos__findnew(struct list_head *head, const char *name); @@ -160,6 +172,8 @@ enum dso_origin { DSO__ORIG_BUILDID, DSO__ORIG_DSO, DSO__ORIG_KMODULE, + DSO__ORIG_GUEST_KERNEL, + DSO__ORIG_GUEST_KMODULE, DSO__ORIG_NOT_FOUND, }; diff -Nraup linux-2.6_tipmaster0315/tools/perf/util/thread.h linux-2.6_tipmaster0315_perfkvm/tools/perf/util/thread.h --- linux-2.6_tipmaster0315/tools/perf/util/thread.h 2010-03-16 08:59:54.764469663 +0800 +++ linux-2.6_tipmaster0315_perfkvm/tools/perf/util/thread.h 2010-03-16 09:01:10.004081483 +0800 @@ -82,6 +82,9 @@ int __map_groups__create_kernel_maps(str int map_groups__create_kernel_maps(struct map_groups *self, struct map *vmlinux_maps[MAP__NR_TYPES]); +int map_groups__create_guest_kernel_maps(struct map_groups *self, + struct map *vmlinux_maps[MAP__NR_TYPES]); + struct map *map_groups__new_module(struct map_groups *self, u64 start, - const char *filename); + const char *filename, int guest); #endif /* __PERF_THREAD_H */ ^ permalink raw reply [flat|nested] 390+ messages in thread
* Re: [PATCH] Enhance perf to collect KVM guest os statistics from host side
  2010-03-16  5:27 [PATCH] Enhance perf to collect KVM guest os statistics from host side Zhang, Yanmin
@ 2010-03-16  5:41 ` Avi Kivity
  2010-03-16  7:24   ` Ingo Molnar
  2010-03-16  7:48   ` Zhang, Yanmin
  2010-03-19  3:38 ` Zhang, Yanmin
  1 sibling, 2 replies; 390+ messages in thread
From: Avi Kivity @ 2010-03-16  5:41 UTC (permalink / raw)
  To: Zhang, Yanmin
  Cc: Ingo Molnar, Peter Zijlstra, Sheng Yang, linux-kernel, kvm,
      Marcelo Tosatti, Joerg Roedel, Jes Sorensen, Gleb Natapov,
      Zachary Amsden, ziteng.huang

On 03/16/2010 07:27 AM, Zhang, Yanmin wrote:
> From: Zhang, Yanmin<yanmin_zhang@linux.intel.com>
>
> Based on the discussion in KVM community, I worked out the patch to support
> perf to collect guest os statistics from host side. This patch is implemented
> with Ingo, Peter and some other guys' kind help. Yang Sheng pointed out a
> critical bug and provided good suggestions with other guys. I really appreciate
> their kind help.
>
> The patch adds new subcommand kvm to perf.
>
>   perf kvm top
>   perf kvm record
>   perf kvm report
>   perf kvm diff
>
> The new perf could profile guest os kernel except guest os user space, but it
> could summarize guest os user space utilization per guest os.
>
> Below are some examples.
> 1) perf kvm top
> [root@lkp-ne01 norm]# perf kvm --host --guest --guestkallsyms=/home/ymzhang/guest/kallsyms
> --guestmodules=/home/ymzhang/guest/modules top
>

Excellent, support for guest kernel != host kernel is critical (I
can't remember the last time I ran same kernels).

How would we support multiple guests with different kernels?  Perhaps a
symbol server that perf can connect to (and that would connect to guests
in turn)?

> diff -Nraup linux-2.6_tipmaster0315/arch/x86/kvm/vmx.c linux-2.6_tipmaster0315_perfkvm/arch/x86/kvm/vmx.c
> --- linux-2.6_tipmaster0315/arch/x86/kvm/vmx.c	2010-03-16 08:59:11.825295404 +0800
> +++ linux-2.6_tipmaster0315_perfkvm/arch/x86/kvm/vmx.c	2010-03-16 09:01:09.976084492 +0800
> @@ -26,6 +26,7 @@
>  #include<linux/sched.h>
>  #include<linux/moduleparam.h>
>  #include<linux/ftrace_event.h>
> +#include<linux/perf_event.h>
>  #include "kvm_cache_regs.h"
>  #include "x86.h"
>
> @@ -3632,6 +3633,43 @@ static void update_cr8_intercept(struct
>  	vmcs_write32(TPR_THRESHOLD, irr);
>  }
>
> +DEFINE_PER_CPU(int, kvm_in_guest) = {0};
> +
> +static void kvm_set_in_guest(void)
> +{
> +	percpu_write(kvm_in_guest, 1);
> +}
> +
> +static int kvm_is_in_guest(void)
> +{
> +	return percpu_read(kvm_in_guest);
> +}
>

There is already PF_VCPU for this.

> +static struct perf_guest_info_callbacks kvm_guest_cbs = {
> +	.is_in_guest		= kvm_is_in_guest,
> +	.is_user_mode		= kvm_is_user_mode,
> +	.get_guest_ip		= kvm_get_guest_ip,
> +	.reset_in_guest		= kvm_reset_in_guest,
> +};
>

Should be in common code, not vmx specific.

-- 
Do not meddle in the internals of kernels, for they are subtle and
quick to panic.

^ permalink raw reply	[flat|nested] 390+ messages in thread
* Re: [PATCH] Enhance perf to collect KVM guest os statistics from host side
  2010-03-16  5:41 ` Avi Kivity
@ 2010-03-16  7:24   ` Ingo Molnar
  2010-03-16  9:20     ` Avi Kivity
  2010-03-16  7:48   ` Zhang, Yanmin
  1 sibling, 1 reply; 390+ messages in thread
From: Ingo Molnar @ 2010-03-16  7:24 UTC (permalink / raw)
  To: Avi Kivity
  Cc: Zhang, Yanmin, Peter Zijlstra, Sheng Yang, linux-kernel, kvm,
      Marcelo Tosatti, Joerg Roedel, Jes Sorensen, Gleb Natapov,
      Zachary Amsden, ziteng.huang

* Avi Kivity <avi@redhat.com> wrote:

> On 03/16/2010 07:27 AM, Zhang, Yanmin wrote:
> >From: Zhang, Yanmin<yanmin_zhang@linux.intel.com>
> >
> >Based on the discussion in KVM community, I worked out the patch to support
> >perf to collect guest os statistics from host side. This patch is implemented
> >with Ingo, Peter and some other guys' kind help. Yang Sheng pointed out a
> >critical bug and provided good suggestions with other guys. I really appreciate
> >their kind help.
> >
> >The patch adds new subcommand kvm to perf.
> >
> >  perf kvm top
> >  perf kvm record
> >  perf kvm report
> >  perf kvm diff
> >
> >The new perf could profile guest os kernel except guest os user space, but it
> >could summarize guest os user space utilization per guest os.
> >
> >Below are some examples.
> >1) perf kvm top
> >[root@lkp-ne01 norm]# perf kvm --host --guest --guestkallsyms=/home/ymzhang/guest/kallsyms
> >--guestmodules=/home/ymzhang/guest/modules top
> >
> 
> Excellent, support for guest kernel != host kernel is critical (I
> can't remember the last time I ran same kernels).
> 
> How would we support multiple guests with different kernels? Perhaps a
> symbol server that perf can connect to (and that would connect to guests in
> turn)?

The highest quality solution would be if KVM offered a 'guest extension' to
the guest kernel's /proc/kallsyms that made it easy for user-space to get
this information from an authorative source.

That's the main reason why the host side /proc/kallsyms is so popular and
so useful: while in theory it's mostly redundant information which can be
gleaned from the System.map and other sources of symbol information, it's
easily available and is _always_ trustable to come from the host kernel.

Separate System.map's have a tendency to go out of sync (or go missing when
a devel kernel gets rebuilt, or if a devel package is not installed), and
server ports (be that a TCP port space server or an UDP port space
mount-point) are both a configuration hassle and are not guest-transparent.

So for instrumentation infrastructure (such as perf) we have a large and
well founded preference for intrinsic, built-in, kernel-provided
information: i.e. a largely 'built-in' and transparent mechanism to get to
guest symbols.

Thanks,

	Ingo

^ permalink raw reply	[flat|nested] 390+ messages in thread
* Re: [PATCH] Enhance perf to collect KVM guest os statistics from host side
  2010-03-16  7:24   ` Ingo Molnar
@ 2010-03-16  9:20     ` Avi Kivity
  2010-03-16  9:53       ` Ingo Molnar
  0 siblings, 1 reply; 390+ messages in thread
From: Avi Kivity @ 2010-03-16  9:20 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Zhang, Yanmin, Peter Zijlstra, Sheng Yang, linux-kernel, kvm,
      Marcelo Tosatti, Joerg Roedel, Jes Sorensen, Gleb Natapov,
      Zachary Amsden, ziteng.huang

On 03/16/2010 09:24 AM, Ingo Molnar wrote:
> * Avi Kivity<avi@redhat.com>  wrote:
>
>> On 03/16/2010 07:27 AM, Zhang, Yanmin wrote:
>>> From: Zhang, Yanmin<yanmin_zhang@linux.intel.com>
>>>
>>> Based on the discussion in KVM community, I worked out the patch to support
>>> perf to collect guest os statistics from host side. This patch is implemented
>>> with Ingo, Peter and some other guys' kind help. Yang Sheng pointed out a
>>> critical bug and provided good suggestions with other guys. I really appreciate
>>> their kind help.
>>>
>>> The patch adds new subcommand kvm to perf.
>>>
>>>   perf kvm top
>>>   perf kvm record
>>>   perf kvm report
>>>   perf kvm diff
>>>
>>> The new perf could profile guest os kernel except guest os user space, but it
>>> could summarize guest os user space utilization per guest os.
>>>
>>> Below are some examples.
>>> 1) perf kvm top
>>> [root@lkp-ne01 norm]# perf kvm --host --guest --guestkallsyms=/home/ymzhang/guest/kallsyms
>>> --guestmodules=/home/ymzhang/guest/modules top
>>>
>> Excellent, support for guest kernel != host kernel is critical (I
>> can't remember the last time I ran same kernels).
>>
>> How would we support multiple guests with different kernels? Perhaps a
>> symbol server that perf can connect to (and that would connect to guests in
>> turn)?
>>
> The highest quality solution would be if KVM offered a 'guest extension' to
> the guest kernel's /proc/kallsyms that made it easy for user-space to get this
> information from an authorative source.
>
> That's the main reason why the host side /proc/kallsyms is so popular and so
> useful: while in theory it's mostly redundant information which can be gleaned
> from the System.map and other sources of symbol information, it's easily
> available and is _always_ trustable to come from the host kernel.
>
> Separate System.map's have a tendency to go out of sync (or go missing when a
> devel kernel gets rebuilt, or if a devel package is not installed), and server
> ports (be that a TCP port space server or an UDP port space mount-point) are
> both a configuration hassle and are not guest-transparent.
>
> So for instrumentation infrastructure (such as perf) we have a large and well
> founded preference for intrinsic, built-in, kernel-provided information: i.e.
> a largely 'built-in' and transparent mechanism to get to guest symbols.
>

The symbol server's client can certainly access the bits through vmchannel.

-- 
error compiling committee.c: too many arguments to function

^ permalink raw reply	[flat|nested] 390+ messages in thread
* Re: [PATCH] Enhance perf to collect KVM guest os statistics from host side 2010-03-16 9:20 ` Avi Kivity @ 2010-03-16 9:53 ` Ingo Molnar 2010-03-16 10:13 ` Avi Kivity 0 siblings, 1 reply; 390+ messages in thread From: Ingo Molnar @ 2010-03-16 9:53 UTC (permalink / raw) To: Avi Kivity Cc: Zhang, Yanmin, Peter Zijlstra, Sheng Yang, linux-kernel, kvm, Marcelo Tosatti, oerg Roedel, Jes Sorensen, Gleb Natapov, Zachary Amsden, ziteng.huang * Avi Kivity <avi@redhat.com> wrote: > On 03/16/2010 09:24 AM, Ingo Molnar wrote: > >* Avi Kivity<avi@redhat.com> wrote: > > > >>On 03/16/2010 07:27 AM, Zhang, Yanmin wrote: > >>>From: Zhang, Yanmin<yanmin_zhang@linux.intel.com> > >>> > >>>Based on the discussion in KVM community, I worked out the patch to support > >>>perf to collect guest os statistics from host side. This patch is implemented > >>>with Ingo, Peter and some other guys' kind help. Yang Sheng pointed out a > >>>critical bug and provided good suggestions with other guys. I really appreciate > >>>their kind help. > >>> > >>>The patch adds new subcommand kvm to perf. > >>> > >>> perf kvm top > >>> perf kvm record > >>> perf kvm report > >>> perf kvm diff > >>> > >>>The new perf could profile guest os kernel except guest os user space, but it > >>>could summarize guest os user space utilization per guest os. > >>> > >>>Below are some examples. > >>>1) perf kvm top > >>>[root@lkp-ne01 norm]# perf kvm --host --guest --guestkallsyms=/home/ymzhang/guest/kallsyms > >>>--guestmodules=/home/ymzhang/guest/modules top > >>> > >>Excellent, support for guest kernel != host kernel is critical (I > >>can't remember the last time I ran same kernels). > >> > >>How would we support multiple guests with different kernels? Perhaps a > >>symbol server that perf can connect to (and that would connect to guests in > >>turn)? 
> >The highest quality solution would be if KVM offered a 'guest extension' to > >the guest kernel's /proc/kallsyms that made it easy for user-space to get this > >information from an authorative source. > > > >That's the main reason why the host side /proc/kallsyms is so popular and so > >useful: while in theory it's mostly redundant information which can be gleaned > >from the System.map and other sources of symbol information, it's easily > >available and is _always_ trustable to come from the host kernel. > > > >Separate System.map's have a tendency to go out of sync (or go missing when a > >devel kernel gets rebuilt, or if a devel package is not installed), and server > >ports (be that a TCP port space server or an UDP port space mount-point) are > >both a configuration hassle and are not guest-transparent. > > > >So for instrumentation infrastructure (such as perf) we have a large and well > >founded preference for intrinsic, built-in, kernel-provided information: i.e. > >a largely 'built-in' and transparent mechanism to get to guest symbols. > > The symbol server's client can certainly access the bits through vmchannel. Ok, that would work i suspect. Would be nice to have the symbol server in tools/perf/ and also make it easy to add it to the initrd via a .config switch or so. That would have basically all of the advantages of being built into the kernel (availability, configurability, transparency, hackability), while having all the advantages of a user-space approach as well (flexibility, extensibility, robustness, ease of maintenance, etc.). If only we had tools/xorg/ integrated via the initrd that way ;-) Thanks, Ingo ^ permalink raw reply [flat|nested] 390+ messages in thread
* Re: [PATCH] Enhance perf to collect KVM guest os statistics from host side 2010-03-16 9:53 ` Ingo Molnar @ 2010-03-16 10:13 ` Avi Kivity 2010-03-16 10:20 ` Ingo Molnar 0 siblings, 1 reply; 390+ messages in thread From: Avi Kivity @ 2010-03-16 10:13 UTC (permalink / raw) To: Ingo Molnar Cc: Zhang, Yanmin, Peter Zijlstra, Sheng Yang, linux-kernel, kvm, Marcelo Tosatti, oerg Roedel, Jes Sorensen, Gleb Natapov, Zachary Amsden, ziteng.huang On 03/16/2010 11:53 AM, Ingo Molnar wrote: > * Avi Kivity<avi@redhat.com> wrote: > > >> On 03/16/2010 09:24 AM, Ingo Molnar wrote: >> >>> * Avi Kivity<avi@redhat.com> wrote: >>> >>> >>>> On 03/16/2010 07:27 AM, Zhang, Yanmin wrote: >>>> >>>>> From: Zhang, Yanmin<yanmin_zhang@linux.intel.com> >>>>> >>>>> Based on the discussion in KVM community, I worked out the patch to support >>>>> perf to collect guest os statistics from host side. This patch is implemented >>>>> with Ingo, Peter and some other guys' kind help. Yang Sheng pointed out a >>>>> critical bug and provided good suggestions with other guys. I really appreciate >>>>> their kind help. >>>>> >>>>> The patch adds new subcommand kvm to perf. >>>>> >>>>> perf kvm top >>>>> perf kvm record >>>>> perf kvm report >>>>> perf kvm diff >>>>> >>>>> The new perf could profile guest os kernel except guest os user space, but it >>>>> could summarize guest os user space utilization per guest os. >>>>> >>>>> Below are some examples. >>>>> 1) perf kvm top >>>>> [root@lkp-ne01 norm]# perf kvm --host --guest --guestkallsyms=/home/ymzhang/guest/kallsyms >>>>> --guestmodules=/home/ymzhang/guest/modules top >>>>> >>>>> >>>> Excellent, support for guest kernel != host kernel is critical (I >>>> can't remember the last time I ran same kernels). >>>> >>>> How would we support multiple guests with different kernels? Perhaps a >>>> symbol server that perf can connect to (and that would connect to guests in >>>> turn)? 
>>>> >>> The highest quality solution would be if KVM offered a 'guest extension' to >>> the guest kernel's /proc/kallsyms that made it easy for user-space to get this >>> information from an authorative source. >>> >>> That's the main reason why the host side /proc/kallsyms is so popular and so >>> useful: while in theory it's mostly redundant information which can be gleaned >>> from the System.map and other sources of symbol information, it's easily >>> available and is _always_ trustable to come from the host kernel. >>> >>> Separate System.map's have a tendency to go out of sync (or go missing when a >>> devel kernel gets rebuilt, or if a devel package is not installed), and server >>> ports (be that a TCP port space server or an UDP port space mount-point) are >>> both a configuration hassle and are not guest-transparent. >>> >>> So for instrumentation infrastructure (such as perf) we have a large and well >>> founded preference for intrinsic, built-in, kernel-provided information: i.e. >>> a largely 'built-in' and transparent mechanism to get to guest symbols. >>> >> The symbol server's client can certainly access the bits through vmchannel. >> > Ok, that would work i suspect. > > Would be nice to have the symbol server in tools/perf/ and also make it easy > to add it to the initrd via a .config switch or so. > > That would have basically all of the advantages of being built into the kernel > (availability, configurability, transparency, hackability), while having all > the advantages of a user-space approach as well (flexibility, extensibility, > robustness, ease of maintenance, etc.). > Note, I am not advocating building the vmchannel client into the host kernel. While that makes everything simpler for the user, it increases the kernel footprint with all the disadvantages that come with that (any bug is converted into a host DoS or worse). 
So, perf would connect to qemu via (say) a well-known unix domain socket, which would then talk to the guest kernel. I know you won't like it; we'll continue to disagree on this, unfortunately. -- error compiling committee.c: too many arguments to function
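The socket scheme Avi sketches can be illustrated in a few lines. Everything below is invented for illustration: the socket path, the one-line "kallsyms" request protocol, and the stub server standing in for qemu are assumptions, not anything qemu actually implements in this thread.

```python
import os
import socket
import tempfile
import threading

# Hypothetical protocol: the profiler sends "kallsyms\n" over a well-known
# unix domain socket and gets back the guest's symbol table. A stub server
# thread stands in for the qemu side here.
FAKE_KALLSYMS = (
    "ffffffff81000000 T _text\n"
    "ffffffff810001f0 T do_one_initcall\n"
)

def serve_once(srv):
    conn, _ = srv.accept()
    if conn.recv(64).strip() == b"kallsyms":
        conn.sendall(FAKE_KALLSYMS.encode())
    conn.close()

def fetch_guest_kallsyms(path):
    c = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
    c.connect(path)
    c.sendall(b"kallsyms\n")
    chunks = []
    while chunk := c.recv(4096):  # server close ends the stream
        chunks.append(chunk)
    c.close()
    return b"".join(chunks).decode()

sock_path = os.path.join(tempfile.mkdtemp(), "qemu-perf.sock")
srv = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
srv.bind(sock_path)
srv.listen(1)
t = threading.Thread(target=serve_once, args=(srv,))
t.start()
symbols = fetch_guest_kallsyms(sock_path)
t.join()
srv.close()
print(symbols.splitlines()[0])
```

The bind/listen happens before the serving thread starts, so the client cannot race the server; a real perf-side client would differ mainly in where the socket path comes from.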
* Re: [PATCH] Enhance perf to collect KVM guest os statistics from host side 2010-03-16 10:13 ` Avi Kivity @ 2010-03-16 10:20 ` Ingo Molnar 2010-03-16 10:40 ` Avi Kivity 0 siblings, 1 reply; 390+ messages in thread From: Ingo Molnar @ 2010-03-16 10:20 UTC (permalink / raw) To: Avi Kivity Cc: Zhang, Yanmin, Peter Zijlstra, Sheng Yang, linux-kernel, kvm, Marcelo Tosatti, oerg Roedel, Jes Sorensen, Gleb Natapov, Zachary Amsden, ziteng.huang * Avi Kivity <avi@redhat.com> wrote: > On 03/16/2010 11:53 AM, Ingo Molnar wrote: > >* Avi Kivity<avi@redhat.com> wrote: > > > >>On 03/16/2010 09:24 AM, Ingo Molnar wrote: > >>>* Avi Kivity<avi@redhat.com> wrote: > >>> > >>>>On 03/16/2010 07:27 AM, Zhang, Yanmin wrote: > >>>>>From: Zhang, Yanmin<yanmin_zhang@linux.intel.com> > >>>>> > >>>>>Based on the discussion in KVM community, I worked out the patch to support > >>>>>perf to collect guest os statistics from host side. This patch is implemented > >>>>>with Ingo, Peter and some other guys' kind help. Yang Sheng pointed out a > >>>>>critical bug and provided good suggestions with other guys. I really appreciate > >>>>>their kind help. > >>>>> > >>>>>The patch adds new subcommand kvm to perf. > >>>>> > >>>>> perf kvm top > >>>>> perf kvm record > >>>>> perf kvm report > >>>>> perf kvm diff > >>>>> > >>>>>The new perf could profile guest os kernel except guest os user space, but it > >>>>>could summarize guest os user space utilization per guest os. > >>>>> > >>>>>Below are some examples. > >>>>>1) perf kvm top > >>>>>[root@lkp-ne01 norm]# perf kvm --host --guest --guestkallsyms=/home/ymzhang/guest/kallsyms > >>>>>--guestmodules=/home/ymzhang/guest/modules top > >>>>> > >>>>Excellent, support for guest kernel != host kernel is critical (I > >>>>can't remember the last time I ran same kernels). > >>>> > >>>>How would we support multiple guests with different kernels? Perhaps a > >>>>symbol server that perf can connect to (and that would connect to guests in > >>>>turn)? 
> >>>The highest quality solution would be if KVM offered a 'guest extension' to > >>>the guest kernel's /proc/kallsyms that made it easy for user-space to get this > >>>information from an authorative source. > >>> > >>>That's the main reason why the host side /proc/kallsyms is so popular and so > >>>useful: while in theory it's mostly redundant information which can be gleaned > >>>from the System.map and other sources of symbol information, it's easily > >>>available and is _always_ trustable to come from the host kernel. > >>> > >>>Separate System.map's have a tendency to go out of sync (or go missing when a > >>>devel kernel gets rebuilt, or if a devel package is not installed), and server > >>>ports (be that a TCP port space server or an UDP port space mount-point) are > >>>both a configuration hassle and are not guest-transparent. > >>> > >>>So for instrumentation infrastructure (such as perf) we have a large and well > >>>founded preference for intrinsic, built-in, kernel-provided information: i.e. > >>>a largely 'built-in' and transparent mechanism to get to guest symbols. > >>The symbol server's client can certainly access the bits through vmchannel. > >Ok, that would work i suspect. > > > >Would be nice to have the symbol server in tools/perf/ and also make it easy > >to add it to the initrd via a .config switch or so. > > > >That would have basically all of the advantages of being built into the kernel > >(availability, configurability, transparency, hackability), while having all > >the advantages of a user-space approach as well (flexibility, extensibility, > >robustness, ease of maintenance, etc.). > > Note, I am not advocating building the vmchannel client into the host > kernel. [...] Neither am i. What i suggested was a user-space binary/executable built in tools/perf and put into the initrd. That approach has the advantages i listed above, without having the disadvantages of in-kernel code you listed. 
Thanks, Ingo
* Re: [PATCH] Enhance perf to collect KVM guest os statistics from host side 2010-03-16 10:20 ` Ingo Molnar @ 2010-03-16 10:40 ` Avi Kivity 2010-03-16 10:50 ` Ingo Molnar 0 siblings, 1 reply; 390+ messages in thread From: Avi Kivity @ 2010-03-16 10:40 UTC (permalink / raw) To: Ingo Molnar Cc: Zhang, Yanmin, Peter Zijlstra, Sheng Yang, linux-kernel, kvm, Marcelo Tosatti, oerg Roedel, Jes Sorensen, Gleb Natapov, Zachary Amsden, ziteng.huang On 03/16/2010 12:20 PM, Ingo Molnar wrote: >>>> >>>> The symbol server's client can certainly access the bits through vmchannel. >>>> >>> Ok, that would work i suspect. >>> >>> Would be nice to have the symbol server in tools/perf/ and also make it easy >>> to add it to the initrd via a .config switch or so. >>> >>> That would have basically all of the advantages of being built into the kernel >>> (availability, configurability, transparency, hackability), while having all >>> the advantages of a user-space approach as well (flexibility, extensibility, >>> robustness, ease of maintenance, etc.). >>> >> Note, I am not advocating building the vmchannel client into the host >> kernel. [...] >> > Neither am i. What i suggested was a user-space binary/executable built in > tools/perf and put into the initrd. > I'm confused - initrd seems to be guest-side. I was talking about the host side. For the guest, placing the symbol server in tools/ is reasonable. -- error compiling committee.c: too many arguments to function ^ permalink raw reply [flat|nested] 390+ messages in thread
* Re: [PATCH] Enhance perf to collect KVM guest os statistics from host side 2010-03-16 10:40 ` Avi Kivity @ 2010-03-16 10:50 ` Ingo Molnar 2010-03-16 11:10 ` Avi Kivity 0 siblings, 1 reply; 390+ messages in thread From: Ingo Molnar @ 2010-03-16 10:50 UTC (permalink / raw) To: Avi Kivity Cc: Zhang, Yanmin, Peter Zijlstra, Sheng Yang, linux-kernel, kvm, Marcelo Tosatti, Joerg Roedel, Jes Sorensen, Gleb Natapov, Zachary Amsden, ziteng.huang * Avi Kivity <avi@redhat.com> wrote: > On 03/16/2010 12:20 PM, Ingo Molnar wrote: > >>>> > >>>>The symbol server's client can certainly access the bits through vmchannel. > >>>Ok, that would work i suspect. > >>> > >>>Would be nice to have the symbol server in tools/perf/ and also make it easy > >>>to add it to the initrd via a .config switch or so. > >>> > >>>That would have basically all of the advantages of being built into the kernel > >>>(availability, configurability, transparency, hackability), while having all > >>>the advantages of a user-space approach as well (flexibility, extensibility, > >>>robustness, ease of maintenance, etc.). > >>Note, I am not advocating building the vmchannel client into the host > >>kernel. [...] > >Neither am i. What i suggested was a user-space binary/executable built in > >tools/perf and put into the initrd. > > I'm confused - initrd seems to be guest-side. I was talking about the host > side. The host side doesn't need much support - just some client capability in perf itself. I suspect vmchannels are sufficiently flexible and configuration-free for such purposes? (i.e. like a filesystem in essence) Ingo
* Re: [PATCH] Enhance perf to collect KVM guest os statistics from host side 2010-03-16 10:50 ` Ingo Molnar @ 2010-03-16 11:10 ` Avi Kivity 2010-03-16 11:25 ` Ingo Molnar 0 siblings, 1 reply; 390+ messages in thread From: Avi Kivity @ 2010-03-16 11:10 UTC (permalink / raw) To: Ingo Molnar Cc: Zhang, Yanmin, Peter Zijlstra, Sheng Yang, linux-kernel, kvm, Marcelo Tosatti, oerg Roedel, Jes Sorensen, Gleb Natapov, Zachary Amsden, ziteng.huang On 03/16/2010 12:50 PM, Ingo Molnar wrote: > >> I'm confused - initrd seems to be guest-side. I was talking about the host >> side. >> > host side doesnt need much support - just some client capability in perf > itself. I suspect vmchannels are sufficiently flexible and configuration-free > for such purposes? (i.e. like a filesystem in essence) > I haven't followed vmchannel closely, but I think it is. vmchannel is terminated in qemu on the host side, not in the host kernel. So perf would need to connect to qemu. -- error compiling committee.c: too many arguments to function ^ permalink raw reply [flat|nested] 390+ messages in thread
* Re: [PATCH] Enhance perf to collect KVM guest os statistics from host side 2010-03-16 11:10 ` Avi Kivity @ 2010-03-16 11:25 ` Ingo Molnar 2010-03-16 12:21 ` Avi Kivity 2010-03-16 22:30 ` [PATCH] Enhance perf to collect KVM guest os statistics from host side oerg Roedel 0 siblings, 2 replies; 390+ messages in thread From: Ingo Molnar @ 2010-03-16 11:25 UTC (permalink / raw) To: Avi Kivity Cc: Zhang, Yanmin, Peter Zijlstra, Sheng Yang, linux-kernel, kvm, Marcelo Tosatti, oerg Roedel, Jes Sorensen, Gleb Natapov, Zachary Amsden, ziteng.huang * Avi Kivity <avi@redhat.com> wrote: > On 03/16/2010 12:50 PM, Ingo Molnar wrote: > > > >>I'm confused - initrd seems to be guest-side. I was talking about the host > >>side. > >host side doesnt need much support - just some client capability in perf > >itself. I suspect vmchannels are sufficiently flexible and configuration-free > >for such purposes? (i.e. like a filesystem in essence) > > I haven't followed vmchannel closely, but I think it is. vmchannel is > terminated in qemu on the host side, not in the host kernel. So perf would > need to connect to qemu. Hm, that sounds rather messy if we want to use it to basically expose kernel functionality in a guest/host unified way. Is the qemu process discoverable in some secure way? Can we trust it? Is there some proper tooling available to do it, or do we have to push it through 2-3 packages to get such a useful feature done? ( That is the general thought process how many cross-discipline useful desktop/server features hit the bit bucket before having had any chance of being vetted by users, and why Linux sucks so much when it comes to feature integration and application usability. ) Ingo ^ permalink raw reply [flat|nested] 390+ messages in thread
* Re: [PATCH] Enhance perf to collect KVM guest os statistics from host side 2010-03-16 11:25 ` Ingo Molnar @ 2010-03-16 12:21 ` Avi Kivity 2010-03-16 12:29 ` Ingo Molnar 2010-03-16 22:30 ` [PATCH] Enhance perf to collect KVM guest os statistics from host side oerg Roedel 1 sibling, 1 reply; 390+ messages in thread From: Avi Kivity @ 2010-03-16 12:21 UTC (permalink / raw) To: Ingo Molnar Cc: Zhang, Yanmin, Peter Zijlstra, Sheng Yang, linux-kernel, kvm, Marcelo Tosatti, oerg Roedel, Jes Sorensen, Gleb Natapov, Zachary Amsden, ziteng.huang On 03/16/2010 01:25 PM, Ingo Molnar wrote: > >> I haven't followed vmchannel closely, but I think it is. vmchannel is >> terminated in qemu on the host side, not in the host kernel. So perf would >> need to connect to qemu. >> > Hm, that sounds rather messy if we want to use it to basically expose kernel > functionality in a guest/host unified way. Is the qemu process discoverable in > some secure way? We know its pid. > Can we trust it? No choice, it contains the guest address space. > Is there some proper tooling available to do > it, or do we have to push it through 2-3 packages to get such a useful feature > done? > libvirt manages qemu processes, but I don't think this should go through libvirt. qemu can do this directly by opening a unix domain socket in a well-known place. > ( That is the general thought process how many cross-discipline useful > desktop/server features hit the bit bucket before having had any chance of > being vetted by users, and why Linux sucks so much when it comes to feature > integration and application usability. ) > You can't solve everything in the kernel, even with a well populated tools/. -- error compiling committee.c: too many arguments to function ^ permalink raw reply [flat|nested] 390+ messages in thread
* Re: [PATCH] Enhance perf to collect KVM guest os statistics from host side 2010-03-16 12:21 ` Avi Kivity @ 2010-03-16 12:29 ` Ingo Molnar 2010-03-16 12:41 ` Avi Kivity 0 siblings, 1 reply; 390+ messages in thread From: Ingo Molnar @ 2010-03-16 12:29 UTC (permalink / raw) To: Avi Kivity Cc: Zhang, Yanmin, Peter Zijlstra, Sheng Yang, linux-kernel, kvm, Marcelo Tosatti, oerg Roedel, Jes Sorensen, Gleb Natapov, Zachary Amsden, ziteng.huang * Avi Kivity <avi@redhat.com> wrote: > On 03/16/2010 01:25 PM, Ingo Molnar wrote: > > > >>I haven't followed vmchannel closely, but I think it is. vmchannel is > >>terminated in qemu on the host side, not in the host kernel. So perf would > >>need to connect to qemu. > >Hm, that sounds rather messy if we want to use it to basically expose kernel > >functionality in a guest/host unified way. Is the qemu process discoverable in > >some secure way? > > We know its pid. How do i get a list of all 'guest instance PIDs', and what is the way to talk to Qemu? > > Can we trust it? > > No choice, it contains the guest address space. I mean, i can trust a kernel service and i can trust /proc/kallsyms. Can perf trust a random process claiming to be Qemu? What's the trust mechanism here? > > Is there some proper tooling available to do it, or do we have to push it > > through 2-3 packages to get such a useful feature done? > > libvirt manages qemu processes, but I don't think this should go through > libvirt. qemu can do this directly by opening a unix domain socket in a > well-known place. So Qemu has never run into such problems before? ( Sounds weird - i think Qemu configuration itself should be done via a unix domain socket driven configuration protocol as well. ) > >( That is the general thought process how many cross-discipline useful > > desktop/server features hit the bit bucket before having had any chance of > > being vetted by users, and why Linux sucks so much when it comes to feature > > integration and application usability. 
) > > You can't solve everything in the kernel, even with a well populated tools/. Certainly not, but this is a technical problem in the kernel's domain, so it's a fair (and natural) expectation to be able to solve this within the kernel project. Ingo
* Re: [PATCH] Enhance perf to collect KVM guest os statistics from host side 2010-03-16 12:29 ` Ingo Molnar @ 2010-03-16 12:41 ` Avi Kivity 2010-03-16 13:08 ` Ingo Molnar 0 siblings, 1 reply; 390+ messages in thread From: Avi Kivity @ 2010-03-16 12:41 UTC (permalink / raw) To: Ingo Molnar Cc: Zhang, Yanmin, Peter Zijlstra, Sheng Yang, linux-kernel, kvm, Marcelo Tosatti, oerg Roedel, Jes Sorensen, Gleb Natapov, Zachary Amsden, ziteng.huang On 03/16/2010 02:29 PM, Ingo Molnar wrote: > * Avi Kivity<avi@redhat.com> wrote: > > >> On 03/16/2010 01:25 PM, Ingo Molnar wrote: >> >>> >>>> I haven't followed vmchannel closely, but I think it is. vmchannel is >>>> terminated in qemu on the host side, not in the host kernel. So perf would >>>> need to connect to qemu. >>>> >>> Hm, that sounds rather messy if we want to use it to basically expose kernel >>> functionality in a guest/host unified way. Is the qemu process discoverable in >>> some secure way? >>> >> We know its pid. >> > How do i get a list of all 'guest instance PIDs', Libvirt manages all qemus, but this should be implemented independently of libvirt. > and what is the way to talk > to Qemu? > In general qemu exposes communication channels (such as the monitor) as tcp connections, unix-domain sockets, stdio, etc. It's very flexible. >>> Can we trust it? >>> >> No choice, it contains the guest address space. >> > I mean, i can trust a kernel service and i can trust /proc/kallsyms. > > Can perf trust a random process claiming to be Qemu? What's the trust > mechanism here? > Obviously you can't trust anything you get from a guest, no matter how you get it. How do you trust a userspace program's symbols? you don't. How do you get them? they're on a well-known location. >>> Is there some proper tooling available to do it, or do we have to push it >>> through 2-3 packages to get such a useful feature done? >>> >> libvirt manages qemu processes, but I don't think this should go through >> libvirt. 
qemu can do this directly by opening a unix domain socket in a >> well-known place. >> > So Qemu has never run into such problems before? > > ( Sounds weird - i think Qemu configuration itself should be done via a > unix domain socket driven configuration protocol as well. ) > That's exactly what happens. You invoke qemu with -monitor unix:blah,server (or -qmp for a machine-readable format) and have your management application connect to that. You can redirect guest serial ports, console, parallel port, etc. to unix-domain or tcp sockets. vmchannel is an extension of that mechanism. >>> ( That is the general thought process how many cross-discipline useful >>> desktop/server features hit the bit bucket before having had any chance of >>> being vetted by users, and why Linux sucks so much when it comes to feature >>> integration and application usability. ) >>> >> You can't solve everything in the kernel, even with a well populated tools/. >> > Certainly not, but this is a technical problem in the kernel's domain, so it's > a fair (and natural) expectation to be able to solve this within the kernel > project. > Someone writing perf-gui outside the kernel would have the same problems, no? -- error compiling committee.c: too many arguments to function ^ permalink raw reply [flat|nested] 390+ messages in thread
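The QMP handshake Avi refers to is a short JSON exchange: qemu sends a greeting banner, the client negotiates capabilities, then commands follow. The sketch below fakes the qemu end with a socketpair so only the client logic matters; the greeting fields are abbreviated stand-ins, not qemu's real banner.

```python
import json
import socket

# A socketpair stands in for the unix-domain socket qemu would expose with
# -qmp; the "qemu" end below replays a simplified greeting and a canned ack.
client_end, qemu_end = socket.socketpair()

# "qemu" side: greeting banner first, then the ack for qmp_capabilities.
qemu_end.sendall(json.dumps({"QMP": {"version": {}, "capabilities": []}}).encode() + b"\r\n")
qemu_end.sendall(json.dumps({"return": {}}).encode() + b"\r\n")

def qmp_handshake(sock):
    f = sock.makefile("rb")
    greeting = json.loads(f.readline())
    assert "QMP" in greeting  # the banner identifies a QMP endpoint
    sock.sendall(json.dumps({"execute": "qmp_capabilities"}).encode() + b"\r\n")
    return json.loads(f.readline())  # an empty "return" object signals success

resp = qmp_handshake(client_end)
print(resp)
```

Against a real qemu the same function would connect to the `-qmp unix:...,server` socket instead of a socketpair; the line-oriented JSON framing is the part this sketch shows.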
* Re: [PATCH] Enhance perf to collect KVM guest os statistics from host side 2010-03-16 12:41 ` Avi Kivity @ 2010-03-16 13:08 ` Ingo Molnar 2010-03-16 13:16 ` Avi Kivity 2010-03-16 17:06 ` Anthony Liguori 0 siblings, 2 replies; 390+ messages in thread From: Ingo Molnar @ 2010-03-16 13:08 UTC (permalink / raw) To: Avi Kivity Cc: Zhang, Yanmin, Peter Zijlstra, Sheng Yang, linux-kernel, kvm, Marcelo Tosatti, Joerg Roedel, Jes Sorensen, Gleb Natapov, Zachary Amsden, ziteng.huang * Avi Kivity <avi@redhat.com> wrote: > On 03/16/2010 02:29 PM, Ingo Molnar wrote: > > I mean, i can trust a kernel service and i can trust /proc/kallsyms. > > > > Can perf trust a random process claiming to be Qemu? What's the trust > > mechanism here? > > Obviously you can't trust anything you get from a guest, no matter how you > get it. I'm not talking about the symbol strings and addresses, and the object contents for annotation (or debuginfo). I'm talking about the basic protocol of establishing which guest is which. I.e. we really want to enable users to: 1) have it all working with a single guest, without having to specify 'which' guest (qemu PID) to work with. That is the dominant use case both for developers and for a fair portion of testers. 2) Have some reasonable symbolic identification for guests. For example a usable approach would be to have 'perf kvm list', which would list all currently active guests: $ perf kvm list [1] Fedora [2] OpenSuse [3] Windows-XP [4] Windows-7 And from that point on 'perf kvm -g OpenSuse record' would do the obvious thing. Users will be able to just use the 'OpenSuse' symbolic name for that guest, even if the guest got restarted and switched its main PID. Any such facility needs trusted enumeration and a protocol where i can trust that the information i got is authoritative. (I.e. 'OpenSuse' truly matches to the OpenSuse session - not to some local user starting up a Qemu instance that claims to be 'OpenSuse'.) Is such a scheme possible/available? I suspect all the KVM configuration tools (i haven't used them in some time - gui and command-line tools alike) use similar methods to ease guest management? Ingo
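A naive, untrusted version of the 'perf kvm list' idea can be approximated by scanning /proc for qemu processes, which also illustrates exactly why Ingo asks for an authoritative mechanism: any process can call itself qemu and pass a '-name'. A Linux-only sketch, with all conventions assumed rather than taken from any real tool:

```python
import os

# Untrusted enumeration: walk /proc, look for a qemu binary name in argv[0],
# and pull the guest label from the '-name' argument. Nothing here
# authenticates the process -- which is precisely the trust gap discussed.
def list_qemu_guests():
    guests = []
    if not os.path.isdir("/proc"):
        return guests  # non-Linux host: nothing to scan
    for pid in filter(str.isdigit, os.listdir("/proc")):
        try:
            with open(f"/proc/{pid}/cmdline", "rb") as f:
                argv = f.read().split(b"\0")
        except OSError:
            continue  # process exited, or not readable by this user
        if argv and argv[0] and b"qemu" in os.path.basename(argv[0]):
            name = b"(unnamed)"
            if b"-name" in argv:
                i = argv.index(b"-name")
                if i + 1 < len(argv):
                    name = argv[i + 1]
            guests.append((int(pid), name.decode(errors="replace")))
    return guests

for pid, name in list_qemu_guests():
    print(f"[{pid}] {name}")
```

On a host with no running guests this prints nothing; the point of the sketch is that the (pid, name) pairs come from data the guest owner controls, not from a trusted kernel source.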
* Re: [PATCH] Enhance perf to collect KVM guest os statistics from host side 2010-03-16 13:08 ` Ingo Molnar @ 2010-03-16 13:16 ` Avi Kivity 2010-03-16 13:31 ` Ingo Molnar 2010-03-16 17:06 ` Anthony Liguori 1 sibling, 1 reply; 390+ messages in thread From: Avi Kivity @ 2010-03-16 13:16 UTC (permalink / raw) To: Ingo Molnar Cc: Zhang, Yanmin, Peter Zijlstra, Sheng Yang, linux-kernel, kvm, Marcelo Tosatti, oerg Roedel, Jes Sorensen, Gleb Natapov, Zachary Amsden, ziteng.huang On 03/16/2010 03:08 PM, Ingo Molnar wrote: > >>> I mean, i can trust a kernel service and i can trust /proc/kallsyms. >>> >>> Can perf trust a random process claiming to be Qemu? What's the trust >>> mechanism here? >>> >> Obviously you can't trust anything you get from a guest, no matter how you >> get it. >> > I'm not talking about the symbol strings and addresses, and the object > contents for allocation (or debuginfo). I'm talking about the basic protocol > of establishing which guest is which. > There is none. So far, qemu only dealt with managing just its own guest, and left all multiple guest management to higher levels up the stack (like libvirt). > I.e. we really want to be able users to: > > 1) have it all working with a single guest, without having to specify 'which' > guest (qemu PID) to work with. That is the dominant usecase both for > developers and for a fair portion of testers. > That's reasonable if we can get it working simply. > 2) Have some reasonable symbolic identification for guests. For example a > usable approach would be to have 'perf kvm list', which would list all > currently active guests: > > $ perf kvm list > [1] Fedora > [2] OpenSuse > [3] Windows-XP > [4] Windows-7 > > And from that point on 'perf kvm -g OpenSuse record' would do the obvious > thing. Users will be able to just use the 'OpenSuse' symbolic name for > that guest, even if the guest got restarted and switched its main PID. 
> > Any such facility needs trusted enumeration and a protocol where i can trust > that the information i got is authorative. (I.e. 'OpenSuse' truly matches to > the OpenSuse session - not to some local user starting up a Qemu instance that > claims to be 'OpenSuse'.) > > Is such a scheme possible/available? I suspect all the KVM configuration tools > (i havent used them in some time - gui and command-line tools alike) use > similar methods to ease guest management? > You can do that through libvirt, but that only works for guests started through libvirt. libvirt provides command-line tools to list and manage guests (for example autostarting them on startup), and tools built on top of libvirt can manage guests graphically. Looks like we have a layer inversion here. Maybe we need a plugin system - libvirt drops a .so into perf that teaches it how to list guests and get their symbols. -- error compiling committee.c: too many arguments to function ^ permalink raw reply [flat|nested] 390+ messages in thread
* Re: [PATCH] Enhance perf to collect KVM guest os statistics from host side 2010-03-16 13:16 ` Avi Kivity @ 2010-03-16 13:31 ` Ingo Molnar 2010-03-16 13:37 ` Avi Kivity 2010-03-16 15:06 ` Frank Ch. Eigler 0 siblings, 2 replies; 390+ messages in thread From: Ingo Molnar @ 2010-03-16 13:31 UTC (permalink / raw) To: Avi Kivity Cc: Zhang, Yanmin, Peter Zijlstra, Sheng Yang, linux-kernel, kvm, Marcelo Tosatti, oerg Roedel, Jes Sorensen, Gleb Natapov, Zachary Amsden, ziteng.huang * Avi Kivity <avi@redhat.com> wrote: > On 03/16/2010 03:08 PM, Ingo Molnar wrote: > > > >>>I mean, i can trust a kernel service and i can trust /proc/kallsyms. > >>> > >>>Can perf trust a random process claiming to be Qemu? What's the trust > >>>mechanism here? > >>Obviously you can't trust anything you get from a guest, no matter how you > >>get it. > >I'm not talking about the symbol strings and addresses, and the object > >contents for allocation (or debuginfo). I'm talking about the basic protocol > >of establishing which guest is which. > > There is none. So far, qemu only dealt with managing just its own > guest, and left all multiple guest management to higher levels up > the stack (like libvirt). > > >I.e. we really want to be able users to: > > > > 1) have it all working with a single guest, without having to specify 'which' > > guest (qemu PID) to work with. That is the dominant usecase both for > > developers and for a fair portion of testers. > > That's reasonable if we can get it working simply. IMO such ease of use is reasonable and required, full stop. If it cannot be gotten simply then that's a bug: either in the code, or in the design, or in the development process that led to the design. Bugs need fixing. > > 2) Have some reasonable symbolic identification for guests. 
> > For example a
> > usable approach would be to have 'perf kvm list', which would list all
> > currently active guests:
> >
> >   $ perf kvm list
> >     [1] Fedora
> >     [2] OpenSuse
> >     [3] Windows-XP
> >     [4] Windows-7
> >
> > And from that point on 'perf kvm -g OpenSuse record' would do the obvious
> > thing. Users will be able to just use the 'OpenSuse' symbolic name for
> > that guest, even if the guest got restarted and switched its main PID.
> >
> > Any such facility needs trusted enumeration and a protocol where i can
> > trust that the information i got is authorative. (I.e. 'OpenSuse' truly
> > matches to the OpenSuse session - not to some local user starting up a
> > Qemu instance that claims to be 'OpenSuse'.)
> >
> > Is such a scheme possible/available? I suspect all the KVM configuration
> > tools (i havent used them in some time - gui and command-line tools alike)
> > use similar methods to ease guest management?
>
> You can do that through libvirt, but that only works for guests started
> through libvirt. libvirt provides command-line tools to list and manage
> guests (for example autostarting them on startup), and tools built on top of
> libvirt can manage guests graphically.
>
> Looks like we have a layer inversion here. Maybe we need a plugin system -
> libvirt drops a .so into perf that teaches it how to list guests and get
> their symbols.

Is libvirt used to start up all KVM guests? If not, if it's only used on some
distros while other distros have other solutions then there's apparently no
good way to get to such information, and the kernel bits of KVM do not provide
it.

To the user (and to me) this looks like a KVM bug / missing feature. (and the
user doesnt care where the blame is) If that is true then apparently the
current KVM design has no technically actionable solution for certain
categories of features!

	Ingo
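[Editor's note: Ingo's name-based selection implies a lookup from a symbolic guest name to the current qemu PID. A minimal sketch of that mapping, assuming guests were started with qemu's -name option; the filter runs against a canned process listing so it is self-contained, but on a live host the same awk could be fed from `ps axo pid,args`.]

```shell
# Hypothetical 'ps axo pid,args' output for two qemu guests.
procs='2301 qemu-kvm -name Fedora fedora.img
2417 qemu-kvm -name OpenSuse opensuse.img'

# Resolve a symbolic guest name to its qemu PID (first column of the
# matching line). Assumes the name appears as '-name <NAME> '.
guest_pid() {
  echo "$procs" | awk -v name="$1" '$0 ~ ("-name " name " ") { print $1 }'
}

guest_pid OpenSuse
```

This is exactly the fragile part of the proposal: the lookup only works if every guest was given a unique -name, which the thread later turns to.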
* Re: [PATCH] Enhance perf to collect KVM guest os statistics from host side 2010-03-16 13:31 ` Ingo Molnar @ 2010-03-16 13:37 ` Avi Kivity 2010-03-16 15:06 ` Frank Ch. Eigler 1 sibling, 0 replies; 390+ messages in thread From: Avi Kivity @ 2010-03-16 13:37 UTC (permalink / raw) To: Ingo Molnar Cc: Zhang, Yanmin, Peter Zijlstra, Sheng Yang, linux-kernel, kvm, Marcelo Tosatti, oerg Roedel, Jes Sorensen, Gleb Natapov, Zachary Amsden, ziteng.huang On 03/16/2010 03:31 PM, Ingo Molnar wrote: > >> You can do that through libvirt, but that only works for guests started >> through libvirt. libvirt provides command-line tools to list and manage >> guests (for example autostarting them on startup), and tools built on top of >> libvirt can manage guests graphically. >> >> Looks like we have a layer inversion here. Maybe we need a plugin system - >> libvirt drops a .so into perf that teaches it how to list guests and get >> their symbols. >> > Is libvirt used to start up all KVM guests? If not, if it's only used on some > distros while other distros have other solutions then there's apparently no > good way to get to such information, and the kernel bits of KVM do not provide > it. > Developers tend to start qemu from the command line, but the majority of users and all distros I know of use libvirt. Some users cobble up their own scripts. > To the user (and to me) this looks like a KVM bug / missing feature. (and the > user doesnt care where the blame is) If that is true then apparently the > current KVM design has no technically actionable solution for certain > categories of features! > A plugin system allows anyone who is interested to provide the information; they just need to write a plugin for their management tool. Since we can't prevent people from writing management tools, I don't see what else we can do. -- error compiling committee.c: too many arguments to function ^ permalink raw reply [flat|nested] 390+ messages in thread
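[Editor's note: Avi's plugin idea (a management stack dropping a shared object into a directory that perf scans) can be sketched at the discovery level. Everything here is hypothetical: the directory name, the file layout, and the helper are illustrative only; a real implementation would dlopen() each plugin and call an agreed-upon entry point.]

```shell
# Hypothetical plugin directory a management stack like libvirt would
# populate; perf would dlopen() each .so and ask it to list guests and
# fetch their symbols.
PLUGIN_DIR="${PLUGIN_DIR:-/usr/lib/perf/guest-plugins}"

list_guest_plugins() {
  # Print the basename of each installed plugin, if any exist.
  for so in "$PLUGIN_DIR"/*.so; do
    [ -e "$so" ] || continue   # glob matched nothing
    basename "$so" .so
  done
}

# Simulate an installed plugin in a temporary directory for demonstration.
PLUGIN_DIR=$(mktemp -d)
: > "$PLUGIN_DIR/libvirt.so"
list_guest_plugins
```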
* Re: [PATCH] Enhance perf to collect KVM guest os statistics from host side 2010-03-16 13:31 ` Ingo Molnar 2010-03-16 13:37 ` Avi Kivity @ 2010-03-16 15:06 ` Frank Ch. Eigler 2010-03-16 15:52 ` Ingo Molnar 1 sibling, 1 reply; 390+ messages in thread From: Frank Ch. Eigler @ 2010-03-16 15:06 UTC (permalink / raw) To: Ingo Molnar Cc: Avi Kivity, Zhang, Yanmin, Peter Zijlstra, Sheng Yang, linux-kernel, kvm, Marcelo Tosatti, oerg Roedel, Jes Sorensen, Gleb Natapov, Zachary Amsden, ziteng.huang Ingo Molnar <mingo@elte.hu> writes: > [...] >> >I.e. we really want to be able users to: >> > >> > 1) have it all working with a single guest, without having to specify 'which' >> > guest (qemu PID) to work with. That is the dominant usecase both for >> > developers and for a fair portion of testers. >> >> That's reasonable if we can get it working simply. > > IMO such ease of use is reasonable and required, full stop. > If it cannot be gotten simply then that's a bug: either in the code, or in the > design, or in the development process that led to the design. Bugs need > fixing. [...] Perhaps the fact that kvm happens to deal with an interesting application area (virtualization) is misleading here. As far as the host kernel or other host userspace is concerned, qemu is just some random unprivileged userspace program (with some *optional* /dev/kvm services that might happen to require temporary root). As such, perf trying to instrument qemu is no different than perf trying to instrument any other userspace widget. Therefore, expecting 'trusted enumeration' of instances is just as sensible as using 'trusted ps' and 'trusted /var/run/FOO.pid files'. - FChE ^ permalink raw reply [flat|nested] 390+ messages in thread
* Re: [PATCH] Enhance perf to collect KVM guest os statistics from host side 2010-03-16 15:06 ` Frank Ch. Eigler @ 2010-03-16 15:52 ` Ingo Molnar 2010-03-16 16:08 ` Frank Ch. Eigler 2010-03-16 17:34 ` Anthony Liguori 0 siblings, 2 replies; 390+ messages in thread From: Ingo Molnar @ 2010-03-16 15:52 UTC (permalink / raw) To: Frank Ch. Eigler Cc: Avi Kivity, Zhang, Yanmin, Peter Zijlstra, Sheng Yang, linux-kernel, kvm, Marcelo Tosatti, oerg Roedel, Jes Sorensen, Gleb Natapov, Zachary Amsden, ziteng.huang * Frank Ch. Eigler <fche@redhat.com> wrote: > > Ingo Molnar <mingo@elte.hu> writes: > > > [...] > >> >I.e. we really want to be able users to: > >> > > >> > 1) have it all working with a single guest, without having to specify 'which' > >> > guest (qemu PID) to work with. That is the dominant usecase both for > >> > developers and for a fair portion of testers. > >> > >> That's reasonable if we can get it working simply. > > > > IMO such ease of use is reasonable and required, full stop. > > If it cannot be gotten simply then that's a bug: either in the code, or in the > > design, or in the development process that led to the design. Bugs need > > fixing. [...] > > Perhaps the fact that kvm happens to deal with an interesting application > area (virtualization) is misleading here. As far as the host kernel or > other host userspace is concerned, qemu is just some random unprivileged > userspace program (with some *optional* /dev/kvm services that might happen > to require temporary root). > > As such, perf trying to instrument qemu is no different than perf trying to > instrument any other userspace widget. Therefore, expecting 'trusted > enumeration' of instances is just as sensible as using 'trusted ps' and > 'trusted /var/run/FOO.pid files'. You are quite mistaken: KVM isnt really a 'random unprivileged application' in this context, it is clearly an extension of system/kernel services. 
( Which can be seen from the simple fact that what started the discussion was
  'how do we get /proc/kallsyms from the guest'. I.e. an extension of the
  existing host-space /proc/kallsyms was desired. )

In that sense the most natural 'extension' would be the solution i mentioned a
week or two ago: to have a (read only) mount of all guest filesystems, plus a
channel for profiling/tracing data. That would make symbol parsing easier and
it's what extends the existing 'host space' abstraction in the most natural
way.

( It doesnt even have to be done via the kernel - Qemu could implement that
  via FUSE for example. )

As a second best option a 'symbol server' might be used too.

Thanks,

	Ingo
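[Editor's note: one concrete reading of "make symbol parsing easier": with a guest filesystem visible read-only on the host, guest kernel text symbols could be pulled from the guest's System.map under its /boot. The mount point /mnt-guest and the System.map filename below are hypothetical; the sketch filters a canned System.map fragment so it runs self-contained.]

```shell
# Hypothetical fragment of a guest's /mnt-guest/boot/System.map-2.6.33.
map='ffffffff810030a0 T native_load_gs_index
ffffffff810030c0 t dummy_handler
ffffffff81003140 T math_state_restore
ffffffff81003200 D per_cpu_offsets'

# Keep only the global text symbols (type T) as a rough substitute for
# the guest's /proc/kallsyms text entries.
echo "$map" | awk '$2 == "T" { print $1, $3 }'
```

Unlike the guest's live /proc/kallsyms, System.map misses module symbols and KASLR-style relocation does not apply to 2.6-era kernels, so this is only the "second best" data Ingo alludes to.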
* Re: [PATCH] Enhance perf to collect KVM guest os statistics from host side 2010-03-16 15:52 ` Ingo Molnar @ 2010-03-16 16:08 ` Frank Ch. Eigler 2010-03-16 16:35 ` Ingo Molnar 2010-03-16 17:34 ` Anthony Liguori 1 sibling, 1 reply; 390+ messages in thread From: Frank Ch. Eigler @ 2010-03-16 16:08 UTC (permalink / raw) To: Ingo Molnar Cc: Avi Kivity, Zhang, Yanmin, Peter Zijlstra, Sheng Yang, linux-kernel, kvm, Marcelo Tosatti, oerg Roedel, Jes Sorensen, Gleb Natapov, Zachary Amsden, ziteng.huang Hi - On Tue, Mar 16, 2010 at 04:52:21PM +0100, Ingo Molnar wrote: > [...] > > Perhaps the fact that kvm happens to deal with an interesting application > > area (virtualization) is misleading here. As far as the host kernel or > > other host userspace is concerned, qemu is just some random unprivileged > > userspace program [...] > You are quite mistaken: KVM isnt really a 'random unprivileged > application' in this context, it is clearly an extension of > system/kernel services. I don't know what "extension of system/kernel services" means in this context, beyond something running on the system/kernel, like every other process. To clarify, to what extent do you consider your classification similarly clear for a host is running * multiple kvm instances run as unprivileged users * non-kvm OS simulators such as vmware or xen or gdb * kvm instances running something other than linux > ( Which can be seen from the simple fact that what started the > discussion was 'how do we get /proc/kallsyms from the > guest'. I.e. an extension of the existing host-space /proc/kallsyms > was desired. ) (Sorry, that smacks of circular reasoning.) It may be a charming convenience function for perf users to give them shortcuts for certain favoured configurations (kvm running freshest linux), but that says more about perf than kvm. - FChE ^ permalink raw reply [flat|nested] 390+ messages in thread
* Re: [PATCH] Enhance perf to collect KVM guest os statistics from host side 2010-03-16 16:08 ` Frank Ch. Eigler @ 2010-03-16 16:35 ` Ingo Molnar 0 siblings, 0 replies; 390+ messages in thread From: Ingo Molnar @ 2010-03-16 16:35 UTC (permalink / raw) To: Frank Ch. Eigler Cc: Avi Kivity, Zhang, Yanmin, Peter Zijlstra, Sheng Yang, linux-kernel, kvm, Marcelo Tosatti, oerg Roedel, Jes Sorensen, Gleb Natapov, Zachary Amsden, ziteng.huang * Frank Ch. Eigler <fche@redhat.com> wrote: > Hi - > > On Tue, Mar 16, 2010 at 04:52:21PM +0100, Ingo Molnar wrote: > > [...] > > > Perhaps the fact that kvm happens to deal with an interesting application > > > area (virtualization) is misleading here. As far as the host kernel or > > > other host userspace is concerned, qemu is just some random unprivileged > > > userspace program [...] > > > You are quite mistaken: KVM isnt really a 'random unprivileged > > application' in this context, it is clearly an extension of > > system/kernel services. > > I don't know what "extension of system/kernel services" means in this > context, beyond something running on the system/kernel, like every other > process. [...] It means something like my example of 'extended to guest space' /proc/kallsyms: > > [...] > > > > ( Which can be seen from the simple fact that what started the > > discussion was 'how do we get /proc/kallsyms from the guest'. I.e. an > > extension of the existing host-space /proc/kallsyms was desired. ) > > (Sorry, that smacks of circular reasoning.) To me it sounds like an example supporting my point. /proc/kallsyms is a service by the kernel, and 'perf kvm' desires this to be extended to guest space as well. Thanks, Ingo ^ permalink raw reply [flat|nested] 390+ messages in thread
* Re: [PATCH] Enhance perf to collect KVM guest os statistics from host side 2010-03-16 15:52 ` Ingo Molnar 2010-03-16 16:08 ` Frank Ch. Eigler @ 2010-03-16 17:34 ` Anthony Liguori 2010-03-16 17:52 ` Ingo Molnar 1 sibling, 1 reply; 390+ messages in thread From: Anthony Liguori @ 2010-03-16 17:34 UTC (permalink / raw) To: Ingo Molnar Cc: Frank Ch. Eigler, Avi Kivity, Zhang, Yanmin, Peter Zijlstra, Sheng Yang, linux-kernel, kvm, Marcelo Tosatti, oerg Roedel, Jes Sorensen, Gleb Natapov, Zachary Amsden, ziteng.huang On 03/16/2010 10:52 AM, Ingo Molnar wrote: > You are quite mistaken: KVM isnt really a 'random unprivileged application' in > this context, it is clearly an extension of system/kernel services. > > ( Which can be seen from the simple fact that what started the discussion was > 'how do we get /proc/kallsyms from the guest'. I.e. an extension of the > existing host-space /proc/kallsyms was desired. ) > Random tools (like perf) should not be able to do what you describe. It's a security nightmare. If it's desirable to have /proc/kallsyms available, we can expose an interface in QEMU to provide that. That can then be plumbed through libvirt and QMP. Then a management tool can use libvirt or QMP to obtain that information and interact with the kernel appropriately. > In that sense the most natural 'extension' would be the solution i mentioned a > week or two ago: to have a (read only) mount of all guest filesystems, plus a > channel for profiling/tracing data. That would make symbol parsing easier and > it's what extends the existing 'host space' abstraction in the most natural > way. > > ( It doesnt even have to be done via the kernel - Qemu could implement that > via FUSE for example. ) > No way. The guest has sensitive data and exposing it widely on the host is a bad thing to do. It's a bad interface. We can expose specific information about guests but only through our existing channels which are validated through a security infrastructure. 
Ultimately, your goal is to keep perf a simple tool with little dependencies. But practically speaking, if you want to add features to it, it's going to have to interact with other subsystems in the appropriate way. That means, it's going to need to interact with libvirt or QMP. If you want all applications to expose their data via synthetic file systems, then there's always plan9 :-) Regards, Anthony Liguori ^ permalink raw reply [flat|nested] 390+ messages in thread
* Re: [PATCH] Enhance perf to collect KVM guest os statistics from host side 2010-03-16 17:34 ` Anthony Liguori @ 2010-03-16 17:52 ` Ingo Molnar 2010-03-16 18:06 ` Anthony Liguori 0 siblings, 1 reply; 390+ messages in thread From: Ingo Molnar @ 2010-03-16 17:52 UTC (permalink / raw) To: Anthony Liguori Cc: Frank Ch. Eigler, Avi Kivity, Zhang, Yanmin, Peter Zijlstra, Sheng Yang, linux-kernel, kvm, Marcelo Tosatti, oerg Roedel, Jes Sorensen, Gleb Natapov, Zachary Amsden, ziteng.huang * Anthony Liguori <aliguori@linux.vnet.ibm.com> wrote: > On 03/16/2010 10:52 AM, Ingo Molnar wrote: > >You are quite mistaken: KVM isnt really a 'random unprivileged application' in > >this context, it is clearly an extension of system/kernel services. > > > >( Which can be seen from the simple fact that what started the discussion was > > 'how do we get /proc/kallsyms from the guest'. I.e. an extension of the > > existing host-space /proc/kallsyms was desired. ) > > Random tools (like perf) should not be able to do what you describe. It's a > security nightmare. A security nightmare exactly how? Mind to go into details as i dont understand your point. > If it's desirable to have /proc/kallsyms available, we can expose an > interface in QEMU to provide that. That can then be plumbed through libvirt > and QMP. > > Then a management tool can use libvirt or QMP to obtain that information and > interact with the kernel appropriately. > > > In that sense the most natural 'extension' would be the solution i > > mentioned a week or two ago: to have a (read only) mount of all guest > > filesystems, plus a channel for profiling/tracing data. That would make > > symbol parsing easier and it's what extends the existing 'host space' > > abstraction in the most natural way. > > > > ( It doesnt even have to be done via the kernel - Qemu could implement that > > via FUSE for example. ) > > No way. The guest has sensitive data and exposing it widely on the host is > a bad thing to do. [...] 
Firstly, you are putting words into my mouth, as i said nothing about
'exposing it widely'. I suggest exposing it under the privileges of whoever
has access to the guest image.

Secondly, regarding confidentiality, and this is guest security 101: whoever
can access the image on the host _already_ has access to all the guest data!

A Linux image can generally be loopback mounted straight away:

  losetup -o 32256 /dev/loop0 ./guest-image.img
  mount -o ro /dev/loop0 /mnt-guest

(Or, if you are an unprivileged user who cannot mount, it can be read via ext2
tools.)

There's nothing the guest can do about that. The host is in total control of
guest image data for heaven's sake!

All i'm suggesting is to make what is already possible more convenient.

	Ingo
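[Editor's note: the 32256-byte offset in Ingo's losetup example is not arbitrary; it is where the first partition of a classic DOS-partitioned disk begins (sector 63 at 512 bytes per sector). The arithmetic, with the actual mount commands repeated only as comments since they need root and a real image:]

```shell
# Classic DOS partitioning starts the first partition at sector 63.
SECTOR_SIZE=512
FIRST_PARTITION_SECTOR=63
OFFSET=$((FIRST_PARTITION_SECTOR * SECTOR_SIZE))

echo "$OFFSET"   # 32256, the -o value used with losetup above

# With root and a real image, the offset would be used like this:
#   losetup -o "$OFFSET" /dev/loop0 ./guest-image.img
#   mount -o ro /dev/loop0 /mnt-guest
```

Images partitioned with 1 MiB alignment start at sector 2048 instead (offset 1048576), so in general the offset has to be read from the image's partition table (e.g. with `fdisk -l`) rather than assumed.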
* Re: [PATCH] Enhance perf to collect KVM guest os statistics from host side 2010-03-16 17:52 ` Ingo Molnar @ 2010-03-16 18:06 ` Anthony Liguori 2010-03-16 18:28 ` Ingo Molnar 0 siblings, 1 reply; 390+ messages in thread From: Anthony Liguori @ 2010-03-16 18:06 UTC (permalink / raw) To: Ingo Molnar Cc: Frank Ch. Eigler, Avi Kivity, Zhang, Yanmin, Peter Zijlstra, Sheng Yang, linux-kernel, kvm, Marcelo Tosatti, oerg Roedel, Jes Sorensen, Gleb Natapov, Zachary Amsden, ziteng.huang On 03/16/2010 12:52 PM, Ingo Molnar wrote: > * Anthony Liguori<aliguori@linux.vnet.ibm.com> wrote: > > >> On 03/16/2010 10:52 AM, Ingo Molnar wrote: >> >>> You are quite mistaken: KVM isnt really a 'random unprivileged application' in >>> this context, it is clearly an extension of system/kernel services. >>> >>> ( Which can be seen from the simple fact that what started the discussion was >>> 'how do we get /proc/kallsyms from the guest'. I.e. an extension of the >>> existing host-space /proc/kallsyms was desired. ) >>> >> Random tools (like perf) should not be able to do what you describe. It's a >> security nightmare. >> > A security nightmare exactly how? Mind to go into details as i dont understand > your point. > Assume you're using SELinux to implement mandatory access control. How do you label this file system? Generally speaking, we don't know the difference between /proc/kallsyms vs. /dev/mem if we do generic passthrough. While it might be safe to have a relaxed label of kallsyms (since it's read only), it's clearly not safe to do that for /dev/mem, /etc/shadow, or any file containing sensitive information. Rather, we ought to expose a higher level interface that we have more confidence in with respect to understanding the ramifications of exposing that guest data. >> >> No way. The guest has sensitive data and exposing it widely on the host is >> a bad thing to do. [...] >> > Firstly, you are putting words into my mouth, as i said nothing about > 'exposing it widely'. 
I suggest exposing it under the privileges of whoever > has access to the guest image. > That doesn't work as nicely with SELinux. It's completely reasonable to have a user that can interact in a read only mode with a VM via libvirt but cannot read the guest's disk images or the guest's memory contents. > Secondly, regarding confidentiality, and this is guest security 101: whoever > can access the image on the host _already_ has access to all the guest data! > > A Linux image can generally be loopback mounted straight away: > > losetup -o 32256 /dev/loop0 ./guest-image.img > mount -o ro /dev/loop0 /mnt-guest > > (Or, if you are an unprivileged user who cannot mount, it can be read via ext2 > tools.) > > There's nothing the guest can do about that. The host is in total control of > guest image data for heaven's sake! > It's not that simple in a MAC environment. Regards, Anthony Liguori > All i'm suggesting is to make what is already possible more convenient. > > Ingo > ^ permalink raw reply [flat|nested] 390+ messages in thread
* Re: [PATCH] Enhance perf to collect KVM guest os statistics from host side 2010-03-16 18:06 ` Anthony Liguori @ 2010-03-16 18:28 ` Ingo Molnar 2010-03-16 23:04 ` Anthony Liguori 0 siblings, 1 reply; 390+ messages in thread From: Ingo Molnar @ 2010-03-16 18:28 UTC (permalink / raw) To: Anthony Liguori Cc: Frank Ch. Eigler, Avi Kivity, Zhang, Yanmin, Peter Zijlstra, Sheng Yang, linux-kernel, kvm, Marcelo Tosatti, oerg Roedel, Jes Sorensen, Gleb Natapov, Zachary Amsden, ziteng.huang * Anthony Liguori <aliguori@linux.vnet.ibm.com> wrote: > On 03/16/2010 12:52 PM, Ingo Molnar wrote: > >* Anthony Liguori<aliguori@linux.vnet.ibm.com> wrote: > > > >>On 03/16/2010 10:52 AM, Ingo Molnar wrote: > >>>You are quite mistaken: KVM isnt really a 'random unprivileged application' in > >>>this context, it is clearly an extension of system/kernel services. > >>> > >>>( Which can be seen from the simple fact that what started the discussion was > >>> 'how do we get /proc/kallsyms from the guest'. I.e. an extension of the > >>> existing host-space /proc/kallsyms was desired. ) > >>Random tools (like perf) should not be able to do what you describe. It's a > >>security nightmare. > >A security nightmare exactly how? Mind to go into details as i dont understand > >your point. > > Assume you're using SELinux to implement mandatory access control. > How do you label this file system? > > Generally speaking, we don't know the difference between /proc/kallsyms vs. > /dev/mem if we do generic passthrough. While it might be safe to have a > relaxed label of kallsyms (since it's read only), it's clearly not safe to > do that for /dev/mem, /etc/shadow, or any file containing sensitive > information. What's your _point_? Please outline a threat model, a vector of attack, _anything_ that substantiates your "it's a security nightmare" claim. 
> Rather, we ought to expose a higher level interface that we have more > confidence in with respect to understanding the ramifications of exposing > that guest data. Exactly, we want something that has a flexible namespace and works well with Linux tools in general. Preferably that namespace should be human readable, and it should be hierarchic, and it should have a well-known permission model. This concept exists in Linux and is generally called a 'filesystem'. > >> No way. The guest has sensitive data and exposing it widely on the host > >> is a bad thing to do. [...] > > > > Firstly, you are putting words into my mouth, as i said nothing about > > 'exposing it widely'. I suggest exposing it under the privileges of > > whoever has access to the guest image. > > That doesn't work as nicely with SELinux. > > It's completely reasonable to have a user that can interact in a read only > mode with a VM via libvirt but cannot read the guest's disk images or the > guest's memory contents. If a user cannot read the image file then the user has no access to its contents via other namespaces either. That is, of course, a basic security aspect. ( That is perfectly true with a non-SELinux Unix permission model as well, and is true in the SELinux case as well. ) > > Secondly, regarding confidentiality, and this is guest security 101: whoever > > can access the image on the host _already_ has access to all the guest data! > > > > A Linux image can generally be loopback mounted straight away: > > > > losetup -o 32256 /dev/loop0 ./guest-image.img > > mount -o ro /dev/loop0 /mnt-guest > > > >(Or, if you are an unprivileged user who cannot mount, it can be read via ext2 > >tools.) > > > > There's nothing the guest can do about that. The host is in total control of > > guest image data for heaven's sake! > > It's not that simple in a MAC environment. Erm. Please explain to me, what exactly is 'not that simple' in a MAC environment? 
Also, i'd like to note that the 'restrictive SELinux setups' usecases are
pretty secondary.

To demonstrate that, i'd like every KVM developer on this list who reads this
mail and who has their home development system where they produce their
patches set up in a restrictive MAC environment, in that you cannot even read
the images you are using, to chime in with a "I'm doing that" reply.

If there's just a _single_ KVM developer amongst dozens and dozens of
developers on this list who develops in an environment like that i'd be
surprised.

That result should pretty much tell you where the weight of instrumentation
focus should lie - and it isnt on restrictive MAC environments ...

	Ingo
* Re: [PATCH] Enhance perf to collect KVM guest os statistics from host side 2010-03-16 18:28 ` Ingo Molnar @ 2010-03-16 23:04 ` Anthony Liguori 2010-03-17 0:41 ` Frank Ch. Eigler 2010-03-17 8:53 ` Ingo Molnar 0 siblings, 2 replies; 390+ messages in thread From: Anthony Liguori @ 2010-03-16 23:04 UTC (permalink / raw) To: Ingo Molnar Cc: Frank Ch. Eigler, Avi Kivity, Zhang, Yanmin, Peter Zijlstra, Sheng Yang, linux-kernel, kvm, Marcelo Tosatti, oerg Roedel, Jes Sorensen, Gleb Natapov, Zachary Amsden, ziteng.huang On 03/16/2010 01:28 PM, Ingo Molnar wrote: > * Anthony Liguori<aliguori@linux.vnet.ibm.com> wrote: > > >> On 03/16/2010 12:52 PM, Ingo Molnar wrote: >> >>> * Anthony Liguori<aliguori@linux.vnet.ibm.com> wrote: >>> >>> >>>> On 03/16/2010 10:52 AM, Ingo Molnar wrote: >>>> >>>>> You are quite mistaken: KVM isnt really a 'random unprivileged application' in >>>>> this context, it is clearly an extension of system/kernel services. >>>>> >>>>> ( Which can be seen from the simple fact that what started the discussion was >>>>> 'how do we get /proc/kallsyms from the guest'. I.e. an extension of the >>>>> existing host-space /proc/kallsyms was desired. ) >>>>> >>>> Random tools (like perf) should not be able to do what you describe. It's a >>>> security nightmare. >>>> >>> A security nightmare exactly how? Mind to go into details as i dont understand >>> your point. >>> >> Assume you're using SELinux to implement mandatory access control. >> How do you label this file system? >> >> Generally speaking, we don't know the difference between /proc/kallsyms vs. >> /dev/mem if we do generic passthrough. While it might be safe to have a >> relaxed label of kallsyms (since it's read only), it's clearly not safe to >> do that for /dev/mem, /etc/shadow, or any file containing sensitive >> information. >> > What's your _point_? Please outline a threat model, a vector of attack, > _anything_ that substantiates your "it's a security nightmare" claim. 
> You suggested "to have a (read only) mount of all guest filesystems". As I described earlier, not all of the information within the guest filesystem has the same level of sensitivity. If you exposed a generic interface like this, it makes it very difficult to delegate privileges. Delegating privileges is important because from in a higher security environment, you may want to prevent a management tool from accessing the VM's disk directly, but still allow it to do basic operations (in particular, to view performance statistics). >> Rather, we ought to expose a higher level interface that we have more >> confidence in with respect to understanding the ramifications of exposing >> that guest data. >> > Exactly, we want something that has a flexible namespace and works well with > Linux tools in general. Preferably that namespace should be human readable, > and it should be hierarchic, and it should have a well-known permission model. > > This concept exists in Linux and is generally called a 'filesystem'. > If you want to use a synthetic filesystem as the management interface for qemu, that's one thing. But you suggested exposing the guest filesystem in its entirely and that's what I disagreed with. > If a user cannot read the image file then the user has no access to its > contents via other namespaces either. That is, of course, a basic security > aspect. > > ( That is perfectly true with a non-SELinux Unix permission model as well, and > is true in the SELinux case as well. ) > I don't think that's reasonable at all. The guest may encrypt it's disk image. It still ought to be possible to run perf against that guest, no? > Erm. Please explain to me, what exactly is 'not that simple' in a MAC > environment? > > Also, i'd like to note that the 'restrictive SELinux setups' usecases are > pretty secondary. 
> > To demonstrate that, i'd like every KVM developer on this list who reads this > mail and who has their home development system where they produce their > patches set up in a restrictive MAC environment, in that you cannot even read > the images you are using, to chime in with a "I'm doing that" reply. > My home system doesn't run SELinux but I work daily with systems that are using SELinux. I want to be able to run tools like perf on these systems because ultimately, I need to debug these systems on a daily basis. But that's missing the point. We want to have an interface that works for both cases so that we're not maintaining two separate interfaces. We've rat holed a bit though. You want: 1) to run perf kvm list and be able to enumerate KVM guests 2) for this to Just Work with qemu guests launched from the command line You could achieve (1) by tying perf to libvirt but that won't work for (2). There are a few practical problems with (2). qemu does not require the user to associate any uniquely identifying information with a VM. We've also optimized the command line use case so that if all you want to do is run a disk image, you just execute "qemu foo.img". To satisfy your use case, we would either have to force a use to always specify unique information, which would be less convenient for our users or we would have to let the name be an optional parameter. As it turns out, we already support "qemu -name Fedora foo.img". What we don't do today, but I've been suggesting we should, is automatically create a QMP management socket in a well known location based on the -name parameter when it's specified. That would let a tool like perf Just Work provided that a user specified -name. No one uses -name today though and I'm sure you don't either. The only way to really address this is to change the interaction. Instead of running perf externally to qemu, we should support a perf command in the qemu monitor that can then tie directly to the perf tooling. 
That gives us the best possible user experience. We can't do that though unless perf is a library or is in some way more programmatic. Regards, Anthony Liguori ^ permalink raw reply [flat|nested] 390+ messages in thread
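[Editor's note: Anthony's "well known location based on the -name parameter" suggestion amounts to a simple naming convention. The sketch below is entirely hypothetical (qemu had no such convention at the time): the directory, the .qmp suffix, and the helper are illustrative only.]

```shell
# Hypothetical convention: a guest started as 'qemu -name Fedora foo.img'
# would get a QMP management socket at /var/run/qemu/Fedora.qmp, so a tool
# like 'perf kvm list' could enumerate guests just by listing that directory.
qmp_socket_path() {
  printf '/var/run/qemu/%s.qmp\n' "$1"
}

qmp_socket_path Fedora
```

qemu's -qmp option can already bind a socket at an explicit path (e.g. `-qmp unix:/var/run/qemu/Fedora.qmp,server,nowait`); what Anthony describes as missing is deriving that path automatically from -name.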
* Re: [PATCH] Enhance perf to collect KVM guest os statistics from host side 2010-03-16 23:04 ` Anthony Liguori @ 2010-03-17 0:41 ` Frank Ch. Eigler 2010-03-17 3:54 ` Avi Kivity 2010-03-17 8:14 ` Ingo Molnar 2010-03-17 8:53 ` Ingo Molnar 1 sibling, 2 replies; 390+ messages in thread From: Frank Ch. Eigler @ 2010-03-17 0:41 UTC (permalink / raw) To: Anthony Liguori Cc: Ingo Molnar, Avi Kivity, Zhang, Yanmin, Peter Zijlstra, Sheng Yang, linux-kernel, kvm, Marcelo Tosatti, oerg Roedel, Jes Sorensen, Gleb Natapov, Zachary Amsden, ziteng.huang Hi - On Tue, Mar 16, 2010 at 06:04:10PM -0500, Anthony Liguori wrote: > [...] > The only way to really address this is to change the interaction. > Instead of running perf externally to qemu, we should support a perf > command in the qemu monitor that can then tie directly to the perf > tooling. That gives us the best possible user experience. To what extent could this be solved with less crossing of isolation/abstraction layers, if the perfctr facilities were properly virtualized? That way guests could run perf goo internally. Optionally virt tools on the host side could aggregate data from cooperating self-monitoring guests. - FChE ^ permalink raw reply [flat|nested] 390+ messages in thread
* Re: [PATCH] Enhance perf to collect KVM guest os statistics from host side 2010-03-17 0:41 ` Frank Ch. Eigler @ 2010-03-17 3:54 ` Avi Kivity 2010-03-17 8:16 ` Ingo Molnar 2010-03-18 5:27 ` Huang, Zhiteng 2010-03-17 8:14 ` Ingo Molnar 1 sibling, 2 replies; 390+ messages in thread From: Avi Kivity @ 2010-03-17 3:54 UTC (permalink / raw) To: Frank Ch. Eigler Cc: Anthony Liguori, Ingo Molnar, Zhang, Yanmin, Peter Zijlstra, Sheng Yang, linux-kernel, kvm, Marcelo Tosatti, oerg Roedel, Jes Sorensen, Gleb Natapov, Zachary Amsden, ziteng.huang On 03/17/2010 02:41 AM, Frank Ch. Eigler wrote: > Hi - > > On Tue, Mar 16, 2010 at 06:04:10PM -0500, Anthony Liguori wrote: > >> [...] >> The only way to really address this is to change the interaction. >> Instead of running perf externally to qemu, we should support a perf >> command in the qemu monitor that can then tie directly to the perf >> tooling. That gives us the best possible user experience. >> > To what extent could this be solved with less crossing of > isolation/abstraction layers, if the perfctr facilities were properly > virtualized? > That's the more interesting (by far) usage model. In general guest owners don't have access to the host, and host owners can't (and shouldn't) change guests. Monitoring guests from the host is useful for kvm developers, but less so for users. -- Do not meddle in the internals of kernels, for they are subtle and quick to panic. ^ permalink raw reply [flat|nested] 390+ messages in thread
* Re: [PATCH] Enhance perf to collect KVM guest os statistics from host side 2010-03-17 3:54 ` Avi Kivity @ 2010-03-17 8:16 ` Ingo Molnar 2010-03-17 8:20 ` Avi Kivity 2010-03-18 5:27 ` Huang, Zhiteng 1 sibling, 1 reply; 390+ messages in thread From: Ingo Molnar @ 2010-03-17 8:16 UTC (permalink / raw) To: Avi Kivity Cc: Frank Ch. Eigler, Anthony Liguori, Zhang, Yanmin, Peter Zijlstra, Sheng Yang, linux-kernel, kvm, Marcelo Tosatti, oerg Roedel, Jes Sorensen, Gleb Natapov, Zachary Amsden, ziteng.huang * Avi Kivity <avi@redhat.com> wrote: > Monitoring guests from the host is useful for kvm developers, but less so > for users. Guest space profiling is easy, and 'perf kvm' is not about that. (plain 'perf' will work if a proper paravirt channel is opened to the host) I think you might have misunderstood the purpose and role of the 'perf kvm' patch here? 'perf kvm' is aimed at KVM developers: it is them who improve KVM code, not guest kernel users. Ingo ^ permalink raw reply [flat|nested] 390+ messages in thread
* Re: [PATCH] Enhance perf to collect KVM guest os statistics from host side 2010-03-17 8:16 ` Ingo Molnar @ 2010-03-17 8:20 ` Avi Kivity 2010-03-17 8:59 ` Ingo Molnar 0 siblings, 1 reply; 390+ messages in thread From: Avi Kivity @ 2010-03-17 8:20 UTC (permalink / raw) To: Ingo Molnar Cc: Frank Ch. Eigler, Anthony Liguori, Zhang, Yanmin, Peter Zijlstra, Sheng Yang, linux-kernel, kvm, Marcelo Tosatti, oerg Roedel, Jes Sorensen, Gleb Natapov, Zachary Amsden, ziteng.huang On 03/17/2010 10:16 AM, Ingo Molnar wrote: > * Avi Kivity<avi@redhat.com> wrote: > > >> Monitoring guests from the host is useful for kvm developers, but less so >> for users. >> > Guest space profiling is easy, and 'perf kvm' is not about that. (plain 'perf' > will work if a proper paravirt channel is opened to the host) > > I think you might have misunderstood the purpose and role of the 'perf kvm' > patch here? 'perf kvm' is aimed at KVM developers: it is them who improve KVM > code, not guest kernel users. > Of course I understood it. My point was that 'perf kvm' serves a tiny minority of users. That doesn't mean it isn't useful, just that it doesn't satisfy all needs by itself. -- Do not meddle in the internals of kernels, for they are subtle and quick to panic. ^ permalink raw reply [flat|nested] 390+ messages in thread
* Re: [PATCH] Enhance perf to collect KVM guest os statistics from host side 2010-03-17 8:20 ` Avi Kivity @ 2010-03-17 8:59 ` Ingo Molnar 0 siblings, 0 replies; 390+ messages in thread From: Ingo Molnar @ 2010-03-17 8:59 UTC (permalink / raw) To: Avi Kivity Cc: Frank Ch. Eigler, Anthony Liguori, Zhang, Yanmin, Peter Zijlstra, Sheng Yang, linux-kernel, kvm, Marcelo Tosatti, oerg Roedel, Jes Sorensen, Gleb Natapov, Zachary Amsden, ziteng.huang * Avi Kivity <avi@redhat.com> wrote: > On 03/17/2010 10:16 AM, Ingo Molnar wrote: > >* Avi Kivity<avi@redhat.com> wrote: > > > >> Monitoring guests from the host is useful for kvm developers, but less so > >> for users. > > > > Guest space profiling is easy, and 'perf kvm' is not about that. (plain > > 'perf' will work if a proper paravirt channel is opened to the host) > > > > I think you might have misunderstood the purpose and role of the 'perf > > kvm' patch here? 'perf kvm' is aimed at KVM developers: it is them who > > improve KVM code, not guest kernel users. > > Of course I understood it. My point was that 'perf kvm' serves a tiny > minority of users. [...] I hope you wont be disappointed to learn that 100% of Linux, all 13+ million lines of it, was and is being developed by a tiny, tiny, tiny minority of users ;-) > [...] That doesn't mean it isn't useful, just that it doesn't satisfy all > needs by itself. Of course - and it doesnt bring world peace either. One step at a time. Thanks, Ingo ^ permalink raw reply [flat|nested] 390+ messages in thread
* RE: [PATCH] Enhance perf to collect KVM guest os statistics from host side 2010-03-17 3:54 ` Avi Kivity @ 2010-03-18 5:27 ` Huang, Zhiteng 2010-03-18 5:27 ` Huang, Zhiteng 1 sibling, 0 replies; 390+ messages in thread From: Huang, Zhiteng @ 2010-03-18 5:27 UTC (permalink / raw) To: Avi Kivity, Frank Ch. Eigler Cc: Anthony Liguori, Ingo Molnar, Zhang, Yanmin, Peter Zijlstra, Sheng Yang, linux-kernel, kvm, Marcelo Tosatti, Joerg Roedel, Jes Sorensen, Gleb Natapov, Zachary Amsden Hi Avi, Ingo, I've been following this long thread since the very first email. I'm a performance engineer whose job is to tune workloads run on top of KVM (and Xen previously). As a performance engineer, I desperately want to have a tool that can monitor the host and guests at the same time. Think about >100 guests, a mix of Linux and Windows, running together on a single system: being able to know what's happening is critical for performance analysis. Actually I am the person who asked Yanmin to add the feature for CPU utilization breakdown (into host_usr, host_krn, guest_usr, guest_krn) so that I can monitor dozens of running guests. I haven't made this patch work on my system yet but I _do_ think this patch is a very good start. And finally, monitoring guests from the host is useful for users too (administrators and performance people like me). I really appreciate you guys' work and would love to provide feedback from my point of view if needed. Regards, HUANG, Zhiteng Intel SSG/SSD/SPA/PRC Scalability Lab -----Original Message----- From: kvm-owner@vger.kernel.org [mailto:kvm-owner@vger.kernel.org] On Behalf Of Avi Kivity Sent: Wednesday, March 17, 2010 11:55 AM To: Frank Ch. 
Eigler Cc: Anthony Liguori; Ingo Molnar; Zhang, Yanmin; Peter Zijlstra; Sheng Yang; linux-kernel@vger.kernel.org; kvm@vger.kernel.org; Marcelo Tosatti; Joerg Roedel; Jes Sorensen; Gleb Natapov; Zachary Amsden; ziteng.huang@intel.com Subject: Re: [PATCH] Enhance perf to collect KVM guest os statistics from host side On 03/17/2010 02:41 AM, Frank Ch. Eigler wrote: > Hi - > > On Tue, Mar 16, 2010 at 06:04:10PM -0500, Anthony Liguori wrote: > >> [...] >> The only way to really address this is to change the interaction. >> Instead of running perf externally to qemu, we should support a perf >> command in the qemu monitor that can then tie directly to the perf >> tooling. That gives us the best possible user experience. >> > To what extent could this be solved with less crossing of > isolation/abstraction layers, if the perfctr facilities were properly > virtualized? > That's the more interesting (by far) usage model. In general guest owners don't have access to the host, and host owners can't (and shouldn't) change guests. Monitoring guests from the host is useful for kvm developers, but less so for users. -- Do not meddle in the internals of kernels, for they are subtle and quick to panic. -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html ^ permalink raw reply [flat|nested] 390+ messages in thread
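The CPU utilization breakdown Zhiteng asked Yanmin for (host_usr, host_krn, guest_usr, guest_krn) is, at its core, a per-context percentage over sampled events, matching the "kernel / us / guest kernel / guest us" header that perf kvm top prints earlier in the thread. A minimal illustrative calculation, not perf's actual implementation:

```python
def utilization_breakdown(samples):
    """Turn raw per-context sample counts into percentages.

    `samples` maps the four execution contexts perf distinguishes
    (host kernel, host user, guest kernel, guest user) to sample
    counts. Illustrative only; perf's real accounting is done in the
    kernel at PMU-interrupt time.
    """
    total = sum(samples.values())
    if total == 0:
        return {ctx: 0.0 for ctx in samples}
    return {ctx: round(100.0 * n / total, 1) for ctx, n in samples.items()}
```

Fed the sample proportions from the PerfTop header above, this reproduces figures like "kernel:59.1% us: 1.5% guest kernel:31.9% guest us: 7.5%".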
* Re: [PATCH] Enhance perf to collect KVM guest os statistics from host side 2010-03-17 0:41 ` Frank Ch. Eigler 2010-03-17 3:54 ` Avi Kivity @ 2010-03-17 8:14 ` Ingo Molnar 1 sibling, 0 replies; 390+ messages in thread From: Ingo Molnar @ 2010-03-17 8:14 UTC (permalink / raw) To: Frank Ch. Eigler Cc: Anthony Liguori, Avi Kivity, Zhang, Yanmin, Peter Zijlstra, Sheng Yang, linux-kernel, kvm, Marcelo Tosatti, oerg Roedel, Jes Sorensen, Gleb Natapov, Zachary Amsden, ziteng.huang * Frank Ch. Eigler <fche@redhat.com> wrote: > Hi - > > On Tue, Mar 16, 2010 at 06:04:10PM -0500, Anthony Liguori wrote: > > [...] > > The only way to really address this is to change the interaction. > > Instead of running perf externally to qemu, we should support a perf > > command in the qemu monitor that can then tie directly to the perf > > tooling. That gives us the best possible user experience. > > To what extent could this be solved with less crossing of > isolation/abstraction layers, if the perfctr facilities were properly > virtualized? [...] Note, 'perfctr' is a different out-of-tree Linux kernel project run by someone else: it offers the /dev/perfctr special-purpose device that allows raw, unabstracted, low-level access to the PMU. I suspect the one you wanted to mention here is called 'perf' or 'perf events'. (and used to be called 'performance counters' or 'perfcounters' until it got renamed about a year ago) Thanks, Ingo ^ permalink raw reply [flat|nested] 390+ messages in thread
* Re: [PATCH] Enhance perf to collect KVM guest os statistics from host side 2010-03-16 23:04 ` Anthony Liguori 2010-03-17 0:41 ` Frank Ch. Eigler @ 2010-03-17 8:53 ` Ingo Molnar 1 sibling, 0 replies; 390+ messages in thread From: Ingo Molnar @ 2010-03-17 8:53 UTC (permalink / raw) To: Anthony Liguori Cc: Frank Ch. Eigler, Avi Kivity, Zhang, Yanmin, Peter Zijlstra, Sheng Yang, linux-kernel, kvm, Marcelo Tosatti, Joerg Roedel, Jes Sorensen, Gleb Natapov, Zachary Amsden, ziteng.huang * Anthony Liguori <aliguori@linux.vnet.ibm.com> wrote: > If you want to use a synthetic filesystem as the management interface for > qemu, that's one thing. But you suggested exposing the guest filesystem in > its entirety and that's what I disagreed with. What did you think, that it would be world-readable? Why would we do such a stupid thing? Any mounted content should at minimum match whatever policy covers the image file. The mounting of contents is not a privilege escalation and it is already possible today - just not integrated properly and not practical. (and apparently not implemented for all the wrong 'security' reasons) > The guest may encrypt its disk image. It still ought to be possible to run > perf against that guest, no? _In_ the guest you can of course run it just fine. (once paravirt bits are in place) That has no connection to 'perf kvm' though, which this patch submission is about ... If you want unified profiling of both host and guest then you need access to both the guest and the host. This is what the 'perf kvm' patch is about. Please read the patch, i think you might be misunderstanding what it does ... 
Regarding encrypted contents - that's really a distraction but the host has absolute, 100% control over the guest and there's nothing the guest can do about that - unless you are thinking about the sub-sub-case of Orwellian DRM-locked-down systems - in which case there's nothing for the host to mount and the guest can reject any requests for information on itself and impose additional policy that way. So it's a security non-issue. Note that DRM is pretty much the worst place to look at when it comes to usability: DRM lock-down is the antithesis of usability. Do you really want KVM to match the mind-set of the RIAA and MPAA? Why do you pretend that a developer cannot mount his own disk image? Pretty please, help Linux instead, where development is driven by usability and accessibility ... Thanks, Ingo ^ permalink raw reply [flat|nested] 390+ messages in thread
* Re: [PATCH] Enhance perf to collect KVM guest os statistics from host side 2010-03-16 13:08 ` Ingo Molnar 2010-03-16 13:16 ` Avi Kivity @ 2010-03-16 17:06 ` Anthony Liguori 2010-03-16 17:39 ` Ingo Molnar 1 sibling, 1 reply; 390+ messages in thread From: Anthony Liguori @ 2010-03-16 17:06 UTC (permalink / raw) To: Ingo Molnar Cc: Avi Kivity, Zhang, Yanmin, Peter Zijlstra, Sheng Yang, linux-kernel, kvm, Marcelo Tosatti, oerg Roedel, Jes Sorensen, Gleb Natapov, Zachary Amsden, ziteng.huang On 03/16/2010 08:08 AM, Ingo Molnar wrote: > * Avi Kivity<avi@redhat.com> wrote: > > >> On 03/16/2010 02:29 PM, Ingo Molnar wrote: >> > >>> I mean, i can trust a kernel service and i can trust /proc/kallsyms. >>> >>> Can perf trust a random process claiming to be Qemu? What's the trust >>> mechanism here? >>> >> Obviously you can't trust anything you get from a guest, no matter how you >> get it. >> > I'm not talking about the symbol strings and addresses, and the object > contents for allocation (or debuginfo). I'm talking about the basic protocol > of establishing which guest is which. > > I.e. we really want to be able users to: > > 1) have it all working with a single guest, without having to specify 'which' > guest (qemu PID) to work with. That is the dominant usecase both for > developers and for a fair portion of testers. > You're making too many assumptions. There is no list of guests anymore than there is a list of web browsers. You can have a multi-tenant scenario where you have distinct groups of virtual machines running as unprivileged users. > 2) Have some reasonable symbolic identification for guests. For example a > usable approach would be to have 'perf kvm list', which would list all > currently active guests: > > $ perf kvm list > [1] Fedora > [2] OpenSuse > [3] Windows-XP > [4] Windows-7 > > And from that point on 'perf kvm -g OpenSuse record' would do the obvious > thing. 
Users will be able to just use the 'OpenSuse' symbolic name for > that guest, even if the guest got restarted and switched its main PID. > Does "perf kvm list" always run as root? What if two unprivileged users both have a VM named "Fedora"? If we look at the use-case, it's going to be something like, a user is creating virtual machines and wants to get performance information about them. Having to run a separate tool like perf is not going to be what they would expect they had to do. Instead, they would either use their existing GUI tool (like virt-manager) or they would use their management interface (either QMP or libvirt). The complexity of interaction is due to the fact that perf shouldn't be a stand alone tool. It should be a library or something with a programmatic interface that another tool can make use of. Regards, Anthony Liguori > Is such a scheme possible/available? I suspect all the KVM configuration tools > (i havent used them in some time - gui and command-line tools alike) use > similar methods to ease guest management? > > Ingo > -- > To unsubscribe from this list: send the line "unsubscribe kvm" in > the body of a message to majordomo@vger.kernel.org > More majordomo info at http://vger.kernel.org/majordomo-info.html > ^ permalink raw reply [flat|nested] 390+ messages in thread
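To make the disagreement concrete: Ingo's 'perf kvm -g OpenSuse record' needs a step that resolves a symbolic guest name to a qemu process, and Anthony's objection is that nothing forces the name to be unique across users. A sketch of such a resolver follows; the registry of (name, pid, uid) tuples is assumed to come from some guest-enumeration mechanism that, as the thread establishes, does not exist today:

```python
def resolve_guest(name, registry):
    """Resolve a symbolic guest name to a qemu PID.

    `registry` is a hypothetical list of (name, pid, uid) tuples that
    a 'perf kvm list' implementation would gather. An ambiguous name,
    such as Anthony's two-users-both-named-'Fedora' case, is reported
    rather than silently resolved to an arbitrary guest. A real tool
    might instead scope the lookup by uid.
    """
    matches = [pid for (n, pid, _uid) in registry if n == name]
    if not matches:
        raise KeyError("no guest named %r" % name)
    if len(matches) > 1:
        raise LookupError("guest name %r is ambiguous: PIDs %s" % (name, matches))
    return matches[0]
```

The single-VM case Ingo prioritizes is trivially unambiguous here; the error path is exactly where the multi-tenant objection bites.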
* Re: [PATCH] Enhance perf to collect KVM guest os statistics from host side 2010-03-16 17:06 ` Anthony Liguori @ 2010-03-16 17:39 ` Ingo Molnar 2010-03-16 23:07 ` Anthony Liguori 0 siblings, 1 reply; 390+ messages in thread From: Ingo Molnar @ 2010-03-16 17:39 UTC (permalink / raw) To: Anthony Liguori Cc: Avi Kivity, Zhang, Yanmin, Peter Zijlstra, Sheng Yang, linux-kernel, kvm, Marcelo Tosatti, oerg Roedel, Jes Sorensen, Gleb Natapov, Zachary Amsden, ziteng.huang, Arnaldo Carvalho de Melo, =?unknown-8bit?B?RnLDqWTDqXJpYw==?= Weisbecker * Anthony Liguori <anthony@codemonkey.ws> wrote: > On 03/16/2010 08:08 AM, Ingo Molnar wrote: > >* Avi Kivity<avi@redhat.com> wrote: > > > >>On 03/16/2010 02:29 PM, Ingo Molnar wrote: > >>>I mean, i can trust a kernel service and i can trust /proc/kallsyms. > >>> > >>>Can perf trust a random process claiming to be Qemu? What's the trust > >>>mechanism here? > >>Obviously you can't trust anything you get from a guest, no matter how you > >>get it. > >I'm not talking about the symbol strings and addresses, and the object > >contents for allocation (or debuginfo). I'm talking about the basic protocol > >of establishing which guest is which. > > > >I.e. we really want to be able users to: > > > > 1) have it all working with a single guest, without having to specify 'which' > > guest (qemu PID) to work with. That is the dominant usecase both for > > developers and for a fair portion of testers. > > You're making too many assumptions. > > There is no list of guests anymore than there is a list of web browsers. > > You can have a multi-tenant scenario where you have distinct groups of > virtual machines running as unprivileged users. "multi-tenant" and groups is not a valid excuse at all for giving crappy technology in the simplest case: when there's a single VM. Yes, eventually it can be supported and any sane scheme will naturally support it too, but it's by no means what we care about primarily when it comes to these tools. 
I thought everyone learned the lesson behind SystemTap's failure (and to a certain degree this was behind Oprofile's failure as well): when it comes to tooling/instrumentation we dont want to concentrate on the fancy complex setups and abstract requirements drawn up by CIOs, as development isnt being done there. Concentrate on our developers today, and provide no-compromises usability to those who contribute stuff. If we dont help make the simplest (and most common) use-case convenient then we are failing on a fundamental level. > > 2) Have some reasonable symbolic identification for guests. For example a > > usable approach would be to have 'perf kvm list', which would list all > > currently active guests: > > > > $ perf kvm list > > [1] Fedora > > [2] OpenSuse > > [3] Windows-XP > > [4] Windows-7 > > > > And from that point on 'perf kvm -g OpenSuse record' would do the obvious > > thing. Users will be able to just use the 'OpenSuse' symbolic name for > > that guest, even if the guest got restarted and switched its main PID. > > Does "perf kvm list" always run as root? What if two unprivileged users > both have a VM named "Fedora"? Again, the single-VM case is the most important case, by far. 
If you have multiple VMs running and want to develop the kernel on multiple VMs (sounds rather messy if you think it through ...), what would happen is similar to what happens when we have two probes for example: # perf probe schedule Added new event: probe:schedule (on schedule+0) You can now use it on all perf tools, such as: perf record -e probe:schedule -a sleep 1 # perf probe -f schedule Added new event: probe:schedule_1 (on schedule+0) You can now use it on all perf tools, such as: perf record -e probe:schedule_1 -a sleep 1 # perf probe -f schedule Added new event: probe:schedule_2 (on schedule+0) You can now use it on all perf tools, such as: perf record -e probe:schedule_2 -a sleep 1 Something similar could be used for KVM/Qemu: whichever got created first is named 'Fedora', the second is named 'Fedora-2'. > If we look at the use-case, it's going to be something like, a user is > creating virtual machines and wants to get performance information about > them. > > Having to run a separate tool like perf is not going to be what they would > expect they had to do. Instead, they would either use their existing GUI > tool (like virt-manager) or they would use their management interface > (either QMP or libvirt). > > The complexity of interaction is due to the fact that perf shouldn't be a > stand alone tool. It should be a library or something with a programmatic > interface that another tool can make use of. But ... a GUI interface/integration is of course possible too, and it's being worked on. perf is mainly a kernel developer tool, and kernel developers generally dont use GUIs to do their stuff: which is the (sole) reason why its first ~850 commits of tools/perf/ were done without a GUI. We go where our developers are. In any case it's not an excuse to have no proper command-line tooling. In fact if you cannot get simpler, more atomic command-line tooling right then you'll probably doubly suck at doing a GUI as well. 
Ingo ^ permalink raw reply [flat|nested] 390+ messages in thread
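Ingo's naming scheme above - the first guest keeps the base name, later ones get a numeric suffix, analogous to perf probe's schedule / schedule_1 / schedule_2 - amounts to a small allocation loop. An illustrative sketch; the 'Fedora-2' convention is his proposal, not something any tool implements:

```python
def allocate_name(base, existing):
    """Allocate a unique guest name in the style Ingo suggests.

    The first guest created with a given -name keeps it; subsequent
    collisions get '-2', '-3', ... appended. `existing` is the set of
    names already in use. Hypothetical scheme, for illustration only.
    """
    if base not in existing:
        return base
    n = 2
    while "%s-%d" % (base, n) in existing:
        n += 1
    return "%s-%d" % (base, n)
```

Starting from an empty set, three guests launched with -name Fedora would be assigned 'Fedora', 'Fedora-2', and 'Fedora-3' in creation order, mirroring how perf probe disambiguates repeated probes on the same function.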
* Re: [PATCH] Enhance perf to collect KVM guest os statistics from host side 2010-03-16 17:39 ` Ingo Molnar @ 2010-03-16 23:07 ` Anthony Liguori 2010-03-17 8:10 ` [RFC] Unify KVM kernel-space and user-space code into a single project Ingo Molnar 0 siblings, 1 reply; 390+ messages in thread From: Anthony Liguori @ 2010-03-16 23:07 UTC (permalink / raw) To: Ingo Molnar Cc: Avi Kivity, Zhang, Yanmin, Peter Zijlstra, Sheng Yang, linux-kernel, kvm, Marcelo Tosatti, oerg Roedel, Jes Sorensen, Gleb Natapov, Zachary Amsden, ziteng.huang, Arnaldo Carvalho de Melo, Frédéric Weisbecker On 03/16/2010 12:39 PM, Ingo Molnar wrote: >> If we look at the use-case, it's going to be something like, a user is >> creating virtual machines and wants to get performance information about >> them. >> >> Having to run a separate tool like perf is not going to be what they would >> expect they had to do. Instead, they would either use their existing GUI >> tool (like virt-manager) or they would use their management interface >> (either QMP or libvirt). >> >> The complexity of interaction is due to the fact that perf shouldn't be a >> stand alone tool. It should be a library or something with a programmatic >> interface that another tool can make use of. >> > But ... a GUI interface/integration is of course possible too, and it's being > worked on. > > perf is mainly a kernel developer tool, and kernel developers generally dont > use GUIs to do their stuff: which is the (sole) reason why its first ~850 > commits of tools/perf/ were done without a GUI. We go where our developers > are. > > In any case it's not an excuse to have no proper command-line tooling. In fact > if you cannot get simpler, more atomic command-line tooling right then you'll > probably doubly suck at doing a GUI as well. > It's about who owns the user interface. If qemu owns the user interface, than we can satisfy this in a very simple way by adding a perf monitor command. 
If we have to support third party tools, then it significantly complicates things. Regards, Anthony Liguori > Ingo > ^ permalink raw reply [flat|nested] 390+ messages in thread
* [RFC] Unify KVM kernel-space and user-space code into a single project 2010-03-16 23:07 ` Anthony Liguori @ 2010-03-17 8:10 ` Ingo Molnar 2010-03-18 8:20 ` Avi Kivity ` (3 more replies) 0 siblings, 4 replies; 390+ messages in thread From: Ingo Molnar @ 2010-03-17 8:10 UTC (permalink / raw) To: Anthony Liguori Cc: Avi Kivity, Zhang, Yanmin, Peter Zijlstra, Sheng Yang, linux-kernel, kvm, Marcelo Tosatti, oerg Roedel, Jes Sorensen, Gleb Natapov, Zachary Amsden, ziteng.huang, Arnaldo Carvalho de Melo, Fr?d?ric Weisbecker * Anthony Liguori <anthony@codemonkey.ws> wrote: > On 03/16/2010 12:39 PM, Ingo Molnar wrote: > >>If we look at the use-case, it's going to be something like, a user is > >>creating virtual machines and wants to get performance information about > >>them. > >> > >>Having to run a separate tool like perf is not going to be what they would > >>expect they had to do. Instead, they would either use their existing GUI > >>tool (like virt-manager) or they would use their management interface > >>(either QMP or libvirt). > >> > >>The complexity of interaction is due to the fact that perf shouldn't be a > >>stand alone tool. It should be a library or something with a programmatic > >>interface that another tool can make use of. > >But ... a GUI interface/integration is of course possible too, and it's being > >worked on. > > > >perf is mainly a kernel developer tool, and kernel developers generally dont > >use GUIs to do their stuff: which is the (sole) reason why its first ~850 > >commits of tools/perf/ were done without a GUI. We go where our developers > >are. > > > >In any case it's not an excuse to have no proper command-line tooling. In fact > >if you cannot get simpler, more atomic command-line tooling right then you'll > >probably doubly suck at doing a GUI as well. > > It's about who owns the user interface. > > If qemu owns the user interface, than we can satisfy this in a very simple > way by adding a perf monitor command. 
If we have to support third party > tools, then it significantly complicates things. Of course illogical modularization complicates things 'significantly'. I wish both you and Avi looked back 3-4 years and realized what made KVM so successful back then and why the hearts and minds of virtualization developers were captured by KVM almost overnight. KVM's main strength back then was that it was a surprisingly functional piece of code offered by a 10 KLOC patch - right on the very latest upstream kernel. Code was shared with upstream, there was version parity, and it all was in the same single repo which was (and is) a pleasure to develop on. Unlike Xen, which was a 200+ KLOC patch on top of a forked 10 MLOC kernel a few upstream versions back. Xen had constant version friction due to that fork and due to that forced/false separation/modularization: Xen _itself_ was a fork of Linux to begin with. (for exampe Xen still had my copyrights last i checked, which it got from old Linux code i worked on) That forced separation and version friction in Xen was a development and productization nightmare, and developing on KVM was a truly refreshing experience. (I'll go out on a limb to declare that you wont find a _single_ developer on this list who will tells us otherwise.) Fast forward to 2010. The kernel side of KVM is maximum goodness - by far the worst-quality remaining aspects of KVM are precisely in areas that you mention: 'if we have to support third party tools, then it significantly complicates things'. You kept Qemu as an external 'third party' entity to KVM, and KVM is clearly hurting from that - just see the recent KVM usability thread for examples about suckage. So a similar 'complication' is the crux of the matter behind KVM quality problems: you've not followed through with the original KVM vision and you have not applied that concept to Qemu! 
And please realize that the user does not care that KVM's kernel bits are top notch, if the rest of the package has sucky aspects: it's always the weakest link of the chain that matters to the user. Xen sucked because of such design shortsightedness on the kernel level, and now KVM suffers from it on the user space level. If you want to jump to the next level of technological quality you need to fix this attitude and you need to go back to the design roots of KVM. Concentrate on Qemu (as that is the weakest link now), make it a first class member of the KVM repo and simplify your development model by having a single repo: - move a clean (and minimal) version of the Qemu code base to tools/kvm/, in the upstream kernel repo, and work on that from that point on. - co-develop new features within the same patch. Release new versions of kvm-qemu and the kvm bits at the same time (together with the upstream kernel), at well defined points in time. - encourage kernel-space and user-space KVM developers to work on both user-space and kernel-space bits as a single unit. It's one project and a single experience to the user. - [ and probably libvirt should go there too ] If KVM's hypervisor and guest kernel code can enjoy the benefits of a single repository, why cannot the rest of KVM enjoy the same developer goodness? Only fixing that will bring the break-through in quality - not more manpower really. Yes, i've read a thousand excuses for why this is an absolutely impossible and a bad thing to do, and none of them was really convincing to me - and you also have become rather emotional about all the arguments so it's hard to argue about it on a technical basis. We made a similar (admittedly very difficult ...) design jump from oprofile to perf, and i can tell you from that experience that it's day and night, both in terms of development and in terms of the end result! 
( We recently also made another kernel/kernel unification that had a very positive result: we unified the 32-bit and 64-bit x86 architectures. Even within the same repo the unification of technology is generally a good thing. The KVM/Qemu situation is different - it's more similar to the perf design. )

Not having to fight artificial package boundaries and forced package separation is a very refreshing experience to a developer - and very rewarding and flexible to develop on. ABI compatibility is _easier_ to maintain in such a model. It's quite similar to the jump from Xen hacking to KVM hacking (I did both). It's a bit like the jump from CVS to Git. Trust me, you _cannot_ know the difference if you haven't tried a similar jump with Qemu.

Anyway, you made your position about this rather clear and you are clearly uncompromising, so I just wanted to post this note to the list: you'll waste years of your life on a visibly crappy development model that has been unable to break through a magic usability barrier for the past 2-3 years - just like the Xen mis-design has wasted so many people's time and effort in kernel space.

Thanks,

	Ingo

^ permalink raw reply	[flat|nested] 390+ messages in thread
* Re: [RFC] Unify KVM kernel-space and user-space code into a single project
  2010-03-17  8:10 ` [RFC] Unify KVM kernel-space and user-space code into a single project Ingo Molnar
@ 2010-03-18  8:20   ` Avi Kivity
  2010-03-18  8:56     ` Ingo Molnar
  2010-03-18  9:22     ` Ingo Molnar
  2010-03-18  8:44   ` Jes Sorensen
  ` (2 subsequent siblings)
  3 siblings, 2 replies; 390+ messages in thread
From: Avi Kivity @ 2010-03-18  8:20 UTC (permalink / raw)
To: Ingo Molnar
Cc: Anthony Liguori, Zhang, Yanmin, Peter Zijlstra, Sheng Yang,
	linux-kernel, kvm, Marcelo Tosatti, Joerg Roedel, Jes Sorensen,
	Gleb Natapov, Zachary Amsden, ziteng.huang,
	Arnaldo Carvalho de Melo, Frédéric Weisbecker

On 03/17/2010 10:10 AM, Ingo Molnar wrote:
>
>> It's about who owns the user interface.
>>
>> If qemu owns the user interface, then we can satisfy this in a very simple
>> way by adding a perf monitor command. If we have to support third party
>> tools, then it significantly complicates things.
>>
> Of course illogical modularization complicates things 'significantly'.
>

Who should own the user interface then?

> Fast forward to 2010. The kernel side of KVM is maximum goodness - by far the
> worst-quality remaining aspects of KVM are precisely in areas that you
> mention: 'if we have to support third party tools, then it significantly
> complicates things'. You kept Qemu as an external 'third party' entity to KVM,
> and KVM is clearly hurting from that - just see the recent KVM usability
> thread for examples of suckage.
>

Any qemu usability problems are because developers (or their employers) are not interested in fixing them, not because of the repository location. Most kvm developer interest is in server-side deployment (even for desktop guests), so there is limited effort in implementing a virtualbox-style GUI.

> - move a clean (and minimal) version of the Qemu code base to tools/kvm/, in
>   the upstream kernel repo, and work on that from that point on.
I'll ignore the repository location, which should be immaterial to a serious developer, and concentrate on the 'clean and minimal' aspect, since it has some merit.

Qemu development does have a tension between the needs of kvm and tcg. For kvm we need fine-grained threading to improve performance, and tons of RAS work. For tcg these are mostly meaningless, and the tcg code has sufficient inertia to reduce the rate at which we can develop. Nevertheless, the majority of developers feel that we'll lose more by a fork (the community) than we gain by it (reduced constraints).

> - co-develop new features within the same patch. Release new versions of
>   kvm-qemu and the kvm bits at the same time (together with the upstream
>   kernel), at well-defined points in time.
>

The majority of patches to qemu don't require changes to kvm, and vice versa. The interface between qemu and kvm is fairly narrow, and most of the changes are related to save/restore and guest debugging, hardly areas of great interest to the casual user.

> - encourage kernel-space and user-space KVM developers to work on both
>   user-space and kernel-space bits as a single unit. It's one project and a
>   single experience to the user.
>

When a feature is developed that requires both kernel and qemu changes, the same developer makes the changes in both projects. Having them in different repositories does not appear to be a problem.

> - [ and probably libvirt should go there too ]
>

Let's make a list of projects which don't need to be in the kernel repository; it will probably be shorter. Seriously, libvirt is a cross-platform, cross-hypervisor library; it has no business near the Linux kernel.

> If KVM's hypervisor and guest kernel code can enjoy the benefits of a single
> repository,

In fact I try hard not to rely too much on that. While both kvm guest and host code are in the same repo, there is an ABI barrier between them because we need to support any guest version on any host version.
When designing, writing, or reading guest or host code that interacts across that barrier we need to keep forward and backward compatibility in mind. It's very different from normal kernel APIs, which we can adapt whenever the need arises.

> why cannot the rest of KVM enjoy the same developer goodness? Only
> fixing that will bring the break-through in quality - not more manpower
> really.
>

I really don't understand why you believe that. You seem to want a virtualbox-style GUI, and lkml is probably the last place in the world to develop something like that. The developers here are mostly uninterested in GUI and usability problems; remember, these are people who think emacs xor vi is a great editor.

> Yes, I've read a thousand excuses for why this is an absolutely impossible and
> a bad thing to do, and none of them was really convincing to me - and you have
> also become rather emotional about all the arguments so it's hard to argue
> about it on a technical basis.
>
> We made a similar (admittedly very difficult ...) design jump from oprofile to
> perf, and I can tell you from that experience that it's day and night, both in
> terms of development and in terms of the end result!
>

Maybe it was due to better design and implementation choices.

> ( We recently also made another kernel/kernel unification that had a very
>   positive result: we unified the 32-bit and 64-bit x86 architectures. Even
>   within the same repo the unification of technology is generally a good
>   thing. The KVM/Qemu situation is different - it's more similar to the perf
>   design. )
>
> Not having to fight artificial package boundaries and forced package
> separation is a very refreshing experience to a developer - and very rewarding
> and flexible to develop on. ABI compatibility is _easier_ to maintain in such
> a model. It's quite similar to the jump from Xen hacking to KVM hacking (I did
> both). It's a bit like the jump from CVS to Git.
> Trust me, you _cannot_ know the difference if you haven't tried a similar
> jump with Qemu.
>

Why is ABI compatibility easier to maintain in a single repo?

> Anyway, you made your position about this rather clear and you are clearly
> uncompromising, so I just wanted to post this note to the list: you'll waste
> years of your life on a visibly crappy development model that has been unable
> to break through a magic usability barrier for the past 2-3 years - just like
> the Xen mis-design has wasted so many people's time and effort in kernel
> space.
>

Do you really think the echo'n'cat school of usability wants to write a GUI? In linux-2.6.git?

-- 
error compiling committee.c: too many arguments to function
* Re: [RFC] Unify KVM kernel-space and user-space code into a single project
  2010-03-18  8:20 ` Avi Kivity
@ 2010-03-18  8:56   ` Ingo Molnar
  2010-03-18  9:24     ` Alexander Graf
  2010-03-18 10:12     ` Avi Kivity
  1 sibling, 2 replies; 390+ messages in thread
From: Ingo Molnar @ 2010-03-18  8:56 UTC (permalink / raw)
To: Avi Kivity
Cc: Anthony Liguori, Zhang, Yanmin, Peter Zijlstra, Sheng Yang,
	linux-kernel, kvm, Marcelo Tosatti, Joerg Roedel, Jes Sorensen,
	Gleb Natapov, Zachary Amsden, ziteng.huang,
	Arnaldo Carvalho de Melo, Frédéric Weisbecker

* Avi Kivity <avi@redhat.com> wrote:

> On 03/17/2010 10:10 AM, Ingo Molnar wrote:
> >
> >> It's about who owns the user interface.
> >>
> >> If qemu owns the user interface, then we can satisfy this in a very
> >> simple way by adding a perf monitor command. If we have to support third
> >> party tools, then it significantly complicates things.
> >
> > Of course illogical modularization complicates things 'significantly'.
>
> Who should own the user interface then?

If qemu was in tools/kvm/ then we wouldn't have such issues. A single patch (or series of patches) could modify tools/kvm/, arch/x86/kvm/, virt/ and tools/perf/. Numerous times did we have patches to kernel/perf_event.c that fixed some detail, also accompanied by a tools/perf/ patch fixing another detail.

Having a single 'culture of contribution' is a powerful way to develop. It turns out kernel developers can be pretty good user-space developers as well, and user-space developers can be pretty good kernel developers as well. Some like to do both - as long as it's all within a single project.

The moment any change (be it as trivial as fixing a GUI detail or as complex as a new feature) involves two or more packages, development speed slows down to a crawl - while the complexity of the change might be very low!
Also, there's the harmful process whereby people start categorizing themselves into 'I am a kernel developer' and 'I am a user-space programmer' stereotypes, which limits the scope of contributions artificially.

> > Fast forward to 2010. The kernel side of KVM is maximum goodness - by far
> > the worst-quality remaining aspects of KVM are precisely in areas that you
> > mention: 'if we have to support third party tools, then it significantly
> > complicates things'. You kept Qemu as an external 'third party' entity to
> > KVM, and KVM is clearly hurting from that - just see the recent KVM
> > usability thread for examples of suckage.
>
> Any qemu usability problems are because developers (or their employers) are
> not interested in fixing them, not because of the repository location. Most
> kvm developer interest is in server-side deployment (even for desktop
> guests), so there is limited effort in implementing a virtualbox-style GUI.

The same has been said of oprofile as well: 'it somewhat sucks because we are too server centric', 'nobody is interested in good usability and oprofile is fine for the enterprises'. Ironically, the same has been said of Xen usability as well, up to the point KVM came around. The core of the problem was a bad design and a split kernel-side/user-side tool landscape.

In fact I think saying that 'our developers only care about the server' is borderline dishonest, when at the same time you are making doubly sure (by inaction) that it stays so: by leaving an artificial package wall between kernel-side KVM and user-side KVM and not integrating the two technologies. You'll never know what heights you could achieve if you leave that wall there ...

Furthermore, what should be realized is that bad usability hurts "server features" just as much. Most of the day-to-day testing is done on the desktop by desktop-oriented testers/developers. _Not_ by enterprise shops - they tend to see the code years down the line to begin with ...
Yes, a particular feature might be server oriented, but a good portion of our testing is on the desktop, everyone is hurting from bad usability, and this puts limits on contribution efficiency.

As the patch posted in _this very thread_ demonstrates, it is doubly difficult to contribute a joint KVM+Qemu feature, because it's two separate code bases, two contribution guidelines, two release schedules. While to the user it really is just one and the same thing. It should be so for the developer as well.

Put another way: KVM's current split design makes it easy to contribute server features (because the kernel side is clean and cool), but also makes it artificially hard to contribute desktop features: because the tooling side (Qemu) is 'just another package', is separated by a package and maintenance wall, and is made somewhat uncool by a (as some KVM developers have pointed out in this thread) quirky codebase.

(The rest of your points are really a function of this fundamental disagreement.)

Thanks,

	Ingo
* Re: [RFC] Unify KVM kernel-space and user-space code into a single project
  2010-03-18  8:56 ` Ingo Molnar
@ 2010-03-18  9:24   ` Alexander Graf
  2010-03-18 10:10     ` Ingo Molnar
  1 sibling, 1 reply; 390+ messages in thread
From: Alexander Graf @ 2010-03-18  9:24 UTC (permalink / raw)
To: Ingo Molnar
Cc: Avi Kivity, Anthony Liguori, Zhang, Yanmin, Peter Zijlstra,
	Sheng Yang, linux-kernel, kvm, Marcelo Tosatti, Joerg Roedel,
	Jes Sorensen, Gleb Natapov, Zachary Amsden, ziteng.huang,
	Arnaldo Carvalho de Melo, Frédéric Weisbecker

On 18.03.2010, at 09:56, Ingo Molnar wrote:

> * Avi Kivity <avi@redhat.com> wrote:
>
>> On 03/17/2010 10:10 AM, Ingo Molnar wrote:
>>>
>>>> It's about who owns the user interface.
>>>>
>>>> If qemu owns the user interface, then we can satisfy this in a very
>>>> simple way by adding a perf monitor command. If we have to support third
>>>> party tools, then it significantly complicates things.
>>>
>>> Of course illogical modularization complicates things 'significantly'.
>>
>> Who should own the user interface then?
>
> If qemu was in tools/kvm/ then we wouldn't have such issues. A single patch (or
> series of patches) could modify tools/kvm/, arch/x86/kvm/, virt/ and
> tools/perf/.

It's not a 1:1 connection. There are more users of the KVM interface. To name a few I'm aware of:

 - Mac-on-Linux (PPC)
 - Dolphin (PPC)
 - Xenner (x86)
 - Kuli (s390)

Having a clear userspace interface is the only viable solution there. And if you're interested, look at my MOL enabling patch. It's less than 500 lines of code. The kernel/userspace interface really isn't the difficult part. Getting device emulation working properly, easily and fast is.

Alex
* Re: [RFC] Unify KVM kernel-space and user-space code into a single project
  2010-03-18  9:24 ` Alexander Graf
@ 2010-03-18 10:10   ` Ingo Molnar
  2010-03-18 10:21     ` Avi Kivity
  0 siblings, 1 reply; 390+ messages in thread
From: Ingo Molnar @ 2010-03-18 10:10 UTC (permalink / raw)
To: Alexander Graf
Cc: Avi Kivity, Anthony Liguori, Zhang, Yanmin, Peter Zijlstra,
	Sheng Yang, linux-kernel, kvm, Marcelo Tosatti, Joerg Roedel,
	Jes Sorensen, Gleb Natapov, Zachary Amsden, ziteng.huang,
	Arnaldo Carvalho de Melo, Frédéric Weisbecker

* Alexander Graf <agraf@suse.de> wrote:

> On 18.03.2010, at 09:56, Ingo Molnar wrote:
>
> > * Avi Kivity <avi@redhat.com> wrote:
> >
> >> On 03/17/2010 10:10 AM, Ingo Molnar wrote:
> >>>
> >>>> It's about who owns the user interface.
> >>>>
> >>>> If qemu owns the user interface, then we can satisfy this in a very
> >>>> simple way by adding a perf monitor command. If we have to support third
> >>>> party tools, then it significantly complicates things.
> >>>
> >>> Of course illogical modularization complicates things 'significantly'.
> >>
> >> Who should own the user interface then?
> >
> > If qemu was in tools/kvm/ then we wouldn't have such issues. A single patch (or
> > series of patches) could modify tools/kvm/, arch/x86/kvm/, virt/ and
> > tools/perf/.
>
> It's not a 1:1 connection. There are more users of the KVM interface. To
> name a few I'm aware of:
>
>  - Mac-on-Linux (PPC)
>  - Dolphin (PPC)
>  - Xenner (x86)
>  - Kuli (s390)

There must be a misunderstanding here: tools/perf/ still has a clear userspace interface and ABI. There are external projects making use of it: sysprof and libpfm (and probably more I don't know about). Those projects are also contributing back. Still it's _very_ useful to have a single reference implementation under tools/perf/ where we concentrate the best of the code.
That is where we make sure that each new kernel feature is appropriately implemented in user-space as well, that the combination works well together and is releasable to users. That is what keeps us all honest: the latency of features is much lower, and there's no ping-pong of blame going on between the two components in case of bugs or in case of misfeatures.

Same goes for KVM+Qemu: it would be so much nicer to have a single, well-focused reference implementation under tools/kvm/ and have improvements flow into that code base. That way KVM developers cannot just shrug "well, GUI suckage is a user-space problem" - like the answers I got in the KVM usability thread ...

The buck will stop here. And if someone thinks he can do better, an external project can be started anytime. (It may even replace the upstream thing if it's better.)

> Having a clear userspace interface is the only viable solution there. And if
> you're interested, look at my MOL enabling patch. It's less than 500 lines
> of code.

Why do you suppose that what I propose is an "either or" scenario? It isn't. I just suggested that instead of letting core KVM fragment its limbs into an external entity, put your name behind one good all-around solution and focus the development model into a single project. I.e. do what KVM has done originally in the kernel space to begin with - and where it was so much better than Xen: single focus.

Learn from what KVM has done so well in the initial years and use the concept on the user-space components as well. The very same arguments that caused KVM to integrate into the upstream kernel (instead of being a separate project) are a valid basis to integrate the user-space components into tools/kvm/. Don't forget your roots and don't assume all your design decisions were correct.

> The kernel/userspace interface really isn't the difficult part. Getting
> device emulation working properly, easily and fast is.

The kernel/userspace ABI is not difficult at all.
Getting device emulation working properly, easily and fast indeed is. And my experience is that it is not working properly nor quickly at the moment, at all. (See the 'KVM usability' thread.)

Getting device emulation working properly often involves putting certain pieces that are currently done in Qemu into kernel-space. That kind of 'movement of emulation technology' from the user-space component into the kernel-space component [or back] would very clearly be helped if those two components were in the same repository.

And I have first-hand experience there: we had (and have) similar scenarios with tools/perf routinely. We did some aspects in user-space, then decided to do it in kernel-space. Sometimes we moved kernel bits to user-space. It was very easy and there were no package and version complications as it's a single project. Sometimes we even moved bits back and forth until we found the right balance.

Thanks,

	Ingo
* Re: [RFC] Unify KVM kernel-space and user-space code into a single project
  2010-03-18 10:10 ` Ingo Molnar
@ 2010-03-18 10:21   ` Avi Kivity
  2010-03-18 11:35     ` Ingo Molnar
  0 siblings, 1 reply; 390+ messages in thread
From: Avi Kivity @ 2010-03-18 10:21 UTC (permalink / raw)
To: Ingo Molnar
Cc: Alexander Graf, Anthony Liguori, Zhang, Yanmin, Peter Zijlstra,
	Sheng Yang, linux-kernel, kvm, Marcelo Tosatti, Joerg Roedel,
	Jes Sorensen, Gleb Natapov, Zachary Amsden, ziteng.huang,
	Arnaldo Carvalho de Melo, Frédéric Weisbecker

On 03/18/2010 12:10 PM, Ingo Molnar wrote:
>
>> It's not a 1:1 connection. There are more users of the KVM interface. To
>> name a few I'm aware of:
>>
>>  - Mac-on-Linux (PPC)
>>  - Dolphin (PPC)
>>  - Xenner (x86)
>>  - Kuli (s390)
>>
> There must be a misunderstanding here: tools/perf/ still has a clear userspace
> interface and ABI. There are external projects making use of it: sysprof and
> libpfm (and probably more I don't know about). Those projects are also
> contributing back.
>

So it seems it is possible to scale the package wall.

> Still it's _very_ useful to have a single reference implementation under
> tools/perf/ where we concentrate the best of the code. That is where we make
> sure that each new kernel feature is appropriately implemented in user-space
> as well, that the combination works well together and is releasable to users.
> That is what keeps us all honest: the latency of features is much lower, and
> there's no ping-pong of blame going on between the two components in case of
> bugs or in case of misfeatures.
>

That would make sense for a truly minimal userspace for kvm: we once had a tool called kvmctl, which was used to run tests (since folded into qemu). It didn't contain a GUI and was unable to run a general-purpose guest. It was a few hundred lines of code, and indeed patches to kvmctl had a much closer correspondence to patches to kvm (though still low, as most kvm patches don't modify the ABI).
> Same goes for KVM+Qemu: it would be so much nicer to have a single,
> well-focused reference implementation under tools/kvm/ and have improvements
> flow into that code base.
>
> That way KVM developers cannot just shrug "well, GUI suckage is a user-space
> problem" - like the answers I got in the KVM usability thread ...
>
> The buck will stop here.
>

Suppose we copy qemu tomorrow into tools/. All the problems will be copied with it. Someone still has to write patches to fix them. Who will it be?

>> The kernel/userspace interface really isn't the difficult part. Getting
>> device emulation working properly, easily and fast is.
>>
> The kernel/userspace ABI is not difficult at all. Getting device emulation
> working properly, easily and fast indeed is. And my experience is that it is
> not working properly nor quickly at the moment, at all. (See the 'KVM
> usability' thread.)
>
> Getting device emulation working properly often involves putting certain
> pieces that are currently done in Qemu into kernel-space. That kind of
> 'movement of emulation technology' from the user-space component into the
> kernel-space component [or back] would very clearly be helped if those two
> components were in the same repository.
>

Moving emulation into the kernel is indeed a problem. Not because it's difficult, but because it indicates that the interfaces exposed to userspace are insufficient to obtain good performance. We had that with vhost-net and I'm afraid we'll have that with vhost-blk.

> And I have first-hand experience there: we had (and have) similar scenarios
> with tools/perf routinely. We did some aspects in user-space, then decided to
> do it in kernel-space. Sometimes we moved kernel bits to user-space. It was
> very easy and there were no package and version complications as it's a single
> project. Sometimes we even moved bits back and forth until we found the right
> balance.
>

That's reasonable in the first iterations of a project.
-- 
error compiling committee.c: too many arguments to function
* Re: [RFC] Unify KVM kernel-space and user-space code into a single project
  2010-03-18 10:21 ` Avi Kivity
@ 2010-03-18 11:35   ` Ingo Molnar
  2010-03-18 12:00     ` Alexander Graf
  2010-03-18 12:33     ` Frank Ch. Eigler
  0 siblings, 2 replies; 390+ messages in thread
From: Ingo Molnar @ 2010-03-18 11:35 UTC (permalink / raw)
To: Avi Kivity
Cc: Alexander Graf, Anthony Liguori, Zhang, Yanmin, Peter Zijlstra,
	Sheng Yang, linux-kernel, kvm, Marcelo Tosatti, Joerg Roedel,
	Jes Sorensen, Gleb Natapov, Zachary Amsden, ziteng.huang,
	Arnaldo Carvalho de Melo, Frédéric Weisbecker

* Avi Kivity <avi@redhat.com> wrote:

> > Still it's _very_ useful to have a single reference implementation under
> > tools/perf/ where we concentrate the best of the code. That is where we
> > make sure that each new kernel feature is appropriately implemented in
> > user-space as well, that the combination works well together and is
> > releasable to users. That is what keeps us all honest: the latency of
> > features is much lower, and there's no ping-pong of blame going on between
> > the two components in case of bugs or in case of misfeatures.
>
> That would make sense for a truly minimal userspace for kvm: we once had a
> tool called kvmctl which was used to run tests (since folded into qemu). It
> didn't contain a GUI and was unable to run a general-purpose guest. It was
> a few hundred lines of code, and indeed patches to kvmctl had a much closer
> correspondence to patches to kvm (though still low, as most kvm patches
> don't modify the ABI).

If it's functional to the extent of at least allowing, say, a serial console (like the UML binary allows) I'd expect the minimal user-space to quickly grow out of this minimal state. The rest will be history.

Maybe this is a better, simpler (and much cleaner and less controversial) approach than moving a 'full' copy of qemu there.
There's certainly no risk: if qemu stays dominant then nothing is lost [tools/kvm/ can be removed after some time], and if this clean base works out fine then the useful qemu technologies will move over to it gradually and without much fuss, and the developers will move with it as well. If it's just a token effort with near-zero utility to begin with, it certainly won't take off.

Once it's there in tools/kvm/ and bootable I'd certainly hack up some quick xlib-based VGA output capability myself - it's not that hard ;-) It would also allow me to test whether latest-KVM still boots fine in a much simpler way. (Most of my testboxes don't have qemu installed.)

So you have one user signed up for that already ;-)

> > Same goes for KVM+Qemu: it would be so much nicer to have a single,
> > well-focused reference implementation under tools/kvm/ and have
> > improvements flow into that code base.
> >
> > That way KVM developers cannot just shrug "well, GUI suckage is a
> > user-space problem" - like the answers I got in the KVM usability thread
> > ...
> >
> > The buck will stop here.
>
> Suppose we copy qemu tomorrow into tools/. All the problems will be copied
> with it. Someone still has to write patches to fix them. Who will it be?

What we saw with tools/perf/ was that pure proximity to actual kernel testers and kernel developers produces a steady influx of new developers. It didn't happen overnight, but it happened. A simple:

	cd tools/perf/
	make -j install

gets them something to play with. That kind of proximity is very powerful.

The other benefit was that distros can package perf with the kernel package, so it's updated together with the kernel. This means a very efficient distribution of new technologies, together with new kernel releases. Distributions are very eager to update kernels even in stable periods of the distro lifetime - they are much less willing to update user-space packages.
You can literally get full KVM+userspace features done _and deployed to users_ within the 3-month development cycle of upstream KVM.

All these create synergies that are very clear once you see the process in motion. It's a powerful positive feedback loop. Give it some thought please.

	Ingo
* Re: [RFC] Unify KVM kernel-space and user-space code into a single project
  2010-03-18 11:35 ` Ingo Molnar
@ 2010-03-18 12:00   ` Alexander Graf
  2010-03-18 12:33   ` Frank Ch. Eigler
  1 sibling, 0 replies; 390+ messages in thread
From: Alexander Graf @ 2010-03-18 12:00 UTC (permalink / raw)
To: Ingo Molnar
Cc: Avi Kivity, Anthony Liguori, Zhang, Yanmin, Peter Zijlstra,
	Sheng Yang, linux-kernel, kvm, Marcelo Tosatti, Joerg Roedel,
	Jes Sorensen, Gleb Natapov, Zachary Amsden, ziteng.huang,
	Arnaldo Carvalho de Melo, Frédéric Weisbecker

Ingo Molnar wrote:
> * Avi Kivity <avi@redhat.com> wrote:
>
>>> Still it's _very_ useful to have a single reference implementation under
>>> tools/perf/ where we concentrate the best of the code. That is where we
>>> make sure that each new kernel feature is appropriately implemented in
>>> user-space as well, that the combination works well together and is
>>> releasable to users. That is what keeps us all honest: the latency of
>>> features is much lower, and there's no ping-pong of blame going on between
>>> the two components in case of bugs or in case of misfeatures.
>>>
>> That would make sense for a truly minimal userspace for kvm: we once had a
>> tool called kvmctl which was used to run tests (since folded into qemu). It
>> didn't contain a GUI and was unable to run a general-purpose guest. It was
>> a few hundred lines of code, and indeed patches to kvmctl had a much closer
>> correspondence to patches to kvm (though still low, as most kvm patches
>> don't modify the ABI).
>>
> If it's functional to the extent of at least allowing, say, a serial console
> (like the UML binary allows) I'd expect the minimal user-space to quickly
> grow out of this minimal state. The rest will be history.
>
> Maybe this is a better, simpler (and much cleaner and less controversial)
> approach than moving a 'full' copy of qemu there.
> There's certainly no risk: if qemu stays dominant then nothing is lost
> [tools/kvm/ can be removed after some time], and if this clean base works out
> fine then the useful qemu technologies will move over to it gradually and
> without much fuss, and the developers will move with it as well.
>
> If it's just a token effort with near-zero utility to begin with, it
> certainly won't take off.
>
> Once it's there in tools/kvm/ and bootable I'd certainly hack up some quick
> xlib-based VGA output capability myself - it's not that hard ;-) It would
> also allow me to test whether latest-KVM still boots fine in a much simpler
> way. (Most of my testboxes don't have qemu installed.)
>
> So you have one user signed up for that already ;-)
>

Alright, you just volunteered. Just give it a go and try to implement the "oh so simple" KVM frontend while maintaining compatibility with at least a few older Linux guests. My guess is that you'll realize it's a dead end before committing anything to the kernel source tree. But really, just try it out.

Good luck,

Alex
* Re: [RFC] Unify KVM kernel-space and user-space code into a single project
  2010-03-18 11:35 ` Ingo Molnar
  2010-03-18 12:00 ` Alexander Graf
@ 2010-03-18 12:33   ` Frank Ch. Eigler
  2010-03-18 13:01     ` John Kacur
  2010-03-18 13:02     ` Ingo Molnar
  1 sibling, 2 replies; 390+ messages in thread
From: Frank Ch. Eigler @ 2010-03-18 12:33 UTC (permalink / raw)
To: Ingo Molnar
Cc: Avi Kivity, Alexander Graf, Anthony Liguori, Zhang, Yanmin,
	Peter Zijlstra, Sheng Yang, linux-kernel, kvm, Marcelo Tosatti,
	Joerg Roedel, Jes Sorensen, Gleb Natapov, Zachary Amsden,
	ziteng.huang, Arnaldo Carvalho de Melo, Frédéric Weisbecker

Ingo Molnar <mingo@elte.hu> writes:

> [...]
> Distributions are very eager to update kernels even in stable periods of the
> distro lifetime - they are much less willing to update user-space packages.
> [...]

Sorry, er, what? What distributions eagerly upgrade kernels in stable periods, were it not primarily motivated by security fixes? What users eagerly replace their kernels?

- FChE
* Re: [RFC] Unify KVM kernel-space and user-space code into a single project
  2010-03-18 12:33 ` Frank Ch. Eigler
@ 2010-03-18 13:01   ` John Kacur
  2010-03-18 13:02   ` Ingo Molnar
  1 sibling, 0 replies; 390+ messages in thread
From: John Kacur @ 2010-03-18 13:01 UTC (permalink / raw)
To: Frank Ch. Eigler
Cc: Ingo Molnar, Avi Kivity, Alexander Graf, Anthony Liguori, Zhang,
	Yanmin, Peter Zijlstra, Sheng Yang, linux-kernel, kvm,
	Marcelo Tosatti, Joerg Roedel, Jes Sorensen, Gleb Natapov,
	Zachary Amsden, ziteng.huang, Arnaldo Carvalho de Melo,
	Frédéric Weisbecker

On Thu, Mar 18, 2010 at 1:33 PM, Frank Ch. Eigler <fche@redhat.com> wrote:
> Ingo Molnar <mingo@elte.hu> writes:
>
>> [...]
>> Distributions are very eager to update kernels even in stable periods of the
>> distro lifetime - they are much less willing to update user-space packages.
>> [...]
>
> Sorry, er, what? What distributions eagerly upgrade kernels in stable
> periods, were it not primarily motivated by security fixes? What users
> eagerly replace their kernels?
>

Us guys reading and participating on the list. ;)
* Re: [RFC] Unify KVM kernel-space and user-space code into a single project
From: Ingo Molnar @ 2010-03-18 14:25 UTC
To: John Kacur
Cc: Frank Ch. Eigler, Avi Kivity, Alexander Graf, Anthony Liguori, Zhang, Yanmin, Peter Zijlstra, Sheng Yang, linux-kernel, kvm, Marcelo Tosatti, Joerg Roedel, Jes Sorensen, Gleb Natapov, Zachary Amsden, ziteng.huang, Arnaldo Carvalho de Melo, Frédéric Weisbecker

* John Kacur <jkacur@redhat.com> wrote:

> On Thu, Mar 18, 2010 at 1:33 PM, Frank Ch. Eigler <fche@redhat.com> wrote:
> > Ingo Molnar <mingo@elte.hu> writes:
> >
> >> [...]
> >> Distributions are very eager to update kernels even in stable periods of the
> >> distro lifetime - they are much less willing to update user-space packages.
> >> [...]
> >
> > Sorry, er, what?  What distributions eagerly upgrade kernels in stable
> > periods, were it not primarily motivated by security fixes?  What users
> > eagerly replace their kernels?
>
> Us guys reading and participating on the list. ;)

I'd like to second that - i'm actually quite happy to update the distro kernel. Also, i have rarely any problems even with bleeding edge kernels in rawhide - they are working pretty smoothly.

A large xorg update showing up in yum update gives me the cringe though ;-)

	Ingo
* Re: [RFC] Unify KVM kernel-space and user-space code into a single project
From: Frank Ch. Eigler @ 2010-03-18 14:39 UTC
To: Ingo Molnar
Cc: John Kacur, Avi Kivity, Alexander Graf, Anthony Liguori, Zhang, Yanmin, Peter Zijlstra, Sheng Yang, linux-kernel, kvm, Marcelo Tosatti, Joerg Roedel, Jes Sorensen, Gleb Natapov, Zachary Amsden, ziteng.huang, Arnaldo Carvalho de Melo, Frédéric Weisbecker

Hi -

On Thu, Mar 18, 2010 at 03:25:04PM +0100, Ingo Molnar wrote:
> [...]
> > Us guys reading and participating on the list. ;)
>
> I'd like to second that - i'm actually quite happy to update the distro
> kernel. Also, i have rarely any problems even with bleeding edge kernels in
> rawhide - they are working pretty smoothly.
>
> A large xorg update showing up in yum update gives me the cringe though ;-)

From a parochial point of view, that makes perfect sense: someone else's large software changes are a source of concern. The same thing applies to non-LKML people -- ordinary users -- when *your* large software changes are proposed.

Perhaps this change in perspective would help you see the absurdity of proposing kernel-2.6.git as a hosting repository for all kinds of stuff, on the theory that kernel updates get pushed to "eager" users more frequently than other kinds of updates. (Never mind that data shows otherwise.)

- FChE
* Re: [RFC] Unify KVM kernel-space and user-space code into a single project
From: Ingo Molnar @ 2010-03-18 13:02 UTC
To: Frank Ch. Eigler
Cc: Avi Kivity, Alexander Graf, Anthony Liguori, Zhang, Yanmin, Peter Zijlstra, Sheng Yang, linux-kernel, kvm, Marcelo Tosatti, Joerg Roedel, Jes Sorensen, Gleb Natapov, Zachary Amsden, ziteng.huang, Arnaldo Carvalho de Melo, Frédéric Weisbecker

* Frank Ch. Eigler <fche@redhat.com> wrote:

> Ingo Molnar <mingo@elte.hu> writes:
>
> > [...]
> > Distributions are very eager to update kernels even in stable periods of the
> > distro lifetime - they are much less willing to update user-space packages.
> > [...]
>
> Sorry, er, what?  What distributions eagerly upgrade kernels in stable
> periods, were it not primarily motivated by security fixes? [...]

Please check the popular distro called 'Fedora' for example, and its kernel upgrade policies.

> [...] What users eagerly replace their kernels?

Those 99% who click on the 'install 193 updates' popup.

	Ingo
* Re: [RFC] Unify KVM kernel-space and user-space code into a single project
From: Avi Kivity @ 2010-03-18 13:10 UTC
To: Ingo Molnar
Cc: Frank Ch. Eigler, Alexander Graf, Anthony Liguori, Zhang, Yanmin, Peter Zijlstra, Sheng Yang, linux-kernel, kvm, Marcelo Tosatti, Joerg Roedel, Jes Sorensen, Gleb Natapov, Zachary Amsden, ziteng.huang, Arnaldo Carvalho de Melo, Frédéric Weisbecker

On 03/18/2010 03:02 PM, Ingo Molnar wrote:
>
> > [...] What users eagerly replace their kernels?
>
> Those 99% who click on the 'install 193 updates' popup.

Of which 1 is the kernel, and 192 are userspace updates (of which one may be qemu).

-- 
error compiling committee.c: too many arguments to function
* Re: [RFC] Unify KVM kernel-space and user-space code into a single project
From: Ingo Molnar @ 2010-03-18 13:31 UTC
To: Avi Kivity
Cc: Frank Ch. Eigler, Alexander Graf, Anthony Liguori, Zhang, Yanmin, Peter Zijlstra, Sheng Yang, linux-kernel, kvm, Marcelo Tosatti, Joerg Roedel, Jes Sorensen, Gleb Natapov, Zachary Amsden, ziteng.huang, Arnaldo Carvalho de Melo, Frédéric Weisbecker

* Avi Kivity <avi@redhat.com> wrote:

> On 03/18/2010 03:02 PM, Ingo Molnar wrote:
> >
> > > [...] What users eagerly replace their kernels?
> >
> > Those 99% who click on the 'install 193 updates' popup.
>
> Of which 1 is the kernel, and 192 are userspace updates (of which one may be
> qemu).

I think you didnt understand my (tersely explained) point - which is probably my fault. What i said is:

 - distros update the kernel first. Often in stable releases as well if
   there's a new kernel released. (They must because it provides new hardware
   enablement and other critical changes they generally cannot skip.)

 - Qemu on the other hand is not upgraded with (nearly) that level of urgency.
   Completely new versions will generally have to wait for the next distro
   release.

With in-kernel tools the kernel and the tooling that accompanies the kernel are upgraded in the same low-latency pathway. That is a big plus if you are offering things like instrumentation (which perf does), which relates closely to the kernel.

Furthermore, many distros package up the latest -git kernel as well. They almost never do that with user-space packages.

Let me give you a specific example:

I'm running Fedora Rawhide with 2.6.34-rc1 right now on my main desktop, and that comes with perf-2.6.34-0.10.rc1.git0.fc14.noarch.

My rawhide box has qemu-kvm-0.12.3-3.fc14.x86_64 installed. That's more than a 1000 Qemu commits older than the latest Qemu development branch.

So by being part of the kernel repo there's lower latency upgrades and earlier and better testing available on most distros.

You made it very clear that you dont want that, but please dont try to claim that those advantages do not exist - they are very much real and we are making good use of it.

Thanks,

	Ingo
* Re: [RFC] Unify KVM kernel-space and user-space code into a single project
From: Daniel P. Berrange @ 2010-03-18 13:44 UTC
To: Ingo Molnar
Cc: Avi Kivity, Frank Ch. Eigler, Alexander Graf, Anthony Liguori, Zhang, Yanmin, Peter Zijlstra, Sheng Yang, linux-kernel, kvm, Marcelo Tosatti, Joerg Roedel, Jes Sorensen, Gleb Natapov, Zachary Amsden, ziteng.huang, Arnaldo Carvalho de Melo, Frédéric Weisbecker

On Thu, Mar 18, 2010 at 02:31:24PM +0100, Ingo Molnar wrote:
>
> * Avi Kivity <avi@redhat.com> wrote:
>
> > On 03/18/2010 03:02 PM, Ingo Molnar wrote:
> > >
> > > > [...] What users eagerly replace their kernels?
> > >
> > > Those 99% who click on the 'install 193 updates' popup.
> >
> > Of which 1 is the kernel, and 192 are userspace updates (of which one may be
> > qemu).
>
> I think you didnt understand my (tersely explained) point - which is probably
> my fault. What i said is:
>
>  - distros update the kernel first. Often in stable releases as well if
>    there's a new kernel released. (They must because it provides new hardware
>    enablement and other critical changes they generally cannot skip.)
>
>  - Qemu on the other hand is not upgraded with (nearly) that level of urgency.
>    Completely new versions will generally have to wait for the next distro
>    release.

This has nothing to do with them being in separate source repos. We could update QEMU to new major feature releases with the same frequency in a Fedora release, but we deliberately choose not to rebase the QEMU userspace because experience has shown the downside from new bugs / regressions outweighs the benefit of any new features.

The QEMU updates in stable Fedora trees now just follow the minor bugfix release stream provided by QEMU, and those arrive in Fedora with little noticeable delay.

Daniel
-- 
|: Red Hat, Engineering, London   -o-   http://people.redhat.com/berrange/ :|
|: http://libvirt.org  -o-  http://virt-manager.org  -o-  http://deltacloud.org :|
|: http://autobuild.org       -o-        http://search.cpan.org/~danberr/ :|
|: GnuPG: 7D3B9505  -o-  F3C9 553F A1DA 4AC2 5648 23C1 B3DF F742 7D3B 9505 :|
* Re: [RFC] Unify KVM kernel-space and user-space code into a single project
From: Ingo Molnar @ 2010-03-18 13:59 UTC
To: Daniel P. Berrange
Cc: Avi Kivity, Frank Ch. Eigler, Alexander Graf, Anthony Liguori, Zhang, Yanmin, Peter Zijlstra, Sheng Yang, linux-kernel, kvm, Marcelo Tosatti, Joerg Roedel, Jes Sorensen, Gleb Natapov, Zachary Amsden, ziteng.huang, Arnaldo Carvalho de Melo, Frédéric Weisbecker

* Daniel P. Berrange <berrange@redhat.com> wrote:

> On Thu, Mar 18, 2010 at 02:31:24PM +0100, Ingo Molnar wrote:
> >
> > * Avi Kivity <avi@redhat.com> wrote:
> >
> > > On 03/18/2010 03:02 PM, Ingo Molnar wrote:
> > > >
> > > > > [...] What users eagerly replace their kernels?
> > > >
> > > > Those 99% who click on the 'install 193 updates' popup.
> > >
> > > Of which 1 is the kernel, and 192 are userspace updates (of which one may be
> > > qemu).
> >
> > I think you didnt understand my (tersely explained) point - which is probably
> > my fault. What i said is:
> >
> >  - distros update the kernel first. Often in stable releases as well if
> >    there's a new kernel released. (They must because it provides new hardware
> >    enablement and other critical changes they generally cannot skip.)
> >
> >  - Qemu on the other hand is not upgraded with (nearly) that level of urgency.
> >    Completely new versions will generally have to wait for the next distro
> >    release.
>
> This has nothing to do with them being in separate source repos. We could
> update QEMU to new major feature releases with the same frequency in a Fedora
> release, but we deliberately choose not to rebase the QEMU userspace because
> experience has shown the downside from new bugs / regressions outweighs the
> benefit of any new features.
>
> The QEMU updates in stable Fedora trees now just follow the minor bugfix
> release stream provided by QEMU, and those arrive in Fedora with little
> noticeable delay.

That is exactly what i said: Qemu and most user-space packages are on a 'slower' update track than the kernel: generally updated for minor releases.

My further point was that the kernel on the other hand gets updated more frequently and as such, any user-space tool bits hosted in the kernel repo get updated more frequently as well.

Thanks,

	Ingo
* Re: [RFC] Unify KVM kernel-space and user-space code into a single project
From: John Kacur @ 2010-03-18 14:06 UTC
To: Ingo Molnar
Cc: Daniel P. Berrange, Avi Kivity, Frank Ch. Eigler, Alexander Graf, Anthony Liguori, Zhang, Yanmin, Peter Zijlstra, Sheng Yang, linux-kernel, kvm, Marcelo Tosatti, Joerg Roedel, Jes Sorensen, Gleb Natapov, Zachary Amsden, ziteng.huang, Arnaldo Carvalho de Melo, Frédéric Weisbecker

On Thu, Mar 18, 2010 at 2:59 PM, Ingo Molnar <mingo@elte.hu> wrote:
>
> * Daniel P. Berrange <berrange@redhat.com> wrote:
>
>> On Thu, Mar 18, 2010 at 02:31:24PM +0100, Ingo Molnar wrote:
>> >
>> > * Avi Kivity <avi@redhat.com> wrote:
>> >
>> > > On 03/18/2010 03:02 PM, Ingo Molnar wrote:
>> > > >
>> > > >> [...] What users eagerly replace their kernels?
>> > > >
>> > > > Those 99% who click on the 'install 193 updates' popup.
>> > >
>> > > Of which 1 is the kernel, and 192 are userspace updates (of which one may be
>> > > qemu).
>> >
>> > I think you didnt understand my (tersely explained) point - which is probably
>> > my fault. What i said is:
>> >
>> >  - distros update the kernel first. Often in stable releases as well if
>> >    there's a new kernel released. (They must because it provides new hardware
>> >    enablement and other critical changes they generally cannot skip.)
>> >
>> >  - Qemu on the other hand is not upgraded with (nearly) that level of urgency.
>> >    Completely new versions will generally have to wait for the next distro
>> >    release.
>>
>> This has nothing to do with them being in separate source repos. We could
>> update QEMU to new major feature releases with the same frequency in a Fedora
>> release, but we deliberately choose not to rebase the QEMU userspace because
>> experience has shown the downside from new bugs / regressions outweighs the
>> benefit of any new features.
>>
>> The QEMU updates in stable Fedora trees now just follow the minor bugfix
>> release stream provided by QEMU, and those arrive in Fedora with little
>> noticeable delay.
>
> That is exactly what i said: Qemu and most user-space packages are on a
> 'slower' update track than the kernel: generally updated for minor releases.
>
> My further point was that the kernel on the other hand gets updated more
> frequently and as such, any user-space tool bits hosted in the kernel repo get
> updated more frequently as well.
>
> Thanks,
>
>	Ingo

Just to play devil's advocate, let's not mix up the development model with the distribution model. There is nothing to stop packagers and distributors from providing separate kernel "proper" packages and perf tools packages.

It might even make good sense, assuming backwards compatibility, for distros that have conservative policies about new kernel versions to provide newer perf tools packages with older kernels.

John
* Re: [RFC] Unify KVM kernel-space and user-space code into a single project
From: Ingo Molnar @ 2010-03-18 14:11 UTC
To: John Kacur
Cc: Daniel P. Berrange, Avi Kivity, Frank Ch. Eigler, Alexander Graf, Anthony Liguori, Zhang, Yanmin, Peter Zijlstra, Sheng Yang, linux-kernel, kvm, Marcelo Tosatti, Joerg Roedel, Jes Sorensen, Gleb Natapov, Zachary Amsden, ziteng.huang, Arnaldo Carvalho de Melo, Frédéric Weisbecker

* John Kacur <jkacur@redhat.com> wrote:

> On Thu, Mar 18, 2010 at 2:59 PM, Ingo Molnar <mingo@elte.hu> wrote:
> >
> > * Daniel P. Berrange <berrange@redhat.com> wrote:
> >
> >> On Thu, Mar 18, 2010 at 02:31:24PM +0100, Ingo Molnar wrote:
> >> >
> >> > * Avi Kivity <avi@redhat.com> wrote:
> >> >
> >> > > On 03/18/2010 03:02 PM, Ingo Molnar wrote:
> >> > > >
> >> > > >> [...] What users eagerly replace their kernels?
> >> > > >
> >> > > > Those 99% who click on the 'install 193 updates' popup.
> >> > >
> >> > > Of which 1 is the kernel, and 192 are userspace updates (of which one may be
> >> > > qemu).
> >> >
> >> > I think you didnt understand my (tersely explained) point - which is probably
> >> > my fault. What i said is:
> >> >
> >> >  - distros update the kernel first. Often in stable releases as well if
> >> >    there's a new kernel released. (They must because it provides new hardware
> >> >    enablement and other critical changes they generally cannot skip.)
> >> >
> >> >  - Qemu on the other hand is not upgraded with (nearly) that level of urgency.
> >> >    Completely new versions will generally have to wait for the next distro
> >> >    release.
> >>
> >> This has nothing to do with them being in separate source repos. We could
> >> update QEMU to new major feature releases with the same frequency in a Fedora
> >> release, but we deliberately choose not to rebase the QEMU userspace because
> >> experience has shown the downside from new bugs / regressions outweighs the
> >> benefit of any new features.
> >>
> >> The QEMU updates in stable Fedora trees now just follow the minor bugfix
> >> release stream provided by QEMU, and those arrive in Fedora with little
> >> noticeable delay.
> >
> > That is exactly what i said: Qemu and most user-space packages are on a
> > 'slower' update track than the kernel: generally updated for minor releases.
> >
> > My further point was that the kernel on the other hand gets updated more
> > frequently and as such, any user-space tool bits hosted in the kernel repo get
> > updated more frequently as well.
> >
> > Thanks,
> >
> >	Ingo
>
> Just to play devil's advocate, let's not mix up the development model with
> the distribution model. There is nothing to stop packagers and distributors
> from providing separate kernel "proper" packages and perf tools packages.
>
> It might even make good sense, assuming backwards compatibility, for distros
> that have conservative policies about new kernel versions to provide newer
> perf tools packages with older kernels.

Of course. Some distros are also very conservative about updating the kernel at all. I'm mostly talking about the distros that are at the frontier of kernel development: those with fresh packages, those which provide eager bleeding-edge testers and developers.

	Ingo
* Re: [RFC] Unify KVM kernel-space and user-space code into a single project
From: Avi Kivity @ 2010-03-18 13:46 UTC
To: Ingo Molnar
Cc: Frank Ch. Eigler, Alexander Graf, Anthony Liguori, Zhang, Yanmin, Peter Zijlstra, Sheng Yang, linux-kernel, kvm, Marcelo Tosatti, Joerg Roedel, Jes Sorensen, Gleb Natapov, Zachary Amsden, ziteng.huang, Arnaldo Carvalho de Melo, Frédéric Weisbecker

On 03/18/2010 03:31 PM, Ingo Molnar wrote:
> * Avi Kivity <avi@redhat.com> wrote:
>
>> On 03/18/2010 03:02 PM, Ingo Molnar wrote:
>>>
>>>> [...] What users eagerly replace their kernels?
>>>
>>> Those 99% who click on the 'install 193 updates' popup.
>>
>> Of which 1 is the kernel, and 192 are userspace updates (of which one may be
>> qemu).
>
> I think you didnt understand my (tersely explained) point - which is probably
> my fault. What i said is:
>
>  - distros update the kernel first. Often in stable releases as well if
>    there's a new kernel released. (They must because it provides new hardware
>    enablement and other critical changes they generally cannot skip.)

No, they don't. RHEL 5 is still on 2.6.18, for example. Users don't like their kernels updated unless absolutely necessary, with good reason. Kernel updates = reboots.

>  - Qemu on the other hand is not upgraded with (nearly) that level of urgency.
>    Completely new versions will generally have to wait for the next distro
>    release.

F12 recently updated to 2.6.32. This is probably due to 2.6.31.stable dropping away, and no capacity at Fedora to maintain it on their own. So they are caught in a bind - stay on 2.6.31 and expose users to security vulnerabilities or move to 2.6.32 and cause regressions. Not a happy choice.

> With in-kernel tools the kernel and the tooling that accompanies the kernel
> are upgraded in the same low-latency pathway. That is a big plus if you are
> offering things like instrumentation (which perf does), which relates closely
> to the kernel.
>
> Furthermore, many distros package up the latest -git kernel as well. They
> almost never do that with user-space packages.

I'm sure if we ask the Fedora qemu maintainer to package qemu-kvm.git they'll consider it favourably. Isn't that what rawhide is for?

> Let me give you a specific example:
>
> I'm running Fedora Rawhide with 2.6.34-rc1 right now on my main desktop, and
> that comes with perf-2.6.34-0.10.rc1.git0.fc14.noarch.
>
> My rawhide box has qemu-kvm-0.12.3-3.fc14.x86_64 installed. That's more than a
> 1000 Qemu commits older than the latest Qemu development branch.
>
> So by being part of the kernel repo there's lower latency upgrades and earlier
> and better testing available on most distros.
>
> You made it very clear that you dont want that, but please dont try to claim
> that those advantages do not exist - they are very much real and we are making
> good use of it.

I don't mind at all if rawhide users run on the latest and greatest, but release users deserve a little more stability.

-- 
error compiling committee.c: too many arguments to function
* Re: [RFC] Unify KVM kernel-space and user-space code into a single project
From: Ingo Molnar @ 2010-03-18 13:57 UTC
To: Avi Kivity
Cc: Frank Ch. Eigler, Alexander Graf, Anthony Liguori, Zhang, Yanmin, Peter Zijlstra, Sheng Yang, linux-kernel, kvm, Marcelo Tosatti, Joerg Roedel, Jes Sorensen, Gleb Natapov, Zachary Amsden, ziteng.huang, Arnaldo Carvalho de Melo, Frédéric Weisbecker

* Avi Kivity <avi@redhat.com> wrote:

> On 03/18/2010 03:31 PM, Ingo Molnar wrote:
> > * Avi Kivity <avi@redhat.com> wrote:
> >
> > > On 03/18/2010 03:02 PM, Ingo Molnar wrote:
> > > > > [...] What users eagerly replace their kernels?
> > > > Those 99% who click on the 'install 193 updates' popup.
> > >
> > > Of which 1 is the kernel, and 192 are userspace updates (of which one may be
> > > qemu).
> > I think you didnt understand my (tersely explained) point - which is probably
> > my fault. What i said is:
> >
> >  - distros update the kernel first. Often in stable releases as well if
> >    there's a new kernel released. (They must because it provides new hardware
> >    enablement and other critical changes they generally cannot skip.)
>
> No, they don't. [...]

I just replied to Frank Ch. Eigler with a specific example that shows how this happens - and believe me, it happens.

> [...] RHEL 5 is still on 2.6.18, for example. Users
> don't like their kernels updated unless absolutely necessary, with
> good reason.

Nope - RHEL 5 is on a 2.6.18 base for entirely different reasons.

> Kernel updates = reboots.

If you check the update frequency of RHEL 5 kernels you'll see that it's comparable to that of Fedora.

> >  - Qemu on the other hand is not upgraded with (nearly) that level of urgency.
> >    Completely new versions will generally have to wait for the next distro
> >    release.
>
> F12 recently updated to 2.6.32. This is probably due to 2.6.31.stable
> dropping away, and no capacity at Fedora to maintain it on their own. So
> they are caught in a bind - stay on 2.6.31 and expose users to security
> vulnerabilities or move to 2.6.32 and cause regressions. Not a happy
> choice.

Happy choice or not, this is what i said is the distro practice these days. (i dont know all the distros that well so i'm sure there's differences)

> > With in-kernel tools the kernel and the tooling that accompanies the kernel
> > are upgraded in the same low-latency pathway. That is a big plus if you are
> > offering things like instrumentation (which perf does), which relates closely
> > to the kernel.
> >
> > Furthermore, many distros package up the latest -git kernel as well. They
> > almost never do that with user-space packages.
>
> I'm sure if we ask the Fedora qemu maintainer to package qemu-kvm.git
> they'll consider it favourably. Isn't that what rawhide is for?

Rawhide is generally for latest released versions, to ready them for the next distro release - with special exception for the kernel, which has a special position due to being a hardware-enabler and because it has an extremely predictable release schedule of every 90 days (+- 10 days).

Very rarely do distro people jump versions for things like GCC or Xorg or Gnome/KDE, but they've been burned enough times by unexpected delays in those projects to be really loath to do it. Qemu might get an exception - dunno, you could ask.

My point still holds: by hosting KVM user-space bits in the kernel together with the rest of KVM you get version parity - which has clear advantages.

You also might have more luck with a bleeding-edge distro such as Gentoo.

> > Let me give you a specific example:
> >
> > I'm running Fedora Rawhide with 2.6.34-rc1 right now on my main desktop, and
> > that comes with perf-2.6.34-0.10.rc1.git0.fc14.noarch.
> >
> > My rawhide box has qemu-kvm-0.12.3-3.fc14.x86_64 installed. That's more than a
> > 1000 Qemu commits older than the latest Qemu development branch.
> >
> > So by being part of the kernel repo there's lower latency upgrades and earlier
> > and better testing available on most distros.
> >
> > You made it very clear that you dont want that, but please dont try to claim
> > that those advantages do not exist - they are very much real and we are making
> > good use of it.
>
> I don't mind at all if rawhide users run on the latest and greatest, but
> release users deserve a little more stability.

What are you suggesting, that released versions of KVM are not reliable? Of course any tools/ bits are release engineered just as much as the rest of KVM ...

	Ingo
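The version-parity point being argued here (one in-tree tool version per kernel release, versus every separately released tool version potentially paired with every kernel release) can be sketched as a toy count. The version numbers below are hypothetical and are not taken from the thread:

```python
# Toy illustration of the compatibility-surface argument, with made-up
# version numbers. Separately released tooling can end up paired with any
# kernel, so the combinations to support grow multiplicatively (N x M).
# Tools hosted in the kernel repo ship exactly one matching version per
# kernel release, a linear list of N pairs.
kernels = ["2.6.31", "2.6.32", "2.6.33", "2.6.34"]
tool_releases = ["0.11.0", "0.12.3", "0.13.0"]

# Separate repos: an N x M support matrix.
separate = [(k, t) for k in kernels for t in tool_releases]

# In-tree tools: one tool version per kernel.
in_tree = [(k, k) for k in kernels]

print(len(separate), len(in_tree))  # 12 4
```

Which count matters in practice is exactly what the two sides dispute: the matrix view assumes old tool versions must keep working against new kernels, while the reply below notes that a stable guest-facing ABI imposes its own compatibility obligations either way.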
* Re: [RFC] Unify KVM kernel-space and user-space code into a single project
From: Avi Kivity @ 2010-03-18 14:25 UTC
To: Ingo Molnar
Cc: Frank Ch. Eigler, Alexander Graf, Anthony Liguori, Zhang, Yanmin, Peter Zijlstra, Sheng Yang, linux-kernel, kvm, Marcelo Tosatti, Joerg Roedel, Jes Sorensen, Gleb Natapov, Zachary Amsden, ziteng.huang, Arnaldo Carvalho de Melo, Frédéric Weisbecker

On 03/18/2010 03:57 PM, Ingo Molnar wrote:
>
>> [...] RHEL 5 is still on 2.6.18, for example. Users
>> don't like their kernels updated unless absolutely necessary, with
>> good reason.
>
> Nope - RHEL 5 is on a 2.6.18 base for entirely different reasons.

All the reasons have 'stability' in them.

>> Kernel updates = reboots.
>
> If you check the update frequency of RHEL 5 kernels you'll see that it's
> comparable to that of Fedora.

I'm sorry to say that's pretty bad. Users don't want to update their kernels.

>>>  - Qemu on the other hand is not upgraded with (nearly) that level of urgency.
>>>    Completely new versions will generally have to wait for the next distro
>>>    release.
>>
>> F12 recently updated to 2.6.32. This is probably due to 2.6.31.stable
>> dropping away, and no capacity at Fedora to maintain it on their own. So
>> they are caught in a bind - stay on 2.6.31 and expose users to security
>> vulnerabilities or move to 2.6.32 and cause regressions. Not a happy
>> choice.
>
> Happy choice or not, this is what i said is the distro practice these days. (i
> dont know all the distros that well so i'm sure there's differences)

So in addition to all the normal kernel regressions, you want to force tools/kvm/ regressions on users.

>> I don't mind at all if rawhide users run on the latest and greatest, but
>> release users deserve a little more stability.
>
> What are you suggesting, that released versions of KVM are not reliable? Of
> course any tools/ bits are release engineered just as much as the rest of KVM
> ...

No, I am suggesting qemu-kvm.git is not as stable as released versions (and won't get fixes backported). Keep in mind that unlike many userspace applications, qemu exposes an ABI to guests which we must keep compatible.

-- 
error compiling committee.c: too many arguments to function
* Re: [RFC] Unify KVM kernel-space and user-space code into a single project
  2010-03-18 14:25 ` Avi Kivity
@ 2010-03-18 14:36 ` Ingo Molnar
  2010-03-18 14:51   ` Avi Kivity
  0 siblings, 1 reply; 390+ messages in thread
From: Ingo Molnar @ 2010-03-18 14:36 UTC (permalink / raw)
To: Avi Kivity
Cc: Frank Ch. Eigler, Alexander Graf, Anthony Liguori, Zhang, Yanmin,
    Peter Zijlstra, Sheng Yang, linux-kernel, kvm, Marcelo Tosatti,
    Joerg Roedel, Jes Sorensen, Gleb Natapov, Zachary Amsden,
    ziteng.huang, Arnaldo Carvalho de Melo, Frédéric Weisbecker

* Avi Kivity <avi@redhat.com> wrote:

>> Happy choice or not, this is what i said is the distro practice these
>> days. (i dont know all the distros that well so i'm sure there's
>> differences)
>
> So in addition to all the normal kernel regressions, you want to force
> tools/kvm/ regressions on users.

So instead you force a NxN compatibility matrix [all versions of qemu
combined with all versions of the kernel] instead of a linear N versions
matrix with a clear focus on the last version. Brilliant engineering i have
to say ;-)

Also, by your argument the kernel should be split up into a micro-kernel,
with different packages for KVM, scheduler, drivers, upgradeable
separately. That would be a nightmare. (i can detail many facets of that
nightmare if you insist but i'll spare the electrons for now) Fortunately
few kernel developers share your views about this.

>>> I don't mind at all if rawhide users run on the latest and greatest,
>>> but release users deserve a little more stability.
>>
>> What are you suggesting, that released versions of KVM are not reliable?
>> Of course any tools/ bits are release engineered just as much as the
>> rest of KVM ...
>
> No, I am suggesting qemu-kvm.git is not as stable as released versions
> (and won't get fixes backported). Keep in mind that unlike many userspace
> applications, qemu exposes an ABI to guests which we must keep compatible.

I think you still dont understand it: if a tool moves to the kernel repo,
then it is _released stable_ together with the next stable kernel.

I.e. you'd get a stable qemu-2.6.34 in essence, when v2.6.34 is released.
You get minor updates with 2.6.34.1, 2.6.34.2, 2.6.34.3, etc - while
development continues.

I.e. you get _more_ stability, because a matching kernel is released with a
matching Qemu.

Qemu might have a different release schedule. Which, i argue, is not a good
thing for exactly that reason :-) If it moved to tools/kvm/ it would get
the same 90 days release frequency, merge window and stabilization window
treatment as the upstream kernel.

Furthermore, users can also run experimental versions of qemu together with
experimental versions of the kernel, by running something like 2.6.34-rc1
on Rawhide - even if they dont download the latest qemu git and build it.
I.e. clearly _more_ is possible in such a scheme.

	Ingo
* Re: [RFC] Unify KVM kernel-space and user-space code into a single project
  2010-03-18 14:36 ` Ingo Molnar
@ 2010-03-18 14:51 ` Avi Kivity
  0 siblings, 0 replies; 390+ messages in thread
From: Avi Kivity @ 2010-03-18 14:51 UTC (permalink / raw)
To: Ingo Molnar
Cc: Frank Ch. Eigler, Alexander Graf, Anthony Liguori, Zhang, Yanmin,
    Peter Zijlstra, Sheng Yang, linux-kernel, kvm, Marcelo Tosatti,
    Joerg Roedel, Jes Sorensen, Gleb Natapov, Zachary Amsden,
    ziteng.huang, Arnaldo Carvalho de Melo, Frédéric Weisbecker

On 03/18/2010 04:36 PM, Ingo Molnar wrote:
> * Avi Kivity <avi@redhat.com> wrote:
>
>>> Happy choice or not, this is what i said is the distro practice these
>>> days. (i dont know all the distros that well so i'm sure there's
>>> differences)
>>
>> So in addition to all the normal kernel regressions, you want to force
>> tools/kvm/ regressions on users.
>
> So instead you force a NxN compatibility matrix [all versions of qemu
> combined with all versions of the kernel] instead of a linear N versions
> matrix with a clear focus on the last version. Brilliant engineering i
> have to say ;-)

Thanks. In fact we have a QxKxGxT compatibility matrix, since we need to
keep compatibility with guests and with tools. Since the easiest interface
to keep compatible is the qemu/kernel interface, allowing the kernel and
qemu to change independently reduces the compatibility matrix while still
providing some improvements.

Regardless of that I'd keep binary compatibility anyway. Not everyone is on
the update treadmill with everything updating every three months, and those
people appreciate stability. I intend to keep providing it.

> Also, by your argument the kernel should be split up into a micro-kernel,
> with different packages for KVM, scheduler, drivers, upgradeable
> separately.

Some kernels do provide some of that facility (without being microkernels),
for example the Windows and RHEL kernels. So it seems people want it.

> That would be a nightmare. (i can detail many facets of that nightmare if
> you insist but i'll spare the electrons for now) Fortunately few kernel
> developers share your views about this.

I'm not sure you know my views about this.

>>>> I don't mind at all if rawhide users run on the latest and greatest,
>>>> but release users deserve a little more stability.
>>>
>>> What are you suggesting, that released versions of KVM are not
>>> reliable? Of course any tools/ bits are release engineered just as much
>>> as the rest of KVM ...
>>
>> No, I am suggesting qemu-kvm.git is not as stable as released versions
>> (and won't get fixes backported). Keep in mind that unlike many
>> userspace applications, qemu exposes an ABI to guests which we must keep
>> compatible.
>
> I think you still dont understand it: if a tool moves to the kernel repo,
> then it is _released stable_ together with the next stable kernel.

I was confused by the talk about 2.6.34-rc1, which isn't stable.

--
error compiling committee.c: too many arguments to function
* Re: [RFC] Unify KVM kernel-space and user-space code into a single project
  2010-03-18 13:02 ` Ingo Molnar
  2010-03-18 13:10   ` Avi Kivity
@ 2010-03-18 13:24 ` Frank Ch. Eigler
  2010-03-18 13:48   ` Ingo Molnar
  1 sibling, 1 reply; 390+ messages in thread
From: Frank Ch. Eigler @ 2010-03-18 13:24 UTC (permalink / raw)
To: Ingo Molnar
Cc: Avi Kivity, Alexander Graf, Anthony Liguori, Zhang, Yanmin,
    Peter Zijlstra, Sheng Yang, linux-kernel, kvm, Marcelo Tosatti,
    Joerg Roedel, Jes Sorensen, Gleb Natapov, Zachary Amsden,
    ziteng.huang, Arnaldo Carvalho de Melo, Frédéric Weisbecker

Hi -

>>> [...] Distributions are very eager to update kernels even in stable
>>> periods of the distro lifetime - they are much less willing to update
>>> user-space packages. [...]
>>
>> Sorry, er, what? What distributions eagerly upgrade kernels in stable
>> periods, were it not primarily motivated by security fixes? [...]
>
> Please check the popular distro called 'Fedora' for example

I do believe I've heard of it. According to fedora bodhi, there have been
18 kernel updates issued for fedora 11 since its release, of which 12 were
purely security updates, and most of the other six also contain security
fixes. None are described as 'enhancement' updates. Oh, what about fedora
12? 8 updates total, of which 5 are security only, one for drm
showstoppers, others including security fixes, again 0 tagged as
'enhancement'.

So where is that "eagerness" again?? My sense is that most users are happy
to leave a stable kernel running as long as possible, and distributions
know this. You surely must understand that the lkml demographics are
different.

> and its kernel upgrade policies.

[citation needed]

>> [...] What users eagerly replace their kernels?
>
> Those 99% who click on the 'install 193 updates' popup.

That's not "eager". That's "I'm exasperated from guessing what's really
important; let's not have so many updates; meh".

- FChE
* Re: [RFC] Unify KVM kernel-space and user-space code into a single project
  2010-03-18 13:24 ` Frank Ch. Eigler
@ 2010-03-18 13:48 ` Ingo Molnar
  0 siblings, 0 replies; 390+ messages in thread
From: Ingo Molnar @ 2010-03-18 13:48 UTC (permalink / raw)
To: Frank Ch. Eigler
Cc: Avi Kivity, Alexander Graf, Anthony Liguori, Zhang, Yanmin,
    Peter Zijlstra, Sheng Yang, linux-kernel, kvm, Marcelo Tosatti,
    Joerg Roedel, Jes Sorensen, Gleb Natapov, Zachary Amsden,
    ziteng.huang, Arnaldo Carvalho de Melo, Frédéric Weisbecker

* Frank Ch. Eigler <fche@redhat.com> wrote:

> Hi -
>
>>>> [...] Distributions are very eager to update kernels even in stable
>>>> periods of the distro lifetime - they are much less willing to update
>>>> user-space packages. [...]
>>>
>>> Sorry, er, what? What distributions eagerly upgrade kernels in stable
>>> periods, were it not primarily motivated by security fixes? [...]
>>
>> Please check the popular distro called 'Fedora' for example
>
> I do believe I've heard of it. According to fedora bodhi, there have been
> 18 kernel updates issued for fedora 11 since its release, of which 12
> were purely security updates, and most of the other six also contain
> security fixes. None are described as 'enhancement' updates. Oh, what
> about fedora 12? 8 updates total, of which 5 are security only, one for
> drm showstoppers, others including security fixes, again 0 tagged as
> 'enhancement'.
>
> So where is that "eagerness" again?? My sense is that most users are
> happy to leave a stable kernel running as long as possible, and
> distributions know this. You surely must understand that the lkml
> demographics are different.
>
>> and its kernel upgrade policies.
>
> [citation needed]

You are quite wrong, despite the sarcastic tone you are attempting to use,
and this is distro kernel policy 101.

For distros such as Fedora it's simpler to support the same kernel version
across many older versions of the distro than having to support different
kernel versions. Check Fedora 12 for example. Four months ago it was
released with kernel v2.6.31:

  http://download.fedora.redhat.com/pub/fedora/linux/releases/12/Fedora/x86_64/os/Packages/kernel-2.6.31.5-127.fc12.x86_64.rpm

But if you update a Fedora 12 installation today you'll get kernel v2.6.32:

  http://download.fedora.redhat.com/pub/fedora/linux/updates/12/SRPMS/kernel-2.6.32.9-70.fc12.src.rpm

As a result you'll get a new 2.6.32 kernel on Fedora 12. The end result is
what i said in the previous mail: you'll get a newer kernel even on a
stable distro - while user-space packages will only be updated if there's a
security issue (and even then there's no version jump like for the kernel).

>>> [...] What users eagerly replace their kernels?
>>
>> Those 99% who click on the 'install 193 updates' popup.
>
> That's not "eager". That's "I'm exasperated from guessing what's really
> important; let's not have so many updates; meh".

Erm, fact is, 99% [WAG] of the users click on the update button and accept
whatever kernel version the distro update offers them.

	Ingo
* Re: [RFC] Unify KVM kernel-space and user-space code into a single project
  2010-03-18  8:56 ` Ingo Molnar
  2010-03-18  9:24   ` Alexander Graf
@ 2010-03-18 10:12 ` Avi Kivity
  2010-03-18 10:28   ` Ingo Molnar
  2010-03-18 10:50   ` Ingo Molnar
  1 sibling, 2 replies; 390+ messages in thread
From: Avi Kivity @ 2010-03-18 10:12 UTC (permalink / raw)
To: Ingo Molnar
Cc: Anthony Liguori, Zhang, Yanmin, Peter Zijlstra, Sheng Yang,
    linux-kernel, kvm, Marcelo Tosatti, Joerg Roedel, Jes Sorensen,
    Gleb Natapov, Zachary Amsden, ziteng.huang,
    Arnaldo Carvalho de Melo, Frédéric Weisbecker

On 03/18/2010 10:56 AM, Ingo Molnar wrote:
> * Avi Kivity <avi@redhat.com> wrote:
>
>> On 03/17/2010 10:10 AM, Ingo Molnar wrote:
>>
>>>> It's about who owns the user interface.
>>>>
>>>> If qemu owns the user interface, then we can satisfy this in a very
>>>> simple way by adding a perf monitor command. If we have to support
>>>> third party tools, then it significantly complicates things.
>>>
>>> Of course illogical modularization complicates things 'significantly'.
>>
>> Who should own the user interface then?
>
> If qemu was in tools/kvm/ then we wouldnt have such issues. A single
> patch (or series of patches) could modify tools/kvm/, arch/x86/kvm/,
> virt/ and tools/perf/.

We would have exactly the same issues, only they would be in a single
repository. The only difference is that we could ignore potential
alternatives to qemu, libvirt, and RHEV-M. But that's not how kernel ABIs
are developed; we try to make them general, not suited to just one consumer
that happens to be close to our heart.

> Numerous times did we have patches to kernel/perf_event.c that fixed some
> detail, also accompanied by a tools/perf/ patch fixing another detail.
> Having a single 'culture of contribution' is a powerful way to develop.

In fact kvm started out in a single repo, and it certainly made it easy to
bring it up in baby steps. But we've long outgrown that. Maybe the
difference is that perf is still new and thus needs tight cooperation.
If/when perf gains a real GUI, I doubt more than 1% of the patches will
touch both kernel and userspace.

> It turns out kernel developers can be pretty good user-space developers
> as well and user-space developers can be pretty good kernel developers as
> well. Some like to do both - as long as it's all within a single project.

Very childish of them. If someone wants to contribute to a userspace
project, they can swallow their pride and send patches to a non-kernel
mailing list and repository.

> The moment any change (be it as trivial as fixing a GUI detail or as
> complex as a new feature) involves two or more packages, development
> speed slows down to a crawl - while the complexity of the change might be
> very low!

Why is that? If the maintainers of all packages are cooperative and
responsive, then the patches will get accepted quickly. If they aren't,
development will be slow. It isn't any different from contributing to two
unrelated kernel subsystems (which are in fact in different repositories
until the next merge window).

> Also, there's the harmful process that people start categorizing
> themselves into 'I am a kernel developer' and 'I am a user space
> programmer' stereotypes, which limits the scope of contributions
> artificially.

You're encouraging this with your proposal. You're basically using the
glory of kernel development to attract people to userspace.

>>> Fast forward to 2010. The kernel side of KVM is maximum goodness - by
>>> far the worst-quality remaining aspects of KVM are precisely in areas
>>> that you mention: 'if we have to support third party tools, then it
>>> significantly complicates things'. You kept Qemu as an external 'third
>>> party' entity to KVM, and KVM is clearly hurting from that - just see
>>> the recent KVM usability thread for examples about suckage.
>>
>> Any qemu usability problems are because developers (or their employers)
>> are not interested in fixing them, not because of the repository
>> location. Most kvm developer interest is in server-side deployment (even
>> for desktop guests), so there is limited effort in implementing a
>> virtualbox-style GUI.
>
> The same has been said of oprofile as well: 'it somewhat sucks because we
> are too server centric', 'nobody is interested in good usability and
> oprofile is fine for the enterprises'. Ironically, the same has been said
> of Xen usability as well, up to the point KVM came around.
>
> What was the core of the problem was a bad design and a split kernel-side
> user-side tool landscape.

I can accept the bad design (not knowing any of the details), but how can
the kernel/user split affect usability?

> In fact i think saying that 'our developers only care about the server'
> is borderline dishonest, when at the same time you are making it doubly
> sure (by inaction) that it stays so: by leaving an artificial package
> wall between kernel-side KVM and user-side KVM and not integrating the
> two technologies.

The wall is maybe four nanometers high. Please be serious. If someone wants
to work on qemu usability all they have to do is to clone the repository
and start sending patches to qemu-devel@. What's gained by putting it in
the kernel repository? You're saving a minute's worth of clone, and that
only for people who already happen to be kernel developers.

> You'll never know what heights you could achieve if you leave that wall
> there ...

I truly don't know. What highly usable GUIs were developed in the kernel?

> Furthermore, what should be realized is that bad usability hurts "server
> features" just as much. Most of the day-to-day testing is done on the
> desktop by desktop oriented testers/developers. _Not_ by enterprise
> shops - they tend to see the code years down the line to begin with ...
>
> Yes, a particular feature might be server oriented, but a good portion of
> our testing is on the desktop and everyone is hurting from bad usability
> and this puts limits on contribution efficiency.

I'm not saying that improved usability isn't a good thing, but time spent
on improving the GUI is time not spent on the features that we really want.
Desktop oriented users also rarely test 16 vcpu guests with tons of RAM
exercising 10Gb NICs and a SAN. Instead they care about graphics
performance for 2vcpu/1GB guests.

> As the patch posted in _this very thread_ demonstrates, it is doubly more
> difficult to contribute a joint KVM+Qemu feature, because it's two
> separate code bases, two contribution guidelines, two release schedules.
> While to the user it really is just one and the same thing. It should be
> so for the developer as well.

It's hard to contribute a patch that goes against the architecture of the
system, where kvm deals with cpu virtualization, qemu (or theoretically
another tool) manages a guest, and libvirt (or another tool) manages the
host. You want a list of guests to be provided by qemu or the kernel, and
that simply isn't how the system works.

> Put in another way: KVM's current split design is making it easy to
> contribute server features (because the kernel side is clean and cool),
> but also makes it artificially hard to contribute desktop features:
> because the tooling side (Qemu) is 'just another package', is separated
> by a package and maintenance wall

Most server oriented patches in qemu/kvm have gone into qemu, not kvm
(simply because it sees many more patches overall). It isn't hard to
contribute to 'just another package'; I have 1700 packages installed on my
desktop and only one of them is a kernel. Anyway your arguments apply
equally well to gedit.

> and is made somewhat uncool by a (as some KVM developers have pointed out
> in this thread) quirky codebase.

The qemu codebase is in fact quirky, but cp won't solve it. Only long
patchsets to qemu-devel@.

> (the rest of your points are really a function of this fundamental
> disagreement)

I disagree.

--
error compiling committee.c: too many arguments to function
* Re: [RFC] Unify KVM kernel-space and user-space code into a single project
  2010-03-18 10:12 ` Avi Kivity
@ 2010-03-18 10:28 ` Ingo Molnar
  2010-03-18 10:50 ` Ingo Molnar
  1 sibling, 0 replies; 390+ messages in thread
From: Ingo Molnar @ 2010-03-18 10:28 UTC (permalink / raw)
To: Avi Kivity
Cc: Anthony Liguori, Zhang, Yanmin, Peter Zijlstra, Sheng Yang,
    linux-kernel, kvm, Marcelo Tosatti, Joerg Roedel, Jes Sorensen,
    Gleb Natapov, Zachary Amsden, ziteng.huang,
    Arnaldo Carvalho de Melo, Frédéric Weisbecker

* Avi Kivity <avi@redhat.com> wrote:

> On 03/18/2010 10:56 AM, Ingo Molnar wrote:
>> * Avi Kivity <avi@redhat.com> wrote:
>>
>>> On 03/17/2010 10:10 AM, Ingo Molnar wrote:
>>>>> It's about who owns the user interface.
>>>>>
>>>>> If qemu owns the user interface, then we can satisfy this in a very
>>>>> simple way by adding a perf monitor command. If we have to support
>>>>> third party tools, then it significantly complicates things.
>>>> Of course illogical modularization complicates things 'significantly'.
>>> Who should own the user interface then?
>> If qemu was in tools/kvm/ then we wouldnt have such issues. A single
>> patch (or series of patches) could modify tools/kvm/, arch/x86/kvm/,
>> virt/ and tools/perf/.
>
> We would have exactly the same issues, only they would be in a single
> repository. The only difference is that we could ignore potential
> alternatives to qemu, libvirt, and RHEV-M. But that's not how kernel ABIs
> are developed, we try to make them general, not suited to just one
> consumer that happens to be close to our heart.

Not at all - as i replied to in a previous mail, tools/perf/ still has a
clear userspace interface and ABI, and external projects are making use of
it. So there's no problem with the ABI at all.

In fact our experience has been the opposite: the perf ABI is markedly
better _because_ there's an immediate consumer of it in the form of
tools/perf/. It gets tested better, and external projects can get their ABI
tweaks in as well and can provide a reference implementation for
tools/perf. This has happened a couple of times. It's a win-win scenario.

So the exact opposite of what you suggest is happening in practice.

Thanks,

	Ingo
* Re: [RFC] Unify KVM kernel-space and user-space code into a single project
  2010-03-18 10:12 ` Avi Kivity
  2010-03-18 10:28   ` Ingo Molnar
@ 2010-03-18 10:50 ` Ingo Molnar
  2010-03-18 11:30   ` Avi Kivity
  2010-03-18 21:02   ` Zachary Amsden
  1 sibling, 2 replies; 390+ messages in thread
From: Ingo Molnar @ 2010-03-18 10:50 UTC (permalink / raw)
To: Avi Kivity
Cc: Anthony Liguori, Zhang, Yanmin, Peter Zijlstra, Sheng Yang,
    linux-kernel, kvm, Marcelo Tosatti, Joerg Roedel, Jes Sorensen,
    Gleb Natapov, Zachary Amsden, ziteng.huang,
    Arnaldo Carvalho de Melo, Frédéric Weisbecker

* Avi Kivity <avi@redhat.com> wrote:

>> The moment any change (be it as trivial as fixing a GUI detail or as
>> complex as a new feature) involves two or more packages, development
>> speed slows down to a crawl - while the complexity of the change might
>> be very low!
>
> Why is that?

It's very simple: because the contribution latencies and overhead compound,
almost inevitably.

If you ever tried to implement a combo GCC+glibc+kernel feature you'll know
...

Even with the best-run projects in existence it takes forever and is very
painful - and here i talk about first hand experience over many years.

> If the maintainers of all packages are cooperative and responsive, then
> the patches will get accepted quickly. If they aren't, development will
> be slow. [...]

I'm afraid practice is different from the rosy ideal you paint there. Even
with assumed 'perfect projects' there's always random differences between
projects, causing doubled (tripled) overhead and compounded up overhead:

 - random differences in release schedules

 - random differences in contribution guidelines

 - random differences in coding style

> [...] It isn't any different from contributing to two unrelated kernel
> subsystems (which are in fact in different repositories until the next
> merge window).

You mention a perfect example: contributing to multiple kernel subsystems.
Even _that_ is very noticeably harder than contributing to a single
subsystem - due to the inevitable bureaucratic overhead, due to different
development trees, due to different merge criteria.

So you are underlining my point (perhaps without intending to): treating
closely related bits of technology as a single project is much better.

Obviously arch/x86/kvm/, virt/ and tools/kvm/ should live in a single
development repository (perhaps micro-differentiated by a few topical
branches), for exactly those reasons you mention. Just like tools/perf/ and
kernel/perf_event.c and arch/*/kernel/perf*.c are treated as a single
project.

[ Note: we actually started from a 'split' design [almost everyone picks
  that, because of this false 'kernel space bits must be separate from user
  space bits' myth] where the user-space component was a separate code
  base, and unified it later on as the project progressed. Trust me, the
  practical benefits of the unified approach are enormous to developers and
  to users alike, and there was no looking back once we made the switch. ]

Also, i dont really try to 'convince' you here - you made your position
very clear early on and despite many unopposed technical arguments i made,
the positions seem to have hardened and i expect it wont change, no matter
what arguments i bring. It's a pity but hey, i'm just an observer here
really - it's the rest of _your_ life this all impacts.

I just wanted to point out the root cause of KVM's usability problems as i
see it - just like i was pointing out the mortal Xen design deficiencies
back when i was backing KVM strongly, four years ago. Back then everyone
was saying that i'm crazy and we are stuck with Xen forever, and while KVM
is nice it has no chance.

Just because you got the kernel bits of KVM right a few years ago does not
mean you cannot mess up other design aspects, and sometimes badly so ;-)
Historically i messed up more than half of all first-gut-feeling technical
design decisions i did, so i had to correct the course many, many times.

I hope you are still keeping an open mind about it all and dont think that
because the project was split for 4 years (to no fault of your own, simply
out of necessity) it should be split forever ... arch/x86 was split for a
much longer period than that.

Circumstances have changed. Most Qemu users/contributions are now coming
from the KVM angle, so please simply start thinking about the next level of
evolution.

Thanks,

	Ingo
* Re: [RFC] Unify KVM kernel-space and user-space code into a single project
  2010-03-18 10:50 ` Ingo Molnar
@ 2010-03-18 11:30 ` Avi Kivity
  2010-03-18 11:48   ` Ingo Molnar
  0 siblings, 1 reply; 390+ messages in thread
From: Avi Kivity @ 2010-03-18 11:30 UTC (permalink / raw)
To: Ingo Molnar
Cc: Anthony Liguori, Zhang, Yanmin, Peter Zijlstra, Sheng Yang,
    linux-kernel, kvm, Marcelo Tosatti, Joerg Roedel, Jes Sorensen,
    Gleb Natapov, Zachary Amsden, ziteng.huang,
    Arnaldo Carvalho de Melo, Frédéric Weisbecker

On 03/18/2010 12:50 PM, Ingo Molnar wrote:
> * Avi Kivity <avi@redhat.com> wrote:
>
>>> The moment any change (be it as trivial as fixing a GUI detail or as
>>> complex as a new feature) involves two or more packages, development
>>> speed slows down to a crawl - while the complexity of the change might
>>> be very low!
>> Why is that?
> It's very simple: because the contribution latencies and overhead
> compound, almost inevitably.

It's not inevitable. If the projects are badly run, you'll have high
latencies, but projects don't have to be badly run.

> If you ever tried to implement a combo GCC+glibc+kernel feature you'll
> know ...
>
> Even with the best-run projects in existence it takes forever and is very
> painful - and here i talk about first hand experience over many years.

Try sending a patch to qemu-devel@, you may be pleasantly surprised.

>> If the maintainers of all packages are cooperative and responsive, then
>> the patches will get accepted quickly. If they aren't, development will
>> be slow. [...]
> I'm afraid practice is different from the rosy ideal you paint there.
> Even with assumed 'perfect projects' there's always random differences
> between projects, causing doubled (tripled) overhead and compounded up
> overhead:
>
>  - random differences in release schedules
>
>  - random differences in contribution guidelines
>
>  - random differences in coding style

None of these matter for steady contributors.

>> [...] It isn't any different from contributing to two unrelated kernel
>> subsystems (which are in fact in different repositories until the next
>> merge window).
> You mention a perfect example: contributing to multiple kernel
> subsystems. Even _that_ is very noticeably harder than contributing to a
> single subsystem - due to the inevitable bureaucratic overhead, due to
> different development trees, due to different merge criteria.
>
> So you are underlining my point (perhaps without intending to): treating
> closely related bits of technology as a single project is much better.
>
> Obviously arch/x86/kvm/, virt/ and tools/kvm/ should live in a single
> development repository (perhaps micro-differentiated by a few topical
> branches), for exactly those reasons you mention.

How is a patch for the qemu GUI eject button and the kvm shadow mmu
related? Should a single maintainer deal with both?

--
error compiling committee.c: too many arguments to function
* Re: [RFC] Unify KVM kernel-space and user-space code into a single project
  2010-03-18 11:30 ` Avi Kivity
@ 2010-03-18 11:48 ` Ingo Molnar
  2010-03-18 12:22   ` Avi Kivity
  2010-03-18 14:53   ` Anthony Liguori
  0 siblings, 2 replies; 390+ messages in thread
From: Ingo Molnar @ 2010-03-18 11:48 UTC (permalink / raw)
To: Avi Kivity
Cc: Anthony Liguori, Zhang, Yanmin, Peter Zijlstra, Sheng Yang,
    linux-kernel, kvm, Marcelo Tosatti, Joerg Roedel, Jes Sorensen,
    Gleb Natapov, Zachary Amsden, ziteng.huang,
    Arnaldo Carvalho de Melo, Frédéric Weisbecker

* Avi Kivity <avi@redhat.com> wrote:

> On 03/18/2010 12:50 PM, Ingo Molnar wrote:
>> * Avi Kivity <avi@redhat.com> wrote:
>>
>>>> The moment any change (be it as trivial as fixing a GUI detail or as
>>>> complex as a new feature) involves two or more packages, development
>>>> speed slows down to a crawl - while the complexity of the change might
>>>> be very low!
>>> Why is that?
>> It's very simple: because the contribution latencies and overhead
>> compound, almost inevitably.
>
> It's not inevitable. If the projects are badly run, you'll have high
> latencies, but projects don't have to be badly run.

So the 64K dollar question is, why does Qemu still suck?

>> If you ever tried to implement a combo GCC+glibc+kernel feature you'll
>> know ...
>>
>> Even with the best-run projects in existence it takes forever and is
>> very painful - and here i talk about first hand experience over many
>> years.
>
> Try sending a patch to qemu-devel@, you may be pleasantly surprised.
>
>>> If the maintainers of all packages are cooperative and responsive, then
>>> the patches will get accepted quickly. If they aren't, development will
>>> be slow. [...]
>> I'm afraid practice is different from the rosy ideal you paint there.
>> Even with assumed 'perfect projects' there's always random differences
>> between projects, causing doubled (tripled) overhead and compounded up
>> overhead:
>>
>>  - random differences in release schedules
>>
>>  - random differences in contribution guidelines
>>
>>  - random differences in coding style
>
> None of these matter for steady contributors.
>
>>> [...] It isn't any different from contributing to two unrelated kernel
>>> subsystems (which are in fact in different repositories until the next
>>> merge window).
>> You mention a perfect example: contributing to multiple kernel
>> subsystems. Even _that_ is very noticeably harder than contributing to a
>> single subsystem - due to the inevitable bureaucratic overhead, due to
>> different development trees, due to different merge criteria.
>>
>> So you are underlining my point (perhaps without intending to): treating
>> closely related bits of technology as a single project is much better.
>>
>> Obviously arch/x86/kvm/, virt/ and tools/kvm/ should live in a single
>> development repository (perhaps micro-differentiated by a few topical
>> branches), for exactly those reasons you mention.
>
> How is a patch for the qemu GUI eject button and the kvm shadow mmu
> related? Should a single maintainer deal with both?

We have co-maintainers for perf that have a different focus. It works
pretty well. Look at git log tools/perf/ and how user-space and
kernel-space components interact in practice. You'll see patches that only
impact one side, but you'll see very big overlap both in contributor
identity and in patches as well.

Also, let me put similar questions in a bit different way:

 - how is an in-kernel PIT emulation connected to Qemu's PIT emulation?

 - how is the in-kernel dynticks implementation related to Qemu's
   implementation of hardware timers?

 - how is an in-kernel event for a CD-ROM eject connected to an in-Qemu
   eject event?

 - how is a new hardware virtualization feature related to being able to
   configure and use it via Qemu?

 - how is the in-kernel x86 decoder/emulator related to the Qemu x86
   emulator?

 - how is the performance of the qemu GUI related to the way VGA buffers
   are mapped and accelerated by KVM?

They are obviously deeply related.

The quality of a development process is not defined by the easy cases where
no project unification is needed. The quality of a development process is
defined by the _difficult_ cases.

	Ingo
* Re: [RFC] Unify KVM kernel-space and user-space code into a single project 2010-03-18 11:48 ` Ingo Molnar @ 2010-03-18 12:22 ` Avi Kivity 2010-03-18 13:00 ` Ingo Molnar 2010-03-18 14:53 ` Anthony Liguori 1 sibling, 1 reply; 390+ messages in thread From: Avi Kivity @ 2010-03-18 12:22 UTC (permalink / raw) To: Ingo Molnar Cc: Anthony Liguori, Zhang, Yanmin, Peter Zijlstra, Sheng Yang, linux-kernel, kvm, Marcelo Tosatti, Joerg Roedel, Jes Sorensen, Gleb Natapov, Zachary Amsden, ziteng.huang, Arnaldo Carvalho de Melo, Frédéric Weisbecker On 03/18/2010 01:48 PM, Ingo Molnar wrote: > >> It's not inevitable. If the projects are badly run, you'll have high >> latencies, but projects don't have to be badly run. >> > So the 64K dollar question is, why does Qemu still suck? > Where people sent patches, it doesn't suck (or sucks less). Where they don't, it still sucks. And it cost way more than $64K. If moving things to tools/ helps, let's move Fedora to tools/. >> How is a patch for the qemu GUI eject button and the kvm shadow mmu related? >> Should a single maintainer deal with both? >> > We have co-maintainers for perf that have a different focus. It works pretty > well. > And it works well when I have patches that change x86 core and kvm. But that's no longer a single repository and we have to coordinate. > Look at git log tools/perf/ and how user-space and kernel-space components > interact in practice. You'll see patches that only impact one side, but you'll see > very big overlap both in contributor identity and in patches as well. > > Also, let me put similar questions in a bit different way: > > - ' how is an in-kernel PIT emulation connected to Qemu's PIT emulation? ' > Both implement the same spec. One is a code derivative of the other (via Xen). > - ' how is the in-kernel dynticks implementation related to Qemu's > implementation of hardware timers? ' > The quality of host kernel timers directly determines the quality of qemu's timer emulation. 
> - ' how is an in-kernel event for a CD-ROM eject connected to an in-Qemu > eject event? ' > Both implement the same spec. The kernel of course needs to handle all implementation variants, while qemu only needs to implement it once. > - ' how is a new hardware virtualization feature related to being able to > configure and use it via Qemu? ' > Most features (example: npt) are transparent to userspace, some are not. When they are not, we introduce an ioctl() to kvm for controlling the feature, and a command-line switch to qemu for calling it. > - ' how is the in-kernel x86 decoder/emulator related to the Qemu x86 > emulator? ' > Both implement the same spec. Note qemu is not an emulator but a binary translator. > - ' how is the performance of the qemu GUI related to the way VGA buffers are > mapped and accelerated by KVM? ' > kvm needs to support direct mapping when possible and efficient data transfer when not. The latter will obviously be much slower. When direct mapping is possible, kvm needs to track pages touched by the guest to avoid full screen redraws. The rest (interfacing to X or vnc, implementing emulated hardware acceleration, full-screen mode, etc.) are unrelated. > They are obviously deeply related. Not at all. kvm in fact knows nothing about vga, to take your last example. To suggest that qemu needs to be close to the kernel to benefit from the kernel's timer implementation means we don't care about providing quality timing except to ourselves, which luckily isn't the case. Some time ago the various desktops needed directory change notification, and people implemented inotify (or whatever it's called today). No one suggested tools/gnome/ and tools/kde/. > The quality of a development process is not > defined by the easy cases where no project unification is needed. The quality > of a development process is defined by the _difficult_ cases. > That's true, but we don't have issues at the qemu/kvm boundary. 
Note we do have issues at the qemu/aio interfaces and qemu/net interfaces (out of which vhost-net was born) but these wouldn't be solved by tools/qemu/. -- error compiling committee.c: too many arguments to function
* Re: [RFC] Unify KVM kernel-space and user-space code into a single project 2010-03-18 12:22 ` Avi Kivity @ 2010-03-18 13:00 ` Ingo Molnar 2010-03-18 13:36 ` Avi Kivity 2010-03-18 14:59 ` Anthony Liguori 0 siblings, 2 replies; 390+ messages in thread From: Ingo Molnar @ 2010-03-18 13:00 UTC (permalink / raw) To: Avi Kivity Cc: Anthony Liguori, Zhang, Yanmin, Peter Zijlstra, Sheng Yang, linux-kernel, kvm, Marcelo Tosatti, Joerg Roedel, Jes Sorensen, Gleb Natapov, Zachary Amsden, ziteng.huang, Arnaldo Carvalho de Melo, Frédéric Weisbecker * Avi Kivity <avi@redhat.com> wrote: > On 03/18/2010 01:48 PM, Ingo Molnar wrote: > > > > It's not inevitable. If the projects are badly run, you'll have high > > > latencies, but projects don't have to be badly run. > > > > So the 64K dollar question is, why does Qemu still suck? > > Where people sent patches, it doesn't suck (or sucks less). Where they > don't, it still sucks. [...] So is your point that the development process and basic code structure does not matter at all, it's just a matter of people sending patches? I beg to differ ... > [...] And it cost way more than $64K. > > If moving things to tools/ helps, let's move Fedora to tools/. Those bits of Fedora which deeply relate to the kernel - yes. Those bits that are arguably separate - nope. > >> How is a patch for the qemu GUI eject button and the kvm shadow mmu > >> related? Should a single maintainer deal with both? > > > > We have co-maintainers for perf that have a different focus. It works > > pretty well. > > And it works well when I have patches that change x86 core and kvm. But > that's no longer a single repository and we have to coordinate. Actually, it works much better if, contrary to your proposal, it ends up in a single repo. Last i checked both of us really worked on such a project, run by some guy. (Named Linus or so.) > > Look at git log tools/perf/ and how user-space and kernel-space components > > interact in practice. 
You'll see patches that only impact one side, but you'll > > see very big overlap both in contributor identity and in patches as well. > > > > Also, let me put similar questions in a bit different way: > > > > - ' how is an in-kernel PIT emulation connected to Qemu's PIT emulation? ' > > Both implement the same spec. One is a code derivative of the other (via > Xen). > > > - ' how is the in-kernel dynticks implementation related to Qemu's > > implementation of hardware timers? ' > > The quality of host kernel timers directly determines the quality of > qemu's timer emulation. > > > - ' how is an in-kernel event for a CD-ROM eject connected to an in-Qemu > > eject event? ' > > Both implement the same spec. The kernel of course needs to handle > all implementation variants, while qemu only needs to implement it > once. > > > - ' how is a new hardware virtualization feature related to being able to > > configure and use it via Qemu? ' > > Most features (example: npt) are transparent to userspace, some are > not. When they are not, we introduce an ioctl() to kvm for > controlling the feature, and a command-line switch to qemu for > calling it. > > > - ' how is the in-kernel x86 decoder/emulator related to the Qemu x86 > > emulator? ' > > Both implement the same spec. Note qemu is not an emulator but a > binary translator. > > > - ' how is the performance of the qemu GUI related to the way VGA buffers are > > mapped and accelerated by KVM? ' > > kvm needs to support direct mapping when possible and efficient data > transfer when not. The latter will obviously be much slower. When > direct mapping is possible, kvm needs to track pages touched by the > guest to avoid full screen redraws. The rest (interfacing to X or > vnc, implementing emulated hardware acceleration, full-screen mode, > etc.) are unrelated. > > > They are obviously deeply related. > > Not at all. [...] You are obviously arguing for something like UML. Fortunately KVM is not that. Or i hope it isnt. 
> [...] kvm in fact knows nothing about vga, to take your last > example. [...] Look at the VGA dirty bitmap optimization a'ka the KVM_GET_DIRTY_LOG ioctl. See qemu/kvm-all.c's kvm_physical_sync_dirty_bitmap(). It started out as a VGA optimization (also used by live migration) and even today it's mostly used by the VGA drivers - albeit a weak one. I wish there were stronger VGA optimizations implemented, copying the dirty bitmap is not a particularly performant solution. (although it's certainly better than full emulation) Graphics performance is one of the more painful aspects of KVM usability today. > [...] To suggest that qemu needs to be close to the kernel to benefit from > the kernel's timer implementation means we don't care about providing > quality timing except to ourselves, which luckily isn't the case. That is not what i said. I said they are closely related, and where technologies are closely related, project proximity turns into project unification at a certain stage. > Some time ago the various desktops needed directory change > notification, and people implemented inotify (or whatever it's > called today). No one suggested tools/gnome/ and tools/kde/. You are misconstruing and misrepresenting my argument - i'd expect better. Gnome and KDE runs on other kernels as well and is generally not considered close to the kernel. Do you seriously argue that Qemu has nothing to do with KVM these days? > > The quality of a development process is not defined by the easy cases > > where no project unification is needed. The quality of a development > > process is defined by the _difficult_ cases. > > That's true, but we don't have issues at the qemu/kvm boundary. Note we do > have issues at the qemu/aio interfaces and qemu/net interfaces (out of which > vhost-net was born) but these wouldn't be solved by tools/qemu/. That was not what i suggested. They would be solved by what i proposed: tools/kvm/, right? 
Thanks, Ingo
* Re: [RFC] Unify KVM kernel-space and user-space code into a single project 2010-03-18 13:00 ` Ingo Molnar @ 2010-03-18 13:36 ` Avi Kivity 2010-03-18 14:09 ` Ingo Molnar 2010-03-18 14:59 ` Anthony Liguori 1 sibling, 1 reply; 390+ messages in thread From: Avi Kivity @ 2010-03-18 13:36 UTC (permalink / raw) To: Ingo Molnar Cc: Anthony Liguori, Zhang, Yanmin, Peter Zijlstra, Sheng Yang, linux-kernel, kvm, Marcelo Tosatti, Joerg Roedel, Jes Sorensen, Gleb Natapov, Zachary Amsden, ziteng.huang, Arnaldo Carvalho de Melo, Frédéric Weisbecker On 03/18/2010 03:00 PM, Ingo Molnar wrote: > * Avi Kivity<avi@redhat.com> wrote: > > >> On 03/18/2010 01:48 PM, Ingo Molnar wrote: >> >> >>>> It's not inevitable. If the projects are badly run, you'll have high >>>> latencies, but projects don't have to be badly run. >>>> >>> So the 64K dollar question is, why does Qemu still suck? >>> >> Where people sent patches, it doesn't suck (or sucks less). Where they >> don't, it still sucks. [...] >> > So is your point that the development process and basic code structure does > not matter at all, it's just a matter of people sending patches? I beg to > differ ... > The development process of course matters, and we have worked hard to fix qemu's. Basic code structure also matters, but you don't fix that with cp. >> [...] And it cost way more than $64K. >> >> If moving things to tools/ helps, let's move Fedora to tools/. >> > Those bits of Fedora which deeply relate to the kernel - yes. > Those bits that are arguably separate - nope. > A qemu GUI is not deeply related to the kernel. Or at all. >>>> How is a patch for the qemu GUI eject button and the kvm shadow mmu >>>> related? Should a single maintainer deal with both? >>>> >>> We have co-maintainers for perf that have a different focus. It works >>> pretty well. >>> >> And it works well when I have patches that change x86 core and kvm. But >> that's no longer a single repository and we have to coordinate. 
>> > Actually, it works much better if, contrary to your proposal, it ends up in a > single repo. Last i checked both of us really worked on such a project, run by > some guy. (Named Linus or so.) > Well, when last I sent x86 patches, they went to you and hpa, applied to tip, from which I had to merge them back. Two repositories. After several weeks they did end up in a third repository, Linus'. The process isn't trivial or fast, but it works. >>> Look at git log tools/perf/ and how user-space and kernel-space components >>> interact in practice. You'll see patches that only impact one side, but you'll >>> see very big overlap both in contributor identity and in patches as well. >>> >>> Also, let me put similar questions in a bit different way: >>> >>> - ' how is an in-kernel PIT emulation connected to Qemu's PIT emulation? ' >>> >> Both implement the same spec. One is a code derivative of the other (via >> Xen). >> >> >>> - ' how is the in-kernel dynticks implementation related to Qemu's >>> implementation of hardware timers? ' >>> >> The quality of host kernel timers directly determines the quality of >> qemu's timer emulation. >> >> >>> - ' how is an in-kernel event for a CD-ROM eject connected to an in-Qemu >>> eject event? ' >>> >> Both implement the same spec. The kernel of course needs to handle >> all implementation variants, while qemu only needs to implement it >> once. >> >> >>> - ' how is a new hardware virtualization feature related to being able to >>> configure and use it via Qemu? ' >>> >> Most features (example: npt) are transparent to userspace, some are >> not. When they are not, we introduce an ioctl() to kvm for >> controlling the feature, and a command-line switch to qemu for >> calling it. >> >> >>> - ' how is the in-kernel x86 decoder/emulator related to the Qemu x86 >>> emulator? ' >>> >> Both implement the same spec. Note qemu is not an emulator but a >> binary translator. 
>> >> >>> - ' how is the performance of the qemu GUI related to the way VGA buffers are >>> mapped and accelerated by KVM? ' >>> >> kvm needs to support direct mapping when possible and efficient data >> transfer when not. The latter will obviously be much slower. When >> direct mapping is possible, kvm needs to track pages touched by the >> guest to avoid full screen redraws. The rest (interfacing to X or >> vnc, implementing emulated hardware acceleration, full-screen mode, >> etc.) are unrelated. >> >> >>> They are obviously deeply related. >>> >> Not at all. [...] >> > You are obviously arguing for something like UML. Fortunately KVM is not that. > Or i hope it isnt. > I am not arguing for UML and don't understand why you think so. >> [...] kvm in fact knows nothing about vga, to take your last >> example. [...] >> > Look at the VGA dirty bitmap optimization a'ka the KVM_GET_DIRTY_LOG ioctl. > > See qemu/kvm-all.c's kvm_physical_sync_dirty_bitmap(). > > It started out as a VGA optimization (also used by live migration) and even > today it's mostly used by the VGA drivers - albeit a weak one. > > I wish there were stronger VGA optimizations implemented, copying the dirty > bitmap is not a particularly performant solution. The VGA dirty bitmap is 256 bytes in length. Copying it doesn't take any time at all. People are in fact working on a copy-less dirty bitmap solution, for live migration of very large memory guests. Expect set_bit_user() patches for tip.git. > (although it's certainly > better than full emulation) Graphics performance is one of the more painful > aspects of KVM usability today. > If you have suggestions for further optimizations (or even patches) I'd love to hear them. One solution we are working on is QXL, a framebuffer-less graphics card designed for spice. The use case is again server based (hosted desktops) but may be adapted for desktop-on-desktop use. >> [...] 
To suggest that qemu needs to be close to the kernel to benefit from >> the kernel's timer implementation means we don't care about providing >> quality timing except to ourselves, which luckily isn't the case. >> > That is not what i said. I said they are closely related, and where > technologies are closely related, project proximity turns into project > unification at a certain stage. > I really don't see how. So what if both qemu and kvm implement an i8254? They can't share any code since the internal APIs are so different. Even worse for the x86 emulator as qemu and kvm are fundamentally different. Even more with the qemu timers and kernel dyntick code. >> Some time ago the various desktops needed directory change >> notification, and people implemented inotify (or whatever it's >> called today). No one suggested tools/gnome/ and tools/kde/. >> > You are misconstruing and misrepresenting my argument - i'd expect better. > Gnome and KDE runs on other kernels as well and is generally not considered > close to the kernel. > qemu runs on other kernels (including Windows), just without kvm. > Do you seriously argue that Qemu has nothing to do with KVM these days? > The vast majority of qemu has nothing to do with kvm, all the kvm interface bits are in two files. Things like the GUI, the VNC server, IDE emulation, the management interface (the monitor), live migration, qcow2 and ~15 other file format drivers, chipset emulation, USB controller emulation, snapshot support, slirp, serial port emulation, and a zillion other details have nothing to do with kvm. >>> The quality of a development process is not defined by the easy cases >>> where no project unification is needed. The quality of a development >>> process is defined by the _difficult_ cases. >>> >> That's true, but we don't have issues at the qemu/kvm boundary. 
Note we do >> have issues at the qemu/aio interfaces and qemu/net interfaces (out of which >> vhost-net was born) but these wouldn't be solved by tools/qemu/. >> > That was not what i suggested. They would be solved by what i proposed: > tools/kvm/, right? > If they were, it would be worth it. -- error compiling committee.c: too many arguments to function
* Re: [RFC] Unify KVM kernel-space and user-space code into a single project 2010-03-18 13:36 ` Avi Kivity @ 2010-03-18 14:09 ` Ingo Molnar 2010-03-18 14:38 ` Avi Kivity 0 siblings, 1 reply; 390+ messages in thread From: Ingo Molnar @ 2010-03-18 14:09 UTC (permalink / raw) To: Avi Kivity Cc: Anthony Liguori, Zhang, Yanmin, Peter Zijlstra, Sheng Yang, linux-kernel, kvm, Marcelo Tosatti, Joerg Roedel, Jes Sorensen, Gleb Natapov, Zachary Amsden, ziteng.huang, Arnaldo Carvalho de Melo, Frédéric Weisbecker * Avi Kivity <avi@redhat.com> wrote: > > That is not what i said. I said they are closely related, and where > > technologies are closely related, project proximity turns into project > > unification at a certain stage. > > I really don't see how. So what if both qemu and kvm implement an i8254? > They can't share any code since the internal APIs are so different. [...] I wouldnt jump to assumptions there. perf shares some facilities with the kernel on the source code level - they can be built both in the kernel and in user-space. But my main thought wasnt even to actually share the implementation - but to actually synchronize when a piece of device emulation moves into the kernel. It is arguably bad for performance in most cases when Qemu handles a given device - so all the common devices should be kernel accelerated. The version and testing matrix would be simplified significantly as well: as kernel and qemu go hand in hand, they are always on the same version. > [...] Even worse for the x86 emulator as qemu and kvm are fundamentally > different. So is it your argument that the difference and the duplication in x86 instruction emulation is a good thing? You said it some time ago that the kvm x86 emulator was very messy and you wish it was cleaner. 
While qemu's is indeed rather different (it's partly a translator/JIT), i'm sure the decoder logic could be shared - and qemu has a slow-path full-emulation fallback in any case, which is similar to what the in-kernel emulator does (IIRC ...). That might have changed meanwhile. Ingo
* Re: [RFC] Unify KVM kernel-space and user-space code into a single project 2010-03-18 14:09 ` Ingo Molnar @ 2010-03-18 14:38 ` Avi Kivity 2010-03-18 17:16 ` Ingo Molnar 0 siblings, 1 reply; 390+ messages in thread From: Avi Kivity @ 2010-03-18 14:38 UTC (permalink / raw) To: Ingo Molnar Cc: Anthony Liguori, Zhang, Yanmin, Peter Zijlstra, Sheng Yang, linux-kernel, kvm, Marcelo Tosatti, Joerg Roedel, Jes Sorensen, Gleb Natapov, Zachary Amsden, ziteng.huang, Arnaldo Carvalho de Melo, Frédéric Weisbecker On 03/18/2010 04:09 PM, Ingo Molnar wrote: > * Avi Kivity<avi@redhat.com> wrote: > > >>> That is not what i said. I said they are closely related, and where >>> technologies are closely related, project proximity turns into project >>> unification at a certain stage. >>> >> I really don't see how. So what if both qemu and kvm implement an i8254? >> They can't share any code since the internal APIs are so different. [...] >> > I wouldnt jump to assumptions there. perf shares some facilities with the > kernel on the source code level - they can be built both in the kernel and in > user-space. > > But my main thought wasnt even to actually share the implementation - but to > actually synchronize when a piece of device emulation moves into the kernel. > It is arguably bad for performance in most cases when Qemu handles a given > device - so all the common devices should be kernel accelerated. > > The version and testing matrix would be simplified significantly as well: as > kernel and qemu go hand in hand, they are always on the same version. > So, you propose to allow running tools/kvm/ only on the kernel it was shipped with? Otherwise the testing matrix isn't simplified. >> [...] Even worse for the x86 emulator as qemu and kvm are fundamentally >> different. >> > So is it your argument that the difference and the duplication in x86 > instruction emulation is a good thing? Of course it isn't a good thing, but it is unavoidable. 
Qemu compiles code just-in-time to avoid interpretation overhead, while kvm emulates one instruction at a time. No caching is possible, especially with ept/npt, since the guest is free to manipulate memory with no notification to the host. Qemu also supports the full instruction set while kvm only implements what is necessary. Qemu is a multi-source/multi-target translator while kvm's emulator is x86 specific. > You said it some time ago that > the kvm x86 emulator was very messy and you wish it was cleaner. > It's still messy but is being cleaned up. > While qemu's is indeed rather different (it's partly a translator/JIT), i'm > sure the decoder logic could be shared - and qemu has a slow-path > full-emulation fallback in any case, which is similar to what the in-kernel > emulator does (IIRC ...). > > That might have changed meanwhile. > IIUC it only ever translates. -- error compiling committee.c: too many arguments to function
* Re: [RFC] Unify KVM kernel-space and user-space code into a single project 2010-03-18 14:38 ` Avi Kivity @ 2010-03-18 17:16 ` Ingo Molnar 0 siblings, 0 replies; 390+ messages in thread From: Ingo Molnar @ 2010-03-18 17:16 UTC (permalink / raw) To: Avi Kivity Cc: Anthony Liguori, Zhang, Yanmin, Peter Zijlstra, Sheng Yang, linux-kernel, kvm, Marcelo Tosatti, Joerg Roedel, Jes Sorensen, Gleb Natapov, Zachary Amsden, ziteng.huang, Arnaldo Carvalho de Melo, Frédéric Weisbecker * Avi Kivity <avi@redhat.com> wrote: > On 03/18/2010 04:09 PM, Ingo Molnar wrote: > >* Avi Kivity<avi@redhat.com> wrote: > > > >>> That is not what i said. I said they are closely related, and where > >>> technologies are closely related, project proximity turns into project > >>> unification at a certain stage. > >> > >> I really don't see how. So what if both qemu and kvm implement an i8254? > >> They can't share any code since the internal APIs are so different. [...] > > > > I wouldnt jump to assumptions there. perf shares some facilities with the > > kernel on the source code level - they can be built both in the kernel and > > in user-space. > > > > But my main thought wasnt even to actually share the implementation - but > > to actually synchronize when a piece of device emulation moves into the > > kernel. It is arguably bad for performance in most cases when Qemu handles > > a given device - so all the common devices should be kernel accelerated. > > > > The version and testing matrix would be simplified significantly as well: > > as kernel and qemu go hand in hand, they are always on the same version. > > So, you propose to allow running tools/kvm/ only on the kernel it was > shipped with? No, but i propose concentrating on that natural combination. > Otherwise the testing matrix isn't simplified. It is, because testing is more focused and more people are testing the combination that developers tested as well. 
(and not some random version combination picked by the distributor or the user) Ingo
* Re: [RFC] Unify KVM kernel-space and user-space code into a single project 2010-03-18 13:00 ` Ingo Molnar 2010-03-18 13:36 ` Avi Kivity @ 2010-03-18 14:59 ` Anthony Liguori 2010-03-18 15:17 ` Ingo Molnar 1 sibling, 1 reply; 390+ messages in thread From: Anthony Liguori @ 2010-03-18 14:59 UTC (permalink / raw) To: Ingo Molnar Cc: Avi Kivity, Zhang, Yanmin, Peter Zijlstra, Sheng Yang, linux-kernel, kvm, Marcelo Tosatti, Joerg Roedel, Jes Sorensen, Gleb Natapov, Zachary Amsden, ziteng.huang, Arnaldo Carvalho de Melo, Frédéric Weisbecker On 03/18/2010 08:00 AM, Ingo Molnar wrote: >> [...] kvm in fact knows nothing about vga, to take your last >> example. [...] >> > Look at the VGA dirty bitmap optimization a'ka the KVM_GET_DIRTY_LOG ioctl. > > See qemu/kvm-all.c's kvm_physical_sync_dirty_bitmap(). > > It started out as a VGA optimization (also used by live migration) and even > today it's mostly used by the VGA drivers - albeit a weak one. > > I wish there were stronger VGA optimizations implemented, copying the dirty > bitmap is not a particularly performant solution. (although it's certainly > better than full emulation) Graphics performance is one of the more painful > aspects of KVM usability today. > We have to maintain a dirty bitmap because we don't have a paravirtual graphics driver. IOW, someone needs to write an Xorg driver. Ideally, we could just implement a Linux framebuffer device, right? Well, we took that approach in Xen and that sucks even worse because the Xorg framebuffer driver doesn't implement any of the optimizations that the Linux framebuffer supports and the Xorg driver does not use the kernel's interfaces for providing update regions. Of course, we need to pull in X into the kernel to fix this, right? Any sufficiently complicated piece of software is going to interact with a lot of other projects. The solution is not to pull it all into one massive repository. 
It's to build relationships and to find ways to efficiently work with the various communities. And we're working on this with X. We'll have a paravirtual graphics driver very soon. There are no magic solutions. We need more developers working on the hard problems. Regards, Anthony Liguori
* Re: [RFC] Unify KVM kernel-space and user-space code into a single project 2010-03-18 14:59 ` Anthony Liguori @ 2010-03-18 15:17 ` Ingo Molnar 2010-03-18 16:11 ` Anthony Liguori 0 siblings, 1 reply; 390+ messages in thread From: Ingo Molnar @ 2010-03-18 15:17 UTC (permalink / raw) To: Anthony Liguori Cc: Avi Kivity, Zhang, Yanmin, Peter Zijlstra, Sheng Yang, linux-kernel, kvm, Marcelo Tosatti, Joerg Roedel, Jes Sorensen, Gleb Natapov, Zachary Amsden, ziteng.huang, Arnaldo Carvalho de Melo, Frédéric Weisbecker * Anthony Liguori <anthony@codemonkey.ws> wrote: > On 03/18/2010 08:00 AM, Ingo Molnar wrote: > >> > >> [...] kvm in fact knows nothing about vga, to take your last example. > >> [...] > > > > Look at the VGA dirty bitmap optimization a'ka the KVM_GET_DIRTY_LOG > > ioctl. > > > > See qemu/kvm-all.c's kvm_physical_sync_dirty_bitmap(). > > > > It started out as a VGA optimization (also used by live migration) and > > even today it's mostly used by the VGA drivers - albeit a weak one. > > > > I wish there were stronger VGA optimizations implemented, copying the > > dirty bitmap is not a particularly performant solution. (although it's > > certainly better than full emulation) Graphics performance is one of the > > more painful aspects of KVM usability today. > > We have to maintain a dirty bitmap because we don't have a paravirtual > graphics driver. IOW, someone needs to write an Xorg driver. > > Ideally, we could just implement a Linux framebuffer device, right? No, you'd want to interact with DRM. ( Especially as you want to write guest accelerators passing guest-space OpenGL requests straight to the kernel DRM level. ) Especially if you want to do things like graphics card virtualization, with aspects of the graphics driver passed through to the guest OS. These are all kernel space projects; going through Xorg would be a horrible waste of performance for full-screen virtualization. 
It's fine for the windowed or networked case (and good as a compatibility fallback), but very much not fine for local desktop use. > Well, we took that approach in Xen and that sucks even worse because the > Xorg framebuffer driver doesn't implement any of the optimizations that the > Linux framebuffer supports and the Xorg driver does not use the > kernel's interfaces for providing update regions. > > Of course, we need to pull in X into the kernel to fix this, right? FYI, this part of X has already been pulled into the kernel, it's called DRM. If anything, it's being expanded. > Any sufficiently complicated piece of software is going to interact with a > lot of other projects. The solution is not to pull it all into one massive > repository. It's to build relationships and to find ways to efficiently > work with the various communities. That's my whole point with this thread: the kernel side of KVM and qemu, for all practical purposes, should not be two 'separate communities'. They should be one and the same thing. Separation makes sense where the relationship is light or strictly hierarchical - here it's neither. KVM and Qemu are interconnected, quite fundamentally so. > And we're working on this with X. We'll have a paravirtual graphics driver > very soon. There are no magic solutions. We need more developers working > on the hard problems. The thing is, writing up a DRM connector to a guest Linux OS could be done in no time. It could be deployed to users in no time as well, with the proper development model. That after years and years of waiting proper GFX support is _still_ not implemented in KVM is really telling of the efficiency of development based on such disjoint 'communities'. Maybe put up a committee as well to increase efficiency? ;-) Ingo
* Re: [RFC] Unify KVM kernel-space and user-space code into a single project 2010-03-18 15:17 ` Ingo Molnar @ 2010-03-18 16:11 ` Anthony Liguori 2010-03-18 16:28 ` Ingo Molnar 2010-03-19 9:19 ` Paul Mundt 0 siblings, 2 replies; 390+ messages in thread From: Anthony Liguori @ 2010-03-18 16:11 UTC (permalink / raw) To: Ingo Molnar Cc: Avi Kivity, Zhang, Yanmin, Peter Zijlstra, Sheng Yang, linux-kernel, kvm, Marcelo Tosatti, oerg Roedel, Jes Sorensen, Gleb Natapov, Zachary Amsden, ziteng.huang, Arnaldo Carvalho de Melo, Fr?d?ric Weisbecker On 03/18/2010 10:17 AM, Ingo Molnar wrote: > * Anthony Liguori<anthony@codemonkey.ws> wrote: > > >> On 03/18/2010 08:00 AM, Ingo Molnar wrote: >> >>>> [...] kvm in fact knows nothing about vga, to take your last example. >>>> [...] >>>> >>> Look at the VGA dirty bitmap optimization a'ka the KVM_GET_DIRTY_LOG >>> ioctl. >>> >>> See qemu/kvm-all.c's kvm_physical_sync_dirty_bitmap(). >>> >>> It started out as a VGA optimization (also used by live migration) and >>> even today it's mostly used by the VGA drivers - albeit a weak one. >>> >>> I wish there were stronger VGA optimizations implemented, copying the >>> dirty bitmap is not a particularly performant solution. (although it's >>> certainly better than full emulation) Graphics performance is one of the >>> more painful aspects of KVM usability today. >>> >> We have to maintain a dirty bitmap because we don't have a paravirtual >> graphics driver. IOW, someone needs to write an Xorg driver. >> >> Ideally, we could just implement a Linux framebuffer device, right? >> > No, you'd want to interact with DRM. > Using DRM doesn't help very much. You still need an X driver and most of the operations you care about (video rendering, window movement, etc) are not operations that need to go through DRM. 3D graphics virtualization is extremely difficult in the non-passthrough case. It really requires hardware support that isn't widely available today (outside a few NVIDIA chipsets). 
>> Xorg framebuffer driver doesn't implement any of the optimizations that the >> Linux framebuffer supports and the Xorg driver does not provide use the >> kernel's interfaces for providing update regions. >> >> Of course, we need to pull in X into the kernel to fix this, right? >> > FYI, this part of X has already been pulled into the kernel, it's called DRM. > If then it's being expanded. > It doesn't provide the things we need for a good user experience. You need things like an absolute input device, host driven display resize, RGBA hardware cursors. None of these go through DRI and it's those things that really provide the graphics user experience. >> Any sufficiently complicated piece of software is going to interact with a >> lot of other projects. The solution is not to pull it all into one massive >> repository. It's to build relationships and to find ways to efficiently >> work with the various communities. >> > That's my whole point with this thread: the kernel side of KVM and qemu, but > all practical purposes should not be two 'separate communities'. They should > be one and the same thing. > I don't know why you keep saying this. The people who are in these "separate communities" keep claiming that they don't feel this way. I'm not just saying this to be argumentative. Many of the people in the community have thought this same thing, and tried it themselves, and we've all come to the same conclusion. It's certainly possible that we just missed the obvious thing to do but we'll never know that unless someone shows us. > The thing is, writing up a DRM connector to a guest Linux OS could be done in > no time. It could be deployed to users in no time as well, with the proper > development model. > If this is true, please demonstrate it. Prove your point with patches and I'll happily turn around and do whatever I can to help out. 
> That after years and years of waiting proper GX support is _still_ not > implemented in KVM is really telling of the efficiency of development based on > such disjoint 'communities'. Maybe put up a committee as well to increase > efficiency? ;-) > Nah, instead we can just have a few hundred mail thread on the list. Otherwise we'd have to write patches and do other kinds of productive work. Regards, Anthony Liguori > Ingo > ^ permalink raw reply [flat|nested] 390+ messages in thread
* Re: [RFC] Unify KVM kernel-space and user-space code into a single project 2010-03-18 16:11 ` Anthony Liguori @ 2010-03-18 16:28 ` Ingo Molnar 2010-03-18 16:38 ` Anthony Liguori 2010-03-19 9:19 ` Paul Mundt 1 sibling, 1 reply; 390+ messages in thread From: Ingo Molnar @ 2010-03-18 16:28 UTC (permalink / raw) To: Anthony Liguori Cc: Avi Kivity, Zhang, Yanmin, Peter Zijlstra, Sheng Yang, linux-kernel, kvm, Marcelo Tosatti, oerg Roedel, Jes Sorensen, Gleb Natapov, Zachary Amsden, ziteng.huang, Arnaldo Carvalho de Melo, Fr?d?ric Weisbecker * Anthony Liguori <anthony@codemonkey.ws> wrote: > On 03/18/2010 10:17 AM, Ingo Molnar wrote: > >* Anthony Liguori<anthony@codemonkey.ws> wrote: > > > >>On 03/18/2010 08:00 AM, Ingo Molnar wrote: > >>>>[...] kvm in fact knows nothing about vga, to take your last example. > >>>>[...] > >>>Look at the VGA dirty bitmap optimization a'ka the KVM_GET_DIRTY_LOG > >>>ioctl. > >>> > >>>See qemu/kvm-all.c's kvm_physical_sync_dirty_bitmap(). > >>> > >>>It started out as a VGA optimization (also used by live migration) and > >>>even today it's mostly used by the VGA drivers - albeit a weak one. > >>> > >>>I wish there were stronger VGA optimizations implemented, copying the > >>>dirty bitmap is not a particularly performant solution. (although it's > >>>certainly better than full emulation) Graphics performance is one of the > >>>more painful aspects of KVM usability today. > >>We have to maintain a dirty bitmap because we don't have a paravirtual > >>graphics driver. IOW, someone needs to write an Xorg driver. > >> > >>Ideally, we could just implement a Linux framebuffer device, right? > >No, you'd want to interact with DRM. > > Using DRM doesn't help very much. You still need an X driver and most of > the operations you care about (video rendering, window movement, etc) are > not operations that need to go through DRM. 
You stripped out this bit from my reply: > > There are all kernel space projects, going through Xorg would be a > > horrible waste of performance for full-screen virtualization. It's fine > > for the windowed or networked case (and good as a compatibility fallback), > > but very much not fine for local desktop use. For the full-screen case (which is a very common mode of using a guest OS on the desktop) there's not much of window management needed. You need to save/restore as you switch in/out. > 3D graphics virtualization is extremely difficult in the non-passthrough > case. It really requires hardware support that isn't widely available today > (outside a few NVIDIA chipsets). Granted it's difficult in the general case. > >>Xorg framebuffer driver doesn't implement any of the optimizations that the > >>Linux framebuffer supports and the Xorg driver does not provide use the > >>kernel's interfaces for providing update regions. > >> > >>Of course, we need to pull in X into the kernel to fix this, right? > > > > FYI, this part of X has already been pulled into the kernel, it's called > > DRM. If then it's being expanded. > > It doesn't provide the things we need to a good user experience. You need > things like an absolute input device, host driven display resize, RGBA > hardware cursors. None of these go through DRI and it's those things that > really provide the graphics user experience. With KSM the display resize is in the kernel. Cursor management is not. Yet: i think it would be a nice feature as the cursor could move even if Xorg is blocked or busy with other things. > >> Any sufficiently complicated piece of software is going to interact with > >> a lot of other projects. The solution is not to pull it all into one > >> massive repository. It's to build relationships and to find ways to > >> efficiently work with the various communities. 
> > > > That's my whole point with this thread: the kernel side of KVM and qemu, > > but all practical purposes should not be two 'separate communities'. They > > should be one and the same thing. > > I don't know why you keep saying this. The people who are in these > "separate communities" keep claiming that they don't feel this way. If you are not two separate communities but one community, then why do you go through the (somewhat masochistic) self-punishing exercise of keeping the project in two different pieces? In a distant past Qemu was a separate project and KVM was just a newcomer who used it for fancy stuff. Today, as you say(?), the two communities are one and the same. Why not bring it to its logical conclusion? > I'm not just saying this to be argumentative. Many of the people in the > community have thought this same thing, and tried it themselves, and we've > all come to the same conclusion. > > It's certainly possible that we just missed the obvious thing to do but > we'll never know that unless someone shows us. I'm not aware of anyone in the past having attempted to move qemu to tools/kvm/ in the upstream kernel repo, and having reported on the experiences with such a contribution setup. (Obviously it's not possible at all without heavy cooperation and acceptance from you and Avi, so this will probably remain a thought experiment forever.) You must then be referring to previous attempts to 'strip down' Qemu, right? Those attempts didn't really solve the fundamental problem of project code base separation. Ingo ^ permalink raw reply [flat|nested] 390+ messages in thread
* Re: [RFC] Unify KVM kernel-space and user-space code into a single project 2010-03-18 16:28 ` Ingo Molnar @ 2010-03-18 16:38 ` Anthony Liguori 2010-03-18 16:51 ` Pekka Enberg 0 siblings, 1 reply; 390+ messages in thread From: Anthony Liguori @ 2010-03-18 16:38 UTC (permalink / raw) To: Ingo Molnar Cc: Avi Kivity, Zhang, Yanmin, Peter Zijlstra, Sheng Yang, linux-kernel, kvm, Marcelo Tosatti, oerg Roedel, Jes Sorensen, Gleb Natapov, Zachary Amsden, ziteng.huang, Arnaldo Carvalho de Melo, Fr?d?ric Weisbecker On 03/18/2010 11:28 AM, Ingo Molnar wrote: >>> There are all kernel space projects, going through Xorg would be a >>> horrible waste of performance for full-screen virtualization. It's fine >>> for the windowed or networked case (and good as a compatibility fallback), >>> but very much not fine for local desktop use. >>> > For the full-screen case (which is a very common mode of using a guest OS on > the desktop) there's not much of window management needed. You need to > save/restore as you switch in/out. > I don't think I've ever used full-screen mode with my VMs and I use virtualization on a daily basis. We hear very infrequently from users using full screen mode. >> 3D graphics virtualization is extremely difficult in the non-passthrough >> case. It really requires hardware support that isn't widely available today >> (outside a few NVIDIA chipsets). >> > Granted it's difficult in the general case. > > >>>> Xorg framebuffer driver doesn't implement any of the optimizations that the >>>> Linux framebuffer supports and the Xorg driver does not provide use the >>>> kernel's interfaces for providing update regions. >>>> >>>> Of course, we need to pull in X into the kernel to fix this, right? >>>> >>> FYI, this part of X has already been pulled into the kernel, it's called >>> DRM. If then it's being expanded. >>> >> It doesn't provide the things we need to a good user experience. 
You need >> things like an absolute input device, host driven display resize, RGBA >> hardware cursors. None of these go through DRI and it's those things that >> really provide the graphics user experience. >> > With KSM the display resize is in the kernel. KMS > Cursor management is not. Yet: i > think it would be a nice feature as the cursor could move even if Xorg is > blocked or busy with other things. > If it was all in the kernel, we'd try to support it. >>>> Any sufficiently complicated piece of software is going to interact with >>>> a lot of other projects. The solution is not to pull it all into one >>>> massive repository. It's to build relationships and to find ways to >>>> efficiently work with the various communities. >>>> >>> That's my whole point with this thread: the kernel side of KVM and qemu, >>> but all practical purposes should not be two 'separate communities'. They >>> should be one and the same thing. >>> >> I don't know why you keep saying this. The people who are in these >> "separate communities" keep claiming that they don't feel this way. >> > If you are not two separate communities but one community, then why do you go > through the (somewhat masochistic) self-punishing excercise of keeping the > project in two different pieces? > I don't see any actual KVM developer complaining about this so I'm not sure why you're describing it like this. > In a distant past Qemu was a separate project and KVM was just a newcomer who > used it for fancy stuff. Today as you say(?) the two communities are one and > the same. Why not bring it to its logical conclusion? > We lose a huge amount of users and contributors if we put QEMU in the Linux kernel. As I said earlier, a huge number of our contributions come from people not using KVM. >> I'm not just saying this to be argumentative. Many of the people in the >> community have thought this same thing, and tried it themselves, and we've >> all come to the same conclusion. 
>> >> It's certainly possible that we just missed the obvious thing to do but >> we'll never know that unless someone shows us. >> > I'm not aware of anyone in the past having attempted to move qemu to > tools/kvm/ in the uptream kernel repo, and having reported on the experiences > with such a contribution setup. (obviously it's not possible at all without > heavy cooperation and acceptance from you and Avi, so this will probably > remain a thought experiment forever) > We've tried to create a "clean" version of QEMU specifically for KVM. Moving it into tools/kvm would be the second step. We've all failed on the first step. > If then you must refer to previous attempts to 'strip down' Qemu, right? Those > attempts didnt really solve the fundamental problem of project code base > separation. > If the problem is combining the two, I've sent you a patch that you can put into tip.git if you're so inclined. Regards, Anthony Liguori > Ingo > ^ permalink raw reply [flat|nested] 390+ messages in thread
* Re: [RFC] Unify KVM kernel-space and user-space code into a single project 2010-03-18 16:38 ` Anthony Liguori @ 2010-03-18 16:51 ` Pekka Enberg 0 siblings, 0 replies; 390+ messages in thread From: Pekka Enberg @ 2010-03-18 16:51 UTC (permalink / raw) To: Anthony Liguori Cc: Ingo Molnar, Avi Kivity, Zhang, Yanmin, Peter Zijlstra, Sheng Yang, linux-kernel, kvm, Marcelo Tosatti, oerg Roedel, Jes Sorensen, Gleb Natapov, Zachary Amsden, ziteng.huang, Arnaldo Carvalho de Melo, Fr?d?ric Weisbecker On Thu, Mar 18, 2010 at 6:38 PM, Anthony Liguori <anthony@codemonkey.ws> wrote: >>>> There are all kernel space projects, going through Xorg would be a >>>> horrible waste of performance for full-screen virtualization. It's fine >>>> for the windowed or networked case (and good as a compatibility >>>> fallback), but very much not fine for local desktop use. >> >> For the full-screen case (which is a very common mode of using a guest OS >> on the desktop) there's not much of window management needed. You need to >> save/restore as you switch in/out. > > I don't think I've ever used full-screen mode with my VMs and I use > virtualization on a daily basis. > > We hear very infrequently from users using full screen mode. Sorry for getting slightly off-topic but I find the above statement interesting. I don't use virtualization on daily basis but a working, fully integrated full-screen model with VirtualBox was the only reason I bothered to give VMs a second chance. From my point of view, the user experience of earlier versions (e.g. Parallels) was just too painful to live with. /me crawls back to his hole now... Pekka ^ permalink raw reply [flat|nested] 390+ messages in thread
* Re: [RFC] Unify KVM kernel-space and user-space code into a single project 2010-03-18 16:51 ` Pekka Enberg (?) @ 2010-03-18 17:02 ` Ingo Molnar 2010-03-18 17:09 ` Avi Kivity -1 siblings, 1 reply; 390+ messages in thread From: Ingo Molnar @ 2010-03-18 17:02 UTC (permalink / raw) To: Pekka Enberg Cc: Anthony Liguori, Avi Kivity, Zhang, Yanmin, Peter Zijlstra, Sheng Yang, linux-kernel, kvm, Marcelo Tosatti, oerg Roedel, Jes Sorensen, Gleb Natapov, Zachary Amsden, ziteng.huang, Arnaldo Carvalho de Melo, Fr?d?ric Weisbecker * Pekka Enberg <penberg@cs.helsinki.fi> wrote: > On Thu, Mar 18, 2010 at 6:38 PM, Anthony Liguori <anthony@codemonkey.ws> wrote: > >>>> There are all kernel space projects, going through Xorg would be a > >>>> horrible waste of performance for full-screen virtualization. It's fine > >>>> for the windowed or networked case (and good as a compatibility > >>>> fallback), but very much not fine for local desktop use. > >> > >> For the full-screen case (which is a very common mode of using a guest OS > >> on the desktop) there's not much of window management needed. You need to > >> save/restore as you switch in/out. > > > > I don't think I've ever used full-screen mode with my VMs and I use > > virtualization on a daily basis. > > > > We hear very infrequently from users using full screen mode. > > Sorry for getting slightly off-topic but I find the above statement > interesting. > > I don't use virtualization on daily basis but a working, fully integrated > full-screen model with VirtualBox was the only reason I bothered to give VMs > a second chance. From my point of view, the user experience of earlier > versions (e.g. Parallels) was just too painful to live with. That's the same i do, and that's what i'm hearing from other desktop users as well. The moment you work seriously in a guest OS you often want to switch to it full-screen, to maximize screen real-estate and to reduce host GUI element distractions. 
If it's just casual use of a single app then windowed mode suffices (but in that case performance doesn't matter much to begin with). I find the 'KVM mostly cares about the server, not about the desktop' attitude expressed in this thread troubling. > /me crawls back to his hole now... /me should do that too - this discussion is not producing any positive result, so it has become rather pointless. Ingo ^ permalink raw reply [flat|nested] 390+ messages in thread
* Re: [RFC] Unify KVM kernel-space and user-space code into a single project 2010-03-18 17:02 ` Ingo Molnar @ 2010-03-18 17:09 ` Avi Kivity 2010-03-18 17:28 ` Ingo Molnar 0 siblings, 1 reply; 390+ messages in thread From: Avi Kivity @ 2010-03-18 17:09 UTC (permalink / raw) To: Ingo Molnar Cc: Pekka Enberg, Anthony Liguori, Zhang, Yanmin, Peter Zijlstra, Sheng Yang, linux-kernel, kvm, Marcelo Tosatti, Joerg Roedel, Jes Sorensen, Gleb Natapov, Zachary Amsden, ziteng.huang, Arnaldo Carvalho de Melo, Frédéric Weisbecker On 03/18/2010 07:02 PM, Ingo Molnar wrote: > > I find the 'KVM mostly cares about the server, not about the desktop' attitude > expressed in this thread troubling. > It's not kvm, just its developers (and their employers, where applicable). If you post desktop-oriented patches I'm sure they'll be welcome. -- error compiling committee.c: too many arguments to function ^ permalink raw reply [flat|nested] 390+ messages in thread
* Re: [RFC] Unify KVM kernel-space and user-space code into a single project 2010-03-18 17:09 ` Avi Kivity @ 2010-03-18 17:28 ` Ingo Molnar 2010-03-19 7:56 ` Avi Kivity 0 siblings, 1 reply; 390+ messages in thread From: Ingo Molnar @ 2010-03-18 17:28 UTC (permalink / raw) To: Avi Kivity Cc: Pekka Enberg, Anthony Liguori, Zhang, Yanmin, Peter Zijlstra, Sheng Yang, linux-kernel, kvm, Marcelo Tosatti, Joerg Roedel, Jes Sorensen, Gleb Natapov, Zachary Amsden, ziteng.huang, Arnaldo Carvalho de Melo, Frédéric Weisbecker * Avi Kivity <avi@redhat.com> wrote: > On 03/18/2010 07:02 PM, Ingo Molnar wrote: > > > > I find the 'KVM mostly cares about the server, not about the desktop' > > attitude expressed in this thread troubling. > > It's not kvm, just it's developers (and their employers, where applicable). > If you post desktop oriented patches I'm sure they'll be welcome. Just such a patch-set was posted in this very thread: 'perf kvm'. There were two negative reactions immediately, both showing a fundamental server versus desktop bias: - you did not accept that the most important usecase is when there is a single guest running. - the reaction to the 'how do we get symbols out of the guest' sub-question was, paraphrased: 'we don't want that due to <unspecified> security threat to XYZ selinux usecase with lots of guests'. Anyone aware of how Linux and KVM is being used on the desktop will know how detached that attitude is from the typical desktop usecase ... Usability _never_ sucks because of lack of patches or lack of suggestions. I bet if you made the next server feature contingent on essential usability fixes they'd happen overnight - for God's sake there's been 1000 commits in the last 3 months in the Qemu repository so there's plenty of manpower... Usability suckage - and I'm not going to be popular for saying this out loud - almost always shows a basic maintainer disconnect with the real world. See your very first reactions to my 'KVM usability' observations. 
Read back your and Anthony's replies: total 'sure, patches welcome' kind of indifference. It is _your project_, not some other project down the road ... So that is my first-hand experience about how you are welcoming these desktop issues, in this very thread. I suspect people try a few times with suggestions, then get shot down like our suggestions were shot down and then give up. Ingo ^ permalink raw reply [flat|nested] 390+ messages in thread
* Re: [RFC] Unify KVM kernel-space and user-space code into a single project 2010-03-18 17:28 ` Ingo Molnar @ 2010-03-19 7:56 ` Avi Kivity 2010-03-19 8:53 ` Ingo Molnar 0 siblings, 1 reply; 390+ messages in thread From: Avi Kivity @ 2010-03-19 7:56 UTC (permalink / raw) To: Ingo Molnar Cc: Pekka Enberg, Anthony Liguori, Zhang, Yanmin, Peter Zijlstra, Sheng Yang, linux-kernel, kvm, Marcelo Tosatti, oerg Roedel, Jes Sorensen, Gleb Natapov, Zachary Amsden, ziteng.huang, Arnaldo Carvalho de Melo, Fr?d?ric Weisbecker On 03/18/2010 07:28 PM, Ingo Molnar wrote: > * Avi Kivity<avi@redhat.com> wrote: > > >> On 03/18/2010 07:02 PM, Ingo Molnar wrote: >> >>> I find the 'KVM mostly cares about the server, not about the desktop' >>> attitude expressed in this thread troubling. >>> >> It's not kvm, just it's developers (and their employers, where applicable). >> If you post desktop oriented patches I'm sure they'll be welcome. >> > Just such a patch-set was posted in this very thread: 'perf kvm'. > > There were two negative reactions immediately, both showed a fundamental > server versus desktop bias: > > - you did not accept that the most important usecase is when there is a > single guest running. > Well, it isn't. > - the reaction to the 'how do we get symbols out of the guest' sub-question > was, paraphrased: 'we dont want that due to<unspecified> security threat > to XYZ selinux usecase with lots of guests'. > When I review a patch, I try to think of the difficult cases, not just the easy case. > Anyone being aware of how Linux and KVM is being used on the desktop will know > how detached that attitude is from the typical desktop usecase ... > > Usability _never_ sucks because of lack of patches or lack of suggestions. I > bet if you made the next server feature contingent on essential usability > fixes they'd happen overnight - for God's sake there's been 1000 commits in > the last 3 months in the Qemu repository so there's plenty of manpower... 
> First of all I am not a qemu maintainer. Second, from my point of view all contributors are volunteers (perhaps their employer volunteered them, but there's no difference from my perspective). Asking them to repaint my apartment as a condition to get a patch applied is abuse. If a patch is good, it gets applied. > Usability suckage - and i'm not going to be popular for saying this out loud - > almost always shows a basic maintainer disconnect with the real world. See > your very first reactions to my 'KVM usability' observations. Read back your > and Anthony's replies: total 'sure, patches welcome' kind of indifference. It > is _your project_, not some other project down the road ... > I could drop everything and write a gtk GUI for qemu. Is that what you want? If someone is truly interested in a qemu usability, it's up to them to write the patches. Personally I've never missed the eject button. As to disconnect from the real world, most products based on kvm and qemu (and Linux) are server based. Perhaps that's the reason people emphasise that? Maybe if Linux had 10-20% desktop market penetration, there would be more interest in a bells and whistles qemu GUI. > So that is my first-hand experience about how you are welcoming these desktop > issues, in this very thread. I suspect people try a few times with > suggestions, then get shot down like our suggestions were shot down and then > give up. > I don't recall anyone trying this much less being shot down. Perhaps people are concentrating on virt-manager and the like and leaving qemu alone. -- Do not meddle in the internals of kernels, for they are subtle and quick to panic. ^ permalink raw reply [flat|nested] 390+ messages in thread
* Re: [RFC] Unify KVM kernel-space and user-space code into a single project 2010-03-19 7:56 ` Avi Kivity @ 2010-03-19 8:53 ` Ingo Molnar 2010-03-19 12:56 ` Anthony Liguori 2010-03-20 7:35 ` Avi Kivity 0 siblings, 2 replies; 390+ messages in thread From: Ingo Molnar @ 2010-03-19 8:53 UTC (permalink / raw) To: Avi Kivity Cc: Pekka Enberg, Anthony Liguori, Zhang, Yanmin, Peter Zijlstra, Sheng Yang, linux-kernel, kvm, Marcelo Tosatti, Joerg Roedel, Jes Sorensen, Gleb Natapov, Zachary Amsden, ziteng.huang, Arnaldo Carvalho de Melo, Frédéric Weisbecker * Avi Kivity <avi@redhat.com> wrote: > > There were two negative reactions immediately, both showed a fundamental > > server versus desktop bias: > > > > - you did not accept that the most important usecase is when there is a > > single guest running. > > Well, it isn't. Erm, my usability points are _doubly_ true when there are multiple guests ... The inconvenience of having to type: perf kvm --host --guest --guestkallsyms=/home/ymzhang/guest/kallsyms \ --guestmodules=/home/ymzhang/guest/modules top is very obvious even with a single guest. Now multiply that by more guests ... The crux is: we are working on improving KVM instrumentation. There are working patches posted to this thread and we would like to have/implement an automatism to allow the discovery of all this information. The information should be available to the developer who wants it, and easily/transparently so - in true Linux fashion. > > - the reaction to the 'how do we get symbols out of the guest' sub-question > > was, paraphrased: 'we dont want that due to <unspecified> security threat > > to XYZ selinux usecase with lots of guests'. > > When I review a patch, I try to think of the difficult cases, not > just the easy case. You haven't articulated an actionable reason and you have suggested no solution either, you just passive-aggressively backed the claim that giving developers access to the symbol space is some sort of vague 'security threat'. 
If that is not so, I'd be glad to be proven wrong. > > Anyone being aware of how Linux and KVM is being used on the desktop will > > know how detached that attitude is from the typical desktop usecase ... > > > > Usability _never_ sucks because of lack of patches or lack of suggestions. > > I bet if you made the next server feature contingent on essential > > usability fixes they'd happen overnight - for God's sake there's been 1000 > > commits in the last 3 months in the Qemu repository so there's plenty of > > manpower... > > First of all I am not a qemu maintainer. [...] That is the crux of the matter. My experience in these threads was that no-one really seems to feel in charge of the whole thing. Should we really wonder why KVM usability sucks? > [...] Second, from my point of view all contributors are volunteers (perhaps > their employer volunteered them, but there's no difference from my > perspective). Asking them to repaint my apartment as a condition to get a > patch applied is abuse. If a patch is good, it gets applied. This is one of the weirdest arguments I've seen in this thread. Almost all the time we make contributions conditional on the general shape of the project. Developers don't get to do just the fun stuff. This is a basic quid pro quo: new features introduce risks and create additional workload not just for the originating developer but for the rest of the community as well. You should check how Linus has pulled new features in the past 15 years: he very much requires the existing code to first be top-notch before he accepts new features for a given area of functionality. Doing that and insisting that developers see those imbalances as well is absolutely essential to code quality: otherwise everyone would be running around implementing just the features they are interested in, without regard for the general health of the project. 
Of course, if you keep the project in two halves (KVM and Qemu), and pretend that they are separate and have little relation, imbalances of quality can mount up and you can throw your hands up and say that it's "too bad, I'm not maintaining that". It is your basic duty as a Linux maintainer to keep balances of quality. I do it all day, other maintainers do it all day. > > Usability suckage - and i'm not going to be popular for saying this out > > loud - almost always shows a basic maintainer disconnect with the real > > world. See your very first reactions to my 'KVM usability' observations. > > Read back your and Anthony's replies: total 'sure, patches welcome' kind > > of indifference. It is _your project_, not some other project down the > > road ... > > I could drop everything and write a gtk GUI for qemu. Is that what you > want? No, my suggestion to you (it's up to you whether you give my opinion any weight) is to accept your mistakes and improve, and to not stand in the way of people who'd like to improve the situation. You are happy with the server features and you also made it clear that you don't feel responsible for the rest of the package - which is a big mistake IMO. Also, you have demonstrated in this thread that you have near zero technical clue about basic desktop and development usability matters - for example your stance on symbol space access and your stance on how to enumerate guests symbolically are outright bizarre. Thanks, Ingo ^ permalink raw reply [flat|nested] 390+ messages in thread
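[Editorial note] The file that the disputed `--guestkallsyms` option points at is simply a copy of the guest's `/proc/kallsyms` text, and resolving a sampled guest address against it is a nearest-preceding-symbol lookup over the sorted table. A minimal sketch of that resolution step (the line format is the real kallsyms layout; the addresses and helper names below are invented for illustration):

```python
# Sketch of the symbol-resolution step behind --guestkallsyms: parse a
# /proc/kallsyms-style text and map a sampled guest address to the
# nearest preceding symbol. Addresses here are made up.
import bisect

def parse_kallsyms(text):
    """Lines look like: '<hex-addr> <type> <name> [module]'."""
    syms = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) >= 3:
            syms.append((int(fields[0], 16), fields[2]))
    syms.sort()
    return syms

def resolve(syms, addr):
    """Binary-search for the symbol covering addr (nearest preceding entry)."""
    addrs = [a for a, _ in syms]
    i = bisect.bisect_right(addrs, addr) - 1
    return syms[i][1] if i >= 0 else None

guest_kallsyms = """\
ffffffff81000000 T _text
ffffffff81020300 T schedule
ffffffff81020480 T __ticket_spin_lock
ffffffff81020560 T tcp_recvmsg
"""
syms = parse_kallsyms(guest_kallsyms)
print(resolve(syms, 0xffffffff810204a0))   # -> __ticket_spin_lock
```

The whole argument about automatism is about who supplies that text file: typed manually on the command line, as above in the thread, or discovered from the guest automatically.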
* Re: [RFC] Unify KVM kernel-space and user-space code into a single project 2010-03-19 8:53 ` Ingo Molnar @ 2010-03-19 12:56 ` Anthony Liguori 2010-03-21 19:17 ` Ingo Molnar 2010-03-20 7:35 ` Avi Kivity 1 sibling, 1 reply; 390+ messages in thread From: Anthony Liguori @ 2010-03-19 12:56 UTC (permalink / raw) To: Ingo Molnar Cc: Avi Kivity, Pekka Enberg, Zhang, Yanmin, Peter Zijlstra, Sheng Yang, linux-kernel, kvm, Marcelo Tosatti, oerg Roedel, Jes Sorensen, Gleb Natapov, Zachary Amsden, ziteng.huang, Arnaldo Carvalho de Melo, Fr?d?ric Weisbecker On 03/19/2010 03:53 AM, Ingo Molnar wrote: > * Avi Kivity<avi@redhat.com> wrote: > > >>> There were two negative reactions immediately, both showed a fundamental >>> server versus desktop bias: >>> >>> - you did not accept that the most important usecase is when there is a >>> single guest running. >>> >> Well, it isn't. >> > Erm, my usability points are _doubly_ true when there are multiple guests ... > > The inconvenience of having to type: > > perf kvm --host --guest --guestkallsyms=/home/ymzhang/guest/kallsyms \ > --guestmodules=/home/ymzhang/guest/modules top > > is very obvious even with a single guest. Now multiply that by more guests ... > If you want to improve this, you need to do the following: 1) Add a userspace daemon that uses vmchannel that runs in the guest and can fetch kallsyms and arbitrary modules. If that daemon lives in tools/perf, that's fine. 2) Add a QMP interface in qemu to interact with such daemon 3) Add a default QMP port in a well known location[1] 4) Modify the perf tool to look for a default QMP port. In the case of a single guest, there's one port. If there are multiple guests, then you will have to connect to each port, find the name or any other identifying information, and let the user choose. Patches are certainly welcome. [1] I've written up this patch and will send it out some time today. Regards, Anthony Liguori ^ permalink raw reply [flat|nested] 390+ messages in thread
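[Editor's note] Anthony's steps 2-4 amount to: connect to each guest's QMP socket, complete the capabilities handshake, and ask the guest to identify itself. A minimal Python sketch of that client side follows. The newline-delimited JSON framing, the server-first greeting, and the `qmp_capabilities` / `query-name` commands are real QMP; the well-known per-guest socket location is exactly the part that was still only a proposal in this thread, so any concrete path used with this code is hypothetical.

```python
import json
import socket

def qmp_command(name, **arguments):
    """Serialize a QMP command as one JSON line (QMP is newline-framed JSON)."""
    msg = {"execute": name}
    if arguments:
        msg["arguments"] = arguments
    return json.dumps(msg) + "\n"

def parse_greeting(line):
    """Return the QMP greeting payload, or None if the line is not a greeting."""
    msg = json.loads(line)
    return msg.get("QMP")

def identify_guest(sock_path):
    """Connect to one QMP unix socket, negotiate, and ask for the guest's name.

    sock_path is whatever "well known location" the proposal above would
    standardize -- hypothetical here.
    """
    s = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
    s.connect(sock_path)
    f = s.makefile("rw")
    greeting = parse_greeting(f.readline())   # server speaks first
    f.write(qmp_command("qmp_capabilities"))  # required before other commands
    f.flush()
    f.readline()                              # expect {"return": {}}
    f.write(qmp_command("query-name"))
    f.flush()
    reply = json.loads(f.readline())
    s.close()
    return greeting, reply.get("return", {})
```

With multiple guests, a perf frontend would loop `identify_guest()` over every socket in the directory and let the user pick by name, as described in step 4.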
* Re: [RFC] Unify KVM kernel-space and user-space code into a single project 2010-03-19 12:56 ` Anthony Liguori @ 2010-03-21 19:17 ` Ingo Molnar 2010-03-21 19:35 ` Antoine Martin ` (2 more replies) 0 siblings, 3 replies; 390+ messages in thread From: Ingo Molnar @ 2010-03-21 19:17 UTC (permalink / raw) To: Anthony Liguori Cc: Avi Kivity, Pekka Enberg, Zhang, Yanmin, Peter Zijlstra, Sheng Yang, linux-kernel, kvm, Marcelo Tosatti, oerg Roedel, Jes Sorensen, Gleb Natapov, Zachary Amsden, ziteng.huang, Arnaldo Carvalho de Melo, Fr?d?ric Weisbecker * Anthony Liguori <anthony@codemonkey.ws> wrote: > On 03/19/2010 03:53 AM, Ingo Molnar wrote: > >* Avi Kivity<avi@redhat.com> wrote: > > > >>>There were two negative reactions immediately, both showed a fundamental > >>>server versus desktop bias: > >>> > >>> - you did not accept that the most important usecase is when there is a > >>> single guest running. > >>Well, it isn't. > >Erm, my usability points are _doubly_ true when there are multiple guests ... > > > >The inconvenience of having to type: > > > > perf kvm --host --guest --guestkallsyms=/home/ymzhang/guest/kallsyms \ > > --guestmodules=/home/ymzhang/guest/modules top > > > >is very obvious even with a single guest. Now multiply that by more guests ... > > If you want to improve this, you need to do the following: > > 1) Add a userspace daemon that uses vmchannel that runs in the guest and can > fetch kallsyms and arbitrary modules. If that daemon lives in > tools/perf, that's fine. Adding any new daemon to an existing guest is a deployment and usability nightmare. The basic rule of good instrumentation is to be transparent. The moment we have to modify the user-space of a guest just to monitor it, the purpose of transparent instrumentation is defeated. That was one of the fundamental usability mistakes of Oprofile. There is no 'perf' daemon - all the perf functionality is _built in_, and for very good reasons. 
It is one of the main reasons for perf's success as well. Now Qemu is trying to repeat that stupid mistake ... So please either suggest a different transparent solution that is technically better than the one i suggested, or you should concede the point really. Please try think with the heads of our users and developers and dont suggest some weird ivory-tower design that is totally impractical ... And no, you have to code none of this, we'll do all the coding. The only thing we are asking is for you to not stand in the way of good usability ... Thanks, Ingo ^ permalink raw reply [flat|nested] 390+ messages in thread
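[Editor's note] For context, the `--guestkallsyms` file passed to `perf kvm` in the quoted command line is a snapshot of the guest's `/proc/kallsyms`; mapping a sampled guest instruction pointer to a symbol is then a sort plus a binary search for the nearest preceding symbol. A toy sketch of that lookup, not perf's actual implementation (which lives in tools/perf):

```python
import bisect

def load_kallsyms(text):
    """Parse /proc/kallsyms-style lines: '<hex addr> <type> <name> [module]'.

    Returns a list of (address, name) tuples sorted by address,
    skipping malformed or zero-address entries.
    """
    syms = []
    for line in text.splitlines():
        parts = line.split()
        if len(parts) < 3:
            continue
        addr = int(parts[0], 16)
        if addr:
            syms.append((addr, parts[2]))
    syms.sort()
    return syms

def resolve(syms, addr):
    """Map an address to the nearest preceding symbol plus offset,
    roughly what perf does for each guest kernel sample."""
    i = bisect.bisect_right([a for a, _ in syms], addr) - 1
    if i < 0:
        return None  # address precedes every known symbol
    base, name = syms[i]
    return name, addr - base
```

This is why the file must match the running guest kernel: a stale snapshot silently attributes samples to the wrong symbols.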
* Re: [RFC] Unify KVM kernel-space and user-space code into a single project 2010-03-21 19:17 ` Ingo Molnar @ 2010-03-21 19:35 ` Antoine Martin 2010-03-21 19:59 ` Ingo Molnar 2010-03-21 20:01 ` Avi Kivity 2010-03-21 23:35 ` Anthony Liguori 2 siblings, 1 reply; 390+ messages in thread From: Antoine Martin @ 2010-03-21 19:35 UTC (permalink / raw) To: Ingo Molnar Cc: Anthony Liguori, Avi Kivity, Pekka Enberg, Zhang, Yanmin, Peter Zijlstra, Sheng Yang, linux-kernel, kvm, Marcelo Tosatti, oerg Roedel, Jes Sorensen, Gleb Natapov, Zachary Amsden, ziteng.huang, Arnaldo Carvalho de Melo, Fr?d?ric Weisbecker On 03/22/2010 02:17 AM, Ingo Molnar wrote: > * Anthony Liguori<anthony@codemonkey.ws> wrote: > >> On 03/19/2010 03:53 AM, Ingo Molnar wrote: >> >>> * Avi Kivity<avi@redhat.com> wrote: >>> >>>>> There were two negative reactions immediately, both showed a fundamental >>>>> server versus desktop bias: >>>>> >>>>> - you did not accept that the most important usecase is when there is a >>>>> single guest running. >>>>> >>>> Well, it isn't. >>>> >>> Erm, my usability points are _doubly_ true when there are multiple guests ... >>> >>> The inconvenience of having to type: >>> >>> perf kvm --host --guest --guestkallsyms=/home/ymzhang/guest/kallsyms \ >>> --guestmodules=/home/ymzhang/guest/modules top >>> >>> is very obvious even with a single guest. Now multiply that by more guests ... >>> >> If you want to improve this, you need to do the following: >> >> 1) Add a userspace daemon that uses vmchannel that runs in the guest and can >> fetch kallsyms and arbitrary modules. If that daemon lives in >> tools/perf, that's fine. >> > Adding any new daemon to an existing guest is a deployment and usability > nightmare. > Absolutely. In most cases it is not desirable, and you'll find that in a lot of cases it is not even possible - for non-technical reasons. One of the main benefits of virtualization is the ability to manage and see things from the outside. 
> The basic rule of good instrumentation is to be transparent. The moment we > have to modify the user-space of a guest just to monitor it, the purpose of > transparent instrumentation is defeated. > Not to mention Heisenbugs and interference. Cheers Antoine > That was one of the fundamental usability mistakes of Oprofile. > > There is no 'perf' daemon - all the perf functionality is _built in_, and for > very good reasons. It is one of the main reasons for perf's success as well. > > Now Qemu is trying to repeat that stupid mistake ... > > So please either suggest a different transparent solution that is technically > better than the one i suggested, or you should concede the point really. > > Please try think with the heads of our users and developers and dont suggest > some weird ivory-tower design that is totally impractical ... > > And no, you have to code none of this, we'll do all the coding. The only thing > we are asking is for you to not stand in the way of good usability ... > > Thanks, > > Ingo > -- > To unsubscribe from this list: send the line "unsubscribe kvm" in > the body of a message to majordomo@vger.kernel.org > More majordomo info at http://vger.kernel.org/majordomo-info.html > ^ permalink raw reply [flat|nested] 390+ messages in thread
* Re: [RFC] Unify KVM kernel-space and user-space code into a single project 2010-03-21 19:35 ` Antoine Martin @ 2010-03-21 19:59 ` Ingo Molnar 2010-03-21 20:09 ` Avi Kivity 0 siblings, 1 reply; 390+ messages in thread From: Ingo Molnar @ 2010-03-21 19:59 UTC (permalink / raw) To: Antoine Martin Cc: Anthony Liguori, Avi Kivity, Pekka Enberg, Zhang, Yanmin, Peter Zijlstra, Sheng Yang, linux-kernel, kvm, Marcelo Tosatti, oerg Roedel, Jes Sorensen, Gleb Natapov, Zachary Amsden, ziteng.huang, Arnaldo Carvalho de Melo, Fr?d?ric Weisbecker * Antoine Martin <antoine@nagafix.co.uk> wrote: > On 03/22/2010 02:17 AM, Ingo Molnar wrote: > >* Anthony Liguori<anthony@codemonkey.ws> wrote: > >>On 03/19/2010 03:53 AM, Ingo Molnar wrote: > >>>* Avi Kivity<avi@redhat.com> wrote: > >>>>>There were two negative reactions immediately, both showed a fundamental > >>>>>server versus desktop bias: > >>>>> > >>>>> - you did not accept that the most important usecase is when there is a > >>>>> single guest running. > >>>>Well, it isn't. > >>>Erm, my usability points are _doubly_ true when there are multiple guests ... > >>> > >>>The inconvenience of having to type: > >>> > >>> perf kvm --host --guest --guestkallsyms=/home/ymzhang/guest/kallsyms \ > >>> --guestmodules=/home/ymzhang/guest/modules top > >>> > >>>is very obvious even with a single guest. Now multiply that by more guests ... > >>If you want to improve this, you need to do the following: > >> > >>1) Add a userspace daemon that uses vmchannel that runs in the guest and can > >> fetch kallsyms and arbitrary modules. If that daemon lives in > >> tools/perf, that's fine. > > > > Adding any new daemon to an existing guest is a deployment and usability > > nightmare. > > Absolutely. In most cases it is not desirable, and you'll find that in a lot > of cases it is not even possible - for non-technical reasons. > > One of the main benefits of virtualization is the ability to manage and see > things from the outside. 
> > > The basic rule of good instrumentation is to be transparent. The moment we > > have to modify the user-space of a guest just to monitor it, the purpose > > of transparent instrumentation is defeated. > > Not to mention Heisenbugs and interference. Correct. Frankly, i was surprised (and taken slightly off base) by both Avi and Anthony suggesting such a clearly inferior "add a demon to the guest space" solution. It's a usability and deployment non-starter. Furthermore, allowing a guest to integrate/mount its files into the host VFS space (which was my suggestion) has many other uses and advantages as well, beyond the instrumentation/symbol-lookup purpose. So can we please have some resolution here and move on: the KVM maintainers should either suggest a different transparent approach, or should retract the NAK for the solution we suggested. We very much want to make progress and want to write code, but obviously we cannot code against a maintainer NAK, nor can we code up an inferior solution either. Thanks, Ingo ^ permalink raw reply [flat|nested] 390+ messages in thread
* Re: [RFC] Unify KVM kernel-space and user-space code into a single project 2010-03-21 19:59 ` Ingo Molnar @ 2010-03-21 20:09 ` Avi Kivity 2010-03-21 21:00 ` Ingo Molnar 0 siblings, 1 reply; 390+ messages in thread From: Avi Kivity @ 2010-03-21 20:09 UTC (permalink / raw) To: Ingo Molnar Cc: Antoine Martin, Anthony Liguori, Pekka Enberg, Zhang, Yanmin, Peter Zijlstra, Sheng Yang, linux-kernel, kvm, Marcelo Tosatti, oerg Roedel, Jes Sorensen, Gleb Natapov, Zachary Amsden, ziteng.huang, Arnaldo Carvalho de Melo, Fr?d?ric Weisbecker On 03/21/2010 09:59 PM, Ingo Molnar wrote: > > Frankly, i was surprised (and taken slightly off base) by both Avi and Anthony > suggesting such a clearly inferior "add a demon to the guest space" solution. > It's a usability and deployment non-starter. > It's only clearly inferior if you ignore every consideration against it. It's definitely not a deployment non-starter, see the tons of daemons that come with any Linux system. The basic ones are installed and enabled automatically during system installation. > Furthermore, allowing a guest to integrate/mount its files into the host VFS > space (which was my suggestion) has many other uses and advantages as well, > beyond the instrumentation/symbol-lookup purpose. > Yes. I'm just not sure about the auto-enabling part. > So can we please have some resolution here and move on: the KVM maintainers > should either suggest a different transparent approach, or should retract the > NAK for the solution we suggested. > So long as you define 'transparent' as in 'only the guest kernel is involved' or even 'only the guest and host kernels are involved' we aren't going to make a lot of progress. I oppose shoving random bits of functionality into the kernel, especially things that are in daily use. While us developers do and will use profiling extensively, it doesn't need sit in every guest's non-swappable .text. 
> We very much want to make progress and want to write code, but obviously we > cannot code against a maintainer NAK, nor can we code up an inferior solution > either. > You haven't heard any NAKs, only objections. If we discuss things perhaps we can achieve something that works for everyone. If we keep turning the flames higher that's unlikely. -- Do not meddle in the internals of kernels, for they are subtle and quick to panic. ^ permalink raw reply [flat|nested] 390+ messages in thread
* Re: [RFC] Unify KVM kernel-space and user-space code into a single project 2010-03-21 20:09 ` Avi Kivity @ 2010-03-21 21:00 ` Ingo Molnar 2010-03-21 21:44 ` Avi Kivity 2010-03-21 23:43 ` Anthony Liguori 0 siblings, 2 replies; 390+ messages in thread From: Ingo Molnar @ 2010-03-21 21:00 UTC (permalink / raw) To: Avi Kivity Cc: Antoine Martin, Anthony Liguori, Pekka Enberg, Zhang, Yanmin, Peter Zijlstra, Sheng Yang, linux-kernel, kvm, Marcelo Tosatti, oerg Roedel, Jes Sorensen, Gleb Natapov, Zachary Amsden, ziteng.huang, Arnaldo Carvalho de Melo, Fr?d?ric Weisbecker * Avi Kivity <avi@redhat.com> wrote: > On 03/21/2010 09:59 PM, Ingo Molnar wrote: > > > >Frankly, i was surprised (and taken slightly off base) by both Avi and Anthony > >suggesting such a clearly inferior "add a demon to the guest space" solution. > >It's a usability and deployment non-starter. > > It's only clearly inferior if you ignore every consideration against it. > It's definitely not a deployment non-starter, see the tons of daemons that > come with any Linux system. [...] Avi, please dont put arguments into my mouth that i never made. My (clearly expressed) argument was that: _a new guest-side demon is a transparent instrumentation non-starter_ What is so hard to understand about that simple concept? Instrumentation is good if it's as transparent as possible. Of course lots of other features can be done via a new user-space package ... Thanks, Ingo ^ permalink raw reply [flat|nested] 390+ messages in thread
* Re: [RFC] Unify KVM kernel-space and user-space code into a single project 2010-03-21 21:00 ` Ingo Molnar @ 2010-03-21 21:44 ` Avi Kivity 2010-03-21 23:43 ` Anthony Liguori 1 sibling, 0 replies; 390+ messages in thread From: Avi Kivity @ 2010-03-21 21:44 UTC (permalink / raw) To: Ingo Molnar Cc: Antoine Martin, Anthony Liguori, Pekka Enberg, Zhang, Yanmin, Peter Zijlstra, Sheng Yang, linux-kernel, kvm, Marcelo Tosatti, oerg Roedel, Jes Sorensen, Gleb Natapov, Zachary Amsden, ziteng.huang, Arnaldo Carvalho de Melo, Fr?d?ric Weisbecker On 03/21/2010 11:00 PM, Ingo Molnar wrote: > * Avi Kivity<avi@redhat.com> wrote: > > >> On 03/21/2010 09:59 PM, Ingo Molnar wrote: >> >>> Frankly, i was surprised (and taken slightly off base) by both Avi and Anthony >>> suggesting such a clearly inferior "add a demon to the guest space" solution. >>> It's a usability and deployment non-starter. >>> >> It's only clearly inferior if you ignore every consideration against it. >> It's definitely not a deployment non-starter, see the tons of daemons that >> come with any Linux system. [...] >> > Avi, please dont put arguments into my mouth that i never made. > Sorry, that was not the intent. I meant that putting things into the kernel have disadvantages that must be considered. > My (clearly expressed) argument was that: > > _a new guest-side demon is a transparent instrumentation non-starter_ > > What is so hard to understand about that simple concept? Instrumentation is > good if it's as transparent as possible. > > Of course lots of other features can be done via a new user-space package ... > I believe you can deploy this daemon via a (default) package, without any hassle to users. -- Do not meddle in the internals of kernels, for they are subtle and quick to panic. ^ permalink raw reply [flat|nested] 390+ messages in thread
* Re: [RFC] Unify KVM kernel-space and user-space code into a single project 2010-03-21 21:00 ` Ingo Molnar 2010-03-21 21:44 ` Avi Kivity @ 2010-03-21 23:43 ` Anthony Liguori 1 sibling, 0 replies; 390+ messages in thread From: Anthony Liguori @ 2010-03-21 23:43 UTC (permalink / raw) To: Ingo Molnar Cc: Avi Kivity, Antoine Martin, Pekka Enberg, Zhang, Yanmin, Peter Zijlstra, Sheng Yang, linux-kernel, kvm, Marcelo Tosatti, oerg Roedel, Jes Sorensen, Gleb Natapov, Zachary Amsden, ziteng.huang, Arnaldo Carvalho de Melo, Fr?d?ric Weisbecker On 03/21/2010 04:00 PM, Ingo Molnar wrote: > * Avi Kivity<avi@redhat.com> wrote: > > >> On 03/21/2010 09:59 PM, Ingo Molnar wrote: >> >>> Frankly, i was surprised (and taken slightly off base) by both Avi and Anthony >>> suggesting such a clearly inferior "add a demon to the guest space" solution. >>> It's a usability and deployment non-starter. >>> >> It's only clearly inferior if you ignore every consideration against it. >> It's definitely not a deployment non-starter, see the tons of daemons that >> come with any Linux system. [...] >> > Avi, please dont put arguments into my mouth that i never made. > > My (clearly expressed) argument was that: > > _a new guest-side demon is a transparent instrumentation non-starter_ > FWIW, there's no reason you couldn't consume a vmchannel port from within the kernel. I don't think the code needs to be in the kernel and from a security PoV, that suggests that it should be in userspace IMHO. But if you want to make a kernel thread, knock yourself out. I have no objection to that from a qemu perspective. I can't see why Avi would mind either. I think it's papering around another problem (the kernel should control initrds IMHO) but that's a different topic. Regards, Anthony Liguori ^ permalink raw reply [flat|nested] 390+ messages in thread
* Re: [RFC] Unify KVM kernel-space and user-space code into a single project 2010-03-21 19:17 ` Ingo Molnar 2010-03-21 19:35 ` Antoine Martin @ 2010-03-21 20:01 ` Avi Kivity 2010-03-21 20:08 ` Olivier Galibert 2010-03-21 20:31 ` Ingo Molnar 2010-03-21 23:35 ` Anthony Liguori 2 siblings, 2 replies; 390+ messages in thread From: Avi Kivity @ 2010-03-21 20:01 UTC (permalink / raw) To: Ingo Molnar Cc: Anthony Liguori, Pekka Enberg, Zhang, Yanmin, Peter Zijlstra, Sheng Yang, linux-kernel, kvm, Marcelo Tosatti, oerg Roedel, Jes Sorensen, Gleb Natapov, Zachary Amsden, ziteng.huang, Arnaldo Carvalho de Melo, Fr?d?ric Weisbecker On 03/21/2010 09:17 PM, Ingo Molnar wrote: > > Adding any new daemon to an existing guest is a deployment and usability > nightmare. > The logical conclusion of that is that everything should be built into the kernel. Where a failure brings the system down or worse. Where you have to bear the memory footprint whether you ever use the functionality or not. Where to update the functionality you need to deploy a new kernel (possibly introducing unrelated bugs) and reboot. If userspace daemons are such a deployment and usability nightmare, maybe we should fix that instead. > The basic rule of good instrumentation is to be transparent. The moment we > have to modify the user-space of a guest just to monitor it, the purpose of > transparent instrumentation is defeated. > You have to modify the guest anyway by deploying a new kernel. > Please try think with the heads of our users and developers and dont suggest > some weird ivory-tower design that is totally impractical ... > inetd.d style 'drop a listener config here and it will be executed on connection' should work. The listener could come with the kernel package, though I don't think it's a good idea. module-init-tools doesn't and people have survived somehow. > And no, you have to code none of this, we'll do all the coding. 
> The only thing we are asking is for you to not stand in the way of good usability ... Thanks. -- Do not meddle in the internals of kernels, for they are subtle and quick to panic. ^ permalink raw reply [flat|nested] 390+ messages in thread
* Re: [RFC] Unify KVM kernel-space and user-space code into a single project 2010-03-21 20:01 ` Avi Kivity @ 2010-03-21 20:08 ` Olivier Galibert 2010-03-21 20:11 ` Avi Kivity 2010-03-21 20:11 ` Avi Kivity 2010-03-21 20:31 ` Ingo Molnar 1 sibling, 2 replies; 390+ messages in thread From: Olivier Galibert @ 2010-03-21 20:08 UTC (permalink / raw) To: Avi Kivity Cc: Ingo Molnar, Anthony Liguori, Pekka Enberg, Zhang, Yanmin, Peter Zijlstra, Sheng Yang, linux-kernel, kvm, Marcelo Tosatti, oerg Roedel, Jes Sorensen, Gleb Natapov, Zachary Amsden, ziteng.huang, Arnaldo Carvalho de Melo, Fr?d?ric Weisbecker On Sun, Mar 21, 2010 at 10:01:51PM +0200, Avi Kivity wrote: > On 03/21/2010 09:17 PM, Ingo Molnar wrote: > > > >Adding any new daemon to an existing guest is a deployment and usability > >nightmare. > > > > The logical conclusion of that is that everything should be built into > the kernel. Where a failure brings the system down or worse. Where you > have to bear the memory footprint whether you ever use the functionality > or not. Where to update the functionality you need to deploy a new > kernel (possibly introducing unrelated bugs) and reboot. > > If userspace daemons are such a deployment and usability nightmare, > maybe we should fix that instead. Which userspace? Deploying *anything* in the guest can be a nightmare, including paravirt drivers if you don't have a natively supported in the OS virtual hardware backoff. Deploying things in the host OTOH is business as usual. And you're smart enough to know that. OG. ^ permalink raw reply [flat|nested] 390+ messages in thread
* Re: [RFC] Unify KVM kernel-space and user-space code into a single project 2010-03-21 20:08 ` Olivier Galibert @ 2010-03-21 20:11 ` Avi Kivity 2010-03-21 20:18 ` Antoine Martin 2010-03-21 20:37 ` Ingo Molnar 2010-03-21 20:11 ` Avi Kivity 1 sibling, 2 replies; 390+ messages in thread From: Avi Kivity @ 2010-03-21 20:11 UTC (permalink / raw) To: Olivier Galibert, Ingo Molnar, Anthony Liguori, Pekka Enberg, Zhang, Yanmin, Peter Zijlstra, Sheng Yang, linux-kernel, kvm, Marcelo Tosatti, oerg Roedel, Jes Sorensen, Gleb Natapov, Zachary Amsden, ziteng.huang, Arnaldo Carvalho de Melo, Fr?d?ric Weisbecker On 03/21/2010 10:08 PM, Olivier Galibert wrote: > On Sun, Mar 21, 2010 at 10:01:51PM +0200, Avi Kivity wrote: > >> On 03/21/2010 09:17 PM, Ingo Molnar wrote: >> >>> Adding any new daemon to an existing guest is a deployment and usability >>> nightmare. >>> >>> >> The logical conclusion of that is that everything should be built into >> the kernel. Where a failure brings the system down or worse. Where you >> have to bear the memory footprint whether you ever use the functionality >> or not. Where to update the functionality you need to deploy a new >> kernel (possibly introducing unrelated bugs) and reboot. >> >> If userspace daemons are such a deployment and usability nightmare, >> maybe we should fix that instead. >> > Which userspace? Deploying *anything* in the guest can be a > nightmare, including paravirt drivers if you don't have a natively > supported in the OS virtual hardware backoff. That includes the guest kernel. If you can deploy a new kernel in the guest, presumably you can deploy a userspace package. > Deploying things in the > host OTOH is business as usual. > True. > And you're smart enough to know that. > Thanks. -- Do not meddle in the internals of kernels, for they are subtle and quick to panic. ^ permalink raw reply [flat|nested] 390+ messages in thread
* Re: [RFC] Unify KVM kernel-space and user-space code into a single project 2010-03-21 20:11 ` Avi Kivity @ 2010-03-21 20:18 ` Antoine Martin 2010-03-21 20:24 ` Avi Kivity 2010-03-21 20:37 ` Ingo Molnar 1 sibling, 1 reply; 390+ messages in thread From: Antoine Martin @ 2010-03-21 20:18 UTC (permalink / raw) To: Avi Kivity Cc: Olivier Galibert, Ingo Molnar, Anthony Liguori, Pekka Enberg, Zhang, Yanmin, Peter Zijlstra, Sheng Yang, linux-kernel, kvm, Marcelo Tosatti, oerg Roedel, Jes Sorensen, Gleb Natapov, Zachary Amsden, ziteng.huang, Arnaldo Carvalho de Melo, Fr?d?ric Weisbecker On 03/22/2010 03:11 AM, Avi Kivity wrote: > On 03/21/2010 10:08 PM, Olivier Galibert wrote: >> On Sun, Mar 21, 2010 at 10:01:51PM +0200, Avi Kivity wrote: >>> On 03/21/2010 09:17 PM, Ingo Molnar wrote: >>>> Adding any new daemon to an existing guest is a deployment and >>>> usability >>>> nightmare. >>>> >>> The logical conclusion of that is that everything should be built into >>> the kernel. Where a failure brings the system down or worse. Where >>> you >>> have to bear the memory footprint whether you ever use the >>> functionality >>> or not. Where to update the functionality you need to deploy a new >>> kernel (possibly introducing unrelated bugs) and reboot. >>> >>> If userspace daemons are such a deployment and usability nightmare, >>> maybe we should fix that instead. >> Which userspace? Deploying *anything* in the guest can be a >> nightmare, including paravirt drivers if you don't have a natively >> supported in the OS virtual hardware backoff. > > That includes the guest kernel. If you can deploy a new kernel in the > guest, presumably you can deploy a userspace package. That's not always true. The host admin can control the guest kernel via "kvm -kernel" easily enough, but he may or may not have access to the disk that is used in the guest. (think encrypted disks, service agreements, etc) Antoine >> Deploying things in the >> host OTOH is business as usual. > > True. 
> >> And you're smart enough to know that. > > Thanks. > ^ permalink raw reply [flat|nested] 390+ messages in thread
* Re: [RFC] Unify KVM kernel-space and user-space code into a single project 2010-03-21 20:18 ` Antoine Martin @ 2010-03-21 20:24 ` Avi Kivity 2010-03-21 20:31 ` Antoine Martin 0 siblings, 1 reply; 390+ messages in thread From: Avi Kivity @ 2010-03-21 20:24 UTC (permalink / raw) To: Antoine Martin Cc: Olivier Galibert, Ingo Molnar, Anthony Liguori, Pekka Enberg, Zhang, Yanmin, Peter Zijlstra, Sheng Yang, linux-kernel, kvm, Marcelo Tosatti, oerg Roedel, Jes Sorensen, Gleb Natapov, Zachary Amsden, ziteng.huang, Arnaldo Carvalho de Melo, Fr?d?ric Weisbecker On 03/21/2010 10:18 PM, Antoine Martin wrote: >> That includes the guest kernel. If you can deploy a new kernel in >> the guest, presumably you can deploy a userspace package. > > That's not always true. > The host admin can control the guest kernel via "kvm -kernel" easily > enough, but he may or may not have access to the disk that is used in > the guest. (think encrypted disks, service agreements, etc) There is a matching -initrd argument that you can use to launch a daemon. I believe that -kernel use will be rare, though. It's a lot easier to keep everything in one filesystem. -- Do not meddle in the internals of kernels, for they are subtle and quick to panic. ^ permalink raw reply [flat|nested] 390+ messages in thread
* Re: [RFC] Unify KVM kernel-space and user-space code into a single project 2010-03-21 20:24 ` Avi Kivity @ 2010-03-21 20:31 ` Antoine Martin 2010-03-21 21:03 ` Avi Kivity 0 siblings, 1 reply; 390+ messages in thread From: Antoine Martin @ 2010-03-21 20:31 UTC (permalink / raw) To: Avi Kivity Cc: Olivier Galibert, Ingo Molnar, Anthony Liguori, Pekka Enberg, Zhang, Yanmin, Peter Zijlstra, Sheng Yang, linux-kernel, kvm, Marcelo Tosatti, oerg Roedel, Jes Sorensen, Gleb Natapov, Zachary Amsden, ziteng.huang, Arnaldo Carvalho de Melo, Fr?d?ric Weisbecker On 03/22/2010 03:24 AM, Avi Kivity wrote: > On 03/21/2010 10:18 PM, Antoine Martin wrote: >>> That includes the guest kernel. If you can deploy a new kernel in >>> the guest, presumably you can deploy a userspace package. >> >> That's not always true. >> The host admin can control the guest kernel via "kvm -kernel" easily >> enough, but he may or may not have access to the disk that is used in >> the guest. (think encrypted disks, service agreements, etc) > > There is a matching -initrd argument that you can use to launch a daemon. I thought this discussion was about making it easy to deploy... and generating a custom initrd isn't easy by any means, and it requires access to the guest filesystem (and its mkinitrd tools). > I believe that -kernel use will be rare, though. It's a lot easier > to keep everything in one filesystem. Well, for what it's worth, I rarely ever use anything else. My virtual disks are raw so I can loop mount them easily, and I can also switch my guest kernels from outside... without ever needing to mount those disks. ^ permalink raw reply [flat|nested] 390+ messages in thread
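[Editor's note] As background to the "loop mount raw images" workflow above: mounting a partition inside a raw disk image needs `mount -o loop,offset=N`, where N is the partition's start in bytes, taken from the partition table. A small sketch that computes those offsets from the classic MBR of a raw image (MBR with 512-byte sectors only; GPT images would need different parsing):

```python
import struct

SECTOR = 512  # the classic MBR encodes LBAs in 512-byte sectors

def mbr_partition_offsets(image_bytes):
    """Return byte offsets of non-empty primary partitions in a raw disk image.

    Parses the four 16-byte MBR partition entries at offset 446; each entry
    stores the partition type at entry offset 4 and the starting LBA as a
    32-bit little-endian value at entry offset 8.
    """
    if len(image_bytes) < 512 or image_bytes[510:512] != b"\x55\xaa":
        raise ValueError("no MBR boot signature")
    offsets = []
    for i in range(4):
        entry = image_bytes[446 + 16 * i : 446 + 16 * (i + 1)]
        ptype = entry[4]
        (lba_start,) = struct.unpack_from("<I", entry, 8)
        if ptype != 0 and lba_start != 0:
            offsets.append(lba_start * SECTOR)
    return offsets
```

For a partition starting at LBA 2048 this yields 1048576, i.e. `mount -o loop,offset=1048576 disk.img /mnt`.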
* Re: [RFC] Unify KVM kernel-space and user-space code into a single project 2010-03-21 20:31 ` Antoine Martin @ 2010-03-21 21:03 ` Avi Kivity 2010-03-21 21:20 ` Ingo Molnar 2010-03-22 12:05 ` Antoine Martin 0 siblings, 2 replies; 390+ messages in thread From: Avi Kivity @ 2010-03-21 21:03 UTC (permalink / raw) To: Antoine Martin Cc: Olivier Galibert, Ingo Molnar, Anthony Liguori, Pekka Enberg, Zhang, Yanmin, Peter Zijlstra, Sheng Yang, linux-kernel, kvm, Marcelo Tosatti, oerg Roedel, Jes Sorensen, Gleb Natapov, Zachary Amsden, ziteng.huang, Arnaldo Carvalho de Melo, Fr?d?ric Weisbecker On 03/21/2010 10:31 PM, Antoine Martin wrote: > On 03/22/2010 03:24 AM, Avi Kivity wrote: >> On 03/21/2010 10:18 PM, Antoine Martin wrote: >>>> That includes the guest kernel. If you can deploy a new kernel in >>>> the guest, presumably you can deploy a userspace package. >>> >>> That's not always true. >>> The host admin can control the guest kernel via "kvm -kernel" easily >>> enough, but he may or may not have access to the disk that is used >>> in the guest. (think encrypted disks, service agreements, etc) >> >> There is a matching -initrd argument that you can use to launch a >> daemon. > I thought this discussion was about making it easy to deploy... and > generating a custom initrd isn't easy by any means, and it requires > access to the guest filesystem (and its mkinitrd tools). That's true. You need to run mkinitrd anyway, though, unless your guest is non-modular and non-lvm. >> I believe that -kernel use will be rare, though. It's a lot easier >> to keep everything in one filesystem. > Well, for what it's worth, I rarely ever use anything else. My virtual > disks are raw so I can loop mount them easily, and I can also switch > my guest kernels from outside... without ever needing to mount those > disks. Curious, what do you use them for? btw, if you build your kernel outside the guest, then you already have access to all its symbols, without needing anything further. 
-- Do not meddle in the internals of kernels, for they are subtle and quick to panic. ^ permalink raw reply [flat|nested] 390+ messages in thread
* Re: [RFC] Unify KVM kernel-space and user-space code into a single project 2010-03-21 21:03 ` Avi Kivity @ 2010-03-21 21:20 ` Ingo Molnar 2010-03-22 6:35 ` Avi Kivity 2010-03-22 6:59 ` Zhang, Yanmin 2010-03-22 12:05 ` Antoine Martin 1 sibling, 2 replies; 390+ messages in thread From: Ingo Molnar @ 2010-03-21 21:20 UTC (permalink / raw) To: Avi Kivity Cc: Antoine Martin, Olivier Galibert, Anthony Liguori, Pekka Enberg, Zhang, Yanmin, Peter Zijlstra, Sheng Yang, linux-kernel, kvm, Marcelo Tosatti, oerg Roedel, Jes Sorensen, Gleb Natapov, Zachary Amsden, ziteng.huang, Arnaldo Carvalho de Melo, Fr?d?ric Weisbecker * Avi Kivity <avi@redhat.com> wrote: > > Well, for what it's worth, I rarely ever use anything else. My virtual > > disks are raw so I can loop mount them easily, and I can also switch my > > guest kernels from outside... without ever needing to mount those disks. > > Curious, what do you use them for? > > btw, if you build your kernel outside the guest, then you already have > access to all its symbols, without needing anything further. There's two errors with your argument: 1) you are assuming that it's only about kernel symbols Look at this 'perf report' output: # Samples: 7127509216 # # Overhead Command Shared Object Symbol # ........ .......... ............................. ...... # 19.14% git git [.] lookup_object 15.16% perf git [.] lookup_object 4.74% perf libz.so.1.2.3 [.] inflate 4.52% git libz.so.1.2.3 [.] inflate 4.21% perf libz.so.1.2.3 [.] inflate_table 3.94% git libz.so.1.2.3 [.] inflate_table 3.29% git git [.] find_pack_entry_one 3.24% git libz.so.1.2.3 [.] inflate_fast 2.96% perf libz.so.1.2.3 [.] inflate_fast 2.96% git git [.] decode_tree_entry 2.80% perf libc-2.11.90.so [.] __strlen_sse42 2.56% git libc-2.11.90.so [.] __strlen_sse42 1.98% perf libc-2.11.90.so [.] __GI_memcpy 1.71% perf git [.] decode_tree_entry 1.53% git libc-2.11.90.so [.] __GI_memcpy 1.48% git git [.] lookup_blob 1.30% git git [.] process_tree 1.30% perf git [.] 
process_tree 0.90% perf git [.] tree_entry 0.82% perf git [.] lookup_blob 0.78% git [kernel.kallsyms] [k] kstat_irqs_cpu kernel symbols are only a small portion of the symbols. (a single line in this case) To get to those other symbols we have to read the ELF symbols of those binaries in the guest filesystem, in the post-processing/reporting phase. This is both complex to do and relatively slow so we don't want to (and cannot) do this at sample time from IRQ context or NMI context ... Also, many aspects of reporting are interactive so it's done lazily or on-demand. So we need ready access to the guest filesystem - for those guests which decide to integrate with the host for this. 2) the 'SystemTap mistake' You are assuming that the symbols from when the kernel was built were saved properly and are easily discoverable. In reality those symbols can be erased by a make clean, can be modified by a new build, can be misplaced and can generally be hard to find because each distro puts them in a different installation path. My 10+ years' experience with kernel instrumentation solutions is that kernel-driven, self-sufficient, robust, trustable, well-enumerated sources of information work far better in practice. The thing is, in this thread i'm forced to repeat the same basic facts again and again. Could you _PLEASE_, pretty please, when it comes to instrumentation details, at least _read the mails_ of the guys who actually ... write and maintain Linux instrumentation code? This is getting ridiculous really. Thanks, Ingo ^ permalink raw reply [flat|nested] 390+ messages in thread
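The post-processing Ingo describes boils down to a nearest-preceding-symbol lookup: each sampled address is matched against the ELF symbol tables read from the guest's binaries. Below is an illustrative sketch of that lookup, not perf's actual implementation; the inline "address name" table is made-up test data standing in for what `readelf -s` would produce from binaries on the guest filesystem.

```shell
# resolve_symbol ADDR: read "hexaddr name" lines (sorted by address) on
# stdin and print the last symbol whose address does not exceed ADDR --
# the nearest-preceding-symbol lookup a reporting tool does per sample.
resolve_symbol() {
    addr=$((0x$1))
    name=unknown
    while read -r sym_hex sym_name; do
        if [ $((0x$sym_hex)) -le "$addr" ]; then
            name=$sym_name      # candidate: starts at or before the sample
        else
            break               # table is sorted, no later symbol can match
        fi
    done
    echo "$name"
}

printf '%s\n' \
    '400000 _init' \
    '401200 lookup_object' \
    '403800 inflate' |
resolve_symbol 401f00
# prints "lookup_object": 0x401f00 lies between 0x401200 and 0x403800
```

Real reporting tools do this with binary search over in-memory tables, but the lazy, on-demand nature Ingo mentions is the same: the guest binaries must be readable at report time, not at sample time.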
* Re: [RFC] Unify KVM kernel-space and user-space code into a single project 2010-03-21 21:20 ` Ingo Molnar @ 2010-03-22 6:35 ` Avi Kivity 2010-03-22 11:48 ` Ingo Molnar 2010-03-22 6:59 ` Zhang, Yanmin 1 sibling, 1 reply; 390+ messages in thread From: Avi Kivity @ 2010-03-22 6:35 UTC (permalink / raw) To: Ingo Molnar Cc: Antoine Martin, Olivier Galibert, Anthony Liguori, Pekka Enberg, Zhang, Yanmin, Peter Zijlstra, Sheng Yang, linux-kernel, kvm, Marcelo Tosatti, oerg Roedel, Jes Sorensen, Gleb Natapov, Zachary Amsden, ziteng.huang, Arnaldo Carvalho de Melo, Fr?d?ric Weisbecker On 03/21/2010 11:20 PM, Ingo Molnar wrote: > * Avi Kivity<avi@redhat.com> wrote: > > >>> Well, for what it's worth, I rarely ever use anything else. My virtual >>> disks are raw so I can loop mount them easily, and I can also switch my >>> guest kernels from outside... without ever needing to mount those disks. >>> >> Curious, what do you use them for? >> >> btw, if you build your kernel outside the guest, then you already have >> access to all its symbols, without needing anything further. >> > There's two errors with your argument: > > 1) you are assuming that it's only about kernel symbols > > Look at this 'perf report' output: > > # Samples: 7127509216 > # > # Overhead Command Shared Object Symbol > # ........ .......... ............................. ...... > # > 19.14% git git [.] lookup_object > 15.16% perf git [.] lookup_object > 4.74% perf libz.so.1.2.3 [.] inflate > 4.52% git libz.so.1.2.3 [.] inflate > 4.21% perf libz.so.1.2.3 [.] inflate_table > 3.94% git libz.so.1.2.3 [.] inflate_table > 3.29% git git [.] find_pack_entry_one > 3.24% git libz.so.1.2.3 [.] inflate_fast > 2.96% perf libz.so.1.2.3 [.] inflate_fast > 2.96% git git [.] decode_tree_entry > 2.80% perf libc-2.11.90.so [.] __strlen_sse42 > 2.56% git libc-2.11.90.so [.] __strlen_sse42 > 1.98% perf libc-2.11.90.so [.] __GI_memcpy > 1.71% perf git [.] decode_tree_entry > 1.53% git libc-2.11.90.so [.] 
__GI_memcpy > 1.48% git git [.] lookup_blob > 1.30% git git [.] process_tree > 1.30% perf git [.] process_tree > 0.90% perf git [.] tree_entry > 0.82% perf git [.] lookup_blob > 0.78% git [kernel.kallsyms] [k] kstat_irqs_cpu > > kernel symbols are only a small portion of the symbols. (a single line in this > case) > > To get to those other symbols we have to read the ELF symbols of those > binaries in the guest filesystem, in the post-processing/reporting phase. This > is both complex to do and relatively slow so we dont want to (and cannot) do > this at sample time from IRQ context or NMI context ... > Okay. So a symbol server is necessary. Still, I don't think -kernel is a good reason for including the symbol server in the kernel itself. If someone uses it extensively together with perf, _and_ they can't put the symbol server in the guest for some reason, let them patch mkinitrd to include it. > Also, many aspects of reporting are interactive so it's done lazily or > on-demand. So we need ready access to the guest filesystem - for those guests > which decide to integrate with the host for this. > > 2) the 'SystemTap mistake' > > You are assuming that the symbols of the kernel when it got built got saved > properly and are discoverable easily. In reality those symbols can be erased > by a make clean, can be modified by a new build, can be misplaced and can > generally be hard to find because each distro puts them in a different > installation path. > > My 10+ years experience with kernel instrumentation solutions is that > kernel-driven, self-sufficient, robust, trustable, well-enumerated sources of > information work far better in practice. > What about line number information? And the source? Into the kernel with them as well? > The thing is, in this thread i'm forced to repeat the same basic facts again > and again. Could you _PLEASE_, pretty please, when it comes to instrumentation > details, at least _read the mails_ of the guys who actually ... 
write and > maintain Linux instrumentation code? This is getting ridiculous really. > I've read every one of your emails. If I misunderstood or overlooked something, I apologize. The thread is very long and at times antagonistic so it's hard to keep all the details straight. -- Do not meddle in the internals of kernels, for they are subtle and quick to panic. ^ permalink raw reply [flat|nested] 390+ messages in thread
* Re: [RFC] Unify KVM kernel-space and user-space code into a single project 2010-03-22 6:35 ` Avi Kivity @ 2010-03-22 11:48 ` Ingo Molnar 2010-03-22 12:31 ` Pekka Enberg 2010-03-22 12:36 ` Avi Kivity 0 siblings, 2 replies; 390+ messages in thread From: Ingo Molnar @ 2010-03-22 11:48 UTC (permalink / raw) To: Avi Kivity Cc: Antoine Martin, Olivier Galibert, Anthony Liguori, Pekka Enberg, Zhang, Yanmin, Peter Zijlstra, Sheng Yang, linux-kernel, kvm, Marcelo Tosatti, oerg Roedel, Jes Sorensen, Gleb Natapov, Zachary Amsden, ziteng.huang, Arnaldo Carvalho de Melo, Fr?d?ric Weisbecker * Avi Kivity <avi@redhat.com> wrote: > > My 10+ years experience with kernel instrumentation solutions is that > > kernel-driven, self-sufficient, robust, trustable, well-enumerated sources > > of information work far better in practice. > > What about line number information? And the source? Into the kernel with > them as well? Sigh. Please read the _very first_ suggestion i made, which solves all that. I rarely go into discussions without suggesting technical solutions - i'm not interested in flaming, i'm interested in real solutions. Here it is, repeated for the Nth time: Allow a guest to (optionally) integrate its VFS namespace with the host side as well. An example scheme would be: /guests/Fedora-G1/ /guests/Fedora-G1/proc/ /guests/Fedora-G1/usr/ /guests/Fedora-G1/.../ /guests/OpenSuse-G2/ /guests/OpenSuse-G2/proc/ /guests/OpenSuse-G2/usr/ /guests/OpenSuse-G2/.../ ( This feature would be configurable and would be default-off, to maintain the current status quo. ) Line number information and the source (dwarf info) and ELF symbols are all provided and accessible via such an interface - no need to run any 'symbol demon' on the guest side. And, obviously, having the guest VFS namespace (optionally) available on the host side also has far more uses than perf's symbol needs. 
I was surprised no-one ever came up with such a suggestion - it is so obvious to allow the integration of the VFS namespaces. But given your explicit declaration of your KVM desktop usability indifference i'm kind of not surprised about that anymore. Thanks, Ingo ^ permalink raw reply [flat|nested] 390+ messages in thread
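Under the /guests/ scheme Ingo sketches, host-side lookup of any guest file reduces to a path prefix join. A minimal sketch, assuming exactly the layout from his example (the helper name is illustrative, not from the thread):

```shell
# guest_path GUEST PATH: translate a path as reported by a guest into its
# location under the proposed host-side /guests/<guest-name>/ namespace.
guest_path() {
    printf '/guests/%s%s\n' "$1" "$2"
}

guest_path Fedora-G1 /usr/lib64/libz.so.1.2.3
# prints "/guests/Fedora-G1/usr/lib64/libz.so.1.2.3"
```

With that mapping in place, a host-side tool needs no knowledge of the guest's disk format: it opens DSOs, debug info, and even /proc files through the ordinary VFS.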
* Re: [RFC] Unify KVM kernel-space and user-space code into a single project 2010-03-22 11:48 ` Ingo Molnar @ 2010-03-22 12:31 ` Pekka Enberg 2010-03-22 12:36 ` Avi Kivity 1 sibling, 0 replies; 390+ messages in thread From: Pekka Enberg @ 2010-03-22 12:31 UTC (permalink / raw) To: Ingo Molnar Cc: Avi Kivity, Antoine Martin, Olivier Galibert, Anthony Liguori, Zhang, Yanmin, Peter Zijlstra, Sheng Yang, linux-kernel, kvm, Marcelo Tosatti, oerg Roedel, Jes Sorensen, Gleb Natapov, Zachary Amsden, ziteng.huang, Arnaldo Carvalho de Melo, Fr?d?ric Weisbecker On Mon, Mar 22, 2010 at 1:48 PM, Ingo Molnar <mingo@elte.hu> wrote: >> What about line number information? And the source? Into the kernel with >> them as well? > > Sigh. Please read the _very first_ suggestion i made, which solves all that. I > rarely go into discussions without suggesting technical solutions - i'm not > interested in flaming, i'm interested in real solutions. > > Here it is, repeated for the Nth time: > > Allow a guest to (optionally) integrate its VFS namespace with the host side > as well. An example scheme would be: > > /guests/Fedora-G1/ > /guests/Fedora-G1/proc/ > /guests/Fedora-G1/usr/ > /guests/Fedora-G1/.../ > /guests/OpenSuse-G2/ > /guests/OpenSuse-G2/proc/ > /guests/OpenSuse-G2/usr/ > /guests/OpenSuse-G2/.../ > > ( This feature would be configurable and would be default-off, to maintain > the current status quo. ) Heh, funny. That would also solve my number one gripe with virtualization these days: how to get files in and out of guests without having to install extra packages on the guest side and fiddling with mount points on every single guest image I want to play with. Pekka ^ permalink raw reply [flat|nested] 390+ messages in thread
* Re: [RFC] Unify KVM kernel-space and user-space code into a single project 2010-03-22 12:31 ` Pekka Enberg (?) @ 2010-03-22 12:37 ` Daniel P. Berrange 2010-03-22 12:44 ` Pekka Enberg 2010-03-22 12:54 ` Ingo Molnar -1 siblings, 2 replies; 390+ messages in thread From: Daniel P. Berrange @ 2010-03-22 12:37 UTC (permalink / raw) To: Pekka Enberg Cc: Ingo Molnar, Avi Kivity, Antoine Martin, Olivier Galibert, Anthony Liguori, Zhang, Yanmin, Peter Zijlstra, Sheng Yang, linux-kernel, kvm, Marcelo Tosatti, oerg Roedel, Jes Sorensen, Gleb Natapov, Zachary Amsden, ziteng.huang, Arnaldo Carvalho de Melo, Fr?d?ric Weisbecker On Mon, Mar 22, 2010 at 02:31:49PM +0200, Pekka Enberg wrote: > On Mon, Mar 22, 2010 at 1:48 PM, Ingo Molnar <mingo@elte.hu> wrote: > >> What about line number information? And the source? Into the kernel with > >> them as well? > > > > Sigh. Please read the _very first_ suggestion i made, which solves all that. I > > rarely go into discussions without suggesting technical solutions - i'm not > > interested in flaming, i'm interested in real solutions. > > > > Here it is, repeated for the Nth time: > > > > Allow a guest to (optionally) integrate its VFS namespace with the host side > > as well. An example scheme would be: > > > > /guests/Fedora-G1/ > > /guests/Fedora-G1/proc/ > > /guests/Fedora-G1/usr/ > > /guests/Fedora-G1/.../ > > /guests/OpenSuse-G2/ > > /guests/OpenSuse-G2/proc/ > > /guests/OpenSuse-G2/usr/ > > /guests/OpenSuse-G2/.../ > > > > ( This feature would be configurable and would be default-off, to maintain > > the current status quo. ) > > Heh, funny. That would also solve my number one gripe with > virtualization these days: how to get files in and out of guests > without having to install extra packages on the guest side and > fiddling with mount points on every single guest image I want to play > with. 
FYI, for offline guests, you can use libguestfs[1] to access & change files inside the guest, and get read-only access to running guests' files. It provides access via an interactive shell, APIs in all major languages, and also has a FUSE module to expose it directly in the host VFS. It could probably be made to work read-write for running guests too if its agent were installed inside the guest & leveraged the new Virtio-Serial channel for comms (avoiding any network setup requirements). Regards, Daniel [1] http://libguestfs.org/ -- |: Red Hat, Engineering, London -o- http://people.redhat.com/berrange/ :| |: http://libvirt.org -o- http://virt-manager.org -o- http://deltacloud.org :| |: http://autobuild.org -o- http://search.cpan.org/~danberr/ :| |: GnuPG: 7D3B9505 -o- F3C9 553F A1DA 4AC2 5648 23C1 B3DF F742 7D3B 9505 :| ^ permalink raw reply [flat|nested] 390+ messages in thread
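Daniel's description maps onto the libguestfs command-line tools roughly as follows. This is a sketch, not from the thread: the image name `guest.img` and the assumption that the root filesystem lives on /dev/sda1 are illustrative, and the exact flags should be checked against the libguestfs documentation for the version in use.

```shell
# Read-only browsing of an offline guest image with the guestfish shell
# (assumes the guest root filesystem is on /dev/sda1 inside guest.img):
guestfish --ro -a guest.img -m /dev/sda1 <<'EOF'
ll /boot
download /boot/System.map-2.6.33 /tmp/guest-System.map
EOF

# Or expose the image in the host VFS via the FUSE module Daniel mentions:
guestmount --ro -a guest.img -m /dev/sda1 /mnt/guest
ls /mnt/guest/etc
```

Either route gives a host-side tool such as perf read access to guest binaries without any cooperation from a running guest.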
* Re: [RFC] Unify KVM kernel-space and user-space code into a single project 2010-03-22 12:37 ` Daniel P. Berrange @ 2010-03-22 12:44 ` Pekka Enberg 2010-03-22 12:54 ` Ingo Molnar 1 sibling, 0 replies; 390+ messages in thread From: Pekka Enberg @ 2010-03-22 12:44 UTC (permalink / raw) To: Daniel P. Berrange Cc: Ingo Molnar, Avi Kivity, Antoine Martin, Olivier Galibert, Anthony Liguori, Zhang, Yanmin, Peter Zijlstra, Sheng Yang, linux-kernel, kvm, Marcelo Tosatti, oerg Roedel, Jes Sorensen, Gleb Natapov, Zachary Amsden, ziteng.huang, Arnaldo Carvalho de Melo, Fr?d?ric Weisbecker Hi Daniel, (I'm getting slightly off-topic, sorry about that.) Daniel P. Berrange kirjoitti: >>> Here it is, repeated for the Nth time: >>> >>> Allow a guest to (optionally) integrate its VFS namespace with the host side >>> as well. An example scheme would be: >>> >>> /guests/Fedora-G1/ >>> /guests/Fedora-G1/proc/ >>> /guests/Fedora-G1/usr/ >>> /guests/Fedora-G1/.../ >>> /guests/OpenSuse-G2/ >>> /guests/OpenSuse-G2/proc/ >>> /guests/OpenSuse-G2/usr/ >>> /guests/OpenSuse-G2/.../ >>> >>> ( This feature would be configurable and would be default-off, to maintain >>> the current status quo. ) >> Heh, funny. That would also solve my number one gripe with >> virtualization these days: how to get files in and out of guests >> without having to install extra packages on the guest side and >> fiddling with mount points on every single guest image I want to play >> with. > > FYI, for offline guests, you can use libguestfs[1] to access & change files > inside the guest, and read-only access to running guests files. It provides > access via a interactive shell, APIs in all major languages, and also has a > FUSE mdule to expose it directly in the host VFS. It could probably be made > to work read-write for running guests too if its agent were installed inside > the guest & leverage the new Virtio-Serial channel for comms (avoiding any > network setup requirements). Right. Thanks for the pointer. 
The use case I am thinking of is working on a userspace project and wanting to test a piece of code on multiple distributions before pushing it out. That pretty much means being able to pull from the host git repository (or push to the guest repo) while the guest is running, maybe changing the code a bit and then getting the changes back to the host for the final push. What I do now is I push the changes on the host side to a (private) remote branch and do the work through that. But that's a pretty lame workaround in my opinion. Pekka ^ permalink raw reply [flat|nested] 390+ messages in thread
* Re: [RFC] Unify KVM kernel-space and user-space code into a single project 2010-03-22 12:37 ` Daniel P. Berrange 2010-03-22 12:44 ` Pekka Enberg @ 2010-03-22 12:54 ` Ingo Molnar 2010-03-22 13:05 ` Daniel P. Berrange 1 sibling, 1 reply; 390+ messages in thread From: Ingo Molnar @ 2010-03-22 12:54 UTC (permalink / raw) To: Daniel P. Berrange Cc: Pekka Enberg, Avi Kivity, Antoine Martin, Olivier Galibert, Anthony Liguori, Zhang, Yanmin, Peter Zijlstra, Sheng Yang, linux-kernel, kvm, Marcelo Tosatti, oerg Roedel, Jes Sorensen, Gleb Natapov, Zachary Amsden, ziteng.huang, Arnaldo Carvalho de Melo, Fr?d?ric Weisbecker * Daniel P. Berrange <berrange@redhat.com> wrote: > On Mon, Mar 22, 2010 at 02:31:49PM +0200, Pekka Enberg wrote: > > On Mon, Mar 22, 2010 at 1:48 PM, Ingo Molnar <mingo@elte.hu> wrote: > > >> What about line number information? ?And the source? ?Into the kernel with > > >> them as well? > > > > > > Sigh. Please read the _very first_ suggestion i made, which solves all that. I > > > rarely go into discussions without suggesting technical solutions - i'm not > > > interested in flaming, i'm interested in real solutions. > > > > > > Here it is, repeated for the Nth time: > > > > > > Allow a guest to (optionally) integrate its VFS namespace with the host side > > > as well. An example scheme would be: > > > > > > ? /guests/Fedora-G1/ > > > ? /guests/Fedora-G1/proc/ > > > ? /guests/Fedora-G1/usr/ > > > ? /guests/Fedora-G1/.../ > > > ? /guests/OpenSuse-G2/ > > > ? /guests/OpenSuse-G2/proc/ > > > ? /guests/OpenSuse-G2/usr/ > > > ? /guests/OpenSuse-G2/.../ > > > > > > ?( This feature would be configurable and would be default-off, to maintain > > > ? ?the current status quo. ) > > > > Heh, funny. That would also solve my number one gripe with virtualization > > these days: how to get files in and out of guests without having to > > install extra packages on the guest side and fiddling with mount points on > > every single guest image I want to play with. 
> > FYI, for offline guests, you can use libguestfs[1] to access & change files > inside the guest, and read-only access to running guests files. It provides > access via a interactive shell, APIs in all major languages, and also has a > FUSE mdule to expose it directly in the host VFS. It could probably be made > to work read-write for running guests too if its agent were installed inside > the guest & leverage the new Virtio-Serial channel for comms (avoiding any > network setup requirements). > > Regards, > Daniel > > [1] http://libguestfs.org/ Yes, this is the kind of functionality i'm suggesting. I'd suggest a different implementation for live guests: to drive this from within the live guest side of KVM, i.e. basically a paravirt driver for guestfs. You'd pass file API requests to the guest directly, via the KVM ioctl or so - and get responses from the guest. That will give true read-write access and completely coherent (and still transparent) VFS integration, with no host-side knowledge needed for the guest's low level (raw) filesystem structure. That's a big advantage. Yes, it needs an 'aware' guest kernel - but that is a one-off transition overhead whose cost is zero in the long run. (i.e. all KVM kernels beyond a given version would have this ability - otherwise it's guest side distribution transparent) Even 'offline' read-only access could be implemented by booting a minimal kernel via qemu -kernel and using a 'ro' boot option. That way you could eliminate all lowlevel filesystem knowledge from libguestfs. You could run ext4 or btrfs guest filesystems and FAT ones as well - with no restriction. This would allow 'offline' access to Windows images as well: a FAT or ntfs enabled mini-kernel could be booted in read-only mode. Thanks, Ingo ^ permalink raw reply [flat|nested] 390+ messages in thread
* Re: [RFC] Unify KVM kernel-space and user-space code into a single project 2010-03-22 12:54 ` Ingo Molnar @ 2010-03-22 13:05 ` Daniel P. Berrange 2010-03-22 13:23 ` Richard W.M. Jones 2010-03-22 13:56 ` Ingo Molnar 0 siblings, 2 replies; 390+ messages in thread From: Daniel P. Berrange @ 2010-03-22 13:05 UTC (permalink / raw) To: Ingo Molnar Cc: Pekka Enberg, Avi Kivity, Antoine Martin, Olivier Galibert, Anthony Liguori, Zhang, Yanmin, Peter Zijlstra, Sheng Yang, linux-kernel, kvm, Marcelo Tosatti, oerg Roedel, Jes Sorensen, Gleb Natapov, Zachary Amsden, ziteng.huang, Arnaldo Carvalho de Melo, Fr?d?ric Weisbecker On Mon, Mar 22, 2010 at 01:54:40PM +0100, Ingo Molnar wrote: > > * Daniel P. Berrange <berrange@redhat.com> wrote: > > > > FYI, for offline guests, you can use libguestfs[1] to access & change files > > inside the guest, and read-only access to running guests files. It provides > > access via a interactive shell, APIs in all major languages, and also has a > > FUSE mdule to expose it directly in the host VFS. It could probably be made > > to work read-write for running guests too if its agent were installed inside > > the guest & leverage the new Virtio-Serial channel for comms (avoiding any > > network setup requirements). > > > > Regards, > > Daniel > > > > [1] http://libguestfs.org/ > > Yes, this is the kind of functionality i'm suggesting. > > I'd suggest a different implementation for live guests: to drive this from > within the live guest side of KVM, i.e. basically a paravirt driver for > guestfs. You'd pass file API guests to the guest directly, via the KVM ioctl > or so - and get responses from the guest. > > That will give true read-write access and completely coherent (and still > transparent) VFS integration, with no host-side knowledge needed for the > guest's low level (raw) filesystem structure. That's a big advantage. 
> > Yes, it needs an 'aware' guest kernel - but that is a one-off transition > overhead whose cost is zero in the long run. (i.e. all KVM kernels beyond a > given version would have this ability - otherwise it's guest side distribution > transparent) > > Even 'offline' read-only access could be implemented by booting a minimal > kernel via qemu -kernel and using a 'ro' boot option. That way you could > eliminate all lowlevel filesystem knowledge from libguestfs. You could run > ext4 or btrfs guest filesystems and FAT ones as well - with no restriction. > > This would allow 'offline' access to Windows images as well: a FAT or ntfs > enabled mini-kernel could be booted in read-only mode. This is close to the way libguestfs already works. It boots QEMU/KVM pointing to a minimal stripped-down appliance Linux OS image, containing a small agent it talks to over some form of vmchannel/serial/virtio-serial device. Thus the kernel in the appliance it runs is the only thing that needs to know about the filesystem/lvm/dm on-disk formats - libguestfs definitely does not want to be duplicating this detailed knowledge of the on-disk format itself. It is doing full read-write access to the guest filesystem in offline mode - one of the major use cases is disaster recovery from an unbootable guest OS image. Regards, Daniel -- |: Red Hat, Engineering, London -o- http://people.redhat.com/berrange/ :| |: http://libvirt.org -o- http://virt-manager.org -o- http://deltacloud.org :| |: http://autobuild.org -o- http://search.cpan.org/~danberr/ :| |: GnuPG: 7D3B9505 -o- F3C9 553F A1DA 4AC2 5648 23C1 B3DF F742 7D3B 9505 :| ^ permalink raw reply [flat|nested] 390+ messages in thread
* Re: [RFC] Unify KVM kernel-space and user-space code into a single project 2010-03-22 13:05 ` Daniel P. Berrange @ 2010-03-22 13:23 ` Richard W.M. Jones 2010-03-22 14:02 ` Ingo Molnar 2010-03-22 14:20 ` oerg Roedel 2010-03-22 13:56 ` Ingo Molnar 1 sibling, 2 replies; 390+ messages in thread From: Richard W.M. Jones @ 2010-03-22 13:23 UTC (permalink / raw) To: Daniel P. Berrange Cc: Ingo Molnar, Pekka Enberg, Avi Kivity, Antoine Martin, Olivier Galibert, Anthony Liguori, Zhang, Yanmin, Peter Zijlstra, Sheng Yang, linux-kernel, kvm, Marcelo Tosatti, oerg Roedel, Jes Sorensen, Gleb Natapov, Zachary Amsden, ziteng.huang, Arnaldo Carvalho de Melo, Fr?d?ric Weisbecker On Mon, Mar 22, 2010 at 01:05:13PM +0000, Daniel P. Berrange wrote: > This is close to the way libguestfs already works. It boots QEMU/KVM pointing > to a minimal stripped down appliance linux OS image, containing a small agent > it talks to over some form of vmchannel/serial/virtio-serial device. Thus the > kernel in the appliance it runs is the only thing that needs to know about the > filesystem/lvm/dm on-disk formats - libguestfs definitely does not want to be > duplicating this detailed knowledge of on disk format itself. It is doing > full read-write access to the guest filesystem in offline mode - one of the > major use cases is disaster recovery from a unbootable guest OS image. As Dan said, the 'daemon' part is separate and could be run as a standard part of a guest install, talking over vmchannel to the host. The only real issue I can see is adding access control to the daemon (currently it doesn't need it and doesn't do any). Doing it this way you'd be leveraging the ~250,000 lines of existing libguestfs code, bindings in multiple languages, tools etc. Rich. -- Richard Jones, Virtualization Group, Red Hat http://people.redhat.com/~rjones New in Fedora 11: Fedora Windows cross-compiler. Compile Windows programs, test, and build Windows installers. 
Over 70 libraries supported http://fedoraproject.org/wiki/MinGW http://www.annexia.org/fedora_mingw ^ permalink raw reply [flat|nested] 390+ messages in thread
* Re: [RFC] Unify KVM kernel-space and user-space code into a single project 2010-03-22 13:23 ` Richard W.M. Jones @ 2010-03-22 14:02 ` Ingo Molnar 2010-03-22 14:20 ` Joerg Roedel 1 sibling, 0 replies; 390+ messages in thread From: Ingo Molnar @ 2010-03-22 14:02 UTC (permalink / raw) To: Richard W.M. Jones Cc: Daniel P. Berrange, Pekka Enberg, Avi Kivity, Antoine Martin, Olivier Galibert, Anthony Liguori, Zhang, Yanmin, Peter Zijlstra, Sheng Yang, linux-kernel, kvm, Marcelo Tosatti, Joerg Roedel, Jes Sorensen, Gleb Natapov, Zachary Amsden, ziteng.huang, Arnaldo Carvalho de Melo, Frédéric Weisbecker * Richard W.M. Jones <rjones@redhat.com> wrote: > On Mon, Mar 22, 2010 at 01:05:13PM +0000, Daniel P. Berrange wrote: > > This is close to the way libguestfs already works. It boots QEMU/KVM pointing > > to a minimal stripped down appliance linux OS image, containing a small agent > > it talks to over some form of vmchannel/serial/virtio-serial device. Thus the > > kernel in the appliance it runs is the only thing that needs to know about the > > filesystem/lvm/dm on-disk formats - libguestfs definitely does not want to be > > duplicating this detailed knowledge of on disk format itself. It is doing > > full read-write access to the guest filesystem in offline mode - one of the > > major use cases is disaster recovery from a unbootable guest OS image. > > As Dan said, the 'daemon' part is separate and could be run as a standard > part of a guest install, talking over vmchannel to the host. The only real > issue I can see is adding access control to the daemon (currently it doesn't > need it and doesn't do any). Doing it this way you'd be leveraging the > ~250,000 lines of existing libguestfs code, bindings in multiple languages, > tools etc. I think it would be a nice option to allow such guest-side "daemons" to be executed in the guest context without _any_ guest-side support. 
This would be possible by building such minimal daemons that use vmchannel, and which are built for generic x86 (maybe even built for 32-bit x86 so that they can run on any x86 distro). They could execute as the init task of any guest kernel - Qemu could 'blend in / replace' the binary as the init task of the guest temporarily - and some simple bootstrap code could then start the daemon and start the real init binary (and turn off the 'blending' of the init task). That way any guest could be extended via such Qemu functionality - even without any kernel changes. Has anyone thought about (or coded) such a solution perhaps? Ingo ^ permalink raw reply [flat|nested] 390+ messages in thread
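The bootstrap Ingo describes above could look something like the sketch below. This is purely hypothetical: the daemon path `/sbin/qemu-guest-daemon` and the preserved real-init path are invented names, and the real mechanism would be a static binary spliced in by Qemu, not a script.

```shell
# Hypothetical guest-side bootstrap, run in place of init: background the
# host-integration daemon, then replace ourselves with the real init so
# the guest boots exactly as before.
start_guest_daemon() {
    daemon=${1:-/sbin/qemu-guest-daemon}   # agent speaking vmchannel to the host
    real_init=${2:-/sbin/init}             # the distribution's actual init
    "$daemon" &
    exec "$real_init"                      # hand over PID 1
}

# As PID 1 this would simply be invoked with no arguments:
#   start_guest_daemon
```

Because the hand-over is an exec, the guest distribution sees its normal init as PID 1 and needs no modification, which is the "no guest-side support" property Ingo is after.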
* Re: [RFC] Unify KVM kernel-space and user-space code into a single project 2010-03-22 13:23 ` Richard W.M. Jones 2010-03-22 14:02 ` Ingo Molnar @ 2010-03-22 14:20 ` oerg Roedel 1 sibling, 0 replies; 390+ messages in thread From: oerg Roedel @ 2010-03-22 14:20 UTC (permalink / raw) To: Richard W.M. Jones Cc: Daniel P. Berrange, Ingo Molnar, Pekka Enberg, Avi Kivity, Antoine Martin, Olivier Galibert, Anthony Liguori, Zhang, Yanmin, Peter Zijlstra, Sheng Yang, linux-kernel, kvm, Marcelo Tosatti, Jes Sorensen, Gleb Natapov, Zachary Amsden, ziteng.huang, Arnaldo Carvalho de Melo, Fr?d?ric Weisbecker On Mon, Mar 22, 2010 at 01:23:26PM +0000, Richard W.M. Jones wrote: > On Mon, Mar 22, 2010 at 01:05:13PM +0000, Daniel P. Berrange wrote: > > This is close to the way libguestfs already works. It boots QEMU/KVM pointing > > to a minimal stripped down appliance linux OS image, containing a small agent > > it talks to over some form of vmchannel/serial/virtio-serial device. Thus the > > kernel in the appliance it runs is the only thing that needs to know about the > > filesystem/lvm/dm on-disk formats - libguestfs definitely does not want to be > > duplicating this detailed knowledge of on disk format itself. It is doing > > full read-write access to the guest filesystem in offline mode - one of the > > major use cases is disaster recovery from a unbootable guest OS image. > > As Dan said, the 'daemon' part is separate and could be run as a > standard part of a guest install, talking over vmchannel to the host. > The only real issue I can see is adding access control to the daemon > (currently it doesn't need it and doesn't do any). Doing it this way > you'd be leveraging the ~250,000 lines of existing libguestfs code, > bindings in multiple languages, tools etc. I think we don't need per-guest-file access control. Probably we could apply the image-file permissions to all guestfs files. 
This would cover the use cases: * perf reading symbol information (needs ro-access only anyway) * Desktop-like host<->guest file copy I have not looked into libguestfs yet, but I guess this approach is easier to achieve. Joerg ^ permalink raw reply [flat|nested] 390+ messages in thread
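Joerg's permission model - gate every guestfs request on the caller's access to the image file itself, rather than on per-file metadata inside the guest - could be sketched as below. The helper name is made up for illustration; a real implementation would sit in the daemon's request path.

```shell
# allow_guestfs_access IMAGE MODE: permit a request iff the caller could
# open the disk image itself with the same access (r = read, w = write).
# Every file inside the image inherits this single answer.
allow_guestfs_access() {
    image=$1 mode=$2
    case $mode in
        r) [ -r "$image" ] ;;
        w) [ -w "$image" ] ;;
        *) return 1 ;;       # unknown access mode: deny
    esac
}
```

Whoever may read guest.img may then read any file a reporting tool needs, which matches the perf use case (read-only) without tracking guest-internal ACLs.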
* Re: [RFC] Unify KVM kernel-space and user-space code into a single project
  2010-03-22 13:05 ` Daniel P. Berrange
  2010-03-22 13:23 ` Richard W.M. Jones
@ 2010-03-22 13:56 ` Ingo Molnar
  2010-03-22 14:01 ` Richard W.M. Jones
  1 sibling, 1 reply; 390+ messages in thread
From: Ingo Molnar @ 2010-03-22 13:56 UTC (permalink / raw)
To: Daniel P. Berrange, Richard Jones
Cc: Pekka Enberg, Avi Kivity, Antoine Martin, Olivier Galibert,
    Anthony Liguori, Zhang, Yanmin, Peter Zijlstra, Sheng Yang,
    linux-kernel, kvm, Marcelo Tosatti, Joerg Roedel, Jes Sorensen,
    Gleb Natapov, Zachary Amsden, ziteng.huang,
    Arnaldo Carvalho de Melo, Frédéric Weisbecker

* Daniel P. Berrange <berrange@redhat.com> wrote:

> On Mon, Mar 22, 2010 at 01:54:40PM +0100, Ingo Molnar wrote:
> >
> > * Daniel P. Berrange <berrange@redhat.com> wrote:
> > >
> > > FYI, for offline guests, you can use libguestfs[1] to access & change
> > > files inside the guest, and get read-only access to running guests'
> > > files. It provides access via an interactive shell, APIs in all major
> > > languages, and also has a FUSE module to expose it directly in the
> > > host VFS. It could probably be made to work read-write for running
> > > guests too if its agent were installed inside the guest & leveraged
> > > the new Virtio-Serial channel for comms (avoiding any network setup
> > > requirements).
> > >
> > > Regards,
> > > Daniel
> > >
> > > [1] http://libguestfs.org/
> >
> > Yes, this is the kind of functionality i'm suggesting.
> >
> > I'd suggest a different implementation for live guests: to drive this
> > from within the live guest side of KVM, i.e. basically a paravirt
> > driver for guestfs. You'd pass file API calls to the guest directly,
> > via the KVM ioctl or so - and get responses from the guest.
> >
> > That will give true read-write access and completely coherent (and
> > still transparent) VFS integration, with no host-side knowledge needed
> > for the guest's low level (raw) filesystem structure. That's a big
> > advantage.
> >
> > Yes, it needs an 'aware' guest kernel - but that is a one-off
> > transition overhead whose cost is zero in the long run. (i.e. all KVM
> > kernels beyond a given version would have this ability - otherwise it's
> > guest side distribution transparent)
> >
> > Even 'offline' read-only access could be implemented by booting a
> > minimal kernel via qemu -kernel and using a 'ro' boot option. That way
> > you could eliminate all lowlevel filesystem knowledge from libguestfs.
> > You could run ext4 or btrfs guest filesystems and FAT ones as well -
> > with no restriction.
> >
> > This would allow 'offline' access to Windows images as well: a FAT or
> > ntfs enabled mini-kernel could be booted in read-only mode.
>
> This is close to the way libguestfs already works. [...]

[ Oops, you are right - sorry for not looking more closely! I was confused
  by the 'read only' aspect. ]

> [...] It boots QEMU/KVM pointing to a minimal stripped down appliance
> Linux OS image, containing a small agent it talks to over some form of
> vmchannel/serial/virtio-serial device. Thus the kernel in the appliance
> it runs is the only thing that needs to know about the filesystem/lvm/dm
> on-disk formats - libguestfs definitely does not want to be duplicating
> this detailed knowledge of on-disk formats itself. It is doing full
> read-write access to the guest filesystem in offline mode - one of the
> major use cases is disaster recovery from an unbootable guest OS image.

Just curious: any plans to extend this to include live read/write access as
well?

I.e. to have the 'agent' (guestfsd) running universally, so that tools such
as perf - and users - could rely on the VFS integration as well, not just
disaster recovery tools?

Without universal access to this feature it's not adequate for
instrumentation purposes.

One option to achieve that would be to extend Qemu to allow 'qemu daemons'
to run on the (Linux) guest side. These would be statically linked binaries
that can run on any Linux system, and which could provide various built-in
Qemu functionality from the guest side to the host side.

Thanks,

	Ingo

^ permalink raw reply	[flat|nested] 390+ messages in thread
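Ingo's 'paravirt guestfs' and 'qemu daemons' ideas both reduce to the same shape: the host serializes file API requests over some channel, and a guest-side agent answers them, so the host never needs to understand the guest's on-disk format. A minimal sketch of that request/response loop - not the real libguestfs/KVM protocol; `GuestAgent`, `host_read` and the JSON framing are all hypothetical stand-ins for whatever a virtio-serial channel would actually carry:

```python
# Illustrative sketch only: a guest-side agent answering read-only file
# requests from the host, as in the paravirt-guestfs idea discussed above.
import json


class GuestAgent:
    """Guest-side daemon; a dict stands in for the guest VFS."""

    def __init__(self, files):
        self.files = files

    def handle(self, raw):
        # In practice the request would arrive over vmchannel/virtio-serial.
        req = json.loads(raw)
        if req["op"] == "read":
            data = self.files.get(req["path"])
            if data is None:
                return json.dumps({"ok": False, "err": "ENOENT"})
            return json.dumps({"ok": True, "data": data})
        return json.dumps({"ok": False, "err": "ENOSYS"})


def host_read(agent, path):
    """Host side: ask the guest agent for a file, e.g. its /proc/kallsyms."""
    resp = json.loads(agent.handle(json.dumps({"op": "read", "path": path})))
    if not resp["ok"]:
        raise FileNotFoundError(path)
    return resp["data"]


agent = GuestAgent({"/proc/kallsyms": "ffffffff81000000 T _text\n"})
print(host_read(agent, "/proc/kallsyms"))
```

The interesting property is the one Ingo names: all filesystem knowledge stays inside the guest, while the host sees only path-level requests and responses.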
* Re: [RFC] Unify KVM kernel-space and user-space code into a single project 2010-03-22 13:56 ` Ingo Molnar @ 2010-03-22 14:01 ` Richard W.M. Jones 2010-03-22 14:07 ` Ingo Molnar 0 siblings, 1 reply; 390+ messages in thread From: Richard W.M. Jones @ 2010-03-22 14:01 UTC (permalink / raw) To: Ingo Molnar Cc: Daniel P. Berrange, Pekka Enberg, Avi Kivity, Antoine Martin, Olivier Galibert, Anthony Liguori, Zhang, Yanmin, Peter Zijlstra, Sheng Yang, linux-kernel, kvm, Marcelo Tosatti, oerg Roedel, Jes Sorensen, Gleb Natapov, Zachary Amsden, ziteng.huang, Arnaldo Carvalho de Melo, Fr?d?ric Weisbecker, libguestfs On Mon, Mar 22, 2010 at 02:56:47PM +0100, Ingo Molnar wrote: > Just curious: any plans to extend this to include live read/write access as > well? > > I.e. to have the 'agent' (guestfsd) running universally, so that > tools such as perf and by users could rely on the VFS integration as > well, not just disaster recovery tools? Totally. That's not to say there is a definite plan, but we're very open to doing this. We already wrote the daemon in such a way that it doesn't require the appliance part, but could run inside any existing guest (we've even ported bits of it to Windoze ...). The only remaining issue is how access control would be handled. You obviously wouldn't want anything in the host that can get access to the vmchannel socket to start sending destructive write commands into guests. Rich. -- Richard Jones, Virtualization Group, Red Hat http://people.redhat.com/~rjones virt-df lists disk usage of guests without needing to install any software inside the virtual machine. Supports Linux and Windows. http://et.redhat.com/~rjones/virt-df/ ^ permalink raw reply [flat|nested] 390+ messages in thread
* Re: [RFC] Unify KVM kernel-space and user-space code into a single project 2010-03-22 14:01 ` Richard W.M. Jones @ 2010-03-22 14:07 ` Ingo Molnar 0 siblings, 0 replies; 390+ messages in thread From: Ingo Molnar @ 2010-03-22 14:07 UTC (permalink / raw) To: Richard W.M. Jones Cc: Daniel P. Berrange, Pekka Enberg, Avi Kivity, Antoine Martin, Olivier Galibert, Anthony Liguori, Zhang, Yanmin, Peter Zijlstra, Sheng Yang, linux-kernel, kvm, Marcelo Tosatti, oerg Roedel, Jes Sorensen, Gleb Natapov, Zachary Amsden, ziteng.huang, Arnaldo Carvalho de Melo, Fr?d?ric Weisbecker, libguestfs * Richard W.M. Jones <rjones@redhat.com> wrote: > On Mon, Mar 22, 2010 at 02:56:47PM +0100, Ingo Molnar wrote: > > Just curious: any plans to extend this to include live read/write access as > > well? > > > > I.e. to have the 'agent' (guestfsd) running universally, so that > > tools such as perf and by users could rely on the VFS integration as > > well, not just disaster recovery tools? > > Totally. That's not to say there is a definite plan, but we're very open to > doing this. We already wrote the daemon in such a way that it doesn't > require the appliance part, but could run inside any existing guest (we've > even ported bits of it to Windoze ...). > > The only remaining issue is how access control would be handled. You > obviously wouldn't want anything in the host that can get access to the > vmchannel socket to start sending destructive write commands into guests. By default i'd suggest to put it into a maximally restricted mount point. I.e. restrict access to only the security context running libguestfs or so. ( Which in practice will be the user starting the guest, so there will be proper protection from other users while still allowing easy access to the user that has access already. ) Ingo ^ permalink raw reply [flat|nested] 390+ messages in thread
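The restriction Ingo suggests - a per-guest mount point that only the user who started the guest can enter - is ordinary Unix permissions. A small sketch, assuming a `/guests/<name>/` layout like his earlier example (the directory names are illustrative):

```python
# Sketch of the suggested default policy: a maximally restricted per-guest
# mount point (mode 0700), so only the guest's owner gets access.
import os
import stat
import tempfile


def make_guest_mountpoint(base, guest_name):
    path = os.path.join(base, guest_name)
    os.makedirs(path, mode=0o700, exist_ok=False)
    os.chmod(path, 0o700)  # be explicit in case of a permissive umask
    return path


base = tempfile.mkdtemp()  # stand-in for /guests/
mp = make_guest_mountpoint(base, "Fedora-G1")
mode = stat.S_IMODE(os.stat(mp).st_mode)
print(oct(mode))  # 0o700: owner-only access
```

In a real deployment the ownership would be set to the user (or security context) that launched the guest, which is what gives "easy access to the user that has access already" while shutting out everyone else.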
* Re: [RFC] Unify KVM kernel-space and user-space code into a single project 2010-03-22 11:48 ` Ingo Molnar 2010-03-22 12:31 ` Pekka Enberg @ 2010-03-22 12:36 ` Avi Kivity 2010-03-22 12:50 ` Pekka Enberg 1 sibling, 1 reply; 390+ messages in thread From: Avi Kivity @ 2010-03-22 12:36 UTC (permalink / raw) To: Ingo Molnar Cc: Antoine Martin, Olivier Galibert, Anthony Liguori, Pekka Enberg, Zhang, Yanmin, Peter Zijlstra, Sheng Yang, linux-kernel, kvm, Marcelo Tosatti, oerg Roedel, Jes Sorensen, Gleb Natapov, Zachary Amsden, ziteng.huang, Arnaldo Carvalho de Melo, Fr?d?ric Weisbecker On 03/22/2010 01:48 PM, Ingo Molnar wrote: > * Avi Kivity<avi@redhat.com> wrote: > > >>> My 10+ years experience with kernel instrumentation solutions is that >>> kernel-driven, self-sufficient, robust, trustable, well-enumerated sources >>> of information work far better in practice. >>> >> What about line number information? And the source? Into the kernel with >> them as well? >> > Sigh. Please read the _very first_ suggestion i made, which solves all that. I > rarely go into discussions without suggesting technical solutions - i'm not > interested in flaming, i'm interested in real solutions. > > Here it is, repeated for the Nth time: > > Allow a guest to (optionally) integrate its VFS namespace with the host side > as well. An example scheme would be: > > /guests/Fedora-G1/ > [...] You're missing something. This sub-thread is about someone launching a kernel with 'qemu -kernel', the kernel lives outside the guest disk image, they don't want a custom initrd because it's hard to make. -- error compiling committee.c: too many arguments to function ^ permalink raw reply [flat|nested] 390+ messages in thread
* Re: [RFC] Unify KVM kernel-space and user-space code into a single project 2010-03-22 12:36 ` Avi Kivity @ 2010-03-22 12:50 ` Pekka Enberg 0 siblings, 0 replies; 390+ messages in thread From: Pekka Enberg @ 2010-03-22 12:50 UTC (permalink / raw) To: Avi Kivity Cc: Ingo Molnar, Antoine Martin, Olivier Galibert, Anthony Liguori, Zhang, Yanmin, Peter Zijlstra, Sheng Yang, linux-kernel, kvm, Marcelo Tosatti, oerg Roedel, Jes Sorensen, Gleb Natapov, Zachary Amsden, ziteng.huang, Arnaldo Carvalho de Melo, Fr?d?ric Weisbecker On Mon, Mar 22, 2010 at 2:36 PM, Avi Kivity <avi@redhat.com> wrote: >> Here it is, repeated for the Nth time: >> >> Allow a guest to (optionally) integrate its VFS namespace with the host >> side >> as well. An example scheme would be: >> >> /guests/Fedora-G1/ >> > > [...] > > You're missing something. This sub-thread is about someone launching a > kernel with 'qemu -kernel', the kernel lives outside the guest disk image, > they don't want a custom initrd because it's hard to make. Well, you know, I am missing your point here about initrd. Surely the guest kernels need to use sys_mount() at some point at which time they could just tell the host kernel where they can find the mount points? But maybe we're not talking about that kind of scenario here? Pekka ^ permalink raw reply [flat|nested] 390+ messages in thread
* Re: [RFC] Unify KVM kernel-space and user-space code into a single project
  2010-03-21 21:20 ` Ingo Molnar
  2010-03-22  6:35 ` Avi Kivity
@ 2010-03-22  6:59 ` Zhang, Yanmin
  1 sibling, 0 replies; 390+ messages in thread
From: Zhang, Yanmin @ 2010-03-22 6:59 UTC (permalink / raw)
To: Ingo Molnar
Cc: Avi Kivity, Antoine Martin, Olivier Galibert, Anthony Liguori,
    Pekka Enberg, Peter Zijlstra, Sheng Yang, linux-kernel, kvm,
    Marcelo Tosatti, Joerg Roedel, Jes Sorensen, Gleb Natapov,
    Zachary Amsden, ziteng.huang, Arnaldo Carvalho de Melo,
    Frédéric Weisbecker

On Sun, 2010-03-21 at 22:20 +0100, Ingo Molnar wrote:
> * Avi Kivity <avi@redhat.com> wrote:
>
> > > Well, for what it's worth, I rarely ever use anything else. My
> > > virtual disks are raw so I can loop mount them easily, and I can also
> > > switch my guest kernels from outside... without ever needing to mount
> > > those disks.
> >
> > Curious, what do you use them for?
> >
> > btw, if you build your kernel outside the guest, then you already have
> > access to all its symbols, without needing anything further.
>
> There's two errors with your argument:
>
> 1) you are assuming that it's only about kernel symbols
>
> Look at this 'perf report' output:
>
> # Samples: 7127509216
> #
> # Overhead  Command  Shared Object      Symbol
> # ........  .......  .................  ......
> #
>     19.14%      git  git                [.] lookup_object
>     15.16%     perf  git                [.] lookup_object
>      4.74%     perf  libz.so.1.2.3      [.] inflate
>      4.52%      git  libz.so.1.2.3      [.] inflate
>      4.21%     perf  libz.so.1.2.3      [.] inflate_table
>      3.94%      git  libz.so.1.2.3      [.] inflate_table
>      3.29%      git  git                [.] find_pack_entry_one
>      3.24%      git  libz.so.1.2.3      [.] inflate_fast
>      2.96%     perf  libz.so.1.2.3      [.] inflate_fast
>      2.96%      git  git                [.] decode_tree_entry
>      2.80%     perf  libc-2.11.90.so    [.] __strlen_sse42
>      2.56%      git  libc-2.11.90.so    [.] __strlen_sse42
>      1.98%     perf  libc-2.11.90.so    [.] __GI_memcpy
>      1.71%     perf  git                [.] decode_tree_entry
>      1.53%      git  libc-2.11.90.so    [.] __GI_memcpy
>      1.48%      git  git                [.] lookup_blob
>      1.30%      git  git                [.] process_tree
>      1.30%     perf  git                [.] process_tree
>      0.90%     perf  git                [.] tree_entry
>      0.82%     perf  git                [.] lookup_blob
>      0.78%      git  [kernel.kallsyms]  [k] kstat_irqs_cpu
>
> kernel symbols are only a small portion of the symbols. (a single line in
> this case)

The above example shows that perf can summarize both kernel and application
hot functions. If we collect guest os statistics from the host side, we
can't summarize detailed guest os application info, because we can't get the
guest os's application process ids from the host side. So we can only get
detailed kernel info, plus the total utilization percentage of guest
application processes.

> To get to those other symbols we have to read the ELF symbols of those
> binaries in the guest filesystem, in the post-processing/reporting phase.
> This is both complex to do and relatively slow so we dont want to (and
> cannot) do this at sample time from IRQ context or NMI context ...
>
> Also, many aspects of reporting are interactive so it's done lazily or
> on-demand. So we need ready access to the guest filesystem - for those
> guests which decide to integrate with the host for this.

^ permalink raw reply	[flat|nested] 390+ messages in thread
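The limitation Yanmin describes can be shown with a toy aggregator: guest kernel samples can be resolved through the guest's kallsyms, but guest user-space samples carry no usable guest PID on the host side, so they can only be lumped into a single "guest user" bucket. This is a hedged illustration, not perf's actual data structures:

```python
# Sketch: host-side summarization of mixed samples. Guest-kernel and
# host samples resolve per symbol; guest user-space collapses into one
# bucket because the guest PID/mappings are invisible from the host.
from collections import Counter


def summarize(samples):
    buckets = Counter()
    for ctx, sym in samples:
        if ctx == "guest-user":
            buckets["guest user (unresolved)"] += 1  # no guest PID visible
        else:
            buckets[f"{ctx}:{sym}"] += 1
    total = sum(buckets.values())
    return {k: 100.0 * v / total for k, v in buckets.items()}


samples = [  # made-up (context, symbol) pairs
    ("guest-kernel", "__ticket_spin_lock"),
    ("guest-kernel", "__ticket_spin_lock"),
    ("guest-user", "lookup_object"),  # symbol unknowable from the host
    ("guest-user", "inflate"),
    ("host-kernel", "__lock_acquire"),
]
print(summarize(samples))
```

This mirrors the `perf kvm top` header in the thread: "guest kernel" gets a symbol-level breakdown while "guest us" appears only as a total percentage.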
* Re: [RFC] Unify KVM kernel-space and user-space code into a single project 2010-03-21 21:03 ` Avi Kivity 2010-03-21 21:20 ` Ingo Molnar @ 2010-03-22 12:05 ` Antoine Martin 1 sibling, 0 replies; 390+ messages in thread From: Antoine Martin @ 2010-03-22 12:05 UTC (permalink / raw) To: Avi Kivity Cc: Olivier Galibert, Ingo Molnar, Anthony Liguori, Pekka Enberg, Zhang, Yanmin, Peter Zijlstra, Sheng Yang, linux-kernel, kvm, Marcelo Tosatti, oerg Roedel, Jes Sorensen, Gleb Natapov, Zachary Amsden, ziteng.huang, Arnaldo Carvalho de Melo, Fr?d?ric Weisbecker [snip] >>> I believe that -kernel use will be rare, though. It's a lot >>> easier to keep everything in one filesystem. >> Well, for what it's worth, I rarely ever use anything else. My >> virtual disks are raw so I can loop mount them easily, and I can also >> switch my guest kernels from outside... without ever needing to mount >> those disks. > > Curious, what do you use them for? Various things, here is one use case which I think is under-used: read-only virtual disks with just one network application on them (no runlevels, sshd, user accounts, etc), a hell of a lot easier to maintain and secure than a full blown distro. Want a new kernel? boot a new VM and swap it for the old one with zero downtime (if your network app supports this sort of hot-swap - which a lot of cluster apps do) Another reason for wanting to keep the kernel outside is to limit the potential points of failure: remove the partition table, remove the bootloader, remove even the ramdisk. Also makes it easier to switch to another solution (say UML) or another disk driver (as someone mentioned previously). In virtualized environments I often prefer to remove the ability to load kernel modules too, for obvious reasons. Hope this helps. Antoine ^ permalink raw reply [flat|nested] 390+ messages in thread
* Re: [RFC] Unify KVM kernel-space and user-space code into a single project
  2010-03-21 20:11 ` Avi Kivity
  2010-03-21 20:18 ` Antoine Martin
@ 2010-03-21 20:37 ` Ingo Molnar
  2010-03-22  6:37 ` Avi Kivity
  1 sibling, 1 reply; 390+ messages in thread
From: Ingo Molnar @ 2010-03-21 20:37 UTC (permalink / raw)
To: Avi Kivity
Cc: Olivier Galibert, Anthony Liguori, Pekka Enberg, Zhang, Yanmin,
    Peter Zijlstra, Sheng Yang, linux-kernel, kvm, Marcelo Tosatti,
    Joerg Roedel, Jes Sorensen, Gleb Natapov, Zachary Amsden,
    ziteng.huang, Arnaldo Carvalho de Melo, Frédéric Weisbecker

* Avi Kivity <avi@redhat.com> wrote:

> On 03/21/2010 10:08 PM, Olivier Galibert wrote:
> > On Sun, Mar 21, 2010 at 10:01:51PM +0200, Avi Kivity wrote:
> > > On 03/21/2010 09:17 PM, Ingo Molnar wrote:
> > > > Adding any new daemon to an existing guest is a deployment and
> > > > usability nightmare.
> > >
> > > The logical conclusion of that is that everything should be built
> > > into the kernel. Where a failure brings the system down or worse.
> > > Where you have to bear the memory footprint whether you ever use the
> > > functionality or not. Where to update the functionality you need to
> > > deploy a new kernel (possibly introducing unrelated bugs) and reboot.
> > >
> > > If userspace daemons are such a deployment and usability nightmare,
> > > maybe we should fix that instead.
> >
> > Which userspace? Deploying *anything* in the guest can be a nightmare,
> > including paravirt drivers if you don't have a natively supported (in
> > the OS) virtual hardware fallback.
>
> That includes the guest kernel. If you can deploy a new kernel in the
> guest, presumably you can deploy a userspace package.

Note that with perf we can instrument the guest with zero guest-kernel
modifications as well.

We try to reduce the guest impact to a bare minimum, as the difficulties in
deployment are a function of the cross section surface to the guest.

Also, note that the kernel is special with regards to instrumentation: since
this is the kernel project, we are doing kernel space changes, as we are
doing them _anyway_. So adding symbol resolution capabilities would be a
minimal addition to that - while adding a whole new guest package for the
daemon would significantly increase the cross section surface.

	Ingo

^ permalink raw reply	[flat|nested] 390+ messages in thread
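The "symbol resolution capabilities" Ingo refers to are essentially what the `--guestkallsyms` option shown at the start of this thread consumes: mapping a sampled guest instruction pointer to the nearest preceding symbol in a kallsyms-style listing. A hedged sketch of that lookup (the addresses below are made up; real perf does this in C against the guest's exported symbol table):

```python
# Sketch: resolve a sampled guest address against a kallsyms-style listing,
# as perf kvm needs to do for guest kernel symbols.
import bisect


def parse_kallsyms(text):
    """Parse 'addr type name' lines into a sorted (addr, name) list."""
    syms = []
    for line in text.strip().splitlines():
        addr, _type, name = line.split()[:3]
        syms.append((int(addr, 16), name))
    syms.sort()
    return syms


def resolve(syms, ip):
    """Map an instruction pointer to the nearest preceding symbol."""
    i = bisect.bisect_right([a for a, _ in syms], ip) - 1
    return syms[i][1] if i >= 0 else "[unknown]"


kallsyms = """\
ffffffff81000000 T _text
ffffffff81020000 T __ticket_spin_lock
ffffffff81030000 T schedule
"""
syms = parse_kallsyms(kallsyms)
print(resolve(syms, 0xFFFFFFFF81020010))  # falls inside __ticket_spin_lock
```

The point of the argument above is that this lookup is cheap and self-contained, whereas resolving guest *user-space* symbols would require reading ELF binaries out of the guest filesystem - hence the whole VFS-integration debate.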
* Re: [RFC] Unify KVM kernel-space and user-space code into a single project 2010-03-21 20:37 ` Ingo Molnar @ 2010-03-22 6:37 ` Avi Kivity 2010-03-22 11:39 ` Ingo Molnar 0 siblings, 1 reply; 390+ messages in thread From: Avi Kivity @ 2010-03-22 6:37 UTC (permalink / raw) To: Ingo Molnar Cc: Olivier Galibert, Anthony Liguori, Pekka Enberg, Zhang, Yanmin, Peter Zijlstra, Sheng Yang, linux-kernel, kvm, Marcelo Tosatti, oerg Roedel, Jes Sorensen, Gleb Natapov, Zachary Amsden, ziteng.huang, Arnaldo Carvalho de Melo, Fr?d?ric Weisbecker On 03/21/2010 10:37 PM, Ingo Molnar wrote: > >> That includes the guest kernel. If you can deploy a new kernel in the >> guest, presumably you can deploy a userspace package. >> > Note that with perf we can instrument the guest with zero guest-kernel > modifications as well. > > We try to reduce the guest impact to a bare minimum, as the difficulties in > deployment are function of the cross section surface to the guest. > > Also, note that the kernel is special with regards to instrumentation: since > this is the kernel project, we are doing kernel space changes, as we are doing > them _anyway_. So adding symbol resolution capabilities would be a minimal > addition to that - while adding a while new guest package for the demon would > significantly increase the cross section surface. > It's true that for us, changing the kernel is easier than changing the rest of the guest. IMO we should still resist the temptation to go the easy path and do the right thing (I understand we disagree about what the right thing is). -- Do not meddle in the internals of kernels, for they are subtle and quick to panic. ^ permalink raw reply [flat|nested] 390+ messages in thread
* Re: [RFC] Unify KVM kernel-space and user-space code into a single project 2010-03-22 6:37 ` Avi Kivity @ 2010-03-22 11:39 ` Ingo Molnar 2010-03-22 12:44 ` Avi Kivity 0 siblings, 1 reply; 390+ messages in thread From: Ingo Molnar @ 2010-03-22 11:39 UTC (permalink / raw) To: Avi Kivity Cc: Olivier Galibert, Anthony Liguori, Pekka Enberg, Zhang, Yanmin, Peter Zijlstra, Sheng Yang, linux-kernel, kvm, Marcelo Tosatti, oerg Roedel, Jes Sorensen, Gleb Natapov, Zachary Amsden, ziteng.huang, Arnaldo Carvalho de Melo, Fr?d?ric Weisbecker * Avi Kivity <avi@redhat.com> wrote: > On 03/21/2010 10:37 PM, Ingo Molnar wrote: > > > >>That includes the guest kernel. If you can deploy a new kernel in the > >>guest, presumably you can deploy a userspace package. > > > > Note that with perf we can instrument the guest with zero guest-kernel > > modifications as well. > > > > We try to reduce the guest impact to a bare minimum, as the difficulties > > in deployment are function of the cross section surface to the guest. > > > > Also, note that the kernel is special with regards to instrumentation: > > since this is the kernel project, we are doing kernel space changes, as we > > are doing them _anyway_. So adding symbol resolution capabilities would be > > a minimal addition to that - while adding a while new guest package for > > the demon would significantly increase the cross section surface. > > It's true that for us, changing the kernel is easier than changing the rest > of the guest. IMO we should still resist the temptation to go the easy path > and do the right thing (I understand we disagree about what the right thing > is). It is not about the 'temptation to go the easy path'. It is about finding the most pragmatic approach and realizing the cost of inaction: sucky Linux, sucky KVM. 
Let me give you an example: Linus's commit in v2.6.30 that changed the
user-space policy of the EXT3 filesystem to make it more desktop capable:

  bbae8bc: ext3: make default data ordering mode configurable

That change was opposed vehemently with your kind of arguments: "such changes
should be done by the distributions", "it should be done correctly", "the
kernel should not implement policy", etc. I can also tell you that this
commit improved my desktop experience incredibly. Still, distros didnt do it
for almost a decade of ext3 existence. Why?

Truth is that those kinds of "do it right" arguments are mistaken because
they assume that we live in an ideal, 'perfect market' where all
inefficiencies will get eliminated in the long run. In reality the "market"
for OSS software is imperfect:

 - there's marginal costs of action - a too small change has difficulty
   getting over that

 - there's costs of modularization (which are both technical and social)

 - there's the power of the status quo acting against marginally good
   changes

 - there's the power of entropy ripping Linux distributions apart, making
   all-distro changes harder

So the solution to the "why dont the distributions do this" question you
pose is exactly what i propose: _give a default, reference implementation of
KVM tooling that has to be eclipsed_.

There's the unique position of the kernel that it can impose sanity in a
more central way, which acts as a reference implementation. I.e. the kernel
can very much improve quality all across the board by providing a sane
default (in the ext3 case) - or, as in the case of perf, by providing a sane
'baseline' tooling. It should do the same for KVM as well.

If we dont do that, Linux will eventually stop mattering on the desktop -
and some time after that, it will vanish from the server space as well.
Then, may it be a decade down the line, you wont have a KVM hacking job
left, and you wont know where all those forces eliminating your project
came from.
But i told you now so you'll know ;-) Reality is, the server space never was and never will be self-sustaining in the long run (as Novell has found it out with Netware), it is the desktop that dictates future markets. This is why i find your views about this naive and shortsighted. Thanks, Ingo ^ permalink raw reply [flat|nested] 390+ messages in thread
* Re: [RFC] Unify KVM kernel-space and user-space code into a single project
  2010-03-22 11:39 ` Ingo Molnar
@ 2010-03-22 12:44 ` Avi Kivity
  2010-03-22 12:54 ` Daniel P. Berrange
  2010-03-22 14:26 ` Ingo Molnar
  0 siblings, 2 replies; 390+ messages in thread
From: Avi Kivity @ 2010-03-22 12:44 UTC (permalink / raw)
To: Ingo Molnar
Cc: Olivier Galibert, Anthony Liguori, Pekka Enberg, Zhang, Yanmin,
    Peter Zijlstra, Sheng Yang, linux-kernel, kvm, Marcelo Tosatti,
    Joerg Roedel, Jes Sorensen, Gleb Natapov, Zachary Amsden,
    ziteng.huang, Arnaldo Carvalho de Melo, Frédéric Weisbecker

On 03/22/2010 01:39 PM, Ingo Molnar wrote:
> Reality is, the server space never was and never will be self-sustaining
> in the long run (as Novell has found it out with Netware), it is the
> desktop that dictates future markets. This is why i find your views about
> this naive and shortsighted.

Yet Linux is gaining ground in the server and embedded space while
struggling on the desktop. Apple is gaining ground on the desktop but is
invisible on the server side (despite having a nice product - Xserve).

It's true Windows achieved server dominance through its desktop power, but I
don't think that's what's keeping them there now.

In any case, I'm not going to write a kvm GUI. It doesn't match my skills,
interest, or my employer's interest. If you wish to see a kvm GUI you have
to write one yourself or convince someone to write it (perhaps convince Red
Hat to fund such an effort beyond virt-manager).

--
error compiling committee.c: too many arguments to function

^ permalink raw reply	[flat|nested] 390+ messages in thread
* Re: [RFC] Unify KVM kernel-space and user-space code into a single project
  2010-03-22 12:44 ` Avi Kivity
@ 2010-03-22 12:54 ` Daniel P. Berrange
  0 siblings, 0 replies; 390+ messages in thread
From: Daniel P. Berrange @ 2010-03-22 12:54 UTC (permalink / raw)
To: Avi Kivity
Cc: Ingo Molnar, Olivier Galibert, Anthony Liguori, Pekka Enberg,
    Zhang, Yanmin, Peter Zijlstra, Sheng Yang, linux-kernel, kvm,
    Marcelo Tosatti, Joerg Roedel, Jes Sorensen, Gleb Natapov,
    Zachary Amsden, ziteng.huang, Arnaldo Carvalho de Melo,
    Frédéric Weisbecker

On Mon, Mar 22, 2010 at 02:44:57PM +0200, Avi Kivity wrote:
> On 03/22/2010 01:39 PM, Ingo Molnar wrote:
> > Reality is, the server space never was and never will be
> > self-sustaining in the long run (as Novell has found it out with
> > Netware), it is the desktop that dictates future markets. This is why i
> > find your views about this naive and shortsighted.
>
> Yet Linux is gaining ground in the server and embedded space while
> struggling on the desktop. Apple is gaining ground on the desktop but is
> invisible on the server side (despite having a nice product - Xserve).
>
> It's true Windows achieved server dominance through its desktop power,
> but I don't think that's what's keeping them there now.
>
> In any case, I'm not going to write a kvm GUI. It doesn't match my
> skills, interest, or my employer's interest. If you wish to see a kvm GUI
> you have to write one yourself or convince someone to write it (perhaps
> convince Red Hat to fund such an effort beyond virt-manager).

It is planned to add support for SPICE remote desktop to virt-manager once
that matures & is accepted into upstream KVM/QEMU. That will improve the
guest/desktop interaction in many ways compared to VNC or SDL, with improved
display resolution changing, copy+paste between host & guest, much better
graphics performance, etc.

Development efforts aren't totally ignoring the desktop; it is more that
they are focusing on remoting guest desktops, rather than interaction with
the host desktop, since that's where a lot of the demand is. This benefits
single-host desktop scenarios too, since there's a lot of overlap in the
problems faced there.

Regards,
Daniel
--
|: Red Hat, Engineering, London   -o-   http://people.redhat.com/berrange/ :|
|: http://libvirt.org -o- http://virt-manager.org -o- http://deltacloud.org :|
|: http://autobuild.org       -o-        http://search.cpan.org/~danberr/ :|
|: GnuPG: 7D3B9505  -o-  F3C9 553F A1DA 4AC2 5648 23C1 B3DF F742 7D3B 9505 :|

^ permalink raw reply	[flat|nested] 390+ messages in thread
* Re: [RFC] Unify KVM kernel-space and user-space code into a single project
  2010-03-22 12:44 ` Avi Kivity
  2010-03-22 12:54 ` Daniel P. Berrange
@ 2010-03-22 14:26 ` Ingo Molnar
  2010-03-22 17:29 ` Avi Kivity
  1 sibling, 1 reply; 390+ messages in thread
From: Ingo Molnar @ 2010-03-22 14:26 UTC (permalink / raw)
To: Avi Kivity
Cc: Olivier Galibert, Anthony Liguori, Pekka Enberg, Zhang, Yanmin,
    Peter Zijlstra, Sheng Yang, linux-kernel, kvm, Marcelo Tosatti,
    Joerg Roedel, Jes Sorensen, Gleb Natapov, Zachary Amsden,
    ziteng.huang, Arnaldo Carvalho de Melo, Frédéric Weisbecker

* Avi Kivity <avi@redhat.com> wrote:

> On 03/22/2010 01:39 PM, Ingo Molnar wrote:
> >
> > Reality is, the server space never was and never will be
> > self-sustaining in the long run (as Novell has found it out with
> > Netware), it is the desktop that dictates future markets. This is why i
> > find your views about this naive and shortsighted.
>
> Yet Linux is gaining ground in the server and embedded space while
> struggling on the desktop. [...]

Frankly, Linux is mainly growing in the server space due to:

 1) the server space is technically much simpler than the desktop space. It
    is far easier to code up a server performance feature than to struggle
    through stupid (server-motivated) package boundaries and get something
    done on the desktop. It is far easier to code up a server app as that
    space is well standardized and servers tend to be compartmented.
    Integration between server apps is much less common than integration
    between desktop apps, hence our modularization idiocies cause less harm
    there.

 2) Linux's growth is still feeding on the remains of the destruction of
    Unix.

Linux is struggling on the desktop due to the desktop's inherent complexity,
due to the lack of the Unix inertia and due to incompetence, insensitivity,
intellectual arrogance and shortsightedness of server-centric thinking, like
your arguments/position displayed in this very thread.

> [...] Apple is gaining ground on the desktop but is invisible on the
> server side (despite having a nice product - Xserve).

But the thing is, Apple doesnt really care about the server space, yet. It
is lucrative but it is a side-show: it will fall automatically to the
'winner' of the desktop (or gadget) of tomorrow.

Has the quick fall of Banyan Vines or Netware (both excellent all-around
server products) taught you nothing?

We need a lot more desktop focus in the kernel community. The best method to
achieve this, that i know of currently, is to simply have kernel developers
think outside the kernel box and to have them do bits of user-space coding
as well - and in particular desktop coding. To eat our own dogfood in
essence. To suffer through the crap we cause to user-space. To face the
_real_ difficulties of users. We seem to have forgotten our roots.

> [...]
>
> It's true Windows achieved server dominance through its desktop power,
> but I don't think that's what's keeping them there now.

What is keeping them there is precisely that.

> In any case, I'm not going to write a kvm GUI. It doesn't match my
> skills, interest, or my employer's interest. If you wish to see a kvm GUI
> you have to write one yourself or convince someone to write it (perhaps
> convince Red Hat to fund such an effort beyond virt-manager).

As a maintainer you certainly dont have to write a single line of code, if
you dont want to. You 'just' need to care about the big picture and
encourage/help the flow and balance of the whole project.

	Ingo

^ permalink raw reply	[flat|nested] 390+ messages in thread
* Re: [RFC] Unify KVM kernel-space and user-space code into a single project 2010-03-22 14:26 ` Ingo Molnar @ 2010-03-22 17:29 ` Avi Kivity 0 siblings, 0 replies; 390+ messages in thread From: Avi Kivity @ 2010-03-22 17:29 UTC (permalink / raw) To: Ingo Molnar Cc: Olivier Galibert, Anthony Liguori, Pekka Enberg, Zhang, Yanmin, Peter Zijlstra, Sheng Yang, linux-kernel, kvm, Marcelo Tosatti, oerg Roedel, Jes Sorensen, Gleb Natapov, Zachary Amsden, ziteng.huang, Arnaldo Carvalho de Melo, Fr?d?ric Weisbecker On 03/22/2010 04:26 PM, Ingo Molnar wrote: > * Avi Kivity<avi@redhat.com> wrote: > > >> On 03/22/2010 01:39 PM, Ingo Molnar wrote: >> >>> Reality is, the server space never was and never will be self-sustaining >>> in the long run (as Novell has found it out with Netware), it is the >>> desktop that dictates future markets. This is why i find your views about >>> this naive and shortsighted. >>> >> Yet Linux is gaining ground in the server and embedded space while >> struggling on the desktop. [...] >> > Frankly, Linux is mainly growing in the server space due to: > > 1) the server space is technically much simpler than the desktop space. It > is far easier to code up a server performance feature than to make > struggle through stupid (server-motivated) package boundaries and get > something done on the desktop. It is far easier to code up a server app > as that space is well standardized and servers tend to be compartmented. > Integration between server apps is much less common than integration > between desktop apps, hence the harm that our modularization idiocies > cause less harm. > > 2) Linux's growth is still feeding on the remains of the destruction of Unix. > Agreed (minus the 'package boundaries' stuff). Also, Linux is cheaper than Windows. 
> Linux is struggling on the desktop due to the desktop's inherent complexity,
> due to the lack of the Unix inertia and due to the incompetence,
> insensitivity, intellectual arrogance and shortsightedness of server-centric
> thinking, like your arguments/position displayed in this very thread.

It's struggling because it isn't competitive technically with other desktops,
because there is no application base, because of a chicken-and-egg problem
with some drivers, because the lack of a stable ABI means you can't get a
driver CD with your device so you need a yet-unreleased kernel, because the
zillion binary-incompatible distributions mean that application developers
don't know what to code and test for, and because of a lack of documentation,
to name a few. At least it's improving all the time.

The incompetence, insensitivity, intellectual arrogance and shortsightedness
of server-centric thinking of my arguments/position are a result of this, not
the cause.

>> [...] Apple is gaining ground on the desktop but is invisible on the server
>> side (despite having a nice product - Xserve).
>
> But the thing is, Apple doesnt really care about the server space, yet. It is
> lucrative but it is a side-show: it will fall automatically to the 'winner'
> of the desktop (or gadget) of tomorrow.

It won't automatically fall to Apple; there are tons of middleware and server
apps that need porting (the "ecosystem"), plus they need to work hard on
improving their kernel, which is desktop oriented. It looks like they're
interested in other things.

> Has the quick fall of Banyan Vines or Netware (both excellent all-around
> server products) taught you nothing?

I'm not familiar with Banyan, but wasn't Netware a cooperative-multitasking,
command-line-only thing? It couldn't compete with a preemptive modern system
with a nice GUI. Windows didn't need the desktop to win that fight.

> We need a lot more desktop focus in the kernel community.
> The best method to achieve this, that i know of currently, is to simply have
> kernel developers think outside the kernel box and to have them do bits of
> user-space coding as well - and in particular desktop coding. To eat our own
> dogfood in essence. Suffer through crap we cause to user-space. To face the
> _real_ difficulties of users. We seem to have forgotten our roots.

Try it yourself and report the experience. Note: perf is not desktop
development, it's kernel tooling development.

>> [...]
>>
>> It's true Windows achieved server dominance through its desktop power, but
>> I don't think that's what's keeping them there now.
>
> What is keeping them there is precisely that.

Not at all. They have excellent development tools and lots of middleware and
other third-party products that make it easy to pick Windows. For example,
Exchange is more or less the standard for groupware, and they made C# and the
technology around it easy to develop for, learning from Java's mistakes.

>> In any case, I'm not going to write a kvm GUI. It doesn't match my skills,
>> interest, or my employer's interest. If you wish to see a kvm GUI you have
>> to write one yourself or convince someone to write it (perhaps convince Red
>> Hat to fund such an effort beyond virt-manager).
>
> As a maintainer you certainly dont have to write a single line of code, if
> you dont want to. You 'just' need to care about the big picture and
> encourage/help the flow and balance of the whole project.

I haven't written that line of code, and no one else has either. Don't tell me
they're all scared of me.

--
error compiling committee.c: too many arguments to function
* Re: [RFC] Unify KVM kernel-space and user-space code into a single project
From: Avi Kivity @ 2010-03-21 20:11 UTC
To: Olivier Galibert, Ingo Molnar, Anthony Liguori, Pekka Enberg, Zhang, Yanmin

On 03/21/2010 10:08 PM, Olivier Galibert wrote:
> On Sun, Mar 21, 2010 at 10:01:51PM +0200, Avi Kivity wrote:
>> On 03/21/2010 09:17 PM, Ingo Molnar wrote:
>>> Adding any new daemon to an existing guest is a deployment and usability
>>> nightmare.
>>
>> The logical conclusion of that is that everything should be built into
>> the kernel. Where a failure brings the system down or worse. Where you
>> have to bear the memory footprint whether you ever use the functionality
>> or not. Where to update the functionality you need to deploy a new
>> kernel (possibly introducing unrelated bugs) and reboot.
>>
>> If userspace daemons are such a deployment and usability nightmare,
>> maybe we should fix that instead.
>
> Which userspace? Deploying *anything* in the guest can be a
> nightmare, including paravirt drivers if you don't have a virtual
> hardware backoff natively supported in the OS.

That includes the guest kernel. If you can deploy a new kernel in the guest,
presumably you can deploy a userspace package.

> Deploying things in the host OTOH is business as usual.

True.

> And you're smart enough to know that.

Thanks.

--
Do not meddle in the internals of kernels, for they are subtle and quick to panic.
* Re: [RFC] Unify KVM kernel-space and user-space code into a single project
From: Ingo Molnar @ 2010-03-21 20:31 UTC
To: Avi Kivity
Cc: Anthony Liguori, Pekka Enberg, Zhang, Yanmin, Peter Zijlstra, Sheng Yang, linux-kernel, kvm, Marcelo Tosatti, Joerg Roedel, Jes Sorensen, Gleb Natapov, Zachary Amsden, ziteng.huang, Arnaldo Carvalho de Melo, Frédéric Weisbecker

* Avi Kivity <avi@redhat.com> wrote:

> On 03/21/2010 09:17 PM, Ingo Molnar wrote:
>>
>> Adding any new daemon to an existing guest is a deployment and usability
>> nightmare.
>
> The logical conclusion of that is that everything should be built into the
> kernel. [...]

Only if you apply it as a totalitarian rule.

Furthermore, the logical conclusion of _your_ line of argument (applied in a
totalitarian manner) is that 'nothing should be built into the kernel'. I.e.
you are arguing for microkernel Linux, while you see me as arguing for a
monolithic kernel.

Reality is that we are somewhere in between; we are neither black nor white:
it's shades of grey.

If we want to do a good job with all this then we observe subsystems, we see
how they relate to the physical world and decide about how to shape them. We
identify long-term changes and re-design modularization boundaries in
hindsight - when we got them wrong initially. We dont try to rationalize the
status quo.

Lets see one example of that thought process in action: Oprofile.

We saw that the modularization of oprofile was a total nightmare: a separate
kernel-space and a separate user-space component, which was in constant
version friction. The ABI between them was stifling: it was hard to change it
(you needed to trickle changes through the tool as well, which was on a
different release schedule, etc. etc.).
The result was sucky usability that never went beyond some basic 'you can do
profiling' threshold. The subsystem worked well within that design box, and it
was worked on by highly competent people - but it was still far, far away from
the potential it could have achieved.

So we observed those problems and decided to do something about it:

 - We unified the two parts into a single maintenance domain. There's
   the kernel side in kernel/perf_event.c and arch/*/*/perf_event.c,
   plus the user side in tools/perf/. The two are connected by a very
   flexible, forwards and backwards compatible ABI.

 - We moved much more code into the kernel, realizing that transparent
   and robust instrumentation should be offered instead of punting
   abstractions into user-space (which is in a disadvantaged position
   to implement system-wide abstractions).

 - We created a no-bullsh*t approach to usability. perf is by no means
   perfect, but it's written by developers for developers, and if you report
   a bug to us we'll act on it before anything else. Furthermore the kernel
   developers do the user-space coding as well, so there's no Chinese
   wall separating them. Kernel-space becomes aware of the intricacies of
   user-space and user-space developers become aware of the difficulties of
   kernel-space as well. It's a good mix in our experience.

The thing is (and i doubt you are surprised that i say that), i see a similar
situation with KVM. The basic parameters are comparable to Oprofile: it has a
kernel-space component and a KVM-specific user-space. By all practical means
the two are one and the same, but are maintained as different projects.

I have followed KVM since its inception with great interest. I saw its good
initial design, i tried it early on and even wrote various patches for it. So
i care more about KVM than a random observer would, but this preference and
passion for KVM's good technical sides does not cloud my judgement when it
comes to its weaknesses.
In fact the weaknesses are far more important to identify and express
publicly, so i tend to concentrate on them. Dont take this as me blasting KVM;
we both know the many good aspects of KVM.

So, as i explained earlier in greater detail, the modularization of KVM into a
separate kernel-space and user-space component is one of its worst current
weaknesses, and it has become the main stifling force in the way of a better
KVM experience for users.

That, IMO, is the 'weakest link' of KVM today, and no matter how well the rest
of KVM gets improved, those nice bits all get unfairly ignored when the user
cannot have a usable and good desktop experience and thinks that KVM is
crappy.

I think you should think outside the initial design box you have created 4
years ago, you should consider iterating the model and you should consider the
alternative i suggested: move (or create) KVM tooling to tools/kvm/ and treat
it as a single project from there on.

	Thanks,

	Ingo
* Re: [RFC] Unify KVM kernel-space and user-space code into a single project
From: Avi Kivity @ 2010-03-21 21:30 UTC
To: Ingo Molnar
Cc: Anthony Liguori, Pekka Enberg, Zhang, Yanmin, Peter Zijlstra, Sheng Yang, linux-kernel, kvm, Marcelo Tosatti, Joerg Roedel, Jes Sorensen, Gleb Natapov, Zachary Amsden, ziteng.huang, Arnaldo Carvalho de Melo, Frédéric Weisbecker

On 03/21/2010 10:31 PM, Ingo Molnar wrote:
> * Avi Kivity <avi@redhat.com> wrote:
>
>> On 03/21/2010 09:17 PM, Ingo Molnar wrote:
>>
>>> Adding any new daemon to an existing guest is a deployment and usability
>>> nightmare.
>>
>> The logical conclusion of that is that everything should be built into the
>> kernel. [...]
>
> Only if you apply it as a totalitarian rule.
>
> Furthermore, the logical conclusion of _your_ line of argument (applied in a
> totalitarian manner) is that 'nothing should be built into the kernel'.

I'm certainly a minimalist, but that doesn't follow. Things that require
privileged access, or access to the page cache, or that can't be made to
perform otherwise should certainly be in the kernel. That's why I submitted
kvm for inclusion in the first place. If it's something that can work just as
well in userspace but we can't be bothered to fix any 'deployment
nightmares', then it shouldn't be in the kernel. Examples include lvm2 and
mdadm (which truly are 'deployment nightmares' - you need to start them
before you have access to your filesystem - yet they work somehow).

> I.e. you are arguing for microkernel Linux, while you see me as arguing for
> a monolithic kernel.

No. I'm arguing for reducing bloat wherever possible. Kernel code is more
expensive than userspace code in every metric possible.

> Reality is that we are somewhere in between; we are neither black nor white:
> it's shades of grey.
> If we want to do a good job with all this then we observe subsystems, we see
> how they relate to the physical world and decide about how to shape them. We
> identify long-term changes and re-design modularization boundaries in
> hindsight - when we got them wrong initially. We dont try to rationalize the
> status quo.

I'm not for the status quo either - I'm for reducing the kernel code footprint
wherever it doesn't impact performance or break clean interfaces.

> Lets see one example of that thought process in action: Oprofile.
>
> We saw that the modularization of oprofile was a total nightmare: a separate
> kernel-space and a separate user-space component, which was in constant
> version friction. The ABI between them was stifling: it was hard to change
> it (you needed to trickle changes through the tool as well, which was on a
> different release schedule, etc. etc.).
>
> The result was sucky usability that never went beyond some basic 'you can do
> profiling' threshold. The subsystem worked well within that design box, and
> it was worked on by highly competent people - but it was still far, far away
> from the potential it could have achieved.
>
> So we observed those problems and decided to do something about it:
>
>  - We unified the two parts into a single maintenance domain. There's
>    the kernel side in kernel/perf_event.c and arch/*/*/perf_event.c,
>    plus the user side in tools/perf/. The two are connected by a very
>    flexible, forwards and backwards compatible ABI.

That's useful because perf is still small. If it were a full-fledged 350-KLOC
GUI, then most of the development would concentrate on the GUI and very little
(relatively) would have to do with the kernel. Qemu is in that state today.
Please, please look at the recent commits and check how many actually have
anything to do with kvm, and how many with everything else.
>  - We moved much more code into the kernel, realizing that transparent
>    and robust instrumentation should be offered instead of punting
>    abstractions into user-space (which is in a disadvantaged position
>    to implement system-wide abstractions).

No argument. I have a similar experience with kvm. The user/kernel break is at
the cpu virtualization level - that is, kvm is solely responsible for
emulating a cpu and userspace is responsible for emulating devices. An
exception was made for the PIC/IOAPIC/PIT due to performance considerations -
they are emulated in the kernel as well.

A common FAQ is why we do not emulate real-mode instructions in qemu. The
answer is that the interface to kvm would be insane - it would emulate a
partial cpu, and all other users of that interface would have to implement an
emulator (there is also a practical argument - the qemu emulator does not
implement atomics correctly wrt other threads).

>  - We created a no-bullsh*t approach to usability. perf is by no means
>    perfect, but it's written by developers for developers and if you report
>    a bug to us we'll act on it before anything else. Furthermore the kernel
>    developers do the user-space coding as well, so there's no Chinese
>    wall separating them. Kernel-space becomes aware of the intricacies of
>    user-space and user-space developers become aware of the difficulties of
>    kernel-space as well. It's a good mix in our experience.

Excellent. However qemu is written by developers for their users, and their
users are not worried about an eject button in the qemu SDL interface, or
about running the qemu command line by hand. They have complicated management
interfaces that do everything, so we concentrate, for example, on a robust RPC
interface for qemu. That means nothing for command-line users but is critical
for our users.
I am not _against_ excellent support for command-line users, but I am not
going to divert the resources I control (=me) into something that is not
needed by my users. I encourage anyone who wants to improve usability to
subscribe to qemu-devel and contribute; they will receive a warm welcome.

> The thing is (and i doubt you are surprised that i say that), i see a
> similar situation with KVM. The basic parameters are comparable to Oprofile:
> it has a kernel-space component and a KVM-specific user-space. By all
> practical means the two are one and the same, but are maintained as
> different projects.

There is tight cooperation between the maintainers and developers of these two
projects. Most developers are subscribed to both mailing lists and many have
contributed to both repositories. There does not appear to be a problem with
release schedules.

> I have followed KVM since its inception with great interest. I saw its good
> initial design, i tried it early on and even wrote various patches for it.
> So i care more about KVM than a random observer would, but this preference
> and passion for KVM's good technical sides does not cloud my judgement when
> it comes to its weaknesses.
>
> In fact the weaknesses are far more important to identify and express
> publicly, so i tend to concentrate on them. Dont take this as me blasting
> KVM, we both know the many good aspects of KVM.
>
> So, as i explained it earlier in greater detail the modularization of KVM
> into a separate kernel-space and user-space component is one of its worst
> current weaknesses, and it has become the main stifling force in the way of
> a better KVM experience to users.
>
> That, IMO, is the 'weakest link' of KVM today and no matter how well the
> rest of KVM gets improved those nice bits all get unfairly ignored when the
> user cannot have a usable and good desktop experience and thinks that KVM is
> crappy.

Thanks.
I agree the user experience when launching qemu from the command line is miles
behind virtualbox and vmware workstation. What I disagree with is that this is
how a typical user will first experience kvm - most distributions now
integrate virt-manager, which allows much better graphical interaction.
Unfortunately, virt-manager is still server-oriented (for example, it uses VNC
instead of displaying directly to X), and is hardly polished to the same level
as commercial tools. However, you cannot force someone to write good desktop
integration for qemu; it has to come from someone with the itch, the
experience, the capability, and the time.

> I think you should think outside the initial design box you have created 4
> years ago, you should consider iterating the model and you should consider
> the alternative i suggested: move (or create) KVM tooling to tools/kvm/ and
> treat it as a single project from there on.

Do you really think that tools/kvm/ would create a good GUI for kvm? lkml is
hardly the place where GUI developers and designers congregate. Please, if any
of you GUI experts are reading this, please consider contributing to qemu
directly.

--
Do not meddle in the internals of kernels, for they are subtle and quick to panic.
* Re: [RFC] Unify KVM kernel-space and user-space code into a single project
From: Ingo Molnar @ 2010-03-21 21:52 UTC
To: Avi Kivity
Cc: Anthony Liguori, Pekka Enberg, Zhang, Yanmin, Peter Zijlstra, Sheng Yang, linux-kernel, kvm, Marcelo Tosatti, Joerg Roedel, Jes Sorensen, Gleb Natapov, Zachary Amsden, ziteng.huang, Arnaldo Carvalho de Melo, Frédéric Weisbecker

* Avi Kivity <avi@redhat.com> wrote:

>> I.e. you are arguing for microkernel Linux, while you see me as arguing
>> for a monolithic kernel.
>
> No. I'm arguing for reducing bloat wherever possible. Kernel code is more
> expensive than userspace code in every metric possible.

1)

One of the primary design arguments of the micro-kernel design as well was to
push as much into user-space as possible without impacting performance too
much, so you very much seem to be arguing for a micro-kernel design for the
kernel. I think history has given us the answer for that fight between
microkernels and monolithic kernels.

Furthermore, to not engage in hypotheticals about microkernels: by your
argument the Oprofile design was perfect (it was a minimalistic kernel-space,
with all the complexity in user-space), while perf was over-complex (it does
many things in the kernel that could have been done in user-space).

Practical results suggest the exact opposite happened - Oprofile is being
replaced by perf. How do you explain that?

2)

In your analysis you again ignore the package boundary costs and artifacts as
if they didnt exist.

That was my main argument, and that is what we saw with oprofile and perf:
while maintaining more kernel code may be more expensive, it sure pays off by
getting us a much better solution in the end.

And getting a 'much better solution' to users is the goal of all this, isnt
it?
I dont mind what you call 'bloat' per se if it's for a purpose that users
consider a good deal. I have quite a bit of RAM in most of my systems; having
50K more or less included in the kernel image is far less important than
having a healthy and vibrant development model and having satisfied users ...

	Ingo
* Re: [RFC] Unify KVM kernel-space and user-space code into a single project
From: Avi Kivity @ 2010-03-22 6:49 UTC
To: Ingo Molnar
Cc: Anthony Liguori, Pekka Enberg, Zhang, Yanmin, Peter Zijlstra, Sheng Yang, linux-kernel, kvm, Marcelo Tosatti, Joerg Roedel, Jes Sorensen, Gleb Natapov, Zachary Amsden, ziteng.huang, Arnaldo Carvalho de Melo, Frédéric Weisbecker

On 03/21/2010 11:52 PM, Ingo Molnar wrote:
> * Avi Kivity <avi@redhat.com> wrote:
>
>>> I.e. you are arguing for microkernel Linux, while you see me as arguing
>>> for a monolithic kernel.
>>
>> No. I'm arguing for reducing bloat wherever possible. Kernel code is more
>> expensive than userspace code in every metric possible.
>
> 1)
>
> One of the primary design arguments of the micro-kernel design as well was
> to push as much into user-space as possible without impacting performance
> too much, so you very much seem to be arguing for a micro-kernel design for
> the kernel.
>
> I think history has given us the answer for that fight between microkernels
> and monolithic kernels.

I am not arguing for a microkernel. Again: reduce bloat where possible;
kernel code is more expensive than userspace code.

> Furthermore, to not engage in hypotheticals about microkernels: by your
> argument the Oprofile design was perfect (it was minimalistic kernel-space,
> with all the complexity in user-space), while perf was over-complex (which
> does many things in the kernel that could have been done in user-space).
>
> Practical results suggest the exact opposite happened - Oprofile is being
> replaced by perf. How do you explain that?

I did not say that the amount of kernel and userspace code is the only factor
deciding the quality of software. If that were so, microkernels would have won
out long ago.
It may be that perf has too much kernel code, and won against oprofile despite
that because it was better in other areas. Or it may be that perf has exactly
the right user/kernel division. Or maybe perf needs some of the code moved
from userspace to the kernel. I don't know; I haven't examined the code.

The user/kernel boundary is only one metric for code quality. Nor is it always
in favour of pushing things to userspace. Narrowing or simplifying an
interface is often an argument in favour of pushing things into the kernel.

IMO the reason perf is more usable than oprofile has less to do with the
kernel/userspace boundary and more to do with the effort and attention spent
on the userspace/user boundary.

> 2)
>
> In your analysis you again ignore the package boundary costs and artifacts
> as if they didnt exist.
>
> That was my main argument, and that is what we saw with oprofile and perf:
> while maintaining more kernel code may be more expensive, it sure pays off
> by getting us a much better solution in the end.

Package costs are real. We need to bear them. I don't think that because
maintaining another package (and the interface between two packages) is more
difficult, the kernel size should increase.

> And getting a 'much better solution' to users is the goal of all this, isnt
> it?
>
> I dont mind what you call 'bloat' per se if it's for a purpose that users
> consider a good deal. I have quite a bit of RAM in most of my systems;
> having 50K more or less included in the kernel image is far less important
> than having a healthy and vibrant development model and having satisfied
> users ...

I'm not worried about 50K or so; I'm worried about a bug in those 50K taking
down the guest.

--
Do not meddle in the internals of kernels, for they are subtle and quick to panic.
* Re: [RFC] Unify KVM kernel-space and user-space code into a single project
From: Ingo Molnar @ 2010-03-22 11:23 UTC
To: Avi Kivity
Cc: Anthony Liguori, Pekka Enberg, Zhang, Yanmin, Peter Zijlstra, Sheng Yang, linux-kernel, kvm, Marcelo Tosatti, Joerg Roedel, Jes Sorensen, Gleb Natapov, Zachary Amsden, ziteng.huang, Arnaldo Carvalho de Melo, Frédéric Weisbecker

* Avi Kivity <avi@redhat.com> wrote:

> IMO the reason perf is more usable than oprofile has less to do with the
> kernel/userspace boundary and more to do with the effort and attention
> spent on the userspace/user boundary.
>
> [...]

If you are interested in the first-hand experience of the people who are doing
the perf work, then here it is: by far the biggest reason for perf's success
and usability is the integration of the user-space tooling with the
kernel-space bits, into a single repository and project. The very move you are
opposing so vehemently for KVM.

Oprofile went the way you proposed, and it was a failure. It failed not
because it was bad technology (it was pretty decent and people used it); it
was not a failure because the wrong people worked on it (to the contrary, very
capable people worked on it); it was a failure in hindsight because it was
simply incorrectly split into two projects which stifled each other's
progress.

Obviously 3 years ago you'd have seen a similar, big "Oprofile is NOT broken!"
flamewar, had i posted the same observations about Oprofile that i expressed
about KVM here. (In fact there was a similar, big flamewar about all this when
perf was posted a year ago.)

And yes, (as you are aware) i see very similar patterns of inefficiency in the
KVM/Qemu tooling relationship as well, hence i expressed my views about it.

	Thanks,

	Ingo
* Re: [RFC] Unify KVM kernel-space and user-space code into a single project
From: Avi Kivity @ 2010-03-22 12:49 UTC
To: Ingo Molnar
Cc: Anthony Liguori, Pekka Enberg, Zhang, Yanmin, Peter Zijlstra, Sheng Yang, linux-kernel, kvm, Marcelo Tosatti, Joerg Roedel, Jes Sorensen, Gleb Natapov, Zachary Amsden, ziteng.huang, Arnaldo Carvalho de Melo, Frédéric Weisbecker

On 03/22/2010 01:23 PM, Ingo Molnar wrote:
> * Avi Kivity <avi@redhat.com> wrote:
>
>> IMO the reason perf is more usable than oprofile has less to do with the
>> kernel/userspace boundary and more to do with the effort and attention
>> spent on the userspace/user boundary.
>>
>> [...]
>
> If you are interested in the first-hand experience of the people who are
> doing the perf work, then here it is: by far the biggest reason for perf's
> success and usability is the integration of the user-space tooling with the
> kernel-space bits, into a single repository and project.

Please take a look at the kvm integration code in qemu as a fraction of the
whole code base.

> The very move you are opposing so vehemently for KVM.

I don't want to fracture a working community.

> Oprofile went the way you proposed, and it was a failure. It failed not
> because it was bad technology (it was pretty decent and people used it); it
> was not a failure because the wrong people worked on it (to the contrary,
> very capable people worked on it); it was a failure in hindsight because it
> was simply incorrectly split into two projects which stifled each other's
> progress.

Every project that has some kernel footprint, except perf, is split like
that. Are they all failures?

Seems like perf is also split, with sysprof being developed outside the
kernel. Will you bring sysprof into the kernel?
Will every feature be duplicated in perf and sysprof?

--
error compiling committee.c: too many arguments to function
* Re: [RFC] Unify KVM kernel-space and user-space code into a single project
From: Pekka Enberg @ 2010-03-22 13:01 UTC
To: Avi Kivity
Cc: Ingo Molnar, Anthony Liguori, Zhang, Yanmin, Peter Zijlstra, Sheng Yang, linux-kernel, kvm, Marcelo Tosatti, Joerg Roedel, Jes Sorensen, Gleb Natapov, Zachary Amsden, ziteng.huang, Arnaldo Carvalho de Melo, Frédéric Weisbecker, sandmann

Hi Avi,

On Mon, Mar 22, 2010 at 2:49 PM, Avi Kivity <avi@redhat.com> wrote:
> Seems like perf is also split, with sysprof being developed outside the
> kernel. Will you bring sysprof into the kernel? Will every feature be
> duplicated in perf and sysprof?

I am glad you brought it up! Sysprof was historically outside of the kernel
(with its own kernel module, actually). While the GUI was nice, it was much
harder to set up compared to oprofile, so it wasn't all that popular. Things
improved slightly when Ingo merged the custom kernel module, but the
_userspace_ part of sysprof was lagging behind a bit. I don't know what the
situation is now that they've switched over to the perf syscalls, but you
probably get my point.

It would be nice if the two projects merged, but I honestly don't see any
fundamental problem with two (or more) co-existing projects. Friendly
competition will ultimately benefit the users (think KDE and Gnome here).

			Pekka
* Re: [RFC] Unify KVM kernel-space and user-space code into a single project 2010-03-22 13:01 ` Pekka Enberg (?) @ 2010-03-22 14:54 ` Ingo Molnar 2010-03-22 19:04 ` Avi Kivity 2010-03-23 9:46 ` Olivier Galibert -1 siblings, 2 replies; 390+ messages in thread From: Ingo Molnar @ 2010-03-22 14:54 UTC (permalink / raw) To: Pekka Enberg Cc: Avi Kivity, Anthony Liguori, Zhang, Yanmin, Peter Zijlstra, Sheng Yang, linux-kernel, kvm, Marcelo Tosatti, Joerg Roedel, Jes Sorensen, Gleb Natapov, Zachary Amsden, ziteng.huang, Arnaldo Carvalho de Melo, Frédéric Weisbecker, sandmann * Pekka Enberg <penberg@cs.helsinki.fi> wrote: > Hi Avi, > > On Mon, Mar 22, 2010 at 2:49 PM, Avi Kivity <avi@redhat.com> wrote: > > Seems like perf is also split, with sysprof being developed outside the > > kernel. Will you bring sysprof into the kernel? Will every feature be > > duplicated in perf and sysprof? > > I am glad you brought it up! Sysprof was historically outside of the kernel > (with its own kernel module, actually). While the GUI was nice, it was much > harder to set up compared to oprofile so it wasn't all that popular. Things > improved slightly when Ingo merged the custom kernel module but the > _userspace_ part of sysprof was lagging behind a bit. I don't know what > the situation is now that they've switched over to perf syscalls but you > probably get my point. > > It would be nice if the two projects merged but I honestly don't see any > fundamental problem with two (or more) co-existing projects. Friendly > competition will ultimately benefit the users (think KDE and Gnome here). See my previous mail - what I see as the most healthy project model is to have a full solution reference implementation, connected to a flexible halo of plugins or sub-apps. Firefox does that, KDE does that, and Gnome as well to a certain degree. 
The problem I see with KVM is that there's no reference implementation! There is _only_ the KVM kernel part which is not functional in itself. Surrounded by a 'halo' - where none of the entities is really 'the' reference implementation we call 'KVM'. This causes constant quality problems as the developers of the main project don't have constant pressure towards good quality (it is not their responsibility to care about user-space bits after all), plus it causes a lack of focus as well: integration between (friendly) competing user-space components is a lot harder than integration within a single framework such as Firefox. I hope this explains my points about modularization a bit better! I suggested KVM to grow a user-space tool component in the kernel repo in tools/kvm/, which would become the reference implementation for tooling. User-space projects can still provide alternative tooling or can plug into this tooling, just like they are doing it now. So the main effect isn't even on those projects but on the kernel developers. The ABI remains and all the user-space packages and projects remain. Yes, I thought Qemu would be a prime candidate to be the baseline for tools/kvm/, but I guess that has become socially impossible now after this flamewar. It's not a big problem in the big scheme of things: tools/kvm/ is best grown up from a small towards larger size anyway ... Thanks, Ingo ^ permalink raw reply [flat|nested] 390+ messages in thread
* Re: [RFC] Unify KVM kernel-space and user-space code into a single project 2010-03-22 14:54 ` Ingo Molnar @ 2010-03-22 19:04 ` Avi Kivity 2010-03-23 9:46 ` Olivier Galibert 1 sibling, 0 replies; 390+ messages in thread From: Avi Kivity @ 2010-03-22 19:04 UTC (permalink / raw) To: Ingo Molnar Cc: Pekka Enberg, Anthony Liguori, Zhang, Yanmin, Peter Zijlstra, Sheng Yang, linux-kernel, kvm, Marcelo Tosatti, Joerg Roedel, Jes Sorensen, Gleb Natapov, Zachary Amsden, ziteng.huang, Arnaldo Carvalho de Melo, Frédéric Weisbecker, sandmann On 03/22/2010 04:54 PM, Ingo Molnar wrote: > * Pekka Enberg<penberg@cs.helsinki.fi> wrote: > > >> Hi Avi, >> >> On Mon, Mar 22, 2010 at 2:49 PM, Avi Kivity<avi@redhat.com> wrote: >> >>> Seems like perf is also split, with sysprof being developed outside the >>> kernel. Will you bring sysprof into the kernel? Will every feature be >>> duplicated in perf and sysprof? >>> >> I am glad you brought it up! Sysprof was historically outside of the kernel >> (with its own kernel module, actually). While the GUI was nice, it was much >> harder to set up compared to oprofile so it wasn't all that popular. Things >> improved slightly when Ingo merged the custom kernel module but the >> _userspace_ part of sysprof was lagging behind a bit. I don't know what >> the situation is now that they've switched over to perf syscalls but you >> probably get my point. >> >> It would be nice if the two projects merged but I honestly don't see any >> fundamental problem with two (or more) co-existing projects. Friendly >> competition will ultimately benefit the users (think KDE and Gnome here). >> > See my previous mail - what I see as the most healthy project model is to have > a full solution reference implementation, connected to a flexible halo of > plugins or sub-apps. > > Firefox does that, KDE does that, and Gnome as well to a certain degree. 
> > The 'halo' provides a constant feedback of new features, and it also provides > competition and pressure on the 'main' code to be top-notch. > > The problem I see with KVM is that there's no reference implementation! There > is _only_ the KVM kernel part which is not functional in itself. Surrounded by > a 'halo' - where none of the entities is really 'the' reference implementation > we call 'KVM'. > The reference implementation is qemu-kvm.git, in the future qemu.git. Like the reference implementation of device-mapper is lvm2/device-mapper, not tools/device-mapper. > This causes constant quality problems as the developers of the main project > don't have constant pressure towards good quality (it is not their > responsibility to care about user-space bits after all), The developers of the main project are very much aware that users don't call the ioctls directly but instead use qemu. > plus it causes a lack > of focus as well: integration between (friendly) competing user-space > components is a lot harder than integration within a single framework such as > Firefox. > We are very focused, just not on what you think we should be focused on. > I hope this explains my points about modularization a bit better! I suggested > KVM to grow a user-space tool component in the kernel repo in tools/kvm/, > which would become the reference implementation for tooling. User-space > projects can still provide alternative tooling or can plug into this tooling, > just like they are doing it now. So the main effect isn't even on those > projects but on the kernel developers. The ABI remains and all the user-space > packages and projects remain. > Seems like wanton duplication of effort. Can we throw so many developer-years away on duplicate projects? Assuming not all are true volunteers (85% for 2.6.33), who will fund this duplicate effort? 
> Yes, I thought Qemu would be a prime candidate to be the baseline for > tools/kvm/, but I guess that has become socially impossible now after this > flamewar. It's not a big problem in the big scheme of things: tools/kvm/ is > best grown up from a small towards larger size anyway ... > Qemu is open source, you can cp it into tools/kvm. Rewriting it from scratch is a mammoth effort; there's a reason kvm, Xen, and virtualbox all use qemu. Qemu itself copied code from bochs. Writing this stuff is hard, especially if there is something already working. You'll probably get much better threading (the qemu device model is still single threaded), but it will take years to reach where qemu is already at. -- Do not meddle in the internals of kernels, for they are subtle and quick to panic. ^ permalink raw reply [flat|nested] 390+ messages in thread
* Re: [RFC] Unify KVM kernel-space and user-space code into a single project 2010-03-22 14:54 ` Ingo Molnar 2010-03-22 19:04 ` Avi Kivity @ 2010-03-23 9:46 ` Olivier Galibert 1 sibling, 0 replies; 390+ messages in thread From: Olivier Galibert @ 2010-03-23 9:46 UTC (permalink / raw) To: Ingo Molnar Cc: Pekka Enberg, Avi Kivity, Anthony Liguori, Zhang, Yanmin, Peter Zijlstra, Sheng Yang, linux-kernel, kvm, Marcelo Tosatti, Joerg Roedel, Jes Sorensen, Gleb Natapov, Zachary Amsden, ziteng.huang, Arnaldo Carvalho de Melo, Frédéric Weisbecker, sandmann On Mon, Mar 22, 2010 at 03:54:37PM +0100, Ingo Molnar wrote: > Yes, I thought Qemu would be a prime candidate to be the baseline for > tools/kvm/, but I guess that has become socially impossible now after this > flamewar. It's not a big problem in the big scheme of things: tools/kvm/ is > best grown up from a small towards larger size anyway ... I'm curious, where would you put the limit? Let's imagine a tools/kvm appears, be it qemu or not, that's outside the scope of my question. Would you put the legacy PC bios in there (seabios I guess)? The EFI bios? The windows-compiled paravirtual drivers? The Xorg paravirtual DDX? Mesa (which includes the pv gallium drivers)? The libvirt-equivalent? The GUI? That's not a rhetorical question btw, I really wonder where the limit should be. OG. ^ permalink raw reply [flat|nested] 390+ messages in thread
* Re: [RFC] Unify KVM kernel-space and user-space code into a single project 2010-03-22 12:49 ` Avi Kivity 2010-03-22 13:01 ` Pekka Enberg @ 2010-03-22 14:47 ` Ingo Molnar 2010-03-22 18:15 ` Avi Kivity 1 sibling, 1 reply; 390+ messages in thread From: Ingo Molnar @ 2010-03-22 14:47 UTC (permalink / raw) To: Avi Kivity Cc: Anthony Liguori, Pekka Enberg, Zhang, Yanmin, Peter Zijlstra, Sheng Yang, linux-kernel, kvm, Marcelo Tosatti, Joerg Roedel, Jes Sorensen, Gleb Natapov, Zachary Amsden, ziteng.huang, Arnaldo Carvalho de Melo, Frédéric Weisbecker * Avi Kivity <avi@redhat.com> wrote: > On 03/22/2010 01:23 PM, Ingo Molnar wrote: > >* Avi Kivity<avi@redhat.com> wrote: > > > >>IMO the reason perf is more usable than oprofile has less to do with the > >>kernel/userspace boundary and more to do with effort and attention spent on > >>the userspace/user boundary. > >> > >>[...] > > > > If you are interested in the first-hand experience of the people who are > > doing the perf work then here it is: by far the biggest reason for perf > > success and perf usability is the integration of the user-space tooling > > with the kernel-space bits, into a single repository and project. > > Please take a look at the kvm integration code in qemu as a fraction of the > whole code base. You have to admit that much of Qemu's past 2-3 years of development was motivated by Linux/KVM (I'd say more than 50% of the code). As such it's one and the same code base - you just continue to define Qemu to be different from KVM. I very much remember how Qemu looked _before_ KVM: it was a struggling, dying project. KVM clearly changed that. > > The very move you are opposing so vehemently for KVM. > > I don't want to fracture a working community. Would you accept (or at least not NAK) a new tools/kvm/ tool that builds tooling from the ground up, while leaving Qemu untouched? [assuming it's all clean code, etc.] 
Although I have doubts about how well that would work 'against' your opinion: such a tool would need lots of KVM-side features and a positive attitude from you to be really useful. There's a lot of missing functionality to cover. > > Oprofile went the way you proposed, and it was a failure. It failed not > because it was bad technology (it was pretty decent and people used it), > it was not a failure because the wrong people worked on it (to the > contrary, very capable people worked on it), it was a failure in hindsight > because it simply incorrectly split into two projects which stifled the > progress of each other. > > Every project that has some kernel footprint, except perf, is split like > that. Are they all failures? No. Did I ever claim KVM was a failure? I said it's hindered by this design aspect. Are other Linux kernel tool projects affected by similar problems? You bet ... > Seems like perf is also split, with sysprof being developed outside the > kernel. Will you bring sysprof into the kernel? Will every feature be > duplicated in perf and sysprof? I'd prefer if sysprof merged into perf as 'perf view' - but its maintainer does not want that - which is perfectly OK. So we are building equivalent functionality into perf instead. Think about it like Firefox plugins: the main Firefox project picks up the functionality of the most popular Firefox plugins all the time. Session Saver, Tab Mix Plus, etc. were all in essence 'merged' (in functionality, not in code) into the 'reference' Firefox project. I think that's a fundamentally healthy model: it allows extensions and thus gives others an honest chance to show that you are potentially coding an inferior piece of code - but also express a clear opinion about what you consider a full, usable, high-quality reference implementation and constantly improve this reference implementation. I don't think that can be argued to be a bad model. 
Yes, it takes a bit of thinking outside the box to do tools/kvm/ but of all people I'd expect some of that from you. Ingo ^ permalink raw reply [flat|nested] 390+ messages in thread
* Re: [RFC] Unify KVM kernel-space and user-space code into a single project 2010-03-22 14:47 ` Ingo Molnar @ 2010-03-22 18:15 ` Avi Kivity 0 siblings, 0 replies; 390+ messages in thread From: Avi Kivity @ 2010-03-22 18:15 UTC (permalink / raw) To: Ingo Molnar Cc: Anthony Liguori, Pekka Enberg, Zhang, Yanmin, Peter Zijlstra, Sheng Yang, linux-kernel, kvm, Marcelo Tosatti, Joerg Roedel, Jes Sorensen, Gleb Natapov, Zachary Amsden, ziteng.huang, Arnaldo Carvalho de Melo, Frédéric Weisbecker On 03/22/2010 04:47 PM, Ingo Molnar wrote: > >>> If you are interested in the first-hand experience of the people who are >>> doing the perf work then here it is: by far the biggest reason for perf >>> success and perf usability is the integration of the user-space tooling >>> with the kernel-space bits, into a single repository and project. >>> >> Please take a look at the kvm integration code in qemu as a fraction of the >> whole code base. >> > You have to admit that much of Qemu's past 2-3 years of development was > motivated by Linux/KVM (I'd say more than 50% of the code). kvm certainly revitalized qemu development. > As such it's one > and the same code base - you just continue to define Qemu to be different from > KVM. > It's not the same code base. kvm provides a cpu virtualization service, qemu uses it. There could be other users. qemu could go away one day and be replaced by something else (tools/kvm?), and kvm would be unaffected. > I very much remember how Qemu looked _before_ KVM: it was a struggling, > dying project. KVM clearly changed that. > I'm a hero. >>> The very move you are opposing so vehemently for KVM. >>> >> I don't want to fracture a working community. >> > Would you accept (or at least not NAK) a new tools/kvm/ tool that builds > tooling from the ground up, while leaving Qemu untouched? [assuming it's all > clean code, etc.] > I couldn't NAK tools/kvm any more than I could NAK a new project outside the kernel repository. 
IMO it would be duplicated effort, but like I mentioned before, I can't tell volunteers what to do, only recommend that they join the existing effort. > Although I have doubts about how well that would work 'against' your opinion: > such a tool would need lots of KVM-side features and a positive attitude from > you to be really useful. There's a lot of missing functionality to cover. > Functionality that can be implemented in userspace will not be accepted into kvm unless there are very good reasons why it should be. Things that belong in kvm will be more than welcome. >> Seems like perf is also split, with sysprof being developed outside the >> kernel. Will you bring sysprof into the kernel? Will every feature be >> duplicated in perf and sysprof? >> > I'd prefer if sysprof merged into perf as 'perf view' - but its maintainer > does not want that - which is perfectly OK. You spared him the flamewar, I hope. > So we are building equivalent > functionality into perf instead. > Ah, duplicating effort. Great. > Think about it like Firefox plugins: the main Firefox project picks up the > functionality of the most popular Firefox plugins all the time. Session Saver, > Tab Mix Plus, etc. were all in essence 'merged' (in functionality, not in > code) into the 'reference' Firefox project. > There's a difference between absorbing a small plugin and duplicating a project. -- error compiling committee.c: too many arguments to function ^ permalink raw reply [flat|nested] 390+ messages in thread
* Re: [RFC] Unify KVM kernel-space and user-space code into a single project 2010-03-21 20:31 ` Ingo Molnar 2010-03-21 21:30 ` Avi Kivity @ 2010-03-22 11:10 ` Joerg Roedel 2010-03-22 12:22 ` Ingo Molnar 1 sibling, 1 reply; 390+ messages in thread From: Joerg Roedel @ 2010-03-22 11:10 UTC (permalink / raw) To: Ingo Molnar Cc: Avi Kivity, Anthony Liguori, Pekka Enberg, Zhang, Yanmin, Peter Zijlstra, Sheng Yang, linux-kernel, kvm, Marcelo Tosatti, Jes Sorensen, Gleb Natapov, Zachary Amsden, ziteng.huang, Arnaldo Carvalho de Melo, Frédéric Weisbecker On Sun, Mar 21, 2010 at 09:31:21PM +0100, Ingo Molnar wrote: > Let's see one example of that thought process in action: Oprofile. Since you are talking so much about oProfile in this thread I think it is important to mention that the problem with oProfile was not the repository separation. The problem was (and is) that the kernel and the user-space parts are maintained by different people who don't talk to each other or have a direction where they want to go with the project. Basically the reason of the oProfile failure is a dysfunctional community. I told the kernel-maintainer several times to also maintain user-space but he didn't want that. The situation with KVM is entirely different. Avi commits to kvm.git and qemu-kvm.git so he maintains both. Anthony is working to integrate the qemu-kvm changes into upstream qemu. Further these people work very closely together and the community around KVM works well too. The problems that oProfile has are not even in sight for KVM. Joerg ^ permalink raw reply [flat|nested] 390+ messages in thread
* Re: [RFC] Unify KVM kernel-space and user-space code into a single project 2010-03-22 11:10 ` Joerg Roedel @ 2010-03-22 12:22 ` Ingo Molnar 2010-03-22 13:46 ` Joerg Roedel 0 siblings, 1 reply; 390+ messages in thread From: Ingo Molnar @ 2010-03-22 12:22 UTC (permalink / raw) To: Joerg Roedel Cc: Avi Kivity, Anthony Liguori, Pekka Enberg, Zhang, Yanmin, Peter Zijlstra, Sheng Yang, linux-kernel, kvm, Marcelo Tosatti, Jes Sorensen, Gleb Natapov, Zachary Amsden, ziteng.huang, Arnaldo Carvalho de Melo, Frédéric Weisbecker * Joerg Roedel <joro@8bytes.org> wrote: > On Sun, Mar 21, 2010 at 09:31:21PM +0100, Ingo Molnar wrote: > > Let's see one example of that thought process in action: Oprofile. > > Since you are talking so much about oProfile in this thread I think it is > important to mention that the problem with oProfile was not the repository > separation. > > The problem was (and is) that the kernel and the user-space parts are > maintained by different people [...] Caused by: repository separation and the inevitable code and social fork a decade later. > [...] who don't talk to each other or have a direction where they want to go > with the project. [...] Caused by: repository separation and the inevitable code and social fork a decade later. > [...] Basically the reason of the oProfile failure is a dysfunctional > community. [...] Caused by: repository separation and the inevitable code and social fork a decade later. > [...] I told the kernel-maintainer several times to also maintain > user-space but he didn't want that. > > The situation with KVM is entirely different. Avi commits to kvm.git and > qemu-kvm.git so he maintains both. [...] What you fail to realise (or what you fail to know, you weren't around when Oprofile was written, I was) is that Oprofile _did_ have a functional single community when it was written. The tooling and the kernel bits were written by the same people. 
But a decade is a long time and the drift happened due to the inevitability of the repository separation, and due to the _inability_ to reach a sane, usable solution within that framework of separation. So I don't see much of a difference to the Oprofile situation really and I see many parallels. I also see similar kinds of desktop usability problems. The difference is that we don't have KVM with a decade of history and we don't have a 'told you so' KVM reimplementation to show that proves the point. I guess it's a matter of time before that happens, because Qemu usability is so abysmal today - so I guess we should suspend any discussions until that happens, no need to waste time on arguing hypotheticals. I think you are rationalizing the status quo. It's as if you argued in 1990 that the unification of East and West Germany wouldn't make much sense because despite clear problems and incompatibilities and different styles westerners were still allowed to visit eastern relatives and they both spoke the same language after all ;-) Thanks, Ingo ^ permalink raw reply [flat|nested] 390+ messages in thread
* Re: [RFC] Unify KVM kernel-space and user-space code into a single project 2010-03-22 12:22 ` Ingo Molnar @ 2010-03-22 13:46 ` Joerg Roedel 2010-03-22 16:32 ` Ingo Molnar 0 siblings, 1 reply; 390+ messages in thread From: Joerg Roedel @ 2010-03-22 13:46 UTC (permalink / raw) To: Ingo Molnar Cc: Avi Kivity, Anthony Liguori, Pekka Enberg, Zhang, Yanmin, Peter Zijlstra, Sheng Yang, linux-kernel, kvm, Marcelo Tosatti, Jes Sorensen, Gleb Natapov, Zachary Amsden, ziteng.huang, Arnaldo Carvalho de Melo, Frédéric Weisbecker On Mon, Mar 22, 2010 at 01:22:28PM +0100, Ingo Molnar wrote: > > * Joerg Roedel <joro@8bytes.org> wrote: > > > [...] Basically the reason of the oProfile failure is a dysfunctional > > community. [...] > > Caused by: repository separation and the inevitable code and social fork a > decade later. No, the split-repository situation was the smallest problem after all. It was a community thing. If the community doesn't work a single-repo project will also fail. Look at the state of the alpha arch in Linux today, it is maintained in one repository but nobody really cares about it. Thus it is miles behind most other archs Linux supports today in quality and feature completeness. > What you fail to realise (or what you fail to know, you weren't around when > Oprofile was written, I was) is that Oprofile _did_ have a functional single > community when it was written. The tooling and the kernel bits were written by > the same people. Yes, this was probably the time when everybody was enthusiastic about the feature and they could attract lots of developers. But the situation changed over time. > So I don't see much of a difference to the Oprofile situation really and I see > many parallels. I also see similar kinds of desktop usability problems. The difference is that KVM has a working community with good developers and maintainers. 
> The difference is that we don't have KVM with a decade of history and we don't > have a 'told you so' KVM reimplementation to show that proves the point. I > guess it's a matter of time before that happens, because Qemu usability is so > abysmal today - so I guess we should suspend any discussions until that > happens, no need to waste time on arguing hypotheticals. We actually have lguest which is small. But it lacks functionality and the developer community KVM has attracted. > I think you are rationalizing the status quo. I see that there are issues with KVM today in some areas. You pointed out the desktop usability already. I personally have trouble with the qemu-kvm.git because it is unbisectable. But repository unification doesn't solve the problem here. The point for a single repository is that it simplifies the development process. I agree with you here. But the current process of KVM is not too difficult after all. I don't have to touch qemu sources for most of my work on KVM. > It's as if you argued in 1990 that the unification of East and West Germany > wouldn't make much sense because despite clear problems and incompatibilities > and different styles westerners were still allowed to visit eastern relatives > and they both spoke the same language after all ;-) Um, hmm. I don't think these situations have enough in common to compare them ;-) Joerg ^ permalink raw reply [flat|nested] 390+ messages in thread
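[Editor's note: the bisectability complaint above is worth making concrete. The sketch below is a toy demonstration under stated assumptions - a throwaway repository with a hypothetical "regression" marker file, not qemu-kvm.git itself - of what a bisectable history buys a developer: when every revision builds and runs, `git bisect run` pinpoints the offending commit with no manual work.]

```shell
# Build a 10-commit throwaway history that "breaks" at commit 6, then let
# `git bisect run` find the first bad commit automatically. (Hypothetical
# demo repo; the file name "state" and the commit layout are made up.)
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.email demo@example.com
git config user.name "bisect demo"

for i in 1 2 3 4 5 6 7 8 9 10; do
    echo "change $i" >> changelog
    # The "regression": from commit 6 onward the tree is broken.
    if [ "$i" -ge 6 ]; then echo broken > state; else echo ok > state; fi
    git add changelog state
    git commit -q -m "commit $i"
done

git bisect start HEAD HEAD~9         # HEAD is bad, the oldest commit is good
git bisect run grep -q '^ok$' state  # exit 0 = good, non-zero = bad
git bisect log | grep 'first bad commit'
```

Under these assumptions bisect converges on "commit 6" in a handful of steps. A merge-heavy history with revisions that don't build - the situation Joerg describes for qemu-kvm.git - is exactly what defeats this workflow, since the `run` command can't distinguish "bad" from "won't compile".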
* Re: [RFC] Unify KVM kernel-space and user-space code into a single project 2010-03-22 13:46 ` Joerg Roedel @ 2010-03-22 16:32 ` Ingo Molnar 2010-03-22 17:17 ` Frank Ch. Eigler ` (2 more replies) 0 siblings, 3 replies; 390+ messages in thread From: Ingo Molnar @ 2010-03-22 16:32 UTC (permalink / raw) To: Joerg Roedel Cc: Avi Kivity, Anthony Liguori, Pekka Enberg, Zhang, Yanmin, Peter Zijlstra, Sheng Yang, linux-kernel, kvm, Marcelo Tosatti, Jes Sorensen, Gleb Natapov, Zachary Amsden, ziteng.huang, Arnaldo Carvalho de Melo, Frédéric Weisbecker * Joerg Roedel <joro@8bytes.org> wrote: > [...] Look at the state of the alpha arch in Linux today, it is maintained > in one repository but nobody really cares about it. Thus it is miles behind > most other archs Linux supports today in quality and feature completeness. I don't know how you can find the situation of Alpha comparable, which is a legacy architecture for which no new CPU was manufactured in the past ~10 years. The negative effects of physical obsolescence cannot be overcome even by the very best of development models ... So this is a total non-argument in this context. > On Mon, Mar 22, 2010 at 01:22:28PM +0100, Ingo Molnar wrote: > > > > * Joerg Roedel <joro@8bytes.org> wrote: > > > > > [...] Basically the reason of the oProfile failure is a dysfunctional > > > community. [...] > > > > Caused by: repository separation and the inevitable code and social fork a > > decade later. > > No, the split-repository situation was the smallest problem after all. It > was a community thing. If the community doesn't work a single-repo project > will also fail. [...] So, what do you think creates code communities and keeps them alive? Developers and code. And the wellbeing of developers is primarily influenced by the repository structure and by the development/maintenance process - i.e. by the 'fun' aspect. (I'm simplifying things there but that's the crux of it.) 
So yes, I do claim that what stifled and eventually killed off the Oprofile community was the split repository. None of the other Oprofile shortcomings were really unfixable, but this one was. It gave no way for the community to grow in a healthy way, after the initial phase. Features were more difficult and less fun to develop. And yes, there were times when there was still active Oprofile development but the development process warning signs should have been noticed, and the community could have been kept alive by unification and similar measures. Instead what happened was a complete rewrite and a competitive replacement by perf. (Which isn't particularly nice to users btw. - they prefer more gradual transitions - but there was no other option, so many problems accumulated in Oprofile.) I simply do not want to see KVM face the same fate, and yes I do see similar warning signs. > > What you fail to realise (or what you fail to know, you weren't around when > > Oprofile was written, I was) is that Oprofile _did_ have a functional > > single community when it was written. The tooling and the kernel bits were > > written by the same people. > > Yes, this was probably the time when everybody was enthusiastic about the > feature and they could attract lots of developers. But the situation changed > over time. The thing is, the drift was pre-programmed by having a split ... > > So I don't see much of a difference to the Oprofile situation really and I > > see many parallels. I also see similar kinds of desktop usability > > problems. > > The difference is that KVM has a working community with good developers and > maintainers. Oprofile certainly had good developers and maintainers as well. In the end it wasn't enough ... Also, a project can easily still be 'alive' but not reach its full potential. Why do you assume that my argument means that KVM isn't viable today? It can very well still be viable and even healthy - just not _as healthy_ as it could be ... 
> > The difference is that we don't have KVM with a decade of history and we > > don't have a 'told you so' KVM reimplementation to show that proves the > > point. I guess it's a matter of time before that happens, because Qemu > > usability is so abysmal today - so I guess we should suspend any > > discussions until that happens, no need to waste time on arguing > > hypotheticals. > > We actually have lguest which is small. But it lacks functionality and the > developer community KVM has attracted. I suggested long ago to merge lguest into KVM to cover non-VMX/non-SVM execution. > > I think you are rationalizing the status quo. > > I see that there are issues with KVM today in some areas. You pointed out > the desktop usability already. I personally have trouble with the > qemu-kvm.git because it is unbisectable. But repository unification doesn't > solve the problem here. Why doesn't it solve the bisectability problem? The kernel repo is supposed to be bisectable so that problem would be solved. > The point for a single repository is that it simplifies the development > process. I agree with you here. But the current process of KVM is not too > difficult after all. I don't have to touch qemu sources for most of my work > on KVM. In my judgement you'd have to do that more frequently, if KVM was properly weighting its priorities. For example regarding this recent KVM commit of yours: | commit ec1ff79084fccdae0dca9b04b89dcdf3235bbfa1 | Author: Joerg Roedel <joerg.roedel@amd.com> | Date: Fri Oct 9 16:08:31 2009 +0200 | | KVM: SVM: Add tracepoint for invlpga instruction | | This patch adds a tracepoint for the event that the guest | executed the INVLPGA instruction. With integrated KVM tooling I might have insisted for that new tracepoint to be available to users as well via some more meaningful tooling than just a pure tracepoint. There are synergies like that all around the place. 
You should realize that naturally developers will gravitate towards the most 'fun' aspects of a project. It is the task of the maintainer to keep the balance between fun and utility, bugs and features, quality and code-rot. > > It's as if you argued in 1990 that the unification of East and West > > Germany wouldn't make much sense because despite clear problems and > > incompatibilities and different styles westerners were still allowed to > > visit eastern relatives and they both spoke the same language after all > > ;-) > > Um, hmm. I don't think these situations have enough in common to compare > them ;-) Probably, but it's an interesting parallel nevertheless ;-) Thanks, Ingo ^ permalink raw reply [flat|nested] 390+ messages in thread
* Re: [RFC] Unify KVM kernel-space and user-space code into a single project 2010-03-22 16:32 ` Ingo Molnar @ 2010-03-22 17:17 ` Frank Ch. Eigler 2010-03-22 17:27 ` Pekka Enberg 2010-03-22 17:44 ` Avi Kivity 2010-03-22 19:20 ` Joerg Roedel 2 siblings, 1 reply; 390+ messages in thread From: Frank Ch. Eigler @ 2010-03-22 17:17 UTC (permalink / raw) To: Ingo Molnar Cc: Joerg Roedel, Avi Kivity, Anthony Liguori, Pekka Enberg, Zhang, Yanmin, Peter Zijlstra, Sheng Yang, linux-kernel, kvm, Marcelo Tosatti, Jes Sorensen, Gleb Natapov, Zachary Amsden, ziteng.huang, Arnaldo Carvalho de Melo, Frédéric Weisbecker mingo wrote: > [...] >> No, the split-repository situation was the smallest problem after all. It >> was a community thing. If the community doesn't work a single-repo project >> will also fail. [...] > > So, what do you think creates code communities and keeps them alive? > Developers and code. And the wellbeing of developers is primarily influenced > by the repository structure and by the development/maintenance process - i.e. > by the 'fun' aspect. (I'm simplifying things there but that's the crux of it.) > > So yes, I do claim that what stifled and eventually killed off the Oprofile > community was the split repository. [...] In your very previous paragraphs, you enumerate two separate causes: "repository structure" and "development/maintenance process" as being sources of "fun". Please simply accept that the former is considered by many as absolutely trivial compared to the latter, and additional verbose repetition of your thesis will not change this. - FChE ^ permalink raw reply [flat|nested] 390+ messages in thread
* Re: [RFC] Unify KVM kernel-space and user-space code into a single project 2010-03-22 17:17 ` Frank Ch. Eigler @ 2010-03-22 17:27 ` Pekka Enberg 0 siblings, 0 replies; 390+ messages in thread From: Pekka Enberg @ 2010-03-22 17:27 UTC (permalink / raw) To: Frank Ch. Eigler Cc: Ingo Molnar, Joerg Roedel, Avi Kivity, Anthony Liguori, Zhang, Yanmin, Peter Zijlstra, Sheng Yang, linux-kernel, kvm, Marcelo Tosatti, Jes Sorensen, Gleb Natapov, Zachary Amsden, ziteng.huang, Arnaldo Carvalho de Melo, Frédéric Weisbecker Hi Frank, On Mon, Mar 22, 2010 at 7:17 PM, Frank Ch. Eigler <fche@redhat.com> wrote: > In your very previous paragraphs, you enumerate two separate causes: > "repository structure" and "development/maintenance process" as being > sources of "fun". Please simply accept that the former is considered > by many as absolutely trivial compared to the latter, and additional > verbose repetition of your thesis will not change this. I can accept that many people consider it trivial but the problem is that we have _real data_ on kmemtrace and now perf that the number of contributors is significantly smaller when your code is outside the kernel repository. Now admittedly both of them are pretty intimate with the kernel but Ingo's suggestion of putting kvm-qemu in tools/ is an interesting idea nevertheless. It's kinda funny to see people argue that having an external repository is not a problem and that it's not a big deal if building something from the repository is slightly painful as long as it doesn't require a PhD when we have _real world_ experience that it _does_ limit the developer base in some cases. Whether or not that applies to kvm remains to be seen but I've yet to see a convincing argument why it doesn't. Pekka ^ permalink raw reply [flat|nested] 390+ messages in thread
* Re: [RFC] Unify KVM kernel-space and user-space code into a single project 2010-03-22 17:27 ` Pekka Enberg (?) @ 2010-03-22 17:32 ` Avi Kivity 2010-03-22 17:39 ` Ingo Molnar 2010-03-22 17:52 ` Pekka Enberg -1 siblings, 2 replies; 390+ messages in thread From: Avi Kivity @ 2010-03-22 17:32 UTC (permalink / raw) To: Pekka Enberg Cc: Frank Ch. Eigler, Ingo Molnar, Joerg Roedel, Anthony Liguori, Zhang, Yanmin, Peter Zijlstra, Sheng Yang, linux-kernel, kvm, Marcelo Tosatti, Jes Sorensen, Gleb Natapov, Zachary Amsden, ziteng.huang, Arnaldo Carvalho de Melo, Frédéric Weisbecker On 03/22/2010 07:27 PM, Pekka Enberg wrote: > It's kinda funny to see people argue that having an external > repository is not a problem and that it's not a big deal if building > something from the repository is slightly painful as long as it > doesn't require a PhD when we have _real world_ experience that it > _does_ limit developer base in some cases. Whether or not that applies > to kvm remains to be seen but I've yet to see a convincing argument > why it doesn't. > qemu has non-Linux developers. Not all of their contributions are relevant to kvm but some are. If we pull qemu into tools/kvm, we lose them. -- error compiling committee.c: too many arguments to function ^ permalink raw reply [flat|nested] 390+ messages in thread
* Re: [RFC] Unify KVM kernel-space and user-space code into a single project 2010-03-22 17:32 ` Avi Kivity @ 2010-03-22 17:39 ` Ingo Molnar 2010-03-22 17:58 ` Avi Kivity 2010-03-22 17:52 ` Pekka Enberg 1 sibling, 1 reply; 390+ messages in thread From: Ingo Molnar @ 2010-03-22 17:39 UTC (permalink / raw) To: Avi Kivity Cc: Pekka Enberg, Frank Ch. Eigler, Joerg Roedel, Anthony Liguori, Zhang, Yanmin, Peter Zijlstra, Sheng Yang, linux-kernel, kvm, Marcelo Tosatti, Jes Sorensen, Gleb Natapov, Zachary Amsden, ziteng.huang, Arnaldo Carvalho de Melo, Frédéric Weisbecker * Avi Kivity <avi@redhat.com> wrote: > On 03/22/2010 07:27 PM, Pekka Enberg wrote: > > > > It's kinda funny to see people argue that having an external repository is > > not a problem and that it's not a big deal if building something from the > > repository is slightly painful as long as it doesn't require a PhD when we > > have _real world_ experience that it _does_ limit developer base in some > > cases. Whether or not that applies to kvm remains to be seen but I've yet > > to see a convincing argument why it doesn't. > > qemu has non-Linux developers. Not all of their contributions are relevant > to kvm but some are. If we pull qemu into tools/kvm, we lose them. Qemu had very few developers before KVM made use of it - I know it because I followed the project prior to KVM. So whatever development activity Qemu has today, it's 99% [WAG] attributable to KVM. It might have non-Linux contributors, but they wouldn't be there if it wasn't for all the Linux contributors ... Furthermore, those contributors wouldn't have to leave - they could simply use a different Git URI ... Ingo ^ permalink raw reply [flat|nested] 390+ messages in thread
* Re: [RFC] Unify KVM kernel-space and user-space code into a single project 2010-03-22 17:39 ` Ingo Molnar @ 2010-03-22 17:58 ` Avi Kivity 0 siblings, 0 replies; 390+ messages in thread From: Avi Kivity @ 2010-03-22 17:58 UTC (permalink / raw) To: Ingo Molnar Cc: Pekka Enberg, Frank Ch. Eigler, Joerg Roedel, Anthony Liguori, Zhang, Yanmin, Peter Zijlstra, Sheng Yang, linux-kernel, kvm, Marcelo Tosatti, Jes Sorensen, Gleb Natapov, Zachary Amsden, ziteng.huang, Arnaldo Carvalho de Melo, Frédéric Weisbecker On 03/22/2010 07:39 PM, Ingo Molnar wrote: > * Avi Kivity <avi@redhat.com> wrote: > > >> On 03/22/2010 07:27 PM, Pekka Enberg wrote: >> >>> It's kinda funny to see people argue that having an external repository is >>> not a problem and that it's not a big deal if building something from the >>> repository is slightly painful as long as it doesn't require a PhD when we >>> have _real world_ experience that it _does_ limit developer base in some >>> cases. Whether or not that applies to kvm remains to be seen but I've yet >>> to see a convincing argument why it doesn't. >>> >> qemu has non-Linux developers. Not all of their contributions are relevant >> to kvm but some are. If we pull qemu into tools/kvm, we lose them. >> > Qemu had very few developers before KVM made use of it - I know it because I > followed the project prior to KVM. > No argument. > So whatever development activity Qemu has today, it's 99% [WAG] attributable > to KVM. It might have non-Linux contributors, but they wouldn't be there if it > wasn't for all the Linux contributors ... > > Furthermore, those contributors wouldn't have to leave - they could simply use > a different Git URI ... > tools/kvm would drop support for non-Linux hosts, for tcg, and for architectures which kvm doesn't support ("clean and minimal"). That would be the real win, not sharing the repository. But those other contributors would just stay with the original qemu. 
-- error compiling committee.c: too many arguments to function ^ permalink raw reply [flat|nested] 390+ messages in thread
* Re: [RFC] Unify KVM kernel-space and user-space code into a single project 2010-03-22 17:32 ` Avi Kivity @ 2010-03-22 17:52 ` Pekka Enberg 2010-03-22 17:52 ` Pekka Enberg 1 sibling, 0 replies; 390+ messages in thread From: Pekka Enberg @ 2010-03-22 17:52 UTC (permalink / raw) To: Avi Kivity Cc: Frank Ch. Eigler, Ingo Molnar, Joerg Roedel, Anthony Liguori, Zhang, Yanmin, Peter Zijlstra, Sheng Yang, linux-kernel, kvm, Marcelo Tosatti, Jes Sorensen, Gleb Natapov, Zachary Amsden, ziteng.huang, Arnaldo Carvalho de Melo, Frédéric Weisbecker Hi Avi, On Mon, Mar 22, 2010 at 7:32 PM, Avi Kivity <avi@redhat.com> wrote: >> It's kinda funny to see people argue that having an external >> repository is not a problem and that it's not a big deal if building >> something from the repository is slightly painful as long as it >> doesn't require a PhD when we have _real world_ experience that it >> _does_ limit developer base in some cases. Whether or not that applies >> to kvm remains to be seen but I've yet to see a convincing argument >> why it doesn't. > > qemu has non-Linux developers. Not all of their contributions are relevant > to kvm but some are. If we pull qemu into tools/kvm, we lose them. Yeah, you probably would but the hypothesis is that you'd end up with a bigger net developer base for the _Linux_ version. Now you might not think that's important but I certainly do and I think Ingo does as well. ;-) That said, pulling 400 KLOC of code into the kernel sounds really excessive. Would we need all that if we just do native virtualization and no actual emulation? Pekka ^ permalink raw reply [flat|nested] 390+ messages in thread
* Re: [RFC] Unify KVM kernel-space and user-space code into a single project 2010-03-22 17:52 ` Pekka Enberg (?) @ 2010-03-22 18:04 ` Avi Kivity 2010-03-22 18:10 ` Pekka Enberg -1 siblings, 1 reply; 390+ messages in thread From: Avi Kivity @ 2010-03-22 18:04 UTC (permalink / raw) To: Pekka Enberg Cc: Frank Ch. Eigler, Ingo Molnar, Joerg Roedel, Anthony Liguori, Zhang, Yanmin, Peter Zijlstra, Sheng Yang, linux-kernel, kvm, Marcelo Tosatti, Jes Sorensen, Gleb Natapov, Zachary Amsden, ziteng.huang, Arnaldo Carvalho de Melo, Frédéric Weisbecker On 03/22/2010 07:52 PM, Pekka Enberg wrote: > Hi Avi, > > On Mon, Mar 22, 2010 at 7:32 PM, Avi Kivity <avi@redhat.com> wrote: > >>> It's kinda funny to see people argue that having an external >>> repository is not a problem and that it's not a big deal if building >>> something from the repository is slightly painful as long as it >>> doesn't require a PhD when we have _real world_ experience that it >>> _does_ limit developer base in some cases. Whether or not that applies >>> to kvm remains to be seen but I've yet to see a convincing argument >>> why it doesn't. >>> >> qemu has non-Linux developers. Not all of their contributions are relevant >> to kvm but some are. If we pull qemu into tools/kvm, we lose them. >> > Yeah, you probably would but the hypothesis is that you'd end up with > a bigger net developer base for the _Linux_ version. Now you might not > think that's important but I certainly do and I think Ingo does as > well. ;-) > You're probably correct, but the point is that non-Linux developers also contribute things which kvm benefits from. Not a whole lot, but some. > That said, pulling 400 KLOC of code into the kernel sounds really > excessive. Would we need all that if we just do native virtualization > and no actual emulation? > What is native virtualization and no actual emulation? -- error compiling committee.c: too many arguments to function ^ permalink raw reply [flat|nested] 390+ messages in thread
* Re: [RFC] Unify KVM kernel-space and user-space code into a single project 2010-03-22 18:04 ` Avi Kivity @ 2010-03-22 18:10 ` Pekka Enberg 0 siblings, 0 replies; 390+ messages in thread From: Pekka Enberg @ 2010-03-22 18:10 UTC (permalink / raw) To: Avi Kivity Cc: Frank Ch. Eigler, Ingo Molnar, Joerg Roedel, Anthony Liguori, Zhang, Yanmin, Peter Zijlstra, Sheng Yang, linux-kernel, kvm, Marcelo Tosatti, Jes Sorensen, Gleb Natapov, Zachary Amsden, ziteng.huang, Arnaldo Carvalho de Melo, Frédéric Weisbecker On Mon, Mar 22, 2010 at 8:04 PM, Avi Kivity <avi@redhat.com> wrote: >> That said, pulling 400 KLOC of code into the kernel sounds really >> excessive. Would we need all that if we just do native virtualization >> and no actual emulation? > > What is native virtualization and no actual emulation? What I meant by "actual emulation" was running architecture A code on architecture B, which was qemu's traditional use case. So the question was how much of the 400 KLOC do we need for just KVM on all the architectures that it supports? ^ permalink raw reply [flat|nested] 390+ messages in thread
* Re: [RFC] Unify KVM kernel-space and user-space code into a single project 2010-03-22 18:10 ` Pekka Enberg (?) @ 2010-03-22 18:55 ` Avi Kivity -1 siblings, 0 replies; 390+ messages in thread From: Avi Kivity @ 2010-03-22 18:55 UTC (permalink / raw) To: Pekka Enberg Cc: Frank Ch. Eigler, Ingo Molnar, Joerg Roedel, Anthony Liguori, Zhang, Yanmin, Peter Zijlstra, Sheng Yang, linux-kernel, kvm, Marcelo Tosatti, Jes Sorensen, Gleb Natapov, Zachary Amsden, ziteng.huang, Arnaldo Carvalho de Melo, Frédéric Weisbecker On 03/22/2010 08:10 PM, Pekka Enberg wrote: > On Mon, Mar 22, 2010 at 8:04 PM, Avi Kivity <avi@redhat.com> wrote: > >>> That said, pulling 400 KLOC of code into the kernel sounds really >>> excessive. Would we need all that if we just do native virtualization >>> and no actual emulation? >>> >> What is native virtualization and no actual emulation? >> > What I meant by "actual emulation" was running architecture A code > on architecture B, which was qemu's traditional use case. So the > question was how much of the 400 KLOC do we need for just KVM on all > the architectures that it supports? > qemu is 620 KLOC. Without cpu emulation that drops to ~480 KLOC. Much of that is device emulation that is not supported by kvm now (like ARM) but some might be needed again in the future (like ARM). x86-only is perhaps 300 KLOC, but kvm is not x86 only. And that is with a rudimentary GUI. GUIs are heavy. -- Do not meddle in the internals of kernels, for they are subtle and quick to panic. ^ permalink raw reply [flat|nested] 390+ messages in thread
* Re: [RFC] Unify KVM kernel-space and user-space code into a single project 2010-03-22 17:27 ` Pekka Enberg (?) (?) @ 2010-03-22 17:43 ` Ingo Molnar 2010-03-22 18:02 ` Avi Kivity -1 siblings, 1 reply; 390+ messages in thread From: Ingo Molnar @ 2010-03-22 17:43 UTC (permalink / raw) To: Pekka Enberg Cc: Frank Ch. Eigler, Joerg Roedel, Avi Kivity, Anthony Liguori, Zhang, Yanmin, Peter Zijlstra, Sheng Yang, linux-kernel, kvm, Marcelo Tosatti, Jes Sorensen, Gleb Natapov, Zachary Amsden, ziteng.huang, Arnaldo Carvalho de Melo, Frédéric Weisbecker * Pekka Enberg <penberg@cs.helsinki.fi> wrote: > Hi Frank, > > On Mon, Mar 22, 2010 at 7:17 PM, Frank Ch. Eigler <fche@redhat.com> wrote: > > In your very previous paragraphs, you enumerate two separate causes: > > "repository structure" and "development/maintenance process" as being > > sources of "fun". Please simply accept that the former is considered > > by many as absolutely trivial compared to the latter, and additional > > verbose repetition of your thesis will not change this. > > I can accept that many people consider it trivial but the problem is that we > have _real data_ on kmemtrace and now perf that the number of contributors > is significantly smaller when your code is outside the kernel repository. > Now admittedly both of them are pretty intimate with the kernel but Ingo's > suggestion of putting kvm-qemu in tools/ is an interesting idea > nevertheless. Correct. > It's kinda funny to see people argue that having an external repository is > not a problem and that it's not a big deal if building something from the > repository is slightly painful as long as it doesn't require a PhD when we > have _real world_ experience that it _does_ limit developer base in some > cases. Whether or not that applies to kvm remains to be seen but I've yet to > see a convincing argument why it doesn't. Yeah. 
Also, if in fact the claim that the 'repository does not matter' is true then it doesn't matter that it's hosted in tools/kvm/ either, right? I.e. it's a win-win situation. Worst-case nothing happens beyond a Git URI change. Best-case the project is propelled to heights never seen before due to contribution advantages not contemplated and not experienced by the KVM guys before ... Ingo ^ permalink raw reply [flat|nested] 390+ messages in thread
* Re: [RFC] Unify KVM kernel-space and user-space code into a single project 2010-03-22 17:43 ` Ingo Molnar @ 2010-03-22 18:02 ` Avi Kivity 0 siblings, 0 replies; 390+ messages in thread From: Avi Kivity @ 2010-03-22 18:02 UTC (permalink / raw) To: Ingo Molnar Cc: Pekka Enberg, Frank Ch. Eigler, Joerg Roedel, Anthony Liguori, Zhang, Yanmin, Peter Zijlstra, Sheng Yang, linux-kernel, kvm, Marcelo Tosatti, Jes Sorensen, Gleb Natapov, Zachary Amsden, ziteng.huang, Arnaldo Carvalho de Melo, Frédéric Weisbecker On 03/22/2010 07:43 PM, Ingo Molnar wrote: > >> It's kinda funny to see people argue that having an external repository is >> not a problem and that it's not a big deal if building something from the >> repository is slightly painful as long as it doesn't require a PhD when we >> have _real world_ experience that it _does_ limit developer base in some >> cases. Whether or not that applies to kvm remains to be seen but I've yet to >> see a convincing argument why it doesn't. >> > Yeah. > > Also, if in fact the claim that the 'repository does not matter' is true then > it doesn't matter that it's hosted in tools/kvm/ either, right? > Again, the second it's moved to tools/kvm/ we strip out of it anything that kvm can't use. > I.e. it's a win-win situation. Worst-case nothing happens beyond a Git URI > change. Best-case the project is propelled to heights never seen before due to > contribution advantages not contemplated and not experienced by the KVM guys > before ... > You're exaggerating. There were 773 commits into qemu.git (excluding qemu-kvm.git) in the past three months. 162 for the same period for tools/perf. The pool is not that deep. -- error compiling committee.c: too many arguments to function ^ permalink raw reply [flat|nested] 390+ messages in thread
* Re: [RFC] Unify KVM kernel-space and user-space code into a single project 2010-03-22 16:32 ` Ingo Molnar 2010-03-22 17:17 ` Frank Ch. Eigler @ 2010-03-22 17:44 ` Avi Kivity 2010-03-22 19:10 ` Ingo Molnar 2010-03-22 19:20 ` Joerg Roedel 2 siblings, 1 reply; 390+ messages in thread From: Avi Kivity @ 2010-03-22 17:44 UTC (permalink / raw) To: Ingo Molnar Cc: Joerg Roedel, Anthony Liguori, Pekka Enberg, Zhang, Yanmin, Peter Zijlstra, Sheng Yang, linux-kernel, kvm, Marcelo Tosatti, Jes Sorensen, Gleb Natapov, Zachary Amsden, ziteng.huang, Arnaldo Carvalho de Melo, Frédéric Weisbecker On 03/22/2010 06:32 PM, Ingo Molnar wrote: > > So, what do you think creates code communities and keeps them alive? > Developers and code. And the wellbeing of developers is primarily influenced > by the repository structure and by the development/maintenance process - i.e. > by the 'fun' aspect. (I'm simplifying things there but that's the crux of it.) > There is nothing fun about having one repository or two. Who cares about this anyway? tools/kvm/ probably will draw developers, simply because of the glory associated with kernel work. That's a bug, not a feature. It means that effort is not distributed according to how it's needed, but because of irrelevant considerations. > I simply do not want to see KVM face the same fate, and yes I do see similar > warning signs. > The number of kvm and qemu developers keeps increasing. We're having a kvm forum in August where we all meet. Come and see for yourself. >> We actually have lguest which is small. But it lacks functionality and the >> developer community KVM has attracted. >> > I suggested long ago to merge lguest into KVM to cover non-VMX/non-SVM > execution. > Rusty posted some initial patches for pv-only kvm but he lost interest before they were completed. No one followed up. btw, lguest has a single repository, userspace and kernel in the same repository, yet is practically dead. >>> I think you are rationalizing the status quo. 
>>> >> I see that there are issues with KVM today in some areas. You pointed out >> the desktop usability already. I personally have trouble with the >> qemu-kvm.git because it is unbisectable. But repository unification doesn't >> solve the problem here. >> > Why doesn't it solve the bisectability problem? The kernel repo is supposed to > be bisectable so that problem would be solved. > These days qemu-kvm.git is bisectable (though not always trivially). qemu.git doesn't have this problem. >> The point for a single repository is that it simplifies the development >> process. I agree with you here. But the current process of KVM is not too >> difficult after all. I don't have to touch qemu sources for most of my work >> on KVM. >> > In my judgement you'd have to do that more frequently, if KVM was properly > weighting its priorities. For example regarding this recent KVM commit of > yours: > > | commit ec1ff79084fccdae0dca9b04b89dcdf3235bbfa1 > | Author: Joerg Roedel <joerg.roedel@amd.com> > | Date: Fri Oct 9 16:08:31 2009 +0200 > | > | KVM: SVM: Add tracepoint for invlpga instruction > | > | This patch adds a tracepoint for the event that the guest > | executed the INVLPGA instruction. > > With integrated KVM tooling I might have insisted for that new tracepoint to > be available to users as well via some more meaningful tooling than just a > pure tracepoint. > Something I've wanted for a long time is to port kvm_stat to use tracepoints instead of the home-grown instrumentation. But that is unrelated to this new tracepoint. Other than that we're satisfied with ftrace. > You should realize that naturally developers will gravitate towards the most > 'fun' aspects of a project. It is the task of the maintainer to keep the > balance between fun and utility, bugs and features, quality and code-rot. > There are plenty of un-fun tasks (like fixing bugs and providing RAS features) that we're doing. We don't do this for fun but to satisfy our users. 
-- error compiling committee.c: too many arguments to function ^ permalink raw reply [flat|nested] 390+ messages in thread
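Avi's remark about porting kvm_stat from its home-grown instrumentation to tracepoints can be sketched roughly as follows. This is a hypothetical editor-supplied illustration, not the actual kvm_stat code: it tallies kvm_exit reasons from ftrace-formatted text. The sample lines and their exact field layout are invented for the example (the real event format varies by kernel version), and a real tool would read `/sys/kernel/debug/tracing/trace` on a live host rather than a built-in string.

```python
import re
from collections import Counter

# Matches the "kvm_exit: reason <NAME> ..." portion of an ftrace line.
# The field layout is illustrative; real kernels may print extra fields.
EXIT_RE = re.compile(r"kvm_exit:\s+reason\s+(\S+)")

def count_exit_reasons(trace_text):
    """Tally kvm_exit reasons found in ftrace-formatted text."""
    return Counter(m.group(1) for m in EXIT_RE.finditer(trace_text))

# Fabricated sample lines in the usual ftrace layout (task-pid, cpu,
# timestamp, event name, event fields):
sample = """\
 qemu-kvm-2501  [003]  9631.123456: kvm_exit: reason EXTERNAL_INTERRUPT rip 0xffffffff8100a4b2
 qemu-kvm-2501  [003]  9631.123501: kvm_entry: vcpu 0
 qemu-kvm-2502  [001]  9631.123777: kvm_exit: reason HLT rip 0xffffffff81020f10
 qemu-kvm-2501  [003]  9631.124012: kvm_exit: reason HLT rip 0xffffffff81020f10
"""

if __name__ == "__main__":
    # Print exit reasons, most frequent first, like a kvm_stat summary
    for reason, count in count_exit_reasons(sample).most_common():
        print(reason, count)
```

On the sample input this prints each exit reason with its count, most frequent first; pointed at live ftrace output it would give a kvm_stat-style summary driven purely by tracepoints.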
* Re: [RFC] Unify KVM kernel-space and user-space code into a single project 2010-03-22 17:44 ` Avi Kivity @ 2010-03-22 19:10 ` Ingo Molnar 2010-03-22 19:18 ` Anthony Liguori ` (2 more replies) 0 siblings, 3 replies; 390+ messages in thread From: Ingo Molnar @ 2010-03-22 19:10 UTC (permalink / raw) To: Avi Kivity Cc: Joerg Roedel, Anthony Liguori, Pekka Enberg, Zhang, Yanmin, Peter Zijlstra, Sheng Yang, linux-kernel, kvm, Marcelo Tosatti, Jes Sorensen, Gleb Natapov, Zachary Amsden, ziteng.huang, Arnaldo Carvalho de Melo, Frédéric Weisbecker * Avi Kivity <avi@redhat.com> wrote: > On 03/22/2010 06:32 PM, Ingo Molnar wrote: > > > > So, what do you think creates code communities and keeps them alive? > > Developers and code. And the wellbeing of developers is primarily > > influenced by the repository structure and by the development/maintenance > > process - i.e. by the 'fun' aspect. (I'm simplifying things there but > > that's the crux of it.) > > There is nothing fun about having one repository or two. Who cares about > this anyway? > > tools/kvm/ probably will draw developers, simply because of the glory > associated with kernel work. That's a bug, not a feature. It means that > effort is not distributed according to how it's needed, but because of > irrelevant considerations. And yet your solution to that is to ... do all your work in the kernel space and declare the tooling as something that does not interest you? ;-) > Something I've wanted for a long time is to port kvm_stat to use tracepoints > instead of the home-grown instrumentation. But that is unrelated to this > new tracepoint. Other than that we're satisfied with ftrace. Despite it being another in-kernel subsystem that by your earlier arguments should be done via a user-space package? ;-) > > You should realize that naturally developers will gravitate towards the > > most 'fun' aspects of a project. 
It is the task of the maintainer to keep > > the balance between fun and utility, bugs and features, quality and > > code-rot. > > There are plenty of un-fun tasks (like fixing bugs and providing RAS > features) that we're doing. We don't do this for fun but to satisfy our > users. So which one is it, KVM developers are volunteers who do fun stuff and cannot be told about project priorities, or KVM developers are pros who do unfun stuff because they can be told about priorities? I posit that it's both: and that priorities can be communicated - if only you try as a maintainer. All I'm suggesting is to add 'usable, unified user-space' to the list of unfun priorities, because it's possible and because it matters. Ingo ^ permalink raw reply [flat|nested] 390+ messages in thread
* Re: [RFC] Unify KVM kernel-space and user-space code into a single project 2010-03-22 19:10 ` Ingo Molnar @ 2010-03-22 19:18 ` Anthony Liguori 2010-03-22 19:23 ` Avi Kivity 2010-03-22 19:28 ` Andrea Arcangeli 2 siblings, 0 replies; 390+ messages in thread From: Anthony Liguori @ 2010-03-22 19:18 UTC (permalink / raw) To: Ingo Molnar Cc: Avi Kivity, Joerg Roedel, Pekka Enberg, Zhang, Yanmin, Peter Zijlstra, Sheng Yang, linux-kernel, kvm, Marcelo Tosatti, Jes Sorensen, Gleb Natapov, Zachary Amsden, ziteng.huang, Arnaldo Carvalho de Melo, Frédéric Weisbecker On 03/22/2010 02:10 PM, Ingo Molnar wrote: > > I posit that it's both: and that priorities can be communicated - if only you > try as a maintainer. All I'm suggesting is to add 'usable, unified user-space' > to the list of unfun priorities, because it's possible and because it matters. > I've spent the past few months dealing with customers using the libvirt/qemu/kvm stack. Usability is a major problem and is a top priority for me. That is definitely a shift but that occurred before you started your thread. But I disagree with your analysis of what the root of the problem is. It's a very kernel centric view and doesn't consider the interactions between userspace. Regards, Anthony Liguori > Ingo > ^ permalink raw reply [flat|nested] 390+ messages in thread
* Re: [RFC] Unify KVM kernel-space and user-space code into a single project 2010-03-22 19:10 ` Ingo Molnar 2010-03-22 19:18 ` Anthony Liguori @ 2010-03-22 19:23 ` Avi Kivity 2010-03-22 19:28 ` Andrea Arcangeli 2 siblings, 0 replies; 390+ messages in thread From: Avi Kivity @ 2010-03-22 19:23 UTC (permalink / raw) To: Ingo Molnar Cc: Joerg Roedel, Anthony Liguori, Pekka Enberg, Zhang, Yanmin, Peter Zijlstra, Sheng Yang, linux-kernel, kvm, Marcelo Tosatti, Jes Sorensen, Gleb Natapov, Zachary Amsden, ziteng.huang, Arnaldo Carvalho de Melo, Frédéric Weisbecker On 03/22/2010 09:10 PM, Ingo Molnar wrote: > * Avi Kivity <avi@redhat.com> wrote: > > >> On 03/22/2010 06:32 PM, Ingo Molnar wrote: >> >>> So, what do you think creates code communities and keeps them alive? >>> Developers and code. And the wellbeing of developers is primarily >>> influenced by the repository structure and by the development/maintenance >>> process - i.e. by the 'fun' aspect. (I'm simplifying things there but >>> that's the crux of it.) >>> >> There is nothing fun about having one repository or two. Who cares about >> this anyway? >> >> tools/kvm/ probably will draw developers, simply because of the glory >> associated with kernel work. That's a bug, not a feature. It means that >> effort is not distributed according to how it's needed, but because of >> irrelevant considerations. >> > And yet your solution to that is to ... do all your work in the kernel space > and declare the tooling as something that does not interest you? ;-) > I have done plenty of userspace work in qemu. I don't have a lack of interest in qemu, just in a desktop GUI. I'm not a GUI person and my employer doesn't have a desktop-on-desktop virtualization product that I know of. >> Something I've wanted for a long time is to port kvm_stat to use tracepoints >> instead of the home-grown instrumentation. But that is unrelated to this >> new tracepoint. Other than that we're satisfied with ftrace. 
>> > Despite it being another in-kernel subsystem that by your earlier arguments > should be done via a user-space package? ;-) > I'm satisfied with it as a user. Architecturally, I'd have preferred it to be a userspace tool. It might have improved usability as well to have something with --help instead of a set of debugfs files. But I'm a lot happier with ftrace existing as a kernel component than not at all. >>> You should realize that naturally developers will gravitate towards the >>> most 'fun' aspects of a project. It is the task of the maintainer to keep >>> the balance between fun and utility, bugs and features, quality and >>> code-rot. >>> >> There are plenty of un-fun tasks (like fixing bugs and providing RAS >> features) that we're doing. We don't do this for fun but to satisfy our >> users. >> > So which one is it, KVM developers are volunteers who do fun stuff and cannot > be told about project priorities, or KVM developers are pros who do unfun > stuff because they can be told about priorities? > From my point of view as maintainer, all contributors are volunteers, I can't tell any of them what to do. From the point of view of many of these volunteers' employers, they are wage slaves who do as they're told or else. So: when someone sends me a patch I gratefully accept it if it is good or point out the issues if not. At the secret Red Hat headquarters and the kvm weekly conference call I participate in deciding priorities and task assignments. > I posit that it's both: and that priorities can be communicated - if only you > try as a maintainer. All I'm suggesting is to add 'usable, unified user-space' > to the list of unfun priorities, because it's possible and because it matters. > So: I require a volunteer to write some GUI code before I accept a patch. Back at the Red Hat lair, we think of what features we drop from the product because the kvm maintainer has gone nuts. 
The 'unified' part of your suggestion is not a requirement, but an implementation detail. -- Do not meddle in the internals of kernels, for they are subtle and quick to panic. ^ permalink raw reply [flat|nested] 390+ messages in thread
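The kvm_stat port mentioned in the message above comes down to aggregating tracepoint events by type instead of reading home-grown counters. A minimal sketch, assuming the 2.6.3x-era `kvm:kvm_exit` trace format (`reason <NAME> rip <addr>`); the heredoc lines are illustrative stand-ins for what `/sys/kernel/debug/tracing/trace_pipe` would emit with that event enabled:

```shell
# Count kvm_exit events by exit reason, the core of a tracepoint-based
# kvm_stat. Real input would come from trace_pipe; sample lines below
# imitate its format and are invented for illustration.
aggregate_exits() {
  awk '/kvm_exit/ {
         for (i = 1; i <= NF; i++)
           if ($i == "reason") count[$(i+1)]++
       }
       END { for (r in count) printf "%-20s %d\n", r, count[r] }' | sort
}

aggregate_exits <<'EOF'
 qemu-kvm-1478  [003]  241.464612: kvm_exit: reason EXTERNAL_INTERRUPT rip 0xffffffff8100b386
 qemu-kvm-1478  [003]  241.464616: kvm_exit: reason IO_INSTRUCTION rip 0xffffffff812a2c2a
 qemu-kvm-1478  [003]  241.464625: kvm_exit: reason IO_INSTRUCTION rip 0xffffffff812a2c2a
 qemu-kvm-1478  [003]  241.464638: kvm_exit: reason HLT rip 0xffffffff8100b386
EOF
```

Applied to a live trace_pipe stream (with `kvm:kvm_exit` enabled via the debugfs event files Avi refers to), the same pipeline yields a running exit-reason histogram, which is essentially what kvm_stat displays.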
* Re: [RFC] Unify KVM kernel-space and user-space code into a single project 2010-03-22 19:10 ` Ingo Molnar 2010-03-22 19:18 ` Anthony Liguori 2010-03-22 19:23 ` Avi Kivity @ 2010-03-22 19:28 ` Andrea Arcangeli 2 siblings, 0 replies; 390+ messages in thread From: Andrea Arcangeli @ 2010-03-22 19:28 UTC (permalink / raw) To: Ingo Molnar Cc: Avi Kivity, Joerg Roedel, Anthony Liguori, Pekka Enberg, Zhang, Yanmin, Peter Zijlstra, Sheng Yang, linux-kernel, kvm, Marcelo Tosatti, Jes Sorensen, Gleb Natapov, Zachary Amsden, ziteng.huang, Arnaldo Carvalho de Melo, Frédéric Weisbecker On Mon, Mar 22, 2010 at 08:10:28PM +0100, Ingo Molnar wrote: > I posit that it's both: and that priorities can be communicated - if only you > try as a maintainer. All i'm suggesting is to add 'usable, unified user-space' > to the list of unfun priorities, because it's possible and because it matters. IMHO blaming anybody for it but qemu maintainership is very unfair. They intentionally reinvented a less self-contained, inferior, underperforming, underfeatured wheel instead of doing the right thing and just making sure that it was as self-contained as possible to avoid risking destabilizing their existing codebase. What can anybody (without qemu git commit access) do about it unless the qemu git maintainer changes attitude, dumps the qemu/kvm-all.c nonsense for good, and does the right thing so we can unify for real? We need to move forward, including multithreading the qemu core and being ready to include desktop virtualization protocols when they're ready for submission, without being told to extend vnc instead to gain a similar speedup (i.e. yet another inferior wheel). Unification means that _all_ qemu users - pure research, theoretical interest, Xen, virtualbox, weird pure software architectures - will be able to push their stuff in for the common good, but that also shall apply to KVM!
It has to become clear that reinventing inferior wheels instead of merging the real thing is absolutely time-wasteful and unnecessary, and it won't make any difference as far as KVM is concerned; the proof is that 0% of the userbase runs qemu git to run KVM (except perhaps the kvm-all.c developers testing it, or somebody who by mistake didn't add the -kvm prefix to the command line). I don't pretend to rate KVM as more important than all the rest of the niche usages for qemu, but it shall be _as_ important as the rest, and it'd be nice one day to be able to install only qemu on a system and get something actually usable in production. I very much like that qemu gets contributions from everywhere, and it's also nice that it can run without KVM (that is purely useful as a debugging tool to me, but still...). I think it can all happen, and unification should be the objective, for the gain of everyone in both qemu/kvm and even xen and all the rest. ^ permalink raw reply [flat|nested] 390+ messages in thread
* Re: [RFC] Unify KVM kernel-space and user-space code into a single project 2010-03-22 16:32 ` Ingo Molnar 2010-03-22 17:17 ` Frank Ch. Eigler 2010-03-22 17:44 ` Avi Kivity @ 2010-03-22 19:20 ` Joerg Roedel 2010-03-22 19:28 ` Avi Kivity 2010-03-22 19:49 ` Ingo Molnar 2 siblings, 2 replies; 390+ messages in thread From: Joerg Roedel @ 2010-03-22 19:20 UTC (permalink / raw) To: Ingo Molnar Cc: Avi Kivity, Anthony Liguori, Pekka Enberg, Zhang, Yanmin, Peter Zijlstra, Sheng Yang, linux-kernel, kvm, Marcelo Tosatti, Jes Sorensen, Gleb Natapov, Zachary Amsden, ziteng.huang, Arnaldo Carvalho de Melo, Frédéric Weisbecker On Mon, Mar 22, 2010 at 05:32:15PM +0100, Ingo Molnar wrote: > I dont know how you can find the situation of Alpha comparable, which is a > legacy architecture for which no new CPU was manufactured in the past ~10 > years. > > The negative effects of physical obsolescence cannot be overcome even by the > very best of development models ... The maintainers of that architecture could at least continue to maintain it. But that is not the case. Most newer syscalls are not available and overall stability on alpha sucks (the kernel crashed when I tried to start Xorg, for example) but nobody cares about it. Hardware is still around and there are still some users of it. > > > * Joerg Roedel <joro@8bytes.org> wrote: > > No, the split-repository situation was the smallest problem after all. It > > was a community thing. If the community doesn't work a single-repo project > > will also fail. [...] > > So, what do you think creates code communities and keeps them alive? > Developers and code. And the wellbeing of developers are primarily influenced > by the repository structure and by the development/maintenance process - i.e. > by the 'fun' aspect. (i'm simplifying things there but that's the crux of it.) Right. A living community needs developers that write new code. And the repository structure is one important thing.
But in my opinion it is not the most important one. With my 3-4 years of experience in the kernel community, I have found that the maintainers are the most important factor. I find a maintainer not committing or caring about patches, or not releasing new versions, much worse than the wrong repository structure. oProfile has this problem with its userspace part. I partly had this bad experience with x86-64 before the architecture merge. KVM does not have this problem. > So yes, i do claim that what stifled and eventually killed off the Oprofile > community was the split repository. None of the other Oprofile shortcomings > were really unfixable, but this one was. It gave no way for the community to > grow in a healthy way, after the initial phase. Features were more difficult > and less fun to develop. The biggest problem oProfile has is that it does not support per-process measuring. This is indeed not unfixable but it also doesn't fit well in the overall oProfile concept. > I simply do not want to see KVM face the same fate, and yes i do see similar > warning signs. In fact, the development process in KVM has improved over time. In the early beginnings everything was kept in svn. Avi switched to git at some point, but at the time when we had these kvm-XX releases both kernel- and user-space together were unbisectable. This has improved to the point where the kernel part can be bisected. The KVM maintainers and community have shown in the past that they can address problems with the development process when they come up. > Oprofile certainly had good developers and maintainers as well. In the end it > wasnt enough ... > > Also, a project can easily still be 'alive' but not reach its full potential. > > Why do you assume that my argument means that KVM isnt viable today? It can > very well still be viable and even healthy - just not _as healthy_ as it could > be ...
I am not aware that I made you say anything ;-) > > > > The difference is that we dont have KVM with a decade of history and we > > > dont have a 'told you so' KVM reimplementation to show that proves the > > > point. I guess it's a matter of time before that happens, because Qemu > > > usability is so abysmal today - so i guess we should suspend any > > > discussions until that happens, no need to waste time on arguing > > > hypotheticals. > > > > We actually have lguest which is small. But it lacks functionality and the > > developer community KVM has attracted. > > I suggested long ago to merge lguest into KVM to cover non-VMX/non-SVM > execution. That would have been the best. Rusty already started this work and presented it at the first KVM Forum. But I have never seen patches ... > > > I think you are rationalizing the status quo. > > > > I see that there are issues with KVM today in some areas. You pointed out > > the desktop usability already. I personally have trouble with the > > qemu-kvm.git because it is unbisectable. But repository unification doesn't > > solve the problem here. > > Why doesnt it solve the bisectability problem? The kernel repo is supposed to > be bisectable so that problem would be solved. Because Marcelo and Avi try to keep as close to upstream qemu as possible. So the qemu repo is regularly merged into qemu-kvm, and if you want to bisect you may end up somewhere in the middle of the qemu repository, which has only very minimal kvm support. The problem here is that two qemu repositories exist. But the current effort of Anthony is directed at creating a single qemu repository, and that's not done overnight. Merging qemu into the kernel would make Linus in fact a qemu maintainer. I am not sure he wants to be that ;-) > In my judgement you'd have to do that more frequently, if KVM was properly > weighting its priorities.
For example regarding this recent KVM commit of > yours: > > | commit ec1ff79084fccdae0dca9b04b89dcdf3235bbfa1 > | Author: Joerg Roedel <joerg.roedel@amd.com> > | Date: Fri Oct 9 16:08:31 2009 +0200 > | > | KVM: SVM: Add tracepoint for invlpga instruction > | > | This patch adds a tracepoint for the event that the guest > | executed the INVLPGA instruction. > > With integrated KVM tooling i might have insisted for that new tracepoint to > be available to users as well via some more meaningful tooling than just a > pure tracepoint. > > There's synergies like that all around the place. True. Tools for better analyzing kvm traces is for sure something that belongs to tools/kvm. I am not sure if anyone has such tools. If yes, they should send it upstream. > > > It's as if you argued in 1990 that the unification of East and West > > > Germany wouldnt make much sense because despite clear problems and > > > incompatibilites and different styles westerners were still allowed to > > > visit eastern relatives and they both spoke the same language after all > > > ;-) > > > > Um, hmm. I don't think these situations have enough in common to compare > > them ;-) > > Probably, but it's an interesting parallel nevertheless ;-) That for sure ;-) Joerg ^ permalink raw reply [flat|nested] 390+ messages in thread
* Re: [RFC] Unify KVM kernel-space and user-space code into a single project 2010-03-22 19:20 ` Joerg Roedel @ 2010-03-22 19:28 ` Avi Kivity 2010-03-22 19:49 ` Ingo Molnar 1 sibling, 0 replies; 390+ messages in thread From: Avi Kivity @ 2010-03-22 19:28 UTC (permalink / raw) To: Joerg Roedel Cc: Ingo Molnar, Anthony Liguori, Pekka Enberg, Zhang, Yanmin, Peter Zijlstra, Sheng Yang, linux-kernel, kvm, Marcelo Tosatti, Jes Sorensen, Gleb Natapov, Zachary Amsden, ziteng.huang, Arnaldo Carvalho de Melo, Fr?d?ric Weisbecker On 03/22/2010 09:20 PM, Joerg Roedel wrote: > >> Why doesnt it solve the bisectability problem? The kernel repo is supposed to >> be bisectable so that problem would be solved. >> > Because Marcelo and Avi try to keep as close to upstream qemu as > possible. So the qemu repo is regularly merged in qemu-kvm and if you > want to bisect you may end up somewhere in the middle of the qemu > repository which has only very minimal kvm-support. > The problem here is that two qemu repositorys exist. But the current > effort of Anthony is directed to create a single qemu repository. But > thats not done overnight. > It's in fact possible to bisect qemu-kvm.git. If you end up in qemu.git, do a 'git bisect skip'. If you end up in a merge, call the merge point A, bisect A^1..A^2, each time merging A^1 before compiling (the merge is always trivial due to the way we do it). Not fun, but it works. When we complete merging kvm integration into qemu.git, this problem will disappear. -- Do not meddle in the internals of kernels, for they are subtle and quick to panic. ^ permalink raw reply [flat|nested] 390+ messages in thread
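Avi's skip-based recipe can be exercised on a throwaway repository. Everything below is invented for illustration - the toy repo stands in for qemu-kvm.git, a numeric predicate on a `state` file stands in for "the bug is present", and the first commit bisect visits is treated as unbuildable the way a commit from a merged-in qemu.git range would be; the merge-point refinement (bisecting A^1..A^2 and re-merging A^1) is omitted:

```shell
# Toy demonstration of the `git bisect skip` workflow on a throwaway
# repo: commit 1 is known good, commit 8 known bad, the "bug" appears
# in commit 3, and the first candidate bisect checks out is skipped
# as if it did not build.
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.email dev@example.com
git config user.name dev
for i in 1 2 3 4 5 6 7 8; do
  echo "$i" > state
  git add state
  git commit -qm "commit $i"
done
git bisect start HEAD HEAD~7 > /dev/null   # HEAD bad, HEAD~7 (commit 1) good
skipped=0 result=""
for attempt in 1 2 3 4 5 6 7 8 9 10; do
  if [ "$skipped" -eq 0 ]; then
    step=$(git bisect skip)                # pretend this commit does not build
    skipped=1
  elif [ "$(cat state)" -ge 3 ]; then
    step=$(git bisect bad)                 # the bug is present from commit 3 on
  else
    step=$(git bisect good)
  fi
  case "$step" in
    *"first bad commit"*) result="$step"; break ;;
  esac
done
echo "$result" | grep "is the first bad commit"
```

Bisect routes around the skipped commit and still converges on commit 3, since the skipped candidate never sits on the good/bad boundary here; in the real qemu-kvm.git case a skipped range can straddle the boundary, which is why Avi calls the procedure workable but "not fun".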
* Re: [RFC] Unify KVM kernel-space and user-space code into a single project 2010-03-22 19:20 ` Joerg Roedel 2010-03-22 19:28 ` Avi Kivity @ 2010-03-22 19:49 ` Ingo Molnar 1 sibling, 0 replies; 390+ messages in thread From: Ingo Molnar @ 2010-03-22 19:49 UTC (permalink / raw) To: Joerg Roedel Cc: Avi Kivity, Anthony Liguori, Pekka Enberg, Zhang, Yanmin, Peter Zijlstra, Sheng Yang, linux-kernel, kvm, Marcelo Tosatti, Jes Sorensen, Gleb Natapov, Zachary Amsden, ziteng.huang, Arnaldo Carvalho de Melo, Frédéric Weisbecker * Joerg Roedel <joro@8bytes.org> wrote: > On Mon, Mar 22, 2010 at 05:32:15PM +0100, Ingo Molnar wrote: > > I dont know how you can find the situation of Alpha comparable, which is a > > legacy architecture for which no new CPU was manufactured in the past ~10 > > years. > > > > The negative effects of physical obsolescence cannot be overcome even by the > > very best of development models ... > > The maintainers of that architecture could at least continue to maintain it. > But that is not the case. Most newer syscalls are not available and overall > stability on alpha sucks (kernel crashed when I tried to start Xorg for > example) but nobody cares about it. Hardware is still around and there are > still some users of it. You are asking why maintainers do not act, as you suggest, against the huge negative effects of physical obsolescence? Please use common sense: they dont act because ... there are huge negative effects due to physical obsolescence? No amount of development model engineering can offset that negative. Thanks, Ingo ^ permalink raw reply [flat|nested] 390+ messages in thread
* Re: [RFC] Unify KVM kernel-space and user-space code into a single project 2010-03-21 19:17 ` Ingo Molnar 2010-03-21 19:35 ` Antoine Martin 2010-03-21 20:01 ` Avi Kivity @ 2010-03-21 23:35 ` Anthony Liguori 2 siblings, 0 replies; 390+ messages in thread From: Anthony Liguori @ 2010-03-21 23:35 UTC (permalink / raw) To: Ingo Molnar Cc: Avi Kivity, Pekka Enberg, Zhang, Yanmin, Peter Zijlstra, Sheng Yang, linux-kernel, kvm, Marcelo Tosatti, oerg Roedel, Jes Sorensen, Gleb Natapov, Zachary Amsden, ziteng.huang, Arnaldo Carvalho de Melo, Fr?d?ric Weisbecker On 03/21/2010 02:17 PM, Ingo Molnar wrote: > >> If you want to improve this, you need to do the following: >> >> 1) Add a userspace daemon that uses vmchannel that runs in the guest and can >> fetch kallsyms and arbitrary modules. If that daemon lives in >> tools/perf, that's fine. >> > Adding any new daemon to an existing guest is a deployment and usability > nightmare. > > The basic rule of good instrumentation is to be transparent. The moment we > have to modify the user-space of a guest just to monitor it, the purpose of > transparent instrumentation is defeated. > > That was one of the fundamental usability mistakes of Oprofile. > > There is no 'perf' daemon - all the perf functionality is _built in_, and for > very good reasons. It is one of the main reasons for perf's success as well. > The solution should be a long lived piece of code that runs without kernel privileges. How the code is delivered to the user is a separate problem. If you want to argue that the kernel should build an initramfs that contains some things that always should be shipped with the kernel but don't need to be within the kernel, I think that's something that's long over due. We could make it a kernel thread, but what's the point? It's much safer for it to be a userspace thread and it doesn't need to interact with the kernel in an intimate way. Regards, Anthony Liguori ^ permalink raw reply [flat|nested] 390+ messages in thread
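The guest-side half of the daemon Anthony sketches would, at minimum, export the guest's symbol data to the host. A hypothetical sketch - the staging directory, file names, and fallback line are all invented, and the vmchannel transport itself is elided entirely:

```shell
# Stage the symbol data a host-side profiler needs from a guest.
# A real daemon would ship these over vmchannel; here they are only
# copied to a local staging directory. The placeholder line covers
# systems where /proc/kallsyms is unreadable.
stage=$(mktemp -d)
if ! cp /proc/kallsyms "$stage/kallsyms" 2>/dev/null || ! [ -s "$stage/kallsyms" ]; then
  printf 'ffffffff81000000 T _text\n' > "$stage/kallsyms"   # placeholder
fi
cp /proc/modules "$stage/modules" 2>/dev/null || : > "$stage/modules"
# The host side would then consume the transferred copies, e.g.:
#   perf kvm --host --guest --guestkallsyms=<copy of kallsyms> \
#            --guestmodules=<copy of modules> top
wc -l < "$stage/kallsyms"
```

This is exactly the data the manual `--guestkallsyms`/`--guestmodules` workflow quoted elsewhere in the thread supplies by hand; the dispute is only about whether a guest daemon or host-side transparency should deliver it.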
* Re: [RFC] Unify KVM kernel-space and user-space code into a single project 2010-03-19 8:53 ` Ingo Molnar 2010-03-19 12:56 ` Anthony Liguori @ 2010-03-20 7:35 ` Avi Kivity 2010-03-21 19:06 ` Ingo Molnar 1 sibling, 1 reply; 390+ messages in thread From: Avi Kivity @ 2010-03-20 7:35 UTC (permalink / raw) To: Ingo Molnar Cc: Pekka Enberg, Anthony Liguori, Zhang, Yanmin, Peter Zijlstra, Sheng Yang, linux-kernel, kvm, Marcelo Tosatti, oerg Roedel, Jes Sorensen, Gleb Natapov, Zachary Amsden, ziteng.huang, Arnaldo Carvalho de Melo, Fr?d?ric Weisbecker On 03/19/2010 10:53 AM, Ingo Molnar wrote: > * Avi Kivity<avi@redhat.com> wrote: > > >>> There were two negative reactions immediately, both showed a fundamental >>> server versus desktop bias: >>> >>> - you did not accept that the most important usecase is when there is a >>> single guest running. >>> >> Well, it isn't. >> > Erm, my usability points are _doubly_ true when there are multiple guests ... > > The inconvenience of having to type: > > perf kvm --host --guest --guestkallsyms=/home/ymzhang/guest/kallsyms \ > --guestmodules=/home/ymzhang/guest/modules top > > is very obvious even with a single guest. Now multiply that by more guests ... > Yes. That's why I asked how this is handled. > The crux is: we are working on improving KVM instrumentation. There are > working patches posted to this thread and we would like to have/implement an > automatism to allow the discovery of all this information. The information > should be available to the developer who wants it, and easily/transparently so > - in true Linux fashion. > > >>> - the reaction to the 'how do we get symbols out of the guest' sub-question >>> was, paraphrased: 'we dont want that due to<unspecified> security threat >>> to XYZ selinux usecase with lots of guests'. >>> >> When I review a patch, I try to think of the difficult cases, not >> just the easy case. 
>> > You havent articulated an actionable reason and you have suggested no solution > either, I did suggest a symbol server, and using a well-known location, though I'm unhappy with it. Multiple guest management should be done by the appropriate tools, not qemu or implicitly. > you just passive-aggressive backed the claim that giving developers > access to the symbol space is some sort of vague 'security threat'. > Passive-aggressive? Should I see a doctor? > If that is not so i'd be glad to be proven wrong. > > >>> Anyone being aware of how Linux and KVM is being used on the desktop will >>> know how detached that attitude is from the typical desktop usecase ... >>> >>> Usability _never_ sucks because of lack of patches or lack of suggestions. >>> I bet if you made the next server feature contingent on essential >>> usability fixes they'd happen overnight - for God's sake there's been 1000 >>> commits in the last 3 months in the Qemu repository so there's plenty of >>> manpower... >>> >> First of all I am not a qemu maintainer. [...] >> > That is the crux of the matter. My experience in these threads was that no-one > really seems to feel in charge of the whole thing. I am comfortable with having someone I trust maintain qemu. While sometimes Anthony overrides me on issues where I know I'm right and he's wrong, I still prefer that to having to do everything myself; I would surely do a worse job due to overload. If you actually look at qemu patches, the vast majority have little to do directly with kvm; and I (along with Marcelo) maintain the kvm integration in qemu. > Should we really wonder why > KVM usability sucks? > That wouldn't change at all if I were to maintain it, since I wouldn't start writing a GUI for it and wouldn't force other contributors to do so as a condition for accepting unrelated patches.
Second, from my point of view all contributors are volunteers (perhaps >> their employer volunteered them, but there's no difference from my >> perspective). Asking them to repaint my apartment as a condition to get a >> patch applied is abuse. If a patch is good, it gets applied. >> > This is one of the weirdest arguments i've seen in this thread. Almost all the > time do we make contributions conditional on the general shape of the project. > Developers dont get to do just the fun stuff. > So, do you think a reply to a patch along the lines of NAK. Improving scalability is pointless while we don't have a decent GUI. I'll review you RCU patches _after_ you've contributed a usable GUI. ? > This is a basic quid pro quo: new features introduce risks and create > additional workload not just to the originating developer but on the rest of > the community as well. You should check how Linus has pulled new features in > the past 15 years: he very much requires the existing code to first be > top-notch before he accepts new features for a given area of functionality. > For a given area, yes. It makes sense to clean up code before changing it, otherwise cruft accumulates rapidly. What you're describing is completely different and amounts to total disregard of contributors' time and effort. > Doing that and insisting on developers to see those imbalances as well is > absolutely essential to code quality: otherwise everyone would be running > around implementing just the features they are interested in, without regard > for the general health of the project. > The general health of qemu in terms of code quality was indeed pretty bad and there was (and is) a massive effort to modernise it. If you're interested look at qdev and qmp. Both are efforts to improve the infrastructure rather than add features on rotten code, and very successful IMO. There was no effort to write a GUI since no one appears to be motivated to do it except you. 
> Of course, if you keep the project in two halves (KVM and Qemu), and pretend > that they are separate and have little relation, imbalances of quality can > mount up and you can throw your hands up and say that it's "too bad, I'm not > maintaining that". It is your basic duty as a Linux maintainer to keep > balances of quality. I do it all day, other maintainers do it all day. > IMO qemu quality has improved dramatically in the last year or two. >>> Usability suckage - and i'm not going to be popular for saying this out >>> loud - almost always shows a basic maintainer disconnect with the real >>> world. See your very first reactions to my 'KVM usability' observations. >>> Read back your and Anthony's replies: total 'sure, patches welcome' kind >>> of indifference. It is _your project_, not some other project down the >>> road ... >>> >> I could drop everything and write a gtk GUI for qemu. Is that what you >> want? >> > No, my suggestion to you (it's up to you whether you give my opinion any > weight) is to accept your mistakes and improve, and to not stand in the way of > people who'd like to improve the situation. You are happy with the server > features and you also made it clear that you dont feel responsible for the > rest of the package - which is a big mistake IMO. > If there were no capable maintainer I would reluctantly step in. That is not the case. If I were to displace Anthony then qemu quality would suffer, or I would have to drop kvm maintainership, or, if some false modesty is allowed, perhaps both. > Also, you have demonstrated it in this thread that you have near zero > technical clue about basic desktop and development usability matters > Neither do you. At least I have spent enough time among real usability people to know this. I don't have any pretences in this area and am happy to leave it to the experts. As infrastructure projects kvm and qemu should concentrate on providing flexible capabilities to consumers, which then expose it to users. 
These consumers can be server-oriented management applications, or end-user GUIs. My preferred plan for GUIs, btw, is a plugin based approach where qemu exposes its internal objects (the same ones that are exposed to management applications) to the GUI which can then manipulate them, without being co-maintained in the same code base. This allows multiple GUIs (KDE and GNOME) and allows people with a clue to work on them. -- Do not meddle in the internals of kernels, for they are subtle and quick to panic. ^ permalink raw reply [flat|nested] 390+ messages in thread
* Re: [RFC] Unify KVM kernel-space and user-space code into a single project 2010-03-20 7:35 ` Avi Kivity @ 2010-03-21 19:06 ` Ingo Molnar 2010-03-21 20:22 ` Avi Kivity 0 siblings, 1 reply; 390+ messages in thread From: Ingo Molnar @ 2010-03-21 19:06 UTC (permalink / raw) To: Avi Kivity Cc: Pekka Enberg, Anthony Liguori, Zhang, Yanmin, Peter Zijlstra, Sheng Yang, linux-kernel, kvm, Marcelo Tosatti, oerg Roedel, Jes Sorensen, Gleb Natapov, Zachary Amsden, ziteng.huang, Arnaldo Carvalho de Melo, Fr?d?ric Weisbecker * Avi Kivity <avi@redhat.com> wrote: > >> [...] Second, from my point of view all contributors are volunteers > >> (perhaps their employer volunteered them, but there's no difference from > >> my perspective). Asking them to repaint my apartment as a condition to > >> get a patch applied is abuse. If a patch is good, it gets applied. > > > > This is one of the weirdest arguments i've seen in this thread. Almost all > > the time do we make contributions conditional on the general shape of the > > project. Developers dont get to do just the fun stuff. > > So, do you think a reply to a patch along the lines of > > NAK. Improving scalability is pointless while we don't have a decent GUI. > I'll review you RCU patches > _after_ you've contributed a usable GUI. > > ? What does this have to do with RCU? I'm talking about KVM, which is a Linux kernel feature that is useless without a proper, KVM-specific app making use of it. RCU is a general kernel performance feature that works across the board. It helps KVM indirectly, and it helps many other kernel subsystems as well. It needs no user-space tool to be useful. KVM on the other hand is useless without a user-space tool. [ Theoretically you might have a fair point if it were a critical feature of RCU for it to have a GUI, and if the main tool that made use of it sucked. But it isnt and you should know that. ] Had you suggested the following 'NAK', applied to a different, relevant subsystem: | NAK. 
Improving scalability is pointless while we don't have a usable | tool. I'll review your perf patches _after_ you've contributed a usable | tool. you would have a fair point. In fact, we are doing that; we are living by that. It makes absolutely zero sense to improve the scalability of perf if its usability sucks. So where you are trying to point out an inconsistency in my argument, there is none. > > This is a basic quid pro quo: new features introduce risks and create > > additional workload not just to the originating developer but on the rest > > of the community as well. You should check how Linus has pulled new > > features in the past 15 years: he very much requires the existing code to > > first be top-notch before he accepts new features for a given area of > > functionality. > > For a given area, yes. [...] That is my precise point. KVM is a specific subsystem or "area" that makes no sense without the user-space tooling it relates to. You seem to argue that you have no 'right' to insist on good quality of that tooling - and IMO you are fundamentally wrong with that. Thanks, Ingo ^ permalink raw reply [flat|nested] 390+ messages in thread
* Re: [RFC] Unify KVM kernel-space and user-space code into a single project 2010-03-21 19:06 ` Ingo Molnar @ 2010-03-21 20:22 ` Avi Kivity 2010-03-21 20:55 ` Ingo Molnar 0 siblings, 1 reply; 390+ messages in thread From: Avi Kivity @ 2010-03-21 20:22 UTC (permalink / raw) To: Ingo Molnar Cc: Pekka Enberg, Anthony Liguori, Zhang, Yanmin, Peter Zijlstra, Sheng Yang, linux-kernel, kvm, Marcelo Tosatti, oerg Roedel, Jes Sorensen, Gleb Natapov, Zachary Amsden, ziteng.huang, Arnaldo Carvalho de Melo, Fr?d?ric Weisbecker On 03/21/2010 09:06 PM, Ingo Molnar wrote: > * Avi Kivity<avi@redhat.com> wrote: > > >>>> [...] Second, from my point of view all contributors are volunteers >>>> (perhaps their employer volunteered them, but there's no difference from >>>> my perspective). Asking them to repaint my apartment as a condition to >>>> get a patch applied is abuse. If a patch is good, it gets applied. >>>> >>> This is one of the weirdest arguments i've seen in this thread. Almost all >>> the time do we make contributions conditional on the general shape of the >>> project. Developers dont get to do just the fun stuff. >>> >> So, do you think a reply to a patch along the lines of >> >> NAK. Improving scalability is pointless while we don't have a decent GUI. >> I'll review you RCU patches >> _after_ you've contributed a usable GUI. >> >> ? >> > What does this have to do with RCU? > The example was rcuifying kvm which took place a bit ago. Sorry, it wasn't clear. > I'm talking about KVM, which is a Linux kernel feature that is useless without > a proper, KVM-specific app making use of it. > > RCU is a general kernel performance feature that works across the board. It > helps KVM indirectly, and it helps many other kernel subsystems as well. It > needs no user-space tool to be useful. > Correct. So should I tell someone that has sent a patch that rcu-ified kvm in order to scale it, that I won't accept the patch unless they do some usability userspace work? 
say, implementing an eject button. That's what I understood you to mean. > KVM on the other hand is useless without a user-space tool. > > [ Theoretically you might have a fair point if it were a critical feature of > RCU for it to have a GUI, and if the main tool that made use of it sucked. > But it isnt and you should know that. ] > > Had you suggested the following 'NAK', applied to a different, relevant > subsystem: > > | NAK. Improving scalability is pointless while we don't have a usable > | tool. I'll review your perf patches _after_ you've contributed a usable > | tool. > That might hold, but the tool is usable at least for some people - it runs in production. The people running it won't benefit from an eject button or any usability improvement since they run it through a centralized management tool that hides everything. They will benefit from the scalability patches. Should I still make those patches conditional on usability work that is of no interest to the submitter? > >>> This is a basic quid pro quo: new features introduce risks and create >>> additional workload not just to the originating developer but on the rest >>> of the community as well. You should check how Linus has pulled new >>> features in the past 15 years: he very much requires the existing code to >>> first be top-notch before he accepts new features for a given area of >>> functionality. >>> >> For a given area, yes. [...] >> > That is my precise point. > > KVM is a specific subsystem or "area" that makes no sense without the > user-space tooling it relates to. You seem to argue that you have no 'right' > to insist on good quality of that tooling - and IMO you are fundamentally > wrong with that. > kvm contains many sub-areas. I'm not going to tie unrelated things together like the GUI and scalability, configuration file format and emulator correctness, nested virtualization and qcow2 asynchrony, or other crazy combinations.
People either leave en masse or become frustrated if they can't. I do reject patches touching a sub-area that I think need to be done in userspace, for example. That's not to say kvm development is random. We have a weekly conference call where regular contributors and maintainers of both qemu and kvm participate and where we decide where to focus. Sadly the issue of a qemu GUI is not raised often. Perhaps you can participate and voice your concerns. -- Do not meddle in the internals of kernels, for they are subtle and quick to panic. ^ permalink raw reply [flat|nested] 390+ messages in thread
* Re: [RFC] Unify KVM kernel-space and user-space code into a single project 2010-03-21 20:22 ` Avi Kivity @ 2010-03-21 20:55 ` Ingo Molnar 2010-03-21 21:42 ` Avi Kivity 0 siblings, 1 reply; 390+ messages in thread From: Ingo Molnar @ 2010-03-21 20:55 UTC (permalink / raw) To: Avi Kivity Cc: Pekka Enberg, Anthony Liguori, Zhang, Yanmin, Peter Zijlstra, Sheng Yang, linux-kernel, kvm, Marcelo Tosatti, oerg Roedel, Jes Sorensen, Gleb Natapov, Zachary Amsden, ziteng.huang, Arnaldo Carvalho de Melo, Fr?d?ric Weisbecker * Avi Kivity <avi@redhat.com> wrote: > On 03/21/2010 09:06 PM, Ingo Molnar wrote: > >* Avi Kivity<avi@redhat.com> wrote: > > > >>>>[...] Second, from my point of view all contributors are volunteers > >>>>(perhaps their employer volunteered them, but there's no difference from > >>>>my perspective). Asking them to repaint my apartment as a condition to > >>>>get a patch applied is abuse. If a patch is good, it gets applied. > >>>This is one of the weirdest arguments i've seen in this thread. Almost all > >>>the time do we make contributions conditional on the general shape of the > >>>project. Developers dont get to do just the fun stuff. > >>So, do you think a reply to a patch along the lines of > >> > >> NAK. Improving scalability is pointless while we don't have a decent GUI. > >>I'll review you RCU patches > >> _after_ you've contributed a usable GUI. > >> > >>? > >What does this have to do with RCU? > > The example was rcuifying kvm which took place a bit ago. Sorry, it wasn't > clear. > > > I'm talking about KVM, which is a Linux kernel feature that is useless > > without a proper, KVM-specific app making use of it. > > > > RCU is a general kernel performance feature that works across the board. > > It helps KVM indirectly, and it helps many other kernel subsystems as > > well. It needs no user-space tool to be useful. > > Correct. 
> So should I tell someone who has sent a patch that rcu-ified kvm in order to scale it, that I won't accept the patch unless they do some usability userspace work? Say, implementing an eject button. That's what I understood you to mean.

Of course you could say the following:

  ' Thanks, I'll mark this for v2.6.36 integration. Note that we are not able to add this to the v2.6.35 kernel queue anymore as the ongoing usability work already takes up all of the project's maintainer and testing bandwidth. If you want the feature to be merged sooner than that then please help us cut down on the TODO and BUGS list that can be found at XYZ. There's quite a few low hanging fruits there. '

Although this RCU example is the 'worst' possible example, as it's a pure speedup change with no functional effect.

Consider the _other_ examples that are a lot more clear:

  ' If you expose paravirt spinlocks via KVM please also make sure the KVM tooling can make use of it, has an option for it to configure it, and that it has sufficient efficiency statistics displayed in the tool for admins to monitor. '

  ' If you create this new paravirt driver then please also make sure it can be configured in the tooling. '

  ' Please also add a testcase for this bug to tools/kvm/testcases/ so we don't repeat this same mistake in the future. '

I'd say most of the high-level feature work in KVM has tooling impact.

And note the important argument that the 'eject button' thing would not occur naturally in a project that is well designed and has a good quality balance. It would only occur in the transitional period if a big lump of lower-quality code is unified with higher-quality code. Then indeed a lot of pressure gets created on the people working on the high-quality portion to go over and fix the low-quality portion. Which, btw., is an unconditionally good thing ...

But even an RCU speedup can be fairly linked/ordered to more pressing needs of a project.
Really, the unification of two tightly related pieces of code has numerous clear advantages. Please give it some thought before rejecting it.

Thanks, Ingo
* Re: [RFC] Unify KVM kernel-space and user-space code into a single project
  2010-03-21 20:55 ` Ingo Molnar
@ 2010-03-21 21:42 ` Avi Kivity
  2010-03-21 21:54 ` Ingo Molnar
  2010-03-21 22:00 ` Ingo Molnar
  0 siblings, 2 replies; 390+ messages in thread
From: Avi Kivity @ 2010-03-21 21:42 UTC (permalink / raw)
To: Ingo Molnar
Cc: Pekka Enberg, Anthony Liguori, Zhang, Yanmin, Peter Zijlstra, Sheng Yang, linux-kernel, kvm, Marcelo Tosatti, Joerg Roedel, Jes Sorensen, Gleb Natapov, Zachary Amsden, ziteng.huang, Arnaldo Carvalho de Melo, Frédéric Weisbecker

On 03/21/2010 10:55 PM, Ingo Molnar wrote:
>
> Of course you could say the following:
>
>   ' Thanks, I'll mark this for v2.6.36 integration. Note that we are not able to add this to the v2.6.35 kernel queue anymore as the ongoing usability work already takes up all of the project's maintainer and testing bandwidth. If you want the feature to be merged sooner than that then please help us cut down on the TODO and BUGS list that can be found at XYZ. There's quite a few low hanging fruits there. '

That would be shooting at my own foot as well as the contributor's, since I badly want that RCU stuff, and while a GUI would be nice, that itch isn't on my back. You're asking a developer and a maintainer to put off the work they're interested in, in order to work on something someone else is interested in but is not contributing work toward.

> Although this RCU example is the 'worst' possible example, as it's a pure speedup change with no functional effect.
>
> Consider the _other_ examples that are a lot more clear:
>
>   ' If you expose paravirt spinlocks via KVM please also make sure the KVM tooling can make use of it, has an option for it to configure it, and that it has sufficient efficiency statistics displayed in the tool for admins to monitor. '
>
>   ' If you create this new paravirt driver then please also make sure it can be configured in the tooling.
> '
>
>   ' Please also add a testcase for this bug to tools/kvm/testcases/ so we don't repeat this same mistake in the future. '

All three happen quite commonly in qemu/kvm development. Of course someone who develops a feature also develops a patch that exposes it in qemu. There are several test cases in qemu-kvm.git/kvm/user/test.

> I'd say most of the high-level feature work in KVM has tooling impact.

Usually, pretty low. Plumbing down a feature is usually trivial. There are exceptions, of course - smp is only supported in qemu-kvm.git, not in upstream qemu.git, for example. In any case of course the work is done in both qemu and kvm - do you think people develop features to see them bitrot?

> And note the important argument that the 'eject button' thing would not occur naturally in a project that is well designed and has a good quality balance. It would only occur in the transitional period if a big lump of lower-quality code is unified with higher-quality code. Then indeed a lot of pressure gets created on the people working on the high-quality portion to go over and fix the low-quality portion.

It's a matter of priorities.

> Which, btw., is an unconditionally good thing ...
>
> But even an RCU speedup can be fairly linked/ordered to more pressing needs of a project.

Pressing to whom?

> Really, the unification of two tightly related pieces of code has numerous clear advantages. Please give it some thought before rejecting it.

I'm not blind to the advantages. Dropping tcg would be the biggest of them by far (much more than moving the repository, IMO). But there are disadvantages as well. Around two years ago I seriously considered forking qemu; at this time I do not think it is a good idea.

-- Do not meddle in the internals of kernels, for they are subtle and quick to panic.
* Re: [RFC] Unify KVM kernel-space and user-space code into a single project
  2010-03-21 21:42 ` Avi Kivity
@ 2010-03-21 21:54 ` Ingo Molnar
  2010-03-22  0:16 ` Anthony Liguori
  2010-03-22  7:13 ` Avi Kivity
  0 siblings, 2 replies; 390+ messages in thread
From: Ingo Molnar @ 2010-03-21 21:54 UTC (permalink / raw)
To: Avi Kivity
Cc: Pekka Enberg, Anthony Liguori, Zhang, Yanmin, Peter Zijlstra, Sheng Yang, linux-kernel, kvm, Marcelo Tosatti, Joerg Roedel, Jes Sorensen, Gleb Natapov, Zachary Amsden, ziteng.huang, Arnaldo Carvalho de Melo, Frédéric Weisbecker

* Avi Kivity <avi@redhat.com> wrote:

> On 03/21/2010 10:55 PM, Ingo Molnar wrote:
> >
> > Of course you could say the following:
> >
> >   ' Thanks, I'll mark this for v2.6.36 integration. Note that we are not able to add this to the v2.6.35 kernel queue anymore as the ongoing usability work already takes up all of the project's maintainer and testing bandwidth. If you want the feature to be merged sooner than that then please help us cut down on the TODO and BUGS list that can be found at XYZ. There's quite a few low hanging fruits there. '
>
> That would be shooting at my own foot as well as the contributor's since I badly want that RCU stuff, and while a GUI would be nice, that itch isn't on my back.

I think this sums up the root cause of all the problems I see with KVM pretty well.

Thanks, Ingo
* Re: [RFC] Unify KVM kernel-space and user-space code into a single project
  2010-03-21 21:54 ` Ingo Molnar
@ 2010-03-22  0:16 ` Anthony Liguori
  2010-03-22 11:59 ` Ingo Molnar
  2010-03-22  7:13 ` Avi Kivity
  1 sibling, 1 reply; 390+ messages in thread
From: Anthony Liguori @ 2010-03-22 0:16 UTC (permalink / raw)
To: Ingo Molnar
Cc: Avi Kivity, Pekka Enberg, Peter Zijlstra, Sheng Yang, linux-kernel, kvm, Marcelo Tosatti, Joerg Roedel, Jes Sorensen, Gleb Natapov, Zachary Amsden, ziteng.huang, Arnaldo Carvalho de Melo, Frédéric Weisbecker

On 03/21/2010 04:54 PM, Ingo Molnar wrote:
> * Avi Kivity <avi@redhat.com> wrote:
>
>> On 03/21/2010 10:55 PM, Ingo Molnar wrote:
>>
>>> Of course you could say the following:
>>>
>>>   ' Thanks, I'll mark this for v2.6.36 integration. Note that we are not able to add this to the v2.6.35 kernel queue anymore as the ongoing usability work already takes up all of the project's maintainer and testing bandwidth. If you want the feature to be merged sooner than that then please help us cut down on the TODO and BUGS list that can be found at XYZ. There's quite a few low hanging fruits there. '
>>
>> That would be shooting at my own foot as well as the contributor's since I badly want that RCU stuff, and while a GUI would be nice, that itch isn't on my back.
>
> I think this sums up the root cause of all the problems I see with KVM pretty well.

A good maintainer has to strike a balance between asking more of people than what they initially volunteer and getting people to implement the less fun things that are nonetheless required. The kernel can take this to an extreme because at the end of the day, it's the only game in town and there is an unending number of potential volunteers. Most other projects are not quite as fortunate.

When someone submits a patch set to QEMU implementing a new network backend for raw sockets, we can push back about how it fits into the entire stack wrt security, usability, etc.
Ultimately, we can arrive at a different, more user-friendly solution (networking helpers), and along with some time investment on my part, we can create a much nicer solution. Still command-line based, though.

Responding to such a patch set with "replace the SDL front end with a GTK one that lets you graphically configure networking" is not reasonable, and the result would be one less QEMU contributor in the long run.

Over time, we can, and are, pushing people to focus more on usability. But that doesn't get you a first-class GTK GUI overnight. The only way you're going to get that is by having a contributor be specifically interested in building such a thing. We simply haven't had that in the past 5 years that I've been involved in the project. If someone stepped up to build this, I'd certainly support it in every way possible, and there are probably some steps we could take to even further encourage this.

Regards, Anthony Liguori
* Re: [RFC] Unify KVM kernel-space and user-space code into a single project
  2010-03-22  0:16 ` Anthony Liguori
@ 2010-03-22 11:59 ` Ingo Molnar
  0 siblings, 0 replies; 390+ messages in thread
From: Ingo Molnar @ 2010-03-22 11:59 UTC (permalink / raw)
To: Anthony Liguori
Cc: Avi Kivity, Pekka Enberg, Peter Zijlstra, Sheng Yang, linux-kernel, kvm, Marcelo Tosatti, Joerg Roedel, Jes Sorensen, Gleb Natapov, Zachary Amsden, ziteng.huang, Arnaldo Carvalho de Melo, Frédéric Weisbecker

* Anthony Liguori <anthony@codemonkey.ws> wrote:

> On 03/21/2010 04:54 PM, Ingo Molnar wrote:
> > * Avi Kivity <avi@redhat.com> wrote:
> >
> >> On 03/21/2010 10:55 PM, Ingo Molnar wrote:
> >>> Of course you could say the following:
> >>>
> >>>   ' Thanks, I'll mark this for v2.6.36 integration. Note that we are not able to add this to the v2.6.35 kernel queue anymore as the ongoing usability work already takes up all of the project's maintainer and testing bandwidth. If you want the feature to be merged sooner than that then please help us cut down on the TODO and BUGS list that can be found at XYZ. There's quite a few low hanging fruits there. '
> >>
> >> That would be shooting at my own foot as well as the contributor's since I badly want that RCU stuff, and while a GUI would be nice, that itch isn't on my back.
> >
> > I think this sums up the root cause of all the problems I see with KVM pretty well.
>
> A good maintainer has to strike a balance between asking more of people than what they initially volunteer and getting people to implement the less fun things that are nonetheless required. [...]

Sorry to be blunt, but I don't think there's a different way to say it: I am a user of the software you are maintaining (Qemu) and I don't think you have the basis to educate people about what a good maintainer should do to achieve a quality end result.
I think you could/should learn much from Linus and others who very much require quality to permeate the full dimension of a contribution (including usability), beyond the narrow, local scope of the contribution.

Thanks, Ingo
* Re: [RFC] Unify KVM kernel-space and user-space code into a single project
  2010-03-21 21:54 ` Ingo Molnar
  2010-03-22  0:16 ` Anthony Liguori
@ 2010-03-22  7:13 ` Avi Kivity
  2010-03-22 11:14 ` Ingo Molnar
  2010-03-24 12:06 ` Paolo Bonzini
  1 sibling, 2 replies; 390+ messages in thread
From: Avi Kivity @ 2010-03-22 7:13 UTC (permalink / raw)
To: Ingo Molnar
Cc: Pekka Enberg, Anthony Liguori, Zhang, Yanmin, Peter Zijlstra, Sheng Yang, linux-kernel, kvm, Marcelo Tosatti, Joerg Roedel, Jes Sorensen, Gleb Natapov, Zachary Amsden, ziteng.huang, Arnaldo Carvalho de Melo, Frédéric Weisbecker

On 03/21/2010 11:54 PM, Ingo Molnar wrote:
> * Avi Kivity <avi@redhat.com> wrote:
>
>> On 03/21/2010 10:55 PM, Ingo Molnar wrote:
>>
>>> Of course you could say the following:
>>>
>>>   ' Thanks, I'll mark this for v2.6.36 integration. Note that we are not able to add this to the v2.6.35 kernel queue anymore as the ongoing usability work already takes up all of the project's maintainer and testing bandwidth. If you want the feature to be merged sooner than that then please help us cut down on the TODO and BUGS list that can be found at XYZ. There's quite a few low hanging fruits there. '
>>
>> That would be shooting at my own foot as well as the contributor's since I badly want that RCU stuff, and while a GUI would be nice, that itch isn't on my back.
>
> I think this sums up the root cause of all the problems I see with KVM pretty well.

I think we agree at last. Neither I nor my employer are interested in running qemu as a desktop-on-desktop tool, therefore I don't invest any effort in that direction, or require it from volunteers. If you think a good GUI is so badly needed, either write one yourself, or convince someone else to do it.

(btw, why are you interested in desktop-on-desktop?
One use case is developers, who don't really need fancy GUIs; a second is people who test out distributions, but that doesn't seem to be a huge population; and a third is people running Windows for some application that doesn't run on Linux - hopefully a small category as well. Seems to be quite a small target audience, compared to, say, video editing.)

-- Do not meddle in the internals of kernels, for they are subtle and quick to panic.
* Re: [RFC] Unify KVM kernel-space and user-space code into a single project
  2010-03-22  7:13 ` Avi Kivity
@ 2010-03-22 11:14 ` Ingo Molnar
  2010-03-22 11:23 ` Alexander Graf
  2010-03-22 12:29 ` Avi Kivity
  1 sibling, 2 replies; 390+ messages in thread
From: Ingo Molnar @ 2010-03-22 11:14 UTC (permalink / raw)
To: Avi Kivity
Cc: Pekka Enberg, Anthony Liguori, Zhang, Yanmin, Peter Zijlstra, Sheng Yang, linux-kernel, kvm, Marcelo Tosatti, Joerg Roedel, Jes Sorensen, Gleb Natapov, Zachary Amsden, ziteng.huang, Arnaldo Carvalho de Melo, Frédéric Weisbecker, Gregory Haskins

* Avi Kivity <avi@redhat.com> wrote:

> On 03/21/2010 11:54 PM, Ingo Molnar wrote:
> > * Avi Kivity <avi@redhat.com> wrote:
> >
> >> On 03/21/2010 10:55 PM, Ingo Molnar wrote:
> >>> Of course you could say the following:
> >>>
> >>>   ' Thanks, I'll mark this for v2.6.36 integration. Note that we are not able to add this to the v2.6.35 kernel queue anymore as the ongoing usability work already takes up all of the project's maintainer and testing bandwidth. If you want the feature to be merged sooner than that then please help us cut down on the TODO and BUGS list that can be found at XYZ. There's quite a few low hanging fruits there. '
> >>
> >> That would be shooting at my own foot as well as the contributor's since I badly want that RCU stuff, and while a GUI would be nice, that itch isn't on my back.
> >
> > I think this sums up the root cause of all the problems I see with KVM pretty well.
>
> I think we agree at last. Neither I nor my employer are interested in running qemu as a desktop-on-desktop tool, therefore I don't invest any effort in that direction, or require it from volunteers.

Obviously your employer at least in part defers to you when it comes to KVM priorities.
So, just to make this really clear, _you_ are not interested in running qemu as a desktop-on-desktop tool; subsequently this kind of disinterest-for-desktop-usability trickled through the whole KVM stack and poisoned your attitude and your contributors' attitude.

Too sad really, and it's doubly sad that you don't feel anything wrong about that.

> If you think a good GUI is so badly needed, either write one yourself, or convince someone else to do it.

To a certain degree we are trying to do a small bit of that (see this very thread) - and you are NAK-ing and objecting the heck out of it via your unreasonable microkernelish and server-centric views.

With constant maintainer disinterest there's no wonder a non-desktop-oriented KVM becomes a self-fulfilling prophecy: you think the desktop does not matter, hence it becomes a reality in KVM space which you can constantly refer back to as a 'fact'. Which I find dishonest and disingenuous at best.

> (btw, why are you interested in desktop-on-desktop? one use case is developers, who don't really need fancy GUIs; a second is people who test out distributions, but that doesn't seem to be a huge population; and a third is people running Windows for some application that doesn't run on Linux - hopefully a small category as well. Seems to be quite a small target audience, compared to, say, video editing)

I'm interested in desktop-on-desktop because I walk this world with open eyes and I care about Linux, and these days qemu-kvm is the first thing a new Linux user sees about Linux virtualization. I've observed several people I know in person turn away from Linux and go back to Windows or go over to Apple because they had a much more mature solution. I'd probably turn away from Linux myself if I were a newbie and if I were forced to use KVM on the desktop today.
Again, you don't seem to realize that you as a maintainer are at a central point where you have the ability to turn the self-fulfilling prophecy that 'nobody cares about the Linux desktop' into a reality - or where you have the ability to prevent it from happening. It is a hugely harmful process, especially as you seem to delude yourself that you have nothing to do with it.

Anyway, it's good you expressed your views about this as this will help the chances of a fresh restart. (Though the chances are still not too good.)

Thanks, Ingo
* Re: [RFC] Unify KVM kernel-space and user-space code into a single project
  2010-03-22 11:14 ` Ingo Molnar
@ 2010-03-22 11:23 ` Alexander Graf
  2010-03-22 12:33 ` Lukas Kolbe
  2010-03-22 12:29 ` Avi Kivity
  1 sibling, 1 reply; 390+ messages in thread
From: Alexander Graf @ 2010-03-22 11:23 UTC (permalink / raw)
To: Ingo Molnar
Cc: Avi Kivity, Pekka Enberg, Anthony Liguori, Zhang, Yanmin, Peter Zijlstra, Sheng Yang, linux-kernel, kvm, Marcelo Tosatti, Joerg Roedel, Jes Sorensen, Gleb Natapov, Zachary Amsden, ziteng.huang, Arnaldo Carvalho de Melo, Frédéric Weisbecker, Gregory Haskins

On 22.03.2010, at 12:14, Ingo Molnar wrote:

> * Avi Kivity <avi@redhat.com> wrote:
>
>> On 03/21/2010 11:54 PM, Ingo Molnar wrote:
>>> * Avi Kivity <avi@redhat.com> wrote:
>>>
>>>> On 03/21/2010 10:55 PM, Ingo Molnar wrote:
>>>>> Of course you could say the following:
>>>>>
>>>>>   ' Thanks, I'll mark this for v2.6.36 integration. Note that we are not able to add this to the v2.6.35 kernel queue anymore as the ongoing usability work already takes up all of the project's maintainer and testing bandwidth. If you want the feature to be merged sooner than that then please help us cut down on the TODO and BUGS list that can be found at XYZ. There's quite a few low hanging fruits there. '
>>>>
>>>> That would be shooting at my own foot as well as the contributor's since I badly want that RCU stuff, and while a GUI would be nice, that itch isn't on my back.
>>>
>>> I think this sums up the root cause of all the problems I see with KVM pretty well.
>>
>> I think we agree at last. Neither I nor my employer are interested in running qemu as a desktop-on-desktop tool, therefore I don't invest any effort in that direction, or require it from volunteers.
>
> Obviously your employer at least in part defers to you when it comes to KVM priorities.
>
> So, just to make this really clear, _you_ are not interested in running qemu as a desktop-on-desktop tool; subsequently this kind of disinterest-for-desktop-usability trickled through the whole KVM stack and poisoned your attitude and your contributors' attitude.
>
> Too sad really, and it's doubly sad that you don't feel anything wrong about that.

Please, don't jump to unjust conclusions.

The whole point is that there's no money behind desktop-on-desktop virtualization. Thus nobody pays people to work on it. Thus nothing significant happens in that space.

If there was someone standing up to create a really decent desktop qemu front-end, I'm confident we'd even officially suggest using that. In fact, that whole discussion did come up in the weekly Qemu/KVM community call and everybody agreed heavily that we do need a desktop client.

The problem is just that there is nobody standing up. And I hope you don't expect Avi to be the one creating a GUI.

Alex
* Re: [RFC] Unify KVM kernel-space and user-space code into a single project
  2010-03-22 11:23 ` Alexander Graf
@ 2010-03-22 12:33 ` Lukas Kolbe
  0 siblings, 0 replies; 390+ messages in thread
From: Lukas Kolbe @ 2010-03-22 12:33 UTC (permalink / raw)
To: Alexander Graf
Cc: Ingo Molnar, Avi Kivity, Pekka Enberg, Anthony Liguori, Zhang, Yanmin, Peter Zijlstra, Sheng Yang, linux-kernel, kvm, Marcelo Tosatti, Joerg Roedel, Jes Sorensen, Gleb Natapov, Zachary Amsden, ziteng.huang, Arnaldo Carvalho de Melo, Frédéric Weisbecker, Gregory Haskins

On Monday, 22.03.2010 at 12:23 +0100, Alexander Graf wrote:

>>> I think we agree at last. Neither I nor my employer are interested in running qemu as a desktop-on-desktop tool, therefore I don't invest any effort in that direction, or require it from volunteers.
>>
>> Obviously your employer at least in part defers to you when it comes to KVM priorities.
>>
>> So, just to make this really clear, _you_ are not interested in running qemu as a desktop-on-desktop tool; subsequently this kind of disinterest-for-desktop-usability trickled through the whole KVM stack and poisoned your attitude and your contributors' attitude.
>>
>> Too sad really, and it's doubly sad that you don't feel anything wrong about that.
>
> Please, don't jump to unjust conclusions.
>
> The whole point is that there's no money behind desktop-on-desktop virtualization. Thus nobody pays people to work on it. Thus nothing significant happens in that space.
>
> If there was someone standing up to create a really decent desktop qemu front-end, I'm confident we'd even officially suggest using that. In fact, that whole discussion did come up in the weekly Qemu/KVM community call and everybody agreed heavily that we do need a desktop client.
>
> The problem is just that there is nobody standing up. And I hope you don't expect Avi to be the one creating a GUI.
Besides, Ingo could just go ahead and use libvirt together with virt-manager. It solves a few of the usability issues he came up with somewhere in this thread, is available in every current distribution, and *actually* works quite well for the desktop usecase. It just desperately needs more brainpower and manpower to make it a competitor to VirtualBox & Co, because it's not as polished and feature-complete yet. But I bet virt-manager's maintainers welcome patches to fix and enhance usability. Most of the needed fixes probably wouldn't touch qemu at all, let alone kvm.

Sorry to chime in with my opinion, but this whole thread is incredibly boring and full of non-arguments yet really highly amusing.

-- Lukas
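[Editor's note: for readers unfamiliar with the stack Lukas mentions - virt-manager is a GUI front-end that edits libvirt "domain" definitions, which libvirt in turn translates into a qemu-kvm command line. A minimal sketch of such a domain definition is below; all names, sizes, and paths are illustrative, not taken from the thread, and real deployments usually let virt-manager or virt-install generate this XML.]

```xml
<!-- Minimal libvirt domain sketch: a KVM guest with one virtio disk,
     a NIC on the default NAT network, and a VNC console.
     Name, memory size, and disk path are placeholder values. -->
<domain type='kvm'>
  <name>demo-guest</name>
  <memory>1048576</memory>   <!-- in KiB: 1 GiB -->
  <vcpu>2</vcpu>
  <os>
    <type arch='x86_64'>hvm</type>
    <boot dev='hd'/>
  </os>
  <devices>
    <disk type='file' device='disk'>
      <driver name='qemu' type='raw'/>
      <source file='/var/lib/libvirt/images/demo-guest.img'/>
      <target dev='vda' bus='virtio'/>
    </disk>
    <interface type='network'>
      <source network='default'/>
    </interface>
    <graphics type='vnc' autoport='yes'/>
  </devices>
</domain>
```

Such a definition would be registered with `virsh define demo.xml` and started with `virsh start demo-guest`; virt-manager builds and edits the same XML through its GUI, which is why most desktop-usability fixes land in virt-manager/libvirt rather than in qemu or kvm itself.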
* Re: [RFC] Unify KVM kernel-space and user-space code into a single project
  2010-03-22 11:14 ` Ingo Molnar
  2010-03-22 11:23 ` Alexander Graf
@ 2010-03-22 12:29 ` Avi Kivity
  2010-03-22 12:44 ` Ingo Molnar
  1 sibling, 1 reply; 390+ messages in thread
From: Avi Kivity @ 2010-03-22 12:29 UTC (permalink / raw)
To: Ingo Molnar
Cc: Pekka Enberg, Anthony Liguori, Zhang, Yanmin, Peter Zijlstra, Sheng Yang, linux-kernel, kvm, Marcelo Tosatti, Joerg Roedel, Jes Sorensen, Gleb Natapov, Zachary Amsden, ziteng.huang, Arnaldo Carvalho de Melo, Frédéric Weisbecker, Gregory Haskins

On 03/22/2010 01:14 PM, Ingo Molnar wrote:
>
>> I think we agree at last. Neither I nor my employer are interested in running qemu as a desktop-on-desktop tool, therefore I don't invest any effort in that direction, or require it from volunteers.
>
> Obviously your employer at least in part defers to you when it comes to KVM priorities.

In part, yes.

> So, just to make this really clear, _you_ are not interested in running qemu as a desktop-on-desktop tool; subsequently this kind of disinterest-for-desktop-usability trickled through the whole KVM stack and poisoned your attitude and your contributors' attitude.

I am also disinterested in ppc virtualization, yet it happened. I am disinterested in ia64 virtualization, yet it happened. I am disinterested in s390 virtualization, yet it happened. Linus doesn't care about virtualization, yet it happened.

I don't tell my contributors what to be interested in, only whether their patches are good or not. I can tell you that Red Hat contributors don't work on a desktop kvm GUI not because I discourage them, but because the product we are working on does not contain a desktop kvm GUI. Jan Kiszka contributed a lot of debugger features, fixes, and improvements; presumably he and/or his employer need that more than a kvm desktop GUI.

I can't see why you see anything wrong with this. People write patches for their own interest, not yours or mine.
> Too sad really, and it's doubly sad that you don't feel anything wrong about that.

It would be lovely to have a desktop kvm GUI. I don't feel I have to write it myself or compel others to write it. I don't feel sad about it.

>> If you think a good GUI is so badly needed, either write one yourself, or convince someone else to do it.
>
> To a certain degree we are trying to do a small bit of that (see this very thread) - and you are NAK-ing and objecting the heck out of it via your unreasonable microkernelish and server-centric views.

The perf bits have nothing to do with a GUI or usability for general users. Calling them "unreasonable microkernelish server-centric views" is just a way of not addressing them.

> With constant maintainer disinterest there's no wonder a non-desktop-oriented KVM becomes a self-fulfilling prophecy: you think the desktop does not matter, hence it becomes a reality in KVM space which you can constantly refer back to as a 'fact'.

It's a fact that virtualization is happening in the data center, not on the desktop. You think a kvm GUI can become a killer application? Fine, write one. You don't need any consent from me as kvm maintainer (if patches are needed to kvm that improve the desktop experience, I'll accept them, though they'll have to pass my unreasonable microkernelish filters). If you're right then the desktop kvm GUI will be a huge hit with zillions of developers and people will drop Windows and switch to Linux just to use it. But my opinion is that it will end up like virtualbox, a nice app that you can use to run Windows-on-Linux, but is not all that useful.

> Which I find dishonest and disingenuous at best.

If you're going to use words like 'dishonest' then please don't send me any more email.

>> (btw, why are you interested in desktop-on-desktop?
>> one use case is developers, who don't really need fancy GUIs; a second is people who test out distributions, but that doesn't seem to be a huge population; and a third is people running Windows for some application that doesn't run on Linux - hopefully a small category as well. Seems to be quite a small target audience, compared to, say, video editing)
>
> I'm interested in desktop-on-desktop because I walk this world with open eyes and I care about Linux, and these days qemu-kvm is the first thing a new Linux user sees about Linux virtualization. I've observed several people I know in person turn away from Linux and go back to Windows or go over to Apple because they had a much more mature solution.

Which distribution are they using? Most people would see virt-manager as the first thing, not open gnome-terminal and start typing in the qemu command line. While it's not perfect, it does have a shiny GUI with lots of tabs and buttons.

> I'd probably turn away from Linux myself if I were a newbie and if I were forced to use KVM on the desktop today.
>
> Again, you don't seem to realize that you as a maintainer are at a central point where you have the ability to turn the self-fulfilling prophecy that 'nobody cares about the Linux desktop' into a reality - or where you have the ability to prevent it from happening. It is a hugely harmful process, especially as you seem to delude yourself that you have nothing to do with it.

It doesn't have to be me. Better to pick someone who has a clue about usability to design and guide this effort. That someone can work on qemu, or if they prefer, tools/kvm (we worked hard to avoid making kvm tied to a single userspace).
The kvm toolstack is maintained by multiple people - Marcelo and myself at the kernel level, Anthony and the other qemu maintainers at the qemu single-guest level, Daniel Veillard and Dan Berrange at the libvirt or host level, and Cole Robinson at the virt-manager or GUI level. It's unreasonable to ask one person to do all of this, just like Linus doesn't maintain the scheduler even though it is so important.

> Anyway, it's good you expressed your views about this as this will help the chances of a fresh restart. (which chances are still not too good though)

All that's needed is to find someone with the skills, time, and interest.

-- error compiling committee.c: too many arguments to function
* Re: [RFC] Unify KVM kernel-space and user-space code into a single project
  2010-03-22 12:29 ` Avi Kivity
@ 2010-03-22 12:44 ` Ingo Molnar
  2010-03-22 12:52 ` Avi Kivity
  0 siblings, 1 reply; 390+ messages in thread
From: Ingo Molnar @ 2010-03-22 12:44 UTC (permalink / raw)
To: Avi Kivity
Cc: Pekka Enberg, Anthony Liguori, Zhang, Yanmin, Peter Zijlstra, Sheng Yang, linux-kernel, kvm, Marcelo Tosatti, Joerg Roedel, Jes Sorensen, Gleb Natapov, Zachary Amsden, ziteng.huang, Arnaldo Carvalho de Melo, Frédéric Weisbecker, Gregory Haskins

* Avi Kivity <avi@redhat.com> wrote:

> On 03/22/2010 01:14 PM, Ingo Molnar wrote:
> >
> >> I think we agree at last. Neither I nor my employer are interested in running qemu as a desktop-on-desktop tool, therefore I don't invest any effort in that direction, or require it from volunteers.
> >
> > Obviously your employer at least in part defers to you when it comes to KVM priorities.
>
> In part, yes.
>
> > So, just to make this really clear, _you_ are not interested in running qemu as a desktop-on-desktop tool; subsequently this kind of disinterest-for-desktop-usability trickled through the whole KVM stack and poisoned your attitude and your contributors' attitude.
>
> I am also disinterested in ppc virtualization, yet it happened. I am disinterested in ia64 virtualization, yet it happened. I am disinterested in s390 virtualization, yet it happened.
>
> Linus doesn't care about virtualization, yet it happened.

You should know the answer yourself: the difference is that usability is a core quality of any project.

I as a maintainer can be neutral towards a number of features and patch attributes that I don't consider key aspects. (Although they can grow out to become key features in the future. SMP was a fringe thing 15 years ago.)

Usability is not an attribute you can ignore, and I for sure am never neutral towards usability deficiencies in patches - I consider usability a key quality.
> I don't tell my contributor what to be interested in, only whether their > patches are good or not. [...] Whether a feature is usable or not is sure a metric of 'goodness'. You have restricted your metric of goodness artificially to not include usability. You do that by claiming that the user-space tooling of KVM, while being functionally absolutely essential for any user to even try out KVM, is 'separate' and has no quality connection with the kernel bits of KVM. It is a convenient argument that allows you to do the kernel bits only. It is absolutely catastrophic to the user who'd like to see a usable solution and a single project who stands behind their tech. Thus, _today_, after years of neglect, you can claim that none of the dozens of usability problems of KVM has anything to do with the features you are working on today. It's in a separate project (the so-called 'Qemu' package) after all - none of KVM's business. In reality if you consider it a single project then those bugs were all usability problems introduced earlier on, years ago, when a piece of functionality was exposed via KVM. It adds up and now you claim they have nothing to do with current work. This is why i consider that line of argument rather dishonest ... Ingo ^ permalink raw reply [flat|nested] 390+ messages in thread
* Re: [RFC] Unify KVM kernel-space and user-space code into a single project 2010-03-22 12:44 ` Ingo Molnar @ 2010-03-22 12:52 ` Avi Kivity 2010-03-22 14:32 ` Ingo Molnar 0 siblings, 1 reply; 390+ messages in thread From: Avi Kivity @ 2010-03-22 12:52 UTC (permalink / raw) To: Ingo Molnar Cc: Pekka Enberg, Anthony Liguori, Zhang, Yanmin, Peter Zijlstra, Sheng Yang, linux-kernel, kvm, Marcelo Tosatti, Joerg Roedel, Jes Sorensen, Gleb Natapov, Zachary Amsden, ziteng.huang, Arnaldo Carvalho de Melo, Frédéric Weisbecker, Gregory Haskins On 03/22/2010 02:44 PM, Ingo Molnar wrote: > This is why i consider that line of argument rather dishonest ... > I am not going to reply to any more email from you on this thread. -- error compiling committee.c: too many arguments to function ^ permalink raw reply [flat|nested] 390+ messages in thread
* Re: [RFC] Unify KVM kernel-space and user-space code into a single project 2010-03-22 12:52 ` Avi Kivity @ 2010-03-22 14:32 ` Ingo Molnar 2010-03-22 14:43 ` Anthony Liguori 2010-03-22 14:46 ` Avi Kivity 0 siblings, 2 replies; 390+ messages in thread From: Ingo Molnar @ 2010-03-22 14:32 UTC (permalink / raw) To: Avi Kivity Cc: Pekka Enberg, Anthony Liguori, Zhang, Yanmin, Peter Zijlstra, Sheng Yang, linux-kernel, kvm, Marcelo Tosatti, Joerg Roedel, Jes Sorensen, Gleb Natapov, Zachary Amsden, ziteng.huang, Arnaldo Carvalho de Melo, Frédéric Weisbecker, Gregory Haskins * Avi Kivity <avi@redhat.com> wrote: > On 03/22/2010 02:44 PM, Ingo Molnar wrote: > >This is why i consider that line of argument rather dishonest ... > > I am not going to reply to any more email from you on this thread. Because i pointed out that i consider a line of argument intellectually dishonest? I did not say _you_ as a person are dishonest - doing that would be an ad hominem attack against your person. (In fact i dont think you are, to the contrary) An argument can certainly be labeled dishonest in a fair discussion and it is not a personal attack against you to express my opinion about that. Thanks, Ingo ^ permalink raw reply [flat|nested] 390+ messages in thread
* Re: [RFC] Unify KVM kernel-space and user-space code into a single project 2010-03-22 14:32 ` Ingo Molnar @ 2010-03-22 14:43 ` Anthony Liguori 2010-03-22 15:55 ` Ingo Molnar 2010-03-22 14:46 ` Avi Kivity 1 sibling, 1 reply; 390+ messages in thread From: Anthony Liguori @ 2010-03-22 14:43 UTC (permalink / raw) To: Ingo Molnar Cc: Avi Kivity, Pekka Enberg, Zhang, Yanmin, Peter Zijlstra, Sheng Yang, linux-kernel, kvm, Marcelo Tosatti, Joerg Roedel, Jes Sorensen, Gleb Natapov, Zachary Amsden, ziteng.huang, Arnaldo Carvalho de Melo, Frédéric Weisbecker, Gregory Haskins On 03/22/2010 09:32 AM, Ingo Molnar wrote: > * Avi Kivity<avi@redhat.com> wrote: > > >> On 03/22/2010 02:44 PM, Ingo Molnar wrote: >> >>> This is why i consider that line of argument rather dishonest ... >>> >> I am not going to reply to any more email from you on this thread. >> > Because i pointed out that i consider a line of argument intellectually > dishonest? > > I did not say _you_ as a person are dishonest - doing that would be an ad > hominem attack against your person. (In fact i dont think you are, to the > contrary) > > An argument can certainly be labeled dishonest in a fair discussion and it is > not a personal attack against you to express my opinion about that. > You're being excessively rude in this thread. That might be acceptable on LKML but it's not how the QEMU and KVM communities behave. This thread is a good example of why LKML has the reputation it has. Avi and I argue all of the time on qemu-devel and kvm-devel and it's never degraded into a series of personal attacks like this. I've been trying very hard to turn this into a productive thread attempting to capture your feedback and give clear suggestions about how you can achieve your desired functionality. What are you looking to achieve? Do you just want to piss and moan about how terrible you think Avi and I are? Or do you want to try to actually help make things better? 
If you want to help make things better, please focus on making constructive suggestions and clarifying what you see as requirements. Regards, Anthony Liguori ^ permalink raw reply [flat|nested] 390+ messages in thread
* Re: [RFC] Unify KVM kernel-space and user-space code into a single project 2010-03-22 14:43 ` Anthony Liguori @ 2010-03-22 15:55 ` Ingo Molnar 2010-03-22 16:08 ` Anthony Liguori 2010-03-22 16:12 ` Avi Kivity 0 siblings, 2 replies; 390+ messages in thread From: Ingo Molnar @ 2010-03-22 15:55 UTC (permalink / raw) To: Anthony Liguori Cc: Avi Kivity, Pekka Enberg, Zhang, Yanmin, Peter Zijlstra, Sheng Yang, linux-kernel, kvm, Marcelo Tosatti, Joerg Roedel, Jes Sorensen, Gleb Natapov, Zachary Amsden, ziteng.huang, Arnaldo Carvalho de Melo, Frédéric Weisbecker, Gregory Haskins * Anthony Liguori <anthony@codemonkey.ws> wrote: > [...] > > I've been trying very hard to turn this into a productive thread attempting > to capture your feedback and give clear suggestions about how you can > achieve your desired functionality. I'm glad that we are at this more productive stage. I'm still trying to achieve the very same technological capabilities that i expressed in the first few mails when i reviewed the 'perf kvm' patch that was submitted by Yanmin. The crux of the problem is very simple. To quote my earlier mail: | | - The inconvenience of having to type: | perf kvm --host --guest --guestkallsyms=/home/ymzhang/guest/kallsyms \ | --guestmodules=/home/ymzhang/guest/modules top | | | is very obvious even with a single guest. Now multiply that by more guests ... | For example we want 'perf kvm top' to do something useful by default: it should find the first guest running and it should report its profile. The tool shouldnt have to guess about where the guests are, what their namespaces are and how to talk to them. We also want easy symbolic access to guests, for example: perf kvm -g OpenSuse-2 record sleep 1 I.e.: - Easy default reference to guest instances, and a way for tools to reference them symbolically as well in the multi-guest case. Preferably something trustable and kernel-provided - not some indirect information like a PID file created by libvirt-manager or so. 
- Guest-transparent VFS integration into the host, to recover symbols and debug info in binaries, etc. There were a few responses to that but none really addressed those problems - they mostly tried to re-define the problem and suggested that i was wrong to want such capabilities and suggested various inferior approaches instead. See the thread for the details - i think i covered every technical suggestion that was made. So we are still at an impasse as far as i can see. If i overlooked some suggestion that addresses these problems then please let me know ... Thanks, Ingo ^ permalink raw reply [flat|nested] 390+ messages in thread
* Re: [RFC] Unify KVM kernel-space and user-space code into a single project 2010-03-22 15:55 ` Ingo Molnar @ 2010-03-22 16:08 ` Anthony Liguori 2010-03-22 16:59 ` Ingo Molnar 2010-03-22 17:11 ` Ingo Molnar 2010-03-22 16:12 ` Avi Kivity 1 sibling, 2 replies; 390+ messages in thread From: Anthony Liguori @ 2010-03-22 16:08 UTC (permalink / raw) To: Ingo Molnar Cc: Avi Kivity, Pekka Enberg, Zhang, Yanmin, Peter Zijlstra, Sheng Yang, linux-kernel, kvm, Marcelo Tosatti, oerg Roedel, Jes Sorensen, Gleb Natapov, Zachary Amsden, ziteng.huang, Arnaldo Carvalho de Melo, Fr?d?ric Weisbecker, Gregory Haskins On 03/22/2010 10:55 AM, Ingo Molnar wrote: > * Anthony Liguori<anthony@codemonkey.ws> wrote: > > >> [...] >> >> I've been trying very hard to turn this into a productive thread attempting >> to capture your feedback and give clear suggestions about how you can solve >> achieve your desired functionality. >> > I'm glad that we are at this more productive stage. I'm still trying to > achieve the very same technological capabilities that i expressed in the first > few mails when i reviewed the 'perf kvm' patch that was submitted by Yanmin. > > The crux of the problem is very simple. To quote my earlier mail: > > | > | - The inconvenience of having to type: > | perf kvm --host --guest --guestkallsyms=/home/ymzhang/guest/kallsyms \ > | --guestmodules=/home/ymzhang/guest/modules top > | > | > | is very obvious even with a single guest. Now multiply that by more guests ... > | > > For example we want 'perf kvm top' to do something useful by default: it > should find the first guest running and it should report its profile. > > The tool shouldnt have to guess about where the guests are, what their > namespaces is and how to talk to them. We also want easy symbolic access to > guest, for example: > > perf kvm -g OpenSuse-2 record sleep 1 > Two things are needed. The first thing needed is to be able to enumerate running guests and identify a symbolic name. 
I have a patch for this and it'll be posted this week or so. perf will need to have a QMP client and it will need to look in ${HOME}/.qemu/qmp/ for sockets to connect to. This is too much to expect from a client and we've got a GSoC idea posted to make a nice library for tools to use to simplify this. The sockets are named based on UUID and you'll have to connect to a guest and ask it for its name. Some guests don't have names so we'll have to come up with a clever way to describe a nameless VM. > I.e.: > > - Easy default reference to guest instances, and a way for tools to > reference them symbolically as well in the multi-guest case. Preferably > something trustable and kernel-provided - not some indirect information > like a PID file created by libvirt-manager or so. > A guest is not a KVM concept. It's a qemu concept so it needs to be something provided by qemu. The other caveat is that you won't see guests created by libvirt because we're implementing this in terms of a default QMP device and libvirt will disable defaults. This is desired behaviour. libvirt wants to be in complete control and doesn't want a tool like perf interacting with a guest directly. > - Guest-transparent VFS integration into the host, to recover symbols and > debug info in binaries, etc. > The way I'd like to see this implemented is a guest userspace daemon. I think having the guest userspace daemon be something that can be updated by the host is reasonable. In terms of exposing that on the host, my preferred approach is QMP. I'd be happy with a QMP command that is essentially, guest_fs_read(filename) and guest_fd_readdir(path). If desired, one could implement a fuse filesystem that interacted with all local qemu instances to expose this on the host. There's a lot of ugly things about fuse though so I think sticking to QMP is best (particularly with respect to root access of a fuse filesystem). 
With just those couple things in place, perf should be able to do exactly what you want it to do. Regards, Anthony Liguori > There were a few responses to that but none really addressed those problems - > they mostly tried to re-define the problem and suggested that i was wrong to > want such capabilities and suggested various inferior approaches instead. See > the thread for the details - i think i covered every technical suggestion that > was made. > > So we are still at an impasse as far as i can see. If i overlooked some > suggestion that addresses these problems then please let me know ... > > Thanks, > > Ingo > ^ permalink raw reply [flat|nested] 390+ messages in thread
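The enumeration scheme Anthony describes above can be sketched roughly as follows. This is a hypothetical client: the ${HOME}/.qemu/qmp/ socket directory comes from his then-unposted patch and is an assumption here, while the handshake itself (read the greeting, issue 'qmp_capabilities', then 'query-name') is standard QMP.

```python
import json
import os
import socket

def qmp_cmd(name, **args):
    """Build one QMP command as a newline-terminated JSON line."""
    cmd = {"execute": name}
    if args:
        cmd["arguments"] = args
    return json.dumps(cmd) + "\n"

def parse_qmp_name(reply_line):
    """Extract the guest name from a 'query-name' reply; None if unnamed."""
    reply = json.loads(reply_line)
    return reply.get("return", {}).get("name")

def enumerate_guests(qmp_dir=os.path.expanduser("~/.qemu/qmp")):
    """Connect to each UUID-named socket and ask that guest for its name."""
    guests = {}
    if not os.path.isdir(qmp_dir):
        return guests
    for entry in os.listdir(qmp_dir):
        path = os.path.join(qmp_dir, entry)
        try:
            s = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
            s.connect(path)
            f = s.makefile("rw")
            f.readline()                          # discard the QMP greeting
            f.write(qmp_cmd("qmp_capabilities"))  # leave capabilities mode
            f.flush()
            f.readline()
            f.write(qmp_cmd("query-name"))
            f.flush()
            name = parse_qmp_name(f.readline())
            guests[entry] = name or entry         # nameless VM: fall back to UUID
            s.close()
        except OSError:
            continue
    return guests
```

Falling back to the UUID-derived socket name for unnamed guests is one answer to the "clever way to describe a nameless VM" problem mentioned above.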
* Re: [RFC] Unify KVM kernel-space and user-space code into a single project 2010-03-22 16:08 ` Anthony Liguori @ 2010-03-22 16:59 ` Ingo Molnar 2010-03-22 18:28 ` Anthony Liguori 2010-03-22 17:11 ` Ingo Molnar 1 sibling, 1 reply; 390+ messages in thread From: Ingo Molnar @ 2010-03-22 16:59 UTC (permalink / raw) To: Anthony Liguori Cc: Avi Kivity, Pekka Enberg, Zhang, Yanmin, Peter Zijlstra, Sheng Yang, linux-kernel, kvm, Marcelo Tosatti, oerg Roedel, Jes Sorensen, Gleb Natapov, Zachary Amsden, ziteng.huang, Arnaldo Carvalho de Melo, Fr?d?ric Weisbecker, Gregory Haskins * Anthony Liguori <anthony@codemonkey.ws> wrote: > On 03/22/2010 10:55 AM, Ingo Molnar wrote: > >* Anthony Liguori<anthony@codemonkey.ws> wrote: > > > >>[...] > >> > >>I've been trying very hard to turn this into a productive thread attempting > >>to capture your feedback and give clear suggestions about how you can solve > >>achieve your desired functionality. > >I'm glad that we are at this more productive stage. I'm still trying to > >achieve the very same technological capabilities that i expressed in the first > >few mails when i reviewed the 'perf kvm' patch that was submitted by Yanmin. > > > >The crux of the problem is very simple. To quote my earlier mail: > > > > | > > | - The inconvenience of having to type: > > | perf kvm --host --guest --guestkallsyms=/home/ymzhang/guest/kallsyms \ > > | --guestmodules=/home/ymzhang/guest/modules top > > | > > | > > | is very obvious even with a single guest. Now multiply that by more guests ... > > | > > > > For example we want 'perf kvm top' to do something useful by default: it > > should find the first guest running and it should report its profile. > > > > The tool shouldnt have to guess about where the guests are, what their > > namespaces is and how to talk to them. We also want easy symbolic access > > to guest, for example: > > > > perf kvm -g OpenSuse-2 record sleep 1 > > Two things are needed. 
The first thing needed is to be able to enumerate > running guests and identify a symbolic name. I have a patch for this and > it'll be posted this week or so. perf will need to have a QMP client and it > will need to look in ${HOME}/.qemu/qmp/ to sockets to connect to. > > This is too much to expect from a client and we've got a GSoC idea posted to > make a nice library for tools to use to simplify this. Ok, that sounds interesting! I'd rather see some raw mechanism that 'perf kvm' could use instead of having to require yet another library (which generally dampens adoption of a tool). So i think we can work from there. Btw., have you considered using Qemu's command name (task->comm[]) as the symbolic name? That way we could see the guest name in 'top' on the host - a nice touch. > The sockets are named based on UUID and you'll have to connect to a guest > and ask it for it's name. Some guests don't have names so we'll have to > come up with a clever way to describe a nameless VM. I think just exposing the UUID in that lazy case would be adequate? It creates pressure for VM launchers to use better symbolic names. > > I.e.: > > > > - Easy default reference to guest instances, and a way for tools to > > reference them symbolically as well in the multi-guest case. Preferably > > something trustable and kernel-provided - not some indirect information > > like a PID file created by libvirt-manager or so. > > A guest is not a KVM concept. It's a qemu concept so it needs to be > something provided by qemu. The other caveat is that you won't see guests > created by libvirt because we're implementing this in terms of a default QMP > device and libvirt will disable defaults. This is desired behaviour. > libvirt wants to be in complete control and doesn't want a tool like perf > interacting with a guest directly. Hm, this sucks for multiple reasons. Firstly, perf isnt a tool that 'interacts', it's an observation tool: just like 'top' is an observation tool. 
We want to enable developers to see all activities on the system - regardless of who started the VM or who started the process. Imagine if we had a way for tasks to hide from 'top'. It would be rather awful. Secondly, it tells us that the concept is fragile if it doesnt automatically enumerate all guests, regardless of how they were created. Full system enumeration is generally best left to the kernel, as it can offer coherent access. Ingo ^ permalink raw reply [flat|nested] 390+ messages in thread
* Re: [RFC] Unify KVM kernel-space and user-space code into a single project 2010-03-22 16:59 ` Ingo Molnar @ 2010-03-22 18:28 ` Anthony Liguori 0 siblings, 0 replies; 390+ messages in thread From: Anthony Liguori @ 2010-03-22 18:28 UTC (permalink / raw) To: Ingo Molnar Cc: Avi Kivity, Pekka Enberg, Zhang, Yanmin, Peter Zijlstra, Sheng Yang, linux-kernel, kvm, Marcelo Tosatti, oerg Roedel, Jes Sorensen, Gleb Natapov, Zachary Amsden, ziteng.huang, Arnaldo Carvalho de Melo, Fr?d?ric Weisbecker, Gregory Haskins On 03/22/2010 11:59 AM, Ingo Molnar wrote: > > Ok, that sounds interesting! I'd rather see some raw mechanism that 'perf kvm' > could use instead of having to require yet another library (which generally > dampens adoption of a tool). So i think we can work from there. > You can access the protocol directly if you don't want a library dependency. > Btw., have you considered using Qemu's command name (task->comm[]) as the > symbolic name? That way we could see the guest name in 'top' on the host - a > nice touch. > qemu-system-x86_64 -name Fedora,process=qemu-Fedora Does exactly that. We don't make this default based on the element of least surprise. Many users expect to be able to do killall qemu-system-x86 and if we did this by default, that wouldn't work. >> The sockets are named based on UUID and you'll have to connect to a guest >> and ask it for it's name. Some guests don't have names so we'll have to >> come up with a clever way to describe a nameless VM. >> > I think just exposing the UUID in that lazy case would be adequate? It creates > pressure for VM launchers to use better symbolic names. > Yup. >>> I.e.: >>> >>> - Easy default reference to guest instances, and a way for tools to >>> reference them symbolically as well in the multi-guest case. Preferably >>> something trustable and kernel-provided - not some indirect information >>> like a PID file created by libvirt-manager or so. >>> >> A guest is not a KVM concept. 
It's a qemu concept so it needs to be >> something provided by qemu. The other caveat is that you won't see guests >> created by libvirt because we're implementing this in terms of a default QMP >> device and libvirt will disable defaults. This is desired behaviour. >> libvirt wants to be in complete control and doesn't want a tool like perf >> interacting with a guest directly. >> > Hm, this sucks for multiple reasons. Firstly, perf isnt a tool that > 'interacts', it's an observation tool: just like 'top' is an observation tool. > > We want to enable developers to see all activities on the system - regardless > of who started the VM or who started the process. Imagine if we had a way to > hide tasks to hide from 'top'. It would be rather awful. > > Secondly, it tells us that the concept is fragile if it doesnt automatically > enumerate all guests, regardless of how they were created. > Perf does interact with a guest though because it queries a guest to read its file system. I understand the point you're making though. If instead of doing a pull interface where the host queries the guest for files, the guest pushed a small set of files at startup which the host cached, then you could potentially unconditionally expose a "read-only" socket that only exposed limited information. > Full system enumeration is generally best left to the kernel, as it can offer > coherent access. > I don't see why qemu can't offer coherent access. The limitation today is intentional and if it's overly restrictive, we can figure out a means to change it. Regards, Anthony Liguori > Ingo > ^ permalink raw reply [flat|nested] 390+ messages in thread
* Re: [RFC] Unify KVM kernel-space and user-space code into a single project 2010-03-22 16:08 ` Anthony Liguori 2010-03-22 16:59 ` Ingo Molnar @ 2010-03-22 17:11 ` Ingo Molnar 2010-03-22 18:30 ` Anthony Liguori 1 sibling, 1 reply; 390+ messages in thread From: Ingo Molnar @ 2010-03-22 17:11 UTC (permalink / raw) To: Anthony Liguori Cc: Avi Kivity, Pekka Enberg, Zhang, Yanmin, Peter Zijlstra, Sheng Yang, linux-kernel, kvm, Marcelo Tosatti, Joerg Roedel, Jes Sorensen, Gleb Natapov, Zachary Amsden, ziteng.huang, Arnaldo Carvalho de Melo, Frédéric Weisbecker, Gregory Haskins * Anthony Liguori <anthony@codemonkey.ws> wrote: > > - Easy default reference to guest instances, and a way for tools to > > reference them symbolically as well in the multi-guest case. Preferably > > something trustable and kernel-provided - not some indirect information > > like a PID file created by libvirt-manager or so. > > A guest is not a KVM concept. [...] Well, in a sense a guest is a KVM concept too: it's in essence represented via the 'vcpu state attached to a struct mm' abstraction that is attached to the /dev/kvm file descriptor attached to a Linux process. Multiple vcpus can be started by the same process to represent SMP, but the whole guest notion is present: a Linux MM that carries KVM state. In that sense when we type 'perf kvm list' we'd like to get a list of all currently present guests that the developer has permission to profile: i.e. we'd like a list of all [debuggable] Linux tasks that have a KVM instance attached to them. A convenient way to do that would be to use the Qemu process's ->comm[] name, and to have a KVM ioctl that gets us a list of all vcpus that the querying task has ptrace permission to. [the standard permission check we do for instrumentation] No need for communication with Qemu for that - just an ioctl, and an always-guaranteed result that works fine on a whole-system and on a per user basis as well. 
Thanks, Ingo ^ permalink raw reply [flat|nested] 390+ messages in thread
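The KVM ioctl Ingo asks for here was never implemented in this form. A rough host-side approximation of the same 'perf kvm list' idea, sketched below, scans procfs for tasks holding KVM state and reports their ->comm[] names; the 'anon_inode:kvm-vm' fd label is a kernel implementation detail and an assumption of this sketch, not a stable ABI.

```python
import os

def fd_targets(pid):
    """Yield the symlink targets of a process's open file descriptors."""
    fd_dir = "/proc/%d/fd" % pid
    try:
        for fd in os.listdir(fd_dir):
            try:
                yield os.readlink(os.path.join(fd_dir, fd))
            except OSError:
                continue          # fd closed while we looked
    except OSError:
        return                    # no permission, or process exited

def is_kvm_guest(targets):
    """Heuristic: a kvm-vm anon inode (or an open /dev/kvm) marks a VM host."""
    return any(t == "/dev/kvm" or t.startswith("anon_inode:kvm-vm")
               for t in targets)

def list_kvm_guests():
    """Map pid -> comm for every visible process that holds KVM state."""
    guests = {}
    if not os.path.isdir("/proc"):
        return guests             # non-Linux: nothing to enumerate
    for entry in os.listdir("/proc"):
        if not entry.isdigit():
            continue
        pid = int(entry)
        if is_kvm_guest(fd_targets(pid)):
            try:
                with open("/proc/%d/comm" % pid) as f:
                    guests[pid] = f.read().strip()
            except OSError:
                continue
    return guests
```

Unlike the proposed ioctl this gives no trustable symbolic guest name - only the process name, which as discussed above is just "qemu-system-x86_64" unless the launcher sets -name with a process= suffix.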
* Re: [RFC] Unify KVM kernel-space and user-space code into a single project 2010-03-22 17:11 ` Ingo Molnar @ 2010-03-22 18:30 ` Anthony Liguori 0 siblings, 0 replies; 390+ messages in thread From: Anthony Liguori @ 2010-03-22 18:30 UTC (permalink / raw) To: Ingo Molnar Cc: Avi Kivity, Pekka Enberg, Zhang, Yanmin, Peter Zijlstra, Sheng Yang, linux-kernel, kvm, Marcelo Tosatti, oerg Roedel, Jes Sorensen, Gleb Natapov, Zachary Amsden, ziteng.huang, Arnaldo Carvalho de Melo, Fr?d?ric Weisbecker, Gregory Haskins On 03/22/2010 12:11 PM, Ingo Molnar wrote: > * Anthony Liguori<anthony@codemonkey.ws> wrote: > > >>> - Easy default reference to guest instances, and a way for tools to >>> reference them symbolically as well in the multi-guest case. Preferably >>> something trustable and kernel-provided - not some indirect information >>> like a PID file created by libvirt-manager or so. >>> >> A guest is not a KVM concept. [...] >> > Well, in a sense a guest is a KVM concept too: it's in essence represented via > the 'vcpu state attached to a struct mm' abstraction that is attached to the > /dev/kvm file descriptor attached to a Linux process. > > Multiple vcpus can be started by the same process to represent SMP, but the > whole guest notion is present: a Linux MM that carries KVM state. > > In that sense when we type 'perf kvm list' we'd like to get a list of all > currently present guests that the developer has permission to profile: i.e. > we'd like a list of all [debuggable] Linux tasks that have a KVM instance > attached to them. > > A convenient way to do that would be to use the Qemu process's ->comm[] name, > and to have a KVM ioctl that gets us a list of all vcpus that the querying > task has ptrace permission to. [the standard permission check we do for > instrumentation] > > No need for communication with Qemu for that - just an ioctl, and an > always-guaranteed result that works fine on a whole-system and on a per user > basis as well. 
> You need a way to interact with the guest which means you need some type of device. All of the interesting devices are implemented in qemu so you're going to have to interact with qemu if you want meaningful interaction with a guest. Regards, Anthony Liguori > Thanks, > > Ingo > ^ permalink raw reply [flat|nested] 390+ messages in thread
* Re: [RFC] Unify KVM kernel-space and user-space code into a single project 2010-03-22 15:55 ` Ingo Molnar 2010-03-22 16:08 ` Anthony Liguori @ 2010-03-22 16:12 ` Avi Kivity 2010-03-22 16:16 ` Avi Kivity 2010-03-22 16:51 ` Ingo Molnar 1 sibling, 2 replies; 390+ messages in thread From: Avi Kivity @ 2010-03-22 16:12 UTC (permalink / raw) To: Ingo Molnar Cc: Anthony Liguori, Pekka Enberg, Zhang, Yanmin, Peter Zijlstra, Sheng Yang, linux-kernel, kvm, Marcelo Tosatti, oerg Roedel, Jes Sorensen, Gleb Natapov, Zachary Amsden, ziteng.huang, Arnaldo Carvalho de Melo, Fr?d?ric Weisbecker, Gregory Haskins On 03/22/2010 05:55 PM, Ingo Molnar wrote: > * Anthony Liguori<anthony@codemonkey.ws> wrote: > > >> [...] >> >> I've been trying very hard to turn this into a productive thread attempting >> to capture your feedback and give clear suggestions about how you can solve >> achieve your desired functionality. >> > I'm glad that we are at this more productive stage. I'm still trying to > achieve the very same technological capabilities that i expressed in the first > few mails when i reviewed the 'perf kvm' patch that was submitted by Yanmin. > No, you're not. You're trying to fracture the qemu community with your tools/kvm proposal, you're explaining to me how I'm working on the wrong thing by concentrating on things that my employer needs rather than what you think kvm needs, and attaching various unsavoury labels to Anthony and myself. Any wonder we aren't getting anything done? If you can commit to a reasonable conversation we might be able to make progress. Is this actually possible? > The crux of the problem is very simple. To quote my earlier mail: > > | > | - The inconvenience of having to type: > | perf kvm --host --guest --guestkallsyms=/home/ymzhang/guest/kallsyms \ > | --guestmodules=/home/ymzhang/guest/modules top > | > | > | is very obvious even with a single guest. Now multiply that by more guests ... 
> | > > For example we want 'perf kvm top' to do something useful by default: it > should find the first guest running and it should report its profile. > > The tool shouldnt have to guess about where the guests are, what their > namespaces is and how to talk to them. We also want easy symbolic access to > guest, for example: > > perf kvm -g OpenSuse-2 record sleep 1 > > I.e.: > > - Easy default reference to guest instances, and a way for tools to > reference them symbolically as well in the multi-guest case. Preferably > something trustable and kernel-provided - not some indirect information > like a PID file created by libvirt-manager or so. > Usually 'layering violation' is trotted out at such suggestions. I don't like using the term, because sometimes the layers are incorrect and need to be violated. But it should be done explicitly, not as a shortcut for a minor feature (and profiling is a minor feature, most users will never use it, especially guest-from-host). The fact is we have well defined layers today, kvm virtualizes the cpu and memory, qemu emulates devices for a single guest, libvirt manages guests. We break this sometimes but there has to be a good reason. So perf needs to talk to libvirt if it wants names. Could be done via linking, or can be done using a plugin libvirt drops into perf. > - Guest-transparent VFS integration into the host, to recover symbols and > debug info in binaries, etc. > > There were a few responses to that but none really addressed those problems - > they mostly tried to re-define the problem and suggested that i was wrong to > want such capabilities and suggested various inferior approaches instead. See > the thread for the details - i think i covered every technical suggestion that > was made. > You simply kept ignoring me when I said that if something can be kept out of the kernel without impacting performance, it should be. I don't want emergency patches closing some security hole or oops in a kernel symbol server. 
The usability argument is a red herring. True, it takes time for things to trickle down to distributions and users. Those who can't wait can download the code and compile, it isn't that difficult. > So we are still at an impasse as far as i can see. If i overlooked some > suggestion that addresses these problems then please let me know ... > The impasse is mostly due to you insisting on doing everything your way, in the kernel, and disregarding how libvirt/qemu/kvm does things. Learn the kvm ecosystem, you'll find it is quite easy to contribute code. -- error compiling committee.c: too many arguments to function ^ permalink raw reply [flat|nested] 390+ messages in thread
* Re: [RFC] Unify KVM kernel-space and user-space code into a single project 2010-03-22 16:12 ` Avi Kivity @ 2010-03-22 16:16 ` Avi Kivity 2010-03-22 16:40 ` Pekka Enberg 2010-03-22 16:51 ` Ingo Molnar 1 sibling, 1 reply; 390+ messages in thread From: Avi Kivity @ 2010-03-22 16:16 UTC (permalink / raw) To: Ingo Molnar Cc: Anthony Liguori, Pekka Enberg, Zhang, Yanmin, Peter Zijlstra, Sheng Yang, linux-kernel, kvm, Marcelo Tosatti, Joerg Roedel, Jes Sorensen, Gleb Natapov, Zachary Amsden, ziteng.huang, Arnaldo Carvalho de Melo, Frédéric Weisbecker, Gregory Haskins On 03/22/2010 06:12 PM, Avi Kivity wrote: >> There were a few responses to that but none really addressed those >> problems - >> they mostly tried to re-define the problem and suggested that i was >> wrong to >> want such capabilities and suggested various inferior approaches >> instead. See >> the thread for the details - i think i covered every technical >> suggestion that >> was made. > > > You simply kept ignoring me when I said that if something can be kept > out of the kernel without impacting performance, it should be. I > don't want emergency patches closing some security hole or oops in a > kernel symbol server. Or rather, explained how I am a wicked microkernelist. The herring were out in force today. -- error compiling committee.c: too many arguments to function ^ permalink raw reply [flat|nested] 390+ messages in thread
* Re: [RFC] Unify KVM kernel-space and user-space code into a single project 2010-03-22 16:16 ` Avi Kivity @ 2010-03-22 16:40 ` Pekka Enberg 0 siblings, 0 replies; 390+ messages in thread From: Pekka Enberg @ 2010-03-22 16:40 UTC (permalink / raw) To: Avi Kivity Cc: Ingo Molnar, Anthony Liguori, Zhang, Yanmin, Peter Zijlstra, Sheng Yang, linux-kernel, kvm, Marcelo Tosatti, Joerg Roedel, Jes Sorensen, Gleb Natapov, Zachary Amsden, ziteng.huang, Arnaldo Carvalho de Melo, Frédéric Weisbecker, Gregory Haskins On Mon, Mar 22, 2010 at 6:16 PM, Avi Kivity <avi@redhat.com> wrote: >> You simply kept ignoring me when I said that if something can be kept out >> of the kernel without impacting performance, it should be. I don't want >> emergency patches closing some security hole or oops in a kernel symbol >> server. > > Or rather, explained how I am a wicked microkernelist. The herring were out > in force today. Well, if it's not being a "wicked microkernelist" then what is it? Performance is hardly the only motivation to put things into the kernel. Think kernel mode-setting and devtmpfs (with the ironic twist of original devfs being removed from the kernel) here, for example. Pekka ^ permalink raw reply [flat|nested] 390+ messages in thread
* Re: [RFC] Unify KVM kernel-space and user-space code into a single project 2010-03-22 16:40 ` Pekka Enberg (?) @ 2010-03-22 18:06 ` Avi Kivity -1 siblings, 0 replies; 390+ messages in thread From: Avi Kivity @ 2010-03-22 18:06 UTC (permalink / raw) To: Pekka Enberg Cc: Ingo Molnar, Anthony Liguori, Zhang, Yanmin, Peter Zijlstra, Sheng Yang, linux-kernel, kvm, Marcelo Tosatti, oerg Roedel, Jes Sorensen, Gleb Natapov, Zachary Amsden, ziteng.huang, Arnaldo Carvalho de Melo, Fr?d?ric Weisbecker, Gregory Haskins On 03/22/2010 06:40 PM, Pekka Enberg wrote: > On Mon, Mar 22, 2010 at 6:16 PM, Avi Kivity<avi@redhat.com> wrote: > >>> You simply kept ignoring me when I said that if something can be kept out >>> of the kernel without impacting performance, it should be. I don't want >>> emergency patches closing some security hole or oops in a kernel symbol >>> server. >>> >> Or rather, explained how I am a wicked microkernelist. The herring were out >> in force today. >> > Well, if it's not being a "wicked microkernelist" then what is it? > I know I'm bad. > Performance is hardly the only motivation to put things into the > kernel. Think kernel mode-setting and devtmpfs (with the ironic twist > of original devfs being removed from the kernel) here, for example. > Motivations include privileged device access, needing to access physical memory, security, and keeping the userspace interface sane. There are others. I don't think any of them hold here. -- error compiling committee.c: too many arguments to function ^ permalink raw reply [flat|nested] 390+ messages in thread
* Re: [RFC] Unify KVM kernel-space and user-space code into a single project 2010-03-22 16:12 ` Avi Kivity 2010-03-22 16:16 ` Avi Kivity @ 2010-03-22 16:51 ` Ingo Molnar 2010-03-22 17:08 ` Avi Kivity 1 sibling, 1 reply; 390+ messages in thread From: Ingo Molnar @ 2010-03-22 16:51 UTC (permalink / raw) To: Avi Kivity Cc: Anthony Liguori, Pekka Enberg, Zhang, Yanmin, Peter Zijlstra, Sheng Yang, linux-kernel, kvm, Marcelo Tosatti, oerg Roedel, Jes Sorensen, Gleb Natapov, Zachary Amsden, ziteng.huang, Arnaldo Carvalho de Melo, Fr?d?ric Weisbecker, Gregory Haskins * Avi Kivity <avi@redhat.com> wrote: > > The crux of the problem is very simple. To quote my earlier mail: > > > > | > > | - The inconvenience of having to type: > > | perf kvm --host --guest --guestkallsyms=/home/ymzhang/guest/kallsyms \ > > | --guestmodules=/home/ymzhang/guest/modules top > > | > > | > > | is very obvious even with a single guest. Now multiply that by more guests ... > > | > > > > For example we want 'perf kvm top' to do something useful by default: it > > should find the first guest running and it should report its profile. > > > > The tool shouldnt have to guess about where the guests are, what their > > namespaces is and how to talk to them. We also want easy symbolic access to > > guest, for example: > > > > perf kvm -g OpenSuse-2 record sleep 1 [ Sidenote: i still received no adequate suggestions about how to provide this category of technical features. ] > > I.e.: > > > > - Easy default reference to guest instances, and a way for tools to > > reference them symbolically as well in the multi-guest case. Preferably > > something trustable and kernel-provided - not some indirect information > > like a PID file created by libvirt-manager or so. > > Usually 'layering violation' is trotted out at such suggestions. > [...] That's weird, how can a feature request be a 'layering violation'? 
If something that users find straightforward and usable is a layering violation to you (such as easily being able to access their own files on the host as well ...) then i think you need to revisit the definition of that term instead of trying to fix the user. > [...] I don't like using the term, because sometimes the layers are > incorrect and need to be violated. But it should be done explicitly, not as > a shortcut for a minor feature (and profiling is a minor feature, most users > will never use it, especially guest-from-host). > > The fact is we have well defined layers today, kvm virtualizes the cpu and > memory, qemu emulates devices for a single guest, libvirt manages guests. > We break this sometimes but there has to be a good reason. So perf needs to > talk to libvirt if it wants names. Could be done via linking, or can be > done using a pluging libvirt drops into perf. > > > - Guest-transparent VFS integration into the host, to recover symbols and > > debug info in binaries, etc. > > > > There were a few responses to that but none really addressed those > > problems - they mostly tried to re-define the problem and suggested that i > > was wrong to want such capabilities and suggested various inferior > > approaches instead. See the thread for the details - i think i covered > > every technical suggestion that was made. > > You simply kept ignoring me when I said that if something can be kept out of > the kernel without impacting performance, it should be. I don't want > emergency patches closing some security hole or oops in a kernel symbol > server. I never suggested an "in kernel space symbol server" which could oops, why would i have suggested that? Please point me to an email where i suggested that. > The usability argument is a red herring. True, it takes time for things to > trickle down to distributions and users. Those who can't wait can download > the code and compile, it isn't that difficult. 
It's not just "download and compile", it's also "configure correctly for several separate major distributions" and "configure to per guest instance local rules". It's far more fragile in practice than you make it appear to be, and since you yourself expressed that you are not interested much in the tooling side, how can you have adequate experience to judge such matters? In fact for instrumentation it's beyond a critical threshold of fragility - instrumentation above all needs to be accessible, transparent and robust. If you cannot see the advantages of a properly integrated solution then i suspect there's not much i can do to convince you. And you ignored not just me but you ignored several people in this thread who thought the current status quo was inadequate and expressed interest in both the VFS integration and in the guest enumeration features. Thanks, Ingo ^ permalink raw reply [flat|nested] 390+ messages in thread
* Re: [RFC] Unify KVM kernel-space and user-space code into a single project 2010-03-22 16:51 ` Ingo Molnar @ 2010-03-22 17:08 ` Avi Kivity 2010-03-22 17:34 ` Ingo Molnar 0 siblings, 1 reply; 390+ messages in thread From: Avi Kivity @ 2010-03-22 17:08 UTC (permalink / raw) To: Ingo Molnar Cc: Anthony Liguori, Pekka Enberg, Zhang, Yanmin, Peter Zijlstra, Sheng Yang, linux-kernel, kvm, Marcelo Tosatti, oerg Roedel, Jes Sorensen, Gleb Natapov, Zachary Amsden, ziteng.huang, Arnaldo Carvalho de Melo, Fr?d?ric Weisbecker, Gregory Haskins On 03/22/2010 06:51 PM, Ingo Molnar wrote: > * Avi Kivity<avi@redhat.com> wrote: > > >>> The crux of the problem is very simple. To quote my earlier mail: >>> >>> | >>> | - The inconvenience of having to type: >>> | perf kvm --host --guest --guestkallsyms=/home/ymzhang/guest/kallsyms \ >>> | --guestmodules=/home/ymzhang/guest/modules top >>> | >>> | >>> | is very obvious even with a single guest. Now multiply that by more guests ... >>> | >>> >>> For example we want 'perf kvm top' to do something useful by default: it >>> should find the first guest running and it should report its profile. >>> >>> The tool shouldnt have to guess about where the guests are, what their >>> namespaces is and how to talk to them. We also want easy symbolic access to >>> guest, for example: >>> >>> perf kvm -g OpenSuse-2 record sleep 1 >>> > [ Sidenote: i still received no adequate suggestions about how to provide this > category of technical features. ] > You need to integrate with libvirt to convert guest names something that can be used to obtain guest symbols. >>> I.e.: >>> >>> - Easy default reference to guest instances, and a way for tools to >>> reference them symbolically as well in the multi-guest case. Preferably >>> something trustable and kernel-provided - not some indirect information >>> like a PID file created by libvirt-manager or so. >>> >> Usually 'layering violation' is trotted out at such suggestions. >> [...] 
>> > That's weird, how can a feature request be a 'layering violation'? > The 'something trustable and kernel-provided'. The kernel knows nothing about guest names. > If something that users find straightforward and usable is a layering > violation to you (such as easily being able to access their own files on the > host as well ...) then i think you need to revisit the definition of that term > instead of trying to fix the user. > Here is the explanation, you left it quoted: >> [...] I don't like using the term, because sometimes the layers are >> incorrect and need to be violated. But it should be done explicitly, not as >> a shortcut for a minor feature (and profiling is a minor feature, most users >> will never use it, especially guest-from-host). >> >> The fact is we have well defined layers today, kvm virtualizes the cpu and >> memory, qemu emulates devices for a single guest, libvirt manages guests. >> We break this sometimes but there has to be a good reason. So perf needs to >> talk to libvirt if it wants names. Could be done via linking, or can be >> done using a pluging libvirt drops into perf. >> You simply kept ignoring me when I said that if something can be kept out of >> the kernel without impacting performance, it should be. I don't want >> emergency patches closing some security hole or oops in a kernel symbol >> server. >> > I never suggested an "in kernel space symbol server" which could oops, why > would i have suggested that? Please point me to an email where i suggested > that. > You insisted that it be in the kernel. Later you relaxed that and said a daemon is fine. I'm not going to reread this thread, once is more than enough. >> The usability argument is a red herring. True, it takes time for things to >> trickle down to distributions and users. Those who can't wait can download >> the code and compile, it isn't that difficult. 
>> > It's not just "download and compile", it's also "configure correctly for > several separate major distributions" and "configure to per guest instance > local rules". > That's life in Linux-land. Either you let distributions feed you cooked packages and relax, or you do the work yourself. If we had tools/everything/ it wouldn't be this way, but we don't. > It's far more fragile in practice than you make it appear to be, and since you > yourself expressed that you are not interested much in the tooling side, how > can you have adequate experience to judge such matters? > People on kvm-devel manage to build and run release tarballs and even directly from git. I build packages from source occasionally. It isn't fun but it doesn't take a PhD. > In fact for instrumentation it's beyond a critical threshold of fragility - > instrumentation above all needs to be accessible, transparent and robust. > > If you cannot see the advantages of a properly integrated solution then i > suspect there's not much i can do to convince you. > Integration in Linux happens at the desktop or distribution level. You want to move it to the kernel level. It works for perf, great, but that doesn't mean it will work for everything else. Once perf grows a GUI, I expect it will stop working for perf as well (for example, if gtk breaks its API in a major release, which version will perf code for?) > And you ignored not just me but you ignored several people in this thread who > thought the current status quo was inadequate and expressed interest in both > the VFS integration and in the guest enumeration features. > I'm sorry. I don't reply to every email. If you want my opinion on something, you can ask me again. -- error compiling committee.c: too many arguments to function ^ permalink raw reply [flat|nested] 390+ messages in thread
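Avi's position above is that perf should get guest names by talking to libvirt rather than from a kernel-provided namespace. As a rough illustration of that kind of integration, the sketch below parses `virsh list`-style output into a name-to-domain-ID map; the function name and the idea of scraping tool output (rather than linking against the libvirt API proper) are assumptions for illustration, not anything perf or libvirt actually ships.

```python
# Hypothetical sketch: map guest names to libvirt domain IDs by parsing
# `virsh list`-style output. A real integration would more likely link
# against the libvirt API; this is only to show the shape of the lookup.

def parse_virsh_list(output: str) -> dict[str, int]:
    """Return {guest_name: domain_id} from `virsh list`-style text."""
    guests = {}
    for line in output.splitlines():
        fields = line.split()
        # Data rows look like: "  2 OpenSuse-2  running"; skip the
        # header and separator rows, whose first field is not numeric.
        if len(fields) >= 2 and fields[0].isdigit():
            guests[fields[1]] = int(fields[0])
    return guests

sample = """\
 Id Name                 State
----------------------------------
  1 OpenSuse-1           running
  2 OpenSuse-2           running
"""

print(parse_virsh_list(sample))  # {'OpenSuse-1': 1, 'OpenSuse-2': 2}
```

With such a map, a hypothetical `perf kvm -g OpenSuse-2` front end could resolve the symbolic name to a running domain before attaching, which is exactly the name-resolution step the two sides disagree on placing in the kernel versus in userspace.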
* Re: [RFC] Unify KVM kernel-space and user-space code into a single project 2010-03-22 17:08 ` Avi Kivity @ 2010-03-22 17:34 ` Ingo Molnar 2010-03-22 17:55 ` Avi Kivity ` (2 more replies) 0 siblings, 3 replies; 390+ messages in thread From: Ingo Molnar @ 2010-03-22 17:34 UTC (permalink / raw) To: Avi Kivity Cc: Anthony Liguori, Pekka Enberg, Zhang, Yanmin, Peter Zijlstra, Sheng Yang, linux-kernel, kvm, Marcelo Tosatti, oerg Roedel, Jes Sorensen, Gleb Natapov, Zachary Amsden, ziteng.huang, Arnaldo Carvalho de Melo, Fr?d?ric Weisbecker, Gregory Haskins * Avi Kivity <avi@redhat.com> wrote: > >>> - Easy default reference to guest instances, and a way for tools to > >>> reference them symbolically as well in the multi-guest case. Preferably > >>> something trustable and kernel-provided - not some indirect information > >>> like a PID file created by libvirt-manager or so. > >> > >> Usually 'layering violation' is trotted out at such suggestions. > >> [...] > > > > That's weird, how can a feature request be a 'layering violation'? > > The 'something trustable and kernel-provided'. The kernel knows nothing > about guest names. The kernel certainly knows about other resources such as task names or network interface names or tracepoint names. This is kernel design 101. > > If something that users find straightforward and usable is a layering > > violation to you (such as easily being able to access their own files on > > the host as well ...) then i think you need to revisit the definition of > > that term instead of trying to fix the user. > > Here is the explanation, you left it quoted: > > >> [...] I don't like using the term, because sometimes the layers are > >> incorrect and need to be violated. But it should be done explicitly, not > >> as a shortcut for a minor feature (and profiling is a minor feature, most > >> users will never use it, especially guest-from-host). 
> >> > >> The fact is we have well defined layers today, kvm virtualizes the cpu > >> and memory, qemu emulates devices for a single guest, libvirt manages > >> guests. We break this sometimes but there has to be a good reason. So > >> perf needs to talk to libvirt if it wants names. Could be done via > >> linking, or can be done using a pluging libvirt drops into perf. This is really just the much-discredited microkernel approach for keeping global enumeration data that should be kept by the kernel ... Lets look at the ${HOME}/.qemu/qmp/ enumeration method suggested by Anthony. There's numerous ways that this can break: - Those special files can get corrupted, mis-setup, get out of sync, or can be hard to discover. - The ${HOME}/.qemu/qmp/ solution suggested by Anthony has a very obvious design flaw: it is per user. When i'm root i'd like to query _all_ current guest images, not just the ones started by root. A system might not even have a notion of '${HOME}'. - Apps might start KVM vcpu instances without adhering to the ${HOME}/.qemu/qmp/ access method. - There is no guarantee for the Qemu process to reply to a request - while the kernel can always guarantee an enumeration result. I dont want 'perf kvm' to hang or misbehave just because Qemu has hung. Really, for such reasons user-space is pretty poor at doing system-wide enumeration and resource management. Microkernels lost for a reason. You are committing several grave design mistakes here. Thanks, Ingo ^ permalink raw reply [flat|nested] 390+ messages in thread
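For concreteness, the per-user enumeration scheme Ingo is criticizing would amount to scanning a `${HOME}/.qemu/qmp/` directory for per-guest sockets. The directory layout and naming below are assumptions for illustration; the sketch also makes Ingo's first objection tangible: a root user scanning only their own home directory would never see guests started by other users.

```python
# Hypothetical sketch of the ${HOME}/.qemu/qmp/ enumeration method:
# each running guest is assumed to drop a QMP socket named after the
# guest. This layout is an assumption, not an actual QEMU convention.
import os
import tempfile

def enumerate_guests(qmp_dir: str) -> list[str]:
    """Return guest names inferred from socket files in qmp_dir."""
    if not os.path.isdir(qmp_dir):
        return []  # no directory at all means no discoverable guests
    return sorted(os.listdir(qmp_dir))

# Demonstrate against a fake layout rather than a real $HOME.
with tempfile.TemporaryDirectory() as home:
    qmp_dir = os.path.join(home, ".qemu", "qmp")
    os.makedirs(qmp_dir)
    for name in ("OpenSuse-2", "fedora12"):
        open(os.path.join(qmp_dir, name), "w").close()
    print(enumerate_guests(qmp_dir))  # only this user's guests
```

Note that the function can only ever report guests under the one home directory it is pointed at, and silently returns an empty list if the directory is missing or was never populated, which is the corruption/out-of-sync failure mode described above.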
* Re: [RFC] Unify KVM kernel-space and user-space code into a single project 2010-03-22 17:34 ` Ingo Molnar @ 2010-03-22 17:55 ` Avi Kivity 2010-03-22 19:15 ` Anthony Liguori 2010-03-22 19:20 ` Ingo Molnar 2010-03-22 18:35 ` Anthony Liguori 2010-03-22 18:41 ` Anthony Liguori 2 siblings, 2 replies; 390+ messages in thread From: Avi Kivity @ 2010-03-22 17:55 UTC (permalink / raw) To: Ingo Molnar Cc: Anthony Liguori, Pekka Enberg, Zhang, Yanmin, Peter Zijlstra, Sheng Yang, linux-kernel, kvm, Marcelo Tosatti, oerg Roedel, Jes Sorensen, Gleb Natapov, Zachary Amsden, ziteng.huang, Arnaldo Carvalho de Melo, Fr?d?ric Weisbecker, Gregory Haskins On 03/22/2010 07:34 PM, Ingo Molnar wrote: > >> The 'something trustable and kernel-provided'. The kernel knows nothing >> about guest names. >> > The kernel certainly knows about other resources such as task names or network > interface names or tracepoint names. This is kernel design 101. > But it doesn't know about guest names. You can't trust task names since any user can create a task with any name. Network interfaces are root only so you can trust their names. There are dozens or even hundreds of object classes the kernel does not know about and cannot enumerate. User names, for instance. X sessions. Windows (the screen artifact, not the OS). CIFS shares exported by this machine. Currently running applications (not processes). btw, network interfaces would have been much better of using /dev/netif/name rather than having their own namespace, IMO, like disks. >>>> [...] I don't like using the term, because sometimes the layers are >>>> incorrect and need to be violated. But it should be done explicitly, not >>>> as a shortcut for a minor feature (and profiling is a minor feature, most >>>> users will never use it, especially guest-from-host). >>>> >>>> The fact is we have well defined layers today, kvm virtualizes the cpu >>>> and memory, qemu emulates devices for a single guest, libvirt manages >>>> guests. 
We break this sometimes but there has to be a good reason. So >>>> perf needs to talk to libvirt if it wants names. Could be done via >>>> linking, or can be done using a pluging libvirt drops into perf. >>>> > This is really just the much-discredited microkernel approach for keeping > global enumeration data that should be kept by the kernel ... > I disagree it should be kept in the kernel. Why introduce a new namespace, with APIs to query it, manage it, rules regarding conflicts, then virtualize it for containers. > Lets look at the ${HOME}/.qemu/qmp/ enumeration method suggested by Anthony. > There's numerous ways that this can break: > I don't like it either. We have libvirt for enumerating guests. > - Those special files can get corrupted, mis-setup, get out of sync, or can > be hard to discover. > > - The ${HOME}/.qemu/qmp/ solution suggested by Anthony has a very obvious > design flaw: it is per user. When i'm root i'd like to query _all_ current > guest images, not just the ones started by root. A system might not even > have a notion of '${HOME}'. > > - Apps might start KVM vcpu instances without adhering to the > ${HOME}/.qemu/qmp/ access method. > - it doesn't work with nfs. > - There is no guarantee for the Qemu process to reply to a request - while > the kernel can always guarantee an enumeration result. I dont want 'perf > kvm' to hang or misbehave just because Qemu has hung. > If qemu doesn't reply, your guest is dead anyway. > Really, for such reasons user-space is pretty poor at doing system-wide > enumeration and resource management. Microkernels lost for a reason. > Take a look at your desktop, userspace is doing all of that everywhere, from enumerating users and groups, to deciding how your disks are named. The kernel only provides the bare facilities. > You are committing several grave design mistakes here. > I am committing on the shoulders of giants. 
-- error compiling committee.c: too many arguments to function ^ permalink raw reply [flat|nested] 390+ messages in thread
* Re: [RFC] Unify KVM kernel-space and user-space code into a single project 2010-03-22 17:55 ` Avi Kivity @ 2010-03-22 19:15 ` Anthony Liguori 2010-03-22 19:31 ` Daniel P. Berrange 2010-03-22 20:00 ` Antoine Martin 2010-03-22 19:20 ` Ingo Molnar 1 sibling, 2 replies; 390+ messages in thread From: Anthony Liguori @ 2010-03-22 19:15 UTC (permalink / raw) To: Avi Kivity Cc: Ingo Molnar, Pekka Enberg, Zhang, Yanmin, Peter Zijlstra, Sheng Yang, linux-kernel, kvm, Marcelo Tosatti, Joerg Roedel, Jes Sorensen, Gleb Natapov, Zachary Amsden, ziteng.huang, Arnaldo Carvalho de Melo, Frédéric Weisbecker, Gregory Haskins On 03/22/2010 12:55 PM, Avi Kivity wrote: >> Lets look at the ${HOME}/.qemu/qmp/ enumeration method suggested by >> Anthony. >> There's numerous ways that this can break: > > I don't like it either. We have libvirt for enumerating guests. We're stuck in a rut with libvirt and I think a lot of the dissatisfaction with qemu is rooted in that. It's not libvirt that's the problem, but the relationship between qemu and libvirt. We add a feature to qemu and maybe after six months it gets exposed by libvirt. Release time lines of the two projects complicate the situation further. People that write GUIs are limited by libvirt because that's what they're told to use and when they need something simple, they're presented with first getting that feature implemented in qemu, then plumbed through libvirt. It wouldn't be so bad if libvirt was basically a passthrough interface to qemu but it tries to model everything in a generic way which is more or less doomed to fail when you're adding lots of new features (as we are). The list of things that libvirt doesn't support and won't any time soon is staggering. libvirt serves an important purpose, but we need to do a better job in qemu with respect to usability. We can't just punt to libvirt. Regards, Anthony Liguori ^ permalink raw reply [flat|nested] 390+ messages in thread
* Re: [RFC] Unify KVM kernel-space and user-space code into a single project 2010-03-22 19:15 ` Anthony Liguori @ 2010-03-22 19:31 ` Daniel P. Berrange 2010-03-22 19:33 ` Anthony Liguori 2010-03-22 19:39 ` Alexander Graf 2010-03-22 20:00 ` Antoine Martin 1 sibling, 2 replies; 390+ messages in thread From: Daniel P. Berrange @ 2010-03-22 19:31 UTC (permalink / raw) To: Anthony Liguori Cc: Avi Kivity, Ingo Molnar, Pekka Enberg, Zhang, Yanmin, Peter Zijlstra, Sheng Yang, linux-kernel, kvm, Marcelo Tosatti, oerg Roedel, Jes Sorensen, Gleb Natapov, Zachary Amsden, ziteng.huang, Arnaldo Carvalho de Melo, Fr?d?ric Weisbecker, Gregory Haskins On Mon, Mar 22, 2010 at 02:15:35PM -0500, Anthony Liguori wrote: > On 03/22/2010 12:55 PM, Avi Kivity wrote: > >>Lets look at the ${HOME}/.qemu/qmp/ enumeration method suggested by > >>Anthony. > >>There's numerous ways that this can break: > > > >I don't like it either. We have libvirt for enumerating guests. > > We're stuck in a rut with libvirt and I think a lot of the > dissatisfaction with qemu is rooted in that. It's not libvirt that's > the probably, but the relationship between qemu and libvirt. > > We add a feature to qemu and maybe after six month it gets exposed by > libvirt. Release time lines of the two projects complicate the > situation further. People that write GUIs are limited by libvirt > because that's what they're told to use and when they need something > simple, they're presented with first getting that feature implemented in > qemu, then plumbed through libvirt. That is somewhat unfair as a blanket statement! While some features have had a long time delay & others are not supported at all, in many cases we have had zero delay. We have been supporting QMP, qdev, vhost-net since before the patches for those features were even merged in QEMU GIT! It varies depending on how closely QEMU & libvirt people have been working together on a feature, and on how strongly end users are demanding the features. 
> It wouldn't be so bad if libvirt was basically a passthrough interface > to qemu but it tries to model everything in a generic way which is more > or less doomed to fail when you're adding lots of new features (as we are). > > The list of things that libvirt doesn't support and won't any time soon > is staggering. As previously discussed, we want to improve both the set of features supported, and make it much easier to support new features promptly. The QMP & qdev stuff has been a very good step forward in making it easier to support QEMU management. There have been a proposals from several people, yourself included, on how to improve libvirt's support for the full range of QEMU features. We're committed to looking at this and figuring out which proposals are practical to support, so we can improve QEMU & libvirt interaction for everyone. Regards, Daniel -- |: Red Hat, Engineering, London -o- http://people.redhat.com/berrange/ :| |: http://libvirt.org -o- http://virt-manager.org -o- http://deltacloud.org :| |: http://autobuild.org -o- http://search.cpan.org/~danberr/ :| |: GnuPG: 7D3B9505 -o- F3C9 553F A1DA 4AC2 5648 23C1 B3DF F742 7D3B 9505 :| ^ permalink raw reply [flat|nested] 390+ messages in thread
* Re: [RFC] Unify KVM kernel-space and user-space code into a single project 2010-03-22 19:31 ` Daniel P. Berrange @ 2010-03-22 19:33 ` Anthony Liguori 2010-03-22 19:39 ` Alexander Graf 1 sibling, 0 replies; 390+ messages in thread From: Anthony Liguori @ 2010-03-22 19:33 UTC (permalink / raw) To: Daniel P. Berrange Cc: Avi Kivity, Ingo Molnar, Pekka Enberg, Zhang, Yanmin, Peter Zijlstra, Sheng Yang, linux-kernel, kvm, Marcelo Tosatti, oerg Roedel, Jes Sorensen, Gleb Natapov, Zachary Amsden, ziteng.huang, Arnaldo Carvalho de Melo, Fr?d?ric Weisbecker, Gregory Haskins On 03/22/2010 02:31 PM, Daniel P. Berrange wrote: > On Mon, Mar 22, 2010 at 02:15:35PM -0500, Anthony Liguori wrote: > >> On 03/22/2010 12:55 PM, Avi Kivity wrote: >> >>>> Lets look at the ${HOME}/.qemu/qmp/ enumeration method suggested by >>>> Anthony. >>>> There's numerous ways that this can break: >>>> >>> I don't like it either. We have libvirt for enumerating guests. >>> >> We're stuck in a rut with libvirt and I think a lot of the >> dissatisfaction with qemu is rooted in that. It's not libvirt that's >> the probably, but the relationship between qemu and libvirt. >> >> We add a feature to qemu and maybe after six month it gets exposed by >> libvirt. Release time lines of the two projects complicate the >> situation further. People that write GUIs are limited by libvirt >> because that's what they're told to use and when they need something >> simple, they're presented with first getting that feature implemented in >> qemu, then plumbed through libvirt. >> > That is somewhat unfair as a blanket statement! > Sorry, you're certainly correct. Some features appear quickly, but others can take an awfully long time. >> It wouldn't be so bad if libvirt was basically a passthrough interface >> to qemu but it tries to model everything in a generic way which is more >> or less doomed to fail when you're adding lots of new features (as we are). 
>> >> The list of things that libvirt doesn't support and won't any time soon >> is staggering. >> > As previously discussed, we want to improve both the set of features > supported, and make it much easier to support new features promptly. > The QMP& qdev stuff has been a very good step forward in making it > easier to support QEMU management. There have been a proposals from > several people, yourself included, on how to improve libvirt's support > for the full range of QEMU features. We're committed to looking at this > and figuring out which proposals are practical to support, so we can > improve QEMU& libvirt interaction for everyone. > Regards, Anthony Liguori > Regards, > Daniel > ^ permalink raw reply [flat|nested] 390+ messages in thread
* Re: [RFC] Unify KVM kernel-space and user-space code into a single project 2010-03-22 19:31 ` Daniel P. Berrange 2010-03-22 19:33 ` Anthony Liguori @ 2010-03-22 19:39 ` Alexander Graf 2010-03-22 19:54 ` Ingo Molnar 1 sibling, 1 reply; 390+ messages in thread From: Alexander Graf @ 2010-03-22 19:39 UTC (permalink / raw) To: Daniel P. Berrange Cc: Anthony Liguori, Avi Kivity, Ingo Molnar, Pekka Enberg, Zhang, Yanmin, Peter Zijlstra, Sheng Yang, linux-kernel, kvm, Marcelo Tosatti, oerg Roedel, Jes Sorensen, Gleb Natapov, Zachary Amsden, ziteng.huang, Arnaldo Carvalho de Melo, Fr?d?ric Weisbecker, Gregory Haskins On 22.03.2010, at 20:31, Daniel P. Berrange wrote: > On Mon, Mar 22, 2010 at 02:15:35PM -0500, Anthony Liguori wrote: >> On 03/22/2010 12:55 PM, Avi Kivity wrote: >>>> Lets look at the ${HOME}/.qemu/qmp/ enumeration method suggested by >>>> Anthony. >>>> There's numerous ways that this can break: >>> >>> I don't like it either. We have libvirt for enumerating guests. >> >> We're stuck in a rut with libvirt and I think a lot of the >> dissatisfaction with qemu is rooted in that. It's not libvirt that's >> the probably, but the relationship between qemu and libvirt. >> >> We add a feature to qemu and maybe after six month it gets exposed by >> libvirt. Release time lines of the two projects complicate the >> situation further. People that write GUIs are limited by libvirt >> because that's what they're told to use and when they need something >> simple, they're presented with first getting that feature implemented in >> qemu, then plumbed through libvirt. > > That is somewhat unfair as a blanket statement! > > While some features have had a long time delay & others are not supported > at all, in many cases we have had zero delay. We have been supporting QMP, > qdev, vhost-net since before the patches for those features were even merged > in QEMU GIT! 
It varies depending on how closely QEMU & libvirt people have >> been working together on a feature, and on how strongly end users are demanding >> the features. Yes. I think the point was that every layer in between brings potential slowdown and loss of features. Hopefully this will go away with QMP. By then people can decide if they want to be hypervisor agnostic (libvirt) or tightly coupled with qemu (QMP). The best of both worlds would of course be a QMP pass-through in libvirt. No idea if that's easily possible. Either way, things are improving. What people see at the end is virt-manager though. And if you compare it feature-wise as well as looks-wise, vbox is simply superior. Several features are lacking in lower layers too (pv graphics, always working absolute pointers, clipboard sharing, ...). That said it doesn't mean we should resign. It means we know which areas to work on :-). And we know that our problem is not the kernel/userspace interface, but the qemu and above interfaces. Alex ^ permalink raw reply [flat|nested] 390+ messages in thread
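The QMP coupling discussed above is just line-delimited JSON over a socket: the server greets first, the client must negotiate `qmp_capabilities`, and only then may it issue commands. The sketch below demonstrates that handshake against a toy stand-in server so it is self-contained; the server and its simplified `query-status` reply payload are assumptions for illustration, not QEMU's actual implementation.

```python
# Sketch of the QMP handshake: greeting, capabilities negotiation,
# then a command. The "server" here is a toy stand-in for QEMU.
import json
import os
import socket
import tempfile
import threading
import time

def toy_qmp_server(path):
    """Serve one client with a QMP-shaped exchange (illustrative only)."""
    srv = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
    srv.bind(path)
    srv.listen(1)
    conn, _ = srv.accept()
    f = conn.makefile("rw")
    # QMP servers speak first: a greeting announcing the protocol.
    f.write(json.dumps({"QMP": {"version": {}, "capabilities": []}}) + "\n")
    f.flush()
    for line in f:
        cmd = json.loads(line)
        if cmd.get("execute") == "query-status":
            # Simplified payload; real QEMU returns more fields.
            f.write(json.dumps({"return": {"running": True}}) + "\n")
        else:  # covers the mandatory qmp_capabilities negotiation
            f.write(json.dumps({"return": {}}) + "\n")
        f.flush()

path = os.path.join(tempfile.mkdtemp(), "qmp.sock")
threading.Thread(target=toy_qmp_server, args=(path,), daemon=True).start()

cli = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
for _ in range(100):  # wait for the server socket to appear
    try:
        cli.connect(path)
        break
    except (FileNotFoundError, ConnectionRefusedError):
        time.sleep(0.01)
f = cli.makefile("rw")
greeting = json.loads(f.readline())
# Clients must negotiate capabilities before issuing commands.
f.write(json.dumps({"execute": "qmp_capabilities"}) + "\n")
f.flush()
json.loads(f.readline())  # ack: {"return": {}}
f.write(json.dumps({"execute": "query-status"}) + "\n")
f.flush()
status = json.loads(f.readline())
print(status)  # {'return': {'running': True}}
```

A pass-through in libvirt would essentially forward such JSON messages verbatim instead of remodeling each command, which is why it avoids the per-feature plumbing delay described above.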
* Re: [RFC] Unify KVM kernel-space and user-space code into a single project 2010-03-22 19:39 ` Alexander Graf @ 2010-03-22 19:54 ` Ingo Molnar 2010-03-22 19:58 ` Alexander Graf 2010-03-22 20:19 ` Antoine Martin 0 siblings, 2 replies; 390+ messages in thread From: Ingo Molnar @ 2010-03-22 19:54 UTC (permalink / raw) To: Alexander Graf Cc: Daniel P. Berrange, Anthony Liguori, Avi Kivity, Pekka Enberg, Zhang, Yanmin, Peter Zijlstra, Sheng Yang, linux-kernel, kvm, Marcelo Tosatti, oerg Roedel, Jes Sorensen, Gleb Natapov, Zachary Amsden, ziteng.huang, Arnaldo Carvalho de Melo, Fr?d?ric Weisbecker, Gregory Haskins * Alexander Graf <agraf@suse.de> wrote: > Yes. I think the point was that every layer in between brings potential > slowdown and loss of features. Exactly. The more 'fragmented' a project is into sub-projects, without a single, unified, functional reference implementation in the center of it, the longer it takes to fix 'unsexy' problems like trivial usability bugs. Furthermore, another negative effect is that many times features are implemented not in their technically best way, but in a way to keep them local to the project that originates them. This is done to keep deployment latencies and general contribution overhead down to a minimum. The moment you have to work with yet another project, the overhead adds up. So developers rather go for the quicker (yet inferior) hack within the sub-project they have best access to. Tell me this isnt happening in this space ;-) Thanks, Ingo ^ permalink raw reply [flat|nested] 390+ messages in thread
* Re: [RFC] Unify KVM kernel-space and user-space code into a single project
From: Alexander Graf @ 2010-03-22 19:58 UTC (permalink / raw)
To: Ingo Molnar
Cc: Daniel P. Berrange, Anthony Liguori, Avi Kivity, Pekka Enberg, Yanmin Zhang, Peter Zijlstra, Sheng Yang, LKML Mailing List, kvm-devel General, Marcelo Tosatti, Joerg Roedel, Jes Sorensen, Gleb Natapov, Zachary Amsden, Arnaldo Carvalho de Melo, Frédéric Weisbecker, Gregory Haskins

On 22.03.2010, at 20:54, Ingo Molnar wrote:

> * Alexander Graf <agraf@suse.de> wrote:
>
>> Yes. I think the point was that every layer in between brings potential slowdown and loss of features.
>
> Exactly. The more 'fragmented' a project is into sub-projects, without a single, unified, functional reference implementation in the center of it, the longer it takes to fix 'unsexy' problems like trivial usability bugs.

I agree with that part. As previously stated, there are few people working on qemu who would go and implement higher-level things, though. So some solution is needed there.

> Furthermore, another negative effect is that many times features are implemented not in their technically best way, but in a way to keep them local to the project that originates them. This is done to keep deployment latencies and general contribution overhead down to a minimum. The moment you have to work with yet another project, the overhead adds up.

I disagree there. Keeping things local and self-contained has been the UNIX secret. It works really well as long as the boundaries are well defined. The problem we're facing is that we're simply lacking an active GUI / desktop user development community. We have desktop users, but nobody feels like tackling the issue of doing a great GUI project while talking to qemu-devel about his needs.

> So developers rather go for the quicker (yet inferior) hack within the sub-project they have best access to.

Well - not necessarily hacks. It's more about project boundaries. Nothing is bad about that. You wouldn't want "ls" implemented in the Linux kernel either, right? :-)

Alex
* Re: [RFC] Unify KVM kernel-space and user-space code into a single project
From: Ingo Molnar @ 2010-03-22 20:21 UTC (permalink / raw)
To: Alexander Graf
Cc: Daniel P. Berrange, Anthony Liguori, Avi Kivity, Pekka Enberg, Yanmin Zhang, Peter Zijlstra, Sheng Yang, LKML Mailing List, kvm-devel General, Marcelo Tosatti, Joerg Roedel, Jes Sorensen, Gleb Natapov, Zachary Amsden, Arnaldo Carvalho de Melo, Frédéric Weisbecker, Gregory Haskins

* Alexander Graf <agraf@suse.de> wrote:

>> Furthermore, another negative effect is that many times features are implemented not in their technically best way, but in a way to keep them local to the project that originates them. This is done to keep deployment latencies and general contribution overhead down to a minimum. The moment you have to work with yet another project, the overhead adds up.
>
> I disagree there. Keeping things local and self-contained has been the UNIX secret. It works really well as long as the boundaries are well defined.

The 'UNIX secret' works for text-driven pipelined commands where we are essentially programming via narrow ASCII input of mathematical logic.

It doesn't work for a GUI that is a 2D/3D environment of millions of pixels, shaped by human visual perception of prettiness, familiarity and efficiency.

> The problem we're facing is that we're simply lacking an active GUI / desktop user development community. We have desktop users, but nobody feels like tackling the issue of doing a great GUI project while talking to qemu-devel about his needs.

Have you given any thought to why that might be so?

I think it's because of what I outlined above - that you are trying to apply the "UNIX secret" to GUIs - and that is a mistake.

A good GUI is almost at the _exact opposite spectrum_ of a good command-line tool: tightly integrated, with 'layering violations' designed into it all over the place:

  look, I can paste the text from an editor straight into a firefox form. I didn't go through any hierarchy of layers, I just took the shortest path between the apps!

In other words: in a GUI the output controls the design, for command-line tools the design controls the output.

It is no wonder Unix always had its problems with creating good GUIs that are efficient to humans. A good GUI works like the human brain, and the human brain does not mind 'layering violations' when that gets it a more efficient result.

>> So developers rather go for the quicker (yet inferior) hack within the sub-project they have best access to.
>
> Well - not necessarily hacks. It's more about project boundaries. Nothing is bad about that. You wouldn't want "ls" implemented in the Linux kernel either, right? :-)

I guess you are talking to the wrong person, as I actually have implemented ls functionality in the kernel, using async IO concepts and extreme threading ;-) It was a bit crazy, but it was also the fastest FTP server ever running on this planet.

	Ingo
* Re: [RFC] Unify KVM kernel-space and user-space code into a single project
From: Avi Kivity @ 2010-03-22 20:35 UTC (permalink / raw)
To: Ingo Molnar
Cc: Alexander Graf, Daniel P. Berrange, Anthony Liguori, Pekka Enberg, Yanmin Zhang, Peter Zijlstra, Sheng Yang, LKML Mailing List, kvm-devel General, Marcelo Tosatti, Joerg Roedel, Jes Sorensen, Gleb Natapov, Zachary Amsden, Arnaldo Carvalho de Melo, Frédéric Weisbecker, Gregory Haskins

On 03/22/2010 10:21 PM, Ingo Molnar wrote:
> * Alexander Graf <agraf@suse.de> wrote:
>
>>> Furthermore, another negative effect is that many times features are implemented not in their technically best way, but in a way to keep them local to the project that originates them. This is done to keep deployment latencies and general contribution overhead down to a minimum. The moment you have to work with yet another project, the overhead adds up.
>>
>> I disagree there. Keeping things local and self-contained has been the UNIX secret. It works really well as long as the boundaries are well defined.
>
> The 'UNIX secret' works for text-driven pipelined commands where we are essentially programming via narrow ASCII input of mathematical logic.
>
> It doesn't work for a GUI that is a 2D/3D environment of millions of pixels, shaped by human visual perception of prettiness, familiarity and efficiency.

Modularization is needed when a project exceeds the average developer's capacity. For kvm, it is logical to separate privileged cpu virtualization, from guest virtualization, from host management, from cluster management.

>> The problem we're facing is that we're simply lacking an active GUI / desktop user development community. We have desktop users, but nobody feels like tackling the issue of doing a great GUI project while talking to qemu-devel about his needs.
>
> Have you given any thought to why that might be so?
>
> I think it's because of what I outlined above - that you are trying to apply the "UNIX secret" to GUIs - and that is a mistake.
>
> A good GUI is almost at the _exact opposite spectrum_ of a good command-line tool: tightly integrated, with 'layering violations' designed into it all over the place:
>
>   look, I can paste the text from an editor straight into a firefox form. I didn't go through any hierarchy of layers, I just took the shortest path between the apps!

Nope. You copied text from one application into the clipboard (or selection, or PRIMARY, or whatever) and pasted text from the clipboard into another application. If firefox and your editor had to interact directly, all would be lost. See - there was a global (for the session) third party, and it wasn't the kernel.

> In other words: in a GUI the output controls the design, for command-line tools the design controls the output.

Not in GUIs that I've seen the internals of.

> It is no wonder Unix always had its problems with creating good GUIs that are efficient to humans. A good GUI works like the human brain, and the human brain does not mind 'layering violations' when that gets it a more efficient result.

The problem is that only developers are involved, not people who understand human-computer interaction (in many cases, not human-human interaction either). Another problem is that a good GUI takes a lot of work, so you need a lot of committed resources. A third problem is that it isn't a lot of fun, at least not the 20% of the work that takes 800% of the time.

-- 
Do not meddle in the internals of kernels, for they are subtle and quick to panic.
* Re: [RFC] Unify KVM kernel-space and user-space code into a single project
From: Bernd Petrovitsch @ 2010-03-23 10:48 UTC (permalink / raw)
To: Ingo Molnar
Cc: Alexander Graf, Daniel P. Berrange, Anthony Liguori, Avi Kivity, Pekka Enberg, Yanmin Zhang, Peter Zijlstra, Sheng Yang, LKML Mailing List, kvm-devel General, Marcelo Tosatti, Joerg Roedel, Jes Sorensen, Gleb Natapov, Zachary Amsden, Arnaldo Carvalho de Melo, Frédéric Weisbecker, Gregory Haskins

On Mon, 2010-03-22 at 21:21 +0100, Ingo Molnar wrote:
[...]
> Have you given any thought to why that might be so?

Yes.

Foreword: I assume by "GUI" you mean "a user interface for the classical desktop user with next to no interest in learning details or basics". That doesn't mean the classical desktop user is silly, stupid or otherwise handicapped - it's just the lack of interest and/or time.

> I think it's because of what I outlined above - that you are trying to apply the "UNIX secret" to GUIs - and that is a mistake.

No, it's the very same mechanism. But you just have to start at the correct point. In the kernel/device-driver world, you start at the device. And in the GUI world, you had better start at the GUI (and not some kernel API, library API, GUI tool or toolchain or anywhere else).

> A good GUI is almost at the _exact opposite spectrum_ of a good command-line tool: tightly integrated, with 'layering violations' designed into it all over the place:
>
>   look, I can paste the text from an editor straight into a firefox form. I didn't go through any hierarchy of layers, I just took the shortest path between the apps!
>
> In other words: in a GUI the output controls the design, for command-line

ACK, because you have to make the GUI understandable to the intended users. If that means "hiding 90% of all possibilities and features", you just hide them. Of course, the user of such a UI is quite limited and doesn't use much of the functionality - because s/he can't access it through the GUI (but presenting 100% - or even 40% - doesn't help either, as s/he won't understand it anyway).

> tools the design controls the output.

ACK, because the user in this case (which is most of the time a developer, sysadmin, or similar techie) *wants* a 1:1 picture of the underlying model, because s/he already *knows* the underlying model (and is willing and able to adapt their own workflow to the underlying models).

> It is no wonder Unix always had its problems with creating good GUIs that are

ACK. The cliché Unix person doesn't come from the "GUI world". So most of them are "trained" and used to looking at what's there and improving on it.

> efficient to humans. A good GUI works like the human brain, and the human brain does not mind 'layering violations' when that gets it a more efficient result.

If this is the case, the layering/structure/design of the GUI is (very) badly defined/chosen (for whatever reason).

[ Most probably because some seasoned software developer designed the GUI app *without* first designing (and testing!) the GUI - or, more to the point, its look (how does it look?) and feel (how does it behave, what are the possible workflows, ...). ]

	Bernd
-- 
Bernd Petrovitsch                  Email : bernd@petrovitsch.priv.at
                                   LUGA : http://www.luga.at
* Re: [RFC] Unify KVM kernel-space and user-space code into a single project
From: Antoine Martin @ 2010-03-22 20:19 UTC (permalink / raw)
To: Ingo Molnar
Cc: Alexander Graf, Daniel P. Berrange, Anthony Liguori, Avi Kivity, Pekka Enberg, Zhang, Yanmin, Peter Zijlstra, Sheng Yang, linux-kernel, kvm, Marcelo Tosatti, Joerg Roedel, Jes Sorensen, Gleb Natapov, Zachary Amsden, ziteng.huang, Arnaldo Carvalho de Melo, Frédéric Weisbecker, Gregory Haskins

On 03/23/2010 02:54 AM, Ingo Molnar wrote:
> * Alexander Graf <agraf@suse.de> wrote:
>
>> Yes. I think the point was that every layer in between brings potential slowdown and loss of features.
>
> Exactly. The more 'fragmented' a project is into sub-projects, without a single, unified, functional reference implementation in the center of it, the longer it takes to fix 'unsexy' problems like trivial usability bugs.
>
> Furthermore, another negative effect is that many times features are implemented not in their technically best way, but in a way to keep them local to the project that originates them. This is done to keep deployment latencies and general contribution overhead down to a minimum. The moment you have to work with yet another project, the overhead adds up.
>
> So developers rather go for the quicker (yet inferior) hack within the sub-project they have best access to.
>
> Tell me this isn't happening in this space ;-)

Integration is hard, requires a wider set of technical skills, and getting good test coverage becomes more difficult. But I agree that it is worth the effort: kvm could reap large rewards from putting a greater emphasis on integration (a la vbox) - no matter how it is achieved (cowardly not taking sides on implementation decisions like repository locations).

Antoine

> Thanks,
>
> 	Ingo
> --
> To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majordomo@vger.kernel.org
> More majordomo info at http://vger.kernel.org/majordomo-info.html
* Re: [RFC] Unify KVM kernel-space and user-space code into a single project
From: Antoine Martin @ 2010-03-22 20:00 UTC (permalink / raw)
To: Anthony Liguori
Cc: Avi Kivity, Ingo Molnar, Pekka Enberg, Zhang, Yanmin, Peter Zijlstra, Sheng Yang, linux-kernel, kvm, Marcelo Tosatti, Joerg Roedel, Jes Sorensen, Gleb Natapov, Zachary Amsden, ziteng.huang, Arnaldo Carvalho de Melo, Frédéric Weisbecker, Gregory Haskins

On 03/23/2010 02:15 AM, Anthony Liguori wrote:
> On 03/22/2010 12:55 PM, Avi Kivity wrote:
>>> Lets look at the ${HOME}/.qemu/qmp/ enumeration method suggested by Anthony. There's numerous ways that this can break:
>>
>> I don't like it either. We have libvirt for enumerating guests.
>
> We're stuck in a rut with libvirt and I think a lot of the dissatisfaction with qemu is rooted in that. It's not libvirt that's the problem, but the relationship between qemu and libvirt.

+1

The obvious reason why so many people still use shell scripts rather than libvirt is that it just doesn't provide what they need. Every time I've looked at it (and I've been looking for a better solution for many years), it seems that it would have provided most of the things I needed, but the remaining bits were unsolvable. Shell scripts can be ugly, but you get total control.

Antoine

> We add a feature to qemu and maybe after six months it gets exposed by libvirt. Release timelines of the two projects complicate the situation further. People that write GUIs are limited by libvirt because that's what they're told to use, and when they need something simple, they're presented with first getting that feature implemented in qemu, then plumbed through libvirt.
>
> It wouldn't be so bad if libvirt was basically a passthrough interface to qemu, but it tries to model everything in a generic way, which is more or less doomed to fail when you're adding lots of new features (as we are).
>
> The list of things that libvirt doesn't support and won't any time soon is staggering.
>
> libvirt serves an important purpose, but we need to do a better job in qemu with respect to usability. We can't just punt to libvirt.
>
> Regards,
>
> Anthony Liguori
* Re: [RFC] Unify KVM kernel-space and user-space code into a single project
From: Daniel P. Berrange @ 2010-03-22 20:58 UTC (permalink / raw)
To: Antoine Martin
Cc: Anthony Liguori, Avi Kivity, Ingo Molnar, Pekka Enberg, Zhang, Yanmin, Peter Zijlstra, Sheng Yang, linux-kernel, kvm, Marcelo Tosatti, Joerg Roedel, Jes Sorensen, Gleb Natapov, Zachary Amsden, ziteng.huang, Arnaldo Carvalho de Melo, Frédéric Weisbecker, Gregory Haskins

On Tue, Mar 23, 2010 at 03:00:28AM +0700, Antoine Martin wrote:
> On 03/23/2010 02:15 AM, Anthony Liguori wrote:
>> On 03/22/2010 12:55 PM, Avi Kivity wrote:
>>>> Lets look at the ${HOME}/.qemu/qmp/ enumeration method suggested by Anthony. There's numerous ways that this can break:
>>>
>>> I don't like it either. We have libvirt for enumerating guests.
>>
>> We're stuck in a rut with libvirt and I think a lot of the dissatisfaction with qemu is rooted in that. It's not libvirt that's the problem, but the relationship between qemu and libvirt.
>
> +1
>
> The obvious reason why so many people still use shell scripts rather than libvirt is that it just doesn't provide what they need. Every time I've looked at it (and I've been looking for a better solution for many years), it seems that it would have provided most of the things I needed, but the remaining bits were unsolvable.

If you happen to remember what missing features prevented you choosing libvirt, that would be invaluable information for us, to see if there are quick wins that will help out. We got very useful feedback when recently asking people this same question:

  http://rwmj.wordpress.com/2010/01/07/quick-quiz-what-stops-you-from-using-libvirt/

Allowing arbitrary passthrough of QEMU commands/args will solve some of these issues, but it is certainly far from solving all of them - eg guest cut+paste, host-side control of guest screen resolution, easier x509/TLS configuration for remote management, soft reboot, Windows desktop support for virt-manager, host network interface management/setup, etc.

Regards,
Daniel
-- 
|: Red Hat, Engineering, London    -o-   http://people.redhat.com/berrange/ :|
|: http://libvirt.org -o- http://virt-manager.org -o- http://deltacloud.org :|
|: http://autobuild.org        -o-         http://search.cpan.org/~danberr/ :|
|: GnuPG: 7D3B9505  -o-   F3C9 553F A1DA 4AC2 5648 23C1 B3DF F742 7D3B 9505 :|
* Re: [RFC] Unify KVM kernel-space and user-space code into a single project
From: Ingo Molnar @ 2010-03-22 19:20 UTC (permalink / raw)
To: Avi Kivity
Cc: Anthony Liguori, Pekka Enberg, Zhang, Yanmin, Peter Zijlstra, Sheng Yang, linux-kernel, kvm, Marcelo Tosatti, Joerg Roedel, Jes Sorensen, Gleb Natapov, Zachary Amsden, ziteng.huang, Arnaldo Carvalho de Melo, Frédéric Weisbecker, Gregory Haskins

* Avi Kivity <avi@redhat.com> wrote:

>> Lets look at the ${HOME}/.qemu/qmp/ enumeration method suggested by Anthony. There's numerous ways that this can break:
>
> I don't like it either. We have libvirt for enumerating guests.

Which has pretty much the same problems as the ${HOME}/.qemu/qmp/ solution, obviously.

>> - Those special files can get corrupted, mis-setup, get out of sync, or can be hard to discover.
>>
>> - The ${HOME}/.qemu/qmp/ solution suggested by Anthony has a very obvious design flaw: it is per user. When I'm root I'd like to query _all_ current guest images, not just the ones started by root. A system might not even have a notion of '${HOME}'.
>>
>> - Apps might start KVM vcpu instances without adhering to the ${HOME}/.qemu/qmp/ access method.
>
> - it doesn't work with nfs.

So out of a list of 4 disadvantages your reply is that you agree with 3?

>> - There is no guarantee for the Qemu process to reply to a request - while the kernel can always guarantee an enumeration result. I don't want 'perf kvm' to hang or misbehave just because Qemu has hung.
>
> If qemu doesn't reply, your guest is dead anyway.

Erm, but I'm talking about a dead tool here. There's a world of a difference between 'kvm top' not showing new entries (because the guest is dead), and 'perf kvm top' hanging due to Qemu hanging.

So it's essentially 4 out of 4. Yet your reply isn't "Ingo you are right" but "hey, too bad"?

>> Really, for such reasons user-space is pretty poor at doing system-wide enumeration and resource management. Microkernels lost for a reason.
>
> Take a look at your desktop, userspace is doing all of that everywhere, from enumerating users and groups, to deciding how your disks are named. The kernel only provides the bare facilities.

We don't do that for robust system instrumentation, for heaven's sake!

By your argument it would be perfectly fine to implement /proc purely via user-space, correct?

>> You are committing several grave design mistakes here.
>
> I am committing on the shoulders of giants.

Really, this is getting outright ridiculous. You agree with me that Anthony suggested a technically inferior solution, yet you even seem to be proud of it and are joking about it?

And _you_ are complaining about lkml-style hard-talk discussions?

Thanks,

	Ingo
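The per-user design flaw in Ingo's list is easy to demonstrate with a sketch. Assume, purely hypothetically (following the suggestion under discussion - nothing here is real QEMU behaviour, and the directory layout and function names are invented for illustration), that each QEMU instance registered a QMP socket under $HOME/.qemu/qmp/. An enumerator pointed at one user's home cannot see guests started by other users; root has to walk every home directory, and still misses guests whose owner has no $HOME or that never registered:

```python
from pathlib import Path

def enumerate_guests(home: Path):
    """List QMP endpoints under one user's hypothetical ~/.qemu/qmp/ dir."""
    qmp_dir = home / ".qemu" / "qmp"
    if not qmp_dir.is_dir():
        return []  # nothing registered -- or the dir was never created
    return sorted(p.name for p in qmp_dir.iterdir())

def enumerate_all_users(home_root: Path):
    """What root would have to do: walk every home directory and merge the
    results. This still misses guests whose owner has no home directory,
    or that simply never registered a socket."""
    guests = {}
    for home in home_root.iterdir():
        if home.is_dir():
            guests[home.name] = enumerate_guests(home)
    return guests
```

The in-kernel enumeration Ingo argues for avoids exactly this: the kernel sees every vcpu regardless of which user (or which $HOME, if any) started it.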
* Re: [RFC] Unify KVM kernel-space and user-space code into a single project
From: Avi Kivity @ 2010-03-22 19:44 UTC (permalink / raw)
To: Ingo Molnar
Cc: Anthony Liguori, Pekka Enberg, Zhang, Yanmin, Peter Zijlstra, Sheng Yang, linux-kernel, kvm, Marcelo Tosatti, Joerg Roedel, Jes Sorensen, Gleb Natapov, Zachary Amsden, ziteng.huang, Arnaldo Carvalho de Melo, Frédéric Weisbecker, Gregory Haskins

On 03/22/2010 09:20 PM, Ingo Molnar wrote:
> * Avi Kivity <avi@redhat.com> wrote:
>
>>> Lets look at the ${HOME}/.qemu/qmp/ enumeration method suggested by Anthony. There's numerous ways that this can break:
>>
>> I don't like it either. We have libvirt for enumerating guests.
>
> Which has pretty much the same problems as the ${HOME}/.qemu/qmp/ solution, obviously.

It doesn't follow. The libvirt daemon could/should own guests from all users. I don't know if it does so now, but nothing is preventing it technically.

>>> - Those special files can get corrupted, mis-setup, get out of sync, or can be hard to discover.
>>>
>>> - The ${HOME}/.qemu/qmp/ solution suggested by Anthony has a very obvious design flaw: it is per user. When I'm root I'd like to query _all_ current guest images, not just the ones started by root. A system might not even have a notion of '${HOME}'.
>>>
>>> - Apps might start KVM vcpu instances without adhering to the ${HOME}/.qemu/qmp/ access method.
>>
>> - it doesn't work with nfs.
>
> So out of a list of 4 disadvantages your reply is that you agree with 3?

I agree with 1-3, disagree with 4, and add 5. Yes.

>>> - There is no guarantee for the Qemu process to reply to a request - while the kernel can always guarantee an enumeration result. I don't want 'perf kvm' to hang or misbehave just because Qemu has hung.
>>
>> If qemu doesn't reply, your guest is dead anyway.
>
> Erm, but I'm talking about a dead tool here. There's a world of a difference between 'kvm top' not showing new entries (because the guest is dead), and 'perf kvm top' hanging due to Qemu hanging.

If qemu hangs, the guest hangs a few milliseconds later.

> So it's essentially 4 out of 4. Yet your reply isn't "Ingo you are right" but "hey, too bad"?

My reply is "you are right" (phrased earlier as "I don't like it either", meaning I agree with your dislike). One of your criticisms was invalid, IMO, and I pointed it out.

>>> Really, for such reasons user-space is pretty poor at doing system-wide enumeration and resource management. Microkernels lost for a reason.
>>
>> Take a look at your desktop, userspace is doing all of that everywhere, from enumerating users and groups, to deciding how your disks are named. The kernel only provides the bare facilities.
>
> We don't do that for robust system instrumentation, for heaven's sake!

If qemu fails, you lose your guest. If libvirt forgets about a guest, you can't do anything with it any more. These are more serious problems than 'perf kvm' not working. Qemu and libvirt have to be robust anyway; we can rely on them. Like we have to rely on init, X, sshd, and a zillion other critical tools.

> By your argument it would be perfectly fine to implement /proc purely via user-space, correct?

I would have preferred /proc to be implemented via syscalls called directly from tools, and good tools written to expose the information in it. When computers were slower, 'top' would spend tons of time opening and closing all those tiny files and parsing them. Of course the kernel needs to provide the information.

>>> You are committing several grave design mistakes here.
>>
>> I am committing on the shoulders of giants.
>
> Really, this is getting outright ridiculous. You agree with me that Anthony suggested a technically inferior solution, yet you even seem to be proud of it and are joking about it?

The bit above this was:

> Really, for such reasons user-space is pretty poor at doing system-wide enumeration and resource management. Microkernels lost for a reason.

In every Linux system userspace is doing or proxying much of the enumeration and resource management. So if enumerating guests in userspace is a mistake, then I am not alone in making it.

> And _you_ are complaining about lkml-style hard-talk discussions?

There is a difference between joking and insulting people. I enjoy jokes but I dislike being insulted.

-- 
Do not meddle in the internals of kernels, for they are subtle and quick to panic.
* Re: [RFC] Unify KVM kernel-space and user-space code into a single project
From: Ingo Molnar @ 2010-03-22 20:06 UTC (permalink / raw)
To: Avi Kivity
Cc: Anthony Liguori, Pekka Enberg, Zhang, Yanmin, Peter Zijlstra, Sheng Yang, linux-kernel, kvm, Marcelo Tosatti, Joerg Roedel, Jes Sorensen, Gleb Natapov, Zachary Amsden, ziteng.huang, Arnaldo Carvalho de Melo, Frédéric Weisbecker, Gregory Haskins

* Avi Kivity <avi@redhat.com> wrote:

> On 03/22/2010 09:20 PM, Ingo Molnar wrote:
>> * Avi Kivity <avi@redhat.com> wrote:
>>
>>>> Lets look at the ${HOME}/.qemu/qmp/ enumeration method suggested by Anthony. There's numerous ways that this can break:
>>> I don't like it either. We have libvirt for enumerating guests.
>> Which has pretty much the same problems as the ${HOME}/.qemu/qmp/ solution, obviously.
>
> It doesn't follow. The libvirt daemon could/should own guests from all users. I don't know if it does so now, but nothing is preventing it technically.

It's hard for me to argue against a hypothetical implementation, but all user-space driven solutions for resource enumeration I've seen so far had weaknesses that kernel-based solutions don't have.

>>>> - There is no guarantee for the Qemu process to reply to a request - while the kernel can always guarantee an enumeration result. I don't want 'perf kvm' to hang or misbehave just because Qemu has hung.
>>> If qemu doesn't reply, your guest is dead anyway.
>> Erm, but I'm talking about a dead tool here. There's a world of a difference between 'kvm top' not showing new entries (because the guest is dead), and 'perf kvm top' hanging due to Qemu hanging.
>
> If qemu hangs, the guest hangs a few milliseconds later.

I think you didn't understand my point. I am talking about 'perf kvm top' hanging if Qemu hangs.

With a proper in-kernel enumeration the kernel would always guarantee the functionality, even if the vcpu does not make progress (i.e. it's "hung"). With this implemented in Qemu we lose that kind of robustness guarantee.

And especially during development (when developers use instrumentation the most) it is important to have robust instrumentation that does not hang along with the Qemu process.

> If qemu fails, you lose your guest. If libvirt forgets about a guest, you can't do anything with it any more. These are more serious problems than 'perf kvm' not working. [...]

How on earth can you justify a bug ("perf kvm top" hanging) by pointing out that there are other bugs as well?

Basically you are arguing the equivalent of saying that a gdb session would be fine to become unresponsive if the debugged task hangs. Fortunately ptrace is kernel-based and it never 'hangs' if the user-space process hangs somewhere. This is an essential property of good instrumentation.

So the enumeration method you suggested is a poor, sub-par solution, simple as that.

> [...] Qemu and libvirt have to be robust anyway, we can rely on them. Like we have to rely on init, X, sshd, and a zillion other critical tools.

We can still profile any of those tools without the profiler breaking if the debugged tool breaks ...

>> By your argument it would be perfectly fine to implement /proc purely via user-space, correct?
>
> I would have preferred /proc to be implemented via syscalls called directly from tools, and good tools written to expose the information in it. When computers were slower, 'top' would spend tons of time opening and closing all those tiny files and parsing them. Of course the kernel needs to provide the information.

(Then you'll be pleased to hear that perf has enabled exactly that, and that we are working towards that precise usecase.)

	Ingo
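Avi's 'tiny files' point about top is concrete: a /proc-based tool must open, read, parse and close one small text file per process on every refresh. As a minimal illustrative sketch (the function name is invented; the field layout follows proc(5)), here is a parser for a single /proc/<pid>/stat record. The only subtle part is that the comm field is parenthesised and may itself contain spaces or ')', so it has to be split off from the right rather than by naive whitespace splitting:

```python
def parse_proc_stat(line: str):
    """Parse one /proc/<pid>/stat record (field layout per proc(5)).

    The comm field is wrapped in parentheses and may contain spaces or
    even ')', so split on the *last* ')' rather than on whitespace."""
    pid_part, rest = line.split("(", 1)
    comm, fields = rest.rsplit(")", 1)
    cols = fields.split()
    return {
        "pid": int(pid_part),
        "comm": comm,
        "state": cols[0],        # field 3 in proc(5) numbering
        "ppid": int(cols[1]),    # field 4
        "utime": int(cols[11]),  # field 14: user-mode jiffies
        "stime": int(cols[12]),  # field 15: kernel-mode jiffies
    }
```

A top-like tool repeats this for every pid in /proc on every refresh, which is exactly the open/read/parse/close overhead Avi describes; a syscall-based interface could hand the same data over in one call.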
* Re: [RFC] Unify KVM kernel-space and user-space code into a single project 2010-03-22 20:06 ` Ingo Molnar @ 2010-03-22 20:15 ` Avi Kivity 2010-03-22 20:29 ` Ingo Molnar 0 siblings, 1 reply; 390+ messages in thread From: Avi Kivity @ 2010-03-22 20:15 UTC (permalink / raw) To: Ingo Molnar Cc: Anthony Liguori, Pekka Enberg, Zhang, Yanmin, Peter Zijlstra, Sheng Yang, linux-kernel, kvm, Marcelo Tosatti, oerg Roedel, Jes Sorensen, Gleb Natapov, Zachary Amsden, ziteng.huang, Arnaldo Carvalho de Melo, Fr?d?ric Weisbecker, Gregory Haskins On 03/22/2010 10:06 PM, Ingo Molnar wrote: > * Avi Kivity<avi@redhat.com> wrote: > > >> On 03/22/2010 09:20 PM, Ingo Molnar wrote: >> >>> * Avi Kivity<avi@redhat.com> wrote: >>> >>> >>>>> Lets look at the ${HOME}/.qemu/qmp/ enumeration method suggested by >>>>> Anthony. There's numerous ways that this can break: >>>>> >>>> I don't like it either. We have libvirt for enumerating guests. >>>> >>> Which has pretty much the same problems to the ${HOME}/.qemu/qmp/ solution, >>> obviously. >>> >> It doesn't follow. The libvirt daemon could/should own guests from all >> users. I don't know if it does so now, but nothing is preventing it >> technically. >> > It's hard for me to argue against a hypothetical implementation, but all > user-space driven solutions for resource enumeration i've seen so far had > weaknesses that kernel-based solutions dont have. > Correct. kernel-based solutions also have issues. >> If qemu hangs, the guest hangs a few milliseconds later. >> > I think you didnt understand my point. I am talking about 'perf kvm top' > hanging if Qemu hangs. > Use non-blocking I/O, report that guest as dead. No point in profiling it, it isn't making any progress. > With a proper in-kernel enumeration the kernel would always guarantee the > functionality, even if the vcpu does not make progress (i.e. it's "hung"). > > With this implemented in Qemu we lose that kind of robustness guarantee. 
> If qemu has a bug in the resource enumeration code, you can't profile one guest. If the kernel has a bug in the resource enumeration code, the system either panics or needs to be rebooted later. > And especially during development (when developers use instrumentation the > most) is it important to have robust instrumentation that does not hang along > with the Qemu process. > It's nice not to have kernel oopses either. So when code can be in userspace, that's where it should be. >> If qemu fails, you lose your guest. If libvirt forgets about a >> guest, you can't do anything with it any more. These are more >> serious problems than 'perf kvm' not working. [...] >> > How on earth can you justify a bug ("perf kvm top" hanging) with that there > are other bugs as well? > There's no reason for 'perf kvm top' to hang if some process is not responsive. That would be a perf bug. > Basically you are arguing the equivalent that a gdb session would be fine to > become unresponsive if the debugged task hangs. Fortunately ptrace is > kernel-based and it never 'hangs' if the user-space process hangs somewhere. > Neither gdb nor perf should hang. > This is an essential property of good instrumentation. > > So the enumeration method you suggested is a poor, sub-part solution, simple > as that. > Or, you misunderstood it. >> [...] Qemu and libvirt have to be robust anyway, we can rely on them. Like >> we have to rely on init, X, sshd, and a zillion other critical tools. >> > We can still profile any of those tools without the profiler breaking if the > debugged tool breaks ... > You can't profile without qemu. >>> By your argument it would be perfectly fine to implement /proc purely via >>> user-space, correct? >>> >> I would have preferred /proc to be implemented via syscalls called directly >> from tools, and good tools written to expose the information in it. 
When >> computers were slower 'top' would spend tons of time opening and closing all >> those tiny files and parsing them. Of course the kernel needs to provide >> the information. >> > (Then you'll be pleased to hear that perf has enabled exactly that, and that we > are working towards that precise usecase.) > Are you exporting /proc/pid data via the perf syscall? If so, I think that's a good move. -- Do not meddle in the internals of kernels, for they are subtle and quick to panic. ^ permalink raw reply [flat|nested] 390+ messages in thread
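[Editorial illustration, not part of the thread.] Avi's suggestion above — "use non-blocking I/O, report that guest as dead" — can be sketched as follows. The socket path and command name are assumptions, and a real QMP client must also consume the greeting and negotiate capabilities first; the point is only that a bounded timeout keeps the profiler from hanging along with qemu:

```python
import json
import socket

def query_qmp(path, command, timeout=0.5):
    """Attempt a single QMP-style command over a Unix socket, treating
    any timeout or connection error as 'guest not responding' instead
    of blocking the caller indefinitely.

    Returns the decoded reply dict, or None if qemu did not respond
    within `timeout` seconds.
    """
    sock = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
    sock.settimeout(timeout)  # bounds connect(), sendall() and recv()
    try:
        sock.connect(path)
        sock.sendall((json.dumps({"execute": command}) + "\n").encode())
        data = sock.recv(65536)
        return json.loads(data.decode().splitlines()[0])
    except (socket.timeout, OSError):
        return None  # report the guest as dead/unresponsive
    finally:
        sock.close()
```

Called against a dead or missing socket, this returns None promptly rather than hanging — which is the behavior the thread is arguing about.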
* Re: [RFC] Unify KVM kernel-space and user-space code into a single project 2010-03-22 20:15 ` Avi Kivity @ 2010-03-22 20:29 ` Ingo Molnar 2010-03-22 20:40 ` Avi Kivity 0 siblings, 1 reply; 390+ messages in thread From: Ingo Molnar @ 2010-03-22 20:29 UTC (permalink / raw) To: Avi Kivity Cc: Anthony Liguori, Pekka Enberg, Zhang, Yanmin, Peter Zijlstra, Sheng Yang, linux-kernel, kvm, Marcelo Tosatti, Joerg Roedel, Jes Sorensen, Gleb Natapov, Zachary Amsden, ziteng.huang, Arnaldo Carvalho de Melo, Frédéric Weisbecker, Gregory Haskins * Avi Kivity <avi@redhat.com> wrote: > > I think you didn't understand my point. I am talking about 'perf kvm top' > > hanging if Qemu hangs. > > Use non-blocking I/O, report that guest as dead. No point in profiling it, > it isn't making any progress. Erm, at what point do I decide that a guest is 'dead' versus 'just lagged due to lots of IO' ? Also, do you realize that you increase complexity (the use of non-blocking IO), just to protect against something that wouldn't happen if the right solution was used in the first place? > > With a proper in-kernel enumeration the kernel would always guarantee the > > functionality, even if the vcpu does not make progress (i.e. it's "hung"). > > > > With this implemented in Qemu we lose that kind of robustness guarantee. > > If qemu has a bug in the resource enumeration code, you can't profile one > guest. If the kernel has a bug in the resource enumeration code, the system > either panics or needs to be rebooted later. This is really simple code, not rocket science. If there's a bug in it we'll fix it. On the other hand a 500KLOC+ piece of Qemu code has lots of places to hang, so that is a large cross section. Ingo ^ permalink raw reply [flat|nested] 390+ messages in thread
* Re: [RFC] Unify KVM kernel-space and user-space code into a single project 2010-03-22 20:29 ` Ingo Molnar @ 2010-03-22 20:40 ` Avi Kivity 0 siblings, 0 replies; 390+ messages in thread From: Avi Kivity @ 2010-03-22 20:40 UTC (permalink / raw) To: Ingo Molnar Cc: Anthony Liguori, Pekka Enberg, Zhang, Yanmin, Peter Zijlstra, Sheng Yang, linux-kernel, kvm, Marcelo Tosatti, oerg Roedel, Jes Sorensen, Gleb Natapov, Zachary Amsden, ziteng.huang, Arnaldo Carvalho de Melo, Fr?d?ric Weisbecker, Gregory Haskins On 03/22/2010 10:29 PM, Ingo Molnar wrote: > * Avi Kivity<avi@redhat.com> wrote: > > >>> I think you didnt understand my point. I am talking about 'perf kvm top' >>> hanging if Qemu hangs. >>> >> Use non-blocking I/O, report that guest as dead. No point in profiling it, >> it isn't making any progress. >> > Erm, at what point do i decide that a guest is 'dead' versus 'just lagged due > to lots of IO' ? > qemu shouldn't block due to I/O (it does now, but there is work to fix it). Of course it could be swapping or other things. Pick a timeout, everything we do has timeouts these days. It's the price we pay for protection: if you put something where a failure can't hurt you, you have to be prepared for failure, and you might have false alarms. Is it so horrible for 'perf kvm top'? No user data loss will happen, surely? On the other hand, if it's in the kernel and it fails, you will lose service or perhaps data. > Also, do you realize that you increase complexity (the use of non-blocking > IO), just to protect against something that wouldnt happen if the right > solution was used in the first place? > It's a tradeoff. Increasing the kernel code size vs. increasing userspace size. >>> With a proper in-kernel enumeration the kernel would always guarantee the >>> functionality, even if the vcpu does not make progress (i.e. it's "hung"). >>> >>> With this implemented in Qemu we lose that kind of robustness guarantee. 
>>> >> If qemu has a bug in the resource enumeration code, you can't profile one >> guest. If the kernel has a bug in the resource enumeration code, the system >> either panics or needs to be rebooted later. >> > This is really simple code, not rocket science. If there's a bug in it we'll > fix it. On the other hand a 500KLOC+ piece of Qemu code has lots of places to > hang, so that is a large cross section. > > The kernel has tons of very simple code (and some very complex code as well), and tons of -stable updates as well. -- Do not meddle in the internals of kernels, for they are subtle and quick to panic. ^ permalink raw reply [flat|nested] 390+ messages in thread
* Re: [RFC] Unify KVM kernel-space and user-space code into a single project 2010-03-22 17:34 ` Ingo Molnar 2010-03-22 17:55 ` Avi Kivity @ 2010-03-22 18:35 ` Anthony Liguori 2010-03-22 19:22 ` Ingo Molnar 2010-03-22 18:41 ` Anthony Liguori 2 siblings, 1 reply; 390+ messages in thread From: Anthony Liguori @ 2010-03-22 18:35 UTC (permalink / raw) To: Ingo Molnar Cc: Avi Kivity, Pekka Enberg, Zhang, Yanmin, Peter Zijlstra, Sheng Yang, linux-kernel, kvm, Marcelo Tosatti, oerg Roedel, Jes Sorensen, Gleb Natapov, Zachary Amsden, ziteng.huang, Arnaldo Carvalho de Melo, Fr?d?ric Weisbecker, Gregory Haskins On 03/22/2010 12:34 PM, Ingo Molnar wrote: > * Avi Kivity<avi@redhat.com> wrote: > > >>>>> - Easy default reference to guest instances, and a way for tools to >>>>> reference them symbolically as well in the multi-guest case. Preferably >>>>> something trustable and kernel-provided - not some indirect information >>>>> like a PID file created by libvirt-manager or so. >>>>> >>>> Usually 'layering violation' is trotted out at such suggestions. >>>> [...] >>>> >>> That's weird, how can a feature request be a 'layering violation'? >>> >> The 'something trustable and kernel-provided'. The kernel knows nothing >> about guest names. >> > The kernel certainly knows about other resources such as task names or network > interface names or tracepoint names. This is kernel design 101. > > >>> If something that users find straightforward and usable is a layering >>> violation to you (such as easily being able to access their own files on >>> the host as well ...) then i think you need to revisit the definition of >>> that term instead of trying to fix the user. >>> >> Here is the explanation, you left it quoted: >> >> >>>> [...] I don't like using the term, because sometimes the layers are >>>> incorrect and need to be violated. 
But it should be done explicitly, not >>>> as a shortcut for a minor feature (and profiling is a minor feature, most >>>> users will never use it, especially guest-from-host). >>>> >>>> The fact is we have well defined layers today, kvm virtualizes the cpu >>>> and memory, qemu emulates devices for a single guest, libvirt manages >>>> guests. We break this sometimes but there has to be a good reason. So >>>> perf needs to talk to libvirt if it wants names. Could be done via >>>> linking, or can be done using a plugin libvirt drops into perf. >>>> > This is really just the much-discredited microkernel approach for keeping > global enumeration data that should be kept by the kernel ... > > Let's look at the ${HOME}/.qemu/qmp/ enumeration method suggested by Anthony. > There are numerous ways that this can break: > > - Those special files can get corrupted, mis-setup, get out of sync, or can > be hard to discover. > > - The ${HOME}/.qemu/qmp/ solution suggested by Anthony has a very obvious > design flaw: it is per user. When I'm root I'd like to query _all_ current > guest images, not just the ones started by root. A system might not even > have a notion of '${HOME}'. > > - Apps might start KVM vcpu instances without adhering to the > ${HOME}/.qemu/qmp/ access method. > Not all KVM vcpus are running operating systems. Transitive had a product that was using a KVM context to run their binary translator which allowed them full access to the host process's virtual address space range. In this case, there is no kernel and there are no devices. That's what I mean by a guest being a userspace context. KVM simply provides a new CPU mode to userspace in the same way that vm8086 mode does. Regards, Anthony Liguori > - There is no guarantee for the Qemu process to reply to a request - while > the kernel can always guarantee an enumeration result. I don't want 'perf > kvm' to hang or misbehave just because Qemu has hung. 
> > Really, for such reasons user-space is pretty poor at doing system-wide > enumeration and resource management. Microkernels lost for a reason. > > You are committing several grave design mistakes here. > > Thanks, > > Ingo > ^ permalink raw reply [flat|nested] 390+ messages in thread
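[Editorial illustration, not part of the thread.] For concreteness, the ${HOME}/.qemu/qmp/ scheme under discussion would amount to something like the sketch below. The directory layout is purely an assumption drawn from the thread, and the sketch exhibits exactly the weakness Ingo lists: a per-user directory cannot see guests started by other users:

```python
import os
import stat

def enumerate_qmp_sockets(base="~/.qemu/qmp"):
    """List candidate QMP control sockets under a per-user directory.

    Returns the paths of entries that are Unix-domain sockets; a
    missing directory simply means no discoverable guests.
    """
    base = os.path.expanduser(base)
    sockets = []
    try:
        for name in sorted(os.listdir(base)):
            path = os.path.join(base, name)
            try:
                if stat.S_ISSOCK(os.stat(path).st_mode):
                    sockets.append(path)
            except OSError:
                continue  # entry vanished between listdir() and stat()
    except OSError:
        return []  # directory missing or unreadable: nothing to enumerate
    return sockets
```

Note that every failure mode in Ingo's list maps onto a silent empty (or partial) result here — the enumeration degrades without any way to tell "no guests" apart from "guests hidden from this user".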
* Re: [RFC] Unify KVM kernel-space and user-space code into a single project 2010-03-22 18:35 ` Anthony Liguori @ 2010-03-22 19:22 ` Ingo Molnar 2010-03-22 19:29 ` Anthony Liguori 2010-03-22 19:45 ` Avi Kivity 0 siblings, 2 replies; 390+ messages in thread From: Ingo Molnar @ 2010-03-22 19:22 UTC (permalink / raw) To: Anthony Liguori Cc: Avi Kivity, Pekka Enberg, Zhang, Yanmin, Peter Zijlstra, Sheng Yang, linux-kernel, kvm, Marcelo Tosatti, oerg Roedel, Jes Sorensen, Gleb Natapov, Zachary Amsden, ziteng.huang, Arnaldo Carvalho de Melo, Fr?d?ric Weisbecker, Gregory Haskins * Anthony Liguori <anthony@codemonkey.ws> wrote: > On 03/22/2010 12:34 PM, Ingo Molnar wrote: > >* Avi Kivity<avi@redhat.com> wrote: > > > >>>>> - Easy default reference to guest instances, and a way for tools to > >>>>> reference them symbolically as well in the multi-guest case. Preferably > >>>>> something trustable and kernel-provided - not some indirect information > >>>>> like a PID file created by libvirt-manager or so. > >>>>Usually 'layering violation' is trotted out at such suggestions. > >>>>[...] > >>>That's weird, how can a feature request be a 'layering violation'? > >>The 'something trustable and kernel-provided'. The kernel knows nothing > >>about guest names. > >The kernel certainly knows about other resources such as task names or network > >interface names or tracepoint names. This is kernel design 101. > > > >>>If something that users find straightforward and usable is a layering > >>>violation to you (such as easily being able to access their own files on > >>>the host as well ...) then i think you need to revisit the definition of > >>>that term instead of trying to fix the user. > >>Here is the explanation, you left it quoted: > >> > >>>>[...] I don't like using the term, because sometimes the layers are > >>>>incorrect and need to be violated. 
But it should be done explicitly, not > >>>>as a shortcut for a minor feature (and profiling is a minor feature, most > >>>>users will never use it, especially guest-from-host). > >>>> > >>>>The fact is we have well defined layers today, kvm virtualizes the cpu > >>>>and memory, qemu emulates devices for a single guest, libvirt manages > >>>>guests. We break this sometimes but there has to be a good reason. So > >>>>perf needs to talk to libvirt if it wants names. Could be done via > >>>>linking, or can be done using a pluging libvirt drops into perf. > >This is really just the much-discredited microkernel approach for keeping > >global enumeration data that should be kept by the kernel ... > > > >Lets look at the ${HOME}/.qemu/qmp/ enumeration method suggested by Anthony. > >There's numerous ways that this can break: > > > > - Those special files can get corrupted, mis-setup, get out of sync, or can > > be hard to discover. > > > > - The ${HOME}/.qemu/qmp/ solution suggested by Anthony has a very obvious > > design flaw: it is per user. When i'm root i'd like to query _all_ current > > guest images, not just the ones started by root. A system might not even > > have a notion of '${HOME}'. > > > > - Apps might start KVM vcpu instances without adhering to the > > ${HOME}/.qemu/qmp/ access method. > > Not all KVM vcpus are running operating systems. But we want to allow developers to instrument all of them ... > Transitive had a product that was using a KVM context to run their > binary translator which allowed them full access to the host > processes virtual address space range. In this case, there is no > kernel and there are no devices. And your point is that such vcpus should be excluded from profiling just because they fall outside the Qemu/libvirt umbrella? That is a ridiculous position. Ingo ^ permalink raw reply [flat|nested] 390+ messages in thread
* Re: [RFC] Unify KVM kernel-space and user-space code into a single project 2010-03-22 19:22 ` Ingo Molnar @ 2010-03-22 19:29 ` Anthony Liguori 2010-03-22 20:32 ` Ingo Molnar 2010-03-22 19:45 ` Avi Kivity 1 sibling, 1 reply; 390+ messages in thread From: Anthony Liguori @ 2010-03-22 19:29 UTC (permalink / raw) To: Ingo Molnar Cc: Avi Kivity, Pekka Enberg, Zhang, Yanmin, Peter Zijlstra, Sheng Yang, linux-kernel, kvm, Marcelo Tosatti, oerg Roedel, Jes Sorensen, Gleb Natapov, Zachary Amsden, ziteng.huang, Arnaldo Carvalho de Melo, Fr?d?ric Weisbecker, Gregory Haskins On 03/22/2010 02:22 PM, Ingo Molnar wrote: >> Transitive had a product that was using a KVM context to run their >> binary translator which allowed them full access to the host >> processes virtual address space range. In this case, there is no >> kernel and there are no devices. >> > And your point is that such vcpus should be excluded from profiling just > because they fall outside the Qemu/libvirt umbrella? > You don't instrument it the way you'd instrument an operating system so no, you don't want it to show up in perf kvm top. Regards, Anthony Liguori > Ingo > ^ permalink raw reply [flat|nested] 390+ messages in thread
* Re: [RFC] Unify KVM kernel-space and user-space code into a single project 2010-03-22 19:29 ` Anthony Liguori @ 2010-03-22 20:32 ` Ingo Molnar 2010-03-22 20:43 ` Avi Kivity 0 siblings, 1 reply; 390+ messages in thread From: Ingo Molnar @ 2010-03-22 20:32 UTC (permalink / raw) To: Anthony Liguori Cc: Avi Kivity, Pekka Enberg, Zhang, Yanmin, Peter Zijlstra, Sheng Yang, linux-kernel, kvm, Marcelo Tosatti, Joerg Roedel, Jes Sorensen, Gleb Natapov, Zachary Amsden, ziteng.huang, Arnaldo Carvalho de Melo, Frédéric Weisbecker, Gregory Haskins * Anthony Liguori <anthony@codemonkey.ws> wrote: > On 03/22/2010 02:22 PM, Ingo Molnar wrote: > >>Transitive had a product that was using a KVM context to run their > >>binary translator which allowed them full access to the host > >>processes virtual address space range. In this case, there is no > >>kernel and there are no devices. > > > > And your point is that such vcpus should be excluded from profiling just > > because they fall outside the Qemu/libvirt umbrella? > > You don't instrument it the way you'd instrument an operating system so no, > you don't want it to show up in perf kvm top. Erm, why not? It's executing a virtualized CPU, so sure it makes sense to allow the profiling of it! It might even not be the weird case you mentioned but some competing virtualization project to Qemu ... So your argument is wrong on several technical levels, sorry. Thanks, Ingo ^ permalink raw reply [flat|nested] 390+ messages in thread
* Re: [RFC] Unify KVM kernel-space and user-space code into a single project 2010-03-22 20:32 ` Ingo Molnar @ 2010-03-22 20:43 ` Avi Kivity 0 siblings, 0 replies; 390+ messages in thread From: Avi Kivity @ 2010-03-22 20:43 UTC (permalink / raw) To: Ingo Molnar Cc: Anthony Liguori, Pekka Enberg, Zhang, Yanmin, Peter Zijlstra, Sheng Yang, linux-kernel, kvm, Marcelo Tosatti, oerg Roedel, Jes Sorensen, Gleb Natapov, Zachary Amsden, ziteng.huang, Arnaldo Carvalho de Melo, Fr?d?ric Weisbecker, Gregory Haskins On 03/22/2010 10:32 PM, Ingo Molnar wrote: > * Anthony Liguori<anthony@codemonkey.ws> wrote: > > >> On 03/22/2010 02:22 PM, Ingo Molnar wrote: >> >>>> Transitive had a product that was using a KVM context to run their >>>> binary translator which allowed them full access to the host >>>> processes virtual address space range. In this case, there is no >>>> kernel and there are no devices. >>>> >>> And your point is that such vcpus should be excluded from profiling just >>> because they fall outside the Qemu/libvirt umbrella? >>> >> You don't instrument it the way you'd instrument an operating system so no, >> you don't want it to show up in perf kvm top. >> > Erm, why not? It's executing a virtualized CPU, so sure it makes sense to > allow the profiling of it! > It may not make sense to have symbol tables for it, for example it isn't generated from source code but from binary code for another architecture. Of course, just showing addresses is fine, but you don't need qemu for that. -- Do not meddle in the internals of kernels, for they are subtle and quick to panic. ^ permalink raw reply [flat|nested] 390+ messages in thread
* Re: [RFC] Unify KVM kernel-space and user-space code into a single project 2010-03-22 19:22 ` Ingo Molnar 2010-03-22 19:29 ` Anthony Liguori @ 2010-03-22 19:45 ` Avi Kivity 2010-03-22 20:35 ` Ingo Molnar 1 sibling, 1 reply; 390+ messages in thread From: Avi Kivity @ 2010-03-22 19:45 UTC (permalink / raw) To: Ingo Molnar Cc: Anthony Liguori, Pekka Enberg, Zhang, Yanmin, Peter Zijlstra, Sheng Yang, linux-kernel, kvm, Marcelo Tosatti, oerg Roedel, Jes Sorensen, Gleb Natapov, Zachary Amsden, ziteng.huang, Arnaldo Carvalho de Melo, Fr?d?ric Weisbecker, Gregory Haskins On 03/22/2010 09:22 PM, Ingo Molnar wrote: > >> Transitive had a product that was using a KVM context to run their >> binary translator which allowed them full access to the host >> processes virtual address space range. In this case, there is no >> kernel and there are no devices. >> > And your point is that such vcpus should be excluded from profiling just > because they fall outside the Qemu/libvirt umbrella? > > That is a ridiculous position. > > Non-guest vcpus will not be able to provide Linux-style symbols. -- Do not meddle in the internals of kernels, for they are subtle and quick to panic. ^ permalink raw reply [flat|nested] 390+ messages in thread
* Re: [RFC] Unify KVM kernel-space and user-space code into a single project 2010-03-22 19:45 ` Avi Kivity @ 2010-03-22 20:35 ` Ingo Molnar 2010-03-22 20:45 ` Avi Kivity 0 siblings, 1 reply; 390+ messages in thread From: Ingo Molnar @ 2010-03-22 20:35 UTC (permalink / raw) To: Avi Kivity Cc: Anthony Liguori, Pekka Enberg, Zhang, Yanmin, Peter Zijlstra, Sheng Yang, linux-kernel, kvm, Marcelo Tosatti, oerg Roedel, Jes Sorensen, Gleb Natapov, Zachary Amsden, ziteng.huang, Arnaldo Carvalho de Melo, Fr?d?ric Weisbecker, Gregory Haskins * Avi Kivity <avi@redhat.com> wrote: > On 03/22/2010 09:22 PM, Ingo Molnar wrote: > > > >> Transitive had a product that was using a KVM context to run their binary > >> translator which allowed them full access to the host processes virtual > >> address space range. In this case, there is no kernel and there are no > >> devices. > > > > And your point is that such vcpus should be excluded from profiling just > > because they fall outside the Qemu/libvirt umbrella? > > > > That is a ridiculous position. > > > > Non-guest vcpus will not be able to provide Linux-style symbols. And why do you say that it makes no sense to profile them? Also, why do you define 'guest vcpus' to be 'Qemu started guest vcpus'? If some other KVM using project (which you encouraged just a few mails ago) starts a vcpu we still want to be able to profile them. Ingo ^ permalink raw reply [flat|nested] 390+ messages in thread
* Re: [RFC] Unify KVM kernel-space and user-space code into a single project 2010-03-22 20:35 ` Ingo Molnar @ 2010-03-22 20:45 ` Avi Kivity 0 siblings, 0 replies; 390+ messages in thread From: Avi Kivity @ 2010-03-22 20:45 UTC (permalink / raw) To: Ingo Molnar Cc: Anthony Liguori, Pekka Enberg, Zhang, Yanmin, Peter Zijlstra, Sheng Yang, linux-kernel, kvm, Marcelo Tosatti, oerg Roedel, Jes Sorensen, Gleb Natapov, Zachary Amsden, ziteng.huang, Arnaldo Carvalho de Melo, Fr?d?ric Weisbecker, Gregory Haskins On 03/22/2010 10:35 PM, Ingo Molnar wrote: > >>> And your point is that such vcpus should be excluded from profiling just >>> because they fall outside the Qemu/libvirt umbrella? >>> >>> That is a ridiculous position. >>> >>> >> Non-guest vcpus will not be able to provide Linux-style symbols. >> > And why do you say that it makes no sense to profile them? > It makes sense to profile them, but you don't need to contact their userspace tool for that. > Also, why do you define 'guest vcpus' to be 'Qemu started guest vcpus'? If > some other KVM using project (which you encouraged just a few mails ago) > starts a vcpu we still want to be able to profile them. > > Maybe it should provide a mechanism for libvirt to list it. -- Do not meddle in the internals of kernels, for they are subtle and quick to panic. ^ permalink raw reply [flat|nested] 390+ messages in thread
* Re: [RFC] Unify KVM kernel-space and user-space code into a single project 2010-03-22 17:34 ` Ingo Molnar 2010-03-22 17:55 ` Avi Kivity 2010-03-22 18:35 ` Anthony Liguori @ 2010-03-22 18:41 ` Anthony Liguori 2010-03-22 19:27 ` Ingo Molnar 2 siblings, 1 reply; 390+ messages in thread From: Anthony Liguori @ 2010-03-22 18:41 UTC (permalink / raw) To: Ingo Molnar Cc: Avi Kivity, Pekka Enberg, Zhang, Yanmin, Peter Zijlstra, Sheng Yang, linux-kernel, kvm, Marcelo Tosatti, oerg Roedel, Jes Sorensen, Gleb Natapov, Zachary Amsden, ziteng.huang, Arnaldo Carvalho de Melo, Fr?d?ric Weisbecker, Gregory Haskins On 03/22/2010 12:34 PM, Ingo Molnar wrote: > This is really just the much-discredited microkernel approach for keeping > global enumeration data that should be kept by the kernel ... > > Lets look at the ${HOME}/.qemu/qmp/ enumeration method suggested by Anthony. > There's numerous ways that this can break: > > - Those special files can get corrupted, mis-setup, get out of sync, or can > be hard to discover. > > - The ${HOME}/.qemu/qmp/ solution suggested by Anthony has a very obvious > design flaw: it is per user. When i'm root i'd like to query _all_ current > guest images, not just the ones started by root. A system might not even > have a notion of '${HOME}'. > > - Apps might start KVM vcpu instances without adhering to the > ${HOME}/.qemu/qmp/ access method. > > - There is no guarantee for the Qemu process to reply to a request - while > the kernel can always guarantee an enumeration result. I dont want 'perf > kvm' to hang or misbehave just because Qemu has hung. > If your position basically boils down to, we can't trust userspace and we can always trust the kernel, I want to eliminate any userspace path, then I can't really help you out. I believe we can come up with an infrastructure that satisfies your actual requirements within qemu but if you're also insisting upon the above implementation detail then there's nothing I can do. 
Regards, Anthony Liguori ^ permalink raw reply [flat|nested] 390+ messages in thread
* Re: [RFC] Unify KVM kernel-space and user-space code into a single project 2010-03-22 18:41 ` Anthony Liguori @ 2010-03-22 19:27 ` Ingo Molnar 2010-03-22 19:47 ` Avi Kivity 0 siblings, 1 reply; 390+ messages in thread From: Ingo Molnar @ 2010-03-22 19:27 UTC (permalink / raw) To: Anthony Liguori Cc: Avi Kivity, Pekka Enberg, Zhang, Yanmin, Peter Zijlstra, Sheng Yang, linux-kernel, kvm, Marcelo Tosatti, oerg Roedel, Jes Sorensen, Gleb Natapov, Zachary Amsden, ziteng.huang, Arnaldo Carvalho de Melo, Fr?d?ric Weisbecker, Gregory Haskins * Anthony Liguori <anthony@codemonkey.ws> wrote: > On 03/22/2010 12:34 PM, Ingo Molnar wrote: > >This is really just the much-discredited microkernel approach for keeping > >global enumeration data that should be kept by the kernel ... > > > >Lets look at the ${HOME}/.qemu/qmp/ enumeration method suggested by Anthony. > >There's numerous ways that this can break: > > > > - Those special files can get corrupted, mis-setup, get out of sync, or can > > be hard to discover. > > > > - The ${HOME}/.qemu/qmp/ solution suggested by Anthony has a very obvious > > design flaw: it is per user. When i'm root i'd like to query _all_ current > > guest images, not just the ones started by root. A system might not even > > have a notion of '${HOME}'. > > > > - Apps might start KVM vcpu instances without adhering to the > > ${HOME}/.qemu/qmp/ access method. > > > > - There is no guarantee for the Qemu process to reply to a request - while > > the kernel can always guarantee an enumeration result. I dont want 'perf > > kvm' to hang or misbehave just because Qemu has hung. > > If your position basically boils down to, we can't trust userspace > and we can always trust the kernel, I want to eliminate any > userspace path, then I can't really help you out. Why would you want to 'help me out'? I can tell a good solution from a bad one just fine. 
You should instead read the long list of disadvantages above, invert them and list them as advantages for the kernel-based vcpu enumeration solution, apply common sense and admit to yourself that indeed in this situation a kernel-provided enumeration of vcpu contexts is the most robust solution. It's really as simple as that :-) Thanks, Ingo ^ permalink raw reply [flat|nested] 390+ messages in thread
* Re: [RFC] Unify KVM kernel-space and user-space code into a single project 2010-03-22 19:27 ` Ingo Molnar @ 2010-03-22 19:47 ` Avi Kivity 2010-03-22 20:46 ` Ingo Molnar 2010-03-22 22:06 ` Anthony Liguori 0 siblings, 2 replies; 390+ messages in thread From: Avi Kivity @ 2010-03-22 19:47 UTC (permalink / raw) To: Ingo Molnar Cc: Anthony Liguori, Pekka Enberg, Zhang, Yanmin, Peter Zijlstra, Sheng Yang, linux-kernel, kvm, Marcelo Tosatti, oerg Roedel, Jes Sorensen, Gleb Natapov, Zachary Amsden, ziteng.huang, Arnaldo Carvalho de Melo, Fr?d?ric Weisbecker, Gregory Haskins On 03/22/2010 09:27 PM, Ingo Molnar wrote: > >> If your position basically boils down to, we can't trust userspace >> and we can always trust the kernel, I want to eliminate any >> userspace path, then I can't really help you out. >> > Why would you want to 'help me out'? I can tell a good solution from a bad one > just fine. > You are basically making a kernel implementation a requirement, instead of something that follows from the requirement. > You should instead read the long list of disadvantages above, invert them and > list then as advantages for the kernel-based vcpu enumeration solution, apply > common sense and go admit to yourself that indeed in this situation a kernel > provided enumeration of vcpu contexts is the most robust solution. > Having qemu enumerate guests one way or another is not a good idea IMO since it is focused on one guest and doesn't have a system-wide entity. A userspace system-wide entity will work just as well as kernel implementation, without its disadvantages. -- Do not meddle in the internals of kernels, for they are subtle and quick to panic. ^ permalink raw reply [flat|nested] 390+ messages in thread
* Re: [RFC] Unify KVM kernel-space and user-space code into a single project 2010-03-22 19:47 ` Avi Kivity @ 2010-03-22 20:46 ` Ingo Molnar 2010-03-22 20:53 ` Avi Kivity 2010-03-22 22:06 ` Anthony Liguori 1 sibling, 1 reply; 390+ messages in thread From: Ingo Molnar @ 2010-03-22 20:46 UTC (permalink / raw) To: Avi Kivity Cc: Anthony Liguori, Pekka Enberg, Zhang, Yanmin, Peter Zijlstra, Sheng Yang, linux-kernel, kvm, Marcelo Tosatti, oerg Roedel, Jes Sorensen, Gleb Natapov, Zachary Amsden, ziteng.huang, Arnaldo Carvalho de Melo, Fr?d?ric Weisbecker, Gregory Haskins * Avi Kivity <avi@redhat.com> wrote: > On 03/22/2010 09:27 PM, Ingo Molnar wrote: > > > >> If your position basically boils down to, we can't trust userspace > >> and we can always trust the kernel, I want to eliminate any > >> userspace path, then I can't really help you out. > > > > Why would you want to 'help me out'? I can tell a good solution from a bad > > one just fine. > > You are basically making a kernel implementation a requirement, instead of > something that follows from the requirement. No, i'm not. > > You should instead read the long list of disadvantages above, invert them > > and list then as advantages for the kernel-based vcpu enumeration > > solution, apply common sense and go admit to yourself that indeed in this > > situation a kernel provided enumeration of vcpu contexts is the most > > robust solution. > > Having qemu enumerate guests one way or another is not a good idea IMO since > it is focused on one guest and doesn't have a system-wide entity. A > userspace system-wide entity will work just as well as kernel > implementation, without its disadvantages. A system-wide user-space entity only solves one problem out of the 4 i listed, still leaving the other 3: - Those special files can get corrupted, mis-setup, get out of sync, or can be hard to discover. - Apps might start KVM vcpu instances without adhering to the system-wide access method. 
- There is no guarantee for the system-wide process to reply to a request - while the kernel can always guarantee an enumeration result. I dont want 'perf kvm' to hang or misbehave just because the system-wide entity has hung. Really, i think i have to give up and not try to convince you guys about this anymore - i dont think you are arguing constructively anymore and i dont want yet another pointless flamewar about this. Please consider 'perf kvm' scrapped indefinitely, due to lack of robust KVM instrumentation features: due to lack of robust+universal vcpu/guest enumeration and due to lack of robust+universal symbol access on the KVM side. It was a really promising feature IMO and i invested two days of arguments into it trying to find a workable solution, but it was not to be. Thanks, Ingo ^ permalink raw reply [flat|nested] 390+ messages in thread
* Re: [RFC] Unify KVM kernel-space and user-space code into a single project 2010-03-22 20:46 ` Ingo Molnar @ 2010-03-22 20:53 ` Avi Kivity 0 siblings, 0 replies; 390+ messages in thread From: Avi Kivity @ 2010-03-22 20:53 UTC (permalink / raw) To: Ingo Molnar Cc: Anthony Liguori, Pekka Enberg, Zhang, Yanmin, Peter Zijlstra, Sheng Yang, linux-kernel, kvm, Marcelo Tosatti, oerg Roedel, Jes Sorensen, Gleb Natapov, Zachary Amsden, ziteng.huang, Arnaldo Carvalho de Melo, Fr?d?ric Weisbecker, Gregory Haskins On 03/22/2010 10:46 PM, Ingo Molnar wrote: > >>> You should instead read the long list of disadvantages above, invert them >>> and list then as advantages for the kernel-based vcpu enumeration >>> solution, apply common sense and go admit to yourself that indeed in this >>> situation a kernel provided enumeration of vcpu contexts is the most >>> robust solution. >>> >> Having qemu enumerate guests one way or another is not a good idea IMO since >> it is focused on one guest and doesn't have a system-wide entity. A >> userspace system-wide entity will work just as well as kernel >> implementation, without its disadvantages. >> > A system-wide user-space entity only solves one problem out of the 4 i listed, > still leaving the other 3: > > - Those special files can get corrupted, mis-setup, get out of sync, or can > be hard to discover. > That's a hard requirement anyway. If it happens, we get massive data loss. Way more troubling than 'perf kvm top' doesn't work. So consider it fulfilled. > - Apps might start KVM vcpu instances without adhering to the > system-wide access method. > Then you don't get their symbol tables. That happens anyway if the symbol server is not installed, not running, handing out fake data. So we have to deal with that anyway. > - There is no guarantee for the system-wide process to reply to a request - > while the kernel can always guarantee an enumeration result. 
I dont want > 'perf kvm' to hang or misbehave just because the system-wide entity has > hung. > When you press a key there is no guarantee no component along the way will time out. > Really, i think i have to give up and not try to convince you guys about this > anymore - i dont think you are arguing constructively anymore and i dont want > yet another pointless flamewar about this. > > Please consider 'perf kvm' scrapped indefinitely, due to lack of robust KVM > instrumentation features: due to lack of robust+universal vcpu/guest > enumeration and due to lack of robust+universal symbol access on the KVM side. > It was a really promising feature IMO and i invested two days of arguments > into it trying to find a workable solution, but it was not to be. > I am not going to push libvirt or a subset thereof into the kernel for 'perf kvm'. -- Do not meddle in the internals of kernels, for they are subtle and quick to panic. ^ permalink raw reply [flat|nested] 390+ messages in thread
* Re: [RFC] Unify KVM kernel-space and user-space code into a single project 2010-03-22 19:47 ` Avi Kivity 2010-03-22 20:46 ` Ingo Molnar @ 2010-03-22 22:06 ` Anthony Liguori 2010-03-23 9:07 ` Avi Kivity ` (2 more replies) 1 sibling, 3 replies; 390+ messages in thread From: Anthony Liguori @ 2010-03-22 22:06 UTC (permalink / raw) To: Avi Kivity Cc: Ingo Molnar, Pekka Enberg, Zhang, Yanmin, Peter Zijlstra, Sheng Yang, linux-kernel, kvm, Marcelo Tosatti, oerg Roedel, Jes Sorensen, Gleb Natapov, Zachary Amsden, ziteng.huang, Arnaldo Carvalho de Melo, Fr?d?ric Weisbecker, Gregory Haskins On 03/22/2010 02:47 PM, Avi Kivity wrote: > On 03/22/2010 09:27 PM, Ingo Molnar wrote: >> >>> If your position basically boils down to, we can't trust userspace >>> and we can always trust the kernel, I want to eliminate any >>> userspace path, then I can't really help you out. >> Why would you want to 'help me out'? I can tell a good solution from >> a bad one >> just fine. > > You are basically making a kernel implementation a requirement, > instead of something that follows from the requirement. > >> You should instead read the long list of disadvantages above, invert >> them and >> list then as advantages for the kernel-based vcpu enumeration >> solution, apply >> common sense and go admit to yourself that indeed in this situation a >> kernel >> provided enumeration of vcpu contexts is the most robust solution. > > Having qemu enumerate guests one way or another is not a good idea IMO > since it is focused on one guest and doesn't have a system-wide entity. There always needs to be a system wide entity. There are two ways to enumerate instances from that system wide entity. You can centralize the creation of instances and there by maintain an list of current instances. You can also allow instances to be created in a decentralized manner and provide a standard mechanism for instances to register themselves with the system wide entity. 
IOW, it's the difference between asking libvirtd to exec(qemu) vs allowing a user to exec(qemu) and having qemu connect to a well known unix domain socket to tell libvirtd that it exists. The latter approach has a number of advantages. libvirt already supports both models. The former is the '/system' uri and the latter is the '/session' uri. What I'm proposing is to use the host file system as the system wide entity instead of libvirtd. libvirtd can monitor the host file system to participate in these activities but ultimately, moving this functionality out of libvirtd means that it becomes the standard mechanism for all qemu instances regardless of how they're launched. Regards, Anthony Liguori > A userspace system-wide entity will work just as well as kernel > implementation, without its disadvantages. > ^ permalink raw reply [flat|nested] 390+ messages in thread
* Re: [RFC] Unify KVM kernel-space and user-space code into a single project 2010-03-22 22:06 ` Anthony Liguori @ 2010-03-23 9:07 ` Avi Kivity 2010-03-23 14:09 ` Anthony Liguori 2010-03-23 10:13 ` Kevin Wolf 2010-03-23 14:06 ` Joerg Roedel 2 siblings, 1 reply; 390+ messages in thread From: Avi Kivity @ 2010-03-23 9:07 UTC (permalink / raw) To: Anthony Liguori Cc: Ingo Molnar, Pekka Enberg, Zhang, Yanmin, Peter Zijlstra, Sheng Yang, linux-kernel, kvm, Marcelo Tosatti, oerg Roedel, Jes Sorensen, Gleb Natapov, Zachary Amsden, ziteng.huang, Arnaldo Carvalho de Melo, Fr?d?ric Weisbecker, Gregory Haskins On 03/23/2010 12:06 AM, Anthony Liguori wrote: >> Having qemu enumerate guests one way or another is not a good idea >> IMO since it is focused on one guest and doesn't have a system-wide >> entity. > > > There always needs to be a system wide entity. There are two ways to > enumerate instances from that system wide entity. You can centralize > the creation of instances and there by maintain an list of current > instances. You can also allow instances to be created in a > decentralized manner and provide a standard mechanism for instances to > register themselves with the system wide entity. > > IOW, it's the difference between asking libvirtd to exec(qemu) vs > allowing a user to exec(qemu) and having qemu connect to a well known > unix domain socket for libvirt to tell libvirtd that it exists. > > The later approach has a number of advantages. libvirt already > supports both models. The former is the '/system' uri and the later > is the '/session' uri. > > What I'm proposing, is to use the host file system as the system wide > entity instead of libvirtd. libvirtd can monitor the host file system > to participate in these activities but ultimately, moving this > functionality out of libvirtd means that it becomes the standard > mechanism for all qemu instances regardless of how they're launched. 
I don't like dropping sockets into the host filesystem, especially as they won't be cleaned up on abnormal exit. I also think this breaks our 'mechanism, not policy' policy. Someone may want to do something weird with qemu that doesn't work well with this. We could allow starting monitors from the global configuration file, so a distribution can do this if it wants, but I don't think we should do this ourselves by default. -- error compiling committee.c: too many arguments to function ^ permalink raw reply [flat|nested] 390+ messages in thread
* Re: [RFC] Unify KVM kernel-space and user-space code into a single project 2010-03-23 9:07 ` Avi Kivity @ 2010-03-23 14:09 ` Anthony Liguori 0 siblings, 0 replies; 390+ messages in thread From: Anthony Liguori @ 2010-03-23 14:09 UTC (permalink / raw) To: Avi Kivity Cc: Ingo Molnar, Pekka Enberg, Zhang, Yanmin, Peter Zijlstra, Sheng Yang, linux-kernel, kvm, Marcelo Tosatti, oerg Roedel, Jes Sorensen, Gleb Natapov, Zachary Amsden, ziteng.huang, Arnaldo Carvalho de Melo, Fr?d?ric Weisbecker, Gregory Haskins On 03/23/2010 04:07 AM, Avi Kivity wrote: > On 03/23/2010 12:06 AM, Anthony Liguori wrote: >>> Having qemu enumerate guests one way or another is not a good idea >>> IMO since it is focused on one guest and doesn't have a system-wide >>> entity. >> >> >> There always needs to be a system wide entity. There are two ways to >> enumerate instances from that system wide entity. You can centralize >> the creation of instances and there by maintain an list of current >> instances. You can also allow instances to be created in a >> decentralized manner and provide a standard mechanism for instances >> to register themselves with the system wide entity. >> >> IOW, it's the difference between asking libvirtd to exec(qemu) vs >> allowing a user to exec(qemu) and having qemu connect to a well known >> unix domain socket for libvirt to tell libvirtd that it exists. >> >> The later approach has a number of advantages. libvirt already >> supports both models. The former is the '/system' uri and the later >> is the '/session' uri. >> >> What I'm proposing, is to use the host file system as the system wide >> entity instead of libvirtd. libvirtd can monitor the host file >> system to participate in these activities but ultimately, moving this >> functionality out of libvirtd means that it becomes the standard >> mechanism for all qemu instances regardless of how they're launched. 
> > I don't like dropping sockets into the host filesystem, especially as > they won't be cleaned up on abnormal exit. I also think this breaks > our 'mechanism, not policy' policy. Someone may want to do something > weird with qemu that doesn't work well with this. The approach I've taken (which I accidentally committed and reverted) was to set this up as the default qmp device much like we have a default monitor device. A user is capable of overriding this by manually specifying a qmp device or by disabling defaults. > We could allow starting monitors from the global configuration file, > so a distribution can do this if it wants, but I don't think we should > do this ourselves by default. I've looked at making default devices globally configurable. We'll get there but I think that's orthogonal to setting up a useful default qmp device. Regards, Anthony Liguori ^ permalink raw reply [flat|nested] 390+ messages in thread
* Re: [RFC] Unify KVM kernel-space and user-space code into a single project 2010-03-22 22:06 ` Anthony Liguori 2010-03-23 9:07 ` Avi Kivity @ 2010-03-23 10:13 ` Kevin Wolf 2010-03-23 10:28 ` Antoine Martin 2010-03-23 14:06 ` Joerg Roedel 2 siblings, 1 reply; 390+ messages in thread From: Kevin Wolf @ 2010-03-23 10:13 UTC (permalink / raw) To: Anthony Liguori Cc: Avi Kivity, Ingo Molnar, Pekka Enberg, Zhang, Yanmin, Peter Zijlstra, Sheng Yang, linux-kernel, kvm, Marcelo Tosatti, oerg Roedel, Jes Sorensen, Gleb Natapov, Zachary Amsden, ziteng.huang, Arnaldo Carvalho de Melo, Fr?d?ric Weisbecker, Gregory Haskins Am 22.03.2010 23:06, schrieb Anthony Liguori: > On 03/22/2010 02:47 PM, Avi Kivity wrote: >> Having qemu enumerate guests one way or another is not a good idea IMO >> since it is focused on one guest and doesn't have a system-wide entity. > > There always needs to be a system wide entity. There are two ways to > enumerate instances from that system wide entity. You can centralize > the creation of instances and there by maintain an list of current > instances. You can also allow instances to be created in a > decentralized manner and provide a standard mechanism for instances to > register themselves with the system wide entity. > > IOW, it's the difference between asking libvirtd to exec(qemu) vs > allowing a user to exec(qemu) and having qemu connect to a well known > unix domain socket for libvirt to tell libvirtd that it exists. I think the latter is exactly what I would want for myself. I do see the advantages of having a central instance, but I really don't want to bother with libvirt configuration files or even GUIs just to get an ad-hoc VM up when I can simply run "qemu -hda hd.img -m 1024". Let alone that I usually want to have full control over qemu, including monitor access and small details available as command line options. I know that I'm not the average user with these requirements, but still I am one user and do have these requirements. 
If I could just install libvirt, continue using qemu as I always did and libvirt picked my VMs up for things like global enumeration, that would be more or less the optimal thing for me. Kevin ^ permalink raw reply [flat|nested] 390+ messages in thread
* Re: [RFC] Unify KVM kernel-space and user-space code into a single project 2010-03-23 10:13 ` Kevin Wolf @ 2010-03-23 10:28 ` Antoine Martin 0 siblings, 0 replies; 390+ messages in thread From: Antoine Martin @ 2010-03-23 10:28 UTC (permalink / raw) To: Kevin Wolf Cc: Anthony Liguori, Avi Kivity, Ingo Molnar, Pekka Enberg, Zhang, Yanmin, Peter Zijlstra, Sheng Yang, linux-kernel, kvm, Marcelo Tosatti, oerg Roedel, Jes Sorensen, Gleb Natapov, Zachary Amsden, ziteng.huang, Arnaldo Carvalho de Melo, Fr?d?ric Weisbecker, Gregory Haskins On 03/23/2010 05:13 PM, Kevin Wolf wrote: > Am 22.03.2010 23:06, schrieb Anthony Liguori: > >> On 03/22/2010 02:47 PM, Avi Kivity wrote: >> >>> Having qemu enumerate guests one way or another is not a good idea IMO >>> since it is focused on one guest and doesn't have a system-wide entity. >>> >> There always needs to be a system wide entity. There are two ways to >> enumerate instances from that system wide entity. You can centralize >> the creation of instances and there by maintain an list of current >> instances. You can also allow instances to be created in a >> decentralized manner and provide a standard mechanism for instances to >> register themselves with the system wide entity. >> >> IOW, it's the difference between asking libvirtd to exec(qemu) vs >> allowing a user to exec(qemu) and having qemu connect to a well known >> unix domain socket for libvirt to tell libvirtd that it exists. >> > I think the latter is exactly what I would want for myself. I do see the > advantages of having a central instance, but I really don't want to > bother with libvirt configuration files or even GUIs just to get an > ad-hoc VM up when I can simply run "qemu -hda hd.img -m 1024". Let alone > that I usually want to have full control over qemu, including monitor > access and small details available as command line options. > > I know that I'm not the average user with these requirements, but still > I am one user and do have these requirements. 
If I could just install > libvirt, continue using qemu as I always did and libvirt picked my VMs > up for things like global enumeration, that would be more or less the > optimal thing for me. > +1 And it would also make it more likely that users like us would convert to libvirt in the long run, by providing an easy and integrated transition path. I've had another look at libvirt, and one of the things that is holding me back is the cost of moving existing scripts to libvirt. If it could just pick up what I have (at least in part), then I don't have to. Antoine > Kevin > -- > To unsubscribe from this list: send the line "unsubscribe kvm" in > the body of a message to majordomo@vger.kernel.org > More majordomo info at http://vger.kernel.org/majordomo-info.html > ^ permalink raw reply [flat|nested] 390+ messages in thread
* Re: [RFC] Unify KVM kernel-space and user-space code into a single project 2010-03-22 22:06 ` Anthony Liguori 2010-03-23 9:07 ` Avi Kivity 2010-03-23 10:13 ` Kevin Wolf @ 2010-03-23 14:06 ` Joerg Roedel 2010-03-23 16:39 ` Avi Kivity 2 siblings, 1 reply; 390+ messages in thread From: Joerg Roedel @ 2010-03-23 14:06 UTC (permalink / raw) To: Anthony Liguori Cc: Avi Kivity, Ingo Molnar, Pekka Enberg, Zhang, Yanmin, Peter Zijlstra, Sheng Yang, linux-kernel, kvm, Marcelo Tosatti, Jes Sorensen, Gleb Natapov, Zachary Amsden, ziteng.huang, Arnaldo Carvalho de Melo, Frédéric Weisbecker, Gregory Haskins On Mon, Mar 22, 2010 at 05:06:17PM -0500, Anthony Liguori wrote: > There always needs to be a system wide entity. There are two ways to > enumerate instances from that system wide entity. You can centralize > the creation of instances and there by maintain an list of current > instances. You can also allow instances to be created in a > decentralized manner and provide a standard mechanism for instances to > register themselves with the system wide entity. And this system wide entity is the kvm module. It creates instances of 'struct kvm' and destroys them. I see no problem if we just attach a name to every instance with a good default value like kvm0, kvm1 ... or guest0, guest1 ... User-space can override the name if it wants. The kvm module takes care of keeping the names unique. This is very much the same as network card numbering is implemented in the kernel. Forcing perf to talk to qemu or even libvirt produces too much overhead imho. Instrumentation only produces useful results with low overhead. Joerg ^ permalink raw reply [flat|nested] 390+ messages in thread
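[Editorial sketch: the kvm0, kvm1, ... scheme Joerg proposes follows the same lowest-free-index convention the kernel uses for network interface names (eth0, eth1, ...). The allocation and override semantics he describes can be modeled as follows — illustrative only; a real implementation would live in the kvm module and hang the name off 'struct kvm'.]

```python
class KvmNameAllocator:
    """Models the proposed in-kernel naming: default names kvm0, kvm1, ...
    with uniqueness enforced, plus an optional userspace override (rename),
    in the style of kernel network interface numbering."""

    def __init__(self, prefix="kvm"):
        self.prefix = prefix
        self.names = set()

    def create(self):
        # Lowest free index wins, so a destroyed guest's default name
        # becomes available again, exactly like interface names.
        i = 0
        while f"{self.prefix}{i}" in self.names:
            i += 1
        name = f"{self.prefix}{i}"
        self.names.add(name)
        return name

    def rename(self, old, new):
        # Userspace override; the allocator still guarantees uniqueness.
        if new in self.names and new != old:
            raise ValueError(f"name {new!r} already taken")
        self.names.remove(old)
        self.names.add(new)

    def destroy(self, name):
        self.names.remove(name)
```

Note that Avi's namespace objection applies directly to this sketch: the `names` set is global, so two users cannot both hold a guest named MyGuest unless the allocator is made per-namespace.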
* Re: [RFC] Unify KVM kernel-space and user-space code into a single project 2010-03-23 14:06 ` Joerg Roedel @ 2010-03-23 16:39 ` Avi Kivity 2010-03-23 18:21 ` Joerg Roedel 0 siblings, 1 reply; 390+ messages in thread From: Avi Kivity @ 2010-03-23 16:39 UTC (permalink / raw) To: Joerg Roedel Cc: Anthony Liguori, Ingo Molnar, Pekka Enberg, Zhang, Yanmin, Peter Zijlstra, Sheng Yang, linux-kernel, kvm, Marcelo Tosatti, Jes Sorensen, Gleb Natapov, Zachary Amsden, ziteng.huang, Arnaldo Carvalho de Melo, Fr?d?ric Weisbecker, Gregory Haskins On 03/23/2010 04:06 PM, Joerg Roedel wrote: > On Mon, Mar 22, 2010 at 05:06:17PM -0500, Anthony Liguori wrote: > >> There always needs to be a system wide entity. There are two ways to >> enumerate instances from that system wide entity. You can centralize >> the creation of instances and there by maintain an list of current >> instances. You can also allow instances to be created in a >> decentralized manner and provide a standard mechanism for instances to >> register themselves with the system wide entity. >> > And this system wide entity is the kvm module. It creates instances of > 'struct kvm' and destroys them. I see no problem if we just attach a > name to every instance with a good default value like kvm0, kvm1 ... or > guest0, guest1 ... User-space can override the name if it wants. The kvm > module takes care about the names being unique. > So, two users can't have a guest named MyGuest each? What about namespace support? There's a lot of work in virtualizing all kernel namespaces, you're adding to that. What about notifications when guests are added or removed? > This is very much the same as network card numbering is implemented in > the kernel. > Forcing perf to talk to qemu or even libvirt produces to much overhead > imho. Instrumentation only produces useful results with low overhead. > > It's a setup cost only. 
-- error compiling committee.c: too many arguments to function ^ permalink raw reply [flat|nested] 390+ messages in thread
* Re: [RFC] Unify KVM kernel-space and user-space code into a single project 2010-03-23 16:39 ` Avi Kivity @ 2010-03-23 18:21 ` Joerg Roedel 2010-03-23 18:27 ` Peter Zijlstra ` (3 more replies) 0 siblings, 4 replies; 390+ messages in thread From: Joerg Roedel @ 2010-03-23 18:21 UTC (permalink / raw) To: Avi Kivity Cc: Anthony Liguori, Ingo Molnar, Pekka Enberg, Zhang, Yanmin, Peter Zijlstra, Sheng Yang, linux-kernel, kvm, Marcelo Tosatti, Jes Sorensen, Gleb Natapov, Zachary Amsden, ziteng.huang, Arnaldo Carvalho de Melo, Fr?d?ric Weisbecker, Gregory Haskins On Tue, Mar 23, 2010 at 06:39:58PM +0200, Avi Kivity wrote: > On 03/23/2010 04:06 PM, Joerg Roedel wrote: >> And this system wide entity is the kvm module. It creates instances of >> 'struct kvm' and destroys them. I see no problem if we just attach a >> name to every instance with a good default value like kvm0, kvm1 ... or >> guest0, guest1 ... User-space can override the name if it wants. The kvm >> module takes care about the names being unique. >> > > So, two users can't have a guest named MyGuest each? What about > namespace support? There's a lot of work in virtualizing all kernel > namespaces, you're adding to that. This enumeration is a very small and non-intrusive feature. Making it aware of namespaces is easy too. > What about notifications when guests are added or removed? Who would be the consumer of such notifications? A 'perf kvm list' can live without I guess. If we need them later we can still add them. >> This is very much the same as network card numbering is implemented in >> the kernel. >> Forcing perf to talk to qemu or even libvirt produces to much overhead >> imho. Instrumentation only produces useful results with low overhead. >> > > It's a setup cost only. My statement was not limited to enumeration, I should have been more clear about that. The guest filesystem access-channel is another affected part. 
The 'perf kvm top' command will access the guest filesystem regularly and going over qemu would be more overhead here. Providing this in the KVM module directly also has the benefit that it would work out-of-the-box with different userspaces too. Or do we want to limit 'perf kvm' to the libvirt-qemu-kvm software stack? Sidenote: I really think we should come to a conclusion about the concept. KVM integration into perf is a very useful feature for analyzing virtualization workloads. Thanks, Joerg ^ permalink raw reply [flat|nested] 390+ messages in thread
* Re: [RFC] Unify KVM kernel-space and user-space code into a single project 2010-03-23 18:21 ` Joerg Roedel @ 2010-03-23 18:27 ` Peter Zijlstra 2010-03-23 19:05 ` Javier Guerra Giraldez ` (2 subsequent siblings) 3 siblings, 0 replies; 390+ messages in thread From: Peter Zijlstra @ 2010-03-23 18:27 UTC (permalink / raw) To: Joerg Roedel Cc: Avi Kivity, Anthony Liguori, Ingo Molnar, Pekka Enberg, Zhang, Yanmin, Sheng Yang, linux-kernel, kvm, Marcelo Tosatti, Jes Sorensen, Gleb Natapov, Zachary Amsden, ziteng.huang, Arnaldo Carvalho de Melo, Fr?d?ric Weisbecker, Gregory Haskins On Tue, 2010-03-23 at 19:21 +0100, Joerg Roedel wrote: > Sidenote: I really think we should come to a conclusion about the > concept. KVM integration into perf is very useful feature to > analyze virtualization workloads. I always start my things with bare kvm, It would be very unwelcome to mandate libvirt, or for that matter running a particular userspace in the guest. ^ permalink raw reply [flat|nested] 390+ messages in thread
* Re: [RFC] Unify KVM kernel-space and user-space code into a single project 2010-03-23 18:21 ` Joerg Roedel 2010-03-23 18:27 ` Peter Zijlstra 2010-03-23 19:05 ` Javier Guerra Giraldez 2010-03-23 19:05 ` Javier Guerra Giraldez ` (2 subsequent siblings) 3 siblings, 0 replies; 390+ messages in thread From: Javier Guerra Giraldez @ 2010-03-23 19:05 UTC (permalink / raw) To: Joerg Roedel Cc: Avi Kivity, Anthony Liguori, Ingo Molnar, Pekka Enberg, Zhang, Yanmin, Peter Zijlstra, Sheng Yang, linux-kernel, kvm, Marcelo Tosatti, Jes Sorensen, Gleb Natapov, Zachary Amsden, ziteng.huang, Arnaldo Carvalho de Melo, Frédéric Weisbecker, Gregory Haskins On Tue, Mar 23, 2010 at 2:21 PM, Joerg Roedel <joro@8bytes.org> wrote: > On Tue, Mar 23, 2010 at 06:39:58PM +0200, Avi Kivity wrote: >> So, two users can't have a guest named MyGuest each? What about >> namespace support? There's a lot of work in virtualizing all kernel >> namespaces, you're adding to that. > > This enumeration is a very small and non-intrusive feature. Making it > aware of namespaces is easy too. an outsider's comment: this path leads to a filesystem... which could be a very nice idea. it could have a directory for each VM, with pseudo-files with all the guest's status, and even the memory it's using. perf could simply watch those files. in fact, such a filesystem could be the main userlevel/kernel interface. but i'm sure such a layout was considered (and rejected) very early in the KVM design. i don't think there's anything new to make it more desirable than it was back then. -- Javier ^ permalink raw reply [flat|nested] 390+ messages in thread
* Re: [RFC] Unify KVM kernel-space and user-space code into a single project 2010-03-23 18:21 ` Joerg Roedel 2010-03-23 18:27 ` Peter Zijlstra 2010-03-23 19:05 ` Javier Guerra Giraldez @ 2010-03-24 4:57 ` Avi Kivity 2010-03-24 11:59 ` Joerg Roedel 2010-03-24 5:09 ` Andi Kleen 3 siblings, 1 reply; 390+ messages in thread From: Avi Kivity @ 2010-03-24 4:57 UTC (permalink / raw) To: Joerg Roedel Cc: Anthony Liguori, Ingo Molnar, Pekka Enberg, Zhang, Yanmin, Peter Zijlstra, Sheng Yang, linux-kernel, kvm, Marcelo Tosatti, Jes Sorensen, Gleb Natapov, Zachary Amsden, ziteng.huang, Arnaldo Carvalho de Melo, Fr?d?ric Weisbecker, Gregory Haskins On 03/23/2010 08:21 PM, Joerg Roedel wrote: > On Tue, Mar 23, 2010 at 06:39:58PM +0200, Avi Kivity wrote: > >> On 03/23/2010 04:06 PM, Joerg Roedel wrote: >> > >>> And this system wide entity is the kvm module. It creates instances of >>> 'struct kvm' and destroys them. I see no problem if we just attach a >>> name to every instance with a good default value like kvm0, kvm1 ... or >>> guest0, guest1 ... User-space can override the name if it wants. The kvm >>> module takes care about the names being unique. >>> >>> >> So, two users can't have a guest named MyGuest each? What about >> namespace support? There's a lot of work in virtualizing all kernel >> namespaces, you're adding to that. >> > This enumeration is a very small and non-intrusive feature. Making it > aware of namespaces is easy too. > It's easier (and safer and all the other boring bits) not to do it at all in the kernel. >> What about notifications when guests are added or removed? >> > Who would be the consumer of such notifications? A 'perf kvm list' can > live without I guess. If we need them later we can still add them. > System-wide monitoring needs to work equally well for guests started before or after the monitor. Even disregarding that, if you introduce an API, people will start using it and complaining if it's incomplete. 
The equivalent functionality for network interfaces is in netlink. >>> This is very much the same as network card numbering is implemented in >>> the kernel. >>> Forcing perf to talk to qemu or even libvirt produces to much overhead >>> imho. Instrumentation only produces useful results with low overhead. >>> >>> >> It's a setup cost only. >> > My statement was not limited to enumeration, I should have been more > clear about that. The guest filesystem access-channel is another > affected part. The 'perf kvm top' command will access the guest > filesystem regularly and going over qemu would be more overhead here. > Why? Also, the real cost would be accessing the filesystem, not copying data over qemu. > Providing this in the KVM module directly also has the benefit that it > would work out-of-the-box with different userspaces too. Or do we want > to limit 'perf kvm' to the libvirt-qemu-kvm software stack? > Other userspaces can also provide this functionality, like they have to provide disk, network, and display emulation. The kernel is not a huge library. > Sidenote: I really think we should come to a conclusion about the > concept. KVM integration into perf is very useful feature to > analyze virtualization workloads. > > Agreed. -- Do not meddle in the internals of kernels, for they are subtle and quick to panic. ^ permalink raw reply [flat|nested] 390+ messages in thread
* Re: [RFC] Unify KVM kernel-space and user-space code into a single project 2010-03-24 4:57 ` Avi Kivity @ 2010-03-24 11:59 ` Joerg Roedel 2010-03-24 12:08 ` Avi Kivity 0 siblings, 1 reply; 390+ messages in thread From: Joerg Roedel @ 2010-03-24 11:59 UTC (permalink / raw) To: Avi Kivity Cc: Anthony Liguori, Ingo Molnar, Pekka Enberg, Zhang, Yanmin, Peter Zijlstra, Sheng Yang, linux-kernel, kvm, Marcelo Tosatti, Jes Sorensen, Gleb Natapov, ziteng.huang, Arnaldo Carvalho de Melo, Frédéric Weisbecker, Gregory Haskins On Wed, Mar 24, 2010 at 06:57:47AM +0200, Avi Kivity wrote: > On 03/23/2010 08:21 PM, Joerg Roedel wrote: >> This enumeration is a very small and non-intrusive feature. Making it >> aware of namespaces is easy too. >> > > It's easier (and safer and all the other boring bits) not to do it at > all in the kernel. For the KVM stack it doesn't matter where it is implemented. It is as easy in qemu or libvirt as in the kernel. I also don't see big risks. On the perf side and for its users it is a lot easier to have this in the kernel. I for example always use plain qemu when running kvm guests and never used libvirt. The only central entity I have here is the kvm kernel modules. I don't want to start using it only to be able to use perf kvm. >> Who would be the consumer of such notifications? A 'perf kvm list' can >> live without I guess. If we need them later we can still add them. > > System-wide monitoring needs to work equally well for guests started > before or after the monitor. Could be easily done using notifier chains already in the kernel. Probably implemented with much less than 100 lines of additional code. > Even disregarding that, if you introduce an API, people will start > using it and complaining if it's incomplete. There is nothing wrong with that. We only need to define what this API should be used for to prevent rank growth. It could be an instrumentation-only API for example. 
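[Editorial sketch: notifier chains are a real kernel mechanism (include/linux/notifier.h); the toy userspace model below shows the shape of what Joerg suggests, with one addition — replaying already-running guests to a late-registering callback, which is exactly what the "monitor started after the guest" objection requires. The event names are invented for illustration.]

```python
# Invented event codes for the sketch, in the style of kernel notifier events.
KVM_GUEST_CREATED, KVM_GUEST_DESTROYED = 1, 2

class GuestNotifierChain:
    """Toy model of a notifier chain for VM lifecycle events. Registration
    replays the current set of guests, so a monitor that starts late still
    sees every guest that was created before it attached."""

    def __init__(self):
        self.callbacks = []
        self.guests = set()

    def register(self, callback):
        self.callbacks.append(callback)
        for name in sorted(self.guests):         # replay existing state
            callback(KVM_GUEST_CREATED, name)

    def guest_created(self, name):
        self.guests.add(name)
        for callback in self.callbacks:
            callback(KVM_GUEST_CREATED, name)

    def guest_destroyed(self, name):
        self.guests.discard(name)
        for callback in self.callbacks:
            callback(KVM_GUEST_DESTROYED, name)
```

The replay-on-register step is the whole trick: with it, "perf kvm list" gets a complete enumeration regardless of start order, without a separate enumeration call.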
>> My statement was not limited to enumeration, I should have been more >> clear about that. The guest filesystem access-channel is another >> affected part. The 'perf kvm top' command will access the guest >> filesystem regularly and going over qemu would be more overhead here. >> > > Why? Also, the real cost would be accessing the filesystem, not copying > data over qemu. When measuring cache-misses, any additional (and in this case unnecessary) copy overhead skews the results. >> Providing this in the KVM module directly also has the benefit that it >> would work out-of-the-box with different userspaces too. Or do we want >> to limit 'perf kvm' to the libvirt-qemu-kvm software stack? > > Other userspaces can also provide this functionality, like they have to > provide disk, network, and display emulation. The kernel is not a huge > library. This has nothing to do with a library. It is about entity and resource management, which is what OS kernels are about. The virtual machine is the entity (similar to a process) and we want to add additional access channels and names to it. Joerg ^ permalink raw reply [flat|nested] 390+ messages in thread
* Re: [RFC] Unify KVM kernel-space and user-space code into a single project 2010-03-24 11:59 ` Joerg Roedel @ 2010-03-24 12:08 ` Avi Kivity 2010-03-24 12:50 ` Joerg Roedel 0 siblings, 1 reply; 390+ messages in thread From: Avi Kivity @ 2010-03-24 12:08 UTC (permalink / raw) To: Joerg Roedel Cc: Anthony Liguori, Ingo Molnar, Pekka Enberg, Zhang, Yanmin, Peter Zijlstra, Sheng Yang, linux-kernel, kvm, Marcelo Tosatti, Jes Sorensen, Gleb Natapov, ziteng.huang, Arnaldo Carvalho de Melo, Frédéric Weisbecker, Gregory Haskins On 03/24/2010 01:59 PM, Joerg Roedel wrote: > On Wed, Mar 24, 2010 at 06:57:47AM +0200, Avi Kivity wrote: > >> On 03/23/2010 08:21 PM, Joerg Roedel wrote: >> >>> This enumeration is a very small and non-intrusive feature. Making it >>> aware of namespaces is easy too. >>> >>> >> It's easier (and safer and all the other boring bits) not to do it at >> all in the kernel. >> > For the KVM stack it doesn't matter where it is implemented. It is as > easy in qemu or libvirt as in the kernel. I also don't see big risks. On > the perf side and for its users it is a lot easier to have this in the > kernel. > I for example always use plain qemu when running kvm guests and never > used libvirt. The only central entity I have here is the kvm kernel > modules. I don't want to start using it only to be able to use perf kvm. > You can always provide the kernel and module paths as command line parameters. It just won't be transparently usable, but if you're using qemu from the command line, presumably you can live with that. >>> Who would be the consumer of such notifications? A 'perf kvm list' can >>> live without I guess. If we need them later we can still add them. >>> >> System-wide monitoring needs to work equally well for guests started >> before or after the monitor. >> > Could be easily done using notifier chains already in the kernel. > Probably implemented with much less than 100 lines of additional code. > And a userspace interface for that. 
>> Even disregarding that, if you introduce an API, people will start >> using it and complaining if it's incomplete. >> > There is nothing wrong with that. We only need to define what this API > should be used for to prevent rank growth. It could be an > instrumentation-only API for example. > If we make an API, I'd like it to be generally useful. It's a total headache. For example, we'd need security module hooks to determine access permissions. So far we managed to avoid that since kvm doesn't allow you to access any information beyond what you provided it directly. >>> My statement was not limited to enumeration, I should have been more >>> clear about that. The guest filesystem access-channel is another >>> affected part. The 'perf kvm top' command will access the guest >>> filesystem regularly and going over qemu would be more overhead here. >>> >>> >> Why? Also, the real cost would be accessing the filesystem, not copying >> data over qemu. >> > When measuring cache-misses any additional (and in this case > unnecessary) copy-overhead result in less appropriate results. > Copying the objects is a one time cost. If you run perf for more than a second or two, it would fetch and cache all of the data. It's really the same problem with non-guest profiling, only magnified a bit. >>> Providing this in the KVM module directly also has the benefit that it >>> would work out-of-the-box with different userspaces too. Or do we want >>> to limit 'perf kvm' to the libvirt-qemu-kvm software stack? >>> >> Other userspaces can also provide this functionality, like they have to >> provide disk, network, and display emulation. The kernel is not a huge >> library. >> > This has nothing to do with a library. It is about entity and resource > management which is what os kernels are about. The virtual machine is > the entity (similar to a process) and we want to add additional access > channels and names to it. 
> kvm.ko has only a small subset of the information that is used to define a guest. -- error compiling committee.c: too many arguments to function ^ permalink raw reply [flat|nested] 390+ messages in thread
* Re: [RFC] Unify KVM kernel-space and user-space code into a single project 2010-03-24 12:08 ` Avi Kivity @ 2010-03-24 12:50 ` Joerg Roedel 2010-03-24 13:05 ` Avi Kivity 0 siblings, 1 reply; 390+ messages in thread From: Joerg Roedel @ 2010-03-24 12:50 UTC (permalink / raw) To: Avi Kivity Cc: Anthony Liguori, Ingo Molnar, Pekka Enberg, Zhang, Yanmin, Peter Zijlstra, Sheng Yang, linux-kernel, kvm, Marcelo Tosatti, Jes Sorensen, Gleb Natapov, ziteng.huang, Arnaldo Carvalho de Melo, Frédéric Weisbecker, Gregory Haskins On Wed, Mar 24, 2010 at 02:08:17PM +0200, Avi Kivity wrote: > On 03/24/2010 01:59 PM, Joerg Roedel wrote: > You can always provide the kernel and module paths as command line > parameters. It just won't be transparently usable, but if you're using > qemu from the command line, presumably you can live with that. I don't want the tool for myself only. A typical perf user expects that it works transparently. >> Could be easily done using notifier chains already in the kernel. >> Probably implemented with much less than 100 lines of additional code. > > And a userspace interface for that. Not necessarily. The perf event is configured to measure systemwide kvm by userspace. The kernel side of perf takes care that it stays system-wide even with added vm instances. So in this case the consumer for the notifier would be the perf kernel part. No userspace interface required. > If we make an API, I'd like it to be generally useful. That's hard to do at this point since we don't know what people will use it for. We should keep it simple in the beginning and add new features as they are requested and make sense in this context. > It's a total headache. For example, we'd need security module hooks to > determine access permissions. So far we managed to avoid that since kvm > doesn't allow you to access any information beyond what you provided it > directly. Depends on how it is designed. A filesystem approach was already mentioned. 
We could create /sys/kvm/ for example to expose information about virtual machines to userspace. This would not require any new security hooks. > Copying the objects is a one time cost. If you run perf for more than a > second or two, it would fetch and cache all of the data. It's really > the same problem with non-guest profiling, only magnified a bit. I don't think we can cache filesystem data of a running guest on the host. It is too hard to keep such a cache coherent. >>> Other userspaces can also provide this functionality, like they have to >>> provide disk, network, and display emulation. The kernel is not a huge >>> library. If two userspaces run in parallel, what is the single instance where perf can get a list of guests from? > kvm.ko has only a small subset of the information that is used to define > a guest. The subset is not small. It contains all guest vcpus, the complete interrupt routing hardware emulation and manages even the guest's memory. Joerg ^ permalink raw reply [flat|nested] 390+ messages in thread
* Re: [RFC] Unify KVM kernel-space and user-space code into a single project 2010-03-24 12:50 ` Joerg Roedel @ 2010-03-24 13:05 ` Avi Kivity 2010-03-24 13:46 ` Joerg Roedel 2010-03-24 13:53 ` Alexander Graf 0 siblings, 2 replies; 390+ messages in thread From: Avi Kivity @ 2010-03-24 13:05 UTC (permalink / raw) To: Joerg Roedel Cc: Anthony Liguori, Ingo Molnar, Pekka Enberg, Zhang, Yanmin, Peter Zijlstra, Sheng Yang, linux-kernel, kvm, Marcelo Tosatti, Jes Sorensen, Gleb Natapov, ziteng.huang, Arnaldo Carvalho de Melo, Frédéric Weisbecker, Gregory Haskins On 03/24/2010 02:50 PM, Joerg Roedel wrote: > >> You can always provide the kernel and module paths as command line >> parameters. It just won't be transparently usable, but if you're using >> qemu from the command line, presumably you can live with that. >> > I don't want the tool for myself only. A typical perf user expects that > it works transparently. > A typical kvm user uses libvirt, so we can integrate it with that. >>> Could be easily done using notifier chains already in the kernel. >>> Probably implemented with much less than 100 lines of additional code. >>> >> And a userspace interface for that. >> > Not necessarily. The perf event is configured to measure systemwide kvm > by userspace. The kernel side of perf takes care that it stays > system-wide even with added vm instances. So in this case the consumer > for the notifier would be the perf kernel part. No userspace interface > required. > Someone needs to know about the new guest to fetch its symbols. Or do you want that part in the kernel too? >> If we make an API, I'd like it to be generally useful. >> > That's hard to do at this point since we don't know what people will use > it for. We should keep it simple in the beginning and add new features > as they are requested and make sense in this context. > IMO this use case is too rare to warrant its own API, especially as there are alternatives. >> It's a total headache. 
For example, we'd need security module hooks to >> determine access permissions. So far we managed to avoid that since kvm >> doesn't allow you to access any information beyond what you provided it >> directly. >> > Depends on how it is designed. A filesystem approach was already > mentioned. We could create /sys/kvm/ for example to expose information > about virtual machines to userspace. This would not require any new > security hooks. > Who would set the security context on those files? Plus, we need cgroup support so you can't see one container's guests from an unrelated container. >> Copying the objects is a one time cost. If you run perf for more than a >> second or two, it would fetch and cache all of the data. It's really >> the same problem with non-guest profiling, only magnified a bit. >> > I don't think we can cache filesystem data of a running guest on the > host. It is too hard to keep such a cache coherent. > I don't see any choice. The guest can change its symbols at any time (say by kexec), without any notification. >>>> Other userspaces can also provide this functionality, like they have to >>>> provide disk, network, and display emulation. The kernel is not a huge >>>> library. >>>> > If two userspaces run in parallel what is the single instance where perf > can get a list of guests from? > I don't know. Surely that's solvable though. >> kvm.ko has only a small subset of the information that is used to define >> a guest. >> > The subset is not small. It contains all guest vcpus, the complete > interrupt routing hardware emulation and manages event the guests > memory. > It doesn't contain most of the mmio and pio address space. Integration with qemu would allow perf to tell us that the guest is hitting the interrupt status register of a virtio-blk device in pci slot 5 (the information is already available through the kvm_mmio trace event, but only qemu can decode it). 
-- error compiling committee.c: too many arguments to function ^ permalink raw reply [flat|nested] 390+ messages in thread
* Re: [RFC] Unify KVM kernel-space and user-space code into a single project 2010-03-24 13:05 ` Avi Kivity @ 2010-03-24 13:46 ` Joerg Roedel 2010-03-24 13:57 ` Avi Kivity 2010-03-24 13:53 ` Alexander Graf 1 sibling, 1 reply; 390+ messages in thread From: Joerg Roedel @ 2010-03-24 13:46 UTC (permalink / raw) To: Avi Kivity Cc: Anthony Liguori, Ingo Molnar, Pekka Enberg, Zhang, Yanmin, Peter Zijlstra, Sheng Yang, linux-kernel, kvm, Marcelo Tosatti, Jes Sorensen, Gleb Natapov, ziteng.huang, Arnaldo Carvalho de Melo, Frédéric Weisbecker, Gregory Haskins On Wed, Mar 24, 2010 at 03:05:02PM +0200, Avi Kivity wrote: > On 03/24/2010 02:50 PM, Joerg Roedel wrote: >> I don't want the tool for myself only. A typical perf user expects that >> it works transparently. > > A typical kvm user uses libvirt, so we can integrate it with that. Someone who uses libvirt and virt-manager by default is probably not interested in this feature at the same level a kvm developer is. And developers tend not to use libvirt for low-level kvm development. A number of developers have stated in this thread already that they would appreciate a solution for guest enumeration that would not involve libvirt. > Someone needs to know about the new guest to fetch its symbols. Or do > you want that part in the kernel too? The samples will be tagged with the guest-name (and some additional information perf needs). Perf userspace can access the symbols then through /sys/kvm/guest0/fs/... >> Depends on how it is designed. A filesystem approach was already >> mentioned. We could create /sys/kvm/ for example to expose information >> about virtual machines to userspace. This would not require any new >> security hooks. > > Who would set the security context on those files? An approach like: "The files are owned and only readable by the same user that started the vm." might be a good start. So a user can measure its own guests and root can measure all of them. 
> Plus, we need cgroup support so you can't see one container's guests > from an unrelated container. cgroup support is an issue but we can solve that too. It's in general still less complex than going through the whole libvirt-qemu-kvm stack. > Integration with qemu would allow perf to tell us that the guest is > hitting the interrupt status register of a virtio-blk device in pci > slot 5 (the information is already available through the kvm_mmio > trace event, but only qemu can decode it). Yeah, that would be interesting information. But it is more related to tracing than to pmu measurements. The information which you mentioned above is probably better captured by an extension of trace-events to userspace. Joerg ^ permalink raw reply [flat|nested] 390+ messages in thread
* Re: [RFC] Unify KVM kernel-space and user-space code into a single project 2010-03-24 13:46 ` Joerg Roedel @ 2010-03-24 13:57 ` Avi Kivity 2010-03-24 15:01 ` Joerg Roedel 0 siblings, 1 reply; 390+ messages in thread From: Avi Kivity @ 2010-03-24 13:57 UTC (permalink / raw) To: Joerg Roedel Cc: Anthony Liguori, Ingo Molnar, Pekka Enberg, Zhang, Yanmin, Peter Zijlstra, Sheng Yang, linux-kernel, kvm, Marcelo Tosatti, Jes Sorensen, Gleb Natapov, ziteng.huang, Arnaldo Carvalho de Melo, Fr?d?ric Weisbecker, Gregory Haskins On 03/24/2010 03:46 PM, Joerg Roedel wrote: > On Wed, Mar 24, 2010 at 03:05:02PM +0200, Avi Kivity wrote: > >> On 03/24/2010 02:50 PM, Joerg Roedel wrote: >> > >>> I don't want the tool for myself only. A typical perf user expects that >>> it works transparent. >>> >> A typical kvm user uses libvirt, so we can integrate it with that. >> > Someone who uses libvirt and virt-manager by default is probably not > interested in this feature at the same level a kvm developer is. And > developers tend not to use libvirt for low-level kvm development. A > number of developers have stated in this thread already that they would > appreciate a solution for guest enumeration that would not involve > libvirt. > So would I. But when I weigh the benefit of truly transparent system-wide perf integration for users who don't use libvirt but do use perf, versus the cost of transforming kvm from a single-process API to a system-wide API with all the complications that I've listed, it comes out in favour of not adding the API. Those few users can probably script something to cover their needs. >> Someone needs to know about the new guest to fetch its symbols. Or do >> you want that part in the kernel too? >> > The samples will be tagged with the guest-name (and some additional > information perf needs). Perf userspace can access the symbols then > through /sys/kvm/guest0/fs/... > I take that as a yes? 
So we need a virtio-serial client in the kernel (which might be exploitable by a malicious guest if buggy) and a fs-over-virtio-serial client in the kernel (also exploitable). >>> Depends on how it is designed. A filesystem approach was already >>> mentioned. We could create /sys/kvm/ for example to expose information >>> about virtual machines to userspace. This would not require any new >>> security hooks. >>> >> Who would set the security context on those files? >> > An approach like: "The files are owned and only readable by the same > user that started the vm." might be a good start. So a user can measure > its own guests and root can measure all of them. > That's not how sVirt works. sVirt isolates a user's VMs from each other, so if a guest breaks into qemu it can't break into other guests owned by the same user. The users who need this API (!libvirt and perf) probably don't care about sVirt, but a new API must not break it. >> Plus, we need cgroup support so you can't see one container's guests >> from an unrelated container. >> > cgroup support is an issue but we can solve that too. Its in general > still less complex than going through the whole libvirt-qemu-kvm stack. > It's a tradeoff. IMO, going through qemu is the better way, and also provides more information. >> Integration with qemu would allow perf to tell us that the guest is >> hitting the interrupt status register of a virtio-blk device in pci >> slot 5 (the information is already available through the kvm_mmio >> trace event, but only qemu can decode it). >> > Yeah that would be interesting information. But it is more related to > tracing than to pmu measurements. > The information which you mentioned above are probably better > captured by an extension of trace-events to userspace. > It's all related. You start with perf, see a problem with mmio, call up a histogram of mmio or interrupts or whatever, then zoom in on the misbehaving device. 
-- error compiling committee.c: too many arguments to function ^ permalink raw reply [flat|nested] 390+ messages in thread
* Re: [RFC] Unify KVM kernel-space and user-space code into a single project 2010-03-24 13:57 ` Avi Kivity @ 2010-03-24 15:01 ` Joerg Roedel 2010-03-24 15:12 ` Avi Kivity ` (2 more replies) 0 siblings, 3 replies; 390+ messages in thread From: Joerg Roedel @ 2010-03-24 15:01 UTC (permalink / raw) To: Avi Kivity Cc: Anthony Liguori, Ingo Molnar, Pekka Enberg, Zhang, Yanmin, Peter Zijlstra, Sheng Yang, linux-kernel, kvm, Marcelo Tosatti, Jes Sorensen, Gleb Natapov, ziteng.huang, Arnaldo Carvalho de Melo, Frédéric Weisbecker, Gregory Haskins On Wed, Mar 24, 2010 at 03:57:39PM +0200, Avi Kivity wrote: > On 03/24/2010 03:46 PM, Joerg Roedel wrote: >> Someone who uses libvirt and virt-manager by default is probably not >> interested in this feature at the same level a kvm developer is. And >> developers tend not to use libvirt for low-level kvm development. A >> number of developers have stated in this thread already that they would >> appreciate a solution for guest enumeration that would not involve >> libvirt. > > So would I. Great. > But when I weigh the benefit of truly transparent system-wide perf > integration for users who don't use libvirt but do use perf, versus > the cost of transforming kvm from a single-process API to a > system-wide API with all the complications that I've listed, it comes > out in favour of not adding the API. It's not a transformation, it's an extension. The current per-process /dev/kvm stays mostly untouched. It's all about having something like this: $ cd /sys/kvm/guest0 $ ls -l -r-------- 1 root root 0 2009-08-17 12:05 name dr-x------ 1 root root 0 2009-08-17 12:05 fs $ cat name guest0 $ # ... The fs/ directory is used as the mount point for the guest root fs. >> The samples will be tagged with the guest-name (and some additional >> information perf needs). Perf userspace can access the symbols then >> through /sys/kvm/guest0/fs/... > > I take that as a yes? 
So we need a virtio-serial client in the kernel > (which might be exploitable by a malicious guest if buggy) and a > fs-over-virtio-serial client in the kernel (also exploitable). What I meant was: perf-kernel puts the guest-name into every sample and perf-userspace accesses /sys/kvm/guest_name/fs/ later to resolve the symbols. I leave the question of how the guest-fs is exposed to the host out of this discussion. We should discuss this separately. >> An approach like: "The files are owned and only readable by the same >> user that started the vm." might be a good start. So a user can measure >> its own guests and root can measure all of them. > > That's not how sVirt works. sVirt isolates a user's VMs from each > other, so if a guest breaks into qemu it can't break into other guests > owned by the same user. If a vm breaks into qemu it can access the host file system, which is the bigger problem. In this case there is no isolation anymore. From that context it can even kill other VMs of the same user independent of a hypothetical /sys/kvm/. >> Yeah that would be interesting information. But it is more related to >> tracing than to pmu measurements. The information which you >> mentioned above are probably better captured by an extension of >> trace-events to userspace. > > It's all related. You start with perf, see a problem with mmio, call up > a histogram of mmio or interrupts or whatever, then zoom in on the > misbehaving device. Yes, but it's different from the implementation point-of-view. For the user it surely all plays together. Joerg ^ permalink raw reply [flat|nested] 390+ messages in thread
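The symbol-resolution step that Joerg describes (perf userspace mapping a sampled guest instruction pointer to a symbol via the guest's kallsyms, read through an exported guest fs root) can be sketched as follows. The `/sys/kvm/.../fs/proc/kallsyms` path is the hypothetical interface from the thread; the kallsyms line format itself is the standard one:

```python
import bisect
import io

def load_kallsyms(f):
    """Parse 'addr type name' kallsyms lines into a sorted (addr, name) table."""
    table = []
    for line in f:
        addr, _type, name = line.split()[:3]
        table.append((int(addr, 16), name))
    table.sort()
    return table

def resolve(table, ip):
    """Map a sampled instruction pointer to the nearest preceding symbol."""
    i = bisect.bisect_right([a for a, _ in table], ip) - 1
    return table[i][1] if i >= 0 else None

# Synthetic stand-in for /sys/kvm/guest0/fs/proc/kallsyms (path hypothetical)
sample = io.StringIO(
    "ffffffff81000000 T _text\n"
    "ffffffff81234560 T __ticket_spin_lock\n"
    "ffffffff81234700 T tcp_recvmsg\n"
)
syms = load_kallsyms(sample)
print(resolve(syms, 0xffffffff81234578))  # __ticket_spin_lock
```

This is essentially what `perf kvm` already does with the `--guestkallsyms` file from the patch at the top of the thread; the disagreement here is only about where that file comes from.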
* Re: [RFC] Unify KVM kernel-space and user-space code into a single project 2010-03-24 15:01 ` Joerg Roedel @ 2010-03-24 15:12 ` Avi Kivity 2010-03-24 15:46 ` Joerg Roedel 2010-03-24 15:26 ` Daniel P. Berrange 2010-03-24 16:03 ` Peter Zijlstra 2 siblings, 1 reply; 390+ messages in thread From: Avi Kivity @ 2010-03-24 15:12 UTC (permalink / raw) To: Joerg Roedel Cc: Anthony Liguori, Ingo Molnar, Pekka Enberg, Zhang, Yanmin, Peter Zijlstra, Sheng Yang, linux-kernel, kvm, Marcelo Tosatti, Jes Sorensen, Gleb Natapov, ziteng.huang, Arnaldo Carvalho de Melo, Fr?d?ric Weisbecker, Gregory Haskins On 03/24/2010 05:01 PM, Joerg Roedel wrote: > >> But when I weigh the benefit of truly transparent system-wide perf >> integration for users who don't use libvirt but do use perf, versus >> the cost of transforming kvm from a single-process API to a >> system-wide API with all the complications that I've listed, it comes >> out in favour of not adding the API. >> > Its not a transformation, its an extension. The current per-process > /dev/kvm stays mostly untouched. Its all about having something like > this: > > $ cd /sys/kvm/guest0 > $ ls -l > -r-------- 1 root root 0 2009-08-17 12:05 name > dr-x------ 1 root root 0 2009-08-17 12:05 fs > $ cat name > guest0 > $ # ... > > The fs/ directory is used as the mount point for the guest root fs. > The problem is /sys/kvm, not /sys/kvm/fs. >>> The samples will be tagged with the guest-name (and some additional >>> information perf needs). Perf userspace can access the symbols then >>> through /sys/kvm/guest0/fs/... >>> >> I take that as a yes? So we need a virtio-serial client in the kernel >> (which might be exploitable by a malicious guest if buggy) and a >> fs-over-virtio-serial client in the kernel (also exploitable). >> > What I meant was: perf-kernel puts the guest-name into every sample and > perf-userspace accesses /sys/kvm/guest_name/fs/ later to resolve the > symbols. 
I leave the question of how the guest-fs is exposed to the host > out of this discussion. We should discuss this separately. > How I see it: perf-kernel puts the guest pid into every sample, and perf-userspace uses that to resolve to a mountpoint served by fuse, or to a unix domain socket that serves the files. >>> An approach like: "The files are owned and only readable by the same >>> user that started the vm." might be a good start. So a user can measure >>> its own guests and root can measure all of them. >>> >> That's not how sVirt works. sVirt isolates a user's VMs from each >> other, so if a guest breaks into qemu it can't break into other guests >> owned by the same user. >> > If a vm breaks into qemu it can access the host file system which is the > bigger problem. In this case there is no isolation anymore. From that > context it can even kill other VMs of the same user independent of a > hypothetical /sys/kvm/. > It cannot. sVirt labels the disk image and other files qemu needs with the appropriate label, and everything else is off limits. Even if you run the guest as root, it won't have access to other files. >>> Yeah that would be interesting information. But it is more related to >>> tracing than to pmu measurements. The information which you >>> mentioned above are probably better captured by an extension of >>> trace-events to userspace. >>> >> It's all related. You start with perf, see a problem with mmio, call up >> a histogram of mmio or interrupts or whatever, then zoom in on the >> misbehaving device. >> > Yes, but it's different from the implementation point-of-view. For the > user it surely all plays together. > We need qemu to cooperate for mmio tracing, and we can cooperate with qemu for symbol resolution. If it prevents adding another kernel API, that's a win from my POV. -- error compiling committee.c: too many arguments to function ^ permalink raw reply [flat|nested] 390+ messages in thread
* Re: [RFC] Unify KVM kernel-space and user-space code into a single project 2010-03-24 15:12 ` Avi Kivity @ 2010-03-24 15:46 ` Joerg Roedel 2010-03-24 15:49 ` Avi Kivity 0 siblings, 1 reply; 390+ messages in thread From: Joerg Roedel @ 2010-03-24 15:46 UTC (permalink / raw) To: Avi Kivity Cc: Anthony Liguori, Ingo Molnar, Pekka Enberg, Zhang, Yanmin, Peter Zijlstra, Sheng Yang, linux-kernel, kvm, Marcelo Tosatti, Jes Sorensen, Gleb Natapov, ziteng.huang, Arnaldo Carvalho de Melo, Fr?d?ric Weisbecker, Gregory Haskins On Wed, Mar 24, 2010 at 05:12:55PM +0200, Avi Kivity wrote: > On 03/24/2010 05:01 PM, Joerg Roedel wrote: >> $ cd /sys/kvm/guest0 >> $ ls -l >> -r-------- 1 root root 0 2009-08-17 12:05 name >> dr-x------ 1 root root 0 2009-08-17 12:05 fs >> $ cat name >> guest0 >> $ # ... >> >> The fs/ directory is used as the mount point for the guest root fs. > > The problem is /sys/kvm, not /sys/kvm/fs. I am not tied to /sys/kvm. We could also use /proc/<pid>/kvm/ for example. This would keep anything in the process space (except for the global list of VMs which we should have anyway). >> What I meant was: perf-kernel puts the guest-name into every sample and >> perf-userspace accesses /sys/kvm/guest_name/fs/ later to resolve the >> symbols. I leave the question of how the guest-fs is exposed to the host >> out of this discussion. We should discuss this seperatly. > > How I see it: perf-kernel puts the guest pid into every sample, and > perf-userspace uses that to resolve to a mountpoint served by fuse, or > to a unix domain socket that serves the files. We need a bit more information than just the qemu-pid, but yes, this would also work out. >> If a vm breaks into qemu it can access the host file system which is the >> bigger problem. In this case there is no isolation anymore. From that >> context it can even kill other VMs of the same user independent of a >> hypothetical /sys/kvm/. > > It cannot. 
sVirt labels the disk image and other files qemu needs with > the appropriate label, and everything else is off limits. Even if you > run the guest as root, it won't have access to other files. See my reply to Daniel's email. >> Yes, but its different from the implementation point-of-view. For the >> user it surely all plays together. > > We need qemu to cooperate for mmio tracing, and we can cooperate with > qemu for symbol resolution. If it prevents adding another kernel API, > that's a win from my POV. That's true. Probably qemu can inject this information in the kvm-trace-events stream. Joerg ^ permalink raw reply [flat|nested] 390+ messages in thread
* Re: [RFC] Unify KVM kernel-space and user-space code into a single project 2010-03-24 15:46 ` Joerg Roedel @ 2010-03-24 15:49 ` Avi Kivity 2010-03-24 15:59 ` Joerg Roedel 0 siblings, 1 reply; 390+ messages in thread From: Avi Kivity @ 2010-03-24 15:49 UTC (permalink / raw) To: Joerg Roedel Cc: Anthony Liguori, Ingo Molnar, Pekka Enberg, Zhang, Yanmin, Peter Zijlstra, Sheng Yang, linux-kernel, kvm, Marcelo Tosatti, Jes Sorensen, Gleb Natapov, ziteng.huang, Arnaldo Carvalho de Melo, Fr?d?ric Weisbecker, Gregory Haskins On 03/24/2010 05:46 PM, Joerg Roedel wrote: > On Wed, Mar 24, 2010 at 05:12:55PM +0200, Avi Kivity wrote: > >> On 03/24/2010 05:01 PM, Joerg Roedel wrote: >> >>> $ cd /sys/kvm/guest0 >>> $ ls -l >>> -r-------- 1 root root 0 2009-08-17 12:05 name >>> dr-x------ 1 root root 0 2009-08-17 12:05 fs >>> $ cat name >>> guest0 >>> $ # ... >>> >>> The fs/ directory is used as the mount point for the guest root fs. >>> >> The problem is /sys/kvm, not /sys/kvm/fs. >> > I am not tied to /sys/kvm. We could also use /proc/<pid>/kvm/ for > example. This would keep anything in the process space (except for the > global list of VMs which we should have anyway). > How about ~/.qemu/guests/$pid? -- error compiling committee.c: too many arguments to function ^ permalink raw reply [flat|nested] 390+ messages in thread
* Re: [RFC] Unify KVM kernel-space and user-space code into a single project 2010-03-24 15:49 ` Avi Kivity @ 2010-03-24 15:59 ` Joerg Roedel 2010-03-24 16:09 ` Avi Kivity 0 siblings, 1 reply; 390+ messages in thread From: Joerg Roedel @ 2010-03-24 15:59 UTC (permalink / raw) To: Avi Kivity Cc: Anthony Liguori, Ingo Molnar, Pekka Enberg, Zhang, Yanmin, Peter Zijlstra, Sheng Yang, linux-kernel, kvm, Marcelo Tosatti, Jes Sorensen, Gleb Natapov, ziteng.huang, Arnaldo Carvalho de Melo, Frédéric Weisbecker, Gregory Haskins On Wed, Mar 24, 2010 at 05:49:42PM +0200, Avi Kivity wrote: > On 03/24/2010 05:46 PM, Joerg Roedel wrote: >> On Wed, Mar 24, 2010 at 05:12:55PM +0200, Avi Kivity wrote: >> >>> On 03/24/2010 05:01 PM, Joerg Roedel wrote: >>> >>>> $ cd /sys/kvm/guest0 >>>> $ ls -l >>>> -r-------- 1 root root 0 2009-08-17 12:05 name >>>> dr-x------ 1 root root 0 2009-08-17 12:05 fs >>>> $ cat name >>>> guest0 >>>> $ # ... >>>> >>>> The fs/ directory is used as the mount point for the guest root fs. >>>> >>> The problem is /sys/kvm, not /sys/kvm/fs. >>> >> I am not tied to /sys/kvm. We could also use /proc/<pid>/kvm/ for >> example. This would keep anything in the process space (except for the >> global list of VMs which we should have anyway). >> > > How about ~/.qemu/guests/$pid? That makes it hard for perf to find it and even harder to get a list of all VMs. With /proc/<pid>/kvm/guest we could symlink all guest directories to /proc/kvm/ and perf reads the list from there. Also perf can easily derive the directory for a guest from its pid. Last but not least it's kernel-created and thus independent from the userspace part being used. Joerg ^ permalink raw reply [flat|nested] 390+ messages in thread
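The enumeration Joerg has in mind with a /proc/<pid>/kvm/ layout is easy to picture: scan the numeric pid directories and pick out the ones exposing a kvm subdirectory. A Python sketch, demonstrated against a scratch tree standing in for /proc (the kvm/name file is the hypothetical interface from the thread, not an existing kernel feature):

```python
import os
import tempfile

def list_kvm_guests(proc_root="/proc"):
    """Enumerate guests as perf could with a /proc/<pid>/kvm/ layout:
    each guest shows up under its qemu pid, with a 'name' file."""
    guests = {}
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue
        name_file = os.path.join(proc_root, entry, "kvm", "name")
        if os.path.exists(name_file):
            with open(name_file) as f:
                guests[int(entry)] = f.read().strip()
    return guests

# Demonstrate against a temporary tree instead of the real /proc
root = tempfile.mkdtemp()
os.makedirs(os.path.join(root, "4242", "kvm"))
with open(os.path.join(root, "4242", "kvm", "name"), "w") as f:
    f.write("guest0\n")
print(list_kvm_guests(root))  # {4242: 'guest0'}
```

The appeal of this layout, as the mail argues, is that the directory for a guest is derivable from its pid alone and is created by the kernel, so it works the same for qemu, libvirt, or any other userspace.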
* Re: [RFC] Unify KVM kernel-space and user-space code into a single project 2010-03-24 15:59 ` Joerg Roedel @ 2010-03-24 16:09 ` Avi Kivity 2010-03-24 16:40 ` Joerg Roedel 2010-03-24 17:47 ` Arnaldo Carvalho de Melo 0 siblings, 2 replies; 390+ messages in thread From: Avi Kivity @ 2010-03-24 16:09 UTC (permalink / raw) To: Joerg Roedel Cc: Anthony Liguori, Ingo Molnar, Pekka Enberg, Zhang, Yanmin, Peter Zijlstra, Sheng Yang, linux-kernel, kvm, Marcelo Tosatti, Jes Sorensen, Gleb Natapov, ziteng.huang, Arnaldo Carvalho de Melo, Fr?d?ric Weisbecker, Gregory Haskins On 03/24/2010 05:59 PM, Joerg Roedel wrote: > > >>> I am not tied to /sys/kvm. We could also use /proc/<pid>/kvm/ for >>> example. This would keep anything in the process space (except for the >>> global list of VMs which we should have anyway). >>> >>> >> How about ~/.qemu/guests/$pid? >> > That makes it hard for perf to find it and even harder to get a list of > all VMs. Looks trivial to find a guest, less so with enumerating (still doable). > With /proc/<pid>/kvm/guest we could symlink all guest > directories to /proc/kvm/ and perf reads the list from there. Also perf > can easily derive the directory for a guest from its pid. > Last but not least its kernel-created and thus independent from the > userspace part being used. > Doesn't perf already has a dependency on naming conventions for finding debug information? -- error compiling committee.c: too many arguments to function ^ permalink raw reply [flat|nested] 390+ messages in thread
* Re: [RFC] Unify KVM kernel-space and user-space code into a single project 2010-03-24 16:09 ` Avi Kivity @ 2010-03-24 16:40 ` Joerg Roedel 2010-03-24 16:47 ` Avi Kivity 2010-03-24 17:47 ` Arnaldo Carvalho de Melo 1 sibling, 1 reply; 390+ messages in thread From: Joerg Roedel @ 2010-03-24 16:40 UTC (permalink / raw) To: Avi Kivity Cc: Anthony Liguori, Ingo Molnar, Pekka Enberg, Zhang, Yanmin, Peter Zijlstra, Sheng Yang, linux-kernel, kvm, Marcelo Tosatti, Jes Sorensen, Gleb Natapov, ziteng.huang, Arnaldo Carvalho de Melo, Fr?d?ric Weisbecker, Gregory Haskins On Wed, Mar 24, 2010 at 06:09:30PM +0200, Avi Kivity wrote: > On 03/24/2010 05:59 PM, Joerg Roedel wrote: >> >> >>>> I am not tied to /sys/kvm. We could also use /proc/<pid>/kvm/ for >>>> example. This would keep anything in the process space (except for the >>>> global list of VMs which we should have anyway). >>>> >>>> >>> How about ~/.qemu/guests/$pid? >>> >> That makes it hard for perf to find it and even harder to get a list of >> all VMs. > > Looks trivial to find a guest, less so with enumerating (still doable). Not so trival and even more likely to break. Even it perf has the pid of the process and wants to find the directory it has to do: 1. Get the uid of the process 2. Find the username for the uid 3. Use the username to find the home-directory Steps 2. and 3. need nsswitch and/or pam access to get this information from whatever source the admin has configured. And depending on what the source is it may be temporarily unavailable causing nasty timeouts. In short, there are many weak parts in that chain making it more likely to break. A kernel-based approach with /proc/<pid>/kvm does not have those issues (and to repeat myself, it is independent from the userspace being used). Joerg ^ permalink raw reply [flat|nested] 390+ messages in thread
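The three-step chain Joerg describes can be made concrete. This is a sketch of what perf would have to do to locate Avi's suggested `~/.qemu/guests/$pid` layout (a hypothetical path, per the preceding messages); the `pwd.getpwuid()` call is exactly the step that goes through nsswitch/NSS and can block on a misconfigured or unreachable directory service:

```python
import os
import pwd

def home_dir_of(pid):
    """Resolve a process's home directory the way perf would have to:
    pid -> uid (stat /proc/<pid>), uid -> passwd entry (via nsswitch/NSS),
    passwd entry -> home directory. Each step can fail or block."""
    uid = os.stat("/proc/%d" % pid).st_uid   # step 1: owner of the process
    entry = pwd.getpwuid(uid)                # steps 2+3: may hit LDAP/NIS/...
    return entry.pw_dir

def qemu_guest_dir(pid):
    """The ~/.qemu/guests/<pid> location suggested earlier (hypothetical)."""
    return os.path.join(home_dir_of(pid), ".qemu", "guests", str(pid))
```

A `KeyError` from `getpwuid` (no passwd entry) or a hang inside NSS would surface in the middle of a profiling session, which is the fragility being argued about.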
* Re: [RFC] Unify KVM kernel-space and user-space code into a single project 2010-03-24 16:40 ` Joerg Roedel @ 2010-03-24 16:47 ` Avi Kivity 2010-03-24 16:52 ` Avi Kivity 2010-04-08 14:29 ` Antoine Martin 0 siblings, 2 replies; 390+ messages in thread From: Avi Kivity @ 2010-03-24 16:47 UTC (permalink / raw) To: Joerg Roedel Cc: Anthony Liguori, Ingo Molnar, Pekka Enberg, Zhang, Yanmin, Peter Zijlstra, Sheng Yang, linux-kernel, kvm, Marcelo Tosatti, Jes Sorensen, Gleb Natapov, ziteng.huang, Arnaldo Carvalho de Melo, Fr?d?ric Weisbecker, Gregory Haskins On 03/24/2010 06:40 PM, Joerg Roedel wrote: > >> Looks trivial to find a guest, less so with enumerating (still doable). >> > Not so trival and even more likely to break. Even it perf has the pid of > the process and wants to find the directory it has to do: > > 1. Get the uid of the process > 2. Find the username for the uid > 3. Use the username to find the home-directory > > Steps 2. and 3. need nsswitch and/or pam access to get this information > from whatever source the admin has configured. And depending on what the > source is it may be temporarily unavailable causing nasty timeouts. In > short, there are many weak parts in that chain making it more likely to > break. > It's true. If the kernel provides something, there are fewer things that can break. But if your system is so broken that you can't resolve uids, fix that before running perf. Must we design perf for that case? After all, 'ls -l' will break under the same circumstances. It's hard to imagine doing useful work when that doesn't work. > A kernel-based approach with /proc/<pid>/kvm does not have those issues > (and to repeat myself, it is independent from the userspace being used). > It has other issues, which are IMO more problematic. -- error compiling committee.c: too many arguments to function ^ permalink raw reply [flat|nested] 390+ messages in thread
* Re: [RFC] Unify KVM kernel-space and user-space code into a single project 2010-03-24 16:47 ` Avi Kivity @ 2010-03-24 16:52 ` Avi Kivity 2010-04-08 14:29 ` Antoine Martin 1 sibling, 0 replies; 390+ messages in thread From: Avi Kivity @ 2010-03-24 16:52 UTC (permalink / raw) To: Joerg Roedel Cc: Anthony Liguori, Ingo Molnar, Pekka Enberg, Zhang, Yanmin, Peter Zijlstra, Sheng Yang, linux-kernel, kvm, Marcelo Tosatti, Jes Sorensen, Gleb Natapov, ziteng.huang, Arnaldo Carvalho de Melo, Fr?d?ric Weisbecker, Gregory Haskins On 03/24/2010 06:47 PM, Avi Kivity wrote: > > It's true. If the kernel provides something, there are fewer things > that can break. But if your system is so broken that you can't > resolve uids, fix that before running perf. Must we design perf for > that case? > > After all, 'ls -l' will break under the same circumstances. It's hard > to imagine doing useful work when that doesn't work. Also, perf itself will hang if it needs to access a file using autofs or nfs, and those are broken. -- error compiling committee.c: too many arguments to function ^ permalink raw reply [flat|nested] 390+ messages in thread
* Re: [RFC] Unify KVM kernel-space and user-space code into a single project 2010-03-24 16:47 ` Avi Kivity 2010-03-24 16:52 ` Avi Kivity @ 2010-04-08 14:29 ` Antoine Martin 1 sibling, 0 replies; 390+ messages in thread From: Antoine Martin @ 2010-04-08 14:29 UTC (permalink / raw) To: Avi Kivity Cc: Joerg Roedel, Anthony Liguori, Ingo Molnar, Pekka Enberg, Zhang, Yanmin, Peter Zijlstra, Sheng Yang, linux-kernel, kvm, Marcelo Tosatti, Jes Sorensen, Gleb Natapov, ziteng.huang, Arnaldo Carvalho de Melo, Fr?d?ric Weisbecker, Gregory Haskins Avi Kivity wrote: > On 03/24/2010 06:40 PM, Joerg Roedel wrote: >> >>> Looks trivial to find a guest, less so with enumerating (still doable). >>> >> Not so trival and even more likely to break. Even it perf has the pid of >> the process and wants to find the directory it has to do: >> >> 1. Get the uid of the process >> 2. Find the username for the uid >> 3. Use the username to find the home-directory >> >> Steps 2. and 3. need nsswitch and/or pam access to get this information >> from whatever source the admin has configured. And depending on what the >> source is it may be temporarily unavailable causing nasty timeouts. In >> short, there are many weak parts in that chain making it more likely to >> break. >> > > It's true. If the kernel provides something, there are fewer things > that can break. But if your system is so broken that you can't resolve > uids, fix that before running perf. Must we design perf for that case? uid to username can fail when using chroots, or worse point to an incorrect location (and yes, I do use this) Sorry if this has been covered / discussion has moved on. Just catching up with the 500+ messages in my inbox.. Antoine > > After all, 'ls -l' will break under the same circumstances. It's hard > to imagine doing useful work when that doesn't work. > >> A kernel-based approach with /proc/<pid>/kvm does not have those issues >> (and to repeat myself, it is independent from the userspace being used). 
>> > > It has other issues, which are IMO more problematic. > ^ permalink raw reply [flat|nested] 390+ messages in thread
* Re: [RFC] Unify KVM kernel-space and user-space code into a single project 2010-03-24 16:09 ` Avi Kivity 2010-03-24 16:40 ` Joerg Roedel @ 2010-03-24 17:47 ` Arnaldo Carvalho de Melo 2010-03-24 18:20 ` Avi Kivity 1 sibling, 1 reply; 390+ messages in thread From: Arnaldo Carvalho de Melo @ 2010-03-24 17:47 UTC (permalink / raw) To: Avi Kivity Cc: Joerg Roedel, Anthony Liguori, Ingo Molnar, Pekka Enberg, Zhang, Yanmin, Peter Zijlstra, Sheng Yang, linux-kernel, kvm, Marcelo Tosatti, Jes Sorensen, Gleb Natapov, ziteng.huang, Fr?d?ric Weisbecker, Gregory Haskins Em Wed, Mar 24, 2010 at 06:09:30PM +0200, Avi Kivity escreveu: > Doesn't perf already has a dependency on naming conventions for finding > debug information? It looks at several places, from most symbol rich (/usr/lib/debug/, aka -debuginfo packages, where we have full symtabs) to poorest (the packaged binary, where we may just have a .dynsym). In an ideal world, it would just get the build-id (a SHA1 cookie that is in an ELF session inserted in every binary (aka DSOs), kernel module, kallsyms or vmlinux file) and use that to look first in a local cache (implemented in perf for a long time already) or in some symbol server. 
For instance, for a random perf.data file I collected here in my machine I have: [acme@doppio linux-2.6-tip]$ perf buildid-list | grep libpthread 5c68f7afeb33309c78037e374b0deee84dd441f6 /lib64/libpthread-2.10.2.so [acme@doppio linux-2.6-tip]$ So I don't have to access /lib64/libpthread-2.10.2.so directly, nor some convention to get a debuginfo in a local file like: /usr/lib/debug/lib64/libpthread-2.10.2.so.debug Instead the tools look at: [acme@doppio linux-2.6-tip]$ l ~/.debug/.build-id/5c/68f7afeb33309c78037e374b0deee84dd441f6 lrwxrwxrwx 1 acme acme 73 2010-01-06 18:53 /home/acme/.debug/.build-id/5c/68f7afeb33309c78037e374b0deee84dd441f6 -> ../../lib64/libpthread-2.10.2.so/5c68f7afeb33309c78037e374b0deee84dd441f6* To find the file for that specific build-id, not the one installed in my machine (or on the different machine, of a different architecture) that may be completely unrelated, a new one, or one for a different arch. - Arnaldo ^ permalink raw reply [flat|nested] 390+ messages in thread
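The cache layout Arnaldo shows — `~/.debug/.build-id/<first two hex chars>/<remaining 38>` symlinked to `~/.debug/<dso path>/<full build-id>` — is mechanical enough to sketch. This reproduces the path construction only; the exact on-disk details are perf-version dependent:

```python
import os

def buildid_cache_path(build_id, dso_path,
                       debug_root=os.path.expanduser("~/.debug")):
    """Compute (symlink, target) for perf's local build-id cache:
    the link lives at <root>/.build-id/<2 hex chars>/<38 hex chars>
    and points into <root>/<dso path>/<full 40-char build-id>."""
    link = os.path.join(debug_root, ".build-id", build_id[:2], build_id[2:])
    target = os.path.join(debug_root, dso_path.lstrip("/"), build_id)
    return link, target
```

Running it on the libpthread example from the message reproduces the paths shown there, which is why the lookup works for a binary from a different machine or architecture: the 40-hex-digit build-id, not the filesystem path, is the key.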
* Re: [RFC] Unify KVM kernel-space and user-space code into a single project 2010-03-24 17:47 ` Arnaldo Carvalho de Melo @ 2010-03-24 18:20 ` Avi Kivity 2010-03-24 18:27 ` Arnaldo Carvalho de Melo 2010-03-25 9:00 ` Zhang, Yanmin 0 siblings, 2 replies; 390+ messages in thread From: Avi Kivity @ 2010-03-24 18:20 UTC (permalink / raw) To: Arnaldo Carvalho de Melo Cc: Joerg Roedel, Anthony Liguori, Ingo Molnar, Pekka Enberg, Zhang, Yanmin, Peter Zijlstra, Sheng Yang, linux-kernel, kvm, Marcelo Tosatti, Jes Sorensen, Gleb Natapov, ziteng.huang, Fr?d?ric Weisbecker, Gregory Haskins On 03/24/2010 07:47 PM, Arnaldo Carvalho de Melo wrote: > Em Wed, Mar 24, 2010 at 06:09:30PM +0200, Avi Kivity escreveu: > >> Doesn't perf already has a dependency on naming conventions for finding >> debug information? >> > It looks at several places, from most symbol rich (/usr/lib/debug/, aka > -debuginfo packages, where we have full symtabs) to poorest (the > packaged binary, where we may just have a .dynsym). > > In an ideal world, it would just get the build-id (a SHA1 cookie that is > in an ELF session inserted in every binary (aka DSOs), kernel module, > kallsyms or vmlinux file) and use that to look first in a local cache > (implemented in perf for a long time already) or in some symbol server. 
> > For instance, for a random perf.data file I collected here in my machine > I have: > > [acme@doppio linux-2.6-tip]$ perf buildid-list | grep libpthread > 5c68f7afeb33309c78037e374b0deee84dd441f6 /lib64/libpthread-2.10.2.so > [acme@doppio linux-2.6-tip]$ > > So I don't have to access /lib64/libpthread-2.10.2.so directly, nor some > convention to get a debuginfo in a local file like: > > /usr/lib/debug/lib64/libpthread-2.10.2.so.debug > > Instead the tools look at: > > [acme@doppio linux-2.6-tip]$ l ~/.debug/.build-id/5c/68f7afeb33309c78037e374b0deee84dd441f6 > lrwxrwxrwx 1 acme acme 73 2010-01-06 18:53 /home/acme/.debug/.build-id/5c/68f7afeb33309c78037e374b0deee84dd441f6 -> ../../lib64/libpthread-2.10.2.so/5c68f7afeb33309c78037e374b0deee84dd441f6* > > To find the file for that specific build-id, not the one installed in my > machine (or on the different machine, of a different architecture) that > may be completely unrelated, a new one, or one for a different arch. > Thanks. I believe qemu could easily act as a symbol server for this use case. -- Do not meddle in the internals of kernels, for they are subtle and quick to panic. ^ permalink raw reply [flat|nested] 390+ messages in thread
* Re: [RFC] Unify KVM kernel-space and user-space code into a single project 2010-03-24 18:20 ` Avi Kivity @ 2010-03-24 18:27 ` Arnaldo Carvalho de Melo 2010-03-25 9:00 ` Zhang, Yanmin 1 sibling, 0 replies; 390+ messages in thread From: Arnaldo Carvalho de Melo @ 2010-03-24 18:27 UTC (permalink / raw) To: Avi Kivity Cc: Joerg Roedel, Anthony Liguori, Ingo Molnar, Pekka Enberg, Zhang, Yanmin, Peter Zijlstra, Sheng Yang, linux-kernel, kvm, Marcelo Tosatti, Jes Sorensen, Gleb Natapov, ziteng.huang, Fr?d?ric Weisbecker, Gregory Haskins Em Wed, Mar 24, 2010 at 08:20:10PM +0200, Avi Kivity escreveu: > On 03/24/2010 07:47 PM, Arnaldo Carvalho de Melo wrote: >> Em Wed, Mar 24, 2010 at 06:09:30PM +0200, Avi Kivity escreveu: >> >>> Doesn't perf already has a dependency on naming conventions for finding >>> debug information? >>> >> It looks at several places, from most symbol rich (/usr/lib/debug/, aka >> -debuginfo packages, where we have full symtabs) to poorest (the >> packaged binary, where we may just have a .dynsym). >> >> In an ideal world, it would just get the build-id (a SHA1 cookie that is >> in an ELF session inserted in every binary (aka DSOs), kernel module, >> kallsyms or vmlinux file) and use that to look first in a local cache >> (implemented in perf for a long time already) or in some symbol server. 
>> >> For instance, for a random perf.data file I collected here in my machine >> I have: >> >> [acme@doppio linux-2.6-tip]$ perf buildid-list | grep libpthread >> 5c68f7afeb33309c78037e374b0deee84dd441f6 /lib64/libpthread-2.10.2.so >> [acme@doppio linux-2.6-tip]$ >> >> So I don't have to access /lib64/libpthread-2.10.2.so directly, nor some >> convention to get a debuginfo in a local file like: >> >> /usr/lib/debug/lib64/libpthread-2.10.2.so.debug >> >> Instead the tools look at: >> >> [acme@doppio linux-2.6-tip]$ l ~/.debug/.build-id/5c/68f7afeb33309c78037e374b0deee84dd441f6 >> lrwxrwxrwx 1 acme acme 73 2010-01-06 18:53 /home/acme/.debug/.build-id/5c/68f7afeb33309c78037e374b0deee84dd441f6 -> ../../lib64/libpthread-2.10.2.so/5c68f7afeb33309c78037e374b0deee84dd441f6* >> >> To find the file for that specific build-id, not the one installed in my >> machine (or on the different machine, of a different architecture) that >> may be completely unrelated, a new one, or one for a different arch. > Thanks. I believe qemu could easily act as a symbol server for this use > case. Agreed, but it doesn't even have to :-) We just need to get the build-id in the PERF_RECORD_MMAP event somehow and then get this symbol from elsewhere, say the same DVD/RHN channel/Debian Repository/embedded developer toolkit image not stripped/whatever. Or it may already be in the local cache from last week's perf report session :-) - Arnaldo ^ permalink raw reply [flat|nested] 390+ messages in thread
* Re: [RFC] Unify KVM kernel-space and user-space code into a single project 2010-03-24 18:20 ` Avi Kivity 2010-03-24 18:27 ` Arnaldo Carvalho de Melo @ 2010-03-25 9:00 ` Zhang, Yanmin 1 sibling, 0 replies; 390+ messages in thread From: Zhang, Yanmin @ 2010-03-25 9:00 UTC (permalink / raw) To: Avi Kivity Cc: Arnaldo Carvalho de Melo, Joerg Roedel, Anthony Liguori, Ingo Molnar, Pekka Enberg, Peter Zijlstra, Sheng Yang, linux-kernel, kvm, Marcelo Tosatti, Jes Sorensen, Gleb Natapov, zhiteng.huang, Fr?d?ric Weisbecker, Gregory Haskins On Wed, 2010-03-24 at 20:20 +0200, Avi Kivity wrote: > On 03/24/2010 07:47 PM, Arnaldo Carvalho de Melo wrote: > > Em Wed, Mar 24, 2010 at 06:09:30PM +0200, Avi Kivity escreveu: > > > >> Doesn't perf already has a dependency on naming conventions for finding > >> debug information? > >> > > It looks at several places, from most symbol rich (/usr/lib/debug/, aka > > -debuginfo packages, where we have full symtabs) to poorest (the > > packaged binary, where we may just have a .dynsym). > > > > In an ideal world, it would just get the build-id (a SHA1 cookie that is > > in an ELF session inserted in every binary (aka DSOs), kernel module, > > kallsyms or vmlinux file) and use that to look first in a local cache > > (implemented in perf for a long time already) or in some symbol server. 
> > > > For instance, for a random perf.data file I collected here in my machine > > I have: > > > > [acme@doppio linux-2.6-tip]$ perf buildid-list | grep libpthread > > 5c68f7afeb33309c78037e374b0deee84dd441f6 /lib64/libpthread-2.10.2.so > > [acme@doppio linux-2.6-tip]$ > > > > So I don't have to access /lib64/libpthread-2.10.2.so directly, nor some > > convention to get a debuginfo in a local file like: > > > > /usr/lib/debug/lib64/libpthread-2.10.2.so.debug > > > > Instead the tools look at: > > > > [acme@doppio linux-2.6-tip]$ l ~/.debug/.build-id/5c/68f7afeb33309c78037e374b0deee84dd441f6 > > lrwxrwxrwx 1 acme acme 73 2010-01-06 18:53 /home/acme/.debug/.build-id/5c/68f7afeb33309c78037e374b0deee84dd441f6 -> ../../lib64/libpthread-2.10.2.so/5c68f7afeb33309c78037e374b0deee84dd441f6* > > > > To find the file for that specific build-id, not the one installed in my > > machine (or on the different machine, of a different architecture) that > > may be completely unrelated, a new one, or one for a different arch. > > > > Thanks. I believe qemu could easily act as a symbol server for this use > case. I spent a couple of days to investigate why sshfs/fuse doesn't work well with procfs and sysfs. Just after my patch against fuse is ready almost, I found fuse already supports such access by direct I/O. With parameter -o direct_io, it could work well. Here is an example to mount / from a guest os. #sshfs -p 5551 -o direct_io localhost:/ guestmount We can read files and write files if permission is ok. I will go ahead to support multiple guest os instance statistics parsing. Yanmin ^ permalink raw reply [flat|nested] 390+ messages in thread
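Yanmin's sshfs approach feeds into the `perf kvm` options shown at the top of this thread. A sketch of assembling that command line from a guest mount point — assuming, as in his example, that the guest's `/proc` is visible under the mount (the mount path and the choice of `/proc/kallsyms` over a copied file are assumptions, not something the message specifies):

```python
import os

def perf_kvm_cmd(guestmount, subcmd="top"):
    """Build the 'perf kvm' invocation from this thread, pointing
    --guestkallsyms/--guestmodules at the guest's own /proc files
    as exposed through an sshfs mount of the guest root."""
    kallsyms = os.path.join(guestmount, "proc", "kallsyms")
    modules = os.path.join(guestmount, "proc", "modules")
    return ["perf", "kvm", "--host", "--guest",
            "--guestkallsyms=" + kallsyms,
            "--guestmodules=" + modules,
            subcmd]
```

With the mount from the message (`sshfs -p 5551 -o direct_io localhost:/ guestmount`), `perf_kvm_cmd("guestmount")` yields the argument list for live guest-kernel profiling without copying kallsyms/modules out of the guest by hand.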
* Re: [RFC] Unify KVM kernel-space and user-space code into a single project 2010-03-24 15:01 ` Joerg Roedel 2010-03-24 15:12 ` Avi Kivity @ 2010-03-24 15:26 ` Daniel P. Berrange 2010-03-24 15:37 ` Joerg Roedel 2010-03-24 16:03 ` Peter Zijlstra 2 siblings, 1 reply; 390+ messages in thread From: Daniel P. Berrange @ 2010-03-24 15:26 UTC (permalink / raw) To: Joerg Roedel Cc: Avi Kivity, Anthony Liguori, Ingo Molnar, Pekka Enberg, Zhang, Yanmin, Peter Zijlstra, Sheng Yang, linux-kernel, kvm, Marcelo Tosatti, Jes Sorensen, Gleb Natapov, ziteng.huang, Arnaldo Carvalho de Melo, Fr?d?ric Weisbecker, Gregory Haskins On Wed, Mar 24, 2010 at 04:01:37PM +0100, Joerg Roedel wrote: > >> An approach like: "The files are owned and only readable by the same > >> user that started the vm." might be a good start. So a user can measure > >> its own guests and root can measure all of them. > > > > That's not how sVirt works. sVirt isolates a user's VMs from each > > other, so if a guest breaks into qemu it can't break into other guests > > owned by the same user. > > If a vm breaks into qemu it can access the host file system which is the > bigger problem. In this case there is no isolation anymore. From that > context it can even kill other VMs of the same user independent of a > hypothetical /sys/kvm/. No it can't. With sVirt every single VM has a custom security label and the policy only allows it access to disks / files with a matching label, and prevents it attacking any other VMs or processes on the host. THis confines the scope of any exploit in QEMU to those resources the admin has explicitly assigned to the guest. 
Regards, Daniel -- |: Red Hat, Engineering, London -o- http://people.redhat.com/berrange/ :| |: http://libvirt.org -o- http://virt-manager.org -o- http://deltacloud.org :| |: http://autobuild.org -o- http://search.cpan.org/~danberr/ :| |: GnuPG: 7D3B9505 -o- F3C9 553F A1DA 4AC2 5648 23C1 B3DF F742 7D3B 9505 :| ^ permalink raw reply [flat|nested] 390+ messages in thread
* Re: [RFC] Unify KVM kernel-space and user-space code into a single project 2010-03-24 15:26 ` Daniel P. Berrange @ 2010-03-24 15:37 ` Joerg Roedel 2010-03-24 15:43 ` Avi Kivity 0 siblings, 1 reply; 390+ messages in thread From: Joerg Roedel @ 2010-03-24 15:37 UTC (permalink / raw) To: Daniel P. Berrange Cc: Avi Kivity, Anthony Liguori, Ingo Molnar, Pekka Enberg, Zhang, Yanmin, Peter Zijlstra, Sheng Yang, linux-kernel, kvm, Marcelo Tosatti, Jes Sorensen, Gleb Natapov, ziteng.huang, Arnaldo Carvalho de Melo, Fr?d?ric Weisbecker, Gregory Haskins On Wed, Mar 24, 2010 at 03:26:53PM +0000, Daniel P. Berrange wrote: > On Wed, Mar 24, 2010 at 04:01:37PM +0100, Joerg Roedel wrote: > > >> An approach like: "The files are owned and only readable by the same > > >> user that started the vm." might be a good start. So a user can measure > > >> its own guests and root can measure all of them. > > > > > > That's not how sVirt works. sVirt isolates a user's VMs from each > > > other, so if a guest breaks into qemu it can't break into other guests > > > owned by the same user. > > > > If a vm breaks into qemu it can access the host file system which is the > > bigger problem. In this case there is no isolation anymore. From that > > context it can even kill other VMs of the same user independent of a > > hypothetical /sys/kvm/. > > No it can't. With sVirt every single VM has a custom security label and > the policy only allows it access to disks / files with a matching label, > and prevents it attacking any other VMs or processes on the host. THis > confines the scope of any exploit in QEMU to those resources the admin > has explicitly assigned to the guest. Even better. So a guest which breaks out can't even access its own /sys/kvm/ directory. Perfect, it doesn't need that access anyway. Joerg ^ permalink raw reply [flat|nested] 390+ messages in thread
* Re: [RFC] Unify KVM kernel-space and user-space code into a single project 2010-03-24 15:37 ` Joerg Roedel @ 2010-03-24 15:43 ` Avi Kivity 2010-03-24 15:50 ` Joerg Roedel 0 siblings, 1 reply; 390+ messages in thread From: Avi Kivity @ 2010-03-24 15:43 UTC (permalink / raw) To: Joerg Roedel Cc: Daniel P. Berrange, Anthony Liguori, Ingo Molnar, Pekka Enberg, Zhang, Yanmin, Peter Zijlstra, Sheng Yang, linux-kernel, kvm, Marcelo Tosatti, Jes Sorensen, Gleb Natapov, ziteng.huang, Arnaldo Carvalho de Melo, Fr?d?ric Weisbecker, Gregory Haskins On 03/24/2010 05:37 PM, Joerg Roedel wrote: > >> No it can't. With sVirt every single VM has a custom security label and >> the policy only allows it access to disks / files with a matching label, >> and prevents it attacking any other VMs or processes on the host. THis >> confines the scope of any exploit in QEMU to those resources the admin >> has explicitly assigned to the guest. >> > Even better. So a guest which breaks out can't even access its own > /sys/kvm/ directory. Perfect, it doesn't need that access anyway. > > But what security label does that directory have? How can we make sure that whoever needs access to those files, gets them? Automatically created objects don't work well with that model. They're simply missing information. -- error compiling committee.c: too many arguments to function ^ permalink raw reply [flat|nested] 390+ messages in thread
* Re: [RFC] Unify KVM kernel-space and user-space code into a single project 2010-03-24 15:43 ` Avi Kivity @ 2010-03-24 15:50 ` Joerg Roedel 2010-03-24 15:52 ` Avi Kivity 0 siblings, 1 reply; 390+ messages in thread From: Joerg Roedel @ 2010-03-24 15:50 UTC (permalink / raw) To: Avi Kivity Cc: Daniel P. Berrange, Anthony Liguori, Ingo Molnar, Pekka Enberg, Zhang, Yanmin, Peter Zijlstra, Sheng Yang, linux-kernel, kvm, Marcelo Tosatti, Jes Sorensen, Gleb Natapov, ziteng.huang, Arnaldo Carvalho de Melo, Fr?d?ric Weisbecker, Gregory Haskins On Wed, Mar 24, 2010 at 05:43:31PM +0200, Avi Kivity wrote: > On 03/24/2010 05:37 PM, Joerg Roedel wrote: >> Even better. So a guest which breaks out can't even access its own >> /sys/kvm/ directory. Perfect, it doesn't need that access anyway. > > But what security label does that directory have? How can we make sure > that whoever needs access to those files, gets them? > > Automatically created objects don't work well with that model. They're > simply missing information. If we go the /proc/<pid>/kvm way then the directory should probably inherit the label from /proc/<pid>/? Same could be applied to /sys/kvm/guest/ if we decide for it. The VM is still bound to a single process with a /proc/<pid> after all. Joerg ^ permalink raw reply [flat|nested] 390+ messages in thread
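The label-inheritance idea above builds on the fact that a process's security context is already exported by the kernel. A sketch of reading it — `/proc/<pid>/attr/current` is the real interface, but whether it holds anything useful depends on the LSM in use, so the function hedges for kernels without SELinux:

```python
def selinux_context(pid):
    """Read the security context a process runs under, or None if the
    kernel has no LSM attribute support or the file is unreadable.
    Under sVirt each qemu gets a distinct context, which is what a
    label-inheriting /proc/<pid>/kvm/ directory would reuse."""
    try:
        with open("/proc/%d/attr/current" % pid) as f:
            return f.read().rstrip("\x00\n") or None
    except OSError:
        return None
```

This is only the read side; the open policy question in the thread — which contexts a policy should then grant access to such a directory — stays in userspace either way.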
* Re: [RFC] Unify KVM kernel-space and user-space code into a single project 2010-03-24 15:50 ` Joerg Roedel @ 2010-03-24 15:52 ` Avi Kivity 2010-03-24 16:17 ` Joerg Roedel 0 siblings, 1 reply; 390+ messages in thread From: Avi Kivity @ 2010-03-24 15:52 UTC (permalink / raw) To: Joerg Roedel Cc: Daniel P. Berrange, Anthony Liguori, Ingo Molnar, Pekka Enberg, Zhang, Yanmin, Peter Zijlstra, Sheng Yang, linux-kernel, kvm, Marcelo Tosatti, Jes Sorensen, Gleb Natapov, ziteng.huang, Arnaldo Carvalho de Melo, Fr?d?ric Weisbecker, Gregory Haskins On 03/24/2010 05:50 PM, Joerg Roedel wrote: > On Wed, Mar 24, 2010 at 05:43:31PM +0200, Avi Kivity wrote: > >> On 03/24/2010 05:37 PM, Joerg Roedel wrote: >> >>> Even better. So a guest which breaks out can't even access its own >>> /sys/kvm/ directory. Perfect, it doesn't need that access anyway. >>> >> But what security label does that directory have? How can we make sure >> that whoever needs access to those files, gets them? >> >> Automatically created objects don't work well with that model. They're >> simply missing information. >> > If we go the /proc/<pid>/kvm way then the directory should probably > inherit the label from /proc/<pid>/? > That's a security policy. The security people like their policies outside the kernel. For example, they may want a label that allows a trace context to read the data, and also qemu itself for introspection. > Same could be applied to /sys/kvm/guest/ if we decide for it. The VM is > still bound to a single process with a /proc/<pid> after all. > Ditto. -- error compiling committee.c: too many arguments to function ^ permalink raw reply [flat|nested] 390+ messages in thread
* Re: [RFC] Unify KVM kernel-space and user-space code into a single project 2010-03-24 15:52 ` Avi Kivity @ 2010-03-24 16:17 ` Joerg Roedel 2010-03-24 16:20 ` Avi Kivity 0 siblings, 1 reply; 390+ messages in thread From: Joerg Roedel @ 2010-03-24 16:17 UTC (permalink / raw) To: Avi Kivity Cc: Daniel P. Berrange, Anthony Liguori, Ingo Molnar, Pekka Enberg, Zhang, Yanmin, Peter Zijlstra, Sheng Yang, linux-kernel, kvm, Marcelo Tosatti, Jes Sorensen, Gleb Natapov, ziteng.huang, Arnaldo Carvalho de Melo, Fr?d?ric Weisbecker, Gregory Haskins On Wed, Mar 24, 2010 at 05:52:54PM +0200, Avi Kivity wrote: > On 03/24/2010 05:50 PM, Joerg Roedel wrote: >> If we go the /proc/<pid>/kvm way then the directory should probably >> inherit the label from /proc/<pid>/? > > That's a security policy. The security people like their policies > outside the kernel. > > For example, they may want a label that allows a trace context to read > the data, and also qemu itself for introspection. Hm, I am not a security expert. But is this not only one entity more for sVirt to handle? I would leave that decision to the sVirt developers. Does attaching the same label as for the VM resources mean that root could not access it anymore? Joerg ^ permalink raw reply [flat|nested] 390+ messages in thread
* Re: [RFC] Unify KVM kernel-space and user-space code into a single project 2010-03-24 16:17 ` Joerg Roedel @ 2010-03-24 16:20 ` Avi Kivity 2010-03-24 16:31 ` Joerg Roedel 0 siblings, 1 reply; 390+ messages in thread From: Avi Kivity @ 2010-03-24 16:20 UTC (permalink / raw) To: Joerg Roedel Cc: Daniel P. Berrange, Anthony Liguori, Ingo Molnar, Pekka Enberg, Zhang, Yanmin, Peter Zijlstra, Sheng Yang, linux-kernel, kvm, Marcelo Tosatti, Jes Sorensen, Gleb Natapov, ziteng.huang, Arnaldo Carvalho de Melo, Fr?d?ric Weisbecker, Gregory Haskins On 03/24/2010 06:17 PM, Joerg Roedel wrote: > On Wed, Mar 24, 2010 at 05:52:54PM +0200, Avi Kivity wrote: > >> On 03/24/2010 05:50 PM, Joerg Roedel wrote: >> >>> If we go the /proc/<pid>/kvm way then the directory should probably >>> inherit the label from /proc/<pid>/? >>> >> That's a security policy. The security people like their policies >> outside the kernel. >> >> For example, they may want a label that allows a trace context to read >> the data, and also qemu itself for introspection. >> > Hm, I am not a security expert. I'm out of my depth here as well. > But is this not only one entity more for > sVirt to handle? I would leave that decision to the sVirt developers. > Does attaching the same label as for the VM resources mean that root > could not access it anymore? > IIUC processes run under a context, and there's a policy somewhere that tells you which context can access which label (and with what permissions). There was a server on the Internet once that gave you root access and invited you to attack it. No idea if anyone succeeded or not (I got bored after about a minute). So it depends on the policy. If you attach the same label, that means all files with the same label have the same access permissions. I think. -- error compiling committee.c: too many arguments to function ^ permalink raw reply [flat|nested] 390+ messages in thread
* Re: [RFC] Unify KVM kernel-space and user-space code into a single project 2010-03-24 16:20 ` Avi Kivity @ 2010-03-24 16:31 ` Joerg Roedel 2010-03-24 16:32 ` Avi Kivity 0 siblings, 1 reply; 390+ messages in thread From: Joerg Roedel @ 2010-03-24 16:31 UTC (permalink / raw) To: Avi Kivity Cc: Daniel P. Berrange, Anthony Liguori, Ingo Molnar, Pekka Enberg, Zhang, Yanmin, Peter Zijlstra, Sheng Yang, linux-kernel, kvm, Marcelo Tosatti, Jes Sorensen, Gleb Natapov, ziteng.huang, Arnaldo Carvalho de Melo, Fr?d?ric Weisbecker, Gregory Haskins On Wed, Mar 24, 2010 at 06:20:38PM +0200, Avi Kivity wrote: > On 03/24/2010 06:17 PM, Joerg Roedel wrote: >> But is this not only one entity more for >> sVirt to handle? I would leave that decision to the sVirt developers. >> Does attaching the same label as for the VM resources mean that root >> could not access it anymore? >> > > IIUC processes run under a context, and there's a policy somewhere that > tells you which context can access which label (and with what > permissions). There was a server on the Internet once that gave you > root access and invited you to attack it. No idea if anyone succeeded > or not (I got bored after about a minute). > > So it depends on the policy. If you attach the same label, that means > all files with the same label have the same access permissions. I think. So if this is true we can introduce a 'trace' label and add all contexts that should be allowed to trace to it. But we probably should leave the details to the security experts ;-) Joerg ^ permalink raw reply [flat|nested] 390+ messages in thread
* Re: [RFC] Unify KVM kernel-space and user-space code into a single project 2010-03-24 16:31 ` Joerg Roedel @ 2010-03-24 16:32 ` Avi Kivity 2010-03-24 16:45 ` Joerg Roedel 0 siblings, 1 reply; 390+ messages in thread From: Avi Kivity @ 2010-03-24 16:32 UTC (permalink / raw) To: Joerg Roedel Cc: Daniel P. Berrange, Anthony Liguori, Ingo Molnar, Pekka Enberg, Zhang, Yanmin, Peter Zijlstra, Sheng Yang, linux-kernel, kvm, Marcelo Tosatti, Jes Sorensen, Gleb Natapov, ziteng.huang, Arnaldo Carvalho de Melo, Fr?d?ric Weisbecker, Gregory Haskins On 03/24/2010 06:31 PM, Joerg Roedel wrote: > On Wed, Mar 24, 2010 at 06:20:38PM +0200, Avi Kivity wrote: > >> On 03/24/2010 06:17 PM, Joerg Roedel wrote: >> >>> But is this not only one entity more for >>> sVirt to handle? I would leave that decision to the sVirt developers. >>> Does attaching the same label as for the VM resources mean that root >>> could not access it anymore? >>> >>> >> IIUC processes run under a context, and there's a policy somewhere that >> tells you which context can access which label (and with what >> permissions). There was a server on the Internet once that gave you >> root access and invited you to attack it. No idea if anyone succeeded >> or not (I got bored after about a minute). >> >> So it depends on the policy. If you attach the same label, that means >> all files with the same label have the same access permissions. I think. >> > So if this is true we can introduce a 'trace' label and add all contexts > that should be allowed to trace to it. > But we probably should leave the details to the security experts ;-) > That's just what I want to do. Leave it in userspace and then they can deal with it without telling us about it. -- error compiling committee.c: too many arguments to function ^ permalink raw reply [flat|nested] 390+ messages in thread
* Re: [RFC] Unify KVM kernel-space and user-space code into a single project 2010-03-24 16:32 ` Avi Kivity @ 2010-03-24 16:45 ` Joerg Roedel 2010-03-24 16:48 ` Avi Kivity 0 siblings, 1 reply; 390+ messages in thread From: Joerg Roedel @ 2010-03-24 16:45 UTC (permalink / raw) To: Avi Kivity Cc: Daniel P. Berrange, Anthony Liguori, Ingo Molnar, Pekka Enberg, Zhang, Yanmin, Peter Zijlstra, Sheng Yang, linux-kernel, kvm, Marcelo Tosatti, Jes Sorensen, Gleb Natapov, ziteng.huang, Arnaldo Carvalho de Melo, Frédéric Weisbecker, Gregory Haskins On Wed, Mar 24, 2010 at 06:32:51PM +0200, Avi Kivity wrote: > On 03/24/2010 06:31 PM, Joerg Roedel wrote: > That's just what I want to do. Leave it in userspace and then they can > deal with it without telling us about it. They can't do that with a directory in /proc? ^ permalink raw reply [flat|nested] 390+ messages in thread
* Re: [RFC] Unify KVM kernel-space and user-space code into a single project 2010-03-24 16:45 ` Joerg Roedel @ 2010-03-24 16:48 ` Avi Kivity 0 siblings, 0 replies; 390+ messages in thread From: Avi Kivity @ 2010-03-24 16:48 UTC (permalink / raw) To: Joerg Roedel Cc: Daniel P. Berrange, Anthony Liguori, Ingo Molnar, Pekka Enberg, Zhang, Yanmin, Peter Zijlstra, Sheng Yang, linux-kernel, kvm, Marcelo Tosatti, Jes Sorensen, Gleb Natapov, ziteng.huang, Arnaldo Carvalho de Melo, Frédéric Weisbecker, Gregory Haskins On 03/24/2010 06:45 PM, Joerg Roedel wrote: > >> That's just what I want to do. Leave it in userspace and then they can >> deal with it without telling us about it. >> > They can't do that with a directory in /proc? > > I don't know. -- error compiling committee.c: too many arguments to function ^ permalink raw reply [flat|nested] 390+ messages in thread
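The label model discussed above — processes run under a context, files carry a label, and a policy table decides which context may access which label — can be sketched abstractly. This is a toy illustration only; the context and label names are invented and do not correspond to any real sVirt policy:

```python
# Toy model of label-based access control: a policy maps
# (context, label) pairs to permission sets. Attaching one label to
# several files gives them all identical access rules.
# All contexts, labels and permissions here are invented.

policy = {
    ("svirt_t:guest1", "svirt_image_t:guest1"): {"read", "write"},
    ("trace_t",        "svirt_image_t:guest1"): {"read"},
}

def allowed(context, label, perm):
    """Return True if the policy grants `perm` to `context` on `label`."""
    return perm in policy.get((context, label), set())
```

Joerg's suggested 'trace' label amounts to adding rows like the second one to this table, which the security policy authors can do without kernel involvement.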
* Re: [RFC] Unify KVM kernel-space and user-space code into a single project 2010-03-24 15:01 ` Joerg Roedel 2010-03-24 15:12 ` Avi Kivity 2010-03-24 15:26 ` Daniel P. Berrange @ 2010-03-24 16:03 ` Peter Zijlstra 2010-03-24 16:16 ` Avi Kivity 2010-03-24 16:23 ` Joerg Roedel 2 siblings, 2 replies; 390+ messages in thread From: Peter Zijlstra @ 2010-03-24 16:03 UTC (permalink / raw) To: Joerg Roedel Cc: Avi Kivity, Anthony Liguori, Ingo Molnar, Pekka Enberg, Zhang, Yanmin, Sheng Yang, linux-kernel, kvm, Marcelo Tosatti, Jes Sorensen, Gleb Natapov, ziteng.huang, Arnaldo Carvalho de Melo, Fr?d?ric Weisbecker, Gregory Haskins On Wed, 2010-03-24 at 16:01 +0100, Joerg Roedel wrote: > What I meant was: perf-kernel puts the guest-name into every sample and > perf-userspace accesses /sys/kvm/guest_name/fs/ later to resolve the > symbols. I leave the question of how the guest-fs is exposed to the host > out of this discussion. We should discuss this seperatly. I'd much prefer a pid like suggested later, keeps the samples smaller. But that said, we need guest kernel events like mmap and context switches too, otherwise we simply can't make sense of guest userspace addresses, we need to know the guest address space layout. So aside from a filesystem content, we first need mmap and context switch events to find the files we need to access. And while I appreciate all the security talk, its basically pointless anyway, the host can access it anyway, everybody agrees on that, but still you're arguing the case.. ^ permalink raw reply [flat|nested] 390+ messages in thread
* Re: [RFC] Unify KVM kernel-space and user-space code into a single project 2010-03-24 16:03 ` Peter Zijlstra @ 2010-03-24 16:16 ` Avi Kivity 2010-03-24 16:23 ` Joerg Roedel 1 sibling, 0 replies; 390+ messages in thread From: Avi Kivity @ 2010-03-24 16:16 UTC (permalink / raw) To: Peter Zijlstra Cc: Joerg Roedel, Anthony Liguori, Ingo Molnar, Pekka Enberg, Zhang, Yanmin, Sheng Yang, linux-kernel, kvm, Marcelo Tosatti, Jes Sorensen, Gleb Natapov, ziteng.huang, Arnaldo Carvalho de Melo, Fr?d?ric Weisbecker, Gregory Haskins On 03/24/2010 06:03 PM, Peter Zijlstra wrote: > On Wed, 2010-03-24 at 16:01 +0100, Joerg Roedel wrote: > > >> What I meant was: perf-kernel puts the guest-name into every sample and >> perf-userspace accesses /sys/kvm/guest_name/fs/ later to resolve the >> symbols. I leave the question of how the guest-fs is exposed to the host >> out of this discussion. We should discuss this seperatly. >> > I'd much prefer a pid like suggested later, keeps the samples smaller. > > But that said, we need guest kernel events like mmap and context > switches too, otherwise we simply can't make sense of guest userspace > addresses, we need to know the guest address space layout. > The kernel knows some of the address space layout, qemu knows all of it. > So aside from a filesystem content, we first need mmap and context > switch events to find the files we need to access. > This only works for the guest kernel, we don't know anything about guest processes [1]. > And while I appreciate all the security talk, its basically pointless > anyway, the host can access it anyway, everybody agrees on that, but > still you're arguing the case.. > root can access anything, but we're not talking about root. The idea is to protect against a guest that has exploited its qemu and is now attacking the host and its fellow guests. uid protection is no good since we want to isolate the guest from host processes belonging to the same uid and from other guests running under the same uid. 
[1] We can find out guest pids if we teach the kernel what to dereference, i.e. gs:offset1->offset2->offset3. Of course this varies from kernel to kernel, so we need some kind of bytecode that we can run in perf nmi context. Kind of what we need to run an unwinder for -fomit-frame-pointer. -- error compiling committee.c: too many arguments to function ^ permalink raw reply [flat|nested] 390+ messages in thread
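The dereference chain in the footnote — start from a base register, follow a fixed series of offsets, end at the pid — can be sketched against a fake flat memory. All addresses and offsets below are made up; real kernels differ per version and configuration, which is exactly why a per-kernel "bytecode" program would be needed:

```python
# Simulate guest memory as a flat dict of address -> value and walk a
# fixed offset chain (base -> offset1 -> offset2 -> offset3), the way
# a tiny NMI-safe interpreter for a per-kernel program might resolve
# the running guest task's pid. Addresses/offsets are invented.

memory = {
    0x1000 + 0x20: 0x2000,  # gs_base + offset1 -> current task pointer
    0x2000 + 0x10: 0x3000,  # task + offset2    -> nested struct pointer
    0x3000 + 0x08: 4242,    # + offset3         -> pid value
}

def deref_chain(mem, base, offsets):
    """Load through each offset in turn; the last offset reads the value."""
    addr = base
    for off in offsets[:-1]:
        addr = mem[addr + off]
    return mem[addr + offsets[-1]]

guest_pid = deref_chain(memory, 0x1000, [0x20, 0x10, 0x08])
```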
* Re: [RFC] Unify KVM kernel-space and user-space code into a single project 2010-03-24 16:03 ` Peter Zijlstra 2010-03-24 16:16 ` Avi Kivity @ 2010-03-24 16:23 ` Joerg Roedel 2010-03-24 16:45 ` Peter Zijlstra 1 sibling, 1 reply; 390+ messages in thread From: Joerg Roedel @ 2010-03-24 16:23 UTC (permalink / raw) To: Peter Zijlstra Cc: Avi Kivity, Anthony Liguori, Ingo Molnar, Pekka Enberg, Zhang, Yanmin, Sheng Yang, linux-kernel, kvm, Marcelo Tosatti, Jes Sorensen, Gleb Natapov, ziteng.huang, Arnaldo Carvalho de Melo, Fr?d?ric Weisbecker, Gregory Haskins On Wed, Mar 24, 2010 at 05:03:42PM +0100, Peter Zijlstra wrote: > On Wed, 2010-03-24 at 16:01 +0100, Joerg Roedel wrote: > > > What I meant was: perf-kernel puts the guest-name into every sample and > > perf-userspace accesses /sys/kvm/guest_name/fs/ later to resolve the > > symbols. I leave the question of how the guest-fs is exposed to the host > > out of this discussion. We should discuss this seperatly. > > I'd much prefer a pid like suggested later, keeps the samples smaller. > > But that said, we need guest kernel events like mmap and context > switches too, otherwise we simply can't make sense of guest userspace > addresses, we need to know the guest address space layout. With the filesystem approach all we need is the pid of the guest process. Then we can access proc/<pid>/maps of the guest and read out the address space layout, no? Joerg ^ permalink raw reply [flat|nested] 390+ messages in thread
* Re: [RFC] Unify KVM kernel-space and user-space code into a single project 2010-03-24 16:23 ` Joerg Roedel @ 2010-03-24 16:45 ` Peter Zijlstra 0 siblings, 0 replies; 390+ messages in thread From: Peter Zijlstra @ 2010-03-24 16:45 UTC (permalink / raw) To: Joerg Roedel Cc: Avi Kivity, Anthony Liguori, Ingo Molnar, Pekka Enberg, Zhang, Yanmin, Sheng Yang, linux-kernel, kvm, Marcelo Tosatti, Jes Sorensen, Gleb Natapov, ziteng.huang, Arnaldo Carvalho de Melo, Fr?d?ric Weisbecker, Gregory Haskins On Wed, 2010-03-24 at 17:23 +0100, Joerg Roedel wrote: > On Wed, Mar 24, 2010 at 05:03:42PM +0100, Peter Zijlstra wrote: > > On Wed, 2010-03-24 at 16:01 +0100, Joerg Roedel wrote: > > > > > What I meant was: perf-kernel puts the guest-name into every sample and > > > perf-userspace accesses /sys/kvm/guest_name/fs/ later to resolve the > > > symbols. I leave the question of how the guest-fs is exposed to the host > > > out of this discussion. We should discuss this seperatly. > > > > I'd much prefer a pid like suggested later, keeps the samples smaller. > > > > But that said, we need guest kernel events like mmap and context > > switches too, otherwise we simply can't make sense of guest userspace > > addresses, we need to know the guest address space layout. > > With the filesystem approach all we need is the pid of the guest > process. Then we can access proc/<pid>/maps of the guest and read out the > address space layout, no? No, what if it maps new things after you read it? But still getting the pid of the guest process seems non trivial without guest kernel support. ^ permalink raw reply [flat|nested] 390+ messages in thread
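Resolving a sample address through /proc/<pid>/maps, as Joerg proposes, boils down to parsing the mapping lines and finding the one covering the address. A minimal sketch against a canned maps excerpt (the mappings shown are invented); it also illustrates Peter's objection, since anything mapped after the snapshot is simply absent:

```python
# Parse /proc/<pid>/maps-style lines and resolve an address to the
# backing file plus its offset within the mapping.
# The sample content below is invented for illustration.
SAMPLE_MAPS = """\
00400000-004c0000 r-xp 00000000 08:01 131090 /bin/bash
7f2a10000000-7f2a101a4000 r-xp 00000000 08:01 395311 /lib/libc-2.11.so
"""

def parse_maps(text):
    maps = []
    for line in text.splitlines():
        fields = line.split(None, 5)          # last field is the path, if any
        start, end = (int(x, 16) for x in fields[0].split("-"))
        path = fields[5] if len(fields) > 5 else ""
        maps.append((start, end, path))
    return maps

def resolve(maps, addr):
    """Return (path, offset-into-mapping) for addr, or None if unmapped --
    e.g. for a region created after the maps snapshot was taken."""
    for start, end, path in maps:
        if start <= addr < end:
            return path, addr - start
    return None
```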
* Re: [RFC] Unify KVM kernel-space and user-space code into a single project 2010-03-24 13:05 ` Avi Kivity 2010-03-24 13:46 ` Joerg Roedel @ 2010-03-24 13:53 ` Alexander Graf 2010-03-24 13:59 ` Avi Kivity 1 sibling, 1 reply; 390+ messages in thread From: Alexander Graf @ 2010-03-24 13:53 UTC (permalink / raw) To: Avi Kivity Cc: Joerg Roedel, Anthony Liguori, Ingo Molnar, Pekka Enberg, Zhang, Yanmin, Peter Zijlstra, Sheng Yang, linux-kernel, kvm, Marcelo Tosatti, Jes Sorensen, Gleb Natapov, ziteng.huang, Arnaldo Carvalho de Melo, Fr?d?ric Weisbecker, Gregory Haskins Avi Kivity wrote: > On 03/24/2010 02:50 PM, Joerg Roedel wrote: >> >>> You can always provide the kernel and module paths as command line >>> parameters. It just won't be transparently usable, but if you're using >>> qemu from the command line, presumably you can live with that. >>> >> I don't want the tool for myself only. A typical perf user expects that >> it works transparent. >> > > A typical kvm user uses libvirt, so we can integrate it with that. > >>>> Could be easily done using notifier chains already in the kernel. >>>> Probably implemented with much less than 100 lines of additional code. >>>> >>> And a userspace interface for that. >>> >> Not necessarily. The perf event is configured to measure systemwide kvm >> by userspace. The kernel side of perf takes care that it stays >> system-wide even with added vm instances. So in this case the consumer >> for the notifier would be the perf kernel part. No userspace interface >> required. >> > > Someone needs to know about the new guest to fetch its symbols. Or do > you want that part in the kernel too? How about we add a virtio "guest file system access" device? The guest would then expose its own file system using that device. On the host side this would simply be a -virtioguestfs unix:/tmp/guest.fs and you'd get a unix socket that gives you full access to the guest file system by using commands. 
I envision something like: SEND: GET /proc/version RECV: Linux version 2.6.27.37-0.1-default (geeko@buildhost) (gcc version 4.3.2 [gcc-4_3-branch revision 141291] (SUSE Linux) ) #1 SMP 2009-10-15 14:56:58 +0200 Now all we need is integration in perf to enumerate virtual machines based on libvirt. If you want to run qemu-kvm directly, just go with --guestfs=/tmp/guest.fs and perf could fetch all required information automatically. This should solve all issues while staying 100% in user space, right? Alex ^ permalink raw reply [flat|nested] 390+ messages in thread
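The `GET /proc/version` exchange sketched above is easy to prototype as a line-oriented protocol over a socket pair. Everything here is invented for illustration — the command verb, the error reply, and the file contents — with the socketpair standing in for a virtio-serial channel served by a daemon (or, in Alexander's variant, a device) inside the guest:

```python
import socket
import threading

# Fake guest filesystem served over one end of a socketpair, mimicking
# the proposed "GET <path>" request/response exchange. The protocol and
# the file contents are made up for this sketch.
GUEST_FS = {"/proc/version": b"Linux version 2.6.27.37-0.1-default ...\n"}

def serve(conn):
    with conn, conn.makefile("rb") as requests:
        for line in requests:                 # one request per line
            cmd, _, path = line.decode().strip().partition(" ")
            if cmd == "GET":
                conn.sendall(GUEST_FS.get(path, b"ENOENT\n"))

host_end, guest_end = socket.socketpair()
threading.Thread(target=serve, args=(guest_end,), daemon=True).start()

host_end.sendall(b"GET /proc/version\n")
reply = host_end.recv(4096)
host_end.close()
```

A host-side fuse layer, as Avi suggests, would just translate filesystem reads into these requests so that tools like perf see plain files.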
* Re: [RFC] Unify KVM kernel-space and user-space code into a single project 2010-03-24 13:53 ` Alexander Graf @ 2010-03-24 13:59 ` Avi Kivity 2010-03-24 14:24 ` Alexander Graf 0 siblings, 1 reply; 390+ messages in thread From: Avi Kivity @ 2010-03-24 13:59 UTC (permalink / raw) To: Alexander Graf Cc: Joerg Roedel, Anthony Liguori, Ingo Molnar, Pekka Enberg, Zhang, Yanmin, Peter Zijlstra, Sheng Yang, linux-kernel, kvm, Marcelo Tosatti, Jes Sorensen, Gleb Natapov, ziteng.huang, Arnaldo Carvalho de Melo, Fr?d?ric Weisbecker, Gregory Haskins On 03/24/2010 03:53 PM, Alexander Graf wrote: > >> Someone needs to know about the new guest to fetch its symbols. Or do >> you want that part in the kernel too? >> > > How about we add a virtio "guest file system access" device? The guest > would then expose its own file system using that device. > > On the host side this would simply be a -virtioguestfs > unix:/tmp/guest.fs and you'd get a unix socket that gives you full > access to the guest file system by using commands. I envision something > like: > The idea is to use a dedicated channel over virtio-serial. If the channel is present the file server can serve files over it. > SEND: GET /proc/version > RECV: Linux version 2.6.27.37-0.1-default (geeko@buildhost) (gcc version > 4.3.2 [gcc-4_3-branch revision 141291] (SUSE Linux) ) #1 SMP 2009-10-15 > 14:56:58 +0200 > > Now all we need is integration in perf to enumerate virtual machines > based on libvirt. If you want to run qemu-kvm directly, just go with > --guestfs=/tmp/guest.fs and perf could fetch all required information > automatically. > > This should solve all issues while staying 100% in user space, right? > Yeah, needs a fuse filesystem to populate the host namespace (kind of sshfs over virtio-serial). -- error compiling committee.c: too many arguments to function ^ permalink raw reply [flat|nested] 390+ messages in thread
* Re: [RFC] Unify KVM kernel-space and user-space code into a single project 2010-03-24 13:59 ` Avi Kivity @ 2010-03-24 14:24 ` Alexander Graf 2010-03-24 15:06 ` Avi Kivity 0 siblings, 1 reply; 390+ messages in thread From: Alexander Graf @ 2010-03-24 14:24 UTC (permalink / raw) To: Avi Kivity Cc: Joerg Roedel, Anthony Liguori, Ingo Molnar, Pekka Enberg, Zhang, Yanmin, Peter Zijlstra, Sheng Yang, linux-kernel, kvm, Marcelo Tosatti, Jes Sorensen, Gleb Natapov, ziteng.huang, Arnaldo Carvalho de Melo, Fr?d?ric Weisbecker, Gregory Haskins Avi Kivity wrote: > On 03/24/2010 03:53 PM, Alexander Graf wrote: >> >>> Someone needs to know about the new guest to fetch its symbols. Or do >>> you want that part in the kernel too? >>> >> >> How about we add a virtio "guest file system access" device? The guest >> would then expose its own file system using that device. >> >> On the host side this would simply be a -virtioguestfs >> unix:/tmp/guest.fs and you'd get a unix socket that gives you full >> access to the guest file system by using commands. I envision something >> like: >> > > The idea is to use a dedicated channel over virtio-serial. If the > channel is present the file server can serve files over it. The file server being a kernel module inside the guest? We want to be able to serve things as early and hassle free as possible, so in this case I agree with Ingo that a kernel module is superior. > >> SEND: GET /proc/version >> RECV: Linux version 2.6.27.37-0.1-default (geeko@buildhost) (gcc version >> 4.3.2 [gcc-4_3-branch revision 141291] (SUSE Linux) ) #1 SMP 2009-10-15 >> 14:56:58 +0200 >> >> Now all we need is integration in perf to enumerate virtual machines >> based on libvirt. If you want to run qemu-kvm directly, just go with >> --guestfs=/tmp/guest.fs and perf could fetch all required information >> automatically. >> >> This should solve all issues while staying 100% in user space, right? 
>> > > Yeah, needs a fuse filesystem to populate the host namespace (kind of > sshfs over virtio-serial). I don't see why we need a fuse filesystem. We can of course create one later on. But for now all you need is a user connecting to that socket. Alex ^ permalink raw reply [flat|nested] 390+ messages in thread
* Re: [RFC] Unify KVM kernel-space and user-space code into a single project 2010-03-24 14:24 ` Alexander Graf @ 2010-03-24 15:06 ` Avi Kivity 0 siblings, 0 replies; 390+ messages in thread From: Avi Kivity @ 2010-03-24 15:06 UTC (permalink / raw) To: Alexander Graf Cc: Joerg Roedel, Anthony Liguori, Ingo Molnar, Pekka Enberg, Zhang, Yanmin, Peter Zijlstra, Sheng Yang, linux-kernel, kvm, Marcelo Tosatti, Jes Sorensen, Gleb Natapov, ziteng.huang, Arnaldo Carvalho de Melo, Fr?d?ric Weisbecker, Gregory Haskins On 03/24/2010 04:24 PM, Alexander Graf wrote: > Avi Kivity wrote: > >> On 03/24/2010 03:53 PM, Alexander Graf wrote: >> >>> >>>> Someone needs to know about the new guest to fetch its symbols. Or do >>>> you want that part in the kernel too? >>>> >>>> >>> How about we add a virtio "guest file system access" device? The guest >>> would then expose its own file system using that device. >>> >>> On the host side this would simply be a -virtioguestfs >>> unix:/tmp/guest.fs and you'd get a unix socket that gives you full >>> access to the guest file system by using commands. I envision something >>> like: >>> >>> >> The idea is to use a dedicated channel over virtio-serial. If the >> channel is present the file server can serve files over it. >> > The file server being a kernel module inside the guest? We want to be > able to serve things as early and hassle free as possible, so in this > case I agree with Ingo that a kernel module is superior. > No, just a daemon. If it's important enough we can get distributions to package it by default, and then it will be hassle free. If "early enough" is also so important, we can get it to start up on initrd. If it's really critical, we can patch grub to serve the files as well. 
>>> SEND: GET /proc/version >>> RECV: Linux version 2.6.27.37-0.1-default (geeko@buildhost) (gcc version >>> 4.3.2 [gcc-4_3-branch revision 141291] (SUSE Linux) ) #1 SMP 2009-10-15 >>> 14:56:58 +0200 >>> >>> Now all we need is integration in perf to enumerate virtual machines >>> based on libvirt. If you want to run qemu-kvm directly, just go with >>> --guestfs=/tmp/guest.fs and perf could fetch all required information >>> automatically. >>> >>> This should solve all issues while staying 100% in user space, right? >>> >>> >> Yeah, needs a fuse filesystem to populate the host namespace (kind of >> sshfs over virtio-serial). >> > I don't see why we need a fuse filesystem. We can of course create one > later on. But for now all you need is a user connecting to that socket. > If the perf app knows the protocol, no problem. But leave perf with pure filesystem access and hide the details in fuse. -- error compiling committee.c: too many arguments to function ^ permalink raw reply [flat|nested] 390+ messages in thread
* Re: [RFC] Unify KVM kernel-space and user-space code into a single project 2010-03-23 18:21 ` Joerg Roedel ` (2 preceding siblings ...) 2010-03-24 4:57 ` Avi Kivity @ 2010-03-24 5:09 ` Andi Kleen 2010-03-24 6:42 ` Avi Kivity 3 siblings, 1 reply; 390+ messages in thread From: Andi Kleen @ 2010-03-24 5:09 UTC (permalink / raw) To: Joerg Roedel Cc: Avi Kivity, Anthony Liguori, Ingo Molnar, Pekka Enberg, Zhang, Yanmin, Peter Zijlstra, Sheng Yang, linux-kernel, kvm, Marcelo Tosatti, Jes Sorensen, Gleb Natapov, Zachary Amsden, ziteng.huang, Arnaldo Carvalho de Melo, Fr?d?ric Weisbecker, Gregory Haskins Joerg Roedel <joro@8bytes.org> writes: > > Sidenote: I really think we should come to a conclusion about the > concept. KVM integration into perf is very useful feature to > analyze virtualization workloads. Agreed. I especially would like to see instruction/branch tracing working this way. This would a lot of the benefits of a simulator on a real CPU. -Andi -- ak@linux.intel.com -- Speaking for myself only. ^ permalink raw reply [flat|nested] 390+ messages in thread
* Re: [RFC] Unify KVM kernel-space and user-space code into a single project 2010-03-24 5:09 ` Andi Kleen @ 2010-03-24 6:42 ` Avi Kivity 2010-03-24 7:38 ` Andi Kleen 0 siblings, 1 reply; 390+ messages in thread From: Avi Kivity @ 2010-03-24 6:42 UTC (permalink / raw) To: Andi Kleen Cc: Joerg Roedel, Anthony Liguori, Ingo Molnar, Pekka Enberg, Zhang, Yanmin, Peter Zijlstra, Sheng Yang, linux-kernel, kvm, Marcelo Tosatti, Jes Sorensen, Gleb Natapov, Zachary Amsden, ziteng.huang, Arnaldo Carvalho de Melo, Fr?d?ric Weisbecker, Gregory Haskins On 03/24/2010 07:09 AM, Andi Kleen wrote: > Joerg Roedel<joro@8bytes.org> writes: > >> Sidenote: I really think we should come to a conclusion about the >> concept. KVM integration into perf is very useful feature to >> analyze virtualization workloads. >> > Agreed. I especially would like to see instruction/branch tracing > working this way. This would a lot of the benefits of a simulator on > a real CPU. > If you're profiling a single guest it makes more sense to do this from inside the guest - you can profile userspace as well as the kernel. -- Do not meddle in the internals of kernels, for they are subtle and quick to panic. ^ permalink raw reply [flat|nested] 390+ messages in thread
* Re: [RFC] Unify KVM kernel-space and user-space code into a single project 2010-03-24 6:42 ` Avi Kivity @ 2010-03-24 7:38 ` Andi Kleen 2010-03-24 8:59 ` Avi Kivity 0 siblings, 1 reply; 390+ messages in thread From: Andi Kleen @ 2010-03-24 7:38 UTC (permalink / raw) To: Avi Kivity Cc: Andi Kleen, Joerg Roedel, Anthony Liguori, Ingo Molnar, Pekka Enberg, Zhang, Yanmin, Peter Zijlstra, Sheng Yang, linux-kernel, kvm, Marcelo Tosatti, Jes Sorensen, Gleb Natapov, Zachary Amsden, ziteng.huang, Arnaldo Carvalho de Melo, Fr?d?ric Weisbecker, Gregory Haskins > If you're profiling a single guest it makes more sense to do this from > inside the guest - you can profile userspace as well as the kernel. I'm interested in debugging the guest without guest cooperation. In many cases qemu's new gdb stub works for that, but in some cases I would prefer instruction/branch traces over standard gdb style debugging. I used to use that very successfully with simulators in the past for some hard bugs. -Andi -- ak@linux.intel.com -- Speaking for myself only. ^ permalink raw reply [flat|nested] 390+ messages in thread
* Re: [RFC] Unify KVM kernel-space and user-space code into a single project 2010-03-24 7:38 ` Andi Kleen @ 2010-03-24 8:59 ` Avi Kivity 2010-03-24 9:31 ` Andi Kleen 0 siblings, 1 reply; 390+ messages in thread From: Avi Kivity @ 2010-03-24 8:59 UTC (permalink / raw) To: Andi Kleen Cc: Joerg Roedel, Anthony Liguori, Ingo Molnar, Pekka Enberg, Zhang, Yanmin, Peter Zijlstra, Sheng Yang, linux-kernel, kvm, Marcelo Tosatti, Jes Sorensen, Gleb Natapov, Zachary Amsden, ziteng.huang, Arnaldo Carvalho de Melo, Fr?d?ric Weisbecker, Gregory Haskins On 03/24/2010 09:38 AM, Andi Kleen wrote: >> If you're profiling a single guest it makes more sense to do this from >> inside the guest - you can profile userspace as well as the kernel. >> > I'm interested in debugging the guest without guest cooperation. > > In many cases qemu's new gdb stub works for that, but in some cases > I would prefer instruction/branch traces over standard gdb style > debugging. > Isn't gdb supposed to be able to use branch traces? It makes sense to expose them via the gdb stub then. Not to say an external tool doesn't make sense. -- error compiling committee.c: too many arguments to function ^ permalink raw reply [flat|nested] 390+ messages in thread
* Re: [RFC] Unify KVM kernel-space and user-space code into a single project 2010-03-24 8:59 ` Avi Kivity @ 2010-03-24 9:31 ` Andi Kleen 0 siblings, 0 replies; 390+ messages in thread From: Andi Kleen @ 2010-03-24 9:31 UTC (permalink / raw) To: Avi Kivity Cc: Joerg Roedel, Anthony Liguori, Ingo Molnar, Pekka Enberg, Zhang, Yanmin, Peter Zijlstra, Sheng Yang, linux-kernel, kvm, Marcelo Tosatti, Jes Sorensen, Gleb Natapov, Zachary Amsden, ziteng.huang, Arnaldo Carvalho de Melo, Fr?d?ric Weisbecker, Gregory Haskins Avi Kivity <avi@redhat.com> writes: > On 03/24/2010 09:38 AM, Andi Kleen wrote: >>> If you're profiling a single guest it makes more sense to do this from >>> inside the guest - you can profile userspace as well as the kernel. >>> >> I'm interested in debugging the guest without guest cooperation. >> >> In many cases qemu's new gdb stub works for that, but in some cases >> I would prefer instruction/branch traces over standard gdb style >> debugging. >> > > Isn't gdb supposed to be able to use branch traces? AFAIK not. The ptrace interface is only used by idb I believe. I might be wrong on that. Not sure if there is even a remote protocol command for branch traces either. There's a concept of "tracepoints" in the protocol, but it doesn't quite match at. > It makes sense to > expose them via the gdb stub then. Not to say an external tool > doesn't make sense. Ok that would work for me too. As long as I can set start/stop triggers and pipe the log somewhere it's fine for me. -Andi -- ak@linux.intel.com -- Speaking for myself only. ^ permalink raw reply [flat|nested] 390+ messages in thread
* Re: [RFC] Unify KVM kernel-space and user-space code into a single project 2010-03-22 14:32 ` Ingo Molnar 2010-03-22 14:43 ` Anthony Liguori @ 2010-03-22 14:46 ` Avi Kivity 2010-03-22 16:08 ` Ingo Molnar 1 sibling, 1 reply; 390+ messages in thread From: Avi Kivity @ 2010-03-22 14:46 UTC (permalink / raw) To: Ingo Molnar Cc: Pekka Enberg, Anthony Liguori, Zhang, Yanmin, Peter Zijlstra, Sheng Yang, linux-kernel, kvm, Marcelo Tosatti, oerg Roedel, Jes Sorensen, Gleb Natapov, Zachary Amsden, ziteng.huang, Arnaldo Carvalho de Melo, Fr?d?ric Weisbecker, Gregory Haskins On 03/22/2010 04:32 PM, Ingo Molnar wrote: > * Avi Kivity<avi@redhat.com> wrote: > > >> On 03/22/2010 02:44 PM, Ingo Molnar wrote: >> >>> This is why i consider that line of argument rather dishonest ... >>> >> I am not going to reply to any more email from you on this thread. >> > Because i pointed out that i consider a line of argument intellectually > dishonest? > > I did not say _you_ as a person are dishonest - doing that would be an ad > honimen attack against your person. (In fact i dont think you are, to the > contrary) > > An argument can certainly be labeled dishonest in a fair discussion and it is > not a personal attack against you to express my opinion about that. > > Sigh, why am I drawn into this. A person who uses dishonest arguments is a dishonest person. When you say I use a dishonest argument you are implying I am dishonest. Why do you argue with me at all if you think I am trying to cheat? If you disagree with me, tell me I am wrong, not dishonest (or that my arguments are dishonest). And this is just one example in this thread. Seriously, tools/kvm would cause a loss of developers, not a gain, simply because of the style of argument of some people on this list. Maybe qemu/kernels is a better idea. Again, if you want to talk to me, use the same language you'd like to hear yourself. Or maybe years of lkml made you so thick skinned you no longer understand how to interact with people. 
-- error compiling committee.c: too many arguments to function ^ permalink raw reply [flat|nested] 390+ messages in thread
* Re: [RFC] Unify KVM kernel-space and user-space code into a single project 2010-03-22 14:46 ` Avi Kivity @ 2010-03-22 16:08 ` Ingo Molnar 2010-03-22 16:13 ` Avi Kivity 0 siblings, 1 reply; 390+ messages in thread From: Ingo Molnar @ 2010-03-22 16:08 UTC (permalink / raw) To: Avi Kivity Cc: Pekka Enberg, Anthony Liguori, Zhang, Yanmin, Peter Zijlstra, Sheng Yang, linux-kernel, kvm, Marcelo Tosatti, oerg Roedel, Jes Sorensen, Gleb Natapov, Zachary Amsden, ziteng.huang, Arnaldo Carvalho de Melo, Fr?d?ric Weisbecker, Gregory Haskins * Avi Kivity <avi@redhat.com> wrote: > On 03/22/2010 04:32 PM, Ingo Molnar wrote: > >* Avi Kivity<avi@redhat.com> wrote: > > > >>On 03/22/2010 02:44 PM, Ingo Molnar wrote: > >>>This is why i consider that line of argument rather dishonest ... > >>I am not going to reply to any more email from you on this thread. > >Because i pointed out that i consider a line of argument intellectually > >dishonest? > > > > I did not say _you_ as a person are dishonest - doing that would be an ad > > honimen attack against your person. (In fact i dont think you are, to the > > contrary) > > > > An argument can certainly be labeled dishonest in a fair discussion and it > > is not a personal attack against you to express my opinion about that. > > > > Sigh, why am I drawn into this. > > A person who uses dishonest arguments is a dishonest person. [...] That's not how i understood that phrase - and i did not mean to suggest that you are dishonest and i do not think that you are dishonest (to the contrary). Thanks, Ingo ^ permalink raw reply [flat|nested] 390+ messages in thread
* Re: [RFC] Unify KVM kernel-space and user-space code into a single project 2010-03-22 16:08 ` Ingo Molnar @ 2010-03-22 16:13 ` Avi Kivity 0 siblings, 0 replies; 390+ messages in thread From: Avi Kivity @ 2010-03-22 16:13 UTC (permalink / raw) To: Ingo Molnar Cc: Pekka Enberg, Anthony Liguori, Zhang, Yanmin, Peter Zijlstra, Sheng Yang, linux-kernel, kvm, Marcelo Tosatti, oerg Roedel, Jes Sorensen, Gleb Natapov, Zachary Amsden, ziteng.huang, Arnaldo Carvalho de Melo, Fr?d?ric Weisbecker, Gregory Haskins On 03/22/2010 06:08 PM, Ingo Molnar wrote: > * Avi Kivity<avi@redhat.com> wrote: > > >> On 03/22/2010 04:32 PM, Ingo Molnar wrote: >> >>> * Avi Kivity<avi@redhat.com> wrote: >>> >>> >>>> On 03/22/2010 02:44 PM, Ingo Molnar wrote: >>>> >>>>> This is why i consider that line of argument rather dishonest ... >>>>> >>>> I am not going to reply to any more email from you on this thread. >>>> >>> Because i pointed out that i consider a line of argument intellectually >>> dishonest? >>> >>> I did not say _you_ as a person are dishonest - doing that would be an ad >>> honimen attack against your person. (In fact i dont think you are, to the >>> contrary) >>> >>> An argument can certainly be labeled dishonest in a fair discussion and it >>> is not a personal attack against you to express my opinion about that. >>> >>> >> Sigh, why am I drawn into this. >> >> A person who uses dishonest arguments is a dishonest person. [...] >> > That's not how i understood that phrase - and i did not mean to suggest that > you are dishonest and i do not think that you are dishonest (to the contrary). > Word games. -- error compiling committee.c: too many arguments to function ^ permalink raw reply [flat|nested] 390+ messages in thread
* Re: [RFC] Unify KVM kernel-space and user-space code into a single project 2010-03-22 7:13 ` Avi Kivity 2010-03-22 11:14 ` Ingo Molnar @ 2010-03-24 12:06 ` Paolo Bonzini 1 sibling, 0 replies; 390+ messages in thread From: Paolo Bonzini @ 2010-03-24 12:06 UTC (permalink / raw) To: Avi Kivity Cc: Ingo Molnar, Pekka Enberg, Anthony Liguori, Zhang, Yanmin, Peter Zijlstra, Sheng Yang, linux-kernel, kvm, Marcelo Tosatti, oerg Roedel, Jes Sorensen, Gleb Natapov, Zachary Amsden, ziteng.huang, Arnaldo Carvalho de Melo, Fr?d?ric Weisbecker On 03/22/2010 08:13 AM, Avi Kivity wrote: > > (btw, why are you interested in desktop-on-desktop? one use case is > developers, which don't really need fancy GUIs; a second is people who > test out distributions, but that doesn't seem to be a huge population; > and a third is people running Windows for some application that doesn't > run on Linux - hopefully a small catergory as well. This third category is pretty well served by virt-manager. It has its quirks and shortcomings, but at least it exists. Paolo ^ permalink raw reply [flat|nested] 390+ messages in thread
* Re: [RFC] Unify KVM kernel-space and user-space code into a single project 2010-03-21 21:42 ` Avi Kivity 2010-03-21 21:54 ` Ingo Molnar @ 2010-03-21 22:00 ` Ingo Molnar 2010-03-21 23:50 ` Anthony Liguori ` (2 more replies) 1 sibling, 3 replies; 390+ messages in thread From: Ingo Molnar @ 2010-03-21 22:00 UTC (permalink / raw) To: Avi Kivity Cc: Pekka Enberg, Anthony Liguori, Zhang, Yanmin, Peter Zijlstra, Sheng Yang, linux-kernel, kvm, Marcelo Tosatti, Joerg Roedel, Jes Sorensen, Gleb Natapov, Zachary Amsden, ziteng.huang, Arnaldo Carvalho de Melo, Frédéric Weisbecker * Avi Kivity <avi@redhat.com> wrote: > > Consider the _other_ examples that are a lot more clear: > > > > ' If you expose paravirt spinlocks via KVM please also make sure the KVM > > tooling can make use of it, has an option for it to configure it, and > > that it has sufficient efficiency statistics displayed in the tool for > > admins to monitor.' > > > > ' If you create this new paravirt driver then please also make sure it can > > be configured in the tooling. ' > > > > ' Please also add a testcase for this bug to tools/kvm/testcases/ so we dont > > repeat this same mistake in the future. ' > > All three happen quite commonly in qemu/kvm development. Of course someone > who develops a feature also develops a patch that exposes it in qemu. There > are several test cases in qemu-kvm.git/kvm/user/test. If that is the theory then it has failed to trickle through in practice. As you know i have reported a long list of usability problems with hardly a look. That list could be created by pretty much anyone spending a few minutes of getting a first impression with qemu-kvm. So something is seriously wrong in KVM land, to pretty much anyone trying it for the first time. I have explained how i see the root cause of that, while you seem to suggest that there's nothing wrong to begin with. I guess we'll have to agree to disagree on that. Thanks, Ingo ^ permalink raw reply [flat|nested] 390+ messages in thread
* Re: [RFC] Unify KVM kernel-space and user-space code into a single project 2010-03-21 22:00 ` Ingo Molnar @ 2010-03-21 23:50 ` Anthony Liguori 2010-03-22 0:25 ` Anthony Liguori 2010-03-22 7:18 ` Avi Kivity 2 siblings, 0 replies; 390+ messages in thread From: Anthony Liguori @ 2010-03-21 23:50 UTC (permalink / raw) To: Ingo Molnar Cc: Avi Kivity, Pekka Enberg, Zhang, Yanmin, Peter Zijlstra, Sheng Yang, linux-kernel, kvm, Marcelo Tosatti, Joerg Roedel, Jes Sorensen, Gleb Natapov, Zachary Amsden, ziteng.huang, Arnaldo Carvalho de Melo, Frédéric Weisbecker On 03/21/2010 05:00 PM, Ingo Molnar wrote: > If that is the theory then it has failed to trickle through in practice. As > you know i have reported a long list of usability problems with hardly a look. > That list could be created by pretty much anyone spending a few minutes of > getting a first impression with qemu-kvm. > I think the point you're missing is that your list was from the perspective of someone looking at a desktop virtualization solution that was graphically oriented. As Avi has repeatedly mentioned, so far, that has not been the target audience of QEMU. The target audience tends to be: 1) people looking to do server virtualization and 2) people looking to do command line based development. Usually, both (1) and (2) are working on machines that are remotely located. What's important to these users is that VMs be easily launchable from the command line, that there is a lot of flexibility in defining machine types, and that there is a programmatic way to interact with a given instance of QEMU. Those are the things that we've been focusing on recently. The reason we don't have better desktop virtualization support is simple. No one is volunteering to do it and no company is funding development for it. When you look at something like VirtualBox, what you're looking at is a long ago forked version of QEMU with a GUI added focusing on desktop virtualization. 
There is no magic behind adding a better, more usable GUI to QEMU. It just takes resources. I understand that you're trying to make the point that without catering to the desktop virtualization use case, we won't get as many developers as we could. Personally, I don't think that argument is accurate. If you look at VirtualBox, its performance is terrible. Having a nice GUI hasn't gotten them the type of developers that can improve their performance. No one is arguing that we wouldn't like to have a nicer UI. I'd love to merge any patch like that. Regards, Anthony Liguori ^ permalink raw reply [flat|nested] 390+ messages in thread
* Re: [RFC] Unify KVM kernel-space and user-space code into a single project 2010-03-21 22:00 ` Ingo Molnar 2010-03-21 23:50 ` Anthony Liguori @ 2010-03-22 0:25 ` Anthony Liguori 2010-03-22 7:18 ` Avi Kivity 2 siblings, 0 replies; 390+ messages in thread From: Anthony Liguori @ 2010-03-22 0:25 UTC (permalink / raw) To: Ingo Molnar Cc: Avi Kivity, Pekka Enberg, Zhang, Yanmin, Peter Zijlstra, Sheng Yang, linux-kernel, kvm, Marcelo Tosatti, oerg Roedel, Jes Sorensen, Gleb Natapov, Zachary Amsden, ziteng.huang, Arnaldo Carvalho de Melo, Fr?d?ric Weisbecker On 03/21/2010 05:00 PM, Ingo Molnar wrote: > If that is the theory then it has failed to trickle through in practice. As > you know i have reported a long list of usability problems with hardly a look. > That list could be created by pretty much anyone spending a few minutes of > getting a first impression with qemu-kvm. > Can you transfer your list to the following wiki page: http://wiki.qemu.org/Features/Usability This thread is so large that I can't find your note that contained the initial list. I want to make sure this input doesn't die once this thread settles down. Regards, Anthony Liguori ^ permalink raw reply [flat|nested] 390+ messages in thread
* Re: [RFC] Unify KVM kernel-space and user-space code into a single project 2010-03-21 22:00 ` Ingo Molnar 2010-03-21 23:50 ` Anthony Liguori 2010-03-22 0:25 ` Anthony Liguori @ 2010-03-22 7:18 ` Avi Kivity 2 siblings, 0 replies; 390+ messages in thread From: Avi Kivity @ 2010-03-22 7:18 UTC (permalink / raw) To: Ingo Molnar Cc: Pekka Enberg, Anthony Liguori, Zhang, Yanmin, Peter Zijlstra, Sheng Yang, linux-kernel, kvm, Marcelo Tosatti, oerg Roedel, Jes Sorensen, Gleb Natapov, Zachary Amsden, ziteng.huang, Arnaldo Carvalho de Melo, Fr?d?ric Weisbecker On 03/22/2010 12:00 AM, Ingo Molnar wrote: > * Avi Kivity<avi@redhat.com> wrote: > > >>> Consider the _other_ examples that are a lot more clear: >>> >>> ' If you expose paravirt spilocks via KVM please also make sure the KVM >>> tooling can make use of it, has an option for it to configure it, and >>> that it has sufficient efficiency statistics displayed in the tool for >>> admins to monitor.' >>> >>> ' If you create this new paravirt driver then please also make sure it can >>> be configured in the tooling. ' >>> >>> ' Please also add a testcase for this bug to tools/kvm/testcases/ so we dont >>> repeat this same mistake in the future. ' >>> >> All three happen quite commonly in qemu/kvm development. Of course someone >> who develops a feature also develops a patch that exposes it in qemu. There >> are several test cases in qemu-kvm.git/kvm/user/test. >> > If that is the theory then it has failed to trickle through in practice. As > you know i have reported a long list of usability problems with hardly a look. > That list could be created by pretty much anyone spending a few minutes of > getting a first impression with qemu-kvm. > It does happen in practice, just not in the GUI areas, since no one is working on them. I am not going to condition a qcow2 reliability fix to a gtk GUI. > So something is seriously wrong in KVM land, to pretty much anyone trying it > for the first time. 
I have explained how i see the root cause of that, while > you seem to suggest that there's nothing wrong to begin with. I guess we'll > have to agree to disagree on that. > Not anyone trying it for the first time. RHEV-M users will see a polished GUI that can be used to manage thousands of guests and hosts. I presume IBM and Siemens (and all other contributors) users will also enjoy a good user experience with their respective products. Qemu is not the only GUI for kvm. So far only one company was interested in a qemu GUI - the makers of virtualbox. Unfortunately they chose not to contribute that back to qemu, and no one was sufficiently motivated to pick out the bits and try to merge them. Again, if you are interested in a qemu GUI, you either have to write it yourself or convince someone else to do it. My own plate is full and my priorities are clear. -- Do not meddle in the internals of kernels, for they are subtle and quick to panic. ^ permalink raw reply [flat|nested] 390+ messages in thread
* Re: [RFC] Unify KVM kernel-space and user-space code into a single project 2010-03-18 16:11 ` Anthony Liguori 2010-03-18 16:28 ` Ingo Molnar @ 2010-03-19 9:19 ` Paul Mundt 2010-03-19 9:52 ` Olivier Galibert 1 sibling, 1 reply; 390+ messages in thread From: Paul Mundt @ 2010-03-19 9:19 UTC (permalink / raw) To: Anthony Liguori Cc: Ingo Molnar, Avi Kivity, Zhang, Yanmin, Peter Zijlstra, Sheng Yang, linux-kernel, kvm, Marcelo Tosatti, oerg Roedel, Jes Sorensen, Gleb Natapov, Zachary Amsden, ziteng.huang, Arnaldo Carvalho de Melo, Fr?d?ric Weisbecker On Thu, Mar 18, 2010 at 11:11:43AM -0500, Anthony Liguori wrote: > On 03/18/2010 10:17 AM, Ingo Molnar wrote: > >* Anthony Liguori<anthony@codemonkey.ws> wrote: > >>On 03/18/2010 08:00 AM, Ingo Molnar wrote: > >>>>[...] kvm in fact knows nothing about vga, to take your last example. > >>>>[...] > >>>> > >>>Look at the VGA dirty bitmap optimization a'ka the KVM_GET_DIRTY_LOG > >>>ioctl. > >>> > >>>See qemu/kvm-all.c's kvm_physical_sync_dirty_bitmap(). > >>> > >>>It started out as a VGA optimization (also used by live migration) and > >>>even today it's mostly used by the VGA drivers - albeit a weak one. > >>> > >>>I wish there were stronger VGA optimizations implemented, copying the > >>>dirty bitmap is not a particularly performant solution. (although it's > >>>certainly better than full emulation) Graphics performance is one of the > >>>more painful aspects of KVM usability today. > >>> > >>We have to maintain a dirty bitmap because we don't have a paravirtual > >>graphics driver. IOW, someone needs to write an Xorg driver. > >> > >>Ideally, we could just implement a Linux framebuffer device, right? > >> > >No, you'd want to interact with DRM. > > Using DRM doesn't help very much. You still need an X driver and most > of the operations you care about (video rendering, window movement, etc) > are not operations that need to go through DRM. 
> > 3D graphics virtualization is extremely difficult in the non-passthrough > case. It really requires hardware support that isn't widely available > today (outside a few NVIDIA chipsets). > Implementing a virtualized DRM/KMS driver would at least get you the framebuffer interface more or less for free, while allowing you to deal with the userspace side of things incrementally (ie, running a dummy xorg on top of the virtualized fbdev until the DRI side catches up). It would also enable you to focus on the 2D and 3D parts independently. > It doesn't provide the things we need to a good user experience. You > need things like an absolute input device, host driven display resize, > RGBA hardware cursors. None of these go through DRI and it's those > things that really provide the graphics user experience. > None of these things negate the benefit one would get from a virtualized DRM/KMS driver either. There are multiple problems that need solving in this area, and it's a bit disingenuous to discount a valid suggestion out of hand due to the fact it doesn't solve all of the outstanding issues. ^ permalink raw reply [flat|nested] 390+ messages in thread
* Re: [RFC] Unify KVM kernel-space and user-space code into a single project 2010-03-19 9:19 ` Paul Mundt @ 2010-03-19 9:52 ` Olivier Galibert 2010-03-19 13:56 ` Konrad Rzeszutek Wilk 0 siblings, 1 reply; 390+ messages in thread From: Olivier Galibert @ 2010-03-19 9:52 UTC (permalink / raw) To: Paul Mundt Cc: Anthony Liguori, Ingo Molnar, Avi Kivity, Zhang, Yanmin, Peter Zijlstra, Sheng Yang, linux-kernel, kvm, Marcelo Tosatti, oerg Roedel, Jes Sorensen, Gleb Natapov, Zachary Amsden, ziteng.huang, Arnaldo Carvalho de Melo, Fr?d?ric Weisbecker On Fri, Mar 19, 2010 at 06:19:04PM +0900, Paul Mundt wrote: > Implementing a virtualized DRM/KMS driver would at least get you the > framebuffer interface more or less for free, while allowing you to deal > with the userspace side of things incrementally (ie, running a dummy xorg > on top of the virtualized fbdev until the DRI side catches up). It would > also enable you to focus on the 2D and 3D parts independently. Guys, have a look at Gallium. In many ways it's a pile of crap, but at least it's a pile of crap designed by vmware for *exactly* your problem space. OG. ^ permalink raw reply [flat|nested] 390+ messages in thread
* Re: [LKML] Re: [RFC] Unify KVM kernel-space and user-space code into a single project 2010-03-19 9:52 ` Olivier Galibert @ 2010-03-19 13:56 ` Konrad Rzeszutek Wilk 0 siblings, 0 replies; 390+ messages in thread From: Konrad Rzeszutek Wilk @ 2010-03-19 13:56 UTC (permalink / raw) To: Olivier Galibert, Paul Mundt, Anthony Liguori, Ingo Molnar, Avi Kivity, Zhang, Yanmin, Peter Zijlstra, Sheng Yang, linux-kernel, kvm, Marcelo Tosatti, oerg Roedel, Jes Sorensen, Gleb Natapov, Zachary Amsden, ziteng.huang, Arnaldo Carvalho de Melo, Fr?d?ric Weisbecker On Fri, Mar 19, 2010 at 10:52:08AM +0100, Olivier Galibert wrote: > On Fri, Mar 19, 2010 at 06:19:04PM +0900, Paul Mundt wrote: > > Implementing a virtualized DRM/KMS driver would at least get you the > > framebuffer interface more or less for free, while allowing you to deal > > with the userspace side of things incrementally (ie, running a dummy xorg > > on top of the virtualized fbdev until the DRI side catches up). It would > > also enable you to focus on the 2D and 3D parts independently. > > Guys, have a look at Gallium. In many ways it's a pile of crap, but > at least it's a pile of crap designed by vmware for *exactly* your > problem space. Or perhaps Chromium, which was designed years ago and can pass-through OpenGL commands via a pipe. VirtualBox uses it for their PV drivers. Naturally it is not a FB, just a OpenGL command pass-through interface. ^ permalink raw reply [flat|nested] 390+ messages in thread
* Re: [RFC] Unify KVM kernel-space and user-space code into a single project 2010-03-18 11:48 ` Ingo Molnar 2010-03-18 12:22 ` Avi Kivity @ 2010-03-18 14:53 ` Anthony Liguori 2010-03-18 16:13 ` Ingo Molnar 1 sibling, 1 reply; 390+ messages in thread From: Anthony Liguori @ 2010-03-18 14:53 UTC (permalink / raw) To: Ingo Molnar Cc: Avi Kivity, Zhang, Yanmin, Peter Zijlstra, Sheng Yang, linux-kernel, kvm, Marcelo Tosatti, oerg Roedel, Jes Sorensen, Gleb Natapov, Zachary Amsden, ziteng.huang, Arnaldo Carvalho de Melo, Fr?d?ric Weisbecker On 03/18/2010 06:48 AM, Ingo Molnar wrote: > * Avi Kivity<avi@redhat.com> wrote: > > >> On 03/18/2010 12:50 PM, Ingo Molnar wrote: >> >>> * Avi Kivity<avi@redhat.com> wrote: >>> >>> >>>>> The moment any change (be it as trivial as fixing a GUI detail or as >>>>> complex as a new feature) involves two or more packages, development speed >>>>> slows down to a crawl - while the complexity of the change might be very >>>>> low! >>>>> >>>> Why is that? >>>> >>> It's very simple: because the contribution latencies and overhead compound, >>> almost inevitably. >>> >> It's not inevitable, if the projects are badly run, you'll have high >> latencies, but projects don't have to be badly run. >> > So the 64K dollar question is, why does Qemu still suck? > Why does Linux AIO still suck? Why do we not have a proper interface in userspace for doing asynchronous file system operations? Why don't we have an interface in userspace to do zero-copy transmit and receive of raw network packets? The lack of a decent userspace API for asynchronous file system operations is a huge usability problem for us. Take a look at the complexity of our -drive option. It's all because the kernel gives us sucky interfaces. Regards, Anthony Liguori ^ permalink raw reply [flat|nested] 390+ messages in thread
* Re: [RFC] Unify KVM kernel-space and user-space code into a single project 2010-03-18 14:53 ` Anthony Liguori @ 2010-03-18 16:13 ` Ingo Molnar 2010-03-18 16:54 ` Avi Kivity ` (3 more replies) 0 siblings, 4 replies; 390+ messages in thread From: Ingo Molnar @ 2010-03-18 16:13 UTC (permalink / raw) To: Anthony Liguori Cc: Avi Kivity, Zhang, Yanmin, Peter Zijlstra, Sheng Yang, linux-kernel, kvm, Marcelo Tosatti, oerg Roedel, Jes Sorensen, Gleb Natapov, Zachary Amsden, ziteng.huang, Arnaldo Carvalho de Melo, Fr?d?ric Weisbecker * Anthony Liguori <anthony@codemonkey.ws> wrote: > On 03/18/2010 06:48 AM, Ingo Molnar wrote: > >* Avi Kivity<avi@redhat.com> wrote: > > > >>On 03/18/2010 12:50 PM, Ingo Molnar wrote: > >>>* Avi Kivity<avi@redhat.com> wrote: > >>> > >>>>>The moment any change (be it as trivial as fixing a GUI detail or as > >>>>>complex as a new feature) involves two or more packages, development speed > >>>>>slows down to a crawl - while the complexity of the change might be very > >>>>>low! > >>>>Why is that? > >>>It's very simple: because the contribution latencies and overhead compound, > >>>almost inevitably. > >>It's not inevitable, if the projects are badly run, you'll have high > >>latencies, but projects don't have to be badly run. > >So the 64K dollar question is, why does Qemu still suck? > > Why does Linux AIO still suck? Why do we not have a proper interface in > userspace for doing asynchronous file system operations? Good that you mention it, i think it's an excellent example. The suckage of kernel async IO is for similar reasons: there's an ugly package separation problem between the kernel and between glibc - and between the apps that would make use of it. ( With the separated libaio it was made worse: there were 3 libraries to work with, and even less applications that could make use of it ... ) So IMO klibc is an arguably good idea - eventually hpa will get around posting it for upstream merging again. 
Then we could offer both new libraries much faster, and could offer things like comprehensive AIO used pervasively within existing APIs. > Why don't we have an interface in userspace to do zero-copy transmit and > receive of raw network packets? > > The lack of a decent userspace API for asynchronous file system operations > is a huge usability problem for us. Take a look at the complexity of our > -drive option. It's all because the kernel gives us sucky interfaces. If you had your bits in tools/kvm/ you could make a strong case for a good kaio implementation - coupled with an actual, working use-case. ( You could use the raw syscall even without klibc. ) We could see the arguments on lkml turn from: 'do we want this and it will take years to propagate this into apps' into something like: ' Exactly how much faster does kvm go? and I'd get it straight away with my next kernel update tomorrow? Wow! ' Ok, i exaggerated a bit - but you get the idea. It's a much different picture when kernel developers and maintainers see an actual use-case, _right in the kernel repo they work with every day_. Currently there's a wall between kernel developers and user-space developers, and there's somewhat of an element of fear and arrogance on both sides. For efficient technology such walls need to be torn down and people need a bit more experience with each other's areas. Ingo ^ permalink raw reply [flat|nested] 390+ messages in thread
* Re: [RFC] Unify KVM kernel-space and user-space code into a single project 2010-03-18 16:13 ` Ingo Molnar @ 2010-03-18 16:54 ` Avi Kivity 2010-03-18 17:11 ` Ingo Molnar 2010-03-18 18:20 ` Anthony Liguori ` (2 subsequent siblings) 3 siblings, 1 reply; 390+ messages in thread From: Avi Kivity @ 2010-03-18 16:54 UTC (permalink / raw) To: Ingo Molnar Cc: Anthony Liguori, Zhang, Yanmin, Peter Zijlstra, Sheng Yang, linux-kernel, kvm, Marcelo Tosatti, Joerg Roedel, Jes Sorensen, Gleb Natapov, Zachary Amsden, ziteng.huang, Arnaldo Carvalho de Melo, Frédéric Weisbecker On 03/18/2010 06:13 PM, Ingo Molnar wrote: > Currently there's a wall between kernel developers and user-space developers, > and there's somewhat of an element of fear and arrogance on both sides. For > efficient technology such walls need to be torn down and people need a bit more > experience with each other's areas. > I think you're increasing the height of that wall by arguing that a userspace project cannot be successful because its development process sucks and the only way to fix it is to put it into the kernel where people know so much better. Instead we kernel developers should listen to requirements from users, even if their code isn't in tools/. -- error compiling committee.c: too many arguments to function ^ permalink raw reply [flat|nested] 390+ messages in thread
* Re: [RFC] Unify KVM kernel-space and user-space code into a single project 2010-03-18 16:54 ` Avi Kivity @ 2010-03-18 17:11 ` Ingo Molnar 0 siblings, 0 replies; 390+ messages in thread From: Ingo Molnar @ 2010-03-18 17:11 UTC (permalink / raw) To: Avi Kivity Cc: Anthony Liguori, Zhang, Yanmin, Peter Zijlstra, Sheng Yang, linux-kernel, kvm, Marcelo Tosatti, oerg Roedel, Jes Sorensen, Gleb Natapov, Zachary Amsden, ziteng.huang, Arnaldo Carvalho de Melo, Fr?d?ric Weisbecker * Avi Kivity <avi@redhat.com> wrote: > On 03/18/2010 06:13 PM, Ingo Molnar wrote: > > > Currently there's a wall between kernel developers and user-space > > developers, and there's somewhat of an element of fear and arrogance on > > both sides. For efficient technology such walls needs torn down and people > > need a bit more experience with each other's areas. > > I think you're increasing the height of that wall by arguing that a > userspace project cannot be successful because it's development process > sucks and the only way to fix it is to put it into the kernel where people > know so much better. Instead we kernel developers should listen to > requirements from users, even if their code isn't in tools/. No, it's tearing down that wall because finally, instead of providing rather abstract system calls that are designed perfectly, the kernel can operate by providing useful libraries and apps. At least on the context i've worked on it has torn down walls and has improved the efficiency of working on ABIs towards user-space. (sysprof is an example of that) Kernel developers are finally faced with user-space development directly, in the same repository, using the same rules of contribution. Non-kernel-hosted apps win from that process too, as even if they dont integrate (because they dont want to or cannot for license reasons) they can participate in a more direct (and more practical) exchange with kernel developers. 
They can contribute a new system call and create a library function for it straight away. Ingo ^ permalink raw reply [flat|nested] 390+ messages in thread
* Re: [RFC] Unify KVM kernel-space and user-space code into a single project 2010-03-18 16:13 ` Ingo Molnar 2010-03-18 16:54 ` Avi Kivity @ 2010-03-18 18:20 ` Anthony Liguori 2010-03-18 18:23 ` drepper 2010-03-21 13:27 ` Gabor Gombas 3 siblings, 0 replies; 390+ messages in thread From: Anthony Liguori @ 2010-03-18 18:20 UTC (permalink / raw) To: Ingo Molnar Cc: Avi Kivity, Peter Zijlstra, Sheng Yang, linux-kernel, kvm, Marcelo Tosatti, oerg Roedel, Jes Sorensen, Gleb Natapov, Zachary Amsden, ziteng.huang, Arnaldo Carvalho de Melo, Fr?d?ric Weisbecker On 03/18/2010 11:13 AM, Ingo Molnar wrote: > Good that you mention it, i think it's an excellent example. > The suckage of kernel async IO is for similar reasons: there's an ugly package > separation problem between the kernel and between glibc - and between the apps > that would make use of it. > > ( With the separated libaio it was made worse: there were 3 libraries to > work with, and even less applications that could make use of it ... ) > > So IMO klibc is an arguably good idea - eventually hpa will get around posting > it for upstream merging again. Then we could offer both new libraries much > faster, and could offer things like comprehensive AIO used pervasively within > existing APIs. > And why wouldn't the kernel developers produce posix-aio within klibc. posix-aio is also a really terrible interface (although not as bad as linux-aio). The reason boils down to the fact that these interfaces are designed without interacting with the consumers. Part of the reason for that is the attitude of the community. You approached this discussion with, "QEMU/KVM sucks, you should move into the kernel because we're awesome and we'd fix everything in a heart beat". That attitude does not result in any useful collaboration. Had you started trying to understand what the problems that we face are and whether there's anything that can be done in the kernel to improve it, it would have been an entirely different discussion. 
The sad thing is, QEMU is probably one of the most demanding free software applications out there today with respect to performance. We consume IO interfaces and things like large pages in a deeper way than just about any application out there. We've been trying for years to improve Linux interfaces but we've not had many people in the kernel community be receptive. We've failed to improve the userspace networking interfaces. Compare Rusty's posting of vringfd to vhost-net. They are the same interface except we tried to do something more generally useful with vringfd and it was shot down because it was "yet another kernel/userspace data transfer interface". Unfortunately, we're learning that if we claim something is virtualization specific, we avoid a lot of the kernel bureaucracy. My concern is that over time, we'll have more things like vhost and that's bad for everyone. Regards, Anthony Liguori ^ permalink raw reply [flat|nested] 390+ messages in thread
* Re: [RFC] Unify KVM kernel-space and user-space code into a single project 2010-03-18 16:13 ` Ingo Molnar 2010-03-18 16:54 ` Avi Kivity 2010-03-18 18:20 ` Anthony Liguori @ 2010-03-18 18:23 ` drepper 2010-03-18 19:15 ` Ingo Molnar 2010-03-21 13:27 ` Gabor Gombas 3 siblings, 1 reply; 390+ messages in thread From: drepper @ 2010-03-18 18:23 UTC (permalink / raw) To: Ingo Molnar Cc: Anthony Liguori, Avi Kivity, Zhang, Yanmin, Peter Zijlstra, Sheng Yang, linux-kernel, kvm, Marcelo Tosatti, oerg Roedel, Jes Sorensen, Gleb Natapov, Zachary Amsden, ziteng.huang, Arnaldo Carvalho de Melo, Fr?d?ric Weisbecker [-- Attachment #1: Type: text/plain, Size: 781 bytes --] On Thu, Mar 18, 2010 at 09:13, Ingo Molnar <mingo@elte.hu> wrote: > The suckage of kernel async IO is for similar reasons: there's an ugly package > separation problem between the kernel and between glibc Bollocks. glibc would use (and is using) everything the kernel provides. We even have an implementation using the current AIO code. It only works in some situations but that's what the few users are OK with. Don't try to blame anyone but kernel people for the complete and utter failure of AIO in Linux. I don't know how often I've discussed design of a kernel interface with various kernel developers. Heck, whenever Zach Brown and I meet there never is a different topic. And following these meetings the ball is not and cannot be in my court. How could it? [-- Attachment #2: OpenPGP digital signature --] [-- Type: application/pgp-signature, Size: 272 bytes --] ^ permalink raw reply [flat|nested] 390+ messages in thread
* Re: [RFC] Unify KVM kernel-space and user-space code into a single project 2010-03-18 18:23 ` drepper @ 2010-03-18 19:15 ` Ingo Molnar 2010-03-18 19:37 ` drepper 0 siblings, 1 reply; 390+ messages in thread From: Ingo Molnar @ 2010-03-18 19:15 UTC (permalink / raw) To: drepper Cc: Anthony Liguori, Avi Kivity, Zhang, Yanmin, Peter Zijlstra, Sheng Yang, linux-kernel, kvm, Marcelo Tosatti, oerg Roedel, Jes Sorensen, Gleb Natapov, Zachary Amsden, ziteng.huang, Arnaldo Carvalho de Melo, Fr?d?ric Weisbecker * drepper@gmail.com <drepper@gmail.com> wrote: > On Thu, Mar 18, 2010 at 09:13, Ingo Molnar <mingo@elte.hu> wrote: > > > The suckage of kernel async IO is for similar reasons: there's an ugly > > package separation problem between the kernel and between glibc > > Bollocks. glibc would use (and is using) everything the kernel provides. I didnt say it's glibc's fault - if then it's more of the kernel's fault as most of the complexity is on that side. I said it's due to the fundamental distance between the app that makes use of it, the library and the kernel, and the resulting difficulties in getting a combined solution out. None of the parties really feels it to be their own thing. Ingo ^ permalink raw reply [flat|nested] 390+ messages in thread
* Re: [RFC] Unify KVM kernel-space and user-space code into a single project 2010-03-18 19:15 ` Ingo Molnar @ 2010-03-18 19:37 ` drepper 2010-03-18 20:18 ` Ingo Molnar 0 siblings, 1 reply; 390+ messages in thread From: drepper @ 2010-03-18 19:37 UTC (permalink / raw) To: Ingo Molnar Cc: Anthony Liguori, Avi Kivity, Peter Zijlstra, linux-kernel, kvm, Marcelo Tosatti, Joerg Roedel, Jes Sorensen, Gleb Natapov, Zachary Amsden, Arnaldo Carvalho de Melo, Fr?d?ric Weisbecker [-- Attachment #1: Type: text/plain, Size: 1490 bytes --] On Thu, Mar 18, 2010 at 12:15, Ingo Molnar <mingo@elte.hu> wrote: > I didnt say it's glibc's fault - if then it's more of the kernel's fault as > most of the complexity is on that side. I said it's due to the fundamental > distance between the app that makes use of it, the library and the kernel, and > the resulting difficulties in getting a combined solution out. This is wrong, too. Once there is a kernel patch that has a reasonable syscall interface it's easy enough to hack up the glibc side. Don't try to artificially find an argument to support your thesis. If kernel developers always need an immediate itch which lives inside the kernel walls to make a change this is a failure of the kernel model and mustn't be "solved" by dragging ever more code into the kernel. Aside, you don't need a full-fledged glibc implementation for testing. Especially for AIO it should be usable in much lighter-weight contexts than POSIX AIO. These wrappers are even more easy to hack up (and have been in the few cases where some code has been produced). For AIO the situation isn't that the people interested in working on it don't know or care about the use. Zach (through Oracle's products) is very much interested in the code and knows how it should look like. Face it, AIO is an example of a complete failure of the kernel developers to provide something usable. This was the argument and where you started the misdirection of including other projects in the reasoning. 
[-- Attachment #2: OpenPGP digital signature --] [-- Type: application/pgp-signature, Size: 272 bytes --] ^ permalink raw reply [flat|nested] 390+ messages in thread
* Re: [RFC] Unify KVM kernel-space and user-space code into a single project 2010-03-18 19:37 ` drepper @ 2010-03-18 20:18 ` Ingo Molnar 2010-03-18 20:39 ` drepper 0 siblings, 1 reply; 390+ messages in thread From: Ingo Molnar @ 2010-03-18 20:18 UTC (permalink / raw) To: drepper Cc: Anthony Liguori, Avi Kivity, Peter Zijlstra, linux-kernel, kvm, Marcelo Tosatti, Joerg Roedel, Jes Sorensen, Gleb Natapov, Zachary Amsden, Arnaldo Carvalho de Melo, Frédéric Weisbecker * drepper@gmail.com <drepper@gmail.com> wrote: > On Thu, Mar 18, 2010 at 12:15, Ingo Molnar <mingo@elte.hu> wrote: > > > I didnt say it's glibc's fault - if then it's more of the kernel's fault > > as most of the complexity is on that side. I said it's due to the > > fundamental distance between the app that makes use of it, the library and > > the kernel, and the resulting difficulties in getting a combined solution > > out. > > This is wrong, too. Once there is a kernel patch that has a reasonable > syscall interface it's easy enough to hack up the glibc side. [...] Where 'reasonable' is defined by you, right? As I said, the KAIO situation is mostly the kernel's fault, but you are a pretty passive and unhelpful entity in this matter too, aren't you? For example, just to state the obvious: libaio was written 8 years ago, in 2002, and has been used in apps early on. Why aren't those kernel APIs, while not being a full/complete solution, supported by glibc, and wrapped to pthreads-based emulation on kernels that don't support them? I'm not talking about a 100% full POSIX AIO implementation (the kernel side is not complete enough for that) - I'm just talking about the APIs that libaio and the kernel support today. Why isn't glibc itself making use of those AIO capabilities internally? (even if it's not possible to support full POSIX AIO) I checked today's glibc repo, and there's no sign of any of that: glibc> git grep io_submit glibc> git grep aio_context_t glibc> Zero, nil, nada. 
Getting _something_ into glibc would certainly help move the situation along. Glibc itself using the existing KAIO bits internally would help too - and don't tell me it's 100% unusable: it's certainly capable enough to run DB servers. Glibc using it would create further demand (and pressure, and incentives) for improvements. There were even glibc patches created by Ben LaHaise for some of these bits, IIRC. One can certainly make the argument that glibc not using _any_ of the current KAIO capabilities harms its further development. > [...] Don't try to artificially find an argument to support your thesis. Charming argumentation style, I really missed it. Thanks, Ingo
* Re: [RFC] Unify KVM kernel-space and user-space code into a single project 2010-03-18 20:18 ` Ingo Molnar @ 2010-03-18 20:39 ` drepper 2010-03-18 20:56 ` Ingo Molnar 0 siblings, 1 reply; 390+ messages in thread From: drepper @ 2010-03-18 20:39 UTC (permalink / raw) To: Ingo Molnar Cc: Anthony Liguori, Avi Kivity, Peter Zijlstra, linux-kernel, kvm, Marcelo Tosatti, Joerg Roedel, Jes Sorensen, Gleb Natapov, Zachary Amsden, Arnaldo Carvalho de Melo, Frédéric Weisbecker On Thu, Mar 18, 2010 at 13:18, Ingo Molnar <mingo@elte.hu> wrote: > Where 'reasonable' is defined by you, right? Not only by me. For some of the AIO approaches which happened there were also glibc patches other people wrote. It's pretty simple. > As i said, the KAIO situation is mostly the kernel's fault, but you are a > pretty passive and unhelpful entity in this matter too, arent you? How'd you guess? I've always been willing to discuss interface requirements with whoever showed interest in implementing things. Again, ask Zach. I think Christoph Lameter was also involved, as were various SGI people over the years. Short of actually doing all the work myself I've done what can be expected. > For example, just to state the obvious: libaio has been written 8 years ago in > 2002 and has been used in apps early on. Why arent those kernel APIs, while > not being a full/complete solution, supported by glibc, and wrapped to > pthreads based emulation on kernels that dont support it? You never looked at the glibc code in use and didn't read what I wrote before. We do have an implementation of libaio using those interfaces. They exist in the Fedora/RHEL glibc and are probably copied elsewhere, too. The code is not upstream because it is not general enough. It simply doesn't work in all situations. The problem with using it (among others) is that certain operations cannot be implemented. And that's not a kernel interface problem. 
I cannot just switch to using the pthread-based code when coming across something that's not implementable, because then the requests have already been sent to the kernel. Only code that knows about the limitations ahead of time can use the KAIO code. > Why isnt glibc itself making use of those AIO capabilities internally? (even > if it's not possible to support full POSIX AIO) For what? glibc doesn't implement anything requiring AIO. The only non-trivial file handling is in nscd, and nscd uses memory-mapped files. > I checked today's glibc repo, and there's no sign of any of that: Check the Fedora/RHEL/... source files. > Getting _something_ into glibc would certainly help move the situation. No it won't; in the 7+ years since Jakub wrote the code nothing has come of it. And before you again make groundless claims, there were plenty of discussions with kernel people at the time the code was written. > it's certainly capable enough to run DB servers. glibc > using it would create further demand (and pressure, and incentives) for > improvements. There simply is no need for AIO in glibc internally. Well, there might be, if it could be used on sockets. But that's not the case. > Charming argumentation style, i really missed it. Well, then this last mail should show you. Without knowing the subject matter, just based on flawed lookups, you try to spread the blame to make sure that no mud ever sticks to the development process you are so fond of. Sorry to disappoint you.
* Re: [RFC] Unify KVM kernel-space and user-space code into a single project 2010-03-18 20:39 ` drepper @ 2010-03-18 20:56 ` Ingo Molnar 2010-03-18 22:06 ` Alan Cox 0 siblings, 1 reply; 390+ messages in thread From: Ingo Molnar @ 2010-03-18 20:56 UTC (permalink / raw) To: drepper Cc: Anthony Liguori, Avi Kivity, Peter Zijlstra, linux-kernel, kvm, Marcelo Tosatti, Joerg Roedel, Jes Sorensen, Gleb Natapov, Zachary Amsden, Arnaldo Carvalho de Melo, Frédéric Weisbecker * drepper@gmail.com <drepper@gmail.com> wrote: > > For example, just to state the obvious: libaio has been written 8 years > > ago in 2002 and has been used in apps early on. Why arent those kernel > > APIs, while not being a full/complete solution, supported by glibc, and > > wrapped to pthreads based emulation on kernels that dont support it? > > You never looked at the glibc code in use and didn't read what I wrote > before. We do have an implementation of libaio using those interfaces. > They exist in the Fedora/RHEL glibc and are probably copied elsewhere, too. > The code is not upstream because it is not general enough. It simply > doesn't work in all situations. So it's good enough to be in Fedora/RHEL but not good enough to be in upstream glibc? How is that possible? Isn't that a double standard? Upstream libc presence is really what is needed for an API to be ubiquitous to apps. That is what 'closes the loop' in the positive feedback cycle and creates real back pressure and demand on the kernel to get its act together. Again, I state it for the third time: the KAIO situation is mostly the kernel's fault. But glibc is certainly not being helpful in that situation either, and your earlier claim that you are only waiting for the patches is rather dishonest. Thanks, Ingo
* Re: [RFC] Unify KVM kernel-space and user-space code into a single project 2010-03-18 20:56 ` Ingo Molnar @ 2010-03-18 22:06 ` Alan Cox 2010-03-18 22:16 ` Ingo Molnar 0 siblings, 1 reply; 390+ messages in thread From: Alan Cox @ 2010-03-18 22:06 UTC (permalink / raw) To: Ingo Molnar Cc: drepper, Anthony Liguori, Avi Kivity, Peter Zijlstra, linux-kernel, kvm, Marcelo Tosatti, Joerg Roedel, Jes Sorensen, Gleb Natapov, Zachary Amsden, Arnaldo Carvalho de Melo, Frédéric Weisbecker > So it's good enough to be in Fedora/RHEL but not good enough to be in upstream > glibc? How is that possible? Isnt that a double standard? Yes, it's a double standard. Glibc has a higher standard than Fedora/RHEL. Just like the Ubuntu kernel ships various ugly, unfit-for-upstream kernel drivers. > kernel's fault. But glibc is certainly not being helpful in that situation > either and your earlier claim that you are only waiting for the patches is > rather dishonest. I am sure Ulrich is being totally honest, but send him the patches and you'll find out. Plus you will learn what the API should look like when you try to create them ... Alan
* Re: [RFC] Unify KVM kernel-space and user-space code into a single project 2010-03-18 22:06 ` Alan Cox @ 2010-03-18 22:16 ` Ingo Molnar 2010-03-19 7:22 ` Avi Kivity 0 siblings, 1 reply; 390+ messages in thread From: Ingo Molnar @ 2010-03-18 22:16 UTC (permalink / raw) To: Alan Cox Cc: drepper, Anthony Liguori, Avi Kivity, Peter Zijlstra, linux-kernel, kvm, Marcelo Tosatti, Joerg Roedel, Jes Sorensen, Gleb Natapov, Zachary Amsden, Arnaldo Carvalho de Melo, Frédéric Weisbecker * Alan Cox <alan@lxorguk.ukuu.org.uk> wrote: > > So it's good enough to be in Fedora/RHEL but not good enough to be in > > upstream glibc? How is that possible? Isnt that a double standard? > > Yes its a double standard > > Glibc has a higher standard than Fedora/RHEL. > > Just like the Ubuntu kernel ships various ugly unfit for upstream kernel > drivers. There's a world of difference between a fugly driver and a glibc patch. Also, we tend to upstream even fugly kernel drivers if they are important and are deployed by a major distro - see Nouveau. > > kernel's fault. But glibc is certainly not being helpful in that situation > > either and your earlier claim that you are only waiting for the patches is > > rather dishonest. > > I am sure Ulrich is being totally honest, but send him the patches and > you'll find out. Plus you will learn what the API should look like when you > try and create them ... I was there and extended/fixed bits of the kaio/libaio code when they were written, so yes, I already know something about it. To say that the glibc reaction was less than enthusiastic back then is a strong euphemism ;-) So after 8 years some of the bits made their way into Fedora/RHEL. I think this is a pretty good demonstration of the points I made ;-) Thanks, Ingo
* Re: [RFC] Unify KVM kernel-space and user-space code into a single project 2010-03-18 22:16 ` Ingo Molnar @ 2010-03-19 7:22 ` Avi Kivity 0 siblings, 0 replies; 390+ messages in thread From: Avi Kivity @ 2010-03-19 7:22 UTC (permalink / raw) To: Ingo Molnar Cc: Alan Cox, drepper, Anthony Liguori, Peter Zijlstra, linux-kernel, kvm, Marcelo Tosatti, Joerg Roedel, Jes Sorensen, Gleb Natapov, Zachary Amsden, Arnaldo Carvalho de Melo, Fr?d?ric Weisbecker On 03/19/2010 12:16 AM, Ingo Molnar wrote: > >> Yes its a double standard >> >> Glibc has a higher standard than Fedora/RHEL. >> >> Just like the Ubuntu kernel ships various ugly unfit for upstream kernel >> drivers. >> > There's a world of a difference between a fugly driver and a glibc patch. > > Yes, fugly drivers can be cleaned up, but glibc ABIs are forever. -- Do not meddle in the internals of kernels, for they are subtle and quick to panic. ^ permalink raw reply [flat|nested] 390+ messages in thread
* Re: [RFC] Unify KVM kernel-space and user-space code into a single project 2010-03-18 16:13 ` Ingo Molnar ` (2 preceding siblings ...) 2010-03-18 18:23 ` drepper @ 2010-03-21 13:27 ` Gabor Gombas 3 siblings, 0 replies; 390+ messages in thread From: Gabor Gombas @ 2010-03-21 13:27 UTC (permalink / raw) To: Ingo Molnar Cc: Anthony Liguori, Avi Kivity, Zhang, Yanmin, Peter Zijlstra, Sheng Yang, linux-kernel, kvm, Marcelo Tosatti, Joerg Roedel, Jes Sorensen, Gleb Natapov, Zachary Amsden, ziteng.huang, Arnaldo Carvalho de Melo, Frédéric Weisbecker On Thu, Mar 18, 2010 at 05:13:10PM +0100, Ingo Molnar wrote: > > Why does Linux AIO still suck? Why do we not have a proper interface in > > userspace for doing asynchronous file system operations? > > Good that you mention it, i think it's an excellent example. > > The suckage of kernel async IO is for similar reasons: there's an ugly package > separation problem between the kernel and between glibc - and between the apps > that would make use of it. No, kernel async IO sucks because it still does not play well with buffered I/O. Last time I checked (about a year ago or so), AIO syscall latencies were much worse when buffered I/O was used compared to direct I/O. Unfortunately, to achieve good performance with direct I/O, you need a HW RAID card with lots of on-board cache. Gabor
* Re: [RFC] Unify KVM kernel-space and user-space code into a single project 2010-03-18 10:50 ` Ingo Molnar 2010-03-18 11:30 ` Avi Kivity @ 2010-03-18 21:02 ` Zachary Amsden 2010-03-18 21:15 ` Ingo Molnar 1 sibling, 1 reply; 390+ messages in thread From: Zachary Amsden @ 2010-03-18 21:02 UTC (permalink / raw) To: Ingo Molnar Cc: Avi Kivity, Anthony Liguori, Zhang, Yanmin, Peter Zijlstra, Sheng Yang, linux-kernel, kvm, Marcelo Tosatti, oerg Roedel, Jes Sorensen, Gleb Natapov, ziteng.huang, Arnaldo Carvalho de Melo, Fr?d?ric Weisbecker On 03/18/2010 12:50 AM, Ingo Molnar wrote: > * Avi Kivity<avi@redhat.com> wrote: > > >>> The moment any change (be it as trivial as fixing a GUI detail or as >>> complex as a new feature) involves two or more packages, development speed >>> slows down to a crawl - while the complexity of the change might be very >>> low! >>> >> Why is that? >> > It's very simple: because the contribution latencies and overhead compound, > almost inevitably. > > If you ever tried to implement a combo GCC+glibc+kernel feature you'll know > ... > > Even with the best-run projects in existence it takes forever and is very > painful - and here i talk about first hand experience over many years. > Ingo, what you miss is that this is not a bad thing. Fact of the matter is, it's not just painful, it downright sucks. This is actually a Good Thing (tm). It means you have to get your feature and its interfaces well defined and able to version forwards and backwards independently from each other. And that introduces some complexity and time and testing, but in the end it's what you want. You don't introduce a requirement to have the feature, but take advantage of it if it is there. It may take everyone else a couple years to upgrade the compilers, tools, libraries and kernel, and by that time any bugs introduced by interacting with this feature will have been ironed out and their patterns well known. 
If you haven't well defined and carefully thought out the feature ahead of time, you end up creating a giant mess, possibly the need for nasty backwards compatibility (case in point: COMPAT_VDSO). But in the end, you would have made those same mistakes on your internal tree anyway, and then you (or likely, some other hapless project maintainer for the project you forked) would have to go add the features, fixes and workarounds back to the original project(s). However, since you developed in an insulated sheltered environment, those fixes and workarounds would not be robust and independently versionable from each other. The result is you've kept your codebase version-neutral, forked in outside code, enhanced it, and left the hard work of backporting those changes and keeping them version-safe to the original package maintainers you forked from. What you've created is no longer a single project, it is called a distro, and you're being short-sighted and anti-social to think you can garner more support than all of those individual packages you forked. This is why most developers work upstream and let the goodness propagate down from the top like molten sugar of each granular package on a flan where it is collected from the rich custard channel sitting on a distribution plate below before the big hungry mouth of the consumer devours it and incorporates it into their infrastructure. Or at least, something like that, until the last sentence. In short, if project A has Y active developers, you better have Z >> Y active developers to throw at project A when you fork it into project B. Zach ^ permalink raw reply [flat|nested] 390+ messages in thread
* Re: [RFC] Unify KVM kernel-space and user-space code into a single project 2010-03-18 21:02 ` Zachary Amsden @ 2010-03-18 21:15 ` Ingo Molnar 2010-03-18 22:19 ` Zachary Amsden 0 siblings, 1 reply; 390+ messages in thread From: Ingo Molnar @ 2010-03-18 21:15 UTC (permalink / raw) To: Zachary Amsden Cc: Avi Kivity, Anthony Liguori, Zhang, Yanmin, Peter Zijlstra, Sheng Yang, linux-kernel, kvm, Marcelo Tosatti, Joerg Roedel, Jes Sorensen, Gleb Natapov, ziteng.huang, Arnaldo Carvalho de Melo, Frédéric Weisbecker * Zachary Amsden <zamsden@redhat.com> wrote: > On 03/18/2010 12:50 AM, Ingo Molnar wrote: > >* Avi Kivity<avi@redhat.com> wrote: > > > >>>The moment any change (be it as trivial as fixing a GUI detail or as > >>>complex as a new feature) involves two or more packages, development speed > >>>slows down to a crawl - while the complexity of the change might be very > >>>low! > >>Why is that? > >It's very simple: because the contribution latencies and overhead compound, > >almost inevitably. > > > >If you ever tried to implement a combo GCC+glibc+kernel feature you'll know > >... > > > >Even with the best-run projects in existence it takes forever and is very > >painful - and here i talk about first hand experience over many years. > > Ingo, what you miss is that this is not a bad thing. Fact of the > matter is, it's not just painful, it downright sucks. Our experience is the opposite: we have tried both variants, and we report honestly about our experience with both models. You only have experience with one variant - the one you advocate. See the asymmetry? > This is actually a Good Thing (tm). It means you have to get your > feature and its interfaces well defined and able to version forwards > and backwards independently from each other. And that introduces > some complexity and time and testing, but in the end it's what you > want. You don't introduce a requirement to have the feature, but > take advantage of it if it is there. 
> It may take everyone else a couple years to upgrade the compilers, tools, libraries and kernel, and by that time any bugs introduced by interacting with this feature will have been ironed out and their patterns well known. Sorry, but this is plain not true. The 2.4->2.6 kernel cycle debacle has taught us that waiting long to 'iron out' the details has the following effects: - developer pain - user pain - distro pain - disconnect - loss of developers, testers and users - grave bugs discovered months (years ...) down the line - untested features - developer exhaustion It didn't work, trust me - and I've been around long enough to have suffered through the whole 2.5.x misery. Some of our worst ABIs come from that cycle as well. So we first created the 2.6.x process; then, as we saw that it worked much better, we _sped up_ the kernel development process some more, to what many claimed was an impossible, crazy pace: a two-week merge window, 2.5 months of stabilization and a stable release every 3 months. And you can also see the countless examples of carefully drafted, well-thought-out, committee-written computer standards that were honed for years, which are not worth the paper they are written on. 'Extra time' and 'extra bureaucratic overhead to think things through' are about the worst things you can inject into a development process. You should think about the human brain as a cache - the 'closer' things are, both in time and physically, the better they end up being. Also, the more gradual and the more concentrated a thing is, the better it works out in general. This is part of basic human nature. Sorry, but I really think you are trying to rationalize a disadvantage here ... Ingo
* Re: [RFC] Unify KVM kernel-space and user-space code into a single project 2010-03-18 21:15 ` Ingo Molnar @ 2010-03-18 22:19 ` Zachary Amsden 2010-03-18 22:44 ` Ingo Molnar 0 siblings, 1 reply; 390+ messages in thread From: Zachary Amsden @ 2010-03-18 22:19 UTC (permalink / raw) To: Ingo Molnar Cc: Avi Kivity, Anthony Liguori, Zhang, Yanmin, Peter Zijlstra, Sheng Yang, linux-kernel, kvm, Marcelo Tosatti, oerg Roedel, Jes Sorensen, Gleb Natapov, ziteng.huang, Arnaldo Carvalho de Melo, Fr?d?ric Weisbecker On 03/18/2010 11:15 AM, Ingo Molnar wrote: > * Zachary Amsden<zamsden@redhat.com> wrote: > > >> On 03/18/2010 12:50 AM, Ingo Molnar wrote: >> >>> * Avi Kivity<avi@redhat.com> wrote: >>> >>> >>>>> The moment any change (be it as trivial as fixing a GUI detail or as >>>>> complex as a new feature) involves two or more packages, development speed >>>>> slows down to a crawl - while the complexity of the change might be very >>>>> low! >>>>> >>>> Why is that? >>>> >>> It's very simple: because the contribution latencies and overhead compound, >>> almost inevitably. >>> >>> If you ever tried to implement a combo GCC+glibc+kernel feature you'll know >>> ... >>> >>> Even with the best-run projects in existence it takes forever and is very >>> painful - and here i talk about first hand experience over many years. >>> >> Ingo, what you miss is that this is not a bad thing. Fact of the >> matter is, it's not just painful, it downright sucks. >> > Our experience is the opposite, and we tried both variants and report about > our experience with both models honestly. > > You only have experience about one variant - the one you advocate. > > See the assymetry? > > >> This is actually a Good Thing (tm). It means you have to get your >> feature and its interfaces well defined and able to version forwards >> and backwards independently from each other. And that introduces >> some complexity and time and testing, but in the end it's what you >> want. 
You don't introduce a requirement to have the feature, but >> take advantage of it if it is there. >> >> It may take everyone else a couple years to upgrade the compilers, >> tools, libraries and kernel, and by that time any bugs introduced by >> interacting with this feature will have been ironed out and their >> patterns well known. >> > Sorry, but this is pain not true. The 2.4->2.6 kernel cycle debacle has taught > us that waiting long to 'iron out' the details has the following effects: > > - developer pain > - user pain > - distro pain > - disconnect > - loss of developers, testers and users > - grave bugs discovered months (years ...) down the line > - untested features > - developer exhaustion > > It didnt work, trust me - and i've been around long enough to have suffered > through the whole 2.5.x misery. Some of our worst ABIs come from that cycle as > well. > You're talking about a single project and comparing it to my argument about multiple independent projects. In that case, I see no point in the discussion. If you want to win the argument by strawman, you are welcome to do so. > Sorry, but i really think you are really trying to rationalize a disadvantage > here ... > This could very well be true, but until someone comes forward with compelling numbers (as in, developers committed to working on the project, number of patches and total amount of code contribution), there is no point in having an argument, there really isn't anything to discuss other than opinion. My opinion is you need a really strong justification to have a successful fork and I don't see that justification. Zach ^ permalink raw reply [flat|nested] 390+ messages in thread
* Re: [RFC] Unify KVM kernel-space and user-space code into a single project 2010-03-18 22:19 ` Zachary Amsden @ 2010-03-18 22:44 ` Ingo Molnar 2010-03-19 7:21 ` Avi Kivity 0 siblings, 1 reply; 390+ messages in thread From: Ingo Molnar @ 2010-03-18 22:44 UTC (permalink / raw) To: Zachary Amsden Cc: Avi Kivity, Anthony Liguori, Zhang, Yanmin, Peter Zijlstra, Sheng Yang, linux-kernel, kvm, Marcelo Tosatti, oerg Roedel, Jes Sorensen, Gleb Natapov, ziteng.huang, Arnaldo Carvalho de Melo, Fr?d?ric Weisbecker * Zachary Amsden <zamsden@redhat.com> wrote: > On 03/18/2010 11:15 AM, Ingo Molnar wrote: > >* Zachary Amsden<zamsden@redhat.com> wrote: > > > >>On 03/18/2010 12:50 AM, Ingo Molnar wrote: > >>>* Avi Kivity<avi@redhat.com> wrote: > >>> > >>>>>The moment any change (be it as trivial as fixing a GUI detail or as > >>>>>complex as a new feature) involves two or more packages, development speed > >>>>>slows down to a crawl - while the complexity of the change might be very > >>>>>low! > >>>>Why is that? > >>>It's very simple: because the contribution latencies and overhead compound, > >>>almost inevitably. > >>> > >>>If you ever tried to implement a combo GCC+glibc+kernel feature you'll know > >>>... > >>> > >>>Even with the best-run projects in existence it takes forever and is very > >>>painful - and here i talk about first hand experience over many years. > >>Ingo, what you miss is that this is not a bad thing. Fact of the > >>matter is, it's not just painful, it downright sucks. > >Our experience is the opposite, and we tried both variants and report about > >our experience with both models honestly. > > > >You only have experience about one variant - the one you advocate. > > > >See the assymetry? > > > >>This is actually a Good Thing (tm). It means you have to get your > >>feature and its interfaces well defined and able to version forwards > >>and backwards independently from each other. 
And that introduces > >>some complexity and time and testing, but in the end it's what you > >>want. You don't introduce a requirement to have the feature, but > >>take advantage of it if it is there. > >> > >>It may take everyone else a couple years to upgrade the compilers, > >>tools, libraries and kernel, and by that time any bugs introduced by > >>interacting with this feature will have been ironed out and their > >>patterns well known. > >Sorry, but this is pain not true. The 2.4->2.6 kernel cycle debacle has taught > >us that waiting long to 'iron out' the details has the following effects: > > > > - developer pain > > - user pain > > - distro pain > > - disconnect > > - loss of developers, testers and users > > - grave bugs discovered months (years ...) down the line > > - untested features > > - developer exhaustion > > > >It didnt work, trust me - and i've been around long enough to have suffered > >through the whole 2.5.x misery. Some of our worst ABIs come from that cycle as > >well. > > You're talking about a single project and comparing it to my argument about > multiple independent projects. In that case, I see no point in the > discussion. If you want to win the argument by strawman, you are welcome to > do so. The kernel is a very complex project with many ABI issues, so all those arguments apply to it as well. The description you gave: | This is actually a Good Thing (tm). It means you have to get your feature | and its interfaces well defined and able to version forwards and backwards | independently from each other. And that introduces some complexity and | time and testing, but in the end it's what you want. You don't introduce a | requirement to have the feature, but take advantage of it if it is there. matches the kernel too. We have many such situations. (Furthermore, the tools/perf/ situation, which relates to ABIs and user-space/kernel-space interactions is similar as well.) Do you still think i'm making a straw-man argument? 
> > Sorry, but i really think you are really trying to rationalize a > > disadvantage here ... > > This could very well be true, but until someone comes forward with > compelling numbers (as in, developers committed to working on the project, > number of patches and total amount of code contribution), there is no point > in having an argument, there really isn't anything to discuss other than > opinion. My opinion is you need a really strong justification to have a > successful fork and I don't see that justification. I can give you rough numbers for tools/perf - if that counts for you. For the first four months of its existence, when it was a separate project, I had a single external contributor, IIRC. The moment it went into the kernel repo, the number of contributors and contributions skyrocketed, and basically all contributions were top-notch. We are at 60+ separate contributors now (after about 8 months upstream) - which is still small compared to the kernel or to Qemu, but huge for a relatively isolated project like instrumentation. So in my estimation tools/kvm/ would certainly be popular. Whether it would be more popular than current Qemu is hard to tell - it would be pure speculation. Reliable numbers for the other aspect - whether a split project creates a more fragile and less developed ABI - would be extremely hard to get. I believe it to be true, but that's my opinion, based on my experience with other projects, extrapolated to KVM/Qemu. Anyway, the issue is moot as there's clear opposition to the unification idea. Too bad - there was heavy initial opposition to the arch/x86 unification as well [and heavy opposition to tools/perf/ as well], yet both worked out extremely well :-) Ingo
* Re: [RFC] Unify KVM kernel-space and user-space code into a single project 2010-03-18 22:44 ` Ingo Molnar @ 2010-03-19 7:21 ` Avi Kivity 2010-03-20 14:59 ` Andrea Arcangeli 0 siblings, 1 reply; 390+ messages in thread From: Avi Kivity @ 2010-03-19 7:21 UTC (permalink / raw) To: Ingo Molnar Cc: Zachary Amsden, Anthony Liguori, Zhang, Yanmin, Peter Zijlstra, Sheng Yang, linux-kernel, kvm, Marcelo Tosatti, oerg Roedel, Jes Sorensen, Gleb Natapov, ziteng.huang, Arnaldo Carvalho de Melo, Fr?d?ric Weisbecker On 03/19/2010 12:44 AM, Ingo Molnar wrote: > > Too bad - there was heavy initial opposition to the arch/x86 unification as > well [and heavy opposition to tools/perf/ as well], still both worked out > extremely well :-) > Did you forget that arch/x86 was a merging of a code fork that happened several years previously? Maybe that fork shouldn't have been done to begin with. -- Do not meddle in the internals of kernels, for they are subtle and quick to panic. ^ permalink raw reply [flat|nested] 390+ messages in thread
* Re: [RFC] Unify KVM kernel-space and user-space code into a single project 2010-03-19 7:21 ` Avi Kivity @ 2010-03-20 14:59 ` Andrea Arcangeli 2010-03-21 10:03 ` Avi Kivity 0 siblings, 1 reply; 390+ messages in thread From: Andrea Arcangeli @ 2010-03-20 14:59 UTC (permalink / raw) To: Avi Kivity Cc: Ingo Molnar, Zachary Amsden, Anthony Liguori, Zhang, Yanmin, Peter Zijlstra, Sheng Yang, linux-kernel, kvm, Marcelo Tosatti, Joerg Roedel, Jes Sorensen, Gleb Natapov, ziteng.huang, Arnaldo Carvalho de Melo, Frédéric Weisbecker On Fri, Mar 19, 2010 at 09:21:49AM +0200, Avi Kivity wrote: > On 03/19/2010 12:44 AM, Ingo Molnar wrote: > > > > Too bad - there was heavy initial opposition to the arch/x86 unification as > > well [and heavy opposition to tools/perf/ as well], still both worked out > > extremely well :-) > > > > Did you forget that arch/x86 was a merging of a code fork that happened > several years previously? Maybe that fork shouldn't have been done to > begin with. We discussed and probably timidly tried to share the sharable parts initially, but we realized it was too time-wasteful. In addition to having to adapt the code to 64-bit we would also have had to constantly solve another problem on top of it (see the various splits into _32/_64; those take time to achieve - maybe not huge time, but still definitely some time and effort). Even in retrospect I am quite sure the way x86-64 happened was optimal, and if we could go back we would do it again the exact same way, even if the final objective was to have a common arch/x86 (and thankfully Linus is flexible and smart enough to realize that code that doesn't risk destabilizing anything shouldn't be forced out just because it's not in a totally theoretical-perfect-nitpicking-clean state yet). It's still a lot of work to do the unification later as a separate task, but it's not as if doing it immediately would have been a lot less work. 
It's about the same amount of effort and we were able to defer it for later and decrease the time to market, which surely has contributed to the success of x86-64. The problem of qemu is not some lack of GUI or that it's not included in the linux kernel git tree; the definitive problem is how to merge qemu-kvm/kvm and qxl into it. If you (Avi) were the qemu maintainer I am sure there wouldn't be two trees, so as a developer I would totally love it, and I am sure that with you as maintainer it would have a chance to move forward with qxl on desktop virtualization without proposing to extend vnc instead to achieve a "similar" result (imagine if btrfs were published on a website and people started to discuss whether it should ever be merged because reinventing some part of btrfs inside ext5 might achieve ""similar"" results). About a GUI for KVM to use on desktop distributions, that is an irrelevant concern compared to the lack of a protocol more efficient than rdesktop/rdp/vnc for desktop virtualization. I have people asking me to migrate hundreds of desktops to desktop virtualization on KVM in their organizations and I tell them to use spice because I believe it's the most efficient option available (at least as far as we stick to open source open protocols), there are universities using spice on thousands of student desktops, and I think we need paravirt graphics to happen ASAP in the main qemu tree too. In short: running KVM on the desktop is irrelevant compared to running the desktop on KVM, so I suggest focusing on what is more important first ;). Thanks, Andrea
* Re: [RFC] Unify KVM kernel-space and user-space code into a single project 2010-03-20 14:59 ` Andrea Arcangeli @ 2010-03-21 10:03 ` Avi Kivity 0 siblings, 0 replies; 390+ messages in thread From: Avi Kivity @ 2010-03-21 10:03 UTC (permalink / raw) To: Andrea Arcangeli Cc: Ingo Molnar, Zachary Amsden, Anthony Liguori, Zhang, Yanmin, Peter Zijlstra, Sheng Yang, linux-kernel, kvm, Marcelo Tosatti, oerg Roedel, Jes Sorensen, Gleb Natapov, ziteng.huang, Arnaldo Carvalho de Melo, Fr?d?ric Weisbecker On 03/20/2010 04:59 PM, Andrea Arcangeli wrote: > On Fri, Mar 19, 2010 at 09:21:49AM +0200, Avi Kivity wrote: > >> On 03/19/2010 12:44 AM, Ingo Molnar wrote: >> >>> Too bad - there was heavy initial opposition to the arch/x86 unification as >>> well [and heavy opposition to tools/perf/ as well], still both worked out >>> extremely well :-) >>> >>> >> Did you forget that arch/x86 was a merging of a code fork that happened >> several years previously? Maybe that fork shouldn't have been done to >> begin with. >> > We discussed and probably timidly tried to share the sharable > initially but we realized it was too time wasteful. In addition to > having to adapt the code to 64bit we would also had to constantly > solve another problem on top of it (see the various split on _32/_64, > those takes time to achieve, maybe not huge time but still definitely > some time and effort). Even in retrospect I am quite sure the way > x86-64 happened was optimal and if we would go back we would do it > again the exact same way even if the final object was to have a common > arch/x86 (and thankfully Linus is flexible and smart enough to realize > that code that isn't risking to destabilize anything shouldn't be > forced out just because it's not to a totally > theoretical-perfect-nitpicking-clean-state yet). It's still a lot of > work do the unification later as a separate task, but it's not like if > we did it immediately it would have been a lot less work. 
It's about > the same amount of effort and we were able to defer it for later and > decrease the time to market which surely has contributed to the > success of x86-64. > In hindsight decisions are much easier. I agree it was less risky to fork than to share. But if another instruction set forks out a 64-bit not-exactly-compatible variant, I'm sure we'll start out shared and not fork it, especially if the platform remains the same. > Problem of qemu is not some lack of GUI or that it's not included in > the linux kernel git tree, the definitive problem is how to merge > qemu-kvm/kvm and qlx into it. If you (Avi) were the qemu maintainer I > am sure there wouldn't two trees so as a developer I would totally > love it, and I am sure that with you as maintainer it would have a > chance to move forward with qlx on desktop virtualization without > proposing to extend vnc instead to achieve a "similar" result (imagine > if btrfs is published on a website and people starts to discuss if it > should ever be merged ever because reinventing some part of btrfs > inside ext5 might achieve ""similar"" results). > The qemu/qemu-kvm fork is definitely hurting. Some history: when kvm started out I pulled qemu for fast hacking and, much like arch/x86_64, I couldn't destabilize qemu for something that was completely experimental (and closed source at the time). Moreover, it wasn't clear if the qemu community would be interested. The qemu-kvm fork was designed for minimal intrusion so I could merge upstream qemu regularly. This resulted in kvm integration that was fairly ugly. Later Anthony merged a well-integrated alternative implementation (in retrospect this was a mistake IMO - we were left with a well tested high performing ugly implementation and a clean, slow, untested, and unfeatured implementation, and no one who wants to merge the two). So now it is pretty confusing to read the code which has the two alternate implementation sometimes sharing code and sometimes diverging. 
> About a GUI for KVM to use on desktop distributions, that is an > irrelevant concern compared to the lack of protocol more efficient > than rdesktop/rdp/vnc for desktop virtualization. I've people asking > me to migrate hundreds of desktops to desktop virtualization on KVM in > their organizations and I tell them to use spice because I believe > it's the most efficient option available (at least as far as we stick > to open source open protocols), there are universities using spice on > thousand of student desktops, and I think we need paravirt graphics to > happen ASAP in the main qemu tree too. > That effort will have to wait for the spice project to mature. > In short: running KVM on the desktop is irrelevant compared to running > the desktop on KVM so I suggest to focus on what is more important > first ;). > Anyone can focus on what interests them, if someone has an interest in a good desktop-on-desktop experience they should start hacking and sending patches. -- error compiling committee.c: too many arguments to function
* Re: [RFC] Unify KVM kernel-space and user-space code into a single project 2010-03-18 8:20 ` Avi Kivity 2010-03-18 8:56 ` Ingo Molnar @ 2010-03-18 9:22 ` Ingo Molnar 2010-03-18 10:32 ` Avi Kivity 1 sibling, 1 reply; 390+ messages in thread From: Ingo Molnar @ 2010-03-18 9:22 UTC (permalink / raw) To: Avi Kivity Cc: Anthony Liguori, Zhang, Yanmin, Peter Zijlstra, Sheng Yang, linux-kernel, kvm, Marcelo Tosatti, oerg Roedel, Jes Sorensen, Gleb Natapov, Zachary Amsden, ziteng.huang, Arnaldo Carvalho de Melo, Fr?d?ric Weisbecker * Avi Kivity <avi@redhat.com> wrote: > > - move a clean (and minimal) version of the Qemu code base to tools/kvm/, > > in the upstream kernel repo, and work on that from that point on. > > I'll ignore the repository location which should be immaterial to a serious > developer and concentrate on the 'clean and minimal' aspect, since it has > some merit. [...] To the contrary, experience shows that repository location, and in particular a shared repository for closely related bits is very much material! It matters because when there are two separate projects, even a "serious developer" is finding it double and triple difficult to contribute even trivial changes. It becomes literally a nightmare if you have to touch 3 packages: kernel, a library and an app codebase. It takes _forever_ to get anything useful done. Also, 'focus on a single thing' is a very basic aspect of humans, especially those who do computer programming. Working on two code bases in two repositories at once can be very challenging physically and psychically. So what i've seen is that OSS programmers tend to pick a side, pretty much randomly, and then rationalize it in hindsight why they prefer that side ;-) Most of them become either a kernel developer or a user-space package developer - and then they specialize on that field and shy away from changes that involve both. It's a basic human thing to avoid the hassle that comes with multi-package changes. 
(One really has to be outright stupid, fanatic or desperate to even attempt such changes these days - such are the difficulties for a comparatively low return.) The solution is to tear down such artificial walls of contribution where possible. And tearing down the wall between KVM and qemu-kvm seems very much possible and the advantages would be numerous. Unless by "serious developer" you meant: "developer willing to [or forced to] waste time and effort on illogically structured technology". > [...] > > Do you really think the echo'n'cat school of usability wants to write a GUI? > In linux-2.6.git? Then you'll be surprised to hear that it's happening as we speak and the commits are there in linux-2.6.git. Both a TUI and a GUI are in the works. Furthermore, the numbers show that half of the usability fixes to tools/perf/ came not from regular perf contributors but from random kernel developers and testers who build the latest kernel and try out perf at the same time (it's very easy because you already have it in the kernel repository - no separate download, no installation, etc. necessary). I had literally zero such contributions when (the precursor to) 'perf' was still a separate user-space project. You could have the same effect for Qemu: the latest bits in tools/kvm/ would be built by regular kernel testers and developers. The integration benefits don't just extend to developers, a unified project is vastly easier to test as well. Thanks, Ingo
* Re: [RFC] Unify KVM kernel-space and user-space code into a single project 2010-03-18 9:22 ` Ingo Molnar @ 2010-03-18 10:32 ` Avi Kivity 2010-03-18 11:19 ` Ingo Molnar 2010-03-18 18:20 ` Frederic Weisbecker 0 siblings, 2 replies; 390+ messages in thread From: Avi Kivity @ 2010-03-18 10:32 UTC (permalink / raw) To: Ingo Molnar Cc: Anthony Liguori, Zhang, Yanmin, Peter Zijlstra, Sheng Yang, linux-kernel, kvm, Marcelo Tosatti, oerg Roedel, Jes Sorensen, Gleb Natapov, Zachary Amsden, ziteng.huang, Arnaldo Carvalho de Melo, Fr?d?ric Weisbecker On 03/18/2010 11:22 AM, Ingo Molnar wrote: > * Avi Kivity<avi@redhat.com> wrote: > > >>> - move a clean (and minimal) version of the Qemu code base to tools/kvm/, >>> in the upstream kernel repo, and work on that from that point on. >>> >> I'll ignore the repository location which should be immaterial to a serious >> developer and concentrate on the 'clean and minimal' aspect, since it has >> some merit. [...] >> > To the contrary, experience shows that repository location, and in particular > a shared repository for closely related bits is very much material! > > It matters because when there are two separate projects, even a "serious > developer" is finding it double and triple difficult to contribute even > trivial changes. > > It becomes literally a nightmare if you have to touch 3 packages: kernel, a > library and an app codebase. It takes _forever_ to get anything useful done. > You can't be serious. I find that the difficulty in contributing a patch has mostly to do with writing the patch, and less with figuring out which email address to send it to. > Also, 'focus on a single thing' is a very basic aspect of humans, especially > those who do computer programming. Working on two code bases in two > repositories at once can be very challenging physically and psychically. > Indeed, working simultaneously on two different projects is difficult. 
I usually work for a while on one, and then 'cd', physically and psychically, to the other. Then switch back. Sort of like the scheduler on a uniprocessor machine. > So what i've seen is that OSS programmers tend to pick a side, pretty much > randomly, and then rationalize it in hindsight why they prefer that side ;-) > > Most of them become either a kernel developer or a user-space package > developer - and then they specialize on that field and shy away from changes > that involve both. It's a basic human thing to avoid the hassle that comes > with multi-package changes. (One really has to be outright stupid, fanatic or > desperate to even attempt such changes these days - such are the difficulties > for a comparatively low return.) > We have a large number of such stupid, fanatic, desperate developers in the qemu and kvm communities. > The solution is to tear down such artificial walls of contribution where > possible. And tearing down the wall between KVM and qemu-kvm seems very much > possible and the advantages would be numerous. > > Unless by "serious developer" you meant: "developer willing to [or forced to] > waste time and effort on illogically structured technology". > By "serious developer" I mean - someone who is interested in contributing, not in getting their name into the kernel commits list - someone who is willing to read the wiki page and find out where the repository and mailing list for a project is - someone who will spend enough time on the project so that the time to clone two repositories will not be a factor in their contributions - someone who will work on the uncool stuff like fixing bugs and providing interfaces to other tools >> [...] >> >> Do you really think the echo'n'cat school of usability wants to write a GUI? >> In linux-2.6.git? >> > Then you'll be surprised to hear that it's happening as we speak and the > commits are there in linux-2.6.git. Both a TUI and GUI is in the works. 
> > Furthermore, the numbers show that half of the usability fixes to tools/perf/ > came not from regular perf contributors but from random kernel developers and > testers who when they build the latest kernel and try out perf at the same > time (it's very easy because you already have it in the kernel repository - no > separate download, no installation, etc. necessary). > > I had literally zero such contributions when (the precursor to) 'perf' was > still a separate user-space project. > > You could have the same effect for Qemu: the latest bits in tools/kvm/ would > be built by regular kernel testers and developers. The integration benefits > dont just extend to developers, a unified project is vastly easier to test as > well. > > Let's wait and see then. If the tools/perf/ experience has really good results, we can reconsider this at a later date. -- error compiling committee.c: too many arguments to function
* Re: [RFC] Unify KVM kernel-space and user-space code into a single project 2010-03-18 10:32 ` Avi Kivity @ 2010-03-18 11:19 ` Ingo Molnar 2010-03-18 18:20 ` Frederic Weisbecker 1 sibling, 0 replies; 390+ messages in thread From: Ingo Molnar @ 2010-03-18 11:19 UTC (permalink / raw) To: Avi Kivity Cc: Anthony Liguori, Zhang, Yanmin, Peter Zijlstra, Sheng Yang, linux-kernel, kvm, Marcelo Tosatti, oerg Roedel, Jes Sorensen, Gleb Natapov, Zachary Amsden, ziteng.huang, Arnaldo Carvalho de Melo, Fr?d?ric Weisbecker * Avi Kivity <avi@redhat.com> wrote: > On 03/18/2010 11:22 AM, Ingo Molnar wrote: > >* Avi Kivity<avi@redhat.com> wrote: > > > >>> - move a clean (and minimal) version of the Qemu code base to tools/kvm/, > >>> in the upstream kernel repo, and work on that from that point on. > >>I'll ignore the repository location which should be immaterial to a serious > >>developer and concentrate on the 'clean and minimal' aspect, since it has > >>some merit. [...] > > > > To the contrary, experience shows that repository location, and in > > particular a shared repository for closely related bits is very much > > material! > > > > It matters because when there are two separate projects, even a "serious > > developer" is finding it double and triple difficult to contribute even > > trivial changes. > > > > It becomes literally a nightmare if you have to touch 3 packages: kernel, > > a library and an app codebase. It takes _forever_ to get anything useful > > done. > > You can't be serious. I find that the difficulty in contributing a patch > has mostly to do with writing the patch, and less with figuring out which > email address to send it to. My own experience and everyone i've talked about such topics (developers and distro people) about feature contribution tells the exact opposite: it's much harder to contribute features to multiple packages than to a single project. 
Kernel+library+app features take forever to propagate, there's constant fear of version friction, productization deadlines are uncertain, and ABI mess-ups are frequent as well due to disjoint testing. Also, each component has essential veto power: so if the proposed API or approach is opposed or changed at a later stage then that affects (sometimes already committed) changes. If you've ever done it you'll know how tedious it is. This very thread and recent threads about KVM usability demonstrate the same complications. Thanks, Ingo
* Re: [RFC] Unify KVM kernel-space and user-space code into a single project 2010-03-18 10:32 ` Avi Kivity 2010-03-18 11:19 ` Ingo Molnar @ 2010-03-18 18:20 ` Frederic Weisbecker 2010-03-18 19:50 ` Frank Ch. Eigler 1 sibling, 1 reply; 390+ messages in thread From: Frederic Weisbecker @ 2010-03-18 18:20 UTC (permalink / raw) To: Avi Kivity Cc: Ingo Molnar, Anthony Liguori, Zhang, Yanmin, Peter Zijlstra, Sheng Yang, linux-kernel, kvm, Marcelo Tosatti, Joerg Roedel, Jes Sorensen, Gleb Natapov, Zachary Amsden, ziteng.huang, Arnaldo Carvalho de Melo On Thu, Mar 18, 2010 at 12:32:51PM +0200, Avi Kivity wrote: > By "serious developer" I mean > > - someone who is interested in contributing, not in getting their name > into the kernel commits list > - someone who is willing to read the wiki page and find out where the > repository and mailing list for a project is > - someone who will spend enough time on the project so that the time to > clone two repositories will not be a factor in their contributions I'm not going to argue about the Qemu merging here. But your above assessment is incomplete. It is not because developers don't want to clone two different trees that tools/perf is a success. Or maybe it's a factor, but I suspect it to be very minimal. I can script git commands if needed. It is actually because both the kernel and user sides are in sync in this scheme. > Let's wait and see then. If the tools/perf/ experience has really good > results, we can reconsider this at a later date. I think it already has really good results.
* Re: [RFC] Unify KVM kernel-space and user-space code into a single project 2010-03-18 18:20 ` Frederic Weisbecker @ 2010-03-18 19:50 ` Frank Ch. Eigler 2010-03-18 20:47 ` Ingo Molnar 0 siblings, 1 reply; 390+ messages in thread From: Frank Ch. Eigler @ 2010-03-18 19:50 UTC (permalink / raw) To: Frederic Weisbecker Cc: Avi Kivity, Ingo Molnar, Anthony Liguori, Zhang, Yanmin, Peter Zijlstra, Sheng Yang, linux-kernel, kvm, Marcelo Tosatti, Joerg Roedel, Jes Sorensen, Gleb Natapov, Zachary Amsden, ziteng.huang, Arnaldo Carvalho de Melo Frederic Weisbecker <fweisbec@gmail.com> writes: > [...] It is actually because both kernel and user side are sync in > this scheme. [...] This argues that co-evolution of an interface is easiest on the developers if they own both sides of that interface. No quarrel. This does not argue that the preservation of a stable ABI is best done this way. If anything, it makes it too easy to change both the provider and the preferred user of the interface without noticing unintentional breakage to forlorn out-of-your-tree clients. - FChE
* Re: [RFC] Unify KVM kernel-space and user-space code into a single project 2010-03-18 19:50 ` Frank Ch. Eigler @ 2010-03-18 20:47 ` Ingo Molnar 0 siblings, 0 replies; 390+ messages in thread From: Ingo Molnar @ 2010-03-18 20:47 UTC (permalink / raw) To: Frank Ch. Eigler Cc: Frederic Weisbecker, Avi Kivity, Anthony Liguori, Zhang, Yanmin, Peter Zijlstra, Sheng Yang, linux-kernel, kvm, Marcelo Tosatti, Joerg Roedel, Jes Sorensen, Gleb Natapov, Zachary Amsden, ziteng.huang, Arnaldo Carvalho de Melo * Frank Ch. Eigler <fche@redhat.com> wrote: > Frederic Weisbecker <fweisbec@gmail.com> writes: > > > [...] It is actually because both kernel and user side are sync in this > > scheme. [...] > > This argues that co-evolution of an interface is easiest on the developers > if they own both sides of that interface. No quarrel. Correct, that's a big advantage. > This does not argue that that the preservation of a stable ABI is best done > this way. If anything, it makes it too easy to change both the provider and > the preferred user of the interface without noticing unintentional breakage > to forlorn out-of-your-tree clients. Your concern is valid, and this issue has been raised in the past as one of the main counter-arguments against tools/perf/. (there was a big flamewar about it on lkml when it was introduced) Our roughly 1 year experience with perf is that, somewhat paradoxically, this scheme not only works as well as classic ABI schemes but actually brings a _better_ ABI than the classic "let the kernel define an ABI" single-sided solution. I know the difference first hand, i've written various syscall ABIs in the past 10+ years before perf and know how they interact with their user space counterparts. Why did it work out better with tools/perf/? It turns out that there's an immediate, direct, actionable test feedback effect on the ABI, and a much closer relation to the ABI.
Typically the same developer implements the kernel bits and the user-space bits (because it's so easy to do co-development), so the ABI aspects are ingrained in the developer much more deeply. Once you see the kind of havoc ABI breakage can cause during development you avoid it in the future. So developers find that a good, stable ABI helps development. It turns out that developers don't actually _want_ to break the ABI and are careful about it - and having the app next to the kernel ABI and co-developing it makes sure there's never any true mismatch. Also, we can do ABI improvements at a far higher rate than any other kernel subsystem. I checked the git logs; we've done over three dozen ABI extensions since the first version, and all were forwards _and_ backwards compatible. A higher rate of change gives developers more experience and lets them do a better ABI, and makes them more ABI-conscious. I think if all kernel ABIs had such a healthy rate of change we'd fill in all the missing kernel features very quickly. With detached packages ABI features are often done by a kernel developer (who is familiar with the kernel subsystem in question) and a separate user-space developer (who is familiar with the user-space project in question), and the ABI consciousness is less strong. So you are right that there's a danger of accidental ABI breakage, but it's not an issue in practice. There are external apps making use of the ABI as well, not just tools/perf/. In a more abstract sense this is kind of a classic case of game theory: an assume-trust strategy pays off in the long run. Thanks, Ingo
* Re: [RFC] Unify KVM kernel-space and user-space code into a single project 2010-03-17 8:10 ` [RFC] Unify KVM kernel-space and user-space code into a single project Ingo Molnar 2010-03-18 8:20 ` Avi Kivity @ 2010-03-18 8:44 ` Jes Sorensen 2010-03-18 9:54 ` Ingo Molnar 2010-03-19 14:53 ` Andrea Arcangeli 2010-03-18 14:38 ` Anthony Liguori 2010-03-18 14:44 ` Anthony Liguori 3 siblings, 2 replies; 390+ messages in thread From: Jes Sorensen @ 2010-03-18 8:44 UTC (permalink / raw) To: Ingo Molnar Cc: Anthony Liguori, Avi Kivity, Zhang, Yanmin, Peter Zijlstra, Sheng Yang, linux-kernel, kvm, Marcelo Tosatti, oerg Roedel, Gleb Natapov, Zachary Amsden, ziteng.huang, Arnaldo Carvalho de Melo, Fr?d?ric Weisbecker On 03/17/10 09:10, Ingo Molnar wrote: > I wish both you and Avi looked back 3-4 years and realized what made KVM so > successful back then and why the hearts and minds of virtualization developers > were captured by KVM almost overnight. Ingo, What made KVM so successful was that the core kernel of the hypervisor was designed the right way, as a kernel module where it belonged. It was obvious to anyone who had been exposed to the main competition at the time, Xen, that this was the right approach. What has ended up killing Xen in the end is the not-invented-here approach of copying everything over, reformatting it, and rewriting half of it, which made it impossible to maintain and support as a single codebase. At my previous employer we ended up dropping all Xen efforts exactly because it was like maintaining two separate operating system kernels. The key to KVM is that once you have Linux, you practically have KVM as well. > Fast forward to 2010. The kernel side of KVM is maximum goodness - by far the > worst-quality remaining aspects of KVM are precisely in areas that you > mention: 'if we have to support third party tools, then it significantly > complicates things'. 
You kept Qemu as an external 'third party' entity to KVM, > and KVM is clearly hurting from that - just see the recent KVM usability > thread for examples about suckage. > > So a similar 'complication' is the crux of the matter behind KVM quality > problems: you've not followed through with the original KVM vision and you > have not applied that concept to Qemu! Well there are two ways to go about this. Either you base the KVM userland on top of an existing project, like QEMU, _or_ you rewrite it all from scratch. However, there is far more to it than just a couple of ioctls, for example the stack of reverse device-drivers is a pretty significant code base, rewriting that and maintaining it is not a trivial task. It is certainly my belief that the benefit we get from sharing that with QEMU by far outweighs the cost of forking it and keeping our own fork in the kernel tree. In fact it would result in exactly the same problems I mentioned above wrt Xen. > If you want to jump to the next level of technological quality you need to fix > this attitude and you need to go back to the design roots of KVM. Concentrate > on Qemu (as that is the weakest link now), make it a first class member of the > KVM repo and simplify your development model by having a single repo: > > - move a clean (and minimal) version of the Qemu code base to tools/kvm/, in > the upstream kernel repo, and work on that from that point on. With this you have just thrown away all the benefits of having the QEMU repository shared with other developers who will actively fix bugs in components we do care about for KVM. > - encourage kernel-space and user-space KVM developers to work on both > user-space and kernel-space bits as a single unit. It's one project and a > single experience to the user. This is already happening and a total non issue. 
> - [ and probably libvirt should go there too ] Now that would be interesting, next we'll have to include things like libxml in the kernel git tree as well, to make sure libvirt doesn't get out of sync with the version supplied by your distribution vendor. > Yes, i've read a thousand excuses for why this is an absolutely impossible and > a bad thing to do, and none of them was really convincing to me - and you also > have become rather emotional about all the arguments so it's hard to argue > about it on a technical basis. So far your argument would justify pulling all of gdb into the kernel git tree as well, to support the kgdb efforts, or gcc so we can get rid of the gcc version quirks in the kernel header files, e2fsprogs and equivalent for _all_ file systems included in the kernel so we can make sure our fs tools never get out of sync with whats supported in the kernel...... > We made a similar (admittedly very difficult ...) design jump from oprofile to > perf, and i can tell you from that experience that it's day and night, both in > terms of development and in terms of the end result! The user components for perf vs oprofile are _tiny_ projects compared to the portions of QEMU that are actually used by KVM. Oh and you completely forgot SeaBIOS. KVM+QEMU rely on SeaBIOS too, so from what you're saying we should pull that into the kernel git repository as well. Never mind the fact that we share SeaBIOS with the coreboot project which is very actively adding features to it that benefit us as well..... Sorry, but there are times when unification make sense, and there are times where having a reasonably well designed split makes sense. KVM had problems with QEMU in the past which resulted in the qemu-kvm branch of it, which proved to be a major pain to deal with, but that is fortunately improving and qemu-kvm should go away completely at some point. Cheers, Jes ^ permalink raw reply [flat|nested] 390+ messages in thread
* Re: [RFC] Unify KVM kernel-space and user-space code into a single project

From: Ingo Molnar @ 2010-03-18 9:54 UTC
To: Jes Sorensen
Cc: Anthony Liguori, Avi Kivity, Zhang, Yanmin, Peter Zijlstra, Sheng Yang, linux-kernel, kvm, Marcelo Tosatti, Joerg Roedel, Gleb Natapov, Zachary Amsden, ziteng.huang, Arnaldo Carvalho de Melo, Frédéric Weisbecker

* Jes Sorensen <Jes.Sorensen@redhat.com> wrote:

> On 03/17/10 09:10, Ingo Molnar wrote:
>
> > I wish both you and Avi looked back 3-4 years and realized what made KVM so successful back then and why the hearts and minds of virtualization developers were captured by KVM almost overnight.
>
> Ingo,
>
> What made KVM so successful was that the core kernel of the hypervisor was designed the right way, as a kernel module where it belonged. It was obvious to anyone who had been exposed to the main competition at the time, Xen, that this was the right approach. What has ended up killing Xen in the end is the not-invented-here approach of copying everything over, reformatting it, and rewriting half of it, which made it impossible to maintain and support as a single codebase. [...]

Yes, exactly. I was part of that nightmare so i know.

> [...]
>
> At my previous employer we ended up dropping all Xen efforts exactly because it was like maintaining two separate operating system kernels. The key to KVM is that once you have Linux, you practically have KVM as well.

Yes. Please realize that what is behind it is a strikingly simple argument:

  "Once you have a single project to develop and maintain, all is much better."

> > Fast forward to 2010. The kernel side of KVM is maximum goodness - by far the worst-quality remaining aspects of KVM are precisely in areas that you mention: 'if we have to support third party tools, then it significantly complicates things'. You kept Qemu as an external 'third party' entity to KVM, and KVM is clearly hurting from that - just see the recent KVM usability thread for examples about suckage.
> >
> > So a similar 'complication' is the crux of the matter behind KVM quality problems: you've not followed through with the original KVM vision and you have not applied that concept to Qemu!
>
> Well there are two ways to go about this. Either you base the KVM userland on top of an existing project, like QEMU, _or_ you rewrite it all from scratch. [...]

Btw., i made similar arguments to Avi about 3 years ago when it was going upstream, that qemu should be unified with KVM. This is more true today than ever.

> [...] However, there is far more to it than just a couple of ioctls; for example the stack of reverse device-drivers is a pretty significant code base, and rewriting that and maintaining it is not a trivial task. It is certainly my belief that the benefit we get from sharing that with QEMU by far outweighs the cost of forking it and keeping our own fork in the kernel tree. In fact it would result in exactly the same problems I mentioned above wrt Xen.

I do not suggest forking Qemu at all, i suggest using the most natural development model for the KVM+Qemu shared project: a single repository.

> > If you want to jump to the next level of technological quality you need to fix this attitude and you need to go back to the design roots of KVM. Concentrate on Qemu (as that is the weakest link now), make it a first class member of the KVM repo and simplify your development model by having a single repo:
> >
> >  - move a clean (and minimal) version of the Qemu code base to tools/kvm/, in the upstream kernel repo, and work on that from that point on.
>
> With this you have just thrown away all the benefits of having the QEMU repository shared with other developers who will actively fix bugs in components we do care about for KVM.

Not if it's a unified project.

> >  - encourage kernel-space and user-space KVM developers to work on both user-space and kernel-space bits as a single unit. It's one project and a single experience to the user.
>
> This is already happening and a total non issue.

My experience as an external observer of the end result contradicts this. Seemingly trivial usability changes to the KVM+Qemu combo are often not being done because they involve cross-discipline changes.

( _In this very thread_ there has been a somewhat self-defeating argument by Anthony that a multi-package scenario would 'significantly complicate' matters. What more proof do we need to state the obvious? Keeping what has become one piece of technology over the years in two separate halves is obviously bad. )

> >  - [ and probably libvirt should go there too ]
>
> Now that would be interesting, next we'll have to include things like libxml in the kernel git tree as well, to make sure libvirt doesn't get out of sync with the version supplied by your distribution vendor.

The way we have gone about this in tools/perf/ is similar to the route picked by Git: we only use very low-level libraries available everywhere, and we provide optional wrappers to the rest. We are also using the kernel's libraries so we rarely need to go outside to get some functionality.

I.e. it's a non-issue in practice: despite perf having an (optional) dependency on xmlto and docbook, we don't include those packages nor do we force users to install particular versions of them.

> > Yes, i've read a thousand excuses for why this is an absolutely impossible and a bad thing to do, and none of them was really convincing to me - and you also have become rather emotional about all the arguments so it's hard to argue about it on a technical basis.
>
> So far your argument would justify pulling all of gdb into the kernel git tree as well, to support the kgdb efforts, or gcc so we can get rid of the gcc version quirks in the kernel header files, e2fsprogs and equivalent for _all_ file systems included in the kernel so we can make sure our fs tools never get out of sync with what's supported in the kernel......

gdb and gcc are clearly extrinsic to the kernel, so why would we move them there? I was talking about tools that are closely related to the kernel - where much of the development and actual use is in combination with the Linux kernel.

90%+ of the Qemu use cases are combined with Linux. (Yes, i know that you can run Qemu without KVM, and no, i don't think it matters in the grand scheme of things - most investment into Qemu comes from the KVM angle these days. In particular it for sure does not justify handicapping future KVM evolution so drastically.)

> > We made a similar (admittedly very difficult ...) design jump from oprofile to perf, and i can tell you from that experience that it's day and night, both in terms of development and in terms of the end result!
>
> The user components for perf vs oprofile are _tiny_ projects compared to the portions of QEMU that are actually used by KVM.

I know the size and scope of Qemu, i even hacked it - still my points remain. (my arguments are influenced and strengthened by that past hacking experience)

> Oh and you completely forgot SeaBIOS. KVM+QEMU rely on SeaBIOS too, so from what you're saying we should pull that into the kernel git repository as well. Never mind the fact that we share SeaBIOS with the coreboot project, which is very actively adding features to it that benefit us as well.....

SeaBIOS is in essence a firmware, so it could simply be loaded as such. Just look at the qemu source code - the BIOSes are .bin images in qemu/pc-bios/, imported externally in essence. Moving qemu to tools/kvm/ would not change that much. The firmware could become part of /lib/firmware/*.bin. ( That would probably be a more intelligent approach to the BIOS image import problem as well. )

> Sorry, but there are times when unification makes sense, and there are times where having a reasonably well designed split makes sense. KVM had problems with QEMU in the past which resulted in the qemu-kvm branch of it, which proved to be a major pain to deal with, but that is fortunately improving and qemu-kvm should go away completely at some point.

The qemu-kvm branch is not similar to my proposal at all: it made KVM _more_ fragmented, not more unified. I.e. it was a move in the exact opposite direction, and i'd expect such a move to fail. In fact the failure of qemu-kvm supports my point rather explicitly: it demonstrates that extra packages and split development are actively harmful.

I speak about this as a person who has done successful unifications of split codebases, and in my judgement this move would be significantly beneficial to KVM. You cannot really validly reject this proposal with "It won't work", as it clearly has worked in other, comparable cases. You could only reject this with "I have tried it and it didn't work".

Think about it: a clean and hackable user-space component in tools/kvm/. It's very tempting :-)

	Ingo
* Re: [RFC] Unify KVM kernel-space and user-space code into a single project

From: Jes Sorensen @ 2010-03-18 10:40 UTC
To: Ingo Molnar
Cc: Anthony Liguori, Avi Kivity, Zhang, Yanmin, Peter Zijlstra, Sheng Yang, linux-kernel, kvm, Marcelo Tosatti, Joerg Roedel, Gleb Natapov, Zachary Amsden, ziteng.huang, Arnaldo Carvalho de Melo, Frédéric Weisbecker

On 03/18/10 10:54, Ingo Molnar wrote:
> * Jes Sorensen <Jes.Sorensen@redhat.com> wrote:
> [...]
> >
> > At my previous employer we ended up dropping all Xen efforts exactly because it was like maintaining two separate operating system kernels. The key to KVM is that once you have Linux, you practically have KVM as well.
>
> Yes. Please realize that what is behind it is a strikingly simple argument:
>
>   "Once you have a single project to develop and maintain, all is much better."

That's a very glorified statement but it's not reality, sorry. You can do that with something like perf because it's so small and development of perf is limited to a very small group of developers.

> > [...] However, there is far more to it than just a couple of ioctls; for example the stack of reverse device-drivers is a pretty significant code base, and rewriting that and maintaining it is not a trivial task. It is certainly my belief that the benefit we get from sharing that with QEMU by far outweighs the cost of forking it and keeping our own fork in the kernel tree. In fact it would result in exactly the same problems I mentioned above wrt Xen.
>
> I do not suggest forking Qemu at all, i suggest using the most natural development model for the KVM+Qemu shared project: a single repository.

If you are not suggesting to fork QEMU, what are you suggesting then? You don't seriously expect that the KVM community will be able to mandate that the QEMU community switch to the Linux kernel repository? That would be like telling the openssl developers that they should merge with glibc and start working out of the glibc tree.

What you are suggesting is *only* going to happen if we fork QEMU; there is zero chance of moving the main QEMU repository into the Linux kernel tree. And trust me, you don't want to have Linus having to deal with handling patches for tcg or embedded board emulation.

> > With this you have just thrown away all the benefits of having the QEMU repository shared with other developers who will actively fix bugs in components we do care about for KVM.
>
> Not if it's a unified project.

You still haven't explained how you expect to create a unified KVM+QEMU project without forking from the existing QEMU.

> > >  - encourage kernel-space and user-space KVM developers to work on both user-space and kernel-space bits as a single unit. It's one project and a single experience to the user.
> >
> > This is already happening and a total non issue.
>
> My experience as an external observer of the end result contradicts this.

What I have seen you complain about here is the lack of a good end user GUI for KVM. However that is a different thing. So far no vendor has put significant effort into it, but that is nothing new in Linux. We have a great kernel, but our user applications are still lacking. We have 217 CD players for GNOME, but we have no usable calendaring application.

A good GUI for virtualization is a big task, and whoever designs it will base their design upon their preferences for what's important. A lot of spare-time developers would clearly care most about a gui installation and fancy icons to click on, whereas server users would be much more interested in automation and remote access to the systems. For a good example of an incomplete solution, try installing Fedora over a serial line - you cannot do half the things without launching VNC :(

Getting a comprehensive solution for this that would satisfy the bulk of the users would be a huge chunk of code in the kernel tree. Imagine the screaming that would result from that! How often have we not had the moaning from x86 users who wanted to rip out all the non-x86 code to reduce the size of the tarball?

> Seemingly trivial usability changes to the KVM+Qemu combo are often not being done because they involve cross-discipline changes.

Which trivial usability changes?

> > >  - [ and probably libvirt should go there too ]
> >
> > Now that would be interesting, next we'll have to include things like libxml in the kernel git tree as well, to make sure libvirt doesn't get out of sync with the version supplied by your distribution vendor.
>
> The way we have gone about this in tools/perf/ is similar to the route picked by Git: we only use very low-level libraries available everywhere, and we provide optional wrappers to the rest.

Did you ever look at what libvirt actually does and what it offers? Or how about the various libraries used by QEMU to offer things like VNC support or X support? Again this works fine for something like perf where the primary display is text mode.

> > So far your argument would justify pulling all of gdb into the kernel git tree as well, to support the kgdb efforts, or gcc so we can get rid of the gcc version quirks in the kernel header files, e2fsprogs and equivalent for _all_ file systems included in the kernel so we can make sure our fs tools never get out of sync with what's supported in the kernel......
>
> gdb and gcc are clearly extrinsic to the kernel, so why would we move them there?

gdb should go with kgdb, which goes with the kernel, to keep it in sync. If you want to be consistent in your argument, you have to go all the way.

> I was talking about tools that are closely related to the kernel - where much of the development and actual use is in combination with the Linux kernel.

Well the file system tools would obviously have to go into the kernel then, so appropriate binaries can be distributed to match the kernel.

> 90%+ of the Qemu use cases are combined with Linux. (Yes, i know that you can run Qemu without KVM, and no, i don't think it matters in the grand scheme of things - most investment into Qemu comes from the KVM angle these days. In particular it for sure does not justify handicapping future KVM evolution so drastically.)

90+%? You've got to be kidding? You clearly have no idea just how much it's used for running embedded emulators on non-Linux hosts. You should have seen the noise it made when I added C99 initializers to certain structs, because it broke builds using very old GCC versions on BeOS. Linux only, not a chance. Try subscribing to qemu-devel and you'll see a list that is only overtaken by a few lists like lkml in terms of daily traffic.

> > Oh and you completely forgot SeaBIOS. KVM+QEMU rely on SeaBIOS too, so from what you're saying we should pull that into the kernel git repository as well. Never mind the fact that we share SeaBIOS with the coreboot project, which is very actively adding features to it that benefit us as well.....
>
> SeaBIOS is in essence a firmware, so it could simply be loaded as such.
>
> Just look at the qemu source code - the BIOSes are .bin images in qemu/pc-bios/, imported externally in essence.

Ehm no, QEMU now pulls in SeaBIOS to build it. And there are a lot of changes that require modification in SeaBIOS to match changes to QEMU.

> The qemu-kvm branch is not similar to my proposal at all: it made KVM _more_ fragmented, not more unified. I.e. it was a move in the exact opposite direction, and i'd expect such a move to fail.
>
> In fact the failure of qemu-kvm supports my point rather explicitly: it demonstrates that extra packages and split development are actively harmful.

Ehm, it showed what happens when you fork QEMU to modify it primarily for your own project, i.e. KVM. You are suggesting we fork QEMU for the benefit of KVM, and exactly the same thing will happen. I know you state that you are not suggesting we fork it, but as I showed above, pulling QEMU into the kernel tree can only happen as a fork. There is no point pretending otherwise.

> I speak about this as a person who has done successful unifications of split codebases, and in my judgement this move would be significantly beneficial to KVM.
>
> You cannot really validly reject this proposal with "It won't work", as it clearly has worked in other, comparable cases. You could only reject this with "I have tried it and it didn't work".
>
> Think about it: a clean and hackable user-space component in tools/kvm/. It's very tempting :-)

I say this based on my hacking experience, my experience with the kernel, the QEMU base, SeaBIOS and merging projects. Yes it can be done, but the cost is much higher than the gain.

Cheers,
Jes
* Re: [RFC] Unify KVM kernel-space and user-space code into a single project

From: Ingo Molnar @ 2010-03-18 10:58 UTC
To: Jes Sorensen
Cc: Anthony Liguori, Avi Kivity, Zhang, Yanmin, Peter Zijlstra, Sheng Yang, linux-kernel, kvm, Marcelo Tosatti, Joerg Roedel, Gleb Natapov, Zachary Amsden, ziteng.huang, Arnaldo Carvalho de Melo, Frédéric Weisbecker

* Jes Sorensen <Jes.Sorensen@redhat.com> wrote:

> On 03/18/10 10:54, Ingo Molnar wrote:
> > * Jes Sorensen <Jes.Sorensen@redhat.com> wrote:
> [...]
> > >
> > > At my previous employer we ended up dropping all Xen efforts exactly because it was like maintaining two separate operating system kernels. The key to KVM is that once you have Linux, you practically have KVM as well.
> >
> > Yes. Please realize that what is behind it is a strikingly simple argument:
> >
> >   "Once you have a single project to develop and maintain, all is much better."
>
> That's a very glorified statement but it's not reality, sorry. You can do that with something like perf because it's so small and development of perf is limited to a very small group of developers.

I was not talking about just perf: i am also talking about the arch/x86/ unification, which is 200+ KLOC of highly non-trivial kernel code with hundreds of contributors and with 8000+ commits in the past two years.

Also, it applies to perf as well: people said exactly that a year ago: 'perf has it easy to be clean as it is small; once it gets as large as Oprofile tooling it will be in the same messy situation'. Today perf has more features than Oprofile, has a larger and more complex code base, has more contributors, and no, it's not in the same messy situation at all.

So whatever you think of large, unified projects, you are quite clearly mistaken. I have done and maintained two different types of unifications, and the experience was very similar: both developers and users (and maintainers) are much better off.

	Ingo
* Re: [RFC] Unify KVM kernel-space and user-space code into a single project

From: Jes Sorensen @ 2010-03-18 13:23 UTC
To: Ingo Molnar
Cc: Anthony Liguori, Avi Kivity, Zhang, Yanmin, Peter Zijlstra, Sheng Yang, linux-kernel, kvm, Marcelo Tosatti, Joerg Roedel, Gleb Natapov, Zachary Amsden, Arnaldo Carvalho de Melo, Frédéric Weisbecker

On 03/18/10 11:58, Ingo Molnar wrote:
> * Jes Sorensen <Jes.Sorensen@redhat.com> wrote:
> > That's a very glorified statement but it's not reality, sorry. You can do that with something like perf because it's so small and development of perf is limited to a very small group of developers.
>
> I was not talking about just perf: i am also talking about the arch/x86/ unification, which is 200+ KLOC of highly non-trivial kernel code with hundreds of contributors and with 8000+ commits in the past two years.

Sorry, but you cannot compare merging two chunks of kernel code that originated from the same base with the effort of mixing a large userland project with a kernel component. Apples and oranges.

> Also, it applies to perf as well: people said exactly that a year ago: 'perf has it easy to be clean as it is small; once it gets as large as Oprofile tooling it will be in the same messy situation'.
>
> Today perf has more features than Oprofile, has a larger and more complex code base, has more contributors, and no, it's not in the same messy situation at all.

Both perf and oprofile are still relatively small projects in comparison to QEMU.

> So whatever you think of large, unified projects, you are quite clearly mistaken. I have done and maintained two different types of unifications, and the experience was very similar: both developers and users (and maintainers) are much better off.

You believe that I am wrong in my assessment of unified projects, and I obviously think you are mistaken and underestimating the cost and effects of trying to merge the two. Well, I think we are just going to agree to disagree on this one. I am not against merging projects where it makes sense, but in this particular case I am strongly convinced the loss would be much greater than the gain.

Cheers,
Jes
* Re: [RFC] Unify KVM kernel-space and user-space code into a single project

From: Ingo Molnar @ 2010-03-18 14:22 UTC
To: Jes Sorensen
Cc: Anthony Liguori, Avi Kivity, Zhang, Yanmin, Peter Zijlstra, Sheng Yang, linux-kernel, kvm, Marcelo Tosatti, Joerg Roedel, Gleb Natapov, Zachary Amsden, Arnaldo Carvalho de Melo, Frédéric Weisbecker

* Jes Sorensen <Jes.Sorensen@redhat.com> wrote:

> On 03/18/10 11:58, Ingo Molnar wrote:
> > * Jes Sorensen <Jes.Sorensen@redhat.com> wrote:
> > > That's a very glorified statement but it's not reality, sorry. You can do that with something like perf because it's so small and development of perf is limited to a very small group of developers.
> >
> > I was not talking about just perf: i am also talking about the arch/x86/ unification, which is 200+ KLOC of highly non-trivial kernel code with hundreds of contributors and with 8000+ commits in the past two years.
>
> Sorry, but you cannot compare merging two chunks of kernel code that originated from the same base with the effort of mixing a large userland project with a kernel component. Apples and oranges.

That's true to a certain degree, but combined with the perf experience it's all rather clear. Similar arguments were made against the x86 unification and against perf. Similar arguments were made against KVM and in favor of Xen years ago - back when few of you knew about it ;-) These are all repeating patterns in my experience.

You could fairly contrast that with a _failed_ unification perhaps - but i'm not aware of any such failed unification. (please educate me if you are)

The thing is, unifications are rare in the OSS space not because they don't make sense technically (to the contrary); they are rare due to blind inertia (why change if we managed to muddle through with the current scheme?) and to a certain degree due to the egos involved ;-) As such we have a proliferation of packages in Linux, and we'd be much better off in a more focused fashion. And whenever i see that in the kernel's context i'll mention it - as happened here too.

> > Also, it applies to perf as well: people said exactly that a year ago: 'perf has it easy to be clean as it is small; once it gets as large as Oprofile tooling it will be in the same messy situation'.
> >
> > Today perf has more features than Oprofile, has a larger and more complex code base, has more contributors, and no, it's not in the same messy situation at all.
>
> Both perf and oprofile are still relatively small projects in comparison to QEMU.

So is your argument that the unification does not make sense due to size? Would a smaller Qemu be more appropriate for this purpose?

> > So whatever you think of large, unified projects, you are quite clearly mistaken. I have done and maintained two different types of unifications, and the experience was very similar: both developers and users (and maintainers) are much better off.
>
> You believe that I am wrong in my assessment of unified projects, and I obviously think you are mistaken and underestimating the cost and effects of trying to merge the two.
>
> Well, I think we are just going to agree to disagree on this one. I am not against merging projects where it makes sense, but in this particular case I am strongly convinced the loss would be much greater than the gain.

I wish you said that based on first-hand negative experience with unifications, not based on just pure speculation. (and yes, i speculate too, but at least with some basis)

	Ingo
* Re: [RFC] Unify KVM kernel-space and user-space code into a single project

From: Jes Sorensen @ 2010-03-18 14:45 UTC
To: Ingo Molnar
Cc: Anthony Liguori, Avi Kivity, Zhang, Yanmin, Peter Zijlstra, Sheng Yang, linux-kernel, kvm, Marcelo Tosatti, Joerg Roedel, Gleb Natapov, Zachary Amsden, Arnaldo Carvalho de Melo, Frédéric Weisbecker

On 03/18/10 15:22, Ingo Molnar wrote:
> * Jes Sorensen <Jes.Sorensen@redhat.com> wrote:
> > Both perf and oprofile are still relatively small projects in comparison to QEMU.
>
> So is your argument that the unification does not make sense due to size? Would a smaller Qemu be more appropriate for this purpose?

As I have stated repeatedly in this discussion, a unification would hurt the QEMU development process because it would alienate a large number of QEMU developers who are *not* Linux kernel users. QEMU is a lot more complex than you let on.

> > Well, I think we are just going to agree to disagree on this one. I am not against merging projects where it makes sense, but in this particular case I am strongly convinced the loss would be much greater than the gain.
>
> I wish you said that based on first-hand negative experience with unifications, not based on just pure speculation.
>
> (and yes, i speculate too, but at least with some basis)

You still haven't given us a *single* example of unification of something that wasn't purely linked to the Linux kernel. perf/oprofile is 100% linked to the Linux kernel; QEMU is not. I wish you would actually look at what users use QEMU for. As long as you continue to purely speculate on this, to use your own words, your arguments are not holding up.

And you are not being consistent either. You have conveniently continued to ignore my question about why the file system tools are not to be merged into the Linux kernel source tree.

Jes
* Re: [RFC] Unify KVM kernel-space and user-space code into a single project

From: Ingo Molnar @ 2010-03-18 16:54 UTC
To: Jes Sorensen
Cc: Anthony Liguori, Avi Kivity, Zhang, Yanmin, Peter Zijlstra, Sheng Yang, linux-kernel, kvm, Marcelo Tosatti, Joerg Roedel, Gleb Natapov, Zachary Amsden, Arnaldo Carvalho de Melo, Frédéric Weisbecker

* Jes Sorensen <Jes.Sorensen@redhat.com> wrote:

> On 03/18/10 15:22, Ingo Molnar wrote:
> > * Jes Sorensen <Jes.Sorensen@redhat.com> wrote:
> > > Both perf and oprofile are still relatively small projects in comparison to QEMU.
> >
> > So is your argument that the unification does not make sense due to size? Would a smaller Qemu be more appropriate for this purpose?
>
> As I have stated repeatedly in this discussion, a unification would hurt the QEMU development process because it would alienate a large number of QEMU developers who are *not* Linux kernel users.

I took a quick look at the qemu.git log and more than half of all recent contributions came from Linux distributors. So without KVM, Qemu would be a much, much smaller project. It would be similar to how it was 5 years ago.

> QEMU is a lot more complex than you let on.

Please educate me then about the specifics.

> > > Well, I think we are just going to agree to disagree on this one. I am not against merging projects where it makes sense, but in this particular case I am strongly convinced the loss would be much greater than the gain.
> >
> > I wish you said that based on first-hand negative experience with unifications, not based on just pure speculation.
> >
> > (and yes, i speculate too, but at least with some basis)
>
> You still haven't given us a *single* example of unification of something that wasn't purely linked to the Linux kernel. perf/oprofile is 100% linked to the Linux kernel; QEMU is not. I wish you would actually look at what users use QEMU for. As long as you continue to purely speculate on this, to use your own words, your arguments are not holding up.

The stats show that the huge increase in Qemu contributions over the past few years was mainly due to KVM. Do you claim it wasn't? What other projects make use of it and pay developers to work on it?

> And you are not being consistent either. You have conveniently continued to ignore my question about why the file system tools are not to be merged into the Linux kernel source tree.

Sorry, i didn't comment on it because the answer is obvious: the file system tools and pretty much any Linux-exclusive tool (such as udev) should be moved there. The difference is that there's not much active development done in most of those tools, so the benefits are probably marginal. Both Qemu and KVM are being developed very actively though, so development model inefficiencies show up.

Anyway, i didn't think i'd step into such a hornet's nest by explaining what i see as KVM's biggest weakness today and how i suggest it be fixed :-) If you don't agree with me, then don't do it - no need to get emotional about it.

Thanks,

	Ingo
* Re: [RFC] Unify KVM kernel-space and user-space code into a single project

From: Anthony Liguori @ 2010-03-18 18:10 UTC
To: Ingo Molnar
Cc: Jes Sorensen, Avi Kivity, Zhang, Yanmin, Peter Zijlstra, Sheng Yang, linux-kernel, kvm, Marcelo Tosatti, Joerg Roedel, Gleb Natapov, Zachary Amsden, Arnaldo Carvalho de Melo, Frédéric Weisbecker

On 03/18/2010 11:54 AM, Ingo Molnar wrote:
> I took a quick look at the qemu.git log and more than half of all recent contributions came from Linux distributors.

I don't know what you're looking at, but in the past month there have been 56 unique contributors, with 411 changesets. I count 16 people employed by distributions, with 188 changesets.

> So without KVM, Qemu would be a much, much smaller project. It would be similar to how it was 5 years ago.

I'm not saying that KVM isn't significant. I'm employed to work on QEMU because of KVM. I'm just saying that KVM users aren't 99% of the community and that we can't neglect the rest of the community.

Regards,

Anthony Liguori
* Re: [RFC] Unify KVM kernel-space and user-space code into a single project 2010-03-18 8:44 ` Jes Sorensen 2010-03-18 9:54 ` Ingo Molnar @ 2010-03-19 14:53 ` Andrea Arcangeli 1 sibling, 0 replies; 390+ messages in thread From: Andrea Arcangeli @ 2010-03-19 14:53 UTC (permalink / raw) To: Jes Sorensen Cc: Ingo Molnar, Anthony Liguori, Avi Kivity, Zhang, Yanmin, Peter Zijlstra, Sheng Yang, linux-kernel, kvm, Marcelo Tosatti, oerg Roedel, Gleb Natapov, Zachary Amsden, ziteng.huang, Arnaldo Carvalho de Melo, Fr?d?ric Weisbecker Hi there, not really trying to get into the CC list of this discussion ;) but for what is worth I'd like to share my opinion on the matter. On Thu, Mar 18, 2010 at 09:44:18AM +0100, Jes Sorensen wrote: > What made KVM so successful was that the core kernel of the hypervisor > was designed the right way, as a kernel module where it belonged. It was > obvious to anyone who had been exposed to the main competition at the > time, Xen, that this was the right approach. What has ended up killing > Xen in the end is the not-invented-here approach of copying everything > over, reformatting it, and rewriting half of it, which made it > impossible to maintain and support as a single codebase. At my previous Full agreement with that. CVS/git/patches and development model is next to irrelevant compared to the basic design of the code. qemu (and especially qemu-kvm) is surely much closer to perf, than a firefox or openoffice, because there is some tight interconnect with the kernel API. And the skills required to produce useful patches in qemu are similar to the skills requires to produce useful patches for the kernel, more often than not a new feature in kvm also requires some merging of a qemu-kvm side patch (it always happened to me so far ;). 
But clearly we've to draw a barrier somewhere and while I could see things like systemtap and util-linux included into the kernel and perf already is, I've an hard time to see userland code supporting kernels other than linux into the kernel. I think that's probably where I'd draw the line. Let's say somebody creates a pure paravirt userland for kvm without full driver emulation that only runs on a linux kernel and no other OS, maybe that thing wouldn't be so controversial to include into the kernel as qemu is. qemu is clearly beyond the "only-running-on-a-linux-kernel" barrier... I'd definitely start with systemtap, which I think is even more suitable than perf to be merged into the kernel. Things useful only for developers like perf/systemtap makes even more sense to fetch silently hidden in a single pull. Those projects are so ideal to fetch together because you run your own compiled userland binary and not an rpm, and you need very latest kernel and userland package and sometime new userland might not work so well with older kernel too and the other way around. they're tool for developers and no developer cares about API as they rebuild latest userland code anyway, they almost don't require backwards compatibility of kernel. > So far your argument would justify pulling all of gdb into the kernel > git tree as well, to support the kgdb efforts, or gcc so we can get rid > of the gcc version quirks in the kernel header files, e2fsprogs and > equivalent for _all_ file systems included in the kernel so we can make > sure our fs tools never get out of sync with whats supported in the > kernel...... It also boils down to the maintainer, where the code is, defines the maintainer who pushes/commits it to the central repository that everyone pulls from. And having the code into Linus's tree doesn't make sense unless Linus is interested to follow and review qemu. So it'd only create blind pulls. 
But I entirely see what Ingo is going after, and I have no doubt that contribution increases if some code is merged into the kernel even if it's userland. The more people clone a project, the more people build, use, and read the code, and have an incentive to contribute... and there's nothing else like the linux git tree to give visibility to a project and get more contributions (well, as long as the pulled code requires similar skills to the kernel code, of course). Plus it's annoying to go on the web, find the url to clone, clone... running make is faster. But this is purely a PR effect. It's like free ads, to get more people using and looking into the code because it's already on the harddisk and you only have to run make.

After somebody gets familiar with the pulled userland code, because they found it in the tree and didn't need to search the web, cut-and-paste a url, and clone a new repo, I think it wouldn't matter anymore whether it's in the kernel or not. As for perf, by now I doubt it'd get fewer contributions if it were moved out of the kernel tree. The only reason to leave it there is if Linus actively checks the code before pulling it.

So I think what would be nice, to get the positive PR/ads effect and get more _users_ (and later developers) involved without actually merging, is a command like:

git clone linux-2.6
cd linux-2.6
tools/clone-project qemu-kvm
tools/clone-project qemu
tools/clone-project systemtap
tools/clone-project seabios
tools/clone-project e2fsprogs
tools/clone-project perf
...

And then maybe a git-send-email or similar command that would do the right thing and send the patch to the right list. Learning the process of other projects is time consuming and requires some effort. But as far as qemu-kvm goes, the visibility is already there, and most people who could possibly contribute to the kernel side already have the userland cloned and pull regularly from it, so I doubt it'd generate anything remotely as beneficial as it did for perf.
systemtap is really _identical_ to perf. You include it, lots more developers toy with it by just running "make", they find a bug and they fix it, and they keep contributing new features later after they've become familiar with it. As usual ;).

In a separate mail Ingo said:

> Btw., i made similar arguments to Avi about 3 years ago when it was
> going upstream, that qemu should be unified with KVM. This is more
> true today than ever.

Well, I'm not sure if by KVM you mean qemu-kvm or the KVM kernel code, but I would see huge value and a win-win situation in seeing qemu-kvm and qemu unified. It's beyond me how there can still be a difference, considering that nobody runs qemu with perhaps the exception of the maintainers themselves. There's not even a qemu/kvm directory in qemu. These are the real problems that should be solved... Every time I have to send a patch I have to check if it also applies to qemu. I usually start with qemu-kvm and do my development there, then I cross my fingers and hope the patch applies cleanly against qemu too, and send it there in that case. qemu and qemu-kvm are the same thing; it's the same people, the same community, the same skills, and it's an absolute waste that such a small amount of code isn't merged so we can all work on the same tree. And it's not a huge patch at all compared to the size of qemu-kvm... so there can't be any technical explanation for qemu to take a tangent here.

So my suggestion is to start with what will give a _real_ tangible benefit to developers (i.e. all working on the same branch). The PR effect of merging it into the kernel would be minor for qemu-kvm; it really doesn't matter which url we pull the code from as long as there is only 1 url and not 2. ^ permalink raw reply [flat|nested] 390+ messages in thread
* Re: [RFC] Unify KVM kernel-space and user-space code into a single project 2010-03-17 8:10 ` [RFC] Unify KVM kernel-space and user-space code into a single project Ingo Molnar 2010-03-18 8:20 ` Avi Kivity 2010-03-18 8:44 ` Jes Sorensen @ 2010-03-18 14:38 ` Anthony Liguori 2010-03-18 14:44 ` Anthony Liguori 3 siblings, 0 replies; 390+ messages in thread
From: Anthony Liguori @ 2010-03-18 14:38 UTC (permalink / raw)
To: Ingo Molnar
Cc: Avi Kivity, Zhang, Yanmin, Peter Zijlstra, Sheng Yang, linux-kernel, kvm, Marcelo Tosatti, Joerg Roedel, Jes Sorensen, Gleb Natapov, Zachary Amsden, ziteng.huang, Arnaldo Carvalho de Melo, Frédéric Weisbecker

[-- Attachment #1: Type: text/plain, Size: 1836 bytes --]

On 03/17/2010 03:10 AM, Ingo Molnar wrote:
> * Anthony Liguori<anthony@codemonkey.ws> wrote:
>
>> On 03/16/2010 12:39 PM, Ingo Molnar wrote:
>>
>>>> If we look at the use-case, it's going to be something like, a user is
>>>> creating virtual machines and wants to get performance information about
>>>> them.
>>>>
>>>> Having to run a separate tool like perf is not going to be what they would
>>>> expect they had to do. Instead, they would either use their existing GUI
>>>> tool (like virt-manager) or they would use their management interface
>>>> (either QMP or libvirt).
>>>>
>>>> The complexity of interaction is due to the fact that perf shouldn't be a
>>>> stand alone tool. It should be a library or something with a programmatic
>>>> interface that another tool can make use of.
>>>>
>>> But ... a GUI interface/integration is of course possible too, and it's being
>>> worked on.
>>>
>>> perf is mainly a kernel developer tool, and kernel developers generally dont
>>> use GUIs to do their stuff: which is the (sole) reason why its first ~850
>>> commits of tools/perf/ were done without a GUI. We go where our developers
>>> are.
>>>
>>> In any case it's not an excuse to have no proper command-line tooling. In fact
>>> if you cannot get simpler, more atomic command-line tooling right then you'll
>>> probably doubly suck at doing a GUI as well.
>>>
>> It's about who owns the user interface.
>>
>> If qemu owns the user interface, then we can satisfy this in a very simple
>> way by adding a perf monitor command. If we have to support third party
>> tools, then it significantly complicates things.
>>
> Of course illogical modularization complicates things 'significantly'.
>

Ok. Then apply this to the kernel. I'm then happy to take patches.

Regards,

Anthony Liguori

[-- Attachment #2: qemu-linux.patch --]
[-- Type: text/plain, Size: 1204 bytes --]

commit 84b84db054e83e7686b80fad9f8d2aa87aade1a1
Author: Anthony Liguori <aliguori@us.ibm.com>
Date: Thu Mar 18 09:35:29 2010 -0500

    Bring QEMU into the Linux kernel tree

    Ingo is under the impression that this will result in a massive
    improvement in the usability of QEMU.

    Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>

diff --git a/.gitmodules b/.gitmodules
new file mode 100644
index 0000000..76cdb68
--- /dev/null
+++ b/.gitmodules
@@ -0,0 +1,3 @@
+[submodule "tools/qemu"]
+	path = tools/qemu
+	url = git://git.qemu.org/qemu.git
diff --git a/MAINTAINERS b/MAINTAINERS
index 03f38c1..6275796 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -4427,6 +4427,12 @@ M: Robert Jarzmik <robert.jarzmik@free.fr>
 L: rtc-linux@googlegroups.com
 S: Maintained

+QEMU
+M: Anthony Liguori <aliguori@us.ibm.com>
+L: qemu-devel@nongnu.org
+S: Maintained
+F: tools/qemu
+
 QLOGIC QLA2XXX FC-SCSI DRIVER
 M: Andrew Vasquez <andrew.vasquez@qlogic.com>
 M: linux-driver@qlogic.com
diff --git a/tools/qemu b/tools/qemu
new file mode 160000
index 0000000..e5322f7
--- /dev/null
+++ b/tools/qemu
@@ -0,0 +1 @@
+Subproject commit e5322f76a72352eea8eb511390c27726b64e5a87

^ permalink raw reply related [flat|nested] 390+ messages in thread
* Re: [RFC] Unify KVM kernel-space and user-space code into a single project 2010-03-17 8:10 ` [RFC] Unify KVM kernel-space and user-space code into a single project Ingo Molnar ` (2 preceding siblings ...) 2010-03-18 14:38 ` Anthony Liguori @ 2010-03-18 14:44 ` Anthony Liguori 3 siblings, 0 replies; 390+ messages in thread
From: Anthony Liguori @ 2010-03-18 14:44 UTC (permalink / raw)
To: Ingo Molnar
Cc: Avi Kivity, Zhang, Yanmin, Peter Zijlstra, Sheng Yang, linux-kernel, kvm, Marcelo Tosatti, Joerg Roedel, Jes Sorensen, Gleb Natapov, Zachary Amsden, ziteng.huang, Arnaldo Carvalho de Melo, Frédéric Weisbecker

On 03/17/2010 03:10 AM, Ingo Molnar wrote:
> - move a clean (and minimal) version of the Qemu code base to tools/kvm/, in
> the upstream kernel repo, and work on that from that point on.
>

QEMU is about 600k LOC. We have a mechanism to compile out portions of the code, but a lot of things are tied together in an intimate way. In the long run, we're working on adding stronger interfaces such that we can split components out into libraries that are consumable by other applications.

Simply forking the device model won't work. Well more than half of our contributors do not come from KVM developers/users. If you just fork the device models, you start to lose a ton of fixes (look at Xen and VirtualBox).

So feel free to either 1) apply my previous patch and then start working on a "clean (and minimal)" QEMU or 2) wait to commit my previous patch and start sending patches to clean up QEMU.

Absolutely none of this is going to give you a VirtualBox-like GUI for QEMU.

Regards,

Anthony Liguori ^ permalink raw reply [flat|nested] 390+ messages in thread
* Re: [PATCH] Enhance perf to collect KVM guest os statistics from host side 2010-03-16 11:25 ` Ingo Molnar 2010-03-16 12:21 ` Avi Kivity @ 2010-03-16 22:30 ` Joerg Roedel 2010-03-16 23:01 ` Masami Hiramatsu 2010-03-17 7:27 ` Ingo Molnar 1 sibling, 2 replies; 390+ messages in thread
From: Joerg Roedel @ 2010-03-16 22:30 UTC (permalink / raw)
To: Ingo Molnar
Cc: Avi Kivity, Zhang, Yanmin, Peter Zijlstra, Sheng Yang, linux-kernel, kvm, Marcelo Tosatti, Jes Sorensen, Gleb Natapov, Zachary Amsden, ziteng.huang

On Tue, Mar 16, 2010 at 12:25:00PM +0100, Ingo Molnar wrote:
> Hm, that sounds rather messy if we want to use it to basically expose kernel
> functionality in a guest/host unified way. Is the qemu process discoverable in
> some secure way? Can we trust it? Is there some proper tooling available to do
> it, or do we have to push it through 2-3 packages to get such a useful feature
> done?

Since we want to implement a PMU usable by the guest anyway, why don't we just use the guest's perf to get all the information we want? If we get a PMU NMI from the guest, we just re-inject it into the guest, and perf in the guest gives us all the information we want, including kernel and userspace symbols, stack traces, and so on.

In the previous thread we discussed a direct trace channel between guest and host kernel (which can be used for ftrace events, for example). This channel could be used to transport this information to the host kernel.

The only additional feature needed is a way for the host to start a perf instance in the guest.

Opinions?

Joerg ^ permalink raw reply [flat|nested] 390+ messages in thread
* Re: [PATCH] Enhance perf to collect KVM guest os statistics from host side 2010-03-16 22:30 ` [PATCH] Enhance perf to collect KVM guest os statistics from host side Joerg Roedel @ 2010-03-16 23:01 ` Masami Hiramatsu 2010-03-17 7:27 ` Ingo Molnar 1 sibling, 0 replies; 390+ messages in thread
From: Masami Hiramatsu @ 2010-03-16 23:01 UTC (permalink / raw)
To: Joerg Roedel
Cc: Ingo Molnar, Avi Kivity, Zhang, Yanmin, Peter Zijlstra, Sheng Yang, linux-kernel, kvm, Marcelo Tosatti, Jes Sorensen, Gleb Natapov, Zachary Amsden, ziteng.huang, sungho.kim.zd

Joerg Roedel wrote:
> On Tue, Mar 16, 2010 at 12:25:00PM +0100, Ingo Molnar wrote:
>> Hm, that sounds rather messy if we want to use it to basically expose kernel
>> functionality in a guest/host unified way. Is the qemu process discoverable in
>> some secure way? Can we trust it? Is there some proper tooling available to do
>> it, or do we have to push it through 2-3 packages to get such a useful feature
>> done?
>
> Since we want to implement a PMU usable by the guest anyway, why don't we
> just use the guest's perf to get all the information we want? If we get a
> PMU NMI from the guest, we just re-inject it into the guest, and perf in the
> guest gives us all the information we want, including kernel and userspace
> symbols, stack traces, and so on.

I guess this aims to get information from old environments running on kvm for life extension :)

> In the previous thread we discussed a direct trace channel between
> guest and host kernel (which can be used for ftrace events, for example).
> This channel could be used to transport this information to the host
> kernel.

Interesting! I know the people who are trying to do that with systemtap. See, http://vesper.sourceforge.net/

> The only additional feature needed is a way for the host to start a perf
> instance in the guest.

# ssh localguest perf record --host-channel ... ? B-)

Thank you,

>
> Opinions?
> > > Joerg > > -- > To unsubscribe from this list: send the line "unsubscribe linux-kernel" in > the body of a message to majordomo@vger.kernel.org > More majordomo info at http://vger.kernel.org/majordomo-info.html > Please read the FAQ at http://www.tux.org/lkml/ -- Masami Hiramatsu e-mail: mhiramat@redhat.com ^ permalink raw reply [flat|nested] 390+ messages in thread
* Re: [PATCH] Enhance perf to collect KVM guest os statistics from host side 2010-03-16 22:30 ` [PATCH] Enhance perf to collect KVM guest os statistics from host side Joerg Roedel 2010-03-16 23:01 ` Masami Hiramatsu @ 2010-03-17 7:27 ` Ingo Molnar 1 sibling, 0 replies; 390+ messages in thread
From: Ingo Molnar @ 2010-03-17 7:27 UTC (permalink / raw)
To: Joerg Roedel
Cc: Avi Kivity, Zhang, Yanmin, Peter Zijlstra, Sheng Yang, linux-kernel, kvm, Marcelo Tosatti, Jes Sorensen, Gleb Natapov, Zachary Amsden, ziteng.huang

* Joerg Roedel <joro@8bytes.org> wrote:

> On Tue, Mar 16, 2010 at 12:25:00PM +0100, Ingo Molnar wrote:
> > Hm, that sounds rather messy if we want to use it to basically expose kernel
> > functionality in a guest/host unified way. Is the qemu process discoverable in
> > some secure way? Can we trust it? Is there some proper tooling available to do
> > it, or do we have to push it through 2-3 packages to get such a useful feature
> > done?
>
> Since we want to implement a PMU usable by the guest anyway, why don't we
> just use the guest's perf to get all the information we want? [...]

Look at the previous posting of this patch; this is something new and rather unique. The main power in the 'perf kvm' kind of instrumentation is to profile _both_ the host and the guest on the host, using the same tool (often using the same kernel) and using similar workloads, and to do profile comparisons using 'perf diff'.

Note that KVM's in-kernel design makes it easy to offer this kind of host/guest shared implementation that Yanmin has created. Other virtualization solutions with a poorer design (for example where the hypervisor code base is split away from the guest implementation) will have a much harder time creating something similar.
That kind of integrated approach can result in very interesting finds straight away, see: http://lkml.indiana.edu/hypermail/linux/kernel/1003.0/00613.html

( the profile there demos the need for spinlock accelerators, for example - there's clearly asymmetrically large overhead in guest spinlock code. Guess how much else we'll be able to find with a full 'perf kvm' implementation. )

One of the main goals of a virtualization implementation is to eliminate as many performance differences from the host kernel as possible. From the first day KVM was released, the overriding question from users was always: 'how much slower is it than native, and which workloads are hit worst, and why, and could you pretty please speed up important workload XYZ'. 'perf kvm' helps exactly that kind of development workflow.

Note that with oprofile you can already do separate guest-space and host-space profiling (with the timer-driven fallback in the guest). One idea with 'perf kvm' is to change that paradigm of forced separation and forced duplication and to support the workflow that most developers employ: use the host space for development and unify instrumentation in an intuitive framework. Yanmin's 'perf kvm' patch is a very good step towards that goal.

Anyway ... look at the patches, try them and see it for yourself. Back in the days when I did KVM performance work I wish I had something like Yanmin's 'perf kvm' feature. I'd probably still be hacking KVM today ;-)

So, the code is there, it's useful, and it's up to you guys whether you live with this opportunity - the perf developers are certainly eager to help out with the details. There's already a ton of per-kernel-subsystem perf helper tools: perf sched, perf kmem, perf lock, perf bench, perf timechart. 'perf kvm' is really a natural and good next step IMO that underlines the main design goodness KVM brought to the world of virtualization: proper guest/host code base integration.
Thanks, Ingo ^ permalink raw reply [flat|nested] 390+ messages in thread
* Re: [PATCH] Enhance perf to collect KVM guest os statistics from host side 2010-03-16 5:41 ` Avi Kivity 2010-03-16 7:24 ` Ingo Molnar @ 2010-03-16 7:48 ` Zhang, Yanmin 2010-03-16 9:28 ` Zhang, Yanmin 2010-03-16 9:32 ` Avi Kivity 1 sibling, 2 replies; 390+ messages in thread From: Zhang, Yanmin @ 2010-03-16 7:48 UTC (permalink / raw) To: Avi Kivity Cc: Ingo Molnar, Peter Zijlstra, Sheng Yang, linux-kernel, kvm, Marcelo Tosatti, oerg Roedel, Jes Sorensen, Gleb Natapov, Zachary Amsden, ziteng.huang On Tue, 2010-03-16 at 07:41 +0200, Avi Kivity wrote: > On 03/16/2010 07:27 AM, Zhang, Yanmin wrote: > > From: Zhang, Yanmin<yanmin_zhang@linux.intel.com> > > > > Based on the discussion in KVM community, I worked out the patch to support > > perf to collect guest os statistics from host side. This patch is implemented > > with Ingo, Peter and some other guys' kind help. Yang Sheng pointed out a > > critical bug and provided good suggestions with other guys. I really appreciate > > their kind help. > > > > The patch adds new subcommand kvm to perf. > > > > perf kvm top > > perf kvm record > > perf kvm report > > perf kvm diff > > > > The new perf could profile guest os kernel except guest os user space, but it > > could summarize guest os user space utilization per guest os. > > > > Below are some examples. > > 1) perf kvm top > > [root@lkp-ne01 norm]# perf kvm --host --guest --guestkallsyms=/home/ymzhang/guest/kallsyms > > --guestmodules=/home/ymzhang/guest/modules top > > > > > Thanks for your kind comments. > Excellent, support for guest kernel != host kernel is critical (I can't > remember the last time I ran same kernels). > > How would we support multiple guests with different kernels? With the patch, 'perf kvm report --sort pid" could show summary statistics for all guest os instances. Then, use parameter --pid of 'perf kvm record' to collect single problematic instance data. 
> Perhaps a
> symbol server that perf can connect to (and that would connect to guests
> in turn)?
>
> > diff -Nraup linux-2.6_tipmaster0315/arch/x86/kvm/vmx.c linux-2.6_tipmaster0315_perfkvm/arch/x86/kvm/vmx.c
> > --- linux-2.6_tipmaster0315/arch/x86/kvm/vmx.c	2010-03-16 08:59:11.825295404 +0800
> > +++ linux-2.6_tipmaster0315_perfkvm/arch/x86/kvm/vmx.c	2010-03-16 09:01:09.976084492 +0800
> > @@ -26,6 +26,7 @@
> >  #include<linux/sched.h>
> >  #include<linux/moduleparam.h>
> >  #include<linux/ftrace_event.h>
> > +#include<linux/perf_event.h>
> >  #include "kvm_cache_regs.h"
> >  #include "x86.h"
> >
> > @@ -3632,6 +3633,43 @@ static void update_cr8_intercept(struct
> >  	vmcs_write32(TPR_THRESHOLD, irr);
> >  }
> >
> > +DEFINE_PER_CPU(int, kvm_in_guest) = {0};
> > +
> > +static void kvm_set_in_guest(void)
> > +{
> > +	percpu_write(kvm_in_guest, 1);
> > +}
> > +
> > +static int kvm_is_in_guest(void)
> > +{
> > +	return percpu_read(kvm_in_guest);
> > +}
>
> There is already PF_VCPU for this.

Right, but there is a scope between kvm_guest_enter and really running in the guest os, where a perf event might overflow. Anyway, the scope is very narrow; I will change it to use the flag PF_VCPU.

> > +static struct perf_guest_info_callbacks kvm_guest_cbs = {
> > +	.is_in_guest		= kvm_is_in_guest,
> > +	.is_user_mode		= kvm_is_user_mode,
> > +	.get_guest_ip		= kvm_get_guest_ip,
> > +	.reset_in_guest		= kvm_reset_in_guest,
> > +};
>
> Should be in common code, not vmx specific.

Right. I discussed this with Yangsheng. I will move the above data structures and callbacks to arch/x86/kvm/x86.c, and add get_ip, a new callback, to kvm_x86_ops.

Yanmin ^ permalink raw reply [flat|nested] 390+ messages in thread
* Re: [PATCH] Enhance perf to collect KVM guest os statistics from host side 2010-03-16 7:48 ` Zhang, Yanmin @ 2010-03-16 9:28 ` Zhang, Yanmin 2010-03-16 9:33 ` Avi Kivity 2010-03-16 9:47 ` Ingo Molnar 2010-03-16 9:32 ` Avi Kivity 1 sibling, 2 replies; 390+ messages in thread From: Zhang, Yanmin @ 2010-03-16 9:28 UTC (permalink / raw) To: Avi Kivity Cc: Ingo Molnar, Peter Zijlstra, Sheng Yang, linux-kernel, kvm, Marcelo Tosatti, oerg Roedel, Jes Sorensen, Gleb Natapov, Zachary Amsden, zhiteng.huang On Tue, 2010-03-16 at 15:48 +0800, Zhang, Yanmin wrote: > On Tue, 2010-03-16 at 07:41 +0200, Avi Kivity wrote: > > On 03/16/2010 07:27 AM, Zhang, Yanmin wrote: > > > From: Zhang, Yanmin<yanmin_zhang@linux.intel.com> > > > > > > Based on the discussion in KVM community, I worked out the patch to support > > > perf to collect guest os statistics from host side. This patch is implemented > > > with Ingo, Peter and some other guys' kind help. Yang Sheng pointed out a > > > critical bug and provided good suggestions with other guys. I really appreciate > > > their kind help. > > > > > > The patch adds new subcommand kvm to perf. > > > > > > perf kvm top > > > perf kvm record > > > perf kvm report > > > perf kvm diff > > > > > > The new perf could profile guest os kernel except guest os user space, but it > > > could summarize guest os user space utilization per guest os. > > > > > > Below are some examples. > > > 1) perf kvm top > > > [root@lkp-ne01 norm]# perf kvm --host --guest --guestkallsyms=/home/ymzhang/guest/kallsyms > > > --guestmodules=/home/ymzhang/guest/modules top > > > > > > > > > Thanks for your kind comments. > > > Excellent, support for guest kernel != host kernel is critical (I can't > > remember the last time I ran same kernels). > > > > How would we support multiple guests with different kernels? > With the patch, 'perf kvm report --sort pid" could show > summary statistics for all guest os instances. 
Then, use
> parameter --pid of 'perf kvm record' to collect single problematic instance data.

Sorry. I found that currently --pid isn't process-wide but a single thread (the main thread).

Ingo,

Is it possible to support a new parameter or extend --inherit, so 'perf record' and 'perf top' could collect data on all threads of a process when the process is running?

If not, I need to add a new ugly parameter which is similar to --pid to filter out process data in userspace.

Yanmin ^ permalink raw reply [flat|nested] 390+ messages in thread
* Re: [PATCH] Enhance perf to collect KVM guest os statistics from host side 2010-03-16 9:28 ` Zhang, Yanmin @ 2010-03-16 9:33 ` Avi Kivity 2010-03-16 9:47 ` Ingo Molnar 1 sibling, 0 replies; 390+ messages in thread From: Avi Kivity @ 2010-03-16 9:33 UTC (permalink / raw) To: Zhang, Yanmin Cc: Ingo Molnar, Peter Zijlstra, Sheng Yang, linux-kernel, kvm, Marcelo Tosatti, oerg Roedel, Jes Sorensen, Gleb Natapov, Zachary Amsden, zhiteng.huang On 03/16/2010 11:28 AM, Zhang, Yanmin wrote: > Sorry. I found currently --pid isn't process but a thread (main thread). > > Ingo, > > Is it possible to support a new parameter or extend --inherit, so 'perf record' and > 'perf top' could collect data on all threads of a process when the process is running? > That seems like a worthwhile addition regardless of this thread. Profile all current threads and any new ones. It probably makes sense to call this --pid and rename the existing --pid to --thread. -- error compiling committee.c: too many arguments to function ^ permalink raw reply [flat|nested] 390+ messages in thread
* Re: [PATCH] Enhance perf to collect KVM guest os statistics from host side 2010-03-16 9:28 ` Zhang, Yanmin 2010-03-16 9:33 ` Avi Kivity @ 2010-03-16 9:47 ` Ingo Molnar 2010-03-17 9:26 ` Zhang, Yanmin 1 sibling, 1 reply; 390+ messages in thread From: Ingo Molnar @ 2010-03-16 9:47 UTC (permalink / raw) To: Zhang, Yanmin Cc: Avi Kivity, Peter Zijlstra, Sheng Yang, linux-kernel, kvm, Marcelo Tosatti, oerg Roedel, Jes Sorensen, Gleb Natapov, Zachary Amsden, zhiteng.huang * Zhang, Yanmin <yanmin_zhang@linux.intel.com> wrote: > On Tue, 2010-03-16 at 15:48 +0800, Zhang, Yanmin wrote: > > On Tue, 2010-03-16 at 07:41 +0200, Avi Kivity wrote: > > > On 03/16/2010 07:27 AM, Zhang, Yanmin wrote: > > > > From: Zhang, Yanmin<yanmin_zhang@linux.intel.com> > > > > > > > > Based on the discussion in KVM community, I worked out the patch to support > > > > perf to collect guest os statistics from host side. This patch is implemented > > > > with Ingo, Peter and some other guys' kind help. Yang Sheng pointed out a > > > > critical bug and provided good suggestions with other guys. I really appreciate > > > > their kind help. > > > > > > > > The patch adds new subcommand kvm to perf. > > > > > > > > perf kvm top > > > > perf kvm record > > > > perf kvm report > > > > perf kvm diff > > > > > > > > The new perf could profile guest os kernel except guest os user space, but it > > > > could summarize guest os user space utilization per guest os. > > > > > > > > Below are some examples. > > > > 1) perf kvm top > > > > [root@lkp-ne01 norm]# perf kvm --host --guest --guestkallsyms=/home/ymzhang/guest/kallsyms > > > > --guestmodules=/home/ymzhang/guest/modules top > > > > > > > > > > > > > Thanks for your kind comments. > > > > > Excellent, support for guest kernel != host kernel is critical (I can't > > > remember the last time I ran same kernels). > > > > > > How would we support multiple guests with different kernels? 
> > With the patch, 'perf kvm report --sort pid" could show > > summary statistics for all guest os instances. Then, use > > parameter --pid of 'perf kvm record' to collect single problematic instance data. > Sorry. I found currently --pid isn't process but a thread (main thread). > > Ingo, > > Is it possible to support a new parameter or extend --inherit, so 'perf > record' and 'perf top' could collect data on all threads of a process when > the process is running? > > If not, I need add a new ugly parameter which is similar to --pid to filter > out process data in userspace. Yeah. For maximum utility i'd suggest to extend --pid to include this, and introduce --tid for the previous, limited-to-a-single-task functionality. Most users would expect --pid to work like a 'late attach' - i.e. to work like strace -f or like a gdb attach. Ingo ^ permalink raw reply [flat|nested] 390+ messages in thread
* Re: [PATCH] Enhance perf to collect KVM guest os statistics from host side 2010-03-16 9:47 ` Ingo Molnar @ 2010-03-17 9:26 ` Zhang, Yanmin 2010-03-18 2:45 ` Zhang, Yanmin 0 siblings, 1 reply; 390+ messages in thread From: Zhang, Yanmin @ 2010-03-17 9:26 UTC (permalink / raw) To: Ingo Molnar Cc: Avi Kivity, Peter Zijlstra, Sheng Yang, linux-kernel, kvm, Marcelo Tosatti, oerg Roedel, Jes Sorensen, Gleb Natapov, Zachary Amsden, zhiteng.huang On Tue, 2010-03-16 at 10:47 +0100, Ingo Molnar wrote: > * Zhang, Yanmin <yanmin_zhang@linux.intel.com> wrote: > > > On Tue, 2010-03-16 at 15:48 +0800, Zhang, Yanmin wrote: > > > On Tue, 2010-03-16 at 07:41 +0200, Avi Kivity wrote: > > > > On 03/16/2010 07:27 AM, Zhang, Yanmin wrote: > > > > > From: Zhang, Yanmin<yanmin_zhang@linux.intel.com> > > > > > > > > > > Based on the discussion in KVM community, I worked out the patch to support > > > > > perf to collect guest os statistics from host side. This patch is implemented > > > > > with Ingo, Peter and some other guys' kind help. Yang Sheng pointed out a > > > > > critical bug and provided good suggestions with other guys. I really appreciate > > > > > their kind help. > > > > > > > > > > The patch adds new subcommand kvm to perf. > > > > > > > > > > perf kvm top > > > > > perf kvm record > > > > > perf kvm report > > > > > perf kvm diff > > > > > > > > > > The new perf could profile guest os kernel except guest os user space, but it > > > > > could summarize guest os user space utilization per guest os. > > > > > > > > > > Below are some examples. > > > > > 1) perf kvm top > > > > > [root@lkp-ne01 norm]# perf kvm --host --guest --guestkallsyms=/home/ymzhang/guest/kallsyms > > > > > --guestmodules=/home/ymzhang/guest/modules top > > > > > > > > > > > > > > > > > Thanks for your kind comments. > > > > > > > Excellent, support for guest kernel != host kernel is critical (I can't > > > > remember the last time I ran same kernels). 
> > > > > > > > How would we support multiple guests with different kernels? > > > With the patch, 'perf kvm report --sort pid" could show > > > summary statistics for all guest os instances. Then, use > > > parameter --pid of 'perf kvm record' to collect single problematic instance data. > > Sorry. I found currently --pid isn't process but a thread (main thread). > > > > Ingo, > > > > Is it possible to support a new parameter or extend --inherit, so 'perf > > record' and 'perf top' could collect data on all threads of a process when > > the process is running? > > > > If not, I need add a new ugly parameter which is similar to --pid to filter > > out process data in userspace. > > Yeah. For maximum utility i'd suggest to extend --pid to include this, and > introduce --tid for the previous, limited-to-a-single-task functionality. > > Most users would expect --pid to work like a 'late attach' - i.e. to work like > strace -f or like a gdb attach. Thanks Ingo, Avi. I worked out below patch against tip/master of March 15th. Subject: [PATCH] Change perf's parameter --pid to process-wide collection From: Zhang, Yanmin <yanmin_zhang@linux.intel.com> Change parameter -p (--pid) to real process pid and add -t (--tid) meaning thread id. Now, --pid means perf collects the statistics of all threads of the process, while --tid means perf just collect the statistics of that thread. BTW, the patch fixes a bug of 'perf stat -p'. 'perf stat' always configures attr->disabled=1 if it isn't a system-wide collection. If there is a '-p' and no forks, 'perf stat -p' doesn't collect any data. In addition, the while(!done) in run_perf_stat consumes 100% single cpu time which has bad impact on running workload. I added a sleep(1) in the loop. 
Signed-off-by: Zhang Yanmin <yanmin_zhang@linux.intel.com>

---

diff -Nraup linux-2.6_tipmaster0315/tools/perf/builtin-record.c linux-2.6_tipmaster0315_perfpid/tools/perf/builtin-record.c
--- linux-2.6_tipmaster0315/tools/perf/builtin-record.c	2010-03-16 08:59:54.896488489 +0800
+++ linux-2.6_tipmaster0315_perfpid/tools/perf/builtin-record.c	2010-03-17 16:30:17.755551706 +0800
@@ -27,7 +27,7 @@
 #include <unistd.h>
 #include <sched.h>
 
-static int fd[MAX_NR_CPUS][MAX_COUNTERS];
+static int *fd[MAX_NR_CPUS][MAX_COUNTERS];
 
 static long default_interval = 0;
 
@@ -43,6 +43,9 @@ static int raw_samples = 0;
 static int system_wide = 0;
 static int profile_cpu = -1;
 static pid_t target_pid = -1;
+static pid_t target_tid = -1;
+static int *all_tids = NULL;
+static int thread_num = 0;
 static pid_t child_pid = -1;
 static int inherit = 1;
 static int force = 0;
@@ -60,7 +63,7 @@ static struct timeval this_read;
 
 static u64 bytes_written = 0;
 
-static struct pollfd event_array[MAX_NR_CPUS * MAX_COUNTERS];
+static struct pollfd *event_array;
 
 static int nr_poll = 0;
 static int nr_cpu = 0;
@@ -77,7 +80,7 @@ struct mmap_data {
 	unsigned int prev;
 };
 
-static struct mmap_data mmap_array[MAX_NR_CPUS][MAX_COUNTERS];
+static struct mmap_data *mmap_array[MAX_NR_CPUS][MAX_COUNTERS];
 
 static unsigned long mmap_read_head(struct mmap_data *md)
 {
@@ -225,11 +228,12 @@ static struct perf_header_attr *get_head
 	return h_attr;
 }
 
-static void create_counter(int counter, int cpu, pid_t pid, bool forks)
+static void create_counter(int counter, int cpu, bool forks)
 {
 	char *filter = filters[counter];
 	struct perf_event_attr *attr = attrs + counter;
 	struct perf_header_attr *h_attr;
+	int thread_index;
 	int track = !counter; /* only the first counter needs these */
 	int ret;
 	struct {
@@ -280,117 +284,124 @@ static void create_counter(int counter,
 	if (forks)
 		attr->enable_on_exec = 1;
 
+	for (thread_index = 0; thread_index < thread_num; thread_index++) {
 try_again:
-	fd[nr_cpu][counter] = sys_perf_event_open(attr, pid, cpu, group_fd, 0);
+		fd[nr_cpu][counter][thread_index] = sys_perf_event_open(attr,
+			all_tids[thread_index], cpu, group_fd, 0);
 
-	if (fd[nr_cpu][counter] < 0) {
-		int err = errno;
+		if (fd[nr_cpu][counter][thread_index] < 0) {
+			int err = errno;
 
-		if (err == EPERM || err == EACCES)
-			die("Permission error - are you root?\n"
-				"\t Consider tweaking /proc/sys/kernel/perf_event_paranoid.\n");
-		else if (err == ENODEV && profile_cpu != -1)
-			die("No such device - did you specify an out-of-range profile CPU?\n");
+			if (err == EPERM || err == EACCES)
+				die("Permission error - are you root?\n"
+					"\t Consider tweaking /proc/sys/kernel/perf_event_paranoid.\n");
+			else if (err == ENODEV && profile_cpu != -1)
+				die("No such device - did you specify an out-of-range profile CPU?\n");
 
-		/*
-		 * If it's cycles then fall back to hrtimer
-		 * based cpu-clock-tick sw counter, which
-		 * is always available even if no PMU support:
-		 */
-		if (attr->type == PERF_TYPE_HARDWARE
-			&& attr->config == PERF_COUNT_HW_CPU_CYCLES) {
+			/*
+			 * If it's cycles then fall back to hrtimer
+			 * based cpu-clock-tick sw counter, which
+			 * is always available even if no PMU support:
+			 */
+			if (attr->type == PERF_TYPE_HARDWARE
+				&& attr->config == PERF_COUNT_HW_CPU_CYCLES) {
 
-			if (verbose)
-				warning(" ... trying to fall back to cpu-clock-ticks\n");
-			attr->type = PERF_TYPE_SOFTWARE;
-			attr->config = PERF_COUNT_SW_CPU_CLOCK;
-			goto try_again;
-		}
-		printf("\n");
-		error("perfcounter syscall returned with %d (%s)\n",
-			fd[nr_cpu][counter], strerror(err));
+				if (verbose)
+					warning(" ... trying to fall back to cpu-clock-ticks\n");
+				attr->type = PERF_TYPE_SOFTWARE;
+				attr->config = PERF_COUNT_SW_CPU_CLOCK;
+				goto try_again;
+			}
+			printf("\n");
+			error("perfcounter syscall returned with %d (%s)\n",
+				fd[nr_cpu][counter][thread_index],
+				strerror(err));
 
 #if defined(__i386__) || defined(__x86_64__)
-		if (attr->type == PERF_TYPE_HARDWARE && err == EOPNOTSUPP)
-			die("No hardware sampling interrupt available. No APIC? If so then you can boot the kernel with the \"lapic\" boot parameter to force-enable it.\n");
+			if (attr->type == PERF_TYPE_HARDWARE && err == EOPNOTSUPP)
+				die("No hardware sampling interrupt available. No APIC? If so then you can boot the kernel with the \"lapic\" boot parameter to force-enable it.\n");
 #endif
 
-		die("No CONFIG_PERF_EVENTS=y kernel support configured?\n");
-		exit(-1);
-	}
+			die("No CONFIG_PERF_EVENTS=y kernel support configured?\n");
+			exit(-1);
+		}
 
-	h_attr = get_header_attr(attr, counter);
-	if (h_attr == NULL)
-		die("nomem\n");
+		h_attr = get_header_attr(attr, counter);
+		if (h_attr == NULL)
+			die("nomem\n");
+
+		if (!file_new) {
+			if (memcmp(&h_attr->attr, attr, sizeof(*attr))) {
+				fprintf(stderr, "incompatible append\n");
+				exit(-1);
+			}
+		}
 
-	if (!file_new) {
-		if (memcmp(&h_attr->attr, attr, sizeof(*attr))) {
-			fprintf(stderr, "incompatible append\n");
+		if (read(fd[nr_cpu][counter][thread_index], &read_data, sizeof(read_data)) == -1) {
+			perror("Unable to read perf file descriptor\n");
 			exit(-1);
 		}
-	}
-
-	if (read(fd[nr_cpu][counter], &read_data, sizeof(read_data)) == -1) {
-		perror("Unable to read perf file descriptor\n");
-		exit(-1);
-	}
 
-	if (perf_header_attr__add_id(h_attr, read_data.id) < 0) {
-		pr_warning("Not enough memory to add id\n");
-		exit(-1);
-	}
+		if (perf_header_attr__add_id(h_attr, read_data.id) < 0) {
+			pr_warning("Not enough memory to add id\n");
+			exit(-1);
+		}
 
-	assert(fd[nr_cpu][counter] >= 0);
-	fcntl(fd[nr_cpu][counter], F_SETFL, O_NONBLOCK);
+		assert(fd[nr_cpu][counter][thread_index] >= 0);
+		fcntl(fd[nr_cpu][counter][thread_index], F_SETFL, O_NONBLOCK);
 
-	/*
-	 * First counter acts as the group leader:
-	 */
-	if (group && group_fd == -1)
-		group_fd = fd[nr_cpu][counter];
-	if (multiplex && multiplex_fd == -1)
-		multiplex_fd = fd[nr_cpu][counter];
+		/*
+		 * First counter acts as the group leader:
+		 */
+		if (group && group_fd == -1)
+			group_fd = fd[nr_cpu][counter][thread_index];
+		if (multiplex && multiplex_fd == -1
+ multiplex_fd = fd[nr_cpu][counter][thread_index]; - if (multiplex && fd[nr_cpu][counter] != multiplex_fd) { + if (multiplex && fd[nr_cpu][counter][thread_index] != multiplex_fd) { - ret = ioctl(fd[nr_cpu][counter], PERF_EVENT_IOC_SET_OUTPUT, multiplex_fd); - assert(ret != -1); - } else { - event_array[nr_poll].fd = fd[nr_cpu][counter]; - event_array[nr_poll].events = POLLIN; - nr_poll++; - - mmap_array[nr_cpu][counter].counter = counter; - mmap_array[nr_cpu][counter].prev = 0; - mmap_array[nr_cpu][counter].mask = mmap_pages*page_size - 1; - mmap_array[nr_cpu][counter].base = mmap(NULL, (mmap_pages+1)*page_size, - PROT_READ|PROT_WRITE, MAP_SHARED, fd[nr_cpu][counter], 0); - if (mmap_array[nr_cpu][counter].base == MAP_FAILED) { - error("failed to mmap with %d (%s)\n", errno, strerror(errno)); - exit(-1); + ret = ioctl(fd[nr_cpu][counter][thread_index], PERF_EVENT_IOC_SET_OUTPUT, multiplex_fd); + assert(ret != -1); + } else { + event_array[nr_poll].fd = fd[nr_cpu][counter][thread_index]; + event_array[nr_poll].events = POLLIN; + nr_poll++; + + mmap_array[nr_cpu][counter][thread_index].counter = counter; + mmap_array[nr_cpu][counter][thread_index].prev = 0; + mmap_array[nr_cpu][counter][thread_index].mask = mmap_pages*page_size - 1; + mmap_array[nr_cpu][counter][thread_index].base = mmap(NULL, + (mmap_pages+1)*page_size, + PROT_READ|PROT_WRITE, MAP_SHARED, + fd[nr_cpu][counter][thread_index], + 0); + if (mmap_array[nr_cpu][counter][thread_index].base == MAP_FAILED) { + error("failed to mmap with %d (%s)\n", errno, strerror(errno)); + exit(-1); + } } - } - if (filter != NULL) { - ret = ioctl(fd[nr_cpu][counter], - PERF_EVENT_IOC_SET_FILTER, filter); - if (ret) { - error("failed to set filter with %d (%s)\n", errno, - strerror(errno)); - exit(-1); + if (filter != NULL) { + ret = ioctl(fd[nr_cpu][counter][thread_index], + PERF_EVENT_IOC_SET_FILTER, filter); + if (ret) { + error("failed to set filter with %d (%s)\n", errno, + strerror(errno)); + exit(-1); + } } - } - 
ioctl(fd[nr_cpu][counter], PERF_EVENT_IOC_ENABLE); + ioctl(fd[nr_cpu][counter][thread_index], PERF_EVENT_IOC_ENABLE); + } } -static void open_counters(int cpu, pid_t pid, bool forks) +static void open_counters(int cpu, bool forks) { int counter; group_fd = -1; for (counter = 0; counter < nr_counters; counter++) - create_counter(counter, cpu, pid, forks); + create_counter(counter, cpu, forks); nr_cpu++; } @@ -425,7 +436,7 @@ static int __cmd_record(int argc, const int err; unsigned long waking = 0; int child_ready_pipe[2], go_pipe[2]; - const bool forks = target_pid == -1 && argc > 0; + const bool forks = target_tid == -1 && argc > 0; char buf; page_size = sysconf(_SC_PAGE_SIZE); @@ -534,7 +545,7 @@ static int __cmd_record(int argc, const child_pid = pid; if (!system_wide) - target_pid = pid; + target_tid = pid; close(child_ready_pipe[1]); close(go_pipe[0]); @@ -550,11 +561,11 @@ static int __cmd_record(int argc, const if ((!system_wide && !inherit) || profile_cpu != -1) { - open_counters(profile_cpu, target_pid, forks); + open_counters(profile_cpu, forks); } else { nr_cpus = read_cpu_map(); for (i = 0; i < nr_cpus; i++) - open_counters(cpumap[i], target_pid, forks); + open_counters(cpumap[i], forks); } if (file_new) { @@ -579,7 +590,7 @@ static int __cmd_record(int argc, const } if (!system_wide && profile_cpu == -1) - event__synthesize_thread(target_pid, process_synthesized_event, + event__synthesize_thread(target_tid, process_synthesized_event, session); else event__synthesize_threads(process_synthesized_event, session); @@ -602,11 +613,15 @@ static int __cmd_record(int argc, const for (;;) { int hits = samples; + int thread; for (i = 0; i < nr_cpu; i++) { for (counter = 0; counter < nr_counters; counter++) { - if (mmap_array[i][counter].base) - mmap_read(&mmap_array[i][counter]); + for (thread = 0; + thread < thread_num; thread++) { + if (mmap_array[i][counter][thread].base) + mmap_read(&mmap_array[i][counter][thread]); + } } } @@ -619,8 +634,13 @@ static int 
__cmd_record(int argc, const if (done) { for (i = 0; i < nr_cpu; i++) { - for (counter = 0; counter < nr_counters; counter++) - ioctl(fd[i][counter], PERF_EVENT_IOC_DISABLE); + for (counter = 0; counter < nr_counters; + counter++) { + for (thread = 0; + thread < thread_num; thread++) + ioctl(fd[i][counter][thread], + PERF_EVENT_IOC_DISABLE); + } } } } @@ -653,6 +673,8 @@ static const struct option options[] = { "event filter", parse_filter), OPT_INTEGER('p', "pid", &target_pid, "record events on existing pid"), + OPT_INTEGER('t', "tid", &target_tid, + "record events on existing thread id"), OPT_INTEGER('r', "realtime", &realtime_prio, "collect data with this RT SCHED_FIFO priority"), OPT_BOOLEAN('R', "raw-samples", &raw_samples, @@ -693,10 +715,11 @@ static const struct option options[] = { int cmd_record(int argc, const char **argv, const char *prefix __used) { int counter; + int i,j; argc = parse_options(argc, argv, options, record_usage, PARSE_OPT_STOP_AT_NON_OPTION); - if (!argc && target_pid == -1 && !system_wide && profile_cpu == -1) + if (!argc && target_tid == -1 && !system_wide && profile_cpu == -1) usage_with_options(record_usage, options); symbol__init(); @@ -707,6 +730,37 @@ int cmd_record(int argc, const char **ar attrs[0].config = PERF_COUNT_HW_CPU_CYCLES; } + if (target_pid != -1) { + target_tid = target_pid; + thread_num = find_all_tid(target_pid, &all_tids); + if (thread_num <= 0) { + fprintf(stderr, "Can't find all threads of pid %d\n", + target_pid); + usage_with_options(record_usage, options); + } + } else { + all_tids=malloc(sizeof(int)); + if (!all_tids) + return -ENOMEM; + + all_tids[0] = target_tid; + thread_num = 1; + } + + for (i = 0; i < MAX_NR_CPUS; i++) { + for (j = 0; j < MAX_COUNTERS; j++) { + fd[i][j] = malloc(sizeof(int)*thread_num); + mmap_array[i][j] = malloc( + sizeof(struct mmap_data)*thread_num); + if (!fd[i][j] || !mmap_array[i][j]) + return -ENOMEM; + } + } + event_array = malloc( + sizeof(struct 
pollfd)*MAX_NR_CPUS*MAX_COUNTERS*thread_num); + if (!event_array) + return -ENOMEM; + /* * User specified count overrides default frequency. */ diff -Nraup linux-2.6_tipmaster0315/tools/perf/builtin-stat.c linux-2.6_tipmaster0315_perfpid/tools/perf/builtin-stat.c --- linux-2.6_tipmaster0315/tools/perf/builtin-stat.c 2010-03-16 08:59:54.892460680 +0800 +++ linux-2.6_tipmaster0315_perfpid/tools/perf/builtin-stat.c 2010-03-17 16:30:25.484062179 +0800 @@ -46,6 +46,7 @@ #include "util/debug.h" #include "util/header.h" #include "util/cpumap.h" +#include "util/thread.h" #include <sys/prctl.h> #include <math.h> @@ -74,10 +75,14 @@ static int run_count = 1; static int inherit = 1; static int scale = 1; static pid_t target_pid = -1; +static pid_t target_tid = -1; +static int *all_tids = NULL; +static int thread_num = 0; static pid_t child_pid = -1; static int null_run = 0; +static bool forks = false; -static int fd[MAX_NR_CPUS][MAX_COUNTERS]; +static int *fd[MAX_NR_CPUS][MAX_COUNTERS]; static int event_scaled[MAX_COUNTERS]; @@ -140,9 +145,10 @@ struct stats runtime_branches_stats; #define ERR_PERF_OPEN \ "Error: counter %d, sys_perf_event_open() syscall returned with %d (%s)\n" -static void create_perf_stat_counter(int counter, int pid) +static void create_perf_stat_counter(int counter) { struct perf_event_attr *attr = attrs + counter; + int thread; if (scale) attr->read_format = PERF_FORMAT_TOTAL_TIME_ENABLED | @@ -152,20 +158,24 @@ static void create_perf_stat_counter(int unsigned int cpu; for (cpu = 0; cpu < nr_cpus; cpu++) { - fd[cpu][counter] = sys_perf_event_open(attr, -1, cpumap[cpu], -1, 0); - if (fd[cpu][counter] < 0 && verbose) + fd[cpu][counter][0] = sys_perf_event_open(attr, -1, cpumap[cpu], -1, 0); + if (fd[cpu][counter][0] < 0 && verbose) fprintf(stderr, ERR_PERF_OPEN, counter, - fd[cpu][counter], strerror(errno)); + fd[cpu][counter][0], strerror(errno)); } } else { attr->inherit = inherit; - attr->disabled = 1; + if (forks) + attr->disabled = 1; 
attr->enable_on_exec = 1; - fd[0][counter] = sys_perf_event_open(attr, pid, -1, -1, 0); - if (fd[0][counter] < 0 && verbose) - fprintf(stderr, ERR_PERF_OPEN, counter, - fd[0][counter], strerror(errno)); + for (thread = 0; thread < thread_num; thread++) { + fd[0][counter][thread] = sys_perf_event_open(attr, + all_tids[thread], -1, -1, 0); + if (fd[0][counter][thread] < 0 && verbose) + fprintf(stderr, ERR_PERF_OPEN, counter, + fd[0][counter][thread], strerror(errno)); + } } } @@ -190,25 +200,29 @@ static void read_counter(int counter) unsigned int cpu; size_t res, nv; int scaled; - int i; + int i, thread; count[0] = count[1] = count[2] = 0; nv = scale ? 3 : 1; for (cpu = 0; cpu < nr_cpus; cpu++) { - if (fd[cpu][counter] < 0) - continue; - - res = read(fd[cpu][counter], single_count, nv * sizeof(u64)); - assert(res == nv * sizeof(u64)); - - close(fd[cpu][counter]); - fd[cpu][counter] = -1; - - count[0] += single_count[0]; - if (scale) { - count[1] += single_count[1]; - count[2] += single_count[2]; + + for (thread = 0; thread < thread_num; thread++) { + if (fd[cpu][counter][thread] < 0) + continue; + + res = read(fd[cpu][counter][thread], + single_count, nv * sizeof(u64)); + assert(res == nv * sizeof(u64)); + + close(fd[cpu][counter][thread]); + fd[cpu][counter][thread] = -1; + + count[0] += single_count[0]; + if (scale) { + count[1] += single_count[1]; + count[2] += single_count[2]; + } } } @@ -251,11 +265,11 @@ static int run_perf_stat(int argc __used unsigned long long t0, t1; int status = 0; int counter; - int pid = target_pid; + int pid = target_tid; int child_ready_pipe[2], go_pipe[2]; - const bool forks = (target_pid == -1 && argc > 0); char buf; + forks = (target_tid == -1 && argc > 0); if (!system_wide) nr_cpus = 1; @@ -307,10 +321,12 @@ static int run_perf_stat(int argc __used if (read(child_ready_pipe[0], &buf, 1) == -1) perror("unable to read pipe"); close(child_ready_pipe[0]); + + all_tids[0] = pid; } for (counter = 0; counter < nr_counters; counter++) - 
create_perf_stat_counter(counter, pid); + create_perf_stat_counter(counter); /* * Enable counters and exec the command: @@ -321,7 +337,7 @@ static int run_perf_stat(int argc __used close(go_pipe[1]); wait(&status); } else { - while(!done); + while(!done) sleep(1); } t1 = rdclock(); @@ -429,12 +445,12 @@ static void print_stat(int argc, const c fprintf(stderr, "\n"); fprintf(stderr, " Performance counter stats for "); - if(target_pid == -1) { + if(target_tid == -1) { fprintf(stderr, "\'%s", argv[0]); for (i = 1; i < argc; i++) fprintf(stderr, " %s", argv[i]); }else - fprintf(stderr, "task pid \'%d", target_pid); + fprintf(stderr, "task pid \'%d", target_tid); fprintf(stderr, "\'"); if (run_count > 1) @@ -459,7 +475,7 @@ static volatile int signr = -1; static void skip_signal(int signo) { - if(target_pid != -1) + if(target_tid != -1) done = 1; signr = signo; @@ -488,8 +504,10 @@ static const struct option options[] = { parse_events), OPT_BOOLEAN('i', "inherit", &inherit, "child tasks inherit counters"), - OPT_INTEGER('p', "pid", &target_pid, + OPT_INTEGER('p', "pid", &target_tid, "stat events on existing pid"), + OPT_INTEGER('t', "tid", &target_tid, + "stat events on existing tid"), OPT_BOOLEAN('a', "all-cpus", &system_wide, "system-wide collection from all CPUs"), OPT_BOOLEAN('c', "scale", &scale, @@ -506,10 +524,11 @@ static const struct option options[] = { int cmd_stat(int argc, const char **argv, const char *prefix __used) { int status; + int i,j; argc = parse_options(argc, argv, options, stat_usage, PARSE_OPT_STOP_AT_NON_OPTION); - if (!argc && target_pid == -1) + if (!argc && (target_pid == -1 || target_tid == -1)) usage_with_options(stat_usage, options); if (run_count <= 0) usage_with_options(stat_usage, options); @@ -525,6 +544,32 @@ int cmd_stat(int argc, const char **argv else nr_cpus = 1; + if (target_pid != -1) { + target_tid = target_pid; + thread_num = find_all_tid(target_pid, &all_tids); + if (thread_num <= 0) { + fprintf(stderr, "Can't find all 
threads of pid %d\n", + target_pid); + usage_with_options(stat_usage, options); + } + + } else { + all_tids=malloc(sizeof(int)); + if (!all_tids) + return -ENOMEM; + + all_tids[0] = target_tid; + thread_num = 1; + } + + for (i = 0; i < MAX_NR_CPUS; i++) { + for (j = 0; j < MAX_COUNTERS; j++) { + fd[i][j] = malloc(sizeof(int)*thread_num); + if (!fd[i][j]) + return -ENOMEM; + } + } + /* * We dont want to block the signals - that would cause * child tasks to inherit that and Ctrl-C would not work. diff -Nraup linux-2.6_tipmaster0315/tools/perf/builtin-top.c linux-2.6_tipmaster0315_perfpid/tools/perf/builtin-top.c --- linux-2.6_tipmaster0315/tools/perf/builtin-top.c 2010-03-16 08:59:54.760470652 +0800 +++ linux-2.6_tipmaster0315_perfpid/tools/perf/builtin-top.c 2010-03-17 16:30:35.316716557 +0800 @@ -55,7 +55,8 @@ #include <linux/unistd.h> #include <linux/types.h> -static int fd[MAX_NR_CPUS][MAX_COUNTERS]; +static int *fd[MAX_NR_CPUS][MAX_COUNTERS]; +static int *all_tids = NULL; static int system_wide = 0; @@ -64,7 +65,9 @@ static int default_interval = 0; static int count_filter = 5; static int print_entries; +static int target_tid = -1; static int target_pid = -1; +static int thread_num = 0; static int inherit = 0; static int profile_cpu = -1; static int nr_cpus = 0; @@ -524,13 +527,15 @@ static void print_sym_table(void) if (target_pid != -1) printf(" (target_pid: %d", target_pid); + else if (target_tid != -1) + printf(" (target_tid: %d", target_tid); else printf(" (all"); if (profile_cpu != -1) printf(", cpu: %d)\n", profile_cpu); else { - if (target_pid != -1) + if (target_tid != -1) printf(")\n"); else printf(", %d CPUs)\n", nr_cpus); @@ -1124,16 +1129,21 @@ static void perf_session__mmap_read_coun md->prev = old; } -static struct pollfd event_array[MAX_NR_CPUS * MAX_COUNTERS]; -static struct mmap_data mmap_array[MAX_NR_CPUS][MAX_COUNTERS]; +static struct pollfd *event_array; +static struct mmap_data *mmap_array[MAX_NR_CPUS][MAX_COUNTERS]; static void 
perf_session__mmap_read(struct perf_session *self) { - int i, counter; + int i, counter, thread_index; for (i = 0; i < nr_cpus; i++) { for (counter = 0; counter < nr_counters; counter++) - perf_session__mmap_read_counter(self, &mmap_array[i][counter]); + for (thread_index = 0; + thread_index < thread_num; + thread_index++) { + perf_session__mmap_read_counter(self, + &mmap_array[i][counter][thread_index]); + } } } @@ -1144,9 +1154,10 @@ static void start_counter(int i, int cou { struct perf_event_attr *attr; int cpu; + int thread_index; cpu = profile_cpu; - if (target_pid == -1 && profile_cpu == -1) + if (target_tid == -1 && profile_cpu == -1) cpu = cpumap[i]; attr = attrs + counter; @@ -1162,55 +1173,58 @@ static void start_counter(int i, int cou attr->inherit = (cpu < 0) && inherit; attr->mmap = 1; + for (thread_index = 0; thread_index < thread_num; thread_index++) { try_again: - fd[i][counter] = sys_perf_event_open(attr, target_pid, cpu, group_fd, 0); + fd[i][counter][thread_index] = sys_perf_event_open(attr, + all_tids[thread_index], cpu, group_fd, 0); + + if (fd[i][counter][thread_index] < 0) { + int err = errno; - if (fd[i][counter] < 0) { - int err = errno; + if (err == EPERM || err == EACCES) + die("No permission - are you root?\n"); + /* + * If it's cycles then fall back to hrtimer + * based cpu-clock-tick sw counter, which + * is always available even if no PMU support: + */ + if (attr->type == PERF_TYPE_HARDWARE + && attr->config == PERF_COUNT_HW_CPU_CYCLES) { + + if (verbose) + warning(" ... 
trying to fall back to cpu-clock-ticks\n"); + + attr->type = PERF_TYPE_SOFTWARE; + attr->config = PERF_COUNT_SW_CPU_CLOCK; + goto try_again; + } + printf("\n"); + error("perfcounter syscall returned with %d (%s)\n", + fd[i][counter][thread_index], strerror(err)); + die("No CONFIG_PERF_EVENTS=y kernel support configured?\n"); + exit(-1); + } + assert(fd[i][counter][thread_index] >= 0); + fcntl(fd[i][counter][thread_index], F_SETFL, O_NONBLOCK); - if (err == EPERM || err == EACCES) - die("No permission - are you root?\n"); /* - * If it's cycles then fall back to hrtimer - * based cpu-clock-tick sw counter, which - * is always available even if no PMU support: + * First counter acts as the group leader: */ - if (attr->type == PERF_TYPE_HARDWARE - && attr->config == PERF_COUNT_HW_CPU_CYCLES) { + if (group && group_fd == -1) + group_fd = fd[i][counter][thread_index]; - if (verbose) - warning(" ... trying to fall back to cpu-clock-ticks\n"); - - attr->type = PERF_TYPE_SOFTWARE; - attr->config = PERF_COUNT_SW_CPU_CLOCK; - goto try_again; - } - printf("\n"); - error("perfcounter syscall returned with %d (%s)\n", - fd[i][counter], strerror(err)); - die("No CONFIG_PERF_EVENTS=y kernel support configured?\n"); - exit(-1); + event_array[nr_poll].fd = fd[i][counter][thread_index]; + event_array[nr_poll].events = POLLIN; + nr_poll++; + + mmap_array[i][counter][thread_index].counter = counter; + mmap_array[i][counter][thread_index].prev = 0; + mmap_array[i][counter][thread_index].mask = mmap_pages*page_size - 1; + mmap_array[i][counter][thread_index].base = mmap(NULL, (mmap_pages+1)*page_size, + PROT_READ, MAP_SHARED, fd[i][counter][thread_index], 0); + if (mmap_array[i][counter][thread_index].base == MAP_FAILED) + die("failed to mmap with %d (%s)\n", errno, strerror(errno)); } - assert(fd[i][counter] >= 0); - fcntl(fd[i][counter], F_SETFL, O_NONBLOCK); - - /* - * First counter acts as the group leader: - */ - if (group && group_fd == -1) - group_fd = fd[i][counter]; - - 
event_array[nr_poll].fd = fd[i][counter]; - event_array[nr_poll].events = POLLIN; - nr_poll++; - - mmap_array[i][counter].counter = counter; - mmap_array[i][counter].prev = 0; - mmap_array[i][counter].mask = mmap_pages*page_size - 1; - mmap_array[i][counter].base = mmap(NULL, (mmap_pages+1)*page_size, - PROT_READ, MAP_SHARED, fd[i][counter], 0); - if (mmap_array[i][counter].base == MAP_FAILED) - die("failed to mmap with %d (%s)\n", errno, strerror(errno)); } static int __cmd_top(void) @@ -1226,8 +1240,8 @@ static int __cmd_top(void) if (session == NULL) return -ENOMEM; - if (target_pid != -1) - event__synthesize_thread(target_pid, event__process, session); + if (target_tid != -1) + event__synthesize_thread(target_tid, event__process, session); else event__synthesize_threads(event__process, session); @@ -1238,7 +1252,7 @@ static int __cmd_top(void) } /* Wait for a minimal set of events before starting the snapshot */ - poll(event_array, nr_poll, 100); + poll(&event_array[0], nr_poll, 100); perf_session__mmap_read(session); @@ -1282,6 +1296,8 @@ static const struct option options[] = { "event period to sample"), OPT_INTEGER('p', "pid", &target_pid, "profile events on existing pid"), + OPT_INTEGER('t', "tid", &target_tid, + "profile events on existing tid"), OPT_BOOLEAN('a', "all-cpus", &system_wide, "system-wide collection from all CPUs"), OPT_INTEGER('C', "CPU", &profile_cpu, @@ -1322,6 +1338,7 @@ static const struct option options[] = { int cmd_top(int argc, const char **argv, const char *prefix __used) { int counter; + int i,j; page_size = sysconf(_SC_PAGE_SIZE); @@ -1329,8 +1346,39 @@ int cmd_top(int argc, const char **argv, if (argc) usage_with_options(top_usage, options); + if (target_pid != -1) { + target_tid = target_pid; + thread_num = find_all_tid(target_pid, &all_tids); + if (thread_num <= 0) { + fprintf(stderr, "Can't find all threads of pid %d\n", + target_pid); + usage_with_options(top_usage, options); + } + } else { + all_tids=malloc(sizeof(int)); + if 
(!all_tids) + return -ENOMEM; + + all_tids[0] = target_tid; + thread_num = 1; + } + + for (i = 0; i < MAX_NR_CPUS; i++) { + for (j = 0; j < MAX_COUNTERS; j++) { + fd[i][j] = malloc(sizeof(int)*thread_num); + mmap_array[i][j] = malloc( + sizeof(struct mmap_data)*thread_num); + if (!fd[i][j] || !mmap_array[i][j]) + return -ENOMEM; + } + } + event_array = malloc( + sizeof(struct pollfd)*MAX_NR_CPUS*MAX_COUNTERS*thread_num); + if (!event_array) + return -ENOMEM; + /* CPU and PID are mutually exclusive */ - if (target_pid != -1 && profile_cpu != -1) { + if (target_tid > 0 && profile_cpu != -1) { printf("WARNING: PID switch overriding CPU\n"); sleep(1); profile_cpu = -1; @@ -1371,7 +1419,7 @@ int cmd_top(int argc, const char **argv, attrs[counter].sample_period = default_interval; } - if (target_pid != -1 || profile_cpu != -1) + if (target_tid != -1 || profile_cpu != -1) nr_cpus = 1; else nr_cpus = read_cpu_map(); diff -Nraup linux-2.6_tipmaster0315/tools/perf/util/thread.c linux-2.6_tipmaster0315_perfpid/tools/perf/util/thread.c --- linux-2.6_tipmaster0315/tools/perf/util/thread.c 2010-03-16 08:59:54.892460680 +0800 +++ linux-2.6_tipmaster0315_perfpid/tools/perf/util/thread.c 2010-03-17 14:07:25.725218425 +0800 @@ -7,6 +7,37 @@ #include "util.h" #include "debug.h" +int find_all_tid(int pid, int ** all_tid) +{ + char name[256]; + int items; + struct dirent **namelist = NULL; + int ret = 0; + int i; + + sprintf(name, "/proc/%d/task", pid); + items = scandir(name, &namelist, NULL, NULL); + if (items <= 0) + return -ENOENT; + *all_tid = malloc(sizeof(int) * items); + if (!*all_tid) { + ret = -ENOMEM; + goto failure; + } + + for (i = 0; i < items; i++) + (*all_tid)[i] = atoi(namelist[i]->d_name); + + ret = items; + +failure: + for (i=0; i<items; i++) + free(namelist[i]); + free(namelist); + + return ret; +} + void map_groups__init(struct map_groups *self) { int i; @@ -348,3 +379,4 @@ struct symbol *map_groups__find_symbol(s return NULL; } + diff -Nraup 
linux-2.6_tipmaster0315/tools/perf/util/thread.h linux-2.6_tipmaster0315_perfpid/tools/perf/util/thread.h --- linux-2.6_tipmaster0315/tools/perf/util/thread.h 2010-03-16 08:59:54.764469663 +0800 +++ linux-2.6_tipmaster0315_perfpid/tools/perf/util/thread.h 2010-03-17 14:03:09.628322688 +0800 @@ -23,6 +23,7 @@ struct thread { int comm_len; }; +int find_all_tid(int pid, int ** all_tid); void map_groups__init(struct map_groups *self); int thread__set_comm(struct thread *self, const char *comm); int thread__comm_len(struct thread *self); ^ permalink raw reply [flat|nested] 390+ messages in thread
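[Editorial sketch] The find_all_tid() helper added to util/thread.c above can be exercised standalone. The sketch below mirrors its approach — scandir() on /proc/<pid>/task, one tid per directory entry — but is not the patch verbatim: the posted version passes a NULL scandir filter, so "." and ".." also reach atoi(); this sketch additionally skips non-numeric entries so only real tids are returned.

```c
/* Standalone sketch of the find_all_tid() helper from util/thread.c:
 * enumerate all thread ids of a process by listing /proc/<pid>/task.
 * Unlike the posted patch, this version skips the non-numeric "."
 * and ".." entries that a NULL scandir() filter lets through. */
#include <ctype.h>
#include <dirent.h>
#include <errno.h>
#include <stdio.h>
#include <stdlib.h>

int find_all_tid(int pid, int **all_tid)
{
	char name[64];
	struct dirent **namelist = NULL;
	int items, i, nr = 0;

	snprintf(name, sizeof(name), "/proc/%d/task", pid);
	items = scandir(name, &namelist, NULL, NULL);
	if (items <= 0)
		return -ENOENT;

	*all_tid = malloc(sizeof(int) * items);
	if (!*all_tid) {
		nr = -ENOMEM;
		goto out;
	}

	/* Each numeric directory entry under /proc/<pid>/task is a tid. */
	for (i = 0; i < items; i++) {
		if (isdigit((unsigned char)namelist[i]->d_name[0]))
			(*all_tid)[nr++] = atoi(namelist[i]->d_name);
	}
out:
	for (i = 0; i < items; i++)
		free(namelist[i]);
	free(namelist);
	return nr;
}
```

Called with the caller's own pid, the function returns at least one tid (more for multithreaded processes); the caller owns and must free the returned array.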
* Re: [PATCH] Enhance perf to collect KVM guest os statistics from host side
  2010-03-17  9:26 ` Zhang, Yanmin
@ 2010-03-18  2:45 ` Zhang, Yanmin
  2010-03-18  7:49 ` Zhang, Yanmin
  0 siblings, 1 reply; 390+ messages in thread
From: Zhang, Yanmin @ 2010-03-18 2:45 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Avi Kivity, Peter Zijlstra, Sheng Yang, linux-kernel, kvm,
	Marcelo Tosatti, Joerg Roedel, Jes Sorensen, Gleb Natapov,
	Zachary Amsden, zhiteng.huang

On Wed, 2010-03-17 at 17:26 +0800, Zhang, Yanmin wrote:
> On Tue, 2010-03-16 at 10:47 +0100, Ingo Molnar wrote:
> > * Zhang, Yanmin <yanmin_zhang@linux.intel.com> wrote:
> >
> > > On Tue, 2010-03-16 at 15:48 +0800, Zhang, Yanmin wrote:
> > > > On Tue, 2010-03-16 at 07:41 +0200, Avi Kivity wrote:
> > > > > On 03/16/2010 07:27 AM, Zhang, Yanmin wrote:
> > > > > > From: Zhang, Yanmin<yanmin_zhang@linux.intel.com>
> > > > > >
> > > > > > Based on the discussion in KVM community, I worked out the patch to support
> > > > > > perf to collect guest os statistics from host side. This patch is implemented
> > > > > > with Ingo, Peter and some other guys' kind help. Yang Sheng pointed out a
> > > > > > critical bug and provided good suggestions with other guys. I really appreciate
> > > > > > their kind help.
> > > > > >
> > > > > > The patch adds new subcommand kvm to perf.
> > > > > >
> > > > > > perf kvm top
> > > > > > perf kvm record
> > > > > > perf kvm report
> > > > > > perf kvm diff
> > > > > >
> > > > > > The new perf could profile guest os kernel except guest os user space, but it
> > > > > > could summarize guest os user space utilization per guest os.
> > > > > >
> > > > > > Below are some examples.
> > > > > > 1) perf kvm top
> > > > > > [root@lkp-ne01 norm]# perf kvm --host --guest --guestkallsyms=/home/ymzhang/guest/kallsyms
> > > > > > --guestmodules=/home/ymzhang/guest/modules top
> > > > > >
> > > > >
> > > > Thanks for your kind comments.
> > > >
> > > > > Excellent, support for guest kernel != host kernel is critical (I can't
> > > > > remember the last time I ran same kernels).
> > > > >
> > > > > How would we support multiple guests with different kernels?
> > > > With the patch, 'perf kvm report --sort pid" could show
> > > > summary statistics for all guest os instances. Then, use
> > > > parameter --pid of 'perf kvm record' to collect single problematic instance data.
> > > Sorry. I found currently --pid isn't process but a thread (main thread).
> > >
> > > Ingo,
> > >
> > > Is it possible to support a new parameter or extend --inherit, so 'perf
> > > record' and 'perf top' could collect data on all threads of a process when
> > > the process is running?
> > >
> > > If not, I need add a new ugly parameter which is similar to --pid to filter
> > > out process data in userspace.
> >
> > Yeah. For maximum utility i'd suggest to extend --pid to include this, and
> > introduce --tid for the previous, limited-to-a-single-task functionality.
> >
> > Most users would expect --pid to work like a 'late attach' - i.e. to work like
> > strace -f or like a gdb attach.
>
> Thanks Ingo, Avi.
>
> I worked out below patch against tip/master of March 15th.
>
> Subject: [PATCH] Change perf's parameter --pid to process-wide collection
> From: Zhang, Yanmin <yanmin_zhang@linux.intel.com>
>
> Change parameter -p (--pid) to real process pid and add -t (--tid) meaning
> thread id. Now, --pid means perf collects the statistics of all threads of
> the process, while --tid means perf just collect the statistics of that thread.
>
> BTW, the patch fixes a bug of 'perf stat -p'. 'perf stat' always configures
> attr->disabled=1 if it isn't a system-wide collection. If there is a '-p'
> and no forks, 'perf stat -p' doesn't collect any data. In addition, the
> while(!done) in run_perf_stat consumes 100% single cpu time which has bad impact
> on running workload. I added a sleep(1) in the loop.
>
> Signed-off-by: Zhang Yanmin <yanmin_zhang@linux.intel.com>

Ingo,

Sorry, the patch has bugs. I need to do a better job and will work out 2
separate patches against the 2 issues.

Yanmin

^ permalink raw reply	[flat|nested] 390+ messages in thread
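[Editorial sketch] One of the two issues the changelog describes — replacing the busy `while(!done);` in run_perf_stat with a loop that sleeps — can be demonstrated in isolation. The helper name wait_until_done() below is illustrative, not from perf itself; only the while(!done) sleep(1) idiom comes from the patch.

```c
/* Sketch of the busy-wait fix from the changelog: when perf stat
 * attaches to an existing task (-p) there is no child to wait() on,
 * so it spins on a 'done' flag set by a signal handler.  A bare
 * while(!done); burns 100% of one CPU; sleeping in the loop yields
 * the CPU, and sleep() returns early as soon as a handled signal
 * (e.g. SIGINT in perf, SIGALRM here) arrives. */
#include <signal.h>
#include <unistd.h>

static volatile sig_atomic_t done;

static void skip_signal(int signo)
{
	(void)signo;
	done = 1;	/* async-signal-safe: just flag completion */
}

/* Install a handler for 'signo' and wait for it; returns the flag. */
int wait_until_done(int signo)
{
	signal(signo, skip_signal);
	while (!done)
		sleep(1);	/* interrupted early by the handled signal */
	return done;
}
```

With alarm(1) armed beforehand, the loop wakes within about a second instead of spinning, which is exactly the behavior the patch wants while the monitored workload runs.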
* Re: [PATCH] Enhance perf to collect KVM guest os statistics from host side
  2010-03-18  2:45 ` Zhang, Yanmin
@ 2010-03-18  7:49 ` Zhang, Yanmin
  2010-03-18  8:03 ` Ingo Molnar
  0 siblings, 1 reply; 390+ messages in thread
From: Zhang, Yanmin @ 2010-03-18 7:49 UTC (permalink / raw)
  To: Ingo Molnar, Arnaldo Carvalho de Melo
  Cc: Avi Kivity, Peter Zijlstra, Sheng Yang, linux-kernel, kvm,
	Marcelo Tosatti, Joerg Roedel, Jes Sorensen, Gleb Natapov,
	Zachary Amsden, zhiteng.huang

[-- Attachment #1: Type: text/plain, Size: 4977 bytes --]

On Thu, 2010-03-18 at 10:45 +0800, Zhang, Yanmin wrote:
> On Wed, 2010-03-17 at 17:26 +0800, Zhang, Yanmin wrote:
> > On Tue, 2010-03-16 at 10:47 +0100, Ingo Molnar wrote:
> > > * Zhang, Yanmin <yanmin_zhang@linux.intel.com> wrote:
> > >
> > > > On Tue, 2010-03-16 at 15:48 +0800, Zhang, Yanmin wrote:
> > > > > On Tue, 2010-03-16 at 07:41 +0200, Avi Kivity wrote:
> > > > > > On 03/16/2010 07:27 AM, Zhang, Yanmin wrote:
> > > > > > > From: Zhang, Yanmin<yanmin_zhang@linux.intel.com>
> > > > > > >
> > > > > > > Based on the discussion in KVM community, I worked out the patch to support
> > > > > > > perf to collect guest os statistics from host side. This patch is implemented
> > > > > > > with Ingo, Peter and some other guys' kind help. Yang Sheng pointed out a
> > > > > > > critical bug and provided good suggestions with other guys. I really appreciate
> > > > > > > their kind help.
> > > > > > >
> > > > > > > The patch adds new subcommand kvm to perf.
> > > > > > >
> > > > > > > perf kvm top
> > > > > > > perf kvm record
> > > > > > > perf kvm report
> > > > > > > perf kvm diff
> > > > > > >
> > > > > > > The new perf could profile guest os kernel except guest os user space, but it
> > > > > > > could summarize guest os user space utilization per guest os.
> > > > > > >
> > > > > > > Below are some examples.
> > > > > > > 1) perf kvm top
> > > > > > > [root@lkp-ne01 norm]# perf kvm --host --guest --guestkallsyms=/home/ymzhang/guest/kallsyms
> > > > > > > --guestmodules=/home/ymzhang/guest/modules top
> > > > >
> > > > > Thanks for your kind comments.
> > > > >
> > > > > > Excellent, support for guest kernel != host kernel is critical (I can't
> > > > > > remember the last time I ran the same kernels).
> > > > > >
> > > > > > How would we support multiple guests with different kernels?
> > > > > With the patch, 'perf kvm report --sort pid' could show
> > > > > summary statistics for all guest os instances. Then, use
> > > > > parameter --pid of 'perf kvm record' to collect data for a single
> > > > > problematic instance.
> > > > Sorry, I found that currently --pid doesn't mean a process but a thread
> > > > (the main thread).
> > > >
> > > > Ingo,
> > > >
> > > > Is it possible to support a new parameter or extend --inherit, so 'perf
> > > > record' and 'perf top' could collect data on all threads of a process while
> > > > the process is running?
> > > >
> > > > If not, I need to add a new ugly parameter, similar to --pid, to filter
> > > > out process data in userspace.
> > >
> > > Yeah. For maximum utility I'd suggest extending --pid to include this, and
> > > introducing --tid for the previous, limited-to-a-single-task functionality.
> > >
> > > Most users would expect --pid to work like a 'late attach' - i.e. to work
> > > like strace -f or like a gdb attach.
> >
> > Thanks Ingo, Avi.
> >
> > I worked out the below patch against tip/master of March 15th.
> >
> > Subject: [PATCH] Change perf's parameter --pid to process-wide collection
> > From: Zhang, Yanmin <yanmin_zhang@linux.intel.com>
> >
> > Change parameter -p (--pid) to mean a real process pid and add -t (--tid)
> > meaning thread id. Now --pid means perf collects the statistics of all
> > threads of the process, while --tid means perf collects the statistics of
> > just that thread.
> >
> > BTW, the patch fixes a bug in 'perf stat -p'.
> > 'perf stat' always configures attr->disabled=1 if it isn't a system-wide
> > collection. If there is a '-p' and no forks, 'perf stat -p' doesn't collect
> > any data. In addition, the while(!done) loop in run_perf_stat consumes 100%
> > of a single cpu, which has a bad impact on the running workload. I added a
> > sleep(1) in the loop.
> >
> > Signed-off-by: Zhang Yanmin <yanmin_zhang@linux.intel.com>
>
> Ingo,
>
> Sorry, the patch has bugs. I need to do a better job and will work out 2
> separate patches against the 2 issues.

I worked out 3 new patches against the tip/master tree of Mar. 17th.

1) Patch perf_stat: Fix the issue that perf doesn't enable counters when
target_pid != -1. Change the condition for forking/execing the subcommand:
if there is a subcommand parameter, perf always forks/execs it. The usage
example is:

#perf stat -a sleep 10

So this command could collect statistics for 10 seconds precisely. The user
could still stop it with CTRL+C.

2) Patch perf_record: Fix the issue that when perf forks/execs a subcommand,
it should enable all counters after the new process execs. Change the
condition for forking/execing the subcommand: if there is a subcommand
parameter, perf always forks/execs it. The usage example is:

#perf record -f -a sleep 10

So this command could collect statistics for 10 seconds precisely. The user
could still stop it with CTRL+C.

3) Patch perf_pid: Change parameter --pid to process-wide collection and add
--tid, which means collecting thread-wide statistics. Usage examples are:

#perf top -p 8888
#perf record -p 8888 -f sleep 10
#perf stat -p 8888 -f sleep 10

Arnaldo,

Please apply the 3 attached patches.
Yanmin [-- Attachment #2: perf_stat_2.6_tipmaster0317_v02.patch --] [-- Type: text/x-patch, Size: 2075 bytes --] diff -Nraup linux-2.6_tipmaster0317/tools/perf/builtin-stat.c linux-2.6_tipmaster0317_fixstat/tools/perf/builtin-stat.c --- linux-2.6_tipmaster0317/tools/perf/builtin-stat.c 2010-03-18 09:04:40.938289813 +0800 +++ linux-2.6_tipmaster0317_fixstat/tools/perf/builtin-stat.c 2010-03-18 13:07:26.773773541 +0800 @@ -159,8 +159,10 @@ static void create_perf_stat_counter(int } } else { attr->inherit = inherit; - attr->disabled = 1; - attr->enable_on_exec = 1; + if (target_pid == -1) { + attr->disabled = 1; + attr->enable_on_exec = 1; + } fd[0][counter] = sys_perf_event_open(attr, pid, -1, -1, 0); if (fd[0][counter] < 0 && verbose) @@ -251,9 +253,9 @@ static int run_perf_stat(int argc __used unsigned long long t0, t1; int status = 0; int counter; - int pid = target_pid; + int pid; int child_ready_pipe[2], go_pipe[2]; - const bool forks = (target_pid == -1 && argc > 0); + const bool forks = (argc > 0); char buf; if (!system_wide) @@ -265,10 +267,10 @@ static int run_perf_stat(int argc __used } if (forks) { - if ((pid = fork()) < 0) + if ((child_pid = fork()) < 0) perror("failed to fork"); - if (!pid) { + if (!child_pid) { close(child_ready_pipe[0]); close(go_pipe[1]); fcntl(go_pipe[0], F_SETFD, FD_CLOEXEC); @@ -297,8 +299,6 @@ static int run_perf_stat(int argc __used exit(-1); } - child_pid = pid; - /* * Wait for the child to be ready to exec. 
*/ @@ -309,6 +309,10 @@ static int run_perf_stat(int argc __used close(child_ready_pipe[0]); } + if (target_pid == -1) + pid = child_pid; + else + pid = target_pid; for (counter = 0; counter < nr_counters; counter++) create_perf_stat_counter(counter, pid); @@ -321,7 +325,7 @@ static int run_perf_stat(int argc __used close(go_pipe[1]); wait(&status); } else { - while(!done); + while(!done) sleep(1); } t1 = rdclock(); @@ -459,7 +463,7 @@ static volatile int signr = -1; static void skip_signal(int signo) { - if(target_pid != -1) + if(child_pid == -1) done = 1; signr = signo; [-- Attachment #3: perf_record_2.6_tipmaster0317_v02.patch --] [-- Type: text/x-patch, Size: 2729 bytes --] diff -Nraup linux-2.6_tip0317/tools/perf/builtin-record.c linux-2.6_tip0317_fixrecord/tools/perf/builtin-record.c --- linux-2.6_tip0317/tools/perf/builtin-record.c 2010-03-18 09:04:40.942263175 +0800 +++ linux-2.6_tip0317_fixrecord/tools/perf/builtin-record.c 2010-03-18 13:33:24.254359348 +0800 @@ -225,7 +225,7 @@ static struct perf_header_attr *get_head return h_attr; } -static void create_counter(int counter, int cpu, pid_t pid, bool forks) +static void create_counter(int counter, int cpu, pid_t pid) { char *filter = filters[counter]; struct perf_event_attr *attr = attrs + counter; @@ -275,10 +275,10 @@ static void create_counter(int counter, attr->mmap = track; attr->comm = track; attr->inherit = inherit; - attr->disabled = 1; - - if (forks) + if (target_pid == -1 && !system_wide) { + attr->disabled = 1; attr->enable_on_exec = 1; + } try_again: fd[nr_cpu][counter] = sys_perf_event_open(attr, pid, cpu, group_fd, 0); @@ -380,17 +380,15 @@ try_again: exit(-1); } } - - ioctl(fd[nr_cpu][counter], PERF_EVENT_IOC_ENABLE); } -static void open_counters(int cpu, pid_t pid, bool forks) +static void open_counters(int cpu, pid_t pid) { int counter; group_fd = -1; for (counter = 0; counter < nr_counters; counter++) - create_counter(counter, cpu, pid, forks); + create_counter(counter, cpu, pid); 
nr_cpu++; } @@ -425,7 +423,7 @@ static int __cmd_record(int argc, const int err; unsigned long waking = 0; int child_ready_pipe[2], go_pipe[2]; - const bool forks = target_pid == -1 && argc > 0; + const bool forks = argc > 0; char buf; page_size = sysconf(_SC_PAGE_SIZE); @@ -496,13 +494,13 @@ static int __cmd_record(int argc, const atexit(atexit_header); if (forks) { - pid = fork(); + child_pid = fork(); if (pid < 0) { perror("failed to fork"); exit(-1); } - if (!pid) { + if (!child_pid) { close(child_ready_pipe[0]); close(go_pipe[1]); fcntl(go_pipe[0], F_SETFD, FD_CLOEXEC); @@ -531,11 +529,6 @@ static int __cmd_record(int argc, const exit(-1); } - child_pid = pid; - - if (!system_wide) - target_pid = pid; - close(child_ready_pipe[1]); close(go_pipe[0]); /* @@ -548,13 +541,17 @@ static int __cmd_record(int argc, const close(child_ready_pipe[0]); } + if (forks && target_pid == -1 && !system_wide) + pid = child_pid; + else + pid = target_pid; if ((!system_wide && !inherit) || profile_cpu != -1) { - open_counters(profile_cpu, target_pid, forks); + open_counters(profile_cpu, pid); } else { nr_cpus = read_cpu_map(); for (i = 0; i < nr_cpus; i++) - open_counters(cpumap[i], target_pid, forks); + open_counters(cpumap[i], pid); } if (file_new) { [-- Attachment #4: perf_pid_2.6_tip0317_v06.patch --] [-- Type: text/x-patch, Size: 29446 bytes --] diff -Nraup linux-2.6_tip0317_statrecord/tools/perf/builtin-record.c linux-2.6_tip0317_statrecordpid/tools/perf/builtin-record.c --- linux-2.6_tip0317_statrecord/tools/perf/builtin-record.c 2010-03-18 13:48:39.578181540 +0800 +++ linux-2.6_tip0317_statrecordpid/tools/perf/builtin-record.c 2010-03-18 14:28:41.449631936 +0800 @@ -27,7 +27,7 @@ #include <unistd.h> #include <sched.h> -static int fd[MAX_NR_CPUS][MAX_COUNTERS]; +static int *fd[MAX_NR_CPUS][MAX_COUNTERS]; static long default_interval = 0; @@ -43,6 +43,9 @@ static int raw_samples = 0; static int system_wide = 0; static int profile_cpu = -1; static pid_t target_pid = -1; 
+static pid_t target_tid = -1; +static pid_t *all_tids = NULL; +static int thread_num = 0; static pid_t child_pid = -1; static int inherit = 1; static int force = 0; @@ -60,7 +63,7 @@ static struct timeval this_read; static u64 bytes_written = 0; -static struct pollfd event_array[MAX_NR_CPUS * MAX_COUNTERS]; +static struct pollfd *event_array; static int nr_poll = 0; static int nr_cpu = 0; @@ -77,7 +80,7 @@ struct mmap_data { unsigned int prev; }; -static struct mmap_data mmap_array[MAX_NR_CPUS][MAX_COUNTERS]; +static struct mmap_data *mmap_array[MAX_NR_CPUS][MAX_COUNTERS]; static unsigned long mmap_read_head(struct mmap_data *md) { @@ -225,12 +228,13 @@ static struct perf_header_attr *get_head return h_attr; } -static void create_counter(int counter, int cpu, pid_t pid) +static void create_counter(int counter, int cpu) { char *filter = filters[counter]; struct perf_event_attr *attr = attrs + counter; struct perf_header_attr *h_attr; int track = !counter; /* only the first counter needs these */ + int thread_index; int ret; struct { u64 count; @@ -280,115 +284,124 @@ static void create_counter(int counter, attr->enable_on_exec = 1; } + for (thread_index = 0; thread_index < thread_num; thread_index++) { try_again: - fd[nr_cpu][counter] = sys_perf_event_open(attr, pid, cpu, group_fd, 0); + fd[nr_cpu][counter][thread_index] = sys_perf_event_open(attr, + all_tids[thread_index], cpu, group_fd, 0); - if (fd[nr_cpu][counter] < 0) { - int err = errno; + if (fd[nr_cpu][counter][thread_index] < 0) { + int err = errno; - if (err == EPERM || err == EACCES) - die("Permission error - are you root?\n" - "\t Consider tweaking /proc/sys/kernel/perf_event_paranoid.\n"); - else if (err == ENODEV && profile_cpu != -1) - die("No such device - did you specify an out-of-range profile CPU?\n"); + if (err == EPERM || err == EACCES) + die("Permission error - are you root?\n" + "\t Consider tweaking" + " /proc/sys/kernel/perf_event_paranoid.\n"); + else if (err == ENODEV && profile_cpu != 
-1) { + die("No such device - did you specify" + " an out-of-range profile CPU?\n"); + } - /* - * If it's cycles then fall back to hrtimer - * based cpu-clock-tick sw counter, which - * is always available even if no PMU support: - */ - if (attr->type == PERF_TYPE_HARDWARE - && attr->config == PERF_COUNT_HW_CPU_CYCLES) { + /* + * If it's cycles then fall back to hrtimer + * based cpu-clock-tick sw counter, which + * is always available even if no PMU support: + */ + if (attr->type == PERF_TYPE_HARDWARE + && attr->config == PERF_COUNT_HW_CPU_CYCLES) { - if (verbose) - warning(" ... trying to fall back to cpu-clock-ticks\n"); - attr->type = PERF_TYPE_SOFTWARE; - attr->config = PERF_COUNT_SW_CPU_CLOCK; - goto try_again; - } - printf("\n"); - error("perfcounter syscall returned with %d (%s)\n", - fd[nr_cpu][counter], strerror(err)); + if (verbose) + warning(" ... trying to fall back to cpu-clock-ticks\n"); + attr->type = PERF_TYPE_SOFTWARE; + attr->config = PERF_COUNT_SW_CPU_CLOCK; + goto try_again; + } + printf("\n"); + error("perfcounter syscall returned with %d (%s)\n", + fd[nr_cpu][counter][thread_index], strerror(err)); #if defined(__i386__) || defined(__x86_64__) - if (attr->type == PERF_TYPE_HARDWARE && err == EOPNOTSUPP) - die("No hardware sampling interrupt available. No APIC? If so then you can boot the kernel with the \"lapic\" boot parameter to force-enable it.\n"); + if (attr->type == PERF_TYPE_HARDWARE && err == EOPNOTSUPP) + die("No hardware sampling interrupt available." + " No APIC? 
If so then you can boot the kernel" + " with the \"lapic\" boot parameter to" + " force-enable it.\n"); #endif - die("No CONFIG_PERF_EVENTS=y kernel support configured?\n"); - exit(-1); - } + die("No CONFIG_PERF_EVENTS=y kernel support configured?\n"); + exit(-1); + } - h_attr = get_header_attr(attr, counter); - if (h_attr == NULL) - die("nomem\n"); + h_attr = get_header_attr(attr, counter); + if (h_attr == NULL) + die("nomem\n"); + + if (!file_new) { + if (memcmp(&h_attr->attr, attr, sizeof(*attr))) { + fprintf(stderr, "incompatible append\n"); + exit(-1); + } + } - if (!file_new) { - if (memcmp(&h_attr->attr, attr, sizeof(*attr))) { - fprintf(stderr, "incompatible append\n"); + if (read(fd[nr_cpu][counter][thread_index], &read_data, sizeof(read_data)) == -1) { + perror("Unable to read perf file descriptor\n"); exit(-1); } - } - - if (read(fd[nr_cpu][counter], &read_data, sizeof(read_data)) == -1) { - perror("Unable to read perf file descriptor\n"); - exit(-1); - } - if (perf_header_attr__add_id(h_attr, read_data.id) < 0) { - pr_warning("Not enough memory to add id\n"); - exit(-1); - } + if (perf_header_attr__add_id(h_attr, read_data.id) < 0) { + pr_warning("Not enough memory to add id\n"); + exit(-1); + } - assert(fd[nr_cpu][counter] >= 0); - fcntl(fd[nr_cpu][counter], F_SETFL, O_NONBLOCK); + assert(fd[nr_cpu][counter][thread_index] >= 0); + fcntl(fd[nr_cpu][counter][thread_index], F_SETFL, O_NONBLOCK); - /* - * First counter acts as the group leader: - */ - if (group && group_fd == -1) - group_fd = fd[nr_cpu][counter]; - if (multiplex && multiplex_fd == -1) - multiplex_fd = fd[nr_cpu][counter]; + /* + * First counter acts as the group leader: + */ + if (group && group_fd == -1) + group_fd = fd[nr_cpu][counter][thread_index]; + if (multiplex && multiplex_fd == -1) + multiplex_fd = fd[nr_cpu][counter][thread_index]; - if (multiplex && fd[nr_cpu][counter] != multiplex_fd) { + if (multiplex && fd[nr_cpu][counter][thread_index] != multiplex_fd) { - ret = 
ioctl(fd[nr_cpu][counter], PERF_EVENT_IOC_SET_OUTPUT, multiplex_fd); - assert(ret != -1); - } else { - event_array[nr_poll].fd = fd[nr_cpu][counter]; - event_array[nr_poll].events = POLLIN; - nr_poll++; - - mmap_array[nr_cpu][counter].counter = counter; - mmap_array[nr_cpu][counter].prev = 0; - mmap_array[nr_cpu][counter].mask = mmap_pages*page_size - 1; - mmap_array[nr_cpu][counter].base = mmap(NULL, (mmap_pages+1)*page_size, - PROT_READ|PROT_WRITE, MAP_SHARED, fd[nr_cpu][counter], 0); - if (mmap_array[nr_cpu][counter].base == MAP_FAILED) { - error("failed to mmap with %d (%s)\n", errno, strerror(errno)); - exit(-1); + ret = ioctl(fd[nr_cpu][counter][thread_index], PERF_EVENT_IOC_SET_OUTPUT, multiplex_fd); + assert(ret != -1); + } else { + event_array[nr_poll].fd = fd[nr_cpu][counter][thread_index]; + event_array[nr_poll].events = POLLIN; + nr_poll++; + + mmap_array[nr_cpu][counter][thread_index].counter = counter; + mmap_array[nr_cpu][counter][thread_index].prev = 0; + mmap_array[nr_cpu][counter][thread_index].mask = mmap_pages*page_size - 1; + mmap_array[nr_cpu][counter][thread_index].base = mmap(NULL, (mmap_pages+1)*page_size, + PROT_READ|PROT_WRITE, MAP_SHARED, fd[nr_cpu][counter][thread_index], 0); + if (mmap_array[nr_cpu][counter][thread_index].base == MAP_FAILED) { + error("failed to mmap with %d (%s)\n", errno, strerror(errno)); + exit(-1); + } } - } - if (filter != NULL) { - ret = ioctl(fd[nr_cpu][counter], - PERF_EVENT_IOC_SET_FILTER, filter); - if (ret) { - error("failed to set filter with %d (%s)\n", errno, - strerror(errno)); - exit(-1); + if (filter != NULL) { + ret = ioctl(fd[nr_cpu][counter][thread_index], + PERF_EVENT_IOC_SET_FILTER, filter); + if (ret) { + error("failed to set filter with %d (%s)\n", errno, + strerror(errno)); + exit(-1); + } } } } -static void open_counters(int cpu, pid_t pid) +static void open_counters(int cpu) { int counter; group_fd = -1; for (counter = 0; counter < nr_counters; counter++) - create_counter(counter, cpu, pid); 
+ create_counter(counter, cpu); nr_cpu++; } @@ -529,6 +542,9 @@ static int __cmd_record(int argc, const exit(-1); } + if (!system_wide && target_tid == -1 && target_pid == -1) + all_tids[0] = child_pid; + close(child_ready_pipe[1]); close(go_pipe[0]); /* @@ -541,17 +557,12 @@ static int __cmd_record(int argc, const close(child_ready_pipe[0]); } - if (forks && target_pid == -1 && !system_wide) - pid = child_pid; - else - pid = target_pid; - if ((!system_wide && !inherit) || profile_cpu != -1) { - open_counters(profile_cpu, pid); + open_counters(profile_cpu); } else { nr_cpus = read_cpu_map(); for (i = 0; i < nr_cpus; i++) - open_counters(cpumap[i], pid); + open_counters(cpumap[i]); } if (file_new) { @@ -576,7 +587,7 @@ static int __cmd_record(int argc, const } if (!system_wide && profile_cpu == -1) - event__synthesize_thread(target_pid, process_synthesized_event, + event__synthesize_thread(target_tid, process_synthesized_event, session); else event__synthesize_threads(process_synthesized_event, session); @@ -599,11 +610,16 @@ static int __cmd_record(int argc, const for (;;) { int hits = samples; + int thread; for (i = 0; i < nr_cpu; i++) { for (counter = 0; counter < nr_counters; counter++) { - if (mmap_array[i][counter].base) - mmap_read(&mmap_array[i][counter]); + for (thread = 0; + thread < thread_num; thread++) { + if (mmap_array[i][counter][thread].base) + mmap_read(&mmap_array[i][counter][thread]); + } + } } @@ -616,8 +632,15 @@ static int __cmd_record(int argc, const if (done) { for (i = 0; i < nr_cpu; i++) { - for (counter = 0; counter < nr_counters; counter++) - ioctl(fd[i][counter], PERF_EVENT_IOC_DISABLE); + for (counter = 0; + counter < nr_counters; + counter++) { + for (thread = 0; + thread < thread_num; + thread++) + ioctl(fd[i][counter][thread], + PERF_EVENT_IOC_DISABLE); + } } } } @@ -649,7 +672,9 @@ static const struct option options[] = { OPT_CALLBACK(0, "filter", NULL, "filter", "event filter", parse_filter), OPT_INTEGER('p', "pid", &target_pid, - 
"record events on existing pid"), + "record events on existing process id"), + OPT_INTEGER('t', "tid", &target_tid, + "record events on existing thread id"), OPT_INTEGER('r', "realtime", &realtime_prio, "collect data with this RT SCHED_FIFO priority"), OPT_BOOLEAN('R', "raw-samples", &raw_samples, @@ -690,10 +715,12 @@ static const struct option options[] = { int cmd_record(int argc, const char **argv, const char *prefix __used) { int counter; + int i,j; argc = parse_options(argc, argv, options, record_usage, PARSE_OPT_STOP_AT_NON_OPTION); - if (!argc && target_pid == -1 && !system_wide && profile_cpu == -1) + if (!argc && target_pid == -1 && target_tid == -1 && + !system_wide && profile_cpu == -1) usage_with_options(record_usage, options); symbol__init(); @@ -704,6 +731,37 @@ int cmd_record(int argc, const char **ar attrs[0].config = PERF_COUNT_HW_CPU_CYCLES; } + if (target_pid != -1) { + target_tid = target_pid; + thread_num = find_all_tid(target_pid, &all_tids); + if (thread_num <= 0) { + fprintf(stderr, "Can't find all threads of pid %d\n", + target_pid); + usage_with_options(record_usage, options); + } + } else { + all_tids=malloc(sizeof(pid_t)); + if (!all_tids) + return -ENOMEM; + + all_tids[0] = target_tid; + thread_num = 1; + } + + for (i = 0; i < MAX_NR_CPUS; i++) { + for (j = 0; j < MAX_COUNTERS; j++) { + fd[i][j] = malloc(sizeof(int)*thread_num); + mmap_array[i][j] = malloc( + sizeof(struct mmap_data)*thread_num); + if (!fd[i][j] || !mmap_array[i][j]) + return -ENOMEM; + } + } + event_array = malloc( + sizeof(struct pollfd)*MAX_NR_CPUS*MAX_COUNTERS*thread_num); + if (!event_array) + return -ENOMEM; + /* * User specified count overrides default frequency. 
*/ diff -Nraup linux-2.6_tip0317_statrecord/tools/perf/builtin-stat.c linux-2.6_tip0317_statrecordpid/tools/perf/builtin-stat.c --- linux-2.6_tip0317_statrecord/tools/perf/builtin-stat.c 2010-03-18 13:46:14.600074330 +0800 +++ linux-2.6_tip0317_statrecordpid/tools/perf/builtin-stat.c 2010-03-18 14:29:49.318367157 +0800 @@ -46,6 +46,7 @@ #include "util/debug.h" #include "util/header.h" #include "util/cpumap.h" +#include "util/thread.h" #include <sys/prctl.h> #include <math.h> @@ -74,10 +75,13 @@ static int run_count = 1; static int inherit = 1; static int scale = 1; static pid_t target_pid = -1; +static pid_t target_tid = -1; +static pid_t *all_tids = NULL; +static int thread_num = 0; static pid_t child_pid = -1; static int null_run = 0; -static int fd[MAX_NR_CPUS][MAX_COUNTERS]; +static int *fd[MAX_NR_CPUS][MAX_COUNTERS]; static int event_scaled[MAX_COUNTERS]; @@ -140,9 +144,10 @@ struct stats runtime_branches_stats; #define ERR_PERF_OPEN \ "Error: counter %d, sys_perf_event_open() syscall returned with %d (%s)\n" -static void create_perf_stat_counter(int counter, int pid) +static void create_perf_stat_counter(int counter) { struct perf_event_attr *attr = attrs + counter; + int thread; if (scale) attr->read_format = PERF_FORMAT_TOTAL_TIME_ENABLED | @@ -152,10 +157,11 @@ static void create_perf_stat_counter(int unsigned int cpu; for (cpu = 0; cpu < nr_cpus; cpu++) { - fd[cpu][counter] = sys_perf_event_open(attr, -1, cpumap[cpu], -1, 0); - if (fd[cpu][counter] < 0 && verbose) + fd[cpu][counter][0] = sys_perf_event_open(attr, + -1, cpumap[cpu], -1, 0); + if (fd[cpu][counter][0] < 0 && verbose) fprintf(stderr, ERR_PERF_OPEN, counter, - fd[cpu][counter], strerror(errno)); + fd[cpu][counter][0], strerror(errno)); } } else { attr->inherit = inherit; @@ -163,11 +169,14 @@ static void create_perf_stat_counter(int attr->disabled = 1; attr->enable_on_exec = 1; } - - fd[0][counter] = sys_perf_event_open(attr, pid, -1, -1, 0); - if (fd[0][counter] < 0 && verbose) - 
fprintf(stderr, ERR_PERF_OPEN, counter, - fd[0][counter], strerror(errno)); + for (thread = 0; thread < thread_num; thread++) { + fd[0][counter][thread] = sys_perf_event_open(attr, + all_tids[thread], -1, -1, 0); + if (fd[0][counter][thread] < 0 && verbose) + fprintf(stderr, ERR_PERF_OPEN, counter, + fd[0][counter][thread], + strerror(errno)); + } } } @@ -192,25 +201,28 @@ static void read_counter(int counter) unsigned int cpu; size_t res, nv; int scaled; - int i; + int i, thread; count[0] = count[1] = count[2] = 0; nv = scale ? 3 : 1; for (cpu = 0; cpu < nr_cpus; cpu++) { - if (fd[cpu][counter] < 0) - continue; - - res = read(fd[cpu][counter], single_count, nv * sizeof(u64)); - assert(res == nv * sizeof(u64)); - - close(fd[cpu][counter]); - fd[cpu][counter] = -1; - - count[0] += single_count[0]; - if (scale) { - count[1] += single_count[1]; - count[2] += single_count[2]; + for (thread = 0; thread < thread_num; thread++) { + if (fd[cpu][counter][thread] < 0) + continue; + + res = read(fd[cpu][counter][thread], + single_count, nv * sizeof(u64)); + assert(res == nv * sizeof(u64)); + + close(fd[cpu][counter][thread]); + fd[cpu][counter][thread] = -1; + + count[0] += single_count[0]; + if (scale) { + count[1] += single_count[1]; + count[2] += single_count[2]; + } } } @@ -253,7 +265,6 @@ static int run_perf_stat(int argc __used unsigned long long t0, t1; int status = 0; int counter; - int pid; int child_ready_pipe[2], go_pipe[2]; const bool forks = (argc > 0); char buf; @@ -299,6 +310,9 @@ static int run_perf_stat(int argc __used exit(-1); } + if (target_tid == -1 && target_pid == -1 && !system_wide) + all_tids[0] = child_pid; + /* * Wait for the child to be ready to exec. 
*/ @@ -309,12 +323,8 @@ static int run_perf_stat(int argc __used close(child_ready_pipe[0]); } - if (target_pid == -1) - pid = child_pid; - else - pid = target_pid; for (counter = 0; counter < nr_counters; counter++) - create_perf_stat_counter(counter, pid); + create_perf_stat_counter(counter); /* * Enable counters and exec the command: @@ -433,12 +443,14 @@ static void print_stat(int argc, const c fprintf(stderr, "\n"); fprintf(stderr, " Performance counter stats for "); - if(target_pid == -1) { + if(target_pid == -1 && target_tid == -1) { fprintf(stderr, "\'%s", argv[0]); for (i = 1; i < argc; i++) fprintf(stderr, " %s", argv[i]); - }else - fprintf(stderr, "task pid \'%d", target_pid); + } else if (target_pid != -1) + fprintf(stderr, "process id \'%d", target_pid); + else + fprintf(stderr, "thread id \'%d", target_tid); fprintf(stderr, "\'"); if (run_count > 1) @@ -493,7 +505,9 @@ static const struct option options[] = { OPT_BOOLEAN('i', "inherit", &inherit, "child tasks inherit counters"), OPT_INTEGER('p', "pid", &target_pid, - "stat events on existing pid"), + "stat events on existing process id"), + OPT_INTEGER('t', "tid", &target_tid, + "stat events on existing thread id"), OPT_BOOLEAN('a', "all-cpus", &system_wide, "system-wide collection from all CPUs"), OPT_BOOLEAN('c', "scale", &scale, @@ -510,10 +524,11 @@ static const struct option options[] = { int cmd_stat(int argc, const char **argv, const char *prefix __used) { int status; + int i,j; argc = parse_options(argc, argv, options, stat_usage, PARSE_OPT_STOP_AT_NON_OPTION); - if (!argc && target_pid == -1) + if (!argc && target_pid == -1 && target_tid == -1) usage_with_options(stat_usage, options); if (run_count <= 0) usage_with_options(stat_usage, options); @@ -529,6 +544,31 @@ int cmd_stat(int argc, const char **argv else nr_cpus = 1; + if (target_pid != -1) { + target_tid = target_pid; + thread_num = find_all_tid(target_pid, &all_tids); + if (thread_num <= 0) { + fprintf(stderr, "Can't find all threads 
of pid %d\n", + target_pid); + usage_with_options(stat_usage, options); + } + } else { + all_tids=malloc(sizeof(pid_t)); + if (!all_tids) + return -ENOMEM; + + all_tids[0] = target_tid; + thread_num = 1; + } + + for (i = 0; i < MAX_NR_CPUS; i++) { + for (j = 0; j < MAX_COUNTERS; j++) { + fd[i][j] = malloc(sizeof(int)*thread_num); + if (!fd[i][j]) + return -ENOMEM; + } + } + /* * We dont want to block the signals - that would cause * child tasks to inherit that and Ctrl-C would not work. diff -Nraup linux-2.6_tip0317_statrecord/tools/perf/builtin-top.c linux-2.6_tip0317_statrecordpid/tools/perf/builtin-top.c --- linux-2.6_tip0317_statrecord/tools/perf/builtin-top.c 2010-03-18 13:45:27.252768232 +0800 +++ linux-2.6_tip0317_statrecordpid/tools/perf/builtin-top.c 2010-03-18 14:26:52.766054822 +0800 @@ -55,7 +55,7 @@ #include <linux/unistd.h> #include <linux/types.h> -static int fd[MAX_NR_CPUS][MAX_COUNTERS]; +static int *fd[MAX_NR_CPUS][MAX_COUNTERS]; static int system_wide = 0; @@ -65,6 +65,9 @@ static int count_filter = 5; static int print_entries; static int target_pid = -1; +static int target_tid = -1; +static pid_t *all_tids = NULL; +static int thread_num = 0; static int inherit = 0; static int profile_cpu = -1; static int nr_cpus = 0; @@ -524,13 +527,15 @@ static void print_sym_table(void) if (target_pid != -1) printf(" (target_pid: %d", target_pid); + else if (target_tid != -1) + printf(" (target_tid: %d", target_tid); else printf(" (all"); if (profile_cpu != -1) printf(", cpu: %d)\n", profile_cpu); else { - if (target_pid != -1) + if (target_tid != -1) printf(")\n"); else printf(", %d CPUs)\n", nr_cpus); @@ -1129,16 +1134,21 @@ static void perf_session__mmap_read_coun md->prev = old; } -static struct pollfd event_array[MAX_NR_CPUS * MAX_COUNTERS]; -static struct mmap_data mmap_array[MAX_NR_CPUS][MAX_COUNTERS]; +static struct pollfd *event_array; +static struct mmap_data *mmap_array[MAX_NR_CPUS][MAX_COUNTERS]; static void perf_session__mmap_read(struct 
perf_session *self) { - int i, counter; + int i, counter, thread_index; for (i = 0; i < nr_cpus; i++) { for (counter = 0; counter < nr_counters; counter++) - perf_session__mmap_read_counter(self, &mmap_array[i][counter]); + for (thread_index = 0; + thread_index < thread_num; + thread_index++) { + perf_session__mmap_read_counter(self, + &mmap_array[i][counter][thread_index]); + } } } @@ -1149,9 +1159,10 @@ static void start_counter(int i, int cou { struct perf_event_attr *attr; int cpu; + int thread_index; cpu = profile_cpu; - if (target_pid == -1 && profile_cpu == -1) + if (target_tid == -1 && profile_cpu == -1) cpu = cpumap[i]; attr = attrs + counter; @@ -1167,55 +1178,58 @@ static void start_counter(int i, int cou attr->inherit = (cpu < 0) && inherit; attr->mmap = 1; + for (thread_index = 0; thread_index < thread_num; thread_index++) { try_again: - fd[i][counter] = sys_perf_event_open(attr, target_pid, cpu, group_fd, 0); + fd[i][counter][thread_index] = sys_perf_event_open(attr, + all_tids[thread_index], cpu, group_fd, 0); + + if (fd[i][counter][thread_index] < 0) { + int err = errno; - if (fd[i][counter] < 0) { - int err = errno; + if (err == EPERM || err == EACCES) + die("No permission - are you root?\n"); + /* + * If it's cycles then fall back to hrtimer + * based cpu-clock-tick sw counter, which + * is always available even if no PMU support: + */ + if (attr->type == PERF_TYPE_HARDWARE + && attr->config == PERF_COUNT_HW_CPU_CYCLES) { + + if (verbose) + warning(" ... 
trying to fall back to cpu-clock-ticks\n");
+
+				attr->type = PERF_TYPE_SOFTWARE;
+				attr->config = PERF_COUNT_SW_CPU_CLOCK;
+				goto try_again;
+			}
+			printf("\n");
+			error("perfcounter syscall returned with %d (%s)\n",
+				fd[i][counter][thread_index], strerror(err));
+			die("No CONFIG_PERF_EVENTS=y kernel support configured?\n");
+			exit(-1);
+		}
+		assert(fd[i][counter][thread_index] >= 0);
+		fcntl(fd[i][counter][thread_index], F_SETFL, O_NONBLOCK);
-		if (err == EPERM || err == EACCES)
-			die("No permission - are you root?\n");
 		/*
-		 * If it's cycles then fall back to hrtimer
-		 * based cpu-clock-tick sw counter, which
-		 * is always available even if no PMU support:
+		 * First counter acts as the group leader:
 		 */
-		if (attr->type == PERF_TYPE_HARDWARE
-		    && attr->config == PERF_COUNT_HW_CPU_CYCLES) {
+		if (group && group_fd == -1)
+			group_fd = fd[i][counter][thread_index];
-			if (verbose)
-				warning(" ... trying to fall back to cpu-clock-ticks\n");
-
-			attr->type = PERF_TYPE_SOFTWARE;
-			attr->config = PERF_COUNT_SW_CPU_CLOCK;
-			goto try_again;
-		}
-		printf("\n");
-		error("perfcounter syscall returned with %d (%s)\n",
-			fd[i][counter], strerror(err));
-		die("No CONFIG_PERF_EVENTS=y kernel support configured?\n");
-		exit(-1);
+		event_array[nr_poll].fd = fd[i][counter][thread_index];
+		event_array[nr_poll].events = POLLIN;
+		nr_poll++;
+
+		mmap_array[i][counter][thread_index].counter = counter;
+		mmap_array[i][counter][thread_index].prev = 0;
+		mmap_array[i][counter][thread_index].mask = mmap_pages*page_size - 1;
+		mmap_array[i][counter][thread_index].base = mmap(NULL, (mmap_pages+1)*page_size,
+			PROT_READ, MAP_SHARED, fd[i][counter][thread_index], 0);
+		if (mmap_array[i][counter][thread_index].base == MAP_FAILED)
+			die("failed to mmap with %d (%s)\n", errno, strerror(errno));
 	}
-	assert(fd[i][counter] >= 0);
-	fcntl(fd[i][counter], F_SETFL, O_NONBLOCK);
-
-	/*
-	 * First counter acts as the group leader:
-	 */
-	if (group && group_fd == -1)
-		group_fd = fd[i][counter];
-
-	event_array[nr_poll].fd = fd[i][counter];
-	event_array[nr_poll].events = POLLIN;
-	nr_poll++;
-
-	mmap_array[i][counter].counter = counter;
-	mmap_array[i][counter].prev = 0;
-	mmap_array[i][counter].mask = mmap_pages*page_size - 1;
-	mmap_array[i][counter].base = mmap(NULL, (mmap_pages+1)*page_size,
-			PROT_READ, MAP_SHARED, fd[i][counter], 0);
-	if (mmap_array[i][counter].base == MAP_FAILED)
-		die("failed to mmap with %d (%s)\n", errno, strerror(errno));
 }

 static int __cmd_top(void)
@@ -1231,8 +1245,8 @@ static int __cmd_top(void)
 	if (session == NULL)
 		return -ENOMEM;

-	if (target_pid != -1)
-		event__synthesize_thread(target_pid, event__process, session);
+	if (target_tid != -1)
+		event__synthesize_thread(target_tid, event__process, session);
 	else
 		event__synthesize_threads(event__process, session);

@@ -1243,7 +1257,7 @@ static int __cmd_top(void)
 	}

 	/* Wait for a minimal set of events before starting the snapshot */
-	poll(event_array, nr_poll, 100);
+	poll(&event_array[0], nr_poll, 100);

 	perf_session__mmap_read(session);

@@ -1286,7 +1300,9 @@ static const struct option options[] = {
 	OPT_INTEGER('c', "count", &default_interval,
 		    "event period to sample"),
 	OPT_INTEGER('p', "pid", &target_pid,
-		    "profile events on existing pid"),
+		    "profile events on existing process id"),
+	OPT_INTEGER('t', "tid", &target_tid,
+		    "profile events on existing thread id"),
 	OPT_BOOLEAN('a', "all-cpus", &system_wide,
 		    "system-wide collection from all CPUs"),
 	OPT_INTEGER('C', "CPU", &profile_cpu,
@@ -1327,6 +1343,7 @@ static const struct option options[] = {
 int cmd_top(int argc, const char **argv, const char *prefix __used)
 {
 	int counter;
+	int i,j;

 	page_size = sysconf(_SC_PAGE_SIZE);

@@ -1334,8 +1351,39 @@ int cmd_top(int argc, const char **argv,
 	if (argc)
 		usage_with_options(top_usage, options);

+	if (target_pid != -1) {
+		target_tid = target_pid;
+		thread_num = find_all_tid(target_pid, &all_tids);
+		if (thread_num <= 0) {
+			fprintf(stderr, "Can't find all threads of pid %d\n",
+				target_pid);
+			usage_with_options(top_usage, options);
+		}
+	} else {
+		all_tids=malloc(sizeof(pid_t));
+		if (!all_tids)
+			return -ENOMEM;
+
+		all_tids[0] = target_tid;
+		thread_num = 1;
+	}
+
+	for (i = 0; i < MAX_NR_CPUS; i++) {
+		for (j = 0; j < MAX_COUNTERS; j++) {
+			fd[i][j] = malloc(sizeof(int)*thread_num);
+			mmap_array[i][j] = malloc(
+				sizeof(struct mmap_data)*thread_num);
+			if (!fd[i][j] || !mmap_array[i][j])
+				return -ENOMEM;
+		}
+	}
+	event_array = malloc(
+		sizeof(struct pollfd)*MAX_NR_CPUS*MAX_COUNTERS*thread_num);
+	if (!event_array)
+		return -ENOMEM;
+
 	/* CPU and PID are mutually exclusive */
-	if (target_pid != -1 && profile_cpu != -1) {
+	if (target_tid > 0 && profile_cpu != -1) {
 		printf("WARNING: PID switch overriding CPU\n");
 		sleep(1);
 		profile_cpu = -1;
@@ -1376,7 +1424,7 @@ int cmd_top(int argc, const char **argv,
 		attrs[counter].sample_period = default_interval;
 	}

-	if (target_pid != -1 || profile_cpu != -1)
+	if (target_tid != -1 || profile_cpu != -1)
 		nr_cpus = 1;
 	else
 		nr_cpus = read_cpu_map();
diff -Nraup linux-2.6_tip0317_statrecord/tools/perf/util/thread.c linux-2.6_tip0317_statrecordpid/tools/perf/util/thread.c
--- linux-2.6_tip0317_statrecord/tools/perf/util/thread.c	2010-03-18 13:45:27.268773347 +0800
+++ linux-2.6_tip0317_statrecordpid/tools/perf/util/thread.c	2010-03-18 14:26:29.588441791 +0800
@@ -7,6 +7,37 @@
 #include "util.h"
 #include "debug.h"

+int find_all_tid(int pid, pid_t ** all_tid)
+{
+	char name[256];
+	int items;
+	struct dirent **namelist = NULL;
+	int ret = 0;
+	int i;
+
+	sprintf(name, "/proc/%d/task", pid);
+	items = scandir(name, &namelist, NULL, NULL);
+	if (items <= 0)
+		return -ENOENT;
+	*all_tid = malloc(sizeof(pid_t) * items);
+	if (!*all_tid) {
+		ret = -ENOMEM;
+		goto failure;
+	}
+
+	for (i = 0; i < items; i++)
+		(*all_tid)[i] = atoi(namelist[i]->d_name);
+
+	ret = items;
+
+failure:
+	for (i=0; i<items; i++)
+		free(namelist[i]);
+	free(namelist);
+
+	return ret;
+}
+
 void map_groups__init(struct map_groups *self)
 {
 	int i;
@@ -348,3 +379,4 @@ struct symbol *map_groups__find_symbol(s
 	return NULL;
 }
+
diff -Nraup linux-2.6_tip0317_statrecord/tools/perf/util/thread.h linux-2.6_tip0317_statrecordpid/tools/perf/util/thread.h
--- linux-2.6_tip0317_statrecord/tools/perf/util/thread.h	2010-03-18 13:45:27.256771458 +0800
+++ linux-2.6_tip0317_statrecordpid/tools/perf/util/thread.h	2010-03-18 14:26:03.522627096 +0800
@@ -23,6 +23,7 @@ struct thread {
 	int comm_len;
 };

+int find_all_tid(int pid, pid_t ** all_tid);
 void map_groups__init(struct map_groups *self);
 int thread__set_comm(struct thread *self, const char *comm);
 int thread__comm_len(struct thread *self);
^ permalink raw reply	[flat|nested] 390+ messages in thread
* Re: [PATCH] Enhance perf to collect KVM guest os statistics from host side 2010-03-18 7:49 ` Zhang, Yanmin @ 2010-03-18 8:03 ` Ingo Molnar 2010-03-18 13:03 ` Arnaldo Carvalho de Melo 0 siblings, 1 reply; 390+ messages in thread From: Ingo Molnar @ 2010-03-18 8:03 UTC (permalink / raw) To: Zhang, Yanmin Cc: Arnaldo Carvalho de Melo, Avi Kivity, Peter Zijlstra, Sheng Yang, linux-kernel, kvm, Marcelo Tosatti, oerg Roedel, Jes Sorensen, Gleb Natapov, Zachary Amsden, zhiteng.huang * Zhang, Yanmin <yanmin_zhang@linux.intel.com> wrote: > I worked out 3 new patches against tip/master tree of Mar. 17th. Cool! Mind sending them as a series of patches instead of attachment? That makes it easier to review them. Also, the Signed-off-by lines seem to be missing plus we need a per patch changelog as well. Thanks, Ingo ^ permalink raw reply [flat|nested] 390+ messages in thread
* Re: [PATCH] Enhance perf to collect KVM guest os statistics from host side 2010-03-18 8:03 ` Ingo Molnar @ 2010-03-18 13:03 ` Arnaldo Carvalho de Melo 0 siblings, 0 replies; 390+ messages in thread From: Arnaldo Carvalho de Melo @ 2010-03-18 13:03 UTC (permalink / raw) To: Ingo Molnar Cc: Zhang, Yanmin, Avi Kivity, Peter Zijlstra, Sheng Yang, linux-kernel, kvm, Marcelo Tosatti, oerg Roedel, Jes Sorensen, Gleb Natapov, Zachary Amsden, zhiteng.huang Em Thu, Mar 18, 2010 at 09:03:25AM +0100, Ingo Molnar escreveu: > > * Zhang, Yanmin <yanmin_zhang@linux.intel.com> wrote: > > > I worked out 3 new patches against tip/master tree of Mar. 17th. > > Cool! Mind sending them as a series of patches instead of attachment? That > makes it easier to review them. Also, the Signed-off-by lines seem to be > missing plus we need a per patch changelog as well. Yeah, please, and I hadn't merged them, so the resend was the best thing to do. - Arnaldo ^ permalink raw reply [flat|nested] 390+ messages in thread
* Re: [PATCH] Enhance perf to collect KVM guest os statistics from host side
  2010-03-16  7:48 ` Zhang, Yanmin
  2010-03-16  9:28 ` Zhang, Yanmin
@ 2010-03-16  9:32 ` Avi Kivity
  2010-03-17  2:34 ` Zhang, Yanmin
  1 sibling, 1 reply; 390+ messages in thread
From: Avi Kivity @ 2010-03-16  9:32 UTC (permalink / raw)
To: Zhang, Yanmin
Cc: Ingo Molnar, Peter Zijlstra, Sheng Yang, linux-kernel, kvm,
	Marcelo Tosatti, oerg Roedel, Jes Sorensen, Gleb Natapov,
	Zachary Amsden, ziteng.huang, Joerg Roedel

On 03/16/2010 09:48 AM, Zhang, Yanmin wrote:
>
>> Excellent, support for guest kernel != host kernel is critical (I can't
>> remember the last time I ran same kernels).
>>
>> How would we support multiple guests with different kernels?
>>
> With the patch, 'perf kvm report --sort pid" could show
> summary statistics for all guest os instances. Then, use
> parameter --pid of 'perf kvm record' to collect single problematic instance data.
>

That certainly works, though automatic association of guest data with
guest symbols is friendlier.

>>> diff -Nraup linux-2.6_tipmaster0315/arch/x86/kvm/vmx.c linux-2.6_tipmaster0315_perfkvm/arch/x86/kvm/vmx.c
>>> --- linux-2.6_tipmaster0315/arch/x86/kvm/vmx.c	2010-03-16 08:59:11.825295404 +0800
>>> +++ linux-2.6_tipmaster0315_perfkvm/arch/x86/kvm/vmx.c	2010-03-16 09:01:09.976084492 +0800
>>> @@ -26,6 +26,7 @@
>>>  #include<linux/sched.h>
>>>  #include<linux/moduleparam.h>
>>>  #include<linux/ftrace_event.h>
>>> +#include<linux/perf_event.h>
>>>  #include "kvm_cache_regs.h"
>>>  #include "x86.h"
>>>
>>> @@ -3632,6 +3633,43 @@ static void update_cr8_intercept(struct
>>>  	vmcs_write32(TPR_THRESHOLD, irr);
>>>  }
>>>
>>> +DEFINE_PER_CPU(int, kvm_in_guest) = {0};
>>> +
>>> +static void kvm_set_in_guest(void)
>>> +{
>>> +	percpu_write(kvm_in_guest, 1);
>>> +}
>>> +
>>> +static int kvm_is_in_guest(void)
>>> +{
>>> +	return percpu_read(kvm_in_guest);
>>> +}
>>>
>>>
>>
>
>> There is already PF_VCPU for this.
>>
> Right, but there is a scope between kvm_guest_enter and really running
> in guest os, where a perf event might overflow. Anyway, the scope is very
> narrow, I will change it to use flag PF_VCPU.
>

There is also a window between setting the flag and calling 'int $2'
where an NMI might happen and be accounted incorrectly.

Perhaps separate the 'int $2' into a direct call into perf and another
call for the rest of NMI handling. I don't see how it would work on svm
though - AFAICT the NMI is held whereas vmx swallows it. I guess NMIs
will be disabled until the next IRET so it isn't racy, just tricky.

>>> +static struct perf_guest_info_callbacks kvm_guest_cbs = {
>>> +	.is_in_guest		= kvm_is_in_guest,
>>> +	.is_user_mode		= kvm_is_user_mode,
>>> +	.get_guest_ip		= kvm_get_guest_ip,
>>> +	.reset_in_guest		= kvm_reset_in_guest,
>>> +};
>>>
>>>
>> Should be in common code, not vmx specific.
>>
> Right. I discussed with Yangsheng. I will move above data structures and
> callbacks to file arch/x86/kvm/x86.c, and add get_ip, a new callback to
> kvm_x86_ops.
>

You will need access to the vcpu pointer (kvm_rip_read() needs it), you
can put it in a percpu variable. I guess if it's not null, you know
you're in a guest, so no need for PF_VCPU.

-- 
error compiling committee.c: too many arguments to function

^ permalink raw reply	[flat|nested] 390+ messages in thread
* Re: [PATCH] Enhance perf to collect KVM guest os statistics from host side
  2010-03-16  9:32 ` Avi Kivity
@ 2010-03-17  2:34 ` Zhang, Yanmin
  2010-03-17  9:28 ` Sheng Yang
  0 siblings, 1 reply; 390+ messages in thread
From: Zhang, Yanmin @ 2010-03-17  2:34 UTC (permalink / raw)
To: Avi Kivity
Cc: Ingo Molnar, Peter Zijlstra, Sheng Yang, linux-kernel, kvm,
	Marcelo Tosatti, oerg Roedel, Jes Sorensen, Gleb Natapov,
	Zachary Amsden, ziteng.huang, Joerg Roedel

On Tue, 2010-03-16 at 11:32 +0200, Avi Kivity wrote:
> On 03/16/2010 09:48 AM, Zhang, Yanmin wrote:
> >
> >> Excellent, support for guest kernel != host kernel is critical (I can't
> >> remember the last time I ran same kernels).
> >>
> >> How would we support multiple guests with different kernels?
> >>
> > With the patch, 'perf kvm report --sort pid" could show
> > summary statistics for all guest os instances. Then, use
> > parameter --pid of 'perf kvm record' to collect single problematic instance data.
> >
>
> That certainly works, though automatic association of guest data with
> guest symbols is friendlier.
Thanks. Originally, I planed to add a -G parameter to perf. Such like
-G 8888:/XXX/XXX/guestkallsyms:/XXX/XXX/modules,8889:/XXX/XXX/guestkallsyms:/XXX/XXX/modules
8888 and 8889 are just qemu guest pid. So we could define multiple guest os
symbol files. But it seems ugly, and 'perf kvm report --sort pid" and
'perf kvm top --pid' could provide similar functionality.

>
> >>> diff -Nraup linux-2.6_tipmaster0315/arch/x86/kvm/vmx.c linux-2.6_tipmaster0315_perfkvm/arch/x86/kvm/vmx.c
> >>> --- linux-2.6_tipmaster0315/arch/x86/kvm/vmx.c	2010-03-16 08:59:11.825295404 +0800
> >>> +++ linux-2.6_tipmaster0315_perfkvm/arch/x86/kvm/vmx.c	2010-03-16 09:01:09.976084492 +0800
> >>> @@ -26,6 +26,7 @@
> >>>  #include<linux/sched.h>
> >>>  #include<linux/moduleparam.h>
> >>>  #include<linux/ftrace_event.h>
> >>> +#include<linux/perf_event.h>
> >>>  #include "kvm_cache_regs.h"
> >>>  #include "x86.h"
> >>>
> >>> @@ -3632,6 +3633,43 @@ static void update_cr8_intercept(struct
> >>>  	vmcs_write32(TPR_THRESHOLD, irr);
> >>>  }
> >>>
> >>> +DEFINE_PER_CPU(int, kvm_in_guest) = {0};
> >>> +
> >>> +static void kvm_set_in_guest(void)
> >>> +{
> >>> +	percpu_write(kvm_in_guest, 1);
> >>> +}
> >>> +
> >>> +static int kvm_is_in_guest(void)
> >>> +{
> >>> +	return percpu_read(kvm_in_guest);
> >>> +}
> >>>
> >>>
> >>
> >
> >> There is already PF_VCPU for this.
> >>
> > Right, but there is a scope between kvm_guest_enter and really running
> > in guest os, where a perf event might overflow. Anyway, the scope is very
> > narrow, I will change it to use flag PF_VCPU.
> >
>
> There is also a window between setting the flag and calling 'int $2'
> where an NMI might happen and be accounted incorrectly.
>
> Perhaps separate the 'int $2' into a direct call into perf and another
> call for the rest of NMI handling. I don't see how it would work on svm
> though - AFAICT the NMI is held whereas vmx swallows it.
> I guess NMIs
> will be disabled until the next IRET so it isn't racy, just tricky.
I'm not sure if vmexit does break NMI context or not. Hardware NMI context
isn't reentrant till a IRET. YangSheng would like to double check it.

>
> >>> +static struct perf_guest_info_callbacks kvm_guest_cbs = {
> >>> +	.is_in_guest		= kvm_is_in_guest,
> >>> +	.is_user_mode		= kvm_is_user_mode,
> >>> +	.get_guest_ip		= kvm_get_guest_ip,
> >>> +	.reset_in_guest		= kvm_reset_in_guest,
> >>> +};
> >>>
> >>>
> >> Should be in common code, not vmx specific.
> >>
> > Right. I discussed with Yangsheng. I will move above data structures and
> > callbacks to file arch/x86/kvm/x86.c, and add get_ip, a new callback to
> > kvm_x86_ops.
> >
>
> You will need access to the vcpu pointer (kvm_rip_read() needs it), you
> can put it in a percpu variable.
We do so now in a new patch.

> I guess if it's not null, you know
> you're in a guest, so no need for PF_VCPU.
Good suggestion. Thanks.

^ permalink raw reply	[flat|nested] 390+ messages in thread
* Re: [PATCH] Enhance perf to collect KVM guest os statistics from host side
  2010-03-17  2:34 ` Zhang, Yanmin
@ 2010-03-17  9:28 ` Sheng Yang
  2010-03-17  9:41 ` Avi Kivity
  2010-03-17 21:14 ` Zachary Amsden
  0 siblings, 2 replies; 390+ messages in thread
From: Sheng Yang @ 2010-03-17  9:28 UTC (permalink / raw)
To: Avi Kivity
Cc: Zhang, Yanmin, Ingo Molnar, Peter Zijlstra, linux-kernel, kvm,
	Marcelo Tosatti, oerg Roedel, Jes Sorensen, Gleb Natapov,
	Zachary Amsden, Huang, Zhiteng, Joerg Roedel

On Wednesday 17 March 2010 10:34:33 Zhang, Yanmin wrote:
> On Tue, 2010-03-16 at 11:32 +0200, Avi Kivity wrote:
> > On 03/16/2010 09:48 AM, Zhang, Yanmin wrote:
> > > Right, but there is a scope between kvm_guest_enter and really running
> > > in guest os, where a perf event might overflow. Anyway, the scope is
> > > very narrow, I will change it to use flag PF_VCPU.
> >
> > There is also a window between setting the flag and calling 'int $2'
> > where an NMI might happen and be accounted incorrectly.
> >
> > Perhaps separate the 'int $2' into a direct call into perf and another
> > call for the rest of NMI handling. I don't see how it would work on svm
> > though - AFAICT the NMI is held whereas vmx swallows it.
> >
> > I guess NMIs
> > will be disabled until the next IRET so it isn't racy, just tricky.
>
> I'm not sure if vmexit does break NMI context or not. Hardware NMI context
> isn't reentrant till a IRET. YangSheng would like to double check it.

After more check, I think VMX won't remained NMI block state for host. That's
means, if NMI happened and processor is in VMX non-root mode, it would only
result in VMExit, with a reason indicate that it's due to NMI happened, but no
more state change in the host.

So in that meaning, there _is_ a window between VMExit and KVM handle the NMI.
Moreover, I think we _can't_ stop the re-entrance of NMI handling code because
"int $2" don't have effect to block following NMI.

And if the NMI sequence is not important(I think so), then we need to generate
a real NMI in current vmexit-after code. Seems let APIC send a NMI IPI to
itself is a good idea.

I am debugging a patch based on apic->send_IPI_self(NMI_VECTOR) to replace
"int $2". Something unexpected is happening...

-- 
regards
Yang, Sheng

^ permalink raw reply	[flat|nested] 390+ messages in thread
* Re: [PATCH] Enhance perf to collect KVM guest os statistics from host side 2010-03-17 9:28 ` Sheng Yang @ 2010-03-17 9:41 ` Avi Kivity 2010-03-17 9:51 ` Sheng Yang 2010-03-17 21:14 ` Zachary Amsden 1 sibling, 1 reply; 390+ messages in thread From: Avi Kivity @ 2010-03-17 9:41 UTC (permalink / raw) To: Sheng Yang Cc: Zhang, Yanmin, Ingo Molnar, Peter Zijlstra, linux-kernel, kvm, Marcelo Tosatti, oerg Roedel, Jes Sorensen, Gleb Natapov, Zachary Amsden, Huang, Zhiteng, Joerg Roedel On 03/17/2010 11:28 AM, Sheng Yang wrote: > >> I'm not sure if vmexit does break NMI context or not. Hardware NMI context >> isn't reentrant till a IRET. YangSheng would like to double check it. >> > After more check, I think VMX won't remained NMI block state for host. That's > means, if NMI happened and processor is in VMX non-root mode, it would only > result in VMExit, with a reason indicate that it's due to NMI happened, but no > more state change in the host. > > So in that meaning, there _is_ a window between VMExit and KVM handle the NMI. > Moreover, I think we _can't_ stop the re-entrance of NMI handling code because > "int $2" don't have effect to block following NMI. > That's pretty bad, as NMI runs on a separate stack (via IST). So if another NMI happens while our int $2 is running, the stack will be corrupted. > And if the NMI sequence is not important(I think so), then we need to generate > a real NMI in current vmexit-after code. Seems let APIC send a NMI IPI to > itself is a good idea. > > I am debugging a patch based on apic->send_IPI_self(NMI_VECTOR) to replace > "int $2". Something unexpected is happening... > I think you need DM_NMI for that to work correctly. An alternative is to call the NMI handler directly. -- error compiling committee.c: too many arguments to function ^ permalink raw reply [flat|nested] 390+ messages in thread
* Re: [PATCH] Enhance perf to collect KVM guest os statistics from host side 2010-03-17 9:41 ` Avi Kivity @ 2010-03-17 9:51 ` Sheng Yang 2010-03-17 10:06 ` Avi Kivity 0 siblings, 1 reply; 390+ messages in thread From: Sheng Yang @ 2010-03-17 9:51 UTC (permalink / raw) To: Avi Kivity Cc: Zhang, Yanmin, Ingo Molnar, Peter Zijlstra, linux-kernel, kvm, Marcelo Tosatti, oerg Roedel, Jes Sorensen, Gleb Natapov, Zachary Amsden, Huang, Zhiteng, Joerg Roedel On Wednesday 17 March 2010 17:41:58 Avi Kivity wrote: > On 03/17/2010 11:28 AM, Sheng Yang wrote: > >> I'm not sure if vmexit does break NMI context or not. Hardware NMI > >> context isn't reentrant till a IRET. YangSheng would like to double > >> check it. > > > > After more check, I think VMX won't remained NMI block state for host. > > That's means, if NMI happened and processor is in VMX non-root mode, it > > would only result in VMExit, with a reason indicate that it's due to NMI > > happened, but no more state change in the host. > > > > So in that meaning, there _is_ a window between VMExit and KVM handle the > > NMI. Moreover, I think we _can't_ stop the re-entrance of NMI handling > > code because "int $2" don't have effect to block following NMI. > > That's pretty bad, as NMI runs on a separate stack (via IST). So if > another NMI happens while our int $2 is running, the stack will be > corrupted. Though hardware didn't provide this kind of block, software at least would warn about it... nmi_enter() still would be executed by "int $2", and result in BUG() if we are already in NMI context(OK, it is a little better than mysterious crash due to corrupted stack). > > > And if the NMI sequence is not important(I think so), then we need to > > generate a real NMI in current vmexit-after code. Seems let APIC send a > > NMI IPI to itself is a good idea. > > > > I am debugging a patch based on apic->send_IPI_self(NMI_VECTOR) to > > replace "int $2". Something unexpected is happening... 
> > I think you need DM_NMI for that to work correctly. > > An alternative is to call the NMI handler directly. apic_send_IPI_self() already took care of APIC_DM_NMI. And NMI handler would block the following NMI? -- regards Yang, Sheng ^ permalink raw reply [flat|nested] 390+ messages in thread
* Re: [PATCH] Enhance perf to collect KVM guest os statistics from host side 2010-03-17 9:51 ` Sheng Yang @ 2010-03-17 10:06 ` Avi Kivity 0 siblings, 0 replies; 390+ messages in thread From: Avi Kivity @ 2010-03-17 10:06 UTC (permalink / raw) To: Sheng Yang Cc: Zhang, Yanmin, Ingo Molnar, Peter Zijlstra, linux-kernel, kvm, Marcelo Tosatti, oerg Roedel, Jes Sorensen, Gleb Natapov, Zachary Amsden, Huang, Zhiteng, Joerg Roedel On 03/17/2010 11:51 AM, Sheng Yang wrote: > >> I think you need DM_NMI for that to work correctly. >> >> An alternative is to call the NMI handler directly. >> > apic_send_IPI_self() already took care of APIC_DM_NMI. > So it does (though not for x2apic?). I don't see why it doesn't work. > And NMI handler would block the following NMI? > > It wouldn't - won't work without extensive changes. -- error compiling committee.c: too many arguments to function ^ permalink raw reply [flat|nested] 390+ messages in thread
* Re: [PATCH] Enhance perf to collect KVM guest os statistics from host side 2010-03-17 9:28 ` Sheng Yang 2010-03-17 9:41 ` Avi Kivity @ 2010-03-17 21:14 ` Zachary Amsden 2010-03-18 1:19 ` Sheng Yang 1 sibling, 1 reply; 390+ messages in thread From: Zachary Amsden @ 2010-03-17 21:14 UTC (permalink / raw) To: Sheng Yang Cc: Avi Kivity, Zhang, Yanmin, Ingo Molnar, Peter Zijlstra, linux-kernel, kvm, Marcelo Tosatti, oerg Roedel, Jes Sorensen, Gleb Natapov, Huang, Zhiteng, Joerg Roedel On 03/16/2010 11:28 PM, Sheng Yang wrote: > On Wednesday 17 March 2010 10:34:33 Zhang, Yanmin wrote: > >> On Tue, 2010-03-16 at 11:32 +0200, Avi Kivity wrote: >> >>> On 03/16/2010 09:48 AM, Zhang, Yanmin wrote: >>> >>>> Right, but there is a scope between kvm_guest_enter and really running >>>> in guest os, where a perf event might overflow. Anyway, the scope is >>>> very narrow, I will change it to use flag PF_VCPU. >>>> >>> There is also a window between setting the flag and calling 'int $2' >>> where an NMI might happen and be accounted incorrectly. >>> >>> Perhaps separate the 'int $2' into a direct call into perf and another >>> call for the rest of NMI handling. I don't see how it would work on svm >>> though - AFAICT the NMI is held whereas vmx swallows it. >>> >>> I guess NMIs >>> will be disabled until the next IRET so it isn't racy, just tricky. >>> >> I'm not sure if vmexit does break NMI context or not. Hardware NMI context >> isn't reentrant till a IRET. YangSheng would like to double check it. >> > After more check, I think VMX won't remained NMI block state for host. That's > means, if NMI happened and processor is in VMX non-root mode, it would only > result in VMExit, with a reason indicate that it's due to NMI happened, but no > more state change in the host. > > So in that meaning, there _is_ a window between VMExit and KVM handle the NMI. 
> Moreover, I think we _can't_ stop the re-entrance of NMI handling code because > "int $2" don't have effect to block following NMI. > > And if the NMI sequence is not important(I think so), then we need to generate > a real NMI in current vmexit-after code. Seems let APIC send a NMI IPI to > itself is a good idea. > > I am debugging a patch based on apic->send_IPI_self(NMI_VECTOR) to replace > "int $2". Something unexpected is happening... > You can't use the APIC to send vectors 0x00-0x1f, or at least, aren't supposed to be able to. Zach ^ permalink raw reply [flat|nested] 390+ messages in thread
* Re: [PATCH] Enhance perf to collect KVM guest os statistics from host side 2010-03-17 21:14 ` Zachary Amsden @ 2010-03-18 1:19 ` Sheng Yang 2010-03-18 4:50 ` Zachary Amsden 0 siblings, 1 reply; 390+ messages in thread From: Sheng Yang @ 2010-03-18 1:19 UTC (permalink / raw) To: Zachary Amsden Cc: Avi Kivity, Zhang, Yanmin, Ingo Molnar, Peter Zijlstra, linux-kernel, kvm, Marcelo Tosatti, oerg Roedel, Jes Sorensen, Gleb Natapov, Huang, Zhiteng, Joerg Roedel On Thursday 18 March 2010 05:14:52 Zachary Amsden wrote: > On 03/16/2010 11:28 PM, Sheng Yang wrote: > > On Wednesday 17 March 2010 10:34:33 Zhang, Yanmin wrote: > >> On Tue, 2010-03-16 at 11:32 +0200, Avi Kivity wrote: > >>> On 03/16/2010 09:48 AM, Zhang, Yanmin wrote: > >>>> Right, but there is a scope between kvm_guest_enter and really running > >>>> in guest os, where a perf event might overflow. Anyway, the scope is > >>>> very narrow, I will change it to use flag PF_VCPU. > >>> > >>> There is also a window between setting the flag and calling 'int $2' > >>> where an NMI might happen and be accounted incorrectly. > >>> > >>> Perhaps separate the 'int $2' into a direct call into perf and another > >>> call for the rest of NMI handling. I don't see how it would work on > >>> svm though - AFAICT the NMI is held whereas vmx swallows it. > >>> > >>> I guess NMIs > >>> will be disabled until the next IRET so it isn't racy, just tricky. > >> > >> I'm not sure if vmexit does break NMI context or not. Hardware NMI > >> context isn't reentrant till a IRET. YangSheng would like to double > >> check it. > > > > After more check, I think VMX won't remained NMI block state for host. > > That's means, if NMI happened and processor is in VMX non-root mode, it > > would only result in VMExit, with a reason indicate that it's due to NMI > > happened, but no more state change in the host. > > > > So in that meaning, there _is_ a window between VMExit and KVM handle the > > NMI. 
Moreover, I think we _can't_ stop the re-entrance of NMI handling > > code because "int $2" don't have effect to block following NMI. > > > > And if the NMI sequence is not important(I think so), then we need to > > generate a real NMI in current vmexit-after code. Seems let APIC send a > > NMI IPI to itself is a good idea. > > > > I am debugging a patch based on apic->send_IPI_self(NMI_VECTOR) to > > replace "int $2". Something unexpected is happening... > > You can't use the APIC to send vectors 0x00-0x1f, or at least, aren't > supposed to be able to. Um? Why? Especially kernel is already using it to deliver NMI. -- regards Yang, Sheng ^ permalink raw reply [flat|nested] 390+ messages in thread
* Re: [PATCH] Enhance perf to collect KVM guest os statistics from host side 2010-03-18 1:19 ` Sheng Yang @ 2010-03-18 4:50 ` Zachary Amsden 2010-03-18 5:22 ` Sheng Yang 0 siblings, 1 reply; 390+ messages in thread From: Zachary Amsden @ 2010-03-18 4:50 UTC (permalink / raw) To: Sheng Yang Cc: Avi Kivity, Zhang, Yanmin, Ingo Molnar, Peter Zijlstra, linux-kernel, kvm, Marcelo Tosatti, oerg Roedel, Jes Sorensen, Gleb Natapov, Huang, Zhiteng, Joerg Roedel On 03/17/2010 03:19 PM, Sheng Yang wrote: > On Thursday 18 March 2010 05:14:52 Zachary Amsden wrote: > >> On 03/16/2010 11:28 PM, Sheng Yang wrote: >> >>> On Wednesday 17 March 2010 10:34:33 Zhang, Yanmin wrote: >>> >>>> On Tue, 2010-03-16 at 11:32 +0200, Avi Kivity wrote: >>>> >>>>> On 03/16/2010 09:48 AM, Zhang, Yanmin wrote: >>>>> >>>>>> Right, but there is a scope between kvm_guest_enter and really running >>>>>> in guest os, where a perf event might overflow. Anyway, the scope is >>>>>> very narrow, I will change it to use flag PF_VCPU. >>>>>> >>>>> There is also a window between setting the flag and calling 'int $2' >>>>> where an NMI might happen and be accounted incorrectly. >>>>> >>>>> Perhaps separate the 'int $2' into a direct call into perf and another >>>>> call for the rest of NMI handling. I don't see how it would work on >>>>> svm though - AFAICT the NMI is held whereas vmx swallows it. >>>>> >>>>> I guess NMIs >>>>> will be disabled until the next IRET so it isn't racy, just tricky. >>>>> >>>> I'm not sure if vmexit does break NMI context or not. Hardware NMI >>>> context isn't reentrant till a IRET. YangSheng would like to double >>>> check it. >>>> >>> After more check, I think VMX won't remained NMI block state for host. >>> That's means, if NMI happened and processor is in VMX non-root mode, it >>> would only result in VMExit, with a reason indicate that it's due to NMI >>> happened, but no more state change in the host. 
>>> >>> So in that meaning, there _is_ a window between VMExit and KVM handle the >>> NMI. Moreover, I think we _can't_ stop the re-entrance of NMI handling >>> code because "int $2" don't have effect to block following NMI. >>> >>> And if the NMI sequence is not important(I think so), then we need to >>> generate a real NMI in current vmexit-after code. Seems let APIC send a >>> NMI IPI to itself is a good idea. >>> >>> I am debugging a patch based on apic->send_IPI_self(NMI_VECTOR) to >>> replace "int $2". Something unexpected is happening... >>> >> You can't use the APIC to send vectors 0x00-0x1f, or at least, aren't >> supposed to be able to. >> > Um? Why? > > Especially kernel is already using it to deliver NMI. > > That's the only defined case, and it is defined because the vector field is ignore for DM_NMI. Vol 3A (exact section numbers may vary depending on your version). 8.5.1 / 8.6.1 '100 (NMI) Delivers an NMI interrupt to the target processor or processors. The vector information is ignored' 8.5.2 Valid Interrupt Vectors 'Local and I/O APICs support 240 of these vectors (in the range of 16 to 255) as valid interrupts.' 8.8.4 Interrupt Acceptance for Fixed Interrupts '...; vectors 0 through 15 are reserved by the APIC (see also: Section 8.5.2, "Valid Interrupt Vectors")' So I misremembered, apparently you can deliver interrupts 0x10-0x1f, but vectors 0x00-0x0f are not valid to send via APIC or I/O APIC. Zach ^ permalink raw reply [flat|nested] 390+ messages in thread
* Re: [PATCH] Enhance perf to collect KVM guest os statistics from host side 2010-03-18 4:50 ` Zachary Amsden @ 2010-03-18 5:22 ` Sheng Yang 2010-03-18 5:41 ` Sheng Yang 0 siblings, 1 reply; 390+ messages in thread From: Sheng Yang @ 2010-03-18 5:22 UTC (permalink / raw) To: Zachary Amsden Cc: Avi Kivity, Zhang, Yanmin, Ingo Molnar, Peter Zijlstra, linux-kernel, kvm, Marcelo Tosatti, oerg Roedel, Jes Sorensen, Gleb Natapov, Huang, Zhiteng, Joerg Roedel On Thursday 18 March 2010 12:50:58 Zachary Amsden wrote: > On 03/17/2010 03:19 PM, Sheng Yang wrote: > > On Thursday 18 March 2010 05:14:52 Zachary Amsden wrote: > >> On 03/16/2010 11:28 PM, Sheng Yang wrote: > >>> On Wednesday 17 March 2010 10:34:33 Zhang, Yanmin wrote: > >>>> On Tue, 2010-03-16 at 11:32 +0200, Avi Kivity wrote: > >>>>> On 03/16/2010 09:48 AM, Zhang, Yanmin wrote: > >>>>>> Right, but there is a scope between kvm_guest_enter and really > >>>>>> running in guest os, where a perf event might overflow. Anyway, the > >>>>>> scope is very narrow, I will change it to use flag PF_VCPU. > >>>>> > >>>>> There is also a window between setting the flag and calling 'int $2' > >>>>> where an NMI might happen and be accounted incorrectly. > >>>>> > >>>>> Perhaps separate the 'int $2' into a direct call into perf and > >>>>> another call for the rest of NMI handling. I don't see how it would > >>>>> work on svm though - AFAICT the NMI is held whereas vmx swallows it. > >>>>> > >>>>> I guess NMIs > >>>>> will be disabled until the next IRET so it isn't racy, just tricky. > >>>> > >>>> I'm not sure if vmexit does break NMI context or not. Hardware NMI > >>>> context isn't reentrant till a IRET. YangSheng would like to double > >>>> check it. > >>> > >>> After more check, I think VMX won't remained NMI block state for host. 
> >>> That's means, if NMI happened and processor is in VMX non-root mode, it > >>> would only result in VMExit, with a reason indicate that it's due to > >>> NMI happened, but no more state change in the host. > >>> > >>> So in that meaning, there _is_ a window between VMExit and KVM handle > >>> the NMI. Moreover, I think we _can't_ stop the re-entrance of NMI > >>> handling code because "int $2" don't have effect to block following > >>> NMI. > >>> > >>> And if the NMI sequence is not important(I think so), then we need to > >>> generate a real NMI in current vmexit-after code. Seems let APIC send a > >>> NMI IPI to itself is a good idea. > >>> > >>> I am debugging a patch based on apic->send_IPI_self(NMI_VECTOR) to > >>> replace "int $2". Something unexpected is happening... > >> > >> You can't use the APIC to send vectors 0x00-0x1f, or at least, aren't > >> supposed to be able to. > > > > Um? Why? > > > > Especially kernel is already using it to deliver NMI. > > That's the only defined case, and it is defined because the vector field > is ignore for DM_NMI. Vol 3A (exact section numbers may vary depending > on your version). > > 8.5.1 / 8.6.1 > > '100 (NMI) Delivers an NMI interrupt to the target processor or > processors. The vector information is ignored' > > 8.5.2 Valid Interrupt Vectors > > 'Local and I/O APICs support 240 of these vectors (in the range of 16 to > 255) as valid interrupts.' > > 8.8.4 Interrupt Acceptance for Fixed Interrupts > > '...; vectors 0 through 15 are reserved by the APIC (see also: Section > 8.5.2, "Valid Interrupt Vectors")' > > So I misremembered, apparently you can deliver interrupts 0x10-0x1f, but > vectors 0x00-0x0f are not valid to send via APIC or I/O APIC. As you pointed out, NMI is not "Fixed interrupt". If we want to send NMI, it would need a specific delivery mode rather than vector number. And if you look at code, if we specific NMI_VECTOR, the delivery mode would be set to NMI. So what's wrong here? 
-- regards Yang, Sheng ^ permalink raw reply [flat|nested] 390+ messages in thread
* Re: [PATCH] Enhance perf to collect KVM guest os statistics from host side 2010-03-18 5:22 ` Sheng Yang @ 2010-03-18 5:41 ` Sheng Yang 2010-03-18 8:47 ` Zachary Amsden 0 siblings, 1 reply; 390+ messages in thread From: Sheng Yang @ 2010-03-18 5:41 UTC (permalink / raw) To: kvm Cc: Zachary Amsden, Avi Kivity, Zhang, Yanmin, Ingo Molnar, Peter Zijlstra, linux-kernel, Marcelo Tosatti, oerg Roedel, Jes Sorensen, Gleb Natapov, Huang, Zhiteng, Joerg Roedel On Thursday 18 March 2010 13:22:28 Sheng Yang wrote: > On Thursday 18 March 2010 12:50:58 Zachary Amsden wrote: > > On 03/17/2010 03:19 PM, Sheng Yang wrote: > > > On Thursday 18 March 2010 05:14:52 Zachary Amsden wrote: > > >> On 03/16/2010 11:28 PM, Sheng Yang wrote: > > >>> On Wednesday 17 March 2010 10:34:33 Zhang, Yanmin wrote: > > >>>> On Tue, 2010-03-16 at 11:32 +0200, Avi Kivity wrote: > > >>>>> On 03/16/2010 09:48 AM, Zhang, Yanmin wrote: > > >>>>>> Right, but there is a scope between kvm_guest_enter and really > > >>>>>> running in guest os, where a perf event might overflow. Anyway, > > >>>>>> the scope is very narrow, I will change it to use flag PF_VCPU. > > >>>>> > > >>>>> There is also a window between setting the flag and calling 'int > > >>>>> $2' where an NMI might happen and be accounted incorrectly. > > >>>>> > > >>>>> Perhaps separate the 'int $2' into a direct call into perf and > > >>>>> another call for the rest of NMI handling. I don't see how it > > >>>>> would work on svm though - AFAICT the NMI is held whereas vmx > > >>>>> swallows it. > > >>>>> > > >>>>> I guess NMIs > > >>>>> will be disabled until the next IRET so it isn't racy, just tricky. > > >>>> > > >>>> I'm not sure if vmexit does break NMI context or not. Hardware NMI > > >>>> context isn't reentrant till a IRET. YangSheng would like to double > > >>>> check it. > > >>> > > >>> After more check, I think VMX won't remained NMI block state for > > >>> host. 
That's means, if NMI happened and processor is in VMX non-root > > >>> mode, it would only result in VMExit, with a reason indicate that > > >>> it's due to NMI happened, but no more state change in the host. > > >>> > > >>> So in that meaning, there _is_ a window between VMExit and KVM handle > > >>> the NMI. Moreover, I think we _can't_ stop the re-entrance of NMI > > >>> handling code because "int $2" don't have effect to block following > > >>> NMI. > > >>> > > >>> And if the NMI sequence is not important(I think so), then we need to > > >>> generate a real NMI in current vmexit-after code. Seems let APIC send > > >>> a NMI IPI to itself is a good idea. > > >>> > > >>> I am debugging a patch based on apic->send_IPI_self(NMI_VECTOR) to > > >>> replace "int $2". Something unexpected is happening... > > >> > > >> You can't use the APIC to send vectors 0x00-0x1f, or at least, aren't > > >> supposed to be able to. > > > > > > Um? Why? > > > > > > Especially kernel is already using it to deliver NMI. > > > > That's the only defined case, and it is defined because the vector field > > is ignore for DM_NMI. Vol 3A (exact section numbers may vary depending > > on your version). > > > > 8.5.1 / 8.6.1 > > > > '100 (NMI) Delivers an NMI interrupt to the target processor or > > processors. The vector information is ignored' > > > > 8.5.2 Valid Interrupt Vectors > > > > 'Local and I/O APICs support 240 of these vectors (in the range of 16 to > > 255) as valid interrupts.' > > > > 8.8.4 Interrupt Acceptance for Fixed Interrupts > > > > '...; vectors 0 through 15 are reserved by the APIC (see also: Section > > 8.5.2, "Valid Interrupt Vectors")' > > > > So I misremembered, apparently you can deliver interrupts 0x10-0x1f, but > > vectors 0x00-0x0f are not valid to send via APIC or I/O APIC. > > As you pointed out, NMI is not "Fixed interrupt". If we want to send NMI, > it would need a specific delivery mode rather than vector number. 
> > And if you look at code, if we specify NMI_VECTOR, the delivery mode would > be set to NMI. > > So what's wrong here? OK, I think I understand your points now. You meant that these vectors can't be put into the vector field directly, right? But NMI is an exception due to DM_NMI. Is that your point? I think we agree on this. -- regards Yang, Sheng ^ permalink raw reply [flat|nested] 390+ messages in thread
* Re: [PATCH] Enhance perf to collect KVM guest os statistics from host side 2010-03-18 5:41 ` Sheng Yang @ 2010-03-18 8:47 ` Zachary Amsden 0 siblings, 0 replies; 390+ messages in thread From: Zachary Amsden @ 2010-03-18 8:47 UTC (permalink / raw) To: Sheng Yang Cc: kvm, Avi Kivity, Zhang, Yanmin, Ingo Molnar, Peter Zijlstra, linux-kernel, Marcelo Tosatti, oerg Roedel, Jes Sorensen, Gleb Natapov, Huang, Zhiteng, Joerg Roedel On 03/17/2010 07:41 PM, Sheng Yang wrote: > On Thursday 18 March 2010 13:22:28 Sheng Yang wrote: > >> On Thursday 18 March 2010 12:50:58 Zachary Amsden wrote: >> >>> On 03/17/2010 03:19 PM, Sheng Yang wrote: >>> >>>> On Thursday 18 March 2010 05:14:52 Zachary Amsden wrote: >>>> >>>>> On 03/16/2010 11:28 PM, Sheng Yang wrote: >>>>> >>>>>> On Wednesday 17 March 2010 10:34:33 Zhang, Yanmin wrote: >>>>>> >>>>>>> On Tue, 2010-03-16 at 11:32 +0200, Avi Kivity wrote: >>>>>>> >>>>>>>> On 03/16/2010 09:48 AM, Zhang, Yanmin wrote: >>>>>>>> >>>>>>>>> Right, but there is a scope between kvm_guest_enter and really >>>>>>>>> running in guest os, where a perf event might overflow. Anyway, >>>>>>>>> the scope is very narrow, I will change it to use flag PF_VCPU. >>>>>>>>> >>>>>>>> There is also a window between setting the flag and calling 'int >>>>>>>> $2' where an NMI might happen and be accounted incorrectly. >>>>>>>> >>>>>>>> Perhaps separate the 'int $2' into a direct call into perf and >>>>>>>> another call for the rest of NMI handling. I don't see how it >>>>>>>> would work on svm though - AFAICT the NMI is held whereas vmx >>>>>>>> swallows it. >>>>>>>> >>>>>>>> I guess NMIs >>>>>>>> will be disabled until the next IRET so it isn't racy, just tricky. >>>>>>>> >>>>>>> I'm not sure if vmexit does break NMI context or not. Hardware NMI >>>>>>> context isn't reentrant till a IRET. YangSheng would like to double >>>>>>> check it. >>>>>>> >>>>>> After more check, I think VMX won't remained NMI block state for >>>>>> host. 
That's means, if NMI happened and processor is in VMX non-root >>>>>> mode, it would only result in VMExit, with a reason indicate that >>>>>> it's due to NMI happened, but no more state change in the host. >>>>>> >>>>>> So in that meaning, there _is_ a window between VMExit and KVM handle >>>>>> the NMI. Moreover, I think we _can't_ stop the re-entrance of NMI >>>>>> handling code because "int $2" don't have effect to block following >>>>>> NMI. >>>>>> >>>>>> And if the NMI sequence is not important(I think so), then we need to >>>>>> generate a real NMI in current vmexit-after code. Seems let APIC send >>>>>> a NMI IPI to itself is a good idea. >>>>>> >>>>>> I am debugging a patch based on apic->send_IPI_self(NMI_VECTOR) to >>>>>> replace "int $2". Something unexpected is happening... >>>>>> >>>>> You can't use the APIC to send vectors 0x00-0x1f, or at least, aren't >>>>> supposed to be able to. >>>>> >>>> Um? Why? >>>> >>>> Especially kernel is already using it to deliver NMI. >>>> >>> That's the only defined case, and it is defined because the vector field >>> is ignore for DM_NMI. Vol 3A (exact section numbers may vary depending >>> on your version). >>> >>> 8.5.1 / 8.6.1 >>> >>> '100 (NMI) Delivers an NMI interrupt to the target processor or >>> processors. The vector information is ignored' >>> >>> 8.5.2 Valid Interrupt Vectors >>> >>> 'Local and I/O APICs support 240 of these vectors (in the range of 16 to >>> 255) as valid interrupts.' >>> >>> 8.8.4 Interrupt Acceptance for Fixed Interrupts >>> >>> '...; vectors 0 through 15 are reserved by the APIC (see also: Section >>> 8.5.2, "Valid Interrupt Vectors")' >>> >>> So I misremembered, apparently you can deliver interrupts 0x10-0x1f, but >>> vectors 0x00-0x0f are not valid to send via APIC or I/O APIC. >>> >> As you pointed out, NMI is not "Fixed interrupt". If we want to send NMI, >> it would need a specific delivery mode rather than vector number. 
>> >> And if you look at code, if we specific NMI_VECTOR, the delivery mode would >> be set to NMI. >> >> So what's wrong here? >> > OK, I think I understand your points now. You meant that these vectors can't > be filled in vector field directly, right? But NMI is a exception due to > DM_NMI. Is that your point? I think we agree on this. > Yes, I think we agree. NMI is the only vector in 0x0-0xf which can be sent via self-IPI because the vector itself does not matter for NMI. Zach ^ permalink raw reply [flat|nested] 390+ messages in thread
* [PATCH] Enhance perf to collect KVM guest os statistics from host side 2010-03-16 5:27 [PATCH] Enhance perf to collect KVM guest os statistics from host side Zhang, Yanmin 2010-03-16 5:41 ` Avi Kivity @ 2010-03-19 3:38 ` Zhang, Yanmin 2010-03-19 8:21 ` Ingo Molnar 1 sibling, 1 reply; 390+ messages in thread
From: Zhang, Yanmin @ 2010-03-19 3:38 UTC (permalink / raw)
To: Ingo Molnar
Cc: Peter Zijlstra, Avi Kivity, Sheng Yang, linux-kernel, kvm, Marcelo Tosatti, oerg Roedel, Jes Sorensen, Gleb Natapov, Zachary Amsden, zhiteng.huang

On Tue, 2010-03-16 at 13:27 +0800, Zhang, Yanmin wrote:
> From: Zhang, Yanmin <yanmin_zhang@linux.intel.com>

Here is the new V2 patch against tip/master of March 17th, if anyone wants to try it.

ChangeLog V2:
1) Based on Avi's suggestion, I moved the callback functions into the generic code area, so the kernel part of the patch is clearer.
2) Added 'perf kvm stat'.

From: Zhang, Yanmin <yanmin_zhang@linux.intel.com>

Based on the discussion in the KVM community, I worked out this patch to let perf collect guest os statistics from the host side. This patch was implemented with kind help from Ingo, Peter and some other guys. Yang Sheng pointed out a critical bug and, along with others, provided good suggestions. I really appreciate their kind help.

The patch adds a new subcommand, kvm, to perf:

perf kvm top
perf kvm record
perf kvm report
perf kvm diff
perf kvm stat

The new perf can profile the guest os kernel, but not guest os user space; it can, however, summarize guest os user space utilization per guest os. Below are some examples.
1) perf kvm top
[root@lkp-ne01 norm]# perf kvm --host --guest --guestkallsyms=/home/ymzhang/guest/kallsyms --guestmodules=/home/ymzhang/guest/modules top
--------------------------------------------------------------------------------------------------------------------------
   PerfTop:   16010 irqs/sec  kernel:59.1% us: 1.5% guest kernel:31.9% guest us: 7.5% exact:  0.0% [1000Hz cycles],  (all, 16 CPUs)
--------------------------------------------------------------------------------------------------------------------------

             samples  pcnt function                  DSO
             _______ _____ _________________________ _______________________

            38770.00 20.4% __ticket_spin_lock        [guest.kernel.kallsyms]
            22560.00 11.9% ftrace_likely_update      [kernel.kallsyms]
             9208.00  4.8% __lock_acquire            [kernel.kallsyms]
             5473.00  2.9% trace_hardirqs_off_caller [kernel.kallsyms]
             5222.00  2.7% copy_user_generic_string  [guest.kernel.kallsyms]
             4450.00  2.3% validate_chain            [kernel.kallsyms]
             4262.00  2.2% trace_hardirqs_on_caller  [kernel.kallsyms]
             4239.00  2.2% do_raw_spin_lock          [kernel.kallsyms]
             3548.00  1.9% do_raw_spin_unlock        [kernel.kallsyms]
             2487.00  1.3% lock_release              [kernel.kallsyms]
             2165.00  1.1% __local_bh_disable        [kernel.kallsyms]
             1905.00  1.0% check_chain_key           [kernel.kallsyms]
             1737.00  0.9% lock_acquire              [kernel.kallsyms]
             1604.00  0.8% tcp_recvmsg               [kernel.kallsyms]
             1524.00  0.8% mark_lock                 [kernel.kallsyms]
             1464.00  0.8% schedule                  [kernel.kallsyms]
             1423.00  0.7% __d_lookup                [guest.kernel.kallsyms]

If you want to show just host data, don't use the --guest parameter. The headline includes the guest os kernel and user space percentages.
2) perf kvm record
[root@lkp-ne01 norm]# perf kvm --host --guest --guestkallsyms=/home/ymzhang/guest/kallsyms --guestmodules=/home/ymzhang/guest/modules record -f -a sleep 60
[ perf record: Woken up 15 times to write data ]
[ perf record: Captured and wrote 29.385 MB perf.data.kvm (~1283837 samples) ]

3) perf kvm report
3.1)
[root@lkp-ne01 norm]# perf kvm --host --guest --guestkallsyms=/home/ymzhang/guest/kallsyms --guestmodules=/home/ymzhang/guest/modules report --sort pid --showcpuutilization >norm.host.guest.report.pid
# Samples: 424719292247
#
# Overhead       sys        us  guest sys  guest us  Command:  Pid
# ........  .....................
#
    50.57%     1.02%     0.00%     39.97%     9.58%  qemu-system-x86: 3587
    49.32%     1.35%     0.01%     35.20%    12.76%  qemu-system-x86: 3347
     0.07%     0.07%     0.00%      0.00%     0.00%  perf: 5217

Some performance engineers want perf to show sys/us/guest_sys/guest_us per KVM guest instance, which is really just a multi-threaded process. The sub-parameter --showcpuutilization above does exactly that.

3.2)
[root@lkp-ne01 norm]# perf kvm --host --guest --guestkallsyms=/home/ymzhang/guest/kallsyms --guestmodules=/home/ymzhang/guest/modules report >norm.host.guest.report
# Samples: 2466991384118
#
# Overhead  Command          Shared Object                                                             Symbol
# ........  ...............  ........................................................................  ......
#
    29.11%  qemu-system-x86  [guest.kernel.kallsyms]   [g] __ticket_spin_lock
     5.88%  tbench_srv       [kernel.kallsyms]         [k] ftrace_likely_update
     5.76%  tbench           [kernel.kallsyms]         [k] ftrace_likely_update
     3.88%  qemu-system-x86  34c3255482                [u] 0x000034c3255482
     1.83%  tbench           [kernel.kallsyms]         [k] __lock_acquire
     1.81%  tbench_srv       [kernel.kallsyms]         [k] __lock_acquire
     1.38%  tbench_srv       [kernel.kallsyms]         [k] trace_hardirqs_off_caller
     1.37%  tbench           [kernel.kallsyms]         [k] trace_hardirqs_off_caller
     1.13%  qemu-system-x86  [guest.kernel.kallsyms]   [g] copy_user_generic_string
     1.04%  tbench_srv       [kernel.kallsyms]         [k] validate_chain
     1.00%  tbench           [kernel.kallsyms]         [k] trace_hardirqs_on_caller
     1.00%  tbench_srv       [kernel.kallsyms]         [k] trace_hardirqs_on_caller
     0.95%  tbench           [kernel.kallsyms]         [k] do_raw_spin_lock

[u] means the sample is in guest os user space; [g] means it is in the guest os kernel. Other info is straightforward. If a module such as [ext4] shows up, it is a guest kernel module, because the native host kernel's modules start from paths like /lib/modules/XXX.

Below is the patch against the tip/master tree of 17th March.
Signed-off-by: Zhang Yanmin <yanmin_zhang@linux.intel.com> --- diff -Nraup linux-2.6_tip0317/arch/x86/include/asm/perf_event.h linux-2.6_tip0317_perfkvm/arch/x86/include/asm/perf_event.h --- linux-2.6_tip0317/arch/x86/include/asm/perf_event.h 2010-03-18 09:04:36.597952883 +0800 +++ linux-2.6_tip0317_perfkvm/arch/x86/include/asm/perf_event.h 2010-03-18 15:06:19.579081193 +0800 @@ -143,17 +143,10 @@ extern void perf_events_lapic_init(void) */ #define PERF_EFLAGS_EXACT (1UL << 3) -#define perf_misc_flags(regs) \ -({ int misc = 0; \ - if (user_mode(regs)) \ - misc |= PERF_RECORD_MISC_USER; \ - else \ - misc |= PERF_RECORD_MISC_KERNEL; \ - if (regs->flags & PERF_EFLAGS_EXACT) \ - misc |= PERF_RECORD_MISC_EXACT; \ - misc; }) - -#define perf_instruction_pointer(regs) ((regs)->ip) +struct pt_regs; +extern unsigned long perf_instruction_pointer(struct pt_regs *regs); +extern unsigned long perf_misc_flags(struct pt_regs *regs); +#define perf_misc_flags(regs) perf_misc_flags(regs) #else static inline void init_hw_perf_events(void) { } diff -Nraup linux-2.6_tip0317/arch/x86/kernel/cpu/perf_event.c linux-2.6_tip0317_perfkvm/arch/x86/kernel/cpu/perf_event.c --- linux-2.6_tip0317/arch/x86/kernel/cpu/perf_event.c 2010-03-18 09:04:36.665958497 +0800 +++ linux-2.6_tip0317_perfkvm/arch/x86/kernel/cpu/perf_event.c 2010-03-18 15:07:20.555339370 +0800 @@ -1708,3 +1708,30 @@ void perf_arch_fetch_caller_regs(struct local_save_flags(regs->flags); } #endif + +unsigned long perf_instruction_pointer(struct pt_regs *regs) +{ + unsigned long ip; + if (perf_guest_cbs && perf_guest_cbs->is_in_guest()) + ip = perf_guest_cbs->get_guest_ip(); + else + ip = instruction_pointer(regs); + return ip; +} + +unsigned long perf_misc_flags(struct pt_regs *regs) +{ + int misc = 0; + if (perf_guest_cbs && perf_guest_cbs->is_in_guest()) { + misc |= perf_guest_cbs->is_user_mode() ? + PERF_RECORD_MISC_GUEST_USER : + PERF_RECORD_MISC_GUEST_KERNEL; + } else + misc |= user_mode(regs) ? 
PERF_RECORD_MISC_USER : + PERF_RECORD_MISC_KERNEL; + if (regs->flags & PERF_EFLAGS_EXACT) + misc |= PERF_RECORD_MISC_EXACT; + + return misc; +} + diff -Nraup linux-2.6_tip0317/arch/x86/kvm/x86.c linux-2.6_tip0317_perfkvm/arch/x86/kvm/x86.c --- linux-2.6_tip0317/arch/x86/kvm/x86.c 2010-03-18 09:04:36.629956698 +0800 +++ linux-2.6_tip0317_perfkvm/arch/x86/kvm/x86.c 2010-03-18 15:06:19.579081193 +0800 @@ -3764,6 +3764,35 @@ static void kvm_timer_init(void) } } +static DEFINE_PER_CPU(struct kvm_vcpu *, current_vcpu); + +static int kvm_is_in_guest(void) +{ + return percpu_read(current_vcpu) != NULL; +} + +static int kvm_is_user_mode(void) +{ + int user_mode = 3; + if (percpu_read(current_vcpu)) + user_mode = kvm_x86_ops->get_cpl(percpu_read(current_vcpu)); + return user_mode != 0; +} + +static unsigned long kvm_get_guest_ip(void) +{ + unsigned long ip = 0; + if (percpu_read(current_vcpu)) + ip = kvm_rip_read(percpu_read(current_vcpu)); + return ip; +} + +static struct perf_guest_info_callbacks kvm_guest_cbs = { + .is_in_guest = kvm_is_in_guest, + .is_user_mode = kvm_is_user_mode, + .get_guest_ip = kvm_get_guest_ip, +}; + int kvm_arch_init(void *opaque) { int r; @@ -3800,6 +3829,8 @@ int kvm_arch_init(void *opaque) kvm_timer_init(); + perf_register_guest_info_callbacks(&kvm_guest_cbs); + return 0; out: @@ -3808,6 +3839,8 @@ out: void kvm_arch_exit(void) { + perf_unregister_guest_info_callbacks(&kvm_guest_cbs); + if (!boot_cpu_has(X86_FEATURE_CONSTANT_TSC)) cpufreq_unregister_notifier(&kvmclock_cpufreq_notifier_block, CPUFREQ_TRANSITION_NOTIFIER); @@ -4338,7 +4371,10 @@ static int vcpu_enter_guest(struct kvm_v } trace_kvm_entry(vcpu->vcpu_id); + + percpu_write(current_vcpu, vcpu); kvm_x86_ops->run(vcpu); + percpu_write(current_vcpu, NULL); /* * If the guest has used debug registers, at least dr7 diff -Nraup linux-2.6_tip0317/include/linux/perf_event.h linux-2.6_tip0317_perfkvm/include/linux/perf_event.h --- linux-2.6_tip0317/include/linux/perf_event.h 2010-03-18 
09:04:37.674034701 +0800 +++ linux-2.6_tip0317_perfkvm/include/linux/perf_event.h 2010-03-18 15:06:19.583056523 +0800 @@ -288,11 +288,13 @@ struct perf_event_mmap_page { __u64 data_tail; /* user-space written tail */ }; -#define PERF_RECORD_MISC_CPUMODE_MASK (3 << 0) +#define PERF_RECORD_MISC_CPUMODE_MASK (7 << 0) #define PERF_RECORD_MISC_CPUMODE_UNKNOWN (0 << 0) #define PERF_RECORD_MISC_KERNEL (1 << 0) #define PERF_RECORD_MISC_USER (2 << 0) #define PERF_RECORD_MISC_HYPERVISOR (3 << 0) +#define PERF_RECORD_MISC_GUEST_KERNEL (4 << 0) +#define PERF_RECORD_MISC_GUEST_USER (5 << 0) #define PERF_RECORD_MISC_EXACT (1 << 14) /* @@ -446,6 +448,12 @@ enum perf_callchain_context { # include <asm/perf_event.h> #endif +struct perf_guest_info_callbacks { + int (*is_in_guest) (void); + int (*is_user_mode) (void); + unsigned long (*get_guest_ip) (void); +}; + #ifdef CONFIG_HAVE_HW_BREAKPOINT #include <asm/hw_breakpoint.h> #endif @@ -913,6 +921,12 @@ static inline void perf_event_mmap(struc __perf_event_mmap(vma); } +extern struct perf_guest_info_callbacks *perf_guest_cbs; +extern int perf_register_guest_info_callbacks( + struct perf_guest_info_callbacks *); +extern int perf_unregister_guest_info_callbacks( + struct perf_guest_info_callbacks *); + extern void perf_event_comm(struct task_struct *tsk); extern void perf_event_fork(struct task_struct *tsk); @@ -982,6 +996,11 @@ perf_sw_event(u32 event_id, u64 nr, int static inline void perf_bp_event(struct perf_event *event, void *data) { } +static inline int perf_register_guest_info_callbacks +(struct perf_guest_info_callbacks *) {return 0; } +static inline int perf_unregister_guest_info_callbacks +(struct perf_guest_info_callbacks *) {return 0; } + static inline void perf_event_mmap(struct vm_area_struct *vma) { } static inline void perf_event_comm(struct task_struct *tsk) { } static inline void perf_event_fork(struct task_struct *tsk) { } diff -Nraup linux-2.6_tip0317/kernel/perf_event.c 
linux-2.6_tip0317_perfkvm/kernel/perf_event.c --- linux-2.6_tip0317/kernel/perf_event.c 2010-03-18 09:04:40.954262305 +0800 +++ linux-2.6_tip0317_perfkvm/kernel/perf_event.c 2010-03-18 15:06:19.583056523 +0800 @@ -2798,6 +2798,27 @@ void perf_arch_fetch_caller_regs(struct #endif /* + * We assume there is only KVM supporting the callbacks. + * Later on, we might change it to a list if there is + * another virtualization implementation supporting the callbacks. + */ +struct perf_guest_info_callbacks *perf_guest_cbs; + +int perf_register_guest_info_callbacks(struct perf_guest_info_callbacks *cbs) +{ + perf_guest_cbs = cbs; + return 0; +} +EXPORT_SYMBOL_GPL(perf_register_guest_info_callbacks); + +int perf_unregister_guest_info_callbacks(struct perf_guest_info_callbacks *cbs) +{ + perf_guest_cbs = NULL; + return 0; +} +EXPORT_SYMBOL_GPL(perf_unregister_guest_info_callbacks); + +/* * Output */ static bool perf_output_space(struct perf_mmap_data *data, unsigned long tail, @@ -3740,7 +3761,7 @@ void __perf_event_mmap(struct vm_area_st .event_id = { .header = { .type = PERF_RECORD_MMAP, - .misc = 0, + .misc = PERF_RECORD_MISC_USER, /* .size */ }, /* .pid */ diff -Nraup linux-2.6_tip0317/tools/perf/builtin-diff.c linux-2.6_tip0317_perfkvm/tools/perf/builtin-diff.c --- linux-2.6_tip0317/tools/perf/builtin-diff.c 2010-03-18 09:04:40.914226433 +0800 +++ linux-2.6_tip0317_perfkvm/tools/perf/builtin-diff.c 2010-03-18 15:06:19.583056523 +0800 @@ -33,7 +33,7 @@ static int perf_session__add_hist_entry( return -ENOMEM; if (hit) - he->count += count; + __perf_session__add_count(he, al, count); return 0; } @@ -225,6 +225,9 @@ int cmd_diff(int argc, const char **argv input_new = argv[1]; } else input_new = argv[0]; + } else if (symbol_conf.guest_vmlinux_name || symbol_conf.guest_kallsyms) { + input_old = "perf.data.host"; + input_new = "perf.data.guest"; } symbol_conf.exclude_other = false; diff -Nraup linux-2.6_tip0317/tools/perf/builtin.h linux-2.6_tip0317_perfkvm/tools/perf/builtin.h 
--- linux-2.6_tip0317/tools/perf/builtin.h 2010-03-18 09:04:40.910227768 +0800 +++ linux-2.6_tip0317_perfkvm/tools/perf/builtin.h 2010-03-18 15:06:19.583056523 +0800 @@ -32,5 +32,6 @@ extern int cmd_version(int argc, const c extern int cmd_probe(int argc, const char **argv, const char *prefix); extern int cmd_kmem(int argc, const char **argv, const char *prefix); extern int cmd_lock(int argc, const char **argv, const char *prefix); +extern int cmd_kvm(int argc, const char **argv, const char *prefix); #endif diff -Nraup linux-2.6_tip0317/tools/perf/builtin-kvm.c linux-2.6_tip0317_perfkvm/tools/perf/builtin-kvm.c --- linux-2.6_tip0317/tools/perf/builtin-kvm.c 1970-01-01 08:00:00.000000000 +0800 +++ linux-2.6_tip0317_perfkvm/tools/perf/builtin-kvm.c 2010-03-18 15:06:19.583056523 +0800 @@ -0,0 +1,125 @@ +#include "builtin.h" +#include "perf.h" + +#include "util/util.h" +#include "util/cache.h" +#include "util/symbol.h" +#include "util/thread.h" +#include "util/header.h" +#include "util/session.h" + +#include "util/parse-options.h" +#include "util/trace-event.h" + +#include "util/debug.h" + +#include <sys/prctl.h> + +#include <semaphore.h> +#include <pthread.h> +#include <math.h> + +static char *file_name = NULL; +static char name_buffer[256]; + +int perf_host = 1; +int perf_guest = 0; + +static const char * const kvm_usage[] = { + "perf kvm [<options>] {top|record|report|diff|stat}", + NULL +}; + +static const struct option kvm_options[] = { + OPT_STRING('i', "input", &file_name, "file", + "Input file name"), + OPT_STRING('o', "output", &file_name, "file", + "Output file name"), + OPT_BOOLEAN(0, "guest", &perf_guest, + "Collect guest os data"), + OPT_BOOLEAN(0, "host", &perf_host, + "Collect guest os data"), + OPT_STRING(0, "guestvmlinux", &symbol_conf.guest_vmlinux_name, "file", + "file saving guest os vmlinux"), + OPT_STRING(0, "guestkallsyms", &symbol_conf.guest_kallsyms, "file", + "file saving guest os /proc/kallsyms"), + OPT_STRING(0, "guestmodules", 
&symbol_conf.guest_modules, "file", + "file saving guest os /proc/modules"), + OPT_END() +}; + +static int __cmd_record(int argc, const char **argv) +{ + int rec_argc, i = 0, j; + const char **rec_argv; + + rec_argc = argc + 2; + rec_argv = calloc(rec_argc + 1, sizeof(char *)); + rec_argv[i++] = strdup("record"); + rec_argv[i++] = strdup("-o"); + rec_argv[i++] = strdup(file_name); + for (j = 1; j < argc; j++, i++) + rec_argv[i] = argv[j]; + + BUG_ON(i != rec_argc); + + return cmd_record(i, rec_argv, NULL); +} + +static int __cmd_report(int argc, const char **argv) +{ + int rec_argc, i = 0, j; + const char **rec_argv; + + rec_argc = argc + 2; + rec_argv = calloc(rec_argc + 1, sizeof(char *)); + rec_argv[i++] = strdup("report"); + rec_argv[i++] = strdup("-i"); + rec_argv[i++] = strdup(file_name); + for (j = 1; j < argc; j++, i++) + rec_argv[i] = argv[j]; + + BUG_ON(i != rec_argc); + + return cmd_report(i, rec_argv, NULL); +} + +int cmd_kvm(int argc, const char **argv, const char *prefix __used) +{ + perf_host = perf_guest = 0; + + argc = parse_options(argc, argv, kvm_options, kvm_usage, + PARSE_OPT_STOP_AT_NON_OPTION); + if (!argc) + usage_with_options(kvm_usage, kvm_options); + + if (!perf_host) + perf_guest = 1; + + if (!file_name) { + if (perf_host && !perf_guest) + sprintf(name_buffer, "perf.data.host"); + else if (!perf_host && perf_guest) + sprintf(name_buffer, "perf.data.guest"); + else + sprintf(name_buffer, "perf.data.kvm"); + file_name = name_buffer; + } + + if (!strncmp(argv[0], "rec", 3)) { + return __cmd_record(argc, argv); + } else if (!strncmp(argv[0], "rep", 3)) { + return __cmd_report(argc, argv); + } else if (!strncmp(argv[0], "diff", 4)) { + return cmd_diff(argc, argv, NULL); + } else if (!strncmp(argv[0], "top", 3)) { + return cmd_top(argc, argv, NULL); + } else if (!strncmp(argv[0], "stat", 3)) { + return cmd_stat(argc, argv, NULL); + } else { + usage_with_options(kvm_usage, kvm_options); + } + + return 0; +} + diff -Nraup 
linux-2.6_tip0317/tools/perf/builtin-record.c linux-2.6_tip0317_perfkvm/tools/perf/builtin-record.c --- linux-2.6_tip0317/tools/perf/builtin-record.c 2010-03-18 09:04:40.942263175 +0800 +++ linux-2.6_tip0317_perfkvm/tools/perf/builtin-record.c 2010-03-18 15:06:19.583056523 +0800 @@ -566,18 +566,58 @@ static int __cmd_record(int argc, const post_processing_offset = lseek(output, 0, SEEK_CUR); err = event__synthesize_kernel_mmap(process_synthesized_event, - session, "_text"); + session, "/proc/kallsyms", + "kernel.kallsyms", + session->vmlinux_maps, + "_text", PERF_RECORD_MISC_KERNEL); if (err < 0) { pr_err("Couldn't record kernel reference relocation symbol.\n"); return err; } - err = event__synthesize_modules(process_synthesized_event, session); + err = event__synthesize_modules(process_synthesized_event, + session, + &session->kmaps, + PERF_RECORD_MISC_KERNEL); if (err < 0) { pr_err("Couldn't record kernel reference relocation symbol.\n"); return err; } + if (perf_guest) { + /* + *As for guest kernel when processing subcommand record&report, + *we arrange module mmap prior to guest kernel mmap and trigger + *a preload dso because guest module symbols are loaded from guest + *kallsyms instead of /lib/modules/XXX/XXX. This method is used to + *avoid symbol missing when the first addr is in module instead of + *in guest kernel + */ + err = event__synthesize_modules(process_synthesized_event, + session, + &session->guest_kmaps, + PERF_RECORD_MISC_GUEST_KERNEL); + if (err < 0) { + pr_err("Couldn't record guest kernel reference relocation symbol.\n"); + return err; + } + + /* + * We use _stext for guest kernel because guest kernel's /proc/kallsyms + * have no _text. 
+ */ + err = event__synthesize_kernel_mmap(process_synthesized_event, + session, symbol_conf.guest_kallsyms, + "guest.kernel.kallsyms", + session->guest_vmlinux_maps, + "_stext", + PERF_RECORD_MISC_GUEST_KERNEL); + if (err < 0) { + pr_err("Couldn't record guest kernel reference relocation symbol.\n"); + return err; + } + } + if (!system_wide && profile_cpu == -1) event__synthesize_thread(target_pid, process_synthesized_event, session); diff -Nraup linux-2.6_tip0317/tools/perf/builtin-report.c linux-2.6_tip0317_perfkvm/tools/perf/builtin-report.c --- linux-2.6_tip0317/tools/perf/builtin-report.c 2010-03-18 09:04:40.926228328 +0800 +++ linux-2.6_tip0317_perfkvm/tools/perf/builtin-report.c 2010-03-18 15:06:19.587050319 +0800 @@ -104,7 +104,7 @@ static int perf_session__add_hist_entry( return -ENOMEM; if (hit) - he->count += data->period; + __perf_session__add_count(he, al, data->period); if (symbol_conf.use_callchain) { if (!hit) @@ -428,6 +428,8 @@ static const struct option options[] = { "sort by key(s): pid, comm, dso, symbol, parent"), OPT_BOOLEAN('P', "full-paths", &symbol_conf.full_paths, "Don't shorten the pathnames taking into account the cwd"), + OPT_BOOLEAN(0, "showcpuutilization", &symbol_conf.show_cpu_utilization, + "Show sample percentage for different cpu modes"), OPT_STRING('p', "parent", &parent_pattern, "regex", "regex filter to identify parent, see: '--sort parent'"), OPT_BOOLEAN('x', "exclude-other", &symbol_conf.exclude_other, diff -Nraup linux-2.6_tip0317/tools/perf/builtin-top.c linux-2.6_tip0317_perfkvm/tools/perf/builtin-top.c --- linux-2.6_tip0317/tools/perf/builtin-top.c 2010-03-18 09:04:40.926228328 +0800 +++ linux-2.6_tip0317_perfkvm/tools/perf/builtin-top.c 2010-03-18 15:06:19.587050319 +0800 @@ -417,8 +417,9 @@ static double sym_weight(const struct sy } static long samples; -static long userspace_samples; +static long kernel_samples, userspace_samples; static long exact_samples; +static long guest_us_samples, guest_kernel_samples; static 
const char CONSOLE_CLEAR[] = "^[[H^[[2J"; static void __list_insert_active_sym(struct sym_entry *syme) @@ -458,7 +459,10 @@ static void print_sym_table(void) int printed = 0, j; int counter, snap = !display_weighted ? sym_counter : 0; float samples_per_sec = samples/delay_secs; - float ksamples_per_sec = (samples-userspace_samples)/delay_secs; + float ksamples_per_sec = kernel_samples/delay_secs; + float userspace_samples_per_sec = (userspace_samples)/delay_secs; + float guest_kernel_samples_per_sec = (guest_kernel_samples)/delay_secs; + float guest_us_samples_per_sec = (guest_us_samples)/delay_secs; float esamples_percent = (100.0*exact_samples)/samples; float sum_ksamples = 0.0; struct sym_entry *syme, *n; @@ -467,7 +471,8 @@ static void print_sym_table(void) int sym_width = 0, dso_width = 0, dso_short_width = 0; const int win_width = winsize.ws_col - 1; - samples = userspace_samples = exact_samples = 0; + samples = userspace_samples = kernel_samples = exact_samples = 0; + guest_kernel_samples = guest_us_samples = 0; /* Sort the active symbols */ pthread_mutex_lock(&active_symbols_lock); @@ -498,10 +503,21 @@ static void print_sym_table(void) puts(CONSOLE_CLEAR); printf("%-*.*s\n", win_width, win_width, graph_dotted_line); - printf( " PerfTop:%8.0f irqs/sec kernel:%4.1f%% exact: %4.1f%% [", - samples_per_sec, - 100.0 - (100.0*((samples_per_sec-ksamples_per_sec)/samples_per_sec)), - esamples_percent); + if (!perf_guest) { + printf( " PerfTop:%8.0f irqs/sec kernel:%4.1f%% exact: %4.1f%% [", + samples_per_sec, + 100.0 - (100.0*((samples_per_sec-ksamples_per_sec)/samples_per_sec)), + esamples_percent); + } else { + printf( " PerfTop:%8.0f irqs/sec kernel:%4.1f%% us:%4.1f%%" + " guest kernel:%4.1f%% guest us:%4.1f%% exact: %4.1f%% [", + samples_per_sec, + 100.0 - (100.0*((samples_per_sec-ksamples_per_sec)/samples_per_sec)), + 100.0 - (100.0*((samples_per_sec-userspace_samples_per_sec)/samples_per_sec)), + 100.0 - 
(100.0*((samples_per_sec-guest_kernel_samples_per_sec)/samples_per_sec)), + 100.0 - (100.0*((samples_per_sec-guest_us_samples_per_sec)/samples_per_sec)), + esamples_percent); + } if (nr_counters == 1 || !display_weighted) { printf("%Ld", (u64)attrs[0].sample_period); @@ -963,9 +979,20 @@ static void event__process_sample(const return; break; case PERF_RECORD_MISC_KERNEL: + ++kernel_samples; if (hide_kernel_symbols) return; break; + case PERF_RECORD_MISC_GUEST_KERNEL: + ++guest_kernel_samples; + break; + case PERF_RECORD_MISC_GUEST_USER: + ++guest_us_samples; + /* + * TODO: we don't process guest user from host side + * except simple counting + */ + return; default: return; } diff -Nraup linux-2.6_tip0317/tools/perf/Makefile linux-2.6_tip0317_perfkvm/tools/perf/Makefile --- linux-2.6_tip0317/tools/perf/Makefile 2010-03-18 09:04:40.938289813 +0800 +++ linux-2.6_tip0317_perfkvm/tools/perf/Makefile 2010-03-18 15:06:19.587050319 +0800 @@ -462,6 +462,7 @@ BUILTIN_OBJS += builtin-trace.o BUILTIN_OBJS += builtin-probe.o BUILTIN_OBJS += builtin-kmem.o BUILTIN_OBJS += builtin-lock.o +BUILTIN_OBJS += builtin-kvm.o PERFLIBS = $(LIB_FILE) diff -Nraup linux-2.6_tip0317/tools/perf/perf.c linux-2.6_tip0317_perfkvm/tools/perf/perf.c --- linux-2.6_tip0317/tools/perf/perf.c 2010-03-18 09:04:40.926228328 +0800 +++ linux-2.6_tip0317_perfkvm/tools/perf/perf.c 2010-03-18 15:06:19.587050319 +0800 @@ -308,6 +308,7 @@ static void handle_internal_command(int { "probe", cmd_probe, 0 }, { "kmem", cmd_kmem, 0 }, { "lock", cmd_lock, 0 }, + { "kvm", cmd_kvm, 0 }, }; unsigned int i; static const char ext[] = STRIP_EXTENSION; diff -Nraup linux-2.6_tip0317/tools/perf/perf.h linux-2.6_tip0317_perfkvm/tools/perf/perf.h --- linux-2.6_tip0317/tools/perf/perf.h 2010-03-18 09:04:40.942263175 +0800 +++ linux-2.6_tip0317_perfkvm/tools/perf/perf.h 2010-03-18 15:06:19.587050319 +0800 @@ -133,4 +133,6 @@ struct ip_callchain { u64 ips[0]; }; +extern int perf_host, perf_guest; + #endif diff -Nraup 
linux-2.6_tip0317/tools/perf/util/event.c linux-2.6_tip0317_perfkvm/tools/perf/util/event.c --- linux-2.6_tip0317/tools/perf/util/event.c 2010-03-18 09:04:40.934227537 +0800 +++ linux-2.6_tip0317_perfkvm/tools/perf/util/event.c 2010-03-18 15:06:19.587050319 +0800 @@ -112,7 +112,7 @@ static int event__synthesize_mmap_events event_t ev = { .header = { .type = PERF_RECORD_MMAP, - .misc = 0, /* Just like the kernel, see kernel/perf_event.c __perf_event_mmap */ + .misc = PERF_RECORD_MISC_USER, /* Just like the kernel, see kernel/perf_event.c __perf_event_mmap */ }, }; int n; @@ -158,11 +158,13 @@ static int event__synthesize_mmap_events } int event__synthesize_modules(event__handler_t process, - struct perf_session *session) + struct perf_session *session, + struct map_groups *kmaps, + unsigned int misc) { struct rb_node *nd; - for (nd = rb_first(&session->kmaps.maps[MAP__FUNCTION]); + for (nd = rb_first(&kmaps->maps[MAP__FUNCTION]); nd; nd = rb_next(nd)) { event_t ev; size_t size; @@ -173,7 +175,7 @@ int event__synthesize_modules(event__han size = ALIGN(pos->dso->long_name_len + 1, sizeof(u64)); memset(&ev, 0, sizeof(ev)); - ev.mmap.header.misc = 1; /* kernel uses 0 for user space maps, see kernel/perf_event.c __perf_event_mmap */ + ev.mmap.header.misc = misc; /* kernel uses 0 for user space maps, see kernel/perf_event.c __perf_event_mmap */ ev.mmap.header.type = PERF_RECORD_MMAP; ev.mmap.header.size = (sizeof(ev.mmap) - (sizeof(ev.mmap.filename) - size)); @@ -241,13 +243,17 @@ static int find_symbol_cb(void *arg, con int event__synthesize_kernel_mmap(event__handler_t process, struct perf_session *session, - const char *symbol_name) + const char *kallsyms_name, + const char *mmap_name, + struct map **maps, + const char *symbol_name, + unsigned int misc) { size_t size; event_t ev = { .header = { .type = PERF_RECORD_MMAP, - .misc = 1, /* kernel uses 0 for user space maps, see kernel/perf_event.c __perf_event_mmap */ + .misc = misc, /* kernel uses PERF_RECORD_MISC_USER 
for user space maps, see kernel/perf_event.c __perf_event_mmap */ }, }; /* @@ -257,16 +263,16 @@ int event__synthesize_kernel_mmap(event_ */ struct process_symbol_args args = { .name = symbol_name, }; - if (kallsyms__parse("/proc/kallsyms", &args, find_symbol_cb) <= 0) + if (kallsyms__parse(kallsyms_name, &args, find_symbol_cb) <= 0) return -ENOENT; size = snprintf(ev.mmap.filename, sizeof(ev.mmap.filename), - "[kernel.kallsyms.%s]", symbol_name) + 1; + "[%s.%s]", mmap_name, symbol_name) + 1; size = ALIGN(size, sizeof(u64)); ev.mmap.header.size = (sizeof(ev.mmap) - (sizeof(ev.mmap.filename) - size)); ev.mmap.pgoff = args.start; - ev.mmap.start = session->vmlinux_maps[MAP__FUNCTION]->start; - ev.mmap.len = session->vmlinux_maps[MAP__FUNCTION]->end - ev.mmap.start ; + ev.mmap.start = maps[MAP__FUNCTION]->start; + ev.mmap.len = maps[MAP__FUNCTION]->end - ev.mmap.start ; return process(&ev, session); } @@ -320,19 +326,25 @@ int event__process_lost(event_t *self, s return 0; } -int event__process_mmap(event_t *self, struct perf_session *session) +static void event_set_kernel_mmap_len(struct map **maps, event_t *self) { - struct thread *thread; - struct map *map; + maps[MAP__FUNCTION]->start = self->mmap.start; + maps[MAP__FUNCTION]->end = self->mmap.start + self->mmap.len; + /* + * Be a bit paranoid here, some perf.data file came with + * a zero sized synthesized MMAP event for the kernel. 
+ */ + if (maps[MAP__FUNCTION]->end == 0) + maps[MAP__FUNCTION]->end = ~0UL; +} - dump_printf(" %d/%d: [%#Lx(%#Lx) @ %#Lx]: %s\n", - self->mmap.pid, self->mmap.tid, self->mmap.start, - self->mmap.len, self->mmap.pgoff, self->mmap.filename); +static int __event__process_mmap(event_t *self, struct perf_session *session) +{ + struct map *map; + static const char kmmap_prefix[] = "[kernel.kallsyms."; - if (self->mmap.pid == 0) { - static const char kmmap_prefix[] = "[kernel.kallsyms."; + if (self->mmap.filename[0] == '/') { - if (self->mmap.filename[0] == '/') { char short_module_name[1024]; char *name = strrchr(self->mmap.filename, '/'), *dot; @@ -348,9 +360,10 @@ int event__process_mmap(event_t *self, s "[%.*s]", (int)(dot - name), name); strxfrchar(short_module_name, '-', '_'); - map = perf_session__new_module_map(session, + map = map_groups__new_module(&session->kmaps, self->mmap.start, - self->mmap.filename); + self->mmap.filename, + 0); if (map == NULL) goto out_problem; @@ -373,22 +386,94 @@ int event__process_mmap(event_t *self, s if (kernel == NULL) goto out_problem; - kernel->kernel = 1; - if (__perf_session__create_kernel_maps(session, kernel) < 0) + kernel->kernel = DSO_TYPE_KERNEL; + if (__map_groups__create_kernel_maps(&session->kmaps, + session->vmlinux_maps, kernel) < 0) goto out_problem; - session->vmlinux_maps[MAP__FUNCTION]->start = self->mmap.start; - session->vmlinux_maps[MAP__FUNCTION]->end = self->mmap.start + self->mmap.len; - /* - * Be a bit paranoid here, some perf.data file came with - * a zero sized synthesized MMAP event for the kernel. 
- */ - if (session->vmlinux_maps[MAP__FUNCTION]->end == 0) - session->vmlinux_maps[MAP__FUNCTION]->end = ~0UL; - - perf_session__set_kallsyms_ref_reloc_sym(session, symbol_name, - self->mmap.pgoff); + event_set_kernel_mmap_len(session->vmlinux_maps, self); + perf_session__set_kallsyms_ref_reloc_sym(session->vmlinux_maps, + symbol_name, + self->mmap.pgoff); } + return 0; + +out_problem: + return -1; +} + +static int __event__process_guest_mmap(event_t *self, struct perf_session *session) +{ + struct map *map; + + static const char kmmap_prefix[] = "[guest.kernel.kallsyms."; + + if (memcmp(self->mmap.filename, kmmap_prefix, + sizeof(kmmap_prefix) - 1) == 0) { + const char *symbol_name = (self->mmap.filename + + sizeof(kmmap_prefix) - 1); + /* + * Should be there already, from the build-id table in + * the header. + */ + struct dso *kernel = __dsos__findnew(&dsos__guest_kernel, + "[guest.kernel.kallsyms]"); + if (kernel == NULL) + goto out_problem; + + kernel->kernel = DSO_TYPE_GUEST_KERNEL; + if (__map_groups__create_kernel_maps(&session->guest_kmaps, + session->guest_vmlinux_maps, kernel) < 0) + goto out_problem; + + event_set_kernel_mmap_len(session->guest_vmlinux_maps, self); + perf_session__set_kallsyms_ref_reloc_sym(session->guest_vmlinux_maps, + symbol_name, + self->mmap.pgoff); + /* + * preload dso of guest kernel and modules + */ + dso__load(kernel, session->guest_vmlinux_maps[MAP__FUNCTION], NULL); + } else if (self->mmap.filename[0] == '[') { + char *name; + + map = map_groups__new_module(&session->guest_kmaps, + self->mmap.start, + self->mmap.filename, + 1); + if (map == NULL) + goto out_problem; + name = strdup(self->mmap.filename); + if (name == NULL) + goto out_problem; + + map->dso->short_name = name; + map->end = map->start + self->mmap.len; + } + + return 0; +out_problem: + return -1; +} + +int event__process_mmap(event_t *self, struct perf_session *session) +{ + struct thread *thread; + struct map *map; + u8 cpumode = self->header.misc & 
PERF_RECORD_MISC_CPUMODE_MASK; + int ret; + + dump_printf(" %d/%d: [%#Lx(%#Lx) @ %#Lx]: %s\n", + self->mmap.pid, self->mmap.tid, self->mmap.start, + self->mmap.len, self->mmap.pgoff, self->mmap.filename); + + if (self->mmap.pid == 0) { + if (cpumode == PERF_RECORD_MISC_GUEST_KERNEL) + ret = __event__process_guest_mmap(self, session); + else + ret = __event__process_mmap(self, session); + if (ret < 0) + goto out_problem; return 0; } @@ -441,15 +526,33 @@ void thread__find_addr_map(struct thread al->thread = self; al->addr = addr; + al->cpumode = cpumode; - if (cpumode == PERF_RECORD_MISC_KERNEL) { + if (cpumode == PERF_RECORD_MISC_KERNEL && perf_host) { al->level = 'k'; mg = &session->kmaps; - } else if (cpumode == PERF_RECORD_MISC_USER) + } else if (cpumode == PERF_RECORD_MISC_USER && perf_host) { al->level = '.'; - else { - al->level = 'H'; + } else if (cpumode == PERF_RECORD_MISC_GUEST_KERNEL && perf_guest) { + al->level = 'g'; + mg = &session->guest_kmaps; + } else { + /* TODO: We don't support guest user space. Might support late */ + if (cpumode == PERF_RECORD_MISC_GUEST_USER && perf_guest) + al->level = 'u'; + else + al->level = 'H'; al->map = NULL; + + if ((cpumode == PERF_RECORD_MISC_GUEST_USER || + cpumode == PERF_RECORD_MISC_GUEST_KERNEL) && + !perf_guest) + al->filtered = true; + if ((cpumode == PERF_RECORD_MISC_USER || + cpumode == PERF_RECORD_MISC_KERNEL) && + !perf_host) + al->filtered = true; + return; } try_again: @@ -464,10 +567,18 @@ try_again: * "[vdso]" dso, but for now lets use the old trick of looking * in the whole kernel symbol list. 
*/ - if ((long long)al->addr < 0 && mg != &session->kmaps) { + if ((long long)al->addr < 0 && + mg != &session->kmaps && + cpumode == PERF_RECORD_MISC_KERNEL) { mg = &session->kmaps; goto try_again; } + if ((long long)al->addr < 0 && + mg != &session->guest_kmaps && + cpumode == PERF_RECORD_MISC_GUEST_KERNEL) { + mg = &session->guest_kmaps; + goto try_again; + } } else al->addr = al->map->map_ip(al->map, al->addr); } @@ -513,6 +624,7 @@ int event__preprocess_sample(const event dump_printf(" ... thread: %s:%d\n", thread->comm, thread->pid); + al->filtered = false; thread__find_addr_location(thread, session, cpumode, MAP__FUNCTION, self->ip.ip, al, filter); dump_printf(" ...... dso: %s\n", @@ -536,7 +648,6 @@ int event__preprocess_sample(const event !strlist__has_entry(symbol_conf.sym_list, al->sym->name)) goto out_filtered; - al->filtered = false; return 0; out_filtered: diff -Nraup linux-2.6_tip0317/tools/perf/util/event.h linux-2.6_tip0317_perfkvm/tools/perf/util/event.h --- linux-2.6_tip0317/tools/perf/util/event.h 2010-03-18 09:04:40.934227537 +0800 +++ linux-2.6_tip0317_perfkvm/tools/perf/util/event.h 2010-03-18 15:06:19.587050319 +0800 @@ -119,10 +119,17 @@ int event__synthesize_thread(pid_t pid, void event__synthesize_threads(event__handler_t process, struct perf_session *session); int event__synthesize_kernel_mmap(event__handler_t process, - struct perf_session *session, - const char *symbol_name); + struct perf_session *session, + const char *kallsyms_name, + const char *mmap_name, + struct map **maps, + const char *symbol_name, + unsigned int misc); + int event__synthesize_modules(event__handler_t process, - struct perf_session *session); + struct perf_session *session, + struct map_groups *kmaps, + unsigned int misc); int event__process_comm(event_t *self, struct perf_session *session); int event__process_lost(event_t *self, struct perf_session *session); diff -Nraup linux-2.6_tip0317/tools/perf/util/hist.c linux-2.6_tip0317_perfkvm/tools/perf/util/hist.c 
--- linux-2.6_tip0317/tools/perf/util/hist.c 2010-03-18 09:04:40.938289813 +0800 +++ linux-2.6_tip0317_perfkvm/tools/perf/util/hist.c 2010-03-18 15:06:19.587050319 +0800 @@ -8,6 +8,30 @@ struct callchain_param callchain_param = .min_percent = 0.5 }; +void __perf_session__add_count(struct hist_entry *he, + struct addr_location *al, + u64 count) +{ + he->count += count; + + switch (al->cpumode) { + case PERF_RECORD_MISC_KERNEL: + he->count_sys += count; + break; + case PERF_RECORD_MISC_USER: + he->count_us += count; + break; + case PERF_RECORD_MISC_GUEST_KERNEL: + he->count_guest_sys += count; + break; + case PERF_RECORD_MISC_GUEST_USER: + he->count_guest_us += count; + break; + default: + break; + } +} + /* * histogram, sorted on item, collects counts */ @@ -26,7 +50,6 @@ struct hist_entry *__perf_session__add_h .sym = al->sym, .ip = al->addr, .level = al->level, - .count = count, .parent = sym_parent, }; int cmp; @@ -48,6 +71,8 @@ struct hist_entry *__perf_session__add_h p = &(*p)->rb_right; } + __perf_session__add_count(&entry, al, count); + he = malloc(sizeof(*he)); if (!he) return NULL; @@ -462,7 +487,7 @@ size_t hist_entry__fprintf(struct hist_e u64 session_total) { struct sort_entry *se; - u64 count, total; + u64 count, total, count_sys, count_us, count_guest_sys, count_guest_us; const char *sep = symbol_conf.field_sep; size_t ret; @@ -472,15 +497,35 @@ size_t hist_entry__fprintf(struct hist_e if (pair_session) { count = self->pair ? self->pair->count : 0; total = pair_session->events_stats.total; + count_sys = self->pair ? self->pair->count_sys : 0; + count_us = self->pair ? self->pair->count_us : 0; + count_guest_sys = self->pair ? self->pair->count_guest_sys : 0; + count_guest_us = self->pair ? 
self->pair->count_guest_us : 0; } else { count = self->count; total = session_total; + count_sys = self->count_sys; + count_us = self->count_us; + count_guest_sys = self->count_guest_sys; + count_guest_us = self->count_guest_us; } - if (total) + if (total) { ret = percent_color_fprintf(fp, sep ? "%.2f" : " %6.2f%%", (count * 100.0) / total); - else + if (symbol_conf.show_cpu_utilization) { + ret += percent_color_fprintf(fp, sep ? "%.2f" : " %6.2f%%", + (count_sys * 100.0) / total); + ret += percent_color_fprintf(fp, sep ? "%.2f" : " %6.2f%%", + (count_us * 100.0) / total); + if (perf_guest) { + ret += percent_color_fprintf(fp, sep ? "%.2f" : " %6.2f%%", + (count_guest_sys * 100.0) / total); + ret += percent_color_fprintf(fp, sep ? "%.2f" : " %6.2f%%", + (count_guest_us * 100.0) / total); + } + } + } else ret = fprintf(fp, sep ? "%lld" : "%12lld ", count); if (symbol_conf.show_nr_samples) { @@ -576,6 +621,20 @@ size_t perf_session__fprintf_hists(struc fputs(" Samples ", fp); } + if (symbol_conf.show_cpu_utilization) { + if (sep) { + ret += fprintf(fp, "%csys", *sep); + ret += fprintf(fp, "%cus", *sep); + ret += fprintf(fp, "%cguest sys", *sep); + ret += fprintf(fp, "%cguest us", *sep); + } else { + ret += fprintf(fp, " sys "); + ret += fprintf(fp, " us "); + ret += fprintf(fp, " guest sys "); + ret += fprintf(fp, " guest us "); + } + } + if (pair) { if (sep) ret += fprintf(fp, "%cDelta", *sep); diff -Nraup linux-2.6_tip0317/tools/perf/util/hist.h linux-2.6_tip0317_perfkvm/tools/perf/util/hist.h --- linux-2.6_tip0317/tools/perf/util/hist.h 2010-03-18 09:04:40.938289813 +0800 +++ linux-2.6_tip0317_perfkvm/tools/perf/util/hist.h 2010-03-18 15:06:19.591054262 +0800 @@ -12,6 +12,9 @@ struct addr_location; struct symbol; struct rb_root; +void __perf_session__add_count(struct hist_entry *he, + struct addr_location *al, + u64 count); struct hist_entry *__perf_session__add_hist_entry(struct rb_root *hists, struct addr_location *al, struct symbol *parent, diff -Nraup 
linux-2.6_tip0317/tools/perf/util/session.c linux-2.6_tip0317_perfkvm/tools/perf/util/session.c --- linux-2.6_tip0317/tools/perf/util/session.c 2010-03-18 09:04:40.938289813 +0800 +++ linux-2.6_tip0317_perfkvm/tools/perf/util/session.c 2010-03-18 15:06:19.591054262 +0800 @@ -54,7 +54,12 @@ out_close: static inline int perf_session__create_kernel_maps(struct perf_session *self) { - return map_groups__create_kernel_maps(&self->kmaps, self->vmlinux_maps); + int ret; + ret = map_groups__create_kernel_maps(&self->kmaps, self->vmlinux_maps); + if (ret >= 0) + ret = map_groups__create_guest_kernel_maps(&self->guest_kmaps, + self->guest_vmlinux_maps); + return ret; } struct perf_session *perf_session__new(const char *filename, int mode, bool force) @@ -77,6 +82,7 @@ struct perf_session *perf_session__new(c self->cwdlen = 0; self->unknown_events = 0; map_groups__init(&self->kmaps); + map_groups__init(&self->guest_kmaps); if (mode == O_RDONLY) { if (perf_session__open(self, force) < 0) @@ -356,7 +362,8 @@ int perf_header__read_build_ids(struct p if (read(input, filename, len) != len) goto out; - if (bev.header.misc & PERF_RECORD_MISC_KERNEL) + if ((bev.header.misc & PERF_RECORD_MISC_CPUMODE_MASK) + == PERF_RECORD_MISC_KERNEL) head = &dsos__kernel; dso = __dsos__findnew(head, filename); @@ -519,26 +526,33 @@ bool perf_session__has_traces(struct per return true; } -int perf_session__set_kallsyms_ref_reloc_sym(struct perf_session *self, +int perf_session__set_kallsyms_ref_reloc_sym(struct map ** maps, const char *symbol_name, u64 addr) { char *bracket; enum map_type i; + struct ref_reloc_sym *ref; - self->ref_reloc_sym.name = strdup(symbol_name); - if (self->ref_reloc_sym.name == NULL) + ref = zalloc(sizeof(struct ref_reloc_sym)); + if (ref == NULL) return -ENOMEM; - bracket = strchr(self->ref_reloc_sym.name, ']'); + ref->name = strdup(symbol_name); + if (ref->name == NULL) { + free(ref); + return -ENOMEM; + } + + bracket = strchr(ref->name, ']'); if (bracket) *bracket = '\0'; 
- self->ref_reloc_sym.addr = addr; + ref->addr = addr; for (i = 0; i < MAP__NR_TYPES; ++i) { - struct kmap *kmap = map__kmap(self->vmlinux_maps[i]); - kmap->ref_reloc_sym = &self->ref_reloc_sym; + struct kmap *kmap = map__kmap(maps[i]); + kmap->ref_reloc_sym = ref; } return 0; diff -Nraup linux-2.6_tip0317/tools/perf/util/session.h linux-2.6_tip0317_perfkvm/tools/perf/util/session.h --- linux-2.6_tip0317/tools/perf/util/session.h 2010-03-18 09:04:40.926228328 +0800 +++ linux-2.6_tip0317_perfkvm/tools/perf/util/session.h 2010-03-18 15:06:19.591054262 +0800 @@ -16,16 +16,17 @@ struct perf_session { unsigned long size; unsigned long mmap_window; struct map_groups kmaps; + struct map_groups guest_kmaps; struct rb_root threads; struct thread *last_match; struct map *vmlinux_maps[MAP__NR_TYPES]; + struct map *guest_vmlinux_maps[MAP__NR_TYPES]; struct events_stats events_stats; struct rb_root stats_by_id; unsigned long event_total[PERF_RECORD_MAX]; unsigned long unknown_events; struct rb_root hists; u64 sample_type; - struct ref_reloc_sym ref_reloc_sym; int fd; int cwdlen; char *cwd; @@ -67,26 +68,12 @@ bool perf_session__has_traces(struct per int perf_header__read_build_ids(struct perf_header *self, int input, u64 offset, u64 file_size); -int perf_session__set_kallsyms_ref_reloc_sym(struct perf_session *self, +int perf_session__set_kallsyms_ref_reloc_sym(struct map ** maps, const char *symbol_name, u64 addr); void mem_bswap_64(void *src, int byte_size); -static inline int __perf_session__create_kernel_maps(struct perf_session *self, - struct dso *kernel) -{ - return __map_groups__create_kernel_maps(&self->kmaps, - self->vmlinux_maps, kernel); -} - -static inline struct map * - perf_session__new_module_map(struct perf_session *self, - u64 start, const char *filename) -{ - return map_groups__new_module(&self->kmaps, start, filename); -} - #ifdef NO_NEWT_SUPPORT static inline void perf_session__browse_hists(struct rb_root *hists __used, u64 session_total __used, diff -Nraup 
linux-2.6_tip0317/tools/perf/util/sort.h linux-2.6_tip0317_perfkvm/tools/perf/util/sort.h --- linux-2.6_tip0317/tools/perf/util/sort.h 2010-03-18 09:04:40.930227237 +0800 +++ linux-2.6_tip0317_perfkvm/tools/perf/util/sort.h 2010-03-18 15:06:19.591054262 +0800 @@ -44,6 +44,10 @@ extern enum sort_type sort__first_dimens struct hist_entry { struct rb_node rb_node; u64 count; + u64 count_sys; + u64 count_us; + u64 count_guest_sys; + u64 count_guest_us; struct thread *thread; struct map *map; struct symbol *sym; diff -Nraup linux-2.6_tip0317/tools/perf/util/symbol.c linux-2.6_tip0317_perfkvm/tools/perf/util/symbol.c --- linux-2.6_tip0317/tools/perf/util/symbol.c 2010-03-18 09:04:40.930227237 +0800 +++ linux-2.6_tip0317_perfkvm/tools/perf/util/symbol.c 2010-03-18 15:09:59.498404450 +0800 @@ -22,6 +22,8 @@ static void dsos__add(struct list_head * static struct map *map__new2(u64 start, struct dso *dso, enum map_type type); static int dso__load_kernel_sym(struct dso *self, struct map *map, symbol_filter_t filter); +static int dso__load_guest_kernel_sym(struct dso *self, struct map *map, + symbol_filter_t filter); static int vmlinux_path__nr_entries; static char **vmlinux_path; @@ -180,6 +182,7 @@ struct dso *dso__new(const char *name) self->loaded = 0; self->sorted_by_name = 0; self->has_build_id = 0; + self->kernel = DSO_TYPE_USER; } return self; @@ -396,12 +399,9 @@ int kallsyms__parse(const char *filename char *symbol_name; line_len = getline(&line, &n, file); - if (line_len < 0) + if (line_len < 0 || !line) break; - if (!line) - goto out_failure; - line[--line_len] = '\0'; /* \n */ len = hex2u64(line, &start); @@ -453,6 +453,7 @@ static int map__process_kallsym_symbol(v * map__split_kallsyms, when we have split the maps per module */ symbols__insert(root, sym); + return 0; } @@ -498,6 +499,15 @@ static int dso__split_kallsyms(struct ds *module++ = '\0'; if (strcmp(curr_map->dso->short_name, module)) { + if (curr_map != map && + self->kernel == DSO_TYPE_GUEST_KERNEL) { 
+ /* + * We assume all symbols of a module are continuous in + * kallsyms, so curr_map points to a module and all its + * symbols are in its kmap. Mark it as loaded. + */ + dso__set_loaded(curr_map->dso, curr_map->type); + } curr_map = map_groups__find_by_name(kmaps, map->type, module); if (curr_map == NULL) { pr_debug("/proc/{kallsyms,modules} " @@ -519,13 +529,19 @@ static int dso__split_kallsyms(struct ds char dso_name[PATH_MAX]; struct dso *dso; - snprintf(dso_name, sizeof(dso_name), "[kernel].%d", - kernel_range++); + if (self->kernel == DSO_TYPE_GUEST_KERNEL) + snprintf(dso_name, sizeof(dso_name), "[guest.kernel].%d", + kernel_range++); + else + snprintf(dso_name, sizeof(dso_name), "[kernel].%d", + kernel_range++); dso = dso__new(dso_name); if (dso == NULL) return -1; + dso->kernel = self->kernel; + curr_map = map__new2(pos->start, dso, map->type); if (curr_map == NULL) { dso__delete(dso); @@ -549,6 +565,10 @@ discard_symbol: rb_erase(&pos->rb_node, } } + if (curr_map != map && + self->kernel == DSO_TYPE_GUEST_KERNEL) + dso__set_loaded(curr_map->dso, curr_map->type); + return count; } @@ -559,7 +579,10 @@ int dso__load_kallsyms(struct dso *self, return -1; symbols__fixup_end(&self->symbols[map->type]); - self->origin = DSO__ORIG_KERNEL; + if (self->kernel == DSO_TYPE_GUEST_KERNEL) + self->origin = DSO__ORIG_GUEST_KERNEL; + else + self->origin = DSO__ORIG_KERNEL; return dso__split_kallsyms(self, map, filter); } @@ -946,7 +969,7 @@ static int dso__load_sym(struct dso *sel nr_syms = shdr.sh_size / shdr.sh_entsize; memset(&sym, 0, sizeof(sym)); - if (!self->kernel) { + if (self->kernel == DSO_TYPE_USER) { self->adjust_symbols = (ehdr.e_type == ET_EXEC || elf_section_by_name(elf, &ehdr, &shdr, ".gnu.prelink_undo", @@ -978,7 +1001,7 @@ static int dso__load_sym(struct dso *sel section_name = elf_sec__name(&shdr, secstrs); - if (self->kernel || kmodule) { + if (self->kernel != DSO_TYPE_USER || kmodule) { char dso_name[PATH_MAX]; if (strcmp(section_name, @@ -1005,6 
+1028,7 @@ static int dso__load_sym(struct dso *sel curr_dso = dso__new(dso_name); if (curr_dso == NULL) goto out_elf_end; + curr_dso->kernel = self->kernel; curr_map = map__new2(start, curr_dso, map->type); if (curr_map == NULL) { @@ -1015,7 +1039,10 @@ static int dso__load_sym(struct dso *sel curr_map->unmap_ip = identity__map_ip; curr_dso->origin = self->origin; map_groups__insert(kmap->kmaps, curr_map); - dsos__add(&dsos__kernel, curr_dso); + if (curr_dso->kernel == DSO_TYPE_GUEST_KERNEL) + dsos__add(&dsos__guest_kernel, curr_dso); + else + dsos__add(&dsos__kernel, curr_dso); dso__set_loaded(curr_dso, map->type); } else curr_dso = curr_map->dso; @@ -1236,6 +1263,8 @@ char dso__symtab_origin(const struct dso [DSO__ORIG_BUILDID] = 'b', [DSO__ORIG_DSO] = 'd', [DSO__ORIG_KMODULE] = 'K', + [DSO__ORIG_GUEST_KERNEL] = 'g', + [DSO__ORIG_GUEST_KMODULE] = 'G', }; if (self == NULL || self->origin == DSO__ORIG_NOT_FOUND) @@ -1254,8 +1283,10 @@ int dso__load(struct dso *self, struct m dso__set_loaded(self, map->type); - if (self->kernel) + if (self->kernel == DSO_TYPE_KERNEL) return dso__load_kernel_sym(self, map, filter); + else if (self->kernel == DSO_TYPE_GUEST_KERNEL) + return dso__load_guest_kernel_sym(self, map, filter); name = malloc(size); if (!name) @@ -1459,7 +1490,7 @@ static int map_groups__set_modules_path( static struct map *map__new2(u64 start, struct dso *dso, enum map_type type) { struct map *self = zalloc(sizeof(*self) + - (dso->kernel ? sizeof(struct kmap) : 0)); + (dso->kernel != DSO_TYPE_USER ? 
sizeof(struct kmap) : 0)); if (self != NULL) { /* * ->end will be filled after we load all the symbols @@ -1471,11 +1502,15 @@ static struct map *map__new2(u64 start, } struct map *map_groups__new_module(struct map_groups *self, u64 start, - const char *filename) + const char *filename, int guest) { struct map *map; - struct dso *dso = __dsos__findnew(&dsos__kernel, filename); + struct dso *dso; + if (!guest) + dso = __dsos__findnew(&dsos__kernel, filename); + else + dso = __dsos__findnew(&dsos__guest_kernel, filename); if (dso == NULL) return NULL; @@ -1483,16 +1518,20 @@ struct map *map_groups__new_module(struc if (map == NULL) return NULL; - dso->origin = DSO__ORIG_KMODULE; + if (guest) + dso->origin = DSO__ORIG_GUEST_KMODULE; + else + dso->origin = DSO__ORIG_KMODULE; map_groups__insert(self, map); return map; } -static int map_groups__create_modules(struct map_groups *self) +static int __map_groups__create_modules(struct map_groups *self, + const char * filename, int guest) { char *line = NULL; size_t n; - FILE *file = fopen("/proc/modules", "r"); + FILE *file = fopen(filename, "r"); struct map *map; if (file == NULL) @@ -1526,16 +1565,17 @@ static int map_groups__create_modules(st *sep = '\0'; snprintf(name, sizeof(name), "[%s]", line); - map = map_groups__new_module(self, start, name); + map = map_groups__new_module(self, start, name, guest); if (map == NULL) goto out_delete_line; - dso__kernel_module_get_build_id(map->dso); + if (!guest) + dso__kernel_module_get_build_id(map->dso); } free(line); fclose(file); - return map_groups__set_modules_path(self); + return 0; out_delete_line: free(line); @@ -1543,6 +1583,21 @@ out_failure: return -1; } +static int map_groups__create_modules(struct map_groups *self) +{ + int ret; + + ret = __map_groups__create_modules(self, "/proc/modules", 0); + if (ret >= 0) + ret = map_groups__set_modules_path(self); + return ret; +} + +static int map_groups__create_guest_modules(struct map_groups *self) +{ + return 
__map_groups__create_modules(self, symbol_conf.guest_modules, 1); +} + static int dso__load_vmlinux(struct dso *self, struct map *map, const char *vmlinux, symbol_filter_t filter) { @@ -1702,8 +1757,45 @@ out_fixup: return err; } +static int dso__load_guest_kernel_sym(struct dso *self, struct map *map, + symbol_filter_t filter) +{ + int err; + const char *kallsyms_filename = NULL; + + /* + * if the user specified a vmlinux filename, use it and only + * it, reporting errors to the user if it cannot be used. + * Or use file guest_kallsyms inputted by user on commandline + */ + if (symbol_conf.guest_vmlinux_name != NULL) { + err = dso__load_vmlinux(self, map, + symbol_conf.guest_vmlinux_name, filter); + goto out_try_fixup; + } + + kallsyms_filename = symbol_conf.guest_kallsyms; + if (!kallsyms_filename) + return -1; + err = dso__load_kallsyms(self, kallsyms_filename, map, filter); + if (err > 0) + pr_debug("Using %s for symbols\n", kallsyms_filename); + +out_try_fixup: + if (err > 0) { + if (kallsyms_filename != NULL) + dso__set_long_name(self, strdup("[guest.kernel.kallsyms]")); + map__fixup_start(map); + map__fixup_end(map); + } + + return err; +} + LIST_HEAD(dsos__user); LIST_HEAD(dsos__kernel); +LIST_HEAD(dsos__guest_user); +LIST_HEAD(dsos__guest_kernel); static void dsos__add(struct list_head *head, struct dso *dso) { @@ -1750,6 +1842,8 @@ void dsos__fprintf(FILE *fp) { __dsos__fprintf(&dsos__kernel, fp); __dsos__fprintf(&dsos__user, fp); + __dsos__fprintf(&dsos__guest_kernel, fp); + __dsos__fprintf(&dsos__guest_user, fp); } static size_t __dsos__fprintf_buildid(struct list_head *head, FILE *fp, @@ -1779,7 +1873,19 @@ struct dso *dso__new_kernel(const char * if (self != NULL) { dso__set_short_name(self, "[kernel]"); - self->kernel = 1; + self->kernel = DSO_TYPE_KERNEL; + } + + return self; +} + +struct dso *dso__new_guest_kernel(const char *name) +{ + struct dso *self = dso__new(name ?: "[guest.kernel.kallsyms]"); + + if (self != NULL) { + 
dso__set_short_name(self, "[guest.kernel]"); + self->kernel = DSO_TYPE_GUEST_KERNEL; } return self; @@ -1804,6 +1910,15 @@ static struct dso *dsos__create_kernel(c return kernel; } +static struct dso *dsos__create_guest_kernel(const char *vmlinux) +{ + struct dso *kernel = dso__new_guest_kernel(vmlinux); + + if (kernel != NULL) + dsos__add(&dsos__guest_kernel, kernel); + return kernel; +} + int __map_groups__create_kernel_maps(struct map_groups *self, struct map *vmlinux_maps[MAP__NR_TYPES], struct dso *kernel) @@ -1963,3 +2078,24 @@ int map_groups__create_kernel_maps(struc map_groups__fixup_end(self); return 0; } + +int map_groups__create_guest_kernel_maps(struct map_groups *self, + struct map *vmlinux_maps[MAP__NR_TYPES]) +{ + struct dso *kernel = dsos__create_guest_kernel(symbol_conf.guest_vmlinux_name); + + if (kernel == NULL) + return -1; + + if (__map_groups__create_kernel_maps(self, vmlinux_maps, kernel) < 0) + return -1; + + if (symbol_conf.use_modules && map_groups__create_guest_modules(self) < 0) + pr_debug("Problems creating module maps, continuing anyway...\n"); + /* + * Now that we have all the maps created, just set the ->end of them: + */ + map_groups__fixup_end(self); + return 0; +} + diff -Nraup linux-2.6_tip0317/tools/perf/util/symbol.h linux-2.6_tip0317_perfkvm/tools/perf/util/symbol.h --- linux-2.6_tip0317/tools/perf/util/symbol.h 2010-03-18 09:04:40.938289813 +0800 +++ linux-2.6_tip0317_perfkvm/tools/perf/util/symbol.h 2010-03-18 15:06:19.591054262 +0800 @@ -63,10 +63,14 @@ struct symbol_conf { show_nr_samples, use_callchain, exclude_other, - full_paths; + full_paths, + show_cpu_utilization; const char *vmlinux_name, *field_sep; - char *dso_list_str, + const char *guest_vmlinux_name, + *guest_kallsyms, + *guest_modules; + char *dso_list_str, *comm_list_str, *sym_list_str, *col_width_list_str; @@ -95,6 +99,13 @@ struct addr_location { u64 addr; char level; bool filtered; + unsigned int cpumode; +}; + +enum dso_kernel_type { + DSO_TYPE_USER = 0, 
+ DSO_TYPE_KERNEL, + DSO_TYPE_GUEST_KERNEL }; struct dso { @@ -104,7 +115,7 @@ struct dso { u8 adjust_symbols:1; u8 slen_calculated:1; u8 has_build_id:1; - u8 kernel:1; + enum dso_kernel_type kernel; u8 hit:1; u8 annotate_warned:1; unsigned char origin; @@ -120,6 +131,7 @@ struct dso { struct dso *dso__new(const char *name); struct dso *dso__new_kernel(const char *name); +struct dso *dso__new_guest_kernel(const char *name); void dso__delete(struct dso *self); bool dso__loaded(const struct dso *self, enum map_type type); @@ -132,7 +144,7 @@ static inline void dso__set_loaded(struc void dso__sort_by_name(struct dso *self, enum map_type type); -extern struct list_head dsos__user, dsos__kernel; +extern struct list_head dsos__user, dsos__kernel, dsos__guest_user, dsos__guest_kernel; struct dso *__dsos__findnew(struct list_head *head, const char *name); @@ -161,6 +173,8 @@ enum dso_origin { DSO__ORIG_BUILDID, DSO__ORIG_DSO, DSO__ORIG_KMODULE, + DSO__ORIG_GUEST_KERNEL, + DSO__ORIG_GUEST_KMODULE, DSO__ORIG_NOT_FOUND, }; diff -Nraup linux-2.6_tip0317/tools/perf/util/thread.h linux-2.6_tip0317_perfkvm/tools/perf/util/thread.h --- linux-2.6_tip0317/tools/perf/util/thread.h 2010-03-18 09:04:40.926228328 +0800 +++ linux-2.6_tip0317_perfkvm/tools/perf/util/thread.h 2010-03-18 15:06:19.591054262 +0800 @@ -82,6 +82,9 @@ int __map_groups__create_kernel_maps(str int map_groups__create_kernel_maps(struct map_groups *self, struct map *vmlinux_maps[MAP__NR_TYPES]); +int map_groups__create_guest_kernel_maps(struct map_groups *self, + struct map *vmlinux_maps[MAP__NR_TYPES]); + struct map *map_groups__new_module(struct map_groups *self, u64 start, - const char *filename); + const char *filename, int guest); #endif /* __PERF_THREAD_H */ ^ permalink raw reply [flat|nested] 390+ messages in thread
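A reader-level sketch of what the address-resolution hunks above boil down to: the patch classifies each sample by the cpumode bits in the event's misc field, gated by the --host/--guest flags (perf_host/perf_guest), and labels it 'k' (host kernel), '.' (host user), 'g' (guest kernel), 'u' (guest user, counted but not symbol-resolved) or 'H' (other). The MISC_* values below mirror the kernel's perf_event.h; the guest values come from the kernel-side part of this series, so treat the exact numbers as an assumption here.

```c
#include <assert.h>

/*
 * Constants as assumed from include/linux/perf_event.h after the
 * kernel-side cpumode extension of this series.
 */
#define PERF_RECORD_MISC_CPUMODE_MASK	(7 << 0)
#define PERF_RECORD_MISC_KERNEL		(1 << 0)
#define PERF_RECORD_MISC_USER		(2 << 0)
#define PERF_RECORD_MISC_GUEST_KERNEL	(4 << 0)
#define PERF_RECORD_MISC_GUEST_USER	(5 << 0)

/*
 * Simplified version of the dispatch in thread__find_addr_map():
 * classify a sample, honoring the --host/--guest filters.
 */
static char sample_level(unsigned short misc, int perf_host, int perf_guest)
{
	unsigned int cpumode = misc & PERF_RECORD_MISC_CPUMODE_MASK;

	if (cpumode == PERF_RECORD_MISC_KERNEL && perf_host)
		return 'k';
	if (cpumode == PERF_RECORD_MISC_USER && perf_host)
		return '.';
	if (cpumode == PERF_RECORD_MISC_GUEST_KERNEL && perf_guest)
		return 'g';
	if (cpumode == PERF_RECORD_MISC_GUEST_USER && perf_guest)
		return 'u';
	return 'H';	/* hypervisor/unknown, or filtered out */
}
```

In the real patch the 'k' and 'g' cases additionally select the host or guest kernel map group (session->kmaps vs. session->guest_kmaps) before symbol lookup, and filtered-out samples set al->filtered instead of just falling through.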
* Re: [PATCH] Enhance perf to collect KVM guest os statistics from host side
  2010-03-19  3:38 ` Zhang, Yanmin
@ 2010-03-19  8:21 ` Ingo Molnar
  2010-03-19 17:29   ` Joerg Roedel
  2010-03-22  7:24   ` Zhang, Yanmin
  0 siblings, 2 replies; 390+ messages in thread
From: Ingo Molnar @ 2010-03-19 8:21 UTC (permalink / raw)
To: Zhang, Yanmin
Cc: Peter Zijlstra, Avi Kivity, Sheng Yang, linux-kernel, kvm, Marcelo Tosatti, Joerg Roedel, Jes Sorensen, Gleb Natapov, Zachary Amsden, zhiteng.huang, Frédéric Weisbecker, Arnaldo Carvalho de Melo

Nice progress!

This bit:

> 1) perf kvm top
> [root@lkp-ne01 norm]# perf kvm --host --guest --guestkallsyms=/home/ymzhang/guest/kallsyms
> --guestmodules=/home/ymzhang/guest/modules top

will really be painful for developers - to enter that long line while we have these things called 'computers' that ought to reduce human work.

Also, it's incomplete: we need access to the guest system's binaries to do ELF symbol resolution and dwarf decoding. So we really need some good, automatic way to get at the guest symbol space, so that if a developer types:

  perf kvm top

then the obvious thing happens by default (which is to show the guest overhead).

There's no technical barrier on the perf tooling side to implementing all that: perf supports build-ids extensively and can deal with multiple symbol spaces - as long as it has access to them. The guest kernel could be identified based on its /sys/kernel/notes and /sys/module/*/notes/.note.gnu.build-id build-ids.

So some sort of --guestmount option would be the natural solution, which points to the guest system's root, plus a Qemu enumeration of guest mounts (which would be off by default and configurable) from which perf can pick up the target guest all automatically.
(Obviously only under allowed permissions, so that such access is secure.)

This would allow not just kallsyms access via $guest/proc/kallsyms but also give us the full space of symbol features: access to the guest binaries for annotation and general symbol resolution, command/binary name identification, etc.

Such a mount would obviously not broaden existing privileges - and as an additional control a guest would also have a way to indicate that it does not wish a guest mount at all.

Unfortunately, in a previous thread the Qemu maintainer has indicated that he will essentially NAK any attempt to enhance Qemu to provide an easily discoverable, self-contained, transparent guest mount on the host side.

No technical justification was given for that NAK, despite my repeated requests to articulate the exact security problems that such an approach would cause.

If that NAK does not stand in that form then I'd like to know about it - it makes no sense for us to try to code up a solution against a standing maintainer NAK ...

The other option is some sysadmin-level hackery to NFS-mount the guest or so. This is a vastly inferior method that brings us back to the abysmal usability levels of OProfile:

 1) it won't be guest-transparent
 2) it has to be re-done for every guest image
 3) even if packaged, it has to be gotten into every. single. Linux. distro. separately.
 4) old Linux guests won't work out of the box

In other words: it's very inconvenient on multiple levels and won't ever happen on any reasonable enough scale to make a difference to Linux. Which is an unfortunate situation - and the ball is on the KVM/Qemu side, so I can do little about it.

Thanks,

	Ingo
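The build-id identification mentioned above (reading /sys/kernel/notes or a module's .note.gnu.build-id) boils down to a plain ELF-note walk. A minimal sketch, assuming the standard ELF note layout (three 4-byte header words, 4-byte-aligned name and descriptor, NT_GNU_BUILD_ID = 3, name "GNU"); the function name and buffer-based interface are invented for illustration:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define NT_GNU_BUILD_ID	3
#define ALIGN4(x)	(((x) + 3u) & ~3u)

/*
 * Walk a buffer of ELF notes (e.g. the contents of /sys/kernel/notes)
 * and copy the GNU build-id into id[].  Returns the build-id length,
 * or 0 if no build-id note was found.  Native-endian note headers are
 * assumed, which is what the sysfs file contains.
 */
static size_t find_build_id(const unsigned char *buf, size_t len,
			    unsigned char *id, size_t id_max)
{
	size_t off = 0;

	while (off + 12 <= len) {
		uint32_t namesz, descsz, type;
		size_t name_off, desc_off;

		memcpy(&namesz, buf + off, 4);
		memcpy(&descsz, buf + off + 4, 4);
		memcpy(&type, buf + off + 8, 4);
		name_off = off + 12;
		desc_off = name_off + ALIGN4(namesz);
		if (desc_off + descsz > len)
			break;	/* truncated or corrupt note */
		if (type == NT_GNU_BUILD_ID && namesz == 4 &&
		    memcmp(buf + name_off, "GNU", 4) == 0 &&
		    descsz <= id_max) {
			memcpy(id, buf + desc_off, descsz);
			return descsz;
		}
		off = desc_off + ALIGN4(descsz);
	}
	return 0;
}
```

With something like this on the $guest mount, perf could match the guest kernel's build-id against its cache exactly as it already does for host DSOs.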
* Re: [PATCH] Enhance perf to collect KVM guest os statistics from host side 2010-03-19 8:21 ` Ingo Molnar @ 2010-03-19 17:29 ` Joerg Roedel 2010-03-21 18:43 ` Ingo Molnar 2010-03-22 7:24 ` Zhang, Yanmin 1 sibling, 1 reply; 390+ messages in thread From: Joerg Roedel @ 2010-03-19 17:29 UTC (permalink / raw) To: Ingo Molnar Cc: Zhang, Yanmin, Peter Zijlstra, Avi Kivity, Sheng Yang, linux-kernel, kvm, Marcelo Tosatti, Jes Sorensen, Gleb Natapov, Zachary Amsden, zhiteng.huang, Frédéric Weisbecker, Arnaldo Carvalho de Melo On Fri, Mar 19, 2010 at 09:21:22AM +0100, Ingo Molnar wrote: > Unfortunately, in a previous thread the Qemu maintainer has indicated that he > will essentially NAK any attempt to enhance Qemu to provide an easily > discoverable, self-contained, transparent guest mount on the host side. > > No technical justification was given for that NAK, despite my repeated > requests to articulate the exact security problems that such an approach > would cause. > > If that NAK does not stand in that form then i'd like to know about it - it > makes no sense for us to try to code up a solution against a standing > maintainer NAK ... I still think it is the best and most generic way to let the guest do the symbol resolution. This has several advantages: 1. The guest knows best about its symbol space. So this would be extensible to other guest operating systems. A brave developer may even implement symbol passing for Windows or the BSDs ;-) 2. The guest can decide on its own if it wants to pass this information to the host-perf. No security issues at all. 3. The guest can also pass us the call-chain and we don't need to care about the complications of fetching it from the guest ourselves. 4. This way is extensible to nested virtualization too. How we speak to the guest was already discussed in this thread. My personal opinion is that going through qemu is an unnecessary step and we can solve that in a more clever and transparent way for perf. 
Joerg ^ permalink raw reply [flat|nested] 390+ messages in thread
* Re: [PATCH] Enhance perf to collect KVM guest os statistics from host side 2010-03-19 17:29 ` Joerg Roedel @ 2010-03-21 18:43 ` Ingo Molnar 2010-03-22 10:14 ` Joerg Roedel 0 siblings, 1 reply; 390+ messages in thread From: Ingo Molnar @ 2010-03-21 18:43 UTC (permalink / raw) To: Joerg Roedel Cc: Zhang, Yanmin, Peter Zijlstra, Avi Kivity, Sheng Yang, linux-kernel, kvm, Marcelo Tosatti, Jes Sorensen, Gleb Natapov, Zachary Amsden, zhiteng.huang, Frédéric Weisbecker, Arnaldo Carvalho de Melo * Joerg Roedel <joro@8bytes.org> wrote: > On Fri, Mar 19, 2010 at 09:21:22AM +0100, Ingo Molnar wrote: > > Unfortunately, in a previous thread the Qemu maintainer has indicated that he > > will essentially NAK any attempt to enhance Qemu to provide an easily > > discoverable, self-contained, transparent guest mount on the host side. > > > > No technical justification was given for that NAK, despite my repeated > > requests to articulate the exact security problems that such an approach > > would cause. > > > > If that NAK does not stand in that form then i'd like to know about it - it > > makes no sense for us to try to code up a solution against a standing > > maintainer NAK ... > > I still think it is the best and most generic way to let the guest do the > symbol resolution. [...] Not really. > [...] This has several advantages: > > 1. The guest knows best about its symbol space. So this would be > extensible to other guest operating systems. A brave > developer may even implement symbol passing for Windows or > the BSDs ;-) Having access to the actual executable files that include the symbols achieves precisely that - with the additional robustness that all this functionality is concentrated into the host, while the guest side is kept minimal (and transparent). > 2. The guest can decide on its own if it wants to pass this > information to the host-perf. No security issues at all. It can decide whether it exposes the files. Nor are there any "security issues" to begin with. > 3. 
The guest can also pass us the call-chain and we don't need > to care about the complications of fetching it from the guest > ourselves. You need to be aware of the fact that symbol resolution is a separate step from call chain generation. I.e. call-chains are an (entirely) separate issue, and could reasonably be done in the guest or in the host. It has no bearing on this symbol resolution question. > 4. This way is extensible to nested virtualization too. Nested virtualization is actually already taken care of by the filesystem solution via an existing method called 'subdirectories'. If the guest offers sub-guests then those symbols will be exposed in a similar way via its own 'guest files' directory hierarchy. I.e. if we have 'Guest-2' nested inside the 'Guest-Fedora-1' instance, we get: /guests/ /guests/Guest-Fedora-1/etc/ /guests/Guest-Fedora-1/usr/ we'd also have: /guests/Guest-Fedora-1/guests/Guest-2/ So this is taken care of automatically. I.e. none of the four 'advantages' listed here are actually advantages over my proposed solution, so your conclusion is subsequently flawed as well. > How we speak to the guest was already discussed in this thread. My personal > opinion is that going through qemu is an unnecessary step and we can solve > that in a more clever and transparent way for perf. Meaning exactly what? Thanks, Ingo ^ permalink raw reply [flat|nested] 390+ messages in thread
* Re: [PATCH] Enhance perf to collect KVM guest os statistics from host side 2010-03-21 18:43 ` Ingo Molnar @ 2010-03-22 10:14 ` Joerg Roedel 2010-03-22 10:37 ` Ingo Molnar 2010-03-22 10:59 ` Ingo Molnar 0 siblings, 2 replies; 390+ messages in thread From: Joerg Roedel @ 2010-03-22 10:14 UTC (permalink / raw) To: Ingo Molnar Cc: Zhang, Yanmin, Peter Zijlstra, Avi Kivity, Sheng Yang, linux-kernel, kvm, Marcelo Tosatti, Jes Sorensen, Gleb Natapov, Zachary Amsden, zhiteng.huang, Frédéric Weisbecker, Arnaldo Carvalho de Melo On Sun, Mar 21, 2010 at 07:43:00PM +0100, Ingo Molnar wrote: > Having access to the actual executable files that include the symbols achieves > precisely that - with the additional robustness that all this functionality is > concentrated into the host, while the guest side is kept minimal (and > transparent). If you want to access the guest's file-system you need a piece of software running in the guest which gives you this access. But when you get an event this piece of software may not be runnable (if the guest is in an interrupt handler or any other non-preemptible code path). When the host finally gets access to the guest's filesystem again the source of that event may already be gone (process has exited, module unloaded...). The only way to solve that is to pass the event information to the guest immediately and let it collect the information we want. > It can decide whether it exposes the files. Nor are there any "security > issues" to begin with. I am not talking about security. Security was sufficiently flamed about already. > You need to be aware of the fact that symbol resolution is a separate step > from call chain generation. Same concern as above applies to call-chain generation too. > > How we speak to the guest was already discussed in this thread. My personal > > opinion is that going through qemu is an unnecessary step and we can solve > > that in a more clever and transparent way for perf. > > Meaning exactly what? 
Avi was against that but I think it would make sense to give names to virtual machines (with a default, similar to network interface names). Then we can create a directory in /dev/ with that name (e.g. /dev/vm/fedora/). Inside the guest a (privileged) process can create some kind of named virt-pipe which results in a device file created in the guest's directory (perf could create /dev/vm/fedora/perf for example). This file is used for guest-host communication. Thanks, Joerg ^ permalink raw reply [flat|nested] 390+ messages in thread
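A minimal sketch of the channel being proposed here, using a FIFO in a temporary directory to stand in for the hypothetical /dev/vm/fedora/perf device. The directory layout, the record format, and the virt-pipe mechanism itself are all assumptions drawn from this proposal, not an existing interface:

```shell
# Stand-in for the per-guest directory the host would create (e.g. /dev/vm/fedora/)
dir=$(mktemp -d)
mkfifo "$dir/perf"          # the guest-created "virt-pipe" endpoint

# Guest side (simulated here): push one sample record with the context perf needs
printf 'pid=1042 comm=python rip=0xffffffff8100b000\n' > "$dir/perf" &

# Host side: a perf tool would read such records and merge them into its event stream
read -r record < "$dir/perf"
echo "$record"

rm -r "$dir"
```

The point of the sketch is only the plumbing: the guest decides what to write, and the host never needs raw access to the guest's filesystem.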
* Re: [PATCH] Enhance perf to collect KVM guest os statistics from host side 2010-03-22 10:14 ` Joerg Roedel @ 2010-03-22 10:37 ` Ingo Molnar 2010-03-22 10:59 ` Ingo Molnar 1 sibling, 0 replies; 390+ messages in thread From: Ingo Molnar @ 2010-03-22 10:37 UTC (permalink / raw) To: Joerg Roedel Cc: Zhang, Yanmin, Peter Zijlstra, Avi Kivity, Sheng Yang, linux-kernel, kvm, Marcelo Tosatti, Jes Sorensen, Gleb Natapov, Zachary Amsden, zhiteng.huang, Frédéric Weisbecker, Arnaldo Carvalho de Melo * Joerg Roedel <joro@8bytes.org> wrote: > > It can decide whether it exposes the files. Nor are there any "security > > issues" to begin with. > > I am not talking about security. [...] You were talking about security, in the portion of your mail that you snipped out, and which i replied to: > > 2. The guest can decide on its own if it wants to pass this > > information to the host-perf. No security issues at all. I understood that portion to mean what it says: that you claim your proposal 'has no security issues at all', in contrast to my suggestion. > [...] Security was sufficiently flamed about already. All i saw was my suggestion to allow a guest to securely (and scalably and conveniently) integrate/mount its filesystems to the host if both sides (both the host and the guest) permit it, to make it easier for instrumentation to pick up symbol details. I.e. if a guest runs then its filesystem may be present on the host side as: /guests/Fedora-G1/ /guests/Fedora-G1/proc/ /guests/Fedora-G1/usr/ /guests/Fedora-G1/.../ ( This feature would be configurable and would be default-off, to maintain the current status quo. ) i.e. it's a bit like sshfs or NFS or loopback block mounts, just in an integrated and working fashion (sshfs doesn't work well with /proc for example) and more guest transparent (obviously sshfs or NFS exports need per guest configuration), and lower overhead than sshfs/NFS - i.e. without the (unnecessary) networking overhead. 
That suggestion was 'countered' by an unsubstantiated claim by Anthony that this kind of usability feature would somehow be a 'security nightmare'. In reality it is just an incremental, more usable, faster and more guest-transparent form of what is already possible today via: - loopback mounts on host - NFS exports - SMB exports - sshfs - (and other mechanisms) I wish there was at least flaming about it - as flames tend to have at least some specifics in them. What i saw instead was a claim about a 'security nightmare', which, when i asked for specifics, was followed by deafening silence. And you appear to have repeated that claim here, unwilling to back it up with specifics. Thanks, Ingo ^ permalink raw reply [flat|nested] 390+ messages in thread
* Re: [PATCH] Enhance perf to collect KVM guest os statistics from host side 2010-03-22 10:14 ` Joerg Roedel 2010-03-22 10:37 ` Ingo Molnar @ 2010-03-22 10:59 ` Ingo Molnar 2010-03-22 11:47 ` Joerg Roedel 1 sibling, 1 reply; 390+ messages in thread From: Ingo Molnar @ 2010-03-22 10:59 UTC (permalink / raw) To: Joerg Roedel Cc: Zhang, Yanmin, Peter Zijlstra, Avi Kivity, Sheng Yang, linux-kernel, kvm, Marcelo Tosatti, Jes Sorensen, Gleb Natapov, Zachary Amsden, zhiteng.huang, Frédéric Weisbecker, Arnaldo Carvalho de Melo * Joerg Roedel <joro@8bytes.org> wrote: > On Sun, Mar 21, 2010 at 07:43:00PM +0100, Ingo Molnar wrote: > > Having access to the actual executable files that include the symbols achieves > > precisely that - with the additional robustness that all this functionality is > > concentrated into the host, while the guest side is kept minimal (and > > transparent). > > If you want to access the guest's file-system you need a piece of software > running in the guest which gives you this access. But when you get an event > this piece of software may not be runnable (if the guest is in an interrupt > handler or any other non-preemptible code path). When the host finally gets > access to the guest's filesystem again the source of that event may already > be gone (process has exited, module unloaded...). The only way to solve that > is to pass the event information to the guest immediately and let it collect > the information we want. The very same is true of profiling in the host space as well (KVM is nothing special here, other than its unreasonable insistence on not enumerating readily available information in a more usable way). So are you suggesting a solution to a perf problem we already solved differently? 
(and which i argue we solved in a better way) We have solved that in the host space already (and quite elaborately so), and not via your suggestion of moving symbol resolution to a different stage, but by properly generating the right events to allow the post-processing stage to see processes that have already exited, to robustly handle files that have been rebuilt, etc. From an instrumentation POV it is fundamentally better to acquire the right data and delay any complexities to the analysis stage (the perf model) than to complicate sampling (the oprofile dcookies model). Your proposal of 'doing the symbol resolution in the guest context' is in essence re-arguing a very similar point that oprofile lost. Did you really intend to re-argue that point as well? If yes then please propose an alternative implementation for everything that perf does wrt. symbol lookups. What we propose for 'perf kvm' right now is simply a straight-forward extension of the existing (and well working) symbol handling code to virtualization. > > You need to be aware of the fact that symbol resolution is a separate step > > from call chain generation. > > Same concern as above applies to call-chain generation too. Best would be if you demonstrated any problems of the perf symbol lookup code you are aware of on the host side, as it has that exact design you are criticising here. We are eager to fix any bugs in it. If you claim that it's buggy then that should very much be demonstrable - no need to go into theoretical arguments about it. ( You should be aware of the fact that perf currently works with 'processes exiting prematurely' and similar scenarios just fine, so if you want to demonstrate that it's broken you will probably need a different example. ) > > > How we speak to the guest was already discussed in this thread. My > > > personal opinion is that going through qemu is an unnecessary step and > > > we can solve that in a more clever and transparent way for perf. 
> > > > Meaning exactly what? > > Avi was against that but I think it would make sense to give names to > virtual machines (with a default, similar to network interface names). Then > we can create a directory in /dev/ with that name (e.g. /dev/vm/fedora/). > Inside the guest a (privileged) process can create some kind of named > virt-pipe which results in a device file created in the guest's directory > (perf could create /dev/vm/fedora/perf for example). This file is used for > guest-host communication. That is kind of half of my suggestion - the built-in enumeration of guests and a guaranteed channel to them accessible to tools. (KVM already has its own special channel so it's not like channels of communication are useless.) The other half of my suggestion is that if we bring this thought to its logical conclusion then we might as well walk the whole mile and not use quirky, binary API single-channel pipes. I.e. we could use this convenient, human-readable, structured, hierarchical abstraction to expose information in a fine-grained, scalable way, which has a world-class implementation in Linux: the 'VFS namespace'. Thanks, Ingo ^ permalink raw reply [flat|nested] 390+ messages in thread
* Re: [PATCH] Enhance perf to collect KVM guest os statistics from host side 2010-03-22 10:59 ` Ingo Molnar @ 2010-03-22 11:47 ` Joerg Roedel 2010-03-22 12:26 ` Ingo Molnar 2010-03-23 13:18 ` Soeren Sandmann 0 siblings, 2 replies; 390+ messages in thread From: Joerg Roedel @ 2010-03-22 11:47 UTC (permalink / raw) To: Ingo Molnar Cc: Zhang, Yanmin, Peter Zijlstra, Avi Kivity, Sheng Yang, linux-kernel, kvm, Marcelo Tosatti, Jes Sorensen, Gleb Natapov, Zachary Amsden, zhiteng.huang, Frédéric Weisbecker, Arnaldo Carvalho de Melo On Mon, Mar 22, 2010 at 11:59:27AM +0100, Ingo Molnar wrote: > Best would be if you demonstrated any problems of the perf symbol lookup code > you are aware of on the host side, as it has that exact design you are > criticising here. We are eager to fix any bugs in it. > > If you claim that it's buggy then that should very much be demonstrable - no > need to go into theoretical arguments about it. I am not claiming anything. I just try to imagine what your proposal will look like in practice and forgot that symbol resolution is done at a later point. But even with deferred symbol resolution we need more information from the guest than just the rip falling out of KVM. The guest needs to tell us about the process where the event happened (information that the host has about itself without any hassle) and which executable files it was loaded from. > > Avi was against that but I think it would make sense to give names to > > virtual machines (with a default, similar to network interface names). Then > > we can create a directory in /dev/ with that name (e.g. /dev/vm/fedora/). > > Inside the guest a (privileged) process can create some kind of named > > virt-pipe which results in a device file created in the guest's directory > > (perf could create /dev/vm/fedora/perf for example). This file is used for > > guest-host communication. 
> > That is kind of half of my suggestion - the built-in enumeration of guests and a > guaranteed channel to them accessible to tools. (KVM already has its own > special channel so it's not like channels of communication are useless.) > > The other half of my suggestion is that if we bring this thought to its > logical conclusion then we might as well walk the whole mile and not use > quirky, binary API single-channel pipes. I.e. we could use this convenient, > human-readable, structured, hierarchical abstraction to expose information in > a fine-grained, scalable way, which has a world-class implementation in Linux: > the 'VFS namespace'. Probably. At least it is the solution that fits best into the current design of perf. But we should think about how this will be done. Raw disk access is no solution because we need to access virtual file-systems of the guest too. Network filesystems may be a solution but then we come back to the 'deployment-nightmare'. Joerg ^ permalink raw reply [flat|nested] 390+ messages in thread
* Re: [PATCH] Enhance perf to collect KVM guest os statistics from host side 2010-03-22 11:47 ` Joerg Roedel @ 2010-03-22 12:26 ` Ingo Molnar 2010-03-23 13:18 ` Soeren Sandmann 1 sibling, 0 replies; 390+ messages in thread From: Ingo Molnar @ 2010-03-22 12:26 UTC (permalink / raw) To: Joerg Roedel Cc: Zhang, Yanmin, Peter Zijlstra, Avi Kivity, Sheng Yang, linux-kernel, kvm, Marcelo Tosatti, Jes Sorensen, Gleb Natapov, Zachary Amsden, zhiteng.huang, Frédéric Weisbecker, Arnaldo Carvalho de Melo * Joerg Roedel <joro@8bytes.org> wrote: > On Mon, Mar 22, 2010 at 11:59:27AM +0100, Ingo Molnar wrote: > > Best would be if you demonstrated any problems of the perf symbol lookup code > > you are aware of on the host side, as it has that exact design you are > > criticising here. We are eager to fix any bugs in it. > > > > If you claim that it's buggy then that should very much be demonstrable - no > > need to go into theoretical arguments about it. > > I am not claiming anything. I just try to imagine what your proposal will > look like in practice and forgot that symbol resolution is done at a later > point. > > But even with deferred symbol resolution we need more information from the > guest than just the rip falling out of KVM. The guest needs to tell us about > the process where the event happened (information that the host has about > itself without any hassle) and which executable files it was loaded from. Correct - for full information we need a good paravirt perf integration of the kernel bits to pass that through. (I.e. we want to 'integrate' the PID space as well, at least within the perf notion of PIDs.) Initially we can do without that as well. > Probably. At least it is the solution that fits best into the current design > of perf. But we should think about how this will be done. Raw disk access is > no solution because we need to access virtual file-systems of the guest too. > [...] I never said anything about 'raw disk access'. 
Have you seen my proposal of (optional) VFS namespace integration? (It can be found repeated for the Nth time in the mail you replied to) Thanks, Ingo ^ permalink raw reply [flat|nested] 390+ messages in thread
* Re: [PATCH] Enhance perf to collect KVM guest os statistics from host side 2010-03-22 11:47 ` Joerg Roedel 2010-03-22 12:26 ` Ingo Molnar @ 2010-03-23 13:18 ` Soeren Sandmann 2010-03-23 13:49 ` Andi Kleen 1 sibling, 1 reply; 390+ messages in thread From: Soeren Sandmann @ 2010-03-23 13:18 UTC (permalink / raw) To: Joerg Roedel Cc: Ingo Molnar, Zhang, Yanmin, Peter Zijlstra, Avi Kivity, Sheng Yang, linux-kernel, kvm, Marcelo Tosatti, Jes Sorensen, Gleb Natapov, Zachary Amsden, zhiteng.huang, Frédéric Weisbecker, Arnaldo Carvalho de Melo Joerg Roedel <joro@8bytes.org> writes: > On Mon, Mar 22, 2010 at 11:59:27AM +0100, Ingo Molnar wrote: > > Best would be if you demonstrated any problems of the perf symbol lookup code > > you are aware of on the host side, as it has that exact design you are > > criticising here. We are eager to fix any bugs in it. > > > > If you claim that it's buggy then that should very much be demonstrable - no > > need to go into theoretical arguments about it. > > I am not claiming anything. I just try to imagine what your proposal > will look like in practice and forgot that symbol resolution is done at > a later point. > But even with deferred symbol resolution we need more information from > the guest than just the rip falling out of KVM. The guest needs to tell > us about the process where the event happened (information that the host > has about itself without any hassle) and which executable files it was > loaded from. Slightly tangential, but there is another case that has some of the same problems: profiling other language runtimes than C and C++, say Python. At the moment profilers will generally tell you what is going on inside the Python runtime, but not what the Python program itself is doing. To fix that problem, it seems like we need some way to have Python export what is going on. Maybe the same mechanism could be used to both access what is going on in qemu and Python. 
Soren ^ permalink raw reply [flat|nested] 390+ messages in thread
* Re: [PATCH] Enhance perf to collect KVM guest os statistics from host side 2010-03-23 13:18 ` Soeren Sandmann @ 2010-03-23 13:49 ` Andi Kleen 2010-03-23 14:04 ` Soeren Sandmann 2010-03-23 14:10 ` Arnaldo Carvalho de Melo 0 siblings, 2 replies; 390+ messages in thread From: Andi Kleen @ 2010-03-23 13:49 UTC (permalink / raw) To: Soeren Sandmann Cc: Joerg Roedel, Ingo Molnar, Zhang, Yanmin, Peter Zijlstra, Avi Kivity, Sheng Yang, linux-kernel, kvm, Marcelo Tosatti, Jes Sorensen, Gleb Natapov, Zachary Amsden, zhiteng.huang, Frédéric Weisbecker, Arnaldo Carvalho de Melo Soeren Sandmann <sandmann@daimi.au.dk> writes: > > To fix that problem, it seems like we need some way to have Python > export what is going on. Maybe the same mechanism could be used to > both access what is going on in qemu and Python. oprofile already has an interface to let JITs export information about the JITed code. CPython is not a JIT, but presumably one of the Python JITs could do it. http://oprofile.sourceforge.net/doc/devel/index.html I know it's not en vogue anymore and you won't be an approved cool kid if you do, but you could just use oprofile? OK, presumably one would need to do a Python interface for this first. I believe it's currently only implemented for Java and Mono. I presume it might work today with IronPython on Mono. IMHO it doesn't make sense to invent another interface for this, although I'm sure someone will propose just that. -Andi -- ak@linux.intel.com -- Speaking for myself only. ^ permalink raw reply [flat|nested] 390+ messages in thread
* Re: [PATCH] Enhance perf to collect KVM guest os statistics from host side 2010-03-23 13:49 ` Andi Kleen @ 2010-03-23 14:04 ` Soeren Sandmann 2010-03-23 14:20 ` Andi Kleen 2010-03-23 14:46 ` Frank Ch. Eigler 1 sibling, 2 replies; 390+ messages in thread From: Soeren Sandmann @ 2010-03-23 14:04 UTC (permalink / raw) To: Andi Kleen Cc: Joerg Roedel, Ingo Molnar, Zhang, Yanmin, Peter Zijlstra, Avi Kivity, Sheng Yang, linux-kernel, kvm, Marcelo Tosatti, Jes Sorensen, Gleb Natapov, Zachary Amsden, zhiteng.huang, Frédéric Weisbecker, Arnaldo Carvalho de Melo Andi Kleen <andi@firstfloor.org> writes: > Soeren Sandmann <sandmann@daimi.au.dk> writes: > > > > To fix that problem, it seems like we need some way to have Python > > export what is going on. Maybe the same mechanism could be used to > > both access what is going on in qemu and Python. > > oprofile already has an interface to let JITs export > information about the JITed code. CPython is not a JIT, > but presumably one of the Python JITs could do it. > > http://oprofile.sourceforge.net/doc/devel/index.html It's not that I personally want to profile a particular Python program. I'm interested in the more general problem of extracting more information from profiled user space programs than just stack traces. Examples: - What is going on inside QEMU? - Which client is the X server servicing? - What parts of a python/shell/scheme/javascript program are taking the most CPU time? I don't think the oprofile JIT interface solves any of these problems. (In fact, I don't see why the JIT problem is even hard. The JIT compiler can just generate a little ELF file with symbols in it, and the profiler can pick it up through the mmap events that you get through the perf interface). > I know it's not en vogue anymore and you won't be an approved > cool kid if you do, but you could just use oprofile? I am bringing this up because I want to extend sysprof to be more useful. 
Soren ^ permalink raw reply [flat|nested] 390+ messages in thread
* Re: [PATCH] Enhance perf to collect KVM guest os statistics from host side 2010-03-23 14:04 ` Soeren Sandmann @ 2010-03-23 14:20 ` Andi Kleen 2010-03-23 14:29 ` Arnaldo Carvalho de Melo 1 sibling, 1 reply; 390+ messages in thread From: Andi Kleen @ 2010-03-23 14:20 UTC (permalink / raw) To: Soeren Sandmann Cc: Joerg Roedel, Ingo Molnar, Zhang, Yanmin, Peter Zijlstra, Avi Kivity, Sheng Yang, linux-kernel, kvm, Marcelo Tosatti, Jes Sorensen, Gleb Natapov, Zachary Amsden, zhiteng.huang, Frédéric Weisbecker, Arnaldo Carvalho de Melo Soeren Sandmann <sandmann@daimi.au.dk> writes: > > Examples: > > - What is going on inside QEMU? That's something the JIT interface could answer. > - Which client is the X server servicing? > > - What parts of a python/shell/scheme/javascript program are > taking the most CPU time? I suspect for those you rather need event based tracers of some sort, similar to kernel trace points. Otherwise you would need your own separate stacks and other complications. systemtap has some effort to use the dtrace instrumentation that crops up in more and more user programs for this. It wouldn't surprise me if that was already in Python and other programs you're interested in. I presume right now it only works if you apply the utrace monstrosity though, but perhaps the new uprobes patches floating around will come to the rescue. There also was some effort to have a pure user space daemon based approach for LTT, but I believe that currently needs its own trace points. Again I fully expect someone to reinvent the wheel here and afterwards complain about "community inefficiencies" :-) > I don't think the oprofile JIT interface solves any of these > problems. (In fact, I don't see why the JIT problem is even hard. The > JIT compiler can just generate a little ELF file with symbols in it, > and the profiler can pick it up through the mmap events that you get > through the perf interface). 
That would require keeping those temporary ELF files around for a potentially unlimited time (profilers today look at the ELF files at the final analysis phase, which might be weeks away). Also, that would be a lot of overhead for the JIT and most likely a larger-scale rewrite for a given JIT code base. -Andi -- ak@linux.intel.com -- Speaking for myself only. ^ permalink raw reply [flat|nested] 390+ messages in thread
* Re: [PATCH] Enhance perf to collect KVM guest os statistics from host side 2010-03-23 14:20 ` Andi Kleen @ 2010-03-23 14:29 ` Arnaldo Carvalho de Melo 0 siblings, 0 replies; 390+ messages in thread From: Arnaldo Carvalho de Melo @ 2010-03-23 14:29 UTC (permalink / raw) To: Andi Kleen Cc: Soeren Sandmann, Joerg Roedel, Ingo Molnar, Zhang, Yanmin, Peter Zijlstra, Avi Kivity, Sheng Yang, linux-kernel, kvm, Marcelo Tosatti, Jes Sorensen, Gleb Natapov, Zachary Amsden, zhiteng.huang, Frédéric Weisbecker Em Tue, Mar 23, 2010 at 03:20:11PM +0100, Andi Kleen escreveu: > Soeren Sandmann <sandmann@daimi.au.dk> writes: > > I don't think the oprofile JIT interface solves any of these > > problems. (In fact, I don't see why the JIT problem is even hard. The > > JIT compiler can just generate a little ELF file with symbols in it, > > and the profiler can pick it up through the mmap events that you get > > through the perf interface). > > That would require keeping those temporary ELF files around > for a potentially unlimited time (profilers today look at the ELF > files at the final analysis phase, which might be weeks away) 'perf record' will traverse the perf.data file just collected and, if the binaries have build-ids, will stash them in ~/.debug/, keyed by build-id just like the -debuginfo packages do. So only the binaries with hits. Also one can use 'perf archive' to create a tar.bz2 file with the files with hits for the specified perf.data file, that can then be transferred to another machine, whatever arch, untarred at ~/.debug and then the report can be done there. As it is done by build-id, multiple 'perf record' sessions share files in the cache. Right now the whole ELF file (or /proc/kallsyms copy) is stored if collected from the DSO directly, or the bits that are stored in -debuginfo files if we find it installed (so smaller). We could strip that down further by storing just the ELF sections needed to make sense of the symtab. 
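The workflow described above can be sketched as a few commands (the recording duration and hostnames are illustrative; this assumes a perf build with build-id support, as discussed in this thread):

```shell
# On the profiled machine: record; binaries with hits are stashed
# in ~/.debug/, keyed by build-id
perf record -a sleep 10

# Bundle the files with hits for offline analysis
perf archive perf.data        # produces perf.data.tar.bz2

# On the analysis machine (any arch): populate the build-id cache and report
tar xvf perf.data.tar.bz2 -C ~/.debug
perf report -i perf.data
```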
- Arnaldo ^ permalink raw reply [flat|nested] 390+ messages in thread
* Re: [PATCH] Enhance perf to collect KVM guest os statistics from host side 2010-03-23 14:04 ` Soeren Sandmann 2010-03-23 14:20 ` Andi Kleen @ 2010-03-23 14:46 ` Frank Ch. Eigler 1 sibling, 0 replies; 390+ messages in thread From: Frank Ch. Eigler @ 2010-03-23 14:46 UTC (permalink / raw) To: Soeren Sandmann Cc: Andi Kleen, Joerg Roedel, Ingo Molnar, Zhang, Yanmin, Peter Zijlstra, Avi Kivity, Sheng Yang, linux-kernel, kvm, Marcelo Tosatti, Jes Sorensen, Gleb Natapov, Zachary Amsden, zhiteng.huang, Frédéric Weisbecker, Arnaldo Carvalho de Melo Soeren Sandmann <sandmann@daimi.au.dk> writes: > [...] > - What is going on inside QEMU? > - Which client is the X server servicing? > - What parts of a python/shell/scheme/javascript program is > taking the most CPU time? > [...] These kinds of questions usually require navigation through internal data of the user-space process ("Where in this linked list is this pointer?"), and often also correlating them with history ("which socket/fd was most recently serviced?"). Systemtap excels at letting one express such things. - FChE ^ permalink raw reply [flat|nested] 390+ messages in thread
* Re: [PATCH] Enhance perf to collect KVM guest os statistics from host side 2010-03-23 13:49 ` Andi Kleen 2010-03-23 14:04 ` Soeren Sandmann @ 2010-03-23 14:10 ` Arnaldo Carvalho de Melo 2010-03-23 15:23 ` Peter Zijlstra 1 sibling, 1 reply; 390+ messages in thread From: Arnaldo Carvalho de Melo @ 2010-03-23 14:10 UTC (permalink / raw) To: Andi Kleen Cc: Soeren Sandmann, Joerg Roedel, Ingo Molnar, Zhang, Yanmin, Peter Zijlstra, Avi Kivity, Sheng Yang, linux-kernel, kvm, Marcelo Tosatti, Jes Sorensen, Gleb Natapov, Zachary Amsden, zhiteng.huang, Frédéric Weisbecker Em Tue, Mar 23, 2010 at 02:49:01PM +0100, Andi Kleen escreveu: > Soeren Sandmann <sandmann@daimi.au.dk> writes: > > To fix that problem, it seems like we need some way to have Python > > export what is going on. Maybe the same mechanism could be used to > > both access what is going on in qemu and Python. > > oprofile already has an interface to let JITs export > information about the JITed code. CPython is not a JIT, > but presumably one of the Python JITs could do it. > > http://oprofile.sourceforge.net/doc/devel/index.html > > I know it's not en vogue anymore and you won't be an approved > cool kid if you do, but you could just use oprofile? perf also has support for this and Pekka Enberg's jato uses it: http://penberg.blogspot.com/2009/06/jato-has-profiler.html - Arnaldo ^ permalink raw reply [flat|nested] 390+ messages in thread
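The interface referred to here is perf's per-process map file: a JIT writes /tmp/perf-<pid>.map with one "start size symbol" line per JITed function (addresses in hex), and perf resolves samples that fall into anonymous executable mappings against it. A minimal sketch of the format; the addresses, sizes, and symbol names below are made up for illustration:

```shell
# Emit a perf map file describing two hypothetical JITed functions.
# Line format expected by perf: <start-addr-hex> <size-hex> <symbol-name>
mapfile="/tmp/perf-$$.map"
{
    printf '%x %x %s\n' $((0x7f0000001000)) $((0x80))  jit_fib
    printf '%x %x %s\n' $((0x7f0000001080)) $((0x140)) jit_main_loop
} > "$mapfile"
cat "$mapfile"
```

With such a file in place, a subsequent 'perf report' can show jit_fib and jit_main_loop by name instead of raw addresses in the anonymous mapping.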
* Re: [PATCH] Enhance perf to collect KVM guest os statistics from host side 2010-03-23 14:10 ` Arnaldo Carvalho de Melo @ 2010-03-23 15:23 ` Peter Zijlstra 0 siblings, 0 replies; 390+ messages in thread From: Peter Zijlstra @ 2010-03-23 15:23 UTC (permalink / raw) To: Arnaldo Carvalho de Melo Cc: Andi Kleen, Soeren Sandmann, Joerg Roedel, Ingo Molnar, Zhang, Yanmin, Avi Kivity, Sheng Yang, linux-kernel, kvm, Marcelo Tosatti, Jes Sorensen, Gleb Natapov, Zachary Amsden, zhiteng.huang, Frédéric Weisbecker, Pekka Enberg On Tue, 2010-03-23 at 11:10 -0300, Arnaldo Carvalho de Melo wrote: > Em Tue, Mar 23, 2010 at 02:49:01PM +0100, Andi Kleen escreveu: > > Soeren Sandmann <sandmann@daimi.au.dk> writes: > > > To fix that problem, it seems like we need some way to have python > > > export what is going on. Maybe the same mechanism could be used to > > > both access what is going on in qemu and python. > > > > oprofile already has an interface to let JITs export > > information about the JITed code. C Python is not a JIT, > > but presumably one of the python JITs could do it. > > > > http://oprofile.sourceforge.net/doc/devel/index.html > > > > I know it's not en vogue anymore and you won't be an approved > > cool kid if you do, but you could just use oprofile? > > perf also has support for this and Pekka Enberg's jato uses it: > > http://penberg.blogspot.com/2009/06/jato-has-profiler.html Right, we need to move that into a library though (always meant to do that, never got around to doing it). That way the app can link against a dso with weak empty stubs and have perf record LD_PRELOAD a version that has a suitable implementation. That all has the advantage of not exposing the actual interface like we do now. ^ permalink raw reply [flat|nested] 390+ messages in thread
* Re: [PATCH] Enhance perf to collect KVM guest os statistics from host side 2010-03-19 8:21 ` Ingo Molnar 2010-03-19 17:29 ` Joerg Roedel @ 2010-03-22 7:24 ` Zhang, Yanmin 2010-03-22 16:44 ` Arnaldo Carvalho de Melo 1 sibling, 1 reply; 390+ messages in thread From: Zhang, Yanmin @ 2010-03-22 7:24 UTC (permalink / raw) To: Ingo Molnar Cc: Peter Zijlstra, Avi Kivity, Sheng Yang, linux-kernel, kvm, Marcelo Tosatti, Joerg Roedel, Jes Sorensen, Gleb Natapov, Zachary Amsden, zhiteng.huang, Frédéric Weisbecker, Arnaldo Carvalho de Melo On Fri, 2010-03-19 at 09:21 +0100, Ingo Molnar wrote: > Nice progress! > > This bit: > > > 1) perf kvm top > > [root@lkp-ne01 norm]# perf kvm --host --guest --guestkallsyms=/home/ymzhang/guest/kallsyms > > --guestmodules=/home/ymzhang/guest/modules top > > Will really be painful to developers - to enter that long line while we > have these things called 'computers' that ought to reduce human work. Also, > it's incomplete, we need access to the guest system's binaries to do ELF > symbol resolution and dwarf decoding. Yes, I agree with you and Avi that we need the enhancement to be user-friendly. One of my starting points is to keep the tool's dependencies on other components to a minimum. Admins/developers could write script wrappers quickly if perf has parameters to support the new capability. > > So we really need some good, automatic way to get to the guest symbol space, > so that if a developer types: > > perf kvm top > > Then the obvious thing happens by default. (which is to show the guest > overhead) > > There's no technical barrier on the perf tooling side to implement all that: > perf supports build-ids extensively and can deal with multiple symbol spaces - > as long as it has access to it. The guest kernel could be ID-ed based on its > /sys/kernel/notes and /sys/module/*/notes/.note.gnu.build-id build-ids. I tried sshfs quickly. sshfs could mount the guest os root filesystem nicely. I could access the files quickly.
However, it doesn't work for /proc/ and /sys/, because sshfs/scp depend on the reported file size, and most files under /proc/ and /sys/ report a size of 0. > > So some sort of --guestmount option would be the natural solution, which > points to the guest system's root: and a Qemu enumeration of guest mounts > (which would be off by default and configurable) from which perf can pick up > the target guest all automatically. (obviously only under allowed permissions > so that such access is secure) If sshfs could access /proc/ and /sys/ correctly, here is a design: --guestmount points to a directory which consists of a list of sub-directories. Every sub-directory's name is just the qemu process id of a guest os. The admin/developer mounts every guest os instance's root directory to the corresponding sub-directory. Then, perf could access all the files. This works because a guest os instance happens to be a multi-threaded process. One of the drawbacks is that access to the guest os becomes slow or impossible when the guest os is very busy. > > This would allow not just kallsyms access via $guest/proc/kallsyms but also > gives us the full space of symbol features: access to the guest binaries for > annotation and general symbol resolution, command/binary name identification, > etc. > > Such a mount would obviously not broaden existing privileges - and as an > additional control a guest would also have a way to indicate that it does not > wish a guest mount at all. > > Unfortunately, in a previous thread the Qemu maintainer has indicated that he > will essentially NAK any attempt to enhance Qemu to provide an easily > discoverable, self-contained, transparent guest mount on the host side. > > No technical justification was given for that NAK, despite my repeated > requests to articulate the exact security problems that such an approach > would cause.
> > If that NAK does not stand in that form then i'd like to know about it - it > makes no sense for us to try to code up a solution against a standing > maintainer NAK ... > > The other option is some sysadmin level hackery to NFS-mount the guest or so. > This is a vastly inferior method that brings us back to the abysmal usability > levels of OProfile: > > 1) it won't be guest transparent > 2) has to be re-done for every guest image. > 3) even if packaged it has to be gotten into every. single. Linux. distro. separately. > 4) old Linux guests won't work out of the box > > In other words: it's very inconvenient on multiple levels and won't ever happen > on any reasonable enough scale to make a difference to Linux. > > Which is an unfortunate situation - and the ball is on the KVM/Qemu side so i > can do little about it. > > Thanks, > > Ingo ^ permalink raw reply [flat|nested] 390+ messages in thread
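The --guestmount layout proposed above (a directory with one sub-directory per qemu process id, each holding a mount of that guest's root filesystem) can be sketched as a small helper that emits the sshfs commands an admin would run to populate it. The pid-discovery heuristic and the ssh targets here are assumptions for illustration, not part of perf:

```python
import os
import re

def qemu_pids(proc="/proc"):
    """Heuristically list candidate qemu guest processes by scanning
    /proc/<pid>/comm for names starting with 'qemu'."""
    pids = []
    for entry in os.listdir(proc):
        if not entry.isdigit():
            continue
        try:
            with open(os.path.join(proc, entry, "comm")) as f:
                comm = f.read().strip()
        except OSError:
            continue  # process exited, or comm not readable
        if re.match(r"qemu", comm):
            pids.append(int(entry))
    return sorted(pids)

def guestmount_commands(root, guests):
    """guests maps a qemu pid to an ssh target for that guest os; emit one
    sshfs command per guest, mounting its / under <root>/<pid>/ as in the
    --guestmount directory scheme described above."""
    cmds = []
    for pid, target in sorted(guests.items()):
        sub = os.path.join(root, str(pid))
        cmds.append("mkdir -p %s && sshfs %s:/ %s" % (sub, target, sub))
    return cmds

# Hypothetical guest: qemu pid 1234 reachable as root@guest1.
print(guestmount_commands("/mnt/guestmount", {1234: "root@guest1"}))
```

In practice the pid-to-host mapping would have to come from the management layer (or from qemu itself), which is exactly the enumeration point being debated in this thread.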
* Re: [PATCH] Enhance perf to collect KVM guest os statistics from host side 2010-03-22 7:24 ` Zhang, Yanmin @ 2010-03-22 16:44 ` Arnaldo Carvalho de Melo 2010-03-23 3:14 ` Zhang, Yanmin 0 siblings, 1 reply; 390+ messages in thread From: Arnaldo Carvalho de Melo @ 2010-03-22 16:44 UTC (permalink / raw) To: Zhang, Yanmin Cc: Ingo Molnar, Peter Zijlstra, Avi Kivity, Sheng Yang, linux-kernel, kvm, Marcelo Tosatti, Joerg Roedel, Jes Sorensen, Gleb Natapov, Zachary Amsden, zhiteng.huang, Frédéric Weisbecker Em Mon, Mar 22, 2010 at 03:24:47PM +0800, Zhang, Yanmin escreveu: > On Fri, 2010-03-19 at 09:21 +0100, Ingo Molnar wrote: > > So some sort of --guestmount option would be the natural solution, which > > points to the guest system's root: and a Qemu enumeration of guest mounts > > (which would be off by default and configurable) from which perf can pick up > > the target guest all automatically. (obviously only under allowed permissions > > so that such access is secure) > If sshfs could access /proc/ and /sys correctly, here is a design: > --guestmount points to a directory which consists of a list of sub-directories. > Every sub-directory's name is just the qemu process id of guest os. Admin/developer > mounts every guest os instance's root directory to corresponding sub-directory. > > Then, perf could access all files. It's possible because guest os instance > happens to be multi-threading in a process. One of the defects is the accessing to > guest os becomes slow or impossible when guest os is very busy. If the MMAP events on the guest included a cookie that could later be used to query for the symtab of that DSO, we wouldn't need to access the guest FS at all, right? With build-ids and debuginfo-install like tools the symbol resolution could be performed by using the cookies (build-ids) as keys to get to the *-debuginfo packages with matching symtabs (and DWARF for source annotation, etc).
We have that for the kernel as: [acme@doppio linux-2.6-tip]$ l /sys/kernel/notes -r--r--r-- 1 root root 36 2010-03-22 13:14 /sys/kernel/notes [acme@doppio linux-2.6-tip]$ l /sys/module/ipv6/sections/.note.gnu.build-id -r--r--r-- 1 root root 4096 2010-03-22 13:38 /sys/module/ipv6/sections/.note.gnu.build-id [acme@doppio linux-2.6-tip]$ That way we would cover DSOs being reinstalled in long running 'perf record' sessions too. This was discussed some time ago but would require help from the bits that load DSOs. build-ids then would be first class citizens. - Arnaldo ^ permalink raw reply [flat|nested] 390+ messages in thread
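The files listed above carry ELF notes: each entry is a 4-byte name size, descriptor size and type, followed by the 4-byte-padded name and descriptor, with the GNU build-id stored as note type 3 under the name "GNU". A hedged sketch of walking such a note blob, exercised on a synthetic note rather than the real sysfs file (native little-endian layout is assumed for the example):

```python
import struct

NT_GNU_BUILD_ID = 3

def parse_build_id(notes):
    """Walk ELF note entries (namesz, descsz, type, padded name, padded
    desc) in a blob like /sys/kernel/notes; return the GNU build-id as a
    hex string, or None if no such note is present."""
    off = 0
    while off + 12 <= len(notes):
        namesz, descsz, ntype = struct.unpack_from("<III", notes, off)
        off += 12
        name = notes[off:off + namesz]
        off += (namesz + 3) & ~3   # name and desc are padded to 4 bytes
        desc = notes[off:off + descsz]
        off += (descsz + 3) & ~3
        if ntype == NT_GNU_BUILD_ID and name.rstrip(b"\0") == b"GNU":
            return desc.hex()
    return None

# Synthetic note: name "GNU\0" (namesz 4), a 20-byte descriptor, type 3.
build_id = bytes(range(20))
blob = struct.pack("<III", 4, len(build_id), NT_GNU_BUILD_ID) + b"GNU\0" + build_id
print(parse_build_id(blob))
```

Against the real /sys/kernel/notes the same walk applies, with the endianness of the running kernel.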
* Re: [PATCH] Enhance perf to collect KVM guest os statistics from host side 2010-03-22 16:44 ` Arnaldo Carvalho de Melo @ 2010-03-23 3:14 ` Zhang, Yanmin 2010-03-23 13:15 ` Arnaldo Carvalho de Melo 0 siblings, 1 reply; 390+ messages in thread From: Zhang, Yanmin @ 2010-03-23 3:14 UTC (permalink / raw) To: Arnaldo Carvalho de Melo Cc: Ingo Molnar, Peter Zijlstra, Avi Kivity, Sheng Yang, linux-kernel, kvm, Marcelo Tosatti, oerg Roedel, Jes Sorensen, Gleb Natapov, Zachary Amsden, zhiteng.huang, Frédéric Weisbecker On Mon, 2010-03-22 at 13:44 -0300, Arnaldo Carvalho de Melo wrote: > Em Mon, Mar 22, 2010 at 03:24:47PM +0800, Zhang, Yanmin escreveu: > > On Fri, 2010-03-19 at 09:21 +0100, Ingo Molnar wrote: > > > So some sort of --guestmount option would be the natural solution, which > > > points to the guest system's root: and a Qemu enumeration of guest mounts > > > (which would be off by default and configurable) from which perf can pick up > > > the target guest all automatically. (obviously only under allowed permissions > > > so that such access is secure) > > If sshfs could access /proc/ and /sys correctly, here is a design: > > --guestmount points to a directory which consists of a list of sub-directories. > > Every sub-directory's name is just the qemu process id of guest os. Admin/developer > > mounts every guest os instance's root directory to corresponding sub-directory. > > > > Then, perf could access all files. It's possible because guest os instance > > happens to be multi-threading in a process. One of the defects is the accessing to > > guest os becomes slow or impossible when guest os is very busy. > > If the MMAP events on the guest included a cookie that could later be > used to query for the symtab of that DSO, we wouldn't need to access the > guest FS at all, right? It depends on specific sub commands. As for 'perf kvm top', developers want to see the profiling immediately. Even with 'perf kvm record', developers also want to see results quickly. 
At least I'm eager for the results when investigating a performance issue. > > With build-ids and debuginfo-install like tools the symbol resolution > could be performed by using the cookies (build-ids) as keys to get to > the *-debuginfo packages with matching symtabs (and DWARF for source > annotation, etc). We can't make sure the guest os uses the same os images, and we don't know where we could find the original DVD images that were used to install the guest os. Current perf does save build ids, including both the kernel's and other applications' libs/executables. > > We have that for the kernel as: > > [acme@doppio linux-2.6-tip]$ l /sys/kernel/notes > -r--r--r-- 1 root root 36 2010-03-22 13:14 /sys/kernel/notes > [acme@doppio linux-2.6-tip]$ l /sys/module/ipv6/sections/.note.gnu.build-id > -r--r--r-- 1 root root 4096 2010-03-22 13:38 /sys/module/ipv6/sections/.note.gnu.build-id > [acme@doppio linux-2.6-tip]$ > > That way we would cover DSOs being reinstalled in long running 'perf > record' sessions too. That's one of the objectives of perf, to support long running sessions. > > This was discussed some time ago but would require help from the bits > that load DSOs. > > build-ids then would be first class citizens. > > - Arnaldo ^ permalink raw reply [flat|nested] 390+ messages in thread
* Re: [PATCH] Enhance perf to collect KVM guest os statistics from host side 2010-03-23 3:14 ` Zhang, Yanmin @ 2010-03-23 13:15 ` Arnaldo Carvalho de Melo 2010-03-24 1:39 ` Zhang, Yanmin 0 siblings, 1 reply; 390+ messages in thread From: Arnaldo Carvalho de Melo @ 2010-03-23 13:15 UTC (permalink / raw) To: Zhang, Yanmin Cc: Ingo Molnar, Peter Zijlstra, Avi Kivity, Sheng Yang, linux-kernel, kvm, Marcelo Tosatti, oerg Roedel, Jes Sorensen, Gleb Natapov, Zachary Amsden, zhiteng.huang, Frédéric Weisbecker Em Tue, Mar 23, 2010 at 11:14:41AM +0800, Zhang, Yanmin escreveu: > On Mon, 2010-03-22 at 13:44 -0300, Arnaldo Carvalho de Melo wrote: > > Em Mon, Mar 22, 2010 at 03:24:47PM +0800, Zhang, Yanmin escreveu: > > > On Fri, 2010-03-19 at 09:21 +0100, Ingo Molnar wrote: > > > Then, perf could access all files. It's possible because guest os instance > > > happens to be multi-threading in a process. One of the defects is the accessing to > > > guest os becomes slow or impossible when guest os is very busy. > > > > If the MMAP events on the guest included a cookie that could later be > > used to query for the symtab of that DSO, we wouldn't need to access the > > guest FS at all, right? > It depends on specific sub commands. As for 'perf kvm top', developers > want to see the profiling immediately. Even with 'perf kvm record', > developers also want to That is not a problem, if you have the relevant buildids in your cache (Look in your machine at ~/.debug/), it will be as fast as ever. If you use a distro that has its userspace with build-ids, you probably use it always without noticing :-) > see results quickly. At least I'm eager for the results when > investigating a performance issue. Sure thing. > > With build-ids and debuginfo-install like tools the symbol > > resolution could be performed by using the cookies (build-ids) as > > keys to get to the *-debuginfo packages with matching symtabs (and > > DWARF for source annotation, etc). 
> We can't make sure guest os uses the same os images, or don't know > where we could find the original DVD images being used to install > guest os. You don't have to have guest and host sharing the same OS image, you just have to somehow populate your buildid cache with what you need, be it using sshfs or what Ingo is suggesting, or using what your vendor provides (debuginfo packages). And you just have to do it once, for the relevant apps, to have it in your buildid cache. > Current perf does save build id, including both the kernel's and other > application lib/executables. Yeah, I know, I implemented it. :-) > > We have that for the kernel as: > > [acme@doppio linux-2.6-tip]$ l /sys/kernel/notes > > -r--r--r-- 1 root root 36 2010-03-22 13:14 /sys/kernel/notes > > [acme@doppio linux-2.6-tip]$ l /sys/module/ipv6/sections/.note.gnu.build-id > > -r--r--r-- 1 root root 4096 2010-03-22 13:38 /sys/module/ipv6/sections/.note.gnu.build-id > > [acme@doppio linux-2.6-tip]$ > > That way we would cover DSOs being reinstalled in long running 'perf > > record' sessions too. > That's one of objectives of perf to support long running. But perf doesn't fully support that right now: as I explained, build-ids are collected at the end of the record session, because we have to open the DSOs that had hits to get the 20-byte cookie we need, the build-id. If we had it in the PERF_RECORD_MMAP record, we would close this race, and the added cost at load time should be minimal, to get the ELF section with it and put it somewhere in task struct.
If only we could coalesce it a bit to reclaim this: [acme@doppio linux-2.6-tip]$ pahole -C task_struct ../build/v2.6.34-rc1-tip+/kernel/sched.o | tail -5 /* size: 5968, cachelines: 94, members: 150 */ /* sum members: 5943, holes: 7, sum holes: 25 */ /* bit holes: 1, sum bit holes: 28 bits */ /* last cacheline: 16 bytes */ }; [acme@doppio linux-2.6-tip]$ 8-) Or at least get just one of those 4 bytes holes then we could stick it at the end to get our build-id there, accessing it would be done only at PERF_RECORD_MMAP injection time, i.e. close to the time when we actually are loading the executable mmap, i.e. close to the time when the loader is injecting the build-id, I guess the extra memory and processing costs would be in the noise. > > This was discussed some time ago but would require help from the bits > > that load DSOs. > > build-ids then would be first class citizens. - Arnaldo ^ permalink raw reply [flat|nested] 390+ messages in thread
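As a sketch of the buildid-cache lookup described in this exchange: perf keys its ~/.debug cache by build-id, with an entry under .build-id/ named by the first two hex digits of the id and then the rest. The layout here is assumed from observation of a populated ~/.debug, not taken from the perf sources, so treat the exact scheme as an assumption:

```python
import os

def buildid_cache_path(build_id, debug_dir="~/.debug"):
    """Expected cache location for a build-id:
    <debug_dir>/.build-id/<first two hex digits>/<remaining digits>.
    (Assumed layout; illustration only, not the perf implementation.)"""
    bid = build_id.lower()
    return os.path.join(os.path.expanduser(debug_dir),
                        ".build-id", bid[:2], bid[2:])

# A made-up 20-byte (40 hex digit) build-id:
print(buildid_cache_path("8b27127bf3b8b1d8c53e6398e9464cb0d33f1c91"))
```

With such a mapping, symbol resolution for a guest DSO only needs the build-id cookie from the sample stream plus a cache populated once per relevant binary, which is the point Arnaldo is making.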
* Re: [PATCH] Enhance perf to collect KVM guest os statistics from host side 2010-03-23 13:15 ` Arnaldo Carvalho de Melo @ 2010-03-24 1:39 ` Zhang, Yanmin 0 siblings, 0 replies; 390+ messages in thread From: Zhang, Yanmin @ 2010-03-24 1:39 UTC (permalink / raw) To: Arnaldo Carvalho de Melo Cc: Ingo Molnar, Peter Zijlstra, Avi Kivity, Sheng Yang, linux-kernel, kvm, Marcelo Tosatti, oerg Roedel, Jes Sorensen, Gleb Natapov, Zachary Amsden, zhiteng.huang, Frédéric Weisbecker On Tue, 2010-03-23 at 10:15 -0300, Arnaldo Carvalho de Melo wrote: > Em Tue, Mar 23, 2010 at 11:14:41AM +0800, Zhang, Yanmin escreveu: > > On Mon, 2010-03-22 at 13:44 -0300, Arnaldo Carvalho de Melo wrote: > > > Em Mon, Mar 22, 2010 at 03:24:47PM +0800, Zhang, Yanmin escreveu: > > > > On Fri, 2010-03-19 at 09:21 +0100, Ingo Molnar wrote: > > > > Then, perf could access all files. It's possible because guest os instance > > > > happens to be multi-threading in a process. One of the defects is the accessing to > > > > guest os becomes slow or impossible when guest os is very busy. > > > > > > If the MMAP events on the guest included a cookie that could later be > > > used to query for the symtab of that DSO, we wouldn't need to access the > > > guest FS at all, right? > > > It depends on specific sub commands. As for 'perf kvm top', developers > > want to see the profiling immediately. Even with 'perf kvm record', > > developers also want to > > That is not a problem, if you have the relevant buildids in your cache > (Look in your machine at ~/.debug/), it will be as fast as ever. > > If you use a distro that has its userspace with build-ids, you probably > use it always without noticing :-) > > > see results quickly. At least I'm eager for the results when > > investigating a performance issue. > > Sure thing. 
> > > > With build-ids and debuginfo-install like tools the symbol > > > resolution could be performed by using the cookies (build-ids) as > > > keys to get to the *-debuginfo packages with matching symtabs (and > > > DWARF for source annotation, etc). > > > We can't make sure guest os uses the same os images, or don't know > > where we could find the original DVD images being used to install > > guest os. > > You don't have to have guest and host sharing the same OS image, you > just have to somehow populate your buildid cache with what you need, be > it using sshfs or what Ingo is suggesting once, or using what your > vendor provides (debuginfo packages). And you just have to do it once, > for the relevant apps, to have it in your buildid cache. > > > Current perf does save build id, including both kernls's and other > > application lib/executables. > > Yeah, I know, I implemented it. :-) > > > > We have that for the kernel as: > > > > [acme@doppio linux-2.6-tip]$ l /sys/kernel/notes > > > -r--r--r-- 1 root root 36 2010-03-22 13:14 /sys/kernel/notes > > > [acme@doppio linux-2.6-tip]$ l /sys/module/ipv6/sections/.note.gnu.build-id > > > -r--r--r-- 1 root root 4096 2010-03-22 13:38 /sys/module/ipv6/sections/.note.gnu.build-id > > > [acme@doppio linux-2.6-tip]$ > > > > That way we would cover DSOs being reinstalled in long running 'perf > > > record' sessions too. > > > That's one of objectives of perf to support long running. > > But it doesn't fully supports right now, as I explained, build-ids are > collected at the end of the record session, because we have to open the > DSOs that had hits to get the 20 bytes cookie we need, the build-id. > > If we had it in the PERF_RECORD_MMAP record, we would close this race, > and the added cost at load time should be minimal, to get the ELF > section with it and put it somewhere in task struct. Well, you are improving upon perfection. 
> > If only we could coalesce it a bit to reclaim this: > > [acme@doppio linux-2.6-tip]$ pahole -C task_struct ../build/v2.6.34-rc1-tip+/kernel/sched.o | tail -5 > /* size: 5968, cachelines: 94, members: 150 */ > /* sum members: 5943, holes: 7, sum holes: 25 */ > /* bit holes: 1, sum bit holes: 28 bits */ > /* last cacheline: 16 bytes */ > }; > [acme@doppio linux-2.6-tip]$ That reminds me I listened to your presentation at the 2007 OLS. :) > > 8-) > > Or at least get just one of those 4 bytes holes then we could stick it > at the end to get our build-id there, accessing it would be done only > at PERF_RECORD_MMAP injection time, i.e. close to the time when we > actually are loading the executable mmap, i.e. close to the time when > the loader is injecting the build-id, I guess the extra memory and > processing costs would be in the noise. > > > This was discussed some time ago but would require help from the bits > > that load DSOs. > > > build-ids then would be first class citizens. > > - Arnaldo ^ permalink raw reply [flat|nested] 390+ messages in thread
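The padding holes in the pahole output quoted above come from member alignment, and the effect is easy to reproduce. The sketch below uses Python's ctypes to show how a 1-byte member followed by an 8-byte one leaves a hole, and how ordering members largest-first reclaims it; the struct is a toy, not task_struct, and the exact sizes assume a typical 64-bit ABI:

```python
import ctypes

class WithHoles(ctypes.Structure):
    # A u8 followed by a u64 forces alignment padding (a "hole" in pahole
    # terms), and the trailing u8 adds tail padding as well.
    _fields_ = [("a", ctypes.c_uint8),
                ("p", ctypes.c_uint64),
                ("b", ctypes.c_uint8)]

class Reordered(ctypes.Structure):
    # Same members, largest first: the small members now share the space
    # that was previously wasted on padding.
    _fields_ = [("p", ctypes.c_uint64),
                ("a", ctypes.c_uint8),
                ("b", ctypes.c_uint8)]

# Bytes wasted between 'a' and 'p' (7 on common 64-bit ABIs).
hole = WithHoles.p.offset - ctypes.sizeof(ctypes.c_uint8)
print("hole after 'a':", hole, "bytes")
print("sizeof:", ctypes.sizeof(WithHoles), "->", ctypes.sizeof(Reordered))
```

A 4-byte hole like the ones pahole reports in task_struct is exactly this kind of padding, which is why it could host part of a build-id without growing the struct.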
end of thread, other threads:[~2010-04-08 14:29 UTC | newest] Thread overview: 390+ messages (download: mbox.gz / follow: Atom feed) -- links below jump to the message on this page -- 2010-03-16 5:27 [PATCH] Enhance perf to collect KVM guest os statistics from host side Zhang, Yanmin 2010-03-16 5:41 ` Avi Kivity 2010-03-16 7:24 ` Ingo Molnar 2010-03-16 9:20 ` Avi Kivity 2010-03-16 9:53 ` Ingo Molnar 2010-03-16 10:13 ` Avi Kivity 2010-03-16 10:20 ` Ingo Molnar 2010-03-16 10:40 ` Avi Kivity 2010-03-16 10:50 ` Ingo Molnar 2010-03-16 11:10 ` Avi Kivity 2010-03-16 11:25 ` Ingo Molnar 2010-03-16 12:21 ` Avi Kivity 2010-03-16 12:29 ` Ingo Molnar 2010-03-16 12:41 ` Avi Kivity 2010-03-16 13:08 ` Ingo Molnar 2010-03-16 13:16 ` Avi Kivity 2010-03-16 13:31 ` Ingo Molnar 2010-03-16 13:37 ` Avi Kivity 2010-03-16 15:06 ` Frank Ch. Eigler 2010-03-16 15:52 ` Ingo Molnar 2010-03-16 16:08 ` Frank Ch. Eigler 2010-03-16 16:35 ` Ingo Molnar 2010-03-16 17:34 ` Anthony Liguori 2010-03-16 17:52 ` Ingo Molnar 2010-03-16 18:06 ` Anthony Liguori 2010-03-16 18:28 ` Ingo Molnar 2010-03-16 23:04 ` Anthony Liguori 2010-03-17 0:41 ` Frank Ch. Eigler 2010-03-17 3:54 ` Avi Kivity 2010-03-17 8:16 ` Ingo Molnar 2010-03-17 8:20 ` Avi Kivity 2010-03-17 8:59 ` Ingo Molnar 2010-03-18 5:27 ` Huang, Zhiteng 2010-03-18 5:27 ` Huang, Zhiteng 2010-03-17 8:14 ` Ingo Molnar 2010-03-17 8:53 ` Ingo Molnar 2010-03-16 17:06 ` Anthony Liguori 2010-03-16 17:39 ` Ingo Molnar 2010-03-16 23:07 ` Anthony Liguori 2010-03-17 8:10 ` [RFC] Unify KVM kernel-space and user-space code into a single project Ingo Molnar 2010-03-18 8:20 ` Avi Kivity 2010-03-18 8:56 ` Ingo Molnar 2010-03-18 9:24 ` Alexander Graf 2010-03-18 10:10 ` Ingo Molnar 2010-03-18 10:21 ` Avi Kivity 2010-03-18 11:35 ` Ingo Molnar 2010-03-18 12:00 ` Alexander Graf 2010-03-18 12:33 ` Frank Ch. Eigler 2010-03-18 13:01 ` John Kacur 2010-03-18 13:01 ` John Kacur 2010-03-18 14:25 ` Ingo Molnar 2010-03-18 14:39 ` Frank Ch. 
Eigler 2010-03-18 13:02 ` Ingo Molnar 2010-03-18 13:10 ` Avi Kivity 2010-03-18 13:31 ` Ingo Molnar 2010-03-18 13:44 ` Daniel P. Berrange 2010-03-18 13:59 ` Ingo Molnar 2010-03-18 14:06 ` John Kacur 2010-03-18 14:06 ` John Kacur 2010-03-18 14:11 ` Ingo Molnar 2010-03-18 13:46 ` Avi Kivity 2010-03-18 13:57 ` Ingo Molnar 2010-03-18 14:25 ` Avi Kivity 2010-03-18 14:36 ` Ingo Molnar 2010-03-18 14:51 ` Avi Kivity 2010-03-18 13:24 ` Frank Ch. Eigler 2010-03-18 13:48 ` Ingo Molnar 2010-03-18 10:12 ` Avi Kivity 2010-03-18 10:28 ` Ingo Molnar 2010-03-18 10:50 ` Ingo Molnar 2010-03-18 11:30 ` Avi Kivity 2010-03-18 11:48 ` Ingo Molnar 2010-03-18 12:22 ` Avi Kivity 2010-03-18 13:00 ` Ingo Molnar 2010-03-18 13:36 ` Avi Kivity 2010-03-18 14:09 ` Ingo Molnar 2010-03-18 14:38 ` Avi Kivity 2010-03-18 17:16 ` Ingo Molnar 2010-03-18 14:59 ` Anthony Liguori 2010-03-18 15:17 ` Ingo Molnar 2010-03-18 16:11 ` Anthony Liguori 2010-03-18 16:28 ` Ingo Molnar 2010-03-18 16:38 ` Anthony Liguori 2010-03-18 16:51 ` Pekka Enberg 2010-03-18 16:51 ` Pekka Enberg 2010-03-18 17:02 ` Ingo Molnar 2010-03-18 17:09 ` Avi Kivity 2010-03-18 17:28 ` Ingo Molnar 2010-03-19 7:56 ` Avi Kivity 2010-03-19 8:53 ` Ingo Molnar 2010-03-19 12:56 ` Anthony Liguori 2010-03-21 19:17 ` Ingo Molnar 2010-03-21 19:35 ` Antoine Martin 2010-03-21 19:59 ` Ingo Molnar 2010-03-21 20:09 ` Avi Kivity 2010-03-21 21:00 ` Ingo Molnar 2010-03-21 21:44 ` Avi Kivity 2010-03-21 23:43 ` Anthony Liguori 2010-03-21 20:01 ` Avi Kivity 2010-03-21 20:08 ` Olivier Galibert 2010-03-21 20:11 ` Avi Kivity 2010-03-21 20:18 ` Antoine Martin 2010-03-21 20:24 ` Avi Kivity 2010-03-21 20:31 ` Antoine Martin 2010-03-21 21:03 ` Avi Kivity 2010-03-21 21:20 ` Ingo Molnar 2010-03-22 6:35 ` Avi Kivity 2010-03-22 11:48 ` Ingo Molnar 2010-03-22 12:31 ` Pekka Enberg 2010-03-22 12:31 ` Pekka Enberg 2010-03-22 12:37 ` Daniel P. Berrange 2010-03-22 12:44 ` Pekka Enberg 2010-03-22 12:54 ` Ingo Molnar 2010-03-22 13:05 ` Daniel P. 
Berrange 2010-03-22 13:23 ` Richard W.M. Jones 2010-03-22 14:02 ` Ingo Molnar 2010-03-22 14:20 ` oerg Roedel 2010-03-22 13:56 ` Ingo Molnar 2010-03-22 14:01 ` Richard W.M. Jones 2010-03-22 14:07 ` Ingo Molnar 2010-03-22 12:36 ` Avi Kivity 2010-03-22 12:50 ` Pekka Enberg 2010-03-22 12:50 ` Pekka Enberg 2010-03-22 6:59 ` Zhang, Yanmin 2010-03-22 12:05 ` Antoine Martin 2010-03-21 20:37 ` Ingo Molnar 2010-03-22 6:37 ` Avi Kivity 2010-03-22 11:39 ` Ingo Molnar 2010-03-22 12:44 ` Avi Kivity 2010-03-22 12:54 ` Daniel P. Berrange 2010-03-22 14:26 ` Ingo Molnar 2010-03-22 17:29 ` Avi Kivity 2010-03-21 20:11 ` Avi Kivity 2010-03-21 20:31 ` Ingo Molnar 2010-03-21 21:30 ` Avi Kivity 2010-03-21 21:52 ` Ingo Molnar 2010-03-22 6:49 ` Avi Kivity 2010-03-22 11:23 ` Ingo Molnar 2010-03-22 12:49 ` Avi Kivity 2010-03-22 13:01 ` Pekka Enberg 2010-03-22 13:01 ` Pekka Enberg 2010-03-22 14:54 ` Ingo Molnar 2010-03-22 19:04 ` Avi Kivity 2010-03-23 9:46 ` Olivier Galibert 2010-03-22 14:47 ` Ingo Molnar 2010-03-22 18:15 ` Avi Kivity 2010-03-22 11:10 ` oerg Roedel 2010-03-22 12:22 ` Ingo Molnar 2010-03-22 13:46 ` Joerg Roedel 2010-03-22 16:32 ` Ingo Molnar 2010-03-22 17:17 ` Frank Ch. 
Eigler 2010-03-22 17:27 ` Pekka Enberg 2010-03-22 17:27 ` Pekka Enberg 2010-03-22 17:32 ` Avi Kivity 2010-03-22 17:39 ` Ingo Molnar 2010-03-22 17:58 ` Avi Kivity 2010-03-22 17:52 ` Pekka Enberg 2010-03-22 17:52 ` Pekka Enberg 2010-03-22 18:04 ` Avi Kivity 2010-03-22 18:10 ` Pekka Enberg 2010-03-22 18:10 ` Pekka Enberg 2010-03-22 18:55 ` Avi Kivity 2010-03-22 17:43 ` Ingo Molnar 2010-03-22 18:02 ` Avi Kivity 2010-03-22 17:44 ` Avi Kivity 2010-03-22 19:10 ` Ingo Molnar 2010-03-22 19:18 ` Anthony Liguori 2010-03-22 19:23 ` Avi Kivity 2010-03-22 19:28 ` Andrea Arcangeli 2010-03-22 19:20 ` Joerg Roedel 2010-03-22 19:28 ` Avi Kivity 2010-03-22 19:49 ` Ingo Molnar 2010-03-21 23:35 ` Anthony Liguori 2010-03-20 7:35 ` Avi Kivity 2010-03-21 19:06 ` Ingo Molnar 2010-03-21 20:22 ` Avi Kivity 2010-03-21 20:55 ` Ingo Molnar 2010-03-21 21:42 ` Avi Kivity 2010-03-21 21:54 ` Ingo Molnar 2010-03-22 0:16 ` Anthony Liguori 2010-03-22 11:59 ` Ingo Molnar 2010-03-22 7:13 ` Avi Kivity 2010-03-22 11:14 ` Ingo Molnar 2010-03-22 11:23 ` Alexander Graf 2010-03-22 12:33 ` Lukas Kolbe 2010-03-22 12:29 ` Avi Kivity 2010-03-22 12:44 ` Ingo Molnar 2010-03-22 12:52 ` Avi Kivity 2010-03-22 14:32 ` Ingo Molnar 2010-03-22 14:43 ` Anthony Liguori 2010-03-22 15:55 ` Ingo Molnar 2010-03-22 16:08 ` Anthony Liguori 2010-03-22 16:59 ` Ingo Molnar 2010-03-22 18:28 ` Anthony Liguori 2010-03-22 17:11 ` Ingo Molnar 2010-03-22 18:30 ` Anthony Liguori 2010-03-22 16:12 ` Avi Kivity 2010-03-22 16:16 ` Avi Kivity 2010-03-22 16:40 ` Pekka Enberg 2010-03-22 16:40 ` Pekka Enberg 2010-03-22 18:06 ` Avi Kivity 2010-03-22 16:51 ` Ingo Molnar 2010-03-22 17:08 ` Avi Kivity 2010-03-22 17:34 ` Ingo Molnar 2010-03-22 17:55 ` Avi Kivity 2010-03-22 19:15 ` Anthony Liguori 2010-03-22 19:31 ` Daniel P. 
Berrange 2010-03-22 19:33 ` Anthony Liguori 2010-03-22 19:39 ` Alexander Graf 2010-03-22 19:54 ` Ingo Molnar 2010-03-22 19:58 ` Alexander Graf 2010-03-22 20:21 ` Ingo Molnar 2010-03-22 20:35 ` Avi Kivity 2010-03-23 10:48 ` Bernd Petrovitsch 2010-03-22 20:19 ` Antoine Martin 2010-03-22 20:00 ` Antoine Martin 2010-03-22 20:58 ` Daniel P. Berrange 2010-03-22 19:20 ` Ingo Molnar 2010-03-22 19:44 ` Avi Kivity 2010-03-22 20:06 ` Ingo Molnar 2010-03-22 20:15 ` Avi Kivity 2010-03-22 20:29 ` Ingo Molnar 2010-03-22 20:40 ` Avi Kivity 2010-03-22 18:35 ` Anthony Liguori 2010-03-22 19:22 ` Ingo Molnar 2010-03-22 19:29 ` Anthony Liguori 2010-03-22 20:32 ` Ingo Molnar 2010-03-22 20:43 ` Avi Kivity 2010-03-22 19:45 ` Avi Kivity 2010-03-22 20:35 ` Ingo Molnar 2010-03-22 20:45 ` Avi Kivity 2010-03-22 18:41 ` Anthony Liguori 2010-03-22 19:27 ` Ingo Molnar 2010-03-22 19:47 ` Avi Kivity 2010-03-22 20:46 ` Ingo Molnar 2010-03-22 20:53 ` Avi Kivity 2010-03-22 22:06 ` Anthony Liguori 2010-03-23 9:07 ` Avi Kivity 2010-03-23 14:09 ` Anthony Liguori 2010-03-23 10:13 ` Kevin Wolf 2010-03-23 10:28 ` Antoine Martin 2010-03-23 14:06 ` Joerg Roedel 2010-03-23 16:39 ` Avi Kivity 2010-03-23 18:21 ` Joerg Roedel 2010-03-23 18:27 ` Peter Zijlstra 2010-03-23 19:05 ` Javier Guerra Giraldez 2010-03-23 19:05 ` Javier Guerra Giraldez 2010-03-24 4:57 ` Avi Kivity 2010-03-24 11:59 ` Joerg Roedel 2010-03-24 12:08 ` Avi Kivity 2010-03-24 12:50 ` Joerg Roedel 2010-03-24 13:05 ` Avi Kivity 2010-03-24 13:46 ` Joerg Roedel 2010-03-24 13:57 ` Avi Kivity 2010-03-24 15:01 ` Joerg Roedel 2010-03-24 15:12 ` Avi Kivity 2010-03-24 15:46 ` Joerg Roedel 2010-03-24 15:49 ` Avi Kivity 2010-03-24 15:59 ` Joerg Roedel 2010-03-24 16:09 ` Avi Kivity 2010-03-24 16:40 ` Joerg Roedel 2010-03-24 16:47 ` Avi Kivity 2010-03-24 16:52 ` Avi Kivity 2010-04-08 14:29 ` Antoine Martin 2010-03-24 17:47 ` Arnaldo Carvalho de Melo 2010-03-24 18:20 ` Avi Kivity 2010-03-24 18:27 ` Arnaldo Carvalho de Melo 2010-03-25 9:00 ` Zhang, Yanmin 
2010-03-24 15:26 ` Daniel P. Berrange
2010-03-24 15:37 ` Joerg Roedel
2010-03-24 15:43 ` Avi Kivity
2010-03-24 15:50 ` Joerg Roedel
2010-03-24 15:52 ` Avi Kivity
2010-03-24 16:17 ` Joerg Roedel
2010-03-24 16:20 ` Avi Kivity
2010-03-24 16:31 ` Joerg Roedel
2010-03-24 16:32 ` Avi Kivity
2010-03-24 16:45 ` Joerg Roedel
2010-03-24 16:48 ` Avi Kivity
2010-03-24 16:03 ` Peter Zijlstra
2010-03-24 16:16 ` Avi Kivity
2010-03-24 16:23 ` Joerg Roedel
2010-03-24 16:45 ` Peter Zijlstra
2010-03-24 13:53 ` Alexander Graf
2010-03-24 13:59 ` Avi Kivity
2010-03-24 14:24 ` Alexander Graf
2010-03-24 15:06 ` Avi Kivity
2010-03-24  5:09 ` Andi Kleen
2010-03-24  6:42 ` Avi Kivity
2010-03-24  7:38 ` Andi Kleen
2010-03-24  8:59 ` Avi Kivity
2010-03-24  9:31 ` Andi Kleen
2010-03-22 14:46 ` Avi Kivity
2010-03-22 16:08 ` Ingo Molnar
2010-03-22 16:13 ` Avi Kivity
2010-03-24 12:06 ` Paolo Bonzini
2010-03-21 22:00 ` Ingo Molnar
2010-03-21 23:50 ` Anthony Liguori
2010-03-22  0:25 ` Anthony Liguori
2010-03-22  7:18 ` Avi Kivity
2010-03-19  9:19 ` Paul Mundt
2010-03-19  9:52 ` Olivier Galibert
2010-03-19 13:56 ` [LKML] " Konrad Rzeszutek Wilk
2010-03-19 13:56 ` Konrad Rzeszutek Wilk
2010-03-18 14:53 ` Anthony Liguori
2010-03-18 16:13 ` Ingo Molnar
2010-03-18 16:54 ` Avi Kivity
2010-03-18 17:11 ` Ingo Molnar
2010-03-18 18:20 ` Anthony Liguori
2010-03-18 18:23 ` drepper
2010-03-18 19:15 ` Ingo Molnar
2010-03-18 19:37 ` drepper
2010-03-18 20:18 ` Ingo Molnar
2010-03-18 20:39 ` drepper
2010-03-18 20:56 ` Ingo Molnar
2010-03-18 22:06 ` Alan Cox
2010-03-18 22:16 ` Ingo Molnar
2010-03-19  7:22 ` Avi Kivity
2010-03-21 13:27 ` Gabor Gombas
2010-03-18 21:02 ` Zachary Amsden
2010-03-18 21:15 ` Ingo Molnar
2010-03-18 22:19 ` Zachary Amsden
2010-03-18 22:44 ` Ingo Molnar
2010-03-19  7:21 ` Avi Kivity
2010-03-20 14:59 ` Andrea Arcangeli
2010-03-21 10:03 ` Avi Kivity
2010-03-18  9:22 ` Ingo Molnar
2010-03-18 10:32 ` Avi Kivity
2010-03-18 11:19 ` Ingo Molnar
2010-03-18 18:20 ` Frederic Weisbecker
2010-03-18 19:50 ` Frank Ch. Eigler
2010-03-18 20:47 ` Ingo Molnar
2010-03-18  8:44 ` Jes Sorensen
2010-03-18  9:54 ` Ingo Molnar
2010-03-18 10:40 ` Jes Sorensen
2010-03-18 10:58 ` Ingo Molnar
2010-03-18 13:23 ` Jes Sorensen
2010-03-18 14:22 ` Ingo Molnar
2010-03-18 14:45 ` Jes Sorensen
2010-03-18 16:54 ` Ingo Molnar
2010-03-18 18:10 ` Anthony Liguori
2010-03-19 14:53 ` Andrea Arcangeli
2010-03-18 14:38 ` Anthony Liguori
2010-03-18 14:44 ` Anthony Liguori
2010-03-16 22:30 ` [PATCH] Enhance perf to collect KVM guest os statistics from host side Joerg Roedel
2010-03-16 23:01 ` Masami Hiramatsu
2010-03-17  7:27 ` Ingo Molnar
2010-03-16  7:48 ` Zhang, Yanmin
2010-03-16  9:28 ` Zhang, Yanmin
2010-03-16  9:33 ` Avi Kivity
2010-03-16  9:47 ` Ingo Molnar
2010-03-17  9:26 ` Zhang, Yanmin
2010-03-18  2:45 ` Zhang, Yanmin
2010-03-18  7:49 ` Zhang, Yanmin
2010-03-18  8:03 ` Ingo Molnar
2010-03-18 13:03 ` Arnaldo Carvalho de Melo
2010-03-16  9:32 ` Avi Kivity
2010-03-17  2:34 ` Zhang, Yanmin
2010-03-17  9:28 ` Sheng Yang
2010-03-17  9:41 ` Avi Kivity
2010-03-17  9:51 ` Sheng Yang
2010-03-17 10:06 ` Avi Kivity
2010-03-17 21:14 ` Zachary Amsden
2010-03-18  1:19 ` Sheng Yang
2010-03-18  4:50 ` Zachary Amsden
2010-03-18  5:22 ` Sheng Yang
2010-03-18  5:41 ` Sheng Yang
2010-03-18  8:47 ` Zachary Amsden
2010-03-19  3:38 ` Zhang, Yanmin
2010-03-19  8:21 ` Ingo Molnar
2010-03-19 17:29 ` Joerg Roedel
2010-03-21 18:43 ` Ingo Molnar
2010-03-22 10:14 ` Joerg Roedel
2010-03-22 10:37 ` Ingo Molnar
2010-03-22 10:59 ` Ingo Molnar
2010-03-22 11:47 ` Joerg Roedel
2010-03-22 12:26 ` Ingo Molnar
2010-03-23 13:18 ` Soeren Sandmann
2010-03-23 13:49 ` Andi Kleen
2010-03-23 14:04 ` Soeren Sandmann
2010-03-23 14:20 ` Andi Kleen
2010-03-23 14:29 ` Arnaldo Carvalho de Melo
2010-03-23 14:46 ` Frank Ch. Eigler
2010-03-23 14:10 ` Arnaldo Carvalho de Melo
2010-03-23 15:23 ` Peter Zijlstra
2010-03-22  7:24 ` Zhang, Yanmin
2010-03-22 16:44 ` Arnaldo Carvalho de Melo
2010-03-23  3:14 ` Zhang, Yanmin
2010-03-23 13:15 ` Arnaldo Carvalho de Melo
2010-03-24  1:39 ` Zhang, Yanmin