Subject: [PATCH] Enhance perf to collect KVM guest os statistics from host side
From: Zhang, Yanmin @ 2010-03-16  5:27 UTC
  To: Ingo Molnar, Peter Zijlstra
  Cc: Avi Kivity, Sheng Yang, linux-kernel, kvm, Marcelo Tosatti,
	Joerg Roedel, Jes Sorensen, Gleb Natapov, Zachary Amsden,
	ziteng.huang

From: Zhang, Yanmin <yanmin_zhang@linux.intel.com>

Based on the discussion in the KVM community, I worked out this patch to let
perf collect guest os statistics from the host side. The patch was implemented
with kind help from Ingo, Peter and some other guys. Yang Sheng pointed out a
critical bug, and he and others provided good suggestions. I really appreciate
their kind help.

The patch adds a new subcommand, kvm, to perf:

  perf kvm top
  perf kvm record
  perf kvm report
  perf kvm diff

The new perf can profile the guest os kernel, but not guest os user space; it
can, however, summarize guest os user space utilization per guest os.

Below are some examples.
1) perf kvm top
[root@lkp-ne01 norm]# perf kvm --host --guest --guestkallsyms=/home/ymzhang/guest/kallsyms \
--guestmodules=/home/ymzhang/guest/modules top

--------------------------------------------------------------------------------------------------------------------------
   PerfTop:   16010 irqs/sec  kernel:59.1% us: 1.5% guest kernel:31.9% guest us: 7.5% exact:  0.0% [1000Hz cycles],  (all, 16 CPUs)
--------------------------------------------------------------------------------------------------------------------------

             samples  pcnt function                  DSO
             _______ _____ _________________________ _______________________

            38770.00 20.4% __ticket_spin_lock        [guest.kernel.kallsyms]
            22560.00 11.9% ftrace_likely_update      [kernel.kallsyms]
             9208.00  4.8% __lock_acquire            [kernel.kallsyms]
             5473.00  2.9% trace_hardirqs_off_caller [kernel.kallsyms]
             5222.00  2.7% copy_user_generic_string  [guest.kernel.kallsyms]
             4450.00  2.3% validate_chain            [kernel.kallsyms]
             4262.00  2.2% trace_hardirqs_on_caller  [kernel.kallsyms]
             4239.00  2.2% do_raw_spin_lock          [kernel.kallsyms]
             3548.00  1.9% do_raw_spin_unlock        [kernel.kallsyms]
             2487.00  1.3% lock_release              [kernel.kallsyms]
             2165.00  1.1% __local_bh_disable        [kernel.kallsyms]
             1905.00  1.0% check_chain_key           [kernel.kallsyms]
             1737.00  0.9% lock_acquire              [kernel.kallsyms]
             1604.00  0.8% tcp_recvmsg               [kernel.kallsyms]
             1524.00  0.8% mark_lock                 [kernel.kallsyms]
             1464.00  0.8% schedule                  [kernel.kallsyms]
             1423.00  0.7% __d_lookup                [guest.kernel.kallsyms]

If you want to show only host data, don't pass the --guest parameter.
The headline includes the guest os kernel and guest os user space percentages.
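As a rough illustration of how the headline is computed (the per-class sample
count below is back-derived from the headline above, not measured separately):
each field is simply that class's samples divided by all samples in the refresh
interval, as done in print_sym_table() in builtin-top.c. With about 16010
samples/sec in total, roughly 5100 of them tagged as guest kernel gives
100 * 5100 / 16010 ~= 31.9% for the "guest kernel" field.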

2) perf kvm record
[root@lkp-ne01 norm]# perf kvm --host --guest --guestkallsyms=/home/ymzhang/guest/kallsyms \
--guestmodules=/home/ymzhang/guest/modules record -f -a sleep 60
[ perf record: Woken up 15 times to write data ]
[ perf record: Captured and wrote 29.385 MB perf.data.kvm (~1283837 samples) ]

3) perf kvm report
	3.1) [root@lkp-ne01 norm]# perf kvm --host --guest --guestkallsyms=/home/ymzhang/guest/kallsyms \
--guestmodules=/home/ymzhang/guest/modules report --sort pid --showcpuutilization>norm.host.guest.report.pid
# Samples: 2453796285126
#
# Overhead  sys    us    guest sys    guest us            Command:  Pid
# ........  .....................
#
    43.67%     1.35%     0.01%    39.06%     3.26%  qemu-system-x86: 3913
     3.78%     3.58%     0.20%     0.00%     0.00%           tbench:13519
     3.69%     3.66%     0.03%     0.00%     0.00%       tbench_srv:13526

Some performance people want perf to show sys/us/guest_sys/guest_us percentages
per KVM guest instance, which is actually just a multi-threaded process. The
--showcpuutilization parameter used above does exactly that.

	3.2) [root@lkp-ne01 norm]# perf kvm --host --guest --guestkallsyms=/home/ymzhang/guest/kallsyms \
--guestmodules=/home/ymzhang/guest/modules report >norm.host.guest.report
# Samples: 2466991384118
#
# Overhead          Command                                                             Shared Object  Symbol
# ........  ...............  ........................................................................  ......
#
    29.11%  qemu-system-x86  [guest.kernel.kallsyms]                                                   [g] __ticket_spin_lock
     5.88%       tbench_srv  [kernel.kallsyms]                                                         [k] ftrace_likely_update
     5.76%           tbench  [kernel.kallsyms]                                                         [k] ftrace_likely_update
     3.88%  qemu-system-x86                                                                34c3255482  [u] 0x000034c3255482
     1.83%           tbench  [kernel.kallsyms]                                                         [k] __lock_acquire
     1.81%       tbench_srv  [kernel.kallsyms]                                                         [k] __lock_acquire
     1.38%       tbench_srv  [kernel.kallsyms]                                                         [k] trace_hardirqs_off_caller
     1.37%           tbench  [kernel.kallsyms]                                                         [k] trace_hardirqs_off_caller
     1.13%  qemu-system-x86  [guest.kernel.kallsyms]                                                   [g] copy_user_generic_string
     1.04%       tbench_srv  [kernel.kallsyms]                                                         [k] validate_chain
     1.00%           tbench  [kernel.kallsyms]                                                         [k] trace_hardirqs_on_caller
     1.00%       tbench_srv  [kernel.kallsyms]                                                         [k] trace_hardirqs_on_caller
     0.95%           tbench  [kernel.kallsyms]                                                         [k] do_raw_spin_lock


[u] means the sample is in guest os user space and [g] means it is in the guest
os kernel; the other columns are self-explanatory. If a module such as [ext4]
shows up, it is a guest kernel module, because native host kernel modules are
reported with paths starting with something like /lib/modules/XXX.
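
4) perf kvm diff
The diff subcommand is wired up as well. No output was captured for it, so the
command lines below are only an illustrative sketch: when --guestkallsyms (or
--guestvmlinux) is given and no data files are named explicitly, perf kvm diff
defaults to comparing perf.data.host against perf.data.guest, i.e. the files
written by a host-only record run and a guest-only record run.

[root@lkp-ne01 norm]# perf kvm --host record -f -a sleep 60
[root@lkp-ne01 norm]# perf kvm --guest --guestkallsyms=/home/ymzhang/guest/kallsyms --guestmodules=/home/ymzhang/guest/modules record -f -a sleep 60
[root@lkp-ne01 norm]# perf kvm --guestkallsyms=/home/ymzhang/guest/kallsyms --guestmodules=/home/ymzhang/guest/modules diff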


Below is the patch, against the tip/master tree as of March 15th.

Signed-off-by: Zhang Yanmin <yanmin_zhang@linux.intel.com>

---

diff -Nraup linux-2.6_tipmaster0315/arch/x86/include/asm/perf_event.h linux-2.6_tipmaster0315_perfkvm/arch/x86/include/asm/perf_event.h
--- linux-2.6_tipmaster0315/arch/x86/include/asm/perf_event.h	2010-03-16 08:59:11.533288951 +0800
+++ linux-2.6_tipmaster0315_perfkvm/arch/x86/include/asm/perf_event.h	2010-03-16 09:01:09.972117272 +0800
@@ -143,17 +143,10 @@ extern void perf_events_lapic_init(void)
  */
 #define PERF_EFLAGS_EXACT	(1UL << 3)
 
-#define perf_misc_flags(regs)				\
-({	int misc = 0;					\
-	if (user_mode(regs))				\
-		misc |= PERF_RECORD_MISC_USER;		\
-	else						\
-		misc |= PERF_RECORD_MISC_KERNEL;	\
-	if (regs->flags & PERF_EFLAGS_EXACT)		\
-		misc |= PERF_RECORD_MISC_EXACT;		\
-	misc; })
-
-#define perf_instruction_pointer(regs)	((regs)->ip)
+struct pt_regs;
+extern unsigned long perf_instruction_pointer(struct pt_regs *regs);
+extern unsigned long perf_misc_flags(struct pt_regs *regs);
+#define perf_misc_flags(regs)	perf_misc_flags(regs)
 
 #else
 static inline void init_hw_perf_events(void)		{ }
diff -Nraup linux-2.6_tipmaster0315/arch/x86/include/asm/ptrace.h linux-2.6_tipmaster0315_perfkvm/arch/x86/include/asm/ptrace.h
--- linux-2.6_tipmaster0315/arch/x86/include/asm/ptrace.h	2010-03-16 08:59:11.701271925 +0800
+++ linux-2.6_tipmaster0315_perfkvm/arch/x86/include/asm/ptrace.h	2010-03-16 09:01:09.972117272 +0800
@@ -167,6 +167,15 @@ static inline int user_mode(struct pt_re
 #endif
 }
 
+static inline int user_mode_cs(u16 cs)
+{
+#ifdef CONFIG_X86_32
+	return (cs & SEGMENT_RPL_MASK) == USER_RPL;
+#else
+	return !!(cs & 3);
+#endif
+}
+
 static inline int user_mode_vm(struct pt_regs *regs)
 {
 #ifdef CONFIG_X86_32
diff -Nraup linux-2.6_tipmaster0315/arch/x86/kernel/cpu/perf_event.c linux-2.6_tipmaster0315_perfkvm/arch/x86/kernel/cpu/perf_event.c
--- linux-2.6_tipmaster0315/arch/x86/kernel/cpu/perf_event.c	2010-03-16 08:59:12.225267457 +0800
+++ linux-2.6_tipmaster0315_perfkvm/arch/x86/kernel/cpu/perf_event.c	2010-03-16 09:03:02.343617673 +0800
@@ -1707,3 +1707,30 @@ void perf_arch_fetch_caller_regs(struct 
 	local_save_flags(regs->flags);
 }
 EXPORT_SYMBOL_GPL(perf_arch_fetch_caller_regs);
+
+unsigned long perf_instruction_pointer(struct pt_regs *regs)
+{
+	unsigned long ip;
+	if (perf_guest_cbs && perf_guest_cbs->is_in_guest()) {
+		ip = perf_guest_cbs->get_guest_ip();
+	} else
+		ip = instruction_pointer(regs);
+	return ip;
+}
+
+unsigned long perf_misc_flags(struct pt_regs *regs)
+{
+	int misc = 0;
+	if (perf_guest_cbs && perf_guest_cbs->is_in_guest()) {
+		misc |= perf_guest_cbs->is_user_mode() ?
+			PERF_RECORD_MISC_GUEST_USER :
+			PERF_RECORD_MISC_GUEST_KERNEL;
+	} else
+		misc |= user_mode(regs) ? PERF_RECORD_MISC_USER :
+			PERF_RECORD_MISC_KERNEL;
+	if (regs->flags & PERF_EFLAGS_EXACT)
+		misc |= PERF_RECORD_MISC_EXACT;
+
+	return misc;
+}
+
diff -Nraup linux-2.6_tipmaster0315/arch/x86/kvm/vmx.c linux-2.6_tipmaster0315_perfkvm/arch/x86/kvm/vmx.c
--- linux-2.6_tipmaster0315/arch/x86/kvm/vmx.c	2010-03-16 08:59:11.825295404 +0800
+++ linux-2.6_tipmaster0315_perfkvm/arch/x86/kvm/vmx.c	2010-03-16 09:01:09.976084492 +0800
@@ -26,6 +26,7 @@
 #include <linux/sched.h>
 #include <linux/moduleparam.h>
 #include <linux/ftrace_event.h>
+#include <linux/perf_event.h>
 #include "kvm_cache_regs.h"
 #include "x86.h"
 
@@ -3632,6 +3633,43 @@ static void update_cr8_intercept(struct 
 	vmcs_write32(TPR_THRESHOLD, irr);
 }
 
+DEFINE_PER_CPU(int, kvm_in_guest) = {0};
+
+static void kvm_set_in_guest(void)
+{
+	percpu_write(kvm_in_guest, 1);
+}
+
+static int kvm_is_in_guest(void)
+{
+	return percpu_read(kvm_in_guest);
+}
+
+static int kvm_is_user_mode(void)
+{
+	int user_mode;
+	user_mode = user_mode_cs(vmcs_read16(GUEST_CS_SELECTOR));
+	return user_mode;
+}
+
+static unsigned long kvm_get_guest_ip(void)
+{
+	return vmcs_readl(GUEST_RIP);
+}
+
+static void kvm_reset_in_guest(void)
+{
+	if (percpu_read(kvm_in_guest))
+		percpu_write(kvm_in_guest, 0);
+}
+
+static struct perf_guest_info_callbacks kvm_guest_cbs = {
+	.is_in_guest 		= kvm_is_in_guest,
+	.is_user_mode		= kvm_is_user_mode,
+	.get_guest_ip		= kvm_get_guest_ip,
+	.reset_in_guest		= kvm_reset_in_guest,
+};
+
 static void vmx_complete_interrupts(struct vcpu_vmx *vmx)
 {
 	u32 exit_intr_info;
@@ -3653,8 +3691,11 @@ static void vmx_complete_interrupts(stru
 
 	/* We need to handle NMIs before interrupts are enabled */
 	if ((exit_intr_info & INTR_INFO_INTR_TYPE_MASK) == INTR_TYPE_NMI_INTR &&
-	    (exit_intr_info & INTR_INFO_VALID_MASK))
+		(exit_intr_info & INTR_INFO_VALID_MASK)) {
+		kvm_set_in_guest();
 		asm("int $2");
+		kvm_reset_in_guest();
+	}
 
 	idtv_info_valid = idt_vectoring_info & VECTORING_INFO_VALID_MASK;
 
@@ -4251,6 +4292,8 @@ static int __init vmx_init(void)
 	if (bypass_guest_pf)
 		kvm_mmu_set_nonpresent_ptes(~0xffeull, 0ull);
 
+	perf_register_guest_info_callbacks(&kvm_guest_cbs);
+
 	return 0;
 
 out3:
@@ -4266,6 +4309,8 @@ out:
 
 static void __exit vmx_exit(void)
 {
+	perf_unregister_guest_info_callbacks(&kvm_guest_cbs);
+
 	free_page((unsigned long)vmx_msr_bitmap_legacy);
 	free_page((unsigned long)vmx_msr_bitmap_longmode);
 	free_page((unsigned long)vmx_io_bitmap_b);
diff -Nraup linux-2.6_tipmaster0315/include/linux/perf_event.h linux-2.6_tipmaster0315_perfkvm/include/linux/perf_event.h
--- linux-2.6_tipmaster0315/include/linux/perf_event.h	2010-03-16 08:59:21.940168828 +0800
+++ linux-2.6_tipmaster0315_perfkvm/include/linux/perf_event.h	2010-03-16 09:01:09.976084492 +0800
@@ -288,11 +288,13 @@ struct perf_event_mmap_page {
 	__u64	data_tail;		/* user-space written tail */
 };
 
-#define PERF_RECORD_MISC_CPUMODE_MASK		(3 << 0)
+#define PERF_RECORD_MISC_CPUMODE_MASK		(7 << 0)
 #define PERF_RECORD_MISC_CPUMODE_UNKNOWN	(0 << 0)
 #define PERF_RECORD_MISC_KERNEL			(1 << 0)
 #define PERF_RECORD_MISC_USER			(2 << 0)
 #define PERF_RECORD_MISC_HYPERVISOR		(3 << 0)
+#define PERF_RECORD_MISC_GUEST_KERNEL		(4 << 0)
+#define PERF_RECORD_MISC_GUEST_USER		(5 << 0)
 
 #define PERF_RECORD_MISC_EXACT			(1 << 14)
 /*
@@ -446,6 +448,13 @@ enum perf_callchain_context {
 # include <asm/perf_event.h>
 #endif
 
+struct perf_guest_info_callbacks {
+	int (*is_in_guest) (void);
+	int (*is_user_mode) (void);
+	unsigned long (*get_guest_ip) (void);
+	void (*reset_in_guest) (void);
+};
+
 #ifdef CONFIG_HAVE_HW_BREAKPOINT
 #include <asm/hw_breakpoint.h>
 #endif
@@ -913,6 +922,12 @@ static inline void perf_event_mmap(struc
 		__perf_event_mmap(vma);
 }
 
+extern struct perf_guest_info_callbacks *perf_guest_cbs;
+extern int perf_register_guest_info_callbacks(
+		struct perf_guest_info_callbacks *);
+extern int perf_unregister_guest_info_callbacks(
+		struct perf_guest_info_callbacks *);
+
 extern void perf_event_comm(struct task_struct *tsk);
 extern void perf_event_fork(struct task_struct *tsk);
 
@@ -982,6 +997,11 @@ perf_sw_event(u32 event_id, u64 nr, int 
 static inline void
 perf_bp_event(struct perf_event *event, void *data)			{ }
 
+static inline int perf_register_guest_info_callbacks
+(struct perf_guest_info_callbacks *callbacks)		{ return 0; }
+static inline int perf_unregister_guest_info_callbacks
+(struct perf_guest_info_callbacks *callbacks)		{ return 0; }
+
 static inline void perf_event_mmap(struct vm_area_struct *vma)		{ }
 static inline void perf_event_comm(struct task_struct *tsk)		{ }
 static inline void perf_event_fork(struct task_struct *tsk)		{ }
diff -Nraup linux-2.6_tipmaster0315/kernel/perf_event.c linux-2.6_tipmaster0315_perfkvm/kernel/perf_event.c
--- linux-2.6_tipmaster0315/kernel/perf_event.c	2010-03-16 08:59:55.108431543 +0800
+++ linux-2.6_tipmaster0315_perfkvm/kernel/perf_event.c	2010-03-16 09:01:09.980084394 +0800
@@ -2796,6 +2796,27 @@ void perf_arch_fetch_caller_regs(struct 
 }
 
 /*
+ * We assume there is only KVM supporting the callbacks.
+ * Later on, we might change it to a list if there is
+ * another virtualization implementation supporting the callbacks.
+ */
+struct perf_guest_info_callbacks *perf_guest_cbs;
+
+int perf_register_guest_info_callbacks(struct perf_guest_info_callbacks * cbs)
+{
+	perf_guest_cbs = cbs;
+	return 0;
+}
+EXPORT_SYMBOL_GPL(perf_register_guest_info_callbacks);
+
+int perf_unregister_guest_info_callbacks(struct perf_guest_info_callbacks * cbs)
+{
+	perf_guest_cbs = NULL;
+	return 0;
+}
+EXPORT_SYMBOL_GPL(perf_unregister_guest_info_callbacks);
+
+/*
  * Output
  */
 static bool perf_output_space(struct perf_mmap_data *data, unsigned long tail,
@@ -3738,7 +3759,7 @@ void __perf_event_mmap(struct vm_area_st
 		.event_id  = {
 			.header = {
 				.type = PERF_RECORD_MMAP,
-				.misc = 0,
+				.misc = PERF_RECORD_MISC_USER,
 				/* .size */
 			},
 			/* .pid */
diff -Nraup linux-2.6_tipmaster0315/tools/perf/builtin-diff.c linux-2.6_tipmaster0315_perfkvm/tools/perf/builtin-diff.c
--- linux-2.6_tipmaster0315/tools/perf/builtin-diff.c	2010-03-16 08:59:54.736473543 +0800
+++ linux-2.6_tipmaster0315_perfkvm/tools/perf/builtin-diff.c	2010-03-16 10:13:14.620371938 +0800
@@ -33,7 +33,7 @@ static int perf_session__add_hist_entry(
 		return -ENOMEM;
 
 	if (hit)
-		he->count += count;
+		__perf_session__add_count(he, al, count);
 
 	return 0;
 }
@@ -225,6 +225,9 @@ int cmd_diff(int argc, const char **argv
 			input_new = argv[1];
 		} else
 			input_new = argv[0];
+	} else if (symbol_conf.guest_vmlinux_name || symbol_conf.guest_kallsyms) {
+		input_old = "perf.data.host";
+		input_new = "perf.data.guest";
 	}
 
 	symbol_conf.exclude_other = false;
diff -Nraup linux-2.6_tipmaster0315/tools/perf/builtin.h linux-2.6_tipmaster0315_perfkvm/tools/perf/builtin.h
--- linux-2.6_tipmaster0315/tools/perf/builtin.h	2010-03-16 08:59:54.692509868 +0800
+++ linux-2.6_tipmaster0315_perfkvm/tools/perf/builtin.h	2010-03-16 09:01:09.980084394 +0800
@@ -32,5 +32,6 @@ extern int cmd_version(int argc, const c
 extern int cmd_probe(int argc, const char **argv, const char *prefix);
 extern int cmd_kmem(int argc, const char **argv, const char *prefix);
 extern int cmd_lock(int argc, const char **argv, const char *prefix);
+extern int cmd_kvm(int argc, const char **argv, const char *prefix);
 
 #endif
diff -Nraup linux-2.6_tipmaster0315/tools/perf/builtin-kvm.c linux-2.6_tipmaster0315_perfkvm/tools/perf/builtin-kvm.c
--- linux-2.6_tipmaster0315/tools/perf/builtin-kvm.c	1970-01-01 08:00:00.000000000 +0800
+++ linux-2.6_tipmaster0315_perfkvm/tools/perf/builtin-kvm.c	2010-03-16 09:01:09.980084394 +0800
@@ -0,0 +1,123 @@
+#include "builtin.h"
+#include "perf.h"
+
+#include "util/util.h"
+#include "util/cache.h"
+#include "util/symbol.h"
+#include "util/thread.h"
+#include "util/header.h"
+#include "util/session.h"
+
+#include "util/parse-options.h"
+#include "util/trace-event.h"
+
+#include "util/debug.h"
+
+#include <sys/prctl.h>
+
+#include <semaphore.h>
+#include <pthread.h>
+#include <math.h>
+
+static char			*file_name = NULL;
+static char			name_buffer[256];
+
+int				perf_host = 1;
+int				perf_guest = 0;
+
+static const char * const kvm_usage[] = {
+	"perf kvm [<options>] {top|record|report|diff}",
+	NULL
+};
+
+static const struct option kvm_options[] = {
+	OPT_STRING('i', "input", &file_name, "file",
+		    "Input file name"),
+	OPT_STRING('o', "output", &file_name, "file",
+		    "Output file name"),
+	OPT_BOOLEAN(0, "guest", &perf_guest,
+		    "Collect guest os data"),
+	OPT_BOOLEAN(0, "host", &perf_host,
+		    "Collect guest os data"),
+	OPT_STRING(0, "guestvmlinux", &symbol_conf.guest_vmlinux_name, "file",
+		    "file saving guest os vmlinux"),
+	OPT_STRING(0, "guestkallsyms", &symbol_conf.guest_kallsyms, "file",
+		    "file saving guest os /proc/kallsyms"),
+	OPT_STRING(0, "guestmodules", &symbol_conf.guest_modules, "file",
+		    "file saving guest os /proc/modules"),
+	OPT_END()
+};
+
+static int __cmd_record(int argc, const char **argv)
+{
+	int rec_argc, i = 0, j;
+	const char **rec_argv;
+
+	rec_argc = argc + 2;
+	rec_argv = calloc(rec_argc + 1, sizeof(char *));
+	rec_argv[i++] = strdup("record");
+	rec_argv[i++] = strdup("-o");
+	rec_argv[i++] = strdup(file_name);
+	for (j = 1; j < argc; j++, i++)
+		rec_argv[i] = argv[j];
+
+	BUG_ON(i != rec_argc);
+
+	return cmd_record(i, rec_argv, NULL);
+}
+
+static int __cmd_report(int argc, const char **argv)
+{
+	int rec_argc, i = 0, j;
+	const char **rec_argv;
+
+	rec_argc = argc + 2;
+	rec_argv = calloc(rec_argc + 1, sizeof(char *));
+	rec_argv[i++] = strdup("report");
+	rec_argv[i++] = strdup("-i");
+	rec_argv[i++] = strdup(file_name);
+	for (j = 1; j < argc; j++, i++)
+		rec_argv[i] = argv[j];
+
+	BUG_ON(i != rec_argc);
+
+	return cmd_report(i, rec_argv, NULL);
+}
+
+int cmd_kvm(int argc, const char **argv, const char *prefix __used)
+{
+	perf_host = perf_guest = 0;
+
+	argc = parse_options(argc, argv, kvm_options, kvm_usage,
+			PARSE_OPT_STOP_AT_NON_OPTION);
+	if (!argc)
+		usage_with_options(kvm_usage, kvm_options);
+
+	if (!perf_host)
+		perf_guest = 1;
+
+	if (!file_name) {
+		if (perf_host && !perf_guest)
+			sprintf(name_buffer, "perf.data.host");
+		else if (!perf_host && perf_guest)
+			sprintf(name_buffer, "perf.data.guest");
+		else
+			sprintf(name_buffer, "perf.data.kvm");
+		file_name = name_buffer;
+	}
+
+	if (!strncmp(argv[0], "rec", 3)) {
+		return __cmd_record(argc, argv);
+	} else if (!strncmp(argv[0], "rep", 3)) {
+		return __cmd_report(argc, argv);
+	} else if (!strncmp(argv[0], "diff", 4)) {
+		return cmd_diff(argc, argv, NULL);
+	} else if (!strncmp(argv[0], "top", 3)) {
+		return cmd_top(argc, argv, NULL);
+	} else {
+		usage_with_options(kvm_usage, kvm_options);
+	}
+
+	return 0;
+}
+
diff -Nraup linux-2.6_tipmaster0315/tools/perf/builtin-record.c linux-2.6_tipmaster0315_perfkvm/tools/perf/builtin-record.c
--- linux-2.6_tipmaster0315/tools/perf/builtin-record.c	2010-03-16 08:59:54.896488489 +0800
+++ linux-2.6_tipmaster0315_perfkvm/tools/perf/builtin-record.c	2010-03-16 09:01:09.980084394 +0800
@@ -566,18 +566,58 @@ static int __cmd_record(int argc, const 
 	post_processing_offset = lseek(output, 0, SEEK_CUR);
 
 	err = event__synthesize_kernel_mmap(process_synthesized_event,
-					    session, "_text");
+					    session, "/proc/kallsyms",
+					    "kernel.kallsyms",
+					    session->vmlinux_maps,
+					    "_text", PERF_RECORD_MISC_KERNEL);
 	if (err < 0) {
 		pr_err("Couldn't record kernel reference relocation symbol.\n");
 		return err;
 	}
 
-	err = event__synthesize_modules(process_synthesized_event, session);
+	err = event__synthesize_modules(process_synthesized_event,
+				session,
+				&session->kmaps,
+				PERF_RECORD_MISC_KERNEL);
 	if (err < 0) {
 		pr_err("Couldn't record kernel reference relocation symbol.\n");
 		return err;
 	}
 
+	if (perf_guest) {
+		/*
+		 * For the guest kernel, when processing the record & report
+		 * subcommands, we synthesize the module mmap events before the
+		 * guest kernel mmap event and preload the dso, because guest
+		 * module symbols are loaded from the guest kallsyms instead of
+		 * /lib/modules/XXX/XXX. This avoids missing symbols when the
+		 * first sampled address falls in a module rather than the kernel.
+		 */
+		err = event__synthesize_modules(process_synthesized_event,
+				session,
+				&session->guest_kmaps,
+				PERF_RECORD_MISC_GUEST_KERNEL);
+		if (err < 0) {
+			pr_err("Couldn't record guest kernel reference relocation symbol.\n");
+			return err;
+		}
+
+		/*
+		 * We use _stext for the guest kernel because the guest kernel's
+		 * /proc/kallsyms has no _text.
+		 */
+		err = event__synthesize_kernel_mmap(process_synthesized_event,
+				session, symbol_conf.guest_kallsyms,
+				"guest.kernel.kallsyms",
+				session->guest_vmlinux_maps,
+				"_stext",
+				PERF_RECORD_MISC_GUEST_KERNEL);
+		if (err < 0) {
+			pr_err("Couldn't record guest kernel reference relocation symbol.\n");
+			return err;
+		}
+	}
+
 	if (!system_wide && profile_cpu == -1)
 		event__synthesize_thread(target_pid, process_synthesized_event,
 					 session);
diff -Nraup linux-2.6_tipmaster0315/tools/perf/builtin-report.c linux-2.6_tipmaster0315_perfkvm/tools/perf/builtin-report.c
--- linux-2.6_tipmaster0315/tools/perf/builtin-report.c	2010-03-16 08:59:54.760470652 +0800
+++ linux-2.6_tipmaster0315_perfkvm/tools/perf/builtin-report.c	2010-03-16 10:40:24.102800324 +0800
@@ -104,7 +104,7 @@ static int perf_session__add_hist_entry(
 		return -ENOMEM;
 
 	if (hit)
-		he->count += data->period;
+		__perf_session__add_count(he, al,  data->period);
 
 	if (symbol_conf.use_callchain) {
 		if (!hit)
@@ -428,6 +428,8 @@ static const struct option options[] = {
 		   "sort by key(s): pid, comm, dso, symbol, parent"),
 	OPT_BOOLEAN('P', "full-paths", &symbol_conf.full_paths,
 		    "Don't shorten the pathnames taking into account the cwd"),
+	OPT_BOOLEAN(0, "showcpuutilization", &symbol_conf.show_cpu_utilization,
+		    "Show sample percentage for different cpu modes"),
 	OPT_STRING('p', "parent", &parent_pattern, "regex",
 		   "regex filter to identify parent, see: '--sort parent'"),
 	OPT_BOOLEAN('x', "exclude-other", &symbol_conf.exclude_other,
diff -Nraup linux-2.6_tipmaster0315/tools/perf/builtin-top.c linux-2.6_tipmaster0315_perfkvm/tools/perf/builtin-top.c
--- linux-2.6_tipmaster0315/tools/perf/builtin-top.c	2010-03-16 08:59:54.760470652 +0800
+++ linux-2.6_tipmaster0315_perfkvm/tools/perf/builtin-top.c	2010-03-16 09:01:09.984084103 +0800
@@ -417,8 +417,9 @@ static double sym_weight(const struct sy
 }
 
 static long			samples;
-static long			userspace_samples;
+static long			kernel_samples, userspace_samples;
 static long			exact_samples;
+static long			guest_us_samples, guest_kernel_samples;
 static const char		CONSOLE_CLEAR[] = "^[[H^[[2J";
 
 static void __list_insert_active_sym(struct sym_entry *syme)
@@ -458,7 +459,10 @@ static void print_sym_table(void)
 	int printed = 0, j;
 	int counter, snap = !display_weighted ? sym_counter : 0;
 	float samples_per_sec = samples/delay_secs;
-	float ksamples_per_sec = (samples-userspace_samples)/delay_secs;
+	float ksamples_per_sec = kernel_samples/delay_secs;
+	float userspace_samples_per_sec = (userspace_samples)/delay_secs;
+	float guest_kernel_samples_per_sec = (guest_kernel_samples)/delay_secs;
+	float guest_us_samples_per_sec = (guest_us_samples)/delay_secs;
 	float esamples_percent = (100.0*exact_samples)/samples;
 	float sum_ksamples = 0.0;
 	struct sym_entry *syme, *n;
@@ -467,7 +471,8 @@ static void print_sym_table(void)
 	int sym_width = 0, dso_width = 0, max_dso_width;
 	const int win_width = winsize.ws_col - 1;
 
-	samples = userspace_samples = exact_samples = 0;
+	samples = userspace_samples = kernel_samples = exact_samples = 0;
+	guest_kernel_samples = guest_us_samples = 0;
 
 	/* Sort the active symbols */
 	pthread_mutex_lock(&active_symbols_lock);
@@ -498,10 +503,21 @@ static void print_sym_table(void)
 	puts(CONSOLE_CLEAR);
 
 	printf("%-*.*s\n", win_width, win_width, graph_dotted_line);
-	printf( "   PerfTop:%8.0f irqs/sec  kernel:%4.1f%%  exact: %4.1f%% [",
-		samples_per_sec,
-		100.0 - (100.0*((samples_per_sec-ksamples_per_sec)/samples_per_sec)),
-		esamples_percent);
+	if (!perf_guest) {
+		printf( "   PerfTop:%8.0f irqs/sec  kernel:%4.1f%%  exact: %4.1f%% [",
+			samples_per_sec,
+			100.0 - (100.0*((samples_per_sec-ksamples_per_sec)/samples_per_sec)),
+			esamples_percent);
+	} else {
+		printf( "   PerfTop:%8.0f irqs/sec  kernel:%4.1f%% us:%4.1f%%"
+			" guest kernel:%4.1f%% guest us:%4.1f%% exact: %4.1f%% [",
+			samples_per_sec,
+			100.0 - (100.0*((samples_per_sec-ksamples_per_sec)/samples_per_sec)),
+			100.0 - (100.0*((samples_per_sec-userspace_samples_per_sec)/samples_per_sec)),
+			100.0 - (100.0*((samples_per_sec-guest_kernel_samples_per_sec)/samples_per_sec)),
+			100.0 - (100.0*((samples_per_sec-guest_us_samples_per_sec)/samples_per_sec)),
+			esamples_percent);
+	}
 
 	if (nr_counters == 1 || !display_weighted) {
 		printf("%Ld", (u64)attrs[0].sample_period);
@@ -958,9 +974,20 @@ static void event__process_sample(const 
 			return;
 		break;
 	case PERF_RECORD_MISC_KERNEL:
+		++kernel_samples;
 		if (hide_kernel_symbols)
 			return;
 		break;
+	case PERF_RECORD_MISC_GUEST_KERNEL:
+		++guest_kernel_samples;
+		break;
+	case PERF_RECORD_MISC_GUEST_USER:
+		++guest_us_samples;
+		/*
+		 * TODO: we don't process guest user space samples from the
+		 * host side, except for simple counting.
+		 */
+		return;
 	default:
 		return;
 	}
diff -Nraup linux-2.6_tipmaster0315/tools/perf/Makefile linux-2.6_tipmaster0315_perfkvm/tools/perf/Makefile
--- linux-2.6_tipmaster0315/tools/perf/Makefile	2010-03-16 08:59:54.892460680 +0800
+++ linux-2.6_tipmaster0315_perfkvm/tools/perf/Makefile	2010-03-16 10:45:19.503860691 +0800
@@ -462,6 +462,7 @@ BUILTIN_OBJS += builtin-trace.o
 BUILTIN_OBJS += builtin-probe.o
 BUILTIN_OBJS += builtin-kmem.o
 BUILTIN_OBJS += builtin-lock.o
+BUILTIN_OBJS += builtin-kvm.o
 
 PERFLIBS = $(LIB_FILE)
 
diff -Nraup linux-2.6_tipmaster0315/tools/perf/perf.c linux-2.6_tipmaster0315_perfkvm/tools/perf/perf.c
--- linux-2.6_tipmaster0315/tools/perf/perf.c	2010-03-16 08:59:54.764469663 +0800
+++ linux-2.6_tipmaster0315_perfkvm/tools/perf/perf.c	2010-03-16 09:01:09.984084103 +0800
@@ -308,6 +308,7 @@ static void handle_internal_command(int 
 		{ "probe",	cmd_probe,	0 },
 		{ "kmem",	cmd_kmem,	0 },
 		{ "lock",	cmd_lock,	0 },
+		{ "kvm",	cmd_kvm,	0 },
 	};
 	unsigned int i;
 	static const char ext[] = STRIP_EXTENSION;
diff -Nraup linux-2.6_tipmaster0315/tools/perf/perf.h linux-2.6_tipmaster0315_perfkvm/tools/perf/perf.h
--- linux-2.6_tipmaster0315/tools/perf/perf.h	2010-03-16 08:59:54.896488489 +0800
+++ linux-2.6_tipmaster0315_perfkvm/tools/perf/perf.h	2010-03-16 09:01:10.000116335 +0800
@@ -133,4 +133,6 @@ struct ip_callchain {
 	u64 ips[0];
 };
 
+extern int perf_host, perf_guest;
+
 #endif
diff -Nraup linux-2.6_tipmaster0315/tools/perf/util/event.c linux-2.6_tipmaster0315_perfkvm/tools/perf/util/event.c
--- linux-2.6_tipmaster0315/tools/perf/util/event.c	2010-03-16 08:59:54.864459297 +0800
+++ linux-2.6_tipmaster0315_perfkvm/tools/perf/util/event.c	2010-03-16 09:45:19.660852164 +0800
@@ -112,7 +112,7 @@ static int event__synthesize_mmap_events
 		event_t ev = {
 			.header = {
 				.type = PERF_RECORD_MMAP,
-				.misc = 0, /* Just like the kernel, see kernel/perf_event.c __perf_event_mmap */
+				.misc = PERF_RECORD_MISC_USER, /* Just like the kernel, see kernel/perf_event.c __perf_event_mmap */
 			 },
 		};
 		int n;
@@ -158,11 +158,13 @@ static int event__synthesize_mmap_events
 }
 
 int event__synthesize_modules(event__handler_t process,
-			      struct perf_session *session)
+			      struct perf_session *session,
+			      struct map_groups *kmaps,
+			      unsigned int misc)
 {
 	struct rb_node *nd;
 
-	for (nd = rb_first(&session->kmaps.maps[MAP__FUNCTION]);
+	for (nd = rb_first(&kmaps->maps[MAP__FUNCTION]);
 	     nd; nd = rb_next(nd)) {
 		event_t ev;
 		size_t size;
@@ -173,7 +175,7 @@ int event__synthesize_modules(event__han
 
 		size = ALIGN(pos->dso->long_name_len + 1, sizeof(u64));
 		memset(&ev, 0, sizeof(ev));
-		ev.mmap.header.misc = 1; /* kernel uses 0 for user space maps, see kernel/perf_event.c __perf_event_mmap */
+		ev.mmap.header.misc = misc; /* kernel uses 0 for user space maps, see kernel/perf_event.c __perf_event_mmap */
 		ev.mmap.header.type = PERF_RECORD_MMAP;
 		ev.mmap.header.size = (sizeof(ev.mmap) -
 				        (sizeof(ev.mmap.filename) - size));
@@ -241,13 +243,17 @@ static int find_symbol_cb(void *arg, con
 
 int event__synthesize_kernel_mmap(event__handler_t process,
 				  struct perf_session *session,
-				  const char *symbol_name)
+				  const char *kallsyms_name,
+				  const char *mmap_name,
+				  struct map **maps,
+				  const char *symbol_name,
+				  unsigned int misc)
 {
 	size_t size;
 	event_t ev = {
 		.header = {
 			.type = PERF_RECORD_MMAP,
-			.misc = 1, /* kernel uses 0 for user space maps, see kernel/perf_event.c __perf_event_mmap */
+			.misc = misc, /* kernel uses PERF_RECORD_MISC_USER for user space maps, see kernel/perf_event.c __perf_event_mmap */
 		},
 	};
 	/*
@@ -257,16 +263,16 @@ int event__synthesize_kernel_mmap(event_
 	 */
 	struct process_symbol_args args = { .name = symbol_name, };
 
-	if (kallsyms__parse("/proc/kallsyms", &args, find_symbol_cb) <= 0)
+	if (kallsyms__parse(kallsyms_name, &args, find_symbol_cb) <= 0)
 		return -ENOENT;
 
 	size = snprintf(ev.mmap.filename, sizeof(ev.mmap.filename),
-			"[kernel.kallsyms.%s]", symbol_name) + 1;
+			"[%s.%s]", mmap_name, symbol_name) + 1;
 	size = ALIGN(size, sizeof(u64));
 	ev.mmap.header.size = (sizeof(ev.mmap) - (sizeof(ev.mmap.filename) - size));
 	ev.mmap.pgoff = args.start;
-	ev.mmap.start = session->vmlinux_maps[MAP__FUNCTION]->start;
-	ev.mmap.len   = session->vmlinux_maps[MAP__FUNCTION]->end - ev.mmap.start ;
+	ev.mmap.start = maps[MAP__FUNCTION]->start;
+	ev.mmap.len   = maps[MAP__FUNCTION]->end - ev.mmap.start ;
 
 	return process(&ev, session);
 }
@@ -320,19 +326,25 @@ int event__process_lost(event_t *self, s
 	return 0;
 }
 
-int event__process_mmap(event_t *self, struct perf_session *session)
+static void event_set_kernel_mmap_len(struct map **maps, event_t *self)
 {
-	struct thread *thread;
-	struct map *map;
+	maps[MAP__FUNCTION]->start = self->mmap.start;
+	maps[MAP__FUNCTION]->end   = self->mmap.start + self->mmap.len;
+	/*
+	 * Be a bit paranoid here, some perf.data file came with
+	 * a zero sized synthesized MMAP event for the kernel.
+	 */
+	if (maps[MAP__FUNCTION]->end == 0)
+		maps[MAP__FUNCTION]->end = ~0UL;
+}
 
-	dump_printf(" %d/%d: [%#Lx(%#Lx) @ %#Lx]: %s\n",
-		    self->mmap.pid, self->mmap.tid, self->mmap.start,
-		    self->mmap.len, self->mmap.pgoff, self->mmap.filename);
+static int __event__process_mmap(event_t *self, struct perf_session *session)
+{
+	struct map *map;
+	static const char kmmap_prefix[] = "[kernel.kallsyms.";
 
-	if (self->mmap.pid == 0) {
-		static const char kmmap_prefix[] = "[kernel.kallsyms.";
+	if (self->mmap.filename[0] == '/') {
 
-		if (self->mmap.filename[0] == '/') {
 			char short_module_name[1024];
 			char *name = strrchr(self->mmap.filename, '/'), *dot;
 
@@ -348,9 +360,10 @@ int event__process_mmap(event_t *self, s
 				 "[%.*s]", (int)(dot - name), name);
 			strxfrchar(short_module_name, '-', '_');
 
-			map = perf_session__new_module_map(session,
+			map = map_groups__new_module(&session->kmaps,
 							   self->mmap.start,
-							   self->mmap.filename);
+							   self->mmap.filename,
+							   0);
 			if (map == NULL)
 				goto out_problem;
 
@@ -373,22 +386,94 @@ int event__process_mmap(event_t *self, s
 			if (kernel == NULL)
 				goto out_problem;
 
-			kernel->kernel = 1;
-			if (__perf_session__create_kernel_maps(session, kernel) < 0)
+			kernel->kernel = DSO_TYPE_KERNEL;
+			if (__map_groups__create_kernel_maps(&session->kmaps,
+						session->vmlinux_maps, kernel) < 0)
 				goto out_problem;
 
-			session->vmlinux_maps[MAP__FUNCTION]->start = self->mmap.start;
-			session->vmlinux_maps[MAP__FUNCTION]->end   = self->mmap.start + self->mmap.len;
-			/*
-			 * Be a bit paranoid here, some perf.data file came with
-			 * a zero sized synthesized MMAP event for the kernel.
-			 */
-			if (session->vmlinux_maps[MAP__FUNCTION]->end == 0)
-				session->vmlinux_maps[MAP__FUNCTION]->end = ~0UL;
-
-			perf_session__set_kallsyms_ref_reloc_sym(session, symbol_name,
-								 self->mmap.pgoff);
+			event_set_kernel_mmap_len(session->vmlinux_maps, self);
+			perf_session__set_kallsyms_ref_reloc_sym(session->vmlinux_maps,
+							symbol_name,
+							self->mmap.pgoff);
 		}
+	return 0;
+
+out_problem:
+	return -1;
+}
+
+static int __event__process_guest_mmap(event_t *self, struct perf_session *session)
+{
+	struct map *map;
+
+	static const char kmmap_prefix[] = "[guest.kernel.kallsyms.";
+
+	if (memcmp(self->mmap.filename, kmmap_prefix,
+				sizeof(kmmap_prefix) - 1) == 0) {
+		const char *symbol_name = (self->mmap.filename +
+				sizeof(kmmap_prefix) - 1);
+		/*
+		 * Should be there already, from the build-id table in
+		 * the header.
+		 */
+		struct dso *kernel = __dsos__findnew(&dsos__guest_kernel,
+				"[guest.kernel.kallsyms]");
+		if (kernel == NULL)
+			goto out_problem;
+
+		kernel->kernel = DSO_TYPE_GUEST_KERNEL;
+		if (__map_groups__create_kernel_maps(&session->guest_kmaps,
+				session->guest_vmlinux_maps, kernel) < 0)
+			goto out_problem;
+
+		event_set_kernel_mmap_len(session->guest_vmlinux_maps, self);
+		perf_session__set_kallsyms_ref_reloc_sym(session->guest_vmlinux_maps,
+				symbol_name,
+				self->mmap.pgoff);
+		/*
+		 * preload dso of guest kernel and modules
+		 */
+		dso__load(kernel, session->guest_vmlinux_maps[MAP__FUNCTION], NULL);
+	} else if (self->mmap.filename[0] == '[') {
+		char *name;
+
+		map = map_groups__new_module(&session->guest_kmaps,
+				self->mmap.start,
+				self->mmap.filename,
+				1);
+		if (map == NULL)
+			goto out_problem;
+		name = strdup(self->mmap.filename);
+		if (name == NULL)
+			goto out_problem;
+
+		map->dso->short_name = name;
+		map->end = map->start + self->mmap.len;
+	}
+
+	return 0;
+out_problem:
+	return -1;
+}
+
+int event__process_mmap(event_t *self, struct perf_session *session)
+{
+	struct thread *thread;
+	struct map *map;
+	u8 cpumode = self->header.misc & PERF_RECORD_MISC_CPUMODE_MASK;
+	int ret;
+
+	dump_printf(" %d/%d: [%#Lx(%#Lx) @ %#Lx]: %s\n",
+			self->mmap.pid, self->mmap.tid, self->mmap.start,
+			self->mmap.len, self->mmap.pgoff, self->mmap.filename);
+
+	if (self->mmap.pid == 0) {
+		if (cpumode == PERF_RECORD_MISC_GUEST_KERNEL)
+			ret = __event__process_guest_mmap(self, session);
+		else
+			ret = __event__process_mmap(self, session);
+		if (ret < 0)
+			goto out_problem;
 		return 0;
 	}
 
@@ -441,15 +526,33 @@ void thread__find_addr_map(struct thread
 
 	al->thread = self;
 	al->addr = addr;
+	al->cpumode = cpumode;
 
-	if (cpumode == PERF_RECORD_MISC_KERNEL) {
+	if (cpumode == PERF_RECORD_MISC_KERNEL && perf_host) {
 		al->level = 'k';
 		mg = &session->kmaps;
-	} else if (cpumode == PERF_RECORD_MISC_USER)
+	} else if (cpumode == PERF_RECORD_MISC_USER && perf_host) {
 		al->level = '.';
-	else {
-		al->level = 'H';
+	} else if (cpumode == PERF_RECORD_MISC_GUEST_KERNEL && perf_guest) {
+		al->level = 'g';
+		mg = &session->guest_kmaps;
+	} else {
+		/* TODO: we don't support guest user space yet; might add it later */
+		if (cpumode == PERF_RECORD_MISC_GUEST_USER && perf_guest)
+			al->level = 'u';
+		else
+			al->level = 'H';
 		al->map = NULL;
+
+		if ((cpumode == PERF_RECORD_MISC_GUEST_USER ||
+			cpumode == PERF_RECORD_MISC_GUEST_KERNEL) &&
+			!perf_guest)
+			al->filtered = true;
+		if ((cpumode == PERF_RECORD_MISC_USER ||
+			cpumode == PERF_RECORD_MISC_KERNEL) &&
+			!perf_host)
+			al->filtered = true;
+
 		return;
 	}
 try_again:
@@ -464,10 +567,18 @@ try_again:
 		 * "[vdso]" dso, but for now lets use the old trick of looking
 		 * in the whole kernel symbol list.
 		 */
-		if ((long long)al->addr < 0 && mg != &session->kmaps) {
+		if ((long long)al->addr < 0 &&
+			mg != &session->kmaps &&
+			cpumode == PERF_RECORD_MISC_KERNEL) {
 			mg = &session->kmaps;
 			goto try_again;
 		}
+		if ((long long)al->addr < 0 &&
+				mg != &session->guest_kmaps &&
+				cpumode == PERF_RECORD_MISC_GUEST_KERNEL) {
+			mg = &session->guest_kmaps;
+			goto try_again;
+		}
 	} else
 		al->addr = al->map->map_ip(al->map, al->addr);
 }
@@ -513,6 +624,7 @@ int event__preprocess_sample(const event
 
 	dump_printf(" ... thread: %s:%d\n", thread->comm, thread->pid);
 
+	al->filtered = false;
 	thread__find_addr_location(thread, session, cpumode, MAP__FUNCTION,
 				   self->ip.ip, al, filter);
 	dump_printf(" ...... dso: %s\n",
@@ -536,7 +648,6 @@ int event__preprocess_sample(const event
 	    !strlist__has_entry(symbol_conf.sym_list, al->sym->name))
 		goto out_filtered;
 
-	al->filtered = false;
 	return 0;
 
 out_filtered:
diff -Nraup linux-2.6_tipmaster0315/tools/perf/util/event.h linux-2.6_tipmaster0315_perfkvm/tools/perf/util/event.h
--- linux-2.6_tipmaster0315/tools/perf/util/event.h	2010-03-16 08:59:54.856460879 +0800
+++ linux-2.6_tipmaster0315_perfkvm/tools/perf/util/event.h	2010-03-16 09:01:10.000116335 +0800
@@ -119,10 +119,17 @@ int event__synthesize_thread(pid_t pid, 
 void event__synthesize_threads(event__handler_t process,
 			       struct perf_session *session);
 int event__synthesize_kernel_mmap(event__handler_t process,
-				  struct perf_session *session,
-				  const char *symbol_name);
+				struct perf_session *session,
+				const char *kallsyms_name,
+				const char *mmap_name,
+				struct map **maps,
+				const char *symbol_name,
+				unsigned int misc);
+
 int event__synthesize_modules(event__handler_t process,
-			      struct perf_session *session);
+			      struct perf_session *session,
+			      struct map_groups *kmaps,
+			      unsigned int misc);
 
 int event__process_comm(event_t *self, struct perf_session *session);
 int event__process_lost(event_t *self, struct perf_session *session);
diff -Nraup linux-2.6_tipmaster0315/tools/perf/util/hist.c linux-2.6_tipmaster0315_perfkvm/tools/perf/util/hist.c
--- linux-2.6_tipmaster0315/tools/perf/util/hist.c	2010-03-16 08:59:54.880462306 +0800
+++ linux-2.6_tipmaster0315_perfkvm/tools/perf/util/hist.c	2010-03-16 10:44:18.228997471 +0800
@@ -8,6 +8,30 @@ struct callchain_param	callchain_param =
 	.min_percent = 0.5
 };
 
+void __perf_session__add_count(struct hist_entry *he,
+			struct addr_location *al,
+			u64 count)
+{
+	he->count += count;
+
+	switch (al->cpumode) {
+	case PERF_RECORD_MISC_KERNEL:
+		he->count_sys += count;
+		break;
+	case PERF_RECORD_MISC_USER:
+		he->count_us += count;
+		break;
+	case PERF_RECORD_MISC_GUEST_KERNEL:
+		he->count_guest_sys += count;
+		break;
+	case PERF_RECORD_MISC_GUEST_USER:
+		he->count_guest_us += count;
+		break;
+	default:
+		break;
+	}
+}
+
 /*
  * histogram, sorted on item, collects counts
  */
@@ -26,7 +50,6 @@ struct hist_entry *__perf_session__add_h
 		.sym	= al->sym,
 		.ip	= al->addr,
 		.level	= al->level,
-		.count	= count,
 		.parent = sym_parent,
 	};
 	int cmp;
@@ -48,6 +71,8 @@ struct hist_entry *__perf_session__add_h
 			p = &(*p)->rb_right;
 	}
 
+	__perf_session__add_count(&entry, al, count);
+
 	he = malloc(sizeof(*he));
 	if (!he)
 		return NULL;
@@ -462,7 +487,7 @@ size_t hist_entry__fprintf(struct hist_e
 			   u64 session_total)
 {
 	struct sort_entry *se;
-	u64 count, total;
+	u64 count, total, count_sys, count_us, count_guest_sys, count_guest_us;
 	const char *sep = symbol_conf.field_sep;
 	size_t ret;
 
@@ -472,15 +497,35 @@ size_t hist_entry__fprintf(struct hist_e
 	if (pair_session) {
 		count = self->pair ? self->pair->count : 0;
 		total = pair_session->events_stats.total;
+		count_sys = self->pair ? self->pair->count_sys : 0;
+		count_us = self->pair ? self->pair->count_us : 0;
+		count_guest_sys = self->pair ? self->pair->count_guest_sys : 0;
+		count_guest_us = self->pair ? self->pair->count_guest_us : 0;
 	} else {
 		count = self->count;
 		total = session_total;
+		count_sys = self->count_sys;
+		count_us = self->count_us;
+		count_guest_sys = self->count_guest_sys;
+		count_guest_us = self->count_guest_us;
 	}
 
-	if (total)
+	if (total) {
 		ret = percent_color_fprintf(fp, sep ? "%.2f" : "   %6.2f%%",
 					    (count * 100.0) / total);
-	else
+		if (symbol_conf.show_cpu_utilization) {
+			ret += percent_color_fprintf(fp, sep ? "%.2f" : "   %6.2f%%",
+					(count_sys * 100.0) / total);
+			ret += percent_color_fprintf(fp, sep ? "%.2f" : "   %6.2f%%",
+					(count_us * 100.0) / total);
+			if (perf_guest) {
+				ret += percent_color_fprintf(fp, sep ? "%.2f" : "   %6.2f%%",
+						(count_guest_sys * 100.0) / total);
+				ret += percent_color_fprintf(fp, sep ? "%.2f" : "   %6.2f%%",
+						(count_guest_us * 100.0) / total);
+			}
+		}
+	} else
 		ret = fprintf(fp, sep ? "%lld" : "%12lld ", count);
 
 	if (symbol_conf.show_nr_samples) {
@@ -576,6 +621,20 @@ size_t perf_session__fprintf_hists(struc
 			fputs("  Samples  ", fp);
 	}
 
+	if (symbol_conf.show_cpu_utilization) {
+		if (sep) {
+			ret += fprintf(fp, "%csys", *sep);
+			ret += fprintf(fp, "%cus", *sep);
+			ret += fprintf(fp, "%cguest sys", *sep);
+			ret += fprintf(fp, "%cguest us", *sep);
+		} else {
+			ret += fprintf(fp, "  sys  ");
+			ret += fprintf(fp, "  us  ");
+			ret += fprintf(fp, "  guest sys  ");
+			ret += fprintf(fp, "  guest us  ");
+		}
+	}
+
 	if (pair) {
 		if (sep)
 			ret += fprintf(fp, "%cDelta", *sep);
diff -Nraup linux-2.6_tipmaster0315/tools/perf/util/hist.h linux-2.6_tipmaster0315_perfkvm/tools/perf/util/hist.h
--- linux-2.6_tipmaster0315/tools/perf/util/hist.h	2010-03-16 08:59:54.868491838 +0800
+++ linux-2.6_tipmaster0315_perfkvm/tools/perf/util/hist.h	2010-03-16 10:11:24.744056043 +0800
@@ -12,6 +12,9 @@ struct addr_location;
 struct symbol;
 struct rb_root;
 
+void __perf_session__add_count(struct hist_entry *he,
+			struct addr_location *al,
+			u64 count);
 struct hist_entry *__perf_session__add_hist_entry(struct rb_root *hists,
 						  struct addr_location *al,
 						  struct symbol *parent,
diff -Nraup linux-2.6_tipmaster0315/tools/perf/util/session.c linux-2.6_tipmaster0315_perfkvm/tools/perf/util/session.c
--- linux-2.6_tipmaster0315/tools/perf/util/session.c	2010-03-16 08:59:54.888458734 +0800
+++ linux-2.6_tipmaster0315_perfkvm/tools/perf/util/session.c	2010-03-16 09:01:10.000116335 +0800
@@ -54,7 +54,12 @@ out_close:
 
 static inline int perf_session__create_kernel_maps(struct perf_session *self)
 {
-	return map_groups__create_kernel_maps(&self->kmaps, self->vmlinux_maps);
+	int ret;
+	ret = map_groups__create_kernel_maps(&self->kmaps, self->vmlinux_maps);
+	if (ret >= 0)
+		ret = map_groups__create_guest_kernel_maps(&self->guest_kmaps,
+				self->guest_vmlinux_maps);
+	return ret;
 }
 
 struct perf_session *perf_session__new(const char *filename, int mode, bool force)
@@ -77,6 +82,7 @@ struct perf_session *perf_session__new(c
 	self->cwdlen = 0;
 	self->unknown_events = 0;
 	map_groups__init(&self->kmaps);
+	map_groups__init(&self->guest_kmaps);
 
 	if (mode == O_RDONLY) {
 		if (perf_session__open(self, force) < 0)
@@ -356,7 +362,8 @@ int perf_header__read_build_ids(struct p
 		if (read(input, filename, len) != len)
 			goto out;
 
-		if (bev.header.misc & PERF_RECORD_MISC_KERNEL)
+		if ((bev.header.misc & PERF_RECORD_MISC_CPUMODE_MASK)
+			==  PERF_RECORD_MISC_KERNEL)
 			head = &dsos__kernel;
 
 		dso = __dsos__findnew(head, filename);
@@ -519,26 +526,33 @@ bool perf_session__has_traces(struct per
 	return true;
 }
 
-int perf_session__set_kallsyms_ref_reloc_sym(struct perf_session *self,
+int perf_session__set_kallsyms_ref_reloc_sym(struct map ** maps,
 					     const char *symbol_name,
 					     u64 addr)
 {
 	char *bracket;
 	enum map_type i;
+	struct ref_reloc_sym *ref;
 
-	self->ref_reloc_sym.name = strdup(symbol_name);
-	if (self->ref_reloc_sym.name == NULL)
+	ref = zalloc(sizeof(struct ref_reloc_sym));
+	if (ref == NULL)
 		return -ENOMEM;
 
-	bracket = strchr(self->ref_reloc_sym.name, ']');
+	ref->name = strdup(symbol_name);
+	if (ref->name == NULL) {
+		free(ref);
+		return -ENOMEM;
+	}
+
+	bracket = strchr(ref->name, ']');
 	if (bracket)
 		*bracket = '\0';
 
-	self->ref_reloc_sym.addr = addr;
+	ref->addr = addr;
 
 	for (i = 0; i < MAP__NR_TYPES; ++i) {
-		struct kmap *kmap = map__kmap(self->vmlinux_maps[i]);
-		kmap->ref_reloc_sym = &self->ref_reloc_sym;
+		struct kmap *kmap = map__kmap(maps[i]);
+		kmap->ref_reloc_sym = ref;
 	}
 
 	return 0;
diff -Nraup linux-2.6_tipmaster0315/tools/perf/util/session.h linux-2.6_tipmaster0315_perfkvm/tools/perf/util/session.h
--- linux-2.6_tipmaster0315/tools/perf/util/session.h	2010-03-16 08:59:54.768472278 +0800
+++ linux-2.6_tipmaster0315_perfkvm/tools/perf/util/session.h	2010-03-16 09:04:50.827525867 +0800
@@ -16,16 +16,17 @@ struct perf_session {
 	unsigned long		size;
 	unsigned long		mmap_window;
 	struct map_groups	kmaps;
+	struct map_groups	guest_kmaps;
 	struct rb_root		threads;
 	struct thread		*last_match;
 	struct map		*vmlinux_maps[MAP__NR_TYPES];
+	struct map		*guest_vmlinux_maps[MAP__NR_TYPES];
 	struct events_stats	events_stats;
 	struct rb_root		stats_by_id;
 	unsigned long		event_total[PERF_RECORD_MAX];
 	unsigned long		unknown_events;
 	struct rb_root		hists;
 	u64			sample_type;
-	struct ref_reloc_sym	ref_reloc_sym;
 	int			fd;
 	int			cwdlen;
 	char			*cwd;
@@ -67,26 +68,12 @@ bool perf_session__has_traces(struct per
 int perf_header__read_build_ids(struct perf_header *self, int input,
 				u64 offset, u64 file_size);
 
-int perf_session__set_kallsyms_ref_reloc_sym(struct perf_session *self,
+int perf_session__set_kallsyms_ref_reloc_sym(struct map ** maps,
 					     const char *symbol_name,
 					     u64 addr);
 
 void mem_bswap_64(void *src, int byte_size);
 
-static inline int __perf_session__create_kernel_maps(struct perf_session *self,
-						struct dso *kernel)
-{
-	return __map_groups__create_kernel_maps(&self->kmaps,
-						self->vmlinux_maps, kernel);
-}
-
-static inline struct map *
-	perf_session__new_module_map(struct perf_session *self,
-				     u64 start, const char *filename)
-{
-	return map_groups__new_module(&self->kmaps, start, filename);
-}
-
 #ifdef NO_NEWT_SUPPORT
 static inline void perf_session__browse_hists(struct rb_root *hists __used,
 					      u64 session_total __used,
diff -Nraup linux-2.6_tipmaster0315/tools/perf/util/sort.h linux-2.6_tipmaster0315_perfkvm/tools/perf/util/sort.h
--- linux-2.6_tipmaster0315/tools/perf/util/sort.h	2010-03-16 08:59:54.780505450 +0800
+++ linux-2.6_tipmaster0315_perfkvm/tools/perf/util/sort.h	2010-03-16 09:46:38.997734739 +0800
@@ -44,6 +44,10 @@ extern enum sort_type sort__first_dimens
 struct hist_entry {
 	struct rb_node		rb_node;
 	u64			count;
+	u64			count_sys;
+	u64			count_us;
+	u64			count_guest_sys;
+	u64			count_guest_us;
 	struct thread		*thread;
 	struct map		*map;
 	struct symbol		*sym;
diff -Nraup linux-2.6_tipmaster0315/tools/perf/util/symbol.c linux-2.6_tipmaster0315_perfkvm/tools/perf/util/symbol.c
--- linux-2.6_tipmaster0315/tools/perf/util/symbol.c	2010-03-16 08:59:54.784503211 +0800
+++ linux-2.6_tipmaster0315_perfkvm/tools/perf/util/symbol.c	2010-03-16 10:47:03.587519946 +0800
@@ -22,6 +22,8 @@ static void dsos__add(struct list_head *
 static struct map *map__new2(u64 start, struct dso *dso, enum map_type type);
 static int dso__load_kernel_sym(struct dso *self, struct map *map,
 				symbol_filter_t filter);
+static int dso__load_guest_kernel_sym(struct dso *self, struct map *map,
+			symbol_filter_t filter);
 static int vmlinux_path__nr_entries;
 static char **vmlinux_path;
 
@@ -172,6 +174,7 @@ struct dso *dso__new(const char *name)
 		self->loaded = 0;
 		self->sorted_by_name = 0;
 		self->has_build_id = 0;
+		self->kernel = DSO_TYPE_USER;
 	}
 
 	return self;
@@ -388,12 +391,9 @@ int kallsyms__parse(const char *filename
 		char *symbol_name;
 
 		line_len = getline(&line, &n, file);
-		if (line_len < 0)
+		if (line_len < 0 || !line)
 			break;
 
-		if (!line)
-			goto out_failure;
-
 		line[--line_len] = '\0'; /* \n */
 
 		len = hex2u64(line, &start);
@@ -445,6 +445,7 @@ static int map__process_kallsym_symbol(v
 	 * map__split_kallsyms, when we have split the maps per module
 	 */
 	symbols__insert(root, sym);
+
 	return 0;
 }
 
@@ -490,6 +491,15 @@ static int dso__split_kallsyms(struct ds
 			*module++ = '\0';
 
 			if (strcmp(curr_map->dso->short_name, module)) {
+				if (curr_map != map &&
+					self->kernel == DSO_TYPE_GUEST_KERNEL) {
+					/*
+					 * We assume all symbols of a module are continuous in
+					 * kallsyms, so curr_map points to a module and all its
+					 * symbols are in its kmap. Mark it as loaded.
+					 */
+					dso__set_loaded(curr_map->dso, curr_map->type);
+				}
 				curr_map = map_groups__find_by_name(kmaps, map->type, module);
 				if (curr_map == NULL) {
 					pr_debug("/proc/{kallsyms,modules} "
@@ -511,13 +521,19 @@ static int dso__split_kallsyms(struct ds
 			char dso_name[PATH_MAX];
 			struct dso *dso;
 
-			snprintf(dso_name, sizeof(dso_name), "[kernel].%d",
-				 kernel_range++);
+			if (self->kernel == DSO_TYPE_GUEST_KERNEL)
+				snprintf(dso_name, sizeof(dso_name), "[guest.kernel].%d",
+						kernel_range++);
+			else
+				snprintf(dso_name, sizeof(dso_name), "[kernel].%d",
+						kernel_range++);
 
 			dso = dso__new(dso_name);
 			if (dso == NULL)
 				return -1;
 
+			dso->kernel = self->kernel;
+
 			curr_map = map__new2(pos->start, dso, map->type);
 			if (curr_map == NULL) {
 				dso__delete(dso);
@@ -541,6 +557,10 @@ discard_symbol:		rb_erase(&pos->rb_node,
 		}
 	}
 
+	if (curr_map != map &&
+		self->kernel == DSO_TYPE_GUEST_KERNEL)
+		dso__set_loaded(curr_map->dso, curr_map->type);
+
 	return count;
 }
 
@@ -551,7 +571,10 @@ int dso__load_kallsyms(struct dso *self,
 		return -1;
 
 	symbols__fixup_end(&self->symbols[map->type]);
-	self->origin = DSO__ORIG_KERNEL;
+	if (self->kernel == DSO_TYPE_GUEST_KERNEL)
+		self->origin = DSO__ORIG_GUEST_KERNEL;
+	else
+		self->origin = DSO__ORIG_KERNEL;
 
 	return dso__split_kallsyms(self, map, filter);
 }
@@ -939,7 +962,7 @@ static int dso__load_sym(struct dso *sel
 	nr_syms = shdr.sh_size / shdr.sh_entsize;
 
 	memset(&sym, 0, sizeof(sym));
-	if (!self->kernel) {
+	if (self->kernel == DSO_TYPE_USER) {
 		self->adjust_symbols = (ehdr.e_type == ET_EXEC ||
 				elf_section_by_name(elf, &ehdr, &shdr,
 						     ".gnu.prelink_undo",
@@ -971,7 +994,7 @@ static int dso__load_sym(struct dso *sel
 
 		section_name = elf_sec__name(&shdr, secstrs);
 
-		if (self->kernel || kmodule) {
+		if (self->kernel != DSO_TYPE_USER || kmodule) {
 			char dso_name[PATH_MAX];
 
 			if (strcmp(section_name,
@@ -997,6 +1020,7 @@ static int dso__load_sym(struct dso *sel
 				curr_dso = dso__new(dso_name);
 				if (curr_dso == NULL)
 					goto out_elf_end;
+				curr_dso->kernel = self->kernel;
 				curr_map = map__new2(start, curr_dso,
 						     map->type);
 				if (curr_map == NULL) {
@@ -1007,7 +1031,10 @@ static int dso__load_sym(struct dso *sel
 				curr_map->unmap_ip = identity__map_ip;
 				curr_dso->origin = self->origin;
 				map_groups__insert(kmap->kmaps, curr_map);
-				dsos__add(&dsos__kernel, curr_dso);
+				if (curr_dso->kernel == DSO_TYPE_GUEST_KERNEL)
+					dsos__add(&dsos__guest_kernel, curr_dso);
+				else
+					dsos__add(&dsos__kernel, curr_dso);
 				dso__set_loaded(curr_dso, map->type);
 			} else
 				curr_dso = curr_map->dso;
@@ -1228,6 +1255,8 @@ char dso__symtab_origin(const struct dso
 		[DSO__ORIG_BUILDID] =  'b',
 		[DSO__ORIG_DSO] =      'd',
 		[DSO__ORIG_KMODULE] =  'K',
+		[DSO__ORIG_GUEST_KERNEL] =  'g',
+		[DSO__ORIG_GUEST_KMODULE] =  'G',
 	};
 
 	if (self == NULL || self->origin == DSO__ORIG_NOT_FOUND)
@@ -1246,8 +1275,10 @@ int dso__load(struct dso *self, struct m
 
 	dso__set_loaded(self, map->type);
 
-	if (self->kernel)
+	if (self->kernel == DSO_TYPE_KERNEL)
 		return dso__load_kernel_sym(self, map, filter);
+	else if (self->kernel == DSO_TYPE_GUEST_KERNEL)
+		return dso__load_guest_kernel_sym(self, map, filter);
 
 	name = malloc(size);
 	if (!name)
@@ -1451,7 +1482,7 @@ static int map_groups__set_modules_path(
 static struct map *map__new2(u64 start, struct dso *dso, enum map_type type)
 {
 	struct map *self = zalloc(sizeof(*self) +
-				  (dso->kernel ? sizeof(struct kmap) : 0));
+			  (dso->kernel != DSO_TYPE_USER ? sizeof(struct kmap) : 0));
 	if (self != NULL) {
 		/*
 		 * ->end will be filled after we load all the symbols
@@ -1463,11 +1494,15 @@ static struct map *map__new2(u64 start, 
 }
 
 struct map *map_groups__new_module(struct map_groups *self, u64 start,
-				   const char *filename)
+				   const char *filename, int guest)
 {
 	struct map *map;
-	struct dso *dso = __dsos__findnew(&dsos__kernel, filename);
+	struct dso *dso;
 
+	if (!guest)
+		dso = __dsos__findnew(&dsos__kernel, filename);
+	else
+		dso = __dsos__findnew(&dsos__guest_kernel, filename);
 	if (dso == NULL)
 		return NULL;
 
@@ -1475,16 +1510,20 @@ struct map *map_groups__new_module(struc
 	if (map == NULL)
 		return NULL;
 
-	dso->origin = DSO__ORIG_KMODULE;
+	if (guest)
+		dso->origin = DSO__ORIG_GUEST_KMODULE;
+	else
+		dso->origin = DSO__ORIG_KMODULE;
 	map_groups__insert(self, map);
 	return map;
 }
 
-static int map_groups__create_modules(struct map_groups *self)
+static int __map_groups__create_modules(struct map_groups *self,
+			const char * filename, int guest)
 {
 	char *line = NULL;
 	size_t n;
-	FILE *file = fopen("/proc/modules", "r");
+	FILE *file = fopen(filename, "r");
 	struct map *map;
 
 	if (file == NULL)
@@ -1518,16 +1557,17 @@ static int map_groups__create_modules(st
 		*sep = '\0';
 
 		snprintf(name, sizeof(name), "[%s]", line);
-		map = map_groups__new_module(self, start, name);
+		map = map_groups__new_module(self, start, name, guest);
 		if (map == NULL)
 			goto out_delete_line;
-		dso__kernel_module_get_build_id(map->dso);
+		if (!guest)
+			dso__kernel_module_get_build_id(map->dso);
 	}
 
 	free(line);
 	fclose(file);
 
-	return map_groups__set_modules_path(self);
+	return 0;
 
 out_delete_line:
 	free(line);
@@ -1535,6 +1575,21 @@ out_failure:
 	return -1;
 }
 
+static int map_groups__create_modules(struct map_groups *self)
+{
+	int ret;
+
+	ret = __map_groups__create_modules(self, "/proc/modules", 0);
+	if (ret >= 0)
+		ret = map_groups__set_modules_path(self);
+	return ret;
+}
+
+static int map_groups__create_guest_modules(struct map_groups *self)
+{
+	return  __map_groups__create_modules(self, symbol_conf.guest_modules, 1);
+}
+
 static int dso__load_vmlinux(struct dso *self, struct map *map,
 			     const char *vmlinux, symbol_filter_t filter)
 {
@@ -1694,8 +1749,44 @@ out_fixup:
 	return err;
 }
 
+static int dso__load_guest_kernel_sym(struct dso *self, struct map *map,
+				symbol_filter_t filter)
+{
+	int err;
+	const char *kallsyms_filename;
+	/*
+	 * If the user specified a guest vmlinux filename, use it and only it,
+	 * reporting errors to the user if it cannot be used.  Otherwise fall
+	 * back to the guest kallsyms file given on the command line.
+	 */
+	if (symbol_conf.guest_vmlinux_name != NULL) {
+		err = dso__load_vmlinux(self, map,
+					symbol_conf.guest_vmlinux_name, filter);
+		goto out_try_fixup;
+	}
+
+	kallsyms_filename = symbol_conf.guest_kallsyms;
+	if (!kallsyms_filename)
+		return -1;
+	err = dso__load_kallsyms(self, kallsyms_filename, map, filter);
+	if (err > 0)
+		pr_debug("Using %s for symbols\n", kallsyms_filename);
+
+out_try_fixup:
+	if (err > 0) {
+		if (kallsyms_filename != NULL)
+			dso__set_long_name(self, strdup("[guest.kernel.kallsyms]"));
+		map__fixup_start(map);
+		map__fixup_end(map);
+	}
+
+	return err;
+}
+
 LIST_HEAD(dsos__user);
 LIST_HEAD(dsos__kernel);
+LIST_HEAD(dsos__guest_user);
+LIST_HEAD(dsos__guest_kernel);
 
 static void dsos__add(struct list_head *head, struct dso *dso)
 {
@@ -1742,6 +1833,8 @@ void dsos__fprintf(FILE *fp)
 {
 	__dsos__fprintf(&dsos__kernel, fp);
 	__dsos__fprintf(&dsos__user, fp);
+	__dsos__fprintf(&dsos__guest_kernel, fp);
+	__dsos__fprintf(&dsos__guest_user, fp);
 }
 
 static size_t __dsos__fprintf_buildid(struct list_head *head, FILE *fp,
@@ -1771,7 +1864,19 @@ struct dso *dso__new_kernel(const char *
 
 	if (self != NULL) {
 		self->short_name = "[kernel]";
-		self->kernel	 = 1;
+		self->kernel	 = DSO_TYPE_KERNEL;
+	}
+
+	return self;
+}
+
+struct dso *dso__new_guest_kernel(const char *name)
+{
+	struct dso *self = dso__new(name ?: "[guest.kernel.kallsyms]");
+
+	if (self != NULL) {
+		self->short_name = "[guest.kernel]";
+		self->kernel	 = DSO_TYPE_GUEST_KERNEL;
 	}
 
 	return self;
@@ -1796,6 +1901,15 @@ static struct dso *dsos__create_kernel(c
 	return kernel;
 }
 
+static struct dso *dsos__create_guest_kernel(const char *vmlinux)
+{
+	struct dso *kernel = dso__new_guest_kernel(vmlinux);
+
+	if (kernel != NULL)
+		dsos__add(&dsos__guest_kernel, kernel);
+	return kernel;
+}
+
 int __map_groups__create_kernel_maps(struct map_groups *self,
 				     struct map *vmlinux_maps[MAP__NR_TYPES],
 				     struct dso *kernel)
@@ -1955,3 +2069,24 @@ int map_groups__create_kernel_maps(struc
 	map_groups__fixup_end(self);
 	return 0;
 }
+
+int map_groups__create_guest_kernel_maps(struct map_groups *self,
+				   struct map *vmlinux_maps[MAP__NR_TYPES])
+{
+	struct dso *kernel = dsos__create_guest_kernel(symbol_conf.guest_vmlinux_name);
+
+	if (kernel == NULL)
+		return -1;
+
+	if (__map_groups__create_kernel_maps(self, vmlinux_maps, kernel) < 0)
+		return -1;
+
+	if (symbol_conf.use_modules && map_groups__create_guest_modules(self) < 0)
+		pr_debug("Problems creating module maps, continuing anyway...\n");
+	/*
+	 * Now that we have all the maps created, just set the ->end of them:
+	 */
+	map_groups__fixup_end(self);
+	return 0;
+}
+
diff -Nraup linux-2.6_tipmaster0315/tools/perf/util/symbol.h linux-2.6_tipmaster0315_perfkvm/tools/perf/util/symbol.h
--- linux-2.6_tipmaster0315/tools/perf/util/symbol.h	2010-03-16 08:59:54.880462306 +0800
+++ linux-2.6_tipmaster0315_perfkvm/tools/perf/util/symbol.h	2010-03-16 10:37:03.880361568 +0800
@@ -63,10 +63,14 @@ struct symbol_conf {
 			show_nr_samples,
 			use_callchain,
 			exclude_other,
-			full_paths;
+			full_paths,
+			show_cpu_utilization;
 	const char	*vmlinux_name,
 			*field_sep;
-	char            *dso_list_str,
+	const char	*guest_vmlinux_name,
+			*guest_kallsyms,
+			*guest_modules;
+	char		*dso_list_str,
 			*comm_list_str,
 			*sym_list_str,
 			*col_width_list_str;
@@ -95,6 +99,13 @@ struct addr_location {
 	u64	      addr;
 	char	      level;
 	bool	      filtered;
+	unsigned int  cpumode;
+};
+
+enum dso_kernel_type {
+	DSO_TYPE_USER = 0,
+	DSO_TYPE_KERNEL,
+	DSO_TYPE_GUEST_KERNEL
 };
 
 struct dso {
@@ -104,7 +115,7 @@ struct dso {
 	u8		 adjust_symbols:1;
 	u8		 slen_calculated:1;
 	u8		 has_build_id:1;
-	u8		 kernel:1;
+	enum dso_kernel_type	kernel;
 	u8		 hit:1;
 	u8		 annotate_warned:1;
 	unsigned char	 origin;
@@ -119,6 +130,7 @@ struct dso {
 
 struct dso *dso__new(const char *name);
 struct dso *dso__new_kernel(const char *name);
+struct dso *dso__new_guest_kernel(const char *name);
 void dso__delete(struct dso *self);
 
 bool dso__loaded(const struct dso *self, enum map_type type);
@@ -131,7 +143,7 @@ static inline void dso__set_loaded(struc
 
 void dso__sort_by_name(struct dso *self, enum map_type type);
 
-extern struct list_head dsos__user, dsos__kernel;
+extern struct list_head dsos__user, dsos__kernel, dsos__guest_user, dsos__guest_kernel;
 
 struct dso *__dsos__findnew(struct list_head *head, const char *name);
 
@@ -160,6 +172,8 @@ enum dso_origin {
 	DSO__ORIG_BUILDID,
 	DSO__ORIG_DSO,
 	DSO__ORIG_KMODULE,
+	DSO__ORIG_GUEST_KERNEL,
+	DSO__ORIG_GUEST_KMODULE,
 	DSO__ORIG_NOT_FOUND,
 };
 
diff -Nraup linux-2.6_tipmaster0315/tools/perf/util/thread.h linux-2.6_tipmaster0315_perfkvm/tools/perf/util/thread.h
--- linux-2.6_tipmaster0315/tools/perf/util/thread.h	2010-03-16 08:59:54.764469663 +0800
+++ linux-2.6_tipmaster0315_perfkvm/tools/perf/util/thread.h	2010-03-16 09:01:10.004081483 +0800
@@ -82,6 +82,9 @@ int __map_groups__create_kernel_maps(str
 int map_groups__create_kernel_maps(struct map_groups *self,
 				   struct map *vmlinux_maps[MAP__NR_TYPES]);
 
+int map_groups__create_guest_kernel_maps(struct map_groups *self,
+				   struct map *vmlinux_maps[MAP__NR_TYPES]);
+
 struct map *map_groups__new_module(struct map_groups *self, u64 start,
-				   const char *filename);
+				   const char *filename, int guest);
 #endif	/* __PERF_THREAD_H */




* Re: [PATCH] Enhance perf to collect KVM guest os statistics from host side
  2010-03-16  5:27 [PATCH] Enhance perf to collect KVM guest os statistics from host side Zhang, Yanmin
@ 2010-03-16  5:41 ` Avi Kivity
  2010-03-16  7:24   ` Ingo Molnar
  2010-03-16  7:48   ` Zhang, Yanmin
  2010-03-19  3:38 ` Zhang, Yanmin
  1 sibling, 2 replies; 390+ messages in thread
From: Avi Kivity @ 2010-03-16  5:41 UTC (permalink / raw)
  To: Zhang, Yanmin
  Cc: Ingo Molnar, Peter Zijlstra, Sheng Yang, linux-kernel, kvm,
	Marcelo Tosatti, oerg Roedel, Jes Sorensen, Gleb Natapov,
	Zachary Amsden, ziteng.huang

On 03/16/2010 07:27 AM, Zhang, Yanmin wrote:
> From: Zhang, Yanmin<yanmin_zhang@linux.intel.com>
>
> Based on the discussion in KVM community, I worked out the patch to support
> perf to collect guest os statistics from host side. This patch is implemented
> with Ingo, Peter and some other guys' kind help. Yang Sheng pointed out a
> critical bug and provided good suggestions with other guys. I really appreciate
> their kind help.
>
> The patch adds new subcommand kvm to perf.
>
>    perf kvm top
>    perf kvm record
>    perf kvm report
>    perf kvm diff
>
> The new perf could profile guest os kernel except guest os user space, but it
> could summarize guest os user space utilization per guest os.
>
> Below are some examples.
> 1) perf kvm top
> [root@lkp-ne01 norm]# perf kvm --host --guest --guestkallsyms=/home/ymzhang/guest/kallsyms
> --guestmodules=/home/ymzhang/guest/modules top
>
>    

Excellent, support for guest kernel != host kernel is critical (I can't 
remember the last time I ran same kernels).

How would we support multiple guests with different kernels?  Perhaps a 
symbol server that perf can connect to (and that would connect to guests 
in turn)?

> diff -Nraup linux-2.6_tipmaster0315/arch/x86/kvm/vmx.c linux-2.6_tipmaster0315_perfkvm/arch/x86/kvm/vmx.c
> --- linux-2.6_tipmaster0315/arch/x86/kvm/vmx.c	2010-03-16 08:59:11.825295404 +0800
> +++ linux-2.6_tipmaster0315_perfkvm/arch/x86/kvm/vmx.c	2010-03-16 09:01:09.976084492 +0800
> @@ -26,6 +26,7 @@
>   #include<linux/sched.h>
>   #include<linux/moduleparam.h>
>   #include<linux/ftrace_event.h>
> +#include<linux/perf_event.h>
>   #include "kvm_cache_regs.h"
>   #include "x86.h"
>
> @@ -3632,6 +3633,43 @@ static void update_cr8_intercept(struct
>   	vmcs_write32(TPR_THRESHOLD, irr);
>   }
>
> +DEFINE_PER_CPU(int, kvm_in_guest) = {0};
> +
> +static void kvm_set_in_guest(void)
> +{
> +	percpu_write(kvm_in_guest, 1);
> +}
> +
> +static int kvm_is_in_guest(void)
> +{
> +	return percpu_read(kvm_in_guest);
> +}
>    

There is already PF_VCPU for this.

> +static struct perf_guest_info_callbacks kvm_guest_cbs = {
> +	.is_in_guest 		= kvm_is_in_guest,
> +	.is_user_mode		= kvm_is_user_mode,
> +	.get_guest_ip		= kvm_get_guest_ip,
> +	.reset_in_guest		= kvm_reset_in_guest,
> +};
>    

Should be in common code, not vmx specific.

-- 
Do not meddle in the internals of kernels, for they are subtle and quick to panic.



* Re: [PATCH] Enhance perf to collect KVM guest os statistics from host side
  2010-03-16  5:41 ` Avi Kivity
@ 2010-03-16  7:24   ` Ingo Molnar
  2010-03-16  9:20     ` Avi Kivity
  2010-03-16  7:48   ` Zhang, Yanmin
  1 sibling, 1 reply; 390+ messages in thread
From: Ingo Molnar @ 2010-03-16  7:24 UTC (permalink / raw)
  To: Avi Kivity
  Cc: Zhang, Yanmin, Peter Zijlstra, Sheng Yang, linux-kernel, kvm,
	Marcelo Tosatti, oerg Roedel, Jes Sorensen, Gleb Natapov,
	Zachary Amsden, ziteng.huang


* Avi Kivity <avi@redhat.com> wrote:

> On 03/16/2010 07:27 AM, Zhang, Yanmin wrote:
> >From: Zhang, Yanmin<yanmin_zhang@linux.intel.com>
> >
> >Based on the discussion in KVM community, I worked out the patch to support
> >perf to collect guest os statistics from host side. This patch is implemented
> >with Ingo, Peter and some other guys' kind help. Yang Sheng pointed out a
> >critical bug and provided good suggestions with other guys. I really appreciate
> >their kind help.
> >
> >The patch adds new subcommand kvm to perf.
> >
> >   perf kvm top
> >   perf kvm record
> >   perf kvm report
> >   perf kvm diff
> >
> >The new perf could profile guest os kernel except guest os user space, but it
> >could summarize guest os user space utilization per guest os.
> >
> >Below are some examples.
> >1) perf kvm top
> >[root@lkp-ne01 norm]# perf kvm --host --guest --guestkallsyms=/home/ymzhang/guest/kallsyms
> >--guestmodules=/home/ymzhang/guest/modules top
> >
> 
> Excellent, support for guest kernel != host kernel is critical (I
> can't remember the last time I ran same kernels).
> 
> How would we support multiple guests with different kernels? Perhaps a 
> symbol server that perf can connect to (and that would connect to guests in 
> turn)?

The highest quality solution would be if KVM offered a 'guest extension' to 
the guest kernel's /proc/kallsyms that made it easy for user-space to get this 
information from an authoritative source.

That's the main reason why the host side /proc/kallsyms is so popular and so 
useful: while in theory it's mostly redundant information which can be gleaned 
from the System.map and other sources of symbol information, it's easily 
available and is _always_ trustable to come from the host kernel.

Separate System.map's have a tendency to go out of sync (or go missing when a 
devel kernel gets rebuilt, or if a devel package is not installed), and server 
ports (be that a TCP port space server or an UDP port space mount-point) are 
both a configuration hassle and are not guest-transparent.

So for instrumentation infrastructure (such as perf) we have a large and well 
founded preference for intrinsic, built-in, kernel-provided information: i.e. 
a largely 'built-in' and transparent mechanism to get to guest symbols.

Thanks,

	Ingo


* Re: [PATCH] Enhance perf to collect KVM guest os statistics from host side
  2010-03-16  5:41 ` Avi Kivity
  2010-03-16  7:24   ` Ingo Molnar
@ 2010-03-16  7:48   ` Zhang, Yanmin
  2010-03-16  9:28     ` Zhang, Yanmin
  2010-03-16  9:32     ` Avi Kivity
  1 sibling, 2 replies; 390+ messages in thread
From: Zhang, Yanmin @ 2010-03-16  7:48 UTC (permalink / raw)
  To: Avi Kivity
  Cc: Ingo Molnar, Peter Zijlstra, Sheng Yang, linux-kernel, kvm,
	Marcelo Tosatti, oerg Roedel, Jes Sorensen, Gleb Natapov,
	Zachary Amsden, ziteng.huang

On Tue, 2010-03-16 at 07:41 +0200, Avi Kivity wrote:
> On 03/16/2010 07:27 AM, Zhang, Yanmin wrote:
> > From: Zhang, Yanmin<yanmin_zhang@linux.intel.com>
> >
> > Based on the discussion in KVM community, I worked out the patch to support
> > perf to collect guest os statistics from host side. This patch is implemented
> > with Ingo, Peter and some other guys' kind help. Yang Sheng pointed out a
> > critical bug and provided good suggestions with other guys. I really appreciate
> > their kind help.
> >
> > The patch adds new subcommand kvm to perf.
> >
> >    perf kvm top
> >    perf kvm record
> >    perf kvm report
> >    perf kvm diff
> >
> > The new perf could profile guest os kernel except guest os user space, but it
> > could summarize guest os user space utilization per guest os.
> >
> > Below are some examples.
> > 1) perf kvm top
> > [root@lkp-ne01 norm]# perf kvm --host --guest --guestkallsyms=/home/ymzhang/guest/kallsyms
> > --guestmodules=/home/ymzhang/guest/modules top
> >
> >    
> 
Thanks for your kind comments.

> Excellent, support for guest kernel != host kernel is critical (I can't 
> remember the last time I ran same kernels).
> 
> How would we support multiple guests with different kernels?
With the patch, 'perf kvm report --sort pid' could show
summary statistics for all guest os instances. Then, use
the --pid parameter of 'perf kvm record' to collect data for a single problematic instance.

>   Perhaps a 
> symbol server that perf can connect to (and that would connect to guests 
> in turn)?

> 
> > diff -Nraup linux-2.6_tipmaster0315/arch/x86/kvm/vmx.c linux-2.6_tipmaster0315_perfkvm/arch/x86/kvm/vmx.c
> > --- linux-2.6_tipmaster0315/arch/x86/kvm/vmx.c	2010-03-16 08:59:11.825295404 +0800
> > +++ linux-2.6_tipmaster0315_perfkvm/arch/x86/kvm/vmx.c	2010-03-16 09:01:09.976084492 +0800
> > @@ -26,6 +26,7 @@
> >   #include<linux/sched.h>
> >   #include<linux/moduleparam.h>
> >   #include<linux/ftrace_event.h>
> > +#include<linux/perf_event.h>
> >   #include "kvm_cache_regs.h"
> >   #include "x86.h"
> >
> > @@ -3632,6 +3633,43 @@ static void update_cr8_intercept(struct
> >   	vmcs_write32(TPR_THRESHOLD, irr);
> >   }
> >
> > +DEFINE_PER_CPU(int, kvm_in_guest) = {0};
> > +
> > +static void kvm_set_in_guest(void)
> > +{
> > +	percpu_write(kvm_in_guest, 1);
> > +}
> > +
> > +static int kvm_is_in_guest(void)
> > +{
> > +	return percpu_read(kvm_in_guest);
> > +}
> >    
> 

> There is already PF_VCPU for this.
Right, but there is a window between kvm_guest_enter and actually running
in the guest os, where a perf event might overflow. Anyway, the window is very
narrow; I will change it to use the PF_VCPU flag.
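
(A minimal sketch of that PF_VCPU-based check, assuming the callback keeps
the signature used in the patch above; PF_VCPU comes from <linux/sched.h>
and is set by kvm_guest_enter():)

	static int kvm_is_in_guest(void)
	{
		/* PF_VCPU is set on the vcpu task while it runs guest code */
		return current->flags & PF_VCPU;
	}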

> 
> > +static struct perf_guest_info_callbacks kvm_guest_cbs = {
> > +	.is_in_guest 		= kvm_is_in_guest,
> > +	.is_user_mode		= kvm_is_user_mode,
> > +	.get_guest_ip		= kvm_get_guest_ip,
> > +	.reset_in_guest		= kvm_reset_in_guest,
> > +};
> >    
> 
> Should be in common code, not vmx specific.
Right. I discussed this with Yangsheng. I will move the above data structures and
callbacks to arch/x86/kvm/x86.c, and add get_ip as a new callback in
kvm_x86_ops.
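
(A rough sketch of the common-code version; the kvm_register_perf_callbacks()
wrapper name and perf_register_guest_info_callbacks() as the registration
entry point on the perf side are assumptions here, not part of the patch:)

	/* arch/x86/kvm/x86.c -- sketch only */
	static struct perf_guest_info_callbacks kvm_guest_cbs = {
		.is_in_guest	= kvm_is_in_guest,
		.is_user_mode	= kvm_is_user_mode,
		.get_guest_ip	= kvm_get_guest_ip,
		.reset_in_guest	= kvm_reset_in_guest,
	};

	void kvm_register_perf_callbacks(void)
	{
		/* call once from kvm_arch_init() */
		perf_register_guest_info_callbacks(&kvm_guest_cbs);
	}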

Yanmin




* Re: [PATCH] Enhance perf to collect KVM guest os statistics from host side
  2010-03-16  7:24   ` Ingo Molnar
@ 2010-03-16  9:20     ` Avi Kivity
  2010-03-16  9:53       ` Ingo Molnar
  0 siblings, 1 reply; 390+ messages in thread
From: Avi Kivity @ 2010-03-16  9:20 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Zhang, Yanmin, Peter Zijlstra, Sheng Yang, linux-kernel, kvm,
	Marcelo Tosatti, oerg Roedel, Jes Sorensen, Gleb Natapov,
	Zachary Amsden, ziteng.huang

On 03/16/2010 09:24 AM, Ingo Molnar wrote:
> * Avi Kivity<avi@redhat.com>  wrote:
>
>    
>> On 03/16/2010 07:27 AM, Zhang, Yanmin wrote:
>>      
>>> From: Zhang, Yanmin<yanmin_zhang@linux.intel.com>
>>>
>>> Based on the discussion in KVM community, I worked out the patch to support
>>> perf to collect guest os statistics from host side. This patch is implemented
>>> with Ingo, Peter and some other guys' kind help. Yang Sheng pointed out a
>>> critical bug and provided good suggestions with other guys. I really appreciate
>>> their kind help.
>>>
>>> The patch adds new subcommand kvm to perf.
>>>
>>>    perf kvm top
>>>    perf kvm record
>>>    perf kvm report
>>>    perf kvm diff
>>>
>>> The new perf could profile guest os kernel except guest os user space, but it
>>> could summarize guest os user space utilization per guest os.
>>>
>>> Below are some examples.
>>> 1) perf kvm top
>>> [root@lkp-ne01 norm]# perf kvm --host --guest --guestkallsyms=/home/ymzhang/guest/kallsyms
>>> --guestmodules=/home/ymzhang/guest/modules top
>>>
>>>        
>> Excellent, support for guest kernel != host kernel is critical (I
>> can't remember the last time I ran same kernels).
>>
>> How would we support multiple guests with different kernels? Perhaps a
>> symbol server that perf can connect to (and that would connect to guests in
>> turn)?
>>      
> The highest quality solution would be if KVM offered a 'guest extension' to
> the guest kernel's /proc/kallsyms that made it easy for user-space to get this
> information from an authorative source.
>
> That's the main reason why the host side /proc/kallsyms is so popular and so
> useful: while in theory it's mostly redundant information which can be gleaned
> from the System.map and other sources of symbol information, it's easily
> available and is _always_ trustable to come from the host kernel.
>
> Separate System.map's have a tendency to go out of sync (or go missing when a
> devel kernel gets rebuilt, or if a devel package is not installed), and server
> ports (be that a TCP port space server or an UDP port space mount-point) are
> both a configuration hassle and are not guest-transparent.
>
> So for instrumentation infrastructure (such as perf) we have a large and well
> founded preference for intrinsic, built-in, kernel-provided information: i.e.
> a largely 'built-in' and transparent mechanism to get to guest symbols.
>    

The symbol server's client can certainly access the bits through vmchannel.

-- 
error compiling committee.c: too many arguments to function



* Re: [PATCH] Enhance perf to collect KVM guest os statistics from host side
  2010-03-16  7:48   ` Zhang, Yanmin
@ 2010-03-16  9:28     ` Zhang, Yanmin
  2010-03-16  9:33       ` Avi Kivity
  2010-03-16  9:47       ` Ingo Molnar
  2010-03-16  9:32     ` Avi Kivity
  1 sibling, 2 replies; 390+ messages in thread
From: Zhang, Yanmin @ 2010-03-16  9:28 UTC (permalink / raw)
  To: Avi Kivity
  Cc: Ingo Molnar, Peter Zijlstra, Sheng Yang, linux-kernel, kvm,
	Marcelo Tosatti, oerg Roedel, Jes Sorensen, Gleb Natapov,
	Zachary Amsden, zhiteng.huang

On Tue, 2010-03-16 at 15:48 +0800, Zhang, Yanmin wrote:
> On Tue, 2010-03-16 at 07:41 +0200, Avi Kivity wrote:
> > On 03/16/2010 07:27 AM, Zhang, Yanmin wrote:
> > > From: Zhang, Yanmin<yanmin_zhang@linux.intel.com>
> > >
> > > Based on the discussion in KVM community, I worked out the patch to support
> > > perf to collect guest os statistics from host side. This patch is implemented
> > > with Ingo, Peter and some other guys' kind help. Yang Sheng pointed out a
> > > critical bug and provided good suggestions with other guys. I really appreciate
> > > their kind help.
> > >
> > > The patch adds new subcommand kvm to perf.
> > >
> > >    perf kvm top
> > >    perf kvm record
> > >    perf kvm report
> > >    perf kvm diff
> > >
> > > The new perf could profile guest os kernel except guest os user space, but it
> > > could summarize guest os user space utilization per guest os.
> > >
> > > Below are some examples.
> > > 1) perf kvm top
> > > [root@lkp-ne01 norm]# perf kvm --host --guest --guestkallsyms=/home/ymzhang/guest/kallsyms
> > > --guestmodules=/home/ymzhang/guest/modules top
> > >
> > >    
> > 
> Thanks for your kind comments.
> 
> > Excellent, support for guest kernel != host kernel is critical (I can't 
> > remember the last time I ran same kernels).
> > 
> > How would we support multiple guests with different kernels?
> With the patch, 'perf kvm report --sort pid" could show
> summary statistics for all guest os instances. Then, use
> parameter --pid of 'perf kvm record' to collect single problematic instance data.
Sorry, I found that --pid currently selects not a whole process but a single thread (the main thread).

Ingo,

Is it possible to support a new parameter or extend --inherit, so 'perf record' and
'perf top' could collect data on all threads of a process when the process is running?

If not, I need to add a new (ugly) parameter, similar to --pid, to filter process
data in userspace.

Yanmin




* Re: [PATCH] Enhance perf to collect KVM guest os statistics from host side
  2010-03-16  7:48   ` Zhang, Yanmin
  2010-03-16  9:28     ` Zhang, Yanmin
@ 2010-03-16  9:32     ` Avi Kivity
  2010-03-17  2:34       ` Zhang, Yanmin
  1 sibling, 1 reply; 390+ messages in thread
From: Avi Kivity @ 2010-03-16  9:32 UTC (permalink / raw)
  To: Zhang, Yanmin
  Cc: Ingo Molnar, Peter Zijlstra, Sheng Yang, linux-kernel, kvm,
	Marcelo Tosatti, oerg Roedel, Jes Sorensen, Gleb Natapov,
	Zachary Amsden, ziteng.huang, Joerg Roedel

On 03/16/2010 09:48 AM, Zhang, Yanmin wrote:
>
>> Excellent, support for guest kernel != host kernel is critical (I can't
>> remember the last time I ran same kernels).
>>
>> How would we support multiple guests with different kernels?
>>      
> With the patch, 'perf kvm report --sort pid" could show
> summary statistics for all guest os instances. Then, use
> parameter --pid of 'perf kvm record' to collect single problematic instance data.
>    

That certainly works, though automatic association of guest data with 
guest symbols is friendlier.

>>> diff -Nraup linux-2.6_tipmaster0315/arch/x86/kvm/vmx.c linux-2.6_tipmaster0315_perfkvm/arch/x86/kvm/vmx.c
>>> --- linux-2.6_tipmaster0315/arch/x86/kvm/vmx.c	2010-03-16 08:59:11.825295404 +0800
>>> +++ linux-2.6_tipmaster0315_perfkvm/arch/x86/kvm/vmx.c	2010-03-16 09:01:09.976084492 +0800
>>> @@ -26,6 +26,7 @@
>>>    #include<linux/sched.h>
>>>    #include<linux/moduleparam.h>
>>>    #include<linux/ftrace_event.h>
>>> +#include<linux/perf_event.h>
>>>    #include "kvm_cache_regs.h"
>>>    #include "x86.h"
>>>
>>> @@ -3632,6 +3633,43 @@ static void update_cr8_intercept(struct
>>>    	vmcs_write32(TPR_THRESHOLD, irr);
>>>    }
>>>
>>> +DEFINE_PER_CPU(int, kvm_in_guest) = {0};
>>> +
>>> +static void kvm_set_in_guest(void)
>>> +{
>>> +	percpu_write(kvm_in_guest, 1);
>>> +}
>>> +
>>> +static int kvm_is_in_guest(void)
>>> +{
>>> +	return percpu_read(kvm_in_guest);
>>> +}
>>>
>>>        
>>      
>    
>> There is already PF_VCPU for this.
>>      
> Right, but there is a scope between kvm_guest_enter and really running
> in guest os, where a perf event might overflow. Anyway, the scope is very
> narrow, I will change it to use flag PF_VCPU.
>    

There is also a window between setting the flag and calling 'int $2' 
where an NMI might happen and be accounted incorrectly.

Perhaps separate the 'int $2' into a direct call into perf and another 
call for the rest of NMI handling.  I don't see how it would work on svm 
though - AFAICT the NMI is held whereas vmx swallows it.  I guess NMIs 
will be disabled until the next IRET so it isn't racy, just tricky.

>>> +static struct perf_guest_info_callbacks kvm_guest_cbs = {
>>> +	.is_in_guest 		= kvm_is_in_guest,
>>> +	.is_user_mode		= kvm_is_user_mode,
>>> +	.get_guest_ip		= kvm_get_guest_ip,
>>> +	.reset_in_guest		= kvm_reset_in_guest,
>>> +};
>>>
>>>        
>> Should be in common code, not vmx specific.
>>      
> Right. I discussed with Yangsheng. I will move above data structures and
> callbacks to file arch/x86/kvm/x86.c, and add get_ip, a new callback to
> kvm_x86_ops.
>    

You will need access to the vcpu pointer (kvm_rip_read() needs it); you
can put it in a percpu variable.  I guess if it's not NULL, you know
you're in a guest, so no need for PF_VCPU.
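
(A minimal sketch of that per-cpu approach; the variable name and the NULL
convention are assumptions, not an actual follow-up patch:)

	static DEFINE_PER_CPU(struct kvm_vcpu *, current_vcpu);

	static int kvm_is_in_guest(void)
	{
		/* non-NULL only while this cpu is between guest entry and exit */
		return __get_cpu_var(current_vcpu) != NULL;
	}

	static unsigned long kvm_get_guest_ip(void)
	{
		struct kvm_vcpu *vcpu = __get_cpu_var(current_vcpu);

		return vcpu ? kvm_rip_read(vcpu) : 0;
	}

The guest entry/exit paths (and the re-injected NMI path discussed above)
would then have to set and clear current_vcpu.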

-- 
error compiling committee.c: too many arguments to function



* Re: [PATCH] Enhance perf to collect KVM guest os statistics from host side
  2010-03-16  9:28     ` Zhang, Yanmin
@ 2010-03-16  9:33       ` Avi Kivity
  2010-03-16  9:47       ` Ingo Molnar
  1 sibling, 0 replies; 390+ messages in thread
From: Avi Kivity @ 2010-03-16  9:33 UTC (permalink / raw)
  To: Zhang, Yanmin
  Cc: Ingo Molnar, Peter Zijlstra, Sheng Yang, linux-kernel, kvm,
	Marcelo Tosatti, oerg Roedel, Jes Sorensen, Gleb Natapov,
	Zachary Amsden, zhiteng.huang

On 03/16/2010 11:28 AM, Zhang, Yanmin wrote:
> Sorry. I found currently --pid isn't process but a thread (main thread).
>
> Ingo,
>
> Is it possible to support a new parameter or extend --inherit, so 'perf record' and
> 'perf top' could collect data on all threads of a process when the process is running?
>    

That seems like a worthwhile addition regardless of this thread.  
Profile all current threads and any new ones.  It probably makes sense 
to call this --pid and rename the existing --pid to --thread.

-- 
error compiling committee.c: too many arguments to function



* Re: [PATCH] Enhance perf to collect KVM guest os statistics from host side
  2010-03-16  9:28     ` Zhang, Yanmin
  2010-03-16  9:33       ` Avi Kivity
@ 2010-03-16  9:47       ` Ingo Molnar
  2010-03-17  9:26         ` Zhang, Yanmin
  1 sibling, 1 reply; 390+ messages in thread
From: Ingo Molnar @ 2010-03-16  9:47 UTC (permalink / raw)
  To: Zhang, Yanmin
  Cc: Avi Kivity, Peter Zijlstra, Sheng Yang, linux-kernel, kvm,
	Marcelo Tosatti, oerg Roedel, Jes Sorensen, Gleb Natapov,
	Zachary Amsden, zhiteng.huang


* Zhang, Yanmin <yanmin_zhang@linux.intel.com> wrote:

> On Tue, 2010-03-16 at 15:48 +0800, Zhang, Yanmin wrote:
> > On Tue, 2010-03-16 at 07:41 +0200, Avi Kivity wrote:
> > > On 03/16/2010 07:27 AM, Zhang, Yanmin wrote:
> > > > From: Zhang, Yanmin<yanmin_zhang@linux.intel.com>
> > > >
> > > > Based on the discussion in KVM community, I worked out the patch to support
> > > > perf to collect guest os statistics from host side. This patch is implemented
> > > > with Ingo, Peter and some other guys' kind help. Yang Sheng pointed out a
> > > > critical bug and provided good suggestions with other guys. I really appreciate
> > > > their kind help.
> > > >
> > > > The patch adds new subcommand kvm to perf.
> > > >
> > > >    perf kvm top
> > > >    perf kvm record
> > > >    perf kvm report
> > > >    perf kvm diff
> > > >
> > > > The new perf could profile guest os kernel except guest os user space, but it
> > > > could summarize guest os user space utilization per guest os.
> > > >
> > > > Below are some examples.
> > > > 1) perf kvm top
> > > > [root@lkp-ne01 norm]# perf kvm --host --guest --guestkallsyms=/home/ymzhang/guest/kallsyms
> > > > --guestmodules=/home/ymzhang/guest/modules top
> > > >
> > > >    
> > > 
> > Thanks for your kind comments.
> > 
> > > Excellent, support for guest kernel != host kernel is critical (I can't 
> > > remember the last time I ran same kernels).
> > > 
> > > How would we support multiple guests with different kernels?
> > With the patch, 'perf kvm report --sort pid" could show
> > summary statistics for all guest os instances. Then, use
> > parameter --pid of 'perf kvm record' to collect single problematic instance data.
> Sorry. I found currently --pid isn't process but a thread (main thread).
> 
> Ingo,
> 
> Is it possible to support a new parameter or extend --inherit, so 'perf 
> record' and 'perf top' could collect data on all threads of a process when 
> the process is running?
> 
> If not, I need add a new ugly parameter which is similar to --pid to filter 
> out process data in userspace.

Yeah. For maximum utility i'd suggest extending --pid to include this, and
introducing --tid for the previous, limited-to-a-single-task functionality.

Most users would expect --pid to work like a 'late attach' - i.e. to work like 
strace -f or like a gdb attach.

	Ingo


* Re: [PATCH] Enhance perf to collect KVM guest os statistics from host side
  2010-03-16  9:20     ` Avi Kivity
@ 2010-03-16  9:53       ` Ingo Molnar
  2010-03-16 10:13         ` Avi Kivity
  0 siblings, 1 reply; 390+ messages in thread
From: Ingo Molnar @ 2010-03-16  9:53 UTC (permalink / raw)
  To: Avi Kivity
  Cc: Zhang, Yanmin, Peter Zijlstra, Sheng Yang, linux-kernel, kvm,
	Marcelo Tosatti, oerg Roedel, Jes Sorensen, Gleb Natapov,
	Zachary Amsden, ziteng.huang


* Avi Kivity <avi@redhat.com> wrote:

> On 03/16/2010 09:24 AM, Ingo Molnar wrote:
> >* Avi Kivity<avi@redhat.com>  wrote:
> >
> >>On 03/16/2010 07:27 AM, Zhang, Yanmin wrote:
> >>>From: Zhang, Yanmin<yanmin_zhang@linux.intel.com>
> >>>
> >>>Based on the discussion in KVM community, I worked out the patch to support
> >>>perf to collect guest os statistics from host side. This patch is implemented
> >>>with Ingo, Peter and some other guys' kind help. Yang Sheng pointed out a
> >>>critical bug and provided good suggestions with other guys. I really appreciate
> >>>their kind help.
> >>>
> >>>The patch adds new subcommand kvm to perf.
> >>>
> >>>   perf kvm top
> >>>   perf kvm record
> >>>   perf kvm report
> >>>   perf kvm diff
> >>>
> >>>The new perf could profile guest os kernel except guest os user space, but it
> >>>could summarize guest os user space utilization per guest os.
> >>>
> >>>Below are some examples.
> >>>1) perf kvm top
> >>>[root@lkp-ne01 norm]# perf kvm --host --guest --guestkallsyms=/home/ymzhang/guest/kallsyms
> >>>--guestmodules=/home/ymzhang/guest/modules top
> >>>
> >>Excellent, support for guest kernel != host kernel is critical (I
> >>can't remember the last time I ran same kernels).
> >>
> >>How would we support multiple guests with different kernels? Perhaps a
> >>symbol server that perf can connect to (and that would connect to guests in
> >>turn)?
> >The highest quality solution would be if KVM offered a 'guest extension' to
> >the guest kernel's /proc/kallsyms that made it easy for user-space to get this
> >information from an authorative source.
> >
> >That's the main reason why the host side /proc/kallsyms is so popular and so
> >useful: while in theory it's mostly redundant information which can be gleaned
> >from the System.map and other sources of symbol information, it's easily
> >available and is _always_ trustable to come from the host kernel.
> >
> >Separate System.map's have a tendency to go out of sync (or go missing when a
> >devel kernel gets rebuilt, or if a devel package is not installed), and server
> >ports (be that a TCP port space server or an UDP port space mount-point) are
> >both a configuration hassle and are not guest-transparent.
> >
> >So for instrumentation infrastructure (such as perf) we have a large and well
> >founded preference for intrinsic, built-in, kernel-provided information: i.e.
> >a largely 'built-in' and transparent mechanism to get to guest symbols.
> 
> The symbol server's client can certainly access the bits through vmchannel.

Ok, that would work i suspect.

Would be nice to have the symbol server in tools/perf/ and also make it easy 
to add it to the initrd via a .config switch or so.

That would have basically all of the advantages of being built into the kernel 
(availability, configurability, transparency, hackability), while having all 
the advantages of a user-space approach as well (flexibility, extensibility, 
robustness, ease of maintenance, etc.).

If only we had tools/xorg/ integrated via the initrd that way ;-)

Thanks,

	Ingo


* Re: [PATCH] Enhance perf to collect KVM guest os statistics from host side
  2010-03-16  9:53       ` Ingo Molnar
@ 2010-03-16 10:13         ` Avi Kivity
  2010-03-16 10:20           ` Ingo Molnar
  0 siblings, 1 reply; 390+ messages in thread
From: Avi Kivity @ 2010-03-16 10:13 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Zhang, Yanmin, Peter Zijlstra, Sheng Yang, linux-kernel, kvm,
	Marcelo Tosatti, oerg Roedel, Jes Sorensen, Gleb Natapov,
	Zachary Amsden, ziteng.huang

On 03/16/2010 11:53 AM, Ingo Molnar wrote:
> * Avi Kivity<avi@redhat.com>  wrote:
>
>    
>> On 03/16/2010 09:24 AM, Ingo Molnar wrote:
>>      
>>> * Avi Kivity<avi@redhat.com>   wrote:
>>>
>>>        
>>>> On 03/16/2010 07:27 AM, Zhang, Yanmin wrote:
>>>>          
>>>>> From: Zhang, Yanmin<yanmin_zhang@linux.intel.com>
>>>>>
>>>>> Based on the discussion in KVM community, I worked out the patch to support
>>>>> perf to collect guest os statistics from host side. This patch is implemented
>>>>> with Ingo, Peter and some other guys' kind help. Yang Sheng pointed out a
>>>>> critical bug and provided good suggestions with other guys. I really appreciate
>>>>> their kind help.
>>>>>
>>>>> The patch adds new subcommand kvm to perf.
>>>>>
>>>>>    perf kvm top
>>>>>    perf kvm record
>>>>>    perf kvm report
>>>>>    perf kvm diff
>>>>>
>>>>> The new perf could profile guest os kernel except guest os user space, but it
>>>>> could summarize guest os user space utilization per guest os.
>>>>>
>>>>> Below are some examples.
>>>>> 1) perf kvm top
>>>>> [root@lkp-ne01 norm]# perf kvm --host --guest --guestkallsyms=/home/ymzhang/guest/kallsyms
>>>>> --guestmodules=/home/ymzhang/guest/modules top
>>>>>
>>>>>            
>>>> Excellent, support for guest kernel != host kernel is critical (I
>>>> can't remember the last time I ran same kernels).
>>>>
>>>> How would we support multiple guests with different kernels? Perhaps a
>>>> symbol server that perf can connect to (and that would connect to guests in
>>>> turn)?
>>>>          
>>> The highest quality solution would be if KVM offered a 'guest extension' to
>>> the guest kernel's /proc/kallsyms that made it easy for user-space to get this
>>> information from an authorative source.
>>>
>>> That's the main reason why the host side /proc/kallsyms is so popular and so
>>> useful: while in theory it's mostly redundant information which can be gleaned
>>>        
>> >from the System.map and other sources of symbol information, it's easily
>>      
>>> available and is _always_ trustable to come from the host kernel.
>>>
>>> Separate System.map's have a tendency to go out of sync (or go missing when a
>>> devel kernel gets rebuilt, or if a devel package is not installed), and server
>>> ports (be that a TCP port space server or an UDP port space mount-point) are
>>> both a configuration hassle and are not guest-transparent.
>>>
>>> So for instrumentation infrastructure (such as perf) we have a large and well
>>> founded preference for intrinsic, built-in, kernel-provided information: i.e.
>>> a largely 'built-in' and transparent mechanism to get to guest symbols.
>>>        
>> The symbol server's client can certainly access the bits through vmchannel.
>>      
> Ok, that would work i suspect.
>
> Would be nice to have the symbol server in tools/perf/ and also make it easy
> to add it to the initrd via a .config switch or so.
>
> That would have basically all of the advantages of being built into the kernel
> (availability, configurability, transparency, hackability), while having all
> the advantages of a user-space approach as well (flexibility, extensibility,
> robustness, ease of maintenance, etc.).
>    

Note, I am not advocating building the vmchannel client into the host 
kernel.  While that makes everything simpler for the user, it increases 
the kernel footprint with all the disadvantages that come with that (any 
bug is converted into a host DoS or worse).

So, perf would connect to qemu via (say) a well-known unix domain 
socket, which would then talk to the guest kernel.
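
(A minimal sketch of what the perf-side connect could look like; the socket
path below is purely hypothetical and would have to be whatever well-known
location qemu ends up exposing:)

	#include <stdio.h>
	#include <string.h>
	#include <unistd.h>
	#include <sys/types.h>
	#include <sys/socket.h>
	#include <sys/un.h>

	/* open the (hypothetical) per-guest symbol channel exposed by qemu */
	static int guest_symbol_channel__open(pid_t qemu_pid)
	{
		struct sockaddr_un addr;
		int fd = socket(AF_UNIX, SOCK_STREAM, 0);

		if (fd < 0)
			return -1;
		memset(&addr, 0, sizeof(addr));
		addr.sun_family = AF_UNIX;
		snprintf(addr.sun_path, sizeof(addr.sun_path),
			 "/var/run/qemu/%d/symbols.sock", (int)qemu_pid);
		if (connect(fd, (struct sockaddr *)&addr, sizeof(addr)) < 0) {
			close(fd);
			return -1;
		}
		return fd;
	}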

I know you won't like it, we'll continue to disagree on this unfortunately.

-- 
error compiling committee.c: too many arguments to function



* Re: [PATCH] Enhance perf to collect KVM guest os statistics from host side
  2010-03-16 10:13         ` Avi Kivity
@ 2010-03-16 10:20           ` Ingo Molnar
  2010-03-16 10:40             ` Avi Kivity
  0 siblings, 1 reply; 390+ messages in thread
From: Ingo Molnar @ 2010-03-16 10:20 UTC (permalink / raw)
  To: Avi Kivity
  Cc: Zhang, Yanmin, Peter Zijlstra, Sheng Yang, linux-kernel, kvm,
	Marcelo Tosatti, oerg Roedel, Jes Sorensen, Gleb Natapov,
	Zachary Amsden, ziteng.huang


* Avi Kivity <avi@redhat.com> wrote:

> On 03/16/2010 11:53 AM, Ingo Molnar wrote:
> >* Avi Kivity<avi@redhat.com>  wrote:
> >
> >>On 03/16/2010 09:24 AM, Ingo Molnar wrote:
> >>>* Avi Kivity<avi@redhat.com>   wrote:
> >>>
> >>>>On 03/16/2010 07:27 AM, Zhang, Yanmin wrote:
> >>>>>From: Zhang, Yanmin<yanmin_zhang@linux.intel.com>
> >>>>>
> >>>>>Based on the discussion in KVM community, I worked out the patch to support
> >>>>>perf to collect guest os statistics from host side. This patch is implemented
> >>>>>with Ingo, Peter and some other guys' kind help. Yang Sheng pointed out a
> >>>>>critical bug and provided good suggestions with other guys. I really appreciate
> >>>>>their kind help.
> >>>>>
> >>>>>The patch adds new subcommand kvm to perf.
> >>>>>
> >>>>>   perf kvm top
> >>>>>   perf kvm record
> >>>>>   perf kvm report
> >>>>>   perf kvm diff
> >>>>>
> >>>>>The new perf could profile guest os kernel except guest os user space, but it
> >>>>>could summarize guest os user space utilization per guest os.
> >>>>>
> >>>>>Below are some examples.
> >>>>>1) perf kvm top
> >>>>>[root@lkp-ne01 norm]# perf kvm --host --guest --guestkallsyms=/home/ymzhang/guest/kallsyms
> >>>>>--guestmodules=/home/ymzhang/guest/modules top
> >>>>>
> >>>>Excellent, support for guest kernel != host kernel is critical (I
> >>>>can't remember the last time I ran same kernels).
> >>>>
> >>>>How would we support multiple guests with different kernels? Perhaps a
> >>>>symbol server that perf can connect to (and that would connect to guests in
> >>>>turn)?
> >>>The highest quality solution would be if KVM offered a 'guest extension' to
> >>>the guest kernel's /proc/kallsyms that made it easy for user-space to get this
> >>>information from an authorative source.
> >>>
> >>>That's the main reason why the host side /proc/kallsyms is so popular and so
> >>>useful: while in theory it's mostly redundant information which can be gleaned
> >>>from the System.map and other sources of symbol information, it's easily
> >>>available and is _always_ trustable to come from the host kernel.
> >>>
> >>>Separate System.map's have a tendency to go out of sync (or go missing when a
> >>>devel kernel gets rebuilt, or if a devel package is not installed), and server
> >>>ports (be that a TCP port space server or an UDP port space mount-point) are
> >>>both a configuration hassle and are not guest-transparent.
> >>>
> >>>So for instrumentation infrastructure (such as perf) we have a large and well
> >>>founded preference for intrinsic, built-in, kernel-provided information: i.e.
> >>>a largely 'built-in' and transparent mechanism to get to guest symbols.
> >>The symbol server's client can certainly access the bits through vmchannel.
> >Ok, that would work i suspect.
> >
> >Would be nice to have the symbol server in tools/perf/ and also make it easy
> >to add it to the initrd via a .config switch or so.
> >
> >That would have basically all of the advantages of being built into the kernel
> >(availability, configurability, transparency, hackability), while having all
> >the advantages of a user-space approach as well (flexibility, extensibility,
> >robustness, ease of maintenance, etc.).
> 
> Note, I am not advocating building the vmchannel client into the host 
> kernel. [...]

Neither am i. What i suggested was a user-space binary/executable built in 
tools/perf and put into the initrd.

That approach has the advantages i listed above, without having the 
disadvantages of in-kernel code you listed.

Thanks,

	Ingo


* Re: [PATCH] Enhance perf to collect KVM guest os statistics from host side
  2010-03-16 10:20           ` Ingo Molnar
@ 2010-03-16 10:40             ` Avi Kivity
  2010-03-16 10:50               ` Ingo Molnar
  0 siblings, 1 reply; 390+ messages in thread
From: Avi Kivity @ 2010-03-16 10:40 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Zhang, Yanmin, Peter Zijlstra, Sheng Yang, linux-kernel, kvm,
	Marcelo Tosatti, oerg Roedel, Jes Sorensen, Gleb Natapov,
	Zachary Amsden, ziteng.huang

On 03/16/2010 12:20 PM, Ingo Molnar wrote:
>>>>
>>>> The symbol server's client can certainly access the bits through vmchannel.
>>>>          
>>> Ok, that would work i suspect.
>>>
>>> Would be nice to have the symbol server in tools/perf/ and also make it easy
>>> to add it to the initrd via a .config switch or so.
>>>
>>> That would have basically all of the advantages of being built into the kernel
>>> (availability, configurability, transparency, hackability), while having all
>>> the advantages of a user-space approach as well (flexibility, extensibility,
>>> robustness, ease of maintenance, etc.).
>>>        
>> Note, I am not advocating building the vmchannel client into the host
>> kernel. [...]
>>      
> Neither am i. What i suggested was a user-space binary/executable built in
> tools/perf and put into the initrd.
>    

I'm confused - initrd seems to be guest-side.  I was talking about the 
host side.

For the guest, placing the symbol server in tools/ is reasonable.

-- 
error compiling committee.c: too many arguments to function



* Re: [PATCH] Enhance perf to collect KVM guest os statistics from host side
  2010-03-16 10:40             ` Avi Kivity
@ 2010-03-16 10:50               ` Ingo Molnar
  2010-03-16 11:10                 ` Avi Kivity
  0 siblings, 1 reply; 390+ messages in thread
From: Ingo Molnar @ 2010-03-16 10:50 UTC (permalink / raw)
  To: Avi Kivity
  Cc: Zhang, Yanmin, Peter Zijlstra, Sheng Yang, linux-kernel, kvm,
	Marcelo Tosatti, oerg Roedel, Jes Sorensen, Gleb Natapov,
	Zachary Amsden, ziteng.huang


* Avi Kivity <avi@redhat.com> wrote:

> On 03/16/2010 12:20 PM, Ingo Molnar wrote:
> >>>>
> >>>>The symbol server's client can certainly access the bits through vmchannel.
> >>>Ok, that would work i suspect.
> >>>
> >>>Would be nice to have the symbol server in tools/perf/ and also make it easy
> >>>to add it to the initrd via a .config switch or so.
> >>>
> >>>That would have basically all of the advantages of being built into the kernel
> >>>(availability, configurability, transparency, hackability), while having all
> >>>the advantages of a user-space approach as well (flexibility, extensibility,
> >>>robustness, ease of maintenance, etc.).
> >>Note, I am not advocating building the vmchannel client into the host
> >>kernel. [...]
> >Neither am i. What i suggested was a user-space binary/executable built in
> >tools/perf and put into the initrd.
> 
> I'm confused - initrd seems to be guest-side.  I was talking about the host 
> side.

The host side doesn't need much support - just some client capability in perf
itself. I suspect vmchannels are sufficiently flexible and configuration-free 
for such purposes? (i.e. like a filesystem in essence)

	Ingo


* Re: [PATCH] Enhance perf to collect KVM guest os statistics from host side
  2010-03-16 10:50               ` Ingo Molnar
@ 2010-03-16 11:10                 ` Avi Kivity
  2010-03-16 11:25                   ` Ingo Molnar
  0 siblings, 1 reply; 390+ messages in thread
From: Avi Kivity @ 2010-03-16 11:10 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Zhang, Yanmin, Peter Zijlstra, Sheng Yang, linux-kernel, kvm,
	Marcelo Tosatti, oerg Roedel, Jes Sorensen, Gleb Natapov,
	Zachary Amsden, ziteng.huang

On 03/16/2010 12:50 PM, Ingo Molnar wrote:
>
>> I'm confused - initrd seems to be guest-side.  I was talking about the host
>> side.
>>      
> host side doesnt need much support - just some client capability in perf
> itself. I suspect vmchannels are sufficiently flexible and configuration-free
> for such purposes? (i.e. like a filesystem in essence)
>    

I haven't followed vmchannel closely, but I think it is.  vmchannel is 
terminated in qemu on the host side, not in the host kernel.  So perf 
would need to connect to qemu.

-- 
error compiling committee.c: too many arguments to function



* Re: [PATCH] Enhance perf to collect KVM guest os statistics from host side
  2010-03-16 11:10                 ` Avi Kivity
@ 2010-03-16 11:25                   ` Ingo Molnar
  2010-03-16 12:21                     ` Avi Kivity
  2010-03-16 22:30                     ` [PATCH] Enhance perf to collect KVM guest os statistics from host side oerg Roedel
  0 siblings, 2 replies; 390+ messages in thread
From: Ingo Molnar @ 2010-03-16 11:25 UTC (permalink / raw)
  To: Avi Kivity
  Cc: Zhang, Yanmin, Peter Zijlstra, Sheng Yang, linux-kernel, kvm,
	Marcelo Tosatti, oerg Roedel, Jes Sorensen, Gleb Natapov,
	Zachary Amsden, ziteng.huang


* Avi Kivity <avi@redhat.com> wrote:

> On 03/16/2010 12:50 PM, Ingo Molnar wrote:
> >
> >>I'm confused - initrd seems to be guest-side.  I was talking about the host
> >>side.
> >host side doesnt need much support - just some client capability in perf
> >itself. I suspect vmchannels are sufficiently flexible and configuration-free
> >for such purposes? (i.e. like a filesystem in essence)
> 
> I haven't followed vmchannel closely, but I think it is.  vmchannel is 
> terminated in qemu on the host side, not in the host kernel.  So perf would 
> need to connect to qemu.

Hm, that sounds rather messy if we want to use it to basically expose kernel 
functionality in a guest/host unified way. Is the qemu process discoverable in 
some secure way? Can we trust it? Is there some proper tooling available to do 
it, or do we have to push it through 2-3 packages to get such a useful feature 
done?

( That is the general thought process by which many cross-discipline useful
  desktop/server features hit the bit bucket before having had any chance of
  being vetted by users, and why Linux sucks so much when it comes to feature
  integration and application usability. )

	Ingo


* Re: [PATCH] Enhance perf to collect KVM guest os statistics from host side
  2010-03-16 11:25                   ` Ingo Molnar
@ 2010-03-16 12:21                     ` Avi Kivity
  2010-03-16 12:29                       ` Ingo Molnar
  2010-03-16 22:30                     ` [PATCH] Enhance perf to collect KVM guest os statistics from host side oerg Roedel
  1 sibling, 1 reply; 390+ messages in thread
From: Avi Kivity @ 2010-03-16 12:21 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Zhang, Yanmin, Peter Zijlstra, Sheng Yang, linux-kernel, kvm,
	Marcelo Tosatti, oerg Roedel, Jes Sorensen, Gleb Natapov,
	Zachary Amsden, ziteng.huang

On 03/16/2010 01:25 PM, Ingo Molnar wrote:
>
>> I haven't followed vmchannel closely, but I think it is.  vmchannel is
>> terminated in qemu on the host side, not in the host kernel.  So perf would
>> need to connect to qemu.
>>      
> Hm, that sounds rather messy if we want to use it to basically expose kernel
> functionality in a guest/host unified way. Is the qemu process discoverable in
> some secure way?

We know its pid.

> Can we trust it?

No choice, it contains the guest address space.

> Is there some proper tooling available to do
> it, or do we have to push it through 2-3 packages to get such a useful feature
> done?
>    

libvirt manages qemu processes, but I don't think this should go through 
libvirt.  qemu can do this directly by opening a unix domain socket in a 
well-known place.

> ( That is the general thought process how many cross-discipline useful
>    desktop/server features hit the bit bucket before having had any chance of
>    being vetted by users, and why Linux sucks so much when it comes to feature
>    integration and application usability. )
>    

You can't solve everything in the kernel, even with a well populated tools/.

-- 
error compiling committee.c: too many arguments to function



* Re: [PATCH] Enhance perf to collect KVM guest os statistics from host side
  2010-03-16 12:21                     ` Avi Kivity
@ 2010-03-16 12:29                       ` Ingo Molnar
  2010-03-16 12:41                         ` Avi Kivity
  0 siblings, 1 reply; 390+ messages in thread
From: Ingo Molnar @ 2010-03-16 12:29 UTC (permalink / raw)
  To: Avi Kivity
  Cc: Zhang, Yanmin, Peter Zijlstra, Sheng Yang, linux-kernel, kvm,
	Marcelo Tosatti, oerg Roedel, Jes Sorensen, Gleb Natapov,
	Zachary Amsden, ziteng.huang


* Avi Kivity <avi@redhat.com> wrote:

> On 03/16/2010 01:25 PM, Ingo Molnar wrote:
> >
> >>I haven't followed vmchannel closely, but I think it is.  vmchannel is
> >>terminated in qemu on the host side, not in the host kernel.  So perf would
> >>need to connect to qemu.
> >Hm, that sounds rather messy if we want to use it to basically expose kernel
> >functionality in a guest/host unified way. Is the qemu process discoverable in
> >some secure way?
> 
> We know its pid.

How do i get a list of all 'guest instance PIDs', and what is the way to talk 
to Qemu?

> > Can we trust it?
> 
> No choice, it contains the guest address space.

I mean, i can trust a kernel service and i can trust /proc/kallsyms.

Can perf trust a random process claiming to be Qemu? What's the trust 
mechanism here?

> > Is there some proper tooling available to do it, or do we have to push it 
> > through 2-3 packages to get such a useful feature done?
> 
> libvirt manages qemu processes, but I don't think this should go through 
> libvirt.  qemu can do this directly by opening a unix domain socket in a 
> well-known place.

So Qemu has never run into such problems before?

( Sounds weird - i think Qemu configuration itself should be done via a 
  unix domain socket driven configuration protocol as well. )

> >( That is the general thought process how many cross-discipline useful
> >   desktop/server features hit the bit bucket before having had any chance of
> >   being vetted by users, and why Linux sucks so much when it comes to feature
> >   integration and application usability. )
> 
> You can't solve everything in the kernel, even with a well populated tools/.

Certainly not, but this is a technical problem in the kernel's domain, so it's 
a fair (and natural) expectation to be able to solve this within the kernel 
project.

	Ingo


* Re: [PATCH] Enhance perf to collect KVM guest os statistics from host side
  2010-03-16 12:29                       ` Ingo Molnar
@ 2010-03-16 12:41                         ` Avi Kivity
  2010-03-16 13:08                           ` Ingo Molnar
  0 siblings, 1 reply; 390+ messages in thread
From: Avi Kivity @ 2010-03-16 12:41 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Zhang, Yanmin, Peter Zijlstra, Sheng Yang, linux-kernel, kvm,
	Marcelo Tosatti, oerg Roedel, Jes Sorensen, Gleb Natapov,
	Zachary Amsden, ziteng.huang

On 03/16/2010 02:29 PM, Ingo Molnar wrote:
> * Avi Kivity<avi@redhat.com>  wrote:
>
>    
>> On 03/16/2010 01:25 PM, Ingo Molnar wrote:
>>      
>>>        
>>>> I haven't followed vmchannel closely, but I think it is.  vmchannel is
>>>> terminated in qemu on the host side, not in the host kernel.  So perf would
>>>> need to connect to qemu.
>>>>          
>>> Hm, that sounds rather messy if we want to use it to basically expose kernel
>>> functionality in a guest/host unified way. Is the qemu process discoverable in
>>> some secure way?
>>>        
>> We know its pid.
>>      
> How do i get a list of all 'guest instance PIDs',

Libvirt manages all qemus, but this should be implemented independently 
of libvirt.

> and what is the way to talk
> to Qemu?
>    

In general qemu exposes communication channels (such as the monitor) as 
tcp connections, unix-domain sockets, stdio, etc.  It's very flexible.

>>> Can we trust it?
>>>        
>> No choice, it contains the guest address space.
>>      
> I mean, i can trust a kernel service and i can trust /proc/kallsyms.
>
> Can perf trust a random process claiming to be Qemu? What's the trust
> mechanism here?
>    

Obviously you can't trust anything you get from a guest, no matter how 
you get it.

How do you trust a userspace program's symbols?  you don't.  How do you 
get them?  they're in a well-known location.

>>> Is there some proper tooling available to do it, or do we have to push it
>>> through 2-3 packages to get such a useful feature done?
>>>        
>> libvirt manages qemu processes, but I don't think this should go through
>> libvirt.  qemu can do this directly by opening a unix domain socket in a
>> well-known place.
>>      
> So Qemu has never run into such problems before?
>
> ( Sounds weird - i think Qemu configuration itself should be done via a
>    unix domain socket driven configuration protocol as well. )
>    

That's exactly what happens.  You invoke qemu with -monitor 
unix:blah,server (or -qmp for a machine-readable format) and have your 
management application connect to that.  You can redirect guest serial 
ports, console, parallel port, etc. to unix-domain or tcp sockets.  
vmchannel is an extension of that mechanism.


>>> ( That is the general thought process how many cross-discipline useful
>>>    desktop/server features hit the bit bucket before having had any chance of
>>>    being vetted by users, and why Linux sucks so much when it comes to feature
>>>    integration and application usability. )
>>>        
>> You can't solve everything in the kernel, even with a well populated tools/.
>>      
> Certainly not, but this is a technical problem in the kernel's domain, so it's
> a fair (and natural) expectation to be able to solve this within the kernel
> project.
>    

Someone writing perf-gui outside the kernel would have the same 
problems, no?

-- 
error compiling committee.c: too many arguments to function



* Re: [PATCH] Enhance perf to collect KVM guest os statistics from host side
  2010-03-16 12:41                         ` Avi Kivity
@ 2010-03-16 13:08                           ` Ingo Molnar
  2010-03-16 13:16                             ` Avi Kivity
  2010-03-16 17:06                             ` Anthony Liguori
  0 siblings, 2 replies; 390+ messages in thread
From: Ingo Molnar @ 2010-03-16 13:08 UTC (permalink / raw)
  To: Avi Kivity
  Cc: Zhang, Yanmin, Peter Zijlstra, Sheng Yang, linux-kernel, kvm,
	Marcelo Tosatti, oerg Roedel, Jes Sorensen, Gleb Natapov,
	Zachary Amsden, ziteng.huang


* Avi Kivity <avi@redhat.com> wrote:

> On 03/16/2010 02:29 PM, Ingo Molnar wrote:

> > I mean, i can trust a kernel service and i can trust /proc/kallsyms.
> >
> > Can perf trust a random process claiming to be Qemu? What's the trust 
> > mechanism here?
> 
> Obviously you can't trust anything you get from a guest, no matter how you 
> get it.

I'm not talking about the symbol strings and addresses, and the object 
contents for allocation (or debuginfo). I'm talking about the basic protocol 
of establishing which guest is which.

I.e. we really want users to be able to:

 1) have it all working with a single guest, without having to specify 'which' 
    guest (qemu PID) to work with. That is the dominant usecase both for 
    developers and for a fair portion of testers.

 2) Have some reasonable symbolic identification for guests. For example a 
    usable approach would be to have 'perf kvm list', which would list all 
    currently active guests:

     $ perf kvm list
       [1] Fedora
       [2] OpenSuse
       [3] Windows-XP
       [4] Windows-7

    And from that point on 'perf kvm -g OpenSuse record' would do the obvious 
    thing. Users will be able to just use the 'OpenSuse' symbolic name for 
    that guest, even if the guest got restarted and switched its main PID.

Any such facility needs trusted enumeration and a protocol where i can trust 
that the information i got is authoritative. (I.e. 'OpenSuse' truly matches
the OpenSuse session - not some local user starting up a Qemu instance that
claims to be 'OpenSuse'.)

Is such a scheme possible/available? I suspect all the KVM configuration tools 
(i haven't used them in some time - gui and command-line tools alike) use 
similar methods to ease guest management?

	Ingo


* Re: [PATCH] Enhance perf to collect KVM guest os statistics from host side
  2010-03-16 13:08                           ` Ingo Molnar
@ 2010-03-16 13:16                             ` Avi Kivity
  2010-03-16 13:31                               ` Ingo Molnar
  2010-03-16 17:06                             ` Anthony Liguori
  1 sibling, 1 reply; 390+ messages in thread
From: Avi Kivity @ 2010-03-16 13:16 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Zhang, Yanmin, Peter Zijlstra, Sheng Yang, linux-kernel, kvm,
	Marcelo Tosatti, oerg Roedel, Jes Sorensen, Gleb Natapov,
	Zachary Amsden, ziteng.huang

On 03/16/2010 03:08 PM, Ingo Molnar wrote:
>
>>> I mean, i can trust a kernel service and i can trust /proc/kallsyms.
>>>
>>> Can perf trust a random process claiming to be Qemu? What's the trust
>>> mechanism here?
>>>        
>> Obviously you can't trust anything you get from a guest, no matter how you
>> get it.
>>      
> I'm not talking about the symbol strings and addresses, and the object
> contents for allocation (or debuginfo). I'm talking about the basic protocol
> of establishing which guest is which.
>    

There is none.  So far, qemu only dealt with managing just its own 
guest, and left all multiple guest management to higher levels up the 
stack (like libvirt).

> I.e. we really want users to be able to:
>
>   1) have it all working with a single guest, without having to specify 'which'
>      guest (qemu PID) to work with. That is the dominant usecase both for
>      developers and for a fair portion of testers.
>    

That's reasonable if we can get it working simply.

>   2) Have some reasonable symbolic identification for guests. For example a
>      usable approach would be to have 'perf kvm list', which would list all
>      currently active guests:
>
>       $ perf kvm list
>         [1] Fedora
>         [2] OpenSuse
>         [3] Windows-XP
>         [4] Windows-7
>
>      And from that point on 'perf kvm -g OpenSuse record' would do the obvious
>      thing. Users will be able to just use the 'OpenSuse' symbolic name for
>      that guest, even if the guest got restarted and switched its main PID.
>
> Any such facility needs trusted enumeration and a protocol where i can trust
> that the information i got is authoritative. (I.e. 'OpenSuse' truly matches to
> the OpenSuse session - not to some local user starting up a Qemu instance that
> claims to be 'OpenSuse'.)
>
> Is such a scheme possible/available? I suspect all the KVM configuration tools
> (i havent used them in some time - gui and command-line tools alike) use
> similar methods to ease guest management?
>    

You can do that through libvirt, but that only works for guests started 
through libvirt.  libvirt provides command-line tools to list and manage 
guests (for example autostarting them on startup), and tools built on 
top of libvirt can manage guests graphically.

Looks like we have a layer inversion here.  Maybe we need a plugin 
system - libvirt drops a .so into perf that teaches it how to list 
guests and get their symbols.
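
A purely illustrative sketch of the dlopen() plumbing such a plugin could
use -- the plugin path, the guest_desc layout and the list_guests symbol
below are invented for the sketch and are not anything perf or libvirt
provide today:

/* build: gcc -Wall plugin-sketch.c -ldl -o plugin-sketch */
#include <dlfcn.h>
#include <stdio.h>
#include <stdlib.h>

struct guest_desc {
	int	id;
	char	name[64];
	int	pid;		/* qemu PID currently backing this guest */
};

/* symbol the management-tool plugin would export:
   fill in up to 'max' guests, return how many are running */
typedef int (*list_guests_fn)(struct guest_desc *guests, int max);

int main(void)
{
	struct guest_desc guests[16];
	list_guests_fn list_guests;
	void *plugin;
	int i, n;

	plugin = dlopen("/usr/lib/perf/plugins/guest-list.so", RTLD_NOW);
	if (!plugin) {
		fprintf(stderr, "no guest-enumeration plugin: %s\n", dlerror());
		return EXIT_FAILURE;
	}

	list_guests = (list_guests_fn)dlsym(plugin, "list_guests");
	if (!list_guests) {
		fprintf(stderr, "plugin lacks list_guests: %s\n", dlerror());
		dlclose(plugin);
		return EXIT_FAILURE;
	}

	n = list_guests(guests, 16);
	for (i = 0; i < n; i++)
		printf("[%d] %s (pid %d)\n", guests[i].id,
		       guests[i].name, guests[i].pid);

	dlclose(plugin);
	return 0;
}

A 'perf kvm list' built that way would just be a thin wrapper around
whatever plugin happens to be installed, and fails gracefully when none is.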

-- 
error compiling committee.c: too many arguments to function


^ permalink raw reply	[flat|nested] 390+ messages in thread

* Re: [PATCH] Enhance perf to collect KVM guest os statistics from host side
  2010-03-16 13:16                             ` Avi Kivity
@ 2010-03-16 13:31                               ` Ingo Molnar
  2010-03-16 13:37                                 ` Avi Kivity
  2010-03-16 15:06                                 ` Frank Ch. Eigler
  0 siblings, 2 replies; 390+ messages in thread
From: Ingo Molnar @ 2010-03-16 13:31 UTC (permalink / raw)
  To: Avi Kivity
  Cc: Zhang, Yanmin, Peter Zijlstra, Sheng Yang, linux-kernel, kvm,
	Marcelo Tosatti, oerg Roedel, Jes Sorensen, Gleb Natapov,
	Zachary Amsden, ziteng.huang


* Avi Kivity <avi@redhat.com> wrote:

> On 03/16/2010 03:08 PM, Ingo Molnar wrote:
> >
> >>>I mean, i can trust a kernel service and i can trust /proc/kallsyms.
> >>>
> >>>Can perf trust a random process claiming to be Qemu? What's the trust
> >>>mechanism here?
> >>Obviously you can't trust anything you get from a guest, no matter how you
> >>get it.
> >I'm not talking about the symbol strings and addresses, and the object
> >contents for allocation (or debuginfo). I'm talking about the basic protocol
> >of establishing which guest is which.
> 
> There is none.  So far, qemu only dealt with managing just its own
> guest, and left all multiple guest management to higher levels up
> the stack (like libvirt).
> 
> >I.e. we really want users to be able to:
> >
> >  1) have it all working with a single guest, without having to specify 'which'
> >     guest (qemu PID) to work with. That is the dominant usecase both for
> >     developers and for a fair portion of testers.
> 
> That's reasonable if we can get it working simply.

IMO such ease of use is reasonable and required, full stop.

If it cannot be gotten simply then that's a bug: either in the code, or in the 
design, or in the development process that led to the design. Bugs need 
fixing.

> >  2) Have some reasonable symbolic identification for guests. For example a
> >     usable approach would be to have 'perf kvm list', which would list all
> >     currently active guests:
> >
> >      $ perf kvm list
> >        [1] Fedora
> >        [2] OpenSuse
> >        [3] Windows-XP
> >        [4] Windows-7
> >
> >     And from that point on 'perf kvm -g OpenSuse record' would do the obvious
> >     thing. Users will be able to just use the 'OpenSuse' symbolic name for
> >     that guest, even if the guest got restarted and switched its main PID.
> >
> > Any such facility needs trusted enumeration and a protocol where i can 
> > trust that the information i got is authoritative. (I.e. 'OpenSuse' truly 
> > matches to the OpenSuse session - not to some local user starting up a 
> > Qemu instance that claims to be 'OpenSuse'.)
> >
> > Is such a scheme possible/available? I suspect all the KVM configuration 
> > tools (i havent used them in some time - gui and command-line tools alike) 
> > use similar methods to ease guest management?
> 
> You can do that through libvirt, but that only works for guests started 
> through libvirt.  libvirt provides command-line tools to list and manage 
> guests (for example autostarting them on startup), and tools built on top of 
> libvirt can manage guests graphically.
> 
> Looks like we have a layer inversion here.  Maybe we need a plugin system - 
> libvirt drops a .so into perf that teaches it how to list guests and get 
> their symbols.

Is libvirt used to start up all KVM guests? If not, if it's only used on some 
distros while other distros have other solutions then there's apparently no 
good way to get to such information, and the kernel bits of KVM do not provide 
it.

To the user (and to me) this looks like a KVM bug / missing feature. (and the 
user doesnt care where the blame is) If that is true then apparently the 
current KVM design has no technically actionable solution for certain 
categories of features!

	Ingo

^ permalink raw reply	[flat|nested] 390+ messages in thread

* Re: [PATCH] Enhance perf to collect KVM guest os statistics from host side
  2010-03-16 13:31                               ` Ingo Molnar
@ 2010-03-16 13:37                                 ` Avi Kivity
  2010-03-16 15:06                                 ` Frank Ch. Eigler
  1 sibling, 0 replies; 390+ messages in thread
From: Avi Kivity @ 2010-03-16 13:37 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Zhang, Yanmin, Peter Zijlstra, Sheng Yang, linux-kernel, kvm,
	Marcelo Tosatti, oerg Roedel, Jes Sorensen, Gleb Natapov,
	Zachary Amsden, ziteng.huang

On 03/16/2010 03:31 PM, Ingo Molnar wrote:
>
>> You can do that through libvirt, but that only works for guests started
>> through libvirt.  libvirt provides command-line tools to list and manage
>> guests (for example autostarting them on startup), and tools built on top of
>> libvirt can manage guests graphically.
>>
>> Looks like we have a layer inversion here.  Maybe we need a plugin system -
>> libvirt drops a .so into perf that teaches it how to list guests and get
>> their symbols.
>>      
> Is libvirt used to start up all KVM guests? If not, if it's only used on some
> distros while other distros have other solutions then there's apparently no
> good way to get to such information, and the kernel bits of KVM do not provide
> it.
>    

Developers tend to start qemu from the command line, but the majority of 
users and all distros I know of use libvirt.  Some users cobble up their 
own scripts.

> To the user (and to me) this looks like a KVM bug / missing feature. (and the
> user doesnt care where the blame is) If that is true then apparently the
> current KVM design has no technically actionable solution for certain
> categories of features!
>    

A plugin system allows anyone who is interested to provide the 
information; they just need to write a plugin for their management tool.

Since we can't prevent people from writing management tools, I don't see 
what else we can do.

-- 
error compiling committee.c: too many arguments to function


^ permalink raw reply	[flat|nested] 390+ messages in thread

* Re: [PATCH] Enhance perf to collect KVM guest os statistics from host side
  2010-03-16 13:31                               ` Ingo Molnar
  2010-03-16 13:37                                 ` Avi Kivity
@ 2010-03-16 15:06                                 ` Frank Ch. Eigler
  2010-03-16 15:52                                   ` Ingo Molnar
  1 sibling, 1 reply; 390+ messages in thread
From: Frank Ch. Eigler @ 2010-03-16 15:06 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Avi Kivity, Zhang, Yanmin, Peter Zijlstra, Sheng Yang,
	linux-kernel, kvm, Marcelo Tosatti, oerg Roedel, Jes Sorensen,
	Gleb Natapov, Zachary Amsden, ziteng.huang


Ingo Molnar <mingo@elte.hu> writes:

> [...]
>> >I.e. we really want users to be able to:
>> >
>> >  1) have it all working with a single guest, without having to specify 'which'
>> >     guest (qemu PID) to work with. That is the dominant usecase both for
>> >     developers and for a fair portion of testers.
>> 
>> That's reasonable if we can get it working simply.
>
> IMO such ease of use is reasonable and required, full stop.
> If it cannot be gotten simply then that's a bug: either in the code, or in the 
> design, or in the development process that led to the design. Bugs need 
> fixing. [...]

Perhaps the fact that kvm happens to deal with an interesting
application area (virtualization) is misleading here.  As far as the
host kernel or other host userspace is concerned, qemu is just some
random unprivileged userspace program (with some *optional* /dev/kvm
services that might happen to require temporary root).

As such, perf trying to instrument qemu is no different than perf
trying to instrument any other userspace widget.  Therefore, expecting
'trusted enumeration' of instances is just as sensible as using
'trusted ps' and 'trusted /var/run/FOO.pid files'.


- FChE

^ permalink raw reply	[flat|nested] 390+ messages in thread

* Re: [PATCH] Enhance perf to collect KVM guest os statistics from host side
  2010-03-16 15:06                                 ` Frank Ch. Eigler
@ 2010-03-16 15:52                                   ` Ingo Molnar
  2010-03-16 16:08                                     ` Frank Ch. Eigler
  2010-03-16 17:34                                     ` Anthony Liguori
  0 siblings, 2 replies; 390+ messages in thread
From: Ingo Molnar @ 2010-03-16 15:52 UTC (permalink / raw)
  To: Frank Ch. Eigler
  Cc: Avi Kivity, Zhang, Yanmin, Peter Zijlstra, Sheng Yang,
	linux-kernel, kvm, Marcelo Tosatti, oerg Roedel, Jes Sorensen,
	Gleb Natapov, Zachary Amsden, ziteng.huang


* Frank Ch. Eigler <fche@redhat.com> wrote:

> 
> Ingo Molnar <mingo@elte.hu> writes:
> 
> > [...]
> >> >I.e. we really want users to be able to:
> >> >
> >> >  1) have it all working with a single guest, without having to specify 'which'
> >> >     guest (qemu PID) to work with. That is the dominant usecase both for
> >> >     developers and for a fair portion of testers.
> >> 
> >> That's reasonable if we can get it working simply.
> >
> > IMO such ease of use is reasonable and required, full stop.
> > If it cannot be gotten simply then that's a bug: either in the code, or in the 
> > design, or in the development process that led to the design. Bugs need 
> > fixing. [...]
> 
> Perhaps the fact that kvm happens to deal with an interesting application 
> area (virtualization) is misleading here.  As far as the host kernel or 
> other host userspace is concerned, qemu is just some random unprivileged 
> userspace program (with some *optional* /dev/kvm services that might happen 
> to require temporary root).
> 
> As such, perf trying to instrument qemu is no different than perf trying to 
> instrument any other userspace widget.  Therefore, expecting 'trusted 
> enumeration' of instances is just as sensible as using 'trusted ps' and 
> 'trusted /var/run/FOO.pid files'.

You are quite mistaken: KVM isnt really a 'random unprivileged application' in 
this context, it is clearly an extension of system/kernel services.

( Which can be seen from the simple fact that what started the discussion was 
  'how do we get /proc/kallsyms from the guest'. I.e. an extension of the 
  existing host-space /proc/kallsyms was desired. )

In that sense the most natural 'extension' would be the solution i mentioned a 
week or two ago: to have a (read only) mount of all guest filesystems, plus a 
channel for profiling/tracing data. That would make symbol parsing easier and 
it's what extends the existing 'host space' abstraction in the most natural 
way.

( It doesnt even have to be done via the kernel - Qemu could implement that
  via FUSE for example. )
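
A very rough sketch of that FUSE direction, using the stock libfuse 2.x
API: a read-only filesystem exposing a single 'kallsyms' file, whose
placeholder contents stand in for whatever channel Qemu would actually use
to pull the data out of the guest:

/*
 * Sketch only.  Build assumption: libfuse 2.x installed, e.g.
 *   gcc -Wall guestfs-sketch.c `pkg-config fuse --cflags --libs` -o guestfs-sketch
 * Mount with: ./guestfs-sketch /tmp/guest0
 */
#define FUSE_USE_VERSION 26
#include <fuse.h>
#include <string.h>
#include <errno.h>
#include <fcntl.h>
#include <sys/stat.h>

static const char *kallsyms_path = "/kallsyms";
/* placeholder; a real implementation would fetch this from the guest */
static const char *kallsyms_data = "ffffffff81000000 T _text\n";

static int gfs_getattr(const char *path, struct stat *st)
{
	memset(st, 0, sizeof(*st));
	if (strcmp(path, "/") == 0) {
		st->st_mode = S_IFDIR | 0555;
		st->st_nlink = 2;
	} else if (strcmp(path, kallsyms_path) == 0) {
		st->st_mode = S_IFREG | 0444;	/* read only, as suggested above */
		st->st_nlink = 1;
		st->st_size = strlen(kallsyms_data);
	} else
		return -ENOENT;
	return 0;
}

static int gfs_readdir(const char *path, void *buf, fuse_fill_dir_t filler,
		       off_t offset, struct fuse_file_info *fi)
{
	if (strcmp(path, "/") != 0)
		return -ENOENT;
	filler(buf, ".", NULL, 0);
	filler(buf, "..", NULL, 0);
	filler(buf, kallsyms_path + 1, NULL, 0);
	return 0;
}

static int gfs_open(const char *path, struct fuse_file_info *fi)
{
	if (strcmp(path, kallsyms_path) != 0)
		return -ENOENT;
	if ((fi->flags & O_ACCMODE) != O_RDONLY)
		return -EACCES;
	return 0;
}

static int gfs_read(const char *path, char *buf, size_t size, off_t offset,
		    struct fuse_file_info *fi)
{
	size_t len = strlen(kallsyms_data);

	if (strcmp(path, kallsyms_path) != 0)
		return -ENOENT;
	if ((size_t)offset >= len)
		return 0;
	if (offset + size > len)
		size = len - offset;
	memcpy(buf, kallsyms_data + offset, size);
	return size;
}

static struct fuse_operations gfs_ops = {
	.getattr	= gfs_getattr,
	.readdir	= gfs_readdir,
	.open		= gfs_open,
	.read		= gfs_read,
};

int main(int argc, char *argv[])
{
	return fuse_main(argc, argv, &gfs_ops, NULL);
}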

As a second best option a 'symbol server' might be used too.

Thanks,

	Ingo

^ permalink raw reply	[flat|nested] 390+ messages in thread

* Re: [PATCH] Enhance perf to collect KVM guest os statistics from host side
  2010-03-16 15:52                                   ` Ingo Molnar
@ 2010-03-16 16:08                                     ` Frank Ch. Eigler
  2010-03-16 16:35                                       ` Ingo Molnar
  2010-03-16 17:34                                     ` Anthony Liguori
  1 sibling, 1 reply; 390+ messages in thread
From: Frank Ch. Eigler @ 2010-03-16 16:08 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Avi Kivity, Zhang, Yanmin, Peter Zijlstra, Sheng Yang,
	linux-kernel, kvm, Marcelo Tosatti, oerg Roedel, Jes Sorensen,
	Gleb Natapov, Zachary Amsden, ziteng.huang

Hi -

On Tue, Mar 16, 2010 at 04:52:21PM +0100, Ingo Molnar wrote:
> [...]
> > Perhaps the fact that kvm happens to deal with an interesting application 
> > area (virtualization) is misleading here.  As far as the host kernel or 
> > other host userspace is concerned, qemu is just some random unprivileged 
> > userspace program [...]

> You are quite mistaken: KVM isnt really a 'random unprivileged
> application' in this context, it is clearly an extension of
> system/kernel services.

I don't know what "extension of system/kernel services" means in this
context, beyond something running on the system/kernel, like every
other process.  To clarify, to what extent do you consider your
classification similarly clear for a host that is running

* multiple kvm instances run as unprivileged users
* non-kvm OS simulators such as vmware or xen or gdb
* kvm instances running something other than linux

> ( Which can be seen from the simple fact that what started the
> discussion was 'how do we get /proc/kallsyms from the
> guest'. I.e. an extension of the existing host-space /proc/kallsyms
> was desired. )

(Sorry, that smacks of circular reasoning.)

It may be a charming convenience function for perf users to give them
shortcuts for certain favoured configurations (kvm running freshest
linux), but that says more about perf than kvm.


- FChE

^ permalink raw reply	[flat|nested] 390+ messages in thread

* Re: [PATCH] Enhance perf to collect KVM guest os statistics from host side
  2010-03-16 16:08                                     ` Frank Ch. Eigler
@ 2010-03-16 16:35                                       ` Ingo Molnar
  0 siblings, 0 replies; 390+ messages in thread
From: Ingo Molnar @ 2010-03-16 16:35 UTC (permalink / raw)
  To: Frank Ch. Eigler
  Cc: Avi Kivity, Zhang, Yanmin, Peter Zijlstra, Sheng Yang,
	linux-kernel, kvm, Marcelo Tosatti, oerg Roedel, Jes Sorensen,
	Gleb Natapov, Zachary Amsden, ziteng.huang


* Frank Ch. Eigler <fche@redhat.com> wrote:

> Hi -
> 
> On Tue, Mar 16, 2010 at 04:52:21PM +0100, Ingo Molnar wrote:
> > [...]
> > > Perhaps the fact that kvm happens to deal with an interesting application 
> > > area (virtualization) is misleading here.  As far as the host kernel or 
> > > other host userspace is concerned, qemu is just some random unprivileged 
> > > userspace program [...]
> 
> > You are quite mistaken: KVM isnt really a 'random unprivileged
> > application' in this context, it is clearly an extension of
> > system/kernel services.
> 
> I don't know what "extension of system/kernel services" means in this 
> context, beyond something running on the system/kernel, like every other 
> process. [...]

It means something like my example of 'extended to guest space' 
/proc/kallsyms:

> > [...]
> >
> > ( Which can be seen from the simple fact that what started the
> >   discussion was 'how do we get /proc/kallsyms from the guest'. I.e. an 
> >   extension of the existing host-space /proc/kallsyms was desired. )
> 
> (Sorry, that smacks of circular reasoning.)

To me it sounds like an example supporting my point. /proc/kallsyms is a 
service provided by the kernel, and 'perf kvm' desires this to be extended to guest 
space as well.

Thanks,

	Ingo

^ permalink raw reply	[flat|nested] 390+ messages in thread

* Re: [PATCH] Enhance perf to collect KVM guest os statistics from host side
  2010-03-16 13:08                           ` Ingo Molnar
  2010-03-16 13:16                             ` Avi Kivity
@ 2010-03-16 17:06                             ` Anthony Liguori
  2010-03-16 17:39                               ` Ingo Molnar
  1 sibling, 1 reply; 390+ messages in thread
From: Anthony Liguori @ 2010-03-16 17:06 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Avi Kivity, Zhang, Yanmin, Peter Zijlstra, Sheng Yang,
	linux-kernel, kvm, Marcelo Tosatti, oerg Roedel, Jes Sorensen,
	Gleb Natapov, Zachary Amsden, ziteng.huang

On 03/16/2010 08:08 AM, Ingo Molnar wrote:
> * Avi Kivity<avi@redhat.com>  wrote:
>
>    
>> On 03/16/2010 02:29 PM, Ingo Molnar wrote:
>>      
>    
>>> I mean, i can trust a kernel service and i can trust /proc/kallsyms.
>>>
>>> Can perf trust a random process claiming to be Qemu? What's the trust
>>> mechanism here?
>>>        
>> Obviously you can't trust anything you get from a guest, no matter how you
>> get it.
>>      
> I'm not talking about the symbol strings and addresses, and the object
> contents for allocation (or debuginfo). I'm talking about the basic protocol
> of establishing which guest is which.
>
> I.e. we really want users to be able to:
>
>   1) have it all working with a single guest, without having to specify 'which'
>      guest (qemu PID) to work with. That is the dominant usecase both for
>      developers and for a fair portion of testers.
>    

You're making too many assumptions.

There is no list of guests anymore than there is a list of web browsers.

You can have a multi-tenant scenario where you have distinct groups of 
virtual machines running as unprivileged users.

>   2) Have some reasonable symbolic identification for guests. For example a
>      usable approach would be to have 'perf kvm list', which would list all
>      currently active guests:
>
>       $ perf kvm list
>         [1] Fedora
>         [2] OpenSuse
>         [3] Windows-XP
>         [4] Windows-7
>
>      And from that point on 'perf kvm -g OpenSuse record' would do the obvious
>      thing. Users will be able to just use the 'OpenSuse' symbolic name for
>      that guest, even if the guest got restarted and switched its main PID.
>    

Does "perf kvm list" always run as root?  What if two unprivileged users 
both have a VM named "Fedora"?

If we look at the use-case, it's going to be something like, a user is 
creating virtual machines and wants to get performance information about 
them.

Having to run a separate tool like perf is not going to be what they 
would expect they had to do.  Instead, they would either use their 
existing GUI tool (like virt-manager) or they would use their management 
interface (either QMP or libvirt).

The complexity of interaction is due to the fact that perf shouldn't be 
a stand alone tool.  It should be a library or something with a 
programmatic interface that another tool can make use of.

Regards,

Anthony Liguori

> Is such a scheme possible/available? I suspect all the KVM configuration tools
> (i havent used them in some time - gui and command-line tools alike) use
> similar methods to ease guest management?
>
> 	Ingo
> --
> To unsubscribe from this list: send the line "unsubscribe kvm" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
>    


^ permalink raw reply	[flat|nested] 390+ messages in thread

* Re: [PATCH] Enhance perf to collect KVM guest os statistics from host side
  2010-03-16 15:52                                   ` Ingo Molnar
  2010-03-16 16:08                                     ` Frank Ch. Eigler
@ 2010-03-16 17:34                                     ` Anthony Liguori
  2010-03-16 17:52                                       ` Ingo Molnar
  1 sibling, 1 reply; 390+ messages in thread
From: Anthony Liguori @ 2010-03-16 17:34 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Frank Ch. Eigler, Avi Kivity, Zhang, Yanmin, Peter Zijlstra,
	Sheng Yang, linux-kernel, kvm, Marcelo Tosatti, oerg Roedel,
	Jes Sorensen, Gleb Natapov, Zachary Amsden, ziteng.huang

On 03/16/2010 10:52 AM, Ingo Molnar wrote:
> You are quite mistaken: KVM isnt really a 'random unprivileged application' in
> this context, it is clearly an extension of system/kernel services.
>
> ( Which can be seen from the simple fact that what started the discussion was
>    'how do we get /proc/kallsyms from the guest'. I.e. an extension of the
>    existing host-space /proc/kallsyms was desired. )
>    

Random tools (like perf) should not be able to do what you describe.  
It's a security nightmare.

If it's desirable to have /proc/kallsyms available, we can expose an 
interface in QEMU to provide that.  That can then be plumbed through 
libvirt and QMP.

Then a management tool can use libvirt or QMP to obtain that information 
and interact with the kernel appropriately.

> In that sense the most natural 'extension' would be the solution i mentioned a
> week or two ago: to have a (read only) mount of all guest filesystems, plus a
> channel for profiling/tracing data. That would make symbol parsing easier and
> it's what extends the existing 'host space' abstraction in the most natural
> way.
>
> ( It doesnt even have to be done via the kernel - Qemu could implement that
>    via FUSE for example. )
>    

No way.  The guest has sensitive data and exposing it widely on the host 
is a bad thing to do.  It's a bad interface.  We can expose specific 
information about guests but only through our existing channels which 
are validated through a security infrastructure.

Ultimately, your goal is to keep perf a simple tool with few 
dependencies.  But practically speaking, if you want to add features to 
it, it's going to have to interact with other subsystems in the 
appropriate way.  That means, it's going to need to interact with 
libvirt or QMP.

If you want all applications to expose their data via synthetic file 
systems, then there's always plan9 :-)

Regards,

Anthony Liguori

^ permalink raw reply	[flat|nested] 390+ messages in thread

* Re: [PATCH] Enhance perf to collect KVM guest os statistics from host side
  2010-03-16 17:06                             ` Anthony Liguori
@ 2010-03-16 17:39                               ` Ingo Molnar
  2010-03-16 23:07                                 ` Anthony Liguori
  0 siblings, 1 reply; 390+ messages in thread
From: Ingo Molnar @ 2010-03-16 17:39 UTC (permalink / raw)
  To: Anthony Liguori
  Cc: Avi Kivity, Zhang, Yanmin, Peter Zijlstra, Sheng Yang,
	linux-kernel, kvm, Marcelo Tosatti, oerg Roedel, Jes Sorensen,
	Gleb Natapov, Zachary Amsden, ziteng.huang,
	Arnaldo Carvalho de Melo,
	Frédéric Weisbecker


* Anthony Liguori <anthony@codemonkey.ws> wrote:

> On 03/16/2010 08:08 AM, Ingo Molnar wrote:
> >* Avi Kivity<avi@redhat.com>  wrote:
> >
> >>On 03/16/2010 02:29 PM, Ingo Molnar wrote:
> >>>I mean, i can trust a kernel service and i can trust /proc/kallsyms.
> >>>
> >>>Can perf trust a random process claiming to be Qemu? What's the trust
> >>>mechanism here?
> >>Obviously you can't trust anything you get from a guest, no matter how you
> >>get it.
> >I'm not talking about the symbol strings and addresses, and the object
> >contents for allocation (or debuginfo). I'm talking about the basic protocol
> >of establishing which guest is which.
> >
> >I.e. we really want users to be able to:
> >
> >  1) have it all working with a single guest, without having to specify 'which'
> >     guest (qemu PID) to work with. That is the dominant usecase both for
> >     developers and for a fair portion of testers.
> 
> You're making too many assumptions.
> 
> There is no list of guests anymore than there is a list of web browsers.
> 
> You can have a multi-tenant scenario where you have distinct groups of 
> virtual machines running as unprivileged users.

"multi-tenant" and groups is not a valid excuse at all for giving crappy 
technology in the simplest case: when there's a single VM. Yes, eventually it 
can be supported and any sane scheme will naturally support it too, but it's 
by no means what we care about primarily when it comes to these tools.

I thought everyone learned the lesson behind SystemTap's failure (and to a 
certain degree this was behind Oprofile's failure as well): when it comes to 
tooling/instrumentation we dont want to concentrate on the fancy complex 
setups and abstract requirements drawn up by CIOs, as development isnt being 
done there. Concentrate on our developers today, and provide no-compromises 
usability to those who contribute stuff.

If we dont help make the simplest (and most common) use-case convenient then 
we are failing on a fundamental level.

> >  2) Have some reasonable symbolic identification for guests. For example a
> >     usable approach would be to have 'perf kvm list', which would list all
> >     currently active guests:
> >
> >      $ perf kvm list
> >        [1] Fedora
> >        [2] OpenSuse
> >        [3] Windows-XP
> >        [4] Windows-7
> >
> >     And from that point on 'perf kvm -g OpenSuse record' would do the obvious
> >     thing. Users will be able to just use the 'OpenSuse' symbolic name for
> >     that guest, even if the guest got restarted and switched its main PID.
> 
> Does "perf kvm list" always run as root?  What if two unprivileged users 
> both have a VM named "Fedora"?

Again, the single-VM case is the most important case, by far. If you have 
multiple VMs running and want to develop the kernel on multiple VMs (sounds 
rather messy if you think it through ...), what would happen is similar to 
what happens when we have two probes for example:

 # perf probe schedule
 Added new event:
   probe:schedule                           (on schedule+0)

 You can now use it on all perf tools, such as:

 	perf record -e probe:schedule -a sleep 1

 # perf probe -f schedule   
 Added new event:
   probe:schedule_1                         (on schedule+0)

 You can now use it on all perf tools, such as:

 	perf record -e probe:schedule_1 -a sleep 1

 # perf probe -f schedule
 Added new event:
   probe:schedule_2                         (on schedule+0)

 You can now use it on all perf tools, such as:

 	perf record -e probe:schedule_2 -a sleep 1

Something similar could be used for KVM/Qemu: whichever got created first is 
named 'Fedora', the second is named 'Fedora-2'.

> If we look at the use-case, it's going to be something like, a user is 
> creating virtual machines and wants to get performance information about 
> them.
> 
> Having to run a separate tool like perf is not going to be what they would 
> expect they had to do.  Instead, they would either use their existing GUI 
> tool (like virt-manager) or they would use their management interface 
> (either QMP or libvirt).
> 
> The complexity of interaction is due to the fact that perf shouldn't be a 
> stand alone tool.  It should be a library or something with a programmatic 
> interface that another tool can make use of.

But ... a GUI interface/integration is of course possible too, and it's being 
worked on.

perf is mainly a kernel developer tool, and kernel developers generally dont 
use GUIs to do their stuff: which is the (sole) reason why its first ~850 
commits of tools/perf/ were done without a GUI. We go where our developers 
are.

In any case it's not an excuse to have no proper command-line tooling. In fact 
if you cannot get simpler, more atomic command-line tooling right then you'll 
probably doubly suck at doing a GUI as well.

	Ingo

^ permalink raw reply	[flat|nested] 390+ messages in thread

* Re: [PATCH] Enhance perf to collect KVM guest os statistics from host side
  2010-03-16 17:34                                     ` Anthony Liguori
@ 2010-03-16 17:52                                       ` Ingo Molnar
  2010-03-16 18:06                                         ` Anthony Liguori
  0 siblings, 1 reply; 390+ messages in thread
From: Ingo Molnar @ 2010-03-16 17:52 UTC (permalink / raw)
  To: Anthony Liguori
  Cc: Frank Ch. Eigler, Avi Kivity, Zhang, Yanmin, Peter Zijlstra,
	Sheng Yang, linux-kernel, kvm, Marcelo Tosatti, oerg Roedel,
	Jes Sorensen, Gleb Natapov, Zachary Amsden, ziteng.huang


* Anthony Liguori <aliguori@linux.vnet.ibm.com> wrote:

> On 03/16/2010 10:52 AM, Ingo Molnar wrote:
> >You are quite mistaken: KVM isnt really a 'random unprivileged application' in
> >this context, it is clearly an extension of system/kernel services.
> >
> >( Which can be seen from the simple fact that what started the discussion was
> >   'how do we get /proc/kallsyms from the guest'. I.e. an extension of the
> >   existing host-space /proc/kallsyms was desired. )
> 
> Random tools (like perf) should not be able to do what you describe. It's a 
> security nightmare.

A security nightmare exactly how? Mind to go into details as i dont understand 
your point.

> If it's desirable to have /proc/kallsyms available, we can expose an 
> interface in QEMU to provide that.  That can then be plumbed through libvirt 
> and QMP.
> 
> Then a management tool can use libvirt or QMP to obtain that information and 
> interact with the kernel appropriately.
> 
> > In that sense the most natural 'extension' would be the solution i 
> > mentioned a week or two ago: to have a (read only) mount of all guest 
> > filesystems, plus a channel for profiling/tracing data. That would make 
> > symbol parsing easier and it's what extends the existing 'host space' 
> > abstraction in the most natural way.
> >
> > ( It doesnt even have to be done via the kernel - Qemu could implement that
> >   via FUSE for example. )
> 
> No way.  The guest has sensitive data and exposing it widely on the host is 
> a bad thing to do. [...]

Firstly, you are putting words into my mouth, as i said nothing about 
'exposing it widely'. I suggest exposing it under the privileges of whoever 
has access to the guest image.

Secondly, regarding confidentiality, and this is guest security 101: whoever 
can access the image on the host _already_ has access to all the guest data!

A Linux image can generally be loopback mounted straight away:

  losetup -o 32256 /dev/loop0 ./guest-image.img
  mount -o ro /dev/loop0 /mnt-guest

(Or, if you are an unprivileged user who cannot mount, it can be read via ext2 
tools.)

There's nothing the guest can do about that. The host is in total control of 
guest image data for heaven's sake!

All i'm suggesting is to make what is already possible more convenient.

	Ingo

^ permalink raw reply	[flat|nested] 390+ messages in thread

* Re: [PATCH] Enhance perf to collect KVM guest os statistics from host side
  2010-03-16 17:52                                       ` Ingo Molnar
@ 2010-03-16 18:06                                         ` Anthony Liguori
  2010-03-16 18:28                                           ` Ingo Molnar
  0 siblings, 1 reply; 390+ messages in thread
From: Anthony Liguori @ 2010-03-16 18:06 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Frank Ch. Eigler, Avi Kivity, Zhang, Yanmin, Peter Zijlstra,
	Sheng Yang, linux-kernel, kvm, Marcelo Tosatti, oerg Roedel,
	Jes Sorensen, Gleb Natapov, Zachary Amsden, ziteng.huang

On 03/16/2010 12:52 PM, Ingo Molnar wrote:
> * Anthony Liguori<aliguori@linux.vnet.ibm.com>  wrote:
>
>    
>> On 03/16/2010 10:52 AM, Ingo Molnar wrote:
>>      
>>> You are quite mistaken: KVM isnt really a 'random unprivileged application' in
>>> this context, it is clearly an extension of system/kernel services.
>>>
>>> ( Which can be seen from the simple fact that what started the discussion was
>>>    'how do we get /proc/kallsyms from the guest'. I.e. an extension of the
>>>    existing host-space /proc/kallsyms was desired. )
>>>        
>> Random tools (like perf) should not be able to do what you describe. It's a
>> security nightmare.
>>      
> A security nightmare exactly how? Mind to go into details as i dont understand
> your point.
>    

Assume you're using SELinux to implement mandatory access control.  How 
do you label this file system?

Generally speaking, we don't know the difference between /proc/kallsyms 
vs. /dev/mem if we do generic passthrough.  While it might be safe to 
have a relaxed label of kallsyms (since it's read only), it's clearly 
not safe to do that for /dev/mem, /etc/shadow, or any file containing 
sensitive information.

Rather, we ought to expose a higher level interface that we have more 
confidence in with respect to understanding the ramifications of 
exposing that guest data.

>>
>> No way.  The guest has sensitive data and exposing it widely on the host is
>> a bad thing to do. [...]
>>      
> Firstly, you are putting words into my mouth, as i said nothing about
> 'exposing it widely'. I suggest exposing it under the privileges of whoever
> has access to the guest image.
>    

That doesn't work as nicely with SELinux.

It's completely reasonable to have a user that can interact in a read 
only mode with a VM via libvirt but cannot read the guest's disk images 
or the guest's memory contents.

> Secondly, regarding confidentiality, and this is guest security 101: whoever
> can access the image on the host _already_ has access to all the guest data!
>
> A Linux image can generally be loopback mounted straight away:
>
>    losetup -o 32256 /dev/loop0 ./guest-image.img
>    mount -o ro /dev/loop0 /mnt-guest
>
> (Or, if you are an unprivileged user who cannot mount, it can be read via ext2
> tools.)
>
> There's nothing the guest can do about that. The host is in total control of
> guest image data for heaven's sake!
>    

It's not that simple in a MAC environment.

Regards,

Anthony Liguori

> All i'm suggesting is to make what is already possible more convenient.
>
> 	Ingo
>    


^ permalink raw reply	[flat|nested] 390+ messages in thread

* Re: [PATCH] Enhance perf to collect KVM guest os statistics from host side
  2010-03-16 18:06                                         ` Anthony Liguori
@ 2010-03-16 18:28                                           ` Ingo Molnar
  2010-03-16 23:04                                             ` Anthony Liguori
  0 siblings, 1 reply; 390+ messages in thread
From: Ingo Molnar @ 2010-03-16 18:28 UTC (permalink / raw)
  To: Anthony Liguori
  Cc: Frank Ch. Eigler, Avi Kivity, Zhang, Yanmin, Peter Zijlstra,
	Sheng Yang, linux-kernel, kvm, Marcelo Tosatti, oerg Roedel,
	Jes Sorensen, Gleb Natapov, Zachary Amsden, ziteng.huang


* Anthony Liguori <aliguori@linux.vnet.ibm.com> wrote:

> On 03/16/2010 12:52 PM, Ingo Molnar wrote:
> >* Anthony Liguori<aliguori@linux.vnet.ibm.com>  wrote:
> >
> >>On 03/16/2010 10:52 AM, Ingo Molnar wrote:
> >>>You are quite mistaken: KVM isnt really a 'random unprivileged application' in
> >>>this context, it is clearly an extension of system/kernel services.
> >>>
> >>>( Which can be seen from the simple fact that what started the discussion was
> >>>   'how do we get /proc/kallsyms from the guest'. I.e. an extension of the
> >>>   existing host-space /proc/kallsyms was desired. )
> >>Random tools (like perf) should not be able to do what you describe. It's a
> >>security nightmare.
> >A security nightmare exactly how? Mind to go into details as i dont understand
> >your point.
> 
> Assume you're using SELinux to implement mandatory access control.
> How do you label this file system?
>
> Generally speaking, we don't know the difference between /proc/kallsyms vs. 
> /dev/mem if we do generic passthrough.  While it might be safe to have a 
> relaxed label of kallsyms (since it's read only), it's clearly not safe to 
> do that for /dev/mem, /etc/shadow, or any file containing sensitive 
> information.

What's your _point_? Please outline a threat model, a vector of attack, 
_anything_ that substantiates your "it's a security nightmare" claim.

> Rather, we ought to expose a higher level interface that we have more 
> confidence in with respect to understanding the ramifications of exposing 
> that guest data.

Exactly, we want something that has a flexible namespace and works well with 
Linux tools in general. Preferably that namespace should be human readable, 
and it should be hierarchic, and it should have a well-known permission model.

This concept exists in Linux and is generally called a 'filesystem'.

> >> No way.  The guest has sensitive data and exposing it widely on the host 
> >> is a bad thing to do. [...]
> >
> > Firstly, you are putting words into my mouth, as i said nothing about 
> > 'exposing it widely'. I suggest exposing it under the privileges of 
> > whoever has access to the guest image.
> 
> That doesn't work as nicely with SELinux.
> 
> It's completely reasonable to have a user that can interact in a read only 
> mode with a VM via libvirt but cannot read the guest's disk images or the 
> guest's memory contents.

If a user cannot read the image file then the user has no access to its 
contents via other namespaces either. That is, of course, a basic security 
aspect.

( That is perfectly true with a non-SELinux Unix permission model as well, and
  is true in the SELinux case as well. )

> > Secondly, regarding confidentiality, and this is guest security 101: whoever
> > can access the image on the host _already_ has access to all the guest data!
> >
> > A Linux image can generally be loopback mounted straight away:
> >
> >   losetup -o 32256 /dev/loop0 ./guest-image.img
> >   mount -o ro /dev/loop0 /mnt-guest
> >
> >(Or, if you are an unprivileged user who cannot mount, it can be read via ext2
> >tools.)
> >
> > There's nothing the guest can do about that. The host is in total control of
> > guest image data for heaven's sake!
> 
> It's not that simple in a MAC environment.

Erm. Please explain to me, what exactly is 'not that simple' in a MAC 
environment?

Also, i'd like to note that the 'restrictive SELinux setups' usecases are 
pretty secondary.

To demonstrate that, i'd like every KVM developer on this list who reads this 
mail, and whose home development system (where they produce their patches) is 
set up in a restrictive MAC environment such that they cannot even read the 
images they are using, to chime in with an "I'm doing that" reply.

If there's just a _single_ KVM developer amongst dozens and dozens of 
developers on this list who develops in an environment like that i'd be 
surprised. That result should pretty much tell you where the weight of 
instrumentation focus should lie - and it isnt on restrictive MAC environments 
...

	Ingo

^ permalink raw reply	[flat|nested] 390+ messages in thread

* Re: [PATCH] Enhance perf to collect KVM guest os statistics from host side
  2010-03-16 11:25                   ` Ingo Molnar
  2010-03-16 12:21                     ` Avi Kivity
@ 2010-03-16 22:30                     ` oerg Roedel
  2010-03-16 23:01                       ` Masami Hiramatsu
  2010-03-17  7:27                       ` Ingo Molnar
  1 sibling, 2 replies; 390+ messages in thread
From: oerg Roedel @ 2010-03-16 22:30 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Avi Kivity, Zhang, Yanmin, Peter Zijlstra, Sheng Yang,
	linux-kernel, kvm, Marcelo Tosatti, Jes Sorensen, Gleb Natapov,
	Zachary Amsden, ziteng.huang

On Tue, Mar 16, 2010 at 12:25:00PM +0100, Ingo Molnar wrote:
> Hm, that sounds rather messy if we want to use it to basically expose kernel 
> functionality in a guest/host unified way. Is the qemu process discoverable in 
> some secure way? Can we trust it? Is there some proper tooling available to do 
> it, or do we have to push it through 2-3 packages to get such a useful feature 
> done?

Since we want to implement a pmu usable by the guest anyway, why don't we
just use the guest's perf to get all the information we want? If we get a
pmu-nmi from the guest we just re-inject it into the guest, and perf in the
guest gives us all the information we want, including kernel and userspace
symbols, stack traces, and so on.

In the previous thread we discussed a direct trace channel between
guest and host kernel (which can be used for ftrace events for example).
This channel could be used to transport this information to the host
kernel.

The only additional feature needed is a way for the host to start a perf
instance in the guest.

Opinions?


	Joerg


^ permalink raw reply	[flat|nested] 390+ messages in thread

* Re: [PATCH] Enhance perf to collect KVM guest os statistics from host side
  2010-03-16 22:30                     ` [PATCH] Enhance perf to collect KVM guest os statistics from host side oerg Roedel
@ 2010-03-16 23:01                       ` Masami Hiramatsu
  2010-03-17  7:27                       ` Ingo Molnar
  1 sibling, 0 replies; 390+ messages in thread
From: Masami Hiramatsu @ 2010-03-16 23:01 UTC (permalink / raw)
  To: oerg Roedel
  Cc: Ingo Molnar, Avi Kivity, Zhang, Yanmin, Peter Zijlstra,
	Sheng Yang, linux-kernel, kvm, Marcelo Tosatti, Jes Sorensen,
	Gleb Natapov, Zachary Amsden, ziteng.huang, sungho.kim.zd

oerg Roedel wrote:
> On Tue, Mar 16, 2010 at 12:25:00PM +0100, Ingo Molnar wrote:
>> Hm, that sounds rather messy if we want to use it to basically expose kernel 
>> functionality in a guest/host unified way. Is the qemu process discoverable in 
>> some secure way? Can we trust it? Is there some proper tooling available to do 
>> it, or do we have to push it through 2-3 packages to get such a useful feature 
>> done?
> 
> Since we want to implement a pmu usable by the guest anyway, why don't we
> just use the guest's perf to get all the information we want? If we get a
> pmu-nmi from the guest we just re-inject it into the guest, and perf in the
> guest gives us all the information we want, including kernel and userspace
> symbols, stack traces, and so on.

I guess this aims to get information from old environments running on
kvm for life extension :)

> In the previous thread we discussed a direct trace channel between
> guest and host kernel (which can be used for ftrace events for example).
> This channel could be used to transport this information to the host
> kernel.

Interesting! I know the people who are trying to do that with systemtap.
See, http://vesper.sourceforge.net/

> 
> The only additional feature needed is a way for the host to start a perf
> instance in the guest.

# ssh localguest perf record --host-channel ... ? B-)

Thank you,

> 
> Opinions?
> 
> 
> 	Joerg
> 
> --
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at  http://www.tux.org/lkml/

-- 
Masami Hiramatsu
e-mail: mhiramat@redhat.com

^ permalink raw reply	[flat|nested] 390+ messages in thread

* Re: [PATCH] Enhance perf to collect KVM guest os statistics from host side
  2010-03-16 18:28                                           ` Ingo Molnar
@ 2010-03-16 23:04                                             ` Anthony Liguori
  2010-03-17  0:41                                               ` Frank Ch. Eigler
  2010-03-17  8:53                                               ` Ingo Molnar
  0 siblings, 2 replies; 390+ messages in thread
From: Anthony Liguori @ 2010-03-16 23:04 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Frank Ch. Eigler, Avi Kivity, Zhang, Yanmin, Peter Zijlstra,
	Sheng Yang, linux-kernel, kvm, Marcelo Tosatti, oerg Roedel,
	Jes Sorensen, Gleb Natapov, Zachary Amsden, ziteng.huang

On 03/16/2010 01:28 PM, Ingo Molnar wrote:
> * Anthony Liguori<aliguori@linux.vnet.ibm.com>  wrote:
>
>    
>> On 03/16/2010 12:52 PM, Ingo Molnar wrote:
>>      
>>> * Anthony Liguori<aliguori@linux.vnet.ibm.com>   wrote:
>>>
>>>        
>>>> On 03/16/2010 10:52 AM, Ingo Molnar wrote:
>>>>          
>>>>> You are quite mistaken: KVM isnt really a 'random unprivileged application' in
>>>>> this context, it is clearly an extension of system/kernel services.
>>>>>
>>>>> ( Which can be seen from the simple fact that what started the discussion was
>>>>>    'how do we get /proc/kallsyms from the guest'. I.e. an extension of the
>>>>>    existing host-space /proc/kallsyms was desired. )
>>>>>            
>>>> Random tools (like perf) should not be able to do what you describe. It's a
>>>> security nightmare.
>>>>          
>>> A security nightmare exactly how? Mind to go into details as i dont understand
>>> your point.
>>>        
>> Assume you're using SELinux to implement mandatory access control.
>> How do you label this file system?
>>
>> Generally speaking, we don't know the difference between /proc/kallsyms vs.
>> /dev/mem if we do generic passthrough.  While it might be safe to have a
>> relaxed label of kallsyms (since it's read only), it's clearly not safe to
>> do that for /dev/mem, /etc/shadow, or any file containing sensitive
>> information.
>>      
> What's your _point_? Please outline a threat model, a vector of attack,
> _anything_ that substantiates your "it's a security nightmare" claim.
>    

You suggested "to have a (read only) mount of all guest filesystems".

As I described earlier, not all of the information within the guest 
filesystem has the same level of sensitivity.  If you expose a generic 
interface like this, it becomes very difficult to delegate privileges.

Delegating privileges is important because in a higher-security 
environment, you may want to prevent a management tool from accessing 
the VM's disk directly, but still allow it to do basic operations (in 
particular, to view performance statistics).

>> Rather, we ought to expose a higher level interface that we have more
>> confidence in with respect to understanding the ramifications of exposing
>> that guest data.
>>      
> Exactly, we want something that has a flexible namespace and works well with
> Linux tools in general. Preferably that namespace should be human readable,
> and it should be hierarchic, and it should have a well-known permission model.
>
> This concept exists in Linux and is generally called a 'filesystem'.
>    

If you want to use a synthetic filesystem as the management interface 
for qemu, that's one thing.  But you suggested exposing the guest 
filesystem in its entirely and that's what I disagreed with.

> If a user cannot read the image file then the user has no access to its
> contents via other namespaces either. That is, of course, a basic security
> aspect.
>
> ( That is perfectly true with a non-SELinux Unix permission model as well, and
>    is true in the SELinux case as well. )
>    

I don't think that's reasonable at all.  The guest may encrypt its disk 
image.  It still ought to be possible to run perf against that guest, no?

> Erm. Please explain to me, what exactly is 'not that simple' in a MAC
> environment?
>
> Also, i'd like to note that the 'restrictive SELinux setups' usecases are
> pretty secondary.
>
> To demonstrate that, i'd like every KVM developer on this list who reads this
> mail and who has their home development system where they produce their
> patches set up in a restrictive MAC environment, in that you cannot even read
> the images you are using, to chime in with a "I'm doing that" reply.
>    

My home system doesn't run SELinux but I work daily with systems that 
are using SELinux.

I want to be able to run tools like perf on these systems because 
ultimately, I need to debug these systems on a daily basis.

But that's missing the point.  We want to have an interface that works 
for both cases so that we're not maintaining two separate interfaces.

We've rat holed a bit though.  You want:

1) to run perf kvm list and be able to enumerate KVM guests

2) for this to Just Work with qemu guests launched from the command line

You could achieve (1) by tying perf to libvirt but that won't work for 
(2).  There are a few practical problems with (2).

qemu does not require the user to associate any uniquely identifying 
information with a VM.  We've also optimized the command line use case 
so that if all you want to do is run a disk image, you just execute 
"qemu foo.img".  To satisfy your use case, we would either have to force 
a user to always specify unique information, which would be less 
convenient for our users, or we would have to let the name be an optional 
parameter.

As it turns out, we already support "qemu -name Fedora foo.img".  What 
we don't do today, but I've been suggesting we should, is automatically 
create a QMP management socket in a well known location based on the 
-name parameter when it's specified.  That would let a tool like perf 
Just Work provided that a user specified -name.
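
To illustrate what perf (or any other tool) could then do, here is a
minimal sketch of a client for such a socket.  The QMP handshake itself
(the greeting followed by qmp_capabilities) is the real protocol; the
/var/run/qemu/<name>.qmp location derived from -name is exactly the part
that does not exist today and is only an assumption here:

/*
 * Sketch, assuming something like:
 *   qemu -name Fedora -qmp unix:/var/run/qemu/Fedora.qmp,server,nowait foo.img
 * had been used to create the (hypothetical) well-known socket.
 */
#include <stdio.h>
#include <string.h>
#include <unistd.h>
#include <sys/types.h>
#include <sys/socket.h>
#include <sys/un.h>

int main(void)
{
	const char *path = "/var/run/qemu/Fedora.qmp";	/* assumed location */
	const char *negotiate = "{ \"execute\": \"qmp_capabilities\" }\r\n";
	struct sockaddr_un addr;
	char buf[4096];
	ssize_t n;
	int fd;

	fd = socket(AF_UNIX, SOCK_STREAM, 0);
	if (fd < 0) {
		perror("socket");
		return 1;
	}

	memset(&addr, 0, sizeof(addr));
	addr.sun_family = AF_UNIX;
	strncpy(addr.sun_path, path, sizeof(addr.sun_path) - 1);
	if (connect(fd, (struct sockaddr *)&addr, sizeof(addr)) < 0) {
		perror("connect");
		return 1;
	}

	/* QMP sends a JSON greeting first ... */
	n = read(fd, buf, sizeof(buf) - 1);
	if (n > 0) {
		buf[n] = '\0';
		printf("greeting: %s\n", buf);
	}

	/* ... then we leave capability negotiation mode and could issue commands */
	if (write(fd, negotiate, strlen(negotiate)) < 0)
		perror("write");
	n = read(fd, buf, sizeof(buf) - 1);
	if (n > 0) {
		buf[n] = '\0';
		printf("response: %s\n", buf);
	}

	close(fd);
	return 0;
}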

No one uses -name today though and I'm sure you don't either.

The only way to really address this is to change the interaction.  
Instead of running perf externally to qemu, we should support a perf 
command in the qemu monitor that can then tie directly to the perf 
tooling.  That gives us the best possible user experience.

We can't do that though unless perf is a library or is in some way more 
programmatic.

Regards,

Anthony Liguori


^ permalink raw reply	[flat|nested] 390+ messages in thread

* Re: [PATCH] Enhance perf to collect KVM guest os statistics from host side
  2010-03-16 17:39                               ` Ingo Molnar
@ 2010-03-16 23:07                                 ` Anthony Liguori
  2010-03-17  8:10                                   ` [RFC] Unify KVM kernel-space and user-space code into a single project Ingo Molnar
  0 siblings, 1 reply; 390+ messages in thread
From: Anthony Liguori @ 2010-03-16 23:07 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Avi Kivity, Zhang, Yanmin, Peter Zijlstra, Sheng Yang,
	linux-kernel, kvm, Marcelo Tosatti, oerg Roedel, Jes Sorensen,
	Gleb Natapov, Zachary Amsden, ziteng.huang,
	Arnaldo Carvalho de Melo, Frédéric Weisbecker

On 03/16/2010 12:39 PM, Ingo Molnar wrote:
>> If we look at the use-case, it's going to be something like, a user is
>> creating virtual machines and wants to get performance information about
>> them.
>>
>> Having to run a separate tool like perf is not going to be what they would
>> expect they had to do.  Instead, they would either use their existing GUI
>> tool (like virt-manager) or they would use their management interface
>> (either QMP or libvirt).
>>
>> The complexity of interaction is due to the fact that perf shouldn't be a
>> stand alone tool.  It should be a library or something with a programmatic
>> interface that another tool can make use of.
>>      
> But ... a GUI interface/integration is of course possible too, and it's being
> worked on.
>
> perf is mainly a kernel developer tool, and kernel developers generally dont
> use GUIs to do their stuff: which is the (sole) reason why its first ~850
> commits of tools/perf/ were done without a GUI. We go where our developers
> are.
>
> In any case it's not an excuse to have no proper command-line tooling. In fact
> if you cannot get simpler, more atomic command-line tooling right then you'll
> probably doubly suck at doing a GUI as well.
>    

It's about who owns the user interface.

If qemu owns the user interface, then we can satisfy this in a very 
simple way by adding a perf monitor command.  If we have to support 
third party tools, then it significantly complicates things.

Regards,

Anthony Liguori

> 	Ingo
>    


^ permalink raw reply	[flat|nested] 390+ messages in thread

* Re: [PATCH] Enhance perf to collect KVM guest os statistics from host side
  2010-03-16 23:04                                             ` Anthony Liguori
@ 2010-03-17  0:41                                               ` Frank Ch. Eigler
  2010-03-17  3:54                                                 ` Avi Kivity
  2010-03-17  8:14                                                 ` Ingo Molnar
  2010-03-17  8:53                                               ` Ingo Molnar
  1 sibling, 2 replies; 390+ messages in thread
From: Frank Ch. Eigler @ 2010-03-17  0:41 UTC (permalink / raw)
  To: Anthony Liguori
  Cc: Ingo Molnar, Avi Kivity, Zhang, Yanmin, Peter Zijlstra,
	Sheng Yang, linux-kernel, kvm, Marcelo Tosatti, oerg Roedel,
	Jes Sorensen, Gleb Natapov, Zachary Amsden, ziteng.huang

Hi -

On Tue, Mar 16, 2010 at 06:04:10PM -0500, Anthony Liguori wrote:
> [...]
> The only way to really address this is to change the interaction.  
> Instead of running perf externally to qemu, we should support a perf 
> command in the qemu monitor that can then tie directly to the perf 
> tooling.  That gives us the best possible user experience.

To what extent could this be solved with less crossing of
isolation/abstraction layers, if the perfctr facilities were properly
virtualized?  That way guests could run perf goo internally.
Optionally virt tools on the host side could aggregate data from
cooperating self-monitoring guests.

- FChE

^ permalink raw reply	[flat|nested] 390+ messages in thread

* Re: [PATCH] Enhance perf to collect KVM guest os statistics from host side
  2010-03-16  9:32     ` Avi Kivity
@ 2010-03-17  2:34       ` Zhang, Yanmin
  2010-03-17  9:28         ` Sheng Yang
  0 siblings, 1 reply; 390+ messages in thread
From: Zhang, Yanmin @ 2010-03-17  2:34 UTC (permalink / raw)
  To: Avi Kivity
  Cc: Ingo Molnar, Peter Zijlstra, Sheng Yang, linux-kernel, kvm,
	Marcelo Tosatti, oerg Roedel, Jes Sorensen, Gleb Natapov,
	Zachary Amsden, ziteng.huang, Joerg Roedel

On Tue, 2010-03-16 at 11:32 +0200, Avi Kivity wrote:
> On 03/16/2010 09:48 AM, Zhang, Yanmin wrote:
> >
> >> Excellent, support for guest kernel != host kernel is critical (I can't
> >> remember the last time I ran same kernels).
> >>
> >> How would we support multiple guests with different kernels?
> >>      
> > With the patch, 'perf kvm report --sort pid' could show
> > summary statistics for all guest os instances. Then, use
> > parameter --pid of 'perf kvm record' to collect single problematic instance data.
> >    
> 
> That certainly works, though automatic association of guest data with 
> guest symbols is friendlier.
Thanks. Originally, I planned to add a -G parameter to perf. Something like
-G 8888:/XXX/XXX/guestkallsyms:/XXX/XXX/modules,8889:/XXX/XXX/guestkallsyms:/XXX/XXX/modules
8888 and 8889 are just qemu guest pids.

So we could define multiple guest os symbol files. But it seems ugly,
and 'perf kvm report --sort pid' and 'perf kvm top --pid' could provide
similar functionality.

> 
> >>> diff -Nraup linux-2.6_tipmaster0315/arch/x86/kvm/vmx.c linux-2.6_tipmaster0315_perfkvm/arch/x86/kvm/vmx.c
> >>> --- linux-2.6_tipmaster0315/arch/x86/kvm/vmx.c	2010-03-16 08:59:11.825295404 +0800
> >>> +++ linux-2.6_tipmaster0315_perfkvm/arch/x86/kvm/vmx.c	2010-03-16 09:01:09.976084492 +0800
> >>> @@ -26,6 +26,7 @@
> >>>    #include<linux/sched.h>
> >>>    #include<linux/moduleparam.h>
> >>>    #include<linux/ftrace_event.h>
> >>> +#include<linux/perf_event.h>
> >>>    #include "kvm_cache_regs.h"
> >>>    #include "x86.h"
> >>>
> >>> @@ -3632,6 +3633,43 @@ static void update_cr8_intercept(struct
> >>>    	vmcs_write32(TPR_THRESHOLD, irr);
> >>>    }
> >>>
> >>> +DEFINE_PER_CPU(int, kvm_in_guest) = {0};
> >>> +
> >>> +static void kvm_set_in_guest(void)
> >>> +{
> >>> +	percpu_write(kvm_in_guest, 1);
> >>> +}
> >>> +
> >>> +static int kvm_is_in_guest(void)
> >>> +{
> >>> +	return percpu_read(kvm_in_guest);
> >>> +}
> >>>
> >>>        
> >>      
> >    
> >> There is already PF_VCPU for this.
> >>      
> > Right, but there is a scope between kvm_guest_enter and really running
> > in guest os, where a perf event might overflow. Anyway, the scope is very
> > narrow, I will change it to use flag PF_VCPU.
> >    
> 
> There is also a window between setting the flag and calling 'int $2' 
> where an NMI might happen and be accounted incorrectly.
> 
> Perhaps separate the 'int $2' into a direct call into perf and another 
> call for the rest of NMI handling.  I don't see how it would work on svm 
> though - AFAICT the NMI is held whereas vmx swallows it. 

>  I guess NMIs 
> will be disabled until the next IRET so it isn't racy, just tricky.
I'm not sure whether vmexit breaks the NMI context or not. The hardware NMI
context isn't reentrant until an IRET. YangSheng would like to double-check it.

> 
> >>> +static struct perf_guest_info_callbacks kvm_guest_cbs = {
> >>> +	.is_in_guest 		= kvm_is_in_guest,
> >>> +	.is_user_mode		= kvm_is_user_mode,
> >>> +	.get_guest_ip		= kvm_get_guest_ip,
> >>> +	.reset_in_guest		= kvm_reset_in_guest,
> >>> +};
> >>>
> >>>        
> >> Should be in common code, not vmx specific.
> >>      
> > Right. I discussed with Yangsheng. I will move above data structures and
> > callbacks to file arch/x86/kvm/x86.c, and add get_ip, a new callback to
> > kvm_x86_ops.
> >    
> 
> You will need access to the vcpu pointer (kvm_rip_read() needs it), you 
> can put it in a percpu variable.
We do so now in a new patch.

>   I guess if it's not null, you know 
> you're in a guest, so no need for PF_VCPU.
Good suggestion.
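
Roughly, the combination would look like the sketch below (illustrative only,
not the actual new patch; the names, the callback signatures and the exact
placement in arch/x86/kvm/x86.c are assumptions):

#include <linux/kvm_host.h>
#include <linux/percpu.h>
#include <linux/perf_event.h>
#include "kvm_cache_regs.h"	/* for kvm_rip_read() */

static DEFINE_PER_CPU(struct kvm_vcpu *, current_vcpu);

/*
 * percpu_write(current_vcpu, vcpu) right before entering the guest and
 * percpu_write(current_vcpu, NULL) right after the vmexit, so a non-NULL
 * pointer doubles as the "in guest" test and no PF_VCPU check is needed.
 */

static int kvm_is_in_guest(void)
{
	return percpu_read(current_vcpu) != NULL;
}

static unsigned long kvm_get_guest_ip(void)
{
	struct kvm_vcpu *vcpu = percpu_read(current_vcpu);

	/* kvm_rip_read() needs the vcpu pointer, hence the per-cpu variable */
	return vcpu ? kvm_rip_read(vcpu) : 0;
}

static struct perf_guest_info_callbacks kvm_guest_cbs = {
	.is_in_guest	= kvm_is_in_guest,
	.get_guest_ip	= kvm_get_guest_ip,
	/* .is_user_mode and .reset_in_guest left out of this sketch */
};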

Thanks.



^ permalink raw reply	[flat|nested] 390+ messages in thread

* Re: [PATCH] Enhance perf to collect KVM guest os statistics from host side
  2010-03-17  0:41                                               ` Frank Ch. Eigler
@ 2010-03-17  3:54                                                 ` Avi Kivity
  2010-03-17  8:16                                                   ` Ingo Molnar
  2010-03-18  5:27                                                     ` Huang, Zhiteng
  2010-03-17  8:14                                                 ` Ingo Molnar
  1 sibling, 2 replies; 390+ messages in thread
From: Avi Kivity @ 2010-03-17  3:54 UTC (permalink / raw)
  To: Frank Ch. Eigler
  Cc: Anthony Liguori, Ingo Molnar, Zhang, Yanmin, Peter Zijlstra,
	Sheng Yang, linux-kernel, kvm, Marcelo Tosatti, oerg Roedel,
	Jes Sorensen, Gleb Natapov, Zachary Amsden, ziteng.huang

On 03/17/2010 02:41 AM, Frank Ch. Eigler wrote:
> Hi -
>
> On Tue, Mar 16, 2010 at 06:04:10PM -0500, Anthony Liguori wrote:
>    
>> [...]
>> The only way to really address this is to change the interaction.
>> Instead of running perf externally to qemu, we should support a perf
>> command in the qemu monitor that can then tie directly to the perf
>> tooling.  That gives us the best possible user experience.
>>      
> To what extent could this be solved with less crossing of
> isolation/abstraction layers, if the perfctr facilities were properly
> virtualized?
>    

That's the more interesting (by far) usage model.  In general guest 
owners don't have access to the host, and host owners can't (and 
shouldn't) change guests.

Monitoring guests from the host is useful for kvm developers, but less 
so for users.

-- 
Do not meddle in the internals of kernels, for they are subtle and quick to panic.


^ permalink raw reply	[flat|nested] 390+ messages in thread

* Re: [PATCH] Enhance perf to collect KVM guest os statistics from host side
  2010-03-16 22:30                     ` [PATCH] Enhance perf to collect KVM guest os statistics from host side oerg Roedel
  2010-03-16 23:01                       ` Masami Hiramatsu
@ 2010-03-17  7:27                       ` Ingo Molnar
  1 sibling, 0 replies; 390+ messages in thread
From: Ingo Molnar @ 2010-03-17  7:27 UTC (permalink / raw)
  To: Joerg Roedel
  Cc: Avi Kivity, Zhang, Yanmin, Peter Zijlstra, Sheng Yang,
	linux-kernel, kvm, Marcelo Tosatti, Jes Sorensen, Gleb Natapov,
	Zachary Amsden, ziteng.huang


* oerg Roedel <joro@8bytes.org> wrote:

> On Tue, Mar 16, 2010 at 12:25:00PM +0100, Ingo Molnar wrote:
> > Hm, that sounds rather messy if we want to use it to basically expose kernel 
> > functionality in a guest/host unified way. Is the qemu process discoverable in 
> > some secure way? Can we trust it? Is there some proper tooling available to do 
> > it, or do we have to push it through 2-3 packages to get such a useful feature 
> > done?
> 
> Since we want to implement a pmu usable for the guest anyway, why don't we
> just use a guest's perf to get all the information we want? [...]

Look at the previous posting of this patch, this is something new and rather 
unique. The main power in the 'perf kvm' kind of instrumentation is to profile 
_both_ the host and the guest on the host, using the same tool (often using 
the same kernel) and using similar workloads, and do profile comparisons using 
'perf diff'.

Note that KVM's in-kernel design makes it easy to offer this kind of
host/guest shared implementation that Yanmin has created. Other virtualization
solutions with a poorer design (for example where the hypervisor code base is
split away from the guest implementation) will have a much harder time creating
something similar.

That kind of integrated approach can result in very interesting finds straight 
away, see:

  http://lkml.indiana.edu/hypermail/linux/kernel/1003.0/00613.html

( the profile there demos the need for spinlock accelerators for example -
  there's clearly asymmetrically large overhead in guest spinlock code. Guess
  how much else we'll be able to find with a full 'perf kvm' implementation. )

One of the main goals of a virtualization implementation is to eliminate as 
many performance differences to the host kernel as possible. From the first 
day KVM was released the overriding question from users was always: 'how much 
slower is it than native, and which workloads are hit worst, and why, and 
could you pretty please speed up important workload XYZ'.

'perf kvm' helps exactly that kind of development workflow.

Note that with oprofile you can already do separate guest space and host space
profiling (with the timer driven fallback in the guest). One idea with 'perf
kvm' is to change that paradigm of forced separation and forced duplication
and to support the workflow that most developers employ: use the host space for
development and unify instrumentation in an intuitive framework. Yanmin's
'perf kvm' patch is a very good step towards that goal.

Anyway ... look at the patches, try them and see it for yourself. Back in the
days when i did KVM performance work i wished i had something like Yanmin's
'perf kvm' feature. I'd probably still be hacking KVM today ;-)

So, the code is there, it's useful and it's up to you guys whether you make use
of this opportunity - the perf developers are certainly eager to help out
with the details. There are already tons of per-kernel-subsystem perf helper
tools: perf sched, perf kmem, perf lock, perf bench, perf timechart.

'perf kvm' is really a natural and good next step IMO that underlines the main 
design goodness KVM brought to the world of virtualization: proper guest/host 
code base integration.

Thanks,

	Ingo

^ permalink raw reply	[flat|nested] 390+ messages in thread

* [RFC] Unify KVM kernel-space and user-space code into a single project
  2010-03-16 23:07                                 ` Anthony Liguori
@ 2010-03-17  8:10                                   ` Ingo Molnar
  2010-03-18  8:20                                     ` Avi Kivity
                                                       ` (3 more replies)
  0 siblings, 4 replies; 390+ messages in thread
From: Ingo Molnar @ 2010-03-17  8:10 UTC (permalink / raw)
  To: Anthony Liguori
  Cc: Avi Kivity, Zhang, Yanmin, Peter Zijlstra, Sheng Yang,
	linux-kernel, kvm, Marcelo Tosatti, oerg Roedel, Jes Sorensen,
	Gleb Natapov, Zachary Amsden, ziteng.huang,
	Arnaldo Carvalho de Melo, Frédéric Weisbecker


* Anthony Liguori <anthony@codemonkey.ws> wrote:

> On 03/16/2010 12:39 PM, Ingo Molnar wrote:
> >>If we look at the use-case, it's going to be something like, a user is
> >>creating virtual machines and wants to get performance information about
> >>them.
> >>
> >>Having to run a separate tool like perf is not going to be what they would
> >>expect they had to do.  Instead, they would either use their existing GUI
> >>tool (like virt-manager) or they would use their management interface
> >>(either QMP or libvirt).
> >>
> >>The complexity of interaction is due to the fact that perf shouldn't be a
> >>stand alone tool.  It should be a library or something with a programmatic
> >>interface that another tool can make use of.
> >But ... a GUI interface/integration is of course possible too, and it's being
> >worked on.
> >
> >perf is mainly a kernel developer tool, and kernel developers generally don't
> >use GUIs to do their stuff: which is the (sole) reason why the first ~850
> >commits of tools/perf/ were done without a GUI. We go where our developers
> >are.
> >
> >In any case it's not an excuse to have no proper command-line tooling. In fact
> >if you cannot get simpler, more atomic command-line tooling right then you'll
> >probably doubly suck at doing a GUI as well.
> 
> It's about who owns the user interface.
> 
> If qemu owns the user interface, then we can satisfy this in a very simple 
> way by adding a perf monitor command.  If we have to support third party 
> tools, then it significantly complicates things.

Of course illogical modularization complicates things 'significantly'.

I wish both you and Avi looked back 3-4 years and realized what made KVM so 
successful back then and why the hearts and minds of virtualization developers 
were captured by KVM almost overnight.

KVM's main strength back then was that it was a surprisingly functional piece 
of code offered by a 10 KLOC patch - right on the very latest upstream kernel. 
Code was shared with upstream, there was version parity, and it all was in the 
same single repo which was (and is) a pleasure to develop on.

Unlike Xen, which was a 200+ KLOC patch on top of a forked 10 MLOC kernel a 
few upstream versions back. Xen had constant version friction due to that fork 
and due to that forced/false separation/modularization: Xen _itself_ was a 
fork of Linux to begin with. (for example Xen still had my copyrights last i 
checked, which it got from old Linux code i worked on)

That forced separation and version friction in Xen was a development and 
productization nightmare, and developing on KVM was a truly refreshing 
experience. (I'll go out on a limb to declare that you won't find a _single_
developer on this list who will tell us otherwise.)

Fast forward to 2010. The kernel side of KVM is maximum goodness - by far the 
worst-quality remaining aspects of KVM are precisely in areas that you 
mention: 'if we have to support third party tools, then it significantly 
complicates things'. You kept Qemu as an external 'third party' entity to KVM, 
and KVM is clearly hurting from that - just see the recent KVM usability 
thread for examples about suckage.

So a similar 'complication' is the crux of the matter behind KVM quality 
problems: you've not followed through with the original KVM vision and you 
have not applied that concept to Qemu!

And please realize that the user does not care that KVM's kernel bits are top 
notch, if the rest of the package has sucky aspects: it's always the weakest 
link of the chain that matters to the user.

Xen sucked because of such design shortsightedness on the kernel level, and 
now KVM suffers from it on the user space level.

If you want to jump to the next level of technological quality you need to fix 
this attitude and you need to go back to the design roots of KVM. Concentrate 
on Qemu (as that is the weakest link now), make it a first class member of the 
KVM repo and simplify your development model by having a single repo:

 - move a clean (and minimal) version of the Qemu code base to tools/kvm/, in
   the upstream kernel repo, and work on that from that point on.

 - co-develop new features within the same patch. Release new versions of
   kvm-qemu and the kvm bits at the same time (together with the upstream
   kernel), at well defined points in time.

 - encourage kernel-space and user-space KVM developers to work on both 
   user-space and kernel-space bits as a single unit. It's one project and a
   single experience to the user.

 - [ and probably libvirt should go there too ]

If KVM's hypervisor and guest kernel code can enjoy the benefits of a single 
repository, why cannot the rest of KVM enjoy the same developer goodness? Only 
fixing that will bring the break-through in quality - not more manpower 
really.

Yes, i've read a thousand excuses for why this is absolutely impossible and
a bad thing to do, and none of them was really convincing to me - and you also 
have become rather emotional about all the arguments so it's hard to argue 
about it on a technical basis.

We made a similar (admittedly very difficult ...) design jump from oprofile to 
perf, and i can tell you from that experience that it's day and night, both in 
terms of development and in terms of the end result!

( We recently also made another, kernel/kernel unification that had a very
  positive result: we unified the 32-bit and 64-bit x86 architectures. Even 
  within the same repo the unification of technology is generally a good 
  thing. The KVM/Qemu situation is different - it's more similar to the perf 
  design. )

Not having to fight artificial package boundaries and forced package
separation is a very refreshing experience for a developer - and very rewarding
and flexible to develop on. ABI compatibility is _easier_ to maintain in such
a model. It's quite similar to the jump from Xen hacking to KVM hacking (i did
both). It's a bit like the jump from CVS to Git. Trust me, you _cannot_ know
the difference if you haven't tried a similar jump with Qemu.

Anyway, you made your position about this rather clear and you are clearly 
uncompromising, so i just wanted to post this note to the list: you'll waste 
years of your life on a visibly crappy development model that has been unable 
to break through a magic usability barrier for the past 2-3 years - just like 
the Xen mis-design has wasted so many people's time and effort in kernel 
space.

Thanks,

	Ingo

^ permalink raw reply	[flat|nested] 390+ messages in thread

* Re: [PATCH] Enhance perf to collect KVM guest os statistics from host side
  2010-03-17  0:41                                               ` Frank Ch. Eigler
  2010-03-17  3:54                                                 ` Avi Kivity
@ 2010-03-17  8:14                                                 ` Ingo Molnar
  1 sibling, 0 replies; 390+ messages in thread
From: Ingo Molnar @ 2010-03-17  8:14 UTC (permalink / raw)
  To: Frank Ch. Eigler
  Cc: Anthony Liguori, Avi Kivity, Zhang, Yanmin, Peter Zijlstra,
	Sheng Yang, linux-kernel, kvm, Marcelo Tosatti, oerg Roedel,
	Jes Sorensen, Gleb Natapov, Zachary Amsden, ziteng.huang


* Frank Ch. Eigler <fche@redhat.com> wrote:

> Hi -
> 
> On Tue, Mar 16, 2010 at 06:04:10PM -0500, Anthony Liguori wrote:
> > [...]
> > The only way to really address this is to change the interaction.  
> > Instead of running perf externally to qemu, we should support a perf 
> > command in the qemu monitor that can then tie directly to the perf 
> > tooling.  That gives us the best possible user experience.
> 
> To what extent could this be solved with less crossing of 
> isolation/abstraction layers, if the perfctr facilities were properly 
> virtualized? [...]

Note, 'perfctr' is a different out-of-tree Linux kernel project run by someone 
else: it offers the /dev/perfctr special-purpose device that allows raw, 
unabstracted, low-level access to the PMU.

I suspect the one you wanted to mention here is called 'perf' or 'perf 
events'. (and used to be called 'performance counters' or 'perfcounters' until 
it got renamed about a year ago)

Thanks,

	Ingo

^ permalink raw reply	[flat|nested] 390+ messages in thread

* Re: [PATCH] Enhance perf to collect KVM guest os statistics from host side
  2010-03-17  3:54                                                 ` Avi Kivity
@ 2010-03-17  8:16                                                   ` Ingo Molnar
  2010-03-17  8:20                                                     ` Avi Kivity
  2010-03-18  5:27                                                     ` Huang, Zhiteng
  1 sibling, 1 reply; 390+ messages in thread
From: Ingo Molnar @ 2010-03-17  8:16 UTC (permalink / raw)
  To: Avi Kivity
  Cc: Frank Ch. Eigler, Anthony Liguori, Zhang, Yanmin, Peter Zijlstra,
	Sheng Yang, linux-kernel, kvm, Marcelo Tosatti, oerg Roedel,
	Jes Sorensen, Gleb Natapov, Zachary Amsden, ziteng.huang


* Avi Kivity <avi@redhat.com> wrote:

> Monitoring guests from the host is useful for kvm developers, but less so 
> for users.

Guest space profiling is easy, and 'perf kvm' is not about that. (plain 'perf' 
will work if a proper paravirt channel is opened to the host)

I think you might have misunderstood the purpose and role of the 'perf kvm' 
patch here? 'perf kvm' is aimed at KVM developers: it is them who improve KVM 
code, not guest kernel users.

	Ingo

^ permalink raw reply	[flat|nested] 390+ messages in thread

* Re: [PATCH] Enhance perf to collect KVM guest os statistics from host side
  2010-03-17  8:16                                                   ` Ingo Molnar
@ 2010-03-17  8:20                                                     ` Avi Kivity
  2010-03-17  8:59                                                       ` Ingo Molnar
  0 siblings, 1 reply; 390+ messages in thread
From: Avi Kivity @ 2010-03-17  8:20 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Frank Ch. Eigler, Anthony Liguori, Zhang, Yanmin, Peter Zijlstra,
	Sheng Yang, linux-kernel, kvm, Marcelo Tosatti, oerg Roedel,
	Jes Sorensen, Gleb Natapov, Zachary Amsden, ziteng.huang

On 03/17/2010 10:16 AM, Ingo Molnar wrote:
> * Avi Kivity<avi@redhat.com>  wrote:
>
>    
>> Monitoring guests from the host is useful for kvm developers, but less so
>> for users.
>>      
> Guest space profiling is easy, and 'perf kvm' is not about that. (plain 'perf'
> will work if a proper paravirt channel is opened to the host)
>
> I think you might have misunderstood the purpose and role of the 'perf kvm'
> patch here? 'perf kvm' is aimed at KVM developers: it is them who improve KVM
> code, not guest kernel users.
>    

Of course I understood it.  My point was that 'perf kvm' serves a tiny 
minority of users.  That doesn't mean it isn't useful, just that it 
doesn't satisfy all needs by itself.

-- 
Do not meddle in the internals of kernels, for they are subtle and quick to panic.


^ permalink raw reply	[flat|nested] 390+ messages in thread

* Re: [PATCH] Enhance perf to collect KVM guest os statistics from host side
  2010-03-16 23:04                                             ` Anthony Liguori
  2010-03-17  0:41                                               ` Frank Ch. Eigler
@ 2010-03-17  8:53                                               ` Ingo Molnar
  1 sibling, 0 replies; 390+ messages in thread
From: Ingo Molnar @ 2010-03-17  8:53 UTC (permalink / raw)
  To: Anthony Liguori
  Cc: Frank Ch. Eigler, Avi Kivity, Zhang, Yanmin, Peter Zijlstra,
	Sheng Yang, linux-kernel, kvm, Marcelo Tosatti, oerg Roedel,
	Jes Sorensen, Gleb Natapov, Zachary Amsden, ziteng.huang


* Anthony Liguori <aliguori@linux.vnet.ibm.com> wrote:

> If you want to use a synthetic filesystem as the management interface for 
> qemu, that's one thing.  But you suggested exposing the guest filesystem in 
> its entirety and that's what I disagreed with.

What did you think, that it would be world-readable? Why would we do such a 
stupid thing? Any mounted content should at minimum match whatever policy 
covers the image file. The mounting of contents is not a privilege escalation 
and it is already possible today - just not integrated properly and not 
practical. (and apparently not implemented for all the wrong 'security' 
reasons)

> The guest may encrypt its disk image.  It still ought to be possible to run 
> perf against that guest, no?

_In_ the guest you can of course run it just fine. (once paravirt bits are in 
place)

That has no connection to 'perf kvm' though, which this patch submission is 
about ...

If you want unified profiling of both host and guest then you need access to 
both the guest and the host. This is what the 'perf kvm' patch is about. 
Please read the patch, i think you might be misunderstanding what it does ...

Regarding encrypted contents - that's really a distraction but the host has 
absolute, 100% control over the guest and there's nothing the guest can do 
about that - unless you are thinking about the sub-sub-case of Orwellian 
DRM-locked-down systems - in which case there's nothing for the host to mount 
and the guest can reject any requests for information on itself and impose 
additional policy that way. So it's a security non-issue.

Note that DRM is pretty much the worst place to look at when it comes to 
usability: DRM lock-down is the antithesis of usability. Do you really want 
KVM to match the mind-set of the RIAA and MPAA? Why do you pretend that a 
developer cannot mount his own disk image? Pretty please, help Linux instead, 
where development is driven by usability and accessibility ...

Thanks,

	Ingo

^ permalink raw reply	[flat|nested] 390+ messages in thread

* Re: [PATCH] Enhance perf to collect KVM guest os statistics from host side
  2010-03-17  8:20                                                     ` Avi Kivity
@ 2010-03-17  8:59                                                       ` Ingo Molnar
  0 siblings, 0 replies; 390+ messages in thread
From: Ingo Molnar @ 2010-03-17  8:59 UTC (permalink / raw)
  To: Avi Kivity
  Cc: Frank Ch. Eigler, Anthony Liguori, Zhang, Yanmin, Peter Zijlstra,
	Sheng Yang, linux-kernel, kvm, Marcelo Tosatti, oerg Roedel,
	Jes Sorensen, Gleb Natapov, Zachary Amsden, ziteng.huang


* Avi Kivity <avi@redhat.com> wrote:

> On 03/17/2010 10:16 AM, Ingo Molnar wrote:
> >* Avi Kivity<avi@redhat.com>  wrote:
> >
> >> Monitoring guests from the host is useful for kvm developers, but less so
> >> for users.
> >
> > Guest space profiling is easy, and 'perf kvm' is not about that. (plain 
> > 'perf' will work if a proper paravirt channel is opened to the host)
> >
> > I think you might have misunderstood the purpose and role of the 'perf 
> > kvm' patch here? 'perf kvm' is aimed at KVM developers: it is them who 
> > improve KVM code, not guest kernel users.
> 
> Of course I understood it.  My point was that 'perf kvm' serves a tiny 
> minority of users. [...]

I hope you won't be disappointed to learn that 100% of Linux, all 13+ million 
lines of it, was and is being developed by a tiny, tiny, tiny minority of 
users ;-)

> [...]  That doesn't mean it isn't useful, just that it doesn't satisfy all 
> needs by itself.

Of course - and it doesn't bring world peace either. One step at a time.

Thanks,

	Ingo

^ permalink raw reply	[flat|nested] 390+ messages in thread

* Re: [PATCH] Enhance perf to collect KVM guest os statistics from host side
  2010-03-16  9:47       ` Ingo Molnar
@ 2010-03-17  9:26         ` Zhang, Yanmin
  2010-03-18  2:45           ` Zhang, Yanmin
  0 siblings, 1 reply; 390+ messages in thread
From: Zhang, Yanmin @ 2010-03-17  9:26 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Avi Kivity, Peter Zijlstra, Sheng Yang, linux-kernel, kvm,
	Marcelo Tosatti, oerg Roedel, Jes Sorensen, Gleb Natapov,
	Zachary Amsden, zhiteng.huang

On Tue, 2010-03-16 at 10:47 +0100, Ingo Molnar wrote:
> * Zhang, Yanmin <yanmin_zhang@linux.intel.com> wrote:
> 
> > On Tue, 2010-03-16 at 15:48 +0800, Zhang, Yanmin wrote:
> > > On Tue, 2010-03-16 at 07:41 +0200, Avi Kivity wrote:
> > > > On 03/16/2010 07:27 AM, Zhang, Yanmin wrote:
> > > > > From: Zhang, Yanmin<yanmin_zhang@linux.intel.com>
> > > > >
> > > > > Based on the discussion in KVM community, I worked out the patch to support
> > > > > perf to collect guest os statistics from host side. This patch is implemented
> > > > > with Ingo, Peter and some other guys' kind help. Yang Sheng pointed out a
> > > > > critical bug and provided good suggestions with other guys. I really appreciate
> > > > > their kind help.
> > > > >
> > > > > The patch adds new subcommand kvm to perf.
> > > > >
> > > > >    perf kvm top
> > > > >    perf kvm record
> > > > >    perf kvm report
> > > > >    perf kvm diff
> > > > >
> > > > > The new perf could profile guest os kernel except guest os user space, but it
> > > > > could summarize guest os user space utilization per guest os.
> > > > >
> > > > > Below are some examples.
> > > > > 1) perf kvm top
> > > > > [root@lkp-ne01 norm]# perf kvm --host --guest --guestkallsyms=/home/ymzhang/guest/kallsyms
> > > > > --guestmodules=/home/ymzhang/guest/modules top
> > > > >
> > > > >    
> > > > 
> > > Thanks for your kind comments.
> > > 
> > > > Excellent, support for guest kernel != host kernel is critical (I can't 
> > > > remember the last time I ran same kernels).
> > > > 
> > > > How would we support multiple guests with different kernels?
> > > With the patch, 'perf kvm report --sort pid' could show
> > > summary statistics for all guest os instances. Then, use
> > > parameter --pid of 'perf kvm record' to collect single problematic instance data.
> > Sorry. I found that currently --pid doesn't mean a process but a thread (the main thread).
> > 
> > Ingo,
> > 
> > Is it possible to support a new parameter or extend --inherit, so 'perf 
> > record' and 'perf top' could collect data on all threads of a process when 
> > the process is running?
> > 
> > If not, I need to add a new ugly parameter, similar to --pid, to filter
> > out process data in userspace.
> 
> Yeah. For maximum utility i'd suggest to extend --pid to include this, and 
> introduce --tid for the previous, limited-to-a-single-task functionality.
> 
> Most users would expect --pid to work like a 'late attach' - i.e. to work like 
> strace -f or like a gdb attach.

Thanks Ingo, Avi.

I worked out the patch below against tip/master of March 15th.

Subject: [PATCH] Change perf's parameter --pid to process-wide collection
From: Zhang, Yanmin <yanmin_zhang@linux.intel.com>

Change parameter -p (--pid) to real process pid and add -t (--tid) meaning
thread id. Now, --pid means perf collects the statistics of all threads of
the process, while --tid means perf just collects the statistics of that thread.

BTW, the patch fixes a bug in 'perf stat -p'. 'perf stat' always configures
attr->disabled=1 if it isn't a system-wide collection. If there is a '-p'
and no forks, 'perf stat -p' doesn't collect any data. In addition, the
while(!done) loop in run_perf_stat consumes 100% of a single cpu, which has a
bad impact on the running workload. I added a sleep(1) in the loop.

Signed-off-by: Zhang Yanmin <yanmin_zhang@linux.intel.com>

---

diff -Nraup linux-2.6_tipmaster0315/tools/perf/builtin-record.c linux-2.6_tipmaster0315_perfpid/tools/perf/builtin-record.c
--- linux-2.6_tipmaster0315/tools/perf/builtin-record.c	2010-03-16 08:59:54.896488489 +0800
+++ linux-2.6_tipmaster0315_perfpid/tools/perf/builtin-record.c	2010-03-17 16:30:17.755551706 +0800
@@ -27,7 +27,7 @@
 #include <unistd.h>
 #include <sched.h>
 
-static int			fd[MAX_NR_CPUS][MAX_COUNTERS];
+static int			*fd[MAX_NR_CPUS][MAX_COUNTERS];
 
 static long			default_interval		=      0;
 
@@ -43,6 +43,9 @@ static int			raw_samples			=      0;
 static int			system_wide			=      0;
 static int			profile_cpu			=     -1;
 static pid_t			target_pid			=     -1;
+static pid_t			target_tid			=     -1;
+static int			*all_tids			=      NULL;
+static int			thread_num			=      0;
 static pid_t			child_pid			=     -1;
 static int			inherit				=      1;
 static int			force				=      0;
@@ -60,7 +63,7 @@ static struct timeval		this_read;
 
 static u64			bytes_written			=      0;
 
-static struct pollfd		event_array[MAX_NR_CPUS * MAX_COUNTERS];
+static struct pollfd		*event_array;
 
 static int			nr_poll				=      0;
 static int			nr_cpu				=      0;
@@ -77,7 +80,7 @@ struct mmap_data {
 	unsigned int		prev;
 };
 
-static struct mmap_data		mmap_array[MAX_NR_CPUS][MAX_COUNTERS];
+static struct mmap_data		*mmap_array[MAX_NR_CPUS][MAX_COUNTERS];
 
 static unsigned long mmap_read_head(struct mmap_data *md)
 {
@@ -225,11 +228,12 @@ static struct perf_header_attr *get_head
 	return h_attr;
 }
 
-static void create_counter(int counter, int cpu, pid_t pid, bool forks)
+static void create_counter(int counter, int cpu, bool forks)
 {
 	char *filter = filters[counter];
 	struct perf_event_attr *attr = attrs + counter;
 	struct perf_header_attr *h_attr;
+	int thread_index;
 	int track = !counter; /* only the first counter needs these */
 	int ret;
 	struct {
@@ -280,117 +284,124 @@ static void create_counter(int counter, 
 	if (forks)
 		attr->enable_on_exec = 1;
 
+	for (thread_index = 0; thread_index < thread_num; thread_index++) {
 try_again:
-	fd[nr_cpu][counter] = sys_perf_event_open(attr, pid, cpu, group_fd, 0);
+		fd[nr_cpu][counter][thread_index] = sys_perf_event_open(attr,
+				all_tids[thread_index], cpu, group_fd, 0);
 
-	if (fd[nr_cpu][counter] < 0) {
-		int err = errno;
+		if (fd[nr_cpu][counter][thread_index] < 0) {
+			int err = errno;
 
-		if (err == EPERM || err == EACCES)
-			die("Permission error - are you root?\n"
-			    "\t Consider tweaking /proc/sys/kernel/perf_event_paranoid.\n");
-		else if (err ==  ENODEV && profile_cpu != -1)
-			die("No such device - did you specify an out-of-range profile CPU?\n");
+			if (err == EPERM || err == EACCES)
+				die("Permission error - are you root?\n"
+						"\t Consider tweaking /proc/sys/kernel/perf_event_paranoid.\n");
+			else if (err ==  ENODEV && profile_cpu != -1)
+				die("No such device - did you specify an out-of-range profile CPU?\n");
 
-		/*
-		 * If it's cycles then fall back to hrtimer
-		 * based cpu-clock-tick sw counter, which
-		 * is always available even if no PMU support:
-		 */
-		if (attr->type == PERF_TYPE_HARDWARE
-			&& attr->config == PERF_COUNT_HW_CPU_CYCLES) {
+			/*
+			 * If it's cycles then fall back to hrtimer
+			 * based cpu-clock-tick sw counter, which
+			 * is always available even if no PMU support:
+			 */
+			if (attr->type == PERF_TYPE_HARDWARE
+					&& attr->config == PERF_COUNT_HW_CPU_CYCLES) {
 
-			if (verbose)
-				warning(" ... trying to fall back to cpu-clock-ticks\n");
-			attr->type = PERF_TYPE_SOFTWARE;
-			attr->config = PERF_COUNT_SW_CPU_CLOCK;
-			goto try_again;
-		}
-		printf("\n");
-		error("perfcounter syscall returned with %d (%s)\n",
-			fd[nr_cpu][counter], strerror(err));
+				if (verbose)
+					warning(" ... trying to fall back to cpu-clock-ticks\n");
+				attr->type = PERF_TYPE_SOFTWARE;
+				attr->config = PERF_COUNT_SW_CPU_CLOCK;
+				goto try_again;
+			}
+			printf("\n");
+			error("perfcounter syscall returned with %d (%s)\n",
+				fd[nr_cpu][counter][thread_index],
+				strerror(err));
 
 #if defined(__i386__) || defined(__x86_64__)
-		if (attr->type == PERF_TYPE_HARDWARE && err == EOPNOTSUPP)
-			die("No hardware sampling interrupt available. No APIC? If so then you can boot the kernel with the \"lapic\" boot parameter to force-enable it.\n");
+			if (attr->type == PERF_TYPE_HARDWARE && err == EOPNOTSUPP)
+				die("No hardware sampling interrupt available. No APIC? If so then you can boot the kernel with the \"lapic\" boot parameter to force-enable it.\n");
 #endif
 
-		die("No CONFIG_PERF_EVENTS=y kernel support configured?\n");
-		exit(-1);
-	}
+			die("No CONFIG_PERF_EVENTS=y kernel support configured?\n");
+			exit(-1);
+		}
 
-	h_attr = get_header_attr(attr, counter);
-	if (h_attr == NULL)
-		die("nomem\n");
+		h_attr = get_header_attr(attr, counter);
+		if (h_attr == NULL)
+			die("nomem\n");
+
+		if (!file_new) {
+			if (memcmp(&h_attr->attr, attr, sizeof(*attr))) {
+				fprintf(stderr, "incompatible append\n");
+				exit(-1);
+			}
+		}
 
-	if (!file_new) {
-		if (memcmp(&h_attr->attr, attr, sizeof(*attr))) {
-			fprintf(stderr, "incompatible append\n");
+		if (read(fd[nr_cpu][counter][thread_index], &read_data, sizeof(read_data)) == -1) {
+			perror("Unable to read perf file descriptor\n");
 			exit(-1);
 		}
-	}
-
-	if (read(fd[nr_cpu][counter], &read_data, sizeof(read_data)) == -1) {
-		perror("Unable to read perf file descriptor\n");
-		exit(-1);
-	}
 
-	if (perf_header_attr__add_id(h_attr, read_data.id) < 0) {
-		pr_warning("Not enough memory to add id\n");
-		exit(-1);
-	}
+		if (perf_header_attr__add_id(h_attr, read_data.id) < 0) {
+			pr_warning("Not enough memory to add id\n");
+			exit(-1);
+		}
 
-	assert(fd[nr_cpu][counter] >= 0);
-	fcntl(fd[nr_cpu][counter], F_SETFL, O_NONBLOCK);
+		assert(fd[nr_cpu][counter][thread_index] >= 0);
+		fcntl(fd[nr_cpu][counter][thread_index], F_SETFL, O_NONBLOCK);
 
-	/*
-	 * First counter acts as the group leader:
-	 */
-	if (group && group_fd == -1)
-		group_fd = fd[nr_cpu][counter];
-	if (multiplex && multiplex_fd == -1)
-		multiplex_fd = fd[nr_cpu][counter];
+		/*
+		 * First counter acts as the group leader:
+		 */
+		if (group && group_fd == -1)
+			group_fd = fd[nr_cpu][counter][thread_index];
+		if (multiplex && multiplex_fd == -1)
+			multiplex_fd = fd[nr_cpu][counter][thread_index];
 
-	if (multiplex && fd[nr_cpu][counter] != multiplex_fd) {
+		if (multiplex && fd[nr_cpu][counter][thread_index] != multiplex_fd) {
 
-		ret = ioctl(fd[nr_cpu][counter], PERF_EVENT_IOC_SET_OUTPUT, multiplex_fd);
-		assert(ret != -1);
-	} else {
-		event_array[nr_poll].fd = fd[nr_cpu][counter];
-		event_array[nr_poll].events = POLLIN;
-		nr_poll++;
-
-		mmap_array[nr_cpu][counter].counter = counter;
-		mmap_array[nr_cpu][counter].prev = 0;
-		mmap_array[nr_cpu][counter].mask = mmap_pages*page_size - 1;
-		mmap_array[nr_cpu][counter].base = mmap(NULL, (mmap_pages+1)*page_size,
-				PROT_READ|PROT_WRITE, MAP_SHARED, fd[nr_cpu][counter], 0);
-		if (mmap_array[nr_cpu][counter].base == MAP_FAILED) {
-			error("failed to mmap with %d (%s)\n", errno, strerror(errno));
-			exit(-1);
+			ret = ioctl(fd[nr_cpu][counter][thread_index], PERF_EVENT_IOC_SET_OUTPUT, multiplex_fd);
+			assert(ret != -1);
+		} else {
+			event_array[nr_poll].fd = fd[nr_cpu][counter][thread_index];
+			event_array[nr_poll].events = POLLIN;
+			nr_poll++;
+
+			mmap_array[nr_cpu][counter][thread_index].counter = counter;
+			mmap_array[nr_cpu][counter][thread_index].prev = 0;
+			mmap_array[nr_cpu][counter][thread_index].mask = mmap_pages*page_size - 1;
+			mmap_array[nr_cpu][counter][thread_index].base = mmap(NULL,
+							(mmap_pages+1)*page_size,
+							PROT_READ|PROT_WRITE, MAP_SHARED,
+							fd[nr_cpu][counter][thread_index],
+							0);
+			if (mmap_array[nr_cpu][counter][thread_index].base == MAP_FAILED) {
+				error("failed to mmap with %d (%s)\n", errno, strerror(errno));
+				exit(-1);
+			}
 		}
-	}
 
-	if (filter != NULL) {
-		ret = ioctl(fd[nr_cpu][counter],
-			    PERF_EVENT_IOC_SET_FILTER, filter);
-		if (ret) {
-			error("failed to set filter with %d (%s)\n", errno,
-			      strerror(errno));
-			exit(-1);
+		if (filter != NULL) {
+			ret = ioctl(fd[nr_cpu][counter][thread_index],
+					PERF_EVENT_IOC_SET_FILTER, filter);
+			if (ret) {
+				error("failed to set filter with %d (%s)\n", errno,
+						strerror(errno));
+				exit(-1);
+			}
 		}
-	}
 
-	ioctl(fd[nr_cpu][counter], PERF_EVENT_IOC_ENABLE);
+		ioctl(fd[nr_cpu][counter][thread_index], PERF_EVENT_IOC_ENABLE);
+	}
 }
 
-static void open_counters(int cpu, pid_t pid, bool forks)
+static void open_counters(int cpu, bool forks)
 {
 	int counter;
 
 	group_fd = -1;
 	for (counter = 0; counter < nr_counters; counter++)
-		create_counter(counter, cpu, pid, forks);
+		create_counter(counter, cpu, forks);
 
 	nr_cpu++;
 }
@@ -425,7 +436,7 @@ static int __cmd_record(int argc, const 
 	int err;
 	unsigned long waking = 0;
 	int child_ready_pipe[2], go_pipe[2];
-	const bool forks = target_pid == -1 && argc > 0;
+	const bool forks = target_tid == -1 && argc > 0;
 	char buf;
 
 	page_size = sysconf(_SC_PAGE_SIZE);
@@ -534,7 +545,7 @@ static int __cmd_record(int argc, const 
 		child_pid = pid;
 
 		if (!system_wide)
-			target_pid = pid;
+			target_tid = pid;
 
 		close(child_ready_pipe[1]);
 		close(go_pipe[0]);
@@ -550,11 +561,11 @@ static int __cmd_record(int argc, const 
 
 
 	if ((!system_wide && !inherit) || profile_cpu != -1) {
-		open_counters(profile_cpu, target_pid, forks);
+		open_counters(profile_cpu, forks);
 	} else {
 		nr_cpus = read_cpu_map();
 		for (i = 0; i < nr_cpus; i++)
-			open_counters(cpumap[i], target_pid, forks);
+			open_counters(cpumap[i], forks);
 	}
 
 	if (file_new) {
@@ -579,7 +590,7 @@ static int __cmd_record(int argc, const 
 	}
 
 	if (!system_wide && profile_cpu == -1)
-		event__synthesize_thread(target_pid, process_synthesized_event,
+		event__synthesize_thread(target_tid, process_synthesized_event,
 					 session);
 	else
 		event__synthesize_threads(process_synthesized_event, session);
@@ -602,11 +613,15 @@ static int __cmd_record(int argc, const 
 
 	for (;;) {
 		int hits = samples;
+		int thread;
 
 		for (i = 0; i < nr_cpu; i++) {
 			for (counter = 0; counter < nr_counters; counter++) {
-				if (mmap_array[i][counter].base)
-					mmap_read(&mmap_array[i][counter]);
+				for (thread = 0;
+					thread < thread_num; thread++) {
+					if (mmap_array[i][counter][thread].base)
+						mmap_read(&mmap_array[i][counter][thread]);
+				}
 			}
 		}
 
@@ -619,8 +634,13 @@ static int __cmd_record(int argc, const 
 
 		if (done) {
 			for (i = 0; i < nr_cpu; i++) {
-				for (counter = 0; counter < nr_counters; counter++)
-					ioctl(fd[i][counter], PERF_EVENT_IOC_DISABLE);
+				for (counter = 0; counter < nr_counters;
+					counter++) {
+					for (thread = 0;
+						thread < thread_num; thread++)
+						ioctl(fd[i][counter][thread],
+							PERF_EVENT_IOC_DISABLE);
+				}
 			}
 		}
 	}
@@ -653,6 +673,8 @@ static const struct option options[] = {
 		     "event filter", parse_filter),
 	OPT_INTEGER('p', "pid", &target_pid,
 		    "record events on existing pid"),
+	OPT_INTEGER('t', "tid", &target_tid,
+		    "record events on existing thread id"),
 	OPT_INTEGER('r', "realtime", &realtime_prio,
 		    "collect data with this RT SCHED_FIFO priority"),
 	OPT_BOOLEAN('R', "raw-samples", &raw_samples,
@@ -693,10 +715,11 @@ static const struct option options[] = {
 int cmd_record(int argc, const char **argv, const char *prefix __used)
 {
 	int counter;
+	int i,j;
 
 	argc = parse_options(argc, argv, options, record_usage,
 			    PARSE_OPT_STOP_AT_NON_OPTION);
-	if (!argc && target_pid == -1 && !system_wide && profile_cpu == -1)
+	if (!argc && target_tid == -1 && !system_wide && profile_cpu == -1)
 		usage_with_options(record_usage, options);
 
 	symbol__init();
@@ -707,6 +730,37 @@ int cmd_record(int argc, const char **ar
 		attrs[0].config = PERF_COUNT_HW_CPU_CYCLES;
 	}
 
+	if (target_pid != -1) {
+		target_tid = target_pid;
+		thread_num = find_all_tid(target_pid, &all_tids);
+		if (thread_num <= 0) {
+			fprintf(stderr, "Can't find all threads of pid %d\n",
+				target_pid);
+			usage_with_options(record_usage, options);
+		}
+	} else {
+		all_tids=malloc(sizeof(int));
+		if (!all_tids)
+			return -ENOMEM;
+
+		all_tids[0] = target_tid;
+		thread_num = 1;
+	}
+
+	for (i = 0; i < MAX_NR_CPUS; i++) {
+		for (j = 0; j < MAX_COUNTERS; j++) {
+			fd[i][j] = malloc(sizeof(int)*thread_num);
+			mmap_array[i][j] = malloc(
+				sizeof(struct mmap_data)*thread_num);
+			if (!fd[i][j] || !mmap_array[i][j])
+				return -ENOMEM;
+		}
+	}
+	event_array = malloc(
+		sizeof(struct pollfd)*MAX_NR_CPUS*MAX_COUNTERS*thread_num);
+	if (!event_array)
+		return -ENOMEM;
+
 	/*
 	 * User specified count overrides default frequency.
 	 */
diff -Nraup linux-2.6_tipmaster0315/tools/perf/builtin-stat.c linux-2.6_tipmaster0315_perfpid/tools/perf/builtin-stat.c
--- linux-2.6_tipmaster0315/tools/perf/builtin-stat.c	2010-03-16 08:59:54.892460680 +0800
+++ linux-2.6_tipmaster0315_perfpid/tools/perf/builtin-stat.c	2010-03-17 16:30:25.484062179 +0800
@@ -46,6 +46,7 @@
 #include "util/debug.h"
 #include "util/header.h"
 #include "util/cpumap.h"
+#include "util/thread.h"
 
 #include <sys/prctl.h>
 #include <math.h>
@@ -74,10 +75,14 @@ static int			run_count			=  1;
 static int			inherit				=  1;
 static int			scale				=  1;
 static pid_t			target_pid			= -1;
+static pid_t			target_tid			= -1;
+static int			*all_tids			=  NULL;
+static int			thread_num			=  0;
 static pid_t			child_pid			= -1;
 static int			null_run			=  0;
+static bool			forks				=  false;
 
-static int			fd[MAX_NR_CPUS][MAX_COUNTERS];
+static int			*fd[MAX_NR_CPUS][MAX_COUNTERS];
 
 static int			event_scaled[MAX_COUNTERS];
 
@@ -140,9 +145,10 @@ struct stats			runtime_branches_stats;
 #define ERR_PERF_OPEN \
 "Error: counter %d, sys_perf_event_open() syscall returned with %d (%s)\n"
 
-static void create_perf_stat_counter(int counter, int pid)
+static void create_perf_stat_counter(int counter)
 {
 	struct perf_event_attr *attr = attrs + counter;
+	int thread;
 
 	if (scale)
 		attr->read_format = PERF_FORMAT_TOTAL_TIME_ENABLED |
@@ -152,20 +158,24 @@ static void create_perf_stat_counter(int
 		unsigned int cpu;
 
 		for (cpu = 0; cpu < nr_cpus; cpu++) {
-			fd[cpu][counter] = sys_perf_event_open(attr, -1, cpumap[cpu], -1, 0);
-			if (fd[cpu][counter] < 0 && verbose)
+			fd[cpu][counter][0] = sys_perf_event_open(attr, -1, cpumap[cpu], -1, 0);
+			if (fd[cpu][counter][0] < 0 && verbose)
 				fprintf(stderr, ERR_PERF_OPEN, counter,
-					fd[cpu][counter], strerror(errno));
+					fd[cpu][counter][0], strerror(errno));
 		}
 	} else {
 		attr->inherit	     = inherit;
-		attr->disabled	     = 1;
+		if (forks)
+			attr->disabled	     = 1;
 		attr->enable_on_exec = 1;
 
-		fd[0][counter] = sys_perf_event_open(attr, pid, -1, -1, 0);
-		if (fd[0][counter] < 0 && verbose)
-			fprintf(stderr, ERR_PERF_OPEN, counter,
-				fd[0][counter], strerror(errno));
+		for (thread = 0; thread < thread_num; thread++) {
+			fd[0][counter][thread] = sys_perf_event_open(attr,
+				all_tids[thread], -1, -1, 0);
+			if (fd[0][counter][thread] < 0 && verbose)
+				fprintf(stderr, ERR_PERF_OPEN, counter,
+					fd[0][counter][thread], strerror(errno));
+		}
 	}
 }
 
@@ -190,25 +200,29 @@ static void read_counter(int counter)
 	unsigned int cpu;
 	size_t res, nv;
 	int scaled;
-	int i;
+	int i, thread;
 
 	count[0] = count[1] = count[2] = 0;
 
 	nv = scale ? 3 : 1;
 	for (cpu = 0; cpu < nr_cpus; cpu++) {
-		if (fd[cpu][counter] < 0)
-			continue;
-
-		res = read(fd[cpu][counter], single_count, nv * sizeof(u64));
-		assert(res == nv * sizeof(u64));
-
-		close(fd[cpu][counter]);
-		fd[cpu][counter] = -1;
-
-		count[0] += single_count[0];
-		if (scale) {
-			count[1] += single_count[1];
-			count[2] += single_count[2];
+		
+		for (thread = 0; thread < thread_num; thread++) {
+			if (fd[cpu][counter][thread] < 0)
+				continue;
+
+			res = read(fd[cpu][counter][thread],
+					single_count, nv * sizeof(u64));
+			assert(res == nv * sizeof(u64));
+
+			close(fd[cpu][counter][thread]);
+			fd[cpu][counter][thread] = -1;
+
+			count[0] += single_count[0];
+			if (scale) {
+				count[1] += single_count[1];
+				count[2] += single_count[2];
+			}
 		}
 	}
 
@@ -251,11 +265,11 @@ static int run_perf_stat(int argc __used
 	unsigned long long t0, t1;
 	int status = 0;
 	int counter;
-	int pid = target_pid;
+	int pid = target_tid;
 	int child_ready_pipe[2], go_pipe[2];
-	const bool forks = (target_pid == -1 && argc > 0);
 	char buf;
 
+	forks = (target_tid == -1 && argc > 0);
 	if (!system_wide)
 		nr_cpus = 1;
 
@@ -307,10 +321,12 @@ static int run_perf_stat(int argc __used
 		if (read(child_ready_pipe[0], &buf, 1) == -1)
 			perror("unable to read pipe");
 		close(child_ready_pipe[0]);
+
+		all_tids[0] = pid;
 	}
 
 	for (counter = 0; counter < nr_counters; counter++)
-		create_perf_stat_counter(counter, pid);
+		create_perf_stat_counter(counter);
 
 	/*
 	 * Enable counters and exec the command:
@@ -321,7 +337,7 @@ static int run_perf_stat(int argc __used
 		close(go_pipe[1]);
 		wait(&status);
 	} else {
-		while(!done);
+		while(!done) sleep(1);
 	}
 
 	t1 = rdclock();
@@ -429,12 +445,12 @@ static void print_stat(int argc, const c
 
 	fprintf(stderr, "\n");
 	fprintf(stderr, " Performance counter stats for ");
-	if(target_pid == -1) {
+	if(target_tid == -1) {
 		fprintf(stderr, "\'%s", argv[0]);
 		for (i = 1; i < argc; i++)
 			fprintf(stderr, " %s", argv[i]);
 	}else
-		fprintf(stderr, "task pid \'%d", target_pid);
+		fprintf(stderr, "task pid \'%d", target_tid);
 
 	fprintf(stderr, "\'");
 	if (run_count > 1)
@@ -459,7 +475,7 @@ static volatile int signr = -1;
 
 static void skip_signal(int signo)
 {
-	if(target_pid != -1)
+	if(target_tid != -1)
 		done = 1;
 
 	signr = signo;
@@ -488,8 +504,10 @@ static const struct option options[] = {
 		     parse_events),
 	OPT_BOOLEAN('i', "inherit", &inherit,
 		    "child tasks inherit counters"),
-	OPT_INTEGER('p', "pid", &target_pid,
+	OPT_INTEGER('p', "pid", &target_tid,
 		    "stat events on existing pid"),
+	OPT_INTEGER('t', "tid", &target_tid,
+		    "stat events on existing tid"),
 	OPT_BOOLEAN('a', "all-cpus", &system_wide,
 		    "system-wide collection from all CPUs"),
 	OPT_BOOLEAN('c', "scale", &scale,
@@ -506,10 +524,11 @@ static const struct option options[] = {
 int cmd_stat(int argc, const char **argv, const char *prefix __used)
 {
 	int status;
+	int i,j;
 
 	argc = parse_options(argc, argv, options, stat_usage,
 		PARSE_OPT_STOP_AT_NON_OPTION);
-	if (!argc && target_pid == -1)
+	if (!argc && (target_pid == -1 || target_tid == -1))
 		usage_with_options(stat_usage, options);
 	if (run_count <= 0)
 		usage_with_options(stat_usage, options);
@@ -525,6 +544,32 @@ int cmd_stat(int argc, const char **argv
 	else
 		nr_cpus = 1;
 
+	if (target_pid != -1) {
+		target_tid = target_pid;
+		thread_num = find_all_tid(target_pid, &all_tids);
+		if (thread_num <= 0) {
+			fprintf(stderr, "Can't find all threads of pid %d\n",
+				target_pid);
+			usage_with_options(stat_usage, options);
+		}
+
+	} else {
+		all_tids=malloc(sizeof(int));
+		if (!all_tids)
+			return -ENOMEM;
+
+		all_tids[0] = target_tid;
+		thread_num = 1;
+	}
+
+	for (i = 0; i < MAX_NR_CPUS; i++) {
+		for (j = 0; j < MAX_COUNTERS; j++) {
+			fd[i][j] = malloc(sizeof(int)*thread_num);
+			if (!fd[i][j])
+				return -ENOMEM;
+		}
+	}
+
 	/*
 	 * We dont want to block the signals - that would cause
 	 * child tasks to inherit that and Ctrl-C would not work.
diff -Nraup linux-2.6_tipmaster0315/tools/perf/builtin-top.c linux-2.6_tipmaster0315_perfpid/tools/perf/builtin-top.c
--- linux-2.6_tipmaster0315/tools/perf/builtin-top.c	2010-03-16 08:59:54.760470652 +0800
+++ linux-2.6_tipmaster0315_perfpid/tools/perf/builtin-top.c	2010-03-17 16:30:35.316716557 +0800
@@ -55,7 +55,8 @@
 #include <linux/unistd.h>
 #include <linux/types.h>
 
-static int			fd[MAX_NR_CPUS][MAX_COUNTERS];
+static int			*fd[MAX_NR_CPUS][MAX_COUNTERS];
+static int			*all_tids			=      NULL;
 
 static int			system_wide			=      0;
 
@@ -64,7 +65,9 @@ static int			default_interval		=      0;
 static int			count_filter			=      5;
 static int			print_entries;
 
+static int			target_tid			=     -1;
 static int			target_pid			=     -1;
+static int			thread_num			=      0;
 static int			inherit				=      0;
 static int			profile_cpu			=     -1;
 static int			nr_cpus				=      0;
@@ -524,13 +527,15 @@ static void print_sym_table(void)
 
 	if (target_pid != -1)
 		printf(" (target_pid: %d", target_pid);
+	else if (target_tid != -1)
+		printf(" (target_tid: %d", target_tid);
 	else
 		printf(" (all");
 
 	if (profile_cpu != -1)
 		printf(", cpu: %d)\n", profile_cpu);
 	else {
-		if (target_pid != -1)
+		if (target_tid != -1)
 			printf(")\n");
 		else
 			printf(", %d CPUs)\n", nr_cpus);
@@ -1124,16 +1129,21 @@ static void perf_session__mmap_read_coun
 	md->prev = old;
 }
 
-static struct pollfd event_array[MAX_NR_CPUS * MAX_COUNTERS];
-static struct mmap_data mmap_array[MAX_NR_CPUS][MAX_COUNTERS];
+static struct pollfd *event_array;
+static struct mmap_data *mmap_array[MAX_NR_CPUS][MAX_COUNTERS];
 
 static void perf_session__mmap_read(struct perf_session *self)
 {
-	int i, counter;
+	int i, counter, thread_index;
 
 	for (i = 0; i < nr_cpus; i++) {
 		for (counter = 0; counter < nr_counters; counter++)
-			perf_session__mmap_read_counter(self, &mmap_array[i][counter]);
+			for (thread_index = 0;
+				thread_index < thread_num;
+				thread_index++) {
+				perf_session__mmap_read_counter(self,
+					&mmap_array[i][counter][thread_index]);
+			}
 	}
 }
 
@@ -1144,9 +1154,10 @@ static void start_counter(int i, int cou
 {
 	struct perf_event_attr *attr;
 	int cpu;
+	int thread_index;
 
 	cpu = profile_cpu;
-	if (target_pid == -1 && profile_cpu == -1)
+	if (target_tid == -1 && profile_cpu == -1)
 		cpu = cpumap[i];
 
 	attr = attrs + counter;
@@ -1162,55 +1173,58 @@ static void start_counter(int i, int cou
 	attr->inherit		= (cpu < 0) && inherit;
 	attr->mmap		= 1;
 
+	for (thread_index = 0; thread_index < thread_num; thread_index++) {
 try_again:
-	fd[i][counter] = sys_perf_event_open(attr, target_pid, cpu, group_fd, 0);
+		fd[i][counter][thread_index] = sys_perf_event_open(attr,
+				all_tids[thread_index], cpu, group_fd, 0);
+
+		if (fd[i][counter][thread_index] < 0) {
+			int err = errno;
 
-	if (fd[i][counter] < 0) {
-		int err = errno;
+			if (err == EPERM || err == EACCES)
+				die("No permission - are you root?\n");
+			/*
+			 * If it's cycles then fall back to hrtimer
+			 * based cpu-clock-tick sw counter, which
+			 * is always available even if no PMU support:
+			 */
+			if (attr->type == PERF_TYPE_HARDWARE
+					&& attr->config == PERF_COUNT_HW_CPU_CYCLES) {
+
+				if (verbose)
+					warning(" ... trying to fall back to cpu-clock-ticks\n");
+
+				attr->type = PERF_TYPE_SOFTWARE;
+				attr->config = PERF_COUNT_SW_CPU_CLOCK;
+				goto try_again;
+			}
+			printf("\n");
+			error("perfcounter syscall returned with %d (%s)\n",
+					fd[i][counter][thread_index], strerror(err));
+			die("No CONFIG_PERF_EVENTS=y kernel support configured?\n");
+			exit(-1);
+		}
+		assert(fd[i][counter][thread_index] >= 0);
+		fcntl(fd[i][counter][thread_index], F_SETFL, O_NONBLOCK);
 
-		if (err == EPERM || err == EACCES)
-			die("No permission - are you root?\n");
 		/*
-		 * If it's cycles then fall back to hrtimer
-		 * based cpu-clock-tick sw counter, which
-		 * is always available even if no PMU support:
+		 * First counter acts as the group leader:
 		 */
-		if (attr->type == PERF_TYPE_HARDWARE
-			&& attr->config == PERF_COUNT_HW_CPU_CYCLES) {
+		if (group && group_fd == -1)
+			group_fd = fd[i][counter][thread_index];
 
-			if (verbose)
-				warning(" ... trying to fall back to cpu-clock-ticks\n");
-
-			attr->type = PERF_TYPE_SOFTWARE;
-			attr->config = PERF_COUNT_SW_CPU_CLOCK;
-			goto try_again;
-		}
-		printf("\n");
-		error("perfcounter syscall returned with %d (%s)\n",
-			fd[i][counter], strerror(err));
-		die("No CONFIG_PERF_EVENTS=y kernel support configured?\n");
-		exit(-1);
+		event_array[nr_poll].fd = fd[i][counter][thread_index];
+		event_array[nr_poll].events = POLLIN;
+		nr_poll++;
+
+		mmap_array[i][counter][thread_index].counter = counter;
+		mmap_array[i][counter][thread_index].prev = 0;
+		mmap_array[i][counter][thread_index].mask = mmap_pages*page_size - 1;
+		mmap_array[i][counter][thread_index].base = mmap(NULL, (mmap_pages+1)*page_size,
+				PROT_READ, MAP_SHARED, fd[i][counter][thread_index], 0);
+		if (mmap_array[i][counter][thread_index].base == MAP_FAILED)
+			die("failed to mmap with %d (%s)\n", errno, strerror(errno));
 	}
-	assert(fd[i][counter] >= 0);
-	fcntl(fd[i][counter], F_SETFL, O_NONBLOCK);
-
-	/*
-	 * First counter acts as the group leader:
-	 */
-	if (group && group_fd == -1)
-		group_fd = fd[i][counter];
-
-	event_array[nr_poll].fd = fd[i][counter];
-	event_array[nr_poll].events = POLLIN;
-	nr_poll++;
-
-	mmap_array[i][counter].counter = counter;
-	mmap_array[i][counter].prev = 0;
-	mmap_array[i][counter].mask = mmap_pages*page_size - 1;
-	mmap_array[i][counter].base = mmap(NULL, (mmap_pages+1)*page_size,
-			PROT_READ, MAP_SHARED, fd[i][counter], 0);
-	if (mmap_array[i][counter].base == MAP_FAILED)
-		die("failed to mmap with %d (%s)\n", errno, strerror(errno));
 }
 
 static int __cmd_top(void)
@@ -1226,8 +1240,8 @@ static int __cmd_top(void)
 	if (session == NULL)
 		return -ENOMEM;
 
-	if (target_pid != -1)
-		event__synthesize_thread(target_pid, event__process, session);
+	if (target_tid != -1)
+		event__synthesize_thread(target_tid, event__process, session);
 	else
 		event__synthesize_threads(event__process, session);
 
@@ -1238,7 +1252,7 @@ static int __cmd_top(void)
 	}
 
 	/* Wait for a minimal set of events before starting the snapshot */
-	poll(event_array, nr_poll, 100);
+	poll(&event_array[0], nr_poll, 100);
 
 	perf_session__mmap_read(session);
 
@@ -1282,6 +1296,8 @@ static const struct option options[] = {
 		    "event period to sample"),
 	OPT_INTEGER('p', "pid", &target_pid,
 		    "profile events on existing pid"),
+	OPT_INTEGER('t', "tid", &target_tid,
+		    "profile events on existing tid"),
 	OPT_BOOLEAN('a', "all-cpus", &system_wide,
 			    "system-wide collection from all CPUs"),
 	OPT_INTEGER('C', "CPU", &profile_cpu,
@@ -1322,6 +1338,7 @@ static const struct option options[] = {
 int cmd_top(int argc, const char **argv, const char *prefix __used)
 {
 	int counter;
+	int i,j;
 
 	page_size = sysconf(_SC_PAGE_SIZE);
 
@@ -1329,8 +1346,39 @@ int cmd_top(int argc, const char **argv,
 	if (argc)
 		usage_with_options(top_usage, options);
 
+	if (target_pid != -1) {
+		target_tid = target_pid;
+		thread_num = find_all_tid(target_pid, &all_tids);
+		if (thread_num <= 0) {
+			fprintf(stderr, "Can't find all threads of pid %d\n",
+				target_pid);
+			usage_with_options(top_usage, options);
+		}
+	} else {
+		all_tids=malloc(sizeof(int));
+		if (!all_tids)
+			return -ENOMEM;
+
+		all_tids[0] = target_tid;
+		thread_num = 1;
+	}
+
+	for (i = 0; i < MAX_NR_CPUS; i++) {
+		for (j = 0; j < MAX_COUNTERS; j++) {
+			fd[i][j] = malloc(sizeof(int)*thread_num);
+			mmap_array[i][j] = malloc(
+				sizeof(struct mmap_data)*thread_num);
+			if (!fd[i][j] || !mmap_array[i][j])
+				return -ENOMEM;
+		}
+	}
+	event_array = malloc(
+		sizeof(struct pollfd)*MAX_NR_CPUS*MAX_COUNTERS*thread_num);
+	if (!event_array)
+		return -ENOMEM;
+
 	/* CPU and PID are mutually exclusive */
-	if (target_pid != -1 && profile_cpu != -1) {
+	if (target_tid > 0 && profile_cpu != -1) {
 		printf("WARNING: PID switch overriding CPU\n");
 		sleep(1);
 		profile_cpu = -1;
@@ -1371,7 +1419,7 @@ int cmd_top(int argc, const char **argv,
 		attrs[counter].sample_period = default_interval;
 	}
 
-	if (target_pid != -1 || profile_cpu != -1)
+	if (target_tid != -1 || profile_cpu != -1)
 		nr_cpus = 1;
 	else
 		nr_cpus = read_cpu_map();
diff -Nraup linux-2.6_tipmaster0315/tools/perf/util/thread.c linux-2.6_tipmaster0315_perfpid/tools/perf/util/thread.c
--- linux-2.6_tipmaster0315/tools/perf/util/thread.c	2010-03-16 08:59:54.892460680 +0800
+++ linux-2.6_tipmaster0315_perfpid/tools/perf/util/thread.c	2010-03-17 14:07:25.725218425 +0800
@@ -7,6 +7,37 @@
 #include "util.h"
 #include "debug.h"
 
+int find_all_tid(int pid, int ** all_tid)
+{
+	char name[256];
+	int items;
+	struct dirent **namelist = NULL;
+	int ret = 0;
+	int i;
+
+	sprintf(name, "/proc/%d/task", pid);
+	items = scandir(name, &namelist, NULL, NULL);
+	if (items <= 0)
+                return -ENOENT;
+	*all_tid = malloc(sizeof(int) * items);
+	if (!*all_tid) {
+		ret = -ENOMEM;
+		goto failure;
+	}
+
+	for (i = 0; i < items; i++)
+		(*all_tid)[i] = atoi(namelist[i]->d_name);
+
+	ret = items;
+
+failure:
+	for (i=0; i<items; i++)
+		free(namelist[i]);
+	free(namelist);
+
+	return ret;
+}
+
 void map_groups__init(struct map_groups *self)
 {
 	int i;
@@ -348,3 +379,4 @@ struct symbol *map_groups__find_symbol(s
 
 	return NULL;
 }
+
diff -Nraup linux-2.6_tipmaster0315/tools/perf/util/thread.h linux-2.6_tipmaster0315_perfpid/tools/perf/util/thread.h
--- linux-2.6_tipmaster0315/tools/perf/util/thread.h	2010-03-16 08:59:54.764469663 +0800
+++ linux-2.6_tipmaster0315_perfpid/tools/perf/util/thread.h	2010-03-17 14:03:09.628322688 +0800
@@ -23,6 +23,7 @@ struct thread {
 	int			comm_len;
 };
 
+int find_all_tid(int pid, int ** all_tid);
 void map_groups__init(struct map_groups *self);
 int thread__set_comm(struct thread *self, const char *comm);
 int thread__comm_len(struct thread *self);



^ permalink raw reply	[flat|nested] 390+ messages in thread

* Re: [PATCH] Enhance perf to collect KVM guest os statistics from host side
  2010-03-17  2:34       ` Zhang, Yanmin
@ 2010-03-17  9:28         ` Sheng Yang
  2010-03-17  9:41           ` Avi Kivity
  2010-03-17 21:14           ` Zachary Amsden
  0 siblings, 2 replies; 390+ messages in thread
From: Sheng Yang @ 2010-03-17  9:28 UTC (permalink / raw)
  To: Avi Kivity
  Cc: Zhang, Yanmin, Ingo Molnar, Peter Zijlstra, linux-kernel, kvm,
	Marcelo Tosatti, oerg Roedel, Jes Sorensen, Gleb Natapov,
	Zachary Amsden, Huang, Zhiteng, Joerg Roedel

On Wednesday 17 March 2010 10:34:33 Zhang, Yanmin wrote:
> On Tue, 2010-03-16 at 11:32 +0200, Avi Kivity wrote:
> > On 03/16/2010 09:48 AM, Zhang, Yanmin wrote:
> > > Right, but there is a scope between kvm_guest_enter and really running
> > > in guest os, where a perf event might overflow. Anyway, the scope is
> > > very narrow, I will change it to use flag PF_VCPU.
> >
> > There is also a window between setting the flag and calling 'int $2'
> > where an NMI might happen and be accounted incorrectly.
> >
> > Perhaps separate the 'int $2' into a direct call into perf and another
> > call for the rest of NMI handling.  I don't see how it would work on svm
> > though - AFAICT the NMI is held whereas vmx swallows it.
> >
> >  I guess NMIs
> > will be disabled until the next IRET so it isn't racy, just tricky.
> 
> I'm not sure if vmexit does break NMI context or not. Hardware NMI context
> isn't reentrant till a IRET. YangSheng would like to double check it.

After more check, I think VMX won't remained NMI block state for host. That's 
means, if NMI happened and processor is in VMX non-root mode, it would only 
result in VMExit, with a reason indicate that it's due to NMI happened, but no 
more state change in the host.

So in that meaning, there _is_ a window between VMExit and KVM handle the NMI. 
Moreover, I think we _can't_ stop the re-entrance of NMI handling code because 
"int $2" don't have effect to block following NMI.

And if the NMI sequence is not important(I think so), then we need to generate 
a real NMI in current vmexit-after code. Seems let APIC send a NMI IPI to 
itself is a good idea.
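
(For illustration only -- this is not the actual patch being debugged.  The
idea, applied at the point in vmx_complete_interrupts() where KVM currently
reflects a guest-time NMI back to the host with "int $2", would look roughly
like the sketch below; identifiers are the usual arch/x86/kvm/vmx.c ones and
<asm/apic.h> is assumed to be included.)

	exit_intr_info = vmcs_read32(VM_EXIT_INTR_INFO);

	/* We need to handle NMIs before interrupts are enabled */
	if ((exit_intr_info & INTR_INFO_INTR_TYPE_MASK) == INTR_TYPE_NMI_INTR &&
	    (exit_intr_info & INTR_INFO_VALID_MASK)) {
		/*
		 * Old approach: asm("int $2").  A software-generated vector 2
		 * does not set the CPU's NMI-blocked state, so a real NMI can
		 * nest on the IST stack while the handler runs.
		 */
		apic->send_IPI_self(NMI_VECTOR);	/* deliver a genuine NMI */
	}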

I am debugging a patch based on apic->send_IPI_self(NMI_VECTOR) to replace 
"int $2". Something unexpected is happening...

-- 
regards
Yang, Sheng

^ permalink raw reply	[flat|nested] 390+ messages in thread

* Re: [PATCH] Enhance perf to collect KVM guest os statistics from host side
  2010-03-17  9:28         ` Sheng Yang
@ 2010-03-17  9:41           ` Avi Kivity
  2010-03-17  9:51             ` Sheng Yang
  2010-03-17 21:14           ` Zachary Amsden
  1 sibling, 1 reply; 390+ messages in thread
From: Avi Kivity @ 2010-03-17  9:41 UTC (permalink / raw)
  To: Sheng Yang
  Cc: Zhang, Yanmin, Ingo Molnar, Peter Zijlstra, linux-kernel, kvm,
	Marcelo Tosatti, oerg Roedel, Jes Sorensen, Gleb Natapov,
	Zachary Amsden, Huang, Zhiteng, Joerg Roedel

On 03/17/2010 11:28 AM, Sheng Yang wrote:
>
>> I'm not sure if vmexit does break NMI context or not. Hardware NMI context
>> isn't reentrant till a IRET. YangSheng would like to double check it.
>>      
> After more check, I think VMX won't remained NMI block state for host. That's
> means, if NMI happened and processor is in VMX non-root mode, it would only
> result in VMExit, with a reason indicate that it's due to NMI happened, but no
> more state change in the host.
>
> So in that meaning, there _is_ a window between VMExit and KVM handle the NMI.
> Moreover, I think we _can't_ stop the re-entrance of NMI handling code because
> "int $2" don't have effect to block following NMI.
>    

That's pretty bad, as NMI runs on a separate stack (via IST).  So if 
another NMI happens while our int $2 is running, the stack will be 
corrupted.

> And if the NMI sequence is not important(I think so), then we need to generate
> a real NMI in current vmexit-after code. Seems let APIC send a NMI IPI to
> itself is a good idea.
>
> I am debugging a patch based on apic->send_IPI_self(NMI_VECTOR) to replace
> "int $2". Something unexpected is happening...
>    

I think you need DM_NMI for that to work correctly.

An alternative is to call the NMI handler directly.

-- 
error compiling committee.c: too many arguments to function


^ permalink raw reply	[flat|nested] 390+ messages in thread

* Re: [PATCH] Enhance perf to collect KVM guest os statistics from host side
  2010-03-17  9:41           ` Avi Kivity
@ 2010-03-17  9:51             ` Sheng Yang
  2010-03-17 10:06               ` Avi Kivity
  0 siblings, 1 reply; 390+ messages in thread
From: Sheng Yang @ 2010-03-17  9:51 UTC (permalink / raw)
  To: Avi Kivity
  Cc: Zhang, Yanmin, Ingo Molnar, Peter Zijlstra, linux-kernel, kvm,
	Marcelo Tosatti, oerg Roedel, Jes Sorensen, Gleb Natapov,
	Zachary Amsden, Huang, Zhiteng, Joerg Roedel

On Wednesday 17 March 2010 17:41:58 Avi Kivity wrote:
> On 03/17/2010 11:28 AM, Sheng Yang wrote:
> >> I'm not sure if vmexit does break NMI context or not. Hardware NMI
> >> context isn't reentrant till a IRET. YangSheng would like to double
> >> check it.
> >
> > After more check, I think VMX won't remained NMI block state for host.
> > That's means, if NMI happened and processor is in VMX non-root mode, it
> > would only result in VMExit, with a reason indicate that it's due to NMI
> > happened, but no more state change in the host.
> >
> > So in that meaning, there _is_ a window between VMExit and KVM handle the
> > NMI. Moreover, I think we _can't_ stop the re-entrance of NMI handling
> > code because "int $2" don't have effect to block following NMI.
> 
> That's pretty bad, as NMI runs on a separate stack (via IST).  So if
> another NMI happens while our int $2 is running, the stack will be
> corrupted.

Though the hardware doesn't provide this kind of blocking, software would at
least warn about it... nmi_enter() would still be executed by "int $2", and
result in a BUG() if we are already in NMI context (OK, that is a little
better than a mysterious crash due to a corrupted stack).
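
(For reference, the check in question sits at the top of nmi_enter(); roughly,
paraphrased from include/linux/hardirq.h of that era rather than quoted
verbatim:)

	#define in_nmi()	(preempt_count() & NMI_MASK)

	#define nmi_enter()						\
		do {							\
			BUG_ON(in_nmi());  /* re-entering NMI context */\
			add_preempt_count(NMI_OFFSET + HARDIRQ_OFFSET);	\
			/* ...plus ftrace/lockdep/rcu bookkeeping... */	\
		} while (0)
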
> 
> > And if the NMI sequence is not important(I think so), then we need to
> > generate a real NMI in current vmexit-after code. Seems let APIC send a
> > NMI IPI to itself is a good idea.
> >
> > I am debugging a patch based on apic->send_IPI_self(NMI_VECTOR) to
> > replace "int $2". Something unexpected is happening...
> 
> I think you need DM_NMI for that to work correctly.
> 
> An alternative is to call the NMI handler directly.

apic_send_IPI_self() already took care of APIC_DM_NMI.

And NMI handler would block the following NMI?

-- 
regards
Yang, Sheng

^ permalink raw reply	[flat|nested] 390+ messages in thread

* Re: [PATCH] Enhance perf to collect KVM guest os statistics from host side
  2010-03-17  9:51             ` Sheng Yang
@ 2010-03-17 10:06               ` Avi Kivity
  0 siblings, 0 replies; 390+ messages in thread
From: Avi Kivity @ 2010-03-17 10:06 UTC (permalink / raw)
  To: Sheng Yang
  Cc: Zhang, Yanmin, Ingo Molnar, Peter Zijlstra, linux-kernel, kvm,
	Marcelo Tosatti, oerg Roedel, Jes Sorensen, Gleb Natapov,
	Zachary Amsden, Huang, Zhiteng, Joerg Roedel

On 03/17/2010 11:51 AM, Sheng Yang wrote:
>
>> I think you need DM_NMI for that to work correctly.
>>
>> An alternative is to call the NMI handler directly.
>>      
> apic_send_IPI_self() already took care of APIC_DM_NMI.
>    

So it does (though not for x2apic?).  I don't see why it doesn't work.

> And NMI handler would block the following NMI?
>
>    

It wouldn't - won't work without extensive changes.

-- 
error compiling committee.c: too many arguments to function


^ permalink raw reply	[flat|nested] 390+ messages in thread

* Re: [PATCH] Enhance perf to collect KVM guest os statistics from host side
  2010-03-17  9:28         ` Sheng Yang
  2010-03-17  9:41           ` Avi Kivity
@ 2010-03-17 21:14           ` Zachary Amsden
  2010-03-18  1:19             ` Sheng Yang
  1 sibling, 1 reply; 390+ messages in thread
From: Zachary Amsden @ 2010-03-17 21:14 UTC (permalink / raw)
  To: Sheng Yang
  Cc: Avi Kivity, Zhang, Yanmin, Ingo Molnar, Peter Zijlstra,
	linux-kernel, kvm, Marcelo Tosatti, oerg Roedel, Jes Sorensen,
	Gleb Natapov, Huang, Zhiteng, Joerg Roedel

On 03/16/2010 11:28 PM, Sheng Yang wrote:
> On Wednesday 17 March 2010 10:34:33 Zhang, Yanmin wrote:
>    
>> On Tue, 2010-03-16 at 11:32 +0200, Avi Kivity wrote:
>>      
>>> On 03/16/2010 09:48 AM, Zhang, Yanmin wrote:
>>>        
>>>> Right, but there is a scope between kvm_guest_enter and really running
>>>> in guest os, where a perf event might overflow. Anyway, the scope is
>>>> very narrow, I will change it to use flag PF_VCPU.
>>>>          
>>> There is also a window between setting the flag and calling 'int $2'
>>> where an NMI might happen and be accounted incorrectly.
>>>
>>> Perhaps separate the 'int $2' into a direct call into perf and another
>>> call for the rest of NMI handling.  I don't see how it would work on svm
>>> though - AFAICT the NMI is held whereas vmx swallows it.
>>>
>>>   I guess NMIs
>>> will be disabled until the next IRET so it isn't racy, just tricky.
>>>        
>> I'm not sure if vmexit does break NMI context or not. Hardware NMI context
>> isn't reentrant till a IRET. YangSheng would like to double check it.
>>      
> After more check, I think VMX won't remained NMI block state for host. That's
> means, if NMI happened and processor is in VMX non-root mode, it would only
> result in VMExit, with a reason indicate that it's due to NMI happened, but no
> more state change in the host.
>
> So in that meaning, there _is_ a window between VMExit and KVM handle the NMI.
> Moreover, I think we _can't_ stop the re-entrance of NMI handling code because
> "int $2" don't have effect to block following NMI.
>
> And if the NMI sequence is not important(I think so), then we need to generate
> a real NMI in current vmexit-after code. Seems let APIC send a NMI IPI to
> itself is a good idea.
>
> I am debugging a patch based on apic->send_IPI_self(NMI_VECTOR) to replace
> "int $2". Something unexpected is happening...
>    

You can't use the APIC to send vectors 0x00-0x1f, or at least, aren't 
supposed to be able to.

Zach

^ permalink raw reply	[flat|nested] 390+ messages in thread

* Re: [PATCH] Enhance perf to collect KVM guest os statistics from host side
  2010-03-17 21:14           ` Zachary Amsden
@ 2010-03-18  1:19             ` Sheng Yang
  2010-03-18  4:50               ` Zachary Amsden
  0 siblings, 1 reply; 390+ messages in thread
From: Sheng Yang @ 2010-03-18  1:19 UTC (permalink / raw)
  To: Zachary Amsden
  Cc: Avi Kivity, Zhang, Yanmin, Ingo Molnar, Peter Zijlstra,
	linux-kernel, kvm, Marcelo Tosatti, oerg Roedel, Jes Sorensen,
	Gleb Natapov, Huang, Zhiteng, Joerg Roedel

On Thursday 18 March 2010 05:14:52 Zachary Amsden wrote:
> On 03/16/2010 11:28 PM, Sheng Yang wrote:
> > On Wednesday 17 March 2010 10:34:33 Zhang, Yanmin wrote:
> >> On Tue, 2010-03-16 at 11:32 +0200, Avi Kivity wrote:
> >>> On 03/16/2010 09:48 AM, Zhang, Yanmin wrote:
> >>>> Right, but there is a scope between kvm_guest_enter and really running
> >>>> in guest os, where a perf event might overflow. Anyway, the scope is
> >>>> very narrow, I will change it to use flag PF_VCPU.
> >>>
> >>> There is also a window between setting the flag and calling 'int $2'
> >>> where an NMI might happen and be accounted incorrectly.
> >>>
> >>> Perhaps separate the 'int $2' into a direct call into perf and another
> >>> call for the rest of NMI handling.  I don't see how it would work on
> >>> svm though - AFAICT the NMI is held whereas vmx swallows it.
> >>>
> >>>   I guess NMIs
> >>> will be disabled until the next IRET so it isn't racy, just tricky.
> >>
> >> I'm not sure if vmexit does break NMI context or not. Hardware NMI
> >> context isn't reentrant till a IRET. YangSheng would like to double
> >> check it.
> >
> > After more check, I think VMX won't remained NMI block state for host.
> > That's means, if NMI happened and processor is in VMX non-root mode, it
> > would only result in VMExit, with a reason indicate that it's due to NMI
> > happened, but no more state change in the host.
> >
> > So in that meaning, there _is_ a window between VMExit and KVM handle the
> > NMI. Moreover, I think we _can't_ stop the re-entrance of NMI handling
> > code because "int $2" don't have effect to block following NMI.
> >
> > And if the NMI sequence is not important(I think so), then we need to
> > generate a real NMI in current vmexit-after code. Seems let APIC send a
> > NMI IPI to itself is a good idea.
> >
> > I am debugging a patch based on apic->send_IPI_self(NMI_VECTOR) to
> > replace "int $2". Something unexpected is happening...
> 
> You can't use the APIC to send vectors 0x00-0x1f, or at least, aren't
> supposed to be able to.

Um? Why?

Especially kernel is already using it to deliver NMI.

-- 
regards
Yang, Sheng

^ permalink raw reply	[flat|nested] 390+ messages in thread

* Re: [PATCH] Enhance perf to collect KVM guest os statistics from host side
  2010-03-17  9:26         ` Zhang, Yanmin
@ 2010-03-18  2:45           ` Zhang, Yanmin
  2010-03-18  7:49             ` Zhang, Yanmin
  0 siblings, 1 reply; 390+ messages in thread
From: Zhang, Yanmin @ 2010-03-18  2:45 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Avi Kivity, Peter Zijlstra, Sheng Yang, linux-kernel, kvm,
	Marcelo Tosatti, oerg Roedel, Jes Sorensen, Gleb Natapov,
	Zachary Amsden, zhiteng.huang

On Wed, 2010-03-17 at 17:26 +0800, Zhang, Yanmin wrote:
> On Tue, 2010-03-16 at 10:47 +0100, Ingo Molnar wrote:
> > * Zhang, Yanmin <yanmin_zhang@linux.intel.com> wrote:
> > 
> > > On Tue, 2010-03-16 at 15:48 +0800, Zhang, Yanmin wrote:
> > > > On Tue, 2010-03-16 at 07:41 +0200, Avi Kivity wrote:
> > > > > On 03/16/2010 07:27 AM, Zhang, Yanmin wrote:
> > > > > > From: Zhang, Yanmin<yanmin_zhang@linux.intel.com>
> > > > > >
> > > > > > Based on the discussion in KVM community, I worked out the patch to support
> > > > > > perf to collect guest os statistics from host side. This patch is implemented
> > > > > > with Ingo, Peter and some other guys' kind help. Yang Sheng pointed out a
> > > > > > critical bug and provided good suggestions with other guys. I really appreciate
> > > > > > their kind help.
> > > > > >
> > > > > > The patch adds new subcommand kvm to perf.
> > > > > >
> > > > > >    perf kvm top
> > > > > >    perf kvm record
> > > > > >    perf kvm report
> > > > > >    perf kvm diff
> > > > > >
> > > > > > The new perf could profile guest os kernel except guest os user space, but it
> > > > > > could summarize guest os user space utilization per guest os.
> > > > > >
> > > > > > Below are some examples.
> > > > > > 1) perf kvm top
> > > > > > [root@lkp-ne01 norm]# perf kvm --host --guest --guestkallsyms=/home/ymzhang/guest/kallsyms
> > > > > > --guestmodules=/home/ymzhang/guest/modules top
> > > > > >
> > > > > >    
> > > > > 
> > > > Thanks for your kind comments.
> > > > 
> > > > > Excellent, support for guest kernel != host kernel is critical (I can't 
> > > > > remember the last time I ran same kernels).
> > > > > 
> > > > > How would we support multiple guests with different kernels?
> > > > With the patch, 'perf kvm report --sort pid" could show
> > > > summary statistics for all guest os instances. Then, use
> > > > parameter --pid of 'perf kvm record' to collect single problematic instance data.
> > > Sorry. I found currently --pid isn't process but a thread (main thread).
> > > 
> > > Ingo,
> > > 
> > > Is it possible to support a new parameter or extend --inherit, so 'perf 
> > > record' and 'perf top' could collect data on all threads of a process when 
> > > the process is running?
> > > 
> > > If not, I need add a new ugly parameter which is similar to --pid to filter 
> > > out process data in userspace.
> > 
> > Yeah. For maximum utility i'd suggest to extend --pid to include this, and 
> > introduce --tid for the previous, limited-to-a-single-task functionality.
> > 
> > Most users would expect --pid to work like a 'late attach' - i.e. to work like 
> > strace -f or like a gdb attach.
> 
> Thanks Ingo, Avi.
> 
> I worked out below patch against tip/master of March 15th.
> 
> Subject: [PATCH] Change perf's parameter --pid to process-wide collection
> From: Zhang, Yanmin <yanmin_zhang@linux.intel.com>
> 
> Change parameter -p (--pid) to real process pid and add -t (--tid) meaning
> thread id. Now, --pid means perf collects the statistics of all threads of
> the process, while --tid means perf just collect the statistics of that thread.
> 
> BTW, the patch fixes a bug of 'perf stat -p'. 'perf stat' always configures
> attr->disabled=1 if it isn't a system-wide collection. If there is a '-p'
> and no forks, 'perf stat -p' doesn't collect any data. In addition, the
> while(!done) in run_perf_stat consumes 100% single cpu time which has bad impact
> on running workload. I added a sleep(1) in the loop.
> 
> Signed-off-by: Zhang Yanmin <yanmin_zhang@linux.intel.com>
Ingo,

Sorry, the patch has bugs.  I need to do a better job and will work out 2
separate patches against the 2 issues.

Yanmin



^ permalink raw reply	[flat|nested] 390+ messages in thread

* Re: [PATCH] Enhance perf to collect KVM guest os statistics from host side
  2010-03-18  1:19             ` Sheng Yang
@ 2010-03-18  4:50               ` Zachary Amsden
  2010-03-18  5:22                 ` Sheng Yang
  0 siblings, 1 reply; 390+ messages in thread
From: Zachary Amsden @ 2010-03-18  4:50 UTC (permalink / raw)
  To: Sheng Yang
  Cc: Avi Kivity, Zhang, Yanmin, Ingo Molnar, Peter Zijlstra,
	linux-kernel, kvm, Marcelo Tosatti, oerg Roedel, Jes Sorensen,
	Gleb Natapov, Huang, Zhiteng, Joerg Roedel

On 03/17/2010 03:19 PM, Sheng Yang wrote:
> On Thursday 18 March 2010 05:14:52 Zachary Amsden wrote:
>    
>> On 03/16/2010 11:28 PM, Sheng Yang wrote:
>>      
>>> On Wednesday 17 March 2010 10:34:33 Zhang, Yanmin wrote:
>>>        
>>>> On Tue, 2010-03-16 at 11:32 +0200, Avi Kivity wrote:
>>>>          
>>>>> On 03/16/2010 09:48 AM, Zhang, Yanmin wrote:
>>>>>            
>>>>>> Right, but there is a scope between kvm_guest_enter and really running
>>>>>> in guest os, where a perf event might overflow. Anyway, the scope is
>>>>>> very narrow, I will change it to use flag PF_VCPU.
>>>>>>              
>>>>> There is also a window between setting the flag and calling 'int $2'
>>>>> where an NMI might happen and be accounted incorrectly.
>>>>>
>>>>> Perhaps separate the 'int $2' into a direct call into perf and another
>>>>> call for the rest of NMI handling.  I don't see how it would work on
>>>>> svm though - AFAICT the NMI is held whereas vmx swallows it.
>>>>>
>>>>>    I guess NMIs
>>>>> will be disabled until the next IRET so it isn't racy, just tricky.
>>>>>            
>>>> I'm not sure if vmexit does break NMI context or not. Hardware NMI
>>>> context isn't reentrant till a IRET. YangSheng would like to double
>>>> check it.
>>>>          
>>> After more check, I think VMX won't remained NMI block state for host.
>>> That's means, if NMI happened and processor is in VMX non-root mode, it
>>> would only result in VMExit, with a reason indicate that it's due to NMI
>>> happened, but no more state change in the host.
>>>
>>> So in that meaning, there _is_ a window between VMExit and KVM handle the
>>> NMI. Moreover, I think we _can't_ stop the re-entrance of NMI handling
>>> code because "int $2" don't have effect to block following NMI.
>>>
>>> And if the NMI sequence is not important(I think so), then we need to
>>> generate a real NMI in current vmexit-after code. Seems let APIC send a
>>> NMI IPI to itself is a good idea.
>>>
>>> I am debugging a patch based on apic->send_IPI_self(NMI_VECTOR) to
>>> replace "int $2". Something unexpected is happening...
>>>        
>> You can't use the APIC to send vectors 0x00-0x1f, or at least, aren't
>> supposed to be able to.
>>      
> Um? Why?
>
> Especially kernel is already using it to deliver NMI.
>
>    

That's the only defined case, and it is defined because the vector field 
is ignore for DM_NMI.  Vol 3A (exact section numbers may vary depending 
on your version).

8.5.1 / 8.6.1

'100 (NMI) Delivers an NMI interrupt to the target processor or 
processors.  The vector information is ignored'

8.5.2  Valid Interrupt Vectors

'Local and I/O APICs support 240 of these vectors (in the range of 16 to 
255) as valid interrupts.'

8.8.4 Interrupt Acceptance for Fixed Interrupts

'...; vectors 0 through 15 are reserved by the APIC (see also: Section 
8.5.2, "Valid Interrupt Vectors")'

So I misremembered, apparently you can deliver interrupts 0x10-0x1f, but 
vectors 0x00-0x0f are not valid to send via APIC or I/O APIC.

Zach

^ permalink raw reply	[flat|nested] 390+ messages in thread

* Re: [PATCH] Enhance perf to collect KVM guest os statistics from host side
  2010-03-18  4:50               ` Zachary Amsden
@ 2010-03-18  5:22                 ` Sheng Yang
  2010-03-18  5:41                   ` Sheng Yang
  0 siblings, 1 reply; 390+ messages in thread
From: Sheng Yang @ 2010-03-18  5:22 UTC (permalink / raw)
  To: Zachary Amsden
  Cc: Avi Kivity, Zhang, Yanmin, Ingo Molnar, Peter Zijlstra,
	linux-kernel, kvm, Marcelo Tosatti, oerg Roedel, Jes Sorensen,
	Gleb Natapov, Huang, Zhiteng, Joerg Roedel

On Thursday 18 March 2010 12:50:58 Zachary Amsden wrote:
> On 03/17/2010 03:19 PM, Sheng Yang wrote:
> > On Thursday 18 March 2010 05:14:52 Zachary Amsden wrote:
> >> On 03/16/2010 11:28 PM, Sheng Yang wrote:
> >>> On Wednesday 17 March 2010 10:34:33 Zhang, Yanmin wrote:
> >>>> On Tue, 2010-03-16 at 11:32 +0200, Avi Kivity wrote:
> >>>>> On 03/16/2010 09:48 AM, Zhang, Yanmin wrote:
> >>>>>> Right, but there is a scope between kvm_guest_enter and really
> >>>>>> running in guest os, where a perf event might overflow. Anyway, the
> >>>>>> scope is very narrow, I will change it to use flag PF_VCPU.
> >>>>>
> >>>>> There is also a window between setting the flag and calling 'int $2'
> >>>>> where an NMI might happen and be accounted incorrectly.
> >>>>>
> >>>>> Perhaps separate the 'int $2' into a direct call into perf and
> >>>>> another call for the rest of NMI handling.  I don't see how it would
> >>>>> work on svm though - AFAICT the NMI is held whereas vmx swallows it.
> >>>>>
> >>>>>    I guess NMIs
> >>>>> will be disabled until the next IRET so it isn't racy, just tricky.
> >>>>
> >>>> I'm not sure if vmexit does break NMI context or not. Hardware NMI
> >>>> context isn't reentrant till a IRET. YangSheng would like to double
> >>>> check it.
> >>>
> >>> After more check, I think VMX won't remained NMI block state for host.
> >>> That's means, if NMI happened and processor is in VMX non-root mode, it
> >>> would only result in VMExit, with a reason indicate that it's due to
> >>> NMI happened, but no more state change in the host.
> >>>
> >>> So in that meaning, there _is_ a window between VMExit and KVM handle
> >>> the NMI. Moreover, I think we _can't_ stop the re-entrance of NMI
> >>> handling code because "int $2" don't have effect to block following
> >>> NMI.
> >>>
> >>> And if the NMI sequence is not important(I think so), then we need to
> >>> generate a real NMI in current vmexit-after code. Seems let APIC send a
> >>> NMI IPI to itself is a good idea.
> >>>
> >>> I am debugging a patch based on apic->send_IPI_self(NMI_VECTOR) to
> >>> replace "int $2". Something unexpected is happening...
> >>
> >> You can't use the APIC to send vectors 0x00-0x1f, or at least, aren't
> >> supposed to be able to.
> >
> > Um? Why?
> >
> > Especially kernel is already using it to deliver NMI.
> 
> That's the only defined case, and it is defined because the vector field
> is ignore for DM_NMI.  Vol 3A (exact section numbers may vary depending
> on your version).
> 
> 8.5.1 / 8.6.1
> 
> '100 (NMI) Delivers an NMI interrupt to the target processor or
> processors.  The vector information is ignored'
> 
> 8.5.2  Valid Interrupt Vectors
> 
> 'Local and I/O APICs support 240 of these vectors (in the range of 16 to
> 255) as valid interrupts.'
> 
> 8.8.4 Interrupt Acceptance for Fixed Interrupts
> 
> '...; vectors 0 through 15 are reserved by the APIC (see also: Section
> 8.5.2, "Valid Interrupt Vectors")'
> 
> So I misremembered, apparently you can deliver interrupts 0x10-0x1f, but
> vectors 0x00-0x0f are not valid to send via APIC or I/O APIC.

As you pointed out, NMI is not "Fixed interrupt". If we want to send NMI, it 
would need a specific delivery mode rather than vector number. 

And if you look at code, if we specific NMI_VECTOR, the delivery mode would be 
set to NMI.

So what's wrong here?

-- 
regards
Yang, Sheng

^ permalink raw reply	[flat|nested] 390+ messages in thread

* RE: [PATCH] Enhance perf to collect KVM guest os statistics from host side
  2010-03-17  3:54                                                 ` Avi Kivity
@ 2010-03-18  5:27                                                     ` Huang, Zhiteng
  2010-03-18  5:27                                                     ` Huang, Zhiteng
  1 sibling, 0 replies; 390+ messages in thread
From: Huang, Zhiteng @ 2010-03-18  5:27 UTC (permalink / raw)
  To: Avi Kivity, Frank Ch. Eigler
  Cc: Anthony Liguori, Ingo Molnar, Zhang, Yanmin, Peter Zijlstra,
	Sheng Yang, linux-kernel, kvm, Marcelo Tosatti, oerg Roedel,
	Jes Sorensen, Gleb Natapov, Zachary Amsden

[-- Attachment #1: Type: text/plain; charset="utf-8", Size: 2779 bytes --]

Hi Avi, Ingo,

I've been following through this long thread since the very first email.  

I'm a performance engineer whose job is to tune workloads that run on top of KVM (and Xen previously).  As a performance engineer, I desperately want a tool that can monitor the host and guests at the same time.  Think about >100 guests, a mix of Linux and Windows, running together on a single system: being able to know what's happening is critical for performance analysis.  Actually I am the person who asked Yanmin to add the feature for CPU utilization breakdown (into host_usr, host_krn, guest_usr, guest_krn) so that I can monitor dozens of running guests.  I haven't made this patch work on my system yet, but I _do_ think this patch is a very good start.

And finally, monitoring guests from the host is useful for users too (administrators and performance people like me).  I really appreciate you guys' work and would love to provide feedback from my point of view if needed.


Regards,

HUANG, Zhiteng

Intel SSG/SSD/SPA/PRC Scalability Lab


-----Original Message-----
From: kvm-owner@vger.kernel.org [mailto:kvm-owner@vger.kernel.org] On Behalf Of Avi Kivity
Sent: Wednesday, March 17, 2010 11:55 AM
To: Frank Ch. Eigler
Cc: Anthony Liguori; Ingo Molnar; Zhang, Yanmin; Peter Zijlstra; Sheng Yang; linux-kernel@vger.kernel.org; kvm@vger.kernel.org; Marcelo Tosatti; oerg Roedel; Jes Sorensen; Gleb Natapov; Zachary Amsden; ziteng.huang@intel.com
Subject: Re: [PATCH] Enhance perf to collect KVM guest os statistics from host side

On 03/17/2010 02:41 AM, Frank Ch. Eigler wrote:
> Hi -
>
> On Tue, Mar 16, 2010 at 06:04:10PM -0500, Anthony Liguori wrote:
>    
>> [...]
>> The only way to really address this is to change the interaction.
>> Instead of running perf externally to qemu, we should support a perf
>> command in the qemu monitor that can then tie directly to the perf
>> tooling.  That gives us the best possible user experience.
>>      
> To what extent could this be solved with less crossing of
> isolation/abstraction layers, if the perfctr facilities were properly
> virtualized?
>    

That's the more interesting (by far) usage model.  In general guest 
owners don't have access to the host, and host owners can't (and 
shouldn't) change guests.

Monitoring guests from the host is useful for kvm developers, but less 
so for users.

-- 
Do not meddle in the internals of kernels, for they are subtle and quick to panic.

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 390+ messages in thread

* Re: [PATCH] Enhance perf to collect KVM guest os statistics from host side
  2010-03-18  5:22                 ` Sheng Yang
@ 2010-03-18  5:41                   ` Sheng Yang
  2010-03-18  8:47                     ` Zachary Amsden
  0 siblings, 1 reply; 390+ messages in thread
From: Sheng Yang @ 2010-03-18  5:41 UTC (permalink / raw)
  To: kvm
  Cc: Zachary Amsden, Avi Kivity, Zhang, Yanmin, Ingo Molnar,
	Peter Zijlstra, linux-kernel, Marcelo Tosatti, oerg Roedel,
	Jes Sorensen, Gleb Natapov, Huang, Zhiteng, Joerg Roedel

On Thursday 18 March 2010 13:22:28 Sheng Yang wrote:
> On Thursday 18 March 2010 12:50:58 Zachary Amsden wrote:
> > On 03/17/2010 03:19 PM, Sheng Yang wrote:
> > > On Thursday 18 March 2010 05:14:52 Zachary Amsden wrote:
> > >> On 03/16/2010 11:28 PM, Sheng Yang wrote:
> > >>> On Wednesday 17 March 2010 10:34:33 Zhang, Yanmin wrote:
> > >>>> On Tue, 2010-03-16 at 11:32 +0200, Avi Kivity wrote:
> > >>>>> On 03/16/2010 09:48 AM, Zhang, Yanmin wrote:
> > >>>>>> Right, but there is a scope between kvm_guest_enter and really
> > >>>>>> running in guest os, where a perf event might overflow. Anyway,
> > >>>>>> the scope is very narrow, I will change it to use flag PF_VCPU.
> > >>>>>
> > >>>>> There is also a window between setting the flag and calling 'int
> > >>>>> $2' where an NMI might happen and be accounted incorrectly.
> > >>>>>
> > >>>>> Perhaps separate the 'int $2' into a direct call into perf and
> > >>>>> another call for the rest of NMI handling.  I don't see how it
> > >>>>> would work on svm though - AFAICT the NMI is held whereas vmx
> > >>>>> swallows it.
> > >>>>>
> > >>>>>    I guess NMIs
> > >>>>> will be disabled until the next IRET so it isn't racy, just tricky.
> > >>>>
> > >>>> I'm not sure if vmexit does break NMI context or not. Hardware NMI
> > >>>> context isn't reentrant till a IRET. YangSheng would like to double
> > >>>> check it.
> > >>>
> > >>> After more check, I think VMX won't remained NMI block state for
> > >>> host. That's means, if NMI happened and processor is in VMX non-root
> > >>> mode, it would only result in VMExit, with a reason indicate that
> > >>> it's due to NMI happened, but no more state change in the host.
> > >>>
> > >>> So in that meaning, there _is_ a window between VMExit and KVM handle
> > >>> the NMI. Moreover, I think we _can't_ stop the re-entrance of NMI
> > >>> handling code because "int $2" don't have effect to block following
> > >>> NMI.
> > >>>
> > >>> And if the NMI sequence is not important(I think so), then we need to
> > >>> generate a real NMI in current vmexit-after code. Seems let APIC send
> > >>> a NMI IPI to itself is a good idea.
> > >>>
> > >>> I am debugging a patch based on apic->send_IPI_self(NMI_VECTOR) to
> > >>> replace "int $2". Something unexpected is happening...
> > >>
> > >> You can't use the APIC to send vectors 0x00-0x1f, or at least, aren't
> > >> supposed to be able to.
> > >
> > > Um? Why?
> > >
> > > Especially kernel is already using it to deliver NMI.
> >
> > That's the only defined case, and it is defined because the vector field
> > is ignore for DM_NMI.  Vol 3A (exact section numbers may vary depending
> > on your version).
> >
> > 8.5.1 / 8.6.1
> >
> > '100 (NMI) Delivers an NMI interrupt to the target processor or
> > processors.  The vector information is ignored'
> >
> > 8.5.2  Valid Interrupt Vectors
> >
> > 'Local and I/O APICs support 240 of these vectors (in the range of 16 to
> > 255) as valid interrupts.'
> >
> > 8.8.4 Interrupt Acceptance for Fixed Interrupts
> >
> > '...; vectors 0 through 15 are reserved by the APIC (see also: Section
> > 8.5.2, "Valid Interrupt Vectors")'
> >
> > So I misremembered, apparently you can deliver interrupts 0x10-0x1f, but
> > vectors 0x00-0x0f are not valid to send via APIC or I/O APIC.
> 
> As you pointed out, NMI is not "Fixed interrupt". If we want to send NMI,
>  it would need a specific delivery mode rather than vector number.
> 
> And if you look at code, if we specific NMI_VECTOR, the delivery mode would
>  be set to NMI.
> 
> So what's wrong here?

OK, I think I understand your points now. You meant that these vectors can't
be filled into the vector field directly, right? But NMI is an exception due to
DM_NMI. Is that your point? I think we agree on this.
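
(For reference, the special case being discussed is implemented in the xAPIC
ICR-preparation helper; roughly, paraphrased from arch/x86/include/asm/ipi.h
of that era:)

	static inline unsigned int __prepare_ICR(unsigned int shortcut, int vector,
						 unsigned int dest)
	{
		unsigned int icr = shortcut | dest;

		switch (vector) {
		default:
			icr |= APIC_DM_FIXED | vector;
			break;
		case NMI_VECTOR:
			/* vector field is ignored; the delivery mode does the work */
			icr |= APIC_DM_NMI;
			break;
		}
		return icr;
	}

(The x2APIC SELF_IPI register, by contrast, only carries a vector with fixed
delivery, so it cannot deliver an NMI this way -- presumably what the earlier
x2apic remark was about.)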

-- 
regards
Yang, Sheng

^ permalink raw reply	[flat|nested] 390+ messages in thread

* Re: [PATCH] Enhance perf to collect KVM guest os statistics from host side
  2010-03-18  2:45           ` Zhang, Yanmin
@ 2010-03-18  7:49             ` Zhang, Yanmin
  2010-03-18  8:03               ` Ingo Molnar
  0 siblings, 1 reply; 390+ messages in thread
From: Zhang, Yanmin @ 2010-03-18  7:49 UTC (permalink / raw)
  To: Ingo Molnar, Arnaldo Carvalho de Melo
  Cc: Avi Kivity, Peter Zijlstra, Sheng Yang, linux-kernel, kvm,
	Marcelo Tosatti, oerg Roedel, Jes Sorensen, Gleb Natapov,
	Zachary Amsden, zhiteng.huang

[-- Attachment #1: Type: text/plain, Size: 4977 bytes --]

On Thu, 2010-03-18 at 10:45 +0800, Zhang, Yanmin wrote:
> On Wed, 2010-03-17 at 17:26 +0800, Zhang, Yanmin wrote:
> > On Tue, 2010-03-16 at 10:47 +0100, Ingo Molnar wrote:
> > > * Zhang, Yanmin <yanmin_zhang@linux.intel.com> wrote:
> > > 
> > > > On Tue, 2010-03-16 at 15:48 +0800, Zhang, Yanmin wrote:
> > > > > On Tue, 2010-03-16 at 07:41 +0200, Avi Kivity wrote:
> > > > > > On 03/16/2010 07:27 AM, Zhang, Yanmin wrote:
> > > > > > > From: Zhang, Yanmin<yanmin_zhang@linux.intel.com>
> > > > > > >
> > > > > > > Based on the discussion in KVM community, I worked out the patch to support
> > > > > > > perf to collect guest os statistics from host side. This patch is implemented
> > > > > > > with Ingo, Peter and some other guys' kind help. Yang Sheng pointed out a
> > > > > > > critical bug and provided good suggestions with other guys. I really appreciate
> > > > > > > their kind help.
> > > > > > >
> > > > > > > The patch adds new subcommand kvm to perf.
> > > > > > >
> > > > > > >    perf kvm top
> > > > > > >    perf kvm record
> > > > > > >    perf kvm report
> > > > > > >    perf kvm diff
> > > > > > >
> > > > > > > The new perf could profile guest os kernel except guest os user space, but it
> > > > > > > could summarize guest os user space utilization per guest os.
> > > > > > >
> > > > > > > Below are some examples.
> > > > > > > 1) perf kvm top
> > > > > > > [root@lkp-ne01 norm]# perf kvm --host --guest --guestkallsyms=/home/ymzhang/guest/kallsyms
> > > > > > > --guestmodules=/home/ymzhang/guest/modules top
> > > > > > >
> > > > > > >    
> > > > > > 
> > > > > Thanks for your kind comments.
> > > > > 
> > > > > > Excellent, support for guest kernel != host kernel is critical (I can't 
> > > > > > remember the last time I ran same kernels).
> > > > > > 
> > > > > > How would we support multiple guests with different kernels?
> > > > > With the patch, 'perf kvm report --sort pid" could show
> > > > > summary statistics for all guest os instances. Then, use
> > > > > parameter --pid of 'perf kvm record' to collect single problematic instance data.
> > > > Sorry. I found currently --pid isn't process but a thread (main thread).
> > > > 
> > > > Ingo,
> > > > 
> > > > Is it possible to support a new parameter or extend --inherit, so 'perf 
> > > > record' and 'perf top' could collect data on all threads of a process when 
> > > > the process is running?
> > > > 
> > > > If not, I need add a new ugly parameter which is similar to --pid to filter 
> > > > out process data in userspace.
> > > 
> > > Yeah. For maximum utility i'd suggest to extend --pid to include this, and 
> > > introduce --tid for the previous, limited-to-a-single-task functionality.
> > > 
> > > Most users would expect --pid to work like a 'late attach' - i.e. to work like 
> > > strace -f or like a gdb attach.
> > 
> > Thanks Ingo, Avi.
> > 
> > I worked out below patch against tip/master of March 15th.
> > 
> > Subject: [PATCH] Change perf's parameter --pid to process-wide collection
> > From: Zhang, Yanmin <yanmin_zhang@linux.intel.com>
> > 
> > Change parameter -p (--pid) to real process pid and add -t (--tid) meaning
> > thread id. Now, --pid means perf collects the statistics of all threads of
> > the process, while --tid means perf just collect the statistics of that thread.
> > 
> > BTW, the patch fixes a bug of 'perf stat -p'. 'perf stat' always configures
> > attr->disabled=1 if it isn't a system-wide collection. If there is a '-p'
> > and no forks, 'perf stat -p' doesn't collect any data. In addition, the
> > while(!done) in run_perf_stat consumes 100% single cpu time which has bad impact
> > on running workload. I added a sleep(1) in the loop.
> > 
> > Signed-off-by: Zhang Yanmin <yanmin_zhang@linux.intel.com>
> Ingo,
> 
> Sorry, the patch has bugs.  I need to do a better job and will work out 2
> separate patches against the 2 issues.

I worked out 3 new patches against the tip/master tree of Mar. 17th.

1) Patch perf_stat: Fix the issue that perf doesn't enable counters when
target_pid != -1. Change the condition for forking/exec'ing the subcommand:
if there is a subcommand parameter, perf always forks/execs it. The usage
example is:
#perf stat -a sleep 10
So this command collects statistics for precisely 10 seconds. The user can
still stop it with CTRL+C.

2) Patch perf_record: Fix the issue that, when perf forks/execs a subcommand,
it should enable all counters only after the new process execs. Change the
condition for forking/exec'ing the subcommand: if there is a subcommand
parameter, perf always forks/execs it. The usage example is:
#perf record -f -a sleep 10
So this command collects statistics for precisely 10 seconds. The user can
still stop it with CTRL+C.

3) Patch perf_pid: Change parameter --pid to mean process-wide collection and
add --tid, which means collecting thread-wide statistics. Usage examples are:
#perf top -p 8888
#perf record -p 8888 -f sleep 10
#perf stat -p 8888 -f sleep 10
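
(For context, all three tools expand --pid into the thread ids of the process
through the find_all_tid() helper added to tools/perf/util/thread.c.  Below is
a stand-alone equivalent of that expansion, as an illustrative demo only, not
part of the patches:)

/* tids.c: list the thread ids of a process, the way find_all_tid() does */
#include <dirent.h>
#include <stdio.h>
#include <stdlib.h>

int main(int argc, char **argv)
{
	char name[256];
	struct dirent **namelist = NULL;
	int items, i, tid;

	if (argc != 2) {
		fprintf(stderr, "usage: %s <pid>\n", argv[0]);
		return 1;
	}

	snprintf(name, sizeof(name), "/proc/%s/task", argv[1]);
	items = scandir(name, &namelist, NULL, NULL);
	if (items <= 0) {
		perror("scandir");
		return 1;
	}

	for (i = 0; i < items; i++) {
		tid = atoi(namelist[i]->d_name);	/* "." and ".." become 0 */
		if (tid)
			printf("%d\n", tid);
		free(namelist[i]);
	}
	free(namelist);
	return 0;
}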

Arnaldo,

Pls. apply the 3 attached patches.

Yanmin


[-- Attachment #2: perf_stat_2.6_tipmaster0317_v02.patch --]
[-- Type: text/x-patch, Size: 2075 bytes --]

diff -Nraup linux-2.6_tipmaster0317/tools/perf/builtin-stat.c linux-2.6_tipmaster0317_fixstat/tools/perf/builtin-stat.c
--- linux-2.6_tipmaster0317/tools/perf/builtin-stat.c	2010-03-18 09:04:40.938289813 +0800
+++ linux-2.6_tipmaster0317_fixstat/tools/perf/builtin-stat.c	2010-03-18 13:07:26.773773541 +0800
@@ -159,8 +159,10 @@ static void create_perf_stat_counter(int
 		}
 	} else {
 		attr->inherit	     = inherit;
-		attr->disabled	     = 1;
-		attr->enable_on_exec = 1;
+		if (target_pid == -1) {
+			attr->disabled = 1;
+			attr->enable_on_exec = 1;
+		}
 
 		fd[0][counter] = sys_perf_event_open(attr, pid, -1, -1, 0);
 		if (fd[0][counter] < 0 && verbose)
@@ -251,9 +253,9 @@ static int run_perf_stat(int argc __used
 	unsigned long long t0, t1;
 	int status = 0;
 	int counter;
-	int pid = target_pid;
+	int pid;
 	int child_ready_pipe[2], go_pipe[2];
-	const bool forks = (target_pid == -1 && argc > 0);
+	const bool forks = (argc > 0);
 	char buf;
 
 	if (!system_wide)
@@ -265,10 +267,10 @@ static int run_perf_stat(int argc __used
 	}
 
 	if (forks) {
-		if ((pid = fork()) < 0)
+		if ((child_pid = fork()) < 0)
 			perror("failed to fork");
 
-		if (!pid) {
+		if (!child_pid) {
 			close(child_ready_pipe[0]);
 			close(go_pipe[1]);
 			fcntl(go_pipe[0], F_SETFD, FD_CLOEXEC);
@@ -297,8 +299,6 @@ static int run_perf_stat(int argc __used
 			exit(-1);
 		}
 
-		child_pid = pid;
-
 		/*
 		 * Wait for the child to be ready to exec.
 		 */
@@ -309,6 +309,10 @@ static int run_perf_stat(int argc __used
 		close(child_ready_pipe[0]);
 	}
 
+	if (target_pid == -1)
+		pid = child_pid;
+	else
+		pid = target_pid;
 	for (counter = 0; counter < nr_counters; counter++)
 		create_perf_stat_counter(counter, pid);
 
@@ -321,7 +325,7 @@ static int run_perf_stat(int argc __used
 		close(go_pipe[1]);
 		wait(&status);
 	} else {
-		while(!done);
+		while(!done) sleep(1);
 	}
 
 	t1 = rdclock();
@@ -459,7 +463,7 @@ static volatile int signr = -1;
 
 static void skip_signal(int signo)
 {
-	if(target_pid != -1)
+	if(child_pid == -1)
 		done = 1;
 
 	signr = signo;

[-- Attachment #3: perf_record_2.6_tipmaster0317_v02.patch --]
[-- Type: text/x-patch, Size: 2729 bytes --]

diff -Nraup linux-2.6_tip0317/tools/perf/builtin-record.c linux-2.6_tip0317_fixrecord/tools/perf/builtin-record.c
--- linux-2.6_tip0317/tools/perf/builtin-record.c	2010-03-18 09:04:40.942263175 +0800
+++ linux-2.6_tip0317_fixrecord/tools/perf/builtin-record.c	2010-03-18 13:33:24.254359348 +0800
@@ -225,7 +225,7 @@ static struct perf_header_attr *get_head
 	return h_attr;
 }
 
-static void create_counter(int counter, int cpu, pid_t pid, bool forks)
+static void create_counter(int counter, int cpu, pid_t pid)
 {
 	char *filter = filters[counter];
 	struct perf_event_attr *attr = attrs + counter;
@@ -275,10 +275,10 @@ static void create_counter(int counter, 
 	attr->mmap		= track;
 	attr->comm		= track;
 	attr->inherit		= inherit;
-	attr->disabled		= 1;
-
-	if (forks)
+	if (target_pid == -1 && !system_wide) {
+		attr->disabled = 1;
 		attr->enable_on_exec = 1;
+	}
 
 try_again:
 	fd[nr_cpu][counter] = sys_perf_event_open(attr, pid, cpu, group_fd, 0);
@@ -380,17 +380,15 @@ try_again:
 			exit(-1);
 		}
 	}
-
-	ioctl(fd[nr_cpu][counter], PERF_EVENT_IOC_ENABLE);
 }
 
-static void open_counters(int cpu, pid_t pid, bool forks)
+static void open_counters(int cpu, pid_t pid)
 {
 	int counter;
 
 	group_fd = -1;
 	for (counter = 0; counter < nr_counters; counter++)
-		create_counter(counter, cpu, pid, forks);
+		create_counter(counter, cpu, pid);
 
 	nr_cpu++;
 }
@@ -425,7 +423,7 @@ static int __cmd_record(int argc, const 
 	int err;
 	unsigned long waking = 0;
 	int child_ready_pipe[2], go_pipe[2];
-	const bool forks = target_pid == -1 && argc > 0;
+	const bool forks = argc > 0;
 	char buf;
 
 	page_size = sysconf(_SC_PAGE_SIZE);
@@ -496,13 +494,13 @@ static int __cmd_record(int argc, const 
 	atexit(atexit_header);
 
 	if (forks) {
-		pid = fork();
+		child_pid = fork();
 		if (pid < 0) {
 			perror("failed to fork");
 			exit(-1);
 		}
 
-		if (!pid) {
+		if (!child_pid) {
 			close(child_ready_pipe[0]);
 			close(go_pipe[1]);
 			fcntl(go_pipe[0], F_SETFD, FD_CLOEXEC);
@@ -531,11 +529,6 @@ static int __cmd_record(int argc, const 
 			exit(-1);
 		}
 
-		child_pid = pid;
-
-		if (!system_wide)
-			target_pid = pid;
-
 		close(child_ready_pipe[1]);
 		close(go_pipe[0]);
 		/*
@@ -548,13 +541,17 @@ static int __cmd_record(int argc, const 
 		close(child_ready_pipe[0]);
 	}
 
+	if (forks && target_pid == -1 && !system_wide)
+		pid = child_pid;
+	else
+		pid = target_pid;
 
 	if ((!system_wide && !inherit) || profile_cpu != -1) {
-		open_counters(profile_cpu, target_pid, forks);
+		open_counters(profile_cpu, pid);
 	} else {
 		nr_cpus = read_cpu_map();
 		for (i = 0; i < nr_cpus; i++)
-			open_counters(cpumap[i], target_pid, forks);
+			open_counters(cpumap[i], pid);
 	}
 
 	if (file_new) {

[-- Attachment #4: perf_pid_2.6_tip0317_v06.patch --]
[-- Type: text/x-patch, Size: 29446 bytes --]

diff -Nraup linux-2.6_tip0317_statrecord/tools/perf/builtin-record.c linux-2.6_tip0317_statrecordpid/tools/perf/builtin-record.c
--- linux-2.6_tip0317_statrecord/tools/perf/builtin-record.c	2010-03-18 13:48:39.578181540 +0800
+++ linux-2.6_tip0317_statrecordpid/tools/perf/builtin-record.c	2010-03-18 14:28:41.449631936 +0800
@@ -27,7 +27,7 @@
 #include <unistd.h>
 #include <sched.h>
 
-static int			fd[MAX_NR_CPUS][MAX_COUNTERS];
+static int			*fd[MAX_NR_CPUS][MAX_COUNTERS];
 
 static long			default_interval		=      0;
 
@@ -43,6 +43,9 @@ static int			raw_samples			=      0;
 static int			system_wide			=      0;
 static int			profile_cpu			=     -1;
 static pid_t			target_pid			=     -1;
+static pid_t			target_tid			=     -1;
+static pid_t			*all_tids			=      NULL;
+static int			thread_num			=      0;
 static pid_t			child_pid			=     -1;
 static int			inherit				=      1;
 static int			force				=      0;
@@ -60,7 +63,7 @@ static struct timeval		this_read;
 
 static u64			bytes_written			=      0;
 
-static struct pollfd		event_array[MAX_NR_CPUS * MAX_COUNTERS];
+static struct pollfd		*event_array;
 
 static int			nr_poll				=      0;
 static int			nr_cpu				=      0;
@@ -77,7 +80,7 @@ struct mmap_data {
 	unsigned int		prev;
 };
 
-static struct mmap_data		mmap_array[MAX_NR_CPUS][MAX_COUNTERS];
+static struct mmap_data		*mmap_array[MAX_NR_CPUS][MAX_COUNTERS];
 
 static unsigned long mmap_read_head(struct mmap_data *md)
 {
@@ -225,12 +228,13 @@ static struct perf_header_attr *get_head
 	return h_attr;
 }
 
-static void create_counter(int counter, int cpu, pid_t pid)
+static void create_counter(int counter, int cpu)
 {
 	char *filter = filters[counter];
 	struct perf_event_attr *attr = attrs + counter;
 	struct perf_header_attr *h_attr;
 	int track = !counter; /* only the first counter needs these */
+	int thread_index;
 	int ret;
 	struct {
 		u64 count;
@@ -280,115 +284,124 @@ static void create_counter(int counter, 
 		attr->enable_on_exec = 1;
 	}
 
+	for (thread_index = 0; thread_index < thread_num; thread_index++) {
 try_again:
-	fd[nr_cpu][counter] = sys_perf_event_open(attr, pid, cpu, group_fd, 0);
+		fd[nr_cpu][counter][thread_index] = sys_perf_event_open(attr,
+				all_tids[thread_index], cpu, group_fd, 0);
 
-	if (fd[nr_cpu][counter] < 0) {
-		int err = errno;
+		if (fd[nr_cpu][counter][thread_index] < 0) {
+			int err = errno;
 
-		if (err == EPERM || err == EACCES)
-			die("Permission error - are you root?\n"
-			    "\t Consider tweaking /proc/sys/kernel/perf_event_paranoid.\n");
-		else if (err ==  ENODEV && profile_cpu != -1)
-			die("No such device - did you specify an out-of-range profile CPU?\n");
+			if (err == EPERM || err == EACCES)
+				die("Permission error - are you root?\n"
+					"\t Consider tweaking"
+					" /proc/sys/kernel/perf_event_paranoid.\n");
+			else if (err ==  ENODEV && profile_cpu != -1) {
+				die("No such device - did you specify"
+					" an out-of-range profile CPU?\n");
+			}
 
-		/*
-		 * If it's cycles then fall back to hrtimer
-		 * based cpu-clock-tick sw counter, which
-		 * is always available even if no PMU support:
-		 */
-		if (attr->type == PERF_TYPE_HARDWARE
-			&& attr->config == PERF_COUNT_HW_CPU_CYCLES) {
+			/*
+			 * If it's cycles then fall back to hrtimer
+			 * based cpu-clock-tick sw counter, which
+			 * is always available even if no PMU support:
+			 */
+			if (attr->type == PERF_TYPE_HARDWARE
+					&& attr->config == PERF_COUNT_HW_CPU_CYCLES) {
 
-			if (verbose)
-				warning(" ... trying to fall back to cpu-clock-ticks\n");
-			attr->type = PERF_TYPE_SOFTWARE;
-			attr->config = PERF_COUNT_SW_CPU_CLOCK;
-			goto try_again;
-		}
-		printf("\n");
-		error("perfcounter syscall returned with %d (%s)\n",
-			fd[nr_cpu][counter], strerror(err));
+				if (verbose)
+					warning(" ... trying to fall back to cpu-clock-ticks\n");
+				attr->type = PERF_TYPE_SOFTWARE;
+				attr->config = PERF_COUNT_SW_CPU_CLOCK;
+				goto try_again;
+			}
+			printf("\n");
+			error("perfcounter syscall returned with %d (%s)\n",
+					fd[nr_cpu][counter][thread_index], strerror(err));
 
 #if defined(__i386__) || defined(__x86_64__)
-		if (attr->type == PERF_TYPE_HARDWARE && err == EOPNOTSUPP)
-			die("No hardware sampling interrupt available. No APIC? If so then you can boot the kernel with the \"lapic\" boot parameter to force-enable it.\n");
+			if (attr->type == PERF_TYPE_HARDWARE && err == EOPNOTSUPP)
+				die("No hardware sampling interrupt available."
+				    " No APIC? If so then you can boot the kernel"
+				    " with the \"lapic\" boot parameter to"
+				    " force-enable it.\n");
 #endif
 
-		die("No CONFIG_PERF_EVENTS=y kernel support configured?\n");
-		exit(-1);
-	}
+			die("No CONFIG_PERF_EVENTS=y kernel support configured?\n");
+			exit(-1);
+		}
 
-	h_attr = get_header_attr(attr, counter);
-	if (h_attr == NULL)
-		die("nomem\n");
+		h_attr = get_header_attr(attr, counter);
+		if (h_attr == NULL)
+			die("nomem\n");
+
+		if (!file_new) {
+			if (memcmp(&h_attr->attr, attr, sizeof(*attr))) {
+				fprintf(stderr, "incompatible append\n");
+				exit(-1);
+			}
+		}
 
-	if (!file_new) {
-		if (memcmp(&h_attr->attr, attr, sizeof(*attr))) {
-			fprintf(stderr, "incompatible append\n");
+		if (read(fd[nr_cpu][counter][thread_index], &read_data, sizeof(read_data)) == -1) {
+			perror("Unable to read perf file descriptor\n");
 			exit(-1);
 		}
-	}
-
-	if (read(fd[nr_cpu][counter], &read_data, sizeof(read_data)) == -1) {
-		perror("Unable to read perf file descriptor\n");
-		exit(-1);
-	}
 
-	if (perf_header_attr__add_id(h_attr, read_data.id) < 0) {
-		pr_warning("Not enough memory to add id\n");
-		exit(-1);
-	}
+		if (perf_header_attr__add_id(h_attr, read_data.id) < 0) {
+			pr_warning("Not enough memory to add id\n");
+			exit(-1);
+		}
 
-	assert(fd[nr_cpu][counter] >= 0);
-	fcntl(fd[nr_cpu][counter], F_SETFL, O_NONBLOCK);
+		assert(fd[nr_cpu][counter][thread_index] >= 0);
+		fcntl(fd[nr_cpu][counter][thread_index], F_SETFL, O_NONBLOCK);
 
-	/*
-	 * First counter acts as the group leader:
-	 */
-	if (group && group_fd == -1)
-		group_fd = fd[nr_cpu][counter];
-	if (multiplex && multiplex_fd == -1)
-		multiplex_fd = fd[nr_cpu][counter];
+		/*
+		 * First counter acts as the group leader:
+		 */
+		if (group && group_fd == -1)
+			group_fd = fd[nr_cpu][counter][thread_index];
+		if (multiplex && multiplex_fd == -1)
+			multiplex_fd = fd[nr_cpu][counter][thread_index];
 
-	if (multiplex && fd[nr_cpu][counter] != multiplex_fd) {
+		if (multiplex && fd[nr_cpu][counter][thread_index] != multiplex_fd) {
 
-		ret = ioctl(fd[nr_cpu][counter], PERF_EVENT_IOC_SET_OUTPUT, multiplex_fd);
-		assert(ret != -1);
-	} else {
-		event_array[nr_poll].fd = fd[nr_cpu][counter];
-		event_array[nr_poll].events = POLLIN;
-		nr_poll++;
-
-		mmap_array[nr_cpu][counter].counter = counter;
-		mmap_array[nr_cpu][counter].prev = 0;
-		mmap_array[nr_cpu][counter].mask = mmap_pages*page_size - 1;
-		mmap_array[nr_cpu][counter].base = mmap(NULL, (mmap_pages+1)*page_size,
-				PROT_READ|PROT_WRITE, MAP_SHARED, fd[nr_cpu][counter], 0);
-		if (mmap_array[nr_cpu][counter].base == MAP_FAILED) {
-			error("failed to mmap with %d (%s)\n", errno, strerror(errno));
-			exit(-1);
+			ret = ioctl(fd[nr_cpu][counter][thread_index], PERF_EVENT_IOC_SET_OUTPUT, multiplex_fd);
+			assert(ret != -1);
+		} else {
+			event_array[nr_poll].fd = fd[nr_cpu][counter][thread_index];
+			event_array[nr_poll].events = POLLIN;
+			nr_poll++;
+
+			mmap_array[nr_cpu][counter][thread_index].counter = counter;
+			mmap_array[nr_cpu][counter][thread_index].prev = 0;
+			mmap_array[nr_cpu][counter][thread_index].mask = mmap_pages*page_size - 1;
+			mmap_array[nr_cpu][counter][thread_index].base = mmap(NULL, (mmap_pages+1)*page_size,
+				PROT_READ|PROT_WRITE, MAP_SHARED, fd[nr_cpu][counter][thread_index], 0);
+			if (mmap_array[nr_cpu][counter][thread_index].base == MAP_FAILED) {
+				error("failed to mmap with %d (%s)\n", errno, strerror(errno));
+				exit(-1);
+			}
 		}
-	}
 
-	if (filter != NULL) {
-		ret = ioctl(fd[nr_cpu][counter],
-			    PERF_EVENT_IOC_SET_FILTER, filter);
-		if (ret) {
-			error("failed to set filter with %d (%s)\n", errno,
-			      strerror(errno));
-			exit(-1);
+		if (filter != NULL) {
+			ret = ioctl(fd[nr_cpu][counter][thread_index],
+					PERF_EVENT_IOC_SET_FILTER, filter);
+			if (ret) {
+				error("failed to set filter with %d (%s)\n", errno,
+						strerror(errno));
+				exit(-1);
+			}
 		}
 	}
 }
 
-static void open_counters(int cpu, pid_t pid)
+static void open_counters(int cpu)
 {
 	int counter;
 
 	group_fd = -1;
 	for (counter = 0; counter < nr_counters; counter++)
-		create_counter(counter, cpu, pid);
+		create_counter(counter, cpu);
 
 	nr_cpu++;
 }
@@ -529,6 +542,9 @@ static int __cmd_record(int argc, const 
 			exit(-1);
 		}
 
+		if (!system_wide && target_tid == -1 && target_pid == -1)
+			all_tids[0] = child_pid;
+
 		close(child_ready_pipe[1]);
 		close(go_pipe[0]);
 		/*
@@ -541,17 +557,12 @@ static int __cmd_record(int argc, const 
 		close(child_ready_pipe[0]);
 	}
 
-	if (forks && target_pid == -1 && !system_wide)
-		pid = child_pid;
-	else
-		pid = target_pid;
-
 	if ((!system_wide && !inherit) || profile_cpu != -1) {
-		open_counters(profile_cpu, pid);
+		open_counters(profile_cpu);
 	} else {
 		nr_cpus = read_cpu_map();
 		for (i = 0; i < nr_cpus; i++)
-			open_counters(cpumap[i], pid);
+			open_counters(cpumap[i]);
 	}
 
 	if (file_new) {
@@ -576,7 +587,7 @@ static int __cmd_record(int argc, const 
 	}
 
 	if (!system_wide && profile_cpu == -1)
-		event__synthesize_thread(target_pid, process_synthesized_event,
+		event__synthesize_thread(target_tid, process_synthesized_event,
 					 session);
 	else
 		event__synthesize_threads(process_synthesized_event, session);
@@ -599,11 +610,16 @@ static int __cmd_record(int argc, const 
 
 	for (;;) {
 		int hits = samples;
+		int thread;
 
 		for (i = 0; i < nr_cpu; i++) {
 			for (counter = 0; counter < nr_counters; counter++) {
-				if (mmap_array[i][counter].base)
-					mmap_read(&mmap_array[i][counter]);
+				for (thread = 0;
+					thread < thread_num; thread++) {
+					if (mmap_array[i][counter][thread].base)
+						mmap_read(&mmap_array[i][counter][thread]);
+				}
+
 			}
 		}
 
@@ -616,8 +632,15 @@ static int __cmd_record(int argc, const 
 
 		if (done) {
 			for (i = 0; i < nr_cpu; i++) {
-				for (counter = 0; counter < nr_counters; counter++)
-					ioctl(fd[i][counter], PERF_EVENT_IOC_DISABLE);
+				for (counter = 0;
+					counter < nr_counters;
+					counter++) {
+					for (thread = 0;
+						thread < thread_num;
+						thread++)
+						ioctl(fd[i][counter][thread],
+							PERF_EVENT_IOC_DISABLE);
+				}
 			}
 		}
 	}
@@ -649,7 +672,9 @@ static const struct option options[] = {
 	OPT_CALLBACK(0, "filter", NULL, "filter",
 		     "event filter", parse_filter),
 	OPT_INTEGER('p', "pid", &target_pid,
-		    "record events on existing pid"),
+		    "record events on existing process id"),
+	OPT_INTEGER('t', "tid", &target_tid,
+		    "record events on existing thread id"),
 	OPT_INTEGER('r', "realtime", &realtime_prio,
 		    "collect data with this RT SCHED_FIFO priority"),
 	OPT_BOOLEAN('R', "raw-samples", &raw_samples,
@@ -690,10 +715,12 @@ static const struct option options[] = {
 int cmd_record(int argc, const char **argv, const char *prefix __used)
 {
 	int counter;
+	int i, j;
 
 	argc = parse_options(argc, argv, options, record_usage,
 			    PARSE_OPT_STOP_AT_NON_OPTION);
-	if (!argc && target_pid == -1 && !system_wide && profile_cpu == -1)
+	if (!argc && target_pid == -1 && target_tid == -1 &&
+		!system_wide && profile_cpu == -1)
 		usage_with_options(record_usage, options);
 
 	symbol__init();
@@ -704,6 +731,37 @@ int cmd_record(int argc, const char **ar
 		attrs[0].config = PERF_COUNT_HW_CPU_CYCLES;
 	}
 
+	if (target_pid != -1) {
+		target_tid = target_pid;
+		thread_num = find_all_tid(target_pid, &all_tids);
+		if (thread_num <= 0) {
+			fprintf(stderr, "Can't find all threads of pid %d\n",
+					target_pid);
+			usage_with_options(record_usage, options);
+		}
+	} else {
+		all_tids = malloc(sizeof(pid_t));
+		if (!all_tids)
+			return -ENOMEM;
+
+		all_tids[0] = target_tid;
+		thread_num = 1;
+	}
+
+	for (i = 0; i < MAX_NR_CPUS; i++) {
+		for (j = 0; j < MAX_COUNTERS; j++) {
+			fd[i][j] = malloc(sizeof(int)*thread_num);
+			mmap_array[i][j] = malloc(
+				sizeof(struct mmap_data)*thread_num);
+			if (!fd[i][j] || !mmap_array[i][j])
+				return -ENOMEM;
+		}
+	}
+	event_array = malloc(
+		sizeof(struct pollfd)*MAX_NR_CPUS*MAX_COUNTERS*thread_num);
+	if (!event_array)
+		return -ENOMEM;
+
 	/*
 	 * User specified count overrides default frequency.
 	 */
diff -Nraup linux-2.6_tip0317_statrecord/tools/perf/builtin-stat.c linux-2.6_tip0317_statrecordpid/tools/perf/builtin-stat.c
--- linux-2.6_tip0317_statrecord/tools/perf/builtin-stat.c	2010-03-18 13:46:14.600074330 +0800
+++ linux-2.6_tip0317_statrecordpid/tools/perf/builtin-stat.c	2010-03-18 14:29:49.318367157 +0800
@@ -46,6 +46,7 @@
 #include "util/debug.h"
 #include "util/header.h"
 #include "util/cpumap.h"
+#include "util/thread.h"
 
 #include <sys/prctl.h>
 #include <math.h>
@@ -74,10 +75,13 @@ static int			run_count			=  1;
 static int			inherit				=  1;
 static int			scale				=  1;
 static pid_t			target_pid			= -1;
+static pid_t			target_tid			= -1;
+static pid_t			*all_tids			=  NULL;
+static int			thread_num			=  0;
 static pid_t			child_pid			= -1;
 static int			null_run			=  0;
 
-static int			fd[MAX_NR_CPUS][MAX_COUNTERS];
+static int			*fd[MAX_NR_CPUS][MAX_COUNTERS];
 
 static int			event_scaled[MAX_COUNTERS];
 
@@ -140,9 +144,10 @@ struct stats			runtime_branches_stats;
 #define ERR_PERF_OPEN \
 "Error: counter %d, sys_perf_event_open() syscall returned with %d (%s)\n"
 
-static void create_perf_stat_counter(int counter, int pid)
+static void create_perf_stat_counter(int counter)
 {
 	struct perf_event_attr *attr = attrs + counter;
+	int thread;
 
 	if (scale)
 		attr->read_format = PERF_FORMAT_TOTAL_TIME_ENABLED |
@@ -152,10 +157,11 @@ static void create_perf_stat_counter(int
 		unsigned int cpu;
 
 		for (cpu = 0; cpu < nr_cpus; cpu++) {
-			fd[cpu][counter] = sys_perf_event_open(attr, -1, cpumap[cpu], -1, 0);
-			if (fd[cpu][counter] < 0 && verbose)
+			fd[cpu][counter][0] = sys_perf_event_open(attr,
+					-1, cpumap[cpu], -1, 0);
+			if (fd[cpu][counter][0] < 0 && verbose)
 				fprintf(stderr, ERR_PERF_OPEN, counter,
-					fd[cpu][counter], strerror(errno));
+					fd[cpu][counter][0], strerror(errno));
 		}
 	} else {
 		attr->inherit	     = inherit;
@@ -163,11 +169,14 @@ static void create_perf_stat_counter(int
 			attr->disabled = 1;
 			attr->enable_on_exec = 1;
 		}
-
-		fd[0][counter] = sys_perf_event_open(attr, pid, -1, -1, 0);
-		if (fd[0][counter] < 0 && verbose)
-			fprintf(stderr, ERR_PERF_OPEN, counter,
-				fd[0][counter], strerror(errno));
+		for (thread = 0; thread < thread_num; thread++) {
+			fd[0][counter][thread] = sys_perf_event_open(attr,
+				all_tids[thread], -1, -1, 0);
+			if (fd[0][counter][thread] < 0 && verbose)
+				fprintf(stderr, ERR_PERF_OPEN, counter,
+					fd[0][counter][thread],
+					strerror(errno));
+		}
 	}
 }
 
@@ -192,25 +201,28 @@ static void read_counter(int counter)
 	unsigned int cpu;
 	size_t res, nv;
 	int scaled;
-	int i;
+	int i, thread;
 
 	count[0] = count[1] = count[2] = 0;
 
 	nv = scale ? 3 : 1;
 	for (cpu = 0; cpu < nr_cpus; cpu++) {
-		if (fd[cpu][counter] < 0)
-			continue;
-
-		res = read(fd[cpu][counter], single_count, nv * sizeof(u64));
-		assert(res == nv * sizeof(u64));
-
-		close(fd[cpu][counter]);
-		fd[cpu][counter] = -1;
-
-		count[0] += single_count[0];
-		if (scale) {
-			count[1] += single_count[1];
-			count[2] += single_count[2];
+		for (thread = 0; thread < thread_num; thread++) {
+			if (fd[cpu][counter][thread] < 0)
+				continue;
+
+			res = read(fd[cpu][counter][thread],
+					single_count, nv * sizeof(u64));
+			assert(res == nv * sizeof(u64));
+
+			close(fd[cpu][counter][thread]);
+			fd[cpu][counter][thread] = -1;
+
+			count[0] += single_count[0];
+			if (scale) {
+				count[1] += single_count[1];
+				count[2] += single_count[2];
+			}
 		}
 	}
 
@@ -253,7 +265,6 @@ static int run_perf_stat(int argc __used
 	unsigned long long t0, t1;
 	int status = 0;
 	int counter;
-	int pid;
 	int child_ready_pipe[2], go_pipe[2];
 	const bool forks = (argc > 0);
 	char buf;
@@ -299,6 +310,9 @@ static int run_perf_stat(int argc __used
 			exit(-1);
 		}
 
+		if (target_tid == -1 && target_pid == -1 && !system_wide)
+			all_tids[0] = child_pid;
+
 		/*
 		 * Wait for the child to be ready to exec.
 		 */
@@ -309,12 +323,8 @@ static int run_perf_stat(int argc __used
 		close(child_ready_pipe[0]);
 	}
 
-	if (target_pid == -1)
-		pid = child_pid;
-	else
-		pid = target_pid;
 	for (counter = 0; counter < nr_counters; counter++)
-		create_perf_stat_counter(counter, pid);
+		create_perf_stat_counter(counter);
 
 	/*
 	 * Enable counters and exec the command:
@@ -433,12 +443,14 @@ static void print_stat(int argc, const c
 
 	fprintf(stderr, "\n");
 	fprintf(stderr, " Performance counter stats for ");
-	if(target_pid == -1) {
+	if(target_pid == -1 && target_tid == -1) {
 		fprintf(stderr, "\'%s", argv[0]);
 		for (i = 1; i < argc; i++)
 			fprintf(stderr, " %s", argv[i]);
-	}else
-		fprintf(stderr, "task pid \'%d", target_pid);
+	} else if (target_pid != -1)
+		fprintf(stderr, "process id \'%d", target_pid);
+	else
+		fprintf(stderr, "thread id \'%d", target_tid);
 
 	fprintf(stderr, "\'");
 	if (run_count > 1)
@@ -493,7 +505,9 @@ static const struct option options[] = {
 	OPT_BOOLEAN('i', "inherit", &inherit,
 		    "child tasks inherit counters"),
 	OPT_INTEGER('p', "pid", &target_pid,
-		    "stat events on existing pid"),
+		    "stat events on existing process id"),
+	OPT_INTEGER('t', "tid", &target_tid,
+		    "stat events on existing thread id"),
 	OPT_BOOLEAN('a', "all-cpus", &system_wide,
 		    "system-wide collection from all CPUs"),
 	OPT_BOOLEAN('c', "scale", &scale,
@@ -510,10 +524,11 @@ static const struct option options[] = {
 int cmd_stat(int argc, const char **argv, const char *prefix __used)
 {
 	int status;
+	int i, j;
 
 	argc = parse_options(argc, argv, options, stat_usage,
 		PARSE_OPT_STOP_AT_NON_OPTION);
-	if (!argc && target_pid == -1)
+	if (!argc && target_pid == -1 && target_tid == -1)
 		usage_with_options(stat_usage, options);
 	if (run_count <= 0)
 		usage_with_options(stat_usage, options);
@@ -529,6 +544,31 @@ int cmd_stat(int argc, const char **argv
 	else
 		nr_cpus = 1;
 
+	if (target_pid != -1) {
+		target_tid = target_pid;
+		thread_num = find_all_tid(target_pid, &all_tids);
+		if (thread_num <= 0) {
+			fprintf(stderr, "Can't find all threads of pid %d\n",
+					target_pid);
+			usage_with_options(stat_usage, options);
+		}
+	} else {
+		all_tids = malloc(sizeof(pid_t));
+		if (!all_tids)
+			return -ENOMEM;
+
+		all_tids[0] = target_tid;
+		thread_num = 1;
+	}
+
+	for (i = 0; i < MAX_NR_CPUS; i++) {
+		for (j = 0; j < MAX_COUNTERS; j++) {
+			fd[i][j] = malloc(sizeof(int)*thread_num);
+			if (!fd[i][j])
+				return -ENOMEM;
+		}
+	}
+
 	/*
 	 * We dont want to block the signals - that would cause
 	 * child tasks to inherit that and Ctrl-C would not work.
diff -Nraup linux-2.6_tip0317_statrecord/tools/perf/builtin-top.c linux-2.6_tip0317_statrecordpid/tools/perf/builtin-top.c
--- linux-2.6_tip0317_statrecord/tools/perf/builtin-top.c	2010-03-18 13:45:27.252768232 +0800
+++ linux-2.6_tip0317_statrecordpid/tools/perf/builtin-top.c	2010-03-18 14:26:52.766054822 +0800
@@ -55,7 +55,7 @@
 #include <linux/unistd.h>
 #include <linux/types.h>
 
-static int			fd[MAX_NR_CPUS][MAX_COUNTERS];
+static int			*fd[MAX_NR_CPUS][MAX_COUNTERS];
 
 static int			system_wide			=      0;
 
@@ -65,6 +65,9 @@ static int			count_filter			=      5;
 static int			print_entries;
 
 static int			target_pid			=     -1;
+static int			target_tid			=     -1;
+static pid_t			*all_tids			=      NULL;
+static int			thread_num			=      0;
 static int			inherit				=      0;
 static int			profile_cpu			=     -1;
 static int			nr_cpus				=      0;
@@ -524,13 +527,15 @@ static void print_sym_table(void)
 
 	if (target_pid != -1)
 		printf(" (target_pid: %d", target_pid);
+	else if (target_tid != -1)
+		printf(" (target_tid: %d", target_tid);
 	else
 		printf(" (all");
 
 	if (profile_cpu != -1)
 		printf(", cpu: %d)\n", profile_cpu);
 	else {
-		if (target_pid != -1)
+		if (target_tid != -1)
 			printf(")\n");
 		else
 			printf(", %d CPUs)\n", nr_cpus);
@@ -1129,16 +1134,21 @@ static void perf_session__mmap_read_coun
 	md->prev = old;
 }
 
-static struct pollfd event_array[MAX_NR_CPUS * MAX_COUNTERS];
-static struct mmap_data mmap_array[MAX_NR_CPUS][MAX_COUNTERS];
+static struct pollfd *event_array;
+static struct mmap_data *mmap_array[MAX_NR_CPUS][MAX_COUNTERS];
 
 static void perf_session__mmap_read(struct perf_session *self)
 {
-	int i, counter;
+	int i, counter, thread_index;
 
 	for (i = 0; i < nr_cpus; i++) {
 		for (counter = 0; counter < nr_counters; counter++)
-			perf_session__mmap_read_counter(self, &mmap_array[i][counter]);
+			for (thread_index = 0;
+				thread_index < thread_num;
+				thread_index++) {
+				perf_session__mmap_read_counter(self,
+					&mmap_array[i][counter][thread_index]);
+			}
 	}
 }
 
@@ -1149,9 +1159,10 @@ static void start_counter(int i, int cou
 {
 	struct perf_event_attr *attr;
 	int cpu;
+	int thread_index;
 
 	cpu = profile_cpu;
-	if (target_pid == -1 && profile_cpu == -1)
+	if (target_tid == -1 && profile_cpu == -1)
 		cpu = cpumap[i];
 
 	attr = attrs + counter;
@@ -1167,55 +1178,58 @@ static void start_counter(int i, int cou
 	attr->inherit		= (cpu < 0) && inherit;
 	attr->mmap		= 1;
 
+	for (thread_index = 0; thread_index < thread_num; thread_index++) {
 try_again:
-	fd[i][counter] = sys_perf_event_open(attr, target_pid, cpu, group_fd, 0);
+		fd[i][counter][thread_index] = sys_perf_event_open(attr,
+				all_tids[thread_index], cpu, group_fd, 0);
+
+		if (fd[i][counter][thread_index] < 0) {
+			int err = errno;
 
-	if (fd[i][counter] < 0) {
-		int err = errno;
+			if (err == EPERM || err == EACCES)
+				die("No permission - are you root?\n");
+			/*
+			 * If it's cycles then fall back to hrtimer
+			 * based cpu-clock-tick sw counter, which
+			 * is always available even if no PMU support:
+			 */
+			if (attr->type == PERF_TYPE_HARDWARE
+					&& attr->config == PERF_COUNT_HW_CPU_CYCLES) {
+
+				if (verbose)
+					warning(" ... trying to fall back to cpu-clock-ticks\n");
+
+				attr->type = PERF_TYPE_SOFTWARE;
+				attr->config = PERF_COUNT_SW_CPU_CLOCK;
+				goto try_again;
+			}
+			printf("\n");
+			error("perfcounter syscall returned with %d (%s)\n",
+					fd[i][counter][thread_index], strerror(err));
+			die("No CONFIG_PERF_EVENTS=y kernel support configured?\n");
+			exit(-1);
+		}
+		assert(fd[i][counter][thread_index] >= 0);
+		fcntl(fd[i][counter][thread_index], F_SETFL, O_NONBLOCK);
 
-		if (err == EPERM || err == EACCES)
-			die("No permission - are you root?\n");
 		/*
-		 * If it's cycles then fall back to hrtimer
-		 * based cpu-clock-tick sw counter, which
-		 * is always available even if no PMU support:
+		 * First counter acts as the group leader:
 		 */
-		if (attr->type == PERF_TYPE_HARDWARE
-			&& attr->config == PERF_COUNT_HW_CPU_CYCLES) {
+		if (group && group_fd == -1)
+			group_fd = fd[i][counter][thread_index];
 
-			if (verbose)
-				warning(" ... trying to fall back to cpu-clock-ticks\n");
-
-			attr->type = PERF_TYPE_SOFTWARE;
-			attr->config = PERF_COUNT_SW_CPU_CLOCK;
-			goto try_again;
-		}
-		printf("\n");
-		error("perfcounter syscall returned with %d (%s)\n",
-			fd[i][counter], strerror(err));
-		die("No CONFIG_PERF_EVENTS=y kernel support configured?\n");
-		exit(-1);
+		event_array[nr_poll].fd = fd[i][counter][thread_index];
+		event_array[nr_poll].events = POLLIN;
+		nr_poll++;
+
+		mmap_array[i][counter][thread_index].counter = counter;
+		mmap_array[i][counter][thread_index].prev = 0;
+		mmap_array[i][counter][thread_index].mask = mmap_pages*page_size - 1;
+		mmap_array[i][counter][thread_index].base = mmap(NULL, (mmap_pages+1)*page_size,
+				PROT_READ, MAP_SHARED, fd[i][counter][thread_index], 0);
+		if (mmap_array[i][counter][thread_index].base == MAP_FAILED)
+			die("failed to mmap with %d (%s)\n", errno, strerror(errno));
 	}
-	assert(fd[i][counter] >= 0);
-	fcntl(fd[i][counter], F_SETFL, O_NONBLOCK);
-
-	/*
-	 * First counter acts as the group leader:
-	 */
-	if (group && group_fd == -1)
-		group_fd = fd[i][counter];
-
-	event_array[nr_poll].fd = fd[i][counter];
-	event_array[nr_poll].events = POLLIN;
-	nr_poll++;
-
-	mmap_array[i][counter].counter = counter;
-	mmap_array[i][counter].prev = 0;
-	mmap_array[i][counter].mask = mmap_pages*page_size - 1;
-	mmap_array[i][counter].base = mmap(NULL, (mmap_pages+1)*page_size,
-			PROT_READ, MAP_SHARED, fd[i][counter], 0);
-	if (mmap_array[i][counter].base == MAP_FAILED)
-		die("failed to mmap with %d (%s)\n", errno, strerror(errno));
 }
 
 static int __cmd_top(void)
@@ -1231,8 +1245,8 @@ static int __cmd_top(void)
 	if (session == NULL)
 		return -ENOMEM;
 
-	if (target_pid != -1)
-		event__synthesize_thread(target_pid, event__process, session);
+	if (target_tid != -1)
+		event__synthesize_thread(target_tid, event__process, session);
 	else
 		event__synthesize_threads(event__process, session);
 
@@ -1243,7 +1257,7 @@ static int __cmd_top(void)
 	}
 
 	/* Wait for a minimal set of events before starting the snapshot */
-	poll(event_array, nr_poll, 100);
+	poll(&event_array[0], nr_poll, 100);
 
 	perf_session__mmap_read(session);
 
@@ -1286,7 +1300,9 @@ static const struct option options[] = {
 	OPT_INTEGER('c', "count", &default_interval,
 		    "event period to sample"),
 	OPT_INTEGER('p', "pid", &target_pid,
-		    "profile events on existing pid"),
+		    "profile events on existing process id"),
+	OPT_INTEGER('t', "tid", &target_tid,
+		    "profile events on existing thread id"),
 	OPT_BOOLEAN('a', "all-cpus", &system_wide,
 			    "system-wide collection from all CPUs"),
 	OPT_INTEGER('C', "CPU", &profile_cpu,
@@ -1327,6 +1343,7 @@ static const struct option options[] = {
 int cmd_top(int argc, const char **argv, const char *prefix __used)
 {
 	int counter;
+	int i, j;
 
 	page_size = sysconf(_SC_PAGE_SIZE);
 
@@ -1334,8 +1351,39 @@ int cmd_top(int argc, const char **argv,
 	if (argc)
 		usage_with_options(top_usage, options);
 
+	if (target_pid != -1) {
+		target_tid = target_pid;
+		thread_num = find_all_tid(target_pid, &all_tids);
+		if (thread_num <= 0) {
+			fprintf(stderr, "Can't find all threads of pid %d\n",
+				target_pid);
+			usage_with_options(top_usage, options);
+		}
+	} else {
+		all_tids = malloc(sizeof(pid_t));
+		if (!all_tids)
+			return -ENOMEM;
+
+		all_tids[0] = target_tid;
+		thread_num = 1;
+	}
+
+	for (i = 0; i < MAX_NR_CPUS; i++) {
+		for (j = 0; j < MAX_COUNTERS; j++) {
+			fd[i][j] = malloc(sizeof(int)*thread_num);
+			mmap_array[i][j] = malloc(
+				sizeof(struct mmap_data)*thread_num);
+			if (!fd[i][j] || !mmap_array[i][j])
+				return -ENOMEM;
+		}
+	}
+	event_array = malloc(
+		sizeof(struct pollfd)*MAX_NR_CPUS*MAX_COUNTERS*thread_num);
+	if (!event_array)
+		return -ENOMEM;
+
 	/* CPU and PID are mutually exclusive */
-	if (target_pid != -1 && profile_cpu != -1) {
+	if (target_tid > 0 && profile_cpu != -1) {
 		printf("WARNING: PID switch overriding CPU\n");
 		sleep(1);
 		profile_cpu = -1;
@@ -1376,7 +1424,7 @@ int cmd_top(int argc, const char **argv,
 		attrs[counter].sample_period = default_interval;
 	}
 
-	if (target_pid != -1 || profile_cpu != -1)
+	if (target_tid != -1 || profile_cpu != -1)
 		nr_cpus = 1;
 	else
 		nr_cpus = read_cpu_map();
diff -Nraup linux-2.6_tip0317_statrecord/tools/perf/util/thread.c linux-2.6_tip0317_statrecordpid/tools/perf/util/thread.c
--- linux-2.6_tip0317_statrecord/tools/perf/util/thread.c	2010-03-18 13:45:27.268773347 +0800
+++ linux-2.6_tip0317_statrecordpid/tools/perf/util/thread.c	2010-03-18 14:26:29.588441791 +0800
@@ -7,6 +7,37 @@
 #include "util.h"
 #include "debug.h"
 
+int find_all_tid(int pid, pid_t ** all_tid)
+{
+	char name[256];
+	int items;
+	struct dirent **namelist = NULL;
+	int ret = 0;
+	int i;
+
+	sprintf(name, "/proc/%d/task", pid);
+	items = scandir(name, &namelist, NULL, NULL);
+	if (items <= 0)
+		return -ENOENT;
+	*all_tid = malloc(sizeof(pid_t) * items);
+	if (!*all_tid) {
+		ret = -ENOMEM;
+		goto failure;
+	}
+
+	for (i = 0; i < items; i++)
+		(*all_tid)[i] = atoi(namelist[i]->d_name);
+
+	ret = items;
+
+failure:
+	for (i = 0; i < items; i++)
+		free(namelist[i]);
+	free(namelist);
+
+	return ret;
+}
+
 void map_groups__init(struct map_groups *self)
 {
 	int i;
@@ -348,3 +379,4 @@ struct symbol *map_groups__find_symbol(s
 
 	return NULL;
 }
+
diff -Nraup linux-2.6_tip0317_statrecord/tools/perf/util/thread.h linux-2.6_tip0317_statrecordpid/tools/perf/util/thread.h
--- linux-2.6_tip0317_statrecord/tools/perf/util/thread.h	2010-03-18 13:45:27.256771458 +0800
+++ linux-2.6_tip0317_statrecordpid/tools/perf/util/thread.h	2010-03-18 14:26:03.522627096 +0800
@@ -23,6 +23,7 @@ struct thread {
 	int			comm_len;
 };
 
+int find_all_tid(int pid, pid_t ** all_tid);
 void map_groups__init(struct map_groups *self);
 int thread__set_comm(struct thread *self, const char *comm);
 int thread__comm_len(struct thread *self);

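For readers following the diff, the core of the change is that every counter
is now opened once per thread instead of once per process: the new
find_all_tid() helper enumerates /proc/<pid>/task, and the tools fill
fd[cpu][counter][thread] and mmap_array[cpu][counter][thread] from that list.
The standalone sketch below shows the same pattern in isolation. It is
illustrative only and not part of the patch: the task-clock event, the
1024-fd cap and the one-second measurement window are arbitrary choices made
for the example.

/*
 * Sketch: open one counter per thread of a target pid, the way
 * builtin-stat.c now does with fd[cpu][counter][thread], then sum the
 * per-thread counts.
 */
#include <linux/perf_event.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <dirent.h>
#include <unistd.h>
#include <string.h>
#include <stdlib.h>
#include <stdio.h>
#include <stdint.h>

static int perf_event_open(struct perf_event_attr *attr, pid_t pid,
			   int cpu, int group_fd, unsigned long flags)
{
	return syscall(__NR_perf_event_open, attr, pid, cpu, group_fd, flags);
}

int main(int argc, char **argv)
{
	struct perf_event_attr attr;
	struct dirent **namelist;
	char path[256];
	uint64_t total = 0, count;
	int fds[1024], nr = 0;
	int items, i;

	if (argc < 2)
		return 1;

	/* Enumerate the threads of the target, as find_all_tid() does. */
	snprintf(path, sizeof(path), "/proc/%d/task", atoi(argv[1]));
	items = scandir(path, &namelist, NULL, NULL);
	if (items <= 0)
		return 1;

	memset(&attr, 0, sizeof(attr));
	attr.size = sizeof(attr);
	attr.type = PERF_TYPE_SOFTWARE;
	attr.config = PERF_COUNT_SW_TASK_CLOCK;

	for (i = 0; i < items; i++) {
		pid_t tid = atoi(namelist[i]->d_name);

		free(namelist[i]);
		if (!tid)		/* skip "." and ".." */
			continue;
		if (nr == 1024)
			continue;
		/* One fd per thread instead of one fd per process. */
		fds[nr] = perf_event_open(&attr, tid, -1, -1, 0);
		if (fds[nr] >= 0)
			nr++;
	}
	free(namelist);

	sleep(1);			/* let the counters accumulate */

	for (i = 0; i < nr; i++) {
		if (read(fds[i], &count, sizeof(count)) == sizeof(count))
			total += count;
		close(fds[i]);
	}

	printf("task-clock over %d threads: %llu ns\n", nr,
	       (unsigned long long)total);
	return 0;
}

With the patch applied, the equivalent from the command line is
'perf stat -p <pid>' (which now covers every thread of the pid) or
'perf stat -t <tid>' for a single thread; the same -t/--tid option is
added to perf record and perf top.
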
^ permalink raw reply	[flat|nested] 390+ messages in thread

* Re: [PATCH] Enhance perf to collect KVM guest os statistics from host side
  2010-03-18  7:49             ` Zhang, Yanmin
@ 2010-03-18  8:03               ` Ingo Molnar
  2010-03-18 13:03                 ` Arnaldo Carvalho de Melo
  0 siblings, 1 reply; 390+ messages in thread
From: Ingo Molnar @ 2010-03-18  8:03 UTC (permalink / raw)
  To: Zhang, Yanmin
  Cc: Arnaldo Carvalho de Melo, Avi Kivity, Peter Zijlstra, Sheng Yang,
	linux-kernel, kvm, Marcelo Tosatti, oerg Roedel, Jes Sorensen,
	Gleb Natapov, Zachary Amsden, zhiteng.huang


* Zhang, Yanmin <yanmin_zhang@linux.intel.com> wrote:

> I worked out 3 new patches against tip/master tree of Mar. 17th.

Cool! Mind sending them as a series of patches instead of attachments? That
makes them easier to review. Also, the Signed-off-by lines seem to be
missing, plus we need a per-patch changelog as well.

Thanks,

	Ingo

^ permalink raw reply	[flat|nested] 390+ messages in thread

* Re: [RFC] Unify KVM kernel-space and user-space code into a single project
  2010-03-17  8:10                                   ` [RFC] Unify KVM kernel-space and user-space code into a single project Ingo Molnar
@ 2010-03-18  8:20                                     ` Avi Kivity
  2010-03-18  8:56                                       ` Ingo Molnar
  2010-03-18  9:22                                       ` Ingo Molnar
  2010-03-18  8:44                                     ` Jes Sorensen
                                                       ` (2 subsequent siblings)
  3 siblings, 2 replies; 390+ messages in thread
From: Avi Kivity @ 2010-03-18  8:20 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Anthony Liguori, Zhang, Yanmin, Peter Zijlstra, Sheng Yang,
	linux-kernel, kvm, Marcelo Tosatti, oerg Roedel, Jes Sorensen,
	Gleb Natapov, Zachary Amsden, ziteng.huang,
	Arnaldo Carvalho de Melo, Frédéric Weisbecker

On 03/17/2010 10:10 AM, Ingo Molnar wrote:
>
>> It's about who owns the user interface.
>>
>> If qemu owns the user interface, than we can satisfy this in a very simple
>> way by adding a perf monitor command.  If we have to support third party
>> tools, then it significantly complicates things.
>>      
> Of course illogical modularization complicates things 'significantly'.
>    

Who should own the user interface then?

> Fast forward to 2010. The kernel side of KVM is maximum goodness - by far the
> worst-quality remaining aspects of KVM are precisely in areas that you
> mention: 'if we have to support third party tools, then it significantly
> complicates things'. You kept Qemu as an external 'third party' entity to KVM,
> and KVM is clearly hurting from that - just see the recent KVM usability
> thread for examples about suckage.
>    

Any qemu usability problems are because developers (or their employers) 
are not interested in fixing them, not because of the repository 
location.  Most kvm developer interest is in server-side deployment 
(even for desktop guests), so there is limited effort in implementing a 
virtualbox-style GUI.

>   - move a clean (and minimal) version of the Qemu code base to tools/kvm/, in
>     the upstream kernel repo, and work on that from that point on.
>    

I'll ignore the repository location which should be immaterial to a 
serious developer and concentrate on the 'clean and minimal' aspect, 
since it has some merit.  Qemu development does have a tension between 
the needs of kvm and tcg.  For kvm we need fine-grained threading to 
improve performance and tons of RAS work.  For tcg these are mostly 
meaningless, and the tcg code has sufficient inertia to reduce the rate 
at which we can develop.

Nevertheless, the majority of developers feel that we'll lose more by a 
fork (the community) than we gain by it (reduced constraints).

>   - co-develop new features within the same patch. Release new versions of
>     kvm-qemu and the kvm bits at the same time (together with the upstream
>     kernel), at well defined points in time.
>    

The majority of patches to qemu don't require changes to kvm, and vice 
versa.  The interface between qemu and kvm is fairly narrow, and most of 
the changes are related to save/restore and guest debugging, hardly 
 areas of great interest to the casual user.

>   - encourage kernel-space and user-space KVM developers to work on both
>     user-space and kernel-space bits as a single unit. It's one project and a
>     single experience to the user.
>    

When a feature is developed that requires both kernel and qemu changes, 
the same developer makes the changes in both projects.  Having them in 
different repositories does not appear to be a problem.

>   - [ and probably libvirt should go there too ]
>    

Let's make a list of projects that don't need to be in the kernel 
repository; it will probably be shorter.

Seriously, libvirt is a cross-platform cross-hypervisor library, it has 
no business near the Linux kernel.

> If KVM's hypervisor and guest kernel code can enjoy the benefits of a single
> repository,

In fact I try hard not to rely too much on that.  While both kvm guest 
and host code are in the same repo, there is an ABI barrier between them 
because we need to support any guest version on any host version.  When 
designing, writing, or reading guest or host code that interacts across 
that barrier we need to keep forward and backward compatibility in 
mind.  It's very different from normal kernel APIs that we can adapt 
whenever the need arises.

> why cannot the rest of KVM enjoy the same developer goodness? Only
> fixing that will bring the break-through in quality - not more manpower
> really.
>    

I really don't understand why you believe that.  You seem to want a 
virtualbox-style GUI, and lkml is probably the last place in the world 
to develop something like that.  The developers here are mostly 
uninterested in GUI and usability problems; remember, these are people 
who think emacs xor vi is a great editor.

> Yes, i've read a thousand excuses for why this is an absolutely impossible and
> a bad thing to do, and none of them was really convincing to me - and you also
> have become rather emotional about all the arguments so it's hard to argue
> about it on a technical basis.
>
> We made a similar (admittedly very difficult ...) design jump from oprofile to
> perf, and i can tell you from that experience that it's day and night, both in
> terms of development and in terms of the end result!
>    

Maybe it was due to better design and implementation choices.

> ( We recently also made another, kernel/kernel unification that had a very
>    positive result: we unified the 32-bit and 64-bit x86 architectures. Even
>    within the same repo the unification of technology is generally a good
>    thing. The KVM/Qemu situation is different - it's more similar to the perf
>    design. )
>
> Not having to fight artificial package boundaries and forced package
> separation is very refreshing experience to a developer - and very rewarding
> and flexible to develop on. ABI compatibility is _easier_ to maintain in such
> a model. It's quite similar to the jump from Xen hacking to KVM hacking (i did
> both). It's a bit like the jump from CVS to Git. Trust me, you _cannot_ know
> the difference if you havent tried a similar jump with Qemu.
>    

Why is ABI compatibility easier to maintain in a single repo?

> Anyway, you made your position about this rather clear and you are clearly
> uncompromising, so i just wanted to post this note to the list: you'll waste
> years of your life on a visibly crappy development model that has been unable
> to break through a magic usability barrier for the past 2-3 years - just like
> the Xen mis-design has wasted so many people's time and effort in kernel
> space.
>    

Do you really think the echo'n'cat school of usability wants to write a 
GUI?  In linux-2.6.git?

-- 
error compiling committee.c: too many arguments to function


^ permalink raw reply	[flat|nested] 390+ messages in thread

* Re: [RFC] Unify KVM kernel-space and user-space code into a single project
  2010-03-17  8:10                                   ` [RFC] Unify KVM kernel-space and user-space code into a single project Ingo Molnar
  2010-03-18  8:20                                     ` Avi Kivity
@ 2010-03-18  8:44                                     ` Jes Sorensen
  2010-03-18  9:54                                       ` Ingo Molnar
  2010-03-19 14:53                                       ` Andrea Arcangeli
  2010-03-18 14:38                                     ` Anthony Liguori
  2010-03-18 14:44                                     ` Anthony Liguori
  3 siblings, 2 replies; 390+ messages in thread
From: Jes Sorensen @ 2010-03-18  8:44 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Anthony Liguori, Avi Kivity, Zhang, Yanmin, Peter Zijlstra,
	Sheng Yang, linux-kernel, kvm, Marcelo Tosatti, oerg Roedel,
	Gleb Natapov, Zachary Amsden, ziteng.huang,
	Arnaldo Carvalho de Melo, Frédéric Weisbecker

On 03/17/10 09:10, Ingo Molnar wrote:
> I wish both you and Avi looked back 3-4 years and realized what made KVM so
> successful back then and why the hearts and minds of virtualization developers
> were captured by KVM almost overnight.

Ingo,

What made KVM so successful was that the core kernel of the hypervisor
was designed the right way, as a kernel module where it belonged. It was
obvious to anyone who had been exposed to the main competition at the
time, Xen, that this was the right approach. What has ended up killing
Xen in the end is the not-invented-here approach of copying everything
over, reformatting it, and rewriting half of it, which made it
impossible to maintain and support as a single codebase. At my previous
employer we ended up dropping all Xen efforts exactly because it was
like maintaining two separate operating system kernels. The key to KVM
is that once you have Linux, you practically have KVM as well.

> Fast forward to 2010. The kernel side of KVM is maximum goodness - by far the
> worst-quality remaining aspects of KVM are precisely in areas that you
> mention: 'if we have to support third party tools, then it significantly
> complicates things'. You kept Qemu as an external 'third party' entity to KVM,
> and KVM is clearly hurting from that - just see the recent KVM usability
> thread for examples about suckage.
>
> So a similar 'complication' is the crux of the matter behind KVM quality
> problems: you've not followed through with the original KVM vision and you
> have not applied that concept to Qemu!

Well there are two ways to go about this. Either you base the KVM
userland on top of an existing project, like QEMU, _or_ you rewrite it
all from scratch. However, there is far more to it than just a couple of
ioctls, for example the stack of reverse device-drivers is a pretty
significant code base, rewriting that and maintaining it is not a
trivial task. It is certainly my belief that the benefit we get from
sharing that with QEMU by far outweighs the cost of forking it and
keeping our own fork in the kernel tree. In fact it would result in
exactly the same problems I mentioned above wrt Xen.

> If you want to jump to the next level of technological quality you need to fix
> this attitude and you need to go back to the design roots of KVM. Concentrate
> on Qemu (as that is the weakest link now), make it a first class member of the
> KVM repo and simplify your development model by having a single repo:
>
>   - move a clean (and minimal) version of the Qemu code base to tools/kvm/, in
>     the upstream kernel repo, and work on that from that point on.

With this you have just thrown away all the benefits of having the QEMU
repository shared with other developers who will actively fix bugs in
components we do care about for KVM.

>   - encourage kernel-space and user-space KVM developers to work on both
>     user-space and kernel-space bits as a single unit. It's one project and a
>     single experience to the user.

This is already happening and a total non issue.

>   - [ and probably libvirt should go there too ]

Now that would be interesting, next we'll have to include things like
libxml in the kernel git tree as well, to make sure libvirt doesn't get
out of sync with the version supplied by your distribution vendor.

> Yes, i've read a thousand excuses for why this is an absolutely impossible and
> a bad thing to do, and none of them was really convincing to me - and you also
> have become rather emotional about all the arguments so it's hard to argue
> about it on a technical basis.

So far your argument would justify pulling all of gdb into the kernel
git tree as well, to support the kgdb efforts; or gcc, so we can get
rid of the gcc version quirks in the kernel header files; or e2fsprogs
and the equivalent for _all_ file systems included in the kernel, so we
can make sure our fs tools never get out of sync with what's supported
in the kernel...

> We made a similar (admittedly very difficult ...) design jump from oprofile to
> perf, and i can tell you from that experience that it's day and night, both in
> terms of development and in terms of the end result!

The user components for perf vs oprofile are _tiny_ projects compared to
the portions of QEMU that are actually used by KVM.

Oh and you completely forgot SeaBIOS. KVM+QEMU rely on SeaBIOS too, so
from what you're saying we should pull that into the kernel git
repository as well. Never mind the fact that we share SeaBIOS with the
coreboot project which is very actively adding features to it that
benefit us as well.....

Sorry, but there are times when unification makes sense, and there are
times when having a reasonably well designed split makes sense. KVM
had problems with QEMU in the past which resulted in the qemu-kvm branch
of it, which proved to be a major pain to deal with, but that is
fortunately improving and qemu-kvm should go away completely at some
point.

Cheers,
Jes

^ permalink raw reply	[flat|nested] 390+ messages in thread

* Re: [PATCH] Enhance perf to collect KVM guest os statistics from host side
  2010-03-18  5:41                   ` Sheng Yang
@ 2010-03-18  8:47                     ` Zachary Amsden
  0 siblings, 0 replies; 390+ messages in thread
From: Zachary Amsden @ 2010-03-18  8:47 UTC (permalink / raw)
  To: Sheng Yang
  Cc: kvm, Avi Kivity, Zhang, Yanmin, Ingo Molnar, Peter Zijlstra,
	linux-kernel, Marcelo Tosatti, oerg Roedel, Jes Sorensen,
	Gleb Natapov, Huang, Zhiteng, Joerg Roedel

On 03/17/2010 07:41 PM, Sheng Yang wrote:
> On Thursday 18 March 2010 13:22:28 Sheng Yang wrote:
>    
>> On Thursday 18 March 2010 12:50:58 Zachary Amsden wrote:
>>      
>>> On 03/17/2010 03:19 PM, Sheng Yang wrote:
>>>        
>>>> On Thursday 18 March 2010 05:14:52 Zachary Amsden wrote:
>>>>          
>>>>> On 03/16/2010 11:28 PM, Sheng Yang wrote:
>>>>>            
>>>>>> On Wednesday 17 March 2010 10:34:33 Zhang, Yanmin wrote:
>>>>>>              
>>>>>>> On Tue, 2010-03-16 at 11:32 +0200, Avi Kivity wrote:
>>>>>>>                
>>>>>>>> On 03/16/2010 09:48 AM, Zhang, Yanmin wrote:
>>>>>>>>                  
>>>>>>>>> Right, but there is a scope between kvm_guest_enter and really
>>>>>>>>> running in guest os, where a perf event might overflow. Anyway,
>>>>>>>>> the scope is very narrow, I will change it to use flag PF_VCPU.
>>>>>>>>>                    
>>>>>>>> There is also a window between setting the flag and calling 'int
>>>>>>>> $2' where an NMI might happen and be accounted incorrectly.
>>>>>>>>
>>>>>>>> Perhaps separate the 'int $2' into a direct call into perf and
>>>>>>>> another call for the rest of NMI handling.  I don't see how it
>>>>>>>> would work on svm though - AFAICT the NMI is held whereas vmx
>>>>>>>> swallows it.
>>>>>>>>
>>>>>>>>     I guess NMIs
>>>>>>>> will be disabled until the next IRET so it isn't racy, just tricky.
>>>>>>>>                  
>>>>>>> I'm not sure if vmexit does break NMI context or not. Hardware NMI
>>>>>>> context isn't reentrant till an IRET. YangSheng would like to double
>>>>>>> check it.
>>>>>>>                
>>>>>> After more checking, I think VMX won't retain the NMI blocked state
>>>>>> for the host. That means, if an NMI happens while the processor is in
>>>>>> VMX non-root mode, it only results in a VMExit, with a reason
>>>>>> indicating that it's due to an NMI, but no further state change in
>>>>>> the host.
>>>>>>
>>>>>> So in that sense, there _is_ a window between VMExit and KVM handling
>>>>>> the NMI. Moreover, I think we _can't_ stop re-entrance of the NMI
>>>>>> handling code, because "int $2" does not block a following NMI.
>>>>>>
>>>>>> And if the NMI ordering is not important (I think it isn't), then we
>>>>>> need to generate a real NMI in the current post-vmexit code. Letting
>>>>>> the APIC send an NMI IPI to itself seems like a good idea.
>>>>>>
>>>>>> I am debugging a patch based on apic->send_IPI_self(NMI_VECTOR) to
>>>>>> replace "int $2". Something unexpected is happening...
>>>>>>              
>>>>> You can't use the APIC to send vectors 0x00-0x1f, or at least, you
>>>>> aren't supposed to be able to.
>>>>>            
>>>> Um? Why?
>>>>
>>>> Especially kernel is already using it to deliver NMI.
>>>>          
>>> That's the only defined case, and it is defined because the vector field
>>> is ignored for DM_NMI.  Vol 3A (exact section numbers may vary depending
>>> on your version).
>>>
>>> 8.5.1 / 8.6.1
>>>
>>> '100 (NMI) Delivers an NMI interrupt to the target processor or
>>> processors.  The vector information is ignored'
>>>
>>> 8.5.2  Valid Interrupt Vectors
>>>
>>> 'Local and I/O APICs support 240 of these vectors (in the range of 16 to
>>> 255) as valid interrupts.'
>>>
>>> 8.8.4 Interrupt Acceptance for Fixed Interrupts
>>>
>>> '...; vectors 0 through 15 are reserved by the APIC (see also: Section
>>> 8.5.2, "Valid Interrupt Vectors")'
>>>
>>> So I misremembered, apparently you can deliver interrupts 0x10-0x1f, but
>>> vectors 0x00-0x0f are not valid to send via APIC or I/O APIC.
>>>        
>> As you pointed out, NMI is not a "Fixed" interrupt. If we want to send an
>>   NMI, it would need a specific delivery mode rather than a vector number.
>>
>> And if you look at the code, if we specify NMI_VECTOR, the delivery mode
>>   would be set to NMI.
>>
>> So what's wrong here?
>>      
> OK, I think I understand your points now. You meant that these vectors can't
> be filled into the vector field directly, right? But NMI is an exception due to
> DM_NMI. Is that your point? I think we agree on this.
>    

Yes, I think we agree.  NMI is the only vector in 0x0-0xf which can be 
sent via self-IPI because the vector itself does not matter for NMI.
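
As an aside for readers following along: the apic->send_IPI_self(NMI_VECTOR)
idea Sheng mentions above would, in rough outline, look like the host-side
fragment below. This is an illustrative, untested sketch (assuming x86 with
the local APIC code available), not the actual patch being debugged, and the
function name is made up for the example:

/*
 * After a vmexit whose exit reason says the guest was interrupted by an
 * NMI, hand that NMI to the host.  The delivery mode is NMI, so the
 * vector number itself is ignored by the APIC - which is why NMI_VECTOR
 * is usable even though vectors 0x00-0x0f are otherwise reserved.
 */
static void kvm_forward_host_nmi(void)
{
#ifdef CONFIG_X86_LOCAL_APIC
	apic->send_IPI_self(NMI_VECTOR);
#else
	asm volatile("int $2");		/* the approach being replaced */
#endif
}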

Zach

^ permalink raw reply	[flat|nested] 390+ messages in thread

* Re: [RFC] Unify KVM kernel-space and user-space code into a single project
  2010-03-18  8:20                                     ` Avi Kivity
@ 2010-03-18  8:56                                       ` Ingo Molnar
  2010-03-18  9:24                                         ` Alexander Graf
  2010-03-18 10:12                                         ` Avi Kivity
  2010-03-18  9:22                                       ` Ingo Molnar
  1 sibling, 2 replies; 390+ messages in thread
From: Ingo Molnar @ 2010-03-18  8:56 UTC (permalink / raw)
  To: Avi Kivity
  Cc: Anthony Liguori, Zhang, Yanmin, Peter Zijlstra, Sheng Yang,
	linux-kernel, kvm, Marcelo Tosatti, oerg Roedel, Jes Sorensen,
	Gleb Natapov, Zachary Amsden, ziteng.huang,
	Arnaldo Carvalho de Melo, Frédéric Weisbecker


* Avi Kivity <avi@redhat.com> wrote:

> On 03/17/2010 10:10 AM, Ingo Molnar wrote:
> >
> >> It's about who owns the user interface.
> >>
> >> If qemu owns the user interface, than we can satisfy this in a very 
> >> simple way by adding a perf monitor command.  If we have to support third 
> >> party tools, then it significantly complicates things.
> >
> > Of course illogical modularization complicates things 'significantly'.
> 
> Who should own the user interface then?

If qemu was in tools/kvm/ then we wouldn't have such issues. A single patch (or 
series of patches) could modify tools/kvm/, arch/x86/kvm/, virt/ and 
tools/perf/.

Numerous times did we have patches to kernel/perf_event.c that fixed some 
detail, also accompanied by a tools/perf/ patch fixing another detail. Having 
a single 'culture of contribution' is a powerful way to develop.

It turns out kernel developers can be pretty good user-space developers as 
well and user-space developers can be pretty good kernel developers as well. 
Some like to do both - as long as it's all within a single project.

The moment any change (be it as trivial as fixing a GUI detail or as complex 
as a new feature) involves two or more packages, development speed slows down 
to a crawl - while the complexity of the change might be very low!

Also, there's the harmful process whereby people start categorizing themselves 
into 'I am a kernel developer' and 'I am a user space programmer' stereotypes, 
which limits the scope of contributions artificially.

> > Fast forward to 2010. The kernel side of KVM is maximum goodness - by far 
> > the worst-quality remaining aspects of KVM are precisely in areas that you 
> > mention: 'if we have to support third party tools, then it significantly 
> > complicates things'. You kept Qemu as an external 'third party' entity to 
> > KVM, and KVM is clearly hurting from that - just see the recent KVM 
> > usability thread for examples about suckage.
> 
> Any qemu usability problems are because developers (or their employers) are 
> not interested in fixing them, not because of the repository location.  Most 
> kvm developer interest is in server-side deployment (even for desktop 
> guests), so there is limited effort in implementing a virtualbox-style GUI.

The same has been said of oprofile as well: 'it somewhat sucks because we are 
too server centric', 'nobody is interested in good usability and oprofile is 
fine for the enterprises'. Ironically, the same has been said of Xen usability 
as well, up to the point KVM came around.

What was the core of the problem was a bad design and a split kernel-side 
user-side tool landscape.

In fact i think saying that 'our developers only care about the server' is 
borderline dishonest, when at the same time you are making it doubly sure (by 
inaction) that it stays so: by leaving an artificial package wall between 
kernel-side KVM and user-side KVM and not integrating the two technologies.

You'll never know what heights you could achieve if you leave that wall there 
...

Furthermore, what should be realized is that bad usability hurts "server 
features" just as much. Most of the day-to-day testing is done on the desktop 
by desktop oriented testers/developers. _Not_ by enterprise shops - they tend 
to see the code years down the line to begin with ...

Yes, a particular feature might be server oriented, but a good portion of our 
testing is on the desktop and everyone is hurting from bad usability and this 
puts limits on contribution efficiency.

As the patch posted in _this very thread_ demonstrates, it is doubly 
difficult to contribute a joint KVM+Qemu feature, because it's two separate 
code bases, two contribution guidelines, two release schedules. While to the 
user it really is just one and the same thing. It should be so for the 
developer as well.

Put in another way: KVM's current split design is making it easy to contribute 
server features (because the kernel side is clean and cool), but also makes it 
artificially hard to contribute desktop features: because the tooling side 
(Qemu) is 'just another package', is separated by a package and maintenance 
wall and is made somewhat uncool by a (as some KVM developers have pointed out 
in this thread) quirky codebase.

(the rest of your points are really a function of this fundamental 
disagreement)

Thanks,

	Ingo

^ permalink raw reply	[flat|nested] 390+ messages in thread

* Re: [RFC] Unify KVM kernel-space and user-space code into a single project
  2010-03-18  8:20                                     ` Avi Kivity
  2010-03-18  8:56                                       ` Ingo Molnar
@ 2010-03-18  9:22                                       ` Ingo Molnar
  2010-03-18 10:32                                         ` Avi Kivity
  1 sibling, 1 reply; 390+ messages in thread
From: Ingo Molnar @ 2010-03-18  9:22 UTC (permalink / raw)
  To: Avi Kivity
  Cc: Anthony Liguori, Zhang, Yanmin, Peter Zijlstra, Sheng Yang,
	linux-kernel, kvm, Marcelo Tosatti, oerg Roedel, Jes Sorensen,
	Gleb Natapov, Zachary Amsden, ziteng.huang,
	Arnaldo Carvalho de Melo, Frédéric Weisbecker


* Avi Kivity <avi@redhat.com> wrote:

> >  - move a clean (and minimal) version of the Qemu code base to tools/kvm/, 
> >    in the upstream kernel repo, and work on that from that point on.
> 
> I'll ignore the repository location which should be immaterial to a serious 
> developer and concentrate on the 'clean and minimal' aspect, since it has 
> some merit.  [...]

To the contrary, experience shows that repository location, and in particular 
a shared repository for closely related bits is very much material!

It matters because when there are two separate projects, even a "serious 
developer" finds it doubly and triply difficult to contribute even 
trivial changes.

It becomes literally a nightmare if you have to touch 3 packages: kernel, a 
library and an app codebase. It takes _forever_ to get anything useful done.

Also, 'focus on a single thing' is a very basic aspect of humans, especially 
those who do computer programming. Working on two code bases in two 
repositories at once can be very challenging physically and psychically.

So what i've seen is that OSS programmers tend to pick a side, pretty much 
randomly, and then rationalize it in hindsight why they prefer that side ;-)

Most of them become either a kernel developer or a user-space package 
developer - and then they specialize on that field and shy away from changes 
that involve both. It's a basic human thing to avoid the hassle that comes 
with multi-package changes. (One really has to be outright stupid, fanatic or 
desperate to even attempt such changes these days - such are the difficulties 
for a comparatively low return.)

The solution is to tear down such artificial walls of contribution where 
possible. And tearing down the wall between KVM and qemu-kvm seems very much 
possible and the advantages would be numerous.

Unless by "serious developer" you meant: "developer willing to [or forced to] 
waste time and effort on illogically structured technology".

> [...]
>
> Do you really think the echo'n'cat school of usability wants to write a GUI?  
> In linux-2.6.git?

Then you'll be surprised to hear that it's happening as we speak and the 
commits are there in linux-2.6.git. Both a TUI and a GUI are in the works.

Furthermore, the numbers show that half of the usability fixes to tools/perf/ 
came not from regular perf contributors but from random kernel developers and 
testers who, when they build the latest kernel, try out perf at the same 
time (it's very easy because you already have it in the kernel repository - no 
separate download, no installation, etc. necessary).

I had literally zero such contributions when (the precursor to) 'perf' was 
still a separate user-space project.

You could have the same effect for Qemu: the latest bits in tools/kvm/ would 
be built by regular kernel testers and developers. The integration benefits 
dont just extend to developers, a unified project is vastly easier to test as 
well.

Thanks,

	Ingo

^ permalink raw reply	[flat|nested] 390+ messages in thread

* Re: [RFC] Unify KVM kernel-space and user-space code into a single project
  2010-03-18  8:56                                       ` Ingo Molnar
@ 2010-03-18  9:24                                         ` Alexander Graf
  2010-03-18 10:10                                           ` Ingo Molnar
  2010-03-18 10:12                                         ` Avi Kivity
  1 sibling, 1 reply; 390+ messages in thread
From: Alexander Graf @ 2010-03-18  9:24 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Avi Kivity, Anthony Liguori, Zhang, Yanmin, Peter Zijlstra,
	Sheng Yang, linux-kernel, kvm, Marcelo Tosatti, oerg Roedel,
	Jes Sorensen, Gleb Natapov, Zachary Amsden, ziteng.huang,
	Arnaldo Carvalho de Melo, Frédéric Weisbecker


On 18.03.2010, at 09:56, Ingo Molnar wrote:

> 
> * Avi Kivity <avi@redhat.com> wrote:
> 
>> On 03/17/2010 10:10 AM, Ingo Molnar wrote:
>>> 
>>>> It's about who owns the user interface.
>>>> 
>>>> If qemu owns the user interface, than we can satisfy this in a very 
>>>> simple way by adding a perf monitor command.  If we have to support third 
>>>> party tools, then it significantly complicates things.
>>> 
>>> Of course illogical modularization complicates things 'significantly'.
>> 
>> Who should own the user interface then?
> 
> If qemu was in tools/kvm/ then we wouldn't have such issues. A single patch (or 
> series of patches) could modify tools/kvm/, arch/x86/kvm/, virt/ and 
> tools/perf/.

It's not a 1:1 connection. There are more users of the KVM interface. To name a few I'm aware of:

- Mac-on-Linux (PPC)
- Dolphin (PPC)
- Xenner (x86)
- Kuli (s390)

Having a clear userspace interface is the only viable solution there. And if you're interested, look at my MOL enabling patch. It's less than 500 lines of code.

The kernel/userspace interface really isn't the difficult part. Getting device emulation working properly, easily and fast is.


Alex

^ permalink raw reply	[flat|nested] 390+ messages in thread

* Re: [RFC] Unify KVM kernel-space and user-space code into a single project
  2010-03-18  8:44                                     ` Jes Sorensen
@ 2010-03-18  9:54                                       ` Ingo Molnar
  2010-03-18 10:40                                         ` Jes Sorensen
  2010-03-19 14:53                                       ` Andrea Arcangeli
  1 sibling, 1 reply; 390+ messages in thread
From: Ingo Molnar @ 2010-03-18  9:54 UTC (permalink / raw)
  To: Jes Sorensen
  Cc: Anthony Liguori, Avi Kivity, Zhang, Yanmin, Peter Zijlstra,
	Sheng Yang, linux-kernel, kvm, Marcelo Tosatti, oerg Roedel,
	Gleb Natapov, Zachary Amsden, ziteng.huang,
	Arnaldo Carvalho de Melo, Fr?d?ric Weisbecker


* Jes Sorensen <Jes.Sorensen@redhat.com> wrote:

> On 03/17/10 09:10, Ingo Molnar wrote:
>
> > I wish both you and Avi looked back 3-4 years and realized what made KVM 
> > so successful back then and why the hearts and minds of virtualization 
> > developers were captured by KVM almost overnight.
> 
> Ingo,
> 
> What made KVM so successful was that the core kernel of the hypervisor was 
> designed the right way, as a kernel module where it belonged. It was obvious 
> to anyone who had been exposed to the main competition at the time, Xen, 
> that this was the right approach. What has ended up killing Xen in the end 
> is the not-invented-here approach of copying everything over, reformatting 
> it, and rewriting half of it, which made it impossible to maintain and 
> support as a single codebase. [...]

Yes, exactly.

I was part of that nightmare so i know.

> [...]
>
> At my previous employer we ended up dropping all Xen efforts exactly because 
> it was like maintaining two separate operating system kernels. The key to 
> KVM is that once you have Linux, you practically have KVM as well.

Yes. Please realize that what is behind it is a strikingly simple argument:

 "Once you have a single project to develop and maintain all is much better."

> > Fast forward to 2010. The kernel side of KVM is maximum goodness - by far 
> > the worst-quality remaining aspects of KVM are precisely in areas that you 
> > mention: 'if we have to support third party tools, then it significantly 
> > complicates things'. You kept Qemu as an external 'third party' entity to 
> > KVM, and KVM is clearly hurting from that - just see the recent KVM 
> > usability thread for examples about suckage.
> >
> > So a similar 'complication' is the crux of the matter behind KVM quality 
> > problems: you've not followed through with the original KVM vision and you 
> > have not applied that concept to Qemu!
> 
> Well there are two ways to go about this. Either you base the KVM userland 
> on top of an existing project, like QEMU, _or_ you rewrite it all from 
> scratch. [...]

Btw., i made similar arguments to Avi about 3 years ago when it was going 
upstream, that qemu should be unified with KVM. This is more true today than 
ever.

> [...] However, there is far more to it than just a couple of ioctls, for 
> example the stack of reverse device-drivers is a pretty significant code 
> base, rewriting that and maintaining it is not a trivial task. It is 
> certainly my belief that the benefit we get from sharing that with QEMU by 
> far outweighs the cost of forking it and keeping our own fork in the kernel 
> tree. In fact it would result in exactly the same problems I mentioned above 
> wrt Xen.

I do not suggest forking Qemu at all, i suggest using the most natural 
development model for the KVM+Qemu shared project: a single repository.

> > If you want to jump to the next level of technological quality you need to 
> > fix this attitude and you need to go back to the design roots of KVM. 
> > Concentrate on Qemu (as that is the weakest link now), make it a first 
> > class member of the KVM repo and simplify your development model by having 
> > a single repo:
> >
> >  - move a clean (and minimal) version of the Qemu code base to tools/kvm/,
> >    in the upstream kernel repo, and work on that from that point on.
> 
> With this you have just thrown away all the benefits of having the QEMU 
> repository shared with other developers who will actively fix bugs in 
> components we do care about for KVM.

Not if it's a unified project.

> >  - encourage kernel-space and user-space KVM developers to work on both
> >    user-space and kernel-space bits as a single unit. It's one project and 
> >    a single experience to the user.
> 
> This is already happening and a total non issue.

My experience as an external observer of the end result contradicts this.

Seemingly trivial usability changes to the KVM+Qemu combo often do not get 
done, because they involve cross-discipline changes.

( _In this very thread_ there has been a somewhat self-defeating argument by 
  Anthony that the multi-package scenario would 'significantly complicate' 
  matters. What more proof do we need to state the obvious? Keeping what
  has become one piece of technology over the years in two separate halves is
  obviously bad. )

> >  - [ and probably libvirt should go there too ]
> 
> Now that would be interesting, next we'll have to include things like libxml 
> in the kernel git tree as well, to make sure libvirt doesn't get out of sync 
> with the version supplied by your distribution vendor.

The way we have gone about this in tools/perf/ is similar to the route picked 
by Git: we only use very lowlevel libraries available everywhere, and we 
provide optional wrappers to the rest.

We are also using the kernel's libraries so we rarely need to go outside to 
get some functionality.

I.e. it's a non-issue in practice and despite perf having an (optional) 
dependency on xmlto and docbook we don't include those packages nor do we force 
users to install particular versions of them.

> > Yes, i've read a thousand excuses for why this is an absolutely impossible 
> > and a bad thing to do, and none of them was really convincing to me - and 
> > you also have become rather emotional about all the arguments so it's hard 
> > to argue about it on a technical basis.
> 
> So far your argument would justify pulling all of gdb into the kernel git 
> tree as well, to support the kgdb efforts; or gcc, so we can get rid of the 
> gcc version quirks in the kernel header files; or e2fsprogs and the 
> equivalent for _all_ file systems included in the kernel, so we can make 
> sure our fs tools never get out of sync with what's supported in the 
> kernel...

gdb and gcc are clearly extrinsic to the kernel, so why would we move them 
there?

I was talking about tools that are closely related to the kernel - where much 
of the development and actual use is in combination with the Linux kernel.

90%+ of the Qemu usecases are combined with Linux. (Yes, i know that you can 
run Qemu without KVM, and no, i don't think it matters in the grand scheme of 
things and most investment into Qemu comes from the KVM angle these days. In 
particular it for sure does not justify handicapping future KVM evolution so 
drastically.)

> > We made a similar (admittedly very difficult ...) design jump from 
> > oprofile to perf, and i can tell you from that experience that it's day 
> > and night, both in terms of development and in terms of the end result!
> 
> The user components for perf vs oprofile are _tiny_ projects compared to the 
> portions of QEMU that are actually used by KVM.

I know the size and scope of Qemu, i even hacked it - still my points remain. 
(my arguments are influenced and strengthened by that past hacking experience)

> Oh and you completely forgot SeaBIOS. KVM+QEMU rely on SeaBIOS too, so from 
> what you're saying we should pull that into the kernel git repository as 
> well. Never mind the fact that we share SeaBIOS with the coreboot project 
> which is very actively adding features to it that benefit us as well.....

SeaBIOS is in essence firmware, so it could simply be loaded as such.

Just look at the qemu source code - the BIOSes are .bin images in 
qemu/pc-bios/ imported externally in essence.

Moving qemu to tools/kvm/ would not change that much. The firmware could 
become part of /lib/firmware/*.bin.

( That would probably be a more intelligent approach to the BIOS image import 
  problem as well. )
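
[ Illustrative sketch, not from the original mail: pulling such an image out 
  of /lib/firmware/ via the kernel's firmware loader takes only a handful of 
  lines. The "kvm/bios.bin" name and the load_bios_image() helper below are 
  hypothetical:

    #include <linux/device.h>
    #include <linux/errno.h>
    #include <linux/firmware.h>
    #include <linux/string.h>
    #include <linux/types.h>

    /* Copy /lib/firmware/kvm/bios.bin into a caller-supplied buffer. */
    static int load_bios_image(struct device *dev, void *dest, size_t max)
    {
            const struct firmware *fw;
            int ret;

            ret = request_firmware(&fw, "kvm/bios.bin", dev);
            if (ret)
                    return ret;

            if (fw->size > max) {
                    release_firmware(fw);
                    return -ENOSPC;
            }

            memcpy(dest, fw->data, fw->size);
            release_firmware(fw);
            return 0;
    }
]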

> Sorry, but there are times when unification make sense, and there are times 
> where having a reasonably well designed split makes sense. KVM had problems 
> with QEMU in the past which resulted in the qemu-kvm branch of it, which 
> proved to be a major pain to deal with, but that is fortunately improving 
> and qemu-kvm should go away completely at some point.

The qemu-kvm branch is not similar to my proposal at all: it made KVM _more_ 
fragmented, not more unified. I.e. it was a move in the exact opposite 
direction, and I'd expect such a move to fail.

In fact the failure of qemu-kvm supports my point rather explicitly: it 
demonstrates that extra packages and split development are actively harmful.

I speak about this as a person who has done successful unifications of split 
codebases and in my judgement this move would be significantly beneficial to 
KVM.

You cannot validly reject this proposal with "It won't work", as it clearly 
has worked in other, comparable cases. You could only reject it with 
"I have tried it and it didn't work".

Think about it: a clean and hackable user-space component in tools/kvm/. It's 
very tempting :-)

	Ingo

^ permalink raw reply	[flat|nested] 390+ messages in thread

* Re: [RFC] Unify KVM kernel-space and user-space code into a single project
  2010-03-18  9:24                                         ` Alexander Graf
@ 2010-03-18 10:10                                           ` Ingo Molnar
  2010-03-18 10:21                                             ` Avi Kivity
  0 siblings, 1 reply; 390+ messages in thread
From: Ingo Molnar @ 2010-03-18 10:10 UTC (permalink / raw)
  To: Alexander Graf
  Cc: Avi Kivity, Anthony Liguori, Zhang, Yanmin, Peter Zijlstra,
	Sheng Yang, linux-kernel, kvm, Marcelo Tosatti, Joerg Roedel,
	Jes Sorensen, Gleb Natapov, Zachary Amsden, ziteng.huang,
	Arnaldo Carvalho de Melo, Frédéric Weisbecker


* Alexander Graf <agraf@suse.de> wrote:

> 
> On 18.03.2010, at 09:56, Ingo Molnar wrote:
> 
> > 
> > * Avi Kivity <avi@redhat.com> wrote:
> > 
> >> On 03/17/2010 10:10 AM, Ingo Molnar wrote:
> >>> 
> >>>> It's about who owns the user interface.
> >>>> 
> >>>> If qemu owns the user interface, than we can satisfy this in a very 
> >>>> simple way by adding a perf monitor command.  If we have to support third 
> >>>> party tools, then it significantly complicates things.
> >>> 
> >>> Of course illogical modularization complicates things 'significantly'.
> >> 
> >> Who should own the user interface then?
> > 
> > If qemu was in tools/kvm/ then we wouldnt have such issues. A single patch (or 
> > series of patches) could modify tools/kvm/, arch/x86/kvm/, virt/ and 
> > tools/perf/.
> 
> It's not a 1:1 connection. There are more users of the KVM interface. To 
> name a few I'm aware of:
> 
> - Mac-on-Linux (PPC)
> - Dolphin (PPC)
> - Xenner (x86)
> - Kuli (s390)

There must be a misunderstanding here: tools/perf/ still has a clear userspace 
interface and ABI. There are external projects making use of it: sysprof and 
libpfm (and probably more I don't know about). Those projects are also 
contributing back.
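
[ Illustrative sketch, not from the original mail: this is roughly what an 
  external consumer of that ABI looks like - counting CPU cycles around a 
  piece of code via perf_event_open(), assuming a 2.6.32+ kernel that 
  provides <linux/perf_event.h>:

    #include <linux/perf_event.h>
    #include <stdio.h>
    #include <string.h>
    #include <sys/ioctl.h>
    #include <sys/syscall.h>
    #include <sys/types.h>
    #include <unistd.h>

    static long sys_perf_event_open(struct perf_event_attr *attr, pid_t pid,
                                    int cpu, int group_fd, unsigned long flags)
    {
        return syscall(__NR_perf_event_open, attr, pid, cpu, group_fd, flags);
    }

    int main(void)
    {
        struct perf_event_attr attr;
        long long count;
        int fd;

        memset(&attr, 0, sizeof(attr));
        attr.size = sizeof(attr);
        attr.type = PERF_TYPE_HARDWARE;
        attr.config = PERF_COUNT_HW_CPU_CYCLES;
        attr.disabled = 1;

        /* count cycles of the current task, on any CPU */
        fd = sys_perf_event_open(&attr, 0, -1, -1, 0);
        if (fd < 0) {
            perror("perf_event_open");
            return 1;
        }

        ioctl(fd, PERF_EVENT_IOC_RESET, 0);
        ioctl(fd, PERF_EVENT_IOC_ENABLE, 0);
        /* ... the workload to be measured goes here ... */
        ioctl(fd, PERF_EVENT_IOC_DISABLE, 0);

        read(fd, &count, sizeof(count));
        printf("cycles: %lld\n", count);
        close(fd);
        return 0;
    }
]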

Still it's _very_ useful to have a single reference implementation under 
tools/perf/ where we concentrate the best of the code. That is where we make 
sure that each new kernel feature is appropriately implemented in user-space 
as well, that the combination works well together and is releasable to users. 
That is what keeps us all honest: the latency of features is much lower, and 
there's no ping-pong of blame going on between the two components in case of 
bugs or in case of misfeatures.

Same goes for KVM+Qemu: it would be so much nicer to have a single, 
well-focused reference implementation under tools/kvm/ and have improvements 
flow into that code base.

That way KVM developers cannot just shrug "well, GUI suckage is a user-space 
problem" - like the answers i got in the KVM usability thread ...

The buck will stop here.

And if someone thinks he can do better, an external project can be started 
anytime. (It may even replace the upstream thing if it's better.)

> Having a clear userspace interface is the only viable solution there. And if 
> you're interested, look at my MOL enabling patch. It's less than 500 lines 
> of code.

Why do you suppose that what I propose is an "either/or" scenario?

It isn't. I just suggested that instead of letting core KVM fragment its limbs 
into an external entity, you put your name behind one good all-around solution 
and focus the development model into a single project.

I.e. do what KVM originally did in kernel space to begin with - and what made 
it so much better than Xen: single focus.

Learn from what KVM did so well in its initial years and apply the concept to 
the user-space components as well. The very same arguments that caused KVM 
to integrate into the upstream kernel (instead of being a separate project) 
are a valid basis to integrate the user-space components into tools/kvm/. Don't 
forget your roots, and don't assume all your design decisions were correct.

> The kernel/userspace interface really isn't the difficult part. Getting 
> device emulation working properly, easily and fast is.

The kernel/userspace ABI is not difficult at all. Getting device emulation 
working properly, easily and fast indeed is. And my experience is that it is 
not working properly nor quickly at the moment, at all. (see the 'KVM 
usability' thread)

Getting device emulation working properly often involves putting certain 
pieces that are currently done in Qemu into kernel-space. That kind of 
'movement of emulation technology' from user-space component into the 
kernel-space component [or back] would very clearly be helped if those two 
components were in the same repository.

And i have first-hand experience there: we had (and have) similar scenarios 
with tools/perf routinely. We did some aspects in user-space, then decided to 
do it in kernel-space. Sometimes we moved kernel bits to user-space. It was 
very easy and there were no package and version complications as it's a single 
project. Sometimes we even moved bits back and forth until we found the right 
balance.

Thanks,

	Ingo

^ permalink raw reply	[flat|nested] 390+ messages in thread

* Re: [RFC] Unify KVM kernel-space and user-space code into a single project
  2010-03-18  8:56                                       ` Ingo Molnar
  2010-03-18  9:24                                         ` Alexander Graf
@ 2010-03-18 10:12                                         ` Avi Kivity
  2010-03-18 10:28                                           ` Ingo Molnar
  2010-03-18 10:50                                           ` Ingo Molnar
  1 sibling, 2 replies; 390+ messages in thread
From: Avi Kivity @ 2010-03-18 10:12 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Anthony Liguori, Zhang, Yanmin, Peter Zijlstra, Sheng Yang,
	linux-kernel, kvm, Marcelo Tosatti, Joerg Roedel, Jes Sorensen,
	Gleb Natapov, Zachary Amsden, ziteng.huang,
	Arnaldo Carvalho de Melo, Frédéric Weisbecker

On 03/18/2010 10:56 AM, Ingo Molnar wrote:
> * Avi Kivity<avi@redhat.com>  wrote:
>
>    
>> On 03/17/2010 10:10 AM, Ingo Molnar wrote:
>>      
>>>        
>>>> It's about who owns the user interface.
>>>>
>>>> If qemu owns the user interface, than we can satisfy this in a very
>>>> simple way by adding a perf monitor command.  If we have to support third
>>>> party tools, then it significantly complicates things.
>>>>          
>>> Of course illogical modularization complicates things 'significantly'.
>>>        
>> Who should own the user interface then?
>>      
> If qemu was in tools/kvm/ then we wouldnt have such issues. A single patch (or
> series of patches) could modify tools/kvm/, arch/x86/kvm/, virt/ and
> tools/perf/.
>    

We would have exactly the same issues, only they would be in a single 
repository.  The only difference is that we could ignore potential 
alternatives to qemu, libvirt, and RHEV-M.  But that's not how kernel 
ABIs are developed, we try to make them general, not suited to just one 
consumer that happens to be close to our heart.

> Numerous times did we have patches to kernel/perf_event.c that fixed some
> detail, also accompanied by a tools/perf/ patch fixing another detail. Having
> a single 'culture of contribution' is a powerful way to develop.
>    

In fact kvm started out in a single repo, and it certainly made it easy 
to bring it up in baby steps.  But we've long outgrown that.  Maybe the 
difference is that perf is still new and thus needs tight cooperation.  
If/when perf gains a real GUI, I doubt more than 1% of the patches will 
touch both kernel and userspace.

> It turns out kernel developers can be pretty good user-space developers as
> well and user-space developers can be pretty good kernel developers as well.
> Some like to do both - as long as it's all within a single project.
>    

Very childish of them.  If someone wants to contribute to a userspace 
project, they can swallow their pride and send patches to a non-kernel 
mailing list and repository.

> The moment any change (be it as trivial as fixing a GUI detail or as complex
> as a new feature) involves two or more packages, development speed slows down
> to a crawl - while the complexity of the change might be very low!
>    

Why is that?

If the maintainers of all packages are cooperative and responsive, then 
the patches will get accepted quickly.  If they aren't, development will 
be slow.  It isn't any different from contributing to two unrelated 
kernel subsystems (which are in fact in different repositories until the 
next merge window).

> Also, there's the harmful process that people start categorizing themselves
> into 'I am a kernel developer' and 'I am a user space programmer' stereotypes,
> which limits the scope of contributions artificially.
>    

You're encouraging this with your proposal.  You're basically using the 
glory of kernel development to attract people to userspace.

>>> Fast forward to 2010. The kernel side of KVM is maximum goodness - by far
>>> the worst-quality remaining aspects of KVM are precisely in areas that you
>>> mention: 'if we have to support third party tools, then it significantly
>>> complicates things'. You kept Qemu as an external 'third party' entity to
>>> KVM, and KVM is clearly hurting from that - just see the recent KVM
>>> usability thread for examples about suckage.
>>>        
>> Any qemu usability problems are because developers (or their employers) are
>> not interested in fixing them, not because of the repository location.  Most
>> kvm developer interest is in server-side deployment (even for desktop
>> guests), so there is limited effort in implementing a virtualbox-style GUI.
>>      
> The same has been said of oprofile as well: 'it somewhat sucks because we are
> too server centric', 'nobody is interested in good usability and oprofile is
> fine for the enterprises'. Ironically, the same has been said of Xen usability
> as well, up to the point KVM came around.
>
> What was the core of the problem was a bad design and a split kernel-side
> user-side tool landscape.
>    

I can accept the bad design (not knowing any of the details), but how 
can the kernel/user split affect usability?

> In fact i think saying that 'our developers only care about the server' is
> borderline dishonest, when at the same time you are making it doubly sure (by
> inaction) that it stays so: by leaving an artificial package wall between
> kernel-side KVM and user-side KVM and not integrating the two technologies.
>    

The wall is maybe four nanometers high.  Please be serious.  If someone 
wants to work on qemu usability all they have to do is to clone the 
repository and start sending patches to qemu-devel@.  What's gained by 
putting it in the kernel repository?  You're saving a minute's worth of 
clone, and that only for people who already happen to be kernel developers.

> You'll never know what heights you could achieve if you leave that wall there
> ...
>    

I truly don't know.  What highly usable GUIs were developed in the kernel?

> Furthermore, what should be realized is that bad usability hurts "server
> features" just as much. Most of the day-to-day testing is done on the desktop
> by desktop oriented testers/developers. _Not_ by enterprise shops - they tend
> to see the code years down the line to begin with ...
>
> Yes, a particular feature might be server oriented, but a good portion of our
> testing is on the desktop and everyone is hurting from bad usability and this
> puts limits on contribution efficiency.
>    

I'm not saying that improved usability isn't a good thing, but time 
spent on improving the GUI is time not spent on the features that we 
really want.

Desktop oriented users also rarely test 16 vcpu guests with tons of RAM 
exercising 10Gb NICs and a SAN.  Instead they care about graphics 
performance for 2vcpu/1GB guests.

> As the patch posted in _this very thread demonstrates it_, it is doubly more
> difficult to contribute a joint KVM+Qemu feature, because it's two separate
> code bases, two contribution guidelines, two release schedules. While to the
> user it really is just one and the same thing. It should be so for the
> developer as well.
>    

It's hard to contribute a patch that goes against the architecture of 
the system, where kvm deals with cpu virtualization, qemu (or 
theoretically another tool) manages a guest, and libvirt (or another 
tool) manages the host.  You want a list of guests to be provided by 
qemu or the kernel, and that simply isn't how the system works.

> Put in another way: KVM's current split design is making it easy to contribute
> server features (because the kernel side is clean and cool), but also makes it
> artificially hard to contribute desktop features: because the tooling side
> (Qemu) is 'just another package', is separated by a package and maintenance
> wall


Most server oriented patches in qemu/kvm have gone into qemu, not kvm 
(simply because it sees many more patches overall).  It isn't hard to 
contribute to 'just another package'; I have 1700 packages installed on 
my desktop and only one of them is a kernel.

Anyway your arguments apply equally well to gedit.

> and is made somewhat uncool by a (as some KVM developers have pointed out
> in this thread) quirky codebase.
>    

The qemu codebase is in fact quirky, but cp won't solve it.  Only long 
patchsets to qemu-devel@.

> (the rest of your points are really a function of this fundamental
> disagreement)
>    

I disagree.

-- 
error compiling committee.c: too many arguments to function


^ permalink raw reply	[flat|nested] 390+ messages in thread

* Re: [RFC] Unify KVM kernel-space and user-space code into a single project
  2010-03-18 10:10                                           ` Ingo Molnar
@ 2010-03-18 10:21                                             ` Avi Kivity
  2010-03-18 11:35                                               ` Ingo Molnar
  0 siblings, 1 reply; 390+ messages in thread
From: Avi Kivity @ 2010-03-18 10:21 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Alexander Graf, Anthony Liguori, Zhang, Yanmin, Peter Zijlstra,
	Sheng Yang, linux-kernel, kvm, Marcelo Tosatti, Joerg Roedel,
	Jes Sorensen, Gleb Natapov, Zachary Amsden, ziteng.huang,
	Arnaldo Carvalho de Melo, Frédéric Weisbecker

On 03/18/2010 12:10 PM, Ingo Molnar wrote:
>
>> It's not a 1:1 connection. There are more users of the KVM interface. To
>> name a few I'm aware of:
>>
>> - Mac-on-Linux (PPC)
>> - Dolphin (PPC)
>> - Xenner (x86)
>> - Kuli (s390)
>>      
> There must be a misunderstanding here: tools/perf/ still has a clear userspace
> interface and ABI. There are external projects making use of it: sysprof and
> libpfm (and probably more I don't know about). Those projects are also
> contributing back.
>    

So it seems it is possible to scale the package wall.

> Still it's _very_ useful to have a single reference implementation under
> tools/perf/ where we concentrate the best of the code. That is where we make
> sure that each new kernel feature is appropriately implemented in user-space
> as well, that the combination works well together and is releasable to users.
> That is what keeps us all honest: the latency of features is much lower, and
> there's no ping-pong of blame going on between the two components in case of
> bugs or in case of misfeatures.
>    

That would make sense for a truly minimal userspace for kvm: we once had 
a tool called kvmctl which was used to run tests (since folded into 
qemu).  It didn't contain a GUI and was unable to run a general purpose 
guest.  It was a few hundred lines of code, and indeed patches to kvmctl 
had a much closer correspondence to patches with kvm (though still low, 
as most kvm patches don't modify the ABI).
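
[ Illustrative sketch, not from the original mail: a kvmctl-style minimal 
  launcher really is tiny. The /dev/kvm ioctl sequence below runs a guest 
  whose only instruction is hlt (x86; error handling omitted for brevity):

    #include <fcntl.h>
    #include <linux/kvm.h>
    #include <stdint.h>
    #include <stdio.h>
    #include <string.h>
    #include <sys/ioctl.h>
    #include <sys/mman.h>

    int main(void)
    {
        int kvm = open("/dev/kvm", O_RDWR);
        int vm = ioctl(kvm, KVM_CREATE_VM, 0);

        /* one page of guest RAM at guest physical address 0 */
        uint8_t *mem = mmap(NULL, 4096, PROT_READ | PROT_WRITE,
                            MAP_SHARED | MAP_ANONYMOUS, -1, 0);
        mem[0] = 0xf4;                          /* hlt */

        struct kvm_userspace_memory_region region = {
            .slot            = 0,
            .guest_phys_addr = 0,
            .memory_size     = 4096,
            .userspace_addr  = (uintptr_t)mem,
        };
        ioctl(vm, KVM_SET_USER_MEMORY_REGION, &region);

        int vcpu = ioctl(vm, KVM_CREATE_VCPU, 0);
        int size = ioctl(kvm, KVM_GET_VCPU_MMAP_SIZE, 0);
        struct kvm_run *run = mmap(NULL, size, PROT_READ | PROT_WRITE,
                                   MAP_SHARED, vcpu, 0);

        /* flat real mode, start executing at guest address 0 */
        struct kvm_sregs sregs;
        ioctl(vcpu, KVM_GET_SREGS, &sregs);
        sregs.cs.base = 0;
        sregs.cs.selector = 0;
        ioctl(vcpu, KVM_SET_SREGS, &sregs);

        struct kvm_regs regs;
        memset(&regs, 0, sizeof(regs));
        regs.rip = 0;
        regs.rflags = 0x2;
        ioctl(vcpu, KVM_SET_REGS, &regs);

        ioctl(vcpu, KVM_RUN, 0);
        printf("exit_reason = %u\n", run->exit_reason);  /* KVM_EXIT_HLT */
        return 0;
    }
]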

> Same goes for KVM+Qemu: it would be so much nicer to have a single,
> well-focused reference implementation under tools/kvm/ and have improvements
> flow into that code base.
>
> That way KVM developers cannot just shrug "well, GUI suckage is a user-space
> problem" - like the answers i got in the KVM usability thread ...
>
> The buck will stop here.
>    

Suppose we copy qemu tomorrow into tools/.  All the problems will be 
copied with it.  Someone still has to write patches to fix them.  Who 
will it be?

>> The kernel/userspace interface really isn't the difficult part. Getting
>> device emulation working properly, easily and fast is.
>>      
> The kernel/userspace ABI is not difficult at all. Getting device emulation
> working properly, easily and fast indeed is. And my experience is that it is
> not working properly nor quickly at the moment, at all. (see the 'KVM
> usability' thread)
>
> Getting device emulation working properly often involves putting certain
> pieces that are currently done in Qemu into kernel-space. That kind of
> 'movement of emulation technology' from user-space component into the
> kernel-space component [or back] would very clearly be helped if those two
> components were in the same repository.
>    

Moving emulation into the kernel is indeed a problem.  Not because it's 
difficult, but because it indicates that the interfaces exposed to 
userspace are insufficient to obtain good performance.  We had that with 
vhost-net and I'm afraid we'll have that with vhost-blk.

> And i have first-hand experience there: we had (and have) similar scenarios
> with tools/perf routinely. We did some aspects in user-space, then decided to
> do it in kernel-space. Sometimes we moved kernel bits to user-space. It was
> very easy and there were no package and version complications as it's a single
> project. Sometimes we even moved bits back and forth until we found the right
> balance.
>    

That's reasonable in the first iterations of a project.

-- 
error compiling committee.c: too many arguments to function


^ permalink raw reply	[flat|nested] 390+ messages in thread

* Re: [RFC] Unify KVM kernel-space and user-space code into a single project
  2010-03-18 10:12                                         ` Avi Kivity
@ 2010-03-18 10:28                                           ` Ingo Molnar
  2010-03-18 10:50                                           ` Ingo Molnar
  1 sibling, 0 replies; 390+ messages in thread
From: Ingo Molnar @ 2010-03-18 10:28 UTC (permalink / raw)
  To: Avi Kivity
  Cc: Anthony Liguori, Zhang, Yanmin, Peter Zijlstra, Sheng Yang,
	linux-kernel, kvm, Marcelo Tosatti, Joerg Roedel, Jes Sorensen,
	Gleb Natapov, Zachary Amsden, ziteng.huang,
	Arnaldo Carvalho de Melo, Frédéric Weisbecker


* Avi Kivity <avi@redhat.com> wrote:

> On 03/18/2010 10:56 AM, Ingo Molnar wrote:
> >* Avi Kivity<avi@redhat.com>  wrote:
> >
> >>On 03/17/2010 10:10 AM, Ingo Molnar wrote:
> >>>>It's about who owns the user interface.
> >>>>
> >>>>If qemu owns the user interface, than we can satisfy this in a very
> >>>>simple way by adding a perf monitor command.  If we have to support third
> >>>>party tools, then it significantly complicates things.
> >>>Of course illogical modularization complicates things 'significantly'.
> >>Who should own the user interface then?
> >If qemu was in tools/kvm/ then we wouldnt have such issues. A single patch (or
> >series of patches) could modify tools/kvm/, arch/x86/kvm/, virt/ and
> >tools/perf/.
> 
> We would have exactly the same issues, only they would be in a single 
> repository.  The only difference is that we could ignore potential 
> alternatives to qemu, libvirt, and RHEV-M.  But that's not how kernel ABIs 
> are developed, we try to make them general, not suited to just one consumer 
> that happens to be close to our heart.

Not at all - as I replied in a previous mail, tools/perf/ still has a clear 
userspace interface and ABI, and external projects are making use of it.

So there's no problem with the ABI at all.

In fact our experience has been the opposite: the perf ABI is markedly better 
_because_ there's an immediate consumer of it in the form of tools/perf/. It 
gets tested better and external projects can get their ABI tweaks in as well 
and can provide a reference implementation for tools/perf. This has happened a 
couple of times. It's a win-win scenario.

So the exact opposite of what you suggest is happening in practice.

Thanks,

	Ingo

^ permalink raw reply	[flat|nested] 390+ messages in thread

* Re: [RFC] Unify KVM kernel-space and user-space code into a single project
  2010-03-18  9:22                                       ` Ingo Molnar
@ 2010-03-18 10:32                                         ` Avi Kivity
  2010-03-18 11:19                                           ` Ingo Molnar
  2010-03-18 18:20                                           ` Frederic Weisbecker
  0 siblings, 2 replies; 390+ messages in thread
From: Avi Kivity @ 2010-03-18 10:32 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Anthony Liguori, Zhang, Yanmin, Peter Zijlstra, Sheng Yang,
	linux-kernel, kvm, Marcelo Tosatti, Joerg Roedel, Jes Sorensen,
	Gleb Natapov, Zachary Amsden, ziteng.huang,
	Arnaldo Carvalho de Melo, Frédéric Weisbecker

On 03/18/2010 11:22 AM, Ingo Molnar wrote:
> * Avi Kivity<avi@redhat.com>  wrote:
>
>    
>>>   - move a clean (and minimal) version of the Qemu code base to tools/kvm/,
>>>     in the upstream kernel repo, and work on that from that point on.
>>>        
>> I'll ignore the repository location which should be immaterial to a serious
>> developer and concentrate on the 'clean and minimal' aspect, since it has
>> some merit.  [...]
>>      
> To the contrary, experience shows that repository location, and in particular
> a shared repository for closely related bits is very much material!
>
> It matters because when there are two separate projects, even a "serious
> developer" is finding it double and triple difficult to contribute even
> trivial changes.
>
> It becomes literally a nightmare if you have to touch 3 packages: kernel, a
> library and an app codebase. It takes _forever_ to get anything useful done.
>    

You can't be serious.  I find that the difficulty in contributing a 
patch has mostly to do with writing the patch, and less with figuring 
out which email address to send it to.

> Also, 'focus on a single thing' is a very basic aspect of humans, especially
> those who do computer programming. Working on two code bases in two
> repositories at once can be very challenging physically and psychically.
>    

Indeed, working simultaneously on two different projects is difficult.  
I usually work for a while on one, and then 'cd', physically and 
psychically, to the other.  Then switch back.  Sort of like the 
scheduler on a uniprocessor machine.

> So what i've seen is that OSS programmers tend to pick a side, pretty much
> randomly, and then rationalize it in hindsight why they prefer that side ;-)
>
> Most of them become either a kernel developer or a user-space package
> developer - and then they specialize on that field and shy away from changes
> that involve both. It's a basic human thing to avoid the hassle that comes
> with multi-package changes. (One really has to be outright stupid, fanatic or
> desperate to even attempt such changes these days - such are the difficulties
> for a comparatively low return.)
>    

We have a large number of such stupid, fanatic, desperate developers in 
the qemu and kvm communities.

> The solution is to tear down such artificial walls of contribution where
> possible. And tearing down the wall between KVM and qemu-kvm seems very much
> possible and the advantages would be numerous.
>
> Unless by "serious developer" you meant: "developer willing to [or forced to]
> waste time and effort on illogically structured technology".
>    

By "serious developer" I mean

  - someone who is interested in contributing, not in getting their name 
into the kernel commits list
  - someone who is willing to read the wiki page and find out where the 
repository and mailing list for a project is
  - someone who will spend enough time on the project so that the time 
to clone two repositories will not be a factor in their contributions
  - someone who will work on the uncool stuff like fixing bugs and 
providing interfaces to other tools

>> [...]
>>
>> Do you really think the echo'n'cat school of usability wants to write a GUI?
>> In linux-2.6.git?
>>      
> Then you'll be surprised to hear that it's happening as we speak and the
> commits are there in linux-2.6.git. Both a TUI and GUI is in the works.
>
> Furthermore, the numbers show that half of the usability fixes to tools/perf/
> came not from regular perf contributors but from random kernel developers and
> testers who when they build the latest kernel and try out perf at the same
> time (it's very easy because you already have it in the kernel repository - no
> separate download, no installation, etc. necessary).
>
> I had literally zero such contributions when (the precursor to) 'perf' was
> still a separate user-space project.
>
> You could have the same effect for Qemu: the latest bits in tools/kvm/ would
> be built by regular kernel testers and developers. The integration benefits
> dont just extend to developers, a unified project is vastly easier to test as
> well.
>
>    

Let's wait and see then.  If the tools/perf/ experience has really good 
results, we can reconsider this at a later date.

-- 
error compiling committee.c: too many arguments to function


^ permalink raw reply	[flat|nested] 390+ messages in thread

* Re: [RFC] Unify KVM kernel-space and user-space code into a single project
  2010-03-18  9:54                                       ` Ingo Molnar
@ 2010-03-18 10:40                                         ` Jes Sorensen
  2010-03-18 10:58                                           ` Ingo Molnar
  0 siblings, 1 reply; 390+ messages in thread
From: Jes Sorensen @ 2010-03-18 10:40 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Anthony Liguori, Avi Kivity, Zhang, Yanmin, Peter Zijlstra,
	Sheng Yang, linux-kernel, kvm, Marcelo Tosatti, Joerg Roedel,
	Gleb Natapov, Zachary Amsden, ziteng.huang,
	Arnaldo Carvalho de Melo, Frédéric Weisbecker

On 03/18/10 10:54, Ingo Molnar wrote:
> * Jes Sorensen<Jes.Sorensen@redhat.com>  wrote:
[...]
>>
>> At my previous employer we ended up dropping all Xen efforts exactly because
>> it was like maintaining two separate operating system kernels. The key to
>> KVM is that once you have Linux, you practically have KVM as well.
>
> Yes. Please realize that what is behind it is a strikingly simple argument:
>
>   "Once you have a single project to develop and maintain all is much better."

That's a very glorified statement but it's not reality, sorry. You can do
that with something like perf because it's so small and development of
perf is limited to a very small group of developers.

>> [...] However, there is far more to it than just a couple of ioctls, for
>> example the stack of reverse device-drivers is a pretty significant code
>> base, rewriting that and maintaining it is not a trivial task. It is
>> certainly my belief that the benefit we get from sharing that with QEMU by
>> far outweighs the cost of forking it and keeping our own fork in the kernel
>> tree. In fact it would result in exactly the same problems I mentioned above
>> wrt Xen.
>
I do not suggest forking Qemu at all; I suggest using the most natural
development model for the KVM+Qemu shared project: a single repository.

If you are not suggesting that we fork QEMU, what are you suggesting then?
You don't seriously expect that the KVM community will be able to
mandate that the QEMU community switch to the Linux kernel repository?
That would be like telling the openssl developers that they should merge
with glibc and start working out of the glibc tree.

What you are suggesting is *only* going to happen if we fork QEMU; there
is zero chance of moving the main QEMU repository into the Linux kernel
tree. And trust me, you don't want Linus having to deal with patches
for tcg or embedded board emulation.

>> With this you have just thrown away all the benefits of having the QEMU
>> repository shared with other developers who will actively fix bugs in
>> components we do care about for KVM.
>
> Not if it's a unified project.

You still haven't explained how you expect to create a unified KVM+QEMU
project without forking the existing QEMU.

>>>   - encourage kernel-space and user-space KVM developers to work on both
>>>     user-space and kernel-space bits as a single unit. It's one project and
>>>     a single experience to the user.
>>
>> This is already happening and a total non issue.
>
> My experience as an external observer of the end result contradicts this.

What I have seen you complain about here is the lack of a good end-user
GUI for KVM. However, that is a different thing. So far no vendor has put
significant effort into it, but that is nothing new in Linux. We have a
great kernel, but our user applications are still lacking. We have 217
CD players for GNOME, but we have no usable calendaring application.

A good GUI for virtualization is a big task, and whoever designs it will
base their design upon their preferences for what's important. A lot of
spare-time developers would clearly care most about a graphical installation
and fancy icons to click on, whereas server users would be much more
interested in automation and remote access to the systems. For a good
example of an incomplete solution, try installing Fedora over a serial
line; you cannot do half the things without launching VNC :( Getting a
comprehensive solution for this that would satisfy the bulk of the users
would be a huge chunk of code in the kernel tree. Imagine the screaming
that would cause. How often have we had moaning from x86 users who
wanted to rip out all the non-x86 code to reduce the size of the tarball?

> Seemingly trivial usability changes to the KVM+Qemu combo often do not get
> done, because they involve cross-discipline changes.

Which trivial usability changes?

>>>   - [ and probably libvirt should go there too ]
>>
>> Now that would be interesting, next we'll have to include things like libxml
>> in the kernel git tree as well, to make sure libvirt doesn't get out of sync
>> with the version supplied by your distribution vendor.
>
> The way we have gone about this in tools/perf/ is similar to the route picked
> by Git: we only use very lowlevel libraries available everywhere, and we
> provide optional wrappers to the rest.

Did you ever look at what libvirt actually does and what it offers? Or
how about the various libraries used by QEMU to offer things like VNC
support or X support?

Again this works fine for something like perf where the primary
display is text mode.

>> So far your argument would justify pulling all of gdb into the kernel git
>> tree as well, to support the kgdb efforts, or gcc so we can get rid of the
>> gcc version quirks in the kernel header files, e2fsprogs and equivalent for
>> _all_ file systems included in the kernel so we can make sure our fs tools
>> never get out of sync with whats supported in the kernel......
>
> gdb and gcc are clearly extrinsic to the kernel, so why would we move them
> there?

gdb should go with kgdb which goes with the kernel to keep it in sync.
If you want to be consistent in your argument, you have to go all the
way.

> I was talking about tools that are closely related to the kernel - where much
> of the development and actual use is in combination with the Linux kernel.

Well the file system tools would obviously have to go into the kernel
then so appropriate binaries can be distributed to match the kernel.

> 90%+ of the Qemu use cases are combined with Linux. (Yes, I know that you can
> run Qemu without KVM, and no, I don't think it matters in the grand scheme of
> things; most investment into Qemu comes from the KVM angle these days. In
> particular, it certainly does not justify handicapping future KVM evolution so
> drastically.)

90+%? You've got to be kidding. You clearly have no idea just how much it's
used for running embedded emulators on non-Linux platforms. You should have
seen the noise it made when I added C99 initializers to certain structs,
because it broke builds using very old GCC versions on BeOS. Linux only?
Not a chance. Try subscribing to qemu-devel and you'll see a list that
is only overtaken by a few lists like lkml in terms of daily traffic.

>> Oh and you completely forgot SeaBIOS. KVM+QEMU rely on SeaBIOS too, so from
>> what you're saying we should pull that into the kernel git repository as
>> well. Never mind the fact that we share SeaBIOS with the coreboot project
>> which is very actively adding features to it that benefit us as well.....
>
> SeaBIOS is in essence firmware, so it could simply be loaded as such.
>
> Just look at the qemu source code - the BIOSes are .bin images in
> qemu/pc-bios/ imported externally in essence.

Ehm no, QEMU now pulls in SeaBIOS to build it. And there are a lot of
changes that require modification in SeaBIOS to match changes to QEMU.

> The qemu-kvm branch is not similar to my proposal at all: it made KVM _more_
> fragmented, not more unified. I.e. it was a move in the exact opposite
> direction, and I'd expect such a move to fail.
>
> In fact the failure of qemu-kvm supports my point rather explicitly: it
> demonstrates that extra packages and split development are actively harmful.

Ehm, it showed what happens when you fork QEMU to modify it primarily for
your own project, i.e. KVM. You are suggesting we fork QEMU for the
benefit of KVM, and exactly the same thing will happen.

I know you state that you are not suggesting we fork it, but as I showed
above, pulling QEMU into the kernel tree can only happen as a fork.
There is no point pretending otherwise.

> I speak about this as a person who has done successful unifications of split
> codebases and in my judgement this move would be significantly beneficial to
> KVM.
>
> You cannot validly reject this proposal with "It won't work", as it clearly
> has worked in other, comparable cases. You could only reject it with
> "I have tried it and it didn't work".
>
> Think about it: a clean and hackable user-space component in tools/kvm/. It's
> very tempting :-)

I say this based on my hacking experience, my experience with the
kernel, the QEMU base, SeaBIOS and merging projects. Yes it can be done,
but the cost is much higher than the gain.

Cheers,
Jes

^ permalink raw reply	[flat|nested] 390+ messages in thread

* Re: [RFC] Unify KVM kernel-space and user-space code into a single project
  2010-03-18 10:12                                         ` Avi Kivity
  2010-03-18 10:28                                           ` Ingo Molnar
@ 2010-03-18 10:50                                           ` Ingo Molnar
  2010-03-18 11:30                                             ` Avi Kivity
  2010-03-18 21:02                                             ` Zachary Amsden
  1 sibling, 2 replies; 390+ messages in thread
From: Ingo Molnar @ 2010-03-18 10:50 UTC (permalink / raw)
  To: Avi Kivity
  Cc: Anthony Liguori, Zhang, Yanmin, Peter Zijlstra, Sheng Yang,
	linux-kernel, kvm, Marcelo Tosatti, Joerg Roedel, Jes Sorensen,
	Gleb Natapov, Zachary Amsden, ziteng.huang,
	Arnaldo Carvalho de Melo, Frédéric Weisbecker


* Avi Kivity <avi@redhat.com> wrote:

> > The moment any change (be it as trivial as fixing a GUI detail or as 
> > complex as a new feature) involves two or more packages, development speed 
> > slows down to a crawl - while the complexity of the change might be very 
> > low!
> 
> Why is that?

It's very simple: because the contribution latencies and overhead compound, 
almost inevitably.

If you ever tried to implement a combo GCC+glibc+kernel feature you'll know 
...

Even with the best-run projects in existence it takes forever and is very 
painful - and here i talk about first hand experience over many years.

> If the maintainers of all packages are cooperative and responsive, then the 
> patches will get accepted quickly.  If they aren't, development will be 
> slow. [...]

I'm afraid practice is different from the rosy ideal you paint there. Even 
with assumed 'perfect projects' there's always random differences between 
projects, causing doubled (tripled) overhead and compounded up overhead:

 - random differences in release schedules

 - random differences in contribution guidelines

 - random differences in coding style

> [...] It isn't any different from contributing to two unrelated kernel 
> subsystems (which are in fact in different repositories until the next merge 
> window).

You mention a perfect example: contributing to multiple kernel subsystems. Even 
_that_ is very noticeably harder than contributing to a single subsystem - due 
to the inevitable bureaucratic overhead, due to different development trees, 
due to different merge criteria.

So you are underlining my point (perhaps without intending to): treating 
closely related bits of technology as a single project is much better.

Obviously arch/x86/kvm/, virt/ and tools/kvm/ should live in a single 
development repository (perhaps micro-differentiated by a few topical 
branches), for exactly those reasons you mention.

Just like tools/perf/ and kernel/perf_event.c and arch/*/kernel/perf*.c are 
treated as a single project.

[ Note: we actually started from a 'split' design [almost everyone picks that, 
  because of this false 'kernel space bits must be separate from user space 
  bits' myth] where the user-space component was a separate code base and 
  unified it later on as the project progressed.

  Trust me, the practical benefits of the unified approach are enormous to 
  developers and to users alike, and there was no looking back once we made 
  the switch. ]

Also, I don't really try to 'convince' you here - you made your position very 
clear early on, and despite many unopposed technical arguments I made, the 
positions seem to have hardened and I expect they won't change, no matter what 
arguments I bring. It's a pity, but hey, I'm just an observer here really - 
it's the rest of _your_ life this all impacts.

I just wanted to point out the root cause of KVM's usability problems as I see 
it - just like I was pointing out the mortal Xen design deficiencies back when 
I was backing KVM strongly, four years ago. Back then everyone was saying that 
I was crazy, that we were stuck with Xen forever, and that while KVM was nice 
it had no chance.

Just because you got the kernel bits of KVM right a few years ago does not 
mean you cannot mess up other design aspects, and sometimes badly so ;-) 
Historically I messed up more than half of all first-gut-feeling technical 
design decisions I made, so I had to correct course many, many times.

I hope you are still keeping an open mind about it all and don't think that, 
because the project was split for 4 years (through no fault of your own, simply 
out of necessity), it should be split forever ...

arch/x86 was split for a much longer period than that.

Circumstances have changed. Most Qemu users/contributions are now coming from 
the KVM angle, so please simply start thinking about the next level of 
evolution.

Thanks,

	Ingo

^ permalink raw reply	[flat|nested] 390+ messages in thread

* Re: [RFC] Unify KVM kernel-space and user-space code into a single project
  2010-03-18 10:40                                         ` Jes Sorensen
@ 2010-03-18 10:58                                           ` Ingo Molnar
  2010-03-18 13:23                                             ` Jes Sorensen
  0 siblings, 1 reply; 390+ messages in thread
From: Ingo Molnar @ 2010-03-18 10:58 UTC (permalink / raw)
  To: Jes Sorensen
  Cc: Anthony Liguori, Avi Kivity, Zhang, Yanmin, Peter Zijlstra,
	Sheng Yang, linux-kernel, kvm, Marcelo Tosatti, Joerg Roedel,
	Gleb Natapov, Zachary Amsden, ziteng.huang,
	Arnaldo Carvalho de Melo, Frédéric Weisbecker


* Jes Sorensen <Jes.Sorensen@redhat.com> wrote:

> On 03/18/10 10:54, Ingo Molnar wrote:
> >* Jes Sorensen<Jes.Sorensen@redhat.com>  wrote:
> [...]
> >>
> >>At my previous employer we ended up dropping all Xen efforts exactly because
> >>it was like maintaining two separate operating system kernels. The key to
> >>KVM is that once you have Linux, you practically have KVM as well.
> >
> >Yes. Please realize that what is behind it is a strikingly simple argument:
> >
> >  "Once you have a single project to develop and maintain all is much better."
> 
> That's a very glorified statement but it's not reality, sorry. You can do 
> that with something like perf because it's so small and development of perf 
> is limited to a very small group of developers.

I was not talking about just perf: I am also talking about the arch/x86/ 
unification, which is 200+ KLOC of highly non-trivial kernel code with hundreds 
of contributors and 8000+ commits in the past two years.

Also, it applies to perf as well: people said exactly that a year ago: 'perf 
has it easy to be clean as it is small; once it gets as large as the Oprofile 
tooling, it will be in the same messy situation'.

Today perf has more features than Oprofile, has a larger and more complex code 
base, has more contributors, and no, it's not in the same messy situation at 
all.

So whatever you think of large, unified projects, you are quite clearly 
mistaken. I have carried out and maintained two different kinds of 
unifications, and the experience was very similar: both developers and users 
(and maintainers) are much better off.

	Ingo

^ permalink raw reply	[flat|nested] 390+ messages in thread

* Re: [RFC] Unify KVM kernel-space and user-space code into a single project
  2010-03-18 10:32                                         ` Avi Kivity
@ 2010-03-18 11:19                                           ` Ingo Molnar
  2010-03-18 18:20                                           ` Frederic Weisbecker
  1 sibling, 0 replies; 390+ messages in thread
From: Ingo Molnar @ 2010-03-18 11:19 UTC (permalink / raw)
  To: Avi Kivity
  Cc: Anthony Liguori, Zhang, Yanmin, Peter Zijlstra, Sheng Yang,
	linux-kernel, kvm, Marcelo Tosatti, Joerg Roedel, Jes Sorensen,
	Gleb Natapov, Zachary Amsden, ziteng.huang,
	Arnaldo Carvalho de Melo, Frédéric Weisbecker


* Avi Kivity <avi@redhat.com> wrote:

> On 03/18/2010 11:22 AM, Ingo Molnar wrote:
> >* Avi Kivity<avi@redhat.com>  wrote:
> >
> >>>  - move a clean (and minimal) version of the Qemu code base to tools/kvm/,
> >>>    in the upstream kernel repo, and work on that from that point on.
> >>I'll ignore the repository location which should be immaterial to a serious
> >>developer and concentrate on the 'clean and minimal' aspect, since it has
> >>some merit.  [...]
> >
> > To the contrary, experience shows that repository location, and in 
> > particular a shared repository for closely related bits is very much 
> > material!
> >
> > It matters because when there are two separate projects, even a "serious 
> > developer" is finding it double and triple difficult to contribute even 
> > trivial changes.
> >
> > It becomes literally a nightmare if you have to touch 3 packages: kernel, 
> > a library and an app codebase. It takes _forever_ to get anything useful 
> > done.
> 
> You can't be serious.  I find that the difficulty in contributing a patch 
> has mostly to do with writing the patch, and less with figuring out which 
> email address to send it to.

My own experience, and that of everyone I've talked to about such topics 
(developers and distro people alike), says the exact opposite: it's much 
harder to contribute features to multiple packages than to a single project.

kernel+library+app features take forever to propagate, there's constant 
fear of version friction, productization deadlines are uncertain, and ABI 
mess-ups are frequent as well due to disjoint testing. Also, each component has 
effective veto power: if the proposed API or approach is opposed or changed 
at a later stage, then that affects (sometimes already committed) changes. If 
you've ever done it you'll know how tedious it is.

This very thread and recent threads about KVM usability demonstrate the same 
complications.

Thanks,

	Ingo

^ permalink raw reply	[flat|nested] 390+ messages in thread

* Re: [RFC] Unify KVM kernel-space and user-space code into a single project
  2010-03-18 10:50                                           ` Ingo Molnar
@ 2010-03-18 11:30                                             ` Avi Kivity
  2010-03-18 11:48                                               ` Ingo Molnar
  2010-03-18 21:02                                             ` Zachary Amsden
  1 sibling, 1 reply; 390+ messages in thread
From: Avi Kivity @ 2010-03-18 11:30 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Anthony Liguori, Zhang, Yanmin, Peter Zijlstra, Sheng Yang,
	linux-kernel, kvm, Marcelo Tosatti, Joerg Roedel, Jes Sorensen,
	Gleb Natapov, Zachary Amsden, ziteng.huang,
	Arnaldo Carvalho de Melo, Frédéric Weisbecker

On 03/18/2010 12:50 PM, Ingo Molnar wrote:
> * Avi Kivity<avi@redhat.com>  wrote:
>
>    
>>> The moment any change (be it as trivial as fixing a GUI detail or as
>>> complex as a new feature) involves two or more packages, development speed
>>> slows down to a crawl - while the complexity of the change might be very
>>> low!
>>>        
>> Why is that?
>>      
> It's very simple: because the contribution latencies and overhead compound,
> almost inevitably.
>    

It's not inevitable: if the projects are badly run you'll have high 
latencies, but projects don't have to be badly run.

> If you ever tried to implement a combo GCC+glibc+kernel feature you'll know
> ...
>
> Even with the best-run projects in existence it takes forever and is very
> painful - and here i talk about first hand experience over many years.
>    

Try sending a patch to qemu-devel@, you may be pleasantly surprised.


> >>If the maintainers of all packages are cooperative and responsive, then the
>> patches will get accepted quickly.  If they aren't, development will be
>> slow. [...]
>>      
> I'm afraid practice is different from the rosy ideal you paint there. Even
> with assumed 'perfect projects' there's always random differences between
> projects, causing doubled (tripled) overhead and compounded up overhead:
>
>   - random differences in release schedules
>
>   - random differences in contribution guidelines
>
>   - random differences in coding style
>    

None of these matter for steady contributors.

>> [...] It isn't any different from contributing to two unrelated kernel
>> subsystems (which are in fact in different repositories until the next merge
>> window).
>>      
> You mention a perfect example: contributing to multiple kernel subsystems. Even
> _that_ is very noticeably harder than contributing to a single subsystem - due
> to the inevitable bureaucratic overhead, due to different development trees,
> due to different merge criteria.
>
> So you are underlining my point (perhaps without intending to): treating
> closely related bits of technology as a single project is much better.
>
> Obviously arch/x86/kvm/, virt/ and tools/kvm/ should live in a single
> development repository (perhaps micro-differentiated by a few topical
> branches), for exactly those reasons you mention.
>    

How is a patch for the qemu GUI eject button and the kvm shadow mmu 
related?  Should a single maintainer deal with both?


-- 
error compiling committee.c: too many arguments to function


^ permalink raw reply	[flat|nested] 390+ messages in thread

* Re: [RFC] Unify KVM kernel-space and user-space code into a single project
  2010-03-18 10:21                                             ` Avi Kivity
@ 2010-03-18 11:35                                               ` Ingo Molnar
  2010-03-18 12:00                                                 ` Alexander Graf
  2010-03-18 12:33                                                 ` Frank Ch. Eigler
  0 siblings, 2 replies; 390+ messages in thread
From: Ingo Molnar @ 2010-03-18 11:35 UTC (permalink / raw)
  To: Avi Kivity
  Cc: Alexander Graf, Anthony Liguori, Zhang, Yanmin, Peter Zijlstra,
	Sheng Yang, linux-kernel, kvm, Marcelo Tosatti, Joerg Roedel,
	Jes Sorensen, Gleb Natapov, Zachary Amsden, ziteng.huang,
	Arnaldo Carvalho de Melo, Frédéric Weisbecker


* Avi Kivity <avi@redhat.com> wrote:

> > Still it's _very_ useful to have a single reference implementation under 
> > tools/perf/ where we concentrate the best of the code. That is where we 
> > make sure that each new kernel feature is appropriately implemented in 
> > user-space as well, that the combination works well together and is 
> > releasable to users. That is what keeps us all honest: the latency of 
> > features is much lower, and there's no ping-pong of blame going on between 
> > the two components in case of bugs or in case of misfeatures.
> 
> That would make sense for a truly minimal userspace for kvm: we once had a 
> tool called kvmctl which was used to run tests (since folded into qemu).  It 
> didn't contain a GUI and was unable to run a general purpose guest.  It was 
> a few hundred lines of code, and indeed patches to kvmctl had a much closer 
> correspondence to patches with kvm (though still low, as most kvm patches 
> don't modify the ABI).

If it's functional to the extent of at least allowing, say, a serial console 
(like the UML binary allows), I'd expect the minimal user-space to quickly grow 
out of this minimal state. The rest will be history.

Maybe this is a better, simpler (and much cleaner and less controversial) 
approach than moving a 'full' copy of qemu there.

There's certainly no risk: if qemu stays dominant then nothing is lost 
[tools/kvm/ can be removed after some time], and if this clean base works out 
fine then the useful qemu technologies will move over to it gradually and 
without much fuss, and the developers will move with it as well.

If it's just a token effort with near zero utility to begin with, it certainly 
won't take off.

Once it's there in tools/kvm/ and bootable, I'd certainly hack up some quick 
xlib-based VGA output capability myself - it's not that hard ;-) It would also 
allow me to test whether the latest KVM still boots fine in a much simpler way. 
(Most of my testboxes don't have qemu installed.)

So you have one user signed up for that already ;-)
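
[ Illustrative sketch, not from the original mail: the kind of quick xlib 
  output referred to above is indeed small. This hypothetical loop blits a 
  32-bit-per-pixel framebuffer into a window, assuming a common 24/32-bit 
  TrueColor X visual; wiring it up to real guest VGA memory is left out:

    #include <stdlib.h>
    #include <X11/Xlib.h>

    #define W 640
    #define H 400

    int main(void)
    {
        Display *dpy = XOpenDisplay(NULL);
        int screen = DefaultScreen(dpy);
        Window win = XCreateSimpleWindow(dpy, RootWindow(dpy, screen),
                                         0, 0, W, H, 0, 0,
                                         BlackPixel(dpy, screen));
        char *fb = calloc(W * H, 4);      /* stand-in for guest VGA RAM */
        XImage *img = XCreateImage(dpy, DefaultVisual(dpy, screen),
                                   DefaultDepth(dpy, screen), ZPixmap, 0,
                                   fb, W, H, 32, W * 4);

        XSelectInput(dpy, win, ExposureMask | KeyPressMask);
        XMapWindow(dpy, win);

        for (;;) {
            XEvent ev;
            XNextEvent(dpy, &ev);
            if (ev.type == KeyPress)
                break;
            /* ... refresh fb from the guest's VGA memory here ... */
            XPutImage(dpy, win, DefaultGC(dpy, screen), img,
                      0, 0, 0, 0, W, H);
        }
        return 0;
    }
]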

> > Same goes for KVM+Qemu: it would be so much nicer to have a single, 
> > well-focused reference implementation under tools/kvm/ and have 
> > improvements flow into that code base.
> >
> > That way KVM developers cannot just shrug "well, GUI suckage is a 
> > user-space problem" - like the answers i got in the KVM usability thread 
> > ...
> >
> > The buck will stop here.
> 
> Suppose we copy qemu tomorrow into tools/.  All the problems will be copied 
> with it.  Someone still has to write patches to fix them. Who will it be?

What we saw with tools/perf/ was that pure proximity to actual kernel testers 
and kernel developers produces a steady influx of new developers. It didn't 
happen overnight, but it happened. A simple:

  cd tools/perf/
  make -j install

Gets them something to play with. That kind of proximity is very powerful.

The other benefit was that distros can package perf with the kernel package, 
so it's updated together with the kernel. This means a very efficient 
distribution of new technologies, together with new kernel releases.

Distributions are very eager to update kernels even in stable periods of the 
distro lifetime - they are much less willing to update user-space packages.

You can literally get full KVM+userspace features done _and deployed to users_ 
within the 3-month development cycle of upstream KVM.

All these create synergies that are very clear once you see the process in 
motion. It's a powerful positive feedback loop. Please give it some thought.

	Ingo

^ permalink raw reply	[flat|nested] 390+ messages in thread

* Re: [RFC] Unify KVM kernel-space and user-space code into a single project
  2010-03-18 11:30                                             ` Avi Kivity
@ 2010-03-18 11:48                                               ` Ingo Molnar
  2010-03-18 12:22                                                 ` Avi Kivity
  2010-03-18 14:53                                                 ` Anthony Liguori
  0 siblings, 2 replies; 390+ messages in thread
From: Ingo Molnar @ 2010-03-18 11:48 UTC (permalink / raw)
  To: Avi Kivity
  Cc: Anthony Liguori, Zhang, Yanmin, Peter Zijlstra, Sheng Yang,
	linux-kernel, kvm, Marcelo Tosatti, Joerg Roedel, Jes Sorensen,
	Gleb Natapov, Zachary Amsden, ziteng.huang,
	Arnaldo Carvalho de Melo, Frédéric Weisbecker


* Avi Kivity <avi@redhat.com> wrote:

> On 03/18/2010 12:50 PM, Ingo Molnar wrote:
> >* Avi Kivity<avi@redhat.com>  wrote:
> >
> >>>The moment any change (be it as trivial as fixing a GUI detail or as
> >>>complex as a new feature) involves two or more packages, development speed
> >>>slows down to a crawl - while the complexity of the change might be very
> >>>low!
> >>Why is that?
> > It's very simple: because the contribution latencies and overhead compound,
> > almost inevitably.
> 
> It's not inevitable: if the projects are badly run you'll have high 
> latencies, but projects don't have to be badly run.

So the 64K dollar question is, why does Qemu still suck?

> 
> >If you ever tried to implement a combo GCC+glibc+kernel feature you'll know
> >...
> >
> >Even with the best-run projects in existence it takes forever and is very
> >painful - and here i talk about first hand experience over many years.
> 
> Try sending a patch to qemu-devel@, you may be pleasantly surprised.
> 
> 
> >>If the maintainers of all packages are cooperative and responsive, then the
> >>patches will get accepted quickly.  If they aren't, development will be
> >>slow. [...]
> >I'm afraid practice is different from the rosy ideal you paint there. Even
> >with assumed 'perfect projects' there's always random differences between
> >projects, causing doubled (tripled) overhead and compounded up overhead:
> >
> >  - random differences in release schedules
> >
> >  - random differences in contribution guidelines
> >
> >  - random differences in coding style
> 
> None of these matter for steady contributors.
> 
> >>[...] It isn't any different from contributing to two unrelated kernel
> >>subsystems (which are in fact in different repositories until the next merge
> >>window).
> >You mention a perfect example: contributing to multiple kernel subsystems. Even
> >_that_ is very noticeably harder than contributing to a single subsystem - due
> >to the inevitable bureaucratic overhead, due to different development trees,
> >due to different merge criteria.
> >
> >So you are underlining my point (perhaps without intending to): treating
> >closely related bits of technology as a single project is much better.
> >
> > Obviously arch/x86/kvm/, virt/ and tools/kvm/ should live in a single 
> > development repository (perhaps micro-differentiated by a few topical 
> > branches), for exactly those reasons you mention.
> 
> How is a patch for the qemu GUI eject button and the kvm shadow mmu related?  
> Should a single maintainer deal with both?

We have co-maintainers for perf that have a different focus. It works pretty 
well.

Look at git log tools/perf/ and how the user-space and kernel-space components 
interact in practice. You'll see patches that only impact one side, but you'll 
also see very big overlap, both in contributor identity and in the patches 
themselves.

Also, let me put similar questions in a bit different way:

 - ' how is an in-kernel PIT emulation connected to Qemu's PIT emulation? '

 - ' how is the in-kernel dynticks implementation related to Qemu's 
     implementation of hardware timers? '

 - ' how is an in-kernel event for a CD-ROM eject connected to an in-Qemu 
     eject event? '

 - ' how is a new hardware virtualization feature related to being able to 
     configure and use it via Qemu? '

 - ' how is the in-kernel x86 decoder/emulator related to the Qemu x86 
     emulator? '

 - ' how is the performance of the qemu GUI related to the way VGA buffers are 
     mapped and accelerated by KVM? '

They are obviously deeply related. The quality of a development process is not 
defined by the easy cases where no project unification is needed. The quality 
of a development process is defined by the _difficult_ cases.

	Ingo

^ permalink raw reply	[flat|nested] 390+ messages in thread

* Re: [RFC] Unify KVM kernel-space and user-space code into a single project
  2010-03-18 11:35                                               ` Ingo Molnar
@ 2010-03-18 12:00                                                 ` Alexander Graf
  2010-03-18 12:33                                                 ` Frank Ch. Eigler
  1 sibling, 0 replies; 390+ messages in thread
From: Alexander Graf @ 2010-03-18 12:00 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Avi Kivity, Anthony Liguori, Zhang, Yanmin, Peter Zijlstra,
	Sheng Yang, linux-kernel, kvm, Marcelo Tosatti, oerg Roedel,
	Jes Sorensen, Gleb Natapov, Zachary Amsden, ziteng.huang,
	Arnaldo Carvalho de Melo, Frédéric Weisbecker

Ingo Molnar wrote:
> * Avi Kivity <avi@redhat.com> wrote:
>
>   
>>> Still it's _very_ useful to have a single reference implementation under 
>>> tools/perf/ where we concentrate the best of the code. That is where we 
>>> make sure that each new kernel feature is appropriately implemented in 
>>> user-space as well, that the combination works well together and is 
>>> releasable to users. That is what keeps us all honest: the latency of 
>>> features is much lower, and there's no ping-pong of blame going on between 
>>> the two components in case of bugs or in case of misfeatures.
>>>       
>> That would make sense for a truly minimal userspace for kvm: we once had a 
>> tool called kvmctl which was used to run tests (since folded into qemu).  It 
>> didn't contain a GUI and was unable to run a general purpose guest.  It was 
>> a few hundred lines of code, and indeed patches to kvmctl had a much closer 
>> correspondence to patches with kvm (though still low, as most kvm patches 
>> don't modify the ABI).
>>     
>
> If it's functional to the extent of at least allowing say a serial console via 
> the console (like the UML binary allows) i'd expect the minimal user-space to 
> quickly grow out of this minimal state. The rest will be history.
>
> Maybe this is a better, simpler (and much cleaner and less controversial) 
> approach than moving a 'full' copy of qemu there.
>
> There's certainly no risk: if qemu stays dominant then nothing is lost 
> [tools/kvm/ can be removed after some time], and if this clean base works out 
> fine then the useful qemu technologies will move over to it gradually and 
> without much fuss, and the developers will move with it as well.
>
> If it's just a token effort with near zero utility to begin with it certainly 
> wont take off.
>
> Once it's there in tools/kvm/ and bootable i'd certainly hack up some quick 
> xlib based VGA output capability myself - it's not that hard ;-) It would also 
> allow me to test whether latest-KVM still boots fine in a much simpler way. 
> (most of my testboxes dont have qemu installed)
>
> So you have one user signed up for that already ;-)
>   

Alright, you just volunteered. Just give it a go and try to implement
the "oh so simple" KVM frontend while maintaining compatibility with at
least a few older Linux guests. My guess is that you'll realize it's a
dead end before committing anything to the kernel source tree. But
really, just try it out.


Good Luck

Alex

^ permalink raw reply	[flat|nested] 390+ messages in thread

* Re: [RFC] Unify KVM kernel-space and user-space code into a single project
  2010-03-18 11:48                                               ` Ingo Molnar
@ 2010-03-18 12:22                                                 ` Avi Kivity
  2010-03-18 13:00                                                   ` Ingo Molnar
  2010-03-18 14:53                                                 ` Anthony Liguori
  1 sibling, 1 reply; 390+ messages in thread
From: Avi Kivity @ 2010-03-18 12:22 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Anthony Liguori, Zhang, Yanmin, Peter Zijlstra, Sheng Yang,
	linux-kernel, kvm, Marcelo Tosatti, oerg Roedel, Jes Sorensen,
	Gleb Natapov, Zachary Amsden, ziteng.huang,
	Arnaldo Carvalho de Melo, Frédéric Weisbecker

On 03/18/2010 01:48 PM, Ingo Molnar wrote:
>
>> It's not inevitable, if the projects are badly run, you'll have high
>> latencies, but projects don't have to be badly run.
>>      
> So the 64K dollar question is, why does Qemu still suck?
>    

Where people sent patches, it doesn't suck (or sucks less).  Where they 
don't, it still sucks.  And it cost way more than $64K.

If moving things to tools/ helps, let's move Fedora to tools/.

>> How is a patch for the qemu GUI eject button and the kvm shadow mmu related?
>> Should a single maintainer deal with both?
>>      
> We have co-maintainers for perf that have a different focus. It works pretty
> well.
>    

And it works well when I have patches that change x86 core and kvm.  But 
that's no longer a single repository and we have to coordinate.

> Look at git log tools/perf/ and how user-space and kernel-space components
> interact in practice. You'll patches that only impact one side, but you'll see
> very big overlap both in contributor identity and in patches as well.
>
> Also, let me put similar questions in a bit different way:
>
>   - ' how is an in-kernel PIT emulation connected to Qemu's PIT emulation? '
>    

Both implement the same spec.  One is a code derivative of the other 
(via Xen).

>   - ' how is the in-kernel dynticks implementation related to Qemu's
>       implementation of hardware timers? '
>    

The quality of host kernel timers directly determines the quality of 
qemu's timer emulation.

>   - ' how is an in-kernel event for a CD-ROM eject connected to an in-Qemu
>       eject event? '
>    

Both implement the same spec.  The kernel of course needs to handle all 
implementation variants, while qemu only needs to implement it once.

>   - ' how is a new hardware virtualization feature related to being able to
>       configure and use it via Qemu? '
>    

Most features (example: npt) are transparent to userspace, some are 
not.  When they are not, we introduce an ioctl() to kvm for controlling 
the feature, and a command-line switch to qemu for calling it.
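
A minimal sketch of that pattern on the userspace side - KVM_CAP_EXAMPLE is 
only a placeholder here, not a real capability name, and real code would act 
on the result instead of just printing it:

    #include <fcntl.h>
    #include <stdio.h>
    #include <unistd.h>
    #include <sys/ioctl.h>
    #include <linux/kvm.h>

    #define KVM_CAP_EXAMPLE 9999   /* placeholder for a real KVM_CAP_* value */

    int main(void)
    {
        int kvm = open("/dev/kvm", O_RDWR);
        if (kvm < 0)
            return 1;

        /* KVM_CHECK_EXTENSION returns 0 when the capability is absent */
        int have = ioctl(kvm, KVM_CHECK_EXTENSION, KVM_CAP_EXAMPLE);
        printf("capability %savailable\n", have > 0 ? "" : "not ");

        /* qemu would then expose the feature via a command-line switch */
        close(kvm);
        return 0;
    }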

>   - ' how is the in-kernel x86 decoder/emulator related to the Qemu x86
>       emulator? '
>    

Both implement the same spec.  Note qemu is not an emulator but a binary 
translator.

>   - ' how is the performance of the qemu GUI related to the way VGA buffers are
>       mapped and accelerated by KVM? '
>    

kvm needs to support direct mapping when possible and efficient data 
transfer when not.  The latter will obviously be much slower.  When 
direct mapping is possible, kvm needs to track pages touched by the 
guest to avoid full screen redraws.  The rest (interfacing to X or vnc, 
implementing emulated hardware acceleration, full-screen mode, etc.) are 
unrelated.

> They are obviously deeply related.

Not at all.  kvm in fact knows nothing about vga, to take your last 
example.  To suggest that qemu needs to be close to the kernel to 
benefit from the kernel's timer implementation means we don't care about 
providing quality timing except to ourselves, which luckily isn't the case.

Some time ago the various desktops needed directory change notification, 
and people implemented inotify (or whatever it's called today).  No one 
suggested tools/gnome/ and tools/kde/.

> The quality of a development process is not
> defined by the easy cases where no project unification is needed. The quality
> of a development process is defined by the _difficult_ cases.
>    

That's true, but we don't have issues at the qemu/kvm boundary.  Note we 
do have issues at the qemu/aio interfaces and qemu/net interfaces (out 
of which vhost-net was born) but these wouldn't be solved by tools/qemu/.

-- 
error compiling committee.c: too many arguments to function


^ permalink raw reply	[flat|nested] 390+ messages in thread

* Re: [RFC] Unify KVM kernel-space and user-space code into a single project
  2010-03-18 11:35                                               ` Ingo Molnar
  2010-03-18 12:00                                                 ` Alexander Graf
@ 2010-03-18 12:33                                                 ` Frank Ch. Eigler
  2010-03-18 13:01                                                     ` John Kacur
  2010-03-18 13:02                                                   ` Ingo Molnar
  1 sibling, 2 replies; 390+ messages in thread
From: Frank Ch. Eigler @ 2010-03-18 12:33 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Avi Kivity, Alexander Graf, Anthony Liguori, Zhang, Yanmin,
	Peter Zijlstra, Sheng Yang, linux-kernel, kvm, Marcelo Tosatti,
	oerg Roedel, Jes Sorensen, Gleb Natapov, Zachary Amsden,
	ziteng.huang, Arnaldo Carvalho de Melo, Frédéric Weisbecker

Ingo Molnar <mingo@elte.hu> writes:

> [...]
> Distributions are very eager to update kernels even in stable periods of the 
> distro lifetime - they are much less willing to update user-space packages.
> [...]

Sorry, er, what?  What distributions eagerly upgrade kernels in stable
periods, were it not primarily motivated by security fixes?  What users
eagerly replace their kernels?

- FChE

^ permalink raw reply	[flat|nested] 390+ messages in thread

* Re: [RFC] Unify KVM kernel-space and user-space code into a single project
  2010-03-18 12:22                                                 ` Avi Kivity
@ 2010-03-18 13:00                                                   ` Ingo Molnar
  2010-03-18 13:36                                                     ` Avi Kivity
  2010-03-18 14:59                                                     ` Anthony Liguori
  0 siblings, 2 replies; 390+ messages in thread
From: Ingo Molnar @ 2010-03-18 13:00 UTC (permalink / raw)
  To: Avi Kivity
  Cc: Anthony Liguori, Zhang, Yanmin, Peter Zijlstra, Sheng Yang,
	linux-kernel, kvm, Marcelo Tosatti, oerg Roedel, Jes Sorensen,
	Gleb Natapov, Zachary Amsden, ziteng.huang,
	Arnaldo Carvalho de Melo, Frédéric Weisbecker


* Avi Kivity <avi@redhat.com> wrote:

> On 03/18/2010 01:48 PM, Ingo Molnar wrote:
>
> > > It's not inevitable, if the projects are badly run, you'll have high 
> > > latencies, but projects don't have to be badly run.
> >
> > So the 64K dollar question is, why does Qemu still suck?
> 
> Where people sent patches, it doesn't suck (or sucks less). Where they 
> don't, it still sucks. [...]

So is your point that the development process and basic code structure do not 
matter at all, and that it's just a matter of people sending patches? I beg to 
differ ...

> [...]  And it cost way more than $64K.
> 
> If moving things to tools/ helps, let's move Fedora to tools/.

Those bits of Fedora which deeply relate to the kernel - yes.
Those bits that are arguably separate - nope.

> >> How is a patch for the qemu GUI eject button and the kvm shadow mmu 
> >> related? Should a single maintainer deal with both?
> >
> > We have co-maintainers for perf that have a different focus. It works 
> > pretty well.
> 
> And it works well when I have patches that change x86 core and kvm. But 
> that's no longer a single repository and we have to coordinate.

Actually, it works much better if, contrary to your proposal, it ends up in a 
single repo. Last I checked, both of us really worked on such a project, run by 
some guy. (Named Linus or so.)

> > Look at git log tools/perf/ and how user-space and kernel-space components 
> > interact in practice. You'll patches that only impact one side, but you'll 
> > see very big overlap both in contributor identity and in patches as well.
> >
> > Also, let me put similar questions in a bit different way:
> >
> >  - ' how is an in-kernel PIT emulation connected to Qemu's PIT emulation? '
> 
> Both implement the same spec.  One is be a code derivative of the other (via 
> Xen).
> 
> >  - ' how is the in-kernel dynticks implementation related to Qemu's
> >      implementation of hardware timers? '
> 
> The quality of host kernel timers directly determines the quality of
> qemu's timer emulation.
> 
> >  - ' how is an in-kernel event for a CD-ROM eject connected to an in-Qemu
> >      eject event? '
> 
> Both implement the same spec.  The kernel of course needs to handle
> all implementation variants, while qemu only needs to implement it
> once.
> 
> >  - ' how is a new hardware virtualization feature related to being able to
> >      configure and use it via Qemu? '
> 
> Most features (example: npt) are transparent to userspace, some are
> not.  When they are not, we introduce an ioctl() to kvm for
> controlling the feature, and a command-line switch to qemu for
> calling it.
> 
> >  - ' how is the in-kernel x86 decoder/emulator related to the Qemu x86
> >      emulator? '
> 
> Both implement the same spec.  Note qemu is not an emulator but a
> binary translator.
> 
> >  - ' how is the performance of the qemu GUI related to the way VGA buffers are
> >      mapped and accelerated by KVM? '
> 
> kvm needs to support direct mapping when possible and efficient data
> transfer when not.  The latter will obviously be much slower.  When
> direct mapping is possible, kvm needs to track pages touched by the
> guest to avoid full screen redraws.  The rest (interfacing to X or
> vnc, implementing emulated hardware acceleration, full-screen mode,
> etc.) are unrelated.
> 
> > They are obviously deeply related.
> 
> Not at all. [...]

You are obviously arguing for something like UML. Fortunately KVM is not that. 
Or I hope it isn't.

> [...]  kvm in fact knows nothing about vga, to take your last
> example. [...]

Look at the VGA dirty bitmap optimization, a.k.a. the KVM_GET_DIRTY_LOG ioctl.

See qemu/kvm-all.c's kvm_physical_sync_dirty_bitmap().

It started out as a VGA optimization (also used by live migration) and even 
today it's mostly used by the VGA drivers - albeit a weak one.

I wish there were stronger VGA optimizations implemented; copying the dirty 
bitmap is not a particularly performant solution (although it's certainly 
better than full emulation). Graphics performance is one of the more painful 
aspects of KVM usability today.
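
For reference, a minimal userspace sketch of that ioctl - this is not code 
taken from qemu; slot selection and bitmap allocation are assumed to be 
handled by the caller, the way kvm_physical_sync_dirty_bitmap() does per 
memory slot:

    #include <string.h>
    #include <sys/ioctl.h>
    #include <linux/kvm.h>

    /*
     * Sketch only: fetch (and clear) the dirty-page bitmap of one memory
     * slot, e.g. the one backing the VGA framebuffer.  The caller allocates
     * 'bitmap' with one bit per guest page of that slot.
     */
    static int sync_dirty_bitmap(int vm_fd, unsigned int slot,
                                 unsigned long *bitmap)
    {
        struct kvm_dirty_log log;

        memset(&log, 0, sizeof(log));
        log.slot = slot;
        log.dirty_bitmap = bitmap;

        return ioctl(vm_fd, KVM_GET_DIRTY_LOG, &log);
    }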

> [...]  To suggest that qemu needs to be close to the kernel to benefit from 
> the kernel's timer implementation means we don't care about providing 
> quality timing except to ourselves, which luckily isn't the case.

That is not what I said. I said they are closely related, and where 
technologies are closely related, project proximity turns into project 
unification at a certain stage.

> Some time ago the various desktops needed directory change
> notification, and people implemented inotify (or whatever it's
> called today).  No one suggested tools/gnome/ and tools/kde/.

You are misconstruing and misrepresenting my argument - I'd expect better. 
Gnome and KDE run on other kernels as well and are generally not considered 
close to the kernel.

Do you seriously argue that Qemu has nothing to do with KVM these days?

> > The quality of a development process is not defined by the easy cases 
> > where no project unification is needed. The quality of a development 
> > process is defined by the _difficult_ cases.
> 
> That's true, but we don't have issues at the qemu/kvm boundary. Note we do 
> have issues at the qemu/aio interfaces and qemu/net interfaces (out of which 
> vhost-net was born) but these wouldn't be solved by tools/qemu/.

That was not what I suggested. They would be solved by what I proposed: 
tools/kvm/, right?

Thanks,

	Ingo

^ permalink raw reply	[flat|nested] 390+ messages in thread

* Re: [RFC] Unify KVM kernel-space and user-space code into a single  project
  2010-03-18 12:33                                                 ` Frank Ch. Eigler
@ 2010-03-18 13:01                                                     ` John Kacur
  2010-03-18 13:02                                                   ` Ingo Molnar
  1 sibling, 0 replies; 390+ messages in thread
From: John Kacur @ 2010-03-18 13:01 UTC (permalink / raw)
  To: Frank Ch. Eigler
  Cc: Ingo Molnar, Avi Kivity, Alexander Graf, Anthony Liguori, Zhang,
	Yanmin, Peter Zijlstra, Sheng Yang, linux-kernel, kvm,
	Marcelo Tosatti, oerg Roedel, Jes Sorensen, Gleb Natapov,
	Zachary Amsden, ziteng.huang, Arnaldo Carvalho de Melo,
	Frédéric Weisbecker

On Thu, Mar 18, 2010 at 1:33 PM, Frank Ch. Eigler <fche@redhat.com> wrote:
> Ingo Molnar <mingo@elte.hu> writes:
>
>> [...]
>> Distributions are very eager to update kernels even in stable periods of the
>> distro lifetime - they are much less willing to update user-space packages.
>> [...]
>
> Sorry, er, what?  What distributions eagerly upgrade kernels in stable
> periods, were it not primarily motivated by security fixes?  What users
> eagerly replace their kernels?
>

Us guys reading and participating on the list. ;)

^ permalink raw reply	[flat|nested] 390+ messages in thread

* Re: [RFC] Unify KVM kernel-space and user-space code into a single project
@ 2010-03-18 13:01                                                     ` John Kacur
  0 siblings, 0 replies; 390+ messages in thread
From: John Kacur @ 2010-03-18 13:01 UTC (permalink / raw)
  To: Frank Ch. Eigler
  Cc: Ingo Molnar, Avi Kivity, Alexander Graf, Anthony Liguori, Zhang,
	Yanmin, Peter Zijlstra, Sheng Yang, linux-kernel, kvm,
	Marcelo Tosatti, oerg Roedel, Jes Sorensen, Gleb Natapov,
	Zachary Amsden, ziteng.huang, Arnaldo Carvalho de Melo,
	Frédéric Weisbecker

On Thu, Mar 18, 2010 at 1:33 PM, Frank Ch. Eigler <fche@redhat.com> wrote:
> Ingo Molnar <mingo@elte.hu> writes:
>
>> [...]
>> Distributions are very eager to update kernels even in stable periods of the
>> distro lifetime - they are much less willing to update user-space packages.
>> [...]
>
> Sorry, er, what?  What distributions eagerly upgrade kernels in stable
> periods, were it not primarily motivated by security fixes?  What users
> eagerly replace their kernels?
>

Us guys reading and participating on the list. ;)

^ permalink raw reply	[flat|nested] 390+ messages in thread

* Re: [RFC] Unify KVM kernel-space and user-space code into a single project
  2010-03-18 12:33                                                 ` Frank Ch. Eigler
  2010-03-18 13:01                                                     ` John Kacur
@ 2010-03-18 13:02                                                   ` Ingo Molnar
  2010-03-18 13:10                                                     ` Avi Kivity
  2010-03-18 13:24                                                     ` Frank Ch. Eigler
  1 sibling, 2 replies; 390+ messages in thread
From: Ingo Molnar @ 2010-03-18 13:02 UTC (permalink / raw)
  To: Frank Ch. Eigler
  Cc: Avi Kivity, Alexander Graf, Anthony Liguori, Zhang, Yanmin,
	Peter Zijlstra, Sheng Yang, linux-kernel, kvm, Marcelo Tosatti,
	oerg Roedel, Jes Sorensen, Gleb Natapov, Zachary Amsden,
	ziteng.huang, Arnaldo Carvalho de Melo, Frédéric Weisbecker


* Frank Ch. Eigler <fche@redhat.com> wrote:

> Ingo Molnar <mingo@elte.hu> writes:
> 
> > [...]
> > Distributions are very eager to update kernels even in stable periods of the 
> > distro lifetime - they are much less willing to update user-space packages.
> > [...]
> 
> Sorry, er, what?  What distributions eagerly upgrade kernels in stable 
> periods, were it not primarily motivated by security fixes? [...]

Please check the popular distro called 'Fedora' for example, and its kernel 
upgrade policies.

> [...] What users eagerly replace their kernels?

Those 99% who click on the 'install 193 updates' popup.

	Ingo

^ permalink raw reply	[flat|nested] 390+ messages in thread

* Re: [PATCH] Enhance perf to collect KVM guest os statistics from host side
  2010-03-18  8:03               ` Ingo Molnar
@ 2010-03-18 13:03                 ` Arnaldo Carvalho de Melo
  0 siblings, 0 replies; 390+ messages in thread
From: Arnaldo Carvalho de Melo @ 2010-03-18 13:03 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Zhang, Yanmin, Avi Kivity, Peter Zijlstra, Sheng Yang,
	linux-kernel, kvm, Marcelo Tosatti, oerg Roedel, Jes Sorensen,
	Gleb Natapov, Zachary Amsden, zhiteng.huang

On Thu, Mar 18, 2010 at 09:03:25AM +0100, Ingo Molnar wrote:
> 
> * Zhang, Yanmin <yanmin_zhang@linux.intel.com> wrote:
> 
> > I worked out 3 new patches against tip/master tree of Mar. 17th.
> 
> Cool! Mind sending them as a series of patches instead of attachment? That 
> makes it easier to review them. Also, the Signed-off-by lines seem to be 
> missing plus we need a per patch changelog as well.

Yeah, please, and I hadn't merged them, so the resend was the best thing to do.

- Arnaldo

^ permalink raw reply	[flat|nested] 390+ messages in thread

* Re: [RFC] Unify KVM kernel-space and user-space code into a single project
  2010-03-18 13:02                                                   ` Ingo Molnar
@ 2010-03-18 13:10                                                     ` Avi Kivity
  2010-03-18 13:31                                                       ` Ingo Molnar
  2010-03-18 13:24                                                     ` Frank Ch. Eigler
  1 sibling, 1 reply; 390+ messages in thread
From: Avi Kivity @ 2010-03-18 13:10 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Frank Ch. Eigler, Alexander Graf, Anthony Liguori, Zhang, Yanmin,
	Peter Zijlstra, Sheng Yang, linux-kernel, kvm, Marcelo Tosatti,
	oerg Roedel, Jes Sorensen, Gleb Natapov, Zachary Amsden,
	ziteng.huang, Arnaldo Carvalho de Melo, Frédéric Weisbecker

On 03/18/2010 03:02 PM, Ingo Molnar wrote:
>
>> [...] What users eagerly replace their kernels?
>>      
> Those 99% who click on the 'install 193 updates' popup.
>
>    

Of which 1 is the kernel, and 192 are userspace updates (of which one 
may be qemu).

-- 
error compiling committee.c: too many arguments to function


^ permalink raw reply	[flat|nested] 390+ messages in thread

* Re: [RFC] Unify KVM kernel-space and user-space code into a single project
  2010-03-18 10:58                                           ` Ingo Molnar
@ 2010-03-18 13:23                                             ` Jes Sorensen
  2010-03-18 14:22                                               ` Ingo Molnar
  0 siblings, 1 reply; 390+ messages in thread
From: Jes Sorensen @ 2010-03-18 13:23 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Anthony Liguori, Avi Kivity, Zhang, Yanmin, Peter Zijlstra,
	Sheng Yang, linux-kernel, kvm, Marcelo Tosatti, oerg Roedel,
	Gleb Natapov, Zachary Amsden, Arnaldo Carvalho de Melo,
	Frédéric Weisbecker

On 03/18/10 11:58, Ingo Molnar wrote:
>
> * Jes Sorensen<Jes.Sorensen@redhat.com>  wrote:
>> Thats a very glorified statement but it's not reality, sorry. You can do
>> that with something like perf because it's so small and development of perf
>> is limited to a very small group of developers.
>
> I was not talking about just perf: i am also talking about the arch/x86/
> unification which is 200+ KLOC of highly non-trivial kernel code with hundreds
> of contributors and with 8000+ commits in the past two years.

Sorry, but you cannot compare merging two chunks of kernel code that
originated from the same base with the effort of mixing a large
userland project with a kernel component. Apples and oranges.

> Also, it applies to perf as well: people said exactly that a year ago: 'perf
> has it easy to be clean as it is small, once it gets as large as Oprofile
> tooling it will be in the same messy situation'.
>
> Today perf has more features than Oprofile, has a larger and more complex code
> base, has more contributors, and no, it's not in the same messy situation at
> all.

Both perf and oprofile are still relatively small projects in comparison
to QEMU.

> So whatever you think of large, unified projects, you are quite clearly
> mistaken. I have done and maintained through two different types of
> unifications and the experience was very similar: both developers and users
> (and maintainers) are much better off.

You believe that I am wrong in my assessment of unified projects, and I
obviously think you are mistaken and underestimating the cost and
effects of trying to merge the two.

Well I think we are just going to agree to disagree on this one. I am
not against merging projects where it makes sense, but in this
particular case I am strongly convinced the loss would be much greater
than the gain.

Cheers,
Jes

^ permalink raw reply	[flat|nested] 390+ messages in thread

* Re: [RFC] Unify KVM kernel-space and user-space code into a single project
  2010-03-18 13:02                                                   ` Ingo Molnar
  2010-03-18 13:10                                                     ` Avi Kivity
@ 2010-03-18 13:24                                                     ` Frank Ch. Eigler
  2010-03-18 13:48                                                       ` Ingo Molnar
  1 sibling, 1 reply; 390+ messages in thread
From: Frank Ch. Eigler @ 2010-03-18 13:24 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Avi Kivity, Alexander Graf, Anthony Liguori, Zhang, Yanmin,
	Peter Zijlstra, Sheng Yang, linux-kernel, kvm, Marcelo Tosatti,
	oerg Roedel, Jes Sorensen, Gleb Natapov, Zachary Amsden,
	ziteng.huang, Arnaldo Carvalho de Melo, Frédéric Weisbecker

Hi -

> > > [...]
> > > Distributions are very eager to update kernels even in stable periods of the 
> > > distro lifetime - they are much less willing to update user-space packages.
> > > [...]
> > 
> > Sorry, er, what?  What distributions eagerly upgrade kernels in stable 
> > periods, were it not primarily motivated by security fixes? [...]
> 
> Please check the popular distro called 'Fedora' for example

I do believe I've heard of it.  According to Fedora's bodhi, there have
been 18 kernel updates issued for Fedora 11 since its release, of
which 12 were purely security updates, and most of the other six
also contain security fixes.  None are described as 'enhancement'
updates.  Oh, what about Fedora 12?  8 updates total, of which 5 are
security only, one is for drm showstoppers, and the others include
security fixes - again, 0 are tagged as 'enhancement'.

So where is that "eagerness" again??  My sense is that most users are
happy to leave a stable kernel running as long as possible, and
distributions know this.  You surely must understand that the lkml
demographics are different.


> and its kernel upgrade policies.

[citation needed]


> > [...] What users eagerly replace their kernels?
>
> Those 99% who click on the 'install 193 updates' popup.

That's not "eager".  That's "I'm exasperated from guessing what's
really important; let's not have so many updates; meh".


- FChE

^ permalink raw reply	[flat|nested] 390+ messages in thread

* Re: [RFC] Unify KVM kernel-space and user-space code into a single project
  2010-03-18 13:10                                                     ` Avi Kivity
@ 2010-03-18 13:31                                                       ` Ingo Molnar
  2010-03-18 13:44                                                         ` Daniel P. Berrange
  2010-03-18 13:46                                                         ` Avi Kivity
  0 siblings, 2 replies; 390+ messages in thread
From: Ingo Molnar @ 2010-03-18 13:31 UTC (permalink / raw)
  To: Avi Kivity
  Cc: Frank Ch. Eigler, Alexander Graf, Anthony Liguori, Zhang, Yanmin,
	Peter Zijlstra, Sheng Yang, linux-kernel, kvm, Marcelo Tosatti,
	oerg Roedel, Jes Sorensen, Gleb Natapov, Zachary Amsden,
	ziteng.huang, Arnaldo Carvalho de Melo, Frédéric Weisbecker


* Avi Kivity <avi@redhat.com> wrote:

> On 03/18/2010 03:02 PM, Ingo Molnar wrote:
> >
> >> [...] What users eagerly replace their kernels?
> >
> > Those 99% who click on the 'install 193 updates' popup.
> >
> 
> Of which 1 is the kernel, and 192 are userspace updates (of which one may be 
> qemu).

I think you didn't understand my (tersely explained) point - which is probably 
my fault. What I said is:

 - distros update the kernel first. Often in stable releases as well if 
   there's a new kernel released. (They must because it provides new hardware
   enablement and other critical changes they generally cannot skip.)

 - Qemu on the other hand is not upgraded with (nearly) that level of urgency.
   Completely new versions will generally have to wait for the next distro
   release.

With in-kernel tools the kernel and the tooling that accompanies the kernel 
are upgraded in the same low-latency pathway. That is a big plus if you are 
offering things like instrumentation (which perf does), which relates closely 
to the kernel.

Furthermore, many distros package up the latest -git kernel as well. They 
almost never do that with user-space packages.

Let me give you a specific example:

I'm running Fedora Rawhide with 2.6.34-rc1 right now on my main desktop, and 
that comes with perf-2.6.34-0.10.rc1.git0.fc14.noarch.

My rawhide box has qemu-kvm-0.12.3-3.fc14.x86_64 installed. That's more than 
1000 Qemu commits older than the latest Qemu development branch.

So by being part of the kernel repo there are lower-latency upgrades and 
earlier and better testing available on most distros.

You made it very clear that you don't want that, but please don't try to claim 
that those advantages do not exist - they are very much real and we are making 
good use of them.

Thanks,

	Ingo

^ permalink raw reply	[flat|nested] 390+ messages in thread

* Re: [RFC] Unify KVM kernel-space and user-space code into a single project
  2010-03-18 13:00                                                   ` Ingo Molnar
@ 2010-03-18 13:36                                                     ` Avi Kivity
  2010-03-18 14:09                                                       ` Ingo Molnar
  2010-03-18 14:59                                                     ` Anthony Liguori
  1 sibling, 1 reply; 390+ messages in thread
From: Avi Kivity @ 2010-03-18 13:36 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Anthony Liguori, Zhang, Yanmin, Peter Zijlstra, Sheng Yang,
	linux-kernel, kvm, Marcelo Tosatti, oerg Roedel, Jes Sorensen,
	Gleb Natapov, Zachary Amsden, ziteng.huang,
	Arnaldo Carvalho de Melo, Frédéric Weisbecker

On 03/18/2010 03:00 PM, Ingo Molnar wrote:
> * Avi Kivity<avi@redhat.com>  wrote:
>
>    
>> On 03/18/2010 01:48 PM, Ingo Molnar wrote:
>>
>>      
>>>> It's not inevitable, if the projects are badly run, you'll have high
>>>> latencies, but projects don't have to be badly run.
>>>>          
>>> So the 64K dollar question is, why does Qemu still suck?
>>>        
>> Where people sent patches, it doesn't suck (or sucks less). Where they
>> don't, it still sucks. [...]
>>      
> So is your point that the development process and basic code structure does
> not matter at all, it's just a matter of people sending patches? I beg to
> differ ...
>    

The development process of course matters, and we have worked hard to 
fix qemu's.  Basic code structure also matters, but you don't fix that 
with cp.

>> [...]  And it cost way more than $64K.
>>
>> If moving things to tools/ helps, let's move Fedora to tools/.
>>      
> Those bits of Fedora which deeply relate to the kernel - yes.
> Those bits that are arguably separate - nope.
>    

A qemu GUI is not deeply related to the kernel.  Or at all.

>>>> How is a patch for the qemu GUI eject button and the kvm shadow mmu
>>>> related? Should a single maintainer deal with both?
>>>>          
>>> We have co-maintainers for perf that have a different focus. It works
>>> pretty well.
>>>        
>> And it works well when I have patches that change x86 core and kvm. But
>> that's no longer a single repository and we have to coordinate.
>>      
> Actually, it works much better if, contrary to your proposal it ends up in a
> single repo. Last i checked both of us really worked on such a project, run by
> some guy. (Named Linus or so.)
>    

Well, the last time I sent x86 patches, they went to you and hpa and were 
applied to tip, from which I had to merge them back.  Two repositories.  
After several weeks they did end up in a third repository, Linus'.  The 
process isn't trivial or fast, but it works.

>>> Look at git log tools/perf/ and how user-space and kernel-space components
>>> interact in practice. You'll patches that only impact one side, but you'll
>>> see very big overlap both in contributor identity and in patches as well.
>>>
>>> Also, let me put similar questions in a bit different way:
>>>
>>>   - ' how is an in-kernel PIT emulation connected to Qemu's PIT emulation? '
>>>        
>> Both implement the same spec.  One is be a code derivative of the other (via
>> Xen).
>>
>>      
>>>   - ' how is the in-kernel dynticks implementation related to Qemu's
>>>       implementation of hardware timers? '
>>>        
>> The quality of host kernel timers directly determines the quality of
>> qemu's timer emulation.
>>
>>      
>>>   - ' how is an in-kernel event for a CD-ROM eject connected to an in-Qemu
>>>       eject event? '
>>>        
>> Both implement the same spec.  The kernel of course needs to handle
>> all implementation variants, while qemu only needs to implement it
>> once.
>>
>>      
>>>   - ' how is a new hardware virtualization feature related to being able to
>>>       configure and use it via Qemu? '
>>>        
>> Most features (example: npt) are transparent to userspace, some are
>> not.  When they are not, we introduce an ioctl() to kvm for
>> controlling the feature, and a command-line switch to qemu for
>> calling it.
>>
>>      
>>>   - ' how is the in-kernel x86 decoder/emulator related to the Qemu x86
>>>       emulator? '
>>>        
>> Both implement the same spec.  Note qemu is not an emulator but a
>> binary translator.
>>
>>      
>>>   - ' how is the performance of the qemu GUI related to the way VGA buffers are
>>>       mapped and accelerated by KVM? '
>>>        
>> kvm needs to support direct mapping when possible and efficient data
>> transfer when not.  The latter will obviously be much slower.  When
>> direct mapping is possible, kvm needs to track pages touched by the
>> guest to avoid full screen redraws.  The rest (interfacing to X or
>> vnc, implementing emulated hardware acceleration, full-screen mode,
>> etc.) are unrelated.
>>
>>      
>>> They are obviously deeply related.
>>>        
>> Not at all. [...]
>>      
> You are obviously arguing for something like UML. Fortunately KVM is not that.
> Or i hope it isnt.
>    

I am not arguing for UML and don't understand why you think so.

>> [...]  kvm in fact knows nothing about vga, to take your last
>> example. [...]
>>      
> Look at the VGA dirty bitmap optimization a'ka the KVM_GET_DIRTY_LOG ioctl.
>
> See qemu/kvm-all.c's kvm_physical_sync_dirty_bitmap().
>
> It started out as a VGA optimization (also used by live migration) and even
> today it's mostly used by the VGA drivers - albeit a weak one.
>
> I wish there were stronger VGA optimizations implemented, copying the dirty
> bitmap is not a particularly performant solution.

The VGA dirty bitmap is 256 bytes in length.  Copying it doesn't take 
any time at all.

People are in fact working on a copy-less dirty bitmap solution, for 
live migration of very large memory guests.  Expect set_bit_user() 
patches for tip.git.
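
(As a back-of-the-envelope check of that number - the 8 MiB framebuffer slot 
and the 4 KiB page size below are assumptions, not figures from this thread:)

    #include <stdio.h>

    int main(void)
    {
        /* 8 MiB framebuffer / 4 KiB pages = 2048 pages, one dirty bit each */
        unsigned long bytes = ((8UL << 20) / 4096) / 8;

        printf("%lu\n", bytes);   /* prints 256 */
        return 0;
    }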

>   (although it's certainly
> better than full emulation) Graphics performance is one of the more painful
> aspects of KVM usability today.
>    

If you have suggestions for further optimizations (or even patches) I'd 
love to hear them.

One solution we are working on is QXL, a framebuffer-less graphics card 
designed for spice.  The use case is again server based (hosted 
desktops) but may be adapted for desktop-on-desktop use.

>> [...]  To suggest that qemu needs to be close to the kernel to benefit from
>> the kernel's timer implementation means we don't care about providing
>> quality timing except to ourselves, which luckily isn't the case.
>>      
> That is not what i said. I said they are closely related, and where
> technologies are closely related, project proximity turns into project
> unification at a certain stage.
>    

I really don't see how.  So what if both qemu and kvm implement an 
i8254?  They can't share any code since the internal APIs are so 
different.  It's even worse for the x86 emulator, as the qemu and kvm 
implementations are fundamentally different, and more so still for the 
qemu timers and the kernel dyntick code.

>> Some time ago the various desktops needed directory change
>> notification, and people implemented inotify (or whatever it's
>> called today).  No one suggested tools/gnome/ and tools/kde/.
>>      
> You are misconstruing and misrepresenting my argument - i'd expect better.
> Gnome and KDE runs on other kernels as well and is generally not considered
> close to the kernel.
>    

qemu runs on other kernels (including Windows), just without kvm.

> Do you seriously argue that Qemu has nothing to do with KVM these days?
>    

The vast majority of qemu has nothing to do with kvm; all the kvm 
interface bits are in two files.  Things like the GUI, the VNC server, 
IDE emulation, the management interface (the monitor), live migration, 
qcow2 and ~15 other file format drivers, chipset emulation, USB 
controller emulation, snapshot support, slirp, serial port emulation, 
and a zillion other details have nothing to do with kvm.

>>> The quality of a development process is not defined by the easy cases
>>> where no project unification is needed. The quality of a development
>>> process is defined by the _difficult_ cases.
>>>        
>> That's true, but we don't have issues at the qemu/kvm boundary. Note we do
>> have issues at the qemu/aio interfaces and qemu/net interfaces (out of which
>> vhost-net was born) but these wouldn't be solved by tools/qemu/.
>>      
> That was not what i suggested. They would be solved by what i proposed:
> tools/kvm/, right?
>    

If they were, it would be worth it.

-- 
error compiling committee.c: too many arguments to function


^ permalink raw reply	[flat|nested] 390+ messages in thread

* Re: [RFC] Unify KVM kernel-space and user-space code into a single project
  2010-03-18 13:31                                                       ` Ingo Molnar
@ 2010-03-18 13:44                                                         ` Daniel P. Berrange
  2010-03-18 13:59                                                           ` Ingo Molnar
  2010-03-18 13:46                                                         ` Avi Kivity
  1 sibling, 1 reply; 390+ messages in thread
From: Daniel P. Berrange @ 2010-03-18 13:44 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Avi Kivity, Frank Ch. Eigler, Alexander Graf, Anthony Liguori,
	Zhang, Yanmin, Peter Zijlstra, Sheng Yang, linux-kernel, kvm,
	Marcelo Tosatti, oerg Roedel, Jes Sorensen, Gleb Natapov,
	Zachary Amsden, ziteng.huang, Arnaldo Carvalho de Melo,
	Frédéric Weisbecker

On Thu, Mar 18, 2010 at 02:31:24PM +0100, Ingo Molnar wrote:
> 
> * Avi Kivity <avi@redhat.com> wrote:
> 
> > On 03/18/2010 03:02 PM, Ingo Molnar wrote:
> > >
> > >> [...] What users eagerly replace their kernels?
> > >
> > > Those 99% who click on the 'install 193 updates' popup.
> > >
> > 
> > Of which 1 is the kernel, and 192 are userspace updates (of which one may be 
> > qemu).
> 
> I think you didnt understand my (tersely explained) point - which is probably 
> my fault. What i said is:
> 
>  - distros update the kernel first. Often in stable releases as well if 
>    there's a new kernel released. (They must because it provides new hardware
>    enablement and other critical changes they generally cannot skip.)
> 
>  - Qemu on the other hand is not upgraded with (nearly) that level of urgency.
>    Completely new versions will generally have to wait for the next distro
>    release.

This has nothing to do with them being in separate source repos. We could
update QEMU to new major feature releases with the same frequency in a Fedora
release, but we deliberately choose not to rebase the QEMU userspace because
experience has shown that the downside from new bugs / regressions outweighs
the benefit of any new features.

The QEMU updates in stable Fedora trees now just follow the minor bugfix
release stream provided by QEMU, and those arrive in Fedora with little
noticeable delay.

Daniel
-- 
|: Red Hat, Engineering, London    -o-   http://people.redhat.com/berrange/ :|
|: http://libvirt.org -o- http://virt-manager.org -o- http://deltacloud.org :|
|: http://autobuild.org        -o-         http://search.cpan.org/~danberr/ :|
|: GnuPG: 7D3B9505  -o-   F3C9 553F A1DA 4AC2 5648 23C1 B3DF F742 7D3B 9505 :|

^ permalink raw reply	[flat|nested] 390+ messages in thread

* Re: [RFC] Unify KVM kernel-space and user-space code into a single project
  2010-03-18 13:31                                                       ` Ingo Molnar
  2010-03-18 13:44                                                         ` Daniel P. Berrange
@ 2010-03-18 13:46                                                         ` Avi Kivity
  2010-03-18 13:57                                                           ` Ingo Molnar
  1 sibling, 1 reply; 390+ messages in thread
From: Avi Kivity @ 2010-03-18 13:46 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Frank Ch. Eigler, Alexander Graf, Anthony Liguori, Zhang, Yanmin,
	Peter Zijlstra, Sheng Yang, linux-kernel, kvm, Marcelo Tosatti,
	oerg Roedel, Jes Sorensen, Gleb Natapov, Zachary Amsden,
	ziteng.huang, Arnaldo Carvalho de Melo, Frédéric Weisbecker

On 03/18/2010 03:31 PM, Ingo Molnar wrote:
> * Avi Kivity<avi@redhat.com>  wrote:
>
>    
>> On 03/18/2010 03:02 PM, Ingo Molnar wrote:
>>      
>>>        
>>>> [...] What users eagerly replace their kernels?
>>>>          
>>> Those 99% who click on the 'install 193 updates' popup.
>>>
>>>        
>> Of which 1 is the kernel, and 192 are userspace updates (of which one may be
>> qemu).
>>      
> I think you didnt understand my (tersely explained) point - which is probably
> my fault. What i said is:
>
>   - distros update the kernel first. Often in stable releases as well if
>     there's a new kernel released. (They must because it provides new hardware
>     enablement and other critical changes they generally cannot skip.)
>    

No, they don't.  RHEL 5 is still on 2.6.18, for example.  Users don't 
like their kernels updated unless absolutely necessary, with good reason.

Kernel updates = reboots.

>   - Qemu on the other hand is not upgraded with (nearly) that level of urgency.
>     Completely new versions will generally have to wait for the next distro
>     release.
>    

F12 recently updated to 2.6.32.  This is probably due to 2.6.31.stable 
dropping away, and no capacity at Fedora to maintain it on their own.  
So they are caught in a bind - stay on 2.6.31 and expose users to 
security vulnerabilities or move to 2.6.32 and cause regressions.  Not a 
happy choice.

> With in-kernel tools the kernel and the tooling that accompanies the kernel
> are upgraded in the same low-latency pathway. That is a big plus if you are
> offering things like instrumentation (which perf does), which relates closely
> to the kernel.
>
> Furthermore, many distros package up the latest -git kernel as well. They
> almost never do that with user-space packages.
>    

I'm sure if we ask the Fedora qemu maintainer to package qemu-kvm.git 
they'll consider it favourably.  Isn't that what rawhide is for?

> Let me give you a specific example:
>
> I'm running Fedora Rawhide with 2.6.34-rc1 right now on my main desktop, and
> that comes with perf-2.6.34-0.10.rc1.git0.fc14.noarch.
>
> My rawhide box has qemu-kvm-0.12.3-3.fc14.x86_64 installed. That's more than a
> 1000 Qemu commits older than the latest Qemu development branch.
>
> So by being part of the kernel repo there's lower latency upgrades and earlier
> and better testing available on most distros.
>
> You made it very clear that you dont want that, but please dont try to claim
> that those advantages do not exist - they are very much real and we are making
> good use of it.
>    

I don't mind at all if rawhide users run on the latest and greatest, but 
release users deserve a little more stability.

-- 
error compiling committee.c: too many arguments to function


^ permalink raw reply	[flat|nested] 390+ messages in thread

* Re: [RFC] Unify KVM kernel-space and user-space code into a single project
  2010-03-18 13:24                                                     ` Frank Ch. Eigler
@ 2010-03-18 13:48                                                       ` Ingo Molnar
  0 siblings, 0 replies; 390+ messages in thread
From: Ingo Molnar @ 2010-03-18 13:48 UTC (permalink / raw)
  To: Frank Ch. Eigler
  Cc: Avi Kivity, Alexander Graf, Anthony Liguori, Zhang, Yanmin,
	Peter Zijlstra, Sheng Yang, linux-kernel, kvm, Marcelo Tosatti,
	oerg Roedel, Jes Sorensen, Gleb Natapov, Zachary Amsden,
	ziteng.huang, Arnaldo Carvalho de Melo, Frédéric Weisbecker


* Frank Ch. Eigler <fche@redhat.com> wrote:

> Hi -
> 
> > > > [...]
> > > > Distributions are very eager to update kernels even in stable periods of the 
> > > > distro lifetime - they are much less willing to update user-space packages.
> > > > [...]
> > > 
> > > Sorry, er, what?  What distributions eagerly upgrade kernels in stable 
> > > periods, were it not primarily motivated by security fixes? [...]
> > 
> > Please check the popular distro called 'Fedora' for example
> 
> I do believe I've heard of it.  According to fedora bodhi, there have
> been 18 kernel updates issues for fedora 11 since its release, of
> which 12 were for purely security updates, and most of the other six
> also contain security fixes.  None are described as 'enhancement'
> updates.  Oh, what about fedora 12?  8 updates total, of which 5 are
> security only, one for drm showstoppers, others including security
> fixes, again 0 tagged as 'enhancement'.
> 
> So where is that "eagerness" again??  My sense is that most users are
> happy to leave a stable kernel running as long as possible, and
> distributions know this.  You surely must understand that the lkml
> demographics are different.
> 
> > and its kernel upgrade policies.
> 
> [citation needed]

You are quite wrong, despite the sarcastic tone you are attempting to use, and 
this is distro kernel policy 101.

For distros such as Fedora it's simpler to support the same kernel version 
across many older versions of the distro than having to support different 
kernel versions.

Check Fedora 12 for example. Four months ago it was released with kernel 
v2.6.31:

 http://download.fedora.redhat.com/pub/fedora/linux/releases/12/Fedora/x86_64/os/Packages/kernel-2.6.31.5-127.fc12.x86_64.rpm

But if you update a Fedora 12 installation today you'll get kernel v2.6.32:

 http://download.fedora.redhat.com/pub/fedora/linux/updates/12/SRPMS/kernel-2.6.32.9-70.fc12.src.rpm

As a result you'll get a new 2.6.32 kernel on Fedora 12.

The end result is what I said in the previous mail: that you'll get a newer 
kernel even on a stable distro - while user-space packages will only be 
updated if there's a security issue (and even then there's no version jump 
like for the kernel).

> > > [...] What users eagerly replace their kernels?
> >
> > Those 99% who click on the 'install 193 updates' popup.
> 
> That's not "eager".  That's "I'm exasperated from guessing what's really 
> important; let's not have so many updates; meh".

Erm, fact is, 99% [WAG] of the users click on the update button and accept 
whatever kernel version the distro update offers them.

	Ingo

^ permalink raw reply	[flat|nested] 390+ messages in thread

* Re: [RFC] Unify KVM kernel-space and user-space code into a single project
  2010-03-18 13:46                                                         ` Avi Kivity
@ 2010-03-18 13:57                                                           ` Ingo Molnar
  2010-03-18 14:25                                                             ` Avi Kivity
  0 siblings, 1 reply; 390+ messages in thread
From: Ingo Molnar @ 2010-03-18 13:57 UTC (permalink / raw)
  To: Avi Kivity
  Cc: Frank Ch. Eigler, Alexander Graf, Anthony Liguori, Zhang, Yanmin,
	Peter Zijlstra, Sheng Yang, linux-kernel, kvm, Marcelo Tosatti,
	oerg Roedel, Jes Sorensen, Gleb Natapov, Zachary Amsden,
	ziteng.huang, Arnaldo Carvalho de Melo, Frédéric Weisbecker


* Avi Kivity <avi@redhat.com> wrote:

> On 03/18/2010 03:31 PM, Ingo Molnar wrote:
> >* Avi Kivity<avi@redhat.com>  wrote:
> >
> >>On 03/18/2010 03:02 PM, Ingo Molnar wrote:
> >>>>[...] What users eagerly replace their kernels?
> >>>Those 99% who click on the 'install 193 updates' popup.
> >>>
> >>Of which 1 is the kernel, and 192 are userspace updates (of which one may be
> >>qemu).
> >I think you didnt understand my (tersely explained) point - which is probably
> >my fault. What i said is:
> >
> >  - distros update the kernel first. Often in stable releases as well if
> >    there's a new kernel released. (They must because it provides new hardware
> >    enablement and other critical changes they generally cannot skip.)
> 
> No, they don't. [...]

I just replied to Frank Ch. Eigler with a specific example that shows how this 
happens - and believe me, it happens.

> [...]  RHEL 5 is still on 2.6.18, for example.  Users
> don't like their kernels updated unless absolutely necessary, with
> good reason.

Nope - RHEL 5 is on a 2.6.18 base for entirely different reasons.

> Kernel updates = reboots.

If you check the update frequency of RHEL 5 kernels you'll see that it's 
comparable to that of Fedora.

> >  - Qemu on the other hand is not upgraded with (nearly) that level of urgency.
> >    Completely new versions will generally have to wait for the next distro
> >    release.
> 
> F12 recently updated to 2.6.32.  This is probably due to 2.6.31.stable 
> dropping away, and no capacity at Fedora to maintain it on their own.  So 
> they are caught in a bind - stay on 2.6.31 and expose users to security 
> vulnerabilities or move to 2.6.32 and cause regressions.  Not a happy 
> choice.

Happy choice or not, this is what I said the distro practice is these days. (I 
don't know all the distros that well, so I'm sure there are differences.)

> > With in-kernel tools the kernel and the tooling that accompanies the kernel
> > are upgraded in the same low-latency pathway. That is a big plus if you are
> > offering things like instrumentation (which perf does), which relates closely
> > to the kernel.
> >
> > Furthermore, many distros package up the latest -git kernel as well. They
> > almost never do that with user-space packages.
> 
> I'm sure if we ask the Fedora qemu maintainer to package qemu-kvm.git 
> they'll consider it favourably.  Isn't that what rawhide is for?

Rawhide is generally for the latest released versions, to ready them for the 
next distro release - with a special exception for the kernel, which has a 
special position due to being a hardware enabler and because it has an 
extremely predictable release schedule of every 90 days (+- 10 days).

Very rarely do distro people jump versions for things like GCC or Xorg or 
Gnome/KDE - they've been burned enough times by unexpected delays in those 
projects to be really loath to do it.

Qemu might get an exception - dunno, you could ask. My point still holds: by 
hosting KVM user-space bits in the kernel together with the rest of KVM you 
get version parity - which has clear advantages.

You also might have more luck with a bleeding-edge distro such as Gentoo.

> >Let me give you a specific example:
> >
> >I'm running Fedora Rawhide with 2.6.34-rc1 right now on my main desktop, and
> >that comes with perf-2.6.34-0.10.rc1.git0.fc14.noarch.
> >
> >My rawhide box has qemu-kvm-0.12.3-3.fc14.x86_64 installed. That's more than a
> >1000 Qemu commits older than the latest Qemu development branch.
> >
> >So by being part of the kernel repo there's lower latency upgrades and earlier
> >and better testing available on most distros.
> >
> >You made it very clear that you dont want that, but please dont try to claim
> >that those advantages do not exist - they are very much real and we are making
> >good use of it.
> 
> I don't mind at all if rawhide users run on the latest and greatest, but 
> release users deserve a little more stability.

What are you suggesting, that released versions of KVM are not reliable? Of 
course any tools/ bits are release engineered just as much as the rest of KVM 
...

	Ingo

^ permalink raw reply	[flat|nested] 390+ messages in thread

* Re: [RFC] Unify KVM kernel-space and user-space code into a single project
  2010-03-18 13:44                                                         ` Daniel P. Berrange
@ 2010-03-18 13:59                                                           ` Ingo Molnar
  2010-03-18 14:06                                                               ` John Kacur
  0 siblings, 1 reply; 390+ messages in thread
From: Ingo Molnar @ 2010-03-18 13:59 UTC (permalink / raw)
  To: Daniel P. Berrange
  Cc: Avi Kivity, Frank Ch. Eigler, Alexander Graf, Anthony Liguori,
	Zhang, Yanmin, Peter Zijlstra, Sheng Yang, linux-kernel, kvm,
	Marcelo Tosatti, oerg Roedel, Jes Sorensen, Gleb Natapov,
	Zachary Amsden, ziteng.huang, Arnaldo Carvalho de Melo,
	Frédéric Weisbecker


* Daniel P. Berrange <berrange@redhat.com> wrote:

> On Thu, Mar 18, 2010 at 02:31:24PM +0100, Ingo Molnar wrote:
> > 
> > * Avi Kivity <avi@redhat.com> wrote:
> > 
> > > On 03/18/2010 03:02 PM, Ingo Molnar wrote:
> > > >
> > > >> [...] What users eagerly replace their kernels?
> > > >
> > > > Those 99% who click on the 'install 193 updates' popup.
> > > >
> > > 
> > > Of which 1 is the kernel, and 192 are userspace updates (of which one may be 
> > > qemu).
> > 
> > I think you didnt understand my (tersely explained) point - which is probably 
> > my fault. What i said is:
> > 
> >  - distros update the kernel first. Often in stable releases as well if 
> >    there's a new kernel released. (They must because it provides new hardware
> >    enablement and other critical changes they generally cannot skip.)
> > 
> >  - Qemu on the other hand is not upgraded with (nearly) that level of urgency.
> >    Completely new versions will generally have to wait for the next distro
> >    release.
> 
> This has nothing todo with them being in separate source repos. We could 
> update QEMU to new major feature releaes with the same frequency in a Fedora 
> release, but we delibrately choose not to rebase the QEMU userspace because 
> experiance has shown the downside from new bugs / regressions outweighs the 
> benefit of any new features.
> 
> The QEMU updates in stable Fedora trees, now just follow the minor bugfix 
> release stream provided by QEMU & those arrive in Fedora with little 
> noticable delay.

That is exactly what I said: Qemu and most user-space packages are on a 
'slower' update track than the kernel - generally only updated for minor 
releases.

My further point was that the kernel, on the other hand, gets updated more 
frequently, and as such any user-space tool bits hosted in the kernel repo get 
updated more frequently as well.

Thanks,

	Ingo

^ permalink raw reply	[flat|nested] 390+ messages in thread

* Re: [RFC] Unify KVM kernel-space and user-space code into a single  project
  2010-03-18 13:59                                                           ` Ingo Molnar
@ 2010-03-18 14:06                                                               ` John Kacur
  0 siblings, 0 replies; 390+ messages in thread
From: John Kacur @ 2010-03-18 14:06 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Daniel P. Berrange, Avi Kivity, Frank Ch. Eigler, Alexander Graf,
	Anthony Liguori, Zhang, Yanmin, Peter Zijlstra, Sheng Yang,
	linux-kernel, kvm, Marcelo Tosatti, oerg Roedel, Jes Sorensen,
	Gleb Natapov, Zachary Amsden, ziteng.huang,
	Arnaldo Carvalho de Melo, Frédéric Weisbecker

On Thu, Mar 18, 2010 at 2:59 PM, Ingo Molnar <mingo@elte.hu> wrote:
>
> * Daniel P. Berrange <berrange@redhat.com> wrote:
>
>> On Thu, Mar 18, 2010 at 02:31:24PM +0100, Ingo Molnar wrote:
>> >
>> > * Avi Kivity <avi@redhat.com> wrote:
>> >
>> > > On 03/18/2010 03:02 PM, Ingo Molnar wrote:
>> > > >
>> > > >> [...] What users eagerly replace their kernels?
>> > > >
>> > > > Those 99% who click on the 'install 193 updates' popup.
>> > > >
>> > >
>> > > Of which 1 is the kernel, and 192 are userspace updates (of which one may be
>> > > qemu).
>> >
>> > I think you didnt understand my (tersely explained) point - which is probably
>> > my fault. What i said is:
>> >
>> >  - distros update the kernel first. Often in stable releases as well if
>> >    there's a new kernel released. (They must because it provides new hardware
>> >    enablement and other critical changes they generally cannot skip.)
>> >
>> >  - Qemu on the other hand is not upgraded with (nearly) that level of urgency.
>> >    Completely new versions will generally have to wait for the next distro
>> >    release.
>>
>> This has nothing to do with them being in separate source repos. We could
>> update QEMU to new major feature releases with the same frequency in a Fedora
>> release, but we deliberately choose not to rebase the QEMU userspace because
>> experience has shown the downside from new bugs / regressions outweighs the
>> benefit of any new features.
>>
>> The QEMU updates in stable Fedora trees now just follow the minor bugfix
>> release stream provided by QEMU & those arrive in Fedora with little
>> noticeable delay.
>
> That is exactly what i said: Qemu and most user-space packages are on a
> 'slower' update track than the kernel: generally updated for minor releases.
>
> My further point was that the kernel on the other hand gets updated more
> frequently and as such, any user-space tool bits hosted in the kernel repo get
> updated more frequently as well.
>
> Thanks,
>
>        Ingo

Just to play devil's advocate, let's not mix up the development model with the
distribution model. There is nothing to stop packagers and distributors from
providing separate kernel "proper" packages and perf tools packages.

It might even make good sense, assuming backwards compatibility, for distros
that have conservative policies about new kernel versions to provide newer
perf tools packages with older kernels.

John

^ permalink raw reply	[flat|nested] 390+ messages in thread

* Re: [RFC] Unify KVM kernel-space and user-space code into a single project
  2010-03-18 13:36                                                     ` Avi Kivity
@ 2010-03-18 14:09                                                       ` Ingo Molnar
  2010-03-18 14:38                                                         ` Avi Kivity
  0 siblings, 1 reply; 390+ messages in thread
From: Ingo Molnar @ 2010-03-18 14:09 UTC (permalink / raw)
  To: Avi Kivity
  Cc: Anthony Liguori, Zhang, Yanmin, Peter Zijlstra, Sheng Yang,
	linux-kernel, kvm, Marcelo Tosatti, oerg Roedel, Jes Sorensen,
	Gleb Natapov, Zachary Amsden, ziteng.huang,
	Arnaldo Carvalho de Melo, Frédéric Weisbecker


* Avi Kivity <avi@redhat.com> wrote:

> > That is not what i said. I said they are closely related, and where 
> > technologies are closely related, project proximity turns into project 
> > unification at a certain stage.
> 
> I really don't see how.  So what if both qemu and kvm implement an i8254?  
> They can't share any code since the internal APIs are so different. [...]

I wouldnt jump to assumptions there. perf shares some facilities with the 
kernel on the source code level - they can be built both in the kernel and in 
user-space.

But my main thought wasnt even to actually share the implementation - but to 
actually synchronize when a piece of device emulation moves into the kernel. 
It is arguably bad for performance in most cases when Qemu handles a given 
device - so all the common devices should be kernel accelerated.

The version and testing matrix would be simplified significantly as well: as 
kernel and qemu goes hand in hand, they are always on the same version.

> [...] Even worse for the x86 emulator as qemu and kvm are fundamentally 
> different.

So is it your argument that the difference and the duplication in x86 
instruction emulation is a good thing? You said it some time ago that
the kvm x86 emulator was very messy and you wish it was cleaner.

While qemu's is indeed rather different (it's partly a translator/JIT), i'm 
sure the decoder logic could be shared - and qemu has a slow-path 
full-emulation fallback in any case, which is similar to what in-kernel 
emulator does (IIRC ...).

That might have changed meanwhile.

	Ingo

^ permalink raw reply	[flat|nested] 390+ messages in thread

* Re: [RFC] Unify KVM kernel-space and user-space code into a single project
  2010-03-18 14:06                                                               ` John Kacur
  (?)
@ 2010-03-18 14:11                                                               ` Ingo Molnar
  -1 siblings, 0 replies; 390+ messages in thread
From: Ingo Molnar @ 2010-03-18 14:11 UTC (permalink / raw)
  To: John Kacur
  Cc: Daniel P. Berrange, Avi Kivity, Frank Ch. Eigler, Alexander Graf,
	Anthony Liguori, Zhang, Yanmin, Peter Zijlstra, Sheng Yang,
	linux-kernel, kvm, Marcelo Tosatti, oerg Roedel, Jes Sorensen,
	Gleb Natapov, Zachary Amsden, ziteng.huang,
	Arnaldo Carvalho de Melo, Frédéric Weisbecker


* John Kacur <jkacur@redhat.com> wrote:

> On Thu, Mar 18, 2010 at 2:59 PM, Ingo Molnar <mingo@elte.hu> wrote:
> >
> > * Daniel P. Berrange <berrange@redhat.com> wrote:
> >
> >> On Thu, Mar 18, 2010 at 02:31:24PM +0100, Ingo Molnar wrote:
> >> >
> >> > * Avi Kivity <avi@redhat.com> wrote:
> >> >
> >> > > On 03/18/2010 03:02 PM, Ingo Molnar wrote:
> >> > > >
> >> > > >> [...] What users eagerly replace their kernels?
> >> > > >
> >> > > > Those 99% who click on the 'install 193 updates' popup.
> >> > > >
> >> > >
> >> > > Of which 1 is the kernel, and 192 are userspace updates (of which one may be
> >> > > qemu).
> >> >
> >> > I think you didnt understand my (tersely explained) point - which is probably
> >> > my fault. What i said is:
> >> >
> >> >  - distros update the kernel first. Often in stable releases as well if
> >> >    there's a new kernel released. (They must because it provides new hardware
> >> >    enablement and other critical changes they generally cannot skip.)
> >> >
> >> >  - Qemu on the other hand is not upgraded with (nearly) that level of urgency.
> >> >    Completely new versions will generally have to wait for the next distro
> >> >    release.
> >>
> >> This has nothing to do with them being in separate source repos. We could
> >> update QEMU to new major feature releases with the same frequency in a Fedora
> >> release, but we deliberately choose not to rebase the QEMU userspace because
> >> experience has shown the downside from new bugs / regressions outweighs the
> >> benefit of any new features.
> >>
> >> The QEMU updates in stable Fedora trees now just follow the minor bugfix
> >> release stream provided by QEMU & those arrive in Fedora with little
> >> noticeable delay.
> >
> > That is exactly what i said: Qemu and most user-space packages are on a
> > 'slower' update track than the kernel: generally updated for minor releases.
> >
> > My further point was that the kernel on the other hand gets updated more
> > frequently and as such, any user-space tool bits hosted in the kernel repo get
> > updated more frequently as well.
> >
> > Thanks,
> >
> >        Ingo
> 
> Just to play devil's advocate, let's not mix up the development model with 
> the distribution model. There is nothing to stop packagers and distributors 
> from providing separate kernel "proper" packages and perf tools packages.
> 
> It might even make good sense, assuming backwards compatibility, for distros 
> that have conservative policies about new kernel versions to provide newer 
> perf tools packages with older kernels.

Of course. Some distros are also very conservative about updating the kernel 
at all.

I'm mostly talking about the distros that are at the frontier of kernel 
development: those with fresh packages, those which provide eager 
bleeding-edge testers and developers.

	Ingo

^ permalink raw reply	[flat|nested] 390+ messages in thread

* Re: [RFC] Unify KVM kernel-space and user-space code into a single project
  2010-03-18 13:23                                             ` Jes Sorensen
@ 2010-03-18 14:22                                               ` Ingo Molnar
  2010-03-18 14:45                                                 ` Jes Sorensen
  0 siblings, 1 reply; 390+ messages in thread
From: Ingo Molnar @ 2010-03-18 14:22 UTC (permalink / raw)
  To: Jes Sorensen
  Cc: Anthony Liguori, Avi Kivity, Zhang, Yanmin, Peter Zijlstra,
	Sheng Yang, linux-kernel, kvm, Marcelo Tosatti, oerg Roedel,
	Gleb Natapov, Zachary Amsden, Arnaldo Carvalho de Melo,
	Frédéric Weisbecker


* Jes Sorensen <Jes.Sorensen@redhat.com> wrote:

> On 03/18/10 11:58, Ingo Molnar wrote:
> >
> >* Jes Sorensen<Jes.Sorensen@redhat.com>  wrote:
> >>Thats a very glorified statement but it's not reality, sorry. You can do
> >>that with something like perf because it's so small and development of perf
> >>is limited to a very small group of developers.
> >
> > I was not talking about just perf: i am also talking about the arch/x86/ 
> > unification which is 200+ KLOC of highly non-trivial kernel code with 
> > hundreds of contributors and with 8000+ commits in the past two years.
> 
> Sorry but you cannot compare merging two chunks of kernel code that 
> originated from the same base, with the efforts of mixing a large userland 
> project with a kernel component. Apples and oranges.

That's true to a certain degree, but combined with the perf experience it's 
all rather clear.

Similar arguments were made against the x86 unification and against perf. 
Similar arguments were made against KVM and in favor of Xen years ago - back 
when few of you knew about it ;-)

These are all repeating patterns in my experience.

You could fairly contrast that with a _failed_ unification perhaps - but i'm 
not aware of any such failed unification. (please educate me if you are)

The thing is, unifications are rare in the OSS space not because they dont 
make sense technically (to the contrary), they are rare due to blind inertia 
(why change if we managed to muddle through with the current scheme?) and to a 
certain degree due to the egos involved ;-)

As such we have a proliferation of packages in Linux, and we'd be much better 
off in a more focused fashion. And whenever i see that in the kernel's context 
i'll mention it - as it happened here too.

> > Also, it applies to perf as well: people said exactly that a year ago: 
> > 'perf has it easy to be clean as it is small, once it gets as large as 
> > Oprofile tooling it will be in the same messy situation'.
> >
> > Today perf has more features than Oprofile, has a larger and more complex 
> > code base, has more contributors, and no, it's not in the same messy 
> > situation at all.
> 
> Both perf and oprofile are still relatively small projects in comparison to 
> QEMU.

So is your argument that the unification does not make sense due to size? 
Would a smaller Qemu be more appropriate for this purpose?

> > So whatever you think of large, unified projects, you are quite clearly 
> > mistaken. I have done and maintained through two different types of 
> > unifications and the experience was very similar: both developers and 
> > users (and maintainers) are much better off.
> 
> You believe that I am wrong in my assessment of unified projects, and I 
> obviously think you are mistaken and underestimating the cost and effects of 
> trying to merge the two.
> 
> Well I think we are just going to agree to disagree on this one. I am not 
> against merging projects where it makes sense, but in this particular case I 
> am strongly convinced the loss would be much greater than the gain.

I wish you said that based on first hand negative experience with 
unifications, not based on just pure speculation.

(and yes, i speculate too, but at least with some basis)

	Ingo

^ permalink raw reply	[flat|nested] 390+ messages in thread

* Re: [RFC] Unify KVM kernel-space and user-space code into a single project
  2010-03-18 13:01                                                     ` John Kacur
  (?)
@ 2010-03-18 14:25                                                     ` Ingo Molnar
  2010-03-18 14:39                                                       ` Frank Ch. Eigler
  -1 siblings, 1 reply; 390+ messages in thread
From: Ingo Molnar @ 2010-03-18 14:25 UTC (permalink / raw)
  To: John Kacur
  Cc: Frank Ch. Eigler, Avi Kivity, Alexander Graf, Anthony Liguori,
	Zhang, Yanmin, Peter Zijlstra, Sheng Yang, linux-kernel, kvm,
	Marcelo Tosatti, oerg Roedel, Jes Sorensen, Gleb Natapov,
	Zachary Amsden, ziteng.huang, Arnaldo Carvalho de Melo,
	Frédéric Weisbecker


* John Kacur <jkacur@redhat.com> wrote:

> On Thu, Mar 18, 2010 at 1:33 PM, Frank Ch. Eigler <fche@redhat.com> wrote:
> > Ingo Molnar <mingo@elte.hu> writes:
> >
> >> [...]
> >> Distributions are very eager to update kernels even in stable periods of the
> >> distro lifetime - they are much less willing to update user-space packages.
> >> [...]
> >
> > Sorry, er, what?  What distributions eagerly upgrade kernels in stable
> > periods, were it not primarily motivated by security fixes?  What users
> > eagerly replace their kernels?
> >
> 
> Us guys reading and participating on the list. ;)

I'd like to second that - i'm actually quite happy to update the distro 
kernel. Also, i rarely have any problems even with bleeding edge kernels in 
rawhide - they are working pretty smoothly.

A large xorg update showing up in yum update gives me the cringe though ;-)

	Ingo

^ permalink raw reply	[flat|nested] 390+ messages in thread

* Re: [RFC] Unify KVM kernel-space and user-space code into a single project
  2010-03-18 13:57                                                           ` Ingo Molnar
@ 2010-03-18 14:25                                                             ` Avi Kivity
  2010-03-18 14:36                                                               ` Ingo Molnar
  0 siblings, 1 reply; 390+ messages in thread
From: Avi Kivity @ 2010-03-18 14:25 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Frank Ch. Eigler, Alexander Graf, Anthony Liguori, Zhang, Yanmin,
	Peter Zijlstra, Sheng Yang, linux-kernel, kvm, Marcelo Tosatti,
	oerg Roedel, Jes Sorensen, Gleb Natapov, Zachary Amsden,
	ziteng.huang, Arnaldo Carvalho de Melo, Frédéric Weisbecker

On 03/18/2010 03:57 PM, Ingo Molnar wrote:
>
>> [...]  RHEL 5 is still on 2.6.18, for example.  Users
>> don't like their kernels updated unless absolutely necessary, with
>> good reason.
>>      
> Nope - RHEL 5 is on a 2.6.18 base for entirely different reasons.
>    

All the reasons have 'stability' in them.

>> Kernel updates = reboots.
>>      
> If you check the update frequency of RHEL 5 kernels you'll see that it's
> comparable to that of Fedora.
>    

I'm sorry to say that's pretty bad.  Users don't want to update their 
kernels.

>>>   - Qemu on the other hand is not upgraded with (nearly) that level of urgency.
>>>     Completely new versions will generally have to wait for the next distro
>>>     release.
>>>        
>> F12 recently updated to 2.6.32.  This is probably due to 2.6.31.stable
>> dropping away, and no capacity at Fedora to maintain it on their own.  So
>> they are caught in a bind - stay on 2.6.31 and expose users to security
>> vulnerabilities or move to 2.6.32 and cause regressions.  Not a happy
>> choice.
>>      
> Happy choice or not, this is what i said is the distro practice these days. (i
> dont know all the distros that well so i'm sure there's differences)
>    

So in addition to all the normal kernel regressions, you want to force 
tools/kvm/ regressions on users.

>> I don't mind at all if rawhide users run on the latest and greatest, but
>> release users deserve a little more stability.
>>      
> What are you suggesting, that released versions of KVM are not reliable? Of
> course any tools/ bits are release engineered just as much as the rest of KVM
> ...
>    

No, I am suggesting qemu-kvm.git is not as stable as released versions 
(and won't get fixes backported).  Keep in mind that unlike many 
userspace applications, qemu exposes an ABI to guests which we must keep 
compatible.

-- 
error compiling committee.c: too many arguments to function


^ permalink raw reply	[flat|nested] 390+ messages in thread

* Re: [RFC] Unify KVM kernel-space and user-space code into a single project
  2010-03-18 14:25                                                             ` Avi Kivity
@ 2010-03-18 14:36                                                               ` Ingo Molnar
  2010-03-18 14:51                                                                 ` Avi Kivity
  0 siblings, 1 reply; 390+ messages in thread
From: Ingo Molnar @ 2010-03-18 14:36 UTC (permalink / raw)
  To: Avi Kivity
  Cc: Frank Ch. Eigler, Alexander Graf, Anthony Liguori, Zhang, Yanmin,
	Peter Zijlstra, Sheng Yang, linux-kernel, kvm, Marcelo Tosatti,
	oerg Roedel, Jes Sorensen, Gleb Natapov, Zachary Amsden,
	ziteng.huang, Arnaldo Carvalho de Melo, Frédéric Weisbecker


* Avi Kivity <avi@redhat.com> wrote:

> > Happy choice or not, this is what i said is the distro practice these 
> > days. (i dont know all the distros that well so i'm sure there's 
> > differences)
> 
> So in addition to all the normal kernel regressions, you want to force 
> tools/kvm/ regressions on users.

So instead you force a NxN compatibility matrix [all versions of qemu combined 
with all versions of the kernel] instead of a linear N versions matrix with a 
clear focus on the last version. Brilliant engineering i have to say ;-)
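
(For illustration, with made-up numbers: five qemu releases in the field 
against five kernel releases is 25 combinations to validate, versus five 
matched pairs if the two are released in lockstep.)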

Also, by your argument the kernel should be split up into a micro-kernel, with 
different packages for KVM, scheduler, drivers, upgradeable separately.

That would be a nightmare. (i can detail many facets of that nightmare if you 
insist but i'll spare the electrons for now) Fortunately few kernel developers 
share your views about this.

> > > I don't mind at all if rawhide users run on the latest and greatest, but 
> > > release users deserve a little more stability.
> >
> > What are you suggesting, that released versions of KVM are not reliable? 
> > Of course any tools/ bits are release engineered just as much as the rest 
> > of KVM ...
> 
> No, I am suggesting qemu-kvm.git is not as stable as released versions (and 
> won't get fixes backported).  Keep in mind that unlike many userspace 
> applications, qemu exposes an ABI to guests which we must keep compatible.

I think you still dont understand it: if a tool moves to the kernel repo, then 
it is _released stable_ together with the next stable kernel.

I.e. you'd get a stable qemu-2.6.34 in essence, when v2.6.34 is released. You 
get minor updates with 2.6.34.1, 2.6.34.2, 2.6.34.3, etc - while development 
continues.

I.e. you get _more_ stability, because a matching kernel is released with a 
matching Qemu.

Qemu might have a different release schedule. Which, i argue, is not a good 
thing for exactly that reason :-) If it moved to tools/kvm/ it would get the 
same 90-day release frequency, merge window and stabilization window 
treatment as the upstream kernel.

Furthermore, users can also run experimental versions of qemu together with 
experimental versions of the kernel, by running something like 2.6.34-rc1 on 
Rawhide. Even if they dont download the latest qemu git and build it.

I.e. clearly _more_ is possible in such a scheme.

	Ingo

^ permalink raw reply	[flat|nested] 390+ messages in thread

* Re: [RFC] Unify KVM kernel-space and user-space code into a single project
  2010-03-18 14:09                                                       ` Ingo Molnar
@ 2010-03-18 14:38                                                         ` Avi Kivity
  2010-03-18 17:16                                                           ` Ingo Molnar
  0 siblings, 1 reply; 390+ messages in thread
From: Avi Kivity @ 2010-03-18 14:38 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Anthony Liguori, Zhang, Yanmin, Peter Zijlstra, Sheng Yang,
	linux-kernel, kvm, Marcelo Tosatti, oerg Roedel, Jes Sorensen,
	Gleb Natapov, Zachary Amsden, ziteng.huang,
	Arnaldo Carvalho de Melo, Frédéric Weisbecker

On 03/18/2010 04:09 PM, Ingo Molnar wrote:
> * Avi Kivity<avi@redhat.com>  wrote:
>
>    
>>> That is not what i said. I said they are closely related, and where
>>> technologies are closely related, project proximity turns into project
>>> unification at a certain stage.
>>>        
>> I really don't see how.  So what if both qemu and kvm implement an i8254?
>> They can't share any code since the internal APIs are so different. [...]
>>      
> I wouldnt jump to assumptions there. perf shares some facilities with the
> kernel on the source code level - they can be built both in the kernel and in
> user-space.
>
> But my main thought wasnt even to actually share the implementation - but to
> actually synchronize when a piece of device emulation moves into the kernel.
> It is arguably bad for performance in most cases when Qemu handles a given
> device - so all the common devices should be kernel accelerated.
>
> The version and testing matrix would be simplified significantly as well: as
> kernel and qemu goes hand in hand, they are always on the same version.
>    

So, you propose to allow running tools/kvm/ only on the kernel it was 
shipped with?

Otherwise the testing matrix isn't simplified.

>> [...] Even worse for the x86 emulator as qemu and kvm are fundamentally
>> different.
>>      
> So is it your argument that the difference and the duplication in x86
> instruction emulation is a good thing?

Of course it isn't a good thing, but it is unavoidable.  Qemu compiles 
code just-in-time to avoid interpretation overhead, while kvm emulates 
one instruction at a time.  No caching is possible, especially with 
ept/npt, since the guest is free to manipulate memory with no 
notification to the host.  Qemu also supports the full instruction set 
while kvm only implements what is necessary.  Qemu is a 
multi-source/multi-target translator while kvm's emulator is x86 specific.

> You said it some time ago that
> the kvm x86 emulator was very messy and you wish it was cleaner.
>    

It's still messy but is being cleaned up.

> While qemu's is indeed rather different (it's partly a translator/JIT), i'm
> sure the decoder logic could be shared - and qemu has a slow-path
> full-emulation fallback in any case, which is similar to what in-kernel
> emulator does (IIRC ...).
>
> That might have changed meanwhile.
>    

IIUC it only ever translates.

-- 
error compiling committee.c: too many arguments to function


^ permalink raw reply	[flat|nested] 390+ messages in thread

* Re: [RFC] Unify KVM kernel-space and user-space code into a single project
  2010-03-17  8:10                                   ` [RFC] Unify KVM kernel-space and user-space code into a single project Ingo Molnar
  2010-03-18  8:20                                     ` Avi Kivity
  2010-03-18  8:44                                     ` Jes Sorensen
@ 2010-03-18 14:38                                     ` Anthony Liguori
  2010-03-18 14:44                                     ` Anthony Liguori
  3 siblings, 0 replies; 390+ messages in thread
From: Anthony Liguori @ 2010-03-18 14:38 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Avi Kivity, Zhang, Yanmin, Peter Zijlstra, Sheng Yang,
	linux-kernel, kvm, Marcelo Tosatti, oerg Roedel, Jes Sorensen,
	Gleb Natapov, Zachary Amsden, ziteng.huang,
	Arnaldo Carvalho de Melo, Frédéric Weisbecker

[-- Attachment #1: Type: text/plain, Size: 1836 bytes --]

On 03/17/2010 03:10 AM, Ingo Molnar wrote:
> * Anthony Liguori<anthony@codemonkey.ws>  wrote:
>
>    
>> On 03/16/2010 12:39 PM, Ingo Molnar wrote:
>>      
>>>> If we look at the use-case, it's going to be something like, a user is
>>>> creating virtual machines and wants to get performance information about
>>>> them.
>>>>
>>>> Having to run a separate tool like perf is not going to be what they would
>>>> expect they had to do.  Instead, they would either use their existing GUI
>>>> tool (like virt-manager) or they would use their management interface
>>>> (either QMP or libvirt).
>>>>
>>>> The complexity of interaction is due to the fact that perf shouldn't be a
>>>> stand alone tool.  It should be a library or something with a programmatic
>>>> interface that another tool can make use of.
>>>>          
>>> But ... a GUI interface/integration is of course possible too, and it's being
>>> worked on.
>>>
>>> perf is mainly a kernel developer tool, and kernel developers generally dont
>>> use GUIs to do their stuff: which is the (sole) reason why its first ~850
>>> commits of tools/perf/ were done without a GUI. We go where our developers
>>> are.
>>>
>>> In any case it's not an excuse to have no proper command-line tooling. In fact
>>> if you cannot get simpler, more atomic command-line tooling right then you'll
>>> probably doubly suck at doing a GUI as well.
>>>        
>> It's about who owns the user interface.
>>
>> If qemu owns the user interface, than we can satisfy this in a very simple
>> way by adding a perf monitor command.  If we have to support third party
>> tools, then it significantly complicates things.
>>      
> Of course illogical modularization complicates things 'significantly'.
>    

Ok.  Then apply this to the kernel.  I'm then happy to take patches.

Regards,

Anthony Liguori


[-- Attachment #2: qemu-linux.patch --]
[-- Type: text/plain, Size: 1204 bytes --]

commit 84b84db054e83e7686b80fad9f8d2aa87aade1a1
Author: Anthony Liguori <aliguori@us.ibm.com>
Date:   Thu Mar 18 09:35:29 2010 -0500

    Bring QEMU into the Linux kernel tree
    
    Ingo is under the impression that this will result in a massive improvement in
    the usability of QEMU.
    
    Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>

diff --git a/.gitmodules b/.gitmodules
new file mode 100644
index 0000000..76cdb68
--- /dev/null
+++ b/.gitmodules
@@ -0,0 +1,3 @@
+[submodule "tools/qemu"]
+	path = tools/qemu
+	url = git://git.qemu.org/qemu.git
diff --git a/MAINTAINERS b/MAINTAINERS
index 03f38c1..6275796 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -4427,6 +4427,12 @@ M:	Robert Jarzmik <robert.jarzmik@free.fr>
 L:	rtc-linux@googlegroups.com
 S:	Maintained
 
+QEMU
+M:      Anthony Liguori <aliguori@us.ibm.com>
+L:      qemu-devel@nongnu.org
+S:      Maintained
+F:      tools/qemu
+
 QLOGIC QLA2XXX FC-SCSI DRIVER
 M:	Andrew Vasquez <andrew.vasquez@qlogic.com>
 M:	linux-driver@qlogic.com
diff --git a/tools/qemu b/tools/qemu
new file mode 160000
index 0000000..e5322f7
--- /dev/null
+++ b/tools/qemu
@@ -0,0 +1 @@
+Subproject commit e5322f76a72352eea8eb511390c27726b64e5a87
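
(A checkout of such a tree would still need the gitlink populated after cloning 
- presumably something along the lines of "git submodule update --init tools/qemu" 
- before anything under tools/qemu could even be built.)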

^ permalink raw reply related	[flat|nested] 390+ messages in thread

* Re: [RFC] Unify KVM kernel-space and user-space code into a single project
  2010-03-18 14:25                                                     ` Ingo Molnar
@ 2010-03-18 14:39                                                       ` Frank Ch. Eigler
  0 siblings, 0 replies; 390+ messages in thread
From: Frank Ch. Eigler @ 2010-03-18 14:39 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: John Kacur, Avi Kivity, Alexander Graf, Anthony Liguori, Zhang,
	Yanmin, Peter Zijlstra, Sheng Yang, linux-kernel, kvm,
	Marcelo Tosatti, oerg Roedel, Jes Sorensen, Gleb Natapov,
	Zachary Amsden, ziteng.huang, Arnaldo Carvalho de Melo,
	Frédéric Weisbecker

Hi -

On Thu, Mar 18, 2010 at 03:25:04PM +0100, Ingo Molnar wrote:
> [...]
> > Us guys reading and participating on the list. ;)
> 
> I'd like to second that - i'm actually quite happy to update the distro 
> kernel. Also, i have rarely any problems even with bleeding edge kernels in 
> rawhide - they are working pretty smoothly.
> 
> A large xorg update showing up in yum update gives me the cringe though ;-)

From a parochial point of view, that makes perfect sense: someone
else's large software changes are a source of concern.  The same thing
applies to non-LKML people -- ordinary users -- when *your* large
software changes are proposed.

Perhaps this change in perspective would help you see the absurdity of
proposing kernel-2.6.git as a hosting repository for all kinds of
stuff, on the theory that kernel updates get pushed to "eager" users
more frequently than other kinds of updates.  (Never mind that data
shows otherwise.)


- FChE

^ permalink raw reply	[flat|nested] 390+ messages in thread

* Re: [RFC] Unify KVM kernel-space and user-space code into a single project
  2010-03-17  8:10                                   ` [RFC] Unify KVM kernel-space and user-space code into a single project Ingo Molnar
                                                       ` (2 preceding siblings ...)
  2010-03-18 14:38                                     ` Anthony Liguori
@ 2010-03-18 14:44                                     ` Anthony Liguori
  3 siblings, 0 replies; 390+ messages in thread
From: Anthony Liguori @ 2010-03-18 14:44 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Avi Kivity, Zhang, Yanmin, Peter Zijlstra, Sheng Yang,
	linux-kernel, kvm, Marcelo Tosatti, oerg Roedel, Jes Sorensen,
	Gleb Natapov, Zachary Amsden, ziteng.huang,
	Arnaldo Carvalho de Melo, Frédéric Weisbecker

On 03/17/2010 03:10 AM, Ingo Molnar wrote:
>   - move a clean (and minimal) version of the Qemu code base to tools/kvm/, in
>     the upstream kernel repo, and work on that from that point on.
>    

QEMU is about 600k LOC.  We have a mechanism to compile out portions of 
the code but a lot of things are tied together in an intimate way.  In the 
long run, we're working on adding stronger interfaces such that we can 
split components out into libraries that are consumable by other 
applications.

Simply forking the device model won't work.  Well over half of 
our contributors are not KVM developers/users.  If you just 
fork the device models, you start to lose a ton of fixes (look at Xen 
and VirtualBox).

So feel free to either 1) apply my previous patch and then start working 
on a "clean (and minimal)" QEMU or 2) wait to commit my previous patch 
and start sending patches to clean up QEMU.

Absolutely none of this is going to give you a VirtualBox-like GUI for QEMU.

Regards,

Anthony Liguori

^ permalink raw reply	[flat|nested] 390+ messages in thread

* Re: [RFC] Unify KVM kernel-space and user-space code into a single project
  2010-03-18 14:22                                               ` Ingo Molnar
@ 2010-03-18 14:45                                                 ` Jes Sorensen
  2010-03-18 16:54                                                   ` Ingo Molnar
  0 siblings, 1 reply; 390+ messages in thread
From: Jes Sorensen @ 2010-03-18 14:45 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Anthony Liguori, Avi Kivity, Zhang, Yanmin, Peter Zijlstra,
	Sheng Yang, linux-kernel, kvm, Marcelo Tosatti, oerg Roedel,
	Gleb Natapov, Zachary Amsden, Arnaldo Carvalho de Melo,
	Frédéric Weisbecker

On 03/18/10 15:22, Ingo Molnar wrote:
>
> * Jes Sorensen<Jes.Sorensen@redhat.com>  wrote:
>> Both perf and oprofile are still relatively small projects in comparison to
>> QEMU.
>
> So is your argument that the unification does not make sense due to size?
> Would a smaller Qemu be more appropriate for this purpose?

As I have stated repeatedly in this discussion, a unification would hurt
the QEMU development process because it would alienate a large number of
QEMU developers who are *not* Linux kernel users.

QEMU is a lot more complex than you let on.

>> Well I think we are just going to agree to disagree on this one. I am not
>> against merging projects where it makes sense, but in this particular case I
>> am strongly convinced the loss would be much greater than the gain.
>
> I wish you said that based on first hand negative experience with
> unifications, not based on just pure speculation.
>
> (and yes, i speculate too, but at least with some basis)

You still haven't given us a *single* example of unification of
something that wasn't purely linked to the Linux kernel. perf/
oprofile is 100% linked to the Linux kernel, QEMU is not. I wish
you would actually look at what users use QEMU for. As long as you
continue to purely speculate on this, to use your own words, your
arguments are not holding up.

And you are not being consistent either. You have conveniently
continued to ignore my questions about why the file system tools are not
to be merged into the Linux kernel source tree.

Jes

^ permalink raw reply	[flat|nested] 390+ messages in thread

* Re: [RFC] Unify KVM kernel-space and user-space code into a single project
  2010-03-18 14:36                                                               ` Ingo Molnar
@ 2010-03-18 14:51                                                                 ` Avi Kivity
  0 siblings, 0 replies; 390+ messages in thread
From: Avi Kivity @ 2010-03-18 14:51 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Frank Ch. Eigler, Alexander Graf, Anthony Liguori, Zhang, Yanmin,
	Peter Zijlstra, Sheng Yang, linux-kernel, kvm, Marcelo Tosatti,
	oerg Roedel, Jes Sorensen, Gleb Natapov, Zachary Amsden,
	ziteng.huang, Arnaldo Carvalho de Melo, Frédéric Weisbecker

On 03/18/2010 04:36 PM, Ingo Molnar wrote:
> * Avi Kivity<avi@redhat.com>  wrote:
>
>    
>>> Happy choice or not, this is what i said is the distro practice these
>>> days. (i dont know all the distros that well so i'm sure there's
>>> differences)
>>>        
>> So in addition to all the normal kernel regressions, you want to force
>> tools/kvm/ regressions on users.
>>      
> So instead you force a NxN compatibility matrix [all versions of qemu combined
> with all versions of the kernel] instead of a linear N versions matrix with a
> clear focus on the last version. Brilliant engineering i have to say ;-)
>    

Thanks.  In fact we have a QxKxGxT compatibility matrix since we need 
to keep compatibility with guests and with tools.  Since the easiest 
interface to keep compatible is the qemu/kernel interface, allowing the 
kernel and qemu to change independently allows reducing the 
compatibility matrix while still providing some improvements.

Regardless of that I'd keep binary compatibility anyway.  Not everyone 
is on the update treadmill with everything updating every three months 
and those people appreciate stability.  I intend to keep providing it.

> Also, by your argument the kernel should be split up into a micro-kernel, with
> different packages for KVM, scheduler, drivers, upgradeable separately.
>    

Some kernels do provide some of that facility (without being 
microkernels), for example the Windows and RHEL kernels.  So it seems 
people want it.

> That would be a nightmare. (i can detail many facets of that nightmare if you
> insist but i'll spare the electrons for now) Fortunately few kernel developers
> share your views about this.
>    

I'm not sure you know my views about this.

>>>> I don't mind at all if rawhide users run on the latest and greatest, but
>>>> release users deserve a little more stability.
>>>>          
>>> What are you suggesting, that released versions of KVM are not reliable?
>>> Of course any tools/ bits are release engineered just as much as the rest
>>> of KVM ...
>>>        
>> No, I am suggesting qemu-kvm.git is not as stable as released versions (and
>> won't get fixes backported).  Keep in mind that unlike many userspace
>> applications, qemu exposes an ABI to guests which we must keep compatible.
>>      
> I think you still dont understand it: if a tool moves to the kernel repo, then
> it is _released stable_ together with the next stable kernel.
>    

I was confused by the talk about 2.6.34-rc1, which isn't stable.

-- 
error compiling committee.c: too many arguments to function


^ permalink raw reply	[flat|nested] 390+ messages in thread

* Re: [RFC] Unify KVM kernel-space and user-space code into a single project
  2010-03-18 11:48                                               ` Ingo Molnar
  2010-03-18 12:22                                                 ` Avi Kivity
@ 2010-03-18 14:53                                                 ` Anthony Liguori
  2010-03-18 16:13                                                   ` Ingo Molnar
  1 sibling, 1 reply; 390+ messages in thread
From: Anthony Liguori @ 2010-03-18 14:53 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Avi Kivity, Zhang, Yanmin, Peter Zijlstra, Sheng Yang,
	linux-kernel, kvm, Marcelo Tosatti, oerg Roedel, Jes Sorensen,
	Gleb Natapov, Zachary Amsden, ziteng.huang,
	Arnaldo Carvalho de Melo, Frédéric Weisbecker

On 03/18/2010 06:48 AM, Ingo Molnar wrote:
> * Avi Kivity<avi@redhat.com>  wrote:
>
>    
>> On 03/18/2010 12:50 PM, Ingo Molnar wrote:
>>      
>>> * Avi Kivity<avi@redhat.com>   wrote:
>>>
>>>        
>>>>> The moment any change (be it as trivial as fixing a GUI detail or as
>>>>> complex as a new feature) involves two or more packages, development speed
>>>>> slows down to a crawl - while the complexity of the change might be very
>>>>> low!
>>>>>            
>>>> Why is that?
>>>>          
>>> It's very simple: because the contribution latencies and overhead compound,
>>> almost inevitably.
>>>        
>> It's not inevitable, if the projects are badly run, you'll have high
>> latencies, but projects don't have to be badly run.
>>      
> So the 64K dollar question is, why does Qemu still suck?
>    

Why does Linux AIO still suck?  Why do we not have a proper interface in 
userspace for doing asynchronous file system operations?

Why don't we have an interface in userspace to do zero-copy transmit and 
receive of raw network packets?

The lack of a decent userspace API for asynchronous file system 
operations is a huge usability problem for us.  Take a look at the 
complexity of our -drive option.  It's all because the kernel gives us 
sucky interfaces.
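
(A rough illustration - option spellings vary by QEMU version, so treat the 
exact flags as an assumption - is that a single guest disk typically ends up 
specified as something like

   -drive file=guest.img,if=virtio,format=raw,cache=none,aio=native

where cache= and aio= exist largely to pick between the kernel's competing 
buffered, O_DIRECT and native-AIO I/O paths.)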

Regards,

Anthony Liguori

^ permalink raw reply	[flat|nested] 390+ messages in thread

* Re: [RFC] Unify KVM kernel-space and user-space code into a single project
  2010-03-18 13:00                                                   ` Ingo Molnar
  2010-03-18 13:36                                                     ` Avi Kivity
@ 2010-03-18 14:59                                                     ` Anthony Liguori
  2010-03-18 15:17                                                       ` Ingo Molnar
  1 sibling, 1 reply; 390+ messages in thread
From: Anthony Liguori @ 2010-03-18 14:59 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Avi Kivity, Zhang, Yanmin, Peter Zijlstra, Sheng Yang,
	linux-kernel, kvm, Marcelo Tosatti, oerg Roedel, Jes Sorensen,
	Gleb Natapov, Zachary Amsden, ziteng.huang,
	Arnaldo Carvalho de Melo, Frédéric Weisbecker

On 03/18/2010 08:00 AM, Ingo Molnar wrote:
>> [...]  kvm in fact knows nothing about vga, to take your last
>> example. [...]
>>      
> Look at the VGA dirty bitmap optimization a'ka the KVM_GET_DIRTY_LOG ioctl.
>
> See qemu/kvm-all.c's kvm_physical_sync_dirty_bitmap().
>
> It started out as a VGA optimization (also used by live migration) and even
> today it's mostly used by the VGA drivers - albeit a weak one.
>
> I wish there were stronger VGA optimizations implemented, copying the dirty
> bitmap is not a particularly performant solution. (although it's certainly
> better than full emulation) Graphics performance is one of the more painful
> aspects of KVM usability today.
>    

We have to maintain a dirty bitmap because we don't have a paravirtual 
graphics driver.  IOW, someone needs to write an Xorg driver.
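
As a point of reference, a minimal userspace sketch of the KVM_GET_DIRTY_LOG 
query being discussed might look like this (assuming a vm_fd from 
KVM_CREATE_VM and a memory slot registered with KVM_MEM_LOG_DIRTY_PAGES; 
error handling trimmed):

  #include <linux/kvm.h>
  #include <sys/ioctl.h>
  #include <stdlib.h>

  /* Fetch and clear the dirty-page bitmap for one memory slot.  Set bits
   * mark guest pages written since the previous call - this is what
   * qemu's kvm_physical_sync_dirty_bitmap() walks. */
  static void *get_dirty_bitmap(int vm_fd, unsigned int slot, size_t npages)
  {
          /* KVM expects the buffer rounded up to a whole number of longs. */
          size_t bitmap_bytes = ((npages + 63) / 64) * 8;
          struct kvm_dirty_log log = { .slot = slot };

          log.dirty_bitmap = calloc(1, bitmap_bytes);
          if (!log.dirty_bitmap)
                  return NULL;

          if (ioctl(vm_fd, KVM_GET_DIRTY_LOG, &log) < 0) {
                  free(log.dirty_bitmap);
                  return NULL;
          }
          return log.dirty_bitmap;
  }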

Ideally, we could just implement a Linux framebuffer device, right?  
Well, we took that approach in Xen and that sucks even worse because the 
Xorg framebuffer driver doesn't implement any of the optimizations that 
the Linux framebuffer supports and the Xorg driver does not use 
the kernel's interfaces for providing update regions.

Of course, we need to pull X into the kernel to fix this, right?

Any sufficiently complicated piece of software is going to interact with 
a lot of other projects.  The solution is not to pull it all into one 
massive repository.  It's to build relationships and to find ways to 
efficiently work with the various communities.

And we're working on this with X.  We'll have a paravirtual graphics 
driver very soon.  There are no magic solutions.  We need more 
developers working on the hard problems.

Regards,

Anthony Liguori

^ permalink raw reply	[flat|nested] 390+ messages in thread

* Re: [RFC] Unify KVM kernel-space and user-space code into a single project
  2010-03-18 14:59                                                     ` Anthony Liguori
@ 2010-03-18 15:17                                                       ` Ingo Molnar
  2010-03-18 16:11                                                         ` Anthony Liguori
  0 siblings, 1 reply; 390+ messages in thread
From: Ingo Molnar @ 2010-03-18 15:17 UTC (permalink / raw)
  To: Anthony Liguori
  Cc: Avi Kivity, Zhang, Yanmin, Peter Zijlstra, Sheng Yang,
	linux-kernel, kvm, Marcelo Tosatti, oerg Roedel, Jes Sorensen,
	Gleb Natapov, Zachary Amsden, ziteng.huang,
	Arnaldo Carvalho de Melo, Frédéric Weisbecker


* Anthony Liguori <anthony@codemonkey.ws> wrote:

> On 03/18/2010 08:00 AM, Ingo Molnar wrote:
> >>
> >> [...]  kvm in fact knows nothing about vga, to take your last example. 
> >> [...]
> >
> > Look at the VGA dirty bitmap optimization a'ka the KVM_GET_DIRTY_LOG 
> > ioctl.
> >
> > See qemu/kvm-all.c's kvm_physical_sync_dirty_bitmap().
> >
> > It started out as a VGA optimization (also used by live migration) and 
> > even today it's mostly used by the VGA drivers - albeit a weak one.
> >
> > I wish there were stronger VGA optimizations implemented, copying the 
> > dirty bitmap is not a particularly performant solution. (although it's 
> > certainly better than full emulation) Graphics performance is one of the 
> > more painful aspects of KVM usability today.
> 
> We have to maintain a dirty bitmap because we don't have a paravirtual 
> graphics driver.  IOW, someone needs to write an Xorg driver.
>
> Ideally, we could just implement a Linux framebuffer device, right?

No, you'd want to interact with DRM.

( Especially as you want to write guest accelerators passing guest-space 
  OpenGL requests straight to the kernel DRM level. )

Especially if you want to do things like graphics card virtualization, with 
aspects of the graphics driver passed through to the guest OS.

These are all kernel space projects; going through Xorg would be a horrible 
waste of performance for full-screen virtualization. It's fine for the 
windowed or networked case (and good as a compatibility fallback), but very 
much not fine for local desktop use.

> Well, we took that approach in Xen and that sucks even worse because the 
> Xorg framebuffer driver doesn't implement any of the optimizations that the 
> Linux framebuffer supports and the Xorg driver does not use the 
> kernel's interfaces for providing update regions.
> 
> Of course, we need to pull X into the kernel to fix this, right?

FYI, this part of X has already been pulled into the kernel - it's called DRM. 
If anything, it's being expanded.

> Any sufficiently complicated piece of software is going to interact with a 
> lot of other projects.  The solution is not to pull it all into one massive 
> repository.  It's to build relationships and to find ways to efficiently 
> work with the various communities.

That's my whole point with this thread: the kernel side of KVM and qemu, for 
all practical purposes, should not be two 'separate communities'. They should 
be one and the same thing.

Separation makes sense where the relationship is light or strictly 
hierarchical - here it's neither. KVM and Qemu are interconnected, quite 
fundamentally so.

> And we're working on this with X.  We'll have a paravirtual graphics driver 
> very soon.  There are no magic solutions.  We need more developers working 
> on the hard problems.

The thing is, writing up a DRM connector to a guest Linux OS could be done in 
no time. It could be deployed to users in no time as well, with the proper 
development model.

That after years and years of waiting proper GX support is _still_ not 
implemented in KVM is really telling of the efficiency of development based on 
such disjoint 'communities'. Maybe put up a committee as well to increase 
efficiency? ;-)

	Ingo

^ permalink raw reply	[flat|nested] 390+ messages in thread

* Re: [RFC] Unify KVM kernel-space and user-space code into a single project
  2010-03-18 15:17                                                       ` Ingo Molnar
@ 2010-03-18 16:11                                                         ` Anthony Liguori
  2010-03-18 16:28                                                           ` Ingo Molnar
  2010-03-19  9:19                                                           ` Paul Mundt
  0 siblings, 2 replies; 390+ messages in thread
From: Anthony Liguori @ 2010-03-18 16:11 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Avi Kivity, Zhang, Yanmin, Peter Zijlstra, Sheng Yang,
	linux-kernel, kvm, Marcelo Tosatti, oerg Roedel, Jes Sorensen,
	Gleb Natapov, Zachary Amsden, ziteng.huang,
	Arnaldo Carvalho de Melo, Frédéric Weisbecker

On 03/18/2010 10:17 AM, Ingo Molnar wrote:
> * Anthony Liguori<anthony@codemonkey.ws>  wrote:
>
>    
>> On 03/18/2010 08:00 AM, Ingo Molnar wrote:
>>      
>>>> [...]  kvm in fact knows nothing about vga, to take your last example.
>>>> [...]
>>>>          
>>> Look at the VGA dirty bitmap optimization a'ka the KVM_GET_DIRTY_LOG
>>> ioctl.
>>>
>>> See qemu/kvm-all.c's kvm_physical_sync_dirty_bitmap().
>>>
>>> It started out as a VGA optimization (also used by live migration) and
>>> even today it's mostly used by the VGA drivers - albeit a weak one.
>>>
>>> I wish there were stronger VGA optimizations implemented, copying the
>>> dirty bitmap is not a particularly performant solution. (although it's
>>> certainly better than full emulation) Graphics performance is one of the
>>> more painful aspects of KVM usability today.
>>>        
>> We have to maintain a dirty bitmap because we don't have a paravirtual
>> graphics driver.  IOW, someone needs to write an Xorg driver.
>>
>> Ideally, we could just implement a Linux framebuffer device, right?
>>      
> No, you'd want to interact with DRM.
>    

Using DRM doesn't help very much.  You still need an X driver and most 
of the operations you care about (video rendering, window movement, etc) 
are not operations that need to go through DRM.

3D graphics virtualization is extremely difficult in the non-passthrough 
case.  It really requires hardware support that isn't widely available 
today (outside a few NVIDIA chipsets).

>> Xorg framebuffer driver doesn't implement any of the optimizations that the
>> Linux framebuffer supports and the Xorg driver does not use the
>> kernel's interfaces for providing update regions.
>>
>> Of course, we need to pull X into the kernel to fix this, right?
>>      
> FYI, this part of X has already been pulled into the kernel - it's called DRM. 
> If anything, it's being expanded.
>    

It doesn't provide the things we need for a good user experience.  You 
need things like an absolute input device, host driven display resize, 
RGBA hardware cursors.  None of these go through DRI and it's those 
things that really provide the graphics user experience.

>> Any sufficiently complicated piece of software is going to interact with a
>> lot of other projects.  The solution is not to pull it all into one massive
>> repository.  It's to build relationships and to find ways to efficiently
>> work with the various communities.
>>      
> That's my whole point with this thread: the kernel side of KVM and qemu, for
> all practical purposes, should not be two 'separate communities'. They should
> be one and the same thing.
>    

I don't know why you keep saying this.  The people who are in these 
"separate communities" keep claiming that they don't feel this way.

I'm not just saying this to be argumentative.  Many of the people in the 
community have thought this same thing, and tried it themselves, and 
we've all come to the same conclusion.

It's certainly possible that we just missed the obvious thing to do but 
we'll never know that unless someone shows us.

> The thing is, writing up a DRM connector to a guest Linux OS could be done in
> no time. It could be deployed to users in no time as well, with the proper
> development model.
>    

If this is true, please demonstrate it.  Prove your point with patches 
and I'll happily turn around and do whatever I can to help out.

> That after years and years of waiting proper GX support is _still_ not
> implemented in KVM is really telling of the efficiency of development based on
> such disjoint 'communities'. Maybe put up a committee as well to increase
> efficiency? ;-)
>    

Nah, instead we can just have a few-hundred-mail thread on the list.  
Otherwise we'd have to write patches and do other kinds of productive work.

Regards,

Anthony Liguori

> 	Ingo
>    


^ permalink raw reply	[flat|nested] 390+ messages in thread

* Re: [RFC] Unify KVM kernel-space and user-space code into a single project
  2010-03-18 14:53                                                 ` Anthony Liguori
@ 2010-03-18 16:13                                                   ` Ingo Molnar
  2010-03-18 16:54                                                     ` Avi Kivity
                                                                       ` (3 more replies)
  0 siblings, 4 replies; 390+ messages in thread
From: Ingo Molnar @ 2010-03-18 16:13 UTC (permalink / raw)
  To: Anthony Liguori
  Cc: Avi Kivity, Zhang, Yanmin, Peter Zijlstra, Sheng Yang,
	linux-kernel, kvm, Marcelo Tosatti, oerg Roedel, Jes Sorensen,
	Gleb Natapov, Zachary Amsden, ziteng.huang,
	Arnaldo Carvalho de Melo, Frédéric Weisbecker


* Anthony Liguori <anthony@codemonkey.ws> wrote:

> On 03/18/2010 06:48 AM, Ingo Molnar wrote:
> >* Avi Kivity<avi@redhat.com>  wrote:
> >
> >>On 03/18/2010 12:50 PM, Ingo Molnar wrote:
> >>>* Avi Kivity<avi@redhat.com>   wrote:
> >>>
> >>>>>The moment any change (be it as trivial as fixing a GUI detail or as
> >>>>>complex as a new feature) involves two or more packages, development speed
> >>>>>slows down to a crawl - while the complexity of the change might be very
> >>>>>low!
> >>>>Why is that?
> >>>It's very simple: because the contribution latencies and overhead compound,
> >>>almost inevitably.
> >>It's not inevitable, if the projects are badly run, you'll have high
> >>latencies, but projects don't have to be badly run.
> >So the 64K dollar question is, why does Qemu still suck?
> 
> Why does Linux AIO still suck?  Why do we not have a proper interface in 
> userspace for doing asynchronous file system operations?

Good that you mention it, i think it's an excellent example.

The suckage of kernel async IO is for similar reasons: there's an ugly package 
separation problem between the kernel and glibc - and between them and the apps 
that would make use of it.

( With the separated libaio it was made worse: there were 3 libraries to
  work with, and even fewer applications that could make use of it ... )

So IMO klibc is an arguably good idea - eventually hpa will get around to 
posting it for upstream merging again. Then we could offer both new libraries much 
faster, and could offer things like comprehensive AIO used pervasively within 
existing APIs.

> Why don't we have an interface in userspace to do zero-copy transmit and 
> receive of raw network packets?
>
> The lack of a decent userspace API for asynchronous file system operations 
> is a huge usability problem for us.  Take a look at the complexity of our 
> -drive option.  It's all because the kernel gives us sucky interfaces.

If you had your bits in tools/kvm/ you could make a strong case for a good 
kaio implementation - coupled with an actual, working use-case. ( You could 
use the raw syscall even without klibc. )
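
For the record, the 'raw syscall' route is roughly the following - a sketch 
only, driving io_setup/io_submit/io_getevents via syscall() with the 
<linux/aio_abi.h> definitions, since glibc does not wrap them (and note the 
kernel path is only truly asynchronous for O_DIRECT files):

  #include <linux/aio_abi.h>
  #include <sys/syscall.h>
  #include <sys/types.h>
  #include <unistd.h>
  #include <string.h>
  #include <stdint.h>

  /* One native-AIO read via the raw syscalls - no libaio, no glibc
   * wrapper.  Returns bytes read, or -1 on any failure. */
  static long aio_read_once(int fd, void *buf, size_t len, off_t off)
  {
          aio_context_t ctx = 0;
          struct iocb cb;
          struct iocb *cbs[1] = { &cb };
          struct io_event ev;
          long ret = -1;

          if (syscall(SYS_io_setup, 128, &ctx) < 0)
                  return -1;

          memset(&cb, 0, sizeof(cb));
          cb.aio_lio_opcode = IOCB_CMD_PREAD;
          cb.aio_fildes     = fd;
          cb.aio_buf        = (uint64_t)(unsigned long)buf;
          cb.aio_nbytes     = len;
          cb.aio_offset     = off;

          if (syscall(SYS_io_submit, ctx, 1, cbs) == 1 &&
              syscall(SYS_io_getevents, ctx, 1, 1, &ev, NULL) == 1)
                  ret = (long)ev.res;     /* bytes read, or -errno */

          syscall(SYS_io_destroy, ctx);
          return ret;
  }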

We could see the arguments on lkml turn from:

   'do we want this and it will take years to propagate this into apps'

into something like:

   ' Exactly how much faster does kvm go? And I'd get it straight away with my
     next kernel update tomorrow? Wow! '

Ok, i exaggerated a bit - but you get the idea. It's a much different picture 
when kernel developers and maintainers see an actual use-case, _right in the 
kernel repo they work with every day_.

Currently there's a wall between kernel developers and user-space developers, 
and there's somewhat of an element of fear and arrogance on both sides. For 
efficient technology development, such walls need to be torn down and people 
need a bit more experience with each other's areas.

	Ingo

^ permalink raw reply	[flat|nested] 390+ messages in thread

* Re: [RFC] Unify KVM kernel-space and user-space code into a single project
  2010-03-18 16:11                                                         ` Anthony Liguori
@ 2010-03-18 16:28                                                           ` Ingo Molnar
  2010-03-18 16:38                                                             ` Anthony Liguori
  2010-03-19  9:19                                                           ` Paul Mundt
  1 sibling, 1 reply; 390+ messages in thread
From: Ingo Molnar @ 2010-03-18 16:28 UTC (permalink / raw)
  To: Anthony Liguori
  Cc: Avi Kivity, Zhang, Yanmin, Peter Zijlstra, Sheng Yang,
	linux-kernel, kvm, Marcelo Tosatti, Joerg Roedel, Jes Sorensen,
	Gleb Natapov, Zachary Amsden, ziteng.huang,
	Arnaldo Carvalho de Melo, Frédéric Weisbecker


* Anthony Liguori <anthony@codemonkey.ws> wrote:

> On 03/18/2010 10:17 AM, Ingo Molnar wrote:
> >* Anthony Liguori<anthony@codemonkey.ws>  wrote:
> >
> >>On 03/18/2010 08:00 AM, Ingo Molnar wrote:
> >>>>[...]  kvm in fact knows nothing about vga, to take your last example.
> >>>>[...]
> >>>Look at the VGA dirty bitmap optimization a'ka the KVM_GET_DIRTY_LOG
> >>>ioctl.
> >>>
> >>>See qemu/kvm-all.c's kvm_physical_sync_dirty_bitmap().
> >>>
> >>>It started out as a VGA optimization (also used by live migration) and
> >>>even today it's mostly used by the VGA drivers - albeit a weak one.
> >>>
> >>>I wish there were stronger VGA optimizations implemented, copying the
> >>>dirty bitmap is not a particularly performant solution. (although it's
> >>>certainly better than full emulation) Graphics performance is one of the
> >>>more painful aspects of KVM usability today.
> >>We have to maintain a dirty bitmap because we don't have a paravirtual
> >>graphics driver.  IOW, someone needs to write an Xorg driver.
> >>
> >>Ideally, we could just implement a Linux framebuffer device, right?
> >No, you'd want to interact with DRM.
> 
> Using DRM doesn't help very much.  You still need an X driver and most of 
> the operations you care about (video rendering, window movement, etc) are 
> not operations that need to go through DRM.

You stripped out this bit from my reply:

> > These are all kernel space projects, going through Xorg would be a 
> > horrible waste of performance for full-screen virtualization. It's fine 
> > for the windowed or networked case (and good as a compatibility fallback), 
> > but very much not fine for local desktop use.

For the full-screen case (which is a very common mode of using a guest OS on 
the desktop) there's not much of window management needed. You need to 
save/restore as you switch in/out.

> 3D graphics virtualization is extremely difficult in the non-passthrough 
> case.  It really requires hardware support that isn't widely available today 
> (outside a few NVIDIA chipsets).

Granted it's difficult in the general case.

> >>Xorg framebuffer driver doesn't implement any of the optimizations that the
> >>Linux framebuffer supports and the Xorg driver does not provide use the
> >>kernel's interfaces for providing update regions.
> >>
> >>Of course, we need to pull in X into the kernel to fix this, right?
> >
> > FYI, this part of X has already been pulled into the kernel, it's called 
> > DRM. If anything, it's being expanded.
> 
> It doesn't provide the things we need for a good user experience. You need 
> things like an absolute input device, host driven display resize, RGBA 
> hardware cursors.  None of these go through DRI and it's those things that 
> really provide the graphics user experience.

With KSM the display resize is in the kernel. Cursor management is not. Yet: i 
think it would be a nice feature as the cursor could move even if Xorg is 
blocked or busy with other things.

> >> Any sufficiently complicated piece of software is going to interact with 
> >> a lot of other projects.  The solution is not to pull it all into one 
> >> massive repository.  It's to build relationships and to find ways to 
> >> efficiently work with the various communities.
> >
> > That's my whole point with this thread: the kernel side of KVM and qemu, 
> > but all practical purposes should not be two 'separate communities'. They 
> > should be one and the same thing.
> 
> I don't know why you keep saying this.  The people who are in these 
> "separate communities" keep claiming that they don't feel this way.

If you are not two separate communities but one community, then why do you go 
through the (somewhat masochistic) self-punishing exercise of keeping the 
project in two different pieces?

In a distant past Qemu was a separate project and KVM was just a newcomer who 
used it for fancy stuff. Today as you say(?) the two communities are one and 
the same. Why not bring it to its logical conclusion?

> I'm not just saying this to be argumentative.  Many of the people in the 
> community have thought this same thing, and tried it themselves, and we've 
> all come to the same conclusion.
> 
> It's certainly possible that we just missed the obvious thing to do but 
> we'll never know that unless someone shows us.

I'm not aware of anyone in the past having attempted to move qemu to 
tools/kvm/ in the upstream kernel repo, and having reported on the experiences 
with such a contribution setup. (obviously it's not possible at all without 
heavy cooperation and acceptance from you and Avi, so this will probably 
remain a thought experiment forever)

If anything, you must be referring to previous attempts to 'strip down' Qemu, 
right? Those attempts didn't really solve the fundamental problem of project 
code base separation.

	Ingo

^ permalink raw reply	[flat|nested] 390+ messages in thread

* Re: [RFC] Unify KVM kernel-space and user-space code into a single project
  2010-03-18 16:28                                                           ` Ingo Molnar
@ 2010-03-18 16:38                                                             ` Anthony Liguori
  2010-03-18 16:51                                                                 ` Pekka Enberg
  0 siblings, 1 reply; 390+ messages in thread
From: Anthony Liguori @ 2010-03-18 16:38 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Avi Kivity, Zhang, Yanmin, Peter Zijlstra, Sheng Yang,
	linux-kernel, kvm, Marcelo Tosatti, Joerg Roedel, Jes Sorensen,
	Gleb Natapov, Zachary Amsden, ziteng.huang,
	Arnaldo Carvalho de Melo, Frédéric Weisbecker

On 03/18/2010 11:28 AM, Ingo Molnar wrote:
>>> These are all kernel space projects, going through Xorg would be a
>>> horrible waste of performance for full-screen virtualization. It's fine
>>> for the windowed or networked case (and good as a compatibility fallback),
>>> but very much not fine for local desktop use.
>>>        
> For the full-screen case (which is a very common mode of using a guest OS on
> the desktop) there's not much of window management needed. You need to
> save/restore as you switch in/out.
>    

I don't think I've ever used full-screen mode with my VMs and I use 
virtualization on a daily basis.

We hear very infrequently from users using full screen mode.

>> 3D graphics virtualization is extremely difficult in the non-passthrough
>> case.  It really requires hardware support that isn't widely available today
>> (outside a few NVIDIA chipsets).
>>      
> Granted it's difficult in the general case.
>
>    
>>>> Xorg framebuffer driver doesn't implement any of the optimizations that the
>>>> Linux framebuffer supports and the Xorg driver does not use the
>>>> kernel's interfaces for providing update regions.
>>>>
>>>> Of course, we need to pull in X into the kernel to fix this, right?
>>>>          
>>> FYI, this part of X has already been pulled into the kernel, it's called
>>> DRM. If anything, it's being expanded.
>>>        
>> It doesn't provide the things we need for a good user experience. You need
>> things like an absolute input device, host driven display resize, RGBA
>> hardware cursors.  None of these go through DRI and it's those things that
>> really provide the graphics user experience.
>>      
> With KSM the display resize is in the kernel.

KMS

>   Cursor management is not. Yet: i
> think it would be a nice feature as the cursor could move even if Xorg is
> blocked or busy with other things.
>    

If it was all in the kernel, we'd try to support it.

>>>> Any sufficiently complicated piece of software is going to interact with
>>>> a lot of other projects.  The solution is not to pull it all into one
>>>> massive repository.  It's to build relationships and to find ways to
>>>> efficiently work with the various communities.
>>>>          
>>> That's my whole point with this thread: the kernel side of KVM and qemu,
>>> for all practical purposes, should not be two 'separate communities'. They
>>> should be one and the same thing.
>>>        
>> I don't know why you keep saying this.  The people who are in these
>> "separate communities" keep claiming that they don't feel this way.
>>      
> If you are not two separate communities but one community, then why do you go
> through the (somewhat masochistic) self-punishing exercise of keeping the
> project in two different pieces?
>    

I don't see any actual KVM developer complaining about this so I'm not 
sure why you're describing it like this.

> In a distant past Qemu was a separate project and KVM was just a newcomer who
> used it for fancy stuff. Today as you say(?) the two communities are one and
> the same. Why not bring it to its logical conclusion?
>    

We lose a huge amount of users and contributors if we put QEMU in the 
Linux kernel.  As I said earlier, a huge number of our contributions 
come from people not using KVM.

>> I'm not just saying this to be argumentative.  Many of the people in the
>> community have thought this same thing, and tried it themselves, and we've
>> all come to the same conclusion.
>>
>> It's certainly possible that we just missed the obvious thing to do but
>> we'll never know that unless someone shows us.
>>      
> I'm not aware of anyone in the past having attempted to move qemu to
> tools/kvm/ in the upstream kernel repo, and having reported on the experiences
> with such a contribution setup. (obviously it's not possible at all without
> heavy cooperation and acceptance from you and Avi, so this will probably
> remain a thought experiment forever)
>    

We've tried to create a "clean" version of QEMU specifically for KVM.  
Moving it into tools/kvm would be the second step.  We've all failed on 
the first step.

> If anything, you must be referring to previous attempts to 'strip down' Qemu,
> right? Those attempts didn't really solve the fundamental problem of project
> code base separation.
>    

If the problem is combining the two, I've sent you a patch that you can 
put into tip.git if you're so inclined.

Regards,

Anthony Liguori

> 	Ingo
>    


^ permalink raw reply	[flat|nested] 390+ messages in thread

* Re: [RFC] Unify KVM kernel-space and user-space code into a single  project
  2010-03-18 16:38                                                             ` Anthony Liguori
@ 2010-03-18 16:51                                                                 ` Pekka Enberg
  0 siblings, 0 replies; 390+ messages in thread
From: Pekka Enberg @ 2010-03-18 16:51 UTC (permalink / raw)
  To: Anthony Liguori
  Cc: Ingo Molnar, Avi Kivity, Zhang, Yanmin, Peter Zijlstra,
	Sheng Yang, linux-kernel, kvm, Marcelo Tosatti, Joerg Roedel,
	Jes Sorensen, Gleb Natapov, Zachary Amsden, ziteng.huang,
	Arnaldo Carvalho de Melo, Frédéric Weisbecker

On Thu, Mar 18, 2010 at 6:38 PM, Anthony Liguori <anthony@codemonkey.ws> wrote:
>>>> These are all kernel space projects, going through Xorg would be a
>>>> horrible waste of performance for full-screen virtualization. It's fine
>>>> for the windowed or networked case (and good as a compatibility
>>>> fallback), but very much not fine for local desktop use.
>>
>> For the full-screen case (which is a very common mode of using a guest OS
>> on the desktop) there's not much of window management needed. You need to
>> save/restore as you switch in/out.
>
> I don't think I've ever used full-screen mode with my VMs and I use
> virtualization on a daily basis.
>
> We hear very infrequently from users using full screen mode.

Sorry for getting slightly off-topic but I find the above statement interesting.

I don't use virtualization on a daily basis but a working, fully
integrated full-screen model with VirtualBox was the only reason I
bothered to give VMs a second chance. From my point of view, the user
experience of earlier versions (e.g. Parallels) was just too painful
to live with.

/me crawls back to his hole now...

                        Pekka

^ permalink raw reply	[flat|nested] 390+ messages in thread

* Re: [RFC] Unify KVM kernel-space and user-space code into a single project
  2010-03-18 14:45                                                 ` Jes Sorensen
@ 2010-03-18 16:54                                                   ` Ingo Molnar
  2010-03-18 18:10                                                     ` Anthony Liguori
  0 siblings, 1 reply; 390+ messages in thread
From: Ingo Molnar @ 2010-03-18 16:54 UTC (permalink / raw)
  To: Jes Sorensen
  Cc: Anthony Liguori, Avi Kivity, Zhang, Yanmin, Peter Zijlstra,
	Sheng Yang, linux-kernel, kvm, Marcelo Tosatti, Joerg Roedel,
	Gleb Natapov, Zachary Amsden, Arnaldo Carvalho de Melo,
	Frédéric Weisbecker


* Jes Sorensen <Jes.Sorensen@redhat.com> wrote:

> On 03/18/10 15:22, Ingo Molnar wrote:
> >
> >* Jes Sorensen<Jes.Sorensen@redhat.com>  wrote:
> >>Both perf and oprofile are still relatively small projects in comparison to
> >>QEMU.
> >
> >So is your argument that the unification does not make sense due to size?
> >Would a smaller Qemu be more appropriate for this purpose?
> 
> As I have stated repeatedly in this discussion, a unification would hurt the 
> QEMU development process because it would alienate a large number of QEMU 
> developers who are *not* Linux kernel users.

I took a quick look at the qemu.git log and more than half of all recent 
contributions came from Linux distributors.

So without KVM Qemu would be a much, much smaller project. It would be similar 
to how it was 5 years ago.

> QEMU is a lot more complex than you let on.

Please educate me then about the specifics.

> >>Well I think we are just going to agree to disagree on this one. I am not
> >>against merging projects where it makes sense, but in this particular case I
> >>am strongly convinced the loss would be much greater than the gain.
> >
> >I wish you said that based on first hand negative experience with
> >unifications, not based on just pure speculation.
> >
> >(and yes, i speculate too, but at least with some basis)
> 
> You still haven't given us a *single* example of unification of something 
> that wasn't purely linked to the Linux kernel. perf/ oprofile is 100% linked 
> to the Linux kernel, QEMU is not. I wish you would actually look at what 
> users use QEMU for. As long as you continue to purely speculate on this, to 
> use your own words, your arguments are not holding up.

The stats show that the huge increase in Qemu contributions over the past few 
years was mainly due to KVM. Do you claim it wasn't? What other projects make 
use of it and pay developers to work on it?

> And you are not being consistent either. You have conveniently continued to 
> ignore my questions about why the file system tools are not to be merged 
> into the Linux kernel source tree?

Sorry, i didn't comment on it because the answer is obvious: the file system 
tools and pretty much any Linux-exclusive tool (such as udev) should be moved 
there. The difference is that there's not much active development done in most 
of those tools, so the benefits are probably marginal. Both Qemu and KVM are 
being developed very actively though, so development model inefficiencies show 
up.

Anyway, i didn't think i'd step into such a hornet's nest by explaining what i 
see as KVM's biggest weakness today and how i suggest it be fixed :-)

If you don't agree with me, then don't do it - no need to get emotional about 
it.

Thanks,

	Ingo

^ permalink raw reply	[flat|nested] 390+ messages in thread

* Re: [RFC] Unify KVM kernel-space and user-space code into a single project
  2010-03-18 16:13                                                   ` Ingo Molnar
@ 2010-03-18 16:54                                                     ` Avi Kivity
  2010-03-18 17:11                                                       ` Ingo Molnar
  2010-03-18 18:20                                                     ` Anthony Liguori
                                                                       ` (2 subsequent siblings)
  3 siblings, 1 reply; 390+ messages in thread
From: Avi Kivity @ 2010-03-18 16:54 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Anthony Liguori, Zhang, Yanmin, Peter Zijlstra, Sheng Yang,
	linux-kernel, kvm, Marcelo Tosatti, Joerg Roedel, Jes Sorensen,
	Gleb Natapov, Zachary Amsden, ziteng.huang,
	Arnaldo Carvalho de Melo, Frédéric Weisbecker

On 03/18/2010 06:13 PM, Ingo Molnar wrote:
> Currently there's a wall between kernel developers and user-space developers,
> and there's somewhat of an element of fear and arrogance on both sides. For
> efficient technology such walls need to be torn down and people need a bit more
> experience with each other's areas.
>    


I think you're increasing the height of that wall by arguing that a 
userspace project cannot be successful because its development process 
sucks and the only way to fix it is to put it into the kernel where 
people know so much better.  Instead we kernel developers should listen 
to requirements from users, even if their code isn't in tools/.

-- 
error compiling committee.c: too many arguments to function


^ permalink raw reply	[flat|nested] 390+ messages in thread

* Re: [RFC] Unify KVM kernel-space and user-space code into a single project
  2010-03-18 16:51                                                                 ` Pekka Enberg
@ 2010-03-18 17:02                                                                 ` Ingo Molnar
  2010-03-18 17:09                                                                   ` Avi Kivity
  -1 siblings, 1 reply; 390+ messages in thread
From: Ingo Molnar @ 2010-03-18 17:02 UTC (permalink / raw)
  To: Pekka Enberg
  Cc: Anthony Liguori, Avi Kivity, Zhang, Yanmin, Peter Zijlstra,
	Sheng Yang, linux-kernel, kvm, Marcelo Tosatti, Joerg Roedel,
	Jes Sorensen, Gleb Natapov, Zachary Amsden, ziteng.huang,
	Arnaldo Carvalho de Melo, Frédéric Weisbecker


* Pekka Enberg <penberg@cs.helsinki.fi> wrote:

> On Thu, Mar 18, 2010 at 6:38 PM, Anthony Liguori <anthony@codemonkey.ws> wrote:
> >>>> These are all kernel space projects, going through Xorg would be a
> >>>> horrible waste of performance for full-screen virtualization. It's fine
> >>>> for the windowed or networked case (and good as a compatibility
> >>>> fallback), but very much not fine for local desktop use.
> >>
> >> For the full-screen case (which is a very common mode of using a guest OS
> >> on the desktop) there's not much of window management needed. You need to
> >> save/restore as you switch in/out.
> >
> > I don't think I've ever used full-screen mode with my VMs and I use
> > virtualization on a daily basis.
> >
> > We hear very infrequently from users using full screen mode.
> 
> Sorry for getting slightly off-topic but I find the above statement 
> interesting.
> 
> I don't use virtualization on a daily basis but a working, fully integrated 
> full-screen model with VirtualBox was the only reason I bothered to give VMs 
> a second chance. From my point of view, the user experience of earlier 
> versions (e.g. Parallels) was just too painful to live with.

That's the same i do, and that's what i'm hearing from other desktop users as 
well.

The moment you work seriously in a guest OS you often want to switch to it 
full-screen, to maximize screen real-estate and to reduce host GUI element 
distractions. If it's just casual use of a single app then windowed mode 
suffices (but in that case performance doesn't matter much to begin with).

I find the 'KVM mostly cares about the server, not about the desktop' attitude 
expressed in this thread troubling.

> /me crawls back to his hole now...

/me should do that too - this discussion is not resulting in any positive 
result so it has become rather pointless.

	Ingo

^ permalink raw reply	[flat|nested] 390+ messages in thread

* Re: [RFC] Unify KVM kernel-space and user-space code into a single project
  2010-03-18 17:02                                                                 ` Ingo Molnar
@ 2010-03-18 17:09                                                                   ` Avi Kivity
  2010-03-18 17:28                                                                     ` Ingo Molnar
  0 siblings, 1 reply; 390+ messages in thread
From: Avi Kivity @ 2010-03-18 17:09 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Pekka Enberg, Anthony Liguori, Zhang, Yanmin, Peter Zijlstra,
	Sheng Yang, linux-kernel, kvm, Marcelo Tosatti, Joerg Roedel,
	Jes Sorensen, Gleb Natapov, Zachary Amsden, ziteng.huang,
	Arnaldo Carvalho de Melo, Frédéric Weisbecker

On 03/18/2010 07:02 PM, Ingo Molnar wrote:
>
> I find the 'KVM mostly cares about the server, not about the desktop' attitude
> expressed in this thread troubling.
>    

It's not kvm, just its developers (and their employers, where 
applicable).  If you post desktop oriented patches I'm sure they'll be 
welcome.

-- 
error compiling committee.c: too many arguments to function


^ permalink raw reply	[flat|nested] 390+ messages in thread

* Re: [RFC] Unify KVM kernel-space and user-space code into a single project
  2010-03-18 16:54                                                     ` Avi Kivity
@ 2010-03-18 17:11                                                       ` Ingo Molnar
  0 siblings, 0 replies; 390+ messages in thread
From: Ingo Molnar @ 2010-03-18 17:11 UTC (permalink / raw)
  To: Avi Kivity
  Cc: Anthony Liguori, Zhang, Yanmin, Peter Zijlstra, Sheng Yang,
	linux-kernel, kvm, Marcelo Tosatti, Joerg Roedel, Jes Sorensen,
	Gleb Natapov, Zachary Amsden, ziteng.huang,
	Arnaldo Carvalho de Melo, Frédéric Weisbecker


* Avi Kivity <avi@redhat.com> wrote:

> On 03/18/2010 06:13 PM, Ingo Molnar wrote:
>
> > Currently there's a wall between kernel developers and user-space 
> > developers, and there's somewhat of an element of fear and arrogance on 
> > both sides. For efficient technology such walls need to be torn down and people 
> > need a bit more experience with each other's areas.
> 
> I think you're increasing the height of that wall by arguing that a 
> userspace project cannot be successful because its development process 
> sucks and the only way to fix it is to put it into the kernel where people 
> know so much better.  Instead we kernel developers should listen to 
> requirements from users, even if their code isn't in tools/.

No, it's tearing down that wall because finally, instead of providing rather 
abstract system calls that are designed perfectly, the kernel can operate by 
providing useful libraries and apps.

At least in the contexts i've worked on it has torn down walls and has improved 
the efficiency of working on ABIs towards user-space. (sysprof is an example 
of that)

Kernel developers are finally faced with user-space development directly, in 
the same repository, using the same rules of contribution.

Non-kernel-hosted apps win from that process too, as even if they don't 
integrate (because they don't want to or cannot for license reasons) they can 
participate in a more direct (and more practical) exchange with kernel 
developers. They can contribute a new system call and create a library 
function for it straight away.

	Ingo

^ permalink raw reply	[flat|nested] 390+ messages in thread

* Re: [RFC] Unify KVM kernel-space and user-space code into a single project
  2010-03-18 14:38                                                         ` Avi Kivity
@ 2010-03-18 17:16                                                           ` Ingo Molnar
  0 siblings, 0 replies; 390+ messages in thread
From: Ingo Molnar @ 2010-03-18 17:16 UTC (permalink / raw)
  To: Avi Kivity
  Cc: Anthony Liguori, Zhang, Yanmin, Peter Zijlstra, Sheng Yang,
	linux-kernel, kvm, Marcelo Tosatti, Joerg Roedel, Jes Sorensen,
	Gleb Natapov, Zachary Amsden, ziteng.huang,
	Arnaldo Carvalho de Melo, Frédéric Weisbecker


* Avi Kivity <avi@redhat.com> wrote:

> On 03/18/2010 04:09 PM, Ingo Molnar wrote:
> >* Avi Kivity<avi@redhat.com>  wrote:
> >
> >>> That is not what i said. I said they are closely related, and where 
> >>> technologies are closely related, project proximity turns into project 
> >>> unification at a certain stage.
> >>
> >> I really don't see how.  So what if both qemu and kvm implement an i8254? 
> >> They can't share any code since the internal APIs are so different. [...]
> >
> > I wouldnt jump to assumptions there. perf shares some facilities with the 
> > kernel on the source code level - they can be built both in the kernel and 
> > in user-space.
> >
> > But my main thought wasnt even to actually share the implementation - but 
> > to actually synchronize when a piece of device emulation moves into the 
> > kernel. It is arguably bad for performance in most cases when Qemu handles 
> > a given device - so all the common devices should be kernel accelerated.
> >
> > The version and testing matrix would be simplified significantly as well: 
> > as kernel and qemu goes hand in hand, they are always on the same version.
> 
> So, you propose to allow running tools/kvm/ only on the kernel it was 
> shipped with?

No, but i propose concentrating on that natural combination.

> Otherwise the testing matrix isn't simplified.

It is, because testing is more focused and more people are testing the 
combination that developers tested as well. (and not some random version 
combination picked by the distributor or the user)

	Ingo

^ permalink raw reply	[flat|nested] 390+ messages in thread

* Re: [RFC] Unify KVM kernel-space and user-space code into a single project
  2010-03-18 17:09                                                                   ` Avi Kivity
@ 2010-03-18 17:28                                                                     ` Ingo Molnar
  2010-03-19  7:56                                                                       ` Avi Kivity
  0 siblings, 1 reply; 390+ messages in thread
From: Ingo Molnar @ 2010-03-18 17:28 UTC (permalink / raw)
  To: Avi Kivity
  Cc: Pekka Enberg, Anthony Liguori, Zhang, Yanmin, Peter Zijlstra,
	Sheng Yang, linux-kernel, kvm, Marcelo Tosatti, Joerg Roedel,
	Jes Sorensen, Gleb Natapov, Zachary Amsden, ziteng.huang,
	Arnaldo Carvalho de Melo, Frédéric Weisbecker


* Avi Kivity <avi@redhat.com> wrote:

> On 03/18/2010 07:02 PM, Ingo Molnar wrote:
> >
> > I find the 'KVM mostly cares about the server, not about the desktop' 
> > attitude expressed in this thread troubling.
> 
> It's not kvm, just its developers (and their employers, where applicable).  
> If you post desktop oriented patches I'm sure they'll be welcome.

Just such a patch-set was posted in this very thread: 'perf kvm'.

There were two negative reactions immediately, both showed a fundamental 
server versus desktop bias:

 - you did not accept that the most important usecase is when there is a
   single guest running.

 - the reaction to the 'how do we get symbols out of the guest' sub-question 
   was, paraphrased: 'we don't want that due to <unspecified> security threat 
   to XYZ selinux usecase with lots of guests'.

Anyone aware of how Linux and KVM are being used on the desktop will know 
how detached that attitude is from the typical desktop usecase ...

Usability _never_ sucks because of lack of patches or lack of suggestions. I 
bet if you made the next server feature contingent on essential usability 
fixes they'd happen overnight - for God's sake there's been 1000 commits in 
the last 3 months in the Qemu repository so there's plenty of manpower...

Usability suckage - and i'm not going to be popular for saying this out loud - 
almost always shows a basic maintainer disconnect with the real world. See 
your very first reactions to my 'KVM usability' observations. Read back your 
and Anthony's replies: total 'sure, patches welcome' kind of indifference. It 
is _your project_, not some other project down the road ...

So that is my first-hand experience about how you are welcoming these desktop 
issues, in this very thread. I suspect people try a few times with 
suggestions, then get shot down like our suggestions were shot down and then 
give up.

	Ingo

^ permalink raw reply	[flat|nested] 390+ messages in thread

* Re: [RFC] Unify KVM kernel-space and user-space code into a single project
  2010-03-18 16:54                                                   ` Ingo Molnar
@ 2010-03-18 18:10                                                     ` Anthony Liguori
  0 siblings, 0 replies; 390+ messages in thread
From: Anthony Liguori @ 2010-03-18 18:10 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Jes Sorensen, Avi Kivity, Zhang, Yanmin, Peter Zijlstra,
	Sheng Yang, linux-kernel, kvm, Marcelo Tosatti, Joerg Roedel,
	Gleb Natapov, Zachary Amsden, Arnaldo Carvalho de Melo,
	Frédéric Weisbecker

On 03/18/2010 11:54 AM, Ingo Molnar wrote:
> I took a quick look at the qemu.git log and more than half of all recent
> contributions came from Linux distributors.
>    

I don't know what you're looking at, but in the past month, there's been 
56 unique contributors, with 411 changesets.  I count 16 people employed 
by distributions with 188 changesets.

> So without KVM Qemu would be a much, much smaller project. It would be similar
> to how it was 5 years ago.
>    

I'm not saying that KVM isn't significant.  I'm employed to work on QEMU 
because of KVM.

I'm just saying that KVM users aren't 99% of the community and that we 
can't neglect the rest of the community.

Regards,

Anthony Liguori

^ permalink raw reply	[flat|nested] 390+ messages in thread

* Re: [RFC] Unify KVM kernel-space and user-space code into a single project
  2010-03-18 16:13                                                   ` Ingo Molnar
  2010-03-18 16:54                                                     ` Avi Kivity
@ 2010-03-18 18:20                                                     ` Anthony Liguori
  2010-03-18 18:23                                                     ` drepper
  2010-03-21 13:27                                                     ` Gabor Gombas
  3 siblings, 0 replies; 390+ messages in thread
From: Anthony Liguori @ 2010-03-18 18:20 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Avi Kivity, Peter Zijlstra, Sheng Yang, linux-kernel, kvm,
	Marcelo Tosatti, Joerg Roedel, Jes Sorensen, Gleb Natapov,
	Zachary Amsden, ziteng.huang, Arnaldo Carvalho de Melo,
	Frédéric Weisbecker

On 03/18/2010 11:13 AM, Ingo Molnar wrote:
> Good that you mention it, i think it's an excellent example.
> The suckage of kernel async IO is for similar reasons: there's an ugly package
> separation problem between the kernel and between glibc - and between the apps
> that would make use of it.
>
> ( With the separated libaio it was made worse: there were 3 libraries to
>    work with, and even fewer applications that could make use of it ... )
>
> So IMO klibc is an arguably good idea - eventually hpa will get around to posting
> it for upstream merging again. Then we could both offer new libraries much
> faster, and could offer things like comprehensive AIO used pervasively within
> existing APIs.
>    

And why wouldn't the kernel developers produce posix-aio within klibc?

posix-aio is also a really terrible interface (although not as bad as 
linux-aio).

The reason boils down to the fact that these interfaces are designed 
without interacting with the consumers.  Part of the reason for that is 
the attitude of the community.

You approached this discussion with, "QEMU/KVM sucks, you should move 
into the kernel because we're awesome and we'd fix everything in a heart 
beat".  That attitude does not result in any useful collaboration.

Had you started trying to understand what the problems that we face are 
and whether there's anything that can be done in the kernel to improve 
it, it would have been an entirely different discussion.

The sad thing is, QEMU is probably one of the most demanding free 
software applications out there today with respect to performance.  We 
consume IO interfaces and things like large pages in a deeper 
way than just about any application out there.

We've been trying to improve Linux interfaces for years, 
but we've not had many people in the kernel community be receptive.

We've failed to improve the userspace networking interfaces.  Compare 
Rusty's posting of vringfd to vhost-net.  They are the same interface 
except we tried to do something more generally useful with vringfd and 
it was shot down because it was "yet another kernel/userspace data 
transfer interface".  Unfortunately, we're learning that if we claim 
something is virtualization specific, we avoid a lot of the kernel 
bureaucracy.  My concern is that over time, we'll have more things like 
vhost and that's bad for everyone.

Regards,

Anthony Liguori


^ permalink raw reply	[flat|nested] 390+ messages in thread

* Re: [RFC] Unify KVM kernel-space and user-space code into a single project
  2010-03-18 10:32                                         ` Avi Kivity
  2010-03-18 11:19                                           ` Ingo Molnar
@ 2010-03-18 18:20                                           ` Frederic Weisbecker
  2010-03-18 19:50                                             ` Frank Ch. Eigler
  1 sibling, 1 reply; 390+ messages in thread
From: Frederic Weisbecker @ 2010-03-18 18:20 UTC (permalink / raw)
  To: Avi Kivity
  Cc: Ingo Molnar, Anthony Liguori, Zhang, Yanmin, Peter Zijlstra,
	Sheng Yang, linux-kernel, kvm, Marcelo Tosatti, Joerg Roedel,
	Jes Sorensen, Gleb Natapov, Zachary Amsden, ziteng.huang,
	Arnaldo Carvalho de Melo

On Thu, Mar 18, 2010 at 12:32:51PM +0200, Avi Kivity wrote:
> By "serious developer" I mean
>
>  - someone who is interested in contributing, not in getting their name  
> into the kernel commits list
>  - someone who is willing to read the wiki page and find out where the  
> repository and mailing list for a project is
>  - someone who will spend enough time on the project so that the time to 
> clone two repositories will not be a factor in their contributions


I'm not going to argue about the Qemu merging here.
But your above assessment is incomplete.

It is not because developers don't want to clone two different
trees that tools/perf is a success. Or maybe it's a factor but
I suspect it to be very minimal. I can script git commands if
needed. It is actually because both kernel and user side are
in sync in this scheme.



> Let's wait and see then.  If the tools/perf/ experience has really good  
> results, we can reconsider this at a later date.


I think it already has really good results.


^ permalink raw reply	[flat|nested] 390+ messages in thread

* Re: [RFC] Unify KVM kernel-space and user-space code into a single project
  2010-03-18 16:13                                                   ` Ingo Molnar
  2010-03-18 16:54                                                     ` Avi Kivity
  2010-03-18 18:20                                                     ` Anthony Liguori
@ 2010-03-18 18:23                                                     ` drepper
  2010-03-18 19:15                                                       ` Ingo Molnar
  2010-03-21 13:27                                                     ` Gabor Gombas
  3 siblings, 1 reply; 390+ messages in thread
From: drepper @ 2010-03-18 18:23 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Anthony Liguori, Avi Kivity, Zhang, Yanmin, Peter Zijlstra,
	Sheng Yang, linux-kernel, kvm, Marcelo Tosatti, Joerg Roedel,
	Jes Sorensen, Gleb Natapov, Zachary Amsden, ziteng.huang,
	Arnaldo Carvalho de Melo, Frédéric Weisbecker

On Thu, Mar 18, 2010 at 09:13, Ingo Molnar <mingo@elte.hu> wrote:
> The suckage of kernel async IO is for similar reasons: there's an ugly package
> separation problem between the kernel and between glibc

Bollocks.  glibc would use (and is using) everything the kernel provides.  We even have an implementation using the current AIO code.  It only works in some situations but that's what the few users are OK with.

Don't try to blame anyone but kernel people for the complete and utter failure of AIO in Linux.  I don't know how often I've discussed design of a kernel interface with various kernel developers.  Heck, whenever Zach Brown and I meet there never is a different topic.  And following these meetings the ball is not and cannot be in my court.  How could it?

^ permalink raw reply	[flat|nested] 390+ messages in thread

* Re: [RFC] Unify KVM kernel-space and user-space code into a single project
  2010-03-18 18:23                                                     ` drepper
@ 2010-03-18 19:15                                                       ` Ingo Molnar
  2010-03-18 19:37                                                         ` drepper
  0 siblings, 1 reply; 390+ messages in thread
From: Ingo Molnar @ 2010-03-18 19:15 UTC (permalink / raw)
  To: drepper
  Cc: Anthony Liguori, Avi Kivity, Zhang, Yanmin, Peter Zijlstra,
	Sheng Yang, linux-kernel, kvm, Marcelo Tosatti, Joerg Roedel,
	Jes Sorensen, Gleb Natapov, Zachary Amsden, ziteng.huang,
	Arnaldo Carvalho de Melo, Frédéric Weisbecker


* drepper@gmail.com <drepper@gmail.com> wrote:

> On Thu, Mar 18, 2010 at 09:13, Ingo Molnar <mingo@elte.hu> wrote:
>
> > The suckage of kernel async IO is for similar reasons: there's an ugly 
> > package separation problem between the kernel and between glibc
> 
> Bollocks.  glibc would use (and is using) everything the kernel provides.

I didn't say it's glibc's fault - if anything it's more of the kernel's fault as 
most of the complexity is on that side. I said it's due to the fundamental 
distance between the app that makes use of it, the library and the kernel, and 
the resulting difficulties in getting a combined solution out.

None of the parties really feels it to be their own thing.

	Ingo

^ permalink raw reply	[flat|nested] 390+ messages in thread

* Re: [RFC] Unify KVM kernel-space and user-space code into a single project
  2010-03-18 19:15                                                       ` Ingo Molnar
@ 2010-03-18 19:37                                                         ` drepper
  2010-03-18 20:18                                                           ` Ingo Molnar
  0 siblings, 1 reply; 390+ messages in thread
From: drepper @ 2010-03-18 19:37 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Anthony Liguori, Avi Kivity, Peter Zijlstra, linux-kernel, kvm,
	Marcelo Tosatti, Joerg Roedel, Jes Sorensen, Gleb Natapov,
	Zachary Amsden, Arnaldo Carvalho de Melo, Frédéric Weisbecker

On Thu, Mar 18, 2010 at 12:15, Ingo Molnar <mingo@elte.hu> wrote:
> I didn't say it's glibc's fault - if anything it's more of the kernel's fault as
> most of the complexity is on that side. I said it's due to the fundamental
> distance between the app that makes use of it, the library and the kernel, and
> the resulting difficulties in getting a combined solution out.

This is wrong, too.  Once there is a kernel patch that has a reasonable syscall interface it's easy enough to hack up the glibc side.  Don't try to artificially find an argument to support your thesis.  If kernel developers always need an immediate itch which lives inside the kernel walls to make a change this is a failure of the kernel model and mustn't be "solved" by dragging ever more code into the kernel.

Aside, you don't need a full-fledged glibc implementation for testing.  Especially for AIO it should be usable in much lighter-weight contexts than POSIX AIO.  These wrappers are even easier to hack up (and have been in the few cases where some code has been produced).

For AIO the situation isn't that the people interested in working on it don't know or care about the use.  Zach (through Oracle's products) is very much interested in the code and knows what it should look like.

Face it, AIO is an example of a complete failure of the kernel developers to provide something usable.  This was the argument and where you started the misdirection of including other projects in the reasoning.

^ permalink raw reply	[flat|nested] 390+ messages in thread

* Re: [RFC] Unify KVM kernel-space and user-space code into a single project
  2010-03-18 18:20                                           ` Frederic Weisbecker
@ 2010-03-18 19:50                                             ` Frank Ch. Eigler
  2010-03-18 20:47                                               ` Ingo Molnar
  0 siblings, 1 reply; 390+ messages in thread
From: Frank Ch. Eigler @ 2010-03-18 19:50 UTC (permalink / raw)
  To: Frederic Weisbecker
  Cc: Avi Kivity, Ingo Molnar, Anthony Liguori, Zhang, Yanmin,
	Peter Zijlstra, Sheng Yang, linux-kernel, kvm, Marcelo Tosatti,
	Joerg Roedel, Jes Sorensen, Gleb Natapov, Zachary Amsden,
	ziteng.huang, Arnaldo Carvalho de Melo


Frederic Weisbecker <fweisbec@gmail.com> writes:

> [...]  It is actually because both kernel and user side are in sync in
> this scheme.  [...]

This argues that co-evolution of an interface is easiest on the
developers if they own both sides of that interface.  No quarrel.

This does not argue that the preservation of a stable ABI is best
done this way.  If anything, it makes it too easy to change both the
provider and the preferred user of the interface without noticing
unintentional breakage to forlorn out-of-your-tree clients.


- FChE

^ permalink raw reply	[flat|nested] 390+ messages in thread

* Re: [RFC] Unify KVM kernel-space and user-space code into a single project
  2010-03-18 19:37                                                         ` drepper
@ 2010-03-18 20:18                                                           ` Ingo Molnar
  2010-03-18 20:39                                                             ` drepper
  0 siblings, 1 reply; 390+ messages in thread
From: Ingo Molnar @ 2010-03-18 20:18 UTC (permalink / raw)
  To: drepper
  Cc: Anthony Liguori, Avi Kivity, Peter Zijlstra, linux-kernel, kvm,
	Marcelo Tosatti, Joerg Roedel, Jes Sorensen, Gleb Natapov,
	Zachary Amsden, Arnaldo Carvalho de Melo, Frédéric Weisbecker


* drepper@gmail.com <drepper@gmail.com> wrote:

> On Thu, Mar 18, 2010 at 12:15, Ingo Molnar <mingo@elte.hu> wrote:
>
> > I didn't say it's glibc's fault - if anything it's more of the kernel's fault 
> > as most of the complexity is on that side. I said it's due to the 
> > fundamental distance between the app that makes use of it, the library and 
> > the kernel, and the resulting difficulties in getting a combined solution 
> > out.
> 
> This is wrong, too.  Once there is a kernel patch that has a reasonable 
> syscall interface it's easy enough to hack up the glibc side. [...]

Where 'reasonable' is defined by you, right?

As i said, the KAIO situation is mostly the kernel's fault, but you are a 
pretty passive and unhelpful entity in this matter too, aren't you?

For example, just to state the obvious: libaio was written 8 years ago, in 
2002, and has been used in apps early on. Why aren't those kernel APIs, while 
not being a full/complete solution, supported by glibc, and wrapped to 
pthreads-based emulation on kernels that don't support it?
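
To be concrete about what i mean by the pthreads fallback, here is a 
hand-waving sketch (the my_aio_* names are made up for illustration, this is 
not a glibc patch) of the submit-then-wait pattern, emulated with one worker 
thread per request:

  #include <pthread.h>
  #include <sys/types.h>
  #include <unistd.h>
  #include <fcntl.h>
  #include <stdio.h>

  /* hypothetical mini-aiocb: one outstanding pread(), completed by a thread */
  struct my_aiocb {
          int       fd;
          void     *buf;
          size_t    nbytes;
          off_t     offset;
          ssize_t   result;
          pthread_t thread;
  };

  static void *my_aio_worker(void *arg)
  {
          struct my_aiocb *cb = arg;

          cb->result = pread(cb->fd, cb->buf, cb->nbytes, cb->offset);
          return NULL;
  }

  static int my_aio_read(struct my_aiocb *cb)
  {
          /* a real library would use a thread pool, not a thread per request */
          return pthread_create(&cb->thread, NULL, my_aio_worker, cb);
  }

  static ssize_t my_aio_wait(struct my_aiocb *cb)
  {
          pthread_join(cb->thread, NULL);
          return cb->result;
  }

  int main(void)
  {
          static char buf[4096];
          struct my_aiocb cb = {
                  .fd     = open("/etc/hostname", O_RDONLY),   /* arbitrary file */
                  .buf    = buf,
                  .nbytes = sizeof(buf),
                  .offset = 0,
          };

          if (cb.fd < 0 || my_aio_read(&cb))
                  return 1;
          /* ... overlap other work here ... */
          printf("read %zd bytes\n", my_aio_wait(&cb));
          return 0;
  }

( Compile with -pthread. The point is not that this is pretty - it is that 
  apps could have had _an_ API years ago, with the real KAIO bits plugged in 
  underneath wherever they are good enough. )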

I'm not talking about a 100% full POSIX AIO implementation (the kernel side is 
not complete enough for that) - i'm just talking about the APIs that libaio 
and the kernel support today.

Why isn't glibc itself making use of those AIO capabilities internally? (even 
if it's not possible to support full POSIX AIO)

I checked today's glibc repo, and there's no sign of any of that:

 glibc> git grep io_submit 
 glibc> git grep aio_context_t 
 glibc> 

Zero, nil, nada.

Getting _something_ into glibc would certainly help move the situation. Glibc 
itself using existing KAIO bits internally would help too - and don't tell me 
it's 100% unusable: it's certainly capable enough to run DB servers. glibc 
using it would create further demand (and pressure, and incentives) for 
improvements.

There were even glibc patches created by Ben LaHaise for some of these bits, 
IIRC.

One can certainly make the argument that glibc not using _any_ of the current 
KAIO capabilities harms its further development.

> [...] Don't try to artificially find an argument to support your thesis.

Charming argumentation style, i really missed it.

Thanks,

	Ingo

^ permalink raw reply	[flat|nested] 390+ messages in thread

* Re: [RFC] Unify KVM kernel-space and user-space code into a single project
  2010-03-18 20:18                                                           ` Ingo Molnar
@ 2010-03-18 20:39                                                             ` drepper
  2010-03-18 20:56                                                               ` Ingo Molnar
  0 siblings, 1 reply; 390+ messages in thread
From: drepper @ 2010-03-18 20:39 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Anthony Liguori, Avi Kivity, Peter Zijlstra, linux-kernel, kvm,
	Marcelo Tosatti, Joerg Roedel, Jes Sorensen, Gleb Natapov,
	Zachary Amsden, Arnaldo Carvalho de Melo, Frédéric Weisbecker

On Thu, Mar 18, 2010 at 13:18, Ingo Molnar <mingo@elte.hu> wrote:
> Where 'reasonable' is defined by you, right?

Not only by me.  For some of the AIO approaches which happened there were also glibc patches other people wrote.  It's pretty simple.


> As i said, the KAIO situation is mostly the kernel's fault, but you are a
> pretty passive and unhelpful entity in this matter too, arent you?

How'd you guess?  I've always been willing to discuss interface requirements with whoever showed interest in implementing things.  Again, ask Zach.  I think Christoph Lameter also was involved, as were various SGI people over the years.

Short of actually doing all the work myself I've done what can be expected.


> For example, just to state the obvious: libaio has been written 8 years ago in
> 2002 and has been used in apps early on. Why arent those kernel APIs, while
> not being a full/complete solution, supported by glibc, and wrapped to
> pthreads based emulation on kernels that dont support it?

You never looked at the glibc code in use and didn't read what I wrote before.  We do have an implementation of libaio using those interfaces.  They exist  in the Fedora/RHEL glibc and are probably copied elsewhere, too.  The code is not upstream because it is not general enough.  It simply doesn't work in all situations.

The problem with using it (among others) is that certain operations cannot be implemented.  And that's not a kernel interface problem.  I cannot just switch to using the pthread-based code when coming across something that's not implementable because then the requests have already been sent to the kernel.  Only code that knows about the limitations ahead of time can use the KAIO code.


> Why isnt glibc itself making use of those AIO capabilities internally? (even
> if it's not possible to support full POSIX AIO)

For what?  glibc doesn't implement anything requiring AIO.  The only non-trivial file handling is in nscd and nscd uses memory mapped files.


> I checked today's glibc repo, and there's no sign of any of that:

Check the Fedora/RHEL/... source files.


> Getting _something_ into glibc would certainly help move the situation.

No it won't, as in the 7+ years since Jakub wrote the code nothing has come out of it.  And before you again make groundless claims, there were plenty of discussions with kernel people at the time the code was written.


> it's certainly capable enough to run DB servers. glibc
> using it would create further demand (and pressure, and incentives) for
> improvements.

There simply is no need for AIO in glibc internally.  Well, there might be, if it could be used on sockets.  But that's not the case.


> Charming argumentation style, i really missed it.

Well, then this last mail should show you.  Without knowing the subject matter, just based on flawed lookups you try to spread the blame to make sure that no mud ever sticks to the development process you are so fond of.  Sorry to disappoint you.

^ permalink raw reply	[flat|nested] 390+ messages in thread

* Re: [RFC] Unify KVM kernel-space and user-space code into a single project
  2010-03-18 19:50                                             ` Frank Ch. Eigler
@ 2010-03-18 20:47                                               ` Ingo Molnar
  0 siblings, 0 replies; 390+ messages in thread
From: Ingo Molnar @ 2010-03-18 20:47 UTC (permalink / raw)
  To: Frank Ch. Eigler
  Cc: Frederic Weisbecker, Avi Kivity, Anthony Liguori, Zhang, Yanmin,
	Peter Zijlstra, Sheng Yang, linux-kernel, kvm, Marcelo Tosatti,
	Joerg Roedel, Jes Sorensen, Gleb Natapov, Zachary Amsden,
	ziteng.huang, Arnaldo Carvalho de Melo


* Frank Ch. Eigler <fche@redhat.com> wrote:

> Frederic Weisbecker <fweisbec@gmail.com> writes:
> 
> > [...]  It is actually because both kernel and user side are in sync in this 
> > scheme.  [...]
> 
> This argues that co-evolution of an interface is easiest on the developers 
> if they own both sides of that interface.  No quarrel.

Correct, that's a big advantage.

> This does not argue that that the preservation of a stable ABI is best done 
> this way.  If anything, it makes it too easy to change both the provider and 
> the preferred user of the interface without noticing unintentional breakage 
> to forlorn out-of-your-tree clients.

Your concern is valid, and this issue has been raised in the past as one of 
the main counter-arguments against tools/perf/. (there was a big flamewar 
about it on lkml when it was introduced)

Our roughly 1 year of experience with perf is that, somewhat paradoxically, this 
scheme not only works as well as classic ABI schemes but actually brings a 
_better_ ABI than the classic "let the kernel define an ABI" single-sided 
solution.

I know the difference first hand, i've written various syscalls ABIs in the 
past 10+ years before perf and know how they interact with their user space 
counterparts.

Why did it work out better with tools/perf/? It turns out that there's an 
immediate, direct, actionable test feedback effect on the ABI, and much closer 
relation to the ABI. Typically the same developer implements the kernel bits 
and the user-space bits (because it's so easy to do co-development), so the 
ABI aspects are ingrained in the developer much more deeply. Once you see the 
kind of havoc ABI breakage can cause during development you avoid it in the 
future.

So developers find that a good, stable ABI helps development. It turns out 
that developers don't actually _want_ to break the ABI and are careful about it 
- and having the app next to the kernel ABI and co-developing it makes sure 
there's never any true mismatch.

Also, we can do ABI improvements at a far higher rate than any other kernel 
subsystem. I checked the git logs, we've done over three dozen ABI extensions 
since the first version, and all were forwards _and_ backwards compatible.
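
The trick that makes this kind of compatibility cheap is almost boringly 
simple. Roughly this pattern - the struct and function names below are made 
up for illustration, it shows the approach rather than the actual perf code:

  #include <errno.h>
  #include <stdint.h>
  #include <string.h>
  #include <stdio.h>

  /* v1 of a hypothetical attr structure, and a later v2 that grew a field */
  struct attr_v1 { uint32_t size; uint64_t flags; };
  struct attr_v2 { uint32_t size; uint64_t flags; uint64_t new_knob; };

  /*
   * "Kernel" side, built against v2: accept anything up to sizeof(v2),
   * zero-fill what the caller did not know about, reject sizes from the
   * future that it cannot understand.
   */
  static int take_attr(struct attr_v2 *dst, const void *src, uint32_t usize)
  {
          if (usize > sizeof(*dst))
                  return -E2BIG;              /* new tool on an old kernel     */
          memset(dst, 0, sizeof(*dst));       /* new fields default to "off"   */
          memcpy(dst, src, usize);            /* old tool: trailing fields = 0 */
          dst->size = sizeof(*dst);
          return 0;
  }

  int main(void)
  {
          struct attr_v1 old_tool = { .size = sizeof(old_tool), .flags = 1 };
          struct attr_v2 cur;

          if (!take_attr(&cur, &old_tool, old_tool.size))
                  printf("flags=%llu new_knob=%llu\n",
                         (unsigned long long)cur.flags,
                         (unsigned long long)cur.new_knob);
          return 0;
  }

As long as every new field means 'off' when it is zero, an old binary keeps 
working on a new kernel and a new binary degrades gracefully on an old one - 
which is what lets the ABI grow that quickly without breaking anybody.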

A higher rate of change gives developers more experience and lets them do a 
better ABI, and makes them more ABI-conscious. I think if all kernel ABIs had 
such a healthy rate of change we'd fill in all the missing kernel features 
very quickly.

With detached packages ABI features are often done by a kernel developer (who 
is familiar with the kernel subsystem in question) and a separate user-space 
developer (who is familiar with the user-space project in question), and the 
ABI consciousness is less strong.

So you are right that there's a danger of accidental ABI breakage, but it's 
not an issue in practice. There are external apps making use of the ABI as 
well, not just tools/perf/.

In a more abstract sense this is kind of a classic case of game theory: that an 
assume-trust strategy pays off in the long run.

Thanks,

	Ingo

^ permalink raw reply	[flat|nested] 390+ messages in thread

* Re: [RFC] Unify KVM kernel-space and user-space code into a single project
  2010-03-18 20:39                                                             ` drepper
@ 2010-03-18 20:56                                                               ` Ingo Molnar
  2010-03-18 22:06                                                                 ` Alan Cox
  0 siblings, 1 reply; 390+ messages in thread
From: Ingo Molnar @ 2010-03-18 20:56 UTC (permalink / raw)
  To: drepper
  Cc: Anthony Liguori, Avi Kivity, Peter Zijlstra, linux-kernel, kvm,
	Marcelo Tosatti, Joerg Roedel, Jes Sorensen, Gleb Natapov,
	Zachary Amsden, Arnaldo Carvalho de Melo, Frédéric Weisbecker


* drepper@gmail.com <drepper@gmail.com> wrote:

> > For example, just to state the obvious: libaio has been written 8 years 
> > ago in 2002 and has been used in apps early on. Why arent those kernel 
> > APIs, while not being a full/complete solution, supported by glibc, and 
> > wrapped to pthreads based emulation on kernels that dont support it?
> 
> You never looked at the glibc code in use and didn't read what I wrote 
> before.  We do have an implementation of libaio using those interfaces.  
> They exist in the Fedora/RHEL glibc and are probably copied elsewhere, too.  
> The code is not upstream because it is not general enough.  It simply 
> doesn't work in all situations.

So it's good enough to be in Fedora/RHEL but not good enough to be in upstream 
glibc? How is that possible? Isn't that a double standard?

Upstream libc presence is really what is needed for an API to be ubiquitous to 
apps. That is what 'closes the loop' in the positive feedback cycle 
and creates real back pressure and demand on the kernel to get its act 
together.

Again, i state it for the third time, the KAIO situation is mostly the 
kernel's fault. But glibc is certainly not being helpful in that situation 
either and your earlier claim that you are only waiting for the patches is 
rather dishonest.

Thanks,

	Ingo

^ permalink raw reply	[flat|nested] 390+ messages in thread

* Re: [RFC] Unify KVM kernel-space and user-space code into a single project
  2010-03-18 10:50                                           ` Ingo Molnar
  2010-03-18 11:30                                             ` Avi Kivity
@ 2010-03-18 21:02                                             ` Zachary Amsden
  2010-03-18 21:15                                               ` Ingo Molnar
  1 sibling, 1 reply; 390+ messages in thread
From: Zachary Amsden @ 2010-03-18 21:02 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Avi Kivity, Anthony Liguori, Zhang, Yanmin, Peter Zijlstra,
	Sheng Yang, linux-kernel, kvm, Marcelo Tosatti, Joerg Roedel,
	Jes Sorensen, Gleb Natapov, ziteng.huang,
	Arnaldo Carvalho de Melo, Frédéric Weisbecker

On 03/18/2010 12:50 AM, Ingo Molnar wrote:
> * Avi Kivity<avi@redhat.com>  wrote:
>
>    
>>> The moment any change (be it as trivial as fixing a GUI detail or as
>>> complex as a new feature) involves two or more packages, development speed
>>> slows down to a crawl - while the complexity of the change might be very
>>> low!
>>>        
>> Why is that?
>>      
> It's very simple: because the contribution latencies and overhead compound,
> almost inevitably.
>
> If you ever tried to implement a combo GCC+glibc+kernel feature you'll know
> ...
>
> Even with the best-run projects in existence it takes forever and is very
> painful - and here i talk about first hand experience over many years.
>    

Ingo, what you miss is that this is not a bad thing.  Fact of the matter 
is, it's not just painful, it downright sucks.

This is actually a Good Thing (tm).  It means you have to get your 
feature and its interfaces well defined and able to version forwards and 
backwards independently from each other.  And that introduces some 
complexity and time and testing, but in the end it's what you want.  You 
don't introduce a requirement to have the feature, but take advantage of 
it if it is there.

It may take everyone else a couple years to upgrade the compilers, 
tools, libraries and kernel, and by that time any bugs introduced by 
interacting with this feature will have been ironed out and their 
patterns well known.

If you haven't well defined and carefully thought out the feature ahead 
of time, you end up creating a giant mess, possibly the need for nasty 
backwards compatibility (case in point: COMPAT_VDSO).  But in the end, 
you would have made those same mistakes on your internal tree anyway, 
and then you (or likely, some other hapless project maintainer for the 
project you forked) would have to go add the features, fixes and 
workarounds back to the original project(s).  However, since you 
developed in an insulated sheltered environment, those fixes and 
workarounds would not be robust and independently versionable from each 
other.

The result is you've kept your codebase version-neutral, forked in 
outside code, enhanced it, and left the hard work of backporting those 
changes and keeping them version-safe to the original package 
maintainers you forked from.  What you've created is no longer a single 
project, it is called a distro, and you're being short-sighted and 
anti-social to think you can garner more support than all of those 
individual packages you forked.  This is why most developers work 
upstream and let the goodness propagate down from the top like molten 
sugar of each granular package on a flan where it is collected from the 
rich custard channel sitting on a distribution plate below before the 
big hungry mouth of the consumer devours it and incorporates it into 
their infrastructure.

Or at least, something like that, until the last sentence.  In short, if 
project A has Y active developers, you had better have Z >> Y active 
developers to throw at project B when you fork project A into project B.

Zach

^ permalink raw reply	[flat|nested] 390+ messages in thread

* Re: [RFC] Unify KVM kernel-space and user-space code into a single project
  2010-03-18 21:02                                             ` Zachary Amsden
@ 2010-03-18 21:15                                               ` Ingo Molnar
  2010-03-18 22:19                                                 ` Zachary Amsden
  0 siblings, 1 reply; 390+ messages in thread
From: Ingo Molnar @ 2010-03-18 21:15 UTC (permalink / raw)
  To: Zachary Amsden
  Cc: Avi Kivity, Anthony Liguori, Zhang, Yanmin, Peter Zijlstra,
	Sheng Yang, linux-kernel, kvm, Marcelo Tosatti, Joerg Roedel,
	Jes Sorensen, Gleb Natapov, ziteng.huang,
	Arnaldo Carvalho de Melo, Frédéric Weisbecker


* Zachary Amsden <zamsden@redhat.com> wrote:

> On 03/18/2010 12:50 AM, Ingo Molnar wrote:
> >* Avi Kivity<avi@redhat.com>  wrote:
> >
> >>>The moment any change (be it as trivial as fixing a GUI detail or as
> >>>complex as a new feature) involves two or more packages, development speed
> >>>slows down to a crawl - while the complexity of the change might be very
> >>>low!
> >>Why is that?
> >It's very simple: because the contribution latencies and overhead compound,
> >almost inevitably.
> >
> >If you ever tried to implement a combo GCC+glibc+kernel feature you'll know
> >...
> >
> >Even with the best-run projects in existence it takes forever and is very
> >painful - and here i talk about first hand experience over many years.
> 
> Ingo, what you miss is that this is not a bad thing.  Fact of the
> matter is, it's not just painful, it downright sucks.

Our experience is the opposite, and we tried both variants and report about 
our experience with both models honestly.

You only have experience about one variant - the one you advocate.

See the asymmetry?

> This is actually a Good Thing (tm).  It means you have to get your
> feature and its interfaces well defined and able to version forwards
> and backwards independently from each other.  And that introduces
> some complexity and time and testing, but in the end it's what you
> want.  You don't introduce a requirement to have the feature, but
> take advantage of it if it is there.
> 
> It may take everyone else a couple years to upgrade the compilers,
> tools, libraries and kernel, and by that time any bugs introduced by
> interacting with this feature will have been ironed out and their
> patterns well known.

Sorry, but this is plain not true. The 2.4->2.6 kernel cycle debacle has taught 
us that waiting too long to 'iron out' the details has the following effects:

 - developer pain
 - user pain
 - distro pain
 - disconnect
 - loss of developers, testers and users
 - grave bugs discovered months (years ...) down the line
 - untested features
 - developer exhaustion

It didn't work, trust me - and i've been around long enough to have suffered 
through the whole 2.5.x misery. Some of our worst ABIs come from that cycle as 
well.

So we first created the 2.6.x process, then as we saw that it worked much 
better we _sped up_ the kernel development process some more, to what many 
claimed was an impossible, crazy pace: two weeks merge window, 2.5 months 
stabilization and a stable release every 3 months.

And you can also see the countless examples of carefully drafted, well thought 
out, committee written computer standards that were honed for years, which are 
not worth the paper they are written on.

'Extra time' and 'extra bureaucratic overhead to think things through' are about 
the worst things you can inject into a development process.

You should think about the human brain as a cache - the 'closer' things are 
both in time and physically, the better they end up being. Also, the more 
gradual, the more concentrated a thing is, the better it works out in general. 
This is part of basic human nature.

Sorry, but i really think you are trying to rationalize a disadvantage 
here ...

	Ingo

^ permalink raw reply	[flat|nested] 390+ messages in thread

* Re: [RFC] Unify KVM kernel-space and user-space code into a single project
  2010-03-18 20:56                                                               ` Ingo Molnar
@ 2010-03-18 22:06                                                                 ` Alan Cox
  2010-03-18 22:16                                                                   ` Ingo Molnar
  0 siblings, 1 reply; 390+ messages in thread
From: Alan Cox @ 2010-03-18 22:06 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: drepper, Anthony Liguori, Avi Kivity, Peter Zijlstra,
	linux-kernel, kvm, Marcelo Tosatti, Joerg Roedel, Jes Sorensen,
	Gleb Natapov, Zachary Amsden, Arnaldo Carvalho de Melo,
	Frédéric Weisbecker

> So it's good enough to be in Fedora/RHEL but not good enough to be in upstream 
> glibc? How is that possible? Isnt that a double standard?

Yes, it's a double standard.

Glibc has a higher standard than Fedora/RHEL.

Just like the Ubuntu kernel ships various ugly, unfit-for-upstream kernel
drivers.

> kernel's fault. But glibc is certainly not being helpful in that situation 
> either and your earlier claim that you are only waiting for the patches is 
> rather dishonest.

I am sure Ulrich is being totally honest, but send him the patches and
you'll find out. Plus you will learn what the API should look like when
you try to create it ...

Alan

^ permalink raw reply	[flat|nested] 390+ messages in thread

* Re: [RFC] Unify KVM kernel-space and user-space code into a single project
  2010-03-18 22:06                                                                 ` Alan Cox
@ 2010-03-18 22:16                                                                   ` Ingo Molnar
  2010-03-19  7:22                                                                     ` Avi Kivity
  0 siblings, 1 reply; 390+ messages in thread
From: Ingo Molnar @ 2010-03-18 22:16 UTC (permalink / raw)
  To: Alan Cox
  Cc: drepper, Anthony Liguori, Avi Kivity, Peter Zijlstra,
	linux-kernel, kvm, Marcelo Tosatti, Joerg Roedel, Jes Sorensen,
	Gleb Natapov, Zachary Amsden, Arnaldo Carvalho de Melo,
	Frédéric Weisbecker


* Alan Cox <alan@lxorguk.ukuu.org.uk> wrote:

> > So it's good enough to be in Fedora/RHEL but not good enough to be in 
> > upstream glibc? How is that possible? Isnt that a double standard?
> 
> Yes its a double standard
> 
> Glibc has a higher standard than Fedora/RHEL.
>
> Just like the Ubuntu kernel ships various ugly unfit for upstream kernel 
> drivers.

There's a world of difference between a fugly driver and a glibc patch.

Also, we tend to upstream even fugly kernel drivers if they are important and 
are deployed by a major distro - see Nouveau.

> > kernel's fault. But glibc is certainly not being helpful in that situation 
> > either and your earlier claim that you are only waiting for the patches is 
> > rather dishonest.
> 
> I am sure Ulrich is being totally honest, but send him the patches and 
> you'll find out. Plus you will learn what the API should look like when you 
> try and create them ...

I was there and extended/fixed bits of the kaio/libaio code when they were 
written, so yes, i already know something about it. To say that the glibc 
reaction was less than enthusiastic back then is a strong euphemism ;-)

So after 8 years some of the bits made their way into Fedora/RHEL.

I think this is a pretty good demonstration of the points i made ;-)

Thanks,

	Ingo

^ permalink raw reply	[flat|nested] 390+ messages in thread

* Re: [RFC] Unify KVM kernel-space and user-space code into a single project
  2010-03-18 21:15                                               ` Ingo Molnar
@ 2010-03-18 22:19                                                 ` Zachary Amsden
  2010-03-18 22:44                                                   ` Ingo Molnar
  0 siblings, 1 reply; 390+ messages in thread
From: Zachary Amsden @ 2010-03-18 22:19 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Avi Kivity, Anthony Liguori, Zhang, Yanmin, Peter Zijlstra,
	Sheng Yang, linux-kernel, kvm, Marcelo Tosatti, Joerg Roedel,
	Jes Sorensen, Gleb Natapov, ziteng.huang,
	Arnaldo Carvalho de Melo, Frédéric Weisbecker

On 03/18/2010 11:15 AM, Ingo Molnar wrote:
> * Zachary Amsden<zamsden@redhat.com>  wrote:
>
>    
>> On 03/18/2010 12:50 AM, Ingo Molnar wrote:
>>      
>>> * Avi Kivity<avi@redhat.com>   wrote:
>>>
>>>        
>>>>> The moment any change (be it as trivial as fixing a GUI detail or as
>>>>> complex as a new feature) involves two or more packages, development speed
>>>>> slows down to a crawl - while the complexity of the change might be very
>>>>> low!
>>>>>            
>>>> Why is that?
>>>>          
>>> It's very simple: because the contribution latencies and overhead compound,
>>> almost inevitably.
>>>
>>> If you ever tried to implement a combo GCC+glibc+kernel feature you'll know
>>> ...
>>>
>>> Even with the best-run projects in existence it takes forever and is very
>>> painful - and here i talk about first hand experience over many years.
>>>        
>> Ingo, what you miss is that this is not a bad thing.  Fact of the
>> matter is, it's not just painful, it downright sucks.
>>      
> Our experience is the opposite, and we tried both variants and report about
> our experience with both models honestly.
>
> You only have experience about one variant - the one you advocate.
>
> See the assymetry?
>
>    
>> This is actually a Good Thing (tm).  It means you have to get your
>> feature and its interfaces well defined and able to version forwards
>> and backwards independently from each other.  And that introduces
>> some complexity and time and testing, but in the end it's what you
>> want.  You don't introduce a requirement to have the feature, but
>> take advantage of it if it is there.
>>
>> It may take everyone else a couple years to upgrade the compilers,
>> tools, libraries and kernel, and by that time any bugs introduced by
>> interacting with this feature will have been ironed out and their
>> patterns well known.
>>      
> Sorry, but this is pain not true. The 2.4->2.6 kernel cycle debacle has taught
> us that waiting long to 'iron out' the details has the following effects:
>
>   - developer pain
>   - user pain
>   - distro pain
>   - disconnect
>   - loss of developers, testers and users
>   - grave bugs discovered months (years ...) down the line
>   - untested features
>   - developer exhaustion
>
> It didnt work, trust me - and i've been around long enough to have suffered
> through the whole 2.5.x misery. Some of our worst ABIs come from that cycle as
> well.
>    

You're talking about a single project and comparing it to my argument 
about multiple independent projects.  In that case, I see no point in 
the discussion.  If you want to win the argument by strawman, you are 
welcome to do so.

> Sorry, but i really think you are really trying to rationalize a disadvantage
> here ...
>    

This could very well be true, but until someone comes forward with 
compelling numbers (as in, developers committed to working on the 
project, number of patches and total amount of code contribution), there 
is no point in having an argument; there really isn't anything to 
discuss other than opinion.  My opinion is you need a really strong 
justification to have a successful fork, and I don't see that justification.

Zach

^ permalink raw reply	[flat|nested] 390+ messages in thread

* Re: [RFC] Unify KVM kernel-space and user-space code into a single project
  2010-03-18 22:19                                                 ` Zachary Amsden
@ 2010-03-18 22:44                                                   ` Ingo Molnar
  2010-03-19  7:21                                                     ` Avi Kivity
  0 siblings, 1 reply; 390+ messages in thread
From: Ingo Molnar @ 2010-03-18 22:44 UTC (permalink / raw)
  To: Zachary Amsden
  Cc: Avi Kivity, Anthony Liguori, Zhang, Yanmin, Peter Zijlstra,
	Sheng Yang, linux-kernel, kvm, Marcelo Tosatti, Joerg Roedel,
	Jes Sorensen, Gleb Natapov, ziteng.huang,
	Arnaldo Carvalho de Melo, Frédéric Weisbecker


* Zachary Amsden <zamsden@redhat.com> wrote:

> On 03/18/2010 11:15 AM, Ingo Molnar wrote:
> >* Zachary Amsden<zamsden@redhat.com>  wrote:
> >
> >>On 03/18/2010 12:50 AM, Ingo Molnar wrote:
> >>>* Avi Kivity<avi@redhat.com>   wrote:
> >>>
> >>>>>The moment any change (be it as trivial as fixing a GUI detail or as
> >>>>>complex as a new feature) involves two or more packages, development speed
> >>>>>slows down to a crawl - while the complexity of the change might be very
> >>>>>low!
> >>>>Why is that?
> >>>It's very simple: because the contribution latencies and overhead compound,
> >>>almost inevitably.
> >>>
> >>>If you ever tried to implement a combo GCC+glibc+kernel feature you'll know
> >>>...
> >>>
> >>>Even with the best-run projects in existence it takes forever and is very
> >>>painful - and here i talk about first hand experience over many years.
> >>Ingo, what you miss is that this is not a bad thing.  Fact of the
> >>matter is, it's not just painful, it downright sucks.
> >Our experience is the opposite, and we tried both variants and report about
> >our experience with both models honestly.
> >
> >You only have experience about one variant - the one you advocate.
> >
> >See the assymetry?
> >
> >>This is actually a Good Thing (tm).  It means you have to get your
> >>feature and its interfaces well defined and able to version forwards
> >>and backwards independently from each other.  And that introduces
> >>some complexity and time and testing, but in the end it's what you
> >>want.  You don't introduce a requirement to have the feature, but
> >>take advantage of it if it is there.
> >>
> >>It may take everyone else a couple years to upgrade the compilers,
> >>tools, libraries and kernel, and by that time any bugs introduced by
> >>interacting with this feature will have been ironed out and their
> >>patterns well known.
> >Sorry, but this is pain not true. The 2.4->2.6 kernel cycle debacle has taught
> >us that waiting long to 'iron out' the details has the following effects:
> >
> >  - developer pain
> >  - user pain
> >  - distro pain
> >  - disconnect
> >  - loss of developers, testers and users
> >  - grave bugs discovered months (years ...) down the line
> >  - untested features
> >  - developer exhaustion
> >
> >It didnt work, trust me - and i've been around long enough to have suffered
> >through the whole 2.5.x misery. Some of our worst ABIs come from that cycle as
> >well.
> 
> You're talking about a single project and comparing it to my argument about 
> multiple independent projects.  In that case, I see no point in the 
> discussion.  If you want to win the argument by strawman, you are welcome to 
> do so.

The kernel is a very complex project with many ABI issues, so all those 
arguments apply to it as well. The description you gave:

 | This is actually a Good Thing (tm).  It means you have to get your feature 
 | and its interfaces well defined and able to version forwards and backwards 
 | independently from each other.  And that introduces some complexity and 
 | time and testing, but in the end it's what you want.  You don't introduce a 
 | requirement to have the feature, but take advantage of it if it is there.

matches the kernel too. We have many such situations. (Furthermore, the 
tools/perf/ situation, which relates to ABIs and user-space/kernel-space 
interactions, is similar as well.)

Do you still think i'm making a straw-man argument?

> > Sorry, but i really think you are really trying to rationalize a 
> > disadvantage here ...
> 
> This could very well be true, but until someone comes forward with 
> compelling numbers (as in, developers committed to working on the project, 
> number of patches and total amount of code contribution), there is no point 
> in having an argument, there really isn't anything to discuss other than 
> opinion.  My opinion is you need a really strong justification to have a 
> successful fork and I don't see that justification.

I can give you rough numbers for tools/perf - if that counts for you.

For the first four months of its existence, when it was a separate project, i 
had a single external contributor IIRC.

The moment it went into the kernel repo, the number of contributors and 
contributions skyrocketed, and basically all contributions were top-notch. We 
are at 60+ separate contributors now (after about 8 months upstream) - which 
is still small compared to the kernel or to Qemu, but huge for a relatively 
isolated project like instrumentation.

So in my estimation tools/kvm/ would certainly be popular. Whether it would be 
more popular than current Qemu is hard to tell - it would be pure speculation.

Any reliable numbers for the other aspect - whether a split project creates a 
more fragile and less developed ABI - would be extremely hard to get. I believe 
it to be true, but that's my opinion based on my experience with other 
projects, extrapolated to KVM/Qemu.

Anyway, the issue is moot as there's clear opposition to the unification idea. 

Too bad - there was heavy initial opposition to the arch/x86 unification as 
well [and heavy opposition to tools/perf/ too], yet both worked out 
extremely well :-)

	Ingo

^ permalink raw reply	[flat|nested] 390+ messages in thread

* [PATCH] Enhance perf to collect KVM guest os statistics from host side
  2010-03-16  5:27 [PATCH] Enhance perf to collect KVM guest os statistics from host side Zhang, Yanmin
  2010-03-16  5:41 ` Avi Kivity
@ 2010-03-19  3:38 ` Zhang, Yanmin
  2010-03-19  8:21   ` Ingo Molnar
  1 sibling, 1 reply; 390+ messages in thread
From: Zhang, Yanmin @ 2010-03-19  3:38 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Peter Zijlstra, Avi Kivity, Sheng Yang, linux-kernel, kvm,
	Marcelo Tosatti, Joerg Roedel, Jes Sorensen, Gleb Natapov,
	Zachary Amsden, zhiteng.huang

On Tue, 2010-03-16 at 13:27 +0800, Zhang, Yanmin wrote:
> From: Zhang, Yanmin <yanmin_zhang@linux.intel.com>
> 
Here is the new V2 patch against tip/master of March 17th,
in case anyone wants to try it.


ChangeLog V2:
	1) Based on Avi's suggestion, I moved the callback functions
	to the generic code area, so the kernel part of the patch is
	clearer.
	2) Add 'perf kvm stat'.
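
For reference, 'perf kvm stat' simply forwards to the regular 'perf stat'
command (see cmd_kvm() in the patch below), so an illustrative run - the
exact options here are an assumption, not taken from a tested session -
would be:

  perf kvm --host --guest stat -a sleep 10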


From: Zhang, Yanmin <yanmin_zhang@linux.intel.com>

Based on the discussion in KVM community, I worked out the patch to support
perf to collect guest os statistics from host side. This patch is implemented
with Ingo, Peter and some other guys' kind help. Yang Sheng pointed out a
critical bug and provided good suggestions with other guys. I really appreciate
their kind help.

The patch adds new subcommand kvm to perf.

  perf kvm top
  perf kvm record
  perf kvm report
  perf kvm diff
  perf kvm stat

The new perf could profile guest os kernel except guest os user space, but it
could summarize guest os user space utilization per guest os.

Below are some examples.
1) perf kvm top
[root@lkp-ne01 norm]# perf kvm --host --guest --guestkallsyms=/home/ymzhang/guest/kallsyms
--guestmodules=/home/ymzhang/guest/modules top

--------------------------------------------------------------------------------------------------------------------------
   PerfTop:   16010 irqs/sec  kernel:59.1% us: 1.5% guest kernel:31.9% guest us: 7.5% exact:  0.0% [1000Hz cycles],  (all, 16 CPUs)
--------------------------------------------------------------------------------------------------------------------------

             samples  pcnt function                  DSO
             _______ _____ _________________________ _______________________

            38770.00 20.4% __ticket_spin_lock        [guest.kernel.kallsyms]
            22560.00 11.9% ftrace_likely_update      [kernel.kallsyms]
             9208.00  4.8% __lock_acquire            [kernel.kallsyms]
             5473.00  2.9% trace_hardirqs_off_caller [kernel.kallsyms]
             5222.00  2.7% copy_user_generic_string  [guest.kernel.kallsyms]
             4450.00  2.3% validate_chain            [kernel.kallsyms]
             4262.00  2.2% trace_hardirqs_on_caller  [kernel.kallsyms]
             4239.00  2.2% do_raw_spin_lock          [kernel.kallsyms]
             3548.00  1.9% do_raw_spin_unlock        [kernel.kallsyms]
             2487.00  1.3% lock_release              [kernel.kallsyms]
             2165.00  1.1% __local_bh_disable        [kernel.kallsyms]
             1905.00  1.0% check_chain_key           [kernel.kallsyms]
             1737.00  0.9% lock_acquire              [kernel.kallsyms]
             1604.00  0.8% tcp_recvmsg               [kernel.kallsyms]
             1524.00  0.8% mark_lock                 [kernel.kallsyms]
             1464.00  0.8% schedule                  [kernel.kallsyms]
             1423.00  0.7% __d_lookup                [guest.kernel.kallsyms]

If you want to just show host data, pls. don't use parameter --guest.
The headline includes guest os kernel and userspace percentage.
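
A host-only profile, for example, can be started with just (illustrative
command line, same machine as above):

[root@lkp-ne01 norm]# perf kvm --host top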

2) perf kvm record
[root@lkp-ne01 norm]# perf kvm --host --guest --guestkallsyms=/home/ymzhang/guest/kallsyms
--guestmodules=/home/ymzhang/guest/modules record -f -a sleep 60
[ perf record: Woken up 15 times to write data ]
[ perf record: Captured and wrote 29.385 MB perf.data.kvm (~1283837 samples) ]

3) perf kvm report
        3.1) [root@lkp-ne01 norm]# perf kvm --host --guest --guestkallsyms=/home/ymzhang/guest/kallsyms
--guestmodules=/home/ymzhang/guest/modules report --sort pid --showcpuutilization>norm.host.guest.report.pid
# Samples: 424719292247
#
# Overhead  sys    us    guest sys    guest us            Command:  Pid
# ........  .....................
#
    50.57%     1.02%     0.00%    39.97%     9.58%  qemu-system-x86: 3587
    49.32%     1.35%     0.01%    35.20%    12.76%  qemu-system-x86: 3347
     0.07%     0.07%     0.00%     0.00%     0.00%             perf: 5217


Some performance guys want perf to show sys/us/guest_sys/guest_us per KVM guest
instance, which is actually just a multi-threaded process. The sub-parameter --showcpuutilization above
does so.

        3.2) [root@lkp-ne01 norm]# perf kvm --host --guest --guestkallsyms=/home/ymzhang/guest/kallsyms
--guestmodules=/home/ymzhang/guest/modules report >norm.host.guest.report
# Samples: 2466991384118
#
# Overhead          Command                                                             Shared Object  Symbol
# ........  ...............  ........................................................................  ......
#
    29.11%  qemu-system-x86  [guest.kernel.kallsyms]                                                   [g] __ticket_spin_lock
     5.88%       tbench_srv  [kernel.kallsyms]                                                         [k] ftrace_likely_update
     5.76%           tbench  [kernel.kallsyms]                                                         [k] ftrace_likely_update
     3.88%  qemu-system-x86                                                                34c3255482  [u] 0x000034c3255482
     1.83%           tbench  [kernel.kallsyms]                                                         [k] __lock_acquire
     1.81%       tbench_srv  [kernel.kallsyms]                                                         [k] __lock_acquire
     1.38%       tbench_srv  [kernel.kallsyms]                                                         [k] trace_hardirqs_off_caller
     1.37%           tbench  [kernel.kallsyms]                                                         [k] trace_hardirqs_off_caller
     1.13%  qemu-system-x86  [guest.kernel.kallsyms]                                                   [g] copy_user_generic_string
     1.04%       tbench_srv  [kernel.kallsyms]                                                         [k] validate_chain
     1.00%           tbench  [kernel.kallsyms]                                                         [k] trace_hardirqs_on_caller
     1.00%       tbench_srv  [kernel.kallsyms]                                                         [k] trace_hardirqs_on_caller
     0.95%           tbench  [kernel.kallsyms]                                                         [k] do_raw_spin_lock


[u] means the sample is in guest os user space. [g] means it is in the guest os kernel. Other info is straightforward.
If a module such as [ext4] is shown, it is a guest kernel module, because native host kernel
modules start with a path like /lib/modules/XXX.
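
The files passed via --guestkallsyms and --guestmodules are plain copies of the guest os
/proc/kallsyms and /proc/modules. One way to collect them (an illustrative sketch only,
assuming ssh access to the guest; 'guest-ip' is a placeholder) is:

[root@lkp-ne01 norm]# ssh guest-ip 'cat /proc/kallsyms' > /home/ymzhang/guest/kallsyms
[root@lkp-ne01 norm]# ssh guest-ip 'cat /proc/modules' > /home/ymzhang/guest/modules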


Below is the patch against tip/master tree of 17th March.

Signed-off-by: Zhang Yanmin <yanmin_zhang@linux.intel.com>

---

diff -Nraup linux-2.6_tip0317/arch/x86/include/asm/perf_event.h linux-2.6_tip0317_perfkvm/arch/x86/include/asm/perf_event.h
--- linux-2.6_tip0317/arch/x86/include/asm/perf_event.h	2010-03-18 09:04:36.597952883 +0800
+++ linux-2.6_tip0317_perfkvm/arch/x86/include/asm/perf_event.h	2010-03-18 15:06:19.579081193 +0800
@@ -143,17 +143,10 @@ extern void perf_events_lapic_init(void)
  */
 #define PERF_EFLAGS_EXACT	(1UL << 3)
 
-#define perf_misc_flags(regs)				\
-({	int misc = 0;					\
-	if (user_mode(regs))				\
-		misc |= PERF_RECORD_MISC_USER;		\
-	else						\
-		misc |= PERF_RECORD_MISC_KERNEL;	\
-	if (regs->flags & PERF_EFLAGS_EXACT)		\
-		misc |= PERF_RECORD_MISC_EXACT;		\
-	misc; })
-
-#define perf_instruction_pointer(regs)	((regs)->ip)
+struct pt_regs;
+extern unsigned long perf_instruction_pointer(struct pt_regs *regs);
+extern unsigned long perf_misc_flags(struct pt_regs *regs);
+#define perf_misc_flags(regs)	perf_misc_flags(regs)
 
 #else
 static inline void init_hw_perf_events(void)		{ }
diff -Nraup linux-2.6_tip0317/arch/x86/kernel/cpu/perf_event.c linux-2.6_tip0317_perfkvm/arch/x86/kernel/cpu/perf_event.c
--- linux-2.6_tip0317/arch/x86/kernel/cpu/perf_event.c	2010-03-18 09:04:36.665958497 +0800
+++ linux-2.6_tip0317_perfkvm/arch/x86/kernel/cpu/perf_event.c	2010-03-18 15:07:20.555339370 +0800
@@ -1708,3 +1708,30 @@ void perf_arch_fetch_caller_regs(struct 
 	local_save_flags(regs->flags);
 }
 #endif
+
+unsigned long perf_instruction_pointer(struct pt_regs *regs)
+{
+	unsigned long ip;
+	if (perf_guest_cbs && perf_guest_cbs->is_in_guest())
+		ip = perf_guest_cbs->get_guest_ip();
+	else
+		ip = instruction_pointer(regs);
+	return ip;
+}
+
+unsigned long perf_misc_flags(struct pt_regs *regs)
+{
+	int misc = 0;
+	if (perf_guest_cbs && perf_guest_cbs->is_in_guest()) {
+		misc |= perf_guest_cbs->is_user_mode() ?
+			PERF_RECORD_MISC_GUEST_USER :
+			PERF_RECORD_MISC_GUEST_KERNEL;
+	} else
+		misc |= user_mode(regs) ? PERF_RECORD_MISC_USER :
+			PERF_RECORD_MISC_KERNEL;
+	if (regs->flags & PERF_EFLAGS_EXACT)
+		misc |= PERF_RECORD_MISC_EXACT;
+
+	return misc;
+}
+
diff -Nraup linux-2.6_tip0317/arch/x86/kvm/x86.c linux-2.6_tip0317_perfkvm/arch/x86/kvm/x86.c
--- linux-2.6_tip0317/arch/x86/kvm/x86.c	2010-03-18 09:04:36.629956698 +0800
+++ linux-2.6_tip0317_perfkvm/arch/x86/kvm/x86.c	2010-03-18 15:06:19.579081193 +0800
@@ -3764,6 +3764,35 @@ static void kvm_timer_init(void)
 	}
 }
 
+static DEFINE_PER_CPU(struct kvm_vcpu *, current_vcpu);
+
+static int kvm_is_in_guest(void)
+{
+	return percpu_read(current_vcpu) != NULL;
+}
+
+static int kvm_is_user_mode(void)
+{
+	int user_mode = 3;
+	if (percpu_read(current_vcpu))
+		user_mode = kvm_x86_ops->get_cpl(percpu_read(current_vcpu));
+	return user_mode != 0;
+}
+
+static unsigned long kvm_get_guest_ip(void)
+{
+	unsigned long ip = 0;
+	if (percpu_read(current_vcpu))
+		ip = kvm_rip_read(percpu_read(current_vcpu));
+	return ip;
+}
+
+static struct perf_guest_info_callbacks kvm_guest_cbs = {
+	.is_in_guest            = kvm_is_in_guest,
+	.is_user_mode           = kvm_is_user_mode,
+	.get_guest_ip           = kvm_get_guest_ip,
+};
+
 int kvm_arch_init(void *opaque)
 {
 	int r;
@@ -3800,6 +3829,8 @@ int kvm_arch_init(void *opaque)
 
 	kvm_timer_init();
 
+	perf_register_guest_info_callbacks(&kvm_guest_cbs);
+
 	return 0;
 
 out:
@@ -3808,6 +3839,8 @@ out:
 
 void kvm_arch_exit(void)
 {
+	perf_unregister_guest_info_callbacks(&kvm_guest_cbs);
+
 	if (!boot_cpu_has(X86_FEATURE_CONSTANT_TSC))
 		cpufreq_unregister_notifier(&kvmclock_cpufreq_notifier_block,
 					    CPUFREQ_TRANSITION_NOTIFIER);
@@ -4338,7 +4371,10 @@ static int vcpu_enter_guest(struct kvm_v
 	}
 
 	trace_kvm_entry(vcpu->vcpu_id);
+
+	percpu_write(current_vcpu, vcpu);
 	kvm_x86_ops->run(vcpu);
+	percpu_write(current_vcpu, NULL);
 
 	/*
 	 * If the guest has used debug registers, at least dr7
diff -Nraup linux-2.6_tip0317/include/linux/perf_event.h linux-2.6_tip0317_perfkvm/include/linux/perf_event.h
--- linux-2.6_tip0317/include/linux/perf_event.h	2010-03-18 09:04:37.674034701 +0800
+++ linux-2.6_tip0317_perfkvm/include/linux/perf_event.h	2010-03-18 15:06:19.583056523 +0800
@@ -288,11 +288,13 @@ struct perf_event_mmap_page {
 	__u64	data_tail;		/* user-space written tail */
 };
 
-#define PERF_RECORD_MISC_CPUMODE_MASK		(3 << 0)
+#define PERF_RECORD_MISC_CPUMODE_MASK		(7 << 0)
 #define PERF_RECORD_MISC_CPUMODE_UNKNOWN	(0 << 0)
 #define PERF_RECORD_MISC_KERNEL			(1 << 0)
 #define PERF_RECORD_MISC_USER			(2 << 0)
 #define PERF_RECORD_MISC_HYPERVISOR		(3 << 0)
+#define PERF_RECORD_MISC_GUEST_KERNEL		(4 << 0)
+#define PERF_RECORD_MISC_GUEST_USER		(5 << 0)
 
 #define PERF_RECORD_MISC_EXACT			(1 << 14)
 /*
@@ -446,6 +448,12 @@ enum perf_callchain_context {
 # include <asm/perf_event.h>
 #endif
 
+struct perf_guest_info_callbacks {
+	int (*is_in_guest) (void);
+	int (*is_user_mode) (void);
+	unsigned long (*get_guest_ip) (void);
+};
+
 #ifdef CONFIG_HAVE_HW_BREAKPOINT
 #include <asm/hw_breakpoint.h>
 #endif
@@ -913,6 +921,12 @@ static inline void perf_event_mmap(struc
 		__perf_event_mmap(vma);
 }
 
+extern struct perf_guest_info_callbacks *perf_guest_cbs;
+extern int perf_register_guest_info_callbacks(
+		struct perf_guest_info_callbacks *);
+extern int perf_unregister_guest_info_callbacks(
+		struct perf_guest_info_callbacks *);
+
 extern void perf_event_comm(struct task_struct *tsk);
 extern void perf_event_fork(struct task_struct *tsk);
 
@@ -982,6 +996,11 @@ perf_sw_event(u32 event_id, u64 nr, int 
 static inline void
 perf_bp_event(struct perf_event *event, void *data)			{ }
 
+static inline int perf_register_guest_info_callbacks
+(struct perf_guest_info_callbacks *callbacks)	{ return 0; }
+static inline int perf_unregister_guest_info_callbacks
+(struct perf_guest_info_callbacks *callbacks)	{ return 0; }
+
 static inline void perf_event_mmap(struct vm_area_struct *vma)		{ }
 static inline void perf_event_comm(struct task_struct *tsk)		{ }
 static inline void perf_event_fork(struct task_struct *tsk)		{ }
diff -Nraup linux-2.6_tip0317/kernel/perf_event.c linux-2.6_tip0317_perfkvm/kernel/perf_event.c
--- linux-2.6_tip0317/kernel/perf_event.c	2010-03-18 09:04:40.954262305 +0800
+++ linux-2.6_tip0317_perfkvm/kernel/perf_event.c	2010-03-18 15:06:19.583056523 +0800
@@ -2798,6 +2798,27 @@ void perf_arch_fetch_caller_regs(struct 
 #endif
 
 /*
+ * We assume there is only KVM supporting the callbacks.
+ * Later on, we might change it to a list if there is
+ * another virtualization implementation supporting the callbacks.
+ */
+struct perf_guest_info_callbacks *perf_guest_cbs;
+
+int perf_register_guest_info_callbacks(struct perf_guest_info_callbacks *cbs)
+{
+	perf_guest_cbs = cbs;
+	return 0;
+}
+EXPORT_SYMBOL_GPL(perf_register_guest_info_callbacks);
+
+int perf_unregister_guest_info_callbacks(struct perf_guest_info_callbacks *cbs)
+{
+	perf_guest_cbs = NULL;
+	return 0;
+}
+EXPORT_SYMBOL_GPL(perf_unregister_guest_info_callbacks);
+
+/*
  * Output
  */
 static bool perf_output_space(struct perf_mmap_data *data, unsigned long tail,
@@ -3740,7 +3761,7 @@ void __perf_event_mmap(struct vm_area_st
 		.event_id  = {
 			.header = {
 				.type = PERF_RECORD_MMAP,
-				.misc = 0,
+				.misc = PERF_RECORD_MISC_USER,
 				/* .size */
 			},
 			/* .pid */
diff -Nraup linux-2.6_tip0317/tools/perf/builtin-diff.c linux-2.6_tip0317_perfkvm/tools/perf/builtin-diff.c
--- linux-2.6_tip0317/tools/perf/builtin-diff.c	2010-03-18 09:04:40.914226433 +0800
+++ linux-2.6_tip0317_perfkvm/tools/perf/builtin-diff.c	2010-03-18 15:06:19.583056523 +0800
@@ -33,7 +33,7 @@ static int perf_session__add_hist_entry(
 		return -ENOMEM;
 
 	if (hit)
-		he->count += count;
+		__perf_session__add_count(he, al, count);
 
 	return 0;
 }
@@ -225,6 +225,9 @@ int cmd_diff(int argc, const char **argv
 			input_new = argv[1];
 		} else
 			input_new = argv[0];
+	} else if (symbol_conf.guest_vmlinux_name || symbol_conf.guest_kallsyms) {
+		input_old = "perf.data.host";
+		input_new = "perf.data.guest";
 	}
 
 	symbol_conf.exclude_other = false;
diff -Nraup linux-2.6_tip0317/tools/perf/builtin.h linux-2.6_tip0317_perfkvm/tools/perf/builtin.h
--- linux-2.6_tip0317/tools/perf/builtin.h	2010-03-18 09:04:40.910227768 +0800
+++ linux-2.6_tip0317_perfkvm/tools/perf/builtin.h	2010-03-18 15:06:19.583056523 +0800
@@ -32,5 +32,6 @@ extern int cmd_version(int argc, const c
 extern int cmd_probe(int argc, const char **argv, const char *prefix);
 extern int cmd_kmem(int argc, const char **argv, const char *prefix);
 extern int cmd_lock(int argc, const char **argv, const char *prefix);
+extern int cmd_kvm(int argc, const char **argv, const char *prefix);
 
 #endif
diff -Nraup linux-2.6_tip0317/tools/perf/builtin-kvm.c linux-2.6_tip0317_perfkvm/tools/perf/builtin-kvm.c
--- linux-2.6_tip0317/tools/perf/builtin-kvm.c	1970-01-01 08:00:00.000000000 +0800
+++ linux-2.6_tip0317_perfkvm/tools/perf/builtin-kvm.c	2010-03-18 15:06:19.583056523 +0800
@@ -0,0 +1,125 @@
+#include "builtin.h"
+#include "perf.h"
+
+#include "util/util.h"
+#include "util/cache.h"
+#include "util/symbol.h"
+#include "util/thread.h"
+#include "util/header.h"
+#include "util/session.h"
+
+#include "util/parse-options.h"
+#include "util/trace-event.h"
+
+#include "util/debug.h"
+
+#include <sys/prctl.h>
+
+#include <semaphore.h>
+#include <pthread.h>
+#include <math.h>
+
+static char			*file_name = NULL;
+static char			name_buffer[256];
+
+int				perf_host = 1;
+int				perf_guest = 0;
+
+static const char * const kvm_usage[] = {
+	"perf kvm [<options>] {top|record|report|diff|stat}",
+	NULL
+};
+
+static const struct option kvm_options[] = {
+	OPT_STRING('i', "input", &file_name, "file",
+		    "Input file name"),
+	OPT_STRING('o', "output", &file_name, "file",
+		    "Output file name"),
+	OPT_BOOLEAN(0, "guest", &perf_guest,
+		    "Collect guest os data"),
+	OPT_BOOLEAN(0, "host", &perf_host,
+		    "Collect host os data"),
+	OPT_STRING(0, "guestvmlinux", &symbol_conf.guest_vmlinux_name, "file",
+		    "file saving guest os vmlinux"),
+	OPT_STRING(0, "guestkallsyms", &symbol_conf.guest_kallsyms, "file",
+		    "file saving guest os /proc/kallsyms"),
+	OPT_STRING(0, "guestmodules", &symbol_conf.guest_modules, "file",
+		    "file saving guest os /proc/modules"),
+	OPT_END()
+};
+
+static int __cmd_record(int argc, const char **argv)
+{
+	int rec_argc, i = 0, j;
+	const char **rec_argv;
+
+	rec_argc = argc + 2;
+	rec_argv = calloc(rec_argc + 1, sizeof(char *));
+	rec_argv[i++] = strdup("record");
+	rec_argv[i++] = strdup("-o");
+	rec_argv[i++] = strdup(file_name);
+	for (j = 1; j < argc; j++, i++)
+		rec_argv[i] = argv[j];
+
+	BUG_ON(i != rec_argc);
+
+	return cmd_record(i, rec_argv, NULL);
+}
+
+static int __cmd_report(int argc, const char **argv)
+{
+	int rec_argc, i = 0, j;
+	const char **rec_argv;
+
+	rec_argc = argc + 2;
+	rec_argv = calloc(rec_argc + 1, sizeof(char *));
+	rec_argv[i++] = strdup("report");
+	rec_argv[i++] = strdup("-i");
+	rec_argv[i++] = strdup(file_name);
+	for (j = 1; j < argc; j++, i++)
+		rec_argv[i] = argv[j];
+
+	BUG_ON(i != rec_argc);
+
+	return cmd_report(i, rec_argv, NULL);
+}
+
+int cmd_kvm(int argc, const char **argv, const char *prefix __used)
+{
+	perf_host = perf_guest = 0;
+
+	argc = parse_options(argc, argv, kvm_options, kvm_usage,
+			PARSE_OPT_STOP_AT_NON_OPTION);
+	if (!argc)
+		usage_with_options(kvm_usage, kvm_options);
+
+	if (!perf_host)
+		perf_guest = 1;
+
+	if (!file_name) {
+		if (perf_host && !perf_guest)
+			sprintf(name_buffer, "perf.data.host");
+		else if (!perf_host && perf_guest)
+			sprintf(name_buffer, "perf.data.guest");
+		else
+			sprintf(name_buffer, "perf.data.kvm");
+		file_name = name_buffer;
+	}
+
+	if (!strncmp(argv[0], "rec", 3)) {
+		return __cmd_record(argc, argv);
+	} else if (!strncmp(argv[0], "rep", 3)) {
+		return __cmd_report(argc, argv);
+	} else if (!strncmp(argv[0], "diff", 4)) {
+		return cmd_diff(argc, argv, NULL);
+	} else if (!strncmp(argv[0], "top", 3)) {
+		return cmd_top(argc, argv, NULL);
+	} else if (!strncmp(argv[0], "stat", 3)) {
+		return cmd_stat(argc, argv, NULL);
+	} else {
+		usage_with_options(kvm_usage, kvm_options);
+	}
+
+	return 0;
+}
+
diff -Nraup linux-2.6_tip0317/tools/perf/builtin-record.c linux-2.6_tip0317_perfkvm/tools/perf/builtin-record.c
--- linux-2.6_tip0317/tools/perf/builtin-record.c	2010-03-18 09:04:40.942263175 +0800
+++ linux-2.6_tip0317_perfkvm/tools/perf/builtin-record.c	2010-03-18 15:06:19.583056523 +0800
@@ -566,18 +566,58 @@ static int __cmd_record(int argc, const 
 	post_processing_offset = lseek(output, 0, SEEK_CUR);
 
 	err = event__synthesize_kernel_mmap(process_synthesized_event,
-					    session, "_text");
+					    session, "/proc/kallsyms",
+					    "kernel.kallsyms",
+					    session->vmlinux_maps,
+					    "_text", PERF_RECORD_MISC_KERNEL);
 	if (err < 0) {
 		pr_err("Couldn't record kernel reference relocation symbol.\n");
 		return err;
 	}
 
-	err = event__synthesize_modules(process_synthesized_event, session);
+	err = event__synthesize_modules(process_synthesized_event,
+				session,
+				&session->kmaps,
+				PERF_RECORD_MISC_KERNEL);
 	if (err < 0) {
 		pr_err("Couldn't record kernel reference relocation symbol.\n");
 		return err;
 	}
 
+	if (perf_guest) {
+		/*
+	 * As for the guest kernel, when processing the record & report
+	 * subcommands we arrange the module mmap prior to the guest kernel
+	 * mmap and trigger a dso preload, because guest module symbols are
+	 * loaded from the guest kallsyms instead of /lib/modules/XXX. This
+	 * avoids missing symbols when the first sampled address falls in a
+	 * module instead of in the guest kernel.
+		 */
+		err = event__synthesize_modules(process_synthesized_event,
+				session,
+				&session->guest_kmaps,
+				PERF_RECORD_MISC_GUEST_KERNEL);
+		if (err < 0) {
+			pr_err("Couldn't record guest kernel reference relocation symbol.\n");
+			return err;
+		}
+
+		/*
+		 * We use _stext for guest kernel because guest kernel's /proc/kallsyms
+		 * have no _text.
+		 */
+		err = event__synthesize_kernel_mmap(process_synthesized_event,
+				session, symbol_conf.guest_kallsyms,
+				"guest.kernel.kallsyms",
+				session->guest_vmlinux_maps,
+				"_stext",
+				PERF_RECORD_MISC_GUEST_KERNEL);
+		if (err < 0) {
+			pr_err("Couldn't record guest kernel reference relocation symbol.\n");
+			return err;
+		}
+	}
+
 	if (!system_wide && profile_cpu == -1)
 		event__synthesize_thread(target_pid, process_synthesized_event,
 					 session);
diff -Nraup linux-2.6_tip0317/tools/perf/builtin-report.c linux-2.6_tip0317_perfkvm/tools/perf/builtin-report.c
--- linux-2.6_tip0317/tools/perf/builtin-report.c	2010-03-18 09:04:40.926228328 +0800
+++ linux-2.6_tip0317_perfkvm/tools/perf/builtin-report.c	2010-03-18 15:06:19.587050319 +0800
@@ -104,7 +104,7 @@ static int perf_session__add_hist_entry(
 		return -ENOMEM;
 
 	if (hit)
-		he->count += data->period;
+		__perf_session__add_count(he, al,  data->period);
 
 	if (symbol_conf.use_callchain) {
 		if (!hit)
@@ -428,6 +428,8 @@ static const struct option options[] = {
 		   "sort by key(s): pid, comm, dso, symbol, parent"),
 	OPT_BOOLEAN('P', "full-paths", &symbol_conf.full_paths,
 		    "Don't shorten the pathnames taking into account the cwd"),
+	OPT_BOOLEAN(0, "showcpuutilization", &symbol_conf.show_cpu_utilization,
+		    "Show sample percentage for different cpu modes"),
 	OPT_STRING('p', "parent", &parent_pattern, "regex",
 		   "regex filter to identify parent, see: '--sort parent'"),
 	OPT_BOOLEAN('x', "exclude-other", &symbol_conf.exclude_other,
diff -Nraup linux-2.6_tip0317/tools/perf/builtin-top.c linux-2.6_tip0317_perfkvm/tools/perf/builtin-top.c
--- linux-2.6_tip0317/tools/perf/builtin-top.c	2010-03-18 09:04:40.926228328 +0800
+++ linux-2.6_tip0317_perfkvm/tools/perf/builtin-top.c	2010-03-18 15:06:19.587050319 +0800
@@ -417,8 +417,9 @@ static double sym_weight(const struct sy
 }
 
 static long			samples;
-static long			userspace_samples;
+static long			kernel_samples, userspace_samples;
 static long			exact_samples;
+static long			guest_us_samples, guest_kernel_samples;
 static const char		CONSOLE_CLEAR[] = "^[[H^[[2J";
 
 static void __list_insert_active_sym(struct sym_entry *syme)
@@ -458,7 +459,10 @@ static void print_sym_table(void)
 	int printed = 0, j;
 	int counter, snap = !display_weighted ? sym_counter : 0;
 	float samples_per_sec = samples/delay_secs;
-	float ksamples_per_sec = (samples-userspace_samples)/delay_secs;
+	float ksamples_per_sec = kernel_samples/delay_secs;
+	float userspace_samples_per_sec = (userspace_samples)/delay_secs;
+	float guest_kernel_samples_per_sec = (guest_kernel_samples)/delay_secs;
+	float guest_us_samples_per_sec = (guest_us_samples)/delay_secs;
 	float esamples_percent = (100.0*exact_samples)/samples;
 	float sum_ksamples = 0.0;
 	struct sym_entry *syme, *n;
@@ -467,7 +471,8 @@ static void print_sym_table(void)
 	int sym_width = 0, dso_width = 0, dso_short_width = 0;
 	const int win_width = winsize.ws_col - 1;
 
-	samples = userspace_samples = exact_samples = 0;
+	samples = userspace_samples = kernel_samples = exact_samples = 0;
+	guest_kernel_samples = guest_us_samples = 0;
 
 	/* Sort the active symbols */
 	pthread_mutex_lock(&active_symbols_lock);
@@ -498,10 +503,21 @@ static void print_sym_table(void)
 	puts(CONSOLE_CLEAR);
 
 	printf("%-*.*s\n", win_width, win_width, graph_dotted_line);
-	printf( "   PerfTop:%8.0f irqs/sec  kernel:%4.1f%%  exact: %4.1f%% [",
-		samples_per_sec,
-		100.0 - (100.0*((samples_per_sec-ksamples_per_sec)/samples_per_sec)),
-		esamples_percent);
+	if (!perf_guest) {
+		printf( "   PerfTop:%8.0f irqs/sec  kernel:%4.1f%%  exact: %4.1f%% [",
+			samples_per_sec,
+			100.0 - (100.0*((samples_per_sec-ksamples_per_sec)/samples_per_sec)),
+			esamples_percent);
+	} else {
+		printf( "   PerfTop:%8.0f irqs/sec  kernel:%4.1f%% us:%4.1f%%"
+			" guest kernel:%4.1f%% guest us:%4.1f%% exact: %4.1f%% [",
+			samples_per_sec,
+			100.0 - (100.0*((samples_per_sec-ksamples_per_sec)/samples_per_sec)),
+			100.0 - (100.0*((samples_per_sec-userspace_samples_per_sec)/samples_per_sec)),
+			100.0 - (100.0*((samples_per_sec-guest_kernel_samples_per_sec)/samples_per_sec)),
+			100.0 - (100.0*((samples_per_sec-guest_us_samples_per_sec)/samples_per_sec)),
+			esamples_percent);
+	}
 
 	if (nr_counters == 1 || !display_weighted) {
 		printf("%Ld", (u64)attrs[0].sample_period);
@@ -963,9 +979,20 @@ static void event__process_sample(const 
 			return;
 		break;
 	case PERF_RECORD_MISC_KERNEL:
+		++kernel_samples;
 		if (hide_kernel_symbols)
 			return;
 		break;
+	case PERF_RECORD_MISC_GUEST_KERNEL:
+		++guest_kernel_samples;
+		break;
+	case PERF_RECORD_MISC_GUEST_USER:
+		++guest_us_samples;
+		/*
+		 * TODO: we don't process guest user from host side
+		 * except simple counting 
+		 */
+		return;
 	default:
 		return;
 	}
diff -Nraup linux-2.6_tip0317/tools/perf/Makefile linux-2.6_tip0317_perfkvm/tools/perf/Makefile
--- linux-2.6_tip0317/tools/perf/Makefile	2010-03-18 09:04:40.938289813 +0800
+++ linux-2.6_tip0317_perfkvm/tools/perf/Makefile	2010-03-18 15:06:19.587050319 +0800
@@ -462,6 +462,7 @@ BUILTIN_OBJS += builtin-trace.o
 BUILTIN_OBJS += builtin-probe.o
 BUILTIN_OBJS += builtin-kmem.o
 BUILTIN_OBJS += builtin-lock.o
+BUILTIN_OBJS += builtin-kvm.o
 
 PERFLIBS = $(LIB_FILE)
 
diff -Nraup linux-2.6_tip0317/tools/perf/perf.c linux-2.6_tip0317_perfkvm/tools/perf/perf.c
--- linux-2.6_tip0317/tools/perf/perf.c	2010-03-18 09:04:40.926228328 +0800
+++ linux-2.6_tip0317_perfkvm/tools/perf/perf.c	2010-03-18 15:06:19.587050319 +0800
@@ -308,6 +308,7 @@ static void handle_internal_command(int 
 		{ "probe",	cmd_probe,	0 },
 		{ "kmem",	cmd_kmem,	0 },
 		{ "lock",	cmd_lock,	0 },
+		{ "kvm",	cmd_kvm,	0 },
 	};
 	unsigned int i;
 	static const char ext[] = STRIP_EXTENSION;
diff -Nraup linux-2.6_tip0317/tools/perf/perf.h linux-2.6_tip0317_perfkvm/tools/perf/perf.h
--- linux-2.6_tip0317/tools/perf/perf.h	2010-03-18 09:04:40.942263175 +0800
+++ linux-2.6_tip0317_perfkvm/tools/perf/perf.h	2010-03-18 15:06:19.587050319 +0800
@@ -133,4 +133,6 @@ struct ip_callchain {
 	u64 ips[0];
 };
 
+extern int perf_host, perf_guest;
+
 #endif
diff -Nraup linux-2.6_tip0317/tools/perf/util/event.c linux-2.6_tip0317_perfkvm/tools/perf/util/event.c
--- linux-2.6_tip0317/tools/perf/util/event.c	2010-03-18 09:04:40.934227537 +0800
+++ linux-2.6_tip0317_perfkvm/tools/perf/util/event.c	2010-03-18 15:06:19.587050319 +0800
@@ -112,7 +112,7 @@ static int event__synthesize_mmap_events
 		event_t ev = {
 			.header = {
 				.type = PERF_RECORD_MMAP,
-				.misc = 0, /* Just like the kernel, see kernel/perf_event.c __perf_event_mmap */
+				.misc = PERF_RECORD_MISC_USER, /* Just like the kernel, see kernel/perf_event.c __perf_event_mmap */
 			 },
 		};
 		int n;
@@ -158,11 +158,13 @@ static int event__synthesize_mmap_events
 }
 
 int event__synthesize_modules(event__handler_t process,
-			      struct perf_session *session)
+			      struct perf_session *session,
+			      struct map_groups *kmaps,
+			      unsigned int misc)
 {
 	struct rb_node *nd;
 
-	for (nd = rb_first(&session->kmaps.maps[MAP__FUNCTION]);
+	for (nd = rb_first(&kmaps->maps[MAP__FUNCTION]);
 	     nd; nd = rb_next(nd)) {
 		event_t ev;
 		size_t size;
@@ -173,7 +175,7 @@ int event__synthesize_modules(event__han
 
 		size = ALIGN(pos->dso->long_name_len + 1, sizeof(u64));
 		memset(&ev, 0, sizeof(ev));
-		ev.mmap.header.misc = 1; /* kernel uses 0 for user space maps, see kernel/perf_event.c __perf_event_mmap */
+		ev.mmap.header.misc = misc; /* kernel uses 0 for user space maps, see kernel/perf_event.c __perf_event_mmap */
 		ev.mmap.header.type = PERF_RECORD_MMAP;
 		ev.mmap.header.size = (sizeof(ev.mmap) -
 				        (sizeof(ev.mmap.filename) - size));
@@ -241,13 +243,17 @@ static int find_symbol_cb(void *arg, con
 
 int event__synthesize_kernel_mmap(event__handler_t process,
 				  struct perf_session *session,
-				  const char *symbol_name)
+				  const char *kallsyms_name,
+				  const char *mmap_name,
+				  struct map **maps,
+				  const char *symbol_name,
+				  unsigned int misc)
 {
 	size_t size;
 	event_t ev = {
 		.header = {
 			.type = PERF_RECORD_MMAP,
-			.misc = 1, /* kernel uses 0 for user space maps, see kernel/perf_event.c __perf_event_mmap */
+			.misc = misc, /* kernel uses PERF_RECORD_MISC_USER for user space maps, see kernel/perf_event.c __perf_event_mmap */
 		},
 	};
 	/*
@@ -257,16 +263,16 @@ int event__synthesize_kernel_mmap(event_
 	 */
 	struct process_symbol_args args = { .name = symbol_name, };
 
-	if (kallsyms__parse("/proc/kallsyms", &args, find_symbol_cb) <= 0)
+	if (kallsyms__parse(kallsyms_name, &args, find_symbol_cb) <= 0)
 		return -ENOENT;
 
 	size = snprintf(ev.mmap.filename, sizeof(ev.mmap.filename),
-			"[kernel.kallsyms.%s]", symbol_name) + 1;
+			"[%s.%s]", mmap_name, symbol_name) + 1;
 	size = ALIGN(size, sizeof(u64));
 	ev.mmap.header.size = (sizeof(ev.mmap) - (sizeof(ev.mmap.filename) - size));
 	ev.mmap.pgoff = args.start;
-	ev.mmap.start = session->vmlinux_maps[MAP__FUNCTION]->start;
-	ev.mmap.len   = session->vmlinux_maps[MAP__FUNCTION]->end - ev.mmap.start ;
+	ev.mmap.start = maps[MAP__FUNCTION]->start;
+	ev.mmap.len   = maps[MAP__FUNCTION]->end - ev.mmap.start ;
 
 	return process(&ev, session);
 }
@@ -320,19 +326,25 @@ int event__process_lost(event_t *self, s
 	return 0;
 }
 
-int event__process_mmap(event_t *self, struct perf_session *session)
+static void event_set_kernel_mmap_len(struct map **maps, event_t *self)
 {
-	struct thread *thread;
-	struct map *map;
+	maps[MAP__FUNCTION]->start = self->mmap.start;
+	maps[MAP__FUNCTION]->end   = self->mmap.start + self->mmap.len;
+	/*
+	 * Be a bit paranoid here, some perf.data file came with
+	 * a zero sized synthesized MMAP event for the kernel.
+	 */
+	if (maps[MAP__FUNCTION]->end == 0)
+		maps[MAP__FUNCTION]->end = ~0UL;
+}
 
-	dump_printf(" %d/%d: [%#Lx(%#Lx) @ %#Lx]: %s\n",
-		    self->mmap.pid, self->mmap.tid, self->mmap.start,
-		    self->mmap.len, self->mmap.pgoff, self->mmap.filename);
+static int __event__process_mmap(event_t *self, struct perf_session *session)
+{
+	struct map *map;
+	static const char kmmap_prefix[] = "[kernel.kallsyms.";
 
-	if (self->mmap.pid == 0) {
-		static const char kmmap_prefix[] = "[kernel.kallsyms.";
+	if (self->mmap.filename[0] == '/') {
 
-		if (self->mmap.filename[0] == '/') {
 			char short_module_name[1024];
 			char *name = strrchr(self->mmap.filename, '/'), *dot;
 
@@ -348,9 +360,10 @@ int event__process_mmap(event_t *self, s
 				 "[%.*s]", (int)(dot - name), name);
 			strxfrchar(short_module_name, '-', '_');
 
-			map = perf_session__new_module_map(session,
+			map = map_groups__new_module(&session->kmaps,
 							   self->mmap.start,
-							   self->mmap.filename);
+							   self->mmap.filename,
+							   0);
 			if (map == NULL)
 				goto out_problem;
 
@@ -373,22 +386,94 @@ int event__process_mmap(event_t *self, s
 			if (kernel == NULL)
 				goto out_problem;
 
-			kernel->kernel = 1;
-			if (__perf_session__create_kernel_maps(session, kernel) < 0)
+			kernel->kernel = DSO_TYPE_KERNEL;
+			if (__map_groups__create_kernel_maps(&session->kmaps,
+						session->vmlinux_maps, kernel) < 0)
 				goto out_problem;
 
-			session->vmlinux_maps[MAP__FUNCTION]->start = self->mmap.start;
-			session->vmlinux_maps[MAP__FUNCTION]->end   = self->mmap.start + self->mmap.len;
-			/*
-			 * Be a bit paranoid here, some perf.data file came with
-			 * a zero sized synthesized MMAP event for the kernel.
-			 */
-			if (session->vmlinux_maps[MAP__FUNCTION]->end == 0)
-				session->vmlinux_maps[MAP__FUNCTION]->end = ~0UL;
-
-			perf_session__set_kallsyms_ref_reloc_sym(session, symbol_name,
-								 self->mmap.pgoff);
+			event_set_kernel_mmap_len(session->vmlinux_maps, self);
+			perf_session__set_kallsyms_ref_reloc_sym(session->vmlinux_maps,
+							symbol_name,
+							self->mmap.pgoff);
 		}
+	return 0;
+
+out_problem:
+	return -1;
+}
+
+static int __event__process_guest_mmap(event_t *self, struct perf_session *session)
+{
+	struct map *map;
+
+	static const char kmmap_prefix[] = "[guest.kernel.kallsyms.";
+
+	if (memcmp(self->mmap.filename, kmmap_prefix,
+				sizeof(kmmap_prefix) - 1) == 0) {
+		const char *symbol_name = (self->mmap.filename +
+				sizeof(kmmap_prefix) - 1);
+		/*
+		 * Should be there already, from the build-id table in
+		 * the header.
+		 */
+		struct dso *kernel = __dsos__findnew(&dsos__guest_kernel,
+				"[guest.kernel.kallsyms]");
+		if (kernel == NULL)
+			goto out_problem;
+
+		kernel->kernel = DSO_TYPE_GUEST_KERNEL;
+		if (__map_groups__create_kernel_maps(&session->guest_kmaps,
+				session->guest_vmlinux_maps, kernel) < 0)
+			goto out_problem;
+
+		event_set_kernel_mmap_len(session->guest_vmlinux_maps, self);
+		perf_session__set_kallsyms_ref_reloc_sym(session->guest_vmlinux_maps,
+				symbol_name,
+				self->mmap.pgoff);
+		/*
+		 * preload dso of guest kernel and modules
+		 */
+		dso__load(kernel, session->guest_vmlinux_maps[MAP__FUNCTION], NULL);
+	} else if (self->mmap.filename[0] == '[') {
+		char *name;
+
+		map = map_groups__new_module(&session->guest_kmaps,
+				self->mmap.start,
+				self->mmap.filename,
+				1);
+		if (map == NULL)
+			goto out_problem;
+		name = strdup(self->mmap.filename);
+		if (name == NULL)
+			goto out_problem;
+
+		map->dso->short_name = name;
+		map->end = map->start + self->mmap.len;
+	}
+
+	return 0;
+out_problem:
+	return -1;
+}
+
+int event__process_mmap(event_t *self, struct perf_session *session)
+{
+	struct thread *thread;
+	struct map *map;
+	u8 cpumode = self->header.misc & PERF_RECORD_MISC_CPUMODE_MASK;
+	int ret;
+
+	dump_printf(" %d/%d: [%#Lx(%#Lx) @ %#Lx]: %s\n",
+			self->mmap.pid, self->mmap.tid, self->mmap.start,
+			self->mmap.len, self->mmap.pgoff, self->mmap.filename);
+
+	if (self->mmap.pid == 0) {
+		if (cpumode == PERF_RECORD_MISC_GUEST_KERNEL)
+			ret = __event__process_guest_mmap(self, session);
+		else
+			ret = __event__process_mmap(self, session);
+		if (ret < 0)
+			goto out_problem;
 		return 0;
 	}
 
@@ -441,15 +526,33 @@ void thread__find_addr_map(struct thread
 
 	al->thread = self;
 	al->addr = addr;
+	al->cpumode = cpumode;
 
-	if (cpumode == PERF_RECORD_MISC_KERNEL) {
+	if (cpumode == PERF_RECORD_MISC_KERNEL && perf_host) {
 		al->level = 'k';
 		mg = &session->kmaps;
-	} else if (cpumode == PERF_RECORD_MISC_USER)
+	} else if (cpumode == PERF_RECORD_MISC_USER && perf_host) {
 		al->level = '.';
-	else {
-		al->level = 'H';
+	} else if (cpumode == PERF_RECORD_MISC_GUEST_KERNEL && perf_guest) {
+		al->level = 'g';
+		mg = &session->guest_kmaps;
+	} else {
+		/* TODO: We don't support guest user space. Might support late */
+		if (cpumode == PERF_RECORD_MISC_GUEST_USER && perf_guest)
+			al->level = 'u';
+		else
+			al->level = 'H';
 		al->map = NULL;
+
+		if ((cpumode == PERF_RECORD_MISC_GUEST_USER ||
+			cpumode == PERF_RECORD_MISC_GUEST_KERNEL) &&
+			!perf_guest)
+			al->filtered = true;
+		if ((cpumode == PERF_RECORD_MISC_USER ||
+			cpumode == PERF_RECORD_MISC_KERNEL) &&
+			!perf_host)
+			al->filtered = true;
+
 		return;
 	}
 try_again:
@@ -464,10 +567,18 @@ try_again:
 		 * "[vdso]" dso, but for now lets use the old trick of looking
 		 * in the whole kernel symbol list.
 		 */
-		if ((long long)al->addr < 0 && mg != &session->kmaps) {
+		if ((long long)al->addr < 0 &&
+			mg != &session->kmaps &&
+			cpumode == PERF_RECORD_MISC_KERNEL) {
 			mg = &session->kmaps;
 			goto try_again;
 		}
+		if ((long long)al->addr < 0 &&
+				mg != &session->guest_kmaps &&
+				cpumode == PERF_RECORD_MISC_GUEST_KERNEL) {
+			mg = &session->guest_kmaps;
+			goto try_again;
+		}
 	} else
 		al->addr = al->map->map_ip(al->map, al->addr);
 }
@@ -513,6 +624,7 @@ int event__preprocess_sample(const event
 
 	dump_printf(" ... thread: %s:%d\n", thread->comm, thread->pid);
 
+	al->filtered = false;
 	thread__find_addr_location(thread, session, cpumode, MAP__FUNCTION,
 				   self->ip.ip, al, filter);
 	dump_printf(" ...... dso: %s\n",
@@ -536,7 +648,6 @@ int event__preprocess_sample(const event
 	    !strlist__has_entry(symbol_conf.sym_list, al->sym->name))
 		goto out_filtered;
 
-	al->filtered = false;
 	return 0;
 
 out_filtered:
diff -Nraup linux-2.6_tip0317/tools/perf/util/event.h linux-2.6_tip0317_perfkvm/tools/perf/util/event.h
--- linux-2.6_tip0317/tools/perf/util/event.h	2010-03-18 09:04:40.934227537 +0800
+++ linux-2.6_tip0317_perfkvm/tools/perf/util/event.h	2010-03-18 15:06:19.587050319 +0800
@@ -119,10 +119,17 @@ int event__synthesize_thread(pid_t pid, 
 void event__synthesize_threads(event__handler_t process,
 			       struct perf_session *session);
 int event__synthesize_kernel_mmap(event__handler_t process,
-				  struct perf_session *session,
-				  const char *symbol_name);
+				struct perf_session *session,
+				const char *kallsyms_name,
+				const char *mmap_name,
+				struct map **maps,
+				const char *symbol_name,
+				unsigned int misc);
+
 int event__synthesize_modules(event__handler_t process,
-			      struct perf_session *session);
+			      struct perf_session *session,
+			      struct map_groups *kmaps,
+			      unsigned int misc);
 
 int event__process_comm(event_t *self, struct perf_session *session);
 int event__process_lost(event_t *self, struct perf_session *session);
diff -Nraup linux-2.6_tip0317/tools/perf/util/hist.c linux-2.6_tip0317_perfkvm/tools/perf/util/hist.c
--- linux-2.6_tip0317/tools/perf/util/hist.c	2010-03-18 09:04:40.938289813 +0800
+++ linux-2.6_tip0317_perfkvm/tools/perf/util/hist.c	2010-03-18 15:06:19.587050319 +0800
@@ -8,6 +8,30 @@ struct callchain_param	callchain_param =
 	.min_percent = 0.5
 };
 
+void __perf_session__add_count(struct hist_entry *he,
+			struct addr_location *al,
+			u64 count)
+{
+	he->count += count;
+
+	switch (al->cpumode) {
+	case PERF_RECORD_MISC_KERNEL:
+		he->count_sys += count;
+		break;
+	case PERF_RECORD_MISC_USER:
+		he->count_us += count;
+		break;
+	case PERF_RECORD_MISC_GUEST_KERNEL:
+		he->count_guest_sys += count;
+		break;
+	case PERF_RECORD_MISC_GUEST_USER:
+		he->count_guest_us += count;
+		break;
+	default:
+		break;
+	}
+}
+
 /*
  * histogram, sorted on item, collects counts
  */
@@ -26,7 +50,6 @@ struct hist_entry *__perf_session__add_h
 		.sym	= al->sym,
 		.ip	= al->addr,
 		.level	= al->level,
-		.count	= count,
 		.parent = sym_parent,
 	};
 	int cmp;
@@ -48,6 +71,8 @@ struct hist_entry *__perf_session__add_h
 			p = &(*p)->rb_right;
 	}
 
+	__perf_session__add_count(&entry, al, count);
+
 	he = malloc(sizeof(*he));
 	if (!he)
 		return NULL;
@@ -462,7 +487,7 @@ size_t hist_entry__fprintf(struct hist_e
 			   u64 session_total)
 {
 	struct sort_entry *se;
-	u64 count, total;
+	u64 count, total, count_sys, count_us, count_guest_sys, count_guest_us;
 	const char *sep = symbol_conf.field_sep;
 	size_t ret;
 
@@ -472,15 +497,35 @@ size_t hist_entry__fprintf(struct hist_e
 	if (pair_session) {
 		count = self->pair ? self->pair->count : 0;
 		total = pair_session->events_stats.total;
+		count_sys = self->pair ? self->pair->count_sys : 0;
+		count_us = self->pair ? self->pair->count_us : 0;
+		count_guest_sys = self->pair ? self->pair->count_guest_sys : 0;
+		count_guest_us = self->pair ? self->pair->count_guest_us : 0;
 	} else {
 		count = self->count;
 		total = session_total;
+		count_sys = self->count_sys;
+		count_us = self->count_us;
+		count_guest_sys = self->count_guest_sys;
+		count_guest_us = self->count_guest_us;
 	}
 
-	if (total)
+	if (total) {
 		ret = percent_color_fprintf(fp, sep ? "%.2f" : "   %6.2f%%",
 					    (count * 100.0) / total);
-	else
+		if (symbol_conf.show_cpu_utilization) {
+			ret += percent_color_fprintf(fp, sep ? "%.2f" : "   %6.2f%%",
+					(count_sys * 100.0) / total);
+			ret += percent_color_fprintf(fp, sep ? "%.2f" : "   %6.2f%%",
+					(count_us * 100.0) / total);
+			if (perf_guest) {
+				ret += percent_color_fprintf(fp, sep ? "%.2f" : "   %6.2f%%",
+						(count_guest_sys * 100.0) / total);
+				ret += percent_color_fprintf(fp, sep ? "%.2f" : "   %6.2f%%",
+						(count_guest_us * 100.0) / total);
+			}
+		}
+	} else
 		ret = fprintf(fp, sep ? "%lld" : "%12lld ", count);
 
 	if (symbol_conf.show_nr_samples) {
@@ -576,6 +621,20 @@ size_t perf_session__fprintf_hists(struc
 			fputs("  Samples  ", fp);
 	}
 
+	if (symbol_conf.show_cpu_utilization) {
+		if (sep) {
+			ret += fprintf(fp, "%csys", *sep);
+			ret += fprintf(fp, "%cus", *sep);
+			ret += fprintf(fp, "%cguest sys", *sep);
+			ret += fprintf(fp, "%cguest us", *sep);
+		} else {
+			ret += fprintf(fp, "  sys  ");
+			ret += fprintf(fp, "  us  ");
+			ret += fprintf(fp, "  guest sys  ");
+			ret += fprintf(fp, "  guest us  ");
+		}
+	}
+
 	if (pair) {
 		if (sep)
 			ret += fprintf(fp, "%cDelta", *sep);
diff -Nraup linux-2.6_tip0317/tools/perf/util/hist.h linux-2.6_tip0317_perfkvm/tools/perf/util/hist.h
--- linux-2.6_tip0317/tools/perf/util/hist.h	2010-03-18 09:04:40.938289813 +0800
+++ linux-2.6_tip0317_perfkvm/tools/perf/util/hist.h	2010-03-18 15:06:19.591054262 +0800
@@ -12,6 +12,9 @@ struct addr_location;
 struct symbol;
 struct rb_root;
 
+void __perf_session__add_count(struct hist_entry *he,
+			struct addr_location *al,
+			u64 count);
 struct hist_entry *__perf_session__add_hist_entry(struct rb_root *hists,
 						  struct addr_location *al,
 						  struct symbol *parent,
diff -Nraup linux-2.6_tip0317/tools/perf/util/session.c linux-2.6_tip0317_perfkvm/tools/perf/util/session.c
--- linux-2.6_tip0317/tools/perf/util/session.c	2010-03-18 09:04:40.938289813 +0800
+++ linux-2.6_tip0317_perfkvm/tools/perf/util/session.c	2010-03-18 15:06:19.591054262 +0800
@@ -54,7 +54,12 @@ out_close:
 
 static inline int perf_session__create_kernel_maps(struct perf_session *self)
 {
-	return map_groups__create_kernel_maps(&self->kmaps, self->vmlinux_maps);
+	int ret;
+	ret = map_groups__create_kernel_maps(&self->kmaps, self->vmlinux_maps);
+	if (ret >= 0)
+		ret = map_groups__create_guest_kernel_maps(&self->guest_kmaps,
+				self->guest_vmlinux_maps);
+	return ret;
 }
 
 struct perf_session *perf_session__new(const char *filename, int mode, bool force)
@@ -77,6 +82,7 @@ struct perf_session *perf_session__new(c
 	self->cwdlen = 0;
 	self->unknown_events = 0;
 	map_groups__init(&self->kmaps);
+	map_groups__init(&self->guest_kmaps);
 
 	if (mode == O_RDONLY) {
 		if (perf_session__open(self, force) < 0)
@@ -356,7 +362,8 @@ int perf_header__read_build_ids(struct p
 		if (read(input, filename, len) != len)
 			goto out;
 
-		if (bev.header.misc & PERF_RECORD_MISC_KERNEL)
+		if ((bev.header.misc & PERF_RECORD_MISC_CPUMODE_MASK)
+			==  PERF_RECORD_MISC_KERNEL)
 			head = &dsos__kernel;
 
 		dso = __dsos__findnew(head, filename);
@@ -519,26 +526,33 @@ bool perf_session__has_traces(struct per
 	return true;
 }
 
-int perf_session__set_kallsyms_ref_reloc_sym(struct perf_session *self,
+int perf_session__set_kallsyms_ref_reloc_sym(struct map ** maps,
 					     const char *symbol_name,
 					     u64 addr)
 {
 	char *bracket;
 	enum map_type i;
+	struct ref_reloc_sym *ref;
 
-	self->ref_reloc_sym.name = strdup(symbol_name);
-	if (self->ref_reloc_sym.name == NULL)
+	ref = zalloc(sizeof(struct ref_reloc_sym));
+	if (ref == NULL)
 		return -ENOMEM;
 
-	bracket = strchr(self->ref_reloc_sym.name, ']');
+	ref->name = strdup(symbol_name);
+	if (ref->name == NULL) {
+		free(ref);
+		return -ENOMEM;
+	}
+
+	bracket = strchr(ref->name, ']');
 	if (bracket)
 		*bracket = '\0';
 
-	self->ref_reloc_sym.addr = addr;
+	ref->addr = addr;
 
 	for (i = 0; i < MAP__NR_TYPES; ++i) {
-		struct kmap *kmap = map__kmap(self->vmlinux_maps[i]);
-		kmap->ref_reloc_sym = &self->ref_reloc_sym;
+		struct kmap *kmap = map__kmap(maps[i]);
+		kmap->ref_reloc_sym = ref;
 	}
 
 	return 0;
diff -Nraup linux-2.6_tip0317/tools/perf/util/session.h linux-2.6_tip0317_perfkvm/tools/perf/util/session.h
--- linux-2.6_tip0317/tools/perf/util/session.h	2010-03-18 09:04:40.926228328 +0800
+++ linux-2.6_tip0317_perfkvm/tools/perf/util/session.h	2010-03-18 15:06:19.591054262 +0800
@@ -16,16 +16,17 @@ struct perf_session {
 	unsigned long		size;
 	unsigned long		mmap_window;
 	struct map_groups	kmaps;
+	struct map_groups	guest_kmaps;
 	struct rb_root		threads;
 	struct thread		*last_match;
 	struct map		*vmlinux_maps[MAP__NR_TYPES];
+	struct map		*guest_vmlinux_maps[MAP__NR_TYPES];
 	struct events_stats	events_stats;
 	struct rb_root		stats_by_id;
 	unsigned long		event_total[PERF_RECORD_MAX];
 	unsigned long		unknown_events;
 	struct rb_root		hists;
 	u64			sample_type;
-	struct ref_reloc_sym	ref_reloc_sym;
 	int			fd;
 	int			cwdlen;
 	char			*cwd;
@@ -67,26 +68,12 @@ bool perf_session__has_traces(struct per
 int perf_header__read_build_ids(struct perf_header *self, int input,
 				u64 offset, u64 file_size);
 
-int perf_session__set_kallsyms_ref_reloc_sym(struct perf_session *self,
+int perf_session__set_kallsyms_ref_reloc_sym(struct map ** maps,
 					     const char *symbol_name,
 					     u64 addr);
 
 void mem_bswap_64(void *src, int byte_size);
 
-static inline int __perf_session__create_kernel_maps(struct perf_session *self,
-						struct dso *kernel)
-{
-	return __map_groups__create_kernel_maps(&self->kmaps,
-						self->vmlinux_maps, kernel);
-}
-
-static inline struct map *
-	perf_session__new_module_map(struct perf_session *self,
-				     u64 start, const char *filename)
-{
-	return map_groups__new_module(&self->kmaps, start, filename);
-}
-
 #ifdef NO_NEWT_SUPPORT
 static inline void perf_session__browse_hists(struct rb_root *hists __used,
 					      u64 session_total __used,
diff -Nraup linux-2.6_tip0317/tools/perf/util/sort.h linux-2.6_tip0317_perfkvm/tools/perf/util/sort.h
--- linux-2.6_tip0317/tools/perf/util/sort.h	2010-03-18 09:04:40.930227237 +0800
+++ linux-2.6_tip0317_perfkvm/tools/perf/util/sort.h	2010-03-18 15:06:19.591054262 +0800
@@ -44,6 +44,10 @@ extern enum sort_type sort__first_dimens
 struct hist_entry {
 	struct rb_node		rb_node;
 	u64			count;
+	u64			count_sys;
+	u64			count_us;
+	u64			count_guest_sys;
+	u64			count_guest_us;
 	struct thread		*thread;
 	struct map		*map;
 	struct symbol		*sym;
diff -Nraup linux-2.6_tip0317/tools/perf/util/symbol.c linux-2.6_tip0317_perfkvm/tools/perf/util/symbol.c
--- linux-2.6_tip0317/tools/perf/util/symbol.c	2010-03-18 09:04:40.930227237 +0800
+++ linux-2.6_tip0317_perfkvm/tools/perf/util/symbol.c	2010-03-18 15:09:59.498404450 +0800
@@ -22,6 +22,8 @@ static void dsos__add(struct list_head *
 static struct map *map__new2(u64 start, struct dso *dso, enum map_type type);
 static int dso__load_kernel_sym(struct dso *self, struct map *map,
 				symbol_filter_t filter);
+static int dso__load_guest_kernel_sym(struct dso *self, struct map *map,
+			symbol_filter_t filter);
 static int vmlinux_path__nr_entries;
 static char **vmlinux_path;
 
@@ -180,6 +182,7 @@ struct dso *dso__new(const char *name)
 		self->loaded = 0;
 		self->sorted_by_name = 0;
 		self->has_build_id = 0;
+		self->kernel = DSO_TYPE_USER;
 	}
 
 	return self;
@@ -396,12 +399,9 @@ int kallsyms__parse(const char *filename
 		char *symbol_name;
 
 		line_len = getline(&line, &n, file);
-		if (line_len < 0)
+		if (line_len < 0 || !line)
 			break;
 
-		if (!line)
-			goto out_failure;
-
 		line[--line_len] = '\0'; /* \n */
 
 		len = hex2u64(line, &start);
@@ -453,6 +453,7 @@ static int map__process_kallsym_symbol(v
 	 * map__split_kallsyms, when we have split the maps per module
 	 */
 	symbols__insert(root, sym);
+
 	return 0;
 }
 
@@ -498,6 +499,15 @@ static int dso__split_kallsyms(struct ds
 			*module++ = '\0';
 
 			if (strcmp(curr_map->dso->short_name, module)) {
+				if (curr_map != map &&
+					self->kernel == DSO_TYPE_GUEST_KERNEL) {
+					/*
+					 * We assume all symbols of a module are continuous in
+					 * kallsyms, so curr_map points to a module and all its
+					 * symbols are in its kmap. Mark it as loaded.
+					 */
+					dso__set_loaded(curr_map->dso, curr_map->type);
+				}
 				curr_map = map_groups__find_by_name(kmaps, map->type, module);
 				if (curr_map == NULL) {
 					pr_debug("/proc/{kallsyms,modules} "
@@ -519,13 +529,19 @@ static int dso__split_kallsyms(struct ds
 			char dso_name[PATH_MAX];
 			struct dso *dso;
 
-			snprintf(dso_name, sizeof(dso_name), "[kernel].%d",
-				 kernel_range++);
+			if (self->kernel == DSO_TYPE_GUEST_KERNEL)
+				snprintf(dso_name, sizeof(dso_name), "[guest.kernel].%d",
+						kernel_range++);
+			else
+				snprintf(dso_name, sizeof(dso_name), "[kernel].%d",
+						kernel_range++);
 
 			dso = dso__new(dso_name);
 			if (dso == NULL)
 				return -1;
 
+			dso->kernel = self->kernel;
+
 			curr_map = map__new2(pos->start, dso, map->type);
 			if (curr_map == NULL) {
 				dso__delete(dso);
@@ -549,6 +565,10 @@ discard_symbol:		rb_erase(&pos->rb_node,
 		}
 	}
 
+	if (curr_map != map &&
+		self->kernel == DSO_TYPE_GUEST_KERNEL)
+		dso__set_loaded(curr_map->dso, curr_map->type);
+
 	return count;
 }
 
@@ -559,7 +579,10 @@ int dso__load_kallsyms(struct dso *self,
 		return -1;
 
 	symbols__fixup_end(&self->symbols[map->type]);
-	self->origin = DSO__ORIG_KERNEL;
+	if (self->kernel == DSO_TYPE_GUEST_KERNEL)
+		self->origin = DSO__ORIG_GUEST_KERNEL;
+	else
+		self->origin = DSO__ORIG_KERNEL;
 
 	return dso__split_kallsyms(self, map, filter);
 }
@@ -946,7 +969,7 @@ static int dso__load_sym(struct dso *sel
 	nr_syms = shdr.sh_size / shdr.sh_entsize;
 
 	memset(&sym, 0, sizeof(sym));
-	if (!self->kernel) {
+	if (self->kernel == DSO_TYPE_USER) {
 		self->adjust_symbols = (ehdr.e_type == ET_EXEC ||
 				elf_section_by_name(elf, &ehdr, &shdr,
 						     ".gnu.prelink_undo",
@@ -978,7 +1001,7 @@ static int dso__load_sym(struct dso *sel
 
 		section_name = elf_sec__name(&shdr, secstrs);
 
-		if (self->kernel || kmodule) {
+		if (self->kernel != DSO_TYPE_USER || kmodule) {
 			char dso_name[PATH_MAX];
 
 			if (strcmp(section_name,
@@ -1005,6 +1028,7 @@ static int dso__load_sym(struct dso *sel
 				curr_dso = dso__new(dso_name);
 				if (curr_dso == NULL)
 					goto out_elf_end;
+				curr_dso->kernel = self->kernel;
 				curr_map = map__new2(start, curr_dso,
 						     map->type);
 				if (curr_map == NULL) {
@@ -1015,7 +1039,10 @@ static int dso__load_sym(struct dso *sel
 				curr_map->unmap_ip = identity__map_ip;
 				curr_dso->origin = self->origin;
 				map_groups__insert(kmap->kmaps, curr_map);
-				dsos__add(&dsos__kernel, curr_dso);
+				if (curr_dso->kernel == DSO_TYPE_GUEST_KERNEL)
+					dsos__add(&dsos__guest_kernel, curr_dso);
+				else
+					dsos__add(&dsos__kernel, curr_dso);
 				dso__set_loaded(curr_dso, map->type);
 			} else
 				curr_dso = curr_map->dso;
@@ -1236,6 +1263,8 @@ char dso__symtab_origin(const struct dso
 		[DSO__ORIG_BUILDID] =  'b',
 		[DSO__ORIG_DSO] =      'd',
 		[DSO__ORIG_KMODULE] =  'K',
+		[DSO__ORIG_GUEST_KERNEL] =  'g',
+		[DSO__ORIG_GUEST_KMODULE] =  'G',
 	};
 
 	if (self == NULL || self->origin == DSO__ORIG_NOT_FOUND)
@@ -1254,8 +1283,10 @@ int dso__load(struct dso *self, struct m
 
 	dso__set_loaded(self, map->type);
 
-	if (self->kernel)
+	if (self->kernel == DSO_TYPE_KERNEL)
 		return dso__load_kernel_sym(self, map, filter);
+	else if (self->kernel == DSO_TYPE_GUEST_KERNEL)
+		return dso__load_guest_kernel_sym(self, map, filter);
 
 	name = malloc(size);
 	if (!name)
@@ -1459,7 +1490,7 @@ static int map_groups__set_modules_path(
 static struct map *map__new2(u64 start, struct dso *dso, enum map_type type)
 {
 	struct map *self = zalloc(sizeof(*self) +
-				  (dso->kernel ? sizeof(struct kmap) : 0));
+			  (dso->kernel != DSO_TYPE_USER ? sizeof(struct kmap) : 0));
 	if (self != NULL) {
 		/*
 		 * ->end will be filled after we load all the symbols
@@ -1471,11 +1502,15 @@ static struct map *map__new2(u64 start, 
 }
 
 struct map *map_groups__new_module(struct map_groups *self, u64 start,
-				   const char *filename)
+				   const char *filename, int guest)
 {
 	struct map *map;
-	struct dso *dso = __dsos__findnew(&dsos__kernel, filename);
+	struct dso *dso;
 
+	if (!guest)
+		dso = __dsos__findnew(&dsos__kernel, filename);
+	else
+		dso = __dsos__findnew(&dsos__guest_kernel, filename);
 	if (dso == NULL)
 		return NULL;
 
@@ -1483,16 +1518,20 @@ struct map *map_groups__new_module(struc
 	if (map == NULL)
 		return NULL;
 
-	dso->origin = DSO__ORIG_KMODULE;
+	if (guest)
+		dso->origin = DSO__ORIG_GUEST_KMODULE;
+	else
+		dso->origin = DSO__ORIG_KMODULE;
 	map_groups__insert(self, map);
 	return map;
 }
 
-static int map_groups__create_modules(struct map_groups *self)
+static int __map_groups__create_modules(struct map_groups *self,
+			const char * filename, int guest)
 {
 	char *line = NULL;
 	size_t n;
-	FILE *file = fopen("/proc/modules", "r");
+	FILE *file = fopen(filename, "r");
 	struct map *map;
 
 	if (file == NULL)
@@ -1526,16 +1565,17 @@ static int map_groups__create_modules(st
 		*sep = '\0';
 
 		snprintf(name, sizeof(name), "[%s]", line);
-		map = map_groups__new_module(self, start, name);
+		map = map_groups__new_module(self, start, name, guest);
 		if (map == NULL)
 			goto out_delete_line;
-		dso__kernel_module_get_build_id(map->dso);
+		if (!guest)
+			dso__kernel_module_get_build_id(map->dso);
 	}
 
 	free(line);
 	fclose(file);
 
-	return map_groups__set_modules_path(self);
+	return 0;
 
 out_delete_line:
 	free(line);
@@ -1543,6 +1583,21 @@ out_failure:
 	return -1;
 }
 
+static int map_groups__create_modules(struct map_groups *self)
+{
+	int ret;
+
+	ret = __map_groups__create_modules(self, "/proc/modules", 0);
+	if (ret >= 0)
+		ret = map_groups__set_modules_path(self);
+	return ret;
+}
+
+static int map_groups__create_guest_modules(struct map_groups *self)
+{
+	return  __map_groups__create_modules(self, symbol_conf.guest_modules, 1);
+}
+
 static int dso__load_vmlinux(struct dso *self, struct map *map,
 			     const char *vmlinux, symbol_filter_t filter)
 {
@@ -1702,8 +1757,45 @@ out_fixup:
 	return err;
 }
 
+static int dso__load_guest_kernel_sym(struct dso *self, struct map *map,
+				symbol_filter_t filter)
+{
+	int err;
+	const char *kallsyms_filename = NULL;
+
+	/*
+	 * if the user specified a vmlinux filename, use it and only
+	 * it, reporting errors to the user if it cannot be used.
+	 * Or use file guest_kallsyms inputted by user on commandline
+	 */
+	if (symbol_conf.guest_vmlinux_name != NULL) {
+		err = dso__load_vmlinux(self, map,
+					symbol_conf.guest_vmlinux_name, filter);
+		goto out_try_fixup;
+	}
+
+	kallsyms_filename = symbol_conf.guest_kallsyms;
+	if (!kallsyms_filename)
+		return -1;
+	err = dso__load_kallsyms(self, kallsyms_filename, map, filter);
+	if (err > 0)
+		pr_debug("Using %s for symbols\n", kallsyms_filename);
+
+out_try_fixup:
+	if (err > 0) {
+		if (kallsyms_filename != NULL)
+			dso__set_long_name(self, strdup("[guest.kernel.kallsyms]"));
+		map__fixup_start(map);
+		map__fixup_end(map);
+	}
+
+	return err;
+}
+
 LIST_HEAD(dsos__user);
 LIST_HEAD(dsos__kernel);
+LIST_HEAD(dsos__guest_user);
+LIST_HEAD(dsos__guest_kernel);
 
 static void dsos__add(struct list_head *head, struct dso *dso)
 {
@@ -1750,6 +1842,8 @@ void dsos__fprintf(FILE *fp)
 {
 	__dsos__fprintf(&dsos__kernel, fp);
 	__dsos__fprintf(&dsos__user, fp);
+	__dsos__fprintf(&dsos__guest_kernel, fp);
+	__dsos__fprintf(&dsos__guest_user, fp);
 }
 
 static size_t __dsos__fprintf_buildid(struct list_head *head, FILE *fp,
@@ -1779,7 +1873,19 @@ struct dso *dso__new_kernel(const char *
 
 	if (self != NULL) {
 		dso__set_short_name(self, "[kernel]");
-		self->kernel	 = 1;
+		self->kernel	 = DSO_TYPE_KERNEL;
+	}
+
+	return self;
+}
+
+struct dso *dso__new_guest_kernel(const char *name)
+{
+	struct dso *self = dso__new(name ?: "[guest.kernel.kallsyms]");
+
+	if (self != NULL) {
+		dso__set_short_name(self, "[guest.kernel]");
+		self->kernel     = DSO_TYPE_GUEST_KERNEL;
 	}
 
 	return self;
@@ -1804,6 +1910,15 @@ static struct dso *dsos__create_kernel(c
 	return kernel;
 }
 
+static struct dso *dsos__create_guest_kernel(const char *vmlinux)
+{
+	struct dso *kernel = dso__new_guest_kernel(vmlinux);
+
+	if (kernel != NULL)
+		dsos__add(&dsos__guest_kernel, kernel);
+	return kernel;
+}
+
 int __map_groups__create_kernel_maps(struct map_groups *self,
 				     struct map *vmlinux_maps[MAP__NR_TYPES],
 				     struct dso *kernel)
@@ -1963,3 +2078,24 @@ int map_groups__create_kernel_maps(struc
 	map_groups__fixup_end(self);
 	return 0;
 }
+
+int map_groups__create_guest_kernel_maps(struct map_groups *self,
+				   struct map *vmlinux_maps[MAP__NR_TYPES])
+{
+	struct dso *kernel = dsos__create_guest_kernel(symbol_conf.guest_vmlinux_name);
+
+	if (kernel == NULL)
+		return -1;
+
+	if (__map_groups__create_kernel_maps(self, vmlinux_maps, kernel) < 0)
+		return -1;
+
+	if (symbol_conf.use_modules && map_groups__create_guest_modules(self) < 0)
+		pr_debug("Problems creating module maps, continuing anyway...\n");
+	/*
+	 * Now that we have all the maps created, just set the ->end of them:
+	 */
+	map_groups__fixup_end(self);
+	return 0;
+}
+
diff -Nraup linux-2.6_tip0317/tools/perf/util/symbol.h linux-2.6_tip0317_perfkvm/tools/perf/util/symbol.h
--- linux-2.6_tip0317/tools/perf/util/symbol.h	2010-03-18 09:04:40.938289813 +0800
+++ linux-2.6_tip0317_perfkvm/tools/perf/util/symbol.h	2010-03-18 15:06:19.591054262 +0800
@@ -63,10 +63,14 @@ struct symbol_conf {
 			show_nr_samples,
 			use_callchain,
 			exclude_other,
-			full_paths;
+			full_paths,
+			show_cpu_utilization;
 	const char	*vmlinux_name,
 			*field_sep;
-	char            *dso_list_str,
+	const char	*guest_vmlinux_name,
+			*guest_kallsyms,
+			*guest_modules;
+	char		*dso_list_str,
 			*comm_list_str,
 			*sym_list_str,
 			*col_width_list_str;
@@ -95,6 +99,13 @@ struct addr_location {
 	u64	      addr;
 	char	      level;
 	bool	      filtered;
+	unsigned int  cpumode;
+};
+
+enum dso_kernel_type {
+	DSO_TYPE_USER = 0,
+	DSO_TYPE_KERNEL,
+	DSO_TYPE_GUEST_KERNEL
 };
 
 struct dso {
@@ -104,7 +115,7 @@ struct dso {
 	u8		 adjust_symbols:1;
 	u8		 slen_calculated:1;
 	u8		 has_build_id:1;
-	u8		 kernel:1;
+	enum dso_kernel_type	kernel;
 	u8		 hit:1;
 	u8		 annotate_warned:1;
 	unsigned char	 origin;
@@ -120,6 +131,7 @@ struct dso {
 
 struct dso *dso__new(const char *name);
 struct dso *dso__new_kernel(const char *name);
+struct dso *dso__new_guest_kernel(const char *name);
 void dso__delete(struct dso *self);
 
 bool dso__loaded(const struct dso *self, enum map_type type);
@@ -132,7 +144,7 @@ static inline void dso__set_loaded(struc
 
 void dso__sort_by_name(struct dso *self, enum map_type type);
 
-extern struct list_head dsos__user, dsos__kernel;
+extern struct list_head dsos__user, dsos__kernel, dsos__guest_user, dsos__guest_kernel;
 
 struct dso *__dsos__findnew(struct list_head *head, const char *name);
 
@@ -161,6 +173,8 @@ enum dso_origin {
 	DSO__ORIG_BUILDID,
 	DSO__ORIG_DSO,
 	DSO__ORIG_KMODULE,
+	DSO__ORIG_GUEST_KERNEL,
+	DSO__ORIG_GUEST_KMODULE,
 	DSO__ORIG_NOT_FOUND,
 };
 
diff -Nraup linux-2.6_tip0317/tools/perf/util/thread.h linux-2.6_tip0317_perfkvm/tools/perf/util/thread.h
--- linux-2.6_tip0317/tools/perf/util/thread.h	2010-03-18 09:04:40.926228328 +0800
+++ linux-2.6_tip0317_perfkvm/tools/perf/util/thread.h	2010-03-18 15:06:19.591054262 +0800
@@ -82,6 +82,9 @@ int __map_groups__create_kernel_maps(str
 int map_groups__create_kernel_maps(struct map_groups *self,
 				   struct map *vmlinux_maps[MAP__NR_TYPES]);
 
+int map_groups__create_guest_kernel_maps(struct map_groups *self,
+				   struct map *vmlinux_maps[MAP__NR_TYPES]);
+
 struct map *map_groups__new_module(struct map_groups *self, u64 start,
-				   const char *filename);
+				   const char *filename, int guest);
 #endif	/* __PERF_THREAD_H */



^ permalink raw reply	[flat|nested] 390+ messages in thread

* Re: [RFC] Unify KVM kernel-space and user-space code into a single project
  2010-03-18 22:44                                                   ` Ingo Molnar
@ 2010-03-19  7:21                                                     ` Avi Kivity
  2010-03-20 14:59                                                       ` Andrea Arcangeli
  0 siblings, 1 reply; 390+ messages in thread
From: Avi Kivity @ 2010-03-19  7:21 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Zachary Amsden, Anthony Liguori, Zhang, Yanmin, Peter Zijlstra,
	Sheng Yang, linux-kernel, kvm, Marcelo Tosatti, Joerg Roedel,
	Jes Sorensen, Gleb Natapov, ziteng.huang,
	Arnaldo Carvalho de Melo, Frédéric Weisbecker

On 03/19/2010 12:44 AM, Ingo Molnar wrote:
>
> Too bad - there was heavy initial opposition to the arch/x86 unification as
> well [and heavy opposition to tools/perf/ as well], still both worked out
> extremely well :-)
>    

Did you forget that arch/x86 was a merging of a code fork that happened 
several years previously?  Maybe that fork shouldn't have been done to 
begin with.

-- 
Do not meddle in the internals of kernels, for they are subtle and quick to panic.


^ permalink raw reply	[flat|nested] 390+ messages in thread

* Re: [RFC] Unify KVM kernel-space and user-space code into a single project
  2010-03-18 22:16                                                                   ` Ingo Molnar
@ 2010-03-19  7:22                                                                     ` Avi Kivity
  0 siblings, 0 replies; 390+ messages in thread
From: Avi Kivity @ 2010-03-19  7:22 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Alan Cox, drepper, Anthony Liguori, Peter Zijlstra, linux-kernel,
	kvm, Marcelo Tosatti, Joerg Roedel, Jes Sorensen, Gleb Natapov,
	Zachary Amsden, Arnaldo Carvalho de Melo, Frédéric Weisbecker

On 03/19/2010 12:16 AM, Ingo Molnar wrote:
>
>> Yes its a double standard
>>
>> Glibc has a higher standard than Fedora/RHEL.
>>
>> Just like the Ubuntu kernel ships various ugly unfit for upstream kernel
>> drivers.
>>      
> There's a world of a difference between a fugly driver and a glibc patch.
>
>    

Yes, fugly drivers can be cleaned up, but glibc ABIs are forever.

-- 
Do not meddle in the internals of kernels, for they are subtle and quick to panic.


^ permalink raw reply	[flat|nested] 390+ messages in thread

* Re: [RFC] Unify KVM kernel-space and user-space code into a single project
  2010-03-18 17:28                                                                     ` Ingo Molnar
@ 2010-03-19  7:56                                                                       ` Avi Kivity
  2010-03-19  8:53                                                                         ` Ingo Molnar
  0 siblings, 1 reply; 390+ messages in thread
From: Avi Kivity @ 2010-03-19  7:56 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Pekka Enberg, Anthony Liguori, Zhang, Yanmin, Peter Zijlstra,
	Sheng Yang, linux-kernel, kvm, Marcelo Tosatti, Joerg Roedel,
	Jes Sorensen, Gleb Natapov, Zachary Amsden, ziteng.huang,
	Arnaldo Carvalho de Melo, Frédéric Weisbecker

On 03/18/2010 07:28 PM, Ingo Molnar wrote:
> * Avi Kivity<avi@redhat.com>  wrote:
>
>    
>> On 03/18/2010 07:02 PM, Ingo Molnar wrote:
>>      
>>> I find the 'KVM mostly cares about the server, not about the desktop'
>>> attitude expressed in this thread troubling.
>>>        
>> It's not kvm, just it's developers (and their employers, where applicable).
>> If you post desktop oriented patches I'm sure they'll be welcome.
>>      
> Just such a patch-set was posted in this very thread: 'perf kvm'.
>
> There were two negative reactions immediately, both showed a fundamental
> server versus desktop bias:
>
>   - you did not accept that the most important usecase is when there is a
>     single guest running.
>    

Well, it isn't.

>   - the reaction to the 'how do we get symbols out of the guest' sub-question
>     was, paraphrased: 'we dont want that due to<unspecified>  security threat
>     to XYZ selinux usecase with lots of guests'.
>    

When I review a patch, I try to think of the difficult cases, not just 
the easy case.

> Anyone being aware of how Linux and KVM is being used on the desktop will know
> how detached that attitude is from the typical desktop usecase ...
>
> Usability _never_ sucks because of lack of patches or lack of suggestions. I
> bet if you made the next server feature contingent on essential usability
> fixes they'd happen overnight - for God's sake there's been 1000 commits in
> the last 3 months in the Qemu repository so there's plenty of manpower...
>    

First of all I am not a qemu maintainer.  Second, from my point of view 
all contributors are volunteers (perhaps their employer volunteered 
them, but there's no difference from my perspective).  Asking them to 
repaint my apartment as a condition to get a patch applied is abuse.  If 
a patch is good, it gets applied.

> Usability suckage - and i'm not going to be popular for saying this out loud -
> almost always shows a basic maintainer disconnect with the real world. See
> your very first reactions to my 'KVM usability' observations. Read back your
> and Anthony's replies: total 'sure, patches welcome' kind of indifference. It
> is _your project_, not some other project down the road ...
>    

I could drop everything and write a gtk GUI for qemu.  Is that what you 
want?

If someone is truly interested in qemu usability, it's up to them to 
write the patches.  Personally I've never missed the eject button.

As to disconnect from the real world, most products based on kvm and 
qemu (and Linux) are server based.  Perhaps that's the reason people 
emphasise that?  Maybe if Linux had 10-20% desktop market penetration, 
there would be more interest in a bells and whistles qemu GUI.

> So that is my first-hand experience about how you are welcoming these desktop
> issues, in this very thread. I suspect people try a few times with
> suggestions, then get shot down like our suggestions were shot down and then
> give up.
>    

I don't recall anyone trying this much less being shot down.  Perhaps 
people are concentrating on virt-manager and the like and leaving qemu 
alone.

-- 
Do not meddle in the internals of kernels, for they are subtle and quick to panic.


^ permalink raw reply	[flat|nested] 390+ messages in thread

* Re: [PATCH] Enhance perf to collect KVM guest os statistics from host side
  2010-03-19  3:38 ` Zhang, Yanmin
@ 2010-03-19  8:21   ` Ingo Molnar
  2010-03-19 17:29     ` Joerg Roedel
  2010-03-22  7:24     ` Zhang, Yanmin
  0 siblings, 2 replies; 390+ messages in thread
From: Ingo Molnar @ 2010-03-19  8:21 UTC (permalink / raw)
  To: Zhang, Yanmin
  Cc: Peter Zijlstra, Avi Kivity, Sheng Yang, linux-kernel, kvm,
	Marcelo Tosatti, Joerg Roedel, Jes Sorensen, Gleb Natapov,
	Zachary Amsden, zhiteng.huang, Frédéric Weisbecker,
	Arnaldo Carvalho de Melo


Nice progress!

This bit:

> 1) perf kvm top
> [root@lkp-ne01 norm]# perf kvm --host --guest --guestkallsyms=/home/ymzhang/guest/kallsyms
> --guestmodules=/home/ymzhang/guest/modules top

Will really be painful for developers - having to enter that long line while 
we have these things called 'computers' that ought to reduce human work. Also, 
it's incomplete: we need access to the guest system's binaries to do ELF 
symbol resolution and DWARF decoding.

So we really need some good, automatic way to get to the guest symbol space, 
so that if a developer types:

   perf kvm top

Then the obvious thing happens by default (which is to show the guest 
overhead).

There's no technical barrier on the perf tooling side to implement all that: 
perf supports build-ids extensively and can deal with multiple symbol spaces - 
as long as it has access to them. The guest kernel could be ID-ed based on its 
/sys/kernel/notes and /sys/module/*/notes/.note.gnu.build-id build-ids.

So some sort of --guestmount option would be the natural solution: it would 
point to the guest system's root, backed by a Qemu enumeration of guest mounts 
(off by default and configurable) from which perf can pick up the target guest 
automatically (obviously only under allowed permissions, so that such access 
is secure).

This would allow not just kallsyms access via $guest/proc/kallsyms but would 
also give us the full space of symbol features: access to the guest binaries 
for annotation and general symbol resolution, command/binary name 
identification, etc.
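
For illustration, here is a rough sketch of how that could look from the 
command line - the option name and the directory layout below are 
hypothetical, nothing like this exists today:

   # Qemu (optionally) exposes each guest's root under one well-known
   # directory, say:
   #
   #   /var/run/qemu-guests/<guest>/       <- guest root, read-only
   #
   # perf then needs no per-guest symbol options at all:

   perf kvm --guestmount=/var/run/qemu-guests top

   # kallsyms, modules and binaries would be picked up per guest from e.g.:
   #
   #   /var/run/qemu-guests/<guest>/proc/kallsyms
   #   /var/run/qemu-guests/<guest>/proc/modules
   #   /var/run/qemu-guests/<guest>/lib/modules/...   (for annotation)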

Such a mount would obviously not broaden existing privileges - and as an 
additional control a guest would also have a way to indicate that it does not 
wish a guest mount at all.

Unfortunately, in a previous thread the Qemu maintainer has indicated that he 
will essentially NAK any attempt to enhance Qemu to provide an easily 
discoverable, self-contained, transparent guest mount on the host side.

No technical justification was given for that NAK, despite my repeated 
requests to articulate the exact security problems that such an approach 
would cause.

If that NAK does not stand in that form then I'd like to know about it - it 
makes no sense for us to try to code up a solution against a standing 
maintainer NAK ...

The other option is some sysadmin-level hackery to NFS-mount the guest or 
similar. This is a vastly inferior method that brings us back to the abysmal 
usability levels of OProfile:

 1) it won't be guest-transparent
 2) it has to be re-done for every guest image
 3) even if packaged it has to be gotten into every. single. Linux. distro. separately.
 4) old Linux guests won't work out of the box

In other words: it's very inconvenient on multiple levels and won't ever happen 
at any reasonable scale to make a difference to Linux.

Which is an unfortunate situation - and the ball is in the KVM/Qemu court, so I 
can do little about it.

Thanks,

	Ingo

^ permalink raw reply	[flat|nested] 390+ messages in thread

* Re: [RFC] Unify KVM kernel-space and user-space code into a single project
  2010-03-19  7:56                                                                       ` Avi Kivity
@ 2010-03-19  8:53                                                                         ` Ingo Molnar
  2010-03-19 12:56                                                                           ` Anthony Liguori
  2010-03-20  7:35                                                                           ` Avi Kivity
  0 siblings, 2 replies; 390+ messages in thread
From: Ingo Molnar @ 2010-03-19  8:53 UTC (permalink / raw)
  To: Avi Kivity
  Cc: Pekka Enberg, Anthony Liguori, Zhang, Yanmin, Peter Zijlstra,
	Sheng Yang, linux-kernel, kvm, Marcelo Tosatti, Joerg Roedel,
	Jes Sorensen, Gleb Natapov, Zachary Amsden, ziteng.huang,
	Arnaldo Carvalho de Melo, Frédéric Weisbecker


* Avi Kivity <avi@redhat.com> wrote:

> > There were two negative reactions immediately, both showed a fundamental 
> > server versus desktop bias:
> >
> >  - you did not accept that the most important usecase is when there is a
> >    single guest running.
> 
> Well, it isn't.

Erm, my usability points are _doubly_ true when there are multiple guests ...

The inconvenience of having to type:

  perf kvm --host --guest --guestkallsyms=/home/ymzhang/guest/kallsyms \
  --guestmodules=/home/ymzhang/guest/modules top

is very obvious even with a single guest. Now multiply that by more guests ...

The crux is: we are working on improving KVM instrumentation. There are 
working patches posted to this thread, and we would like to implement an 
automatic mechanism for discovering all this information. The information 
should be available to the developer who wants it, easily and transparently 
- in true Linux fashion.

> >  - the reaction to the 'how do we get symbols out of the guest' sub-question
> >    was, paraphrased: 'we dont want that due to<unspecified>  security threat
> >    to XYZ selinux usecase with lots of guests'.
> 
> When I review a patch, I try to think of the difficult cases, not
> just the easy case.

You haven't articulated an actionable reason and you have suggested no solution 
either; you just passive-aggressively backed the claim that giving developers 
access to the symbol space is some sort of vague 'security threat'.

If that is not so I'd be glad to be proven wrong.

> > Anyone being aware of how Linux and KVM is being used on the desktop will 
> > know how detached that attitude is from the typical desktop usecase ...
> >
> > Usability _never_ sucks because of lack of patches or lack of suggestions. 
> > I bet if you made the next server feature contingent on essential 
> > usability fixes they'd happen overnight - for God's sake there's been 1000 
> > commits in the last 3 months in the Qemu repository so there's plenty of 
> > manpower...
> 
> First of all I am not a qemu maintainer. [...]

That is the crux of the matter. My experience in these threads was that no-one 
really seems to feel in charge of the whole thing. Should we really wonder why 
KVM usability sucks?

> [...] Second, from my point of view all contributors are volunteers (perhaps 
> their employer volunteered them, but there's no difference from my 
> perspective). Asking them to repaint my apartment as a condition to get a 
> patch applied is abuse.  If a patch is good, it gets applied.

This is one of the weirdest arguments I've seen in this thread. We make 
contributions conditional on the general shape of the project almost all 
the time. Developers don't get to do just the fun stuff.

This is a basic quid pro quo: new features introduce risks and create 
additional workload not just for the originating developer but for the rest of 
the community as well. You should check how Linus has pulled new features in 
the past 15 years: he very much requires the existing code to first be 
top-notch before he accepts new features for a given area of functionality.

Doing that, and insisting that developers see those imbalances as well, is 
absolutely essential to code quality: otherwise everyone would be running 
around implementing just the features they are interested in, without regard 
for the general health of the project.

Of course, if you keep the project in two halves (KVM and Qemu), and pretend 
that they are separate and have little relation, imbalances of quality can 
mount up and you can throw your hands up and say that it's "too bad, I'm not 
maintaining that". It is your basic duty as a Linux maintainer to keep 
balances of quality. I do it all day, other maintainers do it all day.

> > Usability suckage - and i'm not going to be popular for saying this out 
> > loud - almost always shows a basic maintainer disconnect with the real 
> > world. See your very first reactions to my 'KVM usability' observations. 
> > Read back your and Anthony's replies: total 'sure, patches welcome' kind 
> > of indifference. It is _your project_, not some other project down the 
> > road ...
> 
> I could drop everything and write a gtk GUI for qemu.  Is that what you 
> want?

No, my suggestion to you (it's up to you whether you give my opinion any 
weight) is to accept your mistakes and improve, and to not stand in the way of 
people who'd like to improve the situation. You are happy with the server 
features and you also made it clear that you don't feel responsible for the 
rest of the package - which is a big mistake IMO.

Also, you have demonstrated in this thread that you have near zero 
technical clue about basic desktop and development usability matters - for 
example, your stance on symbol-space access and your stance on how to enumerate 
guests symbolically are outright bizarre.

Thanks,

	Ingo

^ permalink raw reply	[flat|nested] 390+ messages in thread

* Re: [RFC] Unify KVM kernel-space and user-space code into a single project
  2010-03-18 16:11                                                         ` Anthony Liguori
  2010-03-18 16:28                                                           ` Ingo Molnar
@ 2010-03-19  9:19                                                           ` Paul Mundt
  2010-03-19  9:52                                                             ` Olivier Galibert
  1 sibling, 1 reply; 390+ messages in thread
From: Paul Mundt @ 2010-03-19  9:19 UTC (permalink / raw)
  To: Anthony Liguori
  Cc: Ingo Molnar, Avi Kivity, Zhang, Yanmin, Peter Zijlstra,
	Sheng Yang, linux-kernel, kvm, Marcelo Tosatti, Joerg Roedel,
	Jes Sorensen, Gleb Natapov, Zachary Amsden, ziteng.huang,
	Arnaldo Carvalho de Melo, Frédéric Weisbecker

On Thu, Mar 18, 2010 at 11:11:43AM -0500, Anthony Liguori wrote:
> On 03/18/2010 10:17 AM, Ingo Molnar wrote:
> >* Anthony Liguori<anthony@codemonkey.ws>  wrote:
> >>On 03/18/2010 08:00 AM, Ingo Molnar wrote:
> >>>>[...]  kvm in fact knows nothing about vga, to take your last example.
> >>>>[...]
> >>>>         
> >>>Look at the VGA dirty bitmap optimization a'ka the KVM_GET_DIRTY_LOG
> >>>ioctl.
> >>>
> >>>See qemu/kvm-all.c's kvm_physical_sync_dirty_bitmap().
> >>>
> >>>It started out as a VGA optimization (also used by live migration) and
> >>>even today it's mostly used by the VGA drivers - albeit a weak one.
> >>>
> >>>I wish there were stronger VGA optimizations implemented, copying the
> >>>dirty bitmap is not a particularly performant solution. (although it's
> >>>certainly better than full emulation) Graphics performance is one of the
> >>>more painful aspects of KVM usability today.
> >>>       
> >>We have to maintain a dirty bitmap because we don't have a paravirtual
> >>graphics driver.  IOW, someone needs to write an Xorg driver.
> >>
> >>Ideally, we could just implement a Linux framebuffer device, right?
> >>     
> >No, you'd want to interact with DRM.
> 
> Using DRM doesn't help very much.  You still need an X driver and most 
> of the operations you care about (video rendering, window movement, etc) 
> are not operations that need to go through DRM.
> 
> 3D graphics virtualization is extremely difficult in the non-passthrough 
> case.  It really requires hardware support that isn't widely available 
> today (outside a few NVIDIA chipsets).
> 
Implementing a virtualized DRM/KMS driver would at least get you the
framebuffer interface more or less for free, while allowing you to deal
with the userspace side of things incrementally (ie, running a dummy xorg
on top of the virtualized fbdev until the DRI side catches up). It would
also enable you to focus on the 2D and 3D parts independently.

> It doesn't provide the things we need to a good user experience.  You 
> need things like an absolute input device, host driven display resize, 
> RGBA hardware cursors.  None of these go through DRI and it's those 
> things that really provide the graphics user experience.
> 
None of these things negate the benefit one would get from a virtualized
DRM/KMS driver either. There are multiple problems that need solving in
this area, and it's a bit disingenuous to discount a valid suggestion out
of hand due to the fact it doesn't solve all of the outstanding issues.

^ permalink raw reply	[flat|nested] 390+ messages in thread

* Re: [RFC] Unify KVM kernel-space and user-space code into a single project
  2010-03-19  9:19                                                           ` Paul Mundt
@ 2010-03-19  9:52                                                             ` Olivier Galibert
  2010-03-19 13:56                                                                 ` Konrad Rzeszutek Wilk
  0 siblings, 1 reply; 390+ messages in thread
From: Olivier Galibert @ 2010-03-19  9:52 UTC (permalink / raw)
  To: Paul Mundt
  Cc: Anthony Liguori, Ingo Molnar, Avi Kivity, Zhang, Yanmin,
	Peter Zijlstra, Sheng Yang, linux-kernel, kvm, Marcelo Tosatti,
	Joerg Roedel, Jes Sorensen, Gleb Natapov, Zachary Amsden,
	ziteng.huang, Arnaldo Carvalho de Melo, Frédéric Weisbecker

On Fri, Mar 19, 2010 at 06:19:04PM +0900, Paul Mundt wrote:
> Implementing a virtualized DRM/KMS driver would at least get you the
> framebuffer interface more or less for free, while allowing you to deal
> with the userspace side of things incrementally (ie, running a dummy xorg
> on top of the virtualized fbdev until the DRI side catches up). It would
> also enable you to focus on the 2D and 3D parts independently.

Guys, have a look at Gallium.  In many ways it's a pile of crap, but
at least it's a pile of crap designed by vmware for *exactly* your
problem space.

  OG.

^ permalink raw reply	[flat|nested] 390+ messages in thread

* Re: [RFC] Unify KVM kernel-space and user-space code into a single project
  2010-03-19  8:53                                                                         ` Ingo Molnar
@ 2010-03-19 12:56                                                                           ` Anthony Liguori
  2010-03-21 19:17                                                                             ` Ingo Molnar
  2010-03-20  7:35                                                                           ` Avi Kivity
  1 sibling, 1 reply; 390+ messages in thread
From: Anthony Liguori @ 2010-03-19 12:56 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Avi Kivity, Pekka Enberg, Zhang, Yanmin, Peter Zijlstra,
	Sheng Yang, linux-kernel, kvm, Marcelo Tosatti, Joerg Roedel,
	Jes Sorensen, Gleb Natapov, Zachary Amsden, ziteng.huang,
	Arnaldo Carvalho de Melo, Frédéric Weisbecker

On 03/19/2010 03:53 AM, Ingo Molnar wrote:
> * Avi Kivity<avi@redhat.com>  wrote:
>
>    
>>> There were two negative reactions immediately, both showed a fundamental
>>> server versus desktop bias:
>>>
>>>   - you did not accept that the most important usecase is when there is a
>>>     single guest running.
>>>        
>> Well, it isn't.
>>      
> Erm, my usability points are _doubly_ true when there are multiple guests ...
>
> The inconvenience of having to type:
>
>    perf kvm --host --guest --guestkallsyms=/home/ymzhang/guest/kallsyms \
>    --guestmodules=/home/ymzhang/guest/modules top
>
> is very obvious even with a single guest. Now multiply that by more guests ...
>    

If you want to improve this, you need to do the following:

1) Add a userspace daemon that runs in the guest, uses vmchannel, and can 
fetch kallsyms and arbitrary modules.  If that daemon lives in tools/perf, 
that's fine.
2) Add a QMP interface in qemu to interact with such a daemon
3) Add a default QMP port in a well known location[1]
4) Modify the perf tool to look for a default QMP port.  In the case of 
a single guest, there's one port.  If there are multiple guests, then 
you will have to connect to each port, find the name or any other 
identifying information, and let the user choose.
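
To make steps 3 and 4 concrete, here is a rough sketch of how that discovery 
could look from perf's side - the directory, socket names and output below 
are made up purely for illustration, none of this exists today:

   # each running qemu instance registers its QMP socket in a well-known
   # place, say:
   ls /var/run/qemu/
   rhel5-test.qmp  fedora12-desktop.qmp

   # with a single guest, perf connects to the only socket and just works;
   # with several guests it queries each socket for identifying
   # information and lets the user choose:
   perf kvm top
     Select a guest to profile:
       [1] rhel5-test
       [2] fedora12-desktop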

Patches are certainly welcome.

[1] I've written up this patch and will send it out some time today.

Regards,

Anthony Liguori


^ permalink raw reply	[flat|nested] 390+ messages in thread

* Re: [LKML] Re: [RFC] Unify KVM kernel-space and user-space code into a single project
  2010-03-19  9:52                                                             ` Olivier Galibert
@ 2010-03-19 13:56                                                                 ` Konrad Rzeszutek Wilk
  0 siblings, 0 replies; 390+ messages in thread
From: Konrad Rzeszutek Wilk @ 2010-03-19 13:56 UTC (permalink / raw)
  To: Olivier Galibert, Paul Mundt, Anthony Liguori, Ingo Molnar,
	Avi Kivity, Zhang, Yanmin, Peter Zijlstra, Sheng Yang,
	linux-kernel, kvm, Marcelo Tosatti, Joerg Roedel, Jes Sorensen,
	Gleb Natapov, Zachary Amsden, ziteng.huang,
	Arnaldo Carvalho de Melo, Frédéric Weisbecker

On Fri, Mar 19, 2010 at 10:52:08AM +0100, Olivier Galibert wrote:
> On Fri, Mar 19, 2010 at 06:19:04PM +0900, Paul Mundt wrote:
> > Implementing a virtualized DRM/KMS driver would at least get you the
> > framebuffer interface more or less for free, while allowing you to deal
> > with the userspace side of things incrementally (ie, running a dummy xorg
> > on top of the virtualized fbdev until the DRI side catches up). It would
> > also enable you to focus on the 2D and 3D parts independently.
> 
> Guys, have a look at Gallium.  In many ways it's a pile of crap, but
> at least it's a pile of crap designed by vmware for *exactly* your
> problem space.

Or perhaps Chromium, which was designed years ago and can pass OpenGL
commands through via a pipe. VirtualBox uses it for their PV drivers.
Naturally it is not a FB, just an OpenGL command pass-through interface.

^ permalink raw reply	[flat|nested] 390+ messages in thread

* Re: [RFC] Unify KVM kernel-space and user-space code into a single project
  2010-03-18  8:44                                     ` Jes Sorensen
  2010-03-18  9:54                                       ` Ingo Molnar
@ 2010-03-19 14:53                                       ` Andrea Arcangeli
  1 sibling, 0 replies; 390+ messages in thread
From: Andrea Arcangeli @ 2010-03-19 14:53 UTC (permalink / raw)
  To: Jes Sorensen
  Cc: Ingo Molnar, Anthony Liguori, Avi Kivity, Zhang, Yanmin,
	Peter Zijlstra, Sheng Yang, linux-kernel, kvm, Marcelo Tosatti,
	Joerg Roedel, Gleb Natapov, Zachary Amsden, ziteng.huang,
	Arnaldo Carvalho de Melo, Frédéric Weisbecker

Hi there,

not really trying to get into the CC list of this discussion ;) but
for what it's worth I'd like to share my opinion on the matter.

On Thu, Mar 18, 2010 at 09:44:18AM +0100, Jes Sorensen wrote:
> What made KVM so successful was that the core kernel of the hypervisor
> was designed the right way, as a kernel module where it belonged. It was
> obvious to anyone who had been exposed to the main competition at the
> time, Xen, that this was the right approach. What has ended up killing
> Xen in the end is the not-invented-here approach of copying everything
> over, reformatting it, and rewriting half of it, which made it
> impossible to maintain and support as a single codebase. At my previous

Full agreement with that. CVS/git/patches and the development model are
next to irrelevant compared to the basic design of the code.

qemu (and especially qemu-kvm) is surely much closer to perf than a
firefox or openoffice, because there is some tight interconnect with
the kernel API. And the skills required to produce useful patches in
qemu are similar to the skills required to produce useful patches for
the kernel; more often than not a new feature in kvm also requires
merging a qemu-kvm side patch (it has always happened to me so far
;). But clearly we have to draw a line somewhere, and while I could
see things like systemtap and util-linux included in the kernel - and
perf already is - I have a hard time seeing userland code that supports
kernels other than Linux going into the kernel.

I think that's probably where I'd draw the line. Say somebody
creates a pure paravirt userland for kvm, without full driver emulation,
that only runs on a Linux kernel and no other OS; maybe that thing
wouldn't be as controversial to include in the kernel as qemu
is. qemu is clearly beyond the "only-running-on-a-linux-kernel"
barrier...

I'd definitely start with systemtap, which I think is even more
suitable than perf to be merged into the kernel. Things useful only
to developers, like perf/systemtap, make even more sense to fetch
silently, hidden in a single pull. Those projects are ideal to fetch
together because you run your own compiled userland binary and not an
rpm, and you need the very latest kernel and userland packages; sometimes
a new userland might not work so well with an older kernel, and the
other way around. They're tools for developers, and no developer cares
about the API as they rebuild the latest userland code anyway; they almost
don't require backwards compatibility from the kernel.

> So far your argument would justify pulling all of gdb into the kernel
> git tree as well, to support the kgdb efforts, or gcc so we can get rid
> of the gcc version quirks in the kernel header files, e2fsprogs and
> equivalent for _all_ file systems included in the kernel so we can make
> sure our fs tools never get out of sync with whats supported in the
> kernel......

It also boils down to the maintainer: where the code lives defines the
maintainer who pushes/commits it to the central repository that
everyone pulls from. And having the code in Linus's tree doesn't
make sense unless Linus is interested in following and reviewing qemu. So
it'd only create blind pulls.

But I entirely see what Ingo is going after, and I have no doubt that
contribution increases if some code is merged into the kernel even if
it's userland. The more people clone a project, the more people
build, use and read the code and have an incentive to contribute... and
there's nothing else like the Linux git tree to give visibility to a
project and get more contributions (as long as the pulled code
requires similar skills to the kernel code, of course). Plus it's
annoying to go on the web, find the URL to clone, clone... running make is
faster. But this is purely a PR effect. It's like free ads: it gets
more people using and looking into the code because it's already on
the hard disk and you only have to run make. After somebody gets familiar
with the pulled userland code - because they found it in the tree and
didn't need to search the web, cut-and-paste and clone a new
repo - I think it wouldn't matter anymore whether it's in the kernel or
not. Like for perf: by now I doubt it'd get fewer contributions if it
were moved out of the kernel tree. The only reason to leave it there is if
Linus actively checks the code before pulling it.

So I think what would be nice, to maybe get the positive PR/ads effect
and get more _users_ (and later developers) involved without actually
merging, is a command like:
   git clone linux-2.6
   cd linux-2.6
   tools/clone-project qemu-kvm
   tools/clone-project qemu
   tools/clone-project systemtap
   tools/clone-project seabios
   tools/clone-project e2fsprogs
   tools/clone-project perf
   ...

And then maybe a git-send-email-like command that would do the
right thing and send patches to the right list. Learning the
process of other projects is time-consuming and requires some effort.

But as far as qemu-kvm goes, the visibility is already there, and most
people who could possibly contribute to the kernel side already have
the userland cloned and pull regularly from it, so I doubt it'd
generate anything remotely as beneficial as it did for
perf. systemtap is really _identical_ to perf. You include it, lots
more developers toy with it by just running "make", they find a bug
and fix it, and they keep on contributing new features later,
once they are familiar with it. As usual ;).

In a separate mail Ingo said:

> Btw., i made similar arguments to Avi about 3 years ago when it was
> going upstream, that qemu should be unified with KVM. This is more
> true today than ever.

Well, I'm not sure if by KVM you mean qemu-kvm or the KVM kernel code, but I
would see huge value and a win-win situation in unifying qemu-kvm and
qemu. It's beyond me how there can still be a
difference, considering that nobody runs qemu with perhaps the
exception of the maintainers themselves. There's not even a qemu/kvm
directory in qemu. These are the real problems that should be
solved... Every time I have to send a patch I have to check if it also
applies to qemu. I usually start with qemu-kvm to do my development
there, and then I cross my fingers and hope the patch applies cleanly
against qemu too, and send it there in that case. qemu and qemu-kvm
are the same thing; it's the same people, the same community,
the same skills. It's an absolute waste that such a small amount of
code isn't merged so we can all work on the same tree.  And it's not
a huge patch at all compared to the size of qemu-kvm... so there
can't be any technical justification for qemu to take a tangent
here.

So my suggestion is to start with what will give a _real_, tangible
benefit to developers (i.e. all of us working on the same branch). The
PR effect of merging it into the kernel would be minor for qemu-kvm;
it really doesn't matter which url we pull the code from as long as
there is only one url and not two.

^ permalink raw reply	[flat|nested] 390+ messages in thread

* Re: [PATCH] Enhance perf to collect KVM guest os statistics from host side
  2010-03-19  8:21   ` Ingo Molnar
@ 2010-03-19 17:29     ` Joerg Roedel
  2010-03-21 18:43       ` Ingo Molnar
  2010-03-22  7:24     ` Zhang, Yanmin
  1 sibling, 1 reply; 390+ messages in thread
From: Joerg Roedel @ 2010-03-19 17:29 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Zhang, Yanmin, Peter Zijlstra, Avi Kivity, Sheng Yang,
	linux-kernel, kvm, Marcelo Tosatti, Jes Sorensen, Gleb Natapov,
	Zachary Amsden, zhiteng.huang, Frédéric Weisbecker,
	Arnaldo Carvalho de Melo

On Fri, Mar 19, 2010 at 09:21:22AM +0100, Ingo Molnar wrote:
> Unfortunately, in a previous thread the Qemu maintainer has indicated that he 
> will essentially NAK any attempt to enhance Qemu to provide an easily 
> discoverable, self-contained, transparent guest mount on the host side.
> 
> No technical justification was given for that NAK, despite my repeated 
> requests to articulate the exact security problems that such an approach 
> would cause.
> 
> If that NAK does not stand in that form then i'd like to know about it - it 
> makes no sense for us to try to code up a solution against a standing 
> maintainer NAK ...

I still think it is the best and most generic way to let the guest do
the symbol resolution. This has several advantages:

	1. The guest knows best about its symbol space. So this would be
	   extensible to other guest operating systems.  A brave
	   developer may even implement symbol passing for Windows or
	   the BSDs ;-)

	2. The guest can decide for itself whether it wants to pass this
	   information to host-perf. No security issues at all (a toy
	   sketch of this guest-side step follows this list).

	3. The guest can also pass us the call-chain, and we don't need
	   to care about the complications of fetching it from the
	   guest ourselves.

	4. This way is extensible to nested virtualization too.
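
As a rough sketch of that guest-side resolution step (purely
illustrative: nothing like this exists in perf or qemu, and the
transport back to host-perf is deliberately left out, since that is
exactly what is being debated):

    #!/usr/bin/env python
    # Toy sketch: resolve a guest kernel address to "symbol+offset" using
    # the guest's own /proc/kallsyms.  How the address arrives and how
    # the answer travels back to host-perf (vmchannel, qemu, something
    # else) is deliberately not shown here.
    import bisect

    def load_kallsyms(path="/proc/kallsyms"):
        syms = []
        with open(path) as f:
            for line in f:
                fields = line.split()
                addr, name = int(fields[0], 16), fields[2]
                if addr:          # addresses read as 0 without privileges
                    syms.append((addr, name))
        syms.sort()
        return syms

    def resolve(addr, syms):
        starts = [a for a, _ in syms]
        i = bisect.bisect_right(starts, addr) - 1
        if i < 0:
            return "unknown"
        start, name = syms[i]
        return "%s+0x%x" % (name, addr - start)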

How we speak to the guest was already discussed in this thread. My
personal opinion is that going through qemu is an unnecessary step and
that we can solve this in a more clever and transparent way for perf.

	Joerg


^ permalink raw reply	[flat|nested] 390+ messages in thread

* Re: [RFC] Unify KVM kernel-space and user-space code into a single project
  2010-03-19  8:53                                                                         ` Ingo Molnar
  2010-03-19 12:56                                                                           ` Anthony Liguori
@ 2010-03-20  7:35                                                                           ` Avi Kivity
  2010-03-21 19:06                                                                             ` Ingo Molnar
  1 sibling, 1 reply; 390+ messages in thread
From: Avi Kivity @ 2010-03-20  7:35 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Pekka Enberg, Anthony Liguori, Zhang, Yanmin, Peter Zijlstra,
	Sheng Yang, linux-kernel, kvm, Marcelo Tosatti, Joerg Roedel,
	Jes Sorensen, Gleb Natapov, Zachary Amsden, ziteng.huang,
	Arnaldo Carvalho de Melo, Frédéric Weisbecker

On 03/19/2010 10:53 AM, Ingo Molnar wrote:
> * Avi Kivity<avi@redhat.com>  wrote:
>
>    
>>> There were two negative reactions immediately, both showed a fundamental
>>> server versus desktop bias:
>>>
>>>   - you did not accept that the most important usecase is when there is a
>>>     single guest running.
>>>        
>> Well, it isn't.
>>      
> Erm, my usability points are _doubly_ true when there are multiple guests ...
>
> The inconvenience of having to type:
>
>    perf kvm --host --guest --guestkallsyms=/home/ymzhang/guest/kallsyms \
>    --guestmodules=/home/ymzhang/guest/modules top
>
> is very obvious even with a single guest. Now multiply that by more guests ...
>    

Yes.  That's why I asked how this is handled.

> The crux is: we are working on improving KVM instrumentation. There are
> working patches posted to this thread and we would like to have/implement an
> automatism to allow the discovery of all this information. The information
> should be available to the developer who wants it, and easily/transparently so
> - in true Linux fashion.
>
>    
>>>   - the reaction to the 'how do we get symbols out of the guest' sub-question
>>>     was, paraphrased: 'we dont want that due to<unspecified>   security threat
>>>     to XYZ selinux usecase with lots of guests'.
>>>        
>> When I review a patch, I try to think of the difficult cases, not
>> just the easy case.
>>      
> You havent articulated an actionable reason and you have suggested no solution
> either,

I did suggest a symbol server, and using a well-known location, though 
I'm unhappy with it.  Multiple guest management should be done by the 
appropriate tools, not qemu or implicitly.

> you just passive-aggressively backed the claim that giving developers
> access to the symbol space is some sort of vague 'security threat'.
>    

Passive-aggressive?  Should I see a doctor?

> If that is not so i'd be glad to be proven wrong.
>
>    
>>> Anyone being aware of how Linux and KVM is being used on the desktop will
>>> know how detached that attitude is from the typical desktop usecase ...
>>>
>>> Usability _never_ sucks because of lack of patches or lack of suggestions.
>>> I bet if you made the next server feature contingent on essential
>>> usability fixes they'd happen overnight - for God's sake there's been 1000
>>> commits in the last 3 months in the Qemu repository so there's plenty of
>>> manpower...
>>>        
>> First of all I am not a qemu maintainer. [...]
>>      
> That is the crux of the matter. My experience in these threads was that no-one
> really seems to feel in charge of the whole thing.

I am comfortable with having someone I trust maintain qemu.  While 
sometimes Anthony overrides me on issues where I know I'm right and he's 
wrong, I still prefer that to having to do everything myself; I would 
surely do a worse job due to overload.

If you actually look at qemu patches, the vast majority have little to do 
directly with kvm; and I (along with Marcelo) maintain the kvm 
integration in qemu.

> Should we really wonder why
> KVM usability sucks?
>    

That wouldn't change at all if I were to maintain it, since I wouldn't 
start writing a GUI for it and wouldn't force other contributors to do 
so as a condition for accepting unrelated patches.

>> [...] Second, from my point of view all contributors are volunteers (perhaps
>> their employer volunteered them, but there's no difference from my
>> perspective). Asking them to repaint my apartment as a condition to get a
>> patch applied is abuse.  If a patch is good, it gets applied.
>>      
> This is one of the weirdest arguments i've seen in this thread. Almost all the
> time do we make contributions conditional on the general shape of the project.
> Developers dont get to do just the fun stuff.
>    

So, do you think a reply to a patch along the lines of

   NAK.  Improving scalability is pointless while we don't have a decent 
GUI.  I'll review your RCU patches
   _after_ you've contributed a usable GUI.

?

> This is a basic quid pro quo: new features introduce risks and create
> additional workload not just to the originating developer but on the rest of
> the community as well. You should check how Linus has pulled new features in
> the past 15 years: he very much requires the existing code to first be
> top-notch before he accepts new features for a given area of functionality.
>    

For a given area, yes.  It makes sense to clean up code before changing 
it, otherwise cruft accumulates rapidly.  What you're describing is 
completely different and amounts to total disregard of contributors' 
time and effort.

> Doing that and insisting on developers to see those imbalances as well is
> absolutely essential to code quality: otherwise everyone would be running
> around implementing just the features they are interested in, without regard
> for the general health of the project.
>    

The general health of qemu in terms of code quality was indeed pretty 
bad and there was (and is) a massive effort to modernise it.  If you're 
interested look at qdev and qmp.  Both are efforts to improve the 
infrastructure rather than add features on rotten code, and very 
successful IMO.  There was no effort to write a GUI since no one appears 
to be motivated to do it except you.

> Of course, if you keep the project in two halves (KVM and Qemu), and pretend
> that they are separate and have little relation, imbalances of quality can
> mount up and you can throw your hands up and say that it's "too bad, I'm not
> maintaining that". It is your basic duty as a Linux maintainer to keep
> balances of quality. I do it all day, other maintainers do it all day.
>    

IMO qemu quality has improved dramatically in the last year or two.

>>> Usability suckage - and i'm not going to be popular for saying this out
>>> loud - almost always shows a basic maintainer disconnect with the real
>>> world. See your very first reactions to my 'KVM usability' observations.
>>> Read back your and Anthony's replies: total 'sure, patches welcome' kind
>>> of indifference. It is _your project_, not some other project down the
>>> road ...
>>>        
>> I could drop everything and write a gtk GUI for qemu.  Is that what you
>> want?
>>      
> No, my suggestion to you (it's up to you whether you give my opinion any
> weight) is to accept your mistakes and improve, and to not stand in the way of
> people who'd like to improve the situation. You are happy with the server
> features and you also made it clear that you dont feel responsible for the
> rest of the package - which is a big mistake IMO.
>    

If there were no capable maintainer I would reluctantly step in.  That 
is not the case.  If I were to displace Anthony then qemu quality would 
suffer, or I would have to drop kvm maintainership, or, if some false 
modesty is allowed, perhaps both.

> Also, you have demonstrated it in this thread that you have near zero
> technical clue about basic desktop and development usability matters
>    

Neither do you.  At least I have spent enough time among real usability 
people to know this.  I don't have any pretences in this area and am 
happy to leave it to the experts.  As infrastructure projects kvm and 
qemu should concentrate on providing flexible capabilities to consumers, 
which then expose it to users.  These consumers can be server-oriented 
management applications, or end-user GUIs.

My preferred plan for GUIs, btw, is a plugin based approach where qemu 
exposes its internal objects (the same ones that are exposed to 
management applications) to the GUI which can then manipulate them, 
without being co-maintained in the same code base.  This allows multiple 
GUIs (KDE and GNOME) and allows people with a clue to work on them.
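
For illustration only (none of these classes or hooks exist in qemu;
they merely sketch the shape such a plugin interface could take):

    # Hypothetical sketch of the plugin idea: the emulator exposes its
    # internal objects through a small registry, and an out-of-tree GUI
    # plugin manipulates them without living in the emulator's code base.

    class VMObject(object):
        """Stand-in for an exposed internal object (a VM, a disk, ...)."""
        def __init__(self, name):
            self.name = name
        def pause(self):
            print("pausing %s" % self.name)

    class PluginRegistry(object):
        def __init__(self):
            self.plugins = []
        def register(self, plugin):
            self.plugins.append(plugin)
        def vm_created(self, vm):
            for plugin in self.plugins:
                plugin.on_vm_created(vm)

    class TinyGuiPlugin(object):
        """GUI code maintained separately; it only sees exposed objects."""
        def on_vm_created(self, vm):
            print("GUI: offering a pause button for %s" % vm.name)
            vm.pause()

    if __name__ == "__main__":
        registry = PluginRegistry()
        registry.register(TinyGuiPlugin())
        registry.vm_created(VMObject("guest-fedora-1"))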

-- 
Do not meddle in the internals of kernels, for they are subtle and quick to panic.


^ permalink raw reply	[flat|nested] 390+ messages in thread

* Re: [RFC] Unify KVM kernel-space and user-space code into a single project
  2010-03-19  7:21                                                     ` Avi Kivity
@ 2010-03-20 14:59                                                       ` Andrea Arcangeli
  2010-03-21 10:03                                                         ` Avi Kivity
  0 siblings, 1 reply; 390+ messages in thread
From: Andrea Arcangeli @ 2010-03-20 14:59 UTC (permalink / raw)
  To: Avi Kivity
  Cc: Ingo Molnar, Zachary Amsden, Anthony Liguori, Zhang, Yanmin,
	Peter Zijlstra, Sheng Yang, linux-kernel, kvm, Marcelo Tosatti,
	Joerg Roedel, Jes Sorensen, Gleb Natapov, ziteng.huang,
	Arnaldo Carvalho de Melo, Frédéric Weisbecker

On Fri, Mar 19, 2010 at 09:21:49AM +0200, Avi Kivity wrote:
> On 03/19/2010 12:44 AM, Ingo Molnar wrote:
> >
> > Too bad - there was heavy initial opposition to the arch/x86 unification as
> > well [and heavy opposition to tools/perf/ as well], still both worked out
> > extremely well :-)
> >    
> 
> Did you forget that arch/x86 was a merging of a code fork that happened 
> several years previously?  Maybe that fork shouldn't have been done to 
> begin with.

We discussed, and probably timidly tried, sharing the sharable
initially, but we realized it was too wasteful of time. In addition to
having to adapt the code to 64bit we would also have had to constantly
solve another problem on top of it (see the various _32/_64 splits;
those take time to achieve, maybe not huge time but still definitely
some time and effort). Even in retrospect I am quite sure the way
x86-64 happened was optimal, and if we could go back we would do it
again the exact same way even if the final objective was to have a
common arch/x86 (and thankfully Linus is flexible and smart enough to
realize that code that isn't risking to destabilize anything shouldn't
be forced out just because it hasn't reached a totally
theoretical-perfect-nitpicking-clean state yet). It's still a lot of
work to do the unification later as a separate task, but it's not as
if doing it immediately would have been a lot less work. It's about
the same amount of effort, and we were able to defer it for later and
decrease the time to market, which surely has contributed to the
success of x86-64.

The problem of qemu is not some lack of GUI or that it's not included
in the linux kernel git tree; the definitive problem is how to merge
qemu-kvm/kvm and qxl into it. If you (Avi) were the qemu maintainer I
am sure there wouldn't be two trees, so as a developer I would totally
love it, and I am sure that with you as maintainer it would have a
chance to move forward with qxl on desktop virtualization without
proposals to extend vnc instead to achieve a "similar" result (imagine
if btrfs were published on a website and people started to discuss
whether it should ever be merged because reinventing some part of
btrfs inside ext5 might achieve ""similar"" results).

About a GUI for KVM to use on desktop distributions: that is an
irrelevant concern compared to the lack of a protocol more efficient
than rdesktop/rdp/vnc for desktop virtualization. I have people asking
me to migrate hundreds of desktops to desktop virtualization on KVM in
their organizations, and I tell them to use spice because I believe
it's the most efficient option available (at least as far as we stick
to open source open protocols); there are universities using spice on
thousands of student desktops, and I think we need paravirt graphics
to happen ASAP in the main qemu tree too.

In short: running KVM on the desktop is irrelevant compared to running
the desktop on KVM, so I suggest focusing on what is more important
first ;).

Thanks,
Andrea

^ permalink raw reply	[flat|nested] 390+ messages in thread

* Re: [RFC] Unify KVM kernel-space and user-space code into a single project
  2010-03-20 14:59                                                       ` Andrea Arcangeli
@ 2010-03-21 10:03                                                         ` Avi Kivity
  0 siblings, 0 replies; 390+ messages in thread
From: Avi Kivity @ 2010-03-21 10:03 UTC (permalink / raw)
  To: Andrea Arcangeli
  Cc: Ingo Molnar, Zachary Amsden, Anthony Liguori, Zhang, Yanmin,
	Peter Zijlstra, Sheng Yang, linux-kernel, kvm, Marcelo Tosatti,
	Joerg Roedel, Jes Sorensen, Gleb Natapov, ziteng.huang,
	Arnaldo Carvalho de Melo, Frédéric Weisbecker

On 03/20/2010 04:59 PM, Andrea Arcangeli wrote:
> On Fri, Mar 19, 2010 at 09:21:49AM +0200, Avi Kivity wrote:
>    
>> On 03/19/2010 12:44 AM, Ingo Molnar wrote:
>>      
>>> Too bad - there was heavy initial opposition to the arch/x86 unification as
>>> well [and heavy opposition to tools/perf/ as well], still both worked out
>>> extremely well :-)
>>>
>>>        
>> Did you forget that arch/x86 was a merging of a code fork that happened
>> several years previously?  Maybe that fork shouldn't have been done to
>> begin with.
>>      
> We discussed, and probably timidly tried, sharing the sharable
> initially, but we realized it was too wasteful of time. In addition to
> having to adapt the code to 64bit we would also have had to constantly
> solve another problem on top of it (see the various _32/_64 splits;
> those take time to achieve, maybe not huge time but still definitely
> some time and effort). Even in retrospect I am quite sure the way
> x86-64 happened was optimal, and if we could go back we would do it
> again the exact same way even if the final objective was to have a
> common arch/x86 (and thankfully Linus is flexible and smart enough to
> realize that code that isn't risking to destabilize anything shouldn't
> be forced out just because it hasn't reached a totally
> theoretical-perfect-nitpicking-clean state yet). It's still a lot of
> work to do the unification later as a separate task, but it's not as
> if doing it immediately would have been a lot less work. It's about
> the same amount of effort, and we were able to defer it for later and
> decrease the time to market, which surely has contributed to the
> success of x86-64.
>    

In hindsight decisions are much easier.  I agree it was less risky to 
fork than to share.  But if another instruction set forks out a 64-bit 
not-exactly-compatible variant, I'm sure we'll start out shared and not 
fork it, especially if the platform remains the same.

> The problem of qemu is not some lack of GUI or that it's not included
> in the linux kernel git tree; the definitive problem is how to merge
> qemu-kvm/kvm and qxl into it. If you (Avi) were the qemu maintainer I
> am sure there wouldn't be two trees, so as a developer I would totally
> love it, and I am sure that with you as maintainer it would have a
> chance to move forward with qxl on desktop virtualization without
> proposals to extend vnc instead to achieve a "similar" result (imagine
> if btrfs were published on a website and people started to discuss
> whether it should ever be merged because reinventing some part of
> btrfs inside ext5 might achieve ""similar"" results).
>    

The qemu/qemu-kvm fork is definitely hurting.  Some history: when kvm 
started out I pulled qemu for fast hacking and, much like arch/x86_64, I 
couldn't destabilize qemu for something that was completely experimental 
(and closed source at the time).  Moreover, it wasn't clear if the qemu 
community would be interested.

The qemu-kvm fork was designed for minimal intrusion so I could merge 
upstream qemu regularly.  This resulted in kvm integration that was 
fairly ugly.  Later Anthony merged a well-integrated alternative 
implementation (in retrospect this was a mistake IMO - we were left with 
a well tested high performing ugly implementation and a clean, slow, 
untested, and unfeatured implementation, and no one who wants to merge 
the two).  So now it is pretty confusing to read the code which has the 
two alternate implementations sometimes sharing code and sometimes diverging.


> About a GUI for KVM to use on desktop distributions: that is an
> irrelevant concern compared to the lack of a protocol more efficient
> than rdesktop/rdp/vnc for desktop virtualization. I have people asking
> me to migrate hundreds of desktops to desktop virtualization on KVM in
> their organizations, and I tell them to use spice because I believe
> it's the most efficient option available (at least as far as we stick
> to open source open protocols); there are universities using spice on
> thousands of student desktops, and I think we need paravirt graphics
> to happen ASAP in the main qemu tree too.
>    

That effort will have to wait for the spice project to mature.

> In short: running KVM on the desktop is irrelevant compared to running
> the desktop on KVM, so I suggest focusing on what is more important
> first ;).
>    

Anyone can focus on what interests them, if someone has an interest in a 
good desktop-on-desktop experience they should start hacking and sending 
patches.

-- 
error compiling committee.c: too many arguments to function


^ permalink raw reply	[flat|nested] 390+ messages in thread

* Re: [RFC] Unify KVM kernel-space and user-space code into a single project
  2010-03-18 16:13                                                   ` Ingo Molnar
                                                                       ` (2 preceding siblings ...)
  2010-03-18 18:23                                                     ` drepper
@ 2010-03-21 13:27                                                     ` Gabor Gombas
  3 siblings, 0 replies; 390+ messages in thread
From: Gabor Gombas @ 2010-03-21 13:27 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Anthony Liguori, Avi Kivity, Zhang, Yanmin, Peter Zijlstra,
	Sheng Yang, linux-kernel, kvm, Marcelo Tosatti, Joerg Roedel,
	Jes Sorensen, Gleb Natapov, Zachary Amsden, ziteng.huang,
	Arnaldo Carvalho de Melo, Frédéric Weisbecker

On Thu, Mar 18, 2010 at 05:13:10PM +0100, Ingo Molnar wrote:

> > Why does Linux AIO still suck?  Why do we not have a proper interface in 
> > userspace for doing asynchronous file system operations?
> 
> Good that you mention it, i think it's an excellent example.
> 
> The suckage of kernel async IO is for similar reasons: there's an ugly package 
> separation problem between the kernel and between glibc - and between the apps 
> that would make use of it.

No, kernel async IO sucks because it still does not play well with
buffered I/O. Last time I checked (about a year ago or so), AIO syscall
latencies were much worse when buffered I/O was used compared to direct
I/O. Unfortunately, to achieve good performance with direct I/O, you
need a HW RAID card with lots of on-board cache.

Gabor

^ permalink raw reply	[flat|nested] 390+ messages in thread

* Re: [PATCH] Enhance perf to collect KVM guest os statistics from host side
  2010-03-19 17:29     ` Joerg Roedel
@ 2010-03-21 18:43       ` Ingo Molnar
  2010-03-22 10:14         ` Joerg Roedel
  0 siblings, 1 reply; 390+ messages in thread
From: Ingo Molnar @ 2010-03-21 18:43 UTC (permalink / raw)
  To: Joerg Roedel
  Cc: Zhang, Yanmin, Peter Zijlstra, Avi Kivity, Sheng Yang,
	linux-kernel, kvm, Marcelo Tosatti, Jes Sorensen, Gleb Natapov,
	Zachary Amsden, zhiteng.huang, Frédéric Weisbecker,
	Arnaldo Carvalho de Melo


* Joerg Roedel <joro@8bytes.org> wrote:

> On Fri, Mar 19, 2010 at 09:21:22AM +0100, Ingo Molnar wrote:
> > Unfortunately, in a previous thread the Qemu maintainer has indicated that he 
> > will essentially NAK any attempt to enhance Qemu to provide an easily 
> > discoverable, self-contained, transparent guest mount on the host side.
> > 
> > No technical justification was given for that NAK, despite my repeated 
> > requests to articulate the exact security problems that such an approach 
> > would cause.
> > 
> > If that NAK does not stand in that form then i'd like to know about it - it 
> > makes no sense for us to try to code up a solution against a standing 
> > maintainer NAK ...
> 
> I still think it is the best and most generic way to let the guest do the 
> symbol resolution. [...]

Not really.

> [...] This has several advantages:
> 
> 	1. The guest knows best about its symbol space. So this would be
> 	   extensible to other guest operating systems.  A brave
> 	   developer may even implement symbol passing for Windows or
> 	   the BSDs ;-)

Having access to the actual executable files that include the symbols achieves 
precisely that - with the additional robustness that all this functionality is 
concentrated into the host, while the guest side is kept minimal (and 
transparent).

> 	2. The guest can decide for itself whether it wants to pass this
> 	   information to host-perf. No security issues at all.

It can decide whether it exposes the files. Nor are there any "security 
issues" to begin with.

> 	3. The guest can also pass us the call-chain, and we don't need
> 	   to care about the complications of fetching it from the
> 	   guest ourselves.

You need to be aware of the fact that symbol resolution is a separate step 
from call chain generation.

I.e. call-chains are an (entirely) separate issue, and could reasonably be done 
in the guest or in the host.

It has no bearing on this symbol resolution question.

> 	4. This way is extensible to nested virtualization too.

Nested virtualization is actually already taken care of by the filesystem 
solution via an existing method called 'subdirectories'. If the guest offers 
sub-guests then those symbols will be exposed in a similar way via its own 
'guest files' directory hierarchy.

I.e. if we have 'Guest-2' nested inside the 'Guest-Fedora-1' instance, we get:

 /guests/
 /guests/Guest-Fedora-1/etc/
 /guests/Guest-Fedora-1/usr/

we'd also have:

 /guests/Guest-Fedora-1/guests/Guest-2/

So this is taken care of automatically.
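
As a sketch of how a tool could consume such a layout (illustrative
only: the /guests/ hierarchy is the proposal under discussion and does
not exist anywhere today):

    #!/usr/bin/env python
    # Walk the proposed /guests/ hierarchy, including nested guests, and
    # collect each guest's kallsyms path.  Purely a sketch of the
    # discoverability argument; returns nothing on systems without /guests.
    import os

    def find_guest_kallsyms(root="/guests", prefix=""):
        found = {}
        if not os.path.isdir(root):
            return found
        for name in os.listdir(root):
            guest = os.path.join(root, name)
            label = prefix + name
            kallsyms = os.path.join(guest, "proc", "kallsyms")
            if os.path.exists(kallsyms):
                found[label] = kallsyms
            # Nested guests show up under <guest>/guests/ and are handled
            # by plain recursion - no special casing needed.
            found.update(find_guest_kallsyms(os.path.join(guest, "guests"),
                                             label + "/"))
        return found

    if __name__ == "__main__":
        for label, path in sorted(find_guest_kallsyms().items()):
            print("%s -> %s" % (label, path))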

I.e. none of the four 'advantages' listed here are actually advantages over my 
proposed solution, so your conclusion is subsequently flawed as well.

> How we speak to the guest was already discussed in this thread. My personal 
> opinion is that going through qemu is an unnecessary step and that we can 
> solve this in a more clever and transparent way for perf.

Meaning exactly what?

Thanks,

	Ingo

^ permalink raw reply	[flat|nested] 390+ messages in thread

* Re: [RFC] Unify KVM kernel-space and user-space code into a single project
  2010-03-20  7:35                                                                           ` Avi Kivity
@ 2010-03-21 19:06                                                                             ` Ingo Molnar
  2010-03-21 20:22                                                                               ` Avi Kivity
  0 siblings, 1 reply; 390+ messages in thread
From: Ingo Molnar @ 2010-03-21 19:06 UTC (permalink / raw)
  To: Avi Kivity
  Cc: Pekka Enberg, Anthony Liguori, Zhang, Yanmin, Peter Zijlstra,
	Sheng Yang, linux-kernel, kvm, Marcelo Tosatti, Joerg Roedel,
	Jes Sorensen, Gleb Natapov, Zachary Amsden, ziteng.huang,
	Arnaldo Carvalho de Melo, Frédéric Weisbecker


* Avi Kivity <avi@redhat.com> wrote:

> >> [...] Second, from my point of view all contributors are volunteers 
> >> (perhaps their employer volunteered them, but there's no difference from 
> >> my perspective). Asking them to repaint my apartment as a condition to 
> >> get a patch applied is abuse.  If a patch is good, it gets applied.
> >
> > This is one of the weirdest arguments i've seen in this thread. Almost all 
> > the time do we make contributions conditional on the general shape of the 
> > project. Developers dont get to do just the fun stuff.
> 
> So, do you think a reply to a patch along the lines of
> 
>   NAK.  Improving scalability is pointless while we don't have a decent GUI.  
> I'll review your RCU patches
>   _after_ you've contributed a usable GUI.
> 
> ?

What does this have to do with RCU?

I'm talking about KVM, which is a Linux kernel feature that is useless without 
a proper, KVM-specific app making use of it.

RCU is a general kernel performance feature that works across the board. It 
helps KVM indirectly, and it helps many other kernel subsystems as well. It 
needs no user-space tool to be useful.

KVM on the other hand is useless without a user-space tool.

[ Theoretically you might have a fair point if it were a critical feature of 
  RCU for it to have a GUI, and if the main tool that made use of it sucked. 
  But it isnt and you should know that. ]

Had you suggested the following 'NAK', applied to a different, relevant 
subsystem:

  |   NAK.  Improving scalability is pointless while we don't have a usable 
  | tool.  I'll review your perf patches _after_ you've contributed a usable 
  | tool.

you would have a fair point. In fact, that is exactly what we are doing and 
living by: it makes absolutely zero sense to improve the scalability of perf 
if its usability sucks.

So where you are trying to point out an inconsistency in my argument there is 
none.

> > This is a basic quid pro quo: new features introduce risks and create 
> > additional workload not just to the originating developer but on the rest 
> > of the community as well. You should check how Linus has pulled new 
> > features in the past 15 years: he very much requires the existing code to 
> > first be top-notch before he accepts new features for a given area of 
> > functionality.
> 
> For a given area, yes. [...]

That is my precise point.

KVM is a specific subsystem or "area" that makes no sense without the 
user-space tooling it relates to. You seem to argue that you have no 'right' 
to insist on good quality of that tooling - and IMO you are fundamentally 
wrong with that.

Thanks,

	Ingo

^ permalink raw reply	[flat|nested] 390+ messages in thread

* Re: [RFC] Unify KVM kernel-space and user-space code into a single project
  2010-03-19 12:56                                                                           ` Anthony Liguori
@ 2010-03-21 19:17                                                                             ` Ingo Molnar
  2010-03-21 19:35                                                                               ` Antoine Martin
                                                                                                 ` (2 more replies)
  0 siblings, 3 replies; 390+ messages in thread
From: Ingo Molnar @ 2010-03-21 19:17 UTC (permalink / raw)
  To: Anthony Liguori
  Cc: Avi Kivity, Pekka Enberg, Zhang, Yanmin, Peter Zijlstra,
	Sheng Yang, linux-kernel, kvm, Marcelo Tosatti, Joerg Roedel,
	Jes Sorensen, Gleb Natapov, Zachary Amsden, ziteng.huang,
	Arnaldo Carvalho de Melo, Frédéric Weisbecker


* Anthony Liguori <anthony@codemonkey.ws> wrote:

> On 03/19/2010 03:53 AM, Ingo Molnar wrote:
> >* Avi Kivity<avi@redhat.com>  wrote:
> >
> >>>There were two negative reactions immediately, both showed a fundamental
> >>>server versus desktop bias:
> >>>
> >>>  - you did not accept that the most important usecase is when there is a
> >>>    single guest running.
> >>Well, it isn't.
> >Erm, my usability points are _doubly_ true when there are multiple guests ...
> >
> >The inconvenience of having to type:
> >
> >   perf kvm --host --guest --guestkallsyms=/home/ymzhang/guest/kallsyms \
> >   --guestmodules=/home/ymzhang/guest/modules top
> >
> >is very obvious even with a single guest. Now multiply that by more guests ...
> 
> If you want to improve this, you need to do the following:
> 
> 1) Add a userspace daemon that uses vmchannel that runs in the guest and can 
>    fetch kallsyms and arbitrary modules.  If that daemon lives in 
>    tools/perf, that's fine.

Adding any new daemon to an existing guest is a deployment and usability 
nightmare.

The basic rule of good instrumentation is to be transparent. The moment we 
have to modify the user-space of a guest just to monitor it, the purpose of 
transparent instrumentation is defeated.

That was one of the fundamental usability mistakes of Oprofile.

There is no 'perf' daemon - all the perf functionality is _built in_, and for 
very good reasons. It is one of the main reasons for perf's success as well.

Now Qemu is trying to repeat that stupid mistake ...

So please either suggest a different transparent solution that is technically 
better than the one i suggested, or you should concede the point really.

Please try to think with the heads of our users and developers and dont suggest 
some weird ivory-tower design that is totally impractical ...

And no, you have to code none of this, we'll do all the coding. The only thing 
we are asking is for you to not stand in the way of good usability ...

Thanks,

	Ingo

^ permalink raw reply	[flat|nested] 390+ messages in thread

* Re: [RFC] Unify KVM kernel-space and user-space code into a single project
  2010-03-21 19:17                                                                             ` Ingo Molnar
@ 2010-03-21 19:35                                                                               ` Antoine Martin
  2010-03-21 19:59                                                                                 ` Ingo Molnar
  2010-03-21 20:01                                                                               ` Avi Kivity
  2010-03-21 23:35                                                                               ` Anthony Liguori
  2 siblings, 1 reply; 390+ messages in thread
From: Antoine Martin @ 2010-03-21 19:35 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Anthony Liguori, Avi Kivity, Pekka Enberg, Zhang, Yanmin,
	Peter Zijlstra, Sheng Yang, linux-kernel, kvm, Marcelo Tosatti,
	Joerg Roedel, Jes Sorensen, Gleb Natapov, Zachary Amsden,
	ziteng.huang, Arnaldo Carvalho de Melo, Frédéric Weisbecker

On 03/22/2010 02:17 AM, Ingo Molnar wrote:
> * Anthony Liguori<anthony@codemonkey.ws>  wrote:
>    
>> On 03/19/2010 03:53 AM, Ingo Molnar wrote:
>>      
>>> * Avi Kivity<avi@redhat.com>   wrote:
>>>        
>>>>> There were two negative reactions immediately, both showed a fundamental
>>>>> server versus desktop bias:
>>>>>
>>>>>   - you did not accept that the most important usecase is when there is a
>>>>>     single guest running.
>>>>>            
>>>> Well, it isn't.
>>>>          
>>> Erm, my usability points are _doubly_ true when there are multiple guests ...
>>>
>>> The inconvenience of having to type:
>>>
>>>    perf kvm --host --guest --guestkallsyms=/home/ymzhang/guest/kallsyms \
>>>    --guestmodules=/home/ymzhang/guest/modules top
>>>
>>> is very obvious even with a single guest. Now multiply that by more guests ...
>>>        
>> If you want to improve this, you need to do the following:
>>
>> 1) Add a userspace daemon that uses vmchannel that runs in the guest and can
>>     fetch kallsyms and arbitrary modules.  If that daemon lives in
>>     tools/perf, that's fine.
>>      
> Adding any new daemon to an existing guest is a deployment and usability
> nightmare.
>    
Absolutely. In most cases it is not desirable, and you'll find that in a 
lot of cases it is not even possible - for non-technical reasons.
One of the main benefits of virtualization is the ability to manage and 
see things from the outside.
> The basic rule of good instrumentation is to be transparent. The moment we
> have to modify the user-space of a guest just to monitor it, the purpose of
> transparent instrumentation is defeated.
>    
Not to mention Heisenbugs and interference.

Cheers
Antoine

> That was one of the fundamental usability mistakes of Oprofile.
>
> There is no 'perf' daemon - all the perf functionality is _built in_, and for
> very good reasons. It is one of the main reasons for perf's success as well.
>
> Now Qemu is trying to repeat that stupid mistake ...
>
> So please either suggest a different transparent solution that is technically
> better than the one i suggested, or you should concede the point really.
>
> Please try to think with the heads of our users and developers and dont suggest
> some weird ivory-tower design that is totally impractical ...
>
> And no, you have to code none of this, we'll do all the coding. The only thing
> we are asking is for you to not stand in the way of good usability ...
>
> Thanks,
>
> 	Ingo
> --
> To unsubscribe from this list: send the line "unsubscribe kvm" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
>    


^ permalink raw reply	[flat|nested] 390+ messages in thread

* Re: [RFC] Unify KVM kernel-space and user-space code into a single project
  2010-03-21 19:35                                                                               ` Antoine Martin
@ 2010-03-21 19:59                                                                                 ` Ingo Molnar
  2010-03-21 20:09                                                                                   ` Avi Kivity
  0 siblings, 1 reply; 390+ messages in thread
From: Ingo Molnar @ 2010-03-21 19:59 UTC (permalink / raw)
  To: Antoine Martin
  Cc: Anthony Liguori, Avi Kivity, Pekka Enberg, Zhang, Yanmin,
	Peter Zijlstra, Sheng Yang, linux-kernel, kvm, Marcelo Tosatti,
	Joerg Roedel, Jes Sorensen, Gleb Natapov, Zachary Amsden,
	ziteng.huang, Arnaldo Carvalho de Melo, Frédéric Weisbecker


* Antoine Martin <antoine@nagafix.co.uk> wrote:

> On 03/22/2010 02:17 AM, Ingo Molnar wrote:
> >* Anthony Liguori<anthony@codemonkey.ws>  wrote:
> >>On 03/19/2010 03:53 AM, Ingo Molnar wrote:
> >>>* Avi Kivity<avi@redhat.com>   wrote:
> >>>>>There were two negative reactions immediately, both showed a fundamental
> >>>>>server versus desktop bias:
> >>>>>
> >>>>>  - you did not accept that the most important usecase is when there is a
> >>>>>    single guest running.
> >>>>Well, it isn't.
> >>>Erm, my usability points are _doubly_ true when there are multiple guests ...
> >>>
> >>>The inconvenience of having to type:
> >>>
> >>>   perf kvm --host --guest --guestkallsyms=/home/ymzhang/guest/kallsyms \
> >>>   --guestmodules=/home/ymzhang/guest/modules top
> >>>
> >>>is very obvious even with a single guest. Now multiply that by more guests ...
> >>If you want to improve this, you need to do the following:
> >>
> >>1) Add a userspace daemon that uses vmchannel that runs in the guest and can
> >>    fetch kallsyms and arbitrary modules.  If that daemon lives in
> >>    tools/perf, that's fine.
> >
> > Adding any new daemon to an existing guest is a deployment and usability 
> > nightmare.
>
> Absolutely. In most cases it is not desirable, and you'll find that in a lot 
> of cases it is not even possible - for non-technical reasons.
>
> One of the main benefits of virtualization is the ability to manage and see 
> things from the outside.
>
> > The basic rule of good instrumentation is to be transparent. The moment we 
> > have to modify the user-space of a guest just to monitor it, the purpose 
> > of transparent instrumentation is defeated.
>
> Not to mention Heisenbugs and interference.

Correct.

Frankly, i was surprised (and taken slightly aback) by both Avi and Anthony 
suggesting such a clearly inferior "add a daemon to the guest space" solution. 
It's a usability and deployment non-starter.

Furthermore, allowing a guest to integrate/mount its files into the host VFS 
space (which was my suggestion) has many other uses and advantages as well, 
beyond the instrumentation/symbol-lookup purpose.

So can we please have some resolution here and move on: the KVM maintainers 
should either suggest a different transparent approach, or should retract the 
NAK for the solution we suggested.

We very much want to make progress and want to write code, but obviously we 
cannot code against a maintainer NAK, nor can we code up an inferior solution 
either.

Thanks,

	Ingo

^ permalink raw reply	[flat|nested] 390+ messages in thread

* Re: [RFC] Unify KVM kernel-space and user-space code into a single project
  2010-03-21 19:17                                                                             ` Ingo Molnar
  2010-03-21 19:35                                                                               ` Antoine Martin
@ 2010-03-21 20:01                                                                               ` Avi Kivity
  2010-03-21 20:08                                                                                 ` Olivier Galibert
  2010-03-21 20:31                                                                                 ` Ingo Molnar
  2010-03-21 23:35                                                                               ` Anthony Liguori
  2 siblings, 2 replies; 390+ messages in thread
From: Avi Kivity @ 2010-03-21 20:01 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Anthony Liguori, Pekka Enberg, Zhang, Yanmin, Peter Zijlstra,
	Sheng Yang, linux-kernel, kvm, Marcelo Tosatti, Joerg Roedel,
	Jes Sorensen, Gleb Natapov, Zachary Amsden, ziteng.huang,
	Arnaldo Carvalho de Melo, Frédéric Weisbecker

On 03/21/2010 09:17 PM, Ingo Molnar wrote:
>
> Adding any new daemon to an existing guest is a deployment and usability
> nightmare.
>    

The logical conclusion of that is that everything should be built into 
the kernel.  Where a failure brings the system down or worse.  Where you 
have to bear the memory footprint whether you ever use the functionality 
or not.  Where to update the functionality you need to deploy a new 
kernel (possibly introducing unrelated bugs) and reboot.

If userspace daemons are such a deployment and usability nightmare, 
maybe we should fix that instead.

> The basic rule of good instrumentation is to be transparent. The moment we
> have to modify the user-space of a guest just to monitor it, the purpose of
> transparent instrumentation is defeated.
>    

You have to modify the guest anyway by deploying a new kernel.

> Please try to think with the heads of our users and developers and dont suggest
> some weird ivory-tower design that is totally impractical ...
>    

inetd.d style 'drop a listener config here and it will be executed on 
connection' should work.  The listener could come with the kernel 
package, though I don't think it's a good idea.  module-init-tools 
doesn't and people have survived somehow.
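
A toy sketch of such a drop-in listener (hypothetical: the service name
and wiring are made up; the only real property relied on is that
inetd/xinetd hand the accepted connection to the program on
stdin/stdout, so serving the guest's symbols reduces to copying a file):

    #!/usr/bin/env python
    # Illustrative inetd-style guest helper: when inetd/xinetd accepts a
    # connection for the (made-up) service and execs this program, the
    # socket is already on stdin/stdout, so we just stream the symbols.
    import sys

    def main():
        with open("/proc/kallsyms") as f:
            sys.stdout.write(f.read())

    if __name__ == "__main__":
        main()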

> And no, you have to code none of this, we'll do all the coding. The only thing
> we are asking is for you to not stand in the way of good usability ...
>    

Thanks.

-- 
Do not meddle in the internals of kernels, for they are subtle and quick to panic.


^ permalink raw reply	[flat|nested] 390+ messages in thread

* Re: [RFC] Unify KVM kernel-space and user-space code into a single project
  2010-03-21 20:01                                                                               ` Avi Kivity
@ 2010-03-21 20:08                                                                                 ` Olivier Galibert
  2010-03-21 20:11                                                                                   ` Avi Kivity
  2010-03-21 20:11                                                                                   ` Avi Kivity
  2010-03-21 20:31                                                                                 ` Ingo Molnar
  1 sibling, 2 replies; 390+ messages in thread
From: Olivier Galibert @ 2010-03-21 20:08 UTC (permalink / raw)
  To: Avi Kivity
  Cc: Ingo Molnar, Anthony Liguori, Pekka Enberg, Zhang, Yanmin,
	Peter Zijlstra, Sheng Yang, linux-kernel, kvm, Marcelo Tosatti,
	Joerg Roedel, Jes Sorensen, Gleb Natapov, Zachary Amsden,
	ziteng.huang, Arnaldo Carvalho de Melo, Frédéric Weisbecker

On Sun, Mar 21, 2010 at 10:01:51PM +0200, Avi Kivity wrote:
> On 03/21/2010 09:17 PM, Ingo Molnar wrote:
> >
> >Adding any new daemon to an existing guest is a deployment and usability
> >nightmare.
> >   
> 
> The logical conclusion of that is that everything should be built into 
> the kernel.  Where a failure brings the system down or worse.  Where you 
> have to bear the memory footprint whether you ever use the functionality 
> or not.  Where to update the functionality you need to deploy a new 
> kernel (possibly introducing unrelated bugs) and reboot.
> 
> If userspace daemons are such a deployment and usability nightmare, 
> maybe we should fix that instead.

Which userspace?  Deploying *anything* in the guest can be a
nightmare, including paravirt drivers, if you don't have virtual
hardware natively supported by the guest OS to fall back on.  Deploying
things in the host OTOH is business as usual.

And you're smart enough to know that.

  OG.

^ permalink raw reply	[flat|nested] 390+ messages in thread

* Re: [RFC] Unify KVM kernel-space and user-space code into a single project
  2010-03-21 19:59                                                                                 ` Ingo Molnar
@ 2010-03-21 20:09                                                                                   ` Avi Kivity
  2010-03-21 21:00                                                                                     ` Ingo Molnar
  0 siblings, 1 reply; 390+ messages in thread
From: Avi Kivity @ 2010-03-21 20:09 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Antoine Martin, Anthony Liguori, Pekka Enberg, Zhang, Yanmin,
	Peter Zijlstra, Sheng Yang, linux-kernel, kvm, Marcelo Tosatti,
	Joerg Roedel, Jes Sorensen, Gleb Natapov, Zachary Amsden,
	ziteng.huang, Arnaldo Carvalho de Melo, Frédéric Weisbecker

On 03/21/2010 09:59 PM, Ingo Molnar wrote:
>
> Frankly, i was surprised (and taken slightly aback) by both Avi and Anthony
> suggesting such a clearly inferior "add a daemon to the guest space" solution.
> It's a usability and deployment non-starter.
>    

It's only clearly inferior if you ignore every consideration against 
it.  It's definitely not a deployment non-starter, see the tons of 
daemons that come with any Linux system.  The basic ones are installed 
and enabled automatically during system installation.

> Furthermore, allowing a guest to integrate/mount its files into the host VFS
> space (which was my suggestion) has many other uses and advantages as well,
> beyond the instrumentation/symbol-lookup purpose.
>    

Yes.  I'm just not sure about the auto-enabling part.

> So can we please have some resolution here and move on: the KVM maintainers
> should either suggest a different transparent approach, or should retract the
> NAK for the solution we suggested.
>    

So long as you define 'transparent' as in 'only the guest kernel is 
involved' or even 'only the guest and host kernels are involved' we 
aren't going to make a lot of progress.  I oppose shoving random bits of 
functionality into the kernel, especially things that are in daily use.  
While we developers do and will use profiling extensively, it doesn't 
need to sit in every guest's non-swappable .text.

> We very much want to make progress and want to write code, but obviously we
> cannot code against a maintainer NAK, nor can we code up an inferior solution
> either.
>    

You haven't heard any NAKs, only objections.  If we discuss things 
perhaps we can achieve something that works for everyone.  If we keep 
turning the flames higher that's unlikely.

-- 
Do not meddle in the internals of kernels, for they are subtle and quick to panic.


^ permalink raw reply	[flat|nested] 390+ messages in thread

* Re: [RFC] Unify KVM kernel-space and user-space code into a single project
  2010-03-21 20:08                                                                                 ` Olivier Galibert
@ 2010-03-21 20:11                                                                                   ` Avi Kivity
  2010-03-21 20:18                                                                                     ` Antoine Martin
  2010-03-21 20:37                                                                                     ` Ingo Molnar
  2010-03-21 20:11                                                                                   ` Avi Kivity
  1 sibling, 2 replies; 390+ messages in thread
From: Avi Kivity @ 2010-03-21 20:11 UTC (permalink / raw)
  To: Olivier Galibert, Ingo Molnar, Anthony Liguori, Pekka Enberg,
	Zhang, Yanmin, Peter Zijlstra, Sheng Yang, linux-kernel, kvm,
	Marcelo Tosatti, Joerg Roedel, Jes Sorensen, Gleb Natapov,
	Zachary Amsden, ziteng.huang, Arnaldo Carvalho de Melo,
	Frédéric Weisbecker

On 03/21/2010 10:08 PM, Olivier Galibert wrote:
> On Sun, Mar 21, 2010 at 10:01:51PM +0200, Avi Kivity wrote:
>    
>> On 03/21/2010 09:17 PM, Ingo Molnar wrote:
>>      
>>> Adding any new daemon to an existing guest is a deployment and usability
>>> nightmare.
>>>
>>>        
>> The logical conclusion of that is that everything should be built into
>> the kernel.  Where a failure brings the system down or worse.  Where you
>> have to bear the memory footprint whether you ever use the functionality
>> or not.  Where to update the functionality you need to deploy a new
>> kernel (possibly introducing unrelated bugs) and reboot.
>>
>> If userspace daemons are such a deployment and usability nightmare,
>> maybe we should fix that instead.
>>      
> Which userspace?  Deploying *anything* in the guest can be a
> nightmare, including paravirt drivers, if you don't have virtual
> hardware natively supported by the guest OS to fall back on.

That includes the guest kernel.  If you can deploy a new kernel in the 
guest, presumably you can deploy a userspace package.

> Deploying things in the
> host OTOH is business as usual.
>    

True.

> And you're smart enough to know that.
>    

Thanks.

-- 
Do not meddle in the internals of kernels, for they are subtle and quick to panic.


^ permalink raw reply	[flat|nested] 390+ messages in thread

* Re: [RFC] Unify KVM kernel-space and user-space code into a single project
  2010-03-21 20:11                                                                                   ` Avi Kivity
@ 2010-03-21 20:18                                                                                     ` Antoine Martin
  2010-03-21 20:24                                                                                       ` Avi Kivity
  2010-03-21 20:37                                                                                     ` Ingo Molnar
  1 sibling, 1 reply; 390+ messages in thread
From: Antoine Martin @ 2010-03-21 20:18 UTC (permalink / raw)
  To: Avi Kivity
  Cc: Olivier Galibert, Ingo Molnar, Anthony Liguori, Pekka Enberg,
	Zhang, Yanmin, Peter Zijlstra, Sheng Yang, linux-kernel, kvm,
	Marcelo Tosatti, Joerg Roedel, Jes Sorensen, Gleb Natapov,
	Zachary Amsden, ziteng.huang, Arnaldo Carvalho de Melo,
	Frédéric Weisbecker

On 03/22/2010 03:11 AM, Avi Kivity wrote:
> On 03/21/2010 10:08 PM, Olivier Galibert wrote:
>> On Sun, Mar 21, 2010 at 10:01:51PM +0200, Avi Kivity wrote:
>>> On 03/21/2010 09:17 PM, Ingo Molnar wrote:
>>>> Adding any new daemon to an existing guest is a deployment and 
>>>> usability
>>>> nightmare.
>>>>
>>> The logical conclusion of that is that everything should be built into
>>> the kernel.  Where a failure brings the system down or worse.  Where 
>>> you
>>> have to bear the memory footprint whether you ever use the 
>>> functionality
>>> or not.  Where to update the functionality you need to deploy a new
>>> kernel (possibly introducing unrelated bugs) and reboot.
>>>
>>> If userspace daemons are such a deployment and usability nightmare,
>>> maybe we should fix that instead.
>> Which userspace?  Deploying *anything* in the guest can be a
>> nightmare, including paravirt drivers, if you don't have virtual
>> hardware natively supported by the guest OS to fall back on.
>
> That includes the guest kernel.  If you can deploy a new kernel in the 
> guest, presumably you can deploy a userspace package.
That's not always true.
The host admin can control the guest kernel via "kvm -kernel" easily 
enough, but he may or may not have access to the disk that is used in 
the guest. (think encrypted disks, service agreements, etc)

Antoine
>> Deploying things in the
>> host OTOH is business as usual.
>
> True.
>
>> And you're smart enough to know that.
>
> Thanks.
>


^ permalink raw reply	[flat|nested] 390+ messages in thread

* Re: [RFC] Unify KVM kernel-space and user-space code into a single project
  2010-03-21 19:06                                                                             ` Ingo Molnar
@ 2010-03-21 20:22                                                                               ` Avi Kivity
  2010-03-21 20:55                                                                                 ` Ingo Molnar
  0 siblings, 1 reply; 390+ messages in thread
From: Avi Kivity @ 2010-03-21 20:22 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Pekka Enberg, Anthony Liguori, Zhang, Yanmin, Peter Zijlstra,
	Sheng Yang, linux-kernel, kvm, Marcelo Tosatti, Joerg Roedel,
	Jes Sorensen, Gleb Natapov, Zachary Amsden, ziteng.huang,
	Arnaldo Carvalho de Melo, Frédéric Weisbecker

On 03/21/2010 09:06 PM, Ingo Molnar wrote:
> * Avi Kivity<avi@redhat.com>  wrote:
>
>    
>>>> [...] Second, from my point of view all contributors are volunteers
>>>> (perhaps their employer volunteered them, but there's no difference from
>>>> my perspective). Asking them to repaint my apartment as a condition to
>>>> get a patch applied is abuse.  If a patch is good, it gets applied.
>>>>          
>>> This is one of the weirdest arguments i've seen in this thread. Almost all
>>> the time do we make contributions conditional on the general shape of the
>>> project. Developers dont get to do just the fun stuff.
>>>        
>> So, do you think a reply to a patch along the lines of
>>
>>    NAK.  Improving scalability is pointless while we don't have a decent GUI.
>> I'll review your RCU patches
>>    _after_ you've contributed a usable GUI.
>>
>> ?
>>      
> What does this have to do with RCU?
>    

The example was the rcu-ification of kvm, which took place a while ago.  
Sorry, it wasn't clear.

> I'm talking about KVM, which is a Linux kernel feature that is useless without
> a proper, KVM-specific app making use of it.
>
> RCU is a general kernel performance feature that works across the board. It
> helps KVM indirectly, and it helps many other kernel subsystems as well. It
> needs no user-space tool to be useful.
>    

Correct.  So should I tell someone who has sent a patch that rcu-ified 
kvm in order to scale it that I won't accept the patch unless they do 
some userspace usability work - say, implementing an eject button? 
That's what I understood you to mean.

> KVM on the other hand is useless without a user-space tool.
>
> [ Theoretically you might have a fair point if it were a critical feature of
>    RCU for it to have a GUI, and if the main tool that made use of it sucked.
>    But it isnt and you should know that. ]
>
> Had you suggested the following 'NAK', applied to a different, relevant
> subsystem:
>
>    |   NAK.  Improving scalability is pointless while we don't have a usable
>    | tool.  I'll review you perf patches _after_ you've contributed a usable
>    | tool.
>    

That might hold, but the tool is usable, at least for some people - it 
runs in production.  The people running it won't benefit from an eject 
button or any usability improvement, since they run it through a 
centralized management tool that hides everything.  They will benefit 
from the scalability patches.  Should I still make those patches 
conditional on usability work that is of no interest to the submitter?

>    
>>> This is a basic quid pro quo: new features introduce risks and create
>>> additional workload not just to the originating developer but on the rest
>>> of the community as well. You should check how Linus has pulled new
>>> features in the past 15 years: he very much requires the existing code to
>>> first be top-notch before he accepts new features for a given area of
>>> functionality.
>>>        
>> For a given area, yes. [...]
>>      
> That is my precise point.
>
> KVM is a specific subsystem or "area" that makes no sense without the
> user-space tooling it relates to. You seem to argue that you have no 'right'
> to insist on good quality of that tooling - and IMO you are fundamentally
> wrong with that.
>    

kvm contains many sub-areas.  I'm not going to tie unrelated things 
together like the GUI and scalability, configuration file format and 
emulator correctness, nested virtualization and qcow2 asynchrony, or 
other crazy combinations.  People either leave en masse or become 
frustrated if they are forced into that.  I do reject patches touching a 
sub-area that I think needs to be done in userspace, for example.

That's not to say kvm development is random.  We have a weekly 
conference call where regular contributors and maintainers of both qemu 
and kvm participate and where we decide where to focus.  Sadly the issue 
of a qemu GUI is not raised often.  Perhaps you can participate and 
voice your concerns.

-- 
Do not meddle in the internals of kernels, for they are subtle and quick to panic.


^ permalink raw reply	[flat|nested] 390+ messages in thread

* Re: [RFC] Unify KVM kernel-space and user-space code into a single project
  2010-03-21 20:18                                                                                     ` Antoine Martin
@ 2010-03-21 20:24                                                                                       ` Avi Kivity
  2010-03-21 20:31                                                                                         ` Antoine Martin
  0 siblings, 1 reply; 390+ messages in thread
From: Avi Kivity @ 2010-03-21 20:24 UTC (permalink / raw)
  To: Antoine Martin
  Cc: Olivier Galibert, Ingo Molnar, Anthony Liguori, Pekka Enberg,
	Zhang, Yanmin, Peter Zijlstra, Sheng Yang, linux-kernel, kvm,
	Marcelo Tosatti, oerg Roedel, Jes Sorensen, Gleb Natapov,
	Zachary Amsden, ziteng.huang, Arnaldo Carvalho de Melo,
	Frédéric Weisbecker

On 03/21/2010 10:18 PM, Antoine Martin wrote:
>> That includes the guest kernel.  If you can deploy a new kernel in 
>> the guest, presumably you can deploy a userspace package.
>
> That's not always true.
> The host admin can control the guest kernel via "kvm -kernel" easily 
> enough, but he may or may not have access to the disk that is used in 
> the guest. (think encrypted disks, service agreements, etc)

There is a matching -initrd argument that you can use to launch a 
daemon.  I believe that -kernel use will be rare, though.  It's a lot 
easier to keep everything in one filesystem.
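
For illustration, a minimal sketch of that kind of external-kernel launch; all 
paths, the image name and the idea of a helper daemon packed into the initrd 
are hypothetical:

  # Boot a guest with a host-supplied kernel and initrd; the initrd could
  # carry a small helper daemon started from its init script.
  qemu-kvm \
      -kernel /srv/guests/vmlinuz-2.6.33 \
      -initrd /srv/guests/initrd-with-helper.img \
      -append "root=/dev/vda ro console=ttyS0" \
      -drive file=/srv/guests/guest.img,if=virtio \
      -m 1024 -smp 2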

-- 
Do not meddle in the internals of kernels, for they are subtle and quick to panic.


^ permalink raw reply	[flat|nested] 390+ messages in thread

* Re: [RFC] Unify KVM kernel-space and user-space code into a single project
  2010-03-21 20:01                                                                               ` Avi Kivity
  2010-03-21 20:08                                                                                 ` Olivier Galibert
@ 2010-03-21 20:31                                                                                 ` Ingo Molnar
  2010-03-21 21:30                                                                                   ` Avi Kivity
  2010-03-22 11:10                                                                                   ` oerg Roedel
  1 sibling, 2 replies; 390+ messages in thread
From: Ingo Molnar @ 2010-03-21 20:31 UTC (permalink / raw)
  To: Avi Kivity
  Cc: Anthony Liguori, Pekka Enberg, Zhang, Yanmin, Peter Zijlstra,
	Sheng Yang, linux-kernel, kvm, Marcelo Tosatti, oerg Roedel,
	Jes Sorensen, Gleb Natapov, Zachary Amsden, ziteng.huang,
	Arnaldo Carvalho de Melo, Frédéric Weisbecker


* Avi Kivity <avi@redhat.com> wrote:

> On 03/21/2010 09:17 PM, Ingo Molnar wrote:
> >
> > Adding any new daemon to an existing guest is a deployment and usability
> > nightmare.
> 
> The logical conclusion of that is that everything should be built into the 
> kernel. [...]

Only if you apply it as a totalitarian rule.

Furthermore, the logical conclusion of _your_ line of argument (applied in a 
totalitarian manner) is that 'nothing should be built into the kernel'.

I.e. you are arguing for microkernel Linux, while you see me as arguing for a 
monolithic kernel.

Reality is that we are somewhere in between; we are neither black nor white:
it's shades of grey.

If we want to do a good job with all this then we observe subsystems, see 
how they relate to the physical world and decide how to shape them. We 
identify long-term changes and re-design modularization boundaries in 
hindsight - when we got them wrong initially. We don't try to rationalize the 
status quo.

Let's see one example of that thought process in action: Oprofile.

We saw that the modularization of oprofile was a total nightmare: a separate 
kernel-space and a separate user-space component, which were in constant 
version friction. The ABI between them was stifling: it was hard to change it 
(you needed to trickle any change through the tool as well, which was on a 
different release schedule, etc. etc.).

The result was sucky usability that never went beyond some basic 'you can do 
profiling' threshold. The subsystem worked well within that design box, and it 
was worked on by highly competent people - but it was still far, far away from 
the potential it could have achieved.

So we observed those problems and decided to do something about it:

 - We unified the two parts into a single maintenance domain. There's
   the kernel-side in kernel/perf_event.c and arch/*/*/perf_event.c,
   plus the user-side in tools/perf/. The two are connected by a very
   flexible, forwards and backwards compatible ABI.

 - We moved much more code into the kernel, realizing that transparent
   and robust instrumentation should be offered instead of punting
   abstractions into user-space (which is in a disadvantaged position
   to implement system-wide abstractions).

 - We created a no-bullsh*t approach to usability. perf is by no means 
   perfect, but it's written by developers for developers and if you report a 
   bug to us we'll act on it before anything else. Furthermore the kernel
   developers do the user-space coding as well, so there's no chinese
   wall separating them. Kernel-space becomes aware of the intricacies of
   user-space and user-space developers become aware of the difficulties of
   kernel-space as well. It's a good mix in our experience.

The thing is (and i doubt you are surprised that i say that), i see a similar 
situation with KVM. The basic parameters are comparable to Oprofile: it has a 
kernel-space component and a KVM-specific user-space. By all practical means 
the two are one and the same, but are maintained as different projects.

I have followed KVM since its inception with great interest. I saw its good 
initial design, i tried it early on and even wrote various patches for it. So 
i care more about KVM than a random observer would, but this preference and 
passion for KVM's good technical sides does not cloud my judgement when it 
comes to its weaknesses.

In fact the weaknesses are far more important to identify and express 
publicly, so i tend to concentrate on them. Don't take this as me blasting KVM; 
we both know the many good aspects of KVM.

So, as i explained earlier in greater detail, the modularization of KVM into 
a separate kernel-space and user-space component is one of its worst current 
weaknesses, and it has become the main stifling force in the way of a better 
KVM experience for users.

That, IMO, is the 'weakest link' of KVM today: no matter how well the rest 
of KVM gets improved, those nice bits all get unfairly ignored when the user 
cannot have a good, usable desktop experience and concludes that KVM is 
crappy.

I think you should think outside the initial design box you created 4 years 
ago: consider iterating the model, and consider the alternative i suggested - 
move (or create) KVM tooling to tools/kvm/ and treat it as a single project 
from there on.

Thanks,

	Ingo

^ permalink raw reply	[flat|nested] 390+ messages in thread

* Re: [RFC] Unify KVM kernel-space and user-space code into a single project
  2010-03-21 20:24                                                                                       ` Avi Kivity
@ 2010-03-21 20:31                                                                                         ` Antoine Martin
  2010-03-21 21:03                                                                                           ` Avi Kivity
  0 siblings, 1 reply; 390+ messages in thread
From: Antoine Martin @ 2010-03-21 20:31 UTC (permalink / raw)
  To: Avi Kivity
  Cc: Olivier Galibert, Ingo Molnar, Anthony Liguori, Pekka Enberg,
	Zhang, Yanmin, Peter Zijlstra, Sheng Yang, linux-kernel, kvm,
	Marcelo Tosatti, oerg Roedel, Jes Sorensen, Gleb Natapov,
	Zachary Amsden, ziteng.huang, Arnaldo Carvalho de Melo,
	Frédéric Weisbecker

On 03/22/2010 03:24 AM, Avi Kivity wrote:
> On 03/21/2010 10:18 PM, Antoine Martin wrote:
>>> That includes the guest kernel.  If you can deploy a new kernel in 
>>> the guest, presumably you can deploy a userspace package.
>>
>> That's not always true.
>> The host admin can control the guest kernel via "kvm -kernel" easily 
>> enough, but he may or may not have access to the disk that is used in 
>> the guest. (think encrypted disks, service agreements, etc)
>
> There is a matching -initrd argument that you can use to launch a daemon.
I thought this discussion was about making it easy to deploy... and 
generating a custom initrd isn't easy by any means, and it requires 
access to the guest filesystem (and its mkinitrd tools).
>   I believe that -kernel use will be rare, though.  It's a lot easier 
> to keep everything in one filesystem.
Well, for what it's worth, I rarely ever use anything else. My virtual 
disks are raw so I can loop mount them easily, and I can also switch my 
guest kernels from outside... without ever needing to mount those disks.
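
A rough sketch of that workflow, with made-up paths and a guest image that 
holds a bare filesystem (no partition table):

  # Inspect or update the raw guest image from the host, without booting it
  mount -o loop /var/lib/guests/web01.img /mnt/web01
  ls /mnt/web01/boot
  umount /mnt/web01

  # Switch the guest kernel from the outside: point -kernel at a new build
  qemu-kvm -kernel ~/builds/vmlinuz-2.6.34-rc1 \
           -append "root=/dev/vda ro" \
           -drive file=/var/lib/guests/web01.img,if=virtio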


^ permalink raw reply	[flat|nested] 390+ messages in thread

* Re: [RFC] Unify KVM kernel-space and user-space code into a single project
  2010-03-21 20:11                                                                                   ` Avi Kivity
  2010-03-21 20:18                                                                                     ` Antoine Martin
@ 2010-03-21 20:37                                                                                     ` Ingo Molnar
  2010-03-22  6:37                                                                                       ` Avi Kivity
  1 sibling, 1 reply; 390+ messages in thread
From: Ingo Molnar @ 2010-03-21 20:37 UTC (permalink / raw)
  To: Avi Kivity
  Cc: Olivier Galibert, Anthony Liguori, Pekka Enberg, Zhang, Yanmin,
	Peter Zijlstra, Sheng Yang, linux-kernel, kvm, Marcelo Tosatti,
	oerg Roedel, Jes Sorensen, Gleb Natapov, Zachary Amsden,
	ziteng.huang, Arnaldo Carvalho de Melo, Frédéric Weisbecker


* Avi Kivity <avi@redhat.com> wrote:

> On 03/21/2010 10:08 PM, Olivier Galibert wrote:
> >On Sun, Mar 21, 2010 at 10:01:51PM +0200, Avi Kivity wrote:
> >>On 03/21/2010 09:17 PM, Ingo Molnar wrote:
> >>>Adding any new daemon to an existing guest is a deployment and usability
> >>>nightmare.
> >>>
> >>The logical conclusion of that is that everything should be built into
> >>the kernel.  Where a failure brings the system down or worse.  Where you
> >>have to bear the memory footprint whether you ever use the functionality
> >>or not.  Where to update the functionality you need to deploy a new
> >>kernel (possibly introducing unrelated bugs) and reboot.
> >>
> >>If userspace daemons are such a deployment and usability nightmare,
> >>maybe we should fix that instead.
> >Which userspace?  Deploying *anything* in the guest can be a
> >nightmare, including paravirt drivers if you don't have a natively
> >supported in the OS virtual hardware backoff.
> 
> That includes the guest kernel.  If you can deploy a new kernel in the 
> guest, presumably you can deploy a userspace package.

Note that with perf we can instrument the guest with zero guest-kernel 
modifications as well.

We try to reduce the guest impact to a bare minimum, as the difficulties in 
deployment are a function of the cross-section surface to the guest.

Also, note that the kernel is special with regard to instrumentation: since 
this is the kernel project, we are doing kernel-space changes, as we are doing 
them _anyway_. So adding symbol resolution capabilities would be a minimal 
addition to that - while adding a whole new guest package for the daemon would 
significantly increase the cross-section surface.

	Ingo

^ permalink raw reply	[flat|nested] 390+ messages in thread

* Re: [RFC] Unify KVM kernel-space and user-space code into a single project
  2010-03-21 20:22                                                                               ` Avi Kivity
@ 2010-03-21 20:55                                                                                 ` Ingo Molnar
  2010-03-21 21:42                                                                                   ` Avi Kivity
  0 siblings, 1 reply; 390+ messages in thread
From: Ingo Molnar @ 2010-03-21 20:55 UTC (permalink / raw)
  To: Avi Kivity
  Cc: Pekka Enberg, Anthony Liguori, Zhang, Yanmin, Peter Zijlstra,
	Sheng Yang, linux-kernel, kvm, Marcelo Tosatti, oerg Roedel,
	Jes Sorensen, Gleb Natapov, Zachary Amsden, ziteng.huang,
	Arnaldo Carvalho de Melo, Frédéric Weisbecker


* Avi Kivity <avi@redhat.com> wrote:

> On 03/21/2010 09:06 PM, Ingo Molnar wrote:
> >* Avi Kivity<avi@redhat.com>  wrote:
> >
> >>>>[...] Second, from my point of view all contributors are volunteers
> >>>>(perhaps their employer volunteered them, but there's no difference from
> >>>>my perspective). Asking them to repaint my apartment as a condition to
> >>>>get a patch applied is abuse.  If a patch is good, it gets applied.
> >>>This is one of the weirdest arguments i've seen in this thread. Almost all
> >>>the time do we make contributions conditional on the general shape of the
> >>>project. Developers dont get to do just the fun stuff.
> >>So, do you think a reply to a patch along the lines of
> >>
> >>   NAK.  Improving scalability is pointless while we don't have a decent GUI.
> >>I'll review you RCU patches
> >>   _after_ you've contributed a usable GUI.
> >>
> >>?
> >What does this have to do with RCU?
> 
> The example was rcuifying kvm which took place a bit ago.  Sorry, it wasn't 
> clear.
> 
> > I'm talking about KVM, which is a Linux kernel feature that is useless 
> > without a proper, KVM-specific app making use of it.
> >
> > RCU is a general kernel performance feature that works across the board. 
> > It helps KVM indirectly, and it helps many other kernel subsystems as 
> > well. It needs no user-space tool to be useful.
> 
> Correct.  So should I tell someone that has sent a patch that rcu-ified kvm 
> in order to scale it, that I won't accept the patch unless they do some 
> usability userspace work?  say, implementing an eject button. That's what I 
> understood you to mean.

Of course you could say the following:

  ' Thanks, I'll mark this for v2.6.36 integration. Note that we are not
    able to add this to the v2.6.35 kernel queue anymore as the ongoing 
    usability work already takes up all of the project's maintainer and 
    testing bandwidth. If you want the feature to be merged sooner than that 
    then please help us cut down on the TODO and BUGS list that can be found 
    at XYZ. There's quite a few low hanging fruits there. '

Admittedly, this RCU example is the 'worst' possible example, as it's a pure 
speedup change with no functional effect.

Consider the _other_ examples that are a lot more clear:

   ' If you expose paravirt spinlocks via KVM please also make sure the KVM
     tooling can make use of them, has an option to configure them, and 
     that sufficient efficiency statistics are displayed in the tool for 
     admins to monitor.'

   ' If you create this new paravirt driver then please also make sure it can
     be configured in the tooling. '

   ' Please also add a testcase for this bug to tools/kvm/testcases/ so we dont
     repeat this same mistake in the future. '

I'd say most of the high-level feature work in KVM has tooling impact.

And note the important argument that the 'eject button' thing would not occur 
naturally in a project that is well designed and has a good quality balance. 
It would only occur in the transitional period when a big lump of lower-quality 
code is unified with higher-quality code. Then indeed a lot of pressure gets 
created on the people working on the high-quality portion to go over and fix 
the low-quality portion.

Which, btw, is an unconditionally good thing ...

But even an RCU speedup can be fairly linked/ordered to more pressing needs of 
a project.

Really, the unification of two tightly related pieces of code has numerous 
clear advantages. Please give it some thought before rejecting it.

Thanks,

	Ingo

^ permalink raw reply	[flat|nested] 390+ messages in thread

* Re: [RFC] Unify KVM kernel-space and user-space code into a single project
  2010-03-21 20:09                                                                                   ` Avi Kivity
@ 2010-03-21 21:00                                                                                     ` Ingo Molnar
  2010-03-21 21:44                                                                                       ` Avi Kivity
  2010-03-21 23:43                                                                                       ` Anthony Liguori
  0 siblings, 2 replies; 390+ messages in thread
From: Ingo Molnar @ 2010-03-21 21:00 UTC (permalink / raw)
  To: Avi Kivity
  Cc: Antoine Martin, Anthony Liguori, Pekka Enberg, Zhang, Yanmin,
	Peter Zijlstra, Sheng Yang, linux-kernel, kvm, Marcelo Tosatti,
	oerg Roedel, Jes Sorensen, Gleb Natapov, Zachary Amsden,
	ziteng.huang, Arnaldo Carvalho de Melo, Frédéric Weisbecker


* Avi Kivity <avi@redhat.com> wrote:

> On 03/21/2010 09:59 PM, Ingo Molnar wrote:
> >
> >Frankly, i was surprised (and taken slightly off base) by both Avi and Anthony
> >suggesting such a clearly inferior "add a demon to the guest space" solution.
> >It's a usability and deployment non-starter.
> 
> It's only clearly inferior if you ignore every consideration against it.  
> It's definitely not a deployment non-starter, see the tons of daemons that 
> come with any Linux system. [...]

Avi, please dont put arguments into my mouth that i never made.

My (clearly expressed) argument was that:

    _a new guest-side daemon is a transparent instrumentation non-starter_

What is so hard to understand about that simple concept? Instrumentation is 
good if it's as transparent as possible.

Of course lots of other features can be done via a new user-space package ...

Thanks,

	Ingo

^ permalink raw reply	[flat|nested] 390+ messages in thread

* Re: [RFC] Unify KVM kernel-space and user-space code into a single project
  2010-03-21 20:31                                                                                         ` Antoine Martin
@ 2010-03-21 21:03                                                                                           ` Avi Kivity
  2010-03-21 21:20                                                                                             ` Ingo Molnar
  2010-03-22 12:05                                                                                             ` Antoine Martin
  0 siblings, 2 replies; 390+ messages in thread
From: Avi Kivity @ 2010-03-21 21:03 UTC (permalink / raw)
  To: Antoine Martin
  Cc: Olivier Galibert, Ingo Molnar, Anthony Liguori, Pekka Enberg,
	Zhang, Yanmin, Peter Zijlstra, Sheng Yang, linux-kernel, kvm,
	Marcelo Tosatti, oerg Roedel, Jes Sorensen, Gleb Natapov,
	Zachary Amsden, ziteng.huang, Arnaldo Carvalho de Melo,
	Frédéric Weisbecker

On 03/21/2010 10:31 PM, Antoine Martin wrote:
> On 03/22/2010 03:24 AM, Avi Kivity wrote:
>> On 03/21/2010 10:18 PM, Antoine Martin wrote:
>>>> That includes the guest kernel.  If you can deploy a new kernel in 
>>>> the guest, presumably you can deploy a userspace package.
>>>
>>> That's not always true.
>>> The host admin can control the guest kernel via "kvm -kernel" easily 
>>> enough, but he may or may not have access to the disk that is used 
>>> in the guest. (think encrypted disks, service agreements, etc)
>>
>> There is a matching -initrd argument that you can use to launch a 
>> daemon.
> I thought this discussion was about making it easy to deploy... and 
> generating a custom initrd isn't easy by any means, and it requires 
> access to the guest filesystem (and its mkinitrd tools).

That's true.  You need to run mkinitrd anyway, though, unless your guest 
is non-modular and non-lvm.
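
For reference, that step is a one-liner inside the guest (or chroot'ed into its 
loop-mounted root); the kernel version here is made up and the exact tool 
varies by distribution:

  mkinitrd /boot/initrd-2.6.33.1.img 2.6.33.1        # classic mkinitrd
  dracut   /boot/initramfs-2.6.33.1.img 2.6.33.1     # newer distributions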

>>   I believe that -kernel use will be rare, though.  It's a lot easier 
>> to keep everything in one filesystem.
> Well, for what it's worth, I rarely ever use anything else. My virtual 
> disks are raw so I can loop mount them easily, and I can also switch 
> my guest kernels from outside... without ever needing to mount those 
> disks.

Curious, what do you use them for?

btw, if you build your kernel outside the guest, then you already have 
access to all its symbols, without needing anything further.

-- 
Do not meddle in the internals of kernels, for they are subtle and quick to panic.


^ permalink raw reply	[flat|nested] 390+ messages in thread

* Re: [RFC] Unify KVM kernel-space and user-space code into a single project
  2010-03-21 21:03                                                                                           ` Avi Kivity
@ 2010-03-21 21:20                                                                                             ` Ingo Molnar
  2010-03-22  6:35                                                                                               ` Avi Kivity
  2010-03-22  6:59                                                                                               ` Zhang, Yanmin
  2010-03-22 12:05                                                                                             ` Antoine Martin
  1 sibling, 2 replies; 390+ messages in thread
From: Ingo Molnar @ 2010-03-21 21:20 UTC (permalink / raw)
  To: Avi Kivity
  Cc: Antoine Martin, Olivier Galibert, Anthony Liguori, Pekka Enberg,
	Zhang, Yanmin, Peter Zijlstra, Sheng Yang, linux-kernel, kvm,
	Marcelo Tosatti, oerg Roedel, Jes Sorensen, Gleb Natapov,
	Zachary Amsden, ziteng.huang, Arnaldo Carvalho de Melo,
	Frédéric Weisbecker


* Avi Kivity <avi@redhat.com> wrote:

> > Well, for what it's worth, I rarely ever use anything else. My virtual 
> > disks are raw so I can loop mount them easily, and I can also switch my 
> > guest kernels from outside... without ever needing to mount those disks.
> 
> Curious, what do you use them for?
> 
> btw, if you build your kernel outside the guest, then you already have 
> access to all its symbols, without needing anything further.

There are two errors in your argument:

1) you are assuming that it's only about kernel symbols

Look at this 'perf report' output:

# Samples: 7127509216
#
# Overhead     Command                  Shared Object  Symbol
# ........  ..........  .............................  ......
#
    19.14%         git  git                            [.] lookup_object
    15.16%        perf  git                            [.] lookup_object
     4.74%        perf  libz.so.1.2.3                  [.] inflate
     4.52%         git  libz.so.1.2.3                  [.] inflate
     4.21%        perf  libz.so.1.2.3                  [.] inflate_table
     3.94%         git  libz.so.1.2.3                  [.] inflate_table
     3.29%         git  git                            [.] find_pack_entry_one
     3.24%         git  libz.so.1.2.3                  [.] inflate_fast
     2.96%        perf  libz.so.1.2.3                  [.] inflate_fast
     2.96%         git  git                            [.] decode_tree_entry
     2.80%        perf  libc-2.11.90.so                [.] __strlen_sse42
     2.56%         git  libc-2.11.90.so                [.] __strlen_sse42
     1.98%        perf  libc-2.11.90.so                [.] __GI_memcpy
     1.71%        perf  git                            [.] decode_tree_entry
     1.53%         git  libc-2.11.90.so                [.] __GI_memcpy
     1.48%         git  git                            [.] lookup_blob
     1.30%         git  git                            [.] process_tree
     1.30%        perf  git                            [.] process_tree
     0.90%        perf  git                            [.] tree_entry
     0.82%        perf  git                            [.] lookup_blob
     0.78%         git  [kernel.kallsyms]              [k] kstat_irqs_cpu

Kernel symbols are only a small portion of the symbols (a single line in this 
case).

To get to those other symbols we have to read the ELF symbols of those 
binaries in the guest filesystem, in the post-processing/reporting phase. This 
is both complex to do and relatively slow, so we don't want to (and cannot) do 
this at sample time from IRQ context or NMI context ...
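
To make that concrete, the offline part boils down to reading the symbol tables 
of DSOs found in the guest image; the mount point and exact paths are 
hypothetical, the binaries are the ones that show up in the profile above:

  GUEST_ROOT=/mnt/guest-fs      # guest filesystem, mounted read-only on the host
  # user-space symbols come from the guest's own ELF files, not from the kernel
  nm -C --defined-only $GUEST_ROOT/usr/bin/git           | head
  objdump -T           $GUEST_ROOT/lib64/libz.so.1.2.3   | grep inflate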

Also, many aspects of reporting are interactive so it's done lazily or 
on-demand. So we need ready access to the guest filesystem - for those guests 
which decide to integrate with the host for this.

2) the 'SystemTap mistake'

You are assuming that the kernel's symbols were saved properly when it was 
built and are easily discoverable. In reality those symbols can be erased 
by a make clean, can be modified by a new build, can be misplaced and can 
generally be hard to find because each distro puts them in a different 
installation path.
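
Concretely, the 'same' kernel's symbols may live in any of the following 
places, or in none of them, depending on the distribution and on which packages 
happen to be installed (typical paths, not a guarantee):

  head /proc/kallsyms                                    # the running kernel, always there
  ls -l /boot/System.map-$(uname -r)                     # often present, sometimes stale
  ls -l /usr/lib/debug/lib/modules/$(uname -r)/vmlinux   # debuginfo package, often not installed
  ls -l ~/linux-2.6/vmlinux                              # a build tree, gone after 'make clean'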

My 10+ years of experience with kernel instrumentation solutions is that 
kernel-driven, self-sufficient, robust, trustworthy, well-enumerated sources of 
information work far better in practice.

The thing is, in this thread i'm forced to repeat the same basic facts again 
and again. Could you _PLEASE_, pretty please, when it comes to instrumentation 
details, at least _read the mails_ of the guys who actually ... write and 
maintain Linux instrumentation code? This is getting ridiculous really.

Thanks,

	Ingo

^ permalink raw reply	[flat|nested] 390+ messages in thread

* Re: [RFC] Unify KVM kernel-space and user-space code into a single project
  2010-03-21 20:31                                                                                 ` Ingo Molnar
@ 2010-03-21 21:30                                                                                   ` Avi Kivity
  2010-03-21 21:52                                                                                     ` Ingo Molnar
  2010-03-22 11:10                                                                                   ` oerg Roedel
  1 sibling, 1 reply; 390+ messages in thread
From: Avi Kivity @ 2010-03-21 21:30 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Anthony Liguori, Pekka Enberg, Zhang, Yanmin, Peter Zijlstra,
	Sheng Yang, linux-kernel, kvm, Marcelo Tosatti, oerg Roedel,
	Jes Sorensen, Gleb Natapov, Zachary Amsden, ziteng.huang,
	Arnaldo Carvalho de Melo, Frédéric Weisbecker

On 03/21/2010 10:31 PM, Ingo Molnar wrote:
> * Avi Kivity<avi@redhat.com>  wrote:
>
>    
>> On 03/21/2010 09:17 PM, Ingo Molnar wrote:
>>      
>>> Adding any new daemon to an existing guest is a deployment and usability
>>> nightmare.
>>>        
>> The logical conclusion of that is that everything should be built into the
>> kernel. [...]
>>      
> Only if you apply it as a totalitarian rule.
>
> Furthermore, the logical conclusion of _your_ line of argument (applied in a
> totalitarian manner) is that 'nothing should be built into the kernel'.
>    

I'm certainly a minimalist, but that doesn't follow.  Things that 
require privileged access, or access to the page cache, or that can't be 
made to perform otherwise should certainly be in the kernel.  That's why 
I submitted kvm for inclusion in the first place.

If it's something that can work just as well in userspace but we can't 
be bothered to fix any 'deployment nightmares', then it shouldn't be 
in the kernel.  Examples include lvm2 and mdadm (which truly are 
'deployment nightmares' - you need to start them before you have access 
to your filesystem - yet they work somehow).

> I.e. you are arguing for microkernel Linux, while you see me as arguing for a
> monolithic kernel.
>    

No. I'm arguing for reducing bloat wherever possible.  Kernel code is 
more expensive than userspace code in every metric possible.

> Reality is that we are somewhere inbetween, we are neither black nor white:
> it's shades of grey.
>
> If we want to do a good job with all this then we observe subsystems, we see
> how they relate to the physical world and decide about how to shape them. We
> identify long-term changes and re-design modularization boundaries in
> hindsight - when we got them wrong initially. We dont try to rationalize the
> status-quo.
>    

I'm not for the status quo either - I'm for reducing the kernel code 
footprint wherever it doesn't impact performance or break clean interfaces.

> Lets see one example of that thought process in action: Oprofile.
>
> We saw that the modularization of oprofile was a total nightmare: a separate
> kernel-space and a separate user-space component, which was in constant
> version friction. The ABI between them was stiffling: it was hard to change it
> (you needed to trickle that through the tool as well which was on a different
> release schedule, etc.e tc.)
>
> The result was sucky usability that never went beyond some basic 'you can do
> profiling' threshold. The subsystem worked well within that design box, and it
> was worked on by highly competent people - but it was still far, far away from
> the potential it could have achieved.
>
> So we observed those problems and decided to do something about it:
>
>   - We unified the two parts into a single maintenance domain. There's
>     the kernel-side in kernel/perf_event.c and arch/*/*/perf_event.c,
>     plus the user-side in tools/perf/. The two are connected by a very
>     flexible, forwards and backwards compatible ABI.
>    

That's useful because perf is still small.  If it were a full-fledged 
350KLOC GUI, then most of the development would concentrate on the GUI 
and very little (relatively) would have to do with the kernel.

Qemu is in that state today.  Please, please look at the recent commits 
and check how many actually have anything to do with kvm, and how many 
with everything else.

>   - We moved much more code into the kernel, realizing that transparent
>     and robust instrumentation should be offered instead of punting
>     abstractions into user-space (which is in a disadvantaged position
>     to implement system-wide abstractions).
>    

No argument.

I have a similar experience with kvm.  The user/kernel break is at the 
cpu virtualization level - that is kvm is solely responsible for 
emulating a cpu and userspace is responsible for emulating devices.  An 
exception was made for the PIC/IOAPIC/PIT due to performance 
considerations - they are emulated in the kernel as well.

A common FAQ is why we do not emulate real-mode instructions in qemu.  
The answer is that the interface to kvm would be insane - it would 
emulate a partial cpu.  All other users of that interface would have to 
implement an emulator (there is also a practical argument - the qemu 
emulator does not implement atomics correctly wrt other threads).

>   - We created a no-bullsh*t approach to usability. perf is by no means
>     perfect, but it's written by developers for developers and if you report a
>     bug to us we'll act on it before anything else. Furthermore the kernel
>     developers do the user-space coding as well, so there's no chinese
>     wall separating them. Kernel-space becomes aware of the intricacies of
>     user-space and user-space developers become aware of the difficulties of
>     kernel-space as well. It's a good mix in our experience.
>    

Excellent.  However qemu is written by developers for their users, and 
their users are not worried about an eject button in the qemu SDL 
interface, or about running the qemu command line by hand.  They have 
complicated management interfaces that do everything, so we concentrate, 
for example, on a robust RPC interface for qemu.  That means nothing for 
command line users but is critical for our users.

I am not _against_ excellent support for command-line users, but I am 
not going to divert the resources I control (=me) into something that is 
not needed by my users.  I encourage anyone who wants to improve 
usability to subscribe to qemu-devel and contribute, they will receive a 
warm welcome.

> The thing is (and i doubt you are surprised that i say that), i see a similar
> situation with KVM. The basic parameters are comparable to Oprofile: it has a
> kernel-space component and a KVM-specific user-space. By all practical means
> the two are one and the same, but are maintained as different projects.
>    

There is tight cooperation between the maintainers and developers of 
these two projects.  Most developers are subscribed to both mailing lists 
and many have contributed to both repositories.  There does not appear 
to be a problem with release schedules.

> I have followed KVM since its inception with great interest. I saw its good
> initial design, i tried it early on and even wrote various patches for it. So
> i care more about KVM than a random observer would, but this preference and
> passion for KVM's good technical sides does not cloud my judgement when it
> comes to its weaknesses.
>
> In fact the weaknesses are far more important to identify and express
> publicly, so i tend to concentrate on them. Dont take this as me blasting KVM,
> we both know the many good aspects of KVM.
>
> So, as i explained it earlier in greater detail the modularization of KVM into
> a separate kernel-space and user-space component is one of its worst current
> weaknesses, and it has become the main stiffling force in the way of a better
> KVM experience to users.
>
> That, IMO, is the 'weakest link' of KVM today and no matter how well the rest
> of KVM gets improved those nice bits all get unfairly ignored when the user
> cannot have a usable and good desktop experience and thinks that KVM is
> crappy.
>    

Thanks.  I agree the user experience when launching qemu from the 
command line is miles behind virtualbox and vmware workstation.  What I 
disagree with is that this is how a typical user will first experience kvm - 
most distributions now integrate virt-manager, which allows much 
better graphical interaction.

Unfortunately, virt-manager is still server-oriented (for example, it 
uses VNC instead of displaying directly to X), and is hardly polished to 
the same level as commercial tools.  However, you cannot force someone 
to write good desktop integration for qemu; it has to come from someone 
with the itch, the experience, the capability, and the time.

> I think you should think outside the initial design box you have created 4
> years ago, you should consider iterating the model and you should consider the
> alternative i suggested: move (or create) KVM tooling to tools/kvm/ and treat
> it as a single project from there on.
>    

Do you really think that tools/kvm/ would create a good GUI for kvm?  
lkml is hardly the place where GUI developers and designers congregate.  
Please, if any of you GUI experts are reading this, please consider 
contributing to qemu directly.

-- 
Do not meddle in the internals of kernels, for they are subtle and quick to panic.


^ permalink raw reply	[flat|nested] 390+ messages in thread

* Re: [RFC] Unify KVM kernel-space and user-space code into a single project
  2010-03-21 20:55                                                                                 ` Ingo Molnar
@ 2010-03-21 21:42                                                                                   ` Avi Kivity
  2010-03-21 21:54                                                                                     ` Ingo Molnar
  2010-03-21 22:00                                                                                     ` Ingo Molnar
  0 siblings, 2 replies; 390+ messages in thread
From: Avi Kivity @ 2010-03-21 21:42 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Pekka Enberg, Anthony Liguori, Zhang, Yanmin, Peter Zijlstra,
	Sheng Yang, linux-kernel, kvm, Marcelo Tosatti, oerg Roedel,
	Jes Sorensen, Gleb Natapov, Zachary Amsden, ziteng.huang,
	Arnaldo Carvalho de Melo, Frédéric Weisbecker

On 03/21/2010 10:55 PM, Ingo Molnar wrote:
>
> Of course you could say the following:
>
>    ' Thanks, I'll mark this for v2.6.36 integration. Note that we are not
>      able to add this to the v2.6.35 kernel queue anymore as the ongoing
>      usability work already takes up all of the project's maintainer and
>      testing bandwidth. If you want the feature to be merged sooner than that
>      then please help us cut down on the TODO and BUGS list that can be found
>      at XYZ. There's quite a few low hanging fruits there. '
>    

That would be shooting at my own foot as well as the contributor's since 
I badly want that RCU stuff, and while a GUI would be nice, that itch 
isn't on my back.

You're asking a developer and a maintainer to put off the work they're 
interested in, in order to work on something someone else is interested 
in, but is not contributing themselves.

> Although this RCU example is 'worst' possible example, as it's a pure speedup
> change with no functionality effect.
>
> Consider the _other_ examples that are a lot more clear:
>
>     ' If you expose paravirt spilocks via KVM please also make sure the KVM
>       tooling can make use of it, has an option for it to configure it, and
>       that it has sufficient efficiency statistics displayed in the tool for
>       admins to monitor.'
>
>     ' If you create this new paravirt driver then please also make sure it can
>       be configured in the tooling. '
>
>     ' Please also add a testcase for this bug to tools/kvm/testcases/ so we dont
>       repeat this same mistake in the future. '
>    

All three happen quite commonly in qemu/kvm development.  Of course 
someone who develops a feature also develops a patch that exposes it in 
qemu.  There are several test cases in qemu-kvm.git/kvm/user/test.

> I'd say most of the high-level feature work in KVM has tooling impact.
>    

Usually, pretty low.  Plumbing down a feature is usually trivial.  There 
are exceptions, of course - smp is only supported in qemu-kvm.git, not 
in upstream qemu.git, for example.  In any case of course the work is 
done in both qemu and kvm - do you think people develop features to see 
them bitrot?

> And note the important arguement that the 'eject button' thing would not occur
> naturally in a project that is well designed and has a good quality balance.
> It would only occur in the transitionary period if a big lump of lower-quality
> code is unified with higher-quality code. Then indeed a lot of pressure gets
> created on the people working on the high-quality portion to go over and fix
> the low-quality portion.
>    

It's a matter of priorities.

> Which, btw., is an unconditonally good thing ...
>
> But even an RCU speedup can be fairly linked/ordered to more pressing needs of
> a project.
>    

Pressing to whom?

> Really, the unification of two tightly related pieces of code has numerous
> clear advantages. Please give it some thought before rejecting it.
>    

I'm not blind to the advantages.  Dropping tcg would be the biggest of 
them by far (much more than moving the repository, IMO).  But there are 
disadvantages as well.

Around two years ago I seriously considered forking qemu; at this time I 
do not think it is a good idea.

-- 
Do not meddle in the internals of kernels, for they are subtle and quick to panic.


^ permalink raw reply	[flat|nested] 390+ messages in thread

* Re: [RFC] Unify KVM kernel-space and user-space code into a single project
  2010-03-21 21:00                                                                                     ` Ingo Molnar
@ 2010-03-21 21:44                                                                                       ` Avi Kivity
  2010-03-21 23:43                                                                                       ` Anthony Liguori
  1 sibling, 0 replies; 390+ messages in thread
From: Avi Kivity @ 2010-03-21 21:44 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Antoine Martin, Anthony Liguori, Pekka Enberg, Zhang, Yanmin,
	Peter Zijlstra, Sheng Yang, linux-kernel, kvm, Marcelo Tosatti,
	oerg Roedel, Jes Sorensen, Gleb Natapov, Zachary Amsden,
	ziteng.huang, Arnaldo Carvalho de Melo, Frédéric Weisbecker

On 03/21/2010 11:00 PM, Ingo Molnar wrote:
> * Avi Kivity<avi@redhat.com>  wrote:
>
>    
>> On 03/21/2010 09:59 PM, Ingo Molnar wrote:
>>      
>>> Frankly, i was surprised (and taken slightly off base) by both Avi and Anthony
>>> suggesting such a clearly inferior "add a demon to the guest space" solution.
>>> It's a usability and deployment non-starter.
>>>        
>> It's only clearly inferior if you ignore every consideration against it.
>> It's definitely not a deployment non-starter, see the tons of daemons that
>> come with any Linux system. [...]
>>      
> Avi, please dont put arguments into my mouth that i never made.
>    

Sorry, that was not the intent.  I meant that putting things into the 
kernel has disadvantages that must be considered.

> My (clearly expressed) argument was that:
>
>      _a new guest-side demon is a transparent instrumentation non-starter_
>
> What is so hard to understand about that simple concept? Instrumentation is
> good if it's as transparent as possible.
>
> Of course lots of other features can be done via a new user-space package ...
>    

I believe you can deploy this daemon via a (default) package, without 
any hassle to users.

-- 
Do not meddle in the internals of kernels, for they are subtle and quick to panic.


^ permalink raw reply	[flat|nested] 390+ messages in thread

* Re: [RFC] Unify KVM kernel-space and user-space code into a single project
  2010-03-21 21:30                                                                                   ` Avi Kivity
@ 2010-03-21 21:52                                                                                     ` Ingo Molnar
  2010-03-22  6:49                                                                                       ` Avi Kivity
  0 siblings, 1 reply; 390+ messages in thread
From: Ingo Molnar @ 2010-03-21 21:52 UTC (permalink / raw)
  To: Avi Kivity
  Cc: Anthony Liguori, Pekka Enberg, Zhang, Yanmin, Peter Zijlstra,
	Sheng Yang, linux-kernel, kvm, Marcelo Tosatti, oerg Roedel,
	Jes Sorensen, Gleb Natapov, Zachary Amsden, ziteng.huang,
	Arnaldo Carvalho de Melo, Frédéric Weisbecker


* Avi Kivity <avi@redhat.com> wrote:

> > I.e. you are arguing for microkernel Linux, while you see me as arguing 
> > for a monolithic kernel.
> 
> No. I'm arguing for reducing bloat wherever possible.  Kernel code is more 
> expensive than userspace code in every metric possible.

1)

One of the primary arguments for the micro-kernel design, too, was to 
push as much into user-space as possible without impacting performance too 
much - so you very much seem to be arguing for a micro-kernel design for the 
kernel.

I think history has given us the answer for that fight between microkernels 
and monolithic kernels.

Furthermore, to not engage in hypotheticals about microkernels: by your 
argument the Oprofile design was perfect (it was minimalistic kernel-space, 
with all the complexity in user-space), while perf was over-complex (which 
does many things in the kernel that could have been done in user-space).

Practical results suggest the exact opposite happened - Oprofile is being 
replaced by perf. How do you explain that?

2)

In your analysis you again ignore the package boundary costs and artifacts as 
if they didnt exist.

That was my main argument, and that is what we saw with oprofile and perf: 
while maintaining more kernel-code may be more expensive, it sure pays off for 
getting us a much better solution in the end.

And getting a 'much better solution' to users is the goal of all this, isnt 
it?

I don't mind what you call 'bloat' per se if it serves a purpose that users 
consider a good deal. I have quite a bit of RAM in most of my systems; having 
50K more or less included in the kernel image is far less important than having 
a healthy and vibrant development model and having satisfied users ...

	Ingo

^ permalink raw reply	[flat|nested] 390+ messages in thread

* Re: [RFC] Unify KVM kernel-space and user-space code into a single project
  2010-03-21 21:42                                                                                   ` Avi Kivity
@ 2010-03-21 21:54                                                                                     ` Ingo Molnar
  2010-03-22  0:16                                                                                       ` Anthony Liguori
  2010-03-22  7:13                                                                                       ` Avi Kivity
  2010-03-21 22:00                                                                                     ` Ingo Molnar
  1 sibling, 2 replies; 390+ messages in thread
From: Ingo Molnar @ 2010-03-21 21:54 UTC (permalink / raw)
  To: Avi Kivity
  Cc: Pekka Enberg, Anthony Liguori, Zhang, Yanmin, Peter Zijlstra,
	Sheng Yang, linux-kernel, kvm, Marcelo Tosatti, oerg Roedel,
	Jes Sorensen, Gleb Natapov, Zachary Amsden, ziteng.huang,
	Arnaldo Carvalho de Melo, Frédéric Weisbecker


* Avi Kivity <avi@redhat.com> wrote:

> On 03/21/2010 10:55 PM, Ingo Molnar wrote:
> >
> >Of course you could say the following:
> >
> >   ' Thanks, I'll mark this for v2.6.36 integration. Note that we are not
> >     able to add this to the v2.6.35 kernel queue anymore as the ongoing
> >     usability work already takes up all of the project's maintainer and
> >     testing bandwidth. If you want the feature to be merged sooner than that
> >     then please help us cut down on the TODO and BUGS list that can be found
> >     at XYZ. There's quite a few low hanging fruits there. '
> 
> That would be shooting at my own foot as well as the contributor's since I 
> badly want that RCU stuff, and while a GUI would be nice, that itch isn't on 
> my back.

I think this sums up the root cause of all the problems i see with KVM pretty 
well.

Thanks,

	Ingo

^ permalink raw reply	[flat|nested] 390+ messages in thread

* Re: [RFC] Unify KVM kernel-space and user-space code into a single project
  2010-03-21 21:42                                                                                   ` Avi Kivity
  2010-03-21 21:54                                                                                     ` Ingo Molnar
@ 2010-03-21 22:00                                                                                     ` Ingo Molnar
  2010-03-21 23:50                                                                                       ` Anthony Liguori
                                                                                                         ` (2 more replies)
  1 sibling, 3 replies; 390+ messages in thread
From: Ingo Molnar @ 2010-03-21 22:00 UTC (permalink / raw)
  To: Avi Kivity
  Cc: Pekka Enberg, Anthony Liguori, Zhang, Yanmin, Peter Zijlstra,
	Sheng Yang, linux-kernel, kvm, Marcelo Tosatti, oerg Roedel,
	Jes Sorensen, Gleb Natapov, Zachary Amsden, ziteng.huang,
	Arnaldo Carvalho de Melo, Frédéric Weisbecker


* Avi Kivity <avi@redhat.com> wrote:

> > Consider the _other_ examples that are a lot more clear:
> >
> >    ' If you expose paravirt spilocks via KVM please also make sure the KVM
> >      tooling can make use of it, has an option for it to configure it, and
> >      that it has sufficient efficiency statistics displayed in the tool for
> >      admins to monitor.'
> >
> >    ' If you create this new paravirt driver then please also make sure it can
> >      be configured in the tooling. '
> >
> >    ' Please also add a testcase for this bug to tools/kvm/testcases/ so we dont
> >      repeat this same mistake in the future. '
> 
> All three happen quite commonly in qemu/kvm development.  Of course someone 
> who develops a feature also develops a patch that exposes it in qemu.  There 
> are several test cases in qemu-kvm.git/kvm/user/test.

If that is the theory then it has failed to trickle through in practice. As 
you know i have reported a long list of usability problems which got hardly a 
look. That list could be created by pretty much anyone spending a few minutes 
getting a first impression of qemu-kvm.

So something is seriously wrong in KVM land, to pretty much anyone trying it 
for the first time. I have explained how i see the root cause of that, while 
you seem to suggest that there's nothing wrong to begin with. I guess we'll 
have to agree to disagree on that.

Thanks,

	Ingo

^ permalink raw reply	[flat|nested] 390+ messages in thread

* Re: [RFC] Unify KVM kernel-space and user-space code into a single project
  2010-03-21 19:17                                                                             ` Ingo Molnar
  2010-03-21 19:35                                                                               ` Antoine Martin
  2010-03-21 20:01                                                                               ` Avi Kivity
@ 2010-03-21 23:35                                                                               ` Anthony Liguori
  2 siblings, 0 replies; 390+ messages in thread
From: Anthony Liguori @ 2010-03-21 23:35 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Avi Kivity, Pekka Enberg, Zhang, Yanmin, Peter Zijlstra,
	Sheng Yang, linux-kernel, kvm, Marcelo Tosatti, oerg Roedel,
	Jes Sorensen, Gleb Natapov, Zachary Amsden, ziteng.huang,
	Arnaldo Carvalho de Melo, Frédéric Weisbecker

On 03/21/2010 02:17 PM, Ingo Molnar wrote:
>
>> If you want to improve this, you need to do the following:
>>
>> 1) Add a userspace daemon that uses vmchannel that runs in the guest and can
>>     fetch kallsyms and arbitrary modules.  If that daemon lives in
>>     tools/perf, that's fine.
>>      
> Adding any new daemon to an existing guest is a deployment and usability
> nightmare.
>
> The basic rule of good instrumentation is to be transparent. The moment we
> have to modify the user-space of a guest just to monitor it, the purpose of
> transparent instrumentation is defeated.
>
> That was one of the fundamental usability mistakes of Oprofile.
>
> There is no 'perf' daemon - all the perf functionality is _built in_, and for
> very good reasons. It is one of the main reasons for perf's success as well.
>    

The solution should be a long-lived piece of code that runs without 
kernel privileges.  How the code is delivered to the user is a separate 
problem.

If you want to argue that the kernel should build an initramfs that 
contains some things that always should be shipped with the kernel but 
don't need to be within the kernel, I think that's something that's long 
overdue.

We could make it a kernel thread, but what's the point?  It's much safer 
for it to be a userspace thread and it doesn't need to interact with the 
kernel in an intimate way.
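
As a very rough sketch of how small such a guest-side helper could be - here 
serving /proc/kallsyms over a virtio-serial port - note that the port name and 
the one-line 'protocol' are entirely made up, not an existing interface:

  #!/bin/sh
  # Hypothetical guest helper: answer requests from the host over a
  # virtio-serial channel that qemu exposes as a named port.
  PORT=/dev/virtio-ports/org.example.symbols
  while read -r req < "$PORT"; do
      case "$req" in
          kallsyms) cat /proc/kallsyms > "$PORT" ;;
          modules)  cat /proc/modules  > "$PORT" ;;
      esac
  done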

Regards,

Anthony Liguori


^ permalink raw reply	[flat|nested] 390+ messages in thread

* Re: [RFC] Unify KVM kernel-space and user-space code into a single project
  2010-03-21 21:00                                                                                     ` Ingo Molnar
  2010-03-21 21:44                                                                                       ` Avi Kivity
@ 2010-03-21 23:43                                                                                       ` Anthony Liguori
  1 sibling, 0 replies; 390+ messages in thread
From: Anthony Liguori @ 2010-03-21 23:43 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Avi Kivity, Antoine Martin, Pekka Enberg, Zhang, Yanmin,
	Peter Zijlstra, Sheng Yang, linux-kernel, kvm, Marcelo Tosatti,
	oerg Roedel, Jes Sorensen, Gleb Natapov, Zachary Amsden,
	ziteng.huang, Arnaldo Carvalho de Melo, Frédéric Weisbecker

On 03/21/2010 04:00 PM, Ingo Molnar wrote:
> * Avi Kivity<avi@redhat.com>  wrote:
>
>    
>> On 03/21/2010 09:59 PM, Ingo Molnar wrote:
>>      
>>> Frankly, i was surprised (and taken slightly off base) by both Avi and Anthony
>>> suggesting such a clearly inferior "add a demon to the guest space" solution.
>>> It's a usability and deployment non-starter.
>>>        
>> It's only clearly inferior if you ignore every consideration against it.
>> It's definitely not a deployment non-starter, see the tons of daemons that
>> come with any Linux system. [...]
>>      
> Avi, please dont put arguments into my mouth that i never made.
>
> My (clearly expressed) argument was that:
>
>      _a new guest-side demon is a transparent instrumentation non-starter_
>    

FWIW, there's no reason you couldn't consume a vmchannel port from 
within the kernel.  I don't think the code needs to be in the kernel, and 
from a security PoV that suggests it should be in userspace, IMHO.

But if you want to make a kernel thread, knock yourself out.  I have no 
objection to that from a qemu perspective.  I can't see why Avi would 
mind either.  I think it's papering over another problem (the kernel 
should control initrds IMHO) but that's a different topic.

Regards,

Anthony Liguori


^ permalink raw reply	[flat|nested] 390+ messages in thread

* Re: [RFC] Unify KVM kernel-space and user-space code into a single project
  2010-03-21 22:00                                                                                     ` Ingo Molnar
@ 2010-03-21 23:50                                                                                       ` Anthony Liguori
  2010-03-22  0:25                                                                                       ` Anthony Liguori
  2010-03-22  7:18                                                                                       ` Avi Kivity
  2 siblings, 0 replies; 390+ messages in thread
From: Anthony Liguori @ 2010-03-21 23:50 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Avi Kivity, Pekka Enberg, Zhang, Yanmin, Peter Zijlstra,
	Sheng Yang, linux-kernel, kvm, Marcelo Tosatti, oerg Roedel,
	Jes Sorensen, Gleb Natapov, Zachary Amsden, ziteng.huang,
	Arnaldo Carvalho de Melo, Frédéric Weisbecker

On 03/21/2010 05:00 PM, Ingo Molnar wrote:
> If that is the theory then it has failed to trickle through in practice. As
> you know i have reported a long list of usability problems with hardly a look.
> That list could be created by pretty much anyone spending a few minutes of
> getting a first impression with qemu-kvm.
>    

I think the point you're missing is that your list was from the 
perspective of someone looking at a desktop virtualization solution that 
was graphically oriented.

As Avi has repeatedly mentioned, so far, that has not been the target 
audience of QEMU.  The target audience tends to be: 1) people looking to 
do server virtualization and 2) people looking to do command line based 
development.

Usually, both (1) and (2) are working on machines that are remotely 
located.  What's important to these users is that VMs be easily 
launchable from the command line, that there is a lot of flexibility in 
defining machine types, and that there is a programmatic way to interact 
with a given instance of QEMU.  Those are the things that we've been 
focusing on recently.

The reason we don't have better desktop virtualization support is 
simple.  No one is volunteering to do it and no company is funding 
development for it.

When you look at something like VirtualBox, what you're looking at is a 
long-ago fork of QEMU with a GUI added, focusing on desktop 
virtualization.

There is no magic behind adding a better, more usable GUI to QEMU.  It 
just takes resources.

I understand that you're trying to make the point that without catering 
to the desktop virtualization use case, we won't get as many developers 
as we could.  Personally, I don't think that argument is accurate.  If 
you look at VirtualBox, its performance is terrible.  Having a nice GUI 
hasn't gotten them the type of developers that can improve their 
performance.

No one is arguing that we wouldn't like to have a nicer UI.  I'd love to 
merge any patch like that.

Regards,

Anthony Liguori


^ permalink raw reply	[flat|nested] 390+ messages in thread

* Re: [RFC] Unify KVM kernel-space and user-space code into a single project
  2010-03-21 21:54                                                                                     ` Ingo Molnar
@ 2010-03-22  0:16                                                                                       ` Anthony Liguori
  2010-03-22 11:59                                                                                         ` Ingo Molnar
  2010-03-22  7:13                                                                                       ` Avi Kivity
  1 sibling, 1 reply; 390+ messages in thread
From: Anthony Liguori @ 2010-03-22  0:16 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Avi Kivity, Pekka Enberg, Peter Zijlstra, Sheng Yang,
	linux-kernel, kvm, Marcelo Tosatti, Joerg Roedel, Jes Sorensen,
	Gleb Natapov, Zachary Amsden, ziteng.huang,
	Arnaldo Carvalho de Melo, Frédéric Weisbecker

On 03/21/2010 04:54 PM, Ingo Molnar wrote:
> * Avi Kivity<avi@redhat.com>  wrote:
>
>    
>> On 03/21/2010 10:55 PM, Ingo Molnar wrote:
>>      
>>> Of course you could say the following:
>>>
>>>    ' Thanks, I'll mark this for v2.6.36 integration. Note that we are not
>>>      able to add this to the v2.6.35 kernel queue anymore as the ongoing
>>>      usability work already takes up all of the project's maintainer and
>>>      testing bandwidth. If you want the feature to be merged sooner than that
>>>      then please help us cut down on the TODO and BUGS list that can be found
>>>      at XYZ. There's quite a few low hanging fruits there. '
>>>        
>> That would be shooting at my own foot as well as the contributor's since I
>> badly want that RCU stuff, and while a GUI would be nice, that itch isn't on
>> my back.
>>      
> I think this sums up the root cause of all the problems i see with KVM pretty
> well.
>    

A good maintainer has to strike a balance between asking more of people 
than what they initially volunteer and getting people to implement the 
less fun things that are nonetheless required.  The kernel can take this 
to an extreme because at the end of the day, it's the only game in town 
and there is an unending number of potential volunteers.  Most other 
projects are not quite as fortunate.

When someone submits a patch set to QEMU implementing a new network 
backend for raw sockets, we can push back about how it fits into the 
entire stack wrt security, usability, etc.  Ultimately, we can arrive at 
a different, more user-friendly solution (networking helpers), and with 
some time investment on my part, end up with something much nicer.  
Still command-line based, though.

Responding to such a patch set with "replace the SDL front end with a 
GTK one that lets you graphically configure networking" is not 
reasonable, and the result would be one less QEMU contributor in the 
long run.

Over time, we can, and are, pushing people to focus more on usability.  
But that doesn't get you a first-class GTK GUI overnight.  The only way 
you're going to get that is by having a contributor who is specifically 
interested in building such a thing.

We simply haven't had that in the past 5 years that I've been involved 
in the project.  If someone stepped up to build this, I'd certainly 
support it in every way possible and there are probably some steps we 
could take to even further encourage this.

Regards,

Anthony Liguori


^ permalink raw reply	[flat|nested] 390+ messages in thread

* Re: [RFC] Unify KVM kernel-space and user-space code into a single project
  2010-03-21 22:00                                                                                     ` Ingo Molnar
  2010-03-21 23:50                                                                                       ` Anthony Liguori
@ 2010-03-22  0:25                                                                                       ` Anthony Liguori
  2010-03-22  7:18                                                                                       ` Avi Kivity
  2 siblings, 0 replies; 390+ messages in thread
From: Anthony Liguori @ 2010-03-22  0:25 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Avi Kivity, Pekka Enberg, Zhang, Yanmin, Peter Zijlstra,
	Sheng Yang, linux-kernel, kvm, Marcelo Tosatti, Joerg Roedel,
	Jes Sorensen, Gleb Natapov, Zachary Amsden, ziteng.huang,
	Arnaldo Carvalho de Melo, Frédéric Weisbecker

On 03/21/2010 05:00 PM, Ingo Molnar wrote:
> If that is the theory then it has failed to trickle through in practice. As
> you know i have reported a long list of usability problems with hardly a look.
> That list could be created by pretty much anyone spending a few minutes of
> getting a first impression with qemu-kvm.
>    

Can you transfer your list to the following wiki page:

http://wiki.qemu.org/Features/Usability

This thread is so large that I can't find your note that contained the 
initial list.

I want to make sure this input doesn't die once this thread settles down.

Regards,

Anthony Liguori


^ permalink raw reply	[flat|nested] 390+ messages in thread

* Re: [RFC] Unify KVM kernel-space and user-space code into a single project
  2010-03-21 21:20                                                                                             ` Ingo Molnar
@ 2010-03-22  6:35                                                                                               ` Avi Kivity
  2010-03-22 11:48                                                                                                 ` Ingo Molnar
  2010-03-22  6:59                                                                                               ` Zhang, Yanmin
  1 sibling, 1 reply; 390+ messages in thread
From: Avi Kivity @ 2010-03-22  6:35 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Antoine Martin, Olivier Galibert, Anthony Liguori, Pekka Enberg,
	Zhang, Yanmin, Peter Zijlstra, Sheng Yang, linux-kernel, kvm,
	Marcelo Tosatti, Joerg Roedel, Jes Sorensen, Gleb Natapov,
	Zachary Amsden, ziteng.huang, Arnaldo Carvalho de Melo,
	Frédéric Weisbecker

On 03/21/2010 11:20 PM, Ingo Molnar wrote:
> * Avi Kivity<avi@redhat.com>  wrote:
>
>    
>>> Well, for what it's worth, I rarely ever use anything else. My virtual
>>> disks are raw so I can loop mount them easily, and I can also switch my
>>> guest kernels from outside... without ever needing to mount those disks.
>>>        
>> Curious, what do you use them for?
>>
>> btw, if you build your kernel outside the guest, then you already have
>> access to all its symbols, without needing anything further.
>>      
> There's two errors with your argument:
>
> 1) you are assuming that it's only about kernel symbols
>
> Look at this 'perf report' output:
>
> # Samples: 7127509216
> #
> # Overhead     Command                  Shared Object  Symbol
> # ........  ..........  .............................  ......
> #
>      19.14%         git  git                            [.] lookup_object
>      15.16%        perf  git                            [.] lookup_object
>       4.74%        perf  libz.so.1.2.3                  [.] inflate
>       4.52%         git  libz.so.1.2.3                  [.] inflate
>       4.21%        perf  libz.so.1.2.3                  [.] inflate_table
>       3.94%         git  libz.so.1.2.3                  [.] inflate_table
>       3.29%         git  git                            [.] find_pack_entry_one
>       3.24%         git  libz.so.1.2.3                  [.] inflate_fast
>       2.96%        perf  libz.so.1.2.3                  [.] inflate_fast
>       2.96%         git  git                            [.] decode_tree_entry
>       2.80%        perf  libc-2.11.90.so                [.] __strlen_sse42
>       2.56%         git  libc-2.11.90.so                [.] __strlen_sse42
>       1.98%        perf  libc-2.11.90.so                [.] __GI_memcpy
>       1.71%        perf  git                            [.] decode_tree_entry
>       1.53%         git  libc-2.11.90.so                [.] __GI_memcpy
>       1.48%         git  git                            [.] lookup_blob
>       1.30%         git  git                            [.] process_tree
>       1.30%        perf  git                            [.] process_tree
>       0.90%        perf  git                            [.] tree_entry
>       0.82%        perf  git                            [.] lookup_blob
>       0.78%         git  [kernel.kallsyms]              [k] kstat_irqs_cpu
>
> kernel symbols are only a small portion of the symbols. (a single line in this
> case)
>
> To get to those other symbols we have to read the ELF symbols of those
> binaries in the guest filesystem, in the post-processing/reporting phase. This
> is both complex to do and relatively slow so we dont want to (and cannot) do
> this at sample time from IRQ context or NMI context ...
>    

Okay.  So a symbol server is necessary.  Still, I don't think -kernel is 
a good reason for including the symbol server in the kernel itself.  If 
someone uses it extensively together with perf, _and_ they can't put the 
symbol server in the guest for some reason, let them patch mkinitrd to 
include it.

> Also, many aspects of reporting are interactive so it's done lazily or
> on-demand. So we need ready access to the guest filesystem - for those guests
> which decide to integrate with the host for this.
>
> 2) the 'SystemTap mistake'
>
> You are assuming that the symbols of the kernel when it got built got saved
> properly and are discoverable easily. In reality those symbols can be erased
> by a make clean, can be modified by a new build, can be misplaced and can
> generally be hard to find because each distro puts them in a different
> installation path.
>
> My 10+ years experience with kernel instrumentation solutions is that
> kernel-driven, self-sufficient, robust, trustable, well-enumerated sources of
> information work far better in practice.
>    

What about line number information?  And the source?  Into the kernel 
with them as well?


> The thing is, in this thread i'm forced to repeat the same basic facts again
> and again. Could you _PLEASE_, pretty please, when it comes to instrumentation
> details, at least _read the mails_ of the guys who actually ... write and
> maintain Linux instrumentation code? This is getting ridiculous really.
>    

I've read every one of your emails.  If I misunderstood or overlooked 
something, I apologize.  The thread is very long and at times 
antagonistic so it's hard to keep all the details straight.

-- 
Do not meddle in the internals of kernels, for they are subtle and quick to panic.


^ permalink raw reply	[flat|nested] 390+ messages in thread

* Re: [RFC] Unify KVM kernel-space and user-space code into a single project
  2010-03-21 20:37                                                                                     ` Ingo Molnar
@ 2010-03-22  6:37                                                                                       ` Avi Kivity
  2010-03-22 11:39                                                                                         ` Ingo Molnar
  0 siblings, 1 reply; 390+ messages in thread
From: Avi Kivity @ 2010-03-22  6:37 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Olivier Galibert, Anthony Liguori, Pekka Enberg, Zhang, Yanmin,
	Peter Zijlstra, Sheng Yang, linux-kernel, kvm, Marcelo Tosatti,
	Joerg Roedel, Jes Sorensen, Gleb Natapov, Zachary Amsden,
	ziteng.huang, Arnaldo Carvalho de Melo, Frédéric Weisbecker

On 03/21/2010 10:37 PM, Ingo Molnar wrote:
>
>> That includes the guest kernel.  If you can deploy a new kernel in the
>> guest, presumably you can deploy a userspace package.
>>      
> Note that with perf we can instrument the guest with zero guest-kernel
> modifications as well.
>
> We try to reduce the guest impact to a bare minimum, as the difficulties in
> deployment are function of the cross section surface to the guest.
>
> Also, note that the kernel is special with regards to instrumentation: since
> this is the kernel project, we are doing kernel space changes, as we are doing
> them _anyway_. So adding symbol resolution capabilities would be a minimal
> addition to that - while adding a whole new guest package for the daemon would
> significantly increase the cross section surface.
>    

It's true that for us, changing the kernel is easier than changing the 
rest of the guest.  IMO we should still resist the temptation to go the 
easy path and do the right thing (I understand we disagree about what 
the right thing is).

-- 
Do not meddle in the internals of kernels, for they are subtle and quick to panic.


^ permalink raw reply	[flat|nested] 390+ messages in thread

* Re: [RFC] Unify KVM kernel-space and user-space code into a single project
  2010-03-21 21:52                                                                                     ` Ingo Molnar
@ 2010-03-22  6:49                                                                                       ` Avi Kivity
  2010-03-22 11:23                                                                                         ` Ingo Molnar
  0 siblings, 1 reply; 390+ messages in thread
From: Avi Kivity @ 2010-03-22  6:49 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Anthony Liguori, Pekka Enberg, Zhang, Yanmin, Peter Zijlstra,
	Sheng Yang, linux-kernel, kvm, Marcelo Tosatti, oerg Roedel,
	Jes Sorensen, Gleb Natapov, Zachary Amsden, ziteng.huang,
	Arnaldo Carvalho de Melo, Fr?d?ric Weisbecker

On 03/21/2010 11:52 PM, Ingo Molnar wrote:
> * Avi Kivity<avi@redhat.com>  wrote:
>
>    
>>> I.e. you are arguing for microkernel Linux, while you see me as arguing
>>> for a monolithic kernel.
>>>        
>> No. I'm arguing for reducing bloat wherever possible.  Kernel code is more
>> expensive than userspace code in every metric possible.
>>      
> 1)
>
> One of the primary design arguments of the micro-kernel design as well was to
> push as much into user-space as possible without impacting performance too
> much so you very much seem to be arguing for a micro-kernel design for the
> kernel.
>
> I think history has given us the answer for that fight between microkernels
> and monolithic kernels.
>    

I am not arguing for a microkernel.  Again: reduce bloat where possible, 
kernel code is more expensive than userspace code.

> Furthermore, to not engage in hypotheticals about microkernels: by your
> argument the Oprofile design was perfect (it was minimalistic kernel-space,
> with all the complexity in user-space), while perf was over-complex (which
> does many things in the kernel that could have been done in user-space).
>
> Practical results suggest the exact opposite happened - Oprofile is being
> replaced by perf. How do you explain that?
>    

I did not say that the amount of kernel and userspace code is the only 
factor deciding the quality of software.  If that were so, microkernels 
would have won out long ago.

It may be that perf has too much kernel code, and won against 
oprofile despite that because it was better in other areas.  Or it may 
be that perf has exactly the right user/kernel division.  Or maybe perf 
needs some of the code moved from userspace to the kernel.  I don't 
know, I haven't examined the code.

The user/kernel boundary is only one metric for code quality.  Nor is it 
always in favour of pushing things to userspace.  Narrowing or 
simplifying an interface is often an argument in favour of pushing 
things into the kernel.

IMO the reason perf is more usable than oprofile has less to do with the 
kernel/userspace boundary and more to do with effort and attention spent 
on the userspace/user boundary.

> 2)
>
> In your analysis you again ignore the package boundary costs and artifacts as
> if they didnt exist.
>
> That was my main argument, and that is what we saw with oprofile and perf:
> while maintaining more kernel-code may be more expensive, it sure pays off for
> getting us a much better solution in the end.
>    

Package costs are real.  We need to bear them.  But I don't think the 
kernel should grow just because maintaining another package (and the 
interface between two packages) is more difficult.

> And getting a 'much better solution' to users is the goal of all this, isnt
> it?
>
> I dont mind what you call 'bloat' per se if it's for a purpose that users find
> like a good deal. I have quite a bit of RAM in most of my systems, having 50K
> more or less included in the kernel image is far less important than having a
> healthy and vibrant development model and having satisfied users ...
>    

I'm not worried about 50K or so, I'm worried about a bug in those 50K 
taking down the guest.

-- 
Do not meddle in the internals of kernels, for they are subtle and quick to panic.


^ permalink raw reply	[flat|nested] 390+ messages in thread

* Re: [RFC] Unify KVM kernel-space and user-space code into a single project
  2010-03-21 21:20                                                                                             ` Ingo Molnar
  2010-03-22  6:35                                                                                               ` Avi Kivity
@ 2010-03-22  6:59                                                                                               ` Zhang, Yanmin
  1 sibling, 0 replies; 390+ messages in thread
From: Zhang, Yanmin @ 2010-03-22  6:59 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Avi Kivity, Antoine Martin, Olivier Galibert, Anthony Liguori,
	Pekka Enberg, Peter Zijlstra, Sheng Yang, linux-kernel, kvm,
	Marcelo Tosatti, Joerg Roedel, Jes Sorensen, Gleb Natapov,
	Zachary Amsden, ziteng.huang, Arnaldo Carvalho de Melo,
	Frédéric Weisbecker

On Sun, 2010-03-21 at 22:20 +0100, Ingo Molnar wrote:
> * Avi Kivity <avi@redhat.com> wrote:
> 
> > > Well, for what it's worth, I rarely ever use anything else. My virtual 
> > > disks are raw so I can loop mount them easily, and I can also switch my 
> > > guest kernels from outside... without ever needing to mount those disks.
> > 
> > Curious, what do you use them for?
> > 
> > btw, if you build your kernel outside the guest, then you already have 
> > access to all its symbols, without needing anything further.
> 
> There's two errors with your argument:
> 
> 1) you are assuming that it's only about kernel symbols
> 
> Look at this 'perf report' output:
> 
> # Samples: 7127509216
> #
> # Overhead     Command                  Shared Object  Symbol
> # ........  ..........  .............................  ......
> #
>     19.14%         git  git                            [.] lookup_object
>     15.16%        perf  git                            [.] lookup_object
>      4.74%        perf  libz.so.1.2.3                  [.] inflate
>      4.52%         git  libz.so.1.2.3                  [.] inflate
>      4.21%        perf  libz.so.1.2.3                  [.] inflate_table
>      3.94%         git  libz.so.1.2.3                  [.] inflate_table
>      3.29%         git  git                            [.] find_pack_entry_one
>      3.24%         git  libz.so.1.2.3                  [.] inflate_fast
>      2.96%        perf  libz.so.1.2.3                  [.] inflate_fast
>      2.96%         git  git                            [.] decode_tree_entry
>      2.80%        perf  libc-2.11.90.so                [.] __strlen_sse42
>      2.56%         git  libc-2.11.90.so                [.] __strlen_sse42
>      1.98%        perf  libc-2.11.90.so                [.] __GI_memcpy
>      1.71%        perf  git                            [.] decode_tree_entry
>      1.53%         git  libc-2.11.90.so                [.] __GI_memcpy
>      1.48%         git  git                            [.] lookup_blob
>      1.30%         git  git                            [.] process_tree
>      1.30%        perf  git                            [.] process_tree
>      0.90%        perf  git                            [.] tree_entry
>      0.82%        perf  git                            [.] lookup_blob
>      0.78%         git  [kernel.kallsyms]              [k] kstat_irqs_cpu
> 
> kernel symbols are only a small portion of the symbols. (a single line in this 
> case)
The example above shows that perf can summarize both kernel and application hot
functions. If we collect guest os statistics from the host side, we can't
summarize detailed guest os application info, because we can't get the guest
os's application process ids from the host side. So we can only get detailed
kernel info plus the total utilization percentage of guest application processes.


> 
> To get to those other symbols we have to read the ELF symbols of those 
> binaries in the guest filesystem, in the post-processing/reporting phase. This 
> is both complex to do and relatively slow so we dont want to (and cannot) do 
> this at sample time from IRQ context or NMI context ...
> 
> Also, many aspects of reporting are interactive so it's done lazily or 
> on-demand. So we need ready access to the guest filesystem - for those guests 
> which decide to integrate with the host for this.



^ permalink raw reply	[flat|nested] 390+ messages in thread

* Re: [RFC] Unify KVM kernel-space and user-space code into a single project
  2010-03-21 21:54                                                                                     ` Ingo Molnar
  2010-03-22  0:16                                                                                       ` Anthony Liguori
@ 2010-03-22  7:13                                                                                       ` Avi Kivity
  2010-03-22 11:14                                                                                         ` Ingo Molnar
  2010-03-24 12:06                                                                                         ` Paolo Bonzini
  1 sibling, 2 replies; 390+ messages in thread
From: Avi Kivity @ 2010-03-22  7:13 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Pekka Enberg, Anthony Liguori, Zhang, Yanmin, Peter Zijlstra,
	Sheng Yang, linux-kernel, kvm, Marcelo Tosatti, Joerg Roedel,
	Jes Sorensen, Gleb Natapov, Zachary Amsden, ziteng.huang,
	Arnaldo Carvalho de Melo, Frédéric Weisbecker

On 03/21/2010 11:54 PM, Ingo Molnar wrote:
> * Avi Kivity<avi@redhat.com>  wrote:
>
>    
>> On 03/21/2010 10:55 PM, Ingo Molnar wrote:
>>      
>>> Of course you could say the following:
>>>
>>>    ' Thanks, I'll mark this for v2.6.36 integration. Note that we are not
>>>      able to add this to the v2.6.35 kernel queue anymore as the ongoing
>>>      usability work already takes up all of the project's maintainer and
>>>      testing bandwidth. If you want the feature to be merged sooner than that
>>>      then please help us cut down on the TODO and BUGS list that can be found
>>>      at XYZ. There's quite a few low hanging fruits there. '
>>>        
>> That would be shooting at my own foot as well as the contributor's since I
>> badly want that RCU stuff, and while a GUI would be nice, that itch isn't on
>> my back.
>>      
> I think this sums up the root cause of all the problems i see with KVM pretty
> well.
>    

I think we agree at last.  Neither I nor my employer are interested in 
running qemu as a desktop-on-desktop tool, therefore I don't invest any 
effort in that direction, or require it from volunteers.

If you think a good GUI is so badly needed, either write one yourself, 
or convince someone else to do it.

(btw, why are you interested in desktop-on-desktop?  one use case is 
developers, who don't really need fancy GUIs; a second is people who 
test out distributions, but that doesn't seem to be a huge population; 
and a third is people running Windows for some application that doesn't 
run on Linux - hopefully a small category as well.  Seems to be quite a 
small target audience, compared to, say, video editing)

-- 
Do not meddle in the internals of kernels, for they are subtle and quick to panic.


^ permalink raw reply	[flat|nested] 390+ messages in thread

* Re: [RFC] Unify KVM kernel-space and user-space code into a single project
  2010-03-21 22:00                                                                                     ` Ingo Molnar
  2010-03-21 23:50                                                                                       ` Anthony Liguori
  2010-03-22  0:25                                                                                       ` Anthony Liguori
@ 2010-03-22  7:18                                                                                       ` Avi Kivity
  2 siblings, 0 replies; 390+ messages in thread
From: Avi Kivity @ 2010-03-22  7:18 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Pekka Enberg, Anthony Liguori, Zhang, Yanmin, Peter Zijlstra,
	Sheng Yang, linux-kernel, kvm, Marcelo Tosatti, Joerg Roedel,
	Jes Sorensen, Gleb Natapov, Zachary Amsden, ziteng.huang,
	Arnaldo Carvalho de Melo, Frédéric Weisbecker

On 03/22/2010 12:00 AM, Ingo Molnar wrote:
> * Avi Kivity<avi@redhat.com>  wrote:
>
>    
>>> Consider the _other_ examples that are a lot more clear:
>>>
>>>     ' If you expose paravirt spinlocks via KVM please also make sure the KVM
>>>       tooling can make use of it, has an option for it to configure it, and
>>>       that it has sufficient efficiency statistics displayed in the tool for
>>>       admins to monitor.'
>>>
>>>     ' If you create this new paravirt driver then please also make sure it can
>>>       be configured in the tooling. '
>>>
>>>     ' Please also add a testcase for this bug to tools/kvm/testcases/ so we dont
>>>       repeat this same mistake in the future. '
>>>        
>> All three happen quite commonly in qemu/kvm development.  Of course someone
>> who develops a feature also develops a patch that exposes it in qemu.  There
>> are several test cases in qemu-kvm.git/kvm/user/test.
>>      
> If that is the theory then it has failed to trickle through in practice. As
> you know i have reported a long list of usability problems with hardly a look.
> That list could be created by pretty much anyone spending a few minutes of
> getting a first impression with qemu-kvm.
>    

It does happen in practice, just not in the GUI areas, since no one is 
working on them.  I am not going to make a qcow2 reliability fix 
conditional on a GTK GUI.

> So something is seriously wrong in KVM land, to pretty much anyone trying it
> for the first time. I have explained how i see the root cause of that, while
> you seem to suggest that there's nothing wrong to begin with. I guess we'll
> have to agree to disagree on that.
>    

Not everyone trying it for the first time.  RHEV-M users will see a 
polished GUI that can be used to manage thousands of guests and hosts.  
I presume IBM's and Siemens' (and all other contributors') users will 
also enjoy a good user experience with their respective products.  Qemu 
is not the only GUI for kvm.

So far only one company was interested in a qemu GUI - the makers of 
virtualbox.  Unfortunately they chose not to contribute that back to 
qemu, and no one was sufficiently motivated to pick out the bits and try 
to merge them.

Again, if you are interested in a qemu GUI, you either have to write it 
yourself or convince someone else to do it.  My own plate is full and my 
priorities are clear.

-- 
Do not meddle in the internals of kernels, for they are subtle and quick to panic.


^ permalink raw reply	[flat|nested] 390+ messages in thread

* Re: [PATCH] Enhance perf to collect KVM guest os statistics from host side
  2010-03-19  8:21   ` Ingo Molnar
  2010-03-19 17:29     ` Joerg Roedel
@ 2010-03-22  7:24     ` Zhang, Yanmin
  2010-03-22 16:44       ` Arnaldo Carvalho de Melo
  1 sibling, 1 reply; 390+ messages in thread
From: Zhang, Yanmin @ 2010-03-22  7:24 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Peter Zijlstra, Avi Kivity, Sheng Yang, linux-kernel, kvm,
	Marcelo Tosatti, Joerg Roedel, Jes Sorensen, Gleb Natapov,
	Zachary Amsden, zhiteng.huang, Frédéric Weisbecker,
	Arnaldo Carvalho de Melo

On Fri, 2010-03-19 at 09:21 +0100, Ingo Molnar wrote:
> Nice progress!
> 
> This bit:
> 
> > 1) perf kvm top
> > [root@lkp-ne01 norm]# perf kvm --host --guest --guestkallsyms=/home/ymzhang/guest/kallsyms
> > --guestmodules=/home/ymzhang/guest/modules top
> 

> Will really be painful to developers - to enter that long line while we 
> have these things called 'computers' that ought to reduce human work. Also, 
> it's incomplete, we need access to the guest system's binaries to do ELF 
> symbol resolution and dwarf decoding.
Yes, I agree with you and Avi that the enhancement needs to be user-friendly.
One of my starting points is to keep the tool's dependencies on other
components to a minimum. Admins/developers could quickly write script wrappers
if perf has parameters to support the new capability.
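
For example, a minimal wrapper sketch (the guest symbol paths are just
placeholders for wherever the admin copied the guest's kallsyms/modules):

  #!/bin/sh
  # perf-kvm-guest: hide the long option list behind one small script.
  GUEST_KALLSYMS=/path/to/guest/kallsyms
  GUEST_MODULES=/path/to/guest/modules

  exec perf kvm --host --guest \
       --guestkallsyms="$GUEST_KALLSYMS" \
       --guestmodules="$GUEST_MODULES" \
       "$@"

Then 'perf-kvm-guest top' or 'perf-kvm-guest record -a sleep 60' works
without retyping the paths every time.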


> 
> So we really need some good, automatic way to get to the guest symbol space, 
> so that if a developer types:
> 
>    perf kvm top
> 
> Then the obvious thing happens by default. (which is to show the guest 
> overhead)
> 
> There's no technical barrier on the perf tooling side to implement all that: 
> perf supports build-ids extensively and can deal with multiple symbol spaces - 
> as long as it has access to it. The guest kernel could be ID-ed based on its 
> /sys/kernel/notes and /sys/module/*/notes/.note.gnu.build-id build-ids.
I gave sshfs a quick try. sshfs could mount the guest os root filesystem
nicely and I could access the files quickly. However, it doesn't work when I
access /proc/ and /sys/, because sshfs/scp depend on the reported file size,
and the sizes of most files under /proc/ and /sys/ are 0.
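
A rough sketch of that experiment (host name and mount point are just
examples):

  # mount the guest's root on the host, read-only, over ssh
  mkdir -p /tmp/guest-root
  sshfs -o ro root@guest-vm:/ /tmp/guest-root

  ls /tmp/guest-root/boot              # regular files are fine
  cat /tmp/guest-root/proc/kallsyms    # comes back empty: size reported as 0
  fusermount -u /tmp/guest-root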


> 
> So some sort of --guestmount option would be the natural solution, which 
> points to the guest system's root: and a Qemu enumeration of guest mounts 
> (which would be off by default and configurable) from which perf can pick up 
> the target guest all automatically. (obviously only under allowed permissions 
> so that such access is secure)
If sshfs could access /proc/ and /sys/ correctly, here is a design:
--guestmount points to a directory containing one sub-directory per guest,
where each sub-directory is named after the qemu process id of that guest os.
The admin/developer mounts every guest os instance's root directory on the
corresponding sub-directory.

Then perf could access all the files. This works because each guest os
instance happens to be a single multi-threaded qemu process. One of the
drawbacks is that access to the guest os becomes slow, or impossible, when
the guest os is very busy.
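
A sketch of that layout (option name and paths as proposed here, nothing
of this exists yet; the sshfs line is just one way to do the mount, and
the hostname is a placeholder for however the guest is reached):

  # one sub-directory per guest, named after its qemu process id
  GUESTMOUNT=/var/lib/perf-guests
  for pid in $(pgrep -x qemu-kvm); do
      mkdir -p "$GUESTMOUNT/$pid"
      # the admin mounts that guest's root here by whatever means:
      # sshfs, NFS, a loop mount of the image, ...
      sshfs -o ro root@guest-of-pid-$pid:/ "$GUESTMOUNT/$pid"
  done

  # perf would then resolve guest symbols relative to $GUESTMOUNT/<pid>/
  perf kvm --host --guest --guestmount="$GUESTMOUNT" record -a sleep 60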


> 
> This would allow not just kallsyms access via $guest/proc/kallsyms but also 
> gives us the full space of symbol features: access to the guest binaries for 
> annotation and general symbol resolution, command/binary name identification, 
> etc.
> 
> Such a mount would obviously not broaden existing privileges - and as an 
> additional control a guest would also have a way to indicate that it does not 
> wish a guest mount at all.
> 
> Unfortunately, in a previous thread the Qemu maintainer has indicated that he 
> will essentially NAK any attempt to enhance Qemu to provide an easily 
> discoverable, self-contained, transparent guest mount on the host side.
> 
> No technical justification was given for that NAK, despite my repeated 
> requests to particulate the exact security problems that such an approach 
> would cause.
> 
> If that NAK does not stand in that form then i'd like to know about it - it 
> makes no sense for us to try to code up a solution against a standing 
> maintainer NAK ...
> 
> The other option is some sysadmin level hackery to NFS-mount the guest or so. 
> This is a vastly inferior method that brings us back to the abysmal usability 
> levels of OProfile:
> 
>  1) it wont be guest transparent
>  2) has to be re-done for every guest image. 
>  3) even if packaged it has to be gotten into every. single. Linux. distro. separately.
>  4) old Linux guests wont work out of box
> 
> In other words: it's very inconvenient on multiple levels and wont ever happen 
> on any reasonable enough scale to make a difference to Linux.
> 
> Which is an unfortunate situation - and the ball is on the KVM/Qemu side so i 
> can do little about it.
> 
> Thanks,
> 
> 	Ingo



^ permalink raw reply	[flat|nested] 390+ messages in thread

* Re: [PATCH] Enhance perf to collect KVM guest os statistics from host side
  2010-03-21 18:43       ` Ingo Molnar
@ 2010-03-22 10:14         ` Joerg Roedel
  2010-03-22 10:37           ` Ingo Molnar
  2010-03-22 10:59           ` Ingo Molnar
  0 siblings, 2 replies; 390+ messages in thread
From: Joerg Roedel @ 2010-03-22 10:14 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Zhang, Yanmin, Peter Zijlstra, Avi Kivity, Sheng Yang,
	linux-kernel, kvm, Marcelo Tosatti, Jes Sorensen, Gleb Natapov,
	Zachary Amsden, zhiteng.huang, Frédéric Weisbecker,
	Arnaldo Carvalho de Melo

On Sun, Mar 21, 2010 at 07:43:00PM +0100, Ingo Molnar wrote:
> Having access to the actual executable files that include the symbols achieves 
> precisely that - with the additional robustness that all this functionality is 
> concentrated into the host, while the guest side is kept minimal (and 
> transparent).

If you want to access the guest's file-system you need a piece of
software running in the guest which gives you this access. But when you
get an event this piece of software may not be runnable (if the guest is
in an interrupt handler or any other non-preemptible code path). When the
host finally gets access to the guest's filesystem again the source of
that event may already be gone (process has exited, module unloaded...).
The only way to solve that is to pass the event information to the guest
immediately and let it collect the information we want.


> It can decide whether it exposes the files. Nor are there any "security 
> issues" to begin with.

I am not talking about security. Security was sufficiently flamed about
already.

> You need to be aware of the fact that symbol resolution is a separate step 
> from call chain generation.

Same concern as above applies to call-chain generation too.

> > How we speak to the guest was already discussed in this thread. My personal 
> > opinion is that going through qemu is an unnecessary step and we can solve 
> > that more clever and transparent for perf.
> 
> Meaning exactly what?

Avi was against that, but I think it would make sense to give names to
virtual machines (with a default, similar to network interface names).
Then we can create a directory in /dev/ with that name (e.g.
/dev/vm/fedora/). Inside the guest a (privileged) process can create
some kind of named virt-pipe which results in a device file created in
the guest's directory (perf could create /dev/vm/fedora/perf for
example). This file is used for guest-host communication.
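
Purely as an illustration of that naming scheme (none of these paths
exist today, and "fedora" is just an example VM name):

  /dev/vm/fedora/          # one directory per named guest
  /dev/vm/fedora/perf      # virt-pipe created from inside the guest

  # host side: a tool such as perf could then talk to that guest with
  # nothing more than ordinary reads and writes
  echo "kallsyms" > /dev/vm/fedora/perf
  cat /dev/vm/fedora/perf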

Thanks,

	Joerg


^ permalink raw reply	[flat|nested] 390+ messages in thread

* Re: [PATCH] Enhance perf to collect KVM guest os statistics from host side
  2010-03-22 10:14         ` Joerg Roedel
@ 2010-03-22 10:37           ` Ingo Molnar
  2010-03-22 10:59           ` Ingo Molnar
  1 sibling, 0 replies; 390+ messages in thread
From: Ingo Molnar @ 2010-03-22 10:37 UTC (permalink / raw)
  To: Joerg Roedel
  Cc: Zhang, Yanmin, Peter Zijlstra, Avi Kivity, Sheng Yang,
	linux-kernel, kvm, Marcelo Tosatti, Jes Sorensen, Gleb Natapov,
	Zachary Amsden, zhiteng.huang, Frédéric Weisbecker,
	Arnaldo Carvalho de Melo


* Joerg Roedel <joro@8bytes.org> wrote:

> > It can decide whether it exposes the files. Nor are there any "security 
> > issues" to begin with.
> 
> I am not talking about security. [...]

You were talking about security, in the portion of your mail that you snipped 
out, and which i replied to:

> >      2. The guest can decide on its own if it wants to pass this
> >         information to the host-perf. No security issues at all.

I understood that portion to mean what it says: that you claim your 
proposal 'has no security issues at all', in contrast to my suggestion.

> [...] Security was sufficiently flamed about already.

All i saw was my suggestion to allow a guest to securely (and scalably and 
conveniently) integrate/mount its filesystems to the host if both sides (both 
the host and the guest) permit it, to make it easier for instrumentation to 
pick up symbol details.

I.e. if a guest runs then its filesystem may be present on the host side as:

   /guests/Fedora-G1/
   /guests/Fedora-G1/proc/
   /guests/Fedora-G1/usr/
   /guests/Fedora-G1/.../

( This feature would be configurable and would be default-off, to maintain the 
  current status quo. )

i.e. it's a bit like sshfs or NFS or loopback block mounts, just in an 
integrated and working fashion (sshfs doesn't work well with /proc for example) 
and more guest-transparent (obviously sshfs or NFS exports need per-guest 
configuration), and lower overhead than sshfs/NFS - i.e. without the 
(unnecessary) networking overhead.

That suggestion was 'countered' by an unsubstantiated claim by Anthony that 
this kind of usability feature would somehow be a 'security nightmare'.

In reality it is just an incremental, more usable, faster and more 
guest-transparent form of what is already possible today via:

  - loopback mounts on host
  - NFS exports
  - SMB exports
  - sshfs
  - (and other mechanisms)
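
E.g. the manual status quo with a raw disk image looks roughly like this 
(image path and partition offset are examples):

  mkdir -p /guests/Fedora-G1
  mount -o loop,ro,offset=32256 /images/Fedora-G1.img /guests/Fedora-G1

  ls /guests/Fedora-G1/usr/lib         # guest binaries become reachable
  umount /guests/Fedora-G1

  # note: the guest's live /proc and /sys are of course not visible this
  # way - which is part of what the integrated mount would add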

I wish there was at least flaming about it - as flames tend to have at least 
some specifics in them.

What i saw instead was a claim about a 'security nightmare', which, when i 
asked for specifics, was followed by deafening silence. And you appear to have 
repeated that claim here, unwilling to back it up with specifics.

Thanks,

	Ingo

^ permalink raw reply	[flat|nested] 390+ messages in thread

* Re: [PATCH] Enhance perf to collect KVM guest os statistics from host side
  2010-03-22 10:14         ` Joerg Roedel
  2010-03-22 10:37           ` Ingo Molnar
@ 2010-03-22 10:59           ` Ingo Molnar
  2010-03-22 11:47             ` Joerg Roedel
  1 sibling, 1 reply; 390+ messages in thread
From: Ingo Molnar @ 2010-03-22 10:59 UTC (permalink / raw)
  To: Joerg Roedel
  Cc: Zhang, Yanmin, Peter Zijlstra, Avi Kivity, Sheng Yang,
	linux-kernel, kvm, Marcelo Tosatti, Jes Sorensen, Gleb Natapov,
	Zachary Amsden, zhiteng.huang, Frédéric Weisbecker,
	Arnaldo Carvalho de Melo


* Joerg Roedel <joro@8bytes.org> wrote:

> On Sun, Mar 21, 2010 at 07:43:00PM +0100, Ingo Molnar wrote:
> > Having access to the actual executable files that include the symbols achieves 
> > precisely that - with the additional robustness that all this functionality is 
> > concentrated into the host, while the guest side is kept minimal (and 
> > transparent).
> 
> If you want to access the guests file-system you need a piece of software 
> running in the guest which gives you this access. But when you get an event 
> this piece of software may not be runnable (if the guest is in an interrupt 
> handler or any other non-preemptible code path). When the host finally gets 
> access to the guests filesystem again the source of that event may already 
> be gone (process has exited, module unloaded...). The only way to solve that 
> is to pass the event information to the guest immediately and let it collect 
> the information we want.

The very same is true of profiling in the host space as well (KVM is nothing 
special here, other than its unreasonable insistence on not enumerating 
readily available information in a more usable way).

So are you suggesting a solution to a perf problem we already solved 
differently? (and which i argue we solved in a better way)

We have solved that in the host space already (and quite elaborately so), and 
not via your suggestion of moving symbol resolution to a different stage, but 
by properly generating the right events to allow the post-processing stage to 
see processes that have already exited, to robustly handle files that have 
been rebuilt, etc.

From an instrumentation POV it is fundamentally better to acquire the right 
data and delay any complexities to the analysis stage (the perf model) than to 
complicate sampling (the oprofile dcookies model).

Your proposal of 'doing the symbol resolution in the guest context' is in 
essence re-arguing a very similar point to the one oprofile lost. Did you really 
intend to re-argue that point as well? If yes then please propose an 
alternative implementation for everything that perf does wrt. symbol lookups.

What we propose for 'perf kvm' right now is simply a straight-forward 
extension of the existing (and well working) symbol handling code to 
virtualization.

> > You need to be aware of the fact that symbol resolution is a separate step 
> > from call chain generation.
> 
> Same concern as above applies to call-chain generation too.

It would be best if you demonstrated any problems you are aware of in the perf 
symbol lookup code on the host side, as it has exactly the design you are 
criticising here. We are eager to fix any bugs in it.

If you claim that it's buggy then that should very much be demonstrable - no 
need to go into theoretical arguments about it.

( You should be aware of the fact that perf currently works with 'processes
  exiting prematurely' and similar scenarios just fine, so if you want to
  demonstrate that it's broken you will probably need a different example. )

> > > How we speak to the guest was already discussed in this thread. My 
> > > personal opinion is that going through qemu is an unnecessary step and 
> > > we can solve that more clever and transparent for perf.
> > 
> > Meaning exactly what?
> 
> Avi was against that but I think it would make sense to give names to 
> virtual machines (with a default, similar to network interface names). Then 
> we can create a directory in /dev/ with that name (e.g. /dev/vm/fedora/). 
> Inside the guest a (privileged) process can create some kind of named 
> virt-pipe which results in a device file created in the guest's directory 
> (perf could create /dev/vm/fedora/perf for example). This file is used for 
> guest-host communication.

That is kind of half of my suggestion - the built-in enumeration of guests and a 
guaranteed channel to them accessible to tools. (KVM already has its own 
special channel so it's not like channels of communication are useless.)

The other half of my suggestion is that if we bring this thought to its 
logical conclusion then we might as well walk the whole mile and not use 
quirky, binary-API single-channel pipes. I.e. we could use this convenient, 
human-readable, structured, hierarchical abstraction to expose information in 
a fine-grained, scalable way, which has a world-class implementation in Linux: 
the 'VFS namespace'.
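
To make the contrast concrete (purely illustrative, neither interface 
exists in this form):

  # single-channel pipe model: one opaque per-guest device, private protocol
  echo "get-kallsyms" > /dev/vm/fedora/perf

  # VFS-namespace model: the same information enumerable with plain tools
  ls /guests/fedora/
  cat /guests/fedora/proc/kallsyms | head
  cat /guests/fedora/sys/module/ext4/notes/.note.gnu.build-id > /tmp/buildid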

Thanks,

	Ingo

^ permalink raw reply	[flat|nested] 390+ messages in thread

* Re: [RFC] Unify KVM kernel-space and user-space code into a single project
  2010-03-21 20:31                                                                                 ` Ingo Molnar
  2010-03-21 21:30                                                                                   ` Avi Kivity
@ 2010-03-22 11:10                                                                                   ` Joerg Roedel
  2010-03-22 12:22                                                                                     ` Ingo Molnar
  1 sibling, 1 reply; 390+ messages in thread
From: Joerg Roedel @ 2010-03-22 11:10 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Avi Kivity, Anthony Liguori, Pekka Enberg, Zhang, Yanmin,
	Peter Zijlstra, Sheng Yang, linux-kernel, kvm, Marcelo Tosatti,
	Jes Sorensen, Gleb Natapov, Zachary Amsden, ziteng.huang,
	Arnaldo Carvalho de Melo, Frédéric Weisbecker

On Sun, Mar 21, 2010 at 09:31:21PM +0100, Ingo Molnar wrote:
> Lets see one example of that thought process in action: Oprofile.

Since you are talking so much about oProfile in this thread I think it
is important to mention that the problem with oProfile was not the
repository separation.

The problem was (and is) that the kernel and the user-space parts are
maintained by different people who don't talk to each other or have a
direction where they want to go with the project. Basically the reason
for the oProfile failure is a dysfunctional community. I told the
kernel maintainer several times to also maintain user-space but he
didn't want that.

The situation with KVM is entirely different. Avi commits to kvm.git and
qemu-kvm.git so he maintains both. Anthony is working to integrate the
qemu-kvm changes into upstream qemu. Further, these people work very
closely together and the community around KVM works well too. The
problems that oProfile has are not even in sight for KVM.

	Joerg


^ permalink raw reply	[flat|nested] 390+ messages in thread

* Re: [RFC] Unify KVM kernel-space and user-space code into a single project
  2010-03-22  7:13                                                                                       ` Avi Kivity
@ 2010-03-22 11:14                                                                                         ` Ingo Molnar
  2010-03-22 11:23                                                                                           ` Alexander Graf
  2010-03-22 12:29                                                                                           ` Avi Kivity
  2010-03-24 12:06                                                                                         ` Paolo Bonzini
  1 sibling, 2 replies; 390+ messages in thread
From: Ingo Molnar @ 2010-03-22 11:14 UTC (permalink / raw)
  To: Avi Kivity
  Cc: Pekka Enberg, Anthony Liguori, Zhang, Yanmin, Peter Zijlstra,
	Sheng Yang, linux-kernel, kvm, Marcelo Tosatti, Joerg Roedel,
	Jes Sorensen, Gleb Natapov, Zachary Amsden, ziteng.huang,
	Arnaldo Carvalho de Melo, Frédéric Weisbecker, Gregory Haskins


* Avi Kivity <avi@redhat.com> wrote:

> On 03/21/2010 11:54 PM, Ingo Molnar wrote:
> >* Avi Kivity<avi@redhat.com>  wrote:
> >
> >>On 03/21/2010 10:55 PM, Ingo Molnar wrote:
> >>>Of course you could say the following:
> >>>
> >>>   ' Thanks, I'll mark this for v2.6.36 integration. Note that we are not
> >>>     able to add this to the v2.6.35 kernel queue anymore as the ongoing
> >>>     usability work already takes up all of the project's maintainer and
> >>>     testing bandwidth. If you want the feature to be merged sooner than that
> >>>     then please help us cut down on the TODO and BUGS list that can be found
> >>>     at XYZ. There's quite a few low hanging fruits there. '
> >>That would be shooting at my own foot as well as the contributor's since I
> >>badly want that RCU stuff, and while a GUI would be nice, that itch isn't on
> >>my back.
> >I think this sums up the root cause of all the problems i see with KVM pretty
> >well.
> 
> I think we agree at last.  Neither I nor my employer are interested in 
> running qemu as a desktop-on-desktop tool, therefore I don't invest any 
> effort in that direction, or require it from volunteers.

Obviously your employer at least in part defers to you when it comes to KVM 
priorities.

So, just to make this really clear: _you_ are not interested in running qemu 
as a desktop-on-desktop tool, and subsequently this kind of 
disinterest-for-desktop-usability trickled through the whole KVM stack and 
poisoned your attitude and your contributors' attitude.

Too sad really, and it's doubly sad that you don't see anything wrong with 
that.

> If you think a good GUI is so badly needed, either write one yourself, or 
> convince someone else to do it.

To a certain degree we are trying to do a small bit of that (see this very 
thread) - and you are NAK-ing and objecting the heck out of it via your 
unreasonable microkernelish and server-centric views.

With constant maintainer disinterest it's no wonder a non-desktop-oriented 
KVM becomes a self-fulfilling prophecy: you think the desktop does not matter, 
hence it becomes a reality in KVM space which you can constantly refer back to 
as a 'fact'.

Which i find dishonest and disingenuous at best.

> (btw, why are you interested in desktop-on-desktop? one use case is 
> developers, which don't really need fancy GUIs; a second is people who test 
> out distributions, but that doesn't seem to be a huge population; and a 
> third is people running Windows for some application that doesn't run on 
> Linux - hopefully a small category as well.  Seems to be quite a small 
> target audience, compared to, say, video editing)

I'm interested in desktop-on-desktop because i walk this world with open eyes 
and i care about Linux, and these days qemu-kvm is the first thing a new Linux 
user sees of Linux virtualization. I've seen several people i know in 
person turn away from Linux and go back to Windows or move over to Apple 
because those platforms offered a much more mature solution.

I'd probably turn away from Linux myself if i were a newbie and if i were 
forced to use KVM on the desktop today.

Again, you don't seem to realize that you as a maintainer are at a central 
point where you have the ability to turn the self-fulfilling prophecy that 
'nobody cares about the Linux desktop' into a reality - or where you have the 
ability to prevent it from happening. It is a hugely harmful process, especially 
as you seem to delude yourself into thinking that you have nothing to do with it.

Anyway, it's good you expressed your views about this as this will help the 
chances of a fresh restart. (though those chances are still not too good)

Thanks,
	
	Ingo

^ permalink raw reply	[flat|nested] 390+ messages in thread

* Re: [RFC] Unify KVM kernel-space and user-space code into a single project
  2010-03-22 11:14                                                                                         ` Ingo Molnar
@ 2010-03-22 11:23                                                                                           ` Alexander Graf
  2010-03-22 12:33                                                                                             ` Lukas Kolbe
  2010-03-22 12:29                                                                                           ` Avi Kivity
  1 sibling, 1 reply; 390+ messages in thread
From: Alexander Graf @ 2010-03-22 11:23 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Avi Kivity, Pekka Enberg, Anthony Liguori, Zhang, Yanmin,
	Peter Zijlstra, Sheng Yang, linux-kernel, kvm, Marcelo Tosatti,
	Joerg Roedel, Jes Sorensen, Gleb Natapov, Zachary Amsden,
	ziteng.huang, Arnaldo Carvalho de Melo, Frédéric Weisbecker,
	Gregory Haskins


On 22.03.2010, at 12:14, Ingo Molnar wrote:

> 
> * Avi Kivity <avi@redhat.com> wrote:
> 
>> On 03/21/2010 11:54 PM, Ingo Molnar wrote:
>>> * Avi Kivity<avi@redhat.com>  wrote:
>>> 
>>>> On 03/21/2010 10:55 PM, Ingo Molnar wrote:
>>>>> Of course you could say the following:
>>>>> 
>>>>>  ' Thanks, I'll mark this for v2.6.36 integration. Note that we are not
>>>>>    able to add this to the v2.6.35 kernel queue anymore as the ongoing
>>>>>    usability work already takes up all of the project's maintainer and
>>>>>    testing bandwidth. If you want the feature to be merged sooner than that
>>>>>    then please help us cut down on the TODO and BUGS list that can be found
>>>>>    at XYZ. There's quite a few low hanging fruits there. '
>>>> That would be shooting at my own foot as well as the contributor's since I
>>>> badly want that RCU stuff, and while a GUI would be nice, that itch isn't on
>>>> my back.
>>> I think this sums up the root cause of all the problems i see with KVM pretty
>>> well.
>> 
>> I think we agree at last.  Neither I nor my employer are interested in 
>> running qemu as a desktop-on-desktop tool, therefore I don't invest any 
>> effort in that direction, or require it from volunteers.
> 
> Obviously your employer at least in part defers to you when it comes to KVM 
> priorities.
> 
> So, just to make this really clear: _you_ are not interested in running qemu 
> as a desktop-on-desktop tool, and subsequently this kind of 
> disinterest-for-desktop-usability trickled through the whole KVM stack and 
> poisoned your attitude and your contributors' attitude.
> 
> Too sad really, and it's doubly sad that you don't see anything wrong with 
> that.

Please, don't jump to unjust conclusions.

The whole point is that there's no money behind desktop-on-desktop virtualization. Thus nobody pays people to work on it. Thus nothing significant happens in that space.

If there were someone standing up to create a really decent desktop qemu front-end I'm confident we'd even officially suggest using that. In fact, that whole discussion did come up in the weekly Qemu/KVM community call and everybody agreed strongly that we do need a desktop client.

The problem is just that there is nobody standing up. And I hope you don't expect Avi to be the one creating a GUI.


Alex


^ permalink raw reply	[flat|nested] 390+ messages in thread

* Re: [RFC] Unify KVM kernel-space and user-space code into a single project
  2010-03-22  6:49                                                                                       ` Avi Kivity
@ 2010-03-22 11:23                                                                                         ` Ingo Molnar
  2010-03-22 12:49                                                                                           ` Avi Kivity
  0 siblings, 1 reply; 390+ messages in thread
From: Ingo Molnar @ 2010-03-22 11:23 UTC (permalink / raw)
  To: Avi Kivity
  Cc: Anthony Liguori, Pekka Enberg, Zhang, Yanmin, Peter Zijlstra,
	Sheng Yang, linux-kernel, kvm, Marcelo Tosatti, Joerg Roedel,
	Jes Sorensen, Gleb Natapov, Zachary Amsden, ziteng.huang,
	Arnaldo Carvalho de Melo, Frédéric Weisbecker


* Avi Kivity <avi@redhat.com> wrote:

> IMO the reason perf is more usable than oprofile has less to do with the 
> kernel/userspace boundary and more to do with effort and attention spent on 
> the userspace/user boundary.
>
> [...]

If you are interested in the first-hand experience of the people who are doing 
the perf work then here it is: by far the biggest reason for perf's success and 
usability is the integration of the user-space tooling with the 
kernel-space bits, into a single repository and project.

The very move you are opposing so vehemently for KVM.

Oprofile went the way you proposed, and it was a failure. It failed not 
because it was bad technology (it was pretty decent and people used it), nor 
because the wrong people worked on it (to the contrary, very capable people 
worked on it); it was a failure in hindsight because it was simply split, 
incorrectly, into two projects which stifled each other's progress.

Obviously 3 years ago you'd have seen a similar, big "Oprofile is NOT broken!" 
flamewar, had i posted the same observations about Oprofile that i expressed 
about KVM here. (In fact there was a similar, big flamewar about all this when 
perf was posted a year ago.)

And yes (as you are aware), i see very similar patterns of inefficiency in 
the KVM/Qemu tooling relationship as well, hence i expressed my views about 
it.

Thanks,

	Ingo

^ permalink raw reply	[flat|nested] 390+ messages in thread

* Re: [RFC] Unify KVM kernel-space and user-space code into a single project
  2010-03-22  6:37                                                                                       ` Avi Kivity
@ 2010-03-22 11:39                                                                                         ` Ingo Molnar
  2010-03-22 12:44                                                                                           ` Avi Kivity
  0 siblings, 1 reply; 390+ messages in thread
From: Ingo Molnar @ 2010-03-22 11:39 UTC (permalink / raw)
  To: Avi Kivity
  Cc: Olivier Galibert, Anthony Liguori, Pekka Enberg, Zhang, Yanmin,
	Peter Zijlstra, Sheng Yang, linux-kernel, kvm, Marcelo Tosatti,
	Joerg Roedel, Jes Sorensen, Gleb Natapov, Zachary Amsden,
	ziteng.huang, Arnaldo Carvalho de Melo, Frédéric Weisbecker


* Avi Kivity <avi@redhat.com> wrote:

> On 03/21/2010 10:37 PM, Ingo Molnar wrote:
> >
> >>That includes the guest kernel.  If you can deploy a new kernel in the
> >>guest, presumably you can deploy a userspace package.
> >
> > Note that with perf we can instrument the guest with zero guest-kernel 
> > modifications as well.
> >
> > We try to reduce the guest impact to a bare minimum, as the difficulties 
> > in deployment are a function of the cross-section surface to the guest.
> >
> > Also, note that the kernel is special with regards to instrumentation: 
> > since this is the kernel project, we are doing kernel space changes, as we 
> > are doing them _anyway_. So adding symbol resolution capabilities would be 
> > a minimal addition to that - while adding a whole new guest package for 
> > the daemon would significantly increase the cross-section surface.
> 
> It's true that for us, changing the kernel is easier than changing the rest 
> of the guest.  IMO we should still resist the temptation to go the easy path 
> and do the right thing (I understand we disagree about what the right thing 
> is).

It is not about the 'temptation to go the easy path'.

It is about finding the most pragmatic approach and realizing the cost of 
inaction: sucky Linux, sucky KVM.

Let me give you an example: Linus's commit in v2.6.30 that changed the 
user-space policy of the EXT3 filesystem to make it more desktop capable:

  bbae8bc: ext3: make default data ordering mode configurable

That change was opposed vehemently with your kind of arguments: "such changes 
should be done by the distributions", "it should be done correctly", "the 
kernel should not implement policy", etc.

I can also tell you that this commit improved my desktop experience 
incredibly. Still, distros didnt do it for almost a decade of ext3 existence. 
Why?

Truth is that those kinds of "do it right" arguments are mistaken because they 
assume that we live in an ideal, 'perfect market' where all inefficiencies 
will get eliminated in the long run.

In reality the "market" for OSS software is imperfect:

 - there are marginal costs of action - a too-small change has difficulty 
   getting over them

 - there are costs of modularization (which are both technical and social)

 - there's the power of the status quo acting against marginally good changes

 - there's the power of entropy ripping Linux distributions apart, making
   all-distro changes harder 

So the solution to the "why dont the distributions do this" question you pose 
is exactly what i propose: _give a default, reference implementation of KVM 
tooling that has to be eclipsed_.

The kernel is in the unique position that it can impose sanity in a more 
central way, by acting as a reference implementation.

I.e. the kernel can very much improve quality all across the board by 
providing a sane default (in the ext3 case) - or, as in the case of perf, by 
providing a sane 'baseline' tooling. It should do the same for KVM as well.

If we dont do that, Linux will eventually stop mattering on the desktop - and 
some time after that, it will vanish from the server space as well. Then, be 
it a decade down the line, you wont have a KVM hacking job left, and you 
wont know where all those forces eliminating your project came from.

But i told you now so you'll know ;-)

Reality is, the server space never was and never will be self-sustaining in 
the long run (as Novell found out with Netware); it is the desktop that 
dictates future markets. This is why i find your views about this naive and 
shortsighted.

Thanks,

	Ingo

^ permalink raw reply	[flat|nested] 390+ messages in thread

* Re: [PATCH] Enhance perf to collect KVM guest os statistics from host side
  2010-03-22 10:59           ` Ingo Molnar
@ 2010-03-22 11:47             ` Joerg Roedel
  2010-03-22 12:26               ` Ingo Molnar
  2010-03-23 13:18               ` Soeren Sandmann
  0 siblings, 2 replies; 390+ messages in thread
From: Joerg Roedel @ 2010-03-22 11:47 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Zhang, Yanmin, Peter Zijlstra, Avi Kivity, Sheng Yang,
	linux-kernel, kvm, Marcelo Tosatti, Jes Sorensen, Gleb Natapov,
	Zachary Amsden, zhiteng.huang, Frédéric Weisbecker,
	Arnaldo Carvalho de Melo

On Mon, Mar 22, 2010 at 11:59:27AM +0100, Ingo Molnar wrote:
> Best would be if you demonstrated any problems of the perf symbol lookup code 
> you are aware of on the host side, as it has that exact design you are 
> criticising here. We are eager to fix any bugs in it.
> 
> If you claim that it's buggy then that should very much be demonstrable - no 
> need to go into theoretical arguments about it.

I am not claiming anything. I just try to imagine how your proposal
would look in practice and forgot that symbol resolution is done at
a later point.
But even with deferred symbol resolution we need more information from
the guest than just the rip falling out of KVM. The guest needs to tell
us about the process where the event happened (information that the host
has about itself without any hassle) and which executable files it was
loaded from.

> > Avi was against that but I think it would make sense to give names to 
> > virtual machines (with a default, similar to network interface names). Then 
> > we can create a directory in /dev/ with that name (e.g. /dev/vm/fedora/). 
> > Inside the guest a (privileged) process can create some kind of named 
> > virt-pipe which results in a device file created in the guest's directory 
> > (perf could create /dev/vm/fedora/perf for example). This file is used for 
> > guest-host communication.
> 
> That is kind of half of my suggestion - the built-in enumeration of guests and 
> a guaranteed channel to them, accessible to tools. (KVM already has its own 
> special channel so it's not like channels of communication are useless.)
> 
> The other half of my suggestion is that if we bring this thought to its 
> logical conclusion then we might as well walk the whole mile and not use 
> quirky, binary API single-channel pipes. I.e. we could use this convenient, 
> human-readable, structured, hierarchical abstraction to expose information in 
> a finegrained, scalable way, which has a world-class implementation in Linux: 
> the 'VFS namespace'.

Probably. At least it is the solution that fits best into the current
design of perf. But we should think about how this will be done. Raw
disk access is no solution because we need to access virtual
file-systems of the guest too. Network filesystems may be a solution but
then we come back to the 'deployment-nightmare'.

	Joerg


^ permalink raw reply	[flat|nested] 390+ messages in thread

* Re: [RFC] Unify KVM kernel-space and user-space code into a single project
  2010-03-22  6:35                                                                                               ` Avi Kivity
@ 2010-03-22 11:48                                                                                                 ` Ingo Molnar
  2010-03-22 12:31                                                                                                     ` Pekka Enberg
  2010-03-22 12:36                                                                                                   ` Avi Kivity
  0 siblings, 2 replies; 390+ messages in thread
From: Ingo Molnar @ 2010-03-22 11:48 UTC (permalink / raw)
  To: Avi Kivity
  Cc: Antoine Martin, Olivier Galibert, Anthony Liguori, Pekka Enberg,
	Zhang, Yanmin, Peter Zijlstra, Sheng Yang, linux-kernel, kvm,
	Marcelo Tosatti, oerg Roedel, Jes Sorensen, Gleb Natapov,
	Zachary Amsden, ziteng.huang, Arnaldo Carvalho de Melo,
	Frédéric Weisbecker


* Avi Kivity <avi@redhat.com> wrote:

> > My 10+ years experience with kernel instrumentation solutions is that 
> > kernel-driven, self-sufficient, robust, trustable, well-enumerated sources 
> > of information work far better in practice.
> 
> What about line number information?  And the source?  Into the kernel with 
> them as well?

Sigh. Please read the _very first_ suggestion i made, which solves all that. I 
rarely go into discussions without suggesting technical solutions - i'm not 
interested in flaming, i'm interested in real solutions.

Here it is, repeated for the Nth time:

Allow a guest to (optionally) integrate its VFS namespace with the host side 
as well. An example scheme would be:

   /guests/Fedora-G1/
   /guests/Fedora-G1/proc/
   /guests/Fedora-G1/usr/
   /guests/Fedora-G1/.../
   /guests/OpenSuse-G2/
   /guests/OpenSuse-G2/proc/
   /guests/OpenSuse-G2/usr/
   /guests/OpenSuse-G2/.../

  ( This feature would be configurable and would be default-off, to maintain 
    the current status quo. )

Line number information and the source (dwarf info) and ELF symbols are all 
provided and accessible via such an interface - no need to run any 'symbol 
daemon' on the guest side.

And, obviously, having the guest VFS namespace (optionally) available on the 
host side also has far more uses than perf's symbol needs.
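
As an aside, here is a rough sketch of how that layout could be approximated 
today with nothing but stock tools - sshfs on the host and a reachable sshd 
in the guest; the 'fedora-g1' host name is made up:

   # host side: export the guest's root read-only under the proposed path
   mkdir -p /guests/Fedora-G1
   sshfs -o ro root@fedora-g1:/ /guests/Fedora-G1

   # guest binaries, debuginfo and /proc are now visible host-side
   ls /guests/Fedora-G1/usr/lib/debug
   cat /guests/Fedora-G1/proc/modules

   # tear it down again
   fusermount -u /guests/Fedora-G1

A KVM-driven integration would of course avoid the ssh and network setup 
entirely - the sketch only shows the namespace shape being argued for.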

I was surprised no-one ever came up with such a suggestion - it is so obvious 
to allow the integration of the VFS namespaces. But given your explicitly 
declared indifference to KVM desktop usability, i'm kind of not 
surprised about that anymore.

Thanks,

	Ingo

^ permalink raw reply	[flat|nested] 390+ messages in thread

* Re: [RFC] Unify KVM kernel-space and user-space code into a single project
  2010-03-22  0:16                                                                                       ` Anthony Liguori
@ 2010-03-22 11:59                                                                                         ` Ingo Molnar
  0 siblings, 0 replies; 390+ messages in thread
From: Ingo Molnar @ 2010-03-22 11:59 UTC (permalink / raw)
  To: Anthony Liguori
  Cc: Avi Kivity, Pekka Enberg, Peter Zijlstra, Sheng Yang,
	linux-kernel, kvm, Marcelo Tosatti, oerg Roedel, Jes Sorensen,
	Gleb Natapov, Zachary Amsden, ziteng.huang,
	Arnaldo Carvalho de Melo, Frédéric Weisbecker


* Anthony Liguori <anthony@codemonkey.ws> wrote:

> On 03/21/2010 04:54 PM, Ingo Molnar wrote:
> >* Avi Kivity<avi@redhat.com>  wrote:
> >
> >>On 03/21/2010 10:55 PM, Ingo Molnar wrote:
> >>>Of course you could say the following:
> >>>
> >>>   ' Thanks, I'll mark this for v2.6.36 integration. Note that we are not
> >>>     able to add this to the v2.6.35 kernel queue anymore as the ongoing
> >>>     usability work already takes up all of the project's maintainer and
> >>>     testing bandwidth. If you want the feature to be merged sooner than that
> >>>     then please help us cut down on the TODO and BUGS list that can be found
> >>>     at XYZ. There's quite a few low hanging fruits there. '
> >>That would be shooting at my own foot as well as the contributor's since I
> >>badly want that RCU stuff, and while a GUI would be nice, that itch isn't on
> >>my back.
> >I think this sums up the root cause of all the problems i see with KVM pretty
> >well.
> 
> A good maintainer has to strike a balance between asking more of people than 
> what they initially volunteer and getting people to implement the less fun 
> things that are nonetheless required. [...]

Sorry to be blunt, but i dont think there's a different way to say it: i am a 
user of the software you are maintaining (Qemu) and i dont think you have the 
basis to educate people about what a good maintainer should do to achieve a 
quality end result.

I think you could/should learn much from Linus and others who very much 
require quality to permeate the full dimension of a contribution (including 
usability), beyond the narrow, local scope of the contribution.

Thanks,

	Ingo

^ permalink raw reply	[flat|nested] 390+ messages in thread

* Re: [RFC] Unify KVM kernel-space and user-space code into a single project
  2010-03-21 21:03                                                                                           ` Avi Kivity
  2010-03-21 21:20                                                                                             ` Ingo Molnar
@ 2010-03-22 12:05                                                                                             ` Antoine Martin
  1 sibling, 0 replies; 390+ messages in thread
From: Antoine Martin @ 2010-03-22 12:05 UTC (permalink / raw)
  To: Avi Kivity
  Cc: Olivier Galibert, Ingo Molnar, Anthony Liguori, Pekka Enberg,
	Zhang, Yanmin, Peter Zijlstra, Sheng Yang, linux-kernel, kvm,
	Marcelo Tosatti, oerg Roedel, Jes Sorensen, Gleb Natapov,
	Zachary Amsden, ziteng.huang, Arnaldo Carvalho de Melo,
	Frédéric Weisbecker

[snip]
>>>   I believe that -kernel use will be rare, though.  It's a lot 
>>> easier to keep everything in one filesystem.
>> Well, for what it's worth, I rarely ever use anything else. My 
>> virtual disks are raw so I can loop mount them easily, and I can also 
>> switch my guest kernels from outside... without ever needing to mount 
>> those disks.
>
> Curious, what do you use them for?
Various things; here is one use case which I think is under-used: 
read-only virtual disks with just one network application on them (no 
runlevels, sshd, user accounts, etc.), a hell of a lot easier to maintain 
and secure than a full-blown distro. Want a new kernel? Boot a new VM 
and swap it for the old one with zero downtime (if your network app 
supports this sort of hot-swap - which a lot of cluster apps do).

Another reason for wanting to keep the kernel outside is to limit the 
potential points of failure: remove the partition table, remove the 
bootloader, remove even the ramdisk. It also makes it easier to switch to 
another solution (say UML) or another disk driver (as someone mentioned 
previously).
In virtualized environments I often prefer to remove the ability to load 
kernel modules too, for obvious reasons.
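
For illustration only, a minimal sketch of that kind of setup (paths and the 
partition offset are made up; adjust them to the actual image layout, or use 
kpartx to map the partitions):

   # inspect or tweak the raw image directly from the host, no guest boot needed
   mount -o loop,offset=$((2048*512)),ro guest.img /mnt/guest
   ls /mnt/guest/etc
   umount /mnt/guest

   # boot it with a host-side kernel: no bootloader or in-image kernel involved
   qemu-kvm -drive file=guest.img,if=virtio \
            -kernel /boot/vmlinuz-custom \
            -append 'root=/dev/vda1 ro' -nographic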

Hope this helps.

Antoine

^ permalink raw reply	[flat|nested] 390+ messages in thread

* Re: [RFC] Unify KVM kernel-space and user-space code into a single project
  2010-03-22 11:10                                                                                   ` oerg Roedel
@ 2010-03-22 12:22                                                                                     ` Ingo Molnar
  2010-03-22 13:46                                                                                       ` Joerg Roedel
  0 siblings, 1 reply; 390+ messages in thread
From: Ingo Molnar @ 2010-03-22 12:22 UTC (permalink / raw)
  To: oerg Roedel
  Cc: Avi Kivity, Anthony Liguori, Pekka Enberg, Zhang, Yanmin,
	Peter Zijlstra, Sheng Yang, linux-kernel, kvm, Marcelo Tosatti,
	Jes Sorensen, Gleb Natapov, Zachary Amsden, ziteng.huang,
	Arnaldo Carvalho de Melo, Frédéric Weisbecker


* oerg Roedel <joro@8bytes.org> wrote:

> On Sun, Mar 21, 2010 at 09:31:21PM +0100, Ingo Molnar wrote:
> > Lets see one example of that thought process in action: Oprofile.
> 
> Since you are talking so much about oProfile in this thread I think it is 
> important to mention that the problem with oProfile was not the repository 
> separation.
> 
> The problem was (and is) that the kernel and the user-space parts are 
> maintained by different people [...]

Caused by: repository separation and the inevitable code and social fork a 
decade later.

> [...] who dont talk to each other or have a direction where they want to go 
> with the project. [...]

Caused by: repository separation and the inevitable code and social fork a 
decade later.

> [...] Basically the reason of the oProfile failure is a disfunctional 
> community. [...]

Caused by: repository separation and the inevitable code and social fork a 
decade later.

> [...] I told the kernel-maintainer several times to also maintain 
> user-space but he didn't want that.
> 
> The situation with KVM is entirely different. Avi commits to kvm.git and 
> qemu-kvm.git so he maintains both. [...]

What you fail to realise (or what you fail to know, you werent around when 
Oprofile was written, i was) is that Oprofile _did_ have a functional single 
community when it was written. The tooling and the kernel bits was written by 
the same people.

But a decade is a long time and the drift happened due to the inevitability of 
the repository separation, and due to the _inability_ to reach a sane, usable 
solution within that framework of separation.

So i dont see much of a difference from the Oprofile situation really and i see 
many parallels. I also see similar kinds of desktop usability problems.

The difference is that we dont have KVM with a decade of history and we dont 
have a 'told you so' KVM reimplementation that proves the point. I guess it's 
a matter of time before that happens, because Qemu usability is so 
abysmal today - so i guess we should suspend any discussions until that 
happens, no need to waste time on arguing hypotheticals.

I think you are rationalizing the status quo.

It's as if you argued in 1990 that the unification of East and West Germany 
wouldnt make much sense because despite clear problems and incompatibilites 
and different styles westerners were still allowed to visit eastern relatives 
and they both spoke the same language after all ;-)

Thanks,

	Ingo

^ permalink raw reply	[flat|nested] 390+ messages in thread

* Re: [PATCH] Enhance perf to collect KVM guest os statistics from host side
  2010-03-22 11:47             ` Joerg Roedel
@ 2010-03-22 12:26               ` Ingo Molnar
  2010-03-23 13:18               ` Soeren Sandmann
  1 sibling, 0 replies; 390+ messages in thread
From: Ingo Molnar @ 2010-03-22 12:26 UTC (permalink / raw)
  To: Joerg Roedel
  Cc: Zhang, Yanmin, Peter Zijlstra, Avi Kivity, Sheng Yang,
	linux-kernel, kvm, Marcelo Tosatti, Jes Sorensen, Gleb Natapov,
	Zachary Amsden, zhiteng.huang, Frédéric Weisbecker,
	Arnaldo Carvalho de Melo


* Joerg Roedel <joro@8bytes.org> wrote:

> On Mon, Mar 22, 2010 at 11:59:27AM +0100, Ingo Molnar wrote:
> > Best would be if you demonstrated any problems of the perf symbol lookup code 
> > you are aware of on the host side, as it has that exact design you are 
> > criticising here. We are eager to fix any bugs in it.
> > 
> > If you claim that it's buggy then that should very much be demonstrable - no 
> > need to go into theoretical arguments about it.
> 
> I am not claiming anything. I just try to imagine how your proposal would 
> look in practice and forgot that symbol resolution is done at a later 
> point.
>
> But even with deferred symbol resolution we need more information from the 
> guest than just the rip falling out of KVM. The guest needs to tell us about 
> the process where the event happened (information that the host has about 
> itself without any hassle) and which executable files it was loaded from.

Correct - for full information we need a good paravirt perf integration of the 
kernel bits to pass that through. (I.e. we want to 'integrate' the PID space 
as well, at least within the perf notion of PIDs.)

Initially we can do without that as well.

> Probably. At least it is the solution that fits best into the current design 
> of perf. But we should think about how this will be done. Raw disk access is 
> no solution because we need to access virtual file-systems of the guest too. 
> [...]

I never said anything about 'raw disk access'. Have you seen my proposal of 
(optional) VFS namespace integration? (It can be found, repeated for the Nth 
time, in the mail you replied to.)

Thanks,

	Ingo

^ permalink raw reply	[flat|nested] 390+ messages in thread

* Re: [RFC] Unify KVM kernel-space and user-space code into a single project
  2010-03-22 11:14                                                                                         ` Ingo Molnar
  2010-03-22 11:23                                                                                           ` Alexander Graf
@ 2010-03-22 12:29                                                                                           ` Avi Kivity
  2010-03-22 12:44                                                                                             ` Ingo Molnar
  1 sibling, 1 reply; 390+ messages in thread
From: Avi Kivity @ 2010-03-22 12:29 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Pekka Enberg, Anthony Liguori, Zhang, Yanmin, Peter Zijlstra,
	Sheng Yang, linux-kernel, kvm, Marcelo Tosatti, oerg Roedel,
	Jes Sorensen, Gleb Natapov, Zachary Amsden, ziteng.huang,
	Arnaldo Carvalho de Melo, Frédéric Weisbecker, Gregory Haskins

On 03/22/2010 01:14 PM, Ingo Molnar wrote:
>
>> I think we agree at last.  Neither I nor my employer are interested in
>> running qemu as a desktop-on-desktop tool, therefore I don't invest any
>> effort in that direction, or require it from volunteers.
>>      
> Obviously your employer at least in part defers to you when it comes to KVM
> priorities.
>    

In part, yes.

> So, just to make this really clear, _you_ are not interested in running qemu
> as a desktop-on-desktop tool, subsequently this kind of
> disinterest-for-desktop-usability trickled through the whole KVM stack and
> poisoned your attitude and your contributor's attitude.
>    

I am also disinterested in ppc virtualization, yet it happened.  I am 
disinterested in ia64 virtualization, yet it happened.  I am 
disinterested in s390 virtualization, yet it happened.

Linus doesn't care about virtualization, yet it happened.

I don't tell my contributors what to be interested in, only whether their 
patches are good or not.  I can tell you that Red Hat contributors don't 
work on a desktop kvm GUI not because I discourage them, but because the 
product we are working on does not contain a desktop kvm GUI.  Jan 
Kiszka contributed a lot of debugger features, fixes, and improvements, 
presumably he and/or his employer need that more than a kvm desktop GUI.

I can't see why you see anything wrong with this.  People write patches 
for their own interest, not yours or mine.

> Too sad really and it's doubly sad that you dont feel anything wrong about
> that.
>    

It would be lovely to have a desktop kvm GUI.  I don't feel I have to 
write it myself or compel others to write it.  I don't feel sad about it.

>> If you think a good GUI is so badly needed, either write one yourself, or
>> convince someone else to do it.
>>      
> To a certain degree we are trying to do a small bit of that (see this very
> thread) - and you are NAK-ing and objecting the heck out of it via your
> unreasonable microkernelish and server-centric views.
>    

The perf bits have nothing to do with a GUI or usability for general 
users.  Calling them "unreasonable microkernelish server-centric views" 
is just a way of not addressing them.

> With constant maintainer disinterest there's no wonder a non-desktop-oriented
> KVM becomes a self-fulfilling prophecy: you think the desktop does not matter,
> hence it becomes a reality in KVM space which you can constantly refer back to
> as a 'fact'.
>    

It's a fact that virtualization is happening in the data center, not on 
the desktop.  You think a kvm GUI can become a killer application? fine, 
write one.  You don't need any consent from me as kvm maintainer (if 
patches are needed to kvm that improve the desktop experience, I'll 
accept them, though they'll have to pass my unreasonable microkernelish 
filters).  If you're right then the desktop kvm GUI will be a huge hit 
with zillions of developers and people will drop Windows and switch to 
Linux just to use it.

But my opinion is that it will end up like virtualbox, a nice app that 
you can use to run Windows-on-Linux, but is not all that useful.

> Which i find dishonest and disingenuous at best.
>    

If you're going to use words like 'dishonest' then please don't send me 
any more email.

>> (btw, why are you interested in desktop-on-desktop? one use case is
>> developers, which don't really need fancy GUIs; a second is people who test
>> out distributions, but that doesn't seem to be a huge population; and a
>> third is people running Windows for some application that doesn't run on
> Linux - hopefully a small category as well.  Seems to be quite a small
>> target audience, compared to, say, video editing)
>>      
> I'm interested in desktop-on-desktop because i walk this world with open eyes
> and i care about Linux, and these days qemu-kvm is the first thing a new Linux
> user sees about Linux virtualization. I've observed several people i know in
> person to turn away from Linux and go back to Windows or go over to Apple
> because they had a much more mature solution.
>    

Which distribution are they using?  Most people would see virt-manager 
as the first thing, not open gnome-terminal and start typing in the qemu 
command line.  While it's not perfect, it does have a shiny GUI with 
lots of tabs and buttons.

> I'd probably turn away from Linux myself if i were a newbie and if i were
> forced to use KVM on the desktop today.
>
> Again, you dont seem to realize that you as a maintainer are at a central
> point where you have the ability to turn the self-fulfilling prophecy that
> 'nobody cares about the Linux desktop' into a reality - or where you have the
> ability to prevent it from happening. It is hugely harmful process, especially
> as you seem to delude yourself that you have nothing to do with it.
>    

It doesn't have to be me.  Better to pick someone who has a clue about 
usability to design and guide this effort.  That someone can work on 
qemu, or if they prefer, tools/kvm (we worked hard to avoid making kvm 
tied to a single userspace).

The kvm toolstack is maintained by multiple people - Marcelo and myself 
at the kernel level, Anthony and the other qemu maintainers at the qemu 
single-guest level, Daniel Veillard and Dan Berrange at the libvirt or 
host level, and Cole Robinson at the virt-manager or GUI level.  It's 
unreasonable to ask one person to do all of this, just like Linus 
doesn't maintain the scheduler even though it is so important.

> Anyway, it's good you expressed your views about this as this will help the
> chances of a fresh restart. (which chances are still not too good though)
>    

All that's needed is to find someone with the skills, time, and interest.

-- 
error compiling committee.c: too many arguments to function


^ permalink raw reply	[flat|nested] 390+ messages in thread

* Re: [RFC] Unify KVM kernel-space and user-space code into a single  project
  2010-03-22 11:48                                                                                                 ` Ingo Molnar
@ 2010-03-22 12:31                                                                                                     ` Pekka Enberg
  2010-03-22 12:36                                                                                                   ` Avi Kivity
  1 sibling, 0 replies; 390+ messages in thread
From: Pekka Enberg @ 2010-03-22 12:31 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Avi Kivity, Antoine Martin, Olivier Galibert, Anthony Liguori,
	Zhang, Yanmin, Peter Zijlstra, Sheng Yang, linux-kernel, kvm,
	Marcelo Tosatti, oerg Roedel, Jes Sorensen, Gleb Natapov,
	Zachary Amsden, ziteng.huang, Arnaldo Carvalho de Melo,
	Frédéric Weisbecker

On Mon, Mar 22, 2010 at 1:48 PM, Ingo Molnar <mingo@elte.hu> wrote:
>> What about line number information?  And the source?  Into the kernel with
>> them as well?
>
> Sigh. Please read the _very first_ suggestion i made, which solves all that. I
> rarely go into discussions without suggesting technical solutions - i'm not
> interested in flaming, i'm interested in real solutions.
>
> Here it is, repeated for the Nth time:
>
> Allow a guest to (optionally) integrate its VFS namespace with the host side
> as well. An example scheme would be:
>
>   /guests/Fedora-G1/
>   /guests/Fedora-G1/proc/
>   /guests/Fedora-G1/usr/
>   /guests/Fedora-G1/.../
>   /guests/OpenSuse-G2/
>   /guests/OpenSuse-G2/proc/
>   /guests/OpenSuse-G2/usr/
>   /guests/OpenSuse-G2/.../
>
>  ( This feature would be configurable and would be default-off, to maintain
>    the current status quo. )

Heh, funny. That would also solve my number one gripe with
virtualization these days: how to get files in and out of guests
without having to install extra packages on the guest side and
fiddling with mount points on every single guest image I want to play
with.

                        Pekka

^ permalink raw reply	[flat|nested] 390+ messages in thread

* Re: [RFC] Unify KVM kernel-space and user-space code into a single project
  2010-03-22 11:23                                                                                           ` Alexander Graf
@ 2010-03-22 12:33                                                                                             ` Lukas Kolbe
  0 siblings, 0 replies; 390+ messages in thread
From: Lukas Kolbe @ 2010-03-22 12:33 UTC (permalink / raw)
  To: Alexander Graf
  Cc: Ingo Molnar, Avi Kivity, Pekka Enberg, Anthony Liguori, Zhang,
	Yanmin, Peter Zijlstra, Sheng Yang, linux-kernel, kvm,
	Marcelo Tosatti, oerg Roedel, Jes Sorensen, Gleb Natapov,
	Zachary Amsden, ziteng.huang, Arnaldo Carvalho de Melo,
	Frédéric Weisbecker, Gregory Haskins

On Monday, 2010-03-22, at 12:23 +0100, Alexander Graf wrote:
 
> >> I think we agree at last.  Neither I nor my employer are interested in 
> >> running qemu as a desktop-on-desktop tool, therefore I don't invest any 
> >> effort in that direction, or require it from volunteers.
> > 
> > Obviously your employer at least in part defers to you when it comes to KVM 
> > priorities.
> > 
> > So, just to make this really clear, _you_ are not interested in running qemu 
> > as a desktop-on-desktop tool, subsequently this kind of 
> > disinterest-for-desktop-usability trickled through the whole KVM stack and 
> > poisoned your attitude and your contributor's attitude.
> > 
> > Too sad really and it's doubly sad that you dont feel anything wrong about 
> > that.
> 
> Please, don't jump to unjust conclusions.
> 
> The whole point is that there's no money behind desktop-on-desktop
> virtualization. Thus nobody pays people to work on it. Thus nothing
> significant happens in that space.
> 
> If there was someone standing up to create a really decent desktop
> qemu front-end I'm confident we'd even officially suggest using that.
> In fact, that whole discussion did come up in the weekly Qemu/KVM
> community call and everybody agreed heavily that we do need a desktop
> client.
> 
> The problem is just that there is nobody standing up. And I hope you
> don't expect Avi to be the one creating a GUI.

Besides, Ingo could just go ahead and use libvirt together with
virt-manager. It solves a few of the usability issues he came up with
somewhere in this thread, is available in every current
distribution, and *actually* works quite well for the desktop use case.
It just desperately needs more brainpower and manpower to make it a
competitor to VirtualBox & Co, because it's not as polished and
feature-complete yet. But I bet virt-manager's maintainers welcome patches
to fix and enhance usability. Most of the needed fixes probably wouldn't
touch qemu at all, let alone kvm.
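
For reference, a minimal sketch of that path (names, sizes and paths are made 
up; virt-install is the command-line companion to virt-manager):

   virt-install --connect qemu:///system --name fedora-test \
                --ram 1024 \
                --disk path=/var/lib/libvirt/images/fedora-test.img,size=8 \
                --cdrom /isos/Fedora-12-x86_64-DVD.iso --vnc

Once the guest is defined, virt-manager lists it and provides the graphical 
console plus start/stop controls.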

Sorry to chime in with my opinion, but this whole thread is incredibly
boring and full of non-arguments yet really highly amusing.

-- 
Lukas



^ permalink raw reply	[flat|nested] 390+ messages in thread

* Re: [RFC] Unify KVM kernel-space and user-space code into a single project
  2010-03-22 11:48                                                                                                 ` Ingo Molnar
  2010-03-22 12:31                                                                                                     ` Pekka Enberg
@ 2010-03-22 12:36                                                                                                   ` Avi Kivity
  2010-03-22 12:50                                                                                                       ` Pekka Enberg
  1 sibling, 1 reply; 390+ messages in thread
From: Avi Kivity @ 2010-03-22 12:36 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Antoine Martin, Olivier Galibert, Anthony Liguori, Pekka Enberg,
	Zhang, Yanmin, Peter Zijlstra, Sheng Yang, linux-kernel, kvm,
	Marcelo Tosatti, oerg Roedel, Jes Sorensen, Gleb Natapov,
	Zachary Amsden, ziteng.huang, Arnaldo Carvalho de Melo,
	Frédéric Weisbecker

On 03/22/2010 01:48 PM, Ingo Molnar wrote:
> * Avi Kivity<avi@redhat.com>  wrote:
>
>    
>>> My 10+ years experience with kernel instrumentation solutions is that
>>> kernel-driven, self-sufficient, robust, trustable, well-enumerated sources
>>> of information work far better in practice.
>>>        
>> What about line number information?  And the source?  Into the kernel with
>> them as well?
>>      
> Sigh. Please read the _very first_ suggestion i made, which solves all that. I
> rarely go into discussions without suggesting technical solutions - i'm not
> interested in flaming, i'm interested in real solutions.
>
> Here it is, repeated for the Nth time:
>
> Allow a guest to (optionally) integrate its VFS namespace with the host side
> as well. An example scheme would be:
>
>     /guests/Fedora-G1/
>    

[...]

You're missing something.  This sub-thread is about someone launching a 
kernel with 'qemu -kernel', the kernel lives outside the guest disk 
image, they don't want a custom initrd because it's hard to make.

-- 
error compiling committee.c: too many arguments to function


^ permalink raw reply	[flat|nested] 390+ messages in thread

* Re: [RFC] Unify KVM kernel-space and user-space code into a single project
  2010-03-22 12:31                                                                                                     ` Pekka Enberg
@ 2010-03-22 12:37                                                                                                     ` Daniel P. Berrange
  2010-03-22 12:44                                                                                                       ` Pekka Enberg
  2010-03-22 12:54                                                                                                       ` Ingo Molnar
  -1 siblings, 2 replies; 390+ messages in thread
From: Daniel P. Berrange @ 2010-03-22 12:37 UTC (permalink / raw)
  To: Pekka Enberg
  Cc: Ingo Molnar, Avi Kivity, Antoine Martin, Olivier Galibert,
	Anthony Liguori, Zhang, Yanmin, Peter Zijlstra, Sheng Yang,
	linux-kernel, kvm, Marcelo Tosatti, oerg Roedel, Jes Sorensen,
	Gleb Natapov, Zachary Amsden, ziteng.huang,
	Arnaldo Carvalho de Melo, Frédéric Weisbecker

On Mon, Mar 22, 2010 at 02:31:49PM +0200, Pekka Enberg wrote:
> On Mon, Mar 22, 2010 at 1:48 PM, Ingo Molnar <mingo@elte.hu> wrote:
> >> What about line number information?  And the source?  Into the kernel with
> >> them as well?
> >
> > Sigh. Please read the _very first_ suggestion i made, which solves all that. I
> > rarely go into discussions without suggesting technical solutions - i'm not
> > interested in flaming, i'm interested in real solutions.
> >
> > Here it is, repeated for the Nth time:
> >
> > Allow a guest to (optionally) integrate its VFS namespace with the host side
> > as well. An example scheme would be:
> >
> >   /guests/Fedora-G1/
> >   /guests/Fedora-G1/proc/
> >   /guests/Fedora-G1/usr/
> >   /guests/Fedora-G1/.../
> >   /guests/OpenSuse-G2/
> >   /guests/OpenSuse-G2/proc/
> >   /guests/OpenSuse-G2/usr/
> >   /guests/OpenSuse-G2/.../
> >
> >  ( This feature would be configurable and would be default-off, to maintain
> >    the current status quo. )
> 
> Heh, funny. That would also solve my number one gripe with
> virtualization these days: how to get files in and out of guests
> without having to install extra packages on the guest side and
> fiddling with mount points on every single guest image I want to play
> with.

FYI, for offline guests, you can use libguestfs[1] to access & change files
inside the guest, and get read-only access to a running guest's files. It 
provides access via an interactive shell, APIs in all major languages, and 
also has a FUSE module to expose it directly in the host VFS.  It could 
probably be made to work read-write for running guests too if its agent 
were installed inside the guest & leveraged the new Virtio-Serial channel 
for comms (avoiding any network setup requirements).
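
For example (a rough sketch - the image path and partition name are made up, 
and it assumes a libguestfs recent enough to ship guestfish and guestmount):

   # read-only inspection of a guest image, works on offline or running guests
   guestfish --ro -a /var/lib/libvirt/images/fedora.img <<'EOF'
   run
   mount-ro /dev/sda1 /
   cat /etc/fedora-release
   EOF

   # or expose the image in the host VFS via FUSE
   guestmount --ro -a /var/lib/libvirt/images/fedora.img -m /dev/sda1 /mnt/guest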

Regards,
Daniel

[1] http://libguestfs.org/
-- 
|: Red Hat, Engineering, London    -o-   http://people.redhat.com/berrange/ :|
|: http://libvirt.org -o- http://virt-manager.org -o- http://deltacloud.org :|
|: http://autobuild.org        -o-         http://search.cpan.org/~danberr/ :|
|: GnuPG: 7D3B9505  -o-   F3C9 553F A1DA 4AC2 5648 23C1 B3DF F742 7D3B 9505 :|

^ permalink raw reply	[flat|nested] 390+ messages in thread

* Re: [RFC] Unify KVM kernel-space and user-space code into a single project
  2010-03-22 12:29                                                                                           ` Avi Kivity
@ 2010-03-22 12:44                                                                                             ` Ingo Molnar
  2010-03-22 12:52                                                                                               ` Avi Kivity
  0 siblings, 1 reply; 390+ messages in thread
From: Ingo Molnar @ 2010-03-22 12:44 UTC (permalink / raw)
  To: Avi Kivity
  Cc: Pekka Enberg, Anthony Liguori, Zhang, Yanmin, Peter Zijlstra,
	Sheng Yang, linux-kernel, kvm, Marcelo Tosatti, oerg Roedel,
	Jes Sorensen, Gleb Natapov, Zachary Amsden, ziteng.huang,
	Arnaldo Carvalho de Melo, Frédéric Weisbecker, Gregory Haskins


* Avi Kivity <avi@redhat.com> wrote:

> On 03/22/2010 01:14 PM, Ingo Molnar wrote:
> >
> >>I think we agree at last.  Neither I nor my employer are interested in
> >>running qemu as a desktop-on-desktop tool, therefore I don't invest any
> >>effort in that direction, or require it from volunteers.
> >Obviously your employer at least in part defers to you when it comes to KVM
> >priorities.
> 
> In part, yes.
> 
> > So, just to make this really clear, _you_ are not interested in running 
> > qemu as a desktop-on-desktop tool, subsequently this kind of 
> > disinterest-for-desktop-usability trickled through the whole KVM stack and 
> > poisoned your attitude and your contributor's attitude.
> 
> I am also disinterested in ppc virtualization, yet it happened.  I am 
> disinterested in ia64 virtualization, yet it happened.  I am disinterested 
> in s390 virtualization, yet it happened.
> 
> Linus doesn't care about virtualization, yet it happened.

You should know the answer yourself: the difference is that usability is a 
core quality of any project.

I as a maintainer can be neutral towards a number of features and patch 
attributes that i dont consider key aspects. (although they can grow out to 
become key features in the future. SMP was a fringe thing 15 years ago.)

Usability is not an attribute you can ignore and i for sure am never neutral 
towards usability deficiencies in patches - i consider usability a key 
quality.

> I don't tell my contributors what to be interested in, only whether their 
> patches are good or not. [...]

Whether a feature is usable or not is surely a metric of 'goodness'.

You have restricted your metric of goodness artificially to not include 
usability. You do that by claiming that the user-space tooling of KVM, while 
being functionally absolutely essential for any user to even try out KVM, is 
'separate' and has no quality connection with the kernel bits of KVM.

It is a convenient argument that allows you to do the kernel bits only. It is 
absolutely catastrophic to the user who'd like to see a usable solution and a 
single project that stands behind its tech.

Thus, _today_, after years of neglect, you can claim that none of the dozens 
of usability problems of KVM has anything to do with the features you are 
working on today. They're in a separate project (the so-called 'Qemu' package) 
after all - none of KVM's business.

In reality, if you consider it a single project, then those bugs were all 
usability problems introduced earlier on, years ago, when a piece of 
functionality was exposed via KVM. It adds up, and now you claim they have 
nothing to do with current work.

This is why i consider that line of argument rather dishonest ...

	Ingo

^ permalink raw reply	[flat|nested] 390+ messages in thread

* Re: [RFC] Unify KVM kernel-space and user-space code into a single project
  2010-03-22 12:37                                                                                                     ` Daniel P. Berrange
@ 2010-03-22 12:44                                                                                                       ` Pekka Enberg
  2010-03-22 12:54                                                                                                       ` Ingo Molnar
  1 sibling, 0 replies; 390+ messages in thread
From: Pekka Enberg @ 2010-03-22 12:44 UTC (permalink / raw)
  To: Daniel P. Berrange
  Cc: Ingo Molnar, Avi Kivity, Antoine Martin, Olivier Galibert,
	Anthony Liguori, Zhang, Yanmin, Peter Zijlstra, Sheng Yang,
	linux-kernel, kvm, Marcelo Tosatti, oerg Roedel, Jes Sorensen,
	Gleb Natapov, Zachary Amsden, ziteng.huang,
	Arnaldo Carvalho de Melo, Frédéric Weisbecker

Hi Daniel,

(I'm getting slightly off-topic, sorry about that.)

Daniel P. Berrange wrote:
>>> Here it is, repeated for the Nth time:
>>>
>>> Allow a guest to (optionally) integrate its VFS namespace with the host side
>>> as well. An example scheme would be:
>>>
>>>   /guests/Fedora-G1/
>>>   /guests/Fedora-G1/proc/
>>>   /guests/Fedora-G1/usr/
>>>   /guests/Fedora-G1/.../
>>>   /guests/OpenSuse-G2/
>>>   /guests/OpenSuse-G2/proc/
>>>   /guests/OpenSuse-G2/usr/
>>>   /guests/OpenSuse-G2/.../
>>>
>>>  ( This feature would be configurable and would be default-off, to maintain
>>>    the current status quo. )
>> Heh, funny. That would also solve my number one gripe with
>> virtualization these days: how to get files in and out of guests
>> without having to install extra packages on the guest side and
>> fiddling with mount points on every single guest image I want to play
>> with.
> 
> FYI, for offline guests, you can use libguestfs[1] to access & change files
> inside the guest, and get read-only access to a running guest's files. It 
> provides access via an interactive shell, APIs in all major languages, and 
> also has a FUSE module to expose it directly in the host VFS.  It could 
> probably be made to work read-write for running guests too if its agent 
> were installed inside the guest & leveraged the new Virtio-Serial channel 
> for comms (avoiding any network setup requirements).

Right. Thanks for the pointer.

The use case I am thinking of is working on a userspace project and 
wanting to test a piece of code on multiple distributions before pushing 
it out. That pretty much means being able to pull from the host git 
repository (or push to the guest repo) while the guest is running, maybe 
changing the code a bit and then getting the changes back to the host 
for the final push.

What I do now is I push the changes on the host side to a (private) 
remote branch and do the work through that. But that's a pretty lame 
workaround in my opinion.
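
Spelled out, the workaround looks roughly like this (branch and remote names 
are made up, and it assumes host and guest can both reach the same 'origin'):

   # host: publish the work-in-progress branch
   git push origin HEAD:wip-test

   # guest: fetch it, build and test, push any fixes back
   git fetch origin
   git checkout -B wip-test origin/wip-test
   git push origin wip-test

   # host: pull the guest's changes back before the final push
   git fetch origin
   git merge origin/wip-test

Every round trip goes through the remote, which is exactly the indirection a 
shared host/guest namespace would remove.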

			Pekka

^ permalink raw reply	[flat|nested] 390+ messages in thread

* Re: [RFC] Unify KVM kernel-space and user-space code into a single project
  2010-03-22 11:39                                                                                         ` Ingo Molnar
@ 2010-03-22 12:44                                                                                           ` Avi Kivity
  2010-03-22 12:54                                                                                             ` Daniel P. Berrange
  2010-03-22 14:26                                                                                             ` Ingo Molnar
  0 siblings, 2 replies; 390+ messages in thread
From: Avi Kivity @ 2010-03-22 12:44 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Olivier Galibert, Anthony Liguori, Pekka Enberg, Zhang, Yanmin,
	Peter Zijlstra, Sheng Yang, linux-kernel, kvm, Marcelo Tosatti,
	oerg Roedel, Jes Sorensen, Gleb Natapov, Zachary Amsden,
	ziteng.huang, Arnaldo Carvalho de Melo, Frédéric Weisbecker

On 03/22/2010 01:39 PM, Ingo Molnar wrote:
> Reality is, the server space never was and never will be self-sustaining in
> the long run (as Novell found out with Netware); it is the desktop that
> dictates future markets. This is why i find your views about this naive and
> shortsighted.
>    

Yet Linux is gaining ground in the server and embedded space while 
struggling on the desktop.  Apple is gaining ground on the desktop but 
is invisible on the server side (despite having a nice product - Xserve).

It's true Windows achieved server dominance through its desktop power, 
but I don't think that's what's keeping them there now.

In any case, I'm not going to write a kvm GUI.  It doesn't match my 
skills, interest, or my employer's interest.  If you wish to see a kvm 
GUI you have to write one yourself or convince someone to write it 
(perhaps convince Red Hat to fund such an effort beyond virt-manager).

-- 
error compiling committee.c: too many arguments to function


^ permalink raw reply	[flat|nested] 390+ messages in thread

* Re: [RFC] Unify KVM kernel-space and user-space code into a single project
  2010-03-22 11:23                                                                                         ` Ingo Molnar
@ 2010-03-22 12:49                                                                                           ` Avi Kivity
  2010-03-22 13:01                                                                                               ` Pekka Enberg
  2010-03-22 14:47                                                                                             ` Ingo Molnar
  0 siblings, 2 replies; 390+ messages in thread
From: Avi Kivity @ 2010-03-22 12:49 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Anthony Liguori, Pekka Enberg, Zhang, Yanmin, Peter Zijlstra,
	Sheng Yang, linux-kernel, kvm, Marcelo Tosatti, oerg Roedel,
	Jes Sorensen, Gleb Natapov, Zachary Amsden, ziteng.huang,
	Arnaldo Carvalho de Melo, Frédéric Weisbecker

On 03/22/2010 01:23 PM, Ingo Molnar wrote:
> * Avi Kivity<avi@redhat.com>  wrote:
>
>    
>> IMO the reason perf is more usable than oprofile has less to do with the
>> kernel/userspace boundary and more to do with effort and attention spent on
>> the userspace/user boundary.
>>
>> [...]
>>      
> If you are interested in the first-hand experience of the people who are doing
> the perf work then here it is: by far the biggest reason for perf success and
> perf usability is the integration of the user-space tooling with the
> kernel-space bits, into a single repository and project.
>    

Please take a look at the kvm integration code in qemu as a fraction of 
the whole code base.

> The very move you are opposing so vehemently for KVM.
>    

I don't want to fracture a working community.

> Oprofile went the way you proposed, and it was a failure. It failed not
> because it was bad technology (it was pretty decent and people used it), nor
> because the wrong people worked on it (to the contrary, very capable people
> worked on it); it was a failure, in hindsight, because it was incorrectly
> split into two projects which stifled each other's progress.
>    

Every project that has some kernel footprint, except perf, is split like 
that.  Are they all failures?

Seems like perf is also split, with sysprof being developed outside the 
kernel.  Will you bring sysprof into the kernel?  Will every feature be 
duplicated in perf and sysprof?

-- 
error compiling committee.c: too many arguments to function


^ permalink raw reply	[flat|nested] 390+ messages in thread

* Re: [RFC] Unify KVM kernel-space and user-space code into a single  project
  2010-03-22 12:36                                                                                                   ` Avi Kivity
@ 2010-03-22 12:50                                                                                                       ` Pekka Enberg
  0 siblings, 0 replies; 390+ messages in thread
From: Pekka Enberg @ 2010-03-22 12:50 UTC (permalink / raw)
  To: Avi Kivity
  Cc: Ingo Molnar, Antoine Martin, Olivier Galibert, Anthony Liguori,
	Zhang, Yanmin, Peter Zijlstra, Sheng Yang, linux-kernel, kvm,
	Marcelo Tosatti, oerg Roedel, Jes Sorensen, Gleb Natapov,
	Zachary Amsden, ziteng.huang, Arnaldo Carvalho de Melo,
	Frédéric Weisbecker

On Mon, Mar 22, 2010 at 2:36 PM, Avi Kivity <avi@redhat.com> wrote:
>> Here it is, repeated for the Nth time:
>>
>> Allow a guest to (optionally) integrate its VFS namespace with the host
>> side
>> as well. An example scheme would be:
>>
>>    /guests/Fedora-G1/
>>
>
> [...]
>
> You're missing something.  This sub-thread is about someone launching a
> kernel with 'qemu -kernel', the kernel lives outside the guest disk image,
> they don't want a custom initrd because it's hard to make.

Well, you know, I am missing your point here about initrd. Surely the
guest kernels need to use sys_mount() at some point at which time they
could just tell the host kernel where they can find the mount points?
But maybe we're not talking about that kind of scenario here?

                        Pekka

^ permalink raw reply	[flat|nested] 390+ messages in thread

* Re: [RFC] Unify KVM kernel-space and user-space code into a single project
  2010-03-22 12:44                                                                                             ` Ingo Molnar
@ 2010-03-22 12:52                                                                                               ` Avi Kivity
  2010-03-22 14:32                                                                                                 ` Ingo Molnar
  0 siblings, 1 reply; 390+ messages in thread
From: Avi Kivity @ 2010-03-22 12:52 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Pekka Enberg, Anthony Liguori, Zhang, Yanmin, Peter Zijlstra,
	Sheng Yang, linux-kernel, kvm, Marcelo Tosatti, oerg Roedel,
	Jes Sorensen, Gleb Natapov, Zachary Amsden, ziteng.huang,
	Arnaldo Carvalho de Melo, Frédéric Weisbecker, Gregory Haskins

On 03/22/2010 02:44 PM, Ingo Molnar wrote:
> This is why i consider that line of argument rather dishonest ...
>    

I am not going to reply to any more email from you on this thread.

-- 
error compiling committee.c: too many arguments to function


^ permalink raw reply	[flat|nested] 390+ messages in thread

* Re: [RFC] Unify KVM kernel-space and user-space code into a single project
  2010-03-22 12:37                                                                                                     ` Daniel P. Berrange
  2010-03-22 12:44                                                                                                       ` Pekka Enberg
@ 2010-03-22 12:54                                                                                                       ` Ingo Molnar
  2010-03-22 13:05                                                                                                         ` Daniel P. Berrange
  1 sibling, 1 reply; 390+ messages in thread
From: Ingo Molnar @ 2010-03-22 12:54 UTC (permalink / raw)
  To: Daniel P. Berrange
  Cc: Pekka Enberg, Avi Kivity, Antoine Martin, Olivier Galibert,
	Anthony Liguori, Zhang, Yanmin, Peter Zijlstra, Sheng Yang,
	linux-kernel, kvm, Marcelo Tosatti, oerg Roedel, Jes Sorensen,
	Gleb Natapov, Zachary Amsden, ziteng.huang,
	Arnaldo Carvalho de Melo, Frédéric Weisbecker


* Daniel P. Berrange <berrange@redhat.com> wrote:

> On Mon, Mar 22, 2010 at 02:31:49PM +0200, Pekka Enberg wrote:
> > On Mon, Mar 22, 2010 at 1:48 PM, Ingo Molnar <mingo@elte.hu> wrote:
> > >> What about line number information?  And the source?  Into the kernel with
> > >> them as well?
> > >
> > > Sigh. Please read the _very first_ suggestion i made, which solves all that. I
> > > rarely go into discussions without suggesting technical solutions - i'm not
> > > interested in flaming, i'm interested in real solutions.
> > >
> > > Here it is, repeated for the Nth time:
> > >
> > > Allow a guest to (optionally) integrate its VFS namespace with the host side
> > > as well. An example scheme would be:
> > >
> > >   /guests/Fedora-G1/
> > >   /guests/Fedora-G1/proc/
> > >   /guests/Fedora-G1/usr/
> > >   /guests/Fedora-G1/.../
> > >   /guests/OpenSuse-G2/
> > >   /guests/OpenSuse-G2/proc/
> > >   /guests/OpenSuse-G2/usr/
> > >   /guests/OpenSuse-G2/.../
> > >
> > >  ( This feature would be configurable and would be default-off, to maintain
> > >    the current status quo. )
> > 
> > Heh, funny. That would also solve my number one gripe with virtualization 
> > these days: how to get files in and out of guests without having to 
> > install extra packages on the guest side and fiddling with mount points on 
> > every single guest image I want to play with.
> 
> FYI, for offline guests, you can use libguestfs[1] to access & change files 
> inside the guest, and get read-only access to a running guest's files. It 
> provides access via an interactive shell, APIs in all major languages, and 
> also has a FUSE module to expose it directly in the host VFS.  It could 
> probably be made to work read-write for running guests too if its agent 
> were installed inside the guest & leveraged the new Virtio-Serial channel 
> for comms (avoiding any network setup requirements).
> 
> Regards,
> Daniel
> 
> [1] http://libguestfs.org/

Yes, this is the kind of functionality i'm suggesting.

I'd suggest a different implementation for live guests: to drive this from 
within the live guest side of KVM, i.e. basically a paravirt driver for 
guestfs. You'd pass file API requests to the guest directly, via the KVM ioctl 
or so - and get responses from the guest.

That will give true read-write access and completely coherent (and still 
transparent) VFS integration, with no host-side knowledge needed for the 
guest's low level (raw) filesystem structure. That's a big advantage.

Yes, it needs an 'aware' guest kernel - but that is a one-off transition 
overhead whose cost is zero in the long run. (i.e. all KVM kernels beyond a 
given version would have this ability - otherwise it's guest side distribution 
transparent)

Even 'offline' read-only access could be implemented by booting a minimal 
kernel via qemu -kernel and using a 'ro' boot option. That way you could 
eliminate all lowlevel filesystem knowledge from libguestfs. You could run 
ext4 or btrfs guest filesystems and FAT ones as well - with no restriction.

This would allow 'offline' access to Windows images as well: a FAT or ntfs 
enabled mini-kernel could be booted in read-only mode.
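
For illustration, a rough sketch of the kind of invocation meant here - wrapping
qemu from a small host-side helper - might look like the following. The
mini-kernel, initrd and image paths are hypothetical; only the qemu options
themselves (-kernel, -initrd, -append, -hda, -snapshot, -nographic) are
standard:

  #!/usr/bin/env python
  # Sketch only: boot a stripped-down kernel read-only against a guest image.
  import subprocess

  def inspect_image_readonly(image):
      cmd = [
          "qemu-system-x86_64",
          "-kernel", "/usr/share/inspect/mini-vmlinuz",     # hypothetical mini-kernel (ext4/btrfs/FAT/ntfs built in)
          "-initrd", "/usr/share/inspect/mini-initrd.img",  # hypothetical tiny userspace that exports the mounted tree
          "-append", "ro console=ttyS0 root=/dev/sda1",     # 'ro' so the guest filesystem is mounted read-only
          "-hda", image,
          "-snapshot",                                      # never write to the image, even by accident
          "-nographic",
      ]
      return subprocess.call(cmd)

  if __name__ == "__main__":
      inspect_image_readonly("/var/lib/libvirt/images/guest.img")  # hypothetical image path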

Thanks,

	Ingo

^ permalink raw reply	[flat|nested] 390+ messages in thread

* Re: [RFC] Unify KVM kernel-space and user-space code into a single project
  2010-03-22 12:44                                                                                           ` Avi Kivity
@ 2010-03-22 12:54                                                                                             ` Daniel P. Berrange
  2010-03-22 14:26                                                                                             ` Ingo Molnar
  1 sibling, 0 replies; 390+ messages in thread
From: Daniel P. Berrange @ 2010-03-22 12:54 UTC (permalink / raw)
  To: Avi Kivity
  Cc: Ingo Molnar, Olivier Galibert, Anthony Liguori, Pekka Enberg,
	Zhang, Yanmin, Peter Zijlstra, Sheng Yang, linux-kernel, kvm,
	Marcelo Tosatti, Joerg Roedel, Jes Sorensen, Gleb Natapov,
	Zachary Amsden, ziteng.huang, Arnaldo Carvalho de Melo,
	Frédéric Weisbecker

On Mon, Mar 22, 2010 at 02:44:57PM +0200, Avi Kivity wrote:
> On 03/22/2010 01:39 PM, Ingo Molnar wrote:
> >Reality is, the server space never was and never will be self-sustaining in
> >the long run (as Novell has found it out with Netware), it is the desktop 
> >that
> >dictates future markets. This is why i find your views about this naive and
> >shortsighted.
> >   
> 
> Yet Linux is gaining ground in the server and embedded space while 
> struggling on the desktop.  Apple is gaining ground on the desktop but 
> is invisible on the server side (despite having a nice product - Xserve).
> 
> It's true Windows achieved server dominance through its desktop power,
> but I don't think that's what's keeping them there now.
> 
> In any case, I'm not going to write a kvm GUI.  It doesn't match my 
> skills, interest, or my employer's interest.  If you wish to see a kvm 
> GUI you have to write one yourself or convince someone to write it 
> (perhaps convince Red Hat to fund such an effort beyond virt-manager).

It is planned to add support for SPICE remote desktop to virt-manager
once that matures & is accepted into upstream KVM/QEMU. That will improve
the guest/desktop interaction in many ways compared to VNC or SDL, with
improved display resolution changing, copy+paste between host & guest,
much better graphics performance, etc. 

Development efforts aren't totally ignoring the desktop; it's more that they
are focusing on remoting guest desktops, rather than interaction with the host
desktop, since that's where a lot of the demand is. This benefits single-host
desktop scenarios too, since there's a lot of overlap in the problems
faced there.

Regards,
Daniel
-- 
|: Red Hat, Engineering, London    -o-   http://people.redhat.com/berrange/ :|
|: http://libvirt.org -o- http://virt-manager.org -o- http://deltacloud.org :|
|: http://autobuild.org        -o-         http://search.cpan.org/~danberr/ :|
|: GnuPG: 7D3B9505  -o-   F3C9 553F A1DA 4AC2 5648 23C1 B3DF F742 7D3B 9505 :|

^ permalink raw reply	[flat|nested] 390+ messages in thread

* Re: [RFC] Unify KVM kernel-space and user-space code into a single project
  2010-03-22 12:49                                                                                           ` Avi Kivity
@ 2010-03-22 13:01                                                                                               ` Pekka Enberg
  2010-03-22 14:47                                                                                             ` Ingo Molnar
  1 sibling, 0 replies; 390+ messages in thread
From: Pekka Enberg @ 2010-03-22 13:01 UTC (permalink / raw)
  To: Avi Kivity
  Cc: Ingo Molnar, Anthony Liguori, Zhang, Yanmin, Peter Zijlstra,
	Sheng Yang, linux-kernel, kvm, Marcelo Tosatti, Joerg Roedel,
	Jes Sorensen, Gleb Natapov, Zachary Amsden, ziteng.huang,
	Arnaldo Carvalho de Melo, Frédéric Weisbecker, sandmann

Hi Avi,

On Mon, Mar 22, 2010 at 2:49 PM, Avi Kivity <avi@redhat.com> wrote:
> Seems like perf is also split, with sysprof being developed outside the
> kernel.  Will you bring sysprof into the kernel?  Will every feature be
> duplicated in perf and sysprof?

I am glad you brought it up! Sysprof was historically outside of the
kernel (with its own kernel module, actually). While the GUI was
nice, it was much harder to set up compared to oprofile so it wasn't
all that popular. Things improved slightly when Ingo merged the custom
kernel module but the _userspace_ part of sysprof was lagging behind a
bit. I don't know what the situation is now that they've switched over
to perf syscalls, but you probably get my point.

It would be nice if the two projects merged but I honestly don't see
any fundamental problem with two (or more) co-existing projects.
Friendly competition will ultimately benefit the users (think KDE and
Gnome here).

                        Pekka

^ permalink raw reply	[flat|nested] 390+ messages in thread

* Re: [RFC] Unify KVM kernel-space and user-space code into a single project
  2010-03-22 12:54                                                                                                       ` Ingo Molnar
@ 2010-03-22 13:05                                                                                                         ` Daniel P. Berrange
  2010-03-22 13:23                                                                                                           ` Richard W.M. Jones
  2010-03-22 13:56                                                                                                           ` Ingo Molnar
  0 siblings, 2 replies; 390+ messages in thread
From: Daniel P. Berrange @ 2010-03-22 13:05 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Pekka Enberg, Avi Kivity, Antoine Martin, Olivier Galibert,
	Anthony Liguori, Zhang, Yanmin, Peter Zijlstra, Sheng Yang,
	linux-kernel, kvm, Marcelo Tosatti, Joerg Roedel, Jes Sorensen,
	Gleb Natapov, Zachary Amsden, ziteng.huang,
	Arnaldo Carvalho de Melo, Frédéric Weisbecker

On Mon, Mar 22, 2010 at 01:54:40PM +0100, Ingo Molnar wrote:
> 
> * Daniel P. Berrange <berrange@redhat.com> wrote:
> > 
> > FYI, for offline guests, you can use libguestfs[1] to access & change files 
> > inside the guest, and read-only access to running guests' files. It provides
> > access via an interactive shell, APIs in all major languages, and also has a
> > FUSE module to expose it directly in the host VFS.  It could probably be made
> > to work read-write for running guests too if its agent were installed inside
> > the guest & leveraged the new Virtio-Serial channel for comms (avoiding any
> > network setup requirements).
> > 
> > Regards,
> > Daniel
> > 
> > [1] http://libguestfs.org/
> 
> Yes, this is the kind of functionality i'm suggesting.
> 
> I'd suggest a different implementation for live guests: to drive this from 
> within the live guest side of KVM, i.e. basically a paravirt driver for 
> > guestfs. You'd pass file API requests to the guest directly, via the KVM ioctl
> or so - and get responses from the guest.
> 
> That will give true read-write access and completely coherent (and still 
> transparent) VFS integration, with no host-side knowledge needed for the 
> guest's low level (raw) filesystem structure. That's a big advantage.
> 
> Yes, it needs an 'aware' guest kernel - but that is a one-off transition 
> overhead whose cost is zero in the long run. (i.e. all KVM kernels beyond a 
> given version would have this ability - otherwise it's guest side distribution 
> transparent)
> 
> Even 'offline' read-only access could be implemented by booting a minimal 
> kernel via qemu -kernel and using a 'ro' boot option. That way you could 
> eliminate all lowlevel filesystem knowledge from libguestfs. You could run 
> ext4 or btrfs guest filesystems and FAT ones as well - with no restriction.
> 
> This would allow 'offline' access to Windows images as well: a FAT or ntfs 
> enabled mini-kernel could be booted in read-only mode.

This is close to the way libguestfs already works. It boots QEMU/KVM pointing
to a minimal stripped-down appliance Linux OS image, containing a small agent
it talks to over some form of vmchannel/serial/virtio-serial device. Thus the
kernel in the appliance it runs is the only thing that needs to know about the
filesystem/lvm/dm on-disk formats - libguestfs definitely does not want to be
duplicating this detailed knowledge of the on-disk formats itself. It is doing
full read-write access to the guest filesystem in offline mode - one of the
major use cases is disaster recovery from an unbootable guest OS image.
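
As a concrete illustration of that workflow, a minimal sketch using the
libguestfs Python bindings could look like this (the image path and the device
name are hypothetical, and real code would probe for filesystems instead of
hard-coding /dev/sda1):

  import guestfs

  # Offline access to a (possibly unbootable) guest image via the libguestfs
  # appliance: the appliance kernel, not this script, knows the on-disk formats.
  g = guestfs.GuestFS()
  g.add_drive("/var/lib/libvirt/images/broken-guest.img")  # hypothetical image path
  g.launch()                      # boots the appliance and starts the agent

  g.mount("/dev/sda1", "/")       # device name assumed, not probed
  print(g.cat("/etc/fstab"))      # read a file out of the guest filesystem

  g.touch("/forcefsck")           # read-write: e.g. force an fsck on next boot
  g.sync()                        # flush writes back to the image before exiting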

Regards,
Daniel
-- 
|: Red Hat, Engineering, London    -o-   http://people.redhat.com/berrange/ :|
|: http://libvirt.org -o- http://virt-manager.org -o- http://deltacloud.org :|
|: http://autobuild.org        -o-         http://search.cpan.org/~danberr/ :|
|: GnuPG: 7D3B9505  -o-   F3C9 553F A1DA 4AC2 5648 23C1 B3DF F742 7D3B 9505 :|

^ permalink raw reply	[flat|nested] 390+ messages in thread

* Re: [RFC] Unify KVM kernel-space and user-space code into a single project
  2010-03-22 13:05                                                                                                         ` Daniel P. Berrange
@ 2010-03-22 13:23                                                                                                           ` Richard W.M. Jones
  2010-03-22 14:02                                                                                                             ` Ingo Molnar
  2010-03-22 14:20                                                                                                             ` Joerg Roedel
  2010-03-22 13:56                                                                                                           ` Ingo Molnar
  1 sibling, 2 replies; 390+ messages in thread
From: Richard W.M. Jones @ 2010-03-22 13:23 UTC (permalink / raw)
  To: Daniel P. Berrange
  Cc: Ingo Molnar, Pekka Enberg, Avi Kivity, Antoine Martin,
	Olivier Galibert, Anthony Liguori, Zhang, Yanmin, Peter Zijlstra,
	Sheng Yang, linux-kernel, kvm, Marcelo Tosatti, Joerg Roedel,
	Jes Sorensen, Gleb Natapov, Zachary Amsden, ziteng.huang,
	Arnaldo Carvalho de Melo, Frédéric Weisbecker

On Mon, Mar 22, 2010 at 01:05:13PM +0000, Daniel P. Berrange wrote:
> This is close to the way libguestfs already works. It boots QEMU/KVM pointing
> to a minimal stripped down appliance linux OS image, containing a small agent
> it talks to over some form of vmchannel/serial/virtio-serial device. Thus the
> kernel in the appliance it runs is the only thing that needs to know about the
> filesystem/lvm/dm on-disk formats - libguestfs definitely does not want to be
> duplicating this detailed knowledge of on disk format itself. It is doing
> full read-write access to the guest filesystem in offline mode - one of the
> major use cases is disaster recovery from a unbootable guest OS image.

As Dan said, the 'daemon' part is separate and could be run as a
standard part of a guest install, talking over vmchannel to the host.
The only real issue I can see is adding access control to the daemon
(currently it doesn't need it and doesn't do any).  Doing it this way
you'd be leveraging the ~250,000 lines of existing libguestfs code,
bindings in multiple languages, tools etc.

Rich.

-- 
Richard Jones, Virtualization Group, Red Hat http://people.redhat.com/~rjones
New in Fedora 11: Fedora Windows cross-compiler. Compile Windows
programs, test, and build Windows installers. Over 70 libraries supprt'd
http://fedoraproject.org/wiki/MinGW http://www.annexia.org/fedora_mingw

^ permalink raw reply	[flat|nested] 390+ messages in thread

* Re: [RFC] Unify KVM kernel-space and user-space code into a single project
  2010-03-22 12:22                                                                                     ` Ingo Molnar
@ 2010-03-22 13:46                                                                                       ` Joerg Roedel
  2010-03-22 16:32                                                                                         ` Ingo Molnar
  0 siblings, 1 reply; 390+ messages in thread
From: Joerg Roedel @ 2010-03-22 13:46 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Avi Kivity, Anthony Liguori, Pekka Enberg, Zhang, Yanmin,
	Peter Zijlstra, Sheng Yang, linux-kernel, kvm, Marcelo Tosatti,
	Jes Sorensen, Gleb Natapov, Zachary Amsden, ziteng.huang,
	Arnaldo Carvalho de Melo, Frédéric Weisbecker

On Mon, Mar 22, 2010 at 01:22:28PM +0100, Ingo Molnar wrote:
> 
> * Joerg Roedel <joro@8bytes.org> wrote:
> 
> > [...] Basically the reason of the oProfile failure is a disfunctional 
> > community. [...]
> 
> Caused by: repository separation and the inevitable code and social fork a 
> decade later.

No, the split-repository situation was the smallest problem after all.
It was a community thing. If the community doesn't work, a single-repo
project will also fail. Look at the state of the alpha arch in Linux
today: it is maintained in one repository but nobody really cares about
it. Thus it is miles behind most other archs Linux supports today in
quality and feature completeness.

> What you fail to realise (or what you fail to know, you weren't around when
> Oprofile was written, i was) is that Oprofile _did_ have a functional single
> community when it was written. The tooling and the kernel bits were written by
> the same people.

Yes, this was probably the time when everybody was enthusiastic about
the feature and they could attract lots of developers. But the situation
changed over time.

> So i dont see much of a difference to the Oprofile situation really and i see 
> many parallels. I also see similar kinds of desktop usability problems.

The difference is that KVM has a working community with good developers
and maintainers.

> The difference is that we dont have KVM with a decade of history and we dont 
> have a 'told you so' KVM reimplementation to show that proves the point. I 
> guess it's a matter of time before that happens, because Qemu usability is so 
> abysmal today - so i guess we should suspend any discussions until that
> happens, no need to waste time on arguing hypotheticals.

We actually have lguest which is small. But it lacks functionality and
the developer community KVM has attracted.

> I think you are rationalizing the status quo.

I see that there are issues with KVM today in some areas. You pointed
out the desktop usability already. I personally have trouble with the
qemu-kvm.git because it is unbisectable. But repository unification
doesn't solve the problem here.
The point for a single repository is that it simplifies the development
process. I agree with you here. But the current process of KVM is not
too difficult after all. I don't have to touch qemu sources for most of
my work on KVM.

> It's as if you argued in 1990 that the unification of East and West Germany 
> wouldnt make much sense because despite clear problems and incompatibilities
> and different styles westerners were still allowed to visit eastern relatives 
> and they both spoke the same language after all ;-)

Um, hmm. I don't think these situations have enough in common to compare
them ;-)

	Joerg




^ permalink raw reply	[flat|nested] 390+ messages in thread

* Re: [RFC] Unify KVM kernel-space and user-space code into a single project
  2010-03-22 13:05                                                                                                         ` Daniel P. Berrange
  2010-03-22 13:23                                                                                                           ` Richard W.M. Jones
@ 2010-03-22 13:56                                                                                                           ` Ingo Molnar
  2010-03-22 14:01                                                                                                             ` Richard W.M. Jones
  1 sibling, 1 reply; 390+ messages in thread
From: Ingo Molnar @ 2010-03-22 13:56 UTC (permalink / raw)
  To: Daniel P. Berrange, Richard Jones
  Cc: Pekka Enberg, Avi Kivity, Antoine Martin, Olivier Galibert,
	Anthony Liguori, Zhang, Yanmin, Peter Zijlstra, Sheng Yang,
	linux-kernel, kvm, Marcelo Tosatti, Joerg Roedel, Jes Sorensen,
	Gleb Natapov, Zachary Amsden, ziteng.huang,
	Arnaldo Carvalho de Melo, Frédéric Weisbecker


* Daniel P. Berrange <berrange@redhat.com> wrote:

> On Mon, Mar 22, 2010 at 01:54:40PM +0100, Ingo Molnar wrote:
> > 
> > * Daniel P. Berrange <berrange@redhat.com> wrote:
> > > 
> > > FYI, for offline guests, you can use libguestfs[1] to access & change files 
> > > inside the guest, and read-only access to running guests' files. It provides
> > > access via an interactive shell, APIs in all major languages, and also has a
> > > FUSE module to expose it directly in the host VFS.  It could probably be made
> > > to work read-write for running guests too if its agent were installed inside
> > > the guest & leveraged the new Virtio-Serial channel for comms (avoiding any
> > > network setup requirements).
> > > 
> > > Regards,
> > > Daniel
> > > 
> > > [1] http://libguestfs.org/
> > 
> > Yes, this is the kind of functionality i'm suggesting.
> > 
> > I'd suggest a different implementation for live guests: to drive this from 
> > within the live guest side of KVM, i.e. basically a paravirt driver for 
> > guestfs. You'd pass file API requests to the guest directly, via the KVM ioctl
> > or so - and get responses from the guest.
> > 
> > That will give true read-write access and completely coherent (and still 
> > transparent) VFS integration, with no host-side knowledge needed for the 
> > guest's low level (raw) filesystem structure. That's a big advantage.
> > 
> > Yes, it needs an 'aware' guest kernel - but that is a one-off transition 
> > overhead whose cost is zero in the long run. (i.e. all KVM kernels beyond a 
> > given version would have this ability - otherwise it's guest side distribution 
> > transparent)
> > 
> > Even 'offline' read-only access could be implemented by booting a minimal 
> > kernel via qemu -kernel and using a 'ro' boot option. That way you could 
> > eliminate all lowlevel filesystem knowledge from libguestfs. You could run 
> > ext4 or btrfs guest filesystems and FAT ones as well - with no restriction.
> > 
> > This would allow 'offline' access to Windows images as well: a FAT or ntfs 
> > enabled mini-kernel could be booted in read-only mode.
> 
> This is close to the way libguestfs already works. [...]

[ Oops, you are right - sorry for not looking more closely! I was confused by
  the 'read only' aspect. ]

> [...] It boots QEMU/KVM pointing to a minimal stripped down appliance linux 
> OS image, containing a small agent it talks to over some form of 
> vmchannel/serial/virtio-serial device. Thus the kernel in the appliance it 
> runs is the only thing that needs to know about the filesystem/lvm/dm 
> on-disk formats - libguestfs definitely does not want to be duplicating this 
> detailed knowledge of on disk format itself. It is doing full read-write 
> access to the guest filesystem in offline mode - one of the major use cases 
> is disaster recovery from a unbootable guest OS image.

Just curious: any plans to extend this to include live read/write access as 
well?

I.e. to have the 'agent' (guestfsd) running universally, so that tools such as
perf, and users themselves, could rely on the VFS integration as well, not just disaster
recovery tools?

Without universal access to this feature it's not adequate for instrumentation 
purposes.

One option to achieve that would be to extend Qemu to allow 'qemu daemons' to 
run on the (Linux) guest side. These would be statically linked binaries that 
can run on any Linux system, and which could provide various built-in Qemu 
functionality from the guest side to the host side.

Thanks,

	Ingo

^ permalink raw reply	[flat|nested] 390+ messages in thread

* Re: [RFC] Unify KVM kernel-space and user-space code into a single project
  2010-03-22 13:56                                                                                                           ` Ingo Molnar
@ 2010-03-22 14:01                                                                                                             ` Richard W.M. Jones
  2010-03-22 14:07                                                                                                               ` Ingo Molnar
  0 siblings, 1 reply; 390+ messages in thread
From: Richard W.M. Jones @ 2010-03-22 14:01 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Daniel P. Berrange, Pekka Enberg, Avi Kivity, Antoine Martin,
	Olivier Galibert, Anthony Liguori, Zhang, Yanmin, Peter Zijlstra,
	Sheng Yang, linux-kernel, kvm, Marcelo Tosatti, Joerg Roedel,
	Jes Sorensen, Gleb Natapov, Zachary Amsden, ziteng.huang,
	Arnaldo Carvalho de Melo, Frédéric Weisbecker, libguestfs

On Mon, Mar 22, 2010 at 02:56:47PM +0100, Ingo Molnar wrote:
> Just curious: any plans to extend this to include live read/write access as 
> well?
>
> I.e. to have the 'agent' (guestfsd) running universally, so that
> tools such as perf and by users could rely on the VFS integration as
> well, not just disaster recovery tools?

Totally.  That's not to say there is a definite plan, but we're very
open to doing this.  We already wrote the daemon in such a way that it
doesn't require the appliance part, but could run inside any existing
guest (we've even ported bits of it to Windoze ...).

The only remaining issue is how access control would be handled.  You
obviously wouldn't want anything in the host that can get access to
the vmchannel socket to start sending destructive write commands into
guests.

Rich.

-- 
Richard Jones, Virtualization Group, Red Hat http://people.redhat.com/~rjones
virt-df lists disk usage of guests without needing to install any
software inside the virtual machine.  Supports Linux and Windows.
http://et.redhat.com/~rjones/virt-df/

^ permalink raw reply	[flat|nested] 390+ messages in thread

* Re: [RFC] Unify KVM kernel-space and user-space code into a single project
  2010-03-22 13:23                                                                                                           ` Richard W.M. Jones
@ 2010-03-22 14:02                                                                                                             ` Ingo Molnar
  2010-03-22 14:20                                                                                                             ` oerg Roedel
  1 sibling, 0 replies; 390+ messages in thread
From: Ingo Molnar @ 2010-03-22 14:02 UTC (permalink / raw)
  To: Richard W.M. Jones
  Cc: Daniel P. Berrange, Pekka Enberg, Avi Kivity, Antoine Martin,
	Olivier Galibert, Anthony Liguori, Zhang, Yanmin, Peter Zijlstra,
	Sheng Yang, linux-kernel, kvm, Marcelo Tosatti, Joerg Roedel,
	Jes Sorensen, Gleb Natapov, Zachary Amsden, ziteng.huang,
	Arnaldo Carvalho de Melo, Frédéric Weisbecker


* Richard W.M. Jones <rjones@redhat.com> wrote:

> On Mon, Mar 22, 2010 at 01:05:13PM +0000, Daniel P. Berrange wrote:
> > This is close to the way libguestfs already works. It boots QEMU/KVM pointing
> > to a minimal stripped down appliance linux OS image, containing a small agent
> > it talks to over some form of vmchannel/serial/virtio-serial device. Thus the
> > kernel in the appliance it runs is the only thing that needs to know about the
> > filesystem/lvm/dm on-disk formats - libguestfs definitely does not want to be
> > duplicating this detailed knowledge of on disk format itself. It is doing
> > full read-write access to the guest filesystem in offline mode - one of the
> > major use cases is disaster recovery from a unbootable guest OS image.
> 
> As Dan said, the 'daemon' part is separate and could be run as a standard 
> part of a guest install, talking over vmchannel to the host. The only real 
> issue I can see is adding access control to the daemon (currently it doesn't 
> need it and doesn't do any).  Doing it this way you'd be leveraging the 
> ~250,000 lines of existing libguestfs code, bindings in multiple languages, 
> tools etc.

I think it would be a nice option to allow such guest-side "daemons" to be
executed in the guest context without _any_ guest-side support.

This would be possible by building such minimal daemons that use vmchannel, 
and which are built for generic x86 (maybe even built for 32-bit x86 so that 
they can run on any x86 distro). They could execute as the init task of any 
guest kernel - Qemu could 'blend in / replace' the binary as the init task of 
the guest temporarily - and some simple bootstrap code could then start the 
daemon and start the real init binary (and turn off the 'blending' of the init 
task).

That way any guest could be extended via such Qemu functionality - even 
without any kernel changes. Has anyone thought about (or coded) such a 
solution perhaps?

	Ingo

^ permalink raw reply	[flat|nested] 390+ messages in thread

* Re: [RFC] Unify KVM kernel-space and user-space code into a single project
  2010-03-22 14:01                                                                                                             ` Richard W.M. Jones
@ 2010-03-22 14:07                                                                                                               ` Ingo Molnar
  0 siblings, 0 replies; 390+ messages in thread
From: Ingo Molnar @ 2010-03-22 14:07 UTC (permalink / raw)
  To: Richard W.M. Jones
  Cc: Daniel P. Berrange, Pekka Enberg, Avi Kivity, Antoine Martin,
	Olivier Galibert, Anthony Liguori, Zhang, Yanmin, Peter Zijlstra,
	Sheng Yang, linux-kernel, kvm, Marcelo Tosatti, Joerg Roedel,
	Jes Sorensen, Gleb Natapov, Zachary Amsden, ziteng.huang,
	Arnaldo Carvalho de Melo, Frédéric Weisbecker, libguestfs


* Richard W.M. Jones <rjones@redhat.com> wrote:

> On Mon, Mar 22, 2010 at 02:56:47PM +0100, Ingo Molnar wrote:
> > Just curious: any plans to extend this to include live read/write access as 
> > well?
> >
> > I.e. to have the 'agent' (guestfsd) running universally, so that
> > tools such as perf and by users could rely on the VFS integration as
> > well, not just disaster recovery tools?
> 
> Totally.  That's not to say there is a definite plan, but we're very open to 
> doing this.  We already wrote the daemon in such a way that it doesn't 
> require the appliance part, but could run inside any existing guest (we've 
> even ported bits of it to Windoze ...).
> 
> The only remaining issue is how access control would be handled.  You 
> obviously wouldn't want anything in the host that can get access to the 
> vmchannel socket to start sending destructive write commands into guests.

By default i'd suggest to put it into a maximally restricted mount point. I.e. 
restrict access to only the security context running libguestfs or so.

( Which in practice will be the user starting the guest, so there will be 
  proper protection from other users while still allowing easy access to the 
  user that has access already. )

	Ingo

^ permalink raw reply	[flat|nested] 390+ messages in thread

* Re: [RFC] Unify KVM kernel-space and user-space code into a single project
  2010-03-22 13:23                                                                                                           ` Richard W.M. Jones
  2010-03-22 14:02                                                                                                             ` Ingo Molnar
@ 2010-03-22 14:20                                                                                                             ` Joerg Roedel
  1 sibling, 0 replies; 390+ messages in thread
From: Joerg Roedel @ 2010-03-22 14:20 UTC (permalink / raw)
  To: Richard W.M. Jones
  Cc: Daniel P. Berrange, Ingo Molnar, Pekka Enberg, Avi Kivity,
	Antoine Martin, Olivier Galibert, Anthony Liguori, Zhang, Yanmin,
	Peter Zijlstra, Sheng Yang, linux-kernel, kvm, Marcelo Tosatti,
	Jes Sorensen, Gleb Natapov, Zachary Amsden, ziteng.huang,
	Arnaldo Carvalho de Melo, Frédéric Weisbecker

On Mon, Mar 22, 2010 at 01:23:26PM +0000, Richard W.M. Jones wrote:
> On Mon, Mar 22, 2010 at 01:05:13PM +0000, Daniel P. Berrange wrote:
> > This is close to the way libguestfs already works. It boots QEMU/KVM pointing
> > to a minimal stripped down appliance linux OS image, containing a small agent
> > it talks to over some form of vmchannel/serial/virtio-serial device. Thus the
> > kernel in the appliance it runs is the only thing that needs to know about the
> > filesystem/lvm/dm on-disk formats - libguestfs definitely does not want to be
> > duplicating this detailed knowledge of on disk format itself. It is doing
> > full read-write access to the guest filesystem in offline mode - one of the
> > major use cases is disaster recovery from a unbootable guest OS image.
> 
> As Dan said, the 'daemon' part is separate and could be run as a
> standard part of a guest install, talking over vmchannel to the host.
> The only real issue I can see is adding access control to the daemon
> (currently it doesn't need it and doesn't do any).  Doing it this way
> you'd be leveraging the ~250,000 lines of existing libguestfs code,
> bindings in multiple languages, tools etc.

I think we don't need per-guest-file access control. Probably we could
apply the image-file permissions to all guestfs files. This would cover
the use cases:

	* perf for reading symbol information (needs ro-access only
	  anyway)
	* Desktop like host<->guest file copy

I have not looked into libguestfs yet but I guess this approach is
easier to achieve.
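
A sketch of what that policy could look like on the host side. The daemon and
export machinery around it are imaginary; the only point illustrated is mapping
the image file's owner/group/other permission bits onto the exported guest
files:

  import os

  def guestfs_export_mode(image_path, uid, gid):
      # Map the image file's permissions onto the exported guest files:
      # whoever may write the disk image may write through the export,
      # whoever may only read it gets a read-only view, everyone else nothing.
      st = os.stat(image_path)
      if uid == st.st_uid:
          readable, writable = st.st_mode & 0o400, st.st_mode & 0o200
      elif gid == st.st_gid:
          readable, writable = st.st_mode & 0o040, st.st_mode & 0o020
      else:
          readable, writable = st.st_mode & 0o004, st.st_mode & 0o002
      if not readable:
          return None               # no access to the image, no access to its files
      return "rw" if writable else "ro"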

	Joerg


^ permalink raw reply	[flat|nested] 390+ messages in thread

* Re: [RFC] Unify KVM kernel-space and user-space code into a single project
  2010-03-22 12:44                                                                                           ` Avi Kivity
  2010-03-22 12:54                                                                                             ` Daniel P. Berrange
@ 2010-03-22 14:26                                                                                             ` Ingo Molnar
  2010-03-22 17:29                                                                                               ` Avi Kivity
  1 sibling, 1 reply; 390+ messages in thread
From: Ingo Molnar @ 2010-03-22 14:26 UTC (permalink / raw)
  To: Avi Kivity
  Cc: Olivier Galibert, Anthony Liguori, Pekka Enberg, Zhang, Yanmin,
	Peter Zijlstra, Sheng Yang, linux-kernel, kvm, Marcelo Tosatti,
	Joerg Roedel, Jes Sorensen, Gleb Natapov, Zachary Amsden,
	ziteng.huang, Arnaldo Carvalho de Melo, Frédéric Weisbecker


* Avi Kivity <avi@redhat.com> wrote:

> On 03/22/2010 01:39 PM, Ingo Molnar wrote:
> >
> > Reality is, the server space never was and never will be self-sustaining 
> > in the long run (as Novell has found it out with Netware), it is the 
> > desktop that dictates future markets. This is why i find your views about 
> > this naive and shortsighted.
> 
> Yet Linux is gaining ground in the server and embedded space while 
> struggling on the desktop. [...]

Frankly, Linux is mainly growing in the server space due to:

 1) the server space is technically much simpler than the desktop space. It
    is far easier to code up a server performance feature than to struggle
    through stupid (server-motivated) package boundaries and get something
    done on the desktop. It is far easier to code up a server app as that
    space is well standardized and servers tend to be compartmented.
    Integration between server apps is much less common than integration
    between desktop apps, hence our modularization idiocies cause less
    harm there.

 2) Linux's growth is still feeding on the remains of the destruction of Unix.

Linux is struggling on the desktop due to the desktop's inherent complexity, 
due to the lack of the Unix inertia and due to incompetence, insensitivity, 
intellectual arrogance and shortsightedness of server-centric thinking, like 
your arguments/position displayed in this very thread.

> [...] Apple is gaining ground on the desktop but is invisible on the server 
> side (despite having a nice product - Xserve).

But the thing is, Apple doesnt really care about the server space, yet. It is 
lucrative but it is a side-show: it will fall automatically to the 'winner' of 
the desktop (or gadget) of tomorrow.

Has the quick fall of Banyan Vines or Netware (both excellent all-around 
server products) taught you nothing?

We need a lot more desktop focus in the kernel community. The best method to 
achieve this, that i know of currently, is to simply have kernel developers 
think outside the kernel box and to have them do bits of user-space coding as 
well - and in particular desktop coding. To eat our own dogfood in essence. 
Suffer through crap we cause to user-space. To face the _real_ difficulties of 
users. We seem to have forgotten our roots.

> [...]
>
> It's true Windows achieved server dominance through its desktop power, but
> I don't think that's what's keeping them there now.

What is keeping them there is precisely that.

> In any case, I'm not going to write a kvm GUI.  It doesn't match my skills, 
> interest, or my employer's interest.  If you wish to see a kvm GUI you have 
> to write one yourself or convince someone to write it (perhaps convince Red 
> Hat to fund such an effort beyond virt-manager).

As a maintainer you certainly dont have to write a single line of code, if you 
dont want to. You 'just' need to care about the big picture and encourage/help 
the flow and balance of the whole project.

	Ingo

^ permalink raw reply	[flat|nested] 390+ messages in thread

* Re: [RFC] Unify KVM kernel-space and user-space code into a single project
  2010-03-22 12:52                                                                                               ` Avi Kivity
@ 2010-03-22 14:32                                                                                                 ` Ingo Molnar
  2010-03-22 14:43                                                                                                   ` Anthony Liguori
  2010-03-22 14:46                                                                                                   ` Avi Kivity
  0 siblings, 2 replies; 390+ messages in thread
From: Ingo Molnar @ 2010-03-22 14:32 UTC (permalink / raw)
  To: Avi Kivity
  Cc: Pekka Enberg, Anthony Liguori, Zhang, Yanmin, Peter Zijlstra,
	Sheng Yang, linux-kernel, kvm, Marcelo Tosatti, Joerg Roedel,
	Jes Sorensen, Gleb Natapov, Zachary Amsden, ziteng.huang,
	Arnaldo Carvalho de Melo, Frédéric Weisbecker, Gregory Haskins


* Avi Kivity <avi@redhat.com> wrote:

> On 03/22/2010 02:44 PM, Ingo Molnar wrote:
> >This is why i consider that line of argument rather dishonest ...
> 
> I am not going to reply to any more email from you on this thread.

Because i pointed out that i consider a line of argument intellectually 
dishonest?

I did not say _you_ as a person are dishonest - doing that would be an ad 
hominem attack against your person. (In fact i dont think you are, to the
contrary)

An argument can certainly be labeled dishonest in a fair discussion and it is 
not a personal attack against you to express my opinion about that.

Thanks,

	Ingo

^ permalink raw reply	[flat|nested] 390+ messages in thread

* Re: [RFC] Unify KVM kernel-space and user-space code into a single project
  2010-03-22 14:32                                                                                                 ` Ingo Molnar
@ 2010-03-22 14:43                                                                                                   ` Anthony Liguori
  2010-03-22 15:55                                                                                                     ` Ingo Molnar
  2010-03-22 14:46                                                                                                   ` Avi Kivity
  1 sibling, 1 reply; 390+ messages in thread
From: Anthony Liguori @ 2010-03-22 14:43 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Avi Kivity, Pekka Enberg, Zhang, Yanmin, Peter Zijlstra,
	Sheng Yang, linux-kernel, kvm, Marcelo Tosatti, Joerg Roedel,
	Jes Sorensen, Gleb Natapov, Zachary Amsden, ziteng.huang,
	Arnaldo Carvalho de Melo, Frédéric Weisbecker, Gregory Haskins

On 03/22/2010 09:32 AM, Ingo Molnar wrote:
> * Avi Kivity<avi@redhat.com>  wrote:
>
>    
>> On 03/22/2010 02:44 PM, Ingo Molnar wrote:
>>      
>>> This is why i consider that line of argument rather dishonest ...
>>>        
>> I am not going to reply to any more email from you on this thread.
>>      
> Because i pointed out that i consider a line of argument intellectually
> dishonest?
>
> I did not say _you_ as a person are dishonest - doing that would be an ad
> honimen attack against your person. (In fact i dont think you are, to the
> contrary)
>
> An argument can certainly be labeled dishonest in a fair discussion and it is
> not a personal attack against you to express my opinion about that.
>    

You're being excessively rude in this thread.  That might be acceptable 
on LKML but it's not how the QEMU and KVM communities behave.  This 
thread is a good example of why LKML has the reputation it has.  Avi and 
I argue all of the time on qemu-devel and kvm-devel and it's never 
degraded into a series of personal attacks like this.

I've been trying very hard to turn this into a productive thread 
attempting to capture your feedback and give clear suggestions about how 
you can achieve your desired functionality.

What are you looking to achieve?  Do you just want to piss and moan
about how terrible you think Avi and I are?  Or do you want to try to 
actually help make things better?

If you want to help make things better, please focus on making 
constructive suggestions and clarifying what you see as requirements.

Regards,

Anthony Liguori


^ permalink raw reply	[flat|nested] 390+ messages in thread

* Re: [RFC] Unify KVM kernel-space and user-space code into a single project
  2010-03-22 14:32                                                                                                 ` Ingo Molnar
  2010-03-22 14:43                                                                                                   ` Anthony Liguori
@ 2010-03-22 14:46                                                                                                   ` Avi Kivity
  2010-03-22 16:08                                                                                                     ` Ingo Molnar
  1 sibling, 1 reply; 390+ messages in thread
From: Avi Kivity @ 2010-03-22 14:46 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Pekka Enberg, Anthony Liguori, Zhang, Yanmin, Peter Zijlstra,
	Sheng Yang, linux-kernel, kvm, Marcelo Tosatti, Joerg Roedel,
	Jes Sorensen, Gleb Natapov, Zachary Amsden, ziteng.huang,
	Arnaldo Carvalho de Melo, Frédéric Weisbecker, Gregory Haskins

On 03/22/2010 04:32 PM, Ingo Molnar wrote:
> * Avi Kivity<avi@redhat.com>  wrote:
>
>    
>> On 03/22/2010 02:44 PM, Ingo Molnar wrote:
>>      
>>> This is why i consider that line of argument rather dishonest ...
>>>        
>> I am not going to reply to any more email from you on this thread.
>>      
> Because i pointed out that i consider a line of argument intellectually
> dishonest?
>
> I did not say _you_ as a person are dishonest - doing that would be an ad
> honimen attack against your person. (In fact i dont think you are, to the
> contrary)
>
> An argument can certainly be labeled dishonest in a fair discussion and it is
> not a personal attack against you to express my opinion about that.
>
>    

Sigh, why am I drawn into this.

A person who uses dishonest arguments is a dishonest person.  When you 
say I use a dishonest argument you are implying I am dishonest.  Why do 
you argue with me at all if you think I am trying to cheat?

If you disagree with me, tell me I am wrong, not dishonest (or that my 
arguments are dishonest).  And this is just one example in this thread.  
Seriously, tools/kvm would cause a loss of developers, not a gain, 
simply because of the style of argument of some people on this list.  
Maybe qemu/kernels is a better idea.

Again, if you want to talk to me, use the same language you'd like to 
hear yourself.  Or maybe years of lkml made you so thick skinned you no 
longer understand how to interact with people.

-- 
error compiling committee.c: too many arguments to function


^ permalink raw reply	[flat|nested] 390+ messages in thread

* Re: [RFC] Unify KVM kernel-space and user-space code into a single project
  2010-03-22 12:49                                                                                           ` Avi Kivity
  2010-03-22 13:01                                                                                               ` Pekka Enberg
@ 2010-03-22 14:47                                                                                             ` Ingo Molnar
  2010-03-22 18:15                                                                                               ` Avi Kivity
  1 sibling, 1 reply; 390+ messages in thread
From: Ingo Molnar @ 2010-03-22 14:47 UTC (permalink / raw)
  To: Avi Kivity
  Cc: Anthony Liguori, Pekka Enberg, Zhang, Yanmin, Peter Zijlstra,
	Sheng Yang, linux-kernel, kvm, Marcelo Tosatti, Joerg Roedel,
	Jes Sorensen, Gleb Natapov, Zachary Amsden, ziteng.huang,
	Arnaldo Carvalho de Melo, Frédéric Weisbecker


* Avi Kivity <avi@redhat.com> wrote:

> On 03/22/2010 01:23 PM, Ingo Molnar wrote:
> >* Avi Kivity<avi@redhat.com>  wrote:
> >
> >>IMO the reason perf is more usable than oprofile has less to do with the
> >>kernel/userspace boundary and more do to with effort and attention spent on
> >>the userspace/user boundary.
> >>
> >>[...]
> >
> > If you are interested in the first-hand experience of the people who are 
> > doing the perf work then here it is: by far the biggest reason for perf 
> > success and perf usability is the integration of the user-space tooling 
> > with the kernel-space bits, into a single repository and project.
> 
> Please take a look at the kvm integration code in qemu as a fraction of the 
> whole code base.

You have to admit that much of Qemu's past 2-3 years of development was 
motivated by Linux/KVM (i'd say more than 50% of the code). As such it's one 
and the same code base - you just continue to define Qemu to be different from 
KVM.

I very much remember what Qemu looked like _before_ KVM: it was a struggling,
dying project. KVM clearly changed that.

> > The very move you are opposing so vehemently for KVM.
> 
> I don't want to fracture a working community.

Would you accept (or at least not NAK) a new tools/kvm/ tool that builds 
tooling from the ground up, while leaving Qemu untouched? [assuming it's all
clean code, etc.]

Although i have doubts about how well that would work 'against' your opinion: 
such a tool would need lots of KVM-side features and a positive attitude from 
you to be really useful. There's a lot of missing functionality to cover.

> > Oprofile went the way you proposed, and it was a failure. It failed not 
> > because it was bad technology (it was pretty decent and people used it), 
> > it was not a failure because the wrong people worked on it (to the 
> > contrary, very capable people worked on it), it was a failure in hindsight 
> > because it simply incorrectly split into two projects which stifled the
> > progress of each other.
> 
> Every project that has some kernel footprint, except perf, is split like 
> that.  Are they all failures?

No. Did i ever claim KVM was a failure? I said it's hindered by this design 
aspect.

Are other Linux kernel tool projects affected by similar problems? You bet ...

> Seems like perf is also split, with sysprof being developed outside the 
> kernel.  Will you bring sysprof into the kernel?  Will every feature be 
> duplicated in prof and sysprof?

I'd prefer if sysprof merged into perf as 'perf view' - but its maintainer 
does not want that - which is perfectly OK. So we are building equivalent 
functionality into perf instead.

Think about it like Firefox plugins: the main Firefox project picks up the 
functionality of the most popular Firefox plugins all the time. Session Saver, 
Tab Mix Plus, etc. were all in essence 'merged' (in functionality, not in 
code) into the 'reference' Firefox project.

I think that's a fundamentally healthy model: it allows extensions and thus
gives others an honest chance to show that you are potentially coding an
inferior piece of code - but it also expresses a clear opinion about what you
consider a full, usable, high-quality reference implementation, and constantly
improves that reference implementation.

I dont think that can be argued to be a bad model. Yes, it takes a bit of 
thinking outside the box to do tools/kvm/ but of all people i'd expect some of 
that from you.

	Ingo

^ permalink raw reply	[flat|nested] 390+ messages in thread

* Re: [RFC] Unify KVM kernel-space and user-space code into a single project
  2010-03-22 13:01                                                                                               ` Pekka Enberg
  (?)
@ 2010-03-22 14:54                                                                                               ` Ingo Molnar
  2010-03-22 19:04                                                                                                 ` Avi Kivity
  2010-03-23  9:46                                                                                                 ` Olivier Galibert
  -1 siblings, 2 replies; 390+ messages in thread
From: Ingo Molnar @ 2010-03-22 14:54 UTC (permalink / raw)
  To: Pekka Enberg
  Cc: Avi Kivity, Anthony Liguori, Zhang, Yanmin, Peter Zijlstra,
	Sheng Yang, linux-kernel, kvm, Marcelo Tosatti, Joerg Roedel,
	Jes Sorensen, Gleb Natapov, Zachary Amsden, ziteng.huang,
	Arnaldo Carvalho de Melo, Frédéric Weisbecker, sandmann


* Pekka Enberg <penberg@cs.helsinki.fi> wrote:

> Hi Avi,
> 
> On Mon, Mar 22, 2010 at 2:49 PM, Avi Kivity <avi@redhat.com> wrote:
> > Seems like perf is also split, with sysprof being developed outside the
> > kernel.  Will you bring sysprof into the kernel?  Will every feature be
> > duplicated in perf and sysprof?
> 
> I am glad you brought it up! Sysprof was historically outside of the kernel 
> (with it's own kernel module, actually). While the GUI was nice, it was much 
> harder to set up compared to oprofile so it wasn't all that popular. Things 
> improved slightly when Ingo merged the custom kernel module but the 
> _userspace_ part of sysprof was lagging behind a bit. I don't know what's 
> the situation now that they've switched over to perf syscalls but you 
> probably get my point.
> 
> It would be nice if the two projects merged but I honestly don't see any 
> fundamental problem with two (or more) co-existing projects. Friendly 
> competition will ultimately benefit the users (think KDE and Gnome here).

See my previous mail - what i see as the most healthy project model is to have 
a full solution reference implementation, connected to a flexible halo of 
plugins or sub-apps.

Firefox does that, KDE does that, and Gnome as well to a certain degree.

The 'halo' provides a constant feedback of new features, and it also provides 
competition and pressure on the 'main' code to be top-notch.

The problem i see with KVM is that there's no reference implementation! There 
is _only_ the KVM kernel part which is not functional in itself. Surrounded by 
a 'halo' - where none of the entities is really 'the' reference implementation 
we call 'KVM'.

This causes constant quality problems as the developers of the main project 
dont have constant pressure towards good quality (it is not their 
responsibility to care about user-space bits after all), plus it causes a lack 
of focus as well: integration between (friendly) competing user-space 
components is a lot harder than integration within a single framework such as 
Firefox.

I hope this explains my points about modularization a bit better! I suggested 
KVM to grow a user-space tool component in the kernel repo in tools/kvm/, 
which would become the reference implementation for tooling. User-space 
projects can still provide alternative tooling or can plug into this tooling, 
just like they are doing it now. So the main effect isnt even on those 
projects but on the kernel developers. The ABI remains and all the user-space 
packages and projects remain.

Yes, i thought Qemu would be a prime candidate to be the baseline for 
tools/kvm/, but i guess that has become socially impossible now after this 
flamewar. It's not a big problem in the big scheme of things: tools/kvm/ is 
best grown from a small size towards a larger one anyway ...

Thanks,
 
	Ingo

^ permalink raw reply	[flat|nested] 390+ messages in thread

* Re: [RFC] Unify KVM kernel-space and user-space code into a single project
  2010-03-22 14:43                                                                                                   ` Anthony Liguori
@ 2010-03-22 15:55                                                                                                     ` Ingo Molnar
  2010-03-22 16:08                                                                                                       ` Anthony Liguori
  2010-03-22 16:12                                                                                                       ` Avi Kivity
  0 siblings, 2 replies; 390+ messages in thread
From: Ingo Molnar @ 2010-03-22 15:55 UTC (permalink / raw)
  To: Anthony Liguori
  Cc: Avi Kivity, Pekka Enberg, Zhang, Yanmin, Peter Zijlstra,
	Sheng Yang, linux-kernel, kvm, Marcelo Tosatti, Joerg Roedel,
	Jes Sorensen, Gleb Natapov, Zachary Amsden, ziteng.huang,
	Arnaldo Carvalho de Melo, Frédéric Weisbecker, Gregory Haskins


* Anthony Liguori <anthony@codemonkey.ws> wrote:

> [...]
> 
> I've been trying very hard to turn this into a productive thread attempting 
> to capture your feedback and give clear suggestions about how you can solve 
> achieve your desired functionality.

I'm glad that we are at this more productive stage. I'm still trying to 
achieve the very same technological capabilities that i expressed in the first 
few mails when i reviewed the 'perf kvm' patch that was submitted by Yanmin.

The crux of the problem is very simple. To quote my earlier mail:

 |
 | - The inconvenience of having to type:
 |      perf kvm --host --guest --guestkallsyms=/home/ymzhang/guest/kallsyms \
 |               --guestmodules=/home/ymzhang/guest/modules top
 |
 |
 |   is very obvious even with a single guest. Now multiply that by more guests ...
 |

For example we want 'perf kvm top' to do something useful by default: it 
should find the first guest running and it should report its profile.

The tool shouldnt have to guess about where the guests are, what their
namespaces are and how to talk to them. We also want easy symbolic access to
guests, for example:

  perf kvm -g OpenSuse-2 record sleep 1

I.e.:

 - Easy default reference to guest instances, and a way for tools to
   reference them symbolically as well in the multi-guest case. Preferably
   something trustable and kernel-provided - not some indirect information 
   like a PID file created by libvirt-manager or so.

 - Guest-transparent VFS integration into the host, to recover symbols and 
   debug info in binaries, etc.

There were a few responses to that but none really addressed those problems - 
they mostly tried to re-define the problem and suggested that i was wrong to 
want such capabilities and suggested various inferior approaches instead. See 
the thread for the details - i think i covered every technical suggestion that 
was made.

So we are still at an impasse as far as i can see. If i overlooked some 
suggestion that addresses these problems then please let me know ...

Thanks,

	Ingo

^ permalink raw reply	[flat|nested] 390+ messages in thread

* Re: [RFC] Unify KVM kernel-space and user-space code into a single project
  2010-03-22 15:55                                                                                                     ` Ingo Molnar
@ 2010-03-22 16:08                                                                                                       ` Anthony Liguori
  2010-03-22 16:59                                                                                                         ` Ingo Molnar
  2010-03-22 17:11                                                                                                         ` Ingo Molnar
  2010-03-22 16:12                                                                                                       ` Avi Kivity
  1 sibling, 2 replies; 390+ messages in thread
From: Anthony Liguori @ 2010-03-22 16:08 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Avi Kivity, Pekka Enberg, Zhang, Yanmin, Peter Zijlstra,
	Sheng Yang, linux-kernel, kvm, Marcelo Tosatti, Joerg Roedel,
	Jes Sorensen, Gleb Natapov, Zachary Amsden, ziteng.huang,
	Arnaldo Carvalho de Melo, Frédéric Weisbecker, Gregory Haskins

On 03/22/2010 10:55 AM, Ingo Molnar wrote:
> * Anthony Liguori<anthony@codemonkey.ws>  wrote:
>
>    
>> [...]
>>
>> I've been trying very hard to turn this into a productive thread attempting
>> to capture your feedback and give clear suggestions about how you can solve
>> achieve your desired functionality.
>>      
> I'm glad that we are at this more productive stage. I'm still trying to
> achieve the very same technological capabilities that i expressed in the first
> few mails when i reviewed the 'perf kvm' patch that was submitted by Yanmin.
>
> The crux of the problem is very simple. To quote my earlier mail:
>
>   |
>   | - The inconvenience of having to type:
>   |      perf kvm --host --guest --guestkallsyms=/home/ymzhang/guest/kallsyms \
>   |               --guestmodules=/home/ymzhang/guest/modules top
>   |
>   |
>   |   is very obvious even with a single guest. Now multiply that by more guests ...
>   |
>
> For example we want 'perf kvm top' to do something useful by default: it
> should find the first guest running and it should report its profile.
>
> The tool shouldnt have to guess about where the guests are, what their
> namespaces is and how to talk to them. We also want easy symbolic access to
> guest, for example:
>
>    perf kvm -g OpenSuse-2 record sleep 1
>    

Two things are needed.  The first is to be able to enumerate running 
guests and identify a symbolic name.  I have a patch for this and it'll 
be posted this week or so.  perf will need to have a QMP client and it 
will need to look in ${HOME}/.qemu/qmp/ for sockets to connect to.

This is too much to expect from a client and we've got a GSoC idea 
posted to make a nice library for tools to use to simplify this.

The sockets are named based on UUID and you'll have to connect to a 
guest and ask it for its name.  Some guests don't have names so we'll 
have to come up with a clever way to describe a nameless VM.
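
As a rough illustration of what that client side amounts to, the sketch 
below walks ${HOME}/.qemu/qmp/ and asks each guest for its name.  The 
directory layout is just the convention proposed above (the patch isn't 
posted yet); the greeting / "qmp_capabilities" / "query-name" exchange is 
standard QMP, but real code would parse the JSON replies rather than 
dumping them, and would handle errors:

  /* Minimal sketch, not production code: walk ${HOME}/.qemu/qmp/ and issue
   * a QMP "query-name" on every socket found there. */
  #include <stdio.h>
  #include <stdlib.h>
  #include <string.h>
  #include <unistd.h>
  #include <dirent.h>
  #include <sys/socket.h>
  #include <sys/un.h>

  static void qmp_send(int fd, const char *cmd)
  {
      write(fd, cmd, strlen(cmd));
  }

  static void query_guest_name(const char *dir, const char *sock)
  {
      struct sockaddr_un addr = { .sun_family = AF_UNIX };
      char buf[4096];
      ssize_t n;
      int fd = socket(AF_UNIX, SOCK_STREAM, 0);

      snprintf(addr.sun_path, sizeof(addr.sun_path), "%s/%s", dir, sock);
      if (fd < 0 || connect(fd, (struct sockaddr *)&addr, sizeof(addr)) < 0)
          return;

      read(fd, buf, sizeof(buf));               /* QMP greeting */
      qmp_send(fd, "{ \"execute\": \"qmp_capabilities\" }");
      read(fd, buf, sizeof(buf));               /* { "return": {} } */
      qmp_send(fd, "{ \"execute\": \"query-name\" }");
      /* reply looks like: { "return": { "name": "OpenSuse-2" } } */
      n = read(fd, buf, sizeof(buf) - 1);
      if (n > 0) {
          buf[n] = 0;
          printf("%s: %s\n", sock, buf);
      }
      close(fd);
  }

  int main(void)
  {
      char dir[4096];
      struct dirent *de;
      DIR *d;

      snprintf(dir, sizeof(dir), "%s/.qemu/qmp", getenv("HOME"));
      d = opendir(dir);
      if (!d)
          return 1;
      while ((de = readdir(d)) != NULL)
          if (de->d_name[0] != '.')
              query_guest_name(dir, de->d_name);
      closedir(d);
      return 0;
  }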

> I.e.:
>
>   - Easy default reference to guest instances, and a way for tools to
>     reference them symbolically as well in the multi-guest case. Preferably
>     something trustable and kernel-provided - not some indirect information
>     like a PID file created by libvirt-manager or so.
>    

A guest is not a KVM concept.  It's a qemu concept so it needs to be 
something provided by qemu.  The other caveat is that you won't see 
guests created by libvirt because we're implementing this in terms of a 
default QMP device and libvirt will disable defaults.  This is desired 
behaviour.  libvirt wants to be in complete control and doesn't want a 
tool like perf interacting with a guest directly.

>   - Guest-transparent VFS integration into the host, to recover symbols and
>     debug info in binaries, etc.
>    

The way I'd like to see this implemented is a guest userspace daemon.  I 
think having the guest userspace daemon be something that can be updated 
by the host is reasonable.

In terms of exposing that on the host, my preferred approach is QMP.  
I'd be happy with QMP commands that are essentially 
guest_fs_read(filename) and guest_fd_readdir(path).

If desired, one could implement a fuse filesystem that interacted with 
all local qemu instances to expose this on the host.  There are a lot of 
ugly things about fuse though so I think sticking to QMP is best 
(particularly with respect to root access of a fuse filesystem).
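
For what it's worth, the shape of such a bridge is mostly fuse 
boilerplate; the three callbacks below are where the 
guest_fs_read()/guest_fd_readdir() commands (the hypothetical names from 
above) would be issued.  This is a skeleton only - nothing in it actually 
talks QMP:

  /* Skeleton of a host-side fuse bridge; the QMP plumbing is left out and
   * the guest_fs_read()/guest_fd_readdir() command names are the
   * hypothetical ones suggested above. */
  #define FUSE_USE_VERSION 26
  #include <fuse.h>
  #include <string.h>
  #include <sys/stat.h>

  static int guestfs_getattr(const char *path, struct stat *st)
  {
      memset(st, 0, sizeof(*st));
      if (strcmp(path, "/") == 0) {
          st->st_mode = S_IFDIR | 0755;
          st->st_nlink = 2;
      } else {
          st->st_mode = S_IFREG | 0444;   /* would come from the guest agent */
          st->st_nlink = 1;
      }
      return 0;
  }

  static int guestfs_readdir(const char *path, void *buf, fuse_fill_dir_t filler,
                             off_t offset, struct fuse_file_info *fi)
  {
      filler(buf, ".", NULL, 0);
      filler(buf, "..", NULL, 0);
      /* issue guest_fd_readdir(path) over QMP here, one filler() per entry */
      return 0;
  }

  static int guestfs_read(const char *path, char *buf, size_t size, off_t off,
                          struct fuse_file_info *fi)
  {
      /* issue guest_fs_read(path) over QMP here and copy the bytes into buf */
      return 0;
  }

  static struct fuse_operations guestfs_ops = {
      .getattr = guestfs_getattr,
      .readdir = guestfs_readdir,
      .read    = guestfs_read,
  };

  int main(int argc, char *argv[])
  {
      return fuse_main(argc, argv, &guestfs_ops, NULL);
  }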

With just those couple things in place, perf should be able to do 
exactly what you want it to do.

Regards,

Anthony Liguori

> There were a few responses to that but none really addressed those problems -
> they mostly tried to re-define the problem and suggested that i was wrong to
> want such capabilities and suggested various inferior approaches instead. See
> the thread for the details - i think i covered every technical suggestion that
> was made.
>
> So we are still at an impasse as far as i can see. If i overlooked some
> suggestion that addresses these problems then please let me know ...
>
> Thanks,
>
> 	Ingo
>    


^ permalink raw reply	[flat|nested] 390+ messages in thread

* Re: [RFC] Unify KVM kernel-space and user-space code into a single project
  2010-03-22 14:46                                                                                                   ` Avi Kivity
@ 2010-03-22 16:08                                                                                                     ` Ingo Molnar
  2010-03-22 16:13                                                                                                       ` Avi Kivity
  0 siblings, 1 reply; 390+ messages in thread
From: Ingo Molnar @ 2010-03-22 16:08 UTC (permalink / raw)
  To: Avi Kivity
  Cc: Pekka Enberg, Anthony Liguori, Zhang, Yanmin, Peter Zijlstra,
	Sheng Yang, linux-kernel, kvm, Marcelo Tosatti, oerg Roedel,
	Jes Sorensen, Gleb Natapov, Zachary Amsden, ziteng.huang,
	Arnaldo Carvalho de Melo, Frédéric Weisbecker, Gregory Haskins


* Avi Kivity <avi@redhat.com> wrote:

> On 03/22/2010 04:32 PM, Ingo Molnar wrote:
> >* Avi Kivity<avi@redhat.com>  wrote:
> >
> >>On 03/22/2010 02:44 PM, Ingo Molnar wrote:
> >>>This is why i consider that line of argument rather dishonest ...
> >>I am not going to reply to any more email from you on this thread.
> >Because i pointed out that i consider a line of argument intellectually
> >dishonest?
> >
> > I did not say _you_ as a person are dishonest - doing that would be an ad 
> > honimen attack against your person. (In fact i dont think you are, to the 
> > contrary)
> >
> > An argument can certainly be labeled dishonest in a fair discussion and it 
> > is not a personal attack against you to express my opinion about that.
> >
> 
> Sigh, why am I drawn into this.
> 
> A person who uses dishonest arguments is a dishonest person. [...]

That's not how i understood that phrase - and i did not mean to suggest that 
you are dishonest and i do not think that you are dishonest (to the contrary).

Thanks,

	Ingo

^ permalink raw reply	[flat|nested] 390+ messages in thread

* Re: [RFC] Unify KVM kernel-space and user-space code into a single project
  2010-03-22 15:55                                                                                                     ` Ingo Molnar
  2010-03-22 16:08                                                                                                       ` Anthony Liguori
@ 2010-03-22 16:12                                                                                                       ` Avi Kivity
  2010-03-22 16:16                                                                                                         ` Avi Kivity
  2010-03-22 16:51                                                                                                         ` Ingo Molnar
  1 sibling, 2 replies; 390+ messages in thread
From: Avi Kivity @ 2010-03-22 16:12 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Anthony Liguori, Pekka Enberg, Zhang, Yanmin, Peter Zijlstra,
	Sheng Yang, linux-kernel, kvm, Marcelo Tosatti, oerg Roedel,
	Jes Sorensen, Gleb Natapov, Zachary Amsden, ziteng.huang,
	Arnaldo Carvalho de Melo, Frédéric Weisbecker, Gregory Haskins

On 03/22/2010 05:55 PM, Ingo Molnar wrote:
> * Anthony Liguori<anthony@codemonkey.ws>  wrote:
>
>    
>> [...]
>>
>> I've been trying very hard to turn this into a productive thread attempting
>> to capture your feedback and give clear suggestions about how you can solve
>> achieve your desired functionality.
>>      
> I'm glad that we are at this more productive stage. I'm still trying to
> achieve the very same technological capabilities that i expressed in the first
> few mails when i reviewed the 'perf kvm' patch that was submitted by Yanmin.
>    

No, you're not.  You're trying to fracture the qemu community with your 
tools/kvm proposal, you're explaining to me how I'm working on the wrong 
thing by concentrating on things that my employer needs rather than what 
you think kvm needs, and attaching various unsavoury labels to Anthony 
and myself.  Any wonder we aren't getting anything done?

If you can commit to a reasonable conversation we might be able to make 
progress.  Is this actually possible?

> The crux of the problem is very simple. To quote my earlier mail:
>
>   |
>   | - The inconvenience of having to type:
>   |      perf kvm --host --guest --guestkallsyms=/home/ymzhang/guest/kallsyms \
>   |               --guestmodules=/home/ymzhang/guest/modules top
>   |
>   |
>   |   is very obvious even with a single guest. Now multiply that by more guests ...
>   |
>
> For example we want 'perf kvm top' to do something useful by default: it
> should find the first guest running and it should report its profile.
>
> The tool shouldnt have to guess about where the guests are, what their
> namespaces is and how to talk to them. We also want easy symbolic access to
> guest, for example:
>
>    perf kvm -g OpenSuse-2 record sleep 1
>
> I.e.:
>
>   - Easy default reference to guest instances, and a way for tools to
>     reference them symbolically as well in the multi-guest case. Preferably
>     something trustable and kernel-provided - not some indirect information
>     like a PID file created by libvirt-manager or so.
>    

Usually 'layering violation' is trotted out at such suggestions.  I 
don't like using the term, because sometimes the layers are incorrect 
and need to be violated.  But it should be done explicitly, not as a 
shortcut for a minor feature (and profiling is a minor feature, most 
users will never use it, especially guest-from-host).

The fact is we have well defined layers today, kvm virtualizes the cpu 
and memory, qemu emulates devices for a single guest, libvirt manages 
guests.  We break this sometimes but there has to be a good reason.  So 
perf needs to talk to libvirt if it wants names.  Could be done via 
linking, or can be done using a plugin libvirt drops into perf.
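
For reference, the linking variant is only a handful of lines against the 
libvirt C API.  A sketch (read-only connection, running domains only; 
getting from a domain to its qemu process is a further step not shown 
here):

  /* Sketch: enumerate running guests and their names via libvirt, which is
   * where a 'perf kvm -g <name>' style lookup could start.  Build with
   * `pkg-config --cflags --libs libvirt`; assumes libvirtd is reachable. */
  #include <stdio.h>
  #include <libvirt/libvirt.h>

  int main(void)
  {
      virConnectPtr conn = virConnectOpenReadOnly("qemu:///system");
      int ids[128], i, n;

      if (!conn)
          return 1;
      n = virConnectListDomains(conn, ids, 128);    /* running domains only */
      for (i = 0; i < n; i++) {
          virDomainPtr dom = virDomainLookupByID(conn, ids[i]);

          if (!dom)
              continue;
          printf("guest '%s' (id %d)\n", virDomainGetName(dom), ids[i]);
          virDomainFree(dom);
      }
      virConnectClose(conn);
      return 0;
  }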

>   - Guest-transparent VFS integration into the host, to recover symbols and
>     debug info in binaries, etc.
>
> There were a few responses to that but none really addressed those problems -
> they mostly tried to re-define the problem and suggested that i was wrong to
> want such capabilities and suggested various inferior approaches instead. See
> the thread for the details - i think i covered every technical suggestion that
> was made.
>    

You simply kept ignoring me when I said that if something can be kept 
out of the kernel without impacting performance, it should be.  I don't 
want emergency patches closing some security hole or oops in a kernel 
symbol server.

The usability argument is a red herring.  True, it takes time for things 
to trickle down to distributions and users.  Those who can't wait can 
download the code and compile, it isn't that difficult.

> So we are still at an impasse as far as i can see. If i overlooked some
> suggestion that addresses these problems then please let me know ...
>    

The impasse is mostly due to you insisting on doing everything your way, 
in the kernel, and disregarding how libvirt/qemu/kvm does things.  Learn 
the kvm ecosystem, you'll find it is quite easy to contribute code.

-- 
error compiling committee.c: too many arguments to function


^ permalink raw reply	[flat|nested] 390+ messages in thread

* Re: [RFC] Unify KVM kernel-space and user-space code into a single project
  2010-03-22 16:08                                                                                                     ` Ingo Molnar
@ 2010-03-22 16:13                                                                                                       ` Avi Kivity
  0 siblings, 0 replies; 390+ messages in thread
From: Avi Kivity @ 2010-03-22 16:13 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Pekka Enberg, Anthony Liguori, Zhang, Yanmin, Peter Zijlstra,
	Sheng Yang, linux-kernel, kvm, Marcelo Tosatti, oerg Roedel,
	Jes Sorensen, Gleb Natapov, Zachary Amsden, ziteng.huang,
	Arnaldo Carvalho de Melo, Frédéric Weisbecker, Gregory Haskins

On 03/22/2010 06:08 PM, Ingo Molnar wrote:
> * Avi Kivity<avi@redhat.com>  wrote:
>
>    
>> On 03/22/2010 04:32 PM, Ingo Molnar wrote:
>>      
>>> * Avi Kivity<avi@redhat.com>   wrote:
>>>
>>>        
>>>> On 03/22/2010 02:44 PM, Ingo Molnar wrote:
>>>>          
>>>>> This is why i consider that line of argument rather dishonest ...
>>>>>            
>>>> I am not going to reply to any more email from you on this thread.
>>>>          
>>> Because i pointed out that i consider a line of argument intellectually
>>> dishonest?
>>>
>>> I did not say _you_ as a person are dishonest - doing that would be an ad
>>> honimen attack against your person. (In fact i dont think you are, to the
>>> contrary)
>>>
>>> An argument can certainly be labeled dishonest in a fair discussion and it
>>> is not a personal attack against you to express my opinion about that.
>>>
>>>        
>> Sigh, why am I drawn into this.
>>
>> A person who uses dishonest arguments is a dishonest person. [...]
>>      
> That's not how i understood that phrase - and i did not mean to suggest that
> you are dishonest and i do not think that you are dishonest (to the contrary).
>    

Word games.

-- 
error compiling committee.c: too many arguments to function


^ permalink raw reply	[flat|nested] 390+ messages in thread

* Re: [RFC] Unify KVM kernel-space and user-space code into a single project
  2010-03-22 16:12                                                                                                       ` Avi Kivity
@ 2010-03-22 16:16                                                                                                         ` Avi Kivity
  2010-03-22 16:40                                                                                                             ` Pekka Enberg
  2010-03-22 16:51                                                                                                         ` Ingo Molnar
  1 sibling, 1 reply; 390+ messages in thread
From: Avi Kivity @ 2010-03-22 16:16 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Anthony Liguori, Pekka Enberg, Zhang, Yanmin, Peter Zijlstra,
	Sheng Yang, linux-kernel, kvm, Marcelo Tosatti, oerg Roedel,
	Jes Sorensen, Gleb Natapov, Zachary Amsden, ziteng.huang,
	Arnaldo Carvalho de Melo, Frédéric Weisbecker, Gregory Haskins

On 03/22/2010 06:12 PM, Avi Kivity wrote:
>> There were a few responses to that but none really addressed those 
>> problems -
>> they mostly tried to re-define the problem and suggested that i was 
>> wrong to
>> want such capabilities and suggested various inferior approaches 
>> instead. See
>> the thread for the details - i think i covered every technical 
>> suggestion that
>> was made.
>
>
> You simply kept ignoring me when I said that if something can be kept 
> out of the kernel without impacting performance, it should be.  I 
> don't want emergency patches closing some security hole or oops in a 
> kernel symbol server.

Or rather, explained how I am a wicked microkernelist.  The herring were 
out in force today.

-- 
error compiling committee.c: too many arguments to function


^ permalink raw reply	[flat|nested] 390+ messages in thread

* Re: [RFC] Unify KVM kernel-space and user-space code into a single project
  2010-03-22 13:46                                                                                       ` Joerg Roedel
@ 2010-03-22 16:32                                                                                         ` Ingo Molnar
  2010-03-22 17:17                                                                                           ` Frank Ch. Eigler
                                                                                                             ` (2 more replies)
  0 siblings, 3 replies; 390+ messages in thread
From: Ingo Molnar @ 2010-03-22 16:32 UTC (permalink / raw)
  To: Joerg Roedel
  Cc: Avi Kivity, Anthony Liguori, Pekka Enberg, Zhang, Yanmin,
	Peter Zijlstra, Sheng Yang, linux-kernel, kvm, Marcelo Tosatti,
	Jes Sorensen, Gleb Natapov, Zachary Amsden, ziteng.huang,
	Arnaldo Carvalho de Melo, Frédéric Weisbecker


* Joerg Roedel <joro@8bytes.org> wrote:

> [...] Look at the state of the alpha arch in Linux today, it is maintained 
> in one repository but nobody really cares about it. Thus it is miles behine 
> most other archs Linux supports today in quality and feature completeness.

I dont know how you can find the situation of Alpha comparable, which is a 
legacy architecture for which no new CPU was manufactured in the past ~10 
years.

The negative effects of physical obsolescence cannot be overcome even by the 
very best of development models ...

So this is a total non-argument in this context.

> On Mon, Mar 22, 2010 at 01:22:28PM +0100, Ingo Molnar wrote:
> > 
> > * Joerg Roedel <joro@8bytes.org> wrote:
> > 
> > > [...] Basically the reason of the oProfile failure is a disfunctional 
> > > community. [...]
> > 
> > Caused by: repository separation and the inevitable code and social fork a 
> > decade later.
> 
> No, the split-repository situation was the smallest problem after all. Its 
> was a community thing. If the community doesn't work a single-repo project 
> will also fail. [...]

So, what do you think creates code communities and keeps them alive? 
Developers and code. And the wellbeing of developers is primarily influenced 
by the repository structure and by the development/maintenance process - i.e. 
by the 'fun' aspect. (i'm simplifying things there but that's the crux of it.)

So yes, i do claim that what stifled and eventually killed off the Oprofile 
community was the split repository. None of the other Oprofile shortcomings 
were really unfixable, but this one was. It gave no way for the community to 
grow in a healthy way, after the initial phase. Features were more difficult 
and less fun to develop.

And yes, there were times when there was still active Oprofile development but 
the development process warning signs should have been noticed, and the 
community could have been kept alive by unification and similar measures. 
Instead what happened was a complete rewrite and a competitive replacement by 
perf. (Which isnt particularly nice to users btw. - they prefer more gradual 
transitions - but there was no other option, so many problems accumulated in 
Oprofile.)

I simply do not want to see KVM face the same fate, and yes i do see similar 
warning signs.

> > What you fail to realise (or what you fail to know, you werent around when 
> > Oprofile was written, i was) is that Oprofile _did_ have a functional 
> > single community when it was written. The tooling and the kernel bits was 
> > written by the same people.
> 
> Yes, this was probably the time when everybody was enthusiastic about the 
> feature and they could attract lots of developers. But situation changed 
> over time.

The thing is, the drift was pre-programmed by having a split ...

> > So i dont see much of a difference to the Oprofile situation really and i 
> > see many parallels. I also see similar kinds of desktop usability 
> > problems.
> 
> The difference is that KVM has a working community with good developers and 
> maintainers.

Oprofile certainly had good developers and maintainers as well. In the end it 
wasnt enough ...

Also, a project can easily still be 'alive' but not reach its full potential. 

Why do you assume that my argument means that KVM isnt viable today? It can 
very well still be viable and even healthy - just not _as healthy_ as it could 
be ...

> > The difference is that we dont have KVM with a decade of history and we 
> > dont have a 'told you so' KVM reimplementation to show that proves the 
> > point. I guess it's a matter of time before that happens, because Qemu 
> > usability is so absymal today - so i guess we should suspend any 
> > discussions until that happens, no need to waste time on arguing 
> > hypoteticals.
> 
> We actually have lguest which is small. But it lacks functionality and the 
> developer community KVM has attracted.

I suggested long ago to merge lguest into KVM to cover non-VMX/non-SVM 
execution.

> > I think you are rationalizing the status quo.
> 
> I see that there are issues with KVM today in some areas. You pointed out 
> the desktop usability already. I personally have trouble with the 
> qem-kvm.git because it is unbisectable. But repository unification doesn't 
> solve the problem here.

Why doesnt it solve the bisectability problem? The kernel repo is supposed to 
be bisectable so that problem would be solved.

> The point for a single repository is that it simplifies the development 
> process. I agree with you here. But the current process of KVM is not too 
> difficult after all. I don't have to touch qemu sources for most of my work 
> on KVM.

In my judgement you'd have to do that more frequently, if KVM was properly 
weighting its priorities. For example regarding this recent KVM commit of 
yours:

| commit ec1ff79084fccdae0dca9b04b89dcdf3235bbfa1
| Author: Joerg Roedel <joerg.roedel@amd.com>
| Date:   Fri Oct 9 16:08:31 2009 +0200
|
|     KVM: SVM: Add tracepoint for invlpga instruction
|     
|     This patch adds a tracepoint for the event that the guest
|     executed the INVLPGA instruction.

With integrated KVM tooling i might have insisted that this new tracepoint 
be made available to users as well via some more meaningful tooling than 
just a pure tracepoint.

There's synergies like that all around the place.

You should realize that naturally developers will gravitate towards the most 
'fun' aspects of a project. It is the task of the maintainer to keep the 
balance between fun and utility, bugs and features, quality and code-rot.

> > It's as if you argued in 1990 that the unification of East and West 
> > Germany wouldnt make much sense because despite clear problems and 
> > incompatibilites and different styles westerners were still allowed to 
> > visit eastern relatives and they both spoke the same language after all 
> > ;-)
> 
> Um, hmm. I don't think these situations have enough in common to compare 
> them ;-)

Probably, but it's an interesting parallel nevertheless ;-)

Thanks,

	Ingo

^ permalink raw reply	[flat|nested] 390+ messages in thread

* Re: [RFC] Unify KVM kernel-space and user-space code into a single  project
  2010-03-22 16:16                                                                                                         ` Avi Kivity
@ 2010-03-22 16:40                                                                                                             ` Pekka Enberg
  0 siblings, 0 replies; 390+ messages in thread
From: Pekka Enberg @ 2010-03-22 16:40 UTC (permalink / raw)
  To: Avi Kivity
  Cc: Ingo Molnar, Anthony Liguori, Zhang, Yanmin, Peter Zijlstra,
	Sheng Yang, linux-kernel, kvm, Marcelo Tosatti, oerg Roedel,
	Jes Sorensen, Gleb Natapov, Zachary Amsden, ziteng.huang,
	Arnaldo Carvalho de Melo, Frédéric Weisbecker, Gregory Haskins

On Mon, Mar 22, 2010 at 6:16 PM, Avi Kivity <avi@redhat.com> wrote:
>> You simply kept ignoring me when I said that if something can be kept out
>> of the kernel without impacting performance, it should be.  I don't want
>> emergency patches closing some security hole or oops in a kernel symbol
>> server.
>
> Or rather, explained how I am a wicked microkernelist.  The herring were out
> in force today.

Well, if it's not being a "wicked microkernelist" then what is it?
Performance is hardly the only motivation to put things into the
kernel. Think kernel mode-setting and devtmpfs (with the ironic twist
of original devfs being removed from the kernel) here, for example.

                        Pekka

^ permalink raw reply	[flat|nested] 390+ messages in thread

* Re: [PATCH] Enhance perf to collect KVM guest os statistics from host side
  2010-03-22  7:24     ` Zhang, Yanmin
@ 2010-03-22 16:44       ` Arnaldo Carvalho de Melo
  2010-03-23  3:14         ` Zhang, Yanmin
  0 siblings, 1 reply; 390+ messages in thread
From: Arnaldo Carvalho de Melo @ 2010-03-22 16:44 UTC (permalink / raw)
  To: Zhang, Yanmin
  Cc: Ingo Molnar, Peter Zijlstra, Avi Kivity, Sheng Yang,
	linux-kernel, kvm, Marcelo Tosatti, oerg Roedel, Jes Sorensen,
	Gleb Natapov, Zachary Amsden, zhiteng.huang,
	Frédéric Weisbecker

Em Mon, Mar 22, 2010 at 03:24:47PM +0800, Zhang, Yanmin escreveu:
> On Fri, 2010-03-19 at 09:21 +0100, Ingo Molnar wrote:
> > So some sort of --guestmount option would be the natural solution, which 
> > points to the guest system's root: and a Qemu enumeration of guest mounts 
> > (which would be off by default and configurable) from which perf can pick up 
> > the target guest all automatically. (obviously only under allowed permissions 
> > so that such access is secure)
> If sshfs could access /proc/ and /sys correctly, here is a design:
> --guestmount points to a directory which consists of a list of sub-directories.
> Every sub-directory's name is just the qemu process id of guest os. Admin/developer
> mounts every guest os instance's root directory to corresponding sub-directory.
> 
> Then, perf could access all files. It's possible because guest os instance
> happens to be multi-threading in a process. One of the defects is the accessing to
> guest os becomes slow or impossible when guest os is very busy.

If the MMAP events on the guest included a cookie that could later be
used to query for the symtab of that DSO, we wouldn't need to access the
guest FS at all, right?

With build-ids and debuginfo-install like tools the symbol resolution
could be performed by using the cookies (build-ids) as keys to get to
the *-debuginfo packages with matching symtabs (and DWARF for source
annotation, etc).

We have that for the kernel as:

[acme@doppio linux-2.6-tip]$ l /sys/kernel/notes 
-r--r--r-- 1 root root 36 2010-03-22 13:14 /sys/kernel/notes
[acme@doppio linux-2.6-tip]$ l /sys/module/ipv6/sections/.note.gnu.build-id 
-r--r--r-- 1 root root 4096 2010-03-22 13:38 /sys/module/ipv6/sections/.note.gnu.build-id
[acme@doppio linux-2.6-tip]$

That way we would cover DSOs being reinstalled in long running 'perf
record' sessions too.

This was discussed some time ago but would require help from the bits
that load DSOs.

build-ids then would be first class citizens.
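
Pulling the build-id out of such a notes blob is just a walk over standard 
ELF note records.  A minimal sketch, with bounds and error checking mostly 
omitted:

  /* Sketch: print the GNU build-id found in an exported notes blob such as
   * /sys/kernel/notes or a module's .note.gnu.build-id section.  Standard
   * ELF note layout: Elf64_Nhdr, then name and descriptor, each padded to
   * 4 bytes. */
  #include <stdio.h>
  #include <string.h>
  #include <elf.h>

  #define ALIGN4(x) (((x) + 3) & ~3u)

  int main(int argc, char *argv[])
  {
      const char *path = argc > 1 ? argv[1] : "/sys/kernel/notes";
      unsigned char buf[8192];
      size_t size, off = 0;
      FILE *f = fopen(path, "r");

      if (!f)
          return 1;
      size = fread(buf, 1, sizeof(buf), f);
      fclose(f);

      while (off + sizeof(Elf64_Nhdr) <= size) {
          Elf64_Nhdr *nhdr = (Elf64_Nhdr *)(buf + off);
          unsigned char *name = buf + off + sizeof(*nhdr);
          unsigned char *desc = name + ALIGN4(nhdr->n_namesz);
          size_t i;

          if (nhdr->n_type == NT_GNU_BUILD_ID &&
              nhdr->n_namesz == 4 && memcmp(name, "GNU", 4) == 0) {
              for (i = 0; i < nhdr->n_descsz; i++)
                  printf("%02x", desc[i]);
              printf("  %s\n", path);
          }
          off += sizeof(*nhdr) + ALIGN4(nhdr->n_namesz) + ALIGN4(nhdr->n_descsz);
      }
      return 0;
  }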

- Arnaldo

^ permalink raw reply	[flat|nested] 390+ messages in thread

* Re: [RFC] Unify KVM kernel-space and user-space code into a single project
  2010-03-22 16:12                                                                                                       ` Avi Kivity
  2010-03-22 16:16                                                                                                         ` Avi Kivity
@ 2010-03-22 16:51                                                                                                         ` Ingo Molnar
  2010-03-22 17:08                                                                                                           ` Avi Kivity
  1 sibling, 1 reply; 390+ messages in thread
From: Ingo Molnar @ 2010-03-22 16:51 UTC (permalink / raw)
  To: Avi Kivity
  Cc: Anthony Liguori, Pekka Enberg, Zhang, Yanmin, Peter Zijlstra,
	Sheng Yang, linux-kernel, kvm, Marcelo Tosatti, oerg Roedel,
	Jes Sorensen, Gleb Natapov, Zachary Amsden, ziteng.huang,
	Arnaldo Carvalho de Melo, Frédéric Weisbecker, Gregory Haskins


* Avi Kivity <avi@redhat.com> wrote:

> > The crux of the problem is very simple. To quote my earlier mail:
> >
> >  |
> >  | - The inconvenience of having to type:
> >  |      perf kvm --host --guest --guestkallsyms=/home/ymzhang/guest/kallsyms \
> >  |               --guestmodules=/home/ymzhang/guest/modules top
> >  |
> >  |
> >  |   is very obvious even with a single guest. Now multiply that by more guests ...
> >  |
> >
> > For example we want 'perf kvm top' to do something useful by default: it
> > should find the first guest running and it should report its profile.
> >
> > The tool shouldnt have to guess about where the guests are, what their
> > namespaces is and how to talk to them. We also want easy symbolic access to
> > guest, for example:
> >
> >   perf kvm -g OpenSuse-2 record sleep 1

[ Sidenote: i still received no adequate suggestions about how to provide this
  category of technical features. ]

> > I.e.:
> >
> >  - Easy default reference to guest instances, and a way for tools to
> >    reference them symbolically as well in the multi-guest case. Preferably
> >    something trustable and kernel-provided - not some indirect information
> >    like a PID file created by libvirt-manager or so.
> 
> Usually 'layering violation' is trotted out at such suggestions.
> [...]

That's weird, how can a feature request be a 'layering violation'?

If something that users find straightforward and usable is a layering 
violation to you (such as easily being able to access their own files on the 
host as well ...) then i think you need to revisit the definition of that term 
instead of trying to fix the user.

> [...]  I don't like using the term, because sometimes the layers are 
> incorrect and need to be violated.  But it should be done explicitly, not as 
> a shortcut for a minor feature (and profiling is a minor feature, most users 
> will never use it, especially guest-from-host).
> 
> The fact is we have well defined layers today, kvm virtualizes the cpu and 
> memory, qemu emulates devices for a single guest, libvirt manages guests.  
> We break this sometimes but there has to be a good reason.  So perf needs to 
> talk to libvirt if it wants names.  Could be done via linking, or can be 
> done using a pluging libvirt drops into perf.
> 
> >  - Guest-transparent VFS integration into the host, to recover symbols and
> >    debug info in binaries, etc.
> >
> > There were a few responses to that but none really addressed those 
> > problems - they mostly tried to re-define the problem and suggested that i 
> > was wrong to want such capabilities and suggested various inferior 
> > approaches instead. See the thread for the details - i think i covered 
> > every technical suggestion that was made.
> 
> You simply kept ignoring me when I said that if something can be kept out of 
> the kernel without impacting performance, it should be. I don't want 
> emergency patches closing some security hole or oops in a kernel symbol 
> server.

I never suggested an "in kernel space symbol server" which could oops, why 
would i have suggested that? Please point me to an email where i suggested 
that.

> The usability argument is a red herring.  True, it takes time for things to 
> trickle down to distributions and users.  Those who can't wait can download 
> the code and compile, it isn't that difficult.

It's not just "download and compile", it's also "configure correctly for 
several separate major distributions" and "configure to per guest instance 
local rules".

It's far more fragile in practice than you make it appear to be, and since you 
yourself expressed that you are not interested much in the tooling side, how 
can you have adequate experience to judge such matters?

In fact for instrumentation it's beyond a critical threshold of fragility - 
instrumentation above all needs to be accessible, transparent and robust.

If you cannot see the advantages of a properly integrated solution then i 
suspect there's not much i can do to convince you.

And you ignored not just me but you ignored several people in this thread who 
thought the current status quo was inadequate and expressed interest in both 
the VFS integration and in the guest enumeration features.

Thanks,

	Ingo

^ permalink raw reply	[flat|nested] 390+ messages in thread

* Re: [RFC] Unify KVM kernel-space and user-space code into a single project
  2010-03-22 16:08                                                                                                       ` Anthony Liguori
@ 2010-03-22 16:59                                                                                                         ` Ingo Molnar
  2010-03-22 18:28                                                                                                           ` Anthony Liguori
  2010-03-22 17:11                                                                                                         ` Ingo Molnar
  1 sibling, 1 reply; 390+ messages in thread
From: Ingo Molnar @ 2010-03-22 16:59 UTC (permalink / raw)
  To: Anthony Liguori
  Cc: Avi Kivity, Pekka Enberg, Zhang, Yanmin, Peter Zijlstra,
	Sheng Yang, linux-kernel, kvm, Marcelo Tosatti, oerg Roedel,
	Jes Sorensen, Gleb Natapov, Zachary Amsden, ziteng.huang,
	Arnaldo Carvalho de Melo, Frédéric Weisbecker, Gregory Haskins


* Anthony Liguori <anthony@codemonkey.ws> wrote:

> On 03/22/2010 10:55 AM, Ingo Molnar wrote:
> >* Anthony Liguori<anthony@codemonkey.ws>  wrote:
> >
> >>[...]
> >>
> >>I've been trying very hard to turn this into a productive thread attempting
> >>to capture your feedback and give clear suggestions about how you can solve
> >>achieve your desired functionality.
> >I'm glad that we are at this more productive stage. I'm still trying to
> >achieve the very same technological capabilities that i expressed in the first
> >few mails when i reviewed the 'perf kvm' patch that was submitted by Yanmin.
> >
> >The crux of the problem is very simple. To quote my earlier mail:
> >
> >  |
> >  | - The inconvenience of having to type:
> >  |      perf kvm --host --guest --guestkallsyms=/home/ymzhang/guest/kallsyms \
> >  |               --guestmodules=/home/ymzhang/guest/modules top
> >  |
> >  |
> >  |   is very obvious even with a single guest. Now multiply that by more guests ...
> >  |
> >
> > For example we want 'perf kvm top' to do something useful by default: it 
> > should find the first guest running and it should report its profile.
> >
> > The tool shouldnt have to guess about where the guests are, what their 
> > namespaces is and how to talk to them. We also want easy symbolic access 
> > to guest, for example:
> >
> >   perf kvm -g OpenSuse-2 record sleep 1
> 
> Two things are needed.  The first thing needed is to be able to enumerate 
> running guests and identify a symbolic name.  I have a patch for this and 
> it'll be posted this week or so.  perf will need to have a QMP client and it 
> will need to look in ${HOME}/.qemu/qmp/ to sockets to connect to.
> 
> This is too much to expect from a client and we've got a GSoC idea posted to 
> make a nice library for tools to use to simplify this.

Ok, that sounds interesting! I'd rather see some raw mechanism that 'perf kvm' 
could use instead of having to require yet another library (which generally 
dampens adoption of a tool). So i think we can work from there.

Btw., have you considered using Qemu's command name (task->comm[]) as the 
symbolic name? That way we could see the guest name in 'top' on the host - a 
nice touch.
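
The comm is already exported via procfs, so the lookup on the tooling side 
is trivial - a sketch (assumes /proc/<pid>/comm, i.e. a recent kernel; 
older kernels expose the same string via the Name: line of 
/proc/<pid>/status):

  /* Sketch: fetch a qemu process's comm[] - a candidate guest label - given
   * its pid. */
  #include <stdio.h>
  #include <string.h>
  #include <sys/types.h>

  static int read_comm(pid_t pid, char *buf, size_t len)
  {
      char path[64];
      FILE *f;

      snprintf(path, sizeof(path), "/proc/%d/comm", (int)pid);
      f = fopen(path, "r");
      if (!f || !fgets(buf, (int)len, f)) {
          if (f)
              fclose(f);
          return -1;
      }
      fclose(f);
      buf[strcspn(buf, "\n")] = 0;    /* strip trailing newline */
      return 0;
  }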

> The sockets are named based on UUID and you'll have to connect to a guest 
> and ask it for it's name.  Some guests don't have names so we'll have to 
> come up with a clever way to describe a nameless VM.

I think just exposing the UUID in that lazy case would be adequate? It creates 
pressure for VM launchers to use better symbolic names.

> > I.e.:
> >
> >  - Easy default reference to guest instances, and a way for tools to
> >    reference them symbolically as well in the multi-guest case. Preferably 
> >    something trustable and kernel-provided - not some indirect information 
> >    like a PID file created by libvirt-manager or so.
> 
> A guest is not a KVM concept.  It's a qemu concept so it needs to be 
> something provided by qemu.  The other caveat is that you won't see guests 
> created by libvirt because we're implementing this in terms of a default QMP 
> device and libvirt will disable defaults.  This is desired behaviour.  
> libvirt wants to be in complete control and doesn't want a tool like perf 
> interacting with a guest directly.

Hm, this sucks for multiple reasons. Firstly, perf isnt a tool that 
'interacts', it's an observation tool: just like 'top' is an observation tool.

We want to enable developers to see all activities on the system - regardless 
of who started the VM or who started the process. Imagine if we had a way to 
hide tasks from 'top'. It would be rather awful.

Secondly, it tells us that the concept is fragile if it doesnt automatically 
enumerate all guests, regardless of how they were created.

Full system enumeration is generally best left to the kernel, as it can offer 
coherent access.

	Ingo

^ permalink raw reply	[flat|nested] 390+ messages in thread

* Re: [RFC] Unify KVM kernel-space and user-space code into a single project
  2010-03-22 16:51                                                                                                         ` Ingo Molnar
@ 2010-03-22 17:08                                                                                                           ` Avi Kivity
  2010-03-22 17:34                                                                                                             ` Ingo Molnar
  0 siblings, 1 reply; 390+ messages in thread
From: Avi Kivity @ 2010-03-22 17:08 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Anthony Liguori, Pekka Enberg, Zhang, Yanmin, Peter Zijlstra,
	Sheng Yang, linux-kernel, kvm, Marcelo Tosatti, oerg Roedel,
	Jes Sorensen, Gleb Natapov, Zachary Amsden, ziteng.huang,
	Arnaldo Carvalho de Melo, Frédéric Weisbecker, Gregory Haskins

On 03/22/2010 06:51 PM, Ingo Molnar wrote:
> * Avi Kivity<avi@redhat.com>  wrote:
>
>    
>>> The crux of the problem is very simple. To quote my earlier mail:
>>>
>>>   |
>>>   | - The inconvenience of having to type:
>>>   |      perf kvm --host --guest --guestkallsyms=/home/ymzhang/guest/kallsyms \
>>>   |               --guestmodules=/home/ymzhang/guest/modules top
>>>   |
>>>   |
>>>   |   is very obvious even with a single guest. Now multiply that by more guests ...
>>>   |
>>>
>>> For example we want 'perf kvm top' to do something useful by default: it
>>> should find the first guest running and it should report its profile.
>>>
>>> The tool shouldnt have to guess about where the guests are, what their
>>> namespaces is and how to talk to them. We also want easy symbolic access to
>>> guest, for example:
>>>
>>>    perf kvm -g OpenSuse-2 record sleep 1
>>>        
> [ Sidenote: i still received no adequate suggestions about how to provide this
>    category of technical features. ]
>    

You need to integrate with libvirt to convert guest names into something that 
can be used to obtain guest symbols.

>>> I.e.:
>>>
>>>   - Easy default reference to guest instances, and a way for tools to
>>>     reference them symbolically as well in the multi-guest case. Preferably
>>>     something trustable and kernel-provided - not some indirect information
>>>     like a PID file created by libvirt-manager or so.
>>>        
>> Usually 'layering violation' is trotted out at such suggestions.
>> [...]
>>      
> That's weird, how can a feature request be a 'layering violation'?
>    

The 'something trustable and kernel-provided'.  The kernel knows nothing 
about guest names.

> If something that users find straightforward and usable is a layering
> violation to you (such as easily being able to access their own files on the
> host as well ...) then i think you need to revisit the definition of that term
> instead of trying to fix the user.
>    

Here is the explanation, you left it quoted:

>> [...]  I don't like using the term, because sometimes the layers are
>> incorrect and need to be violated.  But it should be done explicitly, not as
>> a shortcut for a minor feature (and profiling is a minor feature, most users
>> will never use it, especially guest-from-host).
>>
>> The fact is we have well defined layers today, kvm virtualizes the cpu and
>> memory, qemu emulates devices for a single guest, libvirt manages guests.
>> We break this sometimes but there has to be a good reason.  So perf needs to
>> talk to libvirt if it wants names.  Could be done via linking, or can be
>> done using a pluging libvirt drops into perf.

>> You simply kept ignoring me when I said that if something can be kept out of
>> the kernel without impacting performance, it should be. I don't want
>> emergency patches closing some security hole or oops in a kernel symbol
>> server.
>>      
> I never suggested an "in kernel space symbol server" which could oops, why
> would i have suggested that? Please point me to an email where i suggested
> that.
>    

You insisted that it be in the kernel.  Later you relaxed that and said 
a daemon is fine.  I'm not going to reread this thread, once is more 
than enough.

>> The usability argument is a red herring.  True, it takes time for things to
>> trickle down to distributions and users.  Those who can't wait can download
>> the code and compile, it isn't that difficult.
>>      
> It's not just "download and compile", it's also "configure correctly for
> several separate major distributions" and "configure to per guest instance
> local rules".
>    

That's life in Linux-land.  Either you let distributions feed you cooked 
packages and relax, or you do the work yourself.  If we had 
tools/everything/ it wouldn't be this way, but we don't.

> It's far more fragile in practice than you make it appear to be, and since you
> yourself expressed that you are not interested much in the tooling side, how
> can you have adequate experience to judge such matters?
>    

People on kvm-devel manage to build and run release tarballs and even 
directly from git.  I build packages from source occasionally.  It isn't 
fun but it doesn't take a PhD.

> In fact for instrumentation it's beyond a critical threshold of fragility -
> instrumentation above all needs to be accessible, transparent and robust.
>
> If you cannot see the advantages of a properly integrated solution then i
> suspect there's not much i can do to convince you.
>    

Integration in Linux happens at the desktop or distribution level.  You 
want to move it to the kernel level.  It works for perf, great, but that 
doesn't mean it will work for everything else.  Once perf grows a GUI, I 
expect it will stop working for perf as well (for example, if gtk breaks 
its API in a major release, which version will perf code for?)

> And you ignored not just me but you ignored several people in this thread who
> thought the current status quo was inadequate and expressed interest in both
> the VFS integration and in the guest enumeration features.
>    

I'm sorry.  I don't reply to every email.  If you want my opinion on 
something, you can ask me again.

-- 
error compiling committee.c: too many arguments to function


^ permalink raw reply	[flat|nested] 390+ messages in thread

* Re: [RFC] Unify KVM kernel-space and user-space code into a single project
  2010-03-22 16:08                                                                                                       ` Anthony Liguori
  2010-03-22 16:59                                                                                                         ` Ingo Molnar
@ 2010-03-22 17:11                                                                                                         ` Ingo Molnar
  2010-03-22 18:30                                                                                                           ` Anthony Liguori
  1 sibling, 1 reply; 390+ messages in thread
From: Ingo Molnar @ 2010-03-22 17:11 UTC (permalink / raw)
  To: Anthony Liguori
  Cc: Avi Kivity, Pekka Enberg, Zhang, Yanmin, Peter Zijlstra,
	Sheng Yang, linux-kernel, kvm, Marcelo Tosatti, oerg Roedel,
	Jes Sorensen, Gleb Natapov, Zachary Amsden, ziteng.huang,
	Arnaldo Carvalho de Melo, Fr?d?ric Weisbecker, Gregory Haskins


* Anthony Liguori <anthony@codemonkey.ws> wrote:

> >  - Easy default reference to guest instances, and a way for tools to
> >    reference them symbolically as well in the multi-guest case. Preferably
> >    something trustable and kernel-provided - not some indirect information
> >    like a PID file created by libvirt-manager or so.
> 
> A guest is not a KVM concept. [...]

Well, in a sense a guest is a KVM concept too: it's in essence represented via 
the 'vcpu state attached to a struct mm' abstraction that is attached to the 
/dev/kvm file descriptor of a Linux process.

Multiple vcpus can be started by the same process to represent SMP, but the 
whole guest notion is present: a Linux MM that carries KVM state.

In that sense when we type 'perf kvm list' we'd like to get a list of all 
currently present guests that the developer has permission to profile: i.e. 
we'd like a list of all [debuggable] Linux tasks that have a KVM instance 
attached to them.

A convenient way to do that would be to use the Qemu process's ->comm[] name, 
and to have a KVM ioctl that gets us a list of all vcpus that the querying 
task has ptrace permission to. [the standard permission check we do for 
instrumentation]

No need for communication with Qemu for that - just an ioctl, and an 
always-guaranteed result that works fine on a whole-system and on a per user 
basis as well.
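
To make that concrete, the userspace side of such an interface could look 
like the sketch below.  The KVM_GET_GUEST_PIDS ioctl and struct 
kvm_guest_list do not exist - they are invented here purely to illustrate 
the shape of what is being suggested:

  /* Purely illustrative - no such ioctl exists today.  'perf kvm list'
   * would open /dev/kvm, ask for the pids of guest-owning processes the
   * caller may ptrace, and print each pid with its comm[] as the name. */
  #include <stdio.h>
  #include <fcntl.h>
  #include <unistd.h>
  #include <sys/ioctl.h>
  #include <linux/kvm.h>

  struct kvm_guest_list {                    /* hypothetical */
      __u32 nr_entries;                      /* in: capacity, out: used */
      __u32 pids[255];                       /* qemu pids the caller may ptrace */
  };

  /* hypothetical ioctl number, for illustration only */
  #define KVM_GET_GUEST_PIDS _IOWR(KVMIO, 0xf0, struct kvm_guest_list)

  int main(void)
  {
      struct kvm_guest_list list = { .nr_entries = 255 };
      int kvm_fd = open("/dev/kvm", O_RDWR);
      unsigned int i;

      if (kvm_fd < 0 || ioctl(kvm_fd, KVM_GET_GUEST_PIDS, &list) < 0)
          return 1;

      for (i = 0; i < list.nr_entries; i++)
          printf("%u\n", list.pids[i]);
      /* resolving each pid via /proc/<pid>/comm would give the symbolic
       * names for a 'perf kvm list' style listing, as discussed above */
      close(kvm_fd);
      return 0;
  }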

Thanks,

	Ingo

^ permalink raw reply	[flat|nested] 390+ messages in thread

* Re: [RFC] Unify KVM kernel-space and user-space code into a single project
  2010-03-22 16:32                                                                                         ` Ingo Molnar
@ 2010-03-22 17:17                                                                                           ` Frank Ch. Eigler
  2010-03-22 17:27                                                                                               ` Pekka Enberg
  2010-03-22 17:44                                                                                           ` Avi Kivity
  2010-03-22 19:20                                                                                           ` Joerg Roedel
  2 siblings, 1 reply; 390+ messages in thread
From: Frank Ch. Eigler @ 2010-03-22 17:17 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Joerg Roedel, Avi Kivity, Anthony Liguori, Pekka Enberg, Zhang,
	Yanmin, Peter Zijlstra, Sheng Yang, linux-kernel, kvm,
	Marcelo Tosatti, Jes Sorensen, Gleb Natapov, Zachary Amsden,
	ziteng.huang, Arnaldo Carvalho de Melo, Frédéric Weisbecker


mingo wrote:

> [...]
>> No, the split-repository situation was the smallest problem after all. Its 
>> was a community thing. If the community doesn't work a single-repo project 
>> will also fail. [...]
>
> So, what do you think creates code communities and keeps them alive? 
> Developers and code. And the wellbeing of developers are primarily influenced 
> by the repository structure and by the development/maintenance process - i.e. 
> by the 'fun' aspect. (i'm simplifying things there but that's the crux of it.)
>
> So yes, i do claim that what stiffled and eventually killed off the Oprofile 
> community was the split repository.  [...]

In your very previous paragraphs, you enumerate two separate causes:
"repository structure" and "development/maintenance process" as being
sources of "fun".  Please simply accept that the former is considered
by many as absolutely trivial compared to the latter, and additional
verbose repetition of your thesis will not change this.

- FChE

^ permalink raw reply	[flat|nested] 390+ messages in thread

* Re: [RFC] Unify KVM kernel-space and user-space code into a single  project
  2010-03-22 17:17                                                                                           ` Frank Ch. Eigler
@ 2010-03-22 17:27                                                                                               ` Pekka Enberg
  0 siblings, 0 replies; 390+ messages in thread
From: Pekka Enberg @ 2010-03-22 17:27 UTC (permalink / raw)
  To: Frank Ch. Eigler
  Cc: Ingo Molnar, Joerg Roedel, Avi Kivity, Anthony Liguori, Zhang,
	Yanmin, Peter Zijlstra, Sheng Yang, linux-kernel, kvm,
	Marcelo Tosatti, Jes Sorensen, Gleb Natapov, Zachary Amsden,
	ziteng.huang, Arnaldo Carvalho de Melo, Frédéric Weisbecker

Hi Frank,

On Mon, Mar 22, 2010 at 7:17 PM, Frank Ch. Eigler <fche@redhat.com> wrote:
> In your very previous paragraphs, you enumerate two separate causes:
> "repository structure" and "development/maintenance process" as being
> sources of "fun".  Please simply accept that the former is considered
> by many as absolutely trivial compared to the latter, and additional
> verbose repetition of your thesis will not change this.

I can accept that many people consider it trivial but the problem is
that we have _real data_ on kmemtrace and now perf that the number of
contributors is significantly smaller when your code is outside the
kernel repository. Now admittedly both of them are pretty intimate
with the kernel but Ingo's suggestion of putting kvm-qemu in tools/ is
an interesting idea nevertheless.

It's kinda funny to see people argue that having an external
repository is not a problem and that it's not a big deal if building
something from the repository is slightly painful as long as it
doesn't require a PhD when we have _real world_ experience that it
_does_ limit developer base in some cases. Whether or not that applies
to kvm remains to be seen but I've yet to see a convincing argument
why it doesn't.

                        Pekka

^ permalink raw reply	[flat|nested] 390+ messages in thread

* Re: [RFC] Unify KVM kernel-space and user-space code into a single project
  2010-03-22 14:26                                                                                             ` Ingo Molnar
@ 2010-03-22 17:29                                                                                               ` Avi Kivity
  0 siblings, 0 replies; 390+ messages in thread
From: Avi Kivity @ 2010-03-22 17:29 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Olivier Galibert, Anthony Liguori, Pekka Enberg, Zhang, Yanmin,
	Peter Zijlstra, Sheng Yang, linux-kernel, kvm, Marcelo Tosatti,
	oerg Roedel, Jes Sorensen, Gleb Natapov, Zachary Amsden,
	ziteng.huang, Arnaldo Carvalho de Melo, Frédéric Weisbecker

On 03/22/2010 04:26 PM, Ingo Molnar wrote:
> * Avi Kivity<avi@redhat.com>  wrote:
>
>    
>> On 03/22/2010 01:39 PM, Ingo Molnar wrote:
>>      
>>> Reality is, the server space never was and never will be self-sustaining
>>> in the long run (as Novell has found it out with Netware), it is the
>>> desktop that dictates future markets. This is why i find your views about
>>> this naive and shortsighted.
>>>        
>> Yet Linux is gaining ground in the server and embedded space while
>> struggling on the desktop. [...]
>>      
> Frankly, Linux is mainly growing in the server space due to:
>
>   1) the server space is technically much simpler than the desktop space. It
>      is far easier to code up a server performance feature than to make
>      struggle through stupid (server-motivated) package boundaries and get
>      something done on the desktop. It is far easier to code up a server app
>      as that space is well standardized and servers tend to be compartmented.
>      Integration between server apps is much less common than integration
>      between desktop apps, hence the harm that our modularization idiocies
>      cause less harm.
>
>   2) Linux's growth is still feeding on the remains of the destruction of Unix.
>    

Agreed (minus the 'package boundaries' stuff).  Also, Linux is cheaper 
than Windows.

> Linux is struggling on the desktop due to the desktop's inherent complexity,
> due to the lack of the Unix inertia and due to incompetence, insensitivity,
> intellectual arrogance and shortsightedness of server-centric thinking, like
> your arguments/position displayed in this very thread.
>    

It's struggling because it isn't competitive technically with other 
desktops, because there is no application base, because of a 
chicken-and-egg problem with some drivers, because lack of a stable ABI 
means you can't get a driver CD with your device so you need a 
yet-unreleased kernel, because the zillion binary incompatible 
distributions mean that application developers don't know what to code 
and test for, because of lack of documentation, to name a few.  At least 
it's improving all the time.

The incompetence, insensitivity, intellectual arrogance and 
shortsightedness of server-centric thinking of my arguments/position are 
a result of this, not the cause.

>> [...] Apple is gaining ground on the desktop but is invisible on the server
>> side (despite having a nice product - Xserve).
>>      
> But the thing is, Apple doesnt really care about the server space, yet. It is
> lucrative but it is a side-show: it will fall automatically to the 'winner' of
> the desktop (or gadget) of tomorrow.
>    

It won't automatically fall to Apple, there's tons of middleware and 
server apps that need porting (the "ecosystem"), plus they need to work 
hard on improving their kernel which is desktop oriented.  Looks like 
they're interested in other things.

> Has the quick fall of Banyan Vines or Netware (both excellent all-around
> server products) taught you nothing?
>    

Not familiar with Banyan, but wasn't Netware a cooperative multitasking 
command line only thing?  It couldn't compete with a preemptive modern 
system with a nice GUI.  Windows didn't need the desktop to win that fight.

> We need a lot more desktop focus in the kernel community. The best method to
> achieve this, that i know of currently, is to simply have kernel developers
> think outside the kernel box and to have them do bits of user-space coding as
> well - and in particular desktop coding. To eat our own dogfood in essence.
> Suffer through crap we cause to user-space. To face the _real_ difficulties of
> users. We seem to have forgotten our roots.
>    

Try it yourself and report the experience.  Note: perf is not desktop 
development, it's kernel tooling development.

>> [...]
>>
>> It's true Windows achieved server dominance through it's desktop power, but
>> I don't think that's what keeping them there now.
>>      
> What is keeping them there is precisely that.
>    

Not at all.  They have excellent development tools and lots of 
middleware and other third party products that make it easy to pick 
Windows.  For example, Exchange is more or less standard for groupware, 
and they made C# and the technology around it easy to develop for, 
learning from Java's mistakes.

>> In any case, I'm not going to write a kvm GUI.  It doesn't match my skills,
>> interest, or my employer's interest.  If you wish to see a kvm GUI you have
>> to write one yourself or convince someone to write it (perhaps convince Red
>> Hat to fund such an effort beyond virt-manager).
>>      
> As a maintainer you certainly dont have to write a single line of code, if you
> dont want to. You 'just' need to care about the big picture and encourage/help
> the flow and balance of the whole project.
>    

I haven't written that line of code, and no one else has either.  Don't 
tell me they're all scared of me.

-- 
error compiling committee.c: too many arguments to function


^ permalink raw reply	[flat|nested] 390+ messages in thread

* Re: [RFC] Unify KVM kernel-space and user-space code into a single project
  2010-03-22 17:27                                                                                               ` Pekka Enberg
  (?)
@ 2010-03-22 17:32                                                                                               ` Avi Kivity
  2010-03-22 17:39                                                                                                 ` Ingo Molnar
  2010-03-22 17:52                                                                                                   ` Pekka Enberg
  -1 siblings, 2 replies; 390+ messages in thread
From: Avi Kivity @ 2010-03-22 17:32 UTC (permalink / raw)
  To: Pekka Enberg
  Cc: Frank Ch. Eigler, Ingo Molnar, Joerg Roedel, Anthony Liguori,
	Zhang, Yanmin, Peter Zijlstra, Sheng Yang, linux-kernel, kvm,
	Marcelo Tosatti, Jes Sorensen, Gleb Natapov, Zachary Amsden,
	ziteng.huang, Arnaldo Carvalho de Melo, Frédéric Weisbecker

On 03/22/2010 07:27 PM, Pekka Enberg wrote:
> It's kinda funny to see people argue that having an external
> repository is not a problem and that it's not a big deal if building
> something from the repository is slightly painful as long as it
> doesn't require a PhD when we have _real world_ experience that it
> _does_ limit developer base in some cases. Whether or not that applies
> to kvm remains to be seen but I've yet to see a convincing argument
> why it doesn't.
>    

qemu has non-Linux developers.  Not all of their contributions are 
relevant to kvm but some are.  If we pull qemu into tools/kvm, we lose them.

-- 
error compiling committee.c: too many arguments to function


^ permalink raw reply	[flat|nested] 390+ messages in thread

* Re: [RFC] Unify KVM kernel-space and user-space code into a single project
  2010-03-22 17:08                                                                                                           ` Avi Kivity
@ 2010-03-22 17:34                                                                                                             ` Ingo Molnar
  2010-03-22 17:55                                                                                                               ` Avi Kivity
                                                                                                                                 ` (2 more replies)
  0 siblings, 3 replies; 390+ messages in thread
From: Ingo Molnar @ 2010-03-22 17:34 UTC (permalink / raw)
  To: Avi Kivity
  Cc: Anthony Liguori, Pekka Enberg, Zhang, Yanmin, Peter Zijlstra,
	Sheng Yang, linux-kernel, kvm, Marcelo Tosatti, oerg Roedel,
	Jes Sorensen, Gleb Natapov, Zachary Amsden, ziteng.huang,
	Arnaldo Carvalho de Melo, Frédéric Weisbecker, Gregory Haskins


* Avi Kivity <avi@redhat.com> wrote:

> >>>  - Easy default reference to guest instances, and a way for tools to
> >>>    reference them symbolically as well in the multi-guest case. Preferably
> >>>    something trustable and kernel-provided - not some indirect information
> >>>    like a PID file created by libvirt-manager or so.
> >>
> >> Usually 'layering violation' is trotted out at such suggestions.
> >> [...]
> >
> > That's weird, how can a feature request be a 'layering violation'?
> 
> The 'something trustable and kernel-provided'.  The kernel knows nothing 
> about guest names.

The kernel certainly knows about other resources such as task names or network 
interface names or tracepoint names. This is kernel design 101.

> > If something that users find straightforward and usable is a layering 
> > violation to you (such as easily being able to access their own files on 
> > the host as well ...) then i think you need to revisit the definition of 
> > that term instead of trying to fix the user.
> 
> Here is the explanation, you left it quoted:
> 
> >> [...]  I don't like using the term, because sometimes the layers are 
> >> incorrect and need to be violated.  But it should be done explicitly, not 
> >> as a shortcut for a minor feature (and profiling is a minor feature, most 
> >> users will never use it, especially guest-from-host).
> >>
> >> The fact is we have well defined layers today, kvm virtualizes the cpu 
> >> and memory, qemu emulates devices for a single guest, libvirt manages 
> >> guests. We break this sometimes but there has to be a good reason.  So 
> >> perf needs to talk to libvirt if it wants names.  Could be done via 
> >> linking, or can be done using a plugin libvirt drops into perf.

This is really just the much-discredited microkernel approach for keeping 
global enumeration data that should be kept by the kernel ...

Lets look at the ${HOME}/.qemu/qmp/ enumeration method suggested by Anthony. 
There's numerous ways that this can break:

 - Those special files can get corrupted, mis-setup, get out of sync, or can
   be hard to discover.

 - The ${HOME}/.qemu/qmp/ solution suggested by Anthony has a very obvious
   design flaw: it is per user. When i'm root i'd like to query _all_ current
   guest images, not just the ones started by root. A system might not even
   have a notion of '${HOME}'.

 - Apps might start KVM vcpu instances without adhering to the
   ${HOME}/.qemu/qmp/ access method.

 - There is no guarantee for the Qemu process to reply to a request - while
   the kernel can always guarantee an enumeration result. I dont want 'perf 
   kvm' to hang or misbehave just because Qemu has hung.

Really, for such reasons user-space is pretty poor at doing system-wide 
enumeration and resource management. Microkernels lost for a reason.
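
To make the per-user flaw concrete, here is a minimal sketch of what an 
enumerator built on such a directory could look like (the ~/.qemu/qmp 
path and the one-socket-per-guest layout are assumptions taken from this 
thread, not an existing interface); run as root it still only sees 
root's own guests, and it sees nothing at all when $HOME is unset:

/* Hypothetical sketch: enumerate guests via a per-user ~/.qemu/qmp/
 * directory.  The path and layout are assumptions, not an existing
 * interface.  Note the flaw: the walk is rooted at $HOME, so root only
 * sees root's own guests. */
#include <dirent.h>
#include <stdio.h>
#include <stdlib.h>

int main(void)
{
	const char *home = getenv("HOME");	/* may be unset for daemons */
	char path[4096];
	DIR *dir;
	struct dirent *de;

	if (!home)
		return 1;			/* no $HOME, no enumeration */
	snprintf(path, sizeof(path), "%s/.qemu/qmp", home);

	dir = opendir(path);			/* fails if missing or mis-setup */
	if (!dir) {
		perror(path);
		return 1;
	}
	while ((de = readdir(dir)) != NULL) {
		if (de->d_name[0] == '.')
			continue;
		/* each entry is assumed to be one QMP socket per guest */
		printf("guest socket: %s/%s\n", path, de->d_name);
	}
	closedir(dir);
	return 0;
}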

You are committing several grave design mistakes here.

Thanks,

	Ingo

^ permalink raw reply	[flat|nested] 390+ messages in thread

* Re: [RFC] Unify KVM kernel-space and user-space code into a single project
  2010-03-22 17:32                                                                                               ` Avi Kivity
@ 2010-03-22 17:39                                                                                                 ` Ingo Molnar
  2010-03-22 17:58                                                                                                   ` Avi Kivity
  2010-03-22 17:52                                                                                                   ` Pekka Enberg
  1 sibling, 1 reply; 390+ messages in thread
From: Ingo Molnar @ 2010-03-22 17:39 UTC (permalink / raw)
  To: Avi Kivity
  Cc: Pekka Enberg, Frank Ch. Eigler, Joerg Roedel, Anthony Liguori,
	Zhang, Yanmin, Peter Zijlstra, Sheng Yang, linux-kernel, kvm,
	Marcelo Tosatti, Jes Sorensen, Gleb Natapov, Zachary Amsden,
	ziteng.huang, Arnaldo Carvalho de Melo, Frédéric Weisbecker


* Avi Kivity <avi@redhat.com> wrote:

> On 03/22/2010 07:27 PM, Pekka Enberg wrote:
> >
> > It's kinda funny to see people argue that having an external repository is 
> > not a problem and that it's not a big deal if building something from the 
> > repository is slightly painful as long as it doesn't require a PhD when we 
> > have _real world_ experience that it _does_ limit developer base in some 
> > cases. Whether or not that applies to kvm remains to be seen but I've yet 
> > to see a convincing argument why it doesn't.
> 
> qemu has non-Linux developers.  Not all of their contributions are relevant 
> to kvm but some are.  If we pull qemu into tools/kvm, we lose them.

Qemu had very few developers before KVM made use of it - i know it because i 
followed the project prior to KVM.

So whatever development activity Qemu has today, it's 99% [WAG] attributable 
to KVM. It might have non-Linux contributors, but they wouldnt be there if it 
wasnt for all the Linux contributors ...

Furthermore, those contributors wouldnt have to leave - they could simply use 
a different Git URI ...

	Ingo

^ permalink raw reply	[flat|nested] 390+ messages in thread

* Re: [RFC] Unify KVM kernel-space and user-space code into a single project
  2010-03-22 17:27                                                                                               ` Pekka Enberg
  (?)
  (?)
@ 2010-03-22 17:43                                                                                               ` Ingo Molnar
  2010-03-22 18:02                                                                                                 ` Avi Kivity
  -1 siblings, 1 reply; 390+ messages in thread
From: Ingo Molnar @ 2010-03-22 17:43 UTC (permalink / raw)
  To: Pekka Enberg
  Cc: Frank Ch. Eigler, Joerg Roedel, Avi Kivity, Anthony Liguori,
	Zhang, Yanmin, Peter Zijlstra, Sheng Yang, linux-kernel, kvm,
	Marcelo Tosatti, Jes Sorensen, Gleb Natapov, Zachary Amsden,
	ziteng.huang, Arnaldo Carvalho de Melo, Frédéric Weisbecker


* Pekka Enberg <penberg@cs.helsinki.fi> wrote:

> Hi Frank,
> 
> On Mon, Mar 22, 2010 at 7:17 PM, Frank Ch. Eigler <fche@redhat.com> wrote:
> > In your very previous paragraphs, you enumerate two separate causes:
> > "repository structure" and "development/maintenance process" as being
> > sources of "fun".  Please simply accept that the former is considered
> > by many as absolutely trivial compared to the latter, and additional
> > verbose repetition of your thesis will not change this.
> 
> I can accept that many people consider it trivial but the problem is that we 
> have _real data_ on kmemtrace and now perf that the amount of contributors 
> is significantly smaller when your code is outside the kernel repository. 
> Now admittedly both of them are pretty intimate with the kernel but Ingo's 
> suggestion of putting kvm-qemu in tools/ is an interesting idea 
> nevertheless.

Correct.

> It's kinda funny to see people argue that having an external repository is 
> not a problem and that it's not a big deal if building something from the 
> repository is slightly painful as long as it doesn't require a PhD when we 
> have _real world_ experience that it _does_ limit developer base in some 
> cases. Whether or not that applies to kvm remains to be seen but I've yet to 
> see a convincing argument why it doesn't.

Yeah.

Also, if in fact the claim that the 'repository does not matter' is true then 
it doesnt matter that it's hosted in tools/kvm/ either, right?

I.e. it's a win-win situation. Worst-case nothing happens beyond a Git URI 
change. Best-case the project is propelled to never seen heights due to 
contribution advantages not contemplated and not experienced by the KVM guys 
before ...

	Ingo

^ permalink raw reply	[flat|nested] 390+ messages in thread

* Re: [RFC] Unify KVM kernel-space and user-space code into a single project
  2010-03-22 16:32                                                                                         ` Ingo Molnar
  2010-03-22 17:17                                                                                           ` Frank Ch. Eigler
@ 2010-03-22 17:44                                                                                           ` Avi Kivity
  2010-03-22 19:10                                                                                             ` Ingo Molnar
  2010-03-22 19:20                                                                                           ` Joerg Roedel
  2 siblings, 1 reply; 390+ messages in thread
From: Avi Kivity @ 2010-03-22 17:44 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Joerg Roedel, Anthony Liguori, Pekka Enberg, Zhang, Yanmin,
	Peter Zijlstra, Sheng Yang, linux-kernel, kvm, Marcelo Tosatti,
	Jes Sorensen, Gleb Natapov, Zachary Amsden, ziteng.huang,
	Arnaldo Carvalho de Melo, Frédéric Weisbecker

On 03/22/2010 06:32 PM, Ingo Molnar wrote:
>
> So, what do you think creates code communities and keeps them alive?
> Developers and code. And the wellbeing of developers are primarily influenced
> by the repository structure and by the development/maintenance process - i.e.
> by the 'fun' aspect. (i'm simplifying things there but that's the crux of it.)
>    

There is nothing fun about having one repository or two.  Who cares 
about this anyway?

tools/kvm/ probably will draw developers, simply because of the glory 
associated with kernel work.  That's a bug, not a feature.  It means 
that effort is not distributed according to where it's needed, but 
according to irrelevant considerations.

> I simply do not want to see KVM face the same fate, and yes i do see similar
> warning signs.
>    

The number of kvm and qemu developers keeps increasing.

We're having a kvm forum in August where we all meet.  Come and see for 
yourself.

>> We actually have lguest which is small. But it lacks functionality and the
>> developer community KVM has attracted.
>>      
> I suggested long ago to merge lguest into KVM to cover non-VMX/non-SVM
> execution.
>    

Rusty posted some initial patches for pv-only kvm but he lost interest 
before they were completed.  No one followed up.

btw, lguest has a single repository, userspace and kernel in the same 
repository, yet is practically dead.

>>> I think you are rationalizing the status quo.
>>>        
>> I see that there are issues with KVM today in some areas. You pointed out
>> the desktop usability already. I personally have trouble with the
>> qemu-kvm.git because it is unbisectable. But repository unification doesn't
>> solve the problem here.
>>      
> Why doesnt it solve the bisectability problem? The kernel repo is supposed to
> be bisectable so that problem would be solved.
>    

These days qemu-kvm.git is bisectable (though not always trivially).  
qemu.git doesn't have this problem.

>> The point for a single repository is that it simplifies the development
>> process. I agree with you here. But the current process of KVM is not too
>> difficult after all. I don't have to touch qemu sources for most of my work
>> on KVM.
>>      
> In my judgement you'd have to do that more frequently, if KVM was properly
> weighting its priorities. For example regarding this recent KVM commit of
> yours:
>
> | commit ec1ff79084fccdae0dca9b04b89dcdf3235bbfa1
> | Author: Joerg Roedel<joerg.roedel@amd.com>
> | Date:   Fri Oct 9 16:08:31 2009 +0200
> |
> |     KVM: SVM: Add tracepoint for invlpga instruction
> |
> |     This patch adds a tracepoint for the event that the guest
> |     executed the INVLPGA instruction.
>
> With integrated KVM tooling i might have insisted for that new tracepoint to
> be available to users as well via some more meaningful tooling than just a
> pure tracepoint.
>    

Something I've wanted for a long time is to port kvm_stat to use 
tracepoints instead of the home-grown instrumentation.  But that is 
unrelated to this new tracepoint.  Other than that we're satisfied with 
ftrace.
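
For reference, a tracepoint-based kvm_stat could build directly on the 
existing kvm events exposed through ftrace.  A minimal sketch, assuming 
debugfs is mounted at /sys/kernel/debug and the kernel has the kvm 
tracepoints (e.g. kvm:kvm_exit) compiled in:

/* Minimal sketch: enable the kvm:kvm_exit tracepoint through ftrace and
 * stream the raw events.  A real kvm_stat replacement would parse and
 * aggregate them instead of printing.  Needs root and a mounted debugfs. */
#include <stdio.h>

#define TRACING "/sys/kernel/debug/tracing"

int main(void)
{
	FILE *enable = fopen(TRACING "/events/kvm/kvm_exit/enable", "w");
	FILE *pipe;
	char line[512];

	if (!enable) {
		perror("enabling kvm:kvm_exit");
		return 1;
	}
	fputs("1\n", enable);
	fclose(enable);

	pipe = fopen(TRACING "/trace_pipe", "r");	/* blocks until events arrive */
	if (!pipe) {
		perror(TRACING "/trace_pipe");
		return 1;
	}
	while (fgets(line, sizeof(line), pipe))
		fputs(line, stdout);			/* one line per guest exit */
	return 0;
}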

> You should realize that naturally developers will gravitate towards the most
> 'fun' aspects of a project. It is the task of the maintainer to keep the
> balance between fun and utility, bugs and features, quality and code-rot.
>    

There are plenty of un-fun tasks (like fixing bugs and providing RAS 
features) that we're doing.  We don't do this for fun but to satisfy our 
users.

-- 
error compiling committee.c: too many arguments to function


^ permalink raw reply	[flat|nested] 390+ messages in thread

* Re: [RFC] Unify KVM kernel-space and user-space code into a single  project
  2010-03-22 17:32                                                                                               ` Avi Kivity
@ 2010-03-22 17:52                                                                                                   ` Pekka Enberg
  2010-03-22 17:52                                                                                                   ` Pekka Enberg
  1 sibling, 0 replies; 390+ messages in thread
From: Pekka Enberg @ 2010-03-22 17:52 UTC (permalink / raw)
  To: Avi Kivity
  Cc: Frank Ch. Eigler, Ingo Molnar, Joerg Roedel, Anthony Liguori,
	Zhang, Yanmin, Peter Zijlstra, Sheng Yang, linux-kernel, kvm,
	Marcelo Tosatti, Jes Sorensen, Gleb Natapov, Zachary Amsden,
	ziteng.huang, Arnaldo Carvalho de Melo, Frédéric Weisbecker

Hi Avi,

On Mon, Mar 22, 2010 at 7:32 PM, Avi Kivity <avi@redhat.com> wrote:
>> It's kinda funny to see people argue that having an external
>> repository is not a problem and that it's not a big deal if building
>> something from the repository is slightly painful as long as it
>> doesn't require a PhD when we have _real world_ experience that it
>> _does_ limit developer base in some cases. Whether or not that applies
>> to kvm remains to be seen but I've yet to see a convincing argument
>> why it doesn't.
>
> qemu has non-Linux developers.  Not all of their contributions are relevant
> to kvm but some are.  If we pull qemu into tools/kvm, we lose them.

Yeah, you probably would but the hypothesis is that you'd end up with
a bigger net developer base for the _Linux_ version. Now you might not
think that's important but I certainly do and I think Ingo does as
well. ;-)

That said, pulling 400 KLOC of code into the kernel sounds really
excessive. Would we need all that if we just do native virtualization
and no actual emulation?

                       Pekka

^ permalink raw reply	[flat|nested] 390+ messages in thread

* Re: [RFC] Unify KVM kernel-space and user-space code into a single project
  2010-03-22 17:34                                                                                                             ` Ingo Molnar
@ 2010-03-22 17:55                                                                                                               ` Avi Kivity
  2010-03-22 19:15                                                                                                                 ` Anthony Liguori
  2010-03-22 19:20                                                                                                                 ` Ingo Molnar
  2010-03-22 18:35                                                                                                               ` Anthony Liguori
  2010-03-22 18:41                                                                                                               ` Anthony Liguori
  2 siblings, 2 replies; 390+ messages in thread
From: Avi Kivity @ 2010-03-22 17:55 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Anthony Liguori, Pekka Enberg, Zhang, Yanmin, Peter Zijlstra,
	Sheng Yang, linux-kernel, kvm, Marcelo Tosatti, oerg Roedel,
	Jes Sorensen, Gleb Natapov, Zachary Amsden, ziteng.huang,
	Arnaldo Carvalho de Melo, Frédéric Weisbecker, Gregory Haskins

On 03/22/2010 07:34 PM, Ingo Molnar wrote:
>
>> The 'something trustable and kernel-provided'.  The kernel knows nothing
>> about guest names.
>>      
> The kernel certainly knows about other resources such as task names or network
> interface names or tracepoint names. This is kernel design 101.
>    

But it doesn't know about guest names.  You can't trust task names since 
any user can create a task with any name.  Network interfaces are root 
only so you can trust their names.

There are dozens or even hundreds of object classes the kernel does not 
know about and cannot enumerate.  User names, for instance. X sessions.  
Windows (the screen artifact, not the OS).  CIFS shares exported by this 
machine.  Currently running applications (not processes).

btw, network interfaces would have been much better off using 
/dev/netif/name rather than having their own namespace, IMO, like disks.


>>>> [...]  I don't like using the term, because sometimes the layers are
>>>> incorrect and need to be violated.  But it should be done explicitly, not
>>>> as a shortcut for a minor feature (and profiling is a minor feature, most
>>>> users will never use it, especially guest-from-host).
>>>>
>>>> The fact is we have well defined layers today, kvm virtualizes the cpu
>>>> and memory, qemu emulates devices for a single guest, libvirt manages
>>>> guests. We break this sometimes but there has to be a good reason.  So
>>>> perf needs to talk to libvirt if it wants names.  Could be done via
>>>> linking, or can be done using a plugin libvirt drops into perf.
>>>>          
> This is really just the much-discredited microkernel approach for keeping
> global enumeration data that should be kept by the kernel ...
>    

I disagree that it should be kept in the kernel.  Why introduce a new 
namespace, with APIs to query it and manage it, rules regarding 
conflicts, and then have to virtualize it for containers?

> Lets look at the ${HOME}/.qemu/qmp/ enumeration method suggested by Anthony.
> There's numerous ways that this can break:
>    

I don't like it either.  We have libvirt for enumerating guests.

>   - Those special files can get corrupted, mis-setup, get out of sync, or can
>     be hard to discover.
>
>   - The ${HOME}/.qemu/qmp/ solution suggested by Anthony has a very obvious
>     design flaw: it is per user. When i'm root i'd like to query _all_ current
>     guest images, not just the ones started by root. A system might not even
>     have a notion of '${HOME}'.
>
>   - Apps might start KVM vcpu instances without adhering to the
>     ${HOME}/.qemu/qmp/ access method.
>    

- it doesn't work with nfs.

>   - There is no guarantee for the Qemu process to reply to a request - while
>     the kernel can always guarantee an enumeration result. I dont want 'perf
>     kvm' to hang or misbehave just because Qemu has hung.
>    

If qemu doesn't reply, your guest is dead anyway.

> Really, for such reasons user-space is pretty poor at doing system-wide
> enumeration and resource management. Microkernels lost for a reason.
>    

Take a look at your desktop: userspace is doing all of that everywhere, 
from enumerating users and groups to deciding how your disks are 
named.  The kernel only provides the bare facilities.

> You are committing several grave design mistakes here.
>    

I am committing on the shoulders of giants.

-- 
error compiling committee.c: too many arguments to function


^ permalink raw reply	[flat|nested] 390+ messages in thread

* Re: [RFC] Unify KVM kernel-space and user-space code into a single project
  2010-03-22 17:39                                                                                                 ` Ingo Molnar
@ 2010-03-22 17:58                                                                                                   ` Avi Kivity
  0 siblings, 0 replies; 390+ messages in thread
From: Avi Kivity @ 2010-03-22 17:58 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Pekka Enberg, Frank Ch. Eigler, Joerg Roedel, Anthony Liguori,
	Zhang, Yanmin, Peter Zijlstra, Sheng Yang, linux-kernel, kvm,
	Marcelo Tosatti, Jes Sorensen, Gleb Natapov, Zachary Amsden,
	ziteng.huang, Arnaldo Carvalho de Melo, Frédéric Weisbecker

On 03/22/2010 07:39 PM, Ingo Molnar wrote:
> * Avi Kivity<avi@redhat.com>  wrote:
>
>    
>> On 03/22/2010 07:27 PM, Pekka Enberg wrote:
>>      
>>> It's kinda funny to see people argue that having an external repository is
>>> not a problem and that it's not a big deal if building something from the
>>> repository is slightly painful as long as it doesn't require a PhD when we
>>> have _real world_ experience that it _does_ limit developer base in some
>>> cases. Whether or not that applies to kvm remains to be seen but I've yet
>>> to see a convincing argument why it doesn't.
>>>        
>> qemu has non-Linux developers.  Not all of their contributions are relevant
>> to kvm but some are.  If we pull qemu into tools/kvm, we lose them.
>>      
> Qemu had very few developers before KVM made use of it - i know it because i
> followed the project prior to KVM.
>    

No argument.

> So whatever development activity Qemu has today, it's 99% [WAG] attributable
> to KVM. It might have non-Linux contributors, but they wouldnt be there if it
> wasnt for all the Linux contributors ...
>
> Furthermore, those contributors wouldnt have to leave - they could simply use
> a different Git URI ...
>    

tools/kvm would drop support for non-Linux hosts, for tcg, and for 
architectures which kvm doesn't support ("clean and minimal").  That 
would be the real win, not sharing the repository.  But those other 
contributors would just stay with the original qemu.

-- 
error compiling committee.c: too many arguments to function


^ permalink raw reply	[flat|nested] 390+ messages in thread

* Re: [RFC] Unify KVM kernel-space and user-space code into a single project
  2010-03-22 17:43                                                                                               ` Ingo Molnar
@ 2010-03-22 18:02                                                                                                 ` Avi Kivity
  0 siblings, 0 replies; 390+ messages in thread
From: Avi Kivity @ 2010-03-22 18:02 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Pekka Enberg, Frank Ch. Eigler, Joerg Roedel, Anthony Liguori,
	Zhang, Yanmin, Peter Zijlstra, Sheng Yang, linux-kernel, kvm,
	Marcelo Tosatti, Jes Sorensen, Gleb Natapov, Zachary Amsden,
	ziteng.huang, Arnaldo Carvalho de Melo, Frédéric Weisbecker

On 03/22/2010 07:43 PM, Ingo Molnar wrote:
>
>> It's kinda funny to see people argue that having an external repository is
>> not a problem and that it's not a big deal if building something from the
>> repository is slightly painful as long as it doesn't require a PhD when we
>> have _real world_ experience that it _does_ limit developer base in some
>> cases. Whether or not that applies to kvm remains to be seen but I've yet to
>> see a convincing argument why it doesn't.
>>      
> Yeah.
>
> Also, if in fact the claim that the 'repository does not matter' is true then
> it doesnt matter that it's hosted in tools/kvm/ either, right?
>    

Again, the second it's moved to tools/kvm/ we strip it of anything that 
kvm can't use.

> I.e. it's a win-win situation. Worst-case nothing happens beyond a Git URI
> change. Best-case the project is propelled to never seen heights due to
> contribution advantages not contemplated and not experienced by the KVM guys
> before ...
>    

You're exaggerating.  There were 773 commits into qemu.git (excluding 
qemu-kvm.git) in the past three months.  162 for the same period for 
tools/perf.  The pool is not that deep.

-- 
error compiling committee.c: too many arguments to function


^ permalink raw reply	[flat|nested] 390+ messages in thread

* Re: [RFC] Unify KVM kernel-space and user-space code into a single project
  2010-03-22 17:52                                                                                                   ` Pekka Enberg
  (?)
@ 2010-03-22 18:04                                                                                                   ` Avi Kivity
  2010-03-22 18:10                                                                                                       ` Pekka Enberg
  -1 siblings, 1 reply; 390+ messages in thread
From: Avi Kivity @ 2010-03-22 18:04 UTC (permalink / raw)
  To: Pekka Enberg
  Cc: Frank Ch. Eigler, Ingo Molnar, Joerg Roedel, Anthony Liguori,
	Zhang, Yanmin, Peter Zijlstra, Sheng Yang, linux-kernel, kvm,
	Marcelo Tosatti, Jes Sorensen, Gleb Natapov, Zachary Amsden,
	ziteng.huang, Arnaldo Carvalho de Melo, Frédéric Weisbecker

On 03/22/2010 07:52 PM, Pekka Enberg wrote:
> Hi Avi,
>
> On Mon, Mar 22, 2010 at 7:32 PM, Avi Kivity<avi@redhat.com>  wrote:
>    
>>> It's kinda funny to see people argue that having an external
>>> repository is not a problem and that it's not a big deal if building
>>> something from the repository is slightly painful as long as it
>>> doesn't require a PhD when we have _real world_ experience that it
>>> _does_ limit developer base in some cases. Whether or not that applies
>>> to kvm remains to be seen but I've yet to see a convincing argument
>>> why it doesn't.
>>>        
>> qemu has non-Linux developers.  Not all of their contributions are relevant
>> to kvm but some are.  If we pull qemu into tools/kvm, we lose them.
>>      
> Yeah, you probably would but the hypothesis is that you'd end up with
> a bigger net developer base for the _Linux_ version. Now you might not
> think that's important but I certainly do and I think Ingo does as
> well. ;-)
>    

You're probably correct, but the point is that non-Linux developers also 
contribute things which kvm benefits from.  Not a whole lot, but some.

> That said, pulling 400 KLOC of code into the kernel sounds really
> excessive. Would we need all that if we just do native virtualization
> and no actual emulation?
>    

What is native virtualization and no actual emulation?

-- 
error compiling committee.c: too many arguments to function


^ permalink raw reply	[flat|nested] 390+ messages in thread

* Re: [RFC] Unify KVM kernel-space and user-space code into a single project
  2010-03-22 16:40                                                                                                             ` Pekka Enberg
  (?)
@ 2010-03-22 18:06                                                                                                             ` Avi Kivity
  -1 siblings, 0 replies; 390+ messages in thread
From: Avi Kivity @ 2010-03-22 18:06 UTC (permalink / raw)
  To: Pekka Enberg
  Cc: Ingo Molnar, Anthony Liguori, Zhang, Yanmin, Peter Zijlstra,
	Sheng Yang, linux-kernel, kvm, Marcelo Tosatti, oerg Roedel,
	Jes Sorensen, Gleb Natapov, Zachary Amsden, ziteng.huang,
	Arnaldo Carvalho de Melo, Frédéric Weisbecker, Gregory Haskins

On 03/22/2010 06:40 PM, Pekka Enberg wrote:
> On Mon, Mar 22, 2010 at 6:16 PM, Avi Kivity<avi@redhat.com>  wrote:
>    
>>> You simply kept ignoring me when I said that if something can be kept out
>>> of the kernel without impacting performance, it should be.  I don't want
>>> emergency patches closing some security hole or oops in a kernel symbol
>>> server.
>>>        
>> Or rather, explained how I am a wicked microkernelist.  The herring were out
>> in force today.
>>      
> Well, if it's not being a "wicked microkernelist" then what is it?
>    

I know I'm bad.

> Performance is hardly the only motivation to put things into the
> kernel. Think kernel mode-setting and devtmpfs (with the ironic twist
> of original devfs being removed from the kernel) here, for example.
>    

Motivations include privileged device access, needing to access physical 
memory, security, and keeping the userspace interface sane.  There are 
others.  I don't think any of them hold here.

-- 
error compiling committee.c: too many arguments to function


^ permalink raw reply	[flat|nested] 390+ messages in thread

* Re: [RFC] Unify KVM kernel-space and user-space code into a single  project
  2010-03-22 18:04                                                                                                   ` Avi Kivity
@ 2010-03-22 18:10                                                                                                       ` Pekka Enberg
  0 siblings, 0 replies; 390+ messages in thread
From: Pekka Enberg @ 2010-03-22 18:10 UTC (permalink / raw)
  To: Avi Kivity
  Cc: Frank Ch. Eigler, Ingo Molnar, Joerg Roedel, Anthony Liguori,
	Zhang, Yanmin, Peter Zijlstra, Sheng Yang, linux-kernel, kvm,
	Marcelo Tosatti, Jes Sorensen, Gleb Natapov, Zachary Amsden,
	ziteng.huang, Arnaldo Carvalho de Melo, Frédéric Weisbecker

On Mon, Mar 22, 2010 at 8:04 PM, Avi Kivity <avi@redhat.com> wrote:
>> That said, pulling 400 KLOC of code into the kernel sounds really
>> excessive. Would we need all that if we just do native virtualization
>> and no actual emulation?
>
> What is native virtualization and no actual emulation?

What I meant by "actual emulation" was running architecture A code
on architecture B, which was qemu's traditional use case. So the
question is: how much of the 400 KLOC do we need for just KVM on all
the architectures that it supports?

^ permalink raw reply	[flat|nested] 390+ messages in thread

* Re: [RFC] Unify KVM kernel-space and user-space code into a single project
  2010-03-22 14:47                                                                                             ` Ingo Molnar
@ 2010-03-22 18:15                                                                                               ` Avi Kivity
  0 siblings, 0 replies; 390+ messages in thread
From: Avi Kivity @ 2010-03-22 18:15 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Anthony Liguori, Pekka Enberg, Zhang, Yanmin, Peter Zijlstra,
	Sheng Yang, linux-kernel, kvm, Marcelo Tosatti, oerg Roedel,
	Jes Sorensen, Gleb Natapov, Zachary Amsden, ziteng.huang,
	Arnaldo Carvalho de Melo, Frédéric Weisbecker

On 03/22/2010 04:47 PM, Ingo Molnar wrote:
>
>>> If you are interested in the first-hand experience of the people who are
>>> doing the perf work then here it is: by far the biggest reason for perf
>>> success and perf usability is the integration of the user-space tooling
>>> with the kernel-space bits, into a single repository and project.
>>>        
>> Please take a look at the kvm integration code in qemu as a fraction of the
>> whole code base.
>>      
> You have to admit that much of Qemu's past 2-3 years of development was
> motivated by Linux/KVM (i'd say more than 50% of the code).

kvm certainly revitalized qemu development.

> As such it's one
> and the same code base - you just continue to define Qemu to be different from
> KVM.
>    

It's not the same code base.  kvm provides a cpu virtualization service, 
qemu uses it.  There could be other users.  qemu could go away one day 
and be replaced by something else (tools/kvm?), and kvm would be unaffected.

> I very much remember how Qemu looked like _before_ KVM: it was a struggling,
> dying project. KVM clearly changed that.
>    

I'm a hero.

>>> The very move you are opposing so vehemently for KVM.
>>>        
>> I don't want to fracture a working community.
>>      
> Would you accept (or at least not NAK) a new tools/kvm/ tool that builds
> tooling from the ground up, while leaving Qemu untouched? [assuming it's all
> clean code, etc.]
>    

I couldn't NAK tools/kvm any more than I could NAK a new project outside 
the kernel repository.  IMO it would be duplicated effort, but like I 
mentioned before, I can't tell volunteers what to do, only recommend 
that they join the existing effort.

> Although i have doubts about how well that would work 'against' your opinion:
> such a tool would need lots of KVM-side features and a positive attitude from
> you to be really useful. There's a lot of missing functionality to cover.
>    

Functionality that can be implemented in userspace will not be accepted 
into kvm unless there are very good reasons why it should be.  Things 
that belong in kvm will be more than welcome.

>> Seems like perf is also split, with sysprof being developed outside the
>> kernel.  Will you bring sysprof into the kernel?  Will every feature be
>> duplicated in prof and sysprof?
>>      
> I'd prefer if sysprof merged into perf as 'perf view' - but its maintainer
> does not want that - which is perfectly OK.

You spared him the flamewar, I hope.

> So we are building equivalent
> functionality into perf instead.
>    

Ah, duplicating effort.  Great.

> Think about it like Firefox plugins: the main Firefox project picks up the
> functionality of the most popular Firefox plugins all the time. Session Saver,
> Tab Mix Plus, etc. were all in essence 'merged' (in functionality, not in
> code) into the 'reference' Firefox project.
>    

There's a difference between absorbing a small plugin and duplicating a 
project.

-- 
error compiling committee.c: too many arguments to function


^ permalink raw reply	[flat|nested] 390+ messages in thread

* Re: [RFC] Unify KVM kernel-space and user-space code into a single project
  2010-03-22 16:59                                                                                                         ` Ingo Molnar
@ 2010-03-22 18:28                                                                                                           ` Anthony Liguori
  0 siblings, 0 replies; 390+ messages in thread
From: Anthony Liguori @ 2010-03-22 18:28 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Avi Kivity, Pekka Enberg, Zhang, Yanmin, Peter Zijlstra,
	Sheng Yang, linux-kernel, kvm, Marcelo Tosatti, oerg Roedel,
	Jes Sorensen, Gleb Natapov, Zachary Amsden, ziteng.huang,
	Arnaldo Carvalho de Melo, Frédéric Weisbecker, Gregory Haskins

On 03/22/2010 11:59 AM, Ingo Molnar wrote:
>
> Ok, that sounds interesting! I'd rather see some raw mechanism that 'perf kvm'
> could use instead of having to require yet another library (which generally
> dampens adoption of a tool). So i think we can work from there.
>    

You can access the protocol directly if you don't want a library dependency.

> Btw., have you considered using Qemu's command name (task->comm[]) as the
> symbolic name? That way we could see the guest name in 'top' on the host - a
> nice touch.
>    

qemu-system-x86_64 -name Fedora,process=qemu-Fedora

It does exactly that.  We don't make it the default, based on the 
principle of least surprise: many users expect to be able to do killall 
qemu-system-x86, and if we did this by default, that wouldn't work.
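
For illustration, a minimal host-side sketch built on that convention; 
it assumes guests were started with process=qemu-<name> (an assumption, 
not a default) and uses /proc/<pid>/comm, which recent kernels expose:

/* Minimal sketch: list qemu guests by task comm, assuming they were
 * launched with "-name <guest>,process=qemu-<guest>" so the comm
 * carries a "qemu-" prefix.  That naming convention is an assumption,
 * not something qemu enforces. */
#include <ctype.h>
#include <dirent.h>
#include <stdio.h>
#include <string.h>

int main(void)
{
	DIR *proc = opendir("/proc");
	struct dirent *de;
	char path[64], comm[64];
	FILE *f;

	if (!proc)
		return 1;
	while ((de = readdir(proc)) != NULL) {
		if (!isdigit((unsigned char)de->d_name[0]))
			continue;			/* only pid directories */
		snprintf(path, sizeof(path), "/proc/%s/comm", de->d_name);
		f = fopen(path, "r");
		if (!f)
			continue;
		if (fgets(comm, sizeof(comm), f) && !strncmp(comm, "qemu-", 5))
			printf("pid %s: %s", de->d_name, comm);	/* comm ends in '\n' */
		fclose(f);
	}
	closedir(proc);
	return 0;
}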

>> The sockets are named based on UUID and you'll have to connect to a guest
>> and ask it for its name.  Some guests don't have names so we'll have to
>> come up with a clever way to describe a nameless VM.
>>      
> I think just exposing the UUID in that lazy case would be adequate? It creates
> pressure for VM launchers to use better symbolic names.
>    

Yup.

>>> I.e.:
>>>
>>>   - Easy default reference to guest instances, and a way for tools to
>>>     reference them symbolically as well in the multi-guest case. Preferably
>>>     something trustable and kernel-provided - not some indirect information
>>>     like a PID file created by libvirt-manager or so.
>>>        
>> A guest is not a KVM concept.  It's a qemu concept so it needs to be
>> something provided by qemu.  The other caveat is that you won't see guests
>> created by libvirt because we're implementing this in terms of a default QMP
>> device and libvirt will disable defaults.  This is desired behaviour.
>> libvirt wants to be in complete control and doesn't want a tool like perf
>> interacting with a guest directly.
>>      
> Hm, this sucks for multiple reasons. Firstly, perf isnt a tool that
> 'interacts', it's an observation tool: just like 'top' is an observation tool.
>
> We want to enable developers to see all activities on the system - regardless
> of who started the VM or who started the process. Imagine if we had a way to
> hide tasks from 'top'. It would be rather awful.
>
> Secondly, it tells us that the concept is fragile if it doesnt automatically
> enumerate all guests, regardless of how they were created.
>    

Perf does interact with a guest though because it queries a guest to 
read its file system.

I understand the point you're making though.  If, instead of a pull 
interface where the host queries the guest for files, the guest pushed 
a small set of files at startup which the host cached, then you could 
potentially unconditionally expose a "read-only" socket that only 
exposed limited information.

> Full system enumeration is generally best left to the kernel, as it can offer
> coherent access.
>    

I don't see why qemu can't offer coherent access.  The limitation today 
is intentional and if it's overly restrictive, we can figure out a means 
to change it.

Regards,

Anthony Liguori


> 	Ingo
>    


^ permalink raw reply	[flat|nested] 390+ messages in thread

* Re: [RFC] Unify KVM kernel-space and user-space code into a single project
  2010-03-22 17:11                                                                                                         ` Ingo Molnar
@ 2010-03-22 18:30                                                                                                           ` Anthony Liguori
  0 siblings, 0 replies; 390+ messages in thread
From: Anthony Liguori @ 2010-03-22 18:30 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Avi Kivity, Pekka Enberg, Zhang, Yanmin, Peter Zijlstra,
	Sheng Yang, linux-kernel, kvm, Marcelo Tosatti, oerg Roedel,
	Jes Sorensen, Gleb Natapov, Zachary Amsden, ziteng.huang,
	Arnaldo Carvalho de Melo, Frédéric Weisbecker, Gregory Haskins

On 03/22/2010 12:11 PM, Ingo Molnar wrote:
> * Anthony Liguori<anthony@codemonkey.ws>  wrote:
>
>    
>>>   - Easy default reference to guest instances, and a way for tools to
>>>     reference them symbolically as well in the multi-guest case. Preferably
>>>     something trustable and kernel-provided - not some indirect information
>>>     like a PID file created by libvirt-manager or so.
>>>        
>> A guest is not a KVM concept. [...]
>>      
> Well, in a sense a guest is a KVM concept too: it's in essence represented via
> the 'vcpu state attached to a struct mm' abstraction that is attached to the
> /dev/kvm file descriptor attached to a Linux process.
>
> Multiple vcpus can be started by the same process to represent SMP, but the
> whole guest notion is present: a Linux MM that carries KVM state.
>
> In that sense when we type 'perf kvm list' we'd like to get a list of all
> currently present guests that the developer has permission to profile: i.e.
> we'd like a list of all [debuggable] Linux tasks that have a KVM instance
> attached to them.
>
> A convenient way to do that would be to use the Qemu process's ->comm[] name,
> and to have a KVM ioctl that gets us a list of all vcpus that the querying
> task has ptrace permission to. [the standard permission check we do for
> instrumentation]
>
> No need for communication with Qemu for that - just an ioctl, and an
> always-guaranteed result that works fine on a whole-system and on a per user
> basis as well.
>    

You need a way to interact with the guest which means you need some type 
of device.  All of the interesting devices are implemented in qemu so 
you're going to have to interact with qemu if you want meaningful 
interaction with a guest.
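
As an illustration of the enumeration question above: a task that has 
created a VM holds a file descriptor whose /proc link text contains 
"kvm-vm", so a rough approximation of "list all tasks with a KVM 
instance" is already possible from userspace without any new ioctl.  A 
hedged sketch (the exact link text is kernel-version dependent, and you 
only see tasks you are allowed to inspect):

/* Rough sketch: approximate "list all tasks with a KVM instance" by
 * scanning /proc/<pid>/fd for anon-inode links containing "kvm-vm".
 * The link text is an implementation detail and may differ between
 * kernel versions; permissions limit which tasks are visible. */
#include <ctype.h>
#include <dirent.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

static int has_kvm_vm_fd(const char *pid)
{
	char fddir[64], link[128], target[256];
	DIR *d;
	struct dirent *de;
	ssize_t n;
	int found = 0;

	snprintf(fddir, sizeof(fddir), "/proc/%s/fd", pid);
	d = opendir(fddir);
	if (!d)
		return 0;		/* no permission, or task exited */
	while (!found && (de = readdir(d)) != NULL) {
		if (de->d_name[0] == '.')
			continue;
		snprintf(link, sizeof(link), "%s/%s", fddir, de->d_name);
		n = readlink(link, target, sizeof(target) - 1);
		if (n > 0) {
			target[n] = '\0';
			if (strstr(target, "kvm-vm"))
				found = 1;
		}
	}
	closedir(d);
	return found;
}

int main(void)
{
	DIR *proc = opendir("/proc");
	struct dirent *de;

	if (!proc)
		return 1;
	while ((de = readdir(proc)) != NULL)
		if (isdigit((unsigned char)de->d_name[0]) &&
		    has_kvm_vm_fd(de->d_name))
			printf("pid %s holds a kvm-vm fd\n", de->d_name);
	closedir(proc);
	return 0;
}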

Regards,

Anthony Liguori

> Thanks,
>
> 	Ingo
>    


^ permalink raw reply	[flat|nested] 390+ messages in thread

* Re: [RFC] Unify KVM kernel-space and user-space code into a single project
  2010-03-22 17:34                                                                                                             ` Ingo Molnar
  2010-03-22 17:55                                                                                                               ` Avi Kivity
@ 2010-03-22 18:35                                                                                                               ` Anthony Liguori
  2010-03-22 19:22                                                                                                                 ` Ingo Molnar
  2010-03-22 18:41                                                                                                               ` Anthony Liguori
  2 siblings, 1 reply; 390+ messages in thread
From: Anthony Liguori @ 2010-03-22 18:35 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Avi Kivity, Pekka Enberg, Zhang, Yanmin, Peter Zijlstra,
	Sheng Yang, linux-kernel, kvm, Marcelo Tosatti, oerg Roedel,
	Jes Sorensen, Gleb Natapov, Zachary Amsden, ziteng.huang,
	Arnaldo Carvalho de Melo, Frédéric Weisbecker, Gregory Haskins

On 03/22/2010 12:34 PM, Ingo Molnar wrote:
> * Avi Kivity<avi@redhat.com>  wrote:
>
>    
>>>>>   - Easy default reference to guest instances, and a way for tools to
>>>>>     reference them symbolically as well in the multi-guest case. Preferably
>>>>>     something trustable and kernel-provided - not some indirect information
>>>>>     like a PID file created by libvirt-manager or so.
>>>>>            
>>>> Usually 'layering violation' is trotted out at such suggestions.
>>>> [...]
>>>>          
>>> That's weird, how can a feature request be a 'layering violation'?
>>>        
>> The 'something trustable and kernel-provided'.  The kernel knows nothing
>> about guest names.
>>      
> The kernel certainly knows about other resources such as task names or network
> interface names or tracepoint names. This is kernel design 101.
>
>    
>>> If something that users find straightforward and usable is a layering
>>> violation to you (such as easily being able to access their own files on
>>> the host as well ...) then i think you need to revisit the definition of
>>> that term instead of trying to fix the user.
>>>        
>> Here is the explanation, you left it quoted:
>>
>>      
>>>> [...]  I don't like using the term, because sometimes the layers are
>>>> incorrect and need to be violated.  But it should be done explicitly, not
>>>> as a shortcut for a minor feature (and profiling is a minor feature, most
>>>> users will never use it, especially guest-from-host).
>>>>
>>>> The fact is we have well defined layers today, kvm virtualizes the cpu
>>>> and memory, qemu emulates devices for a single guest, libvirt manages
>>>> guests. We break this sometimes but there has to be a good reason.  So
>>>> perf needs to talk to libvirt if it wants names.  Could be done via
>>>> linking, or can be done using a plugin libvirt drops into perf.
>>>>          
> This is really just the much-discredited microkernel approach for keeping
> global enumeration data that should be kept by the kernel ...
>
> Lets look at the ${HOME}/.qemu/qmp/ enumeration method suggested by Anthony.
> There's numerous ways that this can break:
>
>   - Those special files can get corrupted, mis-setup, get out of sync, or can
>     be hard to discover.
>
>   - The ${HOME}/.qemu/qmp/ solution suggested by Anthony has a very obvious
>     design flaw: it is per user. When i'm root i'd like to query _all_ current
>     guest images, not just the ones started by root. A system might not even
>     have a notion of '${HOME}'.
>
>   - Apps might start KVM vcpu instances without adhering to the
>     ${HOME}/.qemu/qmp/ access method.
>    

Not all KVM vcpus are running operating systems.

Transitive had a product that was using a KVM context to run their 
binary translator, which allowed them full access to the host process's 
virtual address space.  In this case, there is no kernel and there 
are no devices.

That's what I mean by a guest being a userspace context.  KVM simply 
provides a new CPU mode to userspace, in the same way that vm8086 mode does.
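
For readers unfamiliar with the API, this is roughly what such a bare 
userspace context looks like at the /dev/kvm level - a sketch with no 
devices, no firmware, no register setup and no error handling, just a 
vcpu backed by ordinary process memory:

/* Bare-bones sketch of the /dev/kvm API: a "guest" is just a vcpu
 * running code placed in this process's memory.  Error handling,
 * register setup and the actual guest code are omitted. */
#include <fcntl.h>
#include <linux/kvm.h>
#include <sys/ioctl.h>
#include <sys/mman.h>

int main(void)
{
	int kvm = open("/dev/kvm", O_RDWR);
	int vm = ioctl(kvm, KVM_CREATE_VM, 0);

	/* back 64K of guest-physical memory with ordinary process memory */
	void *mem = mmap(NULL, 0x10000, PROT_READ | PROT_WRITE,
			 MAP_SHARED | MAP_ANONYMOUS, -1, 0);
	struct kvm_userspace_memory_region region = {
		.slot = 0,
		.guest_phys_addr = 0,
		.memory_size = 0x10000,
		.userspace_addr = (unsigned long)mem,
	};
	ioctl(vm, KVM_SET_USER_MEMORY_REGION, &region);

	int vcpu = ioctl(vm, KVM_CREATE_VCPU, 0);
	int run_size = ioctl(kvm, KVM_GET_VCPU_MMAP_SIZE, 0);
	struct kvm_run *run = mmap(NULL, run_size, PROT_READ | PROT_WRITE,
				   MAP_SHARED, vcpu, 0);

	/* ... copy guest code into mem, set up registers/sregs here ... */

	ioctl(vcpu, KVM_RUN, 0);	/* enter guest mode until the next exit */
	return run->exit_reason;	/* e.g. KVM_EXIT_HLT */
}

Everything the "guest" touches here is plain memory of the embedding 
process, which is what makes use cases like the one above possible 
without any kernel or device model.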

Regards,

Anthony Liguori

>   - There is no guarantee for the Qemu process to reply to a request - while
>     the kernel can always guarantee an enumeration result. I dont want 'perf
>     kvm' to hang or misbehave just because Qemu has hung.
>
> Really, for such reasons user-space is pretty poor at doing system-wide
> enumeration and resource management. Microkernels lost for a reason.
>
> You are committing several grave design mistakes here.
>
> Thanks,
>
> 	Ingo
>    


^ permalink raw reply	[flat|nested] 390+ messages in thread

* Re: [RFC] Unify KVM kernel-space and user-space code into a single project
  2010-03-22 17:34                                                                                                             ` Ingo Molnar
  2010-03-22 17:55                                                                                                               ` Avi Kivity
  2010-03-22 18:35                                                                                                               ` Anthony Liguori
@ 2010-03-22 18:41                                                                                                               ` Anthony Liguori
  2010-03-22 19:27                                                                                                                 ` Ingo Molnar
  2 siblings, 1 reply; 390+ messages in thread
From: Anthony Liguori @ 2010-03-22 18:41 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Avi Kivity, Pekka Enberg, Zhang, Yanmin, Peter Zijlstra,
	Sheng Yang, linux-kernel, kvm, Marcelo Tosatti, oerg Roedel,
	Jes Sorensen, Gleb Natapov, Zachary Amsden, ziteng.huang,
	Arnaldo Carvalho de Melo, Frédéric Weisbecker, Gregory Haskins

On 03/22/2010 12:34 PM, Ingo Molnar wrote:
> This is really just the much-discredited microkernel approach for keeping
> global enumeration data that should be kept by the kernel ...
>
> Lets look at the ${HOME}/.qemu/qmp/ enumeration method suggested by Anthony.
> There's numerous ways that this can break:
>
>   - Those special files can get corrupted, mis-setup, get out of sync, or can
>     be hard to discover.
>
>   - The ${HOME}/.qemu/qmp/ solution suggested by Anthony has a very obvious
>     design flaw: it is per user. When i'm root i'd like to query _all_ current
>     guest images, not just the ones started by root. A system might not even
>     have a notion of '${HOME}'.
>
>   - Apps might start KVM vcpu instances without adhering to the
>     ${HOME}/.qemu/qmp/ access method.
>
>   - There is no guarantee for the Qemu process to reply to a request - while
>     the kernel can always guarantee an enumeration result. I dont want 'perf
>     kvm' to hang or misbehave just because Qemu has hung.
>    

If your position basically boils down to "we can't trust userspace, we 
can always trust the kernel, so I want to eliminate any userspace path", 
then I can't really help you out.

I believe we can come up with an infrastructure that satisfies your 
actual requirements within qemu but if you're also insisting upon the 
above implementation detail then there's nothing I can do.

Regards,

Anthony Liguori


^ permalink raw reply	[flat|nested] 390+ messages in thread

* Re: [RFC] Unify KVM kernel-space and user-space code into a single project
  2010-03-22 18:10                                                                                                       ` Pekka Enberg
  (?)
@ 2010-03-22 18:55                                                                                                       ` Avi Kivity
  -1 siblings, 0 replies; 390+ messages in thread
From: Avi Kivity @ 2010-03-22 18:55 UTC (permalink / raw)
  To: Pekka Enberg
  Cc: Frank Ch. Eigler, Ingo Molnar, Joerg Roedel, Anthony Liguori,
	Zhang, Yanmin, Peter Zijlstra, Sheng Yang, linux-kernel, kvm,
	Marcelo Tosatti, Jes Sorensen, Gleb Natapov, Zachary Amsden,
	ziteng.huang, Arnaldo Carvalho de Melo, Frédéric Weisbecker

On 03/22/2010 08:10 PM, Pekka Enberg wrote:
> On Mon, Mar 22, 2010 at 8:04 PM, Avi Kivity<avi@redhat.com>  wrote:
>    
>>> That said, pulling 400 KLOC of code into the kernel sounds really
>>> excessive. Would we need all that if we just do native virtualization
>>> and no actual emulation?
>>>        
>> What is native virtualization and no actual emulation?
>>      
> What I meant by "actual emulation" was running architecture A code
> on architecture B, which was qemu's traditional use case. So the
> question is: how much of the 400 KLOC do we need for just KVM on all
> the architectures that it supports?
>    

qemu is 620 KLOC.  Without cpu emulation that drops to ~480 KLOC.  Much 
of that is device emulation that is not supported by kvm now (like ARM) 
but some might be needed again in the future (like ARM).

x86-only is perhaps 300 KLOC, but kvm is not x86 only.

And that is with a rudimentary GUI.  GUIs are heavy.

-- 
Do not meddle in the internals of kernels, for they are subtle and quick to panic.


^ permalink raw reply	[flat|nested] 390+ messages in thread

* Re: [RFC] Unify KVM kernel-space and user-space code into a single project
  2010-03-22 14:54                                                                                               ` Ingo Molnar
@ 2010-03-22 19:04                                                                                                 ` Avi Kivity
  2010-03-23  9:46                                                                                                 ` Olivier Galibert
  1 sibling, 0 replies; 390+ messages in thread
From: Avi Kivity @ 2010-03-22 19:04 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Pekka Enberg, Anthony Liguori, Zhang, Yanmin, Peter Zijlstra,
	Sheng Yang, linux-kernel, kvm, Marcelo Tosatti, oerg Roedel,
	Jes Sorensen, Gleb Natapov, Zachary Amsden, ziteng.huang,
	Arnaldo Carvalho de Melo, Frédéric Weisbecker, sandmann

On 03/22/2010 04:54 PM, Ingo Molnar wrote:
> * Pekka Enberg<penberg@cs.helsinki.fi>  wrote:
>
>    
>> Hi Avi,
>>
>> On Mon, Mar 22, 2010 at 2:49 PM, Avi Kivity<avi@redhat.com>  wrote:
>>      
>>> Seems like perf is also split, with sysprof being developed outside the
>>> kernel.  Will you bring sysprof into the kernel?  Will every feature be
>>> duplicated in prof and sysprof?
>>>        
>> I am glad you brought it up! Sysprof was historically outside of the kernel
>> (with its own kernel module, actually). While the GUI was nice, it was much
>> harder to set up compared to oprofile so it wasn't all that popular. Things
>> improved slightly when Ingo merged the custom kernel module but the
>> _userspace_ part of sysprof was lagging behind a bit. I don't know what's
>> the situation now that they've switched over to perf syscalls but you
>> probably get my point.
>>
>> It would be nice if the two projects merged but I honestly don't see any
>> fundamental problem with two (or more) co-existing projects. Friendly
>> competition will ultimately benefit the users (think KDE and Gnome here).
>>      
> See my previous mail - what i see as the most healthy project model is to have
> a full solution reference implementation, connected to a flexible halo of
> plugins or sub-apps.
>
> Firefox does that, KDE does that, and Gnome as well to a certain degree.
>
> The 'halo' provides a constant feedback of new features, and it also provides
> competition and pressure on the 'main' code to be top-notch.
>
> The problem i see with KVM is that there's no reference implementation! There
> is _only_ the KVM kernel part which is not functional in itself. Surrounded by
> a 'halo' - where none of the entities is really 'the' reference implementation
> we call 'KVM'.
>    

The reference implementation is qemu-kvm.git, in the future qemu.git.  
Like the reference implementation of device-mapper is 
lvm2/device-mapper, not tools/device-mapper.

> This causes constant quality problems as the developers of the main project
> don't have constant pressure towards good quality (it is not their
> responsibility to care about user-space bits after all),

The developers of the main project are very much aware that users don't 
call the ioctls directly but instead use qemu.

>   plus it causes a lack
> of focus as well: integration between (friendly) competing user-space
> components is a lot harder than integration within a single framework such as
> Firefox.
>    

We are very focused, just not on what you think we should be focused.

> I hope this explains my points about modularization a bit better! I suggested
> KVM to grow a user-space tool component in the kernel repo in tools/kvm/,
> which would become the reference implementation for tooling. User-space
> projects can still provide alternative tooling or can plug into this tooling,
> just like they are doing it now. So the main effect isn't even on those
> projects but on the kernel developers. The ABI remains and all the user-space
> packages and projects remain.
>    

Seems like wanton duplication of effort.  Can we throw so many 
developer-years away on duplicate projects?  Assuming not all are true 
volunteers (85% for 2.6.33), who will fund this duplicate effort?

> Yes, i thought Qemu would be a prime candidate to be the baseline for
> tools/kvm/, but i guess that has become socially impossible now after this
> flamewar. It's not a big problem in the big scheme of things: tools/kvm/ is
> best grown up from a small towards larger size anyway ...
>    

Qemu is open source; you can cp it into tools/kvm.  Rewriting it from 
scratch is a mammoth effort; there's a reason kvm, Xen, and virtualbox 
all use qemu.  Qemu itself copied code from bochs.  Writing this stuff 
is hard, especially when there is already something working.

You'll probably get much better threading (the qemu device model is 
still single threaded), but it will take years to reach where qemu 
already is.

-- 
Do not meddle in the internals of kernels, for they are subtle and quick to panic.


^ permalink raw reply	[flat|nested] 390+ messages in thread

* Re: [RFC] Unify KVM kernel-space and user-space code into a single project
  2010-03-22 17:44                                                                                           ` Avi Kivity
@ 2010-03-22 19:10                                                                                             ` Ingo Molnar
  2010-03-22 19:18                                                                                               ` Anthony Liguori
                                                                                                                 ` (2 more replies)
  0 siblings, 3 replies; 390+ messages in thread
From: Ingo Molnar @ 2010-03-22 19:10 UTC (permalink / raw)
  To: Avi Kivity
  Cc: Joerg Roedel, Anthony Liguori, Pekka Enberg, Zhang, Yanmin,
	Peter Zijlstra, Sheng Yang, linux-kernel, kvm, Marcelo Tosatti,
	Jes Sorensen, Gleb Natapov, Zachary Amsden, ziteng.huang,
	Arnaldo Carvalho de Melo, Frédéric Weisbecker


* Avi Kivity <avi@redhat.com> wrote:

> On 03/22/2010 06:32 PM, Ingo Molnar wrote:
> >
> > So, what do you think creates code communities and keeps them alive? 
> > Developers and code. And the wellbeing of developers are primarily 
> > influenced by the repository structure and by the development/maintenance 
> > process - i.e. by the 'fun' aspect. (i'm simplifying things there but 
> > that's the crux of it.)
> 
> There is nothing fun about having one repository or two.  Who cares about 
> this anyway?
> 
> tools/kvm/ probably will draw developers, simply because of the glory 
> associated with kernel work.  That's a bug, not a feature.  It means that 
> effort is not distributed according to how it's needed, but because of 
> irrelevant considerations.

And yet your solution to that is to ... do all your work in the kernel space 
and declare the tooling as something that does not interest you? ;-)

> Something I've wanted for a long time is to port kvm_stat to use tracepoints 
> instead of the home-grown instrumentation.  But that is unrelated to this 
> new tracepoint.  Other than that we're satisfied with ftrace.

Despite it being another in-kernel subsystem that by your earlier arguments 
should be done via a user-space package? ;-)

> > You should realize that naturally developers will gravitate towards the 
> > most 'fun' aspects of a project. It is the task of the maintainer to keep 
> > the balance between fun and utility, bugs and features, quality and 
> > code-rot.
> 
> There are plenty of un-fun tasks (like fixing bugs and providing RAS 
> features) that we're doing.  We don't do this for fun but to satisfy our 
> users.

So which one is it, KVM developers are volunteers that do fun stuff and cannot 
be told about project priorities, or KVM developers are pros who do unfun 
stuff because they can be told about priorities?

I posit that it's both: and that priorities can be communicated - if only you 
try as a maintainer. All i'm suggesting is to add 'usable, unified user-space' 
to the list of unfun priorities, because it's possible and because it matters.

	Ingo

^ permalink raw reply	[flat|nested] 390+ messages in thread

* Re: [RFC] Unify KVM kernel-space and user-space code into a single project
  2010-03-22 17:55                                                                                                               ` Avi Kivity
@ 2010-03-22 19:15                                                                                                                 ` Anthony Liguori
  2010-03-22 19:31                                                                                                                   ` Daniel P. Berrange
  2010-03-22 20:00                                                                                                                   ` Antoine Martin
  2010-03-22 19:20                                                                                                                 ` Ingo Molnar
  1 sibling, 2 replies; 390+ messages in thread
From: Anthony Liguori @ 2010-03-22 19:15 UTC (permalink / raw)
  To: Avi Kivity
  Cc: Ingo Molnar, Pekka Enberg, Zhang, Yanmin, Peter Zijlstra,
	Sheng Yang, linux-kernel, kvm, Marcelo Tosatti, Joerg Roedel,
	Jes Sorensen, Gleb Natapov, Zachary Amsden, ziteng.huang,
	Arnaldo Carvalho de Melo, Frédéric Weisbecker, Gregory Haskins

On 03/22/2010 12:55 PM, Avi Kivity wrote:
>> Lets look at the ${HOME}/.qemu/qmp/ enumeration method suggested by 
>> Anthony.
>> There's numerous ways that this can break:
>
> I don't like it either.  We have libvirt for enumerating guests.

We're stuck in a rut with libvirt and I think a lot of the 
dissatisfaction with qemu is rooted in that.  It's not libvirt that's 
the problem, but the relationship between qemu and libvirt.

We add a feature to qemu and maybe after six months it gets exposed by 
libvirt.  Release timelines of the two projects complicate the 
situation further.  People that write GUIs are limited by libvirt 
because that's what they're told to use, and when they need something 
simple, they're presented with first getting that feature implemented in 
qemu, then plumbed through libvirt.

It wouldn't be so bad if libvirt were basically a passthrough interface 
to qemu, but it tries to model everything in a generic way, which is more 
or less doomed to fail when you're adding lots of new features (as we are).

The list of things that libvirt doesn't support and won't any time soon 
is staggering.

libvirt serves an important purpose, but we need to do a better job in 
qemu with respect to usability.  We can't just punt to libvirt.

Regards,

Anthony Liguori


^ permalink raw reply	[flat|nested] 390+ messages in thread

* Re: [RFC] Unify KVM kernel-space and user-space code into a single project
  2010-03-22 19:10                                                                                             ` Ingo Molnar
@ 2010-03-22 19:18                                                                                               ` Anthony Liguori
  2010-03-22 19:23                                                                                               ` Avi Kivity
  2010-03-22 19:28                                                                                               ` Andrea Arcangeli
  2 siblings, 0 replies; 390+ messages in thread
From: Anthony Liguori @ 2010-03-22 19:18 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Avi Kivity, Joerg Roedel, Pekka Enberg, Zhang, Yanmin,
	Peter Zijlstra, Sheng Yang, linux-kernel, kvm, Marcelo Tosatti,
	Jes Sorensen, Gleb Natapov, Zachary Amsden, ziteng.huang,
	Arnaldo Carvalho de Melo, Frédéric Weisbecker

On 03/22/2010 02:10 PM, Ingo Molnar wrote:
>
> I posit that it's both: and that priorities can be communicated - if only you
> try as a maintainer. All i'm suggesting is to add 'usable, unified user-space'
> to the list of unfun priorities, because it's possible and because it matters.
>    

I've spent the past few months dealing with customers using the 
libvirt/qemu/kvm stack.  Usability is a major problem and is a top 
priority for me.  That is definitely a shift but that occurred before 
you started your thread.

But I disagree with your analysis of what the root of the problem is.  
It's a very kernel centric view and doesn't consider the interactions 
between userspace.

Regards,

Anthony Liguori

> 	Ingo
>    


^ permalink raw reply	[flat|nested] 390+ messages in thread

* Re: [RFC] Unify KVM kernel-space and user-space code into a single project
  2010-03-22 17:55                                                                                                               ` Avi Kivity
  2010-03-22 19:15                                                                                                                 ` Anthony Liguori
@ 2010-03-22 19:20                                                                                                                 ` Ingo Molnar
  2010-03-22 19:44                                                                                                                   ` Avi Kivity
  1 sibling, 1 reply; 390+ messages in thread
From: Ingo Molnar @ 2010-03-22 19:20 UTC (permalink / raw)
  To: Avi Kivity
  Cc: Anthony Liguori, Pekka Enberg, Zhang, Yanmin, Peter Zijlstra,
	Sheng Yang, linux-kernel, kvm, Marcelo Tosatti, Joerg Roedel,
	Jes Sorensen, Gleb Natapov, Zachary Amsden, ziteng.huang,
	Arnaldo Carvalho de Melo, Frédéric Weisbecker, Gregory Haskins


* Avi Kivity <avi@redhat.com> wrote:

> > Lets look at the ${HOME}/.qemu/qmp/ enumeration method suggested by 
> > Anthony. There's numerous ways that this can break:
> 
> I don't like it either.  We have libvirt for enumerating guests.

Which has pretty much the same problems as the ${HOME}/.qemu/qmp/ solution, 
obviously.

> >  - Those special files can get corrupted, mis-setup, get out of sync, or can
> >    be hard to discover.
> >
> >  - The ${HOME}/.qemu/qmp/ solution suggested by Anthony has a very obvious
> >    design flaw: it is per user. When i'm root i'd like to query _all_ current
> >    guest images, not just the ones started by root. A system might not even
> >    have a notion of '${HOME}'.
> >
> >  - Apps might start KVM vcpu instances without adhering to the
> >    ${HOME}/.qemu/qmp/ access method.
> 
> - it doesn't work with nfs.

So out of a list of 4 disadvantages your reply is that you agree with 3?

> >  - There is no guarantee for the Qemu process to reply to a request - while
> >    the kernel can always guarantee an enumeration result. I don't want 'perf
> >    kvm' to hang or misbehave just because Qemu has hung.
> 
> If qemu doesn't reply, your guest is dead anyway.

Erm, but i'm talking about a dead tool here. There's a world of difference 
between 'kvm top' not showing new entries (because the guest is dead), and 
'perf kvm top' hanging due to Qemu hanging.

So it's essentially 4 out of 4. Yet your reply isn't "Ingo you are right" but 
"hey, too bad"?

> > Really, for such reasons user-space is pretty poor at doing system-wide 
> > enumeration and resource management. Microkernels lost for a reason.
> 
> Take a look at your desktop, userspace is doing all of that everywhere, from 
> enumerating users and groups, to deciding how your disks are named.  The 
> kernel only provides the bare facilities.

We don't do that for robust system instrumentation, for heaven's sake!

By your argument it would be perfectly fine to implement /proc purely via 
user-space, correct?

> > You are committing several grave design mistakes here.
> 
> I am committing on the shoulders of giants.

Really, this is getting outright ridiculous. You agree with me that Anthony 
suggested a technically inferior solution, yet you even seem to be proud of it 
and are joking about it?

And _you_ are complaining about lkml-style hard-talk discussions?

Thanks,

	Ingo

^ permalink raw reply	[flat|nested] 390+ messages in thread

* Re: [RFC] Unify KVM kernel-space and user-space code into a single project
  2010-03-22 16:32                                                                                         ` Ingo Molnar
  2010-03-22 17:17                                                                                           ` Frank Ch. Eigler
  2010-03-22 17:44                                                                                           ` Avi Kivity
@ 2010-03-22 19:20                                                                                           ` Joerg Roedel
  2010-03-22 19:28                                                                                             ` Avi Kivity
  2010-03-22 19:49                                                                                             ` Ingo Molnar
  2 siblings, 2 replies; 390+ messages in thread
From: Joerg Roedel @ 2010-03-22 19:20 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Avi Kivity, Anthony Liguori, Pekka Enberg, Zhang, Yanmin,
	Peter Zijlstra, Sheng Yang, linux-kernel, kvm, Marcelo Tosatti,
	Jes Sorensen, Gleb Natapov, Zachary Amsden, ziteng.huang,
	Arnaldo Carvalho de Melo, Frédéric Weisbecker

On Mon, Mar 22, 2010 at 05:32:15PM +0100, Ingo Molnar wrote:
> I don't know how you can find the situation of Alpha comparable, which is a 
> legacy architecture for which no new CPU was manufactured in the past ~10 
> years.
> 
> The negative effects of physical obsolescence cannot be overcome even by the 
> very best of development models ...

The maintainers of that architecture could at least continue to maintain
it, but that is not the case. Most newer syscalls are not available and
overall stability on alpha sucks (the kernel crashed when I tried to start
Xorg, for example), yet nobody cares about it. Hardware is still around
and there are still some users of it.

> > > * Joerg Roedel <joro@8bytes.org> wrote:
> > No, the split-repository situation was the smallest problem after all. Its 
> > was a community thing. If the community doesn't work a single-repo project 
> > will also fail. [...]
> 
> So, what do you think creates code communities and keeps them alive? 
> Developers and code. And the wellbeing of developers are primarily influenced 
> by the repository structure and by the development/maintenance process - i.e. 
> by the 'fun' aspect. (i'm simplifying things there but that's the crux of it.)

Right. A living community needs developers who write new code. And the
repository structure is one important factor, but in my opinion it is not
the most important one. From my 3-4 years of experience in the kernel
community I have learned that the maintainers are the most important
factor. I find a maintainer not committing or caring about patches, or
not releasing new versions, much worse than the wrong repository
structure.
oProfile has this problem with its userspace part. I partly had this
bad experience with x86-64 before the architecture merge. KVM does not
have this problem.

> So yes, i do claim that what stifled and eventually killed off the Oprofile 
> community was the split repository. None of the other Oprofile shortcomings 
> were really unfixable, but this one was. It gave no way for the community to 
> grow in a healthy way, after the initial phase. Features were more difficult 
> and less fun to develop.

The biggest problem oProfile has is that it does not support per-process
measuring. This is indeed not unfixable but it also doesn't fit well in
the overall oProfile concept.

> I simply do not want to see KVM face the same fate, and yes i do see similar 
> warnings signs.

In fact, the development process in KVM has improved over time. In the
early days everything was kept in svn. Avi switched to git at some
point, but at the time of the kvm-XX releases the combined kernel- and
user-space tree was unbisectable. This has improved to the point
where the kernel part can be bisected. The KVM maintainers and
community have shown in the past that they can address problems with the
development process as they come up.

> Oprofile certainly had good developers and maintainers as well. In the end it 
> wasn't enough ...
> 
> Also, a project can easily still be 'alive' but not reach its full potential. 
> 
> Why do you assume that my argument means that KVM isn't viable today? It can 
> very well still be viable and even healthy - just not _as healthy_ as it could 
> be ...

I am not aware that I made you say anything ;-)

> 
> > > The difference is that we don't have KVM with a decade of history and we 
> > > don't have a 'told you so' KVM reimplementation to show that proves the 
> > > point. I guess it's a matter of time before that happens, because Qemu 
> > > usability is so abysmal today - so i guess we should suspend any 
> > > discussions until that happens, no need to waste time on arguing 
> > > hypotheticals.
> > 
> > We actually have lguest which is small. But it lacks functionality and the 
> > developer community KVM has attracted.
> 
> I suggested long ago to merge lguest into KVM to cover non-VMX/non-SVM 
> execution.

That would have been the best. Rusty already started this work and
presented it at the first KVM Forum. But I have never seen patches ...

> > > I think you are rationalizing the status quo.
> > 
> > I see that there are issues with KVM today in some areas. You pointed out 
> > the desktop usability already. I personally have trouble with the 
> > qemu-kvm.git because it is unbisectable. But repository unification doesn't 
> > solve the problem here.
> 
> Why doesn't it solve the bisectability problem? The kernel repo is supposed to 
> be bisectable so that problem would be solved.

Because Marcelo and Avi try to stay as close to upstream qemu as
possible. So the qemu repo is regularly merged into qemu-kvm, and if you
want to bisect you may end up somewhere in the middle of the qemu
repository, which has only very minimal kvm support.
The problem here is that two qemu repositories exist. The current
effort of Anthony is directed at creating a single qemu repository, but
that's not done overnight.
Merging qemu into the kernel would in fact make Linus a qemu maintainer.
I am not sure he wants to be that ;-)
 
> In my judgement you'd have to do that more frequently, if KVM was properly 
> weighting its priorities. For example regarding this recent KVM commit of 
> yours:
> 
> | commit ec1ff79084fccdae0dca9b04b89dcdf3235bbfa1
> | Author: Joerg Roedel <joerg.roedel@amd.com>
> | Date:   Fri Oct 9 16:08:31 2009 +0200
> |
> |     KVM: SVM: Add tracepoint for invlpga instruction
> |     
> |     This patch adds a tracepoint for the event that the guest
> |     executed the INVLPGA instruction.
> 
> With integrated KVM tooling i might have insisted for that new tracepoint to 
> be available to users as well via some more meaningful tooling than just a 
> pure tracepoint.
> 
> There's synergies like that all around the place.

True. Tools for better analysis of kvm traces are for sure something that
belongs in tools/kvm. I am not sure if anyone has such tools. If so,
they should send them upstream.
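
For illustration, a minimal sketch of how such a tracepoint can already be
consumed with the existing tooling, assuming the commit above exposes it as
kvm:kvm_invlpga (the event name and the debugfs path are assumptions):

  # count INVLPGA exits system-wide for ten seconds
  perf stat -a -e kvm:kvm_invlpga sleep 10

  # or watch the individual events via ftrace
  echo 1 > /sys/kernel/debug/tracing/events/kvm/kvm_invlpga/enable
  cat /sys/kernel/debug/tracing/trace_pipe

A tools/kvm helper would presumably just wrap this kind of plumbing behind a
friendlier command.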

> > > It's as if you argued in 1990 that the unification of East and West 
> > > Germany wouldnt make much sense because despite clear problems and 
> > > incompatibilites and different styles westerners were still allowed to 
> > > visit eastern relatives and they both spoke the same language after all 
> > > ;-)
> > 
> > Um, hmm. I don't think these situations have enough in common to compare 
> > them ;-)
> 
> Probably, but it's an interesting parallel nevertheless ;-)

That for sure ;-)

	Joerg


^ permalink raw reply	[flat|nested] 390+ messages in thread

* Re: [RFC] Unify KVM kernel-space and user-space code into a single project
  2010-03-22 18:35                                                                                                               ` Anthony Liguori
@ 2010-03-22 19:22                                                                                                                 ` Ingo Molnar
  2010-03-22 19:29                                                                                                                   ` Anthony Liguori
  2010-03-22 19:45                                                                                                                   ` Avi Kivity
  0 siblings, 2 replies; 390+ messages in thread
From: Ingo Molnar @ 2010-03-22 19:22 UTC (permalink / raw)
  To: Anthony Liguori
  Cc: Avi Kivity, Pekka Enberg, Zhang, Yanmin, Peter Zijlstra,
	Sheng Yang, linux-kernel, kvm, Marcelo Tosatti, Joerg Roedel,
	Jes Sorensen, Gleb Natapov, Zachary Amsden, ziteng.huang,
	Arnaldo Carvalho de Melo, Frédéric Weisbecker, Gregory Haskins


* Anthony Liguori <anthony@codemonkey.ws> wrote:

> On 03/22/2010 12:34 PM, Ingo Molnar wrote:
> >* Avi Kivity<avi@redhat.com>  wrote:
> >
> >>>>>  - Easy default reference to guest instances, and a way for tools to
> >>>>>    reference them symbolically as well in the multi-guest case. Preferably
> >>>>>    something trustable and kernel-provided - not some indirect information
> >>>>>    like a PID file created by libvirt-manager or so.
> >>>>Usually 'layering violation' is trotted out at such suggestions.
> >>>>[...]
> >>>That's weird, how can a feature request be a 'layering violation'?
> >>The 'something trustable and kernel-provided'.  The kernel knows nothing
> >>about guest names.
> >The kernel certainly knows about other resources such as task names or network
> >interface names or tracepoint names. This is kernel design 101.
> >
> >>>If something that users find straightforward and usable is a layering
> >>>violation to you (such as easily being able to access their own files on
> >>>the host as well ...) then i think you need to revisit the definition of
> >>>that term instead of trying to fix the user.
> >>Here is the explanation, you left it quoted:
> >>
> >>>>[...]  I don't like using the term, because sometimes the layers are
> >>>>incorrect and need to be violated.  But it should be done explicitly, not
> >>>>as a shortcut for a minor feature (and profiling is a minor feature, most
> >>>>users will never use it, especially guest-from-host).
> >>>>
> >>>>The fact is we have well defined layers today, kvm virtualizes the cpu
> >>>>and memory, qemu emulates devices for a single guest, libvirt manages
> >>>>guests. We break this sometimes but there has to be a good reason.  So
> >>>>perf needs to talk to libvirt if it wants names.  Could be done via
> >>>>linking, or can be done using a pluging libvirt drops into perf.
> >This is really just the much-discredited microkernel approach for keeping
> >global enumeration data that should be kept by the kernel ...
> >
> >Lets look at the ${HOME}/.qemu/qmp/ enumeration method suggested by Anthony.
> >There's numerous ways that this can break:
> >
> >  - Those special files can get corrupted, mis-setup, get out of sync, or can
> >    be hard to discover.
> >
> >  - The ${HOME}/.qemu/qmp/ solution suggested by Anthony has a very obvious
> >    design flaw: it is per user. When i'm root i'd like to query _all_ current
> >    guest images, not just the ones started by root. A system might not even
> >    have a notion of '${HOME}'.
> >
> >  - Apps might start KVM vcpu instances without adhering to the
> >    ${HOME}/.qemu/qmp/ access method.
> 
> Not all KVM vcpus are running operating systems.

But we want to allow developers to instrument all of them ...

> Transitive had a product that was using a KVM context to run their
> binary translator, which allowed them full access to the host
> process's virtual address space range.  In this case, there is no
> kernel and there are no devices.

And your point is that such vcpus should be excluded from profiling just 
because they fall outside the Qemu/libvirt umbrella?

That is a ridiculous position.

	Ingo

^ permalink raw reply	[flat|nested] 390+ messages in thread

* Re: [RFC] Unify KVM kernel-space and user-space code into a single project
  2010-03-22 19:10                                                                                             ` Ingo Molnar
  2010-03-22 19:18                                                                                               ` Anthony Liguori
@ 2010-03-22 19:23                                                                                               ` Avi Kivity
  2010-03-22 19:28                                                                                               ` Andrea Arcangeli
  2 siblings, 0 replies; 390+ messages in thread
From: Avi Kivity @ 2010-03-22 19:23 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Joerg Roedel, Anthony Liguori, Pekka Enberg, Zhang, Yanmin,
	Peter Zijlstra, Sheng Yang, linux-kernel, kvm, Marcelo Tosatti,
	Jes Sorensen, Gleb Natapov, Zachary Amsden, ziteng.huang,
	Arnaldo Carvalho de Melo, Frédéric Weisbecker

On 03/22/2010 09:10 PM, Ingo Molnar wrote:
> * Avi Kivity<avi@redhat.com>  wrote:
>
>    
>> On 03/22/2010 06:32 PM, Ingo Molnar wrote:
>>      
>>> So, what do you think creates code communities and keeps them alive?
>>> Developers and code. And the wellbeing of developers are primarily
>>> influenced by the repository structure and by the development/maintenance
>>> process - i.e. by the 'fun' aspect. (i'm simplifying things there but
>>> that's the crux of it.)
>>>        
>> There is nothing fun about having one repository or two.  Who cares about
>> this anyway?
>>
>> tools/kvm/ probably will draw developers, simply because of the glory
>> associated with kernel work.  That's a bug, not a feature.  It means that
>> effort is not distributed according to how it's needed, but because of
>> irrelevant considerations.
>>      
> And yet your solution to that is to ... do all your work in the kernel space
> and declare the tooling as something that does not interest you? ;-)
>    

I have done plenty of userspace work in qemu.  I don't have a lack of 
interest in qemu, just in a desktop GUI.  I'm not a GUI person and my 
employer doesn't have a desktop-on-desktop virtualization product that I 
know of.

>> Something I've wanted for a long time is to port kvm_stat to use tracepoints
>> instead of the home-grown instrumentation.  But that is unrelated to this
>> new tracepoint.  Other than that we're satisfied with ftrace.
>>      
> Despite it being another in-kernel subsystem that by your earlier arguments
> should be done via a user-space package? ;-)
>    

I'm satisfied with it as a user.  Architecturally, I'd have preferred it 
to be a userspace tool.  It might have improved usability as well to 
have something with --help instead of a set of debugfs files.  But I'm a 
lot happier with ftrace existing as a kernel component than not at all.
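
As a rough sketch of the two data sources in question (the exact paths and
event names here are assumptions, not a description of what kvm_stat does
today):

  # home-grown instrumentation: per-exit counters exported via debugfs
  grep . /sys/kernel/debug/kvm/*

  # tracepoint-based alternative: let perf aggregate the kvm:* events
  perf stat -a -e 'kvm:*' sleep 10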

>>> You should realize that naturally developers will gravitate towards the
>>> most 'fun' aspects of a project. It is the task of the maintainer to keep
>>> the balance between fun and utility, bugs and features, quality and
>>> code-rot.
>>>        
>> There are plenty of un-fun tasks (like fixing bugs and providing RAS
>> features) that we're doing.  We don't do this for fun but to satisfy our
>> users.
>>      
> So which one is it, KVM developers are volunteers that do fun stuff and cannot
> be told about project priorities, or KVM developers are pros who do unfun
> stuff because they can be told about priorities?
>    

 From my point of view as maintainer, all contributors are volunteers; I 
can't tell any of them what to do.  From the point of view of many of 
these volunteers' employers, they are wage slaves who do as they're told 
or else.

So: when someone sends me a patch I gratefully accept it if it is good or 
point out the issues if not.  At the secret Red Hat headquarters and the 
kvm weekly conference call I participate in deciding priorities and task 
assignments.

> I posit that it's both: and that priorities can be communicated - if only you
> try as a maintainer. All i'm suggesting is to add 'usable, unified user-space'
> to the list of unfun priorities, because it's possible and because it matters.
>    

So: I require a volunteer to write some GUI code before I accept a 
patch.  Back at the Red Hat lair, we then decide which features to drop from 
the product because the kvm maintainer has gone nuts.

The 'unified' part of your suggestion is not a requirement, but an 
implementation detail.

-- 
Do not meddle in the internals of kernels, for they are subtle and quick to panic.


^ permalink raw reply	[flat|nested] 390+ messages in thread

* Re: [RFC] Unify KVM kernel-space and user-space code into a single project
  2010-03-22 18:41                                                                                                               ` Anthony Liguori
@ 2010-03-22 19:27                                                                                                                 ` Ingo Molnar
  2010-03-22 19:47                                                                                                                   ` Avi Kivity
  0 siblings, 1 reply; 390+ messages in thread
From: Ingo Molnar @ 2010-03-22 19:27 UTC (permalink / raw)
  To: Anthony Liguori
  Cc: Avi Kivity, Pekka Enberg, Zhang, Yanmin, Peter Zijlstra,
	Sheng Yang, linux-kernel, kvm, Marcelo Tosatti, Joerg Roedel,
	Jes Sorensen, Gleb Natapov, Zachary Amsden, ziteng.huang,
	Arnaldo Carvalho de Melo, Frédéric Weisbecker, Gregory Haskins


* Anthony Liguori <anthony@codemonkey.ws> wrote:

> On 03/22/2010 12:34 PM, Ingo Molnar wrote:
> >This is really just the much-discredited microkernel approach for keeping
> >global enumeration data that should be kept by the kernel ...
> >
> >Lets look at the ${HOME}/.qemu/qmp/ enumeration method suggested by Anthony.
> >There's numerous ways that this can break:
> >
> >  - Those special files can get corrupted, mis-setup, get out of sync, or can
> >    be hard to discover.
> >
> >  - The ${HOME}/.qemu/qmp/ solution suggested by Anthony has a very obvious
> >    design flaw: it is per user. When i'm root i'd like to query _all_ current
> >    guest images, not just the ones started by root. A system might not even
> >    have a notion of '${HOME}'.
> >
> >  - Apps might start KVM vcpu instances without adhering to the
> >    ${HOME}/.qemu/qmp/ access method.
> >
> >  - There is no guarantee for the Qemu process to reply to a request - while
> >    the kernel can always guarantee an enumeration result. I don't want 'perf
> >    kvm' to hang or misbehave just because Qemu has hung.
> 
> If your position basically boils down to, we can't trust userspace
> and we can always trust the kernel, I want to eliminate any
> userspace path, then I can't really help you out.

Why would you want to 'help me out'? I can tell a good solution from a bad one 
just fine.

You should instead read the long list of disadvantages above, invert them and 
list them as advantages for the kernel-based vcpu enumeration solution, apply 
common sense and go admit to yourself that indeed in this situation a 
kernel-provided enumeration of vcpu contexts is the most robust solution.

It's really as simple as that :-)

Thanks,

	Ingo

^ permalink raw reply	[flat|nested] 390+ messages in thread

* Re: [RFC] Unify KVM kernel-space and user-space code into a single project
  2010-03-22 19:10                                                                                             ` Ingo Molnar
  2010-03-22 19:18                                                                                               ` Anthony Liguori
  2010-03-22 19:23                                                                                               ` Avi Kivity
@ 2010-03-22 19:28                                                                                               ` Andrea Arcangeli
  2 siblings, 0 replies; 390+ messages in thread
From: Andrea Arcangeli @ 2010-03-22 19:28 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Avi Kivity, Joerg Roedel, Anthony Liguori, Pekka Enberg, Zhang,
	Yanmin, Peter Zijlstra, Sheng Yang, linux-kernel, kvm,
	Marcelo Tosatti, Jes Sorensen, Gleb Natapov, Zachary Amsden,
	ziteng.huang, Arnaldo Carvalho de Melo, Frédéric Weisbecker

On Mon, Mar 22, 2010 at 08:10:28PM +0100, Ingo Molnar wrote:
> I posit that it's both: and that priorities can be communicated - if only you 
> try as a maintainer. All i'm suggesting is to add 'usable, unified user-space' 
> to the list of unfun priorities, because it's possible and because it matters.

IMHO blaming anybody for it but qemu maintainership is very
unfair. They intentionally reinvented a less self-contained,
inferior, underperforming, underfeatured wheel instead of doing the
right thing and just making sure it was as self-contained as
possible to avoid risking destabilizing their existing codebase. What
can anybody (without qemu git commit access) do about it unless the qemu
git maintainers change attitude, dump the qemu/kvm-all.c nonsense for
good, and do the right thing so we can unify for real?

We need to move forward, including multithreading the qemu core and being
ready to include desktop virtualization protocols when they're ready
for submission, without being told to extend vnc instead to gain a
similar speedup (i.e. yet another inferior wheel).

Unification means that _all_ qemu users (pure research, theoretical
interest, Xen, virtualbox, weird pure-software architectures) will be
able to push their stuff in for the common good, but that shall also
apply to KVM! It has to become clear that reinventing inferior wheels
instead of merging the real thing is absolutely wasteful of time,
unnecessary, and won't make any difference as far as KVM is
concerned; the proof is that 0% of the userbase runs qemu git to run KVM
(except perhaps the kvm-all.c developers testing it, or somebody who by
mistake left the -kvm prefix off the command line). I don't pretend
to rate KVM as more important than all the other niche usages of
qemu, but it shall be _as_ important as the rest, and it'd be nice one
day to be able to install only qemu on a system and get something
actually usable in production.

I very much like that qemu gets contributions from everywhere; it's
also nice that it can run without KVM (that is purely useful as a
debugging tool to me, but still...). I think it can all happen, and
unification should be the objective, for the gain of everyone in both
qemu/kvm and even xen and all the rest.

^ permalink raw reply	[flat|nested] 390+ messages in thread

* Re: [RFC] Unify KVM kernel-space and user-space code into a single project
  2010-03-22 19:20                                                                                           ` Joerg Roedel
@ 2010-03-22 19:28                                                                                             ` Avi Kivity
  2010-03-22 19:49                                                                                             ` Ingo Molnar
  1 sibling, 0 replies; 390+ messages in thread
From: Avi Kivity @ 2010-03-22 19:28 UTC (permalink / raw)
  To: Joerg Roedel
  Cc: Ingo Molnar, Anthony Liguori, Pekka Enberg, Zhang, Yanmin,
	Peter Zijlstra, Sheng Yang, linux-kernel, kvm, Marcelo Tosatti,
	Jes Sorensen, Gleb Natapov, Zachary Amsden, ziteng.huang,
	Arnaldo Carvalho de Melo, Frédéric Weisbecker

On 03/22/2010 09:20 PM, Joerg Roedel wrote:
>
>> Why doesnt it solve the bisectability problem? The kernel repo is supposed to
>> be bisectable so that problem would be solved.
>>      
> Because Marcelo and Avi try to stay as close to upstream qemu as
> possible. So the qemu repo is regularly merged into qemu-kvm, and if you
> want to bisect you may end up somewhere in the middle of the qemu
> repository, which has only very minimal kvm support.
> The problem here is that two qemu repositories exist. The current
> effort of Anthony is directed at creating a single qemu repository, but
> that's not done overnight.
>    

It's in fact possible to bisect qemu-kvm.git.  If you end up in 
qemu.git, do a 'git bisect skip'.  If you end up in a merge, call the 
merge point A, bisect A^1..A^2, each time merging A^1 before compiling 
(the merge is always trivial due to the way we do it).

Not fun, but it works.  When we complete merging kvm integration into 
qemu.git, this problem will disappear.
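
For concreteness, a rough sketch of that recipe, assuming BAD and GOOD are
known qemu-kvm.git commits and that bisect has stopped at a merge commit A
whose first parent A^1 is the good kvm side and whose second parent A^2 is
bad (the names and the parent roles are assumptions):

  git bisect start BAD GOOD        # 'git bisect skip' any pure qemu.git commits
  # when bisect stops at a merge commit A, narrow to its parents instead:
  git bisect reset
  git bisect start A^2 A^1         # bad first, then good
  sha=$(git rev-parse HEAD)        # remember the commit actually under test
  git merge --no-commit A^1        # fold in the kvm side; the merge is trivial
  make                             # build and test this state
  git reset --hard "$sha"          # drop the temporary merge
  git bisect good "$sha"           # or: git bisect bad "$sha", then repeat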

-- 
Do not meddle in the internals of kernels, for they are subtle and quick to panic.


^ permalink raw reply	[flat|nested] 390+ messages in thread

* Re: [RFC] Unify KVM kernel-space and user-space code into a single project
  2010-03-22 19:22                                                                                                                 ` Ingo Molnar
@ 2010-03-22 19:29                                                                                                                   ` Anthony Liguori
  2010-03-22 20:32                                                                                                                     ` Ingo Molnar
  2010-03-22 19:45                                                                                                                   ` Avi Kivity
  1 sibling, 1 reply; 390+ messages in thread
From: Anthony Liguori @ 2010-03-22 19:29 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Avi Kivity, Pekka Enberg, Zhang, Yanmin, Peter Zijlstra,
	Sheng Yang, linux-kernel, kvm, Marcelo Tosatti, Joerg Roedel,
	Jes Sorensen, Gleb Natapov, Zachary Amsden, ziteng.huang,
	Arnaldo Carvalho de Melo, Frédéric Weisbecker, Gregory Haskins

On 03/22/2010 02:22 PM, Ingo Molnar wrote:
>> Transitive had a product that was using a KVM context to run their
>> binary translator, which allowed them full access to the host
>> process's virtual address space range.  In this case, there is no
>> kernel and there are no devices.
>>      
> And your point is that such vcpus should be excluded from profiling just
> because they fall outside the Qemu/libvirt umbrella?
>    

You don't instrument it the way you'd instrument an operating system, so 
no, you don't want it to show up in perf kvm top.

Regards,

Anthony Liguori

> 	Ingo
>    


^ permalink raw reply	[flat|nested] 390+ messages in thread

* Re: [RFC] Unify KVM kernel-space and user-space code into a single project
  2010-03-22 19:15                                                                                                                 ` Anthony Liguori
@ 2010-03-22 19:31                                                                                                                   ` Daniel P. Berrange
  2010-03-22 19:33                                                                                                                     ` Anthony Liguori
  2010-03-22 19:39                                                                                                                     ` Alexander Graf
  2010-03-22 20:00                                                                                                                   ` Antoine Martin
  1 sibling, 2 replies; 390+ messages in thread
From: Daniel P. Berrange @ 2010-03-22 19:31 UTC (permalink / raw)
  To: Anthony Liguori
  Cc: Avi Kivity, Ingo Molnar, Pekka Enberg, Zhang, Yanmin,
	Peter Zijlstra, Sheng Yang, linux-kernel, kvm, Marcelo Tosatti,
	Joerg Roedel, Jes Sorensen, Gleb Natapov, Zachary Amsden,
	ziteng.huang, Arnaldo Carvalho de Melo, Frédéric Weisbecker,
	Gregory Haskins

On Mon, Mar 22, 2010 at 02:15:35PM -0500, Anthony Liguori wrote:
> On 03/22/2010 12:55 PM, Avi Kivity wrote:
> >>Lets look at the ${HOME}/.qemu/qmp/ enumeration method suggested by 
> >>Anthony.
> >>There's numerous ways that this can break:
> >
> >I don't like it either.  We have libvirt for enumerating guests.
> 
> We're stuck in a rut with libvirt and I think a lot of the 
> dissatisfaction with qemu is rooted in that.  It's not libvirt that's 
> the problem, but the relationship between qemu and libvirt.
> 
> We add a feature to qemu and maybe after six months it gets exposed by 
> libvirt.  Release time lines of the two projects complicate the 
> situation further.  People that write GUIs are limited by libvirt 
> because that's what they're told to use and when they need something 
> simple, they're presented with first getting that feature implemented in 
> qemu, then plumbed through libvirt.

That is somewhat unfair as a blanket statement! 

While some features have had a long time delay & others are not supported
at all, in many cases we have had zero delay. We have been supporting QMP,
qdev, vhost-net since before the patches for those features were even merged
in QEMU GIT! It varies depending on how closely QEMU & libvirt people have
been working together on a feature, and on how strongly end users are demanding
the features. 

> It wouldn't be so bad if libvirt was basically a passthrough interface 
> to qemu but it tries to model everything in a generic way which is more 
> or less doomed to fail when you're adding lots of new features (as we are).
> 
> The list of things that libvirt doesn't support and won't any time soon 
> is staggering.

As previously discussed, we want to improve both the set of features
supported, and make it much easier to support new features promptly.
The QMP & qdev stuff has been a very good step forward in making it
easier to support QEMU management. There have been proposals from 
several people, yourself included, on how to improve libvirt's support
for the full range of QEMU features. We're committed to looking at this
and figuring out which proposals are practical to support, so we can
improve QEMU & libvirt interaction for everyone.

Regards,
Daniel
-- 
|: Red Hat, Engineering, London    -o-   http://people.redhat.com/berrange/ :|
|: http://libvirt.org -o- http://virt-manager.org -o- http://deltacloud.org :|
|: http://autobuild.org        -o-         http://search.cpan.org/~danberr/ :|
|: GnuPG: 7D3B9505  -o-   F3C9 553F A1DA 4AC2 5648 23C1 B3DF F742 7D3B 9505 :|

^ permalink raw reply	[flat|nested] 390+ messages in thread

* Re: [RFC] Unify KVM kernel-space and user-space code into a single project
  2010-03-22 19:31                                                                                                                   ` Daniel P. Berrange
@ 2010-03-22 19:33                                                                                                                     ` Anthony Liguori
  2010-03-22 19:39                                                                                                                     ` Alexander Graf
  1 sibling, 0 replies; 390+ messages in thread
From: Anthony Liguori @ 2010-03-22 19:33 UTC (permalink / raw)
  To: Daniel P. Berrange
  Cc: Avi Kivity, Ingo Molnar, Pekka Enberg, Zhang, Yanmin,
	Peter Zijlstra, Sheng Yang, linux-kernel, kvm, Marcelo Tosatti,
	Joerg Roedel, Jes Sorensen, Gleb Natapov, Zachary Amsden,
	ziteng.huang, Arnaldo Carvalho de Melo, Frédéric Weisbecker,
	Gregory Haskins

On 03/22/2010 02:31 PM, Daniel P. Berrange wrote:
> On Mon, Mar 22, 2010 at 02:15:35PM -0500, Anthony Liguori wrote:
>    
>> On 03/22/2010 12:55 PM, Avi Kivity wrote:
>>      
>>>> Lets look at the ${HOME}/.qemu/qmp/ enumeration method suggested by
>>>> Anthony.
>>>> There's numerous ways that this can break:
>>>>          
>>> I don't like it either.  We have libvirt for enumerating guests.
>>>        
>> We're stuck in a rut with libvirt and I think a lot of the
>> dissatisfaction with qemu is rooted in that.  It's not libvirt that's
>> the problem, but the relationship between qemu and libvirt.
>>
>> We add a feature to qemu and maybe after six months it gets exposed by 
>> libvirt.  Release time lines of the two projects complicate the
>> situation further.  People that write GUIs are limited by libvirt
>> because that's what they're told to use and when they need something
>> simple, they're presented with first getting that feature implemented in
>> qemu, then plumbed through libvirt.
>>      
> That is somewhat unfair as a blanket statement!
>    

Sorry, you're certainly correct.  Some features appear quickly, but 
others can take an awfully long time.

>> It wouldn't be so bad if libvirt was basically a passthrough interface
>> to qemu but it tries to model everything in a generic way which is more
>> or less doomed to fail when you're adding lots of new features (as we are).
>>
>> The list of things that libvirt doesn't support and won't any time soon
>> is staggering.
>>      
> As previously discussed, we want to improve both the set of features
> supported, and make it much easier to support new features promptly.
> The QMP&  qdev stuff has been a very good step forward in making it
> easier to support QEMU management. There have been proposals from
> several people, yourself included, on how to improve libvirt's support
> for the full range of QEMU features. We're committed to looking at this
> and figuring out which proposals are practical to support, so we can
> improve QEMU&  libvirt interaction for everyone.
>    

Regards,

Anthony Liguori

> Regards,
> Daniel
>    


^ permalink raw reply	[flat|nested] 390+ messages in thread

* Re: [RFC] Unify KVM kernel-space and user-space code into a single project
  2010-03-22 19:31                                                                                                                   ` Daniel P. Berrange
  2010-03-22 19:33                                                                                                                     ` Anthony Liguori
@ 2010-03-22 19:39                                                                                                                     ` Alexander Graf
  2010-03-22 19:54                                                                                                                       ` Ingo Molnar
  1 sibling, 1 reply; 390+ messages in thread
From: Alexander Graf @ 2010-03-22 19:39 UTC (permalink / raw)
  To: Daniel P. Berrange
  Cc: Anthony Liguori, Avi Kivity, Ingo Molnar, Pekka Enberg, Zhang,
	Yanmin, Peter Zijlstra, Sheng Yang, linux-kernel, kvm,
	Marcelo Tosatti, Joerg Roedel, Jes Sorensen, Gleb Natapov,
	Zachary Amsden, ziteng.huang, Arnaldo Carvalho de Melo,
	Frédéric Weisbecker, Gregory Haskins


On 22.03.2010, at 20:31, Daniel P. Berrange wrote:

> On Mon, Mar 22, 2010 at 02:15:35PM -0500, Anthony Liguori wrote:
>> On 03/22/2010 12:55 PM, Avi Kivity wrote:
>>>> Lets look at the ${HOME}/.qemu/qmp/ enumeration method suggested by 
>>>> Anthony.
>>>> There's numerous ways that this can break:
>>> 
>>> I don't like it either.  We have libvirt for enumerating guests.
>> 
>> We're stuck in a rut with libvirt and I think a lot of the 
>> dissatisfaction with qemu is rooted in that.  It's not libvirt that's 
>> the problem, but the relationship between qemu and libvirt.
>> 
>> We add a feature to qemu and maybe after six months it gets exposed by 
>> libvirt.  Release time lines of the two projects complicate the 
>> situation further.  People that write GUIs are limited by libvirt 
>> because that's what they're told to use and when they need something 
>> simple, they're presented with first getting that feature implemented in 
>> qemu, then plumbed through libvirt.
> 
> That is somewhat unfair as a blanket statement! 
> 
> While some features have had a long time delay & others are not supported
> at all, in many cases we have had zero delay. We have been supporting QMP,
> qdev, vhost-net since before the patches for those features were even merged
> in QEMU GIT! It varies depending on how closely QEMU & libvirt people have
> been working together on a feature, and on how strongly end users are demanding
> the features. 

Yes. I think the point was that every layer in between brings potential slowdown and loss of features.

Hopefully this will go away with QMP. By then people can decide if they want to be hypervisor agnostic (libvirt) or tightly coupled with qemu (QMP). The best of both worlds would of course be a QMP pass-through in libvirt. No idea if that's easily possible.

Either way, things are improving. What people see in the end is virt-manager though. And if you compare it feature-wise as well as looks-wise, vbox is simply superior. Several features are lacking in the lower layers too (pv graphics, always-working absolute pointers, clipboard sharing, ...).

That said, it doesn't mean we should give up. It means we know which areas to work on :-). And we know that our problem is not the kernel/userspace interface, but the interfaces at the qemu level and above.

Alex

^ permalink raw reply	[flat|nested] 390+ messages in thread

* Re: [RFC] Unify KVM kernel-space and user-space code into a single project
  2010-03-22 19:20                                                                                                                 ` Ingo Molnar
@ 2010-03-22 19:44                                                                                                                   ` Avi Kivity
  2010-03-22 20:06                                                                                                                     ` Ingo Molnar
  0 siblings, 1 reply; 390+ messages in thread
From: Avi Kivity @ 2010-03-22 19:44 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Anthony Liguori, Pekka Enberg, Zhang, Yanmin, Peter Zijlstra,
	Sheng Yang, linux-kernel, kvm, Marcelo Tosatti, Joerg Roedel,
	Jes Sorensen, Gleb Natapov, Zachary Amsden, ziteng.huang,
	Arnaldo Carvalho de Melo, Frédéric Weisbecker, Gregory Haskins

On 03/22/2010 09:20 PM, Ingo Molnar wrote:
> * Avi Kivity<avi@redhat.com>  wrote:
>
>    
>>> Lets look at the ${HOME}/.qemu/qmp/ enumeration method suggested by
>>> Anthony. There's numerous ways that this can break:
>>>        
>> I don't like it either.  We have libvirt for enumerating guests.
>>      
> Which has pretty much the same problems as the ${HOME}/.qemu/qmp/ solution,
> obviously.
>    

It doesn't follow.  The libvirt daemon could/should own guests from all 
users.  I don't know if it does so now, but nothing is preventing it 
technically.
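
For what it's worth, a minimal sketch of what that looks like from the tool
side, assuming libvirtd is running and the usual connection URIs (the URIs
here are assumptions about the setup):

  # system-wide daemon: guests from all users, visible to root
  virsh -c qemu:///system list --all

  # per-user session instance, analogous to the per-${HOME} issue above
  virsh -c qemu:///session list --all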

>>>   - Those special files can get corrupted, mis-setup, get out of sync, or can
>>>     be hard to discover.
>>>
>>>   - The ${HOME}/.qemu/qmp/ solution suggested by Anthony has a very obvious
>>>     design flaw: it is per user. When i'm root i'd like to query _all_ current
>>>     guest images, not just the ones started by root. A system might not even
>>>     have a notion of '${HOME}'.
>>>
>>>   - Apps might start KVM vcpu instances without adhering to the
>>>     ${HOME}/.qemu/qmp/ access method.
>>>        
>> - it doesn't work with nfs.
>>      
> So out of a list of 4 disadvantages your reply is that you agree with 3?
>    

I agree with 1-3, disagree with 4, and add 5.  Yes.

>    
>>>   - There is no guarantee for the Qemu process to reply to a request - while
>>>     the kernel can always guarantee an enumeration result. I don't want 'perf
>>>     kvm' to hang or misbehave just because Qemu has hung.
>>>        
>> If qemu doesn't reply, your guest is dead anyway.
>>      
> Erm, but i'm talking about a dead tool here. There's a world of difference
> between 'kvm top' not showing new entries (because the guest is dead), and
> 'perf kvm top' hanging due to Qemu hanging.
>    

If qemu hangs, the guest hangs a few milliseconds later.

> So it's essentially 4 out of 4. Yet your reply isn't "Ingo you are right" but
> "hey, too bad"?
>    

My reply is "you are right" (phrased earlier as "I don't like it either" 
meaning I agree with your dislike).  One of your criticisms was invalid, 
IMO, and I pointed it out.

>>> Really, for such reasons user-space is pretty poor at doing system-wide
>>> enumeration and resource management. Microkernels lost for a reason.
>>>        
>> Take a look at your desktop, userspace is doing all of that everywhere, from
>> enumerating users and groups, to deciding how your disks are named.  The
>> kernel only provides the bare facilities.
>>      
> We don't do that for robust system instrumentation, for heaven's sake!
>    

If qemu fails, you lose your guest.  If libvirt forgets about a guest, 
you can't do anything with it any more.  These are more serious problems 
than 'perf kvm' not working.  Qemu and libvirt have to be robust anyway, 
so we can rely on them.  Like we have to rely on init, X, sshd, and a 
zillion other critical tools.

> By your argument it would be perfectly fine to implement /proc purely via
> user-space, correct?
>    

I would have preferred /proc to be implemented via syscalls called 
directly from tools, and good tools written to expose the information in 
it.  When computers were slower 'top' would spend tons of time opening 
and closing all those tiny files and parsing them.  Of course the kernel 
needs to provide the information.
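
As a quick way to see that cost, a sketch assuming procps top and strace are
available:

  # count the open/read/close calls one batch-mode top iteration makes
  strace -c -e trace=open,read,close top -b -n 1 > /dev/null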

>>> You are committing several grave design mistakes here.
>>>        
>> I am committing on the shoulders of giants.
>>      
> Really, this is getting outright ridiculous. You agree with me that Anthony
> suggested a technically inferior solution, yet you even seem to be proud of it
> and are joking about it?
>    

The bit above this was:

>  Really, for such reasons user-space is pretty poor at doing system-wide
>  enumeration and resource management. Microkernels lost for a reason.
>  

In every Linux system userspace is doing or proxying much of the 
enumeration and resource management.  So if enumerating guests in 
userspace is a mistake, then I am not alone in making it.

> And _you_ are complaining about lkml-style hard-talk discussions?
>    

There is a difference between joking and insulting people.  I enjoy 
jokes but I dislike being insulted.

-- 
Do not meddle in the internals of kernels, for they are subtle and quick to panic.


^ permalink raw reply	[flat|nested] 390+ messages in thread

* Re: [RFC] Unify KVM kernel-space and user-space code into a single project
  2010-03-22 19:22                                                                                                                 ` Ingo Molnar
  2010-03-22 19:29                                                                                                                   ` Anthony Liguori
@ 2010-03-22 19:45                                                                                                                   ` Avi Kivity
  2010-03-22 20:35                                                                                                                     ` Ingo Molnar
  1 sibling, 1 reply; 390+ messages in thread
From: Avi Kivity @ 2010-03-22 19:45 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Anthony Liguori, Pekka Enberg, Zhang, Yanmin, Peter Zijlstra,
	Sheng Yang, linux-kernel, kvm, Marcelo Tosatti, Joerg Roedel,
	Jes Sorensen, Gleb Natapov, Zachary Amsden, ziteng.huang,
	Arnaldo Carvalho de Melo, Frédéric Weisbecker, Gregory Haskins

On 03/22/2010 09:22 PM, Ingo Molnar wrote:
>
>> Transitive had a product that was using a KVM context to run their
>> binary translator which allowed them full access to the host
>> processes virtual address space range.  In this case, there is no
>> kernel and there are no devices.
>>      
> And your point is that such vcpus should be excluded from profiling just
> because they fall outside the Qemu/libvirt umbrella?
>
> That is a ridiculous position.
>
>    

Non-guest vcpus will not be able to provide Linux-style symbols.

-- 
Do not meddle in the internals of kernels, for they are subtle and quick to panic.



* Re: [RFC] Unify KVM kernel-space and user-space code into a single project
  2010-03-22 19:27                                                                                                                 ` Ingo Molnar
@ 2010-03-22 19:47                                                                                                                   ` Avi Kivity
  2010-03-22 20:46                                                                                                                     ` Ingo Molnar
  2010-03-22 22:06                                                                                                                     ` Anthony Liguori
  0 siblings, 2 replies; 390+ messages in thread
From: Avi Kivity @ 2010-03-22 19:47 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Anthony Liguori, Pekka Enberg, Zhang, Yanmin, Peter Zijlstra,
	Sheng Yang, linux-kernel, kvm, Marcelo Tosatti, Joerg Roedel,
	Jes Sorensen, Gleb Natapov, Zachary Amsden, ziteng.huang,
	Arnaldo Carvalho de Melo, Frédéric Weisbecker, Gregory Haskins

On 03/22/2010 09:27 PM, Ingo Molnar wrote:
>
>> If your position basically boils down to, we can't trust userspace
>> and we can always trust the kernel, I want to eliminate any
>> userspace path, then I can't really help you out.
>>      
> Why would you want to 'help me out'? I can tell a good solution from a bad one
> just fine.
>    

You are basically making a kernel implementation a requirement, instead 
of something that follows from the requirement.

> You should instead read the long list of disadvantages above, invert them and
> list them as advantages for the kernel-based vcpu enumeration solution, apply
> common sense and go admit to yourself that indeed in this situation a kernel
> provided enumeration of vcpu contexts is the most robust solution.
>    

Having qemu enumerate guests one way or another is not a good idea IMO 
since it is focused on one guest and doesn't have a system-wide entity.  
A userspace system-wide entity will work just as well as kernel 
implementation, without its disadvantages.

-- 
Do not meddle in the internals of kernels, for they are subtle and quick to panic.



* Re: [RFC] Unify KVM kernel-space and user-space code into a single project
  2010-03-22 19:20                                                                                           ` Joerg Roedel
  2010-03-22 19:28                                                                                             ` Avi Kivity
@ 2010-03-22 19:49                                                                                             ` Ingo Molnar
  1 sibling, 0 replies; 390+ messages in thread
From: Ingo Molnar @ 2010-03-22 19:49 UTC (permalink / raw)
  To: Joerg Roedel
  Cc: Avi Kivity, Anthony Liguori, Pekka Enberg, Zhang, Yanmin,
	Peter Zijlstra, Sheng Yang, linux-kernel, kvm, Marcelo Tosatti,
	Jes Sorensen, Gleb Natapov, Zachary Amsden, ziteng.huang,
	Arnaldo Carvalho de Melo, Frédéric Weisbecker


* Joerg Roedel <joro@8bytes.org> wrote:

> On Mon, Mar 22, 2010 at 05:32:15PM +0100, Ingo Molnar wrote:
> > I dont know how you can find the situation of Alpha comparable, which is a 
> > legacy architecture for which no new CPU was manufactured in the past ~10 
> > years.
> > 
> > The negative effects of physical obsolescence cannot be overcome even by the 
> > very best of development models ...
> 
> The maintainers of that architecture could at least continue to maintain it. 
> But that is not the case. Most newer syscalls are not available and overall 
> stability on alpha sucks (kernel crashed when I tried to start Xorg for 
> example) but nobody cares about it. Hardware is still around and there are 
> still some users of it.

You are arguing why maintainers do not act as you suggest, against the huge 
negative effects of physical obsolescence?

Please use common sense: they dont act because ... there are huge negative 
effects due to physical obsolescence?

No amount of development model engineering can offset that negative.

Thanks,

	Ingo


* Re: [RFC] Unify KVM kernel-space and user-space code into a single project
  2010-03-22 19:39                                                                                                                     ` Alexander Graf
@ 2010-03-22 19:54                                                                                                                       ` Ingo Molnar
  2010-03-22 19:58                                                                                                                         ` Alexander Graf
  2010-03-22 20:19                                                                                                                         ` Antoine Martin
  0 siblings, 2 replies; 390+ messages in thread
From: Ingo Molnar @ 2010-03-22 19:54 UTC (permalink / raw)
  To: Alexander Graf
  Cc: Daniel P. Berrange, Anthony Liguori, Avi Kivity, Pekka Enberg,
	Zhang, Yanmin, Peter Zijlstra, Sheng Yang, linux-kernel, kvm,
	Marcelo Tosatti, Joerg Roedel, Jes Sorensen, Gleb Natapov,
	Zachary Amsden, ziteng.huang, Arnaldo Carvalho de Melo,
	Frédéric Weisbecker, Gregory Haskins


* Alexander Graf <agraf@suse.de> wrote:

> Yes. I think the point was that every layer in between brings potential 
> slowdown and loss of features.

Exactly. The more 'fragmented' a project is into sub-projects, without a 
single, unified, functional reference implementation in the center of it, the 
longer it takes to fix 'unsexy' problems like trivial usability bugs.

Furthermore, another negative effect is that many times features are 
implemented not in their technically best way, but in a way to keep them local 
to the project that originates them. This is done to keep deployment latencies 
and general contribution overhead down to a minimum. The moment you have to 
work with yet another project, the overhead adds up.

So developers rather go for the quicker (yet inferior) hack within the 
sub-project they have best access to.

Tell me this isnt happening in this space ;-)

Thanks,

	Ingo


* Re: [RFC] Unify KVM kernel-space and user-space code into a single project
  2010-03-22 19:54                                                                                                                       ` Ingo Molnar
@ 2010-03-22 19:58                                                                                                                         ` Alexander Graf
  2010-03-22 20:21                                                                                                                           ` Ingo Molnar
  2010-03-22 20:19                                                                                                                         ` Antoine Martin
  1 sibling, 1 reply; 390+ messages in thread
From: Alexander Graf @ 2010-03-22 19:58 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Daniel P. Berrange, Anthony Liguori, Avi Kivity, Pekka Enberg,
	Yanmin Zhang, Peter Zijlstra, Sheng Yang, LKML Mailing List,
	kvm-devel General, Marcelo Tosatti, Joerg Roedel, Jes Sorensen,
	Gleb Natapov, Zachary Amsden, Arnaldo Carvalho de Melo,
	Frédéric Weisbecker, Gregory Haskins


On 22.03.2010, at 20:54, Ingo Molnar wrote:

> 
> * Alexander Graf <agraf@suse.de> wrote:
> 
>> Yes. I think the point was that every layer in between brings potential 
>> slowdown and loss of features.
> 
> Exactly. The more 'fragmented' a project is into sub-projects, without a 
> single, unified, functional reference implementation in the center of it, the 
> longer it takes to fix 'unsexy' problems like trivial usability bugs.

I agree with that part. As previously stated, though, there are few people working on qemu who would go and implement higher-level things. So some solution is needed there.

> Furthermore, another negative effect is that many times features are 
> implemented not in their technically best way, but in a way to keep them local 
> to the project that originates them. This is done to keep deployment latencies 
> and general contribution overhead down to a minimum. The moment you have to 
> work with yet another project, the overhead adds up.

I disagree there. Keeping things local and self-contained has been the UNIX secret. It works really well as long as the boundaries are well defined.

The problem we're facing is that we're simply lacking an active GUI / desktop user development community. We have desktop users, but nobody feels like tackling the issue of doing a great GUI project while talking to qemu-devel about his needs.

> So developers rather go for the quicker (yet inferior) hack within the 
> sub-project they have best access to.

Well - not necessarily hacks. It's more about project boundaries. Nothing is bad about that. You wouldn't want "ls" implemented in the Linux kernel either, right? :-)


Alex


* Re: [RFC] Unify KVM kernel-space and user-space code into a single project
  2010-03-22 19:15                                                                                                                 ` Anthony Liguori
  2010-03-22 19:31                                                                                                                   ` Daniel P. Berrange
@ 2010-03-22 20:00                                                                                                                   ` Antoine Martin
  2010-03-22 20:58                                                                                                                     ` Daniel P. Berrange
  1 sibling, 1 reply; 390+ messages in thread
From: Antoine Martin @ 2010-03-22 20:00 UTC (permalink / raw)
  To: Anthony Liguori
  Cc: Avi Kivity, Ingo Molnar, Pekka Enberg, Zhang, Yanmin,
	Peter Zijlstra, Sheng Yang, linux-kernel, kvm, Marcelo Tosatti,
	Joerg Roedel, Jes Sorensen, Gleb Natapov, Zachary Amsden,
	ziteng.huang, Arnaldo Carvalho de Melo, Frédéric Weisbecker,
	Gregory Haskins

On 03/23/2010 02:15 AM, Anthony Liguori wrote:
> On 03/22/2010 12:55 PM, Avi Kivity wrote:
>>> Lets look at the ${HOME}/.qemu/qmp/ enumeration method suggested by 
>>> Anthony.
>>> There's numerous ways that this can break:
>>
>> I don't like it either.  We have libvirt for enumerating guests.
>
> We're stuck in a rut with libvirt and I think a lot of the 
> dissatisfaction with qemu is rooted in that.  It's not libvirt that's 
> the problem, but the relationship between qemu and libvirt.
+1
The obvious reason why so many people still use shell scripts rather 
than libvirt is because it just doesn't provide what they need.
Every time I've looked at it (and I've been looking for a better 
solution for many years), it seems that it would have provided most of 
the things I needed, but the remaining bits were unsolvable.

Shell scripts can be ugly, but you get total control.

Antoine
> We add a feature to qemu and maybe after six month it gets exposed by 
> libvirt.  Release time lines of the two projects complicate the 
> situation further.  People that write GUIs are limited by libvirt 
> because that's what they're told to use and when they need something 
> simple, they're presented with first getting that feature implemented 
> in qemu, then plumbed through libvirt.
>
> It wouldn't be so bad if libvirt was basically a passthrough interface 
> to qemu but it tries to model everything in a generic way which is 
> more or less doomed to fail when you're adding lots of new features 
> (as we are).
>
> The list of things that libvirt doesn't support and won't any time 
> soon is staggering.
>
> libvirt serves an important purpose, but we need to do a better job in 
> qemu with respect to usability.  We can't just punt to libvirt.
>
> Regards,
>
> Anthony Liguori
>



* Re: [RFC] Unify KVM kernel-space and user-space code into a single project
  2010-03-22 19:44                                                                                                                   ` Avi Kivity
@ 2010-03-22 20:06                                                                                                                     ` Ingo Molnar
  2010-03-22 20:15                                                                                                                       ` Avi Kivity
  0 siblings, 1 reply; 390+ messages in thread
From: Ingo Molnar @ 2010-03-22 20:06 UTC (permalink / raw)
  To: Avi Kivity
  Cc: Anthony Liguori, Pekka Enberg, Zhang, Yanmin, Peter Zijlstra,
	Sheng Yang, linux-kernel, kvm, Marcelo Tosatti, Joerg Roedel,
	Jes Sorensen, Gleb Natapov, Zachary Amsden, ziteng.huang,
	Arnaldo Carvalho de Melo, Frédéric Weisbecker, Gregory Haskins


* Avi Kivity <avi@redhat.com> wrote:

> On 03/22/2010 09:20 PM, Ingo Molnar wrote:
> >* Avi Kivity<avi@redhat.com>  wrote:
> >
> >>>Lets look at the ${HOME}/.qemu/qmp/ enumeration method suggested by
> >>>Anthony. There's numerous ways that this can break:
> >>I don't like it either.  We have libvirt for enumerating guests.
> >Which has pretty much the same problems to the ${HOME}/.qemu/qmp/ solution,
> >obviously.
> 
> It doesn't follow.  The libvirt daemon could/should own guests from all 
> users.  I don't know if it does so now, but nothing is preventing it 
> technically.

It's hard for me to argue against a hypothetical implementation, but all 
user-space driven solutions for resource enumeration i've seen so far had 
weaknesses that kernel-based solutions dont have.

> >>>  - Those special files can get corrupted, mis-setup, get out of sync, or can
> >>>    be hard to discover.
> >>>
> >>>  - The ${HOME}/.qemu/qmp/ solution suggested by Anthony has a very obvious
> >>>    design flaw: it is per user. When i'm root i'd like to query _all_ current
> >>>    guest images, not just the ones started by root. A system might not even
> >>>    have a notion of '${HOME}'.
> >>>
> >>>  - Apps might start KVM vcpu instances without adhering to the
> >>>    ${HOME}/.qemu/qmp/ access method.
> >>- it doesn't work with nfs.
> >So out of a list of 4 disadvantages your reply is that you agree with 3?
> 
> I agree with 1-3, disagree with 4, and add 5.  Yes.
> 
> >>>  - There is no guarantee for the Qemu process to reply to a request - while
> >>>    the kernel can always guarantee an enumeration result. I dont want 'perf
> >>>    kvm' to hang or misbehave just because Qemu has hung.
> >>If qemu doesn't reply, your guest is dead anyway.
> >Erm, but i'm talking about a dead tool here. There's a world of a difference
> >between 'kvm top' not showing new entries (because the guest is dead), and
> >'perf kvm top' hanging due to Qemu hanging.
> 
> If qemu hangs, the guest hangs a few milliseconds later.

I think you didnt understand my point. I am talking about 'perf kvm top' 
hanging if Qemu hangs.

With a proper in-kernel enumeration the kernel would always guarantee the 
functionality, even if the vcpu does not make progress (i.e. it's "hung").

With this implemented in Qemu we lose that kind of robustness guarantee.

And especially during development (when developers use instrumentation the 
most) it is important to have robust instrumentation that does not hang along 
with the Qemu process.

> If qemu fails, you lose your guest.  If libvirt forgets about a
> guest, you can't do anything with it any more.  These are more
> serious problems than 'perf kvm' not working. [...]

How on earth can you justify a bug ("perf kvm top" hanging) by pointing out 
that there are other bugs as well?

Basically you are arguing the equivalent that a gdb session would be fine to 
become unresponsive if the debugged task hangs. Fortunately ptrace is 
kernel-based and it never 'hangs' if the user-space process hangs somewhere.

This is an essential property of good instrumentation.

So the enumeration method you suggested is a poor, sub-par solution, simple 
as that.

> [...] Qemu and libvirt have to be robust anyway, we can rely on them.  Like 
> we have to rely on init, X, sshd, and a zillion other critical tools.

We can still profile any of those tools without the profiler breaking if the 
debugged tool breaks ...

> > By your argument it would be perfectly fine to implement /proc purely via 
> > user-space, correct?
> 
> I would have preferred /proc to be implemented via syscalls called directly 
> from tools, and good tools written to expose the information in it.  When 
> computers were slower 'top' would spend tons of time opening and closing all 
> those tiny files and parsing them.  Of course the kernel needs to provide 
> the information.

(Then you'll be pleased to hear that perf has enabled exactly that, and that we 
are working towards that precise usecase.)

	Ingo


* Re: [RFC] Unify KVM kernel-space and user-space code into a single project
  2010-03-22 20:06                                                                                                                     ` Ingo Molnar
@ 2010-03-22 20:15                                                                                                                       ` Avi Kivity
  2010-03-22 20:29                                                                                                                         ` Ingo Molnar
  0 siblings, 1 reply; 390+ messages in thread
From: Avi Kivity @ 2010-03-22 20:15 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Anthony Liguori, Pekka Enberg, Zhang, Yanmin, Peter Zijlstra,
	Sheng Yang, linux-kernel, kvm, Marcelo Tosatti, Joerg Roedel,
	Jes Sorensen, Gleb Natapov, Zachary Amsden, ziteng.huang,
	Arnaldo Carvalho de Melo, Frédéric Weisbecker, Gregory Haskins

On 03/22/2010 10:06 PM, Ingo Molnar wrote:
> * Avi Kivity<avi@redhat.com>  wrote:
>
>    
>> On 03/22/2010 09:20 PM, Ingo Molnar wrote:
>>      
>>> * Avi Kivity<avi@redhat.com>   wrote:
>>>
>>>        
>>>>> Lets look at the ${HOME}/.qemu/qmp/ enumeration method suggested by
>>>>> Anthony. There's numerous ways that this can break:
>>>>>            
>>>> I don't like it either.  We have libvirt for enumerating guests.
>>>>          
>>> Which has pretty much the same problems to the ${HOME}/.qemu/qmp/ solution,
>>> obviously.
>>>        
>> It doesn't follow.  The libvirt daemon could/should own guests from all
>> users.  I don't know if it does so now, but nothing is preventing it
>> technically.
>>      
> It's hard for me to argue against a hypothetical implementation, but all
> user-space driven solutions for resource enumeration i've seen so far had
> weaknesses that kernel-based solutions dont have.
>    

Correct.  kernel-based solutions also have issues.

>> If qemu hangs, the guest hangs a few milliseconds later.
>>      
> I think you didnt understand my point. I am talking about 'perf kvm top'
> hanging if Qemu hangs.
>    

Use non-blocking I/O, report that guest as dead.  No point in profiling 
it, it isn't making any progress.
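
A minimal sketch of that approach, assuming a hypothetical per-guest control
socket; the path, the 500ms timeout and the protocol are made up for
illustration, this is not an existing qemu or perf interface:

#include <errno.h>
#include <fcntl.h>
#include <poll.h>
#include <stdio.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/un.h>
#include <unistd.h>

/* Query a (hypothetical) per-guest control socket without letting a hung
 * qemu hang the tool: if the guest does not answer in time, report it as
 * unresponsive and move on. */
static int query_guest(const char *sock_path)
{
	struct sockaddr_un addr = { .sun_family = AF_UNIX };
	struct pollfd pfd;
	int fd = socket(AF_UNIX, SOCK_STREAM, 0);

	if (fd < 0)
		return -1;
	fcntl(fd, F_SETFL, O_NONBLOCK);		/* never block on a wedged qemu */
	strncpy(addr.sun_path, sock_path, sizeof(addr.sun_path) - 1);

	if (connect(fd, (struct sockaddr *)&addr, sizeof(addr)) < 0 &&
	    errno != EINPROGRESS && errno != EAGAIN)
		goto dead;

	pfd.fd = fd;
	pfd.events = POLLOUT;
	if (poll(&pfd, 1, 500) <= 0)		/* 500ms is arbitrary for the sketch */
		goto dead;

	/* ... send the enumeration request here, poll again for the reply ... */
	close(fd);
	return 0;
dead:
	fprintf(stderr, "guest at %s not responding, reporting it as dead\n",
		sock_path);
	close(fd);
	return -1;
}

int main(void)
{
	return query_guest("/var/run/guest-12345.sock");	/* hypothetical path */
}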

> With a proper in-kernel enumeration the kernel would always guarantee the
> functionality, even if the vcpu does not make progress (i.e. it's "hung").
>
> With this implemented in Qemu we lose that kind of robustness guarantee.
>    

If qemu has a bug in the resource enumeration code, you can't profile 
one guest.  If the kernel has a bug in the resource enumeration code, 
the system either panics or needs to be rebooted later.

> And especially during development (when developers use instrumentation the
> most) it is important to have robust instrumentation that does not hang along
> with the Qemu process.
>    

It's nice not to have kernel oopses either.  So when code can be in 
userspace, that's where it should be.

>> If qemu fails, you lose your guest.  If libvirt forgets about a
>> guest, you can't do anything with it any more.  These are more
>> serious problems than 'perf kvm' not working. [...]
>>      
> How on earth can you justify a bug ("perf kvm top" hanging) by pointing out
> that there are other bugs as well?
>    

There's no reason for 'perf kvm top' to hang if some process is not 
responsive.  That would be a perf bug.

> Basically you are arguing the equivalent that a gdb session would be fine to
> become unresponsive if the debugged task hangs. Fortunately ptrace is
> kernel-based and it never 'hangs' if the user-space process hangs somewhere.
>    

Neither gdb nor perf should hang.

> This is an essential property of good instrumentation.
>
> So the enumeration method you suggested is a poor, sub-par solution, simple
> as that.
>    

Or, you misunderstood it.

>> [...] Qemu and libvirt have to be robust anyway, we can rely on them.  Like
>> we have to rely on init, X, sshd, and a zillion other critical tools.
>>      
> We can still profile any of those tools without the profiler breaking if the
> debugged tool breaks ...
>    

You can't profile without qemu.

>>> By your argument it would be perfectly fine to implement /proc purely via
>>> user-space, correct?
>>>        
>> I would have preferred /proc to be implemented via syscalls called directly
>> from tools, and good tools written to expose the information in it.  When
>> computers were slower 'top' would spend tons of time opening and closing all
>> those tiny files and parsing them.  Of course the kernel needs to provide
>> the information.
>>      
> (Then you'll be pleased to hear that perf has enabled exactly that, and that we
> are working towards that precise usecase.)
>    

Are you exporting /proc/pid data via the perf syscall?  If so, I think 
that's a good move.

-- 
Do not meddle in the internals of kernels, for they are subtle and quick to panic.



* Re: [RFC] Unify KVM kernel-space and user-space code into a single project
  2010-03-22 19:54                                                                                                                       ` Ingo Molnar
  2010-03-22 19:58                                                                                                                         ` Alexander Graf
@ 2010-03-22 20:19                                                                                                                         ` Antoine Martin
  1 sibling, 0 replies; 390+ messages in thread
From: Antoine Martin @ 2010-03-22 20:19 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Alexander Graf, Daniel P. Berrange, Anthony Liguori, Avi Kivity,
	Pekka Enberg, Zhang, Yanmin, Peter Zijlstra, Sheng Yang,
	linux-kernel, kvm, Marcelo Tosatti, Joerg Roedel, Jes Sorensen,
	Gleb Natapov, Zachary Amsden, ziteng.huang,
	Arnaldo Carvalho de Melo, Frédéric Weisbecker, Gregory Haskins

On 03/23/2010 02:54 AM, Ingo Molnar wrote:
> * Alexander Graf<agraf@suse.de>  wrote
>> Yes. I think the point was that every layer in between brings potential
>> slowdown and loss of features.
>>      
> Exactly. The more 'fragmented' a project is into sub-projects, without a
> single, unified, functional reference implementation in the center of it, the
> longer it takes to fix 'unsexy' problems like trivial usability bugs.
>
> Furthermore, another negative effect is that many times features are
> implemented not in their technically best way, but in a way to keep them local
> to the project that originates them. This is done to keep deployment latencies
> and general contribution overhead down to a minimum. The moment you have to
> work with yet another project, the overhead adds up.
>
> So developers rather go for the quicker (yet inferior) hack within the
> sub-project they have best access to.
>
> Tell me this isnt happening in this space ;-)
>    
Integration is hard, requires a wider set of technical skills and 
getting good test coverage becomes more difficult.
But I agree that it is worth the effort, kvm could reap large rewards 
from putting a greater emphasis on integration (ala vbox) - no matter 
how it is achieved (cowardly not taking sides on implementation 
decisions like repository locations).

Antoine

> Thanks,
>
> 	Ingo
>    



* Re: [RFC] Unify KVM kernel-space and user-space code into a single project
  2010-03-22 19:58                                                                                                                         ` Alexander Graf
@ 2010-03-22 20:21                                                                                                                           ` Ingo Molnar
  2010-03-22 20:35                                                                                                                             ` Avi Kivity
  2010-03-23 10:48                                                                                                                             ` Bernd Petrovitsch
  0 siblings, 2 replies; 390+ messages in thread
From: Ingo Molnar @ 2010-03-22 20:21 UTC (permalink / raw)
  To: Alexander Graf
  Cc: Daniel P. Berrange, Anthony Liguori, Avi Kivity, Pekka Enberg,
	Yanmin Zhang, Peter Zijlstra, Sheng Yang, LKML Mailing List,
	kvm-devel General, Marcelo Tosatti, Joerg Roedel, Jes Sorensen,
	Gleb Natapov, Zachary Amsden, Arnaldo Carvalho de Melo,
	Frédéric Weisbecker, Gregory Haskins


* Alexander Graf <agraf@suse.de> wrote:

> > Furthermore, another negative effect is that many times features are 
> > implemented not in their technically best way, but in a way to keep them 
> > local to the project that originates them. This is done to keep deployment 
> > latencies and general contribution overhead down to a minimum. The moment 
> > you have to work with yet another project, the overhead adds up.
> 
> I disagree there. Keeping things local and self-contained has been the UNIX 
> secret. It works really well as long as the boundaries are well defined.

The 'UNIX secret' works for text driven pipelined commands where we are 
essentially programming via narrow ASCII input of mathematical logic.

It doesnt work for a GUI that is a 2D/3D environment of millions of pixels, 
shaped by human visual perception of prettiness, familiarity and efficiency.

> The problem we're facing is that we're simply lacking an active GUI / 
> desktop user development community. We have desktop users, but nobody feels 
> like tackling the issue of doing a great GUI project while talking to 
> qemu-devel about his needs.

Have you given any thought to why that might be so?

I think it's because of what i outlined above - that you are trying to apply 
the "UNIX secret" to GUIs - and that is a mistake.

A good GUI is almost at the _exact opposite spectrum_ of good command-line 
tool: tightly integrated, with 'layering violations' designed into it all over 
the place:

  look i can paste the text from an editor straight into a firefox form. I
  didnt go through any hierarchy of layers, i just took the shortest path 
  between the apps!

In other words: in a GUI the output controls the design, for command-line 
tools the design controls the output.

It is no wonder Unix always had its problems with creating good GUIs that are 
efficient to humans. A good GUI works like the human brain, and the human 
brain does not mind 'layering violations' when that gets it a more efficient 
result.

> > So developers rather go for the quicker (yet inferior) hack within the 
> > sub-project they have best access to.
> 
> Well - not necessarily hacks. It's more about project boundaries. Nothing is 
> bad about that. You wouldn't want "ls" implemented in the Linux kernel 
> either, right? :-)

I guess you are talking to the wrong person as i actually have implemented ls 
functionality in the kernel, using async IO concepts and extreme threading ;-) 
It was a bit crazy, but was also the fastest FTP server ever running on this 
planet.

	Ingo


* Re: [RFC] Unify KVM kernel-space and user-space code into a single project
  2010-03-22 20:15                                                                                                                       ` Avi Kivity
@ 2010-03-22 20:29                                                                                                                         ` Ingo Molnar
  2010-03-22 20:40                                                                                                                           ` Avi Kivity
  0 siblings, 1 reply; 390+ messages in thread
From: Ingo Molnar @ 2010-03-22 20:29 UTC (permalink / raw)
  To: Avi Kivity
  Cc: Anthony Liguori, Pekka Enberg, Zhang, Yanmin, Peter Zijlstra,
	Sheng Yang, linux-kernel, kvm, Marcelo Tosatti, Joerg Roedel,
	Jes Sorensen, Gleb Natapov, Zachary Amsden, ziteng.huang,
	Arnaldo Carvalho de Melo, Frédéric Weisbecker, Gregory Haskins


* Avi Kivity <avi@redhat.com> wrote:

> > I think you didnt understand my point. I am talking about 'perf kvm top' 
> > hanging if Qemu hangs.
> 
> Use non-blocking I/O, report that guest as dead.  No point in profiling it, 
> it isn't making any progress.

Erm, at what point do i decide that a guest is 'dead' versus 'just lagged due 
to lots of IO' ?

Also, do you realize that you increase complexity (the use of non-blocking 
IO), just to protect against something that wouldnt happen if the right 
solution was used in the first place?

> > With a proper in-kernel enumeration the kernel would always guarantee the 
> > functionality, even if the vcpu does not make progress (i.e. it's "hung").
> >
> > With this implemented in Qemu we lose that kind of robustness guarantee.
> 
> If qemu has a bug in the resource enumeration code, you can't profile one 
> guest.  If the kernel has a bug in the resource enumeration code, the system 
> either panics or needs to be rebooted later.

This is really simple code, not rocket science. If there's a bug in it we'll 
fix it. On the other hand a 500KLOC+ piece of Qemu code has lots of places to 
hang, so that is a large cross section.

	Ingo


* Re: [RFC] Unify KVM kernel-space and user-space code into a single project
  2010-03-22 19:29                                                                                                                   ` Anthony Liguori
@ 2010-03-22 20:32                                                                                                                     ` Ingo Molnar
  2010-03-22 20:43                                                                                                                       ` Avi Kivity
  0 siblings, 1 reply; 390+ messages in thread
From: Ingo Molnar @ 2010-03-22 20:32 UTC (permalink / raw)
  To: Anthony Liguori
  Cc: Avi Kivity, Pekka Enberg, Zhang, Yanmin, Peter Zijlstra,
	Sheng Yang, linux-kernel, kvm, Marcelo Tosatti, Joerg Roedel,
	Jes Sorensen, Gleb Natapov, Zachary Amsden, ziteng.huang,
	Arnaldo Carvalho de Melo, Frédéric Weisbecker, Gregory Haskins


* Anthony Liguori <anthony@codemonkey.ws> wrote:

> On 03/22/2010 02:22 PM, Ingo Molnar wrote:
> >>Transitive had a product that was using a KVM context to run their
> >>binary translator which allowed them full access to the host
> >>processes virtual address space range.  In this case, there is no
> >>kernel and there are no devices.
> >
> > And your point is that such vcpus should be excluded from profiling just 
> > because they fall outside the Qemu/libvirt umbrella?
> 
> You don't instrument it the way you'd instrument an operating system so no, 
> you don't want it to show up in perf kvm top.

Erm, why not? It's executing a virtualized CPU, so sure it makes sense to 
allow the profiling of it!

It might even not be the weird case you mentioned by some competing 
virtualization project to Qemu ...

So your argument is wrong on several technical levels, sorry.

Thanks,

	Ingo


* Re: [RFC] Unify KVM kernel-space and user-space code into a single project
  2010-03-22 19:45                                                                                                                   ` Avi Kivity
@ 2010-03-22 20:35                                                                                                                     ` Ingo Molnar
  2010-03-22 20:45                                                                                                                       ` Avi Kivity
  0 siblings, 1 reply; 390+ messages in thread
From: Ingo Molnar @ 2010-03-22 20:35 UTC (permalink / raw)
  To: Avi Kivity
  Cc: Anthony Liguori, Pekka Enberg, Zhang, Yanmin, Peter Zijlstra,
	Sheng Yang, linux-kernel, kvm, Marcelo Tosatti, Joerg Roedel,
	Jes Sorensen, Gleb Natapov, Zachary Amsden, ziteng.huang,
	Arnaldo Carvalho de Melo, Frédéric Weisbecker, Gregory Haskins


* Avi Kivity <avi@redhat.com> wrote:

> On 03/22/2010 09:22 PM, Ingo Molnar wrote:
> >
> >> Transitive had a product that was using a KVM context to run their binary 
> >> translator which allowed them full access to the host processes virtual 
> >> address space range.  In this case, there is no kernel and there are no 
> >> devices.
> >
> > And your point is that such vcpus should be excluded from profiling just 
> > because they fall outside the Qemu/libvirt umbrella?
> >
> > That is a ridiculous position.
> >
> 
> Non-guest vcpus will not be able to provide Linux-style symbols.

And why do you say that it makes no sense to profile them?

Also, why do you define 'guest vcpus' to be 'Qemu started guest vcpus'? If 
some other KVM using project (which you encouraged just a few mails ago) 
starts a vcpu we still want to be able to profile them.

	Ingo


* Re: [RFC] Unify KVM kernel-space and user-space code into a single project
  2010-03-22 20:21                                                                                                                           ` Ingo Molnar
@ 2010-03-22 20:35                                                                                                                             ` Avi Kivity
  2010-03-23 10:48                                                                                                                             ` Bernd Petrovitsch
  1 sibling, 0 replies; 390+ messages in thread
From: Avi Kivity @ 2010-03-22 20:35 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Alexander Graf, Daniel P. Berrange, Anthony Liguori,
	Pekka Enberg, Yanmin Zhang, Peter Zijlstra, Sheng Yang,
	LKML Mailing List, kvm-devel General, Marcelo Tosatti,
	Joerg Roedel, Jes Sorensen, Gleb Natapov, Zachary Amsden,
	Arnaldo Carvalho de Melo, Frédéric Weisbecker, Gregory Haskins

On 03/22/2010 10:21 PM, Ingo Molnar wrote:
> * Alexander Graf<agraf@suse.de>  wrote:
>
>    
>>> Furthermore, another negative effect is that many times features are
>>> implemented not in their technically best way, but in a way to keep them
>>> local to the project that originates them. This is done to keep deployment
>>> latencies and general contribution overhead down to a minimum. The moment
>>> you have to work with yet another project, the overhead adds up.
>>>        
>> I disagree there. Keeping things local and self-contained has been the UNIX
>> secret. It works really well as long as the boundaries are well defined.
>>      
> The 'UNIX secret' works for text driven pipelined commands where we are
> essentially programming via narrow ASCII input of mathematical logic.
>
> It doesnt work for a GUI that is a 2D/3D environment of millions of pixels,
> shaped by human visual perception of prettiness, familiarity and efficiency.
>    

Modularization is needed when a project exceeds the average developer's 
capacity.  For kvm,  it is logical to separate privileged cpu 
virtualization, from guest virtualization, from host management, from 
cluster management.

>> The problem we're facing is that we're simply lacking an active GUI /
>> desktop user development community. We have desktop users, but nobody feels
>> like tackling the issue of doing a great GUI project while talking to
>> qemu-devel about his needs.
>>      
> Have you made thoughts about why that might be so?
>
> I think it's because of what i outlined above - that you are trying to apply
> the "UNIX secret" to GUIs - and that is a mistake.
>
> A good GUI is almost at the _exact opposite spectrum_ of good command-line
> tool: tightly integrated, with 'layering violations' designed into it all over
> the place:
>
>    look i can paste the text from an editor straight into a firefox form. I
>    didnt go through any hiearchy of layers, i just took the shortest path
>    between the apps!
>    

Nope.  You copied text from one application into the clipboard (or 
selection, or PRIMARY, or whatever) and pasted text from the clipboard 
to another application.  If firefox and your editor had to interact 
directly, all would be lost.

See - there was a global (for the session) third party, and it wasn't 
the kernel.

> In other words: in a GUI the output controls the design, for command-line
> tools the design controls the output.
>    

Not in GUIs that I've seen the internals of.

> It is no wonder Unix always had its problems with creating good GUIs that are
> efficient to humans. A good GUI works like the human brain, and the human
> brain does not mind 'layering violations' when that gets it a more efficient
> result.
>    

The problem is that only developers are involved, not people who 
understand human-computer interaction (in many cases, not human-human 
interaction either).  Another problem is that a good GUI takes a lot of 
work so you need a lot of committed resources.  A third problem is that 
it isn't a lot of fun, at least not the 20% of the work that takes 800% 
of the time.

-- 
Do not meddle in the internals of kernels, for they are subtle and quick to panic.



* Re: [RFC] Unify KVM kernel-space and user-space code into a single project
  2010-03-22 20:29                                                                                                                         ` Ingo Molnar
@ 2010-03-22 20:40                                                                                                                           ` Avi Kivity
  0 siblings, 0 replies; 390+ messages in thread
From: Avi Kivity @ 2010-03-22 20:40 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Anthony Liguori, Pekka Enberg, Zhang, Yanmin, Peter Zijlstra,
	Sheng Yang, linux-kernel, kvm, Marcelo Tosatti, Joerg Roedel,
	Jes Sorensen, Gleb Natapov, Zachary Amsden, ziteng.huang,
	Arnaldo Carvalho de Melo, Frédéric Weisbecker, Gregory Haskins

On 03/22/2010 10:29 PM, Ingo Molnar wrote:
> * Avi Kivity<avi@redhat.com>  wrote:
>
>    
>>> I think you didnt understand my point. I am talking about 'perf kvm top'
>>> hanging if Qemu hangs.
>>>        
>> Use non-blocking I/O, report that guest as dead.  No point in profiling it,
>> it isn't making any progress.
>>      
> Erm, at what point do i decide that a guest is 'dead' versus 'just lagged due
> to lots of IO' ?
>    

qemu shouldn't block due to I/O (it does now, but there is work to fix 
it).  Of course it could be swapping or other things.

Pick a timeout, everything we do has timeouts these days.  It's the 
price we pay for protection: if you put something where a failure can't 
hurt you, you have to be prepared for failure, and you might have false 
alarms.

Is it so horrible for 'perf kvm top'?  No user data loss will happen, 
surely?

On the other hand, if it's in the kernel and it fails, you will lose 
service or perhaps data.

> Also, do you realize that you increase complexity (the use of non-blocking
> IO), just to protect against something that wouldnt happen if the right
> solution was used in the first place?
>    

It's a tradeoff.  Increasing the kernel code size vs. increasing 
userspace size.

>>> With a proper in-kernel enumeration the kernel would always guarantee the
>>> functionality, even if the vcpu does not make progress (i.e. it's "hung").
>>>
>>> With this implemented in Qemu we lose that kind of robustness guarantee.
>>>        
>> If qemu has a bug in the resource enumeration code, you can't profile one
>> guest.  If the kernel has a bug in the resource enumeration code, the system
>> either panics or needs to be rebooted later.
>>      
> This is really simple code, not rocket science. If there's a bug in it we'll
> fix it. On the other hand a 500KLOC+ piece of Qemu code has lots of places to
> hang, so that is a large cross section.
>
>    

The kernel has tons of very simple code (and some very complex code as 
well), and tons of -stable updates as well.

-- 
Do not meddle in the internals of kernels, for they are subtle and quick to panic.



* Re: [RFC] Unify KVM kernel-space and user-space code into a single project
  2010-03-22 20:32                                                                                                                     ` Ingo Molnar
@ 2010-03-22 20:43                                                                                                                       ` Avi Kivity
  0 siblings, 0 replies; 390+ messages in thread
From: Avi Kivity @ 2010-03-22 20:43 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Anthony Liguori, Pekka Enberg, Zhang, Yanmin, Peter Zijlstra,
	Sheng Yang, linux-kernel, kvm, Marcelo Tosatti, Joerg Roedel,
	Jes Sorensen, Gleb Natapov, Zachary Amsden, ziteng.huang,
	Arnaldo Carvalho de Melo, Frédéric Weisbecker, Gregory Haskins

On 03/22/2010 10:32 PM, Ingo Molnar wrote:
> * Anthony Liguori<anthony@codemonkey.ws>  wrote:
>
>    
>> On 03/22/2010 02:22 PM, Ingo Molnar wrote:
>>      
>>>> Transitive had a product that was using a KVM context to run their
>>>> binary translator which allowed them full access to the host
>>>> processes virtual address space range.  In this case, there is no
>>>> kernel and there are no devices.
>>>>          
>>> And your point is that such vcpus should be excluded from profiling just
>>> because they fall outside the Qemu/libvirt umbrella?
>>>        
>> You don't instrument it the way you'd instrument an operating system so no,
>> you don't want it to show up in perf kvm top.
>>      
> Erm, why not? It's executing a virtualized CPU, so sure it makes sense to
> allow the profiling of it!
>    

It may not make sense to have symbol tables for it; for example, it isn't 
generated from source code but from binary code for another architecture.

Of course, just showing addresses is fine, but you don't need qemu for that.

-- 
Do not meddle in the internals of kernels, for they are subtle and quick to panic.



* Re: [RFC] Unify KVM kernel-space and user-space code into a single project
  2010-03-22 20:35                                                                                                                     ` Ingo Molnar
@ 2010-03-22 20:45                                                                                                                       ` Avi Kivity
  0 siblings, 0 replies; 390+ messages in thread
From: Avi Kivity @ 2010-03-22 20:45 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Anthony Liguori, Pekka Enberg, Zhang, Yanmin, Peter Zijlstra,
	Sheng Yang, linux-kernel, kvm, Marcelo Tosatti, Joerg Roedel,
	Jes Sorensen, Gleb Natapov, Zachary Amsden, ziteng.huang,
	Arnaldo Carvalho de Melo, Frédéric Weisbecker, Gregory Haskins

On 03/22/2010 10:35 PM, Ingo Molnar wrote:
>
>>> And your point is that such vcpus should be excluded from profiling just
>>> because they fall outside the Qemu/libvirt umbrella?
>>>
>>> That is a ridiculous position.
>>>
>>>        
>> Non-guest vcpus will not be able to provide Linux-style symbols.
>>      
> And why do you say that it makes no sense to profile them?
>    

It makes sense to profile them, but you don't need to contact their 
userspace tool for that.

> Also, why do you define 'guest vcpus' to be 'Qemu started guest vcpus'? If
> some other KVM using project (which you encouraged just a few mails ago)
> starts a vcpu we still want to be able to profile them.
>
>    

Maybe it should provide a mechanism for libvirt to list it.

-- 
Do not meddle in the internals of kernels, for they are subtle and quick to panic.



* Re: [RFC] Unify KVM kernel-space and user-space code into a single project
  2010-03-22 19:47                                                                                                                   ` Avi Kivity
@ 2010-03-22 20:46                                                                                                                     ` Ingo Molnar
  2010-03-22 20:53                                                                                                                       ` Avi Kivity
  2010-03-22 22:06                                                                                                                     ` Anthony Liguori
  1 sibling, 1 reply; 390+ messages in thread
From: Ingo Molnar @ 2010-03-22 20:46 UTC (permalink / raw)
  To: Avi Kivity
  Cc: Anthony Liguori, Pekka Enberg, Zhang, Yanmin, Peter Zijlstra,
	Sheng Yang, linux-kernel, kvm, Marcelo Tosatti, Joerg Roedel,
	Jes Sorensen, Gleb Natapov, Zachary Amsden, ziteng.huang,
	Arnaldo Carvalho de Melo, Frédéric Weisbecker, Gregory Haskins


* Avi Kivity <avi@redhat.com> wrote:

> On 03/22/2010 09:27 PM, Ingo Molnar wrote:
> >
> >> If your position basically boils down to, we can't trust userspace
> >> and we can always trust the kernel, I want to eliminate any
> >> userspace path, then I can't really help you out.
> >
> > Why would you want to 'help me out'? I can tell a good solution from a bad 
> > one just fine.
> 
> You are basically making a kernel implementation a requirement, instead of 
> something that follows from the requirement.

No, i'm not.

> > You should instead read the long list of disadvantages above, invert them 
> > and list them as advantages for the kernel-based vcpu enumeration 
> > solution, apply common sense and go admit to yourself that indeed in this 
> > situation a kernel provided enumeration of vcpu contexts is the most 
> > robust solution.
> 
> Having qemu enumerate guests one way or another is not a good idea IMO since 
> it is focused on one guest and doesn't have a system-wide entity.  A 
> userspace system-wide entity will work just as well as kernel 
> implementation, without its disadvantages.

A system-wide user-space entity only solves one problem out of the 4 i listed, 
still leaving the other 3:

 - Those special files can get corrupted, mis-setup, get out of sync, or can
   be hard to discover.

 - Apps might start KVM vcpu instances without adhering to the
   system-wide access method.

 - There is no guarantee for the system-wide process to reply to a request -
   while the kernel can always guarantee an enumeration result. I dont want
   'perf kvm' to hang or misbehave just because the system-wide entity has 
   hung.

Really, i think i have to give up and not try to convince you guys about this 
anymore - i dont think you are arguing constructively anymore and i dont want 
yet another pointless flamewar about this.

Please consider 'perf kvm' scrapped indefinitely, due to lack of robust KVM 
instrumentation features: due to lack of robust+universal vcpu/guest 
enumeration and due to lack of robust+universal symbol access on the KVM side. 
It was a really promising feature IMO and i invested two days of arguments 
into it trying to find a workable solution, but it was not to be.

Thanks,

	Ingo


* Re: [RFC] Unify KVM kernel-space and user-space code into a single project
  2010-03-22 20:46                                                                                                                     ` Ingo Molnar
@ 2010-03-22 20:53                                                                                                                       ` Avi Kivity
  0 siblings, 0 replies; 390+ messages in thread
From: Avi Kivity @ 2010-03-22 20:53 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Anthony Liguori, Pekka Enberg, Zhang, Yanmin, Peter Zijlstra,
	Sheng Yang, linux-kernel, kvm, Marcelo Tosatti, Joerg Roedel,
	Jes Sorensen, Gleb Natapov, Zachary Amsden, ziteng.huang,
	Arnaldo Carvalho de Melo, Frédéric Weisbecker, Gregory Haskins

On 03/22/2010 10:46 PM, Ingo Molnar wrote:
>
>>> You should instead read the long list of disadvantages above, invert them
>>> and list them as advantages for the kernel-based vcpu enumeration
>>> solution, apply common sense and go admit to yourself that indeed in this
>>> situation a kernel provided enumeration of vcpu contexts is the most
>>> robust solution.
>>>        
>> Having qemu enumerate guests one way or another is not a good idea IMO since
>> it is focused on one guest and doesn't have a system-wide entity.  A
>> userspace system-wide entity will work just as well as kernel
>> implementation, without its disadvantages.
>>      
> A system-wide user-space entity only solves one problem out of the 4 i listed,
> still leaving the other 3:
>
>   - Those special files can get corrupted, mis-setup, get out of sync, or can
>     be hard to discover.
>    

That's a hard requirement anyway.  If it happens, we get massive data 
loss.  Way more troubling than 'perf kvm top' not working.  So consider 
it fulfilled.

>   - Apps might start KVM vcpu instances without adhering to the
>     system-wide access method.
>    

Then you don't get their symbol tables.  That happens anyway if the 
symbol server is not installed, not running, or handing out fake data.  So 
we have to deal with that anyway.

>   - There is no guarantee for the system-wide process to reply to a request -
>     while the kernel can always guarantee an enumeration result. I dont want
>     'perf kvm' to hang or misbehave just because the system-wide entity has
>     hung.
>    

When you press a key there is no guarantee no component along the way 
will time out.

> Really, i think i have to give up and not try to convince you guys about this
> anymore - i dont think you are arguing constructively anymore and i dont want
> yet another pointless flamewar about this.
>
> Please consider 'perf kvm' scrapped indefinitely, due to lack of robust KVM
> instrumentation features: due to lack of robust+universal vcpu/guest
> enumeration and due to lack of robust+universal symbol access on the KVM side.
> It was a really promising feature IMO and i invested two days of arguments
> into it trying to find a workable solution, but it was not to be.
>    

I am not going to push libvirt or a subset thereof into the kernel for 
'perf kvm'.

-- 
Do not meddle in the internals of kernels, for they are subtle and quick to panic.



* Re: [RFC] Unify KVM kernel-space and user-space code into a single project
  2010-03-22 20:00                                                                                                                   ` Antoine Martin
@ 2010-03-22 20:58                                                                                                                     ` Daniel P. Berrange
  0 siblings, 0 replies; 390+ messages in thread
From: Daniel P. Berrange @ 2010-03-22 20:58 UTC (permalink / raw)
  To: Antoine Martin
  Cc: Anthony Liguori, Avi Kivity, Ingo Molnar, Pekka Enberg, Zhang,
	Yanmin, Peter Zijlstra, Sheng Yang, linux-kernel, kvm,
	Marcelo Tosatti, Joerg Roedel, Jes Sorensen, Gleb Natapov,
	Zachary Amsden, ziteng.huang, Arnaldo Carvalho de Melo,
	Frédéric Weisbecker, Gregory Haskins

On Tue, Mar 23, 2010 at 03:00:28AM +0700, Antoine Martin wrote:
> On 03/23/2010 02:15 AM, Anthony Liguori wrote:
> >On 03/22/2010 12:55 PM, Avi Kivity wrote:
> >>>Lets look at the ${HOME}/.qemu/qmp/ enumeration method suggested by 
> >>>Anthony.
> >>>There's numerous ways that this can break:
> >>
> >>I don't like it either.  We have libvirt for enumerating guests.
> >
> >We're stuck in a rut with libvirt and I think a lot of the 
> >dissatisfaction with qemu is rooted in that.  It's not libvirt that's 
> >the probably, but the relationship between qemu and libvirt.
> +1
> The obvious reason why so many people still use shell scripts rather 
> than libvirt is because if it just doesn't provide what they need.
> Every time I've looked at it (and I've been looking for a better 
> solution for many years), it seems that it would have provided most of 
> the things I needed, but the remaining bits were unsolvable.

If you happen to remember what missing features prevented you choosing
libvirt, that would be invaluable information for us, to see if there
are quick wins that will help out. We got very useful feedback when
recently asking people this same question

http://rwmj.wordpress.com/2010/01/07/quick-quiz-what-stops-you-from-using-libvirt/

Allowing arbitrary passthrough of QEMU commands/args will solve some of
these issues, but is certainly far from solving all of them, e.g. guest cut+
paste, host side control of guest screen resolution, easier x509/TLS 
configuration for remote management, soft reboot, Windows desktop support
for virt-manager, host network interface management/setup, etc

Regards,
Daniel
-- 
|: Red Hat, Engineering, London    -o-   http://people.redhat.com/berrange/ :|
|: http://libvirt.org -o- http://virt-manager.org -o- http://deltacloud.org :|
|: http://autobuild.org        -o-         http://search.cpan.org/~danberr/ :|
|: GnuPG: 7D3B9505  -o-   F3C9 553F A1DA 4AC2 5648 23C1 B3DF F742 7D3B 9505 :|


* Re: [RFC] Unify KVM kernel-space and user-space code into a single project
  2010-03-22 19:47                                                                                                                   ` Avi Kivity
  2010-03-22 20:46                                                                                                                     ` Ingo Molnar
@ 2010-03-22 22:06                                                                                                                     ` Anthony Liguori
  2010-03-23  9:07                                                                                                                       ` Avi Kivity
                                                                                                                                         ` (2 more replies)
  1 sibling, 3 replies; 390+ messages in thread
From: Anthony Liguori @ 2010-03-22 22:06 UTC (permalink / raw)
  To: Avi Kivity
  Cc: Ingo Molnar, Pekka Enberg, Zhang, Yanmin, Peter Zijlstra,
	Sheng Yang, linux-kernel, kvm, Marcelo Tosatti, Joerg Roedel,
	Jes Sorensen, Gleb Natapov, Zachary Amsden, ziteng.huang,
	Arnaldo Carvalho de Melo, Frédéric Weisbecker, Gregory Haskins

On 03/22/2010 02:47 PM, Avi Kivity wrote:
> On 03/22/2010 09:27 PM, Ingo Molnar wrote:
>>
>>> If your position basically boils down to, we can't trust userspace
>>> and we can always trust the kernel, I want to eliminate any
>>> userspace path, then I can't really help you out.
>> Why would you want to 'help me out'? I can tell a good solution from 
>> a bad one
>> just fine.
>
> You are basically making a kernel implementation a requirement, 
> instead of something that follows from the requirement.
>
>> You should instead read the long list of disadvantages above, invert 
>> them and
>> list them as advantages for the kernel-based vcpu enumeration 
>> solution, apply
>> common sense and go admit to yourself that indeed in this situation a 
>> kernel
>> provided enumeration of vcpu contexts is the most robust solution.
>
> Having qemu enumerate guests one way or another is not a good idea IMO 
> since it is focused on one guest and doesn't have a system-wide entity.

There always needs to be a system wide entity.  There are two ways to 
enumerate instances from that system wide entity.  You can centralize 
the creation of instances and thereby maintain a list of current 
instances.  You can also allow instances to be created in a 
decentralized manner and provide a standard mechanism for instances to 
register themselves with the system wide entity.

IOW, it's the difference between asking libvirtd to exec(qemu) vs 
allowing a user to exec(qemu) and having qemu connect to a well-known 
unix domain socket to tell libvirtd that it exists.

The latter approach has a number of advantages.  libvirt already supports 
both models.  The former is the '/system' URI and the latter is the 
'/session' URI.

What I'm proposing, is to use the host file system as the system wide 
entity instead of libvirtd.  libvirtd can monitor the host file system 
to participate in these activities but ultimately, moving this 
functionality out of libvirtd means that it becomes the standard 
mechanism for all qemu instances regardless of how they're launched.
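
For illustration, a minimal user-space sketch of the decentralized
registration model described above: an instance announces itself to a
system-wide entity by connecting to a well-known unix domain socket.
The socket path and the one-line message format are invented for this
example; they are not an existing qemu or libvirt interface.

/*
 * Hypothetical sketch only: register this instance with a system-wide
 * registry listening on a well-known unix domain socket.
 */
#include <stdio.h>
#include <string.h>
#include <unistd.h>
#include <sys/socket.h>
#include <sys/un.h>

#define REGISTRY_SOCKET "/var/run/vm-registry.sock"	/* made-up path */

static int register_instance(const char *name)
{
	struct sockaddr_un addr;
	char msg[256];
	int fd, len;

	fd = socket(AF_UNIX, SOCK_STREAM, 0);
	if (fd < 0)
		return -1;

	memset(&addr, 0, sizeof(addr));
	addr.sun_family = AF_UNIX;
	strncpy(addr.sun_path, REGISTRY_SOCKET, sizeof(addr.sun_path) - 1);

	if (connect(fd, (struct sockaddr *)&addr, sizeof(addr)) < 0) {
		close(fd);
		return -1;	/* no registry listening: run unregistered */
	}

	len = snprintf(msg, sizeof(msg), "register pid=%d name=%s\n",
		       (int)getpid(), name);
	if (write(fd, msg, len) != len) {
		/* best effort in this sketch; ignore short writes */
	}
	close(fd);	/* or keep it open so the registry notices our exit */
	return 0;
}

int main(void)
{
	return register_instance("myguest") ? 1 : 0;
}

Whoever plays the system-wide role (libvirtd, a perf helper, or a plain
daemon) only has to listen on that socket and track the connections.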

Regards,

Anthony Liguori

>   A userspace system-wide entity will work just as well as kernel 
> implementation, without its disadvantages.
>


^ permalink raw reply	[flat|nested] 390+ messages in thread

* Re: [PATCH] Enhance perf to collect KVM guest os statistics from host side
  2010-03-22 16:44       ` Arnaldo Carvalho de Melo
@ 2010-03-23  3:14         ` Zhang, Yanmin
  2010-03-23 13:15           ` Arnaldo Carvalho de Melo
  0 siblings, 1 reply; 390+ messages in thread
From: Zhang, Yanmin @ 2010-03-23  3:14 UTC (permalink / raw)
  To: Arnaldo Carvalho de Melo
  Cc: Ingo Molnar, Peter Zijlstra, Avi Kivity, Sheng Yang,
	linux-kernel, kvm, Marcelo Tosatti, oerg Roedel, Jes Sorensen,
	Gleb Natapov, Zachary Amsden, zhiteng.huang,
	Frédéric Weisbecker

On Mon, 2010-03-22 at 13:44 -0300, Arnaldo Carvalho de Melo wrote:
> Em Mon, Mar 22, 2010 at 03:24:47PM +0800, Zhang, Yanmin escreveu:
> > On Fri, 2010-03-19 at 09:21 +0100, Ingo Molnar wrote:
> > > So some sort of --guestmount option would be the natural solution, which 
> > > points to the guest system's root: and a Qemu enumeration of guest mounts 
> > > (which would be off by default and configurable) from which perf can pick up 
> > > the target guest all automatically. (obviously only under allowed permissions 
> > > so that such access is secure)
> > If sshfs could access /proc/ and /sys correctly, here is a design:
> > --guestmount points to a directory which consists of a list of sub-directories.
> > Every sub-directory's name is just the qemu process id of guest os. Admin/developer
> > mounts every guest os instance's root directory to corresponding sub-directory.
> > 
> > Then, perf could access all files. It's possible because guest os instance
> > happens to be multi-threading in a process. One of the defects is the accessing to
> > guest os becomes slow or impossible when guest os is very busy.
> 
> If the MMAP events on the guest included a cookie that could later be
> used to query for the symtab of that DSO, we wouldn't need to access the
> guest FS at all, right?
It depends on the specific subcommand. As for 'perf kvm top', developers want to see
the profiling immediately. Even with 'perf kvm record', developers also want to
see results quickly. At least I'm eager for the results when investigating
a performance issue.
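
As a rough sketch of the --guestmount layout described in the quoted
text above: the code below walks a root directory whose pid-named
subdirectories each hold a guest's root filesystem mounted e.g. via
sshfs. The default directory and the per-guest file it looks up are
only examples, not an existing perf option.

/*
 * Enumerate guests under a --guestmount style directory: every
 * subdirectory named after a qemu pid is one guest's mounted root.
 */
#include <ctype.h>
#include <dirent.h>
#include <stdio.h>

static int is_all_digits(const char *s)
{
	if (!*s)
		return 0;
	for (; *s; s++)
		if (!isdigit((unsigned char)*s))
			return 0;
	return 1;
}

int main(int argc, char **argv)
{
	const char *guestmount = argc > 1 ? argv[1] : "/tmp/guestmount";
	struct dirent *ent;
	DIR *dir = opendir(guestmount);

	if (!dir) {
		perror(guestmount);
		return 1;
	}
	while ((ent = readdir(dir)) != NULL) {
		char path[4096];

		if (!is_all_digits(ent->d_name))
			continue;	/* only pid-named subdirectories are guests */
		snprintf(path, sizeof(path), "%s/%s/proc/kallsyms",
			 guestmount, ent->d_name);
		printf("guest qemu pid %s: kallsyms at %s\n",
		       ent->d_name, path);
	}
	closedir(dir);
	return 0;
}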

> 
> With build-ids and debuginfo-install like tools the symbol resolution
> could be performed by using the cookies (build-ids) as keys to get to
> the *-debuginfo packages with matching symtabs (and DWARF for source
> annotation, etc).
We can't make sure the guest OS uses the same OS images, and we don't know
where to find the original DVD images that were used to install the guest OS.

Current perf does save build-ids, covering both the kernel's and other
applications' libs/executables.

> 
> We have that for the kernel as:
> 
> [acme@doppio linux-2.6-tip]$ l /sys/kernel/notes 
> -r--r--r-- 1 root root 36 2010-03-22 13:14 /sys/kernel/notes
> [acme@doppio linux-2.6-tip]$ l /sys/module/ipv6/sections/.note.gnu.build-id 
> -r--r--r-- 1 root root 4096 2010-03-22 13:38 /sys/module/ipv6/sections/.note.gnu.build-id
> [acme@doppio linux-2.6-tip]$
> 
> That way we would cover DSOs being reinstalled in long running 'perf
> record' sessions too.
Supporting long-running sessions is one of perf's objectives.

> 
> This was discussed some time ago but would require help from the bits
> that load DSOs.
> 
> build-ids then would be first class citizens.
> 
> - Arnaldo



^ permalink raw reply	[flat|nested] 390+ messages in thread

* Re: [RFC] Unify KVM kernel-space and user-space code into a single project
  2010-03-22 22:06                                                                                                                     ` Anthony Liguori
@ 2010-03-23  9:07                                                                                                                       ` Avi Kivity
  2010-03-23 14:09                                                                                                                         ` Anthony Liguori
  2010-03-23 10:13                                                                                                                       ` Kevin Wolf
  2010-03-23 14:06                                                                                                                       ` Joerg Roedel
  2 siblings, 1 reply; 390+ messages in thread
From: Avi Kivity @ 2010-03-23  9:07 UTC (permalink / raw)
  To: Anthony Liguori
  Cc: Ingo Molnar, Pekka Enberg, Zhang, Yanmin, Peter Zijlstra,
	Sheng Yang, linux-kernel, kvm, Marcelo Tosatti, oerg Roedel,
	Jes Sorensen, Gleb Natapov, Zachary Amsden, ziteng.huang,
	Arnaldo Carvalho de Melo, Frédéric Weisbecker, Gregory Haskins

On 03/23/2010 12:06 AM, Anthony Liguori wrote:
>> Having qemu enumerate guests one way or another is not a good idea 
>> IMO since it is focused on one guest and doesn't have a system-wide 
>> entity.
>
>
> There always needs to be a system wide entity.  There are two ways to 
> enumerate instances from that system wide entity.  You can centralize 
> the creation of instances and there by maintain an list of current 
> instances.  You can also allow instances to be created in a 
> decentralized manner and provide a standard mechanism for instances to 
> register themselves with the system wide entity.
>
> IOW, it's the difference between asking libvirtd to exec(qemu) vs 
> allowing a user to exec(qemu) and having qemu connect to a well known 
> unix domain socket for libvirt to tell libvirtd that it exists.
>
> The later approach has a number of advantages.  libvirt already 
> supports both models.  The former is the '/system' uri and the later 
> is the '/session' uri.
>
> What I'm proposing, is to use the host file system as the system wide 
> entity instead of libvirtd.  libvirtd can monitor the host file system 
> to participate in these activities but ultimately, moving this 
> functionality out of libvirtd means that it becomes the standard 
> mechanism for all qemu instances regardless of how they're launched.

I don't like dropping sockets into the host filesystem, especially as 
they won't be cleaned up on abnormal exit.  I also think this breaks our 
'mechanism, not policy' policy.  Someone may want to do something weird 
with qemu that doesn't work well with this.

We could allow starting monitors from the global configuration file, so 
a distribution can do this if it wants, but I don't think we should do 
this ourselves by default.

-- 
error compiling committee.c: too many arguments to function


^ permalink raw reply	[flat|nested] 390+ messages in thread

* Re: [RFC] Unify KVM kernel-space and user-space code into a single project
  2010-03-22 14:54                                                                                               ` Ingo Molnar
  2010-03-22 19:04                                                                                                 ` Avi Kivity
@ 2010-03-23  9:46                                                                                                 ` Olivier Galibert
  1 sibling, 0 replies; 390+ messages in thread
From: Olivier Galibert @ 2010-03-23  9:46 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Pekka Enberg, Avi Kivity, Anthony Liguori, Zhang, Yanmin,
	Peter Zijlstra, Sheng Yang, linux-kernel, kvm, Marcelo Tosatti,
	oerg Roedel, Jes Sorensen, Gleb Natapov, Zachary Amsden,
	ziteng.huang, Arnaldo Carvalho de Melo, Frédéric Weisbecker,
	sandmann

On Mon, Mar 22, 2010 at 03:54:37PM +0100, Ingo Molnar wrote:
> Yes, i thought Qemu would be a prime candidate to be the baseline for 
> tools/kvm/, but i guess that has become socially impossible now after this 
> flamewar. It's not a big problem in the big scheme of things: tools/kvm/ is 
> best grown up from a small towards larger size anyway ...

I'm curious, where would you put the limit?  Let's imagine a tools/kvm
appears, be it qemu or not, that's outside the scope of my question.
Would you put the legacy PC bios in there (seabios I guess)?  The EFI
bios? The windows-compiled paravirtual drivers? The Xorg paravirtual
DDX?  Mesa (which includes the pv gallium drivers)? The
libvirt-equivalent? The GUI?

That's not a rhetorical question btw, I really wonder where the limit
should be.

  OG.

^ permalink raw reply	[flat|nested] 390+ messages in thread

* Re: [RFC] Unify KVM kernel-space and user-space code into a single project
  2010-03-22 22:06                                                                                                                     ` Anthony Liguori
  2010-03-23  9:07                                                                                                                       ` Avi Kivity
@ 2010-03-23 10:13                                                                                                                       ` Kevin Wolf
  2010-03-23 10:28                                                                                                                         ` Antoine Martin
  2010-03-23 14:06                                                                                                                       ` Joerg Roedel
  2 siblings, 1 reply; 390+ messages in thread
From: Kevin Wolf @ 2010-03-23 10:13 UTC (permalink / raw)
  To: Anthony Liguori
  Cc: Avi Kivity, Ingo Molnar, Pekka Enberg, Zhang, Yanmin,
	Peter Zijlstra, Sheng Yang, linux-kernel, kvm, Marcelo Tosatti,
	oerg Roedel, Jes Sorensen, Gleb Natapov, Zachary Amsden,
	ziteng.huang, Arnaldo Carvalho de Melo, Frédéric Weisbecker,
	Gregory Haskins

Am 22.03.2010 23:06, schrieb Anthony Liguori:
> On 03/22/2010 02:47 PM, Avi Kivity wrote:
>> Having qemu enumerate guests one way or another is not a good idea IMO 
>> since it is focused on one guest and doesn't have a system-wide entity.
> 
> There always needs to be a system wide entity.  There are two ways to 
> enumerate instances from that system wide entity.  You can centralize 
> the creation of instances and there by maintain an list of current 
> instances.  You can also allow instances to be created in a 
> decentralized manner and provide a standard mechanism for instances to 
> register themselves with the system wide entity.
> 
> IOW, it's the difference between asking libvirtd to exec(qemu) vs 
> allowing a user to exec(qemu) and having qemu connect to a well known 
> unix domain socket for libvirt to tell libvirtd that it exists.

I think the latter is exactly what I would want for myself. I do see the
advantages of having a central instance, but I really don't want to
bother with libvirt configuration files or even GUIs just to get an
ad-hoc VM up when I can simply run "qemu -hda hd.img -m 1024". Let alone
that I usually want to have full control over qemu, including monitor
access and small details available as command line options.

I know that I'm not the average user with these requirements, but still
I am one user and do have these requirements. If I could just install
libvirt, continue using qemu as I always did and libvirt picked my VMs
up for things like global enumeration, that would be more or less the
optimal thing for me.

Kevin

^ permalink raw reply	[flat|nested] 390+ messages in thread

* Re: [RFC] Unify KVM kernel-space and user-space code into a single project
  2010-03-23 10:13                                                                                                                       ` Kevin Wolf
@ 2010-03-23 10:28                                                                                                                         ` Antoine Martin
  0 siblings, 0 replies; 390+ messages in thread
From: Antoine Martin @ 2010-03-23 10:28 UTC (permalink / raw)
  To: Kevin Wolf
  Cc: Anthony Liguori, Avi Kivity, Ingo Molnar, Pekka Enberg, Zhang,
	Yanmin, Peter Zijlstra, Sheng Yang, linux-kernel, kvm,
	Marcelo Tosatti, oerg Roedel, Jes Sorensen, Gleb Natapov,
	Zachary Amsden, ziteng.huang, Arnaldo Carvalho de Melo,
	Frédéric Weisbecker, Gregory Haskins

On 03/23/2010 05:13 PM, Kevin Wolf wrote:
> Am 22.03.2010 23:06, schrieb Anthony Liguori:
>    
>> On 03/22/2010 02:47 PM, Avi Kivity wrote:
>>      
>>> Having qemu enumerate guests one way or another is not a good idea IMO
>>> since it is focused on one guest and doesn't have a system-wide entity.
>>>        
>> There always needs to be a system wide entity.  There are two ways to
>> enumerate instances from that system wide entity.  You can centralize
>> the creation of instances and there by maintain an list of current
>> instances.  You can also allow instances to be created in a
>> decentralized manner and provide a standard mechanism for instances to
>> register themselves with the system wide entity.
>>
>> IOW, it's the difference between asking libvirtd to exec(qemu) vs
>> allowing a user to exec(qemu) and having qemu connect to a well known
>> unix domain socket for libvirt to tell libvirtd that it exists.
>>      
> I think the latter is exactly what I would want for myself. I do see the
> advantages of having a central instance, but I really don't want to
> bother with libvirt configuration files or even GUIs just to get an
> ad-hoc VM up when I can simply run "qemu -hda hd.img -m 1024". Let alone
> that I usually want to have full control over qemu, including monitor
> access and small details available as command line options.
>
> I know that I'm not the average user with these requirements, but still
> I am one user and do have these requirements. If I could just install
> libvirt, continue using qemu as I always did and libvirt picked my VMs
> up for things like global enumeration, that would be more or less the
> optimal thing for me.
>    
+1
And it would also make it more likely that users like us would convert 
to libvirt in the long run, by providing an easy and integrated 
transition path.
I've had another look at libvirt, and one of the things that is holding 
me back is the cost of moving existing scripts to libvirt. If it could 
just pick up what I have (at least in part), then I don't have to.

Antoine

> Kevin
> --
> To unsubscribe from this list: send the line "unsubscribe kvm" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
>    


^ permalink raw reply	[flat|nested] 390+ messages in thread

* Re: [RFC] Unify KVM kernel-space and user-space code into a single project
  2010-03-22 20:21                                                                                                                           ` Ingo Molnar
  2010-03-22 20:35                                                                                                                             ` Avi Kivity
@ 2010-03-23 10:48                                                                                                                             ` Bernd Petrovitsch
  1 sibling, 0 replies; 390+ messages in thread
From: Bernd Petrovitsch @ 2010-03-23 10:48 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Alexander Graf, Daniel P. Berrange, Anthony Liguori, Avi Kivity,
	Pekka Enberg, Yanmin Zhang, Peter Zijlstra, Sheng Yang,
	LKML Mailing List, kvm-devel General, Marcelo Tosatti,
	oerg Roedel, Jes Sorensen, Gleb Natapov, Zachary Amsden,
	Arnaldo Carvalho de Melo, Frédéric Weisbecker, Gregory Haskins

On Mon, 2010-03-22 at 21:21 +0100, Ingo Molnar wrote:
[...]
> Have you made thoughts about why that might be so?

Yes.

Foreword: I assume that by "GUI" you mean "a user interface for the
classical desktop user with next to no interest in learning details or
basics".
That doesn't mean the classical desktop user is silly, stupid or
otherwise handicapped - it's just the lack of interest and/or time.

> I think it's because of what i outlined above - that you are trying to apply 
> the "UNIX secret" to GUIs - and that is a mistake.

No, it's the very same mechanism. But you just have to start at the
correct point. In the kernel/device driver world, you start at the
device.
And in the GUI world, you better start at the GUI (and not some kernel
API, library API, GUI tool or toolchains or anywhere else).

> A good GUI is almost at the _exact opposite spectrum_ of good command-line 
> tool: tightly integrated, with 'layering violations' designed into it all over 
> the place:
>
>   look i can paste the text from an editor straight into a firefox form. I
>   didnt go through any hiearchy of layers, i just took the shortest path 
>   between the apps!
> 
> In other words: in a GUI the output controls the design, for command-line 
ACK, because you have to make the GUI understandable to the intended users.
If that means "hiding 90% of all possibilities and features", you just
hide them.
Of course, the user of such a UI is quite limited and doesn't use much of
the functionality - because s/he can't access it through the GUI - (but
presenting 100% - or even 40% - doesn't help either, as s/he won't
understand it anyway).

> tools the design controls the output.
ACK, because the user in this case (which is most of the time a
developer, sys-admin, or similar techie) *wants* a 1:1 picture of the
underlying model, because s/he already *knows* the underlying model (and
is willing and able to adapt their own workflow to the underlying models).

> It is no wonder Unix always had its problems with creating good GUIs that are 

ACK. The cliché Unix person doesn't come from the "GUI world", so most
of them are "trained" and used to looking at what's there and improving on it.

> efficient to humans. A good GUI works like the human brain, and the human 
> brain does not mind 'layering violations' when that gets it a more efficient 
> result.

If this is the case, the layering/structure/design of the GUI is (very)
badly defined/chosen (for whatever reason).

[ Most probably because some seasoned software developer designed the
GUI app *without* first designing (and testing!) the GUI - or, more to the
point, its look (how it looks) and feel (how it behaves, what the
possible workflows are, ...). ]

	Bernd
-- 
Bernd Petrovitsch                  Email : bernd@petrovitsch.priv.at
                     LUGA : http://www.luga.at


^ permalink raw reply	[flat|nested] 390+ messages in thread

* Re: [PATCH] Enhance perf to collect KVM guest os statistics from host side
  2010-03-23  3:14         ` Zhang, Yanmin
@ 2010-03-23 13:15           ` Arnaldo Carvalho de Melo
  2010-03-24  1:39             ` Zhang, Yanmin
  0 siblings, 1 reply; 390+ messages in thread
From: Arnaldo Carvalho de Melo @ 2010-03-23 13:15 UTC (permalink / raw)
  To: Zhang, Yanmin
  Cc: Ingo Molnar, Peter Zijlstra, Avi Kivity, Sheng Yang,
	linux-kernel, kvm, Marcelo Tosatti, oerg Roedel, Jes Sorensen,
	Gleb Natapov, Zachary Amsden, zhiteng.huang,
	Frédéric Weisbecker

Em Tue, Mar 23, 2010 at 11:14:41AM +0800, Zhang, Yanmin escreveu:
> On Mon, 2010-03-22 at 13:44 -0300, Arnaldo Carvalho de Melo wrote:
> > Em Mon, Mar 22, 2010 at 03:24:47PM +0800, Zhang, Yanmin escreveu:
> > > On Fri, 2010-03-19 at 09:21 +0100, Ingo Molnar wrote:
> > > Then, perf could access all files. It's possible because guest os instance
> > > happens to be multi-threading in a process. One of the defects is the accessing to
> > > guest os becomes slow or impossible when guest os is very busy.
> > 
> > If the MMAP events on the guest included a cookie that could later be
> > used to query for the symtab of that DSO, we wouldn't need to access the
> > guest FS at all, right?

> It depends on specific sub commands. As for 'perf kvm top', developers
> want to see the profiling immediately. Even with 'perf kvm record',
> developers also want to

That is not a problem: if you have the relevant build-ids in your cache
(look at ~/.debug/ on your machine), it will be as fast as ever.

If you use a distro that ships its userspace with build-ids, you probably
use the cache all the time without noticing :-)

> see results quickly. At least I'm eager for the results when
> investigating a performance issue.

Sure thing.
 
> > With build-ids and debuginfo-install like tools the symbol
> > resolution could be performed by using the cookies (build-ids) as
> > keys to get to the *-debuginfo packages with matching symtabs (and
> > DWARF for source annotation, etc).

> We can't make sure guest os uses the same os images, or don't know
> where we could find the original DVD images being used to install
> guest os.

You don't have to have guest and host sharing the same OS image, you
just have to somehow populate your buildid cache with what you need, be
it using sshfs, what Ingo suggested, or what your vendor provides
(debuginfo packages). And you just have to do it once, for the relevant
apps, to have it in your buildid cache.
 
> Current perf does save build id, including both kernls's and other
> application lib/executables.

Yeah, I know, I implemented it. :-)
 
> > We have that for the kernel as:

> > [acme@doppio linux-2.6-tip]$ l /sys/kernel/notes 
> > -r--r--r-- 1 root root 36 2010-03-22 13:14 /sys/kernel/notes
> > [acme@doppio linux-2.6-tip]$ l /sys/module/ipv6/sections/.note.gnu.build-id 
> > -r--r--r-- 1 root root 4096 2010-03-22 13:38 /sys/module/ipv6/sections/.note.gnu.build-id
> > [acme@doppio linux-2.6-tip]$

> > That way we would cover DSOs being reinstalled in long running 'perf
> > record' sessions too.

> That's one of objectives of perf to support long running.

But that isn't fully supported right now: as I explained, build-ids are
collected at the end of the record session, because we have to open the
DSOs that had hits to get the 20-byte cookie we need, the build-id.

If we had it in the PERF_RECORD_MMAP record, we would close this race,
and the added cost at load time should be minimal: get the ELF
section with it and put it somewhere in the task struct.

If only we could coalesce it a bit to reclaim this:

[acme@doppio linux-2.6-tip]$ pahole -C task_struct ../build/v2.6.34-rc1-tip+/kernel/sched.o  | tail -5
	/* size: 5968, cachelines: 94, members: 150 */
	/* sum members: 5943, holes: 7, sum holes: 25 */
	/* bit holes: 1, sum bit holes: 28 bits */
	/* last cacheline: 16 bytes */
};
[acme@doppio linux-2.6-tip]$ 

8-)

Or at least reclaim just one of those 4-byte holes; then we could stick
our build-id at the end. Accessing it would be done only at
PERF_RECORD_MMAP injection time, i.e. close to the time when we actually
load the executable mmap and close to the time when the loader injects
the build-id, so I guess the extra memory and processing costs would be
in the noise.
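
For reference, a minimal sketch of how that 20-byte cookie can be read
from a standard ELF note blob such as the /sys/kernel/notes file quoted
above. It assumes only the generic note layout (Elf32_Nhdr plus
4-byte-padded name and descriptor), nothing perf-specific.

/* Print the GNU build-id found in an ELF note blob. */
#include <elf.h>
#include <stdio.h>
#include <string.h>

#define ALIGN4(x) (((x) + 3) & ~3u)

static int print_build_id(const unsigned char *buf, size_t size)
{
	size_t off = 0;

	while (off + sizeof(Elf32_Nhdr) <= size) {
		const Elf32_Nhdr *nhdr = (const Elf32_Nhdr *)(buf + off);
		const char *name = (const char *)(nhdr + 1);
		const unsigned char *desc =
			(const unsigned char *)name + ALIGN4(nhdr->n_namesz);
		size_t i;

		if (nhdr->n_type == NT_GNU_BUILD_ID &&
		    nhdr->n_namesz == 4 && memcmp(name, "GNU", 4) == 0) {
			for (i = 0; i < nhdr->n_descsz; i++)
				printf("%02x", desc[i]);
			printf("\n");
			return 0;
		}
		off += sizeof(Elf32_Nhdr) + ALIGN4(nhdr->n_namesz) +
		       ALIGN4(nhdr->n_descsz);
	}
	return -1;
}

int main(int argc, char **argv)
{
	const char *path = argc > 1 ? argv[1] : "/sys/kernel/notes";
	unsigned char buf[4096] __attribute__((aligned(4)));
	FILE *f = fopen(path, "rb");
	size_t n;

	if (!f) {
		perror(path);
		return 1;
	}
	n = fread(buf, 1, sizeof(buf), f);
	fclose(f);
	return print_build_id(buf, n) ? 1 : 0;
}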

> > This was discussed some time ago but would require help from the bits
> > that load DSOs.

> > build-ids then would be first class citizens.

- Arnaldo

^ permalink raw reply	[flat|nested] 390+ messages in thread

* Re: [PATCH] Enhance perf to collect KVM guest os statistics from host side
  2010-03-22 11:47             ` Joerg Roedel
  2010-03-22 12:26               ` Ingo Molnar
@ 2010-03-23 13:18               ` Soeren Sandmann
  2010-03-23 13:49                 ` Andi Kleen
  1 sibling, 1 reply; 390+ messages in thread
From: Soeren Sandmann @ 2010-03-23 13:18 UTC (permalink / raw)
  To: Joerg Roedel
  Cc: Ingo Molnar, Zhang, Yanmin, Peter Zijlstra, Avi Kivity,
	Sheng Yang, linux-kernel, kvm, Marcelo Tosatti, Jes Sorensen,
	Gleb Natapov, Zachary Amsden, zhiteng.huang,
	Frédéric Weisbecker, Arnaldo Carvalho de Melo

Joerg Roedel <joro@8bytes.org> writes:

> On Mon, Mar 22, 2010 at 11:59:27AM +0100, Ingo Molnar wrote:
> > Best would be if you demonstrated any problems of the perf symbol lookup code 
> > you are aware of on the host side, as it has that exact design you are 
> > criticising here. We are eager to fix any bugs in it.
> > 
> > If you claim that it's buggy then that should very much be demonstratable - no 
> > need to go into theoretical arguments about it.
> 
> I am not claiming anything. I just try to imagine how your proposal
> will look like in practice and forgot that symbol resolution is done at
> a later point.
> But even with defered symbol resolution we need more information from
> the guest than just the rip falling out of KVM. The guest needs to tell
> us about the process where the event happened (information that the host
> has about itself without any hassle) and which executable-files it was
> loaded from.

Slightly tangential, but there is another case that has some of the
same problems: profiling language runtimes other than C and C++, say
Python. At the moment profilers will generally tell you what is going
on inside the python runtime, but not what the python program itself
is doing.

To fix that problem, it seems like we need some way to have python
export what is going on. Maybe the same mechanism could be used to
access what is going on in both qemu and python.


Soren

^ permalink raw reply	[flat|nested] 390+ messages in thread

* Re: [PATCH] Enhance perf to collect KVM guest os statistics from host side
  2010-03-23 13:18               ` Soeren Sandmann
@ 2010-03-23 13:49                 ` Andi Kleen
  2010-03-23 14:04                   ` Soeren Sandmann
  2010-03-23 14:10                   ` Arnaldo Carvalho de Melo
  0 siblings, 2 replies; 390+ messages in thread
From: Andi Kleen @ 2010-03-23 13:49 UTC (permalink / raw)
  To: Soeren Sandmann
  Cc: Joerg Roedel, Ingo Molnar, Zhang, Yanmin, Peter Zijlstra,
	Avi Kivity, Sheng Yang, linux-kernel, kvm, Marcelo Tosatti,
	Jes Sorensen, Gleb Natapov, Zachary Amsden, zhiteng.huang,
	Frédéric Weisbecker, Arnaldo Carvalho de Melo

Soeren Sandmann <sandmann@daimi.au.dk> writes:
>
> To fix that problem, it seems like we need some way to have python
> export what is going on. Maybe the same mechanism could be used to
> both access what is going on in qemu and python.

oprofile already has an interface to let JITs export
information about the JITed code. C Python is not a JIT,
but presumably one of the python JITs could do it.

http://oprofile.sourceforge.net/doc/devel/index.html

I know it's not en vogue anymore and you won't be an approved 
cool kid if you do, but you could just use oprofile? 

OK, presumably one would need to write a python interface for this
first. I believe it's currently only implemented for Java and
Mono. I presume it might work today with IronPython on Mono.

IMHO it doesn't make sense to invent another interface for this,
although I'm sure someone will propose just that.

-Andi
-- 
ak@linux.intel.com -- Speaking for myself only.

^ permalink raw reply	[flat|nested] 390+ messages in thread

* Re: [PATCH] Enhance perf to collect KVM guest os statistics from host side
  2010-03-23 13:49                 ` Andi Kleen
@ 2010-03-23 14:04                   ` Soeren Sandmann
  2010-03-23 14:20                     ` Andi Kleen
  2010-03-23 14:46                     ` Frank Ch. Eigler
  2010-03-23 14:10                   ` Arnaldo Carvalho de Melo
  1 sibling, 2 replies; 390+ messages in thread
From: Soeren Sandmann @ 2010-03-23 14:04 UTC (permalink / raw)
  To: Andi Kleen
  Cc: Joerg Roedel, Ingo Molnar, Zhang, Yanmin, Peter Zijlstra,
	Avi Kivity, Sheng Yang, linux-kernel, kvm, Marcelo Tosatti,
	Jes Sorensen, Gleb Natapov, Zachary Amsden, zhiteng.huang,
	Frédéric Weisbecker, Arnaldo Carvalho de Melo

Andi Kleen <andi@firstfloor.org> writes:

> Soeren Sandmann <sandmann@daimi.au.dk> writes:
> >
> > To fix that problem, it seems like we need some way to have python
> > export what is going on. Maybe the same mechanism could be used to
> > both access what is going on in qemu and python.
> 
> oprofile already has an interface to let JITs export
> information about the JITed code. C Python is not a JIT,
> but presumably one of the python JITs could do it.
> 
> http://oprofile.sourceforge.net/doc/devel/index.html

It's not that I personally want to profile a particular python
program. I'm interested in the more general problem of extracting more
information from profiled user space programs than just stack traces.

Examples:

        - What is going on inside QEMU? 

        - Which client is the X server servicing?

        - What parts of a python/shell/scheme/javascript program are
          taking the most CPU time?

I don't think the oprofile JIT interface solves any of these
problems. (In fact, I don't see why the JIT problem is even hard. The
JIT compiler can just generate a little ELF file with symbols in it,
and the profiler can pick it up through the mmap events that you get
through the perf interface).

> I know it's not envogue anymore and you won't be a approved 
> cool kid if you do, but you could just use oprofile? 

I am bringing this up because I want to extend sysprof to be more
useful. 


Soren

^ permalink raw reply	[flat|nested] 390+ messages in thread

* Re: [RFC] Unify KVM kernel-space and user-space code into a single project
  2010-03-22 22:06                                                                                                                     ` Anthony Liguori
  2010-03-23  9:07                                                                                                                       ` Avi Kivity
  2010-03-23 10:13                                                                                                                       ` Kevin Wolf
@ 2010-03-23 14:06                                                                                                                       ` Joerg Roedel
  2010-03-23 16:39                                                                                                                         ` Avi Kivity
  2 siblings, 1 reply; 390+ messages in thread
From: Joerg Roedel @ 2010-03-23 14:06 UTC (permalink / raw)
  To: Anthony Liguori
  Cc: Avi Kivity, Ingo Molnar, Pekka Enberg, Zhang, Yanmin,
	Peter Zijlstra, Sheng Yang, linux-kernel, kvm, Marcelo Tosatti,
	Jes Sorensen, Gleb Natapov, Zachary Amsden, ziteng.huang,
	Arnaldo Carvalho de Melo, Frédéric Weisbecker, Gregory Haskins

On Mon, Mar 22, 2010 at 05:06:17PM -0500, Anthony Liguori wrote:
> There always needs to be a system wide entity.  There are two ways to  
> enumerate instances from that system wide entity.  You can centralize  
> the creation of instances and there by maintain an list of current  
> instances.  You can also allow instances to be created in a  
> decentralized manner and provide a standard mechanism for instances to  
> register themselves with the system wide entity.

And this system wide entity is the kvm module. It creates instances of
'struct kvm' and destroys them. I see no problem if we just attach a
name to every instance with a good default value like kvm0, kvm1 ... or
guest0, guest1 ... User-space can override the name if it wants. The kvm
module takes care that the names are unique.
This is very much the same as how network card numbering is implemented in
the kernel.
Forcing perf to talk to qemu or even libvirt produces too much overhead
imho. Instrumentation only produces useful results with low overhead.
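
Purely as an illustration of the numbering idea (this is not kvm module
code), a small user-space sketch of handing out unique default names
with an optional user-supplied override; collision checking of
overridden names is left out of the sketch.

#include <pthread.h>
#include <stdio.h>

static pthread_mutex_t name_lock = PTHREAD_MUTEX_INITIALIZER;
static unsigned long next_id;

/* Fill buf with the requested name, or a generated "kvmN" default. */
static void assign_guest_name(char *buf, size_t len, const char *requested)
{
	pthread_mutex_lock(&name_lock);
	if (requested && *requested) {
		snprintf(buf, len, "%s", requested);
	} else {
		snprintf(buf, len, "kvm%lu", next_id);
		next_id++;
	}
	pthread_mutex_unlock(&name_lock);
}

int main(void)
{
	char a[32], b[32], c[32];

	assign_guest_name(a, sizeof(a), NULL);		/* "kvm0" */
	assign_guest_name(b, sizeof(b), "MyGuest");	/* user override */
	assign_guest_name(c, sizeof(c), NULL);		/* "kvm1" */
	printf("%s %s %s\n", a, b, c);
	return 0;
}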

	Joerg


^ permalink raw reply	[flat|nested] 390+ messages in thread

* Re: [RFC] Unify KVM kernel-space and user-space code into a single project
  2010-03-23  9:07                                                                                                                       ` Avi Kivity
@ 2010-03-23 14:09                                                                                                                         ` Anthony Liguori
  0 siblings, 0 replies; 390+ messages in thread
From: Anthony Liguori @ 2010-03-23 14:09 UTC (permalink / raw)
  To: Avi Kivity
  Cc: Ingo Molnar, Pekka Enberg, Zhang, Yanmin, Peter Zijlstra,
	Sheng Yang, linux-kernel, kvm, Marcelo Tosatti, oerg Roedel,
	Jes Sorensen, Gleb Natapov, Zachary Amsden, ziteng.huang,
	Arnaldo Carvalho de Melo, Frédéric Weisbecker, Gregory Haskins

On 03/23/2010 04:07 AM, Avi Kivity wrote:
> On 03/23/2010 12:06 AM, Anthony Liguori wrote:
>>> Having qemu enumerate guests one way or another is not a good idea 
>>> IMO since it is focused on one guest and doesn't have a system-wide 
>>> entity.
>>
>>
>> There always needs to be a system wide entity.  There are two ways to 
>> enumerate instances from that system wide entity.  You can centralize 
>> the creation of instances and there by maintain an list of current 
>> instances.  You can also allow instances to be created in a 
>> decentralized manner and provide a standard mechanism for instances 
>> to register themselves with the system wide entity.
>>
>> IOW, it's the difference between asking libvirtd to exec(qemu) vs 
>> allowing a user to exec(qemu) and having qemu connect to a well known 
>> unix domain socket for libvirt to tell libvirtd that it exists.
>>
>> The later approach has a number of advantages.  libvirt already 
>> supports both models.  The former is the '/system' uri and the later 
>> is the '/session' uri.
>>
>> What I'm proposing, is to use the host file system as the system wide 
>> entity instead of libvirtd.  libvirtd can monitor the host file 
>> system to participate in these activities but ultimately, moving this 
>> functionality out of libvirtd means that it becomes the standard 
>> mechanism for all qemu instances regardless of how they're launched.
>
> I don't like dropping sockets into the host filesystem, especially as 
> they won't be cleaned up on abnormal exit.  I also think this breaks 
> our 'mechanism, not policy' policy.  Someone may want to do something 
> weird with qemu that doesn't work well with this.

The approach I've taken (which I accidentally committed and reverted) 
was to set this up as the default qmp device much like we have a default 
monitor device.  A user is capable of overriding this by manually 
specifying a qmp device or by disabling defaults.

> We could allow starting monitors from the global configuration file, 
> so a distribution can do this if it wants, but I don't think we should 
> do this ourselves by default.

I've looked at making default devices globally configurable.  We'll get 
there but I think that's orthogonal to setting up a useful default qmp 
device.

Regards,

Anthony Liguori


^ permalink raw reply	[flat|nested] 390+ messages in thread

* Re: [PATCH] Enhance perf to collect KVM guest os statistics from host side
  2010-03-23 13:49                 ` Andi Kleen
  2010-03-23 14:04                   ` Soeren Sandmann
@ 2010-03-23 14:10                   ` Arnaldo Carvalho de Melo
  2010-03-23 15:23                     ` Peter Zijlstra
  1 sibling, 1 reply; 390+ messages in thread
From: Arnaldo Carvalho de Melo @ 2010-03-23 14:10 UTC (permalink / raw)
  To: Andi Kleen
  Cc: Soeren Sandmann, Joerg Roedel, Ingo Molnar, Zhang, Yanmin,
	Peter Zijlstra, Avi Kivity, Sheng Yang, linux-kernel, kvm,
	Marcelo Tosatti, Jes Sorensen, Gleb Natapov, Zachary Amsden,
	zhiteng.huang, Frédéric Weisbecker

Em Tue, Mar 23, 2010 at 02:49:01PM +0100, Andi Kleen escreveu:
> Soeren Sandmann <sandmann@daimi.au.dk> writes:
> > To fix that problem, it seems like we need some way to have python
> > export what is going on. Maybe the same mechanism could be used to
> > both access what is going on in qemu and python.
> 
> oprofile already has an interface to let JITs export
> information about the JITed code. C Python is not a JIT,
> but presumably one of the python JITs could do it.
> 
> http://oprofile.sourceforge.net/doc/devel/index.html
> 
> I know it's not envogue anymore and you won't be a approved 
> cool kid if you do, but you could just use oprofile? 

perf also has support for this and Pekka Enberg's jato uses it:

http://penberg.blogspot.com/2009/06/jato-has-profiler.html
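
For reference, a small sketch of that interface from the JIT's side,
assuming the /tmp/perf-<pid>.map text format of one "START SIZE
symbolname" line (hex, without 0x) per generated function; the function
and symbol names below are made up.

#include <stdio.h>
#include <stdint.h>
#include <unistd.h>

/* Append one map line describing a freshly emitted code region. */
static void perf_map_emit(uintptr_t start, size_t size, const char *name)
{
	char path[64];
	FILE *f;

	snprintf(path, sizeof(path), "/tmp/perf-%d.map", (int)getpid());
	f = fopen(path, "a");
	if (!f)
		return;
	fprintf(f, "%lx %zx %s\n", (unsigned long)start, size, name);
	fclose(f);
}

int main(void)
{
	/* pretend we just emitted 128 bytes of code for a jitted function */
	static unsigned char code[128];

	perf_map_emit((uintptr_t)code, sizeof(code), "jit::compiled_frame");
	return 0;
}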

- Arnaldo

^ permalink raw reply	[flat|nested] 390+ messages in thread

* Re: [PATCH] Enhance perf to collect KVM guest os statistics from host side
  2010-03-23 14:04                   ` Soeren Sandmann
@ 2010-03-23 14:20                     ` Andi Kleen
  2010-03-23 14:29                       ` Arnaldo Carvalho de Melo
  2010-03-23 14:46                     ` Frank Ch. Eigler
  1 sibling, 1 reply; 390+ messages in thread
From: Andi Kleen @ 2010-03-23 14:20 UTC (permalink / raw)
  To: Soeren Sandmann
  Cc: Joerg Roedel, Ingo Molnar, Zhang, Yanmin, Peter Zijlstra,
	Avi Kivity, Sheng Yang, linux-kernel, kvm, Marcelo Tosatti,
	Jes Sorensen, Gleb Natapov, Zachary Amsden, zhiteng.huang,
	Frédéric Weisbecker, Arnaldo Carvalho de Melo

Soeren Sandmann <sandmann@daimi.au.dk> writes:
>
> Examples:
>
>         - What is going on inside QEMU? 

That's something the JIT interface could answer.

>         - Which client is the X server servicing?
>
>         - What parts of a python/shell/scheme/javascript program is
>           taking the most CPU time?

I suspect for those you rather need event-based tracers of some sort,
similar to kernel trace points. Otherwise you would need your own
separate stacks and other complications.

systemtap has some effort to use the dtrace instrumentation
that crops up in more and more user programs for this.  It wouldn't
surprise me if that was already in python and other programs
you're interested in.
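
For the curious, a tiny sketch of what such a static user-space marker
looks like in application code, assuming systemtap's <sys/sdt.h> header
is installed; the provider and probe names are made up.

#include <stdio.h>
#include <sys/sdt.h>

static void run_script_line(int lineno, const char *text)
{
	/* fires a probe an external tracer can hook, with two arguments */
	DTRACE_PROBE2(toy_interp, line__execute, lineno, text);
	printf("executing line %d: %s\n", lineno, text);
}

int main(void)
{
	run_script_line(1, "print('hello')");
	run_script_line(2, "x = 40 + 2");
	return 0;
}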

I presume right now it only works if you apply the utrace monstrosity
though, but perhaps the new uprobes patches floating around 
will come to the rescue.

There also was some effort to have a pure user space
daemon based approach for LTT, but I believe that currently
needs its own trace points.

Again I fully expect someone to reinvent the wheel here
and afterwards complain about "community inefficiencies" :-)

> I don't think the oprofile JIT interface solves any of these
> problems. (In fact, I don't see why the JIT problem is even hard. The
> JIT compiler can just generate a little ELF file with symbols in it,
> and the profiler can pick it up through the mmap events that you get
> through the perf interface).

That would require keeping those temporary ELF files around for a
potentially unlimited time (profilers today look at the ELF files in
the final analysis phase, which might be weeks away).

Also, that would be a lot of overhead for the JIT and would most likely
mean a larger-scale rewrite for a given JIT code base.

-Andi
-- 
ak@linux.intel.com -- Speaking for myself only.

^ permalink raw reply	[flat|nested] 390+ messages in thread

* Re: [PATCH] Enhance perf to collect KVM guest os statistics from host side
  2010-03-23 14:20                     ` Andi Kleen
@ 2010-03-23 14:29                       ` Arnaldo Carvalho de Melo
  0 siblings, 0 replies; 390+ messages in thread
From: Arnaldo Carvalho de Melo @ 2010-03-23 14:29 UTC (permalink / raw)
  To: Andi Kleen
  Cc: Soeren Sandmann, Joerg Roedel, Ingo Molnar, Zhang, Yanmin,
	Peter Zijlstra, Avi Kivity, Sheng Yang, linux-kernel, kvm,
	Marcelo Tosatti, Jes Sorensen, Gleb Natapov, Zachary Amsden,
	zhiteng.huang, Frédéric Weisbecker

Em Tue, Mar 23, 2010 at 03:20:11PM +0100, Andi Kleen escreveu:
> Soeren Sandmann <sandmann@daimi.au.dk> writes:
> > I don't think the oprofile JIT interface solves any of these
> > problems. (In fact, I don't see why the JIT problem is even hard. The
> > JIT compiler can just generate a little ELF file with symbols in it,
> > and the profiler can pick it up through the mmap events that you get
> > through the perf interface).
> 
> That would require keeping those temporary ELF files for
> potentially unlimited time around (profilers today look at the ELF
> files at the final analysis phase, which might be weeks away)

'perf record' will traverse the perf.data file just collected and, if the
binaries have build-ids, will stash them in ~/.debug/, keyed by build-id
just like the -debuginfo packages do.

So only the binaries with hits are kept. Also, one can use 'perf archive' to
create a tar.bz2 file with the files that had hits for the specified
perf.data file; that can then be transferred to another machine, whatever
the arch, untarred at ~/.debug, and the report can be done there.

As it is done by build-id, multiple 'perf record' sessions share files
in the cache.

Right now the whole ELF file (or /proc/kallsyms copy) is stored if
collected from the DSO directly, or the bits that are stored in
-debuginfo files if we find it installed (so smaller). We could strip
that down further by storing just the ELF sections needed to make sense
of the symtab.

- Arnaldo

^ permalink raw reply	[flat|nested] 390+ messages in thread

* Re: [PATCH] Enhance perf to collect KVM guest os statistics from host side
  2010-03-23 14:04                   ` Soeren Sandmann
  2010-03-23 14:20                     ` Andi Kleen
@ 2010-03-23 14:46                     ` Frank Ch. Eigler
  1 sibling, 0 replies; 390+ messages in thread
From: Frank Ch. Eigler @ 2010-03-23 14:46 UTC (permalink / raw)
  To: Soeren Sandmann
  Cc: Andi Kleen, Joerg Roedel, Ingo Molnar, Zhang, Yanmin,
	Peter Zijlstra, Avi Kivity, Sheng Yang, linux-kernel, kvm,
	Marcelo Tosatti, Jes Sorensen, Gleb Natapov, Zachary Amsden,
	zhiteng.huang, Frédéric Weisbecker, Arnaldo Carvalho de Melo

Soeren Sandmann <sandmann@daimi.au.dk> writes:

> [...]
>         - What is going on inside QEMU? 
>         - Which client is the X server servicing?
>         - What parts of a python/shell/scheme/javascript program is
>           taking the most CPU time?
> [...]

These kinds of questions usually require navigation through internal
data of the user-space process ("Where in this linked list is this
pointer?"), and often also correlating them with history ("which
socket/fd was most recently serviced?").

Systemtap excels at letting one express such things.

- FChE

^ permalink raw reply	[flat|nested] 390+ messages in thread

* Re: [PATCH] Enhance perf to collect KVM guest os statistics from host side
  2010-03-23 14:10                   ` Arnaldo Carvalho de Melo
@ 2010-03-23 15:23                     ` Peter Zijlstra
  0 siblings, 0 replies; 390+ messages in thread
From: Peter Zijlstra @ 2010-03-23 15:23 UTC (permalink / raw)
  To: Arnaldo Carvalho de Melo
  Cc: Andi Kleen, Soeren Sandmann, Joerg Roedel, Ingo Molnar, Zhang,
	Yanmin, Avi Kivity, Sheng Yang, linux-kernel, kvm,
	Marcelo Tosatti, Jes Sorensen, Gleb Natapov, Zachary Amsden,
	zhiteng.huang, Frédéric Weisbecker, Pekka Enberg

On Tue, 2010-03-23 at 11:10 -0300, Arnaldo Carvalho de Melo wrote:
> Em Tue, Mar 23, 2010 at 02:49:01PM +0100, Andi Kleen escreveu:
> > Soeren Sandmann <sandmann@daimi.au.dk> writes:
> > > To fix that problem, it seems like we need some way to have python
> > > export what is going on. Maybe the same mechanism could be used to
> > > both access what is going on in qemu and python.
> > 
> > oprofile already has an interface to let JITs export
> > information about the JITed code. C Python is not a JIT,
> > but presumably one of the python JITs could do it.
> > 
> > http://oprofile.sourceforge.net/doc/devel/index.html
> > 
> > I know it's not envogue anymore and you won't be a approved 
> > cool kid if you do, but you could just use oprofile? 
> 
> perf also has supports for this and Pekka Enberg's jato uses it:
> 
> http://penberg.blogspot.com/2009/06/jato-has-profiler.html

Right, we need to move that into a library though (always meant to do
that, never got around to doing it).

That way the app can link against a dso with weak empty stubs and have
perf record LD_PRELOAD a version that has a suitable implementation.

That all has the advantage of not exposing the actual interface like we
do now.
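
As a concrete sketch of that weak-stub-plus-LD_PRELOAD scheme (file
names, library names and the hook signature are hypothetical): the
application links against a stub DSO whose hook does nothing, and a
preloaded library wins at runtime simply because it is searched first.

/*
 * Three pieces, hypothetical names.  Build and run roughly as:
 *   cc -shared -fPIC -o libjithook.so jithook_stub.c
 *   cc -shared -fPIC -o libjithook-perf.so jithook_perf.c
 *   cc -o app app.c -L. -ljithook -Wl,-rpath,'$ORIGIN'
 *   ./app                                  # weak stubs, no output
 *   LD_PRELOAD=./libjithook-perf.so ./app  # preloaded hooks take over
 */

/* ---- jithook_stub.c: empty weak stub the application links against ---- */
__attribute__((weak)) void jithook_code_emitted(void *addr, unsigned long size,
						const char *name)
{
}

/* ---- jithook_perf.c: implementation that would be LD_PRELOADed ---- */
#include <stdio.h>
void jithook_code_emitted(void *addr, unsigned long size, const char *name)
{
	fprintf(stderr, "jit code: %p %lu %s\n", addr, size, name);
}

/* ---- app.c: the program calls the hook unconditionally ---- */
void jithook_code_emitted(void *addr, unsigned long size, const char *name);

int main(void)
{
	static unsigned char code[64];

	jithook_code_emitted(code, sizeof(code), "compiled_frame");
	return 0;
}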

^ permalink raw reply	[flat|nested] 390+ messages in thread

* Re: [RFC] Unify KVM kernel-space and user-space code into a single project
  2010-03-23 14:06                                                                                                                       ` Joerg Roedel
@ 2010-03-23 16:39                                                                                                                         ` Avi Kivity
  2010-03-23 18:21                                                                                                                           ` Joerg Roedel
  0 siblings, 1 reply; 390+ messages in thread
From: Avi Kivity @ 2010-03-23 16:39 UTC (permalink / raw)
  To: Joerg Roedel
  Cc: Anthony Liguori, Ingo Molnar, Pekka Enberg, Zhang, Yanmin,
	Peter Zijlstra, Sheng Yang, linux-kernel, kvm, Marcelo Tosatti,
	Jes Sorensen, Gleb Natapov, Zachary Amsden, ziteng.huang,
	Arnaldo Carvalho de Melo, Frédéric Weisbecker, Gregory Haskins

On 03/23/2010 04:06 PM, Joerg Roedel wrote:
> On Mon, Mar 22, 2010 at 05:06:17PM -0500, Anthony Liguori wrote:
>    
>> There always needs to be a system wide entity.  There are two ways to
>> enumerate instances from that system wide entity.  You can centralize
>> the creation of instances and there by maintain an list of current
>> instances.  You can also allow instances to be created in a
>> decentralized manner and provide a standard mechanism for instances to
>> register themselves with the system wide entity.
>>      
> And this system wide entity is the kvm module. It creates instances of
> 'struct kvm' and destroys them. I see no problem if we just attach a
> name to every instance with a good default value like kvm0, kvm1 ... or
> guest0, guest1 ... User-space can override the name if it wants. The kvm
> module takes care about the names being unique.
>    

So, two users can't have a guest named MyGuest each?  What about 
namespace support?  There's a lot of work in virtualizing all kernel 
namespaces, and you're adding to that.  What about notifications when guests 
are added or removed?

> This is very much the same as network card numbering is implemented in
> the kernel.
> Forcing perf to talk to qemu or even libvirt produces to much overhead
> imho. Instrumentation only produces useful results with low overhead.
>
>    

It's a setup cost only.

-- 
error compiling committee.c: too many arguments to function


^ permalink raw reply	[flat|nested] 390+ messages in thread

* Re: [RFC] Unify KVM kernel-space and user-space code into a single project
  2010-03-23 16:39                                                                                                                         ` Avi Kivity
@ 2010-03-23 18:21                                                                                                                           ` Joerg Roedel
  2010-03-23 18:27                                                                                                                             ` Peter Zijlstra
                                                                                                                                               ` (3 more replies)
  0 siblings, 4 replies; 390+ messages in thread
From: Joerg Roedel @ 2010-03-23 18:21 UTC (permalink / raw)
  To: Avi Kivity
  Cc: Anthony Liguori, Ingo Molnar, Pekka Enberg, Zhang, Yanmin,
	Peter Zijlstra, Sheng Yang, linux-kernel, kvm, Marcelo Tosatti,
	Jes Sorensen, Gleb Natapov, Zachary Amsden, ziteng.huang,
	Arnaldo Carvalho de Melo, Frédéric Weisbecker, Gregory Haskins

On Tue, Mar 23, 2010 at 06:39:58PM +0200, Avi Kivity wrote:
> On 03/23/2010 04:06 PM, Joerg Roedel wrote:

>> And this system wide entity is the kvm module. It creates instances of
>> 'struct kvm' and destroys them. I see no problem if we just attach a
>> name to every instance with a good default value like kvm0, kvm1 ... or
>> guest0, guest1 ... User-space can override the name if it wants. The kvm
>> module takes care about the names being unique.
>>    
>
> So, two users can't have a guest named MyGuest each?  What about  
> namespace support?  There's a lot of work in virtualizing all kernel  
> namespaces, you're adding to that.

This enumeration is a very small and non-intrusive feature. Making it
aware of namespaces is easy too.

> What about notifications when guests  are added or removed?

Who would be the consumer of such notifications? A 'perf kvm list' can
live without them, I guess. If we need them later we can still add them.

>> This is very much the same as network card numbering is implemented in
>> the kernel.
>> Forcing perf to talk to qemu or even libvirt produces to much overhead
>> imho. Instrumentation only produces useful results with low overhead.
>>
>
> It's a setup cost only.

My statement was not limited to enumeration; I should have been clearer
about that. The guest filesystem access channel is another affected
part. The 'perf kvm top' command will access the guest filesystem
regularly, and going through qemu would add more overhead here.
Providing this in the KVM module directly also has the benefit that it
would work out-of-the-box with different userspaces too.  Or do we want
to limit 'perf kvm' to the libvirt-qemu-kvm software stack?

Sidenote: I really think we should come to a conclusion about the
          concept. KVM integration into perf is a very useful feature
	  for analyzing virtualization workloads.

Thanks,

	Joerg


^ permalink raw reply	[flat|nested] 390+ messages in thread

* Re: [RFC] Unify KVM kernel-space and user-space code into a single project
  2010-03-23 18:21                                                                                                                           ` Joerg Roedel
@ 2010-03-23 18:27                                                                                                                             ` Peter Zijlstra
  2010-03-23 19:05                                                                                                                               ` Javier Guerra Giraldez
                                                                                                                                               ` (2 subsequent siblings)
  3 siblings, 0 replies; 390+ messages in thread
From: Peter Zijlstra @ 2010-03-23 18:27 UTC (permalink / raw)
  To: Joerg Roedel
  Cc: Avi Kivity, Anthony Liguori, Ingo Molnar, Pekka Enberg, Zhang,
	Yanmin, Sheng Yang, linux-kernel, kvm, Marcelo Tosatti,
	Jes Sorensen, Gleb Natapov, Zachary Amsden, ziteng.huang,
	Arnaldo Carvalho de Melo, Frédéric Weisbecker, Gregory Haskins

On Tue, 2010-03-23 at 19:21 +0100, Joerg Roedel wrote:

> Sidenote: I really think we should come to a conclusion about the
>           concept. KVM integration into perf is very useful feature to
> 	  analyze virtualization workloads.

I always start my things with bare kvm. It would be very unwelcome to
mandate libvirt, or for that matter running a particular userspace in
the guest.

^ permalink raw reply	[flat|nested] 390+ messages in thread

* Re: [RFC] Unify KVM kernel-space and user-space code into a single  project
  2010-03-23 18:21                                                                                                                           ` Joerg Roedel
@ 2010-03-23 19:05                                                                                                                               ` Javier Guerra Giraldez
  2010-03-23 19:05                                                                                                                               ` Javier Guerra Giraldez
                                                                                                                                                 ` (2 subsequent siblings)
  3 siblings, 0 replies; 390+ messages in thread
From: Javier Guerra Giraldez @ 2010-03-23 19:05 UTC (permalink / raw)
  To: Joerg Roedel
  Cc: Avi Kivity, Anthony Liguori, Ingo Molnar, Pekka Enberg, Zhang,
	Yanmin, Peter Zijlstra, Sheng Yang, linux-kernel, kvm,
	Marcelo Tosatti, Jes Sorensen, Gleb Natapov, Zachary Amsden,
	ziteng.huang, Arnaldo Carvalho de Melo, Frédéric Weisbecker,
	Gregory Haskins

On Tue, Mar 23, 2010 at 2:21 PM, Joerg Roedel <joro@8bytes.org> wrote:
> On Tue, Mar 23, 2010 at 06:39:58PM +0200, Avi Kivity wrote:
>> So, two users can't have a guest named MyGuest each?  What about
>> namespace support?  There's a lot of work in virtualizing all kernel
>> namespaces, you're adding to that.
>
> This enumeration is a very small and non-intrusive feature. Making it
> aware of namespaces is easy too.

an outsider's comment: this path leads to a filesystem... which could
be a very nice idea.  it could have a directory for each VM, with
pseudo-files with all the guest's status, and even the memory it's
using.  perf could simply watch those files.  in fact, such a
filesystem could be the main userlevel/kernel interface.

but i'm sure such a layout was considered (and rejected) very early in
the KVM design.  i don't think there's anything new to make it more
desirable than it was back then.


-- 
Javier

^ permalink raw reply	[flat|nested] 390+ messages in thread

* Re: [PATCH] Enhance perf to collect KVM guest os statistics from host side
  2010-03-23 13:15           ` Arnaldo Carvalho de Melo
@ 2010-03-24  1:39             ` Zhang, Yanmin
  0 siblings, 0 replies; 390+ messages in thread
From: Zhang, Yanmin @ 2010-03-24  1:39 UTC (permalink / raw)
  To: Arnaldo Carvalho de Melo
  Cc: Ingo Molnar, Peter Zijlstra, Avi Kivity, Sheng Yang,
	linux-kernel, kvm, Marcelo Tosatti, oerg Roedel, Jes Sorensen,
	Gleb Natapov, Zachary Amsden, zhiteng.huang,
	Frédéric Weisbecker

On Tue, 2010-03-23 at 10:15 -0300, Arnaldo Carvalho de Melo wrote:
> Em Tue, Mar 23, 2010 at 11:14:41AM +0800, Zhang, Yanmin escreveu:
> > On Mon, 2010-03-22 at 13:44 -0300, Arnaldo Carvalho de Melo wrote:
> > > Em Mon, Mar 22, 2010 at 03:24:47PM +0800, Zhang, Yanmin escreveu:
> > > > On Fri, 2010-03-19 at 09:21 +0100, Ingo Molnar wrote:
> > > > Then, perf could access all files. It's possible because guest os instance
> > > > happens to be multi-threading in a process. One of the defects is the accessing to
> > > > guest os becomes slow or impossible when guest os is very busy.
> > > 
> > > If the MMAP events on the guest included a cookie that could later be
> > > used to query for the symtab of that DSO, we wouldn't need to access the
> > > guest FS at all, right?
> 
> > It depends on specific sub commands. As for 'perf kvm top', developers
> > want to see the profiling immediately. Even with 'perf kvm record',
> > developers also want to
> 
> That is not a problem, if you have the relevant buildids in your cache
> (Look in your machine at ~/.debug/), it will be as fast as ever.
> 
> If you use a distro that has its userspace with build-ids, you probably
> use it always without noticing :-)
> 
> > see results quickly. At least I'm eager for the results when
> > investigating a performance issue.
> 
> Sure thing.
>  
> > > With build-ids and debuginfo-install like tools the symbol
> > > resolution could be performed by using the cookies (build-ids) as
> > > keys to get to the *-debuginfo packages with matching symtabs (and
> > > DWARF for source annotation, etc).
> 
> > We can't make sure the guest os uses the same os images, or we don't know
> > where we could find the original DVD images that were used to install the
> > guest os.
> 
> You don't have to have guest and host sharing the same OS image, you
> just have to somehow populate your buildid cache with what you need, be
> it using sshfs or what Ingo is suggesting once, or using what your
> vendor provides (debuginfo packages). And you just have to do it once,
> for the relevant apps, to have it in your buildid cache.
>  
> > Current perf does save build ids, including both the kernel's and other
> > application libs/executables.
> 
> Yeah, I know, I implemented it. :-)
>  
> > > We have that for the kernel as:
> 
> > > [acme@doppio linux-2.6-tip]$ l /sys/kernel/notes 
> > > -r--r--r-- 1 root root 36 2010-03-22 13:14 /sys/kernel/notes
> > > [acme@doppio linux-2.6-tip]$ l /sys/module/ipv6/sections/.note.gnu.build-id 
> > > -r--r--r-- 1 root root 4096 2010-03-22 13:38 /sys/module/ipv6/sections/.note.gnu.build-id
> > > [acme@doppio linux-2.6-tip]$
> 
> > > That way we would cover DSOs being reinstalled in long running 'perf
> > > record' sessions too.
> 
> > That's one of the objectives of perf, to support long-running sessions.
> 
> But it doesn't fully support that right now; as I explained, build-ids are
> collected at the end of the record session, because we have to open the
> DSOs that had hits to get the 20-byte cookie we need, the build-id.
> 
> If we had it in the PERF_RECORD_MMAP record, we would close this race,
> and the added cost at load time should be minimal, to get the ELF
> section with it and put it somewhere in task struct.
Well, you are improving upon perfection.

> 
> If only we could coalesce it a bit to reclaim this:
> 
> [acme@doppio linux-2.6-tip]$ pahole -C task_struct ../build/v2.6.34-rc1-tip+/kernel/sched.o  | tail -5
> 	/* size: 5968, cachelines: 94, members: 150 */
> 	/* sum members: 5943, holes: 7, sum holes: 25 */
> 	/* bit holes: 1, sum bit holes: 28 bits */
> 	/* last cacheline: 16 bytes */
> };
> [acme@doppio linux-2.6-tip]$ 
That reminds me I listened to your presentation at OLS 2007. :)

> 
> 8-)
> 
> Or at least get just one of those 4-byte holes; then we could stick it
> at the end to get our build-id there, accessing it would be done only
> at PERF_RECORD_MMAP injection time, i.e. close to the time when we
> actually are loading the executable mmap, i.e. close to the time when
> the loader is injecting the build-id, I guess the extra memory and
> processing costs would be in the noise.
> 
> > > This was discussed some time ago but would require help from the bits
> > > that load DSOs.
> 
> > > build-ids then would be first class citizens.
> 
> - Arnaldo
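
The mmap-event idea above could look, very roughly, like the structure below.
This is only a sketch: the struct name and the extra field are invented for
illustration and are not part of the existing perf ABI, which today carries
only pid/tid, the address range and the filename.

#include <linux/perf_event.h>
#include <linux/types.h>

/*
 * Hypothetical variant of the mmap event that carries the 20-byte
 * build-id cookie with the mapping itself, so that symbol resolution
 * no longer has to open the DSOs at the end of the record session.
 */
struct build_id_mmap_event {
	struct perf_event_header header;
	__u32 pid, tid;
	__u64 start, len, pgoff;
	__u8  build_id[20];	/* copied from the DSO's .note.gnu.build-id
				 * at PERF_RECORD_MMAP injection time */
	char  filename[];
};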



^ permalink raw reply	[flat|nested] 390+ messages in thread

* Re: [RFC] Unify KVM kernel-space and user-space code into a single project
  2010-03-23 18:21                                                                                                                           ` Joerg Roedel
  2010-03-23 18:27                                                                                                                             ` Peter Zijlstra
  2010-03-23 19:05                                                                                                                               ` Javier Guerra Giraldez
@ 2010-03-24  4:57                                                                                                                             ` Avi Kivity
  2010-03-24 11:59                                                                                                                               ` Joerg Roedel
  2010-03-24  5:09                                                                                                                             ` Andi Kleen
  3 siblings, 1 reply; 390+ messages in thread
From: Avi Kivity @ 2010-03-24  4:57 UTC (permalink / raw)
  To: Joerg Roedel
  Cc: Anthony Liguori, Ingo Molnar, Pekka Enberg, Zhang, Yanmin,
	Peter Zijlstra, Sheng Yang, linux-kernel, kvm, Marcelo Tosatti,
	Jes Sorensen, Gleb Natapov, Zachary Amsden, ziteng.huang,
	Arnaldo Carvalho de Melo, Frédéric Weisbecker, Gregory Haskins

On 03/23/2010 08:21 PM, Joerg Roedel wrote:
> On Tue, Mar 23, 2010 at 06:39:58PM +0200, Avi Kivity wrote:
>    
>> On 03/23/2010 04:06 PM, Joerg Roedel wrote:
>>      
>    
>>> And this system wide entity is the kvm module. It creates instances of
>>> 'struct kvm' and destroys them. I see no problem if we just attach a
>>> name to every instance with a good default value like kvm0, kvm1 ... or
>>> guest0, guest1 ... User-space can override the name if it wants. The kvm
>>> module takes care about the names being unique.
>>>
>>>        
>> So, two users can't have a guest named MyGuest each?  What about
>> namespace support?  There's a lot of work in virtualizing all kernel
>> namespaces, you're adding to that.
>>      
> This enumeration is a very small and non-intrusive feature. Making it
> aware of namespaces is easy too.
>    

It's easier (and safer and all the other boring bits) not to do it at 
all in the kernel.

>> What about notifications when guests  are added or removed?
>>      
> Who would be the consumer of such notifications? A 'perf kvm list' can
> live without I guess. If we need them later we can still add them.
>    

System-wide monitoring needs to work equally well for guests started 
before or after the monitor.  Even disregarding that, if you introduce 
an API, people will start using it and complaining if it's incomplete.

The equivalent functionality for network interfaces is in netlink.

>>> This is very much the same as network card numbering is implemented in
>>> the kernel.
>>> Forcing perf to talk to qemu or even libvirt produces too much overhead
>>> imho. Instrumentation only produces useful results with low overhead.
>>>
>>>        
>> It's a setup cost only.
>>      
> My statement was not limited to enumeration, I should have been more
> clear about that. The guest filesystem access-channel is another
> affected part. The 'perf kvm top' command will access the guest
> filesystem regularly and going over qemu would be more overhead here.
>    

Why?  Also, the real cost would be accessing the filesystem, not copying 
data over qemu.

> Providing this in the KVM module directly also has the benefit that it
> would work out-of-the-box with different userspaces too.  Or do we want
> to limit 'perf kvm' to the libvirt-qemu-kvm software stack?
>    

Other userspaces can also provide this functionality, like they have to 
provide disk, network, and display emulation.  The kernel is not a huge 
library.

> Sidenote: I really think we should come to a conclusion about the
>            concept. KVM integration into perf is a very useful feature to
> 	  analyze virtualization workloads.
>
>    

Agreed.

-- 
Do not meddle in the internals of kernels, for they are subtle and quick to panic.


^ permalink raw reply	[flat|nested] 390+ messages in thread

* Re: [RFC] Unify KVM kernel-space and user-space code into a single project
  2010-03-23 18:21                                                                                                                           ` Joerg Roedel
                                                                                                                                               ` (2 preceding siblings ...)
  2010-03-24  4:57                                                                                                                             ` Avi Kivity
@ 2010-03-24  5:09                                                                                                                             ` Andi Kleen
  2010-03-24  6:42                                                                                                                               ` Avi Kivity
  3 siblings, 1 reply; 390+ messages in thread
From: Andi Kleen @ 2010-03-24  5:09 UTC (permalink / raw)
  To: Joerg Roedel
  Cc: Avi Kivity, Anthony Liguori, Ingo Molnar, Pekka Enberg, Zhang,
	Yanmin, Peter Zijlstra, Sheng Yang, linux-kernel, kvm,
	Marcelo Tosatti, Jes Sorensen, Gleb Natapov, Zachary Amsden,
	ziteng.huang, Arnaldo Carvalho de Melo, Frédéric Weisbecker,
	Gregory Haskins

Joerg Roedel <joro@8bytes.org> writes:
>
> Sidenote: I really think we should come to a conclusion about the
>           concept. KVM integration into perf is a very useful feature to
> 	  analyze virtualization workloads.

Agreed. I especially would like to see instruction/branch tracing
working this way.  This would give a lot of the benefits of a simulator on
a real CPU.

-Andi

-- 
ak@linux.intel.com -- Speaking for myself only.

^ permalink raw reply	[flat|nested] 390+ messages in thread

* Re: [RFC] Unify KVM kernel-space and user-space code into a single project
  2010-03-24  5:09                                                                                                                             ` Andi Kleen
@ 2010-03-24  6:42                                                                                                                               ` Avi Kivity
  2010-03-24  7:38                                                                                                                                 ` Andi Kleen
  0 siblings, 1 reply; 390+ messages in thread
From: Avi Kivity @ 2010-03-24  6:42 UTC (permalink / raw)
  To: Andi Kleen
  Cc: Joerg Roedel, Anthony Liguori, Ingo Molnar, Pekka Enberg, Zhang,
	Yanmin, Peter Zijlstra, Sheng Yang, linux-kernel, kvm,
	Marcelo Tosatti, Jes Sorensen, Gleb Natapov, Zachary Amsden,
	ziteng.huang, Arnaldo Carvalho de Melo, Frédéric Weisbecker,
	Gregory Haskins

On 03/24/2010 07:09 AM, Andi Kleen wrote:
> Joerg Roedel<joro@8bytes.org>  writes:
>    
>> Sidenote: I really think we should come to a conclusion about the
>>            concept. KVM integration into perf is a very useful feature to
>> 	  analyze virtualization workloads.
>>      
> Agreed. I especially would like to see instruction/branch tracing
> working this way.  This would give a lot of the benefits of a simulator on
> a real CPU.
>    

If you're profiling a single guest it makes more sense to do this from 
inside the guest - you can profile userspace as well as the kernel.

-- 
Do not meddle in the internals of kernels, for they are subtle and quick to panic.


^ permalink raw reply	[flat|nested] 390+ messages in thread

* Re: [RFC] Unify KVM kernel-space and user-space code into a single project
  2010-03-24  6:42                                                                                                                               ` Avi Kivity
@ 2010-03-24  7:38                                                                                                                                 ` Andi Kleen
  2010-03-24  8:59                                                                                                                                   ` Avi Kivity
  0 siblings, 1 reply; 390+ messages in thread
From: Andi Kleen @ 2010-03-24  7:38 UTC (permalink / raw)
  To: Avi Kivity
  Cc: Andi Kleen, Joerg Roedel, Anthony Liguori, Ingo Molnar,
	Pekka Enberg, Zhang, Yanmin, Peter Zijlstra, Sheng Yang,
	linux-kernel, kvm, Marcelo Tosatti, Jes Sorensen, Gleb Natapov,
	Zachary Amsden, ziteng.huang, Arnaldo Carvalho de Melo,
	Frédéric Weisbecker, Gregory Haskins

> If you're profiling a single guest it makes more sense to do this from 
> inside the guest - you can profile userspace as well as the kernel.

I'm interested in debugging the guest without guest cooperation.

In many cases qemu's new gdb stub works for that, but in some cases
I would prefer instruction/branch traces over standard gdb style
debugging.

I used to use that very successfully with simulators in the past
for some hard bugs.

-Andi
-- 
ak@linux.intel.com -- Speaking for myself only.

^ permalink raw reply	[flat|nested] 390+ messages in thread

* Re: [RFC] Unify KVM kernel-space and user-space code into a single project
  2010-03-24  7:38                                                                                                                                 ` Andi Kleen
@ 2010-03-24  8:59                                                                                                                                   ` Avi Kivity
  2010-03-24  9:31                                                                                                                                     ` Andi Kleen
  0 siblings, 1 reply; 390+ messages in thread
From: Avi Kivity @ 2010-03-24  8:59 UTC (permalink / raw)
  To: Andi Kleen
  Cc: Joerg Roedel, Anthony Liguori, Ingo Molnar, Pekka Enberg, Zhang,
	Yanmin, Peter Zijlstra, Sheng Yang, linux-kernel, kvm,
	Marcelo Tosatti, Jes Sorensen, Gleb Natapov, Zachary Amsden,
	ziteng.huang, Arnaldo Carvalho de Melo, Frédéric Weisbecker,
	Gregory Haskins

On 03/24/2010 09:38 AM, Andi Kleen wrote:
>> If you're profiling a single guest it makes more sense to do this from
>> inside the guest - you can profile userspace as well as the kernel.
>>      
> I'm interested in debugging the guest without guest cooperation.
>
> In many cases qemu's new gdb stub works for that, but in some cases
> I would prefer instruction/branch traces over standard gdb style
> debugging.
>    

Isn't gdb supposed to be able to use branch traces?  It makes sense to 
expose them via the gdb stub then.  Not to say an external tool doesn't 
make sense.


-- 
error compiling committee.c: too many arguments to function


^ permalink raw reply	[flat|nested] 390+ messages in thread

* Re: [RFC] Unify KVM kernel-space and user-space code into a single project
  2010-03-24  8:59                                                                                                                                   ` Avi Kivity
@ 2010-03-24  9:31                                                                                                                                     ` Andi Kleen
  0 siblings, 0 replies; 390+ messages in thread
From: Andi Kleen @ 2010-03-24  9:31 UTC (permalink / raw)
  To: Avi Kivity
  Cc: Joerg Roedel, Anthony Liguori, Ingo Molnar, Pekka Enberg, Zhang,
	Yanmin, Peter Zijlstra, Sheng Yang, linux-kernel, kvm,
	Marcelo Tosatti, Jes Sorensen, Gleb Natapov, Zachary Amsden,
	ziteng.huang, Arnaldo Carvalho de Melo, Fr?d?ric Weisbecker,
	Gregory Haskins

Avi Kivity <avi@redhat.com> writes:

> On 03/24/2010 09:38 AM, Andi Kleen wrote:
>>> If you're profiling a single guest it makes more sense to do this from
>>> inside the guest - you can profile userspace as well as the kernel.
>>>      
>> I'm interested in debugging the guest without guest cooperation.
>>
>> In many cases qemu's new gdb stub works for that, but in some cases
>> I would prefer instruction/branch traces over standard gdb style
>> debugging.
>>    
>
> Isn't gdb supposed to be able to use branch traces? 

AFAIK not. The ptrace interface is only used by idb I believe.
I might be wrong on that.

Not sure if there is even a remote protocol command for 
branch traces either.

There's a concept of "tracepoints" in the protocol, but it 
doesn't quite match that.

> It makes sense to
> expose them via the gdb stub then.  Not to say an external tool
> doesn't make sense.

Ok that would work for me too. As long as I can set start/stop
triggers and pipe the log somewhere it's fine for me.

-Andi

-- 
ak@linux.intel.com -- Speaking for myself only.

^ permalink raw reply	[flat|nested] 390+ messages in thread

* Re: [RFC] Unify KVM kernel-space and user-space code into a single project
  2010-03-24  4:57                                                                                                                             ` Avi Kivity
@ 2010-03-24 11:59                                                                                                                               ` Joerg Roedel
  2010-03-24 12:08                                                                                                                                 ` Avi Kivity
  0 siblings, 1 reply; 390+ messages in thread
From: Joerg Roedel @ 2010-03-24 11:59 UTC (permalink / raw)
  To: Avi Kivity
  Cc: Anthony Liguori, Ingo Molnar, Pekka Enberg, Zhang, Yanmin,
	Peter Zijlstra, Sheng Yang, linux-kernel, kvm, Marcelo Tosatti,
	Jes Sorensen, Gleb Natapov, ziteng.huang,
	Arnaldo Carvalho de Melo, Frédéric Weisbecker, Gregory Haskins

On Wed, Mar 24, 2010 at 06:57:47AM +0200, Avi Kivity wrote:
> On 03/23/2010 08:21 PM, Joerg Roedel wrote:
>> This enumeration is a very small and non-intrusive feature. Making it
>> aware of namespaces is easy too.
>>    
>
> It's easier (and safer and all the other boring bits) not to do it at  
> all in the kernel.

For the KVM stack it doesn't matter where it is implemented. It is as
easy in qemu or libvirt as in the kernel. I also don't see big risks. On
the perf side and for its users it is a lot easier to have this in the
kernel.
I for example always use plain qemu when running kvm guests and never
used libvirt. The only central entity I have here is the kvm kernel
modules. I don't want to start using it only to be able to use perf kvm.

>> Who would be the consumer of such notifications? A 'perf kvm list' can
>> live without I guess. If we need them later we can still add them.
>
> System-wide monitoring needs to work equally well for guests started  
> before or after the monitor.

Could be easily done using notifier chains already in the kernel.
Probably implemented with much less than 100 lines of additional code.
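
As a rough illustration of that point, such a notifier could sit on top of the
kernel's existing notifier-chain API, along the lines of the sketch below. The
chain name, the event codes and the suggested call sites are assumptions for
illustration, not existing KVM code.

#include <linux/notifier.h>
#include <linux/kvm_host.h>

#define KVM_GUEST_CREATED	1
#define KVM_GUEST_DESTROYED	2

static BLOCKING_NOTIFIER_HEAD(kvm_guest_notifier_list);

/* consumers (e.g. the perf side) register here to learn about guests
 * that come and go after monitoring has started */
int kvm_guest_notifier_register(struct notifier_block *nb)
{
	return blocking_notifier_chain_register(&kvm_guest_notifier_list, nb);
}

int kvm_guest_notifier_unregister(struct notifier_block *nb)
{
	return blocking_notifier_chain_unregister(&kvm_guest_notifier_list, nb);
}

/* would be called from kvm_create_vm()/kvm_destroy_vm() */
void kvm_guest_notify(unsigned long event, struct kvm *kvm)
{
	blocking_notifier_call_chain(&kvm_guest_notifier_list, event, kvm);
}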

> Even disregarding that, if you introduce  an API, people will start
> using it and complaining if it's incomplete.

There is nothing wrong with that. We only need to define what this API
should be used for to prevent rank growth. It could be an
instrumentation-only API for example.

>> My statement was not limited to enumeration, I should have been more
>> clear about that. The guest filesystem access-channel is another
>> affected part. The 'perf kvm top' command will access the guest
>> filesystem regularly and going over qemu would be more overhead here.
>>    
>
> Why?  Also, the real cost would be accessing the filesystem, not copying  
> data over qemu.

When measuring cache-misses, any additional (and in this case
unnecessary) copy overhead makes the results less accurate.

>> Providing this in the KVM module directly also has the benefit that it
>> would work out-of-the-box with different userspaces too.  Or do we want
>> to limit 'perf kvm' to the libvirt-qemu-kvm software stack?
>
> Other userspaces can also provide this functionality, like they have to  
> provide disk, network, and display emulation.  The kernel is not a huge  
> library.

This has nothing to do with a library. It is about entity and resource
management which is what os kernels are about. The virtual machine is
the entity (similar to a process) and we want to add additional access
channels and names to it.

        Joerg


^ permalink raw reply	[flat|nested] 390+ messages in thread

* Re: [RFC] Unify KVM kernel-space and user-space code into a single project
  2010-03-22  7:13                                                                                       ` Avi Kivity
  2010-03-22 11:14                                                                                         ` Ingo Molnar
@ 2010-03-24 12:06                                                                                         ` Paolo Bonzini
  1 sibling, 0 replies; 390+ messages in thread
From: Paolo Bonzini @ 2010-03-24 12:06 UTC (permalink / raw)
  To: Avi Kivity
  Cc: Ingo Molnar, Pekka Enberg, Anthony Liguori, Zhang, Yanmin,
	Peter Zijlstra, Sheng Yang, linux-kernel, kvm, Marcelo Tosatti,
	Joerg Roedel, Jes Sorensen, Gleb Natapov, Zachary Amsden,
	ziteng.huang, Arnaldo Carvalho de Melo, Frédéric Weisbecker

On 03/22/2010 08:13 AM, Avi Kivity wrote:
>
> (btw, why are you interested in desktop-on-desktop?  one use case is
> developers, which don't really need fancy GUIs; a second is people who
> test out distributions, but that doesn't seem to be a huge population;
> and a third is people running Windows for some application that doesn't
> run on Linux - hopefully a small category as well.

This third category is pretty well served by virt-manager.  It has its 
quirks and shortcomings, but at least it exists.

Paolo

^ permalink raw reply	[flat|nested] 390+ messages in thread

* Re: [RFC] Unify KVM kernel-space and user-space code into a single project
  2010-03-24 11:59                                                                                                                               ` Joerg Roedel
@ 2010-03-24 12:08                                                                                                                                 ` Avi Kivity
  2010-03-24 12:50                                                                                                                                   ` Joerg Roedel
  0 siblings, 1 reply; 390+ messages in thread
From: Avi Kivity @ 2010-03-24 12:08 UTC (permalink / raw)
  To: Joerg Roedel
  Cc: Anthony Liguori, Ingo Molnar, Pekka Enberg, Zhang, Yanmin,
	Peter Zijlstra, Sheng Yang, linux-kernel, kvm, Marcelo Tosatti,
	Jes Sorensen, Gleb Natapov, ziteng.huang,
	Arnaldo Carvalho de Melo, Frédéric Weisbecker, Gregory Haskins

On 03/24/2010 01:59 PM, Joerg Roedel wrote:
> On Wed, Mar 24, 2010 at 06:57:47AM +0200, Avi Kivity wrote:
>    
>> On 03/23/2010 08:21 PM, Joerg Roedel wrote:
>>      
>>> This enumeration is a very small and non-intrusive feature. Making it
>>> aware of namespaces is easy too.
>>>
>>>        
>> It's easier (and safer and all the other boring bits) not to do it at
>> all in the kernel.
>>      
> For the KVM stack it doesn't matter where it is implemented. It is as
> easy in qemu or libvirt as in the kernel. I also don't see big risks. On
> the perf side and for its users it is a lot easier to have this in the
> kernel.
> I for example always use plain qemu when running kvm guests and never
> used libvirt. The only central entity I have here is the kvm kernel
> modules. I don't want to start using it only to be able to use perf kvm.
>    

You can always provide the kernel and module paths as command line 
parameters.  It just won't be transparently usable, but if you're using 
qemu from the command line, presumably you can live with that.

>>> Who would be the consumer of such notifications? A 'perf kvm list' can
>>> live without I guess. If we need them later we can still add them.
>>>        
>> System-wide monitoring needs to work equally well for guests started
>> before or after the monitor.
>>      
> Could be easily done using notifier chains already in the kernel.
> Probably implemented with much less than 100 lines of additional code.
>    

And a userspace interface for that.

>> Even disregarding that, if you introduce  an API, people will start
>> using it and complaining if it's incomplete.
>>      
> There is nothing wrong with that. We only need to define what this API
> should be used for to prevent rank growth. It could be an
> instrumentation-only API for example.
>    

If we make an API, I'd like it to be generally useful.

It's a total headache.  For example, we'd need security module hooks to 
determine access permissions.  So far we managed to avoid that since kvm 
doesn't allow you to access any information beyond what you provided it 
directly.


>>> My statement was not limited to enumeration, I should have been more
>>> clear about that. The guest filesystem access-channel is another
>>> affected part. The 'perf kvm top' command will access the guest
>>> filesystem regularly and going over qemu would be more overhead here.
>>>
>>>        
>> Why?  Also, the real cost would be accessing the filesystem, not copying
>> data over qemu.
>>      
> When measuring cache-misses, any additional (and in this case
> unnecessary) copy overhead makes the results less accurate.
>    

Copying the objects is a one time cost.  If you run perf for more than a 
second or two, it would fetch and cache all of the data.  It's really 
the same problem with non-guest profiling, only magnified a bit.

>>> Providing this in the KVM module directly also has the benefit that it
>>> would work out-of-the-box with different userspaces too.  Or do we want
>>> to limit 'perf kvm' to the libvirt-qemu-kvm software stack?
>>>        
>> Other userspaces can also provide this functionality, like they have to
>> provide disk, network, and display emulation.  The kernel is not a huge
>> library.
>>      
> This has nothing to do with a library. It is about entity and resource
> management which is what os kernels are about. The virtual machine is
> the entity (similar to a process) and we want to add additional access
> channels and names to it.
>    

kvm.ko has only a small subset of the information that is used to define 
a guest.

-- 
error compiling committee.c: too many arguments to function


^ permalink raw reply	[flat|nested] 390+ messages in thread

* Re: [RFC] Unify KVM kernel-space and user-space code into a single project
  2010-03-24 12:08                                                                                                                                 ` Avi Kivity
@ 2010-03-24 12:50                                                                                                                                   ` Joerg Roedel
  2010-03-24 13:05                                                                                                                                     ` Avi Kivity
  0 siblings, 1 reply; 390+ messages in thread
From: Joerg Roedel @ 2010-03-24 12:50 UTC (permalink / raw)
  To: Avi Kivity
  Cc: Anthony Liguori, Ingo Molnar, Pekka Enberg, Zhang, Yanmin,
	Peter Zijlstra, Sheng Yang, linux-kernel, kvm, Marcelo Tosatti,
	Jes Sorensen, Gleb Natapov, ziteng.huang,
	Arnaldo Carvalho de Melo, Frédéric Weisbecker, Gregory Haskins

On Wed, Mar 24, 2010 at 02:08:17PM +0200, Avi Kivity wrote:
> On 03/24/2010 01:59 PM, Joerg Roedel wrote:

> You can always provide the kernel and module paths as command line  
> parameters.  It just won't be transparently usable, but if you're using  
> qemu from the command line, presumably you can live with that.

I don't want the tool for myself only. A typical perf user expects that
it works transparently.

>> Could be easily done using notifier chains already in the kernel.
>> Probably implemented with much less than 100 lines of additional code.
>
> And a userspace interface for that.

Not necessarily. The perf event is configured to measure systemwide kvm
by userspace. The kernel side of perf takes care that it stays
system-wide even with added vm instances. So in this case the consumer
for the notifier would be the perf kernel part. No userspace interface
required.

> If we make an API, I'd like it to be generally useful.

That's hard to do at this point since we don't know what people will use
it for. We should keep it simple in the beginning and add new features
as they are requested and make sense in this context.

> It's a total headache.  For example, we'd need security module hooks to  
> determine access permissions.  So far we managed to avoid that since kvm  
> doesn't allow you to access any information beyond what you provided it  
> directly.

Depends on how it is designed. A filesystem approach was already
mentioned. We could create /sys/kvm/ for example to expose information
about virtual machines to userspace. This would not require any new
security hooks.

> Copying the objects is a one time cost.  If you run perf for more than a  
> second or two, it would fetch and cache all of the data.  It's really  
> the same problem with non-guest profiling, only magnified a bit.

I don't think we can cache filesystem data of a running guest on the
host. It is too hard to keep such a cache coherent.

>>> Other userspaces can also provide this functionality, like they have to
>>> provide disk, network, and display emulation.  The kernel is not a huge
>>> library.

If two userspaces run in parallel what is the single instance where perf
can get a list of guests from?

> kvm.ko has only a small subset of the information that is used to define  
> a guest.

The subset is not small. It contains all guest vcpus and the complete
interrupt routing hardware emulation, and it manages even the guest's
memory.

	Joerg


^ permalink raw reply	[flat|nested] 390+ messages in thread

* Re: [RFC] Unify KVM kernel-space and user-space code into a single project
  2010-03-24 12:50                                                                                                                                   ` Joerg Roedel
@ 2010-03-24 13:05                                                                                                                                     ` Avi Kivity
  2010-03-24 13:46                                                                                                                                       ` Joerg Roedel
  2010-03-24 13:53                                                                                                                                       ` Alexander Graf
  0 siblings, 2 replies; 390+ messages in thread
From: Avi Kivity @ 2010-03-24 13:05 UTC (permalink / raw)
  To: Joerg Roedel
  Cc: Anthony Liguori, Ingo Molnar, Pekka Enberg, Zhang, Yanmin,
	Peter Zijlstra, Sheng Yang, linux-kernel, kvm, Marcelo Tosatti,
	Jes Sorensen, Gleb Natapov, ziteng.huang,
	Arnaldo Carvalho de Melo, Frédéric Weisbecker, Gregory Haskins

On 03/24/2010 02:50 PM, Joerg Roedel wrote:
>
>> You can always provide the kernel and module paths as command line
>> parameters.  It just won't be transparently usable, but if you're using
>> qemu from the command line, presumably you can live with that.
>>      
> I don't want the tool for myself only. A typical perf user expects that
> it works transparently.
>    

A typical kvm user uses libvirt, so we can integrate it with that.

>>> Could be easily done using notifier chains already in the kernel.
>>> Probably implemented with much less than 100 lines of additional code.
>>>        
>> And a userspace interface for that.
>>      
> Not necessarily. The perf event is configured to measure systemwide kvm
> by userspace. The kernel side of perf takes care that it stays
> system-wide even with added vm instances. So in this case the consumer
> for the notifier would be the perf kernel part. No userspace interface
> required.
>    

Someone needs to know about the new guest to fetch its symbols.  Or do 
you want that part in the kernel too?

>> If we make an API, I'd like it to be generally useful.
>>      
> That's hard to do at this point since we don't know what people will use
> it for. We should keep it simple in the beginning and add new features
> as they are requested and make sense in this context.
>    

IMO this use case is too rare to warrant its own API, especially as there 
are alternatives.

>> It's a total headache.  For example, we'd need security module hooks to
>> determine access permissions.  So far we managed to avoid that since kvm
>> doesn't allow you to access any information beyond what you provided it
>> directly.
>>      
> Depends on how it is designed. A filesystem approach was already
> mentioned. We could create /sys/kvm/ for example to expose information
> about virtual machines to userspace. This would not require any new
> security hooks.
>    

Who would set the security context on those files?  Plus, we need cgroup 
support so you can't see one container's guests from an unrelated container.

>> Copying the objects is a one time cost.  If you run perf for more than a
>> second or two, it would fetch and cache all of the data.  It's really
>> the same problem with non-guest profiling, only magnified a bit.
>>      
> I don't think we can cache filesystem data of a running guest on the
> host. It is too hard to keep such a cache coherent.
>    

I don't see any choice.  The guest can change its symbols at any time 
(say by kexec), without any notification.

>>>> Other userspaces can also provide this functionality, like they have to
>>>> provide disk, network, and display emulation.  The kernel is not a huge
>>>> library.
>>>>          
> If two userspaces run in parallel what is the single instance where perf
> can get a list of guests from?
>    

I don't know.  Surely that's solvable though.

>> kvm.ko has only a small subset of the information that is used to define
>> a guest.
>>      
> The subset is not small. It contains all guest vcpus and the complete
> interrupt routing hardware emulation, and it manages even the guest's
> memory.
>    

It doesn't contain most of the mmio and pio address space.  Integration 
with qemu would allow perf to tell us that the guest is hitting the 
interrupt status register of a virtio-blk device in pci slot 5 (the 
information is already available through the kvm_mmio trace event, but 
only qemu can decode it).

-- 
error compiling committee.c: too many arguments to function


^ permalink raw reply	[flat|nested] 390+ messages in thread

* Re: [RFC] Unify KVM kernel-space and user-space code into a single project
  2010-03-24 13:05                                                                                                                                     ` Avi Kivity
@ 2010-03-24 13:46                                                                                                                                       ` Joerg Roedel
  2010-03-24 13:57                                                                                                                                         ` Avi Kivity
  2010-03-24 13:53                                                                                                                                       ` Alexander Graf
  1 sibling, 1 reply; 390+ messages in thread
From: Joerg Roedel @ 2010-03-24 13:46 UTC (permalink / raw)
  To: Avi Kivity
  Cc: Anthony Liguori, Ingo Molnar, Pekka Enberg, Zhang, Yanmin,
	Peter Zijlstra, Sheng Yang, linux-kernel, kvm, Marcelo Tosatti,
	Jes Sorensen, Gleb Natapov, ziteng.huang,
	Arnaldo Carvalho de Melo, Frédéric Weisbecker, Gregory Haskins

On Wed, Mar 24, 2010 at 03:05:02PM +0200, Avi Kivity wrote:
> On 03/24/2010 02:50 PM, Joerg Roedel wrote:

>> I don't want the tool for myself only. A typical perf user expects that
>> it works transparently.
>
> A typical kvm user uses libvirt, so we can integrate it with that.

Someone who uses libvirt and virt-manager by default is probably not
interested in this feature at the same level a kvm developer is. And
developers tend not to use libvirt for low-level kvm development.  A
number of developers have stated in this thread already that they would
appreciate a solution for guest enumeration that would not involve
libvirt.

> Someone needs to know about the new guest to fetch its symbols.  Or do  
> you want that part in the kernel too?

The samples will be tagged with the guest-name (and some additional
information perf needs). Perf userspace can access the symbols then
through /sys/kvm/guest0/fs/...

>> Depends on how it is designed. A filesystem approach was already
>> mentioned. We could create /sys/kvm/ for example to expose information
>> about virtual machines to userspace. This would not require any new
>> security hooks.
>
> Who would set the security context on those files?

An approach like: "The files are owned and only readable by the same
user that started the vm." might be a good start. So a user can measure
its own guests and root can measure all of them.

> Plus, we need cgroup  support so you can't see one container's guests
> from an unrelated container.

cgroup support is an issue but we can solve that too. It's in general
still less complex than going through the whole libvirt-qemu-kvm stack.

> Integration with qemu would allow perf to tell us that the guest is
> hitting the interrupt status register of a virtio-blk device in pci
> slot 5 (the information is already available through the kvm_mmio
> trace event, but  only qemu can decode it).

Yeah that would be interesting information. But it is more related to
tracing than to pmu measurements.
The information which you mentioned above is probably better
captured by an extension of trace-events to userspace.

	Joerg


^ permalink raw reply	[flat|nested] 390+ messages in thread

* Re: [RFC] Unify KVM kernel-space and user-space code into a single project
  2010-03-24 13:05                                                                                                                                     ` Avi Kivity
  2010-03-24 13:46                                                                                                                                       ` Joerg Roedel
@ 2010-03-24 13:53                                                                                                                                       ` Alexander Graf
  2010-03-24 13:59                                                                                                                                         ` Avi Kivity
  1 sibling, 1 reply; 390+ messages in thread
From: Alexander Graf @ 2010-03-24 13:53 UTC (permalink / raw)
  To: Avi Kivity
  Cc: Joerg Roedel, Anthony Liguori, Ingo Molnar, Pekka Enberg, Zhang,
	Yanmin, Peter Zijlstra, Sheng Yang, linux-kernel, kvm,
	Marcelo Tosatti, Jes Sorensen, Gleb Natapov, ziteng.huang,
	Arnaldo Carvalho de Melo, Frédéric Weisbecker, Gregory Haskins

Avi Kivity wrote:
> On 03/24/2010 02:50 PM, Joerg Roedel wrote:
>>
>>> You can always provide the kernel and module paths as command line
>>> parameters.  It just won't be transparently usable, but if you're using
>>> qemu from the command line, presumably you can live with that.
>>>      
>> I don't want the tool for myself only. A typical perf user expects that
>> it works transparently.
>>    
>
> A typical kvm user uses libvirt, so we can integrate it with that.
>
>>>> Could be easily done using notifier chains already in the kernel.
>>>> Probably implemented with much less than 100 lines of additional code.
>>>>        
>>> And a userspace interface for that.
>>>      
>> Not necessarily. The perf event is configured to measure systemwide kvm
>> by userspace. The kernel side of perf takes care that it stays
>> system-wide even with added vm instances. So in this case the consumer
>> for the notifier would be the perf kernel part. No userspace interface
>> required.
>>    
>
> Someone needs to know about the new guest to fetch its symbols.  Or do
> you want that part in the kernel too?


How about we add a virtio "guest file system access" device? The guest
would then expose its own file system using that device.

On the host side this would simply be a -virtioguestfs
unix:/tmp/guest.fs and you'd get a unix socket that gives you full
access to the guest file system by using commands. I envision something
like:

SEND: GET /proc/version
RECV: Linux version 2.6.27.37-0.1-default (geeko@buildhost) (gcc version
4.3.2 [gcc-4_3-branch revision 141291] (SUSE Linux) ) #1 SMP 2009-10-15
14:56:58 +0200

Now all we need is integration in perf to enumerate virtual machines
based on libvirt. If you want to run qemu-kvm directly, just go with
--guestfs=/tmp/guest.fs and perf could fetch all required information
automatically.

This should solve all issues while staying 100% in user space, right?


Alex
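
To make that a bit more concrete, the guest side of such a device could be
little more than the sketch below. The port name, the GET-line protocol and
the one-request-per-read framing are all invented for illustration, and error
handling is deliberately minimal.

#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

int main(void)
{
	/* the port name must match the virtserialport the host configures */
	int port = open("/dev/virtio-ports/org.qemu.guestfs", O_RDWR);
	char req[4096], buf[65536];
	ssize_t n, m;

	if (port < 0)
		return 1;

	/* assumes each read() delivers exactly one "GET <path>" request */
	while ((n = read(port, req, sizeof(req) - 1)) > 0) {
		char path[4096];
		int fd;

		req[n] = '\0';
		if (sscanf(req, "GET %4095s", path) != 1)
			continue;

		fd = open(path, O_RDONLY);
		if (fd < 0) {
			write(port, "ERR\n", 4);
			continue;
		}
		while ((m = read(fd, buf, sizeof(buf))) > 0)
			write(port, buf, m);
		close(fd);
	}
	return 0;
}

The host side (perf or a small helper) would connect to the chardev socket
qemu exposes for that port and speak the same trivial protocol.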


^ permalink raw reply	[flat|nested] 390+ messages in thread

* Re: [RFC] Unify KVM kernel-space and user-space code into a single project
  2010-03-24 13:46                                                                                                                                       ` Joerg Roedel
@ 2010-03-24 13:57                                                                                                                                         ` Avi Kivity
  2010-03-24 15:01                                                                                                                                           ` Joerg Roedel
  0 siblings, 1 reply; 390+ messages in thread
From: Avi Kivity @ 2010-03-24 13:57 UTC (permalink / raw)
  To: Joerg Roedel
  Cc: Anthony Liguori, Ingo Molnar, Pekka Enberg, Zhang, Yanmin,
	Peter Zijlstra, Sheng Yang, linux-kernel, kvm, Marcelo Tosatti,
	Jes Sorensen, Gleb Natapov, ziteng.huang,
	Arnaldo Carvalho de Melo, Frédéric Weisbecker, Gregory Haskins

On 03/24/2010 03:46 PM, Joerg Roedel wrote:
> On Wed, Mar 24, 2010 at 03:05:02PM +0200, Avi Kivity wrote:
>    
>> On 03/24/2010 02:50 PM, Joerg Roedel wrote:
>>      
>    
>>> I don't want the tool for myself only. A typical perf user expects that
>>> it works transparently.
>>>        
>> A typical kvm user uses libvirt, so we can integrate it with that.
>>      
> Someone who uses libvirt and virt-manager by default is probably not
> interested in this feature at the same level a kvm developer is. And
> developers tend not to use libvirt for low-level kvm development.  A
> number of developers have stated in this thread already that they would
> appreciate a solution for guest enumeration that would not involve
> libvirt.
>    

So would I.  But when I weigh the benefit of truly transparent 
system-wide perf integration for users who don't use libvirt but do use 
perf, versus the cost of transforming kvm from a single-process API to a 
system-wide API with all the complications that I've listed, it comes 
out in favour of not adding the API.

Those few users can probably script something to cover their needs.

>> Someone needs to know about the new guest to fetch its symbols.  Or do
>> you want that part in the kernel too?
>>      
> The samples will be tagged with the guest-name (and some additional
> information perf needs). Perf userspace can access the symbols then
> through /sys/kvm/guest0/fs/...
>    

I take that as a yes?  So we need a virtio-serial client in the kernel 
(which might be exploitable by a malicious guest if buggy) and a 
fs-over-virtio-serial client in the kernel (also exploitable).

>>> Depends on how it is designed. A filesystem approach was already
>>> mentioned. We could create /sys/kvm/ for example to expose information
>>> about virtual machines to userspace. This would not require any new
>>> security hooks.
>>>        
>> Who would set the security context on those files?
>>      
> An approach like: "The files are owned and only readable by the same
> user that started the vm." might be a good start. So a user can measure
> its own guests and root can measure all of them.
>    

That's not how sVirt works.  sVirt isolates a user's VMs from each 
other, so if a guest breaks into qemu it can't break into other guests 
owned by the same user.

The users who need this API (!libvirt and perf) probably don't care 
about sVirt, but a new API must not break it.

>> Plus, we need cgroup  support so you can't see one container's guests
>> from an unrelated container.
>>      
> cgroup support is an issue but we can solve that too. It's in general
> still less complex than going through the whole libvirt-qemu-kvm stack.
>    

It's a tradeoff.  IMO, going through qemu is the better way, and also 
provides more information.

>> Integration with qemu would allow perf to tell us that the guest is
>> hitting the interrupt status register of a virtio-blk device in pci
>> slot 5 (the information is already available through the kvm_mmio
>> trace event, but  only qemu can decode it).
>>      
> Yeah that would be interesting information. But it is more related to
> tracing than to pmu measurements.
> The information which you mentioned above are probably better
> captured by an extension of trace-events to userspace.
>    

It's all related.  You start with perf, see a problem with mmio, call up 
a histogram of mmio or interrupts or whatever, then zoom in on the 
misbehaving device.

-- 
error compiling committee.c: too many arguments to function


^ permalink raw reply	[flat|nested] 390+ messages in thread

* Re: [RFC] Unify KVM kernel-space and user-space code into a single project
  2010-03-24 13:53                                                                                                                                       ` Alexander Graf
@ 2010-03-24 13:59                                                                                                                                         ` Avi Kivity
  2010-03-24 14:24                                                                                                                                           ` Alexander Graf
  0 siblings, 1 reply; 390+ messages in thread
From: Avi Kivity @ 2010-03-24 13:59 UTC (permalink / raw)
  To: Alexander Graf
  Cc: Joerg Roedel, Anthony Liguori, Ingo Molnar, Pekka Enberg, Zhang,
	Yanmin, Peter Zijlstra, Sheng Yang, linux-kernel, kvm,
	Marcelo Tosatti, Jes Sorensen, Gleb Natapov, ziteng.huang,
	Arnaldo Carvalho de Melo, Frédéric Weisbecker, Gregory Haskins

On 03/24/2010 03:53 PM, Alexander Graf wrote:
>
>> Someone needs to know about the new guest to fetch its symbols.  Or do
>> you want that part in the kernel too?
>>      
>
> How about we add a virtio "guest file system access" device? The guest
> would then expose its own file system using that device.
>
> On the host side this would simply be a -virtioguestfs
> unix:/tmp/guest.fs and you'd get a unix socket that gives you full
> access to the guest file system by using commands. I envision something
> like:
>    

The idea is to use a dedicated channel over virtio-serial.  If the 
channel is present the file server can serve files over it.

> SEND: GET /proc/version
> RECV: Linux version 2.6.27.37-0.1-default (geeko@buildhost) (gcc version
> 4.3.2 [gcc-4_3-branch revision 141291] (SUSE Linux) ) #1 SMP 2009-10-15
> 14:56:58 +0200
>
> Now all we need is integration in perf to enumerate virtual machines
> based on libvirt. If you want to run qemu-kvm directly, just go with
> --guestfs=/tmp/guest.fs and perf could fetch all required information
> automatically.
>
> This should solve all issues while staying 100% in user space, right?
>    

Yeah, needs a fuse filesystem to populate the host namespace (kind of 
sshfs over virtio-serial).
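
A bare-bones sketch of that shim against the FUSE 2.x API could look like the
code below; guestfs_fetch() stands in for whatever actually talks to the guest
over virtio-serial and is purely hypothetical, as are the made-up attributes
returned by getattr.

#define FUSE_USE_VERSION 26
#include <fuse.h>
#include <errno.h>
#include <fcntl.h>
#include <string.h>
#include <sys/stat.h>

/* stub: real code would forward the request over the guest channel */
static int guestfs_fetch(const char *path, char *buf, size_t size, off_t off)
{
	return -EIO;
}

static int guestfs_getattr(const char *path, struct stat *st)
{
	memset(st, 0, sizeof(*st));
	if (strcmp(path, "/") == 0) {
		st->st_mode = S_IFDIR | 0500;
		st->st_nlink = 2;
	} else {
		/* pretend every path is a read-only regular file */
		st->st_mode = S_IFREG | 0400;
		st->st_nlink = 1;
		st->st_size = 1 << 20;
	}
	return 0;
}

static int guestfs_open(const char *path, struct fuse_file_info *fi)
{
	return (fi->flags & O_ACCMODE) == O_RDONLY ? 0 : -EACCES;
}

static int guestfs_read(const char *path, char *buf, size_t size,
			off_t off, struct fuse_file_info *fi)
{
	return guestfs_fetch(path, buf, size, off);
}

static struct fuse_operations guestfs_ops = {
	.getattr = guestfs_getattr,
	.open    = guestfs_open,
	.read    = guestfs_read,
};

int main(int argc, char *argv[])
{
	return fuse_main(argc, argv, &guestfs_ops, NULL);
}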

-- 
error compiling committee.c: too many arguments to function


^ permalink raw reply	[flat|nested] 390+ messages in thread

* Re: [RFC] Unify KVM kernel-space and user-space code into a single project
  2010-03-24 13:59                                                                                                                                         ` Avi Kivity
@ 2010-03-24 14:24                                                                                                                                           ` Alexander Graf
  2010-03-24 15:06                                                                                                                                             ` Avi Kivity
  0 siblings, 1 reply; 390+ messages in thread
From: Alexander Graf @ 2010-03-24 14:24 UTC (permalink / raw)
  To: Avi Kivity
  Cc: Joerg Roedel, Anthony Liguori, Ingo Molnar, Pekka Enberg, Zhang,
	Yanmin, Peter Zijlstra, Sheng Yang, linux-kernel, kvm,
	Marcelo Tosatti, Jes Sorensen, Gleb Natapov, ziteng.huang,
	Arnaldo Carvalho de Melo, Frédéric Weisbecker, Gregory Haskins

Avi Kivity wrote:
> On 03/24/2010 03:53 PM, Alexander Graf wrote:
>>
>>> Someone needs to know about the new guest to fetch its symbols.  Or do
>>> you want that part in the kernel too?
>>>      
>>
>> How about we add a virtio "guest file system access" device? The guest
>> would then expose its own file system using that device.
>>
>> On the host side this would simply be a -virtioguestfs
>> unix:/tmp/guest.fs and you'd get a unix socket that gives you full
>> access to the guest file system by using commands. I envision something
>> like:
>>    
>
> The idea is to use a dedicated channel over virtio-serial.  If the
> channel is present the file server can serve files over it.

The file server being a kernel module inside the guest? We want to be
able to serve things as early and hassle free as possible, so in this
case I agree with Ingo that a kernel module is superior.

>
>> SEND: GET /proc/version
>> RECV: Linux version 2.6.27.37-0.1-default (geeko@buildhost) (gcc version
>> 4.3.2 [gcc-4_3-branch revision 141291] (SUSE Linux) ) #1 SMP 2009-10-15
>> 14:56:58 +0200
>>
>> Now all we need is integration in perf to enumerate virtual machines
>> based on libvirt. If you want to run qemu-kvm directly, just go with
>> --guestfs=/tmp/guest.fs and perf could fetch all required information
>> automatically.
>>
>> This should solve all issues while staying 100% in user space, right?
>>    
>
> Yeah, needs a fuse filesystem to populate the host namespace (kind of
> sshfs over virtio-serial).

I don't see why we need a fuse filesystem. We can of course create one
later on. But for now all you need is a user connecting to that socket.


Alex



^ permalink raw reply	[flat|nested] 390+ messages in thread

* Re: [RFC] Unify KVM kernel-space and user-space code into a single project
  2010-03-24 13:57                                                                                                                                         ` Avi Kivity
@ 2010-03-24 15:01                                                                                                                                           ` Joerg Roedel
  2010-03-24 15:12                                                                                                                                             ` Avi Kivity
                                                                                                                                                               ` (2 more replies)
  0 siblings, 3 replies; 390+ messages in thread
From: Joerg Roedel @ 2010-03-24 15:01 UTC (permalink / raw)
  To: Avi Kivity
  Cc: Anthony Liguori, Ingo Molnar, Pekka Enberg, Zhang, Yanmin,
	Peter Zijlstra, Sheng Yang, linux-kernel, kvm, Marcelo Tosatti,
	Jes Sorensen, Gleb Natapov, ziteng.huang,
	Arnaldo Carvalho de Melo, Frédéric Weisbecker, Gregory Haskins

On Wed, Mar 24, 2010 at 03:57:39PM +0200, Avi Kivity wrote:
> On 03/24/2010 03:46 PM, Joerg Roedel wrote:

>> Someone who uses libvirt and virt-manager by default is probably not
>> interested in this feature at the same level a kvm developer is. And
>> developers tend not to use libvirt for low-level kvm development.  A
>> number of developers have stated in this thread already that they would
>> appreciate a solution for guest enumeration that would not involve
>> libvirt.
>
> So would I.

Great.

> But when I weigh the benefit of truly transparent  system-wide perf
> integration for users who don't use libvirt but do use  perf, versus
> the cost of transforming kvm from a single-process API to a
> system-wide API with all the complications that I've listed, it comes
> out in favour of not adding the API.

It's not a transformation, it's an extension. The current per-process
/dev/kvm stays mostly untouched. It's all about having something like
this:

$ cd /sys/kvm/guest0
$ ls -l
-r-------- 1 root root 0 2009-08-17 12:05 name
dr-x------ 1 root root 0 2009-08-17 12:05 fs
$ cat name
guest0
$ # ...

The fs/ directory is used as the mount point for the guest root fs.
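
For illustration, the kernel side of such a directory would mostly be plumbing
around the existing kobject/sysfs API, along the lines of the sketch below.
The helper name, the 'name' attribute and the call site are assumptions, and
for simplicity the sketch hangs the tree off /sys/kernel/kvm/ rather than a
new top-level /sys/kvm/.

#include <linux/kernel.h>
#include <linux/kobject.h>
#include <linux/sysfs.h>

static struct kobject *kvm_root_kobj;

static ssize_t name_show(struct kobject *kobj, struct kobj_attribute *attr,
			 char *buf)
{
	/* the directory name doubles as the guest name in this sketch */
	return sprintf(buf, "%s\n", kobject_name(kobj));
}

static struct kobj_attribute name_attr = __ATTR_RO(name);

/* would be called from kvm_create_vm() once the guest has a name */
struct kobject *kvm_sysfs_add_guest(const char *guest_name)
{
	struct kobject *dir;

	if (!kvm_root_kobj)
		kvm_root_kobj = kobject_create_and_add("kvm", kernel_kobj);
	if (!kvm_root_kobj)
		return NULL;

	dir = kobject_create_and_add(guest_name, kvm_root_kobj);
	if (dir && sysfs_create_file(dir, &name_attr.attr)) {
		kobject_put(dir);
		dir = NULL;
	}
	return dir;
}

Exposing the guest root under fs/ is a separate problem; this only covers the
enumeration and naming part.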

>> The samples will be tagged with the guest-name (and some additional
>> information perf needs). Perf userspace can access the symbols then
>> through /sys/kvm/guest0/fs/...
>
> I take that as a yes?  So we need a virtio-serial client in the kernel  
> (which might be exploitable by a malicious guest if buggy) and a  
> fs-over-virtio-serial client in the kernel (also exploitable).

What I meant was: perf-kernel puts the guest-name into every sample and
perf-userspace accesses /sys/kvm/guest_name/fs/ later to resolve the
symbols. I leave the question of how the guest-fs is exposed to the host
out of this discussion. We should discuss this separately.


>> An approach like: "The files are owned and only readable by the same
>> user that started the vm." might be a good start. So a user can measure
>> its own guests and root can measure all of them.
>
> That's not how sVirt works.  sVirt isolates a user's VMs from each  
> other, so if a guest breaks into qemu it can't break into other guests  
> owned by the same user.

If a vm breaks into qemu it can access the host file system which is the
bigger problem. In this case there is no isolation anymore. From that
context it can even kill other VMs of the same user independent of a
hypothetical /sys/kvm/.

>> Yeah that would be interesting information. But it is more related to
>> tracing than to pmu measurements.  The information which you
> mentioned above is probably better captured by an extension of
>> trace-events to userspace.
>
> It's all related.  You start with perf, see a problem with mmio, call up  
> a histogram of mmio or interrupts or whatever, then zoom in on the  
> misbehaving device.

Yes, but it's different from the implementation point-of-view. For the
user it surely all plays together.

	Joerg


^ permalink raw reply	[flat|nested] 390+ messages in thread

* Re: [RFC] Unify KVM kernel-space and user-space code into a single project
  2010-03-24 14:24                                                                                                                                           ` Alexander Graf
@ 2010-03-24 15:06                                                                                                                                             ` Avi Kivity
  0 siblings, 0 replies; 390+ messages in thread
From: Avi Kivity @ 2010-03-24 15:06 UTC (permalink / raw)
  To: Alexander Graf
  Cc: Joerg Roedel, Anthony Liguori, Ingo Molnar, Pekka Enberg, Zhang,
	Yanmin, Peter Zijlstra, Sheng Yang, linux-kernel, kvm,
	Marcelo Tosatti, Jes Sorensen, Gleb Natapov, ziteng.huang,
	Arnaldo Carvalho de Melo, Frédéric Weisbecker, Gregory Haskins

On 03/24/2010 04:24 PM, Alexander Graf wrote:
> Avi Kivity wrote:
>    
>> On 03/24/2010 03:53 PM, Alexander Graf wrote:
>>      
>>>        
>>>> Someone needs to know about the new guest to fetch its symbols.  Or do
>>>> you want that part in the kernel too?
>>>>
>>>>          
>>> How about we add a virtio "guest file system access" device? The guest
>>> would then expose its own file system using that device.
>>>
>>> On the host side this would simply be a -virtioguestfs
>>> unix:/tmp/guest.fs and you'd get a unix socket that gives you full
>>> access to the guest file system by using commands. I envision something
>>> like:
>>>
>>>        
>> The idea is to use a dedicated channel over virtio-serial.  If the
>> channel is present the file server can serve files over it.
>>      
> The file server being a kernel module inside the guest? We want to be
> able to serve things as early and hassle free as possible, so in this
> case I agree with Ingo that a kernel module is superior.
>    

No, just a daemon.  If it's important enough we can get distributions to 
package it by default, and then it will be hassle free.  If "early 
enough" is also so important, we can get it to start up on initrd.  If 
it's really critical, we can patch grub to serve the files as well.

>>> SEND: GET /proc/version
>>> RECV: Linux version 2.6.27.37-0.1-default (geeko@buildhost) (gcc version
>>> 4.3.2 [gcc-4_3-branch revision 141291] (SUSE Linux) ) #1 SMP 2009-10-15
>>> 14:56:58 +0200
>>>
>>> Now all we need is integration in perf to enumerate virtual machines
>>> based on libvirt. If you want to run qemu-kvm directly, just go with
>>> --guestfs=/tmp/guest.fs and perf could fetch all required information
>>> automatically.
>>>
>>> This should solve all issues while staying 100% in user space, right?
>>>
>>>        
>> Yeah, needs a fuse filesystem to populate the host namespace (kind of
>> sshfs over virtio-serial).
>>      
> I don't see why we need a fuse filesystem. We can of course create one
> later on. But for now all you need is a user connecting to that socket.
>    

If the perf app knows the protocol, no problem.  But leave perf with 
pure filesystem access and hide the details in fuse.

-- 
error compiling committee.c: too many arguments to function


^ permalink raw reply	[flat|nested] 390+ messages in thread

* Re: [RFC] Unify KVM kernel-space and user-space code into a single project
  2010-03-24 15:01                                                                                                                                           ` Joerg Roedel
@ 2010-03-24 15:12                                                                                                                                             ` Avi Kivity
  2010-03-24 15:46                                                                                                                                               ` Joerg Roedel
  2010-03-24 15:26                                                                                                                                             ` Daniel P. Berrange
  2010-03-24 16:03                                                                                                                                             ` Peter Zijlstra
  2 siblings, 1 reply; 390+ messages in thread
From: Avi Kivity @ 2010-03-24 15:12 UTC (permalink / raw)
  To: Joerg Roedel
  Cc: Anthony Liguori, Ingo Molnar, Pekka Enberg, Zhang, Yanmin,
	Peter Zijlstra, Sheng Yang, linux-kernel, kvm, Marcelo Tosatti,
	Jes Sorensen, Gleb Natapov, ziteng.huang,
	Arnaldo Carvalho de Melo, Frédéric Weisbecker, Gregory Haskins

On 03/24/2010 05:01 PM, Joerg Roedel wrote:
>
>> But when I weigh the benefit of truly transparent  system-wide perf
>> integration for users who don't use libvirt but do use  perf, versus
>> the cost of transforming kvm from a single-process API to a
>> system-wide API with all the complications that I've listed, it comes
>> out in favour of not adding the API.
>>      
> It's not a transformation, it's an extension. The current per-process
> /dev/kvm stays mostly untouched. It's all about having something like
> this:
>
> $ cd /sys/kvm/guest0
> $ ls -l
> -r-------- 1 root root 0 2009-08-17 12:05 name
> dr-x------ 1 root root 0 2009-08-17 12:05 fs
> $ cat name
> guest0
> $ # ...
>
> The fs/ directory is used as the mount point for the guest root fs.
>    

The problem is /sys/kvm, not /sys/kvm/fs.

>>> The samples will be tagged with the guest-name (and some additional
>>> information perf needs). Perf userspace can access the symbols then
>>> through /sys/kvm/guest0/fs/...
>>>        
>> I take that as a yes?  So we need a virtio-serial client in the kernel
>> (which might be exploitable by a malicious guest if buggy) and a
>> fs-over-virtio-serial client in the kernel (also exploitable).
>>      
> What I meant was: perf-kernel puts the guest-name into every sample and
> perf-userspace accesses /sys/kvm/guest_name/fs/ later to resolve the
> symbols. I leave the question of how the guest-fs is exposed to the host
> out of this discussion. We should discuss this separately.
>    

How I see it: perf-kernel puts the guest pid into every sample, and 
perf-userspace uses that to resolve to a mountpoint served by fuse, or 
to a unix domain socket that serves the files.

>>> An approach like: "The files are owned and only readable by the same
>>> user that started the vm." might be a good start. So a user can measure
>>> its own guests and root can measure all of them.
>>>        
>> That's not how sVirt works.  sVirt isolates a user's VMs from each
>> other, so if a guest breaks into qemu it can't break into other guests
>> owned by the same user.
>>      
> If a vm breaks into qemu it can access the host file system which is the
> bigger problem. In this case there is no isolation anymore. From that
> context it can even kill other VMs of the same user independent of a
> hypothetical /sys/kvm/.
>    

It cannot.  sVirt labels the disk image and other files qemu needs with 
the appropriate label, and everything else is off limits.  Even if you 
run the guest as root, it won't have access to other files.

>>> Yeah that would be interesting information. But it is more related to
>>> tracing than to pmu measurements.  The information which you
>>> mentioned above are probably better captured by an extension of
>>> trace-events to userspace.
>>>        
>> It's all related.  You start with perf, see a problem with mmio, call up
>> a histogram of mmio or interrupts or whatever, then zoom in on the
>> misbehaving device.
>>      
>> Yes, but it's different from the implementation point of view. For the
> user it surely all plays together.
>    

We need qemu to cooperate for mmio tracing, and we can cooperate with 
qemu for symbol resolution.  If it prevents adding another kernel API, 
that's a win from my POV.

-- 
error compiling committee.c: too many arguments to function


^ permalink raw reply	[flat|nested] 390+ messages in thread

* Re: [RFC] Unify KVM kernel-space and user-space code into a single project
  2010-03-24 15:01                                                                                                                                           ` Joerg Roedel
  2010-03-24 15:12                                                                                                                                             ` Avi Kivity
@ 2010-03-24 15:26                                                                                                                                             ` Daniel P. Berrange
  2010-03-24 15:37                                                                                                                                               ` Joerg Roedel
  2010-03-24 16:03                                                                                                                                             ` Peter Zijlstra
  2 siblings, 1 reply; 390+ messages in thread
From: Daniel P. Berrange @ 2010-03-24 15:26 UTC (permalink / raw)
  To: Joerg Roedel
  Cc: Avi Kivity, Anthony Liguori, Ingo Molnar, Pekka Enberg, Zhang,
	Yanmin, Peter Zijlstra, Sheng Yang, linux-kernel, kvm,
	Marcelo Tosatti, Jes Sorensen, Gleb Natapov, ziteng.huang,
	Arnaldo Carvalho de Melo, Frédéric Weisbecker, Gregory Haskins

On Wed, Mar 24, 2010 at 04:01:37PM +0100, Joerg Roedel wrote:
> >> An approach like: "The files are owned and only readable by the same
> >> user that started the vm." might be a good start. So a user can measure
> >> its own guests and root can measure all of them.
> >
> > That's not how sVirt works.  sVirt isolates a user's VMs from each  
> > other, so if a guest breaks into qemu it can't break into other guests  
> > owned by the same user.
> 
> If a vm breaks into qemu it can access the host file system which is the
> bigger problem. In this case there is no isolation anymore. From that
> context it can even kill other VMs of the same user independent of a
> hypothetical /sys/kvm/.

No it can't. With sVirt every single VM has a custom security label and
the policy only allows it access to disks / files with a matching label,
and prevents it from attacking any other VMs or processes on the host. This
confines the scope of any exploit in QEMU to those resources the admin
has explicitly assigned to the guest.

Regards,
Daniel
-- 
|: Red Hat, Engineering, London    -o-   http://people.redhat.com/berrange/ :|
|: http://libvirt.org -o- http://virt-manager.org -o- http://deltacloud.org :|
|: http://autobuild.org        -o-         http://search.cpan.org/~danberr/ :|
|: GnuPG: 7D3B9505  -o-   F3C9 553F A1DA 4AC2 5648 23C1 B3DF F742 7D3B 9505 :|

^ permalink raw reply	[flat|nested] 390+ messages in thread

* Re: [RFC] Unify KVM kernel-space and user-space code into a single project
  2010-03-24 15:26                                                                                                                                             ` Daniel P. Berrange
@ 2010-03-24 15:37                                                                                                                                               ` Joerg Roedel
  2010-03-24 15:43                                                                                                                                                 ` Avi Kivity
  0 siblings, 1 reply; 390+ messages in thread
From: Joerg Roedel @ 2010-03-24 15:37 UTC (permalink / raw)
  To: Daniel P. Berrange
  Cc: Avi Kivity, Anthony Liguori, Ingo Molnar, Pekka Enberg, Zhang,
	Yanmin, Peter Zijlstra, Sheng Yang, linux-kernel, kvm,
	Marcelo Tosatti, Jes Sorensen, Gleb Natapov, ziteng.huang,
	Arnaldo Carvalho de Melo, Frédéric Weisbecker, Gregory Haskins

On Wed, Mar 24, 2010 at 03:26:53PM +0000, Daniel P. Berrange wrote:
> On Wed, Mar 24, 2010 at 04:01:37PM +0100, Joerg Roedel wrote:
> > >> An approach like: "The files are owned and only readable by the same
> > >> user that started the vm." might be a good start. So a user can measure
> > >> its own guests and root can measure all of them.
> > >
> > > That's not how sVirt works.  sVirt isolates a user's VMs from each  
> > > other, so if a guest breaks into qemu it can't break into other guests  
> > > owned by the same user.
> > 
> > If a vm breaks into qemu it can access the host file system which is the
> > bigger problem. In this case there is no isolation anymore. From that
> > context it can even kill other VMs of the same user independent of a
> > hypothetical /sys/kvm/.
> 
> No it can't. With sVirt every single VM has a custom security label and
> the policy only allows it access to disks / files with a matching label,
> and prevents it from attacking any other VMs or processes on the host. This
> confines the scope of any exploit in QEMU to those resources the admin
> has explicitly assigned to the guest.

Even better. So a guest which breaks out can't even access its own
/sys/kvm/ directory. Perfect, it doesn't need that access anyway.

	Joerg


^ permalink raw reply	[flat|nested] 390+ messages in thread

* Re: [RFC] Unify KVM kernel-space and user-space code into a single project
  2010-03-24 15:37                                                                                                                                               ` Joerg Roedel
@ 2010-03-24 15:43                                                                                                                                                 ` Avi Kivity
  2010-03-24 15:50                                                                                                                                                   ` Joerg Roedel
  0 siblings, 1 reply; 390+ messages in thread
From: Avi Kivity @ 2010-03-24 15:43 UTC (permalink / raw)
  To: Joerg Roedel
  Cc: Daniel P. Berrange, Anthony Liguori, Ingo Molnar, Pekka Enberg,
	Zhang, Yanmin, Peter Zijlstra, Sheng Yang, linux-kernel, kvm,
	Marcelo Tosatti, Jes Sorensen, Gleb Natapov, ziteng.huang,
	Arnaldo Carvalho de Melo, Frédéric Weisbecker, Gregory Haskins

On 03/24/2010 05:37 PM, Joerg Roedel wrote:
>
>> No it can't. With sVirt every single VM has a custom security label and
>> the policy only allows it access to disks / files with a matching label,
>> and prevents it from attacking any other VMs or processes on the host. This
>> confines the scope of any exploit in QEMU to those resources the admin
>> has explicitly assigned to the guest.
>>      
> Even better. So a guest which breaks out can't even access its own
> /sys/kvm/ directory. Perfect, it doesn't need that access anyway.
>
>    

But what security label does that directory have?  How can we make sure 
that whoever needs access to those files gets them?

Automatically created objects don't work well with that model.  They're 
simply missing information.

-- 
error compiling committee.c: too many arguments to function


^ permalink raw reply	[flat|nested] 390+ messages in thread

* Re: [RFC] Unify KVM kernel-space and user-space code into a single project
  2010-03-24 15:12                                                                                                                                             ` Avi Kivity
@ 2010-03-24 15:46                                                                                                                                               ` Joerg Roedel
  2010-03-24 15:49                                                                                                                                                 ` Avi Kivity
  0 siblings, 1 reply; 390+ messages in thread
From: Joerg Roedel @ 2010-03-24 15:46 UTC (permalink / raw)
  To: Avi Kivity
  Cc: Anthony Liguori, Ingo Molnar, Pekka Enberg, Zhang, Yanmin,
	Peter Zijlstra, Sheng Yang, linux-kernel, kvm, Marcelo Tosatti,
	Jes Sorensen, Gleb Natapov, ziteng.huang,
	Arnaldo Carvalho de Melo, Frédéric Weisbecker, Gregory Haskins

On Wed, Mar 24, 2010 at 05:12:55PM +0200, Avi Kivity wrote:
> On 03/24/2010 05:01 PM, Joerg Roedel wrote:
>> $ cd /sys/kvm/guest0
>> $ ls -l
>> -r-------- 1 root root 0 2009-08-17 12:05 name
>> dr-x------ 1 root root 0 2009-08-17 12:05 fs
>> $ cat name
>> guest0
>> $ # ...
>>
>> The fs/ directory is used as the mount point for the guest root fs.
>
> The problem is /sys/kvm, not /sys/kvm/fs.

I am not tied to /sys/kvm. We could also use /proc/<pid>/kvm/ for
example. This would keep everything in the process space (except for the
global list of VMs which we should have anyway).

>> What I meant was: perf-kernel puts the guest-name into every sample and
>> perf-userspace accesses /sys/kvm/guest_name/fs/ later to resolve the
>> symbols. I leave the question of how the guest-fs is exposed to the host
>> out of this discussion. We should discuss this separately.
>
> How I see it: perf-kernel puts the guest pid into every sample, and  
> perf-userspace uses that to resolve to a mountpoint served by fuse, or  
> to a unix domain socket that serves the files.

We need a bit more information than just the qemu-pid, but yes, this
would also work out.

>> If a vm breaks into qemu it can access the host file system which is the
>> bigger problem. In this case there is no isolation anymore. From that
>> context it can even kill other VMs of the same user independent of a
>> hypothetical /sys/kvm/.
>
> It cannot.  sVirt labels the disk image and other files qemu needs with  
> the appropriate label, and everything else is off limits.  Even if you  
> run the guest as root, it won't have access to other files.

See my reply to Daniel's email.

>> Yes, but it's different from the implementation point of view. For the
>> user it surely all plays together.
>
> We need qemu to cooperate for mmio tracing, and we can cooperate with  
> qemu for symbol resolution.  If it prevents adding another kernel API,  
> that's a win from my POV.

That's true. Probably qemu can inject this information into the
kvm-trace-events stream.

	Joerg


^ permalink raw reply	[flat|nested] 390+ messages in thread

* Re: [RFC] Unify KVM kernel-space and user-space code into a single project
  2010-03-24 15:46                                                                                                                                               ` Joerg Roedel
@ 2010-03-24 15:49                                                                                                                                                 ` Avi Kivity
  2010-03-24 15:59                                                                                                                                                   ` Joerg Roedel
  0 siblings, 1 reply; 390+ messages in thread
From: Avi Kivity @ 2010-03-24 15:49 UTC (permalink / raw)
  To: Joerg Roedel
  Cc: Anthony Liguori, Ingo Molnar, Pekka Enberg, Zhang, Yanmin,
	Peter Zijlstra, Sheng Yang, linux-kernel, kvm, Marcelo Tosatti,
	Jes Sorensen, Gleb Natapov, ziteng.huang,
	Arnaldo Carvalho de Melo, Frédéric Weisbecker, Gregory Haskins

On 03/24/2010 05:46 PM, Joerg Roedel wrote:
> On Wed, Mar 24, 2010 at 05:12:55PM +0200, Avi Kivity wrote:
>    
>> On 03/24/2010 05:01 PM, Joerg Roedel wrote:
>>      
>>> $ cd /sys/kvm/guest0
>>> $ ls -l
>>> -r-------- 1 root root 0 2009-08-17 12:05 name
>>> dr-x------ 1 root root 0 2009-08-17 12:05 fs
>>> $ cat name
>>> guest0
>>> $ # ...
>>>
>>> The fs/ directory is used as the mount point for the guest root fs.
>>>        
>> The problem is /sys/kvm, not /sys/kvm/fs.
>>      
> I am not tied to /sys/kvm. We could also use /proc/<pid>/kvm/ for
> example. This would keep everything in the process space (except for the
> global list of VMs which we should have anyway).
>    

How about ~/.qemu/guests/$pid?

-- 
error compiling committee.c: too many arguments to function


^ permalink raw reply	[flat|nested] 390+ messages in thread

* Re: [RFC] Unify KVM kernel-space and user-space code into a single project
  2010-03-24 15:43                                                                                                                                                 ` Avi Kivity
@ 2010-03-24 15:50                                                                                                                                                   ` Joerg Roedel
  2010-03-24 15:52                                                                                                                                                     ` Avi Kivity
  0 siblings, 1 reply; 390+ messages in thread
From: Joerg Roedel @ 2010-03-24 15:50 UTC (permalink / raw)
  To: Avi Kivity
  Cc: Daniel P. Berrange, Anthony Liguori, Ingo Molnar, Pekka Enberg,
	Zhang, Yanmin, Peter Zijlstra, Sheng Yang, linux-kernel, kvm,
	Marcelo Tosatti, Jes Sorensen, Gleb Natapov, ziteng.huang,
	Arnaldo Carvalho de Melo, Frédéric Weisbecker, Gregory Haskins

On Wed, Mar 24, 2010 at 05:43:31PM +0200, Avi Kivity wrote:
> On 03/24/2010 05:37 PM, Joerg Roedel wrote:
>> Even better. So a guest which breaks out can't even access its own
>> /sys/kvm/ directory. Perfect, it doesn't need that access anyway.
>
> But what security label does that directory have?  How can we make sure  
> that whoever needs access to those files, gets them?
>
> Automatically created objects don't work well with that model.  They're  
> simply missing information.

If we go the /proc/<pid>/kvm way then the directory should probably
inherit the label from /proc/<pid>/?
The same could be applied to /sys/kvm/guest/ if we decide on it. The VM is
still bound to a single process with a /proc/<pid> after all.

	Joerg


^ permalink raw reply	[flat|nested] 390+ messages in thread

* Re: [RFC] Unify KVM kernel-space and user-space code into a single project
  2010-03-24 15:50                                                                                                                                                   ` Joerg Roedel
@ 2010-03-24 15:52                                                                                                                                                     ` Avi Kivity
  2010-03-24 16:17                                                                                                                                                       ` Joerg Roedel
  0 siblings, 1 reply; 390+ messages in thread
From: Avi Kivity @ 2010-03-24 15:52 UTC (permalink / raw)
  To: Joerg Roedel
  Cc: Daniel P. Berrange, Anthony Liguori, Ingo Molnar, Pekka Enberg,
	Zhang, Yanmin, Peter Zijlstra, Sheng Yang, linux-kernel, kvm,
	Marcelo Tosatti, Jes Sorensen, Gleb Natapov, ziteng.huang,
	Arnaldo Carvalho de Melo, Frédéric Weisbecker, Gregory Haskins

On 03/24/2010 05:50 PM, Joerg Roedel wrote:
> On Wed, Mar 24, 2010 at 05:43:31PM +0200, Avi Kivity wrote:
>    
>> On 03/24/2010 05:37 PM, Joerg Roedel wrote:
>>      
>>> Even better. So a guest which breaks out can't even access its own
>>> /sys/kvm/ directory. Perfect, it doesn't need that access anyway.
>>>        
>> But what security label does that directory have?  How can we make sure
>> that whoever needs access to those files, gets them?
>>
>> Automatically created objects don't work well with that model.  They're
>> simply missing information.
>>      
> If we go the /proc/<pid>/kvm way then the directory should probably
> inherit the label from /proc/<pid>/?
>    

That's a security policy.  The security people like their policies 
outside the kernel.

For example, they may want a label that allows a trace context to read 
the data, and also qemu itself for introspection.

> Same could be applied to /sys/kvm/guest/ if we decide for it. The VM is
> still bound to a single process with a /proc/<pid>  after all.
>    

Ditto.

-- 
error compiling committee.c: too many arguments to function


^ permalink raw reply	[flat|nested] 390+ messages in thread

* Re: [RFC] Unify KVM kernel-space and user-space code into a single project
  2010-03-24 15:49                                                                                                                                                 ` Avi Kivity
@ 2010-03-24 15:59                                                                                                                                                   ` Joerg Roedel
  2010-03-24 16:09                                                                                                                                                     ` Avi Kivity
  0 siblings, 1 reply; 390+ messages in thread
From: Joerg Roedel @ 2010-03-24 15:59 UTC (permalink / raw)
  To: Avi Kivity
  Cc: Anthony Liguori, Ingo Molnar, Pekka Enberg, Zhang, Yanmin,
	Peter Zijlstra, Sheng Yang, linux-kernel, kvm, Marcelo Tosatti,
	Jes Sorensen, Gleb Natapov, ziteng.huang,
	Arnaldo Carvalho de Melo, Frédéric Weisbecker, Gregory Haskins

On Wed, Mar 24, 2010 at 05:49:42PM +0200, Avi Kivity wrote:
> On 03/24/2010 05:46 PM, Joerg Roedel wrote:
>> On Wed, Mar 24, 2010 at 05:12:55PM +0200, Avi Kivity wrote:
>>    
>>> On 03/24/2010 05:01 PM, Joerg Roedel wrote:
>>>      
>>>> $ cd /sys/kvm/guest0
>>>> $ ls -l
>>>> -r-------- 1 root root 0 2009-08-17 12:05 name
>>>> dr-x------ 1 root root 0 2009-08-17 12:05 fs
>>>> $ cat name
>>>> guest0
>>>> $ # ...
>>>>
>>>> The fs/ directory is used as the mount point for the guest root fs.
>>>>        
>>> The problem is /sys/kvm, not /sys/kvm/fs.
>>>      
>> I am not tied to /sys/kvm. We could also use /proc/<pid>/kvm/ for
>> example. This would keep everything in the process space (except for the
>> global list of VMs which we should have anyway).
>>    
>
> How about ~/.qemu/guests/$pid?

That makes it hard for perf to find it and even harder to get a list of
all VMs. With /proc/<pid>/kvm/guest we could symlink all guest
directories to /proc/kvm/ and perf reads the list from there. Also perf
can easily derive the directory for a guest from its pid.
Last but not least, it's kernel-created and thus independent of the
userspace part being used.
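
As a rough sketch only, since both /proc/kvm/ and /proc/<pid>/kvm/ are 
proposed interfaces from this thread and do not exist today, the 
derivation and enumeration could look like:

#include <stdio.h>
#include <dirent.h>
#include <sys/types.h>

/* Derive the guest-fs mount point from a qemu pid (proposed layout). */
static void guest_fs_path(pid_t pid, char *buf, size_t len)
{
	snprintf(buf, len, "/proc/%d/kvm/fs", (int)pid);
}

/* Enumerate all guests via the proposed /proc/kvm/ symlink directory. */
static void list_guests(void)
{
	DIR *d = opendir("/proc/kvm");
	struct dirent *de;

	if (!d)
		return;
	while ((de = readdir(d)) != NULL) {
		if (de->d_name[0] == '.')
			continue;
		printf("guest: %s\n", de->d_name);	/* symlink to /proc/<pid>/kvm */
	}
	closedir(d);
}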

	Joerg


^ permalink raw reply	[flat|nested] 390+ messages in thread

* Re: [RFC] Unify KVM kernel-space and user-space code into a single project
  2010-03-24 15:01                                                                                                                                           ` Joerg Roedel
  2010-03-24 15:12                                                                                                                                             ` Avi Kivity
  2010-03-24 15:26                                                                                                                                             ` Daniel P. Berrange
@ 2010-03-24 16:03                                                                                                                                             ` Peter Zijlstra
  2010-03-24 16:16                                                                                                                                               ` Avi Kivity
  2010-03-24 16:23                                                                                                                                               ` Joerg Roedel
  2 siblings, 2 replies; 390+ messages in thread
From: Peter Zijlstra @ 2010-03-24 16:03 UTC (permalink / raw)
  To: Joerg Roedel
  Cc: Avi Kivity, Anthony Liguori, Ingo Molnar, Pekka Enberg, Zhang,
	Yanmin, Sheng Yang, linux-kernel, kvm, Marcelo Tosatti,
	Jes Sorensen, Gleb Natapov, ziteng.huang,
	Arnaldo Carvalho de Melo, Frédéric Weisbecker, Gregory Haskins

On Wed, 2010-03-24 at 16:01 +0100, Joerg Roedel wrote:

> What I meant was: perf-kernel puts the guest-name into every sample and
> perf-userspace accesses /sys/kvm/guest_name/fs/ later to resolve the
> symbols. I leave the question of how the guest-fs is exposed to the host
> out of this discussion. We should discuss this separately.

I'd much prefer a pid, as suggested later; it keeps the samples smaller.

But that said, we need guest kernel events like mmap and context
switches too, otherwise we simply can't make sense of guest userspace
addresses, we need to know the guest address space layout.

So aside from a filesystem content, we first need mmap and context
switch events to find the files we need to access.

And while I appreciate all the security talk, it's basically pointless:
the host can access it anyway, everybody agrees on that, but
still you're arguing the case..
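
For reference, on the mmap point above: the host-side PERF_RECORD_MMAP 
record already carries roughly this information (layout as documented in 
perf_event.h; the struct name below is ours, the ABI only describes an 
anonymous record). Guest-side mmap and context-switch events would have 
to provide the same kind of data:

#include <linux/types.h>
#include <linux/perf_event.h>

/* Shape of a PERF_RECORD_MMAP sample, per the perf_event.h comments. */
struct mmap_record {
	struct perf_event_header	header;	/* header.type == PERF_RECORD_MMAP */
	__u32				pid, tid;
	__u64				addr;	/* start of the mapping */
	__u64				len;
	__u64				pgoff;
	char				filename[];
};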

^ permalink raw reply	[flat|nested] 390+ messages in thread

* Re: [RFC] Unify KVM kernel-space and user-space code into a single project
  2010-03-24 15:59                                                                                                                                                   ` Joerg Roedel
@ 2010-03-24 16:09                                                                                                                                                     ` Avi Kivity
  2010-03-24 16:40                                                                                                                                                       ` Joerg Roedel
  2010-03-24 17:47                                                                                                                                                       ` Arnaldo Carvalho de Melo
  0 siblings, 2 replies; 390+ messages in thread
From: Avi Kivity @ 2010-03-24 16:09 UTC (permalink / raw)
  To: Joerg Roedel
  Cc: Anthony Liguori, Ingo Molnar, Pekka Enberg, Zhang, Yanmin,
	Peter Zijlstra, Sheng Yang, linux-kernel, kvm, Marcelo Tosatti,
	Jes Sorensen, Gleb Natapov, ziteng.huang,
	Arnaldo Carvalho de Melo, Frédéric Weisbecker, Gregory Haskins

On 03/24/2010 05:59 PM, Joerg Roedel wrote:
>
>    
>>> I am not tied to /sys/kvm. We could also use /proc/<pid>/kvm/ for
>>> example. This would keep everything in the process space (except for the
>>> global list of VMs which we should have anyway).
>>>
>>>        
>> How about ~/.qemu/guests/$pid?
>>      
> That makes it hard for perf to find it and even harder to get a list of
> all VMs.

Looks trivial to find a guest, less so with enumerating (still doable).

>   With /proc/<pid>/kvm/guest we could symlink all guest
> directories to /proc/kvm/ and perf reads the list from there. Also perf
> can easily derive the directory for a guest from its pid.
> Last but not least, it's kernel-created and thus independent of the
> userspace part being used.
>    

Doesn't perf already have a dependency on naming conventions for finding 
debug information?

-- 
error compiling committee.c: too many arguments to function


^ permalink raw reply	[flat|nested] 390+ messages in thread

* Re: [RFC] Unify KVM kernel-space and user-space code into a single project
  2010-03-24 16:03                                                                                                                                             ` Peter Zijlstra
@ 2010-03-24 16:16                                                                                                                                               ` Avi Kivity
  2010-03-24 16:23                                                                                                                                               ` Joerg Roedel
  1 sibling, 0 replies; 390+ messages in thread
From: Avi Kivity @ 2010-03-24 16:16 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Joerg Roedel, Anthony Liguori, Ingo Molnar, Pekka Enberg, Zhang,
	Yanmin, Sheng Yang, linux-kernel, kvm, Marcelo Tosatti,
	Jes Sorensen, Gleb Natapov, ziteng.huang,
	Arnaldo Carvalho de Melo, Frédéric Weisbecker, Gregory Haskins

On 03/24/2010 06:03 PM, Peter Zijlstra wrote:
> On Wed, 2010-03-24 at 16:01 +0100, Joerg Roedel wrote:
>
>    
>> What I meant was: perf-kernel puts the guest-name into every sample and
>> perf-userspace accesses /sys/kvm/guest_name/fs/ later to resolve the
>> symbols. I leave the question of how the guest-fs is exposed to the host
>> out of this discussion. We should discuss this separately.
>>      
> I'd much prefer a pid like suggested later, keeps the samples smaller.
>
> But that said, we need guest kernel events like mmap and context
> switches too, otherwise we simply can't make sense of guest userspace
> addresses, we need to know the guest address space layout.
>    

The kernel knows some of the address space layout, qemu knows all of it.

> So aside from a filesystem content, we first need mmap and context
> switch events to find the files we need to access.
>    

This only works for the guest kernel; we don't know anything about guest 
processes [1].

> And while I appreciate all the security talk, it's basically pointless
> anyway, the host can access it anyway, everybody agrees on that, but
> still you're arguing the case..
>    

root can access anything, but we're not talking about root.  The idea is 
to protect against a guest that has exploited its qemu and is now 
attacking the host and its fellow guests.   uid protection is no good 
since we want to isolate the guest from host processes belonging to the 
same uid and from other guests running under the same uid.

[1] We can find out guest pids if we teach the kernel what to 
dereference, i.e. gs:offset1->offset2->offset3.  Of course this varies 
from kernel to kernel, so we need some kind of bytecode that we can run 
in perf NMI context.  Kind of like what we need to run an unwinder for 
-fomit-frame-pointer.
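
Very roughly, and only as a sketch of the idea in [1]; read_guest below 
stands in for whatever mechanism would actually read guest memory, and 
the offset list would be guest-kernel specific:

#include <stdint.h>

typedef int (*read_guest_fn)(uint64_t gva, uint64_t *val);

/* Follow a chain of offsets starting from e.g. the guest gs base. */
static int resolve_guest_value(read_guest_fn read_guest, uint64_t base,
			       const uint32_t *offsets, int n, uint64_t *out)
{
	uint64_t addr = base;
	int i;

	for (i = 0; i < n; i++)
		if (read_guest(addr + offsets[i], &addr))
			return -1;	/* would have to never fault in NMI context */
	*out = addr;			/* e.g. the guest pid */
	return 0;
}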

-- 
error compiling committee.c: too many arguments to function


^ permalink raw reply	[flat|nested] 390+ messages in thread

* Re: [RFC] Unify KVM kernel-space and user-space code into a single project
  2010-03-24 15:52                                                                                                                                                     ` Avi Kivity
@ 2010-03-24 16:17                                                                                                                                                       ` Joerg Roedel
  2010-03-24 16:20                                                                                                                                                         ` Avi Kivity
  0 siblings, 1 reply; 390+ messages in thread
From: Joerg Roedel @ 2010-03-24 16:17 UTC (permalink / raw)
  To: Avi Kivity
  Cc: Daniel P. Berrange, Anthony Liguori, Ingo Molnar, Pekka Enberg,
	Zhang, Yanmin, Peter Zijlstra, Sheng Yang, linux-kernel, kvm,
	Marcelo Tosatti, Jes Sorensen, Gleb Natapov, ziteng.huang,
	Arnaldo Carvalho de Melo, Frédéric Weisbecker, Gregory Haskins

On Wed, Mar 24, 2010 at 05:52:54PM +0200, Avi Kivity wrote:
> On 03/24/2010 05:50 PM, Joerg Roedel wrote:
>> If we go the /proc/<pid>/kvm way then the directory should probably
>> inherit the label from /proc/<pid>/?
>
> That's a security policy.  The security people like their policies  
> outside the kernel.
>
> For example, they may want a label that allows a trace context to read  
> the data, and also qemu itself for introspection.

Hm, I am not a security expert. But isn't this just one more entity for
sVirt to handle? I would leave that decision to the sVirt developers.
Does attaching the same label as for the VM resources mean that root
could not access it anymore?

	Joerg


^ permalink raw reply	[flat|nested] 390+ messages in thread

* Re: [RFC] Unify KVM kernel-space and user-space code into a single project
  2010-03-24 16:17                                                                                                                                                       ` Joerg Roedel
@ 2010-03-24 16:20                                                                                                                                                         ` Avi Kivity
  2010-03-24 16:31                                                                                                                                                           ` Joerg Roedel
  0 siblings, 1 reply; 390+ messages in thread
From: Avi Kivity @ 2010-03-24 16:20 UTC (permalink / raw)
  To: Joerg Roedel
  Cc: Daniel P. Berrange, Anthony Liguori, Ingo Molnar, Pekka Enberg,
	Zhang, Yanmin, Peter Zijlstra, Sheng Yang, linux-kernel, kvm,
	Marcelo Tosatti, Jes Sorensen, Gleb Natapov, ziteng.huang,
	Arnaldo Carvalho de Melo, Frédéric Weisbecker, Gregory Haskins

On 03/24/2010 06:17 PM, Joerg Roedel wrote:
> On Wed, Mar 24, 2010 at 05:52:54PM +0200, Avi Kivity wrote:
>    
>> On 03/24/2010 05:50 PM, Joerg Roedel wrote:
>>      
>>> If we go the /proc/<pid>/kvm way then the directory should probably
>>> inherit the label from /proc/<pid>/?
>>>        
>> That's a security policy.  The security people like their policies
>> outside the kernel.
>>
>> For example, they may want a label that allows a trace context to read
>> the data, and also qemu itself for introspection.
>>      
> Hm, I am not a security expert.

I'm out of my depth here as well.

> But is this not only one entity more for
> sVirt to handle? I would leave that decision to the sVirt developers.
> Does attaching the same label as for the VM resources mean that root
> could not access it anymore?
>    

IIUC processes run under a context, and there's a policy somewhere that 
tells you which context can access which label (and with what 
permissions).  There was a server on the Internet once that gave you 
root access and invited you to attack it.  No idea if anyone succeeded 
or not (I got bored after about a minute).

So it depends on the policy.  If you attach the same label, that means 
all files with the same label have the same access permissions.  I think.

-- 
error compiling committee.c: too many arguments to function


^ permalink raw reply	[flat|nested] 390+ messages in thread

* Re: [RFC] Unify KVM kernel-space and user-space code into a single project
  2010-03-24 16:03                                                                                                                                             ` Peter Zijlstra
  2010-03-24 16:16                                                                                                                                               ` Avi Kivity
@ 2010-03-24 16:23                                                                                                                                               ` Joerg Roedel
  2010-03-24 16:45                                                                                                                                                 ` Peter Zijlstra
  1 sibling, 1 reply; 390+ messages in thread
From: Joerg Roedel @ 2010-03-24 16:23 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Avi Kivity, Anthony Liguori, Ingo Molnar, Pekka Enberg, Zhang,
	Yanmin, Sheng Yang, linux-kernel, kvm, Marcelo Tosatti,
	Jes Sorensen, Gleb Natapov, ziteng.huang,
	Arnaldo Carvalho de Melo, Frédéric Weisbecker, Gregory Haskins

On Wed, Mar 24, 2010 at 05:03:42PM +0100, Peter Zijlstra wrote:
> On Wed, 2010-03-24 at 16:01 +0100, Joerg Roedel wrote:
> 
> > What I meant was: perf-kernel puts the guest-name into every sample and
> > perf-userspace accesses /sys/kvm/guest_name/fs/ later to resolve the
> > symbols. I leave the question of how the guest-fs is exposed to the host
> > out of this discussion. We should discuss this separately.
> 
> I'd much prefer a pid like suggested later, keeps the samples smaller.
> 
> But that said, we need guest kernel events like mmap and context
> switches too, otherwise we simply can't make sense of guest userspace
> addresses, we need to know the guest address space layout.

With the filesystem approach all we need is the pid of the guest
process. Then we can access /proc/<pid>/maps of the guest and read out the
address space layout, no?
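
Something along these lines, assuming the guest root fs really is 
visible on the host somewhere, which is exactly the open question in 
this thread; the line format itself is the normal /proc/<pid>/maps one:

#include <stdio.h>

static void dump_guest_maps(const char *guest_root, int guest_pid)
{
	char path[256], line[512];
	FILE *f;

	snprintf(path, sizeof(path), "%s/proc/%d/maps", guest_root, guest_pid);
	f = fopen(path, "r");
	if (!f)
		return;
	while (fgets(line, sizeof(line), f)) {
		unsigned long start, end, pgoff;
		char perms[8], file[256] = "";

		/* format: start-end perms offset dev inode [pathname] */
		if (sscanf(line, "%lx-%lx %7s %lx %*x:%*x %*u %255s",
			   &start, &end, perms, &pgoff, file) >= 4)
			printf("%lx-%lx %s %s\n", start, end, perms, file);
	}
	fclose(f);
}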

	Joerg


^ permalink raw reply	[flat|nested] 390+ messages in thread

* Re: [RFC] Unify KVM kernel-space and user-space code into a single project
  2010-03-24 16:20                                                                                                                                                         ` Avi Kivity
@ 2010-03-24 16:31                                                                                                                                                           ` Joerg Roedel
  2010-03-24 16:32                                                                                                                                                             ` Avi Kivity
  0 siblings, 1 reply; 390+ messages in thread
From: Joerg Roedel @ 2010-03-24 16:31 UTC (permalink / raw)
  To: Avi Kivity
  Cc: Daniel P. Berrange, Anthony Liguori, Ingo Molnar, Pekka Enberg,
	Zhang, Yanmin, Peter Zijlstra, Sheng Yang, linux-kernel, kvm,
	Marcelo Tosatti, Jes Sorensen, Gleb Natapov, ziteng.huang,
	Arnaldo Carvalho de Melo, Frédéric Weisbecker, Gregory Haskins

On Wed, Mar 24, 2010 at 06:20:38PM +0200, Avi Kivity wrote:
> On 03/24/2010 06:17 PM, Joerg Roedel wrote:
>> But is this not only one entity more for
>> sVirt to handle? I would leave that decision to the sVirt developers.
>> Does attaching the same label as for the VM resources mean that root
>> could not access it anymore?
>>    
>
> IIUC processes run under a context, and there's a policy somewhere that  
> tells you which context can access which label (and with what  
> permissions).  There was a server on the Internet once that gave you  
> root access and invited you to attack it.  No idea if anyone succeeded  
> or not (I got bored after about a minute).
>
> So it depends on the policy.  If you attach the same label, that means  
> all files with the same label have the same access permissions.  I think.

So if this is true we can introduce a 'trace' label and add all contexts
that should be allowed to trace to it.
But we probably should leave the details to the security experts ;-)

	Joerg


^ permalink raw reply	[flat|nested] 390+ messages in thread

* Re: [RFC] Unify KVM kernel-space and user-space code into a single project
  2010-03-24 16:31                                                                                                                                                           ` Joerg Roedel
@ 2010-03-24 16:32                                                                                                                                                             ` Avi Kivity
  2010-03-24 16:45                                                                                                                                                               ` Joerg Roedel
  0 siblings, 1 reply; 390+ messages in thread
From: Avi Kivity @ 2010-03-24 16:32 UTC (permalink / raw)
  To: Joerg Roedel
  Cc: Daniel P. Berrange, Anthony Liguori, Ingo Molnar, Pekka Enberg,
	Zhang, Yanmin, Peter Zijlstra, Sheng Yang, linux-kernel, kvm,
	Marcelo Tosatti, Jes Sorensen, Gleb Natapov, ziteng.huang,
	Arnaldo Carvalho de Melo, Frédéric Weisbecker, Gregory Haskins

On 03/24/2010 06:31 PM, Joerg Roedel wrote:
> On Wed, Mar 24, 2010 at 06:20:38PM +0200, Avi Kivity wrote:
>    
>> On 03/24/2010 06:17 PM, Joerg Roedel wrote:
>>      
>>> But is this not only one entity more for
>>> sVirt to handle? I would leave that decision to the sVirt developers.
>>> Does attaching the same label as for the VM resources mean that root
>>> could not access it anymore?
>>>
>>>        
>> IIUC processes run under a context, and there's a policy somewhere that
>> tells you which context can access which label (and with what
>> permissions).  There was a server on the Internet once that gave you
>> root access and invited you to attack it.  No idea if anyone succeeded
>> or not (I got bored after about a minute).
>>
>> So it depends on the policy.  If you attach the same label, that means
>> all files with the same label have the same access permissions.  I think.
>>      
> So if this is true we can introduce a 'trace' label and add all contexts
> that should be allowed to trace to it.
> But we probably should leave the details to the security experts ;-)
>    

That's just what I want to do.  Leave it in userspace and then they can 
deal with it without telling us about it.

-- 
error compiling committee.c: too many arguments to function


^ permalink raw reply	[flat|nested] 390+ messages in thread

* Re: [RFC] Unify KVM kernel-space and user-space code into a single project
  2010-03-24 16:09                                                                                                                                                     ` Avi Kivity
@ 2010-03-24 16:40                                                                                                                                                       ` Joerg Roedel
  2010-03-24 16:47                                                                                                                                                         ` Avi Kivity
  2010-03-24 17:47                                                                                                                                                       ` Arnaldo Carvalho de Melo
  1 sibling, 1 reply; 390+ messages in thread
From: Joerg Roedel @ 2010-03-24 16:40 UTC (permalink / raw)
  To: Avi Kivity
  Cc: Anthony Liguori, Ingo Molnar, Pekka Enberg, Zhang, Yanmin,
	Peter Zijlstra, Sheng Yang, linux-kernel, kvm, Marcelo Tosatti,
	Jes Sorensen, Gleb Natapov, ziteng.huang,
	Arnaldo Carvalho de Melo, Frédéric Weisbecker, Gregory Haskins

On Wed, Mar 24, 2010 at 06:09:30PM +0200, Avi Kivity wrote:
> On 03/24/2010 05:59 PM, Joerg Roedel wrote:
>>
>>    
>>>> I am not tied to /sys/kvm. We could also use /proc/<pid>/kvm/ for
>>>> example. This would keep everything in the process space (except for the
>>>> global list of VMs which we should have anyway).
>>>>
>>>>        
>>> How about ~/.qemu/guests/$pid?
>>>      
>> That makes it hard for perf to find it and even harder to get a list of
>> all VMs.
>
> Looks trivial to find a guest, less so with enumerating (still doable).

Not so trivial, and even more likely to break. Even if perf has the pid of
the process and wants to find the directory, it has to:

1. Get the uid of the process
2. Find the username for the uid
3. Use the username to find the home-directory

Steps 2. and 3. need nsswitch and/or pam access to get this information
from whatever source the admin has configured. And depending on what the
source is it may be temporarily unavailable causing nasty timeouts. In
short, there are many weak parts in that chain making it more likely to
break.
A kernel-based approach with /proc/<pid>/kvm does not have those issues
(and to repeat myself, it is independent of the userspace being used).
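
Spelled out as a userspace sketch; the ~/.qemu/guests/<pid> layout is 
the hypothetical convention suggested above, and getpwuid() is where the 
nsswitch dependency (and its possible timeouts) comes in:

#include <stdio.h>
#include <pwd.h>
#include <sys/stat.h>
#include <sys/types.h>

static int guest_dir_from_pid(pid_t pid, char *buf, size_t len)
{
	char proc[64];
	struct stat st;
	struct passwd *pw;

	/* 1. uid of the qemu process */
	snprintf(proc, sizeof(proc), "/proc/%d", (int)pid);
	if (stat(proc, &st))
		return -1;

	/* 2. + 3. username and home directory for that uid */
	pw = getpwuid(st.st_uid);
	if (!pw)
		return -1;

	snprintf(buf, len, "%s/.qemu/guests/%d", pw->pw_dir, (int)pid);
	return 0;
}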

	Joerg


^ permalink raw reply	[flat|nested] 390+ messages in thread

* Re: [RFC] Unify KVM kernel-space and user-space code into a single project
  2010-03-24 16:32                                                                                                                                                             ` Avi Kivity
@ 2010-03-24 16:45                                                                                                                                                               ` Joerg Roedel
  2010-03-24 16:48                                                                                                                                                                 ` Avi Kivity
  0 siblings, 1 reply; 390+ messages in thread
From: Joerg Roedel @ 2010-03-24 16:45 UTC (permalink / raw)
  To: Avi Kivity
  Cc: Daniel P. Berrange, Anthony Liguori, Ingo Molnar, Pekka Enberg,
	Zhang, Yanmin, Peter Zijlstra, Sheng Yang, linux-kernel, kvm,
	Marcelo Tosatti, Jes Sorensen, Gleb Natapov, ziteng.huang,
	Arnaldo Carvalho de Melo, Frédéric Weisbecker, Gregory Haskins

On Wed, Mar 24, 2010 at 06:32:51PM +0200, Avi Kivity wrote:
> On 03/24/2010 06:31 PM, Joerg Roedel wrote:

> That's just what I want to do.  Leave it in userspace and then they can  
> deal with it without telling us about it.

They can't do that with a directory in /proc?


^ permalink raw reply	[flat|nested] 390+ messages in thread

* Re: [RFC] Unify KVM kernel-space and user-space code into a single project
  2010-03-24 16:23                                                                                                                                               ` Joerg Roedel
@ 2010-03-24 16:45                                                                                                                                                 ` Peter Zijlstra
  0 siblings, 0 replies; 390+ messages in thread
From: Peter Zijlstra @ 2010-03-24 16:45 UTC (permalink / raw)
  To: Joerg Roedel
  Cc: Avi Kivity, Anthony Liguori, Ingo Molnar, Pekka Enberg, Zhang,
	Yanmin, Sheng Yang, linux-kernel, kvm, Marcelo Tosatti,
	Jes Sorensen, Gleb Natapov, ziteng.huang,
	Arnaldo Carvalho de Melo, Frédéric Weisbecker, Gregory Haskins

On Wed, 2010-03-24 at 17:23 +0100, Joerg Roedel wrote:
> On Wed, Mar 24, 2010 at 05:03:42PM +0100, Peter Zijlstra wrote:
> > On Wed, 2010-03-24 at 16:01 +0100, Joerg Roedel wrote:
> > 
> > > What I meant was: perf-kernel puts the guest-name into every sample and
> > > perf-userspace accesses /sys/kvm/guest_name/fs/ later to resolve the
> > > symbols. I leave the question of how the guest-fs is exposed to the host
> > > out of this discussion. We should discuss this separately.
> > 
> > I'd much prefer a pid like suggested later, keeps the samples smaller.
> > 
> > But that said, we need guest kernel events like mmap and context
> > switches too, otherwise we simply can't make sense of guest userspace
> > addresses, we need to know the guest address space layout.
> 
> With the filesystem approach all we need is the pid of the guest
> process. Then we can access /proc/<pid>/maps of the guest and read out the
> address space layout, no?

No, what if it maps new things after you read it? But still getting the
pid of the guest process seems non-trivial without guest kernel support.

^ permalink raw reply	[flat|nested] 390+ messages in thread

* Re: [RFC] Unify KVM kernel-space and user-space code into a single project
  2010-03-24 16:40                                                                                                                                                       ` Joerg Roedel
@ 2010-03-24 16:47                                                                                                                                                         ` Avi Kivity
  2010-03-24 16:52                                                                                                                                                           ` Avi Kivity
  2010-04-08 14:29                                                                                                                                                           ` Antoine Martin
  0 siblings, 2 replies; 390+ messages in thread
From: Avi Kivity @ 2010-03-24 16:47 UTC (permalink / raw)
  To: Joerg Roedel
  Cc: Anthony Liguori, Ingo Molnar, Pekka Enberg, Zhang, Yanmin,
	Peter Zijlstra, Sheng Yang, linux-kernel, kvm, Marcelo Tosatti,
	Jes Sorensen, Gleb Natapov, ziteng.huang,
	Arnaldo Carvalho de Melo, Frédéric Weisbecker, Gregory Haskins

On 03/24/2010 06:40 PM, Joerg Roedel wrote:
>
>> Looks trivial to find a guest, less so with enumerating (still doable).
>>      
> Not so trivial, and even more likely to break. Even if perf has the pid of
> the process and wants to find the directory, it has to:
>
> 1. Get the uid of the process
> 2. Find the username for the uid
> 3. Use the username to find the home-directory
>
> Steps 2. and 3. need nsswitch and/or pam access to get this information
> from whatever source the admin has configured. And depending on what the
> source is it may be temporarily unavailable causing nasty timeouts. In
> short, there are many weak parts in that chain making it more likely to
> break.
>    

It's true.  If the kernel provides something, there are fewer things 
that can break.  But if your system is so broken that you can't resolve 
uids, fix that before running perf.  Must we design perf for that case?

After all, 'ls -l' will break under the same circumstances.  It's hard 
to imagine doing useful work when that doesn't work.

> A kernel-based approach with /proc/<pid>/kvm does not have those issues
> (and to repeat myself, it is independent from the userspace being used).
>    

It has other issues, which are IMO more problematic.

-- 
error compiling committee.c: too many arguments to function


^ permalink raw reply	[flat|nested] 390+ messages in thread

* Re: [RFC] Unify KVM kernel-space and user-space code into a single project
  2010-03-24 16:45                                                                                                                                                               ` Joerg Roedel
@ 2010-03-24 16:48                                                                                                                                                                 ` Avi Kivity
  0 siblings, 0 replies; 390+ messages in thread
From: Avi Kivity @ 2010-03-24 16:48 UTC (permalink / raw)
  To: Joerg Roedel
  Cc: Daniel P. Berrange, Anthony Liguori, Ingo Molnar, Pekka Enberg,
	Zhang, Yanmin, Peter Zijlstra, Sheng Yang, linux-kernel, kvm,
	Marcelo Tosatti, Jes Sorensen, Gleb Natapov, ziteng.huang,
	Arnaldo Carvalho de Melo, Frédéric Weisbecker, Gregory Haskins

On 03/24/2010 06:45 PM, Joerg Roedel wrote:
>
>> That's just what I want to do.  Leave it in userspace and then they can
>> deal with it without telling us about it.
>>      
> They can't do that with a directory in /proc?
>
>    

I don't know.

-- 
error compiling committee.c: too many arguments to function


^ permalink raw reply	[flat|nested] 390+ messages in thread

* Re: [RFC] Unify KVM kernel-space and user-space code into a single project
  2010-03-24 16:47                                                                                                                                                         ` Avi Kivity
@ 2010-03-24 16:52                                                                                                                                                           ` Avi Kivity
  2010-04-08 14:29                                                                                                                                                           ` Antoine Martin
  1 sibling, 0 replies; 390+ messages in thread
From: Avi Kivity @ 2010-03-24 16:52 UTC (permalink / raw)
  To: Joerg Roedel
  Cc: Anthony Liguori, Ingo Molnar, Pekka Enberg, Zhang, Yanmin,
	Peter Zijlstra, Sheng Yang, linux-kernel, kvm, Marcelo Tosatti,
	Jes Sorensen, Gleb Natapov, ziteng.huang,
	Arnaldo Carvalho de Melo, Frédéric Weisbecker, Gregory Haskins

On 03/24/2010 06:47 PM, Avi Kivity wrote:
>
> It's true.  If the kernel provides something, there are fewer things 
> that can break.  But if your system is so broken that you can't 
> resolve uids, fix that before running perf.  Must we design perf for 
> that case?
>
> After all, 'ls -l' will break under the same circumstances.  It's hard 
> to imagine doing useful work when that doesn't work.


Also, perf itself will hang if it needs to access a file using autofs or 
nfs, and those are broken.

-- 
error compiling committee.c: too many arguments to function


^ permalink raw reply	[flat|nested] 390+ messages in thread

* Re: [RFC] Unify KVM kernel-space and user-space code into a single project
  2010-03-24 16:09                                                                                                                                                     ` Avi Kivity
  2010-03-24 16:40                                                                                                                                                       ` Joerg Roedel
@ 2010-03-24 17:47                                                                                                                                                       ` Arnaldo Carvalho de Melo
  2010-03-24 18:20                                                                                                                                                         ` Avi Kivity
  1 sibling, 1 reply; 390+ messages in thread
From: Arnaldo Carvalho de Melo @ 2010-03-24 17:47 UTC (permalink / raw)
  To: Avi Kivity
  Cc: Joerg Roedel, Anthony Liguori, Ingo Molnar, Pekka Enberg, Zhang,
	Yanmin, Peter Zijlstra, Sheng Yang, linux-kernel, kvm,
	Marcelo Tosatti, Jes Sorensen, Gleb Natapov, ziteng.huang,
	Frédéric Weisbecker, Gregory Haskins

Em Wed, Mar 24, 2010 at 06:09:30PM +0200, Avi Kivity escreveu:
> Doesn't perf already have a dependency on naming conventions for finding
> debug information?

It looks at several places, from most symbol rich (/usr/lib/debug/, aka
-debuginfo packages, where we have full symtabs) to poorest (the
packaged binary, where we may just have a .dynsym).

In an ideal world, it would just get the build-id (a SHA1 cookie that is
in an ELF section inserted into every binary (aka DSO), kernel module,
kallsyms or vmlinux file) and use that to look first in a local cache
(implemented in perf for a long time already) or in some symbol server.

For instance, for a random perf.data file I collected here in my machine
I have:

[acme@doppio linux-2.6-tip]$ perf buildid-list | grep libpthread
5c68f7afeb33309c78037e374b0deee84dd441f6 /lib64/libpthread-2.10.2.so
[acme@doppio linux-2.6-tip]$

So I don't have to access /lib64/libpthread-2.10.2.so directly, nor some
convention to get a debuginfo in a local file like:

/usr/lib/debug/lib64/libpthread-2.10.2.so.debug

Instead the tools look at:

[acme@doppio linux-2.6-tip]$ l ~/.debug/.build-id/5c/68f7afeb33309c78037e374b0deee84dd441f6
lrwxrwxrwx 1 acme acme 73 2010-01-06 18:53 /home/acme/.debug/.build-id/5c/68f7afeb33309c78037e374b0deee84dd441f6 -> ../../lib64/libpthread-2.10.2.so/5c68f7afeb33309c78037e374b0deee84dd441f6*

To find the file for that specific build-id, not the one installed in my
machine (or on the different machine, of a different architecture) that
may be completely unrelated, a new one, or one for a different arch.
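
A sketch of that cache lookup, using the <first two hex chars>/<rest> 
split shown above; buildid_cache_path() is only an illustrative helper, 
not the actual perf code:

#include <stdio.h>

static void buildid_cache_path(const char *home, const char *build_id,
			       char *buf, size_t len)
{
	snprintf(buf, len, "%s/.debug/.build-id/%.2s/%s",
		 home, build_id, build_id + 2);
}

/* buildid_cache_path(getenv("HOME"),
 *	"5c68f7afeb33309c78037e374b0deee84dd441f6", path, sizeof(path))
 * yields ~/.debug/.build-id/5c/68f7afeb33309c78037e374b0deee84dd441f6 */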

- Arnaldo

^ permalink raw reply	[flat|nested] 390+ messages in thread

* Re: [RFC] Unify KVM kernel-space and user-space code into a single project
  2010-03-24 17:47                                                                                                                                                       ` Arnaldo Carvalho de Melo
@ 2010-03-24 18:20                                                                                                                                                         ` Avi Kivity
  2010-03-24 18:27                                                                                                                                                           ` Arnaldo Carvalho de Melo
  2010-03-25  9:00                                                                                                                                                           ` Zhang, Yanmin
  0 siblings, 2 replies; 390+ messages in thread
From: Avi Kivity @ 2010-03-24 18:20 UTC (permalink / raw)
  To: Arnaldo Carvalho de Melo
  Cc: Joerg Roedel, Anthony Liguori, Ingo Molnar, Pekka Enberg, Zhang,
	Yanmin, Peter Zijlstra, Sheng Yang, linux-kernel, kvm,
	Marcelo Tosatti, Jes Sorensen, Gleb Natapov, ziteng.huang,
	Frédéric Weisbecker, Gregory Haskins

On 03/24/2010 07:47 PM, Arnaldo Carvalho de Melo wrote:
> Em Wed, Mar 24, 2010 at 06:09:30PM +0200, Avi Kivity escreveu:
>    
>> Doesn't perf already have a dependency on naming conventions for finding
>> debug information?
>>      
> It looks at several places, from most symbol rich (/usr/lib/debug/, aka
> -debuginfo packages, where we have full symtabs) to poorest (the
> packaged binary, where we may just have a .dynsym).
>
> In an ideal world, it would just get the build-id (a SHA1 cookie that is
> in an ELF session inserted in every binary (aka DSOs), kernel module,
> kallsyms or vmlinux file) and use that to look first in a local cache
> (implemented in perf for a long time already) or in some symbol server.
>
> For instance, for a random perf.data file I collected here in my machine
> I have:
>
> [acme@doppio linux-2.6-tip]$ perf buildid-list | grep libpthread
> 5c68f7afeb33309c78037e374b0deee84dd441f6 /lib64/libpthread-2.10.2.so
> [acme@doppio linux-2.6-tip]$
>
> So I don't have to access /lib64/libpthread-2.10.2.so directly, nor some
> convention to get a debuginfo in a local file like:
>
> /usr/lib/debug/lib64/libpthread-2.10.2.so.debug
>
> Instead the tools look at:
>
> [acme@doppio linux-2.6-tip]$ l ~/.debug/.build-id/5c/68f7afeb33309c78037e374b0deee84dd441f6
> lrwxrwxrwx 1 acme acme 73 2010-01-06 18:53 /home/acme/.debug/.build-id/5c/68f7afeb33309c78037e374b0deee84dd441f6 ->  ../../lib64/libpthread-2.10.2.so/5c68f7afeb33309c78037e374b0deee84dd441f6*
>
> To find the file for that specific build-id, not the one installed in my
> machine (or on the different machine, of a different architecture) that
> may be completely unrelated, a new one, or one for a different arch.
>    

Thanks.  I believe qemu could easily act as a symbol server for this use 
case.

-- 
Do not meddle in the internals of kernels, for they are subtle and quick to panic.


^ permalink raw reply	[flat|nested] 390+ messages in thread

* Re: [RFC] Unify KVM kernel-space and user-space code into a single project
  2010-03-24 18:20                                                                                                                                                         ` Avi Kivity
@ 2010-03-24 18:27                                                                                                                                                           ` Arnaldo Carvalho de Melo
  2010-03-25  9:00                                                                                                                                                           ` Zhang, Yanmin
  1 sibling, 0 replies; 390+ messages in thread
From: Arnaldo Carvalho de Melo @ 2010-03-24 18:27 UTC (permalink / raw)
  To: Avi Kivity
  Cc: Joerg Roedel, Anthony Liguori, Ingo Molnar, Pekka Enberg, Zhang,
	Yanmin, Peter Zijlstra, Sheng Yang, linux-kernel, kvm,
	Marcelo Tosatti, Jes Sorensen, Gleb Natapov, ziteng.huang,
	Frédéric Weisbecker, Gregory Haskins

Em Wed, Mar 24, 2010 at 08:20:10PM +0200, Avi Kivity escreveu:
> On 03/24/2010 07:47 PM, Arnaldo Carvalho de Melo wrote:
>> [...]

> Thanks.  I believe qemu could easily act as a symbol server for this use  
> case.

Agreed, but it doesn't even have to :-)

We just need to get the build-id into the PERF_RECORD_MMAP event somehow
and then get the symbols from elsewhere, say from the same DVD, an RHN
channel, a Debian repository, an unstripped embedded developer toolkit
image, whatever.

Or it may already be in the local cache from last week's perf report
session :-)
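
For example (just a sketch; the mount point and file path below are made
up), one can check that a file fetched from such a channel really carries
the build-id recorded in perf.data before using it:

  # hypothetical path to a fetched -debuginfo payload
  $ readelf -n /mnt/debuginfo-dvd/usr/lib/debug/lib64/libpthread-2.10.2.so.debug | grep "Build ID"
      Build ID: 5c68f7afeb33309c78037e374b0deee84dd441f6
  $ perf buildid-list | grep libpthread
  5c68f7afeb33309c78037e374b0deee84dd441f6 /lib64/libpthread-2.10.2.so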

- Arnaldo

^ permalink raw reply	[flat|nested] 390+ messages in thread

* Re: [RFC] Unify KVM kernel-space and user-space code into a single project
  2010-03-24 18:20                                                                                                                                                         ` Avi Kivity
  2010-03-24 18:27                                                                                                                                                           ` Arnaldo Carvalho de Melo
@ 2010-03-25  9:00                                                                                                                                                           ` Zhang, Yanmin
  1 sibling, 0 replies; 390+ messages in thread
From: Zhang, Yanmin @ 2010-03-25  9:00 UTC (permalink / raw)
  To: Avi Kivity
  Cc: Arnaldo Carvalho de Melo, Joerg Roedel, Anthony Liguori,
	Ingo Molnar, Pekka Enberg, Peter Zijlstra, Sheng Yang,
	linux-kernel, kvm, Marcelo Tosatti, Jes Sorensen, Gleb Natapov,
	zhiteng.huang, Frédéric Weisbecker, Gregory Haskins

On Wed, 2010-03-24 at 20:20 +0200, Avi Kivity wrote:
> On 03/24/2010 07:47 PM, Arnaldo Carvalho de Melo wrote:
> > [...]
> 
> Thanks.  I believe qemu could easily act as a symbol server for this use 
> case.


I spent a couple of days investigating why sshfs/fuse doesn't work well
with procfs and sysfs. Just as my own patch against fuse was almost ready,
I found that fuse already supports such access via direct I/O. With the
parameter -o direct_io, it works well.

Here is an example that mounts a guest OS's / on the host:
#sshfs -p 5551 -o direct_io localhost:/ guestmount

We can read and write files as long as the permissions allow it.
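
For instance (the guest-side paths are just an illustration of what becomes
reachable through the mount):
#head -2 guestmount/proc/kallsyms
#grep ^kvm guestmount/proc/modules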

I will go ahead and add support for parsing statistics from multiple guest
OS instances.

Yanmin



^ permalink raw reply	[flat|nested] 390+ messages in thread

* Re: [RFC] Unify KVM kernel-space and user-space code into a single project
  2010-03-24 16:47                                                                                                                                                         ` Avi Kivity
  2010-03-24 16:52                                                                                                                                                           ` Avi Kivity
@ 2010-04-08 14:29                                                                                                                                                           ` Antoine Martin
  1 sibling, 0 replies; 390+ messages in thread
From: Antoine Martin @ 2010-04-08 14:29 UTC (permalink / raw)
  To: Avi Kivity
  Cc: Joerg Roedel, Anthony Liguori, Ingo Molnar, Pekka Enberg, Zhang,
	Yanmin, Peter Zijlstra, Sheng Yang, linux-kernel, kvm,
	Marcelo Tosatti, Jes Sorensen, Gleb Natapov, ziteng.huang,
	Arnaldo Carvalho de Melo, Frédéric Weisbecker, Gregory Haskins

Avi Kivity wrote:
> On 03/24/2010 06:40 PM, Joerg Roedel wrote:
>>
>>> Looks trivial to find a guest, less so with enumerating (still doable).
>>>      
>> Not so trivial, and even more likely to break. Even if perf has the pid of
>> the process and wants to find the directory, it has to do:
>>
>> 1. Get the uid of the process
>> 2. Find the username for the uid
>> 3. Use the username to find the home-directory
>>
>> Steps 2 and 3 need nsswitch and/or pam access to get this information
>> from whatever source the admin has configured. And depending on what the
>> source is, it may be temporarily unavailable, causing nasty timeouts. In
>> short, there are many weak parts in that chain, making it more likely to
>> break.
>>    
> 
> It's true.  If the kernel provides something, there are fewer things
> that can break.  But if your system is so broken that you can't resolve
> uids, fix that before running perf.  Must we design perf for that case?
uid-to-username resolution can fail when using chroots, or worse, point to
an incorrect location (and yes, I do use such setups).
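
A minimal sketch of the chain quoted above (the pid is hypothetical; steps
2 and 3 are exactly where NSS lookups or a chroot can stall or lie):

  pid=4242                                     # some qemu process
  uid=$(stat -c %u /proc/$pid)                 # 1. uid owning the process
  entry=$(getent passwd "$uid")                # 2. uid -> passwd entry (NSS/LDAP may time out)
  home=$(printf '%s' "$entry" | cut -d: -f6)   # 3. home directory (wrong inside a chroot)
  echo "$home"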

Sorry if this has been covered or the discussion has moved on; I'm just
catching up with the 500+ messages in my inbox...

Antoine


> 
> After all, 'ls -l' will break under the same circumstances.  It's hard
> to imagine doing useful work when that doesn't work.
> 
>> A kernel-based approach with /proc/<pid>/kvm does not have those issues
>> (and to repeat myself, it is independent of the userspace being used).
>>    
> 
> It has other issues, which are IMO more problematic.
> 

^ permalink raw reply	[flat|nested] 390+ messages in thread

end of thread

Thread overview: 390+ messages
2010-03-16  5:27 [PATCH] Enhance perf to collect KVM guest os statistics from host side Zhang, Yanmin
2010-03-16  5:41 ` Avi Kivity
2010-03-16  7:24   ` Ingo Molnar
2010-03-16  9:20     ` Avi Kivity
2010-03-16  9:53       ` Ingo Molnar
2010-03-16 10:13         ` Avi Kivity
2010-03-16 10:20           ` Ingo Molnar
2010-03-16 10:40             ` Avi Kivity
2010-03-16 10:50               ` Ingo Molnar
2010-03-16 11:10                 ` Avi Kivity
2010-03-16 11:25                   ` Ingo Molnar
2010-03-16 12:21                     ` Avi Kivity
2010-03-16 12:29                       ` Ingo Molnar
2010-03-16 12:41                         ` Avi Kivity
2010-03-16 13:08                           ` Ingo Molnar
2010-03-16 13:16                             ` Avi Kivity
2010-03-16 13:31                               ` Ingo Molnar
2010-03-16 13:37                                 ` Avi Kivity
2010-03-16 15:06                                 ` Frank Ch. Eigler
2010-03-16 15:52                                   ` Ingo Molnar
2010-03-16 16:08                                     ` Frank Ch. Eigler
2010-03-16 16:35                                       ` Ingo Molnar
2010-03-16 17:34                                     ` Anthony Liguori
2010-03-16 17:52                                       ` Ingo Molnar
2010-03-16 18:06                                         ` Anthony Liguori
2010-03-16 18:28                                           ` Ingo Molnar
2010-03-16 23:04                                             ` Anthony Liguori
2010-03-17  0:41                                               ` Frank Ch. Eigler
2010-03-17  3:54                                                 ` Avi Kivity
2010-03-17  8:16                                                   ` Ingo Molnar
2010-03-17  8:20                                                     ` Avi Kivity
2010-03-17  8:59                                                       ` Ingo Molnar
2010-03-18  5:27                                                   ` Huang, Zhiteng
2010-03-18  5:27                                                     ` Huang, Zhiteng
2010-03-17  8:14                                                 ` Ingo Molnar
2010-03-17  8:53                                               ` Ingo Molnar
2010-03-16 17:06                             ` Anthony Liguori
2010-03-16 17:39                               ` Ingo Molnar
2010-03-16 23:07                                 ` Anthony Liguori
2010-03-17  8:10                                   ` [RFC] Unify KVM kernel-space and user-space code into a single project Ingo Molnar
2010-03-18  8:20                                     ` Avi Kivity
2010-03-18  8:56                                       ` Ingo Molnar
2010-03-18  9:24                                         ` Alexander Graf
2010-03-18 10:10                                           ` Ingo Molnar
2010-03-18 10:21                                             ` Avi Kivity
2010-03-18 11:35                                               ` Ingo Molnar
2010-03-18 12:00                                                 ` Alexander Graf
2010-03-18 12:33                                                 ` Frank Ch. Eigler
2010-03-18 13:01                                                   ` John Kacur
2010-03-18 13:01                                                     ` John Kacur
2010-03-18 14:25                                                     ` Ingo Molnar
2010-03-18 14:39                                                       ` Frank Ch. Eigler
2010-03-18 13:02                                                   ` Ingo Molnar
2010-03-18 13:10                                                     ` Avi Kivity
2010-03-18 13:31                                                       ` Ingo Molnar
2010-03-18 13:44                                                         ` Daniel P. Berrange
2010-03-18 13:59                                                           ` Ingo Molnar
2010-03-18 14:06                                                             ` John Kacur
2010-03-18 14:06                                                               ` John Kacur
2010-03-18 14:11                                                               ` Ingo Molnar
2010-03-18 13:46                                                         ` Avi Kivity
2010-03-18 13:57                                                           ` Ingo Molnar
2010-03-18 14:25                                                             ` Avi Kivity
2010-03-18 14:36                                                               ` Ingo Molnar
2010-03-18 14:51                                                                 ` Avi Kivity
2010-03-18 13:24                                                     ` Frank Ch. Eigler
2010-03-18 13:48                                                       ` Ingo Molnar
2010-03-18 10:12                                         ` Avi Kivity
2010-03-18 10:28                                           ` Ingo Molnar
2010-03-18 10:50                                           ` Ingo Molnar
2010-03-18 11:30                                             ` Avi Kivity
2010-03-18 11:48                                               ` Ingo Molnar
2010-03-18 12:22                                                 ` Avi Kivity
2010-03-18 13:00                                                   ` Ingo Molnar
2010-03-18 13:36                                                     ` Avi Kivity
2010-03-18 14:09                                                       ` Ingo Molnar
2010-03-18 14:38                                                         ` Avi Kivity
2010-03-18 17:16                                                           ` Ingo Molnar
2010-03-18 14:59                                                     ` Anthony Liguori
2010-03-18 15:17                                                       ` Ingo Molnar
2010-03-18 16:11                                                         ` Anthony Liguori
2010-03-18 16:28                                                           ` Ingo Molnar
2010-03-18 16:38                                                             ` Anthony Liguori
2010-03-18 16:51                                                               ` Pekka Enberg
2010-03-18 16:51                                                                 ` Pekka Enberg
2010-03-18 17:02                                                                 ` Ingo Molnar
2010-03-18 17:09                                                                   ` Avi Kivity
2010-03-18 17:28                                                                     ` Ingo Molnar
2010-03-19  7:56                                                                       ` Avi Kivity
2010-03-19  8:53                                                                         ` Ingo Molnar
2010-03-19 12:56                                                                           ` Anthony Liguori
2010-03-21 19:17                                                                             ` Ingo Molnar
2010-03-21 19:35                                                                               ` Antoine Martin
2010-03-21 19:59                                                                                 ` Ingo Molnar
2010-03-21 20:09                                                                                   ` Avi Kivity
2010-03-21 21:00                                                                                     ` Ingo Molnar
2010-03-21 21:44                                                                                       ` Avi Kivity
2010-03-21 23:43                                                                                       ` Anthony Liguori
2010-03-21 20:01                                                                               ` Avi Kivity
2010-03-21 20:08                                                                                 ` Olivier Galibert
2010-03-21 20:11                                                                                   ` Avi Kivity
2010-03-21 20:18                                                                                     ` Antoine Martin
2010-03-21 20:24                                                                                       ` Avi Kivity
2010-03-21 20:31                                                                                         ` Antoine Martin
2010-03-21 21:03                                                                                           ` Avi Kivity
2010-03-21 21:20                                                                                             ` Ingo Molnar
2010-03-22  6:35                                                                                               ` Avi Kivity
2010-03-22 11:48                                                                                                 ` Ingo Molnar
2010-03-22 12:31                                                                                                   ` Pekka Enberg
2010-03-22 12:31                                                                                                     ` Pekka Enberg
2010-03-22 12:37                                                                                                     ` Daniel P. Berrange
2010-03-22 12:44                                                                                                       ` Pekka Enberg
2010-03-22 12:54                                                                                                       ` Ingo Molnar
2010-03-22 13:05                                                                                                         ` Daniel P. Berrange
2010-03-22 13:23                                                                                                           ` Richard W.M. Jones
2010-03-22 14:02                                                                                                             ` Ingo Molnar
2010-03-22 14:20                                                                                                             ` oerg Roedel
2010-03-22 13:56                                                                                                           ` Ingo Molnar
2010-03-22 14:01                                                                                                             ` Richard W.M. Jones
2010-03-22 14:07                                                                                                               ` Ingo Molnar
2010-03-22 12:36                                                                                                   ` Avi Kivity
2010-03-22 12:50                                                                                                     ` Pekka Enberg
2010-03-22 12:50                                                                                                       ` Pekka Enberg
2010-03-22  6:59                                                                                               ` Zhang, Yanmin
2010-03-22 12:05                                                                                             ` Antoine Martin
2010-03-21 20:37                                                                                     ` Ingo Molnar
2010-03-22  6:37                                                                                       ` Avi Kivity
2010-03-22 11:39                                                                                         ` Ingo Molnar
2010-03-22 12:44                                                                                           ` Avi Kivity
2010-03-22 12:54                                                                                             ` Daniel P. Berrange
2010-03-22 14:26                                                                                             ` Ingo Molnar
2010-03-22 17:29                                                                                               ` Avi Kivity
2010-03-21 20:11                                                                                   ` Avi Kivity
2010-03-21 20:31                                                                                 ` Ingo Molnar
2010-03-21 21:30                                                                                   ` Avi Kivity
2010-03-21 21:52                                                                                     ` Ingo Molnar
2010-03-22  6:49                                                                                       ` Avi Kivity
2010-03-22 11:23                                                                                         ` Ingo Molnar
2010-03-22 12:49                                                                                           ` Avi Kivity
2010-03-22 13:01                                                                                             ` Pekka Enberg
2010-03-22 13:01                                                                                               ` Pekka Enberg
2010-03-22 14:54                                                                                               ` Ingo Molnar
2010-03-22 19:04                                                                                                 ` Avi Kivity
2010-03-23  9:46                                                                                                 ` Olivier Galibert
2010-03-22 14:47                                                                                             ` Ingo Molnar
2010-03-22 18:15                                                                                               ` Avi Kivity
2010-03-22 11:10                                                                                   ` oerg Roedel
2010-03-22 12:22                                                                                     ` Ingo Molnar
2010-03-22 13:46                                                                                       ` Joerg Roedel
2010-03-22 16:32                                                                                         ` Ingo Molnar
2010-03-22 17:17                                                                                           ` Frank Ch. Eigler
2010-03-22 17:27                                                                                             ` Pekka Enberg
2010-03-22 17:27                                                                                               ` Pekka Enberg
2010-03-22 17:32                                                                                               ` Avi Kivity
2010-03-22 17:39                                                                                                 ` Ingo Molnar
2010-03-22 17:58                                                                                                   ` Avi Kivity
2010-03-22 17:52                                                                                                 ` Pekka Enberg
2010-03-22 17:52                                                                                                   ` Pekka Enberg
2010-03-22 18:04                                                                                                   ` Avi Kivity
2010-03-22 18:10                                                                                                     ` Pekka Enberg
2010-03-22 18:10                                                                                                       ` Pekka Enberg
2010-03-22 18:55                                                                                                       ` Avi Kivity
2010-03-22 17:43                                                                                               ` Ingo Molnar
2010-03-22 18:02                                                                                                 ` Avi Kivity
2010-03-22 17:44                                                                                           ` Avi Kivity
2010-03-22 19:10                                                                                             ` Ingo Molnar
2010-03-22 19:18                                                                                               ` Anthony Liguori
2010-03-22 19:23                                                                                               ` Avi Kivity
2010-03-22 19:28                                                                                               ` Andrea Arcangeli
2010-03-22 19:20                                                                                           ` Joerg Roedel
2010-03-22 19:28                                                                                             ` Avi Kivity
2010-03-22 19:49                                                                                             ` Ingo Molnar
2010-03-21 23:35                                                                               ` Anthony Liguori
2010-03-20  7:35                                                                           ` Avi Kivity
2010-03-21 19:06                                                                             ` Ingo Molnar
2010-03-21 20:22                                                                               ` Avi Kivity
2010-03-21 20:55                                                                                 ` Ingo Molnar
2010-03-21 21:42                                                                                   ` Avi Kivity
2010-03-21 21:54                                                                                     ` Ingo Molnar
2010-03-22  0:16                                                                                       ` Anthony Liguori
2010-03-22 11:59                                                                                         ` Ingo Molnar
2010-03-22  7:13                                                                                       ` Avi Kivity
2010-03-22 11:14                                                                                         ` Ingo Molnar
2010-03-22 11:23                                                                                           ` Alexander Graf
2010-03-22 12:33                                                                                             ` Lukas Kolbe
2010-03-22 12:29                                                                                           ` Avi Kivity
2010-03-22 12:44                                                                                             ` Ingo Molnar
2010-03-22 12:52                                                                                               ` Avi Kivity
2010-03-22 14:32                                                                                                 ` Ingo Molnar
2010-03-22 14:43                                                                                                   ` Anthony Liguori
2010-03-22 15:55                                                                                                     ` Ingo Molnar
2010-03-22 16:08                                                                                                       ` Anthony Liguori
2010-03-22 16:59                                                                                                         ` Ingo Molnar
2010-03-22 18:28                                                                                                           ` Anthony Liguori
2010-03-22 17:11                                                                                                         ` Ingo Molnar
2010-03-22 18:30                                                                                                           ` Anthony Liguori
2010-03-22 16:12                                                                                                       ` Avi Kivity
2010-03-22 16:16                                                                                                         ` Avi Kivity
2010-03-22 16:40                                                                                                           ` Pekka Enberg
2010-03-22 16:40                                                                                                             ` Pekka Enberg
2010-03-22 18:06                                                                                                             ` Avi Kivity
2010-03-22 16:51                                                                                                         ` Ingo Molnar
2010-03-22 17:08                                                                                                           ` Avi Kivity
2010-03-22 17:34                                                                                                             ` Ingo Molnar
2010-03-22 17:55                                                                                                               ` Avi Kivity
2010-03-22 19:15                                                                                                                 ` Anthony Liguori
2010-03-22 19:31                                                                                                                   ` Daniel P. Berrange
2010-03-22 19:33                                                                                                                     ` Anthony Liguori
2010-03-22 19:39                                                                                                                     ` Alexander Graf
2010-03-22 19:54                                                                                                                       ` Ingo Molnar
2010-03-22 19:58                                                                                                                         ` Alexander Graf
2010-03-22 20:21                                                                                                                           ` Ingo Molnar
2010-03-22 20:35                                                                                                                             ` Avi Kivity
2010-03-23 10:48                                                                                                                             ` Bernd Petrovitsch
2010-03-22 20:19                                                                                                                         ` Antoine Martin
2010-03-22 20:00                                                                                                                   ` Antoine Martin
2010-03-22 20:58                                                                                                                     ` Daniel P. Berrange
2010-03-22 19:20                                                                                                                 ` Ingo Molnar
2010-03-22 19:44                                                                                                                   ` Avi Kivity
2010-03-22 20:06                                                                                                                     ` Ingo Molnar
2010-03-22 20:15                                                                                                                       ` Avi Kivity
2010-03-22 20:29                                                                                                                         ` Ingo Molnar
2010-03-22 20:40                                                                                                                           ` Avi Kivity
2010-03-22 18:35                                                                                                               ` Anthony Liguori
2010-03-22 19:22                                                                                                                 ` Ingo Molnar
2010-03-22 19:29                                                                                                                   ` Anthony Liguori
2010-03-22 20:32                                                                                                                     ` Ingo Molnar
2010-03-22 20:43                                                                                                                       ` Avi Kivity
2010-03-22 19:45                                                                                                                   ` Avi Kivity
2010-03-22 20:35                                                                                                                     ` Ingo Molnar
2010-03-22 20:45                                                                                                                       ` Avi Kivity
2010-03-22 18:41                                                                                                               ` Anthony Liguori
2010-03-22 19:27                                                                                                                 ` Ingo Molnar
2010-03-22 19:47                                                                                                                   ` Avi Kivity
2010-03-22 20:46                                                                                                                     ` Ingo Molnar
2010-03-22 20:53                                                                                                                       ` Avi Kivity
2010-03-22 22:06                                                                                                                     ` Anthony Liguori
2010-03-23  9:07                                                                                                                       ` Avi Kivity
2010-03-23 14:09                                                                                                                         ` Anthony Liguori
2010-03-23 10:13                                                                                                                       ` Kevin Wolf
2010-03-23 10:28                                                                                                                         ` Antoine Martin
2010-03-23 14:06                                                                                                                       ` Joerg Roedel
2010-03-23 16:39                                                                                                                         ` Avi Kivity
2010-03-23 18:21                                                                                                                           ` Joerg Roedel
2010-03-23 18:27                                                                                                                             ` Peter Zijlstra
2010-03-23 19:05                                                                                                                             ` Javier Guerra Giraldez
2010-03-23 19:05                                                                                                                               ` Javier Guerra Giraldez
2010-03-24  4:57                                                                                                                             ` Avi Kivity
2010-03-24 11:59                                                                                                                               ` Joerg Roedel
2010-03-24 12:08                                                                                                                                 ` Avi Kivity
2010-03-24 12:50                                                                                                                                   ` Joerg Roedel
2010-03-24 13:05                                                                                                                                     ` Avi Kivity
2010-03-24 13:46                                                                                                                                       ` Joerg Roedel
2010-03-24 13:57                                                                                                                                         ` Avi Kivity
2010-03-24 15:01                                                                                                                                           ` Joerg Roedel
2010-03-24 15:12                                                                                                                                             ` Avi Kivity
2010-03-24 15:46                                                                                                                                               ` Joerg Roedel
2010-03-24 15:49                                                                                                                                                 ` Avi Kivity
2010-03-24 15:59                                                                                                                                                   ` Joerg Roedel
2010-03-24 16:09                                                                                                                                                     ` Avi Kivity
2010-03-24 16:40                                                                                                                                                       ` Joerg Roedel
2010-03-24 16:47                                                                                                                                                         ` Avi Kivity
2010-03-24 16:52                                                                                                                                                           ` Avi Kivity
2010-04-08 14:29                                                                                                                                                           ` Antoine Martin
2010-03-24 17:47                                                                                                                                                       ` Arnaldo Carvalho de Melo
2010-03-24 18:20                                                                                                                                                         ` Avi Kivity
2010-03-24 18:27                                                                                                                                                           ` Arnaldo Carvalho de Melo
2010-03-25  9:00                                                                                                                                                           ` Zhang, Yanmin
2010-03-24 15:26                                                                                                                                             ` Daniel P. Berrange
2010-03-24 15:37                                                                                                                                               ` Joerg Roedel
2010-03-24 15:43                                                                                                                                                 ` Avi Kivity
2010-03-24 15:50                                                                                                                                                   ` Joerg Roedel
2010-03-24 15:52                                                                                                                                                     ` Avi Kivity
2010-03-24 16:17                                                                                                                                                       ` Joerg Roedel
2010-03-24 16:20                                                                                                                                                         ` Avi Kivity
2010-03-24 16:31                                                                                                                                                           ` Joerg Roedel
2010-03-24 16:32                                                                                                                                                             ` Avi Kivity
2010-03-24 16:45                                                                                                                                                               ` Joerg Roedel
2010-03-24 16:48                                                                                                                                                                 ` Avi Kivity
2010-03-24 16:03                                                                                                                                             ` Peter Zijlstra
2010-03-24 16:16                                                                                                                                               ` Avi Kivity
2010-03-24 16:23                                                                                                                                               ` Joerg Roedel
2010-03-24 16:45                                                                                                                                                 ` Peter Zijlstra
2010-03-24 13:53                                                                                                                                       ` Alexander Graf
2010-03-24 13:59                                                                                                                                         ` Avi Kivity
2010-03-24 14:24                                                                                                                                           ` Alexander Graf
2010-03-24 15:06                                                                                                                                             ` Avi Kivity
2010-03-24  5:09                                                                                                                             ` Andi Kleen
2010-03-24  6:42                                                                                                                               ` Avi Kivity
2010-03-24  7:38                                                                                                                                 ` Andi Kleen
2010-03-24  8:59                                                                                                                                   ` Avi Kivity
2010-03-24  9:31                                                                                                                                     ` Andi Kleen
2010-03-22 14:46                                                                                                   ` Avi Kivity
2010-03-22 16:08                                                                                                     ` Ingo Molnar
2010-03-22 16:13                                                                                                       ` Avi Kivity
2010-03-24 12:06                                                                                         ` Paolo Bonzini
2010-03-21 22:00                                                                                     ` Ingo Molnar
2010-03-21 23:50                                                                                       ` Anthony Liguori
2010-03-22  0:25                                                                                       ` Anthony Liguori
2010-03-22  7:18                                                                                       ` Avi Kivity
2010-03-19  9:19                                                           ` Paul Mundt
2010-03-19  9:52                                                             ` Olivier Galibert
2010-03-19 13:56                                                               ` [LKML] " Konrad Rzeszutek Wilk
2010-03-19 13:56                                                                 ` Konrad Rzeszutek Wilk
2010-03-18 14:53                                                 ` Anthony Liguori
2010-03-18 16:13                                                   ` Ingo Molnar
2010-03-18 16:54                                                     ` Avi Kivity
2010-03-18 17:11                                                       ` Ingo Molnar
2010-03-18 18:20                                                     ` Anthony Liguori
2010-03-18 18:23                                                     ` drepper
2010-03-18 19:15                                                       ` Ingo Molnar
2010-03-18 19:37                                                         ` drepper
2010-03-18 20:18                                                           ` Ingo Molnar
2010-03-18 20:39                                                             ` drepper
2010-03-18 20:56                                                               ` Ingo Molnar
2010-03-18 22:06                                                                 ` Alan Cox
2010-03-18 22:16                                                                   ` Ingo Molnar
2010-03-19  7:22                                                                     ` Avi Kivity
2010-03-21 13:27                                                     ` Gabor Gombas
2010-03-18 21:02                                             ` Zachary Amsden
2010-03-18 21:15                                               ` Ingo Molnar
2010-03-18 22:19                                                 ` Zachary Amsden
2010-03-18 22:44                                                   ` Ingo Molnar
2010-03-19  7:21                                                     ` Avi Kivity
2010-03-20 14:59                                                       ` Andrea Arcangeli
2010-03-21 10:03                                                         ` Avi Kivity
2010-03-18  9:22                                       ` Ingo Molnar
2010-03-18 10:32                                         ` Avi Kivity
2010-03-18 11:19                                           ` Ingo Molnar
2010-03-18 18:20                                           ` Frederic Weisbecker
2010-03-18 19:50                                             ` Frank Ch. Eigler
2010-03-18 20:47                                               ` Ingo Molnar
2010-03-18  8:44                                     ` Jes Sorensen
2010-03-18  9:54                                       ` Ingo Molnar
2010-03-18 10:40                                         ` Jes Sorensen
2010-03-18 10:58                                           ` Ingo Molnar
2010-03-18 13:23                                             ` Jes Sorensen
2010-03-18 14:22                                               ` Ingo Molnar
2010-03-18 14:45                                                 ` Jes Sorensen
2010-03-18 16:54                                                   ` Ingo Molnar
2010-03-18 18:10                                                     ` Anthony Liguori
2010-03-19 14:53                                       ` Andrea Arcangeli
2010-03-18 14:38                                     ` Anthony Liguori
2010-03-18 14:44                                     ` Anthony Liguori
2010-03-16 22:30                     ` [PATCH] Enhance perf to collect KVM guest os statistics from host side oerg Roedel
2010-03-16 23:01                       ` Masami Hiramatsu
2010-03-17  7:27                       ` Ingo Molnar
2010-03-16  7:48   ` Zhang, Yanmin
2010-03-16  9:28     ` Zhang, Yanmin
2010-03-16  9:33       ` Avi Kivity
2010-03-16  9:47       ` Ingo Molnar
2010-03-17  9:26         ` Zhang, Yanmin
2010-03-18  2:45           ` Zhang, Yanmin
2010-03-18  7:49             ` Zhang, Yanmin
2010-03-18  8:03               ` Ingo Molnar
2010-03-18 13:03                 ` Arnaldo Carvalho de Melo
2010-03-16  9:32     ` Avi Kivity
2010-03-17  2:34       ` Zhang, Yanmin
2010-03-17  9:28         ` Sheng Yang
2010-03-17  9:41           ` Avi Kivity
2010-03-17  9:51             ` Sheng Yang
2010-03-17 10:06               ` Avi Kivity
2010-03-17 21:14           ` Zachary Amsden
2010-03-18  1:19             ` Sheng Yang
2010-03-18  4:50               ` Zachary Amsden
2010-03-18  5:22                 ` Sheng Yang
2010-03-18  5:41                   ` Sheng Yang
2010-03-18  8:47                     ` Zachary Amsden
2010-03-19  3:38 ` Zhang, Yanmin
2010-03-19  8:21   ` Ingo Molnar
2010-03-19 17:29     ` oerg Roedel
2010-03-21 18:43       ` Ingo Molnar
2010-03-22 10:14         ` oerg Roedel
2010-03-22 10:37           ` Ingo Molnar
2010-03-22 10:59           ` Ingo Molnar
2010-03-22 11:47             ` Joerg Roedel
2010-03-22 12:26               ` Ingo Molnar
2010-03-23 13:18               ` Soeren Sandmann
2010-03-23 13:49                 ` Andi Kleen
2010-03-23 14:04                   ` Soeren Sandmann
2010-03-23 14:20                     ` Andi Kleen
2010-03-23 14:29                       ` Arnaldo Carvalho de Melo
2010-03-23 14:46                     ` Frank Ch. Eigler
2010-03-23 14:10                   ` Arnaldo Carvalho de Melo
2010-03-23 15:23                     ` Peter Zijlstra
2010-03-22  7:24     ` Zhang, Yanmin
2010-03-22 16:44       ` Arnaldo Carvalho de Melo
2010-03-23  3:14         ` Zhang, Yanmin
2010-03-23 13:15           ` Arnaldo Carvalho de Melo
2010-03-24  1:39             ` Zhang, Yanmin
