From mboxrd@z Thu Jan  1 00:00:00 1970
From: Anthony Liguori <anthony@codemonkey.ws>
Subject: Re: KVM usability
Date: Mon, 01 Mar 2010 09:33:47 -0600
Message-ID: <4B8BDE5B.8090201@codemonkey.ws>
References: <1267089644.12790.74.camel@laptop> <1267152599.1726.76.camel@localhost> <20100226090147.GH15885@elte.hu> <4B879A2F.50203@redhat.com> <20100226103545.GA7463@elte.hu> <4B87A6BF.3090301@redhat.com> <20100226111734.GE7463@elte.hu> <4B8813F2.8090208@redhat.com> <20100227105643.GA17425@elte.hu> <4B893B2B.40301@redhat.com> <20100227172546.GA31472@elte.hu>
Mime-Version: 1.0
Content-Type: text/plain; charset=ISO-8859-1; format=flowed
Content-Transfer-Encoding: 7bit
Cc: Zachary Amsden <zamsden@redhat.com>, Avi Kivity <avi@redhat.com>,
	"Zhang, Yanmin" <yanmin_zhang@linux.intel.com>,
	Peter Zijlstra <peterz@infradead.org>, ming.m.lin@intel.com,
	sheng.yang@intel.com, Jes Sorensen <Jes.Sorensen@redhat.com>,
	KVM General <kvm@vger.kernel.org>,
	Gleb Natapov <gleb@redhat.com>,
	Arnaldo Carvalho de Melo <acme@redhat.com>,
	Frédéric Weisbecker <fweisbec@gmail.com>,
	Thomas Gleixner <tglx@linutronix.de>,
	"H. Peter Anvin" <hpa@zytor.com>,
	Peter Zijlstra <a.p.zijlstra@chello.nl>,
	Arjan van de Ven <arjan@infradead.org>
To: Ingo Molnar <mingo@elte.hu>
Return-path: <kvm-owner@vger.kernel.org>
Received: from qw-out-2122.google.com ([74.125.92.25]:58222 "EHLO
	qw-out-2122.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org
	with ESMTP id S1751281Ab0CAPdz (ORCPT <rfc822;kvm@vger.kernel.org>);
	Mon, 1 Mar 2010 10:33:55 -0500
Received: by qw-out-2122.google.com with SMTP id 8so508384qwh.37
        for <kvm@vger.kernel.org>; Mon, 01 Mar 2010 07:33:54 -0800 (PST)
In-Reply-To: <20100227172546.GA31472@elte.hu>
Sender: kvm-owner@vger.kernel.org
List-ID: <kvm.vger.kernel.org>

On 02/27/2010 11:25 AM, Ingo Molnar wrote:
> * Zachary Amsden<zamsden@redhat.com>  wrote:
>
> [...]
>    
>> Second, it's not over-modularized.  The modules are the individual
>> components of the architecture.  How would you propose to put it
>> differently.  They really can't naturally combine.  And with the
>> code quality of qemu in general being problematic by Linux kernel
>> standards, it's not natural to move the device emulation directly
>> into the kernel module.  So this is why we are where we are today.
>>      
> I'm not talking about moving it into a kernel _module_ - albeit that
> alone is a worthwhile thing to do for any performance sensitive hw
> component.
>
> I was talking about the option of a clean, stripped down Qemu base
> hosted in the kernel proper, in linux/tools/kvm/ or so. If i were
> running a virtualization effort it would be the first place i'd
> consider to put my tooling into.
>    

Let's ignore the suggestion of hosting it in the kernel.  There's no 
reason it couldn't be just as successful as a separate project.

Let's consider what you would strip out of qemu.  You would obviously 
pull out TCG and the device emulation that isn't useful for KVM.  You 
can't compile out TCG today, but you can already compile out most device 
emulation, so stripping it doesn't buy you much.  It certainly doesn't 
fix any of the problems you outlined.

The GUI wouldn't change at all.  You still have the same fundamental 
problem: whatever this userspace executable is, it is not the place to 
implement a user-friendly GUI.  That has to be a separate process.  
Maybe you could put that separate process in the same repository as the 
core process, but we can do that with qemu too.

> It would be a no-brainer: most of the devs come from the KVM side, and
> KVM itself makes little sense without Qemu, and Qemu makes little sense
> without KVM these days. (and i know about the non-KVM and non-x86
> roots of Qemu - still, it's not a significant piece of usage today)
>    

Do you have statistics to back this up?  You would probably be surprised 
at how many people use TCG.

To be honest, every KVM developer, including myself, has considered and 
even prototyped exactly what you described.  We've all independently 
come to the same conclusion: it's easier to incrementally improve qemu 
than it is to split the code base and try to maintain the fork.

And many of the vendors who forked qemu in the past have learned the 
hard way that a fork is harder to maintain, and are now merging back 
into upstream qemu.

We could certainly make the same argument about forking the kernel to 
optimize it for virtualization.  If we took Linux and added it to the 
qemu git tree, we would instantly have transparent large page support 
for users instead of having to wait years to get it properly 
integrated.  We could also add gang scheduling and hard scheduler limits 
to the kernel.  But we know better: even though the process is more 
painful and drawn out, we end up with a much better solution in the long 
run by incorporating input and feedback from people like you.

Xen clearly made a different decision and is still suffering the 
consequences.  They did exactly what you describe with qemu, have since 
realized it was a mistake, and are now working to merge their changes 
into upstream qemu.

There are *plenty* of usability issues (like transparent large pages) 
that need to be addressed at the KVM/kernel level.  Today, a user has to 
choose between a ~30% performance loss on Java workloads and the 
ability to overcommit memory.  It's a pretty significant problem, and 
there's been a lot of resistance within the kernel community to fixing 
it.

Likewise, I'm seeing a good number of people hit problems with 
lock-holder preemption in the field.  It's absolutely a usability 
problem when a user sees catastrophically bad performance running an 
8-VCPU virtual machine on a 2-socket host.  Whether it's gang scheduling 
or directed yields plus pause-loop detection, we definitely need some 
scheduler changes to fix this problem.

Not having an option enabled by default is an annoyance that a user 
eventually overcomes with the help of documentation.  Performance 
problems are deal breakers that lead users to switch to another 
virtualization technology.

Just stripping down qemu and putting the result in the kernel source 
tree doesn't fix anything.  We have plenty of hard problems to solve 
already.

Regards,

Anthony Liguori