From mboxrd@z Thu Jan  1 00:00:00 1970
From: Anthony Liguori <anthony@codemonkey.ws>
Subject: Re: [Qemu-devel] Re: [PATCH 26/35] kvm: Eliminate KVMState arguments
Date: Mon, 10 Jan 2011 14:11:34 -0600
Message-ID: <4D2B67F6.5030909@codemonkey.ws>
References: <cover.1294336601.git.mtosatti@redhat.com>	<ac450b882064664c79ae02e095155675ba560e88.1294336601.git.mtosatti@redhat.com>	<4D2616D6.4080309@linux.vnet.ibm.com> <4D26D6CF.5070405@web.de>	<4D27A16F.9030809@linux.vnet.ibm.com> <4D282489.90506@web.de>
Mime-Version: 1.0
Content-Type: text/plain; charset=ISO-8859-1; format=flowed
Content-Transfer-Encoding: 7bit
Cc: Marcelo Tosatti <mtosatti@redhat.com>, qemu-devel@nongnu.org,
	kvm@vger.kernel.org, Alexander Graf <agraf@suse.de>
To: Jan Kiszka <jan.kiszka@web.de>
Return-path: <kvm-owner@vger.kernel.org>
Received: from mail-iw0-f174.google.com ([209.85.214.174]:57967 "EHLO
	mail-iw0-f174.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org
	with ESMTP id S1754081Ab1AJUMQ (ORCPT <rfc822;kvm@vger.kernel.org>);
	Mon, 10 Jan 2011 15:12:16 -0500
Received: by iwn9 with SMTP id 9so19513809iwn.19
        for <kvm@vger.kernel.org>; Mon, 10 Jan 2011 12:12:15 -0800 (PST)
In-Reply-To: <4D282489.90506@web.de>
Sender: kvm-owner@vger.kernel.org
List-ID: <kvm.vger.kernel.org>

On 01/08/2011 02:47 AM, Jan Kiszka wrote:
> OK, but I don't want to argue about the ioeventfd API. So let's put this
> case aside. :)
>    

I often reply too quickly without explaining myself.  Let me use 
ioeventfd as an example to highlight why KVMState is a good thing.

In real life, PIO and MMIO are never communicated directly from the
processor to the device.  Instead, they go through a series of other
devices.  In the case of something like an ISA device, a PIO first goes
to the chipset and into the PCI complex.  It then goes through a
PCI-to-ISA bridge via subtractive decoding and is forwarded over the
ISA bus, where it is interpreted by some device.

The path to the chipset may be shared among different processors but it 
may also be unique.  The APIC is the best example as there are historic 
APICs that hung directly off of the CPUs such that the same MMIO access 
across different CPUs did not go to the same device.  The APIC
emulation in QEMU is so weird precisely because we don't model this
behavior correctly.

This means that a PIO operation needs to flow from a CPUState to a 
DeviceState.  It can then flow through to another DeviceState until it's 
finally handled.
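
To make that flow concrete, here is a rough sketch of a PIO write
walking from a CPU through a chain of devices.  None of these names are
QEMU's: CPU, Device, cpu_pio_write() and the three example devices are
all made up for illustration, and the port ranges are just the
conventional PC ones.  The only point is that the access starts at a
per-CPU root and is handed from device to device until one claims it.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical types for illustration; not QEMU's actual structures. */
typedef struct Device Device;
struct Device {
    const char *name;
    /* Returns true if this device consumed the write. */
    bool (*pio_write)(Device *dev, uint16_t port, uint32_t val);
    Device *downstream;  /* next hop when this device does not claim it */
    uint32_t last_val;   /* last value this device accepted */
    int hits;            /* number of writes this device accepted */
};

typedef struct CPU {
    int index;
    Device *io_root;     /* first hop for this CPU; may differ per CPU */
} CPU;

/* Walk the path from the CPU until some device handles the access.
 * Returns the device that handled it, or NULL if nobody did. */
static Device *cpu_pio_write(CPU *cpu, uint16_t port, uint32_t val)
{
    for (Device *d = cpu->io_root; d != NULL; d = d->downstream) {
        if (d->pio_write(d, port, val)) {
            return d;
        }
    }
    return NULL;
}

/* Example PCI host: claims only the config address/data ports. */
static bool pci_host_write(Device *dev, uint16_t port, uint32_t val)
{
    if (port < 0xcf8 || port > 0xcff) {
        return false;
    }
    dev->last_val = val;
    dev->hits++;
    return true;
}

/* Example PCI-to-ISA bridge: claims nothing itself, so whatever
 * reaches it falls through to the ISA side (subtractive decoding). */
static bool isa_bridge_write(Device *dev, uint16_t port, uint32_t val)
{
    (void)dev;
    (void)port;
    (void)val;
    return false;
}

/* Example ISA device: a serial port data register at 0x3f8. */
static bool serial_write(Device *dev, uint16_t port, uint32_t val)
{
    if (port != 0x3f8) {
        return false;
    }
    dev->last_val = val;
    dev->hits++;
    return true;
}
```

Wiring it up as host -> bridge -> serial and pointing a CPU's io_root
at the host, a write to 0x3f8 ends up in serial_write() without the CPU
code knowing anything about the serial device.  Giving two CPUs
different io_root pointers is how a per-CPU device like the APIC could
be modelled in this scheme.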

The first problem with ioeventfd is that it's a per-VM operation.  It 
should be per VCPU.

But even if this were the case, the path that a PIO operation takes 
should not be impacted by ioeventfd.  IOW, a device shouldn't be 
allocating an eventfd() and handing it to a magical KVM call.  Instead, 
a device should register a callback for a particular port in the same 
way it always does.  *As an optimization*, we should have another
interface that says that only these values are valid for this IO port.
That would let us create eventfds and register things behind the scenes.
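
Roughly, the device-facing side could look like the sketch below.  This
is not an existing QEMU interface: register_pio_write(),
pio_hint_single_value(), dispatch_pio_write() and the table behind them
are hypothetical names, and I'm reading "these values" as "the one
value the guest is expected to write".  The thing to notice is that the
hint is pure metadata.  The callback path keeps working whether or not
anyone looks at the hint.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_PORTS 16  /* arbitrary table size for this sketch */

typedef void (*PioWriteFn)(void *opaque, uint16_t port, uint32_t val);

/* Hypothetical per-port record; not a real QEMU structure. */
typedef struct PortHandler {
    bool in_use;
    uint16_t port;
    PioWriteFn write;
    void *opaque;
    bool has_hint;      /* device said only hint_val is meaningful */
    uint32_t hint_val;
} PortHandler;

static PortHandler port_table[MAX_PORTS];

/* Ordinary registration: the device asks for a callback on a port. */
static PortHandler *register_pio_write(uint16_t port, PioWriteFn fn,
                                       void *opaque)
{
    for (size_t i = 0; i < MAX_PORTS; i++) {
        if (!port_table[i].in_use) {
            port_table[i] = (PortHandler){
                .in_use = true, .port = port,
                .write = fn, .opaque = opaque,
            };
            return &port_table[i];
        }
    }
    return NULL;  /* table full */
}

/* Optional optimization hint.  It records what the device promised
 * and does nothing else; no eventfd or KVM call is made here. */
static void pio_hint_single_value(PortHandler *h, uint32_t val)
{
    h->has_hint = true;
    h->hint_val = val;
}

/* Generic dispatch: always correct, with or without a hint. */
static bool dispatch_pio_write(uint16_t port, uint32_t val)
{
    for (size_t i = 0; i < MAX_PORTS; i++) {
        if (port_table[i].in_use && port_table[i].port == port) {
            port_table[i].write(port_table[i].opaque, port, val);
            return true;
        }
    }
    return false;
}

/* Example device callback: counts notifications ("kicks"). */
static void queue_notify(void *opaque, uint16_t port, uint32_t val)
{
    (void)port;
    (void)val;
    (*(int *)opaque)++;
}
```

A device would call register_pio_write(port, queue_notify, &counter)
exactly as it would for any other port, and then optionally
pio_hint_single_value() on the result.  It never sees an eventfd or a
KVMState.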

That means we can handle TCG, older KVM kernels, and newer KVM kernels 
without any special support in the device model.  It also means that the 
device models never have to worry about KVMState because there's an 
entirely different piece of code that's consulting the set of special 
ports and then deciding how to handle it.  The result is better, more 
portable code that doesn't have KVM-isms.
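
As a sketch of that "entirely different piece of code", the decision
could be as small as the function below.  The enum names, PortInfo and
choose_delivery() are invented for this mail, and the three accelerator
cases are simply the three I listed above.  In a real implementation
the DELIVER_EVENTFD branch is where the eventfd would be created and
handed to the kernel (the KVM_IOEVENTFD ioctl); that part is left out
here.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical classification of what we are running on. */
typedef enum Accel {
    ACCEL_TCG,               /* no KVM at all */
    ACCEL_KVM_NO_IOEVENTFD,  /* older KVM kernel */
    ACCEL_KVM_IOEVENTFD      /* newer KVM kernel with ioeventfd */
} Accel;

typedef enum Delivery {
    DELIVER_CALLBACK,        /* exit to userspace, run the callback */
    DELIVER_EVENTFD          /* kernel signals an eventfd directly */
} Delivery;

/* What the generic layer knows about a registered port. */
typedef struct PortInfo {
    uint16_t port;
    bool has_hint;           /* device supplied the optimization hint */
    uint32_t hint_val;
} PortInfo;

/* Decide how writes to a port are delivered.  Only this function
 * knows about KVM; the device model supplied a callback and,
 * optionally, a hint. */
static Delivery choose_delivery(Accel accel, const PortInfo *p)
{
    if (accel == ACCEL_KVM_IOEVENTFD && p->has_hint) {
        return DELIVER_EVENTFD;
    }
    /* TCG, older KVM, or no hint: plain callback path. */
    return DELIVER_CALLBACK;
}
```

The device code is identical in all three cases; only the result of
choose_delivery() changes.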

If passing state around seems to be ugly, it's probably because we're 
not abstracting things correctly.  Removing the state and making it 
implicit is the wrong solution.  Fixing the abstraction is the right 
solution (or living with the ugliness until someone else is motivated to 
fix it properly).

Regards,

Anthony Liguori