From mboxrd@z Thu Jan 1 00:00:00 1970 From: Jan Kiszka Subject: Re: [Qemu-devel] [PATCH 28/35] kvm: x86: Introduce kvmclock device to save/restore its state Date: Thu, 20 Jan 2011 22:53:59 +0100 Message-ID: <4D38AEF7.8030506@web.de> References: <4D2B6CB5.9050602@codemonkey.ws> <4D2B74D8.4080309@web.de> <4D2B8662.9060909@web.de> <4D2C60FB.7030009@linux.vnet.ibm.com> <4D2D80ED.8030405@redhat.com> <4D2D82EE.20002@siemens.com> <4D35A39A.8000801@siemens.com> <4D35ABF8.9050700@linux.vnet.ibm.com> <4D35B521.3090601@siemens.com> <4D35B6DD.1020005@linux.vnet.ibm.com> <4D3717E7.3010105@linux.vnet.ibm.com> <4D38017D.2020401@siemens.com> <4D38A7A5.7090402@web.de> Mime-Version: 1.0 Content-Type: multipart/signed; micalg=pgp-sha1; protocol="application/pgp-signature"; boundary="------------enig342554B94F431E0667F9611C" Cc: Anthony Liguori , Markus Armbruster , "kvm@vger.kernel.org" , Glauber Costa , Marcelo Tosatti , "qemu-devel@nongnu.org" , Avi Kivity To: Blue Swirl Return-path: Received: from fmmailgate03.web.de ([217.72.192.234]:38039 "EHLO fmmailgate03.web.de" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1755379Ab1ATVyD (ORCPT ); Thu, 20 Jan 2011 16:54:03 -0500 In-Reply-To: Sender: kvm-owner@vger.kernel.org List-ID: This is an OpenPGP/MIME signed message (RFC 2440 and 3156) --------------enig342554B94F431E0667F9611C Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: quoted-printable On 2011-01-20 22:40, Blue Swirl wrote: > On Thu, Jan 20, 2011 at 9:22 PM, Jan Kiszka wrote: >> On 2011-01-20 20:27, Blue Swirl wrote: >>> On Thu, Jan 20, 2011 at 9:33 AM, Jan Kiszka = wrote: >>>> On 2011-01-19 20:32, Blue Swirl wrote: >>>>> On Wed, Jan 19, 2011 at 4:57 PM, Anthony Liguori >>>>> wrote: >>>>>> On 01/19/2011 07:15 AM, Markus Armbruster wrote: >>>>>>> >>>>>>> So they interact with KVM (need kvm_state), and they interact wit= h the >>>>>>> emulated PCI bus. Could you elaborate on the fundamental differe= nce >>>>>>> between the two interactions that makes you choose the (hypotheti= cal) >>>>>>> KVM bus over the PCI bus as device parent? >>>>>>> >>>>>> >>>>>> It's almost arbitrary, but I would say it's the direction that I/O= s flow. >>>>>> >>>>>> But if the underlying observation is that the device tree is not r= eally a >>>>>> tree, you're 100% correct. This is part of why a factory interfac= e that >>>>>> just takes a parent bus is too simplistic. >>>>>> >>>>>> I think we ought to introduce a -pci-device option that is specifi= cally for >>>>>> creating PCI devices that doesn't require a parent bus argument bu= t provides >>>>>> a way to specify stable addressing (for instancing, using a linear= index). >>>>> >>>>> I think kvm_state should not be a property of any device or bus. It= >>>>> should be split to more logical pieces. >>>>> >>>>> Some parts of it could remain in CPUState, because they are associa= ted >>>>> with a VCPU. >>>>> >>>>> Also, for example irqfd could be considered to be similar object to= >>>>> char or block devices provided by QEMU to devices. Would it make se= nse >>>>> to introduce new host types for passing parts of kvm_state to devic= es? >>>>> >>>>> I'd also make coalesced MMIO stuff part of memory object. We are no= t >>>>> passing any state references when using cpu_physical_memory_rw(), b= ut >>>>> that could be changed. >>>> >>>> There are currently no VCPU-specific bits remaining in kvm_state. >>> >>> I think fields vcpu_events, robust_singlestep, debugregs, >>> kvm_sw_breakpoints, xsave, xcrs belong to CPUX86State. They may be th= e >>> same for all VCPUs but still they are sort of CPU properties. I'm not= >>> sure about fd field. >> >> They are all properties of the currently loaded KVM subsystem in the >> host kernel. They can't change while KVM's root fd is opened. >> Replicating this static information into each and every VCPU state wou= ld >> be crazy. >=20 > Then each CPUX86State could have a pointer to common structure. That already exists. >=20 >> In fact, services like kvm_has_vcpu_events() already encode this: they= >> are static functions without any kvm_state reference that simply retur= n >> the content of those fields. Totally inconsistent to this, we force th= e >> caller of kvm_check_extension to pass a handle. This is part of my >> problem with the current situation and any halfhearted steps in this >> context. Either we work towards eliminating "static KVMState *kvm_stat= e" >> in kvm-all.c or eliminating KVMState. >=20 > If the CPU related fields are accessible through CPUState, the handle > should be available. >=20 >>>> It may >>>> be a good idea to introduce an arch-specific kvm_state and move rela= ted >>>> bits over. >>> >>> This should probably contain only irqchip_in_kernel, pit_in_kernel an= d >>> many_ioeventfds, maybe fd. >> >> fd is that root file descriptor you need for a few KVM services that a= re >> not bound to a specific VM - e.g. feature queries. It's not arch >> specific. Arch specific are e.g. robust_singlestep or xsave feature st= ates. >=20 > By arch you mean guest CPU architecture? They are not machine features.= No, they are practically static host features. >=20 >>> >>>> It may also once be feasible to carve out memory management >>>> related fields if we have proper abstractions for that, but I'm not >>>> completely sure here. >>> >>> I'd put slots, vmfd, coalesced_mmio, broken_set_mem_region, >>> migration_log into the memory object. >> >> vmfd is the VM-scope file descriptor you need at machine-level. The re= st >> logically belongs to a memory object, but I haven't looked at technica= l >> details yet. >> >>> >>>> Anyway, all these things are secondary. The primary topic here is ho= w to >>>> deal with kvm_state and its fields that have VM-global scope. >>> >>> If it is an opaque blob which contains various unrelated stuff, no >>> clear place will be found. >> >> We aren't moving fields yet (and we shouldn't). We first of all need t= o >> establish the handle distribution (which apparently requires quite som= e >> work in areas beyond KVM). >=20 > But I think this is exactly the problem. If the handle is for the > current KVMState, you'll indeed need it in various places and passing > it around will be cumbersome. By moving the fields around, the > information should be available more naturally. Yeah, if we had a MachineState or if we could agree on introducing it, I'm with you again. Improving the currently cumbersome KVM API interaction was the main motivation for my original patch. Jan --------------enig342554B94F431E0667F9611C Content-Type: application/pgp-signature; name="signature.asc" Content-Description: OpenPGP digital signature Content-Disposition: attachment; filename="signature.asc" -----BEGIN PGP SIGNATURE----- Version: GnuPG v2.0.15 (GNU/Linux) Comment: Using GnuPG with SUSE - http://enigmail.mozdev.org/ iEYEARECAAYFAk04rvcACgkQitSsb3rl5xQVrQCg2I/c1rj3R3CwcPtjpk+pfXb1 czgAoNGGloxxHIllgdbhlY+yo4+icI66 =3KAH -----END PGP SIGNATURE----- --------------enig342554B94F431E0667F9611C-- From mboxrd@z Thu Jan 1 00:00:00 1970 Received: from [140.186.70.92] (port=53448 helo=eggs.gnu.org) by lists.gnu.org with esmtp (Exim 4.43) id 1Pg2SB-0003Ss-KI for qemu-devel@nongnu.org; Thu, 20 Jan 2011 16:54:04 -0500 Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71) (envelope-from ) id 1Pg2SA-0006DB-71 for qemu-devel@nongnu.org; Thu, 20 Jan 2011 16:54:03 -0500 Received: from fmmailgate03.web.de ([217.72.192.234]:38030) by eggs.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1Pg2S9-0006Cf-ND for qemu-devel@nongnu.org; Thu, 20 Jan 2011 16:54:02 -0500 Message-ID: <4D38AEF7.8030506@web.de> Date: Thu, 20 Jan 2011 22:53:59 +0100 From: Jan Kiszka MIME-Version: 1.0 Subject: Re: [Qemu-devel] [PATCH 28/35] kvm: x86: Introduce kvmclock device to save/restore its state References: <4D2B6CB5.9050602@codemonkey.ws> <4D2B74D8.4080309@web.de> <4D2B8662.9060909@web.de> <4D2C60FB.7030009@linux.vnet.ibm.com> <4D2D80ED.8030405@redhat.com> <4D2D82EE.20002@siemens.com> <4D35A39A.8000801@siemens.com> <4D35ABF8.9050700@linux.vnet.ibm.com> <4D35B521.3090601@siemens.com> <4D35B6DD.1020005@linux.vnet.ibm.com> <4D3717E7.3010105@linux.vnet.ibm.com> <4D38017D.2020401@siemens.com> <4D38A7A5.7090402@web.de> In-Reply-To: Content-Type: multipart/signed; micalg=pgp-sha1; protocol="application/pgp-signature"; boundary="------------enig342554B94F431E0667F9611C" Sender: jan.kiszka@web.de List-Id: qemu-devel.nongnu.org List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , To: Blue Swirl Cc: "kvm@vger.kernel.org" , Glauber Costa , Marcelo Tosatti , Markus Armbruster , "qemu-devel@nongnu.org" , Anthony Liguori , Avi Kivity This is an OpenPGP/MIME signed message (RFC 2440 and 3156) --------------enig342554B94F431E0667F9611C Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: quoted-printable On 2011-01-20 22:40, Blue Swirl wrote: > On Thu, Jan 20, 2011 at 9:22 PM, Jan Kiszka wrote: >> On 2011-01-20 20:27, Blue Swirl wrote: >>> On Thu, Jan 20, 2011 at 9:33 AM, Jan Kiszka = wrote: >>>> On 2011-01-19 20:32, Blue Swirl wrote: >>>>> On Wed, Jan 19, 2011 at 4:57 PM, Anthony Liguori >>>>> wrote: >>>>>> On 01/19/2011 07:15 AM, Markus Armbruster wrote: >>>>>>> >>>>>>> So they interact with KVM (need kvm_state), and they interact wit= h the >>>>>>> emulated PCI bus. Could you elaborate on the fundamental differe= nce >>>>>>> between the two interactions that makes you choose the (hypotheti= cal) >>>>>>> KVM bus over the PCI bus as device parent? >>>>>>> >>>>>> >>>>>> It's almost arbitrary, but I would say it's the direction that I/O= s flow. >>>>>> >>>>>> But if the underlying observation is that the device tree is not r= eally a >>>>>> tree, you're 100% correct. This is part of why a factory interfac= e that >>>>>> just takes a parent bus is too simplistic. >>>>>> >>>>>> I think we ought to introduce a -pci-device option that is specifi= cally for >>>>>> creating PCI devices that doesn't require a parent bus argument bu= t provides >>>>>> a way to specify stable addressing (for instancing, using a linear= index). >>>>> >>>>> I think kvm_state should not be a property of any device or bus. It= >>>>> should be split to more logical pieces. >>>>> >>>>> Some parts of it could remain in CPUState, because they are associa= ted >>>>> with a VCPU. >>>>> >>>>> Also, for example irqfd could be considered to be similar object to= >>>>> char or block devices provided by QEMU to devices. Would it make se= nse >>>>> to introduce new host types for passing parts of kvm_state to devic= es? >>>>> >>>>> I'd also make coalesced MMIO stuff part of memory object. We are no= t >>>>> passing any state references when using cpu_physical_memory_rw(), b= ut >>>>> that could be changed. >>>> >>>> There are currently no VCPU-specific bits remaining in kvm_state. >>> >>> I think fields vcpu_events, robust_singlestep, debugregs, >>> kvm_sw_breakpoints, xsave, xcrs belong to CPUX86State. They may be th= e >>> same for all VCPUs but still they are sort of CPU properties. I'm not= >>> sure about fd field. >> >> They are all properties of the currently loaded KVM subsystem in the >> host kernel. They can't change while KVM's root fd is opened. >> Replicating this static information into each and every VCPU state wou= ld >> be crazy. >=20 > Then each CPUX86State could have a pointer to common structure. That already exists. >=20 >> In fact, services like kvm_has_vcpu_events() already encode this: they= >> are static functions without any kvm_state reference that simply retur= n >> the content of those fields. Totally inconsistent to this, we force th= e >> caller of kvm_check_extension to pass a handle. This is part of my >> problem with the current situation and any halfhearted steps in this >> context. Either we work towards eliminating "static KVMState *kvm_stat= e" >> in kvm-all.c or eliminating KVMState. >=20 > If the CPU related fields are accessible through CPUState, the handle > should be available. >=20 >>>> It may >>>> be a good idea to introduce an arch-specific kvm_state and move rela= ted >>>> bits over. >>> >>> This should probably contain only irqchip_in_kernel, pit_in_kernel an= d >>> many_ioeventfds, maybe fd. >> >> fd is that root file descriptor you need for a few KVM services that a= re >> not bound to a specific VM - e.g. feature queries. It's not arch >> specific. Arch specific are e.g. robust_singlestep or xsave feature st= ates. >=20 > By arch you mean guest CPU architecture? They are not machine features.= No, they are practically static host features. >=20 >>> >>>> It may also once be feasible to carve out memory management >>>> related fields if we have proper abstractions for that, but I'm not >>>> completely sure here. >>> >>> I'd put slots, vmfd, coalesced_mmio, broken_set_mem_region, >>> migration_log into the memory object. >> >> vmfd is the VM-scope file descriptor you need at machine-level. The re= st >> logically belongs to a memory object, but I haven't looked at technica= l >> details yet. >> >>> >>>> Anyway, all these things are secondary. The primary topic here is ho= w to >>>> deal with kvm_state and its fields that have VM-global scope. >>> >>> If it is an opaque blob which contains various unrelated stuff, no >>> clear place will be found. >> >> We aren't moving fields yet (and we shouldn't). We first of all need t= o >> establish the handle distribution (which apparently requires quite som= e >> work in areas beyond KVM). >=20 > But I think this is exactly the problem. If the handle is for the > current KVMState, you'll indeed need it in various places and passing > it around will be cumbersome. By moving the fields around, the > information should be available more naturally. Yeah, if we had a MachineState or if we could agree on introducing it, I'm with you again. Improving the currently cumbersome KVM API interaction was the main motivation for my original patch. Jan --------------enig342554B94F431E0667F9611C Content-Type: application/pgp-signature; name="signature.asc" Content-Description: OpenPGP digital signature Content-Disposition: attachment; filename="signature.asc" -----BEGIN PGP SIGNATURE----- Version: GnuPG v2.0.15 (GNU/Linux) Comment: Using GnuPG with SUSE - http://enigmail.mozdev.org/ iEYEARECAAYFAk04rvcACgkQitSsb3rl5xQVrQCg2I/c1rj3R3CwcPtjpk+pfXb1 czgAoNGGloxxHIllgdbhlY+yo4+icI66 =3KAH -----END PGP SIGNATURE----- --------------enig342554B94F431E0667F9611C--