From mboxrd@z Thu Jan 1 00:00:00 1970
From: Alexander Graf
Subject: Re: RFC: New API for PPC for vcpu mmu access
Date: Fri, 11 Feb 2011 02:41:35 +0100
Message-ID: <49812881-9E7C-4295-B708-CFA986EE9500@suse.de>
References: <9F6FE96B71CF29479FF1CDC8046E15030BCD40@039-SN1MPN1-002.039d.mgd.msft.net> <20110202160821.5a223366@udp111988uds> <20110204163338.54690220@udp111988uds> <30BEE027-929B-43E5-A638-A58389F90B6F@suse.de> <20110207141547.58e49caa@udp111988uds> <220F22AA-31E5-4ACB-B0D5-557010096B91@suse.de> <20110209170928.6c629514@udp111988uds> <4D53CFE2.6080008@suse.de> <20110210125112.6d1f0380@udp111988uds> <8ACEDFEA-AA7F-400F-88F1-5F99864E8AAF@suse.de> <63E8AA2B-685F-4360-9BC8-E760A2CAD570@suse.de>
Mime-Version: 1.0 (Apple Message framework v1082)
Content-Type: text/plain; charset=us-ascii
Content-Transfer-Encoding: 8BIT
Cc: Yoder Stuart-B08248, kvm-ppc@vger.kernel.org, "kvm@vger.kernel.org list", "qemu-devel@nongnu.org List"
To: Scott Wood
Return-path:
In-Reply-To: <63E8AA2B-685F-4360-9BC8-E760A2CAD570@suse.de>
Sender: kvm-ppc-owner@vger.kernel.org
List-Id: kvm.vger.kernel.org

On 11.02.2011, at 01:22, Alexander Graf wrote:

> 
> On 11.02.2011, at 01:20, Alexander Graf wrote:
> 
>> 
>> On 10.02.2011, at 19:51, Scott Wood wrote:
>> 
>>> On Thu, 10 Feb 2011 12:45:38 +0100
>>> Alexander Graf wrote:
>>> 
>>>> Ok, thinking about this a bit more. You're basically proposing a list of
>>>> tlb set calls, with each array field identifying one tlb set call. What
>>>> I was thinking of was a full TLB sync, so we could keep qemu's internal
>>>> TLB representation identical to the ioctl layout and then just call that
>>>> one ioctl to completely overwrite all of qemu's internal data (and vice
>>>> versa).
>>> 
>>> No, this is a full sync -- the list replaces any existing TLB entries (need
>>> to make that explicit in the doc). Basically it's an invalidate plus a
>>> list of tlb set operations.
>>> 
>>> Qemu's internal representation will want to be ordered with no missing
>>> entries. If we require that of the transfer representation we can't do
>>> early termination. It would also limit Qemu's flexibility in choosing its
>>> internal representation, and make it more awkward to support multiple MMU
>>> types.
>> 
>> Well, but this way it means we'll have to assemble/disassemble a list of entries multiple times:
>> 
>> SET:
>> * qemu assembles the list from its internal representation
>> * kvm disassembles the list into its internal structure
>> 
>> GET:
>> * kvm assembles the list from its internal representation
>> * qemu disassembles the list into its internal structure
>> 
>> Maybe we should go with Avi's proposal after all and simply keep the full soft-mmu synced between kernel and user space? That way we only need a setup call at first, no copying in between, and simply update the user space version whenever something changes in the guest. We need to store the TLB's contents off somewhere anyways, so all we need is an additional in-kernel array with internal translation data, but that can be separate from the guest visible data, right?
> 
> If we could then keep qemu's internal representation == shared data with kvm == kvm's internal data for guest visible stuff, we get this done with almost no additional overhead. And I don't see any problem with this. Should be easily doable.
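Just to make that concrete, the shared block could be little more than the guest-visible TLB entries laid out as a flat array, plus a small header so both sides agree on the geometry. Something along these lines -- all names are made up here, this is only a sketch of the layout idea (assuming an e500-style MAS encoding), not a proposal for the actual header:

#include <linux/types.h>

/* Sketch only -- not an existing kvm structure.  One guest-visible
 * TLB entry, assuming an e500-style MAS register encoding. */
struct kvmppc_tlbe_sketch {
	__u32 mas1;	/* valid bit, TID, TS, TSIZE */
	__u64 mas2;	/* EPN plus WIMGE attributes */
	__u64 mas7_3;	/* RPN plus permission bits */
};

/* The block qemu and kvm would both map.  qemu's soft-mmu state, the
 * shared data and kvm's guest-visible state would then literally be
 * the same memory; only kvm's shadow/translation state stays private. */
struct kvmppc_shared_tlb_sketch {
	__u32 mmu_type;		/* which MMU model the layout describes */
	__u32 num_entries;	/* total entries across all TLB arrays */
	struct kvmppc_tlbe_sketch entries[];
};

The in-kernel translation data mentioned above would then live in a second, kernel-private array alongside this one.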
So then everything we need for full functionality is a hint from kernel to user space that something changed, and vice versa.

From kernel to user space is simple. We can just document that after every RUN, all fields may have been modified.

From user space to kernel, we could modify the entries directly and then issue an ioctl that passes a dirty bitmap to kernel space. KVM can then decide what to do with it. I guess the easiest implementation for now would be to ignore the bitmap and simply flush the shadow tlb.
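For the sake of discussion, the ioctl argument could be as small as this -- again invented names and only a sketch, with one bit per entry of the shared array:

#include <linux/types.h>

/* Sketch of the argument for a hypothetical "user space dirtied the
 * shared TLB" ioctl.  num_dirty is only a hint; a first implementation
 * is free to ignore the bitmap entirely and flush everything. */
struct kvmppc_dirty_tlb_sketch {
	__u64 bitmap;		/* user space address of the dirty bitmap */
	__u32 num_dirty;	/* hint: number of entries actually touched */
};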
That gives us the flush almost for free. All we need to do is set the tlb to all zeros (which should happen in env init anyway) and issue the "something changed" call. KVM can then decide to either drop all of its shadow state or loop through every shadow entry and flush it individually. Maybe we should also give a hint on the number of entries that changed, so KVM can implement some threshold.
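The threshold logic on the kvm side could then be as dumb as this -- plain C sketch, the helpers and the threshold value are made up:

#include <limits.h>

#define FLUSH_ALL_THRESHOLD 64	/* arbitrary value, would need tuning */

/* Hypothetical helpers -- stand-ins for whatever the shadow TLB
 * management ends up looking like. */
void flush_all_shadow_tlb(void);
void flush_shadow_tlb_entry(unsigned int idx);

static void handle_tlb_dirty_hint(const unsigned long *bitmap,
                                  unsigned int num_dirty,
                                  unsigned int num_entries)
{
	const unsigned int bits = sizeof(unsigned long) * CHAR_BIT;
	unsigned int i;

	if (num_dirty > FLUSH_ALL_THRESHOLD) {
		/* too many changes: dropping all shadow state is cheaper */
		flush_all_shadow_tlb();
		return;
	}

	for (i = 0; i < num_entries; i++)
		if (bitmap[i / bits] & (1UL << (i % bits)))
			flush_shadow_tlb_entry(i);
}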
Also, please tell me you didn't implement the previous revisions already. It'd be a real bummer to see that work wasted only because we're still iterating through the spec O_o.


Alex