From mboxrd@z Thu Jan 1 00:00:00 1970 From: Gleb Natapov Subject: Re: Semantics of "-cpu host" (was Re: [PATCH 2/2] Expose tsc deadline timer cpuid to guest) Date: Thu, 10 May 2012 15:53:42 +0300 Message-ID: <20120510125342.GB4296@redhat.com> References: <20120507182142.GD16951@otherpad.lan.raisama.net> <20120508201441.GN4373@otherpad.lan.raisama.net> <6BF7428F-FDEF-4497-94F5-7A43BC9E1E67@suse.de> <20120509081404.GO15960@redhat.com> <0FA57537-0C33-468F-B416-AEB2487A9DFD@suse.de> <20120509085151.GQ15960@redhat.com> <20120509093837.GT15960@redhat.com> <20120509193802.GE4373@otherpad.lan.raisama.net> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Cc: Andre Przywara , "kvm@vger.kernel.org" , Jan Kiszka , "qemu-devel@nongnu.org" , Alexander Graf , Avi Kivity To: Eduardo Habkost Return-path: Content-Disposition: inline In-Reply-To: <20120509193802.GE4373@otherpad.lan.raisama.net> List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: qemu-devel-bounces+gceq-qemu-devel=gmane.org@nongnu.org Sender: qemu-devel-bounces+gceq-qemu-devel=gmane.org@nongnu.org List-Id: kvm.vger.kernel.org On Wed, May 09, 2012 at 04:38:02PM -0300, Eduardo Habkost wrote: > On Wed, May 09, 2012 at 12:38:37PM +0300, Gleb Natapov wrote: > > On Wed, May 09, 2012 at 11:05:58AM +0200, Alexander Graf wrote: > > > > > > On 09.05.2012, at 10:51, Gleb Natapov wrote: > > > > > > > On Wed, May 09, 2012 at 10:42:26AM +0200, Alexander Graf wrote: > > > >> > > > >> > > > >> On 09.05.2012, at 10:14, Gleb Natapov wrote: > > > >> > > > >>> On Wed, May 09, 2012 at 12:07:04AM +0200, Alexander Graf wrote: > > > >>>> > > > >>>> On 08.05.2012, at 22:14, Eduardo Habkost wrote: > > > >>>> > > > >>>>> On Tue, May 08, 2012 at 02:58:11AM +0200, Alexander Graf wrote: > > > >>>>>> On 07.05.2012, at 20:21, Eduardo Habkost wrote: > > > >>>>>> > > > >>>>>>> > > > >>>>>>> Andre? Are you able to help to answer the question below? > > > >>>>>>> > > > >>>>>>> I would like to clarify what's the expected behavior of "-cpu host" to > > > >>>>>>> be able to continue working on it. I believe the code will need to be > > > >>>>>>> fixed on either case, but first we need to figure out what are the > > > >>>>>>> expectations/requirements, to know _which_ changes will be needed. > > > >>>>>>> > > > >>>>>>> > > > >>>>>>> On Tue, Apr 24, 2012 at 02:19:25PM -0300, Eduardo Habkost wrote: > > > >>>>>>>> (CCing Andre Przywara, in case he can help to clarify what's the > > > >>>>>>>> expected meaning of "-cpu host") > > > >>>>>>>> > > > >>>>>>> [...] > > > >>>>>>>> I am not sure I understand what you are proposing. Let me explain the > > > >>>>>>>> use case I am thinking about: > > > >>>>>>>> > > > >>>>>>>> - Feature FOO is of type (A) (e.g. just a new instruction set that > > > >>>>>>>> doesn't require additional userspace support) > > > >>>>>>>> - User has a Qemu vesion that doesn't know anything about feature FOO > > > >>>>>>>> - User gets a new CPU that supports feature FOO > > > >>>>>>>> - User gets a new kernel that supports feature FOO (i.e. has FOO in > > > >>>>>>>> GET_SUPPORTED_CPUID) > > > >>>>>>>> - User does _not_ upgrade Qemu. > > > >>>>>>>> - User expects to get feature FOO enabled if using "-cpu host", without > > > >>>>>>>> upgrading Qemu. > > > >>>>>>>> > > > >>>>>>>> The problem here is: to support the above use-case, userspace need a > > > >>>>>>>> probing mechanism that can differentiate _new_ (previously unknown) > > > >>>>>>>> features that are in group (A) (safe to blindly enable) from features > > > >>>>>>>> that are in group (B) (that can't be enabled without an userspace > > > >>>>>>>> upgrade). > > > >>>>>>>> > > > >>>>>>>> In short, it becomes a problem if we consider the following case: > > > >>>>>>>> > > > >>>>>>>> - Feature BAR is of type (B) (it can't be enabled without extra > > > >>>>>>>> userspace support) > > > >>>>>>>> - User has a Qemu version that doesn't know anything about feature BAR > > > >>>>>>>> - User gets a new CPU that supports feature BAR > > > >>>>>>>> - User gets a new kernel that supports feature BAR (i.e. has BAR in > > > >>>>>>>> GET_SUPPORTED_CPUID) > > > >>>>>>>> - User does _not_ upgrade Qemu. > > > >>>>>>>> - User simply shouldn't get feature BAR enabled, even if using "-cpu > > > >>>>>>>> host", otherwise Qemu would break. > > > >>>>>>>> > > > >>>>>>>> If userspace always limited itself to features it knows about, it would > > > >>>>>>>> be really easy to implement the feature without any new probing > > > >>>>>>>> mechanism from the kernel. But that's not how I think users expect "-cpu > > > >>>>>>>> host" to work. Maybe I am wrong, I don't know. I am CCing Andre, who > > > >>>>>>>> introduced the "-cpu host" feature, in case he can explain what's the > > > >>>>>>>> expected semantics on the cases above. > > > >>>>>> > > > >>>>>> Can you think of any feature that'd go into category B? > > > >>>>> > > > >>>>> - TSC-deadline: can't be enabled unless userspace takes care to enable > > > >>>>> the in-kernel irqchip. > > > >>>> > > > >>>> The kernel can check if in-kernel irqchip has it enabled and otherwise mask it out, no? > > > >>>> > > > >>> How kernel should know that userspace does not emulate it? > > > >> > > > >> You have to enable the in-kernel apic to use it, at which point the kernel knows it's in use, right? > > > >> > > > >>> > > > >>>>> - x2apic: ditto. > > > >>>> > > > >>>> Same here. For user space irqchip the kernel side doesn't care. If in-kernel APIC is enabled, check for its capabilities. > > > >>>> > > > >>> Same here. > > > >>> > > > >>> Well, technically both of those features can't be implemented in > > > >>> userspace right now since MSRs are terminated in the kernel, but I > > > >> > > > >> Doesn't sound like the greatest design - unless you deprecate the non-in-kernel apic case. > > > >> > > > > You mean terminating MSRs in kernel does not sound like the greatest > > > > design? I do not disagree. That is why IMO kernel can't filter out > > > > TSC-deadline and x2apic like you suggest. > > > > > > I still don't see why it can't. > > > > > > Imagine we would filter TSC-deadline and x2apic by default in the kernel - they are not known to exist yet. > > > Now, we implement TSC-deadline in the kernel. We still filter > > > TSC-deadline out in GET_SUPORTED_CPUID in the kernel. But we provide > > > an interface to user space that says "call me to enable TSC-deadline > > > CPUID, but only if you're using the in-kernel apic" > > We have that interface already, it is called KVM_SET_CPUID. :-) > > > > New user space calls that ioctl when it's using the in-kernel apic, it doesn't when it's using the user space apic. > > > Old user space doesn't call that ioctl. > > First of all we already have TSC-deadline in GET_SUPORTED_CPUID without > > any additional ioctls. And second I do not see why we need additional > > iostls here. > > We don't have TSC-deadline set today (and that's what started this > thread), but we have x2apic. So what you say is true for x2apic, anyway. > Hmm, strange. But s/TSC-deadline/x2apic/ and the point stands :) > > Hmm, so may be I misunderstood you. You propose to mask TSC-deadline and > > x2apic out from GET_SUPORTED_CPUID if irq chip is not in kernel, not > > from KVM_SET_CPUID? For those two features it may make sense indeed. > > It makes sense to me. > > It looks like my assumptions were wrong. They were: > > - GET_SUPPORTED_CPUID simply can't know if the in-kernel irqchip is > going to be enabled or not. I think this is currently true. Before we changing that we have to make sure that no existing userspace calls GET_SUPPORTED_CPUID before creating irqchip. Not sure we can check all existing userspaces. > - GET_SUPPORTED_CPUID output has to be a function of the kernel code > capabilitie and host CPU, and not depend on any input from userspace. > > Are those assumptions incorrect? If we break them, we may try what > Alexander is proposing. It would be much more flexible than the options > I was considering. > > I didn't know ENABLE_CAP existed. Even if GET_SUPPORTED_CPUID can't What is ENABLE_CAP? > check for the in-kernel irqchip setup for some reason, ENABLE_CAP could > be used by userpace to tell the kernel "I will enable the in-kernel > irqchip, so feel free to return features that depend on it on > GET_SUPPORTED_CPUID". > > In other words, we would return only the type-A features on > GET_SUPPORTED_CPUID (i.e. safe to be blindly enabled by -cpu host as > long as migration is not required), but if we use ENABLE_CAP we can make > group A safely grow, as long as userspace first tells the kernel what it > supports. > > Anybody is against doing that? Otherwise I plan to work on this. > Probably I will start by making GET_SUPPORTED_CPUID not return x2apic > unless userspace tells the kernel (using ENABLE_CAP) that it will enable > the in-kernel irqchip. Then we can do the same with TSC-deadline. > > Note that all this work is to allow the kernel to let userspace blindly > enable features it _doesn't know yet_. If we limit ourselves to features > userspace already knows about, we could simply remove x2apic and > TSC-deadline from GET_SUPPORTED_CPUID completely, not use ENABLE_CAP for > that, and let userspace set x2apic or TSC-deadline on KVM_SET_CPUID only > if it knows it is safe (either because it checked for the corresponding > KVM_CAP_* capability is present and it will enable the in-kernel > irqchip, or because it will emulated it in userspace). > There is not KVM_CAP_* for x2apic as far as I see. > In case anybody is against the proposal above: note that the current > documented GET_SUPPORTED_CPUID semantics (unconditionally returning bits > that depend on specific userspace behavior/capabilities) is simply > unusable by "-cpu host". If the above proposal gets rejected, my Plan B > is to update the GET_SUPPORTED_CPUID documentation to note that it > returns only type-A features (features that userspace can safely enable > even if it doesn't know what it does), remove x2apic from > GET_SUPPORTED_CPUID forever, and use KVM_CAP_* for discovery of all > type-B features (features that depend on specific userspace > capabilities/behavior). > This will break existing userspaces, no? > > > Not > > sure there won't be others that are not dependent on irq chip presence. > > You propose to add additional ioctls to enable them if they appear? > > I am sure there will be new features in the future that don't depend on > any userspace support, so they would be enabled on GET_SUPPORTED_CPUID > unconditionally. > Those not the kind of features I am worry about though. I worry about the features that can be emulated either by kernel or by userspace and userspace may choose where it wants the feature to be emulated. > But if we have new features that depend on specific userspace > capabilities/behavior (i.e. enabling the irqchip, or something else), we > could also add them as long as we check if that capability/behavior was > enabled using ENABLE_CAP. > > > > > > > > So at the end all bits in GET_SUPPORTED_CPUID are consistent with what user space is capable of. > > > > > GET_SUPPORTED_CPUID should not be necessary consistent with what user > > space is capable of. Userspace may emulate features that are not in > > GET_SUPPORTED_CPUID. > > True. We don't need to make the interface too complex just to make > GET_SUPPORTED_CPUID match exactly what userspace is going to enable. If > userspace wants to enable a feature because it can emulate it by its > own, it can just enable it using SET_CPUID. > > -- > Eduardo -- Gleb. From mboxrd@z Thu Jan 1 00:00:00 1970 Received: from eggs.gnu.org ([208.118.235.92]:56837) by lists.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1SSSsT-0006eP-JD for qemu-devel@nongnu.org; Thu, 10 May 2012 08:53:59 -0400 Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71) (envelope-from ) id 1SSSsN-0005qM-Qv for qemu-devel@nongnu.org; Thu, 10 May 2012 08:53:53 -0400 Received: from mx1.redhat.com ([209.132.183.28]:2766) by eggs.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1SSSsN-0005qA-Ib for qemu-devel@nongnu.org; Thu, 10 May 2012 08:53:47 -0400 Date: Thu, 10 May 2012 15:53:42 +0300 From: Gleb Natapov Message-ID: <20120510125342.GB4296@redhat.com> References: <20120507182142.GD16951@otherpad.lan.raisama.net> <20120508201441.GN4373@otherpad.lan.raisama.net> <6BF7428F-FDEF-4497-94F5-7A43BC9E1E67@suse.de> <20120509081404.GO15960@redhat.com> <0FA57537-0C33-468F-B416-AEB2487A9DFD@suse.de> <20120509085151.GQ15960@redhat.com> <20120509093837.GT15960@redhat.com> <20120509193802.GE4373@otherpad.lan.raisama.net> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <20120509193802.GE4373@otherpad.lan.raisama.net> Subject: Re: [Qemu-devel] Semantics of "-cpu host" (was Re: [PATCH 2/2] Expose tsc deadline timer cpuid to guest) List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , To: Eduardo Habkost Cc: Andre Przywara , "kvm@vger.kernel.org" , Jan Kiszka , "qemu-devel@nongnu.org" , Alexander Graf , Avi Kivity On Wed, May 09, 2012 at 04:38:02PM -0300, Eduardo Habkost wrote: > On Wed, May 09, 2012 at 12:38:37PM +0300, Gleb Natapov wrote: > > On Wed, May 09, 2012 at 11:05:58AM +0200, Alexander Graf wrote: > > > > > > On 09.05.2012, at 10:51, Gleb Natapov wrote: > > > > > > > On Wed, May 09, 2012 at 10:42:26AM +0200, Alexander Graf wrote: > > > >> > > > >> > > > >> On 09.05.2012, at 10:14, Gleb Natapov wrote: > > > >> > > > >>> On Wed, May 09, 2012 at 12:07:04AM +0200, Alexander Graf wrote: > > > >>>> > > > >>>> On 08.05.2012, at 22:14, Eduardo Habkost wrote: > > > >>>> > > > >>>>> On Tue, May 08, 2012 at 02:58:11AM +0200, Alexander Graf wrote: > > > >>>>>> On 07.05.2012, at 20:21, Eduardo Habkost wrote: > > > >>>>>> > > > >>>>>>> > > > >>>>>>> Andre? Are you able to help to answer the question below? > > > >>>>>>> > > > >>>>>>> I would like to clarify what's the expected behavior of "-cpu host" to > > > >>>>>>> be able to continue working on it. I believe the code will need to be > > > >>>>>>> fixed on either case, but first we need to figure out what are the > > > >>>>>>> expectations/requirements, to know _which_ changes will be needed. > > > >>>>>>> > > > >>>>>>> > > > >>>>>>> On Tue, Apr 24, 2012 at 02:19:25PM -0300, Eduardo Habkost wrote: > > > >>>>>>>> (CCing Andre Przywara, in case he can help to clarify what's the > > > >>>>>>>> expected meaning of "-cpu host") > > > >>>>>>>> > > > >>>>>>> [...] > > > >>>>>>>> I am not sure I understand what you are proposing. Let me explain the > > > >>>>>>>> use case I am thinking about: > > > >>>>>>>> > > > >>>>>>>> - Feature FOO is of type (A) (e.g. just a new instruction set that > > > >>>>>>>> doesn't require additional userspace support) > > > >>>>>>>> - User has a Qemu vesion that doesn't know anything about feature FOO > > > >>>>>>>> - User gets a new CPU that supports feature FOO > > > >>>>>>>> - User gets a new kernel that supports feature FOO (i.e. has FOO in > > > >>>>>>>> GET_SUPPORTED_CPUID) > > > >>>>>>>> - User does _not_ upgrade Qemu. > > > >>>>>>>> - User expects to get feature FOO enabled if using "-cpu host", without > > > >>>>>>>> upgrading Qemu. > > > >>>>>>>> > > > >>>>>>>> The problem here is: to support the above use-case, userspace need a > > > >>>>>>>> probing mechanism that can differentiate _new_ (previously unknown) > > > >>>>>>>> features that are in group (A) (safe to blindly enable) from features > > > >>>>>>>> that are in group (B) (that can't be enabled without an userspace > > > >>>>>>>> upgrade). > > > >>>>>>>> > > > >>>>>>>> In short, it becomes a problem if we consider the following case: > > > >>>>>>>> > > > >>>>>>>> - Feature BAR is of type (B) (it can't be enabled without extra > > > >>>>>>>> userspace support) > > > >>>>>>>> - User has a Qemu version that doesn't know anything about feature BAR > > > >>>>>>>> - User gets a new CPU that supports feature BAR > > > >>>>>>>> - User gets a new kernel that supports feature BAR (i.e. has BAR in > > > >>>>>>>> GET_SUPPORTED_CPUID) > > > >>>>>>>> - User does _not_ upgrade Qemu. > > > >>>>>>>> - User simply shouldn't get feature BAR enabled, even if using "-cpu > > > >>>>>>>> host", otherwise Qemu would break. > > > >>>>>>>> > > > >>>>>>>> If userspace always limited itself to features it knows about, it would > > > >>>>>>>> be really easy to implement the feature without any new probing > > > >>>>>>>> mechanism from the kernel. But that's not how I think users expect "-cpu > > > >>>>>>>> host" to work. Maybe I am wrong, I don't know. I am CCing Andre, who > > > >>>>>>>> introduced the "-cpu host" feature, in case he can explain what's the > > > >>>>>>>> expected semantics on the cases above. > > > >>>>>> > > > >>>>>> Can you think of any feature that'd go into category B? > > > >>>>> > > > >>>>> - TSC-deadline: can't be enabled unless userspace takes care to enable > > > >>>>> the in-kernel irqchip. > > > >>>> > > > >>>> The kernel can check if in-kernel irqchip has it enabled and otherwise mask it out, no? > > > >>>> > > > >>> How kernel should know that userspace does not emulate it? > > > >> > > > >> You have to enable the in-kernel apic to use it, at which point the kernel knows it's in use, right? > > > >> > > > >>> > > > >>>>> - x2apic: ditto. > > > >>>> > > > >>>> Same here. For user space irqchip the kernel side doesn't care. If in-kernel APIC is enabled, check for its capabilities. > > > >>>> > > > >>> Same here. > > > >>> > > > >>> Well, technically both of those features can't be implemented in > > > >>> userspace right now since MSRs are terminated in the kernel, but I > > > >> > > > >> Doesn't sound like the greatest design - unless you deprecate the non-in-kernel apic case. > > > >> > > > > You mean terminating MSRs in kernel does not sound like the greatest > > > > design? I do not disagree. That is why IMO kernel can't filter out > > > > TSC-deadline and x2apic like you suggest. > > > > > > I still don't see why it can't. > > > > > > Imagine we would filter TSC-deadline and x2apic by default in the kernel - they are not known to exist yet. > > > Now, we implement TSC-deadline in the kernel. We still filter > > > TSC-deadline out in GET_SUPORTED_CPUID in the kernel. But we provide > > > an interface to user space that says "call me to enable TSC-deadline > > > CPUID, but only if you're using the in-kernel apic" > > We have that interface already, it is called KVM_SET_CPUID. :-) > > > > New user space calls that ioctl when it's using the in-kernel apic, it doesn't when it's using the user space apic. > > > Old user space doesn't call that ioctl. > > First of all we already have TSC-deadline in GET_SUPORTED_CPUID without > > any additional ioctls. And second I do not see why we need additional > > iostls here. > > We don't have TSC-deadline set today (and that's what started this > thread), but we have x2apic. So what you say is true for x2apic, anyway. > Hmm, strange. But s/TSC-deadline/x2apic/ and the point stands :) > > Hmm, so may be I misunderstood you. You propose to mask TSC-deadline and > > x2apic out from GET_SUPORTED_CPUID if irq chip is not in kernel, not > > from KVM_SET_CPUID? For those two features it may make sense indeed. > > It makes sense to me. > > It looks like my assumptions were wrong. They were: > > - GET_SUPPORTED_CPUID simply can't know if the in-kernel irqchip is > going to be enabled or not. I think this is currently true. Before we changing that we have to make sure that no existing userspace calls GET_SUPPORTED_CPUID before creating irqchip. Not sure we can check all existing userspaces. > - GET_SUPPORTED_CPUID output has to be a function of the kernel code > capabilitie and host CPU, and not depend on any input from userspace. > > Are those assumptions incorrect? If we break them, we may try what > Alexander is proposing. It would be much more flexible than the options > I was considering. > > I didn't know ENABLE_CAP existed. Even if GET_SUPPORTED_CPUID can't What is ENABLE_CAP? > check for the in-kernel irqchip setup for some reason, ENABLE_CAP could > be used by userpace to tell the kernel "I will enable the in-kernel > irqchip, so feel free to return features that depend on it on > GET_SUPPORTED_CPUID". > > In other words, we would return only the type-A features on > GET_SUPPORTED_CPUID (i.e. safe to be blindly enabled by -cpu host as > long as migration is not required), but if we use ENABLE_CAP we can make > group A safely grow, as long as userspace first tells the kernel what it > supports. > > Anybody is against doing that? Otherwise I plan to work on this. > Probably I will start by making GET_SUPPORTED_CPUID not return x2apic > unless userspace tells the kernel (using ENABLE_CAP) that it will enable > the in-kernel irqchip. Then we can do the same with TSC-deadline. > > Note that all this work is to allow the kernel to let userspace blindly > enable features it _doesn't know yet_. If we limit ourselves to features > userspace already knows about, we could simply remove x2apic and > TSC-deadline from GET_SUPPORTED_CPUID completely, not use ENABLE_CAP for > that, and let userspace set x2apic or TSC-deadline on KVM_SET_CPUID only > if it knows it is safe (either because it checked for the corresponding > KVM_CAP_* capability is present and it will enable the in-kernel > irqchip, or because it will emulated it in userspace). > There is not KVM_CAP_* for x2apic as far as I see. > In case anybody is against the proposal above: note that the current > documented GET_SUPPORTED_CPUID semantics (unconditionally returning bits > that depend on specific userspace behavior/capabilities) is simply > unusable by "-cpu host". If the above proposal gets rejected, my Plan B > is to update the GET_SUPPORTED_CPUID documentation to note that it > returns only type-A features (features that userspace can safely enable > even if it doesn't know what it does), remove x2apic from > GET_SUPPORTED_CPUID forever, and use KVM_CAP_* for discovery of all > type-B features (features that depend on specific userspace > capabilities/behavior). > This will break existing userspaces, no? > > > Not > > sure there won't be others that are not dependent on irq chip presence. > > You propose to add additional ioctls to enable them if they appear? > > I am sure there will be new features in the future that don't depend on > any userspace support, so they would be enabled on GET_SUPPORTED_CPUID > unconditionally. > Those not the kind of features I am worry about though. I worry about the features that can be emulated either by kernel or by userspace and userspace may choose where it wants the feature to be emulated. > But if we have new features that depend on specific userspace > capabilities/behavior (i.e. enabling the irqchip, or something else), we > could also add them as long as we check if that capability/behavior was > enabled using ENABLE_CAP. > > > > > > > > So at the end all bits in GET_SUPPORTED_CPUID are consistent with what user space is capable of. > > > > > GET_SUPPORTED_CPUID should not be necessary consistent with what user > > space is capable of. Userspace may emulate features that are not in > > GET_SUPPORTED_CPUID. > > True. We don't need to make the interface too complex just to make > GET_SUPPORTED_CPUID match exactly what userspace is going to enable. If > userspace wants to enable a feature because it can emulate it by its > own, it can just enable it using SET_CPUID. > > -- > Eduardo -- Gleb.