* Advice on HYP interface for AsyncPF
@ 2015-04-09  1:46 Mario Smarduch
  2015-04-09  7:57 ` Marc Zyngier
  0 siblings, 1 reply; 21+ messages in thread
From: Mario Smarduch @ 2015-04-09  1:46 UTC
  To: kvmarm, christoffer.dall, Marc Zyngier, Peter Maydell

I'm working with AsyncPF, and am currently using a
hyp call to communicate the guest GFN for the host to inject
a virtual abort - page not available/page available.

Currently only PSCI makes use of that interface
(handle_hvc()). Can we overload the interface with additional
hyp calls, in this case passing the guest gfn? Arg0 could be
set to some range outside of PSCI use.

- Mario


* Re: Advice on HYP interface for AsyncPF
  2015-04-09  1:46 Advice on HYP interface for AsyncPF Mario Smarduch
@ 2015-04-09  7:57 ` Marc Zyngier
  2015-04-09 12:06   ` Andrew Jones
  2015-04-10  2:36   ` Mario Smarduch
  0 siblings, 2 replies; 21+ messages in thread
From: Marc Zyngier @ 2015-04-09  7:57 UTC
  To: Mario Smarduch; +Cc: kvmarm

On Thu, 9 Apr 2015 02:46:54 +0100
Mario Smarduch <m.smarduch@samsung.com> wrote:

Hi Mario,

> I'm working with AsyncPF, and am currently using a
> hyp call to communicate the guest GFN for the host to inject
> a virtual abort - page not available/page available.
> 
> Currently only PSCI makes use of that interface
> (handle_hvc()). Can we overload the interface with additional
> hyp calls, in this case passing the guest gfn? Arg0 could be
> set to some range outside of PSCI use.

I can't see a reason why we wouldn't open handle_hvc() to other
paravirtualized services. But this has to be done with extreme caution:

- This becomes an ABI between host and guest
- We need a discovery protocol
- We need to make sure other hypervisors don't reuse the same function
  number for other purposes

Maybe we should adopt Xen's idea of a hypervisor node in DT where we
would describe the various services? How will that work with ACPI?

Coming back to AsyncPF, and purely out of curiosity: why do you need a
HYP entry point? From what I remember, AsyncPF works by injecting a
fault in the guest when the page is found not present or made
available, with the GFN being stored in a per-vcpu memory location.
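
For reference, the x86 per-vcpu slot looks roughly like this (from
memory, based on x86's include/uapi/asm/kvm_para.h; the host writes
the reason into guest memory before injecting the fault, no hyp call
involved):

struct kvm_vcpu_pv_apf_data {
	__u32 reason;	/* KVM_PV_REASON_PAGE_NOT_PRESENT / _READY */
	__u8 pad[60];
	__u32 enabled;
};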

Am I missing something obvious? Or have I just displayed my ignorance on
this subject? ;-)

Thanks,

	M.
-- 
Without deviation from the norm, progress is not possible.


* Re: Advice on HYP interface for AsyncPF
  2015-04-09  7:57 ` Marc Zyngier
@ 2015-04-09 12:06   ` Andrew Jones
  2015-04-09 12:48     ` Mark Rutland
  2015-04-09 13:35     ` Christoffer Dall
  2015-04-10  2:36   ` Mario Smarduch
  1 sibling, 2 replies; 21+ messages in thread
From: Andrew Jones @ 2015-04-09 12:06 UTC
  To: Marc Zyngier; +Cc: kvmarm

On Thu, Apr 09, 2015 at 08:57:23AM +0100, Marc Zyngier wrote:
> On Thu, 9 Apr 2015 02:46:54 +0100
> Mario Smarduch <m.smarduch@samsung.com> wrote:
> 
> Hi Mario,
> 
> > I'm working with AsyncPF, and am currently using a
> > hyp call to communicate the guest GFN for the host to inject
> > a virtual abort - page not available/page available.
> > 
> > Currently only PSCI makes use of that interface
> > (handle_hvc()). Can we overload the interface with additional
> > hyp calls, in this case passing the guest gfn? Arg0 could be
> > set to some range outside of PSCI use.
> 
> I can't see a reason why we wouldn't open handle_hvc() to other
> paravirtualized services. But this has to be done with extreme caution:
> 
> - This becomes an ABI between host and guest

To expand on that, if the benefits don't outweigh the maintenance
required for that ABI, for life, then it turns into a lifetime burden.
Any guest-host speedups that can be conceived, which require hypercalls,
should probably be bounced off the hardware people first. Waiting for
improvements in the virt extensions may be a better choice than
committing to a PV solution.

> - We need a discovery protocol

Hopefully all users of the PSCI hypcall have been using function #0,
because handle_hvc unfortunately hasn't been checking it. In any case,
I'm not sure we have much choice but to start enforcing it now. Once we
do, with something like

switch (hypcall_nr) {
case 0: /* handle psci call */ break;
default: return -KVM_ENOSYS;
}

then, I think the guest's discovery protocol can simply be

if (do_hypercall() == -ENOSYS) {
   /* PV path not supported, fall back to whatever... */
}
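
where do_hypercall(fn) on the guest side would be something like this
(sketch only - function number in r0, mirroring PSCI's convention; x0
on arm64, and the names here are made up):

static long do_hypercall(unsigned long fn)
{
	register unsigned long r0 asm("r0") = fn;

	asm volatile("hvc #0" : "+r" (r0) : : "memory");

	return r0;	/* -KVM_ENOSYS if the host doesn't know fn */
}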

> - We need to make sure other hypervisors don't reuse the same function
>   number for other purposes

I'm not sure what this means. Xen already has several hypercalls defined
for ARM, the same that they have for x86, which don't match any of the
KVM hypercalls. Now, KVM for other arches (which is maybe what you meant)
does define a handful, which we should integrate with, as KVM mixes
architectures within its hypercall number allocation, see
include/uapi/linux/kvm_para.h. Just using the common code should make it
easy to avoid problems. We don't have a problem with the PSCI hypcall, as
zero isn't allocated. Ideally we would define PSCI properly though,
e.g. KVM_HC_ARM_PSCI, and still reserve zero in the common header. To do
that maybe we'll need to keep #0 as an ARM-only alias for the new number
for compatibility now?
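
Something like this in the common header, say (the number is invented
here; a real one would need allocating in include/uapi/linux/kvm_para.h):

#define KVM_HC_ARM_PSCI		7	/* made-up value */
/* arm/arm64 keep accepting 0 as a legacy alias for KVM_HC_ARM_PSCI */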

> 
> Maybe we should adopt Xen's idea of a hypervisor node in DT where we
> would describe the various services? How will that work with ACPI?

I don't think we'll ever have a "virt guest" ACPI table that we can
use for this stuff, so this won't work for ACPI. But I think the ENOSYS
probing should be sufficient for this anyway.

drew


* Re: Advice on HYP interface for AsyncPF
  2015-04-09 12:06   ` Andrew Jones
@ 2015-04-09 12:48     ` Mark Rutland
  2015-04-09 13:43       ` Andrew Jones
  2015-04-09 13:35     ` Christoffer Dall
  1 sibling, 1 reply; 21+ messages in thread
From: Mark Rutland @ 2015-04-09 12:48 UTC
  To: Andrew Jones; +Cc: Marc Zyngier, kvmarm

On Thu, Apr 09, 2015 at 01:06:47PM +0100, Andrew Jones wrote:
> On Thu, Apr 09, 2015 at 08:57:23AM +0100, Marc Zyngier wrote:
> > On Thu, 9 Apr 2015 02:46:54 +0100
> > Mario Smarduch <m.smarduch@samsung.com> wrote:
> > 
> > Hi Mario,
> > 
> > > I'm working with AsyncPF, and am currently using a
> > > hyp call to communicate the guest GFN for the host to inject
> > > a virtual abort - page not available/page available.
> > > 
> > > Currently only PSCI makes use of that interface
> > > (handle_hvc()). Can we overload the interface with additional
> > > hyp calls, in this case passing the guest gfn? Arg0 could be
> > > set to some range outside of PSCI use.
> > 
> > I can't see a reason why we wouldn't open handle_hvc() to other
> > paravirtualized services. But this has to be done with extreme caution:
> > 
> > - This becomes an ABI between host and guest
> 
> To expand on that, if the benefits don't outweigh the maintenance
> required for that ABI, for life, then it turns into a lifetime burden.
> Any guest-host speedups that can be conceived, which require hypercalls,
> should probably be bounced off the hardware people first. Waiting for
> improvements in the virt extensions may be a better choice than
> committing to a PV solution.
> 
> > - We need a discovery protocol
> 
> Hopefully all users of the PSCI hypcall have been using function #0,
> because handle_hvc unfortunately hasn't been checking it. In any case,
> I'm not sure we have much choice but to start enforcing it now. Once we
> do, with something like
> 
> switch (hypcall_nr) {
> case 0: /* handle psci call */ break;
> default: return -KVM_ENOSYS;
> }
> 
> then, I think the guest's discovery protocol can simply be
> 
> if (do_hypercall() == -ENOSYS) {
>    /* PV path not supported, fall back to whatever... */
> }

That only tells you the code at EL2/Hyp did something, and only if it
actually returns. Call this on a different hypervisor (or in the absence
of one, there's no mechanism for querying) and you might bring down that
CPU or the entire system.

We need to be able to detect that some hypercall interface is present
_before_ issuing the relevant hypercalls. As Marc mentioned, we could
have a DT node and/or ACPI entry for this, and it only needs to tell us
enough to bootstrap querying the hypervisor for more info (as is the
case with Xen, I believe).
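
In other words, a guest should issue no KVM-specific HVC until
something like the following has succeeded (a sketch, assuming a
"linux,kvm" compatible string that doesn't exist for arm today;
of_find_compatible_node() comes from <linux/of.h>):

static bool kvm_hyp_services_available(void)
{
	struct device_node *np;

	np = of_find_compatible_node(NULL, NULL, "linux,kvm");
	if (!np)
		return false;
	of_node_put(np);
	return true;	/* safe to start hypercall-based discovery */
}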

> 
> > - We need to make sure other hypervisors don't reuse the same function
> >   number for other purposes

I don't think this is a problem so long as there's a mechanism for
detecting the hyp interfaces provided. Xen and KVM could use the same
numbers for different things and that's fine because you'll only use the
Xen functions when you see the Xen node, and the KVM functions when you
are aware you're under KVM. You can't have both simultaneously.

However, these numbers must be chosen so as not to clash with SMC/HVC
Calling Convention IDs. We can't risk clashing with PSCI or other
standard interfaces we may want to expose to a guest in future.
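
For reference, the SMC Calling Convention (ARM DEN 0028) carves up the
32-bit fast-call ID space roughly as follows (constant names invented
here for illustration; check the document before relying on this):

#define SMCCC_ARM_ARCH_BASE	0x80000000	/* Arm architecture calls */
#define SMCCC_CPU_SVC_BASE	0x81000000	/* CPU service calls */
#define SMCCC_SIP_SVC_BASE	0x82000000	/* SiP service calls */
#define SMCCC_OEM_SVC_BASE	0x83000000	/* OEM service calls */
#define SMCCC_STD_SVC_BASE	0x84000000	/* standard calls: PSCI 0.2+ */

so any KVM-private function IDs need to sit clear of those ranges.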

> I'm not sure what this means. Xen already has several hypercalls defined
> for ARM, the same that they have for x86, which don't match any of the
> KVM hypercalls. Now, KVM for other arches (which is maybe what you meant)
> does define a handful, which we should integrate with, as KVM mixes
> architectures within its hypercall number allocation, see
> include/uapi/linux/kvm_para.h. Just using the common code should make it
> easy to avoid problems. We don't have a problem with the PSCI hypcall, as
> zero isn't allocated. Ideally we would define PSCI properly though,
> e.g. KVM_HC_ARM_PSCI, and still reserve zero in the common header. To do
> that maybe we'll need to keep #0 as an ARM-only alias for the new number
> for compatibility now?

While the HVC immediate could be used to distinguish different types of
calls, the guest still needs to first determine that issuing a HVC is
not going to bring down the system, which requires it to know that a
suitable hypervisor is present.

> > Maybe we should adopt Xen's idea of a hypervisor node in DT where we
> > would describe the various services? How will that work with ACPI?
> 
> I don't think we'll ever have a "virt guest" ACPI table that we can
> use for this stuff, so this won't work for ACPI. But I think the ENOSYS
> probing should be sufficient for this anyway.

As mentioned above, I don't think that probing is safe.

What prevents us from creating a trivial "KVM guest" table that we can
use to determine that we can query more advanced info from KVM itself?
Given the point is to expose KVM-specific functionality, I don't see why
we need a more generic "virt guest" ACPI table.

Mark.


* Re: Advice on HYP interface for AsyncPF
  2015-04-09 12:06   ` Andrew Jones
  2015-04-09 12:48     ` Mark Rutland
@ 2015-04-09 13:35     ` Christoffer Dall
  2015-04-09 13:59       ` Andrew Jones
  1 sibling, 1 reply; 21+ messages in thread
From: Christoffer Dall @ 2015-04-09 13:35 UTC
  To: Andrew Jones; +Cc: Marc Zyngier, kvmarm

On Thu, Apr 09, 2015 at 02:06:47PM +0200, Andrew Jones wrote:
> On Thu, Apr 09, 2015 at 08:57:23AM +0100, Marc Zyngier wrote:
> > On Thu, 9 Apr 2015 02:46:54 +0100
> > Mario Smarduch <m.smarduch@samsung.com> wrote:
> > 
> > Hi Mario,
> > 
> > > I'm working with AsyncPF, and am currently using a
> > > hyp call to communicate the guest GFN for the host to inject
> > > a virtual abort - page not available/page available.
> > > 
> > > Currently only PSCI makes use of that interface
> > > (handle_hvc()). Can we overload the interface with additional
> > > hyp calls, in this case passing the guest gfn? Arg0 could be
> > > set to some range outside of PSCI use.
> > 
> > I can't see a reason why we wouldn't open handle_hvc() to other
> > paravirtualized services. But this has to be done with extreme caution:
> > 
> > - This becomes an ABI between host and guest
> 
> To expand on that, if the benefits don't outweigh the maintenance
> required for that ABI, for life, then it turns into a lifetime burden.
> Any guest-host speedups that can be conceived, which require hypercalls,
> should probably be bounced off the hardware people first. Waiting for
> improvements in the virt extensions may be a better choice than
> committing to a PV solution.
> 
> > - We need a discovery protocol
> 
> Hopefully all users of the PSCI hypcall have been using function #0,
> because handle_hvc unfortunately hasn't been checking it.

huh?  I don't understand this, sorry.

> In any case,
> I'm not sure we have much choice but to start enforcing it now. Once we
> do, with something like
> 
> switch (hypcall_nr) {
> case 0: /* handle psci call */ break;
> default: return -KVM_ENOSYS;
> }
> 
> then, I think the guest's discovery protocol can simply be
> 
> if (do_hypercall() == -ENOSYS) {
>    /* PV path not supported, fall back to whatever... */
> }
> 
> > - We need to make sure other hypervisors don't reuse the same function
> >   number for other purposes
> 
> I'm not sure what this means. Xen already has several hypercalls defined
> for ARM, the same that they have for x86, which don't match any of the
> KVM hypercalls. Now, KVM for other arches (which is maybe what you meant)
> does define a handful, which we should integrate with, as KVM mixes
> architectures within its hypercall number allocation, see
> include/uapi/linux/kvm_para.h. Just using the common code should make it
> easy to avoid problems. We don't have a problem with the PSCI hypcall, as
> zero isn't allocated. Ideally we would define PSCI properly though,
> e.g. KVM_HC_ARM_PSCI, and still reserve zero in the common header. To do
> that maybe we'll need to keep #0 as an ARM-only alias for the new number
> for compatibility now?
> 
> > 
> > Maybe we should adopt Xen's idea of a hypervisor node in DT where we
> > would describe the various services? How will that work with ACPI?
> 
> I don't think we'll ever have a "virt guest" ACPI table that we can
> use for this stuff, so this won't work for ACPI. But I think the ENOSYS
> probing should be sufficient for this anyway.
> 
We've reserved a Xen table on ACPI, not sure why we can't do the same
for KVM or a generic ARM PV table for that matter... ?

-Christoffer


* Re: Advice on HYP interface for AsyncPF
  2015-04-09 12:48     ` Mark Rutland
@ 2015-04-09 13:43       ` Andrew Jones
  2015-04-09 14:00         ` Mark Rutland
  0 siblings, 1 reply; 21+ messages in thread
From: Andrew Jones @ 2015-04-09 13:43 UTC
  To: Mark Rutland; +Cc: Marc Zyngier, kvmarm

On Thu, Apr 09, 2015 at 01:48:52PM +0100, Mark Rutland wrote:
> On Thu, Apr 09, 2015 at 01:06:47PM +0100, Andrew Jones wrote:
> > On Thu, Apr 09, 2015 at 08:57:23AM +0100, Marc Zyngier wrote:
> > > On Thu, 9 Apr 2015 02:46:54 +0100
> > > Mario Smarduch <m.smarduch@samsung.com> wrote:
> > > 
> > > Hi Mario,
> > > 
> > > > I'm working with AsyncPF, and am currently using a
> > > > hyp call to communicate the guest GFN for the host to inject
> > > > a virtual abort - page not available/page available.
> > > > 
> > > > Currently only PSCI makes use of that interface
> > > > (handle_hvc()). Can we overload the interface with additional
> > > > hyp calls, in this case passing the guest gfn? Arg0 could be
> > > > set to some range outside of PSCI use.
> > > 
> > > I can't see a reason why we wouldn't open handle_hvc() to other
> > > paravirtualized services. But this has to be done with extreme caution:
> > > 
> > > - This becomes an ABI between host and guest
> > 
> > To expand on that, if the benefits don't outweigh the maintenance
> > required for that ABI, for life, then it turns into a lifetime burden.
> > Any guest-host speedups that can be conceived, which require hypercalls,
> > should probably be bounced off the hardware people first. Waiting for
> > improvements in the virt extensions may be a better choice than
> > committing to a PV solution.
> > 
> > > - We need a discovery protocol
> > 
> > Hopefully all users of the PSCI hypcall have been using function #0,
> > because handle_hvc unfortunately hasn't been checking it. In any case,
> > I'm not sure we have much choice but to start enforcing it now. Once we
> > do, with something like
> > 
> > switch (hypcall_nr) {
> > case 0: /* handle psci call */ break;
> > default: return -KVM_ENOSYS;
> > }
> > 
> > then, I think the guest's discovery protocol can simply be
> > 
> > if (do_hypercall() == -ENOSYS) {
> >    /* PV path not supported, fall back to whatever... */
> > }
> 
> That only tells you the code at EL2/Hyp did something, and only if it
> actually returns. Call this on a different hypervisor (or in the absence
> of one, there's no mechanism for querying) and you might bring down that
> CPU or the entire system.
> 
> We need to be able to detect that some hypercall interface is present
> _before_ issuing the relevant hypercalls. As Marc mentioned, we could
> have a DT node and/or ACPI entry for this, and it only needs to tell us
> enough to bootstrap querying the hypervisor for more info (as is the
> case with Xen, I believe).
> 
> > 
> > > - We need to make sure other hypervisors don't reuse the same function
> > >   number for other purposes
> 
> I don't think this is a problem so long as there's a mechanism for
> detecting the hyp interfaces provided. Xen and KVM could use the same
> numbers for different things and that's fine because you'll only use the
> Xen functions when you see the Xen node, and the KVM functions when you
> are aware you're under KVM. You can't have both simultaneously.
> 
> However, these numbers must be chosen so as not to clash with SMC/HVC
> Calling Convention IDs. We can't risk clashing with PSCI or other
> standard interfaces we may want to expose to a guest in future.
> 
> > I'm not sure what this means. Xen already has several hypercalls defined
> > for ARM, the same that they have for x86, which don't match any of the
> > KVM hypercalls. Now, KVM for other arches (which is maybe what you meant)
> > does define a handful, which we should integrate with, as KVM mixes
> > architectures within its hypercall number allocation, see
> > include/uapi/linux/kvm_para.h. Just using the common code should make it
> > easy to avoid problems. We don't have a problem with the PSCI hypcall, as
> > zero isn't allocated. Ideally we would define PSCI properly though,
> > e.g. KVM_HC_ARM_PSCI, and still reserve zero in the common header. To do
> > that maybe we'll need to keep #0 as an ARM-only alias for the new number
> > for compatibility now?
> 
> While the HVC immediate could be used to distinguish different types of
> calls, the guest still needs to first determine that issuing a HVC is
> not going to bring down the system, which requires it to know that a
> suitable hypervisor is present.

Right. I forgot we don't have anything for this in the kvmarm world. I
should have remembered, having just crossed this path for a different
issue (virt-what). In the x86 world we have a cpuid that allows guests
to see that they are a) a guest and b) of what type. The hypervisor can
fake the type if it wishes. For example KVM can emulate HyperV, allowing
Windows guests to use their "native" PV ops.
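
That check is tiny on x86 - roughly this, in its userspace flavour
(leaf 0x40000000 is the hypervisor range, and KVM answers with its
signature; a robust version would first check the hypervisor bit,
CPUID.1:ECX bit 31):

#include <cpuid.h>
#include <string.h>

static int on_kvm(void)
{
	unsigned int eax, sig[3];

	__cpuid(0x40000000, eax, sig[0], sig[1], sig[2]);
	(void)eax;	/* max hypervisor leaf; old KVM reports 0 here */
	return !memcmp(sig, "KVMKVMKVM\0\0\0", 12);
}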

In the ARM world the hypervisor DT node does seem to be the closest
equivalent that currently exists. Both Xen and ppc KVM use it already.
Using this for DT guests means we'll need the ACPI solution though.

> 
> > > Maybe we should adopt Xen's idea of a hypervisor node in DT where we
> > > would describe the various services? How will that work with ACPI?
> > 
> > I don't think we'll ever have a "virt guest" ACPI table that we can
> > use for this stuff, so this won't work for ACPI. But I think the ENOSYS
> > probing should be sufficient for this anyway.
> 
> As mentioned above, I don't think that probing is safe.
> 
> What prevents us from creating a trivial "KVM guest" table that we can
> use to determine that we can query more advanced info from KVM itself?
> Given the point is to expose KVM-specific functionality, I don't see why
> we need a more generic "virt guest" ACPI table.
>

Just as the hypervisor node is more attractive with the consideration
that it's being adopted by other parties (xen and kvmppc), any ACPI
tables will be more likely to be accepted if they have buy-in from
a greater audience. kvmarm would be the only consumer for the time
being, but it'd be good if it was more general from the start,
particularly if general kernel code would need to know about it.

drew


* Re: Advice on HYP interface for AsyncPF
  2015-04-09 13:35     ` Christoffer Dall
@ 2015-04-09 13:59       ` Andrew Jones
  2015-04-09 14:22         ` Christoffer Dall
  0 siblings, 1 reply; 21+ messages in thread
From: Andrew Jones @ 2015-04-09 13:59 UTC
  To: Christoffer Dall; +Cc: Marc Zyngier, kvmarm

On Thu, Apr 09, 2015 at 03:35:06PM +0200, Christoffer Dall wrote:
> On Thu, Apr 09, 2015 at 02:06:47PM +0200, Andrew Jones wrote:
> > On Thu, Apr 09, 2015 at 08:57:23AM +0100, Marc Zyngier wrote:
> > > On Thu, 9 Apr 2015 02:46:54 +0100
> > > Mario Smarduch <m.smarduch@samsung.com> wrote:
> > > 
> > > Hi Mario,
> > > 
> > > > I'm working with AsyncPF, and am currently using a
> > > > hyp call to communicate the guest GFN for the host to inject
> > > > a virtual abort - page not available/page available.
> > > > 
> > > > Currently only PSCI makes use of that interface
> > > > (handle_hvc()). Can we overload the interface with additional
> > > > hyp calls, in this case passing the guest gfn? Arg0 could be
> > > > set to some range outside of PSCI use.
> > > 
> > > I can't see a reason why we wouldn't open handle_hvc() to other
> > > paravirtualized services. But this has to be done with extreme caution:
> > > 
> > > - This becomes an ABI between host and guest
> > 
> > To expand on that, if the benefits don't outweigh the maintenance
> > required for that ABI, for life, then it turns into a lifetime burden.
> > Any guest-host speedups that can be conceived, which require hypercalls,
> > should probably be bounced off the hardware people first. Waiting for
> > improvements in the virt extensions may be a better choice than
> > committing to a PV solution.
> > 
> > > - We need a discovery protocol
> > 
> > Hopefully all users of the PSCI hypcall have been using function #0,
> > because handle_hvc unfortunately hasn't been checking it.
> 
> huh?  I don't understand this, sorry.

The hvc immediate used for psci is 0, and that's fine because there's
only a single hvc function currently defined. If we want to define
additional functions then we could assign each additional function a
new immediate. However, as we've never enforced the immediate to be
zero, there's no guarantee we won't mess up a guest that expects
any immediate to work.

Of course we can continue allowing any immediate to work too, by
passing the function number in some register. Xen uses x16? We could
use reg0 too, and just integrate with the PSCI function space as well.
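
In handle_hvc() that could look roughly like this (a sketch; the PSCI
range constants are invented, vcpu_reg() is the existing GPR accessor):

int handle_hvc(struct kvm_vcpu *vcpu, struct kvm_run *run)
{
	unsigned long fn = *vcpu_reg(vcpu, 0);

	switch (fn) {
	case PSCI_FN_FIRST ... PSCI_FN_LAST:	/* invented bounds */
		return kvm_psci_call(vcpu);
	default:
		*vcpu_reg(vcpu, 0) = -KVM_ENOSYS;
		return 1;	/* handled; resume the guest */
	}
}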

> 
> > In any case,
> > I'm not sure we have much choice but to start enforcing it now. Once we
> > do, with something like
> > 
> > switch (hypcall_nr) {
> > case 0: /* handle psci call */ break;
> > default: return -KVM_ENOSYS;
> > }
> > 
> > then, I think the guest's discovery protocol can simply be
> > 
> > if (do_hypercall() == -ENOSYS) {
> >    /* PV path not supported, fall back to whatever... */
> > }
> > 
> > > - We need to make sure other hypervisors don't reuse the same function
> > >   number for other purposes
> > 
> > I'm not sure what this means. Xen already has several hypercalls defined
> > for ARM, the same that they have for x86, which don't match any of the
> > KVM hypercalls. Now, KVM for other arches (which is maybe what you meant)
> > does define a handful, which we should integrate with, as KVM mixes
> > architectures within its hypercall number allocation, see
> > include/uapi/linux/kvm_para.h. Just using the common code should make it
> > easy to avoid problems. We don't have a problem with the PSCI hypcall, as
> > zero isn't allocated. Ideally we would define PSCI properly though,
> > e.g. KVM_HC_ARM_PSCI, and still reserve zero in the common header. To do
> > that maybe we'll need to keep #0 as an ARM-only alias for the new number
> > for compatibility now?
> > 
> > > 
> > > Maybe we should adopt Xen's idea of a hypervisor node in DT where we
> > > would describe the various services? How will that work with ACPI?
> > 
> > I don't think we'll ever have a "virt guest" ACPI table that we can
> > use for this stuff, so this won't work for ACPI. But I think the ENOSYS
> > probing should be sufficient for this anyway.
> > 
> We've reserved a Xen table on ACPI, not sure why we can't do the same
> for KVM or a generic ARM PV table for that matter... ?

OK. Does that table require the Linux kernel (outside dom0 specific
code) to know about it? Would a more generic table require that? I'm
always wary of virt creep.

drew


* Re: Advice on HYP interface for AsyncPF
  2015-04-09 13:43       ` Andrew Jones
@ 2015-04-09 14:00         ` Mark Rutland
  2015-04-09 14:22           ` Andrew Jones
  0 siblings, 1 reply; 21+ messages in thread
From: Mark Rutland @ 2015-04-09 14:00 UTC
  To: Andrew Jones; +Cc: Marc Zyngier, kvmarm

> > While the HVC immediate could be used to distinguish different types of
> > calls, the guest still needs to first determine that issuing a HVC is
> > not going to bring down the system, which requires it to know that a
> > suitable hypervisor is present.
> 
> Right. I forgot we don't have anything for this in the kvmarm world. I
> should have remembered, having just crossed this path for a different
> issue (virt-what). In the x86 world we have a cpuid that allows guests
> to see that they are a) a guest and b) of what type. The hypervisor can
> fake the type if it wishes. For example KVM can emulate HyperV, allowing
> Windows guests to use their "native" PV ops.
> 
> In the ARM world the hypervisor DT node does seem to be the closest
> equivalent that currently exists. Both Xen and ppc KVM use it already.
> Using this for DT guests means we'll need the ACPI solution though.

Sure.

> > > > Maybe we should adopt Xen's idea of a hypervisor node in DT where we
> > > > would describe the various services? How will that work with ACPI?
> > > 
> > > I don't think we'll ever have a "virt guest" ACPI table that we can
> > > use for this stuff, so this won't work for ACPI. But I think the ENOSYS
> > > probing should be sufficient for this anyway.
> > 
> > As mentioned above, I don't think that probing is safe.
> > 
> > What prevents us from creating a trivial "KVM guest" table that we can
> > use to determine that we can query more advanced info from KVM itself?
> > Given the point is to expose KVM-specific functionality, I don't see why
> > we need a more generic "virt guest" ACPI table.
> >
> 
> Just as the hypervisor node is more attractive with the consideration
> that it's being adopted by other parties (xen and kvmppc), any ACPI
> tables will be more likely to be accepted if they have buy-in from
> a greater audience. kvmarm would be the only consumer for the time
> being, but it'd be good if it was more general from the start,
> particularly if general kernel code would need to know about it.

I'm not sure I follow, given that there's no generic hypervisor node in
DT at the moment.

In the case of Xen there's a node with a "xen,xen" compatible string.

In the case of PPC KVM, there's a node with a "linux,kvm" compatible
string.

These don't have any overlap. I'm not sure what a generic node would buy
you short of detecting (some cases) when you're being virtualised, given
you'd have to know about the particular hypervisor and/or services
anyway.

Mark.


* Re: Advice on HYP interface for AsyncPF
  2015-04-09 13:59       ` Andrew Jones
@ 2015-04-09 14:22         ` Christoffer Dall
  2015-04-09 14:42           ` Andrew Jones
  0 siblings, 1 reply; 21+ messages in thread
From: Christoffer Dall @ 2015-04-09 14:22 UTC
  To: Andrew Jones; +Cc: Marc Zyngier, kvmarm

On Thu, Apr 09, 2015 at 03:59:46PM +0200, Andrew Jones wrote:
> On Thu, Apr 09, 2015 at 03:35:06PM +0200, Christoffer Dall wrote:
> > On Thu, Apr 09, 2015 at 02:06:47PM +0200, Andrew Jones wrote:
> > > On Thu, Apr 09, 2015 at 08:57:23AM +0100, Marc Zyngier wrote:
> > > > On Thu, 9 Apr 2015 02:46:54 +0100
> > > > Mario Smarduch <m.smarduch@samsung.com> wrote:
> > > > 
> > > > Hi Mario,
> > > > 
> > > > > I'm working with AsyncPF, and am currently using a
> > > > > hyp call to communicate the guest GFN for the host to inject
> > > > > a virtual abort - page not available/page available.
> > > > > 
> > > > > Currently only PSCI makes use of that interface
> > > > > (handle_hvc()). Can we overload the interface with additional
> > > > > hyp calls, in this case passing the guest gfn? Arg0 could be
> > > > > set to some range outside of PSCI use.
> > > > 
> > > > I can't see a reason why we wouldn't open handle_hvc() to other
> > > > paravirtualized services. But this has to be done with extreme caution:
> > > > 
> > > > - This becomes an ABI between host and guest
> > > 
> > > To expand on that, if the benefits don't outweigh the maintenance
> > > required for that ABI, for life, then it turns into a lifetime burden.
> > > Any guest-host speedups that can be conceived, which require hypercalls,
> > > should probably be bounced off the hardware people first. Waiting for
> > > improvements in the virt extensions may be a better choice than
> > > committing to a PV solution.
> > > 
> > > > - We need a discovery protocol
> > > 
> > > Hopefully all users of the PSCI hypcall have been using function #0,
> > > because handle_hvc unfortunately hasn't been checking it.
> > 
> > huh?  I don't understand this, sorry.
> 
> The hvc immediate used for psci is 0, and that's fine because there's
> only a single hvc function currently defined. If we want to define
> additional functions then we could assign each additional function a
> new immediate. However, as we've never enforced the immediate to be
> zero, there's no guarantee we won't mess up a guest that expects
> any immediate to work.

you're referring to an implicit definition of an "hvc function"
differentiated by the immediate field - this is what threw me off.

> 
> Of course we can continue allowing any immediate to work too, by
> passing the function number in some register. Xen uses x16? We could
> use reg0 too, and just integrate with the PSCI function space as well.
> 

I think you have to allow any immediate value, since I don't think PSCI
differentiates on that, and we want to implement that spec.  So if you
can't rely on re-using the PSCI function numbers within a different
immediate number space, you might as well use the register that PSCI
uses for differentiating between 'functions'.

> > 
> > > In any case,
> > > I'm not sure we have much choice but to start enforcing it now. Once we
> > > do, with something like
> > > 
> > > switch (hypcall_nr) {
> > > case 0: /* handle psci call */ break;
> > > default: return -KVM_ENOSYS;
> > > }
> > > 
> > > then, I think the guest's discovery protocol can simply be
> > > 
> > > if (do_hypercall() == -ENOSYS) {
> > >    /* PV path not supported, fall back to whatever... */
> > > }
> > > 
> > > > - We need to make sure other hypervisors don't reuse the same function
> > > >   number for other purposes
> > > 
> > > I'm not sure what this means. Xen already has several hypercalls defined
> > > for ARM, the same that they have for x86, which don't match any of the
> > > KVM hypercalls. Now, KVM for other arches (which is maybe what you meant)
> > > does define a handful, which we should integrate with, as KVM mixes
> > > architectures within its hypercall number allocation, see
> > > include/uapi/linux/kvm_para.h. Just using the common code should make it
> > > easy to avoid problems. We don't have a problem with the PSCI hypcall, as
> > > zero isn't allocated. Ideally we would define PSCI properly though,
> > > e.g. KVM_HC_ARM_PSCI, and still reserve zero in the common header. To do
> > > that maybe we'll need to keep #0 as an ARM-only alias for the new number
> > > for compatibility now?
> > > 
> > > > 
> > > > Maybe we should adopt Xen's idea of a hypervisor node in DT where we
> > > > would describe the various services? How will that work with ACPI?
> > > 
> > > I don't think we'll ever have a "virt guest" ACPI table that we can
> > > use for this stuff, so this won't work for ACPI. But I think the ENOSYS
> > > probing should be sufficient for this anyway.
> > > 
> > We've reserved a Xen table on ACPI, not sure why we can't do the same
> > for KVM or a generic ARM PV table for that matter... ?
> 
> OK. Does that table require the Linux kernel (outside dom0 specific
> code) to know about it? Would a more generic table require that? I'm
> always wary of virt creep.
> 

Not sure what your distinction between dom0 specific code and Linux is,
but it's very similar to the Xen node in the DT.  If it's present and
contains valid entries, the table tells an OS that it's running on the
Xen hypervisor and provides the necessary info to run on Xen, such as
the grant table addresses etc.

See the spec here:
http://wiki.xenproject.org/mediawiki/images/c/c4/Xen-environment-table.pdf

Note that this table spec is not part of the ACPI spec but is just a
reserved name, which is significantly easier than including something in
the ACPI spec itself.  For the purposes of Linux on KVM/ARM this is
completely fine too, and if anything else wants to run on KVM/ARM they
most likely will refer to the same table and/or advocate including the
table spec within the ACPI spec itself.

-Christoffer


* Re: Advice on HYP interface for AsyncPF
  2015-04-09 14:00         ` Mark Rutland
@ 2015-04-09 14:22           ` Andrew Jones
  2015-04-09 14:37             ` Mark Rutland
  2015-04-09 14:48             ` Mark Rutland
  0 siblings, 2 replies; 21+ messages in thread
From: Andrew Jones @ 2015-04-09 14:22 UTC
  To: Mark Rutland; +Cc: Marc Zyngier, kvmarm

On Thu, Apr 09, 2015 at 03:00:27PM +0100, Mark Rutland wrote:
> > > While the HVC immediate could be used to distinguish different types of
> > > calls, the guest still needs to first determine that issuing a HVC is
> > > not going to bring down the system, which requires it to know that a
> > > suitable hypervisor is present.
> > 
> > Right. I forgot we don't have anything for this in the kvmarm world. I
> > should have remembered, having just crossed this path for a different
> > issue (virt-what). In the x86 world we have a cpuid that allows guests
> > to see that they are a) a guest and b) of what type. The hypervisor can
> > fake the type if it wishes. For example KVM can emulate HyperV, allowing
> > Windows guests to use their "native" PV ops.
> > 
> > In the ARM world the hypervisor DT node does seem to be the closest
> > equivalent that currently exists. Both Xen and ppc KVM use it already.
> > Using this for DT guests means we'll need the ACPI solution though.
> 
> Sure.
> 
> > > > > Maybe we should adopt Xen's idea of a hypervisor node in DT where we
> > > > > would describe the various services? How will that work with ACPI?
> > > > 
> > > > I don't think we'll ever have a "virt guest" ACPI table that we can
> > > > use for this stuff, so this won't work for ACPI. But I think the ENOSYS
> > > > probing should be sufficient for this anyway.
> > > 
> > > As mentioned above, I don't think that probing is safe.
> > > 
> > > What prevents us from creating a trivial "KVM guest" table that we can
> > > use to determine that we can query more advanced info from KVM itself?
> > > Given the point is to expose KVM-specific functionality, I don't see why
> > > we need a more generic "virt guest" ACPI table.
> > >
> > 
> > Just as the hypervisor node is more attractive with the consideration
> > that it's being adopted by other parties (xen and kvmppc), any ACPI
> > tables will be more likely to be accepted if they have buy-in from
> > a greater audience. kvmarm would be the only consumer for the time
> > being, but it'd be good if it was more general from the start,
> > particularly if general kernel code would need to know about it.
> 
> I'm not sure I follow, given that there's no generic hypervisor node in
> DT at the moment.
> 
> In the case of Xen there's a node with a "xen,xen" compatible string.
> 
> In the case of PPC KVM, there's a node with a "linux,kvm" compatible
> string.

DT nodes documented in Documentation/devicetree/bindings are about as
good as DT node formalism gets, right? Currently the hypervisor node
is only documented separately in the arm/ and powerpc/ dirs, so maybe
a Documentation/devicetree/bindings/hypervisor.txt file would be a good
addition in order to make the common node official.
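
A strawman for what that file could say (contents invented here):

* Common hypervisor node

Node name should be "hypervisor". Required properties:
- compatible: selects the actual binding, e.g. "xen,xen" or
  "linux,kvm"; all further properties are defined by that binding.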

> 
> These don't have any overlap. I'm not sure what a generic node would buy
> you short of detecting (some cases) when you're being virtualised, given
> you'd have to know about the particular hypervisor and/or services
> anyway.

As I pointed out, it's possible to fake one hypervisor with another,
but only if the guest knows just one (standard) place to look. For
example, if Windows guests were taught to look for a particular ACPI
table to learn about their HyperV features, then KVM could fake that
table and also emulate the hypercalls. Besides, tools like virt-what
would benefit from knowing where to look, etc.

So, if we go the ACPI table route, then I think it should be a
hypervisor and architecture agnostic table. Just as the "hypervisor"
node is super generically named, and it's up to the compatible strings
to sort it out.

(btw, the hypervisor node already supports faking other hypervisors with
 its compatible string. You could specify "linux,kvm" first, and
 "xen,xen" second, if you had the capability to emulate the later.)

drew


* Re: Advice on HYP interface for AsyncPF
  2015-04-09 14:22           ` Andrew Jones
@ 2015-04-09 14:37             ` Mark Rutland
  2015-04-09 14:54               ` Andrew Jones
  2015-04-09 14:48             ` Mark Rutland
  1 sibling, 1 reply; 21+ messages in thread
From: Mark Rutland @ 2015-04-09 14:37 UTC
  To: Andrew Jones; +Cc: Marc Zyngier, kvmarm

> > > Just as the hypervisor node is more attractive with the consideration
> > > that it's being adopted by other parties (xen and kvmppc), any ACPI
> > > tables will be more likely to be accepted if they have buy-in from
> > > a greater audience. kvmarm would be the only consumer for the time
> > > being, but it'd be good if it was more general from the start,
> > > particularly if general kernel code would need to know about it.
> > 
> > I'm not sure I follow, given that there's no generic hypervisor node in
> > DT at the moment.
> > 
> > In the case of Xen there's a node with a "xen,xen" compatible string.
> > 
> > In the case of PPC KVM, there's a node with a "linux,kvm" compatible
> > string.
> 
> DT nodes documented in Documentation/devicetree/bindings are about as
> good as DT node formalism gets, right? Currently the hypervisor node
> is only documented separately in the arm/ and powerpc/ dirs, so maybe
> a Documentation/devicetree/bindings/hypervisor.txt file would be a good
> addition in order to make the common node official.

From my PoV, each hypervisor has its own binding, which is identified by
its set of compatible strings, and node naming colliding on
"/hypervisor" is coincidental.

We can suggest that "/hypervisor" is the usual name for a node related
to a hypervisor, but that tells us nothing about the format of the node.

> > These don't have any overlap. I'm not sure what a generic node would
> > buy
> > you short of detecting (some cases) when you're being virtualised, given
> > you'd have to know about the particular hypervisor and/or services
> > anyway.
> 
> As I pointed out, it's possible to fake one hypervisor with another,
> but only if the guest knows just one (standard) place to look.

For a hypervisor to impersonate another, it just needs to do whatever
that other hypervisor does. The guest needs to know where to look for
the info relating to the hypervisor being impersonated, but that does
not require a common location.

> For example, if Windows guests were taught to look for a particular
> ACPI table to learn about their HyperV features, then KVM could fake
> that table and also emulate the hypercalls.

This requires KVM to provide those services and for the information to
be in the appropriate location. This does not require all hypervisors to
place information in the same location, nor for there to be a common
format for that information.

> Besides, tools like virt-what would benefit from knowing where to
> look, etc.

This is a different case, for which a common location would likely be
helpful, though not complete.

> So, if we go the ACPI table route, then I think it should be a
> hypervisor and architecture agnostic table. Just as the "hypervisor"
> node is super generically named, and it's up to the compatible strings
> to sort it out.
> 
> (btw, the hypervisor node already supports faking other hypervisors with
>  its compatible string. You could specify "linux,kvm" first, and
>  "xen,xen" second, if you had the capability to emulate the later.)

Evidently we have different views w.r.t. binding to tables.

From my PoV, a node which we don't recognise the compatible string for
(which is not a legacy node with a static path) is something we know
nothing about, and any name applied to the node is just there to help
the casual reader. For there to be a contract as to the purpose and
format of the node, there should be a compatible string list.

Thanks,
Mark.


* Re: Advice on HYP interface for AsyncPF
  2015-04-09 14:22         ` Christoffer Dall
@ 2015-04-09 14:42           ` Andrew Jones
  0 siblings, 0 replies; 21+ messages in thread
From: Andrew Jones @ 2015-04-09 14:42 UTC
  To: Christoffer Dall; +Cc: Marc Zyngier, kvmarm

On Thu, Apr 09, 2015 at 04:22:33PM +0200, Christoffer Dall wrote:
> On Thu, Apr 09, 2015 at 03:59:46PM +0200, Andrew Jones wrote:
> > On Thu, Apr 09, 2015 at 03:35:06PM +0200, Christoffer Dall wrote:
> > > On Thu, Apr 09, 2015 at 02:06:47PM +0200, Andrew Jones wrote:
> > > > On Thu, Apr 09, 2015 at 08:57:23AM +0100, Marc Zyngier wrote:
> > > > > On Thu, 9 Apr 2015 02:46:54 +0100
> > > > > Mario Smarduch <m.smarduch@samsung.com> wrote:
> > > > > 
> > > > > Hi Mario,
> > > > > 
> > > > > > I'm working with AsyncPF, and am currently using a
> > > > > > hyp call to communicate the guest GFN for the host to inject
> > > > > > a virtual abort - page not available/page available.
> > > > > > 
> > > > > > Currently only PSCI makes use of that interface
> > > > > > (handle_hvc()). Can we overload the interface with additional
> > > > > > hyp calls, in this case passing the guest gfn? Arg0 could be
> > > > > > set to some range outside of PSCI use.
> > > > > 
> > > > > I can't see a reason why we wouldn't open handle_hvc() to other
> > > > > paravirtualized services. But this has to be done with extreme caution:
> > > > > 
> > > > > - This becomes an ABI between host and guest
> > > > 
> > > > To expand on that, if the benefits don't outweigh the maintenance
> > > > required for that ABI, for life, then it turns into a lifetime burden.
> > > > Any guest-host speedups that can be conceived, which require hypercalls,
> > > > should probably be bounced off the hardware people first. Waiting for
> > > > improvements in the virt extensions may be a better choice than
> > > > committing to a PV solution.
> > > > 
> > > > > - We need a discovery protocol
> > > > 
> > > > Hopefully all users of the PSCI hypcall have been using function #0,
> > > > because handle_hvc unfortunately hasn't been checking it.
> > > 
> > > huh?  I don't understand this, sorry.
> > 
> > The hvc immediate used for psci is 0, and that's fine because there's
> > only a single hvc function currently defined. If we want to define
> > additional functions then we could assign each additional function a
> > new immediate. However, as we've never enforced the immediate to be
> > zero, there's no guarantee we won't mess up a guest that expects
> > any immediate to work.
> 
> you're referring to an implicit definition of an "hvc function"
> differentiated by the immediate field - this is what threw me off.
> 
> > 
> > Of course we can continue allowing any immediate to work too, by
> > passing the function number in some register. Xen uses x16? We could
> > use reg0 too, and just integrate with the PSCI function space as well.
> > 
> 
> I think you have to allow any immediate value, since I don't think PSCI
> differentiates on that, and we want to implement that spec.  So if you
> can't rely on re-using the PSCI function numbers within a different
> immediate number space, you might as well use the register that PSCI
> uses for differentiating between 'functions'.
> 
> > > 
> > > > In any case,
> > > > I'm not sure we have much choice but to start enforcing it now. Once we
> > > > do, with something like
> > > > 
> > > > switch (hypcall_nr) {
> > > > case 0: /* handle psci call */ break;
> > > > default: return -KVM_ENOSYS;
> > > > }
> > > > 
> > > > then, I think the guest's discovery protocol can simply be
> > > > 
> > > > if (do_hypercall() == -ENOSYS) {
> > > >    /* PV path not supported, fall back to whatever... */
> > > > }
> > > > 
> > > > > - We need to make sure other hypervisors don't reuse the same function
> > > > >   number for other purposes
> > > > 
> > > > I'm not sure what this means. Xen already has several hypercalls defined
> > > > for ARM, the same that they have for x86, which don't match any of the
> > > > KVM hypercalls. Now, KVM for other arches (which is maybe what you meant)
> > > > does define a handful, which we should integrate with, as KVM mixes
> > > > architectures within its hypercall number allocation, see
> > > > include/uapi/linux/kvm_para.h. Just using the common code should make it
> > > > easy to avoid problems. We don't have a problem with the PSCI hypcall, as
> > > > zero isn't allocated. Ideally we would define PSCI properly though,
> > > > e.g. KVM_HC_ARM_PSCI, and still reserve zero in the common header. To do
> > > > that maybe we'll need to keep #0 as an ARM-only alias for the new number
> > > > for compatibility now?
> > > > 
> > > > > 
> > > > > Maybe we should adopt Xen's idea of a hypervisor node in DT where we
> > > > > would describe the various services? How will that work with ACPI?
> > > > 
> > > > I don't think we'll ever have a "virt guest" ACPI table that we can
> > > > use for this stuff, so this won't work for ACPI. But I think the ENOSYS
> > > > probing should be sufficient for this anyway.
> > > > 
> > > We've reserved a Xen table on ACPI, not sure why we can't do the same
> > > for KVM or a generic ARM PV table for that matter... ?
> > 
> > OK. Does that table require the Linux kernel (outside dom0 specific
> > code) to know about it? Would a more generic table require that? I'm
> > always wary of virt creep.
> > 
> 
> Not sure what your distinction between dom0 specific code and Linux is,

Just how much of the code that parses the Xen table is in the common
ACPI parsing code vs. how much is wrapped in #ifdef CONFIG_XEN.

> but it's very similar to the Xen node in the DT.  If it's present and
> contains valid entries, the table tells an OS that it's running on the
> Xen hypervisor and provides the necessary info to run on Xen, such as
> the grant table addresses etc.
> 
> See the spec here:
> http://wiki.xenproject.org/mediawiki/images/c/c4/Xen-environment-table.pdf
> 
> Note that this table spec is not part of the ACPI spec but is just a
> reserved name, which is significantly easier than including something in
> the ACPI spec itself.  For the purposes of Linux on KVM/ARM this is
> completely fine too, and if anything else wants to run on KVM/ARM they
> most likely will refer to the same table and/or advocate including the
> table spec within the ACPI spec itself.
>

OK, so maybe we should start by sprucing up the hypervisor DT node, as
it'll be easier to work things out with that, and then the ACPI table
definition should follow more easily.

drew


* Re: Advice on HYP interface for AsyncPF
  2015-04-09 14:22           ` Andrew Jones
  2015-04-09 14:37             ` Mark Rutland
@ 2015-04-09 14:48             ` Mark Rutland
  1 sibling, 0 replies; 21+ messages in thread
From: Mark Rutland @ 2015-04-09 14:48 UTC
  To: Andrew Jones; +Cc: Marc Zyngier, kvmarm

> (btw, the hypervisor node already supports faking other hypervisors with
>  its compatible string. You could specify "linux,kvm" first, and
>  "xen,xen" second, if you had the capability to emulate the later.)

I don't think that would be a good idea. You'd then have to ensure that
you have properties from both bindings (hoping they don't clash).

What's the guest meant to do if it recognises both strings?

Mark.


* Re: Advice on HYP interface for AsyncPF
  2015-04-09 14:37             ` Mark Rutland
@ 2015-04-09 14:54               ` Andrew Jones
  2015-04-09 15:20                 ` Mark Rutland
  0 siblings, 1 reply; 21+ messages in thread
From: Andrew Jones @ 2015-04-09 14:54 UTC
  To: Mark Rutland; +Cc: Marc Zyngier, kvmarm

On Thu, Apr 09, 2015 at 03:37:14PM +0100, Mark Rutland wrote:
> > > > Just as the hypervisor node is more attractive with the consideration
> > > > that it's being adopted by other parties (xen and kvmppc), any ACPI
> > > > tables will be more likely to be accepted if they have buy-in from
> > > > a greater audience. kvmarm would be the only consumer for the time
> > > > being, but it'd be good if it was more general from the start,
> > > > particularly if general kernel code would need to know about it.
> > > 
> > > I'm not sure I follow, given that there's no generic hypervisor node in
> > > DT at the moment.
> > > 
> > > In the case of Xen there's a node with a "xen,xen" compatible string.
> > > 
> > > In the case of PPC KVM, there's a node with a "linux,kvm" compatible
> > > string.
> > 
> > DT nodes documented in Documentation/devicetree/bindings are about as
> > good as DT node formalism gets, right? Currently the hypervisor node
> > is only documented separately in the arm/ and powerpc/ dirs, so maybe
> > a Documentation/devicetree/bindings/hypervisor.txt file would be a good
> > addition in order to make the common node official.
> 
> From my PoV, each hypervisor has its own binding, which is identified by
> its set of compatible strings, and node naming colliding on
> "/hypervisor" is coincidental.
> 
> We can suggest that "/hypervisor" is the usual name for a node related
> to a hypervisor, but that tells us nothing about the format of the node.
> 
> > > These don't have any overlap. I'm not sure what a generic node would
> > > buy
> > > you short of detecting (some cases) when you're being virtualised, given
> > > you'd have to know about the particular hypervisor and/or services
> > > anyway.
> > 
> > As I pointed out, it's possible to fake one hypervisor with another,
> > but only if the guest knows just one (standard) place to look.
> 
> For a hypervisor to impersonate another, it just needs to do whatever
> that other hypervisor does. The guest needs to know where to look for
> the info relating to the hypervisor being impersonated, but that does
> not require a common location.

True. Not required, but more convenient.

> 
> > For example, if Windows guests were taught to look for a particular
> > ACPI table to learn about their HyperV features, then KVM could fake
> > that table and also emulate the hypercalls.
> 
> This requires KVM to provide those services and for the information to
> be in the appropriate location. This does not require all hypervisors to
> place information in the same location, nor for there to be a common
> format for that information.
> 
> > Besides, tools like virt-what would benefit from knowing where to
> > look, etc.
> 
> This is a different case, for which a common location would likely be
> helpful, though not complete.
> 
> > So, if we go the ACPI table route, then I think it should be a
> > hypervisor and architecture agnostic table. Just as the "hypervisor"
> > node is super generically named, and it's up to the compatible strings
> > to sort it out.
> > 
> > (btw, the hypervisor node already supports faking other hypervisors with
> >  its compatible string. You could specify "linux,kvm" first, and
> >  "xen,xen" second, if you had the capability to emulate the later.)
> 
> Evidently we have different views w.r.t. binding to tables.
> 
> From my PoV, a node which we don't recognise the compatible string for
> (which is not a legacy node with a static path) is something we know
> nothing about, and any name applied to the node is just there to help
> the casual reader. For there to be a contract as to the purpose and
> format of the node, there should be a compatible string list.

I see your point of view. I see the hypervisor node as the
"you are a guest" indicator, which, for convenience, should be common
across guests. Otherwise the "is_guest()" function would be many
if-else statements trying to determine the type of guest it is before
it even knows that it is a guest. If the hypervisor node exists, then
is_guest() is true, and the compatible string will tell us how to
proceed with the type. You're right about trying to shove two types of
hypervisors into the same hypervisor node with the compatible list though.
Generically named properties, like 'reg', can only apply to one, so
perhaps that wouldn't work.
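
For concreteness, the cheap is_guest() I have in mind (a sketch; the
helper is hypothetical, and of_node_put(NULL) is a safe no-op):

static bool is_guest(void)
{
	/* one lookup; the compatible string then says which hypervisor */
	struct device_node *np = of_find_node_by_path("/hypervisor");

	of_node_put(np);
	return np != NULL;
}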

Thanks,
drew


* Re: Advice on HYP interface for AsyncPF
  2015-04-09 14:54               ` Andrew Jones
@ 2015-04-09 15:20                 ` Mark Rutland
  2015-04-09 19:01                   ` Andrew Jones
  0 siblings, 1 reply; 21+ messages in thread
From: Mark Rutland @ 2015-04-09 15:20 UTC
  To: Andrew Jones; +Cc: Marc Zyngier, kvmarm

> > > For example, if Windows guests were taught to look for a particular
> > > ACPI table to learn about their HyperV features, then KVM could fake
> > > that table and also emulate the hypercalls.
> > 
> > This requires KVM to provide those services and for the information to
> > be in the appropriate location. This does not require all hypervisors to
> > place information in the same location, nor for there to be a common
> > format for that information.
> > 
> > > Besides, tools like virt-what would benefit from knowing where to
> > > look, etc.
> > 
> > This is a different case, for which a common location would likely be
> > helpful, though not complete.
> > 
> > > So, if we go the ACPI table route, then I think it should be a
> > > hypervisor and architecture agnostic table. Just as the "hypervisor"
> > > node is super generically named, and it's up to the compatible strings
> > > to sort it out.
> > > 
> > > (btw, the hypervisor node already supports faking other hypervisors with
> > >  its compatible string. You could specify "linux,kvm" first, and
> > >  "xen,xen" second, if you had the capability to emulate the later.)
> > 
> > Evidently we have different views w.r.t. binding to tables.
> > 
> > From my PoV, a node which we don't recognise the compatible string for
> > (which is not a legacy node with a static path) is something we know
> > nothing about, and any name applied to the node is just there to help
> > the casual reader. For there to be a contract as to the purpose and
> > format of the node, there should be a compatible string list.
> 
> I see your point of view. I see the hypervisor node as the
> "you are a guest" indicator, which, for convenience, should be common
> across guests. 

I assume you mean common across hypervisors?

A node called "hypervisor" is admittedly a pretty good sign, but there
won't always be such a node.

We can certainly encourage vendors to call their node /hypervisor. I
don't think that it makes sense to enforce a common binding across
these, because they are by definition trying to convey
hypervisor-specific information.

> Otherwise the "is_guest()" function would be many if-else statements
> trying to determine the type of guest it is before it even knows that
> it is a guest.

It's worth noting that to some extent this may always be the case (e.g.
is Dom0 a guest?), and it won't always be possible to determine if any
(particular) hypervisor is present. For example, kvmtool does not place
any special information in the DT regarding itself, and uses common
peripherals (so you can't use these to fingerprint it reliably).

> If the hypervisor node exists, then is_guest() is true, and the
> compatible string will tell us how to proceed with the type. 

Nit: s/the hypervisor node/a hypervisor node/ -- let's not pretend
there's a common standard where there is not.

This would work where you recognise the string, but so would looking for
well-known strings.

The only thing you gain by assuming that the hypervisor has a node
called /hypervisor is the ability to (sometimes) detect that a
hypervisor you know nothing about is probably present.

> You're right about trying to shove two types of hypervisors into the
> same hypervisor node with the compatible list though.  Generically
> named properties, like 'reg', can only apply to one, so perhaps that
> wouldn't work.

In general, it will not work.

As I understand it on x86 if KVM masquerades as HyperV it's not also
visible as KVM. Is that correct?

There's also a huge grey area regarding what the guest should do if it
thinks it recognises both HyperV and KVM (or any other combination)
simultaneously.

Is there any reason for a hypervisor to try to both masquerade as a
different hypervisor and advertise itself natively? The whole sharing a
node / table thing may be irrelevant.

Thanks,
Mark.

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: Advice on HYP interface for AsyncPF
  2015-04-09 15:20                 ` Mark Rutland
@ 2015-04-09 19:01                   ` Andrew Jones
  2015-04-13 10:46                     ` Mark Rutland
  0 siblings, 1 reply; 21+ messages in thread
From: Andrew Jones @ 2015-04-09 19:01 UTC (permalink / raw)
  To: Mark Rutland; +Cc: Marc Zyngier, kvmarm

On Thu, Apr 09, 2015 at 04:20:09PM +0100, Mark Rutland wrote:
> We can certainly encourage vendors to call their node /hypervisor. I
> don't think that it makes sense to enforce a common binding across
> these, because they are by definition trying to convey
> hypervisor-specific information.

OK, I agree. There's no need to require a common binding. Making it
easy for the same binding to be used by multiple hypervisors, at least
to some degree, would still be nice though.

> > Otherwise the "is_guest()" function would be many if-else statements
> > trying to determine the type of guest it is before it even knows that
> > it is a guest.
> 
> It's worth noting that to some extent this may always be the case (e.g.
> is Dom0 a guest?), and it won't always be possible to determine if any
> (particular) hypervisor is present. For example, kvmtool does not place
> any special information in the DT regarding itself, and uses common
> peripherals (so you can't use these to fingerprint it reliably).

Right, but if we need kvm to advertise hypervisor features in some way
(which, btw, x86 does do as well, by using a bitmap in another cpuid
leaf), then we'll need this DT node and ACPI table, or some other idea.
Anyway, it wouldn't hurt to have something now, just for the virt-what
type of case.

> This would work where you recognise the string, but so would looking for
> well-known strings.
> 
> The only thing you gain by assuming that the hypervisor has a node
> called /hypervisor is the ability to (sometimes) detect that a
> hypervisor you know nothing about is probably present.

You also gain a common location for the documentation of those
well-known strings, and the ability to share a bit of code in the
parsing.

> 
> > You're right about trying to shove two types of hypervisors into the
> > same hypervisor node with the compatible list though.  Generically
> > named properties, like 'reg', can only apply to one, so perhaps that
> > wouldn't work.
> 
> In general, it will not work.
> 
> As I understand it on x86 if KVM masquerades as HyperV it's not also
> visible as KVM. Is that correct?

Actually you can supply more than one hypervisor signature in the cpuids.
There's cpuid space allocated for up to 256. A guest can then search
the space in preference order for the hypervisor type it wants. QEMU/KVM
for x86 does this. It sets things up such that hyperv is in the first
location, which is the lowest preferred location for Linux, and likely
the only place Windows checks. It still puts KVM in the second location,
which is a higher preference location for Linux, and relies on Linux
finding it there. Linux finds it with
arch/x86/kernel/cpu/hypervisor.c:detect_hypervisor_vendor
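
From memory, that search boils down to roughly the following (a
simplified sketch, not verbatim kernel code):

#include <stdint.h>
#include <string.h>

static void cpuid(uint32_t leaf, uint32_t *eax, uint32_t *ebx,
                  uint32_t *ecx, uint32_t *edx)
{
        asm volatile("cpuid"
                     : "=a" (*eax), "=b" (*ebx), "=c" (*ecx), "=d" (*edx)
                     : "a" (leaf));
}

/*
 * Hypervisor CPUID leaves live at 0x40000000 + N * 0x100, N < 256.
 * Return the base leaf carrying the given 12-byte signature
 * (e.g. "KVMKVMKVM\0\0\0"), or 0 if it isn't found.
 */
static uint32_t find_hv_signature(const char *sig)
{
        uint32_t base, eax, signature[3];

        for (base = 0x40000000; base < 0x40010000; base += 0x100) {
                cpuid(base, &eax, &signature[0],
                      &signature[1], &signature[2]);
                if (!memcmp(sig, signature, 12))
                        return base;
        }
        return 0;
}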

Note how the hypervisor detection is common for all x86 hypervisors that
Linux knows about. A specified /hypervisor node could be the DT
equivalent. Unfortunately, as we've stated, specifying a list of
compatible hypervisor types in the compatible property won't work, as
generically named properties of the /hypervisor node would then become
useless. I guess we need something else.

> 
> There's also a huge grey area regarding what the guest should do if it
> thinks it recognises both HyperV and KVM (or any other combination)
> simultaneously.

I don't think so. If the guest searches in preference-order, or the list
order tells the guest which one it should prefer, as is done for x86,
then it just uses the most preferred, and supported, one it finds.

> 
> Is there any reason for a hypervisor to try to both masquerade as a
> different hypervisor and advertise itself natively? The whole sharing a
> node / table thing may be irrelevant.
>

Yes. When you don't know the guest type you're booting, then you need
to expose all that you can handle, and then rely on the guest to pick
the right one.

Thanks,
drew

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: Advice on HYP interface for AsyncPF
  2015-04-09  7:57 ` Marc Zyngier
  2015-04-09 12:06   ` Andrew Jones
@ 2015-04-10  2:36   ` Mario Smarduch
  2015-04-10  8:53     ` Marc Zyngier
  1 sibling, 1 reply; 21+ messages in thread
From: Mario Smarduch @ 2015-04-10  2:36 UTC (permalink / raw)
  To: Marc Zyngier; +Cc: kvmarm

On 04/09/2015 12:57 AM, Marc Zyngier wrote:
> On Thu, 9 Apr 2015 02:46:54 +0100
> Mario Smarduch <m.smarduch@samsung.com> wrote:
> 
> Hi Mario,
> 
>> I'm working with AsyncPF, and currently using
>> hyp call to communicate guest GFN for host to inject
>> virtual abort - page not available/page available.
>>
>> Currently only PSCI makes use of that interface,
>> (handle_hvc()) can we overload interface with additional
>> hyp calls in this case pass guest gfn? Set arg0
>> to some range outside of PSCI use.
> 
> I can't see a reason why we wouldn't open handle_hvc() to other
> paravirtualized services. But this has to be done with extreme caution:
> 
> - This becomes an ABI between host and guest
> - We need a discovery protocol
> - We need to make sure other hypervisors don't reuse the same function
>   number for other purposes
> 
> Maybe we should adopt Xen's idea of a hypervisor node in DT where we
> would describe the various services? How will that work with ACPI?
> 
> Coming back to AsyncPF, and purely out of curiosity: why do you need a
> HYP entry point? From what I remember, AsyncPF works by injecting a
> fault in the guest when the page is found not present or made
> available, with the GFN being stored in a per-vcpu memory location.
> 
> Am I missing something obvious? Or have I just displayed my ignorance on
> this subject? ;-)
Hi Marc,

Or it might be me :)

But I'm thinking the guest and host need to agree on some per-vcpu
guest memory for KVM to write the PV-fault type to, and for the guest
to read and ack that type. Having the guest allocate the per-cpu
PV-fault memory and inform KVM of its GPA via hyp call is one
approach I was thinking of.
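
Roughly what I have in mind on the guest side, just to illustrate (the
function ID, struct layout and hvc helper below are all made up):

#include <linux/types.h>
#include <linux/percpu.h>
#include <asm/memory.h>

struct kvm_apf_reason {
        u32 reason;     /* 0 = none, 1 = page not present, 2 = page ready */
        u32 token;
};

static DEFINE_PER_CPU(struct kvm_apf_reason, apf_reason);

/* hypothetical function ID, chosen outside the PSCI 0.2 range */
#define HVC_FN_APF_SET_AREA     0x86001000

static u64 kvm_hyp_call(u64 fn, u64 arg)
{
        register u64 x0 asm("x0") = fn;
        register u64 x1 asm("x1") = arg;

        asm volatile("hvc #0"
                     : "+r" (x0)
                     : "r" (x1)
                     : "memory");
        return x0;
}

/* called once per vcpu during guest bringup */
static void kvm_apf_register(void)
{
        kvm_hyp_call(HVC_FN_APF_SET_AREA,
                     virt_to_phys(this_cpu_ptr(&apf_reason)));
}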

I was looking through x86, which is based on CPUID extended with
PV feature support. In the guest, if the ASYNC PF feature is enabled,
it writes the GPA to the ASYNC PF MSR, which is resolved in KVM
(x86 folks can correct me if I'm off here).

I'm wondering if we could build on this concept, maybe with PV ID_*
registers, to discover the existence of the ASYNC PF feature?


- Mario



> 
> Thanks,
> 
> 	M.
> 

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: Advice on HYP interface for AsyncPF
  2015-04-10  2:36   ` Mario Smarduch
@ 2015-04-10  8:53     ` Marc Zyngier
  2015-04-10 23:45       ` Mario Smarduch
  0 siblings, 1 reply; 21+ messages in thread
From: Marc Zyngier @ 2015-04-10  8:53 UTC (permalink / raw)
  To: Mario Smarduch; +Cc: kvmarm

On 10/04/15 03:36, Mario Smarduch wrote:
> On 04/09/2015 12:57 AM, Marc Zyngier wrote:
>> On Thu, 9 Apr 2015 02:46:54 +0100
>> Mario Smarduch <m.smarduch@samsung.com> wrote:
>>
>> Hi Mario,
>>
>>> I'm working with AsyncPF, and currently using
>>> hyp call to communicate guest GFN for host to inject
>>> virtual abort - page not available/page available.
>>>
>>> Currently only PSCI makes use of that interface,
>>> (handle_hvc()) can we overload interface with additional
>>> hyp calls in this case pass guest gfn? Set arg0
>>> to some range outside of PSCI use.
>>
>> I can't see a reason why we wouldn't open handle_hvc() to other
>> paravirtualized services. But this has to be done with extreme caution:
>>
>> - This becomes an ABI between host and guest
>> - We need a discovery protocol
>> - We need to make sure other hypervisors don't reuse the same function
>>   number for other purposes
>>
>> Maybe we should adopt Xen's idea of a hypervisor node in DT where we
>> would describe the various services? How will that work with ACPI?
>>
>> Coming back to AsyncPF, and purely out of curiosity: why do you need a
>> HYP entry point? From what I remember, AsyncPF works by injecting a
>> fault in the guest when the page is found not present or made
>> available, with the GFN being stored in a per-vcpu memory location.
>>
>> Am I missing something obvious? Or have I just displayed my ignorance on
>> this subject? ;-)
> Hi Marc,
> 
> Or it might be me :)
> 
> But I'm thinking the guest and host need to agree on some per-vcpu
> guest memory for KVM to write the PV-fault type to, and for the guest
> to read and ack that type. Having the guest allocate the per-cpu
> PV-fault memory and inform KVM of its GPA via hyp call is one
> approach I was thinking of.

Ah, I see what you mean. I was only looking at the runtime aspect of
things, and didn't consider the (all-important) setup stage.

> I was looking through x86, which is based on CPUID extended with
> PV feature support. In the guest, if the ASYNC PF feature is enabled,
> it writes the GPA to the ASYNC PF MSR, which is resolved in KVM
> (x86 folks can correct me if I'm off here).
> 
> I'm wondering if we could build on this concept, maybe with PV ID_*
> registers, to discover the existence of the ASYNC PF feature?

I suppose we could do something similar with the ImpDef encoding space
(i.e. what is trapped using HCR_EL2.TIDCP). The main issue with that is
being able to safely carve out a range that will never be used by any
HW implementation, ever. I can't really see how we could enforce this.
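
For illustration, the guest-side access would look something like the
following; the specific encoding is made up, which is exactly the
problem:

#include <linux/types.h>

/*
 * op0==3, CRn==15 is in the IMPLEMENTATION DEFINED encoding space,
 * and EL1 accesses to it can be trapped to EL2 with HCR_EL2.TIDCP.
 */
static inline u64 read_pv_id_reg(void)
{
        u64 val;

        asm volatile("mrs %0, S3_0_C15_C0_0" : "=r" (val));
        return val;
}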

Also, it will have the exact same cost as a hypercall, so maybe it is
a bit of a moot point. Anyway, this is "just" a matter of being able to
describe the feature to the guest (and it seems like this is the real
controversial aspect)...

Thanks,

	M.
-- 
Jazz is not dead. It just smells funny...

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: Advice on HYP interface for AsyncPF
  2015-04-10  8:53     ` Marc Zyngier
@ 2015-04-10 23:45       ` Mario Smarduch
  0 siblings, 0 replies; 21+ messages in thread
From: Mario Smarduch @ 2015-04-10 23:45 UTC (permalink / raw)
  To: Marc Zyngier; +Cc: kvmarm

On 04/10/2015 01:53 AM, Marc Zyngier wrote:
> On 10/04/15 03:36, Mario Smarduch wrote:
>> On 04/09/2015 12:57 AM, Marc Zyngier wrote:
>>> On Thu, 9 Apr 2015 02:46:54 +0100
>>> Mario Smarduch <m.smarduch@samsung.com> wrote:
>>>
>>> Hi Mario,
>>>
>>>> I'm working with AsyncPF, and currently using
>>>> hyp call to communicate guest GFN for host to inject
>>>> virtual abort - page not available/page available.
>>>>
>>>> Currently only PSCI makes use of that interface,
>>>> (handle_hvc()) can we overload interface with additional
>>>> hyp calls in this case pass guest gfn? Set arg0
>>>> to some range outside of PSCI use.
>>>
>>> I can't see a reason why we wouldn't open handle_hvc() to other
>>> paravirtualized services. But this has to be done with extreme caution:
>>>
>>> - This becomes an ABI between host and guest
>>> - We need a discovery protocol
>>> - We need to make sure other hypervisors don't reuse the same function
>>>   number for other purposes
>>>
>>> Maybe we should adopt Xen's idea of a hypervisor node in DT where we
>>> would describe the various services? How will that work with ACPI?
>>>
>>> Coming back to AsyncPF, and purely out of curiosity: why do you need a
>>> HYP entry point? From what I remember, AsyncPF works by injecting a
>>> fault in the guest when the page is found not present or made
>>> available, with the GFN being stored in a per-vcpu memory location.
>>>
>>> Am I missing something obvious? Or have I just displayed my ignorance on
>>> this subject? ;-)
>> Hi Marc,
>>
>> Or it might be me :)
>>
>> But I'm thinking the guest and host need to agree on some per-vcpu
>> guest memory for KVM to write the PV-fault type to, and for the guest
>> to read and ack that type. Having the guest allocate the per-cpu
>> PV-fault memory and inform KVM of its GPA via hyp call is one
>> approach I was thinking of.
> 
> Ah, I see what you mean. I was only looking at the runtime aspect of
> things, and didn't consider the (all-important) setup stage.
> 
>> I was looking through x86, which is based on CPUID extended with
>> PV feature support. In the guest, if the ASYNC PF feature is enabled,
>> it writes the GPA to the ASYNC PF MSR, which is resolved in KVM
>> (x86 folks can correct me if I'm off here).
>>
>> I'm wondering if we could build on this concept, maybe with PV ID_*
>> registers, to discover the existence of the ASYNC PF feature?
> 
> I suppose we could do something similar with the ImpDef encoding space
> (i.e. what is trapped using HCR_EL2.TIDCP). The main issue with that is
> being able to safely carve out a range that will never be used by any
> HW implementation, ever. I can't really see how we could enforce this.

I was thinking of a virtual ID register, populated with PV
features when the vcpu is initialized. The PV guest would discover
PV features via an MMIO read of the PV ID reg. This could probably
have its own range of features, independent of HW.
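
Something along these lines on the guest side (the bit assignment is
purely hypothetical, and the base address would presumably come from a
DT reg property rather than being hardcoded):

#include <linux/io.h>
#include <linux/sizes.h>

#define PV_FEAT_ASYNC_PF        (1U << 0)       /* hypothetical bit */

static bool pv_has_async_pf(phys_addr_t pv_id_base)
{
        void __iomem *reg = ioremap(pv_id_base, SZ_4K);
        u32 features;

        if (!reg)
                return false;

        features = readl(reg);
        iounmap(reg);

        return features & PV_FEAT_ASYNC_PF;
}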

> 
> Also, it will have the exact same cost as a hypercall, so maybe it is
> a bit of a moot point. Anyway, this is "just" a matter of being able to
> describe the feature to the guest (and it seems like this is the real
> controversial aspect)...

Yes, I don't quite see the dividing line between
the hyp call and CPU ID schemes. Needs a lot more thinking.

Thanks,
  Mario


> 
> Thanks,
> 
> 	M.
> 

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: Advice on HYP interface for AsyncPF
  2015-04-09 19:01                   ` Andrew Jones
@ 2015-04-13 10:46                     ` Mark Rutland
  2015-04-13 12:52                       ` Andrew Jones
  0 siblings, 1 reply; 21+ messages in thread
From: Mark Rutland @ 2015-04-13 10:46 UTC (permalink / raw)
  To: Andrew Jones; +Cc: Marc Zyngier, kvmarm

Hi,

> > > Otherwise the "is_guest()" function would be many if-else statements
> > > trying to determine the type of guest it is before it even knows that
> > > it is a guest.
> > 
> > It's worth noting that to some extent this may always be the case (e.g.
> > is Dom0 a guest?), and it won't always be possible to determine if any
> > (particular) hypervisor is present. For example, kvmtool does not place
> > any special information in the DT regarding itself, and uses common
> > peripherals (so you can't use these to fingerprint it reliably).
> 
> Right, but if we need kvm to advertise hypervisor features in some way
> (which, btw, x86 does do as well, by using a bitmap in another cpuid
> leaf), then we'll need this DT node and ACPI table, or some other idea.

That presumes you already know the hypervisor, in order to parse those
bitmaps. So I'm not sure I follow. We will need some mechanism to expose
features, but this is orthogonal to detecting the presence of a
hypervisor, no?

> Anyway, it wouldn't hurt to have something now, just for the virt-what
> type of case.

We can add a KVM node (or a "hypervisor services" node), but we should
first figure out what we actually need to expose. It can hurt if
whatever we come up with now clashes with what we want later.

[...]

> > The only thing you gain by assuming that the hypervisor has a node
> > called /hypervisor is the ability to (sometimes) detect that a
> > hypervisor you know nothing about is probably present.
> 
> You also gain a common location for the documentation of those
> well-known strings, and the ability to share a bit of code in the
> parsing.

Sorry if I sound like a broken record here, but I don't see why the node
needs to be called /hypervisor for either of those to be true. We can
(and should) place the documentation together in a common location
regardless, and we don't need a common DT path for code to be shared,
especially given...

> > As I understand it on x86 if KVM masquerades as HyperV it's not also
> > visible as KVM. Is that correct?
> 
> Actually you can supply more than one hypervisor signature in the cpuids.
> There's cpuid space allocated for up to 256. A guest can then search
> the space in preference order for the hypervisor type it wants. QEMU/KVM
> for x86 does this. It sets things up such that hyperv is in the first
> location, which is the lowest preferred location for Linux, and likely
> the only place Windows checks. It still puts KVM in the second location,
> which is a higher preference location for Linux, and relies on Linux
> finding it there. Linux finds it with
> arch/x86/kernel/cpu/hypervisor.c:detect_hypervisor_vendor

... that the x86 code here just iterates over a set of callbacks, with
the detection logic living in each callback.

It would be interesting to know which hypervisors pretend to be each
other and why. I can see that KVM masquerading as HyperV is useful for
some existing systems, but does KVM masquerade as Xen (or vice versa)?

If this is simply to share common services, then those services could be
described independently of the hypervisor-specific node(s).

[side note: Are any Xen (or other hypervisor) developers on this list?
We need to involve them when determining standards]

> Note how the hypervisor detection is common for all x86 hypervisors that
> Linux knows about. A specified /hypervisor node could be the DT
> equivalent.

Note how this is just a list of callbacks, and there's no actual sharing
of the hypervisor-specific detection mechanism. ;)

We could do likewise, but this doesn't require trying to share a
/hypervisor node.
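
i.e. something like the sketch below, where each detect() is free to
look wherever it likes (the ops structures are hypothetical):

#include <linux/kernel.h>

struct hypervisor_ops {
        const char *name;
        bool (*detect)(void);
        void (*init)(void);
};

/* hypothetical backends, each with its own detection logic */
extern const struct hypervisor_ops kvm_hyp_ops;
extern const struct hypervisor_ops xen_hyp_ops;

static const struct hypervisor_ops *const hypervisors[] = {
        &kvm_hyp_ops,
        &xen_hyp_ops,
};

static const struct hypervisor_ops *detect_hypervisor(void)
{
        int i;

        for (i = 0; i < ARRAY_SIZE(hypervisors); i++)
                if (hypervisors[i]->detect())
                        return hypervisors[i];

        return NULL;
}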

> > There's also a huge grey area regarding what the guest should do if it
> > thinks it recognises both HyperV and KVM (or any other combination)
> > simultaneously.
> 
> I don't think so. If the guest searches in preference-order, or the list
> order tells the guest which one it should prefer, as is done for x86,
> then it just uses the most preferred, and supported, one it finds.

Those both assume that the guest only decides to use a single
hypervisor's interfaces, rather than trying to use portions of both,
which I can imagine being a possibility unless the OS has policies to
prevent that (e.g. stopping detection once _some_ hypervisor has been
detected). That's a horrible grey area.

Preference order would depend on the guest's preferences rather than the
user's, but perhaps that's not really a problem.

> > Is there any reason for a hypervisor to try to both masquerade as a
> > different hypervisor and advertise itself natively? The whole sharing a
> > node / table thing may be irrelevant.
> >
> 
> Yes. When you don't know the guest type you're booting, then you need
> to expose all that you can handle, and then rely on the guest to pick
> the right one.

I can see that masquerading is useful for providing services to guests
which only understand some proprietary hypervisor. However, we haven't
seen proprietary hypervisors (nor proprietary clients) thus far.

Is masquerading relevant currently?

Which services would we want to share?

Can we come up with common standards for those services instead?

Thanks,
Mark.

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: Advice on HYP interface for AsyncPF
  2015-04-13 10:46                     ` Mark Rutland
@ 2015-04-13 12:52                       ` Andrew Jones
  0 siblings, 0 replies; 21+ messages in thread
From: Andrew Jones @ 2015-04-13 12:52 UTC (permalink / raw)
  To: Mark Rutland; +Cc: Marc Zyngier, kvmarm

On Mon, Apr 13, 2015 at 11:46:36AM +0100, Mark Rutland wrote:
> Hi,
> 
> > > > Otherwise the "is_guest()" function would be many if-else statements
> > > > trying to determine the type of guest it is before it even knows that
> > > > it is a guest.
> > > 
> > > It's worth noting that to some extent this may always be the case (e.g.
> > > is Dom0 a guest?), and it won't always be possible to determine if any
> > > (particular) hypervisor is present. For example, kvmtool does not place
> > > any special information in the DT regarding itself, and uses common
> > > peripherals (so you can't use these to fingerprint it reliably).
> > 
> > Right, but if we need kvm to advertise hypervisor features in some way
> > (which, btw, x86 does do as well, by using a bitmap in another cpuid
> > leaf), then we'll need this DT node and ACPI table, or some other idea.
> 
> That presumes you already know the hypervisor, in order to parse those
> bitmaps. So I'm not sure I follow. We will need some mechanism to expose
> features, but this is orthogonal to detecting the presence of a
> hypervisor, no?

Yes, orthogonal. But depending on the method used to detect the
hypervisor, hypervisor-feature detection will either be a
straightforward extension of that, or we'll be revisiting the 'how'
discussion for it as well (although for that discussion we would only
need to consider kvmarm). Anyway, I think it's worth considering both
at the same time, at least for now.

> 
> > Anyway, it wouldn't hurt to have something now, just for the virt-what
> > type of case.
> 
> We can add a KVM node (or a "hypervisor services" node), but we should
> first figure out what we actually need to expose. It can hurt if
> whatever we come up with now clashes with what we want later.

atm, just 'this is a KVM guest', and maybe even which userspace; qemu
vs. kvmtool vs. ??. I think a feature bitmap and/or the address of
some hypervisor shared page could be some likely additions though.
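
e.g. the parsing could then look something like this (both property
names are invented for the sake of argument):

#include <linux/of.h>

static void parse_kvm_node(void)
{
        struct device_node *np = of_find_node_by_path("/hypervisor");
        u32 features = 0;
        u64 shared_page = 0;

        if (!np || !of_device_is_compatible(np, "linux,kvm"))
                goto out;

        of_property_read_u32(np, "linux,kvm-features", &features);
        of_property_read_u64(np, "linux,kvm-shared-page", &shared_page);
        /* ... stash these somewhere useful ... */
out:
        of_node_put(np);
}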

> 
> [...]
> 
> > > The only thing you gain by assuming that the hypervisor has a node
> > > called /hypervisor is the ability to (sometimes) detect that a
> > > hypervisor you know nothing about is probably present.
> > 
> > You also gain a common location for the documentation of those
> > well-known strings, and the ability to share a bit of code in the
> > parsing.
> 
> Sorry if I sound like a broken record here, but I don't see why the node
> needs to be called /hypervisor for either of those to be true. We can
> (and should) place the documentation together in a common location
> regardless, and we don't need a common DT path for code to be shared,
> especially given...
> 
> > > As I understand it on x86 if KVM masquerades as HyperV it's not also
> > > visible as KVM. Is that correct?
> > 
> > Actually you can supply more than one hypervisor signature in the cpuids.
> > There's cpuid space allocated for up to 256. A guest can then search
> > the space in preference order for the hypervisor type it wants. QEMU/KVM
> > for x86 does this. It sets things up such that hyperv is in the first
> > location, which is the lowest preferred location for Linux, and likely
> > the only place Windows checks. It still puts KVM in the second location,
> > which is a higher preference location for Linux, and relies on Linux
> > finding it there. Linux finds it with
> > arch/x86/kernel/cpu/hypervisor.c:detect_hypervisor_vendor
> 
> ... that the x86 code here just iterates over a set of callbacks, with
> the detection logic living in each callback.

Yes, at the top level it's just callbacks, thus it reserves the right to
do anything it wants, but each detect() callback actually does the same
thing today, which is to look at the same cpuid leaf for a signature. See
arch/x86/include/asm/processor.h:hypervisor_cpuid_base, which is also
common x86 code, and is used by both kvm and xen. vmware and hyperv
detection do the same thing too, but just don't use the common helper.

> 
> It would be interesting to know which hypervisors pretend to be each
> other and why. I can see that KVM masquerading as HyperV is useful for
> some existing systems, but does KVM masquerade as Xen (or vice versa)?

Not that I know of. The only reason one would do that is if a KVM host
expected to be given xen-only enlightened images, i.e. the image knows
how to deal with xen paravirt, but not kvm. This is pretty unlikely for
Linux distros, which are generally compiled to be enlightened for both.

> 
> If this is simply to share common services, then those services could be
> described independently of the hypervisor-specific node(s).

The services this would be for are specific to hypervisors. Paravirt
I/O is already hypervisor-independent, e.g. virtio. However, once you
know what hypervisor you're on, you can start probing for hypervisor-
specific features. I'm suggesting that it'd be nice if the determination
of the hypervisor was done in a common way. The determination of
features may optionally be done the same way.

> 
> [side note: Are any Xen (or other hypervisor) developers on this list?
> We need to involve them when determining standards]

We should copy xen-devel at some point, if we get that far, but I don't
think we've actually gotten to the point of determining standards in
this discussion yet. If you're starting to see some value in a standard,
then maybe we're getting closer ;-)

> 
> > Note how the hypervisor detection is common for all x86 hypervisors that
> > Linux knows about. A specified /hypervisor node could be the DT
> > equivalent.
> 
> Note how this is just a list of callbacks, and there's no actual sharing
> of the hypervisor-specific detection mechanism. ;)
> 
> We could do likewise, but this doesn't require trying to share a
> /hypervisor node.
> 
> > > There's also a huge grey area regarding what the guest should do if it
> > > thinks it recognises both HyperV and KVM (or any other combination)
> > > simultaneously.
> > 
> > I don't think so. If the guest searches in preference-order, or the list
> > order tells the guest which one it should prefer, as is done for x86,
> > then it just uses the most preferred, and supported, one it finds.
> 
> Those both assume that the guest only decides to use a single
> hypervisor's interfaces, rather than trying to use portions of both,
> which I can imagine being a possibility unless the OS has policies to
> prevent that (e.g. stopping detection once _some_ hypervisor has been
> detected). That's a horrible grey area.

Well, a guest can try anything it wants, and that idea actually points
to an area we should look at for potential host bugs, as behavior like
this could expose something. But I don't think it makes sense to
actually support this behavior.

> 
> Preference order would depend on the guest's preferences rather than the
> user's, but perhaps that's not really a problem.

I wouldn't think so.

> 
> > > Is there any reason for a hypervisor to try to both masquerade as a
> > > different hypervisor and advertise itself natively? The whole sharing a
> > > node / table thing may be irrelevant.
> > >
> > 
> > Yes. When you don't know the guest type you're booting, then you need
> > to expose all that you can handle, and then rely on the guest to pick
> > the right one.
> 
> I can see that masquerading is useful for providing services to guests
> which only understand some proprietary hypervisor. However, we haven't
> seen proprietary hypervisors (nor proprietary clients) thus far.
> 
> Is masquerading relevant currently?

No, not for ARM, and wrt a DT node, I doubt it will ever matter. If
we're going to need masquerading for ARM virt, then I'm guessing the
hypervisor type and features will also need to be exposed in a different
way. ACPI? I don't think it hurts to try and work out issues we can foresee
on the DT side first though.

> 
> Which services would we want to share?

Primarily 'am I a guest? And, if so, what type?' Maybe also 'at what
address can I find hypervisor-specific data?'

> 
> Can we come up with common standards for those services instead?

I'm not sure what types of services you have in mind that deserve
standards. Is hypervisor type detection not worthy?

Thanks,
drew

^ permalink raw reply	[flat|nested] 21+ messages in thread

end of thread, other threads:[~2015-04-13 12:45 UTC | newest]

Thread overview: 21+ messages
2015-04-09  1:46 Advice on HYP interface for AsyncPF Mario Smarduch
2015-04-09  7:57 ` Marc Zyngier
2015-04-09 12:06   ` Andrew Jones
2015-04-09 12:48     ` Mark Rutland
2015-04-09 13:43       ` Andrew Jones
2015-04-09 14:00         ` Mark Rutland
2015-04-09 14:22           ` Andrew Jones
2015-04-09 14:37             ` Mark Rutland
2015-04-09 14:54               ` Andrew Jones
2015-04-09 15:20                 ` Mark Rutland
2015-04-09 19:01                   ` Andrew Jones
2015-04-13 10:46                     ` Mark Rutland
2015-04-13 12:52                       ` Andrew Jones
2015-04-09 14:48             ` Mark Rutland
2015-04-09 13:35     ` Christoffer Dall
2015-04-09 13:59       ` Andrew Jones
2015-04-09 14:22         ` Christoffer Dall
2015-04-09 14:42           ` Andrew Jones
2015-04-10  2:36   ` Mario Smarduch
2015-04-10  8:53     ` Marc Zyngier
2015-04-10 23:45       ` Mario Smarduch
