* Proposal for physical address based hypercalls
@ 2022-09-28 10:38 Jan Beulich
  2022-09-28 10:58 ` Andrew Cooper
                   ` (2 more replies)
  0 siblings, 3 replies; 14+ messages in thread
From: Jan Beulich @ 2022-09-28 10:38 UTC (permalink / raw)
  To: xen-devel

For quite some time we've been talking about replacing the present virtual
address based hypercall interface with one using physical addresses.  This is in
particular a prerequisite to being able to support guests with encrypted
memory, as for such guests we cannot perform the page table walks necessary to
translate virtual to (guest-)physical addresses.  But using (guest) physical
addresses is also expected to help performance of non-PV guests (i.e. all Arm
ones plus HVM/PVH on x86), because of the no longer necessary address
translation.

Clearly to be able to run existing guests, we need to continue to support the
present virtual address based interface.  Previously it was suggested to change
the model on a per-domain basis, perhaps by a domain creation control.  This
has two major shortcomings:
 - Entire guest OSes would need to switch over to the new model all in one go.
   This could be particularly problematic for in-guest interfaces like Linux'es
   privcmd driver, which is passed hypercall arguments from user space.  Such
   arguments necessarily use virtual addresses, and hence the kernel would need to learn
   of all hypercalls legitimately coming in, in order to translate the buffer
   addresses.  Reaching sufficient coverage there might take some time.
 - All base components within an individual guest instance which might run in
   succession (firmware, boot loader, kernel, kexec) would need to agree on the
   hypercall ABI to use.

As an alternative I'd like to propose the introduction of a bit (or multiple
ones, see below) augmenting the hypercall number, to control the flavor of the
buffers used for every individual hypercall.  This would likely involve the
introduction of a new hypercall page (or multiple ones if more than one bit is
to be used), to retain the present abstraction where it is the hypervisor which
actually fills these pages.  For multicalls the wrapping multicall itself would
be controlled independently of the constituent hypercalls.
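
(Purely to illustrate the encoding idea - the flag name, its bit position, and
the hypercall2() wrapper below are invented for this sketch, not existing ABI:)

  /* One spare bit in the hypercall number selects the buffer flavor. */
  #define HYPERCALL_FLAT_GPA  (1UL << 31)  /* buffers hold guest physical addresses */

  static long memory_op_gpa(unsigned int cmd, void *arg)
  {
      /* Same hypercall number space; an unmodified caller simply never
       * sets the bit and keeps passing virtual addresses as today. */
      return hypercall2(__HYPERVISOR_memory_op | HYPERCALL_FLAT_GPA,
                        cmd, virt_to_phys(arg));
  }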

A model involving just a single bit to indicate "flat" buffers has limitations
when it comes to large buffers passed to a hypercall.  Since in many cases
hypercalls (currently) allowing for rather large buffers wouldn't normally be
used with buffers significantly larger than a single page (several of the
mem-ops for example), special casing the (presumably) few hypercalls which have
an actual need for large buffers might be an option.

Another approach would be to build in a scatter/gather model for buffers right
away.  Jürgen suggests that the low two address bits could be used as a
"descriptor" here.  Alternatively, since buffer sizes should always be known,
using a multi-bit augmentation to the hypercall number could also be a viable
model, distinguishing between e.g. all-linear buffers, all-single-S/G-level
ones, and size-dependent selection of zero or more S/G levels.  This would
affect all buffers used by a single hypercall.  With the level of indirection
needed derivable from buffer size, in the last of the variants small buffers
could still have their addresses provided directly while only larger buffers
would be described by e.g. a list of GFNs or a list of (address,length) tuples,
using multiple levels if even that list would still end up large.
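
(A sketch of the low-address-bits descriptor idea, with invented encodings -
nothing here is meant as a concrete proposal yet:)

  /* Low two bits of an otherwise page-aligned buffer address: */
  #define HCBUF_DIRECT    0x0UL  /* address is the (small) buffer itself */
  #define HCBUF_GFN_LIST  0x1UL  /* address points at an array of GFNs   */
  #define HCBUF_SG_LIST   0x2UL  /* address points at (addr,len) tuples  */
  #define HCBUF_MODE_MASK 0x3UL

  static inline unsigned long hcbuf_mode(unsigned long gpa)
  {
      return gpa & HCBUF_MODE_MASK;
  }

  static inline unsigned long hcbuf_addr(unsigned long gpa)
  {
      return gpa & ~HCBUF_MODE_MASK;
  }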

Of course any one of the models could be selected as the only one to use (in
addition to the existing virtual address based one), allowing us to stick to a
single bit augmenting the hypercall number.

Note that a dynamic model (indirection levels derived from buffer size) would
be quite impactful, as the overall buffer size would need passing to the
copying helpers alongside the size of the data which actually is to be copied.
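
(Sketch of what that would mean for the helpers - today's form next to a
hypothetical extended one; the second name is made up for illustration:)

  /* today: */
  copy_from_guest_offset(dst, hnd, off, nr);
  /* dynamic model: the total buffer size tells the helper how many
   * levels of indirection the handle actually uses: */
  copy_from_guest_sized(dst, hnd, off, nr, total_buf_size);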

How to express S/G lists will want to take into account existing uses.  For
example, an array of (address,length) tuples would be quite inefficient to use
with operations like copy_from_guest_offset().  Perhaps this would want to be
an array of xen_ulong_t, with the first slot holding the offset into the first
page and all further slots holding GFNs (albeit that would still require two
[generally] discontiguous reads from the array for a single
copy_from_guest_offset()).  Otoh, since calling code will need changing anyway
to use this new model, we might also require that such indirectly specified
buffers are page-aligned.
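
(For clarity, the layout being discussed would look roughly like the below;
the helper names are made up and merely show why the two discontiguous reads
mentioned above would be needed:)

  /* desc[0]   = offset of the buffer start within its first page,
   * desc[1..] = GFNs of the consecutive pages backing the buffer. */
  static unsigned long desc_gfn(const xen_ulong_t *desc, size_t byte_off)
  {
      return desc[1 + (desc[0] + byte_off) / PAGE_SIZE];
  }

  static size_t desc_page_offset(const xen_ulong_t *desc, size_t byte_off)
  {
      return (desc[0] + byte_off) & (PAGE_SIZE - 1);
  }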

Virtual addresses will continue to be used in certain places.  Such addresses
aren't normally expressed via handles, e.g. callback or exception handling entry
points.

Jan



* Re: Proposal for physical address based hypercalls
  2022-09-28 10:38 Proposal for physical address based hypercalls Jan Beulich
@ 2022-09-28 10:58 ` Andrew Cooper
  2022-09-28 12:06   ` Jan Beulich
  2022-09-28 13:32 ` dpsmith.dev
  2022-10-04  9:38 ` Julien Grall
  2 siblings, 1 reply; 14+ messages in thread
From: Andrew Cooper @ 2022-09-28 10:58 UTC (permalink / raw)
  To: Jan Beulich, xen-devel

On 28/09/2022 11:38, Jan Beulich wrote:
> As an alternative I'd like to propose the introduction of a bit (or multiple
> ones, see below) augmenting the hypercall number, to control the flavor of the
> buffers used for every individual hypercall.  This would likely involve the
> introduction of a new hypercall page (or multiple ones if more than one bit is
> to be used), to retain the present abstraction where it is the hypervisor which
> actually fills these pages.

There are other concerns which need to be accounted for.

Encrypted VMs cannot use a hypercall page; they don't trust the
hypervisor in the first place, and the hypercall page is (specifically)
code injection.  So the sensible new ABI cannot depend on a hypercall table.

Also, rewriting the hypercall page on migrate turns out not to have been
the most clever idea, and only works right now because the instructions
are the same length in the variations for each mode.

Also continuations need to change to avoid userspace liveness problems,
and existing hypercalls that we do have need splitting between things
which are actually privileged operations (within the guest context) and
things which are logical control operations, so the kernel can expose
the latter to userspace without retaining the gaping root hole which is
/dev/xen/privcmd, and a blocker to doing UEFI Secureboot.

So yes, starting some new clean(er) interface from hypercall 64 is the
plan, but it very much does not want to be a simple mirror of the
existing 0-63 with a differing calling convention.

~Andrew


* Re: Proposal for physical address based hypercalls
  2022-09-28 10:58 ` Andrew Cooper
@ 2022-09-28 12:06   ` Jan Beulich
  2022-09-28 13:03     ` Juergen Gross
  0 siblings, 1 reply; 14+ messages in thread
From: Jan Beulich @ 2022-09-28 12:06 UTC (permalink / raw)
  To: Andrew Cooper; +Cc: xen-devel

On 28.09.2022 12:58, Andrew Cooper wrote:
> On 28/09/2022 11:38, Jan Beulich wrote:
>> As an alternative I'd like to propose the introduction of a bit (or multiple
>> ones, see below) augmenting the hypercall number, to control the flavor of the
>> buffers used for every individual hypercall.  This would likely involve the
>> introduction of a new hypercall page (or multiple ones if more than one bit is
>> to be used), to retain the present abstraction where it is the hypervisor which
>> actually fills these pages.
> 
> There are other concerns which need to be accounted for.
> 
> Encrypted VMs cannot use a hypercall page; they don't trust the
> hypervisor in the first place, and the hypercall page is (specifically)
> code injection.  So the sensible new ABI cannot depend on a hypercall table.

I don't think there's a dependency, and I think there never really has been.
We've been advocating for its use, but we've not enforced that anywhere, I
don't think.

> Also, rewriting the hypercall page on migrate turns out not to have been
> the most clever idea, and only works right now because the instructions
> are the same length in the variations for each mode.
> 
> Also continuations need to change to avoid userspace liveness problems,
> and existing hypercalls that we do have need splitting between things
> which are actually privileged operations (within the guest context) and
> things which are logical control operations, so the kernel can expose
> the latter to userspace without retaining the gaping root hole which is
> /dev/xen/privcmd, and a blocker to doing UEFI Secureboot.
> 
> So yes, starting some new clean(er) interface from hypercall 64 is the
> plan, but it very much does not want to be a simple mirror of the
> existing 0-63 with a differing calling convention.

All of these look like orthogonal problems to me. That's likely all
relevant for, as I think you've been calling it, ABI v2, but shouldn't
hinder our switching to a physical address based hypercall model.
Otherwise I'm afraid we'll never make any progress in that direction.

Jan



* Re: Proposal for physical address based hypercalls
  2022-09-28 12:06   ` Jan Beulich
@ 2022-09-28 13:03     ` Juergen Gross
  2022-09-29 10:16       ` Wei Chen
  2022-09-29 11:32       ` Jan Beulich
  0 siblings, 2 replies; 14+ messages in thread
From: Juergen Gross @ 2022-09-28 13:03 UTC (permalink / raw)
  To: Jan Beulich, Andrew Cooper; +Cc: xen-devel



On 28.09.22 14:06, Jan Beulich wrote:
> On 28.09.2022 12:58, Andrew Cooper wrote:
>> On 28/09/2022 11:38, Jan Beulich wrote:
>>> As an alternative I'd like to propose the introduction of a bit (or multiple
>>> ones, see below) augmenting the hypercall number, to control the flavor of the
>>> buffers used for every individual hypercall.  This would likely involve the
>>> introduction of a new hypercall page (or multiple ones if more than one bit is
>>> to be used), to retain the present abstraction where it is the hypervisor which
>>> actually fills these pages.
>>
>> There are other concerns which need to be accounted for.
>>
>> Encrypted VMs cannot use a hypercall page; they don't trust the
>> hypervisor in the first place, and the hypercall page is (specifically)
>> code injection.  So the sensible new ABI cannot depend on a hypercall table.
> 
> I don't think there's a dependency, and I think there never really has been.
> We've been advocating for its use, but we've not enforced that anywhere, I
> don't think.
> 
>> Also, rewriting the hypercall page on migrate turns out not to have been
>> the most clever idea, and only works right now because the instructions
>> are the same length in the variations for each mode.
>>
>> Also continuations need to change to avoid userspace liveness problems,
>> and existing hypercalls that we do have need splitting between things
>> which are actually privileged operations (within the guest context) and
>> things which are logical control operations, so the kernel can expose
>> the latter to userspace without retaining the gaping root hole which is
>> /dev/xen/privcmd, and a blocker to doing UEFI Secureboot.
>>
>> So yes, starting some new clean(er) interface from hypercall 64 is the
>> plan, but it very much does not want to be a simple mirror of the
>> existing 0-63 with a differing calling convention.
> 
> All of these look like orthogonal problems to me. That's likely all
> relevant for, as I think you've been calling it, ABI v2, but shouldn't
> hinder our switching to a physical address based hypercall model.
> Otherwise I'm afraid we'll never make any progress in that direction.

What about an alternative model that allows most of the current hypercalls to
be used unmodified?

We could add a new hypercall for registering hypercall buffers via
virtual address, physical address, and size of the buffers (kind of a
software TLB). The buffer table would want to be physically addressed
by the hypercall, of course.

It might be interesting to have this table per vcpu (it should be
allowed to use the same table for multiple vcpus) in order to speed
up finding translation entries of percpu buffers.

Any hypercall buffer being addressed virtually could first be looked up
in the SW-TLB. This wouldn't require any changes for most
of the hypercall interfaces. Only special cases with very large buffers
might need indirect variants (like Jan said: via GFN lists, which could
be passed in registered buffers).
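
(To make the idea a little more concrete, a rough sketch of what a
registration could look like - the structure, the sub-op and
HYPERVISOR_hcbuf_op() are all made-up names, not an existing interface:)

  struct xen_hcbuf_reg {
      uint64_t vaddr;  /* guest virtual start of the buffer  */
      uint64_t gaddr;  /* guest physical start of the buffer */
      uint64_t size;   /* length in bytes                    */
  };

  static int register_hcall_buf(void *buf, size_t size)
  {
      struct xen_hcbuf_reg reg = {
          .vaddr = (uint64_t)(unsigned long)buf,
          .gaddr = virt_to_phys(buf),
          .size  = size,
      };

      /* Registered once (e.g. at boot) per vcpu; subsequent hypercalls
       * keep passing plain virtual addresses, which Xen resolves via
       * this SW-TLB instead of walking the guest page tables. */
      return HYPERVISOR_hcbuf_op(HCBUF_OP_register, virt_to_phys(&reg));
  }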

Encrypted guests would probably want to use static percpu buffers in
order to avoid switching the encryption state of the buffers all the
time.

An unencrypted PVH/HVM domain (e.g. PVH dom0) could just define one
giant buffer with the domain's memory size via the physical memory
mapping of the kernel. All kmalloc() addresses would be in that region.

A buffer address not found would need to be translated like today (and
fail for an encrypted guest).

Thoughts?


Juergen


* Re: Proposal for physical address based hypercalls
  2022-09-28 10:38 Proposal for physical address based hypercalls Jan Beulich
  2022-09-28 10:58 ` Andrew Cooper
@ 2022-09-28 13:32 ` dpsmith.dev
  2022-09-29  8:16   ` Jan Beulich
  2022-10-04  9:38 ` Julien Grall
  2 siblings, 1 reply; 14+ messages in thread
From: dpsmith.dev @ 2022-09-28 13:32 UTC (permalink / raw)
  To: Jan Beulich, xen-devel

On 9/28/22 06:38, Jan Beulich wrote:
> For quite some time we've been talking about replacing the present virtual
> address based hypercall interface with one using physical addresses.  This is in
> particular a prerequisite to being able to support guests with encrypted
> memory, as for such guests we cannot perform the page table walks necessary to
> translate virtual to (guest-)physical addresses.  But using (guest) physical
> addresses is also expected to help performance of non-PV guests (i.e. all Arm
> ones plus HVM/PVH on x86), because of the no longer necessary address
> translation.

Greetings Jan,

I think there are multiple issues in play here, but the two major ones 
are 1.) eliminating the use of guest virtual addresses and 2.) handling 
the change in the security model for hypercalls from encrypted VMs. As 
Andy was pointing out, attempting to address (1) in a backwards 
compatible approach will likely not arrive at a solution that can 
address issue (2). IMHO, the only result from teaching the existing ABI 
to speak GPAs instead of VAs will be to break current and new kernels of 
the habit of using VAs. Beyond that I do not see how it will do anything 
to prepare current OS kernels for running as encrypted VMs, at least for 
AMD since that is the specification I have been focused on studying the 
last couple of months.

As for ABIv2, I understand and can appreciate Andy's desired approach. 
Recently, especially with the hardware changes being introduced by SEV, 
I would like to propose a naive and more radical approach. Currently 
hypercalls function in a more ioctl-like style. I would like to suggest 
that a packet style interface similar to netlink be considered. There 
are many benefits to adopting this type of interface that could be 
covered in a larger RFC if there was any sense of willingness to 
consider it. As a glimpse, a few benefits would be that arbitrary 
buffers, continuations/asynchronous calls, and multi-call are all 
natural consequences. It would also allow advanced extensions, such as 
an optional PF_RING-like interface for zero-copy messaging from guest 
user-space to hypervisor. While a packet interface could easily co-exist 
with the existing ioctl-style interface, it would be a paradigm shift 
from the past, though I feel ABIv2 was already going to be such a shift. 
Anyway, just my 2¢.

V/r,
DPS



* Re: Proposal for physical address based hypercalls
  2022-09-28 13:32 ` dpsmith.dev
@ 2022-09-29  8:16   ` Jan Beulich
  2022-09-29 12:53     ` Daniel P. Smith
  0 siblings, 1 reply; 14+ messages in thread
From: Jan Beulich @ 2022-09-29  8:16 UTC (permalink / raw)
  To: dpsmith.dev; +Cc: xen-devel

On 28.09.2022 15:32, dpsmith.dev wrote:
> On 9/28/22 06:38, Jan Beulich wrote:
>> For quite some time we've been talking about replacing the present virtual
>> address based hypercall interface with one using physical addresses.  This is in
>> particular a prerequisite to being able to support guests with encrypted
>> memory, as for such guests we cannot perform the page table walks necessary to
>> translate virtual to (guest-)physical addresses.  But using (guest) physical
>> addresses is also expected to help performance of non-PV guests (i.e. all Arm
>> ones plus HVM/PVH on x86), because of the no longer necessary address
>> translation.
> 
> Greetings Jan,
> 
> I think there are multiple issues in play here, but the two major ones 
> are 1.) eliminating the use of guest virtual addresses and 2.) handling 
> the change in the security model for hypercalls from encrypted VMs. As 
> Andy was pointing out, attempting to address (1) in a backwards 
> compatible approach will likely not arrive at a solution that can 
> address issue (2).

It may not be sufficient, but it is (can be) a prereq.

> IMHO, the only result from teaching the existing ABI 
> to speak GPAs instead of VAs will be to break current and new kernels of 
> the habit of using VAs. Beyond that I do not see how it will do anything 
> to prepare current OS kernels for running as encrypted VMs, at least for 
> AMD since that is the specification I have been focused on studying the 
> last couple of months.

Plus we'd have code in the hypervisor then which deals with physical
address based hypercall buffers. One less prereq to take care of for
the (huge) rest of the work needed.

> As for ABIv2, I understand and can appreciate Andy's desired approach. 
> Recently, especially with the hardware changes being introduced by SEV, 
> I would like to have considered a naive and more radical approach. 
> Currently hypercalls function using a more ioctl style. I would like to 
> suggest that a packet style interface similar to netlink be considered. 
> There are many benefits to adopting this type of interface that could be 
> covered in a larger RFC if there was any sense of willingness to 
> consider it. As a glimpse, a few benefits would be that arbitrary 
> buffers, continuations/asynchronous calls, and multi-call are all 
> natural consequence. It would also allow advanced extensions, such as an 
> optional PF_RING-like interface for zero-copy messaging from guest 
> user-space to hypervisor. While a packet interface could easily co-exist 
> with the existing ioctl-style interface, it would be a paradigm shift 
> from the past, though I feel ABIv2 was already going to be such a shift. 
> Anyway, just my 2¢.

I'm sorry for my ignorance, but I have no knowledge of how netlink
works.

Jan



* Re: Proposal for physical address based hypercalls
  2022-09-28 13:03     ` Juergen Gross
@ 2022-09-29 10:16       ` Wei Chen
  2022-09-29 11:32       ` Jan Beulich
  1 sibling, 0 replies; 14+ messages in thread
From: Wei Chen @ 2022-09-29 10:16 UTC (permalink / raw)
  To: Juergen Gross, Jan Beulich, Andrew Cooper; +Cc: xen-devel

Hi Juergen,

On 2022/9/28 21:03, Juergen Gross wrote:
> On 28.09.22 14:06, Jan Beulich wrote:
>> On 28.09.2022 12:58, Andrew Cooper wrote:
>>> On 28/09/2022 11:38, Jan Beulich wrote:

> 
> What about an alternative model allowing to use most of the current
> hypercalls unmodified?
> 
> We could add a new hypercall for registering hypercall buffers via
> virtual address, physical address, and size of the buffers (kind of a
> software TLB). The buffer table would want to be physically addressed
> by the hypercall, of course.
> 
> It might be interesting to have this table per vcpu (it should be
> allowed to use the same table for multiple vcpus) in order to speed
> up finding translation entries of percpu buffers.
> 
> Any hypercall buffer being addressed virtually could first tried to
> be found via the SW-TLB. This wouldn't require any changes for most
> of the hypercall interfaces. Only special cases with very large buffers
> might need indirect variants (like Jan said: via GFN lists, which could
> be passed in registered buffers).
> 
> Encrypted guests would probably want to use static percpu buffers in
> order to avoid switching the encryption state of the buffers all the
> time.
> 

I agree with this one. When we were working on Arm Realm, we were also 
concerned about how hypercall buffers are shared between Xen and a 
realm VM. Dynamically switching memory between the protected and 
unprotected state (accessible to both the VM and Xen, but not usable 
for code execution) can be very expensive, and such uncertainty also 
makes it easy to introduce security problems. We had thought about 
explicitly reserving a section of unprotected memory in the realm VM 
for hypercall buffers, but that would mean the Linux Xen drivers need 
to be modified. It's great to see the community starting to work on a 
design for this.


Cheers,
Wei Chen

> An unencrypted PVH/HVM domain (e.g. PVH dom0) could just define one
> giant buffer with the domain's memory size via the physical memory
> mapping of the kernel. All kmalloc() addresses would be in that region.
> 
> A buffer address not found would need to be translated like today (and
> fail for an encrypted guest).
> 
> Thoughts?
> 
> 
> Juergen



* Re: Proposal for physical address based hypercalls
  2022-09-28 13:03     ` Juergen Gross
  2022-09-29 10:16       ` Wei Chen
@ 2022-09-29 11:32       ` Jan Beulich
  2022-09-29 12:26         ` Juergen Gross
  1 sibling, 1 reply; 14+ messages in thread
From: Jan Beulich @ 2022-09-29 11:32 UTC (permalink / raw)
  To: Juergen Gross; +Cc: xen-devel, Andrew Cooper

On 28.09.2022 15:03, Juergen Gross wrote:
> On 28.09.22 14:06, Jan Beulich wrote:
>> On 28.09.2022 12:58, Andrew Cooper wrote:
>>> On 28/09/2022 11:38, Jan Beulich wrote:
>>>> As an alternative I'd like to propose the introduction of a bit (or multiple
>>>> ones, see below) augmenting the hypercall number, to control the flavor of the
>>>> buffers used for every individual hypercall.  This would likely involve the
>>>> introduction of a new hypercall page (or multiple ones if more than one bit is
>>>> to be used), to retain the present abstraction where it is the hypervisor which
>>>> actually fills these pages.
>>>
>>> There are other concerns which need to be accounted for.
>>>
>>> Encrypted VMs cannot use a hypercall page; they don't trust the
>>> hypervisor in the first place, and the hypercall page is (specifically)
>>> code injection.  So the sensible new ABI cannot depend on a hypercall table.
>>
>> I don't think there's a dependency, and I think there never really has been.
>> We've been advocating for its use, but we've not enforced that anywhere, I
>> don't think.
>>
>>> Also, rewriting the hypercall page on migrate turns out not to have been
>>> the most clever idea, and only works right now because the instructions
>>> are the same length in the variations for each mode.
>>>
>>> Also continuations need to change to avoid userspace liveness problems,
>>> and existing hypercalls that we do have need splitting between things
>>> which are actually privileged operations (within the guest context) and
>>> things which are logical control operations, so the kernel can expose
>>> the latter to userspace without retaining the gaping root hole which is
>>> /dev/xen/privcmd, and a blocker to doing UEFI Secureboot.
>>>
>>> So yes, starting some new clean(er) interface from hypercall 64 is the
>>> plan, but it very much does not want to be a simple mirror of the
>>> existing 0-63 with a differing calling convention.
>>
>> All of these look like orthogonal problems to me. That's likely all
>> relevant for, as I think you've been calling it, ABI v2, but shouldn't
>> hinder our switching to a physical address based hypercall model.
>> Otherwise I'm afraid we'll never make any progress in that direction.
> 
> What about an alternative model allowing to use most of the current
> hypercalls unmodified?
> 
> We could add a new hypercall for registering hypercall buffers via
> virtual address, physical address, and size of the buffers (kind of a
> software TLB).

Why not?

> The buffer table would want to be physically addressed
> by the hypercall, of course.

I'm not convinced of this, as it would break uniformity of the hypercall
interfaces. IOW in the hypervisor we then wouldn't be able to use
copy_from_guest() to retrieve the contents. Perhaps this simply shouldn't
be a table, but a hypercall not involving any buffers (i.e. every
discontiguous piece would need registering separately). I expect such a
software TLB wouldn't have many entries, so needing to use a couple of
hypercalls shouldn't be a major issue.

> It might be interesting to have this table per vcpu (it should be
> allowed to use the same table for multiple vcpus) in order to speed
> up finding translation entries of percpu buffers.

Yes. Perhaps insertion and purging could simply be two new VCPUOP_*.

As a prereq I think we'd need to sort the cross-vCPU accessing of guest
data, coincidentally pointed out in a post-commit-message remark in
https://lists.xen.org/archives/html/xen-devel/2022-09/msg01761.html. The
subject vCPU isn't available in copy_to_user_hvm(), which is where I'd
expect the TLB lookup to occur (while assuming handles point at globally
mapped space _might_ be okay, using the wrong vCPU's TLB surely isn't).
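
(Roughly where such a lookup might sit - everything below is invented for
illustration; today's copy_to_user_hvm() has no such logic:)

  static void *hcbuf_lookup(const struct vcpu *v, unsigned long va,
                            unsigned int len)
  {
      const struct hcbuf_entry *e;

      /* Must be the subject vCPU's table - consulting another vCPU's
       * registrations would be wrong, hence the prereq above. */
      for ( e = v->hcbuf_tlb; e; e = e->next )
          if ( va >= e->vaddr && len <= e->size &&
               va - e->vaddr <= e->size - len )
              return hcbuf_map(e->gaddr + (va - e->vaddr), len);

      return NULL;  /* fall back to a page walk; fail for encrypted guests */
  }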

> Any hypercall buffer being addressed virtually could first tried to
> be found via the SW-TLB. This wouldn't require any changes for most
> of the hypercall interfaces. Only special cases with very large buffers
> might need indirect variants (like Jan said: via GFN lists, which could
> be passed in registered buffers).
> 
> Encrypted guests would probably want to use static percpu buffers in
> order to avoid switching the encryption state of the buffers all the
> time.
> 
> An unencrypted PVH/HVM domain (e.g. PVH dom0) could just define one
> giant buffer with the domain's memory size via the physical memory
> mapping of the kernel. All kmalloc() addresses would be in that region.

That's Linux-centric. I'm not convinced all OSes maintain a directmap.
Without such, switching to this model might end up quite intrusive on
the OS side.

Thinking of Linux, we'd need a 2nd range covering the data part of the
kernel image.

Further this still wouldn't (afaics) pave a reasonable route towards
dealing with privcmd-invoked hypercalls.

Finally - to what extent are we concerned about PV guests using linear
addresses for hypercall buffers? I ask because I don't think the model
lends itself to being used for the PV guest interfaces as well.

Jan

> A buffer address not found would need to be translated like today (and
> fail for an encrypted guest).
> 
> Thoughts?
> 
> 
> Juergen




* Re: Proposal for physical address based hypercalls
  2022-09-29 11:32       ` Jan Beulich
@ 2022-09-29 12:26         ` Juergen Gross
  2022-09-29 12:58           ` Jan Beulich
  0 siblings, 1 reply; 14+ messages in thread
From: Juergen Gross @ 2022-09-29 12:26 UTC (permalink / raw)
  To: Jan Beulich; +Cc: xen-devel, Andrew Cooper



On 29.09.22 13:32, Jan Beulich wrote:
> On 28.09.2022 15:03, Juergen Gross wrote:
>> On 28.09.22 14:06, Jan Beulich wrote:
>>> On 28.09.2022 12:58, Andrew Cooper wrote:
>>>> On 28/09/2022 11:38, Jan Beulich wrote:
>>>>> As an alternative I'd like to propose the introduction of a bit (or multiple
>>>>> ones, see below) augmenting the hypercall number, to control the flavor of the
>>>>> buffers used for every individual hypercall.  This would likely involve the
>>>>> introduction of a new hypercall page (or multiple ones if more than one bit is
>>>>> to be used), to retain the present abstraction where it is the hypervisor which
>>>>> actually fills these pages.
>>>>
>>>> There are other concerns which need to be accounted for.
>>>>
>>>> Encrypted VMs cannot use a hypercall page; they don't trust the
>>>> hypervisor in the first place, and the hypercall page is (specifically)
>>>> code injection.  So the sensible new ABI cannot depend on a hypercall table.
>>>
>>> I don't think there's a dependency, and I think there never really has been.
>>> We've been advocating for its use, but we've not enforced that anywhere, I
>>> don't think.
>>>
>>>> Also, rewriting the hypercall page on migrate turns out not to have been
>>>> the most clever idea, and only works right now because the instructions
>>>> are the same length in the variations for each mode.
>>>>
>>>> Also continuations need to change to avoid userspace liveness problems,
>>>> and existing hypercalls that we do have need splitting between things
>>>> which are actually privileged operations (within the guest context) and
>>>> things which are logical control operations, so the kernel can expose
>>>> the latter to userspace without retaining the gaping root hole which is
>>>> /dev/xen/privcmd, and a blocker to doing UEFI Secureboot.
>>>>
>>>> So yes, starting some new clean(er) interface from hypercall 64 is the
>>>> plan, but it very much does not want to be a simple mirror of the
>>>> existing 0-63 with a differing calling convention.
>>>
>>> All of these look like orthogonal problems to me. That's likely all
>>> relevant for, as I think you've been calling it, ABI v2, but shouldn't
>>> hinder our switching to a physical address based hypercall model.
>>> Otherwise I'm afraid we'll never make any progress in that direction.
>>
>> What about an alternative model allowing to use most of the current
>> hypercalls unmodified?
>>
>> We could add a new hypercall for registering hypercall buffers via
>> virtual address, physical address, and size of the buffers (kind of a
>> software TLB).
> 
> Why not?
> 
>> The buffer table would want to be physically addressed
>> by the hypercall, of course.
> 
> I'm not convinced of this, as it would break uniformity of the hypercall
> interfaces. IOW in the hypervisor we then wouldn't be able to use
> copy_from_guest() to retrieve the contents. Perhaps this simply shouldn't
> be a table, but a hypercall not involving any buffers (i.e. every
> discontiguous piece would need registering separately). I expect such a
> software TLB wouldn't have many entries, so needing to use a couple of
> hypercalls shouldn't be a major issue.

Fine with me.

> 
>> It might be interesting to have this table per vcpu (it should be
>> allowed to use the same table for multiple vcpus) in order to speed
>> up finding translation entries of percpu buffers.
> 
> Yes. Perhaps insertion and purging could simply be two new VCPUOP_*.

Again fine with me.

> As a prereq I think we'd need to sort the cross-vCPU accessing of guest
> data, coincidentally pointed out in a post-commit-message remark in
> https://lists.xen.org/archives/html/xen-devel/2022-09/msg01761.html. The
> subject vCPU isn't available in copy_to_user_hvm(), which is where I'd
> expect the TLB lookup to occur (while assuming handles point at globally
> mapped space _might_ be okay, using the wrong vCPU's TLB surely isn't).

Any per-vcpu buffer should only be used by the respective vcpu.

>> Any hypercall buffer being addressed virtually could first tried to
>> be found via the SW-TLB. This wouldn't require any changes for most
>> of the hypercall interfaces. Only special cases with very large buffers
>> might need indirect variants (like Jan said: via GFN lists, which could
>> be passed in registered buffers).
>>
>> Encrypted guests would probably want to use static percpu buffers in
>> order to avoid switching the encryption state of the buffers all the
>> time.
>>
>> An unencrypted PVH/HVM domain (e.g. PVH dom0) could just define one
>> giant buffer with the domain's memory size via the physical memory
>> mapping of the kernel. All kmalloc() addresses would be in that region.
> 
> That's Linux-centric. I'm not convinced all OSes maintain a directmap.
> Without such, switching to this model might end up quite intrusive on
> the OS side.

This model is especially interesting for dom0. The majority of installations
are running a Linux dom0 AFAIK, so having an easy way to speed this case up
is a big plus.

> Thinking of Linux, we'd need a 2nd range covering the data part of the
> kernel image.

Probably, yes.

> Further this still wouldn't (afaics) pave a reasonable route towards
> dealing with privcmd-invoked hypercalls.

Today the hypercall buffers are all allocated via the privcmd driver. It
should be fairly easy to add an ioctl to get the buffer's kernel address
instead of using the user address.
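
(A sketch of what such an ioctl could look like - the name, number and
structure are invented here, this is not an existing privcmd interface:)

  struct privcmd_hcbuf_kaddr {
      uint64_t uaddr;  /* in:  user address from the existing buffer mmap */
      uint64_t len;    /* in:  length of the buffer                       */
      uint64_t kaddr;  /* out: corresponding kernel (direct map) address  */
  };
  #define IOCTL_PRIVCMD_HCBUF_KADDR \
      _IOC(_IOC_NONE, 'P', 0x10, sizeof(struct privcmd_hcbuf_kaddr))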

Multi-page buffers might be problematic, though, so either we need to
have special variants for hypercalls with such buffers, or we just fall
back to using virtual addresses for the cases where no guest
physically contiguous buffer could be allocated (doesn't apply to
encrypted guests, of course, as those need to have large enough buffers
anyway).

> Finally - in how far are we concerned of PV guests using linear
> addresses for hypercall buffers? I ask because I don't think the model
> lends itself to use also for the PV guest interfaces.

Good question.

As long as we support PV guests we can't drop support for linear addresses
IMO. So the question is whether we are fine with PV guests not using the
pre-registered buffers, or if we want to introduce an interface for PV
guests using GFNs instead of MFNs.

Juergen

>> A buffer address not found would need to be translated like today (and
>> fail for an encrypted guest).
>>
>> Thoughts?
>>
>>
>> Juergen
> 



* Re: Proposal for physical address based hypercalls
  2022-09-29  8:16   ` Jan Beulich
@ 2022-09-29 12:53     ` Daniel P. Smith
  0 siblings, 0 replies; 14+ messages in thread
From: Daniel P. Smith @ 2022-09-29 12:53 UTC (permalink / raw)
  To: Jan Beulich; +Cc: xen-devel

On 9/29/22 04:16, Jan Beulich wrote:
> On 28.09.2022 15:32, dpsmith.dev wrote:
>> On 9/28/22 06:38, Jan Beulich wrote:
>>> For quite some time we've been talking about replacing the present virtual
>>> address based hypercall interface with one using physical addresses.  This is in
>>> particular a prerequisite to being able to support guests with encrypted
>>> memory, as for such guests we cannot perform the page table walks necessary to
>>> translate virtual to (guest-)physical addresses.  But using (guest) physical
>>> addresses is also expected to help performance of non-PV guests (i.e. all Arm
>>> ones plus HVM/PVH on x86), because of the no longer necessary address
>>> translation.
>>
>> Greetings Jan,
>>
>> I think there are multiple issues in play here, but the two major ones
>> are 1.) eliminating the use of guest virtual addresses and 2.) handling
>> the change in the security model for hypercalls from encrypted VMs. As
>> Andy was pointing out, attempting to address (1) in a backwards
>> compatible approach will likely not arrive at a solution that can
>> address issue (2).
> 
> It may not be sufficient, but it is (can be) a prereq.

As I stated below, it will start setting the precedent for using GPAs. 
The concern is two-fold: how much benefit can actually be achieved from 
an API/ABI that cannot be used in the final solution, and, by focusing 
effort on an unusable API/ABI, how much will that reduce effort/focus on 
crafting an API/ABI that can be used?

>> IMHO, the only result from teaching the existing ABI
>> to speak GPAs instead of VAs will be to break current and new kernels of
>> the habit of using VAs. Beyond that I do not see how it will do anything
>> to prepare current OS kernels for running as encrypted VMs, at least for
>> AMD since that is the specification I have been focused on studying the
>> last couple of months.
> 
> Plus we'd have code in the hypervisor then which deals with physical
> address based hypercall buffers. One less prereq to take care of for
> the (huge) rest of the work needed.

A question I would have is why not just RFC a GPA buffer helper 
framework for the hypervisor, since it will get used by the new ABI, and 
not spend effort retrofitting the current ABI. Some follow-on questions 
I would also ask are: moving forward, would new revisions of guests 
using the existing ABI be expected to move to GPAs, and how long do 
people see the existing ABI continuing in new guest revisions after the 
new ABI is adopted?

>> As for ABIv2, I understand and can appreciate Andy's desired approach.
>> Recently, especially with the hardware changes being introduced by SEV,
>> I would like to have considered a naive and more radical approach.
>> Currently hypercalls function using a more ioctl style. I would like to
>> suggest that a packet style interface similar to netlink be considered.
>> There are many benefits to adopting this type of interface that could be
>> covered in a larger RFC if there was any sense of willingness to
>> consider it. As a glimpse, a few benefits would be that arbitrary
>> buffers, continuations/asynchronous calls, and multi-call are all
>> natural consequence. It would also allow advanced extensions, such as an
>> optional PF_RING-like interface for zero-copy messaging from guest
>> user-space to hypervisor. While a packet interface could easily co-exist
>> with the existing ioctl-style interface, it would be a paradigm shift
>> from the past, though I feel ABIv2 was already going to be such a shift.
>> Anyway, just my 2¢.
> 
> I'm sorry for my ignorance, but I have no knowledge of how netlink
> works.

Understood, and you are not the first. A very quick, and very loose, 
comparison is that currently hypercalls are managed as an ioctl-style 
remote call with a per-version defined payload. This proposal would move 
to a packet dispatch where the packet is a free-form TLV that allows 
unknown elements/parameters to be present. This enables a newer 
toolstack, without requiring a constantly moving compatibility layer, to 
send a packet to an older hypervisor, which can reject unknown elements 
while hypercalls silently ignore unknown parameters. Similarly, an older 
toolstack will be able to send packets to a new hypervisor. And as I 
stated above, this approach naturally enables continuations/async 
operations and multi-call invocations. It is a significant departure, 
and thus would require substantial design and implementation work, but 
there is an opportunity here to do this work.
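
(A loose illustration of the kind of framing meant here - the names and layout 
are invented, not a concrete proposal:)

  struct xen_pkt_hdr {
      uint16_t op;      /* requested operation                        */
      uint16_t flags;   /* e.g. asynchronous / wants completion event */
      uint32_t len;     /* total length of the elements that follow   */
      uint64_t seq;     /* lets the guest match asynchronous replies  */
  };

  struct xen_pkt_elem {
      uint16_t tag;     /* parameter identifier                       */
      uint16_t len;     /* value length; unknown tags can be skipped  */
      uint8_t  value[]; /* payload, padded to the next 8-byte boundary */
  };

A dispatcher walking the elements by (tag, len) can simply skip tags it does 
not understand, which is where the forward/backward compatibility would come 
from.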

V/r,
DPS



* Re: Proposal for physical address based hypercalls
  2022-09-29 12:26         ` Juergen Gross
@ 2022-09-29 12:58           ` Jan Beulich
  2022-09-29 13:03             ` Juergen Gross
  0 siblings, 1 reply; 14+ messages in thread
From: Jan Beulich @ 2022-09-29 12:58 UTC (permalink / raw)
  To: Juergen Gross; +Cc: xen-devel, Andrew Cooper

On 29.09.2022 14:26, Juergen Gross wrote:
> On 29.09.22 13:32, Jan Beulich wrote:
>> Finally - in how far are we concerned of PV guests using linear
>> addresses for hypercall buffers? I ask because I don't think the model
>> lends itself to use also for the PV guest interfaces.
> 
> Good question.
> 
> As long as we support PV guests we can't drop support for linear addresses
> IMO. So the question is whether we are fine with PV guests not using the
> pre-registered buffers, or if we want to introduce an interface for PV
> guests using GFNs instead of MFNs.

GFN == MFN for PV, and using PFN space (being entirely controlled by the
guest) doesn't look attractive either. Plus any form of translation we'd
need to do for PV would involve getting and putting page references (for
writes also type references), along the lines of what is already
happening for HVM. Since "put" may involve freeing a page, which in turn
requires locks to be taken, we'd need to carefully check that no such
translation can occur from an inappropriate call chain.

Jan



* Re: Proposal for physical address based hypercalls
  2022-09-29 12:58           ` Jan Beulich
@ 2022-09-29 13:03             ` Juergen Gross
  0 siblings, 0 replies; 14+ messages in thread
From: Juergen Gross @ 2022-09-29 13:03 UTC (permalink / raw)
  To: Jan Beulich; +Cc: xen-devel, Andrew Cooper



On 29.09.22 14:58, Jan Beulich wrote:
> On 29.09.2022 14:26, Juergen Gross wrote:
>> On 29.09.22 13:32, Jan Beulich wrote:
>>> Finally - in how far are we concerned of PV guests using linear
>>> addresses for hypercall buffers? I ask because I don't think the model
>>> lends itself to use also for the PV guest interfaces.
>>
>> Good question.
>>
>> As long as we support PV guests we can't drop support for linear addresses
>> IMO. So the question is whether we are fine with PV guests not using the
>> pre-registered buffers, or if we want to introduce an interface for PV
>> guests using GFNs instead of MFNs.
> 
> GFN == MFN for PV, and using PFN space (being entirely controlled by the

Sigh. I meant to write PFNs, of course.

> guest) doesn't look attractive either. Plus any form of translation we'd
> need to do for PV would involve getting and putting page references (for
> writes also type references), along the lines of what is already
> happening for HVM. Since "put" may involve freeing a page, which in turn
> require locks to be taken, we'd need to carefully check that no such
> translation can occur from an inappropriate call chain.

Sounds like a good reason to continue using linear addresses then.


Juergen


* Re: Proposal for physical address based hypercalls
  2022-09-28 10:38 Proposal for physical address based hypercalls Jan Beulich
  2022-09-28 10:58 ` Andrew Cooper
  2022-09-28 13:32 ` dpsmith.dev
@ 2022-10-04  9:38 ` Julien Grall
  2022-10-04  9:46   ` Jan Beulich
  2 siblings, 1 reply; 14+ messages in thread
From: Julien Grall @ 2022-10-04  9:38 UTC (permalink / raw)
  To: Jan Beulich, xen-devel

Hi Jan,

On 28/09/2022 11:38, Jan Beulich wrote:
> For quite some time we've been talking about replacing the present virtual
> address based hypercall interface with one using physical addresses.  This is in
> particular a prerequisite to being able to support guests with encrypted
> memory, as for such guests we cannot perform the page table walks necessary to
> translate virtual to (guest-)physical addresses.  But using (guest) physical
> addresses is also expected to help performance of non-PV guests (i.e. all Arm
> ones plus HVM/PVH on x86), because of the no longer necessary address
> translation.

I am not sure this is going to be a gain in performance on Arm. In most 
cases we are using the HW to translate the guest virtual address to a 
host physical address, but there is no instruction to translate a guest 
physical address to a host physical address, so we would have to do the 
translation in software.

That said, there are other reasons on Arm (and possibly x86) to get rid 
of the virtual address. At the moment, we are requiring the VA to be 
always valid. This is quite fragile as we can't fully control how the 
kernel is touching its page-table (remember that on Arm we need to use 
break-before-make to do any shattering).

I have actually seen some failures during the translation on Arm32 in 
the past, but I never fully investigated them because they were hard to 
reproduce as they rarely happened.

> 
> Clearly to be able to run existing guests, we need to continue to support the
> present virtual address based interface.  Previously it was suggested to change
> the model on a per-domain basis, perhaps by a domain creation control.  This
> has two major shortcomings:
>   - Entire guest OSes would need to switch over to the new model all in one go.
>     This could be particularly problematic for in-guest interfaces like Linux'es
>     privcmd driver, which is passed hypercall argument from user space.  Such
>     necessarily use virtual addresses, and hence the kernel would need to learn
>     of all hypercalls legitimately coming in, in order to translate the buffer
>     addresses.  Reaching sufficient coverage there might take some time.
>   - All base components within an individual guest instance which might run in
>     succession (firmware, boot loader, kernel, kexec) would need to agree on the
>     hypercall ABI to use.
> 
> As an alternative I'd like to propose the introduction of a bit (or multiple
> ones, see below) augmenting the hypercall number, to control the flavor of the
> buffers used for every individual hypercall.  This would likely involve the
> introduction of a new hypercall page (or multiple ones if more than one bit is
> to be used), to retain the present abstraction where it is the hypervisor which
> actually fills these pages.  For multicalls the wrapping multicall itself would
> be controlled independently of the constituent hypercalls.
> 
> A model involving just a single bit to indicate "flat" buffers has limitations
> when it comes to large buffers passed to a hypercall.  Since in many cases
> hypercalls (currently) allowing for rather large buffers wouldn't normally be
> used with buffers significantly larger than a single page (several of the
> mem-ops for example), special casing the (presumably) few hypercalls which have
> an actual need for large buffers might be an option.
> 
> Another approach would be to build in a scatter/gather model for buffers right
> away.  Jürgen suggests that the low two address bits could be used as a
> "descriptor" here.

IIUC, with this approach we would still need to have a bit in the 
hypercall number to indicate this is not a virtual address. Is that correct?

Cheers,

-- 
Julien Grall



* Re: Proposal for physical address based hypercalls
  2022-10-04  9:38 ` Julien Grall
@ 2022-10-04  9:46   ` Jan Beulich
  0 siblings, 0 replies; 14+ messages in thread
From: Jan Beulich @ 2022-10-04  9:46 UTC (permalink / raw)
  To: Julien Grall; +Cc: xen-devel

On 04.10.2022 11:38, Julien Grall wrote:
> On 28/09/2022 11:38, Jan Beulich wrote:
>> Another approach would be to build in a scatter/gather model for buffers right
>> away.  Jürgen suggests that the low two address bits could be used as a
>> "descriptor" here.
> 
> IIUC, with this approach we would still need to have a bit in the 
> hypercall number to indicate this is not a virtual address. Is that correct?

Yes.

Jan

