All of lore.kernel.org
 help / color / mirror / Atom feed
* Xen Rust VirtIO demos work breakdown for Project Stratos
@ 2021-09-24 16:02 Alex Bennée
  2021-09-24 23:59 ` Marek Marczykowski-Górecki
                   ` (2 more replies)
  0 siblings, 3 replies; 16+ messages in thread
From: Alex Bennée @ 2021-09-24 16:02 UTC (permalink / raw)
  To: Stratos Mailing List
  Cc: Mike Holmes, Mathieu Poirier, Viresh Kumar, Peter Griffin,
	xen-devel, wl, Artem Mygaiev, Andrew Cooper, Stefano Stabellini,
	Doug Goldstein, Oleksandr Tyshchenko, Rust-VMM Mailing List,
	Sergio Lopez, Stefan Hajnoczi, David Woodhouse


Hi,

The following is a breakdown (as best I can figure) of the work needed
to demonstrate VirtIO backends in Rust on the Xen hypervisor. It
requires work across a number of projects but notably core rust and virtio
enabling in the Xen project (building on the work EPAM has already done)
and the start of enabling rust-vmm crate to work with Xen.

The first demo is a fairly simple toy to exercise the direct hypercall
approach for a unikernel backend. On it's own it isn't super impressive
but hopefully serves as a proof of concept for the idea of having
backends running in a single exception level where latency will be
important.

The second is a much more ambitious bridge between Xen and vhost-user to
allow for re-use of the existing vhost-user backends with the bridge
acting as a proxy for what would usually be a full VMM in the type-2
hypervisor case. With that in mind the rust-vmm work is only aimed at
doing the device emulation and doesn't address the larger question of
how type-1 hypervisors can be integrated into the rust-vmm hypervisor
model.

A quick note about the estimates. They are exceedingly rough guesses
plucked out of the air and I would be grateful for feedback from the
appropriate domain experts on if I'm being overly optimistic or
pessimistic.

The links to the Stratos JIRA should be at least read accessible to all
although they contain the same information as the attached document
(albeit with nicer PNG renderings of my ASCII art ;-). There is a
Stratos sync-up call next Thursday:

  https://calendar.google.com/event?action=TEMPLATE&tmeid=MWpidm5lbzM5NjlydnAxdWxvc2s4aGI0ZGpfMjAyMTA5MzBUMTUwMDAwWiBjX2o3bmdpMW84cmxvZmtwZWQ0cjVjaDk4bXZnQGc&tmsrc=c_j7ngi1o8rlofkped4r5ch98mvg%40group.calendar.google.com

and I'm sure there will also be discussion in the various projects
(hence the wide CC list). The Stratos calls are open to anyone who wants
to attend and we welcome feedback from all who are interested.

So on with the work breakdown:

                    ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
                     STRATOS PLANNING FOR 21 TO 22

                              Alex Bennée
                    ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━


Table of Contents
─────────────────

1. Xen Rust Bindings ([STR-51])
.. 1. Upstream an "official" rust crate for Xen ([STR-52])
.. 2. Basic Hypervisor Interactions hypercalls ([STR-53])
.. 3. [#10] Access to XenStore service ([STR-54])
.. 4. VirtIO support hypercalls ([STR-55])
2. Xen Hypervisor Support for Stratos ([STR-56])
.. 1. Stable ABI for foreignmemory mapping to non-dom0 ([STR-57])
.. 2. Tweaks to tooling to launch VirtIO guests
3. rust-vmm support for Xen VirtIO ([STR-59])
.. 1. Make vm-memory Xen aware ([STR-60])
.. 2. Xen IO notification and IRQ injections ([STR-61])
4. Stratos Demos
.. 1. Rust based stubdomain monitor ([STR-62])
.. 2. Xen aware vhost-user master ([STR-63])





1 Xen Rust Bindings ([STR-51])
══════════════════════════════

  There exists a [placeholder repository] with the start of a set of
  x86_64 bindings for Xen and a very basic hello world uni-kernel
  example. This forms the basis of the initial Xen Rust work and will be
  available as a [xen-sys crate] via cargo.


[STR-51] <https://linaro.atlassian.net/browse/STR-51>

[placeholder repository] <https://gitlab.com/cardoe/oxerun.git>

[xen-sys crate] <https://crates.io/crates/xen-sys>

1.1 Upstream an "official" rust crate for Xen ([STR-52])
────────────────────────────────────────────────────────

  To start with we will want an upstream location for future work to be
  based upon. The intention is the crate is independent of the version
  of Xen it runs on (above the baseline version chosen). This will
  entail:

  • ☐ agreeing with upstream the name/location for the source
  • ☐ documenting the rules for the "stable" hypercall ABI
  • ☐ establish an internal interface to elide between ioctl mediated
    and direct hypercalls
  • ☐ ensure the crate is multi-arch and has feature parity for arm64

  As such we expect the implementation to be standalone, i.e. not
  wrapping the existing Xen libraries for mediation. There should be a
  close (1-to-1) mapping between the interfaces in the crate and the
  eventual hypercall made to the hypervisor.

  Estimate: 4w (elapsed likely longer due to discussion)


[STR-52] <https://linaro.atlassian.net/browse/STR-52>


1.2 Basic Hypervisor Interactions hypercalls ([STR-53])
───────────────────────────────────────────────────────

  These are the bare minimum hypercalls implemented as both ioctl and
  direct calls. These allow for a very basic binary to:

  • ☐ console_io - output IO via the Xen console
  • ☐ domctl stub - basic stub for domain control (different API?)
  • ☐ sysctl stub - basic stub for system control (different API?)

  The idea would be this provides enough hypercall interface to query
  the list of domains and output their status via the xen console. There
  is an open question about if the domctl and sysctl hypercalls are way
  to go.

  Estimate: 6w


[STR-53] <https://linaro.atlassian.net/browse/STR-53>


1.3 [#10] Access to XenStore service ([STR-54])
───────────────────────────────────────────────

  This is a shared configuration storage space accessed via either Unix
  sockets (on dom0) or via the Xenbus. This is used to access
  configuration information for the domain.

  Is this needed for a backend though? Can everything just be passed
  direct on the command line?

  Estimate: 4w


[STR-54] <https://linaro.atlassian.net/browse/STR-54>


1.4 VirtIO support hypercalls ([STR-55])
────────────────────────────────────────

  These are the hypercalls that need to be implemented to support a
  VirtIO backend. This includes the ability to map another guests memory
  into the current domains address space, register to receive IOREQ
  events when the guest knocks at the doorbell and inject kicks into the
  guest. The hypercalls we need to support would be:

  • ☐ dmop - device model ops (*_ioreq_server, setirq, nr_vpus)
  • ☐ foreignmemory - map and unmap guest memory

  The DMOP space is larger than what we need for an IOREQ backend so
  I've based it just on what arch/arm/dm.c exports which is the subset
  introduced for EPAM's virtio work.

  Estimate: 12w


[STR-55] <https://linaro.atlassian.net/browse/STR-55>


2 Xen Hypervisor Support for Stratos ([STR-56])
═══════════════════════════════════════════════

  These tasks include tasks needed to support the various different
  deployments of Stratos components in Xen.


[STR-56] <https://linaro.atlassian.net/browse/STR-56>

2.1 Stable ABI for foreignmemory mapping to non-dom0 ([STR-57])
───────────────────────────────────────────────────────────────

  Currently the foreign memory mapping support only works for dom0 due
  to reference counting issues. If we are to support backends running in
  their own domains this will need to get fixed.

  Estimate: 8w


[STR-57] <https://linaro.atlassian.net/browse/STR-57>


2.2 Tweaks to tooling to launch VirtIO guests
─────────────────────────────────────────────

  There might not be too much to do here. The EPAM work already did
  something similar for their PoC for virtio-block. Essentially we need
  to ensure:
  • ☐ DT bindings are passed to the guest for virtio-mmio device
    discovery
  • ☐ Our rust backend can be instantiated before the domU is launched

  This currently assumes the tools and the backend are running in dom0.

  Estimate: 4w


3 rust-vmm support for Xen VirtIO ([STR-59])
════════════════════════════════════════════

  This encompasses the tasks required to get a vhost-user server up and
  running while interfacing to the Xen hypervisor. This will require the
  xen-sys.rs crate for the actual interface to the hypervisor.

  We need to work out how a Xen configuration option would be passed to
  the various bits of rust-vmm when something is being built.


[STR-59] <https://linaro.atlassian.net/browse/STR-59>

3.1 Make vm-memory Xen aware ([STR-60])
───────────────────────────────────────

  The vm-memory crate is the root crate for abstracting access to the
  guests memory. It currently has multiple configuration builds to
  handle difference between mmap on Windows and Unix. Although mmap
  isn't directly exposed the public interfaces support a mmap like
  interface. We would need to:

  • ☐ work out how to expose foreign memory via the vm-memory mechanism

  I'm not sure if this just means implementing the GuestMemory trait for
  a GuestMemoryXen or if we need to present a mmap like interface.

  Estimate: 8w


[STR-60] <https://linaro.atlassian.net/browse/STR-60>


3.2 Xen IO notification and IRQ injections ([STR-61])
─────────────────────────────────────────────────────

  The KVM world provides for ioeventfd (notifications) and irqfd
  (injection) to signal asynchronously between the guest and the
  backend. As far a I can tell this is currently handled inside the
  various VMMs which assume a KVM backend.

  While the vhost-user slave code doesn't see the
  register_ioevent/register_irqfd events it does deal with EventFDs
  throughout the code. Perhaps the best approach here would be to create
  a IOREQ crate that can create EventFD descriptors which can then be
  passed to the slaves to use for notification and injection.

  Otherwise there might be an argument for a new crate that can
  encapsulate this behaviour for both KVM/ioeventd and Xen/IOREQ setups?

  Estimate: 8w?


[STR-61] <https://linaro.atlassian.net/browse/STR-61>


4 Stratos Demos
═══════════════

  These tasks cover the creation of demos that brig together all the
  previous bits of work to demonstrate a new area of capability that has
  been opened up by Stratos work.


4.1 Rust based stubdomain monitor ([STR-62])
────────────────────────────────────────────

  This is a basic demo that is a proof of concept for a unikernel style
  backend written in pure Rust. This work would be a useful precursor
  for things such as the RTOS Dom0 on a safety island ([STR-11]) or as a
  carrier for the virtio-scmi backend.

  The monitor program will periodically poll the state of the other
  domains and echo their status to the Xen console.

  Estimate: 4w

#+name: stub-domain-example
#+begin_src ditaa :cmdline -o :file stub_domain_example.png
                      Dom0                      |        DomU       |      DomStub   
                                                |                   |                
                                                :  /-------------\  :                
                                                |  |cPNK         |  |                
                                                |  |             |  |                
                                                |  |             |  |                
        /------------------------------------\  |  |   GuestOS   |  |                
        |cPNK                                |  |  |             |  |                
  EL0   |   Dom0 Userspace (xl tools, QEMU)  |  |  |             |  |  /---------------\
        |                                    |  |  |             |  |  |cYEL           |
        \------------------------------------/  |  |             |  |  |               |
        +------------------------------------+  |  |             |  |  | Rust Monitor  |
  EL1   |cA1B        Dom0 Kernel             |  |  |             |  |  |               |
        +------------------------------------+  |  \-------------/  |  \---------------/
  -------------------------------------------------------------------------------=------------------
        +-------------------------------------------------------------------------------------+
  EL2   |cC02                              Xen Hypervisor                                     |
        +-------------------------------------------------------------------------------------+
#+end_src

[STR-62] <https://linaro.atlassian.net/browse/STR-62>

[STR-11] <https://linaro.atlassian.net/browse/STR-11>


4.2 Xen aware vhost-user master ([STR-63])
──────────────────────────────────────────

  Usually the master side of a vhost-user system is embedded directly in
  the VMM itself. However in a Xen deployment their is no overarching
  VMM but a series of utility programs that query the hypervisor
  directly. The Xen tooling is also responsible for setting up any
  support processes that are responsible for emulating HW for the guest.

  The task aims to bridge the gap between Xen's normal HW emulation path
  (ioreq) and VirtIO's userspace device emulation (vhost-user). The
  process would be started with some information on where the
  virtio-mmio address space is and what the slave binary will be. It
  will then:

  • map the guest into Dom0 userspace and attach to a MemFD
  • register the appropriate memory regions as IOREQ regions with Xen
  • create EventFD channels for the virtio kick notifications (one each
    way)
  • spawn the vhost-user slave process and mediate the notifications and
    kicks between the slave and Xen itself

#+name: xen-vhost-user-master
#+begin_src ditaa :cmdline -o :file xen_vhost_user_master.png

                          Dom0                                            DomU                            
                                                          |                                               
                                                          |                                               
                                                          |                                               
                                                          |                                               
                                                          |                                               
                                                          |                                               
  +-------------------+            +-------------------+  |
  |                   |----------->|                   |  |
  |    vhost-user     | vhost-user |    vhost-user     |  :  /------------------------------------\
  |      slave        |  protocol  |      master       |  |  |                                    |
  |    (existing)     |<-----------|      (rust)       |  |  |                                    |
  +-------------------+            +-------------------+  |  |                                    |
           ^                           ^   |       ^      |  |             Guest Userspace        |
           |                           |   |       |      |  |                                    |
           |                           |   | IOREQ |      |  |                                    |       
           |                           |   |       |      |  |                                    |       
           v                           v   V       |      |  \------------------------------------/       
   +---------------------------------------------------+  |  +------------------------------------+
   |       ^                           ^   | ioctl ^   |  |  |                                    |
   |       |   iofd/irqfd eventFD      |   |       |   |  |  |              Guest Kernel          |
   |       +---------------------------+   |       |   |  |  | +-------------+                    |
   |                                       |       |   |  |  | | virtio-dev  |                    |
   |                       Host Kernel     V       |   |  |  | +-------------+                    |
   +---------------------------------------------------+  |  +------------------------------------+
                                           |       ^      |      |         ^                              
                                           | hyper |             |         |                              
      ----------------------=------------- | -=--- | ----=------ | -----=- | --------=------------------  
                                           |  call |        Trap |         | IRQ                          
                                           V       |             V         |                              
            +-------------------------------------------------------------------------------------+       
            |                              |       ^             |         ^                      |       
            |                              |       +-------------+         |                      |       
      EL2   |      Xen Hypervisor          |                               |                      |       
            |                              +-------------------------------+                      |       
            |                                                                                     |       
            +-------------------------------------------------------------------------------------+       

#+end_src

[STR-63] <https://linaro.atlassian.net/browse/STR-63>

-- 
Alex Bennée


^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: Xen Rust VirtIO demos work breakdown for Project Stratos
  2021-09-24 16:02 Xen Rust VirtIO demos work breakdown for Project Stratos Alex Bennée
@ 2021-09-24 23:59 ` Marek Marczykowski-Górecki
  2021-09-27  9:50   ` Alex Bennée
  2021-09-27 17:25 ` Oleksandr
  2021-09-28 11:37 ` Andrew Cooper
  2 siblings, 1 reply; 16+ messages in thread
From: Marek Marczykowski-Górecki @ 2021-09-24 23:59 UTC (permalink / raw)
  To: Alex Bennée
  Cc: Stratos Mailing List, Mike Holmes, Mathieu Poirier, Viresh Kumar,
	Peter Griffin, xen-devel, wl, Artem Mygaiev, Andrew Cooper,
	Stefano Stabellini, Doug Goldstein, Oleksandr Tyshchenko,
	Rust-VMM Mailing List, Sergio Lopez, Stefan Hajnoczi,
	David Woodhouse

[-- Attachment #1: Type: text/plain, Size: 1955 bytes --]

On Fri, Sep 24, 2021 at 05:02:46PM +0100, Alex Bennée wrote:
> Hi,

Hi,

> 2.1 Stable ABI for foreignmemory mapping to non-dom0 ([STR-57])
> ───────────────────────────────────────────────────────────────
> 
>   Currently the foreign memory mapping support only works for dom0 due
>   to reference counting issues. If we are to support backends running in
>   their own domains this will need to get fixed.
> 
>   Estimate: 8w
> 
> 
> [STR-57] <https://linaro.atlassian.net/browse/STR-57>

I'm pretty sure it was discussed before, but I can't find relevant
(part of) thread right now: does your model assumes the backend (running
outside of dom0) will gain ability to map (or access in other way)
_arbitrary_ memory page of a frontend domain? Or worse: any domain?
That is a significant regression in terms of security model Xen
provides. It would give the backend domain _a lot more_ control over the
system that it normally has with Xen PV drivers - negating significant
part of security benefits of using driver domains.

So, does the above require frontend agreeing (explicitly or implicitly)
for accessing specific pages by the backend? There were several
approaches to that discussed, including using grant tables (as PV
drivers do), vIOMMU(?), or even drastically different model with no
shared memory at all (Argo). Can you clarify which (if any) approach
your attempt of VirtIO on Xen will use?

A more general idea: can we collect info on various VirtIO on Xen
approaches (since there is more than one) in a single place, including:
 - key characteristics, differences
 - who is involved
 - status
 - links to relevant threads, maybe

I'd propose to revive https://wiki.xenproject.org/wiki/Virtio_On_Xen

-- 
Best Regards,
Marek Marczykowski-Górecki
Invisible Things Lab

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 488 bytes --]

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: Xen Rust VirtIO demos work breakdown for Project Stratos
  2021-09-24 23:59 ` Marek Marczykowski-Górecki
@ 2021-09-27  9:50   ` Alex Bennée
  2021-09-28  5:55     ` [Stratos-dev] " Christopher Clark
  0 siblings, 1 reply; 16+ messages in thread
From: Alex Bennée @ 2021-09-27  9:50 UTC (permalink / raw)
  To: Marek Marczykowski-Górecki
  Cc: Stratos Mailing List, Mike Holmes, Mathieu Poirier, Viresh Kumar,
	Peter Griffin, xen-devel, wl, Artem Mygaiev, Andrew Cooper,
	Stefano Stabellini, Doug Goldstein, Oleksandr Tyshchenko,
	Rust-VMM Mailing List, Sergio Lopez, Stefan Hajnoczi,
	David Woodhouse, Arnd Bergmann


Marek Marczykowski-Górecki <marmarek@invisiblethingslab.com> writes:

> [[PGP Signed Part:Undecided]]
> On Fri, Sep 24, 2021 at 05:02:46PM +0100, Alex Bennée wrote:
>> Hi,
>
> Hi,
>
>> 2.1 Stable ABI for foreignmemory mapping to non-dom0 ([STR-57])
>> ───────────────────────────────────────────────────────────────
>> 
>>   Currently the foreign memory mapping support only works for dom0 due
>>   to reference counting issues. If we are to support backends running in
>>   their own domains this will need to get fixed.
>> 
>>   Estimate: 8w
>> 
>> 
>> [STR-57] <https://linaro.atlassian.net/browse/STR-57>
>
> I'm pretty sure it was discussed before, but I can't find relevant
> (part of) thread right now: does your model assumes the backend (running
> outside of dom0) will gain ability to map (or access in other way)
> _arbitrary_ memory page of a frontend domain? Or worse: any domain?

The aim is for some DomU's to host backends for other DomU's instead of
all backends being in Dom0. Those backend DomU's would have to be
considered trusted because as you say the default memory model of VirtIO
is to have full access to the frontend domains memory map.

> That is a significant regression in terms of security model Xen
> provides. It would give the backend domain _a lot more_ control over the
> system that it normally has with Xen PV drivers - negating significant
> part of security benefits of using driver domains.

It's part of the continual trade off between security and speed. For
things like block and network backends there is a penalty if data has to
be bounce buffered before it ends up in the guest address space.

> So, does the above require frontend agreeing (explicitly or implicitly)
> for accessing specific pages by the backend? There were several
> approaches to that discussed, including using grant tables (as PV
> drivers do), vIOMMU(?), or even drastically different model with no
> shared memory at all (Argo). Can you clarify which (if any) approach
> your attempt of VirtIO on Xen will use?

There are separate strands of work in Stratos looking at how we could
further secure VirtIO for architectures with distributed backends (e.g.
you may accept the block backend having access to the whole of memory
but an i2c multiplexer has different performance characteristics).

Currently the only thing we have prototyped is "fat virtqueues" which
Arnd has been working on. Here the only actual shared memory required is
the VirtIO config space and the relevant virt queues.

Other approaches have been discussed including using the virtio-iommu to
selectively make areas available to the backend or use memory zoning so
for example network buffers are only allocated in a certain region of
guest physical memory that is shared with the backend.

> A more general idea: can we collect info on various VirtIO on Xen
> approaches (since there is more than one) in a single place, including:
>  - key characteristics, differences
>  - who is involved
>  - status
>  - links to relevant threads, maybe
>
> I'd propose to revive https://wiki.xenproject.org/wiki/Virtio_On_Xen

From the Stratos point of view Xen is a useful proving ground for
general VirtIO experimentation due to being both a type-1 and open
source. Our ultimate aim is have a high degree of code sharing for
backends regardless of the hypervisor choice so a guest can use a VirtIO
device model without having to be locked into KVM.

If your technology choice is already fixed with a Xen hypervisor and
portability isn't a concern you might well just stick to the existing
well tested Xen PV interfaces.

-- 
Alex Bennée


^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: Xen Rust VirtIO demos work breakdown for Project Stratos
  2021-09-24 16:02 Xen Rust VirtIO demos work breakdown for Project Stratos Alex Bennée
  2021-09-24 23:59 ` Marek Marczykowski-Górecki
@ 2021-09-27 17:25 ` Oleksandr
  2021-09-28 11:37 ` Andrew Cooper
  2 siblings, 0 replies; 16+ messages in thread
From: Oleksandr @ 2021-09-27 17:25 UTC (permalink / raw)
  To: Alex Bennée
  Cc: Stratos Mailing List, Mike Holmes, Mathieu Poirier, Viresh Kumar,
	Peter Griffin, xen-devel, wl, Artem Mygaiev, Andrew Cooper,
	Stefano Stabellini, Doug Goldstein, Rust-VMM Mailing List,
	Sergio Lopez, Stefan Hajnoczi, David Woodhouse


On 24.09.21 19:02, Alex Bennée wrote:

Hi Alex

[snip]

>
> [STR-56] <https://linaro.atlassian.net/browse/STR-56>
>
> 2.1 Stable ABI for foreignmemory mapping to non-dom0 ([STR-57])
> ───────────────────────────────────────────────────────────────
>
>    Currently the foreign memory mapping support only works for dom0 due
>    to reference counting issues. If we are to support backends running in
>    their own domains this will need to get fixed.
>
>    Estimate: 8w
>
>
> [STR-57] <https://linaro.atlassian.net/browse/STR-57>

If I got this paragraph correctly, this is already fixed on Arm [1]


[1] 
https://lore.kernel.org/xen-devel/1611884932-1851-17-git-send-email-olekstysh@gmail.com/


[snip]


-- 
Regards,

Oleksandr Tyshchenko



^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [Stratos-dev] Xen Rust VirtIO demos work breakdown for Project Stratos
  2021-09-27  9:50   ` Alex Bennée
@ 2021-09-28  5:55     ` Christopher Clark
  2021-09-28  6:26       ` Stefano Stabellini
  2021-09-28  6:30       ` Stefan Hajnoczi
  0 siblings, 2 replies; 16+ messages in thread
From: Christopher Clark @ 2021-09-28  5:55 UTC (permalink / raw)
  To: Alex Bennée
  Cc: Marek Marczykowski-Górecki, Artem Mygaiev,
	Oleksandr Tyshchenko, Stefano Stabellini, Sergio Lopez, Wei Liu,
	Stefan Hajnoczi, Rust-VMM Mailing List, Doug Goldstein,
	Andrew Cooper, xen-devel, Arnd Bergmann, David Woodhouse,
	Stratos Mailing List, Rich Persaud, Daniel Smith, Paul Durrant,
	openxt

[-- Attachment #1: Type: text/plain, Size: 6056 bytes --]

On Mon, Sep 27, 2021 at 3:06 AM Alex Bennée via Stratos-dev <
stratos-dev@op-lists.linaro.org> wrote:

>
> Marek Marczykowski-Górecki <marmarek@invisiblethingslab.com> writes:
>
> > [[PGP Signed Part:Undecided]]
> > On Fri, Sep 24, 2021 at 05:02:46PM +0100, Alex Bennée wrote:
> >> Hi,
> >
> > Hi,
> >
> >> 2.1 Stable ABI for foreignmemory mapping to non-dom0 ([STR-57])
> >> ───────────────────────────────────────────────────────────────
> >>
> >>   Currently the foreign memory mapping support only works for dom0 due
> >>   to reference counting issues. If we are to support backends running in
> >>   their own domains this will need to get fixed.
> >>
> >>   Estimate: 8w
> >>
> >>
> >> [STR-57] <https://linaro.atlassian.net/browse/STR-57>
> >
> > I'm pretty sure it was discussed before, but I can't find relevant
> > (part of) thread right now: does your model assumes the backend (running
> > outside of dom0) will gain ability to map (or access in other way)
> > _arbitrary_ memory page of a frontend domain? Or worse: any domain?
>
> The aim is for some DomU's to host backends for other DomU's instead of
> all backends being in Dom0. Those backend DomU's would have to be
> considered trusted because as you say the default memory model of VirtIO
> is to have full access to the frontend domains memory map.
>

I share Marek's concern. I believe that there are Xen-based systems that
will want to run guests using VirtIO devices without extending this level
of trust to the backend domains.


>
> > That is a significant regression in terms of security model Xen
> > provides. It would give the backend domain _a lot more_ control over the
> > system that it normally has with Xen PV drivers - negating significant
> > part of security benefits of using driver domains.
>
> It's part of the continual trade off between security and speed. For
> things like block and network backends there is a penalty if data has to
> be bounce buffered before it ends up in the guest address space.
>

I think we have significant flexibility in being able to modify several
layers of the stack here to make this efficient, and it would be beneficial
to avoid bounce buffering if possible without sacrificing the ability to
enforce isolation. I wonder if there's a viable approach possible with some
implementation of a virtual IOMMU (which enforces access control) that
would allow a backend to commission I/O on a physical device on behalf of a
guest, where the data buffers do not need to be mapped into the backend and
so avoid the need for a bounce?


>
> > So, does the above require frontend agreeing (explicitly or implicitly)
> > for accessing specific pages by the backend? There were several
> > approaches to that discussed, including using grant tables (as PV
> > drivers do), vIOMMU(?), or even drastically different model with no
> > shared memory at all (Argo). Can you clarify which (if any) approach
> > your attempt of VirtIO on Xen will use?
>
> There are separate strands of work in Stratos looking at how we could
> further secure VirtIO for architectures with distributed backends (e.g.
> you may accept the block backend having access to the whole of memory
> but an i2c multiplexer has different performance characteristics).
>
> Currently the only thing we have prototyped is "fat virtqueues" which
> Arnd has been working on. Here the only actual shared memory required is
> the VirtIO config space and the relevant virt queues.
>

I think the "fat virtqueues" work is a positive path for investigation and
I don't think shared memory between front and backend is hard requirement
for those to function: a VirtIO-Argo transport driver would be able to
operate with them without shared memory.


>
> Other approaches have been discussed including using the virtio-iommu to
> selectively make areas available to the backend or use memory zoning so
> for example network buffers are only allocated in a certain region of
> guest physical memory that is shared with the backend.
>
> > A more general idea: can we collect info on various VirtIO on Xen
> > approaches (since there is more than one) in a single place, including:
> >  - key characteristics, differences
> >  - who is involved
> >  - status
> >  - links to relevant threads, maybe
> >
> > I'd propose to revive https://wiki.xenproject.org/wiki/Virtio_On_Xen


Thanks for the reminder, Marek -- I've just overhauled that page to give an
overview of the several approaches in the Xen community to enabling VirtIO
on Xen, and have included a first pass at including the content you
describe. I'm happy to be involved in improving it further.


>
>
> From the Stratos point of view Xen is a useful proving ground for
> general VirtIO experimentation due to being both a type-1 and open
> source. Our ultimate aim is have a high degree of code sharing for
> backends regardless of the hypervisor choice so a guest can use a VirtIO
> device model without having to be locked into KVM.
>

Thanks, Alex - this context is useful.


>
> If your technology choice is already fixed with a Xen hypervisor and
> portability isn't a concern you might well just stick to the existing
> well tested Xen PV interfaces.
>

I wouldn't quite agree; there are additional reasons beyond portability to
be looking at other options than the traditional Xen PV interfaces: eg. an
Argo-based interdomain transport for PV devices will enable fine-grained
enforcement of Mandatory Access Control over the frontend / backend
communication, and will not depend on XenStore which is advantageous for
Hyperlaunch / dom0less Xen deployment configurations.

thanks,

Christopher



>
> --
> Alex Bennée
> --
> Stratos-dev mailing list
> Stratos-dev@op-lists.linaro.org
> https://op-lists.linaro.org/mailman/listinfo/stratos-dev
>

[-- Attachment #2: Type: text/html, Size: 8208 bytes --]

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [Stratos-dev] Xen Rust VirtIO demos work breakdown for Project Stratos
  2021-09-28  5:55     ` [Stratos-dev] " Christopher Clark
@ 2021-09-28  6:26       ` Stefano Stabellini
  2021-09-28 20:18         ` Oleksandr Tyshchenko
  2021-09-28  6:30       ` Stefan Hajnoczi
  1 sibling, 1 reply; 16+ messages in thread
From: Stefano Stabellini @ 2021-09-28  6:26 UTC (permalink / raw)
  To: Christopher Clark
  Cc: Alex Bennée, Marek Marczykowski-Górecki, Artem Mygaiev,
	Oleksandr Tyshchenko, Stefano Stabellini, Sergio Lopez, Wei Liu,
	Stefan Hajnoczi, Rust-VMM Mailing List, Doug Goldstein,
	Andrew Cooper, xen-devel, Arnd Bergmann, David Woodhouse,
	Stratos Mailing List, Rich Persaud, Daniel Smith, Paul Durrant,
	openxt

[-- Attachment #1: Type: text/plain, Size: 2132 bytes --]

On Mon, 27 Sep 2021, Christopher Clark wrote:
> On Mon, Sep 27, 2021 at 3:06 AM Alex Bennée via Stratos-dev <stratos-dev@op-lists.linaro.org> wrote:
> 
>       Marek Marczykowski-Górecki <marmarek@invisiblethingslab.com> writes:
> 
>       > [[PGP Signed Part:Undecided]]
>       > On Fri, Sep 24, 2021 at 05:02:46PM +0100, Alex Bennée wrote:
>       >> Hi,
>       >
>       > Hi,
>       >
>       >> 2.1 Stable ABI for foreignmemory mapping to non-dom0 ([STR-57])
>       >> ───────────────────────────────────────────────────────────────
>       >>
>       >>   Currently the foreign memory mapping support only works for dom0 due
>       >>   to reference counting issues. If we are to support backends running in
>       >>   their own domains this will need to get fixed.
>       >>
>       >>   Estimate: 8w
>       >>
>       >>
>       >> [STR-57] <https://linaro.atlassian.net/browse/STR-57>
>       >
>       > I'm pretty sure it was discussed before, but I can't find relevant
>       > (part of) thread right now: does your model assumes the backend (running
>       > outside of dom0) will gain ability to map (or access in other way)
>       > _arbitrary_ memory page of a frontend domain? Or worse: any domain?
> 
>       The aim is for some DomU's to host backends for other DomU's instead of
>       all backends being in Dom0. Those backend DomU's would have to be
>       considered trusted because as you say the default memory model of VirtIO
>       is to have full access to the frontend domains memory map.
> 
> 
> I share Marek's concern. I believe that there are Xen-based systems that will want to run guests using VirtIO devices without extending
> this level of trust to the backend domains.

From a safety perspective, it would be challenging to deploy a system
with privileged backends. From a safety perspective, it would be a lot
easier if the backend were unprivileged.

This is one of those times where safety and security requirements are
actually aligned.

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [Stratos-dev] Xen Rust VirtIO demos work breakdown for Project Stratos
  2021-09-28  5:55     ` [Stratos-dev] " Christopher Clark
  2021-09-28  6:26       ` Stefano Stabellini
@ 2021-09-28  6:30       ` Stefan Hajnoczi
  1 sibling, 0 replies; 16+ messages in thread
From: Stefan Hajnoczi @ 2021-09-28  6:30 UTC (permalink / raw)
  To: Christopher Clark
  Cc: Alex Bennée, Marek Marczykowski-Górecki, Artem Mygaiev,
	Oleksandr Tyshchenko, Stefano Stabellini, Sergio Lopez, Wei Liu,
	Rust-VMM Mailing List, Doug Goldstein, Andrew Cooper, xen-devel,
	Arnd Bergmann, David Woodhouse, Stratos Mailing List,
	Rich Persaud, Daniel Smith, Paul Durrant, openxt

On Tue, Sep 28, 2021 at 7:55 AM Christopher Clark
<christopher.w.clark@gmail.com> wrote:
>
> On Mon, Sep 27, 2021 at 3:06 AM Alex Bennée via Stratos-dev <stratos-dev@op-lists.linaro.org> wrote:
>>
>>
>> Marek Marczykowski-Górecki <marmarek@invisiblethingslab.com> writes:
>>
>> > [[PGP Signed Part:Undecided]]
>> > On Fri, Sep 24, 2021 at 05:02:46PM +0100, Alex Bennée wrote:
>> > That is a significant regression in terms of security model Xen
>> > provides. It would give the backend domain _a lot more_ control over the
>> > system that it normally has with Xen PV drivers - negating significant
>> > part of security benefits of using driver domains.
>>
>> It's part of the continual trade off between security and speed. For
>> things like block and network backends there is a penalty if data has to
>> be bounce buffered before it ends up in the guest address space.
>
>
> I think we have significant flexibility in being able to modify several layers of the stack here to make this efficient, and it would be beneficial to avoid bounce buffering if possible without sacrificing the ability to enforce isolation. I wonder if there's a viable approach possible with some implementation of a virtual IOMMU (which enforces access control) that would allow a backend to commission I/O on a physical device on behalf of a guest, where the data buffers do not need to be mapped into the backend and so avoid the need for a bounce?

This may not require much modification for Linux guest drivers.
Although the VIRTIO drivers traditionally assumed devices can DMA to
any memory location, there are already constraints in other situations
like Confidential Computing, where swiotlb is used for bounce
buffering.

Stefan


^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: Xen Rust VirtIO demos work breakdown for Project Stratos
  2021-09-24 16:02 Xen Rust VirtIO demos work breakdown for Project Stratos Alex Bennée
  2021-09-24 23:59 ` Marek Marczykowski-Górecki
  2021-09-27 17:25 ` Oleksandr
@ 2021-09-28 11:37 ` Andrew Cooper
  2 siblings, 0 replies; 16+ messages in thread
From: Andrew Cooper @ 2021-09-28 11:37 UTC (permalink / raw)
  To: Alex Bennée, Stratos Mailing List
  Cc: Mike Holmes, Mathieu Poirier, Viresh Kumar, Peter Griffin,
	xen-devel, wl, Artem Mygaiev, Stefano Stabellini, Doug Goldstein,
	Oleksandr Tyshchenko, Rust-VMM Mailing List, Sergio Lopez,
	Stefan Hajnoczi, David Woodhouse

On 24/09/2021 17:02, Alex Bennée wrote:
> 1.1 Upstream an "official" rust crate for Xen ([STR-52])
> ────────────────────────────────────────────────────────
>
>   To start with we will want an upstream location for future work to be
>   based upon. The intention is the crate is independent of the version
>   of Xen it runs on (above the baseline version chosen). This will
>   entail:
>
>   • ☐ agreeing with upstream the name/location for the source

Probably github/xen-project/rust-bindings unless anyone has a better
suggestion.

We almost certainly want a companion repository configured as a
hello-world example using the bindings and (cross-)compiled for each
backend target.

>   • ☐ documenting the rules for the "stable" hypercall ABI

Easy.  There shall be no use of unstable interfaces at all.

This is the *only* way to avoid making the bindings dependent on the
version of the hypervisor, and will be a major improvement in the Xen
ecosystem.

Any unstable hypercall wanting to be used shall be stabilised in Xen
first, which has been vehemently agreed to at multiple dev summits in
the past, and will be a useful way of guiding the stabilisation effort.

>   • ☐ establish an internal interface to elide between ioctl mediated
>     and direct hypercalls
>   • ☐ ensure the crate is multi-arch and has feature parity for arm64
>
>   As such we expect the implementation to be standalone, i.e. not
>   wrapping the existing Xen libraries for mediation. There should be a
>   close (1-to-1) mapping between the interfaces in the crate and the
>   eventual hypercall made to the hypervisor.
>
>   Estimate: 4w (elapsed likely longer due to discussion)
>
>
> [STR-52] <https://nam04.safelinks.protection.outlook.com/?url=https%3A%2F%2Flinaro.atlassian.net%2Fbrowse%2FSTR-52&amp;data=04%7C01%7CAndrew.Cooper3%40citrix.com%7C973087248f404018a36008d97f78a0ea%7C335836de42ef43a2b145348c2ee9ca5b%7C0%7C0%7C637680978363070576%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C1000&amp;sdata=0k6ES9IXCvTlqP8jTiFeuTN51GlMngxaepWvQC6hMAg%3D&amp;reserved=0>
>
>
> 1.2 Basic Hypervisor Interactions hypercalls ([STR-53])
> ───────────────────────────────────────────────────────
>
>   These are the bare minimum hypercalls implemented as both ioctl and
>   direct calls. These allow for a very basic binary to:
>
>   • ☐ console_io - output IO via the Xen console
>   • ☐ domctl stub - basic stub for domain control (different API?)
>   • ☐ sysctl stub - basic stub for system control (different API?)
>
>   The idea would be this provides enough hypercall interface to query
>   the list of domains and output their status via the xen console. There
>   is an open question about if the domctl and sysctl hypercalls are way
>   to go.

console_io probably wants implementing as a backend to println!() or the
log module, because users of the crate won't want change how they
printf()/etc depending on the target.

That said, console_io hypercalls only do anything for unprivleged VMs in
debug builds of the hypervisor.  This is fine for development, and less
fine in production, so logging ought to use the PV console instead (with
room for future expansion to an Argo transport).

domctl/sysctl are unstable interfaces.  I don't think they'll be
necessary for a basic virtio backend, and they will be the most
complicated hypercalls to stabilise.

>
>   Estimate: 6w
>
>
> [STR-53] <https://nam04.safelinks.protection.outlook.com/?url=https%3A%2F%2Flinaro.atlassian.net%2Fbrowse%2FSTR-53&amp;data=04%7C01%7CAndrew.Cooper3%40citrix.com%7C973087248f404018a36008d97f78a0ea%7C335836de42ef43a2b145348c2ee9ca5b%7C0%7C0%7C637680978363070576%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C1000&amp;sdata=0mgMPS4ikzD%2F8fc%2BcqVjtRuSKzMu%2Ba8XgOs4hOC9pY4%3D&amp;reserved=0>
>
>
> 1.3 [#10] Access to XenStore service ([STR-54])
> ───────────────────────────────────────────────
>
>   This is a shared configuration storage space accessed via either Unix
>   sockets (on dom0) or via the Xenbus. This is used to access
>   configuration information for the domain.
>
>   Is this needed for a backend though? Can everything just be passed
>   direct on the command line?

Currently, if you want a stubdom and you want to instruct it to shut
down cleanly, it needs xenstore.  Any stubdom which wants disk or
network needs xenstore too.

xenbus (the transport) does need to split between ioctl()'s and raw
hypercalls.  xenstore (the protocol) could be in the xen crate, or a
separate one as it is a piece of higher level functionality.

However, we should pay attention to non-xenstore usecases and not paint
ourselves into a corner.  Some security usecases would prefer not to use
shared memory, and e.g. might consider using an Argo transport instead
of the traditional grant-shared page.

>
>   Estimate: 4w
>
>
> [STR-54] <https://nam04.safelinks.protection.outlook.com/?url=https%3A%2F%2Flinaro.atlassian.net%2Fbrowse%2FSTR-54&amp;data=04%7C01%7CAndrew.Cooper3%40citrix.com%7C973087248f404018a36008d97f78a0ea%7C335836de42ef43a2b145348c2ee9ca5b%7C0%7C0%7C637680978363070576%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C1000&amp;sdata=mURxxo7vQwfTkR4cX5yN5l7kPav2gXhluOhm2%2BSBAIs%3D&amp;reserved=0>
>
>
> 1.4 VirtIO support hypercalls ([STR-55])
> ────────────────────────────────────────
>
>   These are the hypercalls that need to be implemented to support a
>   VirtIO backend. This includes the ability to map another guests memory
>   into the current domains address space, register to receive IOREQ
>   events when the guest knocks at the doorbell and inject kicks into the
>   guest. The hypercalls we need to support would be:
>
>   • ☐ dmop - device model ops (*_ioreq_server, setirq, nr_vpus)
>   • ☐ foreignmemory - map and unmap guest memory

also evtchn, which you need for ioreq notifications.

>   The DMOP space is larger than what we need for an IOREQ backend so
>   I've based it just on what arch/arm/dm.c exports which is the subset
>   introduced for EPAM's virtio work.

One thing we will want to be is careful with the interface.  The current
DMOPs are a mess of units (particularly frames vs addresses, which will
need to change in Xen in due course) as well as range
inclusivity/exclusivity.

>
>   Estimate: 12w
>
>
> [STR-55] <https://nam04.safelinks.protection.outlook.com/?url=https%3A%2F%2Flinaro.atlassian.net%2Fbrowse%2FSTR-55&amp;data=04%7C01%7CAndrew.Cooper3%40citrix.com%7C973087248f404018a36008d97f78a0ea%7C335836de42ef43a2b145348c2ee9ca5b%7C0%7C0%7C637680978363070576%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C1000&amp;sdata=tczcGl6YWGRwnBooBwodDX2BHLKu5tJG3%2FTe%2Fpqf%2B9w%3D&amp;reserved=0>
>
>
> 2 Xen Hypervisor Support for Stratos ([STR-56])
> ═══════════════════════════════════════════════
>
>   These tasks include tasks needed to support the various different
>   deployments of Stratos components in Xen.
>
>
> [STR-56] <https://nam04.safelinks.protection.outlook.com/?url=https%3A%2F%2Flinaro.atlassian.net%2Fbrowse%2FSTR-56&amp;data=04%7C01%7CAndrew.Cooper3%40citrix.com%7C973087248f404018a36008d97f78a0ea%7C335836de42ef43a2b145348c2ee9ca5b%7C0%7C0%7C637680978363070576%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C1000&amp;sdata=31lFqxkbZtyNdQcyVI1d8M53l4JqVhni1s1aowrtoXg%3D&amp;reserved=0>
>
> 2.1 Stable ABI for foreignmemory mapping to non-dom0 ([STR-57])
> ───────────────────────────────────────────────────────────────
>
>   Currently the foreign memory mapping support only works for dom0 due
>   to reference counting issues. If we are to support backends running in
>   their own domains this will need to get fixed.

Oh.  It appears as if some of this was completed in
https://xenbits.xen.org/gitweb/?p=xen.git;a=commitdiff;h=4922caf1de5a08d3eefb4058de1b7f0122c8f76f

~Andrew

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [Stratos-dev] Xen Rust VirtIO demos work breakdown for Project Stratos
  2021-09-28  6:26       ` Stefano Stabellini
@ 2021-09-28 20:18         ` Oleksandr Tyshchenko
  2021-10-01 23:58           ` Stefano Stabellini
  0 siblings, 1 reply; 16+ messages in thread
From: Oleksandr Tyshchenko @ 2021-09-28 20:18 UTC (permalink / raw)
  To: Stefano Stabellini, xen-devel
  Cc: Christopher Clark, Alex Bennée,
	Marek Marczykowski-Górecki, Artem Mygaiev, Sergio Lopez,
	Wei Liu, Stefan Hajnoczi, Rust-VMM Mailing List, Doug Goldstein,
	Andrew Cooper, Arnd Bergmann, David Woodhouse,
	Stratos Mailing List, Rich Persaud, Daniel Smith, Paul Durrant,
	openxt

[-- Attachment #1: Type: text/plain, Size: 4512 bytes --]

On Tue, Sep 28, 2021 at 9:26 AM Stefano Stabellini <sstabellini@kernel.org>
wrote:

Hi Stefano, all

[Sorry for the possible format issues]


On Mon, 27 Sep 2021, Christopher Clark wrote:
> > On Mon, Sep 27, 2021 at 3:06 AM Alex Bennée via Stratos-dev <
> stratos-dev@op-lists.linaro.org> wrote:
> >
> >       Marek Marczykowski-Górecki <marmarek@invisiblethingslab.com>
> writes:
> >
> >       > [[PGP Signed Part:Undecided]]
> >       > On Fri, Sep 24, 2021 at 05:02:46PM +0100, Alex Bennée wrote:
> >       >> Hi,
> >       >
> >       > Hi,
> >       >
> >       >> 2.1 Stable ABI for foreignmemory mapping to non-dom0 ([STR-57])
> >       >> ───────────────────────────────────────────────────────────────
> >       >>
> >       >>   Currently the foreign memory mapping support only works for
> dom0 due
> >       >>   to reference counting issues. If we are to support backends
> running in
> >       >>   their own domains this will need to get fixed.
> >       >>
> >       >>   Estimate: 8w
> >       >>
> >       >>
> >       >> [STR-57] <https://linaro.atlassian.net/browse/STR-57>
> >       >
> >       > I'm pretty sure it was discussed before, but I can't find
> relevant
> >       > (part of) thread right now: does your model assumes the backend
> (running
> >       > outside of dom0) will gain ability to map (or access in other
> way)
> >       > _arbitrary_ memory page of a frontend domain? Or worse: any
> domain?
> >
> >       The aim is for some DomU's to host backends for other DomU's
> instead of
> >       all backends being in Dom0. Those backend DomU's would have to be
> >       considered trusted because as you say the default memory model of
> VirtIO
> >       is to have full access to the frontend domains memory map.
> >
> >
> > I share Marek's concern. I believe that there are Xen-based systems that
> will want to run guests using VirtIO devices without extending
> > this level of trust to the backend domains.
>
> From a safety perspective, it would be challenging to deploy a system
> with privileged backends. From a safety perspective, it would be a lot
> easier if the backend were unprivileged.
>
> This is one of those times where safety and security requirements are
> actually aligned.


Well, the foreign memory mapping has one advantage in the context of Virtio
use-case
which is that Virtio infrastructure in Guest doesn't require any
modifications to run on top Xen.
The only issue with foreign memory here is that Guest memory actually
mapped without its agreement
which doesn't perfectly fit into the security model. (although there is one
more issue with XSA-300,
but I think it will go away sooner or later, at least there are some
attempts to eliminate it).
While the ability to map any part of Guest memory is not an issue for the
backend running in Dom0
(which we usually trust), this will certainly violate Xen security model if
we want to run it in other
domain, so I completely agree with the existing concern.

It was discussed before [1], but I couldn't find any decisions regarding
that. As I understand,
the one of the possible ideas is to have some entity in Xen (PV
IOMMU/virtio-iommu/whatever)
that works in protection mode, so it denies all foreign mapping requests
from the backend running in DomU
by default and only allows requests with mapping which were *implicitly*
granted by the Guest before.
For example, Xen could be informed which MMIOs hold the queue PFN and
notify registers
(as it traps the accesses to these registers anyway) and could
theoretically parse the frontend request
and retrieve descriptors to make a decision which GFNs are actually
*allowed*.

I can't say for sure (sorry not familiar enough with the topic), but
implementing the virtio-iommu device
in Xen we could probably avoid Guest modifications at all. Of course, for
this to work
the Virtio infrastructure in Guest should use DMA API as mentioned in [1].

Would the “restricted foreign mapping” solution retain the Xen security
model and be accepted
by the Xen community? I wonder, has someone already looked in this
direction, are there any
pitfalls here or is this even feasible?

[1]
https://lore.kernel.org/xen-devel/464e91ec-2b53-2338-43c7-a018087fc7f6@arm.com/



-- 
Regards,

Oleksandr Tyshchenko

[-- Attachment #2: Type: text/html, Size: 6321 bytes --]

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [Stratos-dev] Xen Rust VirtIO demos work breakdown for Project Stratos
  2021-09-28 20:18         ` Oleksandr Tyshchenko
@ 2021-10-01 23:58           ` Stefano Stabellini
  2021-10-02 17:55             ` Oleksandr Tyshchenko
  0 siblings, 1 reply; 16+ messages in thread
From: Stefano Stabellini @ 2021-10-01 23:58 UTC (permalink / raw)
  To: Oleksandr Tyshchenko
  Cc: Stefano Stabellini, xen-devel, Christopher Clark,
	Alex Bennée, Marek Marczykowski-Górecki, Artem Mygaiev,
	Sergio Lopez, Wei Liu, Stefan Hajnoczi, Rust-VMM Mailing List,
	Doug Goldstein, Andrew Cooper, Arnd Bergmann, David Woodhouse,
	Stratos Mailing List, Rich Persaud, Daniel Smith, Paul Durrant,
	openxt

[-- Attachment #1: Type: text/plain, Size: 5574 bytes --]

On Tue, 28 Sep 2021, Oleksandr Tyshchenko wrote:
> On Tue, Sep 28, 2021 at 9:26 AM Stefano Stabellini <sstabellini@kernel.org> wrote:
> 
> Hi Stefano, all
> 
> [Sorry for the possible format issues]
> 
> 
>       On Mon, 27 Sep 2021, Christopher Clark wrote:
>       > On Mon, Sep 27, 2021 at 3:06 AM Alex Bennée via Stratos-dev <stratos-dev@op-lists.linaro.org> wrote:
>       >
>       >       Marek Marczykowski-Górecki <marmarek@invisiblethingslab.com> writes:
>       >
>       >       > [[PGP Signed Part:Undecided]]
>       >       > On Fri, Sep 24, 2021 at 05:02:46PM +0100, Alex Bennée wrote:
>       >       >> Hi,
>       >       >
>       >       > Hi,
>       >       >
>       >       >> 2.1 Stable ABI for foreignmemory mapping to non-dom0 ([STR-57])
>       >       >> ───────────────────────────────────────────────────────────────
>       >       >>
>       >       >>   Currently the foreign memory mapping support only works for dom0 due
>       >       >>   to reference counting issues. If we are to support backends running in
>       >       >>   their own domains this will need to get fixed.
>       >       >>
>       >       >>   Estimate: 8w
>       >       >>
>       >       >>
>       >       >> [STR-57] <https://linaro.atlassian.net/browse/STR-57>
>       >       >
>       >       > I'm pretty sure it was discussed before, but I can't find relevant
>       >       > (part of) thread right now: does your model assumes the backend (running
>       >       > outside of dom0) will gain ability to map (or access in other way)
>       >       > _arbitrary_ memory page of a frontend domain? Or worse: any domain?
>       >
>       >       The aim is for some DomU's to host backends for other DomU's instead of
>       >       all backends being in Dom0. Those backend DomU's would have to be
>       >       considered trusted because as you say the default memory model of VirtIO
>       >       is to have full access to the frontend domains memory map.
>       >
>       >
>       > I share Marek's concern. I believe that there are Xen-based systems that will want to run guests using VirtIO devices without
>       extending
>       > this level of trust to the backend domains.
> 
>       >From a safety perspective, it would be challenging to deploy a system
>       with privileged backends. From a safety perspective, it would be a lot
>       easier if the backend were unprivileged.
> 
>       This is one of those times where safety and security requirements are
>       actually aligned.
> 
> 
> Well, the foreign memory mapping has one advantage in the context of Virtio use-case
> which is that Virtio infrastructure in Guest doesn't require any modifications to run on top Xen.
> The only issue with foreign memory here is that Guest memory actually mapped without its agreement
> which doesn't perfectly fit into the security model. (although there is one more issue with XSA-300,
> but I think it will go away sooner or later, at least there are some attempts to eliminate it).
> While the ability to map any part of Guest memory is not an issue for the backend running in Dom0
> (which we usually trust), this will certainly violate Xen security model if we want to run it in other
> domain, so I completely agree with the existing concern.

Yep, that's what I was referring to.


> It was discussed before [1], but I couldn't find any decisions regarding that. As I understand,
> the one of the possible ideas is to have some entity in Xen (PV IOMMU/virtio-iommu/whatever)
> that works in protection mode, so it denies all foreign mapping requests from the backend running in DomU
> by default and only allows requests with mapping which were *implicitly* granted by the Guest before.
> For example, Xen could be informed which MMIOs hold the queue PFN and notify registers
> (as it traps the accesses to these registers anyway) and could theoretically parse the frontend request
> and retrieve descriptors to make a decision which GFNs are actually *allowed*.
> 
> I can't say for sure (sorry not familiar enough with the topic), but implementing the virtio-iommu device
> in Xen we could probably avoid Guest modifications at all. Of course, for this to work
> the Virtio infrastructure in Guest should use DMA API as mentioned in [1].
> 
> Would the “restricted foreign mapping” solution retain the Xen security model and be accepted
> by the Xen community? I wonder, has someone already looked in this direction, are there any
> pitfalls here or is this even feasible?
> 
> [1] https://lore.kernel.org/xen-devel/464e91ec-2b53-2338-43c7-a018087fc7f6@arm.com/

The discussion that went further is actually one based on the idea that
there is a pre-shared memory area and the frontend always passes
addresses from it. For ease of implementation, the pre-shared area is
the virtqueue itself so this approach has been called "fat virtqueue".
But it requires guest modifications and it probably results in
additional memory copies.

I am not sure if the approach you mentioned could be implemented
completely without frontend changes. It looks like Xen would have to
learn how to inspect virtqueues in order to verify implicit grants
without frontend changes. With or without guest modifications, I am not
aware of anyone doing research and development on this approach.

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [Stratos-dev] Xen Rust VirtIO demos work breakdown for Project Stratos
  2021-10-01 23:58           ` Stefano Stabellini
@ 2021-10-02 17:55             ` Oleksandr Tyshchenko
  2021-10-04 21:53               ` Stefano Stabellini
  0 siblings, 1 reply; 16+ messages in thread
From: Oleksandr Tyshchenko @ 2021-10-02 17:55 UTC (permalink / raw)
  To: Stefano Stabellini
  Cc: xen-devel, Christopher Clark, Alex Bennée,
	Marek Marczykowski-Górecki, Artem Mygaiev, Sergio Lopez,
	Wei Liu, Stefan Hajnoczi, Rust-VMM Mailing List, Doug Goldstein,
	Andrew Cooper, Arnd Bergmann, David Woodhouse,
	Stratos Mailing List, Rich Persaud, Daniel Smith, Paul Durrant,
	openxt, Julien Grall

[-- Attachment #1: Type: text/plain, Size: 8408 bytes --]

On Sat, Oct 2, 2021 at 2:58 AM Stefano Stabellini <sstabellini@kernel.org>
wrote:

Hi Stefano, all

[Sorry for the possible format issues]
[I have CCed Julien]


On Tue, 28 Sep 2021, Oleksandr Tyshchenko wrote:
> > On Tue, Sep 28, 2021 at 9:26 AM Stefano Stabellini <
> sstabellini@kernel.org> wrote:
> >
> > Hi Stefano, all
> >
> > [Sorry for the possible format issues]
> >
> >
> >       On Mon, 27 Sep 2021, Christopher Clark wrote:
> >       > On Mon, Sep 27, 2021 at 3:06 AM Alex Bennée via Stratos-dev <
> stratos-dev@op-lists.linaro.org> wrote:
> >       >
> >       >       Marek Marczykowski-Górecki <
> marmarek@invisiblethingslab.com> writes:
> >       >
> >       >       > [[PGP Signed Part:Undecided]]
> >       >       > On Fri, Sep 24, 2021 at 05:02:46PM +0100, Alex Bennée
> wrote:
> >       >       >> Hi,
> >       >       >
> >       >       > Hi,
> >       >       >
> >       >       >> 2.1 Stable ABI for foreignmemory mapping to non-dom0
> ([STR-57])
> >       >       >>
> ───────────────────────────────────────────────────────────────
> >       >       >>
> >       >       >>   Currently the foreign memory mapping support only
> works for dom0 due
> >       >       >>   to reference counting issues. If we are to support
> backends running in
> >       >       >>   their own domains this will need to get fixed.
> >       >       >>
> >       >       >>   Estimate: 8w
> >       >       >>
> >       >       >>
> >       >       >> [STR-57] <https://linaro.atlassian.net/browse/STR-57>
> >       >       >
> >       >       > I'm pretty sure it was discussed before, but I can't
> find relevant
> >       >       > (part of) thread right now: does your model assumes the
> backend (running
> >       >       > outside of dom0) will gain ability to map (or access in
> other way)
> >       >       > _arbitrary_ memory page of a frontend domain? Or worse:
> any domain?
> >       >
> >       >       The aim is for some DomU's to host backends for other
> DomU's instead of
> >       >       all backends being in Dom0. Those backend DomU's would
> have to be
> >       >       considered trusted because as you say the default memory
> model of VirtIO
> >       >       is to have full access to the frontend domains memory map.
> >       >
> >       >
> >       > I share Marek's concern. I believe that there are Xen-based
> systems that will want to run guests using VirtIO devices without
> >       extending
> >       > this level of trust to the backend domains.
> >
> >       >From a safety perspective, it would be challenging to deploy a
> system
> >       with privileged backends. From a safety perspective, it would be a
> lot
> >       easier if the backend were unprivileged.
> >
> >       This is one of those times where safety and security requirements
> are
> >       actually aligned.
> >
> >
> > Well, the foreign memory mapping has one advantage in the context of
> Virtio use-case
> > which is that Virtio infrastructure in Guest doesn't require any
> modifications to run on top Xen.
> > The only issue with foreign memory here is that Guest memory actually
> mapped without its agreement
> > which doesn't perfectly fit into the security model. (although there is
> one more issue with XSA-300,
> > but I think it will go away sooner or later, at least there are some
> attempts to eliminate it).
> > While the ability to map any part of Guest memory is not an issue for
> the backend running in Dom0
> > (which we usually trust), this will certainly violate Xen security model
> if we want to run it in other
> > domain, so I completely agree with the existing concern.
>
> Yep, that's what I was referring to.
>
>
> > It was discussed before [1], but I couldn't find any decisions regarding
> that. As I understand,
> > the one of the possible ideas is to have some entity in Xen (PV
> IOMMU/virtio-iommu/whatever)
> > that works in protection mode, so it denies all foreign mapping requests
> from the backend running in DomU
> > by default and only allows requests with mapping which were *implicitly*
> granted by the Guest before.
> > For example, Xen could be informed which MMIOs hold the queue PFN and
> notify registers
> > (as it traps the accesses to these registers anyway) and could
> theoretically parse the frontend request
> > and retrieve descriptors to make a decision which GFNs are actually
> *allowed*.
> >
> > I can't say for sure (sorry not familiar enough with the topic), but
> implementing the virtio-iommu device
> > in Xen we could probably avoid Guest modifications at all. Of course,
> for this to work
> > the Virtio infrastructure in Guest should use DMA API as mentioned in
> [1].
> >
> > Would the “restricted foreign mapping” solution retain the Xen security
> model and be accepted
> > by the Xen community? I wonder, has someone already looked in this
> direction, are there any
> > pitfalls here or is this even feasible?
> >
> > [1]
> https://lore.kernel.org/xen-devel/464e91ec-2b53-2338-43c7-a018087fc7f6@arm.com/
>
> The discussion that went further is actually one based on the idea that
> there is a pre-shared memory area and the frontend always passes
> addresses from it. For ease of implementation, the pre-shared area is
> the virtqueue itself so this approach has been called "fat virtqueue".
> But it requires guest modifications and it probably results in
> additional memory copies.
>

I got it. Although we would need to map that pre-shared area anyway (I
presume it could be done at once during initialization), I think it much
better than
map arbitrary pages at runtime. If there is a way for Xen to know the
pre-shared area location in advance it will be able to allow mapping this
region only and deny other attempts.




>
> I am not sure if the approach you mentioned could be implemented
> completely without frontend changes. It looks like Xen would have to
> learn how to inspect virtqueues in order to verify implicit grants
> without frontend changes.


I looked through the virtio-iommu specification and corresponding Linux
driver but I am sure I don't see all the challenges and pitfalls. Having a
limited knowledge of IOMMU infrastructure in Linux, below is just my guess,
which might be wrong.

1. I think, if we want to avoid frontend changes the backend in Xen would
need to fully conform to the specification, I am afraid that besides just
inspecting virtqueues, the backend needs to properly and completely emulate
the virtio device, handle shadow page tables, etc. Otherwise we might break
the guest. I expect a huge amount of work to implement this properly.

2. Also, if I got the things correctly, it looks like when enabling
virtio-iommu, all addresses passed in requests to the virtio devices behind
the virtio-iommu will be in guest virtual address space (IOVA). So we would
need to find a way for userspace (if the backend is IOREQ server) to
translate them to guest physical addresses (IPA) via these shadow page
tables in the backend in front of mapping them via foreign memory map
calls. So I expect Xen, toolstack and Linux privcmd driver changes and
additional complexity taking into account how the data structures could be
accessed (data structures being continuously in IOVA, could be
discontinuous in IPA, indirect table descriptors, etc).
I am wondering, would it be possible to have identity IOMMU mapping (IOVA
== GPA) at the guest side but without bypassing an IOMMU, as we need the
virtio-iommu frontend to send map/unmap requests, can we control this
behaviour somehow?
I think this would simplify things.

3. Also, we would probably want to have a single virtio-iommu device
instance per guest, so all virtio devices which belong to this guest will
share the IOMMU mapping for the optimization purposes. For this to work all
virtio devices inside a guest should be attached to the same IOMMU domain.
Probably, we could control that, but I am not 100% sure.





> With or without guest modifications, I am not
> aware of anyone doing research and development on this approach.





-- 
Regards,

Oleksandr Tyshchenko

[-- Attachment #2: Type: text/html, Size: 11189 bytes --]

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [Stratos-dev] Xen Rust VirtIO demos work breakdown for Project Stratos
  2021-10-02 17:55             ` Oleksandr Tyshchenko
@ 2021-10-04 21:53               ` Stefano Stabellini
  2021-10-06 16:43                 ` Oleksandr
  0 siblings, 1 reply; 16+ messages in thread
From: Stefano Stabellini @ 2021-10-04 21:53 UTC (permalink / raw)
  To: Oleksandr Tyshchenko
  Cc: Stefano Stabellini, xen-devel, Christopher Clark,
	Alex Bennée, Marek Marczykowski-Górecki, Artem Mygaiev,
	Sergio Lopez, Wei Liu, Stefan Hajnoczi, Rust-VMM Mailing List,
	Doug Goldstein, Andrew Cooper, Arnd Bergmann, David Woodhouse,
	Stratos Mailing List, Rich Persaud, Daniel Smith, Paul Durrant,
	openxt, Julien Grall, jgross

[-- Attachment #1: Type: text/plain, Size: 9844 bytes --]

On Sat, 2 Oct 2021, Oleksandr Tyshchenko wrote:
> On Sat, Oct 2, 2021 at 2:58 AM Stefano Stabellini <sstabellini@kernel.org> wrote:
> 
> Hi Stefano, all
> 
> [Sorry for the possible format issues]
> [I have CCed Julien]
> 
> 
>       On Tue, 28 Sep 2021, Oleksandr Tyshchenko wrote:
>       > On Tue, Sep 28, 2021 at 9:26 AM Stefano Stabellini <sstabellini@kernel.org> wrote:
>       >
>       > Hi Stefano, all
>       >
>       > [Sorry for the possible format issues]
>       >
>       >
>       >       On Mon, 27 Sep 2021, Christopher Clark wrote:
>       >       > On Mon, Sep 27, 2021 at 3:06 AM Alex Bennée via Stratos-dev <stratos-dev@op-lists.linaro.org> wrote:
>       >       >
>       >       >       Marek Marczykowski-Górecki <marmarek@invisiblethingslab.com> writes:
>       >       >
>       >       >       > [[PGP Signed Part:Undecided]]
>       >       >       > On Fri, Sep 24, 2021 at 05:02:46PM +0100, Alex Bennée wrote:
>       >       >       >> Hi,
>       >       >       >
>       >       >       > Hi,
>       >       >       >
>       >       >       >> 2.1 Stable ABI for foreignmemory mapping to non-dom0 ([STR-57])
>       >       >       >> ───────────────────────────────────────────────────────────────
>       >       >       >>
>       >       >       >>   Currently the foreign memory mapping support only works for dom0 due
>       >       >       >>   to reference counting issues. If we are to support backends running in
>       >       >       >>   their own domains this will need to get fixed.
>       >       >       >>
>       >       >       >>   Estimate: 8w
>       >       >       >>
>       >       >       >>
>       >       >       >> [STR-57] <https://linaro.atlassian.net/browse/STR-57>
>       >       >       >
>       >       >       > I'm pretty sure it was discussed before, but I can't find relevant
>       >       >       > (part of) thread right now: does your model assumes the backend (running
>       >       >       > outside of dom0) will gain ability to map (or access in other way)
>       >       >       > _arbitrary_ memory page of a frontend domain? Or worse: any domain?
>       >       >
>       >       >       The aim is for some DomU's to host backends for other DomU's instead of
>       >       >       all backends being in Dom0. Those backend DomU's would have to be
>       >       >       considered trusted because as you say the default memory model of VirtIO
>       >       >       is to have full access to the frontend domains memory map.
>       >       >
>       >       >
>       >       > I share Marek's concern. I believe that there are Xen-based systems that will want to run guests using VirtIO devices
>       without
>       >       extending
>       >       > this level of trust to the backend domains.
>       >
>       >       >From a safety perspective, it would be challenging to deploy a system
>       >       with privileged backends. From a safety perspective, it would be a lot
>       >       easier if the backend were unprivileged.
>       >
>       >       This is one of those times where safety and security requirements are
>       >       actually aligned.
>       >
>       >
>       > Well, the foreign memory mapping has one advantage in the context of Virtio use-case
>       > which is that Virtio infrastructure in Guest doesn't require any modifications to run on top Xen.
>       > The only issue with foreign memory here is that Guest memory actually mapped without its agreement
>       > which doesn't perfectly fit into the security model. (although there is one more issue with XSA-300,
>       > but I think it will go away sooner or later, at least there are some attempts to eliminate it).
>       > While the ability to map any part of Guest memory is not an issue for the backend running in Dom0
>       > (which we usually trust), this will certainly violate Xen security model if we want to run it in other
>       > domain, so I completely agree with the existing concern.
> 
>       Yep, that's what I was referring to.
> 
> 
>       > It was discussed before [1], but I couldn't find any decisions regarding that. As I understand,
>       > the one of the possible ideas is to have some entity in Xen (PV IOMMU/virtio-iommu/whatever)
>       > that works in protection mode, so it denies all foreign mapping requests from the backend running in DomU
>       > by default and only allows requests with mapping which were *implicitly* granted by the Guest before.
>       > For example, Xen could be informed which MMIOs hold the queue PFN and notify registers
>       > (as it traps the accesses to these registers anyway) and could theoretically parse the frontend request
>       > and retrieve descriptors to make a decision which GFNs are actually *allowed*.
>       >
>       > I can't say for sure (sorry not familiar enough with the topic), but implementing the virtio-iommu device
>       > in Xen we could probably avoid Guest modifications at all. Of course, for this to work
>       > the Virtio infrastructure in Guest should use DMA API as mentioned in [1].
>       >
>       > Would the “restricted foreign mapping” solution retain the Xen security model and be accepted
>       > by the Xen community? I wonder, has someone already looked in this direction, are there any
>       > pitfalls here or is this even feasible?
>       >
>       > [1] https://lore.kernel.org/xen-devel/464e91ec-2b53-2338-43c7-a018087fc7f6@arm.com/
> 
>       The discussion that went further is actually one based on the idea that
>       there is a pre-shared memory area and the frontend always passes
>       addresses from it. For ease of implementation, the pre-shared area is
>       the virtqueue itself so this approach has been called "fat virtqueue".
>       But it requires guest modifications and it probably results in
>       additional memory copies.
> 
>  
> I got it. Although we would need to map that pre-shared area anyway (I presume it could be done at once during initialization), I think it
> much better than
> map arbitrary pages at runtime.

Yeah that's the idea


> If there is a way for Xen to know the pre-shared area location in advance it will be able to allow mapping
> this region only and deny other attempts.
 
No, but there are patches (not yet upstream) to introduce a way to
pre-share memory regions between VMs using xl:
https://github.com/Xilinx/xen/commits/xilinx/release-2021.1?after=4bd2da58b5b008f77429007a307b658db9c0f636+104&branch=xilinx%2Frelease-2021.1

So I think it would probably be the other way around: xen/libxl
advertises on device tree (or ACPI) the presence of the pre-shared
regions to both domains. Then frontend and backend would start using it.

 
>       I am not sure if the approach you mentioned could be implemented
>       completely without frontend changes. It looks like Xen would have to
>       learn how to inspect virtqueues in order to verify implicit grants
>       without frontend changes.
> 
>  
> I looked through the virtio-iommu specification and corresponding Linux driver but I am sure I don't see all the challenges and pitfalls.
> Having a limited knowledge of IOMMU infrastructure in Linux, below is just my guess, which might be wrong.
> 
> 1. I think, if we want to avoid frontend changes the backend in Xen would need to fully conform to the specification, I am afraid that
> besides just inspecting virtqueues, the backend needs to properly and completely emulate the virtio device, handle shadow page tables, etc.
> Otherwise we might break the guest. I expect a huge amount of work to implement this properly.

Yeah, I think we would want to stay away from shadow pagetables unless
we are really forced to go there.


> 2. Also, if I got the things correctly, it looks like when enabling virtio-iommu, all addresses passed in requests to the virtio devices
> behind the virtio-iommu will be in guest virtual address space (IOVA). So we would need to find a way for userspace (if the backend is
> IOREQ server) to translate them to guest physical addresses (IPA) via these shadow page tables in the backend in front of mapping them via
> foreign memory map calls. So I expect Xen, toolstack and Linux privcmd driver changes and additional complexity taking into account how the
> data structures could be accessed (data structures being continuously in IOVA, could be discontinuous in IPA, indirect table descriptors,
> etc). 
> I am wondering, would it be possible to have identity IOMMU mapping (IOVA == GPA) at the guest side but without bypassing an IOMMU, as we
> need the virtio-iommu frontend to send map/unmap requests, can we control this behaviour somehow?
> I think this would simplify things.

None of the above looks easy. I think you are right that we would need
IOVA == GPA to make the implementation feasible and with decent
performance. But if we need a spec change, then I think Juergen's
proposal of introducing a new transport that uses grant table references
instead of GPAs is worth considering.


> 3. Also, we would probably want to have a single virtio-iommu device instance per guest, so all virtio devices which belong to this guest
> will share the IOMMU mapping for the optimization purposes. For this to work all virtio devices inside a guest should be attached to the
> same IOMMU domain. Probably, we could control that, but I am not 100% sure.  

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [Stratos-dev] Xen Rust VirtIO demos work breakdown for Project Stratos
  2021-10-04 21:53               ` Stefano Stabellini
@ 2021-10-06 16:43                 ` Oleksandr
  2022-04-14 20:03                   ` Oleksandr Tyshchenko
  0 siblings, 1 reply; 16+ messages in thread
From: Oleksandr @ 2021-10-06 16:43 UTC (permalink / raw)
  To: Stefano Stabellini
  Cc: xen-devel, Christopher Clark, Alex Bennée,
	Marek Marczykowski-Górecki, Artem Mygaiev, Sergio Lopez,
	Wei Liu, Stefan Hajnoczi, Rust-VMM Mailing List, Doug Goldstein,
	Andrew Cooper, Arnd Bergmann, David Woodhouse,
	Stratos Mailing List, Rich Persaud, Daniel Smith, Paul Durrant,
	openxt, Julien Grall, jgross


On 05.10.21 00:53, Stefano Stabellini wrote:

Hi Stefano, all

> On Sat, 2 Oct 2021, Oleksandr Tyshchenko wrote:
>> On Sat, Oct 2, 2021 at 2:58 AM Stefano Stabellini <sstabellini@kernel.org> wrote:
>>
>> Hi Stefano, all
>>
>> [Sorry for the possible format issues]
>> [I have CCed Julien]
>>
>>
>>        On Tue, 28 Sep 2021, Oleksandr Tyshchenko wrote:
>>        > On Tue, Sep 28, 2021 at 9:26 AM Stefano Stabellini <sstabellini@kernel.org> wrote:
>>        >
>>        > Hi Stefano, all
>>        >
>>        > [Sorry for the possible format issues]
>>        >
>>        >
>>        >       On Mon, 27 Sep 2021, Christopher Clark wrote:
>>        >       > On Mon, Sep 27, 2021 at 3:06 AM Alex Bennée via Stratos-dev <stratos-dev@op-lists.linaro.org> wrote:
>>        >       >
>>        >       >       Marek Marczykowski-Górecki <marmarek@invisiblethingslab.com> writes:
>>        >       >
>>        >       >       > [[PGP Signed Part:Undecided]]
>>        >       >       > On Fri, Sep 24, 2021 at 05:02:46PM +0100, Alex Bennée wrote:
>>        >       >       >> Hi,
>>        >       >       >
>>        >       >       > Hi,
>>        >       >       >
>>        >       >       >> 2.1 Stable ABI for foreignmemory mapping to non-dom0 ([STR-57])
>>        >       >       >> ───────────────────────────────────────────────────────────────
>>        >       >       >>
>>        >       >       >>   Currently the foreign memory mapping support only works for dom0 due
>>        >       >       >>   to reference counting issues. If we are to support backends running in
>>        >       >       >>   their own domains this will need to get fixed.
>>        >       >       >>
>>        >       >       >>   Estimate: 8w
>>        >       >       >>
>>        >       >       >>
>>        >       >       >> [STR-57] <https://linaro.atlassian.net/browse/STR-57>
>>        >       >       >
>>        >       >       > I'm pretty sure it was discussed before, but I can't find relevant
>>        >       >       > (part of) thread right now: does your model assumes the backend (running
>>        >       >       > outside of dom0) will gain ability to map (or access in other way)
>>        >       >       > _arbitrary_ memory page of a frontend domain? Or worse: any domain?
>>        >       >
>>        >       >       The aim is for some DomU's to host backends for other DomU's instead of
>>        >       >       all backends being in Dom0. Those backend DomU's would have to be
>>        >       >       considered trusted because as you say the default memory model of VirtIO
>>        >       >       is to have full access to the frontend domains memory map.
>>        >       >
>>        >       >
>>        >       > I share Marek's concern. I believe that there are Xen-based systems that will want to run guests using VirtIO devices
>>        without
>>        >       extending
>>        >       > this level of trust to the backend domains.
>>        >
>>        >       >From a safety perspective, it would be challenging to deploy a system
>>        >       with privileged backends. From a safety perspective, it would be a lot
>>        >       easier if the backend were unprivileged.
>>        >
>>        >       This is one of those times where safety and security requirements are
>>        >       actually aligned.
>>        >
>>        >
>>        > Well, the foreign memory mapping has one advantage in the context of Virtio use-case
>>        > which is that Virtio infrastructure in Guest doesn't require any modifications to run on top Xen.
>>        > The only issue with foreign memory here is that Guest memory actually mapped without its agreement
>>        > which doesn't perfectly fit into the security model. (although there is one more issue with XSA-300,
>>        > but I think it will go away sooner or later, at least there are some attempts to eliminate it).
>>        > While the ability to map any part of Guest memory is not an issue for the backend running in Dom0
>>        > (which we usually trust), this will certainly violate Xen security model if we want to run it in other
>>        > domain, so I completely agree with the existing concern.
>>
>>        Yep, that's what I was referring to.
>>
>>
>>        > It was discussed before [1], but I couldn't find any decisions regarding that. As I understand,
>>        > the one of the possible ideas is to have some entity in Xen (PV IOMMU/virtio-iommu/whatever)
>>        > that works in protection mode, so it denies all foreign mapping requests from the backend running in DomU
>>        > by default and only allows requests with mapping which were *implicitly* granted by the Guest before.
>>        > For example, Xen could be informed which MMIOs hold the queue PFN and notify registers
>>        > (as it traps the accesses to these registers anyway) and could theoretically parse the frontend request
>>        > and retrieve descriptors to make a decision which GFNs are actually *allowed*.
>>        >
>>        > I can't say for sure (sorry not familiar enough with the topic), but implementing the virtio-iommu device
>>        > in Xen we could probably avoid Guest modifications at all. Of course, for this to work
>>        > the Virtio infrastructure in Guest should use DMA API as mentioned in [1].
>>        >
>>        > Would the “restricted foreign mapping” solution retain the Xen security model and be accepted
>>        > by the Xen community? I wonder, has someone already looked in this direction, are there any
>>        > pitfalls here or is this even feasible?
>>        >
>>        > [1] https://lore.kernel.org/xen-devel/464e91ec-2b53-2338-43c7-a018087fc7f6@arm.com/
>>
>>        The discussion that went further is actually one based on the idea that
>>        there is a pre-shared memory area and the frontend always passes
>>        addresses from it. For ease of implementation, the pre-shared area is
>>        the virtqueue itself so this approach has been called "fat virtqueue".
>>        But it requires guest modifications and it probably results in
>>        additional memory copies.
>>
>>   
>> I got it. Although we would need to map that pre-shared area anyway (I presume it could be done at once during initialization), I think it
>> much better than
>> map arbitrary pages at runtime.
> Yeah that's the idea
>
>
>> If there is a way for Xen to know the pre-shared area location in advance it will be able to allow mapping
>> this region only and deny other attempts.
>   
> No, but there are patches (not yet upstream) to introduce a way to
> pre-share memory regions between VMs using xl:
> https://github.com/Xilinx/xen/commits/xilinx/release-2021.1?after=4bd2da58b5b008f77429007a307b658db9c0f636+104&branch=xilinx%2Frelease-2021.1
>
> So I think it would probably be the other way around: xen/libxl
> advertises on device tree (or ACPI) the presence of the pre-shared
> regions to both domains. Then frontend and backend would start using it.

Thank you for the explanation. I remember this series has already 
appeared in ML. If I got the idea correctly this way we won't need to 
map the foreign memory from the backend at all (I assume this eliminates 
security concern?). It looks like the every pre-shared region (described 
in config file) is mapped by the toolstack at the domains creation time 
and the details of this region are also written to the Xenstore. All 
what backend needs to do is to map the region into its address space 
(via mmap). For this to work the guest should allocate virtqueue from 
Xen specific reserved memory [1].

[1] 
https://www.kernel.org/doc/Documentation/devicetree/bindings/reserved-memory/xen%2Cshared-memory.txt


>   
>>        I am not sure if the approach you mentioned could be implemented
>>        completely without frontend changes. It looks like Xen would have to
>>        learn how to inspect virtqueues in order to verify implicit grants
>>        without frontend changes.
>>
>>   
>> I looked through the virtio-iommu specification and corresponding Linux driver but I am sure I don't see all the challenges and pitfalls.
>> Having a limited knowledge of IOMMU infrastructure in Linux, below is just my guess, which might be wrong.
>>
>> 1. I think, if we want to avoid frontend changes the backend in Xen would need to fully conform to the specification, I am afraid that
>> besides just inspecting virtqueues, the backend needs to properly and completely emulate the virtio device, handle shadow page tables, etc.
>> Otherwise we might break the guest. I expect a huge amount of work to implement this properly.
> Yeah, I think we would want to stay away from shadow pagetables unless
> we are really forced to go there.
>
>
>> 2. Also, if I got the things correctly, it looks like when enabling virtio-iommu, all addresses passed in requests to the virtio devices
>> behind the virtio-iommu will be in guest virtual address space (IOVA). So we would need to find a way for userspace (if the backend is
>> IOREQ server) to translate them to guest physical addresses (IPA) via these shadow page tables in the backend in front of mapping them via
>> foreign memory map calls. So I expect Xen, toolstack and Linux privcmd driver changes and additional complexity taking into account how the
>> data structures could be accessed (data structures being continuously in IOVA, could be discontinuous in IPA, indirect table descriptors,
>> etc).
>> I am wondering, would it be possible to have identity IOMMU mapping (IOVA == GPA) at the guest side but without bypassing an IOMMU, as we
>> need the virtio-iommu frontend to send map/unmap requests, can we control this behaviour somehow?
>> I think this would simplify things.
> None of the above looks easy. I think you are right that we would need
> IOVA == GPA to make the implementation feasible and with decent
> performance.

Yes. Otherwise, I am afraid, the implementation is going to be quite 
difficult with questionable performance at the end.

I found out that IOMMU domain in Linux can be identity mapped 
(IOMMU_DOMAIN_IDENTITY - DMA addresses are system physical addresses) 
and this can be controlled via cmd line.
I admit I didn't test, but from the IOMMU framework code it looks like 
that driver's map/unmap callback won't be called in this mode and as the 
result the IOMMU mapping never reaches the backend. Unfortunately, this 
is not what we want as we won't have any understating what the GFNs are...

> But if we need a spec change, then I think Juergen's
> proposal of introducing a new transport that uses grant table references
> instead of GPAs is worth considering.

Agree, if we the spec changes cannot be avoided then yes.


>
>
>> 3. Also, we would probably want to have a single virtio-iommu device instance per guest, so all virtio devices which belong to this guest
>> will share the IOMMU mapping for the optimization purposes. For this to work all virtio devices inside a guest should be attached to the
>> same IOMMU domain. Probably, we could control that, but I am not 100% sure.

-- 
Regards,

Oleksandr Tyshchenko



^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [Stratos-dev] Xen Rust VirtIO demos work breakdown for Project Stratos
  2021-10-06 16:43                 ` Oleksandr
@ 2022-04-14 20:03                   ` Oleksandr Tyshchenko
  2022-04-15  9:07                     ` Alex Bennée
  0 siblings, 1 reply; 16+ messages in thread
From: Oleksandr Tyshchenko @ 2022-04-14 20:03 UTC (permalink / raw)
  To: xen-devel, Rust-VMM Mailing List, Stratos Mailing List
  Cc: Christopher Clark, Alex Bennée,
	Marek Marczykowski-Górecki, Artem Mygaiev, Sergio Lopez,
	Wei Liu, Stefan Hajnoczi, Doug Goldstein, Andrew Cooper,
	Arnd Bergmann, David Woodhouse, Rich Persaud, Daniel Smith,
	Paul Durrant, openxt, Julien Grall, Juergen Gross,
	Stefano Stabellini

[-- Attachment #1: Type: text/plain, Size: 666 bytes --]

Hello all.

[Sorry for the possible format issues]

I have an update regarding (valid) concern which has been also raised in
current thread which is the virtio backend's ability (when using Xen
foreign mapping) to map any guest pages without guest "agreement" on that.
There is a PoC (with virtio-mmio on Arm) which is based on Juergen Gross’
work to reuse secure Xen grant mapping for the virtio communications.
All details are at:
https://lore.kernel.org/xen-devel/1649963973-22879-1-git-send-email-olekstysh@gmail.com/
https://lore.kernel.org/xen-devel/1649964960-24864-1-git-send-email-olekstysh@gmail.com/

-- 
Regards,

Oleksandr Tyshchenko

[-- Attachment #2: Type: text/html, Size: 1671 bytes --]

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [Stratos-dev] Xen Rust VirtIO demos work breakdown for Project Stratos
  2022-04-14 20:03                   ` Oleksandr Tyshchenko
@ 2022-04-15  9:07                     ` Alex Bennée
  2022-04-15 11:06                       ` Oleksandr
  0 siblings, 1 reply; 16+ messages in thread
From: Alex Bennée @ 2022-04-15  9:07 UTC (permalink / raw)
  To: Oleksandr Tyshchenko
  Cc: xen-devel, Rust-VMM Mailing List, Stratos Mailing List,
	Christopher Clark, Marek Marczykowski-Górecki,
	Artem Mygaiev, Sergio Lopez, Wei Liu, Stefan Hajnoczi,
	Doug Goldstein, Andrew Cooper, Arnd Bergmann, David Woodhouse,
	Rich Persaud, Daniel Smith, Paul Durrant, openxt, Julien Grall,
	Juergen Gross, Stefano Stabellini


Oleksandr Tyshchenko <olekstysh@gmail.com> writes:

> Hello all.
>
> [Sorry for the possible format issues]
>
> I have an update regarding (valid) concern which has been also raised in current thread which is the virtio backend's ability (when using
> Xen foreign mapping) to map any guest pages without guest "agreement" on that.
> There is a PoC (with virtio-mmio on Arm) which is based on Juergen Gross’  work to reuse secure Xen grant mapping for the virtio
> communications.
> All details are at:
> https://lore.kernel.org/xen-devel/1649963973-22879-1-git-send-email-olekstysh@gmail.com/
> https://lore.kernel.org/xen-devel/1649964960-24864-1-git-send-email-olekstysh@gmail.com/

Thanks for that. I shall try and find some time to have a look at it.

Did you see Viresh's post about getting our rust-vmm vhost-user backends
working on Xen?

One thing that came up during that work was how guest pages are mapped
into the dom0 domain where Xen needs to use kernel allocated pages via
privcmd rather than then normal shared mmap that is used on KVM. As I
understand it this is to avoid the situation where dom0 may invalidate a
user PTE causing issues for the hypervisor itself. At some point we
would like to fix that wrinkle so we can remove the (minor) hack in
rust-vmm's mmap code to be truly hypervisor agnostic. 

Anyway I hope you and your team are safe and well.

-- 
Alex Bennée


^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [Stratos-dev] Xen Rust VirtIO demos work breakdown for Project Stratos
  2022-04-15  9:07                     ` Alex Bennée
@ 2022-04-15 11:06                       ` Oleksandr
  0 siblings, 0 replies; 16+ messages in thread
From: Oleksandr @ 2022-04-15 11:06 UTC (permalink / raw)
  To: Alex Bennée
  Cc: xen-devel, Rust-VMM Mailing List, Stratos Mailing List,
	Christopher Clark, Marek Marczykowski-Górecki,
	Artem Mygaiev, Sergio Lopez, Wei Liu, Stefan Hajnoczi,
	Doug Goldstein, Andrew Cooper, Arnd Bergmann, David Woodhouse,
	Rich Persaud, Daniel Smith, Paul Durrant, openxt, Julien Grall,
	Juergen Gross, Stefano Stabellini


On 15.04.22 12:07, Alex Bennée wrote:


Hello Alex

> Oleksandr Tyshchenko <olekstysh@gmail.com> writes:
>
>> Hello all.
>>
>> [Sorry for the possible format issues]
>>
>> I have an update regarding (valid) concern which has been also raised in current thread which is the virtio backend's ability (when using
>> Xen foreign mapping) to map any guest pages without guest "agreement" on that.
>> There is a PoC (with virtio-mmio on Arm) which is based on Juergen Gross’  work to reuse secure Xen grant mapping for the virtio
>> communications.
>> All details are at:
>> https://lore.kernel.org/xen-devel/1649963973-22879-1-git-send-email-olekstysh@gmail.com/
>> https://lore.kernel.org/xen-devel/1649964960-24864-1-git-send-email-olekstysh@gmail.com/
> Thanks for that. I shall try and find some time to have a look at it.
>
> Did you see Viresh's post about getting our rust-vmm vhost-user backends
> working on Xen?

Great work! I see the email in my mailbox, but didn't analyze it yet. I 
will definitely take a look at it.


>
> One thing that came up during that work was how guest pages are mapped
> into the dom0 domain where Xen needs to use kernel allocated pages via
> privcmd rather than then normal shared mmap that is used on KVM. As I
> understand it this is to avoid the situation where dom0 may invalidate a
> user PTE causing issues for the hypervisor itself. At some point we
> would like to fix that wrinkle so we can remove the (minor) hack in
> rust-vmm's mmap code to be truly hypervisor agnostic.
>
> Anyway I hope you and your team are safe and well.

Thank you!


>
-- 
Regards,

Oleksandr Tyshchenko



^ permalink raw reply	[flat|nested] 16+ messages in thread

end of thread, other threads:[~2022-04-15 11:07 UTC | newest]

Thread overview: 16+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-09-24 16:02 Xen Rust VirtIO demos work breakdown for Project Stratos Alex Bennée
2021-09-24 23:59 ` Marek Marczykowski-Górecki
2021-09-27  9:50   ` Alex Bennée
2021-09-28  5:55     ` [Stratos-dev] " Christopher Clark
2021-09-28  6:26       ` Stefano Stabellini
2021-09-28 20:18         ` Oleksandr Tyshchenko
2021-10-01 23:58           ` Stefano Stabellini
2021-10-02 17:55             ` Oleksandr Tyshchenko
2021-10-04 21:53               ` Stefano Stabellini
2021-10-06 16:43                 ` Oleksandr
2022-04-14 20:03                   ` Oleksandr Tyshchenko
2022-04-15  9:07                     ` Alex Bennée
2022-04-15 11:06                       ` Oleksandr
2021-09-28  6:30       ` Stefan Hajnoczi
2021-09-27 17:25 ` Oleksandr
2021-09-28 11:37 ` Andrew Cooper

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.