* [PATCH V3 00/23] IOREQ feature (+ virtio-mmio) on Arm
@ 2020-11-30 10:31 Oleksandr Tyshchenko
From: Oleksandr Tyshchenko @ 2020-11-30 10:31 UTC (permalink / raw)
  To: xen-devel
  Cc: Oleksandr Tyshchenko, Paul Durrant, Jan Beulich, Andrew Cooper,
	Roger Pau Monné,
	Wei Liu, Julien Grall, George Dunlap, Ian Jackson, Julien Grall,
	Stefano Stabellini, Tim Deegan, Daniel De Graaf,
	Volodymyr Babchuk, Jun Nakajima, Kevin Tian, Anthony PERARD,
	Bertrand Marquis, Wei Chen, Kaly Xin, Artem Mygaiev,
	Alex Bennée

From: Oleksandr Tyshchenko <oleksandr_tyshchenko@epam.com>


Date: Sat, 28 Nov 2020 22:33:51 +0200
Subject: [PATCH V3 00/23] IOREQ feature (+ virtio-mmio) on Arm
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Hello all.

The purpose of this patch series is to add IOREQ/DM support to Xen on Arm.
You can find an initial discussion at [1] and RFC/V1/V2 series at [2]/[3]/[4].
Xen on Arm requires a mechanism to forward guest MMIO accesses to a device
model in order to implement a virtio-mmio backend or even a mediator outside of the hypervisor.
As Xen on x86 already contains the required support, this series tries to make it common
and introduces the Arm-specific bits plus some new functionality. The patch series is based on
Julien's PoC "xen/arm: Add support for Guest IO forwarding to a device emulator".
Besides splitting the existing IOREQ/DM support and introducing the Arm side, the series
also includes virtio-mmio related changes (the last 2 patches, for the toolstack)
so that reviewers can see what the whole picture could look like.
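
To illustrate the intended usage (this is not part of the series' code), below is a
minimal sketch of what such an out-of-hypervisor device model does with the existing
libxendevicemodel API: create an IOREQ server for the guest and ask Xen to forward a
guest MMIO window to it. The MMIO range and the lack of error handling are illustrative
only; see the virtio-disk backend [6] for a real implementation.

#include <xendevicemodel.h>

/*
 * Sketch only: register an IOREQ server for 'domid' and ask Xen to
 * forward guest accesses to an example MMIO window to it. The window
 * below is an arbitrary example address, not mandated by this series.
 */
static int setup_ioreq_server(xendevicemodel_handle *xdm, domid_t domid,
                              ioservid_t *id)
{
    int rc = xendevicemodel_create_ioreq_server(xdm, domid,
                                                0 /* no buffered ioreqs */,
                                                id);
    if ( rc )
        return rc;

    rc = xendevicemodel_map_io_range_to_ioreq_server(xdm, domid, *id,
                                                     1 /* MMIO */,
                                                     0x02000000, 0x020001ff);
    if ( rc )
        return rc;

    /* From now on, guest accesses to that range show up as ioreqs. */
    return xendevicemodel_set_ioreq_server_state(xdm, domid, *id, 1);
}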

According to the initial discussion, there are a few open questions/concerns
regarding security and performance in the VirtIO solution:
1. virtio-mmio vs virtio-pci, SPI vs MSI; different use-cases require different
   transports...
2. the virtio backend is able to access all guest memory, so some kind of protection
   is needed: 'virtio-iommu in Xen' vs 'pre-shared-memory & memcpys in guest'
3. interface between the toolstack and the 'out-of-qemu' virtio backend; avoid using
   Xenstore in the virtio backend if possible.
4. a lot of 'foreign mapping' could lead to memory exhaustion; Julien
   has some ideas regarding that.

All of them look valid and worth considering, but the first thing
we need on Arm is a mechanism to forward guest I/O to a device emulator,
so let's focus on that in the first place.

***

There are a lot of changes since the RFC series: almost all TODOs were resolved on Arm,
the Arm code was improved and hardened, the common IOREQ/DM code became really arch-agnostic
(without HVM-isms), the "legacy" mechanism of mapping magic pages for the IOREQ servers
was left x86-specific, etc. But one TODO still remains, which is "PIO handling" on Arm.
The "PIO handling" TODO is expected to be left unaddressed in the current series.
It is not a big issue for now while Xen doesn't have support for vPCI on Arm.
On Arm64, PIO accesses are only used for PCI I/O BARs, and we would probably want to expose
them to the emulator as PIO accesses to make a DM completely arch-agnostic. So "PIO handling"
should be implemented when we add support for vPCI.

I left the interface untouched in the following patch,
"xen/dm: Introduce xendevicemodel_set_irq_level DM op",
since there is still an open discussion about what interface to use / what
information to pass to the hypervisor.

This series depends on the following patch, which is still under review:
https://patchwork.kernel.org/patch/11816689

Please note that the IOREQ feature is disabled by default on Arm within the current series.

***

Patch series [5] was rebased on the recent "staging" branch
(181f2c2 evtchn: double per-channel locking can't hit identical channels) and tested on
a Renesas Salvator-X board + H3 ES3.0 SoC (Arm64) with the virtio-mmio disk backend [6]
running in a driver domain and an unmodified Linux guest running on the existing
virtio-blk driver (frontend). No issues were observed. Guest domain 'reboot/destroy'
use-cases work properly. The patch series was only build-tested on x86.

Please note that the build test passed for the following modes:
1. x86: CONFIG_HVM=y / CONFIG_IOREQ_SERVER=y (default)
2. x86: #CONFIG_HVM is not set / #CONFIG_IOREQ_SERVER is not set
3. Arm64: CONFIG_HVM=y / CONFIG_IOREQ_SERVER=y
4. Arm64: CONFIG_HVM=y / #CONFIG_IOREQ_SERVER is not set  (default)
5. Arm32: CONFIG_HVM=y / CONFIG_IOREQ_SERVER=y
6. Arm32: CONFIG_HVM=y / #CONFIG_IOREQ_SERVER is not set  (default)

***

Any feedback/help would be highly appreciated.

[1] https://lists.xenproject.org/archives/html/xen-devel/2020-07/msg00825.html
[2] https://lists.xenproject.org/archives/html/xen-devel/2020-08/msg00071.html
[3] https://lists.xenproject.org/archives/html/xen-devel/2020-09/msg00732.html
[4] https://lists.xenproject.org/archives/html/xen-devel/2020-10/msg01077.html
[5] https://github.com/otyshchenko1/xen/commits/ioreq_4.14_ml4
[6] https://github.com/xen-troops/virtio-disk/commits/ioreq_ml1

Julien Grall (5):
  xen/dm: Make x86's DM feature common
  xen/mm: Make x86's XENMEM_resource_ioreq_server handling common
  arm/ioreq: Introduce arch specific bits for IOREQ/DM features
  xen/dm: Introduce xendevicemodel_set_irq_level DM op
  libxl: Introduce basic virtio-mmio support on Arm

Oleksandr Tyshchenko (18):
  x86/ioreq: Prepare IOREQ feature for making it common
  x86/ioreq: Add IOREQ_STATUS_* #define-s and update code for moving
  x86/ioreq: Provide out-of-line wrapper for the handle_mmio()
  xen/ioreq: Make x86's IOREQ feature common
  xen/ioreq: Make x86's hvm_ioreq_needs_completion() common
  xen/ioreq: Make x86's hvm_mmio_first(last)_byte() common
  xen/ioreq: Make x86's hvm_ioreq_(page/vcpu/server) structs common
  xen/ioreq: Move x86's ioreq_server to struct domain
  xen/ioreq: Move x86's io_completion/io_req fields to struct vcpu
  xen/ioreq: Remove "hvm" prefixes from involved function names
  xen/ioreq: Use guest_cmpxchg64() instead of cmpxchg()
  xen/arm: Stick around in leave_hypervisor_to_guest until I/O has
    completed
  xen/mm: Handle properly reference in set_foreign_p2m_entry() on Arm
  xen/ioreq: Introduce domain_has_ioreq_server()
  xen/arm: io: Abstract sign-extension
  xen/ioreq: Make x86's send_invalidate_req() common
  xen/arm: Add mapcache invalidation handling
  [RFC] libxl: Add support for virtio-disk configuration

 MAINTAINERS                                  |    8 +-
 tools/include/xendevicemodel.h               |    4 +
 tools/libs/devicemodel/core.c                |   18 +
 tools/libs/devicemodel/libxendevicemodel.map |    1 +
 tools/libs/light/Makefile                    |    1 +
 tools/libs/light/libxl_arm.c                 |   94 +-
 tools/libs/light/libxl_create.c              |    1 +
 tools/libs/light/libxl_internal.h            |    1 +
 tools/libs/light/libxl_types.idl             |   16 +
 tools/libs/light/libxl_types_internal.idl    |    1 +
 tools/libs/light/libxl_virtio_disk.c         |  109 +++
 tools/xl/Makefile                            |    2 +-
 tools/xl/xl.h                                |    3 +
 tools/xl/xl_cmdtable.c                       |   15 +
 tools/xl/xl_parse.c                          |  116 +++
 tools/xl/xl_virtio_disk.c                    |   46 +
 xen/arch/arm/Makefile                        |    2 +
 xen/arch/arm/dm.c                            |   89 ++
 xen/arch/arm/domain.c                        |    9 +
 xen/arch/arm/hvm.c                           |    4 +
 xen/arch/arm/io.c                            |   29 +-
 xen/arch/arm/ioreq.c                         |  126 +++
 xen/arch/arm/p2m.c                           |   48 +-
 xen/arch/arm/traps.c                         |   58 +-
 xen/arch/x86/Kconfig                         |    1 +
 xen/arch/x86/hvm/dm.c                        |  295 +-----
 xen/arch/x86/hvm/emulate.c                   |   80 +-
 xen/arch/x86/hvm/hvm.c                       |   12 +-
 xen/arch/x86/hvm/hypercall.c                 |    9 +-
 xen/arch/x86/hvm/intercept.c                 |    5 +-
 xen/arch/x86/hvm/io.c                        |   26 +-
 xen/arch/x86/hvm/ioreq.c                     | 1357 ++------------------------
 xen/arch/x86/hvm/stdvga.c                    |   10 +-
 xen/arch/x86/hvm/svm/nestedsvm.c             |    2 +-
 xen/arch/x86/hvm/vmx/realmode.c              |    6 +-
 xen/arch/x86/hvm/vmx/vvmx.c                  |    2 +-
 xen/arch/x86/mm.c                            |   46 +-
 xen/arch/x86/mm/p2m.c                        |   13 +-
 xen/arch/x86/mm/shadow/common.c              |    2 +-
 xen/common/Kconfig                           |    3 +
 xen/common/Makefile                          |    2 +
 xen/common/dm.c                              |  292 ++++++
 xen/common/ioreq.c                           | 1307 +++++++++++++++++++++++++
 xen/common/memory.c                          |   73 +-
 xen/include/asm-arm/domain.h                 |    3 +
 xen/include/asm-arm/hvm/ioreq.h              |  139 +++
 xen/include/asm-arm/mm.h                     |    8 -
 xen/include/asm-arm/mmio.h                   |    1 +
 xen/include/asm-arm/p2m.h                    |   19 +-
 xen/include/asm-arm/traps.h                  |   24 +
 xen/include/asm-x86/hvm/domain.h             |   43 -
 xen/include/asm-x86/hvm/emulate.h            |    2 +-
 xen/include/asm-x86/hvm/io.h                 |   17 -
 xen/include/asm-x86/hvm/ioreq.h              |   58 +-
 xen/include/asm-x86/hvm/vcpu.h               |   18 -
 xen/include/asm-x86/mm.h                     |    4 -
 xen/include/asm-x86/p2m.h                    |   24 +-
 xen/include/public/arch-arm.h                |    5 +
 xen/include/public/hvm/dm_op.h               |   16 +
 xen/include/xen/dm.h                         |   44 +
 xen/include/xen/ioreq.h                      |  146 +++
 xen/include/xen/p2m-common.h                 |    4 +
 xen/include/xen/sched.h                      |   32 +
 xen/include/xsm/dummy.h                      |    4 +-
 xen/include/xsm/xsm.h                        |    6 +-
 xen/xsm/dummy.c                              |    2 +-
 xen/xsm/flask/hooks.c                        |    5 +-
 67 files changed, 3084 insertions(+), 1884 deletions(-)
 create mode 100644 tools/libs/light/libxl_virtio_disk.c
 create mode 100644 tools/xl/xl_virtio_disk.c
 create mode 100644 xen/arch/arm/dm.c
 create mode 100644 xen/arch/arm/ioreq.c
 create mode 100644 xen/common/dm.c
 create mode 100644 xen/common/ioreq.c
 create mode 100644 xen/include/asm-arm/hvm/ioreq.h
 create mode 100644 xen/include/xen/dm.h
 create mode 100644 xen/include/xen/ioreq.h

-- 
2.7.4




* [PATCH V3 01/23] x86/ioreq: Prepare IOREQ feature for making it common
@ 2020-11-30 10:31 ` Oleksandr Tyshchenko
From: Oleksandr Tyshchenko @ 2020-11-30 10:31 UTC (permalink / raw)
  To: xen-devel
  Cc: Oleksandr Tyshchenko, Paul Durrant, Jan Beulich, Andrew Cooper,
	Roger Pau Monné,
	Wei Liu, Julien Grall, Stefano Stabellini, Julien Grall

From: Oleksandr Tyshchenko <oleksandr_tyshchenko@epam.com>

As a lot of x86 code can be re-used on Arm later on, this
patch makes some preparation to x86/hvm/ioreq.c before moving
it to the common code. This way we will get a verbatim copy
for the code movement in a subsequent patch.

This patch mostly introduces specific hooks to abstract arch-specific
material, taking into account the requirement to leave
the "legacy" mechanism of mapping magic pages for the IOREQ servers
x86-specific and not expose it to the common code.

These hooks are named according to the more consistent new naming
scheme right away (including dropping the "hvm" prefixes and infixes):
- IOREQ server functions should start with "ioreq_server_"
- IOREQ functions should start with "ioreq_"
Other functions will be renamed in subsequent patches.

It is worth mentioning that the code which checks the return value of
p2m_set_ioreq_server() in hvm_map_mem_type_to_ioreq_server() was
folded into arch_ioreq_server_map_mem_type() for a clean split.
As a result, p2m_change_entry_type_global() is now called with the
ioreq_server lock held.

Also re-order #include-s alphabetically.

This support is going to be used on Arm to be able to run a device
emulator outside of the Xen hypervisor.
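
For illustration only (not part of this patch): an architecture which has neither
the legacy magic-page mechanism nor a 0xcf8 PCI config port could implement several
of these hooks as near no-ops, e.g.:

/*
 * Hypothetical sketch of trivial hook implementations for such an
 * architecture; the x86 versions in this patch do real work instead.
 */
void arch_ioreq_server_enable(struct hvm_ioreq_server *s)
{
    /* Nothing to do: no magic pages to remove from the P2M. */
}

void arch_ioreq_server_disable(struct hvm_ioreq_server *s)
{
    /* Nothing to do: no magic pages to add back to the P2M. */
}

bool arch_ioreq_server_destroy_all(struct domain *d)
{
    /* No PCI config-space handler to relocate; always proceed. */
    return true;
}

void arch_ioreq_domain_init(struct domain *d)
{
    /* No 0xcf8 port handler to register. */
}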

Signed-off-by: Oleksandr Tyshchenko <oleksandr_tyshchenko@epam.com>
CC: Julien Grall <julien.grall@arm.com>

---
Please note, this is a split/cleanup/hardening of Julien's PoC:
"Add support for Guest IO forwarding to a device emulator"

Changes RFC -> V1:
   - new patch, was split from:
     "[RFC PATCH V1 01/12] hvm/ioreq: Make x86's IOREQ feature common"
   - fold the check of p->type into hvm_get_ioreq_server_range_type()
     and make it return success/failure
   - remove relocate_portio_handler() call from arch_hvm_ioreq_destroy()
     in arch/x86/hvm/ioreq.c
   - introduce arch_hvm_destroy_ioreq_server()/arch_handle_hvm_io_completion()

Changes V1 -> V2:
   - update patch description
   - make arch functions inline and put them into arch header
     to achieve a true rename in the subsequent patch
   - return void in arch_hvm_destroy_ioreq_server()
   - return bool in arch_hvm_ioreq_destroy()
   - bring relocate_portio_handler() back to arch_hvm_ioreq_destroy()
   - rename IOREQ_IO* to IOREQ_STATUS*
   - remove *handle* from arch_handle_hvm_io_completion()
   - re-order #include-s alphabetically
   - rename hvm_get_ioreq_server_range_type() to hvm_ioreq_server_get_type_addr()
     and add "const" to several arguments

Changes V2 -> V3:
   - update patch description
   - name new arch hooks according to the new naming scheme
   - don't make arch hooks inline, move them to ioreq.c
   - make get_ioreq_server() local again
   - rework the whole patch taking into account that the "legacy" interface
     should remain x86-specific (additional arch hooks, etc)
   - update the code to be able to use hvm_map_mem_type_to_ioreq_server()
     in the common code (an extra arch hook, etc)
   - don’t include <asm/hvm/emulate.h> from arch header
   - add "arch" prefix to hvm_ioreq_server_get_type_addr()
   - move IOREQ_STATUS_* #define-s introduction to the separate patch
   - move HANDLE_BUFIOREQ to the arch header
   - just return relocate_portio_handler() from arch_ioreq_server_destroy_all()
   - misc adjustments proposed by Jan (adding const, unsigned int instead of uint32_t)
---
---
 xen/arch/x86/hvm/ioreq.c        | 174 ++++++++++++++++++++++++++--------------
 xen/include/asm-x86/hvm/ioreq.h |  19 +++++
 2 files changed, 133 insertions(+), 60 deletions(-)

diff --git a/xen/arch/x86/hvm/ioreq.c b/xen/arch/x86/hvm/ioreq.c
index 1cc27df..e3dfb49 100644
--- a/xen/arch/x86/hvm/ioreq.c
+++ b/xen/arch/x86/hvm/ioreq.c
@@ -17,15 +17,15 @@
  */
 
 #include <xen/ctype.h>
+#include <xen/domain.h>
+#include <xen/event.h>
 #include <xen/init.h>
+#include <xen/irq.h>
 #include <xen/lib.h>
-#include <xen/trace.h>
+#include <xen/paging.h>
 #include <xen/sched.h>
-#include <xen/irq.h>
 #include <xen/softirq.h>
-#include <xen/domain.h>
-#include <xen/event.h>
-#include <xen/paging.h>
+#include <xen/trace.h>
 #include <xen/vpci.h>
 
 #include <asm/hvm/emulate.h>
@@ -170,6 +170,29 @@ static bool hvm_wait_for_io(struct hvm_ioreq_vcpu *sv, ioreq_t *p)
     return true;
 }
 
+bool arch_vcpu_ioreq_completion(enum hvm_io_completion io_completion)
+{
+    switch ( io_completion )
+    {
+    case HVMIO_realmode_completion:
+    {
+        struct hvm_emulate_ctxt ctxt;
+
+        hvm_emulate_init_once(&ctxt, NULL, guest_cpu_user_regs());
+        vmx_realmode_emulate_one(&ctxt);
+        hvm_emulate_writeback(&ctxt);
+
+        break;
+    }
+
+    default:
+        ASSERT_UNREACHABLE();
+        break;
+    }
+
+    return true;
+}
+
 bool handle_hvm_io_completion(struct vcpu *v)
 {
     struct domain *d = v->domain;
@@ -209,19 +232,8 @@ bool handle_hvm_io_completion(struct vcpu *v)
         return handle_pio(vio->io_req.addr, vio->io_req.size,
                           vio->io_req.dir);
 
-    case HVMIO_realmode_completion:
-    {
-        struct hvm_emulate_ctxt ctxt;
-
-        hvm_emulate_init_once(&ctxt, NULL, guest_cpu_user_regs());
-        vmx_realmode_emulate_one(&ctxt);
-        hvm_emulate_writeback(&ctxt);
-
-        break;
-    }
     default:
-        ASSERT_UNREACHABLE();
-        break;
+        return arch_vcpu_ioreq_completion(io_completion);
     }
 
     return true;
@@ -477,9 +489,6 @@ static void hvm_update_ioreq_evtchn(struct hvm_ioreq_server *s,
     }
 }
 
-#define HANDLE_BUFIOREQ(s) \
-    ((s)->bufioreq_handling != HVM_IOREQSRV_BUFIOREQ_OFF)
-
 static int hvm_ioreq_server_add_vcpu(struct hvm_ioreq_server *s,
                                      struct vcpu *v)
 {
@@ -586,7 +595,7 @@ static void hvm_ioreq_server_remove_all_vcpus(struct hvm_ioreq_server *s)
     spin_unlock(&s->lock);
 }
 
-static int hvm_ioreq_server_map_pages(struct hvm_ioreq_server *s)
+int arch_ioreq_server_map_pages(struct hvm_ioreq_server *s)
 {
     int rc;
 
@@ -601,7 +610,7 @@ static int hvm_ioreq_server_map_pages(struct hvm_ioreq_server *s)
     return rc;
 }
 
-static void hvm_ioreq_server_unmap_pages(struct hvm_ioreq_server *s)
+void arch_ioreq_server_unmap_pages(struct hvm_ioreq_server *s)
 {
     hvm_unmap_ioreq_gfn(s, true);
     hvm_unmap_ioreq_gfn(s, false);
@@ -674,6 +683,12 @@ static int hvm_ioreq_server_alloc_rangesets(struct hvm_ioreq_server *s,
     return rc;
 }
 
+void arch_ioreq_server_enable(struct hvm_ioreq_server *s)
+{
+    hvm_remove_ioreq_gfn(s, false);
+    hvm_remove_ioreq_gfn(s, true);
+}
+
 static void hvm_ioreq_server_enable(struct hvm_ioreq_server *s)
 {
     struct hvm_ioreq_vcpu *sv;
@@ -683,8 +698,7 @@ static void hvm_ioreq_server_enable(struct hvm_ioreq_server *s)
     if ( s->enabled )
         goto done;
 
-    hvm_remove_ioreq_gfn(s, false);
-    hvm_remove_ioreq_gfn(s, true);
+    arch_ioreq_server_enable(s);
 
     s->enabled = true;
 
@@ -697,6 +711,12 @@ static void hvm_ioreq_server_enable(struct hvm_ioreq_server *s)
     spin_unlock(&s->lock);
 }
 
+void arch_ioreq_server_disable(struct hvm_ioreq_server *s)
+{
+    hvm_add_ioreq_gfn(s, true);
+    hvm_add_ioreq_gfn(s, false);
+}
+
 static void hvm_ioreq_server_disable(struct hvm_ioreq_server *s)
 {
     spin_lock(&s->lock);
@@ -704,8 +724,7 @@ static void hvm_ioreq_server_disable(struct hvm_ioreq_server *s)
     if ( !s->enabled )
         goto done;
 
-    hvm_add_ioreq_gfn(s, true);
-    hvm_add_ioreq_gfn(s, false);
+    arch_ioreq_server_disable(s);
 
     s->enabled = false;
 
@@ -750,7 +769,7 @@ static int hvm_ioreq_server_init(struct hvm_ioreq_server *s,
 
  fail_add:
     hvm_ioreq_server_remove_all_vcpus(s);
-    hvm_ioreq_server_unmap_pages(s);
+    arch_ioreq_server_unmap_pages(s);
 
     hvm_ioreq_server_free_rangesets(s);
 
@@ -764,7 +783,7 @@ static void hvm_ioreq_server_deinit(struct hvm_ioreq_server *s)
     hvm_ioreq_server_remove_all_vcpus(s);
 
     /*
-     * NOTE: It is safe to call both hvm_ioreq_server_unmap_pages() and
+     * NOTE: It is safe to call both arch_ioreq_server_unmap_pages() and
      *       hvm_ioreq_server_free_pages() in that order.
      *       This is because the former will do nothing if the pages
      *       are not mapped, leaving the page to be freed by the latter.
@@ -772,7 +791,7 @@ static void hvm_ioreq_server_deinit(struct hvm_ioreq_server *s)
      *       the page_info pointer to NULL, meaning the latter will do
      *       nothing.
      */
-    hvm_ioreq_server_unmap_pages(s);
+    arch_ioreq_server_unmap_pages(s);
     hvm_ioreq_server_free_pages(s);
 
     hvm_ioreq_server_free_rangesets(s);
@@ -836,6 +855,12 @@ int hvm_create_ioreq_server(struct domain *d, int bufioreq_handling,
     return rc;
 }
 
+/* Called when target domain is paused */
+void arch_ioreq_server_destroy(struct hvm_ioreq_server *s)
+{
+    p2m_set_ioreq_server(s->target, 0, s);
+}
+
 int hvm_destroy_ioreq_server(struct domain *d, ioservid_t id)
 {
     struct hvm_ioreq_server *s;
@@ -855,7 +880,7 @@ int hvm_destroy_ioreq_server(struct domain *d, ioservid_t id)
 
     domain_pause(d);
 
-    p2m_set_ioreq_server(d, 0, s);
+    arch_ioreq_server_destroy(s);
 
     hvm_ioreq_server_disable(s);
 
@@ -900,7 +925,7 @@ int hvm_get_ioreq_server_info(struct domain *d, ioservid_t id,
 
     if ( ioreq_gfn || bufioreq_gfn )
     {
-        rc = hvm_ioreq_server_map_pages(s);
+        rc = arch_ioreq_server_map_pages(s);
         if ( rc )
             goto out;
     }
@@ -1080,6 +1105,24 @@ int hvm_unmap_io_range_from_ioreq_server(struct domain *d, ioservid_t id,
     return rc;
 }
 
+/* Called with ioreq_server lock held */
+int arch_ioreq_server_map_mem_type(struct domain *d,
+                                   struct hvm_ioreq_server *s,
+                                   uint32_t flags)
+{
+    int rc = p2m_set_ioreq_server(d, flags, s);
+
+    if ( rc == 0 && flags == 0 )
+    {
+        const struct p2m_domain *p2m = p2m_get_hostp2m(d);
+
+        if ( read_atomic(&p2m->ioreq.entry_count) )
+            p2m_change_entry_type_global(d, p2m_ioreq_server, p2m_ram_rw);
+    }
+
+    return rc;
+}
+
 /*
  * Map or unmap an ioreq server to specific memory type. For now, only
  * HVMMEM_ioreq_server is supported, and in the future new types can be
@@ -1112,19 +1155,11 @@ int hvm_map_mem_type_to_ioreq_server(struct domain *d, ioservid_t id,
     if ( s->emulator != current->domain )
         goto out;
 
-    rc = p2m_set_ioreq_server(d, flags, s);
+    rc = arch_ioreq_server_map_mem_type(d, s, flags);
 
  out:
     spin_unlock_recursive(&d->arch.hvm.ioreq_server.lock);
 
-    if ( rc == 0 && flags == 0 )
-    {
-        struct p2m_domain *p2m = p2m_get_hostp2m(d);
-
-        if ( read_atomic(&p2m->ioreq.entry_count) )
-            p2m_change_entry_type_global(d, p2m_ioreq_server, p2m_ram_rw);
-    }
-
     return rc;
 }
 
@@ -1210,12 +1245,17 @@ void hvm_all_ioreq_servers_remove_vcpu(struct domain *d, struct vcpu *v)
     spin_unlock_recursive(&d->arch.hvm.ioreq_server.lock);
 }
 
+bool arch_ioreq_server_destroy_all(struct domain *d)
+{
+    return relocate_portio_handler(d, 0xcf8, 0xcf8, 4);
+}
+
 void hvm_destroy_all_ioreq_servers(struct domain *d)
 {
     struct hvm_ioreq_server *s;
     unsigned int id;
 
-    if ( !relocate_portio_handler(d, 0xcf8, 0xcf8, 4) )
+    if ( !arch_ioreq_server_destroy_all(d) )
         return;
 
     spin_lock_recursive(&d->arch.hvm.ioreq_server.lock);
@@ -1239,33 +1279,28 @@ void hvm_destroy_all_ioreq_servers(struct domain *d)
     spin_unlock_recursive(&d->arch.hvm.ioreq_server.lock);
 }
 
-struct hvm_ioreq_server *hvm_select_ioreq_server(struct domain *d,
-                                                 ioreq_t *p)
+int arch_ioreq_server_get_type_addr(const struct domain *d,
+                                    const ioreq_t *p,
+                                    uint8_t *type,
+                                    uint64_t *addr)
 {
-    struct hvm_ioreq_server *s;
-    uint32_t cf8;
-    uint8_t type;
-    uint64_t addr;
-    unsigned int id;
+    unsigned int cf8 = d->arch.hvm.pci_cf8;
 
     if ( p->type != IOREQ_TYPE_COPY && p->type != IOREQ_TYPE_PIO )
-        return NULL;
-
-    cf8 = d->arch.hvm.pci_cf8;
+        return -EINVAL;
 
     if ( p->type == IOREQ_TYPE_PIO &&
          (p->addr & ~3) == 0xcfc &&
          CF8_ENABLED(cf8) )
     {
-        uint32_t x86_fam;
+        unsigned int x86_fam, reg;
         pci_sbdf_t sbdf;
-        unsigned int reg;
 
         reg = hvm_pci_decode_addr(cf8, p->addr, &sbdf);
 
         /* PCI config data cycle */
-        type = XEN_DMOP_IO_RANGE_PCI;
-        addr = ((uint64_t)sbdf.sbdf << 32) | reg;
+        *type = XEN_DMOP_IO_RANGE_PCI;
+        *addr = ((uint64_t)sbdf.sbdf << 32) | reg;
         /* AMD extended configuration space access? */
         if ( CF8_ADDR_HI(cf8) &&
              d->arch.cpuid->x86_vendor == X86_VENDOR_AMD &&
@@ -1277,16 +1312,30 @@ struct hvm_ioreq_server *hvm_select_ioreq_server(struct domain *d,
 
             if ( !rdmsr_safe(MSR_AMD64_NB_CFG, msr_val) &&
                  (msr_val & (1ULL << AMD64_NB_CFG_CF8_EXT_ENABLE_BIT)) )
-                addr |= CF8_ADDR_HI(cf8);
+                *addr |= CF8_ADDR_HI(cf8);
         }
     }
     else
     {
-        type = (p->type == IOREQ_TYPE_PIO) ?
-                XEN_DMOP_IO_RANGE_PORT : XEN_DMOP_IO_RANGE_MEMORY;
-        addr = p->addr;
+        *type = (p->type == IOREQ_TYPE_PIO) ?
+                 XEN_DMOP_IO_RANGE_PORT : XEN_DMOP_IO_RANGE_MEMORY;
+        *addr = p->addr;
     }
 
+    return 0;
+}
+
+struct hvm_ioreq_server *hvm_select_ioreq_server(struct domain *d,
+                                                 ioreq_t *p)
+{
+    struct hvm_ioreq_server *s;
+    uint8_t type;
+    uint64_t addr;
+    unsigned int id;
+
+    if ( arch_ioreq_server_get_type_addr(d, p, &type, &addr) )
+        return NULL;
+
     FOR_EACH_IOREQ_SERVER(d, id, s)
     {
         struct rangeset *r;
@@ -1515,11 +1564,16 @@ static int hvm_access_cf8(
     return X86EMUL_UNHANDLEABLE;
 }
 
+void arch_ioreq_domain_init(struct domain *d)
+{
+    register_portio_handler(d, 0xcf8, 4, hvm_access_cf8);
+}
+
 void hvm_ioreq_init(struct domain *d)
 {
     spin_lock_init(&d->arch.hvm.ioreq_server.lock);
 
-    register_portio_handler(d, 0xcf8, 4, hvm_access_cf8);
+    arch_ioreq_domain_init(d);
 }
 
 /*
diff --git a/xen/include/asm-x86/hvm/ioreq.h b/xen/include/asm-x86/hvm/ioreq.h
index e2588e9..cc79285 100644
--- a/xen/include/asm-x86/hvm/ioreq.h
+++ b/xen/include/asm-x86/hvm/ioreq.h
@@ -19,6 +19,25 @@
 #ifndef __ASM_X86_HVM_IOREQ_H__
 #define __ASM_X86_HVM_IOREQ_H__
 
+#define HANDLE_BUFIOREQ(s) \
+    ((s)->bufioreq_handling != HVM_IOREQSRV_BUFIOREQ_OFF)
+
+bool arch_vcpu_ioreq_completion(enum hvm_io_completion io_completion);
+int arch_ioreq_server_map_pages(struct hvm_ioreq_server *s);
+void arch_ioreq_server_unmap_pages(struct hvm_ioreq_server *s);
+void arch_ioreq_server_enable(struct hvm_ioreq_server *s);
+void arch_ioreq_server_disable(struct hvm_ioreq_server *s);
+void arch_ioreq_server_destroy(struct hvm_ioreq_server *s);
+int arch_ioreq_server_map_mem_type(struct domain *d,
+                                   struct hvm_ioreq_server *s,
+                                   uint32_t flags);
+bool arch_ioreq_server_destroy_all(struct domain *d);
+int arch_ioreq_server_get_type_addr(const struct domain *d,
+                                    const ioreq_t *p,
+                                    uint8_t *type,
+                                    uint64_t *addr);
+void arch_ioreq_domain_init(struct domain *d);
+
 bool hvm_io_pending(struct vcpu *v);
 bool handle_hvm_io_completion(struct vcpu *v);
 bool is_ioreq_server_page(struct domain *d, const struct page_info *page);
-- 
2.7.4




* [PATCH V3 02/23] x86/ioreq: Add IOREQ_STATUS_* #define-s and update code for moving
@ 2020-11-30 10:31 ` Oleksandr Tyshchenko
From: Oleksandr Tyshchenko @ 2020-11-30 10:31 UTC (permalink / raw)
  To: xen-devel
  Cc: Oleksandr Tyshchenko, Paul Durrant, Jan Beulich, Andrew Cooper,
	Roger Pau Monné,
	Wei Liu, Julien Grall, Stefano Stabellini, Julien Grall

From: Oleksandr Tyshchenko <oleksandr_tyshchenko@epam.com>

This patch continues the preparation of x86/hvm/ioreq.c
before moving it to the common code.

Add IOREQ_STATUS_* #define-s and update the code that is a candidate
for moving, since X86EMUL_* shouldn't be exposed to the common code in
that form.

This support is going to be used on Arm to be able to run a device
emulator outside of the Xen hypervisor.
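
For illustration only (not part of this patch): another architecture would then
map the same generic names onto its own I/O emulation return codes, e.g. a
hypothetical Arm-side header (the IO_* names here are an assumption):

/*
 * Hypothetical sketch: the common code only ever sees IOREQ_STATUS_*,
 * while each architecture picks the backing values.
 */
#define IOREQ_STATUS_HANDLED     IO_HANDLED
#define IOREQ_STATUS_UNHANDLED   IO_UNHANDLED
#define IOREQ_STATUS_RETRY       IO_RETRY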

Signed-off-by: Oleksandr Tyshchenko <oleksandr_tyshchenko@epam.com>
CC: Julien Grall <julien.grall@arm.com>

---
Please note, this is a split/cleanup/hardening of Julien's PoC:
"Add support for Guest IO forwarding to a device emulator"

Changes V2 -> V3:
 - new patch, was split from
   [PATCH V2 01/23] x86/ioreq: Prepare IOREQ feature for making it common
---
---
 xen/arch/x86/hvm/ioreq.c        | 16 ++++++++--------
 xen/include/asm-x86/hvm/ioreq.h |  4 ++++
 2 files changed, 12 insertions(+), 8 deletions(-)

diff --git a/xen/arch/x86/hvm/ioreq.c b/xen/arch/x86/hvm/ioreq.c
index e3dfb49..9525554 100644
--- a/xen/arch/x86/hvm/ioreq.c
+++ b/xen/arch/x86/hvm/ioreq.c
@@ -1400,7 +1400,7 @@ static int hvm_send_buffered_ioreq(struct hvm_ioreq_server *s, ioreq_t *p)
     pg = iorp->va;
 
     if ( !pg )
-        return X86EMUL_UNHANDLEABLE;
+        return IOREQ_STATUS_UNHANDLED;
 
     /*
      * Return 0 for the cases we can't deal with:
@@ -1430,7 +1430,7 @@ static int hvm_send_buffered_ioreq(struct hvm_ioreq_server *s, ioreq_t *p)
         break;
     default:
         gdprintk(XENLOG_WARNING, "unexpected ioreq size: %u\n", p->size);
-        return X86EMUL_UNHANDLEABLE;
+        return IOREQ_STATUS_UNHANDLED;
     }
 
     spin_lock(&s->bufioreq_lock);
@@ -1440,7 +1440,7 @@ static int hvm_send_buffered_ioreq(struct hvm_ioreq_server *s, ioreq_t *p)
     {
         /* The queue is full: send the iopacket through the normal path. */
         spin_unlock(&s->bufioreq_lock);
-        return X86EMUL_UNHANDLEABLE;
+        return IOREQ_STATUS_UNHANDLED;
     }
 
     pg->buf_ioreq[pg->ptrs.write_pointer % IOREQ_BUFFER_SLOT_NUM] = bp;
@@ -1471,7 +1471,7 @@ static int hvm_send_buffered_ioreq(struct hvm_ioreq_server *s, ioreq_t *p)
     notify_via_xen_event_channel(d, s->bufioreq_evtchn);
     spin_unlock(&s->bufioreq_lock);
 
-    return X86EMUL_OKAY;
+    return IOREQ_STATUS_HANDLED;
 }
 
 int hvm_send_ioreq(struct hvm_ioreq_server *s, ioreq_t *proto_p,
@@ -1487,7 +1487,7 @@ int hvm_send_ioreq(struct hvm_ioreq_server *s, ioreq_t *proto_p,
         return hvm_send_buffered_ioreq(s, proto_p);
 
     if ( unlikely(!vcpu_start_shutdown_deferral(curr)) )
-        return X86EMUL_RETRY;
+        return IOREQ_STATUS_RETRY;
 
     list_for_each_entry ( sv,
                           &s->ioreq_vcpu_list,
@@ -1527,11 +1527,11 @@ int hvm_send_ioreq(struct hvm_ioreq_server *s, ioreq_t *proto_p,
             notify_via_xen_event_channel(d, port);
 
             sv->pending = true;
-            return X86EMUL_RETRY;
+            return IOREQ_STATUS_RETRY;
         }
     }
 
-    return X86EMUL_UNHANDLEABLE;
+    return IOREQ_STATUS_UNHANDLED;
 }
 
 unsigned int hvm_broadcast_ioreq(ioreq_t *p, bool buffered)
@@ -1545,7 +1545,7 @@ unsigned int hvm_broadcast_ioreq(ioreq_t *p, bool buffered)
         if ( !s->enabled )
             continue;
 
-        if ( hvm_send_ioreq(s, p, buffered) == X86EMUL_UNHANDLEABLE )
+        if ( hvm_send_ioreq(s, p, buffered) == IOREQ_STATUS_UNHANDLED )
             failed++;
     }
 
diff --git a/xen/include/asm-x86/hvm/ioreq.h b/xen/include/asm-x86/hvm/ioreq.h
index cc79285..e9c8b2d 100644
--- a/xen/include/asm-x86/hvm/ioreq.h
+++ b/xen/include/asm-x86/hvm/ioreq.h
@@ -74,6 +74,10 @@ unsigned int hvm_broadcast_ioreq(ioreq_t *p, bool buffered);
 
 void hvm_ioreq_init(struct domain *d);
 
+#define IOREQ_STATUS_HANDLED     X86EMUL_OKAY
+#define IOREQ_STATUS_UNHANDLED   X86EMUL_UNHANDLEABLE
+#define IOREQ_STATUS_RETRY       X86EMUL_RETRY
+
 #endif /* __ASM_X86_HVM_IOREQ_H__ */
 
 /*
-- 
2.7.4




* [PATCH V3 03/23] x86/ioreq: Provide out-of-line wrapper for the handle_mmio()
@ 2020-11-30 10:31 ` Oleksandr Tyshchenko
From: Oleksandr Tyshchenko @ 2020-11-30 10:31 UTC (permalink / raw)
  To: xen-devel
  Cc: Oleksandr Tyshchenko, Paul Durrant, Jan Beulich, Andrew Cooper,
	Roger Pau Monné,
	Wei Liu, Julien Grall, Stefano Stabellini, Julien Grall

From: Oleksandr Tyshchenko <oleksandr_tyshchenko@epam.com>

IOREQ is about to become a common feature and Arm will have its own
implementation.

But the name of the function is pretty generic and can be confusing
on Arm (we already have a try_handle_mmio()).

In order not to rename the function (which is used for a varying
set of purposes on x86) globally, and to get a non-confusing variant on Arm,
provide a wrapper ioreq_complete_mmio() to be used in common and Arm code.
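
For illustration only (not part of this patch): the Arm implementation can later
back the same wrapper with its own MMIO handling path, e.g.:

/*
 * Hypothetical sketch: on Arm, completing a forwarded MMIO access would go
 * through the Arm MMIO emulation code instead of x86's handle_mmio().
 * arm_retry_pending_mmio() is an assumed helper, for illustration only.
 */
bool ioreq_complete_mmio(void)
{
    return arm_retry_pending_mmio(current);
}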

Signed-off-by: Oleksandr Tyshchenko <oleksandr_tyshchenko@epam.com>
CC: Julien Grall <julien.grall@arm.com>

---
Please note, this is a split/cleanup/hardening of Julien's PoC:
"Add support for Guest IO forwarding to a device emulator"

Changes RFC -> V1:
   - new patch

Changes V1 -> V2:
   - remove "handle"
   - add Jan's A-b

Changes V2 -> V3:
   - remove Jan's A-b
   - update patch subject/description
   - use out-of-line function instead of #define
   - put earlier in the series to avoid breakage
---
---
 xen/arch/x86/hvm/ioreq.c        | 7 ++++++-
 xen/include/asm-x86/hvm/ioreq.h | 2 ++
 2 files changed, 8 insertions(+), 1 deletion(-)

diff --git a/xen/arch/x86/hvm/ioreq.c b/xen/arch/x86/hvm/ioreq.c
index 9525554..36b1e4e 100644
--- a/xen/arch/x86/hvm/ioreq.c
+++ b/xen/arch/x86/hvm/ioreq.c
@@ -36,6 +36,11 @@
 #include <public/hvm/ioreq.h>
 #include <public/hvm/params.h>
 
+bool ioreq_complete_mmio(void)
+{
+    return handle_mmio();
+}
+
 static void set_ioreq_server(struct domain *d, unsigned int id,
                              struct hvm_ioreq_server *s)
 {
@@ -226,7 +231,7 @@ bool handle_hvm_io_completion(struct vcpu *v)
         break;
 
     case HVMIO_mmio_completion:
-        return handle_mmio();
+        return ioreq_complete_mmio();
 
     case HVMIO_pio_completion:
         return handle_pio(vio->io_req.addr, vio->io_req.size,
diff --git a/xen/include/asm-x86/hvm/ioreq.h b/xen/include/asm-x86/hvm/ioreq.h
index e9c8b2d..c7563e1 100644
--- a/xen/include/asm-x86/hvm/ioreq.h
+++ b/xen/include/asm-x86/hvm/ioreq.h
@@ -74,6 +74,8 @@ unsigned int hvm_broadcast_ioreq(ioreq_t *p, bool buffered);
 
 void hvm_ioreq_init(struct domain *d);
 
+bool ioreq_complete_mmio(void);
+
 #define IOREQ_STATUS_HANDLED     X86EMUL_OKAY
 #define IOREQ_STATUS_UNHANDLED   X86EMUL_UNHANDLEABLE
 #define IOREQ_STATUS_RETRY       X86EMUL_RETRY
-- 
2.7.4




* [PATCH V3 04/23] xen/ioreq: Make x86's IOREQ feature common
@ 2020-11-30 10:31 ` Oleksandr Tyshchenko
From: Oleksandr Tyshchenko @ 2020-11-30 10:31 UTC (permalink / raw)
  To: xen-devel
  Cc: Oleksandr Tyshchenko, Andrew Cooper, George Dunlap, Ian Jackson,
	Jan Beulich, Julien Grall, Stefano Stabellini, Wei Liu,
	Roger Pau Monné,
	Paul Durrant, Tim Deegan, Julien Grall

From: Oleksandr Tyshchenko <oleksandr_tyshchenko@epam.com>

As a lot of x86 code can be re-used on Arm later on, this patch
moves the previously prepared IOREQ support to the common code
(the code movement is a verbatim copy).

The "legacy" mechanism of mapping magic pages for the IOREQ servers
remains x86 specific and not exposed to the common code.

The common IOREQ feature is supposed to be built with IOREQ_SERVER
option enabled, which is selected for x86's config HVM for now.

In order to avoid having a gigantic patch here, the subsequent
patches will update remaining bits in the common code step by step:
- Make IOREQ related structs/materials common
- Drop the "hvm" prefixes and infixes
- Remove layering violation by moving corresponding fields
  out of *arch.hvm* or abstracting away accesses to them

Also include <xen/domain_page.h>, which will be needed on Arm,
to avoid touching the common code again when introducing the Arm-specific bits.

This support is going to be used on Arm to be able to run a device
emulator outside of the Xen hypervisor.

Signed-off-by: Oleksandr Tyshchenko <oleksandr_tyshchenko@epam.com>
CC: Julien Grall <julien.grall@arm.com>

---
Please note, this is a split/cleanup/hardening of Julien's PoC:
"Add support for Guest IO forwarding to a device emulator"

***
Please note, this patch depends on the following which is
on review:
https://patchwork.kernel.org/patch/11816689/
***

Changes RFC -> V1:
   - was split into three patches:
     - x86/ioreq: Prepare IOREQ feature for making it common
     - xen/ioreq: Make x86's IOREQ feature common
     - xen/ioreq: Make x86's hvm_ioreq_needs_completion() common
   - update MAINTAINERS file
   - do not use a separate subdir for the IOREQ stuff, move it to:
     - xen/common/ioreq.c
     - xen/include/xen/ioreq.h
   - update x86's files to include xen/ioreq.h
   - remove unneeded headers in arch/x86/hvm/ioreq.c
   - re-order the headers alphabetically in common/ioreq.c
   - update common/ioreq.c according to the newly introduced arch functions:
     arch_hvm_destroy_ioreq_server()/arch_handle_hvm_io_completion()

Changes V1 -> V2:
   - update patch description
   - make everything needed in the previous patch to achieve
     a true rename here
   - don't include unnecessary headers from asm-x86/hvm/ioreq.h
     and xen/ioreq.h
   - use __XEN_IOREQ_H__ instead of __IOREQ_H__
   - move get_ioreq_server() to common/ioreq.c

Changes V2 -> V3:
   - update patch description
   - make everything needed in the previous patch to not
     expose "legacy" interface to the common code here
   - update the patch according to the "legacy" interface being x86-specific
   - include <xen/domain_page.h> in common ioreq.c
---
---
 MAINTAINERS                     |    8 +-
 xen/arch/x86/Kconfig            |    1 +
 xen/arch/x86/hvm/ioreq.c        | 1356 ++-------------------------------------
 xen/arch/x86/mm.c               |    2 +-
 xen/arch/x86/mm/shadow/common.c |    2 +-
 xen/common/Kconfig              |    3 +
 xen/common/Makefile             |    1 +
 xen/common/ioreq.c              | 1287 +++++++++++++++++++++++++++++++++++++
 xen/include/asm-x86/hvm/ioreq.h |   39 +-
 xen/include/xen/ioreq.h         |   73 +++
 10 files changed, 1432 insertions(+), 1340 deletions(-)
 create mode 100644 xen/common/ioreq.c
 create mode 100644 xen/include/xen/ioreq.h

diff --git a/MAINTAINERS b/MAINTAINERS
index dab38a6..5a44ba4 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -333,6 +333,13 @@ X:	xen/drivers/passthrough/vtd/
 X:	xen/drivers/passthrough/device_tree.c
 F:	xen/include/xen/iommu.h
 
+I/O EMULATION (IOREQ)
+M:	Paul Durrant <paul@xen.org>
+S:	Supported
+F:	xen/common/ioreq.c
+F:	xen/include/xen/ioreq.h
+F:	xen/include/public/hvm/ioreq.h
+
 KCONFIG
 M:	Doug Goldstein <cardoe@cardoe.com>
 S:	Supported
@@ -549,7 +556,6 @@ F:	xen/arch/x86/hvm/ioreq.c
 F:	xen/include/asm-x86/hvm/emulate.h
 F:	xen/include/asm-x86/hvm/io.h
 F:	xen/include/asm-x86/hvm/ioreq.h
-F:	xen/include/public/hvm/ioreq.h
 
 X86 MEMORY MANAGEMENT
 M:	Jan Beulich <jbeulich@suse.com>
diff --git a/xen/arch/x86/Kconfig b/xen/arch/x86/Kconfig
index 24868aa..abe0fce 100644
--- a/xen/arch/x86/Kconfig
+++ b/xen/arch/x86/Kconfig
@@ -91,6 +91,7 @@ config PV_LINEAR_PT
 
 config HVM
 	def_bool !PV_SHIM_EXCLUSIVE
+	select IOREQ_SERVER
 	prompt "HVM support"
 	---help---
 	  Interfaces to support HVM domains.  HVM domains require hardware
diff --git a/xen/arch/x86/hvm/ioreq.c b/xen/arch/x86/hvm/ioreq.c
index 36b1e4e..b03ceee 100644
--- a/xen/arch/x86/hvm/ioreq.c
+++ b/xen/arch/x86/hvm/ioreq.c
@@ -41,140 +41,6 @@ bool ioreq_complete_mmio(void)
     return handle_mmio();
 }
 
-static void set_ioreq_server(struct domain *d, unsigned int id,
-                             struct hvm_ioreq_server *s)
-{
-    ASSERT(id < MAX_NR_IOREQ_SERVERS);
-    ASSERT(!s || !d->arch.hvm.ioreq_server.server[id]);
-
-    d->arch.hvm.ioreq_server.server[id] = s;
-}
-
-#define GET_IOREQ_SERVER(d, id) \
-    (d)->arch.hvm.ioreq_server.server[id]
-
-static struct hvm_ioreq_server *get_ioreq_server(const struct domain *d,
-                                                 unsigned int id)
-{
-    if ( id >= MAX_NR_IOREQ_SERVERS )
-        return NULL;
-
-    return GET_IOREQ_SERVER(d, id);
-}
-
-/*
- * Iterate over all possible ioreq servers.
- *
- * NOTE: The iteration is backwards such that more recently created
- *       ioreq servers are favoured in hvm_select_ioreq_server().
- *       This is a semantic that previously existed when ioreq servers
- *       were held in a linked list.
- */
-#define FOR_EACH_IOREQ_SERVER(d, id, s) \
-    for ( (id) = MAX_NR_IOREQ_SERVERS; (id) != 0; ) \
-        if ( !(s = GET_IOREQ_SERVER(d, --(id))) ) \
-            continue; \
-        else
-
-static ioreq_t *get_ioreq(struct hvm_ioreq_server *s, struct vcpu *v)
-{
-    shared_iopage_t *p = s->ioreq.va;
-
-    ASSERT((v == current) || !vcpu_runnable(v));
-    ASSERT(p != NULL);
-
-    return &p->vcpu_ioreq[v->vcpu_id];
-}
-
-static struct hvm_ioreq_vcpu *get_pending_vcpu(const struct vcpu *v,
-                                               struct hvm_ioreq_server **srvp)
-{
-    struct domain *d = v->domain;
-    struct hvm_ioreq_server *s;
-    unsigned int id;
-
-    FOR_EACH_IOREQ_SERVER(d, id, s)
-    {
-        struct hvm_ioreq_vcpu *sv;
-
-        list_for_each_entry ( sv,
-                              &s->ioreq_vcpu_list,
-                              list_entry )
-        {
-            if ( sv->vcpu == v && sv->pending )
-            {
-                if ( srvp )
-                    *srvp = s;
-                return sv;
-            }
-        }
-    }
-
-    return NULL;
-}
-
-bool hvm_io_pending(struct vcpu *v)
-{
-    return get_pending_vcpu(v, NULL);
-}
-
-static bool hvm_wait_for_io(struct hvm_ioreq_vcpu *sv, ioreq_t *p)
-{
-    unsigned int prev_state = STATE_IOREQ_NONE;
-    unsigned int state = p->state;
-    uint64_t data = ~0;
-
-    smp_rmb();
-
-    /*
-     * The only reason we should see this condition be false is when an
-     * emulator dying races with I/O being requested.
-     */
-    while ( likely(state != STATE_IOREQ_NONE) )
-    {
-        if ( unlikely(state < prev_state) )
-        {
-            gdprintk(XENLOG_ERR, "Weird HVM ioreq state transition %u -> %u\n",
-                     prev_state, state);
-            sv->pending = false;
-            domain_crash(sv->vcpu->domain);
-            return false; /* bail */
-        }
-
-        switch ( prev_state = state )
-        {
-        case STATE_IORESP_READY: /* IORESP_READY -> NONE */
-            p->state = STATE_IOREQ_NONE;
-            data = p->data;
-            break;
-
-        case STATE_IOREQ_READY:  /* IOREQ_{READY,INPROCESS} -> IORESP_READY */
-        case STATE_IOREQ_INPROCESS:
-            wait_on_xen_event_channel(sv->ioreq_evtchn,
-                                      ({ state = p->state;
-                                         smp_rmb();
-                                         state != prev_state; }));
-            continue;
-
-        default:
-            gdprintk(XENLOG_ERR, "Weird HVM iorequest state %u\n", state);
-            sv->pending = false;
-            domain_crash(sv->vcpu->domain);
-            return false; /* bail */
-        }
-
-        break;
-    }
-
-    p = &sv->vcpu->arch.hvm.hvm_io.io_req;
-    if ( hvm_ioreq_needs_completion(p) )
-        p->data = data;
-
-    sv->pending = false;
-
-    return true;
-}
-
 bool arch_vcpu_ioreq_completion(enum hvm_io_completion io_completion)
 {
     switch ( io_completion )
@@ -198,52 +64,6 @@ bool arch_vcpu_ioreq_completion(enum hvm_io_completion io_completion)
     return true;
 }
 
-bool handle_hvm_io_completion(struct vcpu *v)
-{
-    struct domain *d = v->domain;
-    struct hvm_vcpu_io *vio = &v->arch.hvm.hvm_io;
-    struct hvm_ioreq_server *s;
-    struct hvm_ioreq_vcpu *sv;
-    enum hvm_io_completion io_completion;
-
-    if ( has_vpci(d) && vpci_process_pending(v) )
-    {
-        raise_softirq(SCHEDULE_SOFTIRQ);
-        return false;
-    }
-
-    sv = get_pending_vcpu(v, &s);
-    if ( sv && !hvm_wait_for_io(sv, get_ioreq(s, v)) )
-        return false;
-
-    vio->io_req.state = hvm_ioreq_needs_completion(&vio->io_req) ?
-        STATE_IORESP_READY : STATE_IOREQ_NONE;
-
-    msix_write_completion(v);
-    vcpu_end_shutdown_deferral(v);
-
-    io_completion = vio->io_completion;
-    vio->io_completion = HVMIO_no_completion;
-
-    switch ( io_completion )
-    {
-    case HVMIO_no_completion:
-        break;
-
-    case HVMIO_mmio_completion:
-        return ioreq_complete_mmio();
-
-    case HVMIO_pio_completion:
-        return handle_pio(vio->io_req.addr, vio->io_req.size,
-                          vio->io_req.dir);
-
-    default:
-        return arch_vcpu_ioreq_completion(io_completion);
-    }
-
-    return true;
-}
-
 static gfn_t hvm_alloc_legacy_ioreq_gfn(struct hvm_ioreq_server *s)
 {
     struct domain *d = s->target;
@@ -360,93 +180,6 @@ static int hvm_map_ioreq_gfn(struct hvm_ioreq_server *s, bool buf)
     return rc;
 }
 
-static int hvm_alloc_ioreq_mfn(struct hvm_ioreq_server *s, bool buf)
-{
-    struct hvm_ioreq_page *iorp = buf ? &s->bufioreq : &s->ioreq;
-    struct page_info *page;
-
-    if ( iorp->page )
-    {
-        /*
-         * If a guest frame has already been mapped (which may happen
-         * on demand if hvm_get_ioreq_server_info() is called), then
-         * allocating a page is not permitted.
-         */
-        if ( !gfn_eq(iorp->gfn, INVALID_GFN) )
-            return -EPERM;
-
-        return 0;
-    }
-
-    page = alloc_domheap_page(s->target, MEMF_no_refcount);
-
-    if ( !page )
-        return -ENOMEM;
-
-    if ( !get_page_and_type(page, s->target, PGT_writable_page) )
-    {
-        /*
-         * The domain can't possibly know about this page yet, so failure
-         * here is a clear indication of something fishy going on.
-         */
-        domain_crash(s->emulator);
-        return -ENODATA;
-    }
-
-    iorp->va = __map_domain_page_global(page);
-    if ( !iorp->va )
-        goto fail;
-
-    iorp->page = page;
-    clear_page(iorp->va);
-    return 0;
-
- fail:
-    put_page_alloc_ref(page);
-    put_page_and_type(page);
-
-    return -ENOMEM;
-}
-
-static void hvm_free_ioreq_mfn(struct hvm_ioreq_server *s, bool buf)
-{
-    struct hvm_ioreq_page *iorp = buf ? &s->bufioreq : &s->ioreq;
-    struct page_info *page = iorp->page;
-
-    if ( !page )
-        return;
-
-    iorp->page = NULL;
-
-    unmap_domain_page_global(iorp->va);
-    iorp->va = NULL;
-
-    put_page_alloc_ref(page);
-    put_page_and_type(page);
-}
-
-bool is_ioreq_server_page(struct domain *d, const struct page_info *page)
-{
-    const struct hvm_ioreq_server *s;
-    unsigned int id;
-    bool found = false;
-
-    spin_lock_recursive(&d->arch.hvm.ioreq_server.lock);
-
-    FOR_EACH_IOREQ_SERVER(d, id, s)
-    {
-        if ( (s->ioreq.page == page) || (s->bufioreq.page == page) )
-        {
-            found = true;
-            break;
-        }
-    }
-
-    spin_unlock_recursive(&d->arch.hvm.ioreq_server.lock);
-
-    return found;
-}
-
 static void hvm_remove_ioreq_gfn(struct hvm_ioreq_server *s, bool buf)
 
 {
@@ -481,125 +214,6 @@ static int hvm_add_ioreq_gfn(struct hvm_ioreq_server *s, bool buf)
     return rc;
 }
 
-static void hvm_update_ioreq_evtchn(struct hvm_ioreq_server *s,
-                                    struct hvm_ioreq_vcpu *sv)
-{
-    ASSERT(spin_is_locked(&s->lock));
-
-    if ( s->ioreq.va != NULL )
-    {
-        ioreq_t *p = get_ioreq(s, sv->vcpu);
-
-        p->vp_eport = sv->ioreq_evtchn;
-    }
-}
-
-static int hvm_ioreq_server_add_vcpu(struct hvm_ioreq_server *s,
-                                     struct vcpu *v)
-{
-    struct hvm_ioreq_vcpu *sv;
-    int rc;
-
-    sv = xzalloc(struct hvm_ioreq_vcpu);
-
-    rc = -ENOMEM;
-    if ( !sv )
-        goto fail1;
-
-    spin_lock(&s->lock);
-
-    rc = alloc_unbound_xen_event_channel(v->domain, v->vcpu_id,
-                                         s->emulator->domain_id, NULL);
-    if ( rc < 0 )
-        goto fail2;
-
-    sv->ioreq_evtchn = rc;
-
-    if ( v->vcpu_id == 0 && HANDLE_BUFIOREQ(s) )
-    {
-        rc = alloc_unbound_xen_event_channel(v->domain, 0,
-                                             s->emulator->domain_id, NULL);
-        if ( rc < 0 )
-            goto fail3;
-
-        s->bufioreq_evtchn = rc;
-    }
-
-    sv->vcpu = v;
-
-    list_add(&sv->list_entry, &s->ioreq_vcpu_list);
-
-    if ( s->enabled )
-        hvm_update_ioreq_evtchn(s, sv);
-
-    spin_unlock(&s->lock);
-    return 0;
-
- fail3:
-    free_xen_event_channel(v->domain, sv->ioreq_evtchn);
-
- fail2:
-    spin_unlock(&s->lock);
-    xfree(sv);
-
- fail1:
-    return rc;
-}
-
-static void hvm_ioreq_server_remove_vcpu(struct hvm_ioreq_server *s,
-                                         struct vcpu *v)
-{
-    struct hvm_ioreq_vcpu *sv;
-
-    spin_lock(&s->lock);
-
-    list_for_each_entry ( sv,
-                          &s->ioreq_vcpu_list,
-                          list_entry )
-    {
-        if ( sv->vcpu != v )
-            continue;
-
-        list_del(&sv->list_entry);
-
-        if ( v->vcpu_id == 0 && HANDLE_BUFIOREQ(s) )
-            free_xen_event_channel(v->domain, s->bufioreq_evtchn);
-
-        free_xen_event_channel(v->domain, sv->ioreq_evtchn);
-
-        xfree(sv);
-        break;
-    }
-
-    spin_unlock(&s->lock);
-}
-
-static void hvm_ioreq_server_remove_all_vcpus(struct hvm_ioreq_server *s)
-{
-    struct hvm_ioreq_vcpu *sv, *next;
-
-    spin_lock(&s->lock);
-
-    list_for_each_entry_safe ( sv,
-                               next,
-                               &s->ioreq_vcpu_list,
-                               list_entry )
-    {
-        struct vcpu *v = sv->vcpu;
-
-        list_del(&sv->list_entry);
-
-        if ( v->vcpu_id == 0 && HANDLE_BUFIOREQ(s) )
-            free_xen_event_channel(v->domain, s->bufioreq_evtchn);
-
-        free_xen_event_channel(v->domain, sv->ioreq_evtchn);
-
-        xfree(sv);
-    }
-
-    spin_unlock(&s->lock);
-}
-
 int arch_ioreq_server_map_pages(struct hvm_ioreq_server *s)
 {
     int rc;
@@ -621,940 +235,91 @@ void arch_ioreq_server_unmap_pages(struct hvm_ioreq_server *s)
     hvm_unmap_ioreq_gfn(s, false);
 }
 
-static int hvm_ioreq_server_alloc_pages(struct hvm_ioreq_server *s)
+void arch_ioreq_server_enable(struct hvm_ioreq_server *s)
 {
-    int rc;
-
-    rc = hvm_alloc_ioreq_mfn(s, false);
-
-    if ( !rc && (s->bufioreq_handling != HVM_IOREQSRV_BUFIOREQ_OFF) )
-        rc = hvm_alloc_ioreq_mfn(s, true);
-
-    if ( rc )
-        hvm_free_ioreq_mfn(s, false);
-
-    return rc;
+    hvm_remove_ioreq_gfn(s, false);
+    hvm_remove_ioreq_gfn(s, true);
 }
 
-static void hvm_ioreq_server_free_pages(struct hvm_ioreq_server *s)
+void arch_ioreq_server_disable(struct hvm_ioreq_server *s)
 {
-    hvm_free_ioreq_mfn(s, true);
-    hvm_free_ioreq_mfn(s, false);
+    hvm_add_ioreq_gfn(s, true);
+    hvm_add_ioreq_gfn(s, false);
 }
 
-static void hvm_ioreq_server_free_rangesets(struct hvm_ioreq_server *s)
+/* Called when target domain is paused */
+void arch_ioreq_server_destroy(struct hvm_ioreq_server *s)
 {
-    unsigned int i;
-
-    for ( i = 0; i < NR_IO_RANGE_TYPES; i++ )
-        rangeset_destroy(s->range[i]);
+    p2m_set_ioreq_server(s->target, 0, s);
 }
 
-static int hvm_ioreq_server_alloc_rangesets(struct hvm_ioreq_server *s,
-                                            ioservid_t id)
+/* Called with ioreq_server lock held */
+int arch_ioreq_server_map_mem_type(struct domain *d,
+                                   struct hvm_ioreq_server *s,
+                                   uint32_t flags)
 {
-    unsigned int i;
-    int rc;
+    int rc = p2m_set_ioreq_server(d, flags, s);
 
-    for ( i = 0; i < NR_IO_RANGE_TYPES; i++ )
+    if ( rc == 0 && flags == 0 )
     {
-        char *name;
-
-        rc = asprintf(&name, "ioreq_server %d %s", id,
-                      (i == XEN_DMOP_IO_RANGE_PORT) ? "port" :
-                      (i == XEN_DMOP_IO_RANGE_MEMORY) ? "memory" :
-                      (i == XEN_DMOP_IO_RANGE_PCI) ? "pci" :
-                      "");
-        if ( rc )
-            goto fail;
-
-        s->range[i] = rangeset_new(s->target, name,
-                                   RANGESETF_prettyprint_hex);
-
-        xfree(name);
-
-        rc = -ENOMEM;
-        if ( !s->range[i] )
-            goto fail;
+        const struct p2m_domain *p2m = p2m_get_hostp2m(d);
 
-        rangeset_limit(s->range[i], MAX_NR_IO_RANGES);
+        if ( read_atomic(&p2m->ioreq.entry_count) )
+            p2m_change_entry_type_global(d, p2m_ioreq_server, p2m_ram_rw);
     }
 
-    return 0;
-
- fail:
-    hvm_ioreq_server_free_rangesets(s);
-
     return rc;
 }
 
-void arch_ioreq_server_enable(struct hvm_ioreq_server *s)
+bool arch_ioreq_server_destroy_all(struct domain *d)
 {
-    hvm_remove_ioreq_gfn(s, false);
-    hvm_remove_ioreq_gfn(s, true);
+    return relocate_portio_handler(d, 0xcf8, 0xcf8, 4);
 }
 
-static void hvm_ioreq_server_enable(struct hvm_ioreq_server *s)
+int arch_ioreq_server_get_type_addr(const struct domain *d,
+                                    const ioreq_t *p,
+                                    uint8_t *type,
+                                    uint64_t *addr)
 {
-    struct hvm_ioreq_vcpu *sv;
+    unsigned int cf8 = d->arch.hvm.pci_cf8;
 
-    spin_lock(&s->lock);
+    if ( p->type != IOREQ_TYPE_COPY && p->type != IOREQ_TYPE_PIO )
+        return -EINVAL;
 
-    if ( s->enabled )
-        goto done;
+    if ( p->type == IOREQ_TYPE_PIO &&
+         (p->addr & ~3) == 0xcfc &&
+         CF8_ENABLED(cf8) )
+    {
+        unsigned int x86_fam, reg;
+        pci_sbdf_t sbdf;
 
-    arch_ioreq_server_enable(s);
+        reg = hvm_pci_decode_addr(cf8, p->addr, &sbdf);
 
-    s->enabled = true;
+        /* PCI config data cycle */
+        *type = XEN_DMOP_IO_RANGE_PCI;
+        *addr = ((uint64_t)sbdf.sbdf << 32) | reg;
+        /* AMD extended configuration space access? */
+        if ( CF8_ADDR_HI(cf8) &&
+             d->arch.cpuid->x86_vendor == X86_VENDOR_AMD &&
+             (x86_fam = get_cpu_family(
+                 d->arch.cpuid->basic.raw_fms, NULL, NULL)) >= 0x10 &&
+             x86_fam < 0x17 )
+        {
+            uint64_t msr_val;
 
-    list_for_each_entry ( sv,
-                          &s->ioreq_vcpu_list,
-                          list_entry )
-        hvm_update_ioreq_evtchn(s, sv);
+            if ( !rdmsr_safe(MSR_AMD64_NB_CFG, msr_val) &&
+                 (msr_val & (1ULL << AMD64_NB_CFG_CF8_EXT_ENABLE_BIT)) )
+                *addr |= CF8_ADDR_HI(cf8);
+        }
+    }
+    else
+    {
+        *type = (p->type == IOREQ_TYPE_PIO) ?
+                 XEN_DMOP_IO_RANGE_PORT : XEN_DMOP_IO_RANGE_MEMORY;
+        *addr = p->addr;
+    }
 
-  done:
-    spin_unlock(&s->lock);
-}
-
-void arch_ioreq_server_disable(struct hvm_ioreq_server *s)
-{
-    hvm_add_ioreq_gfn(s, true);
-    hvm_add_ioreq_gfn(s, false);
-}
-
-static void hvm_ioreq_server_disable(struct hvm_ioreq_server *s)
-{
-    spin_lock(&s->lock);
-
-    if ( !s->enabled )
-        goto done;
-
-    arch_ioreq_server_disable(s);
-
-    s->enabled = false;
-
- done:
-    spin_unlock(&s->lock);
-}
-
-static int hvm_ioreq_server_init(struct hvm_ioreq_server *s,
-                                 struct domain *d, int bufioreq_handling,
-                                 ioservid_t id)
-{
-    struct domain *currd = current->domain;
-    struct vcpu *v;
-    int rc;
-
-    s->target = d;
-
-    get_knownalive_domain(currd);
-    s->emulator = currd;
-
-    spin_lock_init(&s->lock);
-    INIT_LIST_HEAD(&s->ioreq_vcpu_list);
-    spin_lock_init(&s->bufioreq_lock);
-
-    s->ioreq.gfn = INVALID_GFN;
-    s->bufioreq.gfn = INVALID_GFN;
-
-    rc = hvm_ioreq_server_alloc_rangesets(s, id);
-    if ( rc )
-        return rc;
-
-    s->bufioreq_handling = bufioreq_handling;
-
-    for_each_vcpu ( d, v )
-    {
-        rc = hvm_ioreq_server_add_vcpu(s, v);
-        if ( rc )
-            goto fail_add;
-    }
-
-    return 0;
-
- fail_add:
-    hvm_ioreq_server_remove_all_vcpus(s);
-    arch_ioreq_server_unmap_pages(s);
-
-    hvm_ioreq_server_free_rangesets(s);
-
-    put_domain(s->emulator);
-    return rc;
-}
-
-static void hvm_ioreq_server_deinit(struct hvm_ioreq_server *s)
-{
-    ASSERT(!s->enabled);
-    hvm_ioreq_server_remove_all_vcpus(s);
-
-    /*
-     * NOTE: It is safe to call both arch_ioreq_server_unmap_pages() and
-     *       hvm_ioreq_server_free_pages() in that order.
-     *       This is because the former will do nothing if the pages
-     *       are not mapped, leaving the page to be freed by the latter.
-     *       However if the pages are mapped then the former will set
-     *       the page_info pointer to NULL, meaning the latter will do
-     *       nothing.
-     */
-    arch_ioreq_server_unmap_pages(s);
-    hvm_ioreq_server_free_pages(s);
-
-    hvm_ioreq_server_free_rangesets(s);
-
-    put_domain(s->emulator);
-}
-
-int hvm_create_ioreq_server(struct domain *d, int bufioreq_handling,
-                            ioservid_t *id)
-{
-    struct hvm_ioreq_server *s;
-    unsigned int i;
-    int rc;
-
-    if ( bufioreq_handling > HVM_IOREQSRV_BUFIOREQ_ATOMIC )
-        return -EINVAL;
-
-    s = xzalloc(struct hvm_ioreq_server);
-    if ( !s )
-        return -ENOMEM;
-
-    domain_pause(d);
-    spin_lock_recursive(&d->arch.hvm.ioreq_server.lock);
-
-    for ( i = 0; i < MAX_NR_IOREQ_SERVERS; i++ )
-    {
-        if ( !GET_IOREQ_SERVER(d, i) )
-            break;
-    }
-
-    rc = -ENOSPC;
-    if ( i >= MAX_NR_IOREQ_SERVERS )
-        goto fail;
-
-    /*
-     * It is safe to call set_ioreq_server() prior to
-     * hvm_ioreq_server_init() since the target domain is paused.
-     */
-    set_ioreq_server(d, i, s);
-
-    rc = hvm_ioreq_server_init(s, d, bufioreq_handling, i);
-    if ( rc )
-    {
-        set_ioreq_server(d, i, NULL);
-        goto fail;
-    }
-
-    if ( id )
-        *id = i;
-
-    spin_unlock_recursive(&d->arch.hvm.ioreq_server.lock);
-    domain_unpause(d);
-
-    return 0;
-
- fail:
-    spin_unlock_recursive(&d->arch.hvm.ioreq_server.lock);
-    domain_unpause(d);
-
-    xfree(s);
-    return rc;
-}
-
-/* Called when target domain is paused */
-void arch_ioreq_server_destroy(struct hvm_ioreq_server *s)
-{
-    p2m_set_ioreq_server(s->target, 0, s);
-}
-
-int hvm_destroy_ioreq_server(struct domain *d, ioservid_t id)
-{
-    struct hvm_ioreq_server *s;
-    int rc;
-
-    spin_lock_recursive(&d->arch.hvm.ioreq_server.lock);
-
-    s = get_ioreq_server(d, id);
-
-    rc = -ENOENT;
-    if ( !s )
-        goto out;
-
-    rc = -EPERM;
-    if ( s->emulator != current->domain )
-        goto out;
-
-    domain_pause(d);
-
-    arch_ioreq_server_destroy(s);
-
-    hvm_ioreq_server_disable(s);
-
-    /*
-     * It is safe to call hvm_ioreq_server_deinit() prior to
-     * set_ioreq_server() since the target domain is paused.
-     */
-    hvm_ioreq_server_deinit(s);
-    set_ioreq_server(d, id, NULL);
-
-    domain_unpause(d);
-
-    xfree(s);
-
-    rc = 0;
-
- out:
-    spin_unlock_recursive(&d->arch.hvm.ioreq_server.lock);
-
-    return rc;
-}
-
-int hvm_get_ioreq_server_info(struct domain *d, ioservid_t id,
-                              unsigned long *ioreq_gfn,
-                              unsigned long *bufioreq_gfn,
-                              evtchn_port_t *bufioreq_port)
-{
-    struct hvm_ioreq_server *s;
-    int rc;
-
-    spin_lock_recursive(&d->arch.hvm.ioreq_server.lock);
-
-    s = get_ioreq_server(d, id);
-
-    rc = -ENOENT;
-    if ( !s )
-        goto out;
-
-    rc = -EPERM;
-    if ( s->emulator != current->domain )
-        goto out;
-
-    if ( ioreq_gfn || bufioreq_gfn )
-    {
-        rc = arch_ioreq_server_map_pages(s);
-        if ( rc )
-            goto out;
-    }
-
-    if ( ioreq_gfn )
-        *ioreq_gfn = gfn_x(s->ioreq.gfn);
-
-    if ( HANDLE_BUFIOREQ(s) )
-    {
-        if ( bufioreq_gfn )
-            *bufioreq_gfn = gfn_x(s->bufioreq.gfn);
-
-        if ( bufioreq_port )
-            *bufioreq_port = s->bufioreq_evtchn;
-    }
-
-    rc = 0;
-
- out:
-    spin_unlock_recursive(&d->arch.hvm.ioreq_server.lock);
-
-    return rc;
-}
-
-int hvm_get_ioreq_server_frame(struct domain *d, ioservid_t id,
-                               unsigned long idx, mfn_t *mfn)
-{
-    struct hvm_ioreq_server *s;
-    int rc;
-
-    ASSERT(is_hvm_domain(d));
-
-    spin_lock_recursive(&d->arch.hvm.ioreq_server.lock);
-
-    s = get_ioreq_server(d, id);
-
-    rc = -ENOENT;
-    if ( !s )
-        goto out;
-
-    rc = -EPERM;
-    if ( s->emulator != current->domain )
-        goto out;
-
-    rc = hvm_ioreq_server_alloc_pages(s);
-    if ( rc )
-        goto out;
-
-    switch ( idx )
-    {
-    case XENMEM_resource_ioreq_server_frame_bufioreq:
-        rc = -ENOENT;
-        if ( !HANDLE_BUFIOREQ(s) )
-            goto out;
-
-        *mfn = page_to_mfn(s->bufioreq.page);
-        rc = 0;
-        break;
-
-    case XENMEM_resource_ioreq_server_frame_ioreq(0):
-        *mfn = page_to_mfn(s->ioreq.page);
-        rc = 0;
-        break;
-
-    default:
-        rc = -EINVAL;
-        break;
-    }
-
- out:
-    spin_unlock_recursive(&d->arch.hvm.ioreq_server.lock);
-
-    return rc;
-}
-
-int hvm_map_io_range_to_ioreq_server(struct domain *d, ioservid_t id,
-                                     uint32_t type, uint64_t start,
-                                     uint64_t end)
-{
-    struct hvm_ioreq_server *s;
-    struct rangeset *r;
-    int rc;
-
-    if ( start > end )
-        return -EINVAL;
-
-    spin_lock_recursive(&d->arch.hvm.ioreq_server.lock);
-
-    s = get_ioreq_server(d, id);
-
-    rc = -ENOENT;
-    if ( !s )
-        goto out;
-
-    rc = -EPERM;
-    if ( s->emulator != current->domain )
-        goto out;
-
-    switch ( type )
-    {
-    case XEN_DMOP_IO_RANGE_PORT:
-    case XEN_DMOP_IO_RANGE_MEMORY:
-    case XEN_DMOP_IO_RANGE_PCI:
-        r = s->range[type];
-        break;
-
-    default:
-        r = NULL;
-        break;
-    }
-
-    rc = -EINVAL;
-    if ( !r )
-        goto out;
-
-    rc = -EEXIST;
-    if ( rangeset_overlaps_range(r, start, end) )
-        goto out;
-
-    rc = rangeset_add_range(r, start, end);
-
- out:
-    spin_unlock_recursive(&d->arch.hvm.ioreq_server.lock);
-
-    return rc;
-}
-
-int hvm_unmap_io_range_from_ioreq_server(struct domain *d, ioservid_t id,
-                                         uint32_t type, uint64_t start,
-                                         uint64_t end)
-{
-    struct hvm_ioreq_server *s;
-    struct rangeset *r;
-    int rc;
-
-    if ( start > end )
-        return -EINVAL;
-
-    spin_lock_recursive(&d->arch.hvm.ioreq_server.lock);
-
-    s = get_ioreq_server(d, id);
-
-    rc = -ENOENT;
-    if ( !s )
-        goto out;
-
-    rc = -EPERM;
-    if ( s->emulator != current->domain )
-        goto out;
-
-    switch ( type )
-    {
-    case XEN_DMOP_IO_RANGE_PORT:
-    case XEN_DMOP_IO_RANGE_MEMORY:
-    case XEN_DMOP_IO_RANGE_PCI:
-        r = s->range[type];
-        break;
-
-    default:
-        r = NULL;
-        break;
-    }
-
-    rc = -EINVAL;
-    if ( !r )
-        goto out;
-
-    rc = -ENOENT;
-    if ( !rangeset_contains_range(r, start, end) )
-        goto out;
-
-    rc = rangeset_remove_range(r, start, end);
-
- out:
-    spin_unlock_recursive(&d->arch.hvm.ioreq_server.lock);
-
-    return rc;
-}
-
-/* Called with ioreq_server lock held */
-int arch_ioreq_server_map_mem_type(struct domain *d,
-                                   struct hvm_ioreq_server *s,
-                                   uint32_t flags)
-{
-    int rc = p2m_set_ioreq_server(d, flags, s);
-
-    if ( rc == 0 && flags == 0 )
-    {
-        const struct p2m_domain *p2m = p2m_get_hostp2m(d);
-
-        if ( read_atomic(&p2m->ioreq.entry_count) )
-            p2m_change_entry_type_global(d, p2m_ioreq_server, p2m_ram_rw);
-    }
-
-    return rc;
-}
-
-/*
- * Map or unmap an ioreq server to specific memory type. For now, only
- * HVMMEM_ioreq_server is supported, and in the future new types can be
- * introduced, e.g. HVMMEM_ioreq_serverX mapped to ioreq server X. And
- * currently, only write operations are to be forwarded to an ioreq server.
- * Support for the emulation of read operations can be added when an ioreq
- * server has such requirement in the future.
- */
-int hvm_map_mem_type_to_ioreq_server(struct domain *d, ioservid_t id,
-                                     uint32_t type, uint32_t flags)
-{
-    struct hvm_ioreq_server *s;
-    int rc;
-
-    if ( type != HVMMEM_ioreq_server )
-        return -EINVAL;
-
-    if ( flags & ~XEN_DMOP_IOREQ_MEM_ACCESS_WRITE )
-        return -EINVAL;
-
-    spin_lock_recursive(&d->arch.hvm.ioreq_server.lock);
-
-    s = get_ioreq_server(d, id);
-
-    rc = -ENOENT;
-    if ( !s )
-        goto out;
-
-    rc = -EPERM;
-    if ( s->emulator != current->domain )
-        goto out;
-
-    rc = arch_ioreq_server_map_mem_type(d, s, flags);
-
- out:
-    spin_unlock_recursive(&d->arch.hvm.ioreq_server.lock);
-
-    return rc;
-}
-
-int hvm_set_ioreq_server_state(struct domain *d, ioservid_t id,
-                               bool enabled)
-{
-    struct hvm_ioreq_server *s;
-    int rc;
-
-    spin_lock_recursive(&d->arch.hvm.ioreq_server.lock);
-
-    s = get_ioreq_server(d, id);
-
-    rc = -ENOENT;
-    if ( !s )
-        goto out;
-
-    rc = -EPERM;
-    if ( s->emulator != current->domain )
-        goto out;
-
-    domain_pause(d);
-
-    if ( enabled )
-        hvm_ioreq_server_enable(s);
-    else
-        hvm_ioreq_server_disable(s);
-
-    domain_unpause(d);
-
-    rc = 0;
-
- out:
-    spin_unlock_recursive(&d->arch.hvm.ioreq_server.lock);
-    return rc;
-}
-
-int hvm_all_ioreq_servers_add_vcpu(struct domain *d, struct vcpu *v)
-{
-    struct hvm_ioreq_server *s;
-    unsigned int id;
-    int rc;
-
-    spin_lock_recursive(&d->arch.hvm.ioreq_server.lock);
-
-    FOR_EACH_IOREQ_SERVER(d, id, s)
-    {
-        rc = hvm_ioreq_server_add_vcpu(s, v);
-        if ( rc )
-            goto fail;
-    }
-
-    spin_unlock_recursive(&d->arch.hvm.ioreq_server.lock);
-
-    return 0;
-
- fail:
-    while ( ++id != MAX_NR_IOREQ_SERVERS )
-    {
-        s = GET_IOREQ_SERVER(d, id);
-
-        if ( !s )
-            continue;
-
-        hvm_ioreq_server_remove_vcpu(s, v);
-    }
-
-    spin_unlock_recursive(&d->arch.hvm.ioreq_server.lock);
-
-    return rc;
-}
-
-void hvm_all_ioreq_servers_remove_vcpu(struct domain *d, struct vcpu *v)
-{
-    struct hvm_ioreq_server *s;
-    unsigned int id;
-
-    spin_lock_recursive(&d->arch.hvm.ioreq_server.lock);
-
-    FOR_EACH_IOREQ_SERVER(d, id, s)
-        hvm_ioreq_server_remove_vcpu(s, v);
-
-    spin_unlock_recursive(&d->arch.hvm.ioreq_server.lock);
-}
-
-bool arch_ioreq_server_destroy_all(struct domain *d)
-{
-    return relocate_portio_handler(d, 0xcf8, 0xcf8, 4);
-}
-
-void hvm_destroy_all_ioreq_servers(struct domain *d)
-{
-    struct hvm_ioreq_server *s;
-    unsigned int id;
-
-    if ( !arch_ioreq_server_destroy_all(d) )
-        return;
-
-    spin_lock_recursive(&d->arch.hvm.ioreq_server.lock);
-
-    /* No need to domain_pause() as the domain is being torn down */
-
-    FOR_EACH_IOREQ_SERVER(d, id, s)
-    {
-        hvm_ioreq_server_disable(s);
-
-        /*
-         * It is safe to call hvm_ioreq_server_deinit() prior to
-         * set_ioreq_server() since the target domain is being destroyed.
-         */
-        hvm_ioreq_server_deinit(s);
-        set_ioreq_server(d, id, NULL);
-
-        xfree(s);
-    }
-
-    spin_unlock_recursive(&d->arch.hvm.ioreq_server.lock);
-}
-
-int arch_ioreq_server_get_type_addr(const struct domain *d,
-                                    const ioreq_t *p,
-                                    uint8_t *type,
-                                    uint64_t *addr)
-{
-    unsigned int cf8 = d->arch.hvm.pci_cf8;
-
-    if ( p->type != IOREQ_TYPE_COPY && p->type != IOREQ_TYPE_PIO )
-        return -EINVAL;
-
-    if ( p->type == IOREQ_TYPE_PIO &&
-         (p->addr & ~3) == 0xcfc &&
-         CF8_ENABLED(cf8) )
-    {
-        unsigned int x86_fam, reg;
-        pci_sbdf_t sbdf;
-
-        reg = hvm_pci_decode_addr(cf8, p->addr, &sbdf);
-
-        /* PCI config data cycle */
-        *type = XEN_DMOP_IO_RANGE_PCI;
-        *addr = ((uint64_t)sbdf.sbdf << 32) | reg;
-        /* AMD extended configuration space access? */
-        if ( CF8_ADDR_HI(cf8) &&
-             d->arch.cpuid->x86_vendor == X86_VENDOR_AMD &&
-             (x86_fam = get_cpu_family(
-                 d->arch.cpuid->basic.raw_fms, NULL, NULL)) >= 0x10 &&
-             x86_fam < 0x17 )
-        {
-            uint64_t msr_val;
-
-            if ( !rdmsr_safe(MSR_AMD64_NB_CFG, msr_val) &&
-                 (msr_val & (1ULL << AMD64_NB_CFG_CF8_EXT_ENABLE_BIT)) )
-                *addr |= CF8_ADDR_HI(cf8);
-        }
-    }
-    else
-    {
-        *type = (p->type == IOREQ_TYPE_PIO) ?
-                 XEN_DMOP_IO_RANGE_PORT : XEN_DMOP_IO_RANGE_MEMORY;
-        *addr = p->addr;
-    }
-
-    return 0;
-}
-
-struct hvm_ioreq_server *hvm_select_ioreq_server(struct domain *d,
-                                                 ioreq_t *p)
-{
-    struct hvm_ioreq_server *s;
-    uint8_t type;
-    uint64_t addr;
-    unsigned int id;
-
-    if ( arch_ioreq_server_get_type_addr(d, p, &type, &addr) )
-        return NULL;
-
-    FOR_EACH_IOREQ_SERVER(d, id, s)
-    {
-        struct rangeset *r;
-
-        if ( !s->enabled )
-            continue;
-
-        r = s->range[type];
-
-        switch ( type )
-        {
-            unsigned long start, end;
-
-        case XEN_DMOP_IO_RANGE_PORT:
-            start = addr;
-            end = start + p->size - 1;
-            if ( rangeset_contains_range(r, start, end) )
-                return s;
-
-            break;
-
-        case XEN_DMOP_IO_RANGE_MEMORY:
-            start = hvm_mmio_first_byte(p);
-            end = hvm_mmio_last_byte(p);
-
-            if ( rangeset_contains_range(r, start, end) )
-                return s;
-
-            break;
-
-        case XEN_DMOP_IO_RANGE_PCI:
-            if ( rangeset_contains_singleton(r, addr >> 32) )
-            {
-                p->type = IOREQ_TYPE_PCI_CONFIG;
-                p->addr = addr;
-                return s;
-            }
-
-            break;
-        }
-    }
-
-    return NULL;
-}
-
-static int hvm_send_buffered_ioreq(struct hvm_ioreq_server *s, ioreq_t *p)
-{
-    struct domain *d = current->domain;
-    struct hvm_ioreq_page *iorp;
-    buffered_iopage_t *pg;
-    buf_ioreq_t bp = { .data = p->data,
-                       .addr = p->addr,
-                       .type = p->type,
-                       .dir = p->dir };
-    /* Timeoffset sends 64b data, but no address. Use two consecutive slots. */
-    int qw = 0;
-
-    /* Ensure buffered_iopage fits in a page */
-    BUILD_BUG_ON(sizeof(buffered_iopage_t) > PAGE_SIZE);
-
-    iorp = &s->bufioreq;
-    pg = iorp->va;
-
-    if ( !pg )
-        return IOREQ_STATUS_UNHANDLED;
-
-    /*
-     * Return 0 for the cases we can't deal with:
-     *  - 'addr' is only a 20-bit field, so we cannot address beyond 1MB
-     *  - we cannot buffer accesses to guest memory buffers, as the guest
-     *    may expect the memory buffer to be synchronously accessed
-     *  - the count field is usually used with data_is_ptr and since we don't
-     *    support data_is_ptr we do not waste space for the count field either
-     */
-    if ( (p->addr > 0xffffful) || p->data_is_ptr || (p->count != 1) )
-        return 0;
-
-    switch ( p->size )
-    {
-    case 1:
-        bp.size = 0;
-        break;
-    case 2:
-        bp.size = 1;
-        break;
-    case 4:
-        bp.size = 2;
-        break;
-    case 8:
-        bp.size = 3;
-        qw = 1;
-        break;
-    default:
-        gdprintk(XENLOG_WARNING, "unexpected ioreq size: %u\n", p->size);
-        return IOREQ_STATUS_UNHANDLED;
-    }
-
-    spin_lock(&s->bufioreq_lock);
-
-    if ( (pg->ptrs.write_pointer - pg->ptrs.read_pointer) >=
-         (IOREQ_BUFFER_SLOT_NUM - qw) )
-    {
-        /* The queue is full: send the iopacket through the normal path. */
-        spin_unlock(&s->bufioreq_lock);
-        return IOREQ_STATUS_UNHANDLED;
-    }
-
-    pg->buf_ioreq[pg->ptrs.write_pointer % IOREQ_BUFFER_SLOT_NUM] = bp;
-
-    if ( qw )
-    {
-        bp.data = p->data >> 32;
-        pg->buf_ioreq[(pg->ptrs.write_pointer+1) % IOREQ_BUFFER_SLOT_NUM] = bp;
-    }
-
-    /* Make the ioreq_t visible /before/ write_pointer. */
-    smp_wmb();
-    pg->ptrs.write_pointer += qw ? 2 : 1;
-
-    /* Canonicalize read/write pointers to prevent their overflow. */
-    while ( (s->bufioreq_handling == HVM_IOREQSRV_BUFIOREQ_ATOMIC) &&
-            qw++ < IOREQ_BUFFER_SLOT_NUM &&
-            pg->ptrs.read_pointer >= IOREQ_BUFFER_SLOT_NUM )
-    {
-        union bufioreq_pointers old = pg->ptrs, new;
-        unsigned int n = old.read_pointer / IOREQ_BUFFER_SLOT_NUM;
-
-        new.read_pointer = old.read_pointer - n * IOREQ_BUFFER_SLOT_NUM;
-        new.write_pointer = old.write_pointer - n * IOREQ_BUFFER_SLOT_NUM;
-        cmpxchg(&pg->ptrs.full, old.full, new.full);
-    }
-
-    notify_via_xen_event_channel(d, s->bufioreq_evtchn);
-    spin_unlock(&s->bufioreq_lock);
-
-    return IOREQ_STATUS_HANDLED;
-}
-
-int hvm_send_ioreq(struct hvm_ioreq_server *s, ioreq_t *proto_p,
-                   bool buffered)
-{
-    struct vcpu *curr = current;
-    struct domain *d = curr->domain;
-    struct hvm_ioreq_vcpu *sv;
-
-    ASSERT(s);
-
-    if ( buffered )
-        return hvm_send_buffered_ioreq(s, proto_p);
-
-    if ( unlikely(!vcpu_start_shutdown_deferral(curr)) )
-        return IOREQ_STATUS_RETRY;
-
-    list_for_each_entry ( sv,
-                          &s->ioreq_vcpu_list,
-                          list_entry )
-    {
-        if ( sv->vcpu == curr )
-        {
-            evtchn_port_t port = sv->ioreq_evtchn;
-            ioreq_t *p = get_ioreq(s, curr);
-
-            if ( unlikely(p->state != STATE_IOREQ_NONE) )
-            {
-                gprintk(XENLOG_ERR, "device model set bad IO state %d\n",
-                        p->state);
-                break;
-            }
-
-            if ( unlikely(p->vp_eport != port) )
-            {
-                gprintk(XENLOG_ERR, "device model set bad event channel %d\n",
-                        p->vp_eport);
-                break;
-            }
-
-            proto_p->state = STATE_IOREQ_NONE;
-            proto_p->vp_eport = port;
-            *p = *proto_p;
-
-            prepare_wait_on_xen_event_channel(port);
-
-            /*
-             * Following happens /after/ blocking and setting up ioreq
-             * contents. prepare_wait_on_xen_event_channel() is an implicit
-             * barrier.
-             */
-            p->state = STATE_IOREQ_READY;
-            notify_via_xen_event_channel(d, port);
-
-            sv->pending = true;
-            return IOREQ_STATUS_RETRY;
-        }
-    }
-
-    return IOREQ_STATUS_UNHANDLED;
-}
-
-unsigned int hvm_broadcast_ioreq(ioreq_t *p, bool buffered)
-{
-    struct domain *d = current->domain;
-    struct hvm_ioreq_server *s;
-    unsigned int id, failed = 0;
-
-    FOR_EACH_IOREQ_SERVER(d, id, s)
-    {
-        if ( !s->enabled )
-            continue;
-
-        if ( hvm_send_ioreq(s, p, buffered) == IOREQ_STATUS_UNHANDLED )
-            failed++;
-    }
-
-    return failed;
+    return 0;
 }
 
 static int hvm_access_cf8(
@@ -1574,13 +339,6 @@ void arch_ioreq_domain_init(struct domain *d)
     register_portio_handler(d, 0xcf8, 4, hvm_access_cf8);
 }
 
-void hvm_ioreq_init(struct domain *d)
-{
-    spin_lock_init(&d->arch.hvm.ioreq_server.lock);
-
-    arch_ioreq_domain_init(d);
-}
-
 /*
  * Local variables:
  * mode: C
diff --git a/xen/arch/x86/mm.c b/xen/arch/x86/mm.c
index 5a50339..e4638ef 100644
--- a/xen/arch/x86/mm.c
+++ b/xen/arch/x86/mm.c
@@ -100,6 +100,7 @@
  */
 
 #include <xen/init.h>
+#include <xen/ioreq.h>
 #include <xen/kernel.h>
 #include <xen/lib.h>
 #include <xen/mm.h>
@@ -141,7 +142,6 @@
 #include <asm/io_apic.h>
 #include <asm/pci.h>
 #include <asm/guest.h>
-#include <asm/hvm/ioreq.h>
 
 #include <asm/hvm/grant_table.h>
 #include <asm/pv/domain.h>
diff --git a/xen/arch/x86/mm/shadow/common.c b/xen/arch/x86/mm/shadow/common.c
index a33e100..f7d74d3 100644
--- a/xen/arch/x86/mm/shadow/common.c
+++ b/xen/arch/x86/mm/shadow/common.c
@@ -20,6 +20,7 @@
  * along with this program; If not, see <http://www.gnu.org/licenses/>.
  */
 
+#include <xen/ioreq.h>
 #include <xen/types.h>
 #include <xen/mm.h>
 #include <xen/trace.h>
@@ -34,7 +35,6 @@
 #include <asm/current.h>
 #include <asm/flushtlb.h>
 #include <asm/shadow.h>
-#include <asm/hvm/ioreq.h>
 #include <xen/numa.h>
 #include "private.h"
 
diff --git a/xen/common/Kconfig b/xen/common/Kconfig
index 3e2cf25..c971ded 100644
--- a/xen/common/Kconfig
+++ b/xen/common/Kconfig
@@ -139,6 +139,9 @@ config HYPFS_CONFIG
 	  Disable this option in case you want to spare some memory or you
 	  want to hide the .config contents from dom0.
 
+config IOREQ_SERVER
+	bool
+
 config KEXEC
 	bool "kexec support"
 	default y
diff --git a/xen/common/Makefile b/xen/common/Makefile
index d109f27..c0e91c4 100644
--- a/xen/common/Makefile
+++ b/xen/common/Makefile
@@ -15,6 +15,7 @@ obj-$(CONFIG_GRANT_TABLE) += grant_table.o
 obj-y += guestcopy.o
 obj-bin-y += gunzip.init.o
 obj-$(CONFIG_HYPFS) += hypfs.o
+obj-$(CONFIG_IOREQ_SERVER) += ioreq.o
 obj-y += irq.o
 obj-y += kernel.o
 obj-y += keyhandler.o
diff --git a/xen/common/ioreq.c b/xen/common/ioreq.c
new file mode 100644
index 0000000..13ea959
--- /dev/null
+++ b/xen/common/ioreq.c
@@ -0,0 +1,1287 @@
+/*
+ * ioreq.c: hardware virtual machine I/O emulation
+ *
+ * Copyright (c) 2016 Citrix Systems Inc.
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms and conditions of the GNU General Public License,
+ * version 2, as published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope it will be useful, but WITHOUT
+ * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
+ * FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License for
+ * more details.
+ *
+ * You should have received a copy of the GNU General Public License along with
+ * this program; If not, see <http://www.gnu.org/licenses/>.
+ */
+
+#include <xen/ctype.h>
+#include <xen/domain.h>
+#include <xen/domain_page.h>
+#include <xen/event.h>
+#include <xen/init.h>
+#include <xen/irq.h>
+#include <xen/lib.h>
+#include <xen/paging.h>
+#include <xen/sched.h>
+#include <xen/softirq.h>
+#include <xen/trace.h>
+#include <xen/vpci.h>
+
+#include <asm/hvm/ioreq.h>
+
+#include <public/hvm/ioreq.h>
+#include <public/hvm/params.h>
+
+static void set_ioreq_server(struct domain *d, unsigned int id,
+                             struct hvm_ioreq_server *s)
+{
+    ASSERT(id < MAX_NR_IOREQ_SERVERS);
+    ASSERT(!s || !d->arch.hvm.ioreq_server.server[id]);
+
+    d->arch.hvm.ioreq_server.server[id] = s;
+}
+
+#define GET_IOREQ_SERVER(d, id) \
+    (d)->arch.hvm.ioreq_server.server[id]
+
+static struct hvm_ioreq_server *get_ioreq_server(const struct domain *d,
+                                                 unsigned int id)
+{
+    if ( id >= MAX_NR_IOREQ_SERVERS )
+        return NULL;
+
+    return GET_IOREQ_SERVER(d, id);
+}
+
+/*
+ * Iterate over all possible ioreq servers.
+ *
+ * NOTE: The iteration is backwards such that more recently created
+ *       ioreq servers are favoured in hvm_select_ioreq_server().
+ *       This is a semantic that previously existed when ioreq servers
+ *       were held in a linked list.
+ */
+#define FOR_EACH_IOREQ_SERVER(d, id, s) \
+    for ( (id) = MAX_NR_IOREQ_SERVERS; (id) != 0; ) \
+        if ( !(s = GET_IOREQ_SERVER(d, --(id))) ) \
+            continue; \
+        else
+
+static ioreq_t *get_ioreq(struct hvm_ioreq_server *s, struct vcpu *v)
+{
+    shared_iopage_t *p = s->ioreq.va;
+
+    ASSERT((v == current) || !vcpu_runnable(v));
+    ASSERT(p != NULL);
+
+    return &p->vcpu_ioreq[v->vcpu_id];
+}
+
+static struct hvm_ioreq_vcpu *get_pending_vcpu(const struct vcpu *v,
+                                               struct hvm_ioreq_server **srvp)
+{
+    struct domain *d = v->domain;
+    struct hvm_ioreq_server *s;
+    unsigned int id;
+
+    FOR_EACH_IOREQ_SERVER(d, id, s)
+    {
+        struct hvm_ioreq_vcpu *sv;
+
+        list_for_each_entry ( sv,
+                              &s->ioreq_vcpu_list,
+                              list_entry )
+        {
+            if ( sv->vcpu == v && sv->pending )
+            {
+                if ( srvp )
+                    *srvp = s;
+                return sv;
+            }
+        }
+    }
+
+    return NULL;
+}
+
+bool hvm_io_pending(struct vcpu *v)
+{
+    return get_pending_vcpu(v, NULL);
+}
+
+static bool hvm_wait_for_io(struct hvm_ioreq_vcpu *sv, ioreq_t *p)
+{
+    unsigned int prev_state = STATE_IOREQ_NONE;
+    unsigned int state = p->state;
+    uint64_t data = ~0;
+
+    smp_rmb();
+
+    /*
+     * The only reason we should see this condition be false is when an
+     * emulator dying races with I/O being requested.
+     */
+    while ( likely(state != STATE_IOREQ_NONE) )
+    {
+        if ( unlikely(state < prev_state) )
+        {
+            gdprintk(XENLOG_ERR, "Weird HVM ioreq state transition %u -> %u\n",
+                     prev_state, state);
+            sv->pending = false;
+            domain_crash(sv->vcpu->domain);
+            return false; /* bail */
+        }
+
+        switch ( prev_state = state )
+        {
+        case STATE_IORESP_READY: /* IORESP_READY -> NONE */
+            p->state = STATE_IOREQ_NONE;
+            data = p->data;
+            break;
+
+        case STATE_IOREQ_READY:  /* IOREQ_{READY,INPROCESS} -> IORESP_READY */
+        case STATE_IOREQ_INPROCESS:
+            wait_on_xen_event_channel(sv->ioreq_evtchn,
+                                      ({ state = p->state;
+                                         smp_rmb();
+                                         state != prev_state; }));
+            continue;
+
+        default:
+            gdprintk(XENLOG_ERR, "Weird HVM iorequest state %u\n", state);
+            sv->pending = false;
+            domain_crash(sv->vcpu->domain);
+            return false; /* bail */
+        }
+
+        break;
+    }
+
+    p = &sv->vcpu->arch.hvm.hvm_io.io_req;
+    if ( hvm_ioreq_needs_completion(p) )
+        p->data = data;
+
+    sv->pending = false;
+
+    return true;
+}
+
+bool handle_hvm_io_completion(struct vcpu *v)
+{
+    struct domain *d = v->domain;
+    struct hvm_vcpu_io *vio = &v->arch.hvm.hvm_io;
+    struct hvm_ioreq_server *s;
+    struct hvm_ioreq_vcpu *sv;
+    enum hvm_io_completion io_completion;
+
+    if ( has_vpci(d) && vpci_process_pending(v) )
+    {
+        raise_softirq(SCHEDULE_SOFTIRQ);
+        return false;
+    }
+
+    sv = get_pending_vcpu(v, &s);
+    if ( sv && !hvm_wait_for_io(sv, get_ioreq(s, v)) )
+        return false;
+
+    vio->io_req.state = hvm_ioreq_needs_completion(&vio->io_req) ?
+        STATE_IORESP_READY : STATE_IOREQ_NONE;
+
+    msix_write_completion(v);
+    vcpu_end_shutdown_deferral(v);
+
+    io_completion = vio->io_completion;
+    vio->io_completion = HVMIO_no_completion;
+
+    switch ( io_completion )
+    {
+    case HVMIO_no_completion:
+        break;
+
+    case HVMIO_mmio_completion:
+        return ioreq_complete_mmio();
+
+    case HVMIO_pio_completion:
+        return handle_pio(vio->io_req.addr, vio->io_req.size,
+                          vio->io_req.dir);
+
+    default:
+        return arch_vcpu_ioreq_completion(io_completion);
+    }
+
+    return true;
+}
+
+static int hvm_alloc_ioreq_mfn(struct hvm_ioreq_server *s, bool buf)
+{
+    struct hvm_ioreq_page *iorp = buf ? &s->bufioreq : &s->ioreq;
+    struct page_info *page;
+
+    if ( iorp->page )
+    {
+        /*
+         * If a guest frame has already been mapped (which may happen
+         * on demand if hvm_get_ioreq_server_info() is called), then
+         * allocating a page is not permitted.
+         */
+        if ( !gfn_eq(iorp->gfn, INVALID_GFN) )
+            return -EPERM;
+
+        return 0;
+    }
+
+    page = alloc_domheap_page(s->target, MEMF_no_refcount);
+
+    if ( !page )
+        return -ENOMEM;
+
+    if ( !get_page_and_type(page, s->target, PGT_writable_page) )
+    {
+        /*
+         * The domain can't possibly know about this page yet, so failure
+         * here is a clear indication of something fishy going on.
+         */
+        domain_crash(s->emulator);
+        return -ENODATA;
+    }
+
+    iorp->va = __map_domain_page_global(page);
+    if ( !iorp->va )
+        goto fail;
+
+    iorp->page = page;
+    clear_page(iorp->va);
+    return 0;
+
+ fail:
+    put_page_alloc_ref(page);
+    put_page_and_type(page);
+
+    return -ENOMEM;
+}
+
+static void hvm_free_ioreq_mfn(struct hvm_ioreq_server *s, bool buf)
+{
+    struct hvm_ioreq_page *iorp = buf ? &s->bufioreq : &s->ioreq;
+    struct page_info *page = iorp->page;
+
+    if ( !page )
+        return;
+
+    iorp->page = NULL;
+
+    unmap_domain_page_global(iorp->va);
+    iorp->va = NULL;
+
+    put_page_alloc_ref(page);
+    put_page_and_type(page);
+}
+
+bool is_ioreq_server_page(struct domain *d, const struct page_info *page)
+{
+    const struct hvm_ioreq_server *s;
+    unsigned int id;
+    bool found = false;
+
+    spin_lock_recursive(&d->arch.hvm.ioreq_server.lock);
+
+    FOR_EACH_IOREQ_SERVER(d, id, s)
+    {
+        if ( (s->ioreq.page == page) || (s->bufioreq.page == page) )
+        {
+            found = true;
+            break;
+        }
+    }
+
+    spin_unlock_recursive(&d->arch.hvm.ioreq_server.lock);
+
+    return found;
+}
+
+static void hvm_update_ioreq_evtchn(struct hvm_ioreq_server *s,
+                                    struct hvm_ioreq_vcpu *sv)
+{
+    ASSERT(spin_is_locked(&s->lock));
+
+    if ( s->ioreq.va != NULL )
+    {
+        ioreq_t *p = get_ioreq(s, sv->vcpu);
+
+        p->vp_eport = sv->ioreq_evtchn;
+    }
+}
+
+static int hvm_ioreq_server_add_vcpu(struct hvm_ioreq_server *s,
+                                     struct vcpu *v)
+{
+    struct hvm_ioreq_vcpu *sv;
+    int rc;
+
+    sv = xzalloc(struct hvm_ioreq_vcpu);
+
+    rc = -ENOMEM;
+    if ( !sv )
+        goto fail1;
+
+    spin_lock(&s->lock);
+
+    rc = alloc_unbound_xen_event_channel(v->domain, v->vcpu_id,
+                                         s->emulator->domain_id, NULL);
+    if ( rc < 0 )
+        goto fail2;
+
+    sv->ioreq_evtchn = rc;
+
+    if ( v->vcpu_id == 0 && HANDLE_BUFIOREQ(s) )
+    {
+        rc = alloc_unbound_xen_event_channel(v->domain, 0,
+                                             s->emulator->domain_id, NULL);
+        if ( rc < 0 )
+            goto fail3;
+
+        s->bufioreq_evtchn = rc;
+    }
+
+    sv->vcpu = v;
+
+    list_add(&sv->list_entry, &s->ioreq_vcpu_list);
+
+    if ( s->enabled )
+        hvm_update_ioreq_evtchn(s, sv);
+
+    spin_unlock(&s->lock);
+    return 0;
+
+ fail3:
+    free_xen_event_channel(v->domain, sv->ioreq_evtchn);
+
+ fail2:
+    spin_unlock(&s->lock);
+    xfree(sv);
+
+ fail1:
+    return rc;
+}
+
+static void hvm_ioreq_server_remove_vcpu(struct hvm_ioreq_server *s,
+                                         struct vcpu *v)
+{
+    struct hvm_ioreq_vcpu *sv;
+
+    spin_lock(&s->lock);
+
+    list_for_each_entry ( sv,
+                          &s->ioreq_vcpu_list,
+                          list_entry )
+    {
+        if ( sv->vcpu != v )
+            continue;
+
+        list_del(&sv->list_entry);
+
+        if ( v->vcpu_id == 0 && HANDLE_BUFIOREQ(s) )
+            free_xen_event_channel(v->domain, s->bufioreq_evtchn);
+
+        free_xen_event_channel(v->domain, sv->ioreq_evtchn);
+
+        xfree(sv);
+        break;
+    }
+
+    spin_unlock(&s->lock);
+}
+
+static void hvm_ioreq_server_remove_all_vcpus(struct hvm_ioreq_server *s)
+{
+    struct hvm_ioreq_vcpu *sv, *next;
+
+    spin_lock(&s->lock);
+
+    list_for_each_entry_safe ( sv,
+                               next,
+                               &s->ioreq_vcpu_list,
+                               list_entry )
+    {
+        struct vcpu *v = sv->vcpu;
+
+        list_del(&sv->list_entry);
+
+        if ( v->vcpu_id == 0 && HANDLE_BUFIOREQ(s) )
+            free_xen_event_channel(v->domain, s->bufioreq_evtchn);
+
+        free_xen_event_channel(v->domain, sv->ioreq_evtchn);
+
+        xfree(sv);
+    }
+
+    spin_unlock(&s->lock);
+}
+
+static int hvm_ioreq_server_alloc_pages(struct hvm_ioreq_server *s)
+{
+    int rc;
+
+    rc = hvm_alloc_ioreq_mfn(s, false);
+
+    if ( !rc && (s->bufioreq_handling != HVM_IOREQSRV_BUFIOREQ_OFF) )
+        rc = hvm_alloc_ioreq_mfn(s, true);
+
+    if ( rc )
+        hvm_free_ioreq_mfn(s, false);
+
+    return rc;
+}
+
+static void hvm_ioreq_server_free_pages(struct hvm_ioreq_server *s)
+{
+    hvm_free_ioreq_mfn(s, true);
+    hvm_free_ioreq_mfn(s, false);
+}
+
+static void hvm_ioreq_server_free_rangesets(struct hvm_ioreq_server *s)
+{
+    unsigned int i;
+
+    for ( i = 0; i < NR_IO_RANGE_TYPES; i++ )
+        rangeset_destroy(s->range[i]);
+}
+
+static int hvm_ioreq_server_alloc_rangesets(struct hvm_ioreq_server *s,
+                                            ioservid_t id)
+{
+    unsigned int i;
+    int rc;
+
+    for ( i = 0; i < NR_IO_RANGE_TYPES; i++ )
+    {
+        char *name;
+
+        rc = asprintf(&name, "ioreq_server %d %s", id,
+                      (i == XEN_DMOP_IO_RANGE_PORT) ? "port" :
+                      (i == XEN_DMOP_IO_RANGE_MEMORY) ? "memory" :
+                      (i == XEN_DMOP_IO_RANGE_PCI) ? "pci" :
+                      "");
+        if ( rc )
+            goto fail;
+
+        s->range[i] = rangeset_new(s->target, name,
+                                   RANGESETF_prettyprint_hex);
+
+        xfree(name);
+
+        rc = -ENOMEM;
+        if ( !s->range[i] )
+            goto fail;
+
+        rangeset_limit(s->range[i], MAX_NR_IO_RANGES);
+    }
+
+    return 0;
+
+ fail:
+    hvm_ioreq_server_free_rangesets(s);
+
+    return rc;
+}
+
+static void hvm_ioreq_server_enable(struct hvm_ioreq_server *s)
+{
+    struct hvm_ioreq_vcpu *sv;
+
+    spin_lock(&s->lock);
+
+    if ( s->enabled )
+        goto done;
+
+    arch_ioreq_server_enable(s);
+
+    s->enabled = true;
+
+    list_for_each_entry ( sv,
+                          &s->ioreq_vcpu_list,
+                          list_entry )
+        hvm_update_ioreq_evtchn(s, sv);
+
+  done:
+    spin_unlock(&s->lock);
+}
+
+static void hvm_ioreq_server_disable(struct hvm_ioreq_server *s)
+{
+    spin_lock(&s->lock);
+
+    if ( !s->enabled )
+        goto done;
+
+    arch_ioreq_server_disable(s);
+
+    s->enabled = false;
+
+ done:
+    spin_unlock(&s->lock);
+}
+
+static int hvm_ioreq_server_init(struct hvm_ioreq_server *s,
+                                 struct domain *d, int bufioreq_handling,
+                                 ioservid_t id)
+{
+    struct domain *currd = current->domain;
+    struct vcpu *v;
+    int rc;
+
+    s->target = d;
+
+    get_knownalive_domain(currd);
+    s->emulator = currd;
+
+    spin_lock_init(&s->lock);
+    INIT_LIST_HEAD(&s->ioreq_vcpu_list);
+    spin_lock_init(&s->bufioreq_lock);
+
+    s->ioreq.gfn = INVALID_GFN;
+    s->bufioreq.gfn = INVALID_GFN;
+
+    rc = hvm_ioreq_server_alloc_rangesets(s, id);
+    if ( rc )
+        return rc;
+
+    s->bufioreq_handling = bufioreq_handling;
+
+    for_each_vcpu ( d, v )
+    {
+        rc = hvm_ioreq_server_add_vcpu(s, v);
+        if ( rc )
+            goto fail_add;
+    }
+
+    return 0;
+
+ fail_add:
+    hvm_ioreq_server_remove_all_vcpus(s);
+    arch_ioreq_server_unmap_pages(s);
+
+    hvm_ioreq_server_free_rangesets(s);
+
+    put_domain(s->emulator);
+    return rc;
+}
+
+static void hvm_ioreq_server_deinit(struct hvm_ioreq_server *s)
+{
+    ASSERT(!s->enabled);
+    hvm_ioreq_server_remove_all_vcpus(s);
+
+    /*
+     * NOTE: It is safe to call both arch_ioreq_server_unmap_pages() and
+     *       hvm_ioreq_server_free_pages() in that order.
+     *       This is because the former will do nothing if the pages
+     *       are not mapped, leaving the page to be freed by the latter.
+     *       However if the pages are mapped then the former will set
+     *       the page_info pointer to NULL, meaning the latter will do
+     *       nothing.
+     */
+    arch_ioreq_server_unmap_pages(s);
+    hvm_ioreq_server_free_pages(s);
+
+    hvm_ioreq_server_free_rangesets(s);
+
+    put_domain(s->emulator);
+}
+
+int hvm_create_ioreq_server(struct domain *d, int bufioreq_handling,
+                            ioservid_t *id)
+{
+    struct hvm_ioreq_server *s;
+    unsigned int i;
+    int rc;
+
+    if ( bufioreq_handling > HVM_IOREQSRV_BUFIOREQ_ATOMIC )
+        return -EINVAL;
+
+    s = xzalloc(struct hvm_ioreq_server);
+    if ( !s )
+        return -ENOMEM;
+
+    domain_pause(d);
+    spin_lock_recursive(&d->arch.hvm.ioreq_server.lock);
+
+    for ( i = 0; i < MAX_NR_IOREQ_SERVERS; i++ )
+    {
+        if ( !GET_IOREQ_SERVER(d, i) )
+            break;
+    }
+
+    rc = -ENOSPC;
+    if ( i >= MAX_NR_IOREQ_SERVERS )
+        goto fail;
+
+    /*
+     * It is safe to call set_ioreq_server() prior to
+     * hvm_ioreq_server_init() since the target domain is paused.
+     */
+    set_ioreq_server(d, i, s);
+
+    rc = hvm_ioreq_server_init(s, d, bufioreq_handling, i);
+    if ( rc )
+    {
+        set_ioreq_server(d, i, NULL);
+        goto fail;
+    }
+
+    if ( id )
+        *id = i;
+
+    spin_unlock_recursive(&d->arch.hvm.ioreq_server.lock);
+    domain_unpause(d);
+
+    return 0;
+
+ fail:
+    spin_unlock_recursive(&d->arch.hvm.ioreq_server.lock);
+    domain_unpause(d);
+
+    xfree(s);
+    return rc;
+}
+
+int hvm_destroy_ioreq_server(struct domain *d, ioservid_t id)
+{
+    struct hvm_ioreq_server *s;
+    int rc;
+
+    spin_lock_recursive(&d->arch.hvm.ioreq_server.lock);
+
+    s = get_ioreq_server(d, id);
+
+    rc = -ENOENT;
+    if ( !s )
+        goto out;
+
+    rc = -EPERM;
+    if ( s->emulator != current->domain )
+        goto out;
+
+    domain_pause(d);
+
+    arch_ioreq_server_destroy(s);
+
+    hvm_ioreq_server_disable(s);
+
+    /*
+     * It is safe to call hvm_ioreq_server_deinit() prior to
+     * set_ioreq_server() since the target domain is paused.
+     */
+    hvm_ioreq_server_deinit(s);
+    set_ioreq_server(d, id, NULL);
+
+    domain_unpause(d);
+
+    xfree(s);
+
+    rc = 0;
+
+ out:
+    spin_unlock_recursive(&d->arch.hvm.ioreq_server.lock);
+
+    return rc;
+}
+
+int hvm_get_ioreq_server_info(struct domain *d, ioservid_t id,
+                              unsigned long *ioreq_gfn,
+                              unsigned long *bufioreq_gfn,
+                              evtchn_port_t *bufioreq_port)
+{
+    struct hvm_ioreq_server *s;
+    int rc;
+
+    spin_lock_recursive(&d->arch.hvm.ioreq_server.lock);
+
+    s = get_ioreq_server(d, id);
+
+    rc = -ENOENT;
+    if ( !s )
+        goto out;
+
+    rc = -EPERM;
+    if ( s->emulator != current->domain )
+        goto out;
+
+    if ( ioreq_gfn || bufioreq_gfn )
+    {
+        rc = arch_ioreq_server_map_pages(s);
+        if ( rc )
+            goto out;
+    }
+
+    if ( ioreq_gfn )
+        *ioreq_gfn = gfn_x(s->ioreq.gfn);
+
+    if ( HANDLE_BUFIOREQ(s) )
+    {
+        if ( bufioreq_gfn )
+            *bufioreq_gfn = gfn_x(s->bufioreq.gfn);
+
+        if ( bufioreq_port )
+            *bufioreq_port = s->bufioreq_evtchn;
+    }
+
+    rc = 0;
+
+ out:
+    spin_unlock_recursive(&d->arch.hvm.ioreq_server.lock);
+
+    return rc;
+}
+
+int hvm_get_ioreq_server_frame(struct domain *d, ioservid_t id,
+                               unsigned long idx, mfn_t *mfn)
+{
+    struct hvm_ioreq_server *s;
+    int rc;
+
+    ASSERT(is_hvm_domain(d));
+
+    spin_lock_recursive(&d->arch.hvm.ioreq_server.lock);
+
+    s = get_ioreq_server(d, id);
+
+    rc = -ENOENT;
+    if ( !s )
+        goto out;
+
+    rc = -EPERM;
+    if ( s->emulator != current->domain )
+        goto out;
+
+    rc = hvm_ioreq_server_alloc_pages(s);
+    if ( rc )
+        goto out;
+
+    switch ( idx )
+    {
+    case XENMEM_resource_ioreq_server_frame_bufioreq:
+        rc = -ENOENT;
+        if ( !HANDLE_BUFIOREQ(s) )
+            goto out;
+
+        *mfn = page_to_mfn(s->bufioreq.page);
+        rc = 0;
+        break;
+
+    case XENMEM_resource_ioreq_server_frame_ioreq(0):
+        *mfn = page_to_mfn(s->ioreq.page);
+        rc = 0;
+        break;
+
+    default:
+        rc = -EINVAL;
+        break;
+    }
+
+ out:
+    spin_unlock_recursive(&d->arch.hvm.ioreq_server.lock);
+
+    return rc;
+}
+
+int hvm_map_io_range_to_ioreq_server(struct domain *d, ioservid_t id,
+                                     uint32_t type, uint64_t start,
+                                     uint64_t end)
+{
+    struct hvm_ioreq_server *s;
+    struct rangeset *r;
+    int rc;
+
+    if ( start > end )
+        return -EINVAL;
+
+    spin_lock_recursive(&d->arch.hvm.ioreq_server.lock);
+
+    s = get_ioreq_server(d, id);
+
+    rc = -ENOENT;
+    if ( !s )
+        goto out;
+
+    rc = -EPERM;
+    if ( s->emulator != current->domain )
+        goto out;
+
+    switch ( type )
+    {
+    case XEN_DMOP_IO_RANGE_PORT:
+    case XEN_DMOP_IO_RANGE_MEMORY:
+    case XEN_DMOP_IO_RANGE_PCI:
+        r = s->range[type];
+        break;
+
+    default:
+        r = NULL;
+        break;
+    }
+
+    rc = -EINVAL;
+    if ( !r )
+        goto out;
+
+    rc = -EEXIST;
+    if ( rangeset_overlaps_range(r, start, end) )
+        goto out;
+
+    rc = rangeset_add_range(r, start, end);
+
+ out:
+    spin_unlock_recursive(&d->arch.hvm.ioreq_server.lock);
+
+    return rc;
+}
+
+int hvm_unmap_io_range_from_ioreq_server(struct domain *d, ioservid_t id,
+                                         uint32_t type, uint64_t start,
+                                         uint64_t end)
+{
+    struct hvm_ioreq_server *s;
+    struct rangeset *r;
+    int rc;
+
+    if ( start > end )
+        return -EINVAL;
+
+    spin_lock_recursive(&d->arch.hvm.ioreq_server.lock);
+
+    s = get_ioreq_server(d, id);
+
+    rc = -ENOENT;
+    if ( !s )
+        goto out;
+
+    rc = -EPERM;
+    if ( s->emulator != current->domain )
+        goto out;
+
+    switch ( type )
+    {
+    case XEN_DMOP_IO_RANGE_PORT:
+    case XEN_DMOP_IO_RANGE_MEMORY:
+    case XEN_DMOP_IO_RANGE_PCI:
+        r = s->range[type];
+        break;
+
+    default:
+        r = NULL;
+        break;
+    }
+
+    rc = -EINVAL;
+    if ( !r )
+        goto out;
+
+    rc = -ENOENT;
+    if ( !rangeset_contains_range(r, start, end) )
+        goto out;
+
+    rc = rangeset_remove_range(r, start, end);
+
+ out:
+    spin_unlock_recursive(&d->arch.hvm.ioreq_server.lock);
+
+    return rc;
+}
+
+/*
+ * Map or unmap an ioreq server to specific memory type. For now, only
+ * HVMMEM_ioreq_server is supported, and in the future new types can be
+ * introduced, e.g. HVMMEM_ioreq_serverX mapped to ioreq server X. And
+ * currently, only write operations are to be forwarded to an ioreq server.
+ * Support for the emulation of read operations can be added when an ioreq
+ * server has such requirement in the future.
+ */
+int hvm_map_mem_type_to_ioreq_server(struct domain *d, ioservid_t id,
+                                     uint32_t type, uint32_t flags)
+{
+    struct hvm_ioreq_server *s;
+    int rc;
+
+    if ( type != HVMMEM_ioreq_server )
+        return -EINVAL;
+
+    if ( flags & ~XEN_DMOP_IOREQ_MEM_ACCESS_WRITE )
+        return -EINVAL;
+
+    spin_lock_recursive(&d->arch.hvm.ioreq_server.lock);
+
+    s = get_ioreq_server(d, id);
+
+    rc = -ENOENT;
+    if ( !s )
+        goto out;
+
+    rc = -EPERM;
+    if ( s->emulator != current->domain )
+        goto out;
+
+    rc = arch_ioreq_server_map_mem_type(d, s, flags);
+
+ out:
+    spin_unlock_recursive(&d->arch.hvm.ioreq_server.lock);
+
+    return rc;
+}
+
+int hvm_set_ioreq_server_state(struct domain *d, ioservid_t id,
+                               bool enabled)
+{
+    struct hvm_ioreq_server *s;
+    int rc;
+
+    spin_lock_recursive(&d->arch.hvm.ioreq_server.lock);
+
+    s = get_ioreq_server(d, id);
+
+    rc = -ENOENT;
+    if ( !s )
+        goto out;
+
+    rc = -EPERM;
+    if ( s->emulator != current->domain )
+        goto out;
+
+    domain_pause(d);
+
+    if ( enabled )
+        hvm_ioreq_server_enable(s);
+    else
+        hvm_ioreq_server_disable(s);
+
+    domain_unpause(d);
+
+    rc = 0;
+
+ out:
+    spin_unlock_recursive(&d->arch.hvm.ioreq_server.lock);
+    return rc;
+}
+
+int hvm_all_ioreq_servers_add_vcpu(struct domain *d, struct vcpu *v)
+{
+    struct hvm_ioreq_server *s;
+    unsigned int id;
+    int rc;
+
+    spin_lock_recursive(&d->arch.hvm.ioreq_server.lock);
+
+    FOR_EACH_IOREQ_SERVER(d, id, s)
+    {
+        rc = hvm_ioreq_server_add_vcpu(s, v);
+        if ( rc )
+            goto fail;
+    }
+
+    spin_unlock_recursive(&d->arch.hvm.ioreq_server.lock);
+
+    return 0;
+
+ fail:
+    while ( ++id != MAX_NR_IOREQ_SERVERS )
+    {
+        s = GET_IOREQ_SERVER(d, id);
+
+        if ( !s )
+            continue;
+
+        hvm_ioreq_server_remove_vcpu(s, v);
+    }
+
+    spin_unlock_recursive(&d->arch.hvm.ioreq_server.lock);
+
+    return rc;
+}
+
+void hvm_all_ioreq_servers_remove_vcpu(struct domain *d, struct vcpu *v)
+{
+    struct hvm_ioreq_server *s;
+    unsigned int id;
+
+    spin_lock_recursive(&d->arch.hvm.ioreq_server.lock);
+
+    FOR_EACH_IOREQ_SERVER(d, id, s)
+        hvm_ioreq_server_remove_vcpu(s, v);
+
+    spin_unlock_recursive(&d->arch.hvm.ioreq_server.lock);
+}
+
+void hvm_destroy_all_ioreq_servers(struct domain *d)
+{
+    struct hvm_ioreq_server *s;
+    unsigned int id;
+
+    if ( !arch_ioreq_server_destroy_all(d) )
+        return;
+
+    spin_lock_recursive(&d->arch.hvm.ioreq_server.lock);
+
+    /* No need to domain_pause() as the domain is being torn down */
+
+    FOR_EACH_IOREQ_SERVER(d, id, s)
+    {
+        hvm_ioreq_server_disable(s);
+
+        /*
+         * It is safe to call hvm_ioreq_server_deinit() prior to
+         * set_ioreq_server() since the target domain is being destroyed.
+         */
+        hvm_ioreq_server_deinit(s);
+        set_ioreq_server(d, id, NULL);
+
+        xfree(s);
+    }
+
+    spin_unlock_recursive(&d->arch.hvm.ioreq_server.lock);
+}
+
+struct hvm_ioreq_server *hvm_select_ioreq_server(struct domain *d,
+                                                 ioreq_t *p)
+{
+    struct hvm_ioreq_server *s;
+    uint8_t type;
+    uint64_t addr;
+    unsigned int id;
+
+    if ( arch_ioreq_server_get_type_addr(d, p, &type, &addr) )
+        return NULL;
+
+    FOR_EACH_IOREQ_SERVER(d, id, s)
+    {
+        struct rangeset *r;
+
+        if ( !s->enabled )
+            continue;
+
+        r = s->range[type];
+
+        switch ( type )
+        {
+            unsigned long start, end;
+
+        case XEN_DMOP_IO_RANGE_PORT:
+            start = addr;
+            end = start + p->size - 1;
+            if ( rangeset_contains_range(r, start, end) )
+                return s;
+
+            break;
+
+        case XEN_DMOP_IO_RANGE_MEMORY:
+            start = hvm_mmio_first_byte(p);
+            end = hvm_mmio_last_byte(p);
+
+            if ( rangeset_contains_range(r, start, end) )
+                return s;
+
+            break;
+
+        case XEN_DMOP_IO_RANGE_PCI:
+            if ( rangeset_contains_singleton(r, addr >> 32) )
+            {
+                p->type = IOREQ_TYPE_PCI_CONFIG;
+                p->addr = addr;
+                return s;
+            }
+
+            break;
+        }
+    }
+
+    return NULL;
+}
+
+static int hvm_send_buffered_ioreq(struct hvm_ioreq_server *s, ioreq_t *p)
+{
+    struct domain *d = current->domain;
+    struct hvm_ioreq_page *iorp;
+    buffered_iopage_t *pg;
+    buf_ioreq_t bp = { .data = p->data,
+                       .addr = p->addr,
+                       .type = p->type,
+                       .dir = p->dir };
+    /* Timeoffset sends 64b data, but no address. Use two consecutive slots. */
+    int qw = 0;
+
+    /* Ensure buffered_iopage fits in a page */
+    BUILD_BUG_ON(sizeof(buffered_iopage_t) > PAGE_SIZE);
+
+    iorp = &s->bufioreq;
+    pg = iorp->va;
+
+    if ( !pg )
+        return IOREQ_STATUS_UNHANDLED;
+
+    /*
+     * Return 0 for the cases we can't deal with:
+     *  - 'addr' is only a 20-bit field, so we cannot address beyond 1MB
+     *  - we cannot buffer accesses to guest memory buffers, as the guest
+     *    may expect the memory buffer to be synchronously accessed
+     *  - the count field is usually used with data_is_ptr and since we don't
+     *    support data_is_ptr we do not waste space for the count field either
+     */
+    if ( (p->addr > 0xffffful) || p->data_is_ptr || (p->count != 1) )
+        return 0;
+
+    switch ( p->size )
+    {
+    case 1:
+        bp.size = 0;
+        break;
+    case 2:
+        bp.size = 1;
+        break;
+    case 4:
+        bp.size = 2;
+        break;
+    case 8:
+        bp.size = 3;
+        qw = 1;
+        break;
+    default:
+        gdprintk(XENLOG_WARNING, "unexpected ioreq size: %u\n", p->size);
+        return IOREQ_STATUS_UNHANDLED;
+    }
+
+    spin_lock(&s->bufioreq_lock);
+
+    if ( (pg->ptrs.write_pointer - pg->ptrs.read_pointer) >=
+         (IOREQ_BUFFER_SLOT_NUM - qw) )
+    {
+        /* The queue is full: send the iopacket through the normal path. */
+        spin_unlock(&s->bufioreq_lock);
+        return IOREQ_STATUS_UNHANDLED;
+    }
+
+    pg->buf_ioreq[pg->ptrs.write_pointer % IOREQ_BUFFER_SLOT_NUM] = bp;
+
+    if ( qw )
+    {
+        bp.data = p->data >> 32;
+        pg->buf_ioreq[(pg->ptrs.write_pointer+1) % IOREQ_BUFFER_SLOT_NUM] = bp;
+    }
+
+    /* Make the ioreq_t visible /before/ write_pointer. */
+    smp_wmb();
+    pg->ptrs.write_pointer += qw ? 2 : 1;
+
+    /* Canonicalize read/write pointers to prevent their overflow. */
+    while ( (s->bufioreq_handling == HVM_IOREQSRV_BUFIOREQ_ATOMIC) &&
+            qw++ < IOREQ_BUFFER_SLOT_NUM &&
+            pg->ptrs.read_pointer >= IOREQ_BUFFER_SLOT_NUM )
+    {
+        union bufioreq_pointers old = pg->ptrs, new;
+        unsigned int n = old.read_pointer / IOREQ_BUFFER_SLOT_NUM;
+
+        new.read_pointer = old.read_pointer - n * IOREQ_BUFFER_SLOT_NUM;
+        new.write_pointer = old.write_pointer - n * IOREQ_BUFFER_SLOT_NUM;
+        cmpxchg(&pg->ptrs.full, old.full, new.full);
+    }
+
+    notify_via_xen_event_channel(d, s->bufioreq_evtchn);
+    spin_unlock(&s->bufioreq_lock);
+
+    return IOREQ_STATUS_HANDLED;
+}
+
+int hvm_send_ioreq(struct hvm_ioreq_server *s, ioreq_t *proto_p,
+                   bool buffered)
+{
+    struct vcpu *curr = current;
+    struct domain *d = curr->domain;
+    struct hvm_ioreq_vcpu *sv;
+
+    ASSERT(s);
+
+    if ( buffered )
+        return hvm_send_buffered_ioreq(s, proto_p);
+
+    if ( unlikely(!vcpu_start_shutdown_deferral(curr)) )
+        return IOREQ_STATUS_RETRY;
+
+    list_for_each_entry ( sv,
+                          &s->ioreq_vcpu_list,
+                          list_entry )
+    {
+        if ( sv->vcpu == curr )
+        {
+            evtchn_port_t port = sv->ioreq_evtchn;
+            ioreq_t *p = get_ioreq(s, curr);
+
+            if ( unlikely(p->state != STATE_IOREQ_NONE) )
+            {
+                gprintk(XENLOG_ERR, "device model set bad IO state %d\n",
+                        p->state);
+                break;
+            }
+
+            if ( unlikely(p->vp_eport != port) )
+            {
+                gprintk(XENLOG_ERR, "device model set bad event channel %d\n",
+                        p->vp_eport);
+                break;
+            }
+
+            proto_p->state = STATE_IOREQ_NONE;
+            proto_p->vp_eport = port;
+            *p = *proto_p;
+
+            prepare_wait_on_xen_event_channel(port);
+
+            /*
+             * Following happens /after/ blocking and setting up ioreq
+             * contents. prepare_wait_on_xen_event_channel() is an implicit
+             * barrier.
+             */
+            p->state = STATE_IOREQ_READY;
+            notify_via_xen_event_channel(d, port);
+
+            sv->pending = true;
+            return IOREQ_STATUS_RETRY;
+        }
+    }
+
+    return IOREQ_STATUS_UNHANDLED;
+}
+
+unsigned int hvm_broadcast_ioreq(ioreq_t *p, bool buffered)
+{
+    struct domain *d = current->domain;
+    struct hvm_ioreq_server *s;
+    unsigned int id, failed = 0;
+
+    FOR_EACH_IOREQ_SERVER(d, id, s)
+    {
+        if ( !s->enabled )
+            continue;
+
+        if ( hvm_send_ioreq(s, p, buffered) == IOREQ_STATUS_UNHANDLED )
+            failed++;
+    }
+
+    return failed;
+}
+
+void hvm_ioreq_init(struct domain *d)
+{
+    spin_lock_init(&d->arch.hvm.ioreq_server.lock);
+
+    arch_ioreq_domain_init(d);
+}
+
+/*
+ * Local variables:
+ * mode: C
+ * c-file-style: "BSD"
+ * c-basic-offset: 4
+ * tab-width: 4
+ * indent-tabs-mode: nil
+ * End:
+ */
diff --git a/xen/include/asm-x86/hvm/ioreq.h b/xen/include/asm-x86/hvm/ioreq.h
index c7563e1..ab2f3f8 100644
--- a/xen/include/asm-x86/hvm/ioreq.h
+++ b/xen/include/asm-x86/hvm/ioreq.h
@@ -19,8 +19,7 @@
 #ifndef __ASM_X86_HVM_IOREQ_H__
 #define __ASM_X86_HVM_IOREQ_H__
 
-#define HANDLE_BUFIOREQ(s) \
-    ((s)->bufioreq_handling != HVM_IOREQSRV_BUFIOREQ_OFF)
+#include <xen/ioreq.h>
 
 bool arch_vcpu_ioreq_completion(enum hvm_io_completion io_completion);
 int arch_ioreq_server_map_pages(struct hvm_ioreq_server *s);
@@ -38,42 +37,6 @@ int arch_ioreq_server_get_type_addr(const struct domain *d,
                                     uint64_t *addr);
 void arch_ioreq_domain_init(struct domain *d);
 
-bool hvm_io_pending(struct vcpu *v);
-bool handle_hvm_io_completion(struct vcpu *v);
-bool is_ioreq_server_page(struct domain *d, const struct page_info *page);
-
-int hvm_create_ioreq_server(struct domain *d, int bufioreq_handling,
-                            ioservid_t *id);
-int hvm_destroy_ioreq_server(struct domain *d, ioservid_t id);
-int hvm_get_ioreq_server_info(struct domain *d, ioservid_t id,
-                              unsigned long *ioreq_gfn,
-                              unsigned long *bufioreq_gfn,
-                              evtchn_port_t *bufioreq_port);
-int hvm_get_ioreq_server_frame(struct domain *d, ioservid_t id,
-                               unsigned long idx, mfn_t *mfn);
-int hvm_map_io_range_to_ioreq_server(struct domain *d, ioservid_t id,
-                                     uint32_t type, uint64_t start,
-                                     uint64_t end);
-int hvm_unmap_io_range_from_ioreq_server(struct domain *d, ioservid_t id,
-                                         uint32_t type, uint64_t start,
-                                         uint64_t end);
-int hvm_map_mem_type_to_ioreq_server(struct domain *d, ioservid_t id,
-                                     uint32_t type, uint32_t flags);
-int hvm_set_ioreq_server_state(struct domain *d, ioservid_t id,
-                               bool enabled);
-
-int hvm_all_ioreq_servers_add_vcpu(struct domain *d, struct vcpu *v);
-void hvm_all_ioreq_servers_remove_vcpu(struct domain *d, struct vcpu *v);
-void hvm_destroy_all_ioreq_servers(struct domain *d);
-
-struct hvm_ioreq_server *hvm_select_ioreq_server(struct domain *d,
-                                                 ioreq_t *p);
-int hvm_send_ioreq(struct hvm_ioreq_server *s, ioreq_t *proto_p,
-                   bool buffered);
-unsigned int hvm_broadcast_ioreq(ioreq_t *p, bool buffered);
-
-void hvm_ioreq_init(struct domain *d);
-
 bool ioreq_complete_mmio(void);
 
 #define IOREQ_STATUS_HANDLED     X86EMUL_OKAY
diff --git a/xen/include/xen/ioreq.h b/xen/include/xen/ioreq.h
new file mode 100644
index 0000000..ad47c61
--- /dev/null
+++ b/xen/include/xen/ioreq.h
@@ -0,0 +1,73 @@
+/*
+ * ioreq.h: Hardware virtual machine assist interface definitions.
+ *
+ * Copyright (c) 2016 Citrix Systems Inc.
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms and conditions of the GNU General Public License,
+ * version 2, as published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope it will be useful, but WITHOUT
+ * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
+ * FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License for
+ * more details.
+ *
+ * You should have received a copy of the GNU General Public License along with
+ * this program; If not, see <http://www.gnu.org/licenses/>.
+ */
+
+#ifndef __XEN_IOREQ_H__
+#define __XEN_IOREQ_H__
+
+#include <xen/sched.h>
+
+#define HANDLE_BUFIOREQ(s) \
+    ((s)->bufioreq_handling != HVM_IOREQSRV_BUFIOREQ_OFF)
+
+bool hvm_io_pending(struct vcpu *v);
+bool handle_hvm_io_completion(struct vcpu *v);
+bool is_ioreq_server_page(struct domain *d, const struct page_info *page);
+
+int hvm_create_ioreq_server(struct domain *d, int bufioreq_handling,
+                            ioservid_t *id);
+int hvm_destroy_ioreq_server(struct domain *d, ioservid_t id);
+int hvm_get_ioreq_server_info(struct domain *d, ioservid_t id,
+                              unsigned long *ioreq_gfn,
+                              unsigned long *bufioreq_gfn,
+                              evtchn_port_t *bufioreq_port);
+int hvm_get_ioreq_server_frame(struct domain *d, ioservid_t id,
+                               unsigned long idx, mfn_t *mfn);
+int hvm_map_io_range_to_ioreq_server(struct domain *d, ioservid_t id,
+                                     uint32_t type, uint64_t start,
+                                     uint64_t end);
+int hvm_unmap_io_range_from_ioreq_server(struct domain *d, ioservid_t id,
+                                         uint32_t type, uint64_t start,
+                                         uint64_t end);
+int hvm_map_mem_type_to_ioreq_server(struct domain *d, ioservid_t id,
+                                     uint32_t type, uint32_t flags);
+int hvm_set_ioreq_server_state(struct domain *d, ioservid_t id,
+                               bool enabled);
+
+int hvm_all_ioreq_servers_add_vcpu(struct domain *d, struct vcpu *v);
+void hvm_all_ioreq_servers_remove_vcpu(struct domain *d, struct vcpu *v);
+void hvm_destroy_all_ioreq_servers(struct domain *d);
+
+struct hvm_ioreq_server *hvm_select_ioreq_server(struct domain *d,
+                                                 ioreq_t *p);
+int hvm_send_ioreq(struct hvm_ioreq_server *s, ioreq_t *proto_p,
+                   bool buffered);
+unsigned int hvm_broadcast_ioreq(ioreq_t *p, bool buffered);
+
+void hvm_ioreq_init(struct domain *d);
+
+#endif /* __XEN_IOREQ_H__ */
+
+/*
+ * Local variables:
+ * mode: C
+ * c-file-style: "BSD"
+ * c-basic-offset: 4
+ * tab-width: 4
+ * indent-tabs-mode: nil
+ * End:
+ */
-- 
2.7.4



^ permalink raw reply related	[flat|nested] 127+ messages in thread

* [PATCH V3 05/23] xen/ioreq: Make x86's hvm_ioreq_needs_completion() common
  2020-11-30 10:31 Oleksandr Tyshchenko
                   ` (3 preceding siblings ...)
  2020-11-30 10:31 ` [PATCH V3 04/23] xen/ioreq: Make x86's IOREQ feature common Oleksandr Tyshchenko
@ 2020-11-30 10:31 ` Oleksandr Tyshchenko
  2020-12-07 11:47   ` Jan Beulich
  2020-11-30 10:31 ` [PATCH V3 06/23] xen/ioreq: Make x86's hvm_mmio_first(last)_byte() common Oleksandr Tyshchenko
                   ` (19 subsequent siblings)
  24 siblings, 1 reply; 127+ messages in thread
From: Oleksandr Tyshchenko @ 2020-11-30 10:31 UTC (permalink / raw)
  To: xen-devel
  Cc: Oleksandr Tyshchenko, Paul Durrant, Jan Beulich, Andrew Cooper,
	Roger Pau Monné,
	Wei Liu, Julien Grall, Stefano Stabellini, Julien Grall

From: Oleksandr Tyshchenko <oleksandr_tyshchenko@epam.com>

IOREQ is now a common feature and this helper will be used
on Arm as is. Move it to xen/ioreq.h and drop the "hvm" prefix.

Although PIO handling on Arm is not introduced with the current series
(it will be implemented when we add support for vPCI), PIOs do
technically exist on Arm (they are simply accessed the same way as
MMIO), so it is better not to diverge now.
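
For reference, the condition the renamed helper evaluates can be reproduced
in a small standalone sketch (the mini_ioreq struct and the numeric constants
below are illustrative stand-ins for ioreq_t and its state/type values, not
the public interface): a ready request needs a completion pass unless its
data was passed by pointer or it is a port write, which is why the PIO case
mentioned above is kept in the common predicate.

    #include <stdbool.h>
    #include <stdio.h>

    /* Simplified stand-in for the relevant ioreq_t fields (illustrative only). */
    struct mini_ioreq {
        unsigned int state;     /* 1 stands in for STATE_IOREQ_READY */
        bool data_is_ptr;       /* data holds a guest pointer rather than a value */
        unsigned int type;      /* 0 stands in for IOREQ_TYPE_PIO */
        unsigned int dir;       /* 0 stands in for IOREQ_WRITE */
    };

    static bool needs_completion(const struct mini_ioreq *p)
    {
        return p->state == 1 && !p->data_is_ptr &&
               (p->type != 0 || p->dir != 0);
    }

    int main(void)
    {
        /* A port write carries no result back, so no completion pass is needed. */
        struct mini_ioreq pio_write = { .state = 1, .type = 0, .dir = 0 };
        /* A read must wait for the emulator's data to be copied back first. */
        struct mini_ioreq mmio_read = { .state = 1, .type = 1, .dir = 1 };

        printf("%d %d\n", needs_completion(&pio_write), needs_completion(&mmio_read));
        return 0;
    }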

Signed-off-by: Oleksandr Tyshchenko <oleksandr_tyshchenko@epam.com>
Reviewed-by: Paul Durrant <paul@xen.org>
CC: Julien Grall <julien.grall@arm.com>

---
Please note, this is a split/cleanup/hardening of Julien's PoC:
"Add support for Guest IO forwarding to a device emulator"

Changes RFC -> V1:
   - new patch, was split from:
     "[RFC PATCH V1 01/12] hvm/ioreq: Make x86's IOREQ feature common"

Changes V1 -> V2:
   - remove "hvm" prefix

Changes V2 -> V3:
   - add Paul's R-b
---
---
 xen/arch/x86/hvm/emulate.c     | 4 ++--
 xen/arch/x86/hvm/io.c          | 2 +-
 xen/common/ioreq.c             | 4 ++--
 xen/include/asm-x86/hvm/vcpu.h | 7 -------
 xen/include/xen/ioreq.h        | 7 +++++++
 5 files changed, 12 insertions(+), 12 deletions(-)

diff --git a/xen/arch/x86/hvm/emulate.c b/xen/arch/x86/hvm/emulate.c
index 24cf85f..5700274 100644
--- a/xen/arch/x86/hvm/emulate.c
+++ b/xen/arch/x86/hvm/emulate.c
@@ -336,7 +336,7 @@ static int hvmemul_do_io(
             rc = hvm_send_ioreq(s, &p, 0);
             if ( rc != X86EMUL_RETRY || currd->is_shutting_down )
                 vio->io_req.state = STATE_IOREQ_NONE;
-            else if ( !hvm_ioreq_needs_completion(&vio->io_req) )
+            else if ( !ioreq_needs_completion(&vio->io_req) )
                 rc = X86EMUL_OKAY;
         }
         break;
@@ -2649,7 +2649,7 @@ static int _hvm_emulate_one(struct hvm_emulate_ctxt *hvmemul_ctxt,
     if ( rc == X86EMUL_OKAY && vio->mmio_retry )
         rc = X86EMUL_RETRY;
 
-    if ( !hvm_ioreq_needs_completion(&vio->io_req) )
+    if ( !ioreq_needs_completion(&vio->io_req) )
         completion = HVMIO_no_completion;
     else if ( completion == HVMIO_no_completion )
         completion = (vio->io_req.type != IOREQ_TYPE_PIO ||
diff --git a/xen/arch/x86/hvm/io.c b/xen/arch/x86/hvm/io.c
index 3e09d9b..b220d6b 100644
--- a/xen/arch/x86/hvm/io.c
+++ b/xen/arch/x86/hvm/io.c
@@ -135,7 +135,7 @@ bool handle_pio(uint16_t port, unsigned int size, int dir)
 
     rc = hvmemul_do_pio_buffer(port, size, dir, &data);
 
-    if ( hvm_ioreq_needs_completion(&vio->io_req) )
+    if ( ioreq_needs_completion(&vio->io_req) )
         vio->io_completion = HVMIO_pio_completion;
 
     switch ( rc )
diff --git a/xen/common/ioreq.c b/xen/common/ioreq.c
index 13ea959..44385ef 100644
--- a/xen/common/ioreq.c
+++ b/xen/common/ioreq.c
@@ -160,7 +160,7 @@ static bool hvm_wait_for_io(struct hvm_ioreq_vcpu *sv, ioreq_t *p)
     }
 
     p = &sv->vcpu->arch.hvm.hvm_io.io_req;
-    if ( hvm_ioreq_needs_completion(p) )
+    if ( ioreq_needs_completion(p) )
         p->data = data;
 
     sv->pending = false;
@@ -186,7 +186,7 @@ bool handle_hvm_io_completion(struct vcpu *v)
     if ( sv && !hvm_wait_for_io(sv, get_ioreq(s, v)) )
         return false;
 
-    vio->io_req.state = hvm_ioreq_needs_completion(&vio->io_req) ?
+    vio->io_req.state = ioreq_needs_completion(&vio->io_req) ?
         STATE_IORESP_READY : STATE_IOREQ_NONE;
 
     msix_write_completion(v);
diff --git a/xen/include/asm-x86/hvm/vcpu.h b/xen/include/asm-x86/hvm/vcpu.h
index 5ccd075..6c1feda 100644
--- a/xen/include/asm-x86/hvm/vcpu.h
+++ b/xen/include/asm-x86/hvm/vcpu.h
@@ -91,13 +91,6 @@ struct hvm_vcpu_io {
     const struct g2m_ioport *g2m_ioport;
 };
 
-static inline bool hvm_ioreq_needs_completion(const ioreq_t *ioreq)
-{
-    return ioreq->state == STATE_IOREQ_READY &&
-           !ioreq->data_is_ptr &&
-           (ioreq->type != IOREQ_TYPE_PIO || ioreq->dir != IOREQ_WRITE);
-}
-
 struct nestedvcpu {
     bool_t nv_guestmode; /* vcpu in guestmode? */
     void *nv_vvmcx; /* l1 guest virtual VMCB/VMCS */
diff --git a/xen/include/xen/ioreq.h b/xen/include/xen/ioreq.h
index ad47c61..3cc333d 100644
--- a/xen/include/xen/ioreq.h
+++ b/xen/include/xen/ioreq.h
@@ -21,6 +21,13 @@
 
 #include <xen/sched.h>
 
+static inline bool ioreq_needs_completion(const ioreq_t *ioreq)
+{
+    return ioreq->state == STATE_IOREQ_READY &&
+           !ioreq->data_is_ptr &&
+           (ioreq->type != IOREQ_TYPE_PIO || ioreq->dir != IOREQ_WRITE);
+}
+
 #define HANDLE_BUFIOREQ(s) \
     ((s)->bufioreq_handling != HVM_IOREQSRV_BUFIOREQ_OFF)
 
-- 
2.7.4



^ permalink raw reply related	[flat|nested] 127+ messages in thread

* [PATCH V3 06/23] xen/ioreq: Make x86's hvm_mmio_first(last)_byte() common
  2020-11-30 10:31 Oleksandr Tyshchenko
                   ` (4 preceding siblings ...)
  2020-11-30 10:31 ` [PATCH V3 05/23] xen/ioreq: Make x86's hvm_ioreq_needs_completion() common Oleksandr Tyshchenko
@ 2020-11-30 10:31 ` Oleksandr Tyshchenko
  2020-12-07 11:48   ` Jan Beulich
  2020-11-30 10:31 ` [PATCH V3 07/23] xen/ioreq: Make x86's hvm_ioreq_(page/vcpu/server) structs common Oleksandr Tyshchenko
                   ` (18 subsequent siblings)
  24 siblings, 1 reply; 127+ messages in thread
From: Oleksandr Tyshchenko @ 2020-11-30 10:31 UTC (permalink / raw)
  To: xen-devel
  Cc: Oleksandr Tyshchenko, Paul Durrant, Jan Beulich, Andrew Cooper,
	Roger Pau Monné,
	Wei Liu, Julien Grall, Stefano Stabellini, Julien Grall

From: Oleksandr Tyshchenko <oleksandr_tyshchenko@epam.com>

IOREQ is now a common feature and these helpers will be used
on Arm as is. Move them to xen/ioreq.h and replace the "hvm"
prefix with "ioreq".

Signed-off-by: Oleksandr Tyshchenko <oleksandr_tyshchenko@epam.com>
Reviewed-by: Paul Durrant <paul@xen.org>
CC: Julien Grall <julien.grall@arm.com>

---
Please note, this is a split/cleanup/hardening of Julien's PoC:
"Add support for Guest IO forwarding to a device emulator"

Changes RFC -> V1:
   - new patch

Changes V1 -> V2:
   - replace "hvm" prefix by "ioreq"

Changes V2 -> V3:
   - add Paul's R-b
---
---
 xen/arch/x86/hvm/intercept.c |  5 +++--
 xen/arch/x86/hvm/stdvga.c    |  4 ++--
 xen/common/ioreq.c           |  4 ++--
 xen/include/asm-x86/hvm/io.h | 16 ----------------
 xen/include/xen/ioreq.h      | 16 ++++++++++++++++
 5 files changed, 23 insertions(+), 22 deletions(-)

diff --git a/xen/arch/x86/hvm/intercept.c b/xen/arch/x86/hvm/intercept.c
index cd4c4c1..02ca3b0 100644
--- a/xen/arch/x86/hvm/intercept.c
+++ b/xen/arch/x86/hvm/intercept.c
@@ -17,6 +17,7 @@
  * this program; If not, see <http://www.gnu.org/licenses/>.
  */
 
+#include <xen/ioreq.h>
 #include <xen/types.h>
 #include <xen/sched.h>
 #include <asm/regs.h>
@@ -34,7 +35,7 @@
 static bool_t hvm_mmio_accept(const struct hvm_io_handler *handler,
                               const ioreq_t *p)
 {
-    paddr_t first = hvm_mmio_first_byte(p), last;
+    paddr_t first = ioreq_mmio_first_byte(p), last;
 
     BUG_ON(handler->type != IOREQ_TYPE_COPY);
 
@@ -42,7 +43,7 @@ static bool_t hvm_mmio_accept(const struct hvm_io_handler *handler,
         return 0;
 
     /* Make sure the handler will accept the whole access. */
-    last = hvm_mmio_last_byte(p);
+    last = ioreq_mmio_last_byte(p);
     if ( last != first &&
          !handler->mmio.ops->check(current, last) )
         domain_crash(current->domain);
diff --git a/xen/arch/x86/hvm/stdvga.c b/xen/arch/x86/hvm/stdvga.c
index e267513..e184664 100644
--- a/xen/arch/x86/hvm/stdvga.c
+++ b/xen/arch/x86/hvm/stdvga.c
@@ -524,8 +524,8 @@ static bool_t stdvga_mem_accept(const struct hvm_io_handler *handler,
      * deadlock when hvm_mmio_internal() is called from
      * hvm_copy_to/from_guest_phys() in hvm_process_io_intercept().
      */
-    if ( (hvm_mmio_first_byte(p) < VGA_MEM_BASE) ||
-         (hvm_mmio_last_byte(p) >= (VGA_MEM_BASE + VGA_MEM_SIZE)) )
+    if ( (ioreq_mmio_first_byte(p) < VGA_MEM_BASE) ||
+         (ioreq_mmio_last_byte(p) >= (VGA_MEM_BASE + VGA_MEM_SIZE)) )
         return 0;
 
     spin_lock(&s->lock);
diff --git a/xen/common/ioreq.c b/xen/common/ioreq.c
index 44385ef..6e9f745 100644
--- a/xen/common/ioreq.c
+++ b/xen/common/ioreq.c
@@ -1075,8 +1075,8 @@ struct hvm_ioreq_server *hvm_select_ioreq_server(struct domain *d,
             break;
 
         case XEN_DMOP_IO_RANGE_MEMORY:
-            start = hvm_mmio_first_byte(p);
-            end = hvm_mmio_last_byte(p);
+            start = ioreq_mmio_first_byte(p);
+            end = ioreq_mmio_last_byte(p);
 
             if ( rangeset_contains_range(r, start, end) )
                 return s;
diff --git a/xen/include/asm-x86/hvm/io.h b/xen/include/asm-x86/hvm/io.h
index 558426b..fb64294 100644
--- a/xen/include/asm-x86/hvm/io.h
+++ b/xen/include/asm-x86/hvm/io.h
@@ -40,22 +40,6 @@ struct hvm_mmio_ops {
     hvm_mmio_write_t write;
 };
 
-static inline paddr_t hvm_mmio_first_byte(const ioreq_t *p)
-{
-    return unlikely(p->df) ?
-           p->addr - (p->count - 1ul) * p->size :
-           p->addr;
-}
-
-static inline paddr_t hvm_mmio_last_byte(const ioreq_t *p)
-{
-    unsigned long size = p->size;
-
-    return unlikely(p->df) ?
-           p->addr + size - 1:
-           p->addr + (p->count * size) - 1;
-}
-
 typedef int (*portio_action_t)(
     int dir, unsigned int port, unsigned int bytes, uint32_t *val);
 
diff --git a/xen/include/xen/ioreq.h b/xen/include/xen/ioreq.h
index 3cc333d..2746bb1 100644
--- a/xen/include/xen/ioreq.h
+++ b/xen/include/xen/ioreq.h
@@ -21,6 +21,22 @@
 
 #include <xen/sched.h>
 
+static inline paddr_t ioreq_mmio_first_byte(const ioreq_t *p)
+{
+    return unlikely(p->df) ?
+           p->addr - (p->count - 1ul) * p->size :
+           p->addr;
+}
+
+static inline paddr_t ioreq_mmio_last_byte(const ioreq_t *p)
+{
+    unsigned long size = p->size;
+
+    return unlikely(p->df) ?
+           p->addr + size - 1:
+           p->addr + (p->count * size) - 1;
+}
+
 static inline bool ioreq_needs_completion(const ioreq_t *ioreq)
 {
     return ioreq->state == STATE_IOREQ_READY &&
-- 
2.7.4



^ permalink raw reply related	[flat|nested] 127+ messages in thread

* [PATCH V3 07/23] xen/ioreq: Make x86's hvm_ioreq_(page/vcpu/server) structs common
  2020-11-30 10:31 Oleksandr Tyshchenko
                   ` (5 preceding siblings ...)
  2020-11-30 10:31 ` [PATCH V3 06/23] xen/ioreq: Make x86's hvm_mmio_first(last)_byte() common Oleksandr Tyshchenko
@ 2020-11-30 10:31 ` Oleksandr Tyshchenko
  2020-12-07 11:54   ` Jan Beulich
  2020-11-30 10:31 ` [PATCH V3 08/23] xen/ioreq: Move x86's ioreq_server to struct domain Oleksandr Tyshchenko
                   ` (17 subsequent siblings)
  24 siblings, 1 reply; 127+ messages in thread
From: Oleksandr Tyshchenko @ 2020-11-30 10:31 UTC (permalink / raw)
  To: xen-devel
  Cc: Oleksandr Tyshchenko, Paul Durrant, Jan Beulich, Andrew Cooper,
	Roger Pau Monné,
	Wei Liu, George Dunlap, Julien Grall, Stefano Stabellini,
	Julien Grall

From: Oleksandr Tyshchenko <oleksandr_tyshchenko@epam.com>

IOREQ is now a common feature and these structs will be used
on Arm as is. Move them to xen/ioreq.h and drop the "hvm" prefixes.
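
For orientation, the ownership relation between the three structs can be
summarised with a toy, self-contained model (names below are illustrative;
the real definitions are in the xen/ioreq.h hunk further down): a domain
holds an array of servers, and each server owns a synchronous ioreq page, an
optional buffered page, and one per-vCPU entry carrying the event channel
used to notify the emulator.

    #include <stdio.h>

    /* Toy model of the ownership hierarchy (illustrative, not the Xen structs). */
    struct toy_ioreq_page  { void *va; };                     /* mapped ring/page */
    struct toy_ioreq_vcpu  { unsigned int vcpu_id, evtchn; }; /* per-vCPU binding */
    struct toy_ioreq_server {
        struct toy_ioreq_page ioreq;      /* one synchronous slot per vCPU */
        struct toy_ioreq_page bufioreq;   /* optional buffered ring        */
        struct toy_ioreq_vcpu vcpu[2];    /* per-vCPU event channels       */
    };
    struct toy_domain {
        struct toy_ioreq_server *server[8]; /* cf. MAX_NR_IOREQ_SERVERS */
    };

    int main(void)
    {
        static struct toy_ioreq_server s = { .vcpu = { { 0, 10 }, { 1, 11 } } };
        struct toy_domain d = { .server = { &s } };

        printf("server 0, vcpu 1 notifies evtchn %u\n", d.server[0]->vcpu[1].evtchn);
        return 0;
    }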

Signed-off-by: Oleksandr Tyshchenko <oleksandr_tyshchenko@epam.com>
CC: Julien Grall <julien.grall@arm.com>

---
Please note, this is a split/cleanup/hardening of Julien's PoC:
"Add support for Guest IO forwarding to a device emulator"

Changes RFC -> V1:
   - new patch

Changes V1 -> V2:
   - remove "hvm" prefix

Changes V2 -> V3:
   - update the patch now that the "legacy interface" is x86 specific
---
---
 xen/arch/x86/hvm/emulate.c       |   2 +-
 xen/arch/x86/hvm/ioreq.c         |  36 ++++++-------
 xen/arch/x86/hvm/stdvga.c        |   2 +-
 xen/arch/x86/mm/p2m.c            |   8 +--
 xen/common/ioreq.c               | 108 +++++++++++++++++++--------------------
 xen/include/asm-x86/hvm/domain.h |  36 +------------
 xen/include/asm-x86/hvm/ioreq.h  |  12 ++---
 xen/include/asm-x86/p2m.h        |   8 +--
 xen/include/xen/ioreq.h          |  40 +++++++++++++--
 9 files changed, 126 insertions(+), 126 deletions(-)

diff --git a/xen/arch/x86/hvm/emulate.c b/xen/arch/x86/hvm/emulate.c
index 5700274..4746d5a 100644
--- a/xen/arch/x86/hvm/emulate.c
+++ b/xen/arch/x86/hvm/emulate.c
@@ -287,7 +287,7 @@ static int hvmemul_do_io(
          * However, there's no cheap approach to avoid above situations in xen,
          * so the device model side needs to check the incoming ioreq event.
          */
-        struct hvm_ioreq_server *s = NULL;
+        struct ioreq_server *s = NULL;
         p2m_type_t p2mt = p2m_invalid;
 
         if ( is_mmio )
diff --git a/xen/arch/x86/hvm/ioreq.c b/xen/arch/x86/hvm/ioreq.c
index b03ceee..009a95a 100644
--- a/xen/arch/x86/hvm/ioreq.c
+++ b/xen/arch/x86/hvm/ioreq.c
@@ -64,7 +64,7 @@ bool arch_vcpu_ioreq_completion(enum hvm_io_completion io_completion)
     return true;
 }
 
-static gfn_t hvm_alloc_legacy_ioreq_gfn(struct hvm_ioreq_server *s)
+static gfn_t hvm_alloc_legacy_ioreq_gfn(struct ioreq_server *s)
 {
     struct domain *d = s->target;
     unsigned int i;
@@ -80,7 +80,7 @@ static gfn_t hvm_alloc_legacy_ioreq_gfn(struct hvm_ioreq_server *s)
     return INVALID_GFN;
 }
 
-static gfn_t hvm_alloc_ioreq_gfn(struct hvm_ioreq_server *s)
+static gfn_t hvm_alloc_ioreq_gfn(struct ioreq_server *s)
 {
     struct domain *d = s->target;
     unsigned int i;
@@ -98,7 +98,7 @@ static gfn_t hvm_alloc_ioreq_gfn(struct hvm_ioreq_server *s)
     return hvm_alloc_legacy_ioreq_gfn(s);
 }
 
-static bool hvm_free_legacy_ioreq_gfn(struct hvm_ioreq_server *s,
+static bool hvm_free_legacy_ioreq_gfn(struct ioreq_server *s,
                                       gfn_t gfn)
 {
     struct domain *d = s->target;
@@ -116,7 +116,7 @@ static bool hvm_free_legacy_ioreq_gfn(struct hvm_ioreq_server *s,
     return true;
 }
 
-static void hvm_free_ioreq_gfn(struct hvm_ioreq_server *s, gfn_t gfn)
+static void hvm_free_ioreq_gfn(struct ioreq_server *s, gfn_t gfn)
 {
     struct domain *d = s->target;
     unsigned int i = gfn_x(gfn) - d->arch.hvm.ioreq_gfn.base;
@@ -130,9 +130,9 @@ static void hvm_free_ioreq_gfn(struct hvm_ioreq_server *s, gfn_t gfn)
     }
 }
 
-static void hvm_unmap_ioreq_gfn(struct hvm_ioreq_server *s, bool buf)
+static void hvm_unmap_ioreq_gfn(struct ioreq_server *s, bool buf)
 {
-    struct hvm_ioreq_page *iorp = buf ? &s->bufioreq : &s->ioreq;
+    struct ioreq_page *iorp = buf ? &s->bufioreq : &s->ioreq;
 
     if ( gfn_eq(iorp->gfn, INVALID_GFN) )
         return;
@@ -144,10 +144,10 @@ static void hvm_unmap_ioreq_gfn(struct hvm_ioreq_server *s, bool buf)
     iorp->gfn = INVALID_GFN;
 }
 
-static int hvm_map_ioreq_gfn(struct hvm_ioreq_server *s, bool buf)
+static int hvm_map_ioreq_gfn(struct ioreq_server *s, bool buf)
 {
     struct domain *d = s->target;
-    struct hvm_ioreq_page *iorp = buf ? &s->bufioreq : &s->ioreq;
+    struct ioreq_page *iorp = buf ? &s->bufioreq : &s->ioreq;
     int rc;
 
     if ( iorp->page )
@@ -180,11 +180,11 @@ static int hvm_map_ioreq_gfn(struct hvm_ioreq_server *s, bool buf)
     return rc;
 }
 
-static void hvm_remove_ioreq_gfn(struct hvm_ioreq_server *s, bool buf)
+static void hvm_remove_ioreq_gfn(struct ioreq_server *s, bool buf)
 
 {
     struct domain *d = s->target;
-    struct hvm_ioreq_page *iorp = buf ? &s->bufioreq : &s->ioreq;
+    struct ioreq_page *iorp = buf ? &s->bufioreq : &s->ioreq;
 
     if ( gfn_eq(iorp->gfn, INVALID_GFN) )
         return;
@@ -195,10 +195,10 @@ static void hvm_remove_ioreq_gfn(struct hvm_ioreq_server *s, bool buf)
     clear_page(iorp->va);
 }
 
-static int hvm_add_ioreq_gfn(struct hvm_ioreq_server *s, bool buf)
+static int hvm_add_ioreq_gfn(struct ioreq_server *s, bool buf)
 {
     struct domain *d = s->target;
-    struct hvm_ioreq_page *iorp = buf ? &s->bufioreq : &s->ioreq;
+    struct ioreq_page *iorp = buf ? &s->bufioreq : &s->ioreq;
     int rc;
 
     if ( gfn_eq(iorp->gfn, INVALID_GFN) )
@@ -214,7 +214,7 @@ static int hvm_add_ioreq_gfn(struct hvm_ioreq_server *s, bool buf)
     return rc;
 }
 
-int arch_ioreq_server_map_pages(struct hvm_ioreq_server *s)
+int arch_ioreq_server_map_pages(struct ioreq_server *s)
 {
     int rc;
 
@@ -229,33 +229,33 @@ int arch_ioreq_server_map_pages(struct hvm_ioreq_server *s)
     return rc;
 }
 
-void arch_ioreq_server_unmap_pages(struct hvm_ioreq_server *s)
+void arch_ioreq_server_unmap_pages(struct ioreq_server *s)
 {
     hvm_unmap_ioreq_gfn(s, true);
     hvm_unmap_ioreq_gfn(s, false);
 }
 
-void arch_ioreq_server_enable(struct hvm_ioreq_server *s)
+void arch_ioreq_server_enable(struct ioreq_server *s)
 {
     hvm_remove_ioreq_gfn(s, false);
     hvm_remove_ioreq_gfn(s, true);
 }
 
-void arch_ioreq_server_disable(struct hvm_ioreq_server *s)
+void arch_ioreq_server_disable(struct ioreq_server *s)
 {
     hvm_add_ioreq_gfn(s, true);
     hvm_add_ioreq_gfn(s, false);
 }
 
 /* Called when target domain is paused */
-void arch_ioreq_server_destroy(struct hvm_ioreq_server *s)
+void arch_ioreq_server_destroy(struct ioreq_server *s)
 {
     p2m_set_ioreq_server(s->target, 0, s);
 }
 
 /* Called with ioreq_server lock held */
 int arch_ioreq_server_map_mem_type(struct domain *d,
-                                   struct hvm_ioreq_server *s,
+                                   struct ioreq_server *s,
                                    uint32_t flags)
 {
     int rc = p2m_set_ioreq_server(d, flags, s);
diff --git a/xen/arch/x86/hvm/stdvga.c b/xen/arch/x86/hvm/stdvga.c
index e184664..bafb3f6 100644
--- a/xen/arch/x86/hvm/stdvga.c
+++ b/xen/arch/x86/hvm/stdvga.c
@@ -466,7 +466,7 @@ static int stdvga_mem_write(const struct hvm_io_handler *handler,
         .dir = IOREQ_WRITE,
         .data = data,
     };
-    struct hvm_ioreq_server *srv;
+    struct ioreq_server *srv;
 
     if ( !stdvga_cache_is_enabled(s) || !s->stdvga )
         goto done;
diff --git a/xen/arch/x86/mm/p2m.c b/xen/arch/x86/mm/p2m.c
index d9cc185..7a2ba82 100644
--- a/xen/arch/x86/mm/p2m.c
+++ b/xen/arch/x86/mm/p2m.c
@@ -367,7 +367,7 @@ void p2m_memory_type_changed(struct domain *d)
 
 int p2m_set_ioreq_server(struct domain *d,
                          unsigned int flags,
-                         struct hvm_ioreq_server *s)
+                         struct ioreq_server *s)
 {
     struct p2m_domain *p2m = p2m_get_hostp2m(d);
     int rc;
@@ -415,11 +415,11 @@ int p2m_set_ioreq_server(struct domain *d,
     return rc;
 }
 
-struct hvm_ioreq_server *p2m_get_ioreq_server(struct domain *d,
-                                              unsigned int *flags)
+struct ioreq_server *p2m_get_ioreq_server(struct domain *d,
+                                          unsigned int *flags)
 {
     struct p2m_domain *p2m = p2m_get_hostp2m(d);
-    struct hvm_ioreq_server *s;
+    struct ioreq_server *s;
 
     spin_lock(&p2m->ioreq.lock);
 
diff --git a/xen/common/ioreq.c b/xen/common/ioreq.c
index 6e9f745..3e80fc6 100644
--- a/xen/common/ioreq.c
+++ b/xen/common/ioreq.c
@@ -35,7 +35,7 @@
 #include <public/hvm/params.h>
 
 static void set_ioreq_server(struct domain *d, unsigned int id,
-                             struct hvm_ioreq_server *s)
+                             struct ioreq_server *s)
 {
     ASSERT(id < MAX_NR_IOREQ_SERVERS);
     ASSERT(!s || !d->arch.hvm.ioreq_server.server[id]);
@@ -46,8 +46,8 @@ static void set_ioreq_server(struct domain *d, unsigned int id,
 #define GET_IOREQ_SERVER(d, id) \
     (d)->arch.hvm.ioreq_server.server[id]
 
-static struct hvm_ioreq_server *get_ioreq_server(const struct domain *d,
-                                                 unsigned int id)
+static struct ioreq_server *get_ioreq_server(const struct domain *d,
+                                             unsigned int id)
 {
     if ( id >= MAX_NR_IOREQ_SERVERS )
         return NULL;
@@ -69,7 +69,7 @@ static struct hvm_ioreq_server *get_ioreq_server(const struct domain *d,
             continue; \
         else
 
-static ioreq_t *get_ioreq(struct hvm_ioreq_server *s, struct vcpu *v)
+static ioreq_t *get_ioreq(struct ioreq_server *s, struct vcpu *v)
 {
     shared_iopage_t *p = s->ioreq.va;
 
@@ -79,16 +79,16 @@ static ioreq_t *get_ioreq(struct hvm_ioreq_server *s, struct vcpu *v)
     return &p->vcpu_ioreq[v->vcpu_id];
 }
 
-static struct hvm_ioreq_vcpu *get_pending_vcpu(const struct vcpu *v,
-                                               struct hvm_ioreq_server **srvp)
+static struct ioreq_vcpu *get_pending_vcpu(const struct vcpu *v,
+                                           struct ioreq_server **srvp)
 {
     struct domain *d = v->domain;
-    struct hvm_ioreq_server *s;
+    struct ioreq_server *s;
     unsigned int id;
 
     FOR_EACH_IOREQ_SERVER(d, id, s)
     {
-        struct hvm_ioreq_vcpu *sv;
+        struct ioreq_vcpu *sv;
 
         list_for_each_entry ( sv,
                               &s->ioreq_vcpu_list,
@@ -111,7 +111,7 @@ bool hvm_io_pending(struct vcpu *v)
     return get_pending_vcpu(v, NULL);
 }
 
-static bool hvm_wait_for_io(struct hvm_ioreq_vcpu *sv, ioreq_t *p)
+static bool hvm_wait_for_io(struct ioreq_vcpu *sv, ioreq_t *p)
 {
     unsigned int prev_state = STATE_IOREQ_NONE;
     unsigned int state = p->state;
@@ -172,8 +172,8 @@ bool handle_hvm_io_completion(struct vcpu *v)
 {
     struct domain *d = v->domain;
     struct hvm_vcpu_io *vio = &v->arch.hvm.hvm_io;
-    struct hvm_ioreq_server *s;
-    struct hvm_ioreq_vcpu *sv;
+    struct ioreq_server *s;
+    struct ioreq_vcpu *sv;
     enum hvm_io_completion io_completion;
 
     if ( has_vpci(d) && vpci_process_pending(v) )
@@ -214,9 +214,9 @@ bool handle_hvm_io_completion(struct vcpu *v)
     return true;
 }
 
-static int hvm_alloc_ioreq_mfn(struct hvm_ioreq_server *s, bool buf)
+static int hvm_alloc_ioreq_mfn(struct ioreq_server *s, bool buf)
 {
-    struct hvm_ioreq_page *iorp = buf ? &s->bufioreq : &s->ioreq;
+    struct ioreq_page *iorp = buf ? &s->bufioreq : &s->ioreq;
     struct page_info *page;
 
     if ( iorp->page )
@@ -262,9 +262,9 @@ static int hvm_alloc_ioreq_mfn(struct hvm_ioreq_server *s, bool buf)
     return -ENOMEM;
 }
 
-static void hvm_free_ioreq_mfn(struct hvm_ioreq_server *s, bool buf)
+static void hvm_free_ioreq_mfn(struct ioreq_server *s, bool buf)
 {
-    struct hvm_ioreq_page *iorp = buf ? &s->bufioreq : &s->ioreq;
+    struct ioreq_page *iorp = buf ? &s->bufioreq : &s->ioreq;
     struct page_info *page = iorp->page;
 
     if ( !page )
@@ -281,7 +281,7 @@ static void hvm_free_ioreq_mfn(struct hvm_ioreq_server *s, bool buf)
 
 bool is_ioreq_server_page(struct domain *d, const struct page_info *page)
 {
-    const struct hvm_ioreq_server *s;
+    const struct ioreq_server *s;
     unsigned int id;
     bool found = false;
 
@@ -301,8 +301,8 @@ bool is_ioreq_server_page(struct domain *d, const struct page_info *page)
     return found;
 }
 
-static void hvm_update_ioreq_evtchn(struct hvm_ioreq_server *s,
-                                    struct hvm_ioreq_vcpu *sv)
+static void hvm_update_ioreq_evtchn(struct ioreq_server *s,
+                                    struct ioreq_vcpu *sv)
 {
     ASSERT(spin_is_locked(&s->lock));
 
@@ -314,13 +314,13 @@ static void hvm_update_ioreq_evtchn(struct hvm_ioreq_server *s,
     }
 }
 
-static int hvm_ioreq_server_add_vcpu(struct hvm_ioreq_server *s,
+static int hvm_ioreq_server_add_vcpu(struct ioreq_server *s,
                                      struct vcpu *v)
 {
-    struct hvm_ioreq_vcpu *sv;
+    struct ioreq_vcpu *sv;
     int rc;
 
-    sv = xzalloc(struct hvm_ioreq_vcpu);
+    sv = xzalloc(struct ioreq_vcpu);
 
     rc = -ENOMEM;
     if ( !sv )
@@ -366,10 +366,10 @@ static int hvm_ioreq_server_add_vcpu(struct hvm_ioreq_server *s,
     return rc;
 }
 
-static void hvm_ioreq_server_remove_vcpu(struct hvm_ioreq_server *s,
+static void hvm_ioreq_server_remove_vcpu(struct ioreq_server *s,
                                          struct vcpu *v)
 {
-    struct hvm_ioreq_vcpu *sv;
+    struct ioreq_vcpu *sv;
 
     spin_lock(&s->lock);
 
@@ -394,9 +394,9 @@ static void hvm_ioreq_server_remove_vcpu(struct hvm_ioreq_server *s,
     spin_unlock(&s->lock);
 }
 
-static void hvm_ioreq_server_remove_all_vcpus(struct hvm_ioreq_server *s)
+static void hvm_ioreq_server_remove_all_vcpus(struct ioreq_server *s)
 {
-    struct hvm_ioreq_vcpu *sv, *next;
+    struct ioreq_vcpu *sv, *next;
 
     spin_lock(&s->lock);
 
@@ -420,7 +420,7 @@ static void hvm_ioreq_server_remove_all_vcpus(struct hvm_ioreq_server *s)
     spin_unlock(&s->lock);
 }
 
-static int hvm_ioreq_server_alloc_pages(struct hvm_ioreq_server *s)
+static int hvm_ioreq_server_alloc_pages(struct ioreq_server *s)
 {
     int rc;
 
@@ -435,13 +435,13 @@ static int hvm_ioreq_server_alloc_pages(struct hvm_ioreq_server *s)
     return rc;
 }
 
-static void hvm_ioreq_server_free_pages(struct hvm_ioreq_server *s)
+static void hvm_ioreq_server_free_pages(struct ioreq_server *s)
 {
     hvm_free_ioreq_mfn(s, true);
     hvm_free_ioreq_mfn(s, false);
 }
 
-static void hvm_ioreq_server_free_rangesets(struct hvm_ioreq_server *s)
+static void hvm_ioreq_server_free_rangesets(struct ioreq_server *s)
 {
     unsigned int i;
 
@@ -449,7 +449,7 @@ static void hvm_ioreq_server_free_rangesets(struct hvm_ioreq_server *s)
         rangeset_destroy(s->range[i]);
 }
 
-static int hvm_ioreq_server_alloc_rangesets(struct hvm_ioreq_server *s,
+static int hvm_ioreq_server_alloc_rangesets(struct ioreq_server *s,
                                             ioservid_t id)
 {
     unsigned int i;
@@ -487,9 +487,9 @@ static int hvm_ioreq_server_alloc_rangesets(struct hvm_ioreq_server *s,
     return rc;
 }
 
-static void hvm_ioreq_server_enable(struct hvm_ioreq_server *s)
+static void hvm_ioreq_server_enable(struct ioreq_server *s)
 {
-    struct hvm_ioreq_vcpu *sv;
+    struct ioreq_vcpu *sv;
 
     spin_lock(&s->lock);
 
@@ -509,7 +509,7 @@ static void hvm_ioreq_server_enable(struct hvm_ioreq_server *s)
     spin_unlock(&s->lock);
 }
 
-static void hvm_ioreq_server_disable(struct hvm_ioreq_server *s)
+static void hvm_ioreq_server_disable(struct ioreq_server *s)
 {
     spin_lock(&s->lock);
 
@@ -524,7 +524,7 @@ static void hvm_ioreq_server_disable(struct hvm_ioreq_server *s)
     spin_unlock(&s->lock);
 }
 
-static int hvm_ioreq_server_init(struct hvm_ioreq_server *s,
+static int hvm_ioreq_server_init(struct ioreq_server *s,
                                  struct domain *d, int bufioreq_handling,
                                  ioservid_t id)
 {
@@ -569,7 +569,7 @@ static int hvm_ioreq_server_init(struct hvm_ioreq_server *s,
     return rc;
 }
 
-static void hvm_ioreq_server_deinit(struct hvm_ioreq_server *s)
+static void hvm_ioreq_server_deinit(struct ioreq_server *s)
 {
     ASSERT(!s->enabled);
     hvm_ioreq_server_remove_all_vcpus(s);
@@ -594,14 +594,14 @@ static void hvm_ioreq_server_deinit(struct hvm_ioreq_server *s)
 int hvm_create_ioreq_server(struct domain *d, int bufioreq_handling,
                             ioservid_t *id)
 {
-    struct hvm_ioreq_server *s;
+    struct ioreq_server *s;
     unsigned int i;
     int rc;
 
     if ( bufioreq_handling > HVM_IOREQSRV_BUFIOREQ_ATOMIC )
         return -EINVAL;
 
-    s = xzalloc(struct hvm_ioreq_server);
+    s = xzalloc(struct ioreq_server);
     if ( !s )
         return -ENOMEM;
 
@@ -649,7 +649,7 @@ int hvm_create_ioreq_server(struct domain *d, int bufioreq_handling,
 
 int hvm_destroy_ioreq_server(struct domain *d, ioservid_t id)
 {
-    struct hvm_ioreq_server *s;
+    struct ioreq_server *s;
     int rc;
 
     spin_lock_recursive(&d->arch.hvm.ioreq_server.lock);
@@ -694,7 +694,7 @@ int hvm_get_ioreq_server_info(struct domain *d, ioservid_t id,
                               unsigned long *bufioreq_gfn,
                               evtchn_port_t *bufioreq_port)
 {
-    struct hvm_ioreq_server *s;
+    struct ioreq_server *s;
     int rc;
 
     spin_lock_recursive(&d->arch.hvm.ioreq_server.lock);
@@ -739,7 +739,7 @@ int hvm_get_ioreq_server_info(struct domain *d, ioservid_t id,
 int hvm_get_ioreq_server_frame(struct domain *d, ioservid_t id,
                                unsigned long idx, mfn_t *mfn)
 {
-    struct hvm_ioreq_server *s;
+    struct ioreq_server *s;
     int rc;
 
     ASSERT(is_hvm_domain(d));
@@ -791,7 +791,7 @@ int hvm_map_io_range_to_ioreq_server(struct domain *d, ioservid_t id,
                                      uint32_t type, uint64_t start,
                                      uint64_t end)
 {
-    struct hvm_ioreq_server *s;
+    struct ioreq_server *s;
     struct rangeset *r;
     int rc;
 
@@ -843,7 +843,7 @@ int hvm_unmap_io_range_from_ioreq_server(struct domain *d, ioservid_t id,
                                          uint32_t type, uint64_t start,
                                          uint64_t end)
 {
-    struct hvm_ioreq_server *s;
+    struct ioreq_server *s;
     struct rangeset *r;
     int rc;
 
@@ -902,7 +902,7 @@ int hvm_unmap_io_range_from_ioreq_server(struct domain *d, ioservid_t id,
 int hvm_map_mem_type_to_ioreq_server(struct domain *d, ioservid_t id,
                                      uint32_t type, uint32_t flags)
 {
-    struct hvm_ioreq_server *s;
+    struct ioreq_server *s;
     int rc;
 
     if ( type != HVMMEM_ioreq_server )
@@ -934,7 +934,7 @@ int hvm_map_mem_type_to_ioreq_server(struct domain *d, ioservid_t id,
 int hvm_set_ioreq_server_state(struct domain *d, ioservid_t id,
                                bool enabled)
 {
-    struct hvm_ioreq_server *s;
+    struct ioreq_server *s;
     int rc;
 
     spin_lock_recursive(&d->arch.hvm.ioreq_server.lock);
@@ -967,7 +967,7 @@ int hvm_set_ioreq_server_state(struct domain *d, ioservid_t id,
 
 int hvm_all_ioreq_servers_add_vcpu(struct domain *d, struct vcpu *v)
 {
-    struct hvm_ioreq_server *s;
+    struct ioreq_server *s;
     unsigned int id;
     int rc;
 
@@ -1002,7 +1002,7 @@ int hvm_all_ioreq_servers_add_vcpu(struct domain *d, struct vcpu *v)
 
 void hvm_all_ioreq_servers_remove_vcpu(struct domain *d, struct vcpu *v)
 {
-    struct hvm_ioreq_server *s;
+    struct ioreq_server *s;
     unsigned int id;
 
     spin_lock_recursive(&d->arch.hvm.ioreq_server.lock);
@@ -1015,7 +1015,7 @@ void hvm_all_ioreq_servers_remove_vcpu(struct domain *d, struct vcpu *v)
 
 void hvm_destroy_all_ioreq_servers(struct domain *d)
 {
-    struct hvm_ioreq_server *s;
+    struct ioreq_server *s;
     unsigned int id;
 
     if ( !arch_ioreq_server_destroy_all(d) )
@@ -1042,10 +1042,10 @@ void hvm_destroy_all_ioreq_servers(struct domain *d)
     spin_unlock_recursive(&d->arch.hvm.ioreq_server.lock);
 }
 
-struct hvm_ioreq_server *hvm_select_ioreq_server(struct domain *d,
-                                                 ioreq_t *p)
+struct ioreq_server *hvm_select_ioreq_server(struct domain *d,
+                                             ioreq_t *p)
 {
-    struct hvm_ioreq_server *s;
+    struct ioreq_server *s;
     uint8_t type;
     uint64_t addr;
     unsigned int id;
@@ -1098,10 +1098,10 @@ struct hvm_ioreq_server *hvm_select_ioreq_server(struct domain *d,
     return NULL;
 }
 
-static int hvm_send_buffered_ioreq(struct hvm_ioreq_server *s, ioreq_t *p)
+static int hvm_send_buffered_ioreq(struct ioreq_server *s, ioreq_t *p)
 {
     struct domain *d = current->domain;
-    struct hvm_ioreq_page *iorp;
+    struct ioreq_page *iorp;
     buffered_iopage_t *pg;
     buf_ioreq_t bp = { .data = p->data,
                        .addr = p->addr,
@@ -1191,12 +1191,12 @@ static int hvm_send_buffered_ioreq(struct hvm_ioreq_server *s, ioreq_t *p)
     return IOREQ_STATUS_HANDLED;
 }
 
-int hvm_send_ioreq(struct hvm_ioreq_server *s, ioreq_t *proto_p,
+int hvm_send_ioreq(struct ioreq_server *s, ioreq_t *proto_p,
                    bool buffered)
 {
     struct vcpu *curr = current;
     struct domain *d = curr->domain;
-    struct hvm_ioreq_vcpu *sv;
+    struct ioreq_vcpu *sv;
 
     ASSERT(s);
 
@@ -1254,7 +1254,7 @@ int hvm_send_ioreq(struct hvm_ioreq_server *s, ioreq_t *proto_p,
 unsigned int hvm_broadcast_ioreq(ioreq_t *p, bool buffered)
 {
     struct domain *d = current->domain;
-    struct hvm_ioreq_server *s;
+    struct ioreq_server *s;
     unsigned int id, failed = 0;
 
     FOR_EACH_IOREQ_SERVER(d, id, s)
diff --git a/xen/include/asm-x86/hvm/domain.h b/xen/include/asm-x86/hvm/domain.h
index 9d247ba..1c4ca47 100644
--- a/xen/include/asm-x86/hvm/domain.h
+++ b/xen/include/asm-x86/hvm/domain.h
@@ -30,40 +30,6 @@
 
 #include <public/hvm/dm_op.h>
 
-struct hvm_ioreq_page {
-    gfn_t gfn;
-    struct page_info *page;
-    void *va;
-};
-
-struct hvm_ioreq_vcpu {
-    struct list_head list_entry;
-    struct vcpu      *vcpu;
-    evtchn_port_t    ioreq_evtchn;
-    bool             pending;
-};
-
-#define NR_IO_RANGE_TYPES (XEN_DMOP_IO_RANGE_PCI + 1)
-#define MAX_NR_IO_RANGES  256
-
-struct hvm_ioreq_server {
-    struct domain          *target, *emulator;
-
-    /* Lock to serialize toolstack modifications */
-    spinlock_t             lock;
-
-    struct hvm_ioreq_page  ioreq;
-    struct list_head       ioreq_vcpu_list;
-    struct hvm_ioreq_page  bufioreq;
-
-    /* Lock to serialize access to buffered ioreq ring */
-    spinlock_t             bufioreq_lock;
-    evtchn_port_t          bufioreq_evtchn;
-    struct rangeset        *range[NR_IO_RANGE_TYPES];
-    bool                   enabled;
-    uint8_t                bufioreq_handling;
-};
-
 #ifdef CONFIG_MEM_SHARING
 struct mem_sharing_domain
 {
@@ -110,7 +76,7 @@ struct hvm_domain {
     /* Lock protects all other values in the sub-struct and the default */
     struct {
         spinlock_t              lock;
-        struct hvm_ioreq_server *server[MAX_NR_IOREQ_SERVERS];
+        struct ioreq_server *server[MAX_NR_IOREQ_SERVERS];
     } ioreq_server;
 
     /* Cached CF8 for guest PCI config cycles */
diff --git a/xen/include/asm-x86/hvm/ioreq.h b/xen/include/asm-x86/hvm/ioreq.h
index ab2f3f8..854dc77 100644
--- a/xen/include/asm-x86/hvm/ioreq.h
+++ b/xen/include/asm-x86/hvm/ioreq.h
@@ -22,13 +22,13 @@
 #include <xen/ioreq.h>
 
 bool arch_vcpu_ioreq_completion(enum hvm_io_completion io_completion);
-int arch_ioreq_server_map_pages(struct hvm_ioreq_server *s);
-void arch_ioreq_server_unmap_pages(struct hvm_ioreq_server *s);
-void arch_ioreq_server_enable(struct hvm_ioreq_server *s);
-void arch_ioreq_server_disable(struct hvm_ioreq_server *s);
-void arch_ioreq_server_destroy(struct hvm_ioreq_server *s);
+int arch_ioreq_server_map_pages(struct ioreq_server *s);
+void arch_ioreq_server_unmap_pages(struct ioreq_server *s);
+void arch_ioreq_server_enable(struct ioreq_server *s);
+void arch_ioreq_server_disable(struct ioreq_server *s);
+void arch_ioreq_server_destroy(struct ioreq_server *s);
 int arch_ioreq_server_map_mem_type(struct domain *d,
-                                   struct hvm_ioreq_server *s,
+                                   struct ioreq_server *s,
                                    uint32_t flags);
 bool arch_ioreq_server_destroy_all(struct domain *d);
 int arch_ioreq_server_get_type_addr(const struct domain *d,
diff --git a/xen/include/asm-x86/p2m.h b/xen/include/asm-x86/p2m.h
index 8d6fd1a..4603560 100644
--- a/xen/include/asm-x86/p2m.h
+++ b/xen/include/asm-x86/p2m.h
@@ -363,7 +363,7 @@ struct p2m_domain {
           * ioreq server who's responsible for the emulation of
           * gfns with specific p2m type(for now, p2m_ioreq_server).
           */
-         struct hvm_ioreq_server *server;
+         struct ioreq_server *server;
          /*
           * flags specifies whether read, write or both operations
           * are to be emulated by an ioreq server.
@@ -941,9 +941,9 @@ static inline unsigned int p2m_get_iommu_flags(p2m_type_t p2mt, mfn_t mfn)
 }
 
 int p2m_set_ioreq_server(struct domain *d, unsigned int flags,
-                         struct hvm_ioreq_server *s);
-struct hvm_ioreq_server *p2m_get_ioreq_server(struct domain *d,
-                                              unsigned int *flags);
+                         struct ioreq_server *s);
+struct ioreq_server *p2m_get_ioreq_server(struct domain *d,
+                                          unsigned int *flags);
 
 static inline int p2m_entry_modify(struct p2m_domain *p2m, p2m_type_t nt,
                                    p2m_type_t ot, mfn_t nfn, mfn_t ofn,
diff --git a/xen/include/xen/ioreq.h b/xen/include/xen/ioreq.h
index 2746bb1..979afa0 100644
--- a/xen/include/xen/ioreq.h
+++ b/xen/include/xen/ioreq.h
@@ -21,6 +21,40 @@
 
 #include <xen/sched.h>
 
+struct ioreq_page {
+    gfn_t gfn;
+    struct page_info *page;
+    void *va;
+};
+
+struct ioreq_vcpu {
+    struct list_head list_entry;
+    struct vcpu      *vcpu;
+    evtchn_port_t    ioreq_evtchn;
+    bool             pending;
+};
+
+#define NR_IO_RANGE_TYPES (XEN_DMOP_IO_RANGE_PCI + 1)
+#define MAX_NR_IO_RANGES  256
+
+struct ioreq_server {
+    struct domain          *target, *emulator;
+
+    /* Lock to serialize toolstack modifications */
+    spinlock_t             lock;
+
+    struct ioreq_page      ioreq;
+    struct list_head       ioreq_vcpu_list;
+    struct ioreq_page      bufioreq;
+
+    /* Lock to serialize access to buffered ioreq ring */
+    spinlock_t             bufioreq_lock;
+    evtchn_port_t          bufioreq_evtchn;
+    struct rangeset        *range[NR_IO_RANGE_TYPES];
+    bool                   enabled;
+    uint8_t                bufioreq_handling;
+};
+
 static inline paddr_t ioreq_mmio_first_byte(const ioreq_t *p)
 {
     return unlikely(p->df) ?
@@ -75,9 +109,9 @@ int hvm_all_ioreq_servers_add_vcpu(struct domain *d, struct vcpu *v);
 void hvm_all_ioreq_servers_remove_vcpu(struct domain *d, struct vcpu *v);
 void hvm_destroy_all_ioreq_servers(struct domain *d);
 
-struct hvm_ioreq_server *hvm_select_ioreq_server(struct domain *d,
-                                                 ioreq_t *p);
-int hvm_send_ioreq(struct hvm_ioreq_server *s, ioreq_t *proto_p,
+struct ioreq_server *hvm_select_ioreq_server(struct domain *d,
+                                             ioreq_t *p);
+int hvm_send_ioreq(struct ioreq_server *s, ioreq_t *proto_p,
                    bool buffered);
 unsigned int hvm_broadcast_ioreq(ioreq_t *p, bool buffered);
 
-- 
2.7.4



^ permalink raw reply related	[flat|nested] 127+ messages in thread

* [PATCH V3 08/23] xen/ioreq: Move x86's ioreq_server to struct domain
  2020-11-30 10:31 Oleksandr Tyshchenko
                   ` (6 preceding siblings ...)
  2020-11-30 10:31 ` [PATCH V3 07/23] xen/ioreq: Make x86's hvm_ioreq_(page/vcpu/server) structs common Oleksandr Tyshchenko
@ 2020-11-30 10:31 ` Oleksandr Tyshchenko
  2020-12-07 12:04   ` Jan Beulich
  2020-11-30 10:31 ` [PATCH V3 09/23] xen/dm: Make x86's DM feature common Oleksandr Tyshchenko
                   ` (16 subsequent siblings)
  24 siblings, 1 reply; 127+ messages in thread
From: Oleksandr Tyshchenko @ 2020-11-30 10:31 UTC (permalink / raw)
  To: xen-devel
  Cc: Oleksandr Tyshchenko, Paul Durrant, Andrew Cooper, George Dunlap,
	Ian Jackson, Jan Beulich, Julien Grall, Stefano Stabellini,
	Wei Liu, Roger Pau Monné,
	Julien Grall

From: Oleksandr Tyshchenko <oleksandr_tyshchenko@epam.com>

IOREQ is now a common feature and this struct will be used
on Arm as is. Move it to the common struct domain. This also
significantly reduces the layering violation in the common code
(*arch.hvm* usage).

We don't move ioreq_gfn since it is not used in the common code
(the "legacy" mechanism is x86 specific).

Signed-off-by: Oleksandr Tyshchenko <oleksandr_tyshchenko@epam.com>
CC: Julien Grall <julien.grall@arm.com>

---
Please note, this is a split/cleanup/hardening of Julien's PoC:
"Add support for Guest IO forwarding to a device emulator"

Changes V1 -> V2:
   - new patch

Changes V2 -> V3:
   - remove the mention of "ioreq_gfn" from patch subject/description
   - update the patch now that the "legacy interface" is x86 specific
   - drop hvm_params related changes in arch/x86/hvm/hvm.c
   - leave ioreq_gfn in hvm_domain
---
---
 xen/common/ioreq.c               | 60 ++++++++++++++++++++--------------------
 xen/include/asm-x86/hvm/domain.h |  8 ------
 xen/include/xen/sched.h          | 10 +++++++
 3 files changed, 40 insertions(+), 38 deletions(-)

diff --git a/xen/common/ioreq.c b/xen/common/ioreq.c
index 3e80fc6..b7c2d5a 100644
--- a/xen/common/ioreq.c
+++ b/xen/common/ioreq.c
@@ -38,13 +38,13 @@ static void set_ioreq_server(struct domain *d, unsigned int id,
                              struct ioreq_server *s)
 {
     ASSERT(id < MAX_NR_IOREQ_SERVERS);
-    ASSERT(!s || !d->arch.hvm.ioreq_server.server[id]);
+    ASSERT(!s || !d->ioreq_server.server[id]);
 
-    d->arch.hvm.ioreq_server.server[id] = s;
+    d->ioreq_server.server[id] = s;
 }
 
 #define GET_IOREQ_SERVER(d, id) \
-    (d)->arch.hvm.ioreq_server.server[id]
+    (d)->ioreq_server.server[id]
 
 static struct ioreq_server *get_ioreq_server(const struct domain *d,
                                              unsigned int id)
@@ -285,7 +285,7 @@ bool is_ioreq_server_page(struct domain *d, const struct page_info *page)
     unsigned int id;
     bool found = false;
 
-    spin_lock_recursive(&d->arch.hvm.ioreq_server.lock);
+    spin_lock_recursive(&d->ioreq_server.lock);
 
     FOR_EACH_IOREQ_SERVER(d, id, s)
     {
@@ -296,7 +296,7 @@ bool is_ioreq_server_page(struct domain *d, const struct page_info *page)
         }
     }
 
-    spin_unlock_recursive(&d->arch.hvm.ioreq_server.lock);
+    spin_unlock_recursive(&d->ioreq_server.lock);
 
     return found;
 }
@@ -606,7 +606,7 @@ int hvm_create_ioreq_server(struct domain *d, int bufioreq_handling,
         return -ENOMEM;
 
     domain_pause(d);
-    spin_lock_recursive(&d->arch.hvm.ioreq_server.lock);
+    spin_lock_recursive(&d->ioreq_server.lock);
 
     for ( i = 0; i < MAX_NR_IOREQ_SERVERS; i++ )
     {
@@ -634,13 +634,13 @@ int hvm_create_ioreq_server(struct domain *d, int bufioreq_handling,
     if ( id )
         *id = i;
 
-    spin_unlock_recursive(&d->arch.hvm.ioreq_server.lock);
+    spin_unlock_recursive(&d->ioreq_server.lock);
     domain_unpause(d);
 
     return 0;
 
  fail:
-    spin_unlock_recursive(&d->arch.hvm.ioreq_server.lock);
+    spin_unlock_recursive(&d->ioreq_server.lock);
     domain_unpause(d);
 
     xfree(s);
@@ -652,7 +652,7 @@ int hvm_destroy_ioreq_server(struct domain *d, ioservid_t id)
     struct ioreq_server *s;
     int rc;
 
-    spin_lock_recursive(&d->arch.hvm.ioreq_server.lock);
+    spin_lock_recursive(&d->ioreq_server.lock);
 
     s = get_ioreq_server(d, id);
 
@@ -684,7 +684,7 @@ int hvm_destroy_ioreq_server(struct domain *d, ioservid_t id)
     rc = 0;
 
  out:
-    spin_unlock_recursive(&d->arch.hvm.ioreq_server.lock);
+    spin_unlock_recursive(&d->ioreq_server.lock);
 
     return rc;
 }
@@ -697,7 +697,7 @@ int hvm_get_ioreq_server_info(struct domain *d, ioservid_t id,
     struct ioreq_server *s;
     int rc;
 
-    spin_lock_recursive(&d->arch.hvm.ioreq_server.lock);
+    spin_lock_recursive(&d->ioreq_server.lock);
 
     s = get_ioreq_server(d, id);
 
@@ -731,7 +731,7 @@ int hvm_get_ioreq_server_info(struct domain *d, ioservid_t id,
     rc = 0;
 
  out:
-    spin_unlock_recursive(&d->arch.hvm.ioreq_server.lock);
+    spin_unlock_recursive(&d->ioreq_server.lock);
 
     return rc;
 }
@@ -744,7 +744,7 @@ int hvm_get_ioreq_server_frame(struct domain *d, ioservid_t id,
 
     ASSERT(is_hvm_domain(d));
 
-    spin_lock_recursive(&d->arch.hvm.ioreq_server.lock);
+    spin_lock_recursive(&d->ioreq_server.lock);
 
     s = get_ioreq_server(d, id);
 
@@ -782,7 +782,7 @@ int hvm_get_ioreq_server_frame(struct domain *d, ioservid_t id,
     }
 
  out:
-    spin_unlock_recursive(&d->arch.hvm.ioreq_server.lock);
+    spin_unlock_recursive(&d->ioreq_server.lock);
 
     return rc;
 }
@@ -798,7 +798,7 @@ int hvm_map_io_range_to_ioreq_server(struct domain *d, ioservid_t id,
     if ( start > end )
         return -EINVAL;
 
-    spin_lock_recursive(&d->arch.hvm.ioreq_server.lock);
+    spin_lock_recursive(&d->ioreq_server.lock);
 
     s = get_ioreq_server(d, id);
 
@@ -834,7 +834,7 @@ int hvm_map_io_range_to_ioreq_server(struct domain *d, ioservid_t id,
     rc = rangeset_add_range(r, start, end);
 
  out:
-    spin_unlock_recursive(&d->arch.hvm.ioreq_server.lock);
+    spin_unlock_recursive(&d->ioreq_server.lock);
 
     return rc;
 }
@@ -850,7 +850,7 @@ int hvm_unmap_io_range_from_ioreq_server(struct domain *d, ioservid_t id,
     if ( start > end )
         return -EINVAL;
 
-    spin_lock_recursive(&d->arch.hvm.ioreq_server.lock);
+    spin_lock_recursive(&d->ioreq_server.lock);
 
     s = get_ioreq_server(d, id);
 
@@ -886,7 +886,7 @@ int hvm_unmap_io_range_from_ioreq_server(struct domain *d, ioservid_t id,
     rc = rangeset_remove_range(r, start, end);
 
  out:
-    spin_unlock_recursive(&d->arch.hvm.ioreq_server.lock);
+    spin_unlock_recursive(&d->ioreq_server.lock);
 
     return rc;
 }
@@ -911,7 +911,7 @@ int hvm_map_mem_type_to_ioreq_server(struct domain *d, ioservid_t id,
     if ( flags & ~XEN_DMOP_IOREQ_MEM_ACCESS_WRITE )
         return -EINVAL;
 
-    spin_lock_recursive(&d->arch.hvm.ioreq_server.lock);
+    spin_lock_recursive(&d->ioreq_server.lock);
 
     s = get_ioreq_server(d, id);
 
@@ -926,7 +926,7 @@ int hvm_map_mem_type_to_ioreq_server(struct domain *d, ioservid_t id,
     rc = arch_ioreq_server_map_mem_type(d, s, flags);
 
  out:
-    spin_unlock_recursive(&d->arch.hvm.ioreq_server.lock);
+    spin_unlock_recursive(&d->ioreq_server.lock);
 
     return rc;
 }
@@ -937,7 +937,7 @@ int hvm_set_ioreq_server_state(struct domain *d, ioservid_t id,
     struct ioreq_server *s;
     int rc;
 
-    spin_lock_recursive(&d->arch.hvm.ioreq_server.lock);
+    spin_lock_recursive(&d->ioreq_server.lock);
 
     s = get_ioreq_server(d, id);
 
@@ -961,7 +961,7 @@ int hvm_set_ioreq_server_state(struct domain *d, ioservid_t id,
     rc = 0;
 
  out:
-    spin_unlock_recursive(&d->arch.hvm.ioreq_server.lock);
+    spin_unlock_recursive(&d->ioreq_server.lock);
     return rc;
 }
 
@@ -971,7 +971,7 @@ int hvm_all_ioreq_servers_add_vcpu(struct domain *d, struct vcpu *v)
     unsigned int id;
     int rc;
 
-    spin_lock_recursive(&d->arch.hvm.ioreq_server.lock);
+    spin_lock_recursive(&d->ioreq_server.lock);
 
     FOR_EACH_IOREQ_SERVER(d, id, s)
     {
@@ -980,7 +980,7 @@ int hvm_all_ioreq_servers_add_vcpu(struct domain *d, struct vcpu *v)
             goto fail;
     }
 
-    spin_unlock_recursive(&d->arch.hvm.ioreq_server.lock);
+    spin_unlock_recursive(&d->ioreq_server.lock);
 
     return 0;
 
@@ -995,7 +995,7 @@ int hvm_all_ioreq_servers_add_vcpu(struct domain *d, struct vcpu *v)
         hvm_ioreq_server_remove_vcpu(s, v);
     }
 
-    spin_unlock_recursive(&d->arch.hvm.ioreq_server.lock);
+    spin_unlock_recursive(&d->ioreq_server.lock);
 
     return rc;
 }
@@ -1005,12 +1005,12 @@ void hvm_all_ioreq_servers_remove_vcpu(struct domain *d, struct vcpu *v)
     struct ioreq_server *s;
     unsigned int id;
 
-    spin_lock_recursive(&d->arch.hvm.ioreq_server.lock);
+    spin_lock_recursive(&d->ioreq_server.lock);
 
     FOR_EACH_IOREQ_SERVER(d, id, s)
         hvm_ioreq_server_remove_vcpu(s, v);
 
-    spin_unlock_recursive(&d->arch.hvm.ioreq_server.lock);
+    spin_unlock_recursive(&d->ioreq_server.lock);
 }
 
 void hvm_destroy_all_ioreq_servers(struct domain *d)
@@ -1021,7 +1021,7 @@ void hvm_destroy_all_ioreq_servers(struct domain *d)
     if ( !arch_ioreq_server_destroy_all(d) )
         return;
 
-    spin_lock_recursive(&d->arch.hvm.ioreq_server.lock);
+    spin_lock_recursive(&d->ioreq_server.lock);
 
     /* No need to domain_pause() as the domain is being torn down */
 
@@ -1039,7 +1039,7 @@ void hvm_destroy_all_ioreq_servers(struct domain *d)
         xfree(s);
     }
 
-    spin_unlock_recursive(&d->arch.hvm.ioreq_server.lock);
+    spin_unlock_recursive(&d->ioreq_server.lock);
 }
 
 struct ioreq_server *hvm_select_ioreq_server(struct domain *d,
@@ -1271,7 +1271,7 @@ unsigned int hvm_broadcast_ioreq(ioreq_t *p, bool buffered)
 
 void hvm_ioreq_init(struct domain *d)
 {
-    spin_lock_init(&d->arch.hvm.ioreq_server.lock);
+    spin_lock_init(&d->ioreq_server.lock);
 
     arch_ioreq_domain_init(d);
 }
diff --git a/xen/include/asm-x86/hvm/domain.h b/xen/include/asm-x86/hvm/domain.h
index 1c4ca47..b8be1ad 100644
--- a/xen/include/asm-x86/hvm/domain.h
+++ b/xen/include/asm-x86/hvm/domain.h
@@ -63,8 +63,6 @@ struct hvm_pi_ops {
     void (*vcpu_block)(struct vcpu *);
 };
 
-#define MAX_NR_IOREQ_SERVERS 8
-
 struct hvm_domain {
     /* Guest page range used for non-default ioreq servers */
     struct {
@@ -73,12 +71,6 @@ struct hvm_domain {
         unsigned long legacy_mask; /* indexed by HVM param number */
     } ioreq_gfn;
 
-    /* Lock protects all other values in the sub-struct and the default */
-    struct {
-        spinlock_t              lock;
-        struct ioreq_server *server[MAX_NR_IOREQ_SERVERS];
-    } ioreq_server;
-
     /* Cached CF8 for guest PCI config cycles */
     uint32_t                pci_cf8;
 
diff --git a/xen/include/xen/sched.h b/xen/include/xen/sched.h
index a345cc0..62cbcdb 100644
--- a/xen/include/xen/sched.h
+++ b/xen/include/xen/sched.h
@@ -316,6 +316,8 @@ struct sched_unit {
 
 struct evtchn_port_ops;
 
+#define MAX_NR_IOREQ_SERVERS 8
+
 struct domain
 {
     domid_t          domain_id;
@@ -523,6 +525,14 @@ struct domain
     /* Argo interdomain communication support */
     struct argo_domain *argo;
 #endif
+
+#ifdef CONFIG_IOREQ_SERVER
+    /* Lock protects all other values in the sub-struct and the default */
+    struct {
+        spinlock_t              lock;
+        struct ioreq_server     *server[MAX_NR_IOREQ_SERVERS];
+    } ioreq_server;
+#endif
 };
 
 static inline struct page_list_head *page_to_list(
-- 
2.7.4



^ permalink raw reply related	[flat|nested] 127+ messages in thread

* [PATCH V3 09/23] xen/dm: Make x86's DM feature common
  2020-11-30 10:31 Oleksandr Tyshchenko
                   ` (7 preceding siblings ...)
  2020-11-30 10:31 ` [PATCH V3 08/23] xen/ioreq: Move x86's ioreq_server to struct domain Oleksandr Tyshchenko
@ 2020-11-30 10:31 ` Oleksandr Tyshchenko
  2020-12-07 12:08   ` Jan Beulich
  2020-11-30 10:31 ` [PATCH V3 10/23] xen/mm: Make x86's XENMEM_resource_ioreq_server handling common Oleksandr Tyshchenko
                   ` (15 subsequent siblings)
  24 siblings, 1 reply; 127+ messages in thread
From: Oleksandr Tyshchenko @ 2020-11-30 10:31 UTC (permalink / raw)
  To: xen-devel
  Cc: Julien Grall, Jan Beulich, Andrew Cooper, Roger Pau Monné,
	Wei Liu, George Dunlap, Ian Jackson, Julien Grall,
	Stefano Stabellini, Daniel De Graaf, Oleksandr Tyshchenko

From: Julien Grall <julien.grall@arm.com>

As a lot of x86 code can be re-used on Arm later on, this patch
splits device model support into common and arch-specific parts.

The common DM feature is meant to be built with the IOREQ_SERVER
option enabled (as is the IOREQ feature), which for now is selected
by x86's HVM config.

Also update XSM code a bit to let DM op be used on Arm.

This support is going to be used on Arm to be able to run a device
emulator outside of the Xen hypervisor.
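
The resulting split can be sketched with a tiny standalone model (toy op
numbers and names; the real interface is xen_dm_op together with the
dmop_args handling shown below): the common dispatcher keeps the buffer
copying, validation and the IOREQ server ops, and hands everything else to
the per-architecture arch_dm_op hook, which can reject ops its architecture
does not implement.

    #include <errno.h>
    #include <stdio.h>

    /* Toy model of the common/arch split (toy op numbers, not xen_dm_op). */
    struct toy_dm_op { unsigned int op; };

    /* Arch side: each architecture implements only the ops it supports. */
    static int arch_dm_op(const struct toy_dm_op *op)
    {
        switch ( op->op )
        {
        case 1: /* stand-in for an arch-only op, e.g. set_pci_intx_level on x86 */
            return 0;
        default:
            return -EOPNOTSUPP;
        }
    }

    /* Common side: buffer copying, validation and the IOREQ server ops live
       here once, for every architecture that enables IOREQ_SERVER. */
    static int dm_op(const struct toy_dm_op *op)
    {
        switch ( op->op )
        {
        case 0: /* stand-in for a common op, e.g. create_ioreq_server */
            return 0;
        default:
            return arch_dm_op(op); /* everything else is the architecture's business */
        }
    }

    int main(void)
    {
        const struct toy_dm_op common_op = { 0 }, arch_op = { 1 }, unknown = { 42 };

        printf("%d %d %d\n", dm_op(&common_op), dm_op(&arch_op), dm_op(&unknown));
        return 0;
    }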

Signed-off-by: Julien Grall <julien.grall@arm.com>
Signed-off-by: Oleksandr Tyshchenko <oleksandr_tyshchenko@epam.com>

---
Please note, this is a split/cleanup/hardening of Julien's PoC:
"Add support for Guest IO forwarding to a device emulator"

Changes RFC -> V1:
   - update XSM, related changes were pulled from:
     [RFC PATCH V1 04/12] xen/arm: Introduce arch specific bits for IOREQ/DM features

Changes V1 -> V2:
   - update the author of a patch
   - update patch description
   - introduce xen/dm.h and move definitions here

Changes V2 -> V3:
   - no changes
---
---
 xen/arch/x86/hvm/dm.c   | 291 ++++--------------------------------------------
 xen/common/Makefile     |   1 +
 xen/common/dm.c         | 291 ++++++++++++++++++++++++++++++++++++++++++++++++
 xen/include/xen/dm.h    |  44 ++++++++
 xen/include/xsm/dummy.h |   4 +-
 xen/include/xsm/xsm.h   |   6 +-
 xen/xsm/dummy.c         |   2 +-
 xen/xsm/flask/hooks.c   |   5 +-
 8 files changed, 364 insertions(+), 280 deletions(-)
 create mode 100644 xen/common/dm.c
 create mode 100644 xen/include/xen/dm.h

diff --git a/xen/arch/x86/hvm/dm.c b/xen/arch/x86/hvm/dm.c
index 71f5ca4..35f860a 100644
--- a/xen/arch/x86/hvm/dm.c
+++ b/xen/arch/x86/hvm/dm.c
@@ -16,6 +16,7 @@
 
 #include <xen/event.h>
 #include <xen/guest_access.h>
+#include <xen/dm.h>
 #include <xen/hypercall.h>
 #include <xen/nospec.h>
 #include <xen/sched.h>
@@ -29,13 +30,6 @@
 
 #include <public/hvm/hvm_op.h>
 
-struct dmop_args {
-    domid_t domid;
-    unsigned int nr_bufs;
-    /* Reserve enough buf elements for all current hypercalls. */
-    struct xen_dm_op_buf buf[2];
-};
-
 static bool _raw_copy_from_guest_buf_offset(void *dst,
                                             const struct dmop_args *args,
                                             unsigned int buf_idx,
@@ -338,148 +332,20 @@ static int inject_event(struct domain *d,
     return 0;
 }
 
-static int dm_op(const struct dmop_args *op_args)
+int arch_dm_op(struct xen_dm_op *op, struct domain *d,
+               const struct dmop_args *op_args, bool *const_op)
 {
-    struct domain *d;
-    struct xen_dm_op op;
-    bool const_op = true;
     long rc;
-    size_t offset;
-
-    static const uint8_t op_size[] = {
-        [XEN_DMOP_create_ioreq_server]              = sizeof(struct xen_dm_op_create_ioreq_server),
-        [XEN_DMOP_get_ioreq_server_info]            = sizeof(struct xen_dm_op_get_ioreq_server_info),
-        [XEN_DMOP_map_io_range_to_ioreq_server]     = sizeof(struct xen_dm_op_ioreq_server_range),
-        [XEN_DMOP_unmap_io_range_from_ioreq_server] = sizeof(struct xen_dm_op_ioreq_server_range),
-        [XEN_DMOP_set_ioreq_server_state]           = sizeof(struct xen_dm_op_set_ioreq_server_state),
-        [XEN_DMOP_destroy_ioreq_server]             = sizeof(struct xen_dm_op_destroy_ioreq_server),
-        [XEN_DMOP_track_dirty_vram]                 = sizeof(struct xen_dm_op_track_dirty_vram),
-        [XEN_DMOP_set_pci_intx_level]               = sizeof(struct xen_dm_op_set_pci_intx_level),
-        [XEN_DMOP_set_isa_irq_level]                = sizeof(struct xen_dm_op_set_isa_irq_level),
-        [XEN_DMOP_set_pci_link_route]               = sizeof(struct xen_dm_op_set_pci_link_route),
-        [XEN_DMOP_modified_memory]                  = sizeof(struct xen_dm_op_modified_memory),
-        [XEN_DMOP_set_mem_type]                     = sizeof(struct xen_dm_op_set_mem_type),
-        [XEN_DMOP_inject_event]                     = sizeof(struct xen_dm_op_inject_event),
-        [XEN_DMOP_inject_msi]                       = sizeof(struct xen_dm_op_inject_msi),
-        [XEN_DMOP_map_mem_type_to_ioreq_server]     = sizeof(struct xen_dm_op_map_mem_type_to_ioreq_server),
-        [XEN_DMOP_remote_shutdown]                  = sizeof(struct xen_dm_op_remote_shutdown),
-        [XEN_DMOP_relocate_memory]                  = sizeof(struct xen_dm_op_relocate_memory),
-        [XEN_DMOP_pin_memory_cacheattr]             = sizeof(struct xen_dm_op_pin_memory_cacheattr),
-    };
-
-    rc = rcu_lock_remote_domain_by_id(op_args->domid, &d);
-    if ( rc )
-        return rc;
-
-    if ( !is_hvm_domain(d) )
-        goto out;
-
-    rc = xsm_dm_op(XSM_DM_PRIV, d);
-    if ( rc )
-        goto out;
-
-    offset = offsetof(struct xen_dm_op, u);
-
-    rc = -EFAULT;
-    if ( op_args->buf[0].size < offset )
-        goto out;
-
-    if ( copy_from_guest_offset((void *)&op, op_args->buf[0].h, 0, offset) )
-        goto out;
-
-    if ( op.op >= ARRAY_SIZE(op_size) )
-    {
-        rc = -EOPNOTSUPP;
-        goto out;
-    }
-
-    op.op = array_index_nospec(op.op, ARRAY_SIZE(op_size));
-
-    if ( op_args->buf[0].size < offset + op_size[op.op] )
-        goto out;
-
-    if ( copy_from_guest_offset((void *)&op.u, op_args->buf[0].h, offset,
-                                op_size[op.op]) )
-        goto out;
-
-    rc = -EINVAL;
-    if ( op.pad )
-        goto out;
-
-    switch ( op.op )
-    {
-    case XEN_DMOP_create_ioreq_server:
-    {
-        struct xen_dm_op_create_ioreq_server *data =
-            &op.u.create_ioreq_server;
-
-        const_op = false;
-
-        rc = -EINVAL;
-        if ( data->pad[0] || data->pad[1] || data->pad[2] )
-            break;
-
-        rc = hvm_create_ioreq_server(d, data->handle_bufioreq,
-                                     &data->id);
-        break;
-    }
 
-    case XEN_DMOP_get_ioreq_server_info:
+    switch ( op->op )
     {
-        struct xen_dm_op_get_ioreq_server_info *data =
-            &op.u.get_ioreq_server_info;
-        const uint16_t valid_flags = XEN_DMOP_no_gfns;
-
-        const_op = false;
-
-        rc = -EINVAL;
-        if ( data->flags & ~valid_flags )
-            break;
-
-        rc = hvm_get_ioreq_server_info(d, data->id,
-                                       (data->flags & XEN_DMOP_no_gfns) ?
-                                       NULL : &data->ioreq_gfn,
-                                       (data->flags & XEN_DMOP_no_gfns) ?
-                                       NULL : &data->bufioreq_gfn,
-                                       &data->bufioreq_port);
-        break;
-    }
-
-    case XEN_DMOP_map_io_range_to_ioreq_server:
-    {
-        const struct xen_dm_op_ioreq_server_range *data =
-            &op.u.map_io_range_to_ioreq_server;
-
-        rc = -EINVAL;
-        if ( data->pad )
-            break;
-
-        rc = hvm_map_io_range_to_ioreq_server(d, data->id, data->type,
-                                              data->start, data->end);
-        break;
-    }
-
-    case XEN_DMOP_unmap_io_range_from_ioreq_server:
-    {
-        const struct xen_dm_op_ioreq_server_range *data =
-            &op.u.unmap_io_range_from_ioreq_server;
-
-        rc = -EINVAL;
-        if ( data->pad )
-            break;
-
-        rc = hvm_unmap_io_range_from_ioreq_server(d, data->id, data->type,
-                                                  data->start, data->end);
-        break;
-    }
-
     case XEN_DMOP_map_mem_type_to_ioreq_server:
     {
         struct xen_dm_op_map_mem_type_to_ioreq_server *data =
-            &op.u.map_mem_type_to_ioreq_server;
+            &op->u.map_mem_type_to_ioreq_server;
         unsigned long first_gfn = data->opaque;
 
-        const_op = false;
+        *const_op = false;
 
         rc = -EOPNOTSUPP;
         if ( !hap_enabled(d) )
@@ -523,36 +389,10 @@ static int dm_op(const struct dmop_args *op_args)
         break;
     }
 
-    case XEN_DMOP_set_ioreq_server_state:
-    {
-        const struct xen_dm_op_set_ioreq_server_state *data =
-            &op.u.set_ioreq_server_state;
-
-        rc = -EINVAL;
-        if ( data->pad )
-            break;
-
-        rc = hvm_set_ioreq_server_state(d, data->id, !!data->enabled);
-        break;
-    }
-
-    case XEN_DMOP_destroy_ioreq_server:
-    {
-        const struct xen_dm_op_destroy_ioreq_server *data =
-            &op.u.destroy_ioreq_server;
-
-        rc = -EINVAL;
-        if ( data->pad )
-            break;
-
-        rc = hvm_destroy_ioreq_server(d, data->id);
-        break;
-    }
-
     case XEN_DMOP_track_dirty_vram:
     {
         const struct xen_dm_op_track_dirty_vram *data =
-            &op.u.track_dirty_vram;
+            &op->u.track_dirty_vram;
 
         rc = -EINVAL;
         if ( data->pad )
@@ -568,7 +408,7 @@ static int dm_op(const struct dmop_args *op_args)
     case XEN_DMOP_set_pci_intx_level:
     {
         const struct xen_dm_op_set_pci_intx_level *data =
-            &op.u.set_pci_intx_level;
+            &op->u.set_pci_intx_level;
 
         rc = set_pci_intx_level(d, data->domain, data->bus,
                                 data->device, data->intx,
@@ -579,7 +419,7 @@ static int dm_op(const struct dmop_args *op_args)
     case XEN_DMOP_set_isa_irq_level:
     {
         const struct xen_dm_op_set_isa_irq_level *data =
-            &op.u.set_isa_irq_level;
+            &op->u.set_isa_irq_level;
 
         rc = set_isa_irq_level(d, data->isa_irq, data->level);
         break;
@@ -588,7 +428,7 @@ static int dm_op(const struct dmop_args *op_args)
     case XEN_DMOP_set_pci_link_route:
     {
         const struct xen_dm_op_set_pci_link_route *data =
-            &op.u.set_pci_link_route;
+            &op->u.set_pci_link_route;
 
         rc = hvm_set_pci_link_route(d, data->link, data->isa_irq);
         break;
@@ -597,19 +437,19 @@ static int dm_op(const struct dmop_args *op_args)
     case XEN_DMOP_modified_memory:
     {
         struct xen_dm_op_modified_memory *data =
-            &op.u.modified_memory;
+            &op->u.modified_memory;
 
         rc = modified_memory(d, op_args, data);
-        const_op = !rc;
+        *const_op = !rc;
         break;
     }
 
     case XEN_DMOP_set_mem_type:
     {
         struct xen_dm_op_set_mem_type *data =
-            &op.u.set_mem_type;
+            &op->u.set_mem_type;
 
-        const_op = false;
+        *const_op = false;
 
         rc = -EINVAL;
         if ( data->pad )
@@ -622,7 +462,7 @@ static int dm_op(const struct dmop_args *op_args)
     case XEN_DMOP_inject_event:
     {
         const struct xen_dm_op_inject_event *data =
-            &op.u.inject_event;
+            &op->u.inject_event;
 
         rc = -EINVAL;
         if ( data->pad0 || data->pad1 )
@@ -635,7 +475,7 @@ static int dm_op(const struct dmop_args *op_args)
     case XEN_DMOP_inject_msi:
     {
         const struct xen_dm_op_inject_msi *data =
-            &op.u.inject_msi;
+            &op->u.inject_msi;
 
         rc = -EINVAL;
         if ( data->pad )
@@ -648,7 +488,7 @@ static int dm_op(const struct dmop_args *op_args)
     case XEN_DMOP_remote_shutdown:
     {
         const struct xen_dm_op_remote_shutdown *data =
-            &op.u.remote_shutdown;
+            &op->u.remote_shutdown;
 
         domain_shutdown(d, data->reason);
         rc = 0;
@@ -657,7 +497,7 @@ static int dm_op(const struct dmop_args *op_args)
 
     case XEN_DMOP_relocate_memory:
     {
-        struct xen_dm_op_relocate_memory *data = &op.u.relocate_memory;
+        struct xen_dm_op_relocate_memory *data = &op->u.relocate_memory;
         struct xen_add_to_physmap xatp = {
             .domid = op_args->domid,
             .size = data->size,
@@ -680,7 +520,7 @@ static int dm_op(const struct dmop_args *op_args)
             data->size -= rc;
             data->src_gfn += rc;
             data->dst_gfn += rc;
-            const_op = false;
+            *const_op = false;
             rc = -ERESTART;
         }
         break;
@@ -689,7 +529,7 @@ static int dm_op(const struct dmop_args *op_args)
     case XEN_DMOP_pin_memory_cacheattr:
     {
         const struct xen_dm_op_pin_memory_cacheattr *data =
-            &op.u.pin_memory_cacheattr;
+            &op->u.pin_memory_cacheattr;
 
         if ( data->pad )
         {
@@ -707,97 +547,6 @@ static int dm_op(const struct dmop_args *op_args)
         break;
     }
 
-    if ( (!rc || rc == -ERESTART) &&
-         !const_op && copy_to_guest_offset(op_args->buf[0].h, offset,
-                                           (void *)&op.u, op_size[op.op]) )
-        rc = -EFAULT;
-
- out:
-    rcu_unlock_domain(d);
-
-    return rc;
-}
-
-#include <compat/hvm/dm_op.h>
-
-CHECK_dm_op_create_ioreq_server;
-CHECK_dm_op_get_ioreq_server_info;
-CHECK_dm_op_ioreq_server_range;
-CHECK_dm_op_set_ioreq_server_state;
-CHECK_dm_op_destroy_ioreq_server;
-CHECK_dm_op_track_dirty_vram;
-CHECK_dm_op_set_pci_intx_level;
-CHECK_dm_op_set_isa_irq_level;
-CHECK_dm_op_set_pci_link_route;
-CHECK_dm_op_modified_memory;
-CHECK_dm_op_set_mem_type;
-CHECK_dm_op_inject_event;
-CHECK_dm_op_inject_msi;
-CHECK_dm_op_map_mem_type_to_ioreq_server;
-CHECK_dm_op_remote_shutdown;
-CHECK_dm_op_relocate_memory;
-CHECK_dm_op_pin_memory_cacheattr;
-
-int compat_dm_op(domid_t domid,
-                 unsigned int nr_bufs,
-                 XEN_GUEST_HANDLE_PARAM(void) bufs)
-{
-    struct dmop_args args;
-    unsigned int i;
-    int rc;
-
-    if ( nr_bufs > ARRAY_SIZE(args.buf) )
-        return -E2BIG;
-
-    args.domid = domid;
-    args.nr_bufs = array_index_nospec(nr_bufs, ARRAY_SIZE(args.buf) + 1);
-
-    for ( i = 0; i < args.nr_bufs; i++ )
-    {
-        struct compat_dm_op_buf cmp;
-
-        if ( copy_from_guest_offset(&cmp, bufs, i, 1) )
-            return -EFAULT;
-
-#define XLAT_dm_op_buf_HNDL_h(_d_, _s_) \
-        guest_from_compat_handle((_d_)->h, (_s_)->h)
-
-        XLAT_dm_op_buf(&args.buf[i], &cmp);
-
-#undef XLAT_dm_op_buf_HNDL_h
-    }
-
-    rc = dm_op(&args);
-
-    if ( rc == -ERESTART )
-        rc = hypercall_create_continuation(__HYPERVISOR_dm_op, "iih",
-                                           domid, nr_bufs, bufs);
-
-    return rc;
-}
-
-long do_dm_op(domid_t domid,
-              unsigned int nr_bufs,
-              XEN_GUEST_HANDLE_PARAM(xen_dm_op_buf_t) bufs)
-{
-    struct dmop_args args;
-    int rc;
-
-    if ( nr_bufs > ARRAY_SIZE(args.buf) )
-        return -E2BIG;
-
-    args.domid = domid;
-    args.nr_bufs = array_index_nospec(nr_bufs, ARRAY_SIZE(args.buf) + 1);
-
-    if ( copy_from_guest_offset(&args.buf[0], bufs, 0, args.nr_bufs) )
-        return -EFAULT;
-
-    rc = dm_op(&args);
-
-    if ( rc == -ERESTART )
-        rc = hypercall_create_continuation(__HYPERVISOR_dm_op, "iih",
-                                           domid, nr_bufs, bufs);
-
     return rc;
 }
 
diff --git a/xen/common/Makefile b/xen/common/Makefile
index c0e91c4..460f214 100644
--- a/xen/common/Makefile
+++ b/xen/common/Makefile
@@ -6,6 +6,7 @@ obj-$(CONFIG_CORE_PARKING) += core_parking.o
 obj-y += cpu.o
 obj-$(CONFIG_DEBUG_TRACE) += debugtrace.o
 obj-$(CONFIG_HAS_DEVICE_TREE) += device_tree.o
+obj-$(CONFIG_IOREQ_SERVER) += dm.o
 obj-y += domain.o
 obj-y += event_2l.o
 obj-y += event_channel.o
diff --git a/xen/common/dm.c b/xen/common/dm.c
new file mode 100644
index 0000000..36e01a2
--- /dev/null
+++ b/xen/common/dm.c
@@ -0,0 +1,291 @@
+/*
+ * Copyright (c) 2016 Citrix Systems Inc.
+ * Copyright (c) 2019 Arm ltd.
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms and conditions of the GNU General Public License,
+ * version 2, as published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope it will be useful, but WITHOUT
+ * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
+ * FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License for
+ * more details.
+ *
+ * You should have received a copy of the GNU General Public License along with
+ * this program; If not, see <http://www.gnu.org/licenses/>.
+ */
+
+#include <xen/guest_access.h>
+#include <xen/dm.h>
+#include <xen/hypercall.h>
+#include <xen/ioreq.h>
+#include <xen/nospec.h>
+
+static int dm_op(const struct dmop_args *op_args)
+{
+    struct domain *d;
+    struct xen_dm_op op;
+    long rc;
+    bool const_op = true;
+    const size_t offset = offsetof(struct xen_dm_op, u);
+
+    static const uint8_t op_size[] = {
+        [XEN_DMOP_create_ioreq_server]              = sizeof(struct xen_dm_op_create_ioreq_server),
+        [XEN_DMOP_get_ioreq_server_info]            = sizeof(struct xen_dm_op_get_ioreq_server_info),
+        [XEN_DMOP_map_io_range_to_ioreq_server]     = sizeof(struct xen_dm_op_ioreq_server_range),
+        [XEN_DMOP_unmap_io_range_from_ioreq_server] = sizeof(struct xen_dm_op_ioreq_server_range),
+        [XEN_DMOP_set_ioreq_server_state]           = sizeof(struct xen_dm_op_set_ioreq_server_state),
+        [XEN_DMOP_destroy_ioreq_server]             = sizeof(struct xen_dm_op_destroy_ioreq_server),
+        [XEN_DMOP_track_dirty_vram]                 = sizeof(struct xen_dm_op_track_dirty_vram),
+        [XEN_DMOP_set_pci_intx_level]               = sizeof(struct xen_dm_op_set_pci_intx_level),
+        [XEN_DMOP_set_isa_irq_level]                = sizeof(struct xen_dm_op_set_isa_irq_level),
+        [XEN_DMOP_set_pci_link_route]               = sizeof(struct xen_dm_op_set_pci_link_route),
+        [XEN_DMOP_modified_memory]                  = sizeof(struct xen_dm_op_modified_memory),
+        [XEN_DMOP_set_mem_type]                     = sizeof(struct xen_dm_op_set_mem_type),
+        [XEN_DMOP_inject_event]                     = sizeof(struct xen_dm_op_inject_event),
+        [XEN_DMOP_inject_msi]                       = sizeof(struct xen_dm_op_inject_msi),
+        [XEN_DMOP_map_mem_type_to_ioreq_server]     = sizeof(struct xen_dm_op_map_mem_type_to_ioreq_server),
+        [XEN_DMOP_remote_shutdown]                  = sizeof(struct xen_dm_op_remote_shutdown),
+        [XEN_DMOP_relocate_memory]                  = sizeof(struct xen_dm_op_relocate_memory),
+        [XEN_DMOP_pin_memory_cacheattr]             = sizeof(struct xen_dm_op_pin_memory_cacheattr),
+    };
+
+    rc = rcu_lock_remote_domain_by_id(op_args->domid, &d);
+    if ( rc )
+        return rc;
+
+    if ( !is_hvm_domain(d) )
+        goto out;
+
+    rc = xsm_dm_op(XSM_DM_PRIV, d);
+    if ( rc )
+        goto out;
+
+    rc = -EFAULT;
+    if ( op_args->buf[0].size < offset )
+        goto out;
+
+    if ( copy_from_guest_offset((void *)&op, op_args->buf[0].h, 0, offset) )
+        goto out;
+
+    if ( op.op >= ARRAY_SIZE(op_size) )
+    {
+        rc = -EOPNOTSUPP;
+        goto out;
+    }
+
+    op.op = array_index_nospec(op.op, ARRAY_SIZE(op_size));
+
+    if ( op_args->buf[0].size < offset + op_size[op.op] )
+        goto out;
+
+    if ( copy_from_guest_offset((void *)&op.u, op_args->buf[0].h, offset,
+                                op_size[op.op]) )
+        goto out;
+
+    rc = -EINVAL;
+    if ( op.pad )
+        goto out;
+
+    switch ( op.op )
+    {
+    case XEN_DMOP_create_ioreq_server:
+    {
+        struct xen_dm_op_create_ioreq_server *data =
+            &op.u.create_ioreq_server;
+
+        const_op = false;
+
+        rc = -EINVAL;
+        if ( data->pad[0] || data->pad[1] || data->pad[2] )
+            break;
+
+        rc = hvm_create_ioreq_server(d, data->handle_bufioreq,
+                                     &data->id);
+        break;
+    }
+
+    case XEN_DMOP_get_ioreq_server_info:
+    {
+        struct xen_dm_op_get_ioreq_server_info *data =
+            &op.u.get_ioreq_server_info;
+        const uint16_t valid_flags = XEN_DMOP_no_gfns;
+
+        const_op = false;
+
+        rc = -EINVAL;
+        if ( data->flags & ~valid_flags )
+            break;
+
+        rc = hvm_get_ioreq_server_info(d, data->id,
+                                       (data->flags & XEN_DMOP_no_gfns) ?
+                                       NULL : (unsigned long *)&data->ioreq_gfn,
+                                       (data->flags & XEN_DMOP_no_gfns) ?
+                                       NULL : (unsigned long *)&data->bufioreq_gfn,
+                                       &data->bufioreq_port);
+        break;
+    }
+
+    case XEN_DMOP_map_io_range_to_ioreq_server:
+    {
+        const struct xen_dm_op_ioreq_server_range *data =
+            &op.u.map_io_range_to_ioreq_server;
+
+        rc = -EINVAL;
+        if ( data->pad )
+            break;
+
+        rc = hvm_map_io_range_to_ioreq_server(d, data->id, data->type,
+                                              data->start, data->end);
+        break;
+    }
+
+    case XEN_DMOP_unmap_io_range_from_ioreq_server:
+    {
+        const struct xen_dm_op_ioreq_server_range *data =
+            &op.u.unmap_io_range_from_ioreq_server;
+
+        rc = -EINVAL;
+        if ( data->pad )
+            break;
+
+        rc = hvm_unmap_io_range_from_ioreq_server(d, data->id, data->type,
+                                                  data->start, data->end);
+        break;
+    }
+
+    case XEN_DMOP_set_ioreq_server_state:
+    {
+        const struct xen_dm_op_set_ioreq_server_state *data =
+            &op.u.set_ioreq_server_state;
+
+        rc = -EINVAL;
+        if ( data->pad )
+            break;
+
+        rc = hvm_set_ioreq_server_state(d, data->id, !!data->enabled);
+        break;
+    }
+
+    case XEN_DMOP_destroy_ioreq_server:
+    {
+        const struct xen_dm_op_destroy_ioreq_server *data =
+            &op.u.destroy_ioreq_server;
+
+        rc = -EINVAL;
+        if ( data->pad )
+            break;
+
+        rc = hvm_destroy_ioreq_server(d, data->id);
+        break;
+    }
+
+    default:
+        rc = arch_dm_op(&op, d, op_args, &const_op);
+    }
+
+    if ( (!rc || rc == -ERESTART) &&
+         !const_op && copy_to_guest_offset(op_args->buf[0].h, offset,
+                                           (void *)&op.u, op_size[op.op]) )
+        rc = -EFAULT;
+
+ out:
+    rcu_unlock_domain(d);
+
+    return rc;
+}
+
+#ifdef CONFIG_COMPAT
+#include <compat/hvm/dm_op.h>
+
+CHECK_dm_op_create_ioreq_server;
+CHECK_dm_op_get_ioreq_server_info;
+CHECK_dm_op_ioreq_server_range;
+CHECK_dm_op_set_ioreq_server_state;
+CHECK_dm_op_destroy_ioreq_server;
+CHECK_dm_op_track_dirty_vram;
+CHECK_dm_op_set_pci_intx_level;
+CHECK_dm_op_set_isa_irq_level;
+CHECK_dm_op_set_pci_link_route;
+CHECK_dm_op_modified_memory;
+CHECK_dm_op_set_mem_type;
+CHECK_dm_op_inject_event;
+CHECK_dm_op_inject_msi;
+CHECK_dm_op_map_mem_type_to_ioreq_server;
+CHECK_dm_op_remote_shutdown;
+CHECK_dm_op_relocate_memory;
+CHECK_dm_op_pin_memory_cacheattr;
+
+int compat_dm_op(domid_t domid,
+                 unsigned int nr_bufs,
+                 XEN_GUEST_HANDLE_PARAM(void) bufs)
+{
+    struct dmop_args args;
+    unsigned int i;
+    int rc;
+
+    if ( nr_bufs > ARRAY_SIZE(args.buf) )
+        return -E2BIG;
+
+    args.domid = domid;
+    args.nr_bufs = array_index_nospec(nr_bufs, ARRAY_SIZE(args.buf) + 1);
+
+    for ( i = 0; i < args.nr_bufs; i++ )
+    {
+        struct compat_dm_op_buf cmp;
+
+        if ( copy_from_guest_offset(&cmp, bufs, i, 1) )
+            return -EFAULT;
+
+#define XLAT_dm_op_buf_HNDL_h(_d_, _s_) \
+        guest_from_compat_handle((_d_)->h, (_s_)->h)
+
+        XLAT_dm_op_buf(&args.buf[i], &cmp);
+
+#undef XLAT_dm_op_buf_HNDL_h
+    }
+
+    rc = dm_op(&args);
+
+    if ( rc == -ERESTART )
+        rc = hypercall_create_continuation(__HYPERVISOR_dm_op, "iih",
+                                           domid, nr_bufs, bufs);
+
+    return rc;
+}
+#endif
+
+long do_dm_op(domid_t domid,
+              unsigned int nr_bufs,
+              XEN_GUEST_HANDLE_PARAM(xen_dm_op_buf_t) bufs)
+{
+    struct dmop_args args;
+    int rc;
+
+    if ( nr_bufs > ARRAY_SIZE(args.buf) )
+        return -E2BIG;
+
+    args.domid = domid;
+    args.nr_bufs = array_index_nospec(nr_bufs, ARRAY_SIZE(args.buf) + 1);
+
+    if ( copy_from_guest_offset(&args.buf[0], bufs, 0, args.nr_bufs) )
+        return -EFAULT;
+
+    rc = dm_op(&args);
+
+    if ( rc == -ERESTART )
+        rc = hypercall_create_continuation(__HYPERVISOR_dm_op, "iih",
+                                           domid, nr_bufs, bufs);
+
+    return rc;
+}
+
+/*
+ * Local variables:
+ * mode: C
+ * c-file-style: "BSD"
+ * c-basic-offset: 4
+ * tab-width: 4
+ * indent-tabs-mode: nil
+ * End:
+ */
diff --git a/xen/include/xen/dm.h b/xen/include/xen/dm.h
new file mode 100644
index 0000000..ef15edf
--- /dev/null
+++ b/xen/include/xen/dm.h
@@ -0,0 +1,44 @@
+/*
+ * Copyright (c) 2016 Citrix Systems Inc.
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms and conditions of the GNU General Public License,
+ * version 2, as published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope it will be useful, but WITHOUT
+ * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
+ * FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License for
+ * more details.
+ *
+ * You should have received a copy of the GNU General Public License along with
+ * this program; If not, see <http://www.gnu.org/licenses/>.
+ */
+
+#ifndef __XEN_DM_H__
+#define __XEN_DM_H__
+
+#include <xen/sched.h>
+
+struct dmop_args {
+    domid_t domid;
+    unsigned int nr_bufs;
+    /* Reserve enough buf elements for all current hypercalls. */
+    struct xen_dm_op_buf buf[2];
+};
+
+int arch_dm_op(struct xen_dm_op *op,
+               struct domain *d,
+               const struct dmop_args *op_args,
+               bool *const_op);
+
+#endif /* __XEN_DM_H__ */
+
+/*
+ * Local variables:
+ * mode: C
+ * c-file-style: "BSD"
+ * c-basic-offset: 4
+ * tab-width: 4
+ * indent-tabs-mode: nil
+ * End:
+ */
diff --git a/xen/include/xsm/dummy.h b/xen/include/xsm/dummy.h
index 7ae3c40..5c61d8e 100644
--- a/xen/include/xsm/dummy.h
+++ b/xen/include/xsm/dummy.h
@@ -707,14 +707,14 @@ static XSM_INLINE int xsm_pmu_op (XSM_DEFAULT_ARG struct domain *d, unsigned int
     }
 }
 
+#endif /* CONFIG_X86 */
+
 static XSM_INLINE int xsm_dm_op(XSM_DEFAULT_ARG struct domain *d)
 {
     XSM_ASSERT_ACTION(XSM_DM_PRIV);
     return xsm_default_action(action, current->domain, d);
 }
 
-#endif /* CONFIG_X86 */
-
 #ifdef CONFIG_ARGO
 static XSM_INLINE int xsm_argo_enable(const struct domain *d)
 {
diff --git a/xen/include/xsm/xsm.h b/xen/include/xsm/xsm.h
index 7bd03d8..91ecff4 100644
--- a/xen/include/xsm/xsm.h
+++ b/xen/include/xsm/xsm.h
@@ -176,8 +176,8 @@ struct xsm_operations {
     int (*ioport_permission) (struct domain *d, uint32_t s, uint32_t e, uint8_t allow);
     int (*ioport_mapping) (struct domain *d, uint32_t s, uint32_t e, uint8_t allow);
     int (*pmu_op) (struct domain *d, unsigned int op);
-    int (*dm_op) (struct domain *d);
 #endif
+    int (*dm_op) (struct domain *d);
     int (*xen_version) (uint32_t cmd);
     int (*domain_resource_map) (struct domain *d);
 #ifdef CONFIG_ARGO
@@ -682,13 +682,13 @@ static inline int xsm_pmu_op (xsm_default_t def, struct domain *d, unsigned int
     return xsm_ops->pmu_op(d, op);
 }
 
+#endif /* CONFIG_X86 */
+
 static inline int xsm_dm_op(xsm_default_t def, struct domain *d)
 {
     return xsm_ops->dm_op(d);
 }
 
-#endif /* CONFIG_X86 */
-
 static inline int xsm_xen_version (xsm_default_t def, uint32_t op)
 {
     return xsm_ops->xen_version(op);
diff --git a/xen/xsm/dummy.c b/xen/xsm/dummy.c
index 9e09512..8bdffe7 100644
--- a/xen/xsm/dummy.c
+++ b/xen/xsm/dummy.c
@@ -147,8 +147,8 @@ void __init xsm_fixup_ops (struct xsm_operations *ops)
     set_to_dummy_if_null(ops, ioport_permission);
     set_to_dummy_if_null(ops, ioport_mapping);
     set_to_dummy_if_null(ops, pmu_op);
-    set_to_dummy_if_null(ops, dm_op);
 #endif
+    set_to_dummy_if_null(ops, dm_op);
     set_to_dummy_if_null(ops, xen_version);
     set_to_dummy_if_null(ops, domain_resource_map);
 #ifdef CONFIG_ARGO
diff --git a/xen/xsm/flask/hooks.c b/xen/xsm/flask/hooks.c
index 19b0d9e..11784d7 100644
--- a/xen/xsm/flask/hooks.c
+++ b/xen/xsm/flask/hooks.c
@@ -1656,14 +1656,13 @@ static int flask_pmu_op (struct domain *d, unsigned int op)
         return -EPERM;
     }
 }
+#endif /* CONFIG_X86 */
 
 static int flask_dm_op(struct domain *d)
 {
     return current_has_perm(d, SECCLASS_HVM, HVM__DM);
 }
 
-#endif /* CONFIG_X86 */
-
 static int flask_xen_version (uint32_t op)
 {
     u32 dsid = domain_sid(current->domain);
@@ -1865,8 +1864,8 @@ static struct xsm_operations flask_ops = {
     .ioport_permission = flask_ioport_permission,
     .ioport_mapping = flask_ioport_mapping,
     .pmu_op = flask_pmu_op,
-    .dm_op = flask_dm_op,
 #endif
+    .dm_op = flask_dm_op,
     .xen_version = flask_xen_version,
     .domain_resource_map = flask_domain_resource_map,
 #ifdef CONFIG_ARGO
-- 
2.7.4



^ permalink raw reply related	[flat|nested] 127+ messages in thread

* [PATCH V3 10/23] xen/mm: Make x86's XENMEM_resource_ioreq_server handling common
  2020-11-30 10:31 Oleksandr Tyshchenko
                   ` (8 preceding siblings ...)
  2020-11-30 10:31 ` [PATCH V3 09/23] xen/dm: Make x86's DM feature common Oleksandr Tyshchenko
@ 2020-11-30 10:31 ` Oleksandr Tyshchenko
  2020-12-07 11:35   ` Jan Beulich
  2020-11-30 10:31 ` [PATCH V3 11/23] xen/ioreq: Move x86's io_completion/io_req fields to struct vcpu Oleksandr Tyshchenko
                   ` (14 subsequent siblings)
  24 siblings, 1 reply; 127+ messages in thread
From: Oleksandr Tyshchenko @ 2020-11-30 10:31 UTC (permalink / raw)
  To: xen-devel
  Cc: Julien Grall, Jan Beulich, Andrew Cooper, Roger Pau Monné,
	Wei Liu, George Dunlap, Ian Jackson, Julien Grall,
	Stefano Stabellini, Volodymyr Babchuk, Oleksandr Tyshchenko

From: Julien Grall <julien.grall@arm.com>

As the x86 implementation of XENMEM_resource_ioreq_server can be
re-used on Arm later on, this patch makes it common and removes
arch_acquire_resource, which is no longer needed.

Also re-order the #include-s alphabetically.

This support is going to be used on Arm to be able to run a device
emulator outside of the Xen hypervisor.
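
Informally, the common dispatch ends up as below (a condensed view of
the memory.c hunk later in this mail): acquire_resource() now handles
the ioreq server resource directly, and the per-arch hook disappears.

    case XENMEM_resource_ioreq_server:
        /* Compiled only with CONFIG_IOREQ_SERVER; otherwise the helper
         * simply returns -EOPNOTSUPP. */
        rc = acquire_ioreq_server(d, xmar.id, xmar.frame, xmar.nr_frames,
                                  mfn_list);
        break;

    default:
        rc = -EOPNOTSUPP;
        break;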

Signed-off-by: Julien Grall <julien.grall@arm.com>
Signed-off-by: Oleksandr Tyshchenko <oleksandr_tyshchenko@epam.com>

---
Please note, this is a split/cleanup/hardening of Julien's PoC:
"Add support for Guest IO forwarding to a device emulator"

Changes RFC -> V1:
   - no changes

Changes V1 -> V2:
   - update the author of a patch

Changes V2 -> V3:
   - don't wrap #include <xen/ioreq.h>
   - limit the number of #ifdef-s
   - re-order #include-s alphabetically
---
---
 xen/arch/x86/mm.c        | 44 ---------------------------------
 xen/common/memory.c      | 63 +++++++++++++++++++++++++++++++++++++++---------
 xen/include/asm-arm/mm.h |  8 ------
 xen/include/asm-x86/mm.h |  4 ---
 4 files changed, 51 insertions(+), 68 deletions(-)

diff --git a/xen/arch/x86/mm.c b/xen/arch/x86/mm.c
index e4638ef..c0a7124 100644
--- a/xen/arch/x86/mm.c
+++ b/xen/arch/x86/mm.c
@@ -4699,50 +4699,6 @@ int xenmem_add_to_physmap_one(
     return rc;
 }
 
-int arch_acquire_resource(struct domain *d, unsigned int type,
-                          unsigned int id, unsigned long frame,
-                          unsigned int nr_frames, xen_pfn_t mfn_list[])
-{
-    int rc;
-
-    switch ( type )
-    {
-#ifdef CONFIG_HVM
-    case XENMEM_resource_ioreq_server:
-    {
-        ioservid_t ioservid = id;
-        unsigned int i;
-
-        rc = -EINVAL;
-        if ( !is_hvm_domain(d) )
-            break;
-
-        if ( id != (unsigned int)ioservid )
-            break;
-
-        rc = 0;
-        for ( i = 0; i < nr_frames; i++ )
-        {
-            mfn_t mfn;
-
-            rc = hvm_get_ioreq_server_frame(d, id, frame + i, &mfn);
-            if ( rc )
-                break;
-
-            mfn_list[i] = mfn_x(mfn);
-        }
-        break;
-    }
-#endif
-
-    default:
-        rc = -EOPNOTSUPP;
-        break;
-    }
-
-    return rc;
-}
-
 long arch_memory_op(unsigned long cmd, XEN_GUEST_HANDLE_PARAM(void) arg)
 {
     int rc;
diff --git a/xen/common/memory.c b/xen/common/memory.c
index 2c86934..92cf983 100644
--- a/xen/common/memory.c
+++ b/xen/common/memory.c
@@ -8,22 +8,23 @@
  */
 
 #include <xen/domain_page.h>
-#include <xen/types.h>
+#include <xen/errno.h>
+#include <xen/event.h>
+#include <xen/grant_table.h>
+#include <xen/guest_access.h>
+#include <xen/hypercall.h>
+#include <xen/iocap.h>
+#include <xen/ioreq.h>
 #include <xen/lib.h>
+#include <xen/mem_access.h>
 #include <xen/mm.h>
+#include <xen/numa.h>
+#include <xen/paging.h>
 #include <xen/param.h>
 #include <xen/perfc.h>
 #include <xen/sched.h>
-#include <xen/event.h>
-#include <xen/paging.h>
-#include <xen/iocap.h>
-#include <xen/guest_access.h>
-#include <xen/hypercall.h>
-#include <xen/errno.h>
-#include <xen/numa.h>
-#include <xen/mem_access.h>
 #include <xen/trace.h>
-#include <xen/grant_table.h>
+#include <xen/types.h>
 #include <asm/current.h>
 #include <asm/hardirq.h>
 #include <asm/p2m.h>
@@ -1086,6 +1087,40 @@ static int acquire_grant_table(struct domain *d, unsigned int id,
     return 0;
 }
 
+static int acquire_ioreq_server(struct domain *d,
+                                unsigned int id,
+                                unsigned long frame,
+                                unsigned int nr_frames,
+                                xen_pfn_t mfn_list[])
+{
+#ifdef CONFIG_IOREQ_SERVER
+    ioservid_t ioservid = id;
+    unsigned int i;
+    int rc;
+
+    if ( !is_hvm_domain(d) )
+        return -EINVAL;
+
+    if ( id != (unsigned int)ioservid )
+        return -EINVAL;
+
+    for ( i = 0; i < nr_frames; i++ )
+    {
+        mfn_t mfn;
+
+        rc = hvm_get_ioreq_server_frame(d, id, frame + i, &mfn);
+        if ( rc )
+            return rc;
+
+        mfn_list[i] = mfn_x(mfn);
+    }
+
+    return 0;
+#else
+    return -EOPNOTSUPP;
+#endif
+}
+
 static int acquire_resource(
     XEN_GUEST_HANDLE_PARAM(xen_mem_acquire_resource_t) arg)
 {
@@ -1144,9 +1179,13 @@ static int acquire_resource(
                                  mfn_list);
         break;
 
+    case XENMEM_resource_ioreq_server:
+        rc = acquire_ioreq_server(d, xmar.id, xmar.frame, xmar.nr_frames,
+                                  mfn_list);
+        break;
+
     default:
-        rc = arch_acquire_resource(d, xmar.type, xmar.id, xmar.frame,
-                                   xmar.nr_frames, mfn_list);
+        rc = -EOPNOTSUPP;
         break;
     }
 
diff --git a/xen/include/asm-arm/mm.h b/xen/include/asm-arm/mm.h
index f8ba49b..0b7de31 100644
--- a/xen/include/asm-arm/mm.h
+++ b/xen/include/asm-arm/mm.h
@@ -358,14 +358,6 @@ static inline void put_page_and_type(struct page_info *page)
 
 void clear_and_clean_page(struct page_info *page);
 
-static inline
-int arch_acquire_resource(struct domain *d, unsigned int type, unsigned int id,
-                          unsigned long frame, unsigned int nr_frames,
-                          xen_pfn_t mfn_list[])
-{
-    return -EOPNOTSUPP;
-}
-
 unsigned int arch_get_dma_bitsize(void);
 
 #endif /*  __ARCH_ARM_MM__ */
diff --git a/xen/include/asm-x86/mm.h b/xen/include/asm-x86/mm.h
index deeba75..859214e 100644
--- a/xen/include/asm-x86/mm.h
+++ b/xen/include/asm-x86/mm.h
@@ -639,8 +639,4 @@ static inline bool arch_mfn_in_directmap(unsigned long mfn)
     return mfn <= (virt_to_mfn(eva - 1) + 1);
 }
 
-int arch_acquire_resource(struct domain *d, unsigned int type,
-                          unsigned int id, unsigned long frame,
-                          unsigned int nr_frames, xen_pfn_t mfn_list[]);
-
 #endif /* __ASM_X86_MM_H__ */
-- 
2.7.4



^ permalink raw reply related	[flat|nested] 127+ messages in thread

* [PATCH V3 11/23] xen/ioreq: Move x86's io_completion/io_req fields to struct vcpu
  2020-11-30 10:31 Oleksandr Tyshchenko
                   ` (9 preceding siblings ...)
  2020-11-30 10:31 ` [PATCH V3 10/23] xen/mm: Make x86's XENMEM_resource_ioreq_server handling common Oleksandr Tyshchenko
@ 2020-11-30 10:31 ` Oleksandr Tyshchenko
  2020-12-07 12:32   ` Jan Beulich
  2020-11-30 10:31 ` [PATCH V3 12/23] xen/ioreq: Remove "hvm" prefixes from involved function names Oleksandr Tyshchenko
                   ` (13 subsequent siblings)
  24 siblings, 1 reply; 127+ messages in thread
From: Oleksandr Tyshchenko @ 2020-11-30 10:31 UTC (permalink / raw)
  To: xen-devel
  Cc: Oleksandr Tyshchenko, Paul Durrant, Jan Beulich, Andrew Cooper,
	Roger Pau Monné,
	Wei Liu, George Dunlap, Ian Jackson, Julien Grall,
	Stefano Stabellini, Jun Nakajima, Kevin Tian, Julien Grall

From: Oleksandr Tyshchenko <oleksandr_tyshchenko@epam.com>

IOREQ is a common feature now and these fields will be used
on Arm as-is. Move them to the common struct vcpu as part of a new
struct vcpu_io and drop the duplicated "io" prefixes. Also move
enum hvm_io_completion to xen/sched.h and remove the "hvm" prefixes.

This patch completely removes the layering violation in the common code.
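
For reference, the new common definitions end up in xen/sched.h as
below (copied from the hunk later in this mail), and struct vcpu gains
a "struct vcpu_io io;" member under CONFIG_IOREQ_SERVER:

    enum io_completion {
        IO_no_completion,
        IO_mmio_completion,
        IO_pio_completion,
    #ifdef CONFIG_X86
        IO_realmode_completion,
    #endif
    };

    struct vcpu_io {
        /* I/O request in flight to device model. */
        enum io_completion   completion;
        ioreq_t              req;
    };

So e.g. v->arch.hvm.hvm_io.io_req becomes v->io.req.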

Signed-off-by: Oleksandr Tyshchenko <oleksandr_tyshchenko@epam.com>
CC: Julien Grall <julien.grall@arm.com>

---
Please note, this is a split/cleanup/hardening of Julien's PoC:
"Add support for Guest IO forwarding to a device emulator"

Changes V1 -> V2:
   - new patch

Changes V2 -> V3:
   - update the patch now that the "legacy interface" is x86-specific
   - update patch description
   - drop the "io" prefixes from the field names
   - wrap IO_realmode_completion
---
---
 xen/arch/x86/hvm/emulate.c        | 72 +++++++++++++++++++--------------------
 xen/arch/x86/hvm/hvm.c            |  2 +-
 xen/arch/x86/hvm/io.c             |  8 ++---
 xen/arch/x86/hvm/ioreq.c          |  4 +--
 xen/arch/x86/hvm/svm/nestedsvm.c  |  2 +-
 xen/arch/x86/hvm/vmx/realmode.c   |  6 ++--
 xen/common/ioreq.c                | 22 ++++++------
 xen/include/asm-x86/hvm/emulate.h |  2 +-
 xen/include/asm-x86/hvm/ioreq.h   |  2 +-
 xen/include/asm-x86/hvm/vcpu.h    | 11 ------
 xen/include/xen/sched.h           | 19 +++++++++++
 11 files changed, 79 insertions(+), 71 deletions(-)

diff --git a/xen/arch/x86/hvm/emulate.c b/xen/arch/x86/hvm/emulate.c
index 4746d5a..04e4994 100644
--- a/xen/arch/x86/hvm/emulate.c
+++ b/xen/arch/x86/hvm/emulate.c
@@ -142,8 +142,8 @@ void hvmemul_cancel(struct vcpu *v)
 {
     struct hvm_vcpu_io *vio = &v->arch.hvm.hvm_io;
 
-    vio->io_req.state = STATE_IOREQ_NONE;
-    vio->io_completion = HVMIO_no_completion;
+    v->io.req.state = STATE_IOREQ_NONE;
+    v->io.completion = IO_no_completion;
     vio->mmio_cache_count = 0;
     vio->mmio_insn_bytes = 0;
     vio->mmio_access = (struct npfec){};
@@ -159,7 +159,7 @@ static int hvmemul_do_io(
 {
     struct vcpu *curr = current;
     struct domain *currd = curr->domain;
-    struct hvm_vcpu_io *vio = &curr->arch.hvm.hvm_io;
+    struct vcpu_io *vio = &curr->io;
     ioreq_t p = {
         .type = is_mmio ? IOREQ_TYPE_COPY : IOREQ_TYPE_PIO,
         .addr = addr,
@@ -184,13 +184,13 @@ static int hvmemul_do_io(
         return X86EMUL_UNHANDLEABLE;
     }
 
-    switch ( vio->io_req.state )
+    switch ( vio->req.state )
     {
     case STATE_IOREQ_NONE:
         break;
     case STATE_IORESP_READY:
-        vio->io_req.state = STATE_IOREQ_NONE;
-        p = vio->io_req;
+        vio->req.state = STATE_IOREQ_NONE;
+        p = vio->req;
 
         /* Verify the emulation request has been correctly re-issued */
         if ( (p.type != (is_mmio ? IOREQ_TYPE_COPY : IOREQ_TYPE_PIO)) ||
@@ -238,7 +238,7 @@ static int hvmemul_do_io(
     }
     ASSERT(p.count);
 
-    vio->io_req = p;
+    vio->req = p;
 
     rc = hvm_io_intercept(&p);
 
@@ -247,12 +247,12 @@ static int hvmemul_do_io(
      * our callers and mirror this into latched state.
      */
     ASSERT(p.count <= *reps);
-    *reps = vio->io_req.count = p.count;
+    *reps = vio->req.count = p.count;
 
     switch ( rc )
     {
     case X86EMUL_OKAY:
-        vio->io_req.state = STATE_IOREQ_NONE;
+        vio->req.state = STATE_IOREQ_NONE;
         break;
     case X86EMUL_UNHANDLEABLE:
     {
@@ -305,7 +305,7 @@ static int hvmemul_do_io(
                 if ( s == NULL )
                 {
                     rc = X86EMUL_RETRY;
-                    vio->io_req.state = STATE_IOREQ_NONE;
+                    vio->req.state = STATE_IOREQ_NONE;
                     break;
                 }
 
@@ -316,7 +316,7 @@ static int hvmemul_do_io(
                 if ( dir == IOREQ_READ )
                 {
                     rc = hvm_process_io_intercept(&ioreq_server_handler, &p);
-                    vio->io_req.state = STATE_IOREQ_NONE;
+                    vio->req.state = STATE_IOREQ_NONE;
                     break;
                 }
             }
@@ -329,14 +329,14 @@ static int hvmemul_do_io(
         if ( !s )
         {
             rc = hvm_process_io_intercept(&null_handler, &p);
-            vio->io_req.state = STATE_IOREQ_NONE;
+            vio->req.state = STATE_IOREQ_NONE;
         }
         else
         {
             rc = hvm_send_ioreq(s, &p, 0);
             if ( rc != X86EMUL_RETRY || currd->is_shutting_down )
-                vio->io_req.state = STATE_IOREQ_NONE;
-            else if ( !ioreq_needs_completion(&vio->io_req) )
+                vio->req.state = STATE_IOREQ_NONE;
+            else if ( !ioreq_needs_completion(&vio->req) )
                 rc = X86EMUL_OKAY;
         }
         break;
@@ -1854,7 +1854,7 @@ static int hvmemul_rep_movs(
           * cheaper than multiple round trips through the device model. Yet
           * when processing a response we can always re-use the translation.
           */
-         (vio->io_req.state == STATE_IORESP_READY ||
+         (curr->io.req.state == STATE_IORESP_READY ||
           ((!df || *reps == 1) &&
            PAGE_SIZE - (saddr & ~PAGE_MASK) >= *reps * bytes_per_rep)) )
         sgpa = pfn_to_paddr(vio->mmio_gpfn) | (saddr & ~PAGE_MASK);
@@ -1870,7 +1870,7 @@ static int hvmemul_rep_movs(
     if ( vio->mmio_access.write_access &&
          (vio->mmio_gla == (daddr & PAGE_MASK)) &&
          /* See comment above. */
-         (vio->io_req.state == STATE_IORESP_READY ||
+         (curr->io.req.state == STATE_IORESP_READY ||
           ((!df || *reps == 1) &&
            PAGE_SIZE - (daddr & ~PAGE_MASK) >= *reps * bytes_per_rep)) )
         dgpa = pfn_to_paddr(vio->mmio_gpfn) | (daddr & ~PAGE_MASK);
@@ -2007,7 +2007,7 @@ static int hvmemul_rep_stos(
     if ( vio->mmio_access.write_access &&
          (vio->mmio_gla == (addr & PAGE_MASK)) &&
          /* See respective comment in MOVS processing. */
-         (vio->io_req.state == STATE_IORESP_READY ||
+         (curr->io.req.state == STATE_IORESP_READY ||
           ((!df || *reps == 1) &&
            PAGE_SIZE - (addr & ~PAGE_MASK) >= *reps * bytes_per_rep)) )
         gpa = pfn_to_paddr(vio->mmio_gpfn) | (addr & ~PAGE_MASK);
@@ -2613,13 +2613,13 @@ static const struct x86_emulate_ops hvm_emulate_ops_no_write = {
 };
 
 /*
- * Note that passing HVMIO_no_completion into this function serves as kind
+ * Note that passing IO_no_completion into this function serves as kind
  * of (but not fully) an "auto select completion" indicator.  When there's
  * no completion needed, the passed in value will be ignored in any case.
  */
 static int _hvm_emulate_one(struct hvm_emulate_ctxt *hvmemul_ctxt,
     const struct x86_emulate_ops *ops,
-    enum hvm_io_completion completion)
+    enum io_completion completion)
 {
     const struct cpu_user_regs *regs = hvmemul_ctxt->ctxt.regs;
     struct vcpu *curr = current;
@@ -2634,11 +2634,11 @@ static int _hvm_emulate_one(struct hvm_emulate_ctxt *hvmemul_ctxt,
      */
     if ( vio->cache->num_ents > vio->cache->max_ents )
     {
-        ASSERT(vio->io_req.state == STATE_IOREQ_NONE);
+        ASSERT(curr->io.req.state == STATE_IOREQ_NONE);
         vio->cache->num_ents = 0;
     }
     else
-        ASSERT(vio->io_req.state == STATE_IORESP_READY);
+        ASSERT(curr->io.req.state == STATE_IORESP_READY);
 
     hvm_emulate_init_per_insn(hvmemul_ctxt, vio->mmio_insn,
                               vio->mmio_insn_bytes);
@@ -2649,25 +2649,25 @@ static int _hvm_emulate_one(struct hvm_emulate_ctxt *hvmemul_ctxt,
     if ( rc == X86EMUL_OKAY && vio->mmio_retry )
         rc = X86EMUL_RETRY;
 
-    if ( !ioreq_needs_completion(&vio->io_req) )
-        completion = HVMIO_no_completion;
-    else if ( completion == HVMIO_no_completion )
-        completion = (vio->io_req.type != IOREQ_TYPE_PIO ||
-                      hvmemul_ctxt->is_mem_access) ? HVMIO_mmio_completion
-                                                   : HVMIO_pio_completion;
+    if ( !ioreq_needs_completion(&curr->io.req) )
+        completion = IO_no_completion;
+    else if ( completion == IO_no_completion )
+        completion = (curr->io.req.type != IOREQ_TYPE_PIO ||
+                      hvmemul_ctxt->is_mem_access) ? IO_mmio_completion
+                                                   : IO_pio_completion;
 
-    switch ( vio->io_completion = completion )
+    switch ( curr->io.completion = completion )
     {
-    case HVMIO_no_completion:
-    case HVMIO_pio_completion:
+    case IO_no_completion:
+    case IO_pio_completion:
         vio->mmio_cache_count = 0;
         vio->mmio_insn_bytes = 0;
         vio->mmio_access = (struct npfec){};
         hvmemul_cache_disable(curr);
         break;
 
-    case HVMIO_mmio_completion:
-    case HVMIO_realmode_completion:
+    case IO_mmio_completion:
+    case IO_realmode_completion:
         BUILD_BUG_ON(sizeof(vio->mmio_insn) < sizeof(hvmemul_ctxt->insn_buf));
         vio->mmio_insn_bytes = hvmemul_ctxt->insn_buf_bytes;
         memcpy(vio->mmio_insn, hvmemul_ctxt->insn_buf, vio->mmio_insn_bytes);
@@ -2716,7 +2716,7 @@ static int _hvm_emulate_one(struct hvm_emulate_ctxt *hvmemul_ctxt,
 
 int hvm_emulate_one(
     struct hvm_emulate_ctxt *hvmemul_ctxt,
-    enum hvm_io_completion completion)
+    enum io_completion completion)
 {
     return _hvm_emulate_one(hvmemul_ctxt, &hvm_emulate_ops, completion);
 }
@@ -2754,7 +2754,7 @@ int hvm_emulate_one_mmio(unsigned long mfn, unsigned long gla)
                           guest_cpu_user_regs());
     ctxt.ctxt.data = &mmio_ro_ctxt;
 
-    switch ( rc = _hvm_emulate_one(&ctxt, ops, HVMIO_no_completion) )
+    switch ( rc = _hvm_emulate_one(&ctxt, ops, IO_no_completion) )
     {
     case X86EMUL_UNHANDLEABLE:
     case X86EMUL_UNIMPLEMENTED:
@@ -2782,7 +2782,7 @@ void hvm_emulate_one_vm_event(enum emul_kind kind, unsigned int trapnr,
     {
     case EMUL_KIND_NOWRITE:
         rc = _hvm_emulate_one(&ctx, &hvm_emulate_ops_no_write,
-                              HVMIO_no_completion);
+                              IO_no_completion);
         break;
     case EMUL_KIND_SET_CONTEXT_INSN: {
         struct vcpu *curr = current;
@@ -2803,7 +2803,7 @@ void hvm_emulate_one_vm_event(enum emul_kind kind, unsigned int trapnr,
     /* Fall-through */
     default:
         ctx.set_context = (kind == EMUL_KIND_SET_CONTEXT_DATA);
-        rc = hvm_emulate_one(&ctx, HVMIO_no_completion);
+        rc = hvm_emulate_one(&ctx, IO_no_completion);
     }
 
     switch ( rc )
diff --git a/xen/arch/x86/hvm/hvm.c b/xen/arch/x86/hvm/hvm.c
index 54e32e4..cc46909 100644
--- a/xen/arch/x86/hvm/hvm.c
+++ b/xen/arch/x86/hvm/hvm.c
@@ -3800,7 +3800,7 @@ void hvm_ud_intercept(struct cpu_user_regs *regs)
         return;
     }
 
-    switch ( hvm_emulate_one(&ctxt, HVMIO_no_completion) )
+    switch ( hvm_emulate_one(&ctxt, IO_no_completion) )
     {
     case X86EMUL_UNHANDLEABLE:
     case X86EMUL_UNIMPLEMENTED:
diff --git a/xen/arch/x86/hvm/io.c b/xen/arch/x86/hvm/io.c
index b220d6b..327a6a2 100644
--- a/xen/arch/x86/hvm/io.c
+++ b/xen/arch/x86/hvm/io.c
@@ -85,7 +85,7 @@ bool hvm_emulate_one_insn(hvm_emulate_validate_t *validate, const char *descr)
 
     hvm_emulate_init_once(&ctxt, validate, guest_cpu_user_regs());
 
-    switch ( rc = hvm_emulate_one(&ctxt, HVMIO_no_completion) )
+    switch ( rc = hvm_emulate_one(&ctxt, IO_no_completion) )
     {
     case X86EMUL_UNHANDLEABLE:
         hvm_dump_emulation_state(XENLOG_G_WARNING, descr, &ctxt, rc);
@@ -122,7 +122,7 @@ bool handle_mmio_with_translation(unsigned long gla, unsigned long gpfn,
 bool handle_pio(uint16_t port, unsigned int size, int dir)
 {
     struct vcpu *curr = current;
-    struct hvm_vcpu_io *vio = &curr->arch.hvm.hvm_io;
+    struct vcpu_io *vio = &curr->io;
     unsigned int data;
     int rc;
 
@@ -135,8 +135,8 @@ bool handle_pio(uint16_t port, unsigned int size, int dir)
 
     rc = hvmemul_do_pio_buffer(port, size, dir, &data);
 
-    if ( ioreq_needs_completion(&vio->io_req) )
-        vio->io_completion = HVMIO_pio_completion;
+    if ( ioreq_needs_completion(&vio->req) )
+        vio->completion = IO_pio_completion;
 
     switch ( rc )
     {
diff --git a/xen/arch/x86/hvm/ioreq.c b/xen/arch/x86/hvm/ioreq.c
index 009a95a..7808b75 100644
--- a/xen/arch/x86/hvm/ioreq.c
+++ b/xen/arch/x86/hvm/ioreq.c
@@ -41,11 +41,11 @@ bool ioreq_complete_mmio(void)
     return handle_mmio();
 }
 
-bool arch_vcpu_ioreq_completion(enum hvm_io_completion io_completion)
+bool arch_vcpu_ioreq_completion(enum io_completion io_completion)
 {
     switch ( io_completion )
     {
-    case HVMIO_realmode_completion:
+    case IO_realmode_completion:
     {
         struct hvm_emulate_ctxt ctxt;
 
diff --git a/xen/arch/x86/hvm/svm/nestedsvm.c b/xen/arch/x86/hvm/svm/nestedsvm.c
index fcfccf7..6d90630 100644
--- a/xen/arch/x86/hvm/svm/nestedsvm.c
+++ b/xen/arch/x86/hvm/svm/nestedsvm.c
@@ -1266,7 +1266,7 @@ enum hvm_intblk nsvm_intr_blocked(struct vcpu *v)
          * Delay the injection because this would result in delivering
          * an interrupt *within* the execution of an instruction.
          */
-        if ( v->arch.hvm.hvm_io.io_req.state != STATE_IOREQ_NONE )
+        if ( v->io.req.state != STATE_IOREQ_NONE )
             return hvm_intblk_shadow;
 
         if ( !nv->nv_vmexit_pending && n2vmcb->exit_int_info.v )
diff --git a/xen/arch/x86/hvm/vmx/realmode.c b/xen/arch/x86/hvm/vmx/realmode.c
index 768f01e..3033143 100644
--- a/xen/arch/x86/hvm/vmx/realmode.c
+++ b/xen/arch/x86/hvm/vmx/realmode.c
@@ -101,7 +101,7 @@ void vmx_realmode_emulate_one(struct hvm_emulate_ctxt *hvmemul_ctxt)
 
     perfc_incr(realmode_emulations);
 
-    rc = hvm_emulate_one(hvmemul_ctxt, HVMIO_realmode_completion);
+    rc = hvm_emulate_one(hvmemul_ctxt, IO_realmode_completion);
 
     if ( rc == X86EMUL_UNHANDLEABLE )
     {
@@ -188,7 +188,7 @@ void vmx_realmode(struct cpu_user_regs *regs)
 
         vmx_realmode_emulate_one(&hvmemul_ctxt);
 
-        if ( vio->io_req.state != STATE_IOREQ_NONE || vio->mmio_retry )
+        if ( curr->io.req.state != STATE_IOREQ_NONE || vio->mmio_retry )
             break;
 
         /* Stop emulating unless our segment state is not safe */
@@ -202,7 +202,7 @@ void vmx_realmode(struct cpu_user_regs *regs)
     }
 
     /* Need to emulate next time if we've started an IO operation */
-    if ( vio->io_req.state != STATE_IOREQ_NONE )
+    if ( curr->io.req.state != STATE_IOREQ_NONE )
         curr->arch.hvm.vmx.vmx_emulate = 1;
 
     if ( !curr->arch.hvm.vmx.vmx_emulate && !curr->arch.hvm.vmx.vmx_realmode )
diff --git a/xen/common/ioreq.c b/xen/common/ioreq.c
index b7c2d5a..caf4543 100644
--- a/xen/common/ioreq.c
+++ b/xen/common/ioreq.c
@@ -159,7 +159,7 @@ static bool hvm_wait_for_io(struct ioreq_vcpu *sv, ioreq_t *p)
         break;
     }
 
-    p = &sv->vcpu->arch.hvm.hvm_io.io_req;
+    p = &sv->vcpu->io.req;
     if ( ioreq_needs_completion(p) )
         p->data = data;
 
@@ -171,10 +171,10 @@ static bool hvm_wait_for_io(struct ioreq_vcpu *sv, ioreq_t *p)
 bool handle_hvm_io_completion(struct vcpu *v)
 {
     struct domain *d = v->domain;
-    struct hvm_vcpu_io *vio = &v->arch.hvm.hvm_io;
+    struct vcpu_io *vio = &v->io;
     struct ioreq_server *s;
     struct ioreq_vcpu *sv;
-    enum hvm_io_completion io_completion;
+    enum io_completion io_completion;
 
     if ( has_vpci(d) && vpci_process_pending(v) )
     {
@@ -186,26 +186,26 @@ bool handle_hvm_io_completion(struct vcpu *v)
     if ( sv && !hvm_wait_for_io(sv, get_ioreq(s, v)) )
         return false;
 
-    vio->io_req.state = ioreq_needs_completion(&vio->io_req) ?
+    vio->req.state = ioreq_needs_completion(&vio->req) ?
         STATE_IORESP_READY : STATE_IOREQ_NONE;
 
     msix_write_completion(v);
     vcpu_end_shutdown_deferral(v);
 
-    io_completion = vio->io_completion;
-    vio->io_completion = HVMIO_no_completion;
+    io_completion = vio->completion;
+    vio->completion = IO_no_completion;
 
     switch ( io_completion )
     {
-    case HVMIO_no_completion:
+    case IO_no_completion:
         break;
 
-    case HVMIO_mmio_completion:
+    case IO_mmio_completion:
         return ioreq_complete_mmio();
 
-    case HVMIO_pio_completion:
-        return handle_pio(vio->io_req.addr, vio->io_req.size,
-                          vio->io_req.dir);
+    case IO_pio_completion:
+        return handle_pio(vio->req.addr, vio->req.size,
+                          vio->req.dir);
 
     default:
         return arch_vcpu_ioreq_completion(io_completion);
diff --git a/xen/include/asm-x86/hvm/emulate.h b/xen/include/asm-x86/hvm/emulate.h
index 1620cc7..131cdf4 100644
--- a/xen/include/asm-x86/hvm/emulate.h
+++ b/xen/include/asm-x86/hvm/emulate.h
@@ -65,7 +65,7 @@ bool __nonnull(1, 2) hvm_emulate_one_insn(
     const char *descr);
 int hvm_emulate_one(
     struct hvm_emulate_ctxt *hvmemul_ctxt,
-    enum hvm_io_completion completion);
+    enum io_completion completion);
 void hvm_emulate_one_vm_event(enum emul_kind kind,
     unsigned int trapnr,
     unsigned int errcode);
diff --git a/xen/include/asm-x86/hvm/ioreq.h b/xen/include/asm-x86/hvm/ioreq.h
index 854dc77..ca3bf29 100644
--- a/xen/include/asm-x86/hvm/ioreq.h
+++ b/xen/include/asm-x86/hvm/ioreq.h
@@ -21,7 +21,7 @@
 
 #include <xen/ioreq.h>
 
-bool arch_vcpu_ioreq_completion(enum hvm_io_completion io_completion);
+bool arch_vcpu_ioreq_completion(enum io_completion io_completion);
 int arch_ioreq_server_map_pages(struct ioreq_server *s);
 void arch_ioreq_server_unmap_pages(struct ioreq_server *s);
 void arch_ioreq_server_enable(struct ioreq_server *s);
diff --git a/xen/include/asm-x86/hvm/vcpu.h b/xen/include/asm-x86/hvm/vcpu.h
index 6c1feda..8adf455 100644
--- a/xen/include/asm-x86/hvm/vcpu.h
+++ b/xen/include/asm-x86/hvm/vcpu.h
@@ -28,13 +28,6 @@
 #include <asm/mtrr.h>
 #include <public/hvm/ioreq.h>
 
-enum hvm_io_completion {
-    HVMIO_no_completion,
-    HVMIO_mmio_completion,
-    HVMIO_pio_completion,
-    HVMIO_realmode_completion
-};
-
 struct hvm_vcpu_asid {
     uint64_t generation;
     uint32_t asid;
@@ -52,10 +45,6 @@ struct hvm_mmio_cache {
 };
 
 struct hvm_vcpu_io {
-    /* I/O request in flight to device model. */
-    enum hvm_io_completion io_completion;
-    ioreq_t                io_req;
-
     /*
      * HVM emulation:
      *  Linear address @mmio_gla maps to MMIO physical frame @mmio_gpfn.
diff --git a/xen/include/xen/sched.h b/xen/include/xen/sched.h
index 62cbcdb..8269f84 100644
--- a/xen/include/xen/sched.h
+++ b/xen/include/xen/sched.h
@@ -145,6 +145,21 @@ void evtchn_destroy_final(struct domain *d); /* from complete_domain_destroy */
 
 struct waitqueue_vcpu;
 
+enum io_completion {
+    IO_no_completion,
+    IO_mmio_completion,
+    IO_pio_completion,
+#ifdef CONFIG_X86
+    IO_realmode_completion,
+#endif
+};
+
+struct vcpu_io {
+    /* I/O request in flight to device model. */
+    enum io_completion   completion;
+    ioreq_t              req;
+};
+
 struct vcpu
 {
     int              vcpu_id;
@@ -256,6 +271,10 @@ struct vcpu
     struct vpci_vcpu vpci;
 
     struct arch_vcpu arch;
+
+#ifdef CONFIG_IOREQ_SERVER
+    struct vcpu_io io;
+#endif
 };
 
 struct sched_unit {
-- 
2.7.4



^ permalink raw reply related	[flat|nested] 127+ messages in thread

* [PATCH V3 12/23] xen/ioreq: Remove "hvm" prefixes from involved function names
  2020-11-30 10:31 Oleksandr Tyshchenko
                   ` (10 preceding siblings ...)
  2020-11-30 10:31 ` [PATCH V3 11/23] xen/ioreq: Move x86's io_completion/io_req fields to struct vcpu Oleksandr Tyshchenko
@ 2020-11-30 10:31 ` Oleksandr Tyshchenko
  2020-12-07 12:45   ` Jan Beulich
  2020-11-30 10:31 ` [PATCH V3 13/23] xen/ioreq: Use guest_cmpxchg64() instead of cmpxchg() Oleksandr Tyshchenko
                   ` (12 subsequent siblings)
  24 siblings, 1 reply; 127+ messages in thread
From: Oleksandr Tyshchenko @ 2020-11-30 10:31 UTC (permalink / raw)
  To: xen-devel
  Cc: Oleksandr Tyshchenko, Jan Beulich, Andrew Cooper,
	Roger Pau Monné,
	Wei Liu, George Dunlap, Ian Jackson, Julien Grall,
	Stefano Stabellini, Paul Durrant, Jun Nakajima, Kevin Tian,
	Julien Grall

From: Oleksandr Tyshchenko <oleksandr_tyshchenko@epam.com>

This patch removes the "hvm" prefixes and infixes from IOREQ-related
function names in the common code and performs renaming where
appropriate, according to the more consistent new naming scheme:
- IOREQ server functions should start with "ioreq_server_"
- IOREQ functions should start with "ioreq_"

A few function names are clarified to better fit their purposes
(a brief call-site example follows the list):
handle_hvm_io_completion -> vcpu_ioreq_handle_completion
hvm_io_pending           -> vcpu_ioreq_pending
hvm_ioreq_init           -> ioreq_domain_init
hvm_alloc_ioreq_mfn      -> ioreq_server_alloc_mfn
hvm_free_ioreq_mfn       -> ioreq_server_free_mfn
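
For example, a typical caller in the emulation path changes as follows
(taken from the hunks below):

    /* Before */
    s = hvm_select_ioreq_server(currd, &p);
    rc = hvm_send_ioreq(s, &p, 0);

    /* After */
    s = ioreq_server_select(currd, &p);
    rc = ioreq_send(s, &p, 0);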

Signed-off-by: Oleksandr Tyshchenko <oleksandr_tyshchenko@epam.com>
CC: Julien Grall <julien.grall@arm.com>

---
Please note, this is a split/cleanup/hardening of Julien's PoC:
"Add support for Guest IO forwarding to a device emulator"

Changes V1 -> V2:
   - new patch

Changes V2 -> V3:
   - update the patch now that the "legacy interface" is x86-specific
   - update patch description
   - rename everything touched according to new naming scheme
---
---
 xen/arch/x86/hvm/dm.c       |   4 +-
 xen/arch/x86/hvm/emulate.c  |   6 +-
 xen/arch/x86/hvm/hvm.c      |  10 +--
 xen/arch/x86/hvm/io.c       |   6 +-
 xen/arch/x86/hvm/ioreq.c    |   2 +-
 xen/arch/x86/hvm/stdvga.c   |   4 +-
 xen/arch/x86/hvm/vmx/vvmx.c |   2 +-
 xen/common/dm.c             |  28 +++----
 xen/common/ioreq.c          | 174 ++++++++++++++++++++++----------------------
 xen/common/memory.c         |   2 +-
 xen/include/xen/ioreq.h     |  67 ++++++++---------
 11 files changed, 153 insertions(+), 152 deletions(-)

diff --git a/xen/arch/x86/hvm/dm.c b/xen/arch/x86/hvm/dm.c
index 35f860a..0b6319e 100644
--- a/xen/arch/x86/hvm/dm.c
+++ b/xen/arch/x86/hvm/dm.c
@@ -352,8 +352,8 @@ int arch_dm_op(struct xen_dm_op *op, struct domain *d,
             break;
 
         if ( first_gfn == 0 )
-            rc = hvm_map_mem_type_to_ioreq_server(d, data->id,
-                                                  data->type, data->flags);
+            rc = ioreq_server_map_mem_type(d, data->id,
+                                           data->type, data->flags);
         else
             rc = 0;
 
diff --git a/xen/arch/x86/hvm/emulate.c b/xen/arch/x86/hvm/emulate.c
index 04e4994..a025f89 100644
--- a/xen/arch/x86/hvm/emulate.c
+++ b/xen/arch/x86/hvm/emulate.c
@@ -261,7 +261,7 @@ static int hvmemul_do_io(
          * an ioreq server that can handle it.
          *
          * Rules:
-         * A> PIO or MMIO accesses run through hvm_select_ioreq_server() to
+         * A> PIO or MMIO accesses run through ioreq_server_select() to
          * choose the ioreq server by range. If no server is found, the access
          * is ignored.
          *
@@ -323,7 +323,7 @@ static int hvmemul_do_io(
         }
 
         if ( !s )
-            s = hvm_select_ioreq_server(currd, &p);
+            s = ioreq_server_select(currd, &p);
 
         /* If there is no suitable backing DM, just ignore accesses */
         if ( !s )
@@ -333,7 +333,7 @@ static int hvmemul_do_io(
         }
         else
         {
-            rc = hvm_send_ioreq(s, &p, 0);
+            rc = ioreq_send(s, &p, 0);
             if ( rc != X86EMUL_RETRY || currd->is_shutting_down )
                 vio->req.state = STATE_IOREQ_NONE;
             else if ( !ioreq_needs_completion(&vio->req) )
diff --git a/xen/arch/x86/hvm/hvm.c b/xen/arch/x86/hvm/hvm.c
index cc46909..8e3c2e2 100644
--- a/xen/arch/x86/hvm/hvm.c
+++ b/xen/arch/x86/hvm/hvm.c
@@ -546,7 +546,7 @@ void hvm_do_resume(struct vcpu *v)
 
     pt_restore_timer(v);
 
-    if ( !handle_hvm_io_completion(v) )
+    if ( !vcpu_ioreq_handle_completion(v) )
         return;
 
     if ( unlikely(v->arch.vm_event) )
@@ -677,7 +677,7 @@ int hvm_domain_initialise(struct domain *d)
     register_g2m_portio_handler(d);
     register_vpci_portio_handler(d);
 
-    hvm_ioreq_init(d);
+    ioreq_domain_init(d);
 
     hvm_init_guest_time(d);
 
@@ -739,7 +739,7 @@ void hvm_domain_relinquish_resources(struct domain *d)
 
     viridian_domain_deinit(d);
 
-    hvm_destroy_all_ioreq_servers(d);
+    ioreq_server_destroy_all(d);
 
     msixtbl_pt_cleanup(d);
 
@@ -1582,7 +1582,7 @@ int hvm_vcpu_initialise(struct vcpu *v)
     if ( rc )
         goto fail5;
 
-    rc = hvm_all_ioreq_servers_add_vcpu(d, v);
+    rc = ioreq_server_add_vcpu_all(d, v);
     if ( rc != 0 )
         goto fail6;
 
@@ -1618,7 +1618,7 @@ void hvm_vcpu_destroy(struct vcpu *v)
 {
     viridian_vcpu_deinit(v);
 
-    hvm_all_ioreq_servers_remove_vcpu(v->domain, v);
+    ioreq_server_remove_vcpu_all(v->domain, v);
 
     if ( hvm_altp2m_supported() )
         altp2m_vcpu_destroy(v);
diff --git a/xen/arch/x86/hvm/io.c b/xen/arch/x86/hvm/io.c
index 327a6a2..a0dd8d1 100644
--- a/xen/arch/x86/hvm/io.c
+++ b/xen/arch/x86/hvm/io.c
@@ -60,7 +60,7 @@ void send_timeoffset_req(unsigned long timeoff)
     if ( timeoff == 0 )
         return;
 
-    if ( hvm_broadcast_ioreq(&p, true) != 0 )
+    if ( ioreq_broadcast(&p, true) != 0 )
         gprintk(XENLOG_ERR, "Unsuccessful timeoffset update\n");
 }
 
@@ -74,7 +74,7 @@ void send_invalidate_req(void)
         .data = ~0UL, /* flush all */
     };
 
-    if ( hvm_broadcast_ioreq(&p, false) != 0 )
+    if ( ioreq_broadcast(&p, false) != 0 )
         gprintk(XENLOG_ERR, "Unsuccessful map-cache invalidate\n");
 }
 
@@ -155,7 +155,7 @@ bool handle_pio(uint16_t port, unsigned int size, int dir)
          * We should not advance RIP/EIP if the domain is shutting down or
          * if X86EMUL_RETRY has been returned by an internal handler.
          */
-        if ( curr->domain->is_shutting_down || !hvm_io_pending(curr) )
+        if ( curr->domain->is_shutting_down || !vcpu_ioreq_pending(curr) )
             return false;
         break;
 
diff --git a/xen/arch/x86/hvm/ioreq.c b/xen/arch/x86/hvm/ioreq.c
index 7808b75..934189e 100644
--- a/xen/arch/x86/hvm/ioreq.c
+++ b/xen/arch/x86/hvm/ioreq.c
@@ -154,7 +154,7 @@ static int hvm_map_ioreq_gfn(struct ioreq_server *s, bool buf)
     {
         /*
          * If a page has already been allocated (which will happen on
-         * demand if hvm_get_ioreq_server_frame() is called), then
+         * demand if ioreq_server_get_frame() is called), then
          * mapping a guest frame is not permitted.
          */
         if ( gfn_eq(iorp->gfn, INVALID_GFN) )
diff --git a/xen/arch/x86/hvm/stdvga.c b/xen/arch/x86/hvm/stdvga.c
index bafb3f6..390ac51 100644
--- a/xen/arch/x86/hvm/stdvga.c
+++ b/xen/arch/x86/hvm/stdvga.c
@@ -507,11 +507,11 @@ static int stdvga_mem_write(const struct hvm_io_handler *handler,
     }
 
  done:
-    srv = hvm_select_ioreq_server(current->domain, &p);
+    srv = ioreq_server_select(current->domain, &p);
     if ( !srv )
         return X86EMUL_UNHANDLEABLE;
 
-    return hvm_send_ioreq(srv, &p, 1);
+    return ioreq_send(srv, &p, 1);
 }
 
 static bool_t stdvga_mem_accept(const struct hvm_io_handler *handler,
diff --git a/xen/arch/x86/hvm/vmx/vvmx.c b/xen/arch/x86/hvm/vmx/vvmx.c
index 3a37e9e..a4813f0 100644
--- a/xen/arch/x86/hvm/vmx/vvmx.c
+++ b/xen/arch/x86/hvm/vmx/vvmx.c
@@ -1516,7 +1516,7 @@ void nvmx_switch_guest(void)
      * don't want to continue as this setup is not implemented nor supported
      * as of right now.
      */
-    if ( hvm_io_pending(v) )
+    if ( vcpu_ioreq_pending(v) )
         return;
     /*
      * a softirq may interrupt us between a virtual vmentry is
diff --git a/xen/common/dm.c b/xen/common/dm.c
index 36e01a2..9d394fc 100644
--- a/xen/common/dm.c
+++ b/xen/common/dm.c
@@ -100,8 +100,8 @@ static int dm_op(const struct dmop_args *op_args)
         if ( data->pad[0] || data->pad[1] || data->pad[2] )
             break;
 
-        rc = hvm_create_ioreq_server(d, data->handle_bufioreq,
-                                     &data->id);
+        rc = ioreq_server_create(d, data->handle_bufioreq,
+                                 &data->id);
         break;
     }
 
@@ -117,12 +117,12 @@ static int dm_op(const struct dmop_args *op_args)
         if ( data->flags & ~valid_flags )
             break;
 
-        rc = hvm_get_ioreq_server_info(d, data->id,
-                                       (data->flags & XEN_DMOP_no_gfns) ?
-                                       NULL : (unsigned long *)&data->ioreq_gfn,
-                                       (data->flags & XEN_DMOP_no_gfns) ?
-                                       NULL : (unsigned long *)&data->bufioreq_gfn,
-                                       &data->bufioreq_port);
+        rc = ioreq_server_get_info(d, data->id,
+                                   (data->flags & XEN_DMOP_no_gfns) ?
+                                   NULL : (unsigned long *)&data->ioreq_gfn,
+                                   (data->flags & XEN_DMOP_no_gfns) ?
+                                   NULL : (unsigned long *)&data->bufioreq_gfn,
+                                   &data->bufioreq_port);
         break;
     }
 
@@ -135,8 +135,8 @@ static int dm_op(const struct dmop_args *op_args)
         if ( data->pad )
             break;
 
-        rc = hvm_map_io_range_to_ioreq_server(d, data->id, data->type,
-                                              data->start, data->end);
+        rc = ioreq_server_map_io_range(d, data->id, data->type,
+                                       data->start, data->end);
         break;
     }
 
@@ -149,8 +149,8 @@ static int dm_op(const struct dmop_args *op_args)
         if ( data->pad )
             break;
 
-        rc = hvm_unmap_io_range_from_ioreq_server(d, data->id, data->type,
-                                                  data->start, data->end);
+        rc = ioreq_server_unmap_io_range(d, data->id, data->type,
+                                         data->start, data->end);
         break;
     }
 
@@ -163,7 +163,7 @@ static int dm_op(const struct dmop_args *op_args)
         if ( data->pad )
             break;
 
-        rc = hvm_set_ioreq_server_state(d, data->id, !!data->enabled);
+        rc = ioreq_server_set_state(d, data->id, !!data->enabled);
         break;
     }
 
@@ -176,7 +176,7 @@ static int dm_op(const struct dmop_args *op_args)
         if ( data->pad )
             break;
 
-        rc = hvm_destroy_ioreq_server(d, data->id);
+        rc = ioreq_server_destroy(d, data->id);
         break;
     }
 
diff --git a/xen/common/ioreq.c b/xen/common/ioreq.c
index caf4543..3ca5b96 100644
--- a/xen/common/ioreq.c
+++ b/xen/common/ioreq.c
@@ -59,7 +59,7 @@ static struct ioreq_server *get_ioreq_server(const struct domain *d,
  * Iterate over all possible ioreq servers.
  *
  * NOTE: The iteration is backwards such that more recently created
- *       ioreq servers are favoured in hvm_select_ioreq_server().
+ *       ioreq servers are favoured in ioreq_server_select().
  *       This is a semantic that previously existed when ioreq servers
  *       were held in a linked list.
  */
@@ -106,12 +106,12 @@ static struct ioreq_vcpu *get_pending_vcpu(const struct vcpu *v,
     return NULL;
 }
 
-bool hvm_io_pending(struct vcpu *v)
+bool vcpu_ioreq_pending(struct vcpu *v)
 {
     return get_pending_vcpu(v, NULL);
 }
 
-static bool hvm_wait_for_io(struct ioreq_vcpu *sv, ioreq_t *p)
+static bool wait_for_io(struct ioreq_vcpu *sv, ioreq_t *p)
 {
     unsigned int prev_state = STATE_IOREQ_NONE;
     unsigned int state = p->state;
@@ -168,7 +168,7 @@ static bool hvm_wait_for_io(struct ioreq_vcpu *sv, ioreq_t *p)
     return true;
 }
 
-bool handle_hvm_io_completion(struct vcpu *v)
+bool vcpu_ioreq_handle_completion(struct vcpu *v)
 {
     struct domain *d = v->domain;
     struct vcpu_io *vio = &v->io;
@@ -183,7 +183,7 @@ bool handle_hvm_io_completion(struct vcpu *v)
     }
 
     sv = get_pending_vcpu(v, &s);
-    if ( sv && !hvm_wait_for_io(sv, get_ioreq(s, v)) )
+    if ( sv && !wait_for_io(sv, get_ioreq(s, v)) )
         return false;
 
     vio->req.state = ioreq_needs_completion(&vio->req) ?
@@ -214,7 +214,7 @@ bool handle_hvm_io_completion(struct vcpu *v)
     return true;
 }
 
-static int hvm_alloc_ioreq_mfn(struct ioreq_server *s, bool buf)
+static int ioreq_server_alloc_mfn(struct ioreq_server *s, bool buf)
 {
     struct ioreq_page *iorp = buf ? &s->bufioreq : &s->ioreq;
     struct page_info *page;
@@ -223,7 +223,7 @@ static int hvm_alloc_ioreq_mfn(struct ioreq_server *s, bool buf)
     {
         /*
          * If a guest frame has already been mapped (which may happen
-         * on demand if hvm_get_ioreq_server_info() is called), then
+         * on demand if ioreq_server_get_info() is called), then
          * allocating a page is not permitted.
          */
         if ( !gfn_eq(iorp->gfn, INVALID_GFN) )
@@ -262,7 +262,7 @@ static int hvm_alloc_ioreq_mfn(struct ioreq_server *s, bool buf)
     return -ENOMEM;
 }
 
-static void hvm_free_ioreq_mfn(struct ioreq_server *s, bool buf)
+static void ioreq_server_free_mfn(struct ioreq_server *s, bool buf)
 {
     struct ioreq_page *iorp = buf ? &s->bufioreq : &s->ioreq;
     struct page_info *page = iorp->page;
@@ -301,8 +301,8 @@ bool is_ioreq_server_page(struct domain *d, const struct page_info *page)
     return found;
 }
 
-static void hvm_update_ioreq_evtchn(struct ioreq_server *s,
-                                    struct ioreq_vcpu *sv)
+static void ioreq_update_evtchn(struct ioreq_server *s,
+                                struct ioreq_vcpu *sv)
 {
     ASSERT(spin_is_locked(&s->lock));
 
@@ -314,8 +314,8 @@ static void hvm_update_ioreq_evtchn(struct ioreq_server *s,
     }
 }
 
-static int hvm_ioreq_server_add_vcpu(struct ioreq_server *s,
-                                     struct vcpu *v)
+static int ioreq_server_add_vcpu(struct ioreq_server *s,
+                                 struct vcpu *v)
 {
     struct ioreq_vcpu *sv;
     int rc;
@@ -350,7 +350,7 @@ static int hvm_ioreq_server_add_vcpu(struct ioreq_server *s,
     list_add(&sv->list_entry, &s->ioreq_vcpu_list);
 
     if ( s->enabled )
-        hvm_update_ioreq_evtchn(s, sv);
+        ioreq_update_evtchn(s, sv);
 
     spin_unlock(&s->lock);
     return 0;
@@ -366,8 +366,8 @@ static int hvm_ioreq_server_add_vcpu(struct ioreq_server *s,
     return rc;
 }
 
-static void hvm_ioreq_server_remove_vcpu(struct ioreq_server *s,
-                                         struct vcpu *v)
+static void ioreq_server_remove_vcpu(struct ioreq_server *s,
+                                     struct vcpu *v)
 {
     struct ioreq_vcpu *sv;
 
@@ -394,7 +394,7 @@ static void hvm_ioreq_server_remove_vcpu(struct ioreq_server *s,
     spin_unlock(&s->lock);
 }
 
-static void hvm_ioreq_server_remove_all_vcpus(struct ioreq_server *s)
+static void ioreq_server_remove_all_vcpus(struct ioreq_server *s)
 {
     struct ioreq_vcpu *sv, *next;
 
@@ -420,28 +420,28 @@ static void hvm_ioreq_server_remove_all_vcpus(struct ioreq_server *s)
     spin_unlock(&s->lock);
 }
 
-static int hvm_ioreq_server_alloc_pages(struct ioreq_server *s)
+static int ioreq_server_alloc_pages(struct ioreq_server *s)
 {
     int rc;
 
-    rc = hvm_alloc_ioreq_mfn(s, false);
+    rc = ioreq_server_alloc_mfn(s, false);
 
     if ( !rc && (s->bufioreq_handling != HVM_IOREQSRV_BUFIOREQ_OFF) )
-        rc = hvm_alloc_ioreq_mfn(s, true);
+        rc = ioreq_server_alloc_mfn(s, true);
 
     if ( rc )
-        hvm_free_ioreq_mfn(s, false);
+        ioreq_server_free_mfn(s, false);
 
     return rc;
 }
 
-static void hvm_ioreq_server_free_pages(struct ioreq_server *s)
+static void ioreq_server_free_pages(struct ioreq_server *s)
 {
-    hvm_free_ioreq_mfn(s, true);
-    hvm_free_ioreq_mfn(s, false);
+    ioreq_server_free_mfn(s, true);
+    ioreq_server_free_mfn(s, false);
 }
 
-static void hvm_ioreq_server_free_rangesets(struct ioreq_server *s)
+static void ioreq_server_free_rangesets(struct ioreq_server *s)
 {
     unsigned int i;
 
@@ -449,8 +449,8 @@ static void hvm_ioreq_server_free_rangesets(struct ioreq_server *s)
         rangeset_destroy(s->range[i]);
 }
 
-static int hvm_ioreq_server_alloc_rangesets(struct ioreq_server *s,
-                                            ioservid_t id)
+static int ioreq_server_alloc_rangesets(struct ioreq_server *s,
+                                        ioservid_t id)
 {
     unsigned int i;
     int rc;
@@ -482,12 +482,12 @@ static int hvm_ioreq_server_alloc_rangesets(struct ioreq_server *s,
     return 0;
 
  fail:
-    hvm_ioreq_server_free_rangesets(s);
+    ioreq_server_free_rangesets(s);
 
     return rc;
 }
 
-static void hvm_ioreq_server_enable(struct ioreq_server *s)
+static void ioreq_server_enable(struct ioreq_server *s)
 {
     struct ioreq_vcpu *sv;
 
@@ -503,13 +503,13 @@ static void hvm_ioreq_server_enable(struct ioreq_server *s)
     list_for_each_entry ( sv,
                           &s->ioreq_vcpu_list,
                           list_entry )
-        hvm_update_ioreq_evtchn(s, sv);
+        ioreq_update_evtchn(s, sv);
 
   done:
     spin_unlock(&s->lock);
 }
 
-static void hvm_ioreq_server_disable(struct ioreq_server *s)
+static void ioreq_server_disable(struct ioreq_server *s)
 {
     spin_lock(&s->lock);
 
@@ -524,9 +524,9 @@ static void hvm_ioreq_server_disable(struct ioreq_server *s)
     spin_unlock(&s->lock);
 }
 
-static int hvm_ioreq_server_init(struct ioreq_server *s,
-                                 struct domain *d, int bufioreq_handling,
-                                 ioservid_t id)
+static int ioreq_server_init(struct ioreq_server *s,
+                             struct domain *d, int bufioreq_handling,
+                             ioservid_t id)
 {
     struct domain *currd = current->domain;
     struct vcpu *v;
@@ -544,7 +544,7 @@ static int hvm_ioreq_server_init(struct ioreq_server *s,
     s->ioreq.gfn = INVALID_GFN;
     s->bufioreq.gfn = INVALID_GFN;
 
-    rc = hvm_ioreq_server_alloc_rangesets(s, id);
+    rc = ioreq_server_alloc_rangesets(s, id);
     if ( rc )
         return rc;
 
@@ -552,7 +552,7 @@ static int hvm_ioreq_server_init(struct ioreq_server *s,
 
     for_each_vcpu ( d, v )
     {
-        rc = hvm_ioreq_server_add_vcpu(s, v);
+        rc = ioreq_server_add_vcpu(s, v);
         if ( rc )
             goto fail_add;
     }
@@ -560,23 +560,23 @@ static int hvm_ioreq_server_init(struct ioreq_server *s,
     return 0;
 
  fail_add:
-    hvm_ioreq_server_remove_all_vcpus(s);
+    ioreq_server_remove_all_vcpus(s);
     arch_ioreq_server_unmap_pages(s);
 
-    hvm_ioreq_server_free_rangesets(s);
+    ioreq_server_free_rangesets(s);
 
     put_domain(s->emulator);
     return rc;
 }
 
-static void hvm_ioreq_server_deinit(struct ioreq_server *s)
+static void ioreq_server_deinit(struct ioreq_server *s)
 {
     ASSERT(!s->enabled);
-    hvm_ioreq_server_remove_all_vcpus(s);
+    ioreq_server_remove_all_vcpus(s);
 
     /*
      * NOTE: It is safe to call both arch_ioreq_server_unmap_pages() and
-     *       hvm_ioreq_server_free_pages() in that order.
+     *       ioreq_server_free_pages() in that order.
      *       This is because the former will do nothing if the pages
      *       are not mapped, leaving the page to be freed by the latter.
      *       However if the pages are mapped then the former will set
@@ -584,15 +584,15 @@ static void hvm_ioreq_server_deinit(struct ioreq_server *s)
      *       nothing.
      */
     arch_ioreq_server_unmap_pages(s);
-    hvm_ioreq_server_free_pages(s);
+    ioreq_server_free_pages(s);
 
-    hvm_ioreq_server_free_rangesets(s);
+    ioreq_server_free_rangesets(s);
 
     put_domain(s->emulator);
 }
 
-int hvm_create_ioreq_server(struct domain *d, int bufioreq_handling,
-                            ioservid_t *id)
+int ioreq_server_create(struct domain *d, int bufioreq_handling,
+                        ioservid_t *id)
 {
     struct ioreq_server *s;
     unsigned int i;
@@ -620,11 +620,11 @@ int hvm_create_ioreq_server(struct domain *d, int bufioreq_handling,
 
     /*
      * It is safe to call set_ioreq_server() prior to
-     * hvm_ioreq_server_init() since the target domain is paused.
+     * ioreq_server_init() since the target domain is paused.
      */
     set_ioreq_server(d, i, s);
 
-    rc = hvm_ioreq_server_init(s, d, bufioreq_handling, i);
+    rc = ioreq_server_init(s, d, bufioreq_handling, i);
     if ( rc )
     {
         set_ioreq_server(d, i, NULL);
@@ -647,7 +647,7 @@ int hvm_create_ioreq_server(struct domain *d, int bufioreq_handling,
     return rc;
 }
 
-int hvm_destroy_ioreq_server(struct domain *d, ioservid_t id)
+int ioreq_server_destroy(struct domain *d, ioservid_t id)
 {
     struct ioreq_server *s;
     int rc;
@@ -668,13 +668,13 @@ int hvm_destroy_ioreq_server(struct domain *d, ioservid_t id)
 
     arch_ioreq_server_destroy(s);
 
-    hvm_ioreq_server_disable(s);
+    ioreq_server_disable(s);
 
     /*
-     * It is safe to call hvm_ioreq_server_deinit() prior to
+     * It is safe to call ioreq_server_deinit() prior to
      * set_ioreq_server() since the target domain is paused.
      */
-    hvm_ioreq_server_deinit(s);
+    ioreq_server_deinit(s);
     set_ioreq_server(d, id, NULL);
 
     domain_unpause(d);
@@ -689,10 +689,10 @@ int hvm_destroy_ioreq_server(struct domain *d, ioservid_t id)
     return rc;
 }
 
-int hvm_get_ioreq_server_info(struct domain *d, ioservid_t id,
-                              unsigned long *ioreq_gfn,
-                              unsigned long *bufioreq_gfn,
-                              evtchn_port_t *bufioreq_port)
+int ioreq_server_get_info(struct domain *d, ioservid_t id,
+                          unsigned long *ioreq_gfn,
+                          unsigned long *bufioreq_gfn,
+                          evtchn_port_t *bufioreq_port)
 {
     struct ioreq_server *s;
     int rc;
@@ -736,8 +736,8 @@ int hvm_get_ioreq_server_info(struct domain *d, ioservid_t id,
     return rc;
 }
 
-int hvm_get_ioreq_server_frame(struct domain *d, ioservid_t id,
-                               unsigned long idx, mfn_t *mfn)
+int ioreq_server_get_frame(struct domain *d, ioservid_t id,
+                           unsigned long idx, mfn_t *mfn)
 {
     struct ioreq_server *s;
     int rc;
@@ -756,7 +756,7 @@ int hvm_get_ioreq_server_frame(struct domain *d, ioservid_t id,
     if ( s->emulator != current->domain )
         goto out;
 
-    rc = hvm_ioreq_server_alloc_pages(s);
+    rc = ioreq_server_alloc_pages(s);
     if ( rc )
         goto out;
 
@@ -787,9 +787,9 @@ int hvm_get_ioreq_server_frame(struct domain *d, ioservid_t id,
     return rc;
 }
 
-int hvm_map_io_range_to_ioreq_server(struct domain *d, ioservid_t id,
-                                     uint32_t type, uint64_t start,
-                                     uint64_t end)
+int ioreq_server_map_io_range(struct domain *d, ioservid_t id,
+                              uint32_t type, uint64_t start,
+                              uint64_t end)
 {
     struct ioreq_server *s;
     struct rangeset *r;
@@ -839,9 +839,9 @@ int hvm_map_io_range_to_ioreq_server(struct domain *d, ioservid_t id,
     return rc;
 }
 
-int hvm_unmap_io_range_from_ioreq_server(struct domain *d, ioservid_t id,
-                                         uint32_t type, uint64_t start,
-                                         uint64_t end)
+int ioreq_server_unmap_io_range(struct domain *d, ioservid_t id,
+                                uint32_t type, uint64_t start,
+                                uint64_t end)
 {
     struct ioreq_server *s;
     struct rangeset *r;
@@ -899,8 +899,8 @@ int hvm_unmap_io_range_from_ioreq_server(struct domain *d, ioservid_t id,
  * Support for the emulation of read operations can be added when an ioreq
  * server has such requirement in the future.
  */
-int hvm_map_mem_type_to_ioreq_server(struct domain *d, ioservid_t id,
-                                     uint32_t type, uint32_t flags)
+int ioreq_server_map_mem_type(struct domain *d, ioservid_t id,
+                              uint32_t type, uint32_t flags)
 {
     struct ioreq_server *s;
     int rc;
@@ -931,8 +931,8 @@ int hvm_map_mem_type_to_ioreq_server(struct domain *d, ioservid_t id,
     return rc;
 }
 
-int hvm_set_ioreq_server_state(struct domain *d, ioservid_t id,
-                               bool enabled)
+int ioreq_server_set_state(struct domain *d, ioservid_t id,
+                           bool enabled)
 {
     struct ioreq_server *s;
     int rc;
@@ -952,9 +952,9 @@ int hvm_set_ioreq_server_state(struct domain *d, ioservid_t id,
     domain_pause(d);
 
     if ( enabled )
-        hvm_ioreq_server_enable(s);
+        ioreq_server_enable(s);
     else
-        hvm_ioreq_server_disable(s);
+        ioreq_server_disable(s);
 
     domain_unpause(d);
 
@@ -965,7 +965,7 @@ int hvm_set_ioreq_server_state(struct domain *d, ioservid_t id,
     return rc;
 }
 
-int hvm_all_ioreq_servers_add_vcpu(struct domain *d, struct vcpu *v)
+int ioreq_server_add_vcpu_all(struct domain *d, struct vcpu *v)
 {
     struct ioreq_server *s;
     unsigned int id;
@@ -975,7 +975,7 @@ int hvm_all_ioreq_servers_add_vcpu(struct domain *d, struct vcpu *v)
 
     FOR_EACH_IOREQ_SERVER(d, id, s)
     {
-        rc = hvm_ioreq_server_add_vcpu(s, v);
+        rc = ioreq_server_add_vcpu(s, v);
         if ( rc )
             goto fail;
     }
@@ -992,7 +992,7 @@ int hvm_all_ioreq_servers_add_vcpu(struct domain *d, struct vcpu *v)
         if ( !s )
             continue;
 
-        hvm_ioreq_server_remove_vcpu(s, v);
+        ioreq_server_remove_vcpu(s, v);
     }
 
     spin_unlock_recursive(&d->ioreq_server.lock);
@@ -1000,7 +1000,7 @@ int hvm_all_ioreq_servers_add_vcpu(struct domain *d, struct vcpu *v)
     return rc;
 }
 
-void hvm_all_ioreq_servers_remove_vcpu(struct domain *d, struct vcpu *v)
+void ioreq_server_remove_vcpu_all(struct domain *d, struct vcpu *v)
 {
     struct ioreq_server *s;
     unsigned int id;
@@ -1008,12 +1008,12 @@ void hvm_all_ioreq_servers_remove_vcpu(struct domain *d, struct vcpu *v)
     spin_lock_recursive(&d->ioreq_server.lock);
 
     FOR_EACH_IOREQ_SERVER(d, id, s)
-        hvm_ioreq_server_remove_vcpu(s, v);
+        ioreq_server_remove_vcpu(s, v);
 
     spin_unlock_recursive(&d->ioreq_server.lock);
 }
 
-void hvm_destroy_all_ioreq_servers(struct domain *d)
+void ioreq_server_destroy_all(struct domain *d)
 {
     struct ioreq_server *s;
     unsigned int id;
@@ -1027,13 +1027,13 @@ void hvm_destroy_all_ioreq_servers(struct domain *d)
 
     FOR_EACH_IOREQ_SERVER(d, id, s)
     {
-        hvm_ioreq_server_disable(s);
+        ioreq_server_disable(s);
 
         /*
-         * It is safe to call hvm_ioreq_server_deinit() prior to
+         * It is safe to call ioreq_server_deinit() prior to
          * set_ioreq_server() since the target domain is being destroyed.
          */
-        hvm_ioreq_server_deinit(s);
+        ioreq_server_deinit(s);
         set_ioreq_server(d, id, NULL);
 
         xfree(s);
@@ -1042,8 +1042,8 @@ void hvm_destroy_all_ioreq_servers(struct domain *d)
     spin_unlock_recursive(&d->ioreq_server.lock);
 }
 
-struct ioreq_server *hvm_select_ioreq_server(struct domain *d,
-                                             ioreq_t *p)
+struct ioreq_server *ioreq_server_select(struct domain *d,
+                                         ioreq_t *p)
 {
     struct ioreq_server *s;
     uint8_t type;
@@ -1098,7 +1098,7 @@ struct ioreq_server *hvm_select_ioreq_server(struct domain *d,
     return NULL;
 }
 
-static int hvm_send_buffered_ioreq(struct ioreq_server *s, ioreq_t *p)
+static int ioreq_send_buffered(struct ioreq_server *s, ioreq_t *p)
 {
     struct domain *d = current->domain;
     struct ioreq_page *iorp;
@@ -1191,8 +1191,8 @@ static int hvm_send_buffered_ioreq(struct ioreq_server *s, ioreq_t *p)
     return IOREQ_STATUS_HANDLED;
 }
 
-int hvm_send_ioreq(struct ioreq_server *s, ioreq_t *proto_p,
-                   bool buffered)
+int ioreq_send(struct ioreq_server *s, ioreq_t *proto_p,
+               bool buffered)
 {
     struct vcpu *curr = current;
     struct domain *d = curr->domain;
@@ -1201,7 +1201,7 @@ int hvm_send_ioreq(struct ioreq_server *s, ioreq_t *proto_p,
     ASSERT(s);
 
     if ( buffered )
-        return hvm_send_buffered_ioreq(s, proto_p);
+        return ioreq_send_buffered(s, proto_p);
 
     if ( unlikely(!vcpu_start_shutdown_deferral(curr)) )
         return IOREQ_STATUS_RETRY;
@@ -1251,7 +1251,7 @@ int hvm_send_ioreq(struct ioreq_server *s, ioreq_t *proto_p,
     return IOREQ_STATUS_UNHANDLED;
 }
 
-unsigned int hvm_broadcast_ioreq(ioreq_t *p, bool buffered)
+unsigned int ioreq_broadcast(ioreq_t *p, bool buffered)
 {
     struct domain *d = current->domain;
     struct ioreq_server *s;
@@ -1262,14 +1262,14 @@ unsigned int hvm_broadcast_ioreq(ioreq_t *p, bool buffered)
         if ( !s->enabled )
             continue;
 
-        if ( hvm_send_ioreq(s, p, buffered) == IOREQ_STATUS_UNHANDLED )
+        if ( ioreq_send(s, p, buffered) == IOREQ_STATUS_UNHANDLED )
             failed++;
     }
 
     return failed;
 }
 
-void hvm_ioreq_init(struct domain *d)
+void ioreq_domain_init(struct domain *d)
 {
     spin_lock_init(&d->ioreq_server.lock);
 
diff --git a/xen/common/memory.c b/xen/common/memory.c
index 92cf983..3363c06 100644
--- a/xen/common/memory.c
+++ b/xen/common/memory.c
@@ -1108,7 +1108,7 @@ static int acquire_ioreq_server(struct domain *d,
     {
         mfn_t mfn;
 
-        rc = hvm_get_ioreq_server_frame(d, id, frame + i, &mfn);
+        rc = ioreq_server_get_frame(d, id, frame + i, &mfn);
         if ( rc )
             return rc;
 
diff --git a/xen/include/xen/ioreq.h b/xen/include/xen/ioreq.h
index 979afa0..02ff998 100644
--- a/xen/include/xen/ioreq.h
+++ b/xen/include/xen/ioreq.h
@@ -81,41 +81,42 @@ static inline bool ioreq_needs_completion(const ioreq_t *ioreq)
 #define HANDLE_BUFIOREQ(s) \
     ((s)->bufioreq_handling != HVM_IOREQSRV_BUFIOREQ_OFF)
 
-bool hvm_io_pending(struct vcpu *v);
-bool handle_hvm_io_completion(struct vcpu *v);
+bool vcpu_ioreq_pending(struct vcpu *v);
+bool vcpu_ioreq_handle_completion(struct vcpu *v);
 bool is_ioreq_server_page(struct domain *d, const struct page_info *page);
 
-int hvm_create_ioreq_server(struct domain *d, int bufioreq_handling,
-                            ioservid_t *id);
-int hvm_destroy_ioreq_server(struct domain *d, ioservid_t id);
-int hvm_get_ioreq_server_info(struct domain *d, ioservid_t id,
-                              unsigned long *ioreq_gfn,
-                              unsigned long *bufioreq_gfn,
-                              evtchn_port_t *bufioreq_port);
-int hvm_get_ioreq_server_frame(struct domain *d, ioservid_t id,
-                               unsigned long idx, mfn_t *mfn);
-int hvm_map_io_range_to_ioreq_server(struct domain *d, ioservid_t id,
-                                     uint32_t type, uint64_t start,
-                                     uint64_t end);
-int hvm_unmap_io_range_from_ioreq_server(struct domain *d, ioservid_t id,
-                                         uint32_t type, uint64_t start,
-                                         uint64_t end);
-int hvm_map_mem_type_to_ioreq_server(struct domain *d, ioservid_t id,
-                                     uint32_t type, uint32_t flags);
-int hvm_set_ioreq_server_state(struct domain *d, ioservid_t id,
-                               bool enabled);
-
-int hvm_all_ioreq_servers_add_vcpu(struct domain *d, struct vcpu *v);
-void hvm_all_ioreq_servers_remove_vcpu(struct domain *d, struct vcpu *v);
-void hvm_destroy_all_ioreq_servers(struct domain *d);
-
-struct ioreq_server *hvm_select_ioreq_server(struct domain *d,
-                                             ioreq_t *p);
-int hvm_send_ioreq(struct ioreq_server *s, ioreq_t *proto_p,
-                   bool buffered);
-unsigned int hvm_broadcast_ioreq(ioreq_t *p, bool buffered);
-
-void hvm_ioreq_init(struct domain *d);
+int ioreq_server_create(struct domain *d, int bufioreq_handling,
+                        ioservid_t *id);
+int ioreq_server_destroy(struct domain *d, ioservid_t id);
+int ioreq_server_get_info(struct domain *d, ioservid_t id,
+                          unsigned long *ioreq_gfn,
+                          unsigned long *bufioreq_gfn,
+                          evtchn_port_t *bufioreq_port);
+int ioreq_server_get_frame(struct domain *d, ioservid_t id,
+                           unsigned long idx, mfn_t *mfn);
+int ioreq_server_map_io_range(struct domain *d, ioservid_t id,
+                              uint32_t type, uint64_t start,
+                              uint64_t end);
+int ioreq_server_unmap_io_range(struct domain *d, ioservid_t id,
+                                uint32_t type, uint64_t start,
+                                uint64_t end);
+int ioreq_server_map_mem_type(struct domain *d, ioservid_t id,
+                              uint32_t type, uint32_t flags);
+
+int ioreq_server_set_state(struct domain *d, ioservid_t id,
+                           bool enabled);
+
+int ioreq_server_add_vcpu_all(struct domain *d, struct vcpu *v);
+void ioreq_server_remove_vcpu_all(struct domain *d, struct vcpu *v);
+void ioreq_server_destroy_all(struct domain *d);
+
+struct ioreq_server *ioreq_server_select(struct domain *d,
+                                         ioreq_t *p);
+int ioreq_send(struct ioreq_server *s, ioreq_t *proto_p,
+               bool buffered);
+unsigned int ioreq_broadcast(ioreq_t *p, bool buffered);
+
+void ioreq_domain_init(struct domain *d);
 
 #endif /* __XEN_IOREQ_H__ */
 
-- 
2.7.4



^ permalink raw reply related	[flat|nested] 127+ messages in thread

* [PATCH V3 13/23] xen/ioreq: Use guest_cmpxchg64() instead of cmpxchg()
  2020-11-30 10:31 Oleksandr Tyshchenko
                   ` (11 preceding siblings ...)
  2020-11-30 10:31 ` [PATCH V3 12/23] xen/ioreq: Remove "hvm" prefixes from involved function names Oleksandr Tyshchenko
@ 2020-11-30 10:31 ` Oleksandr Tyshchenko
  2020-12-09 21:32   ` Stefano Stabellini
  2020-11-30 10:31 ` [PATCH V3 14/23] arm/ioreq: Introduce arch specific bits for IOREQ/DM features Oleksandr Tyshchenko
                   ` (11 subsequent siblings)
  24 siblings, 1 reply; 127+ messages in thread
From: Oleksandr Tyshchenko @ 2020-11-30 10:31 UTC (permalink / raw)
  To: xen-devel
  Cc: Oleksandr Tyshchenko, Stefano Stabellini, Julien Grall,
	Volodymyr Babchuk, Paul Durrant, Julien Grall

From: Oleksandr Tyshchenko <oleksandr_tyshchenko@epam.com>

The cmpxchg() in ioreq_send_buffered() operates on memory shared
with the emulator domain (and the target domain if the legacy
interface is used).

In order to be on the safe side we need to switch
to guest_cmpxchg64() to prevent a domain from DoSing Xen on Arm.

As there is no plan to support the legacy interface on Arm,
the page will only ever be mapped in a single domain at a time,
so we can safely use s->emulator in guest_cmpxchg64().

Thankfully the only user of the legacy interface is x86 so far,
so there is no concern regarding the atomic operations there.

Please note that the legacy interface *must* not be used on Arm
without revisiting the code.
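
For reference, the hardened pattern boils down to the following. This is
a minimal sketch: the ring pointer fields come from the public
buffered_iopage layout used by the hunk below, while the wrapper function
itself is purely illustrative and not part of the patch:

    /* Illustrative wrapper only -- not part of the patch. */
    static void bufioreq_canonicalize_pointers(struct ioreq_server *s,
                                               buffered_iopage_t *pg,
                                               unsigned int n)
    {
        union bufioreq_pointers old = pg->ptrs, new;

        new.read_pointer = old.read_pointer - n * IOREQ_BUFFER_SLOT_NUM;
        new.write_pointer = old.write_pointer - n * IOREQ_BUFFER_SLOT_NUM;

        /*
         * The ring page is shared with the emulator domain, so use the
         * guest-safe cmpxchg to bound how long a misbehaving emulator
         * can stall Xen on this update.
         */
        guest_cmpxchg64(s->emulator, &pg->ptrs.full, old.full, new.full);
    }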

Signed-off-by: Oleksandr Tyshchenko <oleksandr_tyshchenko@epam.com>
CC: Julien Grall <julien.grall@arm.com>

---
Please note, this is a split/cleanup/hardening of Julien's PoC:
"Add support for Guest IO forwarding to a device emulator"

Changes RFC -> V1:
   - new patch

Changes V1 -> V2:
   - move earlier to avoid breaking arm32 compilation
   - add an explanation to commit description and hvm_allow_set_param()
   - pass s->emulator

Changes V2 -> V3:
   - update patch description
---
---
 xen/arch/arm/hvm.c | 4 ++++
 xen/common/ioreq.c | 3 ++-
 2 files changed, 6 insertions(+), 1 deletion(-)

diff --git a/xen/arch/arm/hvm.c b/xen/arch/arm/hvm.c
index 8951b34..9694e5a 100644
--- a/xen/arch/arm/hvm.c
+++ b/xen/arch/arm/hvm.c
@@ -31,6 +31,10 @@
 
 #include <asm/hypercall.h>
 
+/*
+ * The legacy interface (which involves magic IOREQ pages) *must* not be used
+ * without revisiting the code.
+ */
 static int hvm_allow_set_param(const struct domain *d, unsigned int param)
 {
     switch ( param )
diff --git a/xen/common/ioreq.c b/xen/common/ioreq.c
index 3ca5b96..4855dd8 100644
--- a/xen/common/ioreq.c
+++ b/xen/common/ioreq.c
@@ -29,6 +29,7 @@
 #include <xen/trace.h>
 #include <xen/vpci.h>
 
+#include <asm/guest_atomics.h>
 #include <asm/hvm/ioreq.h>
 
 #include <public/hvm/ioreq.h>
@@ -1182,7 +1183,7 @@ static int ioreq_send_buffered(struct ioreq_server *s, ioreq_t *p)
 
         new.read_pointer = old.read_pointer - n * IOREQ_BUFFER_SLOT_NUM;
         new.write_pointer = old.write_pointer - n * IOREQ_BUFFER_SLOT_NUM;
-        cmpxchg(&pg->ptrs.full, old.full, new.full);
+        guest_cmpxchg64(s->emulator, &pg->ptrs.full, old.full, new.full);
     }
 
     notify_via_xen_event_channel(d, s->bufioreq_evtchn);
-- 
2.7.4



^ permalink raw reply related	[flat|nested] 127+ messages in thread

* [PATCH V3 14/23] arm/ioreq: Introduce arch specific bits for IOREQ/DM features
  2020-11-30 10:31 Oleksandr Tyshchenko
                   ` (12 preceding siblings ...)
  2020-11-30 10:31 ` [PATCH V3 13/23] xen/ioreq: Use guest_cmpxchg64() instead of cmpxchg() Oleksandr Tyshchenko
@ 2020-11-30 10:31 ` Oleksandr Tyshchenko
  2020-12-09 22:04   ` Stefano Stabellini
  2020-11-30 10:31 ` [PATCH V3 15/23] xen/arm: Stick around in leave_hypervisor_to_guest until I/O has completed Oleksandr Tyshchenko
                   ` (10 subsequent siblings)
  24 siblings, 1 reply; 127+ messages in thread
From: Oleksandr Tyshchenko @ 2020-11-30 10:31 UTC (permalink / raw)
  To: xen-devel
  Cc: Julien Grall, Stefano Stabellini, Julien Grall,
	Volodymyr Babchuk, Oleksandr Tyshchenko

From: Julien Grall <julien.grall@arm.com>

This patch adds basic IOREQ/DM support on Arm. The subsequent
patches will improve functionality and add the remaining bits.

The IOREQ/DM features are supposed to be built with the IOREQ_SERVER
option enabled, which is disabled by default on Arm for now.

Please note, the "PIO handling" TODO is expected to be left unaddressed
for the current series. It is not a big issue for now while Xen
doesn't have support for vPCI on Arm. On Arm64, PIO accesses are only
used for the PCI IO BAR and we would probably want to expose them to the
emulator as PIO accesses to keep a DM completely arch-agnostic. So
"PIO handling" should be implemented when we add support for vPCI.
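
For orientation, the control flow for a guest MMIO access that gets
forwarded to a device model ends up looking roughly as follows (this is
only a summary of the hunks below, not additional code):

  do_trap_stage2_abort_guest()
    -> try_handle_mmio()
         -> find_mmio_handler()       no in-hypervisor handler found
         -> try_fwd_ioserv()          build an ioreq_t for the access
              -> ioreq_server_select()
              -> ioreq_send()         request sent, returns IO_RETRY
    -> IO_RETRY                       "finish later", return from the trap

  check_for_vcpu_work()               on the way back to the guest
    -> vcpu_ioreq_handle_completion()
         -> ioreq_complete_mmio()     replay the access with the emulator's
                                      answer and advance the PC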

Signed-off-by: Julien Grall <julien.grall@arm.com>
Signed-off-by: Oleksandr Tyshchenko <oleksandr_tyshchenko@epam.com>

---
Please note, this is a split/cleanup/hardening of Julien's PoC:
"Add support for Guest IO forwarding to a device emulator"

Changes RFC -> V1:
   - was split into:
     - arm/ioreq: Introduce arch specific bits for IOREQ/DM features
     - xen/mm: Handle properly reference in set_foreign_p2m_entry() on Arm
   - update patch description
   - update asm-arm/hvm/ioreq.h according to the newly introduced arch functions:
     - arch_hvm_destroy_ioreq_server()
     - arch_handle_hvm_io_completion()
   - update arch files to include xen/ioreq.h
   - remove HVMOP plumbing
   - rewrite a logic to handle properly case when hvm_send_ioreq() returns IO_RETRY
   - add a logic to handle properly handle_hvm_io_completion() return value
   - rename handle_mmio() to ioreq_handle_complete_mmio()
   - move paging_mark_pfn_dirty() to asm-arm/paging.h
   - remove forward declaration for hvm_ioreq_server in asm-arm/paging.h
   - move try_fwd_ioserv() to ioreq.c, provide stubs if !CONFIG_IOREQ_SERVER
   - do not remove #ifdef CONFIG_IOREQ_SERVER in memory.c for guarding xen/ioreq.h
   - use gdprintk in try_fwd_ioserv(), remove unneeded prints
   - update list of #include-s
   - move has_vpci() to asm-arm/domain.h
   - add a comment (TODO) to unimplemented yet handle_pio()
   - remove hvm_mmio_first(last)_byte() and hvm_ioreq_(page/vcpu/server) structs
     from the arch files, they were already moved to the common code
   - remove set_foreign_p2m_entry() changes, they will be properly implemented
     in the follow-up patch
   - select IOREQ_SERVER for Arm instead of Arm64 in Kconfig
   - remove x86's realmode and other unneeded stubs from xen/ioreq.h
   - clarify ioreq_t p.df usage in try_fwd_ioserv()
   - set ioreq_t p.count to 1 in try_fwd_ioserv()

Changes V1 -> V2:
   - was split into:
     - arm/ioreq: Introduce arch specific bits for IOREQ/DM features
     - xen/arm: Stick around in leave_hypervisor_to_guest until I/O has completed
   - update the author of a patch
   - update patch description
   - move a loop in leave_hypervisor_to_guest() to a separate patch
   - set IOREQ_SERVER disabled by default
   - remove already clarified /* XXX */
   - replace BUG() by ASSERT_UNREACHABLE() in handle_pio()
   - remove default case for handling the return value of try_handle_mmio()
   - remove struct hvm_domain, enum hvm_io_completion, struct hvm_vcpu_io,
     struct hvm_vcpu from asm-arm/domain.h, these are common materials now
   - update everything according to the recent changes (IOREQ related function
     names don't contain "hvm" prefixes/infixes anymore, IOREQ related fields
     are part of common struct vcpu/domain now, etc)

Changes V2 -> V3:
   - update patch now that the "legacy interface" is x86 specific
   - add dummy arch hooks
   - remove dummy paging_mark_pfn_dirty()
   - don’t include <xen/domain_page.h> in common ioreq.c
   - don’t include <public/hvm/ioreq.h> in arch ioreq.h
   - remove #define ioreq_params(d, i)
---
---
 xen/arch/arm/Makefile           |   2 +
 xen/arch/arm/dm.c               |  34 ++++++++++
 xen/arch/arm/domain.c           |   9 +++
 xen/arch/arm/io.c               |  11 +++-
 xen/arch/arm/ioreq.c            | 141 ++++++++++++++++++++++++++++++++++++++++
 xen/arch/arm/traps.c            |  13 ++++
 xen/include/asm-arm/domain.h    |   3 +
 xen/include/asm-arm/hvm/ioreq.h | 139 +++++++++++++++++++++++++++++++++++++++
 xen/include/asm-arm/mmio.h      |   1 +
 9 files changed, 352 insertions(+), 1 deletion(-)
 create mode 100644 xen/arch/arm/dm.c
 create mode 100644 xen/arch/arm/ioreq.c
 create mode 100644 xen/include/asm-arm/hvm/ioreq.h

diff --git a/xen/arch/arm/Makefile b/xen/arch/arm/Makefile
index 296c5e6..c3ff454 100644
--- a/xen/arch/arm/Makefile
+++ b/xen/arch/arm/Makefile
@@ -13,6 +13,7 @@ obj-y += cpuerrata.o
 obj-y += cpufeature.o
 obj-y += decode.o
 obj-y += device.o
+obj-$(CONFIG_IOREQ_SERVER) += dm.o
 obj-y += domain.o
 obj-y += domain_build.init.o
 obj-y += domctl.o
@@ -27,6 +28,7 @@ obj-y += guest_atomics.o
 obj-y += guest_walk.o
 obj-y += hvm.o
 obj-y += io.o
+obj-$(CONFIG_IOREQ_SERVER) += ioreq.o
 obj-y += irq.o
 obj-y += kernel.init.o
 obj-$(CONFIG_LIVEPATCH) += livepatch.o
diff --git a/xen/arch/arm/dm.c b/xen/arch/arm/dm.c
new file mode 100644
index 0000000..5d3da37
--- /dev/null
+++ b/xen/arch/arm/dm.c
@@ -0,0 +1,34 @@
+/*
+ * Copyright (c) 2019 Arm ltd.
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms and conditions of the GNU General Public License,
+ * version 2, as published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope it will be useful, but WITHOUT
+ * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
+ * FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License for
+ * more details.
+ *
+ * You should have received a copy of the GNU General Public License along with
+ * this program; If not, see <http://www.gnu.org/licenses/>.
+ */
+
+#include <xen/dm.h>
+#include <xen/hypercall.h>
+
+int arch_dm_op(struct xen_dm_op *op, struct domain *d,
+               const struct dmop_args *op_args, bool *const_op)
+{
+    return -EOPNOTSUPP;
+}
+
+/*
+ * Local variables:
+ * mode: C
+ * c-file-style: "BSD"
+ * c-basic-offset: 4
+ * tab-width: 4
+ * indent-tabs-mode: nil
+ * End:
+ */
diff --git a/xen/arch/arm/domain.c b/xen/arch/arm/domain.c
index 18cafcd..8f55aba 100644
--- a/xen/arch/arm/domain.c
+++ b/xen/arch/arm/domain.c
@@ -15,6 +15,7 @@
 #include <xen/guest_access.h>
 #include <xen/hypercall.h>
 #include <xen/init.h>
+#include <xen/ioreq.h>
 #include <xen/lib.h>
 #include <xen/livepatch.h>
 #include <xen/sched.h>
@@ -696,6 +697,10 @@ int arch_domain_create(struct domain *d,
 
     ASSERT(config != NULL);
 
+#ifdef CONFIG_IOREQ_SERVER
+    ioreq_domain_init(d);
+#endif
+
     /* p2m_init relies on some value initialized by the IOMMU subsystem */
     if ( (rc = iommu_domain_init(d, config->iommu_opts)) != 0 )
         goto fail;
@@ -1014,6 +1019,10 @@ int domain_relinquish_resources(struct domain *d)
         if (ret )
             return ret;
 
+#ifdef CONFIG_IOREQ_SERVER
+        ioreq_server_destroy_all(d);
+#endif
+
     PROGRESS(xen):
         ret = relinquish_memory(d, &d->xenpage_list);
         if ( ret )
diff --git a/xen/arch/arm/io.c b/xen/arch/arm/io.c
index ae7ef96..f44cfd4 100644
--- a/xen/arch/arm/io.c
+++ b/xen/arch/arm/io.c
@@ -23,6 +23,7 @@
 #include <asm/cpuerrata.h>
 #include <asm/current.h>
 #include <asm/mmio.h>
+#include <asm/hvm/ioreq.h>
 
 #include "decode.h"
 
@@ -123,7 +124,15 @@ enum io_state try_handle_mmio(struct cpu_user_regs *regs,
 
     handler = find_mmio_handler(v->domain, info.gpa);
     if ( !handler )
-        return IO_UNHANDLED;
+    {
+        int rc;
+
+        rc = try_fwd_ioserv(regs, v, &info);
+        if ( rc == IO_HANDLED )
+            return handle_ioserv(regs, v);
+
+        return rc;
+    }
 
     /* All the instructions used on emulated MMIO region should be valid */
     if ( !dabt.valid )
diff --git a/xen/arch/arm/ioreq.c b/xen/arch/arm/ioreq.c
new file mode 100644
index 0000000..f08190c
--- /dev/null
+++ b/xen/arch/arm/ioreq.c
@@ -0,0 +1,141 @@
+/*
+ * arm/ioreq.c: hardware virtual machine I/O emulation
+ *
+ * Copyright (c) 2019 Arm ltd.
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms and conditions of the GNU General Public License,
+ * version 2, as published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope it will be useful, but WITHOUT
+ * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
+ * FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License for
+ * more details.
+ *
+ * You should have received a copy of the GNU General Public License along with
+ * this program; If not, see <http://www.gnu.org/licenses/>.
+ */
+
+#include <xen/domain.h>
+#include <xen/ioreq.h>
+
+#include <asm/traps.h>
+
+#include <public/hvm/ioreq.h>
+
+enum io_state handle_ioserv(struct cpu_user_regs *regs, struct vcpu *v)
+{
+    const union hsr hsr = { .bits = regs->hsr };
+    const struct hsr_dabt dabt = hsr.dabt;
+    /* Code is similar to handle_read */
+    uint8_t size = (1 << dabt.size) * 8;
+    register_t r = v->io.req.data;
+
+    /* We are done with the IO */
+    v->io.req.state = STATE_IOREQ_NONE;
+
+    if ( dabt.write )
+        return IO_HANDLED;
+
+    /*
+     * Sign extend if required.
+     * Note that we expect the read handler to have zeroed the bits
+     * outside the requested access size.
+     */
+    if ( dabt.sign && (r & (1UL << (size - 1))) )
+    {
+        /*
+         * We are relying on register_t using the same as
+         * an unsigned long in order to keep the 32-bit assembly
+         * code smaller.
+         */
+        BUILD_BUG_ON(sizeof(register_t) != sizeof(unsigned long));
+        r |= (~0UL) << size;
+    }
+
+    set_user_reg(regs, dabt.reg, r);
+
+    return IO_HANDLED;
+}
+
+enum io_state try_fwd_ioserv(struct cpu_user_regs *regs,
+                             struct vcpu *v, mmio_info_t *info)
+{
+    struct vcpu_io *vio = &v->io;
+    ioreq_t p = {
+        .type = IOREQ_TYPE_COPY,
+        .addr = info->gpa,
+        .size = 1 << info->dabt.size,
+        .count = 1,
+        .dir = !info->dabt.write,
+        /*
+         * On x86, df is used by 'rep' instruction to tell the direction
+         * to iterate (forward or backward).
+         * On Arm, all the accesses to MMIO region will do a single
+         * memory access. So for now, we can safely always set to 0.
+         */
+        .df = 0,
+        .data = get_user_reg(regs, info->dabt.reg),
+        .state = STATE_IOREQ_READY,
+    };
+    struct ioreq_server *s = NULL;
+    enum io_state rc;
+
+    switch ( vio->req.state )
+    {
+    case STATE_IOREQ_NONE:
+        break;
+
+    case STATE_IORESP_READY:
+        return IO_HANDLED;
+
+    default:
+        gdprintk(XENLOG_ERR, "wrong state %u\n", vio->req.state);
+        return IO_ABORT;
+    }
+
+    s = ioreq_server_select(v->domain, &p);
+    if ( !s )
+        return IO_UNHANDLED;
+
+    if ( !info->dabt.valid )
+        return IO_ABORT;
+
+    vio->req = p;
+
+    rc = ioreq_send(s, &p, 0);
+    if ( rc != IO_RETRY || v->domain->is_shutting_down )
+        vio->req.state = STATE_IOREQ_NONE;
+    else if ( !ioreq_needs_completion(&vio->req) )
+        rc = IO_HANDLED;
+    else
+        vio->completion = IO_mmio_completion;
+
+    return rc;
+}
+
+bool ioreq_complete_mmio(void)
+{
+    struct vcpu *v = current;
+    struct cpu_user_regs *regs = guest_cpu_user_regs();
+    const union hsr hsr = { .bits = regs->hsr };
+    paddr_t addr = v->io.req.addr;
+
+    if ( try_handle_mmio(regs, hsr, addr) == IO_HANDLED )
+    {
+        advance_pc(regs, hsr);
+        return true;
+    }
+
+    return false;
+}
+
+/*
+ * Local variables:
+ * mode: C
+ * c-file-style: "BSD"
+ * c-basic-offset: 4
+ * tab-width: 4
+ * indent-tabs-mode: nil
+ * End:
+ */
diff --git a/xen/arch/arm/traps.c b/xen/arch/arm/traps.c
index 22bd1bd..036b13f 100644
--- a/xen/arch/arm/traps.c
+++ b/xen/arch/arm/traps.c
@@ -21,6 +21,7 @@
 #include <xen/hypercall.h>
 #include <xen/init.h>
 #include <xen/iocap.h>
+#include <xen/ioreq.h>
 #include <xen/irq.h>
 #include <xen/lib.h>
 #include <xen/mem_access.h>
@@ -1385,6 +1386,9 @@ static arm_hypercall_t arm_hypercall_table[] = {
 #ifdef CONFIG_HYPFS
     HYPERCALL(hypfs_op, 5),
 #endif
+#ifdef CONFIG_IOREQ_SERVER
+    HYPERCALL(dm_op, 3),
+#endif
 };
 
 #ifndef NDEBUG
@@ -1956,6 +1960,9 @@ static void do_trap_stage2_abort_guest(struct cpu_user_regs *regs,
             case IO_HANDLED:
                 advance_pc(regs, hsr);
                 return;
+            case IO_RETRY:
+                /* finish later */
+                return;
             case IO_UNHANDLED:
                 /* IO unhandled, try another way to handle it. */
                 break;
@@ -2254,6 +2261,12 @@ static void check_for_vcpu_work(void)
 {
     struct vcpu *v = current;
 
+#ifdef CONFIG_IOREQ_SERVER
+    local_irq_enable();
+    vcpu_ioreq_handle_completion(v);
+    local_irq_disable();
+#endif
+
     if ( likely(!v->arch.need_flush_to_ram) )
         return;
 
diff --git a/xen/include/asm-arm/domain.h b/xen/include/asm-arm/domain.h
index 6819a3b..c235e5b 100644
--- a/xen/include/asm-arm/domain.h
+++ b/xen/include/asm-arm/domain.h
@@ -10,6 +10,7 @@
 #include <asm/gic.h>
 #include <asm/vgic.h>
 #include <asm/vpl011.h>
+#include <public/hvm/dm_op.h>
 #include <public/hvm/params.h>
 
 struct hvm_domain
@@ -262,6 +263,8 @@ static inline void arch_vcpu_block(struct vcpu *v) {}
 
 #define arch_vm_assist_valid_mask(d) (1UL << VMASST_TYPE_runstate_update_flag)
 
+#define has_vpci(d)    ({ (void)(d); false; })
+
 #endif /* __ASM_DOMAIN_H__ */
 
 /*
diff --git a/xen/include/asm-arm/hvm/ioreq.h b/xen/include/asm-arm/hvm/ioreq.h
new file mode 100644
index 0000000..2bffc7a
--- /dev/null
+++ b/xen/include/asm-arm/hvm/ioreq.h
@@ -0,0 +1,139 @@
+/*
+ * hvm.h: Hardware virtual machine assist interface definitions.
+ *
+ * Copyright (c) 2016 Citrix Systems Inc.
+ * Copyright (c) 2019 Arm ltd.
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms and conditions of the GNU General Public License,
+ * version 2, as published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope it will be useful, but WITHOUT
+ * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
+ * FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License for
+ * more details.
+ *
+ * You should have received a copy of the GNU General Public License along with
+ * this program; If not, see <http://www.gnu.org/licenses/>.
+ */
+
+#ifndef __ASM_ARM_HVM_IOREQ_H__
+#define __ASM_ARM_HVM_IOREQ_H__
+
+#include <xen/ioreq.h>
+
+#ifdef CONFIG_IOREQ_SERVER
+enum io_state handle_ioserv(struct cpu_user_regs *regs, struct vcpu *v);
+enum io_state try_fwd_ioserv(struct cpu_user_regs *regs,
+                             struct vcpu *v, mmio_info_t *info);
+#else
+static inline enum io_state handle_ioserv(struct cpu_user_regs *regs,
+                                          struct vcpu *v)
+{
+    return IO_UNHANDLED;
+}
+
+static inline enum io_state try_fwd_ioserv(struct cpu_user_regs *regs,
+                                           struct vcpu *v, mmio_info_t *info)
+{
+    return IO_UNHANDLED;
+}
+#endif
+
+bool ioreq_complete_mmio(void);
+
+static inline bool handle_pio(uint16_t port, unsigned int size, int dir)
+{
+    /*
+     * TODO: For Arm64, the main user will be PCI. So this should be
+     * implemented when we add support for vPCI.
+     */
+    ASSERT_UNREACHABLE();
+    return true;
+}
+
+static inline void msix_write_completion(struct vcpu *v)
+{
+}
+
+static inline bool arch_vcpu_ioreq_completion(enum io_completion io_completion)
+{
+    ASSERT_UNREACHABLE();
+    return true;
+}
+
+/*
+ * The "legacy" mechanism of mapping magic pages for the IOREQ servers
+ * is x86 specific, so the following hooks don't need to be implemented on Arm:
+ * - arch_ioreq_server_map_pages
+ * - arch_ioreq_server_unmap_pages
+ * - arch_ioreq_server_enable
+ * - arch_ioreq_server_disable
+ */
+static inline int arch_ioreq_server_map_pages(struct ioreq_server *s)
+{
+    return -EOPNOTSUPP;
+}
+
+static inline void arch_ioreq_server_unmap_pages(struct ioreq_server *s)
+{
+}
+
+static inline void arch_ioreq_server_enable(struct ioreq_server *s)
+{
+}
+
+static inline void arch_ioreq_server_disable(struct ioreq_server *s)
+{
+}
+
+static inline void arch_ioreq_server_destroy(struct ioreq_server *s)
+{
+}
+
+static inline int arch_ioreq_server_map_mem_type(struct domain *d,
+                                                 struct ioreq_server *s,
+                                                 uint32_t flags)
+{
+    return -EOPNOTSUPP;
+}
+
+static inline bool arch_ioreq_server_destroy_all(struct domain *d)
+{
+    return true;
+}
+
+static inline int arch_ioreq_server_get_type_addr(const struct domain *d,
+                                                  const ioreq_t *p,
+                                                  uint8_t *type,
+                                                  uint64_t *addr)
+{
+    if ( p->type != IOREQ_TYPE_COPY && p->type != IOREQ_TYPE_PIO )
+        return -EINVAL;
+
+    *type = (p->type == IOREQ_TYPE_PIO) ?
+             XEN_DMOP_IO_RANGE_PORT : XEN_DMOP_IO_RANGE_MEMORY;
+    *addr = p->addr;
+
+    return 0;
+}
+
+static inline void arch_ioreq_domain_init(struct domain *d)
+{
+}
+
+#define IOREQ_STATUS_HANDLED     IO_HANDLED
+#define IOREQ_STATUS_UNHANDLED   IO_UNHANDLED
+#define IOREQ_STATUS_RETRY       IO_RETRY
+
+#endif /* __ASM_ARM_HVM_IOREQ_H__ */
+
+/*
+ * Local variables:
+ * mode: C
+ * c-file-style: "BSD"
+ * c-basic-offset: 4
+ * tab-width: 4
+ * indent-tabs-mode: nil
+ * End:
+ */
diff --git a/xen/include/asm-arm/mmio.h b/xen/include/asm-arm/mmio.h
index 8dbfb27..7ab873c 100644
--- a/xen/include/asm-arm/mmio.h
+++ b/xen/include/asm-arm/mmio.h
@@ -37,6 +37,7 @@ enum io_state
     IO_ABORT,       /* The IO was handled by the helper and led to an abort. */
     IO_HANDLED,     /* The IO was successfully handled by the helper. */
     IO_UNHANDLED,   /* The IO was not handled by the helper. */
+    IO_RETRY,       /* Retry the emulation for some reason */
 };
 
 typedef int (*mmio_read_t)(struct vcpu *v, mmio_info_t *info,
-- 
2.7.4



^ permalink raw reply related	[flat|nested] 127+ messages in thread

* [PATCH V3 15/23] xen/arm: Stick around in leave_hypervisor_to_guest until I/O has completed
  2020-11-30 10:31 Oleksandr Tyshchenko
                   ` (13 preceding siblings ...)
  2020-11-30 10:31 ` [PATCH V3 14/23] arm/ioreq: Introduce arch specific bits for IOREQ/DM features Oleksandr Tyshchenko
@ 2020-11-30 10:31 ` Oleksandr Tyshchenko
  2020-11-30 20:51   ` Volodymyr Babchuk
  2020-12-09 23:18   ` Stefano Stabellini
  2020-11-30 10:31 ` [PATCH V3 16/23] xen/mm: Handle properly reference in set_foreign_p2m_entry() on Arm Oleksandr Tyshchenko
                   ` (9 subsequent siblings)
  24 siblings, 2 replies; 127+ messages in thread
From: Oleksandr Tyshchenko @ 2020-11-30 10:31 UTC (permalink / raw)
  To: xen-devel
  Cc: Oleksandr Tyshchenko, Stefano Stabellini, Julien Grall,
	Volodymyr Babchuk, Julien Grall

From: Oleksandr Tyshchenko <oleksandr_tyshchenko@epam.com>

This patch adds proper handling of the return value of
vcpu_ioreq_handle_completion(), which involves using a loop
in leave_hypervisor_to_guest().

The reason to use an unbounded loop here is the fact that a vCPU
shouldn't continue until its I/O has completed. In Xen's case, if an I/O
never completes then it most likely means that something went horribly
wrong with the Device Emulator, and it is most likely not safe to
continue. So letting the vCPU spin forever if the I/O never completes
is safer than letting it continue and leaving the guest in an unclear
state, and is the best we can do for now.

This wouldn't be an issue for Xen itself, as do_softirq() would be
called on every iteration of the loop. In case of failure, the guest
will crash and the vCPU will be unscheduled.

Signed-off-by: Oleksandr Tyshchenko <oleksandr_tyshchenko@epam.com>
CC: Julien Grall <julien.grall@arm.com>

---
Please note, this is a split/cleanup/hardening of Julien's PoC:
"Add support for Guest IO forwarding to a device emulator"

Changes V1 -> V2:
   - new patch, changes were derived from (+ new explanation):
     arm/ioreq: Introduce arch specific bits for IOREQ/DM features

Changes V2 -> V3:
   - update patch description
---
---
 xen/arch/arm/traps.c | 31 ++++++++++++++++++++++++++-----
 1 file changed, 26 insertions(+), 5 deletions(-)

diff --git a/xen/arch/arm/traps.c b/xen/arch/arm/traps.c
index 036b13f..4cef43e 100644
--- a/xen/arch/arm/traps.c
+++ b/xen/arch/arm/traps.c
@@ -2257,18 +2257,23 @@ static void check_for_pcpu_work(void)
  * Process pending work for the vCPU. Any call should be fast or
  * implement preemption.
  */
-static void check_for_vcpu_work(void)
+static bool check_for_vcpu_work(void)
 {
     struct vcpu *v = current;
 
 #ifdef CONFIG_IOREQ_SERVER
+    bool handled;
+
     local_irq_enable();
-    vcpu_ioreq_handle_completion(v);
+    handled = vcpu_ioreq_handle_completion(v);
     local_irq_disable();
+
+    if ( !handled )
+        return true;
 #endif
 
     if ( likely(!v->arch.need_flush_to_ram) )
-        return;
+        return false;
 
     /*
      * Give a chance for the pCPU to process work before handling the vCPU
@@ -2279,6 +2284,8 @@ static void check_for_vcpu_work(void)
     local_irq_enable();
     p2m_flush_vm(v);
     local_irq_disable();
+
+    return false;
 }
 
 /*
@@ -2291,8 +2298,22 @@ void leave_hypervisor_to_guest(void)
 {
     local_irq_disable();
 
-    check_for_vcpu_work();
-    check_for_pcpu_work();
+    /*
+     * The reason for using an unbounded loop here is that the vCPU
+     * shouldn't continue until the I/O has completed. In Xen's case, if an
+     * I/O never completes then it most likely means that something went
+     * horribly wrong with the Device Emulator, and it is most likely not
+     * safe to continue. So letting the vCPU spin forever if the I/O never
+     * completes is safer than letting it continue with the guest in an
+     * unclear state, and is the best we can do for now.
+     *
+     * This wouldn't be an issue for Xen itself, as do_softirq() would be
+     * called on every iteration. In case of failure, the guest will crash
+     * and the vCPU will be unscheduled.
+     */
+    do {
+        check_for_pcpu_work();
+    } while ( check_for_vcpu_work() );
 
     vgic_sync_to_lrs();
 
-- 
2.7.4



^ permalink raw reply related	[flat|nested] 127+ messages in thread

* [PATCH V3 16/23] xen/mm: Handle properly reference in set_foreign_p2m_entry() on Arm
  2020-11-30 10:31 Oleksandr Tyshchenko
                   ` (14 preceding siblings ...)
  2020-11-30 10:31 ` [PATCH V3 15/23] xen/arm: Stick around in leave_hypervisor_to_guest until I/O has completed Oleksandr Tyshchenko
@ 2020-11-30 10:31 ` Oleksandr Tyshchenko
  2020-12-08 14:24   ` Jan Beulich
                     ` (2 more replies)
  2020-11-30 10:31 ` [PATCH V3 17/23] xen/ioreq: Introduce domain_has_ioreq_server() Oleksandr Tyshchenko
                   ` (8 subsequent siblings)
  24 siblings, 3 replies; 127+ messages in thread
From: Oleksandr Tyshchenko @ 2020-11-30 10:31 UTC (permalink / raw)
  To: xen-devel
  Cc: Oleksandr Tyshchenko, Stefano Stabellini, Julien Grall,
	Volodymyr Babchuk, Andrew Cooper, George Dunlap, Ian Jackson,
	Jan Beulich, Wei Liu, Roger Pau Monné,
	Julien Grall

From: Oleksandr Tyshchenko <oleksandr_tyshchenko@epam.com>

This patch implements reference counting of foreign entries in
set_foreign_p2m_entry() on Arm. This is mandatory if we want to run
an emulator (IOREQ server) in a domain other than dom0, as we can't
trust it to do the right thing if it is not running in dom0. So we
need to grab a reference on the page to prevent it from disappearing.

It is valid to always pass the "p2m_map_foreign_rw" type to
guest_physmap_add_entry() since the current and foreign domains are
always different; the case where they are equal is rejected by
rcu_lock_remote_domain_by_id(). Besides the corresponding comment in
the code, a respective ASSERT() is added to catch incorrect usage in
the future.

This was tested with the IOREQ feature to confirm that all the pages given
to this function belong to a domain, so we can use the same approach
as for XENMAPSPACE_gmfn_foreign handling in xenmem_add_to_physmap_one().

This involves adding an extra parameter for the foreign domain to
set_foreign_p2m_entry() and a helper to indicate whether the arch
supports reference counting of foreign entries, so that the restriction
for the hardware domain in the common code can be skipped for it.
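
For context, a minimal sketch (not part of this patch) of what the relaxed
restriction enables: a device emulator running in a domain other than the
hardware domain can acquire a guest's IOREQ server pages via the
resource-mapping interface. The server id, frame index and include paths
are illustrative and assume the installed Xen public/tools headers.

#include <sys/mman.h>
#include <xenforeignmemory.h>
#include <xen/memory.h>

/* Map the synchronous ioreq page of guest 'domid' for IOREQ server 'id'. */
static void *map_ioreq_page(xenforeignmemory_handle *fmem, domid_t domid,
                            unsigned int id)
{
    void *addr = NULL;
    xenforeignmemory_resource_handle *res;

    res = xenforeignmemory_map_resource(fmem, domid,
                                        XENMEM_resource_ioreq_server, id,
                                        XENMEM_resource_ioreq_server_frame_ioreq(0),
                                        1 /* nr_frames */, &addr,
                                        PROT_READ | PROT_WRITE, 0);

    return res ? addr : NULL;
}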

Signed-off-by: Oleksandr Tyshchenko <oleksandr_tyshchenko@epam.com>
CC: Julien Grall <julien.grall@arm.com>

---
Please note, this is a split/cleanup/hardening of Julien's PoC:
"Add support for Guest IO forwarding to a device emulator"

Changes RFC -> V1:
   - new patch, was split from:
     "[RFC PATCH V1 04/12] xen/arm: Introduce arch specific bits for IOREQ/DM features"
   - rewrite a logic to handle properly reference in set_foreign_p2m_entry()
     instead of treating foreign entries as p2m_ram_rw

Changes V1 -> V2:
   - rebase according to the recent changes to acquire_resource()
   - update patch description
   - introduce arch_refcounts_p2m()
   - add an explanation why p2m_map_foreign_rw is valid
   - move set_foreign_p2m_entry() to p2m-common.h
   - add const to new parameter

Changes V2 -> V3:
   - update patch description
   - rename arch_refcounts_p2m() to arch_acquire_resource_check()
   - move comment to x86’s arch_acquire_resource_check()
   - return rc in Arm's set_foreign_p2m_entry()
   - put a respective ASSERT() into Arm's set_foreign_p2m_entry()
---
---
 xen/arch/arm/p2m.c           | 24 ++++++++++++++++++++++++
 xen/arch/x86/mm/p2m.c        |  5 +++--
 xen/common/memory.c          | 10 +++-------
 xen/include/asm-arm/p2m.h    | 19 +++++++++----------
 xen/include/asm-x86/p2m.h    | 16 +++++++++++++---
 xen/include/xen/p2m-common.h |  4 ++++
 6 files changed, 56 insertions(+), 22 deletions(-)

diff --git a/xen/arch/arm/p2m.c b/xen/arch/arm/p2m.c
index 4eeb867..5b8d494 100644
--- a/xen/arch/arm/p2m.c
+++ b/xen/arch/arm/p2m.c
@@ -1380,6 +1380,30 @@ int guest_physmap_remove_page(struct domain *d, gfn_t gfn, mfn_t mfn,
     return p2m_remove_mapping(d, gfn, (1 << page_order), mfn);
 }
 
+int set_foreign_p2m_entry(struct domain *d, const struct domain *fd,
+                          unsigned long gfn, mfn_t mfn)
+{
+    struct page_info *page = mfn_to_page(mfn);
+    int rc;
+
+    if ( !get_page(page, fd) )
+        return -EINVAL;
+
+    /*
+     * It is valid to always use p2m_map_foreign_rw here because, if this
+     * gets called, then d != fd. The case where d == fd would have been
+     * rejected by rcu_lock_remote_domain_by_id() earlier. Put a respective
+     * ASSERT() to catch incorrect usage in the future.
+     */
+    ASSERT(d != fd);
+
+    rc = guest_physmap_add_entry(d, _gfn(gfn), mfn, 0, p2m_map_foreign_rw);
+    if ( rc )
+        put_page(page);
+
+    return rc;
+}
+
 static struct page_info *p2m_allocate_root(void)
 {
     struct page_info *page;
diff --git a/xen/arch/x86/mm/p2m.c b/xen/arch/x86/mm/p2m.c
index 7a2ba82..4772c86 100644
--- a/xen/arch/x86/mm/p2m.c
+++ b/xen/arch/x86/mm/p2m.c
@@ -1321,7 +1321,8 @@ static int set_typed_p2m_entry(struct domain *d, unsigned long gfn_l,
 }
 
 /* Set foreign mfn in the given guest's p2m table. */
-int set_foreign_p2m_entry(struct domain *d, unsigned long gfn, mfn_t mfn)
+int set_foreign_p2m_entry(struct domain *d, const struct domain *fd,
+                          unsigned long gfn, mfn_t mfn)
 {
     return set_typed_p2m_entry(d, gfn, mfn, PAGE_ORDER_4K, p2m_map_foreign,
                                p2m_get_hostp2m(d)->default_access);
@@ -2621,7 +2622,7 @@ int p2m_add_foreign(struct domain *tdom, unsigned long fgfn,
      * will update the m2p table which will result in  mfn -> gpfn of dom0
      * and not fgfn of domU.
      */
-    rc = set_foreign_p2m_entry(tdom, gpfn, mfn);
+    rc = set_foreign_p2m_entry(tdom, fdom, gpfn, mfn);
     if ( rc )
         gdprintk(XENLOG_WARNING, "set_foreign_p2m_entry failed. "
                  "gpfn:%lx mfn:%lx fgfn:%lx td:%d fd:%d\n",
diff --git a/xen/common/memory.c b/xen/common/memory.c
index 3363c06..49e3001 100644
--- a/xen/common/memory.c
+++ b/xen/common/memory.c
@@ -1134,12 +1134,8 @@ static int acquire_resource(
     xen_pfn_t mfn_list[32];
     int rc;
 
-    /*
-     * FIXME: Until foreign pages inserted into the P2M are properly
-     *        reference counted, it is unsafe to allow mapping of
-     *        resource pages unless the caller is the hardware domain.
-     */
-    if ( paging_mode_translate(currd) && !is_hardware_domain(currd) )
+    if ( paging_mode_translate(currd) && !is_hardware_domain(currd) &&
+         !arch_acquire_resource_check() )
         return -EACCES;
 
     if ( copy_from_guest(&xmar, arg, 1) )
@@ -1207,7 +1203,7 @@ static int acquire_resource(
 
         for ( i = 0; !rc && i < xmar.nr_frames; i++ )
         {
-            rc = set_foreign_p2m_entry(currd, gfn_list[i],
+            rc = set_foreign_p2m_entry(currd, d, gfn_list[i],
                                        _mfn(mfn_list[i]));
             /* rc should be -EIO for any iteration other than the first */
             if ( rc && i )
diff --git a/xen/include/asm-arm/p2m.h b/xen/include/asm-arm/p2m.h
index 28ca9a8..4f8056e 100644
--- a/xen/include/asm-arm/p2m.h
+++ b/xen/include/asm-arm/p2m.h
@@ -161,6 +161,15 @@ typedef enum {
 #endif
 #include <xen/p2m-common.h>
 
+static inline bool arch_acquire_resource_check(void)
+{
+    /*
+     * The reference counting of foreign entries in set_foreign_p2m_entry()
+     * is supported on Arm.
+     */
+    return true;
+}
+
 static inline
 void p2m_altp2m_check(struct vcpu *v, uint16_t idx)
 {
@@ -392,16 +401,6 @@ static inline gfn_t gfn_next_boundary(gfn_t gfn, unsigned int order)
     return gfn_add(gfn, 1UL << order);
 }
 
-static inline int set_foreign_p2m_entry(struct domain *d, unsigned long gfn,
-                                        mfn_t mfn)
-{
-    /*
-     * NOTE: If this is implemented then proper reference counting of
-     *       foreign entries will need to be implemented.
-     */
-    return -EOPNOTSUPP;
-}
-
 /*
  * A vCPU has cache enabled only when the MMU is enabled and data cache
  * is enabled.
diff --git a/xen/include/asm-x86/p2m.h b/xen/include/asm-x86/p2m.h
index 4603560..8d2dc22 100644
--- a/xen/include/asm-x86/p2m.h
+++ b/xen/include/asm-x86/p2m.h
@@ -382,6 +382,19 @@ struct p2m_domain {
 #endif
 #include <xen/p2m-common.h>
 
+static inline bool arch_acquire_resource_check(void)
+{
+    /*
+     * The reference counting of foreign entries in set_foreign_p2m_entry()
+     * is not supported on x86.
+     *
+     * FIXME: Until foreign pages inserted into the P2M are properly
+     * reference counted, it is unsafe to allow mapping of
+     * resource pages unless the caller is the hardware domain.
+     */
+    return false;
+}
+
 /*
  * Updates vCPU's n2pm to match its np2m_base in VMCx12 and returns that np2m.
  */
@@ -647,9 +660,6 @@ int p2m_finish_type_change(struct domain *d,
 int p2m_is_logdirty_range(struct p2m_domain *, unsigned long start,
                           unsigned long end);
 
-/* Set foreign entry in the p2m table (for priv-mapping) */
-int set_foreign_p2m_entry(struct domain *d, unsigned long gfn, mfn_t mfn);
-
 /* Set mmio addresses in the p2m table (for pass-through) */
 int set_mmio_p2m_entry(struct domain *d, gfn_t gfn, mfn_t mfn,
                        unsigned int order);
diff --git a/xen/include/xen/p2m-common.h b/xen/include/xen/p2m-common.h
index 58031a6..b4bc709 100644
--- a/xen/include/xen/p2m-common.h
+++ b/xen/include/xen/p2m-common.h
@@ -3,6 +3,10 @@
 
 #include <xen/mm.h>
 
+/* Set foreign entry in the p2m table */
+int set_foreign_p2m_entry(struct domain *d, const struct domain *fd,
+                          unsigned long gfn, mfn_t mfn);
+
 /* Remove a page from a domain's p2m table */
 int __must_check
 guest_physmap_remove_page(struct domain *d, gfn_t gfn, mfn_t mfn,
-- 
2.7.4



^ permalink raw reply related	[flat|nested] 127+ messages in thread

* [PATCH V3 17/23] xen/ioreq: Introduce domain_has_ioreq_server()
  2020-11-30 10:31 Oleksandr Tyshchenko
                   ` (15 preceding siblings ...)
  2020-11-30 10:31 ` [PATCH V3 16/23] xen/mm: Handle properly reference in set_foreign_p2m_entry() on Arm Oleksandr Tyshchenko
@ 2020-11-30 10:31 ` Oleksandr Tyshchenko
  2020-12-08 15:11   ` Jan Beulich
  2020-11-30 10:31 ` [PATCH V3 18/23] xen/dm: Introduce xendevicemodel_set_irq_level DM op Oleksandr Tyshchenko
                   ` (7 subsequent siblings)
  24 siblings, 1 reply; 127+ messages in thread
From: Oleksandr Tyshchenko @ 2020-11-30 10:31 UTC (permalink / raw)
  To: xen-devel
  Cc: Oleksandr Tyshchenko, Stefano Stabellini, Julien Grall,
	Volodymyr Babchuk, Andrew Cooper, George Dunlap, Ian Jackson,
	Jan Beulich, Wei Liu, Paul Durrant, Julien Grall

From: Oleksandr Tyshchenko <oleksandr_tyshchenko@epam.com>

This patch introduces a helper whose main purpose is to check
whether a domain is using IOREQ server(s).

On Arm the current benefit is to avoid calling vcpu_ioreq_handle_completion()
(which implies iterating over all possible IOREQ servers anyway)
on every return in leave_hypervisor_to_guest() if there are no active
servers for the particular domain.
This helper will also be used by one of the subsequent patches on Arm.

This involves adding an extra per-domain variable to store the count
of servers in use.

Signed-off-by: Oleksandr Tyshchenko <oleksandr_tyshchenko@epam.com>
CC: Julien Grall <julien.grall@arm.com>

---
Please note, this is a split/cleanup/hardening of Julien's PoC:
"Add support for Guest IO forwarding to a device emulator"

Changes RFC -> V1:
   - new patch

Changes V1 -> V2:
   - update patch description
   - guard helper with CONFIG_IOREQ_SERVER
   - remove "hvm" prefix
   - modify helper to just return d->arch.hvm.ioreq_server.nr_servers
   - put suitable ASSERT()s
   - use ASSERT(d->ioreq_server.server[id] ? !s : !!s) in set_ioreq_server()
   - remove d->ioreq_server.nr_servers = 0 from hvm_ioreq_init()

Changes V2 -> V3:
   - update patch description
   - remove ASSERT()s from the helper, add a comment
   - use #ifdef CONFIG_IOREQ_SERVER inside function body
   - use new ASSERT() construction in set_ioreq_server()
---
---
 xen/arch/arm/traps.c    | 15 +++++++++------
 xen/common/ioreq.c      |  7 ++++++-
 xen/include/xen/ioreq.h | 14 ++++++++++++++
 xen/include/xen/sched.h |  1 +
 4 files changed, 30 insertions(+), 7 deletions(-)

diff --git a/xen/arch/arm/traps.c b/xen/arch/arm/traps.c
index 4cef43e..b6077d2 100644
--- a/xen/arch/arm/traps.c
+++ b/xen/arch/arm/traps.c
@@ -2262,14 +2262,17 @@ static bool check_for_vcpu_work(void)
     struct vcpu *v = current;
 
 #ifdef CONFIG_IOREQ_SERVER
-    bool handled;
+    if ( domain_has_ioreq_server(v->domain) )
+    {
+        bool handled;
 
-    local_irq_enable();
-    handled = vcpu_ioreq_handle_completion(v);
-    local_irq_disable();
+        local_irq_enable();
+        handled = vcpu_ioreq_handle_completion(v);
+        local_irq_disable();
 
-    if ( !handled )
-        return true;
+        if ( !handled )
+            return true;
+    }
 #endif
 
     if ( likely(!v->arch.need_flush_to_ram) )
diff --git a/xen/common/ioreq.c b/xen/common/ioreq.c
index 4855dd8..f35dcf9 100644
--- a/xen/common/ioreq.c
+++ b/xen/common/ioreq.c
@@ -39,9 +39,14 @@ static void set_ioreq_server(struct domain *d, unsigned int id,
                              struct ioreq_server *s)
 {
     ASSERT(id < MAX_NR_IOREQ_SERVERS);
-    ASSERT(!s || !d->ioreq_server.server[id]);
+    ASSERT(!s ^ !d->ioreq_server.server[id]);
 
     d->ioreq_server.server[id] = s;
+
+    if ( s )
+        d->ioreq_server.nr_servers++;
+    else
+        d->ioreq_server.nr_servers--;
 }
 
 #define GET_IOREQ_SERVER(d, id) \
diff --git a/xen/include/xen/ioreq.h b/xen/include/xen/ioreq.h
index 02ff998..2289e79 100644
--- a/xen/include/xen/ioreq.h
+++ b/xen/include/xen/ioreq.h
@@ -55,6 +55,20 @@ struct ioreq_server {
     uint8_t                bufioreq_handling;
 };
 
+/*
+ * This should only be used when d == current->domain and it's not paused,
+ * or when they're distinct and d is paused. Otherwise the result is
+ * stale before the caller can inspect it.
+ */
+static inline bool domain_has_ioreq_server(const struct domain *d)
+{
+#ifdef CONFIG_IOREQ_SERVER
+    return d->ioreq_server.nr_servers;
+#else
+    return false;
+#endif
+}
+
 static inline paddr_t ioreq_mmio_first_byte(const ioreq_t *p)
 {
     return unlikely(p->df) ?
diff --git a/xen/include/xen/sched.h b/xen/include/xen/sched.h
index 8269f84..2277995 100644
--- a/xen/include/xen/sched.h
+++ b/xen/include/xen/sched.h
@@ -550,6 +550,7 @@ struct domain
     struct {
         spinlock_t              lock;
         struct ioreq_server     *server[MAX_NR_IOREQ_SERVERS];
+        unsigned int            nr_servers;
     } ioreq_server;
 #endif
 };
-- 
2.7.4



^ permalink raw reply related	[flat|nested] 127+ messages in thread

* [PATCH V3 18/23] xen/dm: Introduce xendevicemodel_set_irq_level DM op
  2020-11-30 10:31 Oleksandr Tyshchenko
                   ` (16 preceding siblings ...)
  2020-11-30 10:31 ` [PATCH V3 17/23] xen/ioreq: Introduce domain_has_ioreq_server() Oleksandr Tyshchenko
@ 2020-11-30 10:31 ` Oleksandr Tyshchenko
  2020-12-10  2:21   ` Stefano Stabellini
  2020-11-30 10:31 ` [PATCH V3 19/23] xen/arm: io: Abstract sign-extension Oleksandr Tyshchenko
                   ` (6 subsequent siblings)
  24 siblings, 1 reply; 127+ messages in thread
From: Oleksandr Tyshchenko @ 2020-11-30 10:31 UTC (permalink / raw)
  To: xen-devel
  Cc: Julien Grall, Ian Jackson, Wei Liu, Andrew Cooper, George Dunlap,
	Jan Beulich, Julien Grall, Stefano Stabellini, Volodymyr Babchuk,
	Oleksandr Tyshchenko

From: Julien Grall <julien.grall@arm.com>

This patch adds the ability for the device emulator to notify the
other end (some entity running in the guest) using an SPI, and
implements the Arm-specific bits for it. The proposed interface
allows the emulator to set the logical level of one of a domain's
IRQ lines.

We can't reuse the existing DM op (xen_dm_op_set_isa_irq_level)
to inject an interrupt as the "isa_irq" field is only 8 bits wide and
can only cover IRQs 0 - 255, whereas we need a wider range (0 - 1020).
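
For illustration, a minimal sketch of how an out-of-QEMU emulator might
drive the new op from userspace (the domid and SPI number are examples,
and error handling is trimmed):

#include <xendevicemodel.h>

/* Pulse one of the guest's SPIs: assert, then deassert the line. */
static int pulse_guest_spi(domid_t domid, unsigned int spi)
{
    xendevicemodel_handle *dmod = xendevicemodel_open(NULL, 0);
    int rc;

    if ( !dmod )
        return -1;

    rc = xendevicemodel_set_irq_level(dmod, domid, spi, 1);
    if ( !rc )
        rc = xendevicemodel_set_irq_level(dmod, domid, spi, 0);

    xendevicemodel_close(dmod);

    return rc;
}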

Signed-off-by: Julien Grall <julien.grall@arm.com>
Signed-off-by: Oleksandr Tyshchenko <oleksandr_tyshchenko@epam.com>

---
Please note, this is a split/cleanup/hardening of Julien's PoC:
"Add support for Guest IO forwarding to a device emulator"

***
Please note, I left the interface untouched since there is still an
open discussion about what interface to use / what information to pass
to the hypervisor, in particular whether we should abstract away the
state of the line or not.
***

Changes RFC -> V1:
   - check incoming parameters in arch_dm_op()
   - add explicit padding to struct xen_dm_op_set_irq_level

Changes V1 -> V2:
   - update the author of a patch
   - update patch description
   - check that padding is always 0
   - mention that interface is Arm only and only SPIs are
     supported for now
   - allow to set the logical level of a line for non-allocated
     interrupts only
   - add xen_dm_op_set_irq_level_t

Changes V2 -> V3:
   - no changes
---
---
 tools/include/xendevicemodel.h               |  4 ++
 tools/libs/devicemodel/core.c                | 18 +++++++++
 tools/libs/devicemodel/libxendevicemodel.map |  1 +
 xen/arch/arm/dm.c                            | 57 +++++++++++++++++++++++++++-
 xen/common/dm.c                              |  1 +
 xen/include/public/hvm/dm_op.h               | 16 ++++++++
 6 files changed, 96 insertions(+), 1 deletion(-)

diff --git a/tools/include/xendevicemodel.h b/tools/include/xendevicemodel.h
index e877f5c..c06b3c8 100644
--- a/tools/include/xendevicemodel.h
+++ b/tools/include/xendevicemodel.h
@@ -209,6 +209,10 @@ int xendevicemodel_set_isa_irq_level(
     xendevicemodel_handle *dmod, domid_t domid, uint8_t irq,
     unsigned int level);
 
+int xendevicemodel_set_irq_level(
+    xendevicemodel_handle *dmod, domid_t domid, uint32_t irq,
+    unsigned int level);
+
 /**
  * This function maps a PCI INTx line to a an IRQ line.
  *
diff --git a/tools/libs/devicemodel/core.c b/tools/libs/devicemodel/core.c
index 4d40639..30bd79f 100644
--- a/tools/libs/devicemodel/core.c
+++ b/tools/libs/devicemodel/core.c
@@ -430,6 +430,24 @@ int xendevicemodel_set_isa_irq_level(
     return xendevicemodel_op(dmod, domid, 1, &op, sizeof(op));
 }
 
+int xendevicemodel_set_irq_level(
+    xendevicemodel_handle *dmod, domid_t domid, uint32_t irq,
+    unsigned int level)
+{
+    struct xen_dm_op op;
+    struct xen_dm_op_set_irq_level *data;
+
+    memset(&op, 0, sizeof(op));
+
+    op.op = XEN_DMOP_set_irq_level;
+    data = &op.u.set_irq_level;
+
+    data->irq = irq;
+    data->level = level;
+
+    return xendevicemodel_op(dmod, domid, 1, &op, sizeof(op));
+}
+
 int xendevicemodel_set_pci_link_route(
     xendevicemodel_handle *dmod, domid_t domid, uint8_t link, uint8_t irq)
 {
diff --git a/tools/libs/devicemodel/libxendevicemodel.map b/tools/libs/devicemodel/libxendevicemodel.map
index 561c62d..a0c3012 100644
--- a/tools/libs/devicemodel/libxendevicemodel.map
+++ b/tools/libs/devicemodel/libxendevicemodel.map
@@ -32,6 +32,7 @@ VERS_1.2 {
 	global:
 		xendevicemodel_relocate_memory;
 		xendevicemodel_pin_memory_cacheattr;
+		xendevicemodel_set_irq_level;
 } VERS_1.1;
 
 VERS_1.3 {
diff --git a/xen/arch/arm/dm.c b/xen/arch/arm/dm.c
index 5d3da37..e4bb233 100644
--- a/xen/arch/arm/dm.c
+++ b/xen/arch/arm/dm.c
@@ -17,10 +17,65 @@
 #include <xen/dm.h>
 #include <xen/hypercall.h>
 
+#include <asm/vgic.h>
+
 int arch_dm_op(struct xen_dm_op *op, struct domain *d,
                const struct dmop_args *op_args, bool *const_op)
 {
-    return -EOPNOTSUPP;
+    int rc;
+
+    switch ( op->op )
+    {
+    case XEN_DMOP_set_irq_level:
+    {
+        const struct xen_dm_op_set_irq_level *data =
+            &op->u.set_irq_level;
+        unsigned int i;
+
+        /* Only SPIs are supported */
+        if ( (data->irq < NR_LOCAL_IRQS) || (data->irq >= vgic_num_irqs(d)) )
+        {
+            rc = -EINVAL;
+            break;
+        }
+
+        if ( data->level != 0 && data->level != 1 )
+        {
+            rc = -EINVAL;
+            break;
+        }
+
+        /* Check that padding is always 0 */
+        for ( i = 0; i < sizeof(data->pad); i++ )
+            if ( data->pad[i] )
+                break;
+        if ( i != sizeof(data->pad) )
+        {
+            rc = -EINVAL;
+            break;
+        }
+
+        /*
+         * Allow setting the logical level of a line for non-allocated
+         * interrupts only.
+         */
+        if ( test_bit(data->irq, d->arch.vgic.allocated_irqs) )
+        {
+            rc = -EINVAL;
+            break;
+        }
+
+        vgic_inject_irq(d, NULL, data->irq, data->level);
+        rc = 0;
+        break;
+    }
+
+    default:
+        rc = -EOPNOTSUPP;
+        break;
+    }
+
+    return rc;
 }
 
 /*
diff --git a/xen/common/dm.c b/xen/common/dm.c
index 9d394fc..7bfb46c 100644
--- a/xen/common/dm.c
+++ b/xen/common/dm.c
@@ -48,6 +48,7 @@ static int dm_op(const struct dmop_args *op_args)
         [XEN_DMOP_remote_shutdown]                  = sizeof(struct xen_dm_op_remote_shutdown),
         [XEN_DMOP_relocate_memory]                  = sizeof(struct xen_dm_op_relocate_memory),
         [XEN_DMOP_pin_memory_cacheattr]             = sizeof(struct xen_dm_op_pin_memory_cacheattr),
+        [XEN_DMOP_set_irq_level]                    = sizeof(struct xen_dm_op_set_irq_level),
     };
 
     rc = rcu_lock_remote_domain_by_id(op_args->domid, &d);
diff --git a/xen/include/public/hvm/dm_op.h b/xen/include/public/hvm/dm_op.h
index 66cae1a..1f70d58 100644
--- a/xen/include/public/hvm/dm_op.h
+++ b/xen/include/public/hvm/dm_op.h
@@ -434,6 +434,21 @@ struct xen_dm_op_pin_memory_cacheattr {
 };
 typedef struct xen_dm_op_pin_memory_cacheattr xen_dm_op_pin_memory_cacheattr_t;
 
+/*
+ * XEN_DMOP_set_irq_level: Set the logical level of one of a domain's
+ *                         IRQ lines (currently Arm only).
+ * Only SPIs are supported.
+ */
+#define XEN_DMOP_set_irq_level 19
+
+struct xen_dm_op_set_irq_level {
+    uint32_t irq;
+    /* IN - Level: 0 -> deasserted, 1 -> asserted */
+    uint8_t level;
+    uint8_t pad[3];
+};
+typedef struct xen_dm_op_set_irq_level xen_dm_op_set_irq_level_t;
+
 struct xen_dm_op {
     uint32_t op;
     uint32_t pad;
@@ -447,6 +462,7 @@ struct xen_dm_op {
         xen_dm_op_track_dirty_vram_t track_dirty_vram;
         xen_dm_op_set_pci_intx_level_t set_pci_intx_level;
         xen_dm_op_set_isa_irq_level_t set_isa_irq_level;
+        xen_dm_op_set_irq_level_t set_irq_level;
         xen_dm_op_set_pci_link_route_t set_pci_link_route;
         xen_dm_op_modified_memory_t modified_memory;
         xen_dm_op_set_mem_type_t set_mem_type;
-- 
2.7.4



^ permalink raw reply related	[flat|nested] 127+ messages in thread

* [PATCH V3 19/23] xen/arm: io: Abstract sign-extension
  2020-11-30 10:31 Oleksandr Tyshchenko
                   ` (17 preceding siblings ...)
  2020-11-30 10:31 ` [PATCH V3 18/23] xen/dm: Introduce xendevicemodel_set_irq_level DM op Oleksandr Tyshchenko
@ 2020-11-30 10:31 ` Oleksandr Tyshchenko
  2020-11-30 21:03   ` Volodymyr Babchuk
  2020-11-30 10:31 ` [PATCH V3 20/23] xen/ioreq: Make x86's send_invalidate_req() common Oleksandr Tyshchenko
                   ` (5 subsequent siblings)
  24 siblings, 1 reply; 127+ messages in thread
From: Oleksandr Tyshchenko @ 2020-11-30 10:31 UTC (permalink / raw)
  To: xen-devel
  Cc: Oleksandr Tyshchenko, Stefano Stabellini, Julien Grall,
	Volodymyr Babchuk, Julien Grall

From: Oleksandr Tyshchenko <oleksandr_tyshchenko@epam.com>

In order to avoid code duplication (both handle_read() and
handle_ioserv() contain the same code for the sign extension),
move this code into a common helper to be used by both.
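
As a standalone illustration (not part of the patch) of what the helper
computes: a signed 1-byte MMIO read returning 0x80 must have its sign bit
propagated to the full register width.

#include <stdio.h>

int main(void)
{
    unsigned long r = 0x80;  /* value returned by a 1-byte signed read */
    unsigned int size = 8;   /* access size in bits: (1 << dabt.size) * 8 */

    /* Same operation as the helper: widen the sign bit if it is set. */
    if ( r & (1UL << (size - 1)) )
        r |= (~0UL) << size;

    printf("%#lx\n", r);     /* prints 0xffffffffffffff80 on a 64-bit build */

    return 0;
}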

Signed-off-by: Oleksandr Tyshchenko <oleksandr_tyshchenko@epam.com>
CC: Julien Grall <julien.grall@arm.com>

---
Please note, this is a split/cleanup/hardening of Julien's PoC:
"Add support for Guest IO forwarding to a device emulator"

Changes V1 -> V2:
   - new patch

Changes V2 -> V3:
   - no changes
---
---
 xen/arch/arm/io.c           | 18 ++----------------
 xen/arch/arm/ioreq.c        | 17 +----------------
 xen/include/asm-arm/traps.h | 24 ++++++++++++++++++++++++
 3 files changed, 27 insertions(+), 32 deletions(-)

diff --git a/xen/arch/arm/io.c b/xen/arch/arm/io.c
index f44cfd4..8d6ec6c 100644
--- a/xen/arch/arm/io.c
+++ b/xen/arch/arm/io.c
@@ -23,6 +23,7 @@
 #include <asm/cpuerrata.h>
 #include <asm/current.h>
 #include <asm/mmio.h>
+#include <asm/traps.h>
 #include <asm/hvm/ioreq.h>
 
 #include "decode.h"
@@ -39,26 +40,11 @@ static enum io_state handle_read(const struct mmio_handler *handler,
      * setting r).
      */
     register_t r = 0;
-    uint8_t size = (1 << dabt.size) * 8;
 
     if ( !handler->ops->read(v, info, &r, handler->priv) )
         return IO_ABORT;
 
-    /*
-     * Sign extend if required.
-     * Note that we expect the read handler to have zeroed the bits
-     * outside the requested access size.
-     */
-    if ( dabt.sign && (r & (1UL << (size - 1))) )
-    {
-        /*
-         * We are relying on register_t using the same as
-         * an unsigned long in order to keep the 32-bit assembly
-         * code smaller.
-         */
-        BUILD_BUG_ON(sizeof(register_t) != sizeof(unsigned long));
-        r |= (~0UL) << size;
-    }
+    r = sign_extend(dabt, r);
 
     set_user_reg(regs, dabt.reg, r);
 
diff --git a/xen/arch/arm/ioreq.c b/xen/arch/arm/ioreq.c
index f08190c..2f39289 100644
--- a/xen/arch/arm/ioreq.c
+++ b/xen/arch/arm/ioreq.c
@@ -28,7 +28,6 @@ enum io_state handle_ioserv(struct cpu_user_regs *regs, struct vcpu *v)
     const union hsr hsr = { .bits = regs->hsr };
     const struct hsr_dabt dabt = hsr.dabt;
     /* Code is similar to handle_read */
-    uint8_t size = (1 << dabt.size) * 8;
     register_t r = v->io.req.data;
 
     /* We are done with the IO */
@@ -37,21 +36,7 @@ enum io_state handle_ioserv(struct cpu_user_regs *regs, struct vcpu *v)
     if ( dabt.write )
         return IO_HANDLED;
 
-    /*
-     * Sign extend if required.
-     * Note that we expect the read handler to have zeroed the bits
-     * outside the requested access size.
-     */
-    if ( dabt.sign && (r & (1UL << (size - 1))) )
-    {
-        /*
-         * We are relying on register_t using the same as
-         * an unsigned long in order to keep the 32-bit assembly
-         * code smaller.
-         */
-        BUILD_BUG_ON(sizeof(register_t) != sizeof(unsigned long));
-        r |= (~0UL) << size;
-    }
+    r = sign_extend(dabt, r);
 
     set_user_reg(regs, dabt.reg, r);
 
diff --git a/xen/include/asm-arm/traps.h b/xen/include/asm-arm/traps.h
index 997c378..e301c44 100644
--- a/xen/include/asm-arm/traps.h
+++ b/xen/include/asm-arm/traps.h
@@ -83,6 +83,30 @@ static inline bool VABORT_GEN_BY_GUEST(const struct cpu_user_regs *regs)
         (unsigned long)abort_guest_exit_end == regs->pc;
 }
 
+/* Check whether the sign extension is required and perform it */
+static inline register_t sign_extend(const struct hsr_dabt dabt, register_t r)
+{
+    uint8_t size = (1 << dabt.size) * 8;
+
+    /*
+     * Sign extend if required.
+     * Note that we expect the read handler to have zeroed the bits
+     * outside the requested access size.
+     */
+    if ( dabt.sign && (r & (1UL << (size - 1))) )
+    {
+        /*
+         * We are relying on register_t using the same as
+         * an unsigned long in order to keep the 32-bit assembly
+         * code smaller.
+         */
+        BUILD_BUG_ON(sizeof(register_t) != sizeof(unsigned long));
+        r |= (~0UL) << size;
+    }
+
+    return r;
+}
+
 #endif /* __ASM_ARM_TRAPS__ */
 /*
  * Local variables:
-- 
2.7.4



^ permalink raw reply related	[flat|nested] 127+ messages in thread

* [PATCH V3 20/23] xen/ioreq: Make x86's send_invalidate_req() common
  2020-11-30 10:31 Oleksandr Tyshchenko
                   ` (18 preceding siblings ...)
  2020-11-30 10:31 ` [PATCH V3 19/23] xen/arm: io: Abstract sign-extension Oleksandr Tyshchenko
@ 2020-11-30 10:31 ` Oleksandr Tyshchenko
  2020-12-08 15:24   ` Jan Beulich
  2020-11-30 10:31 ` [PATCH V3 21/23] xen/arm: Add mapcache invalidation handling Oleksandr Tyshchenko
                   ` (4 subsequent siblings)
  24 siblings, 1 reply; 127+ messages in thread
From: Oleksandr Tyshchenko @ 2020-11-30 10:31 UTC (permalink / raw)
  To: xen-devel
  Cc: Oleksandr Tyshchenko, Jan Beulich, Andrew Cooper,
	Roger Pau Monné,
	Wei Liu, George Dunlap, Ian Jackson, Julien Grall,
	Stefano Stabellini, Paul Durrant, Julien Grall

From: Oleksandr Tyshchenko <oleksandr_tyshchenko@epam.com>

As IOREQ is a common feature now, and we also need to invalidate
the qemu/demu mapcache on Arm when the required condition occurs,
this patch moves this function to the common code (and renames it
to ioreq_signal_mapcache_invalidate()).
This patch also moves the per-domain qemu_mapcache_invalidate
variable out of the arch sub-struct (and drops the "qemu" prefix).

The subsequent patch will add mapcache invalidation handling on Arm.

Signed-off-by: Oleksandr Tyshchenko <oleksandr_tyshchenko@epam.com>
CC: Julien Grall <julien.grall@arm.com>

---
Please note, this is a split/cleanup/hardening of Julien's PoC:
"Add support for Guest IO forwarding to a device emulator"

Changes RFC -> V1:
   - move send_invalidate_req() to the common code
   - update patch subject/description
   - move qemu_mapcache_invalidate out of the arch sub-struct,
     update checks
   - remove #if defined(CONFIG_ARM64) from the common code

Changes V1 -> V2:
   - was split into:
     - xen/ioreq: Make x86's send_invalidate_req() common
     - xen/arm: Add mapcache invalidation handling
   - update patch description/subject
   - move Arm bits to a separate patch
   - don't alter the common code, the flag is set by arch code
   - rename send_invalidate_req() to send_invalidate_ioreq()
   - guard qemu_mapcache_invalidate with CONFIG_IOREQ_SERVER
   - use bool instead of bool_t
   - remove blank line between head comment and #include-s

Changes V2 -> V3:
   - update patch description
   - drop "qemu" prefix from the variable name
   - rename send_invalidate_req() to ioreq_signal_mapcache_invalidate()
---
---
 xen/arch/x86/hvm/hypercall.c     |  9 +++++----
 xen/arch/x86/hvm/io.c            | 14 --------------
 xen/common/ioreq.c               | 14 ++++++++++++++
 xen/include/asm-x86/hvm/domain.h |  1 -
 xen/include/asm-x86/hvm/io.h     |  1 -
 xen/include/xen/ioreq.h          |  1 +
 xen/include/xen/sched.h          |  2 ++
 7 files changed, 22 insertions(+), 20 deletions(-)

diff --git a/xen/arch/x86/hvm/hypercall.c b/xen/arch/x86/hvm/hypercall.c
index ac573c8..6d41c56 100644
--- a/xen/arch/x86/hvm/hypercall.c
+++ b/xen/arch/x86/hvm/hypercall.c
@@ -20,6 +20,7 @@
  */
 #include <xen/lib.h>
 #include <xen/hypercall.h>
+#include <xen/ioreq.h>
 #include <xen/nospec.h>
 
 #include <asm/hvm/emulate.h>
@@ -47,7 +48,7 @@ static long hvm_memory_op(int cmd, XEN_GUEST_HANDLE_PARAM(void) arg)
         rc = compat_memory_op(cmd, arg);
 
     if ( (cmd & MEMOP_CMD_MASK) == XENMEM_decrease_reservation )
-        curr->domain->arch.hvm.qemu_mapcache_invalidate = true;
+        curr->domain->mapcache_invalidate = true;
 
     return rc;
 }
@@ -326,9 +327,9 @@ int hvm_hypercall(struct cpu_user_regs *regs)
 
     HVM_DBG_LOG(DBG_LEVEL_HCALL, "hcall%lu -> %lx", eax, regs->rax);
 
-    if ( unlikely(currd->arch.hvm.qemu_mapcache_invalidate) &&
-         test_and_clear_bool(currd->arch.hvm.qemu_mapcache_invalidate) )
-        send_invalidate_req();
+    if ( unlikely(currd->mapcache_invalidate) &&
+         test_and_clear_bool(currd->mapcache_invalidate) )
+        ioreq_signal_mapcache_invalidate();
 
     return curr->hcall_preempted ? HVM_HCALL_preempted : HVM_HCALL_completed;
 }
diff --git a/xen/arch/x86/hvm/io.c b/xen/arch/x86/hvm/io.c
index a0dd8d1..ba77414 100644
--- a/xen/arch/x86/hvm/io.c
+++ b/xen/arch/x86/hvm/io.c
@@ -64,20 +64,6 @@ void send_timeoffset_req(unsigned long timeoff)
         gprintk(XENLOG_ERR, "Unsuccessful timeoffset update\n");
 }
 
-/* Ask ioemu mapcache to invalidate mappings. */
-void send_invalidate_req(void)
-{
-    ioreq_t p = {
-        .type = IOREQ_TYPE_INVALIDATE,
-        .size = 4,
-        .dir = IOREQ_WRITE,
-        .data = ~0UL, /* flush all */
-    };
-
-    if ( ioreq_broadcast(&p, false) != 0 )
-        gprintk(XENLOG_ERR, "Unsuccessful map-cache invalidate\n");
-}
-
 bool hvm_emulate_one_insn(hvm_emulate_validate_t *validate, const char *descr)
 {
     struct hvm_emulate_ctxt ctxt;
diff --git a/xen/common/ioreq.c b/xen/common/ioreq.c
index f35dcf9..61ba761 100644
--- a/xen/common/ioreq.c
+++ b/xen/common/ioreq.c
@@ -35,6 +35,20 @@
 #include <public/hvm/ioreq.h>
 #include <public/hvm/params.h>
 
+/* Ask ioemu mapcache to invalidate mappings. */
+void ioreq_signal_mapcache_invalidate(void)
+{
+    ioreq_t p = {
+        .type = IOREQ_TYPE_INVALIDATE,
+        .size = 4,
+        .dir = IOREQ_WRITE,
+        .data = ~0UL, /* flush all */
+    };
+
+    if ( ioreq_broadcast(&p, false) != 0 )
+        gprintk(XENLOG_ERR, "Unsuccessful map-cache invalidate\n");
+}
+
 static void set_ioreq_server(struct domain *d, unsigned int id,
                              struct ioreq_server *s)
 {
diff --git a/xen/include/asm-x86/hvm/domain.h b/xen/include/asm-x86/hvm/domain.h
index b8be1ad..cf959f6 100644
--- a/xen/include/asm-x86/hvm/domain.h
+++ b/xen/include/asm-x86/hvm/domain.h
@@ -122,7 +122,6 @@ struct hvm_domain {
 
     struct viridian_domain *viridian;
 
-    bool_t                 qemu_mapcache_invalidate;
     bool_t                 is_s3_suspended;
 
     /*
diff --git a/xen/include/asm-x86/hvm/io.h b/xen/include/asm-x86/hvm/io.h
index fb64294..3da0136 100644
--- a/xen/include/asm-x86/hvm/io.h
+++ b/xen/include/asm-x86/hvm/io.h
@@ -97,7 +97,6 @@ bool relocate_portio_handler(
     unsigned int size);
 
 void send_timeoffset_req(unsigned long timeoff);
-void send_invalidate_req(void);
 bool handle_mmio_with_translation(unsigned long gla, unsigned long gpfn,
                                   struct npfec);
 bool handle_pio(uint16_t port, unsigned int size, int dir);
diff --git a/xen/include/xen/ioreq.h b/xen/include/xen/ioreq.h
index 2289e79..482f76f 100644
--- a/xen/include/xen/ioreq.h
+++ b/xen/include/xen/ioreq.h
@@ -129,6 +129,7 @@ struct ioreq_server *ioreq_server_select(struct domain *d,
 int ioreq_send(struct ioreq_server *s, ioreq_t *proto_p,
                bool buffered);
 unsigned int ioreq_broadcast(ioreq_t *p, bool buffered);
+void ioreq_signal_mapcache_invalidate(void);
 
 void ioreq_domain_init(struct domain *d);
 
diff --git a/xen/include/xen/sched.h b/xen/include/xen/sched.h
index 2277995..60bf254 100644
--- a/xen/include/xen/sched.h
+++ b/xen/include/xen/sched.h
@@ -552,6 +552,8 @@ struct domain
         struct ioreq_server     *server[MAX_NR_IOREQ_SERVERS];
         unsigned int            nr_servers;
     } ioreq_server;
+
+    bool mapcache_invalidate;
 #endif
 };
 
-- 
2.7.4



^ permalink raw reply related	[flat|nested] 127+ messages in thread

* [PATCH V3 21/23] xen/arm: Add mapcache invalidation handling
  2020-11-30 10:31 Oleksandr Tyshchenko
                   ` (19 preceding siblings ...)
  2020-11-30 10:31 ` [PATCH V3 20/23] xen/ioreq: Make x86's send_invalidate_req() common Oleksandr Tyshchenko
@ 2020-11-30 10:31 ` Oleksandr Tyshchenko
  2020-12-10  2:30   ` Stefano Stabellini
  2020-11-30 10:31 ` [PATCH V3 22/23] libxl: Introduce basic virtio-mmio support on Arm Oleksandr Tyshchenko
                   ` (3 subsequent siblings)
  24 siblings, 1 reply; 127+ messages in thread
From: Oleksandr Tyshchenko @ 2020-11-30 10:31 UTC (permalink / raw)
  To: xen-devel
  Cc: Oleksandr Tyshchenko, Stefano Stabellini, Julien Grall,
	Volodymyr Babchuk, Julien Grall

From: Oleksandr Tyshchenko <oleksandr_tyshchenko@epam.com>

We need to send a mapcache invalidation request to qemu/demu every time
a page gets removed from a guest.

At the moment, the Arm code doesn't explicitly remove the existing
mapping before inserting the new mapping. Instead, this is done
implicitly by __p2m_set_entry().

So we need to recognize the case when the old entry is a RAM page *and*
the new MFN is different, in order to set the corresponding flag.
The most suitable place to do this is p2m_free_entry(), where
we can find the correct leaf type. The invalidation request
will be sent from do_trap_hypercall() later on.

Signed-off-by: Oleksandr Tyshchenko <oleksandr_tyshchenko@epam.com>
CC: Julien Grall <julien.grall@arm.com>

---
Please note, this is a split/cleanup/hardening of Julien's PoC:
"Add support for Guest IO forwarding to a device emulator"

Changes V1 -> V2:
   - new patch, some changes were derived from (+ new explanation):
     xen/ioreq: Make x86's invalidate qemu mapcache handling common
   - put setting of the flag into __p2m_set_entry()
   - clarify the conditions when the flag should be set
   - use domain_has_ioreq_server()
   - update do_trap_hypercall() by adding local variable

Changes V2 -> V3:
   - update patch description
   - move check to p2m_free_entry()
   - add a comment
   - use "curr" instead of "v" in do_trap_hypercall()
---
---
 xen/arch/arm/p2m.c   | 24 ++++++++++++++++--------
 xen/arch/arm/traps.c | 13 ++++++++++---
 2 files changed, 26 insertions(+), 11 deletions(-)

diff --git a/xen/arch/arm/p2m.c b/xen/arch/arm/p2m.c
index 5b8d494..9674f6f 100644
--- a/xen/arch/arm/p2m.c
+++ b/xen/arch/arm/p2m.c
@@ -1,6 +1,7 @@
 #include <xen/cpu.h>
 #include <xen/domain_page.h>
 #include <xen/iocap.h>
+#include <xen/ioreq.h>
 #include <xen/lib.h>
 #include <xen/sched.h>
 #include <xen/softirq.h>
@@ -749,17 +750,24 @@ static void p2m_free_entry(struct p2m_domain *p2m,
     if ( !p2m_is_valid(entry) )
         return;
 
-    /* Nothing to do but updating the stats if the entry is a super-page. */
-    if ( p2m_is_superpage(entry, level) )
+    if ( p2m_is_superpage(entry, level) || (level == 3) )
     {
-        p2m->stats.mappings[level]--;
-        return;
-    }
+#ifdef CONFIG_IOREQ_SERVER
+        /*
+         * If this gets called (non-recursively) then either the entry
+         * was replaced by an entry with a different base (valid case) or
+         * the shattering of a superpage failed (error case).
+         * So, at worst, a spurious mapcache invalidation might be sent.
+         */
+        if ( domain_has_ioreq_server(p2m->domain) &&
+             (p2m->domain == current->domain) && p2m_is_ram(entry.p2m.type) )
+            p2m->domain->mapcache_invalidate = true;
+#endif
 
-    if ( level == 3 )
-    {
         p2m->stats.mappings[level]--;
-        p2m_put_l3_page(entry);
+        /* Nothing else to do if the entry is a super-page. */
+        if ( level == 3 )
+            p2m_put_l3_page(entry);
         return;
     }
 
diff --git a/xen/arch/arm/traps.c b/xen/arch/arm/traps.c
index b6077d2..151c626 100644
--- a/xen/arch/arm/traps.c
+++ b/xen/arch/arm/traps.c
@@ -1443,6 +1443,7 @@ static void do_trap_hypercall(struct cpu_user_regs *regs, register_t *nr,
                               const union hsr hsr)
 {
     arm_hypercall_fn_t call = NULL;
+    struct vcpu *curr = current;
 
     BUILD_BUG_ON(NR_hypercalls < ARRAY_SIZE(arm_hypercall_table) );
 
@@ -1459,7 +1460,7 @@ static void do_trap_hypercall(struct cpu_user_regs *regs, register_t *nr,
         return;
     }
 
-    current->hcall_preempted = false;
+    curr->hcall_preempted = false;
 
     perfc_incra(hypercalls, *nr);
     call = arm_hypercall_table[*nr].fn;
@@ -1472,7 +1473,7 @@ static void do_trap_hypercall(struct cpu_user_regs *regs, register_t *nr,
     HYPERCALL_RESULT_REG(regs) = call(HYPERCALL_ARGS(regs));
 
 #ifndef NDEBUG
-    if ( !current->hcall_preempted )
+    if ( !curr->hcall_preempted )
     {
         /* Deliberately corrupt parameter regs used by this hypercall. */
         switch ( arm_hypercall_table[*nr].nr_args ) {
@@ -1489,8 +1490,14 @@ static void do_trap_hypercall(struct cpu_user_regs *regs, register_t *nr,
 #endif
 
     /* Ensure the hypercall trap instruction is re-executed. */
-    if ( current->hcall_preempted )
+    if ( curr->hcall_preempted )
         regs->pc -= 4;  /* re-execute 'hvc #XEN_HYPERCALL_TAG' */
+
+#ifdef CONFIG_IOREQ_SERVER
+    if ( unlikely(curr->domain->mapcache_invalidate) &&
+         test_and_clear_bool(curr->domain->mapcache_invalidate) )
+        ioreq_signal_mapcache_invalidate();
+#endif
 }
 
 void arch_hypercall_tasklet_result(struct vcpu *v, long res)
-- 
2.7.4



^ permalink raw reply related	[flat|nested] 127+ messages in thread

* [PATCH V3 22/23] libxl: Introduce basic virtio-mmio support on Arm
  2020-11-30 10:31 Oleksandr Tyshchenko
                   ` (20 preceding siblings ...)
  2020-11-30 10:31 ` [PATCH V3 21/23] xen/arm: Add mapcache invalidation handling Oleksandr Tyshchenko
@ 2020-11-30 10:31 ` Oleksandr Tyshchenko
  2020-11-30 10:31 ` [PATCH V3 23/23] [RFC] libxl: Add support for virtio-disk configuration Oleksandr Tyshchenko
                   ` (2 subsequent siblings)
  24 siblings, 0 replies; 127+ messages in thread
From: Oleksandr Tyshchenko @ 2020-11-30 10:31 UTC (permalink / raw)
  To: xen-devel
  Cc: Julien Grall, Ian Jackson, Wei Liu, Anthony PERARD,
	Stefano Stabellini, Julien Grall, Volodymyr Babchuk,
	Oleksandr Tyshchenko

From: Julien Grall <julien.grall@arm.com>

This patch creates a specific device node in the guest device-tree,
with an allocated MMIO range and SPI interrupt, if the 'virtio'
property is present in the domain config.
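
For example, the following guest config line enables it (a minimal sketch;
'virtio' is the temporary all-or-nothing flag parsed by the xl_parse.c hunk
below, and the last patch in the series notes it could eventually be removed):

virtio = 1

With this set, the guest device-tree gains a "virtio,mmio" node at
GUEST_VIRTIO_MMIO_BASE with GUEST_VIRTIO_MMIO_SPI as its interrupt.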

Signed-off-by: Julien Grall <julien.grall@arm.com>
Signed-off-by: Oleksandr Tyshchenko <oleksandr_tyshchenko@epam.com>

---
Please note, this is a split/cleanup/hardening of Julien's PoC:
"Add support for Guest IO forwarding to a device emulator"

Changes RFC -> V1:
   - was squashed with:
     "[RFC PATCH V1 09/12] libxl: Handle virtio-mmio irq in more correct way"
     "[RFC PATCH V1 11/12] libxl: Insert "dma-coherent" property into virtio-mmio device node"
     "[RFC PATCH V1 12/12] libxl: Fix duplicate memory node in DT"
   - move VirtIO MMIO #define-s to xen/include/public/arch-arm.h

Changes V1 -> V2:
   - update the author of a patch

Changes V2 -> V3:
   - no changes
---
---
 tools/libs/light/libxl_arm.c     | 58 ++++++++++++++++++++++++++++++++++++++--
 tools/libs/light/libxl_types.idl |  1 +
 tools/xl/xl_parse.c              |  1 +
 xen/include/public/arch-arm.h    |  5 ++++
 4 files changed, 63 insertions(+), 2 deletions(-)

diff --git a/tools/libs/light/libxl_arm.c b/tools/libs/light/libxl_arm.c
index 66e8a06..588ee5a 100644
--- a/tools/libs/light/libxl_arm.c
+++ b/tools/libs/light/libxl_arm.c
@@ -26,8 +26,8 @@ int libxl__arch_domain_prepare_config(libxl__gc *gc,
 {
     uint32_t nr_spis = 0;
     unsigned int i;
-    uint32_t vuart_irq;
-    bool vuart_enabled = false;
+    uint32_t vuart_irq, virtio_irq;
+    bool vuart_enabled = false, virtio_enabled = false;
 
     /*
      * If pl011 vuart is enabled then increment the nr_spis to allow allocation
@@ -39,6 +39,17 @@ int libxl__arch_domain_prepare_config(libxl__gc *gc,
         vuart_enabled = true;
     }
 
+    /*
+     * XXX: Handle virtio properly
+     * A proper solution would be for the toolstack to allocate the interrupts
+     * used by each virtio backend and let the backend know which one is used
+     */
+    if (libxl_defbool_val(d_config->b_info.arch_arm.virtio)) {
+        nr_spis += (GUEST_VIRTIO_MMIO_SPI - 32) + 1;
+        virtio_irq = GUEST_VIRTIO_MMIO_SPI;
+        virtio_enabled = true;
+    }
+
     for (i = 0; i < d_config->b_info.num_irqs; i++) {
         uint32_t irq = d_config->b_info.irqs[i];
         uint32_t spi;
@@ -58,6 +69,12 @@ int libxl__arch_domain_prepare_config(libxl__gc *gc,
             return ERROR_FAIL;
         }
 
+        /* The same check as for vpl011 */
+        if (virtio_enabled && irq == virtio_irq) {
+            LOG(ERROR, "Physical IRQ %u conflicting with virtio SPI\n", irq);
+            return ERROR_FAIL;
+        }
+
         if (irq < 32)
             continue;
 
@@ -658,6 +675,39 @@ static int make_vpl011_uart_node(libxl__gc *gc, void *fdt,
     return 0;
 }
 
+static int make_virtio_mmio_node(libxl__gc *gc, void *fdt,
+                                 uint64_t base, uint32_t irq)
+{
+    int res;
+    gic_interrupt intr;
+    /* Placeholder for virtio@ + a 64-bit number + \0 */
+    char buf[24];
+
+    snprintf(buf, sizeof(buf), "virtio@%"PRIx64, base);
+    res = fdt_begin_node(fdt, buf);
+    if (res) return res;
+
+    res = fdt_property_compat(gc, fdt, 1, "virtio,mmio");
+    if (res) return res;
+
+    res = fdt_property_regs(gc, fdt, GUEST_ROOT_ADDRESS_CELLS, GUEST_ROOT_SIZE_CELLS,
+                            1, base, GUEST_VIRTIO_MMIO_SIZE);
+    if (res) return res;
+
+    set_interrupt(intr, irq, 0xf, DT_IRQ_TYPE_EDGE_RISING);
+    res = fdt_property_interrupts(gc, fdt, &intr, 1);
+    if (res) return res;
+
+    res = fdt_property(fdt, "dma-coherent", NULL, 0);
+    if (res) return res;
+
+    res = fdt_end_node(fdt);
+    if (res) return res;
+
+    return 0;
+
+}
+
 static const struct arch_info *get_arch_info(libxl__gc *gc,
                                              const struct xc_dom_image *dom)
 {
@@ -961,6 +1011,9 @@ next_resize:
         if (info->tee == LIBXL_TEE_TYPE_OPTEE)
             FDT( make_optee_node(gc, fdt) );
 
+        if (libxl_defbool_val(info->arch_arm.virtio))
+            FDT( make_virtio_mmio_node(gc, fdt, GUEST_VIRTIO_MMIO_BASE, GUEST_VIRTIO_MMIO_SPI) );
+
         if (pfdt)
             FDT( copy_partial_fdt(gc, fdt, pfdt) );
 
@@ -1178,6 +1231,7 @@ void libxl__arch_domain_build_info_setdefault(libxl__gc *gc,
 {
     /* ACPI is disabled by default */
     libxl_defbool_setdefault(&b_info->acpi, false);
+    libxl_defbool_setdefault(&b_info->arch_arm.virtio, false);
 
     if (b_info->type != LIBXL_DOMAIN_TYPE_PV)
         return;
diff --git a/tools/libs/light/libxl_types.idl b/tools/libs/light/libxl_types.idl
index 9d3f05f..b054bf9 100644
--- a/tools/libs/light/libxl_types.idl
+++ b/tools/libs/light/libxl_types.idl
@@ -639,6 +639,7 @@ libxl_domain_build_info = Struct("domain_build_info",[
 
 
     ("arch_arm", Struct(None, [("gic_version", libxl_gic_version),
+                               ("virtio", libxl_defbool),
                                ("vuart", libxl_vuart_type),
                               ])),
     # Alternate p2m is not bound to any architecture or guest type, as it is
diff --git a/tools/xl/xl_parse.c b/tools/xl/xl_parse.c
index cae8eb6..10acf22 100644
--- a/tools/xl/xl_parse.c
+++ b/tools/xl/xl_parse.c
@@ -2581,6 +2581,7 @@ skip_usbdev:
     }
 
     xlu_cfg_get_defbool(config, "dm_restrict", &b_info->dm_restrict, 0);
+    xlu_cfg_get_defbool(config, "virtio", &b_info->arch_arm.virtio, 0);
 
     if (c_info->type == LIBXL_DOMAIN_TYPE_HVM) {
         if (!xlu_cfg_get_string (config, "vga", &buf, 0)) {
diff --git a/xen/include/public/arch-arm.h b/xen/include/public/arch-arm.h
index c365b1b..be7595f 100644
--- a/xen/include/public/arch-arm.h
+++ b/xen/include/public/arch-arm.h
@@ -464,6 +464,11 @@ typedef uint64_t xen_callback_t;
 #define PSCI_cpu_on      2
 #define PSCI_migrate     3
 
+/* VirtIO MMIO definitions */
+#define GUEST_VIRTIO_MMIO_BASE  xen_mk_ullong(0x02000000)
+#define GUEST_VIRTIO_MMIO_SIZE  xen_mk_ullong(0x200)
+#define GUEST_VIRTIO_MMIO_SPI   33
+
 #endif
 
 #ifndef __ASSEMBLY__
-- 
2.7.4



^ permalink raw reply related	[flat|nested] 127+ messages in thread

* [PATCH V3 23/23] [RFC] libxl: Add support for virtio-disk configuration
  2020-11-30 10:31 Oleksandr Tyshchenko
                   ` (21 preceding siblings ...)
  2020-11-30 10:31 ` [PATCH V3 22/23] libxl: Introduce basic virtio-mmio support on Arm Oleksandr Tyshchenko
@ 2020-11-30 10:31 ` Oleksandr Tyshchenko
  2020-11-30 11:22 ` [PATCH V3 00/23] IOREQ feature (+ virtio-mmio) on Arm Oleksandr
  2020-11-30 16:21 ` Alex Bennée
  24 siblings, 0 replies; 127+ messages in thread
From: Oleksandr Tyshchenko @ 2020-11-30 10:31 UTC (permalink / raw)
  To: xen-devel
  Cc: Oleksandr Tyshchenko, Ian Jackson, Wei Liu, Julien Grall,
	Stefano Stabellini, Anthony PERARD

From: Oleksandr Tyshchenko <oleksandr_tyshchenko@epam.com>

This patch adds basic support for configuring and assisting a virtio-disk
backend (emulator) which is intended to run outside of QEMU and could be
run in any domain.

Xenstore was chosen as the communication interface so that the emulator
running in a non-toolstack domain can get its configuration either by
reading Xenstore directly or by receiving command-line parameters (an
updated 'xl devd' running in the same domain would read Xenstore
beforehand and call the backend executable with the required arguments).

An example of domain configuration (two disks are assigned to the guest,
the latter in read-only mode):

vdisk = [ 'backend=DomD, disks=rw:/dev/mmcblk0p3;ro:/dev/mmcblk1p3' ]

Where per-disk Xenstore entries are:
- filename and readonly flag (configured via "vdisk" property)
- base and irq (allocated dynamically)

Besides handling the 'visible' params described in the configuration file,
the patch also allocates virtio-mmio specific ones for each device and
writes them into Xenstore. The virtio-mmio params (irq and base) are
unique per guest domain; they are allocated at domain creation time
and passed through to the emulator. Each VirtIO device has at least
one pair of these params.

TODO:
1. An extra "virtio" property could be removed.
2. Update documentation.

Signed-off-by: Oleksandr Tyshchenko <oleksandr_tyshchenko@epam.com>

---
Changes RFC -> V1:
   - no changes

Changes V1 -> V2:
   - rebase according to the new location of libxl_virtio_disk.c

Changes V2 -> V3:
   - no changes

Please note, there is a real concern about VirtIO interrupt allocation.
I just copy here what Stefano said in the RFC thread.

So, if we end up allocating let's say 6 virtio interrupts for a domain,
the chance of a clash with a physical interrupt of a passthrough device is real.

I am not entirely sure how to solve it, but these are a few ideas:
- choosing virtio interrupts that are less likely to conflict (maybe > 1000)
- make the virtio irq (optionally) configurable so that a user could
  override the default irq and specify one that doesn't conflict
- implementing support for virq != pirq (even the xl interface doesn't
  allow to specify the virq number for passthrough devices, see "irqs")

Also there is one suggestion from Wei Chen regarding a parameter for the
domain config file which I haven't addressed yet.
[Just copying here what Wei said in the V2 thread]
Can we keep use the same 'disk' parameter for virtio-disk, but add an option like
"model=virtio-disk"?
For example:
disk = [ 'backend=DomD, disks=rw:/dev/mmcblk0p3,model=virtio-disk' ]
Just like what Xen has done for x86 virtio-net.

---
---
 tools/libs/light/Makefile                 |   1 +
 tools/libs/light/libxl_arm.c              |  56 ++++++++++++---
 tools/libs/light/libxl_create.c           |   1 +
 tools/libs/light/libxl_internal.h         |   1 +
 tools/libs/light/libxl_types.idl          |  15 ++++
 tools/libs/light/libxl_types_internal.idl |   1 +
 tools/libs/light/libxl_virtio_disk.c      | 109 ++++++++++++++++++++++++++++
 tools/xl/Makefile                         |   2 +-
 tools/xl/xl.h                             |   3 +
 tools/xl/xl_cmdtable.c                    |  15 ++++
 tools/xl/xl_parse.c                       | 115 ++++++++++++++++++++++++++++++
 tools/xl/xl_virtio_disk.c                 |  46 ++++++++++++
 12 files changed, 354 insertions(+), 11 deletions(-)
 create mode 100644 tools/libs/light/libxl_virtio_disk.c
 create mode 100644 tools/xl/xl_virtio_disk.c

diff --git a/tools/libs/light/Makefile b/tools/libs/light/Makefile
index 68f6fa3..ccc91b9 100644
--- a/tools/libs/light/Makefile
+++ b/tools/libs/light/Makefile
@@ -115,6 +115,7 @@ SRCS-y += libxl_genid.c
 SRCS-y += _libxl_types.c
 SRCS-y += libxl_flask.c
 SRCS-y += _libxl_types_internal.c
+SRCS-y += libxl_virtio_disk.c
 
 ifeq ($(CONFIG_LIBNL),y)
 CFLAGS_LIBXL += $(LIBNL3_CFLAGS)
diff --git a/tools/libs/light/libxl_arm.c b/tools/libs/light/libxl_arm.c
index 588ee5a..9eb3022 100644
--- a/tools/libs/light/libxl_arm.c
+++ b/tools/libs/light/libxl_arm.c
@@ -8,6 +8,12 @@
 #include <assert.h>
 #include <xen/device_tree_defs.h>
 
+#ifndef container_of
+#define container_of(ptr, type, member) ({			\
+        typeof( ((type *)0)->member ) *__mptr = (ptr);	\
+        (type *)( (char *)__mptr - offsetof(type,member) );})
+#endif
+
 static const char *gicv_to_string(libxl_gic_version gic_version)
 {
     switch (gic_version) {
@@ -39,14 +45,32 @@ int libxl__arch_domain_prepare_config(libxl__gc *gc,
         vuart_enabled = true;
     }
 
-    /*
-     * XXX: Handle virtio properly
-     * A proper solution would be for the toolstack to allocate the interrupts
-     * used by each virtio backend and let the backend know which one is used
-     */
     if (libxl_defbool_val(d_config->b_info.arch_arm.virtio)) {
-        nr_spis += (GUEST_VIRTIO_MMIO_SPI - 32) + 1;
+        uint64_t virtio_base;
+        libxl_device_virtio_disk *virtio_disk;
+
+        virtio_base = GUEST_VIRTIO_MMIO_BASE;
         virtio_irq = GUEST_VIRTIO_MMIO_SPI;
+
+        if (!d_config->num_virtio_disks) {
+            LOG(ERROR, "Virtio is enabled, but no Virtio devices present\n");
+            return ERROR_FAIL;
+        }
+        virtio_disk = &d_config->virtio_disks[0];
+
+        for (i = 0; i < virtio_disk->num_disks; i++) {
+            virtio_disk->disks[i].base = virtio_base;
+            virtio_disk->disks[i].irq = virtio_irq;
+
+            LOG(DEBUG, "Allocate Virtio MMIO params: IRQ %u BASE 0x%"PRIx64,
+                virtio_irq, virtio_base);
+
+            virtio_irq ++;
+            virtio_base += GUEST_VIRTIO_MMIO_SIZE;
+        }
+        virtio_irq --;
+
+        nr_spis += (virtio_irq - 32) + 1;
         virtio_enabled = true;
     }
 
@@ -70,8 +94,9 @@ int libxl__arch_domain_prepare_config(libxl__gc *gc,
         }
 
         /* The same check as for vpl011 */
-        if (virtio_enabled && irq == virtio_irq) {
-            LOG(ERROR, "Physical IRQ %u conflicting with virtio SPI\n", irq);
+        if (virtio_enabled &&
+           (irq >= GUEST_VIRTIO_MMIO_SPI && irq <= virtio_irq)) {
+            LOG(ERROR, "Physical IRQ %u conflicting with Virtio IRQ range\n", irq);
             return ERROR_FAIL;
         }
 
@@ -1011,8 +1036,19 @@ next_resize:
         if (info->tee == LIBXL_TEE_TYPE_OPTEE)
             FDT( make_optee_node(gc, fdt) );
 
-        if (libxl_defbool_val(info->arch_arm.virtio))
-            FDT( make_virtio_mmio_node(gc, fdt, GUEST_VIRTIO_MMIO_BASE, GUEST_VIRTIO_MMIO_SPI) );
+        if (libxl_defbool_val(info->arch_arm.virtio)) {
+            libxl_domain_config *d_config =
+                container_of(info, libxl_domain_config, b_info);
+            libxl_device_virtio_disk *virtio_disk = &d_config->virtio_disks[0];
+            unsigned int i;
+
+            for (i = 0; i < virtio_disk->num_disks; i++) {
+                uint64_t base = virtio_disk->disks[i].base;
+                uint32_t irq = virtio_disk->disks[i].irq;
+
+                FDT( make_virtio_mmio_node(gc, fdt, base, irq) );
+            }
+        }
 
         if (pfdt)
             FDT( copy_partial_fdt(gc, fdt, pfdt) );
diff --git a/tools/libs/light/libxl_create.c b/tools/libs/light/libxl_create.c
index 321a13e..8da328d 100644
--- a/tools/libs/light/libxl_create.c
+++ b/tools/libs/light/libxl_create.c
@@ -1821,6 +1821,7 @@ const libxl__device_type *device_type_tbl[] = {
     &libxl__dtdev_devtype,
     &libxl__vdispl_devtype,
     &libxl__vsnd_devtype,
+    &libxl__virtio_disk_devtype,
     NULL
 };
 
diff --git a/tools/libs/light/libxl_internal.h b/tools/libs/light/libxl_internal.h
index e26cda9..ea497bb 100644
--- a/tools/libs/light/libxl_internal.h
+++ b/tools/libs/light/libxl_internal.h
@@ -4000,6 +4000,7 @@ extern const libxl__device_type libxl__vdispl_devtype;
 extern const libxl__device_type libxl__p9_devtype;
 extern const libxl__device_type libxl__pvcallsif_devtype;
 extern const libxl__device_type libxl__vsnd_devtype;
+extern const libxl__device_type libxl__virtio_disk_devtype;
 
 extern const libxl__device_type *device_type_tbl[];
 
diff --git a/tools/libs/light/libxl_types.idl b/tools/libs/light/libxl_types.idl
index b054bf9..5f8a3ff 100644
--- a/tools/libs/light/libxl_types.idl
+++ b/tools/libs/light/libxl_types.idl
@@ -935,6 +935,20 @@ libxl_device_vsnd = Struct("device_vsnd", [
     ("pcms", Array(libxl_vsnd_pcm, "num_vsnd_pcms"))
     ])
 
+libxl_virtio_disk_param = Struct("virtio_disk_param", [
+    ("filename", string),
+    ("readonly", bool),
+    ("irq", uint32),
+    ("base", uint64),
+    ])
+
+libxl_device_virtio_disk = Struct("device_virtio_disk", [
+    ("backend_domid", libxl_domid),
+    ("backend_domname", string),
+    ("devid", libxl_devid),
+    ("disks", Array(libxl_virtio_disk_param, "num_disks")),
+    ])
+
 libxl_domain_config = Struct("domain_config", [
     ("c_info", libxl_domain_create_info),
     ("b_info", libxl_domain_build_info),
@@ -951,6 +965,7 @@ libxl_domain_config = Struct("domain_config", [
     ("pvcallsifs", Array(libxl_device_pvcallsif, "num_pvcallsifs")),
     ("vdispls", Array(libxl_device_vdispl, "num_vdispls")),
     ("vsnds", Array(libxl_device_vsnd, "num_vsnds")),
+    ("virtio_disks", Array(libxl_device_virtio_disk, "num_virtio_disks")),
     # a channel manifests as a console with a name,
     # see docs/misc/channels.txt
     ("channels", Array(libxl_device_channel, "num_channels")),
diff --git a/tools/libs/light/libxl_types_internal.idl b/tools/libs/light/libxl_types_internal.idl
index 3593e21..8f71980 100644
--- a/tools/libs/light/libxl_types_internal.idl
+++ b/tools/libs/light/libxl_types_internal.idl
@@ -32,6 +32,7 @@ libxl__device_kind = Enumeration("device_kind", [
     (14, "PVCALLS"),
     (15, "VSND"),
     (16, "VINPUT"),
+    (17, "VIRTIO_DISK"),
     ])
 
 libxl__console_backend = Enumeration("console_backend", [
diff --git a/tools/libs/light/libxl_virtio_disk.c b/tools/libs/light/libxl_virtio_disk.c
new file mode 100644
index 0000000..25e7f1a
--- /dev/null
+++ b/tools/libs/light/libxl_virtio_disk.c
@@ -0,0 +1,109 @@
+/*
+ * Copyright (C) 2020 EPAM Systems Inc.
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU Lesser General Public License as published
+ * by the Free Software Foundation; version 2.1 only. with the special
+ * exception on linking described in file LICENSE.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU Lesser General Public License for more details.
+ */
+
+#include "libxl_internal.h"
+
+static int libxl__device_virtio_disk_setdefault(libxl__gc *gc, uint32_t domid,
+                                                libxl_device_virtio_disk *virtio_disk,
+                                                bool hotplug)
+{
+    return libxl__resolve_domid(gc, virtio_disk->backend_domname,
+                                &virtio_disk->backend_domid);
+}
+
+static int libxl__virtio_disk_from_xenstore(libxl__gc *gc, const char *libxl_path,
+                                            libxl_devid devid,
+                                            libxl_device_virtio_disk *virtio_disk)
+{
+    const char *be_path;
+    int rc;
+
+    virtio_disk->devid = devid;
+    rc = libxl__xs_read_mandatory(gc, XBT_NULL,
+                                  GCSPRINTF("%s/backend", libxl_path),
+                                  &be_path);
+    if (rc) return rc;
+
+    rc = libxl__backendpath_parse_domid(gc, be_path, &virtio_disk->backend_domid);
+    if (rc) return rc;
+
+    return 0;
+}
+
+static void libxl__update_config_virtio_disk(libxl__gc *gc,
+                                             libxl_device_virtio_disk *dst,
+                                             libxl_device_virtio_disk *src)
+{
+    dst->devid = src->devid;
+}
+
+static int libxl_device_virtio_disk_compare(libxl_device_virtio_disk *d1,
+                                            libxl_device_virtio_disk *d2)
+{
+    return COMPARE_DEVID(d1, d2);
+}
+
+static void libxl__device_virtio_disk_add(libxl__egc *egc, uint32_t domid,
+                                          libxl_device_virtio_disk *virtio_disk,
+                                          libxl__ao_device *aodev)
+{
+    libxl__device_add_async(egc, domid, &libxl__virtio_disk_devtype, virtio_disk, aodev);
+}
+
+static int libxl__set_xenstore_virtio_disk(libxl__gc *gc, uint32_t domid,
+                                           libxl_device_virtio_disk *virtio_disk,
+                                           flexarray_t *back, flexarray_t *front,
+                                           flexarray_t *ro_front)
+{
+    int rc;
+    unsigned int i;
+
+    for (i = 0; i < virtio_disk->num_disks; i++) {
+        rc = flexarray_append_pair(ro_front, GCSPRINTF("%d/filename", i),
+                                   GCSPRINTF("%s", virtio_disk->disks[i].filename));
+        if (rc) return rc;
+
+        rc = flexarray_append_pair(ro_front, GCSPRINTF("%d/readonly", i),
+                                   GCSPRINTF("%d", virtio_disk->disks[i].readonly));
+        if (rc) return rc;
+
+        rc = flexarray_append_pair(ro_front, GCSPRINTF("%d/base", i),
+                                   GCSPRINTF("%"PRIu64, virtio_disk->disks[i].base));
+        if (rc) return rc;
+
+        rc = flexarray_append_pair(ro_front, GCSPRINTF("%d/irq", i),
+                                   GCSPRINTF("%u", virtio_disk->disks[i].irq));
+        if (rc) return rc;
+    }
+
+    return 0;
+}
+
+static LIBXL_DEFINE_UPDATE_DEVID(virtio_disk)
+static LIBXL_DEFINE_DEVICE_FROM_TYPE(virtio_disk)
+static LIBXL_DEFINE_DEVICES_ADD(virtio_disk)
+
+DEFINE_DEVICE_TYPE_STRUCT(virtio_disk, VIRTIO_DISK,
+    .update_config = (device_update_config_fn_t) libxl__update_config_virtio_disk,
+    .from_xenstore = (device_from_xenstore_fn_t) libxl__virtio_disk_from_xenstore,
+    .set_xenstore_config = (device_set_xenstore_config_fn_t) libxl__set_xenstore_virtio_disk
+);
+
+/*
+ * Local variables:
+ * mode: C
+ * c-basic-offset: 4
+ * indent-tabs-mode: nil
+ * End:
+ */
diff --git a/tools/xl/Makefile b/tools/xl/Makefile
index bdf67c8..9d8f2aa 100644
--- a/tools/xl/Makefile
+++ b/tools/xl/Makefile
@@ -23,7 +23,7 @@ XL_OBJS += xl_vtpm.o xl_block.o xl_nic.o xl_usb.o
 XL_OBJS += xl_sched.o xl_pci.o xl_vcpu.o xl_cdrom.o xl_mem.o
 XL_OBJS += xl_info.o xl_console.o xl_misc.o
 XL_OBJS += xl_vmcontrol.o xl_saverestore.o xl_migrate.o
-XL_OBJS += xl_vdispl.o xl_vsnd.o xl_vkb.o
+XL_OBJS += xl_vdispl.o xl_vsnd.o xl_vkb.o xl_virtio_disk.o
 
 $(XL_OBJS): CFLAGS += $(CFLAGS_libxentoollog)
 $(XL_OBJS): CFLAGS += $(CFLAGS_XL)
diff --git a/tools/xl/xl.h b/tools/xl/xl.h
index 06569c6..3d26f19 100644
--- a/tools/xl/xl.h
+++ b/tools/xl/xl.h
@@ -178,6 +178,9 @@ int main_vsnddetach(int argc, char **argv);
 int main_vkbattach(int argc, char **argv);
 int main_vkblist(int argc, char **argv);
 int main_vkbdetach(int argc, char **argv);
+int main_virtio_diskattach(int argc, char **argv);
+int main_virtio_disklist(int argc, char **argv);
+int main_virtio_diskdetach(int argc, char **argv);
 int main_usbctrl_attach(int argc, char **argv);
 int main_usbctrl_detach(int argc, char **argv);
 int main_usbdev_attach(int argc, char **argv);
diff --git a/tools/xl/xl_cmdtable.c b/tools/xl/xl_cmdtable.c
index 7da6c1b..745afab 100644
--- a/tools/xl/xl_cmdtable.c
+++ b/tools/xl/xl_cmdtable.c
@@ -435,6 +435,21 @@ struct cmd_spec cmd_table[] = {
       "Destroy a domain's virtual sound device",
       "<Domain> <DevId>",
     },
+    { "virtio-disk-attach",
+      &main_virtio_diskattach, 1, 1,
+      "Create a new virtio block device",
+      " TBD\n"
+    },
+    { "virtio-disk-list",
+      &main_virtio_disklist, 0, 0,
+      "List virtio block devices for a domain",
+      "<Domain(s)>",
+    },
+    { "virtio-disk-detach",
+      &main_virtio_diskdetach, 0, 1,
+      "Destroy a domain's virtio block device",
+      "<Domain> <DevId>",
+    },
     { "uptime",
       &main_uptime, 0, 0,
       "Print uptime for all/some domains",
diff --git a/tools/xl/xl_parse.c b/tools/xl/xl_parse.c
index 10acf22..6cf3524 100644
--- a/tools/xl/xl_parse.c
+++ b/tools/xl/xl_parse.c
@@ -1204,6 +1204,120 @@ out:
     if (rc) exit(EXIT_FAILURE);
 }
 
+#define MAX_VIRTIO_DISKS 4
+
+static int parse_virtio_disk_config(libxl_device_virtio_disk *virtio_disk, char *token)
+{
+    char *oparg;
+    libxl_string_list disks = NULL;
+    int i, rc;
+
+    if (MATCH_OPTION("backend", token, oparg)) {
+        virtio_disk->backend_domname = strdup(oparg);
+    } else if (MATCH_OPTION("disks", token, oparg)) {
+        split_string_into_string_list(oparg, ";", &disks);
+
+        virtio_disk->num_disks = libxl_string_list_length(&disks);
+        if (virtio_disk->num_disks > MAX_VIRTIO_DISKS) {
+            fprintf(stderr, "vdisk: currently only %d disks are supported\n",
+                    MAX_VIRTIO_DISKS);
+            return 1;
+        }
+        virtio_disk->disks = xcalloc(virtio_disk->num_disks,
+                                     sizeof(*virtio_disk->disks));
+
+        for(i = 0; i < virtio_disk->num_disks; i++) {
+            char *disk_opt;
+
+            rc = split_string_into_pair(disks[i], ":", &disk_opt,
+                                        &virtio_disk->disks[i].filename);
+            if (rc) {
+                fprintf(stderr, "vdisk: failed to split \"%s\" into pair\n",
+                        disks[i]);
+                goto out;
+            }
+
+            if (!strcmp(disk_opt, "ro"))
+                virtio_disk->disks[i].readonly = 1;
+            else if (!strcmp(disk_opt, "rw"))
+                virtio_disk->disks[i].readonly = 0;
+            else {
+                fprintf(stderr, "vdisk: failed to parse \"%s\" disk option\n",
+                        disk_opt);
+                rc = 1;
+            }
+            free(disk_opt);
+
+            if (rc) goto out;
+        }
+    } else {
+        fprintf(stderr, "Unknown string \"%s\" in vdisk spec\n", token);
+        rc = 1; goto out;
+    }
+
+    rc = 0;
+
+out:
+    libxl_string_list_dispose(&disks);
+    return rc;
+}
+
+static void parse_virtio_disk_list(const XLU_Config *config,
+                            libxl_domain_config *d_config)
+{
+    XLU_ConfigList *virtio_disks;
+    const char *item;
+    char *buf = NULL;
+    int rc;
+
+    if (!xlu_cfg_get_list (config, "vdisk", &virtio_disks, 0, 0)) {
+        libxl_domain_build_info *b_info = &d_config->b_info;
+        int entry = 0;
+
+        /* XXX Remove an extra property */
+        libxl_defbool_setdefault(&b_info->arch_arm.virtio, false);
+        if (!libxl_defbool_val(b_info->arch_arm.virtio)) {
+            fprintf(stderr, "Virtio device requires Virtio property to be set\n");
+            exit(EXIT_FAILURE);
+        }
+
+        while ((item = xlu_cfg_get_listitem(virtio_disks, entry)) != NULL) {
+            libxl_device_virtio_disk *virtio_disk;
+            char *p;
+
+            virtio_disk = ARRAY_EXTEND_INIT(d_config->virtio_disks,
+                                            d_config->num_virtio_disks,
+                                            libxl_device_virtio_disk_init);
+
+            buf = strdup(item);
+
+            p = strtok (buf, ",");
+            while (p != NULL)
+            {
+                while (*p == ' ') p++;
+
+                rc = parse_virtio_disk_config(virtio_disk, p);
+                if (rc) goto out;
+
+                p = strtok (NULL, ",");
+            }
+
+            entry++;
+
+            if (virtio_disk->num_disks == 0) {
+                fprintf(stderr, "At least one virtio disk should be specified\n");
+                rc = 1; goto out;
+            }
+        }
+    }
+
+    rc = 0;
+
+out:
+    free(buf);
+    if (rc) exit(EXIT_FAILURE);
+}
+
 void parse_config_data(const char *config_source,
                        const char *config_data,
                        int config_len,
@@ -2734,6 +2848,7 @@ skip_usbdev:
     }
 
     parse_vkb_list(config, d_config);
+    parse_virtio_disk_list(config, d_config);
 
     xlu_cfg_get_defbool(config, "xend_suspend_evtchn_compat",
                         &c_info->xend_suspend_evtchn_compat, 0);
diff --git a/tools/xl/xl_virtio_disk.c b/tools/xl/xl_virtio_disk.c
new file mode 100644
index 0000000..808a7da
--- /dev/null
+++ b/tools/xl/xl_virtio_disk.c
@@ -0,0 +1,46 @@
+/*
+ * Copyright (C) 2020 EPAM Systems Inc.
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU Lesser General Public License as published
+ * by the Free Software Foundation; version 2.1 only. with the special
+ * exception on linking described in file LICENSE.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU Lesser General Public License for more details.
+ */
+
+#include <stdlib.h>
+
+#include <libxl.h>
+#include <libxl_utils.h>
+#include <libxlutil.h>
+
+#include "xl.h"
+#include "xl_utils.h"
+#include "xl_parse.h"
+
+int main_virtio_diskattach(int argc, char **argv)
+{
+    return 0;
+}
+
+int main_virtio_disklist(int argc, char **argv)
+{
+    return 0;
+}
+
+int main_virtio_diskdetach(int argc, char **argv)
+{
+    return 0;
+}
+
+/*
+ * Local variables:
+ * mode: C
+ * c-basic-offset: 4
+ * indent-tabs-mode: nil
+ * End:
+ */
-- 
2.7.4



^ permalink raw reply related	[flat|nested] 127+ messages in thread

* Re: [PATCH V3 00/23] IOREQ feature (+ virtio-mmio) on Arm
  2020-11-30 10:31 Oleksandr Tyshchenko
                   ` (22 preceding siblings ...)
  2020-11-30 10:31 ` [PATCH V3 23/23] [RFC] libxl: Add support for virtio-disk configuration Oleksandr Tyshchenko
@ 2020-11-30 11:22 ` Oleksandr
  2020-12-07 13:03   ` Wei Chen
  2020-11-30 16:21 ` Alex Bennée
  24 siblings, 1 reply; 127+ messages in thread
From: Oleksandr @ 2020-11-30 11:22 UTC (permalink / raw)
  To: xen-devel
  Cc: Oleksandr Tyshchenko, Paul Durrant, Jan Beulich, Andrew Cooper,
	Roger Pau Monné,
	Wei Liu, Julien Grall, George Dunlap, Ian Jackson, Julien Grall,
	Stefano Stabellini, Tim Deegan, Daniel De Graaf,
	Volodymyr Babchuk, Jun Nakajima, Kevin Tian, Anthony PERARD,
	Bertrand Marquis, Wei Chen, Kaly Xin, Artem Mygaiev,
	Alex Bennée


On 30.11.20 12:31, Oleksandr Tyshchenko wrote:
> From: Oleksandr Tyshchenko <oleksandr_tyshchenko@epam.com>

Hello all.

Added the missing subject line. I am sorry for the inconvenience.


>
>
> Date: Sat, 28 Nov 2020 22:33:51 +0200
> Subject: [PATCH V3 00/23] IOREQ feature (+ virtio-mmio) on Arm
> MIME-Version: 1.0
> Content-Type: text/plain; charset=UTF-8
> Content-Transfer-Encoding: 8bit
>
> Hello all.
>
> The purpose of this patch series is to add IOREQ/DM support to Xen on Arm.
> You can find an initial discussion at [1] and RFC/V1/V2 series at [2]/[3]/[4].
> Xen on Arm requires some implementation to forward guest MMIO access to a device
> model in order to implement virtio-mmio backend or even mediator outside of hypervisor.
> As Xen on x86 already contains required support this series tries to make it common
> and introduce Arm specific bits plus some new functionality. Patch series is based on
> Julien's PoC "xen/arm: Add support for Guest IO forwarding to a device emulator".
> Besides splitting existing IOREQ/DM support and introducing Arm side, the series
> also includes virtio-mmio related changes (last 2 patches for toolstack)
> for the reviewers to be able to see how the whole picture could look like.
>
> According to the initial discussion there are a few open questions/concerns
> regarding security, performance in VirtIO solution:
> 1. virtio-mmio vs virtio-pci, SPI vs MSI, different use-cases require different
>     transport...
> 2. virtio backend is able to access all guest memory, some kind of protection
>     is needed: 'virtio-iommu in Xen' vs 'pre-shared-memory & memcpys in guest'
> 3. interface between toolstack and 'out-of-qemu' virtio backend, avoid using
>     Xenstore in virtio backend if possible.
> 4. a lot of 'foreing mapping' could lead to the memory exhaustion, Julien
>     has some idea regarding that.
>
> Looks like all of them are valid and worth considering, but the first thing
> which we need on Arm is a mechanism to forward guest IO to a device emulator,
> so let's focus on it in the first place.
>
> ***
>
> There are a lot of changes since RFC series, almost all TODOs were resolved on Arm,
> Arm code was improved and hardened, common IOREQ/DM code became really arch-agnostic
> (without HVM-ism), the "legacy" mechanism of mapping magic pages for the IOREQ servers
> was left x86 specific, etc. But one TODO still remains which is "PIO handling" on Arm.
> The "PIO handling" TODO is expected to left unaddressed for the current series.
> It is not an big issue for now while Xen doesn't have support for vPCI on Arm.
> On Arm64 they are only used for PCI IO Bar and we would probably want to expose
> them to emulator as PIO access to make a DM completely arch-agnostic. So "PIO handling"
> should be implemented when we add support for vPCI.
>
> I left interface untouched in the following patch
> "xen/dm: Introduce xendevicemodel_set_irq_level DM op"
> since there is still an open discussion what interface to use/what
> information to pass to the hypervisor.
>
> There is a patch on review this series depends on:
> https://patchwork.kernel.org/patch/11816689
>
> Please note, that IOREQ feature is disabled by default on Arm within current series.
>
> ***
>
> Patch series [5] was rebased on recent "staging branch"
> (181f2c2 evtchn: double per-channel locking can't hit identical channels) and tested on
> Renesas Salvator-X board + H3 ES3.0 SoC (Arm64) with virtio-mmio disk backend [6]
> running in driver domain and unmodified Linux Guest running on existing
> virtio-blk driver (frontend). No issues were observed. Guest domain 'reboot/destroy'
> use-cases work properly. Patch series was only build-tested on x86.
>
> Please note, build-test passed for the following modes:
> 1. x86: CONFIG_HVM=y / CONFIG_IOREQ_SERVER=y (default)
> 2. x86: #CONFIG_HVM is not set / #CONFIG_IOREQ_SERVER is not set
> 3. Arm64: CONFIG_HVM=y / CONFIG_IOREQ_SERVER=y
> 4. Arm64: CONFIG_HVM=y / #CONFIG_IOREQ_SERVER is not set  (default)
> 5. Arm32: CONFIG_HVM=y / CONFIG_IOREQ_SERVER=y
> 6. Arm32: CONFIG_HVM=y / #CONFIG_IOREQ_SERVER is not set  (default)
>
> ***
>
> Any feedback/help would be highly appreciated.
>
> [1] https://lists.xenproject.org/archives/html/xen-devel/2020-07/msg00825.html
> [2] https://lists.xenproject.org/archives/html/xen-devel/2020-08/msg00071.html
> [3] https://lists.xenproject.org/archives/html/xen-devel/2020-09/msg00732.html
> [4] https://lists.xenproject.org/archives/html/xen-devel/2020-10/msg01077.html
> [5] https://github.com/otyshchenko1/xen/commits/ioreq_4.14_ml4
> [6] https://github.com/xen-troops/virtio-disk/commits/ioreq_ml1
>
> Julien Grall (5):
>    xen/dm: Make x86's DM feature common
>    xen/mm: Make x86's XENMEM_resource_ioreq_server handling common
>    arm/ioreq: Introduce arch specific bits for IOREQ/DM features
>    xen/dm: Introduce xendevicemodel_set_irq_level DM op
>    libxl: Introduce basic virtio-mmio support on Arm
>
> Oleksandr Tyshchenko (18):
>    x86/ioreq: Prepare IOREQ feature for making it common
>    x86/ioreq: Add IOREQ_STATUS_* #define-s and update code for moving
>    x86/ioreq: Provide out-of-line wrapper for the handle_mmio()
>    xen/ioreq: Make x86's IOREQ feature common
>    xen/ioreq: Make x86's hvm_ioreq_needs_completion() common
>    xen/ioreq: Make x86's hvm_mmio_first(last)_byte() common
>    xen/ioreq: Make x86's hvm_ioreq_(page/vcpu/server) structs common
>    xen/ioreq: Move x86's ioreq_server to struct domain
>    xen/ioreq: Move x86's io_completion/io_req fields to struct vcpu
>    xen/ioreq: Remove "hvm" prefixes from involved function names
>    xen/ioreq: Use guest_cmpxchg64() instead of cmpxchg()
>    xen/arm: Stick around in leave_hypervisor_to_guest until I/O has
>      completed
>    xen/mm: Handle properly reference in set_foreign_p2m_entry() on Arm
>    xen/ioreq: Introduce domain_has_ioreq_server()
>    xen/arm: io: Abstract sign-extension
>    xen/ioreq: Make x86's send_invalidate_req() common
>    xen/arm: Add mapcache invalidation handling
>    [RFC] libxl: Add support for virtio-disk configuration
>
>   MAINTAINERS                                  |    8 +-
>   tools/include/xendevicemodel.h               |    4 +
>   tools/libs/devicemodel/core.c                |   18 +
>   tools/libs/devicemodel/libxendevicemodel.map |    1 +
>   tools/libs/light/Makefile                    |    1 +
>   tools/libs/light/libxl_arm.c                 |   94 +-
>   tools/libs/light/libxl_create.c              |    1 +
>   tools/libs/light/libxl_internal.h            |    1 +
>   tools/libs/light/libxl_types.idl             |   16 +
>   tools/libs/light/libxl_types_internal.idl    |    1 +
>   tools/libs/light/libxl_virtio_disk.c         |  109 +++
>   tools/xl/Makefile                            |    2 +-
>   tools/xl/xl.h                                |    3 +
>   tools/xl/xl_cmdtable.c                       |   15 +
>   tools/xl/xl_parse.c                          |  116 +++
>   tools/xl/xl_virtio_disk.c                    |   46 +
>   xen/arch/arm/Makefile                        |    2 +
>   xen/arch/arm/dm.c                            |   89 ++
>   xen/arch/arm/domain.c                        |    9 +
>   xen/arch/arm/hvm.c                           |    4 +
>   xen/arch/arm/io.c                            |   29 +-
>   xen/arch/arm/ioreq.c                         |  126 +++
>   xen/arch/arm/p2m.c                           |   48 +-
>   xen/arch/arm/traps.c                         |   58 +-
>   xen/arch/x86/Kconfig                         |    1 +
>   xen/arch/x86/hvm/dm.c                        |  295 +-----
>   xen/arch/x86/hvm/emulate.c                   |   80 +-
>   xen/arch/x86/hvm/hvm.c                       |   12 +-
>   xen/arch/x86/hvm/hypercall.c                 |    9 +-
>   xen/arch/x86/hvm/intercept.c                 |    5 +-
>   xen/arch/x86/hvm/io.c                        |   26 +-
>   xen/arch/x86/hvm/ioreq.c                     | 1357 ++------------------------
>   xen/arch/x86/hvm/stdvga.c                    |   10 +-
>   xen/arch/x86/hvm/svm/nestedsvm.c             |    2 +-
>   xen/arch/x86/hvm/vmx/realmode.c              |    6 +-
>   xen/arch/x86/hvm/vmx/vvmx.c                  |    2 +-
>   xen/arch/x86/mm.c                            |   46 +-
>   xen/arch/x86/mm/p2m.c                        |   13 +-
>   xen/arch/x86/mm/shadow/common.c              |    2 +-
>   xen/common/Kconfig                           |    3 +
>   xen/common/Makefile                          |    2 +
>   xen/common/dm.c                              |  292 ++++++
>   xen/common/ioreq.c                           | 1307 +++++++++++++++++++++++++
>   xen/common/memory.c                          |   73 +-
>   xen/include/asm-arm/domain.h                 |    3 +
>   xen/include/asm-arm/hvm/ioreq.h              |  139 +++
>   xen/include/asm-arm/mm.h                     |    8 -
>   xen/include/asm-arm/mmio.h                   |    1 +
>   xen/include/asm-arm/p2m.h                    |   19 +-
>   xen/include/asm-arm/traps.h                  |   24 +
>   xen/include/asm-x86/hvm/domain.h             |   43 -
>   xen/include/asm-x86/hvm/emulate.h            |    2 +-
>   xen/include/asm-x86/hvm/io.h                 |   17 -
>   xen/include/asm-x86/hvm/ioreq.h              |   58 +-
>   xen/include/asm-x86/hvm/vcpu.h               |   18 -
>   xen/include/asm-x86/mm.h                     |    4 -
>   xen/include/asm-x86/p2m.h                    |   24 +-
>   xen/include/public/arch-arm.h                |    5 +
>   xen/include/public/hvm/dm_op.h               |   16 +
>   xen/include/xen/dm.h                         |   44 +
>   xen/include/xen/ioreq.h                      |  146 +++
>   xen/include/xen/p2m-common.h                 |    4 +
>   xen/include/xen/sched.h                      |   32 +
>   xen/include/xsm/dummy.h                      |    4 +-
>   xen/include/xsm/xsm.h                        |    6 +-
>   xen/xsm/dummy.c                              |    2 +-
>   xen/xsm/flask/hooks.c                        |    5 +-
>   67 files changed, 3084 insertions(+), 1884 deletions(-)
>   create mode 100644 tools/libs/light/libxl_virtio_disk.c
>   create mode 100644 tools/xl/xl_virtio_disk.c
>   create mode 100644 xen/arch/arm/dm.c
>   create mode 100644 xen/arch/arm/ioreq.c
>   create mode 100644 xen/common/dm.c
>   create mode 100644 xen/common/ioreq.c
>   create mode 100644 xen/include/asm-arm/hvm/ioreq.h
>   create mode 100644 xen/include/xen/dm.h
>   create mode 100644 xen/include/xen/ioreq.h
>
-- 
Regards,

Oleksandr Tyshchenko



^ permalink raw reply	[flat|nested] 127+ messages in thread

* Re:
  2020-11-30 10:31 Oleksandr Tyshchenko
                   ` (23 preceding siblings ...)
  2020-11-30 11:22 ` [PATCH V3 00/23] IOREQ feature (+ virtio-mmio) on Arm Oleksandr
@ 2020-11-30 16:21 ` Alex Bennée
  2020-11-30 22:22   ` [PATCH V3 00/23] IOREQ feature (+ virtio-mmio) on Arm Oleksandr
  2020-12-29 15:32   ` Roger Pau Monné
  24 siblings, 2 replies; 127+ messages in thread
From: Alex Bennée @ 2020-11-30 16:21 UTC (permalink / raw)
  To: Oleksandr Tyshchenko
  Cc: xen-devel, Oleksandr Tyshchenko, Paul Durrant, Jan Beulich,
	Andrew Cooper, Roger Pau Monné,
	Wei Liu, Julien Grall, George Dunlap, Ian Jackson, Julien Grall,
	Stefano Stabellini, Tim Deegan, Daniel De Graaf,
	Volodymyr Babchuk, Jun Nakajima, Kevin Tian, Anthony PERARD,
	Bertrand Marquis, Wei Chen, Kaly Xin, Artem Mygaiev


Oleksandr Tyshchenko <olekstysh@gmail.com> writes:

> From: Oleksandr Tyshchenko <oleksandr_tyshchenko@epam.com>
>
>
> Date: Sat, 28 Nov 2020 22:33:51 +0200
> Subject: [PATCH V3 00/23] IOREQ feature (+ virtio-mmio) on Arm
> MIME-Version: 1.0
> Content-Type: text/plain; charset=UTF-8
> Content-Transfer-Encoding: 8bit
>
> Hello all.
>
> The purpose of this patch series is to add IOREQ/DM support to Xen on Arm.
> You can find an initial discussion at [1] and RFC/V1/V2 series at [2]/[3]/[4].
> Xen on Arm requires some implementation to forward guest MMIO access to a device
> model in order to implement virtio-mmio backend or even mediator outside of hypervisor.
> As Xen on x86 already contains required support this series tries to make it common
> and introduce Arm specific bits plus some new functionality. Patch series is based on
> Julien's PoC "xen/arm: Add support for Guest IO forwarding to a device emulator".
> Besides splitting existing IOREQ/DM support and introducing Arm side, the series
> also includes virtio-mmio related changes (last 2 patches for toolstack)
> for the reviewers to be able to see how the whole picture could look
> like.

Thanks for posting the latest version.

>
> According to the initial discussion there are a few open questions/concerns
> regarding security, performance in VirtIO solution:
> 1. virtio-mmio vs virtio-pci, SPI vs MSI, different use-cases require different
>    transport...

I think I'm repeating things here I've said in various ephemeral video
chats over the last few weeks but I should probably put things down on
the record.

I think the original intention of the virtio framers was that advanced
features would build on virtio-pci because you get a bunch of things
"for free" - notably enumeration and MSI support. There is an assumption
that by the time you add these features to virtio-mmio you end up
re-creating your own less well tested version of virtio-pci. I've not
been terribly convinced by the argument that the guest implementation of
PCI presents a sufficiently large blob of code to make the simpler MMIO
desirable. My attempts to build two virtio kernels (PCI/MMIO) with
otherwise the same devices weren't terribly conclusive either way.

That said, virtio-mmio still has life in it: the cloudy slimmed-down
guests moved to using it because the enumeration of PCI is a road block
to their fast boot-up requirements. I'm sure they would also appreciate
an MSI implementation to reduce the overhead that handling notifications
currently has on trap-and-emulate.

AIUI for Xen the other downside to PCI is you would have to emulate it
in the hypervisor which would be additional code at the most privileged
level.

> 2. virtio backend is able to access all guest memory, some kind of protection
>    is needed: 'virtio-iommu in Xen' vs 'pre-shared-memory & memcpys in
>    guest'

This is also an area of interest for Project Stratos and something we
would like to be solved generally for all hypervisors. There is a good
write up of some approaches that Jean Phillipe did on the stratos
mailing list:

  From: Jean-Philippe Brucker <jean-philippe@linaro.org>
  Subject: Limited memory sharing investigation
  Message-ID: <20201002134336.GA2196245@myrica>

I suspect there is a good argument for the simplicity of a combined
virt queue but it is unlikely to be very performance orientated.

> 3. interface between toolstack and 'out-of-qemu' virtio backend, avoid using
>    Xenstore in virtio backend if possible.

I wonder how much work it would be for a rust expert to make:

  https://github.com/slp/vhost-user-blk

handle an IOREQ signalling pathway instead of the vhost-user/eventfd
pathway? That would give a good indication on how "hypervisor blind"
these daemons could be made.

<snip>
>
> Please note, build-test passed for the following modes:
> 1. x86: CONFIG_HVM=y / CONFIG_IOREQ_SERVER=y (default)
> 2. x86: #CONFIG_HVM is not set / #CONFIG_IOREQ_SERVER is not set
> 3. Arm64: CONFIG_HVM=y / CONFIG_IOREQ_SERVER=y

Forgive my relative newness to Xen, how do I convince the hypervisor to
build with this on? I've tried variants of:

  make -j9 CROSS_COMPILE=aarch64-linux-gnu- XEN_TARGET_ARCH=arm64 menuconfig XEN_EXPERT=y [CONFIG_|XEN_|_]IOREQ_SERVER=y

with no joy...

> 4. Arm64: CONFIG_HVM=y / #CONFIG_IOREQ_SERVER is not set  (default)
> 5. Arm32: CONFIG_HVM=y / CONFIG_IOREQ_SERVER=y
> 6. Arm32: CONFIG_HVM=y / #CONFIG_IOREQ_SERVER is not set  (default)
>
> ***
>
> Any feedback/help would be highly appreciated.
<snip>

-- 
Alex Bennée


^ permalink raw reply	[flat|nested] 127+ messages in thread

* Re: [PATCH V3 15/23] xen/arm: Stick around in leave_hypervisor_to_guest until I/O has completed
  2020-11-30 10:31 ` [PATCH V3 15/23] xen/arm: Stick around in leave_hypervisor_to_guest until I/O has completed Oleksandr Tyshchenko
@ 2020-11-30 20:51   ` Volodymyr Babchuk
  2020-12-01 12:46     ` Julien Grall
  2020-12-09 23:18   ` Stefano Stabellini
  1 sibling, 1 reply; 127+ messages in thread
From: Volodymyr Babchuk @ 2020-11-30 20:51 UTC (permalink / raw)
  To: Oleksandr Tyshchenko
  Cc: xen-devel, Oleksandr Tyshchenko, Stefano Stabellini,
	Julien Grall, Julien Grall


Hello Oleksandr,

Oleksandr Tyshchenko writes:

> From: Oleksandr Tyshchenko <oleksandr_tyshchenko@epam.com>
>
> This patch adds proper handling of return value of
> vcpu_ioreq_handle_completion() which involves using a loop
> in leave_hypervisor_to_guest().
>
> The reason to use an unbounded loop here is the fact that vCPU
> shouldn't continue until an I/O has completed. In Xen case, if an I/O
> never completes then it most likely means that something went horribly
> wrong with the Device Emulator. And it is most likely not safe to
> continue. So letting the vCPU to spin forever if I/O never completes
> is a safer action than letting it continue and leaving the guest in
> unclear state and is the best what we can do for now.
>
> This wouldn't be an issue for Xen as do_softirq() would be called at
> every loop. In case of failure, the guest will crash and the vCPU
> will be unscheduled.
>

Why don't you block the vCPU there and unblock it when the response is
ready? If I got it right, the "client" vCPU will spin in the loop, eating
its own scheduling budget with no useful work done. In the worst case, it
will prevent the "server" vCPU from running.

> Signed-off-by: Oleksandr Tyshchenko <oleksandr_tyshchenko@epam.com>
> CC: Julien Grall <julien.grall@arm.com>
>
> ---
> Please note, this is a split/cleanup/hardening of Julien's PoC:
> "Add support for Guest IO forwarding to a device emulator"
>
> Changes V1 -> V2:
>    - new patch, changes were derived from (+ new explanation):
>      arm/ioreq: Introduce arch specific bits for IOREQ/DM features
>
> Changes V2 -> V3:
>    - update patch description
> ---
> ---
>  xen/arch/arm/traps.c | 31 ++++++++++++++++++++++++++-----
>  1 file changed, 26 insertions(+), 5 deletions(-)
>
> diff --git a/xen/arch/arm/traps.c b/xen/arch/arm/traps.c
> index 036b13f..4cef43e 100644
> --- a/xen/arch/arm/traps.c
> +++ b/xen/arch/arm/traps.c
> @@ -2257,18 +2257,23 @@ static void check_for_pcpu_work(void)
>   * Process pending work for the vCPU. Any call should be fast or
>   * implement preemption.
>   */
> -static void check_for_vcpu_work(void)
> +static bool check_for_vcpu_work(void)
>  {
>      struct vcpu *v = current;
>  
>  #ifdef CONFIG_IOREQ_SERVER
> +    bool handled;
> +
>      local_irq_enable();
> -    vcpu_ioreq_handle_completion(v);
> +    handled = vcpu_ioreq_handle_completion(v);
>      local_irq_disable();
> +
> +    if ( !handled )
> +        return true;
>  #endif
>  
>      if ( likely(!v->arch.need_flush_to_ram) )
> -        return;
> +        return false;
>  
>      /*
>       * Give a chance for the pCPU to process work before handling the vCPU
> @@ -2279,6 +2284,8 @@ static void check_for_vcpu_work(void)
>      local_irq_enable();
>      p2m_flush_vm(v);
>      local_irq_disable();
> +
> +    return false;
>  }
>  
>  /*
> @@ -2291,8 +2298,22 @@ void leave_hypervisor_to_guest(void)
>  {
>      local_irq_disable();
>  
> -    check_for_vcpu_work();
> -    check_for_pcpu_work();
> +    /*
> +     * The reason to use an unbounded loop here is the fact that vCPU
> +     * shouldn't continue until an I/O has completed. In Xen case, if an I/O
> +     * never completes then it most likely means that something went horribly
> +     * wrong with the Device Emulator. And it is most likely not safe to
> +     * continue. So letting the vCPU to spin forever if I/O never completes
> +     * is a safer action than letting it continue and leaving the guest in
> +     * unclear state and is the best what we can do for now.
> +     *
> +     * This wouldn't be an issue for Xen as do_softirq() would be called at
> +     * every loop. In case of failure, the guest will crash and the vCPU
> +     * will be unscheduled.
> +     */
> +    do {
> +        check_for_pcpu_work();
> +    } while ( check_for_vcpu_work() );
>  
>      vgic_sync_to_lrs();


-- 
Volodymyr Babchuk at EPAM

^ permalink raw reply	[flat|nested] 127+ messages in thread

* Re: [PATCH V3 19/23] xen/arm: io: Abstract sign-extension
  2020-11-30 10:31 ` [PATCH V3 19/23] xen/arm: io: Abstract sign-extension Oleksandr Tyshchenko
@ 2020-11-30 21:03   ` Volodymyr Babchuk
  2020-11-30 23:27     ` Oleksandr
  0 siblings, 1 reply; 127+ messages in thread
From: Volodymyr Babchuk @ 2020-11-30 21:03 UTC (permalink / raw)
  To: Oleksandr Tyshchenko
  Cc: xen-devel, Oleksandr Tyshchenko, Stefano Stabellini,
	Julien Grall, Julien Grall


Hi,

Oleksandr Tyshchenko writes:

> From: Oleksandr Tyshchenko <oleksandr_tyshchenko@epam.com>
>
> In order to avoid code duplication (both handle_read() and
> handle_ioserv() contain the same code for the sign-extension)
> put this code to a common helper to be used for both.
>
> Signed-off-by: Oleksandr Tyshchenko <oleksandr_tyshchenko@epam.com>
> CC: Julien Grall <julien.grall@arm.com>
>
> ---
> Please note, this is a split/cleanup/hardening of Julien's PoC:
> "Add support for Guest IO forwarding to a device emulator"
>
> Changes V1 -> V2:
>    - new patch
>
> Changes V2 -> V3:
>    - no changes
> ---
> ---
>  xen/arch/arm/io.c           | 18 ++----------------
>  xen/arch/arm/ioreq.c        | 17 +----------------
>  xen/include/asm-arm/traps.h | 24 ++++++++++++++++++++++++
>  3 files changed, 27 insertions(+), 32 deletions(-)
>
> diff --git a/xen/arch/arm/io.c b/xen/arch/arm/io.c
> index f44cfd4..8d6ec6c 100644
> --- a/xen/arch/arm/io.c
> +++ b/xen/arch/arm/io.c
> @@ -23,6 +23,7 @@
>  #include <asm/cpuerrata.h>
>  #include <asm/current.h>
>  #include <asm/mmio.h>
> +#include <asm/traps.h>
>  #include <asm/hvm/ioreq.h>
>  
>  #include "decode.h"
> @@ -39,26 +40,11 @@ static enum io_state handle_read(const struct mmio_handler *handler,
>       * setting r).
>       */
>      register_t r = 0;
> -    uint8_t size = (1 << dabt.size) * 8;
>  
>      if ( !handler->ops->read(v, info, &r, handler->priv) )
>          return IO_ABORT;
>  
> -    /*
> -     * Sign extend if required.
> -     * Note that we expect the read handler to have zeroed the bits
> -     * outside the requested access size.
> -     */
> -    if ( dabt.sign && (r & (1UL << (size - 1))) )
> -    {
> -        /*
> -         * We are relying on register_t using the same as
> -         * an unsigned long in order to keep the 32-bit assembly
> -         * code smaller.
> -         */
> -        BUILD_BUG_ON(sizeof(register_t) != sizeof(unsigned long));
> -        r |= (~0UL) << size;
> -    }
> +    r = sign_extend(dabt, r);
>  
>      set_user_reg(regs, dabt.reg, r);
>  
> diff --git a/xen/arch/arm/ioreq.c b/xen/arch/arm/ioreq.c
> index f08190c..2f39289 100644
> --- a/xen/arch/arm/ioreq.c
> +++ b/xen/arch/arm/ioreq.c
> @@ -28,7 +28,6 @@ enum io_state handle_ioserv(struct cpu_user_regs *regs, struct vcpu *v)
>      const union hsr hsr = { .bits = regs->hsr };
>      const struct hsr_dabt dabt = hsr.dabt;
>      /* Code is similar to handle_read */
> -    uint8_t size = (1 << dabt.size) * 8;
>      register_t r = v->io.req.data;
>  
>      /* We are done with the IO */
> @@ -37,21 +36,7 @@ enum io_state handle_ioserv(struct cpu_user_regs *regs, struct vcpu *v)
>      if ( dabt.write )
>          return IO_HANDLED;
>  
> -    /*
> -     * Sign extend if required.
> -     * Note that we expect the read handler to have zeroed the bits
> -     * outside the requested access size.
> -     */
> -    if ( dabt.sign && (r & (1UL << (size - 1))) )
> -    {
> -        /*
> -         * We are relying on register_t using the same as
> -         * an unsigned long in order to keep the 32-bit assembly
> -         * code smaller.
> -         */
> -        BUILD_BUG_ON(sizeof(register_t) != sizeof(unsigned long));
> -        r |= (~0UL) << size;
> -    }
> +    r = sign_extend(dabt, r);
>  
>      set_user_reg(regs, dabt.reg, r);
>  
> diff --git a/xen/include/asm-arm/traps.h b/xen/include/asm-arm/traps.h
> index 997c378..e301c44 100644
> --- a/xen/include/asm-arm/traps.h
> +++ b/xen/include/asm-arm/traps.h
> @@ -83,6 +83,30 @@ static inline bool VABORT_GEN_BY_GUEST(const struct cpu_user_regs *regs)
>          (unsigned long)abort_guest_exit_end == regs->pc;
>  }
>  
> +/* Check whether the sign extension is required and perform it */
> +static inline register_t sign_extend(const struct hsr_dabt dabt, register_t r)
> +{
> +    uint8_t size = (1 << dabt.size) * 8;
> +
> +    /*
> +     * Sign extend if required.
> +     * Note that we expect the read handler to have zeroed the bits
> +     * outside the requested access size.
> +     */
> +    if ( dabt.sign && (r & (1UL << (size - 1))) )
> +    {
> +        /*
> +         * We are relying on register_t using the same as
> +         * an unsigned long in order to keep the 32-bit assembly
> +         * code smaller.
> +         */
> +        BUILD_BUG_ON(sizeof(register_t) != sizeof(unsigned long));
> +        r |= (~0UL) << size;

If `size` is 64, you will get undefined behavior there.

> +    }
> +
> +    return r;
> +}
> +
>  #endif /* __ASM_ARM_TRAPS__ */
>  /*
>   * Local variables:


-- 
Volodymyr Babchuk at EPAM

^ permalink raw reply	[flat|nested] 127+ messages in thread

* Re: [PATCH V3 00/23] IOREQ feature (+ virtio-mmio) on Arm
  2020-11-30 16:21 ` Alex Bennée
@ 2020-11-30 22:22   ` Oleksandr
  2020-12-29 15:32   ` Roger Pau Monné
  1 sibling, 0 replies; 127+ messages in thread
From: Oleksandr @ 2020-11-30 22:22 UTC (permalink / raw)
  To: Alex Bennée
  Cc: xen-devel, Oleksandr Tyshchenko, Paul Durrant, Jan Beulich,
	Andrew Cooper, Roger Pau Monné,
	Wei Liu, Julien Grall, George Dunlap, Ian Jackson, Julien Grall,
	Stefano Stabellini, Tim Deegan, Daniel De Graaf,
	Volodymyr Babchuk, Jun Nakajima, Kevin Tian, Anthony PERARD,
	Bertrand Marquis, Wei Chen, Kaly Xin, Artem Mygaiev


On 30.11.20 18:21, Alex Bennée wrote:

Hi Alex

[added the missing subject line]

> Oleksandr Tyshchenko <olekstysh@gmail.com> writes:
>
>> From: Oleksandr Tyshchenko <oleksandr_tyshchenko@epam.com>
>>
>>
>> Date: Sat, 28 Nov 2020 22:33:51 +0200
>> Subject: [PATCH V3 00/23] IOREQ feature (+ virtio-mmio) on Arm
>> MIME-Version: 1.0
>> Content-Type: text/plain; charset=UTF-8
>> Content-Transfer-Encoding: 8bit
>>
>> Hello all.
>>
>> The purpose of this patch series is to add IOREQ/DM support to Xen on Arm.
>> You can find an initial discussion at [1] and RFC/V1/V2 series at [2]/[3]/[4].
>> Xen on Arm requires some implementation to forward guest MMIO access to a device
>> model in order to implement virtio-mmio backend or even mediator outside of hypervisor.
>> As Xen on x86 already contains required support this series tries to make it common
>> and introduce Arm specific bits plus some new functionality. Patch series is based on
>> Julien's PoC "xen/arm: Add support for Guest IO forwarding to a device emulator".
>> Besides splitting existing IOREQ/DM support and introducing Arm side, the series
>> also includes virtio-mmio related changes (last 2 patches for toolstack)
>> for the reviewers to be able to see how the whole picture could look
>> like.
> Thanks for posting the latest version.
>
>> According to the initial discussion there are a few open questions/concerns
>> regarding security, performance in VirtIO solution:
>> 1. virtio-mmio vs virtio-pci, SPI vs MSI, different use-cases require different
>>     transport...
> I think I'm repeating things here I've said in various ephemeral video
> chats over the last few weeks but I should probably put things down on
> the record.
>
> I think the original intention of the virtio framers is advanced
> features would build on virtio-pci because you get a bunch of things
> "for free" - notably enumeration and MSI support. There is assumption
> that by the time you add these features to virtio-mmio you end up
> re-creating your own less well tested version of virtio-pci. I've not
> been terribly convinced by the argument that the guest implementation of
> PCI presents a sufficiently large blob of code to make the simpler MMIO
> desirable. My attempts to build two virtio kernels (PCI/MMIO) with
> otherwise the same devices wasn't terribly conclusive either way.
>
> That said virtio-mmio still has life in it because the cloudy slimmed
> down guests moved to using it because the enumeration of PCI is a road
> block to their fast boot up requirements. I'm sure they would also
> appreciate a MSI implementation to reduce the overhead that handling
> notifications currently has on trap-and-emulate.
>
> AIUI for Xen the other downside to PCI is you would have to emulate it
> in the hypervisor which would be additional code at the most privileged
> level.
Thank you for putting things together here and for the valuable input.
As for me, the "virtio-mmio & MSI solution" as a performance improvement
does sound interesting. Flipping through the virtio-mmio links I found a
discussion regarding that [1].
I think this needs additional investigation and experiments; however, I
am not sure the infrastructure to do so exists in Xen on Arm yet.
Once we make some progress with the IOREQ series I will be able to focus
on the enhancements we consider worthwhile.


>
>> 2. virtio backend is able to access all guest memory, some kind of protection
>>     is needed: 'virtio-iommu in Xen' vs 'pre-shared-memory & memcpys in
>>     guest'
> This is also an area of interest for Project Stratos and something we
> would like to be solved generally for all hypervisors. There is a good
> write up of some approaches that Jean Phillipe did on the stratos
> mailing list:
>
>    From: Jean-Philippe Brucker <jean-philippe@linaro.org>
>    Subject: Limited memory sharing investigation
>    Message-ID: <20201002134336.GA2196245@myrica>
>
> I suspect there is a good argument for the simplicity of a combined
> virt queue but it is unlikely to be very performance orientated.

I will look at it.


>> 3. interface between toolstack and 'out-of-qemu' virtio backend, avoid using
>>     Xenstore in virtio backend if possible.
> I wonder how much work it would be for a rust expert to make:
>
>    https://github.com/slp/vhost-user-blk
>
> handle an IOREQ signalling pathway instead of the vhost-user/eventfd
> pathway? That would give a good indication on how "hypervisor blind"
> these daemons could be made.
>
> <snip>
>> Please note, build-test passed for the following modes:
>> 1. x86: CONFIG_HVM=y / CONFIG_IOREQ_SERVER=y (default)
>> 2. x86: #CONFIG_HVM is not set / #CONFIG_IOREQ_SERVER is not set
>> 3. Arm64: CONFIG_HVM=y / CONFIG_IOREQ_SERVER=y
> Forgive my relative newness to Xen, how do I convince the hypervisor to
> build with this on? I've tried variants of:
>
>    make -j9 CROSS_COMPILE=aarch64-linux-gnu- XEN_TARGET_ARCH=arm64 menuconfig XEN_EXPERT=y [CONFIG_|XEN_|_]IOREQ_SERVER=y
CONFIG_IOREQ_SERVER is not protected by CONFIG_XEN_EXPERT. I mentioned
how to enable CONFIG_IOREQ_SERVER on Arm (since it is disabled by default
within this series) when describing to Masami how to test this series,
but forgot to add it here. Could you apply the one-line patch [2] and
rebuild? Sorry for the inconvenience.
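
Essentially it boils down to selecting the (not user-visible)
IOREQ_SERVER option for Arm, i.e. something along these lines (an
illustrative sketch only, please use the exact commit at [2]):

    # xen/arch/arm/Kconfig
    config ARM
            ...
            select IOREQ_SERVER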


[1] https://lwn.net/Articles/812055/
[2] 
https://github.com/otyshchenko1/xen/commit/b371bc9a3c954595bfce01bad244260364bbcd48

-- 
Regards,

Oleksandr Tyshchenko



^ permalink raw reply	[flat|nested] 127+ messages in thread

* Re: [PATCH V3 19/23] xen/arm: io: Abstract sign-extension
  2020-11-30 21:03   ` Volodymyr Babchuk
@ 2020-11-30 23:27     ` Oleksandr
  2020-12-01  7:55       ` Jan Beulich
  2020-12-01 10:23       ` Julien Grall
  0 siblings, 2 replies; 127+ messages in thread
From: Oleksandr @ 2020-11-30 23:27 UTC (permalink / raw)
  To: Volodymyr Babchuk
  Cc: xen-devel, Oleksandr Tyshchenko, Stefano Stabellini,
	Julien Grall, Julien Grall


On 30.11.20 23:03, Volodymyr Babchuk wrote:
> Hi,

Hi Volodymyr


>
> Oleksandr Tyshchenko writes:
>
>> From: Oleksandr Tyshchenko <oleksandr_tyshchenko@epam.com>
>>
>> In order to avoid code duplication (both handle_read() and
>> handle_ioserv() contain the same code for the sign-extension)
>> put this code to a common helper to be used for both.
>>
>> Signed-off-by: Oleksandr Tyshchenko <oleksandr_tyshchenko@epam.com>
>> CC: Julien Grall <julien.grall@arm.com>
>>
>> ---
>> Please note, this is a split/cleanup/hardening of Julien's PoC:
>> "Add support for Guest IO forwarding to a device emulator"
>>
>> Changes V1 -> V2:
>>     - new patch
>>
>> Changes V2 -> V3:
>>     - no changes
>> ---
>> ---
>>   xen/arch/arm/io.c           | 18 ++----------------
>>   xen/arch/arm/ioreq.c        | 17 +----------------
>>   xen/include/asm-arm/traps.h | 24 ++++++++++++++++++++++++
>>   3 files changed, 27 insertions(+), 32 deletions(-)
>>
>> diff --git a/xen/arch/arm/io.c b/xen/arch/arm/io.c
>> index f44cfd4..8d6ec6c 100644
>> --- a/xen/arch/arm/io.c
>> +++ b/xen/arch/arm/io.c
>> @@ -23,6 +23,7 @@
>>   #include <asm/cpuerrata.h>
>>   #include <asm/current.h>
>>   #include <asm/mmio.h>
>> +#include <asm/traps.h>
>>   #include <asm/hvm/ioreq.h>
>>   
>>   #include "decode.h"
>> @@ -39,26 +40,11 @@ static enum io_state handle_read(const struct mmio_handler *handler,
>>        * setting r).
>>        */
>>       register_t r = 0;
>> -    uint8_t size = (1 << dabt.size) * 8;
>>   
>>       if ( !handler->ops->read(v, info, &r, handler->priv) )
>>           return IO_ABORT;
>>   
>> -    /*
>> -     * Sign extend if required.
>> -     * Note that we expect the read handler to have zeroed the bits
>> -     * outside the requested access size.
>> -     */
>> -    if ( dabt.sign && (r & (1UL << (size - 1))) )
>> -    {
>> -        /*
>> -         * We are relying on register_t using the same as
>> -         * an unsigned long in order to keep the 32-bit assembly
>> -         * code smaller.
>> -         */
>> -        BUILD_BUG_ON(sizeof(register_t) != sizeof(unsigned long));
>> -        r |= (~0UL) << size;
>> -    }
>> +    r = sign_extend(dabt, r);
>>   
>>       set_user_reg(regs, dabt.reg, r);
>>   
>> diff --git a/xen/arch/arm/ioreq.c b/xen/arch/arm/ioreq.c
>> index f08190c..2f39289 100644
>> --- a/xen/arch/arm/ioreq.c
>> +++ b/xen/arch/arm/ioreq.c
>> @@ -28,7 +28,6 @@ enum io_state handle_ioserv(struct cpu_user_regs *regs, struct vcpu *v)
>>       const union hsr hsr = { .bits = regs->hsr };
>>       const struct hsr_dabt dabt = hsr.dabt;
>>       /* Code is similar to handle_read */
>> -    uint8_t size = (1 << dabt.size) * 8;
>>       register_t r = v->io.req.data;
>>   
>>       /* We are done with the IO */
>> @@ -37,21 +36,7 @@ enum io_state handle_ioserv(struct cpu_user_regs *regs, struct vcpu *v)
>>       if ( dabt.write )
>>           return IO_HANDLED;
>>   
>> -    /*
>> -     * Sign extend if required.
>> -     * Note that we expect the read handler to have zeroed the bits
>> -     * outside the requested access size.
>> -     */
>> -    if ( dabt.sign && (r & (1UL << (size - 1))) )
>> -    {
>> -        /*
>> -         * We are relying on register_t using the same as
>> -         * an unsigned long in order to keep the 32-bit assembly
>> -         * code smaller.
>> -         */
>> -        BUILD_BUG_ON(sizeof(register_t) != sizeof(unsigned long));
>> -        r |= (~0UL) << size;
>> -    }
>> +    r = sign_extend(dabt, r);
>>   
>>       set_user_reg(regs, dabt.reg, r);
>>   
>> diff --git a/xen/include/asm-arm/traps.h b/xen/include/asm-arm/traps.h
>> index 997c378..e301c44 100644
>> --- a/xen/include/asm-arm/traps.h
>> +++ b/xen/include/asm-arm/traps.h
>> @@ -83,6 +83,30 @@ static inline bool VABORT_GEN_BY_GUEST(const struct cpu_user_regs *regs)
>>           (unsigned long)abort_guest_exit_end == regs->pc;
>>   }
>>   
>> +/* Check whether the sign extension is required and perform it */
>> +static inline register_t sign_extend(const struct hsr_dabt dabt, register_t r)
>> +{
>> +    uint8_t size = (1 << dabt.size) * 8;
>> +
>> +    /*
>> +     * Sign extend if required.
>> +     * Note that we expect the read handler to have zeroed the bits
>> +     * outside the requested access size.
>> +     */
>> +    if ( dabt.sign && (r & (1UL << (size - 1))) )
>> +    {
>> +        /*
>> +         * We are relying on register_t using the same as
>> +         * an unsigned long in order to keep the 32-bit assembly
>> +         * code smaller.
>> +         */
>> +        BUILD_BUG_ON(sizeof(register_t) != sizeof(unsigned long));
>> +        r |= (~0UL) << size;
> If `size` is 64, you will get undefined behavior there.
I don't think we need to worry about undefined behavior here. size=64
would only be possible for a doubleword access (dabt.size=3), but if the
"r" adjustment gets called (i.e. the Syndrome Sign Extend bit is set)
then we are dealing with a byte, halfword or word operation (dabt.size<3).
Or did I miss something?


-- 
Regards,

Oleksandr Tyshchenko



^ permalink raw reply	[flat|nested] 127+ messages in thread

* Re: [PATCH V3 19/23] xen/arm: io: Abstract sign-extension
  2020-11-30 23:27     ` Oleksandr
@ 2020-12-01  7:55       ` Jan Beulich
  2020-12-01 10:30         ` Julien Grall
  2020-12-01 10:23       ` Julien Grall
  1 sibling, 1 reply; 127+ messages in thread
From: Jan Beulich @ 2020-12-01  7:55 UTC (permalink / raw)
  To: Oleksandr
  Cc: xen-devel, Oleksandr Tyshchenko, Stefano Stabellini,
	Julien Grall, Julien Grall, Volodymyr Babchuk

On 01.12.2020 00:27, Oleksandr wrote:
> On 30.11.20 23:03, Volodymyr Babchuk wrote:
>> Oleksandr Tyshchenko writes:
>>> --- a/xen/include/asm-arm/traps.h
>>> +++ b/xen/include/asm-arm/traps.h
>>> @@ -83,6 +83,30 @@ static inline bool VABORT_GEN_BY_GUEST(const struct cpu_user_regs *regs)
>>>           (unsigned long)abort_guest_exit_end == regs->pc;
>>>   }
>>>   
>>> +/* Check whether the sign extension is required and perform it */
>>> +static inline register_t sign_extend(const struct hsr_dabt dabt, register_t r)
>>> +{
>>> +    uint8_t size = (1 << dabt.size) * 8;
>>> +
>>> +    /*
>>> +     * Sign extend if required.
>>> +     * Note that we expect the read handler to have zeroed the bits
>>> +     * outside the requested access size.
>>> +     */
>>> +    if ( dabt.sign && (r & (1UL << (size - 1))) )
>>> +    {
>>> +        /*
>>> +         * We are relying on register_t using the same as
>>> +         * an unsigned long in order to keep the 32-bit assembly
>>> +         * code smaller.
>>> +         */
>>> +        BUILD_BUG_ON(sizeof(register_t) != sizeof(unsigned long));
>>> +        r |= (~0UL) << size;
>> If `size` is 64, you will get undefined behavior there.
> I think, we don't need to worry about undefined behavior here. Having 
> size=64 would be possible with doubleword (dabt.size=3). But if "r" 
> adjustment gets called (I mean Syndrome Sign Extend bit is set) then
> we deal with byte, halfword or word operations (dabt.size<3). Or I 
> missed something?

At which point please put in a respective ASSERT(), possibly amended
by a brief comment.
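
E.g. something along these lines (untested, comment wording of course
up to you):

    if ( dabt.sign && (r & (1UL << (size - 1))) )
    {
        /*
         * Sign extension can only be requested for accesses narrower
         * than the register width (byte/half-word on arm32, plus word
         * on arm64), hence "size" is always strictly less than the
         * width of unsigned long and the shift below is well defined.
         */
        ASSERT(size < sizeof(register_t) * 8);

        BUILD_BUG_ON(sizeof(register_t) != sizeof(unsigned long));
        r |= (~0UL) << size;
    }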

Jan


^ permalink raw reply	[flat|nested] 127+ messages in thread

* Re: [PATCH V3 19/23] xen/arm: io: Abstract sign-extension
  2020-11-30 23:27     ` Oleksandr
  2020-12-01  7:55       ` Jan Beulich
@ 2020-12-01 10:23       ` Julien Grall
  1 sibling, 0 replies; 127+ messages in thread
From: Julien Grall @ 2020-12-01 10:23 UTC (permalink / raw)
  To: Oleksandr, Volodymyr Babchuk
  Cc: xen-devel, Oleksandr Tyshchenko, Stefano Stabellini, Julien Grall



On 30/11/2020 23:27, Oleksandr wrote:
> 
> On 30.11.20 23:03, Volodymyr Babchuk wrote:
>> Hi,
> 
> Hi Volodymyr
> 
> 
>>
>> Oleksandr Tyshchenko writes:
>>
>>> From: Oleksandr Tyshchenko <oleksandr_tyshchenko@epam.com>
>>>
>>> In order to avoid code duplication (both handle_read() and
>>> handle_ioserv() contain the same code for the sign-extension)
>>> put this code to a common helper to be used for both.
>>>
>>> Signed-off-by: Oleksandr Tyshchenko <oleksandr_tyshchenko@epam.com>
>>> CC: Julien Grall <julien.grall@arm.com>
>>>
>>> ---
>>> Please note, this is a split/cleanup/hardening of Julien's PoC:
>>> "Add support for Guest IO forwarding to a device emulator"
>>>
>>> Changes V1 -> V2:
>>>     - new patch
>>>
>>> Changes V2 -> V3:
>>>     - no changes
>>> ---
>>> ---
>>>   xen/arch/arm/io.c           | 18 ++----------------
>>>   xen/arch/arm/ioreq.c        | 17 +----------------
>>>   xen/include/asm-arm/traps.h | 24 ++++++++++++++++++++++++
>>>   3 files changed, 27 insertions(+), 32 deletions(-)
>>>
>>> diff --git a/xen/arch/arm/io.c b/xen/arch/arm/io.c
>>> index f44cfd4..8d6ec6c 100644
>>> --- a/xen/arch/arm/io.c
>>> +++ b/xen/arch/arm/io.c
>>> @@ -23,6 +23,7 @@
>>>   #include <asm/cpuerrata.h>
>>>   #include <asm/current.h>
>>>   #include <asm/mmio.h>
>>> +#include <asm/traps.h>
>>>   #include <asm/hvm/ioreq.h>
>>>   #include "decode.h"
>>> @@ -39,26 +40,11 @@ static enum io_state handle_read(const struct 
>>> mmio_handler *handler,
>>>        * setting r).
>>>        */
>>>       register_t r = 0;
>>> -    uint8_t size = (1 << dabt.size) * 8;
>>>       if ( !handler->ops->read(v, info, &r, handler->priv) )
>>>           return IO_ABORT;
>>> -    /*
>>> -     * Sign extend if required.
>>> -     * Note that we expect the read handler to have zeroed the bits
>>> -     * outside the requested access size.
>>> -     */
>>> -    if ( dabt.sign && (r & (1UL << (size - 1))) )
>>> -    {
>>> -        /*
>>> -         * We are relying on register_t using the same as
>>> -         * an unsigned long in order to keep the 32-bit assembly
>>> -         * code smaller.
>>> -         */
>>> -        BUILD_BUG_ON(sizeof(register_t) != sizeof(unsigned long));
>>> -        r |= (~0UL) << size;
>>> -    }
>>> +    r = sign_extend(dabt, r);
>>>       set_user_reg(regs, dabt.reg, r);
>>> diff --git a/xen/arch/arm/ioreq.c b/xen/arch/arm/ioreq.c
>>> index f08190c..2f39289 100644
>>> --- a/xen/arch/arm/ioreq.c
>>> +++ b/xen/arch/arm/ioreq.c
>>> @@ -28,7 +28,6 @@ enum io_state handle_ioserv(struct cpu_user_regs 
>>> *regs, struct vcpu *v)
>>>       const union hsr hsr = { .bits = regs->hsr };
>>>       const struct hsr_dabt dabt = hsr.dabt;
>>>       /* Code is similar to handle_read */
>>> -    uint8_t size = (1 << dabt.size) * 8;
>>>       register_t r = v->io.req.data;
>>>       /* We are done with the IO */
>>> @@ -37,21 +36,7 @@ enum io_state handle_ioserv(struct cpu_user_regs 
>>> *regs, struct vcpu *v)
>>>       if ( dabt.write )
>>>           return IO_HANDLED;
>>> -    /*
>>> -     * Sign extend if required.
>>> -     * Note that we expect the read handler to have zeroed the bits
>>> -     * outside the requested access size.
>>> -     */
>>> -    if ( dabt.sign && (r & (1UL << (size - 1))) )
>>> -    {
>>> -        /*
>>> -         * We are relying on register_t using the same as
>>> -         * an unsigned long in order to keep the 32-bit assembly
>>> -         * code smaller.
>>> -         */
>>> -        BUILD_BUG_ON(sizeof(register_t) != sizeof(unsigned long));
>>> -        r |= (~0UL) << size;
>>> -    }
>>> +    r = sign_extend(dabt, r);
>>>       set_user_reg(regs, dabt.reg, r);
>>> diff --git a/xen/include/asm-arm/traps.h b/xen/include/asm-arm/traps.h
>>> index 997c378..e301c44 100644
>>> --- a/xen/include/asm-arm/traps.h
>>> +++ b/xen/include/asm-arm/traps.h
>>> @@ -83,6 +83,30 @@ static inline bool VABORT_GEN_BY_GUEST(const 
>>> struct cpu_user_regs *regs)
>>>           (unsigned long)abort_guest_exit_end == regs->pc;
>>>   }
>>> +/* Check whether the sign extension is required and perform it */
>>> +static inline register_t sign_extend(const struct hsr_dabt dabt, 
>>> register_t r)
>>> +{
>>> +    uint8_t size = (1 << dabt.size) * 8;
>>> +
>>> +    /*
>>> +     * Sign extend if required.
>>> +     * Note that we expect the read handler to have zeroed the bits
>>> +     * outside the requested access size.
>>> +     */
>>> +    if ( dabt.sign && (r & (1UL << (size - 1))) )
>>> +    {
>>> +        /*
>>> +         * We are relying on register_t using the same as
>>> +         * an unsigned long in order to keep the 32-bit assembly
>>> +         * code smaller.
>>> +         */
>>> +        BUILD_BUG_ON(sizeof(register_t) != sizeof(unsigned long));
>>> +        r |= (~0UL) << size;
>> If `size` is 64, you will get undefined behavior there.
> I think, we don't need to worry about undefined behavior here. Having 
> size=64 would be possible with doubleword (dabt.size=3). But if "r" 
> adjustment gets called (I mean Syndrome Sign Extend bit is set) then
> we deal with byte, halfword or word operations (dabt.size<3). Or I 
> missed something?

This is known and was pointed out in the commit message introducing the 
sign-extension:

"Note that the bit can only be set for access size smaller than the 
register size (i.e byte/half-word for aarch32, byte/half-word/word for 
aarch32). So we don't have to worry about undefined C behavior."

Cheers,

-- 
Julien Grall


^ permalink raw reply	[flat|nested] 127+ messages in thread

* Re: [PATCH V3 19/23] xen/arm: io: Abstract sign-extension
  2020-12-01  7:55       ` Jan Beulich
@ 2020-12-01 10:30         ` Julien Grall
  2020-12-01 10:42           ` Oleksandr
  2020-12-01 10:49           ` Jan Beulich
  0 siblings, 2 replies; 127+ messages in thread
From: Julien Grall @ 2020-12-01 10:30 UTC (permalink / raw)
  To: Jan Beulich, Oleksandr
  Cc: xen-devel, Oleksandr Tyshchenko, Stefano Stabellini,
	Julien Grall, Volodymyr Babchuk

Hi Jan,

On 01/12/2020 07:55, Jan Beulich wrote:
> On 01.12.2020 00:27, Oleksandr wrote:
>> On 30.11.20 23:03, Volodymyr Babchuk wrote:
>>> Oleksandr Tyshchenko writes:
>>>> --- a/xen/include/asm-arm/traps.h
>>>> +++ b/xen/include/asm-arm/traps.h
>>>> @@ -83,6 +83,30 @@ static inline bool VABORT_GEN_BY_GUEST(const struct cpu_user_regs *regs)
>>>>            (unsigned long)abort_guest_exit_end == regs->pc;
>>>>    }
>>>>    
>>>> +/* Check whether the sign extension is required and perform it */
>>>> +static inline register_t sign_extend(const struct hsr_dabt dabt, register_t r)
>>>> +{
>>>> +    uint8_t size = (1 << dabt.size) * 8;
>>>> +
>>>> +    /*
>>>> +     * Sign extend if required.
>>>> +     * Note that we expect the read handler to have zeroed the bits
>>>> +     * outside the requested access size.
>>>> +     */
>>>> +    if ( dabt.sign && (r & (1UL << (size - 1))) )
>>>> +    {
>>>> +        /*
>>>> +         * We are relying on register_t using the same as
>>>> +         * an unsigned long in order to keep the 32-bit assembly
>>>> +         * code smaller.
>>>> +         */
>>>> +        BUILD_BUG_ON(sizeof(register_t) != sizeof(unsigned long));
>>>> +        r |= (~0UL) << size;
>>> If `size` is 64, you will get undefined behavior there.
>> I think, we don't need to worry about undefined behavior here. Having
>> size=64 would be possible with doubleword (dabt.size=3). But if "r"
>> adjustment gets called (I mean Syndrome Sign Extend bit is set) then
>> we deal with byte, halfword or word operations (dabt.size<3). Or I
>> missed something?
> 
> At which point please put in a respective ASSERT(), possibly amended
> by a brief comment.

ASSERT()s are only meant to catch programmatic errors. However, in this 
case, the bigger risk is a hardware bug, such as advertising a sign 
extension for a 64-bit (resp. 32-bit) access on Arm64 (resp. Arm32).

Actually the Armv8 spec is a bit more blurry when running in AArch32 
state because it suggests that the sign extension can be set even for 
32-bit accesses. I think this is a mistake in the spec, but it is 
probably better to be cautious here.

Therefore, I would recommend reworking the code so it is only called 
when len < sizeof(register_t).

Cheers,

-- 
Julien Grall


^ permalink raw reply	[flat|nested] 127+ messages in thread

* Re: [PATCH V3 19/23] xen/arm: io: Abstract sign-extension
  2020-12-01 10:30         ` Julien Grall
@ 2020-12-01 10:42           ` Oleksandr
  2020-12-01 12:13             ` Julien Grall
  2020-12-01 10:49           ` Jan Beulich
  1 sibling, 1 reply; 127+ messages in thread
From: Oleksandr @ 2020-12-01 10:42 UTC (permalink / raw)
  To: Julien Grall
  Cc: Jan Beulich, xen-devel, Oleksandr Tyshchenko, Stefano Stabellini,
	Julien Grall, Volodymyr Babchuk


On 01.12.20 12:30, Julien Grall wrote:

Hi Julien

> Hi Jan,
>
> On 01/12/2020 07:55, Jan Beulich wrote:
>> On 01.12.2020 00:27, Oleksandr wrote:
>>> On 30.11.20 23:03, Volodymyr Babchuk wrote:
>>>> Oleksandr Tyshchenko writes:
>>>>> --- a/xen/include/asm-arm/traps.h
>>>>> +++ b/xen/include/asm-arm/traps.h
>>>>> @@ -83,6 +83,30 @@ static inline bool VABORT_GEN_BY_GUEST(const 
>>>>> struct cpu_user_regs *regs)
>>>>>            (unsigned long)abort_guest_exit_end == regs->pc;
>>>>>    }
>>>>>    +/* Check whether the sign extension is required and perform it */
>>>>> +static inline register_t sign_extend(const struct hsr_dabt dabt, 
>>>>> register_t r)
>>>>> +{
>>>>> +    uint8_t size = (1 << dabt.size) * 8;
>>>>> +
>>>>> +    /*
>>>>> +     * Sign extend if required.
>>>>> +     * Note that we expect the read handler to have zeroed the bits
>>>>> +     * outside the requested access size.
>>>>> +     */
>>>>> +    if ( dabt.sign && (r & (1UL << (size - 1))) )
>>>>> +    {
>>>>> +        /*
>>>>> +         * We are relying on register_t using the same as
>>>>> +         * an unsigned long in order to keep the 32-bit assembly
>>>>> +         * code smaller.
>>>>> +         */
>>>>> +        BUILD_BUG_ON(sizeof(register_t) != sizeof(unsigned long));
>>>>> +        r |= (~0UL) << size;
>>>> If `size` is 64, you will get undefined behavior there.
>>> I think, we don't need to worry about undefined behavior here. Having
>>> size=64 would be possible with doubleword (dabt.size=3). But if "r"
>>> adjustment gets called (I mean Syndrome Sign Extend bit is set) then
>>> we deal with byte, halfword or word operations (dabt.size<3). Or I
>>> missed something?
>>
>> At which point please put in a respective ASSERT(), possibly amended
>> by a brief comment.
>
> ASSERT()s are only meant to catch programatic error. However, in this 
> case, the bigger risk is an hardware bug such as advertising a sign 
> extension for either 64-bit (or 32-bit) on Arm64 (resp. Arm32).
>
> Actually the Armv8 spec is a bit more blurry when running in AArch32 
> state because they suggest that the sign extension can be set even for 
> 32-bit access. I think this is a spelling mistake, but it is probably 
> better to be cautious here.
>
> Therefore, I would recommend to rework the code so it is only called 
> when len < sizeof(register_t).

I am not sure I understand the recommendation, could you please clarify 
(also I don't see 'len' being used here).


-- 
Regards,

Oleksandr Tyshchenko



^ permalink raw reply	[flat|nested] 127+ messages in thread

* Re: [PATCH V3 19/23] xen/arm: io: Abstract sign-extension
  2020-12-01 10:30         ` Julien Grall
  2020-12-01 10:42           ` Oleksandr
@ 2020-12-01 10:49           ` Jan Beulich
  1 sibling, 0 replies; 127+ messages in thread
From: Jan Beulich @ 2020-12-01 10:49 UTC (permalink / raw)
  To: Julien Grall, Oleksandr
  Cc: xen-devel, Oleksandr Tyshchenko, Stefano Stabellini,
	Julien Grall, Volodymyr Babchuk

On 01.12.2020 11:30, Julien Grall wrote:
> Hi Jan,
> 
> On 01/12/2020 07:55, Jan Beulich wrote:
>> On 01.12.2020 00:27, Oleksandr wrote:
>>> On 30.11.20 23:03, Volodymyr Babchuk wrote:
>>>> Oleksandr Tyshchenko writes:
>>>>> --- a/xen/include/asm-arm/traps.h
>>>>> +++ b/xen/include/asm-arm/traps.h
>>>>> @@ -83,6 +83,30 @@ static inline bool VABORT_GEN_BY_GUEST(const struct cpu_user_regs *regs)
>>>>>            (unsigned long)abort_guest_exit_end == regs->pc;
>>>>>    }
>>>>>    
>>>>> +/* Check whether the sign extension is required and perform it */
>>>>> +static inline register_t sign_extend(const struct hsr_dabt dabt, register_t r)
>>>>> +{
>>>>> +    uint8_t size = (1 << dabt.size) * 8;
>>>>> +
>>>>> +    /*
>>>>> +     * Sign extend if required.
>>>>> +     * Note that we expect the read handler to have zeroed the bits
>>>>> +     * outside the requested access size.
>>>>> +     */
>>>>> +    if ( dabt.sign && (r & (1UL << (size - 1))) )
>>>>> +    {
>>>>> +        /*
>>>>> +         * We are relying on register_t using the same as
>>>>> +         * an unsigned long in order to keep the 32-bit assembly
>>>>> +         * code smaller.
>>>>> +         */
>>>>> +        BUILD_BUG_ON(sizeof(register_t) != sizeof(unsigned long));
>>>>> +        r |= (~0UL) << size;
>>>> If `size` is 64, you will get undefined behavior there.
>>> I think, we don't need to worry about undefined behavior here. Having
>>> size=64 would be possible with doubleword (dabt.size=3). But if "r"
>>> adjustment gets called (I mean Syndrome Sign Extend bit is set) then
>>> we deal with byte, halfword or word operations (dabt.size<3). Or I
>>> missed something?
>>
>> At which point please put in a respective ASSERT(), possibly amended
>> by a brief comment.
> 
> ASSERT()s are only meant to catch programatic error. However, in this 
> case, the bigger risk is an hardware bug such as advertising a sign 
> extension for either 64-bit (or 32-bit) on Arm64 (resp. Arm32).
> 
> Actually the Armv8 spec is a bit more blurry when running in AArch32 
> state because they suggest that the sign extension can be set even for 
> 32-bit access. I think this is a spelling mistake, but it is probably 
> better to be cautious here.
> 
> Therefore, I would recommend to rework the code so it is only called 
> when len < sizeof(register_t).

This would be even better in this case, I agree.

Jan


^ permalink raw reply	[flat|nested] 127+ messages in thread

* Re: [PATCH V3 01/23] x86/ioreq: Prepare IOREQ feature for making it common
  2020-11-30 10:31 ` [PATCH V3 01/23] x86/ioreq: Prepare IOREQ feature for making it common Oleksandr Tyshchenko
@ 2020-12-01 11:03   ` Alex Bennée
  2020-12-01 18:53     ` Oleksandr
  2020-12-07 11:13   ` Jan Beulich
  1 sibling, 1 reply; 127+ messages in thread
From: Alex Bennée @ 2020-12-01 11:03 UTC (permalink / raw)
  To: Oleksandr Tyshchenko
  Cc: Oleksandr Tyshchenko, Paul Durrant, Jan Beulich, Andrew Cooper,
	Roger Pau Monné,
	Wei Liu, Julien Grall, Stefano Stabellini, Julien Grall,
	xen-devel


Oleksandr Tyshchenko <olekstysh@gmail.com> writes:

> From: Oleksandr Tyshchenko <oleksandr_tyshchenko@epam.com>
>
> As a lot of x86 code can be re-used on Arm later on, this
> patch makes some preparation to x86/hvm/ioreq.c before moving
> to the common code. This way we will get a verbatim copy
<snip>
>
> It worth mentioning that a code which checks the return value of
> p2m_set_ioreq_server() in hvm_map_mem_type_to_ioreq_server() was
> folded into arch_ioreq_server_map_mem_type() for the clear split.
> So the p2m_change_entry_type_global() is called with ioreq_server
> lock held.
<snip>
>  
> +/* Called with ioreq_server lock held */
> +int arch_ioreq_server_map_mem_type(struct domain *d,
> +                                   struct hvm_ioreq_server *s,
> +                                   uint32_t flags)
> +{
> +    int rc = p2m_set_ioreq_server(d, flags, s);
> +
> +    if ( rc == 0 && flags == 0 )
> +    {
> +        const struct p2m_domain *p2m = p2m_get_hostp2m(d);
> +
> +        if ( read_atomic(&p2m->ioreq.entry_count) )
> +            p2m_change_entry_type_global(d, p2m_ioreq_server, p2m_ram_rw);
> +    }
> +
> +    return rc;
> +}
> +
>  /*
>   * Map or unmap an ioreq server to specific memory type. For now, only
>   * HVMMEM_ioreq_server is supported, and in the future new types can be
> @@ -1112,19 +1155,11 @@ int hvm_map_mem_type_to_ioreq_server(struct domain *d, ioservid_t id,
>      if ( s->emulator != current->domain )
>          goto out;
>  
> -    rc = p2m_set_ioreq_server(d, flags, s);
> +    rc = arch_ioreq_server_map_mem_type(d, s, flags);
>  
>   out:
>      spin_unlock_recursive(&d->arch.hvm.ioreq_server.lock);
>  
> -    if ( rc == 0 && flags == 0 )
> -    {
> -        struct p2m_domain *p2m = p2m_get_hostp2m(d);
> -
> -        if ( read_atomic(&p2m->ioreq.entry_count) )
> -            p2m_change_entry_type_global(d, p2m_ioreq_server, p2m_ram_rw);
> -    }
> -

It should be noted that p2m holds its own lock, but I'm unfamiliar with
Xen's locking architecture. Is there anything that prevents another vCPU
from accessing a page that is also being used by ioreq on the first vCPU?

Assuming that deadlock isn't a possibility, to my relatively untrained
eye this looks good to me:

Reviewed-by: Alex Bennée <alex.bennee@linaro.org>

-- 
Alex Bennée


^ permalink raw reply	[flat|nested] 127+ messages in thread

* Re: [PATCH V3 02/23] x86/ioreq: Add IOREQ_STATUS_* #define-s and update code for moving
  2020-11-30 10:31 ` [PATCH V3 02/23] x86/ioreq: Add IOREQ_STATUS_* #define-s and update code for moving Oleksandr Tyshchenko
@ 2020-12-01 11:07   ` Alex Bennée
  2020-12-07 11:19   ` Jan Beulich
  1 sibling, 0 replies; 127+ messages in thread
From: Alex Bennée @ 2020-12-01 11:07 UTC (permalink / raw)
  To: Oleksandr Tyshchenko
  Cc: Oleksandr Tyshchenko, Paul Durrant, Jan Beulich, Andrew Cooper,
	Roger Pau Monné,
	Wei Liu, Julien Grall, Stefano Stabellini, Julien Grall,
	xen-devel


Oleksandr Tyshchenko <olekstysh@gmail.com> writes:

> From: Oleksandr Tyshchenko <oleksandr_tyshchenko@epam.com>
>
> This patch continues to make some preparation to x86/hvm/ioreq.c
> before moving to the common code.
>
> Add IOREQ_STATUS_* #define-s and update candidates for moving
> since X86EMUL_* shouldn't be exposed to the common code in
> that form.
>
> This support is going to be used on Arm to be able to run a device
> emulator outside of the Xen hypervisor.
>
> Signed-off-by: Oleksandr Tyshchenko <oleksandr_tyshchenko@epam.com>
> CC: Julien Grall <julien.grall@arm.com>

Reviewed-by: Alex Bennée <alex.bennee@linaro.org>

-- 
Alex Bennée


^ permalink raw reply	[flat|nested] 127+ messages in thread

* Re: [PATCH V3 19/23] xen/arm: io: Abstract sign-extension
  2020-12-01 10:42           ` Oleksandr
@ 2020-12-01 12:13             ` Julien Grall
  2020-12-01 12:24               ` Oleksandr
  0 siblings, 1 reply; 127+ messages in thread
From: Julien Grall @ 2020-12-01 12:13 UTC (permalink / raw)
  To: Oleksandr
  Cc: Jan Beulich, xen-devel, Oleksandr Tyshchenko, Stefano Stabellini,
	Julien Grall, Volodymyr Babchuk

Hi Oleksandr,

On 01/12/2020 10:42, Oleksandr wrote:
> 
> On 01.12.20 12:30, Julien Grall wrote:
> 
> Hi Julien
> 
>> Hi Jan,
>>
>> On 01/12/2020 07:55, Jan Beulich wrote:
>>> On 01.12.2020 00:27, Oleksandr wrote:
>>>> On 30.11.20 23:03, Volodymyr Babchuk wrote:
>>>>> Oleksandr Tyshchenko writes:
>>>>>> --- a/xen/include/asm-arm/traps.h
>>>>>> +++ b/xen/include/asm-arm/traps.h
>>>>>> @@ -83,6 +83,30 @@ static inline bool VABORT_GEN_BY_GUEST(const 
>>>>>> struct cpu_user_regs *regs)
>>>>>>            (unsigned long)abort_guest_exit_end == regs->pc;
>>>>>>    }
>>>>>>    +/* Check whether the sign extension is required and perform it */
>>>>>> +static inline register_t sign_extend(const struct hsr_dabt dabt, 
>>>>>> register_t r)
>>>>>> +{
>>>>>> +    uint8_t size = (1 << dabt.size) * 8;
>>>>>> +
>>>>>> +    /*
>>>>>> +     * Sign extend if required.
>>>>>> +     * Note that we expect the read handler to have zeroed the bits
>>>>>> +     * outside the requested access size.
>>>>>> +     */
>>>>>> +    if ( dabt.sign && (r & (1UL << (size - 1))) )
>>>>>> +    {
>>>>>> +        /*
>>>>>> +         * We are relying on register_t using the same as
>>>>>> +         * an unsigned long in order to keep the 32-bit assembly
>>>>>> +         * code smaller.
>>>>>> +         */
>>>>>> +        BUILD_BUG_ON(sizeof(register_t) != sizeof(unsigned long));
>>>>>> +        r |= (~0UL) << size;
>>>>> If `size` is 64, you will get undefined behavior there.
>>>> I think, we don't need to worry about undefined behavior here. Having
>>>> size=64 would be possible with doubleword (dabt.size=3). But if "r"
>>>> adjustment gets called (I mean Syndrome Sign Extend bit is set) then
>>>> we deal with byte, halfword or word operations (dabt.size<3). Or I
>>>> missed something?
>>>
>>> At which point please put in a respective ASSERT(), possibly amended
>>> by a brief comment.
>>
>> ASSERT()s are only meant to catch programatic error. However, in this 
>> case, the bigger risk is an hardware bug such as advertising a sign 
>> extension for either 64-bit (or 32-bit) on Arm64 (resp. Arm32).
>>
>> Actually the Armv8 spec is a bit more blurry when running in AArch32 
>> state because they suggest that the sign extension can be set even for 
>> 32-bit access. I think this is a spelling mistake, but it is probably 
>> better to be cautious here.
>>
>> Therefore, I would recommend to rework the code so it is only called 
>> when len < sizeof(register_t).
> 
> I am not sure I understand the recommendation, could you please clarify 
> (also I don't see 'len' being used here).

Sorry I meant 'size'. I think something like:

if ( dabt.sign && (size < sizeof(register_t)) &&
      (r & (1UL << (size - 1)) )
{
}

Another possibility would be:

if ( dabt.sign && (size < sizeof(register_t)) )
{
    /* find whether the sign bit is set and propagate it */
}

I have a slight preference for the latter as the "if" is easier to read.
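Completely untested, but with the latter the helper might then end up
looking something like this (note that `size` in the current code is in
bits, hence the "* 8" when comparing against the register width):

static inline register_t sign_extend(const struct hsr_dabt dabt, register_t r)
{
    uint8_t size = (1 << dabt.size) * 8;

    /* Only sign extend accesses narrower than the register width. */
    if ( dabt.sign && (size < sizeof(register_t) * 8) )
    {
        /* Find whether the sign bit is set and propagate it. */
        if ( r & (1UL << (size - 1)) )
        {
            /*
             * We are relying on register_t being the same size as
             * unsigned long in order to keep the 32-bit assembly
             * code smaller.
             */
            BUILD_BUG_ON(sizeof(register_t) != sizeof(unsigned long));
            r |= (~0UL) << size;
        }
    }

    return r;
}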

In any case, I think this change should be done in a separate patch (I 
don't mind whether this is done after or before this one).

Cheers,

-- 
Julien Grall


^ permalink raw reply	[flat|nested] 127+ messages in thread

* Re: [PATCH V3 19/23] xen/arm: io: Abstract sign-extension
  2020-12-01 12:13             ` Julien Grall
@ 2020-12-01 12:24               ` Oleksandr
  2020-12-01 12:28                 ` Julien Grall
  0 siblings, 1 reply; 127+ messages in thread
From: Oleksandr @ 2020-12-01 12:24 UTC (permalink / raw)
  To: Julien Grall
  Cc: Jan Beulich, xen-devel, Oleksandr Tyshchenko, Stefano Stabellini,
	Julien Grall, Volodymyr Babchuk


On 01.12.20 14:13, Julien Grall wrote:
> Hi Oleksandr,

Hi Julien.


>
>>>>>>> --- a/xen/include/asm-arm/traps.h
>>>>>>> +++ b/xen/include/asm-arm/traps.h
>>>>>>> @@ -83,6 +83,30 @@ static inline bool VABORT_GEN_BY_GUEST(const 
>>>>>>> struct cpu_user_regs *regs)
>>>>>>>            (unsigned long)abort_guest_exit_end == regs->pc;
>>>>>>>    }
>>>>>>>    +/* Check whether the sign extension is required and perform 
>>>>>>> it */
>>>>>>> +static inline register_t sign_extend(const struct hsr_dabt 
>>>>>>> dabt, register_t r)
>>>>>>> +{
>>>>>>> +    uint8_t size = (1 << dabt.size) * 8;
>>>>>>> +
>>>>>>> +    /*
>>>>>>> +     * Sign extend if required.
>>>>>>> +     * Note that we expect the read handler to have zeroed the 
>>>>>>> bits
>>>>>>> +     * outside the requested access size.
>>>>>>> +     */
>>>>>>> +    if ( dabt.sign && (r & (1UL << (size - 1))) )
>>>>>>> +    {
>>>>>>> +        /*
>>>>>>> +         * We are relying on register_t using the same as
>>>>>>> +         * an unsigned long in order to keep the 32-bit assembly
>>>>>>> +         * code smaller.
>>>>>>> +         */
>>>>>>> +        BUILD_BUG_ON(sizeof(register_t) != sizeof(unsigned long));
>>>>>>> +        r |= (~0UL) << size;
>>>>>> If `size` is 64, you will get undefined behavior there.
>>>>> I think, we don't need to worry about undefined behavior here. Having
>>>>> size=64 would be possible with doubleword (dabt.size=3). But if "r"
>>>>> adjustment gets called (I mean Syndrome Sign Extend bit is set) then
>>>>> we deal with byte, halfword or word operations (dabt.size<3). Or I
>>>>> missed something?
>>>>
>>>> At which point please put in a respective ASSERT(), possibly amended
>>>> by a brief comment.
>>>
>>> ASSERT()s are only meant to catch programatic error. However, in 
>>> this case, the bigger risk is an hardware bug such as advertising a 
>>> sign extension for either 64-bit (or 32-bit) on Arm64 (resp. Arm32).
>>>
>>> Actually the Armv8 spec is a bit more blurry when running in AArch32 
>>> state because they suggest that the sign extension can be set even 
>>> for 32-bit access. I think this is a spelling mistake, but it is 
>>> probably better to be cautious here.
>>>
>>> Therefore, I would recommend to rework the code so it is only called 
>>> when len < sizeof(register_t).
>>
>> I am not sure I understand the recommendation, could you please 
>> clarify (also I don't see 'len' being used here).
>
> Sorry I meant 'size'. I think something like:
>
> if ( dabt.sign && (size < sizeof(register_t)) &&
>      (r & (1UL << (size - 1)) )
> {
> }
>
> Another posibility would be:
>
> if ( dabt.sign && (size < sizeof(register_t)) )
> {
>    /* find whether the sign bit is set and propagate it */
> }
>
> I have a slight preference for the latter as the "if" is easier to read.
>
> In any case, I think this change should be done in a separate patch (I 
> don't mint whether this is done after or before this one).

ok, I got it, thank you for the clarification. Of course, I will do that 
in a separate patch, since the current one is to avoid code duplication 
only. BTW, do you have comments on this patch itself?

-- 
Regards,

Oleksandr Tyshchenko



^ permalink raw reply	[flat|nested] 127+ messages in thread

* Re: [PATCH V3 19/23] xen/arm: io: Abstract sign-extension
  2020-12-01 12:24               ` Oleksandr
@ 2020-12-01 12:28                 ` Julien Grall
  0 siblings, 0 replies; 127+ messages in thread
From: Julien Grall @ 2020-12-01 12:28 UTC (permalink / raw)
  To: Oleksandr
  Cc: Jan Beulich, xen-devel, Oleksandr Tyshchenko, Stefano Stabellini,
	Julien Grall, Volodymyr Babchuk



On 01/12/2020 12:24, Oleksandr wrote:
> 
> On 01.12.20 14:13, Julien Grall wrote:
>> Hi Oleksandr,
> 
> Hi Julien.
> 
> 
>>
>>>>>>>> --- a/xen/include/asm-arm/traps.h
>>>>>>>> +++ b/xen/include/asm-arm/traps.h
>>>>>>>> @@ -83,6 +83,30 @@ static inline bool VABORT_GEN_BY_GUEST(const 
>>>>>>>> struct cpu_user_regs *regs)
>>>>>>>>            (unsigned long)abort_guest_exit_end == regs->pc;
>>>>>>>>    }
>>>>>>>>    +/* Check whether the sign extension is required and perform 
>>>>>>>> it */
>>>>>>>> +static inline register_t sign_extend(const struct hsr_dabt 
>>>>>>>> dabt, register_t r)
>>>>>>>> +{
>>>>>>>> +    uint8_t size = (1 << dabt.size) * 8;
>>>>>>>> +
>>>>>>>> +    /*
>>>>>>>> +     * Sign extend if required.
>>>>>>>> +     * Note that we expect the read handler to have zeroed the 
>>>>>>>> bits
>>>>>>>> +     * outside the requested access size.
>>>>>>>> +     */
>>>>>>>> +    if ( dabt.sign && (r & (1UL << (size - 1))) )
>>>>>>>> +    {
>>>>>>>> +        /*
>>>>>>>> +         * We are relying on register_t using the same as
>>>>>>>> +         * an unsigned long in order to keep the 32-bit assembly
>>>>>>>> +         * code smaller.
>>>>>>>> +         */
>>>>>>>> +        BUILD_BUG_ON(sizeof(register_t) != sizeof(unsigned long));
>>>>>>>> +        r |= (~0UL) << size;
>>>>>>> If `size` is 64, you will get undefined behavior there.
>>>>>> I think, we don't need to worry about undefined behavior here. Having
>>>>>> size=64 would be possible with doubleword (dabt.size=3). But if "r"
>>>>>> adjustment gets called (I mean Syndrome Sign Extend bit is set) then
>>>>>> we deal with byte, halfword or word operations (dabt.size<3). Or I
>>>>>> missed something?
>>>>>
>>>>> At which point please put in a respective ASSERT(), possibly amended
>>>>> by a brief comment.
>>>>
>>>> ASSERT()s are only meant to catch programatic error. However, in 
>>>> this case, the bigger risk is an hardware bug such as advertising a 
>>>> sign extension for either 64-bit (or 32-bit) on Arm64 (resp. Arm32).
>>>>
>>>> Actually the Armv8 spec is a bit more blurry when running in AArch32 
>>>> state because they suggest that the sign extension can be set even 
>>>> for 32-bit access. I think this is a spelling mistake, but it is 
>>>> probably better to be cautious here.
>>>>
>>>> Therefore, I would recommend to rework the code so it is only called 
>>>> when len < sizeof(register_t).
>>>
>>> I am not sure I understand the recommendation, could you please 
>>> clarify (also I don't see 'len' being used here).
>>
>> Sorry I meant 'size'. I think something like:
>>
>> if ( dabt.sign && (size < sizeof(register_t)) &&
>>      (r & (1UL << (size - 1)) )
>> {
>> }
>>
>> Another posibility would be:
>>
>> if ( dabt.sign && (size < sizeof(register_t)) )
>> {
>>    /* find whether the sign bit is set and propagate it */
>> }
>>
>> I have a slight preference for the latter as the "if" is easier to read.
>>
>> In any case, I think this change should be done in a separate patch (I 
>> don't mint whether this is done after or before this one).
> 
> ok, I got it, thank you for the clarification. Of course, I will do that 
> in a separate patch, since the current one is to avoid code duplication 
> only. BTW, do you have comments on this patch itself?

The series is in my TODO list. I will have a look in a bit :).

Cheers,

-- 
Julien Grall


^ permalink raw reply	[flat|nested] 127+ messages in thread

* Re: [PATCH V3 15/23] xen/arm: Stick around in leave_hypervisor_to_guest until I/O has completed
  2020-11-30 20:51   ` Volodymyr Babchuk
@ 2020-12-01 12:46     ` Julien Grall
  0 siblings, 0 replies; 127+ messages in thread
From: Julien Grall @ 2020-12-01 12:46 UTC (permalink / raw)
  To: Volodymyr Babchuk, Oleksandr Tyshchenko
  Cc: xen-devel, Oleksandr Tyshchenko, Stefano Stabellini, Julien Grall

Hi Volodymyr,

On 30/11/2020 20:51, Volodymyr Babchuk wrote:
> Oleksandr Tyshchenko writes:
> 
>> From: Oleksandr Tyshchenko <oleksandr_tyshchenko@epam.com>
>>
>> This patch adds proper handling of return value of
>> vcpu_ioreq_handle_completion() which involves using a loop
>> in leave_hypervisor_to_guest().
>>
>> The reason to use an unbounded loop here is the fact that the vCPU
>> shouldn't continue until an I/O has completed. In Xen's case, if an I/O
>> never completes then it most likely means that something went horribly
>> wrong with the Device Emulator. And it is most likely not safe to
>> continue. So letting the vCPU spin forever if an I/O never completes
>> is a safer action than letting it continue and leaving the guest in
>> an unclear state, and is the best we can do for now.
>>
>> This wouldn't be an issue for Xen as do_softirq() would be called at
>> every loop. In case of failure, the guest will crash and the vCPU
>> will be unscheduled.
>>
> 
> Why don't you block the vcpu there and unblock it when the response is ready?

The vCPU will already block while waiting for the event channel. See the 
call to wait_for_event_channel() in the ioreq code. However, you can still 
receive a spurious unblock (e.g. the event channel notification is 
received before the I/O has been handled). So you have to loop around and 
check whether there is more work to do.
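To illustrate the shape of it (simplified and from memory, so don't rely
on the exact field or macro names):

    while ( p->state != STATE_IORESP_READY )
    {
        /*
         * The wakeup may be spurious (e.g. the notification arrived
         * before the response was actually written), so the state is
         * re-checked on every iteration before going back to sleep.
         */
        wait_on_xen_event_channel(sv->ioreq_evtchn,
                                  p->state == STATE_IORESP_READY);
    }

where `p` is the shared ioreq_t and `sv` the per-vCPU ioreq information.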

> If
> I got it right, the "client" vcpu will spin in the loop, eating its own
> scheduling budget with no useful work done.

You can't really do much about that if you have a rogue/buggy device model.

> In the worst case, it will
> prevent "server" vcpu from running.

How so? Xen will raise the schedule softirq if the I/O is handled. You 
would have a pretty buggy system if your "client" vCPU is considered to 
be a much higher priority than your "server" vCPU.

Cheers,

-- 
Julien Grall


^ permalink raw reply	[flat|nested] 127+ messages in thread

* Re: [PATCH V3 01/23] x86/ioreq: Prepare IOREQ feature for making it common
  2020-12-01 11:03   ` Alex Bennée
@ 2020-12-01 18:53     ` Oleksandr
  2020-12-01 19:36       ` Alex Bennée
  2020-12-02  8:00       ` Jan Beulich
  0 siblings, 2 replies; 127+ messages in thread
From: Oleksandr @ 2020-12-01 18:53 UTC (permalink / raw)
  To: Alex Bennée
  Cc: Oleksandr Tyshchenko, Paul Durrant, Jan Beulich, Andrew Cooper,
	Roger Pau Monné,
	Wei Liu, Julien Grall, Stefano Stabellini, Julien Grall,
	xen-devel


On 01.12.20 13:03, Alex Bennée wrote:

Hi Alex

> Oleksandr Tyshchenko <olekstysh@gmail.com> writes:
>
>> From: Oleksandr Tyshchenko <oleksandr_tyshchenko@epam.com>
>>
>> As a lot of x86 code can be re-used on Arm later on, this
>> patch makes some preparation to x86/hvm/ioreq.c before moving
>> to the common code. This way we will get a verbatim copy
> <snip>
>> It worth mentioning that a code which checks the return value of
>> p2m_set_ioreq_server() in hvm_map_mem_type_to_ioreq_server() was
>> folded into arch_ioreq_server_map_mem_type() for the clear split.
>> So the p2m_change_entry_type_global() is called with ioreq_server
>> lock held.
> <snip>
>>   
>> +/* Called with ioreq_server lock held */
>> +int arch_ioreq_server_map_mem_type(struct domain *d,
>> +                                   struct hvm_ioreq_server *s,
>> +                                   uint32_t flags)
>> +{
>> +    int rc = p2m_set_ioreq_server(d, flags, s);
>> +
>> +    if ( rc == 0 && flags == 0 )
>> +    {
>> +        const struct p2m_domain *p2m = p2m_get_hostp2m(d);
>> +
>> +        if ( read_atomic(&p2m->ioreq.entry_count) )
>> +            p2m_change_entry_type_global(d, p2m_ioreq_server, p2m_ram_rw);
>> +    }
>> +
>> +    return rc;
>> +}
>> +
>>   /*
>>    * Map or unmap an ioreq server to specific memory type. For now, only
>>    * HVMMEM_ioreq_server is supported, and in the future new types can be
>> @@ -1112,19 +1155,11 @@ int hvm_map_mem_type_to_ioreq_server(struct domain *d, ioservid_t id,
>>       if ( s->emulator != current->domain )
>>           goto out;
>>   
>> -    rc = p2m_set_ioreq_server(d, flags, s);
>> +    rc = arch_ioreq_server_map_mem_type(d, s, flags);
>>   
>>    out:
>>       spin_unlock_recursive(&d->arch.hvm.ioreq_server.lock);
>>   
>> -    if ( rc == 0 && flags == 0 )
>> -    {
>> -        struct p2m_domain *p2m = p2m_get_hostp2m(d);
>> -
>> -        if ( read_atomic(&p2m->ioreq.entry_count) )
>> -            p2m_change_entry_type_global(d, p2m_ioreq_server, p2m_ram_rw);
>> -    }
>> -
> It should be noted that p2m holds it's own lock but I'm unfamiliar with
> Xen's locking architecture. Is there anything that prevents another vCPU
> accessing a page that is also being used my ioreq on the first vCPU?
I am not sure that I would be able to provide reasonable explanations here.
All I understand is that p2m_change_entry_type_global() is x86 
specific (we don't have the p2m_ioreq_server concept on Arm) and should 
remain as such (not exposed to the common code).
IIRC, I raised a question during the V2 review whether we could hold the 
ioreq server lock around the call to p2m_change_entry_type_global() and 
didn't get objections. I may be mistaken, but it looks like the lock 
being used in p2m_change_entry_type_global() is yet another lock for 
protecting page table operations, so it is unlikely we could get into 
trouble calling this function with the ioreq server lock held.


>
> Assuming that deadlock isn't a possibility to my relatively untrained
> eye this looks good to me:
>
> Reviewed-by: Alex Bennée <alex.bennee@linaro.org>

Thank you.


-- 
Regards,

Oleksandr Tyshchenko



^ permalink raw reply	[flat|nested] 127+ messages in thread

* Re: [PATCH V3 01/23] x86/ioreq: Prepare IOREQ feature for making it common
  2020-12-01 18:53     ` Oleksandr
@ 2020-12-01 19:36       ` Alex Bennée
  2020-12-02  8:00       ` Jan Beulich
  1 sibling, 0 replies; 127+ messages in thread
From: Alex Bennée @ 2020-12-01 19:36 UTC (permalink / raw)
  To: Oleksandr
  Cc: Oleksandr Tyshchenko, Paul Durrant, Jan Beulich, Andrew Cooper,
	Roger Pau Monné,
	Wei Liu, Julien Grall, Stefano Stabellini, Julien Grall,
	xen-devel


Oleksandr <olekstysh@gmail.com> writes:

> On 01.12.20 13:03, Alex Bennée wrote:
>
> Hi Alex
>
>> Oleksandr Tyshchenko <olekstysh@gmail.com> writes:
>>
>>> From: Oleksandr Tyshchenko <oleksandr_tyshchenko@epam.com>
>>>
>>> As a lot of x86 code can be re-used on Arm later on, this
>>> patch makes some preparation to x86/hvm/ioreq.c before moving
>>> to the common code. This way we will get a verbatim copy
>> <snip>
>>> It worth mentioning that a code which checks the return value of
>>> p2m_set_ioreq_server() in hvm_map_mem_type_to_ioreq_server() was
>>> folded into arch_ioreq_server_map_mem_type() for the clear split.
>>> So the p2m_change_entry_type_global() is called with ioreq_server
>>> lock held.
>> <snip>
>>>   
>>> +/* Called with ioreq_server lock held */
>>> +int arch_ioreq_server_map_mem_type(struct domain *d,
>>> +                                   struct hvm_ioreq_server *s,
>>> +                                   uint32_t flags)
>>> +{
>>> +    int rc = p2m_set_ioreq_server(d, flags, s);
>>> +
>>> +    if ( rc == 0 && flags == 0 )
>>> +    {
>>> +        const struct p2m_domain *p2m = p2m_get_hostp2m(d);
>>> +
>>> +        if ( read_atomic(&p2m->ioreq.entry_count) )
>>> +            p2m_change_entry_type_global(d, p2m_ioreq_server, p2m_ram_rw);
>>> +    }
>>> +
>>> +    return rc;
>>> +}
>>> +
>>>   /*
>>>    * Map or unmap an ioreq server to specific memory type. For now, only
>>>    * HVMMEM_ioreq_server is supported, and in the future new types can be
>>> @@ -1112,19 +1155,11 @@ int hvm_map_mem_type_to_ioreq_server(struct domain *d, ioservid_t id,
>>>       if ( s->emulator != current->domain )
>>>           goto out;
>>>   
>>> -    rc = p2m_set_ioreq_server(d, flags, s);
>>> +    rc = arch_ioreq_server_map_mem_type(d, s, flags);
>>>   
>>>    out:
>>>       spin_unlock_recursive(&d->arch.hvm.ioreq_server.lock);
>>>   
>>> -    if ( rc == 0 && flags == 0 )
>>> -    {
>>> -        struct p2m_domain *p2m = p2m_get_hostp2m(d);
>>> -
>>> -        if ( read_atomic(&p2m->ioreq.entry_count) )
>>> -            p2m_change_entry_type_global(d, p2m_ioreq_server, p2m_ram_rw);
>>> -    }
>>> -
>> It should be noted that p2m holds it's own lock but I'm unfamiliar with
>> Xen's locking architecture. Is there anything that prevents another vCPU
>> accessing a page that is also being used my ioreq on the first vCPU?
> I am not sure that I would be able to provide reasonable explanations here.
> All what I understand is that p2m_change_entry_type_global() x86 
> specific (we don't have p2m_ioreq_server concept on Arm) and should 
> remain as such (not exposed to the common code).
> IIRC, I raised a question during V2 review whether we could have ioreq 
> server lock around the call to p2m_change_entry_type_global() and didn't 
> get objections. I may mistake, but looks like the lock being used
> in p2m_change_entry_type_global() is yet another lock for protecting 
> page table operations, so unlikely we could get into the trouble calling 
> this function with the ioreq server lock held.

The p2m lock code looks designed to be recursive, so I could only
envision a problem where a page somehow races to lock under the ioreq
lock, which I don't think is possible. However, reasoning about locking is
hard if you're not familiar with it - it's one reason we added Promela/Spin
[1] models [2] to QEMU for our various locking regimes.


[1] http://spinroot.com/spin/whatispin.html
[2] https://git.qemu.org/?p=qemu.git;a=tree;f=docs/spin;h=cc168025131676429a560ca70d7234a56f958092;hb=HEAD

>
>
>>
>> Assuming that deadlock isn't a possibility to my relatively untrained
>> eye this looks good to me:
>>
>> Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
>
> Thank you.


-- 
Alex Bennée


^ permalink raw reply	[flat|nested] 127+ messages in thread

* Re: [PATCH V3 01/23] x86/ioreq: Prepare IOREQ feature for making it common
  2020-12-01 18:53     ` Oleksandr
  2020-12-01 19:36       ` Alex Bennée
@ 2020-12-02  8:00       ` Jan Beulich
  2020-12-02 11:19         ` Oleksandr
  1 sibling, 1 reply; 127+ messages in thread
From: Jan Beulich @ 2020-12-02  8:00 UTC (permalink / raw)
  To: Oleksandr
  Cc: Oleksandr Tyshchenko, Paul Durrant, Andrew Cooper,
	Roger Pau Monné,
	Wei Liu, Julien Grall, Stefano Stabellini, Julien Grall,
	xen-devel, Alex Bennée

On 01.12.2020 19:53, Oleksandr wrote:
> On 01.12.20 13:03, Alex Bennée wrote:
>> Oleksandr Tyshchenko <olekstysh@gmail.com> writes:
>>> @@ -1112,19 +1155,11 @@ int hvm_map_mem_type_to_ioreq_server(struct domain *d, ioservid_t id,
>>>       if ( s->emulator != current->domain )
>>>           goto out;
>>>   
>>> -    rc = p2m_set_ioreq_server(d, flags, s);
>>> +    rc = arch_ioreq_server_map_mem_type(d, s, flags);
>>>   
>>>    out:
>>>       spin_unlock_recursive(&d->arch.hvm.ioreq_server.lock);
>>>   
>>> -    if ( rc == 0 && flags == 0 )
>>> -    {
>>> -        struct p2m_domain *p2m = p2m_get_hostp2m(d);
>>> -
>>> -        if ( read_atomic(&p2m->ioreq.entry_count) )
>>> -            p2m_change_entry_type_global(d, p2m_ioreq_server, p2m_ram_rw);
>>> -    }
>>> -
>> It should be noted that p2m holds it's own lock but I'm unfamiliar with
>> Xen's locking architecture. Is there anything that prevents another vCPU
>> accessing a page that is also being used my ioreq on the first vCPU?
> I am not sure that I would be able to provide reasonable explanations here.
> All what I understand is that p2m_change_entry_type_global() x86 
> specific (we don't have p2m_ioreq_server concept on Arm) and should 
> remain as such (not exposed to the common code).
> IIRC, I raised a question during V2 review whether we could have ioreq 
> server lock around the call to p2m_change_entry_type_global() and didn't 
> get objections.

Not getting objections doesn't mean much. Personally I don't recall
such a question, but this doesn't mean much. The important thing
here is that you properly justify this change in the description (I
didn't look at this version of the patch as a whole yet, so quite
likely you actually do). This is because you need to guarantee that
you don't introduce any lock order violations by this. There also
should be an attempt to avoid future introduction of issues, by
adding lock nesting related comments in suitable places. Again,
quite likely you actually do so, and I will notice it once looking
at the patch as a whole.
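E.g. something along these lines next to the respective lock (exact
wording of course up to you):

    /*
     * Lock nesting: this lock may be acquired with the per-domain ioreq
     * server lock already held (hvm_map_mem_type_to_ioreq_server() ->
     * arch_ioreq_server_map_mem_type() -> p2m_change_entry_type_global()),
     * so nothing reachable from underneath it may try to acquire the
     * ioreq server lock.
     */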

All of this said, I think it should be tried hard to avoid
introducing this extra lock nesting, if there aren't other places
already where the same nesting of locks is in effect.

> I may mistake, but looks like the lock being used
> in p2m_change_entry_type_global() is yet another lock for protecting 
> page table operations, so unlikely we could get into the trouble calling 
> this function with the ioreq server lock held.

I'm afraid I don't understand the "yet another" here: The ioreq
server lock clearly serves an entirely different purpose.

Jan


^ permalink raw reply	[flat|nested] 127+ messages in thread

* Re: [PATCH V3 01/23] x86/ioreq: Prepare IOREQ feature for making it common
  2020-12-02  8:00       ` Jan Beulich
@ 2020-12-02 11:19         ` Oleksandr
  0 siblings, 0 replies; 127+ messages in thread
From: Oleksandr @ 2020-12-02 11:19 UTC (permalink / raw)
  To: Jan Beulich
  Cc: Oleksandr Tyshchenko, Paul Durrant, Andrew Cooper,
	Roger Pau Monné,
	Wei Liu, Julien Grall, Stefano Stabellini, Julien Grall,
	xen-devel, Alex Bennée


On 02.12.20 10:00, Jan Beulich wrote:

Hi Jan

> On 01.12.2020 19:53, Oleksandr wrote:
>> On 01.12.20 13:03, Alex Bennée wrote:
>>> Oleksandr Tyshchenko <olekstysh@gmail.com> writes:
>>>> @@ -1112,19 +1155,11 @@ int hvm_map_mem_type_to_ioreq_server(struct domain *d, ioservid_t id,
>>>>        if ( s->emulator != current->domain )
>>>>            goto out;
>>>>    
>>>> -    rc = p2m_set_ioreq_server(d, flags, s);
>>>> +    rc = arch_ioreq_server_map_mem_type(d, s, flags);
>>>>    
>>>>     out:
>>>>        spin_unlock_recursive(&d->arch.hvm.ioreq_server.lock);
>>>>    
>>>> -    if ( rc == 0 && flags == 0 )
>>>> -    {
>>>> -        struct p2m_domain *p2m = p2m_get_hostp2m(d);
>>>> -
>>>> -        if ( read_atomic(&p2m->ioreq.entry_count) )
>>>> -            p2m_change_entry_type_global(d, p2m_ioreq_server, p2m_ram_rw);
>>>> -    }
>>>> -
>>> It should be noted that p2m holds it's own lock but I'm unfamiliar with
>>> Xen's locking architecture. Is there anything that prevents another vCPU
>>> accessing a page that is also being used my ioreq on the first vCPU?
>> I am not sure that I would be able to provide reasonable explanations here.
>> All what I understand is that p2m_change_entry_type_global() x86
>> specific (we don't have p2m_ioreq_server concept on Arm) and should
>> remain as such (not exposed to the common code).
>> IIRC, I raised a question during V2 review whether we could have ioreq
>> server lock around the call to p2m_change_entry_type_global() and didn't
>> get objections.
> Not getting objections doesn't mean much. Personally I don't recall
> such a question, but this doesn't mean much.

Sorry for not being clear here. The discussion happened at [1], where I 
was asked to move hvm_map_mem_type_to_ioreq_server() to the common code.


>   The important thing
> here is that you properly justify this change in the description (I
> didn't look at this version of the patch as a whole yet, so quite
> likely you actually do). This is because you need to guarantee that
> you don't introduce any lock order violations by this.
Yes, almost all changes in this patch are mechanical and leave 
things as they are.
The change around p2m_change_entry_type_global() requires additional 
attention, so I decided to emphasize it in the description and to add 
a comment in the code noting that it is called with the ioreq_server 
lock held.


[1] 
https://patchwork.kernel.org/project/xen-devel/patch/1602780274-29141-2-git-send-email-olekstysh@gmail.com/#23734839

-- 
Regards,

Oleksandr Tyshchenko



^ permalink raw reply	[flat|nested] 127+ messages in thread

* Re: [PATCH V3 01/23] x86/ioreq: Prepare IOREQ feature for making it common
  2020-11-30 10:31 ` [PATCH V3 01/23] x86/ioreq: Prepare IOREQ feature for making it common Oleksandr Tyshchenko
  2020-12-01 11:03   ` Alex Bennée
@ 2020-12-07 11:13   ` Jan Beulich
  2020-12-07 15:27     ` Oleksandr
  1 sibling, 1 reply; 127+ messages in thread
From: Jan Beulich @ 2020-12-07 11:13 UTC (permalink / raw)
  To: Oleksandr Tyshchenko
  Cc: Oleksandr Tyshchenko, Paul Durrant, Andrew Cooper,
	Roger Pau Monné,
	Wei Liu, Julien Grall, Stefano Stabellini, Julien Grall,
	xen-devel

On 30.11.2020 11:31, Oleksandr Tyshchenko wrote:
> --- a/xen/arch/x86/hvm/ioreq.c
> +++ b/xen/arch/x86/hvm/ioreq.c
> @@ -17,15 +17,15 @@
>   */
>  
>  #include <xen/ctype.h>
> +#include <xen/domain.h>
> +#include <xen/event.h>
>  #include <xen/init.h>
> +#include <xen/irq.h>
>  #include <xen/lib.h>
> -#include <xen/trace.h>
> +#include <xen/paging.h>
>  #include <xen/sched.h>
> -#include <xen/irq.h>
>  #include <xen/softirq.h>
> -#include <xen/domain.h>
> -#include <xen/event.h>
> -#include <xen/paging.h>
> +#include <xen/trace.h>
>  #include <xen/vpci.h>

Seeing this consolidation (thanks!), have you been able to figure
out what xen/ctype.h is needed for here? It looks to me as if it
could be dropped at the same time.

> @@ -601,7 +610,7 @@ static int hvm_ioreq_server_map_pages(struct hvm_ioreq_server *s)
>      return rc;
>  }
>  
> -static void hvm_ioreq_server_unmap_pages(struct hvm_ioreq_server *s)
> +void arch_ioreq_server_unmap_pages(struct hvm_ioreq_server *s)
>  {
>      hvm_unmap_ioreq_gfn(s, true);
>      hvm_unmap_ioreq_gfn(s, false);

How is this now different from ...

> @@ -674,6 +683,12 @@ static int hvm_ioreq_server_alloc_rangesets(struct hvm_ioreq_server *s,
>      return rc;
>  }
>  
> +void arch_ioreq_server_enable(struct hvm_ioreq_server *s)
> +{
> +    hvm_remove_ioreq_gfn(s, false);
> +    hvm_remove_ioreq_gfn(s, true);
> +}

... this? Imo if at all possible there should be no such duplication
(i.e. at least have this one simply call the earlier one).

> @@ -1080,6 +1105,24 @@ int hvm_unmap_io_range_from_ioreq_server(struct domain *d, ioservid_t id,
>      return rc;
>  }
>  
> +/* Called with ioreq_server lock held */
> +int arch_ioreq_server_map_mem_type(struct domain *d,
> +                                   struct hvm_ioreq_server *s,
> +                                   uint32_t flags)
> +{
> +    int rc = p2m_set_ioreq_server(d, flags, s);
> +
> +    if ( rc == 0 && flags == 0 )
> +    {
> +        const struct p2m_domain *p2m = p2m_get_hostp2m(d);
> +
> +        if ( read_atomic(&p2m->ioreq.entry_count) )
> +            p2m_change_entry_type_global(d, p2m_ioreq_server, p2m_ram_rw);
> +    }
> +
> +    return rc;
> +}
> +
>  /*
>   * Map or unmap an ioreq server to specific memory type. For now, only
>   * HVMMEM_ioreq_server is supported, and in the future new types can be
> @@ -1112,19 +1155,11 @@ int hvm_map_mem_type_to_ioreq_server(struct domain *d, ioservid_t id,
>      if ( s->emulator != current->domain )
>          goto out;
>  
> -    rc = p2m_set_ioreq_server(d, flags, s);
> +    rc = arch_ioreq_server_map_mem_type(d, s, flags);
>  
>   out:
>      spin_unlock_recursive(&d->arch.hvm.ioreq_server.lock);
>  
> -    if ( rc == 0 && flags == 0 )
> -    {
> -        struct p2m_domain *p2m = p2m_get_hostp2m(d);
> -
> -        if ( read_atomic(&p2m->ioreq.entry_count) )
> -            p2m_change_entry_type_global(d, p2m_ioreq_server, p2m_ram_rw);
> -    }
> -
>      return rc;
>  }

While you mention this change in the description, I'm still
missing justification as to why the change is safe to make. I
continue to think p2m_change_entry_type_global() would better
not be called inside the locked region, if at all possible.

> @@ -1239,33 +1279,28 @@ void hvm_destroy_all_ioreq_servers(struct domain *d)
>      spin_unlock_recursive(&d->arch.hvm.ioreq_server.lock);
>  }
>  
> -struct hvm_ioreq_server *hvm_select_ioreq_server(struct domain *d,
> -                                                 ioreq_t *p)
> +int arch_ioreq_server_get_type_addr(const struct domain *d,
> +                                    const ioreq_t *p,
> +                                    uint8_t *type,
> +                                    uint64_t *addr)
>  {
> -    struct hvm_ioreq_server *s;
> -    uint32_t cf8;
> -    uint8_t type;
> -    uint64_t addr;
> -    unsigned int id;
> +    unsigned int cf8 = d->arch.hvm.pci_cf8;
>  
>      if ( p->type != IOREQ_TYPE_COPY && p->type != IOREQ_TYPE_PIO )
> -        return NULL;
> -
> -    cf8 = d->arch.hvm.pci_cf8;
> +        return -EINVAL;

The caller cares about only a boolean. Either make the function
return bool, or (imo better, but others may like this less) have
it return "type" instead of using indirection, using e.g.
negative values to identify errors (which then could still be
errno ones if you wish).
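I.e. something like (purely illustrative):

    /*
     * Returns the translated IOREQ_TYPE_* value (and fills in *addr) on
     * success, or a negative errno value if no ioreq server can be
     * selected for this request.
     */
    int arch_ioreq_server_get_type_addr(const struct domain *d,
                                        const ioreq_t *p,
                                        uint64_t *addr);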

> --- a/xen/include/asm-x86/hvm/ioreq.h
> +++ b/xen/include/asm-x86/hvm/ioreq.h
> @@ -19,6 +19,25 @@
>  #ifndef __ASM_X86_HVM_IOREQ_H__
>  #define __ASM_X86_HVM_IOREQ_H__
>  
> +#define HANDLE_BUFIOREQ(s) \
> +    ((s)->bufioreq_handling != HVM_IOREQSRV_BUFIOREQ_OFF)
> +
> +bool arch_vcpu_ioreq_completion(enum hvm_io_completion io_completion);
> +int arch_ioreq_server_map_pages(struct hvm_ioreq_server *s);
> +void arch_ioreq_server_unmap_pages(struct hvm_ioreq_server *s);
> +void arch_ioreq_server_enable(struct hvm_ioreq_server *s);
> +void arch_ioreq_server_disable(struct hvm_ioreq_server *s);
> +void arch_ioreq_server_destroy(struct hvm_ioreq_server *s);
> +int arch_ioreq_server_map_mem_type(struct domain *d,
> +                                   struct hvm_ioreq_server *s,
> +                                   uint32_t flags);
> +bool arch_ioreq_server_destroy_all(struct domain *d);
> +int arch_ioreq_server_get_type_addr(const struct domain *d,
> +                                    const ioreq_t *p,
> +                                    uint8_t *type,
> +                                    uint64_t *addr);
> +void arch_ioreq_domain_init(struct domain *d);
> +
>  bool hvm_io_pending(struct vcpu *v);
>  bool handle_hvm_io_completion(struct vcpu *v);
>  bool is_ioreq_server_page(struct domain *d, const struct page_info *page);

What's the plan here? Introduce them into the x86 header just
to later move the entire block into the common one? Wouldn't
it make sense to introduce the common header here right away?
Or do you expect to convert some of the simpler ones to inline
functions later on?

Jan


^ permalink raw reply	[flat|nested] 127+ messages in thread

* Re: [PATCH V3 02/23] x86/ioreq: Add IOREQ_STATUS_* #define-s and update code for moving
  2020-11-30 10:31 ` [PATCH V3 02/23] x86/ioreq: Add IOREQ_STATUS_* #define-s and update code for moving Oleksandr Tyshchenko
  2020-12-01 11:07   ` Alex Bennée
@ 2020-12-07 11:19   ` Jan Beulich
  2020-12-07 15:37     ` Oleksandr
  1 sibling, 1 reply; 127+ messages in thread
From: Jan Beulich @ 2020-12-07 11:19 UTC (permalink / raw)
  To: Oleksandr Tyshchenko
  Cc: Oleksandr Tyshchenko, Paul Durrant, Andrew Cooper,
	Roger Pau Monné,
	Wei Liu, Julien Grall, Stefano Stabellini, Julien Grall,
	xen-devel

On 30.11.2020 11:31, Oleksandr Tyshchenko wrote:
> --- a/xen/include/asm-x86/hvm/ioreq.h
> +++ b/xen/include/asm-x86/hvm/ioreq.h
> @@ -74,6 +74,10 @@ unsigned int hvm_broadcast_ioreq(ioreq_t *p, bool buffered);
>  
>  void hvm_ioreq_init(struct domain *d);
>  
> +#define IOREQ_STATUS_HANDLED     X86EMUL_OKAY
> +#define IOREQ_STATUS_UNHANDLED   X86EMUL_UNHANDLEABLE
> +#define IOREQ_STATUS_RETRY       X86EMUL_RETRY

This correlation may not be altered. I think a comment is needed
to this effect, to avoid someone trying to subsequently fold the
x86 and (to be introduced) Arm ones. With that
Acked-by: Jan Beulich <jbeulich@suse.com>
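For instance (exact wording up to you, of course):

    /*
     * This correlation must not be altered: x86 code uses IOREQ_STATUS_*
     * and X86EMUL_* values interchangeably, so these cannot be folded
     * with the Arm definitions into common code.
     */
    #define IOREQ_STATUS_HANDLED     X86EMUL_OKAY
    #define IOREQ_STATUS_UNHANDLED   X86EMUL_UNHANDLEABLE
    #define IOREQ_STATUS_RETRY       X86EMUL_RETRY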

Jan


^ permalink raw reply	[flat|nested] 127+ messages in thread

* Re: [PATCH V3 03/23] x86/ioreq: Provide out-of-line wrapper for the handle_mmio()
  2020-11-30 10:31 ` [PATCH V3 03/23] x86/ioreq: Provide out-of-line wrapper for the handle_mmio() Oleksandr Tyshchenko
@ 2020-12-07 11:27   ` Jan Beulich
  2020-12-07 15:39     ` Oleksandr
  0 siblings, 1 reply; 127+ messages in thread
From: Jan Beulich @ 2020-12-07 11:27 UTC (permalink / raw)
  To: Oleksandr Tyshchenko
  Cc: Oleksandr Tyshchenko, Paul Durrant, Andrew Cooper,
	Roger Pau Monné,
	Wei Liu, Julien Grall, Stefano Stabellini, Julien Grall,
	xen-devel

On 30.11.2020 11:31, Oleksandr Tyshchenko wrote:
> --- a/xen/arch/x86/hvm/ioreq.c
> +++ b/xen/arch/x86/hvm/ioreq.c
> @@ -36,6 +36,11 @@
>  #include <public/hvm/ioreq.h>
>  #include <public/hvm/params.h>
>  
> +bool ioreq_complete_mmio(void)
> +{
> +    return handle_mmio();
> +}

As indicated before I don't like out-of-line functions like this
one; I think a #define would be quite fine here, but Paul as the
maintainer thinks differently. So be it. However, shouldn't this
function be named arch_ioreq_complete_mmio() according to the
new naming model, and then ...

> --- a/xen/include/asm-x86/hvm/ioreq.h
> +++ b/xen/include/asm-x86/hvm/ioreq.h
> @@ -74,6 +74,8 @@ unsigned int hvm_broadcast_ioreq(ioreq_t *p, bool buffered);
>  
>  void hvm_ioreq_init(struct domain *d);
>  
> +bool ioreq_complete_mmio(void);

... get declared next to the other arch_*() hooks? With this
Reviewed-by: Jan Beulich <jbeulich@suse.com>
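To spell out what I mean (illustration only):

    bool arch_ioreq_complete_mmio(void)
    {
        return handle_mmio();
    }

with the declaration then living next to the other arch_*() hooks:

    bool arch_ioreq_complete_mmio(void);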

Jan


^ permalink raw reply	[flat|nested] 127+ messages in thread

* Re: [PATCH V3 10/23] xen/mm: Make x86's XENMEM_resource_ioreq_server handling common
  2020-11-30 10:31 ` [PATCH V3 10/23] xen/mm: Make x86's XENMEM_resource_ioreq_server handling common Oleksandr Tyshchenko
@ 2020-12-07 11:35   ` Jan Beulich
  2020-12-07 12:11     ` Jan Beulich
  0 siblings, 1 reply; 127+ messages in thread
From: Jan Beulich @ 2020-12-07 11:35 UTC (permalink / raw)
  To: Oleksandr Tyshchenko
  Cc: Julien Grall, Andrew Cooper, Roger Pau Monné,
	Wei Liu, George Dunlap, Ian Jackson, Julien Grall,
	Stefano Stabellini, Volodymyr Babchuk, Oleksandr Tyshchenko,
	xen-devel

On 30.11.2020 11:31, Oleksandr Tyshchenko wrote:
> --- a/xen/arch/x86/mm.c
> +++ b/xen/arch/x86/mm.c
> @@ -4699,50 +4699,6 @@ int xenmem_add_to_physmap_one(
>      return rc;
>  }
>  
> -int arch_acquire_resource(struct domain *d, unsigned int type,
> -                          unsigned int id, unsigned long frame,
> -                          unsigned int nr_frames, xen_pfn_t mfn_list[])
> -{
> -    int rc;
> -
> -    switch ( type )
> -    {
> -#ifdef CONFIG_HVM
> -    case XENMEM_resource_ioreq_server:
> -    {
> -        ioservid_t ioservid = id;
> -        unsigned int i;
> -
> -        rc = -EINVAL;
> -        if ( !is_hvm_domain(d) )
> -            break;
> -
> -        if ( id != (unsigned int)ioservid )
> -            break;
> -
> -        rc = 0;
> -        for ( i = 0; i < nr_frames; i++ )
> -        {
> -            mfn_t mfn;
> -
> -            rc = hvm_get_ioreq_server_frame(d, id, frame + i, &mfn);
> -            if ( rc )
> -                break;
> -
> -            mfn_list[i] = mfn_x(mfn);
> -        }
> -        break;
> -    }
> -#endif
> -
> -    default:
> -        rc = -EOPNOTSUPP;
> -        break;
> -    }
> -
> -    return rc;
> -}

Can't this be accompanied by removal of the xen/ioreq.h inclusion?
(I'm only looking at patch 4 right now, but the renaming there made
the soon to be unnecessary #include quite apparent.)

Jan


^ permalink raw reply	[flat|nested] 127+ messages in thread

* Re: [PATCH V3 04/23] xen/ioreq: Make x86's IOREQ feature common
  2020-11-30 10:31 ` [PATCH V3 04/23] xen/ioreq: Make x86's IOREQ feature common Oleksandr Tyshchenko
@ 2020-12-07 11:41   ` Jan Beulich
  2020-12-07 19:43     ` Oleksandr
  0 siblings, 1 reply; 127+ messages in thread
From: Jan Beulich @ 2020-12-07 11:41 UTC (permalink / raw)
  To: Oleksandr Tyshchenko
  Cc: Oleksandr Tyshchenko, Andrew Cooper, George Dunlap, Ian Jackson,
	Julien Grall, Stefano Stabellini, Wei Liu, Roger Pau Monné,
	Paul Durrant, Tim Deegan, Julien Grall, xen-devel

On 30.11.2020 11:31, Oleksandr Tyshchenko wrote:
> --- a/xen/include/asm-x86/hvm/ioreq.h
> +++ b/xen/include/asm-x86/hvm/ioreq.h
> @@ -19,8 +19,7 @@
>  #ifndef __ASM_X86_HVM_IOREQ_H__
>  #define __ASM_X86_HVM_IOREQ_H__
>  
> -#define HANDLE_BUFIOREQ(s) \
> -    ((s)->bufioreq_handling != HVM_IOREQSRV_BUFIOREQ_OFF)
> +#include <xen/ioreq.h>

Is there a strict need to do it this way round? Usually the common
header would include the arch one ...
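I.e. the usual pattern would be for xen/ioreq.h to end with

    #include <asm/hvm/ioreq.h>

(or whatever the arch header ends up being called), rather than the arch
header pulling in the common one.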

> @@ -38,42 +37,6 @@ int arch_ioreq_server_get_type_addr(const struct domain *d,
>                                      uint64_t *addr);
>  void arch_ioreq_domain_init(struct domain *d);

As already mentioned in an earlier reply: What about these? They
shouldn't get declared once per arch. If anything, ones that
want to be inline functions can / should remain in the per-arch
header.

Jan


^ permalink raw reply	[flat|nested] 127+ messages in thread

* Re: [PATCH V3 05/23] xen/ioreq: Make x86's hvm_ioreq_needs_completion() common
  2020-11-30 10:31 ` [PATCH V3 05/23] xen/ioreq: Make x86's hvm_ioreq_needs_completion() common Oleksandr Tyshchenko
@ 2020-12-07 11:47   ` Jan Beulich
  0 siblings, 0 replies; 127+ messages in thread
From: Jan Beulich @ 2020-12-07 11:47 UTC (permalink / raw)
  To: Oleksandr Tyshchenko
  Cc: Oleksandr Tyshchenko, Paul Durrant, Andrew Cooper,
	Roger Pau Monné,
	Wei Liu, Julien Grall, Stefano Stabellini, Julien Grall,
	xen-devel

On 30.11.2020 11:31, Oleksandr Tyshchenko wrote:
> --- a/xen/include/xen/ioreq.h
> +++ b/xen/include/xen/ioreq.h
> @@ -21,6 +21,13 @@
>  
>  #include <xen/sched.h>
>  
> +static inline bool ioreq_needs_completion(const ioreq_t *ioreq)
> +{
> +    return ioreq->state == STATE_IOREQ_READY &&
> +           !ioreq->data_is_ptr &&
> +           (ioreq->type != IOREQ_TYPE_PIO || ioreq->dir != IOREQ_WRITE);
> +}
> +
>  #define HANDLE_BUFIOREQ(s) \
>      ((s)->bufioreq_handling != HVM_IOREQSRV_BUFIOREQ_OFF)

Personally I would have suggested to keep the #define first, but
I see you've already got Paul's R-b. Applicable parts
Acked-by: Jan Beulich <jbeulich@suse.com>

Jan



* Re: [PATCH V3 06/23] xen/ioreq: Make x86's hvm_mmio_first(last)_byte() common
  2020-11-30 10:31 ` [PATCH V3 06/23] xen/ioreq: Make x86's hvm_mmio_first(last)_byte() common Oleksandr Tyshchenko
@ 2020-12-07 11:48   ` Jan Beulich
  0 siblings, 0 replies; 127+ messages in thread
From: Jan Beulich @ 2020-12-07 11:48 UTC (permalink / raw)
  To: Oleksandr Tyshchenko
  Cc: Oleksandr Tyshchenko, Paul Durrant, Andrew Cooper,
	Roger Pau Monné,
	Wei Liu, Julien Grall, Stefano Stabellini, Julien Grall,
	xen-devel

On 30.11.2020 11:31, Oleksandr Tyshchenko wrote:
> From: Oleksandr Tyshchenko <oleksandr_tyshchenko@epam.com>
> 
> The IOREQ is a common feature now and these helpers will be used
> on Arm as is. Move them to xen/ioreq.h and replace "hvm" prefixes
> with "ioreq".
> 
> Signed-off-by: Oleksandr Tyshchenko <oleksandr_tyshchenko@epam.com>
> Reviewed-by: Paul Durrant <paul@xen.org>

Applicable parts
Acked-by: Jan Beulich <jbeulich@suse.com>

Jan



* Re: [PATCH V3 07/23] xen/ioreq: Make x86's hvm_ioreq_(page/vcpu/server) structs common
  2020-11-30 10:31 ` [PATCH V3 07/23] xen/ioreq: Make x86's hvm_ioreq_(page/vcpu/server) structs common Oleksandr Tyshchenko
@ 2020-12-07 11:54   ` Jan Beulich
  0 siblings, 0 replies; 127+ messages in thread
From: Jan Beulich @ 2020-12-07 11:54 UTC (permalink / raw)
  To: Oleksandr Tyshchenko
  Cc: Oleksandr Tyshchenko, Paul Durrant, Andrew Cooper,
	Roger Pau Monné,
	Wei Liu, George Dunlap, Julien Grall, Stefano Stabellini,
	Julien Grall, xen-devel

On 30.11.2020 11:31, Oleksandr Tyshchenko wrote:
> From: Oleksandr Tyshchenko <oleksandr_tyshchenko@epam.com>
> 
> The IOREQ is a common feature now and these structs will be used
> on Arm as is. Move them to xen/ioreq.h and remove "hvm" prefixes.
> 
> Signed-off-by: Oleksandr Tyshchenko <oleksandr_tyshchenko@epam.com>

Applicable parts
Acked-by: Jan Beulich <jbeulich@suse.com>

Jan



* Re: [PATCH V3 08/23] xen/ioreq: Move x86's ioreq_server to struct domain
  2020-11-30 10:31 ` [PATCH V3 08/23] xen/ioreq: Move x86's ioreq_server to struct domain Oleksandr Tyshchenko
@ 2020-12-07 12:04   ` Jan Beulich
  2020-12-07 12:12     ` Paul Durrant
  2020-12-07 19:52     ` Oleksandr
  0 siblings, 2 replies; 127+ messages in thread
From: Jan Beulich @ 2020-12-07 12:04 UTC (permalink / raw)
  To: Oleksandr Tyshchenko, Paul Durrant
  Cc: Oleksandr Tyshchenko, Andrew Cooper, George Dunlap, Ian Jackson,
	Julien Grall, Stefano Stabellini, Wei Liu, Roger Pau Monné,
	Julien Grall, xen-devel

On 30.11.2020 11:31, Oleksandr Tyshchenko wrote:
> From: Oleksandr Tyshchenko <oleksandr_tyshchenko@epam.com>
> 
> The IOREQ is a common feature now and this struct will be used
> on Arm as is. Move it to common struct domain. This also
> significantly reduces the layering violation in the common code
> (*arch.hvm* usage).
> 
> We don't move ioreq_gfn since it is not used in the common code
> (the "legacy" mechanism is x86 specific).
> 
> Signed-off-by: Oleksandr Tyshchenko <oleksandr_tyshchenko@epam.com>

Applicable parts
Acked-by: Jan Beulich <jbeulich@suse.com>
yet with a question, but maybe more to Paul than to you:

> --- a/xen/include/asm-x86/hvm/domain.h
> +++ b/xen/include/asm-x86/hvm/domain.h
> @@ -63,8 +63,6 @@ struct hvm_pi_ops {
>      void (*vcpu_block)(struct vcpu *);
>  };
>  
> -#define MAX_NR_IOREQ_SERVERS 8
> -
>  struct hvm_domain {
>      /* Guest page range used for non-default ioreq servers */
>      struct {
> @@ -73,12 +71,6 @@ struct hvm_domain {
>          unsigned long legacy_mask; /* indexed by HVM param number */
>      } ioreq_gfn;
>  
> -    /* Lock protects all other values in the sub-struct and the default */
> -    struct {
> -        spinlock_t              lock;
> -        struct ioreq_server *server[MAX_NR_IOREQ_SERVERS];
> -    } ioreq_server;
> -
>      /* Cached CF8 for guest PCI config cycles */
>      uint32_t                pci_cf8;
>  
> --- a/xen/include/xen/sched.h
> +++ b/xen/include/xen/sched.h
> @@ -316,6 +316,8 @@ struct sched_unit {
>  
>  struct evtchn_port_ops;
>  
> +#define MAX_NR_IOREQ_SERVERS 8
> +
>  struct domain
>  {
>      domid_t          domain_id;
> @@ -523,6 +525,14 @@ struct domain
>      /* Argo interdomain communication support */
>      struct argo_domain *argo;
>  #endif
> +
> +#ifdef CONFIG_IOREQ_SERVER
> +    /* Lock protects all other values in the sub-struct and the default */
> +    struct {
> +        spinlock_t              lock;
> +        struct ioreq_server     *server[MAX_NR_IOREQ_SERVERS];
> +    } ioreq_server;
> +#endif

The comment gets merely moved, but what "default" does it talk about?
Is this a stale part which would better be dropped at this occasion?

Jan



* Re: [PATCH V3 09/23] xen/dm: Make x86's DM feature common
  2020-11-30 10:31 ` [PATCH V3 09/23] xen/dm: Make x86's DM feature common Oleksandr Tyshchenko
@ 2020-12-07 12:08   ` Jan Beulich
  2020-12-07 20:23     ` Oleksandr
  0 siblings, 1 reply; 127+ messages in thread
From: Jan Beulich @ 2020-12-07 12:08 UTC (permalink / raw)
  To: Oleksandr Tyshchenko
  Cc: Julien Grall, Andrew Cooper, Roger Pau Monné,
	Wei Liu, George Dunlap, Ian Jackson, Julien Grall,
	Stefano Stabellini, Daniel De Graaf, Oleksandr Tyshchenko,
	xen-devel

On 30.11.2020 11:31, Oleksandr Tyshchenko wrote:
> From: Julien Grall <julien.grall@arm.com>
> 
> As a lot of x86 code can be re-used on Arm later on, this patch
> splits devicemodel support into common and arch specific parts.
> 
> The common DM feature is supposed to be built with IOREQ_SERVER
> option enabled (as well as the IOREQ feature), which is selected
> for x86's config HVM for now.
> 
> Also update XSM code a bit to let DM op be used on Arm.
> 
> This support is going to be used on Arm to be able run device
> emulator outside of Xen hypervisor.
> 
> Signed-off-by: Julien Grall <julien.grall@arm.com>
> Signed-off-by: Oleksandr Tyshchenko <oleksandr_tyshchenko@epam.com>
> 
> ---
> Please note, this is a split/cleanup/hardening of Julien's PoC:
> "Add support for Guest IO forwarding to a device emulator"
> 
> Changes RFC -> V1:
>    - update XSM, related changes were pulled from:
>      [RFC PATCH V1 04/12] xen/arm: Introduce arch specific bits for IOREQ/DM features
> 
> Changes V1 -> V2:
>    - update the author of a patch
>    - update patch description
>    - introduce xen/dm.h and move definitions here
> 
> Changes V2 -> V3:
>    - no changes

And my concern regarding the common vs arch nesting also hasn't
changed.

Jan



* Re: [PATCH V3 10/23] xen/mm: Make x86's XENMEM_resource_ioreq_server handling common
  2020-12-07 11:35   ` Jan Beulich
@ 2020-12-07 12:11     ` Jan Beulich
  2020-12-07 21:06       ` Oleksandr
  0 siblings, 1 reply; 127+ messages in thread
From: Jan Beulich @ 2020-12-07 12:11 UTC (permalink / raw)
  To: Oleksandr Tyshchenko
  Cc: Julien Grall, Andrew Cooper, Roger Pau Monné,
	Wei Liu, George Dunlap, Ian Jackson, Julien Grall,
	Stefano Stabellini, Volodymyr Babchuk, Oleksandr Tyshchenko,
	xen-devel

On 07.12.2020 12:35, Jan Beulich wrote:
> On 30.11.2020 11:31, Oleksandr Tyshchenko wrote:
>> --- a/xen/arch/x86/mm.c
>> +++ b/xen/arch/x86/mm.c
>> @@ -4699,50 +4699,6 @@ int xenmem_add_to_physmap_one(
>>      return rc;
>>  }
>>  
>> -int arch_acquire_resource(struct domain *d, unsigned int type,
>> -                          unsigned int id, unsigned long frame,
>> -                          unsigned int nr_frames, xen_pfn_t mfn_list[])
>> -{
>> -    int rc;
>> -
>> -    switch ( type )
>> -    {
>> -#ifdef CONFIG_HVM
>> -    case XENMEM_resource_ioreq_server:
>> -    {
>> -        ioservid_t ioservid = id;
>> -        unsigned int i;
>> -
>> -        rc = -EINVAL;
>> -        if ( !is_hvm_domain(d) )
>> -            break;
>> -
>> -        if ( id != (unsigned int)ioservid )
>> -            break;
>> -
>> -        rc = 0;
>> -        for ( i = 0; i < nr_frames; i++ )
>> -        {
>> -            mfn_t mfn;
>> -
>> -            rc = hvm_get_ioreq_server_frame(d, id, frame + i, &mfn);
>> -            if ( rc )
>> -                break;
>> -
>> -            mfn_list[i] = mfn_x(mfn);
>> -        }
>> -        break;
>> -    }
>> -#endif
>> -
>> -    default:
>> -        rc = -EOPNOTSUPP;
>> -        break;
>> -    }
>> -
>> -    return rc;
>> -}
> 
> Can't this be accompanied by removal of the xen/ioreq.h inclusion?
> (I'm only looking at patch 4 right now, but the renaming there made
> the soon to be unnecessary #include quite apparent.)

And then, now that I've looked at this patch as a whole,
Reviewed-by: Jan Beulich <jbeulich@suse.com>

Jan



* RE: [PATCH V3 08/23] xen/ioreq: Move x86's ioreq_server to struct domain
  2020-12-07 12:04   ` Jan Beulich
@ 2020-12-07 12:12     ` Paul Durrant
  2020-12-07 19:52     ` Oleksandr
  1 sibling, 0 replies; 127+ messages in thread
From: Paul Durrant @ 2020-12-07 12:12 UTC (permalink / raw)
  To: 'Jan Beulich', 'Oleksandr Tyshchenko'
  Cc: 'Oleksandr Tyshchenko', 'Andrew Cooper',
	'George Dunlap', 'Ian Jackson',
	'Julien Grall', 'Stefano Stabellini',
	'Wei Liu', 'Roger Pau Monné',
	'Julien Grall',
	xen-devel

> -----Original Message-----
> From: Jan Beulich <jbeulich@suse.com>
> Sent: 07 December 2020 12:05
> To: Oleksandr Tyshchenko <olekstysh@gmail.com>; Paul Durrant <paul@xen.org>
> Cc: Oleksandr Tyshchenko <oleksandr_tyshchenko@epam.com>; Andrew Cooper <andrew.cooper3@citrix.com>;
> George Dunlap <george.dunlap@citrix.com>; Ian Jackson <iwj@xenproject.org>; Julien Grall
> <julien@xen.org>; Stefano Stabellini <sstabellini@kernel.org>; Wei Liu <wl@xen.org>; Roger Pau Monné
> <roger.pau@citrix.com>; Julien Grall <julien.grall@arm.com>; xen-devel@lists.xenproject.org
> Subject: Re: [PATCH V3 08/23] xen/ioreq: Move x86's ioreq_server to struct domain
> 
> On 30.11.2020 11:31, Oleksandr Tyshchenko wrote:
> > From: Oleksandr Tyshchenko <oleksandr_tyshchenko@epam.com>
> >
> > The IOREQ is a common feature now and this struct will be used
> > on Arm as is. Move it to common struct domain. This also
> > significantly reduces the layering violation in the common code
> > (*arch.hvm* usage).
> >
> > We don't move ioreq_gfn since it is not used in the common code
> > (the "legacy" mechanism is x86 specific).
> >
> > Signed-off-by: Oleksandr Tyshchenko <oleksandr_tyshchenko@epam.com>
> 
> Applicable parts
> Acked-by: Jan Beulich <jbeulich@suse.com>
> yet with a question, but maybe more to Paul than to you:
> 
> > --- a/xen/include/asm-x86/hvm/domain.h
> > +++ b/xen/include/asm-x86/hvm/domain.h
> > @@ -63,8 +63,6 @@ struct hvm_pi_ops {
> >      void (*vcpu_block)(struct vcpu *);
> >  };
> >
> > -#define MAX_NR_IOREQ_SERVERS 8
> > -
> >  struct hvm_domain {
> >      /* Guest page range used for non-default ioreq servers */
> >      struct {
> > @@ -73,12 +71,6 @@ struct hvm_domain {
> >          unsigned long legacy_mask; /* indexed by HVM param number */
> >      } ioreq_gfn;
> >
> > -    /* Lock protects all other values in the sub-struct and the default */
> > -    struct {
> > -        spinlock_t              lock;
> > -        struct ioreq_server *server[MAX_NR_IOREQ_SERVERS];
> > -    } ioreq_server;
> > -
> >      /* Cached CF8 for guest PCI config cycles */
> >      uint32_t                pci_cf8;
> >
> > --- a/xen/include/xen/sched.h
> > +++ b/xen/include/xen/sched.h
> > @@ -316,6 +316,8 @@ struct sched_unit {
> >
> >  struct evtchn_port_ops;
> >
> > +#define MAX_NR_IOREQ_SERVERS 8
> > +
> >  struct domain
> >  {
> >      domid_t          domain_id;
> > @@ -523,6 +525,14 @@ struct domain
> >      /* Argo interdomain communication support */
> >      struct argo_domain *argo;
> >  #endif
> > +
> > +#ifdef CONFIG_IOREQ_SERVER
> > +    /* Lock protects all other values in the sub-struct and the default */
> > +    struct {
> > +        spinlock_t              lock;
> > +        struct ioreq_server     *server[MAX_NR_IOREQ_SERVERS];
> > +    } ioreq_server;
> > +#endif
> 
> The comment gets merely moved, but what "default" does it talk about?
> Is this a stale part which would better be dropped at this occasion?
> 

Yes, I think that is a stale part of the comment from the days of the default ioreq server. It can be dropped.
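
I.e. just:

    /* Lock protects all other values in the sub-struct */
    struct {
        spinlock_t              lock;
        struct ioreq_server     *server[MAX_NR_IOREQ_SERVERS];
    } ioreq_server;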

  Paul

> Jan




* Re: [PATCH V3 11/23] xen/ioreq: Move x86's io_completion/io_req fields to struct vcpu
  2020-11-30 10:31 ` [PATCH V3 11/23] xen/ioreq: Move x86's io_completion/io_req fields to struct vcpu Oleksandr Tyshchenko
@ 2020-12-07 12:32   ` Jan Beulich
  2020-12-07 20:59     ` Oleksandr
  0 siblings, 1 reply; 127+ messages in thread
From: Jan Beulich @ 2020-12-07 12:32 UTC (permalink / raw)
  To: Oleksandr Tyshchenko, Paul Durrant
  Cc: Oleksandr Tyshchenko, Andrew Cooper, Roger Pau Monné,
	Wei Liu, George Dunlap, Ian Jackson, Julien Grall,
	Stefano Stabellini, Jun Nakajima, Kevin Tian, Julien Grall,
	xen-devel

On 30.11.2020 11:31, Oleksandr Tyshchenko wrote:
> --- a/xen/arch/x86/hvm/emulate.c
> +++ b/xen/arch/x86/hvm/emulate.c
> @@ -142,8 +142,8 @@ void hvmemul_cancel(struct vcpu *v)
>  {
>      struct hvm_vcpu_io *vio = &v->arch.hvm.hvm_io;
>  
> -    vio->io_req.state = STATE_IOREQ_NONE;
> -    vio->io_completion = HVMIO_no_completion;
> +    v->io.req.state = STATE_IOREQ_NONE;
> +    v->io.completion = IO_no_completion;
>      vio->mmio_cache_count = 0;
>      vio->mmio_insn_bytes = 0;
>      vio->mmio_access = (struct npfec){};
> @@ -159,7 +159,7 @@ static int hvmemul_do_io(
>  {
>      struct vcpu *curr = current;
>      struct domain *currd = curr->domain;
> -    struct hvm_vcpu_io *vio = &curr->arch.hvm.hvm_io;
> +    struct vcpu_io *vio = &curr->io;

Taking just these two hunks: "vio" would now stand for two entirely
different things. I realize the name is applicable to both, but I
wonder if such naming isn't going to risk confusion. Despite being
relatively familiar with the involved code, I've been repeatedly
unsure what exactly "vio" covers, and needed to go back to the
header. So together with the possible name adjustment mentioned
further down, maybe "vcpu_io" also wants its name changed, such that
the variable then could also sensibly be named (slightly)
differently? struct vcpu_io_state maybe? Or alternatively rename
variables of type struct hvm_vcpu_io * to hvio or hio? Otoh the
savings aren't very big for just ->io, so maybe better to stick to
the prior name with the prior type, and not introduce local
variables at all for the new field, like you already have it in the
former case?
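
For the hvio / hio variant the result would look roughly like (names
purely illustrative):

    struct vcpu *curr = current;
    struct hvm_vcpu_io *hvio = &curr->arch.hvm.hvm_io; /* x86 specific state */
    struct vcpu_io *vio = &curr->io; /* new common state */

so the two kinds of state could be told apart at a glance.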

> --- a/xen/include/xen/sched.h
> +++ b/xen/include/xen/sched.h
> @@ -145,6 +145,21 @@ void evtchn_destroy_final(struct domain *d); /* from complete_domain_destroy */
>  
>  struct waitqueue_vcpu;
>  
> +enum io_completion {
> +    IO_no_completion,
> +    IO_mmio_completion,
> +    IO_pio_completion,
> +#ifdef CONFIG_X86
> +    IO_realmode_completion,
> +#endif
> +};

I'm not entirely happy with io_ / IO_ here - they seem a little
too generic. How about ioreq_ / IOREQ_ respectively?
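
I.e. something like:

enum ioreq_completion {
    IOREQ_no_completion,
    IOREQ_mmio_completion,
    IOREQ_pio_completion,
#ifdef CONFIG_X86
    IOREQ_realmode_completion,
#endif
};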

> +struct vcpu_io {
> +    /* I/O request in flight to device model. */
> +    enum io_completion   completion;
> +    ioreq_t              req;
> +};
> +
>  struct vcpu
>  {
>      int              vcpu_id;
> @@ -256,6 +271,10 @@ struct vcpu
>      struct vpci_vcpu vpci;
>  
>      struct arch_vcpu arch;
> +
> +#ifdef CONFIG_IOREQ_SERVER
> +    struct vcpu_io io;
> +#endif
>  };

I don't have a good solution in mind, and I'm also not meaning to
necessarily request a change here, but I'd like to point out that
this does away (for this part of it only, of course) with the
overlaying of the PV and HVM sub-structs on x86. As long as the
HVM part is the far bigger one, that's not a problem, but I wanted
to mention the aspect nevertheless.

Jan



* Re: [PATCH V3 12/23] xen/ioreq: Remove "hvm" prefixes from involved function names
  2020-11-30 10:31 ` [PATCH V3 12/23] xen/ioreq: Remove "hvm" prefixes from involved function names Oleksandr Tyshchenko
@ 2020-12-07 12:45   ` Jan Beulich
  2020-12-07 20:28     ` Oleksandr
  0 siblings, 1 reply; 127+ messages in thread
From: Jan Beulich @ 2020-12-07 12:45 UTC (permalink / raw)
  To: Oleksandr Tyshchenko
  Cc: Oleksandr Tyshchenko, Andrew Cooper, Roger Pau Monné,
	Wei Liu, George Dunlap, Ian Jackson, Julien Grall,
	Stefano Stabellini, Paul Durrant, Jun Nakajima, Kevin Tian,
	Julien Grall, xen-devel

On 30.11.2020 11:31, Oleksandr Tyshchenko wrote:
> @@ -301,8 +301,8 @@ bool is_ioreq_server_page(struct domain *d, const struct page_info *page)
>      return found;
>  }
>  
> -static void hvm_update_ioreq_evtchn(struct ioreq_server *s,
> -                                    struct ioreq_vcpu *sv)
> +static void ioreq_update_evtchn(struct ioreq_server *s,
> +                                struct ioreq_vcpu *sv)
>  {
>      ASSERT(spin_is_locked(&s->lock));

This looks to be an ioreq server function, which hence wants to be
named ioreq_server_update_evtchn()? Then

Reviewed-by: Jan Beulich <jbeulich@suse.com>

Jan



* RE: [PATCH V3 00/23] IOREQ feature (+ virtio-mmio) on Arm
  2020-11-30 11:22 ` [PATCH V3 00/23] IOREQ feature (+ virtio-mmio) on Arm Oleksandr
@ 2020-12-07 13:03   ` Wei Chen
  2020-12-07 21:03     ` Oleksandr
  0 siblings, 1 reply; 127+ messages in thread
From: Wei Chen @ 2020-12-07 13:03 UTC (permalink / raw)
  To: Oleksandr, xen-devel
  Cc: Oleksandr Tyshchenko, Paul Durrant, Jan Beulich, Andrew Cooper,
	Roger Pau Monné,
	Wei Liu, Julien Grall, George Dunlap, Ian Jackson, Julien Grall,
	Stefano Stabellini, Tim Deegan, Daniel De Graaf,
	Volodymyr Babchuk, Jun Nakajima, Kevin Tian, Anthony PERARD,
	Bertrand Marquis, Kaly Xin, Artem Mygaiev, Alex Bennée

Hi Oleksandr,

I have tested v3. It works well with the latest virtio-backend service[1].
[1] https://github.com/xen-troops/virtio-disk/commits/ioreq_ml1

Tested-by: Wei Chen <Wei.Chen@arm.com>

Regards,
Wei Chen

> -----Original Message-----
> From: Oleksandr <olekstysh@gmail.com>
> Sent: 30 November 2020 19:23
> To: xen-devel@lists.xenproject.org
> Cc: Oleksandr Tyshchenko <oleksandr_tyshchenko@epam.com>; Paul Durrant
> <paul@xen.org>; Jan Beulich <jbeulich@suse.com>; Andrew Cooper
> <andrew.cooper3@citrix.com>; Roger Pau Monné <roger.pau@citrix.com>;
> Wei Liu <wl@xen.org>; Julien Grall <Julien.Grall@arm.com>; George Dunlap
> <george.dunlap@citrix.com>; Ian Jackson <iwj@xenproject.org>; Julien Grall
> <julien@xen.org>; Stefano Stabellini <sstabellini@kernel.org>; Tim Deegan
> <tim@xen.org>; Daniel De Graaf <dgdegra@tycho.nsa.gov>; Volodymyr
> Babchuk <Volodymyr_Babchuk@epam.com>; Jun Nakajima
> <jun.nakajima@intel.com>; Kevin Tian <kevin.tian@intel.com>; Anthony
> PERARD <anthony.perard@citrix.com>; Bertrand Marquis
> <Bertrand.Marquis@arm.com>; Wei Chen <Wei.Chen@arm.com>; Kaly Xin
> <Kaly.Xin@arm.com>; Artem Mygaiev <joculator@gmail.com>; Alex Bennée
> <alex.bennee@linaro.org>
> Subject: Re: [PATCH V3 00/23] IOREQ feature (+ virtio-mmio) on Arm
> 
> 
> On 30.11.20 12:31, Oleksandr Tyshchenko wrote:
> > From: Oleksandr Tyshchenko <oleksandr_tyshchenko@epam.com>
> 
> Hello all.
> 
> Added missed subject line. I am sorry for the inconvenience.
> 
> 
> >
> >
> > Date: Sat, 28 Nov 2020 22:33:51 +0200
> > Subject: [PATCH V3 00/23] IOREQ feature (+ virtio-mmio) on Arm
> > MIME-Version: 1.0
> > Content-Type: text/plain; charset=UTF-8
> > Content-Transfer-Encoding: 8bit
> >
> > Hello all.
> >
> > The purpose of this patch series is to add IOREQ/DM support to Xen on Arm.
> > You can find an initial discussion at [1] and RFC/V1/V2 series at [2]/[3]/[4].
> > Xen on Arm requires some implementation to forward guest MMIO access to a
> device
> > model in order to implement virtio-mmio backend or even mediator outside of
> hypervisor.
> > As Xen on x86 already contains required support this series tries to make it
> common
> > and introduce Arm specific bits plus some new functionality. Patch series is
> based on
> > Julien's PoC "xen/arm: Add support for Guest IO forwarding to a device
> emulator".
> > Besides splitting existing IOREQ/DM support and introducing Arm side, the
> series
> > also includes virtio-mmio related changes (last 2 patches for toolstack)
> > for the reviewers to be able to see how the whole picture could look like.
> >
> > According to the initial discussion there are a few open questions/concerns
> > regarding security, performance in VirtIO solution:
> > 1. virtio-mmio vs virtio-pci, SPI vs MSI, different use-cases require different
> >     transport...
> > 2. virtio backend is able to access all guest memory, some kind of protection
> >     is needed: 'virtio-iommu in Xen' vs 'pre-shared-memory & memcpys in guest'
> > 3. interface between toolstack and 'out-of-qemu' virtio backend, avoid using
> >     Xenstore in virtio backend if possible.
> > 4. a lot of 'foreing mapping' could lead to the memory exhaustion, Julien
> >     has some idea regarding that.
> >
> > Looks like all of them are valid and worth considering, but the first thing
> > which we need on Arm is a mechanism to forward guest IO to a device
> emulator,
> > so let's focus on it in the first place.
> >
> > ***
> >
> > There are a lot of changes since RFC series, almost all TODOs were resolved on
> Arm,
> > Arm code was improved and hardened, common IOREQ/DM code became
> really arch-agnostic
> > (without HVM-ism), the "legacy" mechanism of mapping magic pages for the
> IOREQ servers
> > was left x86 specific, etc. But one TODO still remains which is "PIO handling"
> on Arm.
> > The "PIO handling" TODO is expected to left unaddressed for the current series.
> > It is not an big issue for now while Xen doesn't have support for vPCI on Arm.
> > On Arm64 they are only used for PCI IO Bar and we would probably want to
> expose
> > them to emulator as PIO access to make a DM completely arch-agnostic. So
> "PIO handling"
> > should be implemented when we add support for vPCI.
> >
> > I left interface untouched in the following patch
> > "xen/dm: Introduce xendevicemodel_set_irq_level DM op"
> > since there is still an open discussion what interface to use/what
> > information to pass to the hypervisor.
> >
> > There is a patch on review this series depends on:
> > https://patchwork.kernel.org/patch/11816689
> >
> > Please note, that IOREQ feature is disabled by default on Arm within current
> series.
> >
> > ***
> >
> > Patch series [5] was rebased on recent "staging branch"
> > (181f2c2 evtchn: double per-channel locking can't hit identical channels) and
> tested on
> > Renesas Salvator-X board + H3 ES3.0 SoC (Arm64) with virtio-mmio disk
> backend [6]
> > running in driver domain and unmodified Linux Guest running on existing
> > virtio-blk driver (frontend). No issues were observed. Guest domain
> 'reboot/destroy'
> > use-cases work properly. Patch series was only build-tested on x86.
> >
> > Please note, build-test passed for the following modes:
> > 1. x86: CONFIG_HVM=y / CONFIG_IOREQ_SERVER=y (default)
> > 2. x86: #CONFIG_HVM is not set / #CONFIG_IOREQ_SERVER is not set
> > 3. Arm64: CONFIG_HVM=y / CONFIG_IOREQ_SERVER=y
> > 4. Arm64: CONFIG_HVM=y / #CONFIG_IOREQ_SERVER is not set  (default)
> > 5. Arm32: CONFIG_HVM=y / CONFIG_IOREQ_SERVER=y
> > 6. Arm32: CONFIG_HVM=y / #CONFIG_IOREQ_SERVER is not set  (default)
> >
> > ***
> >
> > Any feedback/help would be highly appreciated.
> >
> > [1] https://lists.xenproject.org/archives/html/xen-devel/2020-
> 07/msg00825.html
> > [2] https://lists.xenproject.org/archives/html/xen-devel/2020-
> 08/msg00071.html
> > [3] https://lists.xenproject.org/archives/html/xen-devel/2020-
> 09/msg00732.html
> > [4] https://lists.xenproject.org/archives/html/xen-devel/2020-
> 10/msg01077.html
> > [5] https://github.com/otyshchenko1/xen/commits/ioreq_4.14_ml4
> > [6] https://github.com/xen-troops/virtio-disk/commits/ioreq_ml1
> >
> > Julien Grall (5):
> >    xen/dm: Make x86's DM feature common
> >    xen/mm: Make x86's XENMEM_resource_ioreq_server handling common
> >    arm/ioreq: Introduce arch specific bits for IOREQ/DM features
> >    xen/dm: Introduce xendevicemodel_set_irq_level DM op
> >    libxl: Introduce basic virtio-mmio support on Arm
> >
> > Oleksandr Tyshchenko (18):
> >    x86/ioreq: Prepare IOREQ feature for making it common
> >    x86/ioreq: Add IOREQ_STATUS_* #define-s and update code for moving
> >    x86/ioreq: Provide out-of-line wrapper for the handle_mmio()
> >    xen/ioreq: Make x86's IOREQ feature common
> >    xen/ioreq: Make x86's hvm_ioreq_needs_completion() common
> >    xen/ioreq: Make x86's hvm_mmio_first(last)_byte() common
> >    xen/ioreq: Make x86's hvm_ioreq_(page/vcpu/server) structs common
> >    xen/ioreq: Move x86's ioreq_server to struct domain
> >    xen/ioreq: Move x86's io_completion/io_req fields to struct vcpu
> >    xen/ioreq: Remove "hvm" prefixes from involved function names
> >    xen/ioreq: Use guest_cmpxchg64() instead of cmpxchg()
> >    xen/arm: Stick around in leave_hypervisor_to_guest until I/O has
> >      completed
> >    xen/mm: Handle properly reference in set_foreign_p2m_entry() on Arm
> >    xen/ioreq: Introduce domain_has_ioreq_server()
> >    xen/arm: io: Abstract sign-extension
> >    xen/ioreq: Make x86's send_invalidate_req() common
> >    xen/arm: Add mapcache invalidation handling
> >    [RFC] libxl: Add support for virtio-disk configuration
> >
> >   MAINTAINERS                                  |    8 +-
> >   tools/include/xendevicemodel.h               |    4 +
> >   tools/libs/devicemodel/core.c                |   18 +
> >   tools/libs/devicemodel/libxendevicemodel.map |    1 +
> >   tools/libs/light/Makefile                    |    1 +
> >   tools/libs/light/libxl_arm.c                 |   94 +-
> >   tools/libs/light/libxl_create.c              |    1 +
> >   tools/libs/light/libxl_internal.h            |    1 +
> >   tools/libs/light/libxl_types.idl             |   16 +
> >   tools/libs/light/libxl_types_internal.idl    |    1 +
> >   tools/libs/light/libxl_virtio_disk.c         |  109 +++
> >   tools/xl/Makefile                            |    2 +-
> >   tools/xl/xl.h                                |    3 +
> >   tools/xl/xl_cmdtable.c                       |   15 +
> >   tools/xl/xl_parse.c                          |  116 +++
> >   tools/xl/xl_virtio_disk.c                    |   46 +
> >   xen/arch/arm/Makefile                        |    2 +
> >   xen/arch/arm/dm.c                            |   89 ++
> >   xen/arch/arm/domain.c                        |    9 +
> >   xen/arch/arm/hvm.c                           |    4 +
> >   xen/arch/arm/io.c                            |   29 +-
> >   xen/arch/arm/ioreq.c                         |  126 +++
> >   xen/arch/arm/p2m.c                           |   48 +-
> >   xen/arch/arm/traps.c                         |   58 +-
> >   xen/arch/x86/Kconfig                         |    1 +
> >   xen/arch/x86/hvm/dm.c                        |  295 +-----
> >   xen/arch/x86/hvm/emulate.c                   |   80 +-
> >   xen/arch/x86/hvm/hvm.c                       |   12 +-
> >   xen/arch/x86/hvm/hypercall.c                 |    9 +-
> >   xen/arch/x86/hvm/intercept.c                 |    5 +-
> >   xen/arch/x86/hvm/io.c                        |   26 +-
> >   xen/arch/x86/hvm/ioreq.c                     | 1357 ++------------------------
> >   xen/arch/x86/hvm/stdvga.c                    |   10 +-
> >   xen/arch/x86/hvm/svm/nestedsvm.c             |    2 +-
> >   xen/arch/x86/hvm/vmx/realmode.c              |    6 +-
> >   xen/arch/x86/hvm/vmx/vvmx.c                  |    2 +-
> >   xen/arch/x86/mm.c                            |   46 +-
> >   xen/arch/x86/mm/p2m.c                        |   13 +-
> >   xen/arch/x86/mm/shadow/common.c              |    2 +-
> >   xen/common/Kconfig                           |    3 +
> >   xen/common/Makefile                          |    2 +
> >   xen/common/dm.c                              |  292 ++++++
> >   xen/common/ioreq.c                           | 1307 +++++++++++++++++++++++++
> >   xen/common/memory.c                          |   73 +-
> >   xen/include/asm-arm/domain.h                 |    3 +
> >   xen/include/asm-arm/hvm/ioreq.h              |  139 +++
> >   xen/include/asm-arm/mm.h                     |    8 -
> >   xen/include/asm-arm/mmio.h                   |    1 +
> >   xen/include/asm-arm/p2m.h                    |   19 +-
> >   xen/include/asm-arm/traps.h                  |   24 +
> >   xen/include/asm-x86/hvm/domain.h             |   43 -
> >   xen/include/asm-x86/hvm/emulate.h            |    2 +-
> >   xen/include/asm-x86/hvm/io.h                 |   17 -
> >   xen/include/asm-x86/hvm/ioreq.h              |   58 +-
> >   xen/include/asm-x86/hvm/vcpu.h               |   18 -
> >   xen/include/asm-x86/mm.h                     |    4 -
> >   xen/include/asm-x86/p2m.h                    |   24 +-
> >   xen/include/public/arch-arm.h                |    5 +
> >   xen/include/public/hvm/dm_op.h               |   16 +
> >   xen/include/xen/dm.h                         |   44 +
> >   xen/include/xen/ioreq.h                      |  146 +++
> >   xen/include/xen/p2m-common.h                 |    4 +
> >   xen/include/xen/sched.h                      |   32 +
> >   xen/include/xsm/dummy.h                      |    4 +-
> >   xen/include/xsm/xsm.h                        |    6 +-
> >   xen/xsm/dummy.c                              |    2 +-
> >   xen/xsm/flask/hooks.c                        |    5 +-
> >   67 files changed, 3084 insertions(+), 1884 deletions(-)
> >   create mode 100644 tools/libs/light/libxl_virtio_disk.c
> >   create mode 100644 tools/xl/xl_virtio_disk.c
> >   create mode 100644 xen/arch/arm/dm.c
> >   create mode 100644 xen/arch/arm/ioreq.c
> >   create mode 100644 xen/common/dm.c
> >   create mode 100644 xen/common/ioreq.c
> >   create mode 100644 xen/include/asm-arm/hvm/ioreq.h
> >   create mode 100644 xen/include/xen/dm.h
> >   create mode 100644 xen/include/xen/ioreq.h
> >
> --
> Regards,
> 
> Oleksandr Tyshchenko



* Re: [PATCH V3 01/23] x86/ioreq: Prepare IOREQ feature for making it common
  2020-12-07 11:13   ` Jan Beulich
@ 2020-12-07 15:27     ` Oleksandr
  2020-12-07 16:29       ` Jan Beulich
  0 siblings, 1 reply; 127+ messages in thread
From: Oleksandr @ 2020-12-07 15:27 UTC (permalink / raw)
  To: Jan Beulich
  Cc: Oleksandr Tyshchenko, Paul Durrant, Andrew Cooper,
	Roger Pau Monné,
	Wei Liu, Julien Grall, Stefano Stabellini, Julien Grall,
	xen-devel


On 07.12.20 13:13, Jan Beulich wrote:

Hi Jan

> On 30.11.2020 11:31, Oleksandr Tyshchenko wrote:
>> --- a/xen/arch/x86/hvm/ioreq.c
>> +++ b/xen/arch/x86/hvm/ioreq.c
>> @@ -17,15 +17,15 @@
>>    */
>>   
>>   #include <xen/ctype.h>
>> +#include <xen/domain.h>
>> +#include <xen/event.h>
>>   #include <xen/init.h>
>> +#include <xen/irq.h>
>>   #include <xen/lib.h>
>> -#include <xen/trace.h>
>> +#include <xen/paging.h>
>>   #include <xen/sched.h>
>> -#include <xen/irq.h>
>>   #include <xen/softirq.h>
>> -#include <xen/domain.h>
>> -#include <xen/event.h>
>> -#include <xen/paging.h>
>> +#include <xen/trace.h>
>>   #include <xen/vpci.h>
> Seeing this consolidation (thanks!), have you been able to figure
> out what xen/ctype.h is needed for here? It looks to me as if it
> could be dropped at the same time.

Not yet, but will re-check.


>
>> @@ -601,7 +610,7 @@ static int hvm_ioreq_server_map_pages(struct hvm_ioreq_server *s)
>>       return rc;
>>   }
>>   
>> -static void hvm_ioreq_server_unmap_pages(struct hvm_ioreq_server *s)
>> +void arch_ioreq_server_unmap_pages(struct hvm_ioreq_server *s)
>>   {
>>       hvm_unmap_ioreq_gfn(s, true);
>>       hvm_unmap_ioreq_gfn(s, false);
> How is this now different from ...
>
>> @@ -674,6 +683,12 @@ static int hvm_ioreq_server_alloc_rangesets(struct hvm_ioreq_server *s,
>>       return rc;
>>   }
>>   
>> +void arch_ioreq_server_enable(struct hvm_ioreq_server *s)
>> +{
>> +    hvm_remove_ioreq_gfn(s, false);
>> +    hvm_remove_ioreq_gfn(s, true);
>> +}
> ... this? Imo if at all possible there should be no such duplication
> (i.e. at least have this one simply call the earlier one).

I am afraid, I don't see any duplication between mentioned functions. 
Would you mind explaining?


>
>> @@ -1080,6 +1105,24 @@ int hvm_unmap_io_range_from_ioreq_server(struct domain *d, ioservid_t id,
>>       return rc;
>>   }
>>   
>> +/* Called with ioreq_server lock held */
>> +int arch_ioreq_server_map_mem_type(struct domain *d,
>> +                                   struct hvm_ioreq_server *s,
>> +                                   uint32_t flags)
>> +{
>> +    int rc = p2m_set_ioreq_server(d, flags, s);
>> +
>> +    if ( rc == 0 && flags == 0 )
>> +    {
>> +        const struct p2m_domain *p2m = p2m_get_hostp2m(d);
>> +
>> +        if ( read_atomic(&p2m->ioreq.entry_count) )
>> +            p2m_change_entry_type_global(d, p2m_ioreq_server, p2m_ram_rw);
>> +    }
>> +
>> +    return rc;
>> +}
>> +
>>   /*
>>    * Map or unmap an ioreq server to specific memory type. For now, only
>>    * HVMMEM_ioreq_server is supported, and in the future new types can be
>> @@ -1112,19 +1155,11 @@ int hvm_map_mem_type_to_ioreq_server(struct domain *d, ioservid_t id,
>>       if ( s->emulator != current->domain )
>>           goto out;
>>   
>> -    rc = p2m_set_ioreq_server(d, flags, s);
>> +    rc = arch_ioreq_server_map_mem_type(d, s, flags);
>>   
>>    out:
>>       spin_unlock_recursive(&d->arch.hvm.ioreq_server.lock);
>>   
>> -    if ( rc == 0 && flags == 0 )
>> -    {
>> -        struct p2m_domain *p2m = p2m_get_hostp2m(d);
>> -
>> -        if ( read_atomic(&p2m->ioreq.entry_count) )
>> -            p2m_change_entry_type_global(d, p2m_ioreq_server, p2m_ram_rw);
>> -    }
>> -
>>       return rc;
>>   }
> While you mention this change in the description, I'm still
> missing justification as to why the change is safe to make. I
> continue to think p2m_change_entry_type_global() would better
> not be called inside the locked region, if at all possible.
Well. I am afraid, I don't have a 100% justification why the change is 
safe to make as well
as I don't see an obvious reason why it is not safe to make (at least I 
didn't find a possible deadlock scenario while investigating the code).
I raised a question earlier whether I can fold this check in, which 
implied calling p2m_change_entry_type_global() with ioreq_server lock held.


If there is a concern with calling this inside the locked region 
(unfortunately still unclear for me at the moment), I will try to find 
another way how to split hvm_map_mem_type_to_ioreq_server() without
potentially unsafe change here *and* exposing 
p2m_change_entry_type_global() to the common code. Right now, I don't 
have any ideas how this could be split other than
introducing one more hook here to deal with p2m_change_entry_type_global 
(probably arch_ioreq_server_map_mem_type_complete?), but I don't expect 
it to be accepted.
I appreciate any ideas on that.
>
>> @@ -1239,33 +1279,28 @@ void hvm_destroy_all_ioreq_servers(struct domain *d)
>>       spin_unlock_recursive(&d->arch.hvm.ioreq_server.lock);
>>   }
>>   
>> -struct hvm_ioreq_server *hvm_select_ioreq_server(struct domain *d,
>> -                                                 ioreq_t *p)
>> +int arch_ioreq_server_get_type_addr(const struct domain *d,
>> +                                    const ioreq_t *p,
>> +                                    uint8_t *type,
>> +                                    uint64_t *addr)
>>   {
>> -    struct hvm_ioreq_server *s;
>> -    uint32_t cf8;
>> -    uint8_t type;
>> -    uint64_t addr;
>> -    unsigned int id;
>> +    unsigned int cf8 = d->arch.hvm.pci_cf8;
>>   
>>       if ( p->type != IOREQ_TYPE_COPY && p->type != IOREQ_TYPE_PIO )
>> -        return NULL;
>> -
>> -    cf8 = d->arch.hvm.pci_cf8;
>> +        return -EINVAL;
> The caller cares about only a boolean. Either make the function
> return bool, or (imo better, but others may like this less) have
> it return "type" instead of using indirection, using e.g.
> negative values to identify errors (which then could still be
> errno ones if you wish).

Makes sense. I will probably make the function return bool. Even if it 
returned "type" we would still have an indirection for "addr".


>
>> --- a/xen/include/asm-x86/hvm/ioreq.h
>> +++ b/xen/include/asm-x86/hvm/ioreq.h
>> @@ -19,6 +19,25 @@
>>   #ifndef __ASM_X86_HVM_IOREQ_H__
>>   #define __ASM_X86_HVM_IOREQ_H__
>>   
>> +#define HANDLE_BUFIOREQ(s) \
>> +    ((s)->bufioreq_handling != HVM_IOREQSRV_BUFIOREQ_OFF)
>> +
>> +bool arch_vcpu_ioreq_completion(enum hvm_io_completion io_completion);
>> +int arch_ioreq_server_map_pages(struct hvm_ioreq_server *s);
>> +void arch_ioreq_server_unmap_pages(struct hvm_ioreq_server *s);
>> +void arch_ioreq_server_enable(struct hvm_ioreq_server *s);
>> +void arch_ioreq_server_disable(struct hvm_ioreq_server *s);
>> +void arch_ioreq_server_destroy(struct hvm_ioreq_server *s);
>> +int arch_ioreq_server_map_mem_type(struct domain *d,
>> +                                   struct hvm_ioreq_server *s,
>> +                                   uint32_t flags);
>> +bool arch_ioreq_server_destroy_all(struct domain *d);
>> +int arch_ioreq_server_get_type_addr(const struct domain *d,
>> +                                    const ioreq_t *p,
>> +                                    uint8_t *type,
>> +                                    uint64_t *addr);
>> +void arch_ioreq_domain_init(struct domain *d);
>> +
>>   bool hvm_io_pending(struct vcpu *v);
>>   bool handle_hvm_io_completion(struct vcpu *v);
>>   bool is_ioreq_server_page(struct domain *d, const struct page_info *page);
> What's the plan here? Introduce them into the x86 header just
> to later move the entire block into the common one? Wouldn't
> it make sense to introduce the common header here right away?
> Or do you expect to convert some of the simpler ones to inline
> functions later on?
The former. The subsequent patch is moving the entire block(s) from 
here and from x86/hvm/ioreq.c to the common code in one go.
I thought it was a little bit odd to expose a header before exposing an 
implementation to the common code. Another reason is to minimize places 
that need touching by the current patch.
After all, this is done within a single series and without breakage in 
between. But if introducing the common header right away will make the 
patch cleaner and more correct, I am absolutely OK and happy to update 
the patch. Shall I?


-- 
Regards,

Oleksandr Tyshchenko




* Re: [PATCH V3 02/23] x86/ioreq: Add IOREQ_STATUS_* #define-s and update code for moving
  2020-12-07 11:19   ` Jan Beulich
@ 2020-12-07 15:37     ` Oleksandr
  0 siblings, 0 replies; 127+ messages in thread
From: Oleksandr @ 2020-12-07 15:37 UTC (permalink / raw)
  To: Jan Beulich
  Cc: Oleksandr Tyshchenko, Paul Durrant, Andrew Cooper,
	Roger Pau Monné,
	Wei Liu, Julien Grall, Stefano Stabellini, Julien Grall,
	xen-devel


On 07.12.20 13:19, Jan Beulich wrote:

Hi Jan

> On 30.11.2020 11:31, Oleksandr Tyshchenko wrote:
>> --- a/xen/include/asm-x86/hvm/ioreq.h
>> +++ b/xen/include/asm-x86/hvm/ioreq.h
>> @@ -74,6 +74,10 @@ unsigned int hvm_broadcast_ioreq(ioreq_t *p, bool buffered);
>>   
>>   void hvm_ioreq_init(struct domain *d);
>>   
>> +#define IOREQ_STATUS_HANDLED     X86EMUL_OKAY
>> +#define IOREQ_STATUS_UNHANDLED   X86EMUL_UNHANDLEABLE
>> +#define IOREQ_STATUS_RETRY       X86EMUL_RETRY
> This correlation may not be altered. I think a comment is needed
> to this effect, to avoid someone trying to subsequently fold the
> x86 and (to be introduced) Arm ones.

ok, will add.


> With that
> Acked-by: Jan Beulich <jbeulich@suse.com>

Thank you.


-- 
Regards,

Oleksandr Tyshchenko




* Re: [PATCH V3 03/23] x86/ioreq: Provide out-of-line wrapper for the handle_mmio()
  2020-12-07 11:27   ` Jan Beulich
@ 2020-12-07 15:39     ` Oleksandr
  0 siblings, 0 replies; 127+ messages in thread
From: Oleksandr @ 2020-12-07 15:39 UTC (permalink / raw)
  To: Jan Beulich
  Cc: Oleksandr Tyshchenko, Paul Durrant, Andrew Cooper,
	Roger Pau Monné,
	Wei Liu, Julien Grall, Stefano Stabellini, Julien Grall,
	xen-devel


On 07.12.20 13:27, Jan Beulich wrote:

Hi Jan

> On 30.11.2020 11:31, Oleksandr Tyshchenko wrote:
>> --- a/xen/arch/x86/hvm/ioreq.c
>> +++ b/xen/arch/x86/hvm/ioreq.c
>> @@ -36,6 +36,11 @@
>>   #include <public/hvm/ioreq.h>
>>   #include <public/hvm/params.h>
>>   
>> +bool ioreq_complete_mmio(void)
>> +{
>> +    return handle_mmio();
>> +}
> As indicated before I don't like out-of-line functions like this
> one; I think a #define would be quite fine here, but Paul as the
> maintainer thinks differently. So be it. However, shouldn't this
> function be named arch_ioreq_complete_mmio() according to the
> new naming model, and then ...
>
>> --- a/xen/include/asm-x86/hvm/ioreq.h
>> +++ b/xen/include/asm-x86/hvm/ioreq.h
>> @@ -74,6 +74,8 @@ unsigned int hvm_broadcast_ioreq(ioreq_t *p, bool buffered);
>>   
>>   void hvm_ioreq_init(struct domain *d);
>>   
>> +bool ioreq_complete_mmio(void);
> ... get declared next to the other arch_*() hooks? With this

sounds reasonable, will update
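
I.e.:

bool arch_ioreq_complete_mmio(void)
{
    return handle_mmio();
}

with the declaration placed next to the other arch_*() hooks.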


> Reviewed-by: Jan Beulich <jbeulich@suse.com>
Thank you

-- 
Regards,

Oleksandr Tyshchenko




* Re: [PATCH V3 01/23] x86/ioreq: Prepare IOREQ feature for making it common
  2020-12-07 15:27     ` Oleksandr
@ 2020-12-07 16:29       ` Jan Beulich
  2020-12-07 17:21         ` Oleksandr
  0 siblings, 1 reply; 127+ messages in thread
From: Jan Beulich @ 2020-12-07 16:29 UTC (permalink / raw)
  To: Oleksandr
  Cc: Oleksandr Tyshchenko, Paul Durrant, Andrew Cooper,
	Roger Pau Monné,
	Wei Liu, Julien Grall, Stefano Stabellini, Julien Grall,
	xen-devel

On 07.12.2020 16:27, Oleksandr wrote:
> On 07.12.20 13:13, Jan Beulich wrote:
>> On 30.11.2020 11:31, Oleksandr Tyshchenko wrote:
>>> @@ -601,7 +610,7 @@ static int hvm_ioreq_server_map_pages(struct hvm_ioreq_server *s)
>>>       return rc;
>>>   }
>>>   
>>> -static void hvm_ioreq_server_unmap_pages(struct hvm_ioreq_server *s)
>>> +void arch_ioreq_server_unmap_pages(struct hvm_ioreq_server *s)
>>>   {
>>>       hvm_unmap_ioreq_gfn(s, true);
>>>       hvm_unmap_ioreq_gfn(s, false);
>> How is this now different from ...
>>
>>> @@ -674,6 +683,12 @@ static int hvm_ioreq_server_alloc_rangesets(struct hvm_ioreq_server *s,
>>>       return rc;
>>>   }
>>>   
>>> +void arch_ioreq_server_enable(struct hvm_ioreq_server *s)
>>> +{
>>> +    hvm_remove_ioreq_gfn(s, false);
>>> +    hvm_remove_ioreq_gfn(s, true);
>>> +}
>> ... this? Imo if at all possible there should be no such duplication
>> (i.e. at least have this one simply call the earlier one).
> 
> I am afraid, I don't see any duplication between mentioned functions. 
> Would you mind explaining?

Ouch - somehow my eyes considered "unmap" == "remove". I'm sorry
for the noise.

>>> @@ -1080,6 +1105,24 @@ int hvm_unmap_io_range_from_ioreq_server(struct domain *d, ioservid_t id,
>>>       return rc;
>>>   }
>>>   
>>> +/* Called with ioreq_server lock held */
>>> +int arch_ioreq_server_map_mem_type(struct domain *d,
>>> +                                   struct hvm_ioreq_server *s,
>>> +                                   uint32_t flags)
>>> +{
>>> +    int rc = p2m_set_ioreq_server(d, flags, s);
>>> +
>>> +    if ( rc == 0 && flags == 0 )
>>> +    {
>>> +        const struct p2m_domain *p2m = p2m_get_hostp2m(d);
>>> +
>>> +        if ( read_atomic(&p2m->ioreq.entry_count) )
>>> +            p2m_change_entry_type_global(d, p2m_ioreq_server, p2m_ram_rw);
>>> +    }
>>> +
>>> +    return rc;
>>> +}
>>> +
>>>   /*
>>>    * Map or unmap an ioreq server to specific memory type. For now, only
>>>    * HVMMEM_ioreq_server is supported, and in the future new types can be
>>> @@ -1112,19 +1155,11 @@ int hvm_map_mem_type_to_ioreq_server(struct domain *d, ioservid_t id,
>>>       if ( s->emulator != current->domain )
>>>           goto out;
>>>   
>>> -    rc = p2m_set_ioreq_server(d, flags, s);
>>> +    rc = arch_ioreq_server_map_mem_type(d, s, flags);
>>>   
>>>    out:
>>>       spin_unlock_recursive(&d->arch.hvm.ioreq_server.lock);
>>>   
>>> -    if ( rc == 0 && flags == 0 )
>>> -    {
>>> -        struct p2m_domain *p2m = p2m_get_hostp2m(d);
>>> -
>>> -        if ( read_atomic(&p2m->ioreq.entry_count) )
>>> -            p2m_change_entry_type_global(d, p2m_ioreq_server, p2m_ram_rw);
>>> -    }
>>> -
>>>       return rc;
>>>   }
>> While you mention this change in the description, I'm still
>> missing justification as to why the change is safe to make. I
>> continue to think p2m_change_entry_type_global() would better
>> not be called inside the locked region, if at all possible.
> Well. I am afraid, I don't have a 100% justification why the change is 
> safe to make as well
> as I don't see an obvious reason why it is not safe to make (at least I 
> didn't find a possible deadlock scenario while investigating the code).
> I raised a question earlier whether I can fold this check in, which 
> implied calling p2m_change_entry_type_global() with ioreq_server lock held.

I'm aware of the earlier discussion. But "didn't find" isn't good
enough in a case like this, and since it's likely hard to indeed
prove there's no deadlock possible, I think it's best to avoid
having to provide such a proof by avoiding the nesting.

> If there is a concern with calling this inside the locked region 
> (unfortunately still unclear for me at the moment), I will try to find 
> another way how to split hvm_map_mem_type_to_ioreq_server() without
> potentially unsafe change here *and* exposing 
> p2m_change_entry_type_global() to the common code. Right now, I don't 
> have any ideas how this could be split other than
> introducing one more hook here to deal with p2m_change_entry_type_global 
> (probably arch_ioreq_server_map_mem_type_complete?), but I don't expect 
> it to be accepted.
> I appreciate any ideas on that.

Is there a reason why the simplest solution (two independent
arch_*() calls) won't do? If so, what are the constraints?
Can the first one e.g. somehow indicate what needs to happen
after the lock was dropped? But the two calls look independent
right now, so I don't see any complicating factors.
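
E.g. (second hook name made up, merely to illustrate the shape):

    rc = arch_ioreq_server_map_mem_type(d, s, flags);

 out:
    spin_unlock_recursive(&d->arch.hvm.ioreq_server.lock);

    if ( rc == 0 )
        arch_ioreq_server_map_mem_type_completed(d, s, flags);

    return rc;

where on x86 the second hook would do the flags == 0 / entry_count check
and call p2m_change_entry_type_global() outside of the locked region,
while other architectures could leave it empty.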

>>> --- a/xen/include/asm-x86/hvm/ioreq.h
>>> +++ b/xen/include/asm-x86/hvm/ioreq.h
>>> @@ -19,6 +19,25 @@
>>>   #ifndef __ASM_X86_HVM_IOREQ_H__
>>>   #define __ASM_X86_HVM_IOREQ_H__
>>>   
>>> +#define HANDLE_BUFIOREQ(s) \
>>> +    ((s)->bufioreq_handling != HVM_IOREQSRV_BUFIOREQ_OFF)
>>> +
>>> +bool arch_vcpu_ioreq_completion(enum hvm_io_completion io_completion);
>>> +int arch_ioreq_server_map_pages(struct hvm_ioreq_server *s);
>>> +void arch_ioreq_server_unmap_pages(struct hvm_ioreq_server *s);
>>> +void arch_ioreq_server_enable(struct hvm_ioreq_server *s);
>>> +void arch_ioreq_server_disable(struct hvm_ioreq_server *s);
>>> +void arch_ioreq_server_destroy(struct hvm_ioreq_server *s);
>>> +int arch_ioreq_server_map_mem_type(struct domain *d,
>>> +                                   struct hvm_ioreq_server *s,
>>> +                                   uint32_t flags);
>>> +bool arch_ioreq_server_destroy_all(struct domain *d);
>>> +int arch_ioreq_server_get_type_addr(const struct domain *d,
>>> +                                    const ioreq_t *p,
>>> +                                    uint8_t *type,
>>> +                                    uint64_t *addr);
>>> +void arch_ioreq_domain_init(struct domain *d);
>>> +
>>>   bool hvm_io_pending(struct vcpu *v);
>>>   bool handle_hvm_io_completion(struct vcpu *v);
>>>   bool is_ioreq_server_page(struct domain *d, const struct page_info *page);
>> What's the plan here? Introduce them into the x86 header just
>> to later move the entire block into the common one? Wouldn't
>> it make sense to introduce the common header here right away?
>> Or do you expect to convert some of the simpler ones to inline
>> functions later on?
> The former. The subsequent patch is moving the entire block(s) from 
> here and from x86/hvm/ioreq.c to the common code in one go.

I think I saw it move the _other_ pieces there, and this block
left here. (FAOD my comment is about the arch_*() declarations
you add, not the patch context in view.)

> I thought it was a little bit odd to expose a header before exposing an 
> implementation to the common code. Another reason is to minimize places 
> that need touching by the current patch.

By exposing arch_*() declarations you don't give the impression
of exposing any "implementation". These are helpers the
implementation is to invoke; I'm fine with you moving the
declarations of the functions actually constituting this
component's external interface only once you also move the
implementation to common code.

Jan



* Re: [PATCH V3 01/23] x86/ioreq: Prepare IOREQ feature for making it common
  2020-12-07 16:29       ` Jan Beulich
@ 2020-12-07 17:21         ` Oleksandr
  0 siblings, 0 replies; 127+ messages in thread
From: Oleksandr @ 2020-12-07 17:21 UTC (permalink / raw)
  To: Jan Beulich
  Cc: Oleksandr Tyshchenko, Paul Durrant, Andrew Cooper,
	Roger Pau Monné,
	Wei Liu, Julien Grall, Stefano Stabellini, Julien Grall,
	xen-devel


Hi Jan


>>>> @@ -1080,6 +1105,24 @@ int hvm_unmap_io_range_from_ioreq_server(struct domain *d, ioservid_t id,
>>>>        return rc;
>>>>    }
>>>>    
>>>> +/* Called with ioreq_server lock held */
>>>> +int arch_ioreq_server_map_mem_type(struct domain *d,
>>>> +                                   struct hvm_ioreq_server *s,
>>>> +                                   uint32_t flags)
>>>> +{
>>>> +    int rc = p2m_set_ioreq_server(d, flags, s);
>>>> +
>>>> +    if ( rc == 0 && flags == 0 )
>>>> +    {
>>>> +        const struct p2m_domain *p2m = p2m_get_hostp2m(d);
>>>> +
>>>> +        if ( read_atomic(&p2m->ioreq.entry_count) )
>>>> +            p2m_change_entry_type_global(d, p2m_ioreq_server, p2m_ram_rw);
>>>> +    }
>>>> +
>>>> +    return rc;
>>>> +}
>>>> +
>>>>    /*
>>>>     * Map or unmap an ioreq server to specific memory type. For now, only
>>>>     * HVMMEM_ioreq_server is supported, and in the future new types can be
>>>> @@ -1112,19 +1155,11 @@ int hvm_map_mem_type_to_ioreq_server(struct domain *d, ioservid_t id,
>>>>        if ( s->emulator != current->domain )
>>>>            goto out;
>>>>    
>>>> -    rc = p2m_set_ioreq_server(d, flags, s);
>>>> +    rc = arch_ioreq_server_map_mem_type(d, s, flags);
>>>>    
>>>>     out:
>>>>        spin_unlock_recursive(&d->arch.hvm.ioreq_server.lock);
>>>>    
>>>> -    if ( rc == 0 && flags == 0 )
>>>> -    {
>>>> -        struct p2m_domain *p2m = p2m_get_hostp2m(d);
>>>> -
>>>> -        if ( read_atomic(&p2m->ioreq.entry_count) )
>>>> -            p2m_change_entry_type_global(d, p2m_ioreq_server, p2m_ram_rw);
>>>> -    }
>>>> -
>>>>        return rc;
>>>>    }
>>> While you mention this change in the description, I'm still
>>> missing justification as to why the change is safe to make. I
>>> continue to think p2m_change_entry_type_global() would better
>>> not be called inside the locked region, if at all possible.
>> Well. I am afraid, I don't have a 100% justification why the change is
>> safe to make as well
>> as I don't see an obvious reason why it is not safe to make (at least I
>> didn't find a possible deadlock scenario while investigating the code).
>> I raised a question earlier whether I can fold this check in, which
>> implied calling p2m_change_entry_type_global() with ioreq_server lock held.
> I'm aware of the earlier discussion. But "didn't find" isn't good
> enough in a case like this, and since it's likely hard to indeed
> prove there's no deadlock possible, I think it's best to avoid
> having to provide such a proof by avoiding the nesting.

Agree here.


>
>> If there is a concern with calling this inside the locked region
>> (unfortunately still unclear for me at the moment), I will try to find
>> another way how to split hvm_map_mem_type_to_ioreq_server() without
>> potentially unsafe change here *and* exposing
>> p2m_change_entry_type_global() to the common code. Right now, I don't
>> have any ideas how this could be split other than
>> introducing one more hook here to deal with p2m_change_entry_type_global
>> (probably arch_ioreq_server_map_mem_type_complete?), but I don't expect
>> it to be accepted.
>> I appreciate any ideas on that.
> Is there a reason why the simplest solution (two independent
> arch_*() calls) won't do? If so, what are the constraints?

There is no reason.


> Can the first one e.g. somehow indicate what needs to happen
> after the lock was dropped?

I think, yes.


> But the two calls look independent
> right now, so I don't see any complicating factors.

ok, will go the "two independent arch hooks" route then.

Thank you for the idea.


-- 
Regards,

Oleksandr Tyshchenko




* Re: [PATCH V3 04/23] xen/ioreq: Make x86's IOREQ feature common
  2020-12-07 11:41   ` Jan Beulich
@ 2020-12-07 19:43     ` Oleksandr
  2020-12-08  9:21       ` Jan Beulich
  0 siblings, 1 reply; 127+ messages in thread
From: Oleksandr @ 2020-12-07 19:43 UTC (permalink / raw)
  To: Jan Beulich
  Cc: Oleksandr Tyshchenko, Andrew Cooper, George Dunlap, Ian Jackson,
	Julien Grall, Stefano Stabellini, Wei Liu, Roger Pau Monné,
	Paul Durrant, Tim Deegan, Julien Grall, xen-devel


On 07.12.20 13:41, Jan Beulich wrote:

Hi Jan

> On 30.11.2020 11:31, Oleksandr Tyshchenko wrote:
>> --- a/xen/include/asm-x86/hvm/ioreq.h
>> +++ b/xen/include/asm-x86/hvm/ioreq.h
>> @@ -19,8 +19,7 @@
>>   #ifndef __ASM_X86_HVM_IOREQ_H__
>>   #define __ASM_X86_HVM_IOREQ_H__
>>   
>> -#define HANDLE_BUFIOREQ(s) \
>> -    ((s)->bufioreq_handling != HVM_IOREQSRV_BUFIOREQ_OFF)
>> +#include <xen/ioreq.h>
> Is there a strict need to do it this way round? Usually the common
> header would include the arch one ...
The reason was to keep the bunch of x86 files (which included 
asm/hvm/ioreq.h so far) from suffering from the relocation of the IOREQ 
interface, and as a result to limit the number of files which needed 
touching. If the common rule is the other way around, I will follow it.
So I will change to include the arch header from the common one. Or even 
include the arch header only where it is required (common ioreq.c right 
now and Arm io.c in future).


>> @@ -38,42 +37,6 @@ int arch_ioreq_server_get_type_addr(const struct domain *d,
>>                                       uint64_t *addr);
>>   void arch_ioreq_domain_init(struct domain *d);
> As already mentioned in an earlier reply: What about these? They
> shouldn't get declared once per arch. If anything, ones that
> want to be inline functions can / should remain in the per-arch
> header.
I don't entirely get the suggestion. Is the suggestion to make the 
"simple" ones inline? Why not - there are a few which probably want to 
be inline, for example:
- arch_ioreq_domain_init
- arch_ioreq_server_destroy
- arch_ioreq_server_destroy_all
- arch_ioreq_server_map_mem_type (probably)
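
These could stay as trivial static inlines in the per-arch header on an
architecture where they don't need to do anything (sketch only):

static inline void arch_ioreq_domain_init(struct domain *d)
{
}

static inline void arch_ioreq_server_destroy(struct hvm_ioreq_server *s)
{
}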


-- 
Regards,

Oleksandr Tyshchenko




* Re: [PATCH V3 08/23] xen/ioreq: Move x86's ioreq_server to struct domain
  2020-12-07 12:04   ` Jan Beulich
  2020-12-07 12:12     ` Paul Durrant
@ 2020-12-07 19:52     ` Oleksandr
  1 sibling, 0 replies; 127+ messages in thread
From: Oleksandr @ 2020-12-07 19:52 UTC (permalink / raw)
  To: Jan Beulich
  Cc: Paul Durrant, Oleksandr Tyshchenko, Andrew Cooper, George Dunlap,
	Ian Jackson, Julien Grall, Stefano Stabellini, Wei Liu,
	Roger Pau Monné,
	Julien Grall, xen-devel


On 07.12.20 14:04, Jan Beulich wrote:

Hi Jan

> On 30.11.2020 11:31, Oleksandr Tyshchenko wrote:
>> From: Oleksandr Tyshchenko <oleksandr_tyshchenko@epam.com>
>>
>> The IOREQ is a common feature now and this struct will be used
>> on Arm as is. Move it to common struct domain. This also
>> significantly reduces the layering violation in the common code
>> (*arch.hvm* usage).
>>
>> We don't move ioreq_gfn since it is not used in the common code
>> (the "legacy" mechanism is x86 specific).
>>
>> Signed-off-by: Oleksandr Tyshchenko <oleksandr_tyshchenko@epam.com>
> Applicable parts
> Acked-by: Jan Beulich <jbeulich@suse.com>

Thank you.


> yet with a question, but maybe more to Paul than to you:
>
>> --- a/xen/include/asm-x86/hvm/domain.h
>> +++ b/xen/include/asm-x86/hvm/domain.h
>> @@ -63,8 +63,6 @@ struct hvm_pi_ops {
>>       void (*vcpu_block)(struct vcpu *);
>>   };
>>   
>> -#define MAX_NR_IOREQ_SERVERS 8
>> -
>>   struct hvm_domain {
>>       /* Guest page range used for non-default ioreq servers */
>>       struct {
>> @@ -73,12 +71,6 @@ struct hvm_domain {
>>           unsigned long legacy_mask; /* indexed by HVM param number */
>>       } ioreq_gfn;
>>   
>> -    /* Lock protects all other values in the sub-struct and the default */
>> -    struct {
>> -        spinlock_t              lock;
>> -        struct ioreq_server *server[MAX_NR_IOREQ_SERVERS];
>> -    } ioreq_server;
>> -
>>       /* Cached CF8 for guest PCI config cycles */
>>       uint32_t                pci_cf8;
>>   
>> --- a/xen/include/xen/sched.h
>> +++ b/xen/include/xen/sched.h
>> @@ -316,6 +316,8 @@ struct sched_unit {
>>   
>>   struct evtchn_port_ops;
>>   
>> +#define MAX_NR_IOREQ_SERVERS 8
>> +
>>   struct domain
>>   {
>>       domid_t          domain_id;
>> @@ -523,6 +525,14 @@ struct domain
>>       /* Argo interdomain communication support */
>>       struct argo_domain *argo;
>>   #endif
>> +
>> +#ifdef CONFIG_IOREQ_SERVER
>> +    /* Lock protects all other values in the sub-struct and the default */
>> +    struct {
>> +        spinlock_t              lock;
>> +        struct ioreq_server     *server[MAX_NR_IOREQ_SERVERS];
>> +    } ioreq_server;
>> +#endif
> The comment gets merely moved, but what "default" does it talk about?
> Is this a stale part which would better be dropped at this occasion?

I saw Paul's answer; I will drop the stale part.


-- 
Regards,

Oleksandr Tyshchenko



^ permalink raw reply	[flat|nested] 127+ messages in thread

* Re: [PATCH V3 09/23] xen/dm: Make x86's DM feature common
  2020-12-07 12:08   ` Jan Beulich
@ 2020-12-07 20:23     ` Oleksandr
  2020-12-08  9:30       ` Jan Beulich
  0 siblings, 1 reply; 127+ messages in thread
From: Oleksandr @ 2020-12-07 20:23 UTC (permalink / raw)
  To: Jan Beulich
  Cc: Julien Grall, Andrew Cooper, Roger Pau Monné,
	Wei Liu, George Dunlap, Ian Jackson, Julien Grall,
	Stefano Stabellini, Daniel De Graaf, Oleksandr Tyshchenko,
	xen-devel


On 07.12.20 14:08, Jan Beulich wrote:

Hi Jan.

> On 30.11.2020 11:31, Oleksandr Tyshchenko wrote:
>> From: Julien Grall <julien.grall@arm.com>
>>
>> As a lot of x86 code can be re-used on Arm later on, this patch
>> splits devicemodel support into common and arch specific parts.
>>
>> The common DM feature is supposed to be built with IOREQ_SERVER
>> option enabled (as well as the IOREQ feature), which is selected
>> for x86's config HVM for now.
>>
>> Also update XSM code a bit to let DM op be used on Arm.
>>
>> This support is going to be used on Arm to be able run device
>> emulator outside of Xen hypervisor.
>>
>> Signed-off-by: Julien Grall <julien.grall@arm.com>
>> Signed-off-by: Oleksandr Tyshchenko <oleksandr_tyshchenko@epam.com>
>>
>> ---
>> Please note, this is a split/cleanup/hardening of Julien's PoC:
>> "Add support for Guest IO forwarding to a device emulator"
>>
>> Changes RFC -> V1:
>>     - update XSM, related changes were pulled from:
>>       [RFC PATCH V1 04/12] xen/arm: Introduce arch specific bits for IOREQ/DM features
>>
>> Changes V1 -> V2:
>>     - update the author of a patch
>>     - update patch description
>>     - introduce xen/dm.h and move definitions here
>>
>> Changes V2 -> V3:
>>     - no changes
> And my concern regarding the common vs arch nesting also hasn't
> changed.


I am sorry, I might have misread your comment, but I failed to see any
request(s) for changes that were obvious to me.
I have just re-read the previous discussion...
So the question about considering doing it the other way around (the top
level dm-op handling being arch-specific and calling into e.g.
ioreq_server_dm_op() for otherwise unhandled ops) is exactly the concern
which I should have addressed?

-- 
Regards,

Oleksandr Tyshchenko



^ permalink raw reply	[flat|nested] 127+ messages in thread

* Re: [PATCH V3 12/23] xen/ioreq: Remove "hvm" prefixes from involved function names
  2020-12-07 12:45   ` Jan Beulich
@ 2020-12-07 20:28     ` Oleksandr
  0 siblings, 0 replies; 127+ messages in thread
From: Oleksandr @ 2020-12-07 20:28 UTC (permalink / raw)
  To: Jan Beulich
  Cc: Oleksandr Tyshchenko, Andrew Cooper, Roger Pau Monné,
	Wei Liu, George Dunlap, Ian Jackson, Julien Grall,
	Stefano Stabellini, Paul Durrant, Jun Nakajima, Kevin Tian,
	Julien Grall, xen-devel


On 07.12.20 14:45, Jan Beulich wrote:

Hi Jan

> On 30.11.2020 11:31, Oleksandr Tyshchenko wrote:
>> @@ -301,8 +301,8 @@ bool is_ioreq_server_page(struct domain *d, const struct page_info *page)
>>       return found;
>>   }
>>   
>> -static void hvm_update_ioreq_evtchn(struct ioreq_server *s,
>> -                                    struct ioreq_vcpu *sv)
>> +static void ioreq_update_evtchn(struct ioreq_server *s,
>> +                                struct ioreq_vcpu *sv)
>>   {
>>       ASSERT(spin_is_locked(&s->lock));
> This looks to be an ioreq server function, which hence wants to be
> named ioreq_server_update_evtchn()? Then

Will rename


>
> Reviewed-by: Jan Beulich <jbeulich@suse.com>
Thank you

-- 
Regards,

Oleksandr Tyshchenko



^ permalink raw reply	[flat|nested] 127+ messages in thread

* Re: [PATCH V3 11/23] xen/ioreq: Move x86's io_completion/io_req fields to struct vcpu
  2020-12-07 12:32   ` Jan Beulich
@ 2020-12-07 20:59     ` Oleksandr
  2020-12-08  7:52       ` Paul Durrant
  0 siblings, 1 reply; 127+ messages in thread
From: Oleksandr @ 2020-12-07 20:59 UTC (permalink / raw)
  To: Jan Beulich, Paul Durrant
  Cc: Oleksandr Tyshchenko, Andrew Cooper, Roger Pau Monné,
	Wei Liu, George Dunlap, Ian Jackson, Julien Grall,
	Stefano Stabellini, Jun Nakajima, Kevin Tian, Julien Grall,
	xen-devel


On 07.12.20 14:32, Jan Beulich wrote:

Hi Jan, Paul.

> On 30.11.2020 11:31, Oleksandr Tyshchenko wrote:
>> --- a/xen/arch/x86/hvm/emulate.c
>> +++ b/xen/arch/x86/hvm/emulate.c
>> @@ -142,8 +142,8 @@ void hvmemul_cancel(struct vcpu *v)
>>   {
>>       struct hvm_vcpu_io *vio = &v->arch.hvm.hvm_io;
>>   
>> -    vio->io_req.state = STATE_IOREQ_NONE;
>> -    vio->io_completion = HVMIO_no_completion;
>> +    v->io.req.state = STATE_IOREQ_NONE;
>> +    v->io.completion = IO_no_completion;
>>       vio->mmio_cache_count = 0;
>>       vio->mmio_insn_bytes = 0;
>>       vio->mmio_access = (struct npfec){};
>> @@ -159,7 +159,7 @@ static int hvmemul_do_io(
>>   {
>>       struct vcpu *curr = current;
>>       struct domain *currd = curr->domain;
>> -    struct hvm_vcpu_io *vio = &curr->arch.hvm.hvm_io;
>> +    struct vcpu_io *vio = &curr->io;
> Taking just these two hunks: "vio" would now stand for two entirely
> different things. I realize the name is applicable to both, but I
> wonder if such naming isn't going to risk confusion. Despite being
> relatively familiar with the involved code, I've been repeatedly
> unsure what exactly "vio" covers, and needed to go back to the

Good comment... I agree that with the naming scheme in the current patch
the code becomes a little bit confusing to read.


> header. So together with the name possible adjustment mentioned
> further down, maybe "vcpu_io" also wants it name changed, such that
> the variable then also could sensibly be named (slightly)
> differently? struct vcpu_io_state maybe? Or alternatively rename
> variables of type struct hvm_vcpu_io * to hvio or hio? Otoh the
> savings aren't very big for just ->io, so maybe better to stick to
> the prior name with the prior type, and not introduce local
> variables at all for the new field, like you already have it in the
> former case?
I would much prefer the last suggestion, which is "not introduce local
variables at all for the new field" (I admit I was thinking along almost
the same lines, but hadn't chosen this direction).
But I am OK with any of the suggestions here. Paul, what do you think?


>
>> --- a/xen/include/xen/sched.h
>> +++ b/xen/include/xen/sched.h
>> @@ -145,6 +145,21 @@ void evtchn_destroy_final(struct domain *d); /* from complete_domain_destroy */
>>   
>>   struct waitqueue_vcpu;
>>   
>> +enum io_completion {
>> +    IO_no_completion,
>> +    IO_mmio_completion,
>> +    IO_pio_completion,
>> +#ifdef CONFIG_X86
>> +    IO_realmode_completion,
>> +#endif
>> +};
> I'm not entirely happy with io_ / IO_ here - they seem a little
> too generic. How about ioreq_ / IOREQ_ respectively?

I am OK with that, but would like to hear Paul's opinion on both questions.


>
>> +struct vcpu_io {
>> +    /* I/O request in flight to device model. */
>> +    enum io_completion   completion;
>> +    ioreq_t              req;
>> +};
>> +
>>   struct vcpu
>>   {
>>       int              vcpu_id;
>> @@ -256,6 +271,10 @@ struct vcpu
>>       struct vpci_vcpu vpci;
>>   
>>       struct arch_vcpu arch;
>> +
>> +#ifdef CONFIG_IOREQ_SERVER
>> +    struct vcpu_io io;
>> +#endif
>>   };
> I don't have a good solution in mind, and I'm also not meaning to
> necessarily request a change here, but I'd like to point out that
> this does away (for this part of it only, of course) with the
> overlaying of the PV and HVM sub-structs on x86. As long as the
> HVM part is the far bigger one, that's not a problem, but I wanted
> to mention the aspect nevertheless.
>
> Jan

-- 
Regards,

Oleksandr Tyshchenko



^ permalink raw reply	[flat|nested] 127+ messages in thread

* Re: [PATCH V3 00/23] IOREQ feature (+ virtio-mmio) on Arm
  2020-12-07 13:03   ` Wei Chen
@ 2020-12-07 21:03     ` Oleksandr
  0 siblings, 0 replies; 127+ messages in thread
From: Oleksandr @ 2020-12-07 21:03 UTC (permalink / raw)
  To: Wei Chen
  Cc: xen-devel, Oleksandr Tyshchenko, Paul Durrant, Jan Beulich,
	Andrew Cooper, Roger Pau Monné,
	Wei Liu, Julien Grall, George Dunlap, Ian Jackson, Julien Grall,
	Stefano Stabellini, Tim Deegan, Daniel De Graaf,
	Volodymyr Babchuk, Jun Nakajima, Kevin Tian, Anthony PERARD,
	Bertrand Marquis, Kaly Xin, Artem Mygaiev, Alex Bennée


On 07.12.20 15:03, Wei Chen wrote:
> Hi Oleksandr,

Hi Wei


>
> I have tested v3. It works well with the latest virtio-backend service[1].
> [1] https://github.com/xen-troops/virtio-disk/commits/ioreq_ml1
>
> Tested-by: Wei Chen <Wei.Chen@arm.com>

Thank you very much for the testing!


-- 
Regards,

Oleksandr Tyshchenko



^ permalink raw reply	[flat|nested] 127+ messages in thread

* Re: [PATCH V3 10/23] xen/mm: Make x86's XENMEM_resource_ioreq_server handling common
  2020-12-07 12:11     ` Jan Beulich
@ 2020-12-07 21:06       ` Oleksandr
  0 siblings, 0 replies; 127+ messages in thread
From: Oleksandr @ 2020-12-07 21:06 UTC (permalink / raw)
  To: Jan Beulich
  Cc: Julien Grall, Andrew Cooper, Roger Pau Monné,
	Wei Liu, George Dunlap, Ian Jackson, Julien Grall,
	Stefano Stabellini, Volodymyr Babchuk, Oleksandr Tyshchenko,
	xen-devel


On 07.12.20 14:11, Jan Beulich wrote:

Hi Jan

> On 07.12.2020 12:35, Jan Beulich wrote:
>> On 30.11.2020 11:31, Oleksandr Tyshchenko wrote:
>>> --- a/xen/arch/x86/mm.c
>>> +++ b/xen/arch/x86/mm.c
>>> @@ -4699,50 +4699,6 @@ int xenmem_add_to_physmap_one(
>>>       return rc;
>>>   }
>>>   
>>> -int arch_acquire_resource(struct domain *d, unsigned int type,
>>> -                          unsigned int id, unsigned long frame,
>>> -                          unsigned int nr_frames, xen_pfn_t mfn_list[])
>>> -{
>>> -    int rc;
>>> -
>>> -    switch ( type )
>>> -    {
>>> -#ifdef CONFIG_HVM
>>> -    case XENMEM_resource_ioreq_server:
>>> -    {
>>> -        ioservid_t ioservid = id;
>>> -        unsigned int i;
>>> -
>>> -        rc = -EINVAL;
>>> -        if ( !is_hvm_domain(d) )
>>> -            break;
>>> -
>>> -        if ( id != (unsigned int)ioservid )
>>> -            break;
>>> -
>>> -        rc = 0;
>>> -        for ( i = 0; i < nr_frames; i++ )
>>> -        {
>>> -            mfn_t mfn;
>>> -
>>> -            rc = hvm_get_ioreq_server_frame(d, id, frame + i, &mfn);
>>> -            if ( rc )
>>> -                break;
>>> -
>>> -            mfn_list[i] = mfn_x(mfn);
>>> -        }
>>> -        break;
>>> -    }
>>> -#endif
>>> -
>>> -    default:
>>> -        rc = -EOPNOTSUPP;
>>> -        break;
>>> -    }
>>> -
>>> -    return rc;
>>> -}
>> Can't this be accompanied by removal of the xen/ioreq.h inclusion?
>> (I'm only looking at patch 4 right now, but the renaming there made
>> the soon to be unnecessary #include quite apparent.)
> And then, now that I've looked at this patch as a whole,
> Reviewed-by: Jan Beulich <jbeulich@suse.com>

Great, thank you.

-- 
Regards,

Oleksandr Tyshchenko



^ permalink raw reply	[flat|nested] 127+ messages in thread

* RE: [PATCH V3 11/23] xen/ioreq: Move x86's io_completion/io_req fields to struct vcpu
  2020-12-07 20:59     ` Oleksandr
@ 2020-12-08  7:52       ` Paul Durrant
  2020-12-08  9:35         ` Jan Beulich
  2020-12-08 18:21         ` Oleksandr
  0 siblings, 2 replies; 127+ messages in thread
From: Paul Durrant @ 2020-12-08  7:52 UTC (permalink / raw)
  To: 'Oleksandr', 'Jan Beulich'
  Cc: 'Oleksandr Tyshchenko', 'Andrew Cooper',
	'Roger Pau Monné', 'Wei Liu',
	'George Dunlap', 'Ian Jackson',
	'Julien Grall', 'Stefano Stabellini',
	'Jun Nakajima', 'Kevin Tian',
	'Julien Grall',
	xen-devel

> -----Original Message-----
> From: Oleksandr <olekstysh@gmail.com>
> Sent: 07 December 2020 21:00
> To: Jan Beulich <jbeulich@suse.com>; Paul Durrant <paul@xen.org>
> Cc: Oleksandr Tyshchenko <oleksandr_tyshchenko@epam.com>; Andrew Cooper <andrew.cooper3@citrix.com>;
> Roger Pau Monné <roger.pau@citrix.com>; Wei Liu <wl@xen.org>; George Dunlap
> <george.dunlap@citrix.com>; Ian Jackson <iwj@xenproject.org>; Julien Grall <julien@xen.org>; Stefano
> Stabellini <sstabellini@kernel.org>; Jun Nakajima <jun.nakajima@intel.com>; Kevin Tian
> <kevin.tian@intel.com>; Julien Grall <julien.grall@arm.com>; xen-devel@lists.xenproject.org
> Subject: Re: [PATCH V3 11/23] xen/ioreq: Move x86's io_completion/io_req fields to struct vcpu
> 
> 
> On 07.12.20 14:32, Jan Beulich wrote:
> 
> Hi Jan, Paul.
> 
> > On 30.11.2020 11:31, Oleksandr Tyshchenko wrote:
> >> --- a/xen/arch/x86/hvm/emulate.c
> >> +++ b/xen/arch/x86/hvm/emulate.c
> >> @@ -142,8 +142,8 @@ void hvmemul_cancel(struct vcpu *v)
> >>   {
> >>       struct hvm_vcpu_io *vio = &v->arch.hvm.hvm_io;
> >>
> >> -    vio->io_req.state = STATE_IOREQ_NONE;
> >> -    vio->io_completion = HVMIO_no_completion;
> >> +    v->io.req.state = STATE_IOREQ_NONE;
> >> +    v->io.completion = IO_no_completion;
> >>       vio->mmio_cache_count = 0;
> >>       vio->mmio_insn_bytes = 0;
> >>       vio->mmio_access = (struct npfec){};
> >> @@ -159,7 +159,7 @@ static int hvmemul_do_io(
> >>   {
> >>       struct vcpu *curr = current;
> >>       struct domain *currd = curr->domain;
> >> -    struct hvm_vcpu_io *vio = &curr->arch.hvm.hvm_io;
> >> +    struct vcpu_io *vio = &curr->io;
> > Taking just these two hunks: "vio" would now stand for two entirely
> > different things. I realize the name is applicable to both, but I
> > wonder if such naming isn't going to risk confusion. Despite being
> > relatively familiar with the involved code, I've been repeatedly
> > unsure what exactly "vio" covers, and needed to go back to the
> 
>   Good comment... Agree that with the naming scheme in current patch the
> code became a little bit confusing to read.
> 
> 
> > header. So together with the name possible adjustment mentioned
> > further down, maybe "vcpu_io" also wants it name changed, such that
> > the variable then also could sensibly be named (slightly)
> > differently? struct vcpu_io_state maybe? Or alternatively rename
> > variables of type struct hvm_vcpu_io * to hvio or hio? Otoh the
> > savings aren't very big for just ->io, so maybe better to stick to
> > the prior name with the prior type, and not introduce local
> > variables at all for the new field, like you already have it in the
> > former case?
> I would much prefer the last suggestion which is "not introduce local
> variables at all for the new field" (I admit I was thinking almost the
> same, but haven't chosen this direction).
> But I am OK with any suggestions here. Paul what do you think?
> 

I personally don't think there is that much risk of confusion. If there is a desire to disambiguate though, I would go the route of naming hvm_vcpu_io locals 'hvio'.

> 
> >
> >> --- a/xen/include/xen/sched.h
> >> +++ b/xen/include/xen/sched.h
> >> @@ -145,6 +145,21 @@ void evtchn_destroy_final(struct domain *d); /* from complete_domain_destroy
> */
> >>
> >>   struct waitqueue_vcpu;
> >>
> >> +enum io_completion {
> >> +    IO_no_completion,
> >> +    IO_mmio_completion,
> >> +    IO_pio_completion,
> >> +#ifdef CONFIG_X86
> >> +    IO_realmode_completion,
> >> +#endif
> >> +};
> > I'm not entirely happy with io_ / IO_ here - they seem a little
> > too generic. How about ioreq_ / IOREQ_ respectively?
> 
> I am OK, would like to hear Paul's opinion on both questions.
> 

No, I think the 'IO_' prefix is better. They relate to a field in the vcpu_io struct. An alternative might be 'VIO_'...

> 
> >
> >> +struct vcpu_io {
> >> +    /* I/O request in flight to device model. */
> >> +    enum io_completion   completion;

... in which case, you could also name the enum 'vio_completion'.

  Paul

> >> +    ioreq_t              req;
> >> +};
> >> +



^ permalink raw reply	[flat|nested] 127+ messages in thread

* Re: [PATCH V3 04/23] xen/ioreq: Make x86's IOREQ feature common
  2020-12-07 19:43     ` Oleksandr
@ 2020-12-08  9:21       ` Jan Beulich
  2020-12-08 13:56         ` Oleksandr
  0 siblings, 1 reply; 127+ messages in thread
From: Jan Beulich @ 2020-12-08  9:21 UTC (permalink / raw)
  To: Oleksandr
  Cc: Oleksandr Tyshchenko, Andrew Cooper, George Dunlap, Ian Jackson,
	Julien Grall, Stefano Stabellini, Wei Liu, Roger Pau Monné,
	Paul Durrant, Tim Deegan, Julien Grall, xen-devel

On 07.12.2020 20:43, Oleksandr wrote:
> On 07.12.20 13:41, Jan Beulich wrote:
>> On 30.11.2020 11:31, Oleksandr Tyshchenko wrote:
>>> @@ -38,42 +37,6 @@ int arch_ioreq_server_get_type_addr(const struct domain *d,
>>>                                       uint64_t *addr);
>>>   void arch_ioreq_domain_init(struct domain *d);
>> As already mentioned in an earlier reply: What about these? They
>> shouldn't get declared once per arch. If anything, ones that
>> want to be inline functions can / should remain in the per-arch
>> header.
> Don't entirely get a suggestion. Is the suggestion to make "simple" ones 
> inline? Why not, there are a few ones which probably want to be inline, 
> such as the following for example:
> - arch_ioreq_domain_init
> - arch_ioreq_server_destroy
> - arch_ioreq_server_destroy_all
> - arch_ioreq_server_map_mem_type (probably)

Before being able to make a suggestion, I need to have my question
answered: Why do the arch_*() declarations live in the arch header?
They represent a common interface (between common and arch code)
and hence should be declared in exactly one place. It is only at
the point where you/we _consider_ making some of them inline that
moving those (back) to the arch header may make sense. Albeit even
then I'd prefer if only the ones get moved which are expected to
be inline for all arch-es. Others would better have the arch header
indicate to the common one that no declaration is needed (such that
the declaration still remains common for all arch-es using out-of-
line functions).
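
Purely to illustrate the mechanism (the guard spelling is made up, not a
proposal): an arch wanting an inline stub would provide, in its own
header,

    static inline void arch_ioreq_domain_init(struct domain *d) {}
    #define arch_ioreq_domain_init arch_ioreq_domain_init

while the common header would retain the single declaration for everyone
else:

    #ifndef arch_ioreq_domain_init
    void arch_ioreq_domain_init(struct domain *d);
    #endif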

Jan


^ permalink raw reply	[flat|nested] 127+ messages in thread

* Re: [PATCH V3 09/23] xen/dm: Make x86's DM feature common
  2020-12-07 20:23     ` Oleksandr
@ 2020-12-08  9:30       ` Jan Beulich
  2020-12-08 14:54         ` Oleksandr
  0 siblings, 1 reply; 127+ messages in thread
From: Jan Beulich @ 2020-12-08  9:30 UTC (permalink / raw)
  To: Oleksandr
  Cc: Julien Grall, Andrew Cooper, Roger Pau Monné,
	Wei Liu, George Dunlap, Ian Jackson, Julien Grall,
	Stefano Stabellini, Daniel De Graaf, Oleksandr Tyshchenko,
	xen-devel

On 07.12.2020 21:23, Oleksandr wrote:
> On 07.12.20 14:08, Jan Beulich wrote:
>> On 30.11.2020 11:31, Oleksandr Tyshchenko wrote:
>>> From: Julien Grall <julien.grall@arm.com>
>>>
>>> As a lot of x86 code can be re-used on Arm later on, this patch
>>> splits devicemodel support into common and arch specific parts.
>>>
>>> The common DM feature is supposed to be built with IOREQ_SERVER
>>> option enabled (as well as the IOREQ feature), which is selected
>>> for x86's config HVM for now.
>>>
>>> Also update XSM code a bit to let DM op be used on Arm.
>>>
>>> This support is going to be used on Arm to be able run device
>>> emulator outside of Xen hypervisor.
>>>
>>> Signed-off-by: Julien Grall <julien.grall@arm.com>
>>> Signed-off-by: Oleksandr Tyshchenko <oleksandr_tyshchenko@epam.com>
>>>
>>> ---
>>> Please note, this is a split/cleanup/hardening of Julien's PoC:
>>> "Add support for Guest IO forwarding to a device emulator"
>>>
>>> Changes RFC -> V1:
>>>     - update XSM, related changes were pulled from:
>>>       [RFC PATCH V1 04/12] xen/arm: Introduce arch specific bits for IOREQ/DM features
>>>
>>> Changes V1 -> V2:
>>>     - update the author of a patch
>>>     - update patch description
>>>     - introduce xen/dm.h and move definitions here
>>>
>>> Changes V2 -> V3:
>>>     - no changes
>> And my concern regarding the common vs arch nesting also hasn't
>> changed.
> 
> 
> I am sorry, I might misread your comment, but I failed to see any 
> obvious to me request(s) for changes.
> I have just re-read previous discussion...
> So the question about considering doing it the other way around (top 
> level dm-op handling arch-specific
> and call into e.g. ioreq_server_dm_op() for otherwise unhandled ops) is 
> exactly a concern which I should have addressed?

Well, on v2 you replied you didn't consider the alternative. I would
have expected that you would at least go through this consideration
process, and see whether there are better reasons to stick with the
apparently backwards arrangement than to change to the more
conventional one. If there are such reasons, I would expect them to
be called out in reply and perhaps also in the commit message; the
latter because down the road more people may wonder why the more
narrow / special set of cases gets handled at a higher layer than
the wider set of remaining ones, and they would then be able to find
an explanation without having to resort to searching through list
archives.

Jan


^ permalink raw reply	[flat|nested] 127+ messages in thread

* Re: [PATCH V3 11/23] xen/ioreq: Move x86's io_completion/io_req fields to struct vcpu
  2020-12-08  7:52       ` Paul Durrant
@ 2020-12-08  9:35         ` Jan Beulich
  2020-12-08 18:21         ` Oleksandr
  1 sibling, 0 replies; 127+ messages in thread
From: Jan Beulich @ 2020-12-08  9:35 UTC (permalink / raw)
  To: paul, 'Oleksandr'
  Cc: 'Oleksandr Tyshchenko', 'Andrew Cooper',
	'Roger Pau Monné', 'Wei Liu',
	'George Dunlap', 'Ian Jackson',
	'Julien Grall', 'Stefano Stabellini',
	'Jun Nakajima', 'Kevin Tian',
	'Julien Grall',
	xen-devel

On 08.12.2020 08:52, Paul Durrant wrote:
>> From: Oleksandr <olekstysh@gmail.com>
>> Sent: 07 December 2020 21:00
>>
>> On 07.12.20 14:32, Jan Beulich wrote:
>>> On 30.11.2020 11:31, Oleksandr Tyshchenko wrote:
>>>> --- a/xen/include/xen/sched.h
>>>> +++ b/xen/include/xen/sched.h
>>>> @@ -145,6 +145,21 @@ void evtchn_destroy_final(struct domain *d); /* from complete_domain_destroy
>> */
>>>>
>>>>   struct waitqueue_vcpu;
>>>>
>>>> +enum io_completion {
>>>> +    IO_no_completion,
>>>> +    IO_mmio_completion,
>>>> +    IO_pio_completion,
>>>> +#ifdef CONFIG_X86
>>>> +    IO_realmode_completion,
>>>> +#endif
>>>> +};
>>> I'm not entirely happy with io_ / IO_ here - they seem a little
>>> too generic. How about ioreq_ / IOREQ_ respectively?
>>
>> I am OK, would like to hear Paul's opinion on both questions.
>>
> 
> No, I think the 'IO_' prefix is better. They relate to a field in the vcpu_io struct. An alternative might be 'VIO_'...
> 
>>
>>>
>>>> +struct vcpu_io {
>>>> +    /* I/O request in flight to device model. */
>>>> +    enum io_completion   completion;
> 
> ... in which case, you could also name the enum 'vio_completion'.

I'd be okay with these - still better than just "io".

Jan


^ permalink raw reply	[flat|nested] 127+ messages in thread

* Re: [PATCH V3 04/23] xen/ioreq: Make x86's IOREQ feature common
  2020-12-08  9:21       ` Jan Beulich
@ 2020-12-08 13:56         ` Oleksandr
  2020-12-08 15:02           ` Jan Beulich
  0 siblings, 1 reply; 127+ messages in thread
From: Oleksandr @ 2020-12-08 13:56 UTC (permalink / raw)
  To: Jan Beulich
  Cc: Oleksandr Tyshchenko, Andrew Cooper, George Dunlap, Ian Jackson,
	Julien Grall, Stefano Stabellini, Wei Liu, Roger Pau Monné,
	Paul Durrant, Tim Deegan, Julien Grall, xen-devel


On 08.12.20 11:21, Jan Beulich wrote:

Hi Jan

> On 07.12.2020 20:43, Oleksandr wrote:
>> On 07.12.20 13:41, Jan Beulich wrote:
>>> On 30.11.2020 11:31, Oleksandr Tyshchenko wrote:
>>>> @@ -38,42 +37,6 @@ int arch_ioreq_server_get_type_addr(const struct domain *d,
>>>>                                        uint64_t *addr);
>>>>    void arch_ioreq_domain_init(struct domain *d);
>>> As already mentioned in an earlier reply: What about these? They
>>> shouldn't get declared once per arch. If anything, ones that
>>> want to be inline functions can / should remain in the per-arch
>>> header.
>> Don't entirely get a suggestion. Is the suggestion to make "simple" ones
>> inline? Why not, there are a few ones which probably want to be inline,
>> such as the following for example:
>> - arch_ioreq_domain_init
>> - arch_ioreq_server_destroy
>> - arch_ioreq_server_destroy_all
>> - arch_ioreq_server_map_mem_type (probably)


First of all, thank you for the clarification, now your point is clear 
to me.


> Before being able to make a suggestion, I need to have my question
> answered: Why do the arch_*() declarations live in the arch header?
> They represent a common interface (between common and arch code)
> and hence should be declared in exactly one place.

I got it; I had the wrong assumption that arch hook declarations should
live in arch code.


> It is only at
> the point where you/we _consider_ making some of them inline that
> moving those (back) to the arch header may make sense. Albeit even
> then I'd prefer if only the ones get moved which are expected to
> be inline for all arch-es. Others would better have the arch header
> indicate to the common one that no declaration is needed (such that
> the declaration still remains common for all arch-es using out-of-
> line functions).
I got it as well.

Well, I think two options are available in order to address your comments:
1. All arch hooks are out-of-line: move all arch hook declarations to
the common header here and modify
"[PATCH V3 14/23] arm/ioreq: Introduce arch specific bits for IOREQ/DM
features" to make all Arm variants out-of-line (I made them inline since
all of them are just stubs).
2. Some of the arch hooks are inline: consider which ones want to be
inline (for both arch-es) and place them in the arch headers; the other
ones should remain in the common header.

My question is: which option is more suitable?


-- 
Regards,

Oleksandr Tyshchenko



^ permalink raw reply	[flat|nested] 127+ messages in thread

* Re: [PATCH V3 16/23] xen/mm: Handle properly reference in set_foreign_p2m_entry() on Arm
  2020-11-30 10:31 ` [PATCH V3 16/23] xen/mm: Handle properly reference in set_foreign_p2m_entry() on Arm Oleksandr Tyshchenko
@ 2020-12-08 14:24   ` Jan Beulich
  2020-12-08 16:41     ` Oleksandr
  2020-12-09 23:49   ` Stefano Stabellini
  2021-01-15  1:18   ` Stefano Stabellini
  2 siblings, 1 reply; 127+ messages in thread
From: Jan Beulich @ 2020-12-08 14:24 UTC (permalink / raw)
  To: Oleksandr Tyshchenko
  Cc: Oleksandr Tyshchenko, Stefano Stabellini, Julien Grall,
	Volodymyr Babchuk, Andrew Cooper, George Dunlap, Ian Jackson,
	Wei Liu, Roger Pau Monné,
	Julien Grall, xen-devel

On 30.11.2020 11:31, Oleksandr Tyshchenko wrote:
> --- a/xen/common/memory.c
> +++ b/xen/common/memory.c
> @@ -1134,12 +1134,8 @@ static int acquire_resource(
>      xen_pfn_t mfn_list[32];
>      int rc;
>  
> -    /*
> -     * FIXME: Until foreign pages inserted into the P2M are properly
> -     *        reference counted, it is unsafe to allow mapping of
> -     *        resource pages unless the caller is the hardware domain.
> -     */
> -    if ( paging_mode_translate(currd) && !is_hardware_domain(currd) )
> +    if ( paging_mode_translate(currd) && !is_hardware_domain(currd) &&
> +         !arch_acquire_resource_check() )
>          return -EACCES;

Looks like I didn't express myself clearly enough when replying
to v2, by saying "as both prior parts of the condition should be
needed only on the x86 side, and there (for PV) there's no p2m
involved in the refcounting". While one may debate whether the
hwdom check may remain here, the "translated" one definitely
should move into the x86 hook. This (I think) will the also make
apparent that ...

> --- a/xen/include/asm-x86/p2m.h
> +++ b/xen/include/asm-x86/p2m.h
> @@ -382,6 +382,19 @@ struct p2m_domain {
>  #endif
>  #include <xen/p2m-common.h>
>  
> +static inline bool arch_acquire_resource_check(void)
> +{
> +    /*
> +     * The reference counting of foreign entries in set_foreign_p2m_entry()
> +     * is not supported on x86.
> +     *
> +     * FIXME: Until foreign pages inserted into the P2M are properly
> +     * reference counted, it is unsafe to allow mapping of
> +     * resource pages unless the caller is the hardware domain.
> +     */
> +    return false;
> +}

... the initial part of the comment is true only for translated
domains. The reference to hwdom in the latter part of the comment
(which merely gets moved here) is a good indication that the
hwdom check also wants moving here. In turn the check at the top
of p2m_add_foreign() should imo then also use this new function,
instead of effectively open-coding it (with a similar comment).
And x86's set_foreign_p2m_entry() may want to gain

    ASSERT(arch_acquire_resource_check(d));

perhaps alongside the same ASSERT() you add to the Arm variant.

Jan


^ permalink raw reply	[flat|nested] 127+ messages in thread

* Re: [PATCH V3 09/23] xen/dm: Make x86's DM feature common
  2020-12-08  9:30       ` Jan Beulich
@ 2020-12-08 14:54         ` Oleksandr
  2021-01-07 14:38           ` Oleksandr
  0 siblings, 1 reply; 127+ messages in thread
From: Oleksandr @ 2020-12-08 14:54 UTC (permalink / raw)
  To: Jan Beulich
  Cc: Julien Grall, Andrew Cooper, Roger Pau Monné,
	Wei Liu, George Dunlap, Ian Jackson, Julien Grall,
	Stefano Stabellini, Daniel De Graaf, Oleksandr Tyshchenko,
	xen-devel


On 08.12.20 11:30, Jan Beulich wrote:

Hi Jan

> On 07.12.2020 21:23, Oleksandr wrote:
>> On 07.12.20 14:08, Jan Beulich wrote:
>>> On 30.11.2020 11:31, Oleksandr Tyshchenko wrote:
>>>> From: Julien Grall <julien.grall@arm.com>
>>>>
>>>> As a lot of x86 code can be re-used on Arm later on, this patch
>>>> splits devicemodel support into common and arch specific parts.
>>>>
>>>> The common DM feature is supposed to be built with IOREQ_SERVER
>>>> option enabled (as well as the IOREQ feature), which is selected
>>>> for x86's config HVM for now.
>>>>
>>>> Also update XSM code a bit to let DM op be used on Arm.
>>>>
>>>> This support is going to be used on Arm to be able run device
>>>> emulator outside of Xen hypervisor.
>>>>
>>>> Signed-off-by: Julien Grall <julien.grall@arm.com>
>>>> Signed-off-by: Oleksandr Tyshchenko <oleksandr_tyshchenko@epam.com>
>>>>
>>>> ---
>>>> Please note, this is a split/cleanup/hardening of Julien's PoC:
>>>> "Add support for Guest IO forwarding to a device emulator"
>>>>
>>>> Changes RFC -> V1:
>>>>      - update XSM, related changes were pulled from:
>>>>        [RFC PATCH V1 04/12] xen/arm: Introduce arch specific bits for IOREQ/DM features
>>>>
>>>> Changes V1 -> V2:
>>>>      - update the author of a patch
>>>>      - update patch description
>>>>      - introduce xen/dm.h and move definitions here
>>>>
>>>> Changes V2 -> V3:
>>>>      - no changes
>>> And my concern regarding the common vs arch nesting also hasn't
>>> changed.
>>
>> I am sorry, I might misread your comment, but I failed to see any
>> obvious to me request(s) for changes.
>> I have just re-read previous discussion...
>> So the question about considering doing it the other way around (top
>> level dm-op handling arch-specific
>> and call into e.g. ioreq_server_dm_op() for otherwise unhandled ops) is
>> exactly a concern which I should have addressed?
> Well, on v2 you replied you didn't consider the alternative. I would
> have expected that you would at least go through this consideration
> process, and see whether there are better reasons to stick with the
> apparently backwards arrangement than to change to the more
> conventional one. If there are such reasons, I would expect them to
> be called out in reply and perhaps also in the commit message; the
> latter because down the road more people may wonder why the more
> narrow / special set of cases gets handled at a higher layer than
> the wider set of remaining ones, and they would then be able to find
> an explanation without having to resort to searching through list
> archives.
Ah, I will investigate. Sorry for not paying enough attention to it.
Yes, the IOREQ (I mean "common") ops are 7 out of 18 right now. The
subsequent patch adds one more DM op - XEN_DMOP_set_irq_level.
There are several PCI-related ops which might want to become common in
the future (but I am not sure).
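
If I understand the alternative correctly, the core of the per-arch
dm_op() handler would then look roughly like this (sketch only; the
buffer copying / copy-back handling is omitted, and ioreq_server_dm_op()
is the name mentioned earlier, not an existing function):

    switch ( op.op )
    {
    /*
     * Arch-specific ops (e.g. XEN_DMOP_set_irq_level on Arm) would be
     * handled in per-arch cases here.
     */

    default:
        /* Otherwise unhandled ops go to the common IOREQ server code. */
        rc = ioreq_server_dm_op(&op, d, &const_op);
        break;
    }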


-- 
Regards,

Oleksandr Tyshchenko



^ permalink raw reply	[flat|nested] 127+ messages in thread

* Re: [PATCH V3 04/23] xen/ioreq: Make x86's IOREQ feature common
  2020-12-08 13:56         ` Oleksandr
@ 2020-12-08 15:02           ` Jan Beulich
  2020-12-08 17:24             ` Oleksandr
  0 siblings, 1 reply; 127+ messages in thread
From: Jan Beulich @ 2020-12-08 15:02 UTC (permalink / raw)
  To: Oleksandr
  Cc: Oleksandr Tyshchenko, Andrew Cooper, George Dunlap, Ian Jackson,
	Julien Grall, Stefano Stabellini, Wei Liu, Roger Pau Monné,
	Paul Durrant, Tim Deegan, Julien Grall, xen-devel

On 08.12.2020 14:56, Oleksandr wrote:
> 
> On 08.12.20 11:21, Jan Beulich wrote:
> 
> Hi Jan
> 
>> On 07.12.2020 20:43, Oleksandr wrote:
>>> On 07.12.20 13:41, Jan Beulich wrote:
>>>> On 30.11.2020 11:31, Oleksandr Tyshchenko wrote:
>>>>> @@ -38,42 +37,6 @@ int arch_ioreq_server_get_type_addr(const struct domain *d,
>>>>>                                        uint64_t *addr);
>>>>>    void arch_ioreq_domain_init(struct domain *d);
>>>> As already mentioned in an earlier reply: What about these? They
>>>> shouldn't get declared once per arch. If anything, ones that
>>>> want to be inline functions can / should remain in the per-arch
>>>> header.
>>> Don't entirely get a suggestion. Is the suggestion to make "simple" ones
>>> inline? Why not, there are a few ones which probably want to be inline,
>>> such as the following for example:
>>> - arch_ioreq_domain_init
>>> - arch_ioreq_server_destroy
>>> - arch_ioreq_server_destroy_all
>>> - arch_ioreq_server_map_mem_type (probably)
> 
> 
> First of all, thank you for the clarification, now your point is clear 
> to me.
> 
> 
>> Before being able to make a suggestion, I need to have my question
>> answered: Why do the arch_*() declarations live in the arch header?
>> They represent a common interface (between common and arch code)
>> and hence should be declared in exactly one place.
> 
> I got it, I had a wrong assumption that arch hooks declarations should 
> live in arch code.
> 
> 
>> It is only at
>> the point where you/we _consider_ making some of them inline that
>> moving those (back) to the arch header may make sense. Albeit even
>> then I'd prefer if only the ones get moved which are expected to
>> be inline for all arch-es. Others would better have the arch header
>> indicate to the common one that no declaration is needed (such that
>> the declaration still remains common for all arch-es using out-of-
>> line functions).
> I got it as well.
> 
> Well, I think, in order to address your comments two options are available:
> 1. All arch hooks are out-of-line: move all arch hook declarations to 
> the common header here and modify
> "[PATCH V3 14/23] arm/ioreq: Introduce arch specific bits for IOREQ/DM 
> features" to make all Arm variants
> out-of-line (I made them inline since all of them are just stubs).
> 2. Some of arch hooks are inline: consider which want to be inline (for 
> both arch-es) and place them into arch headers, other ones
> should remain in the common header.
> 
> My question is which option is more suitable?

And the presumably very helpful to you answer is "Depends." It's a
case by case judgement call in the end.

Sorry, Jan


^ permalink raw reply	[flat|nested] 127+ messages in thread

* Re: [PATCH V3 17/23] xen/ioreq: Introduce domain_has_ioreq_server()
  2020-11-30 10:31 ` [PATCH V3 17/23] xen/ioreq: Introduce domain_has_ioreq_server() Oleksandr Tyshchenko
@ 2020-12-08 15:11   ` Jan Beulich
  2020-12-08 15:33     ` Oleksandr
  0 siblings, 1 reply; 127+ messages in thread
From: Jan Beulich @ 2020-12-08 15:11 UTC (permalink / raw)
  To: Oleksandr Tyshchenko
  Cc: Oleksandr Tyshchenko, Stefano Stabellini, Julien Grall,
	Volodymyr Babchuk, Andrew Cooper, George Dunlap, Ian Jackson,
	Wei Liu, Paul Durrant, Julien Grall, xen-devel

On 30.11.2020 11:31, Oleksandr Tyshchenko wrote:
> --- a/xen/include/xen/ioreq.h
> +++ b/xen/include/xen/ioreq.h
> @@ -55,6 +55,20 @@ struct ioreq_server {
>      uint8_t                bufioreq_handling;
>  };
>  
> +/*
> + * This should only be used when d == current->domain and it's not paused,

Is the "not paused" part really relevant here? Besides it being rare
that the current domain would be paused (if so, it's in the process
of having all its vCPU-s scheduled out), does this matter at all?

Apart from this the patch looks okay to me, but I'm not sure it
addresses Paul's concerns. Iirc he had suggested to switch back to
a list if doing a swipe over the entire array is too expensive in
this specific case.

Jan


^ permalink raw reply	[flat|nested] 127+ messages in thread

* Re: [PATCH V3 20/23] xen/ioreq: Make x86's send_invalidate_req() common
  2020-11-30 10:31 ` [PATCH V3 20/23] xen/ioreq: Make x86's send_invalidate_req() common Oleksandr Tyshchenko
@ 2020-12-08 15:24   ` Jan Beulich
  2020-12-08 16:49     ` Oleksandr
  0 siblings, 1 reply; 127+ messages in thread
From: Jan Beulich @ 2020-12-08 15:24 UTC (permalink / raw)
  To: Oleksandr Tyshchenko
  Cc: Oleksandr Tyshchenko, Andrew Cooper, Roger Pau Monné,
	Wei Liu, George Dunlap, Ian Jackson, Julien Grall,
	Stefano Stabellini, Paul Durrant, Julien Grall, xen-devel

On 30.11.2020 11:31, Oleksandr Tyshchenko wrote:
> --- a/xen/include/xen/sched.h
> +++ b/xen/include/xen/sched.h
> @@ -552,6 +552,8 @@ struct domain
>          struct ioreq_server     *server[MAX_NR_IOREQ_SERVERS];
>          unsigned int            nr_servers;
>      } ioreq_server;
> +
> +    bool mapcache_invalidate;
>  #endif
>  };

While I can see reasons to put this inside the #ifdef here, I
would suspect putting it in the hole next to the group of 5
bools further up would be more efficient.

Jan


^ permalink raw reply	[flat|nested] 127+ messages in thread

* Re: [PATCH V3 17/23] xen/ioreq: Introduce domain_has_ioreq_server()
  2020-12-08 15:11   ` Jan Beulich
@ 2020-12-08 15:33     ` Oleksandr
  2020-12-08 16:56       ` Oleksandr
  0 siblings, 1 reply; 127+ messages in thread
From: Oleksandr @ 2020-12-08 15:33 UTC (permalink / raw)
  To: Jan Beulich
  Cc: Oleksandr Tyshchenko, Stefano Stabellini, Julien Grall,
	Volodymyr Babchuk, Andrew Cooper, George Dunlap, Ian Jackson,
	Wei Liu, Paul Durrant, Julien Grall, xen-devel


On 08.12.20 17:11, Jan Beulich wrote:

Hi Jan

> On 30.11.2020 11:31, Oleksandr Tyshchenko wrote:
>> --- a/xen/include/xen/ioreq.h
>> +++ b/xen/include/xen/ioreq.h
>> @@ -55,6 +55,20 @@ struct ioreq_server {
>>       uint8_t                bufioreq_handling;
>>   };
>>   
>> +/*
>> + * This should only be used when d == current->domain and it's not paused,
> Is the "not paused" part really relevant here? Besides it being rare
> that the current domain would be paused (if so, it's in the process
> of having all its vCPU-s scheduled out), does this matter at all?

No, it isn't relevant, I will drop it.


>
> Apart from this the patch looks okay to me, but I'm not sure it
> addresses Paul's concerns. Iirc he had suggested to switch back to
> a list if doing a swipe over the entire array is too expensive in
> this specific case.
We would like to avoid doing any extra actions in
leave_hypervisor_to_guest() if possible.
But not only there: the logic deciding whether we check/set the
mapcache_invalidation variable could also be avoided if a domain doesn't
use an IOREQ server...
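
To make the intention concrete (minimal sketch, relying on the
nr_servers counter this patch adds; the exact call site wording is
illustrative):

    static inline bool domain_has_ioreq_server(const struct domain *d)
    {
        return d->ioreq_server.nr_servers;
    }

and then, on the way out in leave_hypervisor_to_guest(), roughly:

    if ( !domain_has_ioreq_server(current->domain) )
        return;    /* No IOREQ completion / mapcache work for such domains. */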


-- 
Regards,

Oleksandr Tyshchenko



^ permalink raw reply	[flat|nested] 127+ messages in thread

* Re: [PATCH V3 16/23] xen/mm: Handle properly reference in set_foreign_p2m_entry() on Arm
  2020-12-08 14:24   ` Jan Beulich
@ 2020-12-08 16:41     ` Oleksandr
  0 siblings, 0 replies; 127+ messages in thread
From: Oleksandr @ 2020-12-08 16:41 UTC (permalink / raw)
  To: Jan Beulich
  Cc: Oleksandr Tyshchenko, Stefano Stabellini, Julien Grall,
	Volodymyr Babchuk, Andrew Cooper, George Dunlap, Ian Jackson,
	Wei Liu, Roger Pau Monné,
	Julien Grall, xen-devel


On 08.12.20 16:24, Jan Beulich wrote:

Hi Jan

> On 30.11.2020 11:31, Oleksandr Tyshchenko wrote:
>> --- a/xen/common/memory.c
>> +++ b/xen/common/memory.c
>> @@ -1134,12 +1134,8 @@ static int acquire_resource(
>>       xen_pfn_t mfn_list[32];
>>       int rc;
>>   
>> -    /*
>> -     * FIXME: Until foreign pages inserted into the P2M are properly
>> -     *        reference counted, it is unsafe to allow mapping of
>> -     *        resource pages unless the caller is the hardware domain.
>> -     */
>> -    if ( paging_mode_translate(currd) && !is_hardware_domain(currd) )
>> +    if ( paging_mode_translate(currd) && !is_hardware_domain(currd) &&
>> +         !arch_acquire_resource_check() )
>>           return -EACCES;
> Looks like I didn't express myself clearly enough when replying
> to v2, by saying "as both prior parts of the condition should be
> needed only on the x86 side, and there (for PV) there's no p2m
> involved in the refcounting". While one may debate whether the
> hwdom check may remain here, the "translated" one definitely
> should move into the x86 hook. This (I think) will the also make
> apparent that ...
>
>> --- a/xen/include/asm-x86/p2m.h
>> +++ b/xen/include/asm-x86/p2m.h
>> @@ -382,6 +382,19 @@ struct p2m_domain {
>>   #endif
>>   #include <xen/p2m-common.h>
>>   
>> +static inline bool arch_acquire_resource_check(void)
>> +{
>> +    /*
>> +     * The reference counting of foreign entries in set_foreign_p2m_entry()
>> +     * is not supported on x86.
>> +     *
>> +     * FIXME: Until foreign pages inserted into the P2M are properly
>> +     * reference counted, it is unsafe to allow mapping of
>> +     * resource pages unless the caller is the hardware domain.
>> +     */
>> +    return false;
>> +}
> ... the initial part of the comment is true only for translated
> domains. The reference to hwdom in the latter part of the comment
> (which merely gets moved here) is a good indication that the
> hwdom check also wants moving here. In turn the check at the top
> of p2m_add_foreign() should imo then also use this new function,
> instead of effectively open-coding it (with a similar comment).
> And x86's set_foreign_p2m_entry() may want to gain
>
>      ASSERT(arch_acquire_resource_check(d));
>
> perhaps alongside the same ASSERT() you add to the Arm variant.

Well, will do. I was about to mention that the new function wants to
gain a domain parameter, but then noticed you had already given a hint
with the ASSERT() example.
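
So presumably the end result would be something like the below (untested
sketch, following your suggestion):

    /* asm-x86/p2m.h */
    static inline bool arch_acquire_resource_check(const struct domain *d)
    {
        /*
         * FIXME: Until foreign pages inserted into the P2M are properly
         * reference counted, it is unsafe to allow mapping of resource
         * pages unless the caller is the hardware domain.
         */
        return !paging_mode_translate(d) || is_hardware_domain(d);
    }

with acquire_resource() then only doing

    if ( !arch_acquire_resource_check(currd) )
        return -EACCES;

and p2m_add_foreign() / both set_foreign_p2m_entry() variants using the
same helper (via the ASSERT() you mention).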


-- 
Regards,

Oleksandr Tyshchenko



^ permalink raw reply	[flat|nested] 127+ messages in thread

* Re: [PATCH V3 20/23] xen/ioreq: Make x86's send_invalidate_req() common
  2020-12-08 15:24   ` Jan Beulich
@ 2020-12-08 16:49     ` Oleksandr
  2020-12-09  8:21       ` Jan Beulich
  0 siblings, 1 reply; 127+ messages in thread
From: Oleksandr @ 2020-12-08 16:49 UTC (permalink / raw)
  To: Jan Beulich
  Cc: Oleksandr Tyshchenko, Andrew Cooper, Roger Pau Monné,
	Wei Liu, George Dunlap, Ian Jackson, Julien Grall,
	Stefano Stabellini, Paul Durrant, Julien Grall, xen-devel


On 08.12.20 17:24, Jan Beulich wrote:

Hi Jan

> On 30.11.2020 11:31, Oleksandr Tyshchenko wrote:
>> --- a/xen/include/xen/sched.h
>> +++ b/xen/include/xen/sched.h
>> @@ -552,6 +552,8 @@ struct domain
>>           struct ioreq_server     *server[MAX_NR_IOREQ_SERVERS];
>>           unsigned int            nr_servers;
>>       } ioreq_server;
>> +
>> +    bool mapcache_invalidate;
>>   #endif
>>   };
> While I can see reasons to put this inside the #ifdef here, I
> would suspect putting it in the hole next to the group of 5
> bools further up would be more efficient.

OK, I will put it there (although it will increase the number of #ifdef-s).

-- 
Regards,

Oleksandr Tyshchenko



^ permalink raw reply	[flat|nested] 127+ messages in thread

* Re: [PATCH V3 17/23] xen/ioreq: Introduce domain_has_ioreq_server()
  2020-12-08 15:33     ` Oleksandr
@ 2020-12-08 16:56       ` Oleksandr
  2020-12-08 19:43         ` Paul Durrant
  0 siblings, 1 reply; 127+ messages in thread
From: Oleksandr @ 2020-12-08 16:56 UTC (permalink / raw)
  To: Paul Durrant
  Cc: Jan Beulich, Oleksandr Tyshchenko, Stefano Stabellini,
	Julien Grall, Volodymyr Babchuk, Andrew Cooper, George Dunlap,
	Ian Jackson, Wei Liu, Julien Grall, xen-devel


Hi Paul.


On 08.12.20 17:33, Oleksandr wrote:
>
> On 08.12.20 17:11, Jan Beulich wrote:
>
> Hi Jan
>
>> On 30.11.2020 11:31, Oleksandr Tyshchenko wrote:
>>> --- a/xen/include/xen/ioreq.h
>>> +++ b/xen/include/xen/ioreq.h
>>> @@ -55,6 +55,20 @@ struct ioreq_server {
>>>       uint8_t                bufioreq_handling;
>>>   };
>>>   +/*
>>> + * This should only be used when d == current->domain and it's not 
>>> paused,
>> Is the "not paused" part really relevant here? Besides it being rare
>> that the current domain would be paused (if so, it's in the process
>> of having all its vCPU-s scheduled out), does this matter at all?
>
> No, it isn't relevant, I will drop it.
>
>
>>
>> Apart from this the patch looks okay to me, but I'm not sure it
>> addresses Paul's concerns. Iirc he had suggested to switch back to
>> a list if doing a swipe over the entire array is too expensive in
>> this specific case.
> We would like to avoid to do any extra actions in 
> leave_hypervisor_to_guest() if possible.
> But not only there, the logic whether we check/set 
> mapcache_invalidation variable could be avoided if a domain doesn't 
> use IOREQ server...


Are you OK with this patch (common part of it)?


-- 
Regards,

Oleksandr Tyshchenko



^ permalink raw reply	[flat|nested] 127+ messages in thread

* Re: [PATCH V3 04/23] xen/ioreq: Make x86's IOREQ feature common
  2020-12-08 15:02           ` Jan Beulich
@ 2020-12-08 17:24             ` Oleksandr
  0 siblings, 0 replies; 127+ messages in thread
From: Oleksandr @ 2020-12-08 17:24 UTC (permalink / raw)
  To: Jan Beulich
  Cc: Oleksandr Tyshchenko, Andrew Cooper, George Dunlap, Ian Jackson,
	Julien Grall, Stefano Stabellini, Wei Liu, Roger Pau Monné,
	Paul Durrant, Tim Deegan, Julien Grall, xen-devel


On 08.12.20 17:02, Jan Beulich wrote:

Hi Jan

> On 08.12.2020 14:56, Oleksandr wrote:
>> On 08.12.20 11:21, Jan Beulich wrote:
>>
>> Hi Jan
>>
>>> On 07.12.2020 20:43, Oleksandr wrote:
>>>> On 07.12.20 13:41, Jan Beulich wrote:
>>>>> On 30.11.2020 11:31, Oleksandr Tyshchenko wrote:
>>>>>> @@ -38,42 +37,6 @@ int arch_ioreq_server_get_type_addr(const struct domain *d,
>>>>>>                                         uint64_t *addr);
>>>>>>     void arch_ioreq_domain_init(struct domain *d);
>>>>> As already mentioned in an earlier reply: What about these? They
>>>>> shouldn't get declared once per arch. If anything, ones that
>>>>> want to be inline functions can / should remain in the per-arch
>>>>> header.
>>>> Don't entirely get a suggestion. Is the suggestion to make "simple" ones
>>>> inline? Why not, there are a few ones which probably want to be inline,
>>>> such as the following for example:
>>>> - arch_ioreq_domain_init
>>>> - arch_ioreq_server_destroy
>>>> - arch_ioreq_server_destroy_all
>>>> - arch_ioreq_server_map_mem_type (probably)
>>
>> First of all, thank you for the clarification, now your point is clear
>> to me.
>>
>>
>>> Before being able to make a suggestion, I need to have my question
>>> answered: Why do the arch_*() declarations live in the arch header?
>>> They represent a common interface (between common and arch code)
>>> and hence should be declared in exactly one place.
>> I got it, I had a wrong assumption that arch hooks declarations should
>> live in arch code.
>>
>>
>>> It is only at
>>> the point where you/we _consider_ making some of them inline that
>>> moving those (back) to the arch header may make sense. Albeit even
>>> then I'd prefer if only the ones get moved which are expected to
>>> be inline for all arch-es. Others would better have the arch header
>>> indicate to the common one that no declaration is needed (such that
>>> the declaration still remains common for all arch-es using out-of-
>>> line functions).
>> I got it as well.
>>
>> Well, I think, in order to address your comments two options are available:
>> 1. All arch hooks are out-of-line: move all arch hook declarations to
>> the common header here and modify
>> "[PATCH V3 14/23] arm/ioreq: Introduce arch specific bits for IOREQ/DM
>> features" to make all Arm variants
>> out-of-line (I made them inline since all of them are just stubs).
>> 2. Some of arch hooks are inline: consider which want to be inline (for
>> both arch-es) and place them into arch headers, other ones
>> should remain in the common header.
>>
>> My question is which option is more suitable?
> And the presumably very helpful to you answer is "Depends." It's a
> case by case judgement call in the end.
>
> Sorry, Jan
Thank you for the honest answer. I will use option #1 since all these 
arch hooks are single-use only anyway.
If there is indeed a need to make some of them inline, I think it could 
be done later on.
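
i.e. roughly, taking two of the hooks listed earlier as examples (sketch
only, signatures abbreviated to those used in this series):

    /* xen/include/xen/ioreq.h - single set of declarations for all arches */
    void arch_ioreq_server_destroy(struct ioreq_server *s);
    void arch_ioreq_domain_init(struct domain *d);

    /* xen/arch/arm/ioreq.c - the current inline stubs become out-of-line */
    void arch_ioreq_server_destroy(struct ioreq_server *s)
    {
    }

    void arch_ioreq_domain_init(struct domain *d)
    {
    }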

-- 
Regards,

Oleksandr Tyshchenko



^ permalink raw reply	[flat|nested] 127+ messages in thread

* Re: [PATCH V3 11/23] xen/ioreq: Move x86's io_completion/io_req fields to struct vcpu
  2020-12-08  7:52       ` Paul Durrant
  2020-12-08  9:35         ` Jan Beulich
@ 2020-12-08 18:21         ` Oleksandr
  1 sibling, 0 replies; 127+ messages in thread
From: Oleksandr @ 2020-12-08 18:21 UTC (permalink / raw)
  To: paul, 'Jan Beulich'
  Cc: 'Oleksandr Tyshchenko', 'Andrew Cooper',
	'Roger Pau Monné', 'Wei Liu',
	'George Dunlap', 'Ian Jackson',
	'Julien Grall', 'Stefano Stabellini',
	'Jun Nakajima', 'Kevin Tian',
	'Julien Grall',
	xen-devel


On 08.12.20 09:52, Paul Durrant wrote:

Hi Paul, Jan

>>> On 30.11.2020 11:31, Oleksandr Tyshchenko wrote:
>>>> --- a/xen/arch/x86/hvm/emulate.c
>>>> +++ b/xen/arch/x86/hvm/emulate.c
>>>> @@ -142,8 +142,8 @@ void hvmemul_cancel(struct vcpu *v)
>>>>    {
>>>>        struct hvm_vcpu_io *vio = &v->arch.hvm.hvm_io;
>>>>
>>>> -    vio->io_req.state = STATE_IOREQ_NONE;
>>>> -    vio->io_completion = HVMIO_no_completion;
>>>> +    v->io.req.state = STATE_IOREQ_NONE;
>>>> +    v->io.completion = IO_no_completion;
>>>>        vio->mmio_cache_count = 0;
>>>>        vio->mmio_insn_bytes = 0;
>>>>        vio->mmio_access = (struct npfec){};
>>>> @@ -159,7 +159,7 @@ static int hvmemul_do_io(
>>>>    {
>>>>        struct vcpu *curr = current;
>>>>        struct domain *currd = curr->domain;
>>>> -    struct hvm_vcpu_io *vio = &curr->arch.hvm.hvm_io;
>>>> +    struct vcpu_io *vio = &curr->io;
>>> Taking just these two hunks: "vio" would now stand for two entirely
>>> different things. I realize the name is applicable to both, but I
>>> wonder if such naming isn't going to risk confusion. Despite being
>>> relatively familiar with the involved code, I've been repeatedly
>>> unsure what exactly "vio" covers, and needed to go back to the
>>    Good comment... Agree that with the naming scheme in current patch the
>> code became a little bit confusing to read.
>>
>>
>>> header. So together with the name possible adjustment mentioned
>>> further down, maybe "vcpu_io" also wants it name changed, such that
>>> the variable then also could sensibly be named (slightly)
>>> differently? struct vcpu_io_state maybe? Or alternatively rename
>>> variables of type struct hvm_vcpu_io * to hvio or hio? Otoh the
>>> savings aren't very big for just ->io, so maybe better to stick to
>>> the prior name with the prior type, and not introduce local
>>> variables at all for the new field, like you already have it in the
>>> former case?
>> I would much prefer the last suggestion which is "not introduce local
>> variables at all for the new field" (I admit I was thinking almost the
>> same, but haven't chosen this direction).
>> But I am OK with any suggestions here. Paul what do you think?
>>
> I personally don't think there is that much risk of confusion. If there is a desire to disambiguate though, I would go the route of naming hvm_vcpu_io locals 'hvio'.
Well, I assume I should rename all hvm_vcpu_io locals in the code to 
"hvio" (even in places this patch didn't touch so far because no new 
vcpu_io fields were involved).
I am OK with that, although I expect a lot of places will need touching 
here...


>
>>>> --- a/xen/include/xen/sched.h
>>>> +++ b/xen/include/xen/sched.h
>>>> @@ -145,6 +145,21 @@ void evtchn_destroy_final(struct domain *d); /* from complete_domain_destroy
>> */
>>>>    struct waitqueue_vcpu;
>>>>
>>>> +enum io_completion {
>>>> +    IO_no_completion,
>>>> +    IO_mmio_completion,
>>>> +    IO_pio_completion,
>>>> +#ifdef CONFIG_X86
>>>> +    IO_realmode_completion,
>>>> +#endif
>>>> +};
>>> I'm not entirely happy with io_ / IO_ here - they seem a little
>>> too generic. How about ioreq_ / IOREQ_ respectively?
>> I am OK, would like to hear Paul's opinion on both questions.
>>
> No, I think the 'IO_' prefix is better. They relate to a field in the vcpu_io struct. An alternative might be 'VIO_'...
>
>>>> +struct vcpu_io {
>>>> +    /* I/O request in flight to device model. */
>>>> +    enum io_completion   completion;
> ... in which case, you could also name the enum 'vio_completion'.
>
>    Paul

OK, I will follow the new renaming scheme IO_ -> VIO_ (io_ -> vio_).
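
So the affected hunks would end up looking like this (sketch of the
renaming only, based on the hunks quoted above):

    /* xen/include/xen/sched.h */
    enum vio_completion {
        VIO_no_completion,
        VIO_mmio_completion,
        VIO_pio_completion,
    #ifdef CONFIG_X86
        VIO_realmode_completion,
    #endif
    };

    struct vcpu_io {
        /* I/O request in flight to device model. */
        enum vio_completion  completion;
        ioreq_t              req;
    };

    /* xen/arch/x86/hvm/emulate.c:hvmemul_cancel(), with the local renamed */
    struct hvm_vcpu_io *hvio = &v->arch.hvm.hvm_io;

    v->io.req.state = STATE_IOREQ_NONE;
    v->io.completion = VIO_no_completion;
    hvio->mmio_cache_count = 0;
    hvio->mmio_insn_bytes = 0;
    hvio->mmio_access = (struct npfec){};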


-- 
Regards,

Oleksandr Tyshchenko



^ permalink raw reply	[flat|nested] 127+ messages in thread

* RE: [PATCH V3 17/23] xen/ioreq: Introduce domain_has_ioreq_server()
  2020-12-08 16:56       ` Oleksandr
@ 2020-12-08 19:43         ` Paul Durrant
  2020-12-08 20:16           ` Oleksandr
  0 siblings, 1 reply; 127+ messages in thread
From: Paul Durrant @ 2020-12-08 19:43 UTC (permalink / raw)
  To: 'Oleksandr'
  Cc: 'Jan Beulich', 'Oleksandr Tyshchenko',
	'Stefano Stabellini', 'Julien Grall',
	'Volodymyr Babchuk', 'Andrew Cooper',
	'George Dunlap', 'Ian Jackson', 'Wei Liu',
	'Julien Grall',
	xen-devel

> -----Original Message-----
> From: Oleksandr <olekstysh@gmail.com>
> Sent: 08 December 2020 16:57
> To: Paul Durrant <paul@xen.org>
> Cc: Jan Beulich <jbeulich@suse.com>; Oleksandr Tyshchenko <oleksandr_tyshchenko@epam.com>; Stefano
> Stabellini <sstabellini@kernel.org>; Julien Grall <julien@xen.org>; Volodymyr Babchuk
> <Volodymyr_Babchuk@epam.com>; Andrew Cooper <andrew.cooper3@citrix.com>; George Dunlap
> <george.dunlap@citrix.com>; Ian Jackson <iwj@xenproject.org>; Wei Liu <wl@xen.org>; Julien Grall
> <julien.grall@arm.com>; xen-devel@lists.xenproject.org
> Subject: Re: [PATCH V3 17/23] xen/ioreq: Introduce domain_has_ioreq_server()
> 
> 
> Hi Paul.
> 
> 
> On 08.12.20 17:33, Oleksandr wrote:
> >
> > On 08.12.20 17:11, Jan Beulich wrote:
> >
> > Hi Jan
> >
> >> On 30.11.2020 11:31, Oleksandr Tyshchenko wrote:
> >>> --- a/xen/include/xen/ioreq.h
> >>> +++ b/xen/include/xen/ioreq.h
> >>> @@ -55,6 +55,20 @@ struct ioreq_server {
> >>>       uint8_t                bufioreq_handling;
> >>>   };
> >>>   +/*
> >>> + * This should only be used when d == current->domain and it's not
> >>> paused,
> >> Is the "not paused" part really relevant here? Besides it being rare
> >> that the current domain would be paused (if so, it's in the process
> >> of having all its vCPU-s scheduled out), does this matter at all?
> >
> > No, it isn't relevant, I will drop it.
> >
> >
> >>
> >> Apart from this the patch looks okay to me, but I'm not sure it
> >> addresses Paul's concerns. Iirc he had suggested to switch back to
> >> a list if doing a swipe over the entire array is too expensive in
> >> this specific case.
> > We would like to avoid to do any extra actions in
> > leave_hypervisor_to_guest() if possible.
> > But not only there, the logic whether we check/set
> > mapcache_invalidation variable could be avoided if a domain doesn't
> > use IOREQ server...
> 
> 
> Are you OK with this patch (common part of it)?

How much of a performance benefit is this? The array is small so simply counting the non-NULL entries should be pretty quick.

  Paul

> 
> 
> --
> Regards,
> 
> Oleksandr Tyshchenko




^ permalink raw reply	[flat|nested] 127+ messages in thread

* Re: [PATCH V3 17/23] xen/ioreq: Introduce domain_has_ioreq_server()
  2020-12-08 19:43         ` Paul Durrant
@ 2020-12-08 20:16           ` Oleksandr
  2020-12-09  9:01             ` Paul Durrant
  0 siblings, 1 reply; 127+ messages in thread
From: Oleksandr @ 2020-12-08 20:16 UTC (permalink / raw)
  To: paul
  Cc: 'Jan Beulich', 'Oleksandr Tyshchenko',
	'Stefano Stabellini', 'Julien Grall',
	'Volodymyr Babchuk', 'Andrew Cooper',
	'George Dunlap', 'Ian Jackson', 'Wei Liu',
	'Julien Grall',
	xen-devel


On 08.12.20 21:43, Paul Durrant wrote:

Hi Paul

>> -----Original Message-----
>> From: Oleksandr <olekstysh@gmail.com>
>> Sent: 08 December 2020 16:57
>> To: Paul Durrant <paul@xen.org>
>> Cc: Jan Beulich <jbeulich@suse.com>; Oleksandr Tyshchenko <oleksandr_tyshchenko@epam.com>; Stefano
>> Stabellini <sstabellini@kernel.org>; Julien Grall <julien@xen.org>; Volodymyr Babchuk
>> <Volodymyr_Babchuk@epam.com>; Andrew Cooper <andrew.cooper3@citrix.com>; George Dunlap
>> <george.dunlap@citrix.com>; Ian Jackson <iwj@xenproject.org>; Wei Liu <wl@xen.org>; Julien Grall
>> <julien.grall@arm.com>; xen-devel@lists.xenproject.org
>> Subject: Re: [PATCH V3 17/23] xen/ioreq: Introduce domain_has_ioreq_server()
>>
>>
>> Hi Paul.
>>
>>
>> On 08.12.20 17:33, Oleksandr wrote:
>>> On 08.12.20 17:11, Jan Beulich wrote:
>>>
>>> Hi Jan
>>>
>>>> On 30.11.2020 11:31, Oleksandr Tyshchenko wrote:
>>>>> --- a/xen/include/xen/ioreq.h
>>>>> +++ b/xen/include/xen/ioreq.h
>>>>> @@ -55,6 +55,20 @@ struct ioreq_server {
>>>>>        uint8_t                bufioreq_handling;
>>>>>    };
>>>>>    +/*
>>>>> + * This should only be used when d == current->domain and it's not
>>>>> paused,
>>>> Is the "not paused" part really relevant here? Besides it being rare
>>>> that the current domain would be paused (if so, it's in the process
>>>> of having all its vCPU-s scheduled out), does this matter at all?
>>> No, it isn't relevant, I will drop it.
>>>
>>>
>>>> Apart from this the patch looks okay to me, but I'm not sure it
>>>> addresses Paul's concerns. Iirc he had suggested to switch back to
>>>> a list if doing a swipe over the entire array is too expensive in
>>>> this specific case.
>>> We would like to avoid to do any extra actions in
>>> leave_hypervisor_to_guest() if possible.
>>> But not only there, the logic whether we check/set
>>> mapcache_invalidation variable could be avoided if a domain doesn't
>>> use IOREQ server...
>>
>> Are you OK with this patch (common part of it)?
> How much of a performance benefit is this? The array is small to simply counting the non-NULL entries should be pretty quick.
I didn't perform performance measurements of how much this call consumes.
In our system we run three domains. The emulator is in DomD only, so I 
would like to avoid calling vcpu_ioreq_handle_completion() for every 
Dom0/DomU vCPU if there is no real need to do it. On Arm 
vcpu_ioreq_handle_completion() is called with IRQs enabled, so the call 
is accompanied by the corresponding irq_enable/irq_disable.
These unneeded actions could be avoided by using this simple one-line 
helper...


-- 
Regards,

Oleksandr Tyshchenko



^ permalink raw reply	[flat|nested] 127+ messages in thread

* Re: [PATCH V3 20/23] xen/ioreq: Make x86's send_invalidate_req() common
  2020-12-08 16:49     ` Oleksandr
@ 2020-12-09  8:21       ` Jan Beulich
  0 siblings, 0 replies; 127+ messages in thread
From: Jan Beulich @ 2020-12-09  8:21 UTC (permalink / raw)
  To: Oleksandr
  Cc: Oleksandr Tyshchenko, Andrew Cooper, Roger Pau Monné,
	Wei Liu, George Dunlap, Ian Jackson, Julien Grall,
	Stefano Stabellini, Paul Durrant, Julien Grall, xen-devel

On 08.12.2020 17:49, Oleksandr wrote:
> On 08.12.20 17:24, Jan Beulich wrote:
>> On 30.11.2020 11:31, Oleksandr Tyshchenko wrote:
>>> --- a/xen/include/xen/sched.h
>>> +++ b/xen/include/xen/sched.h
>>> @@ -552,6 +552,8 @@ struct domain
>>>           struct ioreq_server     *server[MAX_NR_IOREQ_SERVERS];
>>>           unsigned int            nr_servers;
>>>       } ioreq_server;
>>> +
>>> +    bool mapcache_invalidate;
>>>   #endif
>>>   };
>> While I can see reasons to put this inside the #ifdef here, I
>> would suspect putting it in the hole next to the group of 5
>> bools further up would be more efficient.
> 
> ok, will put (although it will increase the number of #ifdef)

I was implying no #ifdef in this case, suitably justified by half
a sentence in the patch description.

Jan


^ permalink raw reply	[flat|nested] 127+ messages in thread

* RE: [PATCH V3 17/23] xen/ioreq: Introduce domain_has_ioreq_server()
  2020-12-08 20:16           ` Oleksandr
@ 2020-12-09  9:01             ` Paul Durrant
  2020-12-09 18:58               ` Julien Grall
  2020-12-09 20:36               ` Oleksandr
  0 siblings, 2 replies; 127+ messages in thread
From: Paul Durrant @ 2020-12-09  9:01 UTC (permalink / raw)
  To: 'Oleksandr'
  Cc: 'Jan Beulich', 'Oleksandr Tyshchenko',
	'Stefano Stabellini', 'Julien Grall',
	'Volodymyr Babchuk', 'Andrew Cooper',
	'George Dunlap', 'Ian Jackson', 'Wei Liu',
	'Julien Grall',
	xen-devel

> -----Original Message-----
> From: Oleksandr <olekstysh@gmail.com>
> Sent: 08 December 2020 20:17
> To: paul@xen.org
> Cc: 'Jan Beulich' <jbeulich@suse.com>; 'Oleksandr Tyshchenko' <oleksandr_tyshchenko@epam.com>;
> 'Stefano Stabellini' <sstabellini@kernel.org>; 'Julien Grall' <julien@xen.org>; 'Volodymyr Babchuk'
> <Volodymyr_Babchuk@epam.com>; 'Andrew Cooper' <andrew.cooper3@citrix.com>; 'George Dunlap'
> <george.dunlap@citrix.com>; 'Ian Jackson' <iwj@xenproject.org>; 'Wei Liu' <wl@xen.org>; 'Julien Grall'
> <julien.grall@arm.com>; xen-devel@lists.xenproject.org
> Subject: Re: [PATCH V3 17/23] xen/ioreq: Introduce domain_has_ioreq_server()
> 
> 
> On 08.12.20 21:43, Paul Durrant wrote:
> 
> Hi Paul
> 
> >> -----Original Message-----
> >> From: Oleksandr <olekstysh@gmail.com>
> >> Sent: 08 December 2020 16:57
> >> To: Paul Durrant <paul@xen.org>
> >> Cc: Jan Beulich <jbeulich@suse.com>; Oleksandr Tyshchenko <oleksandr_tyshchenko@epam.com>; Stefano
> >> Stabellini <sstabellini@kernel.org>; Julien Grall <julien@xen.org>; Volodymyr Babchuk
> >> <Volodymyr_Babchuk@epam.com>; Andrew Cooper <andrew.cooper3@citrix.com>; George Dunlap
> >> <george.dunlap@citrix.com>; Ian Jackson <iwj@xenproject.org>; Wei Liu <wl@xen.org>; Julien Grall
> >> <julien.grall@arm.com>; xen-devel@lists.xenproject.org
> >> Subject: Re: [PATCH V3 17/23] xen/ioreq: Introduce domain_has_ioreq_server()
> >>
> >>
> >> Hi Paul.
> >>
> >>
> >> On 08.12.20 17:33, Oleksandr wrote:
> >>> On 08.12.20 17:11, Jan Beulich wrote:
> >>>
> >>> Hi Jan
> >>>
> >>>> On 30.11.2020 11:31, Oleksandr Tyshchenko wrote:
> >>>>> --- a/xen/include/xen/ioreq.h
> >>>>> +++ b/xen/include/xen/ioreq.h
> >>>>> @@ -55,6 +55,20 @@ struct ioreq_server {
> >>>>>        uint8_t                bufioreq_handling;
> >>>>>    };
> >>>>>    +/*
> >>>>> + * This should only be used when d == current->domain and it's not
> >>>>> paused,
> >>>> Is the "not paused" part really relevant here? Besides it being rare
> >>>> that the current domain would be paused (if so, it's in the process
> >>>> of having all its vCPU-s scheduled out), does this matter at all?
> >>> No, it isn't relevant, I will drop it.
> >>>
> >>>
> >>>> Apart from this the patch looks okay to me, but I'm not sure it
> >>>> addresses Paul's concerns. Iirc he had suggested to switch back to
> >>>> a list if doing a swipe over the entire array is too expensive in
> >>>> this specific case.
> >>> We would like to avoid to do any extra actions in
> >>> leave_hypervisor_to_guest() if possible.
> >>> But not only there, the logic whether we check/set
> >>> mapcache_invalidation variable could be avoided if a domain doesn't
> >>> use IOREQ server...
> >>
> >> Are you OK with this patch (common part of it)?
> > How much of a performance benefit is this? The array is small to simply counting the non-NULL
> entries should be pretty quick.
> I didn't perform performance measurements on how much this call consumes.
> In our system we run three domains. The emulator is in DomD only, so I
> would like to avoid to call vcpu_ioreq_handle_completion() for every
> Dom0/DomU's vCPUs
> if there is no real need to do it.

This is not relevant to the domain that the emulator is running in; it concerns the domains which the emulator is servicing. How many of those are there?

> On Arm vcpu_ioreq_handle_completion()
> is called with IRQ enabled, so the call is accompanied with
> corresponding irq_enable/irq_disable.
> These unneeded actions could be avoided by using this simple one-line
> helper...
> 

The helper may be one line but there is more to the patch than that. I still think you could just walk the array in the helper rather than keeping a running occupancy count.

  Paul




^ permalink raw reply	[flat|nested] 127+ messages in thread

* Re: [PATCH V3 17/23] xen/ioreq: Introduce domain_has_ioreq_server()
  2020-12-09  9:01             ` Paul Durrant
@ 2020-12-09 18:58               ` Julien Grall
  2020-12-09 21:05                 ` Oleksandr
  2020-12-09 20:36               ` Oleksandr
  1 sibling, 1 reply; 127+ messages in thread
From: Julien Grall @ 2020-12-09 18:58 UTC (permalink / raw)
  To: paul, 'Oleksandr'
  Cc: 'Jan Beulich', 'Oleksandr Tyshchenko',
	'Stefano Stabellini', 'Volodymyr Babchuk',
	'Andrew Cooper', 'George Dunlap',
	'Ian Jackson', 'Wei Liu', 'Julien Grall',
	xen-devel

Hi Oleksandr and Paul,

Sorry for jumping late in the conversation.

On 09/12/2020 09:01, Paul Durrant wrote:
>> -----Original Message-----
>> From: Oleksandr <olekstysh@gmail.com>
>> Sent: 08 December 2020 20:17
>> To: paul@xen.org
>> Cc: 'Jan Beulich' <jbeulich@suse.com>; 'Oleksandr Tyshchenko' <oleksandr_tyshchenko@epam.com>;
>> 'Stefano Stabellini' <sstabellini@kernel.org>; 'Julien Grall' <julien@xen.org>; 'Volodymyr Babchuk'
>> <Volodymyr_Babchuk@epam.com>; 'Andrew Cooper' <andrew.cooper3@citrix.com>; 'George Dunlap'
>> <george.dunlap@citrix.com>; 'Ian Jackson' <iwj@xenproject.org>; 'Wei Liu' <wl@xen.org>; 'Julien Grall'
>> <julien.grall@arm.com>; xen-devel@lists.xenproject.org
>> Subject: Re: [PATCH V3 17/23] xen/ioreq: Introduce domain_has_ioreq_server()
>>
>>
>> On 08.12.20 21:43, Paul Durrant wrote:
>>
>> Hi Paul
>>
>>>> -----Original Message-----
>>>> From: Oleksandr <olekstysh@gmail.com>
>>>> Sent: 08 December 2020 16:57
>>>> To: Paul Durrant <paul@xen.org>
>>>> Cc: Jan Beulich <jbeulich@suse.com>; Oleksandr Tyshchenko <oleksandr_tyshchenko@epam.com>; Stefano
>>>> Stabellini <sstabellini@kernel.org>; Julien Grall <julien@xen.org>; Volodymyr Babchuk
>>>> <Volodymyr_Babchuk@epam.com>; Andrew Cooper <andrew.cooper3@citrix.com>; George Dunlap
>>>> <george.dunlap@citrix.com>; Ian Jackson <iwj@xenproject.org>; Wei Liu <wl@xen.org>; Julien Grall
>>>> <julien.grall@arm.com>; xen-devel@lists.xenproject.org
>>>> Subject: Re: [PATCH V3 17/23] xen/ioreq: Introduce domain_has_ioreq_server()
>>>>
>>>>
>>>> Hi Paul.
>>>>
>>>>
>>>> On 08.12.20 17:33, Oleksandr wrote:
>>>>> On 08.12.20 17:11, Jan Beulich wrote:
>>>>>
>>>>> Hi Jan
>>>>>
>>>>>> On 30.11.2020 11:31, Oleksandr Tyshchenko wrote:
>>>>>>> --- a/xen/include/xen/ioreq.h
>>>>>>> +++ b/xen/include/xen/ioreq.h
>>>>>>> @@ -55,6 +55,20 @@ struct ioreq_server {
>>>>>>>         uint8_t                bufioreq_handling;
>>>>>>>     };
>>>>>>>     +/*
>>>>>>> + * This should only be used when d == current->domain and it's not
>>>>>>> paused,
>>>>>> Is the "not paused" part really relevant here? Besides it being rare
>>>>>> that the current domain would be paused (if so, it's in the process
>>>>>> of having all its vCPU-s scheduled out), does this matter at all?
>>>>> No, it isn't relevant, I will drop it.
>>>>>
>>>>>
>>>>>> Apart from this the patch looks okay to me, but I'm not sure it
>>>>>> addresses Paul's concerns. Iirc he had suggested to switch back to
>>>>>> a list if doing a swipe over the entire array is too expensive in
>>>>>> this specific case.
>>>>> We would like to avoid to do any extra actions in
>>>>> leave_hypervisor_to_guest() if possible.
>>>>> But not only there, the logic whether we check/set
>>>>> mapcache_invalidation variable could be avoided if a domain doesn't
>>>>> use IOREQ server...
>>>>
>>>> Are you OK with this patch (common part of it)?
>>> How much of a performance benefit is this? The array is small to simply counting the non-NULL
>> entries should be pretty quick.
>> I didn't perform performance measurements on how much this call consumes.
>> In our system we run three domains. The emulator is in DomD only, so I
>> would like to avoid to call vcpu_ioreq_handle_completion() for every
>> Dom0/DomU's vCPUs
>> if there is no real need to do it.
> 
> This is not relevant to the domain that the emulator is running in; it's concerning the domains which the emulator is servicing. How many of those are there?

AFAICT, the maximum number of IOREQ servers is 8 today.

> 
>> On Arm vcpu_ioreq_handle_completion()
>> is called with IRQ enabled, so the call is accompanied with
>> corresponding irq_enable/irq_disable.
>> These unneeded actions could be avoided by using this simple one-line
>> helper...
>>
> 
> The helper may be one line but there is more to the patch than that. I still think you could just walk the array in the helper rather than keeping a running occupancy count.

Right, the concern here is that this function will be called in a hotpath 
(every time we are re-entering the guest). Unlike on x86, the Arm 
entry/exit code is really small, so any additional code will have an 
impact on the overall performance.

That said, the IOREQ code is a tech preview for Arm, so I would be fine 
going with Paul's approach until we have a better understanding of the 
performance of virtio/IOREQ.

I am going to throw in some more thoughts about the optimization here. The 
patch focuses on the performance impact when IOREQ is built in but not 
used. I think we can do further optimization (which may supersede this one).

get_pending_vcpu() (called from handle_hvm_io_completion()) is overly 
expensive, in particular if you have no I/O forwarded to an IOREQ server. 
Entry to the hypervisor can happen for many reasons (interrupts, system 
register emulation, I/O emulation...) and forwarded I/O should be a 
small subset of that.

Ideally, handle_hvm_io_completion() should be a NOP (at most a few 
instructions) if there is nothing to do. Maybe we want to introduce a 
per-vCPU flag indicating whether an I/O has been forwarded to an IOREQ server.

This would also allow us to bypass most of the function if there is nothing to do.
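
A rough sketch of the idea (the field and helper names below are only
illustrative, they are not part of the series):

struct vcpu_io {
    /* I/O request in flight to device model. */
    enum vio_completion  completion;
    ioreq_t              req;
    /* Set while 'req' has been forwarded to an IOREQ server. */
    bool                 pending;
};

static inline bool vcpu_has_pending_ioreq(const struct vcpu *v)
{
    return v->io.pending;
}

The flag would be set when the I/O is forwarded (e.g. in ioreq_send())
and cleared on completion, so the completion path could return
immediately when it is clear.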

Any thoughts?

In any case this is more forward looking rather than a request for the 
current series. What matters to me is that we have a functional (not 
necessarily optimized) version of IOREQ in Xen 4.15. This would be a 
great step towards using VirtIO on Xen.

Cheers,

-- 
Julien Grall


^ permalink raw reply	[flat|nested] 127+ messages in thread

* Re: [PATCH V3 17/23] xen/ioreq: Introduce domain_has_ioreq_server()
  2020-12-09  9:01             ` Paul Durrant
  2020-12-09 18:58               ` Julien Grall
@ 2020-12-09 20:36               ` Oleksandr
  2020-12-10  8:38                 ` Paul Durrant
  1 sibling, 1 reply; 127+ messages in thread
From: Oleksandr @ 2020-12-09 20:36 UTC (permalink / raw)
  To: paul
  Cc: 'Jan Beulich', 'Oleksandr Tyshchenko',
	'Stefano Stabellini', 'Julien Grall',
	'Volodymyr Babchuk', 'Andrew Cooper',
	'George Dunlap', 'Ian Jackson', 'Wei Liu',
	'Julien Grall',
	xen-devel


Hi Paul.


>>>>>> On 30.11.2020 11:31, Oleksandr Tyshchenko wrote:
>>>>>>> --- a/xen/include/xen/ioreq.h
>>>>>>> +++ b/xen/include/xen/ioreq.h
>>>>>>> @@ -55,6 +55,20 @@ struct ioreq_server {
>>>>>>>         uint8_t                bufioreq_handling;
>>>>>>>     };
>>>>>>>     +/*
>>>>>>> + * This should only be used when d == current->domain and it's not
>>>>>>> paused,
>>>>>> Is the "not paused" part really relevant here? Besides it being rare
>>>>>> that the current domain would be paused (if so, it's in the process
>>>>>> of having all its vCPU-s scheduled out), does this matter at all?
>>>>> No, it isn't relevant, I will drop it.
>>>>>
>>>>>
>>>>>> Apart from this the patch looks okay to me, but I'm not sure it
>>>>>> addresses Paul's concerns. Iirc he had suggested to switch back to
>>>>>> a list if doing a swipe over the entire array is too expensive in
>>>>>> this specific case.
>>>>> We would like to avoid to do any extra actions in
>>>>> leave_hypervisor_to_guest() if possible.
>>>>> But not only there, the logic whether we check/set
>>>>> mapcache_invalidation variable could be avoided if a domain doesn't
>>>>> use IOREQ server...
>>>> Are you OK with this patch (common part of it)?
>>> How much of a performance benefit is this? The array is small to simply counting the non-NULL
>> entries should be pretty quick.
>> I didn't perform performance measurements on how much this call consumes.
>> In our system we run three domains. The emulator is in DomD only, so I
>> would like to avoid to call vcpu_ioreq_handle_completion() for every
>> Dom0/DomU's vCPUs
>> if there is no real need to do it.
> This is not relevant to the domain that the emulator is running in; it's concerning the domains which the emulator is servicing. How many of those are there?
Err, yes, I wasn't precise when providing an example.
A single emulator is running in DomD and servicing DomU. So with the 
helper in place vcpu_ioreq_handle_completion() only gets called for 
DomU vCPUs (as expected).
Without the optimization vcpu_ioreq_handle_completion() gets called 
for _all_ vCPUs, and I see that as an extra action for Dom0/DomD vCPUs.


>
>> On Arm vcpu_ioreq_handle_completion()
>> is called with IRQ enabled, so the call is accompanied with
>> corresponding irq_enable/irq_disable.
>> These unneeded actions could be avoided by using this simple one-line
>> helper...
>>
> The helper may be one line but there is more to the patch than that. I still think you could just walk the array in the helper rather than keeping a running occupancy count.

OK, is the implementation below close to what you propose? If yes, I 
will update the helper and drop the nr_servers variable.

bool domain_has_ioreq_server(const struct domain *d)
{
     const struct ioreq_server *s;
     unsigned int id;

     FOR_EACH_IOREQ_SERVER(d, id, s)
         return true;

     return false;
}
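
If that is what you meant, then on the Arm side the intended use would be
something like the following (a sketch combining the helper with the
check_for_vcpu_work() hunk from patch 14):

#ifdef CONFIG_IOREQ_SERVER
    if ( domain_has_ioreq_server(v->domain) )
    {
        local_irq_enable();
        vcpu_ioreq_handle_completion(v);
        local_irq_disable();
    }
#endif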

-- 
Regards,

Oleksandr Tyshchenko



^ permalink raw reply	[flat|nested] 127+ messages in thread

* Re: [PATCH V3 17/23] xen/ioreq: Introduce domain_has_ioreq_server()
  2020-12-09 18:58               ` Julien Grall
@ 2020-12-09 21:05                 ` Oleksandr
  0 siblings, 0 replies; 127+ messages in thread
From: Oleksandr @ 2020-12-09 21:05 UTC (permalink / raw)
  To: Julien Grall, paul
  Cc: 'Jan Beulich', 'Oleksandr Tyshchenko',
	'Stefano Stabellini', 'Volodymyr Babchuk',
	'Andrew Cooper', 'George Dunlap',
	'Ian Jackson', 'Wei Liu', 'Julien Grall',
	xen-devel


On 09.12.20 20:58, Julien Grall wrote:
> Hi Oleksandr and Paul,

Hi Julien, Paul.


>
> Sorry for jumping late in the conversation.
>
> On 09/12/2020 09:01, Paul Durrant wrote:
>>> -----Original Message-----
>>> From: Oleksandr <olekstysh@gmail.com>
>>> Sent: 08 December 2020 20:17
>>> To: paul@xen.org
>>> Cc: 'Jan Beulich' <jbeulich@suse.com>; 'Oleksandr Tyshchenko' 
>>> <oleksandr_tyshchenko@epam.com>;
>>> 'Stefano Stabellini' <sstabellini@kernel.org>; 'Julien Grall' 
>>> <julien@xen.org>; 'Volodymyr Babchuk'
>>> <Volodymyr_Babchuk@epam.com>; 'Andrew Cooper' 
>>> <andrew.cooper3@citrix.com>; 'George Dunlap'
>>> <george.dunlap@citrix.com>; 'Ian Jackson' <iwj@xenproject.org>; 'Wei 
>>> Liu' <wl@xen.org>; 'Julien Grall'
>>> <julien.grall@arm.com>; xen-devel@lists.xenproject.org
>>> Subject: Re: [PATCH V3 17/23] xen/ioreq: Introduce 
>>> domain_has_ioreq_server()
>>>
>>>
>>> On 08.12.20 21:43, Paul Durrant wrote:
>>>
>>> Hi Paul
>>>
>>>>> -----Original Message-----
>>>>> From: Oleksandr <olekstysh@gmail.com>
>>>>> Sent: 08 December 2020 16:57
>>>>> To: Paul Durrant <paul@xen.org>
>>>>> Cc: Jan Beulich <jbeulich@suse.com>; Oleksandr Tyshchenko 
>>>>> <oleksandr_tyshchenko@epam.com>; Stefano
>>>>> Stabellini <sstabellini@kernel.org>; Julien Grall 
>>>>> <julien@xen.org>; Volodymyr Babchuk
>>>>> <Volodymyr_Babchuk@epam.com>; Andrew Cooper 
>>>>> <andrew.cooper3@citrix.com>; George Dunlap
>>>>> <george.dunlap@citrix.com>; Ian Jackson <iwj@xenproject.org>; Wei 
>>>>> Liu <wl@xen.org>; Julien Grall
>>>>> <julien.grall@arm.com>; xen-devel@lists.xenproject.org
>>>>> Subject: Re: [PATCH V3 17/23] xen/ioreq: Introduce 
>>>>> domain_has_ioreq_server()
>>>>>
>>>>>
>>>>> Hi Paul.
>>>>>
>>>>>
>>>>> On 08.12.20 17:33, Oleksandr wrote:
>>>>>> On 08.12.20 17:11, Jan Beulich wrote:
>>>>>>
>>>>>> Hi Jan
>>>>>>
>>>>>>> On 30.11.2020 11:31, Oleksandr Tyshchenko wrote:
>>>>>>>> --- a/xen/include/xen/ioreq.h
>>>>>>>> +++ b/xen/include/xen/ioreq.h
>>>>>>>> @@ -55,6 +55,20 @@ struct ioreq_server {
>>>>>>>>         uint8_t                bufioreq_handling;
>>>>>>>>     };
>>>>>>>>     +/*
>>>>>>>> + * This should only be used when d == current->domain and it's 
>>>>>>>> not
>>>>>>>> paused,
>>>>>>> Is the "not paused" part really relevant here? Besides it being 
>>>>>>> rare
>>>>>>> that the current domain would be paused (if so, it's in the process
>>>>>>> of having all its vCPU-s scheduled out), does this matter at all?
>>>>>> No, it isn't relevant, I will drop it.
>>>>>>
>>>>>>
>>>>>>> Apart from this the patch looks okay to me, but I'm not sure it
>>>>>>> addresses Paul's concerns. Iirc he had suggested to switch back to
>>>>>>> a list if doing a swipe over the entire array is too expensive in
>>>>>>> this specific case.
>>>>>> We would like to avoid to do any extra actions in
>>>>>> leave_hypervisor_to_guest() if possible.
>>>>>> But not only there, the logic whether we check/set
>>>>>> mapcache_invalidation variable could be avoided if a domain doesn't
>>>>>> use IOREQ server...
>>>>>
>>>>> Are you OK with this patch (common part of it)?
>>>> How much of a performance benefit is this? The array is small to 
>>>> simply counting the non-NULL
>>> entries should be pretty quick.
>>> I didn't perform performance measurements on how much this call 
>>> consumes.
>>> In our system we run three domains. The emulator is in DomD only, so I
>>> would like to avoid to call vcpu_ioreq_handle_completion() for every
>>> Dom0/DomU's vCPUs
>>> if there is no real need to do it.
>>
>> This is not relevant to the domain that the emulator is running in; 
>> it's concerning the domains which the emulator is servicing. How many 
>> of those are there?
>
> AFAICT, the maximum number of IOREQ servers is 8 today.
>
>>
>>> On Arm vcpu_ioreq_handle_completion()
>>> is called with IRQ enabled, so the call is accompanied with
>>> corresponding irq_enable/irq_disable.
>>> These unneeded actions could be avoided by using this simple one-line
>>> helper...
>>>
>>
>> The helper may be one line but there is more to the patch than that. 
>> I still think you could just walk the array in the helper rather than 
>> keeping a running occupancy count.
>
> Right, the concern here is this function will be called in an hotpath 
> (everytime we are re-entering to the guest). At the difference of x86, 
> the entry/exit code is really small, so any additional code will have 
> an impact on the overall performance.
+1


>
>
> That said, the IOREQ code is a tech preview for Arm. So I would be 
> fine going with Paul's approach until we have a better understanding 
> on the performance of virtio/IOREQ.

I am fine with Paul's approach for now (I only need a confirmation that 
I got it correctly).


>
>
> I am going to throw some more thoughts about the optimization here. 
> The patch is focusing on performance impact when IOREQ is built-in and 
> not used.
It is true; what I would add here is that the helper also avoids 
unnecessary vcpu_ioreq_handle_completion() calls as well as another 
unnecessary action (the mapcache handling logic, although that is not a 
hotpath) in a subsequent patch when IOREQ is used.


> I think we can do further optimization (which may superseed this one).
>
> get_pending_vcpu() (called from handle_hvm_io_completion()) is overly 
> expensive in particular if you have no I/O forwarded to an IOREQ 
> server. Entry to the hypervisor can happen for many reasons 
> (interrupts, system registers emulation, I/O emulation...) and the I/O 
> forwarded should be a small subset.
>
> Ideally, handle_hvm_io_completion() should be a NOP (at max a few 
> instructions) if there are nothing to do. Maybe we want to introduce a 
> per-vCPU flag indicating if an I/O has been forwarded to an IOREQ server.
>
> This would also us to bypass most of the function if there is nothing 
> to do.
>
> Any thoughts?
>
> In any case this is more a forward looking rather than a request for 
> the current series. What matters to me is we have a functional (not 
> necessarily optimized) version of IOREQ in Xen 4.15. This would be a 
> great step towards using Virto on Xen.

Completely agree, the current series is quite big, and if we try to 
make it perfect I am afraid we won't have it even in Xen 4.16. As for 
the proposed optimization - I think it is worth considering; I will 
mention it in the cover letter for the series among other possible 
things such as buffered requests, etc.


>
>
> Cheers,
>
-- 
Regards,

Oleksandr Tyshchenko



^ permalink raw reply	[flat|nested] 127+ messages in thread

* Re: [PATCH V3 13/23] xen/ioreq: Use guest_cmpxchg64() instead of cmpxchg()
  2020-11-30 10:31 ` [PATCH V3 13/23] xen/ioreq: Use guest_cmpxchg64() instead of cmpxchg() Oleksandr Tyshchenko
@ 2020-12-09 21:32   ` Stefano Stabellini
  2020-12-09 22:34     ` Oleksandr
  0 siblings, 1 reply; 127+ messages in thread
From: Stefano Stabellini @ 2020-12-09 21:32 UTC (permalink / raw)
  To: Oleksandr Tyshchenko
  Cc: xen-devel, Oleksandr Tyshchenko, Stefano Stabellini,
	Julien Grall, Volodymyr Babchuk, Paul Durrant, Julien Grall

On Mon, 30 Nov 2020, Oleksandr Tyshchenko wrote:
> From: Oleksandr Tyshchenko <oleksandr_tyshchenko@epam.com>
> 
> The cmpxchg() in ioreq_send_buffered() operates on memory shared
> with the emulator domain (and the target domain if the legacy
> interface is used).
> 
> In order to be on the safe side we need to switch
> to guest_cmpxchg64() to prevent a domain to DoS Xen on Arm.
> 
> As there is no plan to support the legacy interface on Arm,
> we will have a page to be mapped in a single domain at the time,
> so we can use s->emulator in guest_cmpxchg64() safely.
> 
> Thankfully the only user of the legacy interface is x86 so far
> and there is not concern regarding the atomics operations.
> 
> Please note, that the legacy interface *must* not be used on Arm
> without revisiting the code.
> 
> Signed-off-by: Oleksandr Tyshchenko <oleksandr_tyshchenko@epam.com>
> CC: Julien Grall <julien.grall@arm.com>
>
> ---
> Please note, this is a split/cleanup/hardening of Julien's PoC:
> "Add support for Guest IO forwarding to a device emulator"
> 
> Changes RFC -> V1:
>    - new patch
> 
> Changes V1 -> V2:
>    - move earlier to avoid breaking arm32 compilation
>    - add an explanation to commit description and hvm_allow_set_param()
>    - pass s->emulator
> 
> Changes V2 -> V3:
>    - update patch description
> ---
> ---
>  xen/arch/arm/hvm.c | 4 ++++
>  xen/common/ioreq.c | 3 ++-
>  2 files changed, 6 insertions(+), 1 deletion(-)
> 
> diff --git a/xen/arch/arm/hvm.c b/xen/arch/arm/hvm.c
> index 8951b34..9694e5a 100644
> --- a/xen/arch/arm/hvm.c
> +++ b/xen/arch/arm/hvm.c
> @@ -31,6 +31,10 @@
>  
>  #include <asm/hypercall.h>
>  
> +/*
> + * The legacy interface (which involves magic IOREQ pages) *must* not be used
> + * without revisiting the code.
> + */

This is a NIT, but I'd prefer if you moved the comment a few lines
below, maybe just before the existing comment starting with "The
following parameters".

The reason is that as it is now it is not clear which set_params
interfaces should not be used without revisiting the code.

With that:

Acked-by: Stefano Stabellini <sstabellini@kernel.org>


>  static int hvm_allow_set_param(const struct domain *d, unsigned int param)
>  {
>      switch ( param )
> diff --git a/xen/common/ioreq.c b/xen/common/ioreq.c
> index 3ca5b96..4855dd8 100644
> --- a/xen/common/ioreq.c
> +++ b/xen/common/ioreq.c
> @@ -29,6 +29,7 @@
>  #include <xen/trace.h>
>  #include <xen/vpci.h>
>  
> +#include <asm/guest_atomics.h>
>  #include <asm/hvm/ioreq.h>
>  
>  #include <public/hvm/ioreq.h>
> @@ -1182,7 +1183,7 @@ static int ioreq_send_buffered(struct ioreq_server *s, ioreq_t *p)
>  
>          new.read_pointer = old.read_pointer - n * IOREQ_BUFFER_SLOT_NUM;
>          new.write_pointer = old.write_pointer - n * IOREQ_BUFFER_SLOT_NUM;
> -        cmpxchg(&pg->ptrs.full, old.full, new.full);
> +        guest_cmpxchg64(s->emulator, &pg->ptrs.full, old.full, new.full);
>      }
>  
>      notify_via_xen_event_channel(d, s->bufioreq_evtchn);
> -- 
> 2.7.4
> 


^ permalink raw reply	[flat|nested] 127+ messages in thread

* Re: [PATCH V3 14/23] arm/ioreq: Introduce arch specific bits for IOREQ/DM features
  2020-11-30 10:31 ` [PATCH V3 14/23] arm/ioreq: Introduce arch specific bits for IOREQ/DM features Oleksandr Tyshchenko
@ 2020-12-09 22:04   ` Stefano Stabellini
  2020-12-09 22:49     ` Oleksandr
  0 siblings, 1 reply; 127+ messages in thread
From: Stefano Stabellini @ 2020-12-09 22:04 UTC (permalink / raw)
  To: Oleksandr Tyshchenko
  Cc: xen-devel, Julien Grall, Stefano Stabellini, Julien Grall,
	Volodymyr Babchuk, Oleksandr Tyshchenko

[-- Attachment #1: Type: text/plain, Size: 19958 bytes --]

On Mon, 30 Nov 2020, Oleksandr Tyshchenko wrote:
> From: Julien Grall <julien.grall@arm.com>
> 
> This patch adds basic IOREQ/DM support on Arm. The subsequent
> patches will improve functionality and add remaining bits.
> 
> The IOREQ/DM features are supposed to be built with IOREQ_SERVER
> option enabled, which is disabled by default on Arm for now.
> 
> Please note, the "PIO handling" TODO is expected to left unaddressed
> for the current series. It is not an big issue for now while Xen
> doesn't have support for vPCI on Arm. On Arm64 they are only used
> for PCI IO Bar and we would probably want to expose them to emulator
> as PIO access to make a DM completely arch-agnostic. So "PIO handling"
> should be implemented when we add support for vPCI.
> 
> Signed-off-by: Julien Grall <julien.grall@arm.com>
> Signed-off-by: Oleksandr Tyshchenko <oleksandr_tyshchenko@epam.com>
> 
> ---
> Please note, this is a split/cleanup/hardening of Julien's PoC:
> "Add support for Guest IO forwarding to a device emulator"
> 
> Changes RFC -> V1:
>    - was split into:
>      - arm/ioreq: Introduce arch specific bits for IOREQ/DM features
>      - xen/mm: Handle properly reference in set_foreign_p2m_entry() on Arm
>    - update patch description
>    - update asm-arm/hvm/ioreq.h according to the newly introduced arch functions:
>      - arch_hvm_destroy_ioreq_server()
>      - arch_handle_hvm_io_completion()
>    - update arch files to include xen/ioreq.h
>    - remove HVMOP plumbing
>    - rewrite a logic to handle properly case when hvm_send_ioreq() returns IO_RETRY
>    - add a logic to handle properly handle_hvm_io_completion() return value
>    - rename handle_mmio() to ioreq_handle_complete_mmio()
>    - move paging_mark_pfn_dirty() to asm-arm/paging.h
>    - remove forward declaration for hvm_ioreq_server in asm-arm/paging.h
>    - move try_fwd_ioserv() to ioreq.c, provide stubs if !CONFIG_IOREQ_SERVER
>    - do not remove #ifdef CONFIG_IOREQ_SERVER in memory.c for guarding xen/ioreq.h
>    - use gdprintk in try_fwd_ioserv(), remove unneeded prints
>    - update list of #include-s
>    - move has_vpci() to asm-arm/domain.h
>    - add a comment (TODO) to unimplemented yet handle_pio()
>    - remove hvm_mmio_first(last)_byte() and hvm_ioreq_(page/vcpu/server) structs
>      from the arch files, they were already moved to the common code
>    - remove set_foreign_p2m_entry() changes, they will be properly implemented
>      in the follow-up patch
>    - select IOREQ_SERVER for Arm instead of Arm64 in Kconfig
>    - remove x86's realmode and other unneeded stubs from xen/ioreq.h
>    - clafify ioreq_t p.df usage in try_fwd_ioserv()
>    - set ioreq_t p.count to 1 in try_fwd_ioserv()
> 
> Changes V1 -> V2:
>    - was split into:
>      - arm/ioreq: Introduce arch specific bits for IOREQ/DM features
>      - xen/arm: Stick around in leave_hypervisor_to_guest until I/O has completed
>    - update the author of a patch
>    - update patch description
>    - move a loop in leave_hypervisor_to_guest() to a separate patch
>    - set IOREQ_SERVER disabled by default
>    - remove already clarified /* XXX */
>    - replace BUG() by ASSERT_UNREACHABLE() in handle_pio()
>    - remove default case for handling the return value of try_handle_mmio()
>    - remove struct hvm_domain, enum hvm_io_completion, struct hvm_vcpu_io,
>      struct hvm_vcpu from asm-arm/domain.h, these are common materials now
>    - update everything according to the recent changes (IOREQ related function
>      names don't contain "hvm" prefixes/infixes anymore, IOREQ related fields
>      are part of common struct vcpu/domain now, etc)
> 
> Changes V2 -> V3:
>    - update patch according the "legacy interface" is x86 specific
>    - add dummy arch hooks
>    - remove dummy paging_mark_pfn_dirty()
>    - don’t include <xen/domain_page.h> in common ioreq.c
>    - don’t include <public/hvm/ioreq.h> in arch ioreq.h
>    - remove #define ioreq_params(d, i)
> ---
> ---
>  xen/arch/arm/Makefile           |   2 +
>  xen/arch/arm/dm.c               |  34 ++++++++++
>  xen/arch/arm/domain.c           |   9 +++
>  xen/arch/arm/io.c               |  11 +++-
>  xen/arch/arm/ioreq.c            | 141 ++++++++++++++++++++++++++++++++++++++++
>  xen/arch/arm/traps.c            |  13 ++++
>  xen/include/asm-arm/domain.h    |   3 +
>  xen/include/asm-arm/hvm/ioreq.h | 139 +++++++++++++++++++++++++++++++++++++++
>  xen/include/asm-arm/mmio.h      |   1 +
>  9 files changed, 352 insertions(+), 1 deletion(-)
>  create mode 100644 xen/arch/arm/dm.c
>  create mode 100644 xen/arch/arm/ioreq.c
>  create mode 100644 xen/include/asm-arm/hvm/ioreq.h
> 
> diff --git a/xen/arch/arm/Makefile b/xen/arch/arm/Makefile
> index 296c5e6..c3ff454 100644
> --- a/xen/arch/arm/Makefile
> +++ b/xen/arch/arm/Makefile
> @@ -13,6 +13,7 @@ obj-y += cpuerrata.o
>  obj-y += cpufeature.o
>  obj-y += decode.o
>  obj-y += device.o
> +obj-$(CONFIG_IOREQ_SERVER) += dm.o
>  obj-y += domain.o
>  obj-y += domain_build.init.o
>  obj-y += domctl.o
> @@ -27,6 +28,7 @@ obj-y += guest_atomics.o
>  obj-y += guest_walk.o
>  obj-y += hvm.o
>  obj-y += io.o
> +obj-$(CONFIG_IOREQ_SERVER) += ioreq.o
>  obj-y += irq.o
>  obj-y += kernel.init.o
>  obj-$(CONFIG_LIVEPATCH) += livepatch.o
> diff --git a/xen/arch/arm/dm.c b/xen/arch/arm/dm.c
> new file mode 100644
> index 0000000..5d3da37
> --- /dev/null
> +++ b/xen/arch/arm/dm.c
> @@ -0,0 +1,34 @@
> +/*
> + * Copyright (c) 2019 Arm ltd.
> + *
> + * This program is free software; you can redistribute it and/or modify it
> + * under the terms and conditions of the GNU General Public License,
> + * version 2, as published by the Free Software Foundation.
> + *
> + * This program is distributed in the hope it will be useful, but WITHOUT
> + * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
> + * FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License for
> + * more details.
> + *
> + * You should have received a copy of the GNU General Public License along with
> + * this program; If not, see <http://www.gnu.org/licenses/>.
> + */
> +
> +#include <xen/dm.h>
> +#include <xen/hypercall.h>
> +
> +int arch_dm_op(struct xen_dm_op *op, struct domain *d,
> +               const struct dmop_args *op_args, bool *const_op)
> +{
> +    return -EOPNOTSUPP;
> +}
> +
> +/*
> + * Local variables:
> + * mode: C
> + * c-file-style: "BSD"
> + * c-basic-offset: 4
> + * tab-width: 4
> + * indent-tabs-mode: nil
> + * End:
> + */
> diff --git a/xen/arch/arm/domain.c b/xen/arch/arm/domain.c
> index 18cafcd..8f55aba 100644
> --- a/xen/arch/arm/domain.c
> +++ b/xen/arch/arm/domain.c
> @@ -15,6 +15,7 @@
>  #include <xen/guest_access.h>
>  #include <xen/hypercall.h>
>  #include <xen/init.h>
> +#include <xen/ioreq.h>
>  #include <xen/lib.h>
>  #include <xen/livepatch.h>
>  #include <xen/sched.h>
> @@ -696,6 +697,10 @@ int arch_domain_create(struct domain *d,
>  
>      ASSERT(config != NULL);
>  
> +#ifdef CONFIG_IOREQ_SERVER
> +    ioreq_domain_init(d);
> +#endif
> +
>      /* p2m_init relies on some value initialized by the IOMMU subsystem */
>      if ( (rc = iommu_domain_init(d, config->iommu_opts)) != 0 )
>          goto fail;
> @@ -1014,6 +1019,10 @@ int domain_relinquish_resources(struct domain *d)
>          if (ret )
>              return ret;
>  
> +#ifdef CONFIG_IOREQ_SERVER
> +        ioreq_server_destroy_all(d);
> +#endif
> +
>      PROGRESS(xen):
>          ret = relinquish_memory(d, &d->xenpage_list);
>          if ( ret )
> diff --git a/xen/arch/arm/io.c b/xen/arch/arm/io.c
> index ae7ef96..f44cfd4 100644
> --- a/xen/arch/arm/io.c
> +++ b/xen/arch/arm/io.c
> @@ -23,6 +23,7 @@
>  #include <asm/cpuerrata.h>
>  #include <asm/current.h>
>  #include <asm/mmio.h>
> +#include <asm/hvm/ioreq.h>
>  
>  #include "decode.h"
>  
> @@ -123,7 +124,15 @@ enum io_state try_handle_mmio(struct cpu_user_regs *regs,
>  
>      handler = find_mmio_handler(v->domain, info.gpa);
>      if ( !handler )
> -        return IO_UNHANDLED;
> +    {
> +        int rc;
> +
> +        rc = try_fwd_ioserv(regs, v, &info);
> +        if ( rc == IO_HANDLED )
> +            return handle_ioserv(regs, v);
> +
> +        return rc;
> +    }
>  
>      /* All the instructions used on emulated MMIO region should be valid */
>      if ( !dabt.valid )
> diff --git a/xen/arch/arm/ioreq.c b/xen/arch/arm/ioreq.c
> new file mode 100644
> index 0000000..f08190c
> --- /dev/null
> +++ b/xen/arch/arm/ioreq.c
> @@ -0,0 +1,141 @@
> +/*
> + * arm/ioreq.c: hardware virtual machine I/O emulation
> + *
> + * Copyright (c) 2019 Arm ltd.
> + *
> + * This program is free software; you can redistribute it and/or modify it
> + * under the terms and conditions of the GNU General Public License,
> + * version 2, as published by the Free Software Foundation.
> + *
> + * This program is distributed in the hope it will be useful, but WITHOUT
> + * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
> + * FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License for
> + * more details.
> + *
> + * You should have received a copy of the GNU General Public License along with
> + * this program; If not, see <http://www.gnu.org/licenses/>.
> + */
> +
> +#include <xen/domain.h>
> +#include <xen/ioreq.h>
> +
> +#include <asm/traps.h>
> +
> +#include <public/hvm/ioreq.h>
> +
> +enum io_state handle_ioserv(struct cpu_user_regs *regs, struct vcpu *v)
> +{
> +    const union hsr hsr = { .bits = regs->hsr };
> +    const struct hsr_dabt dabt = hsr.dabt;
> +    /* Code is similar to handle_read */
> +    uint8_t size = (1 << dabt.size) * 8;
> +    register_t r = v->io.req.data;
> +
> +    /* We are done with the IO */
> +    v->io.req.state = STATE_IOREQ_NONE;
> +
> +    if ( dabt.write )
> +        return IO_HANDLED;
> +
> +    /*
> +     * Sign extend if required.
> +     * Note that we expect the read handler to have zeroed the bits
> +     * outside the requested access size.
> +     */
> +    if ( dabt.sign && (r & (1UL << (size - 1))) )
> +    {
> +        /*
> +         * We are relying on register_t using the same as
> +         * an unsigned long in order to keep the 32-bit assembly
> +         * code smaller.
> +         */
> +        BUILD_BUG_ON(sizeof(register_t) != sizeof(unsigned long));
> +        r |= (~0UL) << size;
> +    }
> +
> +    set_user_reg(regs, dabt.reg, r);

Could you introduce a set_user_reg_signextend static inline function
that can be used both here and in handle_read?


> +    return IO_HANDLED;
> +}
> +
> +enum io_state try_fwd_ioserv(struct cpu_user_regs *regs,
> +                             struct vcpu *v, mmio_info_t *info)
> +{
> +    struct vcpu_io *vio = &v->io;
> +    ioreq_t p = {
> +        .type = IOREQ_TYPE_COPY,
> +        .addr = info->gpa,
> +        .size = 1 << info->dabt.size,
> +        .count = 1,
> +        .dir = !info->dabt.write,
> +        /*
> +         * On x86, df is used by 'rep' instruction to tell the direction
> +         * to iterate (forward or backward).
> +         * On Arm, all the accesses to MMIO region will do a single
> +         * memory access. So for now, we can safely always set to 0.
> +         */
> +        .df = 0,
> +        .data = get_user_reg(regs, info->dabt.reg),
> +        .state = STATE_IOREQ_READY,
> +    };
> +    struct ioreq_server *s = NULL;
> +    enum io_state rc;
> +
> +    switch ( vio->req.state )
> +    {
> +    case STATE_IOREQ_NONE:
> +        break;
> +
> +    case STATE_IORESP_READY:
> +        return IO_HANDLED;
> +
> +    default:
> +        gdprintk(XENLOG_ERR, "wrong state %u\n", vio->req.state);
> +        return IO_ABORT;
> +    }
> +
> +    s = ioreq_server_select(v->domain, &p);
> +    if ( !s )
> +        return IO_UNHANDLED;
> +
> +    if ( !info->dabt.valid )
> +        return IO_ABORT;
> +
> +    vio->req = p;
> +
> +    rc = ioreq_send(s, &p, 0);
> +    if ( rc != IO_RETRY || v->domain->is_shutting_down )
> +        vio->req.state = STATE_IOREQ_NONE;
> +    else if ( !ioreq_needs_completion(&vio->req) )
> +        rc = IO_HANDLED;
> +    else
> +        vio->completion = IO_mmio_completion;
> +
> +    return rc;
> +}
> +
> +bool ioreq_complete_mmio(void)
> +{
> +    struct vcpu *v = current;
> +    struct cpu_user_regs *regs = guest_cpu_user_regs();
> +    const union hsr hsr = { .bits = regs->hsr };
> +    paddr_t addr = v->io.req.addr;
> +
> +    if ( try_handle_mmio(regs, hsr, addr) == IO_HANDLED )
> +    {
> +        advance_pc(regs, hsr);
> +        return true;
> +    }
> +
> +    return false;
> +}
> +
> +/*
> + * Local variables:
> + * mode: C
> + * c-file-style: "BSD"
> + * c-basic-offset: 4
> + * tab-width: 4
> + * indent-tabs-mode: nil
> + * End:
> + */
> diff --git a/xen/arch/arm/traps.c b/xen/arch/arm/traps.c
> index 22bd1bd..036b13f 100644
> --- a/xen/arch/arm/traps.c
> +++ b/xen/arch/arm/traps.c
> @@ -21,6 +21,7 @@
>  #include <xen/hypercall.h>
>  #include <xen/init.h>
>  #include <xen/iocap.h>
> +#include <xen/ioreq.h>
>  #include <xen/irq.h>
>  #include <xen/lib.h>
>  #include <xen/mem_access.h>
> @@ -1385,6 +1386,9 @@ static arm_hypercall_t arm_hypercall_table[] = {
>  #ifdef CONFIG_HYPFS
>      HYPERCALL(hypfs_op, 5),
>  #endif
> +#ifdef CONFIG_IOREQ_SERVER
> +    HYPERCALL(dm_op, 3),
> +#endif
>  };
>  
>  #ifndef NDEBUG
> @@ -1956,6 +1960,9 @@ static void do_trap_stage2_abort_guest(struct cpu_user_regs *regs,
>              case IO_HANDLED:
>                  advance_pc(regs, hsr);
>                  return;
> +            case IO_RETRY:
> +                /* finish later */
> +                return;
>              case IO_UNHANDLED:
>                  /* IO unhandled, try another way to handle it. */
>                  break;
> @@ -2254,6 +2261,12 @@ static void check_for_vcpu_work(void)
>  {
>      struct vcpu *v = current;
>  
> +#ifdef CONFIG_IOREQ_SERVER
> +    local_irq_enable();
> +    vcpu_ioreq_handle_completion(v);
> +    local_irq_disable();
> +#endif
> +
>      if ( likely(!v->arch.need_flush_to_ram) )
>          return;
>  
> diff --git a/xen/include/asm-arm/domain.h b/xen/include/asm-arm/domain.h
> index 6819a3b..c235e5b 100644
> --- a/xen/include/asm-arm/domain.h
> +++ b/xen/include/asm-arm/domain.h
> @@ -10,6 +10,7 @@
>  #include <asm/gic.h>
>  #include <asm/vgic.h>
>  #include <asm/vpl011.h>
> +#include <public/hvm/dm_op.h>
>  #include <public/hvm/params.h>
>  
>  struct hvm_domain
> @@ -262,6 +263,8 @@ static inline void arch_vcpu_block(struct vcpu *v) {}
>  
>  #define arch_vm_assist_valid_mask(d) (1UL << VMASST_TYPE_runstate_update_flag)
>  
> +#define has_vpci(d)    ({ (void)(d); false; })
> +
>  #endif /* __ASM_DOMAIN_H__ */
>  
>  /*
> diff --git a/xen/include/asm-arm/hvm/ioreq.h b/xen/include/asm-arm/hvm/ioreq.h
> new file mode 100644
> index 0000000..2bffc7a
> --- /dev/null
> +++ b/xen/include/asm-arm/hvm/ioreq.h
> @@ -0,0 +1,139 @@
> +/*
> + * hvm.h: Hardware virtual machine assist interface definitions.
> + *
> + * Copyright (c) 2016 Citrix Systems Inc.
> + * Copyright (c) 2019 Arm ltd.
> + *
> + * This program is free software; you can redistribute it and/or modify it
> + * under the terms and conditions of the GNU General Public License,
> + * version 2, as published by the Free Software Foundation.
> + *
> + * This program is distributed in the hope it will be useful, but WITHOUT
> + * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
> + * FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License for
> + * more details.
> + *
> + * You should have received a copy of the GNU General Public License along with
> + * this program; If not, see <http://www.gnu.org/licenses/>.
> + */
> +
> +#ifndef __ASM_ARM_HVM_IOREQ_H__
> +#define __ASM_ARM_HVM_IOREQ_H__
> +
> +#include <xen/ioreq.h>
> +
> +#ifdef CONFIG_IOREQ_SERVER
> +enum io_state handle_ioserv(struct cpu_user_regs *regs, struct vcpu *v);
> +enum io_state try_fwd_ioserv(struct cpu_user_regs *regs,
> +                             struct vcpu *v, mmio_info_t *info);
> +#else
> +static inline enum io_state handle_ioserv(struct cpu_user_regs *regs,
> +                                          struct vcpu *v)
> +{
> +    return IO_UNHANDLED;
> +}
> +
> +static inline enum io_state try_fwd_ioserv(struct cpu_user_regs *regs,
> +                                           struct vcpu *v, mmio_info_t *info)
> +{
> +    return IO_UNHANDLED;
> +}
> +#endif

If we are providing stub functions, then we can also provide stub
functions for:

ioreq_domain_init
ioreq_server_destroy_all

and avoid the ifdefs.
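
For example (sketch only; the exact prototypes should match the
declarations in xen/ioreq.h):

#ifndef CONFIG_IOREQ_SERVER
static inline void ioreq_domain_init(struct domain *d)
{
}

static inline void ioreq_server_destroy_all(struct domain *d)
{
}
#endif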


> +bool ioreq_complete_mmio(void);
> +
> +static inline bool handle_pio(uint16_t port, unsigned int size, int dir)
> +{
> +    /*
> +     * TODO: For Arm64, the main user will be PCI. So this should be
> +     * implemented when we add support for vPCI.
> +     */
> +    ASSERT_UNREACHABLE();
> +    return true;
> +}
> +
> +static inline void msix_write_completion(struct vcpu *v)
> +{
> +}
> +
> +static inline bool arch_vcpu_ioreq_completion(enum io_completion io_completion)
> +{
> +    ASSERT_UNREACHABLE();
> +    return true;
> +}
> +
> +/*
> + * The "legacy" mechanism of mapping magic pages for the IOREQ servers
> + * is x86 specific, so the following hooks don't need to be implemented on Arm:
> + * - arch_ioreq_server_map_pages
> + * - arch_ioreq_server_unmap_pages
> + * - arch_ioreq_server_enable
> + * - arch_ioreq_server_disable
> + */
> +static inline int arch_ioreq_server_map_pages(struct ioreq_server *s)
> +{
> +    return -EOPNOTSUPP;
> +}
> +
> +static inline void arch_ioreq_server_unmap_pages(struct ioreq_server *s)
> +{
> +}
> +
> +static inline void arch_ioreq_server_enable(struct ioreq_server *s)
> +{
> +}
> +
> +static inline void arch_ioreq_server_disable(struct ioreq_server *s)
> +{
> +}
> +
> +static inline void arch_ioreq_server_destroy(struct ioreq_server *s)
> +{
> +}
> +
> +static inline int arch_ioreq_server_map_mem_type(struct domain *d,
> +                                                 struct ioreq_server *s,
> +                                                 uint32_t flags)
> +{
> +    return -EOPNOTSUPP;
> +}
> +
> +static inline bool arch_ioreq_server_destroy_all(struct domain *d)
> +{
> +    return true;
> +}
> +
> +static inline int arch_ioreq_server_get_type_addr(const struct domain *d,
> +                                                  const ioreq_t *p,
> +                                                  uint8_t *type,
> +                                                  uint64_t *addr)
> +{
> +    if ( p->type != IOREQ_TYPE_COPY && p->type != IOREQ_TYPE_PIO )
> +        return -EINVAL;
> +
> +    *type = (p->type == IOREQ_TYPE_PIO) ?
> +             XEN_DMOP_IO_RANGE_PORT : XEN_DMOP_IO_RANGE_MEMORY;
> +    *addr = p->addr;

This function is not used in this patch and PIOs are left unimplemented
according to a few comments, so I am puzzled by this code here. Do we
need it?


> +    return 0;
> +}
> +
> +static inline void arch_ioreq_domain_init(struct domain *d)
> +{
> +}
> +
> +#define IOREQ_STATUS_HANDLED     IO_HANDLED
> +#define IOREQ_STATUS_UNHANDLED   IO_UNHANDLED
> +#define IOREQ_STATUS_RETRY       IO_RETRY
> +
> +#endif /* __ASM_ARM_HVM_IOREQ_H__ */
> +
> +/*
> + * Local variables:
> + * mode: C
> + * c-file-style: "BSD"
> + * c-basic-offset: 4
> + * tab-width: 4
> + * indent-tabs-mode: nil
> + * End:
> + */
> diff --git a/xen/include/asm-arm/mmio.h b/xen/include/asm-arm/mmio.h
> index 8dbfb27..7ab873c 100644
> --- a/xen/include/asm-arm/mmio.h
> +++ b/xen/include/asm-arm/mmio.h
> @@ -37,6 +37,7 @@ enum io_state
>      IO_ABORT,       /* The IO was handled by the helper and led to an abort. */
>      IO_HANDLED,     /* The IO was successfully handled by the helper. */
>      IO_UNHANDLED,   /* The IO was not handled by the helper. */
> +    IO_RETRY,       /* Retry the emulation for some reason */
>  };
>  
>  typedef int (*mmio_read_t)(struct vcpu *v, mmio_info_t *info,
> -- 
> 2.7.4
> 

^ permalink raw reply	[flat|nested] 127+ messages in thread

* Re: [PATCH V3 13/23] xen/ioreq: Use guest_cmpxchg64() instead of cmpxchg()
  2020-12-09 21:32   ` Stefano Stabellini
@ 2020-12-09 22:34     ` Oleksandr
  2020-12-10  2:30       ` Stefano Stabellini
  0 siblings, 1 reply; 127+ messages in thread
From: Oleksandr @ 2020-12-09 22:34 UTC (permalink / raw)
  To: Stefano Stabellini
  Cc: xen-devel, Oleksandr Tyshchenko, Julien Grall, Volodymyr Babchuk,
	Paul Durrant, Julien Grall


On 09.12.20 23:32, Stefano Stabellini wrote:

Hi Stefano

> On Mon, 30 Nov 2020, Oleksandr Tyshchenko wrote:
>> From: Oleksandr Tyshchenko <oleksandr_tyshchenko@epam.com>
>>
>> The cmpxchg() in ioreq_send_buffered() operates on memory shared
>> with the emulator domain (and the target domain if the legacy
>> interface is used).
>>
>> In order to be on the safe side we need to switch
>> to guest_cmpxchg64() to prevent a domain to DoS Xen on Arm.
>>
>> As there is no plan to support the legacy interface on Arm,
>> we will have a page to be mapped in a single domain at the time,
>> so we can use s->emulator in guest_cmpxchg64() safely.
>>
>> Thankfully the only user of the legacy interface is x86 so far
>> and there is not concern regarding the atomics operations.
>>
>> Please note, that the legacy interface *must* not be used on Arm
>> without revisiting the code.
>>
>> Signed-off-by: Oleksandr Tyshchenko <oleksandr_tyshchenko@epam.com>
>> CC: Julien Grall <julien.grall@arm.com>
>>
>> ---
>> Please note, this is a split/cleanup/hardening of Julien's PoC:
>> "Add support for Guest IO forwarding to a device emulator"
>>
>> Changes RFC -> V1:
>>     - new patch
>>
>> Changes V1 -> V2:
>>     - move earlier to avoid breaking arm32 compilation
>>     - add an explanation to commit description and hvm_allow_set_param()
>>     - pass s->emulator
>>
>> Changes V2 -> V3:
>>     - update patch description
>> ---
>> ---
>>   xen/arch/arm/hvm.c | 4 ++++
>>   xen/common/ioreq.c | 3 ++-
>>   2 files changed, 6 insertions(+), 1 deletion(-)
>>
>> diff --git a/xen/arch/arm/hvm.c b/xen/arch/arm/hvm.c
>> index 8951b34..9694e5a 100644
>> --- a/xen/arch/arm/hvm.c
>> +++ b/xen/arch/arm/hvm.c
>> @@ -31,6 +31,10 @@
>>   
>>   #include <asm/hypercall.h>
>>   
>> +/*
>> + * The legacy interface (which involves magic IOREQ pages) *must* not be used
>> + * without revisiting the code.
>> + */
> This is a NIT, but I'd prefer if you moved the comment a few lines
> below, maybe just before the existing comment starting with "The
> following parameters".
>
> The reason is that as it is now it is not clear which set_params
> interfaces should not be used without revisiting the code.
OK, but maybe this comment should be dropped altogether? It was relevant when 
the legacy interface was part of the common code (V2). Now the legacy 
interface is x86 specific, so I am not sure this comment should be here.


>
> With that:
>
> Acked-by: Stefano Stabellini <sstabellini@kernel.org>

Thank you


-- 
Regards,

Oleksandr Tyshchenko



^ permalink raw reply	[flat|nested] 127+ messages in thread

* Re: [PATCH V3 14/23] arm/ioreq: Introduce arch specific bits for IOREQ/DM features
  2020-12-09 22:04   ` Stefano Stabellini
@ 2020-12-09 22:49     ` Oleksandr
  2020-12-10  2:30       ` Stefano Stabellini
  0 siblings, 1 reply; 127+ messages in thread
From: Oleksandr @ 2020-12-09 22:49 UTC (permalink / raw)
  To: Stefano Stabellini
  Cc: xen-devel, Julien Grall, Julien Grall, Volodymyr Babchuk,
	Oleksandr Tyshchenko


On 10.12.20 00:04, Stefano Stabellini wrote:

Hi Stefano

> On Mon, 30 Nov 2020, Oleksandr Tyshchenko wrote:
>> From: Julien Grall <julien.grall@arm.com>
>>
>> This patch adds basic IOREQ/DM support on Arm. The subsequent
>> patches will improve functionality and add remaining bits.
>>
>> The IOREQ/DM features are supposed to be built with IOREQ_SERVER
>> option enabled, which is disabled by default on Arm for now.
>>
>> Please note, the "PIO handling" TODO is expected to left unaddressed
>> for the current series. It is not an big issue for now while Xen
>> doesn't have support for vPCI on Arm. On Arm64 they are only used
>> for PCI IO Bar and we would probably want to expose them to emulator
>> as PIO access to make a DM completely arch-agnostic. So "PIO handling"
>> should be implemented when we add support for vPCI.
>>
>> Signed-off-by: Julien Grall <julien.grall@arm.com>
>> Signed-off-by: Oleksandr Tyshchenko <oleksandr_tyshchenko@epam.com>
>>
>> ---
>> Please note, this is a split/cleanup/hardening of Julien's PoC:
>> "Add support for Guest IO forwarding to a device emulator"
>>
>> Changes RFC -> V1:
>>     - was split into:
>>       - arm/ioreq: Introduce arch specific bits for IOREQ/DM features
>>       - xen/mm: Handle properly reference in set_foreign_p2m_entry() on Arm
>>     - update patch description
>>     - update asm-arm/hvm/ioreq.h according to the newly introduced arch functions:
>>       - arch_hvm_destroy_ioreq_server()
>>       - arch_handle_hvm_io_completion()
>>     - update arch files to include xen/ioreq.h
>>     - remove HVMOP plumbing
>>     - rewrite a logic to handle properly case when hvm_send_ioreq() returns IO_RETRY
>>     - add a logic to handle properly handle_hvm_io_completion() return value
>>     - rename handle_mmio() to ioreq_handle_complete_mmio()
>>     - move paging_mark_pfn_dirty() to asm-arm/paging.h
>>     - remove forward declaration for hvm_ioreq_server in asm-arm/paging.h
>>     - move try_fwd_ioserv() to ioreq.c, provide stubs if !CONFIG_IOREQ_SERVER
>>     - do not remove #ifdef CONFIG_IOREQ_SERVER in memory.c for guarding xen/ioreq.h
>>     - use gdprintk in try_fwd_ioserv(), remove unneeded prints
>>     - update list of #include-s
>>     - move has_vpci() to asm-arm/domain.h
>>     - add a comment (TODO) to unimplemented yet handle_pio()
>>     - remove hvm_mmio_first(last)_byte() and hvm_ioreq_(page/vcpu/server) structs
>>       from the arch files, they were already moved to the common code
>>     - remove set_foreign_p2m_entry() changes, they will be properly implemented
>>       in the follow-up patch
>>     - select IOREQ_SERVER for Arm instead of Arm64 in Kconfig
>>     - remove x86's realmode and other unneeded stubs from xen/ioreq.h
>>     - clarify ioreq_t p.df usage in try_fwd_ioserv()
>>     - set ioreq_t p.count to 1 in try_fwd_ioserv()
>>
>> Changes V1 -> V2:
>>     - was split into:
>>       - arm/ioreq: Introduce arch specific bits for IOREQ/DM features
>>       - xen/arm: Stick around in leave_hypervisor_to_guest until I/O has completed
>>     - update the author of a patch
>>     - update patch description
>>     - move a loop in leave_hypervisor_to_guest() to a separate patch
>>     - set IOREQ_SERVER disabled by default
>>     - remove already clarified /* XXX */
>>     - replace BUG() by ASSERT_UNREACHABLE() in handle_pio()
>>     - remove default case for handling the return value of try_handle_mmio()
>>     - remove struct hvm_domain, enum hvm_io_completion, struct hvm_vcpu_io,
>>       struct hvm_vcpu from asm-arm/domain.h, these are common materials now
>>     - update everything according to the recent changes (IOREQ related function
>>       names don't contain "hvm" prefixes/infixes anymore, IOREQ related fields
>>       are part of common struct vcpu/domain now, etc)
>>
>> Changes V2 -> V3:
>>     - update patch according the "legacy interface" is x86 specific
>>     - add dummy arch hooks
>>     - remove dummy paging_mark_pfn_dirty()
>>     - don’t include <xen/domain_page.h> in common ioreq.c
>>     - don’t include <public/hvm/ioreq.h> in arch ioreq.h
>>     - remove #define ioreq_params(d, i)
>> ---
>> ---
>>   xen/arch/arm/Makefile           |   2 +
>>   xen/arch/arm/dm.c               |  34 ++++++++++
>>   xen/arch/arm/domain.c           |   9 +++
>>   xen/arch/arm/io.c               |  11 +++-
>>   xen/arch/arm/ioreq.c            | 141 ++++++++++++++++++++++++++++++++++++++++
>>   xen/arch/arm/traps.c            |  13 ++++
>>   xen/include/asm-arm/domain.h    |   3 +
>>   xen/include/asm-arm/hvm/ioreq.h | 139 +++++++++++++++++++++++++++++++++++++++
>>   xen/include/asm-arm/mmio.h      |   1 +
>>   9 files changed, 352 insertions(+), 1 deletion(-)
>>   create mode 100644 xen/arch/arm/dm.c
>>   create mode 100644 xen/arch/arm/ioreq.c
>>   create mode 100644 xen/include/asm-arm/hvm/ioreq.h
>>
>> diff --git a/xen/arch/arm/Makefile b/xen/arch/arm/Makefile
>> index 296c5e6..c3ff454 100644
>> --- a/xen/arch/arm/Makefile
>> +++ b/xen/arch/arm/Makefile
>> @@ -13,6 +13,7 @@ obj-y += cpuerrata.o
>>   obj-y += cpufeature.o
>>   obj-y += decode.o
>>   obj-y += device.o
>> +obj-$(CONFIG_IOREQ_SERVER) += dm.o
>>   obj-y += domain.o
>>   obj-y += domain_build.init.o
>>   obj-y += domctl.o
>> @@ -27,6 +28,7 @@ obj-y += guest_atomics.o
>>   obj-y += guest_walk.o
>>   obj-y += hvm.o
>>   obj-y += io.o
>> +obj-$(CONFIG_IOREQ_SERVER) += ioreq.o
>>   obj-y += irq.o
>>   obj-y += kernel.init.o
>>   obj-$(CONFIG_LIVEPATCH) += livepatch.o
>> diff --git a/xen/arch/arm/dm.c b/xen/arch/arm/dm.c
>> new file mode 100644
>> index 0000000..5d3da37
>> --- /dev/null
>> +++ b/xen/arch/arm/dm.c
>> @@ -0,0 +1,34 @@
>> +/*
>> + * Copyright (c) 2019 Arm ltd.
>> + *
>> + * This program is free software; you can redistribute it and/or modify it
>> + * under the terms and conditions of the GNU General Public License,
>> + * version 2, as published by the Free Software Foundation.
>> + *
>> + * This program is distributed in the hope it will be useful, but WITHOUT
>> + * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
>> + * FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License for
>> + * more details.
>> + *
>> + * You should have received a copy of the GNU General Public License along with
>> + * this program; If not, see <http://www.gnu.org/licenses/>.
>> + */
>> +
>> +#include <xen/dm.h>
>> +#include <xen/hypercall.h>
>> +
>> +int arch_dm_op(struct xen_dm_op *op, struct domain *d,
>> +               const struct dmop_args *op_args, bool *const_op)
>> +{
>> +    return -EOPNOTSUPP;
>> +}
>> +
>> +/*
>> + * Local variables:
>> + * mode: C
>> + * c-file-style: "BSD"
>> + * c-basic-offset: 4
>> + * tab-width: 4
>> + * indent-tabs-mode: nil
>> + * End:
>> + */
>> diff --git a/xen/arch/arm/domain.c b/xen/arch/arm/domain.c
>> index 18cafcd..8f55aba 100644
>> --- a/xen/arch/arm/domain.c
>> +++ b/xen/arch/arm/domain.c
>> @@ -15,6 +15,7 @@
>>   #include <xen/guest_access.h>
>>   #include <xen/hypercall.h>
>>   #include <xen/init.h>
>> +#include <xen/ioreq.h>
>>   #include <xen/lib.h>
>>   #include <xen/livepatch.h>
>>   #include <xen/sched.h>
>> @@ -696,6 +697,10 @@ int arch_domain_create(struct domain *d,
>>   
>>       ASSERT(config != NULL);
>>   
>> +#ifdef CONFIG_IOREQ_SERVER
>> +    ioreq_domain_init(d);
>> +#endif
>> +
>>       /* p2m_init relies on some value initialized by the IOMMU subsystem */
>>       if ( (rc = iommu_domain_init(d, config->iommu_opts)) != 0 )
>>           goto fail;
>> @@ -1014,6 +1019,10 @@ int domain_relinquish_resources(struct domain *d)
>>           if (ret )
>>               return ret;
>>   
>> +#ifdef CONFIG_IOREQ_SERVER
>> +        ioreq_server_destroy_all(d);
>> +#endif
>> +
>>       PROGRESS(xen):
>>           ret = relinquish_memory(d, &d->xenpage_list);
>>           if ( ret )
>> diff --git a/xen/arch/arm/io.c b/xen/arch/arm/io.c
>> index ae7ef96..f44cfd4 100644
>> --- a/xen/arch/arm/io.c
>> +++ b/xen/arch/arm/io.c
>> @@ -23,6 +23,7 @@
>>   #include <asm/cpuerrata.h>
>>   #include <asm/current.h>
>>   #include <asm/mmio.h>
>> +#include <asm/hvm/ioreq.h>
>>   
>>   #include "decode.h"
>>   
>> @@ -123,7 +124,15 @@ enum io_state try_handle_mmio(struct cpu_user_regs *regs,
>>   
>>       handler = find_mmio_handler(v->domain, info.gpa);
>>       if ( !handler )
>> -        return IO_UNHANDLED;
>> +    {
>> +        int rc;
>> +
>> +        rc = try_fwd_ioserv(regs, v, &info);
>> +        if ( rc == IO_HANDLED )
>> +            return handle_ioserv(regs, v);
>> +
>> +        return rc;
>> +    }
>>   
>>       /* All the instructions used on emulated MMIO region should be valid */
>>       if ( !dabt.valid )
>> diff --git a/xen/arch/arm/ioreq.c b/xen/arch/arm/ioreq.c
>> new file mode 100644
>> index 0000000..f08190c
>> --- /dev/null
>> +++ b/xen/arch/arm/ioreq.c
>> @@ -0,0 +1,141 @@
>> +/*
>> + * arm/ioreq.c: hardware virtual machine I/O emulation
>> + *
>> + * Copyright (c) 2019 Arm ltd.
>> + *
>> + * This program is free software; you can redistribute it and/or modify it
>> + * under the terms and conditions of the GNU General Public License,
>> + * version 2, as published by the Free Software Foundation.
>> + *
>> + * This program is distributed in the hope it will be useful, but WITHOUT
>> + * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
>> + * FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License for
>> + * more details.
>> + *
>> + * You should have received a copy of the GNU General Public License along with
>> + * this program; If not, see <http://www.gnu.org/licenses/>.
>> + */
>> +
>> +#include <xen/domain.h>
>> +#include <xen/ioreq.h>
>> +
>> +#include <asm/traps.h>
>> +
>> +#include <public/hvm/ioreq.h>
>> +
>> +enum io_state handle_ioserv(struct cpu_user_regs *regs, struct vcpu *v)
>> +{
>> +    const union hsr hsr = { .bits = regs->hsr };
>> +    const struct hsr_dabt dabt = hsr.dabt;
>> +    /* Code is similar to handle_read */
>> +    uint8_t size = (1 << dabt.size) * 8;
>> +    register_t r = v->io.req.data;
>> +
>> +    /* We are done with the IO */
>> +    v->io.req.state = STATE_IOREQ_NONE;
>> +
>> +    if ( dabt.write )
>> +        return IO_HANDLED;
>> +
>> +    /*
>> +     * Sign extend if required.
>> +     * Note that we expect the read handler to have zeroed the bits
>> +     * outside the requested access size.
>> +     */
>> +    if ( dabt.sign && (r & (1UL << (size - 1))) )
>> +    {
>> +        /*
>> +         * We are relying on register_t being the same size as
>> +         * an unsigned long in order to keep the 32-bit assembly
>> +         * code smaller.
>> +         */
>> +        BUILD_BUG_ON(sizeof(register_t) != sizeof(unsigned long));
>> +        r |= (~0UL) << size;
>> +    }
>> +
>> +    set_user_reg(regs, dabt.reg, r);
> Could you introduce a set_user_reg_signextend static inline function
> that can be used both here and in handle_read?
Yes, already done (this was requested by Julien). Please see
https://www.mail-archive.com/xen-devel@lists.xenproject.org/msg86986.html
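
For reference, a minimal sketch of what such a helper could look like,
derived from the hunk above (the name and exact signature of the real
helper in the next version may differ):

static inline void set_user_reg_signextend(struct cpu_user_regs *regs,
                                           const struct hsr_dabt dabt,
                                           register_t r)
{
    uint8_t size = (1 << dabt.size) * 8;

    if ( dabt.sign && (r & (1UL << (size - 1))) )
    {
        /* register_t and unsigned long are expected to match in size */
        BUILD_BUG_ON(sizeof(register_t) != sizeof(unsigned long));
        r |= (~0UL) << size;
    }

    set_user_reg(regs, dabt.reg, r);
}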


>
>
>> +    return IO_HANDLED;
>> +}
>> +
>> +enum io_state try_fwd_ioserv(struct cpu_user_regs *regs,
>> +                             struct vcpu *v, mmio_info_t *info)
>> +{
>> +    struct vcpu_io *vio = &v->io;
>> +    ioreq_t p = {
>> +        .type = IOREQ_TYPE_COPY,
>> +        .addr = info->gpa,
>> +        .size = 1 << info->dabt.size,
>> +        .count = 1,
>> +        .dir = !info->dabt.write,
>> +        /*
>> +         * On x86, df is used by 'rep' instruction to tell the direction
>> +         * to iterate (forward or backward).
>> +         * On Arm, all the accesses to MMIO region will do a single
>> +         * memory access. So for now, we can safely always set to 0.
>> +         */
>> +        .df = 0,
>> +        .data = get_user_reg(regs, info->dabt.reg),
>> +        .state = STATE_IOREQ_READY,
>> +    };
>> +    struct ioreq_server *s = NULL;
>> +    enum io_state rc;
>> +
>> +    switch ( vio->req.state )
>> +    {
>> +    case STATE_IOREQ_NONE:
>> +        break;
>> +
>> +    case STATE_IORESP_READY:
>> +        return IO_HANDLED;
>> +
>> +    default:
>> +        gdprintk(XENLOG_ERR, "wrong state %u\n", vio->req.state);
>> +        return IO_ABORT;
>> +    }
>> +
>> +    s = ioreq_server_select(v->domain, &p);
>> +    if ( !s )
>> +        return IO_UNHANDLED;
>> +
>> +    if ( !info->dabt.valid )
>> +        return IO_ABORT;
>> +
>> +    vio->req = p;
>> +
>> +    rc = ioreq_send(s, &p, 0);
>> +    if ( rc != IO_RETRY || v->domain->is_shutting_down )
>> +        vio->req.state = STATE_IOREQ_NONE;
>> +    else if ( !ioreq_needs_completion(&vio->req) )
>> +        rc = IO_HANDLED;
>> +    else
>> +        vio->completion = IO_mmio_completion;
>> +
>> +    return rc;
>> +}
>> +
>> +bool ioreq_complete_mmio(void)
>> +{
>> +    struct vcpu *v = current;
>> +    struct cpu_user_regs *regs = guest_cpu_user_regs();
>> +    const union hsr hsr = { .bits = regs->hsr };
>> +    paddr_t addr = v->io.req.addr;
>> +
>> +    if ( try_handle_mmio(regs, hsr, addr) == IO_HANDLED )
>> +    {
>> +        advance_pc(regs, hsr);
>> +        return true;
>> +    }
>> +
>> +    return false;
>> +}
>> +
>> +/*
>> + * Local variables:
>> + * mode: C
>> + * c-file-style: "BSD"
>> + * c-basic-offset: 4
>> + * tab-width: 4
>> + * indent-tabs-mode: nil
>> + * End:
>> + */
>> diff --git a/xen/arch/arm/traps.c b/xen/arch/arm/traps.c
>> index 22bd1bd..036b13f 100644
>> --- a/xen/arch/arm/traps.c
>> +++ b/xen/arch/arm/traps.c
>> @@ -21,6 +21,7 @@
>>   #include <xen/hypercall.h>
>>   #include <xen/init.h>
>>   #include <xen/iocap.h>
>> +#include <xen/ioreq.h>
>>   #include <xen/irq.h>
>>   #include <xen/lib.h>
>>   #include <xen/mem_access.h>
>> @@ -1385,6 +1386,9 @@ static arm_hypercall_t arm_hypercall_table[] = {
>>   #ifdef CONFIG_HYPFS
>>       HYPERCALL(hypfs_op, 5),
>>   #endif
>> +#ifdef CONFIG_IOREQ_SERVER
>> +    HYPERCALL(dm_op, 3),
>> +#endif
>>   };
>>   
>>   #ifndef NDEBUG
>> @@ -1956,6 +1960,9 @@ static void do_trap_stage2_abort_guest(struct cpu_user_regs *regs,
>>               case IO_HANDLED:
>>                   advance_pc(regs, hsr);
>>                   return;
>> +            case IO_RETRY:
>> +                /* finish later */
>> +                return;
>>               case IO_UNHANDLED:
>>                   /* IO unhandled, try another way to handle it. */
>>                   break;
>> @@ -2254,6 +2261,12 @@ static void check_for_vcpu_work(void)
>>   {
>>       struct vcpu *v = current;
>>   
>> +#ifdef CONFIG_IOREQ_SERVER
>> +    local_irq_enable();
>> +    vcpu_ioreq_handle_completion(v);
>> +    local_irq_disable();
>> +#endif
>> +
>>       if ( likely(!v->arch.need_flush_to_ram) )
>>           return;
>>   
>> diff --git a/xen/include/asm-arm/domain.h b/xen/include/asm-arm/domain.h
>> index 6819a3b..c235e5b 100644
>> --- a/xen/include/asm-arm/domain.h
>> +++ b/xen/include/asm-arm/domain.h
>> @@ -10,6 +10,7 @@
>>   #include <asm/gic.h>
>>   #include <asm/vgic.h>
>>   #include <asm/vpl011.h>
>> +#include <public/hvm/dm_op.h>
>>   #include <public/hvm/params.h>
>>   
>>   struct hvm_domain
>> @@ -262,6 +263,8 @@ static inline void arch_vcpu_block(struct vcpu *v) {}
>>   
>>   #define arch_vm_assist_valid_mask(d) (1UL << VMASST_TYPE_runstate_update_flag)
>>   
>> +#define has_vpci(d)    ({ (void)(d); false; })
>> +
>>   #endif /* __ASM_DOMAIN_H__ */
>>   
>>   /*
>> diff --git a/xen/include/asm-arm/hvm/ioreq.h b/xen/include/asm-arm/hvm/ioreq.h
>> new file mode 100644
>> index 0000000..2bffc7a
>> --- /dev/null
>> +++ b/xen/include/asm-arm/hvm/ioreq.h
>> @@ -0,0 +1,139 @@
>> +/*
>> + * hvm.h: Hardware virtual machine assist interface definitions.
>> + *
>> + * Copyright (c) 2016 Citrix Systems Inc.
>> + * Copyright (c) 2019 Arm ltd.
>> + *
>> + * This program is free software; you can redistribute it and/or modify it
>> + * under the terms and conditions of the GNU General Public License,
>> + * version 2, as published by the Free Software Foundation.
>> + *
>> + * This program is distributed in the hope it will be useful, but WITHOUT
>> + * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
>> + * FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License for
>> + * more details.
>> + *
>> + * You should have received a copy of the GNU General Public License along with
>> + * this program; If not, see <http://www.gnu.org/licenses/>.
>> + */
>> +
>> +#ifndef __ASM_ARM_HVM_IOREQ_H__
>> +#define __ASM_ARM_HVM_IOREQ_H__
>> +
>> +#include <xen/ioreq.h>
>> +
>> +#ifdef CONFIG_IOREQ_SERVER
>> +enum io_state handle_ioserv(struct cpu_user_regs *regs, struct vcpu *v);
>> +enum io_state try_fwd_ioserv(struct cpu_user_regs *regs,
>> +                             struct vcpu *v, mmio_info_t *info);
>> +#else
>> +static inline enum io_state handle_ioserv(struct cpu_user_regs *regs,
>> +                                          struct vcpu *v)
>> +{
>> +    return IO_UNHANDLED;
>> +}
>> +
>> +static inline enum io_state try_fwd_ioserv(struct cpu_user_regs *regs,
>> +                                           struct vcpu *v, mmio_info_t *info)
>> +{
>> +    return IO_UNHANDLED;
>> +}
>> +#endif
> If we are providing stub functions, then we can also provide stub
> functions for:
>
> ioreq_domain_init
> ioreq_server_destroy_all
>
> and avoid the ifdefs.
I got your point. These are common IOREQ interface functions, whose
declarations live in the common header, so should I provide the
stubs in the common ioreq.h?
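
For illustration, the stubs in the common header could look roughly like
this (just a sketch; the exact header and placement are still to be decided):

#ifdef CONFIG_IOREQ_SERVER
void ioreq_domain_init(struct domain *d);
void ioreq_server_destroy_all(struct domain *d);
#else
static inline void ioreq_domain_init(struct domain *d)
{
}

static inline void ioreq_server_destroy_all(struct domain *d)
{
}
#endif

With stubs like these the #ifdef CONFIG_IOREQ_SERVER guards around the
calls in arch_domain_create()/domain_relinquish_resources() could indeed
be dropped.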


>
>
>> +bool ioreq_complete_mmio(void);
>> +
>> +static inline bool handle_pio(uint16_t port, unsigned int size, int dir)
>> +{
>> +    /*
>> +     * TODO: For Arm64, the main user will be PCI. So this should be
>> +     * implemented when we add support for vPCI.
>> +     */
>> +    ASSERT_UNREACHABLE();
>> +    return true;
>> +}
>> +
>> +static inline void msix_write_completion(struct vcpu *v)
>> +{
>> +}
>> +
>> +static inline bool arch_vcpu_ioreq_completion(enum io_completion io_completion)
>> +{
>> +    ASSERT_UNREACHABLE();
>> +    return true;
>> +}
>> +
>> +/*
>> + * The "legacy" mechanism of mapping magic pages for the IOREQ servers
>> + * is x86 specific, so the following hooks don't need to be implemented on Arm:
>> + * - arch_ioreq_server_map_pages
>> + * - arch_ioreq_server_unmap_pages
>> + * - arch_ioreq_server_enable
>> + * - arch_ioreq_server_disable
>> + */
>> +static inline int arch_ioreq_server_map_pages(struct ioreq_server *s)
>> +{
>> +    return -EOPNOTSUPP;
>> +}
>> +
>> +static inline void arch_ioreq_server_unmap_pages(struct ioreq_server *s)
>> +{
>> +}
>> +
>> +static inline void arch_ioreq_server_enable(struct ioreq_server *s)
>> +{
>> +}
>> +
>> +static inline void arch_ioreq_server_disable(struct ioreq_server *s)
>> +{
>> +}
>> +
>> +static inline void arch_ioreq_server_destroy(struct ioreq_server *s)
>> +{
>> +}
>> +
>> +static inline int arch_ioreq_server_map_mem_type(struct domain *d,
>> +                                                 struct ioreq_server *s,
>> +                                                 uint32_t flags)
>> +{
>> +    return -EOPNOTSUPP;
>> +}
>> +
>> +static inline bool arch_ioreq_server_destroy_all(struct domain *d)
>> +{
>> +    return true;
>> +}
>> +
>> +static inline int arch_ioreq_server_get_type_addr(const struct domain *d,
>> +                                                  const ioreq_t *p,
>> +                                                  uint8_t *type,
>> +                                                  uint64_t *addr)
>> +{
>> +    if ( p->type != IOREQ_TYPE_COPY && p->type != IOREQ_TYPE_PIO )
>> +        return -EINVAL;
>> +
>> +    *type = (p->type == IOREQ_TYPE_PIO) ?
>> +             XEN_DMOP_IO_RANGE_PORT : XEN_DMOP_IO_RANGE_MEMORY;
>> +    *addr = p->addr;
> This function is not used in this patch and PIOs are left unimplemented
> according to a few comments, so I am puzzled by this code here. Do we
> need it?
Yes. It is called from ioreq_server_select() (common/ioreq.c). I could
just skip the PIO case and use *type = XEN_DMOP_IO_RANGE_MEMORY, but
I didn't want to diverge.
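
Roughly, the caller looks like this (a simplified sketch of the common
code, not the literal implementation):

struct ioreq_server *ioreq_server_select(struct domain *d, ioreq_t *p)
{
    uint8_t type;
    uint64_t addr;

    if ( arch_ioreq_server_get_type_addr(d, p, &type, &addr) )
        return NULL;

    /*
     * The real code then matches 'type'/'addr' against each registered
     * server's port/memory rangesets and returns the server (if any)
     * that claims the access - elided here.
     */
    return NULL;
}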


-- 
Regards,

Oleksandr Tyshchenko



^ permalink raw reply	[flat|nested] 127+ messages in thread

* Re: [PATCH V3 15/23] xen/arm: Stick around in leave_hypervisor_to_guest until I/O has completed
  2020-11-30 10:31 ` [PATCH V3 15/23] xen/arm: Stick around in leave_hypervisor_to_guest until I/O has completed Oleksandr Tyshchenko
  2020-11-30 20:51   ` Volodymyr Babchuk
@ 2020-12-09 23:18   ` Stefano Stabellini
  2020-12-09 23:35     ` Stefano Stabellini
  2020-12-09 23:38     ` Julien Grall
  1 sibling, 2 replies; 127+ messages in thread
From: Stefano Stabellini @ 2020-12-09 23:18 UTC (permalink / raw)
  To: Oleksandr Tyshchenko
  Cc: xen-devel, Oleksandr Tyshchenko, Stefano Stabellini,
	Julien Grall, Volodymyr Babchuk, Julien Grall

On Mon, 30 Nov 2020, Oleksandr Tyshchenko wrote:
> From: Oleksandr Tyshchenko <oleksandr_tyshchenko@epam.com>
> 
> This patch adds proper handling of return value of
> vcpu_ioreq_handle_completion() which involves using a loop
> in leave_hypervisor_to_guest().
> 
> The reason to use an unbounded loop here is the fact that a vCPU
> shouldn't continue until an I/O has completed. In Xen's case, if an I/O
> never completes then it most likely means that something went horribly
> wrong with the Device Emulator. And it is most likely not safe to
> continue. So letting the vCPU spin forever if the I/O never completes
> is a safer action than letting it continue and leaving the guest in
> an unclear state, and is the best we can do for now.
> 
> This wouldn't be an issue for Xen as do_softirq() would be called at
> every loop. In case of failure, the guest will crash and the vCPU
> will be unscheduled.

Imagine that we have two guests: one that requires an ioreq server and
one that doesn't. If I am not mistaken this loop could potentially spin
forever on a pcpu, thus preventing any other guest being scheduled, even
if the other guest doesn't need any ioreq servers.


My other concern is that we are busy-looping. Could we call something
like wfi() or do_idle() instead? The ioreq server event notification of
completion should wake us up?

Following this line of thinking, I am wondering if instead of the
busy-loop we should call vcpu_block_unless_event_pending(current) in
try_handle_mmio if IO_RETRY. Then when the emulation is done, QEMU (or
equivalent) calls xenevtchn_notify which ends up waking up the domU
vcpu. Would that work?



> Signed-off-by: Oleksandr Tyshchenko <oleksandr_tyshchenko@epam.com>
> CC: Julien Grall <julien.grall@arm.com>
> 
> ---
> Please note, this is a split/cleanup/hardening of Julien's PoC:
> "Add support for Guest IO forwarding to a device emulator"
> 
> Changes V1 -> V2:
>    - new patch, changes were derived from (+ new explanation):
>      arm/ioreq: Introduce arch specific bits for IOREQ/DM features
> 
> Changes V2 -> V3:
>    - update patch description
> ---
> ---
>  xen/arch/arm/traps.c | 31 ++++++++++++++++++++++++++-----
>  1 file changed, 26 insertions(+), 5 deletions(-)
> 
> diff --git a/xen/arch/arm/traps.c b/xen/arch/arm/traps.c
> index 036b13f..4cef43e 100644
> --- a/xen/arch/arm/traps.c
> +++ b/xen/arch/arm/traps.c
> @@ -2257,18 +2257,23 @@ static void check_for_pcpu_work(void)
>   * Process pending work for the vCPU. Any call should be fast or
>   * implement preemption.
>   */
> -static void check_for_vcpu_work(void)
> +static bool check_for_vcpu_work(void)
>  {
>      struct vcpu *v = current;
>  
>  #ifdef CONFIG_IOREQ_SERVER
> +    bool handled;
> +
>      local_irq_enable();
> -    vcpu_ioreq_handle_completion(v);
> +    handled = vcpu_ioreq_handle_completion(v);
>      local_irq_disable();
> +
> +    if ( !handled )
> +        return true;
>  #endif
>  
>      if ( likely(!v->arch.need_flush_to_ram) )
> -        return;
> +        return false;
>  
>      /*
>       * Give a chance for the pCPU to process work before handling the vCPU
> @@ -2279,6 +2284,8 @@ static void check_for_vcpu_work(void)
>      local_irq_enable();
>      p2m_flush_vm(v);
>      local_irq_disable();
> +
> +    return false;
>  }
>  
>  /*
> @@ -2291,8 +2298,22 @@ void leave_hypervisor_to_guest(void)
>  {
>      local_irq_disable();
>  
> -    check_for_vcpu_work();
> -    check_for_pcpu_work();
> +    /*
> +     * The reason to use an unbounded loop here is the fact that a vCPU
> +     * shouldn't continue until an I/O has completed. In Xen's case, if an I/O
> +     * never completes then it most likely means that something went horribly
> +     * wrong with the Device Emulator. And it is most likely not safe to
> +     * continue. So letting the vCPU spin forever if the I/O never completes
> +     * is a safer action than letting it continue and leaving the guest in
> +     * an unclear state, and is the best we can do for now.
> +     *
> +     * This wouldn't be an issue for Xen as do_softirq() would be called at
> +     * every loop. In case of failure, the guest will crash and the vCPU
> +     * will be unscheduled.
> +     */
> +    do {
> +        check_for_pcpu_work();
> +    } while ( check_for_vcpu_work() );
>  
>      vgic_sync_to_lrs();
>  
> -- 
> 2.7.4
> 


^ permalink raw reply	[flat|nested] 127+ messages in thread

* Re: [PATCH V3 15/23] xen/arm: Stick around in leave_hypervisor_to_guest until I/O has completed
  2020-12-09 23:18   ` Stefano Stabellini
@ 2020-12-09 23:35     ` Stefano Stabellini
  2020-12-09 23:47       ` Julien Grall
  2020-12-09 23:38     ` Julien Grall
  1 sibling, 1 reply; 127+ messages in thread
From: Stefano Stabellini @ 2020-12-09 23:35 UTC (permalink / raw)
  To: Stefano Stabellini
  Cc: Oleksandr Tyshchenko, xen-devel, Oleksandr Tyshchenko,
	Julien Grall, Volodymyr Babchuk, Julien Grall

On Wed, 9 Dec 2020, Stefano Stabellini wrote:
> On Mon, 30 Nov 2020, Oleksandr Tyshchenko wrote:
> > From: Oleksandr Tyshchenko <oleksandr_tyshchenko@epam.com>
> > 
> > This patch adds proper handling of return value of
> > vcpu_ioreq_handle_completion() which involves using a loop
> > in leave_hypervisor_to_guest().
> > 
> > The reason to use an unbounded loop here is the fact that a vCPU
> > shouldn't continue until an I/O has completed. In Xen's case, if an I/O
> > never completes then it most likely means that something went horribly
> > wrong with the Device Emulator. And it is most likely not safe to
> > continue. So letting the vCPU spin forever if the I/O never completes
> > is a safer action than letting it continue and leaving the guest in
> > an unclear state, and is the best we can do for now.
> > 
> > This wouldn't be an issue for Xen as do_softirq() would be called at
> > every loop. In case of failure, the guest will crash and the vCPU
> > will be unscheduled.
> 
> Imagine that we have two guests: one that requires an ioreq server and
> one that doesn't. If I am not mistaken this loop could potentially spin
> forever on a pcpu, thus preventing any other guest being scheduled, even
> if the other guest doesn't need any ioreq servers.
> 
> 
> My other concern is that we are busy-looping. Could we call something
> like wfi() or do_idle() instead? The ioreq server event notification of
> completion should wake us up?
> 
> Following this line of thinking, I am wondering if instead of the
> busy-loop we should call vcpu_block_unless_event_pending(current) in
> try_handle_mmio if IO_RETRY. Then when the emulation is done, QEMU (or
> equivalent) calls xenevtchn_notify which ends up waking up the domU
> vcpu. Would that work?

I read now Julien's reply: we are already doing something similar to
what I suggested with the following call chain:

check_for_vcpu_work -> vcpu_ioreq_handle_completion -> wait_for_io -> wait_on_xen_event_channel

So the busy-loop here is only a safety-belt in case of a spurious
wake-up, in which case we are going to call check_for_vcpu_work again,
potentially causing a guest reschedule.

Then, this is fine and addresses both my concerns. Maybe let's add a note
in the commit message about it.


I am also wondering if there is any benefit in calling wait_for_io()
earlier, maybe from try_handle_mmio if IO_RETRY?
leave_hypervisor_to_guest is very late for that. In any case, it is not
an important optimization (if it is even an optimization at all) so it
is fine to leave it as is.


^ permalink raw reply	[flat|nested] 127+ messages in thread

* Re: [PATCH V3 15/23] xen/arm: Stick around in leave_hypervisor_to_guest until I/O has completed
  2020-12-09 23:18   ` Stefano Stabellini
  2020-12-09 23:35     ` Stefano Stabellini
@ 2020-12-09 23:38     ` Julien Grall
  1 sibling, 0 replies; 127+ messages in thread
From: Julien Grall @ 2020-12-09 23:38 UTC (permalink / raw)
  To: Stefano Stabellini, Oleksandr Tyshchenko
  Cc: xen-devel, Oleksandr Tyshchenko, Volodymyr Babchuk, Julien Grall

Hi Stefano,

On 09/12/2020 23:18, Stefano Stabellini wrote:
> On Mon, 30 Nov 2020, Oleksandr Tyshchenko wrote:
>> From: Oleksandr Tyshchenko <oleksandr_tyshchenko@epam.com>
>>
>> This patch adds proper handling of return value of
>> vcpu_ioreq_handle_completion() which involves using a loop
>> in leave_hypervisor_to_guest().
>>
>> The reason to use an unbounded loop here is the fact that a vCPU
>> shouldn't continue until an I/O has completed. In Xen's case, if an I/O
>> never completes then it most likely means that something went horribly
>> wrong with the Device Emulator. And it is most likely not safe to
>> continue. So letting the vCPU spin forever if the I/O never completes
>> is a safer action than letting it continue and leaving the guest in
>> an unclear state, and is the best we can do for now.
>>
>> This wouldn't be an issue for Xen as do_softirq() would be called at
>> every loop. In case of failure, the guest will crash and the vCPU
>> will be unscheduled.
>
> Imagine that we have two guests: one that requires an ioreq server and
> one that doesn't. If I am not mistaken this loop could potentially spin
> forever on a pcpu, thus preventing any other guest being scheduled, even
> if the other guest doesn't need any ioreq servers.

That's not correct. On every loop iteration we will call check_for_pcpu_work(),
which will process pending softirqs. If rescheduling is necessary (it might be
set by a timer or by a caller in check_for_vcpu_work()), then the vCPU will be
descheduled to make room for someone else.

> 
> 
> My other concern is that we are busy-looping. Could we call something
> like wfi() or do_idle() instead? The ioreq server event notification of
> completion should wake us up?

There is no busy loop here. If the IOREQ server has not yet handled the I/O,
we will block the vCPU until an event notification is received (see the
call to wait_on_xen_event_channel()).

This loop makes sure that all the vCPU work is done before we return to
the guest.

The worst that can happen here is that the vCPU will never run again if the
IOREQ server is being naughty.

> 
> Following this line of thinking, I am wondering if instead of the
> busy-loop we should call vcpu_block_unless_event_pending(current) in
> try_handle_mmio if IO_RETRY. Then when the emulation is done, QEMU (or
> equivalent) calls xenevtchn_notify which ends up waking up the domU
> vcpu. Would that work?

vcpu_block_unless_event_pending() will not block if there are interrupts 
pending. However, here we really want to block until the I/O has been 
completed. So vcpu_block_unless_event_pending() is not the right approach.

The IOREQ code is using wait_on_xen_event_channel(). Yet, this can still 
"exit" early if an event has been received. But this doesn't mean the 
I/O has completed. So we need to check if the I/O has completed and wait 
again if it hasn't.
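
Roughly, the pattern is (a simplified sketch, not the literal code in
common/ioreq.c):

static bool wait_for_io(struct ioreq_vcpu *sv, ioreq_t *p)
{
    while ( p->state != STATE_IORESP_READY )
    {
        if ( p->state == STATE_IOREQ_NONE )
            return true; /* Nothing outstanding anymore */

        /*
         * This may return early on an unrelated event, hence the
         * surrounding loop, which re-checks the state and waits again
         * until the I/O has actually completed.
         */
        wait_on_xen_event_channel(sv->ioreq_evtchn,
                                  p->state == STATE_IORESP_READY);
    }

    return true;
}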

I seem to keep having to explain how the code works. So maybe we want to 
update the commit message with more details?

Cheers,

-- 
Julien Grall


^ permalink raw reply	[flat|nested] 127+ messages in thread

* Re: [PATCH V3 15/23] xen/arm: Stick around in leave_hypervisor_to_guest until I/O has completed
  2020-12-09 23:35     ` Stefano Stabellini
@ 2020-12-09 23:47       ` Julien Grall
  2020-12-10  2:30         ` Stefano Stabellini
  0 siblings, 1 reply; 127+ messages in thread
From: Julien Grall @ 2020-12-09 23:47 UTC (permalink / raw)
  To: Stefano Stabellini
  Cc: Oleksandr Tyshchenko, xen-devel, Oleksandr Tyshchenko,
	Volodymyr Babchuk, Julien Grall



On 09/12/2020 23:35, Stefano Stabellini wrote:
> On Wed, 9 Dec 2020, Stefano Stabellini wrote:
>> On Mon, 30 Nov 2020, Oleksandr Tyshchenko wrote:
>>> From: Oleksandr Tyshchenko <oleksandr_tyshchenko@epam.com>
>>>
>>> This patch adds proper handling of return value of
>>> vcpu_ioreq_handle_completion() which involves using a loop
>>> in leave_hypervisor_to_guest().
>>>
>>> The reason to use an unbounded loop here is the fact that a vCPU
>>> shouldn't continue until an I/O has completed. In Xen's case, if an I/O
>>> never completes then it most likely means that something went horribly
>>> wrong with the Device Emulator. And it is most likely not safe to
>>> continue. So letting the vCPU spin forever if the I/O never completes
>>> is a safer action than letting it continue and leaving the guest in
>>> an unclear state, and is the best we can do for now.
>>>
>>> This wouldn't be an issue for Xen as do_softirq() would be called at
>>> every loop. In case of failure, the guest will crash and the vCPU
>>> will be unscheduled.
>>
>> Imagine that we have two guests: one that requires an ioreq server and
>> one that doesn't. If I am not mistaken this loop could potentially spin
>> forever on a pcpu, thus preventing any other guest being scheduled, even
>> if the other guest doesn't need any ioreq servers.
>>
>>
>> My other concern is that we are busy-looping. Could we call something
>> like wfi() or do_idle() instead? The ioreq server event notification of
>> completion should wake us up?
>>
>> Following this line of thinking, I am wondering if instead of the
>> busy-loop we should call vcpu_block_unless_event_pending(current) in
>> try_handle_mmio if IO_RETRY. Then when the emulation is done, QEMU (or
>> equivalent) calls xenevtchn_notify which ends up waking up the domU
>> vcpu. Would that work?
> 
> I read now Julien's reply: we are already doing something similar to
> what I suggested with the following call chain:
> 
> check_for_vcpu_work -> vcpu_ioreq_handle_completion -> wait_for_io -> wait_on_xen_event_channel
> 
> So the busy-loop here is only a safety-belt in case of a spurious
> wake-up, in which case we are going to call check_for_vcpu_work again,
> potentially causing a guest reschedule.
> 
> Then, this is fine and addresses both my concerns. Maybe let's add a note
> in the commit message about it.

Damn, I hit the "sent" button just a second before seeing your reply. :/
Oh well. I suggested the same because I have seen the same question
multiple times.

> 
> 
> I am also wondering if there is any benefit in calling wait_for_io()
> earlier, maybe from try_handle_mmio if IO_RETRY?

wait_for_io() may end up descheduling the vCPU. I would like to avoid
this happening in the middle of the I/O emulation because it needs to
happen without any locks held.

I don't think there are locks involved today, but the deeper in the call
stack the scheduling happens, the more chances we have to screw up in the future.

However...

> leave_hypervisor_to_guest is very late for that.

... I am not sure what the problem is with that. The IOREQ server will be
notified of the pending I/O as soon as try_handle_mmio() puts the I/O in
the shared page.

If the IOREQ server is running on a different pCPU, then it might be
possible that the I/O has completed before we reach
leave_hypervisor_to_guest(). In this case, we would not have to wait for
the I/O.

Cheers,

-- 
Julien Grall


^ permalink raw reply	[flat|nested] 127+ messages in thread

* Re: [PATCH V3 16/23] xen/mm: Handle properly reference in set_foreign_p2m_entry() on Arm
  2020-11-30 10:31 ` [PATCH V3 16/23] xen/mm: Handle properly reference in set_foreign_p2m_entry() on Arm Oleksandr Tyshchenko
  2020-12-08 14:24   ` Jan Beulich
@ 2020-12-09 23:49   ` Stefano Stabellini
  2021-01-15  1:18   ` Stefano Stabellini
  2 siblings, 0 replies; 127+ messages in thread
From: Stefano Stabellini @ 2020-12-09 23:49 UTC (permalink / raw)
  To: Oleksandr Tyshchenko
  Cc: xen-devel, Oleksandr Tyshchenko, Stefano Stabellini,
	Julien Grall, Volodymyr Babchuk, Andrew Cooper, George Dunlap,
	Ian Jackson, Jan Beulich, Wei Liu, Roger Pau Monné,
	Julien Grall

[-- Attachment #1: Type: text/plain, Size: 9299 bytes --]

On Mon, 30 Nov 2020, Oleksandr Tyshchenko wrote:
> From: Oleksandr Tyshchenko <oleksandr_tyshchenko@epam.com>
> 
> This patch implements reference counting of foreign entries in
> set_foreign_p2m_entry() on Arm. This is a mandatory action if
> we want to run an emulator (IOREQ server) in a domain other than dom0,
> as we can't trust it to do the right thing if it is not running
> in dom0. So we need to grab a reference on the page to avoid it
> disappearing.
> 
> It is valid to always pass the "p2m_map_foreign_rw" type to
> guest_physmap_add_entry() since the current and foreign domains
> would always be different. A case when they are equal would be
> rejected by rcu_lock_remote_domain_by_id(). Besides the similar
> comment in the code, put a respective ASSERT() there to catch incorrect
> usage in the future.
> 
> It was tested with the IOREQ feature to confirm that all the pages given
> to this function belong to a domain, so we can use the same approach
> as for XENMAPSPACE_gmfn_foreign handling in xenmem_add_to_physmap_one().
> 
> This involves adding an extra parameter for the foreign domain to
> set_foreign_p2m_entry() and a helper to indicate whether the arch
> supports reference counting of foreign entries, so that the restriction
> for the hardware domain in the common code can be skipped for it.
> 
> Signed-off-by: Oleksandr Tyshchenko <oleksandr_tyshchenko@epam.com>
> CC: Julien Grall <julien.grall@arm.com>

The arm side looks OK to me


> ---
> Please note, this is a split/cleanup/hardening of Julien's PoC:
> "Add support for Guest IO forwarding to a device emulator"
> 
> Changes RFC -> V1:
>    - new patch, was split from:
>      "[RFC PATCH V1 04/12] xen/arm: Introduce arch specific bits for IOREQ/DM features"
>    - rewrite a logic to handle properly reference in set_foreign_p2m_entry()
>      instead of treating foreign entries as p2m_ram_rw
> 
> Changes V1 -> V2:
>    - rebase according to the recent changes to acquire_resource()
>    - update patch description
>    - introduce arch_refcounts_p2m()
>    - add an explanation why p2m_map_foreign_rw is valid
>    - move set_foreign_p2m_entry() to p2m-common.h
>    - add const to new parameter
> 
> Changes V2 -> V3:
>    - update patch description
>    - rename arch_refcounts_p2m() to arch_acquire_resource_check()
>    - move comment to x86’s arch_acquire_resource_check()
>    - return rc in Arm's set_foreign_p2m_entry()
>    - put a respective ASSERT() into Arm's set_foreign_p2m_entry()
> ---
> ---
>  xen/arch/arm/p2m.c           | 24 ++++++++++++++++++++++++
>  xen/arch/x86/mm/p2m.c        |  5 +++--
>  xen/common/memory.c          | 10 +++-------
>  xen/include/asm-arm/p2m.h    | 19 +++++++++----------
>  xen/include/asm-x86/p2m.h    | 16 +++++++++++++---
>  xen/include/xen/p2m-common.h |  4 ++++
>  6 files changed, 56 insertions(+), 22 deletions(-)
> 
> diff --git a/xen/arch/arm/p2m.c b/xen/arch/arm/p2m.c
> index 4eeb867..5b8d494 100644
> --- a/xen/arch/arm/p2m.c
> +++ b/xen/arch/arm/p2m.c
> @@ -1380,6 +1380,30 @@ int guest_physmap_remove_page(struct domain *d, gfn_t gfn, mfn_t mfn,
>      return p2m_remove_mapping(d, gfn, (1 << page_order), mfn);
>  }
>  
> +int set_foreign_p2m_entry(struct domain *d, const struct domain *fd,
> +                          unsigned long gfn, mfn_t mfn)
> +{
> +    struct page_info *page = mfn_to_page(mfn);
> +    int rc;
> +
> +    if ( !get_page(page, fd) )
> +        return -EINVAL;
> +
> +    /*
> +     * It is valid to always use p2m_map_foreign_rw here as if this gets
> +     * called then d != fd. A case when d == fd would be rejected by
> +     * rcu_lock_remote_domain_by_id() earlier. Put a respective ASSERT()
> +     * to catch incorrect usage in future.
> +     */
> +    ASSERT(d != fd);
> +
> +    rc = guest_physmap_add_entry(d, _gfn(gfn), mfn, 0, p2m_map_foreign_rw);
> +    if ( rc )
> +        put_page(page);
> +
> +    return rc;
> +}
> +
>  static struct page_info *p2m_allocate_root(void)
>  {
>      struct page_info *page;
> diff --git a/xen/arch/x86/mm/p2m.c b/xen/arch/x86/mm/p2m.c
> index 7a2ba82..4772c86 100644
> --- a/xen/arch/x86/mm/p2m.c
> +++ b/xen/arch/x86/mm/p2m.c
> @@ -1321,7 +1321,8 @@ static int set_typed_p2m_entry(struct domain *d, unsigned long gfn_l,
>  }
>  
>  /* Set foreign mfn in the given guest's p2m table. */
> -int set_foreign_p2m_entry(struct domain *d, unsigned long gfn, mfn_t mfn)
> +int set_foreign_p2m_entry(struct domain *d, const struct domain *fd,
> +                          unsigned long gfn, mfn_t mfn)
>  {
>      return set_typed_p2m_entry(d, gfn, mfn, PAGE_ORDER_4K, p2m_map_foreign,
>                                 p2m_get_hostp2m(d)->default_access);
> @@ -2621,7 +2622,7 @@ int p2m_add_foreign(struct domain *tdom, unsigned long fgfn,
>       * will update the m2p table which will result in  mfn -> gpfn of dom0
>       * and not fgfn of domU.
>       */
> -    rc = set_foreign_p2m_entry(tdom, gpfn, mfn);
> +    rc = set_foreign_p2m_entry(tdom, fdom, gpfn, mfn);
>      if ( rc )
>          gdprintk(XENLOG_WARNING, "set_foreign_p2m_entry failed. "
>                   "gpfn:%lx mfn:%lx fgfn:%lx td:%d fd:%d\n",
> diff --git a/xen/common/memory.c b/xen/common/memory.c
> index 3363c06..49e3001 100644
> --- a/xen/common/memory.c
> +++ b/xen/common/memory.c
> @@ -1134,12 +1134,8 @@ static int acquire_resource(
>      xen_pfn_t mfn_list[32];
>      int rc;
>  
> -    /*
> -     * FIXME: Until foreign pages inserted into the P2M are properly
> -     *        reference counted, it is unsafe to allow mapping of
> -     *        resource pages unless the caller is the hardware domain.
> -     */
> -    if ( paging_mode_translate(currd) && !is_hardware_domain(currd) )
> +    if ( paging_mode_translate(currd) && !is_hardware_domain(currd) &&
> +         !arch_acquire_resource_check() )
>          return -EACCES;
>  
>      if ( copy_from_guest(&xmar, arg, 1) )
> @@ -1207,7 +1203,7 @@ static int acquire_resource(
>  
>          for ( i = 0; !rc && i < xmar.nr_frames; i++ )
>          {
> -            rc = set_foreign_p2m_entry(currd, gfn_list[i],
> +            rc = set_foreign_p2m_entry(currd, d, gfn_list[i],
>                                         _mfn(mfn_list[i]));
>              /* rc should be -EIO for any iteration other than the first */
>              if ( rc && i )
> diff --git a/xen/include/asm-arm/p2m.h b/xen/include/asm-arm/p2m.h
> index 28ca9a8..4f8056e 100644
> --- a/xen/include/asm-arm/p2m.h
> +++ b/xen/include/asm-arm/p2m.h
> @@ -161,6 +161,15 @@ typedef enum {
>  #endif
>  #include <xen/p2m-common.h>
>  
> +static inline bool arch_acquire_resource_check(void)
> +{
> +    /*
> +     * The reference counting of foreign entries in set_foreign_p2m_entry()
> +     * is supported on Arm.
> +     */
> +    return true;
> +}
> +
>  static inline
>  void p2m_altp2m_check(struct vcpu *v, uint16_t idx)
>  {
> @@ -392,16 +401,6 @@ static inline gfn_t gfn_next_boundary(gfn_t gfn, unsigned int order)
>      return gfn_add(gfn, 1UL << order);
>  }
>  
> -static inline int set_foreign_p2m_entry(struct domain *d, unsigned long gfn,
> -                                        mfn_t mfn)
> -{
> -    /*
> -     * NOTE: If this is implemented then proper reference counting of
> -     *       foreign entries will need to be implemented.
> -     */
> -    return -EOPNOTSUPP;
> -}
> -
>  /*
>   * A vCPU has cache enabled only when the MMU is enabled and data cache
>   * is enabled.
> diff --git a/xen/include/asm-x86/p2m.h b/xen/include/asm-x86/p2m.h
> index 4603560..8d2dc22 100644
> --- a/xen/include/asm-x86/p2m.h
> +++ b/xen/include/asm-x86/p2m.h
> @@ -382,6 +382,19 @@ struct p2m_domain {
>  #endif
>  #include <xen/p2m-common.h>
>  
> +static inline bool arch_acquire_resource_check(void)
> +{
> +    /*
> +     * The reference counting of foreign entries in set_foreign_p2m_entry()
> +     * is not supported on x86.
> +     *
> +     * FIXME: Until foreign pages inserted into the P2M are properly
> +     * reference counted, it is unsafe to allow mapping of
> +     * resource pages unless the caller is the hardware domain.
> +     */
> +    return false;
> +}
> +
>  /*
>   * Updates vCPU's n2pm to match its np2m_base in VMCx12 and returns that np2m.
>   */
> @@ -647,9 +660,6 @@ int p2m_finish_type_change(struct domain *d,
>  int p2m_is_logdirty_range(struct p2m_domain *, unsigned long start,
>                            unsigned long end);
>  
> -/* Set foreign entry in the p2m table (for priv-mapping) */
> -int set_foreign_p2m_entry(struct domain *d, unsigned long gfn, mfn_t mfn);
> -
>  /* Set mmio addresses in the p2m table (for pass-through) */
>  int set_mmio_p2m_entry(struct domain *d, gfn_t gfn, mfn_t mfn,
>                         unsigned int order);
> diff --git a/xen/include/xen/p2m-common.h b/xen/include/xen/p2m-common.h
> index 58031a6..b4bc709 100644
> --- a/xen/include/xen/p2m-common.h
> +++ b/xen/include/xen/p2m-common.h
> @@ -3,6 +3,10 @@
>  
>  #include <xen/mm.h>
>  
> +/* Set foreign entry in the p2m table */
> +int set_foreign_p2m_entry(struct domain *d, const struct domain *fd,
> +                          unsigned long gfn, mfn_t mfn);
> +
>  /* Remove a page from a domain's p2m table */
>  int __must_check
>  guest_physmap_remove_page(struct domain *d, gfn_t gfn, mfn_t mfn,
> -- 
> 2.7.4
> 

^ permalink raw reply	[flat|nested] 127+ messages in thread

* Re: [PATCH V3 18/23] xen/dm: Introduce xendevicemodel_set_irq_level DM op
  2020-11-30 10:31 ` [PATCH V3 18/23] xen/dm: Introduce xendevicemodel_set_irq_level DM op Oleksandr Tyshchenko
@ 2020-12-10  2:21   ` Stefano Stabellini
  2020-12-10 12:58     ` Oleksandr
  2020-12-10 13:38     ` Julien Grall
  0 siblings, 2 replies; 127+ messages in thread
From: Stefano Stabellini @ 2020-12-10  2:21 UTC (permalink / raw)
  To: Oleksandr Tyshchenko
  Cc: xen-devel, Julien Grall, Ian Jackson, Wei Liu, Andrew Cooper,
	George Dunlap, Jan Beulich, Julien Grall, Stefano Stabellini,
	Volodymyr Babchuk, Oleksandr Tyshchenko, alex.bennee

On Mon, 30 Nov 2020, Oleksandr Tyshchenko wrote:
> From: Julien Grall <julien.grall@arm.com>
> 
> This patch adds the ability for the device emulator to notify the other
> end (some entity running in the guest) using an SPI and implements the
> Arm specific bits for it. The proposed interface allows the emulator to
> set the logical level of one of a domain's IRQ lines.
> 
> We can't reuse the existing DM op (xen_dm_op_set_isa_irq_level)
> to inject an interrupt as the "isa_irq" field is only 8-bit and
> able to cover IRQ 0 - 255, whereas we need a wider range (0 - 1020).
> 
> Signed-off-by: Julien Grall <julien.grall@arm.com>
> Signed-off-by: Oleksandr Tyshchenko <oleksandr_tyshchenko@epam.com>
> 
> ---
> Please note, this is a split/cleanup/hardening of Julien's PoC:
> "Add support for Guest IO forwarding to a device emulator"
> 
> ***
> Please note, I left the interface untouched since there is still
> an open discussion about what interface to use / what information to pass
> to the hypervisor. The question is whether we should abstract away
> the state of the line or not.
> ***

Let's start with a simple question: is this going to work with
virtio-mmio emulation in QEMU that doesn't lower the state of the line
to end the notification (only calls qemu_set_irq(irq, high))?

See: hw/virtio/virtio-mmio.c:virtio_mmio_update_irq


Alex (CC'ed) might be able to confirm whether I am reading the QEMU code
correctly. Assuming that it is true that QEMU is only raising the level,
never lowering it, although the emulation is obviously not correct, I
would rather keep QEMU as is for efficiency reasons, and because we
don't want to deviate from the common implementation in QEMU.


Looking at this patch and at vgic_inject_irq, yes, I think it would
work as is.


So it looks like we are going to end up with an interface that:

- in theory it is modelling the line closely
- in practice it is only called to "trigger the IRQ"


Hence my preference for being explicit about it and just calling it
trigger_irq.

If we keep the patch as is, should we at least add a comment to document
the "QEMU style" use model?


> Changes RFC -> V1:
>    - check incoming parameters in arch_dm_op()
>    - add explicit padding to struct xen_dm_op_set_irq_level
> 
> Changes V1 -> V2:
>    - update the author of a patch
>    - update patch description
>    - check that padding is always 0
>    - mention that interface is Arm only and only SPIs are
>      supported for now
>    - allow to set the logical level of a line for non-allocated
>      interrupts only
>    - add xen_dm_op_set_irq_level_t
> 
> Changes V2 -> V3:
>    - no changes
> ---
> ---
>  tools/include/xendevicemodel.h               |  4 ++
>  tools/libs/devicemodel/core.c                | 18 +++++++++
>  tools/libs/devicemodel/libxendevicemodel.map |  1 +
>  xen/arch/arm/dm.c                            | 57 +++++++++++++++++++++++++++-
>  xen/common/dm.c                              |  1 +
>  xen/include/public/hvm/dm_op.h               | 16 ++++++++
>  6 files changed, 96 insertions(+), 1 deletion(-)
> 
> diff --git a/tools/include/xendevicemodel.h b/tools/include/xendevicemodel.h
> index e877f5c..c06b3c8 100644
> --- a/tools/include/xendevicemodel.h
> +++ b/tools/include/xendevicemodel.h
> @@ -209,6 +209,10 @@ int xendevicemodel_set_isa_irq_level(
>      xendevicemodel_handle *dmod, domid_t domid, uint8_t irq,
>      unsigned int level);
>  
> +int xendevicemodel_set_irq_level(
> +    xendevicemodel_handle *dmod, domid_t domid, unsigned int irq,
> +    unsigned int level);
> +
>  /**
>   * This function maps a PCI INTx line to a an IRQ line.
>   *
> diff --git a/tools/libs/devicemodel/core.c b/tools/libs/devicemodel/core.c
> index 4d40639..30bd79f 100644
> --- a/tools/libs/devicemodel/core.c
> +++ b/tools/libs/devicemodel/core.c
> @@ -430,6 +430,24 @@ int xendevicemodel_set_isa_irq_level(
>      return xendevicemodel_op(dmod, domid, 1, &op, sizeof(op));
>  }
>  
> +int xendevicemodel_set_irq_level(
> +    xendevicemodel_handle *dmod, domid_t domid, uint32_t irq,
> +    unsigned int level)
> +{
> +    struct xen_dm_op op;
> +    struct xen_dm_op_set_irq_level *data;
> +
> +    memset(&op, 0, sizeof(op));
> +
> +    op.op = XEN_DMOP_set_irq_level;
> +    data = &op.u.set_irq_level;
> +
> +    data->irq = irq;
> +    data->level = level;
> +
> +    return xendevicemodel_op(dmod, domid, 1, &op, sizeof(op));
> +}
> +
>  int xendevicemodel_set_pci_link_route(
>      xendevicemodel_handle *dmod, domid_t domid, uint8_t link, uint8_t irq)
>  {
> diff --git a/tools/libs/devicemodel/libxendevicemodel.map b/tools/libs/devicemodel/libxendevicemodel.map
> index 561c62d..a0c3012 100644
> --- a/tools/libs/devicemodel/libxendevicemodel.map
> +++ b/tools/libs/devicemodel/libxendevicemodel.map
> @@ -32,6 +32,7 @@ VERS_1.2 {
>  	global:
>  		xendevicemodel_relocate_memory;
>  		xendevicemodel_pin_memory_cacheattr;
> +		xendevicemodel_set_irq_level;
>  } VERS_1.1;
>  
>  VERS_1.3 {
> diff --git a/xen/arch/arm/dm.c b/xen/arch/arm/dm.c
> index 5d3da37..e4bb233 100644
> --- a/xen/arch/arm/dm.c
> +++ b/xen/arch/arm/dm.c
> @@ -17,10 +17,65 @@
>  #include <xen/dm.h>
>  #include <xen/hypercall.h>
>  
> +#include <asm/vgic.h>
> +
>  int arch_dm_op(struct xen_dm_op *op, struct domain *d,
>                 const struct dmop_args *op_args, bool *const_op)
>  {
> -    return -EOPNOTSUPP;
> +    int rc;
> +
> +    switch ( op->op )
> +    {
> +    case XEN_DMOP_set_irq_level:
> +    {
> +        const struct xen_dm_op_set_irq_level *data =
> +            &op->u.set_irq_level;
> +        unsigned int i;
> +
> +        /* Only SPIs are supported */
> +        if ( (data->irq < NR_LOCAL_IRQS) || (data->irq >= vgic_num_irqs(d)) )
> +        {
> +            rc = -EINVAL;
> +            break;
> +        }
> +
> +        if ( data->level != 0 && data->level != 1 )
> +        {
> +            rc = -EINVAL;
> +            break;
> +        }
> +
> +        /* Check that padding is always 0 */
> +        for ( i = 0; i < sizeof(data->pad); i++ )
> +        {
> +            if ( data->pad[i] )
> +            {
> +                rc = -EINVAL;
> +                break;
> +            }
> +        }
> +
> +        /*
> +         * Allow to set the logical level of a line for non-allocated
> +         * interrupts only.
> +         */
> +        if ( test_bit(data->irq, d->arch.vgic.allocated_irqs) )
> +        {
> +            rc = -EINVAL;
> +            break;
> +        }
> +
> +        vgic_inject_irq(d, NULL, data->irq, data->level);
> +        rc = 0;
> +        break;
> +    }
> +
> +    default:
> +        rc = -EOPNOTSUPP;
> +        break;
> +    }
> +
> +    return rc;
>  }
>  
>  /*
> diff --git a/xen/common/dm.c b/xen/common/dm.c
> index 9d394fc..7bfb46c 100644
> --- a/xen/common/dm.c
> +++ b/xen/common/dm.c
> @@ -48,6 +48,7 @@ static int dm_op(const struct dmop_args *op_args)
>          [XEN_DMOP_remote_shutdown]                  = sizeof(struct xen_dm_op_remote_shutdown),
>          [XEN_DMOP_relocate_memory]                  = sizeof(struct xen_dm_op_relocate_memory),
>          [XEN_DMOP_pin_memory_cacheattr]             = sizeof(struct xen_dm_op_pin_memory_cacheattr),
> +        [XEN_DMOP_set_irq_level]                    = sizeof(struct xen_dm_op_set_irq_level),
>      };
>  
>      rc = rcu_lock_remote_domain_by_id(op_args->domid, &d);
> diff --git a/xen/include/public/hvm/dm_op.h b/xen/include/public/hvm/dm_op.h
> index 66cae1a..1f70d58 100644
> --- a/xen/include/public/hvm/dm_op.h
> +++ b/xen/include/public/hvm/dm_op.h
> @@ -434,6 +434,21 @@ struct xen_dm_op_pin_memory_cacheattr {
>  };
>  typedef struct xen_dm_op_pin_memory_cacheattr xen_dm_op_pin_memory_cacheattr_t;
>  
> +/*
> + * XEN_DMOP_set_irq_level: Set the logical level of one of a domain's
> + *                         IRQ lines (currently Arm only).
> + * Only SPIs are supported.
> + */
> +#define XEN_DMOP_set_irq_level 19
> +
> +struct xen_dm_op_set_irq_level {
> +    uint32_t irq;
> +    /* IN - Level: 0 -> deasserted, 1 -> asserted */
> +    uint8_t level;
> +    uint8_t pad[3];
> +};
> +typedef struct xen_dm_op_set_irq_level xen_dm_op_set_irq_level_t;
> +
>  struct xen_dm_op {
>      uint32_t op;
>      uint32_t pad;
> @@ -447,6 +462,7 @@ struct xen_dm_op {
>          xen_dm_op_track_dirty_vram_t track_dirty_vram;
>          xen_dm_op_set_pci_intx_level_t set_pci_intx_level;
>          xen_dm_op_set_isa_irq_level_t set_isa_irq_level;
> +        xen_dm_op_set_irq_level_t set_irq_level;
>          xen_dm_op_set_pci_link_route_t set_pci_link_route;
>          xen_dm_op_modified_memory_t modified_memory;
>          xen_dm_op_set_mem_type_t set_mem_type;
> -- 
> 2.7.4
> 


^ permalink raw reply	[flat|nested] 127+ messages in thread

* Re: [PATCH V3 15/23] xen/arm: Stick around in leave_hypervisor_to_guest until I/O has completed
  2020-12-09 23:47       ` Julien Grall
@ 2020-12-10  2:30         ` Stefano Stabellini
  2020-12-10 13:17           ` Julien Grall
  2020-12-10 13:21           ` Oleksandr
  0 siblings, 2 replies; 127+ messages in thread
From: Stefano Stabellini @ 2020-12-10  2:30 UTC (permalink / raw)
  To: Julien Grall
  Cc: Stefano Stabellini, Oleksandr Tyshchenko, xen-devel,
	Oleksandr Tyshchenko, Volodymyr Babchuk, Julien Grall

On Wed, 9 Dec 2020, Julien Grall wrote:
> On 09/12/2020 23:35, Stefano Stabellini wrote:
> > On Wed, 9 Dec 2020, Stefano Stabellini wrote:
> > > On Mon, 30 Nov 2020, Oleksandr Tyshchenko wrote:
> > > > From: Oleksandr Tyshchenko <oleksandr_tyshchenko@epam.com>
> > > > 
> > > > This patch adds proper handling of return value of
> > > > vcpu_ioreq_handle_completion() which involves using a loop
> > > > in leave_hypervisor_to_guest().
> > > > 
> > > > The reason to use an unbounded loop here is the fact that a vCPU
> > > > shouldn't continue until an I/O has completed. In Xen's case, if an I/O
> > > > never completes then it most likely means that something went horribly
> > > > wrong with the Device Emulator. And it is most likely not safe to
> > > > continue. So letting the vCPU spin forever if the I/O never completes
> > > > is a safer action than letting it continue and leaving the guest in
> > > > an unclear state, and is the best we can do for now.
> > > > 
> > > > This wouldn't be an issue for Xen as do_softirq() would be called at
> > > > every loop. In case of failure, the guest will crash and the vCPU
> > > > will be unscheduled.
> > > 
> > > Imagine that we have two guests: one that requires an ioreq server and
> > > one that doesn't. If I am not mistaken this loop could potentially spin
> > > forever on a pcpu, thus preventing any other guest being scheduled, even
> > > if the other guest doesn't need any ioreq servers.
> > > 
> > > 
> > > My other concern is that we are busy-looping. Could we call something
> > > like wfi() or do_idle() instead? The ioreq server event notification of
> > > completion should wake us up?
> > > 
> > > Following this line of thinking, I am wondering if instead of the
> > > busy-loop we should call vcpu_block_unless_event_pending(current) in
> > > try_handle_mmio if IO_RETRY. Then when the emulation is done, QEMU (or
> > > equivalent) calls xenevtchn_notify which ends up waking up the domU
> > > vcpu. Would that work?
> > 
> > I read now Julien's reply: we are already doing something similar to
> > what I suggested with the following call chain:
> > 
> > check_for_vcpu_work -> vcpu_ioreq_handle_completion -> wait_for_io ->
> > wait_on_xen_event_channel
> > 
> > So the busy-loop here is only a safety-belt in case of a spurious
> > wake-up, in which case we are going to call check_for_vcpu_work again,
> > potentially causing a guest reschedule.
> > 
> > Then, this is fine and addresses both my concerns. Maybe let's add a note
> > in the commit message about it.
> 
> Damn, I hit the "sent" button just a second before seeing your reply. :/ Oh
> well. I suggested the same because I have seen the same question multiple
> times.

:-)

 
> > I am also wondering if there is any benefit in calling wait_for_io()
> > earlier, maybe from try_handle_mmio if IO_RETRY?
> 
> wait_for_io() may end up descheduling the vCPU. I would like to avoid this
> happening in the middle of the I/O emulation because it needs to happen
> without any locks held.
> 
> I don't think there are locks involved today, but the deeper in the call stack
> the scheduling happens, the more chances we have to screw up in the future.
> 
> However...
> 
> > leave_hypervisor_to_guest is very late for that.
> 
> ... I am not sure what the problem is with that. The IOREQ server will be
> notified of the pending I/O as soon as try_handle_mmio() puts the I/O in
> the shared page.
> 
> If the IOREQ server is running on a different pCPU, then it might be possible
> that the I/O has completed before we reach leave_hypervisor_to_guest(). In
> this case, we would not have to wait for the I/O.

Yeah, I was thinking about that too. Actually it could be faster
this way if we end up being lucky.

The reason for moving it earlier would be that by the time
leave_hypervisor_to_guest is called "Xen" has already decided to
continue running this particular vcpu. If we called wait_for_io()
earlier, we would give important information to the scheduler before any
decision is made. This is more "philosophical" than practical though.
Let's leave it as is.
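
(For reference, the loop being discussed looks roughly like this -- a
simplified sketch based on the commit description and the helpers named in
this thread, not the exact patch:)

void leave_hypervisor_to_guest(void)
{
    local_irq_disable();

    /*
     * check_for_vcpu_work() returns true for as long as the vCPU cannot
     * safely resume (e.g. an outstanding I/O hasn't completed). Handling
     * softirqs on every iteration is what allows wait_for_io() to
     * deschedule the vCPU, or the domain to be crashed on failure.
     */
    do
    {
        check_for_pcpu_work();
    } while ( check_for_vcpu_work() );

    /* ... rest of the existing exit path ... */
}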


^ permalink raw reply	[flat|nested] 127+ messages in thread

* Re: [PATCH V3 13/23] xen/ioreq: Use guest_cmpxchg64() instead of cmpxchg()
  2020-12-09 22:34     ` Oleksandr
@ 2020-12-10  2:30       ` Stefano Stabellini
  0 siblings, 0 replies; 127+ messages in thread
From: Stefano Stabellini @ 2020-12-10  2:30 UTC (permalink / raw)
  To: Oleksandr
  Cc: Stefano Stabellini, xen-devel, Oleksandr Tyshchenko,
	Julien Grall, Volodymyr Babchuk, Paul Durrant, Julien Grall

On Thu, 10 Dec 2020, Oleksandr wrote:
> > On Mon, 30 Nov 2020, Oleksandr Tyshchenko wrote:
> > > From: Oleksandr Tyshchenko <oleksandr_tyshchenko@epam.com>
> > > 
> > > The cmpxchg() in ioreq_send_buffered() operates on memory shared
> > > with the emulator domain (and the target domain if the legacy
> > > interface is used).
> > > 
> > > In order to be on the safe side we need to switch
> > > to guest_cmpxchg64() to prevent a domain from DoSing Xen on Arm.
> > > 
> > > As there is no plan to support the legacy interface on Arm,
> > > the page will only be mapped in a single domain at a time,
> > > so we can use s->emulator in guest_cmpxchg64() safely.
> > > 
> > > Thankfully the only user of the legacy interface is x86 so far
> > > and there is no concern regarding the atomic operations.
> > > 
> > > Please note that the legacy interface *must* not be used on Arm
> > > without revisiting the code.
> > > 
> > > Signed-off-by: Oleksandr Tyshchenko <oleksandr_tyshchenko@epam.com>
> > > CC: Julien Grall <julien.grall@arm.com>
> > > 
> > > ---
> > > Please note, this is a split/cleanup/hardening of Julien's PoC:
> > > "Add support for Guest IO forwarding to a device emulator"
> > > 
> > > Changes RFC -> V1:
> > >     - new patch
> > > 
> > > Changes V1 -> V2:
> > >     - move earlier to avoid breaking arm32 compilation
> > >     - add an explanation to commit description and hvm_allow_set_param()
> > >     - pass s->emulator
> > > 
> > > Changes V2 -> V3:
> > >     - update patch description
> > > ---
> > > ---
> > >   xen/arch/arm/hvm.c | 4 ++++
> > >   xen/common/ioreq.c | 3 ++-
> > >   2 files changed, 6 insertions(+), 1 deletion(-)
> > > 
> > > diff --git a/xen/arch/arm/hvm.c b/xen/arch/arm/hvm.c
> > > index 8951b34..9694e5a 100644
> > > --- a/xen/arch/arm/hvm.c
> > > +++ b/xen/arch/arm/hvm.c
> > > @@ -31,6 +31,10 @@
> > >     #include <asm/hypercall.h>
> > >   +/*
> > > + * The legacy interface (which involves magic IOREQ pages) *must* not be
> > > + * used without revisiting the code.
> > > + */
> > This is a NIT, but I'd prefer if you moved the comment a few lines
> > below, maybe just before the existing comment starting with "The
> > following parameters".
> > 
> > The reason is that as it is now it is not clear which set_params
> > interfaces should not be used without revisiting the code.
> OK, but maybe this comment should be dropped altogether? It was relevant when
> the legacy interface was part of the common code (V2). Now the legacy
> interface is x86 specific, so I am not sure this comment should be here.

Yeah, fine by me.
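
(For context, the change under review boils down to roughly this hunk in
ioreq_send_buffered() -- a sketch from memory of the series, the surrounding
context may differ:

-        cmpxchg(&pg->ptrs.full, old.full, new.full);
+        guest_cmpxchg64(s->emulator, &pg->ptrs.full, old.full, new.full);

i.e. the update of the buffered ioreq ring pointers, which live in a page
shared with the emulator, goes through the guest-safe helper bound to
s->emulator.)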

 
> > 
> > With that:
> > 
> > Acked-by: Stefano Stabellini <sstabellini@kernel.org>
> 
> Thank you



^ permalink raw reply	[flat|nested] 127+ messages in thread

* Re: [PATCH V3 14/23] arm/ioreq: Introduce arch specific bits for IOREQ/DM features
  2020-12-09 22:49     ` Oleksandr
@ 2020-12-10  2:30       ` Stefano Stabellini
  0 siblings, 0 replies; 127+ messages in thread
From: Stefano Stabellini @ 2020-12-10  2:30 UTC (permalink / raw)
  To: Oleksandr
  Cc: Stefano Stabellini, xen-devel, Julien Grall, Julien Grall,
	Volodymyr Babchuk, Oleksandr Tyshchenko, jbeulich, xadimgnik

On Thu, 10 Dec 2020, Oleksandr wrote:
> > > +#ifdef CONFIG_IOREQ_SERVER
> > > +enum io_state handle_ioserv(struct cpu_user_regs *regs, struct vcpu *v);
> > > +enum io_state try_fwd_ioserv(struct cpu_user_regs *regs,
> > > +                             struct vcpu *v, mmio_info_t *info);
> > > +#else
> > > +static inline enum io_state handle_ioserv(struct cpu_user_regs *regs,
> > > +                                          struct vcpu *v)
> > > +{
> > > +    return IO_UNHANDLED;
> > > +}
> > > +
> > > +static inline enum io_state try_fwd_ioserv(struct cpu_user_regs *regs,
> > > +                                           struct vcpu *v, mmio_info_t
> > > *info)
> > > +{
> > > +    return IO_UNHANDLED;
> > > +}
> > > +#endif
> > If we are providing stub functions, then we can also provide stub
> > functions for:
> > 
> > ioreq_domain_init
> > ioreq_server_destroy_all
> > 
> > and avoid the ifdefs.
> I got your point. These are common IOREQ interface functions whose
> declarations live in the common header; should I provide the
> stubs in the common ioreq.h?
 
I'd prefer that, but if Jan and Paul don't want to have them I won't insist.
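
(Roughly, the suggestion amounts to something like this in the common header
-- just a sketch, assuming the stubs sit next to the declarations in
xen/ioreq.h:)

#ifdef CONFIG_IOREQ_SERVER
void ioreq_domain_init(struct domain *d);
void ioreq_server_destroy_all(struct domain *d);
#else
static inline void ioreq_domain_init(struct domain *d)
{
}

static inline void ioreq_server_destroy_all(struct domain *d)
{
}
#endif /* CONFIG_IOREQ_SERVER */

so the Arm callers can stay free of #ifdef-ery.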
 
 
> > > +bool ioreq_complete_mmio(void);
> > > +
> > > +static inline bool handle_pio(uint16_t port, unsigned int size, int dir)
> > > +{
> > > +    /*
> > > +     * TODO: For Arm64, the main user will be PCI. So this should be
> > > +     * implemented when we add support for vPCI.
> > > +     */
> > > +    ASSERT_UNREACHABLE();
> > > +    return true;
> > > +}
> > > +
> > > +static inline void msix_write_completion(struct vcpu *v)
> > > +{
> > > +}
> > > +
> > > +static inline bool arch_vcpu_ioreq_completion(enum io_completion
> > > io_completion)
> > > +{
> > > +    ASSERT_UNREACHABLE();
> > > +    return true;
> > > +}
> > > +
> > > +/*
> > > + * The "legacy" mechanism of mapping magic pages for the IOREQ servers
> > > + * is x86 specific, so the following hooks don't need to be implemented
> > > on Arm:
> > > + * - arch_ioreq_server_map_pages
> > > + * - arch_ioreq_server_unmap_pages
> > > + * - arch_ioreq_server_enable
> > > + * - arch_ioreq_server_disable
> > > + */
> > > +static inline int arch_ioreq_server_map_pages(struct ioreq_server *s)
> > > +{
> > > +    return -EOPNOTSUPP;
> > > +}
> > > +
> > > +static inline void arch_ioreq_server_unmap_pages(struct ioreq_server *s)
> > > +{
> > > +}
> > > +
> > > +static inline void arch_ioreq_server_enable(struct ioreq_server *s)
> > > +{
> > > +}
> > > +
> > > +static inline void arch_ioreq_server_disable(struct ioreq_server *s)
> > > +{
> > > +}
> > > +
> > > +static inline void arch_ioreq_server_destroy(struct ioreq_server *s)
> > > +{
> > > +}
> > > +
> > > +static inline int arch_ioreq_server_map_mem_type(struct domain *d,
> > > +                                                 struct ioreq_server *s,
> > > +                                                 uint32_t flags)
> > > +{
> > > +    return -EOPNOTSUPP;
> > > +}
> > > +
> > > +static inline bool arch_ioreq_server_destroy_all(struct domain *d)
> > > +{
> > > +    return true;
> > > +}
> > > +
> > > +static inline int arch_ioreq_server_get_type_addr(const struct domain *d,
> > > +                                                  const ioreq_t *p,
> > > +                                                  uint8_t *type,
> > > +                                                  uint64_t *addr)
> > > +{
> > > +    if ( p->type != IOREQ_TYPE_COPY && p->type != IOREQ_TYPE_PIO )
> > > +        return -EINVAL;
> > > +
> > > +    *type = (p->type == IOREQ_TYPE_PIO) ?
> > > +             XEN_DMOP_IO_RANGE_PORT : XEN_DMOP_IO_RANGE_MEMORY;
> > > +    *addr = p->addr;
> > This function is not used in this patch and PIOs are left unimplemented
> > according to a few comments, so I am puzzled by this code here. Do we
> > need it?
> Yes. It is called from ioreq_server_select() (common/ioreq.c). I could just
> skip the PIO case and use *type = XEN_DMOP_IO_RANGE_MEMORY, but I didn't
> want to diverge.
 
I see. OK then.
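
(For reference, the common caller consumes the returned type/addr roughly as
follows -- a heavily simplified sketch of ioreq_server_select(); the real code
also checks that the server is enabled and handles PCI config accesses:)

    uint8_t type;
    uint64_t addr;
    struct ioreq_server *s;
    unsigned int id;

    if ( arch_ioreq_server_get_type_addr(d, p, &type, &addr) )
        return NULL;

    FOR_EACH_IOREQ_SERVER(d, id, s)
    {
        /* 'type' selects which registered rangeset the address is matched against. */
        if ( rangeset_contains_range(s->range[type], addr, addr + p->size - 1) )
            return s;
    }

    return NULL;

So keeping the PIO branch means port I/O would be routed the same way once
PIO support is added on Arm.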


^ permalink raw reply	[flat|nested] 127+ messages in thread

* Re: [PATCH V3 21/23] xen/arm: Add mapcache invalidation handling
  2020-11-30 10:31 ` [PATCH V3 21/23] xen/arm: Add mapcache invalidation handling Oleksandr Tyshchenko
@ 2020-12-10  2:30   ` Stefano Stabellini
  2020-12-10 18:50     ` Julien Grall
  0 siblings, 1 reply; 127+ messages in thread
From: Stefano Stabellini @ 2020-12-10  2:30 UTC (permalink / raw)
  To: Oleksandr Tyshchenko
  Cc: xen-devel, Oleksandr Tyshchenko, Stefano Stabellini,
	Julien Grall, Volodymyr Babchuk, Julien Grall

On Mon, 30 Nov 2020, Oleksandr Tyshchenko wrote:
> From: Oleksandr Tyshchenko <oleksandr_tyshchenko@epam.com>
> 
> We need to send a mapcache invalidation request to qemu/demu every time
> a page gets removed from a guest.
> 
> At the moment, the Arm code doesn't explicitly remove the existing
> mapping before inserting the new mapping. Instead, this is done
> implicitly by __p2m_set_entry().
> 
> So we need to recognize the case when the old entry is a RAM page *and*
> the new MFN is different in order to set the corresponding flag.
> The most suitable place to do this is p2m_free_entry(), as there
> we can find the correct leaf type. The invalidation request
> will be sent in do_trap_hypercall() later on.

Why is it sent in do_trap_hypercall() ?


> Signed-off-by: Oleksandr Tyshchenko <oleksandr_tyshchenko@epam.com>
> CC: Julien Grall <julien.grall@arm.com>
> 
> ---
> Please note, this is a split/cleanup/hardening of Julien's PoC:
> "Add support for Guest IO forwarding to a device emulator"
> 
> Changes V1 -> V2:
>    - new patch, some changes were derived from (+ new explanation):
>      xen/ioreq: Make x86's invalidate qemu mapcache handling common
>    - put setting of the flag into __p2m_set_entry()
>    - clarify the conditions when the flag should be set
>    - use domain_has_ioreq_server()
>    - update do_trap_hypercall() by adding local variable
> 
> Changes V2 -> V3:
>    - update patch description
>    - move check to p2m_free_entry()
>    - add a comment
>    - use "curr" instead of "v" in do_trap_hypercall()
> ---
> ---
>  xen/arch/arm/p2m.c   | 24 ++++++++++++++++--------
>  xen/arch/arm/traps.c | 13 ++++++++++---
>  2 files changed, 26 insertions(+), 11 deletions(-)
> 
> diff --git a/xen/arch/arm/p2m.c b/xen/arch/arm/p2m.c
> index 5b8d494..9674f6f 100644
> --- a/xen/arch/arm/p2m.c
> +++ b/xen/arch/arm/p2m.c
> @@ -1,6 +1,7 @@
>  #include <xen/cpu.h>
>  #include <xen/domain_page.h>
>  #include <xen/iocap.h>
> +#include <xen/ioreq.h>
>  #include <xen/lib.h>
>  #include <xen/sched.h>
>  #include <xen/softirq.h>
> @@ -749,17 +750,24 @@ static void p2m_free_entry(struct p2m_domain *p2m,
>      if ( !p2m_is_valid(entry) )
>          return;
>  
> -    /* Nothing to do but updating the stats if the entry is a super-page. */
> -    if ( p2m_is_superpage(entry, level) )
> +    if ( p2m_is_superpage(entry, level) || (level == 3) )
>      {
> -        p2m->stats.mappings[level]--;
> -        return;
> -    }
> +#ifdef CONFIG_IOREQ_SERVER
> +        /*
> +         * If this gets called (non-recursively) then either the entry
> +         * was replaced by an entry with a different base (valid case) or
> +         * the shattering of a superpage was failed (error case).
> +         * So, at worst, the spurious mapcache invalidation might be sent.
> +         */
> +        if ( domain_has_ioreq_server(p2m->domain) &&
> +             (p2m->domain == current->domain) && p2m_is_ram(entry.p2m.type) )
> +            p2m->domain->mapcache_invalidate = true;

Why the (p2m->domain == current->domain) check? Shouldn't we set
mapcache_invalidate to true anyway? What happens if p2m->domain !=
current->domain? We wouldn't want the domain to lose the
mapcache_invalidate notification.


> +#endif
>  
> -    if ( level == 3 )
> -    {
>          p2m->stats.mappings[level]--;
> -        p2m_put_l3_page(entry);
> +        /* Nothing to do if the entry is a super-page. */
> +        if ( level == 3 )
> +            p2m_put_l3_page(entry);
>          return;
>      }
>  
> diff --git a/xen/arch/arm/traps.c b/xen/arch/arm/traps.c
> index b6077d2..151c626 100644
> --- a/xen/arch/arm/traps.c
> +++ b/xen/arch/arm/traps.c
> @@ -1443,6 +1443,7 @@ static void do_trap_hypercall(struct cpu_user_regs *regs, register_t *nr,
>                                const union hsr hsr)
>  {
>      arm_hypercall_fn_t call = NULL;
> +    struct vcpu *curr = current;

Is this just to save 3 characters?


>      BUILD_BUG_ON(NR_hypercalls < ARRAY_SIZE(arm_hypercall_table) );
>  
> @@ -1459,7 +1460,7 @@ static void do_trap_hypercall(struct cpu_user_regs *regs, register_t *nr,
>          return;
>      }
>  
> -    current->hcall_preempted = false;
> +    curr->hcall_preempted = false;
>  
>      perfc_incra(hypercalls, *nr);
>      call = arm_hypercall_table[*nr].fn;
> @@ -1472,7 +1473,7 @@ static void do_trap_hypercall(struct cpu_user_regs *regs, register_t *nr,
>      HYPERCALL_RESULT_REG(regs) = call(HYPERCALL_ARGS(regs));
>  
>  #ifndef NDEBUG
> -    if ( !current->hcall_preempted )
> +    if ( !curr->hcall_preempted )
>      {
>          /* Deliberately corrupt parameter regs used by this hypercall. */
>          switch ( arm_hypercall_table[*nr].nr_args ) {
> @@ -1489,8 +1490,14 @@ static void do_trap_hypercall(struct cpu_user_regs *regs, register_t *nr,
>  #endif
>  
>      /* Ensure the hypercall trap instruction is re-executed. */
> -    if ( current->hcall_preempted )
> +    if ( curr->hcall_preempted )
>          regs->pc -= 4;  /* re-execute 'hvc #XEN_HYPERCALL_TAG' */
> +
> +#ifdef CONFIG_IOREQ_SERVER
> +    if ( unlikely(curr->domain->mapcache_invalidate) &&
> +         test_and_clear_bool(curr->domain->mapcache_invalidate) )
> +        ioreq_signal_mapcache_invalidate();

Why not just:

if ( unlikely(test_and_clear_bool(curr->domain->mapcache_invalidate)) )
    ioreq_signal_mapcache_invalidate();


^ permalink raw reply	[flat|nested] 127+ messages in thread

* RE: [PATCH V3 17/23] xen/ioreq: Introduce domain_has_ioreq_server()
  2020-12-09 20:36               ` Oleksandr
@ 2020-12-10  8:38                 ` Paul Durrant
  2020-12-10 16:57                   ` Oleksandr
  0 siblings, 1 reply; 127+ messages in thread
From: Paul Durrant @ 2020-12-10  8:38 UTC (permalink / raw)
  To: 'Oleksandr'
  Cc: 'Jan Beulich', 'Oleksandr Tyshchenko',
	'Stefano Stabellini', 'Julien Grall',
	'Volodymyr Babchuk', 'Andrew Cooper',
	'George Dunlap', 'Ian Jackson', 'Wei Liu',
	'Julien Grall',
	xen-devel

> -----Original Message-----
> From: Oleksandr <olekstysh@gmail.com>
> Sent: 09 December 2020 20:36
> To: paul@xen.org
> Cc: 'Jan Beulich' <jbeulich@suse.com>; 'Oleksandr Tyshchenko' <oleksandr_tyshchenko@epam.com>;
> 'Stefano Stabellini' <sstabellini@kernel.org>; 'Julien Grall' <julien@xen.org>; 'Volodymyr Babchuk'
> <Volodymyr_Babchuk@epam.com>; 'Andrew Cooper' <andrew.cooper3@citrix.com>; 'George Dunlap'
> <george.dunlap@citrix.com>; 'Ian Jackson' <iwj@xenproject.org>; 'Wei Liu' <wl@xen.org>; 'Julien Grall'
> <julien.grall@arm.com>; xen-devel@lists.xenproject.org
> Subject: Re: [PATCH V3 17/23] xen/ioreq: Introduce domain_has_ioreq_server()
> 
> 
> Hi Paul.
> 
> 
> >>>>>> On 30.11.2020 11:31, Oleksandr Tyshchenko wrote:
> >>>>>>> --- a/xen/include/xen/ioreq.h
> >>>>>>> +++ b/xen/include/xen/ioreq.h
> >>>>>>> @@ -55,6 +55,20 @@ struct ioreq_server {
> >>>>>>>         uint8_t                bufioreq_handling;
> >>>>>>>     };
> >>>>>>>     +/*
> >>>>>>> + * This should only be used when d == current->domain and it's not
> >>>>>>> paused,
> >>>>>> Is the "not paused" part really relevant here? Besides it being rare
> >>>>>> that the current domain would be paused (if so, it's in the process
> >>>>>> of having all its vCPU-s scheduled out), does this matter at all?
> >>>>> No, it isn't relevant, I will drop it.
> >>>>>
> >>>>>
> >>>>>> Apart from this the patch looks okay to me, but I'm not sure it
> >>>>>> addresses Paul's concerns. Iirc he had suggested to switch back to
> >>>>>> a list if doing a swipe over the entire array is too expensive in
> >>>>>> this specific case.
> >>>>> We would like to avoid to do any extra actions in
> >>>>> leave_hypervisor_to_guest() if possible.
> >>>>> But not only there, the logic whether we check/set
> >>>>> mapcache_invalidation variable could be avoided if a domain doesn't
> >>>>> use IOREQ server...
> >>>> Are you OK with this patch (common part of it)?
> >>>>> How much of a performance benefit is this? The array is small so simply counting the non-NULL
> >>>> entries should be pretty quick.
> >> I didn't perform performance measurements on how much this call consumes.
> >> In our system we run three domains. The emulator is in DomD only, so I
> >> would like to avoid calling vcpu_ioreq_handle_completion() for every
> >> Dom0/DomU vCPU if there is no real need to do it.
> > This is not relevant to the domain that the emulator is running in; it's concerning the domains
> which the emulator is servicing. How many of those are there?
> Err, yes, I wasn't precise when providing an example.
> Single emulator is running in DomD and servicing DomU. So with the
> helper in place the vcpu_ioreq_handle_completion() gets only called for
> DomU vCPUs (as expected).
> Without an optimization the vcpu_ioreq_handle_completion() gets called
> for _all_ vCPUs, and I see it as an extra action for Dom0, DomD vCPUs.
> 
> 
> >
> >> On Arm vcpu_ioreq_handle_completion()
> >> is called with IRQ enabled, so the call is accompanied with
> >> corresponding irq_enable/irq_disable.
> >> These unneeded actions could be avoided by using this simple one-line
> >> helper...
> >>
> > The helper may be one line but there is more to the patch than that. I still think you could just
> walk the array in the helper rather than keeping a running occupancy count.
> 
> OK, is the implementation below close to what you propose? If yes, I
> will update a helper and drop nr_servers variable.
> 
> bool domain_has_ioreq_server(const struct domain *d)
> {
>      const struct ioreq_server *s;
>      unsigned int id;
> 
>      FOR_EACH_IOREQ_SERVER(d, id, s)
>          return true;
> 
>      return false;
> }

Yes, that's what I had in mind.

  Paul

> 
> --
> Regards,
> 
> Oleksandr Tyshchenko




^ permalink raw reply	[flat|nested] 127+ messages in thread

* Re: [PATCH V3 18/23] xen/dm: Introduce xendevicemodel_set_irq_level DM op
  2020-12-10  2:21   ` Stefano Stabellini
@ 2020-12-10 12:58     ` Oleksandr
  2020-12-10 13:38     ` Julien Grall
  1 sibling, 0 replies; 127+ messages in thread
From: Oleksandr @ 2020-12-10 12:58 UTC (permalink / raw)
  To: Stefano Stabellini
  Cc: xen-devel, Julien Grall, Ian Jackson, Wei Liu, Andrew Cooper,
	George Dunlap, Jan Beulich, Julien Grall, Volodymyr Babchuk,
	Oleksandr Tyshchenko, alex.bennee


On 10.12.20 04:21, Stefano Stabellini wrote:

Hi Stefano

> On Mon, 30 Nov 2020, Oleksandr Tyshchenko wrote:
>> From: Julien Grall <julien.grall@arm.com>
>>
>> This patch adds the ability for the device emulator to notify the other end
>> (some entity running in the guest) using an SPI and implements the Arm
>> specific bits for it. The proposed interface allows the emulator to set
>> the logical level of one of a domain's IRQ lines.
>>
>> We can't reuse the existing DM op (xen_dm_op_set_isa_irq_level)
>> to inject an interrupt as the "isa_irq" field is only 8-bit and
>> able to cover IRQ 0 - 255, whereas we need a wider range (0 - 1020).
>>
>> Signed-off-by: Julien Grall <julien.grall@arm.com>
>> Signed-off-by: Oleksandr Tyshchenko <oleksandr_tyshchenko@epam.com>
>>
>> ---
>> Please note, this is a split/cleanup/hardening of Julien's PoC:
>> "Add support for Guest IO forwarding to a device emulator"
>>
>> ***
>> Please note, I left the interface untouched since there is still
>> an open discussion about what interface to use/what information to pass
>> to the hypervisor. The question is whether we should abstract away
>> the state of the line or not.
>> ***
> Let's start with a simple question: is this going to work with
> virtio-mmio emulation in QEMU that doesn't lower the state of the line
> to end the notification (only calls qemu_set_irq(irq, high))?
>
> See: hw/virtio/virtio-mmio.c:virtio_mmio_update_irq
>
>
> Alex (CC'ed) might be able to confirm whether I am reading the QEMU code
> correctly. Assuming that it is true that QEMU is only raising the level,
> never lowering it, although the emulation is obviously not correct, I
> would rather keep QEMU as is for efficiency reasons, and because we
> don't want to deviate from the common implementation in QEMU.
>
>
> Looking at this patch and at vgic_inject_irq, yes, I think it would
> work as is.
Not sure whether QEMU lowers the level or not, but in the virtio-disk 
backend example we don't set the level to 0.
IIRC there was a discussion about that from which I took that "setting 
the level to 0 still does nothing on Arm if the IRQ is edge triggered".
So it looks like, yes, it would work as is.


>
> So it looks like we are going to end up with an interface that:
>
> - in theory it is modelling the line closely
> - in practice it is only called to "trigger the IRQ"
>
>
> Hence my preference for being explicit about it and just call it
> trigger_irq.

I got it; just rename it while retaining the level parameter?


>
> If we keep the patch as is, should we at least add a comment to document
> the "QEMU style" use model?

Sure, I will describe that QEMU is only raising the level and never 
lowering it, if I get confirmation that this is true.
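
(For the record, from the backend side the "QEMU style" usage being discussed
would look roughly like this -- a sketch assuming the libxendevicemodel
wrapper keeps the signature proposed in this version of the series:)

#include <xendevicemodel.h>

/* Notify the guest by raising the virtio-mmio SPI; "QEMU style" means the
 * backend only ever drives the line high and never lowers it. */
static int notify_guest(xendevicemodel_handle *dmod, domid_t domid,
                        uint32_t virtio_irq)
{
    return xendevicemodel_set_irq_level(dmod, domid, virtio_irq, 1 /* high */);
}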


-- 
Regards,

Oleksandr Tyshchenko



^ permalink raw reply	[flat|nested] 127+ messages in thread

* Re: [PATCH V3 15/23] xen/arm: Stick around in leave_hypervisor_to_guest until I/O has completed
  2020-12-10  2:30         ` Stefano Stabellini
@ 2020-12-10 13:17           ` Julien Grall
  2020-12-10 13:21           ` Oleksandr
  1 sibling, 0 replies; 127+ messages in thread
From: Julien Grall @ 2020-12-10 13:17 UTC (permalink / raw)
  To: Stefano Stabellini
  Cc: Oleksandr Tyshchenko, xen-devel, Oleksandr Tyshchenko,
	Volodymyr Babchuk, Julien Grall

Hi Stefano,

On 10/12/2020 02:30, Stefano Stabellini wrote:
>>> I am also wondering if there is any benefit in calling wait_for_io()
>>> earlier, maybe from try_handle_mmio if IO_RETRY?
>>
>> wait_for_io() may end up descheduling the vCPU. I would like to avoid this
>> happening in the middle of the I/O emulation because it needs to happen with
>> no locks held at all.
>>
>> I don't think there are locks involved today, but the deeper in the call stack
>> the scheduling happens, the more chance we may screw up in the future.
>>
>> However...
>>
>>> leave_hypervisor_to_guest is very late for that.
>>
>> ... I am not sure what the problem with that is. The IOREQ server will be
>> notified of the pending I/O as soon as try_handle_mmio() puts the I/O in the
>> shared page.
>>
>> If the IOREQ server is running on a different pCPU, then it might be possible
>> that the I/O has completed before we reach leave_hypervisor_to_guest(). In
>> this case, we would not have to wait for the I/O.
> 
> Yeah, I was thinking about that too. Actually it could be faster
> this way if we end up being lucky.
> 
> The reason for moving it earlier would be that by the time
> leave_hypervisor_to_guest is called "Xen" has already decided to
> continue running this particular vcpu. If we called wait_for_io()
> earlier, we would give important information to the scheduler before any
> decision is made.

I don't understand this. Xen preemption is voluntary, so the scheduler 
is not going to run unless requested.

wait_for_io() is a preemption point. So if you call it, then the vCPU may 
get descheduled at that point.

Why would we want to do this? What's our benefit here?

Cheers,

-- 
Julien Grall


^ permalink raw reply	[flat|nested] 127+ messages in thread

* Re: [PATCH V3 15/23] xen/arm: Stick around in leave_hypervisor_to_guest until I/O has completed
  2020-12-10  2:30         ` Stefano Stabellini
  2020-12-10 13:17           ` Julien Grall
@ 2020-12-10 13:21           ` Oleksandr
  1 sibling, 0 replies; 127+ messages in thread
From: Oleksandr @ 2020-12-10 13:21 UTC (permalink / raw)
  To: Stefano Stabellini, Julien Grall
  Cc: xen-devel, Oleksandr Tyshchenko, Volodymyr Babchuk, Julien Grall


On 10.12.20 04:30, Stefano Stabellini wrote:

Hi Julien, Stefano

> On Wed, 9 Dec 2020, Julien Grall wrote:
>> On 09/12/2020 23:35, Stefano Stabellini wrote:
>>> On Wed, 9 Dec 2020, Stefano Stabellini wrote:
>>>> On Mon, 30 Nov 2020, Oleksandr Tyshchenko wrote:
>>>>> From: Oleksandr Tyshchenko <oleksandr_tyshchenko@epam.com>
>>>>>
>>>>> This patch adds proper handling of return value of
>>>>> vcpu_ioreq_handle_completion() which involves using a loop
>>>>> in leave_hypervisor_to_guest().
>>>>>
>>>>> The reason to use an unbounded loop here is the fact that a vCPU
>>>>> shouldn't continue until an I/O has completed. In Xen's case, if an I/O
>>>>> never completes then it most likely means that something went horribly
>>>>> wrong with the Device Emulator. And it is most likely not safe to
>>>>> continue. So letting the vCPU spin forever if an I/O never completes
>>>>> is a safer action than letting it continue and leaving the guest in an
>>>>> unclear state, and is the best we can do for now.
>>>>>
>>>>> This wouldn't be an issue for Xen as do_softirq() would be called at
>>>>> every loop. In case of failure, the guest will crash and the vCPU
>>>>> will be unscheduled.
>>>> Imagine that we have two guests: one that requires an ioreq server and
>>>> one that doesn't. If I am not mistaken this loop could potentially spin
>>>> forever on a pcpu, thus preventing any other guest being scheduled, even
>>>> if the other guest doesn't need any ioreq servers.
>>>>
>>>>
>>>> My other concern is that we are busy-looping. Could we call something
>>>> like wfi() or do_idle() instead? The ioreq server event notification of
>>>> completion should wake us up?
>>>>
>>>> Following this line of thinking, I am wondering if instead of the
>>>> busy-loop we should call vcpu_block_unless_event_pending(current) in
>>>> try_handle_mmio if IO_RETRY. Then when the emulation is done, QEMU (or
>>>> equivalent) calls xenevtchn_notify which ends up waking up the domU
>>>> vcpu. Would that work?
>>> I read now Julien's reply: we are already doing something similar to
>>> what I suggested with the following call chain:
>>>
>>> check_for_vcpu_work -> vcpu_ioreq_handle_completion -> wait_for_io ->
>>> wait_on_xen_event_channel
>>>
>>> So the busy-loop here is only a safety-belt in case of a spurious
>>> wake-up, in which case we are going to call check_for_vcpu_work again,
>>> potentially causing a guest reschedule.
>>>
>>> Then, this is fine and addresses both my concerns. Maybe let's add a note
>>> in the commit message about it.
>> Damn, I hit the "send" button just a second before seeing your reply. :/ Oh
>> well. I suggested the same because I have seen the same question multiple
>> times.


I will update commit description and probably the comment in code.


-- 
Regards,

Oleksandr Tyshchenko



^ permalink raw reply	[flat|nested] 127+ messages in thread

* Re: [PATCH V3 18/23] xen/dm: Introduce xendevicemodel_set_irq_level DM op
  2020-12-10  2:21   ` Stefano Stabellini
  2020-12-10 12:58     ` Oleksandr
@ 2020-12-10 13:38     ` Julien Grall
  1 sibling, 0 replies; 127+ messages in thread
From: Julien Grall @ 2020-12-10 13:38 UTC (permalink / raw)
  To: Stefano Stabellini, Oleksandr Tyshchenko
  Cc: xen-devel, Julien Grall, Ian Jackson, Wei Liu, Andrew Cooper,
	George Dunlap, Jan Beulich, Volodymyr Babchuk,
	Oleksandr Tyshchenko, alex.bennee

Hi Stefano,

On 10/12/2020 02:21, Stefano Stabellini wrote:
> On Mon, 30 Nov 2020, Oleksandr Tyshchenko wrote:
>> From: Julien Grall <julien.grall@arm.com>
>>
>> This patch adds the ability for the device emulator to notify the other end
>> (some entity running in the guest) using an SPI and implements the Arm
>> specific bits for it. The proposed interface allows the emulator to set
>> the logical level of one of a domain's IRQ lines.
>>
>> We can't reuse the existing DM op (xen_dm_op_set_isa_irq_level)
>> to inject an interrupt as the "isa_irq" field is only 8-bit and
>> able to cover IRQ 0 - 255, whereas we need a wider range (0 - 1020).
>>
>> Signed-off-by: Julien Grall <julien.grall@arm.com>
>> Signed-off-by: Oleksandr Tyshchenko <oleksandr_tyshchenko@epam.com>
>>
>> ---
>> Please note, this is a split/cleanup/hardening of Julien's PoC:
>> "Add support for Guest IO forwarding to a device emulator"
>>
>> ***
>> Please note, I left the interface untouched since there is still
>> an open discussion about what interface to use/what information to pass
>> to the hypervisor. The question is whether we should abstract away
>> the state of the line or not.
>> ***
> 
> Let's start with a simple question: is this going to work with
> virtio-mmio emulation in QEMU that doesn't lower the state of the line
> to end the notification (only calls qemu_set_irq(irq, high))?
> 
> See: hw/virtio/virtio-mmio.c:virtio_mmio_update_irq

Hmmm my version of QEMU is using:

level = (qatomic_read(&vdev->isr) != 0);
trace_virtio_mmio_setting_irq(level);
qemu_set_irq(proxy->irq, level);

So QEMU will raise/lower the interrupt based on whether there are 
still pending interrupts.

> 
> 
> Alex (CC'ed) might be able to confirm whether I am reading the QEMU code
> correctly. Assuming that it is true that QEMU is only raising the level,
> never lowering it, although the emulation is obviously not correct, I
> would rather keep QEMU as is for efficiency reasons, and because we
> don't want to deviate from the common implementation in QEMU.
> 
> 
> Looking at this patch and at vgic_inject_irq, yes, I think it would
> work as is.

Our implementation of vgic_inject_irq() is completely bogus as soon as 
you deal with level interrupts. We have been getting away with it so far 
because there are not many fully emulated level interrupts (AFAIK this 
would only be the pl011). In fact, we carry a gross hack in the emulation 
to handle them.

In the case of a level interrupt, we should keep injecting the interrupt 
into the guest until the line is lowered (e.g. qemu_set_irq(irq, 0), 
assuming active-high).

> 
> 
> So it looks like we are going to end up with an interface that:
> 
> - in theory it is modelling the line closely

For level interrupts we need to know whether the line is low or high. I 
am struggling to see how this would work if we consider the variable as 
a "trigger".

Cheers,

-- 
Julien Grall


^ permalink raw reply	[flat|nested] 127+ messages in thread

* Re: [PATCH V3 17/23] xen/ioreq: Introduce domain_has_ioreq_server()
  2020-12-10  8:38                 ` Paul Durrant
@ 2020-12-10 16:57                   ` Oleksandr
  0 siblings, 0 replies; 127+ messages in thread
From: Oleksandr @ 2020-12-10 16:57 UTC (permalink / raw)
  To: paul
  Cc: 'Jan Beulich', 'Oleksandr Tyshchenko',
	'Stefano Stabellini', 'Julien Grall',
	'Volodymyr Babchuk', 'Andrew Cooper',
	'George Dunlap', 'Ian Jackson', 'Wei Liu',
	'Julien Grall',
	xen-devel


On 10.12.20 10:38, Paul Durrant wrote:

Hi Paul.

>> -----Original Message-----
>> From: Oleksandr <olekstysh@gmail.com>
>> Sent: 09 December 2020 20:36
>> To: paul@xen.org
>> Cc: 'Jan Beulich' <jbeulich@suse.com>; 'Oleksandr Tyshchenko' <oleksandr_tyshchenko@epam.com>;
>> 'Stefano Stabellini' <sstabellini@kernel.org>; 'Julien Grall' <julien@xen.org>; 'Volodymyr Babchuk'
>> <Volodymyr_Babchuk@epam.com>; 'Andrew Cooper' <andrew.cooper3@citrix.com>; 'George Dunlap'
>> <george.dunlap@citrix.com>; 'Ian Jackson' <iwj@xenproject.org>; 'Wei Liu' <wl@xen.org>; 'Julien Grall'
>> <julien.grall@arm.com>; xen-devel@lists.xenproject.org
>> Subject: Re: [PATCH V3 17/23] xen/ioreq: Introduce domain_has_ioreq_server()
>>
>>
>> Hi Paul.
>>
>>
>>>>>>>> On 30.11.2020 11:31, Oleksandr Tyshchenko wrote:
>>>>>>>>> --- a/xen/include/xen/ioreq.h
>>>>>>>>> +++ b/xen/include/xen/ioreq.h
>>>>>>>>> @@ -55,6 +55,20 @@ struct ioreq_server {
>>>>>>>>>          uint8_t                bufioreq_handling;
>>>>>>>>>      };
>>>>>>>>>      +/*
>>>>>>>>> + * This should only be used when d == current->domain and it's not
>>>>>>>>> paused,
>>>>>>>> Is the "not paused" part really relevant here? Besides it being rare
>>>>>>>> that the current domain would be paused (if so, it's in the process
>>>>>>>> of having all its vCPU-s scheduled out), does this matter at all?
>>>>>>> No, it isn't relevant, I will drop it.
>>>>>>>
>>>>>>>
>>>>>>>> Apart from this the patch looks okay to me, but I'm not sure it
>>>>>>>> addresses Paul's concerns. Iirc he had suggested to switch back to
>>>>>>>> a list if doing a swipe over the entire array is too expensive in
>>>>>>>> this specific case.
>>>>>>> We would like to avoid to do any extra actions in
>>>>>>> leave_hypervisor_to_guest() if possible.
>>>>>>> But not only there, the logic whether we check/set
>>>>>>> mapcache_invalidation variable could be avoided if a domain doesn't
>>>>>>> use IOREQ server...
>>>>>> Are you OK with this patch (common part of it)?
>>>>> How much of a performance benefit is this? The array is small so simply counting the non-NULL
>>>> entries should be pretty quick.
>>>> I didn't perform performance measurements on how much this call consumes.
>>>> In our system we run three domains. The emulator is in DomD only, so I
>>>> would like to avoid calling vcpu_ioreq_handle_completion() for every
>>>> Dom0/DomU vCPU if there is no real need to do it.
>>> This is not relevant to the domain that the emulator is running in; it's concerning the domains
>> which the emulator is servicing. How many of those are there?
>> Err, yes, I wasn't precise when providing an example.
>> Single emulator is running in DomD and servicing DomU. So with the
>> helper in place the vcpu_ioreq_handle_completion() gets only called for
>> DomU vCPUs (as expected).
>> Without an optimization the vcpu_ioreq_handle_completion() gets called
>> for _all_ vCPUs, and I see it as an extra action for Dom0, DomD vCPUs.
>>
>>
>>>> On Arm vcpu_ioreq_handle_completion()
>>>> is called with IRQ enabled, so the call is accompanied with
>>>> corresponding irq_enable/irq_disable.
>>>> These unneeded actions could be avoided by using this simple one-line
>>>> helper...
>>>>
>>> The helper may be one line but there is more to the patch than that. I still think you could just
>> walk the array in the helper rather than keeping a running occupancy count.
>>
>> OK, is the implementation below close to what you propose? If yes, I
>> will update a helper and drop nr_servers variable.
>>
>> bool domain_has_ioreq_server(const struct domain *d)
>> {
>>       const struct ioreq_server *s;
>>       unsigned int id;
>>
>>       FOR_EACH_IOREQ_SERVER(d, id, s)
>>           return true;
>>
>>       return false;
>> }
> Yes, that's what I had in mind.
>
>    Paul

Thank you for the clarification.
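
(For the record, with the helper in place the Arm side can then gate the
completion handling roughly as follows -- a sketch only, assuming it ends up
in check_for_vcpu_work() with v == current as elsewhere in the series:)

#ifdef CONFIG_IOREQ_SERVER
    if ( domain_has_ioreq_server(v->domain) )
    {
        bool handled;

        /* Completion may need to wait on the event channel, so run with
         * interrupts enabled. Domains without IOREQ servers (e.g. Dom0,
         * DomD in the example above) skip this entirely. */
        local_irq_enable();
        handled = vcpu_ioreq_handle_completion(v);
        local_irq_disable();

        if ( !handled )
            return true;
    }
#endif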


-- 
Regards,

Oleksandr Tyshchenko



^ permalink raw reply	[flat|nested] 127+ messages in thread

* Re: [PATCH V3 21/23] xen/arm: Add mapcache invalidation handling
  2020-12-10  2:30   ` Stefano Stabellini
@ 2020-12-10 18:50     ` Julien Grall
  2020-12-11  1:28       ` Stefano Stabellini
  0 siblings, 1 reply; 127+ messages in thread
From: Julien Grall @ 2020-12-10 18:50 UTC (permalink / raw)
  To: Stefano Stabellini, Oleksandr Tyshchenko
  Cc: xen-devel, Oleksandr Tyshchenko, Volodymyr Babchuk, Julien Grall

Hi Stefano,

On 10/12/2020 02:30, Stefano Stabellini wrote:
> On Mon, 30 Nov 2020, Oleksandr Tyshchenko wrote:
>> From: Oleksandr Tyshchenko <oleksandr_tyshchenko@epam.com>
>>
>> We need to send a mapcache invalidation request to qemu/demu every time
>> a page gets removed from a guest.
>>
>> At the moment, the Arm code doesn't explicitly remove the existing
>> mapping before inserting the new mapping. Instead, this is done
>> implicitly by __p2m_set_entry().
>>
>> So we need to recognize the case when the old entry is a RAM page *and*
>> the new MFN is different in order to set the corresponding flag.
>> The most suitable place to do this is p2m_free_entry(), as there
>> we can find the correct leaf type. The invalidation request
>> will be sent in do_trap_hypercall() later on.
> 
> Why is it sent in do_trap_hypercall() ?

I believe this is following the approach used by x86. There is actually 
some discussion about it (see [1]).

Leaving aside the toolstack case for now, AFAIK, the only way a guest 
can modify its p2m is via a hypercall. Do you have an example otherwise?

When sending the invalidation request, the vCPU will be blocked until 
all the IOREQ servers have acknowledged the invalidation. So the 
hypercall seems to be the best place to do it.

Alternatively, we could use check_for_vcpu_work() to check if the 
mapcache needs to be invalidated. The inconvenience is we would execute 
a few more instructions in each entry/exit path.
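
(That alternative would look roughly like this at the top of
check_for_vcpu_work() -- only a sketch, reusing the flag and helper
introduced by this patch, with v == current:)

#ifdef CONFIG_IOREQ_SERVER
    /* Entry/exit-path variant: costs a check on every return to the guest. */
    if ( unlikely(v->domain->mapcache_invalidate) &&
         test_and_clear_bool(v->domain->mapcache_invalidate) )
        ioreq_signal_mapcache_invalidate();
#endif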

> 
> 
>> Signed-off-by: Oleksandr Tyshchenko <oleksandr_tyshchenko@epam.com>
>> CC: Julien Grall <julien.grall@arm.com>
>>
>> ---
>> Please note, this is a split/cleanup/hardening of Julien's PoC:
>> "Add support for Guest IO forwarding to a device emulator"
>>
>> Changes V1 -> V2:
>>     - new patch, some changes were derived from (+ new explanation):
>>       xen/ioreq: Make x86's invalidate qemu mapcache handling common
>>     - put setting of the flag into __p2m_set_entry()
>>     - clarify the conditions when the flag should be set
>>     - use domain_has_ioreq_server()
>>     - update do_trap_hypercall() by adding local variable
>>
>> Changes V2 -> V3:
>>     - update patch description
>>     - move check to p2m_free_entry()
>>     - add a comment
>>     - use "curr" instead of "v" in do_trap_hypercall()
>> ---
>> ---
>>   xen/arch/arm/p2m.c   | 24 ++++++++++++++++--------
>>   xen/arch/arm/traps.c | 13 ++++++++++---
>>   2 files changed, 26 insertions(+), 11 deletions(-)
>>
>> diff --git a/xen/arch/arm/p2m.c b/xen/arch/arm/p2m.c
>> index 5b8d494..9674f6f 100644
>> --- a/xen/arch/arm/p2m.c
>> +++ b/xen/arch/arm/p2m.c
>> @@ -1,6 +1,7 @@
>>   #include <xen/cpu.h>
>>   #include <xen/domain_page.h>
>>   #include <xen/iocap.h>
>> +#include <xen/ioreq.h>
>>   #include <xen/lib.h>
>>   #include <xen/sched.h>
>>   #include <xen/softirq.h>
>> @@ -749,17 +750,24 @@ static void p2m_free_entry(struct p2m_domain *p2m,
>>       if ( !p2m_is_valid(entry) )
>>           return;
>>   
>> -    /* Nothing to do but updating the stats if the entry is a super-page. */
>> -    if ( p2m_is_superpage(entry, level) )
>> +    if ( p2m_is_superpage(entry, level) || (level == 3) )
>>       {
>> -        p2m->stats.mappings[level]--;
>> -        return;
>> -    }
>> +#ifdef CONFIG_IOREQ_SERVER
>> +        /*
>> +         * If this gets called (non-recursively) then either the entry
>> +         * was replaced by an entry with a different base (valid case) or
>> +         * the shattering of a superpage was failed (error case).
>> +         * So, at worst, the spurious mapcache invalidation might be sent.
>> +         */
>> +        if ( domain_has_ioreq_server(p2m->domain) &&
>> +             (p2m->domain == current->domain) && p2m_is_ram(entry.p2m.type) )
>> +            p2m->domain->mapcache_invalidate = true;
> 
> Why the (p2m->domain == current->domain) check? Shouldn't we set
> mapcache_invalidate to true anyway? What happens if p2m->domain !=
> current->domain? We wouldn't want the domain to lose the
> mapcache_invalidate notification.

This is also discussed in [1]. :) The main question is why would a 
toolstack/device model modify the guest memory after boot?

If we assume it does, then the device model would need to pause the 
domain before modifying the RAM.

We also need to make sure that all the IOREQ servers have invalidated
the mapcache before the domain runs again.

This would require quite a bit of work. I am not sure the effort is 
worth it if there are no active users today.

> 
> 
>> +#endif
>>   
>> -    if ( level == 3 )
>> -    {
>>           p2m->stats.mappings[level]--;
>> -        p2m_put_l3_page(entry);
>> +        /* Nothing to do if the entry is a super-page. */
>> +        if ( level == 3 )
>> +            p2m_put_l3_page(entry);
>>           return;
>>       }
>>   
>> diff --git a/xen/arch/arm/traps.c b/xen/arch/arm/traps.c
>> index b6077d2..151c626 100644
>> --- a/xen/arch/arm/traps.c
>> +++ b/xen/arch/arm/traps.c
>> @@ -1443,6 +1443,7 @@ static void do_trap_hypercall(struct cpu_user_regs *regs, register_t *nr,
>>                                 const union hsr hsr)
>>   {
>>       arm_hypercall_fn_t call = NULL;
>> +    struct vcpu *curr = current;
> 
> Is this just to save 3 characters?

Because current is not cheap to read and the compiler cannot optimize it 
(we obfuscate it as this is a per-cpu variable). So we commonly store 
'current' in a local variable if there are multiple uses.

> 
> 
>>       BUILD_BUG_ON(NR_hypercalls < ARRAY_SIZE(arm_hypercall_table) );
>>   
>> @@ -1459,7 +1460,7 @@ static void do_trap_hypercall(struct cpu_user_regs *regs, register_t *nr,
>>           return;
>>       }
>>   
>> -    current->hcall_preempted = false;
>> +    curr->hcall_preempted = false;
>>   
>>       perfc_incra(hypercalls, *nr);
>>       call = arm_hypercall_table[*nr].fn;
>> @@ -1472,7 +1473,7 @@ static void do_trap_hypercall(struct cpu_user_regs *regs, register_t *nr,
>>       HYPERCALL_RESULT_REG(regs) = call(HYPERCALL_ARGS(regs));
>>   
>>   #ifndef NDEBUG
>> -    if ( !current->hcall_preempted )
>> +    if ( !curr->hcall_preempted )
>>       {
>>           /* Deliberately corrupt parameter regs used by this hypercall. */
>>           switch ( arm_hypercall_table[*nr].nr_args ) {
>> @@ -1489,8 +1490,14 @@ static void do_trap_hypercall(struct cpu_user_regs *regs, register_t *nr,
>>   #endif
>>   
>>       /* Ensure the hypercall trap instruction is re-executed. */
>> -    if ( current->hcall_preempted )
>> +    if ( curr->hcall_preempted )
>>           regs->pc -= 4;  /* re-execute 'hvc #XEN_HYPERCALL_TAG' */
>> +
>> +#ifdef CONFIG_IOREQ_SERVER
>> +    if ( unlikely(curr->domain->mapcache_invalidate) &&
>> +         test_and_clear_bool(curr->domain->mapcache_invalidate) )
>> +        ioreq_signal_mapcache_invalidate();
> 
> Why not just:
> 
> if ( unlikely(test_and_clear_bool(curr->domain->mapcache_invalidate)) )
>      ioreq_signal_mapcache_invalidate();
> 

This seems to match the x86 code. My guess is they tried to prevent the 
cost of the atomic operation if there is no chance mapcache_invalidate 
is true.

I am split whether the first check is worth it. The atomic operation 
should be uncontended most of the time, so it should be quick. But it 
will always be slower than just a read because there is always a store 
involved.

On a related topic, Jan pointed out that the invalidation would not work 
properly if you have multiple vCPUs modifying the P2M at the same time.

Cheers,

[1] 
https://lore.kernel.org/xen-devel/f92f62bf-2f8d-34db-4be5-d3e6a4b9d580@suse.com/

-- 
Julien Grall


^ permalink raw reply	[flat|nested] 127+ messages in thread

* Re: [PATCH V3 21/23] xen/arm: Add mapcache invalidation handling
  2020-12-10 18:50     ` Julien Grall
@ 2020-12-11  1:28       ` Stefano Stabellini
  2020-12-11 11:21         ` Oleksandr
  2020-12-11 19:27         ` Julien Grall
  0 siblings, 2 replies; 127+ messages in thread
From: Stefano Stabellini @ 2020-12-11  1:28 UTC (permalink / raw)
  To: Julien Grall
  Cc: Stefano Stabellini, Oleksandr Tyshchenko, xen-devel,
	Oleksandr Tyshchenko, Volodymyr Babchuk, Julien Grall

On Thu, 10 Dec 2020, Julien Grall wrote:
> On 10/12/2020 02:30, Stefano Stabellini wrote:
> > On Mon, 30 Nov 2020, Oleksandr Tyshchenko wrote:
> > > From: Oleksandr Tyshchenko <oleksandr_tyshchenko@epam.com>
> > > 
> > > We need to send a mapcache invalidation request to qemu/demu every time
> > > a page gets removed from a guest.
> > > 
> > > At the moment, the Arm code doesn't explicitly remove the existing
> > > mapping before inserting the new mapping. Instead, this is done
> > > implicitly by __p2m_set_entry().
> > > 
> > > So we need to recognize the case when the old entry is a RAM page *and*
> > > the new MFN is different in order to set the corresponding flag.
> > > The most suitable place to do this is p2m_free_entry(), as there
> > > we can find the correct leaf type. The invalidation request
> > > will be sent in do_trap_hypercall() later on.
> > 
> > Why is it sent in do_trap_hypercall() ?
> 
> I believe this is following the approach used by x86. There are actually some
> discussion about it (see [1]).
> 
> Leaving aside the toolstack case for now, AFAIK, the only way a guest can
> modify its p2m is via an hypercall. Do you have an example otherwise?

OK this is a very important assumption. We should write it down for sure.
I think it is true today on ARM.


> When sending the invalidation request, the vCPU will be blocked until all the
> IOREQ server have acknowledged the invalidation. So the hypercall seems to be
> the best position to do it.
> 
> Alternatively, we could use check_for_vcpu_work() to check if the mapcache
> needs to be invalidated. The inconvenience is we would execute a few more
> instructions in each entry/exit path.

Yeah it would be more natural to call it from check_for_vcpu_work(). If
we put it between #ifdef CONFIG_IOREQ_SERVER it wouldn't be bad. But I
am not a fan of increasing the instructions on the exit path either.
From this point of view, putting it at the end of do_trap_hypercall is a
nice trick actually. Let's just make sure it has a good comment on top.


> > > Signed-off-by: Oleksandr Tyshchenko <oleksandr_tyshchenko@epam.com>
> > > CC: Julien Grall <julien.grall@arm.com>
> > > 
> > > ---
> > > Please note, this is a split/cleanup/hardening of Julien's PoC:
> > > "Add support for Guest IO forwarding to a device emulator"
> > > 
> > > Changes V1 -> V2:
> > >     - new patch, some changes were derived from (+ new explanation):
> > >       xen/ioreq: Make x86's invalidate qemu mapcache handling common
> > >     - put setting of the flag into __p2m_set_entry()
> > >     - clarify the conditions when the flag should be set
> > >     - use domain_has_ioreq_server()
> > >     - update do_trap_hypercall() by adding local variable
> > > 
> > > Changes V2 -> V3:
> > >     - update patch description
> > >     - move check to p2m_free_entry()
> > >     - add a comment
> > >     - use "curr" instead of "v" in do_trap_hypercall()
> > > ---
> > > ---
> > >   xen/arch/arm/p2m.c   | 24 ++++++++++++++++--------
> > >   xen/arch/arm/traps.c | 13 ++++++++++---
> > >   2 files changed, 26 insertions(+), 11 deletions(-)
> > > 
> > > diff --git a/xen/arch/arm/p2m.c b/xen/arch/arm/p2m.c
> > > index 5b8d494..9674f6f 100644
> > > --- a/xen/arch/arm/p2m.c
> > > +++ b/xen/arch/arm/p2m.c
> > > @@ -1,6 +1,7 @@
> > >   #include <xen/cpu.h>
> > >   #include <xen/domain_page.h>
> > >   #include <xen/iocap.h>
> > > +#include <xen/ioreq.h>
> > >   #include <xen/lib.h>
> > >   #include <xen/sched.h>
> > >   #include <xen/softirq.h>
> > > @@ -749,17 +750,24 @@ static void p2m_free_entry(struct p2m_domain *p2m,
> > >       if ( !p2m_is_valid(entry) )
> > >           return;
> > >   -    /* Nothing to do but updating the stats if the entry is a
> > > super-page. */
> > > -    if ( p2m_is_superpage(entry, level) )
> > > +    if ( p2m_is_superpage(entry, level) || (level == 3) )
> > >       {
> > > -        p2m->stats.mappings[level]--;
> > > -        return;
> > > -    }
> > > +#ifdef CONFIG_IOREQ_SERVER
> > > +        /*
> > > +         * If this gets called (non-recursively) then either the entry
> > > +         * was replaced by an entry with a different base (valid case) or
> > > +         * the shattering of a superpage was failed (error case).
> > > +         * So, at worst, the spurious mapcache invalidation might be
> > > sent.
> > > +         */
> > > +        if ( domain_has_ioreq_server(p2m->domain) &&
> > > +             (p2m->domain == current->domain) &&
> > > p2m_is_ram(entry.p2m.type) )
> > > +            p2m->domain->mapcache_invalidate = true;
> > 
> > Why the (p2m->domain == current->domain) check? Shouldn't we set
> > mapcache_invalidate to true anyway? What happens if p2m->domain !=
> > current->domain? We wouldn't want the domain to lose the
> > mapcache_invalidate notification.
> 
> This is also discussed in [1]. :) The main question is why would a
> toolstack/device model modify the guest memory after boot?
> 
> If we assume it does, then the device model would need to pause the domain
> before modifying the RAM.
> 
> We also need to make sure that all the IOREQ servers have invalidated
> the mapcache before the domain runs again.
> 
> This would require quite a bit of work. I am not sure the effort is worth it
> if there are no active users today.

OK, that explains why we think p2m->domain == current->domain, but why
do we need to have a check for it right here?

In other words, we don't think it is realistic to get here with
p2m->domain != current->domain, but let's say that we do somehow. What's
the best course of action? Probably, set mapcache_invalidate to true and
possibly print a warning?

Leaving mapcache_invalidate to false doesn't seem to be what we want to
do?

 
> > >       BUILD_BUG_ON(NR_hypercalls < ARRAY_SIZE(arm_hypercall_table) );
> > >   @@ -1459,7 +1460,7 @@ static void do_trap_hypercall(struct cpu_user_regs
> > > *regs, register_t *nr,
> > >           return;
> > >       }
> > >   -    current->hcall_preempted = false;
> > > +    curr->hcall_preempted = false;
> > >         perfc_incra(hypercalls, *nr);
> > >       call = arm_hypercall_table[*nr].fn;
> > > @@ -1472,7 +1473,7 @@ static void do_trap_hypercall(struct cpu_user_regs
> > > *regs, register_t *nr,
> > >       HYPERCALL_RESULT_REG(regs) = call(HYPERCALL_ARGS(regs));
> > >     #ifndef NDEBUG
> > > -    if ( !current->hcall_preempted )
> > > +    if ( !curr->hcall_preempted )
> > >       {
> > >           /* Deliberately corrupt parameter regs used by this hypercall.
> > > */
> > >           switch ( arm_hypercall_table[*nr].nr_args ) {
> > > @@ -1489,8 +1490,14 @@ static void do_trap_hypercall(struct cpu_user_regs
> > > *regs, register_t *nr,
> > >   #endif
> > >         /* Ensure the hypercall trap instruction is re-executed. */
> > > -    if ( current->hcall_preempted )
> > > +    if ( curr->hcall_preempted )
> > >           regs->pc -= 4;  /* re-execute 'hvc #XEN_HYPERCALL_TAG' */
> > > +
> > > +#ifdef CONFIG_IOREQ_SERVER
> > > +    if ( unlikely(curr->domain->mapcache_invalidate) &&
> > > +         test_and_clear_bool(curr->domain->mapcache_invalidate) )
> > > +        ioreq_signal_mapcache_invalidate();
> > 
> > Why not just:
> > 
> > if ( unlikely(test_and_clear_bool(curr->domain->mapcache_invalidate)) )
> >      ioreq_signal_mapcache_invalidate();
> > 
> 
> This seems to match the x86 code. My guess is they tried to prevent the cost
> of the atomic operation if there is no chance mapcache_invalidate is true.
> 
> I am split whether the first check is worth it. The atomic operation should be
> uncontended most of the time, so it should be quick. But it will always be
> slower than just a read because there is always a store involved.

I am not a fan of optimizations with unclear benefits :-)


> On a related topic, Jan pointed out that the invalidation would not work
> properly if you have multiple vCPU modifying the P2M at the same time.

Uhm, yes.


^ permalink raw reply	[flat|nested] 127+ messages in thread

* Re: [PATCH V3 21/23] xen/arm: Add mapcache invalidation handling
  2020-12-11  1:28       ` Stefano Stabellini
@ 2020-12-11 11:21         ` Oleksandr
  2020-12-11 19:07           ` Stefano Stabellini
  2020-12-11 19:27         ` Julien Grall
  1 sibling, 1 reply; 127+ messages in thread
From: Oleksandr @ 2020-12-11 11:21 UTC (permalink / raw)
  To: Stefano Stabellini, Julien Grall
  Cc: xen-devel, Oleksandr Tyshchenko, Volodymyr Babchuk, Julien Grall


On 11.12.20 03:28, Stefano Stabellini wrote:

Hi Julien, Stefano

> On Thu, 10 Dec 2020, Julien Grall wrote:
>> On 10/12/2020 02:30, Stefano Stabellini wrote:
>>> On Mon, 30 Nov 2020, Oleksandr Tyshchenko wrote:
>>>> From: Oleksandr Tyshchenko <oleksandr_tyshchenko@epam.com>
>>>>
>>>> We need to send a mapcache invalidation request to qemu/demu every time
>>>> a page gets removed from a guest.
>>>>
>>>> At the moment, the Arm code doesn't explicitly remove the existing
>>>> mapping before inserting the new mapping. Instead, this is done
>>>> implicitly by __p2m_set_entry().
>>>>
>>>> So we need to recognize the case when the old entry is a RAM page *and*
>>>> the new MFN is different in order to set the corresponding flag.
>>>> The most suitable place to do this is p2m_free_entry(), as there
>>>> we can find the correct leaf type. The invalidation request
>>>> will be sent in do_trap_hypercall() later on.
>>> Why is it sent in do_trap_hypercall() ?
>> I believe this is following the approach used by x86. There are actually some
>> discussion about it (see [1]).
>>
>> Leaving aside the toolstack case for now, AFAIK, the only way a guest can
>> modify its p2m is via an hypercall. Do you have an example otherwise?
> OK this is a very important assumption. We should write it down for sure.
> I think it is true today on ARM.
>
>
>> When sending the invalidation request, the vCPU will be blocked until all the
>> IOREQ server have acknowledged the invalidation. So the hypercall seems to be
>> the best position to do it.
>>
>> Alternatively, we could use check_for_vcpu_work() to check if the mapcache
>> needs to be invalidated. The inconvenience is we would execute a few more
>> instructions in each entry/exit path.
> Yeah it would be more natural to call it from check_for_vcpu_work(). If
> we put it between #ifdef CONFIG_IOREQ_SERVER it wouldn't be bad. But I
> am not a fan of increasing the instructions on the exit path either.
>  From this point of view, putting it at the end of do_trap_hypercall is a
> nice trick actually. Let's just make sure it has a good comment on top.
>
>
>>>> Signed-off-by: Oleksandr Tyshchenko <oleksandr_tyshchenko@epam.com>
>>>> CC: Julien Grall <julien.grall@arm.com>
>>>>
>>>> ---
>>>> Please note, this is a split/cleanup/hardening of Julien's PoC:
>>>> "Add support for Guest IO forwarding to a device emulator"
>>>>
>>>> Changes V1 -> V2:
>>>>      - new patch, some changes were derived from (+ new explanation):
>>>>        xen/ioreq: Make x86's invalidate qemu mapcache handling common
>>>>      - put setting of the flag into __p2m_set_entry()
>>>>      - clarify the conditions when the flag should be set
>>>>      - use domain_has_ioreq_server()
>>>>      - update do_trap_hypercall() by adding local variable
>>>>
>>>> Changes V2 -> V3:
>>>>      - update patch description
>>>>      - move check to p2m_free_entry()
>>>>      - add a comment
>>>>      - use "curr" instead of "v" in do_trap_hypercall()
>>>> ---
>>>> ---
>>>>    xen/arch/arm/p2m.c   | 24 ++++++++++++++++--------
>>>>    xen/arch/arm/traps.c | 13 ++++++++++---
>>>>    2 files changed, 26 insertions(+), 11 deletions(-)
>>>>
>>>> diff --git a/xen/arch/arm/p2m.c b/xen/arch/arm/p2m.c
>>>> index 5b8d494..9674f6f 100644
>>>> --- a/xen/arch/arm/p2m.c
>>>> +++ b/xen/arch/arm/p2m.c
>>>> @@ -1,6 +1,7 @@
>>>>    #include <xen/cpu.h>
>>>>    #include <xen/domain_page.h>
>>>>    #include <xen/iocap.h>
>>>> +#include <xen/ioreq.h>
>>>>    #include <xen/lib.h>
>>>>    #include <xen/sched.h>
>>>>    #include <xen/softirq.h>
>>>> @@ -749,17 +750,24 @@ static void p2m_free_entry(struct p2m_domain *p2m,
>>>>        if ( !p2m_is_valid(entry) )
>>>>            return;
>>>>    -    /* Nothing to do but updating the stats if the entry is a
>>>> super-page. */
>>>> -    if ( p2m_is_superpage(entry, level) )
>>>> +    if ( p2m_is_superpage(entry, level) || (level == 3) )
>>>>        {
>>>> -        p2m->stats.mappings[level]--;
>>>> -        return;
>>>> -    }
>>>> +#ifdef CONFIG_IOREQ_SERVER
>>>> +        /*
>>>> +         * If this gets called (non-recursively) then either the entry
>>>> +         * was replaced by an entry with a different base (valid case) or
>>>> +         * the shattering of a superpage was failed (error case).
>>>> +         * So, at worst, the spurious mapcache invalidation might be
>>>> sent.
>>>> +         */
>>>> +        if ( domain_has_ioreq_server(p2m->domain) &&
>>>> +             (p2m->domain == current->domain) &&
>>>> p2m_is_ram(entry.p2m.type) )
>>>> +            p2m->domain->mapcache_invalidate = true;
>>> Why the (p2m->domain == current->domain) check? Shouldn't we set
>>> mapcache_invalidate to true anyway? What happens if p2m->domain !=
>>> current->domain? We wouldn't want the domain to lose the
>>> mapcache_invalidate notification.
>> This is also discussed in [1]. :) The main question is why would a
>> toolstack/device model modify the guest memory after boot?
>>
>> If we assume it does, then the device model would need to pause the domain
>> before modifying the RAM.
>>
>> We also need to make sure that all the IOREQ servers have invalidated
>> the mapcache before the domain run again.
>>
>> This would require quite a bit of work. I am not sure the effort is worth if
>> there are no active users today.
> OK, that explains why we think p2m->domain == current->domain, but why
> do we need to have a check for it right here?
>
> In other words, we don't think it is realistic to get here with
> p2m->domain != current->domain, but let's say that we do somehow. What's
> the best course of action? Probably, set mapcache_invalidate to true and
> possibly print a warning?
>
> Leaving mapcache_invalidate to false doesn't seem to be what we want to
> do?
>
>   
>>>>        BUILD_BUG_ON(NR_hypercalls < ARRAY_SIZE(arm_hypercall_table) );
>>>>    @@ -1459,7 +1460,7 @@ static void do_trap_hypercall(struct cpu_user_regs
>>>> *regs, register_t *nr,
>>>>            return;
>>>>        }
>>>>    -    current->hcall_preempted = false;
>>>> +    curr->hcall_preempted = false;
>>>>          perfc_incra(hypercalls, *nr);
>>>>        call = arm_hypercall_table[*nr].fn;
>>>> @@ -1472,7 +1473,7 @@ static void do_trap_hypercall(struct cpu_user_regs
>>>> *regs, register_t *nr,
>>>>        HYPERCALL_RESULT_REG(regs) = call(HYPERCALL_ARGS(regs));
>>>>      #ifndef NDEBUG
>>>> -    if ( !current->hcall_preempted )
>>>> +    if ( !curr->hcall_preempted )
>>>>        {
>>>>            /* Deliberately corrupt parameter regs used by this hypercall.
>>>> */
>>>>            switch ( arm_hypercall_table[*nr].nr_args ) {
>>>> @@ -1489,8 +1490,14 @@ static void do_trap_hypercall(struct cpu_user_regs
>>>> *regs, register_t *nr,
>>>>    #endif
>>>>          /* Ensure the hypercall trap instruction is re-executed. */
>>>> -    if ( current->hcall_preempted )
>>>> +    if ( curr->hcall_preempted )
>>>>            regs->pc -= 4;  /* re-execute 'hvc #XEN_HYPERCALL_TAG' */
>>>> +
>>>> +#ifdef CONFIG_IOREQ_SERVER
>>>> +    if ( unlikely(curr->domain->mapcache_invalidate) &&
>>>> +         test_and_clear_bool(curr->domain->mapcache_invalidate) )
>>>> +        ioreq_signal_mapcache_invalidate();
>>> Why not just:
>>>
>>> if ( unlikely(test_and_clear_bool(curr->domain->mapcache_invalidate)) )
>>>       ioreq_signal_mapcache_invalidate();
>>>
>> This seems to match the x86 code. My guess is they tried to prevent the cost
>> of the atomic operation if there is no chance mapcache_invalidate is true.
>>
>> I am split whether the first check is worth it. The atomic operation should be
>> uncontended most of the time, so it should be quick. But it will always be
>> slower than just a read because there is always a store involved.
> I am not a fan of optimizations with unclear benefits :-)
>
>
>> On a related topic, Jan pointed out that the invalidation would not work
>> properly if you have multiple vCPU modifying the P2M at the same time.
>>
Thanks to Julien, who explained all the bits in detail. Indeed, I followed how
it was done on x86 (the place where the invalidation request is sent, the
code that checks whether the flag is set, which at first glance appears
odd, etc.)
and the review comments (to latch current into a local variable, and to make
sure that a domain only sends the invalidation request on itself).
Regarding what to do if p2m->domain != current->domain in
p2m_free_entry(): probably we could set the flag only if the guest is paused,
otherwise just print a warning. Thoughts?
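
(For illustration only, a rough sketch of that idea in p2m_free_entry(),
assuming d->pause_count can be consulted directly here; not meant as final
code:)

    if ( domain_has_ioreq_server(p2m->domain) && p2m_is_ram(entry.p2m.type) )
    {
        /* Guest removing its own RAM, or a paused guest modified by a DM. */
        if ( (p2m->domain == current->domain) ||
             atomic_read(&p2m->domain->pause_count) )
            p2m->domain->mapcache_invalidate = true;
        else
            gprintk(XENLOG_WARNING,
                    "foreign p2m change on an unpaused domain, mapcache may be stale\n");
    }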


-- 
Regards,

Oleksandr Tyshchenko



^ permalink raw reply	[flat|nested] 127+ messages in thread

* Re: [PATCH V3 21/23] xen/arm: Add mapcache invalidation handling
  2020-12-11 11:21         ` Oleksandr
@ 2020-12-11 19:07           ` Stefano Stabellini
  2020-12-11 19:37             ` Julien Grall
  0 siblings, 1 reply; 127+ messages in thread
From: Stefano Stabellini @ 2020-12-11 19:07 UTC (permalink / raw)
  To: Oleksandr
  Cc: Stefano Stabellini, Julien Grall, xen-devel,
	Oleksandr Tyshchenko, Volodymyr Babchuk, Julien Grall

On Fri, 11 Dec 2020, Oleksandr wrote:
> On 11.12.20 03:28, Stefano Stabellini wrote:
> > On Thu, 10 Dec 2020, Julien Grall wrote:
> > > On 10/12/2020 02:30, Stefano Stabellini wrote:
> > > > On Mon, 30 Nov 2020, Oleksandr Tyshchenko wrote:
> > > > > From: Oleksandr Tyshchenko <oleksandr_tyshchenko@epam.com>
> > > > > 
> > > > > We need to send mapcache invalidation request to qemu/demu everytime
> > > > > the page gets removed from a guest.
> > > > > 
> > > > > At the moment, the Arm code doesn't explicitely remove the existing
> > > > > mapping before inserting the new mapping. Instead, this is done
> > > > > implicitely by __p2m_set_entry().
> > > > > 
> > > > > So we need to recognize a case when old entry is a RAM page *and*
> > > > > the new MFN is different in order to set the corresponding flag.
> > > > > The most suitable place to do this is p2m_free_entry(), there
> > > > > we can find the correct leaf type. The invalidation request
> > > > > will be sent in do_trap_hypercall() later on.
> > > > Why is it sent in do_trap_hypercall() ?
> > > I believe this is following the approach used by x86. There are actually
> > > some
> > > discussion about it (see [1]).
> > > 
> > > Leaving aside the toolstack case for now, AFAIK, the only way a guest can
> > > modify its p2m is via an hypercall. Do you have an example otherwise?
> > OK this is a very important assumption. We should write it down for sure.
> > I think it is true today on ARM.
> > 
> > 
> > > When sending the invalidation request, the vCPU will be blocked until all
> > > the
> > > IOREQ server have acknowledged the invalidation. So the hypercall seems to
> > > be
> > > the best position to do it.
> > > 
> > > Alternatively, we could use check_for_vcpu_work() to check if the mapcache
> > > needs to be invalidated. The inconvenience is we would execute a few more
> > > instructions in each entry/exit path.
> > Yeah it would be more natural to call it from check_for_vcpu_work(). If
> > we put it between #ifdef CONFIG_IOREQ_SERVER it wouldn't be bad. But I
> > am not a fan of increasing the instructions on the exit path either.
> >  From this point of view, putting it at the end of do_trap_hypercall is a
> > nice trick actually. Let's just make sure it has a good comment on top.
> > 
> > 
> > > > > Signed-off-by: Oleksandr Tyshchenko <oleksandr_tyshchenko@epam.com>
> > > > > CC: Julien Grall <julien.grall@arm.com>
> > > > > 
> > > > > ---
> > > > > Please note, this is a split/cleanup/hardening of Julien's PoC:
> > > > > "Add support for Guest IO forwarding to a device emulator"
> > > > > 
> > > > > Changes V1 -> V2:
> > > > >      - new patch, some changes were derived from (+ new explanation):
> > > > >        xen/ioreq: Make x86's invalidate qemu mapcache handling common
> > > > >      - put setting of the flag into __p2m_set_entry()
> > > > >      - clarify the conditions when the flag should be set
> > > > >      - use domain_has_ioreq_server()
> > > > >      - update do_trap_hypercall() by adding local variable
> > > > > 
> > > > > Changes V2 -> V3:
> > > > >      - update patch description
> > > > >      - move check to p2m_free_entry()
> > > > >      - add a comment
> > > > >      - use "curr" instead of "v" in do_trap_hypercall()
> > > > > ---
> > > > > ---
> > > > >    xen/arch/arm/p2m.c   | 24 ++++++++++++++++--------
> > > > >    xen/arch/arm/traps.c | 13 ++++++++++---
> > > > >    2 files changed, 26 insertions(+), 11 deletions(-)
> > > > > 
> > > > > diff --git a/xen/arch/arm/p2m.c b/xen/arch/arm/p2m.c
> > > > > index 5b8d494..9674f6f 100644
> > > > > --- a/xen/arch/arm/p2m.c
> > > > > +++ b/xen/arch/arm/p2m.c
> > > > > @@ -1,6 +1,7 @@
> > > > >    #include <xen/cpu.h>
> > > > >    #include <xen/domain_page.h>
> > > > >    #include <xen/iocap.h>
> > > > > +#include <xen/ioreq.h>
> > > > >    #include <xen/lib.h>
> > > > >    #include <xen/sched.h>
> > > > >    #include <xen/softirq.h>
> > > > > @@ -749,17 +750,24 @@ static void p2m_free_entry(struct p2m_domain
> > > > > *p2m,
> > > > >        if ( !p2m_is_valid(entry) )
> > > > >            return;
> > > > >    -    /* Nothing to do but updating the stats if the entry is a
> > > > > super-page. */
> > > > > -    if ( p2m_is_superpage(entry, level) )
> > > > > +    if ( p2m_is_superpage(entry, level) || (level == 3) )
> > > > >        {
> > > > > -        p2m->stats.mappings[level]--;
> > > > > -        return;
> > > > > -    }
> > > > > +#ifdef CONFIG_IOREQ_SERVER
> > > > > +        /*
> > > > > +         * If this gets called (non-recursively) then either the
> > > > > entry
> > > > > +         * was replaced by an entry with a different base (valid
> > > > > case) or
> > > > > +         * the shattering of a superpage was failed (error case).
> > > > > +         * So, at worst, the spurious mapcache invalidation might be
> > > > > sent.
> > > > > +         */
> > > > > +        if ( domain_has_ioreq_server(p2m->domain) &&
> > > > > +             (p2m->domain == current->domain) &&
> > > > > p2m_is_ram(entry.p2m.type) )
> > > > > +            p2m->domain->mapcache_invalidate = true;
> > > > Why the (p2m->domain == current->domain) check? Shouldn't we set
> > > > mapcache_invalidate to true anyway? What happens if p2m->domain !=
> > > > current->domain? We wouldn't want the domain to lose the
> > > > mapcache_invalidate notification.
> > > This is also discussed in [1]. :) The main question is why would a
> > > toolstack/device model modify the guest memory after boot?
> > > 
> > > If we assume it does, then the device model would need to pause the domain
> > > before modifying the RAM.
> > > 
> > > We also need to make sure that all the IOREQ servers have invalidated
> > > the mapcache before the domain run again.
> > > 
> > > This would require quite a bit of work. I am not sure the effort is worth
> > > if
> > > there are no active users today.
> > OK, that explains why we think p2m->domain == current->domain, but why
> > do we need to have a check for it right here?
> > 
> > In other words, we don't think it is realistic to get here with
> > p2m->domain != current->domain, but let's say that we do somehow. What's
> > the best course of action? Probably, set mapcache_invalidate to true and
> > possibly print a warning?
> > 
> > Leaving mapcache_invalidate to false doesn't seem to be what we want to
> > do?
> > 
> >   
> > > > >        BUILD_BUG_ON(NR_hypercalls < ARRAY_SIZE(arm_hypercall_table) );
> > > > >    @@ -1459,7 +1460,7 @@ static void do_trap_hypercall(struct
> > > > > cpu_user_regs
> > > > > *regs, register_t *nr,
> > > > >            return;
> > > > >        }
> > > > >    -    current->hcall_preempted = false;
> > > > > +    curr->hcall_preempted = false;
> > > > >          perfc_incra(hypercalls, *nr);
> > > > >        call = arm_hypercall_table[*nr].fn;
> > > > > @@ -1472,7 +1473,7 @@ static void do_trap_hypercall(struct
> > > > > cpu_user_regs
> > > > > *regs, register_t *nr,
> > > > >        HYPERCALL_RESULT_REG(regs) = call(HYPERCALL_ARGS(regs));
> > > > >      #ifndef NDEBUG
> > > > > -    if ( !current->hcall_preempted )
> > > > > +    if ( !curr->hcall_preempted )
> > > > >        {
> > > > >            /* Deliberately corrupt parameter regs used by this
> > > > > hypercall.
> > > > > */
> > > > >            switch ( arm_hypercall_table[*nr].nr_args ) {
> > > > > @@ -1489,8 +1490,14 @@ static void do_trap_hypercall(struct
> > > > > cpu_user_regs
> > > > > *regs, register_t *nr,
> > > > >    #endif
> > > > >          /* Ensure the hypercall trap instruction is re-executed. */
> > > > > -    if ( current->hcall_preempted )
> > > > > +    if ( curr->hcall_preempted )
> > > > >            regs->pc -= 4;  /* re-execute 'hvc #XEN_HYPERCALL_TAG' */
> > > > > +
> > > > > +#ifdef CONFIG_IOREQ_SERVER
> > > > > +    if ( unlikely(curr->domain->mapcache_invalidate) &&
> > > > > +         test_and_clear_bool(curr->domain->mapcache_invalidate) )
> > > > > +        ioreq_signal_mapcache_invalidate();
> > > > Why not just:
> > > > 
> > > > if ( unlikely(test_and_clear_bool(curr->domain->mapcache_invalidate)) )
> > > >       ioreq_signal_mapcache_invalidate();
> > > > 
> > > This seems to match the x86 code. My guess is they tried to prevent the
> > > cost
> > > of the atomic operation if there is no chance mapcache_invalidate is true.
> > > 
> > > I am split whether the first check is worth it. The atomic operation
> > > should be
> > > uncontended most of the time, so it should be quick. But it will always be
> > > slower than just a read because there is always a store involved.
> > I am not a fan of optimizations with unclear benefits :-)
> > 
> > 
> > > On a related topic, Jan pointed out that the invalidation would not work
> > > properly if you have multiple vCPU modifying the P2M at the same time.
> > > 
> Thanks to Julien, he explained all bits in detail. Indeed I followed how it
> was done on x86 (place where to send the invalidation request, the code to
> check whether the flag is set, which at first glance, appears odd, etc)
> and review comments (to latch current into the local variable, and make sure
> that domain sends invalidation request on itself).
> Regarding what to do if p2m->domain != current->domain in p2m_free_entry().
> Probably we could set flag only if guest is paused, otherwise just print a
> warning. Thoughts?

I'd do something like:

if ( domain_has_ioreq_server(p2m->domain) && p2m_is_ram(entry.p2m.type) )
{
    WARN_ON(p2m->domain != current->domain);
    p2m->domain->mapcache_invalidate = true;
}

but maybe Julien has a better idea.



^ permalink raw reply	[flat|nested] 127+ messages in thread

* Re: [PATCH V3 21/23] xen/arm: Add mapcache invalidation handling
  2020-12-11  1:28       ` Stefano Stabellini
  2020-12-11 11:21         ` Oleksandr
@ 2020-12-11 19:27         ` Julien Grall
  1 sibling, 0 replies; 127+ messages in thread
From: Julien Grall @ 2020-12-11 19:27 UTC (permalink / raw)
  To: Stefano Stabellini
  Cc: Oleksandr Tyshchenko, xen-devel, Oleksandr Tyshchenko,
	Volodymyr Babchuk, Julien Grall

Hi Stefano,

On 11/12/2020 01:28, Stefano Stabellini wrote:
>>>> Signed-off-by: Oleksandr Tyshchenko <oleksandr_tyshchenko@epam.com>
>>>> CC: Julien Grall <julien.grall@arm.com>
>>>>
>>>> ---
>>>> Please note, this is a split/cleanup/hardening of Julien's PoC:
>>>> "Add support for Guest IO forwarding to a device emulator"
>>>>
>>>> Changes V1 -> V2:
>>>>      - new patch, some changes were derived from (+ new explanation):
>>>>        xen/ioreq: Make x86's invalidate qemu mapcache handling common
>>>>      - put setting of the flag into __p2m_set_entry()
>>>>      - clarify the conditions when the flag should be set
>>>>      - use domain_has_ioreq_server()
>>>>      - update do_trap_hypercall() by adding local variable
>>>>
>>>> Changes V2 -> V3:
>>>>      - update patch description
>>>>      - move check to p2m_free_entry()
>>>>      - add a comment
>>>>      - use "curr" instead of "v" in do_trap_hypercall()
>>>> ---
>>>> ---
>>>>    xen/arch/arm/p2m.c   | 24 ++++++++++++++++--------
>>>>    xen/arch/arm/traps.c | 13 ++++++++++---
>>>>    2 files changed, 26 insertions(+), 11 deletions(-)
>>>>
>>>> diff --git a/xen/arch/arm/p2m.c b/xen/arch/arm/p2m.c
>>>> index 5b8d494..9674f6f 100644
>>>> --- a/xen/arch/arm/p2m.c
>>>> +++ b/xen/arch/arm/p2m.c
>>>> @@ -1,6 +1,7 @@
>>>>    #include <xen/cpu.h>
>>>>    #include <xen/domain_page.h>
>>>>    #include <xen/iocap.h>
>>>> +#include <xen/ioreq.h>
>>>>    #include <xen/lib.h>
>>>>    #include <xen/sched.h>
>>>>    #include <xen/softirq.h>
>>>> @@ -749,17 +750,24 @@ static void p2m_free_entry(struct p2m_domain *p2m,
>>>>        if ( !p2m_is_valid(entry) )
>>>>            return;
>>>>    -    /* Nothing to do but updating the stats if the entry is a
>>>> super-page. */
>>>> -    if ( p2m_is_superpage(entry, level) )
>>>> +    if ( p2m_is_superpage(entry, level) || (level == 3) )
>>>>        {
>>>> -        p2m->stats.mappings[level]--;
>>>> -        return;
>>>> -    }
>>>> +#ifdef CONFIG_IOREQ_SERVER
>>>> +        /*
>>>> +         * If this gets called (non-recursively) then either the entry
>>>> +         * was replaced by an entry with a different base (valid case) or
>>>> +         * the shattering of a superpage was failed (error case).
>>>> +         * So, at worst, the spurious mapcache invalidation might be
>>>> sent.
>>>> +         */
>>>> +        if ( domain_has_ioreq_server(p2m->domain) &&

Hmmm... I didn't realize that you were going to call 
domain_has_ioreq_server() here. Per your comment, this can only be 
called when p2m->domain == current->domain.

One way would be to switch the two checks. However, I am not entirely 
sure this is necessary. I see no issue with always setting 
mapcache_invalidate even if there are no IOREQ servers available.
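
IOW, just a sketch of the simplification (nothing more than dropping the
first check):

    if ( (p2m->domain == current->domain) && p2m_is_ram(entry.p2m.type) )
        p2m->domain->mapcache_invalidate = true;

The flag would then simply be a no-op on the hypercall path for a domain
without any IOREQ server.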

>>>> +             (p2m->domain == current->domain) &&
>>>> p2m_is_ram(entry.p2m.type) )
>>>> +            p2m->domain->mapcache_invalidate = true;
>>>
>>> Why the (p2m->domain == current->domain) check? Shouldn't we set
>>> mapcache_invalidate to true anyway? What happens if p2m->domain !=
>>> current->domain? We wouldn't want the domain to lose the
>>> mapcache_invalidate notification.
>>
>> This is also discussed in [1]. :) The main question is why would a
>> toolstack/device model modify the guest memory after boot?
>>
>> If we assume it does, then the device model would need to pause the domain
>> before modifying the RAM.
>>
>> We also need to make sure that all the IOREQ servers have invalidated
>> the mapcache before the domain run again.
>>
>> This would require quite a bit of work. I am not sure the effort is worth if
>> there are no active users today.
> 
> OK, that explains why we think p2m->domain == current->domain, but why
> do we need to have a check for it right here?
> 
> In other words, we don't think it is realistic to get here with
> p2m->domain != current->domain, but let's say that we do somehow.

I am guessing that by "here" you mean the situation where a RAM entry 
would be removed. Is that correct? If so, yes, I don't believe this should 
happen today (even at domain creation/destruction).

> What's
> the best course of action?

The best course of action would be to forward the invalidation to *all* 
the IOREQ servers and wait for it to complete before the domain can run again.

> Probably, set mapcache_invalidate to true and
> possibly print a warning?
So if the toolstack (or an IOREQ server) ends up using it, then we need 
to make sure all the IOREQ servers have invalidated the mapcache before 
the domain can run again.

> 
> Leaving mapcache_invalidate to false doesn't seem to be what we want to
> do?

Setting it to true/false is not going to be very helpful because the guest 
may never issue a hypercall.

Without any more work, the guest may get corrupted. So I would suggest 
either preventing the P2M from being modified after the domain has been 
created and before it is destroyed (more of a stopgap) or fixing it properly.
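
Roughly, the "proper" fix would have to look something like this (a very
hand-wavy sketch; ioreq_signal_mapcache_invalidate_domain() is a hypothetical
variant that targets d rather than the current domain):

    /* Foreign p2m change: make sure no stale mapping survives in any DM. */
    domain_pause(d);
    d->mapcache_invalidate = true;
    ioreq_signal_mapcache_invalidate_domain(d);
    /* ... wait for every IOREQ server of d to ack the invalidation ... */
    domain_unpause(d);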

>>>>        BUILD_BUG_ON(NR_hypercalls < ARRAY_SIZE(arm_hypercall_table) );
>>>>    @@ -1459,7 +1460,7 @@ static void do_trap_hypercall(struct cpu_user_regs
>>>> *regs, register_t *nr,
>>>>            return;
>>>>        }
>>>>    -    current->hcall_preempted = false;
>>>> +    curr->hcall_preempted = false;
>>>>          perfc_incra(hypercalls, *nr);
>>>>        call = arm_hypercall_table[*nr].fn;
>>>> @@ -1472,7 +1473,7 @@ static void do_trap_hypercall(struct cpu_user_regs
>>>> *regs, register_t *nr,
>>>>        HYPERCALL_RESULT_REG(regs) = call(HYPERCALL_ARGS(regs));
>>>>      #ifndef NDEBUG
>>>> -    if ( !current->hcall_preempted )
>>>> +    if ( !curr->hcall_preempted )
>>>>        {
>>>>            /* Deliberately corrupt parameter regs used by this hypercall.
>>>> */
>>>>            switch ( arm_hypercall_table[*nr].nr_args ) {
>>>> @@ -1489,8 +1490,14 @@ static void do_trap_hypercall(struct cpu_user_regs
>>>> *regs, register_t *nr,
>>>>    #endif
>>>>          /* Ensure the hypercall trap instruction is re-executed. */
>>>> -    if ( current->hcall_preempted )
>>>> +    if ( curr->hcall_preempted )
>>>>            regs->pc -= 4;  /* re-execute 'hvc #XEN_HYPERCALL_TAG' */
>>>> +
>>>> +#ifdef CONFIG_IOREQ_SERVER
>>>> +    if ( unlikely(curr->domain->mapcache_invalidate) &&
>>>> +         test_and_clear_bool(curr->domain->mapcache_invalidate) )
>>>> +        ioreq_signal_mapcache_invalidate();
>>>
>>> Why not just:
>>>
>>> if ( unlikely(test_and_clear_bool(curr->domain->mapcache_invalidate)) )
>>>       ioreq_signal_mapcache_invalidate();
>>>
>>
>> This seems to match the x86 code. My guess is they tried to prevent the cost
>> of the atomic operation if there is no chance mapcache_invalidate is true.
>>
>> I am split whether the first check is worth it. The atomic operation should be
>> uncontended most of the time, so it should be quick. But it will always be
>> slower than just a read because there is always a store involved.
> 
> I am not a fan of optimizations with unclear benefits :-)

I thought a bit more about it and I am actually leaning towards keeping 
the first check.

The common implementation of the hypercall path is mostly (if not all) 
accessing per-vCPU variables. So the hypercalls can mostly work 
independently (at least in the common part).

Assuming we drop the first check, we would now be writing to a 
per-domain variable on every hypercall. You are probably going to notice 
a performance impact if you benchmark concurrent no-op 
hypercalls, because the cache line is going to bounce (because of the write).

Arguably, this may become noise if you execute a full hypercall. But I 
would still like to treat the common hypercall path as the entry/exit 
path. IOW, I would like to be careful in what we add there.

The main reason is that hypercalls may be used quite a lot by PV backends or 
device emulators (if we think about Virtio).

If we decided to move the change to the entry/exit path, then it would
definitely be an issue for the reasons I explained above. So I would 
also like to avoid the write to a shared variable if we can.

FAOD, I am not saying this optimization will save the world :). I am sure 
there will be more (in particular in the vGIC part) in order to get 
Virtio performance on par with PV backends on Xen.

This discussion would also be moot if ...

> 
>> On a related topic, Jan pointed out that the invalidation would not work
>> properly if you have multiple vCPU modifying the P2M at the same time.

... we had a per-vCPU flag instead.
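
Something along these lines (the field name is hypothetical, only to
illustrate the idea; the series currently keeps the flag per-domain):

    /* p2m_free_entry(): only the current vCPU's flag is written. */
    if ( (p2m->domain == current->domain) && p2m_is_ram(entry.p2m.type) )
        current->mapcache_invalidate = true;

    /* do_trap_hypercall(): no shared cache line is touched on the fast path. */
    if ( unlikely(curr->mapcache_invalidate) )
    {
        curr->mapcache_invalidate = false;
        ioreq_signal_mapcache_invalidate();
    }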

Cheers,

-- 
Julien Grall


^ permalink raw reply	[flat|nested] 127+ messages in thread

* Re: [PATCH V3 21/23] xen/arm: Add mapcache invalidation handling
  2020-12-11 19:07           ` Stefano Stabellini
@ 2020-12-11 19:37             ` Julien Grall
  0 siblings, 0 replies; 127+ messages in thread
From: Julien Grall @ 2020-12-11 19:37 UTC (permalink / raw)
  To: Stefano Stabellini, Oleksandr
  Cc: xen-devel, Oleksandr Tyshchenko, Volodymyr Babchuk, Julien Grall



On 11/12/2020 19:07, Stefano Stabellini wrote:
> On Fri, 11 Dec 2020, Oleksandr wrote:
>> On 11.12.20 03:28, Stefano Stabellini wrote:
>>> On Thu, 10 Dec 2020, Julien Grall wrote:
>>>> On 10/12/2020 02:30, Stefano Stabellini wrote:
>>>>> On Mon, 30 Nov 2020, Oleksandr Tyshchenko wrote:
>>>>>> From: Oleksandr Tyshchenko <oleksandr_tyshchenko@epam.com>
>>>>>>
>>>>>> We need to send mapcache invalidation request to qemu/demu everytime
>>>>>> the page gets removed from a guest.
>>>>>>
>>>>>> At the moment, the Arm code doesn't explicitely remove the existing
>>>>>> mapping before inserting the new mapping. Instead, this is done
>>>>>> implicitely by __p2m_set_entry().
>>>>>>
>>>>>> So we need to recognize a case when old entry is a RAM page *and*
>>>>>> the new MFN is different in order to set the corresponding flag.
>>>>>> The most suitable place to do this is p2m_free_entry(), there
>>>>>> we can find the correct leaf type. The invalidation request
>>>>>> will be sent in do_trap_hypercall() later on.
>>>>> Why is it sent in do_trap_hypercall() ?
>>>> I believe this is following the approach used by x86. There are actually
>>>> some
>>>> discussion about it (see [1]).
>>>>
>>>> Leaving aside the toolstack case for now, AFAIK, the only way a guest can
>>>> modify its p2m is via an hypercall. Do you have an example otherwise?
>>> OK this is a very important assumption. We should write it down for sure.
>>> I think it is true today on ARM.
>>>
>>>
>>>> When sending the invalidation request, the vCPU will be blocked until all
>>>> the
>>>> IOREQ server have acknowledged the invalidation. So the hypercall seems to
>>>> be
>>>> the best position to do it.
>>>>
>>>> Alternatively, we could use check_for_vcpu_work() to check if the mapcache
>>>> needs to be invalidated. The inconvenience is we would execute a few more
>>>> instructions in each entry/exit path.
>>> Yeah it would be more natural to call it from check_for_vcpu_work(). If
>>> we put it between #ifdef CONFIG_IOREQ_SERVER it wouldn't be bad. But I
>>> am not a fan of increasing the instructions on the exit path either.
>>>   From this point of view, putting it at the end of do_trap_hypercall is a
>>> nice trick actually. Let's just make sure it has a good comment on top.
>>>
>>>
>>>>>> Signed-off-by: Oleksandr Tyshchenko <oleksandr_tyshchenko@epam.com>
>>>>>> CC: Julien Grall <julien.grall@arm.com>
>>>>>>
>>>>>> ---
>>>>>> Please note, this is a split/cleanup/hardening of Julien's PoC:
>>>>>> "Add support for Guest IO forwarding to a device emulator"
>>>>>>
>>>>>> Changes V1 -> V2:
>>>>>>       - new patch, some changes were derived from (+ new explanation):
>>>>>>         xen/ioreq: Make x86's invalidate qemu mapcache handling common
>>>>>>       - put setting of the flag into __p2m_set_entry()
>>>>>>       - clarify the conditions when the flag should be set
>>>>>>       - use domain_has_ioreq_server()
>>>>>>       - update do_trap_hypercall() by adding local variable
>>>>>>
>>>>>> Changes V2 -> V3:
>>>>>>       - update patch description
>>>>>>       - move check to p2m_free_entry()
>>>>>>       - add a comment
>>>>>>       - use "curr" instead of "v" in do_trap_hypercall()
>>>>>> ---
>>>>>> ---
>>>>>>     xen/arch/arm/p2m.c   | 24 ++++++++++++++++--------
>>>>>>     xen/arch/arm/traps.c | 13 ++++++++++---
>>>>>>     2 files changed, 26 insertions(+), 11 deletions(-)
>>>>>>
>>>>>> diff --git a/xen/arch/arm/p2m.c b/xen/arch/arm/p2m.c
>>>>>> index 5b8d494..9674f6f 100644
>>>>>> --- a/xen/arch/arm/p2m.c
>>>>>> +++ b/xen/arch/arm/p2m.c
>>>>>> @@ -1,6 +1,7 @@
>>>>>>     #include <xen/cpu.h>
>>>>>>     #include <xen/domain_page.h>
>>>>>>     #include <xen/iocap.h>
>>>>>> +#include <xen/ioreq.h>
>>>>>>     #include <xen/lib.h>
>>>>>>     #include <xen/sched.h>
>>>>>>     #include <xen/softirq.h>
>>>>>> @@ -749,17 +750,24 @@ static void p2m_free_entry(struct p2m_domain
>>>>>> *p2m,
>>>>>>         if ( !p2m_is_valid(entry) )
>>>>>>             return;
>>>>>>     -    /* Nothing to do but updating the stats if the entry is a
>>>>>> super-page. */
>>>>>> -    if ( p2m_is_superpage(entry, level) )
>>>>>> +    if ( p2m_is_superpage(entry, level) || (level == 3) )
>>>>>>         {
>>>>>> -        p2m->stats.mappings[level]--;
>>>>>> -        return;
>>>>>> -    }
>>>>>> +#ifdef CONFIG_IOREQ_SERVER
>>>>>> +        /*
>>>>>> +         * If this gets called (non-recursively) then either the
>>>>>> entry
>>>>>> +         * was replaced by an entry with a different base (valid
>>>>>> case) or
>>>>>> +         * the shattering of a superpage was failed (error case).
>>>>>> +         * So, at worst, the spurious mapcache invalidation might be
>>>>>> sent.
>>>>>> +         */
>>>>>> +        if ( domain_has_ioreq_server(p2m->domain) &&
>>>>>> +             (p2m->domain == current->domain) &&
>>>>>> p2m_is_ram(entry.p2m.type) )
>>>>>> +            p2m->domain->mapcache_invalidate = true;
>>>>> Why the (p2m->domain == current->domain) check? Shouldn't we set
>>>>> mapcache_invalidate to true anyway? What happens if p2m->domain !=
>>>>> current->domain? We wouldn't want the domain to lose the
>>>>> mapcache_invalidate notification.
>>>> This is also discussed in [1]. :) The main question is why would a
>>>> toolstack/device model modify the guest memory after boot?
>>>>
>>>> If we assume it does, then the device model would need to pause the domain
>>>> before modifying the RAM.
>>>>
>>>> We also need to make sure that all the IOREQ servers have invalidated
>>>> the mapcache before the domain run again.
>>>>
>>>> This would require quite a bit of work. I am not sure the effort is worth
>>>> if
>>>> there are no active users today.
>>> OK, that explains why we think p2m->domain == current->domain, but why
>>> do we need to have a check for it right here?
>>>
>>> In other words, we don't think it is realistic to get here with
>>> p2m->domain != current->domain, but let's say that we do somehow. What's
>>> the best course of action? Probably, set mapcache_invalidate to true and
>>> possibly print a warning?
>>>
>>> Leaving mapcache_invalidate to false doesn't seem to be what we want to
>>> do?
>>>
>>>    
>>>>>>         BUILD_BUG_ON(NR_hypercalls < ARRAY_SIZE(arm_hypercall_table) );
>>>>>>     @@ -1459,7 +1460,7 @@ static void do_trap_hypercall(struct
>>>>>> cpu_user_regs
>>>>>> *regs, register_t *nr,
>>>>>>             return;
>>>>>>         }
>>>>>>     -    current->hcall_preempted = false;
>>>>>> +    curr->hcall_preempted = false;
>>>>>>           perfc_incra(hypercalls, *nr);
>>>>>>         call = arm_hypercall_table[*nr].fn;
>>>>>> @@ -1472,7 +1473,7 @@ static void do_trap_hypercall(struct
>>>>>> cpu_user_regs
>>>>>> *regs, register_t *nr,
>>>>>>         HYPERCALL_RESULT_REG(regs) = call(HYPERCALL_ARGS(regs));
>>>>>>       #ifndef NDEBUG
>>>>>> -    if ( !current->hcall_preempted )
>>>>>> +    if ( !curr->hcall_preempted )
>>>>>>         {
>>>>>>             /* Deliberately corrupt parameter regs used by this
>>>>>> hypercall.
>>>>>> */
>>>>>>             switch ( arm_hypercall_table[*nr].nr_args ) {
>>>>>> @@ -1489,8 +1490,14 @@ static void do_trap_hypercall(struct
>>>>>> cpu_user_regs
>>>>>> *regs, register_t *nr,
>>>>>>     #endif
>>>>>>           /* Ensure the hypercall trap instruction is re-executed. */
>>>>>> -    if ( current->hcall_preempted )
>>>>>> +    if ( curr->hcall_preempted )
>>>>>>             regs->pc -= 4;  /* re-execute 'hvc #XEN_HYPERCALL_TAG' */
>>>>>> +
>>>>>> +#ifdef CONFIG_IOREQ_SERVER
>>>>>> +    if ( unlikely(curr->domain->mapcache_invalidate) &&
>>>>>> +         test_and_clear_bool(curr->domain->mapcache_invalidate) )
>>>>>> +        ioreq_signal_mapcache_invalidate();
>>>>> Why not just:
>>>>>
>>>>> if ( unlikely(test_and_clear_bool(curr->domain->mapcache_invalidate)) )
>>>>>        ioreq_signal_mapcache_invalidate();
>>>>>
>>>> This seems to match the x86 code. My guess is they tried to prevent the
>>>> cost
>>>> of the atomic operation if there is no chance mapcache_invalidate is true.
>>>>
>>>> I am split whether the first check is worth it. The atomic operation
>>>> should be
>>>> uncontended most of the time, so it should be quick. But it will always be
>>>> slower than just a read because there is always a store involved.
>>> I am not a fan of optimizations with unclear benefits :-)
>>>
>>>
>>>> On a related topic, Jan pointed out that the invalidation would not work
>>>> properly if you have multiple vCPU modifying the P2M at the same time.
>>>>
>> Thanks to Julien, he explained all bits in detail. Indeed I followed how it
>> was done on x86 (place where to send the invalidation request, the code to
>> check whether the flag is set, which at first glance, appears odd, etc)
>> and review comments (to latch current into the local variable, and make sure
>> that domain sends invalidation request on itself).
>> Regarding what to do if p2m->domain != current->domain in p2m_free_entry().
>> Probably we could set flag only if guest is paused, otherwise just print a
>> warning. Thoughts?
> 
> I'd do something like:
> 
> if ( domain_has_ioreq_server(p2m->domain) && p2m_is_ram(entry.p2m.type) )
> {
>      WARN_ON(p2m->domain != current->domain)

IOREQ servers are not trusted. Yet they will be able to reach this path 
if one re-uses the stubdomain model (they are allowed to modify the guest 
layout).

So this change would hand a DoS attack to the IOREQ server on a silver 
platter :).

In general, we should avoid using WARN_ON() for things that can be 
triggered by a domain. Instead we should use gprintk(XENLOG_WARNING, 
"...") to allow rate-limiting.

On the cons side, it would be more difficult to spot any misuse with a 
gprintk().
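
I.e. roughly (only a sketch of the rate-limited variant; the message text
is made up):

    if ( unlikely(p2m->domain != current->domain) )
        gprintk(XENLOG_WARNING,
                "p2m change for %pd by a foreign domain, mapcache may be stale\n",
                p2m->domain);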

>      p2m->domain->mapcache_invalidate = true;
> }
> 
> but maybe Julien has a better idea.

I suggested a different approach and some rationale in my answer to your 
e-mail, although I am not sure we could call it a better approach 
:). We can continue the discussion there.

Cheers,

-- 
Julien Grall


^ permalink raw reply	[flat|nested] 127+ messages in thread

* Re:
  2020-11-30 16:21 ` Alex Bennée
  2020-11-30 22:22   ` [PATCH V3 00/23] IOREQ feature (+ virtio-mmio) on Arm Oleksandr
@ 2020-12-29 15:32   ` Roger Pau Monné
  1 sibling, 0 replies; 127+ messages in thread
From: Roger Pau Monné @ 2020-12-29 15:32 UTC (permalink / raw)
  To: Alex Bennée
  Cc: Oleksandr Tyshchenko, xen-devel, Oleksandr Tyshchenko,
	Paul Durrant, Jan Beulich, Andrew Cooper, Wei Liu, Julien Grall,
	George Dunlap, Ian Jackson, Julien Grall, Stefano Stabellini,
	Tim Deegan, Daniel De Graaf, Volodymyr Babchuk, Jun Nakajima,
	Kevin Tian, Anthony PERARD, Bertrand Marquis, Wei Chen, Kaly Xin,
	Artem Mygaiev

On Mon, Nov 30, 2020 at 04:21:59PM +0000, Alex Bennée wrote:
> 
> Oleksandr Tyshchenko <olekstysh@gmail.com> writes:
> 
> > From: Oleksandr Tyshchenko <oleksandr_tyshchenko@epam.com>
> >
> >
> > Date: Sat, 28 Nov 2020 22:33:51 +0200
> > Subject: [PATCH V3 00/23] IOREQ feature (+ virtio-mmio) on Arm
> > MIME-Version: 1.0
> > Content-Type: text/plain; charset=UTF-8
> > Content-Transfer-Encoding: 8bit
> >
> > Hello all.
> >
> > The purpose of this patch series is to add IOREQ/DM support to Xen on Arm.
> > You can find an initial discussion at [1] and RFC/V1/V2 series at [2]/[3]/[4].
> > Xen on Arm requires some implementation to forward guest MMIO access to a device
> > model in order to implement virtio-mmio backend or even mediator outside of hypervisor.
> > As Xen on x86 already contains required support this series tries to make it common
> > and introduce Arm specific bits plus some new functionality. Patch series is based on
> > Julien's PoC "xen/arm: Add support for Guest IO forwarding to a device emulator".
> > Besides splitting existing IOREQ/DM support and introducing Arm side, the series
> > also includes virtio-mmio related changes (last 2 patches for toolstack)
> > for the reviewers to be able to see how the whole picture could look
> > like.
> 
> Thanks for posting the latest version.
> 
> >
> > According to the initial discussion there are a few open questions/concerns
> > regarding security, performance in VirtIO solution:
> > 1. virtio-mmio vs virtio-pci, SPI vs MSI, different use-cases require different
> >    transport...
> 
> I think I'm repeating things here I've said in various ephemeral video
> chats over the last few weeks but I should probably put things down on
> the record.
> 
> I think the original intention of the virtio framers is advanced
> features would build on virtio-pci because you get a bunch of things
> "for free" - notably enumeration and MSI support. There is assumption
> that by the time you add these features to virtio-mmio you end up
> re-creating your own less well tested version of virtio-pci. I've not
> been terribly convinced by the argument that the guest implementation of
> PCI presents a sufficiently large blob of code to make the simpler MMIO
> desirable. My attempts to build two virtio kernels (PCI/MMIO) with
> otherwise the same devices weren't terribly conclusive either way.
> 
> That said virtio-mmio still has life in it because the cloudy slimmed
> down guests moved to using it because the enumeration of PCI is a road
> block to their fast boot up requirements. I'm sure they would also
> appreciate a MSI implementation to reduce the overhead that handling
> notifications currently has on trap-and-emulate.
> 
> AIUI for Xen the other downside to PCI is you would have to emulate it
> in the hypervisor which would be additional code at the most privileged
> level.

Xen already emulates (or maybe it would be better to say decodes) PCI
accesses on the hypervisor and forwards them to the appropriate device
model using the IOREQ interface, so that's not something new. It's
not really emulating the PCI config space, but just detecting accesses
and forwarding them to the device model that should handle them.

You can register different emulators in user space that handle
accesses to different PCI devices from a guest.
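
For instance, a user space device model would typically do something along
these lines via libxendevicemodel (a rough sketch from memory, error
handling omitted; the BDF 0000:00:03.0 is just an example):

    xendevicemodel_handle *dmod = xendevicemodel_open(NULL, 0);
    ioservid_t id;

    /* Create an IOREQ server and claim the PCI device it emulates. */
    xendevicemodel_create_ioreq_server(dmod, domid, HVM_IOREQSRV_BUFIOREQ_OFF, &id);
    xendevicemodel_map_pcidev_to_ioreq_server(dmod, domid, id, 0, 0, 3, 0);
    xendevicemodel_set_ioreq_server_state(dmod, domid, id, 1);

Config space accesses to that device are then forwarded only to this server.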

Thanks, Roger.


^ permalink raw reply	[flat|nested] 127+ messages in thread

* Re: [PATCH V3 09/23] xen/dm: Make x86's DM feature common
  2020-12-08 14:54         ` Oleksandr
@ 2021-01-07 14:38           ` Oleksandr
  2021-01-07 15:01             ` Jan Beulich
  0 siblings, 1 reply; 127+ messages in thread
From: Oleksandr @ 2021-01-07 14:38 UTC (permalink / raw)
  To: Jan Beulich
  Cc: Julien Grall, Andrew Cooper, Roger Pau Monné,
	Wei Liu, George Dunlap, Ian Jackson, Julien Grall,
	Stefano Stabellini, Daniel De Graaf, Oleksandr Tyshchenko,
	xen-devel


Hi Jan


>>>> On 30.11.2020 11:31, Oleksandr Tyshchenko wrote:
>>>>> From: Julien Grall <julien.grall@arm.com>
>>>>>
>>>>> As a lot of x86 code can be re-used on Arm later on, this patch
>>>>> splits devicemodel support into common and arch specific parts.
>>>>>
>>>>> The common DM feature is supposed to be built with IOREQ_SERVER
>>>>> option enabled (as well as the IOREQ feature), which is selected
>>>>> for x86's config HVM for now.
>>>>>
>>>>> Also update XSM code a bit to let DM op be used on Arm.
>>>>>
>>>>> This support is going to be used on Arm to be able run device
>>>>> emulator outside of Xen hypervisor.
>>>>>
>>>>> Signed-off-by: Julien Grall <julien.grall@arm.com>
>>>>> Signed-off-by: Oleksandr Tyshchenko <oleksandr_tyshchenko@epam.com>
>>>>>
>>>>> ---
>>>>> Please note, this is a split/cleanup/hardening of Julien's PoC:
>>>>> "Add support for Guest IO forwarding to a device emulator"
>>>>>
>>>>> Changes RFC -> V1:
>>>>>      - update XSM, related changes were pulled from:
>>>>>        [RFC PATCH V1 04/12] xen/arm: Introduce arch specific bits 
>>>>> for IOREQ/DM features
>>>>>
>>>>> Changes V1 -> V2:
>>>>>      - update the author of a patch
>>>>>      - update patch description
>>>>>      - introduce xen/dm.h and move definitions here
>>>>>
>>>>> Changes V2 -> V3:
>>>>>      - no changes
>>>> And my concern regarding the common vs arch nesting also hasn't
>>>> changed.
>>>
>>> I am sorry, I might misread your comment, but I failed to see any
>>> obvious to me request(s) for changes.
>>> I have just re-read previous discussion...
>>> So the question about considering doing it the other way around (top
>>> level dm-op handling arch-specific
>>> and call into e.g. ioreq_server_dm_op() for otherwise unhandled ops) is
>>> exactly a concern which I should have addressed?
>> Well, on v2 you replied you didn't consider the alternative. I would
>> have expected that you would at least go through this consideration
>> process, and see whether there are better reasons to stick with the
>> apparently backwards arrangement than to change to the more
>> conventional one. If there are such reasons, I would expect them to
>> be called out in reply and perhaps also in the commit message; the
>> latter because down the road more people may wonder why the more
>> narrow / special set of cases gets handled at a higher layer than
>> the wider set of remaining ones, and they would then be able to find
>> an explanation without having to resort to searching through list
>> archives.
> Ah, will investigate. Sorry for not paying enough attention to it.
> Yes, IOREQ (I mean "common") ops are 7 out of 18 right now. The 
> subsequent patch is adding one more DM op - XEN_DMOP_set_irq_level.
> There are several PCI related ops which might want to be common in the 
> future (but I am not sure).
I think I can say that I have considered the alternative (doing it the 
other way around), assuming I got your suggestion for V2 correctly.
I agree the alternative is more natural; also, compat_dm_op() was left in 
x86 code. For me the downside is the code duplication. With the 
alternative both arches have to duplicate do_dm_op() and the "common" part 
of dm_op() (only the "switch ( op.op )" is unique). Probably do_dm_op() 
could be moved to common dm.c, and dm_op() could become global...


Now the question is which approach to take ("current" or "alternative") 
for me to prepare for V4. Personally, I would be OK with both (with 
a slight preference for "alternative").
Also, if we decide to go with the alternative, should the common files 
still be named dm.*?


diff --git a/xen/arch/x86/hvm/dm.c b/xen/arch/x86/hvm/dm.c
index d3e2a9e..dc8e47d 100644
--- a/xen/arch/x86/hvm/dm.c
+++ b/xen/arch/x86/hvm/dm.c
@@ -16,6 +16,7 @@

  #include <xen/event.h>
  #include <xen/guest_access.h>
+#include <xen/dm.h>
  #include <xen/hypercall.h>
  #include <xen/ioreq.h>
  #include <xen/nospec.h>
@@ -29,13 +30,6 @@

  #include <public/hvm/hvm_op.h>

-struct dmop_args {
-    domid_t domid;
-    unsigned int nr_bufs;
-    /* Reserve enough buf elements for all current hypercalls. */
-    struct xen_dm_op_buf buf[2];
-};
-
  static bool _raw_copy_from_guest_buf_offset(void *dst,
                                              const struct dmop_args *args,
                                              unsigned int buf_idx,
@@ -408,71 +402,6 @@ static int dm_op(const struct dmop_args *op_args)

      switch ( op.op )
      {
-    case XEN_DMOP_create_ioreq_server:
-    {
-        struct xen_dm_op_create_ioreq_server *data =
-            &op.u.create_ioreq_server;
-
-        const_op = false;
-
-        rc = -EINVAL;
-        if ( data->pad[0] || data->pad[1] || data->pad[2] )
-            break;
-
-        rc = hvm_create_ioreq_server(d, data->handle_bufioreq,
-                                     &data->id);
-        break;
-    }
-
-    case XEN_DMOP_get_ioreq_server_info:
-    {
-        struct xen_dm_op_get_ioreq_server_info *data =
-            &op.u.get_ioreq_server_info;
-        const uint16_t valid_flags = XEN_DMOP_no_gfns;
-
-        const_op = false;
-
-        rc = -EINVAL;
-        if ( data->flags & ~valid_flags )
-            break;
-
-        rc = hvm_get_ioreq_server_info(d, data->id,
-                                       (data->flags & XEN_DMOP_no_gfns) ?
-                                       NULL : &data->ioreq_gfn,
-                                       (data->flags & XEN_DMOP_no_gfns) ?
-                                       NULL : &data->bufioreq_gfn,
-                                       &data->bufioreq_port);
-        break;
-    }
-
-    case XEN_DMOP_map_io_range_to_ioreq_server:
-    {
-        const struct xen_dm_op_ioreq_server_range *data =
-            &op.u.map_io_range_to_ioreq_server;
-
-        rc = -EINVAL;
-        if ( data->pad )
-            break;
-
-        rc = hvm_map_io_range_to_ioreq_server(d, data->id, data->type,
-                                              data->start, data->end);
-        break;
-    }
-
-    case XEN_DMOP_unmap_io_range_from_ioreq_server:
-    {
-        const struct xen_dm_op_ioreq_server_range *data =
-            &op.u.unmap_io_range_from_ioreq_server;
-
-        rc = -EINVAL;
-        if ( data->pad )
-            break;
-
-        rc = hvm_unmap_io_range_from_ioreq_server(d, data->id, data->type,
-                                                  data->start, data->end);
-        break;
-    }
-
      case XEN_DMOP_map_mem_type_to_ioreq_server:
      {
          struct xen_dm_op_map_mem_type_to_ioreq_server *data =
@@ -523,32 +452,6 @@ static int dm_op(const struct dmop_args *op_args)
          break;
      }

-    case XEN_DMOP_set_ioreq_server_state:
-    {
-        const struct xen_dm_op_set_ioreq_server_state *data =
-            &op.u.set_ioreq_server_state;
-
-        rc = -EINVAL;
-        if ( data->pad )
-            break;
-
-        rc = hvm_set_ioreq_server_state(d, data->id, !!data->enabled);
-        break;
-    }
-
-    case XEN_DMOP_destroy_ioreq_server:
-    {
-        const struct xen_dm_op_destroy_ioreq_server *data =
-            &op.u.destroy_ioreq_server;
-
-        rc = -EINVAL;
-        if ( data->pad )
-            break;
-
-        rc = hvm_destroy_ioreq_server(d, data->id);
-        break;
-    }
-
      case XEN_DMOP_track_dirty_vram:
      {
          const struct xen_dm_op_track_dirty_vram *data =
@@ -703,7 +606,7 @@ static int dm_op(const struct dmop_args *op_args)
      }

      default:
-        rc = -EOPNOTSUPP;
+        rc = ioreq_server_dm_op(&op, d, &const_op);
          break;
      }

diff --git a/xen/common/Makefile b/xen/common/Makefile
index b161381..8110431 100644
--- a/xen/common/Makefile
+++ b/xen/common/Makefile
@@ -5,6 +5,7 @@ obj-$(CONFIG_CORE_PARKING) += core_parking.o
  obj-y += cpu.o
  obj-$(CONFIG_DEBUG_TRACE) += debugtrace.o
  obj-$(CONFIG_HAS_DEVICE_TREE) += device_tree.o
+obj-$(CONFIG_IOREQ_SERVER) += dm.o
  obj-y += domain.o
  obj-y += event_2l.o
  obj-y += event_channel.o
diff --git a/xen/common/dm.c b/xen/common/dm.c
new file mode 100644
index 0000000..5653bcd
--- /dev/null
+++ b/xen/common/dm.c
@@ -0,0 +1,135 @@
+/*
+ * Copyright (c) 2016 Citrix Systems Inc.
+ * Copyright (c) 2019 Arm ltd.
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms and conditions of the GNU General Public License,
+ * version 2, as published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope it will be useful, but WITHOUT
+ * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
+ * FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public 
License for
+ * more details.
+ *
+ * You should have received a copy of the GNU General Public License along with
+ * this program; If not, see <http://www.gnu.org/licenses/>.
+ */
+
+#include <xen/dm.h>
+#include <xen/hypercall.h>
+#include <xen/ioreq.h>
+
+int ioreq_server_dm_op(struct xen_dm_op *op, struct domain *d, bool *const_op)
+{
+    long rc;
+
+    switch ( op->op )
+    {
+    case XEN_DMOP_create_ioreq_server:
+    {
+        struct xen_dm_op_create_ioreq_server *data =
+            &op->u.create_ioreq_server;
+
+        *const_op = false;
+
+        rc = -EINVAL;
+        if ( data->pad[0] || data->pad[1] || data->pad[2] )
+            break;
+
+        rc = hvm_create_ioreq_server(d, data->handle_bufioreq,
+                                     &data->id);
+        break;
+    }
+
+    case XEN_DMOP_get_ioreq_server_info:
+    {
+        struct xen_dm_op_get_ioreq_server_info *data =
+            &op->u.get_ioreq_server_info;
+        const uint16_t valid_flags = XEN_DMOP_no_gfns;
+
+        *const_op = false;
+
+        rc = -EINVAL;
+        if ( data->flags & ~valid_flags )
+            break;
+
+        rc = hvm_get_ioreq_server_info(d, data->id,
+                                       (data->flags & XEN_DMOP_no_gfns) ?
+                                       NULL : (unsigned long *)&data->ioreq_gfn,
+                                       (data->flags & XEN_DMOP_no_gfns) ?
+                                       NULL : (unsigned long *)&data->bufioreq_gfn,
+                                       &data->bufioreq_port);
+        break;
+    }
+
+    case XEN_DMOP_map_io_range_to_ioreq_server:
+    {
+        const struct xen_dm_op_ioreq_server_range *data =
+            &op->u.map_io_range_to_ioreq_server;
+
+        rc = -EINVAL;
+        if ( data->pad )
+            break;
+
+        rc = hvm_map_io_range_to_ioreq_server(d, data->id, data->type,
+                                              data->start, data->end);
+        break;
+    }
+
+    case XEN_DMOP_unmap_io_range_from_ioreq_server:
+    {
+        const struct xen_dm_op_ioreq_server_range *data =
+            &op->u.unmap_io_range_from_ioreq_server;
+
+        rc = -EINVAL;
+        if ( data->pad )
+            break;
+
+        rc = hvm_unmap_io_range_from_ioreq_server(d, data->id, data->type,
+                                                  data->start, data->end);
+        break;
+    }
+
+    case XEN_DMOP_set_ioreq_server_state:
+    {
+        const struct xen_dm_op_set_ioreq_server_state *data =
+            &op->u.set_ioreq_server_state;
+
+        rc = -EINVAL;
+        if ( data->pad )
+            break;
+
+        rc = hvm_set_ioreq_server_state(d, data->id, !!data->enabled);
+        break;
+    }
+
+    case XEN_DMOP_destroy_ioreq_server:
+    {
+        const struct xen_dm_op_destroy_ioreq_server *data =
+            &op->u.destroy_ioreq_server;
+
+        rc = -EINVAL;
+        if ( data->pad )
+            break;
+
+        rc = hvm_destroy_ioreq_server(d, data->id);
+        break;
+    }
+
+    default:
+        rc = -EOPNOTSUPP;
+        break;
+    }
+
+    return rc;
+}
+
+/*
+ * Local variables:
+ * mode: C
+ * c-file-style: "BSD"
+ * c-basic-offset: 4
+ * tab-width: 4
+ * indent-tabs-mode: nil
+ * End:
+ */
diff --git a/xen/include/xen/dm.h b/xen/include/xen/dm.h
new file mode 100644
index 0000000..8451f3c
--- /dev/null
+++ b/xen/include/xen/dm.h
@@ -0,0 +1,41 @@
+/*
+ * Copyright (c) 2016 Citrix Systems Inc.
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms and conditions of the GNU General Public License,
+ * version 2, as published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope it will be useful, but WITHOUT
+ * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
+ * FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License for
+ * more details.
+ *
+ * You should have received a copy of the GNU General Public License along with
+ * this program; If not, see <http://www.gnu.org/licenses/>.
+ */
+
+#ifndef __XEN_DM_H__
+#define __XEN_DM_H__
+
+#include <xen/sched.h>
+
+struct dmop_args {
+    domid_t domid;
+    unsigned int nr_bufs;
+    /* Reserve enough buf elements for all current hypercalls. */
+    struct xen_dm_op_buf buf[2];
+};
+
+int ioreq_server_dm_op(struct xen_dm_op *op, struct domain *d, bool *const_op);
+
+#endif /* __XEN_DM_H__ */
+
+/*
+ * Local variables:
+ * mode: C
+ * c-file-style: "BSD"
+ * c-basic-offset: 4
+ * tab-width: 4
+ * indent-tabs-mode: nil
+ * End:
+ */
diff --git a/xen/include/xsm/dummy.h b/xen/include/xsm/dummy.h
index 7ae3c40..5c61d8e 100644
--- a/xen/include/xsm/dummy.h
+++ b/xen/include/xsm/dummy.h
@@ -707,14 +707,14 @@ static XSM_INLINE int xsm_pmu_op (XSM_DEFAULT_ARG struct domain *d, unsigned int
      }
  }

+#endif /* CONFIG_X86 */
+
  static XSM_INLINE int xsm_dm_op(XSM_DEFAULT_ARG struct domain *d)
  {
      XSM_ASSERT_ACTION(XSM_DM_PRIV);
      return xsm_default_action(action, current->domain, d);
  }

-#endif /* CONFIG_X86 */
-
  #ifdef CONFIG_ARGO
  static XSM_INLINE int xsm_argo_enable(const struct domain *d)
  {
diff --git a/xen/include/xsm/xsm.h b/xen/include/xsm/xsm.h
index 7bd03d8..91ecff4 100644
--- a/xen/include/xsm/xsm.h
+++ b/xen/include/xsm/xsm.h
@@ -176,8 +176,8 @@ struct xsm_operations {
     int (*ioport_permission) (struct domain *d, uint32_t s, uint32_t e, uint8_t allow);
     int (*ioport_mapping) (struct domain *d, uint32_t s, uint32_t e, uint8_t allow);
      int (*pmu_op) (struct domain *d, unsigned int op);
-    int (*dm_op) (struct domain *d);
  #endif
+    int (*dm_op) (struct domain *d);
      int (*xen_version) (uint32_t cmd);
      int (*domain_resource_map) (struct domain *d);
  #ifdef CONFIG_ARGO
@@ -682,13 +682,13 @@ static inline int xsm_pmu_op (xsm_default_t def, struct domain *d, unsigned int
      return xsm_ops->pmu_op(d, op);
  }

+#endif /* CONFIG_X86 */
+
  static inline int xsm_dm_op(xsm_default_t def, struct domain *d)
  {
      return xsm_ops->dm_op(d);
  }

-#endif /* CONFIG_X86 */
-
  static inline int xsm_xen_version (xsm_default_t def, uint32_t op)
  {
      return xsm_ops->xen_version(op);
diff --git a/xen/xsm/dummy.c b/xen/xsm/dummy.c
index 9e09512..8bdffe7 100644
--- a/xen/xsm/dummy.c
+++ b/xen/xsm/dummy.c
@@ -147,8 +147,8 @@ void __init xsm_fixup_ops (struct xsm_operations *ops)
      set_to_dummy_if_null(ops, ioport_permission);
      set_to_dummy_if_null(ops, ioport_mapping);
      set_to_dummy_if_null(ops, pmu_op);
-    set_to_dummy_if_null(ops, dm_op);
  #endif
+    set_to_dummy_if_null(ops, dm_op);
      set_to_dummy_if_null(ops, xen_version);
      set_to_dummy_if_null(ops, domain_resource_map);
  #ifdef CONFIG_ARGO
diff --git a/xen/xsm/flask/hooks.c b/xen/xsm/flask/hooks.c
index 19b0d9e..11784d7 100644
--- a/xen/xsm/flask/hooks.c
+++ b/xen/xsm/flask/hooks.c
@@ -1656,14 +1656,13 @@ static int flask_pmu_op (struct domain *d, unsigned int op)
          return -EPERM;
      }
  }
+#endif /* CONFIG_X86 */

  static int flask_dm_op(struct domain *d)
  {
      return current_has_perm(d, SECCLASS_HVM, HVM__DM);
  }

-#endif /* CONFIG_X86 */
-
  static int flask_xen_version (uint32_t op)
  {
      u32 dsid = domain_sid(current->domain);
@@ -1865,8 +1864,8 @@ static struct xsm_operations flask_ops = {
      .ioport_permission = flask_ioport_permission,
      .ioport_mapping = flask_ioport_mapping,
      .pmu_op = flask_pmu_op,
-    .dm_op = flask_dm_op,
  #endif
+    .dm_op = flask_dm_op,
      .xen_version = flask_xen_version,
      .domain_resource_map = flask_domain_resource_map,
  #ifdef CONFIG_ARGO




-- 
Regards,

Oleksandr Tyshchenko



^ permalink raw reply related	[flat|nested] 127+ messages in thread

* Re: [PATCH V3 09/23] xen/dm: Make x86's DM feature common
  2021-01-07 14:38           ` Oleksandr
@ 2021-01-07 15:01             ` Jan Beulich
  2021-01-07 16:49               ` Oleksandr
  0 siblings, 1 reply; 127+ messages in thread
From: Jan Beulich @ 2021-01-07 15:01 UTC (permalink / raw)
  To: Oleksandr
  Cc: Julien Grall, Andrew Cooper, Roger Pau Monné,
	Wei Liu, George Dunlap, Ian Jackson, Julien Grall,
	Stefano Stabellini, Daniel De Graaf, Oleksandr Tyshchenko,
	xen-devel

On 07.01.2021 15:38, Oleksandr wrote:
>>> Well, on v2 you replied you didn't consider the alternative. I would
>>> have expected that you would at least go through this consideration
>>> process, and see whether there are better reasons to stick with the
>>> apparently backwards arrangement than to change to the more
>>> conventional one. If there are such reasons, I would expect them to
>>> be called out in reply and perhaps also in the commit message; the
>>> latter because down the road more people may wonder why the more
>>> narrow / special set of cases gets handled at a higher layer than
>>> the wider set of remaining ones, and they would then be able to find
>>> an explanation without having to resort to searching through list
>>> archives.
>> Ah, will investigate. Sorry for not paying enough attention to it.
>> Yes, IOREQ (I mean "common") ops are 7 out of 18 right now. The 
>> subsequent patch is adding one more DM op - XEN_DMOP_set_irq_level.
>> There are several PCI related ops which might want to be common in the 
>> future (but I am not sure).
> I think, I can say that I have considered the alternative (doing it the 
> other way around), of course if I got your suggestion for V2 correctly.
> Agree, the alternative is more natural, also compat_dm_op() was left in 
> x86 code. For me the downside is in code duplication. With the 
> alternative both arches have to duplicate do_dm_op() and "common" part 
> of dm_op()
> (only "switch ( op.op )" is unique).

Yes, this duplication is the main downside.

> Now the question is which approach to take ("current" or "alternative") 
> for me to prepare for V4. Personally, I would be OK with the both (with 
> a little preference for "alternative").

Same here. I don't think I've seen anyone else voice an opinion.

> Also, If we decide to go with the alternative, should the common files 
> still be named dm.*?

I think this should live in ioreq.c, just like e.g.
iommu_do_pci_domctl() lives in passthrough/pci.c. This would then
allow quite a few things to become static in that file, I believe.

Jan
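
To make the trade-off concrete, here is a minimal sketch of the "alternative"
arrangement discussed above. The names (ioreq_server_dm_op() in particular),
the signatures and the exact split are assumptions for illustration only, not
necessarily what ended up in V4: the arch-agnostic subset of DM ops is handled
once in common code, while each arch keeps its own (duplicated) do_dm_op()
wrapper with an arch-specific switch.

/* xen/common/ioreq.c -- the arch-agnostic subset of DM ops (sketch). */
int ioreq_server_dm_op(struct xen_dm_op *op, struct domain *d, bool *const_op)
{
    switch ( op->op )
    {
    case XEN_DMOP_create_ioreq_server:
    case XEN_DMOP_get_ioreq_server_info:
    case XEN_DMOP_map_io_range_to_ioreq_server:
    case XEN_DMOP_unmap_io_range_from_ioreq_server:
    case XEN_DMOP_set_ioreq_server_state:
    case XEN_DMOP_destroy_ioreq_server:
        /* ... dispatch to the corresponding ioreq_server_*() helper ... */
        return 0;

    default:
        /* Not a common op: let the arch-specific switch handle it. */
        return -EOPNOTSUPP;
    }
}

/* xen/arch/<arch>/dm.c -- duplicated per arch; only the switch differs. */
long do_dm_op(domid_t domid, unsigned int nr_bufs,
              XEN_GUEST_HANDLE_PARAM(xen_dm_op_buf_t) bufs)
{
    struct domain *d;
    struct xen_dm_op op;
    bool const_op = true;
    long rc = rcu_lock_remote_domain_by_id(domid, &d);

    if ( rc )
        return rc;

    /* ... XSM dm_op check, copy the op in from bufs ... */

    rc = ioreq_server_dm_op(&op, d, &const_op);
    if ( rc == -EOPNOTSUPP )
    {
        switch ( op.op )
        {
        /* Arch-only ops, e.g. XEN_DMOP_set_irq_level on Arm. */
        default:
            rc = -EINVAL;
            break;
        }
    }

    /* ... copy the op back out unless const_op, rcu_unlock_domain(d) ... */
    return rc;
}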


^ permalink raw reply	[flat|nested] 127+ messages in thread

* Re: [PATCH V3 09/23] xen/dm: Make x86's DM feature common
  2021-01-07 15:01             ` Jan Beulich
@ 2021-01-07 16:49               ` Oleksandr
  2021-01-12 22:23                 ` Oleksandr
  0 siblings, 1 reply; 127+ messages in thread
From: Oleksandr @ 2021-01-07 16:49 UTC (permalink / raw)
  To: Jan Beulich, Julien Grall, Paul Durrant
  Cc: Andrew Cooper, Roger Pau Monné,
	Wei Liu, George Dunlap, Ian Jackson, Julien Grall,
	Stefano Stabellini, Daniel De Graaf, Oleksandr Tyshchenko,
	xen-devel


On 07.01.21 17:01, Jan Beulich wrote:

Hi Jan, all

> On 07.01.2021 15:38, Oleksandr wrote:
>>>> Well, on v2 you replied you didn't consider the alternative. I would
>>>> have expected that you would at least go through this consideration
>>>> process, and see whether there are better reasons to stick with the
>>>> apparently backwards arrangement than to change to the more
>>>> conventional one. If there are such reasons, I would expect them to
>>>> be called out in reply and perhaps also in the commit message; the
>>>> latter because down the road more people may wonder why the more
>>>> narrow / special set of cases gets handled at a higher layer than
>>>> the wider set of remaining ones, and they would then be able to find
>>>> an explanation without having to resort to searching through list
>>>> archives.
>>> Ah, will investigate. Sorry for not paying enough attention to it.
>>> Yes, IOREQ (I mean "common") ops are 7 out of 18 right now. The
>>> subsequent patch is adding one more DM op - XEN_DMOP_set_irq_level.
>>> There are several PCI related ops which might want to be common in the
>>> future (but I am not sure).
>> I think I can say that I have considered the alternative (doing it the
>> other way around), assuming I understood your suggestion for V2 correctly.
>> I agree the alternative is more natural; also, compat_dm_op() was left in
>> x86 code. For me the downside is the code duplication. With the
>> alternative, both arches have to duplicate do_dm_op() and the "common"
>> part of dm_op() (only the "switch ( op.op )" is unique).
> Yes, this duplication is the main downside.
>
>> Now the question is which approach to take ("current" or "alternative")
>> when preparing V4. Personally, I would be OK with both (with
>> a slight preference for "alternative").
> Same here. I don't think I've seen anyone else voice an opinion.

Well, let's wait a bit for other opinions... @Julien, @Paul, what do you 
think about this?


>
>> Also, If we decide to go with the alternative, should the common files
>> still be named dm.*?
> I think this should live in ioreq.c, just like e.g.
> iommu_do_pci_domctl() lives in passthrough/pci.c. This would then
> allow quite a few things to become static in that file, I believe.

I got it. It seems so; at least the following could become static (see the 
sketch after this list):

ioreq_server_create
ioreq_server_get_info
ioreq_server_map_io_range
ioreq_server_unmap_io_range
ioreq_server_set_state
ioreq_server_destroy
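
A rough sketch of what that could mean for common/ioreq.c once the DM-op
dispatch lives in the same file; the signatures are borrowed from the current
hvm_*_ioreq_server() helpers and are only an assumption of how the final code
would look:

/* xen/common/ioreq.c (sketch): the helpers lose their external linkage. */
static int ioreq_server_create(struct domain *d, int bufioreq_handling,
                               ioservid_t *id);
static int ioreq_server_get_info(struct domain *d, ioservid_t id,
                                 unsigned long *ioreq_gfn,
                                 unsigned long *bufioreq_gfn,
                                 evtchn_port_t *bufioreq_port);
static int ioreq_server_map_io_range(struct domain *d, ioservid_t id,
                                     uint32_t type, uint64_t start,
                                     uint64_t end);
static int ioreq_server_unmap_io_range(struct domain *d, ioservid_t id,
                                       uint32_t type, uint64_t start,
                                       uint64_t end);
static int ioreq_server_set_state(struct domain *d, ioservid_t id,
                                  bool enabled);
static int ioreq_server_destroy(struct domain *d, ioservid_t id);

The corresponding declarations would then be dropped from the public header,
leaving only the DM-op dispatcher as the external entry point.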


-- 
Regards,

Oleksandr Tyshchenko



^ permalink raw reply	[flat|nested] 127+ messages in thread

* Re: [PATCH V3 09/23] xen/dm: Make x86's DM feature common
  2021-01-07 16:49               ` Oleksandr
@ 2021-01-12 22:23                 ` Oleksandr
  0 siblings, 0 replies; 127+ messages in thread
From: Oleksandr @ 2021-01-12 22:23 UTC (permalink / raw)
  To: Jan Beulich
  Cc: Julien Grall, Paul Durrant, Andrew Cooper, Roger Pau Monné,
	Wei Liu, George Dunlap, Ian Jackson, Julien Grall,
	Stefano Stabellini, Daniel De Graaf, Oleksandr Tyshchenko,
	xen-devel


Hi Jan.


On 07.01.21 18:49, Oleksandr wrote:
>
> On 07.01.21 17:01, Jan Beulich wrote:
>
> Hi Jan, all
>
>> On 07.01.2021 15:38, Oleksandr wrote:
>>>>> Well, on v2 you replied you didn't consider the alternative. I would
>>>>> have expected that you would at least go through this consideration
>>>>> process, and see whether there are better reasons to stick with the
>>>>> apparently backwards arrangement than to change to the more
>>>>> conventional one. If there are such reasons, I would expect them to
>>>>> be called out in reply and perhaps also in the commit message; the
>>>>> latter because down the road more people may wonder why the more
>>>>> narrow / special set of cases gets handled at a higher layer than
>>>>> the wider set of remaining ones, and they would then be able to find
>>>>> an explanation without having to resort to searching through list
>>>>> archives.
>>>> Ah, will investigate. Sorry for not paying enough attention to it.
>>>> Yes, IOREQ (I mean "common") ops are 7 out of 18 right now. The
>>>> subsequent patch is adding one more DM op - XEN_DMOP_set_irq_level.
>>>> There are several PCI related ops which might want to be common in the
>>>> future (but I am not sure).
>>> I think I can say that I have considered the alternative (doing it the
>>> other way around), assuming I understood your suggestion for V2 correctly.
>>> I agree the alternative is more natural; also, compat_dm_op() was left in
>>> x86 code. For me the downside is the code duplication. With the
>>> alternative, both arches have to duplicate do_dm_op() and the "common"
>>> part of dm_op() (only the "switch ( op.op )" is unique).
>> Yes, this duplication is the main downside.
>>
>>> Now the question is which approach to take ("current" or "alternative")
>>> when preparing V4. Personally, I would be OK with both (with
>>> a slight preference for "alternative").
>> Same here. I don't think I've seen anyone else voice an opinion.
>
> Well, let's wait a bit for other opinions... @Julien, @Paul, what do 
> you think about this?
>
>
>>
>>> Also, If we decide to go with the alternative, should the common files
>>> still be named dm.*?
>> I think this should live in ioreq.c, just like e.g.
>> iommu_do_pci_domctl() lives in passthrough/pci.c. This would then
>> allow quite a few things to become static in that file, I believe.
>
> I got it. It seems so; at least the following could become static:
>
> ioreq_server_create
> ioreq_server_get_info
> ioreq_server_map_io_range
> ioreq_server_unmap_io_range
> ioreq_server_set_state
> ioreq_server_destroy


Well, I have already pushed V4 with this "alternative" approach; let's 
continue the discussion there [1].


[1] 
https://lore.kernel.org/xen-devel/1610488352-18494-10-git-send-email-olekstysh@gmail.com/T/#mb08f75657f43df869596c5c9c30e2395a9e35c7a


-- 
Regards,

Oleksandr Tyshchenko



^ permalink raw reply	[flat|nested] 127+ messages in thread

* Re: [PATCH V3 16/23] xen/mm: Handle properly reference in set_foreign_p2m_entry() on Arm
  2020-11-30 10:31 ` [PATCH V3 16/23] xen/mm: Handle properly reference in set_foreign_p2m_entry() on Arm Oleksandr Tyshchenko
  2020-12-08 14:24   ` Jan Beulich
  2020-12-09 23:49   ` Stefano Stabellini
@ 2021-01-15  1:18   ` Stefano Stabellini
  2 siblings, 0 replies; 127+ messages in thread
From: Stefano Stabellini @ 2021-01-15  1:18 UTC (permalink / raw)
  To: Oleksandr Tyshchenko
  Cc: xen-devel, Oleksandr Tyshchenko, Stefano Stabellini,
	Julien Grall, Volodymyr Babchuk, Andrew Cooper, George Dunlap,
	Ian Jackson, Jan Beulich, Wei Liu, Roger Pau Monné,
	Julien Grall


On Mon, 30 Nov 2020, Oleksandr Tyshchenko wrote:
> From: Oleksandr Tyshchenko <oleksandr_tyshchenko@epam.com>
> 
> This patch implements reference counting of foreign entries in
> set_foreign_p2m_entry() on Arm. This is a mandatory action if
> we want to run an emulator (IOREQ server) in a domain other than dom0,
> as we can't trust it to do the right thing if it is not running
> in dom0. So we need to grab a reference on the page to prevent it
> from disappearing.
> 
> It is valid to always pass the "p2m_map_foreign_rw" type to
> guest_physmap_add_entry() since the current and foreign domains
> are always different. A case when they are equal would be
> rejected by rcu_lock_remote_domain_by_id(). Besides the similar
> comment in the code, a respective ASSERT() is put in place to
> catch incorrect usage in the future.
> 
> It was tested with the IOREQ feature to confirm that all the pages given
> to this function belong to a domain, so we can use the same approach
> as for XENMAPSPACE_gmfn_foreign handling in xenmem_add_to_physmap_one().
> 
> This involves adding an extra parameter for the foreign domain to
> set_foreign_p2m_entry() and a helper to indicate whether the arch
> supports the reference counting of foreign entries, so that the
> restriction to the hardware domain in the common code can be skipped
> for such an arch.
> 
> Signed-off-by: Oleksandr Tyshchenko <oleksandr_tyshchenko@epam.com>
> CC: Julien Grall <julien.grall@arm.com>

Acked-by: Stefano Stabellini <sstabellini@kernel.org>


> ---
> Please note, this is a split/cleanup/hardening of Julien's PoC:
> "Add support for Guest IO forwarding to a device emulator"
> 
> Changes RFC -> V1:
>    - new patch, was split from:
>      "[RFC PATCH V1 04/12] xen/arm: Introduce arch specific bits for IOREQ/DM features"
>    - rewrite the logic to properly handle the reference in set_foreign_p2m_entry()
>      instead of treating foreign entries as p2m_ram_rw
> 
> Changes V1 -> V2:
>    - rebase according to the recent changes to acquire_resource()
>    - update patch description
>    - introduce arch_refcounts_p2m()
>    - add an explanation why p2m_map_foreign_rw is valid
>    - move set_foreign_p2m_entry() to p2m-common.h
>    - add const to new parameter
> 
> Changes V2 -> V3:
>    - update patch description
>    - rename arch_refcounts_p2m() to arch_acquire_resource_check()
>    - move comment to x86’s arch_acquire_resource_check()
>    - return rc in Arm's set_foreign_p2m_entry()
>    - put a respective ASSERT() into Arm's set_foreign_p2m_entry()
> ---
> ---
>  xen/arch/arm/p2m.c           | 24 ++++++++++++++++++++++++
>  xen/arch/x86/mm/p2m.c        |  5 +++--
>  xen/common/memory.c          | 10 +++-------
>  xen/include/asm-arm/p2m.h    | 19 +++++++++----------
>  xen/include/asm-x86/p2m.h    | 16 +++++++++++++---
>  xen/include/xen/p2m-common.h |  4 ++++
>  6 files changed, 56 insertions(+), 22 deletions(-)
> 
> diff --git a/xen/arch/arm/p2m.c b/xen/arch/arm/p2m.c
> index 4eeb867..5b8d494 100644
> --- a/xen/arch/arm/p2m.c
> +++ b/xen/arch/arm/p2m.c
> @@ -1380,6 +1380,30 @@ int guest_physmap_remove_page(struct domain *d, gfn_t gfn, mfn_t mfn,
>      return p2m_remove_mapping(d, gfn, (1 << page_order), mfn);
>  }
>  
> +int set_foreign_p2m_entry(struct domain *d, const struct domain *fd,
> +                          unsigned long gfn, mfn_t mfn)
> +{
> +    struct page_info *page = mfn_to_page(mfn);
> +    int rc;
> +
> +    if ( !get_page(page, fd) )
> +        return -EINVAL;
> +
> +    /*
> +     * It is valid to always use p2m_map_foreign_rw here as if this gets
> +     * called then d != fd. A case when d == fd would be rejected by
> +     * rcu_lock_remote_domain_by_id() earlier. Put a respective ASSERT()
> +     * to catch incorrect usage in future.
> +     */
> +    ASSERT(d != fd);
> +
> +    rc = guest_physmap_add_entry(d, _gfn(gfn), mfn, 0, p2m_map_foreign_rw);
> +    if ( rc )
> +        put_page(page);
> +
> +    return rc;
> +}
> +
>  static struct page_info *p2m_allocate_root(void)
>  {
>      struct page_info *page;
> diff --git a/xen/arch/x86/mm/p2m.c b/xen/arch/x86/mm/p2m.c
> index 7a2ba82..4772c86 100644
> --- a/xen/arch/x86/mm/p2m.c
> +++ b/xen/arch/x86/mm/p2m.c
> @@ -1321,7 +1321,8 @@ static int set_typed_p2m_entry(struct domain *d, unsigned long gfn_l,
>  }
>  
>  /* Set foreign mfn in the given guest's p2m table. */
> -int set_foreign_p2m_entry(struct domain *d, unsigned long gfn, mfn_t mfn)
> +int set_foreign_p2m_entry(struct domain *d, const struct domain *fd,
> +                          unsigned long gfn, mfn_t mfn)
>  {
>      return set_typed_p2m_entry(d, gfn, mfn, PAGE_ORDER_4K, p2m_map_foreign,
>                                 p2m_get_hostp2m(d)->default_access);
> @@ -2621,7 +2622,7 @@ int p2m_add_foreign(struct domain *tdom, unsigned long fgfn,
>       * will update the m2p table which will result in  mfn -> gpfn of dom0
>       * and not fgfn of domU.
>       */
> -    rc = set_foreign_p2m_entry(tdom, gpfn, mfn);
> +    rc = set_foreign_p2m_entry(tdom, fdom, gpfn, mfn);
>      if ( rc )
>          gdprintk(XENLOG_WARNING, "set_foreign_p2m_entry failed. "
>                   "gpfn:%lx mfn:%lx fgfn:%lx td:%d fd:%d\n",
> diff --git a/xen/common/memory.c b/xen/common/memory.c
> index 3363c06..49e3001 100644
> --- a/xen/common/memory.c
> +++ b/xen/common/memory.c
> @@ -1134,12 +1134,8 @@ static int acquire_resource(
>      xen_pfn_t mfn_list[32];
>      int rc;
>  
> -    /*
> -     * FIXME: Until foreign pages inserted into the P2M are properly
> -     *        reference counted, it is unsafe to allow mapping of
> -     *        resource pages unless the caller is the hardware domain.
> -     */
> -    if ( paging_mode_translate(currd) && !is_hardware_domain(currd) )
> +    if ( paging_mode_translate(currd) && !is_hardware_domain(currd) &&
> +         !arch_acquire_resource_check() )
>          return -EACCES;
>  
>      if ( copy_from_guest(&xmar, arg, 1) )
> @@ -1207,7 +1203,7 @@ static int acquire_resource(
>  
>          for ( i = 0; !rc && i < xmar.nr_frames; i++ )
>          {
> -            rc = set_foreign_p2m_entry(currd, gfn_list[i],
> +            rc = set_foreign_p2m_entry(currd, d, gfn_list[i],
>                                         _mfn(mfn_list[i]));
>              /* rc should be -EIO for any iteration other than the first */
>              if ( rc && i )
> diff --git a/xen/include/asm-arm/p2m.h b/xen/include/asm-arm/p2m.h
> index 28ca9a8..4f8056e 100644
> --- a/xen/include/asm-arm/p2m.h
> +++ b/xen/include/asm-arm/p2m.h
> @@ -161,6 +161,15 @@ typedef enum {
>  #endif
>  #include <xen/p2m-common.h>
>  
> +static inline bool arch_acquire_resource_check(void)
> +{
> +    /*
> +     * The reference counting of foreign entries in set_foreign_p2m_entry()
> +     * is supported on Arm.
> +     */
> +    return true;
> +}
> +
>  static inline
>  void p2m_altp2m_check(struct vcpu *v, uint16_t idx)
>  {
> @@ -392,16 +401,6 @@ static inline gfn_t gfn_next_boundary(gfn_t gfn, unsigned int order)
>      return gfn_add(gfn, 1UL << order);
>  }
>  
> -static inline int set_foreign_p2m_entry(struct domain *d, unsigned long gfn,
> -                                        mfn_t mfn)
> -{
> -    /*
> -     * NOTE: If this is implemented then proper reference counting of
> -     *       foreign entries will need to be implemented.
> -     */
> -    return -EOPNOTSUPP;
> -}
> -
>  /*
>   * A vCPU has cache enabled only when the MMU is enabled and data cache
>   * is enabled.
> diff --git a/xen/include/asm-x86/p2m.h b/xen/include/asm-x86/p2m.h
> index 4603560..8d2dc22 100644
> --- a/xen/include/asm-x86/p2m.h
> +++ b/xen/include/asm-x86/p2m.h
> @@ -382,6 +382,19 @@ struct p2m_domain {
>  #endif
>  #include <xen/p2m-common.h>
>  
> +static inline bool arch_acquire_resource_check(void)
> +{
> +    /*
> +     * The reference counting of foreign entries in set_foreign_p2m_entry()
> +     * is not supported on x86.
> +     *
> +     * FIXME: Until foreign pages inserted into the P2M are properly
> +     * reference counted, it is unsafe to allow mapping of
> +     * resource pages unless the caller is the hardware domain.
> +     */
> +    return false;
> +}
> +
>  /*
>   * Updates vCPU's n2pm to match its np2m_base in VMCx12 and returns that np2m.
>   */
> @@ -647,9 +660,6 @@ int p2m_finish_type_change(struct domain *d,
>  int p2m_is_logdirty_range(struct p2m_domain *, unsigned long start,
>                            unsigned long end);
>  
> -/* Set foreign entry in the p2m table (for priv-mapping) */
> -int set_foreign_p2m_entry(struct domain *d, unsigned long gfn, mfn_t mfn);
> -
>  /* Set mmio addresses in the p2m table (for pass-through) */
>  int set_mmio_p2m_entry(struct domain *d, gfn_t gfn, mfn_t mfn,
>                         unsigned int order);
> diff --git a/xen/include/xen/p2m-common.h b/xen/include/xen/p2m-common.h
> index 58031a6..b4bc709 100644
> --- a/xen/include/xen/p2m-common.h
> +++ b/xen/include/xen/p2m-common.h
> @@ -3,6 +3,10 @@
>  
>  #include <xen/mm.h>
>  
> +/* Set foreign entry in the p2m table */
> +int set_foreign_p2m_entry(struct domain *d, const struct domain *fd,
> +                          unsigned long gfn, mfn_t mfn);
> +
>  /* Remove a page from a domain's p2m table */
>  int __must_check
>  guest_physmap_remove_page(struct domain *d, gfn_t gfn, mfn_t mfn,
> -- 
> 2.7.4
> 

^ permalink raw reply	[flat|nested] 127+ messages in thread

end of thread

Thread overview: 127+ messages
2020-11-30 10:31 Oleksandr Tyshchenko
2020-11-30 10:31 ` [PATCH V3 01/23] x86/ioreq: Prepare IOREQ feature for making it common Oleksandr Tyshchenko
2020-12-01 11:03   ` Alex Bennée
2020-12-01 18:53     ` Oleksandr
2020-12-01 19:36       ` Alex Bennée
2020-12-02  8:00       ` Jan Beulich
2020-12-02 11:19         ` Oleksandr
2020-12-07 11:13   ` Jan Beulich
2020-12-07 15:27     ` Oleksandr
2020-12-07 16:29       ` Jan Beulich
2020-12-07 17:21         ` Oleksandr
2020-11-30 10:31 ` [PATCH V3 02/23] x86/ioreq: Add IOREQ_STATUS_* #define-s and update code for moving Oleksandr Tyshchenko
2020-12-01 11:07   ` Alex Bennée
2020-12-07 11:19   ` Jan Beulich
2020-12-07 15:37     ` Oleksandr
2020-11-30 10:31 ` [PATCH V3 03/23] x86/ioreq: Provide out-of-line wrapper for the handle_mmio() Oleksandr Tyshchenko
2020-12-07 11:27   ` Jan Beulich
2020-12-07 15:39     ` Oleksandr
2020-11-30 10:31 ` [PATCH V3 04/23] xen/ioreq: Make x86's IOREQ feature common Oleksandr Tyshchenko
2020-12-07 11:41   ` Jan Beulich
2020-12-07 19:43     ` Oleksandr
2020-12-08  9:21       ` Jan Beulich
2020-12-08 13:56         ` Oleksandr
2020-12-08 15:02           ` Jan Beulich
2020-12-08 17:24             ` Oleksandr
2020-11-30 10:31 ` [PATCH V3 05/23] xen/ioreq: Make x86's hvm_ioreq_needs_completion() common Oleksandr Tyshchenko
2020-12-07 11:47   ` Jan Beulich
2020-11-30 10:31 ` [PATCH V3 06/23] xen/ioreq: Make x86's hvm_mmio_first(last)_byte() common Oleksandr Tyshchenko
2020-12-07 11:48   ` Jan Beulich
2020-11-30 10:31 ` [PATCH V3 07/23] xen/ioreq: Make x86's hvm_ioreq_(page/vcpu/server) structs common Oleksandr Tyshchenko
2020-12-07 11:54   ` Jan Beulich
2020-11-30 10:31 ` [PATCH V3 08/23] xen/ioreq: Move x86's ioreq_server to struct domain Oleksandr Tyshchenko
2020-12-07 12:04   ` Jan Beulich
2020-12-07 12:12     ` Paul Durrant
2020-12-07 19:52     ` Oleksandr
2020-11-30 10:31 ` [PATCH V3 09/23] xen/dm: Make x86's DM feature common Oleksandr Tyshchenko
2020-12-07 12:08   ` Jan Beulich
2020-12-07 20:23     ` Oleksandr
2020-12-08  9:30       ` Jan Beulich
2020-12-08 14:54         ` Oleksandr
2021-01-07 14:38           ` Oleksandr
2021-01-07 15:01             ` Jan Beulich
2021-01-07 16:49               ` Oleksandr
2021-01-12 22:23                 ` Oleksandr
2020-11-30 10:31 ` [PATCH V3 10/23] xen/mm: Make x86's XENMEM_resource_ioreq_server handling common Oleksandr Tyshchenko
2020-12-07 11:35   ` Jan Beulich
2020-12-07 12:11     ` Jan Beulich
2020-12-07 21:06       ` Oleksandr
2020-11-30 10:31 ` [PATCH V3 11/23] xen/ioreq: Move x86's io_completion/io_req fields to struct vcpu Oleksandr Tyshchenko
2020-12-07 12:32   ` Jan Beulich
2020-12-07 20:59     ` Oleksandr
2020-12-08  7:52       ` Paul Durrant
2020-12-08  9:35         ` Jan Beulich
2020-12-08 18:21         ` Oleksandr
2020-11-30 10:31 ` [PATCH V3 12/23] xen/ioreq: Remove "hvm" prefixes from involved function names Oleksandr Tyshchenko
2020-12-07 12:45   ` Jan Beulich
2020-12-07 20:28     ` Oleksandr
2020-11-30 10:31 ` [PATCH V3 13/23] xen/ioreq: Use guest_cmpxchg64() instead of cmpxchg() Oleksandr Tyshchenko
2020-12-09 21:32   ` Stefano Stabellini
2020-12-09 22:34     ` Oleksandr
2020-12-10  2:30       ` Stefano Stabellini
2020-11-30 10:31 ` [PATCH V3 14/23] arm/ioreq: Introduce arch specific bits for IOREQ/DM features Oleksandr Tyshchenko
2020-12-09 22:04   ` Stefano Stabellini
2020-12-09 22:49     ` Oleksandr
2020-12-10  2:30       ` Stefano Stabellini
2020-11-30 10:31 ` [PATCH V3 15/23] xen/arm: Stick around in leave_hypervisor_to_guest until I/O has completed Oleksandr Tyshchenko
2020-11-30 20:51   ` Volodymyr Babchuk
2020-12-01 12:46     ` Julien Grall
2020-12-09 23:18   ` Stefano Stabellini
2020-12-09 23:35     ` Stefano Stabellini
2020-12-09 23:47       ` Julien Grall
2020-12-10  2:30         ` Stefano Stabellini
2020-12-10 13:17           ` Julien Grall
2020-12-10 13:21           ` Oleksandr
2020-12-09 23:38     ` Julien Grall
2020-11-30 10:31 ` [PATCH V3 16/23] xen/mm: Handle properly reference in set_foreign_p2m_entry() on Arm Oleksandr Tyshchenko
2020-12-08 14:24   ` Jan Beulich
2020-12-08 16:41     ` Oleksandr
2020-12-09 23:49   ` Stefano Stabellini
2021-01-15  1:18   ` Stefano Stabellini
2020-11-30 10:31 ` [PATCH V3 17/23] xen/ioreq: Introduce domain_has_ioreq_server() Oleksandr Tyshchenko
2020-12-08 15:11   ` Jan Beulich
2020-12-08 15:33     ` Oleksandr
2020-12-08 16:56       ` Oleksandr
2020-12-08 19:43         ` Paul Durrant
2020-12-08 20:16           ` Oleksandr
2020-12-09  9:01             ` Paul Durrant
2020-12-09 18:58               ` Julien Grall
2020-12-09 21:05                 ` Oleksandr
2020-12-09 20:36               ` Oleksandr
2020-12-10  8:38                 ` Paul Durrant
2020-12-10 16:57                   ` Oleksandr
2020-11-30 10:31 ` [PATCH V3 18/23] xen/dm: Introduce xendevicemodel_set_irq_level DM op Oleksandr Tyshchenko
2020-12-10  2:21   ` Stefano Stabellini
2020-12-10 12:58     ` Oleksandr
2020-12-10 13:38     ` Julien Grall
2020-11-30 10:31 ` [PATCH V3 19/23] xen/arm: io: Abstract sign-extension Oleksandr Tyshchenko
2020-11-30 21:03   ` Volodymyr Babchuk
2020-11-30 23:27     ` Oleksandr
2020-12-01  7:55       ` Jan Beulich
2020-12-01 10:30         ` Julien Grall
2020-12-01 10:42           ` Oleksandr
2020-12-01 12:13             ` Julien Grall
2020-12-01 12:24               ` Oleksandr
2020-12-01 12:28                 ` Julien Grall
2020-12-01 10:49           ` Jan Beulich
2020-12-01 10:23       ` Julien Grall
2020-11-30 10:31 ` [PATCH V3 20/23] xen/ioreq: Make x86's send_invalidate_req() common Oleksandr Tyshchenko
2020-12-08 15:24   ` Jan Beulich
2020-12-08 16:49     ` Oleksandr
2020-12-09  8:21       ` Jan Beulich
2020-11-30 10:31 ` [PATCH V3 21/23] xen/arm: Add mapcache invalidation handling Oleksandr Tyshchenko
2020-12-10  2:30   ` Stefano Stabellini
2020-12-10 18:50     ` Julien Grall
2020-12-11  1:28       ` Stefano Stabellini
2020-12-11 11:21         ` Oleksandr
2020-12-11 19:07           ` Stefano Stabellini
2020-12-11 19:37             ` Julien Grall
2020-12-11 19:27         ` Julien Grall
2020-11-30 10:31 ` [PATCH V3 22/23] libxl: Introduce basic virtio-mmio support on Arm Oleksandr Tyshchenko
2020-11-30 10:31 ` [PATCH V3 23/23] [RFC] libxl: Add support for virtio-disk configuration Oleksandr Tyshchenko
2020-11-30 11:22 ` [PATCH V3 00/23] IOREQ feature (+ virtio-mmio) on Arm Oleksandr
2020-12-07 13:03   ` Wei Chen
2020-12-07 21:03     ` Oleksandr
2020-11-30 16:21 ` Alex Bennée
2020-11-30 22:22   ` [PATCH V3 00/23] IOREQ feature (+ virtio-mmio) on Arm Oleksandr
2020-12-29 15:32   ` Roger Pau Monné
