* [PATCH V1 00/16] IOREQ feature (+ virtio-mmio) on Arm
@ 2020-09-10 20:21 Oleksandr Tyshchenko
  2020-09-10 20:21 ` [PATCH V1 01/16] x86/ioreq: Prepare IOREQ feature for making it common Oleksandr Tyshchenko
                   ` (15 more replies)
  0 siblings, 16 replies; 111+ messages in thread
From: Oleksandr Tyshchenko @ 2020-09-10 20:21 UTC (permalink / raw)
  To: xen-devel
  Cc: Oleksandr Tyshchenko, Paul Durrant, Jan Beulich, Andrew Cooper,
	Wei Liu, Roger Pau Monné,
	George Dunlap, Ian Jackson, Julien Grall, Stefano Stabellini,
	Jun Nakajima, Kevin Tian, Tim Deegan, Daniel De Graaf,
	Volodymyr Babchuk, Julien Grall, Anthony PERARD,
	Bertrand Marquis

From: Oleksandr Tyshchenko <oleksandr_tyshchenko@epam.com>

Hello all.

The purpose of this patch series is to add IOREQ/DM support to Xen on Arm.
You can find the initial discussion at [1] and the RFC patch series at [2].
Xen on Arm requires a mechanism to forward guest MMIO accesses to a device
model in order to implement a virtio-mmio backend (or even a mediator) outside
of the hypervisor. As Xen on x86 already contains the required support, this
patch series makes it common and introduces the Arm-specific bits plus some
new functionality. The patch series is based on Julien's PoC "xen/arm: Add
support for Guest IO forwarding to a device emulator".
Besides splitting the existing IOREQ/DM support and introducing the Arm side,
the patch series also includes virtio-mmio related changes (the last two
patches, for the toolstack) so that reviewers can see what the whole picture
could look like.

According to the initial discussion, there are a few open questions/concerns
regarding security and performance in the VirtIO solution:
1. virtio-mmio vs virtio-pci, SPI vs MSI: different use-cases require
   different transports...
2. The virtio backend is able to access all guest memory, so some kind of
   protection is needed: 'virtio-iommu in Xen' vs 'pre-shared memory &
   memcpys in the guest'.
3. The interface between the toolstack and the 'out-of-qemu' virtio backend;
   avoid using Xenstore in the virtio backend if possible.
4. A lot of foreign mappings could lead to memory exhaustion; Julien has an
   idea regarding that.

All of them look valid and worth considering, but the first thing we need on
Arm is a mechanism to forward guest I/O to a device emulator, so let's focus
on that first.
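
For reference, the request being forwarded is the existing public ioreq_t
structure (xen/include/public/hvm/ioreq.h), which this series reuses as-is
on Arm. Roughly (comments paraphrased):

    struct ioreq {
        uint64_t addr;          /* physical address of the access */
        uint64_t data;          /* data (or guest paddr of data) */
        uint32_t count;         /* for rep prefixes */
        uint32_t size;          /* size in bytes */
        uint32_t vp_eport;      /* evtchn for notifications to/from the DM */
        uint16_t _pad0;
        uint8_t state:4;        /* STATE_IOREQ_NONE/READY/... */
        uint8_t data_is_ptr:1;  /* if 1, 'data' is the guest paddr of the
                                 * real data to use */
        uint8_t dir:1;          /* 1 = read, 0 = write */
        uint8_t df:1;
        uint8_t _pad1:1;
        uint8_t type;           /* I/O type, e.g. IOREQ_TYPE_COPY (MMIO) */
    };
    typedef struct ioreq ioreq_t;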

***

There are a lot of changes since the RFC series: several critical TODOs were
resolved on Arm, and the Arm code was improved and hardened. One TODO still
remains, which is "PIO handling" on Arm; it is expected to be left unaddressed
in the current series. This is not a big issue for now, while Xen doesn't have
vPCI support on Arm. On Arm64, PIO accesses are only used for PCI I/O BARs,
and we would probably want to expose them to the emulator as PIO accesses to
keep a DM completely arch-agnostic. So "PIO handling" should be implemented
when we add vPCI support.

I left the interface untouched in the following patch
"xen/dm: Introduce xendevicemodel_set_irq_level DM op",
since there is still an open discussion about what interface to use and what
information to pass to the hypervisor.
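
For context, a minimal usage sketch of the new call as proposed in this
version (the exact name and parameters may change as a result of that
discussion; the two-call 'pulse' below is just an illustration):

    #include <xendevicemodel.h>

    /*
     * Sketch: a virtio-mmio backend asserting and then deasserting the
     * guest's SPI via the DM op proposed in this series. Signature as
     * proposed in this version, subject to the open discussion above.
     */
    static void pulse_guest_irq(xendevicemodel_handle *dmod,
                                domid_t domid, uint32_t irq)
    {
        xendevicemodel_set_irq_level(dmod, domid, irq, 1); /* assert */
        xendevicemodel_set_irq_level(dmod, domid, irq, 0); /* deassert */
    }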

Also, I decided to drop the following patch:
"[RFC PATCH V1 07/12] A collection of tweaks to be able to run emulator in driver domain"
as I got advice to write our own policy using FLASK, which would cover our use
case (with the emulator in a driver domain), rather than tweaking Xen.

***

Patch series [3] was rebased on a month-old staging branch
(79c2d51 "tools: bump library version numbers") and tested on a Renesas
Salvator-X board + H3 ES3.0 SoC (Arm64), with a virtio-mmio disk backend (we
will share it later) running in a driver domain and an unmodified Linux guest
using the existing virtio-blk driver (frontend). No issues were observed, and
guest domain 'reboot/destroy' use-cases work properly. The patch series was
only build-tested on x86.

Please note, the build test passed for the following modes:
1. x86: CONFIG_HVM=y / CONFIG_IOREQ_SERVER=y (default)
2. x86: #CONFIG_HVM is not set / #CONFIG_IOREQ_SERVER is not set
3. Arm64: CONFIG_HVM=y / CONFIG_IOREQ_SERVER=y (default)
4. Arm64: CONFIG_HVM=y / #CONFIG_IOREQ_SERVER is not set
(!)5. Arm32: CONFIG_HVM=y / CONFIG_IOREQ_SERVER=y (default)
6. Arm32: CONFIG_HVM=y / #CONFIG_IOREQ_SERVER is not set

(!) Please note, the build on Arm32 was broken for the RFC series (see the
cmpxchg usage in hvm_send_buffered_ioreq()) due to the lack of cmpxchg_64
support on Arm32.

However, there is a patch under review to address this issue:
https://patchwork.kernel.org/patch/11715559/
Together with the following patch in this series, "xen/ioreq: Use
guest_cmpxchg64() instead of cmpxchg()", we are able to fix Arm32 and harden
the IOREQ code on Arm.
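
To illustrate the hardening: the buffered-ioreq ring page is shared with
(and writable by) the emulator domain, so a plain cmpxchg() on it could be
live-locked by a misbehaving emulator. A sketch of the pointer
canonicalization using the bounded helper, assuming guest_cmpxchg64() from
the pending Arm32 patch above:

    #include <public/hvm/ioreq.h>   /* buffered_iopage_t, slot count */
    #include <asm/guest_atomics.h>  /* guest_cmpxchg64() (pending on Arm32) */

    /*
     * Sketch only: canonicalize the buffered-ioreq read/write pointers.
     * 'pg' lives on a page shared with the emulator domain 'd';
     * guest_cmpxchg64() acts like cmpxchg() but cannot be live-locked
     * by the sharing domain.
     */
    static void canonicalize_ptrs(struct domain *d, buffered_iopage_t *pg)
    {
        union bufioreq_pointers old = pg->ptrs, new;
        unsigned int n = old.read_pointer / IOREQ_BUFFER_SLOT_NUM;

        new.read_pointer = old.read_pointer - n * IOREQ_BUFFER_SLOT_NUM;
        new.write_pointer = old.write_pointer - n * IOREQ_BUFFER_SLOT_NUM;

        guest_cmpxchg64(d, &pg->ptrs.full, old.full, new.full);
    }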

***

Any feedback/help would be highly appreciated.

[1] https://lists.xenproject.org/archives/html/xen-devel/2020-07/msg00825.html
[2] https://lists.xenproject.org/archives/html/xen-devel/2020-08/msg00071.html
[3] https://github.com/otyshchenko1/xen/commits/ioreq_4.14_ml2

Oleksandr Tyshchenko (16):
  x86/ioreq: Prepare IOREQ feature for making it common
  xen/ioreq: Make x86's IOREQ feature common
  xen/ioreq: Make x86's hvm_ioreq_needs_completion() common
  xen/ioreq: Provide alias for the handle_mmio()
  xen/ioreq: Make x86's hvm_mmio_first(last)_byte() common
  xen/ioreq: Make x86's hvm_ioreq_(page/vcpu/server) structs common
  xen/dm: Make x86's DM feature common
  xen/mm: Make x86's XENMEM_resource_ioreq_server handling common
  arm/ioreq: Introduce arch specific bits for IOREQ/DM features
  xen/mm: Handle properly reference in set_foreign_p2m_entry() on Arm
  xen/ioreq: Introduce hvm_domain_has_ioreq_server()
  xen/dm: Introduce xendevicemodel_set_irq_level DM op
  xen/ioreq: Make x86's invalidate qemu mapcache handling common
  xen/ioreq: Use guest_cmpxchg64() instead of cmpxchg()
  libxl: Introduce basic virtio-mmio support on Arm
  [RFC] libxl: Add support for virtio-disk configuration

 MAINTAINERS                                     |    8 +-
 tools/libs/devicemodel/core.c                   |   18 +
 tools/libs/devicemodel/include/xendevicemodel.h |    4 +
 tools/libs/devicemodel/libxendevicemodel.map    |    1 +
 tools/libxl/Makefile                            |    4 +-
 tools/libxl/libxl_arm.c                         |   94 +-
 tools/libxl/libxl_create.c                      |    1 +
 tools/libxl/libxl_internal.h                    |    1 +
 tools/libxl/libxl_types.idl                     |   16 +
 tools/libxl/libxl_types_internal.idl            |    1 +
 tools/libxl/libxl_virtio_disk.c                 |  109 ++
 tools/xl/Makefile                               |    2 +-
 tools/xl/xl.h                                   |    3 +
 tools/xl/xl_cmdtable.c                          |   15 +
 tools/xl/xl_parse.c                             |  116 ++
 tools/xl/xl_virtio_disk.c                       |   46 +
 xen/arch/arm/Kconfig                            |    1 +
 xen/arch/arm/Makefile                           |    2 +
 xen/arch/arm/dm.c                               |   67 ++
 xen/arch/arm/domain.c                           |    9 +
 xen/arch/arm/io.c                               |   11 +-
 xen/arch/arm/ioreq.c                            |  142 +++
 xen/arch/arm/p2m.c                              |   16 +
 xen/arch/arm/traps.c                            |   41 +-
 xen/arch/x86/Kconfig                            |    1 +
 xen/arch/x86/hvm/dm.c                           |  289 +----
 xen/arch/x86/hvm/emulate.c                      |    2 +-
 xen/arch/x86/hvm/hvm.c                          |    2 +-
 xen/arch/x86/hvm/hypercall.c                    |    9 +-
 xen/arch/x86/hvm/intercept.c                    |    1 +
 xen/arch/x86/hvm/io.c                           |   16 +-
 xen/arch/x86/hvm/ioreq.c                        | 1426 +---------------------
 xen/arch/x86/hvm/stdvga.c                       |    2 +-
 xen/arch/x86/hvm/vmx/realmode.c                 |    1 +
 xen/arch/x86/hvm/vmx/vvmx.c                     |    3 +-
 xen/arch/x86/mm.c                               |   46 +-
 xen/arch/x86/mm/p2m.c                           |    5 +-
 xen/arch/x86/mm/shadow/common.c                 |    2 +-
 xen/common/Kconfig                              |    3 +
 xen/common/Makefile                             |    2 +
 xen/common/dm.c                                 |  288 +++++
 xen/common/ioreq.c                              | 1433 +++++++++++++++++++++++
 xen/common/memory.c                             |   54 +-
 xen/include/asm-arm/domain.h                    |   47 +
 xen/include/asm-arm/hvm/ioreq.h                 |  108 ++
 xen/include/asm-arm/mm.h                        |    8 -
 xen/include/asm-arm/mmio.h                      |    1 +
 xen/include/asm-arm/p2m.h                       |   11 +-
 xen/include/asm-arm/paging.h                    |    4 +
 xen/include/asm-x86/hvm/domain.h                |   36 +-
 xen/include/asm-x86/hvm/io.h                    |   17 -
 xen/include/asm-x86/hvm/ioreq.h                 |   47 +-
 xen/include/asm-x86/hvm/vcpu.h                  |    7 -
 xen/include/asm-x86/mm.h                        |    4 -
 xen/include/asm-x86/p2m.h                       |    3 +-
 xen/include/public/arch-arm.h                   |    5 +
 xen/include/public/hvm/dm_op.h                  |   15 +
 xen/include/xen/hypercall.h                     |   12 +
 xen/include/xen/ioreq.h                         |  146 +++
 xen/include/xen/sched.h                         |    2 +
 xen/include/xsm/dummy.h                         |    4 +-
 xen/include/xsm/xsm.h                           |    6 +-
 xen/xsm/dummy.c                                 |    2 +-
 xen/xsm/flask/hooks.c                           |    5 +-
 64 files changed, 2940 insertions(+), 1863 deletions(-)
 create mode 100644 tools/libxl/libxl_virtio_disk.c
 create mode 100644 tools/xl/xl_virtio_disk.c
 create mode 100644 xen/arch/arm/dm.c
 create mode 100644 xen/arch/arm/ioreq.c
 create mode 100644 xen/common/dm.c
 create mode 100644 xen/common/ioreq.c
 create mode 100644 xen/include/asm-arm/hvm/ioreq.h
 create mode 100644 xen/include/xen/ioreq.h

-- 
2.7.4




* [PATCH V1 01/16] x86/ioreq: Prepare IOREQ feature for making it common
  2020-09-10 20:21 [PATCH V1 00/16] IOREQ feature (+ virtio-mmio) on Arm Oleksandr Tyshchenko
@ 2020-09-10 20:21 ` Oleksandr Tyshchenko
  2020-09-14 13:52   ` Jan Beulich
  2020-09-23 17:22   ` Julien Grall
  2020-09-10 20:21 ` [PATCH V1 02/16] xen/ioreq: Make x86's IOREQ feature common Oleksandr Tyshchenko
                   ` (14 subsequent siblings)
  15 siblings, 2 replies; 111+ messages in thread
From: Oleksandr Tyshchenko @ 2020-09-10 20:21 UTC (permalink / raw)
  To: xen-devel
  Cc: Oleksandr Tyshchenko, Paul Durrant, Jan Beulich, Andrew Cooper,
	Wei Liu, Roger Pau Monné,
	Julien Grall, Stefano Stabellini, Julien Grall

From: Oleksandr Tyshchenko <oleksandr_tyshchenko@epam.com>

As a lot of x86 code can be re-used on Arm later on, this patch
prepares the IOREQ support before it is moved to the common code. This
way the subsequent code movement will be an almost verbatim copy.

This support is going to be used on Arm to be able to run a device
emulator outside of the Xen hypervisor.

Signed-off-by: Julien Grall <julien.grall@arm.com>
Signed-off-by: Oleksandr Tyshchenko <oleksandr_tyshchenko@epam.com>

---
Please note, this is a split/cleanup/hardening of Julien's PoC:
"Add support for Guest IO forwarding to a device emulator"

Changes RFC -> V1:
   - new patch, was split from:
     "[RFC PATCH V1 01/12] hvm/ioreq: Make x86's IOREQ feature common"
   - fold the check of p->type into hvm_get_ioreq_server_range_type()
     and make it return success/failure
   - remove relocate_portio_handler() call from arch_hvm_ioreq_destroy()
     in arch/x86/hvm/ioreq.c
   - introduce arch_hvm_destroy_ioreq_server()/arch_handle_hvm_io_completion()
---
---
 xen/arch/x86/hvm/ioreq.c        | 117 ++++++++++++++++++++++++++--------------
 xen/include/asm-x86/hvm/ioreq.h |  16 ++++++
 2 files changed, 93 insertions(+), 40 deletions(-)

diff --git a/xen/arch/x86/hvm/ioreq.c b/xen/arch/x86/hvm/ioreq.c
index 1cc27df..d912655 100644
--- a/xen/arch/x86/hvm/ioreq.c
+++ b/xen/arch/x86/hvm/ioreq.c
@@ -170,6 +170,29 @@ static bool hvm_wait_for_io(struct hvm_ioreq_vcpu *sv, ioreq_t *p)
     return true;
 }
 
+bool arch_handle_hvm_io_completion(enum hvm_io_completion io_completion)
+{
+    switch ( io_completion )
+    {
+    case HVMIO_realmode_completion:
+    {
+        struct hvm_emulate_ctxt ctxt;
+
+        hvm_emulate_init_once(&ctxt, NULL, guest_cpu_user_regs());
+        vmx_realmode_emulate_one(&ctxt);
+        hvm_emulate_writeback(&ctxt);
+
+        break;
+    }
+
+    default:
+        ASSERT_UNREACHABLE();
+        break;
+    }
+
+    return true;
+}
+
 bool handle_hvm_io_completion(struct vcpu *v)
 {
     struct domain *d = v->domain;
@@ -209,19 +232,8 @@ bool handle_hvm_io_completion(struct vcpu *v)
         return handle_pio(vio->io_req.addr, vio->io_req.size,
                           vio->io_req.dir);
 
-    case HVMIO_realmode_completion:
-    {
-        struct hvm_emulate_ctxt ctxt;
-
-        hvm_emulate_init_once(&ctxt, NULL, guest_cpu_user_regs());
-        vmx_realmode_emulate_one(&ctxt);
-        hvm_emulate_writeback(&ctxt);
-
-        break;
-    }
     default:
-        ASSERT_UNREACHABLE();
-        break;
+        return arch_handle_hvm_io_completion(io_completion);
     }
 
     return true;
@@ -836,6 +848,12 @@ int hvm_create_ioreq_server(struct domain *d, int bufioreq_handling,
     return rc;
 }
 
+/* Called when target domain is paused */
+int arch_hvm_destroy_ioreq_server(struct hvm_ioreq_server *s)
+{
+    return p2m_set_ioreq_server(s->target, 0, s);
+}
+
 int hvm_destroy_ioreq_server(struct domain *d, ioservid_t id)
 {
     struct hvm_ioreq_server *s;
@@ -855,7 +873,7 @@ int hvm_destroy_ioreq_server(struct domain *d, ioservid_t id)
 
     domain_pause(d);
 
-    p2m_set_ioreq_server(d, 0, s);
+    arch_hvm_destroy_ioreq_server(s);
 
     hvm_ioreq_server_disable(s);
 
@@ -1215,8 +1233,7 @@ void hvm_destroy_all_ioreq_servers(struct domain *d)
     struct hvm_ioreq_server *s;
     unsigned int id;
 
-    if ( !relocate_portio_handler(d, 0xcf8, 0xcf8, 4) )
-        return;
+    arch_hvm_ioreq_destroy(d);
 
     spin_lock_recursive(&d->arch.hvm.ioreq_server.lock);
 
@@ -1239,19 +1256,15 @@ void hvm_destroy_all_ioreq_servers(struct domain *d)
     spin_unlock_recursive(&d->arch.hvm.ioreq_server.lock);
 }
 
-struct hvm_ioreq_server *hvm_select_ioreq_server(struct domain *d,
-                                                 ioreq_t *p)
+int hvm_get_ioreq_server_range_type(struct domain *d,
+                                    ioreq_t *p,
+                                    uint8_t *type,
+                                    uint64_t *addr)
 {
-    struct hvm_ioreq_server *s;
-    uint32_t cf8;
-    uint8_t type;
-    uint64_t addr;
-    unsigned int id;
+    uint32_t cf8 = d->arch.hvm.pci_cf8;
 
     if ( p->type != IOREQ_TYPE_COPY && p->type != IOREQ_TYPE_PIO )
-        return NULL;
-
-    cf8 = d->arch.hvm.pci_cf8;
+        return -EINVAL;
 
     if ( p->type == IOREQ_TYPE_PIO &&
          (p->addr & ~3) == 0xcfc &&
@@ -1264,8 +1277,8 @@ struct hvm_ioreq_server *hvm_select_ioreq_server(struct domain *d,
         reg = hvm_pci_decode_addr(cf8, p->addr, &sbdf);
 
         /* PCI config data cycle */
-        type = XEN_DMOP_IO_RANGE_PCI;
-        addr = ((uint64_t)sbdf.sbdf << 32) | reg;
+        *type = XEN_DMOP_IO_RANGE_PCI;
+        *addr = ((uint64_t)sbdf.sbdf << 32) | reg;
         /* AMD extended configuration space access? */
         if ( CF8_ADDR_HI(cf8) &&
              d->arch.cpuid->x86_vendor == X86_VENDOR_AMD &&
@@ -1277,16 +1290,30 @@ struct hvm_ioreq_server *hvm_select_ioreq_server(struct domain *d,
 
             if ( !rdmsr_safe(MSR_AMD64_NB_CFG, msr_val) &&
                  (msr_val & (1ULL << AMD64_NB_CFG_CF8_EXT_ENABLE_BIT)) )
-                addr |= CF8_ADDR_HI(cf8);
+                *addr |= CF8_ADDR_HI(cf8);
         }
     }
     else
     {
-        type = (p->type == IOREQ_TYPE_PIO) ?
-                XEN_DMOP_IO_RANGE_PORT : XEN_DMOP_IO_RANGE_MEMORY;
-        addr = p->addr;
+        *type = (p->type == IOREQ_TYPE_PIO) ?
+                 XEN_DMOP_IO_RANGE_PORT : XEN_DMOP_IO_RANGE_MEMORY;
+        *addr = p->addr;
     }
 
+    return 0;
+}
+
+struct hvm_ioreq_server *hvm_select_ioreq_server(struct domain *d,
+                                                 ioreq_t *p)
+{
+    struct hvm_ioreq_server *s;
+    uint8_t type;
+    uint64_t addr;
+    unsigned int id;
+
+    if ( hvm_get_ioreq_server_range_type(d, p, &type, &addr) )
+        return NULL;
+
     FOR_EACH_IOREQ_SERVER(d, id, s)
     {
         struct rangeset *r;
@@ -1351,7 +1378,7 @@ static int hvm_send_buffered_ioreq(struct hvm_ioreq_server *s, ioreq_t *p)
     pg = iorp->va;
 
     if ( !pg )
-        return X86EMUL_UNHANDLEABLE;
+        return IOREQ_IO_UNHANDLED;
 
     /*
      * Return 0 for the cases we can't deal with:
@@ -1381,7 +1408,7 @@ static int hvm_send_buffered_ioreq(struct hvm_ioreq_server *s, ioreq_t *p)
         break;
     default:
         gdprintk(XENLOG_WARNING, "unexpected ioreq size: %u\n", p->size);
-        return X86EMUL_UNHANDLEABLE;
+        return IOREQ_IO_UNHANDLED;
     }
 
     spin_lock(&s->bufioreq_lock);
@@ -1391,7 +1418,7 @@ static int hvm_send_buffered_ioreq(struct hvm_ioreq_server *s, ioreq_t *p)
     {
         /* The queue is full: send the iopacket through the normal path. */
         spin_unlock(&s->bufioreq_lock);
-        return X86EMUL_UNHANDLEABLE;
+        return IOREQ_IO_UNHANDLED;
     }
 
     pg->buf_ioreq[pg->ptrs.write_pointer % IOREQ_BUFFER_SLOT_NUM] = bp;
@@ -1422,7 +1449,7 @@ static int hvm_send_buffered_ioreq(struct hvm_ioreq_server *s, ioreq_t *p)
     notify_via_xen_event_channel(d, s->bufioreq_evtchn);
     spin_unlock(&s->bufioreq_lock);
 
-    return X86EMUL_OKAY;
+    return IOREQ_IO_HANDLED;
 }
 
 int hvm_send_ioreq(struct hvm_ioreq_server *s, ioreq_t *proto_p,
@@ -1438,7 +1465,7 @@ int hvm_send_ioreq(struct hvm_ioreq_server *s, ioreq_t *proto_p,
         return hvm_send_buffered_ioreq(s, proto_p);
 
     if ( unlikely(!vcpu_start_shutdown_deferral(curr)) )
-        return X86EMUL_RETRY;
+        return IOREQ_IO_RETRY;
 
     list_for_each_entry ( sv,
                           &s->ioreq_vcpu_list,
@@ -1478,11 +1505,11 @@ int hvm_send_ioreq(struct hvm_ioreq_server *s, ioreq_t *proto_p,
             notify_via_xen_event_channel(d, port);
 
             sv->pending = true;
-            return X86EMUL_RETRY;
+            return IOREQ_IO_RETRY;
         }
     }
 
-    return X86EMUL_UNHANDLEABLE;
+    return IOREQ_IO_UNHANDLED;
 }
 
 unsigned int hvm_broadcast_ioreq(ioreq_t *p, bool buffered)
@@ -1496,7 +1523,7 @@ unsigned int hvm_broadcast_ioreq(ioreq_t *p, bool buffered)
         if ( !s->enabled )
             continue;
 
-        if ( hvm_send_ioreq(s, p, buffered) == X86EMUL_UNHANDLEABLE )
+        if ( hvm_send_ioreq(s, p, buffered) == IOREQ_IO_UNHANDLED )
             failed++;
     }
 
@@ -1515,11 +1542,21 @@ static int hvm_access_cf8(
     return X86EMUL_UNHANDLEABLE;
 }
 
+void arch_hvm_ioreq_init(struct domain *d)
+{
+    register_portio_handler(d, 0xcf8, 4, hvm_access_cf8);
+}
+
+void arch_hvm_ioreq_destroy(struct domain *d)
+{
+
+}
+
 void hvm_ioreq_init(struct domain *d)
 {
     spin_lock_init(&d->arch.hvm.ioreq_server.lock);
 
-    register_portio_handler(d, 0xcf8, 4, hvm_access_cf8);
+    arch_hvm_ioreq_init(d);
 }
 
 /*
diff --git a/xen/include/asm-x86/hvm/ioreq.h b/xen/include/asm-x86/hvm/ioreq.h
index e2588e9..151b92b 100644
--- a/xen/include/asm-x86/hvm/ioreq.h
+++ b/xen/include/asm-x86/hvm/ioreq.h
@@ -55,6 +55,22 @@ unsigned int hvm_broadcast_ioreq(ioreq_t *p, bool buffered);
 
 void hvm_ioreq_init(struct domain *d);
 
+int arch_hvm_destroy_ioreq_server(struct hvm_ioreq_server *s);
+
+bool arch_handle_hvm_io_completion(enum hvm_io_completion io_completion);
+
+int hvm_get_ioreq_server_range_type(struct domain *d,
+                                    ioreq_t *p,
+                                    uint8_t *type,
+                                    uint64_t *addr);
+
+void arch_hvm_ioreq_init(struct domain *d);
+void arch_hvm_ioreq_destroy(struct domain *d);
+
+#define IOREQ_IO_HANDLED     X86EMUL_OKAY
+#define IOREQ_IO_UNHANDLED   X86EMUL_UNHANDLEABLE
+#define IOREQ_IO_RETRY       X86EMUL_RETRY
+
 #endif /* __ASM_X86_HVM_IOREQ_H__ */
 
 /*
-- 
2.7.4




* [PATCH V1 02/16] xen/ioreq: Make x86's IOREQ feature common
  2020-09-10 20:21 [PATCH V1 00/16] IOREQ feature (+ virtio-mmio) on Arm Oleksandr Tyshchenko
  2020-09-10 20:21 ` [PATCH V1 01/16] x86/ioreq: Prepare IOREQ feature for making it common Oleksandr Tyshchenko
@ 2020-09-10 20:21 ` Oleksandr Tyshchenko
  2020-09-14 14:17   ` Jan Beulich
  2020-09-24 18:01   ` Julien Grall
  2020-09-10 20:21 ` [PATCH V1 03/16] xen/ioreq: Make x86's hvm_ioreq_needs_completion() common Oleksandr Tyshchenko
                   ` (13 subsequent siblings)
  15 siblings, 2 replies; 111+ messages in thread
From: Oleksandr Tyshchenko @ 2020-09-10 20:21 UTC (permalink / raw)
  To: xen-devel
  Cc: Oleksandr Tyshchenko, Andrew Cooper, George Dunlap, Ian Jackson,
	Jan Beulich, Julien Grall, Stefano Stabellini, Wei Liu,
	Roger Pau Monné,
	Paul Durrant, Jun Nakajima, Kevin Tian, Tim Deegan, Julien Grall

From: Oleksandr Tyshchenko <oleksandr_tyshchenko@epam.com>

As a lot of x86 code can be re-used on Arm later on, this patch
moves the previously prepared IOREQ support to the common code.

The code movement is an almost verbatim copy, with the headers
re-ordered alphabetically.

This support is going to be used on Arm to be able to run a device
emulator outside of the Xen hypervisor.

Signed-off-by: Julien Grall <julien.grall@arm.com>
Signed-off-by: Oleksandr Tyshchenko <oleksandr_tyshchenko@epam.com>

---
Please note, this is a split/cleanup/hardening of Julien's PoC:
"Add support for Guest IO forwarding to a device emulator"

Changes RFC -> V1:
   - was split into three patches:
     - x86/ioreq: Prepare IOREQ feature for making it common
     - xen/ioreq: Make x86's IOREQ feature common
     - xen/ioreq: Make x86's hvm_ioreq_needs_completion() common
   - update MAINTAINERS file
   - do not use a separate subdir for the IOREQ stuff, move it to:
     - xen/common/ioreq.c
     - xen/include/xen/ioreq.h
   - update x86's files to include xen/ioreq.h
   - remove unneeded headers in arch/x86/hvm/ioreq.c
   - re-order the headers alphabetically in common/ioreq.c
   - update common/ioreq.c according to the newly introduced arch functions:
     arch_hvm_destroy_ioreq_server()/arch_handle_hvm_io_completion()
---
---
 MAINTAINERS                     |    8 +-
 xen/arch/x86/Kconfig            |    1 +
 xen/arch/x86/hvm/dm.c           |    2 +-
 xen/arch/x86/hvm/emulate.c      |    2 +-
 xen/arch/x86/hvm/hvm.c          |    2 +-
 xen/arch/x86/hvm/io.c           |    2 +-
 xen/arch/x86/hvm/ioreq.c        | 1425 +--------------------------------------
 xen/arch/x86/hvm/stdvga.c       |    2 +-
 xen/arch/x86/hvm/vmx/vvmx.c     |    3 +-
 xen/arch/x86/mm.c               |    2 +-
 xen/arch/x86/mm/shadow/common.c |    2 +-
 xen/common/Kconfig              |    3 +
 xen/common/Makefile             |    1 +
 xen/common/ioreq.c              | 1410 ++++++++++++++++++++++++++++++++++++++
 xen/include/asm-x86/hvm/ioreq.h |   35 +-
 xen/include/xen/ioreq.h         |   82 +++
 16 files changed, 1533 insertions(+), 1449 deletions(-)
 create mode 100644 xen/common/ioreq.c
 create mode 100644 xen/include/xen/ioreq.h

diff --git a/MAINTAINERS b/MAINTAINERS
index 33fe513..72ba472 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -333,6 +333,13 @@ X:	xen/drivers/passthrough/vtd/
 X:	xen/drivers/passthrough/device_tree.c
 F:	xen/include/xen/iommu.h
 
+I/O EMULATION (IOREQ)
+M:	Paul Durrant <paul@xen.org>
+S:	Supported
+F:	xen/common/ioreq.c
+F:	xen/include/xen/ioreq.h
+F:	xen/include/public/hvm/ioreq.h
+
 KCONFIG
 M:	Doug Goldstein <cardoe@cardoe.com>
 S:	Supported
@@ -549,7 +556,6 @@ F:	xen/arch/x86/hvm/ioreq.c
 F:	xen/include/asm-x86/hvm/emulate.h
 F:	xen/include/asm-x86/hvm/io.h
 F:	xen/include/asm-x86/hvm/ioreq.h
-F:	xen/include/public/hvm/ioreq.h
 
 X86 MEMORY MANAGEMENT
 M:	Jan Beulich <jbeulich@suse.com>
diff --git a/xen/arch/x86/Kconfig b/xen/arch/x86/Kconfig
index a636a4b..f5a9f87 100644
--- a/xen/arch/x86/Kconfig
+++ b/xen/arch/x86/Kconfig
@@ -91,6 +91,7 @@ config PV_LINEAR_PT
 
 config HVM
 	def_bool !PV_SHIM_EXCLUSIVE
+	select IOREQ_SERVER
 	prompt "HVM support"
 	---help---
 	  Interfaces to support HVM domains.  HVM domains require hardware
diff --git a/xen/arch/x86/hvm/dm.c b/xen/arch/x86/hvm/dm.c
index 9930d68..5ce484a 100644
--- a/xen/arch/x86/hvm/dm.c
+++ b/xen/arch/x86/hvm/dm.c
@@ -17,12 +17,12 @@
 #include <xen/event.h>
 #include <xen/guest_access.h>
 #include <xen/hypercall.h>
+#include <xen/ioreq.h>
 #include <xen/nospec.h>
 #include <xen/sched.h>
 
 #include <asm/hap.h>
 #include <asm/hvm/cacheattr.h>
-#include <asm/hvm/ioreq.h>
 #include <asm/shadow.h>
 
 #include <xsm/xsm.h>
diff --git a/xen/arch/x86/hvm/emulate.c b/xen/arch/x86/hvm/emulate.c
index 8b4e73a..39bdf8d 100644
--- a/xen/arch/x86/hvm/emulate.c
+++ b/xen/arch/x86/hvm/emulate.c
@@ -10,6 +10,7 @@
  */
 
 #include <xen/init.h>
+#include <xen/ioreq.h>
 #include <xen/lib.h>
 #include <xen/sched.h>
 #include <xen/paging.h>
@@ -20,7 +21,6 @@
 #include <asm/xstate.h>
 #include <asm/hvm/emulate.h>
 #include <asm/hvm/hvm.h>
-#include <asm/hvm/ioreq.h>
 #include <asm/hvm/monitor.h>
 #include <asm/hvm/trace.h>
 #include <asm/hvm/support.h>
diff --git a/xen/arch/x86/hvm/hvm.c b/xen/arch/x86/hvm/hvm.c
index a9d1685..498e0e0 100644
--- a/xen/arch/x86/hvm/hvm.c
+++ b/xen/arch/x86/hvm/hvm.c
@@ -20,6 +20,7 @@
 
 #include <xen/ctype.h>
 #include <xen/init.h>
+#include <xen/ioreq.h>
 #include <xen/lib.h>
 #include <xen/trace.h>
 #include <xen/sched.h>
@@ -64,7 +65,6 @@
 #include <asm/hvm/trace.h>
 #include <asm/hvm/nestedhvm.h>
 #include <asm/hvm/monitor.h>
-#include <asm/hvm/ioreq.h>
 #include <asm/hvm/viridian.h>
 #include <asm/hvm/vm_event.h>
 #include <asm/altp2m.h>
diff --git a/xen/arch/x86/hvm/io.c b/xen/arch/x86/hvm/io.c
index 724ab44..14f8c89 100644
--- a/xen/arch/x86/hvm/io.c
+++ b/xen/arch/x86/hvm/io.c
@@ -19,6 +19,7 @@
  */
 
 #include <xen/init.h>
+#include <xen/ioreq.h>
 #include <xen/mm.h>
 #include <xen/lib.h>
 #include <xen/errno.h>
@@ -35,7 +36,6 @@
 #include <asm/shadow.h>
 #include <asm/p2m.h>
 #include <asm/hvm/hvm.h>
-#include <asm/hvm/ioreq.h>
 #include <asm/hvm/support.h>
 #include <asm/hvm/vpt.h>
 #include <asm/hvm/vpic.h>
diff --git a/xen/arch/x86/hvm/ioreq.c b/xen/arch/x86/hvm/ioreq.c
index d912655..102b758 100644
--- a/xen/arch/x86/hvm/ioreq.c
+++ b/xen/arch/x86/hvm/ioreq.c
@@ -16,1086 +16,39 @@
  * this program; If not, see <http://www.gnu.org/licenses/>.
  */
 
-#include <xen/ctype.h>
-#include <xen/init.h>
-#include <xen/lib.h>
-#include <xen/trace.h>
-#include <xen/sched.h>
-#include <xen/irq.h>
-#include <xen/softirq.h>
 #include <xen/domain.h>
-#include <xen/event.h>
-#include <xen/paging.h>
-#include <xen/vpci.h>
+#include <xen/ioreq.h>
 
-#include <asm/hvm/emulate.h>
-#include <asm/hvm/hvm.h>
-#include <asm/hvm/ioreq.h>
-#include <asm/hvm/vmx/vmx.h>
-
-#include <public/hvm/ioreq.h>
-#include <public/hvm/params.h>
-
-static void set_ioreq_server(struct domain *d, unsigned int id,
-                             struct hvm_ioreq_server *s)
-{
-    ASSERT(id < MAX_NR_IOREQ_SERVERS);
-    ASSERT(!s || !d->arch.hvm.ioreq_server.server[id]);
-
-    d->arch.hvm.ioreq_server.server[id] = s;
-}
-
-#define GET_IOREQ_SERVER(d, id) \
-    (d)->arch.hvm.ioreq_server.server[id]
-
-static struct hvm_ioreq_server *get_ioreq_server(const struct domain *d,
-                                                 unsigned int id)
-{
-    if ( id >= MAX_NR_IOREQ_SERVERS )
-        return NULL;
-
-    return GET_IOREQ_SERVER(d, id);
-}
-
-/*
- * Iterate over all possible ioreq servers.
- *
- * NOTE: The iteration is backwards such that more recently created
- *       ioreq servers are favoured in hvm_select_ioreq_server().
- *       This is a semantic that previously existed when ioreq servers
- *       were held in a linked list.
- */
-#define FOR_EACH_IOREQ_SERVER(d, id, s) \
-    for ( (id) = MAX_NR_IOREQ_SERVERS; (id) != 0; ) \
-        if ( !(s = GET_IOREQ_SERVER(d, --(id))) ) \
-            continue; \
-        else
-
-static ioreq_t *get_ioreq(struct hvm_ioreq_server *s, struct vcpu *v)
-{
-    shared_iopage_t *p = s->ioreq.va;
-
-    ASSERT((v == current) || !vcpu_runnable(v));
-    ASSERT(p != NULL);
-
-    return &p->vcpu_ioreq[v->vcpu_id];
-}
-
-static struct hvm_ioreq_vcpu *get_pending_vcpu(const struct vcpu *v,
-                                               struct hvm_ioreq_server **srvp)
-{
-    struct domain *d = v->domain;
-    struct hvm_ioreq_server *s;
-    unsigned int id;
-
-    FOR_EACH_IOREQ_SERVER(d, id, s)
-    {
-        struct hvm_ioreq_vcpu *sv;
-
-        list_for_each_entry ( sv,
-                              &s->ioreq_vcpu_list,
-                              list_entry )
-        {
-            if ( sv->vcpu == v && sv->pending )
-            {
-                if ( srvp )
-                    *srvp = s;
-                return sv;
-            }
-        }
-    }
-
-    return NULL;
-}
-
-bool hvm_io_pending(struct vcpu *v)
-{
-    return get_pending_vcpu(v, NULL);
-}
-
-static bool hvm_wait_for_io(struct hvm_ioreq_vcpu *sv, ioreq_t *p)
-{
-    unsigned int prev_state = STATE_IOREQ_NONE;
-    unsigned int state = p->state;
-    uint64_t data = ~0;
-
-    smp_rmb();
-
-    /*
-     * The only reason we should see this condition be false is when an
-     * emulator dying races with I/O being requested.
-     */
-    while ( likely(state != STATE_IOREQ_NONE) )
-    {
-        if ( unlikely(state < prev_state) )
-        {
-            gdprintk(XENLOG_ERR, "Weird HVM ioreq state transition %u -> %u\n",
-                     prev_state, state);
-            sv->pending = false;
-            domain_crash(sv->vcpu->domain);
-            return false; /* bail */
-        }
-
-        switch ( prev_state = state )
-        {
-        case STATE_IORESP_READY: /* IORESP_READY -> NONE */
-            p->state = STATE_IOREQ_NONE;
-            data = p->data;
-            break;
-
-        case STATE_IOREQ_READY:  /* IOREQ_{READY,INPROCESS} -> IORESP_READY */
-        case STATE_IOREQ_INPROCESS:
-            wait_on_xen_event_channel(sv->ioreq_evtchn,
-                                      ({ state = p->state;
-                                         smp_rmb();
-                                         state != prev_state; }));
-            continue;
-
-        default:
-            gdprintk(XENLOG_ERR, "Weird HVM iorequest state %u\n", state);
-            sv->pending = false;
-            domain_crash(sv->vcpu->domain);
-            return false; /* bail */
-        }
-
-        break;
-    }
-
-    p = &sv->vcpu->arch.hvm.hvm_io.io_req;
-    if ( hvm_ioreq_needs_completion(p) )
-        p->data = data;
-
-    sv->pending = false;
-
-    return true;
-}
-
-bool arch_handle_hvm_io_completion(enum hvm_io_completion io_completion)
-{
-    switch ( io_completion )
-    {
-    case HVMIO_realmode_completion:
-    {
-        struct hvm_emulate_ctxt ctxt;
-
-        hvm_emulate_init_once(&ctxt, NULL, guest_cpu_user_regs());
-        vmx_realmode_emulate_one(&ctxt);
-        hvm_emulate_writeback(&ctxt);
-
-        break;
-    }
-
-    default:
-        ASSERT_UNREACHABLE();
-        break;
-    }
-
-    return true;
-}
-
-bool handle_hvm_io_completion(struct vcpu *v)
-{
-    struct domain *d = v->domain;
-    struct hvm_vcpu_io *vio = &v->arch.hvm.hvm_io;
-    struct hvm_ioreq_server *s;
-    struct hvm_ioreq_vcpu *sv;
-    enum hvm_io_completion io_completion;
-
-    if ( has_vpci(d) && vpci_process_pending(v) )
-    {
-        raise_softirq(SCHEDULE_SOFTIRQ);
-        return false;
-    }
-
-    sv = get_pending_vcpu(v, &s);
-    if ( sv && !hvm_wait_for_io(sv, get_ioreq(s, v)) )
-        return false;
-
-    vio->io_req.state = hvm_ioreq_needs_completion(&vio->io_req) ?
-        STATE_IORESP_READY : STATE_IOREQ_NONE;
-
-    msix_write_completion(v);
-    vcpu_end_shutdown_deferral(v);
-
-    io_completion = vio->io_completion;
-    vio->io_completion = HVMIO_no_completion;
-
-    switch ( io_completion )
-    {
-    case HVMIO_no_completion:
-        break;
-
-    case HVMIO_mmio_completion:
-        return handle_mmio();
-
-    case HVMIO_pio_completion:
-        return handle_pio(vio->io_req.addr, vio->io_req.size,
-                          vio->io_req.dir);
-
-    default:
-        return arch_handle_hvm_io_completion(io_completion);
-    }
-
-    return true;
-}
-
-static gfn_t hvm_alloc_legacy_ioreq_gfn(struct hvm_ioreq_server *s)
-{
-    struct domain *d = s->target;
-    unsigned int i;
-
-    BUILD_BUG_ON(HVM_PARAM_BUFIOREQ_PFN != HVM_PARAM_IOREQ_PFN + 1);
-
-    for ( i = HVM_PARAM_IOREQ_PFN; i <= HVM_PARAM_BUFIOREQ_PFN; i++ )
-    {
-        if ( !test_and_clear_bit(i, &d->arch.hvm.ioreq_gfn.legacy_mask) )
-            return _gfn(d->arch.hvm.params[i]);
-    }
-
-    return INVALID_GFN;
-}
-
-static gfn_t hvm_alloc_ioreq_gfn(struct hvm_ioreq_server *s)
-{
-    struct domain *d = s->target;
-    unsigned int i;
-
-    for ( i = 0; i < sizeof(d->arch.hvm.ioreq_gfn.mask) * 8; i++ )
-    {
-        if ( test_and_clear_bit(i, &d->arch.hvm.ioreq_gfn.mask) )
-            return _gfn(d->arch.hvm.ioreq_gfn.base + i);
-    }
-
-    /*
-     * If we are out of 'normal' GFNs then we may still have a 'legacy'
-     * GFN available.
-     */
-    return hvm_alloc_legacy_ioreq_gfn(s);
-}
-
-static bool hvm_free_legacy_ioreq_gfn(struct hvm_ioreq_server *s,
-                                      gfn_t gfn)
-{
-    struct domain *d = s->target;
-    unsigned int i;
-
-    for ( i = HVM_PARAM_IOREQ_PFN; i <= HVM_PARAM_BUFIOREQ_PFN; i++ )
-    {
-        if ( gfn_eq(gfn, _gfn(d->arch.hvm.params[i])) )
-             break;
-    }
-    if ( i > HVM_PARAM_BUFIOREQ_PFN )
-        return false;
-
-    set_bit(i, &d->arch.hvm.ioreq_gfn.legacy_mask);
-    return true;
-}
-
-static void hvm_free_ioreq_gfn(struct hvm_ioreq_server *s, gfn_t gfn)
-{
-    struct domain *d = s->target;
-    unsigned int i = gfn_x(gfn) - d->arch.hvm.ioreq_gfn.base;
-
-    ASSERT(!gfn_eq(gfn, INVALID_GFN));
-
-    if ( !hvm_free_legacy_ioreq_gfn(s, gfn) )
-    {
-        ASSERT(i < sizeof(d->arch.hvm.ioreq_gfn.mask) * 8);
-        set_bit(i, &d->arch.hvm.ioreq_gfn.mask);
-    }
-}
-
-static void hvm_unmap_ioreq_gfn(struct hvm_ioreq_server *s, bool buf)
-{
-    struct hvm_ioreq_page *iorp = buf ? &s->bufioreq : &s->ioreq;
-
-    if ( gfn_eq(iorp->gfn, INVALID_GFN) )
-        return;
-
-    destroy_ring_for_helper(&iorp->va, iorp->page);
-    iorp->page = NULL;
-
-    hvm_free_ioreq_gfn(s, iorp->gfn);
-    iorp->gfn = INVALID_GFN;
-}
-
-static int hvm_map_ioreq_gfn(struct hvm_ioreq_server *s, bool buf)
-{
-    struct domain *d = s->target;
-    struct hvm_ioreq_page *iorp = buf ? &s->bufioreq : &s->ioreq;
-    int rc;
-
-    if ( iorp->page )
-    {
-        /*
-         * If a page has already been allocated (which will happen on
-         * demand if hvm_get_ioreq_server_frame() is called), then
-         * mapping a guest frame is not permitted.
-         */
-        if ( gfn_eq(iorp->gfn, INVALID_GFN) )
-            return -EPERM;
-
-        return 0;
-    }
-
-    if ( d->is_dying )
-        return -EINVAL;
-
-    iorp->gfn = hvm_alloc_ioreq_gfn(s);
-
-    if ( gfn_eq(iorp->gfn, INVALID_GFN) )
-        return -ENOMEM;
-
-    rc = prepare_ring_for_helper(d, gfn_x(iorp->gfn), &iorp->page,
-                                 &iorp->va);
-
-    if ( rc )
-        hvm_unmap_ioreq_gfn(s, buf);
-
-    return rc;
-}
-
-static int hvm_alloc_ioreq_mfn(struct hvm_ioreq_server *s, bool buf)
-{
-    struct hvm_ioreq_page *iorp = buf ? &s->bufioreq : &s->ioreq;
-    struct page_info *page;
-
-    if ( iorp->page )
-    {
-        /*
-         * If a guest frame has already been mapped (which may happen
-         * on demand if hvm_get_ioreq_server_info() is called), then
-         * allocating a page is not permitted.
-         */
-        if ( !gfn_eq(iorp->gfn, INVALID_GFN) )
-            return -EPERM;
-
-        return 0;
-    }
-
-    page = alloc_domheap_page(s->target, MEMF_no_refcount);
-
-    if ( !page )
-        return -ENOMEM;
-
-    if ( !get_page_and_type(page, s->target, PGT_writable_page) )
-    {
-        /*
-         * The domain can't possibly know about this page yet, so failure
-         * here is a clear indication of something fishy going on.
-         */
-        domain_crash(s->emulator);
-        return -ENODATA;
-    }
-
-    iorp->va = __map_domain_page_global(page);
-    if ( !iorp->va )
-        goto fail;
-
-    iorp->page = page;
-    clear_page(iorp->va);
-    return 0;
-
- fail:
-    put_page_alloc_ref(page);
-    put_page_and_type(page);
-
-    return -ENOMEM;
-}
-
-static void hvm_free_ioreq_mfn(struct hvm_ioreq_server *s, bool buf)
-{
-    struct hvm_ioreq_page *iorp = buf ? &s->bufioreq : &s->ioreq;
-    struct page_info *page = iorp->page;
-
-    if ( !page )
-        return;
-
-    iorp->page = NULL;
-
-    unmap_domain_page_global(iorp->va);
-    iorp->va = NULL;
-
-    put_page_alloc_ref(page);
-    put_page_and_type(page);
-}
-
-bool is_ioreq_server_page(struct domain *d, const struct page_info *page)
-{
-    const struct hvm_ioreq_server *s;
-    unsigned int id;
-    bool found = false;
-
-    spin_lock_recursive(&d->arch.hvm.ioreq_server.lock);
-
-    FOR_EACH_IOREQ_SERVER(d, id, s)
-    {
-        if ( (s->ioreq.page == page) || (s->bufioreq.page == page) )
-        {
-            found = true;
-            break;
-        }
-    }
-
-    spin_unlock_recursive(&d->arch.hvm.ioreq_server.lock);
-
-    return found;
-}
-
-static void hvm_remove_ioreq_gfn(struct hvm_ioreq_server *s, bool buf)
-
-{
-    struct domain *d = s->target;
-    struct hvm_ioreq_page *iorp = buf ? &s->bufioreq : &s->ioreq;
-
-    if ( gfn_eq(iorp->gfn, INVALID_GFN) )
-        return;
-
-    if ( guest_physmap_remove_page(d, iorp->gfn,
-                                   page_to_mfn(iorp->page), 0) )
-        domain_crash(d);
-    clear_page(iorp->va);
-}
-
-static int hvm_add_ioreq_gfn(struct hvm_ioreq_server *s, bool buf)
-{
-    struct domain *d = s->target;
-    struct hvm_ioreq_page *iorp = buf ? &s->bufioreq : &s->ioreq;
-    int rc;
-
-    if ( gfn_eq(iorp->gfn, INVALID_GFN) )
-        return 0;
-
-    clear_page(iorp->va);
-
-    rc = guest_physmap_add_page(d, iorp->gfn,
-                                page_to_mfn(iorp->page), 0);
-    if ( rc == 0 )
-        paging_mark_pfn_dirty(d, _pfn(gfn_x(iorp->gfn)));
-
-    return rc;
-}
-
-static void hvm_update_ioreq_evtchn(struct hvm_ioreq_server *s,
-                                    struct hvm_ioreq_vcpu *sv)
-{
-    ASSERT(spin_is_locked(&s->lock));
-
-    if ( s->ioreq.va != NULL )
-    {
-        ioreq_t *p = get_ioreq(s, sv->vcpu);
-
-        p->vp_eport = sv->ioreq_evtchn;
-    }
-}
-
-#define HANDLE_BUFIOREQ(s) \
-    ((s)->bufioreq_handling != HVM_IOREQSRV_BUFIOREQ_OFF)
-
-static int hvm_ioreq_server_add_vcpu(struct hvm_ioreq_server *s,
-                                     struct vcpu *v)
-{
-    struct hvm_ioreq_vcpu *sv;
-    int rc;
-
-    sv = xzalloc(struct hvm_ioreq_vcpu);
-
-    rc = -ENOMEM;
-    if ( !sv )
-        goto fail1;
-
-    spin_lock(&s->lock);
-
-    rc = alloc_unbound_xen_event_channel(v->domain, v->vcpu_id,
-                                         s->emulator->domain_id, NULL);
-    if ( rc < 0 )
-        goto fail2;
-
-    sv->ioreq_evtchn = rc;
-
-    if ( v->vcpu_id == 0 && HANDLE_BUFIOREQ(s) )
-    {
-        rc = alloc_unbound_xen_event_channel(v->domain, 0,
-                                             s->emulator->domain_id, NULL);
-        if ( rc < 0 )
-            goto fail3;
-
-        s->bufioreq_evtchn = rc;
-    }
-
-    sv->vcpu = v;
-
-    list_add(&sv->list_entry, &s->ioreq_vcpu_list);
-
-    if ( s->enabled )
-        hvm_update_ioreq_evtchn(s, sv);
-
-    spin_unlock(&s->lock);
-    return 0;
-
- fail3:
-    free_xen_event_channel(v->domain, sv->ioreq_evtchn);
-
- fail2:
-    spin_unlock(&s->lock);
-    xfree(sv);
-
- fail1:
-    return rc;
-}
-
-static void hvm_ioreq_server_remove_vcpu(struct hvm_ioreq_server *s,
-                                         struct vcpu *v)
-{
-    struct hvm_ioreq_vcpu *sv;
-
-    spin_lock(&s->lock);
-
-    list_for_each_entry ( sv,
-                          &s->ioreq_vcpu_list,
-                          list_entry )
-    {
-        if ( sv->vcpu != v )
-            continue;
-
-        list_del(&sv->list_entry);
-
-        if ( v->vcpu_id == 0 && HANDLE_BUFIOREQ(s) )
-            free_xen_event_channel(v->domain, s->bufioreq_evtchn);
-
-        free_xen_event_channel(v->domain, sv->ioreq_evtchn);
-
-        xfree(sv);
-        break;
-    }
-
-    spin_unlock(&s->lock);
-}
-
-static void hvm_ioreq_server_remove_all_vcpus(struct hvm_ioreq_server *s)
-{
-    struct hvm_ioreq_vcpu *sv, *next;
-
-    spin_lock(&s->lock);
-
-    list_for_each_entry_safe ( sv,
-                               next,
-                               &s->ioreq_vcpu_list,
-                               list_entry )
-    {
-        struct vcpu *v = sv->vcpu;
-
-        list_del(&sv->list_entry);
-
-        if ( v->vcpu_id == 0 && HANDLE_BUFIOREQ(s) )
-            free_xen_event_channel(v->domain, s->bufioreq_evtchn);
-
-        free_xen_event_channel(v->domain, sv->ioreq_evtchn);
-
-        xfree(sv);
-    }
-
-    spin_unlock(&s->lock);
-}
-
-static int hvm_ioreq_server_map_pages(struct hvm_ioreq_server *s)
-{
-    int rc;
-
-    rc = hvm_map_ioreq_gfn(s, false);
-
-    if ( !rc && HANDLE_BUFIOREQ(s) )
-        rc = hvm_map_ioreq_gfn(s, true);
-
-    if ( rc )
-        hvm_unmap_ioreq_gfn(s, false);
-
-    return rc;
-}
-
-static void hvm_ioreq_server_unmap_pages(struct hvm_ioreq_server *s)
-{
-    hvm_unmap_ioreq_gfn(s, true);
-    hvm_unmap_ioreq_gfn(s, false);
-}
-
-static int hvm_ioreq_server_alloc_pages(struct hvm_ioreq_server *s)
-{
-    int rc;
-
-    rc = hvm_alloc_ioreq_mfn(s, false);
-
-    if ( !rc && (s->bufioreq_handling != HVM_IOREQSRV_BUFIOREQ_OFF) )
-        rc = hvm_alloc_ioreq_mfn(s, true);
-
-    if ( rc )
-        hvm_free_ioreq_mfn(s, false);
-
-    return rc;
-}
-
-static void hvm_ioreq_server_free_pages(struct hvm_ioreq_server *s)
-{
-    hvm_free_ioreq_mfn(s, true);
-    hvm_free_ioreq_mfn(s, false);
-}
-
-static void hvm_ioreq_server_free_rangesets(struct hvm_ioreq_server *s)
-{
-    unsigned int i;
-
-    for ( i = 0; i < NR_IO_RANGE_TYPES; i++ )
-        rangeset_destroy(s->range[i]);
-}
-
-static int hvm_ioreq_server_alloc_rangesets(struct hvm_ioreq_server *s,
-                                            ioservid_t id)
-{
-    unsigned int i;
-    int rc;
-
-    for ( i = 0; i < NR_IO_RANGE_TYPES; i++ )
-    {
-        char *name;
-
-        rc = asprintf(&name, "ioreq_server %d %s", id,
-                      (i == XEN_DMOP_IO_RANGE_PORT) ? "port" :
-                      (i == XEN_DMOP_IO_RANGE_MEMORY) ? "memory" :
-                      (i == XEN_DMOP_IO_RANGE_PCI) ? "pci" :
-                      "");
-        if ( rc )
-            goto fail;
-
-        s->range[i] = rangeset_new(s->target, name,
-                                   RANGESETF_prettyprint_hex);
-
-        xfree(name);
-
-        rc = -ENOMEM;
-        if ( !s->range[i] )
-            goto fail;
-
-        rangeset_limit(s->range[i], MAX_NR_IO_RANGES);
-    }
-
-    return 0;
-
- fail:
-    hvm_ioreq_server_free_rangesets(s);
-
-    return rc;
-}
-
-static void hvm_ioreq_server_enable(struct hvm_ioreq_server *s)
-{
-    struct hvm_ioreq_vcpu *sv;
-
-    spin_lock(&s->lock);
-
-    if ( s->enabled )
-        goto done;
-
-    hvm_remove_ioreq_gfn(s, false);
-    hvm_remove_ioreq_gfn(s, true);
-
-    s->enabled = true;
-
-    list_for_each_entry ( sv,
-                          &s->ioreq_vcpu_list,
-                          list_entry )
-        hvm_update_ioreq_evtchn(s, sv);
-
-  done:
-    spin_unlock(&s->lock);
-}
-
-static void hvm_ioreq_server_disable(struct hvm_ioreq_server *s)
-{
-    spin_lock(&s->lock);
-
-    if ( !s->enabled )
-        goto done;
-
-    hvm_add_ioreq_gfn(s, true);
-    hvm_add_ioreq_gfn(s, false);
-
-    s->enabled = false;
-
- done:
-    spin_unlock(&s->lock);
-}
-
-static int hvm_ioreq_server_init(struct hvm_ioreq_server *s,
-                                 struct domain *d, int bufioreq_handling,
-                                 ioservid_t id)
-{
-    struct domain *currd = current->domain;
-    struct vcpu *v;
-    int rc;
-
-    s->target = d;
-
-    get_knownalive_domain(currd);
-    s->emulator = currd;
-
-    spin_lock_init(&s->lock);
-    INIT_LIST_HEAD(&s->ioreq_vcpu_list);
-    spin_lock_init(&s->bufioreq_lock);
-
-    s->ioreq.gfn = INVALID_GFN;
-    s->bufioreq.gfn = INVALID_GFN;
-
-    rc = hvm_ioreq_server_alloc_rangesets(s, id);
-    if ( rc )
-        return rc;
-
-    s->bufioreq_handling = bufioreq_handling;
-
-    for_each_vcpu ( d, v )
-    {
-        rc = hvm_ioreq_server_add_vcpu(s, v);
-        if ( rc )
-            goto fail_add;
-    }
-
-    return 0;
-
- fail_add:
-    hvm_ioreq_server_remove_all_vcpus(s);
-    hvm_ioreq_server_unmap_pages(s);
-
-    hvm_ioreq_server_free_rangesets(s);
-
-    put_domain(s->emulator);
-    return rc;
-}
-
-static void hvm_ioreq_server_deinit(struct hvm_ioreq_server *s)
-{
-    ASSERT(!s->enabled);
-    hvm_ioreq_server_remove_all_vcpus(s);
-
-    /*
-     * NOTE: It is safe to call both hvm_ioreq_server_unmap_pages() and
-     *       hvm_ioreq_server_free_pages() in that order.
-     *       This is because the former will do nothing if the pages
-     *       are not mapped, leaving the page to be freed by the latter.
-     *       However if the pages are mapped then the former will set
-     *       the page_info pointer to NULL, meaning the latter will do
-     *       nothing.
-     */
-    hvm_ioreq_server_unmap_pages(s);
-    hvm_ioreq_server_free_pages(s);
-
-    hvm_ioreq_server_free_rangesets(s);
-
-    put_domain(s->emulator);
-}
-
-int hvm_create_ioreq_server(struct domain *d, int bufioreq_handling,
-                            ioservid_t *id)
-{
-    struct hvm_ioreq_server *s;
-    unsigned int i;
-    int rc;
-
-    if ( bufioreq_handling > HVM_IOREQSRV_BUFIOREQ_ATOMIC )
-        return -EINVAL;
-
-    s = xzalloc(struct hvm_ioreq_server);
-    if ( !s )
-        return -ENOMEM;
-
-    domain_pause(d);
-    spin_lock_recursive(&d->arch.hvm.ioreq_server.lock);
-
-    for ( i = 0; i < MAX_NR_IOREQ_SERVERS; i++ )
-    {
-        if ( !GET_IOREQ_SERVER(d, i) )
-            break;
-    }
-
-    rc = -ENOSPC;
-    if ( i >= MAX_NR_IOREQ_SERVERS )
-        goto fail;
-
-    /*
-     * It is safe to call set_ioreq_server() prior to
-     * hvm_ioreq_server_init() since the target domain is paused.
-     */
-    set_ioreq_server(d, i, s);
-
-    rc = hvm_ioreq_server_init(s, d, bufioreq_handling, i);
-    if ( rc )
-    {
-        set_ioreq_server(d, i, NULL);
-        goto fail;
-    }
-
-    if ( id )
-        *id = i;
-
-    spin_unlock_recursive(&d->arch.hvm.ioreq_server.lock);
-    domain_unpause(d);
-
-    return 0;
-
- fail:
-    spin_unlock_recursive(&d->arch.hvm.ioreq_server.lock);
-    domain_unpause(d);
-
-    xfree(s);
-    return rc;
-}
-
-/* Called when target domain is paused */
-int arch_hvm_destroy_ioreq_server(struct hvm_ioreq_server *s)
-{
-    return p2m_set_ioreq_server(s->target, 0, s);
-}
-
-int hvm_destroy_ioreq_server(struct domain *d, ioservid_t id)
-{
-    struct hvm_ioreq_server *s;
-    int rc;
-
-    spin_lock_recursive(&d->arch.hvm.ioreq_server.lock);
-
-    s = get_ioreq_server(d, id);
-
-    rc = -ENOENT;
-    if ( !s )
-        goto out;
-
-    rc = -EPERM;
-    if ( s->emulator != current->domain )
-        goto out;
-
-    domain_pause(d);
-
-    arch_hvm_destroy_ioreq_server(s);
-
-    hvm_ioreq_server_disable(s);
-
-    /*
-     * It is safe to call hvm_ioreq_server_deinit() prior to
-     * set_ioreq_server() since the target domain is paused.
-     */
-    hvm_ioreq_server_deinit(s);
-    set_ioreq_server(d, id, NULL);
-
-    domain_unpause(d);
-
-    xfree(s);
-
-    rc = 0;
-
- out:
-    spin_unlock_recursive(&d->arch.hvm.ioreq_server.lock);
-
-    return rc;
-}
-
-int hvm_get_ioreq_server_info(struct domain *d, ioservid_t id,
-                              unsigned long *ioreq_gfn,
-                              unsigned long *bufioreq_gfn,
-                              evtchn_port_t *bufioreq_port)
-{
-    struct hvm_ioreq_server *s;
-    int rc;
-
-    spin_lock_recursive(&d->arch.hvm.ioreq_server.lock);
-
-    s = get_ioreq_server(d, id);
-
-    rc = -ENOENT;
-    if ( !s )
-        goto out;
-
-    rc = -EPERM;
-    if ( s->emulator != current->domain )
-        goto out;
-
-    if ( ioreq_gfn || bufioreq_gfn )
-    {
-        rc = hvm_ioreq_server_map_pages(s);
-        if ( rc )
-            goto out;
-    }
-
-    if ( ioreq_gfn )
-        *ioreq_gfn = gfn_x(s->ioreq.gfn);
-
-    if ( HANDLE_BUFIOREQ(s) )
-    {
-        if ( bufioreq_gfn )
-            *bufioreq_gfn = gfn_x(s->bufioreq.gfn);
-
-        if ( bufioreq_port )
-            *bufioreq_port = s->bufioreq_evtchn;
-    }
-
-    rc = 0;
-
- out:
-    spin_unlock_recursive(&d->arch.hvm.ioreq_server.lock);
-
-    return rc;
-}
-
-int hvm_get_ioreq_server_frame(struct domain *d, ioservid_t id,
-                               unsigned long idx, mfn_t *mfn)
-{
-    struct hvm_ioreq_server *s;
-    int rc;
-
-    ASSERT(is_hvm_domain(d));
-
-    spin_lock_recursive(&d->arch.hvm.ioreq_server.lock);
-
-    s = get_ioreq_server(d, id);
-
-    rc = -ENOENT;
-    if ( !s )
-        goto out;
-
-    rc = -EPERM;
-    if ( s->emulator != current->domain )
-        goto out;
-
-    rc = hvm_ioreq_server_alloc_pages(s);
-    if ( rc )
-        goto out;
+#include <public/hvm/ioreq.h>
+#include <public/hvm/params.h>
 
-    switch ( idx )
+bool arch_handle_hvm_io_completion(enum hvm_io_completion io_completion)
+{
+    switch ( io_completion )
     {
-    case XENMEM_resource_ioreq_server_frame_bufioreq:
-        rc = -ENOENT;
-        if ( !HANDLE_BUFIOREQ(s) )
-            goto out;
-
-        *mfn = page_to_mfn(s->bufioreq.page);
-        rc = 0;
-        break;
+    case HVMIO_realmode_completion:
+    {
+        struct hvm_emulate_ctxt ctxt;
 
-    case XENMEM_resource_ioreq_server_frame_ioreq(0):
-        *mfn = page_to_mfn(s->ioreq.page);
-        rc = 0;
-        break;
+        hvm_emulate_init_once(&ctxt, NULL, guest_cpu_user_regs());
+        vmx_realmode_emulate_one(&ctxt);
+        hvm_emulate_writeback(&ctxt);
 
-    default:
-        rc = -EINVAL;
         break;
     }
 
- out:
-    spin_unlock_recursive(&d->arch.hvm.ioreq_server.lock);
-
-    return rc;
-}
-
-int hvm_map_io_range_to_ioreq_server(struct domain *d, ioservid_t id,
-                                     uint32_t type, uint64_t start,
-                                     uint64_t end)
-{
-    struct hvm_ioreq_server *s;
-    struct rangeset *r;
-    int rc;
-
-    if ( start > end )
-        return -EINVAL;
-
-    spin_lock_recursive(&d->arch.hvm.ioreq_server.lock);
-
-    s = get_ioreq_server(d, id);
-
-    rc = -ENOENT;
-    if ( !s )
-        goto out;
-
-    rc = -EPERM;
-    if ( s->emulator != current->domain )
-        goto out;
-
-    switch ( type )
-    {
-    case XEN_DMOP_IO_RANGE_PORT:
-    case XEN_DMOP_IO_RANGE_MEMORY:
-    case XEN_DMOP_IO_RANGE_PCI:
-        r = s->range[type];
-        break;
-
     default:
-        r = NULL;
+        ASSERT_UNREACHABLE();
         break;
     }
 
-    rc = -EINVAL;
-    if ( !r )
-        goto out;
-
-    rc = -EEXIST;
-    if ( rangeset_overlaps_range(r, start, end) )
-        goto out;
-
-    rc = rangeset_add_range(r, start, end);
-
- out:
-    spin_unlock_recursive(&d->arch.hvm.ioreq_server.lock);
-
-    return rc;
+    return true;
 }
 
-int hvm_unmap_io_range_from_ioreq_server(struct domain *d, ioservid_t id,
-                                         uint32_t type, uint64_t start,
-                                         uint64_t end)
+/* Called when target domain is paused */
+int arch_hvm_destroy_ioreq_server(struct hvm_ioreq_server *s)
 {
-    struct hvm_ioreq_server *s;
-    struct rangeset *r;
-    int rc;
-
-    if ( start > end )
-        return -EINVAL;
-
-    spin_lock_recursive(&d->arch.hvm.ioreq_server.lock);
-
-    s = get_ioreq_server(d, id);
-
-    rc = -ENOENT;
-    if ( !s )
-        goto out;
-
-    rc = -EPERM;
-    if ( s->emulator != current->domain )
-        goto out;
-
-    switch ( type )
-    {
-    case XEN_DMOP_IO_RANGE_PORT:
-    case XEN_DMOP_IO_RANGE_MEMORY:
-    case XEN_DMOP_IO_RANGE_PCI:
-        r = s->range[type];
-        break;
-
-    default:
-        r = NULL;
-        break;
-    }
-
-    rc = -EINVAL;
-    if ( !r )
-        goto out;
-
-    rc = -ENOENT;
-    if ( !rangeset_contains_range(r, start, end) )
-        goto out;
-
-    rc = rangeset_remove_range(r, start, end);
-
- out:
-    spin_unlock_recursive(&d->arch.hvm.ioreq_server.lock);
-
-    return rc;
+    return p2m_set_ioreq_server(s->target, 0, s);
 }
 
 /*
@@ -1146,116 +99,6 @@ int hvm_map_mem_type_to_ioreq_server(struct domain *d, ioservid_t id,
     return rc;
 }
 
-int hvm_set_ioreq_server_state(struct domain *d, ioservid_t id,
-                               bool enabled)
-{
-    struct hvm_ioreq_server *s;
-    int rc;
-
-    spin_lock_recursive(&d->arch.hvm.ioreq_server.lock);
-
-    s = get_ioreq_server(d, id);
-
-    rc = -ENOENT;
-    if ( !s )
-        goto out;
-
-    rc = -EPERM;
-    if ( s->emulator != current->domain )
-        goto out;
-
-    domain_pause(d);
-
-    if ( enabled )
-        hvm_ioreq_server_enable(s);
-    else
-        hvm_ioreq_server_disable(s);
-
-    domain_unpause(d);
-
-    rc = 0;
-
- out:
-    spin_unlock_recursive(&d->arch.hvm.ioreq_server.lock);
-    return rc;
-}
-
-int hvm_all_ioreq_servers_add_vcpu(struct domain *d, struct vcpu *v)
-{
-    struct hvm_ioreq_server *s;
-    unsigned int id;
-    int rc;
-
-    spin_lock_recursive(&d->arch.hvm.ioreq_server.lock);
-
-    FOR_EACH_IOREQ_SERVER(d, id, s)
-    {
-        rc = hvm_ioreq_server_add_vcpu(s, v);
-        if ( rc )
-            goto fail;
-    }
-
-    spin_unlock_recursive(&d->arch.hvm.ioreq_server.lock);
-
-    return 0;
-
- fail:
-    while ( ++id != MAX_NR_IOREQ_SERVERS )
-    {
-        s = GET_IOREQ_SERVER(d, id);
-
-        if ( !s )
-            continue;
-
-        hvm_ioreq_server_remove_vcpu(s, v);
-    }
-
-    spin_unlock_recursive(&d->arch.hvm.ioreq_server.lock);
-
-    return rc;
-}
-
-void hvm_all_ioreq_servers_remove_vcpu(struct domain *d, struct vcpu *v)
-{
-    struct hvm_ioreq_server *s;
-    unsigned int id;
-
-    spin_lock_recursive(&d->arch.hvm.ioreq_server.lock);
-
-    FOR_EACH_IOREQ_SERVER(d, id, s)
-        hvm_ioreq_server_remove_vcpu(s, v);
-
-    spin_unlock_recursive(&d->arch.hvm.ioreq_server.lock);
-}
-
-void hvm_destroy_all_ioreq_servers(struct domain *d)
-{
-    struct hvm_ioreq_server *s;
-    unsigned int id;
-
-    arch_hvm_ioreq_destroy(d);
-
-    spin_lock_recursive(&d->arch.hvm.ioreq_server.lock);
-
-    /* No need to domain_pause() as the domain is being torn down */
-
-    FOR_EACH_IOREQ_SERVER(d, id, s)
-    {
-        hvm_ioreq_server_disable(s);
-
-        /*
-         * It is safe to call hvm_ioreq_server_deinit() prior to
-         * set_ioreq_server() since the target domain is being destroyed.
-         */
-        hvm_ioreq_server_deinit(s);
-        set_ioreq_server(d, id, NULL);
-
-        xfree(s);
-    }
-
-    spin_unlock_recursive(&d->arch.hvm.ioreq_server.lock);
-}
-
 int hvm_get_ioreq_server_range_type(struct domain *d,
                                     ioreq_t *p,
                                     uint8_t *type,
@@ -1303,233 +146,6 @@ int hvm_get_ioreq_server_range_type(struct domain *d,
     return 0;
 }
 
-struct hvm_ioreq_server *hvm_select_ioreq_server(struct domain *d,
-                                                 ioreq_t *p)
-{
-    struct hvm_ioreq_server *s;
-    uint8_t type;
-    uint64_t addr;
-    unsigned int id;
-
-    if ( hvm_get_ioreq_server_range_type(d, p, &type, &addr) )
-        return NULL;
-
-    FOR_EACH_IOREQ_SERVER(d, id, s)
-    {
-        struct rangeset *r;
-
-        if ( !s->enabled )
-            continue;
-
-        r = s->range[type];
-
-        switch ( type )
-        {
-            unsigned long start, end;
-
-        case XEN_DMOP_IO_RANGE_PORT:
-            start = addr;
-            end = start + p->size - 1;
-            if ( rangeset_contains_range(r, start, end) )
-                return s;
-
-            break;
-
-        case XEN_DMOP_IO_RANGE_MEMORY:
-            start = hvm_mmio_first_byte(p);
-            end = hvm_mmio_last_byte(p);
-
-            if ( rangeset_contains_range(r, start, end) )
-                return s;
-
-            break;
-
-        case XEN_DMOP_IO_RANGE_PCI:
-            if ( rangeset_contains_singleton(r, addr >> 32) )
-            {
-                p->type = IOREQ_TYPE_PCI_CONFIG;
-                p->addr = addr;
-                return s;
-            }
-
-            break;
-        }
-    }
-
-    return NULL;
-}
-
-static int hvm_send_buffered_ioreq(struct hvm_ioreq_server *s, ioreq_t *p)
-{
-    struct domain *d = current->domain;
-    struct hvm_ioreq_page *iorp;
-    buffered_iopage_t *pg;
-    buf_ioreq_t bp = { .data = p->data,
-                       .addr = p->addr,
-                       .type = p->type,
-                       .dir = p->dir };
-    /* Timeoffset sends 64b data, but no address. Use two consecutive slots. */
-    int qw = 0;
-
-    /* Ensure buffered_iopage fits in a page */
-    BUILD_BUG_ON(sizeof(buffered_iopage_t) > PAGE_SIZE);
-
-    iorp = &s->bufioreq;
-    pg = iorp->va;
-
-    if ( !pg )
-        return IOREQ_IO_UNHANDLED;
-
-    /*
-     * Return 0 for the cases we can't deal with:
-     *  - 'addr' is only a 20-bit field, so we cannot address beyond 1MB
-     *  - we cannot buffer accesses to guest memory buffers, as the guest
-     *    may expect the memory buffer to be synchronously accessed
-     *  - the count field is usually used with data_is_ptr and since we don't
-     *    support data_is_ptr we do not waste space for the count field either
-     */
-    if ( (p->addr > 0xffffful) || p->data_is_ptr || (p->count != 1) )
-        return 0;
-
-    switch ( p->size )
-    {
-    case 1:
-        bp.size = 0;
-        break;
-    case 2:
-        bp.size = 1;
-        break;
-    case 4:
-        bp.size = 2;
-        break;
-    case 8:
-        bp.size = 3;
-        qw = 1;
-        break;
-    default:
-        gdprintk(XENLOG_WARNING, "unexpected ioreq size: %u\n", p->size);
-        return IOREQ_IO_UNHANDLED;
-    }
-
-    spin_lock(&s->bufioreq_lock);
-
-    if ( (pg->ptrs.write_pointer - pg->ptrs.read_pointer) >=
-         (IOREQ_BUFFER_SLOT_NUM - qw) )
-    {
-        /* The queue is full: send the iopacket through the normal path. */
-        spin_unlock(&s->bufioreq_lock);
-        return IOREQ_IO_UNHANDLED;
-    }
-
-    pg->buf_ioreq[pg->ptrs.write_pointer % IOREQ_BUFFER_SLOT_NUM] = bp;
-
-    if ( qw )
-    {
-        bp.data = p->data >> 32;
-        pg->buf_ioreq[(pg->ptrs.write_pointer+1) % IOREQ_BUFFER_SLOT_NUM] = bp;
-    }
-
-    /* Make the ioreq_t visible /before/ write_pointer. */
-    smp_wmb();
-    pg->ptrs.write_pointer += qw ? 2 : 1;
-
-    /* Canonicalize read/write pointers to prevent their overflow. */
-    while ( (s->bufioreq_handling == HVM_IOREQSRV_BUFIOREQ_ATOMIC) &&
-            qw++ < IOREQ_BUFFER_SLOT_NUM &&
-            pg->ptrs.read_pointer >= IOREQ_BUFFER_SLOT_NUM )
-    {
-        union bufioreq_pointers old = pg->ptrs, new;
-        unsigned int n = old.read_pointer / IOREQ_BUFFER_SLOT_NUM;
-
-        new.read_pointer = old.read_pointer - n * IOREQ_BUFFER_SLOT_NUM;
-        new.write_pointer = old.write_pointer - n * IOREQ_BUFFER_SLOT_NUM;
-        cmpxchg(&pg->ptrs.full, old.full, new.full);
-    }
-
-    notify_via_xen_event_channel(d, s->bufioreq_evtchn);
-    spin_unlock(&s->bufioreq_lock);
-
-    return IOREQ_IO_HANDLED;
-}
-
-int hvm_send_ioreq(struct hvm_ioreq_server *s, ioreq_t *proto_p,
-                   bool buffered)
-{
-    struct vcpu *curr = current;
-    struct domain *d = curr->domain;
-    struct hvm_ioreq_vcpu *sv;
-
-    ASSERT(s);
-
-    if ( buffered )
-        return hvm_send_buffered_ioreq(s, proto_p);
-
-    if ( unlikely(!vcpu_start_shutdown_deferral(curr)) )
-        return IOREQ_IO_RETRY;
-
-    list_for_each_entry ( sv,
-                          &s->ioreq_vcpu_list,
-                          list_entry )
-    {
-        if ( sv->vcpu == curr )
-        {
-            evtchn_port_t port = sv->ioreq_evtchn;
-            ioreq_t *p = get_ioreq(s, curr);
-
-            if ( unlikely(p->state != STATE_IOREQ_NONE) )
-            {
-                gprintk(XENLOG_ERR, "device model set bad IO state %d\n",
-                        p->state);
-                break;
-            }
-
-            if ( unlikely(p->vp_eport != port) )
-            {
-                gprintk(XENLOG_ERR, "device model set bad event channel %d\n",
-                        p->vp_eport);
-                break;
-            }
-
-            proto_p->state = STATE_IOREQ_NONE;
-            proto_p->vp_eport = port;
-            *p = *proto_p;
-
-            prepare_wait_on_xen_event_channel(port);
-
-            /*
-             * Following happens /after/ blocking and setting up ioreq
-             * contents. prepare_wait_on_xen_event_channel() is an implicit
-             * barrier.
-             */
-            p->state = STATE_IOREQ_READY;
-            notify_via_xen_event_channel(d, port);
-
-            sv->pending = true;
-            return IOREQ_IO_RETRY;
-        }
-    }
-
-    return IOREQ_IO_UNHANDLED;
-}
-
-unsigned int hvm_broadcast_ioreq(ioreq_t *p, bool buffered)
-{
-    struct domain *d = current->domain;
-    struct hvm_ioreq_server *s;
-    unsigned int id, failed = 0;
-
-    FOR_EACH_IOREQ_SERVER(d, id, s)
-    {
-        if ( !s->enabled )
-            continue;
-
-        if ( hvm_send_ioreq(s, p, buffered) == IOREQ_IO_UNHANDLED )
-            failed++;
-    }
-
-    return failed;
-}
-
 static int hvm_access_cf8(
     int dir, unsigned int port, unsigned int bytes, uint32_t *val)
 {
@@ -1552,13 +168,6 @@ void arch_hvm_ioreq_destroy(struct domain *d)
 
 }
 
-void hvm_ioreq_init(struct domain *d)
-{
-    spin_lock_init(&d->arch.hvm.ioreq_server.lock);
-
-    arch_hvm_ioreq_init(d);
-}
-
 /*
  * Local variables:
  * mode: C
diff --git a/xen/arch/x86/hvm/stdvga.c b/xen/arch/x86/hvm/stdvga.c
index e267513..fd7cadb 100644
--- a/xen/arch/x86/hvm/stdvga.c
+++ b/xen/arch/x86/hvm/stdvga.c
@@ -27,10 +27,10 @@
  *  can have side effects.
  */
 
+#include <xen/ioreq.h>
 #include <xen/types.h>
 #include <xen/sched.h>
 #include <xen/domain_page.h>
-#include <asm/hvm/ioreq.h>
 #include <asm/hvm/support.h>
 #include <xen/numa.h>
 #include <xen/paging.h>
diff --git a/xen/arch/x86/hvm/vmx/vvmx.c b/xen/arch/x86/hvm/vmx/vvmx.c
index 1e51689..50e4e6e 100644
--- a/xen/arch/x86/hvm/vmx/vvmx.c
+++ b/xen/arch/x86/hvm/vmx/vvmx.c
@@ -19,10 +19,11 @@
  *
  */
 
+#include <xen/ioreq.h>
+
 #include <asm/types.h>
 #include <asm/mtrr.h>
 #include <asm/p2m.h>
-#include <asm/hvm/ioreq.h>
 #include <asm/hvm/vmx/vmx.h>
 #include <asm/hvm/vmx/vvmx.h>
 #include <asm/hvm/nestedhvm.h>
diff --git a/xen/arch/x86/mm.c b/xen/arch/x86/mm.c
index 638f6bf..776d2b6 100644
--- a/xen/arch/x86/mm.c
+++ b/xen/arch/x86/mm.c
@@ -100,6 +100,7 @@
  */
 
 #include <xen/init.h>
+#include <xen/ioreq.h>
 #include <xen/kernel.h>
 #include <xen/lib.h>
 #include <xen/mm.h>
@@ -141,7 +142,6 @@
 #include <asm/io_apic.h>
 #include <asm/pci.h>
 #include <asm/guest.h>
-#include <asm/hvm/ioreq.h>
 
 #include <asm/hvm/grant_table.h>
 #include <asm/pv/domain.h>
diff --git a/xen/arch/x86/mm/shadow/common.c b/xen/arch/x86/mm/shadow/common.c
index 7c7204f..3893579 100644
--- a/xen/arch/x86/mm/shadow/common.c
+++ b/xen/arch/x86/mm/shadow/common.c
@@ -20,6 +20,7 @@
  * along with this program; If not, see <http://www.gnu.org/licenses/>.
  */
 
+#include <xen/ioreq.h>
 #include <xen/types.h>
 #include <xen/mm.h>
 #include <xen/trace.h>
@@ -34,7 +35,6 @@
 #include <asm/current.h>
 #include <asm/flushtlb.h>
 #include <asm/shadow.h>
-#include <asm/hvm/ioreq.h>
 #include <xen/numa.h>
 #include "private.h"
 
diff --git a/xen/common/Kconfig b/xen/common/Kconfig
index 15e3b79..fb6fb51 100644
--- a/xen/common/Kconfig
+++ b/xen/common/Kconfig
@@ -139,6 +139,9 @@ config HYPFS_CONFIG
 	  Disable this option in case you want to spare some memory or you
 	  want to hide the .config contents from dom0.
 
+config IOREQ_SERVER
+	bool
+
 config KEXEC
 	bool "kexec support"
 	default y
diff --git a/xen/common/Makefile b/xen/common/Makefile
index 06881d0..8df2b6e 100644
--- a/xen/common/Makefile
+++ b/xen/common/Makefile
@@ -16,6 +16,7 @@ obj-$(CONFIG_GRANT_TABLE) += grant_table.o
 obj-y += guestcopy.o
 obj-bin-y += gunzip.init.o
 obj-$(CONFIG_HYPFS) += hypfs.o
+obj-$(CONFIG_IOREQ_SERVER) += ioreq.o
 obj-y += irq.o
 obj-y += kernel.o
 obj-y += keyhandler.o
diff --git a/xen/common/ioreq.c b/xen/common/ioreq.c
new file mode 100644
index 0000000..5017617
--- /dev/null
+++ b/xen/common/ioreq.c
@@ -0,0 +1,1410 @@
+/*
+ * common/ioreq.c: hardware virtual machine I/O emulation
+ *
+ * Copyright (c) 2016 Citrix Systems Inc.
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms and conditions of the GNU General Public License,
+ * version 2, as published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope it will be useful, but WITHOUT
+ * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
+ * FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License for
+ * more details.
+ *
+ * You should have received a copy of the GNU General Public License along with
+ * this program; If not, see <http://www.gnu.org/licenses/>.
+ */
+
+#include <xen/ctype.h>
+#include <xen/domain.h>
+#include <xen/domain_page.h>
+#include <xen/event.h>
+#include <xen/init.h>
+#include <xen/ioreq.h>
+#include <xen/irq.h>
+#include <xen/lib.h>
+#include <xen/paging.h>
+#include <xen/sched.h>
+#include <xen/softirq.h>
+#include <xen/trace.h>
+#include <xen/vpci.h>
+
+#include <public/hvm/dm_op.h>
+#include <public/hvm/ioreq.h>
+#include <public/hvm/params.h>
+
+static void set_ioreq_server(struct domain *d, unsigned int id,
+                             struct hvm_ioreq_server *s)
+{
+    ASSERT(id < MAX_NR_IOREQ_SERVERS);
+    ASSERT(!s || !d->arch.hvm.ioreq_server.server[id]);
+
+    d->arch.hvm.ioreq_server.server[id] = s;
+}
+
+/*
+ * Iterate over all possible ioreq servers.
+ *
+ * NOTE: The iteration is backwards such that more recently created
+ *       ioreq servers are favoured in hvm_select_ioreq_server().
+ *       This is a semantic that previously existed when ioreq servers
+ *       were held in a linked list.
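+ *
+ *       Note that the id is pre-decremented inside the GET_IOREQ_SERVER()
+ *       invocation below, so the loop visits ids MAX_NR_IOREQ_SERVERS - 1
+ *       down to 0, skipping empty slots.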
+ */
+#define FOR_EACH_IOREQ_SERVER(d, id, s) \
+    for ( (id) = MAX_NR_IOREQ_SERVERS; (id) != 0; ) \
+        if ( !(s = GET_IOREQ_SERVER(d, --(id))) ) \
+            continue; \
+        else
+
+static ioreq_t *get_ioreq(struct hvm_ioreq_server *s, struct vcpu *v)
+{
+    shared_iopage_t *p = s->ioreq.va;
+
+    ASSERT((v == current) || !vcpu_runnable(v));
+    ASSERT(p != NULL);
+
+    return &p->vcpu_ioreq[v->vcpu_id];
+}
+
+static struct hvm_ioreq_vcpu *get_pending_vcpu(const struct vcpu *v,
+                                               struct hvm_ioreq_server **srvp)
+{
+    struct domain *d = v->domain;
+    struct hvm_ioreq_server *s;
+    unsigned int id;
+
+    FOR_EACH_IOREQ_SERVER(d, id, s)
+    {
+        struct hvm_ioreq_vcpu *sv;
+
+        list_for_each_entry ( sv,
+                              &s->ioreq_vcpu_list,
+                              list_entry )
+        {
+            if ( sv->vcpu == v && sv->pending )
+            {
+                if ( srvp )
+                    *srvp = s;
+                return sv;
+            }
+        }
+    }
+
+    return NULL;
+}
+
+bool hvm_io_pending(struct vcpu *v)
+{
+    return get_pending_vcpu(v, NULL);
+}
+
+static bool hvm_wait_for_io(struct hvm_ioreq_vcpu *sv, ioreq_t *p)
+{
+    unsigned int prev_state = STATE_IOREQ_NONE;
+    unsigned int state = p->state;
+    uint64_t data = ~0;
+
+    smp_rmb();
+
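+    /*
+     * An ioreq normally travels NONE -> READY (set by Xen in
+     * hvm_send_ioreq()) -> INPROCESS -> IORESP_READY (both set by the
+     * emulator) -> NONE (set below once the response is consumed). Any
+     * backwards transition indicates a buggy or malicious emulator.
+     */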
+    /*
+     * The only reason we should see this condition be false is when an
+     * emulator dying races with I/O being requested.
+     */
+    while ( likely(state != STATE_IOREQ_NONE) )
+    {
+        if ( unlikely(state < prev_state) )
+        {
+            gdprintk(XENLOG_ERR, "Weird HVM ioreq state transition %u -> %u\n",
+                     prev_state, state);
+            sv->pending = false;
+            domain_crash(sv->vcpu->domain);
+            return false; /* bail */
+        }
+
+        switch ( prev_state = state )
+        {
+        case STATE_IORESP_READY: /* IORESP_READY -> NONE */
+            p->state = STATE_IOREQ_NONE;
+            data = p->data;
+            break;
+
+        case STATE_IOREQ_READY:  /* IOREQ_{READY,INPROCESS} -> IORESP_READY */
+        case STATE_IOREQ_INPROCESS:
+            wait_on_xen_event_channel(sv->ioreq_evtchn,
+                                      ({ state = p->state;
+                                         smp_rmb();
+                                         state != prev_state; }));
+            continue;
+
+        default:
+            gdprintk(XENLOG_ERR, "Weird HVM iorequest state %u\n", state);
+            sv->pending = false;
+            domain_crash(sv->vcpu->domain);
+            return false; /* bail */
+        }
+
+        break;
+    }
+
+    p = &sv->vcpu->arch.hvm.hvm_io.io_req;
+    if ( hvm_ioreq_needs_completion(p) )
+        p->data = data;
+
+    sv->pending = false;
+
+    return true;
+}
+
+bool handle_hvm_io_completion(struct vcpu *v)
+{
+    struct domain *d = v->domain;
+    struct hvm_vcpu_io *vio = &v->arch.hvm.hvm_io;
+    struct hvm_ioreq_server *s;
+    struct hvm_ioreq_vcpu *sv;
+    enum hvm_io_completion io_completion;
+
+    if ( has_vpci(d) && vpci_process_pending(v) )
+    {
+        raise_softirq(SCHEDULE_SOFTIRQ);
+        return false;
+    }
+
+    sv = get_pending_vcpu(v, &s);
+    if ( sv && !hvm_wait_for_io(sv, get_ioreq(s, v)) )
+        return false;
+
+    vio->io_req.state = hvm_ioreq_needs_completion(&vio->io_req) ?
+        STATE_IORESP_READY : STATE_IOREQ_NONE;
+
+    msix_write_completion(v);
+    vcpu_end_shutdown_deferral(v);
+
+    io_completion = vio->io_completion;
+    vio->io_completion = HVMIO_no_completion;
+
+    switch ( io_completion )
+    {
+    case HVMIO_no_completion:
+        break;
+
+    case HVMIO_mmio_completion:
+        return handle_mmio();
+
+    case HVMIO_pio_completion:
+        return handle_pio(vio->io_req.addr, vio->io_req.size,
+                          vio->io_req.dir);
+
+    default:
+        return arch_handle_hvm_io_completion(io_completion);
+    }
+
+    return true;
+}
+
+static gfn_t hvm_alloc_legacy_ioreq_gfn(struct hvm_ioreq_server *s)
+{
+    struct domain *d = s->target;
+    unsigned int i;
+
+    BUILD_BUG_ON(HVM_PARAM_BUFIOREQ_PFN != HVM_PARAM_IOREQ_PFN + 1);
+
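+    /*
+     * The 'legacy' GFNs are the two magic pages advertised via the
+     * HVM_PARAM_IOREQ_PFN and HVM_PARAM_BUFIOREQ_PFN params; a set bit
+     * in legacy_mask means the corresponding page is still available.
+     */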
+    for ( i = HVM_PARAM_IOREQ_PFN; i <= HVM_PARAM_BUFIOREQ_PFN; i++ )
+    {
+        if ( !test_and_clear_bit(i, &d->arch.hvm.ioreq_gfn.legacy_mask) )
+            return _gfn(d->arch.hvm.params[i]);
+    }
+
+    return INVALID_GFN;
+}
+
+static gfn_t hvm_alloc_ioreq_gfn(struct hvm_ioreq_server *s)
+{
+    struct domain *d = s->target;
+    unsigned int i;
+
+    for ( i = 0; i < sizeof(d->arch.hvm.ioreq_gfn.mask) * 8; i++ )
+    {
+        if ( test_and_clear_bit(i, &d->arch.hvm.ioreq_gfn.mask) )
+            return _gfn(d->arch.hvm.ioreq_gfn.base + i);
+    }
+
+    /*
+     * If we are out of 'normal' GFNs then we may still have a 'legacy'
+     * GFN available.
+     */
+    return hvm_alloc_legacy_ioreq_gfn(s);
+}
+
+static bool hvm_free_legacy_ioreq_gfn(struct hvm_ioreq_server *s,
+                                      gfn_t gfn)
+{
+    struct domain *d = s->target;
+    unsigned int i;
+
+    for ( i = HVM_PARAM_IOREQ_PFN; i <= HVM_PARAM_BUFIOREQ_PFN; i++ )
+    {
+        if ( gfn_eq(gfn, _gfn(d->arch.hvm.params[i])) )
+             break;
+    }
+    if ( i > HVM_PARAM_BUFIOREQ_PFN )
+        return false;
+
+    set_bit(i, &d->arch.hvm.ioreq_gfn.legacy_mask);
+    return true;
+}
+
+static void hvm_free_ioreq_gfn(struct hvm_ioreq_server *s, gfn_t gfn)
+{
+    struct domain *d = s->target;
+    unsigned int i = gfn_x(gfn) - d->arch.hvm.ioreq_gfn.base;
+
+    ASSERT(!gfn_eq(gfn, INVALID_GFN));
+
+    if ( !hvm_free_legacy_ioreq_gfn(s, gfn) )
+    {
+        ASSERT(i < sizeof(d->arch.hvm.ioreq_gfn.mask) * 8);
+        set_bit(i, &d->arch.hvm.ioreq_gfn.mask);
+    }
+}
+
+static void hvm_unmap_ioreq_gfn(struct hvm_ioreq_server *s, bool buf)
+{
+    struct hvm_ioreq_page *iorp = buf ? &s->bufioreq : &s->ioreq;
+
+    if ( gfn_eq(iorp->gfn, INVALID_GFN) )
+        return;
+
+    destroy_ring_for_helper(&iorp->va, iorp->page);
+    iorp->page = NULL;
+
+    hvm_free_ioreq_gfn(s, iorp->gfn);
+    iorp->gfn = INVALID_GFN;
+}
+
+static int hvm_map_ioreq_gfn(struct hvm_ioreq_server *s, bool buf)
+{
+    struct domain *d = s->target;
+    struct hvm_ioreq_page *iorp = buf ? &s->bufioreq : &s->ioreq;
+    int rc;
+
+    if ( iorp->page )
+    {
+        /*
+         * If a page has already been allocated (which will happen on
+         * demand if hvm_get_ioreq_server_frame() is called), then
+         * mapping a guest frame is not permitted.
+         */
+        if ( gfn_eq(iorp->gfn, INVALID_GFN) )
+            return -EPERM;
+
+        return 0;
+    }
+
+    if ( d->is_dying )
+        return -EINVAL;
+
+    iorp->gfn = hvm_alloc_ioreq_gfn(s);
+
+    if ( gfn_eq(iorp->gfn, INVALID_GFN) )
+        return -ENOMEM;
+
+    rc = prepare_ring_for_helper(d, gfn_x(iorp->gfn), &iorp->page,
+                                 &iorp->va);
+
+    if ( rc )
+        hvm_unmap_ioreq_gfn(s, buf);
+
+    return rc;
+}
+
+static int hvm_alloc_ioreq_mfn(struct hvm_ioreq_server *s, bool buf)
+{
+    struct hvm_ioreq_page *iorp = buf ? &s->bufioreq : &s->ioreq;
+    struct page_info *page;
+
+    if ( iorp->page )
+    {
+        /*
+         * If a guest frame has already been mapped (which may happen
+         * on demand if hvm_get_ioreq_server_info() is called), then
+         * allocating a page is not permitted.
+         */
+        if ( !gfn_eq(iorp->gfn, INVALID_GFN) )
+            return -EPERM;
+
+        return 0;
+    }
+
+    page = alloc_domheap_page(s->target, MEMF_no_refcount);
+
+    if ( !page )
+        return -ENOMEM;
+
+    if ( !get_page_and_type(page, s->target, PGT_writable_page) )
+    {
+        /*
+         * The domain can't possibly know about this page yet, so failure
+         * here is a clear indication of something fishy going on.
+         */
+        domain_crash(s->emulator);
+        return -ENODATA;
+    }
+
+    iorp->va = __map_domain_page_global(page);
+    if ( !iorp->va )
+        goto fail;
+
+    iorp->page = page;
+    clear_page(iorp->va);
+    return 0;
+
+ fail:
+    put_page_alloc_ref(page);
+    put_page_and_type(page);
+
+    return -ENOMEM;
+}
+
+static void hvm_free_ioreq_mfn(struct hvm_ioreq_server *s, bool buf)
+{
+    struct hvm_ioreq_page *iorp = buf ? &s->bufioreq : &s->ioreq;
+    struct page_info *page = iorp->page;
+
+    if ( !page )
+        return;
+
+    iorp->page = NULL;
+
+    unmap_domain_page_global(iorp->va);
+    iorp->va = NULL;
+
+    put_page_alloc_ref(page);
+    put_page_and_type(page);
+}
+
+bool is_ioreq_server_page(struct domain *d, const struct page_info *page)
+{
+    const struct hvm_ioreq_server *s;
+    unsigned int id;
+    bool found = false;
+
+    spin_lock_recursive(&d->arch.hvm.ioreq_server.lock);
+
+    FOR_EACH_IOREQ_SERVER(d, id, s)
+    {
+        if ( (s->ioreq.page == page) || (s->bufioreq.page == page) )
+        {
+            found = true;
+            break;
+        }
+    }
+
+    spin_unlock_recursive(&d->arch.hvm.ioreq_server.lock);
+
+    return found;
+}
+
+static void hvm_remove_ioreq_gfn(struct hvm_ioreq_server *s, bool buf)
+
+{
+    struct domain *d = s->target;
+    struct hvm_ioreq_page *iorp = buf ? &s->bufioreq : &s->ioreq;
+
+    if ( gfn_eq(iorp->gfn, INVALID_GFN) )
+        return;
+
+    if ( guest_physmap_remove_page(d, iorp->gfn,
+                                   page_to_mfn(iorp->page), 0) )
+        domain_crash(d);
+    clear_page(iorp->va);
+}
+
+static int hvm_add_ioreq_gfn(struct hvm_ioreq_server *s, bool buf)
+{
+    struct domain *d = s->target;
+    struct hvm_ioreq_page *iorp = buf ? &s->bufioreq : &s->ioreq;
+    int rc;
+
+    if ( gfn_eq(iorp->gfn, INVALID_GFN) )
+        return 0;
+
+    clear_page(iorp->va);
+
+    rc = guest_physmap_add_page(d, iorp->gfn,
+                                page_to_mfn(iorp->page), 0);
+    if ( rc == 0 )
+        paging_mark_pfn_dirty(d, _pfn(gfn_x(iorp->gfn)));
+
+    return rc;
+}
+
+static void hvm_update_ioreq_evtchn(struct hvm_ioreq_server *s,
+                                    struct hvm_ioreq_vcpu *sv)
+{
+    ASSERT(spin_is_locked(&s->lock));
+
+    if ( s->ioreq.va != NULL )
+    {
+        ioreq_t *p = get_ioreq(s, sv->vcpu);
+
+        p->vp_eport = sv->ioreq_evtchn;
+    }
+}
+
+#define HANDLE_BUFIOREQ(s) \
+    ((s)->bufioreq_handling != HVM_IOREQSRV_BUFIOREQ_OFF)
+
+static int hvm_ioreq_server_add_vcpu(struct hvm_ioreq_server *s,
+                                     struct vcpu *v)
+{
+    struct hvm_ioreq_vcpu *sv;
+    int rc;
+
+    sv = xzalloc(struct hvm_ioreq_vcpu);
+
+    rc = -ENOMEM;
+    if ( !sv )
+        goto fail1;
+
+    spin_lock(&s->lock);
+
+    rc = alloc_unbound_xen_event_channel(v->domain, v->vcpu_id,
+                                         s->emulator->domain_id, NULL);
+    if ( rc < 0 )
+        goto fail2;
+
+    sv->ioreq_evtchn = rc;
+
+    if ( v->vcpu_id == 0 && HANDLE_BUFIOREQ(s) )
+    {
+        rc = alloc_unbound_xen_event_channel(v->domain, 0,
+                                             s->emulator->domain_id, NULL);
+        if ( rc < 0 )
+            goto fail3;
+
+        s->bufioreq_evtchn = rc;
+    }
+
+    sv->vcpu = v;
+
+    list_add(&sv->list_entry, &s->ioreq_vcpu_list);
+
+    if ( s->enabled )
+        hvm_update_ioreq_evtchn(s, sv);
+
+    spin_unlock(&s->lock);
+    return 0;
+
+ fail3:
+    free_xen_event_channel(v->domain, sv->ioreq_evtchn);
+
+ fail2:
+    spin_unlock(&s->lock);
+    xfree(sv);
+
+ fail1:
+    return rc;
+}
+
+static void hvm_ioreq_server_remove_vcpu(struct hvm_ioreq_server *s,
+                                         struct vcpu *v)
+{
+    struct hvm_ioreq_vcpu *sv;
+
+    spin_lock(&s->lock);
+
+    list_for_each_entry ( sv,
+                          &s->ioreq_vcpu_list,
+                          list_entry )
+    {
+        if ( sv->vcpu != v )
+            continue;
+
+        list_del(&sv->list_entry);
+
+        if ( v->vcpu_id == 0 && HANDLE_BUFIOREQ(s) )
+            free_xen_event_channel(v->domain, s->bufioreq_evtchn);
+
+        free_xen_event_channel(v->domain, sv->ioreq_evtchn);
+
+        xfree(sv);
+        break;
+    }
+
+    spin_unlock(&s->lock);
+}
+
+static void hvm_ioreq_server_remove_all_vcpus(struct hvm_ioreq_server *s)
+{
+    struct hvm_ioreq_vcpu *sv, *next;
+
+    spin_lock(&s->lock);
+
+    list_for_each_entry_safe ( sv,
+                               next,
+                               &s->ioreq_vcpu_list,
+                               list_entry )
+    {
+        struct vcpu *v = sv->vcpu;
+
+        list_del(&sv->list_entry);
+
+        if ( v->vcpu_id == 0 && HANDLE_BUFIOREQ(s) )
+            free_xen_event_channel(v->domain, s->bufioreq_evtchn);
+
+        free_xen_event_channel(v->domain, sv->ioreq_evtchn);
+
+        xfree(sv);
+    }
+
+    spin_unlock(&s->lock);
+}
+
+static int hvm_ioreq_server_map_pages(struct hvm_ioreq_server *s)
+{
+    int rc;
+
+    rc = hvm_map_ioreq_gfn(s, false);
+
+    if ( !rc && HANDLE_BUFIOREQ(s) )
+        rc = hvm_map_ioreq_gfn(s, true);
+
+    if ( rc )
+        hvm_unmap_ioreq_gfn(s, false);
+
+    return rc;
+}
+
+static void hvm_ioreq_server_unmap_pages(struct hvm_ioreq_server *s)
+{
+    hvm_unmap_ioreq_gfn(s, true);
+    hvm_unmap_ioreq_gfn(s, false);
+}
+
+static int hvm_ioreq_server_alloc_pages(struct hvm_ioreq_server *s)
+{
+    int rc;
+
+    rc = hvm_alloc_ioreq_mfn(s, false);
+
+    if ( !rc && (s->bufioreq_handling != HVM_IOREQSRV_BUFIOREQ_OFF) )
+        rc = hvm_alloc_ioreq_mfn(s, true);
+
+    if ( rc )
+        hvm_free_ioreq_mfn(s, false);
+
+    return rc;
+}
+
+static void hvm_ioreq_server_free_pages(struct hvm_ioreq_server *s)
+{
+    hvm_free_ioreq_mfn(s, true);
+    hvm_free_ioreq_mfn(s, false);
+}
+
+static void hvm_ioreq_server_free_rangesets(struct hvm_ioreq_server *s)
+{
+    unsigned int i;
+
+    for ( i = 0; i < NR_IO_RANGE_TYPES; i++ )
+        rangeset_destroy(s->range[i]);
+}
+
+static int hvm_ioreq_server_alloc_rangesets(struct hvm_ioreq_server *s,
+                                            ioservid_t id)
+{
+    unsigned int i;
+    int rc;
+
+    for ( i = 0; i < NR_IO_RANGE_TYPES; i++ )
+    {
+        char *name;
+
+        rc = asprintf(&name, "ioreq_server %d %s", id,
+                      (i == XEN_DMOP_IO_RANGE_PORT) ? "port" :
+                      (i == XEN_DMOP_IO_RANGE_MEMORY) ? "memory" :
+                      (i == XEN_DMOP_IO_RANGE_PCI) ? "pci" :
+                      "");
+        if ( rc )
+            goto fail;
+
+        s->range[i] = rangeset_new(s->target, name,
+                                   RANGESETF_prettyprint_hex);
+
+        xfree(name);
+
+        rc = -ENOMEM;
+        if ( !s->range[i] )
+            goto fail;
+
+        rangeset_limit(s->range[i], MAX_NR_IO_RANGES);
+    }
+
+    return 0;
+
+ fail:
+    hvm_ioreq_server_free_rangesets(s);
+
+    return rc;
+}
+
+static void hvm_ioreq_server_enable(struct hvm_ioreq_server *s)
+{
+    struct hvm_ioreq_vcpu *sv;
+
+    spin_lock(&s->lock);
+
+    if ( s->enabled )
+        goto done;
+
+    hvm_remove_ioreq_gfn(s, false);
+    hvm_remove_ioreq_gfn(s, true);
+
+    s->enabled = true;
+
+    list_for_each_entry ( sv,
+                          &s->ioreq_vcpu_list,
+                          list_entry )
+        hvm_update_ioreq_evtchn(s, sv);
+
+  done:
+    spin_unlock(&s->lock);
+}
+
+static void hvm_ioreq_server_disable(struct hvm_ioreq_server *s)
+{
+    spin_lock(&s->lock);
+
+    if ( !s->enabled )
+        goto done;
+
+    hvm_add_ioreq_gfn(s, true);
+    hvm_add_ioreq_gfn(s, false);
+
+    s->enabled = false;
+
+ done:
+    spin_unlock(&s->lock);
+}
+
+static int hvm_ioreq_server_init(struct hvm_ioreq_server *s,
+                                 struct domain *d, int bufioreq_handling,
+                                 ioservid_t id)
+{
+    struct domain *currd = current->domain;
+    struct vcpu *v;
+    int rc;
+
+    s->target = d;
+
+    get_knownalive_domain(currd);
+    s->emulator = currd;
+
+    spin_lock_init(&s->lock);
+    INIT_LIST_HEAD(&s->ioreq_vcpu_list);
+    spin_lock_init(&s->bufioreq_lock);
+
+    s->ioreq.gfn = INVALID_GFN;
+    s->bufioreq.gfn = INVALID_GFN;
+
+    rc = hvm_ioreq_server_alloc_rangesets(s, id);
+    if ( rc )
+        return rc;
+
+    s->bufioreq_handling = bufioreq_handling;
+
+    for_each_vcpu ( d, v )
+    {
+        rc = hvm_ioreq_server_add_vcpu(s, v);
+        if ( rc )
+            goto fail_add;
+    }
+
+    return 0;
+
+ fail_add:
+    hvm_ioreq_server_remove_all_vcpus(s);
+    hvm_ioreq_server_unmap_pages(s);
+
+    hvm_ioreq_server_free_rangesets(s);
+
+    put_domain(s->emulator);
+    return rc;
+}
+
+static void hvm_ioreq_server_deinit(struct hvm_ioreq_server *s)
+{
+    ASSERT(!s->enabled);
+    hvm_ioreq_server_remove_all_vcpus(s);
+
+    /*
+     * NOTE: It is safe to call both hvm_ioreq_server_unmap_pages() and
+     *       hvm_ioreq_server_free_pages() in that order.
+     *       This is because the former will do nothing if the pages
+     *       are not mapped, leaving the page to be freed by the latter.
+     *       However if the pages are mapped then the former will set
+     *       the page_info pointer to NULL, meaning the latter will do
+     *       nothing.
+     */
+    hvm_ioreq_server_unmap_pages(s);
+    hvm_ioreq_server_free_pages(s);
+
+    hvm_ioreq_server_free_rangesets(s);
+
+    put_domain(s->emulator);
+}
+
+int hvm_create_ioreq_server(struct domain *d, int bufioreq_handling,
+                            ioservid_t *id)
+{
+    struct hvm_ioreq_server *s;
+    unsigned int i;
+    int rc;
+
+    if ( bufioreq_handling > HVM_IOREQSRV_BUFIOREQ_ATOMIC )
+        return -EINVAL;
+
+    s = xzalloc(struct hvm_ioreq_server);
+    if ( !s )
+        return -ENOMEM;
+
+    domain_pause(d);
+    spin_lock_recursive(&d->arch.hvm.ioreq_server.lock);
+
+    for ( i = 0; i < MAX_NR_IOREQ_SERVERS; i++ )
+    {
+        if ( !GET_IOREQ_SERVER(d, i) )
+            break;
+    }
+
+    rc = -ENOSPC;
+    if ( i >= MAX_NR_IOREQ_SERVERS )
+        goto fail;
+
+    /*
+     * It is safe to call set_ioreq_server() prior to
+     * hvm_ioreq_server_init() since the target domain is paused.
+     */
+    set_ioreq_server(d, i, s);
+
+    rc = hvm_ioreq_server_init(s, d, bufioreq_handling, i);
+    if ( rc )
+    {
+        set_ioreq_server(d, i, NULL);
+        goto fail;
+    }
+
+    if ( id )
+        *id = i;
+
+    spin_unlock_recursive(&d->arch.hvm.ioreq_server.lock);
+    domain_unpause(d);
+
+    return 0;
+
+ fail:
+    spin_unlock_recursive(&d->arch.hvm.ioreq_server.lock);
+    domain_unpause(d);
+
+    xfree(s);
+    return rc;
+}
+
+int hvm_destroy_ioreq_server(struct domain *d, ioservid_t id)
+{
+    struct hvm_ioreq_server *s;
+    int rc;
+
+    spin_lock_recursive(&d->arch.hvm.ioreq_server.lock);
+
+    s = get_ioreq_server(d, id);
+
+    rc = -ENOENT;
+    if ( !s )
+        goto out;
+
+    rc = -EPERM;
+    if ( s->emulator != current->domain )
+        goto out;
+
+    domain_pause(d);
+
+    arch_hvm_destroy_ioreq_server(s);
+
+    hvm_ioreq_server_disable(s);
+
+    /*
+     * It is safe to call hvm_ioreq_server_deinit() prior to
+     * set_ioreq_server() since the target domain is paused.
+     */
+    hvm_ioreq_server_deinit(s);
+    set_ioreq_server(d, id, NULL);
+
+    domain_unpause(d);
+
+    xfree(s);
+
+    rc = 0;
+
+ out:
+    spin_unlock_recursive(&d->arch.hvm.ioreq_server.lock);
+
+    return rc;
+}
+
+int hvm_get_ioreq_server_info(struct domain *d, ioservid_t id,
+                              unsigned long *ioreq_gfn,
+                              unsigned long *bufioreq_gfn,
+                              evtchn_port_t *bufioreq_port)
+{
+    struct hvm_ioreq_server *s;
+    int rc;
+
+    spin_lock_recursive(&d->arch.hvm.ioreq_server.lock);
+
+    s = get_ioreq_server(d, id);
+
+    rc = -ENOENT;
+    if ( !s )
+        goto out;
+
+    rc = -EPERM;
+    if ( s->emulator != current->domain )
+        goto out;
+
+    if ( ioreq_gfn || bufioreq_gfn )
+    {
+        rc = hvm_ioreq_server_map_pages(s);
+        if ( rc )
+            goto out;
+    }
+
+    if ( ioreq_gfn )
+        *ioreq_gfn = gfn_x(s->ioreq.gfn);
+
+    if ( HANDLE_BUFIOREQ(s) )
+    {
+        if ( bufioreq_gfn )
+            *bufioreq_gfn = gfn_x(s->bufioreq.gfn);
+
+        if ( bufioreq_port )
+            *bufioreq_port = s->bufioreq_evtchn;
+    }
+
+    rc = 0;
+
+ out:
+    spin_unlock_recursive(&d->arch.hvm.ioreq_server.lock);
+
+    return rc;
+}
+
+int hvm_get_ioreq_server_frame(struct domain *d, ioservid_t id,
+                               unsigned long idx, mfn_t *mfn)
+{
+    struct hvm_ioreq_server *s;
+    int rc;
+
+    ASSERT(is_hvm_domain(d));
+
+    spin_lock_recursive(&d->arch.hvm.ioreq_server.lock);
+
+    s = get_ioreq_server(d, id);
+
+    rc = -ENOENT;
+    if ( !s )
+        goto out;
+
+    rc = -EPERM;
+    if ( s->emulator != current->domain )
+        goto out;
+
+    rc = hvm_ioreq_server_alloc_pages(s);
+    if ( rc )
+        goto out;
+
+    switch ( idx )
+    {
+    case XENMEM_resource_ioreq_server_frame_bufioreq:
+        rc = -ENOENT;
+        if ( !HANDLE_BUFIOREQ(s) )
+            goto out;
+
+        *mfn = page_to_mfn(s->bufioreq.page);
+        rc = 0;
+        break;
+
+    case XENMEM_resource_ioreq_server_frame_ioreq(0):
+        *mfn = page_to_mfn(s->ioreq.page);
+        rc = 0;
+        break;
+
+    default:
+        rc = -EINVAL;
+        break;
+    }
+
+ out:
+    spin_unlock_recursive(&d->arch.hvm.ioreq_server.lock);
+
+    return rc;
+}
+
+int hvm_map_io_range_to_ioreq_server(struct domain *d, ioservid_t id,
+                                     uint32_t type, uint64_t start,
+                                     uint64_t end)
+{
+    struct hvm_ioreq_server *s;
+    struct rangeset *r;
+    int rc;
+
+    if ( start > end )
+        return -EINVAL;
+
+    spin_lock_recursive(&d->arch.hvm.ioreq_server.lock);
+
+    s = get_ioreq_server(d, id);
+
+    rc = -ENOENT;
+    if ( !s )
+        goto out;
+
+    rc = -EPERM;
+    if ( s->emulator != current->domain )
+        goto out;
+
+    switch ( type )
+    {
+    case XEN_DMOP_IO_RANGE_PORT:
+    case XEN_DMOP_IO_RANGE_MEMORY:
+    case XEN_DMOP_IO_RANGE_PCI:
+        r = s->range[type];
+        break;
+
+    default:
+        r = NULL;
+        break;
+    }
+
+    rc = -EINVAL;
+    if ( !r )
+        goto out;
+
+    rc = -EEXIST;
+    if ( rangeset_overlaps_range(r, start, end) )
+        goto out;
+
+    rc = rangeset_add_range(r, start, end);
+
+ out:
+    spin_unlock_recursive(&d->arch.hvm.ioreq_server.lock);
+
+    return rc;
+}
+
+int hvm_unmap_io_range_from_ioreq_server(struct domain *d, ioservid_t id,
+                                         uint32_t type, uint64_t start,
+                                         uint64_t end)
+{
+    struct hvm_ioreq_server *s;
+    struct rangeset *r;
+    int rc;
+
+    if ( start > end )
+        return -EINVAL;
+
+    spin_lock_recursive(&d->arch.hvm.ioreq_server.lock);
+
+    s = get_ioreq_server(d, id);
+
+    rc = -ENOENT;
+    if ( !s )
+        goto out;
+
+    rc = -EPERM;
+    if ( s->emulator != current->domain )
+        goto out;
+
+    switch ( type )
+    {
+    case XEN_DMOP_IO_RANGE_PORT:
+    case XEN_DMOP_IO_RANGE_MEMORY:
+    case XEN_DMOP_IO_RANGE_PCI:
+        r = s->range[type];
+        break;
+
+    default:
+        r = NULL;
+        break;
+    }
+
+    rc = -EINVAL;
+    if ( !r )
+        goto out;
+
+    rc = -ENOENT;
+    if ( !rangeset_contains_range(r, start, end) )
+        goto out;
+
+    rc = rangeset_remove_range(r, start, end);
+
+ out:
+    spin_unlock_recursive(&d->arch.hvm.ioreq_server.lock);
+
+    return rc;
+}
+
+int hvm_set_ioreq_server_state(struct domain *d, ioservid_t id,
+                               bool enabled)
+{
+    struct hvm_ioreq_server *s;
+    int rc;
+
+    spin_lock_recursive(&d->arch.hvm.ioreq_server.lock);
+
+    s = get_ioreq_server(d, id);
+
+    rc = -ENOENT;
+    if ( !s )
+        goto out;
+
+    rc = -EPERM;
+    if ( s->emulator != current->domain )
+        goto out;
+
+    domain_pause(d);
+
+    if ( enabled )
+        hvm_ioreq_server_enable(s);
+    else
+        hvm_ioreq_server_disable(s);
+
+    domain_unpause(d);
+
+    rc = 0;
+
+ out:
+    spin_unlock_recursive(&d->arch.hvm.ioreq_server.lock);
+    return rc;
+}
+
+int hvm_all_ioreq_servers_add_vcpu(struct domain *d, struct vcpu *v)
+{
+    struct hvm_ioreq_server *s;
+    unsigned int id;
+    int rc;
+
+    spin_lock_recursive(&d->arch.hvm.ioreq_server.lock);
+
+    FOR_EACH_IOREQ_SERVER(d, id, s)
+    {
+        rc = hvm_ioreq_server_add_vcpu(s, v);
+        if ( rc )
+            goto fail;
+    }
+
+    spin_unlock_recursive(&d->arch.hvm.ioreq_server.lock);
+
+    return 0;
+
+ fail:
+    while ( ++id != MAX_NR_IOREQ_SERVERS )
+    {
+        s = GET_IOREQ_SERVER(d, id);
+
+        if ( !s )
+            continue;
+
+        hvm_ioreq_server_remove_vcpu(s, v);
+    }
+
+    spin_unlock_recursive(&d->arch.hvm.ioreq_server.lock);
+
+    return rc;
+}
+
+void hvm_all_ioreq_servers_remove_vcpu(struct domain *d, struct vcpu *v)
+{
+    struct hvm_ioreq_server *s;
+    unsigned int id;
+
+    spin_lock_recursive(&d->arch.hvm.ioreq_server.lock);
+
+    FOR_EACH_IOREQ_SERVER(d, id, s)
+        hvm_ioreq_server_remove_vcpu(s, v);
+
+    spin_unlock_recursive(&d->arch.hvm.ioreq_server.lock);
+}
+
+void hvm_destroy_all_ioreq_servers(struct domain *d)
+{
+    struct hvm_ioreq_server *s;
+    unsigned int id;
+
+    arch_hvm_ioreq_destroy(d);
+
+    spin_lock_recursive(&d->arch.hvm.ioreq_server.lock);
+
+    /* No need to domain_pause() as the domain is being torn down */
+
+    FOR_EACH_IOREQ_SERVER(d, id, s)
+    {
+        hvm_ioreq_server_disable(s);
+
+        /*
+         * It is safe to call hvm_ioreq_server_deinit() prior to
+         * set_ioreq_server() since the target domain is being destroyed.
+         */
+        hvm_ioreq_server_deinit(s);
+        set_ioreq_server(d, id, NULL);
+
+        xfree(s);
+    }
+
+    spin_unlock_recursive(&d->arch.hvm.ioreq_server.lock);
+}
+
+struct hvm_ioreq_server *hvm_select_ioreq_server(struct domain *d,
+                                                 ioreq_t *p)
+{
+    struct hvm_ioreq_server *s;
+    uint8_t type;
+    uint64_t addr;
+    unsigned int id;
+
+    if ( hvm_get_ioreq_server_range_type(d, p, &type, &addr) )
+        return NULL;
+
+    FOR_EACH_IOREQ_SERVER(d, id, s)
+    {
+        struct rangeset *r;
+
+        if ( !s->enabled )
+            continue;
+
+        r = s->range[type];
+
+        switch ( type )
+        {
+            unsigned long start, end;
+
+        case XEN_DMOP_IO_RANGE_PORT:
+            start = addr;
+            end = start + p->size - 1;
+            if ( rangeset_contains_range(r, start, end) )
+                return s;
+
+            break;
+
+        case XEN_DMOP_IO_RANGE_MEMORY:
+            start = hvm_mmio_first_byte(p);
+            end = hvm_mmio_last_byte(p);
+
+            if ( rangeset_contains_range(r, start, end) )
+                return s;
+
+            break;
+
+        case XEN_DMOP_IO_RANGE_PCI:
+            if ( rangeset_contains_singleton(r, addr >> 32) )
+            {
+                p->type = IOREQ_TYPE_PCI_CONFIG;
+                p->addr = addr;
+                return s;
+            }
+
+            break;
+        }
+    }
+
+    return NULL;
+}
+
+static int hvm_send_buffered_ioreq(struct hvm_ioreq_server *s, ioreq_t *p)
+{
+    struct domain *d = current->domain;
+    struct hvm_ioreq_page *iorp;
+    buffered_iopage_t *pg;
+    buf_ioreq_t bp = { .data = p->data,
+                       .addr = p->addr,
+                       .type = p->type,
+                       .dir = p->dir };
+    /* Timeoffset sends 64b data, but no address. Use two consecutive slots. */
+    int qw = 0;
+
+    /* Ensure buffered_iopage fits in a page */
+    BUILD_BUG_ON(sizeof(buffered_iopage_t) > PAGE_SIZE);
+
+    iorp = &s->bufioreq;
+    pg = iorp->va;
+
+    if ( !pg )
+        return IOREQ_IO_UNHANDLED;
+
+    /*
+     * Return 0 for the cases we can't deal with:
+     *  - 'addr' is only a 20-bit field, so we cannot address beyond 1MB
+     *  - we cannot buffer accesses to guest memory buffers, as the guest
+     *    may expect the memory buffer to be synchronously accessed
+     *  - the count field is usually used with data_is_ptr and since we don't
+     *    support data_is_ptr we do not waste space for the count field either
+     */
+    if ( (p->addr > 0xffffful) || p->data_is_ptr || (p->count != 1) )
+        return 0;
+
+    switch ( p->size )
+    {
+    case 1:
+        bp.size = 0;
+        break;
+    case 2:
+        bp.size = 1;
+        break;
+    case 4:
+        bp.size = 2;
+        break;
+    case 8:
+        bp.size = 3;
+        qw = 1;
+        break;
+    default:
+        gdprintk(XENLOG_WARNING, "unexpected ioreq size: %u\n", p->size);
+        return IOREQ_IO_UNHANDLED;
+    }
+
+    spin_lock(&s->bufioreq_lock);
+
+    if ( (pg->ptrs.write_pointer - pg->ptrs.read_pointer) >=
+         (IOREQ_BUFFER_SLOT_NUM - qw) )
+    {
+        /* The queue is full: send the iopacket through the normal path. */
+        spin_unlock(&s->bufioreq_lock);
+        return IOREQ_IO_UNHANDLED;
+    }
+
+    pg->buf_ioreq[pg->ptrs.write_pointer % IOREQ_BUFFER_SLOT_NUM] = bp;
+
+    if ( qw )
+    {
+        bp.data = p->data >> 32;
+        pg->buf_ioreq[(pg->ptrs.write_pointer+1) % IOREQ_BUFFER_SLOT_NUM] = bp;
+    }
+
+    /* Make the ioreq_t visible /before/ write_pointer. */
+    smp_wmb();
+    pg->ptrs.write_pointer += qw ? 2 : 1;
+
+    /* Canonicalize read/write pointers to prevent their overflow. */
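+    /*
+     * Note: 'qw' is reused below as a plain retry counter, bounding the
+     * number of cmpxchg attempts so that an emulator concurrently
+     * updating the ring pointers cannot keep Xen spinning here.
+     */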
+    while ( (s->bufioreq_handling == HVM_IOREQSRV_BUFIOREQ_ATOMIC) &&
+            qw++ < IOREQ_BUFFER_SLOT_NUM &&
+            pg->ptrs.read_pointer >= IOREQ_BUFFER_SLOT_NUM )
+    {
+        union bufioreq_pointers old = pg->ptrs, new;
+        unsigned int n = old.read_pointer / IOREQ_BUFFER_SLOT_NUM;
+
+        new.read_pointer = old.read_pointer - n * IOREQ_BUFFER_SLOT_NUM;
+        new.write_pointer = old.write_pointer - n * IOREQ_BUFFER_SLOT_NUM;
+        cmpxchg(&pg->ptrs.full, old.full, new.full);
+    }
+
+    notify_via_xen_event_channel(d, s->bufioreq_evtchn);
+    spin_unlock(&s->bufioreq_lock);
+
+    return IOREQ_IO_HANDLED;
+}
+
+int hvm_send_ioreq(struct hvm_ioreq_server *s, ioreq_t *proto_p,
+                   bool buffered)
+{
+    struct vcpu *curr = current;
+    struct domain *d = curr->domain;
+    struct hvm_ioreq_vcpu *sv;
+
+    ASSERT(s);
+
+    if ( buffered )
+        return hvm_send_buffered_ioreq(s, proto_p);
+
+    if ( unlikely(!vcpu_start_shutdown_deferral(curr)) )
+        return IOREQ_IO_RETRY;
+
+    list_for_each_entry ( sv,
+                          &s->ioreq_vcpu_list,
+                          list_entry )
+    {
+        if ( sv->vcpu == curr )
+        {
+            evtchn_port_t port = sv->ioreq_evtchn;
+            ioreq_t *p = get_ioreq(s, curr);
+
+            if ( unlikely(p->state != STATE_IOREQ_NONE) )
+            {
+                gprintk(XENLOG_ERR, "device model set bad IO state %d\n",
+                        p->state);
+                break;
+            }
+
+            if ( unlikely(p->vp_eport != port) )
+            {
+                gprintk(XENLOG_ERR, "device model set bad event channel %d\n",
+                        p->vp_eport);
+                break;
+            }
+
+            proto_p->state = STATE_IOREQ_NONE;
+            proto_p->vp_eport = port;
+            *p = *proto_p;
+
+            prepare_wait_on_xen_event_channel(port);
+
+            /*
+             * Following happens /after/ blocking and setting up ioreq
+             * contents. prepare_wait_on_xen_event_channel() is an implicit
+             * barrier.
+             */
+            p->state = STATE_IOREQ_READY;
+            notify_via_xen_event_channel(d, port);
+
+            sv->pending = true;
+            return IOREQ_IO_RETRY;
+        }
+    }
+
+    return IOREQ_IO_UNHANDLED;
+}
+
+unsigned int hvm_broadcast_ioreq(ioreq_t *p, bool buffered)
+{
+    struct domain *d = current->domain;
+    struct hvm_ioreq_server *s;
+    unsigned int id, failed = 0;
+
+    FOR_EACH_IOREQ_SERVER(d, id, s)
+    {
+        if ( !s->enabled )
+            continue;
+
+        if ( hvm_send_ioreq(s, p, buffered) == IOREQ_IO_UNHANDLED )
+            failed++;
+    }
+
+    return failed;
+}
+
+void hvm_ioreq_init(struct domain *d)
+{
+    spin_lock_init(&d->arch.hvm.ioreq_server.lock);
+
+    arch_hvm_ioreq_init(d);
+}
+
+/*
+ * Local variables:
+ * mode: C
+ * c-file-style: "BSD"
+ * c-basic-offset: 4
+ * tab-width: 4
+ * indent-tabs-mode: nil
+ * End:
+ */
diff --git a/xen/include/asm-x86/hvm/ioreq.h b/xen/include/asm-x86/hvm/ioreq.h
index 151b92b..dec1e71 100644
--- a/xen/include/asm-x86/hvm/ioreq.h
+++ b/xen/include/asm-x86/hvm/ioreq.h
@@ -19,41 +19,12 @@
 #ifndef __ASM_X86_HVM_IOREQ_H__
 #define __ASM_X86_HVM_IOREQ_H__
 
-bool hvm_io_pending(struct vcpu *v);
-bool handle_hvm_io_completion(struct vcpu *v);
-bool is_ioreq_server_page(struct domain *d, const struct page_info *page);
+#include <asm/hvm/emulate.h>
+#include <asm/hvm/hvm.h>
+#include <asm/hvm/vmx/vmx.h>
 
-int hvm_create_ioreq_server(struct domain *d, int bufioreq_handling,
-                            ioservid_t *id);
-int hvm_destroy_ioreq_server(struct domain *d, ioservid_t id);
-int hvm_get_ioreq_server_info(struct domain *d, ioservid_t id,
-                              unsigned long *ioreq_gfn,
-                              unsigned long *bufioreq_gfn,
-                              evtchn_port_t *bufioreq_port);
-int hvm_get_ioreq_server_frame(struct domain *d, ioservid_t id,
-                               unsigned long idx, mfn_t *mfn);
-int hvm_map_io_range_to_ioreq_server(struct domain *d, ioservid_t id,
-                                     uint32_t type, uint64_t start,
-                                     uint64_t end);
-int hvm_unmap_io_range_from_ioreq_server(struct domain *d, ioservid_t id,
-                                         uint32_t type, uint64_t start,
-                                         uint64_t end);
 int hvm_map_mem_type_to_ioreq_server(struct domain *d, ioservid_t id,
                                      uint32_t type, uint32_t flags);
-int hvm_set_ioreq_server_state(struct domain *d, ioservid_t id,
-                               bool enabled);
-
-int hvm_all_ioreq_servers_add_vcpu(struct domain *d, struct vcpu *v);
-void hvm_all_ioreq_servers_remove_vcpu(struct domain *d, struct vcpu *v);
-void hvm_destroy_all_ioreq_servers(struct domain *d);
-
-struct hvm_ioreq_server *hvm_select_ioreq_server(struct domain *d,
-                                                 ioreq_t *p);
-int hvm_send_ioreq(struct hvm_ioreq_server *s, ioreq_t *proto_p,
-                   bool buffered);
-unsigned int hvm_broadcast_ioreq(ioreq_t *p, bool buffered);
-
-void hvm_ioreq_init(struct domain *d);
 
 int arch_hvm_destroy_ioreq_server(struct hvm_ioreq_server *s);
 
diff --git a/xen/include/xen/ioreq.h b/xen/include/xen/ioreq.h
new file mode 100644
index 0000000..f846034
--- /dev/null
+++ b/xen/include/xen/ioreq.h
@@ -0,0 +1,82 @@
+/*
+ * ioreq.h: Hardware virtual machine assist interface definitions.
+ *
+ * Copyright (c) 2016 Citrix Systems Inc.
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms and conditions of the GNU General Public License,
+ * version 2, as published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope it will be useful, but WITHOUT
+ * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
+ * FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License for
+ * more details.
+ *
+ * You should have received a copy of the GNU General Public License along with
+ * this program; If not, see <http://www.gnu.org/licenses/>.
+ */
+
+#ifndef __IOREQ_H__
+#define __IOREQ_H__
+
+#include <xen/sched.h>
+
+#include <asm/hvm/ioreq.h>
+
+#define GET_IOREQ_SERVER(d, id) \
+    (d)->arch.hvm.ioreq_server.server[id]
+
+static inline struct hvm_ioreq_server *get_ioreq_server(const struct domain *d,
+                                                        unsigned int id)
+{
+    if ( id >= MAX_NR_IOREQ_SERVERS )
+        return NULL;
+
+    return GET_IOREQ_SERVER(d, id);
+}
+
+bool hvm_io_pending(struct vcpu *v);
+bool handle_hvm_io_completion(struct vcpu *v);
+bool is_ioreq_server_page(struct domain *d, const struct page_info *page);
+
+int hvm_create_ioreq_server(struct domain *d, int bufioreq_handling,
+                            ioservid_t *id);
+int hvm_destroy_ioreq_server(struct domain *d, ioservid_t id);
+int hvm_get_ioreq_server_info(struct domain *d, ioservid_t id,
+                              unsigned long *ioreq_gfn,
+                              unsigned long *bufioreq_gfn,
+                              evtchn_port_t *bufioreq_port);
+int hvm_get_ioreq_server_frame(struct domain *d, ioservid_t id,
+                               unsigned long idx, mfn_t *mfn);
+int hvm_map_io_range_to_ioreq_server(struct domain *d, ioservid_t id,
+                                     uint32_t type, uint64_t start,
+                                     uint64_t end);
+int hvm_unmap_io_range_from_ioreq_server(struct domain *d, ioservid_t id,
+                                         uint32_t type, uint64_t start,
+                                         uint64_t end);
+int hvm_set_ioreq_server_state(struct domain *d, ioservid_t id,
+                               bool enabled);
+
+int hvm_all_ioreq_servers_add_vcpu(struct domain *d, struct vcpu *v);
+void hvm_all_ioreq_servers_remove_vcpu(struct domain *d, struct vcpu *v);
+void hvm_destroy_all_ioreq_servers(struct domain *d);
+
+struct hvm_ioreq_server *hvm_select_ioreq_server(struct domain *d,
+                                                 ioreq_t *p);
+int hvm_send_ioreq(struct hvm_ioreq_server *s, ioreq_t *proto_p,
+                   bool buffered);
+unsigned int hvm_broadcast_ioreq(ioreq_t *p, bool buffered);
+
+void hvm_ioreq_init(struct domain *d);
+
+#endif /* __IOREQ_H__ */
+
+/*
+ * Local variables:
+ * mode: C
+ * c-file-style: "BSD"
+ * c-basic-offset: 4
+ * tab-width: 4
+ * indent-tabs-mode: nil
+ * End:
+ */
-- 
2.7.4




* [PATCH V1 03/16] xen/ioreq: Make x86's hvm_ioreq_needs_completion() common
  2020-09-10 20:21 [PATCH V1 00/16] IOREQ feature (+ virtio-mmio) on Arm Oleksandr Tyshchenko
  2020-09-10 20:21 ` [PATCH V1 01/16] x86/ioreq: Prepare IOREQ feature for making it common Oleksandr Tyshchenko
  2020-09-10 20:21 ` [PATCH V1 02/16] xen/ioreq: Make x86's IOREQ feature common Oleksandr Tyshchenko
@ 2020-09-10 20:21 ` Oleksandr Tyshchenko
  2020-09-14 14:59   ` Jan Beulich
  2020-09-10 20:21 ` [PATCH V1 04/16] xen/ioreq: Provide alias for the handle_mmio() Oleksandr Tyshchenko
                   ` (12 subsequent siblings)
  15 siblings, 1 reply; 111+ messages in thread
From: Oleksandr Tyshchenko @ 2020-09-10 20:21 UTC (permalink / raw)
  To: xen-devel
  Cc: Oleksandr Tyshchenko, Jun Nakajima, Kevin Tian, Jan Beulich,
	Andrew Cooper, Wei Liu, Roger Pau Monné,
	Paul Durrant, Julien Grall, Stefano Stabellini, Julien Grall

From: Oleksandr Tyshchenko <oleksandr_tyshchenko@epam.com>

IOREQ is now a common feature and this helper will be used on Arm
as-is. Move it to include/xen/ioreq.h.

Although PIO handling on Arm is not introduced with the current series
(it will be implemented when we add support for vPCI), PIOs do
technically exist on Arm (although they are accessed in the same way
as MMIO), so it is better not to diverge now.
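
To illustrate the semantics being moved (a sketch, not part of the
patch; field values per the public ioreq interface): a PIO read still
needs a completion step, because the data supplied by the emulator has
to be routed back into the guest, whereas a PIO write does not:

    ioreq_t p = {
        .state = STATE_IOREQ_READY,
        .type = IOREQ_TYPE_PIO,
        .dir = IOREQ_READ,     /* result must be handed back */
        .data_is_ptr = 0,
    };
    /* hvm_ioreq_needs_completion(&p) is true here... */

    p.dir = IOREQ_WRITE;       /* ...and false for a plain write */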

Signed-off-by: Julien Grall <julien.grall@arm.com>
Signed-off-by: Oleksandr Tyshchenko <oleksandr_tyshchenko@epam.com>

---
Please note, this is a split/cleanup/hardening of Julien's PoC:
"Add support for Guest IO forwarding to a device emulator"

Changes RFC -> V1:
   - new patch, was split from:
     "[RFC PATCH V1 01/12] hvm/ioreq: Make x86's IOREQ feature common"
---
---
 xen/arch/x86/hvm/vmx/realmode.c | 1 +
 xen/include/asm-x86/hvm/vcpu.h  | 7 -------
 xen/include/xen/ioreq.h         | 7 +++++++
 3 files changed, 8 insertions(+), 7 deletions(-)

diff --git a/xen/arch/x86/hvm/vmx/realmode.c b/xen/arch/x86/hvm/vmx/realmode.c
index bdbd9cb..292a7c3 100644
--- a/xen/arch/x86/hvm/vmx/realmode.c
+++ b/xen/arch/x86/hvm/vmx/realmode.c
@@ -10,6 +10,7 @@
  */
 
 #include <xen/init.h>
+#include <xen/ioreq.h>
 #include <xen/lib.h>
 #include <xen/sched.h>
 #include <xen/paging.h>
diff --git a/xen/include/asm-x86/hvm/vcpu.h b/xen/include/asm-x86/hvm/vcpu.h
index 5ccd075..6c1feda 100644
--- a/xen/include/asm-x86/hvm/vcpu.h
+++ b/xen/include/asm-x86/hvm/vcpu.h
@@ -91,13 +91,6 @@ struct hvm_vcpu_io {
     const struct g2m_ioport *g2m_ioport;
 };
 
-static inline bool hvm_ioreq_needs_completion(const ioreq_t *ioreq)
-{
-    return ioreq->state == STATE_IOREQ_READY &&
-           !ioreq->data_is_ptr &&
-           (ioreq->type != IOREQ_TYPE_PIO || ioreq->dir != IOREQ_WRITE);
-}
-
 struct nestedvcpu {
     bool_t nv_guestmode; /* vcpu in guestmode? */
     void *nv_vvmcx; /* l1 guest virtual VMCB/VMCS */
diff --git a/xen/include/xen/ioreq.h b/xen/include/xen/ioreq.h
index f846034..2240a73 100644
--- a/xen/include/xen/ioreq.h
+++ b/xen/include/xen/ioreq.h
@@ -35,6 +35,13 @@ static inline struct hvm_ioreq_server *get_ioreq_server(const struct domain *d,
     return GET_IOREQ_SERVER(d, id);
 }
 
+static inline bool hvm_ioreq_needs_completion(const ioreq_t *ioreq)
+{
+    return ioreq->state == STATE_IOREQ_READY &&
+           !ioreq->data_is_ptr &&
+           (ioreq->type != IOREQ_TYPE_PIO || ioreq->dir != IOREQ_WRITE);
+}
+
 bool hvm_io_pending(struct vcpu *v);
 bool handle_hvm_io_completion(struct vcpu *v);
 bool is_ioreq_server_page(struct domain *d, const struct page_info *page);
-- 
2.7.4




* [PATCH V1 04/16] xen/ioreq: Provide alias for the handle_mmio()
  2020-09-10 20:21 [PATCH V1 00/16] IOREQ feature (+ virtio-mmio) on Arm Oleksandr Tyshchenko
                   ` (2 preceding siblings ...)
  2020-09-10 20:21 ` [PATCH V1 03/16] xen/ioreq: Make x86's hvm_ioreq_needs_completion() common Oleksandr Tyshchenko
@ 2020-09-10 20:21 ` Oleksandr Tyshchenko
  2020-09-14 15:10   ` Jan Beulich
  2020-09-23 17:28   ` Julien Grall
  2020-09-10 20:21 ` [PATCH V1 05/16] xen/ioreq: Make x86's hvm_mmio_first(last)_byte() common Oleksandr Tyshchenko
                   ` (11 subsequent siblings)
  15 siblings, 2 replies; 111+ messages in thread
From: Oleksandr Tyshchenko @ 2020-09-10 20:21 UTC (permalink / raw)
  To: xen-devel
  Cc: Oleksandr Tyshchenko, Paul Durrant, Jan Beulich, Andrew Cooper,
	Wei Liu, Roger Pau Monné,
	Julien Grall, Stefano Stabellini, Julien Grall

From: Oleksandr Tyshchenko <oleksandr_tyshchenko@epam.com>

IOREQ is now a common feature and Arm will have its own
implementation.

However, the name of the function is rather generic and could be
confusing on Arm (which already has a try_handle_mmio()).

In order not to rename the function globally (it is used for a varying
set of purposes on x86), while still getting a non-confusing variant
on Arm, provide an alias ioreq_handle_complete_mmio() to be used in
common and Arm code.
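
With this in place, the common completion path (see the xen/common/ioreq.c
hunk below) simply calls:

    case HVMIO_mmio_completion:
        return ioreq_handle_complete_mmio();

and each architecture decides what the alias expands to; x86 keeps the
existing behaviour with "#define ioreq_handle_complete_mmio handle_mmio".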

Signed-off-by: Julien Grall <julien.grall@arm.com>
Signed-off-by: Oleksandr Tyshchenko <oleksandr_tyshchenko@epam.com>

---
Please note, this is a split/cleanup/hardening of Julien's PoC:
"Add support for Guest IO forwarding to a device emulator"

Changes RFC -> V1:
   - new patch
---
---
 xen/common/ioreq.c              | 2 +-
 xen/include/asm-x86/hvm/ioreq.h | 2 ++
 2 files changed, 3 insertions(+), 1 deletion(-)

diff --git a/xen/common/ioreq.c b/xen/common/ioreq.c
index 5017617..ce12751 100644
--- a/xen/common/ioreq.c
+++ b/xen/common/ioreq.c
@@ -189,7 +189,7 @@ bool handle_hvm_io_completion(struct vcpu *v)
         break;
 
     case HVMIO_mmio_completion:
-        return handle_mmio();
+        return ioreq_handle_complete_mmio();
 
     case HVMIO_pio_completion:
         return handle_pio(vio->io_req.addr, vio->io_req.size,
diff --git a/xen/include/asm-x86/hvm/ioreq.h b/xen/include/asm-x86/hvm/ioreq.h
index dec1e71..43afdee 100644
--- a/xen/include/asm-x86/hvm/ioreq.h
+++ b/xen/include/asm-x86/hvm/ioreq.h
@@ -42,6 +42,8 @@ void arch_hvm_ioreq_destroy(struct domain *d);
 #define IOREQ_IO_UNHANDLED   X86EMUL_UNHANDLEABLE
 #define IOREQ_IO_RETRY       X86EMUL_RETRY
 
+#define ioreq_handle_complete_mmio   handle_mmio
+
 #endif /* __ASM_X86_HVM_IOREQ_H__ */
 
 /*
-- 
2.7.4




* [PATCH V1 05/16] xen/ioreq: Make x86's hvm_mmio_first(last)_byte() common
  2020-09-10 20:21 [PATCH V1 00/16] IOREQ feature (+ virtio-mmio) on Arm Oleksandr Tyshchenko
                   ` (3 preceding siblings ...)
  2020-09-10 20:21 ` [PATCH V1 04/16] xen/ioreq: Provide alias for the handle_mmio() Oleksandr Tyshchenko
@ 2020-09-10 20:21 ` Oleksandr Tyshchenko
  2020-09-14 15:13   ` Jan Beulich
  2020-09-10 20:22 ` [PATCH V1 06/16] xen/ioreq: Make x86's hvm_ioreq_(page/vcpu/server) structs common Oleksandr Tyshchenko
                   ` (10 subsequent siblings)
  15 siblings, 1 reply; 111+ messages in thread
From: Oleksandr Tyshchenko @ 2020-09-10 20:21 UTC (permalink / raw)
  To: xen-devel
  Cc: Oleksandr Tyshchenko, Paul Durrant, Jan Beulich, Andrew Cooper,
	Wei Liu, Roger Pau Monné,
	Julien Grall, Stefano Stabellini, Julien Grall

From: Oleksandr Tyshchenko <oleksandr_tyshchenko@epam.com>

IOREQ is now a common feature and these helpers will be used on Arm
as-is. Move them to include/xen/ioreq.h.
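
As a worked example of what these helpers compute (values chosen purely
for illustration): for a repeated access with p->addr = 0x1000,
p->size = 4, p->count = 4 and the decrement flag set (p->df = 1), the
reported address is that of the last element, so:

    hvm_mmio_first_byte(p) == 0x1000 - 3 * 4 == 0xff4
    hvm_mmio_last_byte(p)  == 0x1000 + 4 - 1 == 0x1003

whereas with p->df = 0 the same access covers 0x1000 through 0x100f.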

Signed-off-by: Julien Grall <julien.grall@arm.com>
Signed-off-by: Oleksandr Tyshchenko <oleksandr_tyshchenko@epam.com>

---
Please note, this is a split/cleanup/hardening of Julien's PoC:
"Add support for Guest IO forwarding to a device emulator"

Changes RFC -> V1:
   - new patch
---
 xen/arch/x86/hvm/intercept.c |  1 +
 xen/include/asm-x86/hvm/io.h | 16 ----------------
 xen/include/xen/ioreq.h      | 16 ++++++++++++++++
 3 files changed, 17 insertions(+), 16 deletions(-)

diff --git a/xen/arch/x86/hvm/intercept.c b/xen/arch/x86/hvm/intercept.c
index cd4c4c1..891e497 100644
--- a/xen/arch/x86/hvm/intercept.c
+++ b/xen/arch/x86/hvm/intercept.c
@@ -17,6 +17,7 @@
  * this program; If not, see <http://www.gnu.org/licenses/>.
  */
 
+#include <xen/ioreq.h>
 #include <xen/types.h>
 #include <xen/sched.h>
 #include <asm/regs.h>
diff --git a/xen/include/asm-x86/hvm/io.h b/xen/include/asm-x86/hvm/io.h
index 558426b..fb64294 100644
--- a/xen/include/asm-x86/hvm/io.h
+++ b/xen/include/asm-x86/hvm/io.h
@@ -40,22 +40,6 @@ struct hvm_mmio_ops {
     hvm_mmio_write_t write;
 };
 
-static inline paddr_t hvm_mmio_first_byte(const ioreq_t *p)
-{
-    return unlikely(p->df) ?
-           p->addr - (p->count - 1ul) * p->size :
-           p->addr;
-}
-
-static inline paddr_t hvm_mmio_last_byte(const ioreq_t *p)
-{
-    unsigned long size = p->size;
-
-    return unlikely(p->df) ?
-           p->addr + size - 1:
-           p->addr + (p->count * size) - 1;
-}
-
 typedef int (*portio_action_t)(
     int dir, unsigned int port, unsigned int bytes, uint32_t *val);
 
diff --git a/xen/include/xen/ioreq.h b/xen/include/xen/ioreq.h
index 2240a73..9521170 100644
--- a/xen/include/xen/ioreq.h
+++ b/xen/include/xen/ioreq.h
@@ -35,6 +35,22 @@ static inline struct hvm_ioreq_server *get_ioreq_server(const struct domain *d,
     return GET_IOREQ_SERVER(d, id);
 }
 
+static inline paddr_t hvm_mmio_first_byte(const ioreq_t *p)
+{
+    return unlikely(p->df) ?
+           p->addr - (p->count - 1ul) * p->size :
+           p->addr;
+}
+
+static inline paddr_t hvm_mmio_last_byte(const ioreq_t *p)
+{
+    unsigned long size = p->size;
+
+    return unlikely(p->df) ?
+           p->addr + size - 1:
+           p->addr + (p->count * size) - 1;
+}
+
 static inline bool hvm_ioreq_needs_completion(const ioreq_t *ioreq)
 {
     return ioreq->state == STATE_IOREQ_READY &&
-- 
2.7.4



^ permalink raw reply related	[flat|nested] 111+ messages in thread

* [PATCH V1 06/16] xen/ioreq: Make x86's hvm_ioreq_(page/vcpu/server) structs common
  2020-09-10 20:21 [PATCH V1 00/16] IOREQ feature (+ virtio-mmio) on Arm Oleksandr Tyshchenko
                   ` (4 preceding siblings ...)
  2020-09-10 20:21 ` [PATCH V1 05/16] xen/ioreq: Make x86's hvm_mmio_first(last)_byte() common Oleksandr Tyshchenko
@ 2020-09-10 20:22 ` Oleksandr Tyshchenko
  2020-09-14 15:16   ` Jan Beulich
  2020-09-10 20:22 ` [PATCH V1 07/16] xen/dm: Make x86's DM feature common Oleksandr Tyshchenko
                   ` (9 subsequent siblings)
  15 siblings, 1 reply; 111+ messages in thread
From: Oleksandr Tyshchenko @ 2020-09-10 20:22 UTC (permalink / raw)
  To: xen-devel
  Cc: Oleksandr Tyshchenko, Jan Beulich, Andrew Cooper, Wei Liu,
	Roger Pau Monné,
	Paul Durrant, Julien Grall, Stefano Stabellini, Julien Grall

From: Oleksandr Tyshchenko <oleksandr_tyshchenko@epam.com>

IOREQ is now a common feature, and these structs will be used
on Arm as-is. Move them to xen/ioreq.h.

Signed-off-by: Julien Grall <julien.grall@arm.com>
Signed-off-by: Oleksandr Tyshchenko <oleksandr_tyshchenko@epam.com>

---
Please note, this is a split/cleanup/hardening of Julien's PoC:
"Add support for Guest IO forwarding to a device emulator"

Changes RFC -> V1:
   - new patch
---
 xen/include/asm-x86/hvm/domain.h | 34 ----------------------------------
 xen/include/xen/ioreq.h          | 34 ++++++++++++++++++++++++++++++++++
 2 files changed, 34 insertions(+), 34 deletions(-)

diff --git a/xen/include/asm-x86/hvm/domain.h b/xen/include/asm-x86/hvm/domain.h
index 9d247ba..765f35c 100644
--- a/xen/include/asm-x86/hvm/domain.h
+++ b/xen/include/asm-x86/hvm/domain.h
@@ -30,40 +30,6 @@
 
 #include <public/hvm/dm_op.h>
 
-struct hvm_ioreq_page {
-    gfn_t gfn;
-    struct page_info *page;
-    void *va;
-};
-
-struct hvm_ioreq_vcpu {
-    struct list_head list_entry;
-    struct vcpu      *vcpu;
-    evtchn_port_t    ioreq_evtchn;
-    bool             pending;
-};
-
-#define NR_IO_RANGE_TYPES (XEN_DMOP_IO_RANGE_PCI + 1)
-#define MAX_NR_IO_RANGES  256
-
-struct hvm_ioreq_server {
-    struct domain          *target, *emulator;
-
-    /* Lock to serialize toolstack modifications */
-    spinlock_t             lock;
-
-    struct hvm_ioreq_page  ioreq;
-    struct list_head       ioreq_vcpu_list;
-    struct hvm_ioreq_page  bufioreq;
-
-    /* Lock to serialize access to buffered ioreq ring */
-    spinlock_t             bufioreq_lock;
-    evtchn_port_t          bufioreq_evtchn;
-    struct rangeset        *range[NR_IO_RANGE_TYPES];
-    bool                   enabled;
-    uint8_t                bufioreq_handling;
-};
-
 #ifdef CONFIG_MEM_SHARING
 struct mem_sharing_domain
 {
diff --git a/xen/include/xen/ioreq.h b/xen/include/xen/ioreq.h
index 9521170..102f7e8 100644
--- a/xen/include/xen/ioreq.h
+++ b/xen/include/xen/ioreq.h
@@ -23,6 +23,40 @@
 
 #include <asm/hvm/ioreq.h>
 
+struct hvm_ioreq_page {
+    gfn_t gfn;
+    struct page_info *page;
+    void *va;
+};
+
+struct hvm_ioreq_vcpu {
+    struct list_head list_entry;
+    struct vcpu      *vcpu;
+    evtchn_port_t    ioreq_evtchn;
+    bool             pending;
+};
+
+#define NR_IO_RANGE_TYPES (XEN_DMOP_IO_RANGE_PCI + 1)
+#define MAX_NR_IO_RANGES  256
+
+struct hvm_ioreq_server {
+    struct domain          *target, *emulator;
+
+    /* Lock to serialize toolstack modifications */
+    spinlock_t             lock;
+
+    struct hvm_ioreq_page  ioreq;
+    struct list_head       ioreq_vcpu_list;
+    struct hvm_ioreq_page  bufioreq;
+
+    /* Lock to serialize access to buffered ioreq ring */
+    spinlock_t             bufioreq_lock;
+    evtchn_port_t          bufioreq_evtchn;
+    struct rangeset        *range[NR_IO_RANGE_TYPES];
+    bool                   enabled;
+    uint8_t                bufioreq_handling;
+};
+
 #define GET_IOREQ_SERVER(d, id) \
     (d)->arch.hvm.ioreq_server.server[id]
 
-- 
2.7.4



^ permalink raw reply related	[flat|nested] 111+ messages in thread

* [PATCH V1 07/16] xen/dm: Make x86's DM feature common
  2020-09-10 20:21 [PATCH V1 00/16] IOREQ feature (+ virtio-mmio) on Arm Oleksandr Tyshchenko
                   ` (5 preceding siblings ...)
  2020-09-10 20:22 ` [PATCH V1 06/16] xen/ioreq: Make x86's hvm_ioreq_(page/vcpu/server) structs common Oleksandr Tyshchenko
@ 2020-09-10 20:22 ` Oleksandr Tyshchenko
  2020-09-14 15:56   ` Jan Beulich
  2020-09-23 17:35   ` Julien Grall
  2020-09-10 20:22 ` [PATCH V1 08/16] xen/mm: Make x86's XENMEM_resource_ioreq_server handling common Oleksandr Tyshchenko
                   ` (8 subsequent siblings)
  15 siblings, 2 replies; 111+ messages in thread
From: Oleksandr Tyshchenko @ 2020-09-10 20:22 UTC (permalink / raw)
  To: xen-devel
  Cc: Oleksandr Tyshchenko, Jan Beulich, Andrew Cooper, Wei Liu,
	Roger Pau Monné,
	George Dunlap, Ian Jackson, Julien Grall, Stefano Stabellini,
	Daniel De Graaf, Julien Grall

From: Oleksandr Tyshchenko <oleksandr_tyshchenko@epam.com>

As a lot of the x86 code can be re-used on Arm later on, this patch
splits device model support into common and arch-specific parts.

It also updates the XSM code a bit so that the DM op can be used
on Arm.

This support is going to be used on Arm to run a device emulator
outside of the Xen hypervisor.
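
The resulting split can be summarised as follows (a simplified sketch
of the code added below, not extra new logic):

    /* xen/common/dm.c: decode the op and handle the ioreq-server
     * sub-ops in common code; everything else is arch-specific. */
    switch ( op.op )
    {
    case XEN_DMOP_create_ioreq_server:
        /* ... common ioreq server handling ... */
        break;

    default:
        rc = arch_dm_op(&op, d, op_args, &const_op);
    }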

Signed-off-by: Julien Grall <julien.grall@arm.com>
Signed-off-by: Oleksandr Tyshchenko <oleksandr_tyshchenko@epam.com>

---
Please note, this is a split/cleanup/hardening of Julien's PoC:
"Add support for Guest IO forwarding to a device emulator"

Changes RFC -> V1:
   - update XSM, related changes were pulled from:
     [RFC PATCH V1 04/12] xen/arm: Introduce arch specific bits for IOREQ/DM features
---
 xen/arch/x86/hvm/dm.c       | 287 +++-----------------------------------------
 xen/common/Makefile         |   1 +
 xen/common/dm.c             | 287 ++++++++++++++++++++++++++++++++++++++++++++
 xen/include/xen/hypercall.h |  12 ++
 xen/include/xsm/dummy.h     |   4 +-
 xen/include/xsm/xsm.h       |   6 +-
 xen/xsm/dummy.c             |   2 +-
 xen/xsm/flask/hooks.c       |   5 +-
 8 files changed, 327 insertions(+), 277 deletions(-)
 create mode 100644 xen/common/dm.c

diff --git a/xen/arch/x86/hvm/dm.c b/xen/arch/x86/hvm/dm.c
index 5ce484a..6ae535e 100644
--- a/xen/arch/x86/hvm/dm.c
+++ b/xen/arch/x86/hvm/dm.c
@@ -29,13 +29,6 @@
 
 #include <public/hvm/hvm_op.h>
 
-struct dmop_args {
-    domid_t domid;
-    unsigned int nr_bufs;
-    /* Reserve enough buf elements for all current hypercalls. */
-    struct xen_dm_op_buf buf[2];
-};
-
 static bool _raw_copy_from_guest_buf_offset(void *dst,
                                             const struct dmop_args *args,
                                             unsigned int buf_idx,
@@ -338,148 +331,20 @@ static int inject_event(struct domain *d,
     return 0;
 }
 
-static int dm_op(const struct dmop_args *op_args)
+int arch_dm_op(struct xen_dm_op *op, struct domain *d,
+               const struct dmop_args *op_args, bool *const_op)
 {
-    struct domain *d;
-    struct xen_dm_op op;
-    bool const_op = true;
     long rc;
-    size_t offset;
-
-    static const uint8_t op_size[] = {
-        [XEN_DMOP_create_ioreq_server]              = sizeof(struct xen_dm_op_create_ioreq_server),
-        [XEN_DMOP_get_ioreq_server_info]            = sizeof(struct xen_dm_op_get_ioreq_server_info),
-        [XEN_DMOP_map_io_range_to_ioreq_server]     = sizeof(struct xen_dm_op_ioreq_server_range),
-        [XEN_DMOP_unmap_io_range_from_ioreq_server] = sizeof(struct xen_dm_op_ioreq_server_range),
-        [XEN_DMOP_set_ioreq_server_state]           = sizeof(struct xen_dm_op_set_ioreq_server_state),
-        [XEN_DMOP_destroy_ioreq_server]             = sizeof(struct xen_dm_op_destroy_ioreq_server),
-        [XEN_DMOP_track_dirty_vram]                 = sizeof(struct xen_dm_op_track_dirty_vram),
-        [XEN_DMOP_set_pci_intx_level]               = sizeof(struct xen_dm_op_set_pci_intx_level),
-        [XEN_DMOP_set_isa_irq_level]                = sizeof(struct xen_dm_op_set_isa_irq_level),
-        [XEN_DMOP_set_pci_link_route]               = sizeof(struct xen_dm_op_set_pci_link_route),
-        [XEN_DMOP_modified_memory]                  = sizeof(struct xen_dm_op_modified_memory),
-        [XEN_DMOP_set_mem_type]                     = sizeof(struct xen_dm_op_set_mem_type),
-        [XEN_DMOP_inject_event]                     = sizeof(struct xen_dm_op_inject_event),
-        [XEN_DMOP_inject_msi]                       = sizeof(struct xen_dm_op_inject_msi),
-        [XEN_DMOP_map_mem_type_to_ioreq_server]     = sizeof(struct xen_dm_op_map_mem_type_to_ioreq_server),
-        [XEN_DMOP_remote_shutdown]                  = sizeof(struct xen_dm_op_remote_shutdown),
-        [XEN_DMOP_relocate_memory]                  = sizeof(struct xen_dm_op_relocate_memory),
-        [XEN_DMOP_pin_memory_cacheattr]             = sizeof(struct xen_dm_op_pin_memory_cacheattr),
-    };
-
-    rc = rcu_lock_remote_domain_by_id(op_args->domid, &d);
-    if ( rc )
-        return rc;
-
-    if ( !is_hvm_domain(d) )
-        goto out;
-
-    rc = xsm_dm_op(XSM_DM_PRIV, d);
-    if ( rc )
-        goto out;
-
-    offset = offsetof(struct xen_dm_op, u);
-
-    rc = -EFAULT;
-    if ( op_args->buf[0].size < offset )
-        goto out;
-
-    if ( copy_from_guest_offset((void *)&op, op_args->buf[0].h, 0, offset) )
-        goto out;
-
-    if ( op.op >= ARRAY_SIZE(op_size) )
-    {
-        rc = -EOPNOTSUPP;
-        goto out;
-    }
-
-    op.op = array_index_nospec(op.op, ARRAY_SIZE(op_size));
-
-    if ( op_args->buf[0].size < offset + op_size[op.op] )
-        goto out;
-
-    if ( copy_from_guest_offset((void *)&op.u, op_args->buf[0].h, offset,
-                                op_size[op.op]) )
-        goto out;
-
-    rc = -EINVAL;
-    if ( op.pad )
-        goto out;
-
-    switch ( op.op )
-    {
-    case XEN_DMOP_create_ioreq_server:
-    {
-        struct xen_dm_op_create_ioreq_server *data =
-            &op.u.create_ioreq_server;
-
-        const_op = false;
-
-        rc = -EINVAL;
-        if ( data->pad[0] || data->pad[1] || data->pad[2] )
-            break;
-
-        rc = hvm_create_ioreq_server(d, data->handle_bufioreq,
-                                     &data->id);
-        break;
-    }
 
-    case XEN_DMOP_get_ioreq_server_info:
+    switch ( op->op )
     {
-        struct xen_dm_op_get_ioreq_server_info *data =
-            &op.u.get_ioreq_server_info;
-        const uint16_t valid_flags = XEN_DMOP_no_gfns;
-
-        const_op = false;
-
-        rc = -EINVAL;
-        if ( data->flags & ~valid_flags )
-            break;
-
-        rc = hvm_get_ioreq_server_info(d, data->id,
-                                       (data->flags & XEN_DMOP_no_gfns) ?
-                                       NULL : &data->ioreq_gfn,
-                                       (data->flags & XEN_DMOP_no_gfns) ?
-                                       NULL : &data->bufioreq_gfn,
-                                       &data->bufioreq_port);
-        break;
-    }
-
-    case XEN_DMOP_map_io_range_to_ioreq_server:
-    {
-        const struct xen_dm_op_ioreq_server_range *data =
-            &op.u.map_io_range_to_ioreq_server;
-
-        rc = -EINVAL;
-        if ( data->pad )
-            break;
-
-        rc = hvm_map_io_range_to_ioreq_server(d, data->id, data->type,
-                                              data->start, data->end);
-        break;
-    }
-
-    case XEN_DMOP_unmap_io_range_from_ioreq_server:
-    {
-        const struct xen_dm_op_ioreq_server_range *data =
-            &op.u.unmap_io_range_from_ioreq_server;
-
-        rc = -EINVAL;
-        if ( data->pad )
-            break;
-
-        rc = hvm_unmap_io_range_from_ioreq_server(d, data->id, data->type,
-                                                  data->start, data->end);
-        break;
-    }
-
     case XEN_DMOP_map_mem_type_to_ioreq_server:
     {
         struct xen_dm_op_map_mem_type_to_ioreq_server *data =
-            &op.u.map_mem_type_to_ioreq_server;
+            &op->u.map_mem_type_to_ioreq_server;
         unsigned long first_gfn = data->opaque;
 
-        const_op = false;
+        *const_op = false;
 
         rc = -EOPNOTSUPP;
         if ( !hap_enabled(d) )
@@ -523,36 +388,10 @@ static int dm_op(const struct dmop_args *op_args)
         break;
     }
 
-    case XEN_DMOP_set_ioreq_server_state:
-    {
-        const struct xen_dm_op_set_ioreq_server_state *data =
-            &op.u.set_ioreq_server_state;
-
-        rc = -EINVAL;
-        if ( data->pad )
-            break;
-
-        rc = hvm_set_ioreq_server_state(d, data->id, !!data->enabled);
-        break;
-    }
-
-    case XEN_DMOP_destroy_ioreq_server:
-    {
-        const struct xen_dm_op_destroy_ioreq_server *data =
-            &op.u.destroy_ioreq_server;
-
-        rc = -EINVAL;
-        if ( data->pad )
-            break;
-
-        rc = hvm_destroy_ioreq_server(d, data->id);
-        break;
-    }
-
     case XEN_DMOP_track_dirty_vram:
     {
         const struct xen_dm_op_track_dirty_vram *data =
-            &op.u.track_dirty_vram;
+            &op->u.track_dirty_vram;
 
         rc = -EINVAL;
         if ( data->pad )
@@ -568,7 +407,7 @@ static int dm_op(const struct dmop_args *op_args)
     case XEN_DMOP_set_pci_intx_level:
     {
         const struct xen_dm_op_set_pci_intx_level *data =
-            &op.u.set_pci_intx_level;
+            &op->u.set_pci_intx_level;
 
         rc = set_pci_intx_level(d, data->domain, data->bus,
                                 data->device, data->intx,
@@ -579,7 +418,7 @@ static int dm_op(const struct dmop_args *op_args)
     case XEN_DMOP_set_isa_irq_level:
     {
         const struct xen_dm_op_set_isa_irq_level *data =
-            &op.u.set_isa_irq_level;
+            &op->u.set_isa_irq_level;
 
         rc = set_isa_irq_level(d, data->isa_irq, data->level);
         break;
@@ -588,7 +427,7 @@ static int dm_op(const struct dmop_args *op_args)
     case XEN_DMOP_set_pci_link_route:
     {
         const struct xen_dm_op_set_pci_link_route *data =
-            &op.u.set_pci_link_route;
+            &op->u.set_pci_link_route;
 
         rc = hvm_set_pci_link_route(d, data->link, data->isa_irq);
         break;
@@ -597,19 +436,19 @@ static int dm_op(const struct dmop_args *op_args)
     case XEN_DMOP_modified_memory:
     {
         struct xen_dm_op_modified_memory *data =
-            &op.u.modified_memory;
+            &op->u.modified_memory;
 
         rc = modified_memory(d, op_args, data);
-        const_op = !rc;
+        *const_op = !rc;
         break;
     }
 
     case XEN_DMOP_set_mem_type:
     {
         struct xen_dm_op_set_mem_type *data =
-            &op.u.set_mem_type;
+            &op->u.set_mem_type;
 
-        const_op = false;
+        *const_op = false;
 
         rc = -EINVAL;
         if ( data->pad )
@@ -622,7 +461,7 @@ static int dm_op(const struct dmop_args *op_args)
     case XEN_DMOP_inject_event:
     {
         const struct xen_dm_op_inject_event *data =
-            &op.u.inject_event;
+            &op->u.inject_event;
 
         rc = -EINVAL;
         if ( data->pad0 || data->pad1 )
@@ -635,7 +474,7 @@ static int dm_op(const struct dmop_args *op_args)
     case XEN_DMOP_inject_msi:
     {
         const struct xen_dm_op_inject_msi *data =
-            &op.u.inject_msi;
+            &op->u.inject_msi;
 
         rc = -EINVAL;
         if ( data->pad )
@@ -648,7 +487,7 @@ static int dm_op(const struct dmop_args *op_args)
     case XEN_DMOP_remote_shutdown:
     {
         const struct xen_dm_op_remote_shutdown *data =
-            &op.u.remote_shutdown;
+            &op->u.remote_shutdown;
 
         domain_shutdown(d, data->reason);
         rc = 0;
@@ -657,7 +496,7 @@ static int dm_op(const struct dmop_args *op_args)
 
     case XEN_DMOP_relocate_memory:
     {
-        struct xen_dm_op_relocate_memory *data = &op.u.relocate_memory;
+        struct xen_dm_op_relocate_memory *data = &op->u.relocate_memory;
         struct xen_add_to_physmap xatp = {
             .domid = op_args->domid,
             .size = data->size,
@@ -680,7 +519,7 @@ static int dm_op(const struct dmop_args *op_args)
             data->size -= rc;
             data->src_gfn += rc;
             data->dst_gfn += rc;
-            const_op = false;
+            *const_op = false;
             rc = -ERESTART;
         }
         break;
@@ -689,7 +528,7 @@ static int dm_op(const struct dmop_args *op_args)
     case XEN_DMOP_pin_memory_cacheattr:
     {
         const struct xen_dm_op_pin_memory_cacheattr *data =
-            &op.u.pin_memory_cacheattr;
+            &op->u.pin_memory_cacheattr;
 
         if ( data->pad )
         {
@@ -707,94 +546,6 @@ static int dm_op(const struct dmop_args *op_args)
         break;
     }
 
-    if ( (!rc || rc == -ERESTART) &&
-         !const_op && copy_to_guest_offset(op_args->buf[0].h, offset,
-                                           (void *)&op.u, op_size[op.op]) )
-        rc = -EFAULT;
-
- out:
-    rcu_unlock_domain(d);
-
-    return rc;
-}
-
-CHECK_dm_op_create_ioreq_server;
-CHECK_dm_op_get_ioreq_server_info;
-CHECK_dm_op_ioreq_server_range;
-CHECK_dm_op_set_ioreq_server_state;
-CHECK_dm_op_destroy_ioreq_server;
-CHECK_dm_op_track_dirty_vram;
-CHECK_dm_op_set_pci_intx_level;
-CHECK_dm_op_set_isa_irq_level;
-CHECK_dm_op_set_pci_link_route;
-CHECK_dm_op_modified_memory;
-CHECK_dm_op_set_mem_type;
-CHECK_dm_op_inject_event;
-CHECK_dm_op_inject_msi;
-CHECK_dm_op_remote_shutdown;
-CHECK_dm_op_relocate_memory;
-CHECK_dm_op_pin_memory_cacheattr;
-
-int compat_dm_op(domid_t domid,
-                 unsigned int nr_bufs,
-                 XEN_GUEST_HANDLE_PARAM(void) bufs)
-{
-    struct dmop_args args;
-    unsigned int i;
-    int rc;
-
-    if ( nr_bufs > ARRAY_SIZE(args.buf) )
-        return -E2BIG;
-
-    args.domid = domid;
-    args.nr_bufs = array_index_nospec(nr_bufs, ARRAY_SIZE(args.buf) + 1);
-
-    for ( i = 0; i < args.nr_bufs; i++ )
-    {
-        struct compat_dm_op_buf cmp;
-
-        if ( copy_from_guest_offset(&cmp, bufs, i, 1) )
-            return -EFAULT;
-
-#define XLAT_dm_op_buf_HNDL_h(_d_, _s_) \
-        guest_from_compat_handle((_d_)->h, (_s_)->h)
-
-        XLAT_dm_op_buf(&args.buf[i], &cmp);
-
-#undef XLAT_dm_op_buf_HNDL_h
-    }
-
-    rc = dm_op(&args);
-
-    if ( rc == -ERESTART )
-        rc = hypercall_create_continuation(__HYPERVISOR_dm_op, "iih",
-                                           domid, nr_bufs, bufs);
-
-    return rc;
-}
-
-long do_dm_op(domid_t domid,
-              unsigned int nr_bufs,
-              XEN_GUEST_HANDLE_PARAM(xen_dm_op_buf_t) bufs)
-{
-    struct dmop_args args;
-    int rc;
-
-    if ( nr_bufs > ARRAY_SIZE(args.buf) )
-        return -E2BIG;
-
-    args.domid = domid;
-    args.nr_bufs = array_index_nospec(nr_bufs, ARRAY_SIZE(args.buf) + 1);
-
-    if ( copy_from_guest_offset(&args.buf[0], bufs, 0, args.nr_bufs) )
-        return -EFAULT;
-
-    rc = dm_op(&args);
-
-    if ( rc == -ERESTART )
-        rc = hypercall_create_continuation(__HYPERVISOR_dm_op, "iih",
-                                           domid, nr_bufs, bufs);
-
     return rc;
 }
 
diff --git a/xen/common/Makefile b/xen/common/Makefile
index 8df2b6e..5cf7208 100644
--- a/xen/common/Makefile
+++ b/xen/common/Makefile
@@ -6,6 +6,7 @@ obj-$(CONFIG_CORE_PARKING) += core_parking.o
 obj-y += cpu.o
 obj-$(CONFIG_DEBUG_TRACE) += debugtrace.o
 obj-$(CONFIG_HAS_DEVICE_TREE) += device_tree.o
+obj-$(CONFIG_IOREQ_SERVER) += dm.o
 obj-y += domctl.o
 obj-y += domain.o
 obj-y += event_2l.o
diff --git a/xen/common/dm.c b/xen/common/dm.c
new file mode 100644
index 0000000..060731d
--- /dev/null
+++ b/xen/common/dm.c
@@ -0,0 +1,287 @@
+/*
+ * Copyright (c) 2016 Citrix Systems Inc.
+ * Copyright (c) 2019 Arm ltd.
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms and conditions of the GNU General Public License,
+ * version 2, as published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope it will be useful, but WITHOUT
+ * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
+ * FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License for
+ * more details.
+ *
+ * You should have received a copy of the GNU General Public License along with
+ * this program; If not, see <http://www.gnu.org/licenses/>.
+ */
+
+#include <xen/guest_access.h>
+#include <xen/hypercall.h>
+#include <xen/ioreq.h>
+#include <xen/nospec.h>
+
+static int dm_op(const struct dmop_args *op_args)
+{
+    struct domain *d;
+    struct xen_dm_op op;
+    long rc;
+    bool const_op = true;
+    const size_t offset = offsetof(struct xen_dm_op, u);
+
+    static const uint8_t op_size[] = {
+        [XEN_DMOP_create_ioreq_server]              = sizeof(struct xen_dm_op_create_ioreq_server),
+        [XEN_DMOP_get_ioreq_server_info]            = sizeof(struct xen_dm_op_get_ioreq_server_info),
+        [XEN_DMOP_map_io_range_to_ioreq_server]     = sizeof(struct xen_dm_op_ioreq_server_range),
+        [XEN_DMOP_unmap_io_range_from_ioreq_server] = sizeof(struct xen_dm_op_ioreq_server_range),
+        [XEN_DMOP_set_ioreq_server_state]           = sizeof(struct xen_dm_op_set_ioreq_server_state),
+        [XEN_DMOP_destroy_ioreq_server]             = sizeof(struct xen_dm_op_destroy_ioreq_server),
+        [XEN_DMOP_track_dirty_vram]                 = sizeof(struct xen_dm_op_track_dirty_vram),
+        [XEN_DMOP_set_pci_intx_level]               = sizeof(struct xen_dm_op_set_pci_intx_level),
+        [XEN_DMOP_set_isa_irq_level]                = sizeof(struct xen_dm_op_set_isa_irq_level),
+        [XEN_DMOP_set_pci_link_route]               = sizeof(struct xen_dm_op_set_pci_link_route),
+        [XEN_DMOP_modified_memory]                  = sizeof(struct xen_dm_op_modified_memory),
+        [XEN_DMOP_set_mem_type]                     = sizeof(struct xen_dm_op_set_mem_type),
+        [XEN_DMOP_inject_event]                     = sizeof(struct xen_dm_op_inject_event),
+        [XEN_DMOP_inject_msi]                       = sizeof(struct xen_dm_op_inject_msi),
+        [XEN_DMOP_map_mem_type_to_ioreq_server]     = sizeof(struct xen_dm_op_map_mem_type_to_ioreq_server),
+        [XEN_DMOP_remote_shutdown]                  = sizeof(struct xen_dm_op_remote_shutdown),
+        [XEN_DMOP_relocate_memory]                  = sizeof(struct xen_dm_op_relocate_memory),
+        [XEN_DMOP_pin_memory_cacheattr]             = sizeof(struct xen_dm_op_pin_memory_cacheattr),
+    };
+
+    rc = rcu_lock_remote_domain_by_id(op_args->domid, &d);
+    if ( rc )
+        return rc;
+
+    if ( !is_hvm_domain(d) )
+        goto out;
+
+    rc = xsm_dm_op(XSM_DM_PRIV, d);
+    if ( rc )
+        goto out;
+
+    rc = -EFAULT;
+    if ( op_args->buf[0].size < offset )
+        goto out;
+
+    if ( copy_from_guest_offset((void *)&op, op_args->buf[0].h, 0, offset) )
+        goto out;
+
+    if ( op.op >= ARRAY_SIZE(op_size) )
+    {
+        rc = -EOPNOTSUPP;
+        goto out;
+    }
+
+    op.op = array_index_nospec(op.op, ARRAY_SIZE(op_size));
+
+    if ( op_args->buf[0].size < offset + op_size[op.op] )
+        goto out;
+
+    if ( copy_from_guest_offset((void *)&op.u, op_args->buf[0].h, offset,
+                                op_size[op.op]) )
+        goto out;
+
+    rc = -EINVAL;
+    if ( op.pad )
+        goto out;
+
+    switch ( op.op )
+    {
+    case XEN_DMOP_create_ioreq_server:
+    {
+        struct xen_dm_op_create_ioreq_server *data =
+            &op.u.create_ioreq_server;
+
+        const_op = false;
+
+        rc = -EINVAL;
+        if ( data->pad[0] || data->pad[1] || data->pad[2] )
+            break;
+
+        rc = hvm_create_ioreq_server(d, data->handle_bufioreq,
+                                     &data->id);
+        break;
+    }
+
+    case XEN_DMOP_get_ioreq_server_info:
+    {
+        struct xen_dm_op_get_ioreq_server_info *data =
+            &op.u.get_ioreq_server_info;
+        const uint16_t valid_flags = XEN_DMOP_no_gfns;
+
+        const_op = false;
+
+        rc = -EINVAL;
+        if ( data->flags & ~valid_flags )
+            break;
+
+        rc = hvm_get_ioreq_server_info(d, data->id,
+                                       (data->flags & XEN_DMOP_no_gfns) ?
+                                       NULL : (unsigned long *)&data->ioreq_gfn,
+                                       (data->flags & XEN_DMOP_no_gfns) ?
+                                       NULL : (unsigned long *)&data->bufioreq_gfn,
+                                       &data->bufioreq_port);
+        break;
+    }
+
+    case XEN_DMOP_map_io_range_to_ioreq_server:
+    {
+        const struct xen_dm_op_ioreq_server_range *data =
+            &op.u.map_io_range_to_ioreq_server;
+
+        rc = -EINVAL;
+        if ( data->pad )
+            break;
+
+        rc = hvm_map_io_range_to_ioreq_server(d, data->id, data->type,
+                                              data->start, data->end);
+        break;
+    }
+
+    case XEN_DMOP_unmap_io_range_from_ioreq_server:
+    {
+        const struct xen_dm_op_ioreq_server_range *data =
+            &op.u.unmap_io_range_from_ioreq_server;
+
+        rc = -EINVAL;
+        if ( data->pad )
+            break;
+
+        rc = hvm_unmap_io_range_from_ioreq_server(d, data->id, data->type,
+                                                  data->start, data->end);
+        break;
+    }
+
+    case XEN_DMOP_set_ioreq_server_state:
+    {
+        const struct xen_dm_op_set_ioreq_server_state *data =
+            &op.u.set_ioreq_server_state;
+
+        rc = -EINVAL;
+        if ( data->pad )
+            break;
+
+        rc = hvm_set_ioreq_server_state(d, data->id, !!data->enabled);
+        break;
+    }
+
+    case XEN_DMOP_destroy_ioreq_server:
+    {
+        const struct xen_dm_op_destroy_ioreq_server *data =
+            &op.u.destroy_ioreq_server;
+
+        rc = -EINVAL;
+        if ( data->pad )
+            break;
+
+        rc = hvm_destroy_ioreq_server(d, data->id);
+        break;
+    }
+
+    default:
+        rc = arch_dm_op(&op, d, op_args, &const_op);
+    }
+
+    if ( (!rc || rc == -ERESTART) &&
+         !const_op && copy_to_guest_offset(op_args->buf[0].h, offset,
+                                           (void *)&op.u, op_size[op.op]) )
+        rc = -EFAULT;
+
+ out:
+    rcu_unlock_domain(d);
+
+    return rc;
+}
+
+#ifdef CONFIG_COMPAT
+CHECK_dm_op_create_ioreq_server;
+CHECK_dm_op_get_ioreq_server_info;
+CHECK_dm_op_ioreq_server_range;
+CHECK_dm_op_set_ioreq_server_state;
+CHECK_dm_op_destroy_ioreq_server;
+CHECK_dm_op_track_dirty_vram;
+CHECK_dm_op_set_pci_intx_level;
+CHECK_dm_op_set_isa_irq_level;
+CHECK_dm_op_set_pci_link_route;
+CHECK_dm_op_modified_memory;
+CHECK_dm_op_set_mem_type;
+CHECK_dm_op_inject_event;
+CHECK_dm_op_inject_msi;
+CHECK_dm_op_remote_shutdown;
+CHECK_dm_op_relocate_memory;
+CHECK_dm_op_pin_memory_cacheattr;
+
+int compat_dm_op(domid_t domid,
+                 unsigned int nr_bufs,
+                 XEN_GUEST_HANDLE_PARAM(void) bufs)
+{
+    struct dmop_args args;
+    unsigned int i;
+    int rc;
+
+    if ( nr_bufs > ARRAY_SIZE(args.buf) )
+        return -E2BIG;
+
+    args.domid = domid;
+    args.nr_bufs = array_index_nospec(nr_bufs, ARRAY_SIZE(args.buf) + 1);
+
+    for ( i = 0; i < args.nr_bufs; i++ )
+    {
+        struct compat_dm_op_buf cmp;
+
+        if ( copy_from_guest_offset(&cmp, bufs, i, 1) )
+            return -EFAULT;
+
+#define XLAT_dm_op_buf_HNDL_h(_d_, _s_) \
+        guest_from_compat_handle((_d_)->h, (_s_)->h)
+
+        XLAT_dm_op_buf(&args.buf[i], &cmp);
+
+#undef XLAT_dm_op_buf_HNDL_h
+    }
+
+    rc = dm_op(&args);
+
+    if ( rc == -ERESTART )
+        rc = hypercall_create_continuation(__HYPERVISOR_dm_op, "iih",
+                                           domid, nr_bufs, bufs);
+
+    return rc;
+}
+#endif
+
+long do_dm_op(domid_t domid,
+              unsigned int nr_bufs,
+              XEN_GUEST_HANDLE_PARAM(xen_dm_op_buf_t) bufs)
+{
+    struct dmop_args args;
+    int rc;
+
+    if ( nr_bufs > ARRAY_SIZE(args.buf) )
+        return -E2BIG;
+
+    args.domid = domid;
+    args.nr_bufs = array_index_nospec(nr_bufs, ARRAY_SIZE(args.buf) + 1);
+
+    if ( copy_from_guest_offset(&args.buf[0], bufs, 0, args.nr_bufs) )
+        return -EFAULT;
+
+    rc = dm_op(&args);
+
+    if ( rc == -ERESTART )
+        rc = hypercall_create_continuation(__HYPERVISOR_dm_op, "iih",
+                                           domid, nr_bufs, bufs);
+
+    return rc;
+}
+
+/*
+ * Local variables:
+ * mode: C
+ * c-file-style: "BSD"
+ * c-basic-offset: 4
+ * tab-width: 4
+ * indent-tabs-mode: nil
+ * End:
+ */
diff --git a/xen/include/xen/hypercall.h b/xen/include/xen/hypercall.h
index 655acc7..19f509f 100644
--- a/xen/include/xen/hypercall.h
+++ b/xen/include/xen/hypercall.h
@@ -150,6 +150,18 @@ do_dm_op(
     unsigned int nr_bufs,
     XEN_GUEST_HANDLE_PARAM(xen_dm_op_buf_t) bufs);
 
+struct dmop_args {
+    domid_t domid;
+    unsigned int nr_bufs;
+    /* Reserve enough buf elements for all current hypercalls. */
+    struct xen_dm_op_buf buf[2];
+};
+
+int arch_dm_op(struct xen_dm_op *op,
+               struct domain *d,
+               const struct dmop_args *op_args,
+               bool *const_op);
+
 #ifdef CONFIG_HYPFS
 extern long
 do_hypfs_op(
diff --git a/xen/include/xsm/dummy.h b/xen/include/xsm/dummy.h
index 5f6f842..c0813c0 100644
--- a/xen/include/xsm/dummy.h
+++ b/xen/include/xsm/dummy.h
@@ -723,14 +723,14 @@ static XSM_INLINE int xsm_pmu_op (XSM_DEFAULT_ARG struct domain *d, unsigned int
     }
 }
 
+#endif /* CONFIG_X86 */
+
 static XSM_INLINE int xsm_dm_op(XSM_DEFAULT_ARG struct domain *d)
 {
     XSM_ASSERT_ACTION(XSM_DM_PRIV);
     return xsm_default_action(action, current->domain, d);
 }
 
-#endif /* CONFIG_X86 */
-
 #ifdef CONFIG_ARGO
 static XSM_INLINE int xsm_argo_enable(const struct domain *d)
 {
diff --git a/xen/include/xsm/xsm.h b/xen/include/xsm/xsm.h
index a80bcf3..2a9b39d 100644
--- a/xen/include/xsm/xsm.h
+++ b/xen/include/xsm/xsm.h
@@ -177,8 +177,8 @@ struct xsm_operations {
     int (*ioport_permission) (struct domain *d, uint32_t s, uint32_t e, uint8_t allow);
     int (*ioport_mapping) (struct domain *d, uint32_t s, uint32_t e, uint8_t allow);
     int (*pmu_op) (struct domain *d, unsigned int op);
-    int (*dm_op) (struct domain *d);
 #endif
+    int (*dm_op) (struct domain *d);
     int (*xen_version) (uint32_t cmd);
     int (*domain_resource_map) (struct domain *d);
 #ifdef CONFIG_ARGO
@@ -688,13 +688,13 @@ static inline int xsm_pmu_op (xsm_default_t def, struct domain *d, unsigned int
     return xsm_ops->pmu_op(d, op);
 }
 
+#endif /* CONFIG_X86 */
+
 static inline int xsm_dm_op(xsm_default_t def, struct domain *d)
 {
     return xsm_ops->dm_op(d);
 }
 
-#endif /* CONFIG_X86 */
-
 static inline int xsm_xen_version (xsm_default_t def, uint32_t op)
 {
     return xsm_ops->xen_version(op);
diff --git a/xen/xsm/dummy.c b/xen/xsm/dummy.c
index d4cce68..e3afd06 100644
--- a/xen/xsm/dummy.c
+++ b/xen/xsm/dummy.c
@@ -148,8 +148,8 @@ void __init xsm_fixup_ops (struct xsm_operations *ops)
     set_to_dummy_if_null(ops, ioport_permission);
     set_to_dummy_if_null(ops, ioport_mapping);
     set_to_dummy_if_null(ops, pmu_op);
-    set_to_dummy_if_null(ops, dm_op);
 #endif
+    set_to_dummy_if_null(ops, dm_op);
     set_to_dummy_if_null(ops, xen_version);
     set_to_dummy_if_null(ops, domain_resource_map);
 #ifdef CONFIG_ARGO
diff --git a/xen/xsm/flask/hooks.c b/xen/xsm/flask/hooks.c
index a314bf8..645192a 100644
--- a/xen/xsm/flask/hooks.c
+++ b/xen/xsm/flask/hooks.c
@@ -1662,14 +1662,13 @@ static int flask_pmu_op (struct domain *d, unsigned int op)
         return -EPERM;
     }
 }
+#endif /* CONFIG_X86 */
 
 static int flask_dm_op(struct domain *d)
 {
     return current_has_perm(d, SECCLASS_HVM, HVM__DM);
 }
 
-#endif /* CONFIG_X86 */
-
 static int flask_xen_version (uint32_t op)
 {
     u32 dsid = domain_sid(current->domain);
@@ -1872,8 +1871,8 @@ static struct xsm_operations flask_ops = {
     .ioport_permission = flask_ioport_permission,
     .ioport_mapping = flask_ioport_mapping,
     .pmu_op = flask_pmu_op,
-    .dm_op = flask_dm_op,
 #endif
+    .dm_op = flask_dm_op,
     .xen_version = flask_xen_version,
     .domain_resource_map = flask_domain_resource_map,
 #ifdef CONFIG_ARGO
-- 
2.7.4



^ permalink raw reply related	[flat|nested] 111+ messages in thread

* [PATCH V1 08/16] xen/mm: Make x86's XENMEM_resource_ioreq_server handling common
  2020-09-10 20:21 [PATCH V1 00/16] IOREQ feature (+ virtio-mmio) on Arm Oleksandr Tyshchenko
                   ` (6 preceding siblings ...)
  2020-09-10 20:22 ` [PATCH V1 07/16] xen/dm: Make x86's DM feature common Oleksandr Tyshchenko
@ 2020-09-10 20:22 ` Oleksandr Tyshchenko
  2020-09-10 20:22 ` [PATCH V1 09/16] arm/ioreq: Introduce arch specific bits for IOREQ/DM features Oleksandr Tyshchenko
                   ` (7 subsequent siblings)
  15 siblings, 0 replies; 111+ messages in thread
From: Oleksandr Tyshchenko @ 2020-09-10 20:22 UTC (permalink / raw)
  To: xen-devel
  Cc: Oleksandr Tyshchenko, Jan Beulich, Andrew Cooper, Wei Liu,
	Roger Pau Monné,
	George Dunlap, Ian Jackson, Julien Grall, Stefano Stabellini,
	Volodymyr Babchuk, Julien Grall

From: Oleksandr Tyshchenko <oleksandr_tyshchenko@epam.com>

As the x86 implementation of XENMEM_resource_ioreq_server can be
re-used on Arm later on, this patch makes it common and removes
arch_acquire_resource() as unneeded.

This support is going to be used on Arm to run a device emulator
outside of the Xen hypervisor.
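
In other words, the resource type is now dispatched directly in common
code (a simplified sketch of the change below):

    /* xen/common/memory.c: acquire_resource() handles the ioreq
     * server type itself, so no arch hook is needed any more. */
    switch ( xmar.type )
    {
    #ifdef CONFIG_IOREQ_SERVER
    case XENMEM_resource_ioreq_server:
        rc = acquire_ioreq_server(d, xmar.id, xmar.frame, xmar.nr_frames,
                                  mfn_list);
        break;
    #endif

    default:
        rc = -EOPNOTSUPP;
        break;
    }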

Signed-off-by: Julien Grall <julien.grall@arm.com>
Signed-off-by: Oleksandr Tyshchenko <oleksandr_tyshchenko@epam.com>

---
Please note, this is a split/cleanup/hardening of Julien's PoC:
"Add support for Guest IO forwarding to a device emulator"

Changes RFC -> V1:
   - no changes
---
 xen/arch/x86/mm.c        | 44 --------------------------------------------
 xen/common/memory.c      | 45 +++++++++++++++++++++++++++++++++++++++++++--
 xen/include/asm-arm/mm.h |  8 --------
 xen/include/asm-x86/mm.h |  4 ----
 4 files changed, 43 insertions(+), 58 deletions(-)

diff --git a/xen/arch/x86/mm.c b/xen/arch/x86/mm.c
index 776d2b6..a5f6f12 100644
--- a/xen/arch/x86/mm.c
+++ b/xen/arch/x86/mm.c
@@ -4594,50 +4594,6 @@ int xenmem_add_to_physmap_one(
     return rc;
 }
 
-int arch_acquire_resource(struct domain *d, unsigned int type,
-                          unsigned int id, unsigned long frame,
-                          unsigned int nr_frames, xen_pfn_t mfn_list[])
-{
-    int rc;
-
-    switch ( type )
-    {
-#ifdef CONFIG_HVM
-    case XENMEM_resource_ioreq_server:
-    {
-        ioservid_t ioservid = id;
-        unsigned int i;
-
-        rc = -EINVAL;
-        if ( !is_hvm_domain(d) )
-            break;
-
-        if ( id != (unsigned int)ioservid )
-            break;
-
-        rc = 0;
-        for ( i = 0; i < nr_frames; i++ )
-        {
-            mfn_t mfn;
-
-            rc = hvm_get_ioreq_server_frame(d, id, frame + i, &mfn);
-            if ( rc )
-                break;
-
-            mfn_list[i] = mfn_x(mfn);
-        }
-        break;
-    }
-#endif
-
-    default:
-        rc = -EOPNOTSUPP;
-        break;
-    }
-
-    return rc;
-}
-
 long arch_memory_op(unsigned long cmd, XEN_GUEST_HANDLE_PARAM(void) arg)
 {
     int rc;
diff --git a/xen/common/memory.c b/xen/common/memory.c
index 714077c..e551fa6 100644
--- a/xen/common/memory.c
+++ b/xen/common/memory.c
@@ -30,6 +30,10 @@
 #include <public/memory.h>
 #include <xsm/xsm.h>
 
+#ifdef CONFIG_IOREQ_SERVER
+#include <xen/ioreq.h>
+#endif
+
 #ifdef CONFIG_X86
 #include <asm/guest.h>
 #endif
@@ -1045,6 +1049,38 @@ static int acquire_grant_table(struct domain *d, unsigned int id,
     return 0;
 }
 
+#ifdef CONFIG_IOREQ_SERVER
+static int acquire_ioreq_server(struct domain *d,
+                                unsigned int id,
+                                unsigned long frame,
+                                unsigned int nr_frames,
+                                xen_pfn_t mfn_list[])
+{
+    ioservid_t ioservid = id;
+    unsigned int i;
+    int rc;
+
+    if ( !is_hvm_domain(d) )
+        return -EINVAL;
+
+    if ( id != (unsigned int)ioservid )
+        return -EINVAL;
+
+    for ( i = 0; i < nr_frames; i++ )
+    {
+        mfn_t mfn;
+
+        rc = hvm_get_ioreq_server_frame(d, id, frame + i, &mfn);
+        if ( rc )
+            return rc;
+
+        mfn_list[i] = mfn_x(mfn);
+    }
+
+    return 0;
+}
+#endif
+
 static int acquire_resource(
     XEN_GUEST_HANDLE_PARAM(xen_mem_acquire_resource_t) arg)
 {
@@ -1095,9 +1131,14 @@ static int acquire_resource(
                                  mfn_list);
         break;
 
+#ifdef CONFIG_IOREQ_SERVER
+    case XENMEM_resource_ioreq_server:
+        rc = acquire_ioreq_server(d, xmar.id, xmar.frame, xmar.nr_frames,
+                                  mfn_list);
+        break;
+#endif
     default:
-        rc = arch_acquire_resource(d, xmar.type, xmar.id, xmar.frame,
-                                   xmar.nr_frames, mfn_list);
+        rc = -EOPNOTSUPP;
         break;
     }
 
diff --git a/xen/include/asm-arm/mm.h b/xen/include/asm-arm/mm.h
index f8ba49b..0b7de31 100644
--- a/xen/include/asm-arm/mm.h
+++ b/xen/include/asm-arm/mm.h
@@ -358,14 +358,6 @@ static inline void put_page_and_type(struct page_info *page)
 
 void clear_and_clean_page(struct page_info *page);
 
-static inline
-int arch_acquire_resource(struct domain *d, unsigned int type, unsigned int id,
-                          unsigned long frame, unsigned int nr_frames,
-                          xen_pfn_t mfn_list[])
-{
-    return -EOPNOTSUPP;
-}
-
 unsigned int arch_get_dma_bitsize(void);
 
 #endif /*  __ARCH_ARM_MM__ */
diff --git a/xen/include/asm-x86/mm.h b/xen/include/asm-x86/mm.h
index 7e74996..2e111ad 100644
--- a/xen/include/asm-x86/mm.h
+++ b/xen/include/asm-x86/mm.h
@@ -649,8 +649,4 @@ static inline bool arch_mfn_in_directmap(unsigned long mfn)
     return mfn <= (virt_to_mfn(eva - 1) + 1);
 }
 
-int arch_acquire_resource(struct domain *d, unsigned int type,
-                          unsigned int id, unsigned long frame,
-                          unsigned int nr_frames, xen_pfn_t mfn_list[]);
-
 #endif /* __ASM_X86_MM_H__ */
-- 
2.7.4



^ permalink raw reply related	[flat|nested] 111+ messages in thread

* [PATCH V1 09/16] arm/ioreq: Introduce arch specific bits for IOREQ/DM features
  2020-09-10 20:21 [PATCH V1 00/16] IOREQ feature (+ virtio-mmio) on Arm Oleksandr Tyshchenko
                   ` (7 preceding siblings ...)
  2020-09-10 20:22 ` [PATCH V1 08/16] xen/mm: Make x86's XENMEM_resource_ioreq_server handling common Oleksandr Tyshchenko
@ 2020-09-10 20:22 ` Oleksandr Tyshchenko
  2020-09-11 10:14   ` Oleksandr
                     ` (2 more replies)
  2020-09-10 20:22 ` [PATCH V1 10/16] xen/mm: Handle properly reference in set_foreign_p2m_entry() on Arm Oleksandr Tyshchenko
                   ` (6 subsequent siblings)
  15 siblings, 3 replies; 111+ messages in thread
From: Oleksandr Tyshchenko @ 2020-09-10 20:22 UTC (permalink / raw)
  To: xen-devel
  Cc: Oleksandr Tyshchenko, Stefano Stabellini, Julien Grall,
	Volodymyr Babchuk, Julien Grall

From: Oleksandr Tyshchenko <oleksandr_tyshchenko@epam.com>

This patch adds basic IOREQ/DM support on Arm. The subsequent
patches will improve functionality, add the remaining bits, and
address several TODOs.

Please note, the "PIO handling" TODO is expected to be left
unaddressed for the current series. It is not a big issue for now,
while Xen doesn't have vPCI support on Arm. On Arm64, PIO accesses
are only used for the PCI IO BAR, and we would probably want to
expose them to the emulator as PIO accesses to make a DM completely
arch-agnostic. So "PIO handling" should be implemented when we add
vPCI support.

Please note, at the moment the build on Arm32 is broken (see the
cmpxchg usage in hvm_send_buffered_ioreq()) due to the lack of
cmpxchg_64 support on Arm32. There is a patch under review to
address this issue:
https://patchwork.kernel.org/patch/11715559/
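
For orientation, the resulting MMIO forwarding flow is roughly the
following (a sketch distilled from the traps.c and ioreq.c hunks
below, not extra code):

    /* On a data abort, try_handle_mmio() runs first; if no in-Xen
     * handler claims the address, try_fwd_ioserv() selects an ioreq
     * server and hvm_send_ioreq() queues the request for it. */
    switch ( try_handle_mmio(regs, hsr, gpa) )
    {
    case IO_HANDLED:        /* answered synchronously */
        advance_pc(regs, hsr);
        return;
    case IO_RETRY:          /* queued for the emulator; on re-entry,
                             * handle_hvm_io_completion() collects the
                             * response and ioreq_handle_complete_mmio()
                             * replays the access */
        return;
    case IO_UNHANDLED:      /* no emulator claimed the range */
        break;
    default:
        ASSERT_UNREACHABLE();
    }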

Signed-off-by: Julien Grall <julien.grall@arm.com>
Signed-off-by: Oleksandr Tyshchenko <oleksandr_tyshchenko@epam.com>

---
Please note, this is a split/cleanup/hardening of Julien's PoC:
"Add support for Guest IO forwarding to a device emulator"

Changes RFC -> V1:
   - was split into:
     - arm/ioreq: Introduce arch specific bits for IOREQ/DM features
     - xen/mm: Handle properly reference in set_foreign_p2m_entry() on Arm
   - update patch description
   - update asm-arm/hvm/ioreq.h according to the newly introduced arch functions:
     - arch_hvm_destroy_ioreq_server()
     - arch_handle_hvm_io_completion()
   - update arch files to include xen/ioreq.h
   - remove HVMOP plumbing
   - rewrite the logic to properly handle the case when hvm_send_ioreq() returns IO_RETRY
   - add logic to properly handle the handle_hvm_io_completion() return value
   - rename handle_mmio() to ioreq_handle_complete_mmio()
   - move paging_mark_pfn_dirty() to asm-arm/paging.h
   - remove forward declaration for hvm_ioreq_server in asm-arm/paging.h
   - move try_fwd_ioserv() to ioreq.c, provide stubs if !CONFIG_IOREQ_SERVER
   - do not remove #ifdef CONFIG_IOREQ_SERVER in memory.c for guarding xen/ioreq.h
   - use gdprintk in try_fwd_ioserv(), remove unneeded prints
   - update list of #include-s
   - move has_vpci() to asm-arm/domain.h
   - add a comment (TODO) to unimplemented yet handle_pio()
   - remove hvm_mmio_first(last)_byte() and hvm_ioreq_(page/vcpu/server) structs
     from the arch files, they were already moved to the common code
   - remove set_foreign_p2m_entry() changes, they will be properly implemented
     in the follow-up patch
   - select IOREQ_SERVER for Arm instead of Arm64 in Kconfig
   - remove x86's realmode and other unneeded stubs from xen/ioreq.h
   - clarify ioreq_t p.df usage in try_fwd_ioserv()
   - set ioreq_t p.count to 1 in try_fwd_ioserv()
---
 xen/arch/arm/Kconfig            |   1 +
 xen/arch/arm/Makefile           |   2 +
 xen/arch/arm/dm.c               |  33 ++++++++++
 xen/arch/arm/domain.c           |   9 +++
 xen/arch/arm/io.c               |  11 +++-
 xen/arch/arm/ioreq.c            | 142 ++++++++++++++++++++++++++++++++++++++++
 xen/arch/arm/traps.c            |  32 +++++++--
 xen/include/asm-arm/domain.h    |  46 +++++++++++++
 xen/include/asm-arm/hvm/ioreq.h | 108 ++++++++++++++++++++++++++++++
 xen/include/asm-arm/mmio.h      |   1 +
 xen/include/asm-arm/paging.h    |   4 ++
 11 files changed, 384 insertions(+), 5 deletions(-)
 create mode 100644 xen/arch/arm/dm.c
 create mode 100644 xen/arch/arm/ioreq.c
 create mode 100644 xen/include/asm-arm/hvm/ioreq.h

diff --git a/xen/arch/arm/Kconfig b/xen/arch/arm/Kconfig
index 2777388..8264cd6 100644
--- a/xen/arch/arm/Kconfig
+++ b/xen/arch/arm/Kconfig
@@ -21,6 +21,7 @@ config ARM
 	select HAS_PASSTHROUGH
 	select HAS_PDX
 	select IOMMU_FORCE_PT_SHARE
+	select IOREQ_SERVER
 
 config ARCH_DEFCONFIG
 	string
diff --git a/xen/arch/arm/Makefile b/xen/arch/arm/Makefile
index 7e82b21..617fa3e 100644
--- a/xen/arch/arm/Makefile
+++ b/xen/arch/arm/Makefile
@@ -13,6 +13,7 @@ obj-y += cpuerrata.o
 obj-y += cpufeature.o
 obj-y += decode.o
 obj-y += device.o
+obj-$(CONFIG_IOREQ_SERVER) += dm.o
 obj-y += domain.o
 obj-y += domain_build.init.o
 obj-y += domctl.o
@@ -27,6 +28,7 @@ obj-y += guest_atomics.o
 obj-y += guest_walk.o
 obj-y += hvm.o
 obj-y += io.o
+obj-$(CONFIG_IOREQ_SERVER) += ioreq.o
 obj-y += irq.o
 obj-y += kernel.init.o
 obj-$(CONFIG_LIVEPATCH) += livepatch.o
diff --git a/xen/arch/arm/dm.c b/xen/arch/arm/dm.c
new file mode 100644
index 0000000..eb20344
--- /dev/null
+++ b/xen/arch/arm/dm.c
@@ -0,0 +1,33 @@
+/*
+ * Copyright (c) 2019 Arm ltd.
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms and conditions of the GNU General Public License,
+ * version 2, as published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope it will be useful, but WITHOUT
+ * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
+ * FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License for
+ * more details.
+ *
+ * You should have received a copy of the GNU General Public License along with
+ * this program; If not, see <http://www.gnu.org/licenses/>.
+ */
+
+#include <xen/hypercall.h>
+
+int arch_dm_op(struct xen_dm_op *op, struct domain *d,
+               const struct dmop_args *op_args, bool *const_op)
+{
+    return -EOPNOTSUPP;
+}
+
+/*
+ * Local variables:
+ * mode: C
+ * c-file-style: "BSD"
+ * c-basic-offset: 4
+ * tab-width: 4
+ * indent-tabs-mode: nil
+ * End:
+ */
diff --git a/xen/arch/arm/domain.c b/xen/arch/arm/domain.c
index 3116932..043db3f 100644
--- a/xen/arch/arm/domain.c
+++ b/xen/arch/arm/domain.c
@@ -14,6 +14,7 @@
 #include <xen/grant_table.h>
 #include <xen/hypercall.h>
 #include <xen/init.h>
+#include <xen/ioreq.h>
 #include <xen/lib.h>
 #include <xen/livepatch.h>
 #include <xen/sched.h>
@@ -681,6 +682,10 @@ int arch_domain_create(struct domain *d,
 
     ASSERT(config != NULL);
 
+#ifdef CONFIG_IOREQ_SERVER
+    hvm_ioreq_init(d);
+#endif
+
     /* p2m_init relies on some value initialized by the IOMMU subsystem */
     if ( (rc = iommu_domain_init(d, config->iommu_opts)) != 0 )
         goto fail;
@@ -999,6 +1004,10 @@ int domain_relinquish_resources(struct domain *d)
         if (ret )
             return ret;
 
+#ifdef CONFIG_IOREQ_SERVER
+        hvm_destroy_all_ioreq_servers(d);
+#endif
+
     PROGRESS(xen):
         ret = relinquish_memory(d, &d->xenpage_list);
         if ( ret )
diff --git a/xen/arch/arm/io.c b/xen/arch/arm/io.c
index ae7ef96..adc9de7 100644
--- a/xen/arch/arm/io.c
+++ b/xen/arch/arm/io.c
@@ -16,6 +16,7 @@
  * GNU General Public License for more details.
  */
 
+#include <xen/ioreq.h>
 #include <xen/lib.h>
 #include <xen/spinlock.h>
 #include <xen/sched.h>
@@ -123,7 +124,15 @@ enum io_state try_handle_mmio(struct cpu_user_regs *regs,
 
     handler = find_mmio_handler(v->domain, info.gpa);
     if ( !handler )
-        return IO_UNHANDLED;
+    {
+        int rc;
+
+        rc = try_fwd_ioserv(regs, v, &info);
+        if ( rc == IO_HANDLED )
+            return handle_ioserv(regs, v);
+
+        return rc;
+    }
 
     /* All the instructions used on emulated MMIO region should be valid */
     if ( !dabt.valid )
diff --git a/xen/arch/arm/ioreq.c b/xen/arch/arm/ioreq.c
new file mode 100644
index 0000000..e493c5b
--- /dev/null
+++ b/xen/arch/arm/ioreq.c
@@ -0,0 +1,142 @@
+/*
+ * arm/ioreq.c: hardware virtual machine I/O emulation
+ *
+ * Copyright (c) 2019 Arm ltd.
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms and conditions of the GNU General Public License,
+ * version 2, as published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope it will be useful, but WITHOUT
+ * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
+ * FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License for
+ * more details.
+ *
+ * You should have received a copy of the GNU General Public License along with
+ * this program; If not, see <http://www.gnu.org/licenses/>.
+ */
+
+#include <xen/domain.h>
+#include <xen/ioreq.h>
+
+#include <public/hvm/ioreq.h>
+
+#include <asm/traps.h>
+
+enum io_state handle_ioserv(struct cpu_user_regs *regs, struct vcpu *v)
+{
+    const union hsr hsr = { .bits = regs->hsr };
+    const struct hsr_dabt dabt = hsr.dabt;
+    /* Code is similar to handle_read */
+    uint8_t size = (1 << dabt.size) * 8;
+    register_t r = v->arch.hvm.hvm_io.io_req.data;
+
+    /* We are done with the IO */
+    v->arch.hvm.hvm_io.io_req.state = STATE_IOREQ_NONE;
+
+    /* XXX: Do we need to take care of write here ? */
+    if ( dabt.write )
+        return IO_HANDLED;
+
+    /*
+     * Sign extend if required.
+     * Note that we expect the read handler to have zeroed the bits
+     * outside the requested access size.
+     */
+    if ( dabt.sign && (r & (1UL << (size - 1))) )
+    {
+        /*
+         * We are relying on register_t using the same as
+         * an unsigned long in order to keep the 32-bit assembly
+         * code smaller.
+         */
+        BUILD_BUG_ON(sizeof(register_t) != sizeof(unsigned long));
+        r |= (~0UL) << size;
+    }
+
+    set_user_reg(regs, dabt.reg, r);
+
+    return IO_HANDLED;
+}
+
+enum io_state try_fwd_ioserv(struct cpu_user_regs *regs,
+                             struct vcpu *v, mmio_info_t *info)
+{
+    struct hvm_vcpu_io *vio = &v->arch.hvm.hvm_io;
+    ioreq_t p = {
+        .type = IOREQ_TYPE_COPY,
+        .addr = info->gpa,
+        .size = 1 << info->dabt.size,
+        .count = 1,
+        .dir = !info->dabt.write,
+        /*
+         * On x86, df is used by 'rep' instruction to tell the direction
+         * to iterate (forward or backward).
+         * On Arm, all the accesses to MMIO region will do a single
+         * memory access. So for now, we can safely always set to 0.
+         */
+        .df = 0,
+        .data = get_user_reg(regs, info->dabt.reg),
+        .state = STATE_IOREQ_READY,
+    };
+    struct hvm_ioreq_server *s = NULL;
+    enum io_state rc;
+
+    switch ( vio->io_req.state )
+    {
+    case STATE_IOREQ_NONE:
+        break;
+
+    case STATE_IORESP_READY:
+        return IO_HANDLED;
+
+    default:
+        gdprintk(XENLOG_ERR, "wrong state %u\n", vio->io_req.state);
+        return IO_ABORT;
+    }
+
+    s = hvm_select_ioreq_server(v->domain, &p);
+    if ( !s )
+        return IO_UNHANDLED;
+
+    if ( !info->dabt.valid )
+        return IO_ABORT;
+
+    vio->io_req = p;
+
+    rc = hvm_send_ioreq(s, &p, 0);
+    if ( rc != IO_RETRY || v->domain->is_shutting_down )
+        vio->io_req.state = STATE_IOREQ_NONE;
+    else if ( !hvm_ioreq_needs_completion(&vio->io_req) )
+        rc = IO_HANDLED;
+    else
+        vio->io_completion = HVMIO_mmio_completion;
+
+    return rc;
+}
+
+bool ioreq_handle_complete_mmio(void)
+{
+    struct vcpu *v = current;
+    struct cpu_user_regs *regs = guest_cpu_user_regs();
+    const union hsr hsr = { .bits = regs->hsr };
+    paddr_t addr = v->arch.hvm.hvm_io.io_req.addr;
+
+    if ( try_handle_mmio(regs, hsr, addr) == IO_HANDLED )
+    {
+        advance_pc(regs, hsr);
+        return true;
+    }
+
+    return false;
+}
+
+/*
+ * Local variables:
+ * mode: C
+ * c-file-style: "BSD"
+ * c-basic-offset: 4
+ * tab-width: 4
+ * indent-tabs-mode: nil
+ * End:
+ */
diff --git a/xen/arch/arm/traps.c b/xen/arch/arm/traps.c
index 8f40d0e..121942c 100644
--- a/xen/arch/arm/traps.c
+++ b/xen/arch/arm/traps.c
@@ -21,6 +21,7 @@
 #include <xen/hypercall.h>
 #include <xen/init.h>
 #include <xen/iocap.h>
+#include <xen/ioreq.h>
 #include <xen/irq.h>
 #include <xen/lib.h>
 #include <xen/mem_access.h>
@@ -1384,6 +1385,9 @@ static arm_hypercall_t arm_hypercall_table[] = {
 #ifdef CONFIG_HYPFS
     HYPERCALL(hypfs_op, 5),
 #endif
+#ifdef CONFIG_IOREQ_SERVER
+    HYPERCALL(dm_op, 3),
+#endif
 };
 
 #ifndef NDEBUG
@@ -1955,9 +1959,14 @@ static void do_trap_stage2_abort_guest(struct cpu_user_regs *regs,
             case IO_HANDLED:
                 advance_pc(regs, hsr);
                 return;
+            case IO_RETRY:
+                /* finish later */
+                return;
             case IO_UNHANDLED:
                 /* IO unhandled, try another way to handle it. */
                 break;
+            default:
+                ASSERT_UNREACHABLE();
             }
         }
 
@@ -2249,12 +2258,23 @@ static void check_for_pcpu_work(void)
  * Process pending work for the vCPU. Any call should be fast or
  * implement preemption.
  */
-static void check_for_vcpu_work(void)
+static bool check_for_vcpu_work(void)
 {
     struct vcpu *v = current;
 
+#ifdef CONFIG_IOREQ_SERVER
+    bool handled;
+
+    local_irq_enable();
+    handled = handle_hvm_io_completion(v);
+    local_irq_disable();
+
+    if ( !handled )
+        return true;
+#endif
+
     if ( likely(!v->arch.need_flush_to_ram) )
-        return;
+        return false;
 
     /*
      * Give a chance for the pCPU to process work before handling the vCPU
@@ -2265,6 +2285,8 @@ static void check_for_vcpu_work(void)
     local_irq_enable();
     p2m_flush_vm(v);
     local_irq_disable();
+
+    return false;
 }
 
 /*
@@ -2277,8 +2299,10 @@ void leave_hypervisor_to_guest(void)
 {
     local_irq_disable();
 
-    check_for_vcpu_work();
-    check_for_pcpu_work();
+    do
+    {
+        check_for_pcpu_work();
+    } while ( check_for_vcpu_work() );
 
     vgic_sync_to_lrs();
 
diff --git a/xen/include/asm-arm/domain.h b/xen/include/asm-arm/domain.h
index 6819a3b..d1c48d7 100644
--- a/xen/include/asm-arm/domain.h
+++ b/xen/include/asm-arm/domain.h
@@ -11,10 +11,27 @@
 #include <asm/vgic.h>
 #include <asm/vpl011.h>
 #include <public/hvm/params.h>
+#include <public/hvm/dm_op.h>
+#include <public/hvm/ioreq.h>
+
+#define MAX_NR_IOREQ_SERVERS 8
 
 struct hvm_domain
 {
     uint64_t              params[HVM_NR_PARAMS];
+
+    /* Guest page range used for non-default ioreq servers */
+    struct {
+        unsigned long base;
+        unsigned long mask;
+        unsigned long legacy_mask; /* indexed by HVM param number */
+    } ioreq_gfn;
+
+    /* Lock protects all other values in the sub-struct and the default */
+    struct {
+        spinlock_t              lock;
+        struct hvm_ioreq_server *server[MAX_NR_IOREQ_SERVERS];
+    } ioreq_server;
 };
 
 #ifdef CONFIG_ARM_64
@@ -91,6 +108,28 @@ struct arch_domain
 #endif
 }  __cacheline_aligned;
 
+enum hvm_io_completion {
+    HVMIO_no_completion,
+    HVMIO_mmio_completion,
+    HVMIO_pio_completion
+};
+
+struct hvm_vcpu_io {
+    /* I/O request in flight to device model. */
+    enum hvm_io_completion io_completion;
+    ioreq_t                io_req;
+
+    /*
+     * HVM emulation:
+     *  Linear address @mmio_gla maps to MMIO physical frame @mmio_gpfn.
+     *  The latter is known to be an MMIO frame (not RAM).
+     *  This translation is only valid for accesses as per @mmio_access.
+     */
+    struct npfec        mmio_access;
+    unsigned long       mmio_gla;
+    unsigned long       mmio_gpfn;
+};
+
 struct arch_vcpu
 {
     struct {
@@ -204,6 +243,11 @@ struct arch_vcpu
      */
     bool need_flush_to_ram;
 
+    struct hvm_vcpu
+    {
+        struct hvm_vcpu_io hvm_io;
+    } hvm;
+
 }  __cacheline_aligned;
 
 void vcpu_show_execution_state(struct vcpu *);
@@ -262,6 +306,8 @@ static inline void arch_vcpu_block(struct vcpu *v) {}
 
 #define arch_vm_assist_valid_mask(d) (1UL << VMASST_TYPE_runstate_update_flag)
 
+#define has_vpci(d)    ({ (void)(d); false; })
+
 #endif /* __ASM_DOMAIN_H__ */
 
 /*
diff --git a/xen/include/asm-arm/hvm/ioreq.h b/xen/include/asm-arm/hvm/ioreq.h
new file mode 100644
index 0000000..1c34df0
--- /dev/null
+++ b/xen/include/asm-arm/hvm/ioreq.h
@@ -0,0 +1,109 @@
+/*
+ * ioreq.h: Hardware virtual machine assist interface definitions.
+ *
+ * Copyright (c) 2016 Citrix Systems Inc.
+ * Copyright (c) 2019 Arm ltd.
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms and conditions of the GNU General Public License,
+ * version 2, as published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope it will be useful, but WITHOUT
+ * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
+ * FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License for
+ * more details.
+ *
+ * You should have received a copy of the GNU General Public License along with
+ * this program; If not, see <http://www.gnu.org/licenses/>.
+ */
+
+#ifndef __ASM_ARM_HVM_IOREQ_H__
+#define __ASM_ARM_HVM_IOREQ_H__
+
+#include <public/hvm/ioreq.h>
+#include <public/hvm/dm_op.h>
+
+#ifdef CONFIG_IOREQ_SERVER
+enum io_state handle_ioserv(struct cpu_user_regs *regs, struct vcpu *v);
+enum io_state try_fwd_ioserv(struct cpu_user_regs *regs,
+                             struct vcpu *v, mmio_info_t *info);
+#else
+static inline enum io_state handle_ioserv(struct cpu_user_regs *regs,
+                                          struct vcpu *v)
+{
+    return IO_UNHANDLED;
+}
+
+static inline enum io_state try_fwd_ioserv(struct cpu_user_regs *regs,
+                                           struct vcpu *v, mmio_info_t *info)
+{
+    return IO_UNHANDLED;
+}
+#endif
+
+bool ioreq_handle_complete_mmio(void);
+
+static inline bool handle_pio(uint16_t port, unsigned int size, int dir)
+{
+    /*
+     * TODO: For Arm64, the main user will be PCI. So this should be
+     * implemented when we add support for vPCI.
+     */
+    BUG();
+    return true;
+}
+
+static inline int arch_hvm_destroy_ioreq_server(struct hvm_ioreq_server *s)
+{
+    return 0;
+}
+
+static inline void msix_write_completion(struct vcpu *v)
+{
+}
+
+static inline bool arch_handle_hvm_io_completion(
+    enum hvm_io_completion io_completion)
+{
+    ASSERT_UNREACHABLE();
+    return true;
+}
+
+static inline int hvm_get_ioreq_server_range_type(struct domain *d,
+                                                  ioreq_t *p,
+                                                  uint8_t *type,
+                                                  uint64_t *addr)
+{
+    if ( p->type != IOREQ_TYPE_COPY && p->type != IOREQ_TYPE_PIO )
+        return -EINVAL;
+
+    *type = (p->type == IOREQ_TYPE_PIO) ?
+             XEN_DMOP_IO_RANGE_PORT : XEN_DMOP_IO_RANGE_MEMORY;
+    *addr = p->addr;
+
+    return 0;
+}
+
+static inline void arch_hvm_ioreq_init(struct domain *d)
+{
+}
+
+static inline void arch_hvm_ioreq_destroy(struct domain *d)
+{
+}
+
+#define IOREQ_IO_HANDLED     IO_HANDLED
+#define IOREQ_IO_UNHANDLED   IO_UNHANDLED
+#define IOREQ_IO_RETRY       IO_RETRY
+
+#endif /* __ASM_ARM_HVM_IOREQ_H__ */
+
+/*
+ * Local variables:
+ * mode: C
+ * c-file-style: "BSD"
+ * c-basic-offset: 4
+ * tab-width: 4
+ * indent-tabs-mode: nil
+ * End:
+ */
diff --git a/xen/include/asm-arm/mmio.h b/xen/include/asm-arm/mmio.h
index 8dbfb27..7ab873c 100644
--- a/xen/include/asm-arm/mmio.h
+++ b/xen/include/asm-arm/mmio.h
@@ -37,6 +37,7 @@ enum io_state
     IO_ABORT,       /* The IO was handled by the helper and led to an abort. */
     IO_HANDLED,     /* The IO was successfully handled by the helper. */
     IO_UNHANDLED,   /* The IO was not handled by the helper. */
+    IO_RETRY,       /* The IO must be retried, e.g. it was forwarded to an IOREQ server */
 };
 
 typedef int (*mmio_read_t)(struct vcpu *v, mmio_info_t *info,
diff --git a/xen/include/asm-arm/paging.h b/xen/include/asm-arm/paging.h
index 6d1a000..0550c55 100644
--- a/xen/include/asm-arm/paging.h
+++ b/xen/include/asm-arm/paging.h
@@ -4,6 +4,10 @@
 #define paging_mode_translate(d)              (1)
 #define paging_mode_external(d)               (1)
 
+static inline void paging_mark_pfn_dirty(struct domain *d, pfn_t pfn)
+{
+}
+
 #endif /* XEN_PAGING_H */
 
 /*
-- 
2.7.4




* [PATCH V1 10/16] xen/mm: Handle properly reference in set_foreign_p2m_entry() on Arm
  2020-09-10 20:21 [PATCH V1 00/16] IOREQ feature (+ virtio-mmio) on Arm Oleksandr Tyshchenko
                   ` (8 preceding siblings ...)
  2020-09-10 20:22 ` [PATCH V1 09/16] arm/ioreq: Introduce arch specific bits for IOREQ/DM features Oleksandr Tyshchenko
@ 2020-09-10 20:22 ` Oleksandr Tyshchenko
  2020-09-16  7:17   ` Jan Beulich
  2020-09-16  8:08   ` Jan Beulich
  2020-09-10 20:22 ` [PATCH V1 11/16] xen/ioreq: Introduce hvm_domain_has_ioreq_server() Oleksandr Tyshchenko
                   ` (5 subsequent siblings)
  15 siblings, 2 replies; 111+ messages in thread
From: Oleksandr Tyshchenko @ 2020-09-10 20:22 UTC (permalink / raw)
  To: xen-devel
  Cc: Oleksandr Tyshchenko, Stefano Stabellini, Julien Grall,
	Volodymyr Babchuk, Andrew Cooper, George Dunlap, Ian Jackson,
	Jan Beulich, Wei Liu, Roger Pau Monné,
	Julien Grall

From: Oleksandr Tyshchenko <oleksandr_tyshchenko@epam.com>

This patch implements reference counting of foreign entries in
set_foreign_p2m_entry() on Arm. This is a mandatory action if
we want to run an emulator (IOREQ server) in a domain other than
dom0, as we can't trust it to do the right thing if it is not
running in dom0. So we need to grab a reference on the page to
prevent it from disappearing.

Testing with the IOREQ feature confirmed that all the pages given
to this function belong to a domain, so we can use the same approach
as for XENMAPSPACE_gmfn_foreign handling in xenmem_add_to_physmap_one().

This involves adding an extra parameter for the foreign domain to
set_foreign_p2m_entry().

Also remove the restriction to the hardware domain in the common code
when running on Arm.
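
For context, the reference taken below is dropped again when the P2M
entry is removed: the existing Arm p2m teardown already handles foreign
mappings, roughly along these lines (simplified from xen/arch/arm/p2m.c,
shown for illustration only, not part of this patch):

static void p2m_put_l3_page(const lpae_t pte)
{
    ASSERT(p2m_is_valid(pte));

    /* Foreign mappings hold a page reference; drop it on removal. */
    if ( p2m_is_foreign(pte.p2m.type) )
    {
        mfn_t mfn = lpae_get_mfn(pte);

        ASSERT(mfn_valid(mfn));
        put_page(mfn_to_page(mfn));
    }
}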

Signed-off-by: Julien Grall <julien.grall@arm.com>
Signed-off-by: Oleksandr Tyshchenko <oleksandr_tyshchenko@epam.com>

---
Please note, this is a split/cleanup/hardening of Julien's PoC:
"Add support for Guest IO forwarding to a device emulator"

Changes RFC -> V1:
   - new patch, was split from:
     "[RFC PATCH V1 04/12] xen/arm: Introduce arch specific bits for IOREQ/DM features"
   - rewrite a logic to handle properly reference in set_foreign_p2m_entry()
     instead of treating foreign entries as p2m_ram_rw
---
---
 xen/arch/arm/p2m.c        | 16 ++++++++++++++++
 xen/arch/x86/mm/p2m.c     |  5 +++--
 xen/common/memory.c       |  4 +++-
 xen/include/asm-arm/p2m.h | 11 ++---------
 xen/include/asm-x86/p2m.h |  3 ++-
 5 files changed, 26 insertions(+), 13 deletions(-)

diff --git a/xen/arch/arm/p2m.c b/xen/arch/arm/p2m.c
index ce59f2b..cb64fc5 100644
--- a/xen/arch/arm/p2m.c
+++ b/xen/arch/arm/p2m.c
@@ -1385,6 +1385,22 @@ int guest_physmap_remove_page(struct domain *d, gfn_t gfn, mfn_t mfn,
     return p2m_remove_mapping(d, gfn, (1 << page_order), mfn);
 }
 
+int set_foreign_p2m_entry(struct domain *d, struct domain *fd,
+                          unsigned long gfn, mfn_t mfn)
+{
+    struct page_info *page = mfn_to_page(mfn);
+    int rc;
+
+    if ( !get_page(page, fd) )
+        return -EINVAL;
+
+    rc = guest_physmap_add_entry(d, _gfn(gfn), mfn, 0, p2m_map_foreign_rw);
+    if ( rc )
+        put_page(page);
+
+    return rc;
+}
+
 static struct page_info *p2m_allocate_root(void)
 {
     struct page_info *page;
diff --git a/xen/arch/x86/mm/p2m.c b/xen/arch/x86/mm/p2m.c
index db7bde0..f27f8a4 100644
--- a/xen/arch/x86/mm/p2m.c
+++ b/xen/arch/x86/mm/p2m.c
@@ -1320,7 +1320,8 @@ static int set_typed_p2m_entry(struct domain *d, unsigned long gfn_l,
 }
 
 /* Set foreign mfn in the given guest's p2m table. */
-int set_foreign_p2m_entry(struct domain *d, unsigned long gfn, mfn_t mfn)
+int set_foreign_p2m_entry(struct domain *d, struct domain *fd,
+                          unsigned long gfn, mfn_t mfn)
 {
     return set_typed_p2m_entry(d, gfn, mfn, PAGE_ORDER_4K, p2m_map_foreign,
                                p2m_get_hostp2m(d)->default_access);
@@ -2619,7 +2620,7 @@ int p2m_add_foreign(struct domain *tdom, unsigned long fgfn,
      * will update the m2p table which will result in  mfn -> gpfn of dom0
      * and not fgfn of domU.
      */
-    rc = set_foreign_p2m_entry(tdom, gpfn, mfn);
+    rc = set_foreign_p2m_entry(tdom, fdom, gpfn, mfn);
     if ( rc )
         gdprintk(XENLOG_WARNING, "set_foreign_p2m_entry failed. "
                  "gpfn:%lx mfn:%lx fgfn:%lx td:%d fd:%d\n",
diff --git a/xen/common/memory.c b/xen/common/memory.c
index e551fa6..78781f1 100644
--- a/xen/common/memory.c
+++ b/xen/common/memory.c
@@ -1155,6 +1155,7 @@ static int acquire_resource(
         xen_pfn_t gfn_list[ARRAY_SIZE(mfn_list)];
         unsigned int i;
 
+#ifndef CONFIG_ARM
         /*
          * FIXME: Until foreign pages inserted into the P2M are properly
          *        reference counted, it is unsafe to allow mapping of
@@ -1162,13 +1163,14 @@ static int acquire_resource(
          */
         if ( !is_hardware_domain(currd) )
             return -EACCES;
+#endif
 
         if ( copy_from_guest(gfn_list, xmar.frame_list, xmar.nr_frames) )
             rc = -EFAULT;
 
         for ( i = 0; !rc && i < xmar.nr_frames; i++ )
         {
-            rc = set_foreign_p2m_entry(currd, gfn_list[i],
+            rc = set_foreign_p2m_entry(currd, d, gfn_list[i],
                                        _mfn(mfn_list[i]));
             /* rc should be -EIO for any iteration other than the first */
             if ( rc && i )
diff --git a/xen/include/asm-arm/p2m.h b/xen/include/asm-arm/p2m.h
index 5fdb6e8..53ce373 100644
--- a/xen/include/asm-arm/p2m.h
+++ b/xen/include/asm-arm/p2m.h
@@ -381,15 +381,8 @@ static inline gfn_t gfn_next_boundary(gfn_t gfn, unsigned int order)
     return gfn_add(gfn, 1UL << order);
 }
 
-static inline int set_foreign_p2m_entry(struct domain *d, unsigned long gfn,
-                                        mfn_t mfn)
-{
-    /*
-     * NOTE: If this is implemented then proper reference counting of
-     *       foreign entries will need to be implemented.
-     */
-    return -EOPNOTSUPP;
-}
+int set_foreign_p2m_entry(struct domain *d, struct domain *fd,
+                          unsigned long gfn, mfn_t mfn);
 
 /*
  * A vCPU has cache enabled only when the MMU is enabled and data cache
diff --git a/xen/include/asm-x86/p2m.h b/xen/include/asm-x86/p2m.h
index 8abae34..23bdca1 100644
--- a/xen/include/asm-x86/p2m.h
+++ b/xen/include/asm-x86/p2m.h
@@ -635,7 +635,8 @@ int p2m_is_logdirty_range(struct p2m_domain *, unsigned long start,
                           unsigned long end);
 
 /* Set foreign entry in the p2m table (for priv-mapping) */
-int set_foreign_p2m_entry(struct domain *d, unsigned long gfn, mfn_t mfn);
+int set_foreign_p2m_entry(struct domain *d, struct domain *fd,
+                          unsigned long gfn, mfn_t mfn);
 
 /* Set mmio addresses in the p2m table (for pass-through) */
 int set_mmio_p2m_entry(struct domain *d, gfn_t gfn, mfn_t mfn,
-- 
2.7.4




* [PATCH V1 11/16] xen/ioreq: Introduce hvm_domain_has_ioreq_server()
  2020-09-10 20:21 [PATCH V1 00/16] IOREQ feature (+ virtio-mmio) on Arm Oleksandr Tyshchenko
                   ` (9 preceding siblings ...)
  2020-09-10 20:22 ` [PATCH V1 10/16] xen/mm: Handle properly reference in set_foreign_p2m_entry() on Arm Oleksandr Tyshchenko
@ 2020-09-10 20:22 ` Oleksandr Tyshchenko
  2020-09-16  8:04   ` Jan Beulich
  2020-09-10 20:22 ` [PATCH V1 12/16] xen/dm: Introduce xendevicemodel_set_irq_level DM op Oleksandr Tyshchenko
                   ` (4 subsequent siblings)
  15 siblings, 1 reply; 111+ messages in thread
From: Oleksandr Tyshchenko @ 2020-09-10 20:22 UTC (permalink / raw)
  To: xen-devel
  Cc: Oleksandr Tyshchenko, Stefano Stabellini, Julien Grall,
	Volodymyr Babchuk, Paul Durrant, Jan Beulich, Andrew Cooper,
	Wei Liu, Roger Pau Monné,
	Julien Grall

From: Oleksandr Tyshchenko <oleksandr_tyshchenko@epam.com>

This patch introduces a helper whose main purpose is to check
whether a domain is using IOREQ server(s).

On Arm the benefit is to avoid calling handle_hvm_io_completion()
(which implies iterating over all possible IOREQ servers anyway)
on every return in leave_hypervisor_to_guest() if there are no
active servers for the particular domain.

This involves adding an extra per-domain variable to store the count
of servers in use.
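
To illustrate the cost being avoided: handle_hvm_io_completion() walks
every server slot via a loop roughly like the one below (as in the
common IOREQ code), even when no server was ever registered. With the
new counter, the Arm exit path can skip all of it with one comparison.

#define FOR_EACH_IOREQ_SERVER(d, id, s) \
    for ( (id) = MAX_NR_IOREQ_SERVERS; (id) != 0; ) \
        if ( !(s = GET_IOREQ_SERVER(d, --(id))) ) \
            continue; \
        else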

Signed-off-by: Julien Grall <julien.grall@arm.com>
Signed-off-by: Oleksandr Tyshchenko <oleksandr_tyshchenko@epam.com>

---
Please note, this is a split/cleanup/hardening of Julien's PoC:
"Add support for Guest IO forwarding to a device emulator"

Changes RFC -> V1:
   - new patch
---
---
 xen/arch/arm/traps.c             | 15 +++++++++------
 xen/common/ioreq.c               |  9 ++++++++-
 xen/include/asm-arm/domain.h     |  1 +
 xen/include/asm-x86/hvm/domain.h |  1 +
 xen/include/xen/ioreq.h          |  5 +++++
 5 files changed, 24 insertions(+), 7 deletions(-)

diff --git a/xen/arch/arm/traps.c b/xen/arch/arm/traps.c
index 121942c..6b37ae1 100644
--- a/xen/arch/arm/traps.c
+++ b/xen/arch/arm/traps.c
@@ -2263,14 +2263,17 @@ static bool check_for_vcpu_work(void)
     struct vcpu *v = current;
 
 #ifdef CONFIG_IOREQ_SERVER
-    bool handled;
+    if ( hvm_domain_has_ioreq_server(v->domain) )
+    {
+        bool handled;
 
-    local_irq_enable();
-    handled = handle_hvm_io_completion(v);
-    local_irq_disable();
+        local_irq_enable();
+        handled = handle_hvm_io_completion(v);
+        local_irq_disable();
 
-    if ( !handled )
-        return true;
+        if ( !handled )
+            return true;
+    }
 #endif
 
     if ( likely(!v->arch.need_flush_to_ram) )
diff --git a/xen/common/ioreq.c b/xen/common/ioreq.c
index ce12751..4c3a835 100644
--- a/xen/common/ioreq.c
+++ b/xen/common/ioreq.c
@@ -38,9 +38,15 @@ static void set_ioreq_server(struct domain *d, unsigned int id,
                              struct hvm_ioreq_server *s)
 {
     ASSERT(id < MAX_NR_IOREQ_SERVERS);
-    ASSERT(!s || !d->arch.hvm.ioreq_server.server[id]);
+    ASSERT((!s && d->arch.hvm.ioreq_server.server[id]) ||
+           (s && !d->arch.hvm.ioreq_server.server[id]));
 
     d->arch.hvm.ioreq_server.server[id] = s;
+
+    if ( s )
+        d->arch.hvm.ioreq_server.nr_servers++;
+    else
+        d->arch.hvm.ioreq_server.nr_servers--;
 }
 
 /*
@@ -1395,6 +1401,7 @@ unsigned int hvm_broadcast_ioreq(ioreq_t *p, bool buffered)
 void hvm_ioreq_init(struct domain *d)
 {
     spin_lock_init(&d->arch.hvm.ioreq_server.lock);
+    d->arch.hvm.ioreq_server.nr_servers = 0;
 
     arch_hvm_ioreq_init(d);
 }
diff --git a/xen/include/asm-arm/domain.h b/xen/include/asm-arm/domain.h
index d1c48d7..0c0506a 100644
--- a/xen/include/asm-arm/domain.h
+++ b/xen/include/asm-arm/domain.h
@@ -31,6 +31,7 @@ struct hvm_domain
     struct {
         spinlock_t              lock;
         struct hvm_ioreq_server *server[MAX_NR_IOREQ_SERVERS];
+        unsigned int            nr_servers;
     } ioreq_server;
 };
 
diff --git a/xen/include/asm-x86/hvm/domain.h b/xen/include/asm-x86/hvm/domain.h
index 765f35c..79e0afb 100644
--- a/xen/include/asm-x86/hvm/domain.h
+++ b/xen/include/asm-x86/hvm/domain.h
@@ -77,6 +77,7 @@ struct hvm_domain {
     struct {
         spinlock_t              lock;
         struct hvm_ioreq_server *server[MAX_NR_IOREQ_SERVERS];
+        unsigned int            nr_servers;
     } ioreq_server;
 
     /* Cached CF8 for guest PCI config cycles */
diff --git a/xen/include/xen/ioreq.h b/xen/include/xen/ioreq.h
index 102f7e8..25ce4c2 100644
--- a/xen/include/xen/ioreq.h
+++ b/xen/include/xen/ioreq.h
@@ -57,6 +57,11 @@ struct hvm_ioreq_server {
     uint8_t                bufioreq_handling;
 };
 
+static inline bool hvm_domain_has_ioreq_server(const struct domain *d)
+{
+    return (d->arch.hvm.ioreq_server.nr_servers > 0);
+}
+
 #define GET_IOREQ_SERVER(d, id) \
     (d)->arch.hvm.ioreq_server.server[id]
 
-- 
2.7.4




* [PATCH V1 12/16] xen/dm: Introduce xendevicemodel_set_irq_level DM op
  2020-09-10 20:21 [PATCH V1 00/16] IOREQ feature (+ virtio-mmio) on Arm Oleksandr Tyshchenko
                   ` (10 preceding siblings ...)
  2020-09-10 20:22 ` [PATCH V1 11/16] xen/ioreq: Introduce hvm_domain_has_ioreq_server() Oleksandr Tyshchenko
@ 2020-09-10 20:22 ` Oleksandr Tyshchenko
  2020-09-26 13:50   ` Julien Grall
  2020-09-10 20:22 ` [PATCH V1 13/16] xen/ioreq: Make x86's invalidate qemu mapcache handling common Oleksandr Tyshchenko
                   ` (3 subsequent siblings)
  15 siblings, 1 reply; 111+ messages in thread
From: Oleksandr Tyshchenko @ 2020-09-10 20:22 UTC (permalink / raw)
  To: xen-devel
  Cc: Oleksandr Tyshchenko, Ian Jackson, Wei Liu, Andrew Cooper,
	George Dunlap, Jan Beulich, Julien Grall, Stefano Stabellini,
	Volodymyr Babchuk, Julien Grall

From: Oleksandr Tyshchenko <oleksandr_tyshchenko@epam.com>

This patch adds the ability for a device emulator to notify the
other end (some entity running in the guest) using an SPI, and
implements the Arm specific bits for it. The proposed interface
allows the emulator to set the logical level of one of a domain's
IRQ lines.
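
For illustration, a backend could drive the new op as below
(hypothetical snippet, not part of this patch; pulse_spi() is made up
for the example and assumes an edge-triggered SPI):

#include <xendevicemodel.h>

static int pulse_spi(domid_t domid, uint32_t spi)
{
    xendevicemodel_handle *dmod = xendevicemodel_open(NULL, 0);
    int rc;

    if ( !dmod )
        return -1;

    /* Assert, then deassert, the guest's IRQ line. */
    rc = xendevicemodel_set_irq_level(dmod, domid, spi, 1);
    if ( !rc )
        rc = xendevicemodel_set_irq_level(dmod, domid, spi, 0);

    xendevicemodel_close(dmod);
    return rc;
}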

Signed-off-by: Julien Grall <julien.grall@arm.com>
Signed-off-by: Oleksandr Tyshchenko <oleksandr_tyshchenko@epam.com>

---
Please note, this is a split/cleanup/hardening of Julien's PoC:
"Add support for Guest IO forwarding to a device emulator"

Please note, I left the interface untouched since there is still
an open discussion about what interface to use/what information to
pass to the hypervisor. In particular, the question is whether we
should abstract away the state of the line or not.

Changes RFC -> V1:
   - check incoming parameters in arch_dm_op()
   - add explicit padding to struct xen_dm_op_set_irq_level
---
---
 tools/libs/devicemodel/core.c                   | 18 +++++++++++++
 tools/libs/devicemodel/include/xendevicemodel.h |  4 +++
 tools/libs/devicemodel/libxendevicemodel.map    |  1 +
 xen/arch/arm/dm.c                               | 35 +++++++++++++++++++++++-
 xen/common/dm.c                                 |  1 +
 xen/include/public/hvm/dm_op.h                  | 15 +++++++++++
 6 files changed, 73 insertions(+), 1 deletion(-)

diff --git a/tools/libs/devicemodel/core.c b/tools/libs/devicemodel/core.c
index 4d40639..30bd79f 100644
--- a/tools/libs/devicemodel/core.c
+++ b/tools/libs/devicemodel/core.c
@@ -430,6 +430,24 @@ int xendevicemodel_set_isa_irq_level(
     return xendevicemodel_op(dmod, domid, 1, &op, sizeof(op));
 }
 
+int xendevicemodel_set_irq_level(
+    xendevicemodel_handle *dmod, domid_t domid, uint32_t irq,
+    unsigned int level)
+{
+    struct xen_dm_op op;
+    struct xen_dm_op_set_irq_level *data;
+
+    memset(&op, 0, sizeof(op));
+
+    op.op = XEN_DMOP_set_irq_level;
+    data = &op.u.set_irq_level;
+
+    data->irq = irq;
+    data->level = level;
+
+    return xendevicemodel_op(dmod, domid, 1, &op, sizeof(op));
+}
+
 int xendevicemodel_set_pci_link_route(
     xendevicemodel_handle *dmod, domid_t domid, uint8_t link, uint8_t irq)
 {
diff --git a/tools/libs/devicemodel/include/xendevicemodel.h b/tools/libs/devicemodel/include/xendevicemodel.h
index e877f5c..c06b3c8 100644
--- a/tools/libs/devicemodel/include/xendevicemodel.h
+++ b/tools/libs/devicemodel/include/xendevicemodel.h
@@ -209,6 +209,10 @@ int xendevicemodel_set_isa_irq_level(
     xendevicemodel_handle *dmod, domid_t domid, uint8_t irq,
     unsigned int level);
 
+int xendevicemodel_set_irq_level(
+    xendevicemodel_handle *dmod, domid_t domid, uint32_t irq,
+    unsigned int level);
+
 /**
  * This function maps a PCI INTx line to a an IRQ line.
  *
diff --git a/tools/libs/devicemodel/libxendevicemodel.map b/tools/libs/devicemodel/libxendevicemodel.map
index 561c62d..a0c3012 100644
--- a/tools/libs/devicemodel/libxendevicemodel.map
+++ b/tools/libs/devicemodel/libxendevicemodel.map
@@ -32,6 +32,7 @@ VERS_1.2 {
 	global:
 		xendevicemodel_relocate_memory;
 		xendevicemodel_pin_memory_cacheattr;
+		xendevicemodel_set_irq_level;
 } VERS_1.1;
 
 VERS_1.3 {
diff --git a/xen/arch/arm/dm.c b/xen/arch/arm/dm.c
index eb20344..428ef98 100644
--- a/xen/arch/arm/dm.c
+++ b/xen/arch/arm/dm.c
@@ -15,11 +15,44 @@
  */
 
 #include <xen/hypercall.h>
+#include <asm/vgic.h>
 
 int arch_dm_op(struct xen_dm_op *op, struct domain *d,
                const struct dmop_args *op_args, bool *const_op)
 {
-    return -EOPNOTSUPP;
+    int rc;
+
+    switch ( op->op )
+    {
+    case XEN_DMOP_set_irq_level:
+    {
+        const struct xen_dm_op_set_irq_level *data =
+            &op->u.set_irq_level;
+
+        /* Only SPIs are supported */
+        if ( (data->irq < NR_LOCAL_IRQS) || (data->irq >= vgic_num_irqs(d)) )
+        {
+            rc = -EINVAL;
+            break;
+        }
+
+        if ( data->level != 0 && data->level != 1 )
+        {
+            rc = -EINVAL;
+            break;
+        }
+
+        vgic_inject_irq(d, NULL, data->irq, data->level);
+        rc = 0;
+        break;
+    }
+
+    default:
+        rc = -EOPNOTSUPP;
+        break;
+    }
+
+    return rc;
 }
 
 /*
diff --git a/xen/common/dm.c b/xen/common/dm.c
index 060731d..c55e042 100644
--- a/xen/common/dm.c
+++ b/xen/common/dm.c
@@ -47,6 +47,7 @@ static int dm_op(const struct dmop_args *op_args)
         [XEN_DMOP_remote_shutdown]                  = sizeof(struct xen_dm_op_remote_shutdown),
         [XEN_DMOP_relocate_memory]                  = sizeof(struct xen_dm_op_relocate_memory),
         [XEN_DMOP_pin_memory_cacheattr]             = sizeof(struct xen_dm_op_pin_memory_cacheattr),
+        [XEN_DMOP_set_irq_level]                    = sizeof(struct xen_dm_op_set_irq_level),
     };
 
     rc = rcu_lock_remote_domain_by_id(op_args->domid, &d);
diff --git a/xen/include/public/hvm/dm_op.h b/xen/include/public/hvm/dm_op.h
index fd00e9d..39567bf 100644
--- a/xen/include/public/hvm/dm_op.h
+++ b/xen/include/public/hvm/dm_op.h
@@ -417,6 +417,20 @@ struct xen_dm_op_pin_memory_cacheattr {
     uint32_t pad;
 };
 
+/*
+ * XEN_DMOP_set_irq_level: Set the logical level of one of a domain's
+ *                         IRQ lines.
+ * XXX Handle PPIs.
+ */
+#define XEN_DMOP_set_irq_level 19
+
+struct xen_dm_op_set_irq_level {
+    uint32_t irq;
+    /* IN - Level: 0 -> deasserted, 1 -> asserted */
+    uint8_t level;
+    uint8_t pad[3];
+};
+
 struct xen_dm_op {
     uint32_t op;
     uint32_t pad;
@@ -430,6 +444,7 @@ struct xen_dm_op {
         struct xen_dm_op_track_dirty_vram track_dirty_vram;
         struct xen_dm_op_set_pci_intx_level set_pci_intx_level;
         struct xen_dm_op_set_isa_irq_level set_isa_irq_level;
+        struct xen_dm_op_set_irq_level set_irq_level;
         struct xen_dm_op_set_pci_link_route set_pci_link_route;
         struct xen_dm_op_modified_memory modified_memory;
         struct xen_dm_op_set_mem_type set_mem_type;
-- 
2.7.4




* [PATCH V1 13/16] xen/ioreq: Make x86's invalidate qemu mapcache handling common
  2020-09-10 20:21 [PATCH V1 00/16] IOREQ feature (+ virtio-mmio) on Arm Oleksandr Tyshchenko
                   ` (11 preceding siblings ...)
  2020-09-10 20:22 ` [PATCH V1 12/16] xen/dm: Introduce xendevicemodel_set_irq_level DM op Oleksandr Tyshchenko
@ 2020-09-10 20:22 ` Oleksandr Tyshchenko
  2020-09-16  8:50   ` Jan Beulich
  2020-09-10 20:22 ` [PATCH V1 14/16] xen/ioreq: Use guest_cmpxchg64() instead of cmpxchg() Oleksandr Tyshchenko
                   ` (2 subsequent siblings)
  15 siblings, 1 reply; 111+ messages in thread
From: Oleksandr Tyshchenko @ 2020-09-10 20:22 UTC (permalink / raw)
  To: xen-devel
  Cc: Oleksandr Tyshchenko, Stefano Stabellini, Julien Grall,
	Volodymyr Babchuk, Andrew Cooper, George Dunlap, Ian Jackson,
	Jan Beulich, Wei Liu, Roger Pau Monné,
	Paul Durrant, Julien Grall

From: Oleksandr Tyshchenko <oleksandr_tyshchenko@epam.com>

As IOREQ is a common feature now, and on Arm we also need to
invalidate the qemu mapcache on XENMEM_decrease_reservation, this
patch moves this handling to the common code and moves the per-domain
qemu_mapcache_invalidate variable out of the arch sub-struct.
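
For reference, a device model reacts to the resulting broadcast by
dropping all of its cached guest mappings (QEMU does this via
xen_invalidate_map_cache()). A minimal sketch of the receiving side,
where mapcache_invalidate_all() is an illustrative stand-in:

static void handle_ioreq(ioreq_t *req)
{
    switch ( req->type )
    {
    case IOREQ_TYPE_INVALIDATE:
        /* The guest's memory layout changed; drop every cached mapping. */
        mapcache_invalidate_all();
        break;

    default:
        /* MMIO/PIO dispatch elided. */
        break;
    }
}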

Signed-off-by: Julien Grall <julien.grall@arm.com>
Signed-off-by: Oleksandr Tyshchenko <oleksandr_tyshchenko@epam.com>

---
Please note, this is a split/cleanup/hardening of Julien's PoC:
"Add support for Guest IO forwarding to a device emulator"

Changes RFC -> V1:
   - move send_invalidate_req() to the common code
   - update patch subject/description
   - move qemu_mapcache_invalidate out of the arch sub-struct,
     update checks
   - remove #if defined(CONFIG_ARM64) from the common code
---
---
 xen/arch/arm/traps.c             |  6 ++++++
 xen/arch/x86/hvm/hypercall.c     |  9 ++++-----
 xen/arch/x86/hvm/io.c            | 14 --------------
 xen/common/ioreq.c               | 14 ++++++++++++++
 xen/common/memory.c              |  5 +++++
 xen/include/asm-x86/hvm/domain.h |  1 -
 xen/include/asm-x86/hvm/io.h     |  1 -
 xen/include/xen/ioreq.h          |  2 ++
 xen/include/xen/sched.h          |  2 ++
 9 files changed, 33 insertions(+), 21 deletions(-)

diff --git a/xen/arch/arm/traps.c b/xen/arch/arm/traps.c
index 6b37ae1..de48b2f 100644
--- a/xen/arch/arm/traps.c
+++ b/xen/arch/arm/traps.c
@@ -1490,6 +1490,12 @@ static void do_trap_hypercall(struct cpu_user_regs *regs, register_t *nr,
     /* Ensure the hypercall trap instruction is re-executed. */
     if ( current->hcall_preempted )
         regs->pc -= 4;  /* re-execute 'hvc #XEN_HYPERCALL_TAG' */
+
+#ifdef CONFIG_IOREQ_SERVER
+    if ( unlikely(current->domain->qemu_mapcache_invalidate) &&
+         test_and_clear_bool(current->domain->qemu_mapcache_invalidate) )
+        send_invalidate_req();
+#endif
 }
 
 void arch_hypercall_tasklet_result(struct vcpu *v, long res)
diff --git a/xen/arch/x86/hvm/hypercall.c b/xen/arch/x86/hvm/hypercall.c
index b6ccaf4..45fc20b 100644
--- a/xen/arch/x86/hvm/hypercall.c
+++ b/xen/arch/x86/hvm/hypercall.c
@@ -18,8 +18,10 @@
  *
  * Copyright (c) 2017 Citrix Systems Ltd.
  */
+
 #include <xen/lib.h>
 #include <xen/hypercall.h>
+#include <xen/ioreq.h>
 #include <xen/nospec.h>
 
 #include <asm/hvm/emulate.h>
@@ -46,9 +48,6 @@ static long hvm_memory_op(int cmd, XEN_GUEST_HANDLE_PARAM(void) arg)
     else
         rc = compat_memory_op(cmd, arg);
 
-    if ( (cmd & MEMOP_CMD_MASK) == XENMEM_decrease_reservation )
-        curr->domain->arch.hvm.qemu_mapcache_invalidate = true;
-
     return rc;
 }
 
@@ -329,8 +328,8 @@ int hvm_hypercall(struct cpu_user_regs *regs)
     if ( curr->hcall_preempted )
         return HVM_HCALL_preempted;
 
-    if ( unlikely(currd->arch.hvm.qemu_mapcache_invalidate) &&
-         test_and_clear_bool(currd->arch.hvm.qemu_mapcache_invalidate) )
+    if ( unlikely(currd->qemu_mapcache_invalidate) &&
+         test_and_clear_bool(currd->qemu_mapcache_invalidate) )
         send_invalidate_req();
 
     return HVM_HCALL_completed;
diff --git a/xen/arch/x86/hvm/io.c b/xen/arch/x86/hvm/io.c
index 14f8c89..e659a53 100644
--- a/xen/arch/x86/hvm/io.c
+++ b/xen/arch/x86/hvm/io.c
@@ -64,20 +64,6 @@ void send_timeoffset_req(unsigned long timeoff)
         gprintk(XENLOG_ERR, "Unsuccessful timeoffset update\n");
 }
 
-/* Ask ioemu mapcache to invalidate mappings. */
-void send_invalidate_req(void)
-{
-    ioreq_t p = {
-        .type = IOREQ_TYPE_INVALIDATE,
-        .size = 4,
-        .dir = IOREQ_WRITE,
-        .data = ~0UL, /* flush all */
-    };
-
-    if ( hvm_broadcast_ioreq(&p, false) != 0 )
-        gprintk(XENLOG_ERR, "Unsuccessful map-cache invalidate\n");
-}
-
 bool hvm_emulate_one_insn(hvm_emulate_validate_t *validate, const char *descr)
 {
     struct hvm_emulate_ctxt ctxt;
diff --git a/xen/common/ioreq.c b/xen/common/ioreq.c
index 4c3a835..e24a481 100644
--- a/xen/common/ioreq.c
+++ b/xen/common/ioreq.c
@@ -34,6 +34,20 @@
 #include <public/hvm/ioreq.h>
 #include <public/hvm/params.h>
 
+/* Ask ioemu mapcache to invalidate mappings. */
+void send_invalidate_req(void)
+{
+    ioreq_t p = {
+        .type = IOREQ_TYPE_INVALIDATE,
+        .size = 4,
+        .dir = IOREQ_WRITE,
+        .data = ~0UL, /* flush all */
+    };
+
+    if ( hvm_broadcast_ioreq(&p, false) != 0 )
+        gprintk(XENLOG_ERR, "Unsuccessful map-cache invalidate\n");
+}
+
 static void set_ioreq_server(struct domain *d, unsigned int id,
                              struct hvm_ioreq_server *s)
 {
diff --git a/xen/common/memory.c b/xen/common/memory.c
index 78781f1..9d98252 100644
--- a/xen/common/memory.c
+++ b/xen/common/memory.c
@@ -1651,6 +1651,11 @@ long do_memory_op(unsigned long cmd, XEN_GUEST_HANDLE_PARAM(void) arg)
         break;
     }
 
+#ifdef CONFIG_IOREQ_SERVER
+    if ( op == XENMEM_decrease_reservation )
+        curr_d->qemu_mapcache_invalidate = true;
+#endif
+
     return rc;
 }
 
diff --git a/xen/include/asm-x86/hvm/domain.h b/xen/include/asm-x86/hvm/domain.h
index 79e0afb..11d5cc1 100644
--- a/xen/include/asm-x86/hvm/domain.h
+++ b/xen/include/asm-x86/hvm/domain.h
@@ -131,7 +131,6 @@ struct hvm_domain {
 
     struct viridian_domain *viridian;
 
-    bool_t                 qemu_mapcache_invalidate;
     bool_t                 is_s3_suspended;
 
     /*
diff --git a/xen/include/asm-x86/hvm/io.h b/xen/include/asm-x86/hvm/io.h
index fb64294..3da0136 100644
--- a/xen/include/asm-x86/hvm/io.h
+++ b/xen/include/asm-x86/hvm/io.h
@@ -97,7 +97,6 @@ bool relocate_portio_handler(
     unsigned int size);
 
 void send_timeoffset_req(unsigned long timeoff);
-void send_invalidate_req(void);
 bool handle_mmio_with_translation(unsigned long gla, unsigned long gpfn,
                                   struct npfec);
 bool handle_pio(uint16_t port, unsigned int size, int dir);
diff --git a/xen/include/xen/ioreq.h b/xen/include/xen/ioreq.h
index 25ce4c2..5ade9b0 100644
--- a/xen/include/xen/ioreq.h
+++ b/xen/include/xen/ioreq.h
@@ -97,6 +97,8 @@ static inline bool hvm_ioreq_needs_completion(const ioreq_t *ioreq)
            (ioreq->type != IOREQ_TYPE_PIO || ioreq->dir != IOREQ_WRITE);
 }
 
+void send_invalidate_req(void);
+
 bool hvm_io_pending(struct vcpu *v);
 bool handle_hvm_io_completion(struct vcpu *v);
 bool is_ioreq_server_page(struct domain *d, const struct page_info *page);
diff --git a/xen/include/xen/sched.h b/xen/include/xen/sched.h
index ac53519..4c52a04 100644
--- a/xen/include/xen/sched.h
+++ b/xen/include/xen/sched.h
@@ -512,6 +512,8 @@ struct domain
     /* Argo interdomain communication support */
     struct argo_domain *argo;
 #endif
+
+    bool qemu_mapcache_invalidate;
 };
 
 static inline struct page_list_head *page_to_list(
-- 
2.7.4




* [PATCH V1 14/16] xen/ioreq: Use guest_cmpxchg64() instead of cmpxchg()
  2020-09-10 20:21 [PATCH V1 00/16] IOREQ feature (+ virtio-mmio) on Arm Oleksandr Tyshchenko
                   ` (12 preceding siblings ...)
  2020-09-10 20:22 ` [PATCH V1 13/16] xen/ioreq: Make x86's invalidate qemu mapcache handling common Oleksandr Tyshchenko
@ 2020-09-10 20:22 ` Oleksandr Tyshchenko
  2020-09-16  9:04   ` Jan Beulich
  2020-09-23 18:05   ` Julien Grall
  2020-09-10 20:22 ` [PATCH V1 15/16] libxl: Introduce basic virtio-mmio support on Arm Oleksandr Tyshchenko
  2020-09-10 20:22 ` [PATCH V1 16/16] [RFC] libxl: Add support for virtio-disk configuration Oleksandr Tyshchenko
  15 siblings, 2 replies; 111+ messages in thread
From: Oleksandr Tyshchenko @ 2020-09-10 20:22 UTC (permalink / raw)
  To: xen-devel
  Cc: Oleksandr Tyshchenko, Paul Durrant, Julien Grall,
	Stefano Stabellini, Julien Grall

From: Oleksandr Tyshchenko <oleksandr_tyshchenko@epam.com>

The cmpxchg() in hvm_send_buffered_ioreq() operates on memory shared
with the emulator. In order to be on the safe side we need to switch
to guest_cmpxchg64() to prevent a domain from DoSing Xen on Arm: a
plain cmpxchg() can in principle loop forever if the other side keeps
the cache line contended, whereas the guest_* helpers bound the time
spent retrying (pausing the domain if necessary).

CC: Julien Grall <jgrall@amazon.com>
Signed-off-by: Oleksandr Tyshchenko <oleksandr_tyshchenko@epam.com>

---
Please note, this patch depends on the following patch on a review:
https://patchwork.kernel.org/patch/11715559/

Changes RFC -> V1:
   - new patch
---
---
 xen/common/ioreq.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/xen/common/ioreq.c b/xen/common/ioreq.c
index e24a481..645d8a1 100644
--- a/xen/common/ioreq.c
+++ b/xen/common/ioreq.c
@@ -30,6 +30,8 @@
 #include <xen/trace.h>
 #include <xen/vpci.h>
 
+#include <asm/guest_atomics.h>
+
 #include <public/hvm/dm_op.h>
 #include <public/hvm/ioreq.h>
 #include <public/hvm/params.h>
@@ -1325,7 +1327,7 @@ static int hvm_send_buffered_ioreq(struct hvm_ioreq_server *s, ioreq_t *p)
 
         new.read_pointer = old.read_pointer - n * IOREQ_BUFFER_SLOT_NUM;
         new.write_pointer = old.write_pointer - n * IOREQ_BUFFER_SLOT_NUM;
-        cmpxchg(&pg->ptrs.full, old.full, new.full);
+        guest_cmpxchg64(d, &pg->ptrs.full, old.full, new.full);
     }
 
     notify_via_xen_event_channel(d, s->bufioreq_evtchn);
-- 
2.7.4




* [PATCH V1 15/16] libxl: Introduce basic virtio-mmio support on Arm
  2020-09-10 20:21 [PATCH V1 00/16] IOREQ feature (+ virtio-mmio) on Arm Oleksandr Tyshchenko
                   ` (13 preceding siblings ...)
  2020-09-10 20:22 ` [PATCH V1 14/16] xen/ioreq: Use guest_cmpxchg64() instead of cmpxchg() Oleksandr Tyshchenko
@ 2020-09-10 20:22 ` Oleksandr Tyshchenko
  2020-09-10 20:22 ` [PATCH V1 16/16] [RFC] libxl: Add support for virtio-disk configuration Oleksandr Tyshchenko
  15 siblings, 0 replies; 111+ messages in thread
From: Oleksandr Tyshchenko @ 2020-09-10 20:22 UTC (permalink / raw)
  To: xen-devel
  Cc: Oleksandr Tyshchenko, Ian Jackson, Wei Liu, Anthony PERARD,
	Stefano Stabellini, Julien Grall, Volodymyr Babchuk,
	Julien Grall

From: Oleksandr Tyshchenko <oleksandr_tyshchenko@epam.com>

This patch creates a specific device node in the guest device-tree
with the allocated MMIO range and SPI interrupt if the 'virtio'
property is present in the domain config (i.e. virtio = 1 in the
xl cfg file).
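
With the defaults below, the generated node should look roughly like
this in device-tree source form (illustrative; the interrupt specifier
encodes SPI 33 as GIC SPI number 1, edge rising, cpumask 0xf, following
libxl's set_interrupt() encoding):

virtio@2000000 {
    compatible = "virtio,mmio";
    reg = <0x0 0x2000000 0x0 0x200>;
    interrupts = <0x0 0x1 0xf01>;
    dma-coherent;
};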

Signed-off-by: Julien Grall <julien.grall@arm.com>
Signed-off-by: Oleksandr Tyshchenko <oleksandr_tyshchenko@epam.com>

---
Please note, this is a split/cleanup/hardening of Julien's PoC:
"Add support for Guest IO forwarding to a device emulator"

Changes RFC -> V1:
   - was squashed with:
     "[RFC PATCH V1 09/12] libxl: Handle virtio-mmio irq in more correct way"
     "[RFC PATCH V1 11/12] libxl: Insert "dma-coherent" property into virtio-mmio device node"
     "[RFC PATCH V1 12/12] libxl: Fix duplicate memory node in DT"
   - move VirtIO MMIO #define-s to xen/include/public/arch-arm.h
---
---
 tools/libxl/libxl_arm.c       | 58 +++++++++++++++++++++++++++++++++++++++++--
 tools/libxl/libxl_types.idl   |  1 +
 tools/xl/xl_parse.c           |  1 +
 xen/include/public/arch-arm.h |  5 ++++
 4 files changed, 63 insertions(+), 2 deletions(-)

diff --git a/tools/libxl/libxl_arm.c b/tools/libxl/libxl_arm.c
index 34f8a29..36139d9 100644
--- a/tools/libxl/libxl_arm.c
+++ b/tools/libxl/libxl_arm.c
@@ -27,8 +27,8 @@ int libxl__arch_domain_prepare_config(libxl__gc *gc,
 {
     uint32_t nr_spis = 0;
     unsigned int i;
-    uint32_t vuart_irq;
-    bool vuart_enabled = false;
+    uint32_t vuart_irq, virtio_irq;
+    bool vuart_enabled = false, virtio_enabled = false;
 
     /*
      * If pl011 vuart is enabled then increment the nr_spis to allow allocation
@@ -40,6 +40,17 @@ int libxl__arch_domain_prepare_config(libxl__gc *gc,
         vuart_enabled = true;
     }
 
+    /*
+     * XXX: Handle virtio properly.
+     * A proper solution would be for the toolstack to allocate the interrupts
+     * used by each virtio backend and let the backend know which one is used.
+     */
+    if (libxl_defbool_val(d_config->b_info.arch_arm.virtio)) {
+        nr_spis += (GUEST_VIRTIO_MMIO_SPI - 32) + 1;
+        virtio_irq = GUEST_VIRTIO_MMIO_SPI;
+        virtio_enabled = true;
+    }
+
     for (i = 0; i < d_config->b_info.num_irqs; i++) {
         uint32_t irq = d_config->b_info.irqs[i];
         uint32_t spi;
@@ -59,6 +70,12 @@ int libxl__arch_domain_prepare_config(libxl__gc *gc,
             return ERROR_FAIL;
         }
 
+        /* The same check as for vpl011 */
+        if (virtio_enabled && irq == virtio_irq) {
+            LOG(ERROR, "Physical IRQ %u conflicting with virtio SPI\n", irq);
+            return ERROR_FAIL;
+        }
+
         if (irq < 32)
             continue;
 
@@ -659,6 +676,39 @@ static int make_vpl011_uart_node(libxl__gc *gc, void *fdt,
     return 0;
 }
 
+static int make_virtio_mmio_node(libxl__gc *gc, void *fdt,
+                                 uint64_t base, uint32_t irq)
+{
+    int res;
+    gic_interrupt intr;
+    /* Placeholder for virtio@ + a 64-bit number + \0 */
+    char buf[24];
+
+    snprintf(buf, sizeof(buf), "virtio@%"PRIx64, base);
+    res = fdt_begin_node(fdt, buf);
+    if (res) return res;
+
+    res = fdt_property_compat(gc, fdt, 1, "virtio,mmio");
+    if (res) return res;
+
+    res = fdt_property_regs(gc, fdt, GUEST_ROOT_ADDRESS_CELLS, GUEST_ROOT_SIZE_CELLS,
+                            1, base, GUEST_VIRTIO_MMIO_SIZE);
+    if (res) return res;
+
+    set_interrupt(intr, irq, 0xf, DT_IRQ_TYPE_EDGE_RISING);
+    res = fdt_property_interrupts(gc, fdt, &intr, 1);
+    if (res) return res;
+
+    res = fdt_property(fdt, "dma-coherent", NULL, 0);
+    if (res) return res;
+
+    res = fdt_end_node(fdt);
+    if (res) return res;
+
+    return 0;
+
+}
+
 static const struct arch_info *get_arch_info(libxl__gc *gc,
                                              const struct xc_dom_image *dom)
 {
@@ -962,6 +1012,9 @@ next_resize:
         if (info->tee == LIBXL_TEE_TYPE_OPTEE)
             FDT( make_optee_node(gc, fdt) );
 
+        if (libxl_defbool_val(info->arch_arm.virtio))
+            FDT( make_virtio_mmio_node(gc, fdt, GUEST_VIRTIO_MMIO_BASE, GUEST_VIRTIO_MMIO_SPI) );
+
         if (pfdt)
             FDT( copy_partial_fdt(gc, fdt, pfdt) );
 
@@ -1179,6 +1232,7 @@ void libxl__arch_domain_build_info_setdefault(libxl__gc *gc,
 {
     /* ACPI is disabled by default */
     libxl_defbool_setdefault(&b_info->acpi, false);
+    libxl_defbool_setdefault(&b_info->arch_arm.virtio, false);
 
     if (b_info->type != LIBXL_DOMAIN_TYPE_PV)
         return;
diff --git a/tools/libxl/libxl_types.idl b/tools/libxl/libxl_types.idl
index 9d3f05f..b054bf9 100644
--- a/tools/libxl/libxl_types.idl
+++ b/tools/libxl/libxl_types.idl
@@ -639,6 +639,7 @@ libxl_domain_build_info = Struct("domain_build_info",[
 
 
     ("arch_arm", Struct(None, [("gic_version", libxl_gic_version),
+                               ("virtio", libxl_defbool),
                                ("vuart", libxl_vuart_type),
                               ])),
     # Alternate p2m is not bound to any architecture or guest type, as it is
diff --git a/tools/xl/xl_parse.c b/tools/xl/xl_parse.c
index 61b4ef7..b8306aa 100644
--- a/tools/xl/xl_parse.c
+++ b/tools/xl/xl_parse.c
@@ -2579,6 +2579,7 @@ skip_usbdev:
     }
 
     xlu_cfg_get_defbool(config, "dm_restrict", &b_info->dm_restrict, 0);
+    xlu_cfg_get_defbool(config, "virtio", &b_info->arch_arm.virtio, 0);
 
     if (c_info->type == LIBXL_DOMAIN_TYPE_HVM) {
         if (!xlu_cfg_get_string (config, "vga", &buf, 0)) {
diff --git a/xen/include/public/arch-arm.h b/xen/include/public/arch-arm.h
index c365b1b..be7595f 100644
--- a/xen/include/public/arch-arm.h
+++ b/xen/include/public/arch-arm.h
@@ -464,6 +464,11 @@ typedef uint64_t xen_callback_t;
 #define PSCI_cpu_on      2
 #define PSCI_migrate     3
 
+/* VirtIO MMIO definitions */
+#define GUEST_VIRTIO_MMIO_BASE  xen_mk_ullong(0x02000000)
+#define GUEST_VIRTIO_MMIO_SIZE  xen_mk_ullong(0x200)
+#define GUEST_VIRTIO_MMIO_SPI   33
+
 #endif
 
 #ifndef __ASSEMBLY__
-- 
2.7.4




* [PATCH V1 16/16] [RFC] libxl: Add support for virtio-disk configuration
  2020-09-10 20:21 [PATCH V1 00/16] IOREQ feature (+ virtio-mmio) on Arm Oleksandr Tyshchenko
                   ` (14 preceding siblings ...)
  2020-09-10 20:22 ` [PATCH V1 15/16] libxl: Introduce basic virtio-mmio support on Arm Oleksandr Tyshchenko
@ 2020-09-10 20:22 ` Oleksandr Tyshchenko
  15 siblings, 0 replies; 111+ messages in thread
From: Oleksandr Tyshchenko @ 2020-09-10 20:22 UTC (permalink / raw)
  To: xen-devel
  Cc: Oleksandr Tyshchenko, Ian Jackson, Wei Liu, Anthony PERARD,
	Julien Grall, Stefano Stabellini

From: Oleksandr Tyshchenko <oleksandr_tyshchenko@epam.com>

This patch adds basic support for configuring and assisting a
virtio-disk backend (emulator) which is intended to run outside
of QEMU and could be run in any domain.

Xenstore was chosen as the communication interface so that an emulator
running in a non-toolstack domain can get its configuration either by
reading Xenstore directly or by receiving command line parameters (an
updated 'xl devd' running in the same domain would read Xenstore
beforehand and call the backend executable with the required arguments).

An example of a domain configuration (two disks are assigned to the
guest, the latter in read-only mode):

vdisk = [ 'backend=DomD, disks=rw:/dev/mmcblk0p3;ro:/dev/mmcblk1p3' ]

Where per-disk Xenstore entries are:
- filename and readonly flag (configured via "vdisk" property)
- base and irq (allocated dynamically)

Besides handling the 'visible' params described in the configuration
file, the patch also allocates virtio-mmio specific ones for each
device and writes them into Xenstore. The virtio-mmio params (irq and
base) are unique per guest domain; they are allocated at domain
creation time and passed through to the emulator. Each VirtIO device
has at least one pair of these params.
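
For the example config above, the per-disk nodes written by
libxl__set_xenstore_virtio_disk() would look something like this
(illustrative; <frontend> stands for the device's frontend path, and
base is written in decimal):

<frontend>/0/filename = "/dev/mmcblk0p3"
<frontend>/0/readonly = "0"
<frontend>/0/base = "33554432"
<frontend>/0/irq = "33"
<frontend>/1/filename = "/dev/mmcblk1p3"
<frontend>/1/readonly = "1"
<frontend>/1/base = "33554944"
<frontend>/1/irq = "34"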

TODO:
1. An extra "virtio" property could be removed.
2. Update documentation.

Signed-off-by: Oleksandr Tyshchenko <oleksandr_tyshchenko@epam.com>

---
Changes RFC -> V1:
   - no changes

Please note, there is a real concern about VirtIO interrupt allocation.
I just copy here what Stefano said in the RFC thread.

So, if we end up allocating let's say 6 virtio interrupts for a domain,
the chance of a clash with a physical interrupt of a passthrough device is real.

I am not entirely sure how to solve it, but these are a few ideas:
- choosing virtio interrupts that are less likely to conflict (maybe > 1000)
- make the virtio irq (optionally) configurable so that a user could
  override the default irq and specify one that doesn't conflict
- implementing support for virq != pirq (even the xl interface doesn't
  allow to specify the virq number for passthrough devices, see "irqs")
---
---
 tools/libxl/Makefile                 |   4 +-
 tools/libxl/libxl_arm.c              |  56 ++++++++++++++---
 tools/libxl/libxl_create.c           |   1 +
 tools/libxl/libxl_internal.h         |   1 +
 tools/libxl/libxl_types.idl          |  15 +++++
 tools/libxl/libxl_types_internal.idl |   1 +
 tools/libxl/libxl_virtio_disk.c      | 109 +++++++++++++++++++++++++++++++++
 tools/xl/Makefile                    |   2 +-
 tools/xl/xl.h                        |   3 +
 tools/xl/xl_cmdtable.c               |  15 +++++
 tools/xl/xl_parse.c                  | 115 +++++++++++++++++++++++++++++++++++
 tools/xl/xl_virtio_disk.c            |  46 ++++++++++++++
 12 files changed, 356 insertions(+), 12 deletions(-)
 create mode 100644 tools/libxl/libxl_virtio_disk.c
 create mode 100644 tools/xl/xl_virtio_disk.c

diff --git a/tools/libxl/Makefile b/tools/libxl/Makefile
index 0e8dfc6..8ab6c41 100644
--- a/tools/libxl/Makefile
+++ b/tools/libxl/Makefile
@@ -141,7 +141,9 @@ LIBXL_OBJS = flexarray.o libxl.o libxl_create.o libxl_dm.o libxl_pci.o \
 			libxl_vtpm.o libxl_nic.o libxl_disk.o libxl_console.o \
 			libxl_cpupool.o libxl_mem.o libxl_sched.o libxl_tmem.o \
 			libxl_9pfs.o libxl_domain.o libxl_vdispl.o \
-			libxl_pvcalls.o libxl_vsnd.o libxl_vkb.o $(LIBXL_OBJS-y)
+			libxl_pvcalls.o libxl_vsnd.o libxl_vkb.o \
+			libxl_virtio_disk.o $(LIBXL_OBJS-y)
+
 LIBXL_OBJS += libxl_genid.o
 LIBXL_OBJS += _libxl_types.o libxl_flask.o _libxl_types_internal.o
 
diff --git a/tools/libxl/libxl_arm.c b/tools/libxl/libxl_arm.c
index 36139d9..442b3b9 100644
--- a/tools/libxl/libxl_arm.c
+++ b/tools/libxl/libxl_arm.c
@@ -9,6 +9,12 @@
 #include <assert.h>
 #include <xen/device_tree_defs.h>
 
+#ifndef container_of
+#define container_of(ptr, type, member) ({			\
+        typeof( ((type *)0)->member ) *__mptr = (ptr);	\
+        (type *)( (char *)__mptr - offsetof(type,member) );})
+#endif
+
 static const char *gicv_to_string(libxl_gic_version gic_version)
 {
     switch (gic_version) {
@@ -40,14 +46,32 @@ int libxl__arch_domain_prepare_config(libxl__gc *gc,
         vuart_enabled = true;
     }
 
-    /*
-     * XXX: Handle virtio properly.
-     * A proper solution would be for the toolstack to allocate the interrupts
-     * used by each virtio backend and let the backend know which one is used.
-     */
     if (libxl_defbool_val(d_config->b_info.arch_arm.virtio)) {
-        nr_spis += (GUEST_VIRTIO_MMIO_SPI - 32) + 1;
+        uint64_t virtio_base;
+        libxl_device_virtio_disk *virtio_disk;
+
+        virtio_base = GUEST_VIRTIO_MMIO_BASE;
         virtio_irq = GUEST_VIRTIO_MMIO_SPI;
+
+        if (!d_config->num_virtio_disks) {
+            LOG(ERROR, "Virtio is enabled, but no Virtio devices present\n");
+            return ERROR_FAIL;
+        }
+        virtio_disk = &d_config->virtio_disks[0];
+
+        for (i = 0; i < virtio_disk->num_disks; i++) {
+            virtio_disk->disks[i].base = virtio_base;
+            virtio_disk->disks[i].irq = virtio_irq;
+
+            LOG(DEBUG, "Allocate Virtio MMIO params: IRQ %u BASE 0x%"PRIx64,
+                virtio_irq, virtio_base);
+
+            virtio_irq++;
+            virtio_base += GUEST_VIRTIO_MMIO_SIZE;
+        }
+        virtio_irq--;
+
+        nr_spis += (virtio_irq - 32) + 1;
         virtio_enabled = true;
     }
 
@@ -71,8 +95,9 @@ int libxl__arch_domain_prepare_config(libxl__gc *gc,
         }
 
         /* The same check as for vpl011 */
-        if (virtio_enabled && irq == virtio_irq) {
-            LOG(ERROR, "Physical IRQ %u conflicting with virtio SPI\n", irq);
+        if (virtio_enabled &&
+           (irq >= GUEST_VIRTIO_MMIO_SPI && irq <= virtio_irq)) {
+            LOG(ERROR, "Physical IRQ %u conflicting with Virtio IRQ range\n", irq);
             return ERROR_FAIL;
         }
 
@@ -1012,8 +1037,19 @@ next_resize:
         if (info->tee == LIBXL_TEE_TYPE_OPTEE)
             FDT( make_optee_node(gc, fdt) );
 
-        if (libxl_defbool_val(info->arch_arm.virtio))
-            FDT( make_virtio_mmio_node(gc, fdt, GUEST_VIRTIO_MMIO_BASE, GUEST_VIRTIO_MMIO_SPI) );
+        if (libxl_defbool_val(info->arch_arm.virtio)) {
+            libxl_domain_config *d_config =
+                container_of(info, libxl_domain_config, b_info);
+            libxl_device_virtio_disk *virtio_disk = &d_config->virtio_disks[0];
+            unsigned int i;
+
+            for (i = 0; i < virtio_disk->num_disks; i++) {
+                uint64_t base = virtio_disk->disks[i].base;
+                uint32_t irq = virtio_disk->disks[i].irq;
+
+                FDT( make_virtio_mmio_node(gc, fdt, base, irq) );
+            }
+        }
 
         if (pfdt)
             FDT( copy_partial_fdt(gc, fdt, pfdt) );
diff --git a/tools/libxl/libxl_create.c b/tools/libxl/libxl_create.c
index 2814818..8a0651e 100644
--- a/tools/libxl/libxl_create.c
+++ b/tools/libxl/libxl_create.c
@@ -1817,6 +1817,7 @@ const libxl__device_type *device_type_tbl[] = {
     &libxl__dtdev_devtype,
     &libxl__vdispl_devtype,
     &libxl__vsnd_devtype,
+    &libxl__virtio_disk_devtype,
     NULL
 };
 
diff --git a/tools/libxl/libxl_internal.h b/tools/libxl/libxl_internal.h
index 94a2317..4e2024d 100644
--- a/tools/libxl/libxl_internal.h
+++ b/tools/libxl/libxl_internal.h
@@ -3988,6 +3988,7 @@ extern const libxl__device_type libxl__vdispl_devtype;
 extern const libxl__device_type libxl__p9_devtype;
 extern const libxl__device_type libxl__pvcallsif_devtype;
 extern const libxl__device_type libxl__vsnd_devtype;
+extern const libxl__device_type libxl__virtio_disk_devtype;
 
 extern const libxl__device_type *device_type_tbl[];
 
diff --git a/tools/libxl/libxl_types.idl b/tools/libxl/libxl_types.idl
index b054bf9..5f8a3ff 100644
--- a/tools/libxl/libxl_types.idl
+++ b/tools/libxl/libxl_types.idl
@@ -935,6 +935,20 @@ libxl_device_vsnd = Struct("device_vsnd", [
     ("pcms", Array(libxl_vsnd_pcm, "num_vsnd_pcms"))
     ])
 
+libxl_virtio_disk_param = Struct("virtio_disk_param", [
+    ("filename", string),
+    ("readonly", bool),
+    ("irq", uint32),
+    ("base", uint64),
+    ])
+
+libxl_device_virtio_disk = Struct("device_virtio_disk", [
+    ("backend_domid", libxl_domid),
+    ("backend_domname", string),
+    ("devid", libxl_devid),
+    ("disks", Array(libxl_virtio_disk_param, "num_disks")),
+    ])
+
 libxl_domain_config = Struct("domain_config", [
     ("c_info", libxl_domain_create_info),
     ("b_info", libxl_domain_build_info),
@@ -951,6 +965,7 @@ libxl_domain_config = Struct("domain_config", [
     ("pvcallsifs", Array(libxl_device_pvcallsif, "num_pvcallsifs")),
     ("vdispls", Array(libxl_device_vdispl, "num_vdispls")),
     ("vsnds", Array(libxl_device_vsnd, "num_vsnds")),
+    ("virtio_disks", Array(libxl_device_virtio_disk, "num_virtio_disks")),
     # a channel manifests as a console with a name,
     # see docs/misc/channels.txt
     ("channels", Array(libxl_device_channel, "num_channels")),
diff --git a/tools/libxl/libxl_types_internal.idl b/tools/libxl/libxl_types_internal.idl
index 3593e21..8f71980 100644
--- a/tools/libxl/libxl_types_internal.idl
+++ b/tools/libxl/libxl_types_internal.idl
@@ -32,6 +32,7 @@ libxl__device_kind = Enumeration("device_kind", [
     (14, "PVCALLS"),
     (15, "VSND"),
     (16, "VINPUT"),
+    (17, "VIRTIO_DISK"),
     ])
 
 libxl__console_backend = Enumeration("console_backend", [
diff --git a/tools/libxl/libxl_virtio_disk.c b/tools/libxl/libxl_virtio_disk.c
new file mode 100644
index 0000000..25e7f1a
--- /dev/null
+++ b/tools/libxl/libxl_virtio_disk.c
@@ -0,0 +1,109 @@
+/*
+ * Copyright (C) 2020 EPAM Systems Inc.
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU Lesser General Public License as published
+ * by the Free Software Foundation; version 2.1 only. with the special
+ * exception on linking described in file LICENSE.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU Lesser General Public License for more details.
+ */
+
+#include "libxl_internal.h"
+
+static int libxl__device_virtio_disk_setdefault(libxl__gc *gc, uint32_t domid,
+                                                libxl_device_virtio_disk *virtio_disk,
+                                                bool hotplug)
+{
+    return libxl__resolve_domid(gc, virtio_disk->backend_domname,
+                                &virtio_disk->backend_domid);
+}
+
+static int libxl__virtio_disk_from_xenstore(libxl__gc *gc, const char *libxl_path,
+                                            libxl_devid devid,
+                                            libxl_device_virtio_disk *virtio_disk)
+{
+    const char *be_path;
+    int rc;
+
+    virtio_disk->devid = devid;
+    rc = libxl__xs_read_mandatory(gc, XBT_NULL,
+                                  GCSPRINTF("%s/backend", libxl_path),
+                                  &be_path);
+    if (rc) return rc;
+
+    rc = libxl__backendpath_parse_domid(gc, be_path, &virtio_disk->backend_domid);
+    if (rc) return rc;
+
+    return 0;
+}
+
+static void libxl__update_config_virtio_disk(libxl__gc *gc,
+                                             libxl_device_virtio_disk *dst,
+                                             libxl_device_virtio_disk *src)
+{
+    dst->devid = src->devid;
+}
+
+static int libxl_device_virtio_disk_compare(libxl_device_virtio_disk *d1,
+                                            libxl_device_virtio_disk *d2)
+{
+    return COMPARE_DEVID(d1, d2);
+}
+
+static void libxl__device_virtio_disk_add(libxl__egc *egc, uint32_t domid,
+                                          libxl_device_virtio_disk *virtio_disk,
+                                          libxl__ao_device *aodev)
+{
+    libxl__device_add_async(egc, domid, &libxl__virtio_disk_devtype, virtio_disk, aodev);
+}
+
+static int libxl__set_xenstore_virtio_disk(libxl__gc *gc, uint32_t domid,
+                                           libxl_device_virtio_disk *virtio_disk,
+                                           flexarray_t *back, flexarray_t *front,
+                                           flexarray_t *ro_front)
+{
+    int rc;
+    unsigned int i;
+
+    for (i = 0; i < virtio_disk->num_disks; i++) {
+        rc = flexarray_append_pair(ro_front, GCSPRINTF("%d/filename", i),
+                                   GCSPRINTF("%s", virtio_disk->disks[i].filename));
+        if (rc) return rc;
+
+        rc = flexarray_append_pair(ro_front, GCSPRINTF("%d/readonly", i),
+                                   GCSPRINTF("%d", virtio_disk->disks[i].readonly));
+        if (rc) return rc;
+
+        rc = flexarray_append_pair(ro_front, GCSPRINTF("%d/base", i),
+                                   GCSPRINTF("%lu", virtio_disk->disks[i].base));
+        if (rc) return rc;
+
+        rc = flexarray_append_pair(ro_front, GCSPRINTF("%d/irq", i),
+                                   GCSPRINTF("%u", virtio_disk->disks[i].irq));
+        if (rc) return rc;
+    }
+
+    return 0;
+}
+
+static LIBXL_DEFINE_UPDATE_DEVID(virtio_disk)
+static LIBXL_DEFINE_DEVICE_FROM_TYPE(virtio_disk)
+static LIBXL_DEFINE_DEVICES_ADD(virtio_disk)
+
+DEFINE_DEVICE_TYPE_STRUCT(virtio_disk, VIRTIO_DISK,
+    .update_config = (device_update_config_fn_t) libxl__update_config_virtio_disk,
+    .from_xenstore = (device_from_xenstore_fn_t) libxl__virtio_disk_from_xenstore,
+    .set_xenstore_config = (device_set_xenstore_config_fn_t) libxl__set_xenstore_virtio_disk
+);
+
+/*
+ * Local variables:
+ * mode: C
+ * c-basic-offset: 4
+ * indent-tabs-mode: nil
+ * End:
+ */
diff --git a/tools/xl/Makefile b/tools/xl/Makefile
index af4912e..38e4701 100644
--- a/tools/xl/Makefile
+++ b/tools/xl/Makefile
@@ -22,7 +22,7 @@ XL_OBJS += xl_vtpm.o xl_block.o xl_nic.o xl_usb.o
 XL_OBJS += xl_sched.o xl_pci.o xl_vcpu.o xl_cdrom.o xl_mem.o
 XL_OBJS += xl_info.o xl_console.o xl_misc.o
 XL_OBJS += xl_vmcontrol.o xl_saverestore.o xl_migrate.o
-XL_OBJS += xl_vdispl.o xl_vsnd.o xl_vkb.o
+XL_OBJS += xl_vdispl.o xl_vsnd.o xl_vkb.o xl_virtio_disk.o
 
 $(XL_OBJS): CFLAGS += $(CFLAGS_libxentoollog)
 $(XL_OBJS): CFLAGS += $(CFLAGS_XL)
diff --git a/tools/xl/xl.h b/tools/xl/xl.h
index 06569c6..3d26f19 100644
--- a/tools/xl/xl.h
+++ b/tools/xl/xl.h
@@ -178,6 +178,9 @@ int main_vsnddetach(int argc, char **argv);
 int main_vkbattach(int argc, char **argv);
 int main_vkblist(int argc, char **argv);
 int main_vkbdetach(int argc, char **argv);
+int main_virtio_diskattach(int argc, char **argv);
+int main_virtio_disklist(int argc, char **argv);
+int main_virtio_diskdetach(int argc, char **argv);
 int main_usbctrl_attach(int argc, char **argv);
 int main_usbctrl_detach(int argc, char **argv);
 int main_usbdev_attach(int argc, char **argv);
diff --git a/tools/xl/xl_cmdtable.c b/tools/xl/xl_cmdtable.c
index 0833539..2bdf0b7 100644
--- a/tools/xl/xl_cmdtable.c
+++ b/tools/xl/xl_cmdtable.c
@@ -434,6 +434,21 @@ struct cmd_spec cmd_table[] = {
       "Destroy a domain's virtual sound device",
       "<Domain> <DevId>",
     },
+    { "virtio-disk-attach",
+      &main_virtio_diskattach, 1, 1,
+      "Create a new virtio block device",
+      " TBD\n"
+    },
+    { "virtio-disk-list",
+      &main_virtio_disklist, 0, 0,
+      "List virtio block devices for a domain",
+      "<Domain(s)>",
+    },
+    { "virtio-disk-detach",
+      &main_virtio_diskdetach, 0, 1,
+      "Destroy a domain's virtio block device",
+      "<Domain> <DevId>",
+    },
     { "uptime",
       &main_uptime, 0, 0,
       "Print uptime for all/some domains",
diff --git a/tools/xl/xl_parse.c b/tools/xl/xl_parse.c
index b8306aa..72c0a65 100644
--- a/tools/xl/xl_parse.c
+++ b/tools/xl/xl_parse.c
@@ -1202,6 +1202,124 @@ out:
     if (rc) exit(EXIT_FAILURE);
 }
 
+#define MAX_VIRTIO_DISKS 4
+
+static int parse_virtio_disk_config(libxl_device_virtio_disk *virtio_disk, char *token)
+{
+    char *oparg;
+    libxl_string_list disks = NULL;
+    int i, rc;
+
+    if (MATCH_OPTION("backend", token, oparg)) {
+        virtio_disk->backend_domname = strdup(oparg);
+    } else if (MATCH_OPTION("disks", token, oparg)) {
+        split_string_into_string_list(oparg, ";", &disks);
+
+        virtio_disk->num_disks = libxl_string_list_length(&disks);
+        if (virtio_disk->num_disks > MAX_VIRTIO_DISKS) {
+            fprintf(stderr, "vdisk: currently only %d disks are supported",
+                    MAX_VIRTIO_DISKS);
+            return 1;
+        }
+        virtio_disk->disks = xcalloc(virtio_disk->num_disks,
+                                     sizeof(*virtio_disk->disks));
+
+        for (i = 0; i < virtio_disk->num_disks; i++) {
+            char *disk_opt;
+
+            rc = split_string_into_pair(disks[i], ":", &disk_opt,
+                                        &virtio_disk->disks[i].filename);
+            if (rc) {
+                fprintf(stderr, "vdisk: failed to split \"%s\" into pair\n",
+                        disks[i]);
+                goto out;
+            }
+
+            if (!strcmp(disk_opt, "ro"))
+                virtio_disk->disks[i].readonly = 1;
+            else if (!strcmp(disk_opt, "rw"))
+                virtio_disk->disks[i].readonly = 0;
+            else {
+                fprintf(stderr, "vdisk: failed to parse \"%s\" disk option\n",
+                        disk_opt);
+                rc = 1;
+            }
+            free(disk_opt);
+
+            if (rc) goto out;
+        }
+    } else {
+        fprintf(stderr, "Unknown string \"%s\" in vdisk spec\n", token);
+        rc = 1; goto out;
+    }
+
+    rc = 0;
+
+out:
+    libxl_string_list_dispose(&disks);
+    return rc;
+}
+
+static void parse_virtio_disk_list(const XLU_Config *config,
+                                   libxl_domain_config *d_config)
+{
+    XLU_ConfigList *virtio_disks;
+    const char *item;
+    char *buf = NULL;
+    int rc;
+
+    if (!xlu_cfg_get_list(config, "vdisk", &virtio_disks, 0, 0)) {
+        libxl_domain_build_info *b_info = &d_config->b_info;
+        int entry = 0;
+
+        /* XXX Remove an extra property */
+        libxl_defbool_setdefault(&b_info->arch_arm.virtio, false);
+        if (!libxl_defbool_val(b_info->arch_arm.virtio)) {
+            fprintf(stderr, "Virtio device requires Virtio property to be set\n");
+            exit(EXIT_FAILURE);
+        }
+
+        while ((item = xlu_cfg_get_listitem(virtio_disks, entry)) != NULL) {
+            libxl_device_virtio_disk *virtio_disk;
+            char *p;
+
+            virtio_disk = ARRAY_EXTEND_INIT(d_config->virtio_disks,
+                                            d_config->num_virtio_disks,
+                                            libxl_device_virtio_disk_init);
+
+            buf = strdup(item);
+
+            p = strtok(buf, ",");
+            while (p != NULL)
+            {
+                while (*p == ' ') p++;
+
+                rc = parse_virtio_disk_config(virtio_disk, p);
+                if (rc) goto out;
+
+                p = strtok(NULL, ",");
+            }
+
+            /* Free the per-entry copy to avoid leaking it on the next pass. */
+            free(buf);
+            buf = NULL;
+
+            entry++;
+
+            if (virtio_disk->num_disks == 0) {
+                fprintf(stderr, "At least one virtio disk should be specified\n");
+                rc = 1; goto out;
+            }
+        }
+    }
+
+    rc = 0;
+
+out:
+    free(buf);
+    if (rc) exit(EXIT_FAILURE);
+}
+
 void parse_config_data(const char *config_source,
                        const char *config_data,
                        int config_len,
@@ -2732,6 +2846,7 @@ skip_usbdev:
     }
 
     parse_vkb_list(config, d_config);
+    parse_virtio_disk_list(config, d_config);
 
     xlu_cfg_get_defbool(config, "xend_suspend_evtchn_compat",
                         &c_info->xend_suspend_evtchn_compat, 0);
diff --git a/tools/xl/xl_virtio_disk.c b/tools/xl/xl_virtio_disk.c
new file mode 100644
index 0000000..808a7da
--- /dev/null
+++ b/tools/xl/xl_virtio_disk.c
@@ -0,0 +1,46 @@
+/*
+ * Copyright (C) 2020 EPAM Systems Inc.
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU Lesser General Public License as published
+ * by the Free Software Foundation; version 2.1 only. with the special
+ * exception on linking described in file LICENSE.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU Lesser General Public License for more details.
+ */
+
+#include <stdlib.h>
+
+#include <libxl.h>
+#include <libxl_utils.h>
+#include <libxlutil.h>
+
+#include "xl.h"
+#include "xl_utils.h"
+#include "xl_parse.h"
+
+int main_virtio_diskattach(int argc, char **argv)
+{
+    return 0;
+}
+
+int main_virtio_disklist(int argc, char **argv)
+{
    return 0;
+}
+
+int main_virtio_diskdetach(int argc, char **argv)
+{
+    return 0;
+}
+
+/*
+ * Local variables:
+ * mode: C
+ * c-basic-offset: 4
+ * indent-tabs-mode: nil
+ * End:
+ */
-- 
2.7.4




* Re: [PATCH V1 09/16] arm/ioreq: Introduce arch specific bits for IOREQ/DM features
  2020-09-10 20:22 ` [PATCH V1 09/16] arm/ioreq: Introduce arch specific bits for IOREQ/DM features Oleksandr Tyshchenko
@ 2020-09-11 10:14   ` Oleksandr
  2020-09-16  7:51   ` Jan Beulich
  2020-09-23 18:03   ` Julien Grall
  2 siblings, 0 replies; 111+ messages in thread
From: Oleksandr @ 2020-09-11 10:14 UTC (permalink / raw)
  To: xen-devel
  Cc: Oleksandr Tyshchenko, Stefano Stabellini, Julien Grall,
	Volodymyr Babchuk, Julien Grall


Hello all.


>   /*
> diff --git a/xen/include/asm-arm/hvm/ioreq.h b/xen/include/asm-arm/hvm/ioreq.h
> new file mode 100644
> index 0000000..1c34df0
> --- /dev/null
> +++ b/xen/include/asm-arm/hvm/ioreq.h
> @@ -0,0 +1,108 @@
> +/*
> + * hvm.h: Hardware virtual machine assist interface definitions.
> + *
> + * Copyright (c) 2016 Citrix Systems Inc.
> + * Copyright (c) 2019 Arm ltd.
> + *
> + * This program is free software; you can redistribute it and/or modify it
> + * under the terms and conditions of the GNU General Public License,
> + * version 2, as published by the Free Software Foundation.
> + *
> + * This program is distributed in the hope it will be useful, but WITHOUT
> + * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
> + * FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License for
> + * more details.
> + *
> + * You should have received a copy of the GNU General Public License along with
> + * this program; If not, see <http://www.gnu.org/licenses/>.
> + */
> +
> +#ifndef __ASM_ARM_HVM_IOREQ_H__
> +#define __ASM_ARM_HVM_IOREQ_H__
> +
> +#include <public/hvm/ioreq.h>
> +#include <public/hvm/dm_op.h>
> +
> +#ifdef CONFIG_IOREQ_SERVER
> +enum io_state handle_ioserv(struct cpu_user_regs *regs, struct vcpu *v);
> +enum io_state try_fwd_ioserv(struct cpu_user_regs *regs,
> +                             struct vcpu *v, mmio_info_t *info);
> +#else
> +static inline enum io_state handle_ioserv(struct cpu_user_regs *regs,
> +                                          struct vcpu *v)
> +{
> +    return IO_UNHANDLED;
> +}
> +
> +static inline enum io_state try_fwd_ioserv(struct cpu_user_regs *regs,
> +                                           struct vcpu *v, mmio_info_t *info)
> +{
> +    return IO_UNHANDLED;
> +}
> +#endif
> +
> +bool ioreq_handle_complete_mmio(void);
> +
> +static inline bool handle_pio(uint16_t port, unsigned int size, int dir)
> +{
> +    /*
> +     * TODO: For Arm64, the main user will be PCI. So this should be
> +     * implemented when we add support for vPCI.
> +     */
> +    BUG();
> +    return true;
> +}
> +
> +static inline int arch_hvm_destroy_ioreq_server(struct hvm_ioreq_server *s)
> +{
> +    return 0;
> +}
> +
> +static inline void msix_write_completion(struct vcpu *v)
> +{
> +}
> +
> +static inline bool arch_handle_hvm_io_completion(
> +    enum hvm_io_completion io_completion)
> +{
> +    ASSERT_UNREACHABLE();

I am sorry, but there should be a return true; here, to avoid the build
error "no return statement in function returning non-void
[-Werror=return-type]".
I am a little puzzled why I didn't spot this build error earlier.


> +}
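
For the record, the corrected stub would then read:

    static inline bool arch_handle_hvm_io_completion(
        enum hvm_io_completion io_completion)
    {
        ASSERT_UNREACHABLE();

        return true;
    }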

-- 
Regards,

Oleksandr Tyshchenko




* Re: [PATCH V1 01/16] x86/ioreq: Prepare IOREQ feature for making it common
  2020-09-10 20:21 ` [PATCH V1 01/16] x86/ioreq: Prepare IOREQ feature for making it common Oleksandr Tyshchenko
@ 2020-09-14 13:52   ` Jan Beulich
  2020-09-21 12:22     ` Oleksandr
  2020-09-23 17:22   ` Julien Grall
  1 sibling, 1 reply; 111+ messages in thread
From: Jan Beulich @ 2020-09-14 13:52 UTC (permalink / raw)
  To: Oleksandr Tyshchenko
  Cc: xen-devel, Oleksandr Tyshchenko, Paul Durrant, Andrew Cooper,
	Wei Liu, Roger Pau Monné,
	Julien Grall, Stefano Stabellini, Julien Grall

On 10.09.2020 22:21, Oleksandr Tyshchenko wrote:
> From: Oleksandr Tyshchenko <oleksandr_tyshchenko@epam.com>
> 
> As a lot of x86 code can be re-used on Arm later on, this patch
> prepares IOREQ support before moving to the common code. This way
> we will get almost a verbatim copy for a code movement.
> 
> This support is going to be used on Arm to be able to run a device
> emulator outside of the Xen hypervisor.

This is all fine, but you should then go on and explain what you're
doing, and why (at which point it may become obvious that it would
be more helpful to split this into a couple of steps). In particular
something as suspicious as ...

> Changes RFC -> V1:
>    - new patch, was split from:
>      "[RFC PATCH V1 01/12] hvm/ioreq: Make x86's IOREQ feature common"
>    - fold the check of p->type into hvm_get_ioreq_server_range_type()
>      and make it return success/failure
>    - remove relocate_portio_handler() call from arch_hvm_ioreq_destroy()
>      in arch/x86/hvm/ioreq.c

... this (see below).

> --- a/xen/arch/x86/hvm/ioreq.c
> +++ b/xen/arch/x86/hvm/ioreq.c
> @@ -170,6 +170,29 @@ static bool hvm_wait_for_io(struct hvm_ioreq_vcpu *sv, ioreq_t *p)
>      return true;
>  }
>  
> +bool arch_handle_hvm_io_completion(enum hvm_io_completion io_completion)

Do we need "handle" in here? Without it, I'd also not have to ask about
moving hvm further to the front of the name...

> @@ -836,6 +848,12 @@ int hvm_create_ioreq_server(struct domain *d, int bufioreq_handling,
>      return rc;
>  }
>  
> +/* Called when target domain is paused */
> +int arch_hvm_destroy_ioreq_server(struct hvm_ioreq_server *s)
> +{
> +    return p2m_set_ioreq_server(s->target, 0, s);
> +}

Why return "int" when ...

> @@ -855,7 +873,7 @@ int hvm_destroy_ioreq_server(struct domain *d, ioservid_t id)
>  
>      domain_pause(d);
>  
> -    p2m_set_ioreq_server(d, 0, s);
> +    arch_hvm_destroy_ioreq_server(s);

... the result has been ignored anyway? Or otherwise I guess you'd
want to add error handling here (but then the result of
p2m_set_ioreq_server() should still get ignored, for backwards
compatibility).
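
I.e., as a sketch, the hook could simply become void:

    /* Called when target domain is paused */
    void arch_hvm_destroy_ioreq_server(struct hvm_ioreq_server *s)
    {
        p2m_set_ioreq_server(s->target, 0, s);
    }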

> @@ -1215,8 +1233,7 @@ void hvm_destroy_all_ioreq_servers(struct domain *d)
>      struct hvm_ioreq_server *s;
>      unsigned int id;
>  
> -    if ( !relocate_portio_handler(d, 0xcf8, 0xcf8, 4) )
> -        return;
> +    arch_hvm_ioreq_destroy(d);

So the call to relocate_portio_handler() simply goes away. No
replacement, no explanation?

> @@ -1239,19 +1256,15 @@ void hvm_destroy_all_ioreq_servers(struct domain *d)
>      spin_unlock_recursive(&d->arch.hvm.ioreq_server.lock);
>  }
>  
> -struct hvm_ioreq_server *hvm_select_ioreq_server(struct domain *d,
> -                                                 ioreq_t *p)
> +int hvm_get_ioreq_server_range_type(struct domain *d,
> +                                    ioreq_t *p,

At least p, but perhaps also d can gain const?

> +                                    uint8_t *type,
> +                                    uint64_t *addr)

By the name the function returns a type for a range (albeit without
it being clear where the two bounds of such a range actually live).
By the implementation it looks more like you mean "range_and_type",
albeit still without there really being a range passed back to the
caller. Therefore I think I need some clarification on what's
intended before even being able to suggest something. From ...

> +struct hvm_ioreq_server *hvm_select_ioreq_server(struct domain *d,
> +                                                 ioreq_t *p)
> +{
> +    struct hvm_ioreq_server *s;
> +    uint8_t type;
> +    uint64_t addr;
> +    unsigned int id;
> +
> +    if ( hvm_get_ioreq_server_range_type(d, p, &type, &addr) )
> +        return NULL;

... this use - maybe hvm_ioreq_server_get_type_addr() (albeit I
don't like this name very much)?
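
I.e. perhaps something along the lines of

    int hvm_ioreq_server_get_type_addr(const struct domain *d,
                                       const ioreq_t *p,
                                       uint8_t *type,
                                       uint64_t *addr);

(with the const-ness as per my earlier remark), albeit the naming
question remains open.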

> @@ -1351,7 +1378,7 @@ static int hvm_send_buffered_ioreq(struct hvm_ioreq_server *s, ioreq_t *p)
>      pg = iorp->va;
>  
>      if ( !pg )
> -        return X86EMUL_UNHANDLEABLE;
> +        return IOREQ_IO_UNHANDLED;

At this example - why the IO infix, duplicating the prefix? I'd
suggest either dropping it (if the remaining identifiers are deemed
unambiguous enough) or using e.g. IOREQ_STATUS_*.
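
As a sketch of the latter, with the arch-specific mapping living in an
arch header:

    #define IOREQ_STATUS_HANDLED     X86EMUL_OKAY
    #define IOREQ_STATUS_UNHANDLED   X86EMUL_UNHANDLEABLE
    #define IOREQ_STATUS_RETRY       X86EMUL_RETRY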

> @@ -1515,11 +1542,21 @@ static int hvm_access_cf8(
>      return X86EMUL_UNHANDLEABLE;
>  }
>  
> +void arch_hvm_ioreq_init(struct domain *d)
> +{
> +    register_portio_handler(d, 0xcf8, 4, hvm_access_cf8);
> +}
> +
> +void arch_hvm_ioreq_destroy(struct domain *d)
> +{
> +
> +}

Stray blank line?

Jan



* Re: [PATCH V1 02/16] xen/ioreq: Make x86's IOREQ feature common
  2020-09-10 20:21 ` [PATCH V1 02/16] xen/ioreq: Make x86's IOREQ feature common Oleksandr Tyshchenko
@ 2020-09-14 14:17   ` Jan Beulich
  2020-09-21 19:02     ` Oleksandr
  2020-09-24 18:01   ` Julien Grall
  1 sibling, 1 reply; 111+ messages in thread
From: Jan Beulich @ 2020-09-14 14:17 UTC (permalink / raw)
  To: Oleksandr Tyshchenko
  Cc: xen-devel, Oleksandr Tyshchenko, Andrew Cooper, George Dunlap,
	Ian Jackson, Julien Grall, Stefano Stabellini, Wei Liu,
	Roger Pau Monné,
	Paul Durrant, Jun Nakajima, Kevin Tian, Tim Deegan, Julien Grall

On 10.09.2020 22:21, Oleksandr Tyshchenko wrote:
> ---
>  MAINTAINERS                     |    8 +-
>  xen/arch/x86/Kconfig            |    1 +
>  xen/arch/x86/hvm/dm.c           |    2 +-
>  xen/arch/x86/hvm/emulate.c      |    2 +-
>  xen/arch/x86/hvm/hvm.c          |    2 +-
>  xen/arch/x86/hvm/io.c           |    2 +-
>  xen/arch/x86/hvm/ioreq.c        | 1425 +--------------------------------------
>  xen/arch/x86/hvm/stdvga.c       |    2 +-
>  xen/arch/x86/hvm/vmx/vvmx.c     |    3 +-
>  xen/arch/x86/mm.c               |    2 +-
>  xen/arch/x86/mm/shadow/common.c |    2 +-
>  xen/common/Kconfig              |    3 +
>  xen/common/Makefile             |    1 +
>  xen/common/ioreq.c              | 1410 ++++++++++++++++++++++++++++++++++++++

This suggests it was almost the entire file which got moved. It would
be really nice if you could convince git to show the diff here, rather
than removal and addition of 1400 lines.

Additionally I wonder whether what's left in the original file wouldn't
better become inline functions now. If this was done in the previous
patch, the change would be truly a rename then, I think.

> --- a/xen/include/asm-x86/hvm/ioreq.h
> +++ b/xen/include/asm-x86/hvm/ioreq.h
> @@ -19,41 +19,12 @@
>  #ifndef __ASM_X86_HVM_IOREQ_H__
>  #define __ASM_X86_HVM_IOREQ_H__
>  
> -bool hvm_io_pending(struct vcpu *v);
> -bool handle_hvm_io_completion(struct vcpu *v);
> -bool is_ioreq_server_page(struct domain *d, const struct page_info *page);
> +#include <asm/hvm/emulate.h>
> +#include <asm/hvm/hvm.h>
> +#include <asm/hvm/vmx/vmx.h>

Are all three really needed here? Especially the last one strikes me as
odd.

> --- /dev/null
> +++ b/xen/include/xen/ioreq.h
> @@ -0,0 +1,82 @@
> +/*
> + * ioreq.h: Hardware virtual machine assist interface definitions.
> + *
> + * Copyright (c) 2016 Citrix Systems Inc.
> + *
> + * This program is free software; you can redistribute it and/or modify it
> + * under the terms and conditions of the GNU General Public License,
> + * version 2, as published by the Free Software Foundation.
> + *
> + * This program is distributed in the hope it will be useful, but WITHOUT
> + * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
> + * FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License for
> + * more details.
> + *
> + * You should have received a copy of the GNU General Public License along with
> + * this program; If not, see <http://www.gnu.org/licenses/>.
> + */
> +
> +#ifndef __IOREQ_H__
> +#define __IOREQ_H__

__XEN_IOREQ_H__ please.

> +#include <xen/sched.h>
> +
> +#include <asm/hvm/ioreq.h>

Is this include really needed here (i.e. by the code further down in
the file, and hence by everyone including this header), or rather
just in a few specific .c files?

> +#define GET_IOREQ_SERVER(d, id) \
> +    (d)->arch.hvm.ioreq_server.server[id]

arch.hvm.* feels like a layering violation when used in this header.

Jan



* Re: [PATCH V1 03/16] xen/ioreq: Make x86's hvm_ioreq_needs_completion() common
  2020-09-10 20:21 ` [PATCH V1 03/16] xen/ioreq: Make x86's hvm_ioreq_needs_completion() common Oleksandr Tyshchenko
@ 2020-09-14 14:59   ` Jan Beulich
  2020-09-22 16:16     ` Oleksandr
  2020-09-23 17:27     ` Julien Grall
  0 siblings, 2 replies; 111+ messages in thread
From: Jan Beulich @ 2020-09-14 14:59 UTC (permalink / raw)
  To: Oleksandr Tyshchenko
  Cc: xen-devel, Oleksandr Tyshchenko, Jun Nakajima, Kevin Tian,
	Andrew Cooper, Wei Liu, Roger Pau Monné,
	Paul Durrant, Julien Grall, Stefano Stabellini, Julien Grall

On 10.09.2020 22:21, Oleksandr Tyshchenko wrote:
> --- a/xen/include/xen/ioreq.h
> +++ b/xen/include/xen/ioreq.h
> @@ -35,6 +35,13 @@ static inline struct hvm_ioreq_server *get_ioreq_server(const struct domain *d,
>      return GET_IOREQ_SERVER(d, id);
>  }
>  
> +static inline bool hvm_ioreq_needs_completion(const ioreq_t *ioreq)
> +{
> +    return ioreq->state == STATE_IOREQ_READY &&
> +           !ioreq->data_is_ptr &&
> +           (ioreq->type != IOREQ_TYPE_PIO || ioreq->dir != IOREQ_WRITE);
> +}

While the PIO aspect has been discussed at some length, what about
the data_is_ptr concept? I didn't think there were Arm insns fitting
this? Instead I thought some other Arm-specific adjustments to the
protocol might be needed. At which point the question of course would
be in how far ioreq_t as a whole really fits Arm in its current shape.

Jan



* Re: [PATCH V1 04/16] xen/ioreq: Provide alias for the handle_mmio()
  2020-09-10 20:21 ` [PATCH V1 04/16] xen/ioreq: Provide alias for the handle_mmio() Oleksandr Tyshchenko
@ 2020-09-14 15:10   ` Jan Beulich
  2020-09-22 16:20     ` Oleksandr
  2020-09-23 17:28   ` Julien Grall
  1 sibling, 1 reply; 111+ messages in thread
From: Jan Beulich @ 2020-09-14 15:10 UTC (permalink / raw)
  To: Oleksandr Tyshchenko
  Cc: xen-devel, Oleksandr Tyshchenko, Paul Durrant, Andrew Cooper,
	Wei Liu, Roger Pau Monné,
	Julien Grall, Stefano Stabellini, Julien Grall

On 10.09.2020 22:21, Oleksandr Tyshchenko wrote:
> --- a/xen/common/ioreq.c
> +++ b/xen/common/ioreq.c
> @@ -189,7 +189,7 @@ bool handle_hvm_io_completion(struct vcpu *v)
>          break;
>  
>      case HVMIO_mmio_completion:
> -        return handle_mmio();
> +        return ioreq_handle_complete_mmio();

Again the question: Any particular reason to have "handle" in here?
With the abstraction simply named ioreq_complete_mmio() feel free
to add
Acked-by: Jan Beulich <jbeulich@suse.com>

Jan



* Re: [PATCH V1 05/16] xen/ioreq: Make x86's hvm_mmio_first(last)_byte() common
  2020-09-10 20:21 ` [PATCH V1 05/16] xen/ioreq: Make x86's hvm_mmio_first(last)_byte() common Oleksandr Tyshchenko
@ 2020-09-14 15:13   ` Jan Beulich
  2020-09-22 16:24     ` Oleksandr
  0 siblings, 1 reply; 111+ messages in thread
From: Jan Beulich @ 2020-09-14 15:13 UTC (permalink / raw)
  To: Oleksandr Tyshchenko
  Cc: xen-devel, Oleksandr Tyshchenko, Paul Durrant, Andrew Cooper,
	Wei Liu, Roger Pau Monné,
	Julien Grall, Stefano Stabellini, Julien Grall

On 10.09.2020 22:21, Oleksandr Tyshchenko wrote:
> From: Oleksandr Tyshchenko <oleksandr_tyshchenko@epam.com>
> 
> The IOREQ is a common feature now and these helpers will be used
> on Arm as is. Move them to include/xen/ioreq.h

I think you also want to rename them, replacing the hvm_
prefix by ioreq_.
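
E.g. (body unchanged, only the name; sketched from memory):

    static inline paddr_t ioreq_mmio_first_byte(const ioreq_t *p)
    {
        return unlikely(p->df) ?
               p->addr - (p->count - 1ul) * p->size :
               p->addr;
    }

and similarly hvm_mmio_last_byte() -> ioreq_mmio_last_byte().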

Jan



* Re: [PATCH V1 06/16] xen/ioreq: Make x86's hvm_ioreq_(page/vcpu/server) structs common
  2020-09-10 20:22 ` [PATCH V1 06/16] xen/ioreq: Make x86's hvm_ioreq_(page/vcpu/server) structs common Oleksandr Tyshchenko
@ 2020-09-14 15:16   ` Jan Beulich
  2020-09-14 15:59     ` Julien Grall
  2020-09-22 16:33     ` Oleksandr
  0 siblings, 2 replies; 111+ messages in thread
From: Jan Beulich @ 2020-09-14 15:16 UTC (permalink / raw)
  To: Oleksandr Tyshchenko
  Cc: xen-devel, Oleksandr Tyshchenko, Andrew Cooper, Wei Liu,
	Roger Pau Monné,
	Paul Durrant, Julien Grall, Stefano Stabellini, Julien Grall

On 10.09.2020 22:22, Oleksandr Tyshchenko wrote:
> --- a/xen/include/xen/ioreq.h
> +++ b/xen/include/xen/ioreq.h
> @@ -23,6 +23,40 @@
>  
>  #include <asm/hvm/ioreq.h>
>  
> +struct hvm_ioreq_page {
> +    gfn_t gfn;
> +    struct page_info *page;
> +    void *va;
> +};
> +
> +struct hvm_ioreq_vcpu {
> +    struct list_head list_entry;
> +    struct vcpu      *vcpu;
> +    evtchn_port_t    ioreq_evtchn;
> +    bool             pending;
> +};
> +
> +#define NR_IO_RANGE_TYPES (XEN_DMOP_IO_RANGE_PCI + 1)
> +#define MAX_NR_IO_RANGES  256
> +
> +struct hvm_ioreq_server {
> +    struct domain          *target, *emulator;
> +
> +    /* Lock to serialize toolstack modifications */
> +    spinlock_t             lock;
> +
> +    struct hvm_ioreq_page  ioreq;
> +    struct list_head       ioreq_vcpu_list;
> +    struct hvm_ioreq_page  bufioreq;
> +
> +    /* Lock to serialize access to buffered ioreq ring */
> +    spinlock_t             bufioreq_lock;
> +    evtchn_port_t          bufioreq_evtchn;
> +    struct rangeset        *range[NR_IO_RANGE_TYPES];
> +    bool                   enabled;
> +    uint8_t                bufioreq_handling;
> +};

Besides there again being the question of hvm_ prefixes here,
is the bufioreq concept something Arm is meaning to make use
of? If not, this may want to become conditional ...

Jan



* Re: [PATCH V1 07/16] xen/dm: Make x86's DM feature common
  2020-09-10 20:22 ` [PATCH V1 07/16] xen/dm: Make x86's DM feature common Oleksandr Tyshchenko
@ 2020-09-14 15:56   ` Jan Beulich
  2020-09-22 16:46     ` Oleksandr
  2020-09-23 17:35   ` Julien Grall
  1 sibling, 1 reply; 111+ messages in thread
From: Jan Beulich @ 2020-09-14 15:56 UTC (permalink / raw)
  To: Oleksandr Tyshchenko
  Cc: xen-devel, Oleksandr Tyshchenko, Andrew Cooper, Wei Liu,
	Roger Pau Monné,
	George Dunlap, Ian Jackson, Julien Grall, Stefano Stabellini,
	Daniel De Graaf, Julien Grall

On 10.09.2020 22:22, Oleksandr Tyshchenko wrote:
> --- a/xen/include/xen/hypercall.h
> +++ b/xen/include/xen/hypercall.h
> @@ -150,6 +150,18 @@ do_dm_op(
>      unsigned int nr_bufs,
>      XEN_GUEST_HANDLE_PARAM(xen_dm_op_buf_t) bufs);
>  
> +struct dmop_args {
> +    domid_t domid;
> +    unsigned int nr_bufs;
> +    /* Reserve enough buf elements for all current hypercalls. */
> +    struct xen_dm_op_buf buf[2];
> +};
> +
> +int arch_dm_op(struct xen_dm_op *op,
> +               struct domain *d,
> +               const struct dmop_args *op_args,
> +               bool *const_op);
> +
>  #ifdef CONFIG_HYPFS
>  extern long
>  do_hypfs_op(

There are exactly two CUs which need to see these two declarations.
Personally I think they should go into a new header, or at least
into one that half-way fits (from the pov of its other contents)
and doesn't get included by half the code base. But maybe it's
just me ...
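
FWIW a minimal sketch of such a header (the name being just a
suggestion):

    /* xen/include/xen/dm.h */
    #ifndef __XEN_DM_H__
    #define __XEN_DM_H__

    #include <public/hvm/dm_op.h>

    struct domain;

    struct dmop_args {
        domid_t domid;
        unsigned int nr_bufs;
        /* Reserve enough buf elements for all current hypercalls. */
        struct xen_dm_op_buf buf[2];
    };

    int arch_dm_op(struct xen_dm_op *op, struct domain *d,
                   const struct dmop_args *op_args, bool *const_op);

    #endif /* __XEN_DM_H__ */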

Jan



* Re: [PATCH V1 06/16] xen/ioreq: Make x86's hvm_ioreq_(page/vcpu/server) structs common
  2020-09-14 15:16   ` Jan Beulich
@ 2020-09-14 15:59     ` Julien Grall
  2020-09-22 16:33     ` Oleksandr
  1 sibling, 0 replies; 111+ messages in thread
From: Julien Grall @ 2020-09-14 15:59 UTC (permalink / raw)
  To: Jan Beulich, Oleksandr Tyshchenko
  Cc: xen-devel, Oleksandr Tyshchenko, Andrew Cooper, Wei Liu,
	Roger Pau Monné,
	Paul Durrant, Stefano Stabellini, Julien Grall

Hi Jan,

On 14/09/2020 16:16, Jan Beulich wrote:
> On 10.09.2020 22:22, Oleksandr Tyshchenko wrote:
>> --- a/xen/include/xen/ioreq.h
>> +++ b/xen/include/xen/ioreq.h
>> @@ -23,6 +23,40 @@
>>   
>>   #include <asm/hvm/ioreq.h>
>>   
>> +struct hvm_ioreq_page {
>> +    gfn_t gfn;
>> +    struct page_info *page;
>> +    void *va;
>> +};
>> +
>> +struct hvm_ioreq_vcpu {
>> +    struct list_head list_entry;
>> +    struct vcpu      *vcpu;
>> +    evtchn_port_t    ioreq_evtchn;
>> +    bool             pending;
>> +};
>> +
>> +#define NR_IO_RANGE_TYPES (XEN_DMOP_IO_RANGE_PCI + 1)
>> +#define MAX_NR_IO_RANGES  256
>> +
>> +struct hvm_ioreq_server {
>> +    struct domain          *target, *emulator;
>> +
>> +    /* Lock to serialize toolstack modifications */
>> +    spinlock_t             lock;
>> +
>> +    struct hvm_ioreq_page  ioreq;
>> +    struct list_head       ioreq_vcpu_list;
>> +    struct hvm_ioreq_page  bufioreq;
>> +
>> +    /* Lock to serialize access to buffered ioreq ring */
>> +    spinlock_t             bufioreq_lock;
>> +    evtchn_port_t          bufioreq_evtchn;
>> +    struct rangeset        *range[NR_IO_RANGE_TYPES];
>> +    bool                   enabled;
>> +    uint8_t                bufioreq_handling;
>> +};
> 
> Besides there again being the question of hvm_ prefixes here,
> is the bufioreq concept something Arm is meaning to make use
> of? If not, this may want to become conditional ...

Yes, I would like to make use of it to optimize virtio notifications as 
we don't need to wait for them to be processed by the IOREQ server.

Cheers,

-- 
Julien Grall



* Re: [PATCH V1 10/16] xen/mm: Handle properly reference in set_foreign_p2m_entry() on Arm
  2020-09-10 20:22 ` [PATCH V1 10/16] xen/mm: Handle properly reference in set_foreign_p2m_entry() on Arm Oleksandr Tyshchenko
@ 2020-09-16  7:17   ` Jan Beulich
  2020-09-16  8:50     ` Julien Grall
  2020-09-16  8:08   ` Jan Beulich
  1 sibling, 1 reply; 111+ messages in thread
From: Jan Beulich @ 2020-09-16  7:17 UTC (permalink / raw)
  To: Oleksandr Tyshchenko
  Cc: xen-devel, Oleksandr Tyshchenko, Stefano Stabellini,
	Julien Grall, Volodymyr Babchuk, Andrew Cooper, George Dunlap,
	Ian Jackson, Wei Liu, Roger Pau Monné,
	Julien Grall

On 10.09.2020 22:22, Oleksandr Tyshchenko wrote:
> --- a/xen/common/memory.c
> +++ b/xen/common/memory.c
> @@ -1155,6 +1155,7 @@ static int acquire_resource(
>          xen_pfn_t gfn_list[ARRAY_SIZE(mfn_list)];
>          unsigned int i;
>  
> +#ifndef CONFIG_ARM
>          /*
>           * FIXME: Until foreign pages inserted into the P2M are properly
>           *        reference counted, it is unsafe to allow mapping of
> @@ -1162,13 +1163,14 @@ static int acquire_resource(
>           */
>          if ( !is_hardware_domain(currd) )
>              return -EACCES;
> +#endif

Instead of #ifdef, may I ask that a predicate like arch_refcounts_p2m()
be used?
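
E.g. (sketch only; the Arm variant would return true once it takes
proper references):

    /* x86: foreign pages inserted into the P2M aren't refcounted yet. */
    static inline bool arch_refcounts_p2m(void)
    {
        return false;
    }

with the check above then becoming

    if ( !arch_refcounts_p2m() && !is_hardware_domain(currd) )
        return -EACCES;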

>          if ( copy_from_guest(gfn_list, xmar.frame_list, xmar.nr_frames) )
>              rc = -EFAULT;
>  
>          for ( i = 0; !rc && i < xmar.nr_frames; i++ )
>          {
> -            rc = set_foreign_p2m_entry(currd, gfn_list[i],
> +            rc = set_foreign_p2m_entry(currd, d, gfn_list[i],
>                                         _mfn(mfn_list[i]));

Is it going to lead to proper behavior when d == currd, specifically
for Arm but also in the abstract model? If you expose this to other
than Dom0, this case needs at least considering (and hence mentioning
in the description of why it's safe to permit if you don't reject
such attempts). Personally I'd view it as wrong to use
p2m_map_foreign_rw in this case, even in the event that it can be
shown that nothing breaks in such a case. But I'm open to being
convinced of the opposite.

> --- a/xen/include/asm-arm/p2m.h
> +++ b/xen/include/asm-arm/p2m.h
> @@ -381,15 +381,8 @@ static inline gfn_t gfn_next_boundary(gfn_t gfn, unsigned int order)
>      return gfn_add(gfn, 1UL << order);
>  }
>  
> -static inline int set_foreign_p2m_entry(struct domain *d, unsigned long gfn,
> -                                        mfn_t mfn)
> -{
> -    /*
> -     * NOTE: If this is implemented then proper reference counting of
> -     *       foreign entries will need to be implemented.
> -     */
> -    return -EOPNOTSUPP;
> -}
> +int set_foreign_p2m_entry(struct domain *d, struct domain *fd,
> +                          unsigned long gfn,  mfn_t mfn);

With this and ...

> --- a/xen/include/asm-x86/p2m.h
> +++ b/xen/include/asm-x86/p2m.h
> @@ -635,7 +635,8 @@ int p2m_is_logdirty_range(struct p2m_domain *, unsigned long start,
>                            unsigned long end);
>  
>  /* Set foreign entry in the p2m table (for priv-mapping) */
> -int set_foreign_p2m_entry(struct domain *d, unsigned long gfn, mfn_t mfn);
> +int set_foreign_p2m_entry(struct domain *d, struct domain *fd,
> +                          unsigned long gfn, mfn_t mfn);

... this now being identical (and as a result it being expected
that future ports would also want to implement a proper function)
except for the stray blank in the Arm variant, I'd like it to be
at least considered to move the declaration to xen/p2m-common.h.

Jan



* Re: [PATCH V1 09/16] arm/ioreq: Introduce arch specific bits for IOREQ/DM features
  2020-09-10 20:22 ` [PATCH V1 09/16] arm/ioreq: Introduce arch specific bits for IOREQ/DM features Oleksandr Tyshchenko
  2020-09-11 10:14   ` Oleksandr
@ 2020-09-16  7:51   ` Jan Beulich
  2020-09-22 17:12     ` Oleksandr
  2020-09-23 18:03   ` Julien Grall
  2 siblings, 1 reply; 111+ messages in thread
From: Jan Beulich @ 2020-09-16  7:51 UTC (permalink / raw)
  To: Oleksandr Tyshchenko
  Cc: xen-devel, Oleksandr Tyshchenko, Stefano Stabellini,
	Julien Grall, Volodymyr Babchuk, Julien Grall

On 10.09.2020 22:22, Oleksandr Tyshchenko wrote:
> @@ -2277,8 +2299,10 @@ void leave_hypervisor_to_guest(void)
>  {
>      local_irq_disable();
>  
> -    check_for_vcpu_work();
> -    check_for_pcpu_work();
> +    do
> +    {
> +        check_for_pcpu_work();
> +    } while ( check_for_vcpu_work() );

Looking at patch 11 I've stumbled across changes done to code
related to this, and now I wonder: There's no mention in the
description of why this is safe (i.e. not a potentially unbounded
loop).

As a nit - the opening brace does not belong on its own line in
this specific case.
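
I.e.:

    do {
        check_for_pcpu_work();
    } while ( check_for_vcpu_work() );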

Jan



* Re: [PATCH V1 11/16] xen/ioreq: Introduce hvm_domain_has_ioreq_server()
  2020-09-10 20:22 ` [PATCH V1 11/16] xen/ioreq: Introduce hvm_domain_has_ioreq_server() Oleksandr Tyshchenko
@ 2020-09-16  8:04   ` Jan Beulich
  2020-09-16  8:13     ` Paul Durrant
  2020-09-22 18:23     ` Oleksandr
  0 siblings, 2 replies; 111+ messages in thread
From: Jan Beulich @ 2020-09-16  8:04 UTC (permalink / raw)
  To: Oleksandr Tyshchenko, Paul Durrant
  Cc: xen-devel, Oleksandr Tyshchenko, Stefano Stabellini,
	Julien Grall, Volodymyr Babchuk, Andrew Cooper, Wei Liu,
	Roger Pau Monné,
	Julien Grall

On 10.09.2020 22:22, Oleksandr Tyshchenko wrote:
> From: Oleksandr Tyshchenko <oleksandr_tyshchenko@epam.com>
> 
> This patch introduces a helper the main purpose of which is to check
> if a domain is using IOREQ server(s).
> 
> On Arm the benefit is to avoid calling handle_hvm_io_completion()
> (which implies iterating over all possible IOREQ servers anyway)
> on every return in leave_hypervisor_to_guest() if there are no active
> servers for the particular domain.
> 
> This involves adding an extra per-domain variable to store the count
> of servers in use.

Since only Arm needs the variable (and the helper), perhaps both should
be Arm-specific (which looks to be possible without overly much hassle)?

> --- a/xen/common/ioreq.c
> +++ b/xen/common/ioreq.c
> @@ -38,9 +38,15 @@ static void set_ioreq_server(struct domain *d, unsigned int id,
>                               struct hvm_ioreq_server *s)
>  {
>      ASSERT(id < MAX_NR_IOREQ_SERVERS);
> -    ASSERT(!s || !d->arch.hvm.ioreq_server.server[id]);
> +    ASSERT((!s && d->arch.hvm.ioreq_server.server[id]) ||
> +           (s && !d->arch.hvm.ioreq_server.server[id]));

For one, this can be had with less redundancy (and imo even improved
clarity, but I guess this latter aspect may depend on personal
preferences):

    ASSERT(d->arch.hvm.ioreq_server.server[id] ? !s : !!s);

But then I wonder whether the original intention wasn't rather such
that replacing NULL by NULL is permissible. Paul?

>      d->arch.hvm.ioreq_server.server[id] = s;
> +
> +    if ( s )
> +        d->arch.hvm.ioreq_server.nr_servers ++;
> +    else
> +        d->arch.hvm.ioreq_server.nr_servers --;

Nit: Stray blanks (should be there only with binary operators).

> @@ -1395,6 +1401,7 @@ unsigned int hvm_broadcast_ioreq(ioreq_t *p, bool buffered)
>  void hvm_ioreq_init(struct domain *d)
>  {
>      spin_lock_init(&d->arch.hvm.ioreq_server.lock);
> +    d->arch.hvm.ioreq_server.nr_servers = 0;

There's no need for this - struct domain instances start out all
zero anyway.

> --- a/xen/include/xen/ioreq.h
> +++ b/xen/include/xen/ioreq.h
> @@ -57,6 +57,11 @@ struct hvm_ioreq_server {
>      uint8_t                bufioreq_handling;
>  };
>  
> +static inline bool hvm_domain_has_ioreq_server(const struct domain *d)
> +{
> +    return (d->arch.hvm.ioreq_server.nr_servers > 0);
> +}

This is safe only when d == current->domain and it's not paused,
or when they're distinct and d is paused. Otherwise the result is
stale before the caller can inspect it. This wants documenting by
at least a comment, but perhaps better by suitable ASSERT()s.

As in earlier patches I don't think a hvm_ prefix should be used
here.

Also as a nit: The parentheses here are unnecessary, and strictly
speaking the "> 0" is, too.
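
Taking these together, a sketch of what I have in mind (the exact
assertion may well want refining):

    static inline bool domain_has_ioreq_server(const struct domain *d)
    {
        ASSERT(d == current->domain || atomic_read(&d->pause_count));

        return d->arch.hvm.ioreq_server.nr_servers;
    }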

Jan



* Re: [PATCH V1 10/16] xen/mm: Handle properly reference in set_foreign_p2m_entry() on Arm
  2020-09-10 20:22 ` [PATCH V1 10/16] xen/mm: Handle properly reference in set_foreign_p2m_entry() on Arm Oleksandr Tyshchenko
  2020-09-16  7:17   ` Jan Beulich
@ 2020-09-16  8:08   ` Jan Beulich
  1 sibling, 0 replies; 111+ messages in thread
From: Jan Beulich @ 2020-09-16  8:08 UTC (permalink / raw)
  To: Oleksandr Tyshchenko
  Cc: xen-devel, Oleksandr Tyshchenko, Stefano Stabellini,
	Julien Grall, Volodymyr Babchuk, Andrew Cooper, George Dunlap,
	Ian Jackson, Wei Liu, Roger Pau Monné,
	Julien Grall

On 10.09.2020 22:22, Oleksandr Tyshchenko wrote:
> --- a/xen/include/asm-x86/p2m.h
> +++ b/xen/include/asm-x86/p2m.h
> @@ -635,7 +635,8 @@ int p2m_is_logdirty_range(struct p2m_domain *, unsigned long start,
>                            unsigned long end);
>  
>  /* Set foreign entry in the p2m table (for priv-mapping) */
> -int set_foreign_p2m_entry(struct domain *d, unsigned long gfn, mfn_t mfn);
> +int set_foreign_p2m_entry(struct domain *d, struct domain *fd,
> +                          unsigned long gfn, mfn_t mfn);

Once

https://lists.xenproject.org/archives/html/xen-devel/2020-09/msg01092.html

has gone in, the new parameter wants to be const-qualified.
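
I.e.:

    int set_foreign_p2m_entry(struct domain *d, const struct domain *fd,
                              unsigned long gfn, mfn_t mfn);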

Jan



* RE: [PATCH V1 11/16] xen/ioreq: Introduce hvm_domain_has_ioreq_server()
  2020-09-16  8:04   ` Jan Beulich
@ 2020-09-16  8:13     ` Paul Durrant
  2020-09-16  8:39       ` Julien Grall
  2020-09-22 18:23     ` Oleksandr
  1 sibling, 1 reply; 111+ messages in thread
From: Paul Durrant @ 2020-09-16  8:13 UTC (permalink / raw)
  To: 'Jan Beulich', 'Oleksandr Tyshchenko'
  Cc: xen-devel, 'Oleksandr Tyshchenko',
	'Stefano Stabellini', 'Julien Grall',
	'Volodymyr Babchuk', 'Andrew Cooper',
	'Wei Liu', 'Roger Pau Monné',
	'Julien Grall'

> -----Original Message-----
> From: Jan Beulich <jbeulich@suse.com>
> Sent: 16 September 2020 09:05
> To: Oleksandr Tyshchenko <olekstysh@gmail.com>; Paul Durrant <paul@xen.org>
> Cc: xen-devel@lists.xenproject.org; Oleksandr Tyshchenko <oleksandr_tyshchenko@epam.com>; Stefano
> Stabellini <sstabellini@kernel.org>; Julien Grall <julien@xen.org>; Volodymyr Babchuk
> <Volodymyr_Babchuk@epam.com>; Andrew Cooper <andrew.cooper3@citrix.com>; Wei Liu <wl@xen.org>; Roger
> Pau Monné <roger.pau@citrix.com>; Julien Grall <julien.grall@arm.com>
> Subject: Re: [PATCH V1 11/16] xen/ioreq: Introduce hvm_domain_has_ioreq_server()
> 
> On 10.09.2020 22:22, Oleksandr Tyshchenko wrote:
> > From: Oleksandr Tyshchenko <oleksandr_tyshchenko@epam.com>
> >
> > This patch introduces a helper the main purpose of which is to check
> > if a domain is using IOREQ server(s).
> >
> > On Arm the benefit is to avoid calling handle_hvm_io_completion()
> > (which implies iterating over all possible IOREQ servers anyway)
> > on every return in leave_hypervisor_to_guest() if there are no active
> > servers for the particular domain.
> >

Is this really worth it? The limit on the number of ioreq servers is small... just 8. I doubt you'd be able to measure the difference.

> > This involves adding an extra per-domain variable to store the count
> > of servers in use.
> 
> Since only Arm needs the variable (and the helper), perhaps both should
> be Arm-specific (which looks to be possible without overly much hassle)?
> 
> > --- a/xen/common/ioreq.c
> > +++ b/xen/common/ioreq.c
> > @@ -38,9 +38,15 @@ static void set_ioreq_server(struct domain *d, unsigned int id,
> >                               struct hvm_ioreq_server *s)
> >  {
> >      ASSERT(id < MAX_NR_IOREQ_SERVERS);
> > -    ASSERT(!s || !d->arch.hvm.ioreq_server.server[id]);
> > +    ASSERT((!s && d->arch.hvm.ioreq_server.server[id]) ||
> > +           (s && !d->arch.hvm.ioreq_server.server[id]));
> 
> For one, this can be had with less redundancy (and imo even improved
> clarity, but I guess this latter aspect my depend on personal
> preferences):
> 
>     ASSERT(d->arch.hvm.ioreq_server.server[id] ? !s : !!s);
> 
> But then I wonder whether the original intention wasn't rather such
> that replacing NULL by NULL is permissible. Paul?
> 

Yikes, that was a long time ago... but I can't see why the check for !s would be there unless it was indeed intended to allow replacing NULL with NULL.

  Paul




* Re: [PATCH V1 11/16] xen/ioreq: Introduce hvm_domain_has_ioreq_server()
  2020-09-16  8:13     ` Paul Durrant
@ 2020-09-16  8:39       ` Julien Grall
  2020-09-16  8:43         ` Paul Durrant
  0 siblings, 1 reply; 111+ messages in thread
From: Julien Grall @ 2020-09-16  8:39 UTC (permalink / raw)
  To: paul, 'Jan Beulich', 'Oleksandr Tyshchenko'
  Cc: xen-devel, 'Oleksandr Tyshchenko',
	'Stefano Stabellini', 'Volodymyr Babchuk',
	'Andrew Cooper', 'Wei Liu',
	'Roger Pau Monné', 'Julien Grall'



On 16/09/2020 09:13, Paul Durrant wrote:
>> -----Original Message-----
>> From: Jan Beulich <jbeulich@suse.com>
>> Sent: 16 September 2020 09:05
>> To: Oleksandr Tyshchenko <olekstysh@gmail.com>; Paul Durrant <paul@xen.org>
>> Cc: xen-devel@lists.xenproject.org; Oleksandr Tyshchenko <oleksandr_tyshchenko@epam.com>; Stefano
>> Stabellini <sstabellini@kernel.org>; Julien Grall <julien@xen.org>; Volodymyr Babchuk
>> <Volodymyr_Babchuk@epam.com>; Andrew Cooper <andrew.cooper3@citrix.com>; Wei Liu <wl@xen.org>; Roger
>> Pau Monné <roger.pau@citrix.com>; Julien Grall <julien.grall@arm.com>
>> Subject: Re: [PATCH V1 11/16] xen/ioreq: Introduce hvm_domain_has_ioreq_server()
>>
>> On 10.09.2020 22:22, Oleksandr Tyshchenko wrote:
>>> From: Oleksandr Tyshchenko <oleksandr_tyshchenko@epam.com>
>>>
>>> This patch introduces a helper the main purpose of which is to check
>>> if a domain is using IOREQ server(s).
>>>
>>> On Arm the benefit is to avoid calling handle_hvm_io_completion()
>>> (which implies iterating over all possible IOREQ servers anyway)
>>> on every return in leave_hypervisor_to_guest() if there are no active
>>> servers for the particular domain.
>>>
> 
> Is this really worth it? The limit on the number of ioreq servers is small... just 8.

When I suggested this, I failed to realize there were only 8 IOREQ
servers available. However, I would not be surprised if this increases
long term as we want to use more of them.

> I doubt you'd be able to measure the difference.
Bear in mind that entry/exit to the hypervisor is pretty "cheap" on Arm
compared to x86. So we want to avoid doing extra work if it is not necessary.

Cheers,

-- 
Julien Grall



* RE: [PATCH V1 11/16] xen/ioreq: Introduce hvm_domain_has_ioreq_server()
  2020-09-16  8:39       ` Julien Grall
@ 2020-09-16  8:43         ` Paul Durrant
  2020-09-22 18:39           ` Oleksandr
  0 siblings, 1 reply; 111+ messages in thread
From: Paul Durrant @ 2020-09-16  8:43 UTC (permalink / raw)
  To: 'Julien Grall', 'Jan Beulich',
	'Oleksandr Tyshchenko'
  Cc: xen-devel, 'Oleksandr Tyshchenko',
	'Stefano Stabellini', 'Volodymyr Babchuk',
	'Andrew Cooper', 'Wei Liu',
	'Roger Pau Monné', 'Julien Grall'

> -----Original Message-----
> From: Julien Grall <julien@xen.org>
> Sent: 16 September 2020 09:39
> To: paul@xen.org; 'Jan Beulich' <jbeulich@suse.com>; 'Oleksandr Tyshchenko' <olekstysh@gmail.com>
> Cc: xen-devel@lists.xenproject.org; 'Oleksandr Tyshchenko' <oleksandr_tyshchenko@epam.com>; 'Stefano
> Stabellini' <sstabellini@kernel.org>; 'Volodymyr Babchuk' <Volodymyr_Babchuk@epam.com>; 'Andrew
> Cooper' <andrew.cooper3@citrix.com>; 'Wei Liu' <wl@xen.org>; 'Roger Pau Monné' <roger.pau@citrix.com>;
> 'Julien Grall' <julien.grall@arm.com>
> Subject: Re: [PATCH V1 11/16] xen/ioreq: Introduce hvm_domain_has_ioreq_server()
> 
> 
> 
> On 16/09/2020 09:13, Paul Durrant wrote:
> >> -----Original Message-----
> >> From: Jan Beulich <jbeulich@suse.com>
> >> Sent: 16 September 2020 09:05
> >> To: Oleksandr Tyshchenko <olekstysh@gmail.com>; Paul Durrant <paul@xen.org>
> >> Cc: xen-devel@lists.xenproject.org; Oleksandr Tyshchenko <oleksandr_tyshchenko@epam.com>; Stefano
> >> Stabellini <sstabellini@kernel.org>; Julien Grall <julien@xen.org>; Volodymyr Babchuk
> >> <Volodymyr_Babchuk@epam.com>; Andrew Cooper <andrew.cooper3@citrix.com>; Wei Liu <wl@xen.org>;
> Roger
> >> Pau Monné <roger.pau@citrix.com>; Julien Grall <julien.grall@arm.com>
> >> Subject: Re: [PATCH V1 11/16] xen/ioreq: Introduce hvm_domain_has_ioreq_server()
> >>
> >> On 10.09.2020 22:22, Oleksandr Tyshchenko wrote:
> >>> From: Oleksandr Tyshchenko <oleksandr_tyshchenko@epam.com>
> >>>
> >>> This patch introduces a helper the main purpose of which is to check
> >>> if a domain is using IOREQ server(s).
> >>>
> >>> On Arm the benefit is to avoid calling handle_hvm_io_completion()
> >>> (which implies iterating over all possible IOREQ servers anyway)
> >>> on every return in leave_hypervisor_to_guest() if there are no active
> >>> servers for the particular domain.
> >>>
> >
> > Is this really worth it? The limit on the number of ioreq servers is small... just 8.
> 
> When I suggested this, I failed to realize there was only 8 IOREQ
> servers available. However, I would not be surprised if this increase
> long term as we want to use

If that happens then we'll probably want to move (back to) a list rather than an array...

> 
> > I doubt you'd be able to measure the difference.
> Bear in mind that entry/exit to the hypervisor is pretty "cheap" on Arm
> compared to x86. So we want to avoid doing extra work if it is not necessary.
> 

... which will seamlessly deal with this issue.

  Paul

> Cheers,
> 
> --
> Julien Grall




* Re: [PATCH V1 13/16] xen/ioreq: Make x86's invalidate qemu mapcache handling common
  2020-09-10 20:22 ` [PATCH V1 13/16] xen/ioreq: Make x86's invalidate qemu mapcache handling common Oleksandr Tyshchenko
@ 2020-09-16  8:50   ` Jan Beulich
  2020-09-22 19:32     ` Oleksandr
  0 siblings, 1 reply; 111+ messages in thread
From: Jan Beulich @ 2020-09-16  8:50 UTC (permalink / raw)
  To: Oleksandr Tyshchenko, Roger Pau Monné
  Cc: xen-devel, Oleksandr Tyshchenko, Stefano Stabellini,
	Julien Grall, Volodymyr Babchuk, Andrew Cooper, George Dunlap,
	Ian Jackson, Wei Liu, Paul Durrant, Julien Grall

On 10.09.2020 22:22, Oleksandr Tyshchenko wrote:
> --- a/xen/arch/arm/traps.c
> +++ b/xen/arch/arm/traps.c
> @@ -1490,6 +1490,12 @@ static void do_trap_hypercall(struct cpu_user_regs *regs, register_t *nr,
>      /* Ensure the hypercall trap instruction is re-executed. */
>      if ( current->hcall_preempted )
>          regs->pc -= 4;  /* re-execute 'hvc #XEN_HYPERCALL_TAG' */
> +
> +#ifdef CONFIG_IOREQ_SERVER
> +    if ( unlikely(current->domain->qemu_mapcache_invalidate) &&
> +         test_and_clear_bool(current->domain->qemu_mapcache_invalidate) )
> +        send_invalidate_req();
> +#endif
>  }

There are a lot of uses of "current" here now, and these don't look to
be exactly cheap on Arm either (they aren't on x86), so I wonder
whether this is the point where at least "current" wants latching
into a local variable.
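
I.e. something along the lines of

    struct vcpu *curr = current;

at the top of the function, and then

    if ( unlikely(curr->domain->qemu_mapcache_invalidate) &&
         test_and_clear_bool(curr->domain->qemu_mapcache_invalidate) )
        send_invalidate_req();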

> --- a/xen/arch/x86/hvm/hypercall.c
> +++ b/xen/arch/x86/hvm/hypercall.c
> @@ -18,8 +18,10 @@
>   *
>   * Copyright (c) 2017 Citrix Systems Ltd.
>   */
> +
>  #include <xen/lib.h>
>  #include <xen/hypercall.h>
> +#include <xen/ioreq.h>
>  #include <xen/nospec.h>

While I don't care much about the presence or absence of the blank
line between head comment and #include-s, I don't see why you add
one here.

> --- a/xen/common/memory.c
> +++ b/xen/common/memory.c
> @@ -1651,6 +1651,11 @@ long do_memory_op(unsigned long cmd, XEN_GUEST_HANDLE_PARAM(void) arg)
>          break;
>      }
>  
> +#ifdef CONFIG_IOREQ_SERVER
> +    if ( op == XENMEM_decrease_reservation )
> +        curr_d->qemu_mapcache_invalidate = true;
> +#endif

I don't see why you don't put this right into decrease_reservation().
This isn't just to avoid the extra conditional, but first and foremost
to avoid the new logic being bypassed by the earlier return from the
function (in the case of preemption). In the context of this I wonder
whether the ordering of operations in hvm_hypercall() is actually correct.

I'm also unconvinced curr_d is the right domain in all cases here;
while this may be a pre-existing issue in principle, I'm afraid it
gets more pronounced by the logic getting moved to common code.
Roger - thoughts either way with, in particular, PVH Dom0 in mind?

> --- a/xen/include/xen/ioreq.h
> +++ b/xen/include/xen/ioreq.h
> @@ -97,6 +97,8 @@ static inline bool hvm_ioreq_needs_completion(const ioreq_t *ioreq)
>             (ioreq->type != IOREQ_TYPE_PIO || ioreq->dir != IOREQ_WRITE);
>  }
>  
> +void send_invalidate_req(void);

Perhaps rename to ioreq_send_invalidate(), ioreq_send_invalidate_req(),
or send_invalidate_ioreq() at this occasion?

> --- a/xen/include/xen/sched.h
> +++ b/xen/include/xen/sched.h
> @@ -512,6 +512,8 @@ struct domain
>      /* Argo interdomain communication support */
>      struct argo_domain *argo;
>  #endif
> +
> +    bool_t qemu_mapcache_invalidate;

"bool" please.

Jan



* Re: [PATCH V1 10/16] xen/mm: Handle properly reference in set_foreign_p2m_entry() on Arm
  2020-09-16  7:17   ` Jan Beulich
@ 2020-09-16  8:50     ` Julien Grall
  2020-09-16  8:52       ` Jan Beulich
  0 siblings, 1 reply; 111+ messages in thread
From: Julien Grall @ 2020-09-16  8:50 UTC (permalink / raw)
  To: Jan Beulich, Oleksandr Tyshchenko
  Cc: xen-devel, Oleksandr Tyshchenko, Stefano Stabellini,
	Volodymyr Babchuk, Andrew Cooper, George Dunlap, Ian Jackson,
	Wei Liu, Roger Pau Monné,
	Julien Grall

Hi,

On 16/09/2020 08:17, Jan Beulich wrote:
> On 10.09.2020 22:22, Oleksandr Tyshchenko wrote:
>> --- a/xen/common/memory.c
>> +++ b/xen/common/memory.c
>> @@ -1155,6 +1155,7 @@ static int acquire_resource(
>>           xen_pfn_t gfn_list[ARRAY_SIZE(mfn_list)];
>>           unsigned int i;
>>   
>> +#ifndef CONFIG_ARM
>>           /*
>>            * FIXME: Until foreign pages inserted into the P2M are properly
>>            *        reference counted, it is unsafe to allow mapping of
>> @@ -1162,13 +1163,14 @@ static int acquire_resource(
>>            */
>>           if ( !is_hardware_domain(currd) )
>>               return -EACCES;
>> +#endif
> 
> Instead of #ifdef, may I ask that a predicate like arch_refcounts_p2m()
> be used?

+1

> 
>>           if ( copy_from_guest(gfn_list, xmar.frame_list, xmar.nr_frames) )
>>               rc = -EFAULT;
>>   
>>           for ( i = 0; !rc && i < xmar.nr_frames; i++ )
>>           {
>> -            rc = set_foreign_p2m_entry(currd, gfn_list[i],
>> +            rc = set_foreign_p2m_entry(currd, d, gfn_list[i],
>>                                          _mfn(mfn_list[i]));
> 
> Is it going to lead to proper behavior when d == currd, specifically
> for Arm but also in the abstract model? If you expose this to other
> than Dom0, this case needs at least considering (and hence mentioning
> in the description of why it's safe to permit if you don't reject
> such attempts).

This is already rejected by rcu_lock_remote_domain_by_id().

> Personally I'd view it as wrong to use
> p2m_map_foreign_rw in this case, even in the event that it can be
> shown that nothing breaks in such a case. But I'm open to be
> convinced of the opposite.

I would agree that p2m_map_foreign_rw would be wrong to use when currd 
== d. But this cannot happen, so I think p2m_map_foreign_rw is the 
proper type.

Cheers,

-- 
Julien Grall



* Re: [PATCH V1 10/16] xen/mm: Handle properly reference in set_foreign_p2m_entry() on Arm
  2020-09-16  8:50     ` Julien Grall
@ 2020-09-16  8:52       ` Jan Beulich
  2020-09-16  8:55         ` Julien Grall
  0 siblings, 1 reply; 111+ messages in thread
From: Jan Beulich @ 2020-09-16  8:52 UTC (permalink / raw)
  To: Julien Grall
  Cc: Oleksandr Tyshchenko, xen-devel, Oleksandr Tyshchenko,
	Stefano Stabellini, Volodymyr Babchuk, Andrew Cooper,
	George Dunlap, Ian Jackson, Wei Liu, Roger Pau Monné,
	Julien Grall

On 16.09.2020 10:50, Julien Grall wrote:
> On 16/09/2020 08:17, Jan Beulich wrote:
>> On 10.09.2020 22:22, Oleksandr Tyshchenko wrote:
>>>           for ( i = 0; !rc && i < xmar.nr_frames; i++ )
>>>           {
>>> -            rc = set_foreign_p2m_entry(currd, gfn_list[i],
>>> +            rc = set_foreign_p2m_entry(currd, d, gfn_list[i],
>>>                                          _mfn(mfn_list[i]));
>>
>> Is it going to lead to proper behavior when d == currd, specifically
>> for Arm but also in the abstract model? If you expose this to other
>> than Dom0, this case needs at least considering (and hence mentioning
>> in the description of why it's safe to permit if you don't reject
>> such attempts).
> 
> This is already rejected by rcu_lock_remote_domain_by_id().

Oh, yes, I'm sorry for overlooking this.

Jan



* Re: [PATCH V1 10/16] xen/mm: Handle properly reference in set_foreign_p2m_entry() on Arm
  2020-09-16  8:52       ` Jan Beulich
@ 2020-09-16  8:55         ` Julien Grall
  2020-09-22 17:30           ` Oleksandr
  0 siblings, 1 reply; 111+ messages in thread
From: Julien Grall @ 2020-09-16  8:55 UTC (permalink / raw)
  To: Jan Beulich
  Cc: Oleksandr Tyshchenko, xen-devel, Oleksandr Tyshchenko,
	Stefano Stabellini, Volodymyr Babchuk, Andrew Cooper,
	George Dunlap, Ian Jackson, Wei Liu, Roger Pau Monné,
	Julien Grall



On 16/09/2020 09:52, Jan Beulich wrote:
> On 16.09.2020 10:50, Julien Grall wrote:
>> On 16/09/2020 08:17, Jan Beulich wrote:
>>> On 10.09.2020 22:22, Oleksandr Tyshchenko wrote:
>>>>            for ( i = 0; !rc && i < xmar.nr_frames; i++ )
>>>>            {
>>>> -            rc = set_foreign_p2m_entry(currd, gfn_list[i],
>>>> +            rc = set_foreign_p2m_entry(currd, d, gfn_list[i],
>>>>                                           _mfn(mfn_list[i]));
>>>
>>> Is it going to lead to proper behavior when d == currd, specifically
>>> for Arm but also in the abstract model? If you expose this to other
>>> than Dom0, this case needs at least considering (and hence mentioning
>>> in the description of why it's safe to permit if you don't reject
>>> such attempts).
>>
>> This is already rejected by rcu_lock_remote_domain_by_id().
> 
> Oh, yes, I'm sorry for overlooking this.

That's fine, I also overlooked it when I originally wrote the code.

@oleksandr, it might be worth mentioning this subtlety in the commit
message and maybe in the code.

Cheers,

-- 
Julien Grall



* Re: [PATCH V1 14/16] xen/ioreq: Use guest_cmpxchg64() instead of cmpxchg()
  2020-09-10 20:22 ` [PATCH V1 14/16] xen/ioreq: Use guest_cmpxchg64() instead of cmpxchg() Oleksandr Tyshchenko
@ 2020-09-16  9:04   ` Jan Beulich
  2020-09-16  9:07     ` Julien Grall
  2020-09-16  9:07     ` Paul Durrant
  2020-09-23 18:05   ` Julien Grall
  1 sibling, 2 replies; 111+ messages in thread
From: Jan Beulich @ 2020-09-16  9:04 UTC (permalink / raw)
  To: Oleksandr Tyshchenko
  Cc: xen-devel, Oleksandr Tyshchenko, Paul Durrant, Julien Grall,
	Stefano Stabellini, Julien Grall

On 10.09.2020 22:22, Oleksandr Tyshchenko wrote:
> @@ -1325,7 +1327,7 @@ static int hvm_send_buffered_ioreq(struct hvm_ioreq_server *s, ioreq_t *p)
>  
>          new.read_pointer = old.read_pointer - n * IOREQ_BUFFER_SLOT_NUM;
>          new.write_pointer = old.write_pointer - n * IOREQ_BUFFER_SLOT_NUM;
> -        cmpxchg(&pg->ptrs.full, old.full, new.full);
> +        guest_cmpxchg64(d, &pg->ptrs.full, old.full, new.full);

But the memory we're updating is shared with s->emulator, not with d,
if I'm not mistaken.

Jan



* Re: [PATCH V1 14/16] xen/ioreq: Use guest_cmpxchg64() instead of cmpxchg()
  2020-09-16  9:04   ` Jan Beulich
@ 2020-09-16  9:07     ` Julien Grall
  2020-09-16  9:09       ` Paul Durrant
  2020-09-16  9:07     ` Paul Durrant
  1 sibling, 1 reply; 111+ messages in thread
From: Julien Grall @ 2020-09-16  9:07 UTC (permalink / raw)
  To: Jan Beulich, Oleksandr Tyshchenko
  Cc: xen-devel, Oleksandr Tyshchenko, Paul Durrant,
	Stefano Stabellini, Julien Grall



On 16/09/2020 10:04, Jan Beulich wrote:
> On 10.09.2020 22:22, Oleksandr Tyshchenko wrote:
>> @@ -1325,7 +1327,7 @@ static int hvm_send_buffered_ioreq(struct hvm_ioreq_server *s, ioreq_t *p)
>>   
>>           new.read_pointer = old.read_pointer - n * IOREQ_BUFFER_SLOT_NUM;
>>           new.write_pointer = old.write_pointer - n * IOREQ_BUFFER_SLOT_NUM;
>> -        cmpxchg(&pg->ptrs.full, old.full, new.full);
>> +        guest_cmpxchg64(d, &pg->ptrs.full, old.full, new.full);
> 
> But the memory we're updating is shared with s->emulator, not with d,
> if I'm not mistaken.

It is unfortunately shared with both s->emulator and d when using the 
legacy interface.

For Arm, there is no plan to support the legacy interface, so we should
use s->emulator, and then we would be fully protected.
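
I.e.:

    guest_cmpxchg64(s->emulator, &pg->ptrs.full, old.full, new.full);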

Cheers,

-- 
Julien Grall



* RE: [PATCH V1 14/16] xen/ioreq: Use guest_cmpxchg64() instead of cmpxchg()
  2020-09-16  9:04   ` Jan Beulich
  2020-09-16  9:07     ` Julien Grall
@ 2020-09-16  9:07     ` Paul Durrant
  1 sibling, 0 replies; 111+ messages in thread
From: Paul Durrant @ 2020-09-16  9:07 UTC (permalink / raw)
  To: 'Jan Beulich', 'Oleksandr Tyshchenko'
  Cc: xen-devel, 'Oleksandr Tyshchenko', 'Julien Grall',
	'Stefano Stabellini', 'Julien Grall'

> -----Original Message-----
> From: Jan Beulich <jbeulich@suse.com>
> Sent: 16 September 2020 10:04
> To: Oleksandr Tyshchenko <olekstysh@gmail.com>
> Cc: xen-devel@lists.xenproject.org; Oleksandr Tyshchenko <oleksandr_tyshchenko@epam.com>; Paul Durrant
> <paul@xen.org>; Julien Grall <julien@xen.org>; Stefano Stabellini <sstabellini@kernel.org>; Julien
> Grall <jgrall@amazon.com>
> Subject: Re: [PATCH V1 14/16] xen/ioreq: Use guest_cmpxchg64() instead of cmpxchg()
> 
> On 10.09.2020 22:22, Oleksandr Tyshchenko wrote:
> > @@ -1325,7 +1327,7 @@ static int hvm_send_buffered_ioreq(struct hvm_ioreq_server *s, ioreq_t *p)
> >
> >          new.read_pointer = old.read_pointer - n * IOREQ_BUFFER_SLOT_NUM;
> >          new.write_pointer = old.write_pointer - n * IOREQ_BUFFER_SLOT_NUM;
> > -        cmpxchg(&pg->ptrs.full, old.full, new.full);
> > +        guest_cmpxchg64(d, &pg->ptrs.full, old.full, new.full);
> 
> But the memory we're updating is shared with s->emulator, not with d,
> if I'm not mistaken.
> 

You're not mistaken.

  Paul

> Jan



^ permalink raw reply	[flat|nested] 111+ messages in thread

* RE: [PATCH V1 14/16] xen/ioreq: Use guest_cmpxchg64() instead of cmpxchg()
  2020-09-16  9:07     ` Julien Grall
@ 2020-09-16  9:09       ` Paul Durrant
  2020-09-16  9:12         ` Julien Grall
  0 siblings, 1 reply; 111+ messages in thread
From: Paul Durrant @ 2020-09-16  9:09 UTC (permalink / raw)
  To: 'Julien Grall', 'Jan Beulich',
	'Oleksandr Tyshchenko'
  Cc: xen-devel, 'Oleksandr Tyshchenko',
	'Stefano Stabellini', 'Julien Grall'

> -----Original Message-----
> From: Julien Grall <julien@xen.org>
> Sent: 16 September 2020 10:07
> To: Jan Beulich <jbeulich@suse.com>; Oleksandr Tyshchenko <olekstysh@gmail.com>
> Cc: xen-devel@lists.xenproject.org; Oleksandr Tyshchenko <oleksandr_tyshchenko@epam.com>; Paul Durrant
> <paul@xen.org>; Stefano Stabellini <sstabellini@kernel.org>; Julien Grall <jgrall@amazon.com>
> Subject: Re: [PATCH V1 14/16] xen/ioreq: Use guest_cmpxchg64() instead of cmpxchg()
> 
> 
> 
> On 16/09/2020 10:04, Jan Beulich wrote:
> > On 10.09.2020 22:22, Oleksandr Tyshchenko wrote:
> >> @@ -1325,7 +1327,7 @@ static int hvm_send_buffered_ioreq(struct hvm_ioreq_server *s, ioreq_t *p)
> >>
> >>           new.read_pointer = old.read_pointer - n * IOREQ_BUFFER_SLOT_NUM;
> >>           new.write_pointer = old.write_pointer - n * IOREQ_BUFFER_SLOT_NUM;
> >> -        cmpxchg(&pg->ptrs.full, old.full, new.full);
> >> +        guest_cmpxchg64(d, &pg->ptrs.full, old.full, new.full);
> >
> > But the memory we're updating is shared with s->emulator, not with d,
> > if I'm not mistaken.
> 
> It is unfortunately shared with both s->emulator and d when using the
> legacy interface.

When using magic pages, they should be punched out of the P2M by the time the code gets here, so the memory should not be guest-visible.

  Paul

> 
> For Arm, there is no plan to support the legacy interface, so we should
> use s->emulator and we should be fully protected.
> 
> Cheers,
> 
> --
> Julien Grall



^ permalink raw reply	[flat|nested] 111+ messages in thread

* Re: [PATCH V1 14/16] xen/ioreq: Use guest_cmpxchg64() instead of cmpxchg()
  2020-09-16  9:09       ` Paul Durrant
@ 2020-09-16  9:12         ` Julien Grall
  2020-09-22 20:05           ` Oleksandr
  0 siblings, 1 reply; 111+ messages in thread
From: Julien Grall @ 2020-09-16  9:12 UTC (permalink / raw)
  To: paul, 'Jan Beulich', 'Oleksandr Tyshchenko'
  Cc: xen-devel, 'Oleksandr Tyshchenko',
	'Stefano Stabellini', 'Julien Grall'



On 16/09/2020 10:09, Paul Durrant wrote:
>> -----Original Message-----
>> From: Julien Grall <julien@xen.org>
>> Sent: 16 September 2020 10:07
>> To: Jan Beulich <jbeulich@suse.com>; Oleksandr Tyshchenko <olekstysh@gmail.com>
>> Cc: xen-devel@lists.xenproject.org; Oleksandr Tyshchenko <oleksandr_tyshchenko@epam.com>; Paul Durrant
>> <paul@xen.org>; Stefano Stabellini <sstabellini@kernel.org>; Julien Grall <jgrall@amazon.com>
>> Subject: Re: [PATCH V1 14/16] xen/ioreq: Use guest_cmpxchg64() instead of cmpxchg()
>>
>>
>>
>> On 16/09/2020 10:04, Jan Beulich wrote:
>>> On 10.09.2020 22:22, Oleksandr Tyshchenko wrote:
>>>> @@ -1325,7 +1327,7 @@ static int hvm_send_buffered_ioreq(struct hvm_ioreq_server *s, ioreq_t *p)
>>>>
>>>>            new.read_pointer = old.read_pointer - n * IOREQ_BUFFER_SLOT_NUM;
>>>>            new.write_pointer = old.write_pointer - n * IOREQ_BUFFER_SLOT_NUM;
>>>> -        cmpxchg(&pg->ptrs.full, old.full, new.full);
>>>> +        guest_cmpxchg64(d, &pg->ptrs.full, old.full, new.full);
>>>
>>> But the memory we're updating is shared with s->emulator, not with d,
>>> if I'm not mistaken.
>>
>> It is unfortunately shared with both s->emulator and d when using the
>> legacy interface.
> 
> When using magic pages they should be punched out of the P2M by the time the code gets here, so the memory should not be guest-visible.

Can you point me to the code that does this?

Cheers,

-- 
Julien Grall


^ permalink raw reply	[flat|nested] 111+ messages in thread

* Re: [PATCH V1 01/16] x86/ioreq: Prepare IOREQ feature for making it common
  2020-09-14 13:52   ` Jan Beulich
@ 2020-09-21 12:22     ` Oleksandr
  2020-09-21 12:31       ` Jan Beulich
  0 siblings, 1 reply; 111+ messages in thread
From: Oleksandr @ 2020-09-21 12:22 UTC (permalink / raw)
  To: Jan Beulich
  Cc: xen-devel, Oleksandr Tyshchenko, Paul Durrant, Andrew Cooper,
	Wei Liu, Roger Pau Monné,
	Julien Grall, Stefano Stabellini, Julien Grall


On 14.09.20 16:52, Jan Beulich wrote:

Hi Jan

> On 10.09.2020 22:21, Oleksandr Tyshchenko wrote:
>> From: Oleksandr Tyshchenko <oleksandr_tyshchenko@epam.com>
>>
>> As a lot of x86 code can be re-used on Arm later on, this patch
>> prepares IOREQ support before moving it to the common code. This way
>> we will get an almost verbatim copy for the code movement.
>>
>> This support is going to be used on Arm to be able to run a device
>> emulator outside of the Xen hypervisor.
> This is all fine, but you should then go on and explain what you're
> doing, and why (at which point it may become obvious that it would
> be more helpful to split this into a couple of steps).

Got it. Will add an explanation.


> In particular
> something as suspicious as ...
>
>> Changes RFC -> V1:
>>     - new patch, was split from:
>>       "[RFC PATCH V1 01/12] hvm/ioreq: Make x86's IOREQ feature common"
>>     - fold the check of p->type into hvm_get_ioreq_server_range_type()
>>       and make it return success/failure
>>     - remove relocate_portio_handler() call from arch_hvm_ioreq_destroy()
>>       in arch/x86/hvm/ioreq.c
> ... this (see below).
>
>> --- a/xen/arch/x86/hvm/ioreq.c
>> +++ b/xen/arch/x86/hvm/ioreq.c
>> @@ -170,6 +170,29 @@ static bool hvm_wait_for_io(struct hvm_ioreq_vcpu *sv, ioreq_t *p)
>>       return true;
>>   }
>>   
>> +bool arch_handle_hvm_io_completion(enum hvm_io_completion io_completion)
> Do we need "handle" in here? Without it, I'd also not have to ask about
> moving hvm further to the front of the name...
For me without "handle" it will sound a bit confusing because of the 
enum type which
has a similar name but without "arch" prefix:
bool arch_hvm_io_completion(enum hvm_io_completion io_completion)


Shall I then move "hvm" to the front of the name here?


>
>> @@ -836,6 +848,12 @@ int hvm_create_ioreq_server(struct domain *d, int bufioreq_handling,
>>       return rc;
>>   }
>>   
>> +/* Called when target domain is paused */
>> +int arch_hvm_destroy_ioreq_server(struct hvm_ioreq_server *s)
>> +{
>> +    return p2m_set_ioreq_server(s->target, 0, s);
>> +}
> Why return "int" when ...
>
>> @@ -855,7 +873,7 @@ int hvm_destroy_ioreq_server(struct domain *d, ioservid_t id)
>>   
>>       domain_pause(d);
>>   
>> -    p2m_set_ioreq_server(d, 0, s);
>> +    arch_hvm_destroy_ioreq_server(s);
> ... the result has been ignored anyway? Or otherwise I guess you'd
> want to add error handling here (but then the result of
> p2m_set_ioreq_server() should still get ignored, for backwards
> compatibility).

I didn't plan to add error handling here. Agreed, I will make 
arch_hvm_destroy_ioreq_server() return void.
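
A minimal sketch of the void variant, with the p2m_set_ioreq_server()
result deliberately ignored as before:

/* Called when target domain is paused */
void arch_hvm_destroy_ioreq_server(struct hvm_ioreq_server *s)
{
    p2m_set_ioreq_server(s->target, 0, s);
}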


>
>> @@ -1215,8 +1233,7 @@ void hvm_destroy_all_ioreq_servers(struct domain *d)
>>       struct hvm_ioreq_server *s;
>>       unsigned int id;
>>   
>> -    if ( !relocate_portio_handler(d, 0xcf8, 0xcf8, 4) )
>> -        return;
>> +    arch_hvm_ioreq_destroy(d);
> So the call to relocate_portio_handler() simply goes away. No
> replacement, no explanation?
As I understand from the review comment on that for the RFC patch, there 
is not a lot of point in keeping this. Indeed, looking at the code I came 
to the same opinion.
I should have added an explanation in the commit description at least.
Or shall I put the call back?


>
>> @@ -1239,19 +1256,15 @@ void hvm_destroy_all_ioreq_servers(struct domain *d)
>>       spin_unlock_recursive(&d->arch.hvm.ioreq_server.lock);
>>   }
>>   
>> -struct hvm_ioreq_server *hvm_select_ioreq_server(struct domain *d,
>> -                                                 ioreq_t *p)
>> +int hvm_get_ioreq_server_range_type(struct domain *d,
>> +                                    ioreq_t *p,
> At least p, but perhaps also d can gain const?

Agree, will add.


>
>> +                                    uint8_t *type,
>> +                                    uint64_t *addr)
> By the name the function returns a type for a range (albeit without
> it being clear where the two bounds of such a range actually live).
> By the implementation is looks more like you mean "range_and_type",
> albeit still without there really being a range passed back to the
> caller. Therefore I think I need some clarification on what's
> intended before even being able to suggest something.

This function is just an attempt to split arch-specific things (cf8 
handling) out of the "common" hvm_select_ioreq_server().


>  From ...
>
>> +struct hvm_ioreq_server *hvm_select_ioreq_server(struct domain *d,
>> +                                                 ioreq_t *p)
>> +{
>> +    struct hvm_ioreq_server *s;
>> +    uint8_t type;
>> +    uint64_t addr;
>> +    unsigned int id;
>> +
>> +    if ( hvm_get_ioreq_server_range_type(d, p, &type, &addr) )
>> +        return NULL;
> ... this use - maybe hvm_ioreq_server_get_type_addr() (albeit I
> don't like this name very much)?

Yes, hvm_ioreq_server_get_type_addr() better represents what the function does.
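
Folding in the const remarks above, the renamed helper might end up with
this shape (a sketch only; the x86-specific cf8 decoding, which is what
needs d, is elided, and the exact error handling is an assumption):

static int hvm_ioreq_server_get_type_addr(const struct domain *d,
                                          const ioreq_t *p,
                                          uint8_t *type,
                                          uint64_t *addr)
{
    if ( p->type != IOREQ_TYPE_COPY && p->type != IOREQ_TYPE_PIO )
        return -EINVAL;

    /* The x86-specific decoding of cf8/PCI config accesses would live here. */

    *type = (p->type == IOREQ_TYPE_PIO) ? XEN_DMOP_IO_RANGE_PORT
                                        : XEN_DMOP_IO_RANGE_MEMORY;
    *addr = p->addr;

    return 0;
}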


>
>> @@ -1351,7 +1378,7 @@ static int hvm_send_buffered_ioreq(struct hvm_ioreq_server *s, ioreq_t *p)
>>       pg = iorp->va;
>>   
>>       if ( !pg )
>> -        return X86EMUL_UNHANDLEABLE;
>> +        return IOREQ_IO_UNHANDLED;
> At this example - why the IO infix, duplicating the prefix? I'd
> suggest to either drop it (if the remaining identifiers are deemed
> unambiguous enough) or use e.g. IOREQ_STATUS_*.

Agreed, I would prefer IOREQ_STATUS_*.
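
Each architecture would then presumably map these onto its native return
values, e.g. for x86 something like this (an assumed mapping, not code
taken from the series):

#define IOREQ_STATUS_HANDLED     X86EMUL_OKAY
#define IOREQ_STATUS_UNHANDLED   X86EMUL_UNHANDLEABLE
#define IOREQ_STATUS_RETRY       X86EMUL_RETRY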


>> @@ -1515,11 +1542,21 @@ static int hvm_access_cf8(
>>       return X86EMUL_UNHANDLEABLE;
>>   }
>>   
>> +void arch_hvm_ioreq_init(struct domain *d)
>> +{
>> +    register_portio_handler(d, 0xcf8, 4, hvm_access_cf8);
>> +}
>> +
>> +void arch_hvm_ioreq_destroy(struct domain *d)
>> +{
>> +
>> +}
> Stray blank line?

Will remove.


-- 
Regards,

Oleksandr Tyshchenko



^ permalink raw reply	[flat|nested] 111+ messages in thread

* Re: [PATCH V1 01/16] x86/ioreq: Prepare IOREQ feature for making it common
  2020-09-21 12:22     ` Oleksandr
@ 2020-09-21 12:31       ` Jan Beulich
  2020-09-21 12:47         ` Oleksandr
  0 siblings, 1 reply; 111+ messages in thread
From: Jan Beulich @ 2020-09-21 12:31 UTC (permalink / raw)
  To: Oleksandr
  Cc: xen-devel, Oleksandr Tyshchenko, Paul Durrant, Andrew Cooper,
	Wei Liu, Roger Pau Monné,
	Julien Grall, Stefano Stabellini, Julien Grall

On 21.09.2020 14:22, Oleksandr wrote:
> On 14.09.20 16:52, Jan Beulich wrote:
>> On 10.09.2020 22:21, Oleksandr Tyshchenko wrote:
>>> --- a/xen/arch/x86/hvm/ioreq.c
>>> +++ b/xen/arch/x86/hvm/ioreq.c
>>> @@ -170,6 +170,29 @@ static bool hvm_wait_for_io(struct hvm_ioreq_vcpu *sv, ioreq_t *p)
>>>       return true;
>>>   }
>>>   
>>> +bool arch_handle_hvm_io_completion(enum hvm_io_completion io_completion)
>> Do we need "handle" in here? Without it, I'd also not have to ask about
>> moving hvm further to the front of the name...
> For me without "handle" it will sound a bit confusing because of the 
> enum type which
> has a similar name but without "arch" prefix:
> bool arch_hvm_io_completion(enum hvm_io_completion io_completion)

Every function handles something; there's no point including
"handle" in every name. Or else we'd have handle_memset()
instead of just memset(), for example.

> Shall I then move "hvm" to the front of the name here?

As per comments on later patches, I think we want to consider dropping
hvm prefixes or infixes altogether from the functions involved here.

>>> @@ -1215,8 +1233,7 @@ void hvm_destroy_all_ioreq_servers(struct domain *d)
>>>       struct hvm_ioreq_server *s;
>>>       unsigned int id;
>>>   
>>> -    if ( !relocate_portio_handler(d, 0xcf8, 0xcf8, 4) )
>>> -        return;
>>> +    arch_hvm_ioreq_destroy(d);
>> So the call to relocate_portio_handler() simply goes away. No
>> replacement, no explanation?
> As I understand from the review comment on that for the RFC patch, there 
> is no
> a lot of point of keeping this. Indeed, looking at the code I got the 
> same opinion.
> I should have added an explanation in the commit description at least.
> Or shall I return the call back?

If there's a reason to drop it (which I can't see, but I also
don't recall seeing the discussion you're mentioning), then doing
so should be a separate patch with suitable reasoning. In the
patch here you definitely should only transform what's already
there.

Jan


^ permalink raw reply	[flat|nested] 111+ messages in thread

* Re: [PATCH V1 01/16] x86/ioreq: Prepare IOREQ feature for making it common
  2020-09-21 12:31       ` Jan Beulich
@ 2020-09-21 12:47         ` Oleksandr
  2020-09-21 13:29           ` Jan Beulich
  0 siblings, 1 reply; 111+ messages in thread
From: Oleksandr @ 2020-09-21 12:47 UTC (permalink / raw)
  To: Jan Beulich
  Cc: xen-devel, Oleksandr Tyshchenko, Paul Durrant, Andrew Cooper,
	Wei Liu, Roger Pau Monné,
	Julien Grall, Stefano Stabellini, Julien Grall


On 21.09.20 15:31, Jan Beulich wrote:

Hi

> On 21.09.2020 14:22, Oleksandr wrote:
>> On 14.09.20 16:52, Jan Beulich wrote:
>>> On 10.09.2020 22:21, Oleksandr Tyshchenko wrote:
>>>> --- a/xen/arch/x86/hvm/ioreq.c
>>>> +++ b/xen/arch/x86/hvm/ioreq.c
>>>> @@ -170,6 +170,29 @@ static bool hvm_wait_for_io(struct hvm_ioreq_vcpu *sv, ioreq_t *p)
>>>>        return true;
>>>>    }
>>>>    
>>>> +bool arch_handle_hvm_io_completion(enum hvm_io_completion io_completion)
>>> Do we need "handle" in here? Without it, I'd also not have to ask about
>>> moving hvm further to the front of the name...
>> For me without "handle" it will sound a bit confusing because of the
>> enum type which
>> has a similar name but without "arch" prefix:
>> bool arch_hvm_io_completion(enum hvm_io_completion io_completion)
> Every function handles something; there's no point including
> "handle" in every name. Or else we'd have handle_memset()
> instead of just memset(), for example.

Got it. Will remove "handle" here.


>
>> Shall I then move "hvm" to the front of the name here?
> As per comments on later patches, I think we want to consider dropping
> hvm prefixes or infixes altogether from the functions involved here.
>>>> @@ -1215,8 +1233,7 @@ void hvm_destroy_all_ioreq_servers(struct domain *d)
>>>>        struct hvm_ioreq_server *s;
>>>>        unsigned int id;
>>>>    
>>>> -    if ( !relocate_portio_handler(d, 0xcf8, 0xcf8, 4) )
>>>> -        return;
>>>> +    arch_hvm_ioreq_destroy(d);
>>> So the call to relocate_portio_handler() simply goes away. No
>>> replacement, no explanation?
>> As I understand from the review comment on that for the RFC patch, there
>> is no
>> a lot of point of keeping this. Indeed, looking at the code I got the
>> same opinion.
>> I should have added an explanation in the commit description at least.
>> Or shall I return the call back?
> If there's a reason to drop it (which I can't see, but I also
> don't recall seeing the discussion you're mentioning), then doing
> so should be a separate patch with suitable reasoning. In the
> patch here you definitely should only transform what's already
> there.
Sounds reasonable. Please see the comment below 
relocate_portio_handler() here:
https://www.mail-archive.com/xen-devel@lists.xenproject.org/msg78512.html

However, I might have interpreted the request incorrectly.

-- 
Regards,

Oleksandr Tyshchenko



^ permalink raw reply	[flat|nested] 111+ messages in thread

* Re: [PATCH V1 01/16] x86/ioreq: Prepare IOREQ feature for making it common
  2020-09-21 12:47         ` Oleksandr
@ 2020-09-21 13:29           ` Jan Beulich
  2020-09-21 14:43             ` Oleksandr
  0 siblings, 1 reply; 111+ messages in thread
From: Jan Beulich @ 2020-09-21 13:29 UTC (permalink / raw)
  To: Oleksandr
  Cc: xen-devel, Oleksandr Tyshchenko, Paul Durrant, Andrew Cooper,
	Wei Liu, Roger Pau Monné,
	Julien Grall, Stefano Stabellini, Julien Grall

On 21.09.2020 14:47, Oleksandr wrote:
> On 21.09.20 15:31, Jan Beulich wrote:
>> On 21.09.2020 14:22, Oleksandr wrote:
>>> On 14.09.20 16:52, Jan Beulich wrote:
>>>> On 10.09.2020 22:21, Oleksandr Tyshchenko wrote:
>>>>> @@ -1215,8 +1233,7 @@ void hvm_destroy_all_ioreq_servers(struct domain *d)
>>>>>        struct hvm_ioreq_server *s;
>>>>>        unsigned int id;
>>>>>    
>>>>> -    if ( !relocate_portio_handler(d, 0xcf8, 0xcf8, 4) )
>>>>> -        return;
>>>>> +    arch_hvm_ioreq_destroy(d);
>>>> So the call to relocate_portio_handler() simply goes away. No
>>>> replacement, no explanation?
>>> As I understand from the review comment on that for the RFC patch, there
>>> is no
>>> a lot of point of keeping this. Indeed, looking at the code I got the
>>> same opinion.
>>> I should have added an explanation in the commit description at least.
>>> Or shall I return the call back?
>> If there's a reason to drop it (which I can't see, but I also
>> don't recall seeing the discussion you're mentioning), then doing
>> so should be a separate patch with suitable reasoning. In the
>> patch here you definitely should only transform what's already
>> there.
> Sounds reasonable. Please see the comment below 
> relocate_portio_handler() here:
> https://www.mail-archive.com/xen-devel@lists.xenproject.org/msg78512.html
> 
> However, I might interpret the request incorrectly.

I'm afraid you do: The way you've coded it the function was a no-op.
But that's because you broke the caller by not bailing from
hvm_destroy_all_ioreq_servers() if relocate_portio_handler() returned
false. IOW you did assume that moving the "return" statement into an
inline function would have an effect on its caller(s). For questions
like this one it also often helps to look at the commit introducing
the construct in question (b3344bb1cae0 in this case): Chances are
that the description helps, albeit I agree there are many cases
(particularly the farther you get into distant past) where it isn't
of much help.

Jan


^ permalink raw reply	[flat|nested] 111+ messages in thread

* Re: [PATCH V1 01/16] x86/ioreq: Prepare IOREQ feature for making it common
  2020-09-21 13:29           ` Jan Beulich
@ 2020-09-21 14:43             ` Oleksandr
  2020-09-21 15:28               ` Jan Beulich
  0 siblings, 1 reply; 111+ messages in thread
From: Oleksandr @ 2020-09-21 14:43 UTC (permalink / raw)
  To: Jan Beulich
  Cc: xen-devel, Oleksandr Tyshchenko, Paul Durrant, Andrew Cooper,
	Wei Liu, Roger Pau Monné,
	Julien Grall, Stefano Stabellini, Julien Grall


On 21.09.20 16:29, Jan Beulich wrote:

Hi

> On 21.09.2020 14:47, Oleksandr wrote:
>> On 21.09.20 15:31, Jan Beulich wrote:
>>> On 21.09.2020 14:22, Oleksandr wrote:
>>>> On 14.09.20 16:52, Jan Beulich wrote:
>>>>> On 10.09.2020 22:21, Oleksandr Tyshchenko wrote:
>>>>>> @@ -1215,8 +1233,7 @@ void hvm_destroy_all_ioreq_servers(struct domain *d)
>>>>>>         struct hvm_ioreq_server *s;
>>>>>>         unsigned int id;
>>>>>>     
>>>>>> -    if ( !relocate_portio_handler(d, 0xcf8, 0xcf8, 4) )
>>>>>> -        return;
>>>>>> +    arch_hvm_ioreq_destroy(d);
>>>>> So the call to relocate_portio_handler() simply goes away. No
>>>>> replacement, no explanation?
>>>> As I understand from the review comment on that for the RFC patch, there
>>>> is no
>>>> a lot of point of keeping this. Indeed, looking at the code I got the
>>>> same opinion.
>>>> I should have added an explanation in the commit description at least.
>>>> Or shall I return the call back?
>>> If there's a reason to drop it (which I can't see, but I also
>>> don't recall seeing the discussion you're mentioning), then doing
>>> so should be a separate patch with suitable reasoning. In the
>>> patch here you definitely should only transform what's already
>>> there.
>> Sounds reasonable. Please see the comment below
>> relocate_portio_handler() here:
>> https://www.mail-archive.com/xen-devel@lists.xenproject.org/msg78512.html
>>
>> However, I might interpret the request incorrectly.
> I'm afraid you do: The way you've coded it the function was a no-op.
> But that's because you broke the caller by not bailing from
> hvm_destroy_all_ioreq_servers() if relocate_portio_handler() returned
> false. IOW you did assume that moving the "return" statement into an
> inline function would have an effect on its caller(s). For questions
> like this one it also often helps to look at the commit introducing
> the construct in question (b3344bb1cae0 in this case): Chances are
> that the description helps, albeit I agree there are many cases
> (particularly the farther you get into distant past) where it isn't
> of much help.


Hmm, now it's clear to me what I did wrong. By calling 
relocate_portio_handler() here we don't really want to relocate 
anything; we just use it as a flag to indicate whether we need to 
actually release IOREQ resources further down in 
hvm_destroy_all_ioreq_servers(). Thank you for the explanation, it 
wasn't obvious to me at the beginning. But now the question is how to 
do this correctly while retaining the current behavior (so as not to break callers)?

I see two options here:
1. Place the check of relocate_portio_handler() in 
arch_hvm_ioreq_destroy() and make the latter return bool (see the 
sketch below).
     The "common" hvm_destroy_all_ioreq_servers() will check the 
return value and bail out if false.
2. Don't use relocate_portio_handler(); instead introduce a flag into 
struct hvm_domain's ioreq_server sub-structure.


Personally I don't much like option 1, and option 2 adds a little 
overhead.

What do you think?
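
For concreteness, option 1 might look roughly like this (a sketch under
the assumption that the arch hook keeps the current x86 semantics, where
the cf8 handler doubles as an "IOREQ was initialized" flag):

bool arch_hvm_ioreq_destroy(struct domain *d)
{
    return relocate_portio_handler(d, 0xcf8, 0xcf8, 4);
}

void hvm_destroy_all_ioreq_servers(struct domain *d)
{
    struct hvm_ioreq_server *s;
    unsigned int id;

    if ( !arch_hvm_ioreq_destroy(d) )
        return;

    /* ... existing tear-down of the IOREQ servers ... */
}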


-- 
Regards,

Oleksandr Tyshchenko



^ permalink raw reply	[flat|nested] 111+ messages in thread

* Re: [PATCH V1 01/16] x86/ioreq: Prepare IOREQ feature for making it common
  2020-09-21 14:43             ` Oleksandr
@ 2020-09-21 15:28               ` Jan Beulich
  0 siblings, 0 replies; 111+ messages in thread
From: Jan Beulich @ 2020-09-21 15:28 UTC (permalink / raw)
  To: Oleksandr
  Cc: xen-devel, Oleksandr Tyshchenko, Paul Durrant, Andrew Cooper,
	Wei Liu, Roger Pau Monné,
	Julien Grall, Stefano Stabellini, Julien Grall

On 21.09.2020 16:43, Oleksandr wrote:
> On 21.09.20 16:29, Jan Beulich wrote:
>> On 21.09.2020 14:47, Oleksandr wrote:
>>> On 21.09.20 15:31, Jan Beulich wrote:
>>>> On 21.09.2020 14:22, Oleksandr wrote:
>>>>> On 14.09.20 16:52, Jan Beulich wrote:
>>>>>> On 10.09.2020 22:21, Oleksandr Tyshchenko wrote:
>>>>>>> @@ -1215,8 +1233,7 @@ void hvm_destroy_all_ioreq_servers(struct domain *d)
>>>>>>>         struct hvm_ioreq_server *s;
>>>>>>>         unsigned int id;
>>>>>>>     
>>>>>>> -    if ( !relocate_portio_handler(d, 0xcf8, 0xcf8, 4) )
>>>>>>> -        return;
>>>>>>> +    arch_hvm_ioreq_destroy(d);
>>>>>> So the call to relocate_portio_handler() simply goes away. No
>>>>>> replacement, no explanation?
>>>>> As I understand from the review comment on that for the RFC patch, there
>>>>> is no
>>>>> a lot of point of keeping this. Indeed, looking at the code I got the
>>>>> same opinion.
>>>>> I should have added an explanation in the commit description at least.
>>>>> Or shall I return the call back?
>>>> If there's a reason to drop it (which I can't see, but I also
>>>> don't recall seeing the discussion you're mentioning), then doing
>>>> so should be a separate patch with suitable reasoning. In the
>>>> patch here you definitely should only transform what's already
>>>> there.
>>> Sounds reasonable. Please see the comment below
>>> relocate_portio_handler() here:
>>> https://www.mail-archive.com/xen-devel@lists.xenproject.org/msg78512.html
>>>
>>> However, I might interpret the request incorrectly.
>> I'm afraid you do: The way you've coded it the function was a no-op.
>> But that's because you broke the caller by not bailing from
>> hvm_destroy_all_ioreq_servers() if relocate_portio_handler() returned
>> false. IOW you did assume that moving the "return" statement into an
>> inline function would have an effect on its caller(s). For questions
>> like this one it also often helps to look at the commit introducing
>> the construct in question (b3344bb1cae0 in this case): Chances are
>> that the description helps, albeit I agree there are many cases
>> (particularly the farther you get into distant past) where it isn't
>> of much help.
> 
> 
> Hmm, now it's clear to me what I did wrong. By calling 
> relocate_portio_handler() here we don't really want to relocate 
> something, we just use it as a flag to indicate whether we need to 
> actually release IOREQ resources down the 
> hvm_destroy_all_ioreq_servers(). Thank you for the explanation, it 
> wasn't obvious to me at the beginning. But, now the question is how to 
> do it in a correct way and retain current behavior (to not break callers)?
> 
> I see two options here:
> 1. Place the check of relocate_portio_handler() in 
> arch_hvm_ioreq_destroy() and make the later returning bool.
>      The "common" hvm_destroy_all_ioreq_servers() will check for the 
> return value and bail out if false.
> 2. Don't use relocate_portio_handler(), instead introduce a flag into 
> struct hvm_domain's ioreq_server sub-structure.
> 
> 
> Personally I don't like much the option 1 and option 2 is a little bit 
> overhead.

Well, 1 is what matches current behavior, so I'd advocate for you
not changing the abstract model. Or else, again, change the abstract
model in a separate prereq patch.

Jan


^ permalink raw reply	[flat|nested] 111+ messages in thread

* Re: [PATCH V1 02/16] xen/ioreq: Make x86's IOREQ feature common
  2020-09-14 14:17   ` Jan Beulich
@ 2020-09-21 19:02     ` Oleksandr
  2020-09-22  6:33       ` Jan Beulich
  0 siblings, 1 reply; 111+ messages in thread
From: Oleksandr @ 2020-09-21 19:02 UTC (permalink / raw)
  To: Jan Beulich
  Cc: xen-devel, Oleksandr Tyshchenko, Andrew Cooper, George Dunlap,
	Ian Jackson, Julien Grall, Stefano Stabellini, Wei Liu,
	Roger Pau Monné,
	Paul Durrant, Jun Nakajima, Kevin Tian, Tim Deegan, Julien Grall


On 14.09.20 17:17, Jan Beulich wrote:

Hi Jan

> On 10.09.2020 22:21, Oleksandr Tyshchenko wrote:
>> ---
>>   MAINTAINERS                     |    8 +-
>>   xen/arch/x86/Kconfig            |    1 +
>>   xen/arch/x86/hvm/dm.c           |    2 +-
>>   xen/arch/x86/hvm/emulate.c      |    2 +-
>>   xen/arch/x86/hvm/hvm.c          |    2 +-
>>   xen/arch/x86/hvm/io.c           |    2 +-
>>   xen/arch/x86/hvm/ioreq.c        | 1425 +--------------------------------------
>>   xen/arch/x86/hvm/stdvga.c       |    2 +-
>>   xen/arch/x86/hvm/vmx/vvmx.c     |    3 +-
>>   xen/arch/x86/mm.c               |    2 +-
>>   xen/arch/x86/mm/shadow/common.c |    2 +-
>>   xen/common/Kconfig              |    3 +
>>   xen/common/Makefile             |    1 +
>>   xen/common/ioreq.c              | 1410 ++++++++++++++++++++++++++++++++++++++
> This suggests it was almost the entire file which got moved. It would
> be really nice if you could convince git to show the diff here, rather
> than removal and addition of 1400 lines.
>
> Additionally I wonder whether what's left in the original file wouldn't
> better become inline functions now. If this was done in the previous
> patch, the change would be truly a rename then, I think.
Last time, when trying to make something inline in arch files for the RFC 
series (I don't remember exactly what it was for),
I got completely stuck with build issues due to the header 
(inter-)dependencies, which I failed to resolve.
Anyway, I got your point and will try to experiment with that.


>
>> --- a/xen/include/asm-x86/hvm/ioreq.h
>> +++ b/xen/include/asm-x86/hvm/ioreq.h
>> @@ -19,41 +19,12 @@
>>   #ifndef __ASM_X86_HVM_IOREQ_H__
>>   #define __ASM_X86_HVM_IOREQ_H__
>>   
>> -bool hvm_io_pending(struct vcpu *v);
>> -bool handle_hvm_io_completion(struct vcpu *v);
>> -bool is_ioreq_server_page(struct domain *d, const struct page_info *page);
>> +#include <asm/hvm/emulate.h>
>> +#include <asm/hvm/hvm.h>
>> +#include <asm/hvm/vmx/vmx.h>
> Are all three really needed here? Especially the last one strikes me as
> odd.

We can leave only #include <asm/hvm/emulate.h> here and move #include 
<asm/hvm/vmx/vmx.h> to x86/hvm/ioreq.c.
Also #include <asm/hvm/hvm.h> could be dropped.


>
>> --- /dev/null
>> +++ b/xen/include/xen/ioreq.h
>> @@ -0,0 +1,82 @@
>> +/*
>> + * ioreq.h: Hardware virtual machine assist interface definitions.
>> + *
>> + * Copyright (c) 2016 Citrix Systems Inc.
>> + *
>> + * This program is free software; you can redistribute it and/or modify it
>> + * under the terms and conditions of the GNU General Public License,
>> + * version 2, as published by the Free Software Foundation.
>> + *
>> + * This program is distributed in the hope it will be useful, but WITHOUT
>> + * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
>> + * FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License for
>> + * more details.
>> + *
>> + * You should have received a copy of the GNU General Public License along with
>> + * this program; If not, see <http://www.gnu.org/licenses/>.
>> + */
>> +
>> +#ifndef __IOREQ_H__
>> +#define __IOREQ_H__
> __XEN_IOREQ_H__ please.

ok


>
>> +#include <xen/sched.h>
>> +
>> +#include <asm/hvm/ioreq.h>
> Is this include really needed here (i.e. by the code further down in
> the file, and hence by everyone including this header), or rather
> just in a few specific .c files?
I think just a few specific .c files: x86/hvm/ioreq.c and 
common/ioreq.c now, and several other files later on (x86/hvm/dm.c, 
arm/io.c, etc.).
Shall I include that header in these files instead?


>
>> +#define GET_IOREQ_SERVER(d, id) \
>> +    (d)->arch.hvm.ioreq_server.server[id]
> arch.hvm.* feels like a layering violation when used in this header.
Got it. The only reason GET_IOREQ_SERVER is here is the inline 
get_ioreq_server(). I will make it non-inline and move both to 
common/ioreq.c.
I assume this layering violation issue also applies to 
hvm_domain_has_ioreq_server() introduced in
[PATCH V1 11/16] xen/ioreq: Introduce hvm_domain_has_ioreq_server()


-- 
Regards,

Oleksandr Tyshchenko



^ permalink raw reply	[flat|nested] 111+ messages in thread

* Re: [PATCH V1 02/16] xen/ioreq: Make x86's IOREQ feature common
  2020-09-21 19:02     ` Oleksandr
@ 2020-09-22  6:33       ` Jan Beulich
  2020-09-22  9:58         ` Oleksandr
  0 siblings, 1 reply; 111+ messages in thread
From: Jan Beulich @ 2020-09-22  6:33 UTC (permalink / raw)
  To: Oleksandr
  Cc: xen-devel, Oleksandr Tyshchenko, Andrew Cooper, George Dunlap,
	Ian Jackson, Julien Grall, Stefano Stabellini, Wei Liu,
	Roger Pau Monné,
	Paul Durrant, Jun Nakajima, Kevin Tian, Tim Deegan, Julien Grall

On 21.09.2020 21:02, Oleksandr wrote:
> On 14.09.20 17:17, Jan Beulich wrote:
>> On 10.09.2020 22:21, Oleksandr Tyshchenko wrote:
>>> --- /dev/null
>>> +++ b/xen/include/xen/ioreq.h
>>> @@ -0,0 +1,82 @@
>>> +/*
>>> + * ioreq.h: Hardware virtual machine assist interface definitions.
>>> + *
>>> + * Copyright (c) 2016 Citrix Systems Inc.
>>> + *
>>> + * This program is free software; you can redistribute it and/or modify it
>>> + * under the terms and conditions of the GNU General Public License,
>>> + * version 2, as published by the Free Software Foundation.
>>> + *
>>> + * This program is distributed in the hope it will be useful, but WITHOUT
>>> + * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
>>> + * FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License for
>>> + * more details.
>>> + *
>>> + * You should have received a copy of the GNU General Public License along with
>>> + * this program; If not, see <http://www.gnu.org/licenses/>.
>>> + */
>>> +
>>> +#ifndef __IOREQ_H__
>>> +#define __IOREQ_H__
>> __XEN_IOREQ_H__ please.
> 
> ok
> 
> 
>>
>>> +#include <xen/sched.h>
>>> +
>>> +#include <asm/hvm/ioreq.h>
>> Is this include really needed here (i.e. by the code further down in
>> the file, and hence by everyone including this header), or rather
>> just in a few specific .c files?
> I think, just in few specific .c files. Which are x86/hvm/ioreq.c and 
> common/ioreq.c now and several other files later on (x86/hvm/dm.c, 
> arm/io.c, etc)
> Shall I include that header in these files instead?

Yes please, and please take this as a common guideline. While
private headers are often used to include things needed by all
of the (typically few) users of the header, non-private ones
shouldn't create unnecessary dependencies on other headers. As
you've said further up - you did run into hard to resolve
header dependencies yourself, and the practice of including
headers without strict need is one of the reasons of such
problems.

>>> +#define GET_IOREQ_SERVER(d, id) \
>>> +    (d)->arch.hvm.ioreq_server.server[id]
>> arch.hvm.* feels like a layering violation when used in this header.
> Got it. The only reason why GET_IOREQ_SERVER is here is inline 
> get_ioreq_server(). I will make it non-inline and move both to 
> common/ioreq.c.

Which won't make the layering violation go away. It's still
common rather than per-arch code then. As suggested elsewhere,
I think the whole ioreq_server struct wants to move into
struct domain itself, perhaps inside a new #ifdef (iirc one of
the patches introduces a suitable Kconfig option). This goes
alongside my suggestion to drop the "hvm" prefixes and infixes
from involved function names.
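
Schematically, the suggestion amounts to something like the following
sketch (assuming the Kconfig option this series adds is named
CONFIG_IOREQ_SERVER):

/* In struct domain (xen/include/xen/sched.h): */
#ifdef CONFIG_IOREQ_SERVER
    struct {
        spinlock_t              lock;
        struct hvm_ioreq_server *server[MAX_NR_IOREQ_SERVERS];
    } ioreq_server;
#endif

/* Common code can then drop the arch.hvm reference: */
#define GET_IOREQ_SERVER(d, id) \
    (d)->ioreq_server.server[id]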

Jan


^ permalink raw reply	[flat|nested] 111+ messages in thread

* Re: [PATCH V1 02/16] xen/ioreq: Make x86's IOREQ feature common
  2020-09-22  6:33       ` Jan Beulich
@ 2020-09-22  9:58         ` Oleksandr
  2020-09-22 10:54           ` Jan Beulich
  0 siblings, 1 reply; 111+ messages in thread
From: Oleksandr @ 2020-09-22  9:58 UTC (permalink / raw)
  To: Jan Beulich
  Cc: xen-devel, Oleksandr Tyshchenko, Andrew Cooper, George Dunlap,
	Ian Jackson, Julien Grall, Stefano Stabellini, Wei Liu,
	Roger Pau Monné,
	Paul Durrant, Jun Nakajima, Kevin Tian, Tim Deegan, Julien Grall


On 22.09.20 09:33, Jan Beulich wrote:

Hi Jan

> On 21.09.2020 21:02, Oleksandr wrote:
>> On 14.09.20 17:17, Jan Beulich wrote:
>>> On 10.09.2020 22:21, Oleksandr Tyshchenko wrote:
>>>> --- /dev/null
>>>> +++ b/xen/include/xen/ioreq.h
>>>> @@ -0,0 +1,82 @@
>>>> +/*
>>>> + * ioreq.h: Hardware virtual machine assist interface definitions.
>>>> + *
>>>> + * Copyright (c) 2016 Citrix Systems Inc.
>>>> + *
>>>> + * This program is free software; you can redistribute it and/or modify it
>>>> + * under the terms and conditions of the GNU General Public License,
>>>> + * version 2, as published by the Free Software Foundation.
>>>> + *
>>>> + * This program is distributed in the hope it will be useful, but WITHOUT
>>>> + * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
>>>> + * FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License for
>>>> + * more details.
>>>> + *
>>>> + * You should have received a copy of the GNU General Public License along with
>>>> + * this program; If not, see <http://www.gnu.org/licenses/>.
>>>> + */
>>>> +
>>>> +#ifndef __IOREQ_H__
>>>> +#define __IOREQ_H__
>>> __XEN_IOREQ_H__ please.
>> ok
>>
>>
>>>> +#include <xen/sched.h>
>>>> +
>>>> +#include <asm/hvm/ioreq.h>
>>> Is this include really needed here (i.e. by the code further down in
>>> the file, and hence by everyone including this header), or rather
>>> just in a few specific .c files?
>> I think, just in few specific .c files. Which are x86/hvm/ioreq.c and
>> common/ioreq.c now and several other files later on (x86/hvm/dm.c,
>> arm/io.c, etc)
>> Shall I include that header in these files instead?
> Yes please, and please take this as a common guideline. While
> private headers are often used to include things needed by all
> of the (typically few) users of the header, non-private ones
> shouldn't create unnecessary dependencies on other headers. As
> you've said further up - you did run into hard to resolve
> header dependencies yourself, and the practice of including
> headers without strict need is one of the reasons of such
> problems.

Got it.


>
>>>> +#define GET_IOREQ_SERVER(d, id) \
>>>> +    (d)->arch.hvm.ioreq_server.server[id]
>>> arch.hvm.* feels like a layering violation when used in this header.
>> Got it. The only reason why GET_IOREQ_SERVER is here is inline
>> get_ioreq_server(). I will make it non-inline and move both to
>> common/ioreq.c.
> Which won't make the layering violation go away. It's still
> common rather than per-arch code then. As suggested elsewhere,
> I think the whole ioreq_server struct wants to move into
> struct domain itself, perhaps inside a new #ifdef (iirc one of
> the patches introduces a suitable Kconfig option).
Well, your advice regarding ioreq_server sounds reasonable, but the 
common ioreq.c
will still have other *arch.hvm.* uses for both vcpu and domain. So it looks 
like the other involved structs should be moved
into the *common* struct domain/vcpu as well, correct? Some of them could be 
moved easily since they contain the same fields (arch.hvm.ioreq_gfn),
but some of them couldn't, and that seems to require pulling a lot of changes 
into the Xen code (arch.hvm.params, arch.hvm.hvm_io), I am afraid.
Or have I missed something?


> This goes
> alongside my suggestion to drop the "hvm" prefixes and infixes
> from involved function names.
Well, I assume this request, as well as the request above, should be 
addressed in follow-up patches, as we want to keep the code movement 
in the current patch an (almost) verbatim copy.
Am I correct?

-- 
Regards,

Oleksandr Tyshchenko



^ permalink raw reply	[flat|nested] 111+ messages in thread

* Re: [PATCH V1 02/16] xen/ioreq: Make x86's IOREQ feature common
  2020-09-22  9:58         ` Oleksandr
@ 2020-09-22 10:54           ` Jan Beulich
  2020-09-22 15:05             ` Oleksandr
  0 siblings, 1 reply; 111+ messages in thread
From: Jan Beulich @ 2020-09-22 10:54 UTC (permalink / raw)
  To: Oleksandr
  Cc: xen-devel, Oleksandr Tyshchenko, Andrew Cooper, George Dunlap,
	Ian Jackson, Julien Grall, Stefano Stabellini, Wei Liu,
	Roger Pau Monné,
	Paul Durrant, Jun Nakajima, Kevin Tian, Tim Deegan, Julien Grall

On 22.09.2020 11:58, Oleksandr wrote:
> On 22.09.20 09:33, Jan Beulich wrote:
>> On 21.09.2020 21:02, Oleksandr wrote:
>>> On 14.09.20 17:17, Jan Beulich wrote:
>>>> On 10.09.2020 22:21, Oleksandr Tyshchenko wrote:
>>>>> +#define GET_IOREQ_SERVER(d, id) \
>>>>> +    (d)->arch.hvm.ioreq_server.server[id]
>>>> arch.hvm.* feels like a layering violation when used in this header.
>>> Got it. The only reason why GET_IOREQ_SERVER is here is inline
>>> get_ioreq_server(). I will make it non-inline and move both to
>>> common/ioreq.c.
>> Which won't make the layering violation go away. It's still
>> common rather than per-arch code then. As suggested elsewhere,
>> I think the whole ioreq_server struct wants to move into
>> struct domain itself, perhaps inside a new #ifdef (iirc one of
>> the patches introduces a suitable Kconfig option).
> Well, your advise regarding ioreq_server sounds reasonable, but the 
> common ioreq.c
> still will have other *arch.hvm.* for both vcpu and domain. So looks 
> like other involved structs should be moved
> into *common* struct domain/vcpu itself, correct? Some of them could be 
> moved easily since contain the same fields (arch.hvm.ioreq_gfn),
> but some of them couldn't and seems to require to pull a lot of changes 
> to the Xen code (arch.hvm.params, arch.hvm.hvm_io), I am afraid.
> Or I missed something?

arch.hvm.params, iirc, is an x86 concept, and hence would need
abstracting away anyway. I expect this will be a common pattern:
Either you want things to become generic (structure fields
living directly in struct domain, or at least not under arch.hvm),
or things need abstracting for per-arch handling.

>> This goes
>> alongside my suggestion to drop the "hvm" prefixes and infixes
>> from involved function names.
> Well, I assume this request as well as the request above should be 
> addressed in the follow-up patches, as we want to keep the code movement 
> in current patch as (almost) verbatim copy,
> Am I correct?

The renaming could imo be done before or after the move, but within
a single series. Doing it (or some of it) during the move may be
acceptable, but this primarily depends on the overall effect on the
patch that this would have. I.e. the patch better wouldn't become
gigantic just because all the renaming gets done in one go, and it's
hundreds of places that need touching.

Jan


^ permalink raw reply	[flat|nested] 111+ messages in thread

* Re: [PATCH V1 02/16] xen/ioreq: Make x86's IOREQ feature common
  2020-09-22 10:54           ` Jan Beulich
@ 2020-09-22 15:05             ` Oleksandr
  2020-09-22 15:52               ` Jan Beulich
  0 siblings, 1 reply; 111+ messages in thread
From: Oleksandr @ 2020-09-22 15:05 UTC (permalink / raw)
  To: Jan Beulich
  Cc: xen-devel, Oleksandr Tyshchenko, Andrew Cooper, George Dunlap,
	Ian Jackson, Julien Grall, Stefano Stabellini, Wei Liu,
	Roger Pau Monné,
	Paul Durrant, Jun Nakajima, Kevin Tian, Tim Deegan, Julien Grall


On 22.09.20 13:54, Jan Beulich wrote:

Hi Jan

> On 22.09.2020 11:58, Oleksandr wrote:
>> On 22.09.20 09:33, Jan Beulich wrote:
>>> On 21.09.2020 21:02, Oleksandr wrote:
>>>> On 14.09.20 17:17, Jan Beulich wrote:
>>>>> On 10.09.2020 22:21, Oleksandr Tyshchenko wrote:
>>>>>> +#define GET_IOREQ_SERVER(d, id) \
>>>>>> +    (d)->arch.hvm.ioreq_server.server[id]
>>>>> arch.hvm.* feels like a layering violation when used in this header.
>>>> Got it. The only reason why GET_IOREQ_SERVER is here is inline
>>>> get_ioreq_server(). I will make it non-inline and move both to
>>>> common/ioreq.c.
>>> Which won't make the layering violation go away. It's still
>>> common rather than per-arch code then. As suggested elsewhere,
>>> I think the whole ioreq_server struct wants to move into
>>> struct domain itself, perhaps inside a new #ifdef (iirc one of
>>> the patches introduces a suitable Kconfig option).
>> Well, your advise regarding ioreq_server sounds reasonable, but the
>> common ioreq.c
>> still will have other *arch.hvm.* for both vcpu and domain. So looks
>> like other involved structs should be moved
>> into *common* struct domain/vcpu itself, correct? Some of them could be
>> moved easily since contain the same fields (arch.hvm.ioreq_gfn),
>> but some of them couldn't and seems to require to pull a lot of changes
>> to the Xen code (arch.hvm.params, arch.hvm.hvm_io), I am afraid.
>> Or I missed something?
> arch.hvm.params, iirc, is an x86 concept, and hence would need
> abstracting away anyway. I expect this will be common pattern:
> Either you want things to become generic (structure fields
> living directly in struct domain, or at least not under arch.hvm),
> or things need abstracting for per-arch handling.
Got it.

Let me please clarify one more question.
In order to avoid the layering violation in the current patch we could 
apply a combined approach.

1. *arch.hvm.ioreq_gfn* and *arch.hvm.ioreq_server*: Both structs go 
into common struct domain.

2. *arch.hvm.params*: The two functions that use it 
(hvm_alloc_legacy_ioreq_gfn/hvm_free_legacy_ioreq_gfn) either go into 
arch code completely, or
     a specific macro is used in common code:

    #define ioreq_get_params(d, i) ((d)->arch.hvm.params[i])

    I would prefer the macro to moving the functions to arch code (they 
are identical and would have to remain in sync).

3. *arch.hvm.hvm_io*: We could also use the following:

    #define ioreq_get_io_completion(v) ((v)->arch.hvm.hvm_io.io_completion)
    #define ioreq_get_io_req(v) ((v)->arch.hvm.hvm_io.io_req)

    This way struct hvm_vcpu_io won't be used in common code as well.

    Are #2 and #3 appropriate to go with?


A dirty, untested patch (which applies on top of the whole series and 
targets Arm only) shows how it could look.


diff --git a/xen/common/ioreq.c b/xen/common/ioreq.c
index 2e85ea7..5894bdab 100644
--- a/xen/common/ioreq.c
+++ b/xen/common/ioreq.c
@@ -194,7 +194,7 @@ static bool hvm_wait_for_io(struct hvm_ioreq_vcpu *sv, ioreq_t *p)
  bool handle_hvm_io_completion(struct vcpu *v)
  {
      struct domain *d = v->domain;
-    struct hvm_vcpu_io *vio = &v->arch.hvm.hvm_io;
+    ioreq_t io_req = ioreq_get_io_req(v);
      struct hvm_ioreq_server *s;
      struct hvm_ioreq_vcpu *sv;
      enum hvm_io_completion io_completion;
@@ -209,14 +209,14 @@ bool handle_hvm_io_completion(struct vcpu *v)
      if ( sv && !hvm_wait_for_io(sv, get_ioreq(s, v)) )
          return false;

-    vio->io_req.state = hvm_ioreq_needs_completion(&vio->io_req) ?
+    io_req.state = hvm_ioreq_needs_completion(&io_req) ?
          STATE_IORESP_READY : STATE_IOREQ_NONE;

      msix_write_completion(v);
      vcpu_end_shutdown_deferral(v);

-    io_completion = vio->io_completion;
-    vio->io_completion = HVMIO_no_completion;
+    io_completion = ioreq_get_io_completion(v);
+    ioreq_get_io_completion(v) = HVMIO_no_completion;

      switch ( io_completion )
      {
@@ -227,8 +227,8 @@ bool handle_hvm_io_completion(struct vcpu *v)
          return ioreq_handle_complete_mmio();

      case HVMIO_pio_completion:
-        return handle_pio(vio->io_req.addr, vio->io_req.size,
-                          vio->io_req.dir);
+        return handle_pio(io_req.addr, io_req.size,
+                          io_req.dir);

      default:
          return arch_handle_hvm_io_completion(io_completion);
@@ -247,7 +247,7 @@ static gfn_t hvm_alloc_legacy_ioreq_gfn(struct hvm_ioreq_server *s)
      for ( i = HVM_PARAM_IOREQ_PFN; i <= HVM_PARAM_BUFIOREQ_PFN; i++ )
      {
          if ( !test_and_clear_bit(i, &d->ioreq_gfn.legacy_mask) )
-            return _gfn(d->arch.hvm.params[i]);
+            return _gfn(ioreq_get_params(d, i));
      }

      return INVALID_GFN;
@@ -279,7 +279,7 @@ static bool hvm_free_legacy_ioreq_gfn(struct hvm_ioreq_server *s,

      for ( i = HVM_PARAM_IOREQ_PFN; i <= HVM_PARAM_BUFIOREQ_PFN; i++ )
      {
-        if ( gfn_eq(gfn, _gfn(d->arch.hvm.params[i])) )
+        if ( gfn_eq(gfn, _gfn(ioreq_get_params(d, i))) )
               break;
      }
      if ( i > HVM_PARAM_BUFIOREQ_PFN )
diff --git a/xen/include/asm-arm/domain.h b/xen/include/asm-arm/domain.h
index 0e3ef20..ff761f5 100644
--- a/xen/include/asm-arm/domain.h
+++ b/xen/include/asm-arm/domain.h
@@ -21,6 +21,8 @@ struct hvm_domain
      uint64_t              params[HVM_NR_PARAMS];
  };

+#define ioreq_get_params(d, i) ((d)->arch.hvm.params[i])
+
  #ifdef CONFIG_ARM_64
  enum domain_type {
      DOMAIN_32BIT,
@@ -120,6 +122,9 @@ struct hvm_vcpu_io {
      unsigned long       mmio_gpfn;
  };

+#define ioreq_get_io_completion(v) ((v)->arch.hvm.hvm_io.io_completion)
+#define ioreq_get_io_req(v) ((v)->arch.hvm.hvm_io.io_req)
+
  struct arch_vcpu
  {
      struct {


>
>>> This goes
>>> alongside my suggestion to drop the "hvm" prefixes and infixes
>>> from involved function names.
>> Well, I assume this request as well as the request above should be
>> addressed in the follow-up patches, as we want to keep the code movement
>> in current patch as (almost) verbatim copy,
>> Am I correct?
> The renaming could imo be done before or after the move, but within
> a single series. Doing it (or some of it) during the move may be
> acceptable, but this primarily depends on the overall effect on the
> patch that this would have. I.e. the patch better wouldn't become
> gigantic just because all the renaming gets done in one go, and it's
> hundreds of places that need touching.

Got it as well.

Thank you for the explanation.

-- 
Regards,

Oleksandr Tyshchenko



^ permalink raw reply related	[flat|nested] 111+ messages in thread

* Re: [PATCH V1 02/16] xen/ioreq: Make x86's IOREQ feature common
  2020-09-22 15:05             ` Oleksandr
@ 2020-09-22 15:52               ` Jan Beulich
  2020-09-23 12:28                 ` Oleksandr
  0 siblings, 1 reply; 111+ messages in thread
From: Jan Beulich @ 2020-09-22 15:52 UTC (permalink / raw)
  To: Oleksandr
  Cc: xen-devel, Oleksandr Tyshchenko, Andrew Cooper, George Dunlap,
	Ian Jackson, Julien Grall, Stefano Stabellini, Wei Liu,
	Roger Pau Monné,
	Paul Durrant, Jun Nakajima, Kevin Tian, Tim Deegan, Julien Grall

On 22.09.2020 17:05, Oleksandr wrote:
> 2. *arch.hvm.params*: Two functions that use it 
> (hvm_alloc_legacy_ioreq_gfn/hvm_free_legacy_ioreq_gfn) either go into 
> arch code completely or
>      specific macro is used in common code:
> 
>     #define ioreq_get_params(d, i) ((d)->arch.hvm.params[i])

If Arm has the concept of params, then perhaps. But I didn't think
Arm does ...

>     I would prefer macro than moving functions to arch code (which are 
> equal and should remain in sync).

Yes, if the rest of the code is identical, I agree it's better to
merely abstract away small pieces like this one.

> 3. *arch.hvm.hvm_io*: We could also use the following:
> 
>     #define ioreq_get_io_completion(v) ((v)->arch.hvm.hvm_io.io_completion)
>     #define ioreq_get_io_req(v) ((v)->arch.hvm.hvm_io.io_req)
> 
>     This way struct hvm_vcpu_io won't be used in common code as well.

But if Arm needs similar field, why keep them in arch.hvm.hvm_io?

> --- a/xen/common/ioreq.c
> +++ b/xen/common/ioreq.c
> @@ -194,7 +194,7 @@ static bool hvm_wait_for_io(struct hvm_ioreq_vcpu 
> *sv, ioreq_t *p)
>   bool handle_hvm_io_completion(struct vcpu *v)
>   {
>       struct domain *d = v->domain;
> -    struct hvm_vcpu_io *vio = &v->arch.hvm.hvm_io;
> +    ioreq_t io_req = ioreq_get_io_req(v);
>       struct hvm_ioreq_server *s;
>       struct hvm_ioreq_vcpu *sv;
>       enum hvm_io_completion io_completion;
> @@ -209,14 +209,14 @@ bool handle_hvm_io_completion(struct vcpu *v)
>       if ( sv && !hvm_wait_for_io(sv, get_ioreq(s, v)) )
>           return false;
> 
> -    vio->io_req.state = hvm_ioreq_needs_completion(&vio->io_req) ?
> +    io_req.state = hvm_ioreq_needs_completion(&io_req) ?
>           STATE_IORESP_READY : STATE_IOREQ_NONE;

This is unlikely to be correct - you're now updating an on-stack
copy of the ioreq_t instead of what vio points at.
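
Keeping a pointer would presumably preserve the in-place update; a
sketch, reusing the accessor macro from the draft above:

    ioreq_t *io_req = &ioreq_get_io_req(v);

    /* ... */

    io_req->state = hvm_ioreq_needs_completion(io_req) ?
        STATE_IORESP_READY : STATE_IOREQ_NONE;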

>       msix_write_completion(v);
>       vcpu_end_shutdown_deferral(v);
> 
> -    io_completion = vio->io_completion;
> -    vio->io_completion = HVMIO_no_completion;
> +    io_completion = ioreq_get_io_completion(v);
> +    ioreq_get_io_completion(v) = HVMIO_no_completion;

I think it's at least odd to have an lvalue with this kind of a
name. Perhaps want to drop "get" if it's really meant to be used
like this.

> @@ -247,7 +247,7 @@ static gfn_t hvm_alloc_legacy_ioreq_gfn(struct 
> hvm_ioreq_server *s)
>       for ( i = HVM_PARAM_IOREQ_PFN; i <= HVM_PARAM_BUFIOREQ_PFN; i++ )
>       {
>           if ( !test_and_clear_bit(i, &d->ioreq_gfn.legacy_mask) )
> -            return _gfn(d->arch.hvm.params[i]);
> +            return _gfn(ioreq_get_params(d, i));
>       }
> 
>       return INVALID_GFN;
> @@ -279,7 +279,7 @@ static bool hvm_free_legacy_ioreq_gfn(struct 
> hvm_ioreq_server *s,
> 
>       for ( i = HVM_PARAM_IOREQ_PFN; i <= HVM_PARAM_BUFIOREQ_PFN; i++ )
>       {
> -        if ( gfn_eq(gfn, _gfn(d->arch.hvm.params[i])) )
> +        if ( gfn_eq(gfn, _gfn(ioreq_get_params(d, i))) )
>                break;
>       }
>       if ( i > HVM_PARAM_BUFIOREQ_PFN )

And these two are needed by Arm? Shouldn't Arm exclusively use
the new model, via acquire_resource?

Jan


^ permalink raw reply	[flat|nested] 111+ messages in thread

* Re: [PATCH V1 03/16] xen/ioreq: Make x86's hvm_ioreq_needs_completion() common
  2020-09-14 14:59   ` Jan Beulich
@ 2020-09-22 16:16     ` Oleksandr
  2020-09-23 17:27     ` Julien Grall
  1 sibling, 0 replies; 111+ messages in thread
From: Oleksandr @ 2020-09-22 16:16 UTC (permalink / raw)
  To: Jan Beulich
  Cc: xen-devel, Oleksandr Tyshchenko, Jun Nakajima, Kevin Tian,
	Andrew Cooper, Wei Liu, Roger Pau Monné,
	Paul Durrant, Julien Grall, Stefano Stabellini, Julien Grall


On 14.09.20 17:59, Jan Beulich wrote:

Hi Jan

> On 10.09.2020 22:21, Oleksandr Tyshchenko wrote:
>> --- a/xen/include/xen/ioreq.h
>> +++ b/xen/include/xen/ioreq.h
>> @@ -35,6 +35,13 @@ static inline struct hvm_ioreq_server *get_ioreq_server(const struct domain *d,
>>       return GET_IOREQ_SERVER(d, id);
>>   }
>>   
>> +static inline bool hvm_ioreq_needs_completion(const ioreq_t *ioreq)
>> +{
>> +    return ioreq->state == STATE_IOREQ_READY &&
>> +           !ioreq->data_is_ptr &&
>> +           (ioreq->type != IOREQ_TYPE_PIO || ioreq->dir != IOREQ_WRITE);
>> +}
> While the PIO aspect has been discussed to some length, what about
> the data_is_ptr concept? I didn't think there were Arm insns fitting
> this? Instead I thought some other Arm-specific adjustments to the
> protocol might be needed. At which point the question of course would
> be in how far ioreq_t as a whole really fits Arm in its current shape.
I may be mistaken here, but I don't think "data_is_ptr" is supported.
It is worth mentioning that on Arm, all accesses to an MMIO region are done 
as a single memory access.
So we set "df" to 0 and "count" to 1. The other ioreq_t fields are in use.

-- 
Regards,

Oleksandr Tyshchenko



^ permalink raw reply	[flat|nested] 111+ messages in thread

* Re: [PATCH V1 04/16] xen/ioreq: Provide alias for the handle_mmio()
  2020-09-14 15:10   ` Jan Beulich
@ 2020-09-22 16:20     ` Oleksandr
  0 siblings, 0 replies; 111+ messages in thread
From: Oleksandr @ 2020-09-22 16:20 UTC (permalink / raw)
  To: Jan Beulich
  Cc: xen-devel, Oleksandr Tyshchenko, Paul Durrant, Andrew Cooper,
	Wei Liu, Roger Pau Monné,
	Julien Grall, Stefano Stabellini, Julien Grall


On 14.09.20 18:10, Jan Beulich wrote:

Hi Jan

> On 10.09.2020 22:21, Oleksandr Tyshchenko wrote:
>> --- a/xen/common/ioreq.c
>> +++ b/xen/common/ioreq.c
>> @@ -189,7 +189,7 @@ bool handle_hvm_io_completion(struct vcpu *v)
>>           break;
>>   
>>       case HVMIO_mmio_completion:
>> -        return handle_mmio();
>> +        return ioreq_handle_complete_mmio();
> Again the question: Any particular reason to have "handle" in here?

"Handle" has been already discussed in previous patches). Will remove.


> With the abstraction simply named ioreq_complete_mmio() feel free
> to add
> Acked-by: Jan Beulich <jbeulich@suse.com>


Thank you.

-- 
Regards,

Oleksandr Tyshchenko



^ permalink raw reply	[flat|nested] 111+ messages in thread

* Re: [PATCH V1 05/16] xen/ioreq: Make x86's hvm_mmio_first(last)_byte() common
  2020-09-14 15:13   ` Jan Beulich
@ 2020-09-22 16:24     ` Oleksandr
  0 siblings, 0 replies; 111+ messages in thread
From: Oleksandr @ 2020-09-22 16:24 UTC (permalink / raw)
  To: Jan Beulich
  Cc: xen-devel, Oleksandr Tyshchenko, Paul Durrant, Andrew Cooper,
	Wei Liu, Roger Pau Monné,
	Julien Grall, Stefano Stabellini, Julien Grall


On 14.09.20 18:13, Jan Beulich wrote:

Hi Jan

> On 10.09.2020 22:21, Oleksandr Tyshchenko wrote:
>> From: Oleksandr Tyshchenko <oleksandr_tyshchenko@epam.com>
>>
>> The IOREQ is a common feature now and these helpers will be used
>> on Arm as is. Move them to include/xen/ioreq.h
> I think you also want to renamed them to replace the hvm_
> prefix by ioreq_.

ok

-- 
Regards,

Oleksandr Tyshchenko



^ permalink raw reply	[flat|nested] 111+ messages in thread

* Re: [PATCH V1 06/16] xen/ioreq: Make x86's hvm_ioreq_(page/vcpu/server) structs common
  2020-09-14 15:16   ` Jan Beulich
  2020-09-14 15:59     ` Julien Grall
@ 2020-09-22 16:33     ` Oleksandr
  1 sibling, 0 replies; 111+ messages in thread
From: Oleksandr @ 2020-09-22 16:33 UTC (permalink / raw)
  To: Jan Beulich
  Cc: xen-devel, Oleksandr Tyshchenko, Andrew Cooper, Wei Liu,
	Roger Pau Monné,
	Paul Durrant, Julien Grall, Stefano Stabellini, Julien Grall


On 14.09.20 18:16, Jan Beulich wrote:

Hi Jan

> On 10.09.2020 22:22, Oleksandr Tyshchenko wrote:
>> --- a/xen/include/xen/ioreq.h
>> +++ b/xen/include/xen/ioreq.h
>> @@ -23,6 +23,40 @@
>>   
>>   #include <asm/hvm/ioreq.h>
>>   
>> +struct hvm_ioreq_page {
>> +    gfn_t gfn;
>> +    struct page_info *page;
>> +    void *va;
>> +};
>> +
>> +struct hvm_ioreq_vcpu {
>> +    struct list_head list_entry;
>> +    struct vcpu      *vcpu;
>> +    evtchn_port_t    ioreq_evtchn;
>> +    bool             pending;
>> +};
>> +
>> +#define NR_IO_RANGE_TYPES (XEN_DMOP_IO_RANGE_PCI + 1)
>> +#define MAX_NR_IO_RANGES  256
>> +
>> +struct hvm_ioreq_server {
>> +    struct domain          *target, *emulator;
>> +
>> +    /* Lock to serialize toolstack modifications */
>> +    spinlock_t             lock;
>> +
>> +    struct hvm_ioreq_page  ioreq;
>> +    struct list_head       ioreq_vcpu_list;
>> +    struct hvm_ioreq_page  bufioreq;
>> +
>> +    /* Lock to serialize access to buffered ioreq ring */
>> +    spinlock_t             bufioreq_lock;
>> +    evtchn_port_t          bufioreq_evtchn;
>> +    struct rangeset        *range[NR_IO_RANGE_TYPES];
>> +    bool                   enabled;
>> +    uint8_t                bufioreq_handling;
>> +};
> Besides there again being the question of hvm_ prefixes here,
> is the bufioreq concept something Arm is meaning to make use
> of? If not, this may want to become conditional ...
The hvm_ prefixes will be removed.
Regarding the bufioreq concept, I agree with what Julien said. Although 
we don't need it right away on Arm, we can use it later on for the 
virtio improvements.

-- 
Regards,

Oleksandr Tyshchenko




* Re: [PATCH V1 07/16] xen/dm: Make x86's DM feature common
  2020-09-14 15:56   ` Jan Beulich
@ 2020-09-22 16:46     ` Oleksandr
  2020-09-24 11:03       ` Jan Beulich
  0 siblings, 1 reply; 111+ messages in thread
From: Oleksandr @ 2020-09-22 16:46 UTC (permalink / raw)
  To: Jan Beulich
  Cc: xen-devel, Oleksandr Tyshchenko, Andrew Cooper, Wei Liu,
	Roger Pau Monné,
	George Dunlap, Ian Jackson, Julien Grall, Stefano Stabellini,
	Daniel De Graaf, Julien Grall


On 14.09.20 18:56, Jan Beulich wrote:
Hi Jan

> On 10.09.2020 22:22, Oleksandr Tyshchenko wrote:
>> --- a/xen/include/xen/hypercall.h
>> +++ b/xen/include/xen/hypercall.h
>> @@ -150,6 +150,18 @@ do_dm_op(
>>       unsigned int nr_bufs,
>>       XEN_GUEST_HANDLE_PARAM(xen_dm_op_buf_t) bufs);
>>   
>> +struct dmop_args {
>> +    domid_t domid;
>> +    unsigned int nr_bufs;
>> +    /* Reserve enough buf elements for all current hypercalls. */
>> +    struct xen_dm_op_buf buf[2];
>> +};
>> +
>> +int arch_dm_op(struct xen_dm_op *op,
>> +               struct domain *d,
>> +               const struct dmop_args *op_args,
>> +               bool *const_op);
>> +
>>   #ifdef CONFIG_HYPFS
>>   extern long
>>   do_hypfs_op(
> There are exactly two CUs which need to see these two declarations.
> Personally I think they should go into a new header, or at least
> into one that half-way fits (from the pov of its other contents)
> and doesn't get included by half the code base. But maybe it's
> just me ...

I am afraid I didn't get why this header is not suitable for keeping 
this stuff...

But I am not against moving this into a new header (probably dm.h?)
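
A sketch of what such a header (tentatively xen/dm.h; the name and exact 
contents are still open) might carry:

    /* xen/include/xen/dm.h -- hypothetical new home for these bits */
    #ifndef __XEN_DM_H__
    #define __XEN_DM_H__

    #include <public/hvm/dm_op.h>

    struct dmop_args {
        domid_t domid;
        unsigned int nr_bufs;
        /* Reserve enough buf elements for all current hypercalls. */
        struct xen_dm_op_buf buf[2];
    };

    int arch_dm_op(struct xen_dm_op *op, struct domain *d,
                   const struct dmop_args *op_args, bool *const_op);

    #endif /* __XEN_DM_H__ */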

-- 
Regards,

Oleksandr Tyshchenko




* Re: [PATCH V1 09/16] arm/ioreq: Introduce arch specific bits for IOREQ/DM features
  2020-09-16  7:51   ` Jan Beulich
@ 2020-09-22 17:12     ` Oleksandr
  0 siblings, 0 replies; 111+ messages in thread
From: Oleksandr @ 2020-09-22 17:12 UTC (permalink / raw)
  To: Jan Beulich
  Cc: xen-devel, Oleksandr Tyshchenko, Stefano Stabellini,
	Julien Grall, Volodymyr Babchuk, Julien Grall


On 16.09.20 10:51, Jan Beulich wrote:

Hi Jan

> On 10.09.2020 22:22, Oleksandr Tyshchenko wrote:
>> @@ -2277,8 +2299,10 @@ void leave_hypervisor_to_guest(void)
>>   {
>>       local_irq_disable();
>>   
>> -    check_for_vcpu_work();
>> -    check_for_pcpu_work();
>> +    do
>> +    {
>> +        check_for_pcpu_work();
>> +    } while ( check_for_vcpu_work() );
> Looking at patch 11 I've stumbled across changes done to code
> related to this, and now I wonder: There's no mention in the
> description of why this safe (i.e. not a potentially unbounded
> loop).
Indeed, there was a discussion regarding that. Probably I should have 
added an explanation.
Please see the thoughts about that, i.e. why it was decided to let the 
vCPU spin forever if the I/O never completes
(check_for_vcpu_work() never returns false) and why that was considered 
a safe action:
https://patchwork.kernel.org/patch/11698549/#23540209


>
> As a nit - the opening brace does not belong on its own line in
> this specific case.
ok
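
i.e. the loop would become (style fix only, no functional change):

    do {
        check_for_pcpu_work();
    } while ( check_for_vcpu_work() );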

-- 
Regards,

Oleksandr Tyshchenko




* Re: [PATCH V1 10/16] xen/mm: Handle properly reference in set_foreign_p2m_entry() on Arm
  2020-09-16  8:55         ` Julien Grall
@ 2020-09-22 17:30           ` Oleksandr
  0 siblings, 0 replies; 111+ messages in thread
From: Oleksandr @ 2020-09-22 17:30 UTC (permalink / raw)
  To: Julien Grall, Jan Beulich
  Cc: xen-devel, Oleksandr Tyshchenko, Stefano Stabellini,
	Volodymyr Babchuk, Andrew Cooper, George Dunlap, Ian Jackson,
	Wei Liu, Roger Pau Monné,
	Julien Grall


On 16.09.20 11:55, Julien Grall wrote:

Hi Julien, Jan

>
>
> On 16/09/2020 09:52, Jan Beulich wrote:
>> On 16.09.2020 10:50, Julien Grall wrote:
>>> On 16/09/2020 08:17, Jan Beulich wrote:
>>>> On 10.09.2020 22:22, Oleksandr Tyshchenko wrote:
>>>>>            for ( i = 0; !rc && i < xmar.nr_frames; i++ )
>>>>>            {
>>>>> -            rc = set_foreign_p2m_entry(currd, gfn_list[i],
>>>>> +            rc = set_foreign_p2m_entry(currd, d, gfn_list[i],
>>>>> _mfn(mfn_list[i]));
>>>>
>>>> Is it going to lead to proper behavior when d == currd, specifically
>>>> for Arm but also in the abstract model? If you expose this to other
>>>> than Dom0, this case needs at least considering (and hence mentioning
>>>> in the description of why it's safe to permit if you don't reject
>>>> such attempts).
>>>
>>> This is already rejected by rcu_lock_remote_domain_by_id().
>>
>> Oh, yes, I'm sorry for overlooking this.
>
> That's fine, I also overlooked it when I originally wrote the code.
>
> @oleksandr, it might be worth mentioning this subtlety in the commit 
> message and maybe in the code.

Yes, will do.

Also the following will be taken into the account:

1. Implement arch_refcounts_p2m()
2. Move function declaration to xen/p2m-common.h
3. Add const to new parameter
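
With items 2 and 3 applied, the declaration could end up roughly as 
follows (a sketch; the final prototype is of course subject to review):

    /* xen/include/xen/p2m-common.h */
    int set_foreign_p2m_entry(struct domain *d, const struct domain *fd,
                              unsigned long gfn, mfn_t mfn);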


-- 
Regards,

Oleksandr Tyshchenko




* Re: [PATCH V1 11/16] xen/ioreq: Introduce hvm_domain_has_ioreq_server()
  2020-09-16  8:04   ` Jan Beulich
  2020-09-16  8:13     ` Paul Durrant
@ 2020-09-22 18:23     ` Oleksandr
  1 sibling, 0 replies; 111+ messages in thread
From: Oleksandr @ 2020-09-22 18:23 UTC (permalink / raw)
  To: Jan Beulich
  Cc: Paul Durrant, xen-devel, Oleksandr Tyshchenko,
	Stefano Stabellini, Julien Grall, Volodymyr Babchuk,
	Andrew Cooper, Wei Liu, Roger Pau Monné,
	Julien Grall


On 16.09.20 11:04, Jan Beulich wrote:

Hi Jan

> On 10.09.2020 22:22, Oleksandr Tyshchenko wrote:
>> From: Oleksandr Tyshchenko <oleksandr_tyshchenko@epam.com>
>>
>> This patch introduces a helper the main purpose of which is to check
>> if a domain is using IOREQ server(s).
>>
>> On Arm the benefit is to avoid calling handle_hvm_io_completion()
>> (which implies iterating over all possible IOREQ servers anyway)
>> on every return in leave_hypervisor_to_guest() if there are no active
>> servers for the particular domain.
>>
>> This involves adding an extra per-domain variable to store the count
>> of servers in use.
> Since only Arm needs the variable (and the helper), perhaps both should
> be Arm-specific (which looks to be possible without overly much hassle)?
Please note that the whole ioreq_server struct is going to be moved to 
the "common" part of struct domain, and the new variable is going to go 
into it.
I am wondering whether this single-line helper could be used on x86 or 
on a potential new arch ...


>
>> --- a/xen/common/ioreq.c
>> +++ b/xen/common/ioreq.c
>> @@ -38,9 +38,15 @@ static void set_ioreq_server(struct domain *d, unsigned int id,
>>                                struct hvm_ioreq_server *s)
>>   {
>>       ASSERT(id < MAX_NR_IOREQ_SERVERS);
>> -    ASSERT(!s || !d->arch.hvm.ioreq_server.server[id]);
>> +    ASSERT((!s && d->arch.hvm.ioreq_server.server[id]) ||
>> +           (s && !d->arch.hvm.ioreq_server.server[id]));
> For one, this can be had with less redundancy (and imo even improved
> clarity, but I guess this latter aspect my depend on personal
> preferences):
>
>      ASSERT(d->arch.hvm.ioreq_server.server[id] ? !s : !!s);

This construction is indeed better.


>
> But then I wonder whether the original intention wasn't rather such
> that replacing NULL by NULL is permissible. Paul?
>
>>       d->arch.hvm.ioreq_server.server[id] = s;
>> +
>> +    if ( s )
>> +        d->arch.hvm.ioreq_server.nr_servers ++;
>> +    else
>> +        d->arch.hvm.ioreq_server.nr_servers --;
> Nit: Stray blanks (should be there only with binary operators).

ok
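
i.e.:

    d->arch.hvm.ioreq_server.server[id] = s;

    if ( s )
        d->arch.hvm.ioreq_server.nr_servers++;
    else
        d->arch.hvm.ioreq_server.nr_servers--;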


>
>> @@ -1395,6 +1401,7 @@ unsigned int hvm_broadcast_ioreq(ioreq_t *p, bool buffered)
>>   void hvm_ioreq_init(struct domain *d)
>>   {
>>       spin_lock_init(&d->arch.hvm.ioreq_server.lock);
>> +    d->arch.hvm.ioreq_server.nr_servers = 0;
> There's no need for this - struct domain instances start out all
> zero anyway.

ok


>
>> --- a/xen/include/xen/ioreq.h
>> +++ b/xen/include/xen/ioreq.h
>> @@ -57,6 +57,11 @@ struct hvm_ioreq_server {
>>       uint8_t                bufioreq_handling;
>>   };
>>   
>> +static inline bool hvm_domain_has_ioreq_server(const struct domain *d)
>> +{
>> +    return (d->arch.hvm.ioreq_server.nr_servers > 0);
>> +}
> This is safe only when d == current->domain and it's not paused,
> or when they're distinct and d is paused. Otherwise the result is
> stale before the caller can inspect it. This wants documenting by
> at least a comment, but perhaps better by suitable ASSERT()s.

Agree, will use ASSERT()s.


>
> As in earlier patches I don't think a hvm_ prefix should be used
> here.

ok


>
> Also as a nit: The parentheses here are unnecessary, and strictly
> speaking the "> 0" is, too.

ok
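
Putting the agreed changes together, the helper might end up roughly as 
follows (a sketch; the exact ASSERT condition is my assumption):

    static inline bool domain_has_ioreq_server(const struct domain *d)
    {
        /*
         * The count is only stable if d is the current (unpaused)
         * domain, or if d is distinct from the current domain and
         * paused.
         */
        ASSERT((current->domain == d) || atomic_read(&d->pause_count));

        return d->arch.hvm.ioreq_server.nr_servers;
    }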


-- 
Regards,

Oleksandr Tyshchenko




* Re: [PATCH V1 11/16] xen/ioreq: Introduce hvm_domain_has_ioreq_server()
  2020-09-16  8:43         ` Paul Durrant
@ 2020-09-22 18:39           ` Oleksandr
  0 siblings, 0 replies; 111+ messages in thread
From: Oleksandr @ 2020-09-22 18:39 UTC (permalink / raw)
  To: paul, 'Julien Grall', 'Jan Beulich'
  Cc: xen-devel, 'Oleksandr Tyshchenko',
	'Stefano Stabellini', 'Volodymyr Babchuk',
	'Andrew Cooper', 'Wei Liu',
	'Roger Pau Monné', 'Julien Grall'


On 16.09.20 11:43, Paul Durrant wrote:

Hi all.

>> -----Original Message-----
>> From: Julien Grall <julien@xen.org>
>> Sent: 16 September 2020 09:39
>> To: paul@xen.org; 'Jan Beulich' <jbeulich@suse.com>; 'Oleksandr Tyshchenko' <olekstysh@gmail.com>
>> Cc: xen-devel@lists.xenproject.org; 'Oleksandr Tyshchenko' <oleksandr_tyshchenko@epam.com>; 'Stefano
>> Stabellini' <sstabellini@kernel.org>; 'Volodymyr Babchuk' <Volodymyr_Babchuk@epam.com>; 'Andrew
>> Cooper' <andrew.cooper3@citrix.com>; 'Wei Liu' <wl@xen.org>; 'Roger Pau Monné' <roger.pau@citrix.com>;
>> 'Julien Grall' <julien.grall@arm.com>
>> Subject: Re: [PATCH V1 11/16] xen/ioreq: Introduce hvm_domain_has_ioreq_server()
>>
>>
>>
>> On 16/09/2020 09:13, Paul Durrant wrote:
>>>> -----Original Message-----
>>>> From: Jan Beulich <jbeulich@suse.com>
>>>> Sent: 16 September 2020 09:05
>>>> To: Oleksandr Tyshchenko <olekstysh@gmail.com>; Paul Durrant <paul@xen.org>
>>>> Cc: xen-devel@lists.xenproject.org; Oleksandr Tyshchenko <oleksandr_tyshchenko@epam.com>; Stefano
>>>> Stabellini <sstabellini@kernel.org>; Julien Grall <julien@xen.org>; Volodymyr Babchuk
>>>> <Volodymyr_Babchuk@epam.com>; Andrew Cooper <andrew.cooper3@citrix.com>; Wei Liu <wl@xen.org>;
>> Roger
>>>> Pau Monné <roger.pau@citrix.com>; Julien Grall <julien.grall@arm.com>
>>>> Subject: Re: [PATCH V1 11/16] xen/ioreq: Introduce hvm_domain_has_ioreq_server()
>>>>
>>>> On 10.09.2020 22:22, Oleksandr Tyshchenko wrote:
>>>>> From: Oleksandr Tyshchenko <oleksandr_tyshchenko@epam.com>
>>>>>
>>>>> This patch introduces a helper the main purpose of which is to check
>>>>> if a domain is using IOREQ server(s).
>>>>>
>>>>> On Arm the benefit is to avoid calling handle_hvm_io_completion()
>>>>> (which implies iterating over all possible IOREQ servers anyway)
>>>>> on every return in leave_hypervisor_to_guest() if there are no active
>>>>> servers for the particular domain.
>>>>>
>>> Is this really worth it? The limit on the number of ioreq servers is small... just 8.
>> When I suggested this, I failed to realize there were only 8 IOREQ
>> servers available. However, I would not be surprised if this increases
>> long term as we want to use
> If that happens then we'll probably want to move (back to) a list rather than an array...
>
>>> I doubt you'd be able measure the difference.
>> Bear in mind that entry/exit to the hypervisor is pretty "cheap" on Arm
>> compare to x86. So we want to avoid doing extra work if it is not necessary.
>>
> ... which will seamlessly deal with this issue.


Please note that, in addition to the benefit for the exit path on Arm, we 
could also use this helper to check whether a domain is using IOREQ here [1],
to avoid an extra action (the send_invalidate_req() call); see the sketch 
after the link below.

[1] https://patchwork.kernel.org/patch/11769143/
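
Something along these lines (a sketch reusing the helper; "currd" is 
assumed to latch current->domain, and the placement mirrors patch 13):

#ifdef CONFIG_IOREQ_SERVER
    if ( unlikely(currd->qemu_mapcache_invalidate) &&
         domain_has_ioreq_server(currd) &&
         test_and_clear_bool(currd->qemu_mapcache_invalidate) )
        send_invalidate_req();
#endif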


-- 
Regards,

Oleksandr Tyshchenko




* Re: [PATCH V1 13/16] xen/ioreq: Make x86's invalidate qemu mapcache handling common
  2020-09-16  8:50   ` Jan Beulich
@ 2020-09-22 19:32     ` Oleksandr
  2020-09-24 11:16       ` Jan Beulich
  0 siblings, 1 reply; 111+ messages in thread
From: Oleksandr @ 2020-09-22 19:32 UTC (permalink / raw)
  To: Jan Beulich, Roger Pau Monné
  Cc: xen-devel, Oleksandr Tyshchenko, Stefano Stabellini,
	Julien Grall, Volodymyr Babchuk, Andrew Cooper, George Dunlap,
	Ian Jackson, Wei Liu, Paul Durrant, Julien Grall


On 16.09.20 11:50, Jan Beulich wrote:

Hi Jan, Roger

> On 10.09.2020 22:22, Oleksandr Tyshchenko wrote:
>> --- a/xen/arch/arm/traps.c
>> +++ b/xen/arch/arm/traps.c
>> @@ -1490,6 +1490,12 @@ static void do_trap_hypercall(struct cpu_user_regs *regs, register_t *nr,
>>       /* Ensure the hypercall trap instruction is re-executed. */
>>       if ( current->hcall_preempted )
>>           regs->pc -= 4;  /* re-execute 'hvc #XEN_HYPERCALL_TAG' */
>> +
>> +#ifdef CONFIG_IOREQ_SERVER
>> +    if ( unlikely(current->domain->qemu_mapcache_invalidate) &&
>> +         test_and_clear_bool(current->domain->qemu_mapcache_invalidate) )
>> +        send_invalidate_req();
>> +#endif
>>   }
> There's a lot of uses of "current" here now, and these don't look to
> exactly be cheap on Arm either (they aren't on x86), so I wonder
> whether this is the point where at least "current" wants latching
> into a local variable here.

Sounds reasonable, will use a local variable.
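
i.e. something along these lines (a sketch):

    struct vcpu *curr = current;    /* latch once */

    /* ... */

    /* Ensure the hypercall trap instruction is re-executed. */
    if ( curr->hcall_preempted )
        regs->pc -= 4;  /* re-execute 'hvc #XEN_HYPERCALL_TAG' */

#ifdef CONFIG_IOREQ_SERVER
    if ( unlikely(curr->domain->qemu_mapcache_invalidate) &&
         test_and_clear_bool(curr->domain->qemu_mapcache_invalidate) )
        send_invalidate_req();
#endif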


>
>> --- a/xen/arch/x86/hvm/hypercall.c
>> +++ b/xen/arch/x86/hvm/hypercall.c
>> @@ -18,8 +18,10 @@
>>    *
>>    * Copyright (c) 2017 Citrix Systems Ltd.
>>    */
>> +
>>   #include <xen/lib.h>
>>   #include <xen/hypercall.h>
>> +#include <xen/ioreq.h>
>>   #include <xen/nospec.h>
> While I don't care much about the presence of absence of the blank
> line between head comment and #include-s, I don't see why you add
> one here.

Accidentally, will remove.


>
>> --- a/xen/common/memory.c
>> +++ b/xen/common/memory.c
>> @@ -1651,6 +1651,11 @@ long do_memory_op(unsigned long cmd, XEN_GUEST_HANDLE_PARAM(void) arg)
>>           break;
>>       }
>>   
>> +#ifdef CONFIG_IOREQ_SERVER
>> +    if ( op == XENMEM_decrease_reservation )
>> +        curr_d->qemu_mapcache_invalidate = true;
>> +#endif
> I don't see why you didn't put this right into decrease_reservation(). This
> isn't just to avoid the extra conditional, but first and foremost to
> avoid bypassing the earlier return from the function (in the case of
> preemption). In the context of this I wonder whether the ordering of
> operations in hvm_hypercall() is actually correct.
Good point, indeed we may return earlier in case of preemption, I missed 
that.
Will move it to decrease_reservation(). But we may return even earlier 
in case of an error...
Now I am wondering whether we should move it to the very beginning of the 
command processing or not.
AFAIU, before this patch qemu_mapcache_invalidate was always set in 
hvm_memory_op() if XENMEM_decrease_reservation came in,
despite possible errors in the command processing.

> I'm also unconvinced curr_d is the right domain in all cases here;
> while this may be a pre-existing issue in principle, I'm afraid it
> gets more pronounced by the logic getting moved to common code.

Sorry I didn't get your concern here.


> Roger - thoughts either way with, in particular, PVH Dom0 in mind?

>> --- a/xen/include/xen/ioreq.h
>> +++ b/xen/include/xen/ioreq.h
>> @@ -97,6 +97,8 @@ static inline bool hvm_ioreq_needs_completion(const ioreq_t *ioreq)
>>              (ioreq->type != IOREQ_TYPE_PIO || ioreq->dir != IOREQ_WRITE);
>>   }
>>   
>> +void send_invalidate_req(void);
> Perhaps rename to ioreq_send_invalidate(), ioreq_send_invalidate_req(),
> or send_invalidate_ioreq() at this occasion?

I would prefer a function with the ioreq_ prefix.


>
>> --- a/xen/include/xen/sched.h
>> +++ b/xen/include/xen/sched.h
>> @@ -512,6 +512,8 @@ struct domain
>>       /* Argo interdomain communication support */
>>       struct argo_domain *argo;
>>   #endif
>> +
>> +    bool_t qemu_mapcache_invalidate;
> "bool" please.

ok

-- 
Regards,

Oleksandr Tyshchenko




* Re: [PATCH V1 14/16] xen/ioreq: Use guest_cmpxchg64() instead of cmpxchg()
  2020-09-16  9:12         ` Julien Grall
@ 2020-09-22 20:05           ` Oleksandr
  2020-09-23 18:12             ` Julien Grall
  0 siblings, 1 reply; 111+ messages in thread
From: Oleksandr @ 2020-09-22 20:05 UTC (permalink / raw)
  To: Julien Grall, paul, 'Jan Beulich'
  Cc: xen-devel, 'Oleksandr Tyshchenko',
	'Stefano Stabellini', 'Julien Grall'


On 16.09.20 12:12, Julien Grall wrote:

Hi all.

>
>
> On 16/09/2020 10:09, Paul Durrant wrote:
>>> -----Original Message-----
>>> From: Julien Grall <julien@xen.org>
>>> Sent: 16 September 2020 10:07
>>> To: Jan Beulich <jbeulich@suse.com>; Oleksandr Tyshchenko 
>>> <olekstysh@gmail.com>
>>> Cc: xen-devel@lists.xenproject.org; Oleksandr Tyshchenko 
>>> <oleksandr_tyshchenko@epam.com>; Paul Durrant
>>> <paul@xen.org>; Stefano Stabellini <sstabellini@kernel.org>; Julien 
>>> Grall <jgrall@amazon.com>
>>> Subject: Re: [PATCH V1 14/16] xen/ioreq: Use guest_cmpxchg64() 
>>> instead of cmpxchg()
>>>
>>>
>>>
>>> On 16/09/2020 10:04, Jan Beulich wrote:
>>>> On 10.09.2020 22:22, Oleksandr Tyshchenko wrote:
>>>>> @@ -1325,7 +1327,7 @@ static int hvm_send_buffered_ioreq(struct 
>>>>> hvm_ioreq_server *s, ioreq_t *p)
>>>>>
>>>>>            new.read_pointer = old.read_pointer - n * 
>>>>> IOREQ_BUFFER_SLOT_NUM;
>>>>>            new.write_pointer = old.write_pointer - n * 
>>>>> IOREQ_BUFFER_SLOT_NUM;
>>>>> -        cmpxchg(&pg->ptrs.full, old.full, new.full);
>>>>> +        guest_cmpxchg64(d, &pg->ptrs.full, old.full, new.full);
>>>>
>>>> But the memory we're updating is shared with s->emulator, not with d,
>>>> if I'm not mistaken.
>>>
>>> It is unfortunately shared with both s->emulator and d when using the
>>> legacy interface.
>>
>> When using magic pages they should be punched out of the P2M by the 
>> time the code gets here, so the memory should not be guest-visible.
>
> Can you point me to the code that does this?
>
> Cheers,
>
If we are not going to use the legacy interface on Arm, we will have the 
page mapped in a single domain at a time.

I will update the patch to use "s->emulator" if there are no objections.


-- 
Regards,

Oleksandr Tyshchenko




* Re: [PATCH V1 02/16] xen/ioreq: Make x86's IOREQ feature common
  2020-09-22 15:52               ` Jan Beulich
@ 2020-09-23 12:28                 ` Oleksandr
  2020-09-24 10:58                   ` Jan Beulich
  0 siblings, 1 reply; 111+ messages in thread
From: Oleksandr @ 2020-09-23 12:28 UTC (permalink / raw)
  To: Jan Beulich
  Cc: xen-devel, Oleksandr Tyshchenko, Andrew Cooper, George Dunlap,
	Ian Jackson, Julien Grall, Stefano Stabellini, Wei Liu,
	Roger Pau Monné,
	Paul Durrant, Jun Nakajima, Kevin Tian, Tim Deegan, Julien Grall


On 22.09.20 18:52, Jan Beulich wrote:

Hi Jan

> On 22.09.2020 17:05, Oleksandr wrote:
>> 2. *arch.hvm.params*: Two functions that use it
>> (hvm_alloc_legacy_ioreq_gfn/hvm_free_legacy_ioreq_gfn) either go into
>> arch code completely or
>>       specific macro is used in common code:
>>
>>      #define ioreq_get_params(d, i) ((d)->arch.hvm.params[i])
> If Arm has the concept of params, then perhaps. But I didn't think
> Arm does ...
I think it has, to some degree: there is handling of 
HVMOP_set_param/HVMOP_get_param and
there is also code to set up HVM_PARAM_CALLBACK_IRQ.


>
>>      I would prefer macro than moving functions to arch code (which are
>> equal and should remain in sync).
> Yes, if the rest of the code is identical, I agree it's better to
> merely abstract away small pieces like this one.

ok


>
>> 3. *arch.hvm.hvm_io*: We could also use the following:
>>
>>      #define ioreq_get_io_completion(v) ((v)->arch.hvm.hvm_io.io_completion)
>>      #define ioreq_get_io_req(v) ((v)->arch.hvm.hvm_io.io_req)
>>
>>      This way struct hvm_vcpu_io won't be used in common code as well.
> But if Arm needs similar fields, why keep them in arch.hvm.hvm_io?
Yes, Arm needs "some" of the fields, but not "all of them" as x86 has.
For example, Arm needs only the following (at least in the context of 
this series):

+struct hvm_vcpu_io {
+    /* I/O request in flight to device model. */
+    enum hvm_io_completion io_completion;
+    ioreq_t                io_req;
+
+    /*
+     * HVM emulation:
+     *  Linear address @mmio_gla maps to MMIO physical frame @mmio_gpfn.
+     *  The latter is known to be an MMIO frame (not RAM).
+     *  This translation is only valid for accesses as per @mmio_access.
+     */
+    struct npfec        mmio_access;
+    unsigned long       mmio_gla;
+    unsigned long       mmio_gpfn;
+};

But for x86 the number of fields is quite a bit bigger. If they were 
equally applicable to both archs (as is the case with the ioreq_server 
struct), I would move the struct to the common part of struct domain. I 
couldn't think of a better idea than just abstracting accesses to these 
two fields (used in common ioreq.c) via macros.


>
>> --- a/xen/common/ioreq.c
>> +++ b/xen/common/ioreq.c
>> @@ -194,7 +194,7 @@ static bool hvm_wait_for_io(struct hvm_ioreq_vcpu
>> *sv, ioreq_t *p)
>>    bool handle_hvm_io_completion(struct vcpu *v)
>>    {
>>        struct domain *d = v->domain;
>> -    struct hvm_vcpu_io *vio = &v->arch.hvm.hvm_io;
>> +    ioreq_t io_req = ioreq_get_io_req(v);
>>        struct hvm_ioreq_server *s;
>>        struct hvm_ioreq_vcpu *sv;
>>        enum hvm_io_completion io_completion;
>> @@ -209,14 +209,14 @@ bool handle_hvm_io_completion(struct vcpu *v)
>>        if ( sv && !hvm_wait_for_io(sv, get_ioreq(s, v)) )
>>            return false;
>>
>> -    vio->io_req.state = hvm_ioreq_needs_completion(&vio->io_req) ?
>> +    io_req.state = hvm_ioreq_needs_completion(&io_req) ?
>>            STATE_IORESP_READY : STATE_IOREQ_NONE;
> This is unlikely to be correct - you're now updating an on-stack
> copy of the ioreq_t instead of what vio points at.
Oh, thank you for pointing this out; I should have used ioreq_t *io_req = 
&ioreq_get_io_req(v);
I don't like ioreq_get_io_req much; probably ioreq_req would sound a 
little bit better?


>
>>        msix_write_completion(v);
>>        vcpu_end_shutdown_deferral(v);
>>
>> -    io_completion = vio->io_completion;
>> -    vio->io_completion = HVMIO_no_completion;
>> +    io_completion = ioreq_get_io_completion(v);
>> +    ioreq_get_io_completion(v) = HVMIO_no_completion;
> I think it's at least odd to have an lvalue with this kind of a
> name. Perhaps want to drop "get" if it's really meant to be used
> like this.

ok


>
>> @@ -247,7 +247,7 @@ static gfn_t hvm_alloc_legacy_ioreq_gfn(struct
>> hvm_ioreq_server *s)
>>        for ( i = HVM_PARAM_IOREQ_PFN; i <= HVM_PARAM_BUFIOREQ_PFN; i++ )
>>        {
>>            if ( !test_and_clear_bit(i, &d->ioreq_gfn.legacy_mask) )
>> -            return _gfn(d->arch.hvm.params[i]);
>> +            return _gfn(ioreq_get_params(d, i));
>>        }
>>
>>        return INVALID_GFN;
>> @@ -279,7 +279,7 @@ static bool hvm_free_legacy_ioreq_gfn(struct
>> hvm_ioreq_server *s,
>>
>>        for ( i = HVM_PARAM_IOREQ_PFN; i <= HVM_PARAM_BUFIOREQ_PFN; i++ )
>>        {
>> -        if ( gfn_eq(gfn, _gfn(d->arch.hvm.params[i])) )
>> +        if ( gfn_eq(gfn, _gfn(ioreq_get_params(d, i))) )
>>                 break;
>>        }
>>        if ( i > HVM_PARAM_BUFIOREQ_PFN )
> And these two are needed by Arm? Shouldn't Arm exclusively use
> the new model, via acquire_resource?
I dropped the HVMOP plumbing on Arm as it was requested. Only the acquire 
interface should be used.
This code is not supposed to be called on Arm, but it is a part of the 
common code, so we need to find a way to abstract away *arch.hvm.params*.
Am I correct?


-- 
Regards,

Oleksandr Tyshchenko




* Re: [PATCH V1 01/16] x86/ioreq: Prepare IOREQ feature for making it common
  2020-09-10 20:21 ` [PATCH V1 01/16] x86/ioreq: Prepare IOREQ feature for making it common Oleksandr Tyshchenko
  2020-09-14 13:52   ` Jan Beulich
@ 2020-09-23 17:22   ` Julien Grall
  2020-09-23 18:08     ` Oleksandr
  1 sibling, 1 reply; 111+ messages in thread
From: Julien Grall @ 2020-09-23 17:22 UTC (permalink / raw)
  To: Oleksandr Tyshchenko, xen-devel
  Cc: Oleksandr Tyshchenko, Paul Durrant, Jan Beulich, Andrew Cooper,
	Wei Liu, Roger Pau Monné,
	Stefano Stabellini, Julien Grall

Hi,

On 10/09/2020 21:21, Oleksandr Tyshchenko wrote:
> From: Oleksandr Tyshchenko <oleksandr_tyshchenko@epam.com>
> 
> As a lot of x86 code can be re-used on Arm later on, this patch
> prepares IOREQ support before moving to the common code. This way
> we will get almost a verbatim copy for a code movement.

FWIW, I agree with Jan that we need more details on what you are doing 
and why. It would be worth considering splitting this into smaller patches.

> 
> This support is going to be used on Arm to be able run device
> emulator outside of Xen hypervisor.
> 
> Signed-off-by: Julien Grall <julien.grall@arm.com>

Usually the first Signed-off-by is the author of the patch. However, this 
patch looks quite far off from what I originally wrote.

So I don't feel my Signed-off-by is actually warranted here. If you want 
to credit me, then you can mention it in the commit message.

> Signed-off-by: Oleksandr Tyshchenko <oleksandr_tyshchenko@epam.com>
> 

Cheers,

-- 
Julien Grall



* Re: [PATCH V1 03/16] xen/ioreq: Make x86's hvm_ioreq_needs_completion() common
  2020-09-14 14:59   ` Jan Beulich
  2020-09-22 16:16     ` Oleksandr
@ 2020-09-23 17:27     ` Julien Grall
  1 sibling, 0 replies; 111+ messages in thread
From: Julien Grall @ 2020-09-23 17:27 UTC (permalink / raw)
  To: Jan Beulich, Oleksandr Tyshchenko
  Cc: xen-devel, Oleksandr Tyshchenko, Jun Nakajima, Kevin Tian,
	Andrew Cooper, Wei Liu, Roger Pau Monné,
	Paul Durrant, Stefano Stabellini, Julien Grall

Hi,

On 14/09/2020 15:59, Jan Beulich wrote:
> On 10.09.2020 22:21, Oleksandr Tyshchenko wrote:
>> --- a/xen/include/xen/ioreq.h
>> +++ b/xen/include/xen/ioreq.h
>> @@ -35,6 +35,13 @@ static inline struct hvm_ioreq_server *get_ioreq_server(const struct domain *d,
>>       return GET_IOREQ_SERVER(d, id);
>>   }
>>   
>> +static inline bool hvm_ioreq_needs_completion(const ioreq_t *ioreq)
>> +{
>> +    return ioreq->state == STATE_IOREQ_READY &&
>> +           !ioreq->data_is_ptr &&
>> +           (ioreq->type != IOREQ_TYPE_PIO || ioreq->dir != IOREQ_WRITE);
>> +}
> 
> While the PIO aspect has been discussed to some length, what about
> the data_is_ptr concept? I didn't think there were Arm insns fitting
> this? Instead I thought some other Arm-specific adjustments to the
> protocol might be needed. At which point the question of course would
> be in how far ioreq_t as a whole really fits Arm in its current shape.

I would rather not try to re-invent ioreq_t for Arm if we don't need to. 
This is only going to increase the amount of arch-specific code in a 
device emulator that really ought to be agnostic.

At the moment, I think it is fine to have "unused" fields on Arm as long 
as they contain the right values.

So I would rather keep the check in common code as well.

Cheers,

-- 
Julien Grall



* Re: [PATCH V1 04/16] xen/ioreq: Provide alias for the handle_mmio()
  2020-09-10 20:21 ` [PATCH V1 04/16] xen/ioreq: Provide alias for the handle_mmio() Oleksandr Tyshchenko
  2020-09-14 15:10   ` Jan Beulich
@ 2020-09-23 17:28   ` Julien Grall
  2020-09-23 18:17     ` Oleksandr
  1 sibling, 1 reply; 111+ messages in thread
From: Julien Grall @ 2020-09-23 17:28 UTC (permalink / raw)
  To: Oleksandr Tyshchenko, xen-devel
  Cc: Oleksandr Tyshchenko, Paul Durrant, Jan Beulich, Andrew Cooper,
	Wei Liu, Roger Pau Monné,
	Stefano Stabellini, Julien Grall

Hi Oleksandr,

On 10/09/2020 21:21, Oleksandr Tyshchenko wrote:
> From: Oleksandr Tyshchenko <oleksandr_tyshchenko@epam.com>
> 
> The IOREQ is a common feature now and Arm will have its own
> implementation.
> 
> But the name of the function is pretty generic and can be confusing
> on Arm (we already have a try_handle_mmio()).
> 
> In order not to rename the function (which is used for a varying
> set of purposes on x86) globally and get non-confusing variant on Arm
> provide an alias ioreq_handle_complete_mmio() to be used on common and
> Arm code.
> 
> Signed-off-by: Julien Grall <julien.grall@arm.com>

This doesn't look like code I wrote... Can you make sure I am only 
credited for what I wrote?

> Signed-off-by: Oleksandr Tyshchenko <oleksandr_tyshchenko@epam.com>

Cheers,

-- 
Julien Grall



* Re: [PATCH V1 07/16] xen/dm: Make x86's DM feature common
  2020-09-10 20:22 ` [PATCH V1 07/16] xen/dm: Make x86's DM feature common Oleksandr Tyshchenko
  2020-09-14 15:56   ` Jan Beulich
@ 2020-09-23 17:35   ` Julien Grall
  2020-09-23 18:28     ` Oleksandr
  1 sibling, 1 reply; 111+ messages in thread
From: Julien Grall @ 2020-09-23 17:35 UTC (permalink / raw)
  To: Oleksandr Tyshchenko, xen-devel
  Cc: Oleksandr Tyshchenko, Jan Beulich, Andrew Cooper, Wei Liu,
	Roger Pau Monné,
	George Dunlap, Ian Jackson, Stefano Stabellini, Daniel De Graaf,
	Julien Grall



On 10/09/2020 21:22, Oleksandr Tyshchenko wrote:
> From: Oleksandr Tyshchenko <oleksandr_tyshchenko@epam.com>

I believe I am the original author of this code. So this needs to be 
fixed accordingly.

> 
> As a lot of x86 code can be re-used on Arm later on, this patch
> splits devicemodel support into common and arch specific parts.
> 
> Also update XSM code a bit to let DM op be used on Arm.
> 
> This support is going to be used on Arm to be able run device
> emulator outside of Xen hypervisor.
> 
> Signed-off-by: Julien Grall <julien.grall@arm.com>
> Signed-off-by: Oleksandr Tyshchenko <oleksandr_tyshchenko@epam.com>
> 
> ---
> Please note, this is a split/cleanup/hardening of Julien's PoC:
> "Add support for Guest IO forwarding to a device emulator"
> 
> Changes RFC -> V1:
>     - update XSM, related changes were pulled from:
>       [RFC PATCH V1 04/12] xen/arm: Introduce arch specific bits for IOREQ/DM features
> ---
> ---
>   xen/arch/x86/hvm/dm.c       | 287 +++-----------------------------------------
>   xen/common/Makefile         |   1 +
>   xen/common/dm.c             | 287 ++++++++++++++++++++++++++++++++++++++++++++
>   xen/include/xen/hypercall.h |  12 ++
>   xen/include/xsm/dummy.h     |   4 +-
>   xen/include/xsm/xsm.h       |   6 +-
>   xen/xsm/dummy.c             |   2 +-
>   xen/xsm/flask/hooks.c       |   5 +-
>   8 files changed, 327 insertions(+), 277 deletions(-)
>   create mode 100644 xen/common/dm.c
> 
> diff --git a/xen/arch/x86/hvm/dm.c b/xen/arch/x86/hvm/dm.c
> index 5ce484a..6ae535e 100644
> --- a/xen/arch/x86/hvm/dm.c
> +++ b/xen/arch/x86/hvm/dm.c
> @@ -29,13 +29,6 @@
>   
>   #include <public/hvm/hvm_op.h>
>   
> -struct dmop_args {
> -    domid_t domid;
> -    unsigned int nr_bufs;
> -    /* Reserve enough buf elements for all current hypercalls. */
> -    struct xen_dm_op_buf buf[2];
> -};
> -
>   static bool _raw_copy_from_guest_buf_offset(void *dst,
>                                               const struct dmop_args *args,
>                                               unsigned int buf_idx,
> @@ -338,148 +331,20 @@ static int inject_event(struct domain *d,
>       return 0;
>   }
>   
> -static int dm_op(const struct dmop_args *op_args)
> +int arch_dm_op(struct xen_dm_op *op, struct domain *d,
> +               const struct dmop_args *op_args, bool *const_op)
>   {
> -    struct domain *d;
> -    struct xen_dm_op op;
> -    bool const_op = true;
>       long rc;
> -    size_t offset;
> -
> -    static const uint8_t op_size[] = {
> -        [XEN_DMOP_create_ioreq_server]              = sizeof(struct xen_dm_op_create_ioreq_server),
> -        [XEN_DMOP_get_ioreq_server_info]            = sizeof(struct xen_dm_op_get_ioreq_server_info),
> -        [XEN_DMOP_map_io_range_to_ioreq_server]     = sizeof(struct xen_dm_op_ioreq_server_range),
> -        [XEN_DMOP_unmap_io_range_from_ioreq_server] = sizeof(struct xen_dm_op_ioreq_server_range),
> -        [XEN_DMOP_set_ioreq_server_state]           = sizeof(struct xen_dm_op_set_ioreq_server_state),
> -        [XEN_DMOP_destroy_ioreq_server]             = sizeof(struct xen_dm_op_destroy_ioreq_server),
> -        [XEN_DMOP_track_dirty_vram]                 = sizeof(struct xen_dm_op_track_dirty_vram),
> -        [XEN_DMOP_set_pci_intx_level]               = sizeof(struct xen_dm_op_set_pci_intx_level),
> -        [XEN_DMOP_set_isa_irq_level]                = sizeof(struct xen_dm_op_set_isa_irq_level),
> -        [XEN_DMOP_set_pci_link_route]               = sizeof(struct xen_dm_op_set_pci_link_route),
> -        [XEN_DMOP_modified_memory]                  = sizeof(struct xen_dm_op_modified_memory),
> -        [XEN_DMOP_set_mem_type]                     = sizeof(struct xen_dm_op_set_mem_type),
> -        [XEN_DMOP_inject_event]                     = sizeof(struct xen_dm_op_inject_event),
> -        [XEN_DMOP_inject_msi]                       = sizeof(struct xen_dm_op_inject_msi),
> -        [XEN_DMOP_map_mem_type_to_ioreq_server]     = sizeof(struct xen_dm_op_map_mem_type_to_ioreq_server),
> -        [XEN_DMOP_remote_shutdown]                  = sizeof(struct xen_dm_op_remote_shutdown),
> -        [XEN_DMOP_relocate_memory]                  = sizeof(struct xen_dm_op_relocate_memory),
> -        [XEN_DMOP_pin_memory_cacheattr]             = sizeof(struct xen_dm_op_pin_memory_cacheattr),
> -    };
> -
> -    rc = rcu_lock_remote_domain_by_id(op_args->domid, &d);
> -    if ( rc )
> -        return rc;
> -
> -    if ( !is_hvm_domain(d) )
> -        goto out;
> -
> -    rc = xsm_dm_op(XSM_DM_PRIV, d);
> -    if ( rc )
> -        goto out;
> -
> -    offset = offsetof(struct xen_dm_op, u);
> -
> -    rc = -EFAULT;
> -    if ( op_args->buf[0].size < offset )
> -        goto out;
> -
> -    if ( copy_from_guest_offset((void *)&op, op_args->buf[0].h, 0, offset) )
> -        goto out;
> -
> -    if ( op.op >= ARRAY_SIZE(op_size) )
> -    {
> -        rc = -EOPNOTSUPP;
> -        goto out;
> -    }
> -
> -    op.op = array_index_nospec(op.op, ARRAY_SIZE(op_size));
> -
> -    if ( op_args->buf[0].size < offset + op_size[op.op] )
> -        goto out;
> -
> -    if ( copy_from_guest_offset((void *)&op.u, op_args->buf[0].h, offset,
> -                                op_size[op.op]) )
> -        goto out;
> -
> -    rc = -EINVAL;
> -    if ( op.pad )
> -        goto out;
> -
> -    switch ( op.op )
> -    {
> -    case XEN_DMOP_create_ioreq_server:
> -    {
> -        struct xen_dm_op_create_ioreq_server *data =
> -            &op.u.create_ioreq_server;
> -
> -        const_op = false;
> -
> -        rc = -EINVAL;
> -        if ( data->pad[0] || data->pad[1] || data->pad[2] )
> -            break;
> -
> -        rc = hvm_create_ioreq_server(d, data->handle_bufioreq,
> -                                     &data->id);
> -        break;
> -    }
>   
> -    case XEN_DMOP_get_ioreq_server_info:
> +    switch ( op->op )
>       {
> -        struct xen_dm_op_get_ioreq_server_info *data =
> -            &op.u.get_ioreq_server_info;
> -        const uint16_t valid_flags = XEN_DMOP_no_gfns;
> -
> -        const_op = false;
> -
> -        rc = -EINVAL;
> -        if ( data->flags & ~valid_flags )
> -            break;
> -
> -        rc = hvm_get_ioreq_server_info(d, data->id,
> -                                       (data->flags & XEN_DMOP_no_gfns) ?
> -                                       NULL : &data->ioreq_gfn,
> -                                       (data->flags & XEN_DMOP_no_gfns) ?
> -                                       NULL : &data->bufioreq_gfn,
> -                                       &data->bufioreq_port);
> -        break;
> -    }
> -
> -    case XEN_DMOP_map_io_range_to_ioreq_server:
> -    {
> -        const struct xen_dm_op_ioreq_server_range *data =
> -            &op.u.map_io_range_to_ioreq_server;
> -
> -        rc = -EINVAL;
> -        if ( data->pad )
> -            break;
> -
> -        rc = hvm_map_io_range_to_ioreq_server(d, data->id, data->type,
> -                                              data->start, data->end);
> -        break;
> -    }
> -
> -    case XEN_DMOP_unmap_io_range_from_ioreq_server:
> -    {
> -        const struct xen_dm_op_ioreq_server_range *data =
> -            &op.u.unmap_io_range_from_ioreq_server;
> -
> -        rc = -EINVAL;
> -        if ( data->pad )
> -            break;
> -
> -        rc = hvm_unmap_io_range_from_ioreq_server(d, data->id, data->type,
> -                                                  data->start, data->end);
> -        break;
> -    }
> -
>       case XEN_DMOP_map_mem_type_to_ioreq_server:
>       {
>           struct xen_dm_op_map_mem_type_to_ioreq_server *data =
> -            &op.u.map_mem_type_to_ioreq_server;
> +            &op->u.map_mem_type_to_ioreq_server;
>           unsigned long first_gfn = data->opaque;
>   
> -        const_op = false;
> +        *const_op = false;
>   
>           rc = -EOPNOTSUPP;
>           if ( !hap_enabled(d) )
> @@ -523,36 +388,10 @@ static int dm_op(const struct dmop_args *op_args)
>           break;
>       }
>   
> -    case XEN_DMOP_set_ioreq_server_state:
> -    {
> -        const struct xen_dm_op_set_ioreq_server_state *data =
> -            &op.u.set_ioreq_server_state;
> -
> -        rc = -EINVAL;
> -        if ( data->pad )
> -            break;
> -
> -        rc = hvm_set_ioreq_server_state(d, data->id, !!data->enabled);
> -        break;
> -    }
> -
> -    case XEN_DMOP_destroy_ioreq_server:
> -    {
> -        const struct xen_dm_op_destroy_ioreq_server *data =
> -            &op.u.destroy_ioreq_server;
> -
> -        rc = -EINVAL;
> -        if ( data->pad )
> -            break;
> -
> -        rc = hvm_destroy_ioreq_server(d, data->id);
> -        break;
> -    }
> -
>       case XEN_DMOP_track_dirty_vram:
>       {
>           const struct xen_dm_op_track_dirty_vram *data =
> -            &op.u.track_dirty_vram;
> +            &op->u.track_dirty_vram;
>   
>           rc = -EINVAL;
>           if ( data->pad )
> @@ -568,7 +407,7 @@ static int dm_op(const struct dmop_args *op_args)
>       case XEN_DMOP_set_pci_intx_level:
>       {
>           const struct xen_dm_op_set_pci_intx_level *data =
> -            &op.u.set_pci_intx_level;
> +            &op->u.set_pci_intx_level;
>   
>           rc = set_pci_intx_level(d, data->domain, data->bus,
>                                   data->device, data->intx,
> @@ -579,7 +418,7 @@ static int dm_op(const struct dmop_args *op_args)
>       case XEN_DMOP_set_isa_irq_level:
>       {
>           const struct xen_dm_op_set_isa_irq_level *data =
> -            &op.u.set_isa_irq_level;
> +            &op->u.set_isa_irq_level;
>   
>           rc = set_isa_irq_level(d, data->isa_irq, data->level);
>           break;
> @@ -588,7 +427,7 @@ static int dm_op(const struct dmop_args *op_args)
>       case XEN_DMOP_set_pci_link_route:
>       {
>           const struct xen_dm_op_set_pci_link_route *data =
> -            &op.u.set_pci_link_route;
> +            &op->u.set_pci_link_route;
>   
>           rc = hvm_set_pci_link_route(d, data->link, data->isa_irq);
>           break;
> @@ -597,19 +436,19 @@ static int dm_op(const struct dmop_args *op_args)
>       case XEN_DMOP_modified_memory:
>       {
>           struct xen_dm_op_modified_memory *data =
> -            &op.u.modified_memory;
> +            &op->u.modified_memory;
>   
>           rc = modified_memory(d, op_args, data);
> -        const_op = !rc;
> +        *const_op = !rc;
>           break;
>       }
>   
>       case XEN_DMOP_set_mem_type:
>       {
>           struct xen_dm_op_set_mem_type *data =
> -            &op.u.set_mem_type;
> +            &op->u.set_mem_type;
>   
> -        const_op = false;
> +        *const_op = false;
>   
>           rc = -EINVAL;
>           if ( data->pad )
> @@ -622,7 +461,7 @@ static int dm_op(const struct dmop_args *op_args)
>       case XEN_DMOP_inject_event:
>       {
>           const struct xen_dm_op_inject_event *data =
> -            &op.u.inject_event;
> +            &op->u.inject_event;
>   
>           rc = -EINVAL;
>           if ( data->pad0 || data->pad1 )
> @@ -635,7 +474,7 @@ static int dm_op(const struct dmop_args *op_args)
>       case XEN_DMOP_inject_msi:
>       {
>           const struct xen_dm_op_inject_msi *data =
> -            &op.u.inject_msi;
> +            &op->u.inject_msi;
>   
>           rc = -EINVAL;
>           if ( data->pad )
> @@ -648,7 +487,7 @@ static int dm_op(const struct dmop_args *op_args)
>       case XEN_DMOP_remote_shutdown:
>       {
>           const struct xen_dm_op_remote_shutdown *data =
> -            &op.u.remote_shutdown;
> +            &op->u.remote_shutdown;
>   
>           domain_shutdown(d, data->reason);
>           rc = 0;
> @@ -657,7 +496,7 @@ static int dm_op(const struct dmop_args *op_args)
>   
>       case XEN_DMOP_relocate_memory:
>       {
> -        struct xen_dm_op_relocate_memory *data = &op.u.relocate_memory;
> +        struct xen_dm_op_relocate_memory *data = &op->u.relocate_memory;
>           struct xen_add_to_physmap xatp = {
>               .domid = op_args->domid,
>               .size = data->size,
> @@ -680,7 +519,7 @@ static int dm_op(const struct dmop_args *op_args)
>               data->size -= rc;
>               data->src_gfn += rc;
>               data->dst_gfn += rc;
> -            const_op = false;
> +            *const_op = false;
>               rc = -ERESTART;
>           }
>           break;
> @@ -689,7 +528,7 @@ static int dm_op(const struct dmop_args *op_args)
>       case XEN_DMOP_pin_memory_cacheattr:
>       {
>           const struct xen_dm_op_pin_memory_cacheattr *data =
> -            &op.u.pin_memory_cacheattr;
> +            &op->u.pin_memory_cacheattr;
>   
>           if ( data->pad )
>           {
> @@ -707,94 +546,6 @@ static int dm_op(const struct dmop_args *op_args)
>           break;
>       }
>   
> -    if ( (!rc || rc == -ERESTART) &&
> -         !const_op && copy_to_guest_offset(op_args->buf[0].h, offset,
> -                                           (void *)&op.u, op_size[op.op]) )
> -        rc = -EFAULT;
> -
> - out:
> -    rcu_unlock_domain(d);
> -
> -    return rc;
> -}
> -
> -CHECK_dm_op_create_ioreq_server;
> -CHECK_dm_op_get_ioreq_server_info;
> -CHECK_dm_op_ioreq_server_range;
> -CHECK_dm_op_set_ioreq_server_state;
> -CHECK_dm_op_destroy_ioreq_server;
> -CHECK_dm_op_track_dirty_vram;
> -CHECK_dm_op_set_pci_intx_level;
> -CHECK_dm_op_set_isa_irq_level;
> -CHECK_dm_op_set_pci_link_route;
> -CHECK_dm_op_modified_memory;
> -CHECK_dm_op_set_mem_type;
> -CHECK_dm_op_inject_event;
> -CHECK_dm_op_inject_msi;
> -CHECK_dm_op_remote_shutdown;
> -CHECK_dm_op_relocate_memory;
> -CHECK_dm_op_pin_memory_cacheattr;
> -
> -int compat_dm_op(domid_t domid,
> -                 unsigned int nr_bufs,
> -                 XEN_GUEST_HANDLE_PARAM(void) bufs)
> -{
> -    struct dmop_args args;
> -    unsigned int i;
> -    int rc;
> -
> -    if ( nr_bufs > ARRAY_SIZE(args.buf) )
> -        return -E2BIG;
> -
> -    args.domid = domid;
> -    args.nr_bufs = array_index_nospec(nr_bufs, ARRAY_SIZE(args.buf) + 1);
> -
> -    for ( i = 0; i < args.nr_bufs; i++ )
> -    {
> -        struct compat_dm_op_buf cmp;
> -
> -        if ( copy_from_guest_offset(&cmp, bufs, i, 1) )
> -            return -EFAULT;
> -
> -#define XLAT_dm_op_buf_HNDL_h(_d_, _s_) \
> -        guest_from_compat_handle((_d_)->h, (_s_)->h)
> -
> -        XLAT_dm_op_buf(&args.buf[i], &cmp);
> -
> -#undef XLAT_dm_op_buf_HNDL_h
> -    }
> -
> -    rc = dm_op(&args);
> -
> -    if ( rc == -ERESTART )
> -        rc = hypercall_create_continuation(__HYPERVISOR_dm_op, "iih",
> -                                           domid, nr_bufs, bufs);
> -
> -    return rc;
> -}
> -
> -long do_dm_op(domid_t domid,
> -              unsigned int nr_bufs,
> -              XEN_GUEST_HANDLE_PARAM(xen_dm_op_buf_t) bufs)
> -{
> -    struct dmop_args args;
> -    int rc;
> -
> -    if ( nr_bufs > ARRAY_SIZE(args.buf) )
> -        return -E2BIG;
> -
> -    args.domid = domid;
> -    args.nr_bufs = array_index_nospec(nr_bufs, ARRAY_SIZE(args.buf) + 1);
> -
> -    if ( copy_from_guest_offset(&args.buf[0], bufs, 0, args.nr_bufs) )
> -        return -EFAULT;
> -
> -    rc = dm_op(&args);
> -
> -    if ( rc == -ERESTART )
> -        rc = hypercall_create_continuation(__HYPERVISOR_dm_op, "iih",
> -                                           domid, nr_bufs, bufs);
> -
>       return rc;
>   }
>   
> diff --git a/xen/common/Makefile b/xen/common/Makefile
> index 8df2b6e..5cf7208 100644
> --- a/xen/common/Makefile
> +++ b/xen/common/Makefile
> @@ -6,6 +6,7 @@ obj-$(CONFIG_CORE_PARKING) += core_parking.o
>   obj-y += cpu.o
>   obj-$(CONFIG_DEBUG_TRACE) += debugtrace.o
>   obj-$(CONFIG_HAS_DEVICE_TREE) += device_tree.o
> +obj-$(CONFIG_IOREQ_SERVER) += dm.o
>   obj-y += domctl.o
>   obj-y += domain.o
>   obj-y += event_2l.o
> diff --git a/xen/common/dm.c b/xen/common/dm.c
> new file mode 100644
> index 0000000..060731d
> --- /dev/null
> +++ b/xen/common/dm.c
> @@ -0,0 +1,287 @@
> +/*
> + * Copyright (c) 2016 Citrix Systems Inc.
> + * Copyright (c) 2019 Arm ltd.
> + *
> + * This program is free software; you can redistribute it and/or modify it
> + * under the terms and conditions of the GNU General Public License,
> + * version 2, as published by the Free Software Foundation.
> + *
> + * This program is distributed in the hope it will be useful, but WITHOUT
> + * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
> + * FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License for
> + * more details.
> + *
> + * You should have received a copy of the GNU General Public License along with
> + * this program; If not, see <http://www.gnu.org/licenses/>.
> + */
> +
> +#include <xen/guest_access.h>
> +#include <xen/hypercall.h>
> +#include <xen/ioreq.h>
> +#include <xen/nospec.h>
> +
> +static int dm_op(const struct dmop_args *op_args)
> +{
> +    struct domain *d;
> +    struct xen_dm_op op;
> +    long rc;
> +    bool const_op = true;
> +    const size_t offset = offsetof(struct xen_dm_op, u);
> +
> +    static const uint8_t op_size[] = {
> +        [XEN_DMOP_create_ioreq_server]              = sizeof(struct xen_dm_op_create_ioreq_server),
> +        [XEN_DMOP_get_ioreq_server_info]            = sizeof(struct xen_dm_op_get_ioreq_server_info),
> +        [XEN_DMOP_map_io_range_to_ioreq_server]     = sizeof(struct xen_dm_op_ioreq_server_range),
> +        [XEN_DMOP_unmap_io_range_from_ioreq_server] = sizeof(struct xen_dm_op_ioreq_server_range),
> +        [XEN_DMOP_set_ioreq_server_state]           = sizeof(struct xen_dm_op_set_ioreq_server_state),
> +        [XEN_DMOP_destroy_ioreq_server]             = sizeof(struct xen_dm_op_destroy_ioreq_server),
> +        [XEN_DMOP_track_dirty_vram]                 = sizeof(struct xen_dm_op_track_dirty_vram),
> +        [XEN_DMOP_set_pci_intx_level]               = sizeof(struct xen_dm_op_set_pci_intx_level),
> +        [XEN_DMOP_set_isa_irq_level]                = sizeof(struct xen_dm_op_set_isa_irq_level),
> +        [XEN_DMOP_set_pci_link_route]               = sizeof(struct xen_dm_op_set_pci_link_route),
> +        [XEN_DMOP_modified_memory]                  = sizeof(struct xen_dm_op_modified_memory),
> +        [XEN_DMOP_set_mem_type]                     = sizeof(struct xen_dm_op_set_mem_type),
> +        [XEN_DMOP_inject_event]                     = sizeof(struct xen_dm_op_inject_event),
> +        [XEN_DMOP_inject_msi]                       = sizeof(struct xen_dm_op_inject_msi),
> +        [XEN_DMOP_map_mem_type_to_ioreq_server]     = sizeof(struct xen_dm_op_map_mem_type_to_ioreq_server),
> +        [XEN_DMOP_remote_shutdown]                  = sizeof(struct xen_dm_op_remote_shutdown),
> +        [XEN_DMOP_relocate_memory]                  = sizeof(struct xen_dm_op_relocate_memory),
> +        [XEN_DMOP_pin_memory_cacheattr]             = sizeof(struct xen_dm_op_pin_memory_cacheattr),
> +    };
> +
> +    rc = rcu_lock_remote_domain_by_id(op_args->domid, &d);
> +    if ( rc )
> +        return rc;
> +
> +    if ( !is_hvm_domain(d) )
> +        goto out;
> +
> +    rc = xsm_dm_op(XSM_DM_PRIV, d);
> +    if ( rc )
> +        goto out;
> +
> +    rc = -EFAULT;
> +    if ( op_args->buf[0].size < offset )
> +        goto out;
> +
> +    if ( copy_from_guest_offset((void *)&op, op_args->buf[0].h, 0, offset) )
> +        goto out;
> +
> +    if ( op.op >= ARRAY_SIZE(op_size) )
> +    {
> +        rc = -EOPNOTSUPP;
> +        goto out;
> +    }
> +
> +    op.op = array_index_nospec(op.op, ARRAY_SIZE(op_size));
> +
> +    if ( op_args->buf[0].size < offset + op_size[op.op] )
> +        goto out;
> +
> +    if ( copy_from_guest_offset((void *)&op.u, op_args->buf[0].h, offset,
> +                                op_size[op.op]) )
> +        goto out;
> +
> +    rc = -EINVAL;
> +    if ( op.pad )
> +        goto out;
> +
> +    switch ( op.op )
> +    {
> +    case XEN_DMOP_create_ioreq_server:
> +    {
> +        struct xen_dm_op_create_ioreq_server *data =
> +            &op.u.create_ioreq_server;
> +
> +        const_op = false;
> +
> +        rc = -EINVAL;
> +        if ( data->pad[0] || data->pad[1] || data->pad[2] )
> +            break;
> +
> +        rc = hvm_create_ioreq_server(d, data->handle_bufioreq,
> +                                     &data->id);
> +        break;
> +    }
> +
> +    case XEN_DMOP_get_ioreq_server_info:
> +    {
> +        struct xen_dm_op_get_ioreq_server_info *data =
> +            &op.u.get_ioreq_server_info;
> +        const uint16_t valid_flags = XEN_DMOP_no_gfns;
> +
> +        const_op = false;
> +
> +        rc = -EINVAL;
> +        if ( data->flags & ~valid_flags )
> +            break;
> +
> +        rc = hvm_get_ioreq_server_info(d, data->id,
> +                                       (data->flags & XEN_DMOP_no_gfns) ?
> +                                       NULL : (unsigned long *)&data->ioreq_gfn,
> +                                       (data->flags & XEN_DMOP_no_gfns) ?
> +                                       NULL : (unsigned long *)&data->bufioreq_gfn,
> +                                       &data->bufioreq_port);
> +        break;
> +    }
> +
> +    case XEN_DMOP_map_io_range_to_ioreq_server:
> +    {
> +        const struct xen_dm_op_ioreq_server_range *data =
> +            &op.u.map_io_range_to_ioreq_server;
> +
> +        rc = -EINVAL;
> +        if ( data->pad )
> +            break;
> +
> +        rc = hvm_map_io_range_to_ioreq_server(d, data->id, data->type,
> +                                              data->start, data->end);
> +        break;
> +    }
> +
> +    case XEN_DMOP_unmap_io_range_from_ioreq_server:
> +    {
> +        const struct xen_dm_op_ioreq_server_range *data =
> +            &op.u.unmap_io_range_from_ioreq_server;
> +
> +        rc = -EINVAL;
> +        if ( data->pad )
> +            break;
> +
> +        rc = hvm_unmap_io_range_from_ioreq_server(d, data->id, data->type,
> +                                                  data->start, data->end);
> +        break;
> +    }
> +
> +    case XEN_DMOP_set_ioreq_server_state:
> +    {
> +        const struct xen_dm_op_set_ioreq_server_state *data =
> +            &op.u.set_ioreq_server_state;
> +
> +        rc = -EINVAL;
> +        if ( data->pad )
> +            break;
> +
> +        rc = hvm_set_ioreq_server_state(d, data->id, !!data->enabled);
> +        break;
> +    }
> +
> +    case XEN_DMOP_destroy_ioreq_server:
> +    {
> +        const struct xen_dm_op_destroy_ioreq_server *data =
> +            &op.u.destroy_ioreq_server;
> +
> +        rc = -EINVAL;
> +        if ( data->pad )
> +            break;
> +
> +        rc = hvm_destroy_ioreq_server(d, data->id);
> +        break;
> +    }
> +
> +    default:
> +        rc = arch_dm_op(&op, d, op_args, &const_op);
> +    }
> +
> +    if ( (!rc || rc == -ERESTART) &&
> +         !const_op && copy_to_guest_offset(op_args->buf[0].h, offset,
> +                                           (void *)&op.u, op_size[op.op]) )
> +        rc = -EFAULT;
> +
> + out:
> +    rcu_unlock_domain(d);
> +
> +    return rc;
> +}
> +
> +#ifdef CONFIG_COMPAT
> +CHECK_dm_op_create_ioreq_server;
> +CHECK_dm_op_get_ioreq_server_info;
> +CHECK_dm_op_ioreq_server_range;
> +CHECK_dm_op_set_ioreq_server_state;
> +CHECK_dm_op_destroy_ioreq_server;
> +CHECK_dm_op_track_dirty_vram;
> +CHECK_dm_op_set_pci_intx_level;
> +CHECK_dm_op_set_isa_irq_level;
> +CHECK_dm_op_set_pci_link_route;
> +CHECK_dm_op_modified_memory;
> +CHECK_dm_op_set_mem_type;
> +CHECK_dm_op_inject_event;
> +CHECK_dm_op_inject_msi;
> +CHECK_dm_op_remote_shutdown;
> +CHECK_dm_op_relocate_memory;
> +CHECK_dm_op_pin_memory_cacheattr;
> +
> +int compat_dm_op(domid_t domid,
> +                 unsigned int nr_bufs,
> +                 XEN_GUEST_HANDLE_PARAM(void) bufs)
> +{
> +    struct dmop_args args;
> +    unsigned int i;
> +    int rc;
> +
> +    if ( nr_bufs > ARRAY_SIZE(args.buf) )
> +        return -E2BIG;
> +
> +    args.domid = domid;
> +    args.nr_bufs = array_index_nospec(nr_bufs, ARRAY_SIZE(args.buf) + 1);
> +
> +    for ( i = 0; i < args.nr_bufs; i++ )
> +    {
> +        struct compat_dm_op_buf cmp;
> +
> +        if ( copy_from_guest_offset(&cmp, bufs, i, 1) )
> +            return -EFAULT;
> +
> +#define XLAT_dm_op_buf_HNDL_h(_d_, _s_) \
> +        guest_from_compat_handle((_d_)->h, (_s_)->h)
> +
> +        XLAT_dm_op_buf(&args.buf[i], &cmp);
> +
> +#undef XLAT_dm_op_buf_HNDL_h
> +    }
> +
> +    rc = dm_op(&args);
> +
> +    if ( rc == -ERESTART )
> +        rc = hypercall_create_continuation(__HYPERVISOR_dm_op, "iih",
> +                                           domid, nr_bufs, bufs);
> +
> +    return rc;
> +}
> +#endif
> +
> +long do_dm_op(domid_t domid,
> +              unsigned int nr_bufs,
> +              XEN_GUEST_HANDLE_PARAM(xen_dm_op_buf_t) bufs)
> +{
> +    struct dmop_args args;
> +    int rc;
> +
> +    if ( nr_bufs > ARRAY_SIZE(args.buf) )
> +        return -E2BIG;
> +
> +    args.domid = domid;
> +    args.nr_bufs = array_index_nospec(nr_bufs, ARRAY_SIZE(args.buf) + 1);
> +
> +    if ( copy_from_guest_offset(&args.buf[0], bufs, 0, args.nr_bufs) )
> +        return -EFAULT;
> +
> +    rc = dm_op(&args);
> +
> +    if ( rc == -ERESTART )
> +        rc = hypercall_create_continuation(__HYPERVISOR_dm_op, "iih",
> +                                           domid, nr_bufs, bufs);
> +
> +    return rc;
> +}
> +
> +/*
> + * Local variables:
> + * mode: C
> + * c-file-style: "BSD"
> + * c-basic-offset: 4
> + * tab-width: 4
> + * indent-tabs-mode: nil
> + * End:
> + */
> diff --git a/xen/include/xen/hypercall.h b/xen/include/xen/hypercall.h
> index 655acc7..19f509f 100644
> --- a/xen/include/xen/hypercall.h
> +++ b/xen/include/xen/hypercall.h
> @@ -150,6 +150,18 @@ do_dm_op(
>       unsigned int nr_bufs,
>       XEN_GUEST_HANDLE_PARAM(xen_dm_op_buf_t) bufs);
>   
> +struct dmop_args {
> +    domid_t domid;
> +    unsigned int nr_bufs;
> +    /* Reserve enough buf elements for all current hypercalls. */
> +    struct xen_dm_op_buf buf[2];
> +};
> +
> +int arch_dm_op(struct xen_dm_op *op,
> +               struct domain *d,
> +               const struct dmop_args *op_args,
> +               bool *const_op);
> +
>   #ifdef CONFIG_HYPFS
>   extern long
>   do_hypfs_op(
> diff --git a/xen/include/xsm/dummy.h b/xen/include/xsm/dummy.h
> index 5f6f842..c0813c0 100644
> --- a/xen/include/xsm/dummy.h
> +++ b/xen/include/xsm/dummy.h
> @@ -723,14 +723,14 @@ static XSM_INLINE int xsm_pmu_op (XSM_DEFAULT_ARG struct domain *d, unsigned int
>       }
>   }
>   
> +#endif /* CONFIG_X86 */
> +
>   static XSM_INLINE int xsm_dm_op(XSM_DEFAULT_ARG struct domain *d)
>   {
>       XSM_ASSERT_ACTION(XSM_DM_PRIV);
>       return xsm_default_action(action, current->domain, d);
>   }
>   
> -#endif /* CONFIG_X86 */
> -
>   #ifdef CONFIG_ARGO
>   static XSM_INLINE int xsm_argo_enable(const struct domain *d)
>   {
> diff --git a/xen/include/xsm/xsm.h b/xen/include/xsm/xsm.h
> index a80bcf3..2a9b39d 100644
> --- a/xen/include/xsm/xsm.h
> +++ b/xen/include/xsm/xsm.h
> @@ -177,8 +177,8 @@ struct xsm_operations {
>       int (*ioport_permission) (struct domain *d, uint32_t s, uint32_t e, uint8_t allow);
>       int (*ioport_mapping) (struct domain *d, uint32_t s, uint32_t e, uint8_t allow);
>       int (*pmu_op) (struct domain *d, unsigned int op);
> -    int (*dm_op) (struct domain *d);
>   #endif
> +    int (*dm_op) (struct domain *d);
>       int (*xen_version) (uint32_t cmd);
>       int (*domain_resource_map) (struct domain *d);
>   #ifdef CONFIG_ARGO
> @@ -688,13 +688,13 @@ static inline int xsm_pmu_op (xsm_default_t def, struct domain *d, unsigned int
>       return xsm_ops->pmu_op(d, op);
>   }
>   
> +#endif /* CONFIG_X86 */
> +
>   static inline int xsm_dm_op(xsm_default_t def, struct domain *d)
>   {
>       return xsm_ops->dm_op(d);
>   }
>   
> -#endif /* CONFIG_X86 */
> -
>   static inline int xsm_xen_version (xsm_default_t def, uint32_t op)
>   {
>       return xsm_ops->xen_version(op);
> diff --git a/xen/xsm/dummy.c b/xen/xsm/dummy.c
> index d4cce68..e3afd06 100644
> --- a/xen/xsm/dummy.c
> +++ b/xen/xsm/dummy.c
> @@ -148,8 +148,8 @@ void __init xsm_fixup_ops (struct xsm_operations *ops)
>       set_to_dummy_if_null(ops, ioport_permission);
>       set_to_dummy_if_null(ops, ioport_mapping);
>       set_to_dummy_if_null(ops, pmu_op);
> -    set_to_dummy_if_null(ops, dm_op);
>   #endif
> +    set_to_dummy_if_null(ops, dm_op);
>       set_to_dummy_if_null(ops, xen_version);
>       set_to_dummy_if_null(ops, domain_resource_map);
>   #ifdef CONFIG_ARGO
> diff --git a/xen/xsm/flask/hooks.c b/xen/xsm/flask/hooks.c
> index a314bf8..645192a 100644
> --- a/xen/xsm/flask/hooks.c
> +++ b/xen/xsm/flask/hooks.c
> @@ -1662,14 +1662,13 @@ static int flask_pmu_op (struct domain *d, unsigned int op)
>           return -EPERM;
>       }
>   }
> +#endif /* CONFIG_X86 */
>   
>   static int flask_dm_op(struct domain *d)
>   {
>       return current_has_perm(d, SECCLASS_HVM, HVM__DM);
>   }
>   
> -#endif /* CONFIG_X86 */
> -
>   static int flask_xen_version (uint32_t op)
>   {
>       u32 dsid = domain_sid(current->domain);
> @@ -1872,8 +1871,8 @@ static struct xsm_operations flask_ops = {
>       .ioport_permission = flask_ioport_permission,
>       .ioport_mapping = flask_ioport_mapping,
>       .pmu_op = flask_pmu_op,
> -    .dm_op = flask_dm_op,
>   #endif
> +    .dm_op = flask_dm_op,
>       .xen_version = flask_xen_version,
>       .domain_resource_map = flask_domain_resource_map,
>   #ifdef CONFIG_ARGO
> 

-- 
Julien Grall



* Re: [PATCH V1 09/16] arm/ioreq: Introduce arch specific bits for IOREQ/DM features
  2020-09-10 20:22 ` [PATCH V1 09/16] arm/ioreq: Introduce arch specific bits for IOREQ/DM features Oleksandr Tyshchenko
  2020-09-11 10:14   ` Oleksandr
  2020-09-16  7:51   ` Jan Beulich
@ 2020-09-23 18:03   ` Julien Grall
  2020-09-23 20:16     ` Oleksandr
  2 siblings, 1 reply; 111+ messages in thread
From: Julien Grall @ 2020-09-23 18:03 UTC (permalink / raw)
  To: Oleksandr Tyshchenko, xen-devel
  Cc: Oleksandr Tyshchenko, Stefano Stabellini, Volodymyr Babchuk,
	Julien Grall

Hi,

On 10/09/2020 21:22, Oleksandr Tyshchenko wrote:
> From: Oleksandr Tyshchenko <oleksandr_tyshchenko@epam.com>

I believe I am the original author of this code...
> This patch adds basic IOREQ/DM support on Arm. The subsequent
> patches will improve functionality, add remaining bits as well as
> address several TODOs.

I find it a bit weird to add code with TODOs that are handled in the 
same series. Can't we just split this patch into smaller ones where 
everything is addressed from the start?

> 
> Please note, the "PIO handling" TODO is expected to be left unaddressed
> for the current series. It is not a big issue for now while Xen
> doesn't have support for vPCI on Arm. On Arm64 PIOs are only used
> for PCI IO BARs and we would probably want to expose them to the
> emulator as PIO accesses to make a DM completely arch-agnostic. So
> "PIO handling" should be implemented when we add support for vPCI.
> 
> Please note, at the moment the build on Arm32 is broken (see cmpxchg
> usage in hvm_send_buffered_ioreq()) due to the lack of cmpxchg_64
> support on Arm32. There is a patch on review to address this issue:
> https://patchwork.kernel.org/patch/11715559/

This has been committed.

> 
> Signed-off-by: Julien Grall <julien.grall@arm.com>
> Signed-off-by: Oleksandr Tyshchenko <oleksandr_tyshchenko@epam.com>
>
> ---
> ---
>   xen/arch/arm/Kconfig            |   1 +
>   xen/arch/arm/Makefile           |   2 +
>   xen/arch/arm/dm.c               |  33 ++++++++++
>   xen/arch/arm/domain.c           |   9 +++
>   xen/arch/arm/io.c               |  11 +++-
>   xen/arch/arm/ioreq.c            | 142 ++++++++++++++++++++++++++++++++++++++++
>   xen/arch/arm/traps.c            |  32 +++++++--
>   xen/include/asm-arm/domain.h    |  46 +++++++++++++
>   xen/include/asm-arm/hvm/ioreq.h | 108 ++++++++++++++++++++++++++++++
>   xen/include/asm-arm/mmio.h      |   1 +
>   xen/include/asm-arm/paging.h    |   4 ++
>   11 files changed, 384 insertions(+), 5 deletions(-)
>   create mode 100644 xen/arch/arm/dm.c
>   create mode 100644 xen/arch/arm/ioreq.c
>   create mode 100644 xen/include/asm-arm/hvm/ioreq.h
> 
> diff --git a/xen/arch/arm/Kconfig b/xen/arch/arm/Kconfig
> index 2777388..8264cd6 100644
> --- a/xen/arch/arm/Kconfig
> +++ b/xen/arch/arm/Kconfig
> @@ -21,6 +21,7 @@ config ARM
>   	select HAS_PASSTHROUGH
>   	select HAS_PDX
>   	select IOMMU_FORCE_PT_SHARE
> +	select IOREQ_SERVER

I would prefer if IOREQ_SERVER is not included in the default build of 
Xen. This is a fairly big feature that requires a lot more testing.
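
Maybe something along these lines (an untested sketch only - the exact
prompt wording and where the option lives would need deciding), so that
a user has to explicitly opt in:

    # Sketch only: prompt text and placement are illustrative.
    config IOREQ_SERVER
            bool "IOREQ support (EXPERIMENTAL)"
            default n
            help
              Enable support for running device emulators (e.g. virtio
              backends) outside of the hypervisor.

together with dropping the "select IOREQ_SERVER" above.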

>   
>   config ARCH_DEFCONFIG
>   	string
> diff --git a/xen/arch/arm/Makefile b/xen/arch/arm/Makefile
> index 7e82b21..617fa3e 100644
> --- a/xen/arch/arm/Makefile
> +++ b/xen/arch/arm/Makefile
> @@ -13,6 +13,7 @@ obj-y += cpuerrata.o
>   obj-y += cpufeature.o
>   obj-y += decode.o
>   obj-y += device.o
> +obj-$(CONFIG_IOREQ_SERVER) += dm.o
>   obj-y += domain.o
>   obj-y += domain_build.init.o
>   obj-y += domctl.o
> @@ -27,6 +28,7 @@ obj-y += guest_atomics.o
>   obj-y += guest_walk.o
>   obj-y += hvm.o
>   obj-y += io.o
> +obj-$(CONFIG_IOREQ_SERVER) += ioreq.o
>   obj-y += irq.o
>   obj-y += kernel.init.o
>   obj-$(CONFIG_LIVEPATCH) += livepatch.o
> diff --git a/xen/arch/arm/dm.c b/xen/arch/arm/dm.c
> new file mode 100644
> index 0000000..eb20344
> --- /dev/null
> +++ b/xen/arch/arm/dm.c
> @@ -0,0 +1,33 @@
> +/*
> + * Copyright (c) 2019 Arm ltd.
> + *
> + * This program is free software; you can redistribute it and/or modify it
> + * under the terms and conditions of the GNU General Public License,
> + * version 2, as published by the Free Software Foundation.
> + *
> + * This program is distributed in the hope it will be useful, but WITHOUT
> + * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
> + * FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License for
> + * more details.
> + *
> + * You should have received a copy of the GNU General Public License along with
> + * this program; If not, see <http://www.gnu.org/licenses/>.
> + */
> +
> +#include <xen/hypercall.h>
> +
> +int arch_dm_op(struct xen_dm_op *op, struct domain *d,
> +               const struct dmop_args *op_args, bool *const_op)
> +{
> +    return -EOPNOTSUPP;
> +}
> +
> +/*
> + * Local variables:
> + * mode: C
> + * c-file-style: "BSD"
> + * c-basic-offset: 4
> + * tab-width: 4
> + * indent-tabs-mode: nil
> + * End:
> + */
> diff --git a/xen/arch/arm/domain.c b/xen/arch/arm/domain.c
> index 3116932..043db3f 100644
> --- a/xen/arch/arm/domain.c
> +++ b/xen/arch/arm/domain.c
> @@ -14,6 +14,7 @@
>   #include <xen/grant_table.h>
>   #include <xen/hypercall.h>
>   #include <xen/init.h>
> +#include <xen/ioreq.h>
>   #include <xen/lib.h>
>   #include <xen/livepatch.h>
>   #include <xen/sched.h>
> @@ -681,6 +682,10 @@ int arch_domain_create(struct domain *d,
>   
>       ASSERT(config != NULL);
>   
> +#ifdef CONFIG_IOREQ_SERVER
> +    hvm_ioreq_init(d);
> +#endif
> +
>       /* p2m_init relies on some value initialized by the IOMMU subsystem */
>       if ( (rc = iommu_domain_init(d, config->iommu_opts)) != 0 )
>           goto fail;
> @@ -999,6 +1004,10 @@ int domain_relinquish_resources(struct domain *d)
>           if (ret )
>               return ret;
>   
> +#ifdef CONFIG_IOREQ_SERVER
> +        hvm_destroy_all_ioreq_servers(d);
> +#endif
> +
>       PROGRESS(xen):
>           ret = relinquish_memory(d, &d->xenpage_list);
>           if ( ret )
> diff --git a/xen/arch/arm/io.c b/xen/arch/arm/io.c
> index ae7ef96..adc9de7 100644
> --- a/xen/arch/arm/io.c
> +++ b/xen/arch/arm/io.c
> @@ -16,6 +16,7 @@
>    * GNU General Public License for more details.
>    */
>   
> +#include <xen/ioreq.h>
>   #include <xen/lib.h>
>   #include <xen/spinlock.h>
>   #include <xen/sched.h>
> @@ -123,7 +124,15 @@ enum io_state try_handle_mmio(struct cpu_user_regs *regs,
>   
>       handler = find_mmio_handler(v->domain, info.gpa);
>       if ( !handler )
> -        return IO_UNHANDLED;
> +    {
> +        int rc;
> +
> +        rc = try_fwd_ioserv(regs, v, &info);
> +        if ( rc == IO_HANDLED )
> +            return handle_ioserv(regs, v);
> +
> +        return rc;
> +    }
>   
>       /* All the instructions used on emulated MMIO region should be valid */
>       if ( !dabt.valid )
> diff --git a/xen/arch/arm/ioreq.c b/xen/arch/arm/ioreq.c
> new file mode 100644
> index 0000000..e493c5b
> --- /dev/null
> +++ b/xen/arch/arm/ioreq.c
> @@ -0,0 +1,142 @@
> +/*
> + * arm/ioreq.c: hardware virtual machine I/O emulation
> + *
> + * Copyright (c) 2019 Arm ltd.
> + *
> + * This program is free software; you can redistribute it and/or modify it
> + * under the terms and conditions of the GNU General Public License,
> + * version 2, as published by the Free Software Foundation.
> + *
> + * This program is distributed in the hope it will be useful, but WITHOUT
> + * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
> + * FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License for
> + * more details.
> + *
> + * You should have received a copy of the GNU General Public License along with
> + * this program; If not, see <http://www.gnu.org/licenses/>.
> + */
> +
> +#include <xen/domain.h>
> +#include <xen/ioreq.h>
> +
> +#include <public/hvm/ioreq.h>
> +
> +#include <asm/traps.h>
> +
> +enum io_state handle_ioserv(struct cpu_user_regs *regs, struct vcpu *v)
> +{
> +    const union hsr hsr = { .bits = regs->hsr };
> +    const struct hsr_dabt dabt = hsr.dabt;
> +    /* Code is similar to handle_read */
> +    uint8_t size = (1 << dabt.size) * 8;
> +    register_t r = v->arch.hvm.hvm_io.io_req.data;
> +
> +    /* We are done with the IO */
> +    v->arch.hvm.hvm_io.io_req.state = STATE_IOREQ_NONE;
> +
> +    /* XXX: Do we need to take care of write here ? */

It doesn't look like we need to do anything for writes as they have 
already completed. Is there anything else we need to confirm?

> +    if ( dabt.write )
> +        return IO_HANDLED;
> +
> +    /*
> +     * Sign extend if required.
> +     * Note that we expect the read handler to have zeroed the bits
> +     * outside the requested access size.
> +     */
> +    if ( dabt.sign && (r & (1UL << (size - 1))) )
> +    {
> +        /*
> +         * We are relying on register_t using the same as
> +         * an unsigned long in order to keep the 32-bit assembly
> +         * code smaller.
> +         */
> +        BUILD_BUG_ON(sizeof(register_t) != sizeof(unsigned long));
> +        r |= (~0UL) << size;
> +    }
> +
> +    set_user_reg(regs, dabt.reg, r);
> +
> +    return IO_HANDLED;
> +}
> +
> +enum io_state try_fwd_ioserv(struct cpu_user_regs *regs,
> +                             struct vcpu *v, mmio_info_t *info)
> +{
> +    struct hvm_vcpu_io *vio = &v->arch.hvm.hvm_io;
> +    ioreq_t p = {
> +        .type = IOREQ_TYPE_COPY,
> +        .addr = info->gpa,
> +        .size = 1 << info->dabt.size,
> +        .count = 1,
> +        .dir = !info->dabt.write,
> +        /*
> +         * On x86, df is used by 'rep' instruction to tell the direction
> +         * to iterate (forward or backward).
> +         * On Arm, all the accesses to MMIO region will do a single
> +         * memory access. So for now, we can safely always set to 0.
> +         */
> +        .df = 0,
> +        .data = get_user_reg(regs, info->dabt.reg),
> +        .state = STATE_IOREQ_READY,
> +    };
> +    struct hvm_ioreq_server *s = NULL;
> +    enum io_state rc;
> +
> +    switch ( vio->io_req.state )
> +    {
> +    case STATE_IOREQ_NONE:
> +        break;
> +
> +    case STATE_IORESP_READY:
> +        return IO_HANDLED;
> +
> +    default:
> +        gdprintk(XENLOG_ERR, "wrong state %u\n", vio->io_req.state);
> +        return IO_ABORT;
> +    }
> +
> +    s = hvm_select_ioreq_server(v->domain, &p);
> +    if ( !s )
> +        return IO_UNHANDLED;
> +
> +    if ( !info->dabt.valid )
> +        return IO_ABORT;
> +
> +    vio->io_req = p;
> +
> +    rc = hvm_send_ioreq(s, &p, 0);
> +    if ( rc != IO_RETRY || v->domain->is_shutting_down )
> +        vio->io_req.state = STATE_IOREQ_NONE;
> +    else if ( !hvm_ioreq_needs_completion(&vio->io_req) )
> +        rc = IO_HANDLED;
> +    else
> +        vio->io_completion = HVMIO_mmio_completion;
> +
> +    return rc;
> +}
> +
> +bool ioreq_handle_complete_mmio(void)
> +{
> +    struct vcpu *v = current;
> +    struct cpu_user_regs *regs = guest_cpu_user_regs();
> +    const union hsr hsr = { .bits = regs->hsr };
> +    paddr_t addr = v->arch.hvm.hvm_io.io_req.addr;
> +
> +    if ( try_handle_mmio(regs, hsr, addr) == IO_HANDLED )
> +    {
> +        advance_pc(regs, hsr);
> +        return true;
> +    }
> +
> +    return false;
> +}
> +
> +/*
> + * Local variables:
> + * mode: C
> + * c-file-style: "BSD"
> + * c-basic-offset: 4
> + * tab-width: 4
> + * indent-tabs-mode: nil
> + * End:
> + */
> diff --git a/xen/arch/arm/traps.c b/xen/arch/arm/traps.c
> index 8f40d0e..121942c 100644
> --- a/xen/arch/arm/traps.c
> +++ b/xen/arch/arm/traps.c
> @@ -21,6 +21,7 @@
>   #include <xen/hypercall.h>
>   #include <xen/init.h>
>   #include <xen/iocap.h>
> +#include <xen/ioreq.h>
>   #include <xen/irq.h>
>   #include <xen/lib.h>
>   #include <xen/mem_access.h>
> @@ -1384,6 +1385,9 @@ static arm_hypercall_t arm_hypercall_table[] = {
>   #ifdef CONFIG_HYPFS
>       HYPERCALL(hypfs_op, 5),
>   #endif
> +#ifdef CONFIG_IOREQ_SERVER
> +    HYPERCALL(dm_op, 3),
> +#endif
>   };
>   
>   #ifndef NDEBUG
> @@ -1955,9 +1959,14 @@ static void do_trap_stage2_abort_guest(struct cpu_user_regs *regs,
>               case IO_HANDLED:
>                   advance_pc(regs, hsr);
>                   return;
> +            case IO_RETRY:
> +                /* finish later */
> +                return;
>               case IO_UNHANDLED:
>                   /* IO unhandled, try another way to handle it. */
>                   break;
> +            default:
> +                ASSERT_UNREACHABLE();
>               }
>           }
>   
> @@ -2249,12 +2258,23 @@ static void check_for_pcpu_work(void)
>    * Process pending work for the vCPU. Any call should be fast or
>    * implement preemption.
>    */
> -static void check_for_vcpu_work(void)
> +static bool check_for_vcpu_work(void)
>   {
>       struct vcpu *v = current;
>   
> +#ifdef CONFIG_IOREQ_SERVER
> +    bool handled;
> +
> +    local_irq_enable();
> +    handled = handle_hvm_io_completion(v);
> +    local_irq_disable();
> +
> +    if ( !handled )
> +        return true;
> +#endif
> +
>       if ( likely(!v->arch.need_flush_to_ram) )
> -        return;
> +        return false;
>   
>       /*
>        * Give a chance for the pCPU to process work before handling the vCPU
> @@ -2265,6 +2285,8 @@ static void check_for_vcpu_work(void)
>       local_irq_enable();
>       p2m_flush_vm(v);
>       local_irq_disable();
> +
> +    return false;
>   }
>   
>   /*
> @@ -2277,8 +2299,10 @@ void leave_hypervisor_to_guest(void)
>   {
>       local_irq_disable();
>   
> -    check_for_vcpu_work();
> -    check_for_pcpu_work();
> +    do
> +    {
> +        check_for_pcpu_work();
> +    } while ( check_for_vcpu_work() );
>   
>       vgic_sync_to_lrs();
>   
> diff --git a/xen/include/asm-arm/domain.h b/xen/include/asm-arm/domain.h
> index 6819a3b..d1c48d7 100644
> --- a/xen/include/asm-arm/domain.h
> +++ b/xen/include/asm-arm/domain.h
> @@ -11,10 +11,27 @@
>   #include <asm/vgic.h>
>   #include <asm/vpl011.h>
>   #include <public/hvm/params.h>
> +#include <public/hvm/dm_op.h>
> +#include <public/hvm/ioreq.h>
> +
> +#define MAX_NR_IOREQ_SERVERS 8
>   
>   struct hvm_domain
>   {
>       uint64_t              params[HVM_NR_PARAMS];
> +
> +    /* Guest page range used for non-default ioreq servers */
> +    struct {
> +        unsigned long base;
> +        unsigned long mask;
> +        unsigned long legacy_mask; /* indexed by HVM param number */
> +    } ioreq_gfn;
> +
> +    /* Lock protects all other values in the sub-struct and the default */
> +    struct {
> +        spinlock_t              lock;
> +        struct hvm_ioreq_server *server[MAX_NR_IOREQ_SERVERS];
> +    } ioreq_server;
>   };
>   
>   #ifdef CONFIG_ARM_64
> @@ -91,6 +108,28 @@ struct arch_domain
>   #endif
>   }  __cacheline_aligned;
>   
> +enum hvm_io_completion {
> +    HVMIO_no_completion,
> +    HVMIO_mmio_completion,
> +    HVMIO_pio_completion
> +};
> +
> +struct hvm_vcpu_io {
> +    /* I/O request in flight to device model. */
> +    enum hvm_io_completion io_completion;
> +    ioreq_t                io_req;
> +
> +    /*
> +     * HVM emulation:
> +     *  Linear address @mmio_gla maps to MMIO physical frame @mmio_gpfn.
> +     *  The latter is known to be an MMIO frame (not RAM).
> +     *  This translation is only valid for accesses as per @mmio_access.
> +     */
> +    struct npfec        mmio_access;
> +    unsigned long       mmio_gla;
> +    unsigned long       mmio_gpfn;
> +};
> +

Why do we need to re-define most of this? Can't this just be in common code?

I would also rather not define them if CONFIG_IOREQ_SERVER is not set.


>   struct arch_vcpu
>   {
>       struct {
> @@ -204,6 +243,11 @@ struct arch_vcpu
>        */
>       bool need_flush_to_ram;
>   
> +    struct hvm_vcpu
> +    {
> +        struct hvm_vcpu_io hvm_io;
> +    } hvm;

The IOREQ code is meant to be agnostic to the type of guest, so I 
don't really see a reason for the common code to access arch.hvm.

This should be abstracted appropriately.

> +
>   }  __cacheline_aligned;
>   
>   void vcpu_show_execution_state(struct vcpu *);
> @@ -262,6 +306,8 @@ static inline void arch_vcpu_block(struct vcpu *v) {}
>   
>   #define arch_vm_assist_valid_mask(d) (1UL << VMASST_TYPE_runstate_update_flag)
>   
> +#define has_vpci(d)    ({ (void)(d); false; })
> +
>   #endif /* __ASM_DOMAIN_H__ */
>   
>   /*
> diff --git a/xen/include/asm-arm/hvm/ioreq.h b/xen/include/asm-arm/hvm/ioreq.h
> new file mode 100644
> index 0000000..1c34df0
> --- /dev/null
> +++ b/xen/include/asm-arm/hvm/ioreq.h
> @@ -0,0 +1,108 @@
> +/*
> + * hvm.h: Hardware virtual machine assist interface definitions.
> + *
> + * Copyright (c) 2016 Citrix Systems Inc.
> + * Copyright (c) 2019 Arm ltd.
> + *
> + * This program is free software; you can redistribute it and/or modify it
> + * under the terms and conditions of the GNU General Public License,
> + * version 2, as published by the Free Software Foundation.
> + *
> + * This program is distributed in the hope it will be useful, but WITHOUT
> + * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
> + * FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License for
> + * more details.
> + *
> + * You should have received a copy of the GNU General Public License along with
> + * this program; If not, see <http://www.gnu.org/licenses/>.
> + */
> +
> +#ifndef __ASM_ARM_HVM_IOREQ_H__
> +#define __ASM_ARM_HVM_IOREQ_H__
> +
> +#include <public/hvm/ioreq.h>
> +#include <public/hvm/dm_op.h>
> +
> +#ifdef CONFIG_IOREQ_SERVER
> +enum io_state handle_ioserv(struct cpu_user_regs *regs, struct vcpu *v);
> +enum io_state try_fwd_ioserv(struct cpu_user_regs *regs,
> +                             struct vcpu *v, mmio_info_t *info);
> +#else
> +static inline enum io_state handle_ioserv(struct cpu_user_regs *regs,
> +                                          struct vcpu *v)
> +{
> +    return IO_UNHANDLED;
> +}
> +
> +static inline enum io_state try_fwd_ioserv(struct cpu_user_regs *regs,
> +                                           struct vcpu *v, mmio_info_t *info)
> +{
> +    return IO_UNHANDLED;
> +}
> +#endif
> +
> +bool ioreq_handle_complete_mmio(void);
> +
> +static inline bool handle_pio(uint16_t port, unsigned int size, int dir)
> +{
> +    /*
> +     * TODO: For Arm64, the main user will be PCI. So this should be
> +     * implemented when we add support for vPCI.
> +     */
> +    BUG();

Why do you use a BUG() and not an ASSERT_UNREACHABLE()?

> +    return true;
> +}
> +
> +static inline int arch_hvm_destroy_ioreq_server(struct hvm_ioreq_server *s)
> +{
> +    return 0;
> +}
> +
> +static inline void msix_write_completion(struct vcpu *v)
> +{
> +}
> +
> +static inline bool arch_handle_hvm_io_completion(
> +    enum hvm_io_completion io_completion)
> +{
> +    ASSERT_UNREACHABLE();
> +}
> +
> +static inline int hvm_get_ioreq_server_range_type(struct domain *d,
> +                                                  ioreq_t *p,
> +                                                  uint8_t *type,
> +                                                  uint64_t *addr)
> +{
> +    if ( p->type != IOREQ_TYPE_COPY && p->type != IOREQ_TYPE_PIO )
> +        return -EINVAL;
> +
> +    *type = (p->type == IOREQ_TYPE_PIO) ?
> +             XEN_DMOP_IO_RANGE_PORT : XEN_DMOP_IO_RANGE_MEMORY;
> +    *addr = p->addr;
> +
> +    return 0;
> +}
> +
> +static inline void arch_hvm_ioreq_init(struct domain *d)
> +{
> +}
> +
> +static inline void arch_hvm_ioreq_destroy(struct domain *d)
> +{
> +}
> +
> +#define IOREQ_IO_HANDLED     IO_HANDLED
> +#define IOREQ_IO_UNHANDLED   IO_UNHANDLED
> +#define IOREQ_IO_RETRY       IO_RETRY
> +
> +#endif /* __ASM_ARM_HVM_IOREQ_H__ */
> +
> +/*
> + * Local variables:
> + * mode: C
> + * c-file-style: "BSD"
> + * c-basic-offset: 4
> + * tab-width: 4
> + * indent-tabs-mode: nil
> + * End:
> + */
> diff --git a/xen/include/asm-arm/mmio.h b/xen/include/asm-arm/mmio.h
> index 8dbfb27..7ab873c 100644
> --- a/xen/include/asm-arm/mmio.h
> +++ b/xen/include/asm-arm/mmio.h
> @@ -37,6 +37,7 @@ enum io_state
>       IO_ABORT,       /* The IO was handled by the helper and led to an abort. */
>       IO_HANDLED,     /* The IO was successfully handled by the helper. */
>       IO_UNHANDLED,   /* The IO was not handled by the helper. */
> +    IO_RETRY,       /* Retry the emulation for some reason */
>   };
>   
>   typedef int (*mmio_read_t)(struct vcpu *v, mmio_info_t *info,
> diff --git a/xen/include/asm-arm/paging.h b/xen/include/asm-arm/paging.h
> index 6d1a000..0550c55 100644
> --- a/xen/include/asm-arm/paging.h
> +++ b/xen/include/asm-arm/paging.h
> @@ -4,6 +4,10 @@
>   #define paging_mode_translate(d)              (1)
>   #define paging_mode_external(d)               (1)
>   
> +static inline void paging_mark_pfn_dirty(struct domain *d, pfn_t pfn)
> +{
> +}
> +
>   #endif /* XEN_PAGING_H */
>   
>   /*
> 

Cheers,

-- 
Julien Grall



* Re: [PATCH V1 14/16] xen/ioreq: Use guest_cmpxchg64() instead of cmpxchg()
  2020-09-10 20:22 ` [PATCH V1 14/16] xen/ioreq: Use guest_cmpxchg64() instead of cmpxchg() Oleksandr Tyshchenko
  2020-09-16  9:04   ` Jan Beulich
@ 2020-09-23 18:05   ` Julien Grall
  1 sibling, 0 replies; 111+ messages in thread
From: Julien Grall @ 2020-09-23 18:05 UTC (permalink / raw)
  To: Oleksandr Tyshchenko, xen-devel
  Cc: Oleksandr Tyshchenko, Paul Durrant, Stefano Stabellini, Julien Grall

Hi Oleksandr,

On 10/09/2020 21:22, Oleksandr Tyshchenko wrote:
> From: Oleksandr Tyshchenko <oleksandr_tyshchenko@epam.com>
> 
> The cmpxchg() in hvm_send_buffered_ioreq() operates on memory shared
> with the emulator. In order to be on the safe side we need to switch
> to guest_cmpxchg64() to prevent a domain from DoSing Xen on Arm.
> 
> CC: Julien Grall <jgrall@amazon.com>
> Signed-off-by: Oleksandr Tyshchenko <oleksandr_tyshchenko@epam.com>


For bisection purposes, we need this series to at least build after 
every patch. It is fine if the IOREQ feature doesn't work yet.

So this patch wants to be earlier in the series to avoid breaking arm32 
compilation.

> 
> ---
> Please note, this patch depends on the following patch on a review:
> https://patchwork.kernel.org/patch/11715559/
> 
> Changes RFC -> V1:
>     - new patch
> ---
> ---
>   xen/common/ioreq.c | 4 +++-
>   1 file changed, 3 insertions(+), 1 deletion(-)
> 
> diff --git a/xen/common/ioreq.c b/xen/common/ioreq.c
> index e24a481..645d8a1 100644
> --- a/xen/common/ioreq.c
> +++ b/xen/common/ioreq.c
> @@ -30,6 +30,8 @@
>   #include <xen/trace.h>
>   #include <xen/vpci.h>
>   
> +#include <asm/guest_atomics.h>
> +
>   #include <public/hvm/dm_op.h>
>   #include <public/hvm/ioreq.h>
>   #include <public/hvm/params.h>
> @@ -1325,7 +1327,7 @@ static int hvm_send_buffered_ioreq(struct hvm_ioreq_server *s, ioreq_t *p)
>   
>           new.read_pointer = old.read_pointer - n * IOREQ_BUFFER_SLOT_NUM;
>           new.write_pointer = old.write_pointer - n * IOREQ_BUFFER_SLOT_NUM;
> -        cmpxchg(&pg->ptrs.full, old.full, new.full);
> +        guest_cmpxchg64(d, &pg->ptrs.full, old.full, new.full);
>       }
>   
>       notify_via_xen_event_channel(d, s->bufioreq_evtchn);
> 

-- 
Julien Grall



* Re: [PATCH V1 01/16] x86/ioreq: Prepare IOREQ feature for making it common
  2020-09-23 17:22   ` Julien Grall
@ 2020-09-23 18:08     ` Oleksandr
  0 siblings, 0 replies; 111+ messages in thread
From: Oleksandr @ 2020-09-23 18:08 UTC (permalink / raw)
  To: Julien Grall
  Cc: xen-devel, Oleksandr Tyshchenko, Paul Durrant, Jan Beulich,
	Andrew Cooper, Wei Liu, Roger Pau Monné,
	Stefano Stabellini, Julien Grall


On 23.09.20 20:22, Julien Grall wrote:
> Hi,

Hi Julien


>
> On 10/09/2020 21:21, Oleksandr Tyshchenko wrote:
>> From: Oleksandr Tyshchenko <oleksandr_tyshchenko@epam.com>
>>
>> As a lot of x86 code can be re-used on Arm later on, this patch
>> prepares IOREQ support before moving to the common code. This way
>> we will get an almost verbatim copy for the code movement.
>
FWIW, I agree with Jan that we need more details on what you are 
doing and why. It would be worth considering splitting this into 
smaller patches.

ok


>
>
>>
>> This support is going to be used on Arm to be able to run a device
>> emulator outside of the Xen hypervisor.
>>
>> Signed-off-by: Julien Grall <julien.grall@arm.com>
>
> Usually the first signed-off is the author of the patch. However, this 
> patch looks quite far off from what I originally wrote.
>
> So I don't feel my signed-off-by is actually warranted here. If you want 
> to credit me, then you can mention it in the commit message.
This is related to all patches in this series. This patch series is the 
second attempt (the first was the RFC) to make IOREQ support common, and 
it became quite different from the initial commit.
I am sorry, I completely lost track of whether a particular patch in 
this series is close to what you originally wrote or far from it, i.e. 
whether I should retain your SoB or drop it. So, in order *not to make a 
mistake* on such an important question, I decided to add your SoB to 
each patch in this series and also add a note to each patch describing 
where this series came from.


>
>> Signed-off-by: Oleksandr Tyshchenko <oleksandr_tyshchenko@epam.com>
>>
>
> Cheers,
>
-- 
Regards,

Oleksandr Tyshchenko




* Re: [PATCH V1 14/16] xen/ioreq: Use guest_cmpxchg64() instead of cmpxchg()
  2020-09-22 20:05           ` Oleksandr
@ 2020-09-23 18:12             ` Julien Grall
  2020-09-23 20:29               ` Oleksandr
  0 siblings, 1 reply; 111+ messages in thread
From: Julien Grall @ 2020-09-23 18:12 UTC (permalink / raw)
  To: Oleksandr, paul, 'Jan Beulich'
  Cc: xen-devel, 'Oleksandr Tyshchenko',
	'Stefano Stabellini', 'Julien Grall'



On 22/09/2020 21:05, Oleksandr wrote:
> 
> On 16.09.20 12:12, Julien Grall wrote:
> 
> Hi all.
> 
>>
>>
>> On 16/09/2020 10:09, Paul Durrant wrote:
>>>> -----Original Message-----
>>>> From: Julien Grall <julien@xen.org>
>>>> Sent: 16 September 2020 10:07
>>>> To: Jan Beulich <jbeulich@suse.com>; Oleksandr Tyshchenko 
>>>> <olekstysh@gmail.com>
>>>> Cc: xen-devel@lists.xenproject.org; Oleksandr Tyshchenko 
>>>> <oleksandr_tyshchenko@epam.com>; Paul Durrant
>>>> <paul@xen.org>; Stefano Stabellini <sstabellini@kernel.org>; Julien 
>>>> Grall <jgrall@amazon.com>
>>>> Subject: Re: [PATCH V1 14/16] xen/ioreq: Use guest_cmpxchg64() 
>>>> instead of cmpxchg()
>>>>
>>>>
>>>>
>>>> On 16/09/2020 10:04, Jan Beulich wrote:
>>>>> On 10.09.2020 22:22, Oleksandr Tyshchenko wrote:
>>>>>> @@ -1325,7 +1327,7 @@ static int hvm_send_buffered_ioreq(struct 
>>>>>> hvm_ioreq_server *s, ioreq_t *p)
>>>>>>
>>>>>>            new.read_pointer = old.read_pointer - n * 
>>>>>> IOREQ_BUFFER_SLOT_NUM;
>>>>>>            new.write_pointer = old.write_pointer - n * 
>>>>>> IOREQ_BUFFER_SLOT_NUM;
>>>>>> -        cmpxchg(&pg->ptrs.full, old.full, new.full);
>>>>>> +        guest_cmpxchg64(d, &pg->ptrs.full, old.full, new.full);
>>>>>
>>>>> But the memory we're updating is shared with s->emulator, not with d,
>>>>> if I'm not mistaken.
>>>>
>>>> It is unfortunately shared with both s->emulator and d when using the
>>>> legacy interface.
>>>
>>> When using magic pages they should be punched out of the P2M by the 
>>> time the code gets here, so the memory should not be guest-visible.
>>
>> Can you point me to the code that is doing this?
>>
>> Cheers,
>>
> If we are not going to use the legacy interface on Arm, we will have a 
> page mapped in a single domain at a time.
Right, but this is common code... You have to think about what the 
implications would be if we were using the legacy interface.

Thankfully the only user of the legacy interface is x86 so far, and 
there is no concern regarding the atomic operations there.

If we are happy to consider that the legacy interface will never be used 
(I am starting to worry that someone will ask for it on Arm...) then we 
should be fine.

I think it would be worth documenting in the commit message and the code 
(hvm_allow_set_param()) that the legacy interface *must* not be used 
without revisiting the code.
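
Something along these lines perhaps (a rough sketch only - I haven't
double-checked the exact set of params/cases in hvm_allow_set_param()):

    case HVM_PARAM_IOREQ_PFN:
    case HVM_PARAM_BUFIOREQ_PFN:
    case HVM_PARAM_BUFIOREQ_EVTCHN:
        /*
         * Sketch: the legacy IOREQ interface shares the ioreq pages
         * with the guest, so the buffered ioreq ring update is not
         * safe against a malicious guest. It must not be re-used
         * (e.g. on Arm) without revisiting hvm_send_buffered_ioreq().
         */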

Cheers,

-- 
Julien Grall



* Re: [PATCH V1 04/16] xen/ioreq: Provide alias for the handle_mmio()
  2020-09-23 17:28   ` Julien Grall
@ 2020-09-23 18:17     ` Oleksandr
  0 siblings, 0 replies; 111+ messages in thread
From: Oleksandr @ 2020-09-23 18:17 UTC (permalink / raw)
  To: Julien Grall
  Cc: xen-devel, Oleksandr Tyshchenko, Paul Durrant, Jan Beulich,
	Andrew Cooper, Wei Liu, Roger Pau Monné,
	Stefano Stabellini, Julien Grall


On 23.09.20 20:28, Julien Grall wrote:
> Hi Oleksandr,

Hi Julien


>
> On 10/09/2020 21:21, Oleksandr Tyshchenko wrote:
>> From: Oleksandr Tyshchenko <oleksandr_tyshchenko@epam.com>
>>
>> IOREQ is a common feature now and Arm will have its own
>> implementation.
>>
>> But the name of the function is pretty generic and can be confusing
>> on Arm (we already have a try_handle_mmio()).
>>
>> In order not to rename the function (which is used for a varying
>> set of purposes on x86) globally and to get a non-confusing variant
>> on Arm, provide an alias ioreq_handle_complete_mmio() to be used in
>> common and Arm code.
>>
>> Signed-off-by: Julien Grall <julien.grall@arm.com>
>
> This doesn't look like code I wrote... Can you make sure I am only 
> credited for what I wrote?

I am sorry, for the next series I will re-check all patches.


-- 
Regards,

Oleksandr Tyshchenko




* Re: [PATCH V1 07/16] xen/dm: Make x86's DM feature common
  2020-09-23 17:35   ` Julien Grall
@ 2020-09-23 18:28     ` Oleksandr
  0 siblings, 0 replies; 111+ messages in thread
From: Oleksandr @ 2020-09-23 18:28 UTC (permalink / raw)
  To: Julien Grall
  Cc: xen-devel, Oleksandr Tyshchenko, Jan Beulich, Andrew Cooper,
	Wei Liu, Roger Pau Monné,
	George Dunlap, Ian Jackson, Stefano Stabellini, Daniel De Graaf,
	Julien Grall


On 23.09.20 20:35, Julien Grall wrote:

Hi Julien

>
> On 10/09/2020 21:22, Oleksandr Tyshchenko wrote:
>> From: Oleksandr Tyshchenko <oleksandr_tyshchenko@epam.com>
>
> I believe I am the original author of this code. So this needs to be 
> fixed accordingly.

Sorry, will fix.


-- 
Regards,

Oleksandr Tyshchenko




* Re: [PATCH V1 09/16] arm/ioreq: Introduce arch specific bits for IOREQ/DM features
  2020-09-23 18:03   ` Julien Grall
@ 2020-09-23 20:16     ` Oleksandr
  2020-09-24 11:08       ` Jan Beulich
  2020-09-24 17:25       ` Julien Grall
  0 siblings, 2 replies; 111+ messages in thread
From: Oleksandr @ 2020-09-23 20:16 UTC (permalink / raw)
  To: Julien Grall
  Cc: xen-devel, Oleksandr Tyshchenko, Stefano Stabellini,
	Volodymyr Babchuk, Julien Grall


On 23.09.20 21:03, Julien Grall wrote:
> Hi,

Hi Julien


>
> On 10/09/2020 21:22, Oleksandr Tyshchenko wrote:
>> From: Oleksandr Tyshchenko <oleksandr_tyshchenko@epam.com>
>
> I believe I am the original author of this code...

Sorry, will fix


>
>> This patch adds basic IOREQ/DM support on Arm. The subsequent
>> patches will improve functionality, add remaining bits as well as
>> address several TODOs.
>
> I find it a bit weird to add code with TODOs that are handled in the 
> same series. Can't we just split this patch into smaller ones where 
> everything is addressed from the start?
Sorry if I wasn't clear in the description. Let me clarify.
The corresponding RFC patch had 3 major TODOs:
1. Handle properly when hvm_send_ioreq() returns IO_RETRY
2. Proper ref-counting for the foreign entries in set_foreign_p2m_entry()
3. Check the return value of handle_hvm_io_completion() *and* avoid 
calling handle_hvm_io_completion() on every return.

TODO #1 was fixed in the current patch
TODO #2 was fixed in "xen/mm: Handle properly reference in 
set_foreign_p2m_entry() on Arm"
TODO #3 was partially fixed in the current patch (checking the return 
value of handle_hvm_io_completion()).
The second part of TODO #3 (avoiding the call to 
handle_hvm_io_completion() on every return) was moved to a separate 
patch, "xen/ioreq: Introduce hvm_domain_has_ioreq_server()", and fixed 
(or rather improved) there, along with introducing the mechanism needed 
to do so.

Could you please clarify how this patch could be split into smaller ones?


>
>
>>
>> Please note, the "PIO handling" TODO is expected to be left unaddressed
>> for the current series. It is not a big issue for now while Xen
>> doesn't have support for vPCI on Arm. On Arm64 PIOs are only used
>> for PCI IO BARs and we would probably want to expose them to the
>> emulator as PIO accesses to make a DM completely arch-agnostic. So
>> "PIO handling" should be implemented when we add support for vPCI.
>>
>> Please note, at the moment the build on Arm32 is broken (see cmpxchg
>> usage in hvm_send_buffered_ioreq()) due to the lack of cmpxchg_64
>> support on Arm32. There is a patch on review to address this issue:
>> https://patchwork.kernel.org/patch/11715559/
>
> This has been committed.

Thank you for the patch, will remove a notice.


>
>>
>> Signed-off-by: Julien Grall <julien.grall@arm.com>
>> Signed-off-by: Oleksandr Tyshchenko <oleksandr_tyshchenko@epam.com>
>>
>> ---
>> ---
>>   xen/arch/arm/Kconfig            |   1 +
>>   xen/arch/arm/Makefile           |   2 +
>>   xen/arch/arm/dm.c               |  33 ++++++++++
>>   xen/arch/arm/domain.c           |   9 +++
>>   xen/arch/arm/io.c               |  11 +++-
>>   xen/arch/arm/ioreq.c            | 142 
>> ++++++++++++++++++++++++++++++++++++++++
>>   xen/arch/arm/traps.c            |  32 +++++++--
>>   xen/include/asm-arm/domain.h    |  46 +++++++++++++
>>   xen/include/asm-arm/hvm/ioreq.h | 108 ++++++++++++++++++++++++++++++
>>   xen/include/asm-arm/mmio.h      |   1 +
>>   xen/include/asm-arm/paging.h    |   4 ++
>>   11 files changed, 384 insertions(+), 5 deletions(-)
>>   create mode 100644 xen/arch/arm/dm.c
>>   create mode 100644 xen/arch/arm/ioreq.c
>>   create mode 100644 xen/include/asm-arm/hvm/ioreq.h
>>
>> diff --git a/xen/arch/arm/Kconfig b/xen/arch/arm/Kconfig
>> index 2777388..8264cd6 100644
>> --- a/xen/arch/arm/Kconfig
>> +++ b/xen/arch/arm/Kconfig
>> @@ -21,6 +21,7 @@ config ARM
>>       select HAS_PASSTHROUGH
>>       select HAS_PDX
>>       select IOMMU_FORCE_PT_SHARE
>> +    select IOREQ_SERVER
>
> I would prefer if IOREQ_SERVER is not included in the default build of 
> Xen. This is a fairly big feature that requires a lot more testing.

Sounds reasonable. Will remove.


>
>
>>     config ARCH_DEFCONFIG
>>       string
>> diff --git a/xen/arch/arm/Makefile b/xen/arch/arm/Makefile
>> index 7e82b21..617fa3e 100644
>> --- a/xen/arch/arm/Makefile
>> +++ b/xen/arch/arm/Makefile
>> @@ -13,6 +13,7 @@ obj-y += cpuerrata.o
>>   obj-y += cpufeature.o
>>   obj-y += decode.o
>>   obj-y += device.o
>> +obj-$(CONFIG_IOREQ_SERVER) += dm.o
>>   obj-y += domain.o
>>   obj-y += domain_build.init.o
>>   obj-y += domctl.o
>> @@ -27,6 +28,7 @@ obj-y += guest_atomics.o
>>   obj-y += guest_walk.o
>>   obj-y += hvm.o
>>   obj-y += io.o
>> +obj-$(CONFIG_IOREQ_SERVER) += ioreq.o
>>   obj-y += irq.o
>>   obj-y += kernel.init.o
>>   obj-$(CONFIG_LIVEPATCH) += livepatch.o
>> diff --git a/xen/arch/arm/dm.c b/xen/arch/arm/dm.c
>> new file mode 100644
>> index 0000000..eb20344
>> --- /dev/null
>> +++ b/xen/arch/arm/dm.c
>> @@ -0,0 +1,33 @@
>> +/*
>> + * Copyright (c) 2019 Arm ltd.
>> + *
>> + * This program is free software; you can redistribute it and/or 
>> modify it
>> + * under the terms and conditions of the GNU General Public License,
>> + * version 2, as published by the Free Software Foundation.
>> + *
>> + * This program is distributed in the hope it will be useful, but 
>> WITHOUT
>> + * ANY WARRANTY; without even the implied warranty of 
>> MERCHANTABILITY or
>> + * FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public 
>> License for
>> + * more details.
>> + *
>> + * You should have received a copy of the GNU General Public License 
>> along with
>> + * this program; If not, see <http://www.gnu.org/licenses/>.
>> + */
>> +
>> +#include <xen/hypercall.h>
>> +
>> +int arch_dm_op(struct xen_dm_op *op, struct domain *d,
>> +               const struct dmop_args *op_args, bool *const_op)
>> +{
>> +    return -EOPNOTSUPP;
>> +}
>> +
>> +/*
>> + * Local variables:
>> + * mode: C
>> + * c-file-style: "BSD"
>> + * c-basic-offset: 4
>> + * tab-width: 4
>> + * indent-tabs-mode: nil
>> + * End:
>> + */
>> diff --git a/xen/arch/arm/domain.c b/xen/arch/arm/domain.c
>> index 3116932..043db3f 100644
>> --- a/xen/arch/arm/domain.c
>> +++ b/xen/arch/arm/domain.c
>> @@ -14,6 +14,7 @@
>>   #include <xen/grant_table.h>
>>   #include <xen/hypercall.h>
>>   #include <xen/init.h>
>> +#include <xen/ioreq.h>
>>   #include <xen/lib.h>
>>   #include <xen/livepatch.h>
>>   #include <xen/sched.h>
>> @@ -681,6 +682,10 @@ int arch_domain_create(struct domain *d,
>>         ASSERT(config != NULL);
>>   +#ifdef CONFIG_IOREQ_SERVER
>> +    hvm_ioreq_init(d);
>> +#endif
>> +
>>       /* p2m_init relies on some value initialized by the IOMMU 
>> subsystem */
>>       if ( (rc = iommu_domain_init(d, config->iommu_opts)) != 0 )
>>           goto fail;
>> @@ -999,6 +1004,10 @@ int domain_relinquish_resources(struct domain *d)
>>           if (ret )
>>               return ret;
>>   +#ifdef CONFIG_IOREQ_SERVER
>> +        hvm_destroy_all_ioreq_servers(d);
>> +#endif
>> +
>>       PROGRESS(xen):
>>           ret = relinquish_memory(d, &d->xenpage_list);
>>           if ( ret )
>> diff --git a/xen/arch/arm/io.c b/xen/arch/arm/io.c
>> index ae7ef96..adc9de7 100644
>> --- a/xen/arch/arm/io.c
>> +++ b/xen/arch/arm/io.c
>> @@ -16,6 +16,7 @@
>>    * GNU General Public License for more details.
>>    */
>>   +#include <xen/ioreq.h>
>>   #include <xen/lib.h>
>>   #include <xen/spinlock.h>
>>   #include <xen/sched.h>
>> @@ -123,7 +124,15 @@ enum io_state try_handle_mmio(struct 
>> cpu_user_regs *regs,
>>         handler = find_mmio_handler(v->domain, info.gpa);
>>       if ( !handler )
>> -        return IO_UNHANDLED;
>> +    {
>> +        int rc;
>> +
>> +        rc = try_fwd_ioserv(regs, v, &info);
>> +        if ( rc == IO_HANDLED )
>> +            return handle_ioserv(regs, v);
>> +
>> +        return rc;
>> +    }
>>         /* All the instructions used on emulated MMIO region should 
>> be valid */
>>       if ( !dabt.valid )
>> diff --git a/xen/arch/arm/ioreq.c b/xen/arch/arm/ioreq.c
>> new file mode 100644
>> index 0000000..e493c5b
>> --- /dev/null
>> +++ b/xen/arch/arm/ioreq.c
>> @@ -0,0 +1,142 @@
>> +/*
>> + * arm/ioreq.c: hardware virtual machine I/O emulation
>> + *
>> + * Copyright (c) 2019 Arm ltd.
>> + *
>> + * This program is free software; you can redistribute it and/or 
>> modify it
>> + * under the terms and conditions of the GNU General Public License,
>> + * version 2, as published by the Free Software Foundation.
>> + *
>> + * This program is distributed in the hope it will be useful, but 
>> WITHOUT
>> + * ANY WARRANTY; without even the implied warranty of 
>> MERCHANTABILITY or
>> + * FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public 
>> License for
>> + * more details.
>> + *
>> + * You should have received a copy of the GNU General Public License 
>> along with
>> + * this program; If not, see <http://www.gnu.org/licenses/>.
>> + */
>> +
>> +#include <xen/domain.h>
>> +#include <xen/ioreq.h>
>> +
>> +#include <public/hvm/ioreq.h>
>> +
>> +#include <asm/traps.h>
>> +
>> +enum io_state handle_ioserv(struct cpu_user_regs *regs, struct vcpu *v)
>> +{
>> +    const union hsr hsr = { .bits = regs->hsr };
>> +    const struct hsr_dabt dabt = hsr.dabt;
>> +    /* Code is similar to handle_read */
>> +    uint8_t size = (1 << dabt.size) * 8;
>> +    register_t r = v->arch.hvm.hvm_io.io_req.data;
>> +
>> +    /* We are done with the IO */
>> +    v->arch.hvm.hvm_io.io_req.state = STATE_IOREQ_NONE;
>> +
>> +    /* XXX: Do we need to take care of write here ? */
>
> It doesn't look like we need to do anything for writes as they have 
> already completed. Is there anything else we need to confirm?

Agree, it was discussed for the RFC series. I forgot to remove it.


>
>
>> +    if ( dabt.write )
>> +        return IO_HANDLED;
>> +
>> +    /*
>> +     * Sign extend if required.
>> +     * Note that we expect the read handler to have zeroed the bits
>> +     * outside the requested access size.
>> +     */
>> +    if ( dabt.sign && (r & (1UL << (size - 1))) )
>> +    {
>> +        /*
>> +         * We are relying on register_t using the same as
>> +         * an unsigned long in order to keep the 32-bit assembly
>> +         * code smaller.
>> +         */
>> +        BUILD_BUG_ON(sizeof(register_t) != sizeof(unsigned long));
>> +        r |= (~0UL) << size;
>> +    }
>> +
>> +    set_user_reg(regs, dabt.reg, r);
>> +
>> +    return IO_HANDLED;
>> +}
>> +
>> +enum io_state try_fwd_ioserv(struct cpu_user_regs *regs,
>> +                             struct vcpu *v, mmio_info_t *info)
>> +{
>> +    struct hvm_vcpu_io *vio = &v->arch.hvm.hvm_io;
>> +    ioreq_t p = {
>> +        .type = IOREQ_TYPE_COPY,
>> +        .addr = info->gpa,
>> +        .size = 1 << info->dabt.size,
>> +        .count = 1,
>> +        .dir = !info->dabt.write,
>> +        /*
>> +         * On x86, df is used by 'rep' instruction to tell the 
>> direction
>> +         * to iterate (forward or backward).
>> +         * On Arm, all the accesses to MMIO region will do a single
>> +         * memory access. So for now, we can safely always set to 0.
>> +         */
>> +        .df = 0,
>> +        .data = get_user_reg(regs, info->dabt.reg),
>> +        .state = STATE_IOREQ_READY,
>> +    };
>> +    struct hvm_ioreq_server *s = NULL;
>> +    enum io_state rc;
>> +
>> +    switch ( vio->io_req.state )
>> +    {
>> +    case STATE_IOREQ_NONE:
>> +        break;
>> +
>> +    case STATE_IORESP_READY:
>> +        return IO_HANDLED;
>> +
>> +    default:
>> +        gdprintk(XENLOG_ERR, "wrong state %u\n", vio->io_req.state);
>> +        return IO_ABORT;
>> +    }
>> +
>> +    s = hvm_select_ioreq_server(v->domain, &p);
>> +    if ( !s )
>> +        return IO_UNHANDLED;
>> +
>> +    if ( !info->dabt.valid )
>> +        return IO_ABORT;
>> +
>> +    vio->io_req = p;
>> +
>> +    rc = hvm_send_ioreq(s, &p, 0);
>> +    if ( rc != IO_RETRY || v->domain->is_shutting_down )
>> +        vio->io_req.state = STATE_IOREQ_NONE;
>> +    else if ( !hvm_ioreq_needs_completion(&vio->io_req) )
>> +        rc = IO_HANDLED;
>> +    else
>> +        vio->io_completion = HVMIO_mmio_completion;
>> +
>> +    return rc;
>> +}
>> +
>> +bool ioreq_handle_complete_mmio(void)
>> +{
>> +    struct vcpu *v = current;
>> +    struct cpu_user_regs *regs = guest_cpu_user_regs();
>> +    const union hsr hsr = { .bits = regs->hsr };
>> +    paddr_t addr = v->arch.hvm.hvm_io.io_req.addr;
>> +
>> +    if ( try_handle_mmio(regs, hsr, addr) == IO_HANDLED )
>> +    {
>> +        advance_pc(regs, hsr);
>> +        return true;
>> +    }
>> +
>> +    return false;
>> +}
>> +
>> +/*
>> + * Local variables:
>> + * mode: C
>> + * c-file-style: "BSD"
>> + * c-basic-offset: 4
>> + * tab-width: 4
>> + * indent-tabs-mode: nil
>> + * End:
>> + */
>> diff --git a/xen/arch/arm/traps.c b/xen/arch/arm/traps.c
>> index 8f40d0e..121942c 100644
>> --- a/xen/arch/arm/traps.c
>> +++ b/xen/arch/arm/traps.c
>> @@ -21,6 +21,7 @@
>>   #include <xen/hypercall.h>
>>   #include <xen/init.h>
>>   #include <xen/iocap.h>
>> +#include <xen/ioreq.h>
>>   #include <xen/irq.h>
>>   #include <xen/lib.h>
>>   #include <xen/mem_access.h>
>> @@ -1384,6 +1385,9 @@ static arm_hypercall_t arm_hypercall_table[] = {
>>   #ifdef CONFIG_HYPFS
>>       HYPERCALL(hypfs_op, 5),
>>   #endif
>> +#ifdef CONFIG_IOREQ_SERVER
>> +    HYPERCALL(dm_op, 3),
>> +#endif
>>   };
>>     #ifndef NDEBUG
>> @@ -1955,9 +1959,14 @@ static void do_trap_stage2_abort_guest(struct 
>> cpu_user_regs *regs,
>>               case IO_HANDLED:
>>                   advance_pc(regs, hsr);
>>                   return;
>> +            case IO_RETRY:
>> +                /* finish later */
>> +                return;
>>               case IO_UNHANDLED:
>>                   /* IO unhandled, try another way to handle it. */
>>                   break;
>> +            default:
>> +                ASSERT_UNREACHABLE();
>>               }
>>           }
>>   @@ -2249,12 +2258,23 @@ static void check_for_pcpu_work(void)
>>    * Process pending work for the vCPU. Any call should be fast or
>>    * implement preemption.
>>    */
>> -static void check_for_vcpu_work(void)
>> +static bool check_for_vcpu_work(void)
>>   {
>>       struct vcpu *v = current;
>>   +#ifdef CONFIG_IOREQ_SERVER
>> +    bool handled;
>> +
>> +    local_irq_enable();
>> +    handled = handle_hvm_io_completion(v);
>> +    local_irq_disable();
>> +
>> +    if ( !handled )
>> +        return true;
>> +#endif
>> +
>>       if ( likely(!v->arch.need_flush_to_ram) )
>> -        return;
>> +        return false;
>>         /*
>>        * Give a chance for the pCPU to process work before handling 
>> the vCPU
>> @@ -2265,6 +2285,8 @@ static void check_for_vcpu_work(void)
>>       local_irq_enable();
>>       p2m_flush_vm(v);
>>       local_irq_disable();
>> +
>> +    return false;
>>   }
>>     /*
>> @@ -2277,8 +2299,10 @@ void leave_hypervisor_to_guest(void)
>>   {
>>       local_irq_disable();
>>   -    check_for_vcpu_work();
>> -    check_for_pcpu_work();
>> +    do
>> +    {
>> +        check_for_pcpu_work();
>> +    } while ( check_for_vcpu_work() );
>>         vgic_sync_to_lrs();
>>   diff --git a/xen/include/asm-arm/domain.h 
>> b/xen/include/asm-arm/domain.h
>> index 6819a3b..d1c48d7 100644
>> --- a/xen/include/asm-arm/domain.h
>> +++ b/xen/include/asm-arm/domain.h
>> @@ -11,10 +11,27 @@
>>   #include <asm/vgic.h>
>>   #include <asm/vpl011.h>
>>   #include <public/hvm/params.h>
>> +#include <public/hvm/dm_op.h>
>> +#include <public/hvm/ioreq.h>
>> +
>> +#define MAX_NR_IOREQ_SERVERS 8
>>     struct hvm_domain
>>   {
>>       uint64_t              params[HVM_NR_PARAMS];
>> +
>> +    /* Guest page range used for non-default ioreq servers */
>> +    struct {
>> +        unsigned long base;
>> +        unsigned long mask;
>> +        unsigned long legacy_mask; /* indexed by HVM param number */
>> +    } ioreq_gfn;
>> +
>> +    /* Lock protects all other values in the sub-struct and the 
>> default */
>> +    struct {
>> +        spinlock_t              lock;
>> +        struct hvm_ioreq_server *server[MAX_NR_IOREQ_SERVERS];
>> +    } ioreq_server;
>>   };
>>     #ifdef CONFIG_ARM_64
>> @@ -91,6 +108,28 @@ struct arch_domain
>>   #endif
>>   }  __cacheline_aligned;
>>   +enum hvm_io_completion {
>> +    HVMIO_no_completion,
>> +    HVMIO_mmio_completion,
>> +    HVMIO_pio_completion
>> +};
>> +
>> +struct hvm_vcpu_io {
>> +    /* I/O request in flight to device model. */
>> +    enum hvm_io_completion io_completion;
>> +    ioreq_t                io_req;
>> +
>> +    /*
>> +     * HVM emulation:
>> +     *  Linear address @mmio_gla maps to MMIO physical frame 
>> @mmio_gpfn.
>> +     *  The latter is known to be an MMIO frame (not RAM).
>> +     *  This translation is only valid for accesses as per 
>> @mmio_access.
>> +     */
>> +    struct npfec        mmio_access;
>> +    unsigned long       mmio_gla;
>> +    unsigned long       mmio_gpfn;
>> +};
>> +
>
> Why do we need to re-define most of this? Can't this just be in common 
> code?

Jan asked almost the same question in "[PATCH V1 02/16] xen/ioreq: 
Make x86's IOREQ feature common".
Please see my answer there:
https://patchwork.kernel.org/patch/11769105/#23637511

Theoretically we could move this to the common code, but the question is 
what to do with the other struct fields that x86's struct hvm_vcpu_io 
has/needs but Arm's seemingly does not. Would it be possible to 
logically split struct hvm_vcpu_io into common and arch parts? (A rough 
sketch follows the struct below.)

struct hvm_vcpu_io {
     /* I/O request in flight to device model. */
     enum hvm_io_completion io_completion;
     ioreq_t                io_req;

     /*
      * HVM emulation:
      *  Linear address @mmio_gla maps to MMIO physical frame @mmio_gpfn.
      *  The latter is known to be an MMIO frame (not RAM).
      *  This translation is only valid for accesses as per @mmio_access.
      */
     struct npfec        mmio_access;
     unsigned long       mmio_gla;
     unsigned long       mmio_gpfn;

     /*
      * We may need to handle up to 3 distinct memory accesses per
      * instruction.
      */
     struct hvm_mmio_cache mmio_cache[3];
     unsigned int mmio_cache_count;

     /* For retries we shouldn't re-fetch the instruction. */
     unsigned int mmio_insn_bytes;
     unsigned char mmio_insn[16];
     struct hvmemul_cache *cache;

     /*
      * For string instruction emulation we need to be able to signal a
      * necessary retry through other than function return codes.
      */
     bool_t mmio_retry;

     unsigned long msix_unmask_address;
     unsigned long msix_snoop_address;
     unsigned long msix_snoop_gpa;

     const struct g2m_ioport *g2m_ioport;
};
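
Just to illustrate what I mean (the naming below is purely
illustrative, not a proposal), the common part could perhaps be limited
to:

    /* Illustrative sketch only. */
    struct vcpu_io {
        /* I/O request in flight to device model. */
        enum hvm_io_completion io_completion;
        ioreq_t                io_req;
    };

with all the remaining fields (mmio_cache, mmio_insn*, msix_*,
g2m_ioport, ...) staying in an x86-only struct.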


>
> I would also rather not define them if CONFIG_IOREQ_SERVER is not set.

ok


>
>
>
>>   struct arch_vcpu
>>   {
>>       struct {
>> @@ -204,6 +243,11 @@ struct arch_vcpu
>>        */
>>       bool need_flush_to_ram;
>>   +    struct hvm_vcpu
>> +    {
>> +        struct hvm_vcpu_io hvm_io;
>> +    } hvm;
>
> The IOREQ code is meant to be agnostic to the type of guest, so I 
> don't really see a reason for the common code to access arch.hvm.
>
> This should be abstracted appropriately.
Yes, there is a discussion pending in "[PATCH V1 02/16] xen/ioreq: Make 
x86's IOREQ feature common" about the layering violation issue.
There won't be any uses of *arch.hvm* left in the common IOREQ code in 
the next series; everything containing *arch.hvm* in the common code 
will be abstracted away.
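
For example, accessors along the lines of what I suggested in the 
"[PATCH V1 02/16] xen/ioreq: Make x86's IOREQ feature common" thread 
would keep *arch.hvm* out of the common code:

    #define ioreq_get_io_completion(v) ((v)->arch.hvm.hvm_io.io_completion)
    #define ioreq_get_io_req(v) ((v)->arch.hvm.hvm_io.io_req)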


>
>
>> +
>>   }  __cacheline_aligned;
>>     void vcpu_show_execution_state(struct vcpu *);
>> @@ -262,6 +306,8 @@ static inline void arch_vcpu_block(struct vcpu 
>> *v) {}
>>     #define arch_vm_assist_valid_mask(d) (1UL << 
>> VMASST_TYPE_runstate_update_flag)
>>   +#define has_vpci(d)    ({ (void)(d); false; })
>> +
>>   #endif /* __ASM_DOMAIN_H__ */
>>     /*
>> diff --git a/xen/include/asm-arm/hvm/ioreq.h 
>> b/xen/include/asm-arm/hvm/ioreq.h
>> new file mode 100644
>> index 0000000..1c34df0
>> --- /dev/null
>> +++ b/xen/include/asm-arm/hvm/ioreq.h
>> @@ -0,0 +1,108 @@
>> +/*
>> + * hvm.h: Hardware virtual machine assist interface definitions.
>> + *
>> + * Copyright (c) 2016 Citrix Systems Inc.
>> + * Copyright (c) 2019 Arm ltd.
>> + *
>> + * This program is free software; you can redistribute it and/or 
>> modify it
>> + * under the terms and conditions of the GNU General Public License,
>> + * version 2, as published by the Free Software Foundation.
>> + *
>> + * This program is distributed in the hope it will be useful, but 
>> WITHOUT
>> + * ANY WARRANTY; without even the implied warranty of 
>> MERCHANTABILITY or
>> + * FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public 
>> License for
>> + * more details.
>> + *
>> + * You should have received a copy of the GNU General Public License 
>> along with
>> + * this program; If not, see <http://www.gnu.org/licenses/>.
>> + */
>> +
>> +#ifndef __ASM_ARM_HVM_IOREQ_H__
>> +#define __ASM_ARM_HVM_IOREQ_H__
>> +
>> +#include <public/hvm/ioreq.h>
>> +#include <public/hvm/dm_op.h>
>> +
>> +#ifdef CONFIG_IOREQ_SERVER
>> +enum io_state handle_ioserv(struct cpu_user_regs *regs, struct vcpu 
>> *v);
>> +enum io_state try_fwd_ioserv(struct cpu_user_regs *regs,
>> +                             struct vcpu *v, mmio_info_t *info);
>> +#else
>> +static inline enum io_state handle_ioserv(struct cpu_user_regs *regs,
>> +                                          struct vcpu *v)
>> +{
>> +    return IO_UNHANDLED;
>> +}
>> +
>> +static inline enum io_state try_fwd_ioserv(struct cpu_user_regs *regs,
>> +                                           struct vcpu *v, 
>> mmio_info_t *info)
>> +{
>> +    return IO_UNHANDLED;
>> +}
>> +#endif
>> +
>> +bool ioreq_handle_complete_mmio(void);
>> +
>> +static inline bool handle_pio(uint16_t port, unsigned int size, int 
>> dir)
>> +{
>> +    /*
>> +     * TODO: For Arm64, the main user will be PCI. So this should be
>> +     * implemented when we add support for vPCI.
>> +     */
>> +    BUG();
>
> Why do you use a BUG() and not an ASSERT_UNREACHABLE()?

Yes, ASSERT_UNREACHABLE() is better suited here.
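
Will change the stub to something like:

    static inline bool handle_pio(uint16_t port, unsigned int size, int dir)
    {
        /*
         * TODO: For Arm64, the main user will be PCI. So this should be
         * implemented when we add support for vPCI.
         */
        ASSERT_UNREACHABLE();
        return true;
    }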


-- 
Regards,

Oleksandr Tyshchenko




* Re: [PATCH V1 14/16] xen/ioreq: Use guest_cmpxchg64() instead of cmpxchg()
  2020-09-23 18:12             ` Julien Grall
@ 2020-09-23 20:29               ` Oleksandr
  0 siblings, 0 replies; 111+ messages in thread
From: Oleksandr @ 2020-09-23 20:29 UTC (permalink / raw)
  To: Julien Grall, paul, 'Jan Beulich'
  Cc: xen-devel, 'Oleksandr Tyshchenko',
	'Stefano Stabellini', 'Julien Grall'


On 23.09.20 21:12, Julien Grall wrote:

Hi Julien

>
>>>>>
>>>>>
>>>>> On 16/09/2020 10:04, Jan Beulich wrote:
>>>>>> On 10.09.2020 22:22, Oleksandr Tyshchenko wrote:
>>>>>>> @@ -1325,7 +1327,7 @@ static int hvm_send_buffered_ioreq(struct 
>>>>>>> hvm_ioreq_server *s, ioreq_t *p)
>>>>>>>
>>>>>>>            new.read_pointer = old.read_pointer - n * 
>>>>>>> IOREQ_BUFFER_SLOT_NUM;
>>>>>>>            new.write_pointer = old.write_pointer - n * 
>>>>>>> IOREQ_BUFFER_SLOT_NUM;
>>>>>>> -        cmpxchg(&pg->ptrs.full, old.full, new.full);
>>>>>>> +        guest_cmpxchg64(d, &pg->ptrs.full, old.full, new.full);
>>>>>>
>>>>>> But the memory we're updating is shared with s->emulator, not 
>>>>>> with d,
>>>>>> if I'm not mistaken.
>>>>>
>>>>> It is unfortunately shared with both s->emulator and d when using the
>>>>> legacy interface.
>>>>
>>>> When using magic pages they should be punched out of the P2M by the 
>>>> time the code gets here, so the memory should not be guest-visible.
>>>
>>> Can you point me to the code that does this?
>>>
>>> Cheers,
>>>
>> If we are not going to use the legacy interface on Arm, we will have the 
>> page mapped in a single domain at a time.
> Right, but this is common code... You have to think about what the 
> implications would be if we were using the legacy interface.
>
> Thankfully the only user of the legacy interface is x86 so far, and 
> there is no concern regarding the atomics operations.
>
> If we are happy to consider that the legacy interface will never be 
> used (I am starting to worry that someone will ask for it on Arm...) then 
> we should be fine.
>
> I think it would be worth documenting in the commit message and the code 
> (hvm_allow_set_param()) that the legacy interface *must* not be used 
> without revisiting the code.

Will do.
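Something like this, perhaps, next to the IOREQ params in
hvm_allow_set_param() (wording is only a suggestion):

    /*
     * The legacy (magic-page) IOREQ interface leaves the ioreq pages
     * mapped in the guest, while the common code only uses
     * guest_cmpxchg64() to protect against the *emulating* domain.
     * This interface must not be enabled on another arch without
     * revisiting the atomics in the common IOREQ code.
     */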


-- 
Regards,

Oleksandr Tyshchenko



^ permalink raw reply	[flat|nested] 111+ messages in thread

* Re: [PATCH V1 02/16] xen/ioreq: Make x86's IOREQ feature common
  2020-09-23 12:28                 ` Oleksandr
@ 2020-09-24 10:58                   ` Jan Beulich
  2020-09-24 15:38                     ` Oleksandr
  0 siblings, 1 reply; 111+ messages in thread
From: Jan Beulich @ 2020-09-24 10:58 UTC (permalink / raw)
  To: Oleksandr
  Cc: xen-devel, Oleksandr Tyshchenko, Andrew Cooper, George Dunlap,
	Ian Jackson, Julien Grall, Stefano Stabellini, Wei Liu,
	Roger Pau Monné,
	Paul Durrant, Jun Nakajima, Kevin Tian, Tim Deegan, Julien Grall

On 23.09.2020 14:28, Oleksandr wrote:
> On 22.09.20 18:52, Jan Beulich wrote:
>> On 22.09.2020 17:05, Oleksandr wrote:
>>> 3. *arch.hvm.hvm_io*: We could also use the following:
>>>
>>>      #define ioreq_get_io_completion(v) ((v)->arch.hvm.hvm_io.io_completion)
>>>      #define ioreq_get_io_req(v) ((v)->arch.hvm.hvm_io.io_req)
>>>
>>>      This way struct hvm_vcpu_io won't be used in common code as well.
>> But if Arm needs similar field, why keep them in arch.hvm.hvm_io?
> Yes, Arm needs some of the fields, but not all of them as x86 does.
> For example Arm needs only the following (at least in the context of 
> this series):
> 
> +struct hvm_vcpu_io {
> +    /* I/O request in flight to device model. */
> +    enum hvm_io_completion io_completion;
> +    ioreq_t                io_req;
> +
> +    /*
> +     * HVM emulation:
> +     *  Linear address @mmio_gla maps to MMIO physical frame @mmio_gpfn.
> +     *  The latter is known to be an MMIO frame (not RAM).
> +     *  This translation is only valid for accesses as per @mmio_access.
> +     */
> +    struct npfec        mmio_access;
> +    unsigned long       mmio_gla;
> +    unsigned long       mmio_gpfn;
> +};
> 
> But for x86 the number of fields is quite a bit bigger. If they were 
> applicable to both arches in the same way (as is the case with the 
> ioreq_server struct), I would move them to the common code. I couldn't 
> think of a better idea than just abstracting accesses to the two fields 
> used in common ioreq.c behind macros.

I'm surprised Arm would need all three of the last fields that you
mention. Both here and ...

>>> @@ -247,7 +247,7 @@ static gfn_t hvm_alloc_legacy_ioreq_gfn(struct
>>> hvm_ioreq_server *s)
>>>        for ( i = HVM_PARAM_IOREQ_PFN; i <= HVM_PARAM_BUFIOREQ_PFN; i++ )
>>>        {
>>>            if ( !test_and_clear_bit(i, &d->ioreq_gfn.legacy_mask) )
>>> -            return _gfn(d->arch.hvm.params[i]);
>>> +            return _gfn(ioreq_get_params(d, i));
>>>        }
>>>
>>>        return INVALID_GFN;
>>> @@ -279,7 +279,7 @@ static bool hvm_free_legacy_ioreq_gfn(struct
>>> hvm_ioreq_server *s,
>>>
>>>        for ( i = HVM_PARAM_IOREQ_PFN; i <= HVM_PARAM_BUFIOREQ_PFN; i++ )
>>>        {
>>> -        if ( gfn_eq(gfn, _gfn(d->arch.hvm.params[i])) )
>>> +        if ( gfn_eq(gfn, _gfn(ioreq_get_params(d, i))) )
>>>                 break;
>>>        }
>>>        if ( i > HVM_PARAM_BUFIOREQ_PFN )
>> And these two are needed by Arm? Shouldn't Arm exclusively use
>> the new model, via acquire_resource?
> I dropped the HVMOP plumbing on Arm as requested. Only the acquire 
> interface should be used.
> This code is not supposed to be called on Arm, but it is part of the 
> common code, and we need to find a way to abstract away *arch.hvm.params*.

... here I wonder whether you aren't moving more pieces to common
code than are actually arch-independent. I think a prereq step
missing so far is to clearly identify which pieces of the code
are arch-independent, and work towards abstracting away all of the
arch-dependent ones.

Jan


^ permalink raw reply	[flat|nested] 111+ messages in thread

* Re: [PATCH V1 07/16] xen/dm: Make x86's DM feature common
  2020-09-22 16:46     ` Oleksandr
@ 2020-09-24 11:03       ` Jan Beulich
  2020-09-24 12:47         ` Oleksandr
  0 siblings, 1 reply; 111+ messages in thread
From: Jan Beulich @ 2020-09-24 11:03 UTC (permalink / raw)
  To: Oleksandr
  Cc: xen-devel, Oleksandr Tyshchenko, Andrew Cooper, Wei Liu,
	Roger Pau Monné,
	George Dunlap, Ian Jackson, Julien Grall, Stefano Stabellini,
	Daniel De Graaf, Julien Grall

On 22.09.2020 18:46, Oleksandr wrote:
> 
> On 14.09.20 18:56, Jan Beulich wrote:
> Hi Jan
> 
>> On 10.09.2020 22:22, Oleksandr Tyshchenko wrote:
>>> --- a/xen/include/xen/hypercall.h
>>> +++ b/xen/include/xen/hypercall.h
>>> @@ -150,6 +150,18 @@ do_dm_op(
>>>       unsigned int nr_bufs,
>>>       XEN_GUEST_HANDLE_PARAM(xen_dm_op_buf_t) bufs);
>>>   
>>> +struct dmop_args {
>>> +    domid_t domid;
>>> +    unsigned int nr_bufs;
>>> +    /* Reserve enough buf elements for all current hypercalls. */
>>> +    struct xen_dm_op_buf buf[2];
>>> +};
>>> +
>>> +int arch_dm_op(struct xen_dm_op *op,
>>> +               struct domain *d,
>>> +               const struct dmop_args *op_args,
>>> +               bool *const_op);
>>> +
>>>   #ifdef CONFIG_HYPFS
>>>   extern long
>>>   do_hypfs_op(
>> There are exactly two CUs which need to see these two declarations.
>> Personally I think they should go into a new header, or at least
>> into one that half-way fits (from the pov of its other contents)
>> and doesn't get included by half the code base. But maybe it's
>> just me ...
> 
> I am afraid I didn't get why this header is not suitable for keeping 
> this stuff...

While I have no major objection against exposing arch_dm_op() to more
than just the relevant CUs, I don't think I'd like to see struct
dmop_args becoming visible to "everyone", and in particular changes
to it causing a re-build of (almost) everything.
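A dedicated header could look roughly like this (the file name is only
illustrative):

    /* xen/include/xen/dm.h */
    #ifndef __XEN_DM_H__
    #define __XEN_DM_H__

    #include <public/hvm/dm_op.h>

    struct dmop_args {
        domid_t domid;
        unsigned int nr_bufs;
        /* Reserve enough buf elements for all current hypercalls. */
        struct xen_dm_op_buf buf[2];
    };

    struct domain;

    int arch_dm_op(struct xen_dm_op *op,
                   struct domain *d,
                   const struct dmop_args *op_args,
                   bool *const_op);

    #endif /* __XEN_DM_H__ */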

Jan


^ permalink raw reply	[flat|nested] 111+ messages in thread

* Re: [PATCH V1 09/16] arm/ioreq: Introduce arch specific bits for IOREQ/DM features
  2020-09-23 20:16     ` Oleksandr
@ 2020-09-24 11:08       ` Jan Beulich
  2020-09-24 16:02         ` Oleksandr
  2020-09-24 16:51         ` Julien Grall
  2020-09-24 17:25       ` Julien Grall
  1 sibling, 2 replies; 111+ messages in thread
From: Jan Beulich @ 2020-09-24 11:08 UTC (permalink / raw)
  To: Oleksandr
  Cc: Julien Grall, xen-devel, Oleksandr Tyshchenko,
	Stefano Stabellini, Volodymyr Babchuk, Julien Grall

On 23.09.2020 22:16, Oleksandr wrote:
> On 23.09.20 21:03, Julien Grall wrote:
>> On 10/09/2020 21:22, Oleksandr Tyshchenko wrote:
>>> From: Oleksandr Tyshchenko <oleksandr_tyshchenko@epam.com>
>>> @@ -91,6 +108,28 @@ struct arch_domain
>>>   #endif
>>>   }  __cacheline_aligned;
>>>   +enum hvm_io_completion {
>>> +    HVMIO_no_completion,
>>> +    HVMIO_mmio_completion,
>>> +    HVMIO_pio_completion
>>> +};
>>> +
>>> +struct hvm_vcpu_io {
>>> +    /* I/O request in flight to device model. */
>>> +    enum hvm_io_completion io_completion;
>>> +    ioreq_t                io_req;
>>> +
>>> +    /*
>>> +     * HVM emulation:
>>> +     *  Linear address @mmio_gla maps to MMIO physical frame 
>>> @mmio_gpfn.
>>> +     *  The latter is known to be an MMIO frame (not RAM).
>>> +     *  This translation is only valid for accesses as per 
>>> @mmio_access.
>>> +     */
>>> +    struct npfec        mmio_access;
>>> +    unsigned long       mmio_gla;
>>> +    unsigned long       mmio_gpfn;
>>> +};
>>> +
>>
>> Why do we need to re-define most of this? Can't this just be in common 
>> code?
> 
> Jan asked almost the same question in "[PATCH V1 02/16] xen/ioreq: 
> Make x86's IOREQ feature common".
> Please see my answer there:
> https://patchwork.kernel.org/patch/11769105/#23637511
> 
> Theoretically we could move this to the common code, but the question is 
> what to do with the other struct fields that x86's struct hvm_vcpu_io 
> has/needs but
> Arm's seems not to; would it be possible to logically split struct 
> hvm_vcpu_io into common and arch parts?

Have struct vcpu_io and struct arch_vcpu_io as a sub-part of it?
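I.e. roughly (a sketch; none of the names are final):

    struct vcpu_io {
        /* Arch-independent: I/O request in flight to device model. */
        enum hvm_io_completion  completion;  /* to be moved/renamed too */
        ioreq_t                 req;

        /* x86-only state (mmio_* and friends) would hide behind this. */
        struct arch_vcpu_io     arch;
    };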

Jan





^ permalink raw reply	[flat|nested] 111+ messages in thread

* Re: [PATCH V1 13/16] xen/ioreq: Make x86's invalidate qemu mapcache handling common
  2020-09-22 19:32     ` Oleksandr
@ 2020-09-24 11:16       ` Jan Beulich
  2020-09-24 16:45         ` Oleksandr
  0 siblings, 1 reply; 111+ messages in thread
From: Jan Beulich @ 2020-09-24 11:16 UTC (permalink / raw)
  To: Oleksandr
  Cc: Roger Pau Monné,
	xen-devel, Oleksandr Tyshchenko, Stefano Stabellini,
	Julien Grall, Volodymyr Babchuk, Andrew Cooper, George Dunlap,
	Ian Jackson, Wei Liu, Paul Durrant, Julien Grall

On 22.09.2020 21:32, Oleksandr wrote:
> On 16.09.20 11:50, Jan Beulich wrote:
>> On 10.09.2020 22:22, Oleksandr Tyshchenko wrote:
>>> --- a/xen/common/memory.c
>>> +++ b/xen/common/memory.c
>>> @@ -1651,6 +1651,11 @@ long do_memory_op(unsigned long cmd, XEN_GUEST_HANDLE_PARAM(void) arg)
>>>           break;
>>>       }
>>>   
>>> +#ifdef CONFIG_IOREQ_SERVER
>>> +    if ( op == XENMEM_decrease_reservation )
>>> +        curr_d->qemu_mapcache_invalidate = true;
>>> +#endif
>> I don't see why you put this right into decrease_reservation(). This
>> isn't just to avoid the extra conditional, but first and foremost to
>> avoid bypassing the earlier return from the function (in the case of
>> preemption). In the context of this I wonder whether the ordering of
>> operations in hvm_hypercall() is actually correct.
> Good point, indeed we may return earlier in case of preemption, I missed 
> that.
> Will move it to decrease_reservation(). But, we may return even earlier 
> in case of error...
> Now I am wondering should we move it to the very beginning of command 
> processing or not?

In _this_ series I'd strongly recommend you keep things working as
they are. If independently you think you've found a reason to
re-order certain operations, then feel free to send a patch with
suitable justification.

>> I'm also unconvinced curr_d is the right domain in all cases here;
>> while this may be a pre-existing issue in principle, I'm afraid it
>> gets more pronounced by the logic getting moved to common code.
> 
> Sorry I didn't get your concern here.

Well, you need to be concerned whose qemu_mapcache_invalidate flag
you set.

Jan


^ permalink raw reply	[flat|nested] 111+ messages in thread

* Re: [PATCH V1 07/16] xen/dm: Make x86's DM feature common
  2020-09-24 11:03       ` Jan Beulich
@ 2020-09-24 12:47         ` Oleksandr
  0 siblings, 0 replies; 111+ messages in thread
From: Oleksandr @ 2020-09-24 12:47 UTC (permalink / raw)
  To: Jan Beulich
  Cc: xen-devel, Oleksandr Tyshchenko, Andrew Cooper, Wei Liu,
	Roger Pau Monné,
	George Dunlap, Ian Jackson, Julien Grall, Stefano Stabellini,
	Daniel De Graaf, Julien Grall


On 24.09.20 14:03, Jan Beulich wrote:

Hi Jan

> On 22.09.2020 18:46, Oleksandr wrote:
>> On 14.09.20 18:56, Jan Beulich wrote:
>> Hi Jan
>>
>>> On 10.09.2020 22:22, Oleksandr Tyshchenko wrote:
>>>> --- a/xen/include/xen/hypercall.h
>>>> +++ b/xen/include/xen/hypercall.h
>>>> @@ -150,6 +150,18 @@ do_dm_op(
>>>>        unsigned int nr_bufs,
>>>>        XEN_GUEST_HANDLE_PARAM(xen_dm_op_buf_t) bufs);
>>>>    
>>>> +struct dmop_args {
>>>> +    domid_t domid;
>>>> +    unsigned int nr_bufs;
>>>> +    /* Reserve enough buf elements for all current hypercalls. */
>>>> +    struct xen_dm_op_buf buf[2];
>>>> +};
>>>> +
>>>> +int arch_dm_op(struct xen_dm_op *op,
>>>> +               struct domain *d,
>>>> +               const struct dmop_args *op_args,
>>>> +               bool *const_op);
>>>> +
>>>>    #ifdef CONFIG_HYPFS
>>>>    extern long
>>>>    do_hypfs_op(
>>> There are exactly two CUs which need to see these two declarations.
>>> Personally I think they should go into a new header, or at least
>>> into one that half-way fits (from the pov of its other contents)
>>> and doesn't get included by half the code base. But maybe it's
>>> just me ...
>> I am afraid I didn't get why this header is not suitable for keeping
>> this stuff...
> While I have no major objection against exposing arch_dm_op() to more
> than just the relevant CUs, I don't think I'd like to see struct
> dmop_args becoming visible to "everyone", and in particular changes
> to it causing a re-build of (almost) everything.

Thank you for the clarification, I got your point

-- 
Regards,

Oleksandr Tyshchenko



^ permalink raw reply	[flat|nested] 111+ messages in thread

* Re: [PATCH V1 02/16] xen/ioreq: Make x86's IOREQ feature common
  2020-09-24 10:58                   ` Jan Beulich
@ 2020-09-24 15:38                     ` Oleksandr
  2020-09-24 15:51                       ` Jan Beulich
  0 siblings, 1 reply; 111+ messages in thread
From: Oleksandr @ 2020-09-24 15:38 UTC (permalink / raw)
  To: Jan Beulich
  Cc: xen-devel, Oleksandr Tyshchenko, Andrew Cooper, George Dunlap,
	Ian Jackson, Julien Grall, Stefano Stabellini, Wei Liu,
	Roger Pau Monné,
	Paul Durrant, Jun Nakajima, Kevin Tian, Tim Deegan, Julien Grall


On 24.09.20 13:58, Jan Beulich wrote:

Hi Jan

> On 23.09.2020 14:28, Oleksandr wrote:
>> On 22.09.20 18:52, Jan Beulich wrote:
>>> On 22.09.2020 17:05, Oleksandr wrote:
>>>> 3. *arch.hvm.hvm_io*: We could also use the following:
>>>>
>>>>       #define ioreq_get_io_completion(v) ((v)->arch.hvm.hvm_io.io_completion)
>>>>       #define ioreq_get_io_req(v) ((v)->arch.hvm.hvm_io.io_req)
>>>>
>>>>       This way struct hvm_vcpu_io won't be used in common code as well.
>>> But if Arm needs similar field, why keep them in arch.hvm.hvm_io?
>> Yes, Arm needs some of the fields, but not all of them as x86 does.
>> For example Arm needs only the following (at least in the context of
>> this series):
>>
>> +struct hvm_vcpu_io {
>> +    /* I/O request in flight to device model. */
>> +    enum hvm_io_completion io_completion;
>> +    ioreq_t                io_req;
>> +
>> +    /*
>> +     * HVM emulation:
>> +     *  Linear address @mmio_gla maps to MMIO physical frame @mmio_gpfn.
>> +     *  The latter is known to be an MMIO frame (not RAM).
>> +     *  This translation is only valid for accesses as per @mmio_access.
>> +     */
>> +    struct npfec        mmio_access;
>> +    unsigned long       mmio_gla;
>> +    unsigned long       mmio_gpfn;
>> +};
>>
>> But for x86 the number of fields is quite a bit bigger. If they were
>> applicable to both arches in the same way (as is the case with the
>> ioreq_server struct), I would move them to the common code. I couldn't
>> think of a better idea than just abstracting accesses to the two fields
>> used in common ioreq.c behind macros.
> I'm surprised Arm would need all three of the last fields that you
> mention. Both here and ...
>
>>>> @@ -247,7 +247,7 @@ static gfn_t hvm_alloc_legacy_ioreq_gfn(struct
>>>> hvm_ioreq_server *s)
>>>>         for ( i = HVM_PARAM_IOREQ_PFN; i <= HVM_PARAM_BUFIOREQ_PFN; i++ )
>>>>         {
>>>>             if ( !test_and_clear_bit(i, &d->ioreq_gfn.legacy_mask) )
>>>> -            return _gfn(d->arch.hvm.params[i]);
>>>> +            return _gfn(ioreq_get_params(d, i));
>>>>         }
>>>>
>>>>         return INVALID_GFN;
>>>> @@ -279,7 +279,7 @@ static bool hvm_free_legacy_ioreq_gfn(struct
>>>> hvm_ioreq_server *s,
>>>>
>>>>         for ( i = HVM_PARAM_IOREQ_PFN; i <= HVM_PARAM_BUFIOREQ_PFN; i++ )
>>>>         {
>>>> -        if ( gfn_eq(gfn, _gfn(d->arch.hvm.params[i])) )
>>>> +        if ( gfn_eq(gfn, _gfn(ioreq_get_params(d, i))) )
>>>>                  break;
>>>>         }
>>>>         if ( i > HVM_PARAM_BUFIOREQ_PFN )
>>> And these two are needed by Arm? Shouldn't Arm exclusively use
>>> the new model, via acquire_resource?
>> I dropped the HVMOP plumbing on Arm as requested. Only the acquire
>> interface should be used.
>> This code is not supposed to be called on Arm, but it is part of the
>> common code, and we need to find a way to abstract away *arch.hvm.params*.
> ... here I wonder whether you aren't moving more pieces to common
> code than are actually arch-independent. I think a prereq step
> missing so far is to clearly identify which pieces of the code
> are arch-independent, and work towards abstracting away all of the
> arch-dependent ones.
Unfortunately, not all things are clear and obvious from the very beginning.
I have to admit, I didn't even imagine earlier that *arch.hvm.* usage in 
the common code was a layering violation issue.
Hopefully, it is clear now, as are the steps to avoid it in the future.

...


I saw your advice (but haven't answered there yet) regarding splitting 
struct hvm_vcpu_io in
[PATCH V1 09/16] arm/ioreq: Introduce arch specific bits for IOREQ/DM 
features. I think it makes sense.
The only remaining bit I would like to clarify here is 
*arch.hvm.params*. Do we really want to move the HVM params field to the 
common code
rather than abstracting accesses away with the proposed macro? Although it 
stores a few IOREQ params, it is not (completely) IOREQ stuff, and it is 
specific to the architecture.

-- 
Regards,

Oleksandr Tyshchenko



^ permalink raw reply	[flat|nested] 111+ messages in thread

* Re: [PATCH V1 02/16] xen/ioreq: Make x86's IOREQ feature common
  2020-09-24 15:38                     ` Oleksandr
@ 2020-09-24 15:51                       ` Jan Beulich
  0 siblings, 0 replies; 111+ messages in thread
From: Jan Beulich @ 2020-09-24 15:51 UTC (permalink / raw)
  To: Oleksandr
  Cc: xen-devel, Oleksandr Tyshchenko, Andrew Cooper, George Dunlap,
	Ian Jackson, Julien Grall, Stefano Stabellini, Wei Liu,
	Roger Pau Monné,
	Paul Durrant, Jun Nakajima, Kevin Tian, Tim Deegan, Julien Grall

On 24.09.2020 17:38, Oleksandr wrote:
> On 24.09.20 13:58, Jan Beulich wrote:
>> On 23.09.2020 14:28, Oleksandr wrote:
>>> On 22.09.20 18:52, Jan Beulich wrote:
>>>> On 22.09.2020 17:05, Oleksandr wrote:
>>>>> @@ -247,7 +247,7 @@ static gfn_t hvm_alloc_legacy_ioreq_gfn(struct
>>>>> hvm_ioreq_server *s)
>>>>>         for ( i = HVM_PARAM_IOREQ_PFN; i <= HVM_PARAM_BUFIOREQ_PFN; i++ )
>>>>>         {
>>>>>             if ( !test_and_clear_bit(i, &d->ioreq_gfn.legacy_mask) )
>>>>> -            return _gfn(d->arch.hvm.params[i]);
>>>>> +            return _gfn(ioreq_get_params(d, i));
>>>>>         }
>>>>>
>>>>>         return INVALID_GFN;
>>>>> @@ -279,7 +279,7 @@ static bool hvm_free_legacy_ioreq_gfn(struct
>>>>> hvm_ioreq_server *s,
>>>>>
>>>>>         for ( i = HVM_PARAM_IOREQ_PFN; i <= HVM_PARAM_BUFIOREQ_PFN; i++ )
>>>>>         {
>>>>> -        if ( gfn_eq(gfn, _gfn(d->arch.hvm.params[i])) )
>>>>> +        if ( gfn_eq(gfn, _gfn(ioreq_get_params(d, i))) )
>>>>>                  break;
>>>>>         }
>>>>>         if ( i > HVM_PARAM_BUFIOREQ_PFN )
>>>> And these two are needed by Arm? Shouldn't Arm exclusively use
>>>> the new model, via acquire_resource?
>>> I dropped the HVMOP plumbing on Arm as requested. Only the acquire
>>> interface should be used.
>>> This code is not supposed to be called on Arm, but it is part of the
>>> common code, and we need to find a way to abstract away *arch.hvm.params*.
>> ... here I wonder whether you aren't moving more pieces to common
>> code than are actually arch-independent. I think a prereq step
>> missing so far is to clearly identify which pieces of the code
>> are arch-independent, and work towards abstracting away all of the
>> arch-dependent ones.
> Unfortunately, not all things are clear and obvious from the very beginning.
> I have to admit, I didn't even imagine earlier that *arch.hvm.* usage in 
> the common code is a layering violation issue.
> Hopefully, now it is clear as well as the steps to avoid it in future.
> 
> ...
> 
> 
> I saw your advise (but haven't answered yet there) regarding splitting 
> struct hvm_vcpu_io in
> [PATCH V1 09/16] arm/ioreq: Introduce arch specific bits for IOREQ/DM 
> features. I think, it makes sense.
> The only remaining bits I would like to clarify here is 
> *arch.hvm.params*. Should we really want to move HVM params field to the 
> common code
> rather than abstracting away by proposed macro?

I don't think I suggested doing so. In fact I recall having voiced
my expectation that Arm wouldn't use this at all. So yes, I agree
this had better not be moved out of arch.hvm; instead, accesses
should be abstracted by other means.
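I.e. leave the field in x86's arch.hvm and let each arch provide the
accessor used by the common code, e.g. (a sketch):

    /* In asm-x86/hvm/ioreq.h: */
    #define ioreq_get_params(d, i) ((d)->arch.hvm.params[i])

with Arm defining the equivalent for its own params array, even though
the legacy-gfn paths using it should be unreachable there.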

Jan


^ permalink raw reply	[flat|nested] 111+ messages in thread

* Re: [PATCH V1 09/16] arm/ioreq: Introduce arch specific bits for IOREQ/DM features
  2020-09-24 11:08       ` Jan Beulich
@ 2020-09-24 16:02         ` Oleksandr
  2020-09-24 18:02           ` Oleksandr
  2020-09-24 16:51         ` Julien Grall
  1 sibling, 1 reply; 111+ messages in thread
From: Oleksandr @ 2020-09-24 16:02 UTC (permalink / raw)
  To: Jan Beulich
  Cc: Julien Grall, xen-devel, Oleksandr Tyshchenko,
	Stefano Stabellini, Volodymyr Babchuk, Julien Grall


On 24.09.20 14:08, Jan Beulich wrote:

Hi Jan

> On 23.09.2020 22:16, Oleksandr wrote:
>> On 23.09.20 21:03, Julien Grall wrote:
>>> On 10/09/2020 21:22, Oleksandr Tyshchenko wrote:
>>>> From: Oleksandr Tyshchenko <oleksandr_tyshchenko@epam.com>
>>>> @@ -91,6 +108,28 @@ struct arch_domain
>>>>    #endif
>>>>    }  __cacheline_aligned;
>>>>    +enum hvm_io_completion {
>>>> +    HVMIO_no_completion,
>>>> +    HVMIO_mmio_completion,
>>>> +    HVMIO_pio_completion
>>>> +};
>>>> +
>>>> +struct hvm_vcpu_io {
>>>> +    /* I/O request in flight to device model. */
>>>> +    enum hvm_io_completion io_completion;
>>>> +    ioreq_t                io_req;
>>>> +
>>>> +    /*
>>>> +     * HVM emulation:
>>>> +     *  Linear address @mmio_gla maps to MMIO physical frame
>>>> @mmio_gpfn.
>>>> +     *  The latter is known to be an MMIO frame (not RAM).
>>>> +     *  This translation is only valid for accesses as per
>>>> @mmio_access.
>>>> +     */
>>>> +    struct npfec        mmio_access;
>>>> +    unsigned long       mmio_gla;
>>>> +    unsigned long       mmio_gpfn;
>>>> +};
>>>> +
>>> Why do we need to re-define most of this? Can't this just be in common
>>> code?
>> Jan asked almost the same question in "[PATCH V1 02/16] xen/ioreq:
>> Make x86's IOREQ feature common".
>> Please see my answer there:
>> https://patchwork.kernel.org/patch/11769105/#23637511
>>
>> Theoretically we could move this to the common code, but the question is
>> what to do with the other struct fields that x86's struct hvm_vcpu_io
>> has/needs but
>> Arm's seems not to; would it be possible to logically split struct
>> hvm_vcpu_io into common and arch parts?
> Have struct vcpu_io and struct arch_vcpu_io as a sub-part of it?
Although it is going to pull a lot of changes into x86/hvm/*, yes, this way
we could indeed logically split struct hvm_vcpu_io into common and arch 
parts in a clear way.
If it is really worth it, I will start looking into it.

-- 
Regards,

Oleksandr Tyshchenko



^ permalink raw reply	[flat|nested] 111+ messages in thread

* Re: [PATCH V1 13/16] xen/ioreq: Make x86's invalidate qemu mapcache handling common
  2020-09-24 11:16       ` Jan Beulich
@ 2020-09-24 16:45         ` Oleksandr
  2020-09-25  7:03           ` Jan Beulich
  0 siblings, 1 reply; 111+ messages in thread
From: Oleksandr @ 2020-09-24 16:45 UTC (permalink / raw)
  To: Jan Beulich
  Cc: Roger Pau Monné,
	xen-devel, Oleksandr Tyshchenko, Stefano Stabellini,
	Julien Grall, Volodymyr Babchuk, Andrew Cooper, George Dunlap,
	Ian Jackson, Wei Liu, Paul Durrant, Julien Grall


On 24.09.20 14:16, Jan Beulich wrote:

Hi Jan

> On 22.09.2020 21:32, Oleksandr wrote:
>> On 16.09.20 11:50, Jan Beulich wrote:
>>> On 10.09.2020 22:22, Oleksandr Tyshchenko wrote:
>>>> --- a/xen/common/memory.c
>>>> +++ b/xen/common/memory.c
>>>> @@ -1651,6 +1651,11 @@ long do_memory_op(unsigned long cmd, XEN_GUEST_HANDLE_PARAM(void) arg)
>>>>            break;
>>>>        }
>>>>    
>>>> +#ifdef CONFIG_IOREQ_SERVER
>>>> +    if ( op == XENMEM_decrease_reservation )
>>>> +        curr_d->qemu_mapcache_invalidate = true;
>>>> +#endif
>>> I don't see why you put this right into decrease_reservation(). This
>>> isn't just to avoid the extra conditional, but first and foremost to
>>> avoid bypassing the earlier return from the function (in the case of
>>> preemption). In the context of this I wonder whether the ordering of
>>> operations in hvm_hypercall() is actually correct.
>> Good point, indeed we may return earlier in case of preemption, I missed
>> that.
>> Will move it to decrease_reservation(). But, we may return even earlier
>> in case of error...
>> Now I am wondering should we move it to the very beginning of command
>> processing or not?
> In _this_ series I'd strongly recommend you keep things working as
> they are. If independently you think you've found a reason to
> re-order certain operations, then feel free to send a patch with
> suitable justification.

Of course, I will try to retain current behavior.


>>> I'm also unconvinced curr_d is the right domain in all cases here;
>>> while this may be a pre-existing issue in principle, I'm afraid it
>>> gets more pronounced by the logic getting moved to common code.
>> Sorry I didn't get your concern here.
> Well, you need to be concerned whose qemu_mapcache_invalidate flag
> you set.
May I ask in what cases *curr_d* is the right domain?

We need to make sure that the domain is using IOREQ server(s) at least. 
Hopefully, we have a helper for this,
which is hvm_domain_has_ioreq_server(). Please clarify: is there anything 
else I should be taking care of?
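What I have in mind is roughly this (a sketch; the
hvm_domain_has_ioreq_server() guard is the helper introduced later in
this series, and which domain's flag to set is exactly your open
question):

    /* At the very end of decrease_reservation() in xen/common/memory.c. */
    #ifdef CONFIG_IOREQ_SERVER
        /* Don't set the flag if we bailed out early due to preemption. */
        if ( !a->preempted && hvm_domain_has_ioreq_server(a->domain) )
            a->domain->qemu_mapcache_invalidate = true;
    #endif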


-- 
Regards,

Oleksandr Tyshchenko



^ permalink raw reply	[flat|nested] 111+ messages in thread

* Re: [PATCH V1 09/16] arm/ioreq: Introduce arch specific bits for IOREQ/DM features
  2020-09-24 11:08       ` Jan Beulich
  2020-09-24 16:02         ` Oleksandr
@ 2020-09-24 16:51         ` Julien Grall
  1 sibling, 0 replies; 111+ messages in thread
From: Julien Grall @ 2020-09-24 16:51 UTC (permalink / raw)
  To: Jan Beulich, Oleksandr
  Cc: xen-devel, Oleksandr Tyshchenko, Stefano Stabellini,
	Volodymyr Babchuk, Julien Grall



On 24/09/2020 12:08, Jan Beulich wrote:
> On 23.09.2020 22:16, Oleksandr wrote:
>> On 23.09.20 21:03, Julien Grall wrote:
>>> On 10/09/2020 21:22, Oleksandr Tyshchenko wrote:
>>>> From: Oleksandr Tyshchenko <oleksandr_tyshchenko@epam.com>
>>>> @@ -91,6 +108,28 @@ struct arch_domain
>>>>    #endif
>>>>    }  __cacheline_aligned;
>>>>    +enum hvm_io_completion {
>>>> +    HVMIO_no_completion,
>>>> +    HVMIO_mmio_completion,
>>>> +    HVMIO_pio_completion
>>>> +};
>>>> +
>>>> +struct hvm_vcpu_io {
>>>> +    /* I/O request in flight to device model. */
>>>> +    enum hvm_io_completion io_completion;
>>>> +    ioreq_t                io_req;
>>>> +
>>>> +    /*
>>>> +     * HVM emulation:
>>>> +     *  Linear address @mmio_gla maps to MMIO physical frame
>>>> @mmio_gpfn.
>>>> +     *  The latter is known to be an MMIO frame (not RAM).
>>>> +     *  This translation is only valid for accesses as per
>>>> @mmio_access.
>>>> +     */
>>>> +    struct npfec        mmio_access;
>>>> +    unsigned long       mmio_gla;
>>>> +    unsigned long       mmio_gpfn;
>>>> +};
>>>> +
>>>
>>> Why do we need to re-define most of this? Can't this just be in common
>>> code?
>>
>> Jan asked almost the same question in "[PATCH V1 02/16] xen/ioreq:
>> Make x86's IOREQ feature common".
>> Please see my answer there:
>> https://patchwork.kernel.org/patch/11769105/#23637511
>>
>> Theoretically we could move this to the common code, but the question is
>> what to do with the other struct fields that x86's struct hvm_vcpu_io
>> has/needs but
>> Arm's seems not to; would it be possible to logically split struct
>> hvm_vcpu_io into common and arch parts?
> 
> Have struct vcpu_io and struct arch_vcpu_io as a sub-part of it?

+1 for the idea.

Cheers,

-- 
Julien Grall


^ permalink raw reply	[flat|nested] 111+ messages in thread

* Re: [PATCH V1 09/16] arm/ioreq: Introduce arch specific bits for IOREQ/DM features
  2020-09-23 20:16     ` Oleksandr
  2020-09-24 11:08       ` Jan Beulich
@ 2020-09-24 17:25       ` Julien Grall
  2020-09-24 18:22         ` Oleksandr
  1 sibling, 1 reply; 111+ messages in thread
From: Julien Grall @ 2020-09-24 17:25 UTC (permalink / raw)
  To: Oleksandr
  Cc: xen-devel, Oleksandr Tyshchenko, Stefano Stabellini,
	Volodymyr Babchuk, Julien Grall



On 23/09/2020 21:16, Oleksandr wrote:
> 
> On 23.09.20 21:03, Julien Grall wrote:
>> Hi,
> 
> Hi Julien
> 
> 
>>
>> On 10/09/2020 21:22, Oleksandr Tyshchenko wrote:
>>> From: Oleksandr Tyshchenko <oleksandr_tyshchenko@epam.com>
>>
>> I believe I am the originally author of this code...
> 
> Sorry, will fix
> 
> 
>>
>>> This patch adds basic IOREQ/DM support on Arm. The subsequent
>>> patches will improve functionality, add remaining bits as well as
>>> address several TODOs.
>>
>> I find it a bit weird to add code with TODOs that are handled in the same 
>> series. Can't we just split this patch into smaller ones where everything 
>> is addressed from the start?
> Sorry if I wasn't clear in the description. Let me please clarify.
> The corresponding RFC patch had 3 major TODOs:
> 1. Handle properly when hvm_send_ioreq() returns IO_RETRY
> 2. Proper ref-counting for the foreign entries in set_foreign_p2m_entry()
> 3. Check the return value of handle_hvm_io_completion() *and* avoid 
> calling handle_hvm_io_completion() on every return.
> 
> TODO #1 was fixed in the current patch.
> TODO #2 was fixed in "xen/mm: Handle properly reference in 
> set_foreign_p2m_entry() on Arm".
> TODO #3 was partially fixed in the current patch (check the return value of 
> handle_hvm_io_completion()).
> The second part of TODO #3 (avoid calling handle_hvm_io_completion() on 
> every return) was moved to a separate patch, "xen/ioreq: Introduce 
> hvm_domain_has_ioreq_server()",
> and fixed (or probably improved is a better word) there, along with 
> introducing a mechanism to actually achieve the improvement.

Right, none of those TODOs are described in the code. So it makes it more 
difficult to know what you are actually referring to.

I would suggest reshuffling the series so the TODOs are addressed 
earlier when possible.

> 
> Could you please clarify how this patch could be split into smaller ones?

This patch is going to be reduced a fair bit if you make some of the 
structures common. The next step would be to move anything that is not 
directly related to IOREQ out.

 From a quick look, there are a few things that can be moved into separate 
patches:
    - The addition of the ASSERT_UNREACHABLE()
    - The addition of the loop in leave_hypervisor_to_guest(), as I think 
it deserves some explanation.
    - The sign extension in handle_ioserv() can possibly be abstracted (see 
the sketch below). Actually the code is quite similar to handle_read().
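For the last item, a possible shared helper (a sketch based on the hunk
quoted below; the helper name is illustrative):

    /* Usable from both handle_read() and handle_ioserv(). */
    static inline register_t sign_extend(const struct hsr_dabt dabt,
                                         register_t r)
    {
        uint8_t size = (1 << dabt.size) * 8;

        /*
         * Sign extend if required.
         * Note that we expect the read handler to have zeroed the bits
         * outside the requested access size.
         */
        if ( dabt.sign && (r & (1UL << (size - 1))) )
        {
            /*
             * We are relying on register_t using the same as
             * an unsigned long in order to keep the 32-bit assembly
             * code smaller.
             */
            BUILD_BUG_ON(sizeof(register_t) != sizeof(unsigned long));
            r |= (~0UL) << size;
        }

        return r;
    }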

> 
> 
>>
>>
>>>
>>> Please note, the "PIO handling" TODO is expected to be left unaddressed
>>> for the current series. It is not a big issue for now while Xen
>>> doesn't have support for vPCI on Arm. On Arm64, PIOs are only used
>>> for the PCI IO Bar, and we would probably want to expose them to the
>>> emulator as PIO accesses to make a DM completely arch-agnostic. So
>>> "PIO handling" should be implemented when we add support for vPCI.
>>>
>>> Please note, at the moment build on Arm32 is broken (see cmpxchg
>>> usage in hvm_send_buffered_ioreq()) due to the lack of cmpxchg_64
>>> support on Arm32. There is a patch on review to address this issue:
>>> https://patchwork.kernel.org/patch/11715559/
>>
>> This has been committed.
> 
> Thank you for the patch, will remove the notice.

For future reference, I think such a notice would be better placed after 
the "---" marker, as it doesn't need to be part of the commit message.

> 
>>
>>
>>> +    if ( dabt.write )
>>> +        return IO_HANDLED;
>>> +
>>> +    /*
>>> +     * Sign extend if required.
>>> +     * Note that we expect the read handler to have zeroed the bits
>>> +     * outside the requested access size.
>>> +     */
>>> +    if ( dabt.sign && (r & (1UL << (size - 1))) )
>>> +    {
>>> +        /*
>>> +         * We are relying on register_t using the same as
>>> +         * an unsigned long in order to keep the 32-bit assembly
>>> +         * code smaller.
>>> +         */
>>> +        BUILD_BUG_ON(sizeof(register_t) != sizeof(unsigned long));
>>> +        r |= (~0UL) << size;
>>> +    }
>>> +
>>> +    set_user_reg(regs, dabt.reg, r);
>>> +
>>> +    return IO_HANDLED;
>>> +}
>>> +
>>> +enum io_state try_fwd_ioserv(struct cpu_user_regs *regs,
>>> +                             struct vcpu *v, mmio_info_t *info)
>>> +{
>>> +    struct hvm_vcpu_io *vio = &v->arch.hvm.hvm_io;
>>> +    ioreq_t p = {
>>> +        .type = IOREQ_TYPE_COPY,
>>> +        .addr = info->gpa,
>>> +        .size = 1 << info->dabt.size,
>>> +        .count = 1,
>>> +        .dir = !info->dabt.write,
>>> +        /*
>>> +         * On x86, df is used by 'rep' instruction to tell the 
>>> direction
>>> +         * to iterate (forward or backward).
>>> +         * On Arm, all the accesses to MMIO region will do a single
>>> +         * memory access. So for now, we can safely always set to 0.
>>> +         */
>>> +        .df = 0,
>>> +        .data = get_user_reg(regs, info->dabt.reg),
>>> +        .state = STATE_IOREQ_READY,
>>> +    };
>>> +    struct hvm_ioreq_server *s = NULL;
>>> +    enum io_state rc;
>>> +
>>> +    switch ( vio->io_req.state )
>>> +    {
>>> +    case STATE_IOREQ_NONE:
>>> +        break;
>>> +
>>> +    case STATE_IORESP_READY:
>>> +        return IO_HANDLED;
>>> +
>>> +    default:
>>> +        gdprintk(XENLOG_ERR, "wrong state %u\n", vio->io_req.state);
>>> +        return IO_ABORT;
>>> +    }
>>> +
>>> +    s = hvm_select_ioreq_server(v->domain, &p);
>>> +    if ( !s )
>>> +        return IO_UNHANDLED;
>>> +
>>> +    if ( !info->dabt.valid )
>>> +        return IO_ABORT;
>>> +
>>> +    vio->io_req = p;
>>> +
>>> +    rc = hvm_send_ioreq(s, &p, 0);
>>> +    if ( rc != IO_RETRY || v->domain->is_shutting_down )
>>> +        vio->io_req.state = STATE_IOREQ_NONE;
>>> +    else if ( !hvm_ioreq_needs_completion(&vio->io_req) )
>>> +        rc = IO_HANDLED;
>>> +    else
>>> +        vio->io_completion = HVMIO_mmio_completion;
>>> +
>>> +    return rc;
>>> +}
>>> +
>>> +bool ioreq_handle_complete_mmio(void)
>>> +{
>>> +    struct vcpu *v = current;
>>> +    struct cpu_user_regs *regs = guest_cpu_user_regs();
>>> +    const union hsr hsr = { .bits = regs->hsr };
>>> +    paddr_t addr = v->arch.hvm.hvm_io.io_req.addr;
>>> +
>>> +    if ( try_handle_mmio(regs, hsr, addr) == IO_HANDLED )
>>> +    {
>>> +        advance_pc(regs, hsr);
>>> +        return true;
>>> +    }
>>> +
>>> +    return false;
>>> +}
>>> +
>>> +/*
>>> + * Local variables:
>>> + * mode: C
>>> + * c-file-style: "BSD"
>>> + * c-basic-offset: 4
>>> + * tab-width: 4
>>> + * indent-tabs-mode: nil
>>> + * End:
>>> + */
>>> diff --git a/xen/arch/arm/traps.c b/xen/arch/arm/traps.c
>>> index 8f40d0e..121942c 100644
>>> --- a/xen/arch/arm/traps.c
>>> +++ b/xen/arch/arm/traps.c
>>> @@ -21,6 +21,7 @@
>>>   #include <xen/hypercall.h>
>>>   #include <xen/init.h>
>>>   #include <xen/iocap.h>
>>> +#include <xen/ioreq.h>
>>>   #include <xen/irq.h>
>>>   #include <xen/lib.h>
>>>   #include <xen/mem_access.h>
>>> @@ -1384,6 +1385,9 @@ static arm_hypercall_t arm_hypercall_table[] = {
>>>   #ifdef CONFIG_HYPFS
>>>       HYPERCALL(hypfs_op, 5),
>>>   #endif
>>> +#ifdef CONFIG_IOREQ_SERVER
>>> +    HYPERCALL(dm_op, 3),
>>> +#endif
>>>   };
>>>     #ifndef NDEBUG
>>> @@ -1955,9 +1959,14 @@ static void do_trap_stage2_abort_guest(struct 
>>> cpu_user_regs *regs,
>>>               case IO_HANDLED:
>>>                   advance_pc(regs, hsr);
>>>                   return;
>>> +            case IO_RETRY:
>>> +                /* finish later */
>>> +                return;
>>>               case IO_UNHANDLED:
>>>                   /* IO unhandled, try another way to handle it. */
>>>                   break;
>>> +            default:
>>> +                ASSERT_UNREACHABLE();
>>>               }
>>>           }
>>>   @@ -2249,12 +2258,23 @@ static void check_for_pcpu_work(void)
>>>    * Process pending work for the vCPU. Any call should be fast or
>>>    * implement preemption.
>>>    */
>>> -static void check_for_vcpu_work(void)
>>> +static bool check_for_vcpu_work(void)
>>>   {
>>>       struct vcpu *v = current;
>>>   +#ifdef CONFIG_IOREQ_SERVER
>>> +    bool handled;
>>> +
>>> +    local_irq_enable();
>>> +    handled = handle_hvm_io_completion(v);
>>> +    local_irq_disable();
>>> +
>>> +    if ( !handled )
>>> +        return true;
>>> +#endif
>>> +
>>>       if ( likely(!v->arch.need_flush_to_ram) )
>>> -        return;
>>> +        return false;
>>>         /*
>>>        * Give a chance for the pCPU to process work before handling 
>>> the vCPU
>>> @@ -2265,6 +2285,8 @@ static void check_for_vcpu_work(void)
>>>       local_irq_enable();
>>>       p2m_flush_vm(v);
>>>       local_irq_disable();
>>> +
>>> +    return false;
>>>   }
>>>     /*
>>> @@ -2277,8 +2299,10 @@ void leave_hypervisor_to_guest(void)
>>>   {
>>>       local_irq_disable();
>>>   -    check_for_vcpu_work();
>>> -    check_for_pcpu_work();
>>> +    do
>>> +    {
>>> +        check_for_pcpu_work();
>>> +    } while ( check_for_vcpu_work() );
>>>         vgic_sync_to_lrs();
>>>   diff --git a/xen/include/asm-arm/domain.h 
>>> b/xen/include/asm-arm/domain.h
>>> index 6819a3b..d1c48d7 100644
>>> --- a/xen/include/asm-arm/domain.h
>>> +++ b/xen/include/asm-arm/domain.h
>>> @@ -11,10 +11,27 @@
>>>   #include <asm/vgic.h>
>>>   #include <asm/vpl011.h>
>>>   #include <public/hvm/params.h>
>>> +#include <public/hvm/dm_op.h>
>>> +#include <public/hvm/ioreq.h>
>>> +
>>> +#define MAX_NR_IOREQ_SERVERS 8
>>>     struct hvm_domain
>>>   {
>>>       uint64_t              params[HVM_NR_PARAMS];
>>> +
>>> +    /* Guest page range used for non-default ioreq servers */
>>> +    struct {
>>> +        unsigned long base;
>>> +        unsigned long mask;
>>> +        unsigned long legacy_mask; /* indexed by HVM param number */
>>> +    } ioreq_gfn;
>>> +
>>> +    /* Lock protects all other values in the sub-struct and the 
>>> default */
>>> +    struct {
>>> +        spinlock_t              lock;
>>> +        struct hvm_ioreq_server *server[MAX_NR_IOREQ_SERVERS];
>>> +    } ioreq_server;
>>>   };
>>>     #ifdef CONFIG_ARM_64
>>> @@ -91,6 +108,28 @@ struct arch_domain
>>>   #endif
>>>   }  __cacheline_aligned;
>>>   +enum hvm_io_completion {
>>> +    HVMIO_no_completion,
>>> +    HVMIO_mmio_completion,
>>> +    HVMIO_pio_completion
>>> +};
>>> +
>>> +struct hvm_vcpu_io {
>>> +    /* I/O request in flight to device model. */
>>> +    enum hvm_io_completion io_completion;
>>> +    ioreq_t                io_req;
>>> +
>>> +    /*
>>> +     * HVM emulation:
>>> +     *  Linear address @mmio_gla maps to MMIO physical frame 
>>> @mmio_gpfn.
>>> +     *  The latter is known to be an MMIO frame (not RAM).
>>> +     *  This translation is only valid for accesses as per 
>>> @mmio_access.
>>> +     */
>>> +    struct npfec        mmio_access;
>>> +    unsigned long       mmio_gla;
>>> +    unsigned long       mmio_gpfn;
>>> +};
>>> +
>>
>> Why do we need to re-define most of this? Can't this just be in common 
>> code?
> 
> Jan asked almost the same question in "[PATCH V1 02/16] xen/ioreq: 
> Make x86's IOREQ feature common".
> Please see my answer there:
> https://patchwork.kernel.org/patch/11769105/#23637511
> 
> Theoretically we could move this to the common code, but the question is 
> what to do with the other struct fields that x86's struct hvm_vcpu_io 
> has/needs but
> Arm's seems not to; would it be possible to logically split struct 
> hvm_vcpu_io into common and arch parts?
> 
> struct hvm_vcpu_io {
>      /* I/O request in flight to device model. */
>      enum hvm_io_completion io_completion;
>      ioreq_t                io_req;
> 
>      /*
>       * HVM emulation:
>       *  Linear address @mmio_gla maps to MMIO physical frame @mmio_gpfn.
>       *  The latter is known to be an MMIO frame (not RAM).
>       *  This translation is only valid for accesses as per @mmio_access.
>       */
>      struct npfec        mmio_access;
>      unsigned long       mmio_gla;
>      unsigned long       mmio_gpfn;
> 
>      /*
>       * We may need to handle up to 3 distinct memory accesses per
>       * instruction.
>       */
>      struct hvm_mmio_cache mmio_cache[3];
>      unsigned int mmio_cache_count;
> 
>      /* For retries we shouldn't re-fetch the instruction. */
>      unsigned int mmio_insn_bytes;
>      unsigned char mmio_insn[16];
>      struct hvmemul_cache *cache;
> 
>      /*
>       * For string instruction emulation we need to be able to signal a
>       * necessary retry through other than function return codes.
>       */
>      bool_t mmio_retry;
> 
>      unsigned long msix_unmask_address;
>      unsigned long msix_snoop_address;
>      unsigned long msix_snoop_gpa;
> 
>      const struct g2m_ioport *g2m_ioport;
> };

I think Jan made a suggestion today. Let me know if you require more 
input.

Cheers,

-- 
Julien Grall


^ permalink raw reply	[flat|nested] 111+ messages in thread

* Re: [PATCH V1 02/16] xen/ioreq: Make x86's IOREQ feature common
  2020-09-10 20:21 ` [PATCH V1 02/16] xen/ioreq: Make x86's IOREQ feature common Oleksandr Tyshchenko
  2020-09-14 14:17   ` Jan Beulich
@ 2020-09-24 18:01   ` Julien Grall
  2020-09-25  8:19     ` Paul Durrant
  1 sibling, 1 reply; 111+ messages in thread
From: Julien Grall @ 2020-09-24 18:01 UTC (permalink / raw)
  To: Oleksandr Tyshchenko, xen-devel
  Cc: Oleksandr Tyshchenko, Andrew Cooper, George Dunlap, Ian Jackson,
	Jan Beulich, Stefano Stabellini, Wei Liu, Roger Pau Monné,
	Paul Durrant, Jun Nakajima, Kevin Tian, Tim Deegan, Julien Grall



On 10/09/2020 21:21, Oleksandr Tyshchenko wrote:
> +static bool hvm_wait_for_io(struct hvm_ioreq_vcpu *sv, ioreq_t *p)
> +{
> +    unsigned int prev_state = STATE_IOREQ_NONE;
> +    unsigned int state = p->state;
> +    uint64_t data = ~0;
> +
> +    smp_rmb();
> +
> +    /*
> +     * The only reason we should see this condition be false is when an
> +     * emulator dying races with I/O being requested.
> +     */
> +    while ( likely(state != STATE_IOREQ_NONE) )
> +    {
> +        if ( unlikely(state < prev_state) )
> +        {
> +            gdprintk(XENLOG_ERR, "Weird HVM ioreq state transition %u -> %u\n",
> +                     prev_state, state);
> +            sv->pending = false;
> +            domain_crash(sv->vcpu->domain);
> +            return false; /* bail */
> +        }
> +
> +        switch ( prev_state = state )
> +        {
> +        case STATE_IORESP_READY: /* IORESP_READY -> NONE */
> +            p->state = STATE_IOREQ_NONE;
> +            data = p->data;
> +            break;
> +
> +        case STATE_IOREQ_READY:  /* IOREQ_{READY,INPROCESS} -> IORESP_READY */
> +        case STATE_IOREQ_INPROCESS:
> +            wait_on_xen_event_channel(sv->ioreq_evtchn,
> +                                      ({ state = p->state;
> +                                         smp_rmb();
> +                                         state != prev_state; }));
> +            continue;

As I pointed out previously [1], this helper was implemented with the 
expectation that wait_on_xen_event_channel() will not return if the vCPU 
got rescheduled.

However, this assumption doesn't hold on Arm.

I can see two solutions:
    1) Re-execute the caller
    2) Prevent an IOREQ from disappearing until the loop finishes.

@Paul any opinions?

Cheers,

[1] 
https://lore.kernel.org/xen-devel/6bfc3920-8f29-188c-cff4-2b99dabe166f@xen.org/


-- 
Julien Grall


^ permalink raw reply	[flat|nested] 111+ messages in thread

* Re: [PATCH V1 09/16] arm/ioreq: Introduce arch specific bits for IOREQ/DM features
  2020-09-24 16:02         ` Oleksandr
@ 2020-09-24 18:02           ` Oleksandr
  2020-09-25  6:51             ` Jan Beulich
  2020-09-26 13:12             ` Julien Grall
  0 siblings, 2 replies; 111+ messages in thread
From: Oleksandr @ 2020-09-24 18:02 UTC (permalink / raw)
  To: Jan Beulich, Julien Grall
  Cc: xen-devel, Oleksandr Tyshchenko, Stefano Stabellini,
	Volodymyr Babchuk, Julien Grall


On 24.09.20 19:02, Oleksandr wrote:

Hi

>
> On 24.09.20 14:08, Jan Beulich wrote:
>
> Hi Jan
>
>> On 23.09.2020 22:16, Oleksandr wrote:
>>> On 23.09.20 21:03, Julien Grall wrote:
>>>> On 10/09/2020 21:22, Oleksandr Tyshchenko wrote:
>>>>> From: Oleksandr Tyshchenko <oleksandr_tyshchenko@epam.com>
>>>>> @@ -91,6 +108,28 @@ struct arch_domain
>>>>>    #endif
>>>>>    }  __cacheline_aligned;
>>>>>    +enum hvm_io_completion {
>>>>> +    HVMIO_no_completion,
>>>>> +    HVMIO_mmio_completion,
>>>>> +    HVMIO_pio_completion
>>>>> +};
>>>>> +
>>>>> +struct hvm_vcpu_io {
>>>>> +    /* I/O request in flight to device model. */
>>>>> +    enum hvm_io_completion io_completion;
>>>>> +    ioreq_t                io_req;
>>>>> +
>>>>> +    /*
>>>>> +     * HVM emulation:
>>>>> +     *  Linear address @mmio_gla maps to MMIO physical frame
>>>>> @mmio_gpfn.
>>>>> +     *  The latter is known to be an MMIO frame (not RAM).
>>>>> +     *  This translation is only valid for accesses as per
>>>>> @mmio_access.
>>>>> +     */
>>>>> +    struct npfec        mmio_access;
>>>>> +    unsigned long       mmio_gla;
>>>>> +    unsigned long       mmio_gpfn;
>>>>> +};
>>>>> +
>>>> Why do we need to re-define most of this? Can't this just be in common
>>>> code?
>>> Jan asked almost the same question in "[PATCH V1 02/16] xen/ioreq:
>>> Make x86's IOREQ feature common".
>>> Please see my answer there:
>>> https://patchwork.kernel.org/patch/11769105/#23637511
>>>
>>> Theoretically we could move this to the common code, but the 
>>> question is
>>> what to do with the other struct fields that x86's struct hvm_vcpu_io
>>> has/needs but
>>> Arm's seems not to; would it be possible to logically split struct
>>> hvm_vcpu_io into common and arch parts?
>> Have struct vcpu_io and struct arch_vcpu_io as a sub-part of it?
> Although it is going to pull a lot of changes into x86/hvm/*, yes, this 
> way
> we could indeed logically split struct hvm_vcpu_io into common and 
> arch parts in a clear way.
> If it is really worth it, I will start looking into it.
Julien, I noticed that the three mmio* fields are not used within the 
current series on Arm. Do we expect them to be used later on?
I would rather not add fields which are not used. We could add them when 
needed.

Would the following be acceptable (see the sketch below)?
1. Both fields "io_completion" and "io_req" (which seem to be the only 
fields used in common/ioreq.c) are moved to the common struct vcpu as part 
of struct vcpu_io,
     and enum hvm_io_completion is also moved (and renamed).
2. We remove everything related to hvm_vcpu* from the Arm header.
3. x86's struct hvm_vcpu_io stays as it is (but without the two fields 
"io_completion" and "io_req").
     I think this way we separate the common part and reduce the Xen 
changes (which are getting bigger).
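In code, roughly (a sketch; the names are illustrative):

    /* Common code, e.g. xen/sched.h: */
    enum vio_completion {
        VIO_no_completion,
        VIO_mmio_completion,
        VIO_pio_completion,
    };

    struct vcpu_io {
        /* I/O request in flight to device model. */
        enum vio_completion completion;
        ioreq_t             req;
    };

    /* ... and inside the common struct vcpu: */
    struct vcpu_io io;

x86's struct hvm_vcpu_io would then keep only its remaining,
arch-specific fields.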


-- 
Regards,

Oleksandr Tyshchenko



^ permalink raw reply	[flat|nested] 111+ messages in thread

* Re: [PATCH V1 09/16] arm/ioreq: Introduce arch specific bits for IOREQ/DM features
  2020-09-24 17:25       ` Julien Grall
@ 2020-09-24 18:22         ` Oleksandr
  2020-09-26 13:21           ` Julien Grall
  0 siblings, 1 reply; 111+ messages in thread
From: Oleksandr @ 2020-09-24 18:22 UTC (permalink / raw)
  To: Julien Grall
  Cc: xen-devel, Oleksandr Tyshchenko, Stefano Stabellini,
	Volodymyr Babchuk, Julien Grall


On 24.09.20 20:25, Julien Grall wrote:

Hi Julien.

>
>
> On 23/09/2020 21:16, Oleksandr wrote:
>>
>> On 23.09.20 21:03, Julien Grall wrote:
>>> Hi,
>>
>> Hi Julien
>>
>>
>>>
>>> On 10/09/2020 21:22, Oleksandr Tyshchenko wrote:
>>>> From: Oleksandr Tyshchenko <oleksandr_tyshchenko@epam.com>
>>>
>>> I believe I am the originally author of this code...
>>
>> Sorry, will fix
>>
>>
>>>
>>>> This patch adds basic IOREQ/DM support on Arm. The subsequent
>>>> patches will improve functionality, add remaining bits as well as
>>>> address several TODOs.
>>>
>>> I find it a bit weird to add code with TODOs that are handled in the same 
>>> series. Can't we just split this patch into smaller ones where 
>>> everything is addressed from the start?
>> Sorry if I wasn't clear in the description. Let me please clarify.
>> The corresponding RFC patch had 3 major TODOs:
>> 1. Handle properly when hvm_send_ioreq() returns IO_RETRY
>> 2. Proper ref-counting for the foreign entries in 
>> set_foreign_p2m_entry()
>> 3. Check the return value of handle_hvm_io_completion() *and* avoid 
>> calling handle_hvm_io_completion() on every return.
>>
>> TODO #1 was fixed in the current patch.
>> TODO #2 was fixed in "xen/mm: Handle properly reference in 
>> set_foreign_p2m_entry() on Arm".
>> TODO #3 was partially fixed in the current patch (check the return value 
>> of handle_hvm_io_completion()).
>> The second part of TODO #3 (avoid calling handle_hvm_io_completion() 
>> on every return) was moved to a separate patch, "xen/ioreq: Introduce 
>> hvm_domain_has_ioreq_server()",
>> and fixed (or probably improved is a better word) there, along with 
>> introducing a mechanism to actually achieve the improvement.
>
> Right, none of those TODOs are described in the code. So it makes it more 
> difficult to know what you are actually referring to.
>
> I would suggest reshuffling the series so the TODOs are addressed 
> earlier when possible.

OK, I will try to prepare something.


>
>>
>> Could you please clarify how this patch could be split into smaller ones?
>
> This patch is going to be reduced a fair bit if you make some of the 
> structures common. The next step would be to move anything that is not 
> directly related to IOREQ out.


Thank you for the clarification.
Yes; however, I believed everything in this patch was directly related to 
IOREQ...


>
>
> From a quick look, there are a few things that can be moved into separate 
> patches:
>    - The addition of the ASSERT_UNREACHABLE()

Did you mean the addition of the ASSERT_UNREACHABLE() to 
arch_handle_hvm_io_completion/handle_pio can be moved to separate patches?
Sorry, I don't quite understand: for what benefit?


>    - The addition of the loop in leave_hypervisor_to_guest(), as I 
> think it deserves some explanation.

Agreed that the loop in leave_hypervisor_to_guest() needs an explanation. 
Will move it to a separate patch. But this way I need to return the 
corresponding TODO back to this patch.
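Something like this, perhaps (the wording is only a suggestion):

    /*
     * check_for_vcpu_work() may request another iteration, e.g. when
     * an IOREQ sent to the device model has not been completed yet.
     * Process pCPU work in between, so softirqs are not starved while
     * the vCPU is waiting for the I/O to finish.
     */
    do
    {
        check_for_pcpu_work();
    } while ( check_for_vcpu_work() );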


>    - The sign extension in handle_ioserv() can possibly be abstracted. 
> Actually the code is quite similar to handle_read().

Ok, will consider that.


>
>>
>>>
>>>>
>>>> Please note, the "PIO handling" TODO is expected to be left unaddressed
>>>> for the current series. It is not a big issue for now while Xen
>>>> doesn't have support for vPCI on Arm. On Arm64, PIOs are only used
>>>> for the PCI IO Bar, and we would probably want to expose them to the
>>>> emulator as PIO accesses to make a DM completely arch-agnostic. So
>>>> "PIO handling" should be implemented when we add support for vPCI.
>>>>
>>>> Please note, at the moment build on Arm32 is broken (see cmpxchg
>>>> usage in hvm_send_buffered_ioreq()) due to the lack of cmpxchg_64
>>>> support on Arm32. There is a patch on review to address this issue:
>>>> https://patchwork.kernel.org/patch/11715559/
>>>
>>> This has been committed.
>>
>> Thank you for the patch, will remove the notice.
>
> For future reference, I think such a notice would be better placed after 
> the "---" marker, as it doesn't need to be part of the commit message.

Got it.


>
>
>>
>>>
>>>
>>>> +    if ( dabt.write )
>>>> +        return IO_HANDLED;
>>>> +
>>>> +    /*
>>>> +     * Sign extend if required.
>>>> +     * Note that we expect the read handler to have zeroed the bits
>>>> +     * outside the requested access size.
>>>> +     */
>>>> +    if ( dabt.sign && (r & (1UL << (size - 1))) )
>>>> +    {
>>>> +        /*
>>>> +         * We are relying on register_t using the same as
>>>> +         * an unsigned long in order to keep the 32-bit assembly
>>>> +         * code smaller.
>>>> +         */
>>>> +        BUILD_BUG_ON(sizeof(register_t) != sizeof(unsigned long));
>>>> +        r |= (~0UL) << size;
>>>> +    }
>>>> +
>>>> +    set_user_reg(regs, dabt.reg, r);
>>>> +
>>>> +    return IO_HANDLED;
>>>> +}
>>>> +
>>>> +enum io_state try_fwd_ioserv(struct cpu_user_regs *regs,
>>>> +                             struct vcpu *v, mmio_info_t *info)
>>>> +{
>>>> +    struct hvm_vcpu_io *vio = &v->arch.hvm.hvm_io;
>>>> +    ioreq_t p = {
>>>> +        .type = IOREQ_TYPE_COPY,
>>>> +        .addr = info->gpa,
>>>> +        .size = 1 << info->dabt.size,
>>>> +        .count = 1,
>>>> +        .dir = !info->dabt.write,
>>>> +        /*
>>>> +         * On x86, df is used by 'rep' instruction to tell the 
>>>> direction
>>>> +         * to iterate (forward or backward).
>>>> +         * On Arm, all the accesses to MMIO region will do a single
>>>> +         * memory access. So for now, we can safely always set to 0.
>>>> +         */
>>>> +        .df = 0,
>>>> +        .data = get_user_reg(regs, info->dabt.reg),
>>>> +        .state = STATE_IOREQ_READY,
>>>> +    };
>>>> +    struct hvm_ioreq_server *s = NULL;
>>>> +    enum io_state rc;
>>>> +
>>>> +    switch ( vio->io_req.state )
>>>> +    {
>>>> +    case STATE_IOREQ_NONE:
>>>> +        break;
>>>> +
>>>> +    case STATE_IORESP_READY:
>>>> +        return IO_HANDLED;
>>>> +
>>>> +    default:
>>>> +        gdprintk(XENLOG_ERR, "wrong state %u\n", vio->io_req.state);
>>>> +        return IO_ABORT;
>>>> +    }
>>>> +
>>>> +    s = hvm_select_ioreq_server(v->domain, &p);
>>>> +    if ( !s )
>>>> +        return IO_UNHANDLED;
>>>> +
>>>> +    if ( !info->dabt.valid )
>>>> +        return IO_ABORT;
>>>> +
>>>> +    vio->io_req = p;
>>>> +
>>>> +    rc = hvm_send_ioreq(s, &p, 0);
>>>> +    if ( rc != IO_RETRY || v->domain->is_shutting_down )
>>>> +        vio->io_req.state = STATE_IOREQ_NONE;
>>>> +    else if ( !hvm_ioreq_needs_completion(&vio->io_req) )
>>>> +        rc = IO_HANDLED;
>>>> +    else
>>>> +        vio->io_completion = HVMIO_mmio_completion;
>>>> +
>>>> +    return rc;
>>>> +}
>>>> +
>>>> +bool ioreq_handle_complete_mmio(void)
>>>> +{
>>>> +    struct vcpu *v = current;
>>>> +    struct cpu_user_regs *regs = guest_cpu_user_regs();
>>>> +    const union hsr hsr = { .bits = regs->hsr };
>>>> +    paddr_t addr = v->arch.hvm.hvm_io.io_req.addr;
>>>> +
>>>> +    if ( try_handle_mmio(regs, hsr, addr) == IO_HANDLED )
>>>> +    {
>>>> +        advance_pc(regs, hsr);
>>>> +        return true;
>>>> +    }
>>>> +
>>>> +    return false;
>>>> +}
>>>> +
>>>> +/*
>>>> + * Local variables:
>>>> + * mode: C
>>>> + * c-file-style: "BSD"
>>>> + * c-basic-offset: 4
>>>> + * tab-width: 4
>>>> + * indent-tabs-mode: nil
>>>> + * End:
>>>> + */
>>>> diff --git a/xen/arch/arm/traps.c b/xen/arch/arm/traps.c
>>>> index 8f40d0e..121942c 100644
>>>> --- a/xen/arch/arm/traps.c
>>>> +++ b/xen/arch/arm/traps.c
>>>> @@ -21,6 +21,7 @@
>>>>   #include <xen/hypercall.h>
>>>>   #include <xen/init.h>
>>>>   #include <xen/iocap.h>
>>>> +#include <xen/ioreq.h>
>>>>   #include <xen/irq.h>
>>>>   #include <xen/lib.h>
>>>>   #include <xen/mem_access.h>
>>>> @@ -1384,6 +1385,9 @@ static arm_hypercall_t arm_hypercall_table[] = {
>>>>   #ifdef CONFIG_HYPFS
>>>>       HYPERCALL(hypfs_op, 5),
>>>>   #endif
>>>> +#ifdef CONFIG_IOREQ_SERVER
>>>> +    HYPERCALL(dm_op, 3),
>>>> +#endif
>>>>   };
>>>>     #ifndef NDEBUG
>>>> @@ -1955,9 +1959,14 @@ static void do_trap_stage2_abort_guest(struct cpu_user_regs *regs,
>>>>               case IO_HANDLED:
>>>>                   advance_pc(regs, hsr);
>>>>                   return;
>>>> +            case IO_RETRY:
>>>> +                /* finish later */
>>>> +                return;
>>>>               case IO_UNHANDLED:
>>>>                   /* IO unhandled, try another way to handle it. */
>>>>                   break;
>>>> +            default:
>>>> +                ASSERT_UNREACHABLE();
>>>>               }
>>>>           }
>>>>   @@ -2249,12 +2258,23 @@ static void check_for_pcpu_work(void)
>>>>    * Process pending work for the vCPU. Any call should be fast or
>>>>    * implement preemption.
>>>>    */
>>>> -static void check_for_vcpu_work(void)
>>>> +static bool check_for_vcpu_work(void)
>>>>   {
>>>>       struct vcpu *v = current;
>>>>   +#ifdef CONFIG_IOREQ_SERVER
>>>> +    bool handled;
>>>> +
>>>> +    local_irq_enable();
>>>> +    handled = handle_hvm_io_completion(v);
>>>> +    local_irq_disable();
>>>> +
>>>> +    if ( !handled )
>>>> +        return true;
>>>> +#endif
>>>> +
>>>>       if ( likely(!v->arch.need_flush_to_ram) )
>>>> -        return;
>>>> +        return false;
>>>>         /*
>>>>        * Give a chance for the pCPU to process work before handling the vCPU
>>>> @@ -2265,6 +2285,8 @@ static void check_for_vcpu_work(void)
>>>>       local_irq_enable();
>>>>       p2m_flush_vm(v);
>>>>       local_irq_disable();
>>>> +
>>>> +    return false;
>>>>   }
>>>>     /*
>>>> @@ -2277,8 +2299,10 @@ void leave_hypervisor_to_guest(void)
>>>>   {
>>>>       local_irq_disable();
>>>>   -    check_for_vcpu_work();
>>>> -    check_for_pcpu_work();
>>>> +    do
>>>> +    {
>>>> +        check_for_pcpu_work();
>>>> +    } while ( check_for_vcpu_work() );
>>>>         vgic_sync_to_lrs();
>>>>   diff --git a/xen/include/asm-arm/domain.h b/xen/include/asm-arm/domain.h
>>>> index 6819a3b..d1c48d7 100644
>>>> --- a/xen/include/asm-arm/domain.h
>>>> +++ b/xen/include/asm-arm/domain.h
>>>> @@ -11,10 +11,27 @@
>>>>   #include <asm/vgic.h>
>>>>   #include <asm/vpl011.h>
>>>>   #include <public/hvm/params.h>
>>>> +#include <public/hvm/dm_op.h>
>>>> +#include <public/hvm/ioreq.h>
>>>> +
>>>> +#define MAX_NR_IOREQ_SERVERS 8
>>>>     struct hvm_domain
>>>>   {
>>>>       uint64_t              params[HVM_NR_PARAMS];
>>>> +
>>>> +    /* Guest page range used for non-default ioreq servers */
>>>> +    struct {
>>>> +        unsigned long base;
>>>> +        unsigned long mask;
>>>> +        unsigned long legacy_mask; /* indexed by HVM param number */
>>>> +    } ioreq_gfn;
>>>> +
>>>> +    /* Lock protects all other values in the sub-struct and the default */
>>>> +    struct {
>>>> +        spinlock_t              lock;
>>>> +        struct hvm_ioreq_server *server[MAX_NR_IOREQ_SERVERS];
>>>> +    } ioreq_server;
>>>>   };
>>>>     #ifdef CONFIG_ARM_64
>>>> @@ -91,6 +108,28 @@ struct arch_domain
>>>>   #endif
>>>>   }  __cacheline_aligned;
>>>>   +enum hvm_io_completion {
>>>> +    HVMIO_no_completion,
>>>> +    HVMIO_mmio_completion,
>>>> +    HVMIO_pio_completion
>>>> +};
>>>> +
>>>> +struct hvm_vcpu_io {
>>>> +    /* I/O request in flight to device model. */
>>>> +    enum hvm_io_completion io_completion;
>>>> +    ioreq_t                io_req;
>>>> +
>>>> +    /*
>>>> +     * HVM emulation:
>>>> +     *  Linear address @mmio_gla maps to MMIO physical frame @mmio_gpfn.
>>>> +     *  The latter is known to be an MMIO frame (not RAM).
>>>> +     *  This translation is only valid for accesses as per @mmio_access.
>>>> +     */
>>>> +    struct npfec        mmio_access;
>>>> +    unsigned long       mmio_gla;
>>>> +    unsigned long       mmio_gpfn;
>>>> +};
>>>> +
>>>
>>> Why do we need to re-define most of this? Can't this just be in 
>>> common code?
>>
>> Jan asked almost the same question in "[PATCH V1 02/16] xen/ioreq: 
>> Make x86's IOREQ feature common".
>> Please see my answer there:
>> https://patchwork.kernel.org/patch/11769105/#23637511
>>
>> Theoretically we could move this to the common code, but the question 
>> is what to do with the other struct fields that x86's struct hvm_vcpu_io 
>> has/needs but Arm's does not. Would it be possible to logically split 
>> struct hvm_vcpu_io into common and arch parts?
>>
>> struct hvm_vcpu_io {
>>      /* I/O request in flight to device model. */
>>      enum hvm_io_completion io_completion;
>>      ioreq_t                io_req;
>>
>>      /*
>>       * HVM emulation:
>>       *  Linear address @mmio_gla maps to MMIO physical frame @mmio_gpfn.
>>       *  The latter is known to be an MMIO frame (not RAM).
>>       *  This translation is only valid for accesses as per @mmio_access.
>>       */
>>      struct npfec        mmio_access;
>>      unsigned long       mmio_gla;
>>      unsigned long       mmio_gpfn;
>>
>>      /*
>>       * We may need to handle up to 3 distinct memory accesses per
>>       * instruction.
>>       */
>>      struct hvm_mmio_cache mmio_cache[3];
>>      unsigned int mmio_cache_count;
>>
>>      /* For retries we shouldn't re-fetch the instruction. */
>>      unsigned int mmio_insn_bytes;
>>      unsigned char mmio_insn[16];
>>      struct hvmemul_cache *cache;
>>
>>      /*
>>       * For string instruction emulation we need to be able to signal a
>>       * necessary retry through other than function return codes.
>>       */
>>      bool_t mmio_retry;
>>
>>      unsigned long msix_unmask_address;
>>      unsigned long msix_snoop_address;
>>      unsigned long msix_snoop_gpa;
>>
>>      const struct g2m_ioport *g2m_ioport;
>> };
>
> I think Jan made some suggestion today. Let me know if you require 
> more input.


Yes. I am considering this now. I provided my thoughts on that a little 
bit earlier. Could you please clarify there.


-- 
Regards,

Oleksandr Tyshchenko



^ permalink raw reply	[flat|nested] 111+ messages in thread

* Re: [PATCH V1 09/16] arm/ioreq: Introduce arch specific bits for IOREQ/DM features
  2020-09-24 18:02           ` Oleksandr
@ 2020-09-25  6:51             ` Jan Beulich
  2020-09-25  9:47               ` Oleksandr
  2020-09-26 13:12             ` Julien Grall
  1 sibling, 1 reply; 111+ messages in thread
From: Jan Beulich @ 2020-09-25  6:51 UTC (permalink / raw)
  To: Oleksandr
  Cc: Julien Grall, xen-devel, Oleksandr Tyshchenko,
	Stefano Stabellini, Volodymyr Babchuk, Julien Grall

On 24.09.2020 20:02, Oleksandr wrote:
> On 24.09.20 19:02, Oleksandr wrote:
> Julien, I noticed that the three mmio* fields are not used within the 
> current series on Arm. Do we expect them to be used later on?
> I would rather not add fields which are not used. We could add them when 
> needed.
> 
> Would the following be acceptable?
> 1. Both fields "io_completion" and "io_req" (which seem to be the only 
> fields used in common/ioreq.c) are moved to the common struct vcpu as part 
> of struct vcpu_io,
>      enum hvm_io_completion is also moved (and renamed).
> 2. We remove everything related to hvm_vcpu* from the Arm header.
> 3. x86's struct hvm_vcpu_io stays as it is (but without the two fields 
> "io_completion" and "io_req").
>      I think this way we separate a common part and reduce the Xen changes 
> (which are getting bigger).

If this works, it would be my preference too. So far I was left
under the impression that you did move / mention further fields
for a reason.
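
For illustration, a minimal sketch of what such a split could look like 
(the names and placement below are assumptions, not an agreed interface):

    /* Hypothetical common part, living e.g. in the common struct vcpu */
    enum vcpu_io_completion {
        VIO_no_completion,
        VIO_mmio_completion,
        VIO_pio_completion,
    };

    struct vcpu_io {
        /* I/O request in flight to device model. */
        enum vcpu_io_completion completion;
        ioreq_t                 req;
    };

x86's struct hvm_vcpu_io would then keep only its arch-specific fields 
(mmio_cache, mmio_insn, msix_*, g2m_ioport, ...).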

Jan


^ permalink raw reply	[flat|nested] 111+ messages in thread

* Re: [PATCH V1 13/16] xen/ioreq: Make x86's invalidate qemu mapcache handling common
  2020-09-24 16:45         ` Oleksandr
@ 2020-09-25  7:03           ` Jan Beulich
  2020-09-25 13:05             ` Oleksandr
  0 siblings, 1 reply; 111+ messages in thread
From: Jan Beulich @ 2020-09-25  7:03 UTC (permalink / raw)
  To: Oleksandr
  Cc: Roger Pau Monné,
	xen-devel, Oleksandr Tyshchenko, Stefano Stabellini,
	Julien Grall, Volodymyr Babchuk, Andrew Cooper, George Dunlap,
	Ian Jackson, Wei Liu, Paul Durrant, Julien Grall

On 24.09.2020 18:45, Oleksandr wrote:
> 
> On 24.09.20 14:16, Jan Beulich wrote:
> 
> Hi Jan
> 
>> On 22.09.2020 21:32, Oleksandr wrote:
>>> On 16.09.20 11:50, Jan Beulich wrote:
>>>> On 10.09.2020 22:22, Oleksandr Tyshchenko wrote:
>>>>> --- a/xen/common/memory.c
>>>>> +++ b/xen/common/memory.c
>>>>> @@ -1651,6 +1651,11 @@ long do_memory_op(unsigned long cmd, XEN_GUEST_HANDLE_PARAM(void) arg)
>>>>>            break;
>>>>>        }
>>>>>    
>>>>> +#ifdef CONFIG_IOREQ_SERVER
>>>>> +    if ( op == XENMEM_decrease_reservation )
>>>>> +        curr_d->qemu_mapcache_invalidate = true;
>>>>> +#endif
>>>> I don't see why you put this right into decrease_reservation(). This
>>>> isn't just to avoid the extra conditional, but first and foremost to
>>>> avoid bypassing the earlier return from the function (in the case of
>>>> preemption). In the context of this I wonder whether the ordering of
>>>> operations in hvm_hypercall() is actually correct.
>>> Good point, indeed we may return earlier in case of preemption, I missed
>>> that.
>>> Will move it to decrease_reservation(). But, we may return even earlier
>>> in case of error...
>>> Now I am wondering whether we should move it to the very beginning of
>>> command processing or not?
>> In _this_ series I'd strongly recommend you keep things working as
>> they are. If independently you think you've found a reason to
>> re-order certain operations, then feel free to send a patch with
>> suitable justification.
> 
> Of course, I will try to retain current behavior.
> 
> 
>>>> I'm also unconvinced curr_d is the right domain in all cases here;
>>>> while this may be a pre-existing issue in principle, I'm afraid it
>>>> gets more pronounced by the logic getting moved to common code.
>>> Sorry I didn't get your concern here.
>> Well, you need to be concerned whose qemu_mapcache_invalidate flag
>> you set.
> May I ask, in what cases the *curr_d* is the right domain?

When a domain does a decrease-reservation on itself. I thought
that's obvious. But perhaps your question was rather meant as to
whether a->domain is ever _not_ the right one?

> We need to make sure that the domain is using IOREQ server(s) at least. 
> Hopefully, we have a helper for this,
> which is hvm_domain_has_ioreq_server(). Please clarify: is there anything 
> else I should take care of?

Nothing I can recall / think of right now, except that the change
may want to come under a different title and with a different
description. As indicated, I don't think this is correct for PVH
Dom0 issuing the request against a HVM DomU, and addressing this
will likely want this moved out of hvm_memory_op() anyway. Of
course an option is to split this into two patches - the proposed
bug fix (perhaps wanting backporting) and then the moving of the
field out of arch.hvm. If you feel uneasy about the bug fix part,
let me know and I (or maybe Roger) will see to put together a
patch.
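
As a rough sketch of the direction discussed above (assuming 
hvm_domain_has_ioreq_server() ends up being the gating helper, and 
leaving the PVH Dom0 question aside):

    /* Only flag domains that are actually served by an IOREQ server. */
    if ( op == XENMEM_decrease_reservation &&
         hvm_domain_has_ioreq_server(curr_d) )
        curr_d->qemu_mapcache_invalidate = true;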

Jan


^ permalink raw reply	[flat|nested] 111+ messages in thread

* RE: [PATCH V1 02/16] xen/ioreq: Make x86's IOREQ feature common
  2020-09-24 18:01   ` Julien Grall
@ 2020-09-25  8:19     ` Paul Durrant
  2020-09-30 13:39       ` Oleksandr
  0 siblings, 1 reply; 111+ messages in thread
From: Paul Durrant @ 2020-09-25  8:19 UTC (permalink / raw)
  To: 'Julien Grall', 'Oleksandr Tyshchenko', xen-devel
  Cc: 'Oleksandr Tyshchenko', 'Andrew Cooper',
	'George Dunlap', 'Ian Jackson',
	'Jan Beulich', 'Stefano Stabellini',
	'Wei Liu', 'Roger Pau Monné',
	'Jun Nakajima', 'Kevin Tian',
	'Tim Deegan', 'Julien Grall'

> -----Original Message-----
> From: Julien Grall <julien@xen.org>
> Sent: 24 September 2020 19:01
> To: Oleksandr Tyshchenko <olekstysh@gmail.com>; xen-devel@lists.xenproject.org
> Cc: Oleksandr Tyshchenko <oleksandr_tyshchenko@epam.com>; Andrew Cooper <andrew.cooper3@citrix.com>;
> George Dunlap <george.dunlap@citrix.com>; Ian Jackson <ian.jackson@eu.citrix.com>; Jan Beulich
> <jbeulich@suse.com>; Stefano Stabellini <sstabellini@kernel.org>; Wei Liu <wl@xen.org>; Roger Pau
> Monné <roger.pau@citrix.com>; Paul Durrant <paul@xen.org>; Jun Nakajima <jun.nakajima@intel.com>;
> Kevin Tian <kevin.tian@intel.com>; Tim Deegan <tim@xen.org>; Julien Grall <julien.grall@arm.com>
> Subject: Re: [PATCH V1 02/16] xen/ioreq: Make x86's IOREQ feature common
> 
> 
> 
> On 10/09/2020 21:21, Oleksandr Tyshchenko wrote:
> > +static bool hvm_wait_for_io(struct hvm_ioreq_vcpu *sv, ioreq_t *p)
> > +{
> > +    unsigned int prev_state = STATE_IOREQ_NONE;
> > +    unsigned int state = p->state;
> > +    uint64_t data = ~0;
> > +
> > +    smp_rmb();
> > +
> > +    /*
> > +     * The only reason we should see this condition be false is when an
> > +     * emulator dying races with I/O being requested.
> > +     */
> > +    while ( likely(state != STATE_IOREQ_NONE) )
> > +    {
> > +        if ( unlikely(state < prev_state) )
> > +        {
> > +            gdprintk(XENLOG_ERR, "Weird HVM ioreq state transition %u -> %u\n",
> > +                     prev_state, state);
> > +            sv->pending = false;
> > +            domain_crash(sv->vcpu->domain);
> > +            return false; /* bail */
> > +        }
> > +
> > +        switch ( prev_state = state )
> > +        {
> > +        case STATE_IORESP_READY: /* IORESP_READY -> NONE */
> > +            p->state = STATE_IOREQ_NONE;
> > +            data = p->data;
> > +            break;
> > +
> > +        case STATE_IOREQ_READY:  /* IOREQ_{READY,INPROCESS} -> IORESP_READY */
> > +        case STATE_IOREQ_INPROCESS:
> > +            wait_on_xen_event_channel(sv->ioreq_evtchn,
> > +                                      ({ state = p->state;
> > +                                         smp_rmb();
> > +                                         state != prev_state; }));
> > +            continue;
> 
> As I pointed out previously [1], this helper was implemented with the
> expectation that wait_on_xen_event_channel() will not return if the vCPU
> got rescheduled.
> 
> However, this assumption doesn't hold on Arm.
> 
> I can see two solution:
>     1) Re-execute the caller
>     2) Prevent an IOREQ to disappear until the loop finish.
> 
> @Paul any opinions?

The ioreq control plane is largely predicated on there being no pending I/O when the state of a server is modified, and it is assumed that domain_pause() is sufficient to achieve this. If that assumption doesn't hold then we need additional synchronization.
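
(For reference, option (1) above roughly corresponds to the loop this 
series already adds to leave_hypervisor_to_guest() on Arm, where the 
completion handler is simply re-run until no work is left:

    do
    {
        check_for_pcpu_work();
    } while ( check_for_vcpu_work() );

Whether that is sufficient without extra synchronization is exactly the 
open question here.)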

  Paul

> 
> Cheers,
> 
> [1]
> https://lore.kernel.org/xen-devel/6bfc3920-8f29-188c-cff4-2b99dabe166f@xen.org/
> 
> 
> --
> Julien Grall



^ permalink raw reply	[flat|nested] 111+ messages in thread

* Re: [PATCH V1 09/16] arm/ioreq: Introduce arch specific bits for IOREQ/DM features
  2020-09-25  6:51             ` Jan Beulich
@ 2020-09-25  9:47               ` Oleksandr
  0 siblings, 0 replies; 111+ messages in thread
From: Oleksandr @ 2020-09-25  9:47 UTC (permalink / raw)
  To: Jan Beulich
  Cc: Julien Grall, xen-devel, Oleksandr Tyshchenko,
	Stefano Stabellini, Volodymyr Babchuk, Julien Grall


On 25.09.20 09:51, Jan Beulich wrote:

Hi Jan

> On 24.09.2020 20:02, Oleksandr wrote:
>> On 24.09.20 19:02, Oleksandr wrote:
>> Julien, I noticed that the three mmio* fields are not used within the
>> current series on Arm. Do we expect them to be used later on?
>> I would rather not add fields which are not used. We could add them when
>> needed.
>>
>> Would the following be acceptable?
>> 1. Both fields "io_completion" and "io_req" (which seem to be the only
>> fields used in common/ioreq.c) are moved to the common struct vcpu as part
>> of struct vcpu_io,
>>       enum hvm_io_completion is also moved (and renamed).
>> 2. We remove everything related to hvm_vcpu* from the Arm header.
>> 3. x86's struct hvm_vcpu_io stays as it is (but without the two fields
>> "io_completion" and "io_req").
>>       I think this way we separate a common part and reduce the Xen changes
>> (which are getting bigger).
> If this works, it would be my preference too.

Thanks. I will wait for Julien's input on that and, if he doesn't have 
any objections, I will go in this direction.


-- 
Regards,

Oleksandr Tyshchenko



^ permalink raw reply	[flat|nested] 111+ messages in thread

* Re: [PATCH V1 13/16] xen/ioreq: Make x86's invalidate qemu mapcache handling common
  2020-09-25  7:03           ` Jan Beulich
@ 2020-09-25 13:05             ` Oleksandr
  2020-10-02  9:55               ` Oleksandr
  0 siblings, 1 reply; 111+ messages in thread
From: Oleksandr @ 2020-09-25 13:05 UTC (permalink / raw)
  To: Jan Beulich
  Cc: Roger Pau Monné,
	xen-devel, Oleksandr Tyshchenko, Stefano Stabellini,
	Julien Grall, Volodymyr Babchuk, Andrew Cooper, George Dunlap,
	Ian Jackson, Wei Liu, Paul Durrant, Julien Grall


On 25.09.20 10:03, Jan Beulich wrote:

Hi Jan.

> On 24.09.2020 18:45, Oleksandr wrote:
>> On 24.09.20 14:16, Jan Beulich wrote:
>>
>> Hi Jan
>>
>>> On 22.09.2020 21:32, Oleksandr wrote:
>>>> On 16.09.20 11:50, Jan Beulich wrote:
>>>>> On 10.09.2020 22:22, Oleksandr Tyshchenko wrote:
>>>>>> --- a/xen/common/memory.c
>>>>>> +++ b/xen/common/memory.c
>>>>>> @@ -1651,6 +1651,11 @@ long do_memory_op(unsigned long cmd, XEN_GUEST_HANDLE_PARAM(void) arg)
>>>>>>             break;
>>>>>>         }
>>>>>>     
>>>>>> +#ifdef CONFIG_IOREQ_SERVER
>>>>>> +    if ( op == XENMEM_decrease_reservation )
>>>>>> +        curr_d->qemu_mapcache_invalidate = true;
>>>>>> +#endif
>>>>> I don't see why you put this right into decrease_reservation(). This
>>>>> isn't just to avoid the extra conditional, but first and foremost to
>>>>> avoid bypassing the earlier return from the function (in the case of
>>>>> preemption). In the context of this I wonder whether the ordering of
>>>>> operations in hvm_hypercall() is actually correct.
>>>> Good point, indeed we may return earlier in case of preemption, I missed
>>>> that.
>>>> Will move it to decrease_reservation(). But, we may return even earlier
>>>> in case of error...
>>>> Now I am wondering whether we should move it to the very beginning of
>>>> command processing or not?
>>> In _this_ series I'd strongly recommend you keep things working as
>>> they are. If independently you think you've found a reason to
>>> re-order certain operations, then feel free to send a patch with
>>> suitable justification.
>> Of course, I will try to retain current behavior.
>>
>>
>>>>> I'm also unconvinced curr_d is the right domain in all cases here;
>>>>> while this may be a pre-existing issue in principle, I'm afraid it
>>>>> gets more pronounced by the logic getting moved to common code.
>>>> Sorry I didn't get your concern here.
>>> Well, you need to be concerned whose qemu_mapcache_invalidate flag
>>> you set.
>> May I ask, in what cases the *curr_d* is the right domain?
> When a domain does a decrease-reservation on itself. I thought
> that's obvious. But perhaps your question was rather meant as to
> whether a->domain is ever _not_ the right one?
No, my question was about *curr_d*. I saw your answer
 > I'm also unconvinced curr_d is the right domain in all cases here;
and just wanted to clarify these cases. Sorry if I was unclear.


>
>> We need to make sure that the domain is using IOREQ server(s) at least.
>> Hopefully, we have a helper for this,
>> which is hvm_domain_has_ioreq_server(). Please clarify: is there anything
>> else I should take care of?
> Nothing I can recall / think of right now, except that the change
> may want to come under a different title and with a different
> description. As indicated, I don't think this is correct for PVH
> Dom0 issuing the request against a HVM DomU, and addressing this
> will likely want this moved out of hvm_memory_op() anyway. Of
> course an option is to split this into two patches - the proposed
> bug fix (perhaps wanting backporting) and then the moving of the
> field out of arch.hvm. If you feel uneasy about the bug fix part,
> let me know and I (or maybe Roger) will see to put together a
> patch.

Thank you for the clarification.

Yes, it would be really nice if you (or maybe Roger) could create a 
patch for the bug fix part.

-- 
Regards,

Oleksandr Tyshchenko



^ permalink raw reply	[flat|nested] 111+ messages in thread

* Re: [PATCH V1 09/16] arm/ioreq: Introduce arch specific bits for IOREQ/DM features
  2020-09-24 18:02           ` Oleksandr
  2020-09-25  6:51             ` Jan Beulich
@ 2020-09-26 13:12             ` Julien Grall
  2020-09-26 13:18               ` Oleksandr
  1 sibling, 1 reply; 111+ messages in thread
From: Julien Grall @ 2020-09-26 13:12 UTC (permalink / raw)
  To: Oleksandr, Jan Beulich
  Cc: xen-devel, Oleksandr Tyshchenko, Stefano Stabellini,
	Volodymyr Babchuk, Julien Grall

Hi,

On 24/09/2020 19:02, Oleksandr wrote:
> On 24.09.20 19:02, Oleksandr wrote:
>> On 24.09.20 14:08, Jan Beulich wrote:
>>> On 23.09.2020 22:16, Oleksandr wrote:
>>>> On 23.09.20 21:03, Julien Grall wrote:
>>>>> On 10/09/2020 21:22, Oleksandr Tyshchenko wrote:
>>>>>> From: Oleksandr Tyshchenko <oleksandr_tyshchenko@epam.com>
>>>>>> @@ -91,6 +108,28 @@ struct arch_domain
>>>>>>    #endif
>>>>>>    }  __cacheline_aligned;
>>>>>>    +enum hvm_io_completion {
>>>>>> +    HVMIO_no_completion,
>>>>>> +    HVMIO_mmio_completion,
>>>>>> +    HVMIO_pio_completion
>>>>>> +};
>>>>>> +
>>>>>> +struct hvm_vcpu_io {
>>>>>> +    /* I/O request in flight to device model. */
>>>>>> +    enum hvm_io_completion io_completion;
>>>>>> +    ioreq_t                io_req;
>>>>>> +
>>>>>> +    /*
>>>>>> +     * HVM emulation:
>>>>>> +     *  Linear address @mmio_gla maps to MMIO physical frame
>>>>>> @mmio_gpfn.
>>>>>> +     *  The latter is known to be an MMIO frame (not RAM).
>>>>>> +     *  This translation is only valid for accesses as per
>>>>>> @mmio_access.
>>>>>> +     */
>>>>>> +    struct npfec        mmio_access;
>>>>>> +    unsigned long       mmio_gla;
>>>>>> +    unsigned long       mmio_gpfn;
>>>>>> +};
>>>>>> +
>>>>> Why do we need to re-define most of this? Can't this just be in common
>>>>> code?
>>>> Jan asked almost the same question in "[PATCH V1 02/16] xen/ioreq:
>>>> Make x86's IOREQ feature common".
>>>> Please see my answer there:
>>>> https://patchwork.kernel.org/patch/11769105/#23637511
>>>>
>>>> Theoretically we could move this to the common code, but the 
>>>> question is what to do with the other struct fields that x86's 
>>>> struct hvm_vcpu_io has/needs but Arm's does not. Would it be 
>>>> possible to logically split struct hvm_vcpu_io into common and 
>>>> arch parts?
>>> Have struct vcpu_io and struct arch_vcpu_io as a sub-part of it?
>> Although it is going to pull a lot of changes into x86/hvm/*, yes this 
>> way
>> we indeed could logically split struct hvm_vcpu_io into common and 
>> arch parts in a clear way.
>> If it is really worth it, I will start looking into it.
>> Julien, I noticed that the three mmio* fields are not used within the 
>> current series on Arm. Do we expect them to be used later on?

IIRC, I just copied them blindly when writing the PoC.

The information can already be found using the HSR (syndrome register), 
so those fields would be redundant for us.
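
Indeed, the patch already derives everything it needs from the syndrome 
when building the ioreq, e.g. in try_fwd_ioserv() earlier in this series:

    ioreq_t p = {
        .type = IOREQ_TYPE_COPY,
        .addr = info->gpa,
        .size = 1 << info->dabt.size,
        .dir  = !info->dabt.write,
        .data = get_user_reg(regs, info->dabt.reg),
    };

so caching mmio_gla/mmio_gpfn/mmio_access on Arm would only duplicate 
state the HSR can provide.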

> Would the following be acceptable?
> 1. Both fields "io_completion" and "io_req" (which seem to be the only 
> fields used in common/ioreq.c) are moved to the common struct vcpu as part 
> of struct vcpu_io,
>      enum hvm_io_completion is also moved (and renamed).
> 2. We remove everything related to hvm_vcpu* from the Arm header.
> 3. x86's struct hvm_vcpu_io stays as it is (but without the two fields 
> "io_completion" and "io_req").
>      I think this way we separate a common part and reduce the Xen changes 
> (which are getting bigger).

The plan looks reasonable to me.

Cheers,

-- 
Julien Grall


^ permalink raw reply	[flat|nested] 111+ messages in thread

* Re: [PATCH V1 09/16] arm/ioreq: Introduce arch specific bits for IOREQ/DM features
  2020-09-26 13:12             ` Julien Grall
@ 2020-09-26 13:18               ` Oleksandr
  0 siblings, 0 replies; 111+ messages in thread
From: Oleksandr @ 2020-09-26 13:18 UTC (permalink / raw)
  To: Julien Grall
  Cc: Jan Beulich, xen-devel, Oleksandr Tyshchenko, Stefano Stabellini,
	Volodymyr Babchuk, Julien Grall


On 26.09.20 16:12, Julien Grall wrote:
> Hi,

Hi Julien.


>
> On 24/09/2020 19:02, Oleksandr wrote:
>> On 24.09.20 19:02, Oleksandr wrote:
>>> On 24.09.20 14:08, Jan Beulich wrote:
>>>> On 23.09.2020 22:16, Oleksandr wrote:
>>>>> On 23.09.20 21:03, Julien Grall wrote:
>>>>>> On 10/09/2020 21:22, Oleksandr Tyshchenko wrote:
>>>>>>> From: Oleksandr Tyshchenko <oleksandr_tyshchenko@epam.com>
>>>>>>> @@ -91,6 +108,28 @@ struct arch_domain
>>>>>>>    #endif
>>>>>>>    }  __cacheline_aligned;
>>>>>>>    +enum hvm_io_completion {
>>>>>>> +    HVMIO_no_completion,
>>>>>>> +    HVMIO_mmio_completion,
>>>>>>> +    HVMIO_pio_completion
>>>>>>> +};
>>>>>>> +
>>>>>>> +struct hvm_vcpu_io {
>>>>>>> +    /* I/O request in flight to device model. */
>>>>>>> +    enum hvm_io_completion io_completion;
>>>>>>> +    ioreq_t                io_req;
>>>>>>> +
>>>>>>> +    /*
>>>>>>> +     * HVM emulation:
>>>>>>> +     *  Linear address @mmio_gla maps to MMIO physical frame @mmio_gpfn.
>>>>>>> +     *  The latter is known to be an MMIO frame (not RAM).
>>>>>>> +     *  This translation is only valid for accesses as per @mmio_access.
>>>>>>> +     */
>>>>>>> +    struct npfec        mmio_access;
>>>>>>> +    unsigned long       mmio_gla;
>>>>>>> +    unsigned long       mmio_gpfn;
>>>>>>> +};
>>>>>>> +
>>>>>> Why do we need to re-define most of this? Can't this just be in 
>>>>>> common
>>>>>> code?
>>>>> Jan asked almost the same question in "[PATCH V1 02/16] xen/ioreq:
>>>>> Make x86's IOREQ feature common".
>>>>> Please see my answer there:
>>>>> https://patchwork.kernel.org/patch/11769105/#23637511
>>>>>
>>>>> Theoretically we could move this to the common code, but the 
>>>>> question is what to do with the other struct fields that x86's 
>>>>> struct hvm_vcpu_io has/needs but Arm's does not. Would it be 
>>>>> possible to logically split struct hvm_vcpu_io into common and 
>>>>> arch parts?
>>>> Have struct vcpu_io and struct arch_vcpu_io as a sub-part of it?
>>> Although it is going to pull a lot of changes into x86/hvm/*, yes 
>>> this way
>>> we indeed could logically split struct hvm_vcpu_io into common and 
>>> arch parts in a clear way.
>>> If it is really worth it, I will start looking into it.
>> Julien, I noticed that the three mmio* fields are not used within the 
>> current series on Arm. Do we expect them to be used later on?
>
> IIRC, I just copied them blindly when writing the PoC.
>
> The information can already be found using the HSR (syndrome 
> register), so those fields would be redundant for us.

Got it.


>
>
>> Would the following be acceptable?
>> 1. Both fields "io_completion" and "io_req" (which seem to be the 
>> only fields used in common/ioreq.c) are moved to the common struct vcpu 
>> as part of struct vcpu_io,
>>      enum hvm_io_completion is also moved (and renamed).
>> 2. We remove everything related to hvm_vcpu* from the Arm header.
>> 3. x86's struct hvm_vcpu_io stays as it is (but without the two fields 
>> "io_completion" and "io_req").
>>      I think this way we separate a common part and reduce the Xen 
>> changes (which are getting bigger).
>
> The plan looks reasonable to me.

OK, will follow it. Thank you


-- 
Regards,

Oleksandr Tyshchenko



^ permalink raw reply	[flat|nested] 111+ messages in thread

* Re: [PATCH V1 09/16] arm/ioreq: Introduce arch specific bits for IOREQ/DM features
  2020-09-24 18:22         ` Oleksandr
@ 2020-09-26 13:21           ` Julien Grall
  2020-09-26 14:57             ` Oleksandr
  0 siblings, 1 reply; 111+ messages in thread
From: Julien Grall @ 2020-09-26 13:21 UTC (permalink / raw)
  To: Oleksandr
  Cc: xen-devel, Oleksandr Tyshchenko, Stefano Stabellini,
	Volodymyr Babchuk, Julien Grall

Hi Oleksandr,

On 24/09/2020 19:22, Oleksandr wrote:
> On 24.09.20 20:25, Julien Grall wrote:
>> On 23/09/2020 21:16, Oleksandr wrote:
>>> On 23.09.20 21:03, Julien Grall wrote:
>>>> On 10/09/2020 21:22, Oleksandr Tyshchenko wrote:
>>>>> From: Oleksandr Tyshchenko <oleksandr_tyshchenko@epam.com>
>>> Could you please clarify how this patch could be split into smaller ones?
>>
>> This patch is going to be reduced a fair bit if you make some of the 
>> structure common. The next steps would be to move anything that is not 
>> directly related to IOREQ out.
> 
> 
> Thank you for the clarification.
> Yes, however, I believed everything in this patch is directly related to 
> IOREQ...
> 
> 
>>
>>
>> From a quick look, there are a few things that can be moved into separate 
>> patches:
>>    - The addition of the ASSERT_UNREACHABLE()
> 
> Did you mean the addition of the ASSERT_UNREACHABLE() to 
> arch_handle_hvm_io_completion/handle_pio can be moved to separate patches?
> Sorry, I don't quite understand; for what benefit?

Sorry, I didn't realize there were multiple ASSERT_UNREACHABLE()s in the 
code. I was referring to the one in the following chunk:

@@ -1955,9 +1959,14 @@ static void do_trap_stage2_abort_guest(struct cpu_user_regs *regs,
              case IO_HANDLED:
                  advance_pc(regs, hsr);
                  return;
+            case IO_RETRY:
+                /* finish later */
+                return;
              case IO_UNHANDLED:
                  /* IO unhandled, try another way to handle it. */
                  break;
+            default:
+                ASSERT_UNREACHABLE();
              }
          }

While I understand the reason this was added, to me this doesn't seem to 
be directly related to this patch.

In fact, the switch is done on an enum. So without the default, the 
compiler will be able to notice if we are adding a new enumeration 
value. With this new approach, you would only notice it at runtime 
(assuming the path is exercised).

So what do we gain?
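
To illustrate the point (a generic sketch, not actual Xen code): with the 
enum fully enumerated and no default, the compiler flags any newly added 
value at build time,

    enum io_state st = handler(regs);   /* handler() is hypothetical */

    switch ( st )   /* no default case */
    {
    case IO_ABORT:
    case IO_HANDLED:
    case IO_UNHANDLED:
        break;
    }
    /* GCC/clang (-Wswitch, enabled by -Wall): "enumeration value
     * 'IO_RETRY' not handled in switch" */

whereas a default: arm silently absorbs the new value until the path is 
actually executed.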

[...]

>> I think Jan made some suggestion today. Let me know if you require 
>> more input.
> 
> 
> Yes. I am considering this now. I provided my thoughts on that a little 
> bit earlier. Could you please clarify there.

I have replied to it.

Cheers,

-- 
Julien Grall


^ permalink raw reply	[flat|nested] 111+ messages in thread

* Re: [PATCH V1 12/16] xen/dm: Introduce xendevicemodel_set_irq_level DM op
  2020-09-10 20:22 ` [PATCH V1 12/16] xen/dm: Introduce xendevicemodel_set_irq_level DM op Oleksandr Tyshchenko
@ 2020-09-26 13:50   ` Julien Grall
  2020-09-26 14:21     ` Oleksandr
  0 siblings, 1 reply; 111+ messages in thread
From: Julien Grall @ 2020-09-26 13:50 UTC (permalink / raw)
  To: Oleksandr Tyshchenko, xen-devel
  Cc: Oleksandr Tyshchenko, Ian Jackson, Wei Liu, Andrew Cooper,
	George Dunlap, Jan Beulich, Stefano Stabellini,
	Volodymyr Babchuk, Julien Grall, paul, Andre Przywara

(+ Paul and Andre)

Hi,

Adding Paul as the author of DMOP and Andre as this is GIC related.

On 10/09/2020 21:22, Oleksandr Tyshchenko wrote:
> From: Oleksandr Tyshchenko <oleksandr_tyshchenko@epam.com>

Looking at the PoC I shared with you, this code was originally written 
by me.

> 
> This patch adds the ability for the device emulator to notify the other
> end (some entity running in the guest) using an SPI, and implements the
> Arm-specific bits for it. The proposed interface allows the emulator to
> set the logical level of one of a domain's IRQ lines.

It would be good to explain in the commit message why we can't use the 
existing DMOP to inject an interrupt.
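
For reference, the closest existing op is XEN_DMOP_set_isa_irq_level, 
whose public interface is x86-centric, roughly:

    struct xen_dm_op_set_isa_irq_level {
        uint8_t  irq;    /* IN - ISA IRQ (0 - 15) */
        uint8_t  level;  /* IN - Level: 0 -> deasserted, 1 -> asserted */
    };

A uint8_t ISA IRQ number cannot name an arbitrary Arm SPI, hence the new 
op below takes a uint32_t irq. (This is my reading of the motivation, not 
a statement from the commit message.)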

> Signed-off-by: Julien Grall <julien.grall@arm.com>
> Signed-off-by: Oleksandr Tyshchenko <oleksandr_tyshchenko@epam.com>
> 
> ---
> Please note, this is a split/cleanup/hardening of Julien's PoC:
> "Add support for Guest IO forwarding to a device emulator"
> 
> Please note, I left the interface untouched since there is still
> an open discussion about what interface to use/what information to pass
> to the hypervisor. The question is whether we should abstract away
> the state of the line or not.
> 
> Changes RFC -> V1:
>     - check incoming parameters in arch_dm_op()
>     - add explicit padding to struct xen_dm_op_set_irq_level
> ---
> ---
>   tools/libs/devicemodel/core.c                   | 18 +++++++++++++
>   tools/libs/devicemodel/include/xendevicemodel.h |  4 +++
>   tools/libs/devicemodel/libxendevicemodel.map    |  1 +
>   xen/arch/arm/dm.c                               | 36 ++++++++++++++++++++++++-
>   xen/common/dm.c                                 |  1 +
>   xen/include/public/hvm/dm_op.h                  | 15 +++++++++++
>   6 files changed, 74 insertions(+), 1 deletion(-)
> 
> diff --git a/tools/libs/devicemodel/core.c b/tools/libs/devicemodel/core.c
> index 4d40639..30bd79f 100644
> --- a/tools/libs/devicemodel/core.c
> +++ b/tools/libs/devicemodel/core.c
> @@ -430,6 +430,24 @@ int xendevicemodel_set_isa_irq_level(
>       return xendevicemodel_op(dmod, domid, 1, &op, sizeof(op));
>   }
>   
> +int xendevicemodel_set_irq_level(
> +    xendevicemodel_handle *dmod, domid_t domid, uint32_t irq,
> +    unsigned int level)
> +{
> +    struct xen_dm_op op;
> +    struct xen_dm_op_set_irq_level *data;
> +
> +    memset(&op, 0, sizeof(op));
> +
> +    op.op = XEN_DMOP_set_irq_level;
> +    data = &op.u.set_irq_level;
> +
> +    data->irq = irq;
> +    data->level = level;
> +
> +    return xendevicemodel_op(dmod, domid, 1, &op, sizeof(op));
> +}
> +
>   int xendevicemodel_set_pci_link_route(
>       xendevicemodel_handle *dmod, domid_t domid, uint8_t link, uint8_t irq)
>   {
> diff --git a/tools/libs/devicemodel/include/xendevicemodel.h b/tools/libs/devicemodel/include/xendevicemodel.h
> index e877f5c..c06b3c8 100644
> --- a/tools/libs/devicemodel/include/xendevicemodel.h
> +++ b/tools/libs/devicemodel/include/xendevicemodel.h
> @@ -209,6 +209,10 @@ int xendevicemodel_set_isa_irq_level(
>       xendevicemodel_handle *dmod, domid_t domid, uint8_t irq,
>       unsigned int level);
>   
> +int xendevicemodel_set_irq_level(
> +    xendevicemodel_handle *dmod, domid_t domid, unsigned int irq,
> +    unsigned int level);
> +
>   /**
>    * This function maps a PCI INTx line to a an IRQ line.
>    *
> diff --git a/tools/libs/devicemodel/libxendevicemodel.map b/tools/libs/devicemodel/libxendevicemodel.map
> index 561c62d..a0c3012 100644
> --- a/tools/libs/devicemodel/libxendevicemodel.map
> +++ b/tools/libs/devicemodel/libxendevicemodel.map
> @@ -32,6 +32,7 @@ VERS_1.2 {
>   	global:
>   		xendevicemodel_relocate_memory;
>   		xendevicemodel_pin_memory_cacheattr;
> +		xendevicemodel_set_irq_level;
>   } VERS_1.1;
>   
>   VERS_1.3 {
> diff --git a/xen/arch/arm/dm.c b/xen/arch/arm/dm.c
> index eb20344..428ef98 100644
> --- a/xen/arch/arm/dm.c
> +++ b/xen/arch/arm/dm.c
> @@ -15,11 +15,45 @@
>    */
>   
>   #include <xen/hypercall.h>

NIT: newline between <xen/*> and <asm/*> includes.

> +#include <asm/vgic.h>
>   
>   int arch_dm_op(struct xen_dm_op *op, struct domain *d,
>                  const struct dmop_args *op_args, bool *const_op)
>   {
> -    return -EOPNOTSUPP;
> +    int rc;
> +
> +    switch ( op->op )
> +    {
> +    case XEN_DMOP_set_irq_level:
> +    {
> +        const struct xen_dm_op_set_irq_level *data =
> +            &op->u.set_irq_level;
> +
> +        /* Only SPIs are supported */
> +        if ( (data->irq < NR_LOCAL_IRQS) || (data->irq >= vgic_num_irqs(d)) )
> +        {
> +            rc = -EINVAL;
> +            break;
> +        }
> +
> +        if ( data->level != 0 && data->level != 1 )
> +        {
> +            rc = -EINVAL;
> +            break;
> +        }
> +

I think we want to check that the padding is always 0.
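
Presumably something along these lines in the XEN_DMOP_set_irq_level 
handler (a sketch; the pad[] sizing matches the public struct further down):

    if ( data->pad[0] || data->pad[1] || data->pad[2] )
    {
        rc = -EINVAL;
        break;
    }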

> +
> +        vgic_inject_irq(d, NULL, data->irq, data->level);

So, this interface will allow the device emulator to raise/lower the 
line for a HW-mapped interrupt. I think this will mess with the 
internal state machine.

It would probably be better if a device emulator can only raise/lower 
the line for non-allocated interrupts (see d->arch.vgic.allocated_irqs). 
Any thoughts?
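
Something like the following, perhaps (untested, and assuming 
allocated_irqs is the domain's bitmap of in-use virtual IRQs):

    /* Reject lines already allocated in the vGIC (e.g. HW-mapped
     * interrupts), so the emulator cannot interfere with them. */
    if ( test_bit(data->irq, d->arch.vgic.allocated_irqs) )
    {
        rc = -EINVAL;
        break;
    }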

> +        rc = 0;
> +        break;
> +    }
> +
> +    default:
> +        rc = -EOPNOTSUPP;
> +        break;
> +    }
> +
> +    return rc;
>   }
>   
>   /*
> diff --git a/xen/common/dm.c b/xen/common/dm.c
> index 060731d..c55e042 100644
> --- a/xen/common/dm.c
> +++ b/xen/common/dm.c
> @@ -47,6 +47,7 @@ static int dm_op(const struct dmop_args *op_args)
>           [XEN_DMOP_remote_shutdown]                  = sizeof(struct xen_dm_op_remote_shutdown),
>           [XEN_DMOP_relocate_memory]                  = sizeof(struct xen_dm_op_relocate_memory),
>           [XEN_DMOP_pin_memory_cacheattr]             = sizeof(struct xen_dm_op_pin_memory_cacheattr),
> +        [XEN_DMOP_set_irq_level]                    = sizeof(struct xen_dm_op_set_irq_level),
>       };
>   
>       rc = rcu_lock_remote_domain_by_id(op_args->domid, &d);
> diff --git a/xen/include/public/hvm/dm_op.h b/xen/include/public/hvm/dm_op.h
> index fd00e9d..39567bf 100644
> --- a/xen/include/public/hvm/dm_op.h
> +++ b/xen/include/public/hvm/dm_op.h
> @@ -417,6 +417,20 @@ struct xen_dm_op_pin_memory_cacheattr {
>       uint32_t pad;
>   };
>   
> +/*
> + * XEN_DMOP_set_irq_level: Set the logical level of one of a domain's
> + *                         IRQ lines.
> + * XXX Handle PPIs.

This is a public interface, so it seems a bit strange to leave a TODO in 
this code.

I wouldn't be surprised if someone will want PPI support soon, but we 
may be able to defer it if we can easily extend the hypercall.

@Paul, how did you envision to extend DMOP?

Also, is there any plan to add x86 support? If not, then I think we want 
to document in the interface that this is Arm only.

> + */
> +#define XEN_DMOP_set_irq_level 19
> +
> +struct xen_dm_op_set_irq_level {
> +    uint32_t irq;
> +    /* IN - Level: 0 -> deasserted, 1 -> asserted */
> +    uint8_t level;
> +    uint8_t pad[3];
> +};
> +
>   struct xen_dm_op {
>       uint32_t op;
>       uint32_t pad;
> @@ -430,6 +444,7 @@ struct xen_dm_op {
>           struct xen_dm_op_track_dirty_vram track_dirty_vram;
>           struct xen_dm_op_set_pci_intx_level set_pci_intx_level;
>           struct xen_dm_op_set_isa_irq_level set_isa_irq_level;
> +        struct xen_dm_op_set_irq_level set_irq_level;
>           struct xen_dm_op_set_pci_link_route set_pci_link_route;
>           struct xen_dm_op_modified_memory modified_memory;
>           struct xen_dm_op_set_mem_type set_mem_type;
> 

Cheers,

-- 
Julien Grall


^ permalink raw reply	[flat|nested] 111+ messages in thread

* Re: [PATCH V1 12/16] xen/dm: Introduce xendevicemodel_set_irq_level DM op
  2020-09-26 13:50   ` Julien Grall
@ 2020-09-26 14:21     ` Oleksandr
  0 siblings, 0 replies; 111+ messages in thread
From: Oleksandr @ 2020-09-26 14:21 UTC (permalink / raw)
  To: Julien Grall, xen-devel
  Cc: Oleksandr Tyshchenko, Ian Jackson, Wei Liu, Andrew Cooper,
	George Dunlap, Jan Beulich, Stefano Stabellini,
	Volodymyr Babchuk, Julien Grall, paul, Andre Przywara


On 26.09.20 16:50, Julien Grall wrote:
> (+ Paul and Andre)
>
> Hi,

Hi Julien



>
> Adding Paul as the author of DMOP and Andre as this is GIC related.
>
> On 10/09/2020 21:22, Oleksandr Tyshchenko wrote:
>> From: Oleksandr Tyshchenko <oleksandr_tyshchenko@epam.com>
>
> Looking at the PoC I shared with you, this code was originally written 
> by me.

I am sorry, will fix.


>
>
>>
>> This patch adds the ability for the device emulator to notify the other
>> end (some entity running in the guest) using an SPI, and implements the
>> Arm-specific bits for it. The proposed interface allows the emulator to
>> set the logical level of one of a domain's IRQ lines.
>
> It would be good to explain in the commit message why we can't use the 
> existing DMOP to inject an interrupt.

Agreed, I will explain why the existing DMOP for injecting an interrupt 
is not suitable for us.


>
>
>> Signed-off-by: Julien Grall <julien.grall@arm.com>
>> Signed-off-by: Oleksandr Tyshchenko <oleksandr_tyshchenko@epam.com>
>>
>> ---
>> Please note, this is a split/cleanup/hardening of Julien's PoC:
>> "Add support for Guest IO forwarding to a device emulator"
>>
>> Please note, I left the interface untouched since there is still
>> an open discussion about what interface to use/what information to pass
>> to the hypervisor. The question is whether we should abstract away
>> the state of the line or not.
>>
>> Changes RFC -> V1:
>>     - check incoming parameters in arch_dm_op()
>>     - add explicit padding to struct xen_dm_op_set_irq_level
>> ---
>> ---
>>   tools/libs/devicemodel/core.c                   | 18 +++++++++++++
>>   tools/libs/devicemodel/include/xendevicemodel.h |  4 +++
>>   tools/libs/devicemodel/libxendevicemodel.map    |  1 +
>>   xen/arch/arm/dm.c                               | 36 ++++++++++++++++++++++++-
>>   xen/common/dm.c                                 |  1 +
>>   xen/include/public/hvm/dm_op.h                  | 15 +++++++++++
>>   6 files changed, 74 insertions(+), 1 deletion(-)
>>
>> diff --git a/tools/libs/devicemodel/core.c b/tools/libs/devicemodel/core.c
>> index 4d40639..30bd79f 100644
>> --- a/tools/libs/devicemodel/core.c
>> +++ b/tools/libs/devicemodel/core.c
>> @@ -430,6 +430,24 @@ int xendevicemodel_set_isa_irq_level(
>>       return xendevicemodel_op(dmod, domid, 1, &op, sizeof(op));
>>   }
>>   +int xendevicemodel_set_irq_level(
>> +    xendevicemodel_handle *dmod, domid_t domid, uint32_t irq,
>> +    unsigned int level)
>> +{
>> +    struct xen_dm_op op;
>> +    struct xen_dm_op_set_irq_level *data;
>> +
>> +    memset(&op, 0, sizeof(op));
>> +
>> +    op.op = XEN_DMOP_set_irq_level;
>> +    data = &op.u.set_irq_level;
>> +
>> +    data->irq = irq;
>> +    data->level = level;
>> +
>> +    return xendevicemodel_op(dmod, domid, 1, &op, sizeof(op));
>> +}
>> +
>>   int xendevicemodel_set_pci_link_route(
>>       xendevicemodel_handle *dmod, domid_t domid, uint8_t link, uint8_t irq)
>>   {
>> diff --git a/tools/libs/devicemodel/include/xendevicemodel.h b/tools/libs/devicemodel/include/xendevicemodel.h
>> index e877f5c..c06b3c8 100644
>> --- a/tools/libs/devicemodel/include/xendevicemodel.h
>> +++ b/tools/libs/devicemodel/include/xendevicemodel.h
>> @@ -209,6 +209,10 @@ int xendevicemodel_set_isa_irq_level(
>>       xendevicemodel_handle *dmod, domid_t domid, uint8_t irq,
>>       unsigned int level);
>>   +int xendevicemodel_set_irq_level(
>> +    xendevicemodel_handle *dmod, domid_t domid, unsigned int irq,
>> +    unsigned int level);
>> +
>>   /**
>>    * This function maps a PCI INTx line to a an IRQ line.
>>    *
>> diff --git a/tools/libs/devicemodel/libxendevicemodel.map b/tools/libs/devicemodel/libxendevicemodel.map
>> index 561c62d..a0c3012 100644
>> --- a/tools/libs/devicemodel/libxendevicemodel.map
>> +++ b/tools/libs/devicemodel/libxendevicemodel.map
>> @@ -32,6 +32,7 @@ VERS_1.2 {
>>       global:
>>           xendevicemodel_relocate_memory;
>>           xendevicemodel_pin_memory_cacheattr;
>> +        xendevicemodel_set_irq_level;
>>   } VERS_1.1;
>>     VERS_1.3 {
>> diff --git a/xen/arch/arm/dm.c b/xen/arch/arm/dm.c
>> index eb20344..428ef98 100644
>> --- a/xen/arch/arm/dm.c
>> +++ b/xen/arch/arm/dm.c
>> @@ -15,11 +15,45 @@
>>    */
>>     #include <xen/hypercall.h>
>
> NIT: newline between <xen/*> and <asm/*> includes.

ok


>
>> +#include <asm/vgic.h>
>>     int arch_dm_op(struct xen_dm_op *op, struct domain *d,
>>                  const struct dmop_args *op_args, bool *const_op)
>>   {
>> -    return -EOPNOTSUPP;
>> +    int rc;
>> +
>> +    switch ( op->op )
>> +    {
>> +    case XEN_DMOP_set_irq_level:
>> +    {
>> +        const struct xen_dm_op_set_irq_level *data =
>> +            &op->u.set_irq_level;
>> +
>> +        /* Only SPIs are supported */
>> +        if ( (data->irq < NR_LOCAL_IRQS) || (data->irq >= vgic_num_irqs(d)) )
>> +        {
>> +            rc = -EINVAL;
>> +            break;
>> +        }
>> +
>> +        if ( data->level != 0 && data->level != 1 )
>> +        {
>> +            rc = -EINVAL;
>> +            break;
>> +        }
>> +
>
> I think we want to check the padding is always 0.

ok


>
>> +
>> +        vgic_inject_irq(d, NULL, data->irq, data->level);
>
> So, this interface will allow the device emulator to raise/lower the 
> line for a HW-mapped interrupt. I think this will mess with the 
> internal state machine.
>
> It would probably be better if a device emulator can only raise/lower 
> the line for non-allocated interrupts (see 
> d->arch.vgic.allocated_irqs). Any thoughts?

I think it really makes sense. I will add a corresponding check.


>
>
>> +        rc = 0;
>> +        break;
>> +    }
>> +
>> +    default:
>> +        rc = -EOPNOTSUPP;
>> +        break;
>> +    }
>> +
>> +    return rc;
>>   }
>>     /*
>> diff --git a/xen/common/dm.c b/xen/common/dm.c
>> index 060731d..c55e042 100644
>> --- a/xen/common/dm.c
>> +++ b/xen/common/dm.c
>> @@ -47,6 +47,7 @@ static int dm_op(const struct dmop_args *op_args)
>>           [XEN_DMOP_remote_shutdown]                  = sizeof(struct xen_dm_op_remote_shutdown),
>>           [XEN_DMOP_relocate_memory]                  = sizeof(struct xen_dm_op_relocate_memory),
>>           [XEN_DMOP_pin_memory_cacheattr]             = sizeof(struct xen_dm_op_pin_memory_cacheattr),
>> +        [XEN_DMOP_set_irq_level]                    = sizeof(struct xen_dm_op_set_irq_level),
>>       };
>>         rc = rcu_lock_remote_domain_by_id(op_args->domid, &d);
>> diff --git a/xen/include/public/hvm/dm_op.h b/xen/include/public/hvm/dm_op.h
>> index fd00e9d..39567bf 100644
>> --- a/xen/include/public/hvm/dm_op.h
>> +++ b/xen/include/public/hvm/dm_op.h
>> @@ -417,6 +417,20 @@ struct xen_dm_op_pin_memory_cacheattr {
>>       uint32_t pad;
>>   };
>>   +/*
>> + * XEN_DMOP_set_irq_level: Set the logical level of one of a domain's
>> + *                         IRQ lines.
>> + * XXX Handle PPIs.
>
> This is a public interface, so it seems a bit strange to leave a TODO 
> in this code.
>
> I wouldn't be surprised if someone wants PPI support soon, but we 
> may be able to defer it if we can easily extend the hypercall.
>
> @Paul, how did you envision to extend DMOP?
>
> Also, is there any plan to add x86 support? If not, then I think we 
> want to document in the interface that this is Arm only.

I don't have a plan to add x86 support. I will clarify that it is for Arm 
only.


>
>> + */
>> +#define XEN_DMOP_set_irq_level 19
>> +
>> +struct xen_dm_op_set_irq_level {
>> +    uint32_t irq;
>> +    /* IN - Level: 0 -> deasserted, 1 -> asserted */
>> +    uint8_t level;
>> +    uint8_t pad[3];
>> +};
>> +
>>   struct xen_dm_op {
>>       uint32_t op;
>>       uint32_t pad;
>> @@ -430,6 +444,7 @@ struct xen_dm_op {
>>           struct xen_dm_op_track_dirty_vram track_dirty_vram;
>>           struct xen_dm_op_set_pci_intx_level set_pci_intx_level;
>>           struct xen_dm_op_set_isa_irq_level set_isa_irq_level;
>> +        struct xen_dm_op_set_irq_level set_irq_level;
>>           struct xen_dm_op_set_pci_link_route set_pci_link_route;
>>           struct xen_dm_op_modified_memory modified_memory;
>>           struct xen_dm_op_set_mem_type set_mem_type;
>>
>
> Cheers,
>
-- 
Regards,

Oleksandr Tyshchenko



^ permalink raw reply	[flat|nested] 111+ messages in thread

* Re: [PATCH V1 09/16] arm/ioreq: Introduce arch specific bits for IOREQ/DM features
  2020-09-26 13:21           ` Julien Grall
@ 2020-09-26 14:57             ` Oleksandr
  0 siblings, 0 replies; 111+ messages in thread
From: Oleksandr @ 2020-09-26 14:57 UTC (permalink / raw)
  To: Julien Grall
  Cc: xen-devel, Oleksandr Tyshchenko, Stefano Stabellini,
	Volodymyr Babchuk, Julien Grall


On 26.09.20 16:21, Julien Grall wrote:
> Hi Oleksandr,

Hi Julien.


>
> On 24/09/2020 19:22, Oleksandr wrote:
>> On 24.09.20 20:25, Julien Grall wrote:
>>> On 23/09/2020 21:16, Oleksandr wrote:
>>>> On 23.09.20 21:03, Julien Grall wrote:
>>>>> On 10/09/2020 21:22, Oleksandr Tyshchenko wrote:
>>>>>> From: Oleksandr Tyshchenko <oleksandr_tyshchenko@epam.com>
>>>> Could you please clarify how this patch could be split into smaller ones?
>>>
>>> This patch is going to be reduced a fair bit if you make some of the 
>>> structure common. The next steps would be to move anything that is 
>>> not directly related to IOREQ out.
>>
>>
>> Thank you for the clarification.
>> Yes, however, I believed everything in this patch is directly related 
>> to IOREQ...
>>
>>
>>>
>>>
>>> From a quick look, there are a few things that can be moved into 
>>> separate patches:
>>>    - The addition of the ASSERT_UNREACHABLE()
>>
>> Did you mean the addition of the ASSERT_UNREACHABLE() to 
>> arch_handle_hvm_io_completion/handle_pio can be moved to separate patches?
>> Sorry, I don't quite understand; for what benefit?
>
> Sorry, I didn't realize there were multiple ASSERT_UNREACHABLE()s in the 
> code. I was referring to the one in the following chunk:
>
> @@ -1955,9 +1959,14 @@ static void do_trap_stage2_abort_guest(struct cpu_user_regs *regs,
>              case IO_HANDLED:
>                  advance_pc(regs, hsr);
>                  return;
> +            case IO_RETRY:
> +                /* finish later */
> +                return;
>              case IO_UNHANDLED:
>                  /* IO unhandled, try another way to handle it. */
>                  break;
> +            default:
> +                ASSERT_UNREACHABLE();
>              }
>          }
>
> While I understand the reason this was added, to me this doesn't seem 
> to be directly related to this patch.
>
> In fact, the switch is done on an enum. So without the default, the 
> compiler will be able to notice if we are adding a new enumeration 
> value. With this new approach, you would only notice it at runtime 
> (assuming the path is exercised).
>
> So what do we gain?

Hmm, now I am in doubt whether we really need to put 
ASSERT_UNREACHABLE() here. Also, we would notice it at runtime for 
debug builds only.
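
For context, the usual pattern behind such a macro is roughly the 
following (a sketch of the concept, not necessarily Xen's exact 
definition):

    #ifndef NDEBUG
    #define ASSERT_UNREACHABLE() assert(!"unreachable")
    #else
    #define ASSERT_UNREACHABLE() ((void)0)
    #endif

i.e. in release builds the check compiles away entirely, so an unexpected 
io_state value would go unnoticed there.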


>
> [...]
>
>>> I think Jan made some suggestion today. Let me know if you require 
>>> more input.
>>
>>
>> Yes. I am considering this now. I provided my thoughts on that a 
>> little bit earlier. Could you please clarify there.
>
> I have replied to it.

Thank you.


-- 
Regards,

Oleksandr Tyshchenko



^ permalink raw reply	[flat|nested] 111+ messages in thread

* Re: [PATCH V1 02/16] xen/ioreq: Make x86's IOREQ feature common
  2020-09-25  8:19     ` Paul Durrant
@ 2020-09-30 13:39       ` Oleksandr
  2020-09-30 17:47         ` Julien Grall
  0 siblings, 1 reply; 111+ messages in thread
From: Oleksandr @ 2020-09-30 13:39 UTC (permalink / raw)
  To: 'Julien Grall', xen-devel
  Cc: paul, 'Oleksandr Tyshchenko', 'Andrew Cooper',
	'George Dunlap', 'Ian Jackson',
	'Jan Beulich', 'Stefano Stabellini',
	'Wei Liu', 'Roger Pau Monné',
	'Jun Nakajima', 'Kevin Tian',
	'Tim Deegan', 'Julien Grall'


Hi Julien

On 25.09.20 11:19, Paul Durrant wrote:
>> -----Original Message-----
>> From: Julien Grall <julien@xen.org>
>> Sent: 24 September 2020 19:01
>> To: Oleksandr Tyshchenko <olekstysh@gmail.com>; xen-devel@lists.xenproject.org
>> Cc: Oleksandr Tyshchenko <oleksandr_tyshchenko@epam.com>; Andrew Cooper <andrew.cooper3@citrix.com>;
>> George Dunlap <george.dunlap@citrix.com>; Ian Jackson <ian.jackson@eu.citrix.com>; Jan Beulich
>> <jbeulich@suse.com>; Stefano Stabellini <sstabellini@kernel.org>; Wei Liu <wl@xen.org>; Roger Pau
>> Monné <roger.pau@citrix.com>; Paul Durrant <paul@xen.org>; Jun Nakajima <jun.nakajima@intel.com>;
>> Kevin Tian <kevin.tian@intel.com>; Tim Deegan <tim@xen.org>; Julien Grall <julien.grall@arm.com>
>> Subject: Re: [PATCH V1 02/16] xen/ioreq: Make x86's IOREQ feature common
>>
>>
>>
>> On 10/09/2020 21:21, Oleksandr Tyshchenko wrote:
>>> +static bool hvm_wait_for_io(struct hvm_ioreq_vcpu *sv, ioreq_t *p)
>>> +{
>>> +    unsigned int prev_state = STATE_IOREQ_NONE;
>>> +    unsigned int state = p->state;
>>> +    uint64_t data = ~0;
>>> +
>>> +    smp_rmb();
>>> +
>>> +    /*
>>> +     * The only reason we should see this condition be false is when an
>>> +     * emulator dying races with I/O being requested.
>>> +     */
>>> +    while ( likely(state != STATE_IOREQ_NONE) )
>>> +    {
>>> +        if ( unlikely(state < prev_state) )
>>> +        {
>>> +            gdprintk(XENLOG_ERR, "Weird HVM ioreq state transition %u -> %u\n",
>>> +                     prev_state, state);
>>> +            sv->pending = false;
>>> +            domain_crash(sv->vcpu->domain);
>>> +            return false; /* bail */
>>> +        }
>>> +
>>> +        switch ( prev_state = state )
>>> +        {
>>> +        case STATE_IORESP_READY: /* IORESP_READY -> NONE */
>>> +            p->state = STATE_IOREQ_NONE;
>>> +            data = p->data;
>>> +            break;
>>> +
>>> +        case STATE_IOREQ_READY:  /* IOREQ_{READY,INPROCESS} -> IORESP_READY */
>>> +        case STATE_IOREQ_INPROCESS:
>>> +            wait_on_xen_event_channel(sv->ioreq_evtchn,
>>> +                                      ({ state = p->state;
>>> +                                         smp_rmb();
>>> +                                         state != prev_state; }));
>>> +            continue;
>> As I pointed out previously [1], this helper was implemented with the
>> expectation that wait_on_xen_event_channel() will not return if the vCPU
>> got rescheduled.
>>
>> However, this assumption doesn't hold on Arm.
>>
>> I can see two solution:
>>      1) Re-execute the caller
>>      2) Prevent an IOREQ to disappear until the loop finish.
>>
>> @Paul any opinions?
> The ioreq control plane is largely predicated on there being no pending I/O when the state of a server is modified, and it is assumed that domain_pause() is sufficient to achieve this. If that assumption doesn't hold then we need additional synchronization.
>
>    Paul
>
May I please clarify whether the concern still stands (given what was said 
above) and we need additional synchronization on Arm?


-- 
Regards,

Oleksandr Tyshchenko



^ permalink raw reply	[flat|nested] 111+ messages in thread

* Re: [PATCH V1 02/16] xen/ioreq: Make x86's IOREQ feature common
  2020-09-30 13:39       ` Oleksandr
@ 2020-09-30 17:47         ` Julien Grall
  2020-10-01  6:59           ` Paul Durrant
  2020-10-01  8:49           ` Jan Beulich
  0 siblings, 2 replies; 111+ messages in thread
From: Julien Grall @ 2020-09-30 17:47 UTC (permalink / raw)
  To: Oleksandr, xen-devel
  Cc: paul, 'Oleksandr Tyshchenko', 'Andrew Cooper',
	'George Dunlap', 'Ian Jackson',
	'Jan Beulich', 'Stefano Stabellini',
	'Wei Liu', 'Roger Pau Monné',
	'Jun Nakajima', 'Kevin Tian',
	'Tim Deegan', 'Julien Grall'

Hi,

On 30/09/2020 14:39, Oleksandr wrote:
> 
> Hi Julien
> 
> On 25.09.20 11:19, Paul Durrant wrote:
>>> -----Original Message-----
>>> From: Julien Grall <julien@xen.org>
>>> Sent: 24 September 2020 19:01
>>> To: Oleksandr Tyshchenko <olekstysh@gmail.com>; 
>>> xen-devel@lists.xenproject.org
>>> Cc: Oleksandr Tyshchenko <oleksandr_tyshchenko@epam.com>; Andrew 
>>> Cooper <andrew.cooper3@citrix.com>;
>>> George Dunlap <george.dunlap@citrix.com>; Ian Jackson 
>>> <ian.jackson@eu.citrix.com>; Jan Beulich
>>> <jbeulich@suse.com>; Stefano Stabellini <sstabellini@kernel.org>; Wei 
>>> Liu <wl@xen.org>; Roger Pau
>>> Monné <roger.pau@citrix.com>; Paul Durrant <paul@xen.org>; Jun 
>>> Nakajima <jun.nakajima@intel.com>;
>>> Kevin Tian <kevin.tian@intel.com>; Tim Deegan <tim@xen.org>; Julien 
>>> Grall <julien.grall@arm.com>
>>> Subject: Re: [PATCH V1 02/16] xen/ioreq: Make x86's IOREQ feature common
>>>
>>>
>>>
>>> On 10/09/2020 21:21, Oleksandr Tyshchenko wrote:
>>>> +static bool hvm_wait_for_io(struct hvm_ioreq_vcpu *sv, ioreq_t *p)
>>>> +{
>>>> +    unsigned int prev_state = STATE_IOREQ_NONE;
>>>> +    unsigned int state = p->state;
>>>> +    uint64_t data = ~0;
>>>> +
>>>> +    smp_rmb();
>>>> +
>>>> +    /*
>>>> +     * The only reason we should see this condition be false is when an
>>>> +     * emulator dying races with I/O being requested.
>>>> +     */
>>>> +    while ( likely(state != STATE_IOREQ_NONE) )
>>>> +    {
>>>> +        if ( unlikely(state < prev_state) )
>>>> +        {
>>>> +            gdprintk(XENLOG_ERR, "Weird HVM ioreq state transition 
>>>> %u -> %u\n",
>>>> +                     prev_state, state);
>>>> +            sv->pending = false;
>>>> +            domain_crash(sv->vcpu->domain);
>>>> +            return false; /* bail */
>>>> +        }
>>>> +
>>>> +        switch ( prev_state = state )
>>>> +        {
>>>> +        case STATE_IORESP_READY: /* IORESP_READY -> NONE */
>>>> +            p->state = STATE_IOREQ_NONE;
>>>> +            data = p->data;
>>>> +            break;
>>>> +
>>>> +        case STATE_IOREQ_READY:  /* IOREQ_{READY,INPROCESS} -> IORESP_READY */
>>>> +        case STATE_IOREQ_INPROCESS:
>>>> +            wait_on_xen_event_channel(sv->ioreq_evtchn,
>>>> +                                      ({ state = p->state;
>>>> +                                         smp_rmb();
>>>> +                                         state != prev_state; }));
>>>> +            continue;
>>> As I pointed out previously [1], this helper was implemented with the
>>> expectation that wait_on_xen_event_channel() will not return if the vCPU
>>> got rescheduled.
>>>
>>> However, this assumption doesn't hold on Arm.
>>>
>>> I can see two solutions:
>>>      1) Re-execute the caller
>>>      2) Prevent an IOREQ from disappearing until the loop finishes.
>>>
>>> @Paul any opinions?
>> The ioreq control plane is largely predicated on there being no 
>> pending I/O when the state of a server is modified, and it is assumed 
>> that domain_pause() is sufficient to achieve this. If that assumption 
>> doesn't hold then we need additional synchronization.

I don't think this assumption even holds on x86, because domain_pause() 
will not wait for I/O to finish.

On x86, the context switch will reset the stack and therefore 
wait_on_xen_event_channel() is not going to return. Instead, 
handle_hvm_io_completion() will be called from the tail callback in 
context_switch(). get_pending_vcpu() would return NULL as the IOREQ 
server disappeared. That said, it is not clear whether the vCPU will 
continue to run afterwards.
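
For reference, a condensed sketch of that x86 flow (illustrative only,
with simplified signatures; not the actual hypervisor code):

/* Runs from the context-switch tail callback, i.e. after the stack has
 * been reset, so the wait loop in hvm_wait_for_io() never resumes. */
static void io_completion_sketch(struct vcpu *v)
{
    struct hvm_ioreq_server *s = get_pending_vcpu(v, NULL);

    if ( !s )
        return; /* server destroyed mid-I/O: nothing left to complete */

    /* ... normal path: consume the response, update emulation state ... */
}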

Did I miss anything?

Regarding the fix itself, I am not sure what sort of synchronization we 
can do. Are you suggesting we wait for the I/O to complete? If so, how 
do we handle the case where the IOREQ server has died?

> May I please clarify whether the concern still stands (given what was said 
> above) and whether we need additional synchronization on Arm?

Yes, the concern is still there (see above).

Cheers,

-- 
Julien Grall



* RE: [PATCH V1 02/16] xen/ioreq: Make x86's IOREQ feature common
  2020-09-30 17:47         ` Julien Grall
@ 2020-10-01  6:59           ` Paul Durrant
  2020-10-01  8:49           ` Jan Beulich
  1 sibling, 0 replies; 111+ messages in thread
From: Paul Durrant @ 2020-10-01  6:59 UTC (permalink / raw)
  To: 'Julien Grall', 'Oleksandr', xen-devel
  Cc: 'Oleksandr Tyshchenko', 'Andrew Cooper',
	'George Dunlap', 'Ian Jackson',
	'Jan Beulich', 'Stefano Stabellini',
	'Wei Liu', 'Roger Pau Monné',
	'Jun Nakajima', 'Kevin Tian',
	'Tim Deegan', 'Julien Grall'

> -----Original Message-----
> From: Julien Grall <julien@xen.org>
> Sent: 30 September 2020 18:48
> To: Oleksandr <olekstysh@gmail.com>; xen-devel@lists.xenproject.org
> Cc: paul@xen.org; 'Oleksandr Tyshchenko' <oleksandr_tyshchenko@epam.com>; 'Andrew Cooper'
> <andrew.cooper3@citrix.com>; 'George Dunlap' <george.dunlap@citrix.com>; 'Ian Jackson'
> <ian.jackson@eu.citrix.com>; 'Jan Beulich' <jbeulich@suse.com>; 'Stefano Stabellini'
> <sstabellini@kernel.org>; 'Wei Liu' <wl@xen.org>; 'Roger Pau Monné' <roger.pau@citrix.com>; 'Jun
> Nakajima' <jun.nakajima@intel.com>; 'Kevin Tian' <kevin.tian@intel.com>; 'Tim Deegan' <tim@xen.org>;
> 'Julien Grall' <julien.grall@arm.com>
> Subject: Re: [PATCH V1 02/16] xen/ioreq: Make x86's IOREQ feature common
> 
> Hi,
> 
> On 30/09/2020 14:39, Oleksandr wrote:
> >
> > Hi Julien
> >
> > On 25.09.20 11:19, Paul Durrant wrote:
> >>> -----Original Message-----
> >>> From: Julien Grall <julien@xen.org>
> >>> Sent: 24 September 2020 19:01
> >>> To: Oleksandr Tyshchenko <olekstysh@gmail.com>;
> >>> xen-devel@lists.xenproject.org
> >>> Cc: Oleksandr Tyshchenko <oleksandr_tyshchenko@epam.com>; Andrew
> >>> Cooper <andrew.cooper3@citrix.com>;
> >>> George Dunlap <george.dunlap@citrix.com>; Ian Jackson
> >>> <ian.jackson@eu.citrix.com>; Jan Beulich
> >>> <jbeulich@suse.com>; Stefano Stabellini <sstabellini@kernel.org>; Wei
> >>> Liu <wl@xen.org>; Roger Pau
> >>> Monné <roger.pau@citrix.com>; Paul Durrant <paul@xen.org>; Jun
> >>> Nakajima <jun.nakajima@intel.com>;
> >>> Kevin Tian <kevin.tian@intel.com>; Tim Deegan <tim@xen.org>; Julien
> >>> Grall <julien.grall@arm.com>
> >>> Subject: Re: [PATCH V1 02/16] xen/ioreq: Make x86's IOREQ feature common
> >>>
> >>>
> >>>
> >>> On 10/09/2020 21:21, Oleksandr Tyshchenko wrote:
> >>>> +static bool hvm_wait_for_io(struct hvm_ioreq_vcpu *sv, ioreq_t *p)
> >>>> +{
> >>>> +    unsigned int prev_state = STATE_IOREQ_NONE;
> >>>> +    unsigned int state = p->state;
> >>>> +    uint64_t data = ~0;
> >>>> +
> >>>> +    smp_rmb();
> >>>> +
> >>>> +    /*
> >>>> +     * The only reason we should see this condition be false is when an
> >>>> +     * emulator dying races with I/O being requested.
> >>>> +     */
> >>>> +    while ( likely(state != STATE_IOREQ_NONE) )
> >>>> +    {
> >>>> +        if ( unlikely(state < prev_state) )
> >>>> +        {
> >>>> +            gdprintk(XENLOG_ERR, "Weird HVM ioreq state transition %u -> %u\n",
> >>>> +                     prev_state, state);
> >>>> +            sv->pending = false;
> >>>> +            domain_crash(sv->vcpu->domain);
> >>>> +            return false; /* bail */
> >>>> +        }
> >>>> +
> >>>> +        switch ( prev_state = state )
> >>>> +        {
> >>>> +        case STATE_IORESP_READY: /* IORESP_READY -> NONE */
> >>>> +            p->state = STATE_IOREQ_NONE;
> >>>> +            data = p->data;
> >>>> +            break;
> >>>> +
> >>>> +        case STATE_IOREQ_READY:  /* IOREQ_{READY,INPROCESS} -> IORESP_READY */
> >>>> +        case STATE_IOREQ_INPROCESS:
> >>>> +            wait_on_xen_event_channel(sv->ioreq_evtchn,
> >>>> +                                      ({ state = p->state;
> >>>> +                                         smp_rmb();
> >>>> +                                         state != prev_state; }));
> >>>> +            continue;
> >>> As I pointed out previously [1], this helper was implemented with the
> >>> expectation that wait_on_xen_event_channel() will not return if the vCPU
> >>> got rescheduled.
> >>>
> >>> However, this assumption doesn't hold on Arm.
> >>>
> >>> I can see two solutions:
> >>>      1) Re-execute the caller
> >>>      2) Prevent an IOREQ from disappearing until the loop finishes.
> >>>
> >>> @Paul any opinions?
> >> The ioreq control plane is largely predicated on there being no
> >> pending I/O when the state of a server is modified, and it is assumed
> >> that domain_pause() is sufficient to achieve this. If that assumption
> >> doesn't hold then we need additional synchronization.
> 
> I don't think this assumption even holds on x86, because domain_pause()
> will not wait for I/O to finish.
> 
> On x86, the context switch will reset the stack and therefore
> wait_on_xen_event_channel() is not going to return. Instead,
> handle_hvm_io_completion() will be called from the tail callback in
> context_switch(). get_pending_vcpu() would return NULL as the IOREQ
> server disappeared. Although, it is not clear whether the vCPU will
> continue to run (or not).
> 
> Did I miss anything?
> 
> Regarding the fix itself, I am not sure what sort of synchronization we
> can do. Are you suggesting we wait for the I/O to complete? If so, how
> do we handle the case where the IOREQ server has died?
> 

s/IOREQ server/emulator, but that is a good point. If domain_pause() did wait for I/O to complete then this would always have been a problem, so, with hindsight, it should have been obvious that this was not the case.

Digging back, it looks like things would probably have been OK before 125833f5f1f0 "x86: fix ioreq-server event channel vulnerability" because wait_on_xen_event_channel() and the loop condition above it did not dereference anything that would disappear with IOREQ server destruction (they used the shared page, which at that point was always a magic page and hence part of the target domain's memory). So things have probably been broken since 2014.

To fix the problem I think it is sufficient that we go back to a wait loop that can tolerate the IOREQ server disappearing between iterations and deal with that as a completed emulation (albeit returning all-ones for reads and sinking writes); a rough sketch of the idea is below.
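
Purely as a sketch of that idea (hypothetical; not the eventual patch,
with data/return plumbing omitted):

static bool hvm_wait_for_io(struct hvm_ioreq_vcpu *sv, ioreq_t *p)
{
    for ( ;; )
    {
        unsigned int state;

        /* Re-validate every iteration: the server (and with it the shared
         * ioreq page) may have vanished while this vCPU was descheduled. */
        if ( !get_pending_vcpu(sv->vcpu, NULL) )
            break;              /* reads complete with ~0, writes are sunk */

        state = p->state;
        smp_rmb();

        if ( state == STATE_IOREQ_NONE )
            break;              /* emulator died racing with the I/O */

        if ( state == STATE_IORESP_READY )
        {
            p->state = STATE_IOREQ_NONE;
            break;              /* response consumed as normal */
        }

        wait_on_xen_event_channel(sv->ioreq_evtchn,
                                  ({ unsigned int s = p->state;
                                     smp_rmb();
                                     s != state; }));
    }

    sv->pending = false;
    return true;
}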

  Paul

> > May I please clarify whether the concern still stands (given what was said
> > above) and whether we need additional synchronization on Arm?
> 
> Yes, the concern is still there (see above).
> 
> Cheers,
> 
> --
> Julien Grall




* Re: [PATCH V1 02/16] xen/ioreq: Make x86's IOREQ feature common
  2020-09-30 17:47         ` Julien Grall
  2020-10-01  6:59           ` Paul Durrant
@ 2020-10-01  8:49           ` Jan Beulich
  2020-10-01  8:50             ` Paul Durrant
  1 sibling, 1 reply; 111+ messages in thread
From: Jan Beulich @ 2020-10-01  8:49 UTC (permalink / raw)
  To: Julien Grall
  Cc: Oleksandr, xen-devel, paul, 'Oleksandr Tyshchenko',
	'Andrew Cooper', 'George Dunlap',
	'Ian Jackson', 'Stefano Stabellini',
	'Wei Liu', 'Roger Pau Monné',
	'Jun Nakajima', 'Kevin Tian',
	'Tim Deegan', 'Julien Grall'

On 30.09.2020 19:47, Julien Grall wrote:
> Regarding the fix itself, I am not sure what sort of synchronization we 
> can do. Are you suggesting we wait for the I/O to complete? If so, how 
> do we handle the case where the IOREQ server has died?

In simple cases retrying the entire request may be an option. However,
if the server died after some parts of a multi-part operation were
done already, I guess the resulting loss of state is bad enough to
warrant crashing the guest. This shouldn't be much different from e.g.
a device disappearing from a bare metal system - any partial I/O done
to/from it will leave the machine in an unpredictable state, which may
be too difficult to recover from without rebooting. (Of course,
staying with this analogy, it may also be okay to simply consider
the operation "complete", leaving it to the guest to recover. The
main issue on the hypervisor side then would be to ensure we don't
expose any uninitialized [due to not having got written to] data to
the guest.)
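
To make the last point concrete, a hypothetical helper (not existing
code): when declaring such an operation "complete", the read result has
to be filled in explicitly - mimicking a surprise-removed device - rather
than left as whatever uninitialized value happened to be in the buffer:

static void complete_dead_ioreq(ioreq_t *p)
{
    if ( p->dir == IOREQ_READ )
        p->data = ~0UL;         /* never leak uninitialized data */
    p->state = STATE_IOREQ_NONE;
}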

Jan



* RE: [PATCH V1 02/16] xen/ioreq: Make x86's IOREQ feature common
  2020-10-01  8:49           ` Jan Beulich
@ 2020-10-01  8:50             ` Paul Durrant
  0 siblings, 0 replies; 111+ messages in thread
From: Paul Durrant @ 2020-10-01  8:50 UTC (permalink / raw)
  To: 'Jan Beulich', 'Julien Grall'
  Cc: 'Oleksandr', xen-devel, 'Oleksandr Tyshchenko',
	'Andrew Cooper', 'George Dunlap',
	'Ian Jackson', 'Stefano Stabellini',
	'Wei Liu', 'Roger Pau Monné',
	'Jun Nakajima', 'Kevin Tian',
	'Tim Deegan', 'Julien Grall'

> -----Original Message-----
> From: Jan Beulich <jbeulich@suse.com>
> Sent: 01 October 2020 09:49
> To: Julien Grall <julien@xen.org>
> Cc: Oleksandr <olekstysh@gmail.com>; xen-devel@lists.xenproject.org; paul@xen.org; 'Oleksandr
> Tyshchenko' <oleksandr_tyshchenko@epam.com>; 'Andrew Cooper' <andrew.cooper3@citrix.com>; 'George
> Dunlap' <george.dunlap@citrix.com>; 'Ian Jackson' <ian.jackson@eu.citrix.com>; 'Stefano Stabellini'
> <sstabellini@kernel.org>; 'Wei Liu' <wl@xen.org>; 'Roger Pau Monné' <roger.pau@citrix.com>; 'Jun
> Nakajima' <jun.nakajima@intel.com>; 'Kevin Tian' <kevin.tian@intel.com>; 'Tim Deegan' <tim@xen.org>;
> 'Julien Grall' <julien.grall@arm.com>
> Subject: Re: [PATCH V1 02/16] xen/ioreq: Make x86's IOREQ feature common
> 
> On 30.09.2020 19:47, Julien Grall wrote:
> > Regarding the fix itself, I am not sure what sort of synchronization we
> > can do. Are you suggesting we wait for the I/O to complete? If so, how
> > do we handle the case where the IOREQ server has died?
> 
> In simple cases retrying the entire request may be an option. However,
> if the server died after some parts of a multi-part operation were
> done already, I guess the resulting loss of state is bad enough to
> warrant crashing the guest. This shouldn't be much different from e.g.
> a device disappearing from a bare metal system - any partial I/O done
> to/from it will leave the machine in an unpredictable state, which may
> be too difficult to recover from without rebooting. (Of course,
> staying with this analogy, it may also be okay to simply consider
> the operation "complete", leaving it to the guest to recover. The
> main issue on the hypervisor side then would be to ensure we don't
> expose any uninitialized [due to not having got written to] data to
> the guest.)
> 

I'll try to take a look today and come up with a patch.

  Paul




* Re: [PATCH V1 13/16] xen/ioreq: Make x86's invalidate qemu mapcache handling common
  2020-09-25 13:05             ` Oleksandr
@ 2020-10-02  9:55               ` Oleksandr
  2020-10-07 10:38                 ` Julien Grall
  0 siblings, 1 reply; 111+ messages in thread
From: Oleksandr @ 2020-10-02  9:55 UTC (permalink / raw)
  To: Jan Beulich
  Cc: Roger Pau Monné,
	xen-devel, Oleksandr Tyshchenko, Stefano Stabellini,
	Julien Grall, Volodymyr Babchuk, Andrew Cooper, George Dunlap,
	Ian Jackson, Wei Liu, Paul Durrant, Julien Grall


Hi Jan


On 25.09.20 16:05, Oleksandr wrote:
>
> On 25.09.20 10:03, Jan Beulich wrote:
>
> Hi Jan.
>
>> On 24.09.2020 18:45, Oleksandr wrote:
>>> On 24.09.20 14:16, Jan Beulich wrote:
>>>
>>> Hi Jan
>>>
>>>> On 22.09.2020 21:32, Oleksandr wrote:
>>>>> On 16.09.20 11:50, Jan Beulich wrote:
>>>>>> On 10.09.2020 22:22, Oleksandr Tyshchenko wrote:
>>>>>>> --- a/xen/common/memory.c
>>>>>>> +++ b/xen/common/memory.c
>>>>>>> @@ -1651,6 +1651,11 @@ long do_memory_op(unsigned long cmd, XEN_GUEST_HANDLE_PARAM(void) arg)
>>>>>>>             break;
>>>>>>>         }
>>>>>>> +#ifdef CONFIG_IOREQ_SERVER
>>>>>>> +    if ( op == XENMEM_decrease_reservation )
>>>>>>> +        curr_d->qemu_mapcache_invalidate = true;
>>>>>>> +#endif
>>>>>> I don't see why you put this right into decrease_reservation(). This
>>>>>> isn't just to avoid the extra conditional, but first and foremost to
>>>>>> avoid bypassing the earlier return from the function (in the case of
>>>>>> preemption). In the context of this I wonder whether the ordering of
>>>>>> operations in hvm_hypercall() is actually correct.
>>>>> Good point, indeed we may return earlier in case of preemption, I 
>>>>> missed
>>>>> that.
>>>>> Will move it to decrease_reservation(). But, we may return even 
>>>>> earlier
>>>>> in case of error...
>>>>> Now I am wondering whether we should move it to the very beginning of
>>>>> command processing or not?
>>>> In _this_ series I'd strongly recommend you keep things working as
>>>> they are. If independently you think you've found a reason to
>>>> re-order certain operations, then feel free to send a patch with
>>>> suitable justification.
>>> Of course, I will try to retain current behavior.
>>>
>>>
>>>>>> I'm also unconvinced curr_d is the right domain in all cases here;
>>>>>> while this may be a pre-existing issue in principle, I'm afraid it
>>>>>> gets more pronounced by the logic getting moved to common code.
>>>>> Sorry I didn't get your concern here.
>>>> Well, you need to be concerned whose qemu_mapcache_invalidate flag
>>>> you set.
>>> May I ask, in what cases the *curr_d* is the right domain?
>> When a domain does a decrease-reservation on itself. I thought
>> that's obvious. But perhaps your question was rather meant as to 
>> whether a->domain ever is _not_ the right one?
> No, my question was about *curr_d*. I saw your answer
> > I'm also unconvinced curr_d is the right domain in all cases here;
> and just wanted to clarify these cases. Sorry if I was unclear.
>
>
>>
>>> We need to make sure that domain is using IOREQ server(s) at least.
>>> Hopefully, we have a helper for this
>>> which is hvm_domain_has_ioreq_server(). Please clarify, anything else I
>>> should be taking care of?
>> Nothing I can recall / think of right now, except that the change
>> may want to come under a different title and with a different
>> description. As indicated, I don't think this is correct for PVH
>> Dom0 issuing the request against a HVM DomU, and addressing this
>> will likely want this moved out of hvm_memory_op() anyway. Of
>> course an option is to split this into two patches - the proposed
>> bug fix (perhaps wanting backporting) and then the moving of the
>> field out of arch.hvm. If you feel uneasy about the bug fix part,
>> let me know and I (or maybe Roger) will see to put together a
>> patch.
>
> Thank you for the clarification.
>
> Yes, it would be really nice if you (or maybe Roger) could create a 
> patch for the bug fix part.


Thank you for your patch [1].

If I got it correctly, there won't be a suitable common place to set the 
qemu_mapcache_invalidate flag anymore, as XENMEM_decrease_reservation is 
not the only place where we need to decide whether to set it.
By analogy, on Arm we probably want to do so in 
guest_physmap_remove_page() (or maybe better in p2m_remove_mapping()).
Julien, what do you think?


I will modify the current patch so that it does not alter the common code.


[1] https://patchwork.kernel.org/patch/11803383/

-- 

Regards,

Oleksandr Tyshchenko




* Re: [PATCH V1 13/16] xen/ioreq: Make x86's invalidate qemu mapcache handling common
  2020-10-02  9:55               ` Oleksandr
@ 2020-10-07 10:38                 ` Julien Grall
  2020-10-07 12:01                   ` Oleksandr
  0 siblings, 1 reply; 111+ messages in thread
From: Julien Grall @ 2020-10-07 10:38 UTC (permalink / raw)
  To: Oleksandr, Jan Beulich
  Cc: Roger Pau Monné,
	xen-devel, Oleksandr Tyshchenko, Stefano Stabellini,
	Volodymyr Babchuk, Andrew Cooper, George Dunlap, Ian Jackson,
	Wei Liu, Paul Durrant, Julien Grall

Hi Oleksandr,

On 02/10/2020 10:55, Oleksandr wrote:
> If I got it correctly, there won't be a suitable common place to set the 
> qemu_mapcache_invalidate flag anymore, as XENMEM_decrease_reservation is 
> not the only place where we need to decide whether to set it.
> By analogy, on Arm we probably want to do so in 
> guest_physmap_remove_page() (or maybe better in p2m_remove_mapping()).
> Julien, what do you think?

At the moment, the Arm code doesn't explicitly remove the existing 
mapping before inserting the new mapping. Instead, this is done 
implicitly by p2m_set_entry().

So I think we want to invalidate the QEMU mapcache in p2m_set_entry() if 
the old entry is a RAM page *and* the new MFN is different.

Cheers,

-- 
Julien Grall



* Re: [PATCH V1 13/16] xen/ioreq: Make x86's invalidate qemu mapcache handling common
  2020-10-07 10:38                 ` Julien Grall
@ 2020-10-07 12:01                   ` Oleksandr
  0 siblings, 0 replies; 111+ messages in thread
From: Oleksandr @ 2020-10-07 12:01 UTC (permalink / raw)
  To: Julien Grall, Jan Beulich
  Cc: Roger Pau Monné,
	xen-devel, Oleksandr Tyshchenko, Stefano Stabellini,
	Volodymyr Babchuk, Andrew Cooper, George Dunlap, Ian Jackson,
	Wei Liu, Paul Durrant, Julien Grall


On 07.10.20 13:38, Julien Grall wrote:
> Hi Oleksandr,

Hi Julien.



>
> On 02/10/2020 10:55, Oleksandr wrote:
>> If I got it correctly, there won't be a suitable common place to set the 
>> qemu_mapcache_invalidate flag anymore, as XENMEM_decrease_reservation is 
>> not the only place where we need to decide whether to set it.
>> By analogy, on Arm we probably want to do so in 
>> guest_physmap_remove_page() (or maybe better in p2m_remove_mapping()).
>> Julien, what do you think?
>
> At the moment, the Arm code doesn't explicitly remove the existing 
> mapping before inserting the new mapping. Instead, this is done 
> implicitly by p2m_set_entry().

Got it.


>
>
> So I think we want to invalidate the QEMU mapcache in p2m_set_entry() 
> if the old entry is a RAM page *and* the new MFN is different.

Thank you. I hope the following is close to what was suggested (not 
tested yet):


diff --git a/xen/arch/arm/p2m.c b/xen/arch/arm/p2m.c
index ae8594f..512eea9 100644
--- a/xen/arch/arm/p2m.c
+++ b/xen/arch/arm/p2m.c
@@ -1073,7 +1073,14 @@ static int __p2m_set_entry(struct p2m_domain *p2m,
       */
      if ( p2m_is_valid(orig_pte) &&
           !mfn_eq(lpae_get_mfn(*entry), lpae_get_mfn(orig_pte)) )
+    {
+#ifdef CONFIG_IOREQ_SERVER
+        if ( domain_has_ioreq_server(p2m->domain) &&
+             (p2m->domain == current->domain) &&
+             p2m_is_ram(orig_pte.p2m.type) )
+            p2m->domain->qemu_mapcache_invalidate = true;
+#endif
          p2m_free_entry(p2m, orig_pte, level);
+    }

  out:
      unmap_domain_page(table);


But if I got the review comments correctly [1], the 
qemu_mapcache_invalidate variable should be per-vCPU instead of 
per-domain? A possible shape of that variant is sketched below.
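
A hypothetical per-vCPU variant of the hunk above, assuming the
qemu_mapcache_invalidate field were moved into struct vcpu (illustration
only, not tested):

#ifdef CONFIG_IOREQ_SERVER
        /* Flag only the vCPU performing the removal. */
        if ( domain_has_ioreq_server(p2m->domain) &&
             (p2m->domain == current->domain) &&
             p2m_is_ram(orig_pte.p2m.type) )
            current->qemu_mapcache_invalidate = true;
#endif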

[1] https://patchwork.kernel.org/patch/11803383/



-- 
Regards,

Oleksandr Tyshchenko




end of thread

Thread overview: 111+ messages
2020-09-10 20:21 [PATCH V1 00/16] IOREQ feature (+ virtio-mmio) on Arm Oleksandr Tyshchenko
2020-09-10 20:21 ` [PATCH V1 01/16] x86/ioreq: Prepare IOREQ feature for making it common Oleksandr Tyshchenko
2020-09-14 13:52   ` Jan Beulich
2020-09-21 12:22     ` Oleksandr
2020-09-21 12:31       ` Jan Beulich
2020-09-21 12:47         ` Oleksandr
2020-09-21 13:29           ` Jan Beulich
2020-09-21 14:43             ` Oleksandr
2020-09-21 15:28               ` Jan Beulich
2020-09-23 17:22   ` Julien Grall
2020-09-23 18:08     ` Oleksandr
2020-09-10 20:21 ` [PATCH V1 02/16] xen/ioreq: Make x86's IOREQ feature common Oleksandr Tyshchenko
2020-09-14 14:17   ` Jan Beulich
2020-09-21 19:02     ` Oleksandr
2020-09-22  6:33       ` Jan Beulich
2020-09-22  9:58         ` Oleksandr
2020-09-22 10:54           ` Jan Beulich
2020-09-22 15:05             ` Oleksandr
2020-09-22 15:52               ` Jan Beulich
2020-09-23 12:28                 ` Oleksandr
2020-09-24 10:58                   ` Jan Beulich
2020-09-24 15:38                     ` Oleksandr
2020-09-24 15:51                       ` Jan Beulich
2020-09-24 18:01   ` Julien Grall
2020-09-25  8:19     ` Paul Durrant
2020-09-30 13:39       ` Oleksandr
2020-09-30 17:47         ` Julien Grall
2020-10-01  6:59           ` Paul Durrant
2020-10-01  8:49           ` Jan Beulich
2020-10-01  8:50             ` Paul Durrant
2020-09-10 20:21 ` [PATCH V1 03/16] xen/ioreq: Make x86's hvm_ioreq_needs_completion() common Oleksandr Tyshchenko
2020-09-14 14:59   ` Jan Beulich
2020-09-22 16:16     ` Oleksandr
2020-09-23 17:27     ` Julien Grall
2020-09-10 20:21 ` [PATCH V1 04/16] xen/ioreq: Provide alias for the handle_mmio() Oleksandr Tyshchenko
2020-09-14 15:10   ` Jan Beulich
2020-09-22 16:20     ` Oleksandr
2020-09-23 17:28   ` Julien Grall
2020-09-23 18:17     ` Oleksandr
2020-09-10 20:21 ` [PATCH V1 05/16] xen/ioreq: Make x86's hvm_mmio_first(last)_byte() common Oleksandr Tyshchenko
2020-09-14 15:13   ` Jan Beulich
2020-09-22 16:24     ` Oleksandr
2020-09-10 20:22 ` [PATCH V1 06/16] xen/ioreq: Make x86's hvm_ioreq_(page/vcpu/server) structs common Oleksandr Tyshchenko
2020-09-14 15:16   ` Jan Beulich
2020-09-14 15:59     ` Julien Grall
2020-09-22 16:33     ` Oleksandr
2020-09-10 20:22 ` [PATCH V1 07/16] xen/dm: Make x86's DM feature common Oleksandr Tyshchenko
2020-09-14 15:56   ` Jan Beulich
2020-09-22 16:46     ` Oleksandr
2020-09-24 11:03       ` Jan Beulich
2020-09-24 12:47         ` Oleksandr
2020-09-23 17:35   ` Julien Grall
2020-09-23 18:28     ` Oleksandr
2020-09-10 20:22 ` [PATCH V1 08/16] xen/mm: Make x86's XENMEM_resource_ioreq_server handling common Oleksandr Tyshchenko
2020-09-10 20:22 ` [PATCH V1 09/16] arm/ioreq: Introduce arch specific bits for IOREQ/DM features Oleksandr Tyshchenko
2020-09-11 10:14   ` Oleksandr
2020-09-16  7:51   ` Jan Beulich
2020-09-22 17:12     ` Oleksandr
2020-09-23 18:03   ` Julien Grall
2020-09-23 20:16     ` Oleksandr
2020-09-24 11:08       ` Jan Beulich
2020-09-24 16:02         ` Oleksandr
2020-09-24 18:02           ` Oleksandr
2020-09-25  6:51             ` Jan Beulich
2020-09-25  9:47               ` Oleksandr
2020-09-26 13:12             ` Julien Grall
2020-09-26 13:18               ` Oleksandr
2020-09-24 16:51         ` Julien Grall
2020-09-24 17:25       ` Julien Grall
2020-09-24 18:22         ` Oleksandr
2020-09-26 13:21           ` Julien Grall
2020-09-26 14:57             ` Oleksandr
2020-09-10 20:22 ` [PATCH V1 10/16] xen/mm: Handle properly reference in set_foreign_p2m_entry() on Arm Oleksandr Tyshchenko
2020-09-16  7:17   ` Jan Beulich
2020-09-16  8:50     ` Julien Grall
2020-09-16  8:52       ` Jan Beulich
2020-09-16  8:55         ` Julien Grall
2020-09-22 17:30           ` Oleksandr
2020-09-16  8:08   ` Jan Beulich
2020-09-10 20:22 ` [PATCH V1 11/16] xen/ioreq: Introduce hvm_domain_has_ioreq_server() Oleksandr Tyshchenko
2020-09-16  8:04   ` Jan Beulich
2020-09-16  8:13     ` Paul Durrant
2020-09-16  8:39       ` Julien Grall
2020-09-16  8:43         ` Paul Durrant
2020-09-22 18:39           ` Oleksandr
2020-09-22 18:23     ` Oleksandr
2020-09-10 20:22 ` [PATCH V1 12/16] xen/dm: Introduce xendevicemodel_set_irq_level DM op Oleksandr Tyshchenko
2020-09-26 13:50   ` Julien Grall
2020-09-26 14:21     ` Oleksandr
2020-09-10 20:22 ` [PATCH V1 13/16] xen/ioreq: Make x86's invalidate qemu mapcache handling common Oleksandr Tyshchenko
2020-09-16  8:50   ` Jan Beulich
2020-09-22 19:32     ` Oleksandr
2020-09-24 11:16       ` Jan Beulich
2020-09-24 16:45         ` Oleksandr
2020-09-25  7:03           ` Jan Beulich
2020-09-25 13:05             ` Oleksandr
2020-10-02  9:55               ` Oleksandr
2020-10-07 10:38                 ` Julien Grall
2020-10-07 12:01                   ` Oleksandr
2020-09-10 20:22 ` [PATCH V1 14/16] xen/ioreq: Use guest_cmpxchg64() instead of cmpxchg() Oleksandr Tyshchenko
2020-09-16  9:04   ` Jan Beulich
2020-09-16  9:07     ` Julien Grall
2020-09-16  9:09       ` Paul Durrant
2020-09-16  9:12         ` Julien Grall
2020-09-22 20:05           ` Oleksandr
2020-09-23 18:12             ` Julien Grall
2020-09-23 20:29               ` Oleksandr
2020-09-16  9:07     ` Paul Durrant
2020-09-23 18:05   ` Julien Grall
2020-09-10 20:22 ` [PATCH V1 15/16] libxl: Introduce basic virtio-mmio support on Arm Oleksandr Tyshchenko
2020-09-10 20:22 ` [PATCH V1 16/16] [RFC] libxl: Add support for virtio-disk configuration Oleksandr Tyshchenko
