xen-devel.lists.xenproject.org archive mirror
* [RFC PATCH V1 00/12] IOREQ feature (+ virtio-mmio) on Arm
@ 2020-08-03 18:21 Oleksandr Tyshchenko
  2020-08-03 18:21 ` [RFC PATCH V1 01/12] hvm/ioreq: Make x86's IOREQ feature common Oleksandr Tyshchenko
                   ` (12 more replies)
  0 siblings, 13 replies; 140+ messages in thread
From: Oleksandr Tyshchenko @ 2020-08-03 18:21 UTC (permalink / raw)
  To: xen-devel
  Cc: Kevin Tian, Stefano Stabellini, Julien Grall, Jun Nakajima,
	Wei Liu, Paul Durrant, Andrew Cooper, Ian Jackson, George Dunlap,
	Tim Deegan, Oleksandr Tyshchenko, Jan Beulich, Anthony PERARD,
	Daniel De Graaf, Bertrand Marquis, Volodymyr Babchuk,
	Roger Pau Monné

From: Oleksandr Tyshchenko <oleksandr_tyshchenko@epam.com>

Hello all.

The purpose of this patch series is to add IOREQ/DM support to Xen on Arm.
You can find the initial discussion at [1]. Xen on Arm needs a mechanism to
forward guest MMIO accesses to a device model in order to implement a
virtio-mmio backend, or even a mediator, outside of the hypervisor. As Xen
on x86 already contains the required support, this patch series makes it
common and introduces the Arm-specific bits plus some new functionality.
The patch series is based on Julien's PoC
"xen/arm: Add support for Guest IO forwarding to a device emulator".
Besides splitting the existing IOREQ/DM support and introducing the Arm
side, the patch series also includes virtio-mmio related changes (toolstack)
so that reviewers can see what the whole picture could look like.
For a non-RFC, the IOREQ/DM and virtio-mmio support will be sent separately.

According to the initial discussion, there are a few open questions/concerns
regarding security and performance of the VirtIO solution:
1. virtio-mmio vs virtio-pci, SPI vs MSI; different use-cases require
   different transports...
2. a virtio backend is able to access all guest memory, so some kind of
   protection is needed: 'virtio-iommu in Xen' vs 'pre-shared-memory &
   memcpys in guest'
3. the interface between the toolstack and an 'out-of-qemu' virtio backend;
   avoid using Xenstore in the virtio backend if possible.
4. a lot of 'foreign mapping' could lead to memory exhaustion; Julien
   has some ideas regarding that.

All of them look valid and worth considering, but the first thing we need
on Arm is a mechanism to forward guest IO to a device emulator, so let's
focus on that first.

***

Patch series [2] was rebased on the Xen v4.14 release and tested on a Renesas
Salvator-X board + H3 ES3.0 SoC (Arm64) with a virtio-mmio disk backend
(we will share it later) running in a driver domain and an unmodified Linux
guest running on the existing virtio-blk driver (frontend). No issues were
observed. Guest domain 'reboot/destroy' use-cases work properly. The patch
series was only build-tested on x86.

Please note, the build test passed for the following modes:
1. x86: CONFIG_HVM=y / CONFIG_IOREQ_SERVER=y (default)
2. x86: #CONFIG_HVM is not set / #CONFIG_IOREQ_SERVER is not set
3. Arm64: CONFIG_HVM=y / CONFIG_IOREQ_SERVER=y (default)
4. Arm64: CONFIG_HVM=y / #CONFIG_IOREQ_SERVER is not set
5. Arm32: CONFIG_HVM=y / #CONFIG_IOREQ_SERVER is not set

The build test didn't pass for Arm32 with 'CONFIG_IOREQ_SERVER=y' due to the
lack of cmpxchg_64 support on Arm32 (see the cmpxchg usage in
hvm_send_buffered_ioreq()).

***

Any feedback/help would be highly appreciated.

[1] https://lists.xenproject.org/archives/html/xen-devel/2020-07/msg00825.html
[2] https://github.com/otyshchenko1/xen/commits/ioreq_4.14_ml1

Oleksandr Tyshchenko (12):
  hvm/ioreq: Make x86's IOREQ feature common
  hvm/dm: Make x86's DM feature common
  xen/mm: Make x86's XENMEM_resource_ioreq_server handling common
  xen/arm: Introduce arch specific bits for IOREQ/DM features
  hvm/dm: Introduce xendevicemodel_set_irq_level DM op
  libxl: Introduce basic virtio-mmio support on Arm
  A collection of tweaks to be able to run emulator in driver domain
  xen/arm: Invalidate qemu mapcache on XENMEM_decrease_reservation
  libxl: Handle virtio-mmio irq in more correct way
  libxl: Add support for virtio-disk configuration
  libxl: Insert "dma-coherent" property into virtio-mmio device node
  libxl: Fix duplicate memory node in DT

 tools/libs/devicemodel/core.c                   |   18 +
 tools/libs/devicemodel/include/xendevicemodel.h |    4 +
 tools/libs/devicemodel/libxendevicemodel.map    |    1 +
 tools/libxc/xc_dom_arm.c                        |   25 +-
 tools/libxl/Makefile                            |    4 +-
 tools/libxl/libxl_arm.c                         |   98 +-
 tools/libxl/libxl_create.c                      |    1 +
 tools/libxl/libxl_internal.h                    |    1 +
 tools/libxl/libxl_types.idl                     |   16 +
 tools/libxl/libxl_types_internal.idl            |    1 +
 tools/libxl/libxl_virtio_disk.c                 |  109 ++
 tools/xl/Makefile                               |    2 +-
 tools/xl/xl.h                                   |    3 +
 tools/xl/xl_cmdtable.c                          |   15 +
 tools/xl/xl_parse.c                             |  116 ++
 tools/xl/xl_virtio_disk.c                       |   46 +
 xen/arch/arm/Kconfig                            |    1 +
 xen/arch/arm/Makefile                           |    2 +
 xen/arch/arm/dm.c                               |   54 +
 xen/arch/arm/domain.c                           |    9 +
 xen/arch/arm/hvm.c                              |   46 +-
 xen/arch/arm/io.c                               |   67 +-
 xen/arch/arm/ioreq.c                            |  100 ++
 xen/arch/arm/traps.c                            |   23 +
 xen/arch/x86/Kconfig                            |    1 +
 xen/arch/x86/hvm/dm.c                           |  289 +----
 xen/arch/x86/hvm/emulate.c                      |    2 +-
 xen/arch/x86/hvm/hvm.c                          |    2 +-
 xen/arch/x86/hvm/io.c                           |    2 +-
 xen/arch/x86/hvm/ioreq.c                        | 1431 +----------------------
 xen/arch/x86/hvm/stdvga.c                       |    2 +-
 xen/arch/x86/hvm/vmx/realmode.c                 |    1 +
 xen/arch/x86/hvm/vmx/vvmx.c                     |    2 +-
 xen/arch/x86/mm.c                               |   45 -
 xen/arch/x86/mm/shadow/common.c                 |    2 +-
 xen/common/Kconfig                              |    3 +
 xen/common/Makefile                             |    1 +
 xen/common/domain.c                             |   15 +
 xen/common/domctl.c                             |    8 +-
 xen/common/event_channel.c                      |   14 +-
 xen/common/hvm/Makefile                         |    2 +
 xen/common/hvm/dm.c                             |  288 +++++
 xen/common/hvm/ioreq.c                          | 1430 ++++++++++++++++++++++
 xen/common/memory.c                             |   54 +-
 xen/include/asm-arm/domain.h                    |   82 ++
 xen/include/asm-arm/hvm/ioreq.h                 |  105 ++
 xen/include/asm-arm/mm.h                        |    8 -
 xen/include/asm-arm/mmio.h                      |    1 +
 xen/include/asm-arm/p2m.h                       |    7 +-
 xen/include/asm-x86/hvm/ioreq.h                 |   45 +-
 xen/include/asm-x86/hvm/vcpu.h                  |    7 -
 xen/include/asm-x86/mm.h                        |    4 -
 xen/include/public/hvm/dm_op.h                  |   15 +
 xen/include/xen/hvm/ioreq.h                     |   89 ++
 xen/include/xen/hypercall.h                     |   12 +
 xen/include/xsm/dummy.h                         |   20 +-
 xen/include/xsm/xsm.h                           |    6 +-
 xen/xsm/dummy.c                                 |    2 +-
 xen/xsm/flask/hooks.c                           |    5 +-
 59 files changed, 2958 insertions(+), 1806 deletions(-)
 create mode 100644 tools/libxl/libxl_virtio_disk.c
 create mode 100644 tools/xl/xl_virtio_disk.c
 create mode 100644 xen/arch/arm/dm.c
 create mode 100644 xen/arch/arm/ioreq.c
 create mode 100644 xen/common/hvm/Makefile
 create mode 100644 xen/common/hvm/dm.c
 create mode 100644 xen/common/hvm/ioreq.c
 create mode 100644 xen/include/asm-arm/hvm/ioreq.h
 create mode 100644 xen/include/xen/hvm/ioreq.h

-- 
2.7.4




* [RFC PATCH V1 01/12] hvm/ioreq: Make x86's IOREQ feature common
  2020-08-03 18:21 [RFC PATCH V1 00/12] IOREQ feature (+ virtio-mmio) on Arm Oleksandr Tyshchenko
@ 2020-08-03 18:21 ` Oleksandr Tyshchenko
  2020-08-04  7:45   ` Paul Durrant
                     ` (3 more replies)
  2020-08-03 18:21 ` [RFC PATCH V1 02/12] hvm/dm: Make x86's DM " Oleksandr Tyshchenko
                   ` (11 subsequent siblings)
  12 siblings, 4 replies; 140+ messages in thread
From: Oleksandr Tyshchenko @ 2020-08-03 18:21 UTC (permalink / raw)
  To: xen-devel
  Cc: Kevin Tian, Stefano Stabellini, Julien Grall, Jun Nakajima,
	Wei Liu, Paul Durrant, Andrew Cooper, Ian Jackson, George Dunlap,
	Tim Deegan, Oleksandr Tyshchenko, Julien Grall, Jan Beulich,
	Roger Pau Monné

From: Oleksandr Tyshchenko <oleksandr_tyshchenko@epam.com>

As a lot of x86 code can be re-used on Arm later on, this patch
splits IOREQ support into common and arch-specific parts.

This support is going to be used on Arm to be able to run a device
emulator outside of the Xen hypervisor.

Please note, this is a split/cleanup of Julien's PoC:
"Add support for Guest IO forwarding to a device emulator"

Signed-off-by: Julien Grall <julien.grall@arm.com>
Signed-off-by: Oleksandr Tyshchenko <oleksandr_tyshchenko@epam.com>
---
 xen/arch/x86/Kconfig            |    1 +
 xen/arch/x86/hvm/dm.c           |    2 +-
 xen/arch/x86/hvm/emulate.c      |    2 +-
 xen/arch/x86/hvm/hvm.c          |    2 +-
 xen/arch/x86/hvm/io.c           |    2 +-
 xen/arch/x86/hvm/ioreq.c        | 1431 +--------------------------------------
 xen/arch/x86/hvm/stdvga.c       |    2 +-
 xen/arch/x86/hvm/vmx/realmode.c |    1 +
 xen/arch/x86/hvm/vmx/vvmx.c     |    2 +-
 xen/arch/x86/mm.c               |    2 +-
 xen/arch/x86/mm/shadow/common.c |    2 +-
 xen/common/Kconfig              |    3 +
 xen/common/Makefile             |    1 +
 xen/common/hvm/Makefile         |    1 +
 xen/common/hvm/ioreq.c          | 1430 ++++++++++++++++++++++++++++++++++++++
 xen/include/asm-x86/hvm/ioreq.h |   45 +-
 xen/include/asm-x86/hvm/vcpu.h  |    7 -
 xen/include/xen/hvm/ioreq.h     |   89 +++
 18 files changed, 1575 insertions(+), 1450 deletions(-)
 create mode 100644 xen/common/hvm/Makefile
 create mode 100644 xen/common/hvm/ioreq.c
 create mode 100644 xen/include/xen/hvm/ioreq.h

diff --git a/xen/arch/x86/Kconfig b/xen/arch/x86/Kconfig
index a636a4b..f5a9f87 100644
--- a/xen/arch/x86/Kconfig
+++ b/xen/arch/x86/Kconfig
@@ -91,6 +91,7 @@ config PV_LINEAR_PT
 
 config HVM
 	def_bool !PV_SHIM_EXCLUSIVE
+	select IOREQ_SERVER
 	prompt "HVM support"
 	---help---
 	  Interfaces to support HVM domains.  HVM domains require hardware
diff --git a/xen/arch/x86/hvm/dm.c b/xen/arch/x86/hvm/dm.c
index e3f8451..70adb27 100644
--- a/xen/arch/x86/hvm/dm.c
+++ b/xen/arch/x86/hvm/dm.c
@@ -16,13 +16,13 @@
 
 #include <xen/event.h>
 #include <xen/guest_access.h>
+#include <xen/hvm/ioreq.h>
 #include <xen/hypercall.h>
 #include <xen/nospec.h>
 #include <xen/sched.h>
 
 #include <asm/hap.h>
 #include <asm/hvm/cacheattr.h>
-#include <asm/hvm/ioreq.h>
 #include <asm/shadow.h>
 
 #include <xsm/xsm.h>
diff --git a/xen/arch/x86/hvm/emulate.c b/xen/arch/x86/hvm/emulate.c
index 8b4e73a..78993b3 100644
--- a/xen/arch/x86/hvm/emulate.c
+++ b/xen/arch/x86/hvm/emulate.c
@@ -9,6 +9,7 @@
  *    Keir Fraser <keir@xen.org>
  */
 
+#include <xen/hvm/ioreq.h>
 #include <xen/init.h>
 #include <xen/lib.h>
 #include <xen/sched.h>
@@ -20,7 +21,6 @@
 #include <asm/xstate.h>
 #include <asm/hvm/emulate.h>
 #include <asm/hvm/hvm.h>
-#include <asm/hvm/ioreq.h>
 #include <asm/hvm/monitor.h>
 #include <asm/hvm/trace.h>
 #include <asm/hvm/support.h>
diff --git a/xen/arch/x86/hvm/hvm.c b/xen/arch/x86/hvm/hvm.c
index 5bb4758..c05025d 100644
--- a/xen/arch/x86/hvm/hvm.c
+++ b/xen/arch/x86/hvm/hvm.c
@@ -19,6 +19,7 @@
  */
 
 #include <xen/ctype.h>
+#include <xen/hvm/ioreq.h>
 #include <xen/init.h>
 #include <xen/lib.h>
 #include <xen/trace.h>
@@ -64,7 +65,6 @@
 #include <asm/hvm/trace.h>
 #include <asm/hvm/nestedhvm.h>
 #include <asm/hvm/monitor.h>
-#include <asm/hvm/ioreq.h>
 #include <asm/hvm/viridian.h>
 #include <asm/hvm/vm_event.h>
 #include <asm/altp2m.h>
diff --git a/xen/arch/x86/hvm/io.c b/xen/arch/x86/hvm/io.c
index 724ab44..5d501d1 100644
--- a/xen/arch/x86/hvm/io.c
+++ b/xen/arch/x86/hvm/io.c
@@ -18,6 +18,7 @@
  * this program; If not, see <http://www.gnu.org/licenses/>.
  */
 
+#include <xen/hvm/ioreq.h>
 #include <xen/init.h>
 #include <xen/mm.h>
 #include <xen/lib.h>
@@ -35,7 +36,6 @@
 #include <asm/shadow.h>
 #include <asm/p2m.h>
 #include <asm/hvm/hvm.h>
-#include <asm/hvm/ioreq.h>
 #include <asm/hvm/support.h>
 #include <asm/hvm/vpt.h>
 #include <asm/hvm/vpic.h>
diff --git a/xen/arch/x86/hvm/ioreq.c b/xen/arch/x86/hvm/ioreq.c
index 7240070..dd21e85 100644
--- a/xen/arch/x86/hvm/ioreq.c
+++ b/xen/arch/x86/hvm/ioreq.c
@@ -17,6 +17,7 @@
  */
 
 #include <xen/ctype.h>
+#include <xen/hvm/ioreq.h>
 #include <xen/init.h>
 #include <xen/lib.h>
 #include <xen/trace.h>
@@ -28,1069 +29,16 @@
 #include <xen/paging.h>
 #include <xen/vpci.h>
 
-#include <asm/hvm/emulate.h>
-#include <asm/hvm/hvm.h>
-#include <asm/hvm/ioreq.h>
-#include <asm/hvm/vmx/vmx.h>
-
-#include <public/hvm/ioreq.h>
-#include <public/hvm/params.h>
-
-static void set_ioreq_server(struct domain *d, unsigned int id,
-                             struct hvm_ioreq_server *s)
-{
-    ASSERT(id < MAX_NR_IOREQ_SERVERS);
-    ASSERT(!s || !d->arch.hvm.ioreq_server.server[id]);
-
-    d->arch.hvm.ioreq_server.server[id] = s;
-}
-
-#define GET_IOREQ_SERVER(d, id) \
-    (d)->arch.hvm.ioreq_server.server[id]
-
-static struct hvm_ioreq_server *get_ioreq_server(const struct domain *d,
-                                                 unsigned int id)
-{
-    if ( id >= MAX_NR_IOREQ_SERVERS )
-        return NULL;
-
-    return GET_IOREQ_SERVER(d, id);
-}
-
-/*
- * Iterate over all possible ioreq servers.
- *
- * NOTE: The iteration is backwards such that more recently created
- *       ioreq servers are favoured in hvm_select_ioreq_server().
- *       This is a semantic that previously existed when ioreq servers
- *       were held in a linked list.
- */
-#define FOR_EACH_IOREQ_SERVER(d, id, s) \
-    for ( (id) = MAX_NR_IOREQ_SERVERS; (id) != 0; ) \
-        if ( !(s = GET_IOREQ_SERVER(d, --(id))) ) \
-            continue; \
-        else
-
-static ioreq_t *get_ioreq(struct hvm_ioreq_server *s, struct vcpu *v)
-{
-    shared_iopage_t *p = s->ioreq.va;
-
-    ASSERT((v == current) || !vcpu_runnable(v));
-    ASSERT(p != NULL);
-
-    return &p->vcpu_ioreq[v->vcpu_id];
-}
-
-bool hvm_io_pending(struct vcpu *v)
-{
-    struct domain *d = v->domain;
-    struct hvm_ioreq_server *s;
-    unsigned int id;
-
-    FOR_EACH_IOREQ_SERVER(d, id, s)
-    {
-        struct hvm_ioreq_vcpu *sv;
-
-        list_for_each_entry ( sv,
-                              &s->ioreq_vcpu_list,
-                              list_entry )
-        {
-            if ( sv->vcpu == v && sv->pending )
-                return true;
-        }
-    }
-
-    return false;
-}
-
-static void hvm_io_assist(struct hvm_ioreq_vcpu *sv, uint64_t data)
-{
-    struct vcpu *v = sv->vcpu;
-    ioreq_t *ioreq = &v->arch.hvm.hvm_io.io_req;
-
-    if ( hvm_ioreq_needs_completion(ioreq) )
-        ioreq->data = data;
-
-    sv->pending = false;
-}
-
-static bool hvm_wait_for_io(struct hvm_ioreq_vcpu *sv, ioreq_t *p)
-{
-    unsigned int prev_state = STATE_IOREQ_NONE;
-
-    while ( sv->pending )
-    {
-        unsigned int state = p->state;
-
-        smp_rmb();
-
-    recheck:
-        if ( unlikely(state == STATE_IOREQ_NONE) )
-        {
-            /*
-             * The only reason we should see this case is when an
-             * emulator is dying and it races with an I/O being
-             * requested.
-             */
-            hvm_io_assist(sv, ~0ul);
-            break;
-        }
-
-        if ( unlikely(state < prev_state) )
-        {
-            gdprintk(XENLOG_ERR, "Weird HVM ioreq state transition %u -> %u\n",
-                     prev_state, state);
-            sv->pending = false;
-            domain_crash(sv->vcpu->domain);
-            return false; /* bail */
-        }
-
-        switch ( prev_state = state )
-        {
-        case STATE_IORESP_READY: /* IORESP_READY -> NONE */
-            p->state = STATE_IOREQ_NONE;
-            hvm_io_assist(sv, p->data);
-            break;
-        case STATE_IOREQ_READY:  /* IOREQ_{READY,INPROCESS} -> IORESP_READY */
-        case STATE_IOREQ_INPROCESS:
-            wait_on_xen_event_channel(sv->ioreq_evtchn,
-                                      ({ state = p->state;
-                                         smp_rmb();
-                                         state != prev_state; }));
-            goto recheck;
-        default:
-            gdprintk(XENLOG_ERR, "Weird HVM iorequest state %u\n", state);
-            sv->pending = false;
-            domain_crash(sv->vcpu->domain);
-            return false; /* bail */
-        }
-    }
-
-    return true;
-}
-
-bool handle_hvm_io_completion(struct vcpu *v)
-{
-    struct domain *d = v->domain;
-    struct hvm_vcpu_io *vio = &v->arch.hvm.hvm_io;
-    struct hvm_ioreq_server *s;
-    enum hvm_io_completion io_completion;
-    unsigned int id;
-
-    if ( has_vpci(d) && vpci_process_pending(v) )
-    {
-        raise_softirq(SCHEDULE_SOFTIRQ);
-        return false;
-    }
-
-    FOR_EACH_IOREQ_SERVER(d, id, s)
-    {
-        struct hvm_ioreq_vcpu *sv;
-
-        list_for_each_entry ( sv,
-                              &s->ioreq_vcpu_list,
-                              list_entry )
-        {
-            if ( sv->vcpu == v && sv->pending )
-            {
-                if ( !hvm_wait_for_io(sv, get_ioreq(s, v)) )
-                    return false;
-
-                break;
-            }
-        }
-    }
-
-    vio->io_req.state = hvm_ioreq_needs_completion(&vio->io_req) ?
-        STATE_IORESP_READY : STATE_IOREQ_NONE;
-
-    msix_write_completion(v);
-    vcpu_end_shutdown_deferral(v);
-
-    io_completion = vio->io_completion;
-    vio->io_completion = HVMIO_no_completion;
-
-    switch ( io_completion )
-    {
-    case HVMIO_no_completion:
-        break;
-
-    case HVMIO_mmio_completion:
-        return handle_mmio();
-
-    case HVMIO_pio_completion:
-        return handle_pio(vio->io_req.addr, vio->io_req.size,
-                          vio->io_req.dir);
-
-    case HVMIO_realmode_completion:
-    {
-        struct hvm_emulate_ctxt ctxt;
-
-        hvm_emulate_init_once(&ctxt, NULL, guest_cpu_user_regs());
-        vmx_realmode_emulate_one(&ctxt);
-        hvm_emulate_writeback(&ctxt);
-
-        break;
-    }
-    default:
-        ASSERT_UNREACHABLE();
-        break;
-    }
-
-    return true;
-}
-
-static gfn_t hvm_alloc_legacy_ioreq_gfn(struct hvm_ioreq_server *s)
-{
-    struct domain *d = s->target;
-    unsigned int i;
-
-    BUILD_BUG_ON(HVM_PARAM_BUFIOREQ_PFN != HVM_PARAM_IOREQ_PFN + 1);
-
-    for ( i = HVM_PARAM_IOREQ_PFN; i <= HVM_PARAM_BUFIOREQ_PFN; i++ )
-    {
-        if ( !test_and_clear_bit(i, &d->arch.hvm.ioreq_gfn.legacy_mask) )
-            return _gfn(d->arch.hvm.params[i]);
-    }
-
-    return INVALID_GFN;
-}
-
-static gfn_t hvm_alloc_ioreq_gfn(struct hvm_ioreq_server *s)
-{
-    struct domain *d = s->target;
-    unsigned int i;
-
-    for ( i = 0; i < sizeof(d->arch.hvm.ioreq_gfn.mask) * 8; i++ )
-    {
-        if ( test_and_clear_bit(i, &d->arch.hvm.ioreq_gfn.mask) )
-            return _gfn(d->arch.hvm.ioreq_gfn.base + i);
-    }
-
-    /*
-     * If we are out of 'normal' GFNs then we may still have a 'legacy'
-     * GFN available.
-     */
-    return hvm_alloc_legacy_ioreq_gfn(s);
-}
-
-static bool hvm_free_legacy_ioreq_gfn(struct hvm_ioreq_server *s,
-                                      gfn_t gfn)
-{
-    struct domain *d = s->target;
-    unsigned int i;
-
-    for ( i = HVM_PARAM_IOREQ_PFN; i <= HVM_PARAM_BUFIOREQ_PFN; i++ )
-    {
-        if ( gfn_eq(gfn, _gfn(d->arch.hvm.params[i])) )
-             break;
-    }
-    if ( i > HVM_PARAM_BUFIOREQ_PFN )
-        return false;
-
-    set_bit(i, &d->arch.hvm.ioreq_gfn.legacy_mask);
-    return true;
-}
-
-static void hvm_free_ioreq_gfn(struct hvm_ioreq_server *s, gfn_t gfn)
-{
-    struct domain *d = s->target;
-    unsigned int i = gfn_x(gfn) - d->arch.hvm.ioreq_gfn.base;
-
-    ASSERT(!gfn_eq(gfn, INVALID_GFN));
-
-    if ( !hvm_free_legacy_ioreq_gfn(s, gfn) )
-    {
-        ASSERT(i < sizeof(d->arch.hvm.ioreq_gfn.mask) * 8);
-        set_bit(i, &d->arch.hvm.ioreq_gfn.mask);
-    }
-}
-
-static void hvm_unmap_ioreq_gfn(struct hvm_ioreq_server *s, bool buf)
-{
-    struct hvm_ioreq_page *iorp = buf ? &s->bufioreq : &s->ioreq;
-
-    if ( gfn_eq(iorp->gfn, INVALID_GFN) )
-        return;
-
-    destroy_ring_for_helper(&iorp->va, iorp->page);
-    iorp->page = NULL;
-
-    hvm_free_ioreq_gfn(s, iorp->gfn);
-    iorp->gfn = INVALID_GFN;
-}
-
-static int hvm_map_ioreq_gfn(struct hvm_ioreq_server *s, bool buf)
-{
-    struct domain *d = s->target;
-    struct hvm_ioreq_page *iorp = buf ? &s->bufioreq : &s->ioreq;
-    int rc;
-
-    if ( iorp->page )
-    {
-        /*
-         * If a page has already been allocated (which will happen on
-         * demand if hvm_get_ioreq_server_frame() is called), then
-         * mapping a guest frame is not permitted.
-         */
-        if ( gfn_eq(iorp->gfn, INVALID_GFN) )
-            return -EPERM;
-
-        return 0;
-    }
-
-    if ( d->is_dying )
-        return -EINVAL;
-
-    iorp->gfn = hvm_alloc_ioreq_gfn(s);
-
-    if ( gfn_eq(iorp->gfn, INVALID_GFN) )
-        return -ENOMEM;
-
-    rc = prepare_ring_for_helper(d, gfn_x(iorp->gfn), &iorp->page,
-                                 &iorp->va);
-
-    if ( rc )
-        hvm_unmap_ioreq_gfn(s, buf);
-
-    return rc;
-}
-
-static int hvm_alloc_ioreq_mfn(struct hvm_ioreq_server *s, bool buf)
-{
-    struct hvm_ioreq_page *iorp = buf ? &s->bufioreq : &s->ioreq;
-    struct page_info *page;
-
-    if ( iorp->page )
-    {
-        /*
-         * If a guest frame has already been mapped (which may happen
-         * on demand if hvm_get_ioreq_server_info() is called), then
-         * allocating a page is not permitted.
-         */
-        if ( !gfn_eq(iorp->gfn, INVALID_GFN) )
-            return -EPERM;
-
-        return 0;
-    }
-
-    page = alloc_domheap_page(s->target, MEMF_no_refcount);
-
-    if ( !page )
-        return -ENOMEM;
-
-    if ( !get_page_and_type(page, s->target, PGT_writable_page) )
-    {
-        /*
-         * The domain can't possibly know about this page yet, so failure
-         * here is a clear indication of something fishy going on.
-         */
-        domain_crash(s->emulator);
-        return -ENODATA;
-    }
-
-    iorp->va = __map_domain_page_global(page);
-    if ( !iorp->va )
-        goto fail;
-
-    iorp->page = page;
-    clear_page(iorp->va);
-    return 0;
-
- fail:
-    put_page_alloc_ref(page);
-    put_page_and_type(page);
-
-    return -ENOMEM;
-}
-
-static void hvm_free_ioreq_mfn(struct hvm_ioreq_server *s, bool buf)
-{
-    struct hvm_ioreq_page *iorp = buf ? &s->bufioreq : &s->ioreq;
-    struct page_info *page = iorp->page;
-
-    if ( !page )
-        return;
-
-    iorp->page = NULL;
-
-    unmap_domain_page_global(iorp->va);
-    iorp->va = NULL;
-
-    put_page_alloc_ref(page);
-    put_page_and_type(page);
-}
-
-bool is_ioreq_server_page(struct domain *d, const struct page_info *page)
-{
-    const struct hvm_ioreq_server *s;
-    unsigned int id;
-    bool found = false;
-
-    spin_lock_recursive(&d->arch.hvm.ioreq_server.lock);
-
-    FOR_EACH_IOREQ_SERVER(d, id, s)
-    {
-        if ( (s->ioreq.page == page) || (s->bufioreq.page == page) )
-        {
-            found = true;
-            break;
-        }
-    }
-
-    spin_unlock_recursive(&d->arch.hvm.ioreq_server.lock);
-
-    return found;
-}
-
-static void hvm_remove_ioreq_gfn(struct hvm_ioreq_server *s, bool buf)
-
-{
-    struct domain *d = s->target;
-    struct hvm_ioreq_page *iorp = buf ? &s->bufioreq : &s->ioreq;
-
-    if ( gfn_eq(iorp->gfn, INVALID_GFN) )
-        return;
-
-    if ( guest_physmap_remove_page(d, iorp->gfn,
-                                   page_to_mfn(iorp->page), 0) )
-        domain_crash(d);
-    clear_page(iorp->va);
-}
-
-static int hvm_add_ioreq_gfn(struct hvm_ioreq_server *s, bool buf)
-{
-    struct domain *d = s->target;
-    struct hvm_ioreq_page *iorp = buf ? &s->bufioreq : &s->ioreq;
-    int rc;
-
-    if ( gfn_eq(iorp->gfn, INVALID_GFN) )
-        return 0;
-
-    clear_page(iorp->va);
-
-    rc = guest_physmap_add_page(d, iorp->gfn,
-                                page_to_mfn(iorp->page), 0);
-    if ( rc == 0 )
-        paging_mark_pfn_dirty(d, _pfn(gfn_x(iorp->gfn)));
-
-    return rc;
-}
-
-static void hvm_update_ioreq_evtchn(struct hvm_ioreq_server *s,
-                                    struct hvm_ioreq_vcpu *sv)
-{
-    ASSERT(spin_is_locked(&s->lock));
-
-    if ( s->ioreq.va != NULL )
-    {
-        ioreq_t *p = get_ioreq(s, sv->vcpu);
-
-        p->vp_eport = sv->ioreq_evtchn;
-    }
-}
-
-#define HANDLE_BUFIOREQ(s) \
-    ((s)->bufioreq_handling != HVM_IOREQSRV_BUFIOREQ_OFF)
-
-static int hvm_ioreq_server_add_vcpu(struct hvm_ioreq_server *s,
-                                     struct vcpu *v)
-{
-    struct hvm_ioreq_vcpu *sv;
-    int rc;
-
-    sv = xzalloc(struct hvm_ioreq_vcpu);
-
-    rc = -ENOMEM;
-    if ( !sv )
-        goto fail1;
-
-    spin_lock(&s->lock);
-
-    rc = alloc_unbound_xen_event_channel(v->domain, v->vcpu_id,
-                                         s->emulator->domain_id, NULL);
-    if ( rc < 0 )
-        goto fail2;
-
-    sv->ioreq_evtchn = rc;
-
-    if ( v->vcpu_id == 0 && HANDLE_BUFIOREQ(s) )
-    {
-        rc = alloc_unbound_xen_event_channel(v->domain, 0,
-                                             s->emulator->domain_id, NULL);
-        if ( rc < 0 )
-            goto fail3;
-
-        s->bufioreq_evtchn = rc;
-    }
-
-    sv->vcpu = v;
-
-    list_add(&sv->list_entry, &s->ioreq_vcpu_list);
-
-    if ( s->enabled )
-        hvm_update_ioreq_evtchn(s, sv);
-
-    spin_unlock(&s->lock);
-    return 0;
-
- fail3:
-    free_xen_event_channel(v->domain, sv->ioreq_evtchn);
-
- fail2:
-    spin_unlock(&s->lock);
-    xfree(sv);
-
- fail1:
-    return rc;
-}
-
-static void hvm_ioreq_server_remove_vcpu(struct hvm_ioreq_server *s,
-                                         struct vcpu *v)
-{
-    struct hvm_ioreq_vcpu *sv;
-
-    spin_lock(&s->lock);
-
-    list_for_each_entry ( sv,
-                          &s->ioreq_vcpu_list,
-                          list_entry )
-    {
-        if ( sv->vcpu != v )
-            continue;
-
-        list_del(&sv->list_entry);
-
-        if ( v->vcpu_id == 0 && HANDLE_BUFIOREQ(s) )
-            free_xen_event_channel(v->domain, s->bufioreq_evtchn);
-
-        free_xen_event_channel(v->domain, sv->ioreq_evtchn);
-
-        xfree(sv);
-        break;
-    }
-
-    spin_unlock(&s->lock);
-}
-
-static void hvm_ioreq_server_remove_all_vcpus(struct hvm_ioreq_server *s)
-{
-    struct hvm_ioreq_vcpu *sv, *next;
-
-    spin_lock(&s->lock);
-
-    list_for_each_entry_safe ( sv,
-                               next,
-                               &s->ioreq_vcpu_list,
-                               list_entry )
-    {
-        struct vcpu *v = sv->vcpu;
-
-        list_del(&sv->list_entry);
-
-        if ( v->vcpu_id == 0 && HANDLE_BUFIOREQ(s) )
-            free_xen_event_channel(v->domain, s->bufioreq_evtchn);
-
-        free_xen_event_channel(v->domain, sv->ioreq_evtchn);
-
-        xfree(sv);
-    }
-
-    spin_unlock(&s->lock);
-}
-
-static int hvm_ioreq_server_map_pages(struct hvm_ioreq_server *s)
-{
-    int rc;
-
-    rc = hvm_map_ioreq_gfn(s, false);
-
-    if ( !rc && HANDLE_BUFIOREQ(s) )
-        rc = hvm_map_ioreq_gfn(s, true);
-
-    if ( rc )
-        hvm_unmap_ioreq_gfn(s, false);
-
-    return rc;
-}
-
-static void hvm_ioreq_server_unmap_pages(struct hvm_ioreq_server *s)
-{
-    hvm_unmap_ioreq_gfn(s, true);
-    hvm_unmap_ioreq_gfn(s, false);
-}
-
-static int hvm_ioreq_server_alloc_pages(struct hvm_ioreq_server *s)
-{
-    int rc;
-
-    rc = hvm_alloc_ioreq_mfn(s, false);
-
-    if ( !rc && (s->bufioreq_handling != HVM_IOREQSRV_BUFIOREQ_OFF) )
-        rc = hvm_alloc_ioreq_mfn(s, true);
-
-    if ( rc )
-        hvm_free_ioreq_mfn(s, false);
-
-    return rc;
-}
-
-static void hvm_ioreq_server_free_pages(struct hvm_ioreq_server *s)
-{
-    hvm_free_ioreq_mfn(s, true);
-    hvm_free_ioreq_mfn(s, false);
-}
-
-static void hvm_ioreq_server_free_rangesets(struct hvm_ioreq_server *s)
-{
-    unsigned int i;
-
-    for ( i = 0; i < NR_IO_RANGE_TYPES; i++ )
-        rangeset_destroy(s->range[i]);
-}
-
-static int hvm_ioreq_server_alloc_rangesets(struct hvm_ioreq_server *s,
-                                            ioservid_t id)
-{
-    unsigned int i;
-    int rc;
-
-    for ( i = 0; i < NR_IO_RANGE_TYPES; i++ )
-    {
-        char *name;
-
-        rc = asprintf(&name, "ioreq_server %d %s", id,
-                      (i == XEN_DMOP_IO_RANGE_PORT) ? "port" :
-                      (i == XEN_DMOP_IO_RANGE_MEMORY) ? "memory" :
-                      (i == XEN_DMOP_IO_RANGE_PCI) ? "pci" :
-                      "");
-        if ( rc )
-            goto fail;
-
-        s->range[i] = rangeset_new(s->target, name,
-                                   RANGESETF_prettyprint_hex);
-
-        xfree(name);
-
-        rc = -ENOMEM;
-        if ( !s->range[i] )
-            goto fail;
-
-        rangeset_limit(s->range[i], MAX_NR_IO_RANGES);
-    }
-
-    return 0;
-
- fail:
-    hvm_ioreq_server_free_rangesets(s);
-
-    return rc;
-}
-
-static void hvm_ioreq_server_enable(struct hvm_ioreq_server *s)
-{
-    struct hvm_ioreq_vcpu *sv;
-
-    spin_lock(&s->lock);
-
-    if ( s->enabled )
-        goto done;
-
-    hvm_remove_ioreq_gfn(s, false);
-    hvm_remove_ioreq_gfn(s, true);
-
-    s->enabled = true;
-
-    list_for_each_entry ( sv,
-                          &s->ioreq_vcpu_list,
-                          list_entry )
-        hvm_update_ioreq_evtchn(s, sv);
-
-  done:
-    spin_unlock(&s->lock);
-}
-
-static void hvm_ioreq_server_disable(struct hvm_ioreq_server *s)
-{
-    spin_lock(&s->lock);
-
-    if ( !s->enabled )
-        goto done;
-
-    hvm_add_ioreq_gfn(s, true);
-    hvm_add_ioreq_gfn(s, false);
-
-    s->enabled = false;
-
- done:
-    spin_unlock(&s->lock);
-}
-
-static int hvm_ioreq_server_init(struct hvm_ioreq_server *s,
-                                 struct domain *d, int bufioreq_handling,
-                                 ioservid_t id)
-{
-    struct domain *currd = current->domain;
-    struct vcpu *v;
-    int rc;
-
-    s->target = d;
-
-    get_knownalive_domain(currd);
-    s->emulator = currd;
-
-    spin_lock_init(&s->lock);
-    INIT_LIST_HEAD(&s->ioreq_vcpu_list);
-    spin_lock_init(&s->bufioreq_lock);
-
-    s->ioreq.gfn = INVALID_GFN;
-    s->bufioreq.gfn = INVALID_GFN;
-
-    rc = hvm_ioreq_server_alloc_rangesets(s, id);
-    if ( rc )
-        return rc;
-
-    s->bufioreq_handling = bufioreq_handling;
-
-    for_each_vcpu ( d, v )
-    {
-        rc = hvm_ioreq_server_add_vcpu(s, v);
-        if ( rc )
-            goto fail_add;
-    }
-
-    return 0;
-
- fail_add:
-    hvm_ioreq_server_remove_all_vcpus(s);
-    hvm_ioreq_server_unmap_pages(s);
-
-    hvm_ioreq_server_free_rangesets(s);
-
-    put_domain(s->emulator);
-    return rc;
-}
-
-static void hvm_ioreq_server_deinit(struct hvm_ioreq_server *s)
-{
-    ASSERT(!s->enabled);
-    hvm_ioreq_server_remove_all_vcpus(s);
-
-    /*
-     * NOTE: It is safe to call both hvm_ioreq_server_unmap_pages() and
-     *       hvm_ioreq_server_free_pages() in that order.
-     *       This is because the former will do nothing if the pages
-     *       are not mapped, leaving the page to be freed by the latter.
-     *       However if the pages are mapped then the former will set
-     *       the page_info pointer to NULL, meaning the latter will do
-     *       nothing.
-     */
-    hvm_ioreq_server_unmap_pages(s);
-    hvm_ioreq_server_free_pages(s);
-
-    hvm_ioreq_server_free_rangesets(s);
-
-    put_domain(s->emulator);
-}
-
-int hvm_create_ioreq_server(struct domain *d, int bufioreq_handling,
-                            ioservid_t *id)
-{
-    struct hvm_ioreq_server *s;
-    unsigned int i;
-    int rc;
-
-    if ( bufioreq_handling > HVM_IOREQSRV_BUFIOREQ_ATOMIC )
-        return -EINVAL;
-
-    s = xzalloc(struct hvm_ioreq_server);
-    if ( !s )
-        return -ENOMEM;
-
-    domain_pause(d);
-    spin_lock_recursive(&d->arch.hvm.ioreq_server.lock);
-
-    for ( i = 0; i < MAX_NR_IOREQ_SERVERS; i++ )
-    {
-        if ( !GET_IOREQ_SERVER(d, i) )
-            break;
-    }
-
-    rc = -ENOSPC;
-    if ( i >= MAX_NR_IOREQ_SERVERS )
-        goto fail;
-
-    /*
-     * It is safe to call set_ioreq_server() prior to
-     * hvm_ioreq_server_init() since the target domain is paused.
-     */
-    set_ioreq_server(d, i, s);
-
-    rc = hvm_ioreq_server_init(s, d, bufioreq_handling, i);
-    if ( rc )
-    {
-        set_ioreq_server(d, i, NULL);
-        goto fail;
-    }
-
-    if ( id )
-        *id = i;
-
-    spin_unlock_recursive(&d->arch.hvm.ioreq_server.lock);
-    domain_unpause(d);
-
-    return 0;
-
- fail:
-    spin_unlock_recursive(&d->arch.hvm.ioreq_server.lock);
-    domain_unpause(d);
-
-    xfree(s);
-    return rc;
-}
-
-int hvm_destroy_ioreq_server(struct domain *d, ioservid_t id)
-{
-    struct hvm_ioreq_server *s;
-    int rc;
-
-    spin_lock_recursive(&d->arch.hvm.ioreq_server.lock);
-
-    s = get_ioreq_server(d, id);
-
-    rc = -ENOENT;
-    if ( !s )
-        goto out;
-
-    rc = -EPERM;
-    if ( s->emulator != current->domain )
-        goto out;
-
-    domain_pause(d);
-
-    p2m_set_ioreq_server(d, 0, s);
-
-    hvm_ioreq_server_disable(s);
-
-    /*
-     * It is safe to call hvm_ioreq_server_deinit() prior to
-     * set_ioreq_server() since the target domain is paused.
-     */
-    hvm_ioreq_server_deinit(s);
-    set_ioreq_server(d, id, NULL);
-
-    domain_unpause(d);
-
-    xfree(s);
-
-    rc = 0;
-
- out:
-    spin_unlock_recursive(&d->arch.hvm.ioreq_server.lock);
-
-    return rc;
-}
-
-int hvm_get_ioreq_server_info(struct domain *d, ioservid_t id,
-                              unsigned long *ioreq_gfn,
-                              unsigned long *bufioreq_gfn,
-                              evtchn_port_t *bufioreq_port)
-{
-    struct hvm_ioreq_server *s;
-    int rc;
-
-    spin_lock_recursive(&d->arch.hvm.ioreq_server.lock);
-
-    s = get_ioreq_server(d, id);
-
-    rc = -ENOENT;
-    if ( !s )
-        goto out;
-
-    rc = -EPERM;
-    if ( s->emulator != current->domain )
-        goto out;
-
-    if ( ioreq_gfn || bufioreq_gfn )
-    {
-        rc = hvm_ioreq_server_map_pages(s);
-        if ( rc )
-            goto out;
-    }
-
-    if ( ioreq_gfn )
-        *ioreq_gfn = gfn_x(s->ioreq.gfn);
-
-    if ( HANDLE_BUFIOREQ(s) )
-    {
-        if ( bufioreq_gfn )
-            *bufioreq_gfn = gfn_x(s->bufioreq.gfn);
-
-        if ( bufioreq_port )
-            *bufioreq_port = s->bufioreq_evtchn;
-    }
-
-    rc = 0;
-
- out:
-    spin_unlock_recursive(&d->arch.hvm.ioreq_server.lock);
-
-    return rc;
-}
-
-int hvm_get_ioreq_server_frame(struct domain *d, ioservid_t id,
-                               unsigned long idx, mfn_t *mfn)
-{
-    struct hvm_ioreq_server *s;
-    int rc;
-
-    ASSERT(is_hvm_domain(d));
-
-    spin_lock_recursive(&d->arch.hvm.ioreq_server.lock);
-
-    s = get_ioreq_server(d, id);
-
-    rc = -ENOENT;
-    if ( !s )
-        goto out;
-
-    rc = -EPERM;
-    if ( s->emulator != current->domain )
-        goto out;
-
-    rc = hvm_ioreq_server_alloc_pages(s);
-    if ( rc )
-        goto out;
-
-    switch ( idx )
-    {
-    case XENMEM_resource_ioreq_server_frame_bufioreq:
-        rc = -ENOENT;
-        if ( !HANDLE_BUFIOREQ(s) )
-            goto out;
-
-        *mfn = page_to_mfn(s->bufioreq.page);
-        rc = 0;
-        break;
-
-    case XENMEM_resource_ioreq_server_frame_ioreq(0):
-        *mfn = page_to_mfn(s->ioreq.page);
-        rc = 0;
-        break;
-
-    default:
-        rc = -EINVAL;
-        break;
-    }
-
- out:
-    spin_unlock_recursive(&d->arch.hvm.ioreq_server.lock);
-
-    return rc;
-}
-
-int hvm_map_io_range_to_ioreq_server(struct domain *d, ioservid_t id,
-                                     uint32_t type, uint64_t start,
-                                     uint64_t end)
-{
-    struct hvm_ioreq_server *s;
-    struct rangeset *r;
-    int rc;
-
-    if ( start > end )
-        return -EINVAL;
-
-    spin_lock_recursive(&d->arch.hvm.ioreq_server.lock);
-
-    s = get_ioreq_server(d, id);
-
-    rc = -ENOENT;
-    if ( !s )
-        goto out;
-
-    rc = -EPERM;
-    if ( s->emulator != current->domain )
-        goto out;
-
-    switch ( type )
-    {
-    case XEN_DMOP_IO_RANGE_PORT:
-    case XEN_DMOP_IO_RANGE_MEMORY:
-    case XEN_DMOP_IO_RANGE_PCI:
-        r = s->range[type];
-        break;
-
-    default:
-        r = NULL;
-        break;
-    }
-
-    rc = -EINVAL;
-    if ( !r )
-        goto out;
-
-    rc = -EEXIST;
-    if ( rangeset_overlaps_range(r, start, end) )
-        goto out;
-
-    rc = rangeset_add_range(r, start, end);
-
- out:
-    spin_unlock_recursive(&d->arch.hvm.ioreq_server.lock);
-
-    return rc;
-}
+#include <public/hvm/ioreq.h>
+#include <public/hvm/params.h>
 
-int hvm_unmap_io_range_from_ioreq_server(struct domain *d, ioservid_t id,
-                                         uint32_t type, uint64_t start,
-                                         uint64_t end)
+void handle_realmode_completion(void)
 {
-    struct hvm_ioreq_server *s;
-    struct rangeset *r;
-    int rc;
-
-    if ( start > end )
-        return -EINVAL;
-
-    spin_lock_recursive(&d->arch.hvm.ioreq_server.lock);
-
-    s = get_ioreq_server(d, id);
-
-    rc = -ENOENT;
-    if ( !s )
-        goto out;
-
-    rc = -EPERM;
-    if ( s->emulator != current->domain )
-        goto out;
-
-    switch ( type )
-    {
-    case XEN_DMOP_IO_RANGE_PORT:
-    case XEN_DMOP_IO_RANGE_MEMORY:
-    case XEN_DMOP_IO_RANGE_PCI:
-        r = s->range[type];
-        break;
-
-    default:
-        r = NULL;
-        break;
-    }
-
-    rc = -EINVAL;
-    if ( !r )
-        goto out;
-
-    rc = -ENOENT;
-    if ( !rangeset_contains_range(r, start, end) )
-        goto out;
-
-    rc = rangeset_remove_range(r, start, end);
-
- out:
-    spin_unlock_recursive(&d->arch.hvm.ioreq_server.lock);
+    struct hvm_emulate_ctxt ctxt;
 
-    return rc;
+    hvm_emulate_init_once(&ctxt, NULL, guest_cpu_user_regs());
+    vmx_realmode_emulate_one(&ctxt);
+    hvm_emulate_writeback(&ctxt);
 }
 
 /*
@@ -1141,130 +89,12 @@ int hvm_map_mem_type_to_ioreq_server(struct domain *d, ioservid_t id,
     return rc;
 }
 
-int hvm_set_ioreq_server_state(struct domain *d, ioservid_t id,
-                               bool enabled)
-{
-    struct hvm_ioreq_server *s;
-    int rc;
-
-    spin_lock_recursive(&d->arch.hvm.ioreq_server.lock);
-
-    s = get_ioreq_server(d, id);
-
-    rc = -ENOENT;
-    if ( !s )
-        goto out;
-
-    rc = -EPERM;
-    if ( s->emulator != current->domain )
-        goto out;
-
-    domain_pause(d);
-
-    if ( enabled )
-        hvm_ioreq_server_enable(s);
-    else
-        hvm_ioreq_server_disable(s);
-
-    domain_unpause(d);
-
-    rc = 0;
-
- out:
-    spin_unlock_recursive(&d->arch.hvm.ioreq_server.lock);
-    return rc;
-}
-
-int hvm_all_ioreq_servers_add_vcpu(struct domain *d, struct vcpu *v)
-{
-    struct hvm_ioreq_server *s;
-    unsigned int id;
-    int rc;
-
-    spin_lock_recursive(&d->arch.hvm.ioreq_server.lock);
-
-    FOR_EACH_IOREQ_SERVER(d, id, s)
-    {
-        rc = hvm_ioreq_server_add_vcpu(s, v);
-        if ( rc )
-            goto fail;
-    }
-
-    spin_unlock_recursive(&d->arch.hvm.ioreq_server.lock);
-
-    return 0;
-
- fail:
-    while ( ++id != MAX_NR_IOREQ_SERVERS )
-    {
-        s = GET_IOREQ_SERVER(d, id);
-
-        if ( !s )
-            continue;
-
-        hvm_ioreq_server_remove_vcpu(s, v);
-    }
-
-    spin_unlock_recursive(&d->arch.hvm.ioreq_server.lock);
-
-    return rc;
-}
-
-void hvm_all_ioreq_servers_remove_vcpu(struct domain *d, struct vcpu *v)
-{
-    struct hvm_ioreq_server *s;
-    unsigned int id;
-
-    spin_lock_recursive(&d->arch.hvm.ioreq_server.lock);
-
-    FOR_EACH_IOREQ_SERVER(d, id, s)
-        hvm_ioreq_server_remove_vcpu(s, v);
-
-    spin_unlock_recursive(&d->arch.hvm.ioreq_server.lock);
-}
-
-void hvm_destroy_all_ioreq_servers(struct domain *d)
+void hvm_get_ioreq_server_range_type(struct domain *d,
+                                     ioreq_t *p,
+                                     uint8_t *type,
+                                     uint64_t *addr)
 {
-    struct hvm_ioreq_server *s;
-    unsigned int id;
-
-    if ( !relocate_portio_handler(d, 0xcf8, 0xcf8, 4) )
-        return;
-
-    spin_lock_recursive(&d->arch.hvm.ioreq_server.lock);
-
-    /* No need to domain_pause() as the domain is being torn down */
-
-    FOR_EACH_IOREQ_SERVER(d, id, s)
-    {
-        hvm_ioreq_server_disable(s);
-
-        /*
-         * It is safe to call hvm_ioreq_server_deinit() prior to
-         * set_ioreq_server() since the target domain is being destroyed.
-         */
-        hvm_ioreq_server_deinit(s);
-        set_ioreq_server(d, id, NULL);
-
-        xfree(s);
-    }
-
-    spin_unlock_recursive(&d->arch.hvm.ioreq_server.lock);
-}
-
-struct hvm_ioreq_server *hvm_select_ioreq_server(struct domain *d,
-                                                 ioreq_t *p)
-{
-    struct hvm_ioreq_server *s;
-    uint32_t cf8;
-    uint8_t type;
-    uint64_t addr;
-    unsigned int id;
-
-    if ( p->type != IOREQ_TYPE_COPY && p->type != IOREQ_TYPE_PIO )
-        return NULL;
-
-    cf8 = d->arch.hvm.pci_cf8;
+    uint32_t cf8 = d->arch.hvm.pci_cf8;
 
     if ( p->type == IOREQ_TYPE_PIO &&
          (p->addr & ~3) == 0xcfc &&
@@ -1277,8 +107,8 @@ struct hvm_ioreq_server *hvm_select_ioreq_server(struct domain *d,
         reg = hvm_pci_decode_addr(cf8, p->addr, &sbdf);
 
         /* PCI config data cycle */
-        type = XEN_DMOP_IO_RANGE_PCI;
-        addr = ((uint64_t)sbdf.sbdf << 32) | reg;
+        *type = XEN_DMOP_IO_RANGE_PCI;
+        *addr = ((uint64_t)sbdf.sbdf << 32) | reg;
         /* AMD extended configuration space access? */
         if ( CF8_ADDR_HI(cf8) &&
              d->arch.cpuid->x86_vendor == X86_VENDOR_AMD &&
@@ -1290,230 +120,15 @@ struct hvm_ioreq_server *hvm_select_ioreq_server(struct domain *d,
 
             if ( !rdmsr_safe(MSR_AMD64_NB_CFG, msr_val) &&
                  (msr_val & (1ULL << AMD64_NB_CFG_CF8_EXT_ENABLE_BIT)) )
-                addr |= CF8_ADDR_HI(cf8);
+                *addr |= CF8_ADDR_HI(cf8);
         }
     }
     else
     {
-        type = (p->type == IOREQ_TYPE_PIO) ?
-                XEN_DMOP_IO_RANGE_PORT : XEN_DMOP_IO_RANGE_MEMORY;
-        addr = p->addr;
-    }
-
-    FOR_EACH_IOREQ_SERVER(d, id, s)
-    {
-        struct rangeset *r;
-
-        if ( !s->enabled )
-            continue;
-
-        r = s->range[type];
-
-        switch ( type )
-        {
-            unsigned long start, end;
-
-        case XEN_DMOP_IO_RANGE_PORT:
-            start = addr;
-            end = start + p->size - 1;
-            if ( rangeset_contains_range(r, start, end) )
-                return s;
-
-            break;
-
-        case XEN_DMOP_IO_RANGE_MEMORY:
-            start = hvm_mmio_first_byte(p);
-            end = hvm_mmio_last_byte(p);
-
-            if ( rangeset_contains_range(r, start, end) )
-                return s;
-
-            break;
-
-        case XEN_DMOP_IO_RANGE_PCI:
-            if ( rangeset_contains_singleton(r, addr >> 32) )
-            {
-                p->type = IOREQ_TYPE_PCI_CONFIG;
-                p->addr = addr;
-                return s;
-            }
-
-            break;
-        }
-    }
-
-    return NULL;
-}
-
-static int hvm_send_buffered_ioreq(struct hvm_ioreq_server *s, ioreq_t *p)
-{
-    struct domain *d = current->domain;
-    struct hvm_ioreq_page *iorp;
-    buffered_iopage_t *pg;
-    buf_ioreq_t bp = { .data = p->data,
-                       .addr = p->addr,
-                       .type = p->type,
-                       .dir = p->dir };
-    /* Timeoffset sends 64b data, but no address. Use two consecutive slots. */
-    int qw = 0;
-
-    /* Ensure buffered_iopage fits in a page */
-    BUILD_BUG_ON(sizeof(buffered_iopage_t) > PAGE_SIZE);
-
-    iorp = &s->bufioreq;
-    pg = iorp->va;
-
-    if ( !pg )
-        return X86EMUL_UNHANDLEABLE;
-
-    /*
-     * Return 0 for the cases we can't deal with:
-     *  - 'addr' is only a 20-bit field, so we cannot address beyond 1MB
-     *  - we cannot buffer accesses to guest memory buffers, as the guest
-     *    may expect the memory buffer to be synchronously accessed
-     *  - the count field is usually used with data_is_ptr and since we don't
-     *    support data_is_ptr we do not waste space for the count field either
-     */
-    if ( (p->addr > 0xffffful) || p->data_is_ptr || (p->count != 1) )
-        return 0;
-
-    switch ( p->size )
-    {
-    case 1:
-        bp.size = 0;
-        break;
-    case 2:
-        bp.size = 1;
-        break;
-    case 4:
-        bp.size = 2;
-        break;
-    case 8:
-        bp.size = 3;
-        qw = 1;
-        break;
-    default:
-        gdprintk(XENLOG_WARNING, "unexpected ioreq size: %u\n", p->size);
-        return X86EMUL_UNHANDLEABLE;
-    }
-
-    spin_lock(&s->bufioreq_lock);
-
-    if ( (pg->ptrs.write_pointer - pg->ptrs.read_pointer) >=
-         (IOREQ_BUFFER_SLOT_NUM - qw) )
-    {
-        /* The queue is full: send the iopacket through the normal path. */
-        spin_unlock(&s->bufioreq_lock);
-        return X86EMUL_UNHANDLEABLE;
-    }
-
-    pg->buf_ioreq[pg->ptrs.write_pointer % IOREQ_BUFFER_SLOT_NUM] = bp;
-
-    if ( qw )
-    {
-        bp.data = p->data >> 32;
-        pg->buf_ioreq[(pg->ptrs.write_pointer+1) % IOREQ_BUFFER_SLOT_NUM] = bp;
-    }
-
-    /* Make the ioreq_t visible /before/ write_pointer. */
-    smp_wmb();
-    pg->ptrs.write_pointer += qw ? 2 : 1;
-
-    /* Canonicalize read/write pointers to prevent their overflow. */
-    while ( (s->bufioreq_handling == HVM_IOREQSRV_BUFIOREQ_ATOMIC) &&
-            qw++ < IOREQ_BUFFER_SLOT_NUM &&
-            pg->ptrs.read_pointer >= IOREQ_BUFFER_SLOT_NUM )
-    {
-        union bufioreq_pointers old = pg->ptrs, new;
-        unsigned int n = old.read_pointer / IOREQ_BUFFER_SLOT_NUM;
-
-        new.read_pointer = old.read_pointer - n * IOREQ_BUFFER_SLOT_NUM;
-        new.write_pointer = old.write_pointer - n * IOREQ_BUFFER_SLOT_NUM;
-        cmpxchg(&pg->ptrs.full, old.full, new.full);
-    }
-
-    notify_via_xen_event_channel(d, s->bufioreq_evtchn);
-    spin_unlock(&s->bufioreq_lock);
-
-    return X86EMUL_OKAY;
-}
-
-int hvm_send_ioreq(struct hvm_ioreq_server *s, ioreq_t *proto_p,
-                   bool buffered)
-{
-    struct vcpu *curr = current;
-    struct domain *d = curr->domain;
-    struct hvm_ioreq_vcpu *sv;
-
-    ASSERT(s);
-
-    if ( buffered )
-        return hvm_send_buffered_ioreq(s, proto_p);
-
-    if ( unlikely(!vcpu_start_shutdown_deferral(curr)) )
-        return X86EMUL_RETRY;
-
-    list_for_each_entry ( sv,
-                          &s->ioreq_vcpu_list,
-                          list_entry )
-    {
-        if ( sv->vcpu == curr )
-        {
-            evtchn_port_t port = sv->ioreq_evtchn;
-            ioreq_t *p = get_ioreq(s, curr);
-
-            if ( unlikely(p->state != STATE_IOREQ_NONE) )
-            {
-                gprintk(XENLOG_ERR, "device model set bad IO state %d\n",
-                        p->state);
-                break;
-            }
-
-            if ( unlikely(p->vp_eport != port) )
-            {
-                gprintk(XENLOG_ERR, "device model set bad event channel %d\n",
-                        p->vp_eport);
-                break;
-            }
-
-            proto_p->state = STATE_IOREQ_NONE;
-            proto_p->vp_eport = port;
-            *p = *proto_p;
-
-            prepare_wait_on_xen_event_channel(port);
-
-            /*
-             * Following happens /after/ blocking and setting up ioreq
-             * contents. prepare_wait_on_xen_event_channel() is an implicit
-             * barrier.
-             */
-            p->state = STATE_IOREQ_READY;
-            notify_via_xen_event_channel(d, port);
-
-            sv->pending = true;
-            return X86EMUL_RETRY;
-        }
-    }
-
-    return X86EMUL_UNHANDLEABLE;
-}
-
-unsigned int hvm_broadcast_ioreq(ioreq_t *p, bool buffered)
-{
-    struct domain *d = current->domain;
-    struct hvm_ioreq_server *s;
-    unsigned int id, failed = 0;
-
-    FOR_EACH_IOREQ_SERVER(d, id, s)
-    {
-        if ( !s->enabled )
-            continue;
-
-        if ( hvm_send_ioreq(s, p, buffered) == X86EMUL_UNHANDLEABLE )
-            failed++;
+        *type = (p->type == IOREQ_TYPE_PIO) ?
+                 XEN_DMOP_IO_RANGE_PORT : XEN_DMOP_IO_RANGE_MEMORY;
+        *addr = p->addr;
     }
-
-    return failed;
 }
 
 static int hvm_access_cf8(
@@ -1528,13 +143,19 @@ static int hvm_access_cf8(
     return X86EMUL_UNHANDLEABLE;
 }
 
-void hvm_ioreq_init(struct domain *d)
+void arch_hvm_ioreq_init(struct domain *d)
 {
     spin_lock_init(&d->arch.hvm.ioreq_server.lock);
 
     register_portio_handler(d, 0xcf8, 4, hvm_access_cf8);
 }
 
+void arch_hvm_ioreq_destroy(struct domain *d)
+{
+    if ( !relocate_portio_handler(d, 0xcf8, 0xcf8, 4) )
+        return;
+}
+
 /*
  * Local variables:
  * mode: C
diff --git a/xen/arch/x86/hvm/stdvga.c b/xen/arch/x86/hvm/stdvga.c
index e267513..ab6d315 100644
--- a/xen/arch/x86/hvm/stdvga.c
+++ b/xen/arch/x86/hvm/stdvga.c
@@ -27,10 +27,10 @@
  *  can have side effects.
  */
 
+#include <xen/hvm/ioreq.h>
 #include <xen/types.h>
 #include <xen/sched.h>
 #include <xen/domain_page.h>
-#include <asm/hvm/ioreq.h>
 #include <asm/hvm/support.h>
 #include <xen/numa.h>
 #include <xen/paging.h>
diff --git a/xen/arch/x86/hvm/vmx/realmode.c b/xen/arch/x86/hvm/vmx/realmode.c
index bdbd9cb..b804262 100644
--- a/xen/arch/x86/hvm/vmx/realmode.c
+++ b/xen/arch/x86/hvm/vmx/realmode.c
@@ -9,6 +9,7 @@
  *    Keir Fraser <keir@xen.org>
  */
 
+#include <xen/hvm/ioreq.h>
 #include <xen/init.h>
 #include <xen/lib.h>
 #include <xen/sched.h>
diff --git a/xen/arch/x86/hvm/vmx/vvmx.c b/xen/arch/x86/hvm/vmx/vvmx.c
index 7dfff6c..acfeb1c 100644
--- a/xen/arch/x86/hvm/vmx/vvmx.c
+++ b/xen/arch/x86/hvm/vmx/vvmx.c
@@ -18,11 +18,11 @@
  * this program; If not, see <http://www.gnu.org/licenses/>.
  *
  */
+#include <xen/hvm/ioreq.h>
 
 #include <asm/types.h>
 #include <asm/mtrr.h>
 #include <asm/p2m.h>
-#include <asm/hvm/ioreq.h>
 #include <asm/hvm/vmx/vmx.h>
 #include <asm/hvm/vmx/vvmx.h>
 #include <asm/hvm/nestedhvm.h>
diff --git a/xen/arch/x86/mm.c b/xen/arch/x86/mm.c
index 82bc676..2b06e15 100644
--- a/xen/arch/x86/mm.c
+++ b/xen/arch/x86/mm.c
@@ -99,6 +99,7 @@
  * doing the final put_page(), and remove it from the iommu if so.
  */
 
+#include <xen/hvm/ioreq.h>
 #include <xen/init.h>
 #include <xen/kernel.h>
 #include <xen/lib.h>
@@ -141,7 +142,6 @@
 #include <asm/io_apic.h>
 #include <asm/pci.h>
 #include <asm/guest.h>
-#include <asm/hvm/ioreq.h>
 
 #include <asm/hvm/grant_table.h>
 #include <asm/pv/domain.h>
diff --git a/xen/arch/x86/mm/shadow/common.c b/xen/arch/x86/mm/shadow/common.c
index 7737773..c84cbb2 100644
--- a/xen/arch/x86/mm/shadow/common.c
+++ b/xen/arch/x86/mm/shadow/common.c
@@ -20,6 +20,7 @@
  * along with this program; If not, see <http://www.gnu.org/licenses/>.
  */
 
+#include <xen/hvm/ioreq.h>
 #include <xen/types.h>
 #include <xen/mm.h>
 #include <xen/trace.h>
@@ -34,7 +35,6 @@
 #include <asm/current.h>
 #include <asm/flushtlb.h>
 #include <asm/shadow.h>
-#include <asm/hvm/ioreq.h>
 #include <xen/numa.h>
 #include "private.h"
 
diff --git a/xen/common/Kconfig b/xen/common/Kconfig
index 15e3b79..fb6fb51 100644
--- a/xen/common/Kconfig
+++ b/xen/common/Kconfig
@@ -139,6 +139,9 @@ config HYPFS_CONFIG
 	  Disable this option in case you want to spare some memory or you
 	  want to hide the .config contents from dom0.
 
+config IOREQ_SERVER
+	bool
+
 config KEXEC
 	bool "kexec support"
 	default y
diff --git a/xen/common/Makefile b/xen/common/Makefile
index 06881d0..f6fc3f8 100644
--- a/xen/common/Makefile
+++ b/xen/common/Makefile
@@ -70,6 +70,7 @@ extra-y := symbols-dummy.o
 
 obj-$(CONFIG_COVERAGE) += coverage/
 obj-y += sched/
+obj-$(CONFIG_IOREQ_SERVER) += hvm/
 obj-$(CONFIG_UBSAN) += ubsan/
 
 obj-$(CONFIG_NEEDS_LIBELF) += libelf/
diff --git a/xen/common/hvm/Makefile b/xen/common/hvm/Makefile
new file mode 100644
index 0000000..326215d
--- /dev/null
+++ b/xen/common/hvm/Makefile
@@ -0,0 +1 @@
+obj-y += ioreq.o
diff --git a/xen/common/hvm/ioreq.c b/xen/common/hvm/ioreq.c
new file mode 100644
index 0000000..7e1fa23
--- /dev/null
+++ b/xen/common/hvm/ioreq.c
@@ -0,0 +1,1430 @@
+/*
+ * hvm/ioreq.c: hardware virtual machine I/O emulation
+ *
+ * Copyright (c) 2016 Citrix Systems Inc.
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms and conditions of the GNU General Public License,
+ * version 2, as published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope it will be useful, but WITHOUT
+ * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
+ * FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License for
+ * more details.
+ *
+ * You should have received a copy of the GNU General Public License along with
+ * this program; If not, see <http://www.gnu.org/licenses/>.
+ */
+
+#include <xen/ctype.h>
+#include <xen/hvm/ioreq.h>
+#include <xen/init.h>
+#include <xen/lib.h>
+#include <xen/trace.h>
+#include <xen/sched.h>
+#include <xen/irq.h>
+#include <xen/softirq.h>
+#include <xen/domain.h>
+#include <xen/domain_page.h>
+#include <xen/event.h>
+#include <xen/paging.h>
+#include <xen/vpci.h>
+
+#include <public/hvm/dm_op.h>
+#include <public/hvm/ioreq.h>
+#include <public/hvm/params.h>
+
+static void set_ioreq_server(struct domain *d, unsigned int id,
+                             struct hvm_ioreq_server *s)
+{
+    ASSERT(id < MAX_NR_IOREQ_SERVERS);
+    ASSERT(!s || !d->arch.hvm.ioreq_server.server[id]);
+
+    d->arch.hvm.ioreq_server.server[id] = s;
+}
+
+/*
+ * Iterate over all possible ioreq servers.
+ *
+ * NOTE: The iteration is backwards such that more recently created
+ *       ioreq servers are favoured in hvm_select_ioreq_server().
+ *       This is a semantic that previously existed when ioreq servers
+ *       were held in a linked list.
+ */
+#define FOR_EACH_IOREQ_SERVER(d, id, s) \
+    for ( (id) = MAX_NR_IOREQ_SERVERS; (id) != 0; ) \
+        if ( !(s = GET_IOREQ_SERVER(d, --(id))) ) \
+            continue; \
+        else
+
+static ioreq_t *get_ioreq(struct hvm_ioreq_server *s, struct vcpu *v)
+{
+    shared_iopage_t *p = s->ioreq.va;
+
+    ASSERT((v == current) || !vcpu_runnable(v));
+    ASSERT(p != NULL);
+
+    return &p->vcpu_ioreq[v->vcpu_id];
+}
+
+bool hvm_io_pending(struct vcpu *v)
+{
+    struct domain *d = v->domain;
+    struct hvm_ioreq_server *s;
+    unsigned int id;
+
+    FOR_EACH_IOREQ_SERVER(d, id, s)
+    {
+        struct hvm_ioreq_vcpu *sv;
+
+        list_for_each_entry ( sv,
+                              &s->ioreq_vcpu_list,
+                              list_entry )
+        {
+            if ( sv->vcpu == v && sv->pending )
+                return true;
+        }
+    }
+
+    return false;
+}
+
+static void hvm_io_assist(struct hvm_ioreq_vcpu *sv, uint64_t data)
+{
+    struct vcpu *v = sv->vcpu;
+    ioreq_t *ioreq = &v->arch.hvm.hvm_io.io_req;
+
+    if ( hvm_ioreq_needs_completion(ioreq) )
+        ioreq->data = data;
+
+    sv->pending = false;
+}
+
+static bool hvm_wait_for_io(struct hvm_ioreq_vcpu *sv, ioreq_t *p)
+{
+    unsigned int prev_state = STATE_IOREQ_NONE;
+
+    while ( sv->pending )
+    {
+        unsigned int state = p->state;
+
+        smp_rmb();
+
+    recheck:
+        if ( unlikely(state == STATE_IOREQ_NONE) )
+        {
+            /*
+             * The only reason we should see this case is when an
+             * emulator is dying and it races with an I/O being
+             * requested.
+             */
+            hvm_io_assist(sv, ~0ul);
+            break;
+        }
+
+        if ( unlikely(state < prev_state) )
+        {
+            gdprintk(XENLOG_ERR, "Weird HVM ioreq state transition %u -> %u\n",
+                     prev_state, state);
+            sv->pending = false;
+            domain_crash(sv->vcpu->domain);
+            return false; /* bail */
+        }
+
+        switch ( prev_state = state )
+        {
+        case STATE_IORESP_READY: /* IORESP_READY -> NONE */
+            p->state = STATE_IOREQ_NONE;
+            hvm_io_assist(sv, p->data);
+            break;
+        case STATE_IOREQ_READY:  /* IOREQ_{READY,INPROCESS} -> IORESP_READY */
+        case STATE_IOREQ_INPROCESS:
+            wait_on_xen_event_channel(sv->ioreq_evtchn,
+                                      ({ state = p->state;
+                                         smp_rmb();
+                                         state != prev_state; }));
+            goto recheck;
+        default:
+            gdprintk(XENLOG_ERR, "Weird HVM iorequest state %u\n", state);
+            sv->pending = false;
+            domain_crash(sv->vcpu->domain);
+            return false; /* bail */
+        }
+    }
+
+    return true;
+}
+
+bool handle_hvm_io_completion(struct vcpu *v)
+{
+    struct domain *d = v->domain;
+    struct hvm_vcpu_io *vio = &v->arch.hvm.hvm_io;
+    struct hvm_ioreq_server *s;
+    enum hvm_io_completion io_completion;
+    unsigned int id;
+
+    if ( has_vpci(d) && vpci_process_pending(v) )
+    {
+        raise_softirq(SCHEDULE_SOFTIRQ);
+        return false;
+    }
+
+    FOR_EACH_IOREQ_SERVER(d, id, s)
+    {
+        struct hvm_ioreq_vcpu *sv;
+
+        list_for_each_entry ( sv,
+                              &s->ioreq_vcpu_list,
+                              list_entry )
+        {
+            if ( sv->vcpu == v && sv->pending )
+            {
+                if ( !hvm_wait_for_io(sv, get_ioreq(s, v)) )
+                    return false;
+
+                break;
+            }
+        }
+    }
+
+    vio->io_req.state = hvm_ioreq_needs_completion(&vio->io_req) ?
+        STATE_IORESP_READY : STATE_IOREQ_NONE;
+
+    msix_write_completion(v);
+    vcpu_end_shutdown_deferral(v);
+
+    io_completion = vio->io_completion;
+    vio->io_completion = HVMIO_no_completion;
+
+    switch ( io_completion )
+    {
+    case HVMIO_no_completion:
+        break;
+
+    case HVMIO_mmio_completion:
+        return handle_mmio();
+
+    case HVMIO_pio_completion:
+        return handle_pio(vio->io_req.addr, vio->io_req.size,
+                          vio->io_req.dir);
+
+    case HVMIO_realmode_completion:
+        handle_realmode_completion();
+        break;
+
+    default:
+        ASSERT_UNREACHABLE();
+        break;
+    }
+
+    return true;
+}
+
+static gfn_t hvm_alloc_legacy_ioreq_gfn(struct hvm_ioreq_server *s)
+{
+    struct domain *d = s->target;
+    unsigned int i;
+
+    BUILD_BUG_ON(HVM_PARAM_BUFIOREQ_PFN != HVM_PARAM_IOREQ_PFN + 1);
+
+    for ( i = HVM_PARAM_IOREQ_PFN; i <= HVM_PARAM_BUFIOREQ_PFN; i++ )
+    {
+        if ( !test_and_clear_bit(i, &d->arch.hvm.ioreq_gfn.legacy_mask) )
+            return _gfn(d->arch.hvm.params[i]);
+    }
+
+    return INVALID_GFN;
+}
+
+static gfn_t hvm_alloc_ioreq_gfn(struct hvm_ioreq_server *s)
+{
+    struct domain *d = s->target;
+    unsigned int i;
+
+    for ( i = 0; i < sizeof(d->arch.hvm.ioreq_gfn.mask) * 8; i++ )
+    {
+        if ( test_and_clear_bit(i, &d->arch.hvm.ioreq_gfn.mask) )
+            return _gfn(d->arch.hvm.ioreq_gfn.base + i);
+    }
+
+    /*
+     * If we are out of 'normal' GFNs then we may still have a 'legacy'
+     * GFN available.
+     */
+    return hvm_alloc_legacy_ioreq_gfn(s);
+}
+
+static bool hvm_free_legacy_ioreq_gfn(struct hvm_ioreq_server *s,
+                                      gfn_t gfn)
+{
+    struct domain *d = s->target;
+    unsigned int i;
+
+    for ( i = HVM_PARAM_IOREQ_PFN; i <= HVM_PARAM_BUFIOREQ_PFN; i++ )
+    {
+        if ( gfn_eq(gfn, _gfn(d->arch.hvm.params[i])) )
+             break;
+    }
+    if ( i > HVM_PARAM_BUFIOREQ_PFN )
+        return false;
+
+    set_bit(i, &d->arch.hvm.ioreq_gfn.legacy_mask);
+    return true;
+}
+
+static void hvm_free_ioreq_gfn(struct hvm_ioreq_server *s, gfn_t gfn)
+{
+    struct domain *d = s->target;
+    unsigned int i = gfn_x(gfn) - d->arch.hvm.ioreq_gfn.base;
+
+    ASSERT(!gfn_eq(gfn, INVALID_GFN));
+
+    if ( !hvm_free_legacy_ioreq_gfn(s, gfn) )
+    {
+        ASSERT(i < sizeof(d->arch.hvm.ioreq_gfn.mask) * 8);
+        set_bit(i, &d->arch.hvm.ioreq_gfn.mask);
+    }
+}
+
+static void hvm_unmap_ioreq_gfn(struct hvm_ioreq_server *s, bool buf)
+{
+    struct hvm_ioreq_page *iorp = buf ? &s->bufioreq : &s->ioreq;
+
+    if ( gfn_eq(iorp->gfn, INVALID_GFN) )
+        return;
+
+    destroy_ring_for_helper(&iorp->va, iorp->page);
+    iorp->page = NULL;
+
+    hvm_free_ioreq_gfn(s, iorp->gfn);
+    iorp->gfn = INVALID_GFN;
+}
+
+static int hvm_map_ioreq_gfn(struct hvm_ioreq_server *s, bool buf)
+{
+    struct domain *d = s->target;
+    struct hvm_ioreq_page *iorp = buf ? &s->bufioreq : &s->ioreq;
+    int rc;
+
+    if ( iorp->page )
+    {
+        /*
+         * If a page has already been allocated (which will happen on
+         * demand if hvm_get_ioreq_server_frame() is called), then
+         * mapping a guest frame is not permitted.
+         */
+        if ( gfn_eq(iorp->gfn, INVALID_GFN) )
+            return -EPERM;
+
+        return 0;
+    }
+
+    if ( d->is_dying )
+        return -EINVAL;
+
+    iorp->gfn = hvm_alloc_ioreq_gfn(s);
+
+    if ( gfn_eq(iorp->gfn, INVALID_GFN) )
+        return -ENOMEM;
+
+    rc = prepare_ring_for_helper(d, gfn_x(iorp->gfn), &iorp->page,
+                                 &iorp->va);
+
+    if ( rc )
+        hvm_unmap_ioreq_gfn(s, buf);
+
+    return rc;
+}
+
+static int hvm_alloc_ioreq_mfn(struct hvm_ioreq_server *s, bool buf)
+{
+    struct hvm_ioreq_page *iorp = buf ? &s->bufioreq : &s->ioreq;
+    struct page_info *page;
+
+    if ( iorp->page )
+    {
+        /*
+         * If a guest frame has already been mapped (which may happen
+         * on demand if hvm_get_ioreq_server_info() is called), then
+         * allocating a page is not permitted.
+         */
+        if ( !gfn_eq(iorp->gfn, INVALID_GFN) )
+            return -EPERM;
+
+        return 0;
+    }
+
+    page = alloc_domheap_page(s->target, MEMF_no_refcount);
+
+    if ( !page )
+        return -ENOMEM;
+
+    if ( !get_page_and_type(page, s->target, PGT_writable_page) )
+    {
+        /*
+         * The domain can't possibly know about this page yet, so failure
+         * here is a clear indication of something fishy going on.
+         */
+        domain_crash(s->emulator);
+        return -ENODATA;
+    }
+
+    iorp->va = __map_domain_page_global(page);
+    if ( !iorp->va )
+        goto fail;
+
+    iorp->page = page;
+    clear_page(iorp->va);
+    return 0;
+
+ fail:
+    put_page_alloc_ref(page);
+    put_page_and_type(page);
+
+    return -ENOMEM;
+}
+
+static void hvm_free_ioreq_mfn(struct hvm_ioreq_server *s, bool buf)
+{
+    struct hvm_ioreq_page *iorp = buf ? &s->bufioreq : &s->ioreq;
+    struct page_info *page = iorp->page;
+
+    if ( !page )
+        return;
+
+    iorp->page = NULL;
+
+    unmap_domain_page_global(iorp->va);
+    iorp->va = NULL;
+
+    put_page_alloc_ref(page);
+    put_page_and_type(page);
+}
+
+bool is_ioreq_server_page(struct domain *d, const struct page_info *page)
+{
+    const struct hvm_ioreq_server *s;
+    unsigned int id;
+    bool found = false;
+
+    spin_lock_recursive(&d->arch.hvm.ioreq_server.lock);
+
+    FOR_EACH_IOREQ_SERVER(d, id, s)
+    {
+        if ( (s->ioreq.page == page) || (s->bufioreq.page == page) )
+        {
+            found = true;
+            break;
+        }
+    }
+
+    spin_unlock_recursive(&d->arch.hvm.ioreq_server.lock);
+
+    return found;
+}
+
+static void hvm_remove_ioreq_gfn(struct hvm_ioreq_server *s, bool buf)
+{
+    struct domain *d = s->target;
+    struct hvm_ioreq_page *iorp = buf ? &s->bufioreq : &s->ioreq;
+
+    if ( gfn_eq(iorp->gfn, INVALID_GFN) )
+        return;
+
+    if ( guest_physmap_remove_page(d, iorp->gfn,
+                                   page_to_mfn(iorp->page), 0) )
+        domain_crash(d);
+    clear_page(iorp->va);
+}
+
+static int hvm_add_ioreq_gfn(struct hvm_ioreq_server *s, bool buf)
+{
+    struct domain *d = s->target;
+    struct hvm_ioreq_page *iorp = buf ? &s->bufioreq : &s->ioreq;
+    int rc;
+
+    if ( gfn_eq(iorp->gfn, INVALID_GFN) )
+        return 0;
+
+    clear_page(iorp->va);
+
+    rc = guest_physmap_add_page(d, iorp->gfn,
+                                page_to_mfn(iorp->page), 0);
+    if ( rc == 0 )
+        paging_mark_pfn_dirty(d, _pfn(gfn_x(iorp->gfn)));
+
+    return rc;
+}
+
+static void hvm_update_ioreq_evtchn(struct hvm_ioreq_server *s,
+                                    struct hvm_ioreq_vcpu *sv)
+{
+    ASSERT(spin_is_locked(&s->lock));
+
+    if ( s->ioreq.va != NULL )
+    {
+        ioreq_t *p = get_ioreq(s, sv->vcpu);
+
+        p->vp_eport = sv->ioreq_evtchn;
+    }
+}
+
+#define HANDLE_BUFIOREQ(s) \
+    ((s)->bufioreq_handling != HVM_IOREQSRV_BUFIOREQ_OFF)
+
+static int hvm_ioreq_server_add_vcpu(struct hvm_ioreq_server *s,
+                                     struct vcpu *v)
+{
+    struct hvm_ioreq_vcpu *sv;
+    int rc;
+
+    sv = xzalloc(struct hvm_ioreq_vcpu);
+
+    rc = -ENOMEM;
+    if ( !sv )
+        goto fail1;
+
+    spin_lock(&s->lock);
+
+    rc = alloc_unbound_xen_event_channel(v->domain, v->vcpu_id,
+                                         s->emulator->domain_id, NULL);
+    if ( rc < 0 )
+        goto fail2;
+
+    sv->ioreq_evtchn = rc;
+
+    if ( v->vcpu_id == 0 && HANDLE_BUFIOREQ(s) )
+    {
+        rc = alloc_unbound_xen_event_channel(v->domain, 0,
+                                             s->emulator->domain_id, NULL);
+        if ( rc < 0 )
+            goto fail3;
+
+        s->bufioreq_evtchn = rc;
+    }
+
+    sv->vcpu = v;
+
+    list_add(&sv->list_entry, &s->ioreq_vcpu_list);
+
+    if ( s->enabled )
+        hvm_update_ioreq_evtchn(s, sv);
+
+    spin_unlock(&s->lock);
+    return 0;
+
+ fail3:
+    free_xen_event_channel(v->domain, sv->ioreq_evtchn);
+
+ fail2:
+    spin_unlock(&s->lock);
+    xfree(sv);
+
+ fail1:
+    return rc;
+}
+
+static void hvm_ioreq_server_remove_vcpu(struct hvm_ioreq_server *s,
+                                         struct vcpu *v)
+{
+    struct hvm_ioreq_vcpu *sv;
+
+    spin_lock(&s->lock);
+
+    list_for_each_entry ( sv,
+                          &s->ioreq_vcpu_list,
+                          list_entry )
+    {
+        if ( sv->vcpu != v )
+            continue;
+
+        list_del(&sv->list_entry);
+
+        if ( v->vcpu_id == 0 && HANDLE_BUFIOREQ(s) )
+            free_xen_event_channel(v->domain, s->bufioreq_evtchn);
+
+        free_xen_event_channel(v->domain, sv->ioreq_evtchn);
+
+        xfree(sv);
+        break;
+    }
+
+    spin_unlock(&s->lock);
+}
+
+static void hvm_ioreq_server_remove_all_vcpus(struct hvm_ioreq_server *s)
+{
+    struct hvm_ioreq_vcpu *sv, *next;
+
+    spin_lock(&s->lock);
+
+    list_for_each_entry_safe ( sv,
+                               next,
+                               &s->ioreq_vcpu_list,
+                               list_entry )
+    {
+        struct vcpu *v = sv->vcpu;
+
+        list_del(&sv->list_entry);
+
+        if ( v->vcpu_id == 0 && HANDLE_BUFIOREQ(s) )
+            free_xen_event_channel(v->domain, s->bufioreq_evtchn);
+
+        free_xen_event_channel(v->domain, sv->ioreq_evtchn);
+
+        xfree(sv);
+    }
+
+    spin_unlock(&s->lock);
+}
+
+static int hvm_ioreq_server_map_pages(struct hvm_ioreq_server *s)
+{
+    int rc;
+
+    rc = hvm_map_ioreq_gfn(s, false);
+
+    if ( !rc && HANDLE_BUFIOREQ(s) )
+        rc = hvm_map_ioreq_gfn(s, true);
+
+    if ( rc )
+        hvm_unmap_ioreq_gfn(s, false);
+
+    return rc;
+}
+
+static void hvm_ioreq_server_unmap_pages(struct hvm_ioreq_server *s)
+{
+    hvm_unmap_ioreq_gfn(s, true);
+    hvm_unmap_ioreq_gfn(s, false);
+}
+
+static int hvm_ioreq_server_alloc_pages(struct hvm_ioreq_server *s)
+{
+    int rc;
+
+    rc = hvm_alloc_ioreq_mfn(s, false);
+
+    if ( !rc && (s->bufioreq_handling != HVM_IOREQSRV_BUFIOREQ_OFF) )
+        rc = hvm_alloc_ioreq_mfn(s, true);
+
+    if ( rc )
+        hvm_free_ioreq_mfn(s, false);
+
+    return rc;
+}
+
+static void hvm_ioreq_server_free_pages(struct hvm_ioreq_server *s)
+{
+    hvm_free_ioreq_mfn(s, true);
+    hvm_free_ioreq_mfn(s, false);
+}
+
+static void hvm_ioreq_server_free_rangesets(struct hvm_ioreq_server *s)
+{
+    unsigned int i;
+
+    for ( i = 0; i < NR_IO_RANGE_TYPES; i++ )
+        rangeset_destroy(s->range[i]);
+}
+
+static int hvm_ioreq_server_alloc_rangesets(struct hvm_ioreq_server *s,
+                                            ioservid_t id)
+{
+    unsigned int i;
+    int rc;
+
+    for ( i = 0; i < NR_IO_RANGE_TYPES; i++ )
+    {
+        char *name;
+
+        rc = asprintf(&name, "ioreq_server %d %s", id,
+                      (i == XEN_DMOP_IO_RANGE_PORT) ? "port" :
+                      (i == XEN_DMOP_IO_RANGE_MEMORY) ? "memory" :
+                      (i == XEN_DMOP_IO_RANGE_PCI) ? "pci" :
+                      "");
+        if ( rc )
+            goto fail;
+
+        s->range[i] = rangeset_new(s->target, name,
+                                   RANGESETF_prettyprint_hex);
+
+        xfree(name);
+
+        rc = -ENOMEM;
+        if ( !s->range[i] )
+            goto fail;
+
+        rangeset_limit(s->range[i], MAX_NR_IO_RANGES);
+    }
+
+    return 0;
+
+ fail:
+    hvm_ioreq_server_free_rangesets(s);
+
+    return rc;
+}
+
+static void hvm_ioreq_server_enable(struct hvm_ioreq_server *s)
+{
+    struct hvm_ioreq_vcpu *sv;
+
+    spin_lock(&s->lock);
+
+    if ( s->enabled )
+        goto done;
+
+    hvm_remove_ioreq_gfn(s, false);
+    hvm_remove_ioreq_gfn(s, true);
+
+    s->enabled = true;
+
+    list_for_each_entry ( sv,
+                          &s->ioreq_vcpu_list,
+                          list_entry )
+        hvm_update_ioreq_evtchn(s, sv);
+
+  done:
+    spin_unlock(&s->lock);
+}
+
+static void hvm_ioreq_server_disable(struct hvm_ioreq_server *s)
+{
+    spin_lock(&s->lock);
+
+    if ( !s->enabled )
+        goto done;
+
+    hvm_add_ioreq_gfn(s, true);
+    hvm_add_ioreq_gfn(s, false);
+
+    s->enabled = false;
+
+ done:
+    spin_unlock(&s->lock);
+}
+
+static int hvm_ioreq_server_init(struct hvm_ioreq_server *s,
+                                 struct domain *d, int bufioreq_handling,
+                                 ioservid_t id)
+{
+    struct domain *currd = current->domain;
+    struct vcpu *v;
+    int rc;
+
+    s->target = d;
+
+    get_knownalive_domain(currd);
+    s->emulator = currd;
+
+    spin_lock_init(&s->lock);
+    INIT_LIST_HEAD(&s->ioreq_vcpu_list);
+    spin_lock_init(&s->bufioreq_lock);
+
+    s->ioreq.gfn = INVALID_GFN;
+    s->bufioreq.gfn = INVALID_GFN;
+
+    rc = hvm_ioreq_server_alloc_rangesets(s, id);
+    if ( rc )
+        return rc;
+
+    s->bufioreq_handling = bufioreq_handling;
+
+    for_each_vcpu ( d, v )
+    {
+        rc = hvm_ioreq_server_add_vcpu(s, v);
+        if ( rc )
+            goto fail_add;
+    }
+
+    return 0;
+
+ fail_add:
+    hvm_ioreq_server_remove_all_vcpus(s);
+    hvm_ioreq_server_unmap_pages(s);
+
+    hvm_ioreq_server_free_rangesets(s);
+
+    put_domain(s->emulator);
+    return rc;
+}
+
+static void hvm_ioreq_server_deinit(struct hvm_ioreq_server *s)
+{
+    ASSERT(!s->enabled);
+    hvm_ioreq_server_remove_all_vcpus(s);
+
+    /*
+     * NOTE: It is safe to call both hvm_ioreq_server_unmap_pages() and
+     *       hvm_ioreq_server_free_pages() in that order.
+     *       This is because the former will do nothing if the pages
+     *       are not mapped, leaving the page to be freed by the latter.
+     *       However if the pages are mapped then the former will set
+     *       the page_info pointer to NULL, meaning the latter will do
+     *       nothing.
+     */
+    hvm_ioreq_server_unmap_pages(s);
+    hvm_ioreq_server_free_pages(s);
+
+    hvm_ioreq_server_free_rangesets(s);
+
+    put_domain(s->emulator);
+}
+
+int hvm_create_ioreq_server(struct domain *d, int bufioreq_handling,
+                            ioservid_t *id)
+{
+    struct hvm_ioreq_server *s;
+    unsigned int i;
+    int rc;
+
+    if ( bufioreq_handling > HVM_IOREQSRV_BUFIOREQ_ATOMIC )
+        return -EINVAL;
+
+    s = xzalloc(struct hvm_ioreq_server);
+    if ( !s )
+        return -ENOMEM;
+
+    domain_pause(d);
+    spin_lock_recursive(&d->arch.hvm.ioreq_server.lock);
+
+    for ( i = 0; i < MAX_NR_IOREQ_SERVERS; i++ )
+    {
+        if ( !GET_IOREQ_SERVER(d, i) )
+            break;
+    }
+
+    rc = -ENOSPC;
+    if ( i >= MAX_NR_IOREQ_SERVERS )
+        goto fail;
+
+    /*
+     * It is safe to call set_ioreq_server() prior to
+     * hvm_ioreq_server_init() since the target domain is paused.
+     */
+    set_ioreq_server(d, i, s);
+
+    rc = hvm_ioreq_server_init(s, d, bufioreq_handling, i);
+    if ( rc )
+    {
+        set_ioreq_server(d, i, NULL);
+        goto fail;
+    }
+
+    if ( id )
+        *id = i;
+
+    spin_unlock_recursive(&d->arch.hvm.ioreq_server.lock);
+    domain_unpause(d);
+
+    return 0;
+
+ fail:
+    spin_unlock_recursive(&d->arch.hvm.ioreq_server.lock);
+    domain_unpause(d);
+
+    xfree(s);
+    return rc;
+}
+
+int hvm_destroy_ioreq_server(struct domain *d, ioservid_t id)
+{
+    struct hvm_ioreq_server *s;
+    int rc;
+
+    spin_lock_recursive(&d->arch.hvm.ioreq_server.lock);
+
+    s = get_ioreq_server(d, id);
+
+    rc = -ENOENT;
+    if ( !s )
+        goto out;
+
+    rc = -EPERM;
+    if ( s->emulator != current->domain )
+        goto out;
+
+    domain_pause(d);
+
+    p2m_set_ioreq_server(d, 0, s);
+
+    hvm_ioreq_server_disable(s);
+
+    /*
+     * It is safe to call hvm_ioreq_server_deinit() prior to
+     * set_ioreq_server() since the target domain is paused.
+     */
+    hvm_ioreq_server_deinit(s);
+    set_ioreq_server(d, id, NULL);
+
+    domain_unpause(d);
+
+    xfree(s);
+
+    rc = 0;
+
+ out:
+    spin_unlock_recursive(&d->arch.hvm.ioreq_server.lock);
+
+    return rc;
+}
+
+int hvm_get_ioreq_server_info(struct domain *d, ioservid_t id,
+                              unsigned long *ioreq_gfn,
+                              unsigned long *bufioreq_gfn,
+                              evtchn_port_t *bufioreq_port)
+{
+    struct hvm_ioreq_server *s;
+    int rc;
+
+    spin_lock_recursive(&d->arch.hvm.ioreq_server.lock);
+
+    s = get_ioreq_server(d, id);
+
+    rc = -ENOENT;
+    if ( !s )
+        goto out;
+
+    rc = -EPERM;
+    if ( s->emulator != current->domain )
+        goto out;
+
+    if ( ioreq_gfn || bufioreq_gfn )
+    {
+        rc = hvm_ioreq_server_map_pages(s);
+        if ( rc )
+            goto out;
+    }
+
+    if ( ioreq_gfn )
+        *ioreq_gfn = gfn_x(s->ioreq.gfn);
+
+    if ( HANDLE_BUFIOREQ(s) )
+    {
+        if ( bufioreq_gfn )
+            *bufioreq_gfn = gfn_x(s->bufioreq.gfn);
+
+        if ( bufioreq_port )
+            *bufioreq_port = s->bufioreq_evtchn;
+    }
+
+    rc = 0;
+
+ out:
+    spin_unlock_recursive(&d->arch.hvm.ioreq_server.lock);
+
+    return rc;
+}
+
+int hvm_get_ioreq_server_frame(struct domain *d, ioservid_t id,
+                               unsigned long idx, mfn_t *mfn)
+{
+    struct hvm_ioreq_server *s;
+    int rc;
+
+    ASSERT(is_hvm_domain(d));
+
+    spin_lock_recursive(&d->arch.hvm.ioreq_server.lock);
+
+    s = get_ioreq_server(d, id);
+
+    rc = -ENOENT;
+    if ( !s )
+        goto out;
+
+    rc = -EPERM;
+    if ( s->emulator != current->domain )
+        goto out;
+
+    rc = hvm_ioreq_server_alloc_pages(s);
+    if ( rc )
+        goto out;
+
+    switch ( idx )
+    {
+    case XENMEM_resource_ioreq_server_frame_bufioreq:
+        rc = -ENOENT;
+        if ( !HANDLE_BUFIOREQ(s) )
+            goto out;
+
+        *mfn = page_to_mfn(s->bufioreq.page);
+        rc = 0;
+        break;
+
+    case XENMEM_resource_ioreq_server_frame_ioreq(0):
+        *mfn = page_to_mfn(s->ioreq.page);
+        rc = 0;
+        break;
+
+    default:
+        rc = -EINVAL;
+        break;
+    }
+
+ out:
+    spin_unlock_recursive(&d->arch.hvm.ioreq_server.lock);
+
+    return rc;
+}
+
+int hvm_map_io_range_to_ioreq_server(struct domain *d, ioservid_t id,
+                                     uint32_t type, uint64_t start,
+                                     uint64_t end)
+{
+    struct hvm_ioreq_server *s;
+    struct rangeset *r;
+    int rc;
+
+    if ( start > end )
+        return -EINVAL;
+
+    spin_lock_recursive(&d->arch.hvm.ioreq_server.lock);
+
+    s = get_ioreq_server(d, id);
+
+    rc = -ENOENT;
+    if ( !s )
+        goto out;
+
+    rc = -EPERM;
+    if ( s->emulator != current->domain )
+        goto out;
+
+    switch ( type )
+    {
+    case XEN_DMOP_IO_RANGE_PORT:
+    case XEN_DMOP_IO_RANGE_MEMORY:
+    case XEN_DMOP_IO_RANGE_PCI:
+        r = s->range[type];
+        break;
+
+    default:
+        r = NULL;
+        break;
+    }
+
+    rc = -EINVAL;
+    if ( !r )
+        goto out;
+
+    rc = -EEXIST;
+    if ( rangeset_overlaps_range(r, start, end) )
+        goto out;
+
+    rc = rangeset_add_range(r, start, end);
+
+ out:
+    spin_unlock_recursive(&d->arch.hvm.ioreq_server.lock);
+
+    return rc;
+}
+
+int hvm_unmap_io_range_from_ioreq_server(struct domain *d, ioservid_t id,
+                                         uint32_t type, uint64_t start,
+                                         uint64_t end)
+{
+    struct hvm_ioreq_server *s;
+    struct rangeset *r;
+    int rc;
+
+    if ( start > end )
+        return -EINVAL;
+
+    spin_lock_recursive(&d->arch.hvm.ioreq_server.lock);
+
+    s = get_ioreq_server(d, id);
+
+    rc = -ENOENT;
+    if ( !s )
+        goto out;
+
+    rc = -EPERM;
+    if ( s->emulator != current->domain )
+        goto out;
+
+    switch ( type )
+    {
+    case XEN_DMOP_IO_RANGE_PORT:
+    case XEN_DMOP_IO_RANGE_MEMORY:
+    case XEN_DMOP_IO_RANGE_PCI:
+        r = s->range[type];
+        break;
+
+    default:
+        r = NULL;
+        break;
+    }
+
+    rc = -EINVAL;
+    if ( !r )
+        goto out;
+
+    rc = -ENOENT;
+    if ( !rangeset_contains_range(r, start, end) )
+        goto out;
+
+    rc = rangeset_remove_range(r, start, end);
+
+ out:
+    spin_unlock_recursive(&d->arch.hvm.ioreq_server.lock);
+
+    return rc;
+}
+
+int hvm_set_ioreq_server_state(struct domain *d, ioservid_t id,
+                               bool enabled)
+{
+    struct hvm_ioreq_server *s;
+    int rc;
+
+    spin_lock_recursive(&d->arch.hvm.ioreq_server.lock);
+
+    s = get_ioreq_server(d, id);
+
+    rc = -ENOENT;
+    if ( !s )
+        goto out;
+
+    rc = -EPERM;
+    if ( s->emulator != current->domain )
+        goto out;
+
+    domain_pause(d);
+
+    if ( enabled )
+        hvm_ioreq_server_enable(s);
+    else
+        hvm_ioreq_server_disable(s);
+
+    domain_unpause(d);
+
+    rc = 0;
+
+ out:
+    spin_unlock_recursive(&d->arch.hvm.ioreq_server.lock);
+    return rc;
+}
+
+int hvm_all_ioreq_servers_add_vcpu(struct domain *d, struct vcpu *v)
+{
+    struct hvm_ioreq_server *s;
+    unsigned int id;
+    int rc;
+
+    spin_lock_recursive(&d->arch.hvm.ioreq_server.lock);
+
+    FOR_EACH_IOREQ_SERVER(d, id, s)
+    {
+        rc = hvm_ioreq_server_add_vcpu(s, v);
+        if ( rc )
+            goto fail;
+    }
+
+    spin_unlock_recursive(&d->arch.hvm.ioreq_server.lock);
+
+    return 0;
+
+ fail:
+    while ( ++id != MAX_NR_IOREQ_SERVERS )
+    {
+        s = GET_IOREQ_SERVER(d, id);
+
+        if ( !s )
+            continue;
+
+        hvm_ioreq_server_remove_vcpu(s, v);
+    }
+
+    spin_unlock_recursive(&d->arch.hvm.ioreq_server.lock);
+
+    return rc;
+}
+
+void hvm_all_ioreq_servers_remove_vcpu(struct domain *d, struct vcpu *v)
+{
+    struct hvm_ioreq_server *s;
+    unsigned int id;
+
+    spin_lock_recursive(&d->arch.hvm.ioreq_server.lock);
+
+    FOR_EACH_IOREQ_SERVER(d, id, s)
+        hvm_ioreq_server_remove_vcpu(s, v);
+
+    spin_unlock_recursive(&d->arch.hvm.ioreq_server.lock);
+}
+
+void hvm_destroy_all_ioreq_servers(struct domain *d)
+{
+    struct hvm_ioreq_server *s;
+    unsigned int id;
+
+    arch_hvm_ioreq_destroy(d);
+
+    spin_lock_recursive(&d->arch.hvm.ioreq_server.lock);
+
+    /* No need to domain_pause() as the domain is being torn down */
+
+    FOR_EACH_IOREQ_SERVER(d, id, s)
+    {
+        hvm_ioreq_server_disable(s);
+
+        /*
+         * It is safe to call hvm_ioreq_server_deinit() prior to
+         * set_ioreq_server() since the target domain is being destroyed.
+         */
+        hvm_ioreq_server_deinit(s);
+        set_ioreq_server(d, id, NULL);
+
+        xfree(s);
+    }
+
+    spin_unlock_recursive(&d->arch.hvm.ioreq_server.lock);
+}
+
+struct hvm_ioreq_server *hvm_select_ioreq_server(struct domain *d,
+                                                 ioreq_t *p)
+{
+    struct hvm_ioreq_server *s;
+    uint8_t type;
+    uint64_t addr;
+    unsigned int id;
+
+    if ( p->type != IOREQ_TYPE_COPY && p->type != IOREQ_TYPE_PIO )
+        return NULL;
+
+    hvm_get_ioreq_server_range_type(d, p, &type, &addr);
+
+    FOR_EACH_IOREQ_SERVER(d, id, s)
+    {
+        struct rangeset *r;
+
+        if ( !s->enabled )
+            continue;
+
+        r = s->range[type];
+
+        switch ( type )
+        {
+            unsigned long start, end;
+
+        case XEN_DMOP_IO_RANGE_PORT:
+            start = addr;
+            end = start + p->size - 1;
+            if ( rangeset_contains_range(r, start, end) )
+                return s;
+
+            break;
+
+        case XEN_DMOP_IO_RANGE_MEMORY:
+            start = hvm_mmio_first_byte(p);
+            end = hvm_mmio_last_byte(p);
+
+            if ( rangeset_contains_range(r, start, end) )
+                return s;
+
+            break;
+
+        case XEN_DMOP_IO_RANGE_PCI:
+            if ( rangeset_contains_singleton(r, addr >> 32) )
+            {
+                p->type = IOREQ_TYPE_PCI_CONFIG;
+                p->addr = addr;
+                return s;
+            }
+
+            break;
+        }
+    }
+
+    return NULL;
+}
+
+static int hvm_send_buffered_ioreq(struct hvm_ioreq_server *s, ioreq_t *p)
+{
+    struct domain *d = current->domain;
+    struct hvm_ioreq_page *iorp;
+    buffered_iopage_t *pg;
+    buf_ioreq_t bp = { .data = p->data,
+                       .addr = p->addr,
+                       .type = p->type,
+                       .dir = p->dir };
+    /* Timeoffset sends 64b data, but no address. Use two consecutive slots. */
+    int qw = 0;
+
+    /* Ensure buffered_iopage fits in a page */
+    BUILD_BUG_ON(sizeof(buffered_iopage_t) > PAGE_SIZE);
+
+    iorp = &s->bufioreq;
+    pg = iorp->va;
+
+    if ( !pg )
+        return IOREQ_IO_UNHANDLED;
+
+    /*
+     * Return 0 for the cases we can't deal with:
+     *  - 'addr' is only a 20-bit field, so we cannot address beyond 1MB
+     *  - we cannot buffer accesses to guest memory buffers, as the guest
+     *    may expect the memory buffer to be synchronously accessed
+     *  - the count field is usually used with data_is_ptr and since we don't
+     *    support data_is_ptr we do not waste space for the count field either
+     */
+    if ( (p->addr > 0xffffful) || p->data_is_ptr || (p->count != 1) )
+        return 0;
+
+    switch ( p->size )
+    {
+    case 1:
+        bp.size = 0;
+        break;
+    case 2:
+        bp.size = 1;
+        break;
+    case 4:
+        bp.size = 2;
+        break;
+    case 8:
+        bp.size = 3;
+        qw = 1;
+        break;
+    default:
+        gdprintk(XENLOG_WARNING, "unexpected ioreq size: %u\n", p->size);
+        return IOREQ_IO_UNHANDLED;
+    }
+
+    spin_lock(&s->bufioreq_lock);
+
+    if ( (pg->ptrs.write_pointer - pg->ptrs.read_pointer) >=
+         (IOREQ_BUFFER_SLOT_NUM - qw) )
+    {
+        /* The queue is full: send the iopacket through the normal path. */
+        spin_unlock(&s->bufioreq_lock);
+        return IOREQ_IO_UNHANDLED;
+    }
+
+    pg->buf_ioreq[pg->ptrs.write_pointer % IOREQ_BUFFER_SLOT_NUM] = bp;
+
+    if ( qw )
+    {
+        bp.data = p->data >> 32;
+        pg->buf_ioreq[(pg->ptrs.write_pointer+1) % IOREQ_BUFFER_SLOT_NUM] = bp;
+    }
+
+    /* Make the ioreq_t visible /before/ write_pointer. */
+    smp_wmb();
+    pg->ptrs.write_pointer += qw ? 2 : 1;
+
+    /* Canonicalize read/write pointers to prevent their overflow. */
+    while ( (s->bufioreq_handling == HVM_IOREQSRV_BUFIOREQ_ATOMIC) &&
+            qw++ < IOREQ_BUFFER_SLOT_NUM &&
+            pg->ptrs.read_pointer >= IOREQ_BUFFER_SLOT_NUM )
+    {
+        union bufioreq_pointers old = pg->ptrs, new;
+        unsigned int n = old.read_pointer / IOREQ_BUFFER_SLOT_NUM;
+
+        new.read_pointer = old.read_pointer - n * IOREQ_BUFFER_SLOT_NUM;
+        new.write_pointer = old.write_pointer - n * IOREQ_BUFFER_SLOT_NUM;
+        cmpxchg(&pg->ptrs.full, old.full, new.full);
+    }
+
+    notify_via_xen_event_channel(d, s->bufioreq_evtchn);
+    spin_unlock(&s->bufioreq_lock);
+
+    return IOREQ_IO_HANDLED;
+}
+
+int hvm_send_ioreq(struct hvm_ioreq_server *s, ioreq_t *proto_p,
+                   bool buffered)
+{
+    struct vcpu *curr = current;
+    struct domain *d = curr->domain;
+    struct hvm_ioreq_vcpu *sv;
+
+    ASSERT(s);
+
+    if ( buffered )
+        return hvm_send_buffered_ioreq(s, proto_p);
+
+    if ( unlikely(!vcpu_start_shutdown_deferral(curr)) )
+        return IOREQ_IO_RETRY;
+
+    list_for_each_entry ( sv,
+                          &s->ioreq_vcpu_list,
+                          list_entry )
+    {
+        if ( sv->vcpu == curr )
+        {
+            evtchn_port_t port = sv->ioreq_evtchn;
+            ioreq_t *p = get_ioreq(s, curr);
+
+            if ( unlikely(p->state != STATE_IOREQ_NONE) )
+            {
+                gprintk(XENLOG_ERR, "device model set bad IO state %d\n",
+                        p->state);
+                break;
+            }
+
+            if ( unlikely(p->vp_eport != port) )
+            {
+                gprintk(XENLOG_ERR, "device model set bad event channel %d\n",
+                        p->vp_eport);
+                break;
+            }
+
+            proto_p->state = STATE_IOREQ_NONE;
+            proto_p->vp_eport = port;
+            *p = *proto_p;
+
+            prepare_wait_on_xen_event_channel(port);
+
+            /*
+             * Following happens /after/ blocking and setting up ioreq
+             * contents. prepare_wait_on_xen_event_channel() is an implicit
+             * barrier.
+             */
+            p->state = STATE_IOREQ_READY;
+            notify_via_xen_event_channel(d, port);
+
+            sv->pending = true;
+            return IOREQ_IO_RETRY;
+        }
+    }
+
+    return IOREQ_IO_UNHANDLED;
+}
+
+unsigned int hvm_broadcast_ioreq(ioreq_t *p, bool buffered)
+{
+    struct domain *d = current->domain;
+    struct hvm_ioreq_server *s;
+    unsigned int id, failed = 0;
+
+    FOR_EACH_IOREQ_SERVER(d, id, s)
+    {
+        if ( !s->enabled )
+            continue;
+
+        if ( hvm_send_ioreq(s, p, buffered) == IOREQ_IO_UNHANDLED )
+            failed++;
+    }
+
+    return failed;
+}
+
+void hvm_ioreq_init(struct domain *d)
+{
+    spin_lock_init(&d->arch.hvm.ioreq_server.lock);
+
+    arch_hvm_ioreq_init(d);
+}
+
+/*
+ * Local variables:
+ * mode: C
+ * c-file-style: "BSD"
+ * c-basic-offset: 4
+ * tab-width: 4
+ * indent-tabs-mode: nil
+ * End:
+ */
diff --git a/xen/include/asm-x86/hvm/ioreq.h b/xen/include/asm-x86/hvm/ioreq.h
index e2588e9..0e871e0 100644
--- a/xen/include/asm-x86/hvm/ioreq.h
+++ b/xen/include/asm-x86/hvm/ioreq.h
@@ -19,41 +19,26 @@
 #ifndef __ASM_X86_HVM_IOREQ_H__
 #define __ASM_X86_HVM_IOREQ_H__
 
-bool hvm_io_pending(struct vcpu *v);
-bool handle_hvm_io_completion(struct vcpu *v);
-bool is_ioreq_server_page(struct domain *d, const struct page_info *page);
+#include <asm/hvm/emulate.h>
+#include <asm/hvm/hvm.h>
+#include <asm/hvm/vmx/vmx.h>
+
+void handle_realmode_completion(void);
+
+void hvm_get_ioreq_server_range_type(struct domain *d,
+                                     ioreq_t *p,
+                                     uint8_t *type,
+                                     uint64_t *addr);
 
-int hvm_create_ioreq_server(struct domain *d, int bufioreq_handling,
-                            ioservid_t *id);
-int hvm_destroy_ioreq_server(struct domain *d, ioservid_t id);
-int hvm_get_ioreq_server_info(struct domain *d, ioservid_t id,
-                              unsigned long *ioreq_gfn,
-                              unsigned long *bufioreq_gfn,
-                              evtchn_port_t *bufioreq_port);
-int hvm_get_ioreq_server_frame(struct domain *d, ioservid_t id,
-                               unsigned long idx, mfn_t *mfn);
-int hvm_map_io_range_to_ioreq_server(struct domain *d, ioservid_t id,
-                                     uint32_t type, uint64_t start,
-                                     uint64_t end);
-int hvm_unmap_io_range_from_ioreq_server(struct domain *d, ioservid_t id,
-                                         uint32_t type, uint64_t start,
-                                         uint64_t end);
 int hvm_map_mem_type_to_ioreq_server(struct domain *d, ioservid_t id,
                                      uint32_t type, uint32_t flags);
-int hvm_set_ioreq_server_state(struct domain *d, ioservid_t id,
-                               bool enabled);
-
-int hvm_all_ioreq_servers_add_vcpu(struct domain *d, struct vcpu *v);
-void hvm_all_ioreq_servers_remove_vcpu(struct domain *d, struct vcpu *v);
-void hvm_destroy_all_ioreq_servers(struct domain *d);
 
-struct hvm_ioreq_server *hvm_select_ioreq_server(struct domain *d,
-                                                 ioreq_t *p);
-int hvm_send_ioreq(struct hvm_ioreq_server *s, ioreq_t *proto_p,
-                   bool buffered);
-unsigned int hvm_broadcast_ioreq(ioreq_t *p, bool buffered);
+void arch_hvm_ioreq_init(struct domain *d);
+void arch_hvm_ioreq_destroy(struct domain *d);
 
-void hvm_ioreq_init(struct domain *d);
+#define IOREQ_IO_HANDLED     X86EMUL_OKAY
+#define IOREQ_IO_UNHANDLED   X86EMUL_UNHANDLEABLE
+#define IOREQ_IO_RETRY       X86EMUL_RETRY
 
 #endif /* __ASM_X86_HVM_IOREQ_H__ */
 
diff --git a/xen/include/asm-x86/hvm/vcpu.h b/xen/include/asm-x86/hvm/vcpu.h
index 5ccd075..6c1feda 100644
--- a/xen/include/asm-x86/hvm/vcpu.h
+++ b/xen/include/asm-x86/hvm/vcpu.h
@@ -91,13 +91,6 @@ struct hvm_vcpu_io {
     const struct g2m_ioport *g2m_ioport;
 };
 
-static inline bool hvm_ioreq_needs_completion(const ioreq_t *ioreq)
-{
-    return ioreq->state == STATE_IOREQ_READY &&
-           !ioreq->data_is_ptr &&
-           (ioreq->type != IOREQ_TYPE_PIO || ioreq->dir != IOREQ_WRITE);
-}
-
 struct nestedvcpu {
     bool_t nv_guestmode; /* vcpu in guestmode? */
     void *nv_vvmcx; /* l1 guest virtual VMCB/VMCS */
diff --git a/xen/include/xen/hvm/ioreq.h b/xen/include/xen/hvm/ioreq.h
new file mode 100644
index 0000000..40b7b5e
--- /dev/null
+++ b/xen/include/xen/hvm/ioreq.h
@@ -0,0 +1,89 @@
+/*
+ * ioreq.h: Hardware virtual machine assist interface definitions.
+ *
+ * Copyright (c) 2016 Citrix Systems Inc.
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms and conditions of the GNU General Public License,
+ * version 2, as published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope it will be useful, but WITHOUT
+ * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
+ * FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License for
+ * more details.
+ *
+ * You should have received a copy of the GNU General Public License along with
+ * this program; If not, see <http://www.gnu.org/licenses/>.
+ */
+
+#ifndef __HVM_IOREQ_H__
+#define __HVM_IOREQ_H__
+
+#include <xen/sched.h>
+
+#include <asm/hvm/ioreq.h>
+
+#define GET_IOREQ_SERVER(d, id) \
+    (d)->arch.hvm.ioreq_server.server[id]
+
+static inline struct hvm_ioreq_server *get_ioreq_server(const struct domain *d,
+                                                        unsigned int id)
+{
+    if ( id >= MAX_NR_IOREQ_SERVERS )
+        return NULL;
+
+    return GET_IOREQ_SERVER(d, id);
+}
+
+static inline bool hvm_ioreq_needs_completion(const ioreq_t *ioreq)
+{
+    return ioreq->state == STATE_IOREQ_READY &&
+           !ioreq->data_is_ptr &&
+           (ioreq->type != IOREQ_TYPE_PIO || ioreq->dir != IOREQ_WRITE);
+}
+
+bool hvm_io_pending(struct vcpu *v);
+bool handle_hvm_io_completion(struct vcpu *v);
+bool is_ioreq_server_page(struct domain *d, const struct page_info *page);
+
+int hvm_create_ioreq_server(struct domain *d, int bufioreq_handling,
+                            ioservid_t *id);
+int hvm_destroy_ioreq_server(struct domain *d, ioservid_t id);
+int hvm_get_ioreq_server_info(struct domain *d, ioservid_t id,
+                              unsigned long *ioreq_gfn,
+                              unsigned long *bufioreq_gfn,
+                              evtchn_port_t *bufioreq_port);
+int hvm_get_ioreq_server_frame(struct domain *d, ioservid_t id,
+                               unsigned long idx, mfn_t *mfn);
+int hvm_map_io_range_to_ioreq_server(struct domain *d, ioservid_t id,
+                                     uint32_t type, uint64_t start,
+                                     uint64_t end);
+int hvm_unmap_io_range_from_ioreq_server(struct domain *d, ioservid_t id,
+                                         uint32_t type, uint64_t start,
+                                         uint64_t end);
+int hvm_set_ioreq_server_state(struct domain *d, ioservid_t id,
+                               bool enabled);
+
+int hvm_all_ioreq_servers_add_vcpu(struct domain *d, struct vcpu *v);
+void hvm_all_ioreq_servers_remove_vcpu(struct domain *d, struct vcpu *v);
+void hvm_destroy_all_ioreq_servers(struct domain *d);
+
+struct hvm_ioreq_server *hvm_select_ioreq_server(struct domain *d,
+                                                 ioreq_t *p);
+int hvm_send_ioreq(struct hvm_ioreq_server *s, ioreq_t *proto_p,
+                   bool buffered);
+unsigned int hvm_broadcast_ioreq(ioreq_t *p, bool buffered);
+
+void hvm_ioreq_init(struct domain *d);
+
+#endif /* __HVM_IOREQ_H__ */
+
+/*
+ * Local variables:
+ * mode: C
+ * c-file-style: "BSD"
+ * c-basic-offset: 4
+ * tab-width: 4
+ * indent-tabs-mode: nil
+ * End:
+ */
-- 
2.7.4




* [RFC PATCH V1 02/12] hvm/dm: Make x86's DM feature common
From: Oleksandr Tyshchenko @ 2020-08-03 18:21 UTC (permalink / raw)
  To: xen-devel
  Cc: Stefano Stabellini, Julien Grall, Wei Liu, Andrew Cooper,
	Ian Jackson, George Dunlap, Oleksandr Tyshchenko, Julien Grall,
	Jan Beulich, Roger Pau Monné

From: Oleksandr Tyshchenko <oleksandr_tyshchenko@epam.com>

As a lot of x86 code can be re-used on Arm later on, this patch
splits device model support into common and arch-specific parts.

This support is going to be used on Arm to be able to run a device
emulator outside of the Xen hypervisor.

Please note, this is a split/cleanup of Julien's PoC:
"Add support for Guest IO forwarding to a device emulator"

Signed-off-by: Julien Grall <julien.grall@arm.com>
Signed-off-by: Oleksandr Tyshchenko <oleksandr_tyshchenko@epam.com>
---
 xen/arch/x86/hvm/dm.c       | 287 +++-----------------------------------------
 xen/common/hvm/Makefile     |   1 +
 xen/common/hvm/dm.c         | 287 ++++++++++++++++++++++++++++++++++++++++++++
 xen/include/xen/hypercall.h |  12 ++
 4 files changed, 319 insertions(+), 268 deletions(-)
 create mode 100644 xen/common/hvm/dm.c

diff --git a/xen/arch/x86/hvm/dm.c b/xen/arch/x86/hvm/dm.c
index 70adb27..fb1ff09 100644
--- a/xen/arch/x86/hvm/dm.c
+++ b/xen/arch/x86/hvm/dm.c
@@ -29,13 +29,6 @@
 
 #include <public/hvm/hvm_op.h>
 
-struct dmop_args {
-    domid_t domid;
-    unsigned int nr_bufs;
-    /* Reserve enough buf elements for all current hypercalls. */
-    struct xen_dm_op_buf buf[2];
-};
-
 static bool _raw_copy_from_guest_buf_offset(void *dst,
                                             const struct dmop_args *args,
                                             unsigned int buf_idx,
@@ -337,148 +330,20 @@ static int inject_event(struct domain *d,
     return 0;
 }
 
-static int dm_op(const struct dmop_args *op_args)
+int arch_dm_op(struct xen_dm_op *op, struct domain *d,
+               const struct dmop_args *op_args, bool *const_op)
 {
-    struct domain *d;
-    struct xen_dm_op op;
-    bool const_op = true;
     long rc;
-    size_t offset;
-
-    static const uint8_t op_size[] = {
-        [XEN_DMOP_create_ioreq_server]              = sizeof(struct xen_dm_op_create_ioreq_server),
-        [XEN_DMOP_get_ioreq_server_info]            = sizeof(struct xen_dm_op_get_ioreq_server_info),
-        [XEN_DMOP_map_io_range_to_ioreq_server]     = sizeof(struct xen_dm_op_ioreq_server_range),
-        [XEN_DMOP_unmap_io_range_from_ioreq_server] = sizeof(struct xen_dm_op_ioreq_server_range),
-        [XEN_DMOP_set_ioreq_server_state]           = sizeof(struct xen_dm_op_set_ioreq_server_state),
-        [XEN_DMOP_destroy_ioreq_server]             = sizeof(struct xen_dm_op_destroy_ioreq_server),
-        [XEN_DMOP_track_dirty_vram]                 = sizeof(struct xen_dm_op_track_dirty_vram),
-        [XEN_DMOP_set_pci_intx_level]               = sizeof(struct xen_dm_op_set_pci_intx_level),
-        [XEN_DMOP_set_isa_irq_level]                = sizeof(struct xen_dm_op_set_isa_irq_level),
-        [XEN_DMOP_set_pci_link_route]               = sizeof(struct xen_dm_op_set_pci_link_route),
-        [XEN_DMOP_modified_memory]                  = sizeof(struct xen_dm_op_modified_memory),
-        [XEN_DMOP_set_mem_type]                     = sizeof(struct xen_dm_op_set_mem_type),
-        [XEN_DMOP_inject_event]                     = sizeof(struct xen_dm_op_inject_event),
-        [XEN_DMOP_inject_msi]                       = sizeof(struct xen_dm_op_inject_msi),
-        [XEN_DMOP_map_mem_type_to_ioreq_server]     = sizeof(struct xen_dm_op_map_mem_type_to_ioreq_server),
-        [XEN_DMOP_remote_shutdown]                  = sizeof(struct xen_dm_op_remote_shutdown),
-        [XEN_DMOP_relocate_memory]                  = sizeof(struct xen_dm_op_relocate_memory),
-        [XEN_DMOP_pin_memory_cacheattr]             = sizeof(struct xen_dm_op_pin_memory_cacheattr),
-    };
-
-    rc = rcu_lock_remote_domain_by_id(op_args->domid, &d);
-    if ( rc )
-        return rc;
-
-    if ( !is_hvm_domain(d) )
-        goto out;
-
-    rc = xsm_dm_op(XSM_DM_PRIV, d);
-    if ( rc )
-        goto out;
-
-    offset = offsetof(struct xen_dm_op, u);
-
-    rc = -EFAULT;
-    if ( op_args->buf[0].size < offset )
-        goto out;
-
-    if ( copy_from_guest_offset((void *)&op, op_args->buf[0].h, 0, offset) )
-        goto out;
-
-    if ( op.op >= ARRAY_SIZE(op_size) )
-    {
-        rc = -EOPNOTSUPP;
-        goto out;
-    }
-
-    op.op = array_index_nospec(op.op, ARRAY_SIZE(op_size));
-
-    if ( op_args->buf[0].size < offset + op_size[op.op] )
-        goto out;
-
-    if ( copy_from_guest_offset((void *)&op.u, op_args->buf[0].h, offset,
-                                op_size[op.op]) )
-        goto out;
-
-    rc = -EINVAL;
-    if ( op.pad )
-        goto out;
-
-    switch ( op.op )
-    {
-    case XEN_DMOP_create_ioreq_server:
-    {
-        struct xen_dm_op_create_ioreq_server *data =
-            &op.u.create_ioreq_server;
-
-        const_op = false;
-
-        rc = -EINVAL;
-        if ( data->pad[0] || data->pad[1] || data->pad[2] )
-            break;
-
-        rc = hvm_create_ioreq_server(d, data->handle_bufioreq,
-                                     &data->id);
-        break;
-    }
 
-    case XEN_DMOP_get_ioreq_server_info:
+    switch ( op->op )
     {
-        struct xen_dm_op_get_ioreq_server_info *data =
-            &op.u.get_ioreq_server_info;
-        const uint16_t valid_flags = XEN_DMOP_no_gfns;
-
-        const_op = false;
-
-        rc = -EINVAL;
-        if ( data->flags & ~valid_flags )
-            break;
-
-        rc = hvm_get_ioreq_server_info(d, data->id,
-                                       (data->flags & XEN_DMOP_no_gfns) ?
-                                       NULL : &data->ioreq_gfn,
-                                       (data->flags & XEN_DMOP_no_gfns) ?
-                                       NULL : &data->bufioreq_gfn,
-                                       &data->bufioreq_port);
-        break;
-    }
-
-    case XEN_DMOP_map_io_range_to_ioreq_server:
-    {
-        const struct xen_dm_op_ioreq_server_range *data =
-            &op.u.map_io_range_to_ioreq_server;
-
-        rc = -EINVAL;
-        if ( data->pad )
-            break;
-
-        rc = hvm_map_io_range_to_ioreq_server(d, data->id, data->type,
-                                              data->start, data->end);
-        break;
-    }
-
-    case XEN_DMOP_unmap_io_range_from_ioreq_server:
-    {
-        const struct xen_dm_op_ioreq_server_range *data =
-            &op.u.unmap_io_range_from_ioreq_server;
-
-        rc = -EINVAL;
-        if ( data->pad )
-            break;
-
-        rc = hvm_unmap_io_range_from_ioreq_server(d, data->id, data->type,
-                                                  data->start, data->end);
-        break;
-    }
-
     case XEN_DMOP_map_mem_type_to_ioreq_server:
     {
         struct xen_dm_op_map_mem_type_to_ioreq_server *data =
-            &op.u.map_mem_type_to_ioreq_server;
+            &op->u.map_mem_type_to_ioreq_server;
         unsigned long first_gfn = data->opaque;
 
-        const_op = false;
+        *const_op = false;
 
         rc = -EOPNOTSUPP;
         if ( !hap_enabled(d) )
@@ -522,36 +387,10 @@ static int dm_op(const struct dmop_args *op_args)
         break;
     }
 
-    case XEN_DMOP_set_ioreq_server_state:
-    {
-        const struct xen_dm_op_set_ioreq_server_state *data =
-            &op.u.set_ioreq_server_state;
-
-        rc = -EINVAL;
-        if ( data->pad )
-            break;
-
-        rc = hvm_set_ioreq_server_state(d, data->id, !!data->enabled);
-        break;
-    }
-
-    case XEN_DMOP_destroy_ioreq_server:
-    {
-        const struct xen_dm_op_destroy_ioreq_server *data =
-            &op.u.destroy_ioreq_server;
-
-        rc = -EINVAL;
-        if ( data->pad )
-            break;
-
-        rc = hvm_destroy_ioreq_server(d, data->id);
-        break;
-    }
-
     case XEN_DMOP_track_dirty_vram:
     {
         const struct xen_dm_op_track_dirty_vram *data =
-            &op.u.track_dirty_vram;
+            &op->u.track_dirty_vram;
 
         rc = -EINVAL;
         if ( data->pad )
@@ -567,7 +406,7 @@ static int dm_op(const struct dmop_args *op_args)
     case XEN_DMOP_set_pci_intx_level:
     {
         const struct xen_dm_op_set_pci_intx_level *data =
-            &op.u.set_pci_intx_level;
+            &op->u.set_pci_intx_level;
 
         rc = set_pci_intx_level(d, data->domain, data->bus,
                                 data->device, data->intx,
@@ -578,7 +417,7 @@ static int dm_op(const struct dmop_args *op_args)
     case XEN_DMOP_set_isa_irq_level:
     {
         const struct xen_dm_op_set_isa_irq_level *data =
-            &op.u.set_isa_irq_level;
+            &op->u.set_isa_irq_level;
 
         rc = set_isa_irq_level(d, data->isa_irq, data->level);
         break;
@@ -587,7 +426,7 @@ static int dm_op(const struct dmop_args *op_args)
     case XEN_DMOP_set_pci_link_route:
     {
         const struct xen_dm_op_set_pci_link_route *data =
-            &op.u.set_pci_link_route;
+            &op->u.set_pci_link_route;
 
         rc = hvm_set_pci_link_route(d, data->link, data->isa_irq);
         break;
@@ -596,19 +435,19 @@ static int dm_op(const struct dmop_args *op_args)
     case XEN_DMOP_modified_memory:
     {
         struct xen_dm_op_modified_memory *data =
-            &op.u.modified_memory;
+            &op->u.modified_memory;
 
         rc = modified_memory(d, op_args, data);
-        const_op = !rc;
+        *const_op = !rc;
         break;
     }
 
     case XEN_DMOP_set_mem_type:
     {
         struct xen_dm_op_set_mem_type *data =
-            &op.u.set_mem_type;
+            &op->u.set_mem_type;
 
-        const_op = false;
+        *const_op = false;
 
         rc = -EINVAL;
         if ( data->pad )
@@ -621,7 +460,7 @@ static int dm_op(const struct dmop_args *op_args)
     case XEN_DMOP_inject_event:
     {
         const struct xen_dm_op_inject_event *data =
-            &op.u.inject_event;
+            &op->u.inject_event;
 
         rc = -EINVAL;
         if ( data->pad0 || data->pad1 )
@@ -634,7 +473,7 @@ static int dm_op(const struct dmop_args *op_args)
     case XEN_DMOP_inject_msi:
     {
         const struct xen_dm_op_inject_msi *data =
-            &op.u.inject_msi;
+            &op->u.inject_msi;
 
         rc = -EINVAL;
         if ( data->pad )
@@ -647,7 +486,7 @@ static int dm_op(const struct dmop_args *op_args)
     case XEN_DMOP_remote_shutdown:
     {
         const struct xen_dm_op_remote_shutdown *data =
-            &op.u.remote_shutdown;
+            &op->u.remote_shutdown;
 
         domain_shutdown(d, data->reason);
         rc = 0;
@@ -656,7 +495,7 @@ static int dm_op(const struct dmop_args *op_args)
 
     case XEN_DMOP_relocate_memory:
     {
-        struct xen_dm_op_relocate_memory *data = &op.u.relocate_memory;
+        struct xen_dm_op_relocate_memory *data = &op->u.relocate_memory;
         struct xen_add_to_physmap xatp = {
             .domid = op_args->domid,
             .size = data->size,
@@ -679,7 +518,7 @@ static int dm_op(const struct dmop_args *op_args)
             data->size -= rc;
             data->src_gfn += rc;
             data->dst_gfn += rc;
-            const_op = false;
+            *const_op = false;
             rc = -ERESTART;
         }
         break;
@@ -688,7 +527,7 @@ static int dm_op(const struct dmop_args *op_args)
     case XEN_DMOP_pin_memory_cacheattr:
     {
         const struct xen_dm_op_pin_memory_cacheattr *data =
-            &op.u.pin_memory_cacheattr;
+            &op->u.pin_memory_cacheattr;
 
         if ( data->pad )
         {
@@ -706,94 +545,6 @@ static int dm_op(const struct dmop_args *op_args)
         break;
     }
 
-    if ( (!rc || rc == -ERESTART) &&
-         !const_op && copy_to_guest_offset(op_args->buf[0].h, offset,
-                                           (void *)&op.u, op_size[op.op]) )
-        rc = -EFAULT;
-
- out:
-    rcu_unlock_domain(d);
-
-    return rc;
-}
-
-CHECK_dm_op_create_ioreq_server;
-CHECK_dm_op_get_ioreq_server_info;
-CHECK_dm_op_ioreq_server_range;
-CHECK_dm_op_set_ioreq_server_state;
-CHECK_dm_op_destroy_ioreq_server;
-CHECK_dm_op_track_dirty_vram;
-CHECK_dm_op_set_pci_intx_level;
-CHECK_dm_op_set_isa_irq_level;
-CHECK_dm_op_set_pci_link_route;
-CHECK_dm_op_modified_memory;
-CHECK_dm_op_set_mem_type;
-CHECK_dm_op_inject_event;
-CHECK_dm_op_inject_msi;
-CHECK_dm_op_remote_shutdown;
-CHECK_dm_op_relocate_memory;
-CHECK_dm_op_pin_memory_cacheattr;
-
-int compat_dm_op(domid_t domid,
-                 unsigned int nr_bufs,
-                 XEN_GUEST_HANDLE_PARAM(void) bufs)
-{
-    struct dmop_args args;
-    unsigned int i;
-    int rc;
-
-    if ( nr_bufs > ARRAY_SIZE(args.buf) )
-        return -E2BIG;
-
-    args.domid = domid;
-    args.nr_bufs = array_index_nospec(nr_bufs, ARRAY_SIZE(args.buf) + 1);
-
-    for ( i = 0; i < args.nr_bufs; i++ )
-    {
-        struct compat_dm_op_buf cmp;
-
-        if ( copy_from_guest_offset(&cmp, bufs, i, 1) )
-            return -EFAULT;
-
-#define XLAT_dm_op_buf_HNDL_h(_d_, _s_) \
-        guest_from_compat_handle((_d_)->h, (_s_)->h)
-
-        XLAT_dm_op_buf(&args.buf[i], &cmp);
-
-#undef XLAT_dm_op_buf_HNDL_h
-    }
-
-    rc = dm_op(&args);
-
-    if ( rc == -ERESTART )
-        rc = hypercall_create_continuation(__HYPERVISOR_dm_op, "iih",
-                                           domid, nr_bufs, bufs);
-
-    return rc;
-}
-
-long do_dm_op(domid_t domid,
-              unsigned int nr_bufs,
-              XEN_GUEST_HANDLE_PARAM(xen_dm_op_buf_t) bufs)
-{
-    struct dmop_args args;
-    int rc;
-
-    if ( nr_bufs > ARRAY_SIZE(args.buf) )
-        return -E2BIG;
-
-    args.domid = domid;
-    args.nr_bufs = array_index_nospec(nr_bufs, ARRAY_SIZE(args.buf) + 1);
-
-    if ( copy_from_guest_offset(&args.buf[0], bufs, 0, args.nr_bufs) )
-        return -EFAULT;
-
-    rc = dm_op(&args);
-
-    if ( rc == -ERESTART )
-        rc = hypercall_create_continuation(__HYPERVISOR_dm_op, "iih",
-                                           domid, nr_bufs, bufs);
-
     return rc;
 }
 
diff --git a/xen/common/hvm/Makefile b/xen/common/hvm/Makefile
index 326215d..335fcc9 100644
--- a/xen/common/hvm/Makefile
+++ b/xen/common/hvm/Makefile
@@ -1 +1,2 @@
+obj-y += dm.o
 obj-y += ioreq.o
diff --git a/xen/common/hvm/dm.c b/xen/common/hvm/dm.c
new file mode 100644
index 0000000..09e9542
--- /dev/null
+++ b/xen/common/hvm/dm.c
@@ -0,0 +1,287 @@
+/*
+ * Copyright (c) 2016 Citrix Systems Inc.
+ * Copyright (c) 2019 Arm ltd.
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms and conditions of the GNU General Public License,
+ * version 2, as published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope it will be useful, but WITHOUT
+ * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
+ * FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License for
+ * more details.
+ *
+ * You should have received a copy of the GNU General Public License along with
+ * this program; If not, see <http://www.gnu.org/licenses/>.
+ */
+
+#include <xen/guest_access.h>
+#include <xen/hvm/ioreq.h>
+#include <xen/hypercall.h>
+#include <xen/nospec.h>
+
+static int dm_op(const struct dmop_args *op_args)
+{
+    struct domain *d;
+    struct xen_dm_op op;
+    long rc;
+    bool const_op = true;
+    const size_t offset = offsetof(struct xen_dm_op, u);
+
+    static const uint8_t op_size[] = {
+        [XEN_DMOP_create_ioreq_server]              = sizeof(struct xen_dm_op_create_ioreq_server),
+        [XEN_DMOP_get_ioreq_server_info]            = sizeof(struct xen_dm_op_get_ioreq_server_info),
+        [XEN_DMOP_map_io_range_to_ioreq_server]     = sizeof(struct xen_dm_op_ioreq_server_range),
+        [XEN_DMOP_unmap_io_range_from_ioreq_server] = sizeof(struct xen_dm_op_ioreq_server_range),
+        [XEN_DMOP_set_ioreq_server_state]           = sizeof(struct xen_dm_op_set_ioreq_server_state),
+        [XEN_DMOP_destroy_ioreq_server]             = sizeof(struct xen_dm_op_destroy_ioreq_server),
+        [XEN_DMOP_track_dirty_vram]                 = sizeof(struct xen_dm_op_track_dirty_vram),
+        [XEN_DMOP_set_pci_intx_level]               = sizeof(struct xen_dm_op_set_pci_intx_level),
+        [XEN_DMOP_set_isa_irq_level]                = sizeof(struct xen_dm_op_set_isa_irq_level),
+        [XEN_DMOP_set_pci_link_route]               = sizeof(struct xen_dm_op_set_pci_link_route),
+        [XEN_DMOP_modified_memory]                  = sizeof(struct xen_dm_op_modified_memory),
+        [XEN_DMOP_set_mem_type]                     = sizeof(struct xen_dm_op_set_mem_type),
+        [XEN_DMOP_inject_event]                     = sizeof(struct xen_dm_op_inject_event),
+        [XEN_DMOP_inject_msi]                       = sizeof(struct xen_dm_op_inject_msi),
+        [XEN_DMOP_map_mem_type_to_ioreq_server]     = sizeof(struct xen_dm_op_map_mem_type_to_ioreq_server),
+        [XEN_DMOP_remote_shutdown]                  = sizeof(struct xen_dm_op_remote_shutdown),
+        [XEN_DMOP_relocate_memory]                  = sizeof(struct xen_dm_op_relocate_memory),
+        [XEN_DMOP_pin_memory_cacheattr]             = sizeof(struct xen_dm_op_pin_memory_cacheattr),
+    };
+
+    rc = rcu_lock_remote_domain_by_id(op_args->domid, &d);
+    if ( rc )
+        return rc;
+
+    if ( !is_hvm_domain(d) )
+        goto out;
+
+    rc = xsm_dm_op(XSM_DM_PRIV, d);
+    if ( rc )
+        goto out;
+
+    rc = -EFAULT;
+    if ( op_args->buf[0].size < offset )
+        goto out;
+
+    if ( copy_from_guest_offset((void *)&op, op_args->buf[0].h, 0, offset) )
+        goto out;
+
+    if ( op.op >= ARRAY_SIZE(op_size) )
+    {
+        rc = -EOPNOTSUPP;
+        goto out;
+    }
+
+    op.op = array_index_nospec(op.op, ARRAY_SIZE(op_size));
+
+    if ( op_args->buf[0].size < offset + op_size[op.op] )
+        goto out;
+
+    if ( copy_from_guest_offset((void *)&op.u, op_args->buf[0].h, offset,
+                                op_size[op.op]) )
+        goto out;
+
+    rc = -EINVAL;
+    if ( op.pad )
+        goto out;
+
+    switch ( op.op )
+    {
+    case XEN_DMOP_create_ioreq_server:
+    {
+        struct xen_dm_op_create_ioreq_server *data =
+            &op.u.create_ioreq_server;
+
+        const_op = false;
+
+        rc = -EINVAL;
+        if ( data->pad[0] || data->pad[1] || data->pad[2] )
+            break;
+
+        rc = hvm_create_ioreq_server(d, data->handle_bufioreq,
+                                     &data->id);
+        break;
+    }
+
+    case XEN_DMOP_get_ioreq_server_info:
+    {
+        struct xen_dm_op_get_ioreq_server_info *data =
+            &op.u.get_ioreq_server_info;
+        const uint16_t valid_flags = XEN_DMOP_no_gfns;
+
+        const_op = false;
+
+        rc = -EINVAL;
+        if ( data->flags & ~valid_flags )
+            break;
+
+        rc = hvm_get_ioreq_server_info(d, data->id,
+                                       (data->flags & XEN_DMOP_no_gfns) ?
+                                       NULL : (unsigned long *)&data->ioreq_gfn,
+                                       (data->flags & XEN_DMOP_no_gfns) ?
+                                       NULL : (unsigned long *)&data->bufioreq_gfn,
+                                       &data->bufioreq_port);
+        break;
+    }
+
+    case XEN_DMOP_map_io_range_to_ioreq_server:
+    {
+        const struct xen_dm_op_ioreq_server_range *data =
+            &op.u.map_io_range_to_ioreq_server;
+
+        rc = -EINVAL;
+        if ( data->pad )
+            break;
+
+        rc = hvm_map_io_range_to_ioreq_server(d, data->id, data->type,
+                                              data->start, data->end);
+        break;
+    }
+
+    case XEN_DMOP_unmap_io_range_from_ioreq_server:
+    {
+        const struct xen_dm_op_ioreq_server_range *data =
+            &op.u.unmap_io_range_from_ioreq_server;
+
+        rc = -EINVAL;
+        if ( data->pad )
+            break;
+
+        rc = hvm_unmap_io_range_from_ioreq_server(d, data->id, data->type,
+                                                  data->start, data->end);
+        break;
+    }
+
+    case XEN_DMOP_set_ioreq_server_state:
+    {
+        const struct xen_dm_op_set_ioreq_server_state *data =
+            &op.u.set_ioreq_server_state;
+
+        rc = -EINVAL;
+        if ( data->pad )
+            break;
+
+        rc = hvm_set_ioreq_server_state(d, data->id, !!data->enabled);
+        break;
+    }
+
+    case XEN_DMOP_destroy_ioreq_server:
+    {
+        const struct xen_dm_op_destroy_ioreq_server *data =
+            &op.u.destroy_ioreq_server;
+
+        rc = -EINVAL;
+        if ( data->pad )
+            break;
+
+        rc = hvm_destroy_ioreq_server(d, data->id);
+        break;
+    }
+
+    default:
+        rc = arch_dm_op(&op, d, op_args, &const_op);
+    }
+
+    if ( (!rc || rc == -ERESTART) &&
+         !const_op && copy_to_guest_offset(op_args->buf[0].h, offset,
+                                           (void *)&op.u, op_size[op.op]) )
+        rc = -EFAULT;
+
+ out:
+    rcu_unlock_domain(d);
+
+    return rc;
+}
+
+#ifdef CONFIG_COMPAT
+CHECK_dm_op_create_ioreq_server;
+CHECK_dm_op_get_ioreq_server_info;
+CHECK_dm_op_ioreq_server_range;
+CHECK_dm_op_set_ioreq_server_state;
+CHECK_dm_op_destroy_ioreq_server;
+CHECK_dm_op_track_dirty_vram;
+CHECK_dm_op_set_pci_intx_level;
+CHECK_dm_op_set_isa_irq_level;
+CHECK_dm_op_set_pci_link_route;
+CHECK_dm_op_modified_memory;
+CHECK_dm_op_set_mem_type;
+CHECK_dm_op_inject_event;
+CHECK_dm_op_inject_msi;
+CHECK_dm_op_remote_shutdown;
+CHECK_dm_op_relocate_memory;
+CHECK_dm_op_pin_memory_cacheattr;
+
+int compat_dm_op(domid_t domid,
+                 unsigned int nr_bufs,
+                 XEN_GUEST_HANDLE_PARAM(void) bufs)
+{
+    struct dmop_args args;
+    unsigned int i;
+    int rc;
+
+    if ( nr_bufs > ARRAY_SIZE(args.buf) )
+        return -E2BIG;
+
+    args.domid = domid;
+    args.nr_bufs = array_index_nospec(nr_bufs, ARRAY_SIZE(args.buf) + 1);
+
+    for ( i = 0; i < args.nr_bufs; i++ )
+    {
+        struct compat_dm_op_buf cmp;
+
+        if ( copy_from_guest_offset(&cmp, bufs, i, 1) )
+            return -EFAULT;
+
+#define XLAT_dm_op_buf_HNDL_h(_d_, _s_) \
+        guest_from_compat_handle((_d_)->h, (_s_)->h)
+
+        XLAT_dm_op_buf(&args.buf[i], &cmp);
+
+#undef XLAT_dm_op_buf_HNDL_h
+    }
+
+    rc = dm_op(&args);
+
+    if ( rc == -ERESTART )
+        rc = hypercall_create_continuation(__HYPERVISOR_dm_op, "iih",
+                                           domid, nr_bufs, bufs);
+
+    return rc;
+}
+#endif
+
+long do_dm_op(domid_t domid,
+              unsigned int nr_bufs,
+              XEN_GUEST_HANDLE_PARAM(xen_dm_op_buf_t) bufs)
+{
+    struct dmop_args args;
+    int rc;
+
+    if ( nr_bufs > ARRAY_SIZE(args.buf) )
+        return -E2BIG;
+
+    args.domid = domid;
+    args.nr_bufs = array_index_nospec(nr_bufs, ARRAY_SIZE(args.buf) + 1);
+
+    if ( copy_from_guest_offset(&args.buf[0], bufs, 0, args.nr_bufs) )
+        return -EFAULT;
+
+    rc = dm_op(&args);
+
+    if ( rc == -ERESTART )
+        rc = hypercall_create_continuation(__HYPERVISOR_dm_op, "iih",
+                                           domid, nr_bufs, bufs);
+
+    return rc;
+}
+
+/*
+ * Local variables:
+ * mode: C
+ * c-file-style: "BSD"
+ * c-basic-offset: 4
+ * tab-width: 4
+ * indent-tabs-mode: nil
+ * End:
+ */
diff --git a/xen/include/xen/hypercall.h b/xen/include/xen/hypercall.h
index 655acc7..19f509f 100644
--- a/xen/include/xen/hypercall.h
+++ b/xen/include/xen/hypercall.h
@@ -150,6 +150,18 @@ do_dm_op(
     unsigned int nr_bufs,
     XEN_GUEST_HANDLE_PARAM(xen_dm_op_buf_t) bufs);
 
+struct dmop_args {
+    domid_t domid;
+    unsigned int nr_bufs;
+    /* Reserve enough buf elements for all current hypercalls. */
+    struct xen_dm_op_buf buf[2];
+};
+
+int arch_dm_op(struct xen_dm_op *op,
+               struct domain *d,
+               const struct dmop_args *op_args,
+               bool *const_op);
+
 #ifdef CONFIG_HYPFS
 extern long
 do_hypfs_op(
-- 
2.7.4




* [RFC PATCH V1 03/12] xen/mm: Make x86's XENMEM_resource_ioreq_server handling common
  2020-08-03 18:21 [RFC PATCH V1 00/12] IOREQ feature (+ virtio-mmio) on Arm Oleksandr Tyshchenko
  2020-08-03 18:21 ` [RFC PATCH V1 01/12] hvm/ioreq: Make x86's IOREQ feature common Oleksandr Tyshchenko
  2020-08-03 18:21 ` [RFC PATCH V1 02/12] hvm/dm: Make x86's DM " Oleksandr Tyshchenko
@ 2020-08-03 18:21 ` Oleksandr Tyshchenko
  2020-08-03 18:21 ` [RFC PATCH V1 04/12] xen/arm: Introduce arch specific bits for IOREQ/DM features Oleksandr Tyshchenko
                   ` (9 subsequent siblings)
  12 siblings, 0 replies; 140+ messages in thread
From: Oleksandr Tyshchenko @ 2020-08-03 18:21 UTC (permalink / raw)
  To: xen-devel
  Cc: Stefano Stabellini, Julien Grall, Wei Liu, Andrew Cooper,
	Ian Jackson, George Dunlap, Oleksandr Tyshchenko, Julien Grall,
	Jan Beulich, Volodymyr Babchuk, Roger Pau Monné

From: Oleksandr Tyshchenko <oleksandr_tyshchenko@epam.com>

As the x86 implementation of XENMEM_resource_ioreq_server can be
re-used on Arm later on, this patch makes it common and removes
arch_acquire_resource as unneeded.

This support is going to be used on Arm to be able to run a device
emulator outside of the Xen hypervisor.

Please note, this is a split/cleanup of Julien's PoC:
"Add support for Guest IO forwarding to a device emulator"

Signed-off-by: Julien Grall <julien.grall@arm.com>
Signed-off-by: Oleksandr Tyshchenko <oleksandr_tyshchenko@epam.com>
---
 xen/arch/x86/mm.c        | 45 ---------------------------------------------
 xen/common/memory.c      | 45 +++++++++++++++++++++++++++++++++++++++++++--
 xen/include/asm-arm/mm.h |  8 --------
 xen/include/asm-x86/mm.h |  4 ----
 4 files changed, 43 insertions(+), 59 deletions(-)

diff --git a/xen/arch/x86/mm.c b/xen/arch/x86/mm.c
index 2b06e15..33238d0 100644
--- a/xen/arch/x86/mm.c
+++ b/xen/arch/x86/mm.c
@@ -99,7 +99,6 @@
  * doing the final put_page(), and remove it from the iommu if so.
  */
 
-#include <xen/hvm/ioreq.h>
 #include <xen/init.h>
 #include <xen/kernel.h>
 #include <xen/lib.h>
@@ -4600,50 +4599,6 @@ int xenmem_add_to_physmap_one(
     return rc;
 }
 
-int arch_acquire_resource(struct domain *d, unsigned int type,
-                          unsigned int id, unsigned long frame,
-                          unsigned int nr_frames, xen_pfn_t mfn_list[])
-{
-    int rc;
-
-    switch ( type )
-    {
-#ifdef CONFIG_HVM
-    case XENMEM_resource_ioreq_server:
-    {
-        ioservid_t ioservid = id;
-        unsigned int i;
-
-        rc = -EINVAL;
-        if ( !is_hvm_domain(d) )
-            break;
-
-        if ( id != (unsigned int)ioservid )
-            break;
-
-        rc = 0;
-        for ( i = 0; i < nr_frames; i++ )
-        {
-            mfn_t mfn;
-
-            rc = hvm_get_ioreq_server_frame(d, id, frame + i, &mfn);
-            if ( rc )
-                break;
-
-            mfn_list[i] = mfn_x(mfn);
-        }
-        break;
-    }
-#endif
-
-    default:
-        rc = -EOPNOTSUPP;
-        break;
-    }
-
-    return rc;
-}
-
 long arch_memory_op(unsigned long cmd, XEN_GUEST_HANDLE_PARAM(void) arg)
 {
     int rc;
diff --git a/xen/common/memory.c b/xen/common/memory.c
index 714077c..9283e5e 100644
--- a/xen/common/memory.c
+++ b/xen/common/memory.c
@@ -30,6 +30,10 @@
 #include <public/memory.h>
 #include <xsm/xsm.h>
 
+#ifdef CONFIG_IOREQ_SERVER
+#include <xen/hvm/ioreq.h>
+#endif
+
 #ifdef CONFIG_X86
 #include <asm/guest.h>
 #endif
@@ -1045,6 +1049,38 @@ static int acquire_grant_table(struct domain *d, unsigned int id,
     return 0;
 }
 
+#ifdef CONFIG_IOREQ_SERVER
+static int acquire_ioreq_server(struct domain *d,
+                                unsigned int id,
+                                unsigned long frame,
+                                unsigned int nr_frames,
+                                xen_pfn_t mfn_list[])
+{
+    ioservid_t ioservid = id;
+    unsigned int i;
+    int rc;
+
+    if ( !is_hvm_domain(d) )
+        return -EINVAL;
+
+    if ( id != (unsigned int)ioservid )
+        return -EINVAL;
+
+    for ( i = 0; i < nr_frames; i++ )
+    {
+        mfn_t mfn;
+
+        rc = hvm_get_ioreq_server_frame(d, id, frame + i, &mfn);
+        if ( rc )
+            return rc;
+
+        mfn_list[i] = mfn_x(mfn);
+    }
+
+    return 0;
+}
+#endif
+
 static int acquire_resource(
     XEN_GUEST_HANDLE_PARAM(xen_mem_acquire_resource_t) arg)
 {
@@ -1095,9 +1131,14 @@ static int acquire_resource(
                                  mfn_list);
         break;
 
+#ifdef CONFIG_IOREQ_SERVER
+    case XENMEM_resource_ioreq_server:
+        rc = acquire_ioreq_server(d, xmar.id, xmar.frame, xmar.nr_frames,
+                                  mfn_list);
+        break;
+#endif
     default:
-        rc = arch_acquire_resource(d, xmar.type, xmar.id, xmar.frame,
-                                   xmar.nr_frames, mfn_list);
+        rc = -EOPNOTSUPP;
         break;
     }
 
diff --git a/xen/include/asm-arm/mm.h b/xen/include/asm-arm/mm.h
index f8ba49b..0b7de31 100644
--- a/xen/include/asm-arm/mm.h
+++ b/xen/include/asm-arm/mm.h
@@ -358,14 +358,6 @@ static inline void put_page_and_type(struct page_info *page)
 
 void clear_and_clean_page(struct page_info *page);
 
-static inline
-int arch_acquire_resource(struct domain *d, unsigned int type, unsigned int id,
-                          unsigned long frame, unsigned int nr_frames,
-                          xen_pfn_t mfn_list[])
-{
-    return -EOPNOTSUPP;
-}
-
 unsigned int arch_get_dma_bitsize(void);
 
 #endif /*  __ARCH_ARM_MM__ */
diff --git a/xen/include/asm-x86/mm.h b/xen/include/asm-x86/mm.h
index 7e74996..2e111ad 100644
--- a/xen/include/asm-x86/mm.h
+++ b/xen/include/asm-x86/mm.h
@@ -649,8 +649,4 @@ static inline bool arch_mfn_in_directmap(unsigned long mfn)
     return mfn <= (virt_to_mfn(eva - 1) + 1);
 }
 
-int arch_acquire_resource(struct domain *d, unsigned int type,
-                          unsigned int id, unsigned long frame,
-                          unsigned int nr_frames, xen_pfn_t mfn_list[]);
-
 #endif /* __ASM_X86_MM_H__ */
-- 
2.7.4




* [RFC PATCH V1 04/12] xen/arm: Introduce arch specific bits for IOREQ/DM features
  2020-08-03 18:21 [RFC PATCH V1 00/12] IOREQ feature (+ virtio-mmio) on Arm Oleksandr Tyshchenko
                   ` (2 preceding siblings ...)
  2020-08-03 18:21 ` [RFC PATCH V1 03/12] xen/mm: Make x86's XENMEM_resource_ioreq_server handling common Oleksandr Tyshchenko
@ 2020-08-03 18:21 ` Oleksandr Tyshchenko
  2020-08-04  7:49   ` Paul Durrant
                     ` (3 more replies)
  2020-08-03 18:21 ` [RFC PATCH V1 05/12] hvm/dm: Introduce xendevicemodel_set_irq_level DM op Oleksandr Tyshchenko
                   ` (8 subsequent siblings)
  12 siblings, 4 replies; 140+ messages in thread
From: Oleksandr Tyshchenko @ 2020-08-03 18:21 UTC (permalink / raw)
  To: xen-devel
  Cc: Stefano Stabellini, Julien Grall, Wei Liu, Andrew Cooper,
	Ian Jackson, George Dunlap, Oleksandr Tyshchenko, Julien Grall,
	Jan Beulich, Daniel De Graaf, Volodymyr Babchuk

From: Oleksandr Tyshchenko <oleksandr_tyshchenko@epam.com>

This patch makes it possible to forward guest MMIO accesses
to a device emulator on Arm and enables that support for
Arm64.

Also update the XSM code a bit to let the DM op be used on Arm.
A new arch DM op will be introduced in a follow-up patch.

Please note, at the moment the build on Arm32 is broken if
CONFIG_IOREQ_SERVER is enabled, due to the lack of cmpxchg_64
support on Arm32 (see the cmpxchg usage in
hvm_send_buffered_ioreq()).

Please note, this is a split/cleanup of Julien's PoC:
"Add support for Guest IO forwarding to a device emulator"

Signed-off-by: Julien Grall <julien.grall@arm.com>
Signed-off-by: Oleksandr Tyshchenko <oleksandr_tyshchenko@epam.com>
---
 tools/libxc/xc_dom_arm.c        |  25 +++++++---
 xen/arch/arm/Kconfig            |   1 +
 xen/arch/arm/Makefile           |   2 +
 xen/arch/arm/dm.c               |  34 +++++++++++++
 xen/arch/arm/domain.c           |   9 ++++
 xen/arch/arm/hvm.c              |  46 +++++++++++++++++-
 xen/arch/arm/io.c               |  67 +++++++++++++++++++++++++-
 xen/arch/arm/ioreq.c            |  86 +++++++++++++++++++++++++++++++++
 xen/arch/arm/traps.c            |  17 +++++++
 xen/common/memory.c             |   5 +-
 xen/include/asm-arm/domain.h    |  80 +++++++++++++++++++++++++++++++
 xen/include/asm-arm/hvm/ioreq.h | 103 ++++++++++++++++++++++++++++++++++++++++
 xen/include/asm-arm/mmio.h      |   1 +
 xen/include/asm-arm/p2m.h       |   7 +--
 xen/include/xsm/dummy.h         |   4 +-
 xen/include/xsm/xsm.h           |   6 +--
 xen/xsm/dummy.c                 |   2 +-
 xen/xsm/flask/hooks.c           |   5 +-
 18 files changed, 476 insertions(+), 24 deletions(-)
 create mode 100644 xen/arch/arm/dm.c
 create mode 100644 xen/arch/arm/ioreq.c
 create mode 100644 xen/include/asm-arm/hvm/ioreq.h

diff --git a/tools/libxc/xc_dom_arm.c b/tools/libxc/xc_dom_arm.c
index 931404c..b5fc066 100644
--- a/tools/libxc/xc_dom_arm.c
+++ b/tools/libxc/xc_dom_arm.c
@@ -26,11 +26,19 @@
 #include "xg_private.h"
 #include "xc_dom.h"
 
-#define NR_MAGIC_PAGES 4
+
 #define CONSOLE_PFN_OFFSET 0
 #define XENSTORE_PFN_OFFSET 1
 #define MEMACCESS_PFN_OFFSET 2
 #define VUART_PFN_OFFSET 3
+#define IOREQ_SERVER_PFN_OFFSET 4
+
+#define NR_IOREQ_SERVER_PAGES 8
+#define NR_MAGIC_PAGES (4 + NR_IOREQ_SERVER_PAGES)
+
+#define GUEST_MAGIC_BASE_PFN (GUEST_MAGIC_BASE >> XC_PAGE_SHIFT)
+
+#define special_pfn(x)  (GUEST_MAGIC_BASE_PFN + (x))
 
 #define LPAE_SHIFT 9
 
@@ -51,7 +59,7 @@ const char *xc_domain_get_native_protocol(xc_interface *xch,
 static int alloc_magic_pages(struct xc_dom_image *dom)
 {
     int rc, i;
-    const xen_pfn_t base = GUEST_MAGIC_BASE >> XC_PAGE_SHIFT;
+    const xen_pfn_t base = special_pfn(0);
     xen_pfn_t p2m[NR_MAGIC_PAGES];
 
     BUILD_BUG_ON(NR_MAGIC_PAGES > GUEST_MAGIC_SIZE >> XC_PAGE_SHIFT);
@@ -71,10 +79,9 @@ static int alloc_magic_pages(struct xc_dom_image *dom)
     dom->xenstore_pfn = base + XENSTORE_PFN_OFFSET;
     dom->vuart_gfn = base + VUART_PFN_OFFSET;
 
-    xc_clear_domain_page(dom->xch, dom->guest_domid, dom->console_pfn);
-    xc_clear_domain_page(dom->xch, dom->guest_domid, dom->xenstore_pfn);
-    xc_clear_domain_page(dom->xch, dom->guest_domid, base + MEMACCESS_PFN_OFFSET);
-    xc_clear_domain_page(dom->xch, dom->guest_domid, dom->vuart_gfn);
+    /* XXX: Check return */
+    xc_clear_domain_pages(dom->xch, dom->guest_domid, special_pfn(0),
+                          NR_MAGIC_PAGES);
 
     xc_hvm_param_set(dom->xch, dom->guest_domid, HVM_PARAM_CONSOLE_PFN,
             dom->console_pfn);
@@ -88,6 +95,12 @@ static int alloc_magic_pages(struct xc_dom_image *dom)
     xc_hvm_param_set(dom->xch, dom->guest_domid, HVM_PARAM_STORE_EVTCHN,
             dom->xenstore_evtchn);
 
+    /* Tell the domain where the pages are and how many there are. */
+    xc_hvm_param_set(dom->xch, dom->guest_domid, HVM_PARAM_IOREQ_SERVER_PFN,
+                     special_pfn(IOREQ_SERVER_PFN_OFFSET));
+    xc_hvm_param_set(dom->xch, dom->guest_domid, HVM_PARAM_NR_IOREQ_SERVER_PAGES,
+                     NR_IOREQ_SERVER_PAGES);
+
     return 0;
 }
 
diff --git a/xen/arch/arm/Kconfig b/xen/arch/arm/Kconfig
index 2777388..6b8a969 100644
--- a/xen/arch/arm/Kconfig
+++ b/xen/arch/arm/Kconfig
@@ -13,6 +13,7 @@ config ARM_64
 	def_bool y
 	depends on 64BIT
 	select HAS_FAST_MULTIPLY
+	select IOREQ_SERVER
 
 config ARM
 	def_bool y
diff --git a/xen/arch/arm/Makefile b/xen/arch/arm/Makefile
index 7e82b21..617fa3e 100644
--- a/xen/arch/arm/Makefile
+++ b/xen/arch/arm/Makefile
@@ -13,6 +13,7 @@ obj-y += cpuerrata.o
 obj-y += cpufeature.o
 obj-y += decode.o
 obj-y += device.o
+obj-$(CONFIG_IOREQ_SERVER) += dm.o
 obj-y += domain.o
 obj-y += domain_build.init.o
 obj-y += domctl.o
@@ -27,6 +28,7 @@ obj-y += guest_atomics.o
 obj-y += guest_walk.o
 obj-y += hvm.o
 obj-y += io.o
+obj-$(CONFIG_IOREQ_SERVER) += ioreq.o
 obj-y += irq.o
 obj-y += kernel.init.o
 obj-$(CONFIG_LIVEPATCH) += livepatch.o
diff --git a/xen/arch/arm/dm.c b/xen/arch/arm/dm.c
new file mode 100644
index 0000000..2437099
--- /dev/null
+++ b/xen/arch/arm/dm.c
@@ -0,0 +1,34 @@
+/*
+ * Copyright (c) 2019 Arm ltd.
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms and conditions of the GNU General Public License,
+ * version 2, as published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope it will be useful, but WITHOUT
+ * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
+ * FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License for
+ * more details.
+ *
+ * You should have received a copy of the GNU General Public License along with
+ * this program; If not, see <http://www.gnu.org/licenses/>.
+ */
+
+#include <xen/hypercall.h>
+#include <asm/vgic.h>
+
+int arch_dm_op(struct xen_dm_op *op, struct domain *d,
+               const struct dmop_args *op_args, bool *const_op)
+{
+    return -EOPNOTSUPP;
+}
+
+/*
+ * Local variables:
+ * mode: C
+ * c-file-style: "BSD"
+ * c-basic-offset: 4
+ * tab-width: 4
+ * indent-tabs-mode: nil
+ * End:
+ */
diff --git a/xen/arch/arm/domain.c b/xen/arch/arm/domain.c
index 3116932..658eec0 100644
--- a/xen/arch/arm/domain.c
+++ b/xen/arch/arm/domain.c
@@ -12,6 +12,7 @@
 #include <xen/bitops.h>
 #include <xen/errno.h>
 #include <xen/grant_table.h>
+#include <xen/hvm/ioreq.h>
 #include <xen/hypercall.h>
 #include <xen/init.h>
 #include <xen/lib.h>
@@ -681,6 +682,10 @@ int arch_domain_create(struct domain *d,
 
     ASSERT(config != NULL);
 
+#ifdef CONFIG_IOREQ_SERVER
+    hvm_ioreq_init(d);
+#endif
+
     /* p2m_init relies on some value initialized by the IOMMU subsystem */
     if ( (rc = iommu_domain_init(d, config->iommu_opts)) != 0 )
         goto fail;
@@ -999,6 +1004,10 @@ int domain_relinquish_resources(struct domain *d)
         if (ret )
             return ret;
 
+#ifdef CONFIG_IOREQ_SERVER
+        hvm_destroy_all_ioreq_servers(d);
+#endif
+
     PROGRESS(xen):
         ret = relinquish_memory(d, &d->xenpage_list);
         if ( ret )
diff --git a/xen/arch/arm/hvm.c b/xen/arch/arm/hvm.c
index 8951b34..0379493 100644
--- a/xen/arch/arm/hvm.c
+++ b/xen/arch/arm/hvm.c
@@ -51,6 +51,14 @@ static int hvm_allow_set_param(const struct domain *d, unsigned int param)
     case HVM_PARAM_MONITOR_RING_PFN:
         return d == current->domain ? -EPERM : 0;
 
+        /*
+         * XXX: Do we need to follow x86's logic here:
+         * "The following parameters should only be changed once"?
+         */
+    case HVM_PARAM_IOREQ_SERVER_PFN:
+    case HVM_PARAM_NR_IOREQ_SERVER_PAGES:
+        return 0;
+
         /* Writeable only by Xen, hole, deprecated, or out-of-range. */
     default:
         return -EINVAL;
@@ -69,6 +77,11 @@ static int hvm_allow_get_param(const struct domain *d, unsigned int param)
     case HVM_PARAM_CONSOLE_EVTCHN:
         return 0;
 
+        /* XXX: Could these be read by someone? What policy to apply? */
+    case HVM_PARAM_IOREQ_SERVER_PFN:
+    case HVM_PARAM_NR_IOREQ_SERVER_PAGES:
+        return 0;
+
         /*
          * The following parameters are intended for toolstack usage only.
          * They may not be read by the domain.
@@ -82,6 +95,37 @@ static int hvm_allow_get_param(const struct domain *d, unsigned int param)
     }
 }
 
+static int hvmop_set_param(struct domain *d, const struct xen_hvm_param *a)
+{
+    int rc = 0;
+
+    switch ( a->index )
+    {
+    case HVM_PARAM_IOREQ_SERVER_PFN:
+        d->arch.hvm.ioreq_gfn.base = a->value;
+        break;
+    case HVM_PARAM_NR_IOREQ_SERVER_PAGES:
+    {
+        unsigned int i;
+
+        if ( a->value == 0 ||
+             a->value > sizeof(d->arch.hvm.ioreq_gfn.mask) * 8 )
+        {
+            rc = -EINVAL;
+            break;
+        }
+        for ( i = 0; i < a->value; i++ )
+            set_bit(i, &d->arch.hvm.ioreq_gfn.mask);
+
+        break;
+    }
+    }
+
+    d->arch.hvm.params[a->index] = a->value;
+
+    return rc;
+}
+
 long do_hvm_op(unsigned long op, XEN_GUEST_HANDLE_PARAM(void) arg)
 {
     long rc = 0;
@@ -111,7 +155,7 @@ long do_hvm_op(unsigned long op, XEN_GUEST_HANDLE_PARAM(void) arg)
             if ( rc )
                 goto param_fail;
 
-            d->arch.hvm.params[a.index] = a.value;
+            rc = hvmop_set_param(d, &a);
         }
         else
         {
diff --git a/xen/arch/arm/io.c b/xen/arch/arm/io.c
index ae7ef96..436f669 100644
--- a/xen/arch/arm/io.c
+++ b/xen/arch/arm/io.c
@@ -16,6 +16,7 @@
  * GNU General Public License for more details.
  */
 
+#include <xen/hvm/ioreq.h>
 #include <xen/lib.h>
 #include <xen/spinlock.h>
 #include <xen/sched.h>
@@ -107,6 +108,62 @@ static const struct mmio_handler *find_mmio_handler(struct domain *d,
     return handler;
 }
 
+#ifdef CONFIG_IOREQ_SERVER
+static enum io_state try_fwd_ioserv(struct cpu_user_regs *regs,
+                                    struct vcpu *v, mmio_info_t *info)
+{
+    struct hvm_vcpu_io *vio = &v->arch.hvm.hvm_io;
+    ioreq_t p = {
+        .type = IOREQ_TYPE_COPY,
+        .addr = info->gpa,
+        .size = 1 << info->dabt.size,
+        .count = 0,
+        .dir = !info->dabt.write,
+        .df = 0,         /* XXX: What's for? */
+        .data = get_user_reg(regs, info->dabt.reg),
+        .state = STATE_IOREQ_READY,
+    };
+    struct hvm_ioreq_server *s = NULL;
+    enum io_state rc;
+
+    switch ( vio->io_req.state )
+    {
+    case STATE_IOREQ_NONE:
+        break;
+    default:
+        printk("d%u wrong state %u\n", v->domain->domain_id,
+               vio->io_req.state);
+        return IO_ABORT;
+    }
+
+    s = hvm_select_ioreq_server(v->domain, &p);
+    if ( !s )
+        return IO_UNHANDLED;
+
+    if ( !info->dabt.valid )
+    {
+        printk("Valid bit not set\n");
+        return IO_ABORT;
+    }
+
+    vio->io_req = p;
+
+    rc = hvm_send_ioreq(s, &p, 0);
+    if ( rc != IO_RETRY || v->domain->is_shutting_down )
+        vio->io_req.state = STATE_IOREQ_NONE;
+    else if ( !hvm_ioreq_needs_completion(&vio->io_req) )
+        rc = IO_HANDLED;
+    else
+        vio->io_completion = HVMIO_mmio_completion;
+
+    /* XXX: Decide what to do */
+    if ( rc == IO_RETRY )
+        rc = IO_HANDLED;
+
+    return rc;
+}
+#endif
+
 enum io_state try_handle_mmio(struct cpu_user_regs *regs,
                               const union hsr hsr,
                               paddr_t gpa)
@@ -123,7 +180,15 @@ enum io_state try_handle_mmio(struct cpu_user_regs *regs,
 
     handler = find_mmio_handler(v->domain, info.gpa);
     if ( !handler )
-        return IO_UNHANDLED;
+    {
+        int rc = IO_UNHANDLED;
+
+#ifdef CONFIG_IOREQ_SERVER
+        rc = try_fwd_ioserv(regs, v, &info);
+#endif
+
+        return rc;
+    }
 
     /* All the instructions used on emulated MMIO region should be valid */
     if ( !dabt.valid )
diff --git a/xen/arch/arm/ioreq.c b/xen/arch/arm/ioreq.c
new file mode 100644
index 0000000..a9cc839
--- /dev/null
+++ b/xen/arch/arm/ioreq.c
@@ -0,0 +1,86 @@
+/*
+ * arm/ioreq.c: hardware virtual machine I/O emulation
+ *
+ * Copyright (c) 2019 Arm ltd.
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms and conditions of the GNU General Public License,
+ * version 2, as published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope it will be useful, but WITHOUT
+ * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
+ * FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License for
+ * more details.
+ *
+ * You should have received a copy of the GNU General Public License along with
+ * this program; If not, see <http://www.gnu.org/licenses/>.
+ */
+
+#include <xen/ctype.h>
+#include <xen/hvm/ioreq.h>
+#include <xen/init.h>
+#include <xen/lib.h>
+#include <xen/trace.h>
+#include <xen/sched.h>
+#include <xen/irq.h>
+#include <xen/softirq.h>
+#include <xen/domain.h>
+#include <xen/domain_page.h>
+#include <xen/event.h>
+#include <xen/paging.h>
+#include <xen/vpci.h>
+
+#include <public/hvm/dm_op.h>
+#include <public/hvm/ioreq.h>
+
+bool handle_mmio(void)
+{
+    struct vcpu *v = current;
+    struct cpu_user_regs *regs = guest_cpu_user_regs();
+    const union hsr hsr = { .bits = regs->hsr };
+    const struct hsr_dabt dabt = hsr.dabt;
+    /* Code is similar to handle_read */
+    uint8_t size = (1 << dabt.size) * 8;
+    register_t r = v->arch.hvm.hvm_io.io_req.data;
+
+    /* We should only be here on Guest Data Abort */
+    ASSERT(dabt.ec == HSR_EC_DATA_ABORT_LOWER_EL);
+
+    /* We are done with the IO */
+    /* XXX: Is it the right place? */
+    v->arch.hvm.hvm_io.io_req.state = STATE_IOREQ_NONE;
+
+    /* XXX: Do we need to take care of write here ? */
+    if ( dabt.write )
+        return true;
+
+    /*
+     * Sign extend if required.
+     * Note that we expect the read handler to have zeroed the bits
+     * outside the requested access size.
+     */
+    if ( dabt.sign && (r & (1UL << (size - 1))) )
+    {
+        /*
+         * We are relying on register_t using the same as
+         * an unsigned long in order to keep the 32-bit assembly
+         * code smaller.
+         */
+        BUILD_BUG_ON(sizeof(register_t) != sizeof(unsigned long));
+        r |= (~0UL) << size;
+    }
+
+    set_user_reg(regs, dabt.reg, r);
+
+    return true;
+}
+
+/*
+ * Local variables:
+ * mode: C
+ * c-file-style: "BSD"
+ * c-basic-offset: 4
+ * tab-width: 4
+ * indent-tabs-mode: nil
+ * End:
+ */
diff --git a/xen/arch/arm/traps.c b/xen/arch/arm/traps.c
index 8f40d0e..4cdf098 100644
--- a/xen/arch/arm/traps.c
+++ b/xen/arch/arm/traps.c
@@ -18,6 +18,7 @@
 
 #include <xen/domain_page.h>
 #include <xen/errno.h>
+#include <xen/hvm/ioreq.h>
 #include <xen/hypercall.h>
 #include <xen/init.h>
 #include <xen/iocap.h>
@@ -1384,6 +1385,9 @@ static arm_hypercall_t arm_hypercall_table[] = {
 #ifdef CONFIG_HYPFS
     HYPERCALL(hypfs_op, 5),
 #endif
+#ifdef CONFIG_IOREQ_SERVER
+    HYPERCALL(dm_op, 3),
+#endif
 };
 
 #ifndef NDEBUG
@@ -1958,6 +1962,9 @@ static void do_trap_stage2_abort_guest(struct cpu_user_regs *regs,
             case IO_UNHANDLED:
                 /* IO unhandled, try another way to handle it. */
                 break;
+            default:
+                /* XXX: Handle IO_RETRY */
+                ASSERT_UNREACHABLE();
             }
         }
 
@@ -2275,6 +2282,16 @@ static void check_for_vcpu_work(void)
  */
 void leave_hypervisor_to_guest(void)
 {
+#ifdef CONFIG_IOREQ_SERVER
+    /*
+     * XXX: Check the return. Shall we call that in
+     * continue_running and context_switch instead?
+     * The benefits would be to avoid calling
+     * handle_hvm_io_completion on every return.
+     */
+    local_irq_enable();
+    handle_hvm_io_completion(current);
+#endif
     local_irq_disable();
 
     check_for_vcpu_work();
diff --git a/xen/common/memory.c b/xen/common/memory.c
index 9283e5e..0000477 100644
--- a/xen/common/memory.c
+++ b/xen/common/memory.c
@@ -8,6 +8,7 @@
  */
 
 #include <xen/domain_page.h>
+#include <xen/hvm/ioreq.h>
 #include <xen/types.h>
 #include <xen/lib.h>
 #include <xen/mm.h>
@@ -30,10 +31,6 @@
 #include <public/memory.h>
 #include <xsm/xsm.h>
 
-#ifdef CONFIG_IOREQ_SERVER
-#include <xen/hvm/ioreq.h>
-#endif
-
 #ifdef CONFIG_X86
 #include <asm/guest.h>
 #endif
diff --git a/xen/include/asm-arm/domain.h b/xen/include/asm-arm/domain.h
index 4e2f582..e060b0a 100644
--- a/xen/include/asm-arm/domain.h
+++ b/xen/include/asm-arm/domain.h
@@ -11,12 +11,64 @@
 #include <asm/vgic.h>
 #include <asm/vpl011.h>
 #include <public/hvm/params.h>
+#include <public/hvm/dm_op.h>
+#include <public/hvm/ioreq.h>
 #include <xen/serial.h>
 #include <xen/rbtree.h>
 
+struct hvm_ioreq_page {
+    gfn_t gfn;
+    struct page_info *page;
+    void *va;
+};
+
+struct hvm_ioreq_vcpu {
+    struct list_head list_entry;
+    struct vcpu      *vcpu;
+    evtchn_port_t    ioreq_evtchn;
+    bool             pending;
+};
+
+#define NR_IO_RANGE_TYPES (XEN_DMOP_IO_RANGE_PCI + 1)
+#define MAX_NR_IO_RANGES  256
+
+#define MAX_NR_IOREQ_SERVERS 8
+#define DEFAULT_IOSERVID 0
+
+struct hvm_ioreq_server {
+    struct domain          *target, *emulator;
+
+    /* Lock to serialize toolstack modifications */
+    spinlock_t             lock;
+
+    struct hvm_ioreq_page  ioreq;
+    struct list_head       ioreq_vcpu_list;
+    struct hvm_ioreq_page  bufioreq;
+
+    /* Lock to serialize access to buffered ioreq ring */
+    spinlock_t             bufioreq_lock;
+    evtchn_port_t          bufioreq_evtchn;
+    struct rangeset        *range[NR_IO_RANGE_TYPES];
+    bool                   enabled;
+    uint8_t                bufioreq_handling;
+};
+
 struct hvm_domain
 {
     uint64_t              params[HVM_NR_PARAMS];
+
+    /* Guest page range used for non-default ioreq servers */
+    struct {
+        unsigned long base;
+        unsigned long mask;
+        unsigned long legacy_mask; /* indexed by HVM param number */
+    } ioreq_gfn;
+
+    /* Lock protects all other values in the sub-struct and the default */
+    struct {
+        spinlock_t              lock;
+        struct hvm_ioreq_server *server[MAX_NR_IOREQ_SERVERS];
+    } ioreq_server;
 };
 
 #ifdef CONFIG_ARM_64
@@ -93,6 +145,29 @@ struct arch_domain
 #endif
 }  __cacheline_aligned;
 
+enum hvm_io_completion {
+    HVMIO_no_completion,
+    HVMIO_mmio_completion,
+    HVMIO_pio_completion,
+    HVMIO_realmode_completion
+};
+
+struct hvm_vcpu_io {
+    /* I/O request in flight to device model. */
+    enum hvm_io_completion io_completion;
+    ioreq_t                io_req;
+
+    /*
+     * HVM emulation:
+     *  Linear address @mmio_gla maps to MMIO physical frame @mmio_gpfn.
+     *  The latter is known to be an MMIO frame (not RAM).
+     *  This translation is only valid for accesses as per @mmio_access.
+     */
+    struct npfec        mmio_access;
+    unsigned long       mmio_gla;
+    unsigned long       mmio_gpfn;
+};
+
 struct arch_vcpu
 {
     struct {
@@ -206,6 +281,11 @@ struct arch_vcpu
      */
     bool need_flush_to_ram;
 
+    struct hvm_vcpu
+    {
+        struct hvm_vcpu_io hvm_io;
+    } hvm;
+
 }  __cacheline_aligned;
 
 void vcpu_show_execution_state(struct vcpu *);
diff --git a/xen/include/asm-arm/hvm/ioreq.h b/xen/include/asm-arm/hvm/ioreq.h
new file mode 100644
index 0000000..83a560c
--- /dev/null
+++ b/xen/include/asm-arm/hvm/ioreq.h
@@ -0,0 +1,103 @@
+/*
+ * hvm.h: Hardware virtual machine assist interface definitions.
+ *
+ * Copyright (c) 2016 Citrix Systems Inc.
+ * Copyright (c) 2019 Arm ltd.
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms and conditions of the GNU General Public License,
+ * version 2, as published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope it will be useful, but WITHOUT
+ * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
+ * FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License for
+ * more details.
+ *
+ * You should have received a copy of the GNU General Public License along with
+ * this program; If not, see <http://www.gnu.org/licenses/>.
+ */
+
+#ifndef __ASM_ARM_HVM_IOREQ_H__
+#define __ASM_ARM_HVM_IOREQ_H__
+
+#include <public/hvm/ioreq.h>
+#include <public/hvm/dm_op.h>
+
+#define has_vpci(d) (false)
+
+bool handle_mmio(void);
+
+static inline bool handle_pio(uint16_t port, unsigned int size, int dir)
+{
+    /* XXX */
+    BUG();
+    return true;
+}
+
+static inline paddr_t hvm_mmio_first_byte(const ioreq_t *p)
+{
+    return p->addr;
+}
+
+static inline paddr_t hvm_mmio_last_byte(const ioreq_t *p)
+{
+    unsigned long size = p->size;
+
+    return p->addr + size - 1;
+}
+
+struct hvm_ioreq_server;
+
+static inline int p2m_set_ioreq_server(struct domain *d,
+                                       unsigned int flags,
+                                       struct hvm_ioreq_server *s)
+{
+    return -EOPNOTSUPP;
+}
+
+static inline void msix_write_completion(struct vcpu *v)
+{
+}
+
+static inline void handle_realmode_completion(void)
+{
+    ASSERT_UNREACHABLE();
+}
+
+static inline void paging_mark_pfn_dirty(struct domain *d, pfn_t pfn)
+{
+}
+
+static inline void hvm_get_ioreq_server_range_type(struct domain *d,
+                                                   ioreq_t *p,
+                                                   uint8_t *type,
+                                                   uint64_t *addr)
+{
+    *type = (p->type == IOREQ_TYPE_PIO) ?
+             XEN_DMOP_IO_RANGE_PORT : XEN_DMOP_IO_RANGE_MEMORY;
+    *addr = p->addr;
+}
+
+static inline void arch_hvm_ioreq_init(struct domain *d)
+{
+}
+
+static inline void arch_hvm_ioreq_destroy(struct domain *d)
+{
+}
+
+#define IOREQ_IO_HANDLED     IO_HANDLED
+#define IOREQ_IO_UNHANDLED   IO_UNHANDLED
+#define IOREQ_IO_RETRY       IO_RETRY
+
+#endif /* __ASM_ARM_HVM_IOREQ_H__ */
+
+/*
+ * Local variables:
+ * mode: C
+ * c-file-style: "BSD"
+ * c-basic-offset: 4
+ * tab-width: 4
+ * indent-tabs-mode: nil
+ * End:
+ */
diff --git a/xen/include/asm-arm/mmio.h b/xen/include/asm-arm/mmio.h
index 8dbfb27..7ab873c 100644
--- a/xen/include/asm-arm/mmio.h
+++ b/xen/include/asm-arm/mmio.h
@@ -37,6 +37,7 @@ enum io_state
     IO_ABORT,       /* The IO was handled by the helper and led to an abort. */
     IO_HANDLED,     /* The IO was successfully handled by the helper. */
     IO_UNHANDLED,   /* The IO was not handled by the helper. */
+    IO_RETRY,       /* Retry the emulation for some reason */
 };
 
 typedef int (*mmio_read_t)(struct vcpu *v, mmio_info_t *info,
diff --git a/xen/include/asm-arm/p2m.h b/xen/include/asm-arm/p2m.h
index 5fdb6e8..5823f11 100644
--- a/xen/include/asm-arm/p2m.h
+++ b/xen/include/asm-arm/p2m.h
@@ -385,10 +385,11 @@ static inline int set_foreign_p2m_entry(struct domain *d, unsigned long gfn,
                                         mfn_t mfn)
 {
     /*
-     * NOTE: If this is implemented then proper reference counting of
-     *       foreign entries will need to be implemented.
+     * XXX: handle properly reference. It looks like the page may not always
+     * belong to d.
      */
-    return -EOPNOTSUPP;
+
+    return guest_physmap_add_entry(d, _gfn(gfn), mfn, 0, p2m_ram_rw);
 }
 
 /*
diff --git a/xen/include/xsm/dummy.h b/xen/include/xsm/dummy.h
index 2368ace..317455a 100644
--- a/xen/include/xsm/dummy.h
+++ b/xen/include/xsm/dummy.h
@@ -713,14 +713,14 @@ static XSM_INLINE int xsm_pmu_op (XSM_DEFAULT_ARG struct domain *d, unsigned int
     }
 }
 
+#endif /* CONFIG_X86 */
+
 static XSM_INLINE int xsm_dm_op(XSM_DEFAULT_ARG struct domain *d)
 {
     XSM_ASSERT_ACTION(XSM_DM_PRIV);
     return xsm_default_action(action, current->domain, d);
 }
 
-#endif /* CONFIG_X86 */
-
 #ifdef CONFIG_ARGO
 static XSM_INLINE int xsm_argo_enable(const struct domain *d)
 {
diff --git a/xen/include/xsm/xsm.h b/xen/include/xsm/xsm.h
index a80bcf3..2a9b39d 100644
--- a/xen/include/xsm/xsm.h
+++ b/xen/include/xsm/xsm.h
@@ -177,8 +177,8 @@ struct xsm_operations {
     int (*ioport_permission) (struct domain *d, uint32_t s, uint32_t e, uint8_t allow);
     int (*ioport_mapping) (struct domain *d, uint32_t s, uint32_t e, uint8_t allow);
     int (*pmu_op) (struct domain *d, unsigned int op);
-    int (*dm_op) (struct domain *d);
 #endif
+    int (*dm_op) (struct domain *d);
     int (*xen_version) (uint32_t cmd);
     int (*domain_resource_map) (struct domain *d);
 #ifdef CONFIG_ARGO
@@ -688,13 +688,13 @@ static inline int xsm_pmu_op (xsm_default_t def, struct domain *d, unsigned int
     return xsm_ops->pmu_op(d, op);
 }
 
+#endif /* CONFIG_X86 */
+
 static inline int xsm_dm_op(xsm_default_t def, struct domain *d)
 {
     return xsm_ops->dm_op(d);
 }
 
-#endif /* CONFIG_X86 */
-
 static inline int xsm_xen_version (xsm_default_t def, uint32_t op)
 {
     return xsm_ops->xen_version(op);
diff --git a/xen/xsm/dummy.c b/xen/xsm/dummy.c
index d4cce68..e3afd06 100644
--- a/xen/xsm/dummy.c
+++ b/xen/xsm/dummy.c
@@ -148,8 +148,8 @@ void __init xsm_fixup_ops (struct xsm_operations *ops)
     set_to_dummy_if_null(ops, ioport_permission);
     set_to_dummy_if_null(ops, ioport_mapping);
     set_to_dummy_if_null(ops, pmu_op);
-    set_to_dummy_if_null(ops, dm_op);
 #endif
+    set_to_dummy_if_null(ops, dm_op);
     set_to_dummy_if_null(ops, xen_version);
     set_to_dummy_if_null(ops, domain_resource_map);
 #ifdef CONFIG_ARGO
diff --git a/xen/xsm/flask/hooks.c b/xen/xsm/flask/hooks.c
index a314bf8..645192a 100644
--- a/xen/xsm/flask/hooks.c
+++ b/xen/xsm/flask/hooks.c
@@ -1662,14 +1662,13 @@ static int flask_pmu_op (struct domain *d, unsigned int op)
         return -EPERM;
     }
 }
+#endif /* CONFIG_X86 */
 
 static int flask_dm_op(struct domain *d)
 {
     return current_has_perm(d, SECCLASS_HVM, HVM__DM);
 }
 
-#endif /* CONFIG_X86 */
-
 static int flask_xen_version (uint32_t op)
 {
     u32 dsid = domain_sid(current->domain);
@@ -1872,8 +1871,8 @@ static struct xsm_operations flask_ops = {
     .ioport_permission = flask_ioport_permission,
     .ioport_mapping = flask_ioport_mapping,
     .pmu_op = flask_pmu_op,
-    .dm_op = flask_dm_op,
 #endif
+    .dm_op = flask_dm_op,
     .xen_version = flask_xen_version,
     .domain_resource_map = flask_domain_resource_map,
 #ifdef CONFIG_ARGO
-- 
2.7.4




* [RFC PATCH V1 05/12] hvm/dm: Introduce xendevicemodel_set_irq_level DM op
  2020-08-03 18:21 [RFC PATCH V1 00/12] IOREQ feature (+ virtio-mmio) on Arm Oleksandr Tyshchenko
                   ` (3 preceding siblings ...)
  2020-08-03 18:21 ` [RFC PATCH V1 04/12] xen/arm: Introduce arch specific bits for IOREQ/DM features Oleksandr Tyshchenko
@ 2020-08-03 18:21 ` Oleksandr Tyshchenko
  2020-08-04 23:22   ` Stefano Stabellini
  2020-08-05 16:15   ` Jan Beulich
  2020-08-03 18:21 ` [RFC PATCH V1 06/12] libxl: Introduce basic virtio-mmio support on Arm Oleksandr Tyshchenko
                   ` (7 subsequent siblings)
  12 siblings, 2 replies; 140+ messages in thread
From: Oleksandr Tyshchenko @ 2020-08-03 18:21 UTC (permalink / raw)
  To: xen-devel
  Cc: Stefano Stabellini, Julien Grall, Wei Liu, Andrew Cooper,
	Ian Jackson, George Dunlap, Oleksandr Tyshchenko, Julien Grall,
	Jan Beulich, Volodymyr Babchuk

From: Oleksandr Tyshchenko <oleksandr_tyshchenko@epam.com>

This patch adds the ability for a device emulator to notify the
other end (some entity running in the guest) using an SPI, and
implements the Arm-specific bits for it. The proposed interface
allows the emulator to set the logical level of one of a domain's
IRQ lines.

Please note, this is a split/cleanup of Julien's PoC:
"Add support for Guest IO forwarding to a device emulator"

Signed-off-by: Julien Grall <julien.grall@arm.com>
Signed-off-by: Oleksandr Tyshchenko <oleksandr_tyshchenko@epam.com>
---
 tools/libs/devicemodel/core.c                   | 18 ++++++++++++++++++
 tools/libs/devicemodel/include/xendevicemodel.h |  4 ++++
 tools/libs/devicemodel/libxendevicemodel.map    |  1 +
 xen/arch/arm/dm.c                               | 22 +++++++++++++++++++++-
 xen/common/hvm/dm.c                             |  1 +
 xen/include/public/hvm/dm_op.h                  | 15 +++++++++++++++
 6 files changed, 60 insertions(+), 1 deletion(-)

diff --git a/tools/libs/devicemodel/core.c b/tools/libs/devicemodel/core.c
index 4d40639..30bd79f 100644
--- a/tools/libs/devicemodel/core.c
+++ b/tools/libs/devicemodel/core.c
@@ -430,6 +430,24 @@ int xendevicemodel_set_isa_irq_level(
     return xendevicemodel_op(dmod, domid, 1, &op, sizeof(op));
 }
 
+int xendevicemodel_set_irq_level(
+    xendevicemodel_handle *dmod, domid_t domid, uint32_t irq,
+    unsigned int level)
+{
+    struct xen_dm_op op;
+    struct xen_dm_op_set_irq_level *data;
+
+    memset(&op, 0, sizeof(op));
+
+    op.op = XEN_DMOP_set_irq_level;
+    data = &op.u.set_irq_level;
+
+    data->irq = irq;
+    data->level = level;
+
+    return xendevicemodel_op(dmod, domid, 1, &op, sizeof(op));
+}
+
 int xendevicemodel_set_pci_link_route(
     xendevicemodel_handle *dmod, domid_t domid, uint8_t link, uint8_t irq)
 {
diff --git a/tools/libs/devicemodel/include/xendevicemodel.h b/tools/libs/devicemodel/include/xendevicemodel.h
index e877f5c..c06b3c8 100644
--- a/tools/libs/devicemodel/include/xendevicemodel.h
+++ b/tools/libs/devicemodel/include/xendevicemodel.h
@@ -209,6 +209,10 @@ int xendevicemodel_set_isa_irq_level(
     xendevicemodel_handle *dmod, domid_t domid, uint8_t irq,
     unsigned int level);
 
+int xendevicemodel_set_irq_level(
+    xendevicemodel_handle *dmod, domid_t domid, unsigned int irq,
+    unsigned int level);
+
 /**
  * This function maps a PCI INTx line to an IRQ line.
  *
diff --git a/tools/libs/devicemodel/libxendevicemodel.map b/tools/libs/devicemodel/libxendevicemodel.map
index 561c62d..a0c3012 100644
--- a/tools/libs/devicemodel/libxendevicemodel.map
+++ b/tools/libs/devicemodel/libxendevicemodel.map
@@ -32,6 +32,7 @@ VERS_1.2 {
 	global:
 		xendevicemodel_relocate_memory;
 		xendevicemodel_pin_memory_cacheattr;
+		xendevicemodel_set_irq_level;
 } VERS_1.1;
 
 VERS_1.3 {
diff --git a/xen/arch/arm/dm.c b/xen/arch/arm/dm.c
index 2437099..8431805 100644
--- a/xen/arch/arm/dm.c
+++ b/xen/arch/arm/dm.c
@@ -20,7 +20,27 @@
 int arch_dm_op(struct xen_dm_op *op, struct domain *d,
                const struct dmop_args *op_args, bool *const_op)
 {
-    return -EOPNOTSUPP;
+    int rc;
+
+    switch ( op->op )
+    {
+    case XEN_DMOP_set_irq_level:
+    {
+        const struct xen_dm_op_set_irq_level *data =
+            &op->u.set_irq_level;
+
+        /* XXX: Handle check */
+        vgic_inject_irq(d, NULL, data->irq, data->level);
+        rc = 0;
+        break;
+    }
+
+    default:
+        rc = -EOPNOTSUPP;
+        break;
+    }
+
+    return rc;
 }
 
 /*
diff --git a/xen/common/hvm/dm.c b/xen/common/hvm/dm.c
index 09e9542..e2e1250 100644
--- a/xen/common/hvm/dm.c
+++ b/xen/common/hvm/dm.c
@@ -47,6 +47,7 @@ static int dm_op(const struct dmop_args *op_args)
         [XEN_DMOP_remote_shutdown]                  = sizeof(struct xen_dm_op_remote_shutdown),
         [XEN_DMOP_relocate_memory]                  = sizeof(struct xen_dm_op_relocate_memory),
         [XEN_DMOP_pin_memory_cacheattr]             = sizeof(struct xen_dm_op_pin_memory_cacheattr),
+        [XEN_DMOP_set_irq_level]                    = sizeof(struct xen_dm_op_set_irq_level),
     };
 
     rc = rcu_lock_remote_domain_by_id(op_args->domid, &d);
diff --git a/xen/include/public/hvm/dm_op.h b/xen/include/public/hvm/dm_op.h
index fd00e9d..c45d29e 100644
--- a/xen/include/public/hvm/dm_op.h
+++ b/xen/include/public/hvm/dm_op.h
@@ -417,6 +417,20 @@ struct xen_dm_op_pin_memory_cacheattr {
     uint32_t pad;
 };
 
+/*
+ * XEN_DMOP_set_irq_level: Set the logical level of one of a domain's
+ *                         IRQ lines.
+ * XXX Handle PPIs.
+ */
+#define XEN_DMOP_set_irq_level 19
+
+struct xen_dm_op_set_irq_level {
+    uint32_t irq;
+    /* IN - Level: 0 -> deasserted, 1 -> asserted */
+    uint8_t  level;
+};
+
+
 struct xen_dm_op {
     uint32_t op;
     uint32_t pad;
@@ -430,6 +444,7 @@ struct xen_dm_op {
         struct xen_dm_op_track_dirty_vram track_dirty_vram;
         struct xen_dm_op_set_pci_intx_level set_pci_intx_level;
         struct xen_dm_op_set_isa_irq_level set_isa_irq_level;
+        struct xen_dm_op_set_irq_level set_irq_level;
         struct xen_dm_op_set_pci_link_route set_pci_link_route;
         struct xen_dm_op_modified_memory modified_memory;
         struct xen_dm_op_set_mem_type set_mem_type;
-- 
2.7.4



^ permalink raw reply related	[flat|nested] 140+ messages in thread

* [RFC PATCH V1 06/12] libxl: Introduce basic virtio-mmio support on Arm
  2020-08-03 18:21 [RFC PATCH V1 00/12] IOREQ feature (+ virtio-mmio) on Arm Oleksandr Tyshchenko
                   ` (4 preceding siblings ...)
  2020-08-03 18:21 ` [RFC PATCH V1 05/12] hvm/dm: Introduce xendevicemodel_set_irq_level DM op Oleksandr Tyshchenko
@ 2020-08-03 18:21 ` Oleksandr Tyshchenko
  2020-08-03 18:21 ` [RFC PATCH V1 07/12] A collection of tweaks to be able to run emulator in driver domain Oleksandr Tyshchenko
                   ` (6 subsequent siblings)
  12 siblings, 0 replies; 140+ messages in thread
From: Oleksandr Tyshchenko @ 2020-08-03 18:21 UTC (permalink / raw)
  To: xen-devel
  Cc: Stefano Stabellini, Julien Grall, Wei Liu, Ian Jackson,
	Oleksandr Tyshchenko, Julien Grall, Anthony PERARD

From: Oleksandr Tyshchenko <oleksandr_tyshchenko@epam.com>

This patch creates a specific device node in the guest device-tree
with an allocated MMIO range and SPI interrupt if the 'virtio'
property is present in the domain config.

Please note, this patch breaks the device passthrough use-case;
this will be fixed in one of the follow-up patches.

Please note, this is a split/cleanup of Julien's PoC:
"Add support for Guest IO forwarding to a device emulator"

Signed-off-by: Julien Grall <julien.grall@arm.com>
Signed-off-by: Oleksandr Tyshchenko <oleksandr_tyshchenko@epam.com>
---
 tools/libxl/libxl_arm.c     | 39 +++++++++++++++++++++++++++++++++++++++
 tools/libxl/libxl_types.idl |  1 +
 tools/xl/xl_parse.c         |  1 +
 3 files changed, 41 insertions(+)

diff --git a/tools/libxl/libxl_arm.c b/tools/libxl/libxl_arm.c
index 34f8a29..620b499 100644
--- a/tools/libxl/libxl_arm.c
+++ b/tools/libxl/libxl_arm.c
@@ -68,6 +68,10 @@ int libxl__arch_domain_prepare_config(libxl__gc *gc,
             nr_spis = spi + 1;
     }
 
+
+    /* XXX: Handle virtio properly */
+    nr_spis = 1;
+
     LOG(DEBUG, "Configure the domain");
 
     config->arch.nr_spis = nr_spis;
@@ -659,6 +663,37 @@ static int make_vpl011_uart_node(libxl__gc *gc, void *fdt,
     return 0;
 }
 
+#define GUEST_VIRTIO_MMIO_BASE  xen_mk_ullong(0x02000000)
+#define GUEST_VIRTIO_MMIO_SIZE  xen_mk_ullong(0x200)
+#define GUEST_VIRTIO_MMIO_SPI   33
+
+static int make_virtio_mmio_node(libxl__gc *gc, void *fdt)
+{
+    int res;
+    gic_interrupt intr;
+
+    /* XXX: Add the address to the node name */
+    res = fdt_begin_node(fdt, "virtio");
+    if (res) return res;
+
+    res = fdt_property_compat(gc, fdt, 1, "virtio,mmio");
+    if (res) return res;
+
+    res = fdt_property_regs(gc, fdt, GUEST_ROOT_ADDRESS_CELLS, GUEST_ROOT_SIZE_CELLS,
+                            1, GUEST_VIRTIO_MMIO_BASE, GUEST_VIRTIO_MMIO_SIZE);
+    if (res) return res;
+
+    set_interrupt(intr, GUEST_VIRTIO_MMIO_SPI, 0xf, DT_IRQ_TYPE_EDGE_RISING);
+    res = fdt_property_interrupts(gc, fdt, &intr, 1);
+    if (res) return res;
+
+    res = fdt_end_node(fdt);
+    if (res) return res;
+
+    return 0;
+
+}
+
 static const struct arch_info *get_arch_info(libxl__gc *gc,
                                              const struct xc_dom_image *dom)
 {
@@ -962,6 +997,9 @@ next_resize:
         if (info->tee == LIBXL_TEE_TYPE_OPTEE)
             FDT( make_optee_node(gc, fdt) );
 
+        if (libxl_defbool_val(info->arch_arm.virtio))
+            FDT( make_virtio_mmio_node(gc, fdt) );
+
         if (pfdt)
             FDT( copy_partial_fdt(gc, fdt, pfdt) );
 
@@ -1179,6 +1217,7 @@ void libxl__arch_domain_build_info_setdefault(libxl__gc *gc,
 {
     /* ACPI is disabled by default */
     libxl_defbool_setdefault(&b_info->acpi, false);
+    libxl_defbool_setdefault(&b_info->arch_arm.virtio, false);
 
     if (b_info->type != LIBXL_DOMAIN_TYPE_PV)
         return;
diff --git a/tools/libxl/libxl_types.idl b/tools/libxl/libxl_types.idl
index 9d3f05f..b054bf9 100644
--- a/tools/libxl/libxl_types.idl
+++ b/tools/libxl/libxl_types.idl
@@ -639,6 +639,7 @@ libxl_domain_build_info = Struct("domain_build_info",[
 
 
     ("arch_arm", Struct(None, [("gic_version", libxl_gic_version),
+                               ("virtio", libxl_defbool),
                                ("vuart", libxl_vuart_type),
                               ])),
     # Alternate p2m is not bound to any architecture or guest type, as it is
diff --git a/tools/xl/xl_parse.c b/tools/xl/xl_parse.c
index 61b4ef7..b8306aa 100644
--- a/tools/xl/xl_parse.c
+++ b/tools/xl/xl_parse.c
@@ -2579,6 +2579,7 @@ skip_usbdev:
     }
 
     xlu_cfg_get_defbool(config, "dm_restrict", &b_info->dm_restrict, 0);
+    xlu_cfg_get_defbool(config, "virtio", &b_info->arch_arm.virtio, 0);
 
     if (c_info->type == LIBXL_DOMAIN_TYPE_HVM) {
         if (!xlu_cfg_get_string (config, "vga", &buf, 0)) {
-- 
2.7.4
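For reference, the node generated by `make_virtio_mmio_node()` above would look roughly like the following device-tree fragment (a sketch: the patch currently names the node plain "virtio" rather than including the unit address, and the interrupt cells shown here assume the usual GIC encoding where SPI 33 is `GIC_SPI 1` with edge-rising trigger):

```dts
virtio@2000000 {
        compatible = "virtio,mmio";
        /* GUEST_VIRTIO_MMIO_BASE / GUEST_VIRTIO_MMIO_SIZE */
        reg = <0x0 0x02000000 0x0 0x200>;
        /* SPI 33 -> <GIC_SPI 1 IRQ_TYPE_EDGE_RISING> */
        interrupts = <0 1 1>;
};
```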



^ permalink raw reply related	[flat|nested] 140+ messages in thread

* [RFC PATCH V1 07/12] A collection of tweaks to be able to run emulator in driver domain
  2020-08-03 18:21 [RFC PATCH V1 00/12] IOREQ feature (+ virtio-mmio) on Arm Oleksandr Tyshchenko
                   ` (5 preceding siblings ...)
  2020-08-03 18:21 ` [RFC PATCH V1 06/12] libxl: Introduce basic virtio-mmio support on Arm Oleksandr Tyshchenko
@ 2020-08-03 18:21 ` Oleksandr Tyshchenko
  2020-08-05 16:19   ` Jan Beulich
  2020-08-03 18:21 ` [RFC PATCH V1 08/12] xen/arm: Invalidate qemu mapcache on XENMEM_decrease_reservation Oleksandr Tyshchenko
                   ` (5 subsequent siblings)
  12 siblings, 1 reply; 140+ messages in thread
From: Oleksandr Tyshchenko @ 2020-08-03 18:21 UTC (permalink / raw)
  To: xen-devel
  Cc: Stefano Stabellini, Julien Grall, Wei Liu, Andrew Cooper,
	Ian Jackson, George Dunlap, Oleksandr Tyshchenko, Jan Beulich,
	Daniel De Graaf

From: Oleksandr Tyshchenko <oleksandr_tyshchenko@epam.com>

While trying to run an emulator in a driver domain I ran into various,
mostly policy-related, issues. This patch tries to resolve all of them,
probably in a hackish way. I would like to get feedback on how to
implement them properly, as having an emulator in a driver domain is
a completely valid use-case.

Signed-off-by: Oleksandr Tyshchenko <oleksandr_tyshchenko@epam.com>
---
 xen/common/domain.c        | 15 +++++++++++++++
 xen/common/domctl.c        |  8 +++++++-
 xen/common/event_channel.c | 14 ++++++++++++--
 xen/common/memory.c        |  6 ++++++
 xen/include/xsm/dummy.h    | 16 +++++++++++++---
 5 files changed, 53 insertions(+), 6 deletions(-)

diff --git a/xen/common/domain.c b/xen/common/domain.c
index e9be05f..5c9fef2 100644
--- a/xen/common/domain.c
+++ b/xen/common/domain.c
@@ -695,6 +695,7 @@ int domain_kill(struct domain *d)
 {
     int rc = 0;
     struct vcpu *v;
+    struct domain *td;
 
     if ( d == current->domain )
         return -EINVAL;
@@ -733,6 +734,20 @@ int domain_kill(struct domain *d)
          * have to be put before we call put_domain. */
         vm_event_cleanup(d);
         put_domain(d);
+        /*
+         * The XEN_DOMCTL_set_target implementation holds a reference on
+         * the target domain, which doesn't allow it to be completely
+         * destroyed. Check whether the reference is held by someone and
+         * drop it when destroying the target domain.
+         */
+        for_each_domain ( td ) {
+            if ( td->target == d ) {
+                td->target = NULL;
+                put_domain(d);
+                break;
+            }
+        }
+
         send_global_virq(VIRQ_DOM_EXC);
         /* fallthrough */
     case DOMDYING_dead:
diff --git a/xen/common/domctl.c b/xen/common/domctl.c
index a69b3b5..079c7b0 100644
--- a/xen/common/domctl.c
+++ b/xen/common/domctl.c
@@ -871,6 +871,12 @@ long do_domctl(XEN_GUEST_HANDLE_PARAM(xen_domctl_t) u_domctl)
         if ( (d == e) || (d->target != NULL) )
         {
             put_domain(e);
+            /*
+             * Be a little bit more polite here; it looks like the
+             * emulator has just been restarted.
+             */
+            if ( d->target == e )
+                ret = 0;
             break;
         }
 
@@ -883,7 +889,7 @@ long do_domctl(XEN_GUEST_HANDLE_PARAM(xen_domctl_t) u_domctl)
             break;
         }
 
-        /* Hold reference on @e until we destroy @d. */
+        /* Hold a reference on @e until we destroy either @d or @e. */
         d->target = e;
         break;
     }
diff --git a/xen/common/event_channel.c b/xen/common/event_channel.c
index a8d182b5..2aa497a 100644
--- a/xen/common/event_channel.c
+++ b/xen/common/event_channel.c
@@ -235,7 +235,12 @@ static long evtchn_alloc_unbound(evtchn_alloc_unbound_t *alloc)
         ERROR_EXIT_DOM(port, d);
     chn = evtchn_from_port(d, port);
 
-    rc = xsm_evtchn_unbound(XSM_TARGET, d, chn, alloc->remote_dom);
+    /*
+     * XXX: XSM_TARGET is not functional for an emulator running in a driver
+     * domain. See xsm_default_action for details. Probably XSM_DM_PRIV could
+     * work, but there is a risk of breaking other users.
+     */
+    rc = xsm_evtchn_unbound(XSM_HOOK, d, chn, alloc->remote_dom);
     if ( rc )
         goto out;
 
@@ -1218,7 +1223,12 @@ int alloc_unbound_xen_event_channel(
     port = rc;
     chn = evtchn_from_port(ld, port);
 
-    rc = xsm_evtchn_unbound(XSM_TARGET, ld, chn, remote_domid);
+    /*
+     * XXX: XSM_TARGET is not functional for an emulator running in a driver
+     * domain. See xsm_default_action for details. Probably XSM_DM_PRIV could
+     * work, but there is a risk of breaking other users.
+     */
+    rc = xsm_evtchn_unbound(XSM_HOOK, ld, chn, remote_domid);
     if ( rc )
         goto out;
 
diff --git a/xen/common/memory.c b/xen/common/memory.c
index 0000477..8b306f6 100644
--- a/xen/common/memory.c
+++ b/xen/common/memory.c
@@ -1153,12 +1153,18 @@ static int acquire_resource(
         unsigned int i;
 
         /*
+         * XXX: Ugly hack for now to let an emulator running in a driver
+         * domain succeed in acquiring the resource.
+         */
+#if 0
+        /*
          * FIXME: Until foreign pages inserted into the P2M are properly
          *        reference counted, it is unsafe to allow mapping of
          *        resource pages unless the caller is the hardware domain.
          */
         if ( !is_hardware_domain(currd) )
             return -EACCES;
+#endif
 
         if ( copy_from_guest(gfn_list, xmar.frame_list, xmar.nr_frames) )
             rc = -EFAULT;
diff --git a/xen/include/xsm/dummy.h b/xen/include/xsm/dummy.h
index 317455a..c0813c0 100644
--- a/xen/include/xsm/dummy.h
+++ b/xen/include/xsm/dummy.h
@@ -139,13 +139,23 @@ static XSM_INLINE int xsm_domctl(XSM_DEFAULT_ARG struct domain *d, int cmd)
     XSM_ASSERT_ACTION(XSM_OTHER);
     switch ( cmd )
     {
+    /*
+     * XXX: An emulator running in a driver domain tries to get the number of
+     * vCPUs. We could probably avoid this change by modifying the emulator to
+     * not use domctl for that.
+     */
+    case XEN_DOMCTL_getdomaininfo:
+    /*
+     * XXX: XSM_DM_PRIV is not functional for an emulator running in a driver
+     * domain without setting a target in advance. See xsm_default_action.
+     */
+    case XEN_DOMCTL_set_target:
+        return xsm_default_action(XSM_HOOK, current->domain, d);
     case XEN_DOMCTL_ioport_mapping:
     case XEN_DOMCTL_memory_mapping:
     case XEN_DOMCTL_bind_pt_irq:
     case XEN_DOMCTL_unbind_pt_irq:
         return xsm_default_action(XSM_DM_PRIV, current->domain, d);
-    case XEN_DOMCTL_getdomaininfo:
-        return xsm_default_action(XSM_XS_PRIV, current->domain, d);
     default:
         return xsm_default_action(XSM_PRIV, current->domain, d);
     }
@@ -275,7 +285,7 @@ static XSM_INLINE int xsm_claim_pages(XSM_DEFAULT_ARG struct domain *d)
 static XSM_INLINE int xsm_evtchn_unbound(XSM_DEFAULT_ARG struct domain *d, struct evtchn *chn,
                                          domid_t id2)
 {
-    XSM_ASSERT_ACTION(XSM_TARGET);
+    XSM_ASSERT_ACTION(XSM_HOOK);
     return xsm_default_action(action, current->domain, d);
 }
 
-- 
2.7.4



^ permalink raw reply related	[flat|nested] 140+ messages in thread

* [RFC PATCH V1 08/12] xen/arm: Invalidate qemu mapcache on XENMEM_decrease_reservation
  2020-08-03 18:21 [RFC PATCH V1 00/12] IOREQ feature (+ virtio-mmio) on Arm Oleksandr Tyshchenko
                   ` (6 preceding siblings ...)
  2020-08-03 18:21 ` [RFC PATCH V1 07/12] A collection of tweaks to be able to run emulator in driver domain Oleksandr Tyshchenko
@ 2020-08-03 18:21 ` Oleksandr Tyshchenko
  2020-08-05 16:21   ` Jan Beulich
  2020-08-03 18:21 ` [RFC PATCH V1 09/12] libxl: Handle virtio-mmio irq in more correct way Oleksandr Tyshchenko
                   ` (4 subsequent siblings)
  12 siblings, 1 reply; 140+ messages in thread
From: Oleksandr Tyshchenko @ 2020-08-03 18:21 UTC (permalink / raw)
  To: xen-devel
  Cc: Stefano Stabellini, Julien Grall, Wei Liu, Andrew Cooper,
	Ian Jackson, George Dunlap, Oleksandr Tyshchenko, Jan Beulich,
	Volodymyr Babchuk

From: Oleksandr Tyshchenko <oleksandr_tyshchenko@epam.com>

Borrow x86's logic to invalidate qemu mapcache.

TODO: Move send_invalidate_req() to common code (ioreq.c?).

Signed-off-by: Oleksandr Tyshchenko <oleksandr_tyshchenko@epam.com>
---
 xen/arch/arm/ioreq.c            | 14 ++++++++++++++
 xen/arch/arm/traps.c            |  6 ++++++
 xen/common/memory.c             |  6 ++++++
 xen/include/asm-arm/domain.h    |  2 ++
 xen/include/asm-arm/hvm/ioreq.h |  2 ++
 5 files changed, 30 insertions(+)

diff --git a/xen/arch/arm/ioreq.c b/xen/arch/arm/ioreq.c
index a9cc839..8f60c41 100644
--- a/xen/arch/arm/ioreq.c
+++ b/xen/arch/arm/ioreq.c
@@ -75,6 +75,20 @@ bool handle_mmio(void)
     return true;
 }
 
+/* Ask ioemu mapcache to invalidate mappings. */
+void send_invalidate_req(void)
+{
+    ioreq_t p = {
+        .type = IOREQ_TYPE_INVALIDATE,
+        .size = 4,
+        .dir = IOREQ_WRITE,
+        .data = ~0UL, /* flush all */
+    };
+
+    if ( hvm_broadcast_ioreq(&p, false) != 0 )
+        gprintk(XENLOG_ERR, "Unsuccessful map-cache invalidate\n");
+}
+
 /*
  * Local variables:
  * mode: C
diff --git a/xen/arch/arm/traps.c b/xen/arch/arm/traps.c
index 4cdf098..ea472d1 100644
--- a/xen/arch/arm/traps.c
+++ b/xen/arch/arm/traps.c
@@ -1490,6 +1490,12 @@ static void do_trap_hypercall(struct cpu_user_regs *regs, register_t *nr,
     /* Ensure the hypercall trap instruction is re-executed. */
     if ( current->hcall_preempted )
         regs->pc -= 4;  /* re-execute 'hvc #XEN_HYPERCALL_TAG' */
+
+#ifdef CONFIG_IOREQ_SERVER
+    if ( unlikely(current->domain->arch.hvm.qemu_mapcache_invalidate) &&
+         test_and_clear_bool(current->domain->arch.hvm.qemu_mapcache_invalidate) )
+        send_invalidate_req();
+#endif
 }
 
 void arch_hypercall_tasklet_result(struct vcpu *v, long res)
diff --git a/xen/common/memory.c b/xen/common/memory.c
index 8b306f6..8d9f0a8 100644
--- a/xen/common/memory.c
+++ b/xen/common/memory.c
@@ -1652,6 +1652,12 @@ long do_memory_op(unsigned long cmd, XEN_GUEST_HANDLE_PARAM(void) arg)
         break;
     }
 
+    /* x86 already sets the flag in hvm_memory_op() */
+#if defined(CONFIG_ARM64) && defined(CONFIG_IOREQ_SERVER)
+    if ( op == XENMEM_decrease_reservation )
+        curr_d->arch.hvm.qemu_mapcache_invalidate = true;
+#endif
+
     return rc;
 }
 
diff --git a/xen/include/asm-arm/domain.h b/xen/include/asm-arm/domain.h
index e060b0a..0db8bb4 100644
--- a/xen/include/asm-arm/domain.h
+++ b/xen/include/asm-arm/domain.h
@@ -69,6 +69,8 @@ struct hvm_domain
         spinlock_t              lock;
         struct hvm_ioreq_server *server[MAX_NR_IOREQ_SERVERS];
     } ioreq_server;
+
+    bool_t qemu_mapcache_invalidate;
 };
 
 #ifdef CONFIG_ARM_64
diff --git a/xen/include/asm-arm/hvm/ioreq.h b/xen/include/asm-arm/hvm/ioreq.h
index 83a560c..392ce64 100644
--- a/xen/include/asm-arm/hvm/ioreq.h
+++ b/xen/include/asm-arm/hvm/ioreq.h
@@ -90,6 +90,8 @@ static inline void arch_hvm_ioreq_destroy(struct domain *d)
 #define IOREQ_IO_UNHANDLED   IO_UNHANDLED
 #define IOREQ_IO_RETRY       IO_RETRY
 
+void send_invalidate_req(void);
+
 #endif /* __ASM_X86_HVM_IOREQ_H__ */
 
 /*
-- 
2.7.4



^ permalink raw reply related	[flat|nested] 140+ messages in thread

* [RFC PATCH V1 09/12] libxl: Handle virtio-mmio irq in more correct way
  2020-08-03 18:21 [RFC PATCH V1 00/12] IOREQ feature (+ virtio-mmio) on Arm Oleksandr Tyshchenko
                   ` (7 preceding siblings ...)
  2020-08-03 18:21 ` [RFC PATCH V1 08/12] xen/arm: Invalidate qemu mapcache on XENMEM_decrease_reservation Oleksandr Tyshchenko
@ 2020-08-03 18:21 ` Oleksandr Tyshchenko
  2020-08-04 23:22   ` Stefano Stabellini
  2020-08-03 18:21 ` [RFC PATCH V1 10/12] libxl: Add support for virtio-disk configuration Oleksandr Tyshchenko
                   ` (3 subsequent siblings)
  12 siblings, 1 reply; 140+ messages in thread
From: Oleksandr Tyshchenko @ 2020-08-03 18:21 UTC (permalink / raw)
  To: xen-devel
  Cc: Stefano Stabellini, Julien Grall, Wei Liu, Ian Jackson,
	Oleksandr Tyshchenko, Anthony PERARD

From: Oleksandr Tyshchenko <oleksandr_tyshchenko@epam.com>

This patch makes it possible to use device passthrough again.

Signed-off-by: Oleksandr Tyshchenko <oleksandr_tyshchenko@epam.com>
---
 tools/libxl/libxl_arm.c | 33 +++++++++++++++++++++++----------
 1 file changed, 23 insertions(+), 10 deletions(-)

diff --git a/tools/libxl/libxl_arm.c b/tools/libxl/libxl_arm.c
index 620b499..4f748e3 100644
--- a/tools/libxl/libxl_arm.c
+++ b/tools/libxl/libxl_arm.c
@@ -9,6 +9,10 @@
 #include <assert.h>
 #include <xen/device_tree_defs.h>
 
+#define GUEST_VIRTIO_MMIO_BASE  xen_mk_ullong(0x02000000)
+#define GUEST_VIRTIO_MMIO_SIZE  xen_mk_ullong(0x200)
+#define GUEST_VIRTIO_MMIO_SPI   33
+
 static const char *gicv_to_string(libxl_gic_version gic_version)
 {
     switch (gic_version) {
@@ -27,8 +31,8 @@ int libxl__arch_domain_prepare_config(libxl__gc *gc,
 {
     uint32_t nr_spis = 0;
     unsigned int i;
-    uint32_t vuart_irq;
-    bool vuart_enabled = false;
+    uint32_t vuart_irq, virtio_irq;
+    bool vuart_enabled = false, virtio_enabled = false;
 
     /*
      * If pl011 vuart is enabled then increment the nr_spis to allow allocation
@@ -40,6 +44,17 @@ int libxl__arch_domain_prepare_config(libxl__gc *gc,
         vuart_enabled = true;
     }
 
+    /*
+     * XXX: Handle virtio properly.
+     * A proper solution would be for the toolstack to allocate the interrupts
+     * used by each virtio backend and let the backend know which one is used.
+     */
+    if (libxl_defbool_val(d_config->b_info.arch_arm.virtio)) {
+        nr_spis += (GUEST_VIRTIO_MMIO_SPI - 32) + 1;
+        virtio_irq = GUEST_VIRTIO_MMIO_SPI;
+        virtio_enabled = true;
+    }
+
     for (i = 0; i < d_config->b_info.num_irqs; i++) {
         uint32_t irq = d_config->b_info.irqs[i];
         uint32_t spi;
@@ -59,6 +74,12 @@ int libxl__arch_domain_prepare_config(libxl__gc *gc,
             return ERROR_FAIL;
         }
 
+        /* The same check as for vpl011 */
+        if (virtio_enabled && irq == virtio_irq) {
+            LOG(ERROR, "Physical IRQ %u conflicting with virtio SPI\n", irq);
+            return ERROR_FAIL;
+        }
+
         if (irq < 32)
             continue;
 
@@ -68,10 +89,6 @@ int libxl__arch_domain_prepare_config(libxl__gc *gc,
             nr_spis = spi + 1;
     }
 
-
-    /* XXX: Handle properly virtio */
-    nr_spis = 1;
-
     LOG(DEBUG, "Configure the domain");
 
     config->arch.nr_spis = nr_spis;
@@ -663,10 +680,6 @@ static int make_vpl011_uart_node(libxl__gc *gc, void *fdt,
     return 0;
 }
 
-#define GUEST_VIRTIO_MMIO_BASE  xen_mk_ullong(0x02000000)
-#define GUEST_VIRTIO_MMIO_SIZE  xen_mk_ullong(0x200)
-#define GUEST_VIRTIO_MMIO_SPI   33
-
 static int make_virtio_mmio_node(libxl__gc *gc, void *fdt)
 {
     int res;
-- 
2.7.4



^ permalink raw reply related	[flat|nested] 140+ messages in thread

* [RFC PATCH V1 10/12] libxl: Add support for virtio-disk configuration
  2020-08-03 18:21 [RFC PATCH V1 00/12] IOREQ feature (+ virtio-mmio) on Arm Oleksandr Tyshchenko
                   ` (8 preceding siblings ...)
  2020-08-03 18:21 ` [RFC PATCH V1 09/12] libxl: Handle virtio-mmio irq in more correct way Oleksandr Tyshchenko
@ 2020-08-03 18:21 ` Oleksandr Tyshchenko
  2020-08-04 23:23   ` Stefano Stabellini
  2020-08-03 18:21 ` [RFC PATCH V1 11/12] libxl: Insert "dma-coherent" property into virtio-mmio device node Oleksandr Tyshchenko
                   ` (2 subsequent siblings)
  12 siblings, 1 reply; 140+ messages in thread
From: Oleksandr Tyshchenko @ 2020-08-03 18:21 UTC (permalink / raw)
  To: xen-devel
  Cc: Stefano Stabellini, Julien Grall, Wei Liu, Ian Jackson,
	Oleksandr Tyshchenko, Anthony PERARD

From: Oleksandr Tyshchenko <oleksandr_tyshchenko@epam.com>

This patch adds basic support for configuring and assisting a virtio-disk
backend (emulator) which is intended to run outside of Qemu and could be
run in any domain.

Xenstore was chosen as the communication interface so that an emulator
running in a non-toolstack domain can get its configuration either by
reading Xenstore directly or by receiving command line parameters (an
updated 'xl devd' running in the same domain would read Xenstore
beforehand and call the backend executable with the required arguments).

An example of domain configuration (two disks are assigned to the guest,
the latter is in readonly mode):

vdisk = [ 'backend=DomD, disks=rw:/dev/mmcblk0p3;ro:/dev/mmcblk1p3' ]

Where per-disk Xenstore entries are:
- filename and readonly flag (configured via "vdisk" property)
- base and irq (allocated dynamically)

Besides handling the 'visible' params described in the configuration
file, this patch also allocates virtio-mmio specific ones for each
device and writes them into Xenstore. The virtio-mmio params (irq and
base) are unique per guest domain; they are allocated at domain creation
time and passed through to the emulator. Each VirtIO device has at
least one pair of these params.

TODO:
1. An extra "virtio" property could be removed.
2. Update documentation.

Signed-off-by: Oleksandr Tyshchenko <oleksandr_tyshchenko@epam.com>
---
 tools/libxl/Makefile                 |   4 +-
 tools/libxl/libxl_arm.c              |  63 +++++++++++++++----
 tools/libxl/libxl_create.c           |   1 +
 tools/libxl/libxl_internal.h         |   1 +
 tools/libxl/libxl_types.idl          |  15 +++++
 tools/libxl/libxl_types_internal.idl |   1 +
 tools/libxl/libxl_virtio_disk.c      | 109 +++++++++++++++++++++++++++++++++
 tools/xl/Makefile                    |   2 +-
 tools/xl/xl.h                        |   3 +
 tools/xl/xl_cmdtable.c               |  15 +++++
 tools/xl/xl_parse.c                  | 115 +++++++++++++++++++++++++++++++++++
 tools/xl/xl_virtio_disk.c            |  46 ++++++++++++++
 12 files changed, 360 insertions(+), 15 deletions(-)
 create mode 100644 tools/libxl/libxl_virtio_disk.c
 create mode 100644 tools/xl/xl_virtio_disk.c

diff --git a/tools/libxl/Makefile b/tools/libxl/Makefile
index 38cd43a..df94b13 100644
--- a/tools/libxl/Makefile
+++ b/tools/libxl/Makefile
@@ -141,7 +141,9 @@ LIBXL_OBJS = flexarray.o libxl.o libxl_create.o libxl_dm.o libxl_pci.o \
 			libxl_vtpm.o libxl_nic.o libxl_disk.o libxl_console.o \
 			libxl_cpupool.o libxl_mem.o libxl_sched.o libxl_tmem.o \
 			libxl_9pfs.o libxl_domain.o libxl_vdispl.o \
-			libxl_pvcalls.o libxl_vsnd.o libxl_vkb.o $(LIBXL_OBJS-y)
+			libxl_pvcalls.o libxl_vsnd.o libxl_vkb.o \
+			libxl_virtio_disk.o $(LIBXL_OBJS-y)
+
 LIBXL_OBJS += libxl_genid.o
 LIBXL_OBJS += _libxl_types.o libxl_flask.o _libxl_types_internal.o
 
diff --git a/tools/libxl/libxl_arm.c b/tools/libxl/libxl_arm.c
index 4f748e3..469a8b0 100644
--- a/tools/libxl/libxl_arm.c
+++ b/tools/libxl/libxl_arm.c
@@ -13,6 +13,12 @@
 #define GUEST_VIRTIO_MMIO_SIZE  xen_mk_ullong(0x200)
 #define GUEST_VIRTIO_MMIO_SPI   33
 
+#ifndef container_of
+#define container_of(ptr, type, member) ({			\
+        typeof( ((type *)0)->member ) *__mptr = (ptr);	\
+        (type *)( (char *)__mptr - offsetof(type,member) );})
+#endif
+
 static const char *gicv_to_string(libxl_gic_version gic_version)
 {
     switch (gic_version) {
@@ -44,14 +50,32 @@ int libxl__arch_domain_prepare_config(libxl__gc *gc,
         vuart_enabled = true;
     }
 
-    /*
-     * XXX: Handle virtio properly.
-     * A proper solution would be for the toolstack to allocate the
-     * interrupts used by each virtio backend and let the backend know
-     */
     if (libxl_defbool_val(d_config->b_info.arch_arm.virtio)) {
-        nr_spis += (GUEST_VIRTIO_MMIO_SPI - 32) + 1;
+        uint64_t virtio_base;
+        libxl_device_virtio_disk *virtio_disk;
+
+        virtio_base = GUEST_VIRTIO_MMIO_BASE;
         virtio_irq = GUEST_VIRTIO_MMIO_SPI;
+
+        if (!d_config->num_virtio_disks) {
+            LOG(ERROR, "Virtio is enabled, but no Virtio devices present\n");
+            return ERROR_FAIL;
+        }
+        virtio_disk = &d_config->virtio_disks[0];
+
+        for (i = 0; i < virtio_disk->num_disks; i++) {
+            virtio_disk->disks[i].base = virtio_base;
+            virtio_disk->disks[i].irq = virtio_irq;
+
+            LOG(DEBUG, "Allocate Virtio MMIO params: IRQ %u BASE 0x%"PRIx64,
+                virtio_irq, virtio_base);
+
+            virtio_irq ++;
+            virtio_base += GUEST_VIRTIO_MMIO_SIZE;
+        }
+        virtio_irq --;
+
+        nr_spis += (virtio_irq - 32) + 1;
         virtio_enabled = true;
     }
 
@@ -75,8 +99,9 @@ int libxl__arch_domain_prepare_config(libxl__gc *gc,
         }
 
         /* The same check as for vpl011 */
-        if (virtio_enabled && irq == virtio_irq) {
-            LOG(ERROR, "Physical IRQ %u conflicting with virtio SPI\n", irq);
+        if (virtio_enabled &&
+           (irq >= GUEST_VIRTIO_MMIO_SPI && irq <= virtio_irq)) {
+            LOG(ERROR, "Physical IRQ %u conflicting with Virtio IRQ range\n", irq);
             return ERROR_FAIL;
         }
 
@@ -680,7 +705,8 @@ static int make_vpl011_uart_node(libxl__gc *gc, void *fdt,
     return 0;
 }
 
-static int make_virtio_mmio_node(libxl__gc *gc, void *fdt)
+static int make_virtio_mmio_node(libxl__gc *gc, void *fdt,
+                                 uint64_t base, uint32_t irq)
 {
     int res;
     gic_interrupt intr;
@@ -693,10 +719,10 @@ static int make_virtio_mmio_node(libxl__gc *gc, void *fdt)
     if (res) return res;
 
     res = fdt_property_regs(gc, fdt, GUEST_ROOT_ADDRESS_CELLS, GUEST_ROOT_SIZE_CELLS,
-                            1, GUEST_VIRTIO_MMIO_BASE, GUEST_VIRTIO_MMIO_SIZE);
+                            1, base, GUEST_VIRTIO_MMIO_SIZE);
     if (res) return res;
 
-    set_interrupt(intr, GUEST_VIRTIO_MMIO_SPI, 0xf, DT_IRQ_TYPE_EDGE_RISING);
+    set_interrupt(intr, irq, 0xf, DT_IRQ_TYPE_EDGE_RISING);
     res = fdt_property_interrupts(gc, fdt, &intr, 1);
     if (res) return res;
 
@@ -1010,8 +1036,19 @@ next_resize:
         if (info->tee == LIBXL_TEE_TYPE_OPTEE)
             FDT( make_optee_node(gc, fdt) );
 
-        if (libxl_defbool_val(info->arch_arm.virtio))
-            FDT( make_virtio_mmio_node(gc, fdt) );
+        if (libxl_defbool_val(info->arch_arm.virtio)) {
+            libxl_domain_config *d_config =
+                container_of(info, libxl_domain_config, b_info);
+            libxl_device_virtio_disk *virtio_disk = &d_config->virtio_disks[0];
+            unsigned int i;
+
+            for (i = 0; i < virtio_disk->num_disks; i++) {
+                uint64_t base = virtio_disk->disks[i].base;
+                uint32_t irq = virtio_disk->disks[i].irq;
+
+                FDT( make_virtio_mmio_node(gc, fdt, base, irq) );
+            }
+        }
 
         if (pfdt)
             FDT( copy_partial_fdt(gc, fdt, pfdt) );
diff --git a/tools/libxl/libxl_create.c b/tools/libxl/libxl_create.c
index 2814818..8a0651e 100644
--- a/tools/libxl/libxl_create.c
+++ b/tools/libxl/libxl_create.c
@@ -1817,6 +1817,7 @@ const libxl__device_type *device_type_tbl[] = {
     &libxl__dtdev_devtype,
     &libxl__vdispl_devtype,
     &libxl__vsnd_devtype,
+    &libxl__virtio_disk_devtype,
     NULL
 };
 
diff --git a/tools/libxl/libxl_internal.h b/tools/libxl/libxl_internal.h
index 94a2317..4e2024d 100644
--- a/tools/libxl/libxl_internal.h
+++ b/tools/libxl/libxl_internal.h
@@ -3988,6 +3988,7 @@ extern const libxl__device_type libxl__vdispl_devtype;
 extern const libxl__device_type libxl__p9_devtype;
 extern const libxl__device_type libxl__pvcallsif_devtype;
 extern const libxl__device_type libxl__vsnd_devtype;
+extern const libxl__device_type libxl__virtio_disk_devtype;
 
 extern const libxl__device_type *device_type_tbl[];
 
diff --git a/tools/libxl/libxl_types.idl b/tools/libxl/libxl_types.idl
index b054bf9..5f8a3ff 100644
--- a/tools/libxl/libxl_types.idl
+++ b/tools/libxl/libxl_types.idl
@@ -935,6 +935,20 @@ libxl_device_vsnd = Struct("device_vsnd", [
     ("pcms", Array(libxl_vsnd_pcm, "num_vsnd_pcms"))
     ])
 
+libxl_virtio_disk_param = Struct("virtio_disk_param", [
+    ("filename", string),
+    ("readonly", bool),
+    ("irq", uint32),
+    ("base", uint64),
+    ])
+
+libxl_device_virtio_disk = Struct("device_virtio_disk", [
+    ("backend_domid", libxl_domid),
+    ("backend_domname", string),
+    ("devid", libxl_devid),
+    ("disks", Array(libxl_virtio_disk_param, "num_disks")),
+    ])
+
 libxl_domain_config = Struct("domain_config", [
     ("c_info", libxl_domain_create_info),
     ("b_info", libxl_domain_build_info),
@@ -951,6 +965,7 @@ libxl_domain_config = Struct("domain_config", [
     ("pvcallsifs", Array(libxl_device_pvcallsif, "num_pvcallsifs")),
     ("vdispls", Array(libxl_device_vdispl, "num_vdispls")),
     ("vsnds", Array(libxl_device_vsnd, "num_vsnds")),
+    ("virtio_disks", Array(libxl_device_virtio_disk, "num_virtio_disks")),
     # a channel manifests as a console with a name,
     # see docs/misc/channels.txt
     ("channels", Array(libxl_device_channel, "num_channels")),
diff --git a/tools/libxl/libxl_types_internal.idl b/tools/libxl/libxl_types_internal.idl
index 3593e21..8f71980 100644
--- a/tools/libxl/libxl_types_internal.idl
+++ b/tools/libxl/libxl_types_internal.idl
@@ -32,6 +32,7 @@ libxl__device_kind = Enumeration("device_kind", [
     (14, "PVCALLS"),
     (15, "VSND"),
     (16, "VINPUT"),
+    (17, "VIRTIO_DISK"),
     ])
 
 libxl__console_backend = Enumeration("console_backend", [
diff --git a/tools/libxl/libxl_virtio_disk.c b/tools/libxl/libxl_virtio_disk.c
new file mode 100644
index 0000000..25e7f1a
--- /dev/null
+++ b/tools/libxl/libxl_virtio_disk.c
@@ -0,0 +1,109 @@
+/*
+ * Copyright (C) 2020 EPAM Systems Inc.
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU Lesser General Public License as published
+ * by the Free Software Foundation; version 2.1 only. with the special
+ * exception on linking described in file LICENSE.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU Lesser General Public License for more details.
+ */
+
+#include "libxl_internal.h"
+
+static int libxl__device_virtio_disk_setdefault(libxl__gc *gc, uint32_t domid,
+                                                libxl_device_virtio_disk *virtio_disk,
+                                                bool hotplug)
+{
+    return libxl__resolve_domid(gc, virtio_disk->backend_domname,
+                                &virtio_disk->backend_domid);
+}
+
+static int libxl__virtio_disk_from_xenstore(libxl__gc *gc, const char *libxl_path,
+                                            libxl_devid devid,
+                                            libxl_device_virtio_disk *virtio_disk)
+{
+    const char *be_path;
+    int rc;
+
+    virtio_disk->devid = devid;
+    rc = libxl__xs_read_mandatory(gc, XBT_NULL,
+                                  GCSPRINTF("%s/backend", libxl_path),
+                                  &be_path);
+    if (rc) return rc;
+
+    rc = libxl__backendpath_parse_domid(gc, be_path, &virtio_disk->backend_domid);
+    if (rc) return rc;
+
+    return 0;
+}
+
+static void libxl__update_config_virtio_disk(libxl__gc *gc,
+                                             libxl_device_virtio_disk *dst,
+                                             libxl_device_virtio_disk *src)
+{
+    dst->devid = src->devid;
+}
+
+static int libxl_device_virtio_disk_compare(libxl_device_virtio_disk *d1,
+                                            libxl_device_virtio_disk *d2)
+{
+    return COMPARE_DEVID(d1, d2);
+}
+
+static void libxl__device_virtio_disk_add(libxl__egc *egc, uint32_t domid,
+                                          libxl_device_virtio_disk *virtio_disk,
+                                          libxl__ao_device *aodev)
+{
+    libxl__device_add_async(egc, domid, &libxl__virtio_disk_devtype, virtio_disk, aodev);
+}
+
+static int libxl__set_xenstore_virtio_disk(libxl__gc *gc, uint32_t domid,
+                                           libxl_device_virtio_disk *virtio_disk,
+                                           flexarray_t *back, flexarray_t *front,
+                                           flexarray_t *ro_front)
+{
+    int rc;
+    unsigned int i;
+
+    for (i = 0; i < virtio_disk->num_disks; i++) {
+        rc = flexarray_append_pair(ro_front, GCSPRINTF("%d/filename", i),
+                                   GCSPRINTF("%s", virtio_disk->disks[i].filename));
+        if (rc) return rc;
+
+        rc = flexarray_append_pair(ro_front, GCSPRINTF("%d/readonly", i),
+                                   GCSPRINTF("%d", virtio_disk->disks[i].readonly));
+        if (rc) return rc;
+
+        rc = flexarray_append_pair(ro_front, GCSPRINTF("%d/base", i),
+                                   GCSPRINTF("%lu", virtio_disk->disks[i].base));
+        if (rc) return rc;
+
+        rc = flexarray_append_pair(ro_front, GCSPRINTF("%d/irq", i),
+                                   GCSPRINTF("%u", virtio_disk->disks[i].irq));
+        if (rc) return rc;
+    }
+
+    return 0;
+}
+
+static LIBXL_DEFINE_UPDATE_DEVID(virtio_disk)
+static LIBXL_DEFINE_DEVICE_FROM_TYPE(virtio_disk)
+static LIBXL_DEFINE_DEVICES_ADD(virtio_disk)
+
+DEFINE_DEVICE_TYPE_STRUCT(virtio_disk, VIRTIO_DISK,
+    .update_config = (device_update_config_fn_t) libxl__update_config_virtio_disk,
+    .from_xenstore = (device_from_xenstore_fn_t) libxl__virtio_disk_from_xenstore,
+    .set_xenstore_config = (device_set_xenstore_config_fn_t) libxl__set_xenstore_virtio_disk
+);
+
+/*
+ * Local variables:
+ * mode: C
+ * c-basic-offset: 4
+ * indent-tabs-mode: nil
+ * End:
+ */
diff --git a/tools/xl/Makefile b/tools/xl/Makefile
index af4912e..38e4701 100644
--- a/tools/xl/Makefile
+++ b/tools/xl/Makefile
@@ -22,7 +22,7 @@ XL_OBJS += xl_vtpm.o xl_block.o xl_nic.o xl_usb.o
 XL_OBJS += xl_sched.o xl_pci.o xl_vcpu.o xl_cdrom.o xl_mem.o
 XL_OBJS += xl_info.o xl_console.o xl_misc.o
 XL_OBJS += xl_vmcontrol.o xl_saverestore.o xl_migrate.o
-XL_OBJS += xl_vdispl.o xl_vsnd.o xl_vkb.o
+XL_OBJS += xl_vdispl.o xl_vsnd.o xl_vkb.o xl_virtio_disk.o
 
 $(XL_OBJS): CFLAGS += $(CFLAGS_libxentoollog)
 $(XL_OBJS): CFLAGS += $(CFLAGS_XL)
diff --git a/tools/xl/xl.h b/tools/xl/xl.h
index 06569c6..3d26f19 100644
--- a/tools/xl/xl.h
+++ b/tools/xl/xl.h
@@ -178,6 +178,9 @@ int main_vsnddetach(int argc, char **argv);
 int main_vkbattach(int argc, char **argv);
 int main_vkblist(int argc, char **argv);
 int main_vkbdetach(int argc, char **argv);
+int main_virtio_diskattach(int argc, char **argv);
+int main_virtio_disklist(int argc, char **argv);
+int main_virtio_diskdetach(int argc, char **argv);
 int main_usbctrl_attach(int argc, char **argv);
 int main_usbctrl_detach(int argc, char **argv);
 int main_usbdev_attach(int argc, char **argv);
diff --git a/tools/xl/xl_cmdtable.c b/tools/xl/xl_cmdtable.c
index 0833539..2bdf0b7 100644
--- a/tools/xl/xl_cmdtable.c
+++ b/tools/xl/xl_cmdtable.c
@@ -434,6 +434,21 @@ struct cmd_spec cmd_table[] = {
       "Destroy a domain's virtual sound device",
       "<Domain> <DevId>",
     },
+    { "virtio-disk-attach",
+      &main_virtio_diskattach, 1, 1,
+      "Create a new virtio block device",
+      " TBD\n"
+    },
+    { "virtio-disk-list",
+      &main_virtio_disklist, 0, 0,
+      "List virtio block devices for a domain",
+      "<Domain(s)>",
+    },
+    { "virtio-disk-detach",
+      &main_virtio_diskdetach, 0, 1,
+      "Destroy a domain's virtio block device",
+      "<Domain> <DevId>",
+    },
     { "uptime",
       &main_uptime, 0, 0,
       "Print uptime for all/some domains",
diff --git a/tools/xl/xl_parse.c b/tools/xl/xl_parse.c
index b8306aa..fd72109 100644
--- a/tools/xl/xl_parse.c
+++ b/tools/xl/xl_parse.c
@@ -1202,6 +1202,120 @@ out:
     if (rc) exit(EXIT_FAILURE);
 }
 
+#define MAX_VIRTIO_DISKS 4
+
+static int parse_virtio_disk_config(libxl_device_virtio_disk *virtio_disk, char *token)
+{
+    char *oparg;
+    libxl_string_list disks = NULL;
+    int i, rc;
+
+    if (MATCH_OPTION("backend", token, oparg)) {
+        virtio_disk->backend_domname = strdup(oparg);
+    } else if (MATCH_OPTION("disks", token, oparg)) {
+        split_string_into_string_list(oparg, ";", &disks);
+
+        virtio_disk->num_disks = libxl_string_list_length(&disks);
+        if (virtio_disk->num_disks > MAX_VIRTIO_DISKS) {
+            fprintf(stderr, "vdisk: currently only %d disks are supported",
+                    MAX_VIRTIO_DISKS);
+            return 1;
+        }
+        virtio_disk->disks = xcalloc(virtio_disk->num_disks,
+                                     sizeof(*virtio_disk->disks));
+
+        for(i = 0; i < virtio_disk->num_disks; i++) {
+            char *disk_opt;
+
+            rc = split_string_into_pair(disks[i], ":", &disk_opt,
+                                        &virtio_disk->disks[i].filename);
+            if (rc) {
+                fprintf(stderr, "vdisk: failed to split \"%s\" into pair\n",
+                        disks[i]);
+                goto out;
+            }
+
+            if (!strcmp(disk_opt, "ro"))
+                virtio_disk->disks[i].readonly = 1;
+            else if (!strcmp(disk_opt, "rw"))
+                virtio_disk->disks[i].readonly = 0;
+            else {
+                fprintf(stderr, "vdisk: failed to parse \"%s\" disk option\n",
+                        disk_opt);
+                rc = 1;
+            }
+            free(disk_opt);
+
+            if (rc) goto out;
+        }
+    } else {
+        fprintf(stderr, "Unknown string \"%s\" in vdisk spec\n", token);
+        rc = 1; goto out;
+    }
+
+    rc = 0;
+
+out:
+    libxl_string_list_dispose(&disks);
+    return rc;
+}
+
+static void parse_virtio_disk_list(const XLU_Config *config,
+                            libxl_domain_config *d_config)
+{
+    XLU_ConfigList *virtio_disks;
+    const char *item;
+    char *buf = NULL;
+    int rc;
+
+    if (!xlu_cfg_get_list (config, "vdisk", &virtio_disks, 0, 0)) {
+        libxl_domain_build_info *b_info = &d_config->b_info;
+        int entry = 0;
+
+        /* XXX Remove an extra property */
+        libxl_defbool_setdefault(&b_info->arch_arm.virtio, false);
+        if (!libxl_defbool_val(b_info->arch_arm.virtio)) {
+            fprintf(stderr, "Virtio device requires Virtio property to be set\n");
+            exit(EXIT_FAILURE);
+        }
+
+        while ((item = xlu_cfg_get_listitem(virtio_disks, entry)) != NULL) {
+            libxl_device_virtio_disk *virtio_disk;
+            char *p;
+
+            virtio_disk = ARRAY_EXTEND_INIT(d_config->virtio_disks,
+                                            d_config->num_virtio_disks,
+                                            libxl_device_virtio_disk_init);
+
+            buf = strdup(item);
+
+            p = strtok (buf, ",");
+            while (p != NULL)
+            {
+                while (*p == ' ') p++;
+
+                rc = parse_virtio_disk_config(virtio_disk, p);
+                if (rc) goto out;
+
+                p = strtok (NULL, ",");
+            }
+
+            entry++;
+
+            if (virtio_disk->num_disks == 0) {
+                fprintf(stderr, "At least one virtio disk should be specified\n");
+                rc = 1; goto out;
+            }
+        }
+    }
+
+    rc = 0;
+
+out:
+    free(buf);
+    if (rc) exit(EXIT_FAILURE);
+}
+
 void parse_config_data(const char *config_source,
                        const char *config_data,
                        int config_len,
@@ -2732,6 +2846,7 @@ skip_usbdev:
     }
 
     parse_vkb_list(config, d_config);
+    parse_virtio_disk_list(config, d_config);
 
     xlu_cfg_get_defbool(config, "xend_suspend_evtchn_compat",
                         &c_info->xend_suspend_evtchn_compat, 0);
diff --git a/tools/xl/xl_virtio_disk.c b/tools/xl/xl_virtio_disk.c
new file mode 100644
index 0000000..808a7da
--- /dev/null
+++ b/tools/xl/xl_virtio_disk.c
@@ -0,0 +1,46 @@
+/*
+ * Copyright (C) 2020 EPAM Systems Inc.
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU Lesser General Public License as published
+ * by the Free Software Foundation; version 2.1 only. with the special
+ * exception on linking described in file LICENSE.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU Lesser General Public License for more details.
+ */
+
+#include <stdlib.h>
+
+#include <libxl.h>
+#include <libxl_utils.h>
+#include <libxlutil.h>
+
+#include "xl.h"
+#include "xl_utils.h"
+#include "xl_parse.h"
+
+int main_virtio_diskattach(int argc, char **argv)
+{
+    return 0;
+}
+
+int main_virtio_disklist(int argc, char **argv)
+{
+   return 0;
+}
+
+int main_virtio_diskdetach(int argc, char **argv)
+{
+    return 0;
+}
+
+/*
+ * Local variables:
+ * mode: C
+ * c-basic-offset: 4
+ * indent-tabs-mode: nil
+ * End:
+ */
-- 
2.7.4



^ permalink raw reply related	[flat|nested] 140+ messages in thread

* [RFC PATCH V1 11/12] libxl: Insert "dma-coherent" property into virtio-mmio device node
  2020-08-03 18:21 [RFC PATCH V1 00/12] IOREQ feature (+ virtio-mmio) on Arm Oleksandr Tyshchenko
                   ` (9 preceding siblings ...)
  2020-08-03 18:21 ` [RFC PATCH V1 10/12] libxl: Add support for virtio-disk configuration Oleksandr Tyshchenko
@ 2020-08-03 18:21 ` Oleksandr Tyshchenko
  2020-08-04 23:23   ` Stefano Stabellini
  2020-08-03 18:21 ` [RFC PATCH V1 12/12] libxl: Fix duplicate memory node in DT Oleksandr Tyshchenko
  2020-08-15 17:24 ` [RFC PATCH V1 00/12] IOREQ feature (+ virtio-mmio) on Arm Julien Grall
  12 siblings, 1 reply; 140+ messages in thread
From: Oleksandr Tyshchenko @ 2020-08-03 18:21 UTC (permalink / raw)
  To: xen-devel
  Cc: Stefano Stabellini, Julien Grall, Wei Liu, Ian Jackson,
	Oleksandr Tyshchenko, Anthony PERARD

From: Oleksandr Tyshchenko <oleksandr_tyshchenko@epam.com>

Without the "dma-coherent" property present in the virtio-mmio device
node, the guest assumes the device is non-coherent and makes
non-cacheable accesses to the vring when the DMA API is used for vring
operations. But the virtio-mmio device, which runs on the host side,
makes cacheable accesses to the vring. This may result in a loss of
coherency between the guest and host.

With this patch we can avoid modifying the guest at all; otherwise we
would need to force the VirtIO framework not to use the DMA API for
vring operations.

Signed-off-by: Oleksandr Tyshchenko <oleksandr_tyshchenko@epam.com>
---
 tools/libxl/libxl_arm.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/tools/libxl/libxl_arm.c b/tools/libxl/libxl_arm.c
index 469a8b0..a68fb14 100644
--- a/tools/libxl/libxl_arm.c
+++ b/tools/libxl/libxl_arm.c
@@ -726,6 +726,9 @@ static int make_virtio_mmio_node(libxl__gc *gc, void *fdt,
     res = fdt_property_interrupts(gc, fdt, &intr, 1);
     if (res) return res;
 
+    res = fdt_property(fdt, "dma-coherent", NULL, 0);
+    if (res) return res;
+
     res = fdt_end_node(fdt);
     if (res) return res;
 
-- 
2.7.4




* [RFC PATCH V1 12/12] libxl: Fix duplicate memory node in DT
  2020-08-03 18:21 [RFC PATCH V1 00/12] IOREQ feature (+ virtio-mmio) on Arm Oleksandr Tyshchenko
                   ` (10 preceding siblings ...)
  2020-08-03 18:21 ` [RFC PATCH V1 11/12] libxl: Insert "dma-coherent" property into virtio-mmio device node Oleksandr Tyshchenko
@ 2020-08-03 18:21 ` Oleksandr Tyshchenko
  2020-08-15 17:24 ` [RFC PATCH V1 00/12] IOREQ feature (+ virtio-mmio) on Arm Julien Grall
  12 siblings, 0 replies; 140+ messages in thread
From: Oleksandr Tyshchenko @ 2020-08-03 18:21 UTC (permalink / raw)
  To: xen-devel
  Cc: Stefano Stabellini, Julien Grall, Wei Liu, Ian Jackson,
	Oleksandr Tyshchenko, Anthony PERARD

From: Oleksandr Tyshchenko <oleksandr_tyshchenko@epam.com>

When two or more VirtIO devices are passed to a DomU, the
following message is observed:
OF: Duplicate name in base, renamed to "virtio#1"

Signed-off-by: Oleksandr Tyshchenko <oleksandr_tyshchenko@epam.com>
---
 tools/libxl/libxl_arm.c | 6 ++++--
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/tools/libxl/libxl_arm.c b/tools/libxl/libxl_arm.c
index a68fb14..9671a44 100644
--- a/tools/libxl/libxl_arm.c
+++ b/tools/libxl/libxl_arm.c
@@ -710,9 +710,11 @@ static int make_virtio_mmio_node(libxl__gc *gc, void *fdt,
 {
     int res;
     gic_interrupt intr;
+    /* Placeholder for virtio@ + a 64-bit number + \0 */
+    char buf[24];
 
-    /* XXX: Add address in the node name */
-    res = fdt_begin_node(fdt, "virtio");
+    snprintf(buf, sizeof(buf), "virtio@%"PRIx64, base);
+    res = fdt_begin_node(fdt, buf);
     if (res) return res;
 
     res = fdt_property_compat(gc, fdt, 1, "virtio,mmio");
-- 
2.7.4




* RE: [RFC PATCH V1 01/12] hvm/ioreq: Make x86's IOREQ feature common
  2020-08-03 18:21 ` [RFC PATCH V1 01/12] hvm/ioreq: Make x86's IOREQ feature common Oleksandr Tyshchenko
@ 2020-08-04  7:45   ` Paul Durrant
  2020-08-04 11:10     ` Oleksandr
  2020-08-05 13:30   ` Julien Grall
                     ` (2 subsequent siblings)
  3 siblings, 1 reply; 140+ messages in thread
From: Paul Durrant @ 2020-08-04  7:45 UTC (permalink / raw)
  To: 'Oleksandr Tyshchenko', xen-devel
  Cc: 'Kevin Tian', 'Stefano Stabellini',
	'Julien Grall', 'Jun Nakajima', 'Wei Liu',
	'Andrew Cooper', 'Ian Jackson',
	'George Dunlap', 'Tim Deegan',
	'Oleksandr Tyshchenko', 'Julien Grall',
	'Jan Beulich', 'Roger Pau Monné'

> -----Original Message-----
> From: Oleksandr Tyshchenko <olekstysh@gmail.com>
> Sent: 03 August 2020 19:21
> To: xen-devel@lists.xenproject.org
> Cc: Oleksandr Tyshchenko <oleksandr_tyshchenko@epam.com>; Jan Beulich <jbeulich@suse.com>; Andrew
> Cooper <andrew.cooper3@citrix.com>; Wei Liu <wl@xen.org>; Roger Pau Monné <roger.pau@citrix.com>;
> George Dunlap <george.dunlap@citrix.com>; Ian Jackson <ian.jackson@eu.citrix.com>; Julien Grall
> <julien@xen.org>; Stefano Stabellini <sstabellini@kernel.org>; Paul Durrant <paul@xen.org>; Jun
> Nakajima <jun.nakajima@intel.com>; Kevin Tian <kevin.tian@intel.com>; Tim Deegan <tim@xen.org>; Julien
> Grall <julien.grall@arm.com>
> Subject: [RFC PATCH V1 01/12] hvm/ioreq: Make x86's IOREQ feature common
> 
> From: Oleksandr Tyshchenko <oleksandr_tyshchenko@epam.com>
> 
> As a lot of x86 code can be re-used on Arm later on, this patch
> splits IOREQ support into common and arch specific parts.
> 
> This support is going to be used on Arm to be able to run a device
> emulator outside of the Xen hypervisor.
> 
> Please note, this is a split/cleanup of Julien's PoC:
> "Add support for Guest IO forwarding to a device emulator"
> 
> Signed-off-by: Julien Grall <julien.grall@arm.com>
> Signed-off-by: Oleksandr Tyshchenko <oleksandr_tyshchenko@epam.com>
> ---
>  xen/arch/x86/Kconfig            |    1 +
>  xen/arch/x86/hvm/dm.c           |    2 +-
>  xen/arch/x86/hvm/emulate.c      |    2 +-
>  xen/arch/x86/hvm/hvm.c          |    2 +-
>  xen/arch/x86/hvm/io.c           |    2 +-
>  xen/arch/x86/hvm/ioreq.c        | 1431 +--------------------------------------
>  xen/arch/x86/hvm/stdvga.c       |    2 +-
>  xen/arch/x86/hvm/vmx/realmode.c |    1 +
>  xen/arch/x86/hvm/vmx/vvmx.c     |    2 +-
>  xen/arch/x86/mm.c               |    2 +-
>  xen/arch/x86/mm/shadow/common.c |    2 +-
>  xen/common/Kconfig              |    3 +
>  xen/common/Makefile             |    1 +
>  xen/common/hvm/Makefile         |    1 +
>  xen/common/hvm/ioreq.c          | 1430 ++++++++++++++++++++++++++++++++++++++
>  xen/include/asm-x86/hvm/ioreq.h |   45 +-
>  xen/include/asm-x86/hvm/vcpu.h  |    7 -
>  xen/include/xen/hvm/ioreq.h     |   89 +++
>  18 files changed, 1575 insertions(+), 1450 deletions(-)
>  create mode 100644 xen/common/hvm/Makefile
>  create mode 100644 xen/common/hvm/ioreq.c
>  create mode 100644 xen/include/xen/hvm/ioreq.h

You need to adjust the MAINTAINERS file since there will now be common 'I/O EMULATION' code. Since I wrote most of ioreq.c, please retain me as a maintainer of the common code.

[snip]
> @@ -1528,13 +143,19 @@ static int hvm_access_cf8(
>      return X86EMUL_UNHANDLEABLE;
>  }
> 
> -void hvm_ioreq_init(struct domain *d)
> +void arch_hvm_ioreq_init(struct domain *d)
>  {
>      spin_lock_init(&d->arch.hvm.ioreq_server.lock);
> 
>      register_portio_handler(d, 0xcf8, 4, hvm_access_cf8);
>  }
> 
> +void arch_hvm_ioreq_destroy(struct domain *d)
> +{
> +    if ( !relocate_portio_handler(d, 0xcf8, 0xcf8, 4) )
> +        return;

There's not really a lot of point in this. I think an empty function here would be ok.

> +}
> +
>  /*
>   * Local variables:
>   * mode: C

[snip]
> +struct hvm_ioreq_server *hvm_select_ioreq_server(struct domain *d,
> +                                                 ioreq_t *p)
> +{
> +    struct hvm_ioreq_server *s;
> +    uint8_t type;
> +    uint64_t addr;
> +    unsigned int id;
> +
> +    if ( p->type != IOREQ_TYPE_COPY && p->type != IOREQ_TYPE_PIO )
> +        return NULL;
> +
> +    hvm_get_ioreq_server_range_type(d, p, &type, &addr);

Looking at this, I think it would make more sense to fold the check of p->type into hvm_get_ioreq_server_range_type() and have it return success/failure.

> +
> +    FOR_EACH_IOREQ_SERVER(d, id, s)
> +    {
> +        struct rangeset *r;
> +
> +        if ( !s->enabled )
> +            continue;
> +
> +        r = s->range[type];
> +
> +        switch ( type )
> +        {
> +            unsigned long start, end;
> +
> +        case XEN_DMOP_IO_RANGE_PORT:
> +            start = addr;
> +            end = start + p->size - 1;
> +            if ( rangeset_contains_range(r, start, end) )
> +                return s;
> +
> +            break;
> +
> +        case XEN_DMOP_IO_RANGE_MEMORY:
> +            start = hvm_mmio_first_byte(p);
> +            end = hvm_mmio_last_byte(p);
> +
> +            if ( rangeset_contains_range(r, start, end) )
> +                return s;
> +
> +            break;
> +
> +        case XEN_DMOP_IO_RANGE_PCI:
> +            if ( rangeset_contains_singleton(r, addr >> 32) )
> +            {
> +                p->type = IOREQ_TYPE_PCI_CONFIG;
> +                p->addr = addr;
> +                return s;
> +            }
> +
> +            break;
> +        }
> +    }
> +
> +    return NULL;
> +}

[snip]
> diff --git a/xen/include/xen/hvm/ioreq.h b/xen/include/xen/hvm/ioreq.h
> new file mode 100644
> index 0000000..40b7b5e
> --- /dev/null
> +++ b/xen/include/xen/hvm/ioreq.h
> @@ -0,0 +1,89 @@
> +/*
> + * hvm.h: Hardware virtual machine assist interface definitions.
> + *
> + * Copyright (c) 2016 Citrix Systems Inc.
> + *
> + * This program is free software; you can redistribute it and/or modify it
> + * under the terms and conditions of the GNU General Public License,
> + * version 2, as published by the Free Software Foundation.
> + *
> + * This program is distributed in the hope it will be useful, but WITHOUT
> + * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
> + * FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License for
> + * more details.
> + *
> + * You should have received a copy of the GNU General Public License along with
> + * this program; If not, see <http://www.gnu.org/licenses/>.
> + */
> +
> +#ifndef __HVM_IOREQ_H__
> +#define __HVM_IOREQ_H__
> +
> +#include <xen/sched.h>
> +
> +#include <asm/hvm/ioreq.h>
> +
> +#define GET_IOREQ_SERVER(d, id) \
> +    (d)->arch.hvm.ioreq_server.server[id]
> +
> +static inline struct hvm_ioreq_server *get_ioreq_server(const struct domain *d,
> +                                                        unsigned int id)
> +{
> +    if ( id >= MAX_NR_IOREQ_SERVERS )
> +        return NULL;
> +
> +    return GET_IOREQ_SERVER(d, id);
> +}
> +
> +static inline bool hvm_ioreq_needs_completion(const ioreq_t *ioreq)
> +{
> +    return ioreq->state == STATE_IOREQ_READY &&
> +           !ioreq->data_is_ptr &&
> +           (ioreq->type != IOREQ_TYPE_PIO || ioreq->dir != IOREQ_WRITE);
> +}

I don't think having this in common code is correct. The short-cut of not completing PIO writes seems somewhat x86 specific. Does ARM even have the concept of PIO?

  Paul

> +
> +bool hvm_io_pending(struct vcpu *v);
> +bool handle_hvm_io_completion(struct vcpu *v);
> +bool is_ioreq_server_page(struct domain *d, const struct page_info *page);
> +
> +int hvm_create_ioreq_server(struct domain *d, int bufioreq_handling,
> +                            ioservid_t *id);
> +int hvm_destroy_ioreq_server(struct domain *d, ioservid_t id);
> +int hvm_get_ioreq_server_info(struct domain *d, ioservid_t id,
> +                              unsigned long *ioreq_gfn,
> +                              unsigned long *bufioreq_gfn,
> +                              evtchn_port_t *bufioreq_port);
> +int hvm_get_ioreq_server_frame(struct domain *d, ioservid_t id,
> +                               unsigned long idx, mfn_t *mfn);
> +int hvm_map_io_range_to_ioreq_server(struct domain *d, ioservid_t id,
> +                                     uint32_t type, uint64_t start,
> +                                     uint64_t end);
> +int hvm_unmap_io_range_from_ioreq_server(struct domain *d, ioservid_t id,
> +                                         uint32_t type, uint64_t start,
> +                                         uint64_t end);
> +int hvm_set_ioreq_server_state(struct domain *d, ioservid_t id,
> +                               bool enabled);
> +
> +int hvm_all_ioreq_servers_add_vcpu(struct domain *d, struct vcpu *v);
> +void hvm_all_ioreq_servers_remove_vcpu(struct domain *d, struct vcpu *v);
> +void hvm_destroy_all_ioreq_servers(struct domain *d);
> +
> +struct hvm_ioreq_server *hvm_select_ioreq_server(struct domain *d,
> +                                                 ioreq_t *p);
> +int hvm_send_ioreq(struct hvm_ioreq_server *s, ioreq_t *proto_p,
> +                   bool buffered);
> +unsigned int hvm_broadcast_ioreq(ioreq_t *p, bool buffered);
> +
> +void hvm_ioreq_init(struct domain *d);
> +
> +#endif /* __HVM_IOREQ_H__ */
> +
> +/*
> + * Local variables:
> + * mode: C
> + * c-file-style: "BSD"
> + * c-basic-offset: 4
> + * tab-width: 4
> + * indent-tabs-mode: nil
> + * End:
> + */
> --
> 2.7.4





* RE: [RFC PATCH V1 04/12] xen/arm: Introduce arch specific bits for IOREQ/DM features
  2020-08-03 18:21 ` [RFC PATCH V1 04/12] xen/arm: Introduce arch specific bits for IOREQ/DM features Oleksandr Tyshchenko
@ 2020-08-04  7:49   ` Paul Durrant
  2020-08-04 14:01     ` Julien Grall
  2020-08-04 23:22   ` Stefano Stabellini
                     ` (2 subsequent siblings)
  3 siblings, 1 reply; 140+ messages in thread
From: Paul Durrant @ 2020-08-04  7:49 UTC (permalink / raw)
  To: 'Oleksandr Tyshchenko', xen-devel
  Cc: 'Stefano Stabellini', 'Julien Grall',
	'Wei Liu', 'Andrew Cooper', 'Ian Jackson',
	'George Dunlap', 'Oleksandr Tyshchenko',
	'Julien Grall', 'Jan Beulich',
	'Daniel De Graaf', 'Volodymyr Babchuk'

> -----Original Message-----
> From: Xen-devel <xen-devel-bounces@lists.xenproject.org> On Behalf Of Oleksandr Tyshchenko
> Sent: 03 August 2020 19:21
> To: xen-devel@lists.xenproject.org
> Cc: Stefano Stabellini <sstabellini@kernel.org>; Julien Grall <julien@xen.org>; Wei Liu <wl@xen.org>;
> Andrew Cooper <andrew.cooper3@citrix.com>; Ian Jackson <ian.jackson@eu.citrix.com>; George Dunlap
> <george.dunlap@citrix.com>; Oleksandr Tyshchenko <oleksandr_tyshchenko@epam.com>; Julien Grall
> <julien.grall@arm.com>; Jan Beulich <jbeulich@suse.com>; Daniel De Graaf <dgdegra@tycho.nsa.gov>;
> Volodymyr Babchuk <Volodymyr_Babchuk@epam.com>
> Subject: [RFC PATCH V1 04/12] xen/arm: Introduce arch specific bits for IOREQ/DM features
> 
> From: Oleksandr Tyshchenko <oleksandr_tyshchenko@epam.com>
> 
> This patch makes it possible to forward Guest MMIO accesses
> to a device emulator on Arm and enables that support for
> Arm64.
> 
> Also update XSM code a bit to let DM op be used on Arm.
> New arch DM op will be introduced in the follow-up patch.
> 
> Please note, at the moment build on Arm32 is broken
> (see cmpxchg usage in hvm_send_buffered_ioreq()) if someone
> wants to enable CONFIG_IOREQ_SERVER due to the lack of
> cmpxchg_64 support on Arm32.
> 
> Please note, this is a split/cleanup of Julien's PoC:
> "Add support for Guest IO forwarding to a device emulator"
> 
> Signed-off-by: Julien Grall <julien.grall@arm.com>
> Signed-off-by: Oleksandr Tyshchenko <oleksandr_tyshchenko@epam.com>
> ---
>  tools/libxc/xc_dom_arm.c        |  25 +++++++---
>  xen/arch/arm/Kconfig            |   1 +
>  xen/arch/arm/Makefile           |   2 +
>  xen/arch/arm/dm.c               |  34 +++++++++++++
>  xen/arch/arm/domain.c           |   9 ++++
>  xen/arch/arm/hvm.c              |  46 +++++++++++++++++-
>  xen/arch/arm/io.c               |  67 +++++++++++++++++++++++++-
>  xen/arch/arm/ioreq.c            |  86 +++++++++++++++++++++++++++++++++
>  xen/arch/arm/traps.c            |  17 +++++++
>  xen/common/memory.c             |   5 +-
>  xen/include/asm-arm/domain.h    |  80 +++++++++++++++++++++++++++++++
>  xen/include/asm-arm/hvm/ioreq.h | 103 ++++++++++++++++++++++++++++++++++++++++
>  xen/include/asm-arm/mmio.h      |   1 +
>  xen/include/asm-arm/p2m.h       |   7 +--
>  xen/include/xsm/dummy.h         |   4 +-
>  xen/include/xsm/xsm.h           |   6 +--
>  xen/xsm/dummy.c                 |   2 +-
>  xen/xsm/flask/hooks.c           |   5 +-
>  18 files changed, 476 insertions(+), 24 deletions(-)
>  create mode 100644 xen/arch/arm/dm.c
>  create mode 100644 xen/arch/arm/ioreq.c
>  create mode 100644 xen/include/asm-arm/hvm/ioreq.h
> 
> diff --git a/tools/libxc/xc_dom_arm.c b/tools/libxc/xc_dom_arm.c
> index 931404c..b5fc066 100644
> --- a/tools/libxc/xc_dom_arm.c
> +++ b/tools/libxc/xc_dom_arm.c
> @@ -26,11 +26,19 @@
>  #include "xg_private.h"
>  #include "xc_dom.h"
> 
> -#define NR_MAGIC_PAGES 4
> +
>  #define CONSOLE_PFN_OFFSET 0
>  #define XENSTORE_PFN_OFFSET 1
>  #define MEMACCESS_PFN_OFFSET 2
>  #define VUART_PFN_OFFSET 3
> +#define IOREQ_SERVER_PFN_OFFSET 4
> +
> +#define NR_IOREQ_SERVER_PAGES 8
> +#define NR_MAGIC_PAGES (4 + NR_IOREQ_SERVER_PAGES)
> +
> +#define GUEST_MAGIC_BASE_PFN (GUEST_MAGIC_BASE >> XC_PAGE_SHIFT)
> +
> +#define special_pfn(x)  (GUEST_MAGIC_BASE_PFN + (x))

Why introduce 'magic pages' for Arm? It's quite a horrible hack that we have begun to do away with by adding resource mapping.

  Paul




* Re: [RFC PATCH V1 01/12] hvm/ioreq: Make x86's IOREQ feature common
  2020-08-04  7:45   ` Paul Durrant
@ 2020-08-04 11:10     ` Oleksandr
  2020-08-04 11:23       ` Paul Durrant
  2020-08-04 13:52       ` Julien Grall
  0 siblings, 2 replies; 140+ messages in thread
From: Oleksandr @ 2020-08-04 11:10 UTC (permalink / raw)
  To: paul, xen-devel
  Cc: 'Kevin Tian', 'Stefano Stabellini',
	'Julien Grall', 'Jun Nakajima', 'Wei Liu',
	'Andrew Cooper', 'Ian Jackson',
	'George Dunlap', 'Tim Deegan',
	'Oleksandr Tyshchenko', 'Julien Grall',
	'Jan Beulich', 'Roger Pau Monné'


On 04.08.20 10:45, Paul Durrant wrote:

Hi Paul

>> -----Original Message-----
>> From: Oleksandr Tyshchenko <olekstysh@gmail.com>
>> Sent: 03 August 2020 19:21
>> To: xen-devel@lists.xenproject.org
>> Cc: Oleksandr Tyshchenko <oleksandr_tyshchenko@epam.com>; Jan Beulich <jbeulich@suse.com>; Andrew
>> Cooper <andrew.cooper3@citrix.com>; Wei Liu <wl@xen.org>; Roger Pau Monné <roger.pau@citrix.com>;
>> George Dunlap <george.dunlap@citrix.com>; Ian Jackson <ian.jackson@eu.citrix.com>; Julien Grall
>> <julien@xen.org>; Stefano Stabellini <sstabellini@kernel.org>; Paul Durrant <paul@xen.org>; Jun
>> Nakajima <jun.nakajima@intel.com>; Kevin Tian <kevin.tian@intel.com>; Tim Deegan <tim@xen.org>; Julien
>> Grall <julien.grall@arm.com>
>> Subject: [RFC PATCH V1 01/12] hvm/ioreq: Make x86's IOREQ feature common
>>
>> From: Oleksandr Tyshchenko <oleksandr_tyshchenko@epam.com>
>>
>> As a lot of x86 code can be re-used on Arm later on, this patch
>> splits IOREQ support into common and arch specific parts.
>>
>> This support is going to be used on Arm to be able to run a device
>> emulator outside of the Xen hypervisor.
>>
>> Please note, this is a split/cleanup of Julien's PoC:
>> "Add support for Guest IO forwarding to a device emulator"
>>
>> Signed-off-by: Julien Grall <julien.grall@arm.com>
>> Signed-off-by: Oleksandr Tyshchenko <oleksandr_tyshchenko@epam.com>
>> ---
>>   xen/arch/x86/Kconfig            |    1 +
>>   xen/arch/x86/hvm/dm.c           |    2 +-
>>   xen/arch/x86/hvm/emulate.c      |    2 +-
>>   xen/arch/x86/hvm/hvm.c          |    2 +-
>>   xen/arch/x86/hvm/io.c           |    2 +-
>>   xen/arch/x86/hvm/ioreq.c        | 1431 +--------------------------------------
>>   xen/arch/x86/hvm/stdvga.c       |    2 +-
>>   xen/arch/x86/hvm/vmx/realmode.c |    1 +
>>   xen/arch/x86/hvm/vmx/vvmx.c     |    2 +-
>>   xen/arch/x86/mm.c               |    2 +-
>>   xen/arch/x86/mm/shadow/common.c |    2 +-
>>   xen/common/Kconfig              |    3 +
>>   xen/common/Makefile             |    1 +
>>   xen/common/hvm/Makefile         |    1 +
>>   xen/common/hvm/ioreq.c          | 1430 ++++++++++++++++++++++++++++++++++++++
>>   xen/include/asm-x86/hvm/ioreq.h |   45 +-
>>   xen/include/asm-x86/hvm/vcpu.h  |    7 -
>>   xen/include/xen/hvm/ioreq.h     |   89 +++
>>   18 files changed, 1575 insertions(+), 1450 deletions(-)
>>   create mode 100644 xen/common/hvm/Makefile
>>   create mode 100644 xen/common/hvm/ioreq.c
>>   create mode 100644 xen/include/xen/hvm/ioreq.h
> You need to adjust the MAINTAINERS file since there will now be common 'I/O EMULATION' code. Since I wrote most of ioreq.c, please retain me as a maintainer of the common code.

Oh, I completely forgot about the MAINTAINERS file. Sure, I will update 
the file and retain you.


>
> [snip]
>> @@ -1528,13 +143,19 @@ static int hvm_access_cf8(
>>       return X86EMUL_UNHANDLEABLE;
>>   }
>>
>> -void hvm_ioreq_init(struct domain *d)
>> +void arch_hvm_ioreq_init(struct domain *d)
>>   {
>>       spin_lock_init(&d->arch.hvm.ioreq_server.lock);
>>
>>       register_portio_handler(d, 0xcf8, 4, hvm_access_cf8);
>>   }
>>
>> +void arch_hvm_ioreq_destroy(struct domain *d)
>> +{
>> +    if ( !relocate_portio_handler(d, 0xcf8, 0xcf8, 4) )
>> +        return;
> There's not really a lot of point in this. I think an empty function here would be ok.

ok


>> +struct hvm_ioreq_server *hvm_select_ioreq_server(struct domain *d,
>> +                                                 ioreq_t *p)
>> +{
>> +    struct hvm_ioreq_server *s;
>> +    uint8_t type;
>> +    uint64_t addr;
>> +    unsigned int id;
>> +
>> +    if ( p->type != IOREQ_TYPE_COPY && p->type != IOREQ_TYPE_PIO )
>> +        return NULL;
>> +
>> +    hvm_get_ioreq_server_range_type(d, p, &type, &addr);
> Looking at this, I think it would make more sense to fold the check of p->type into hvm_get_ioreq_server_range_type() and have it return success/failure.

ok, will update.


> diff --git a/xen/include/xen/hvm/ioreq.h b/xen/include/xen/hvm/ioreq.h
>> new file mode 100644
>> index 0000000..40b7b5e
>> --- /dev/null
>> +++ b/xen/include/xen/hvm/ioreq.h
>> @@ -0,0 +1,89 @@
>> +/*
>> + * hvm.h: Hardware virtual machine assist interface definitions.
>> + *
>> + * Copyright (c) 2016 Citrix Systems Inc.
>> + *
>> + * This program is free software; you can redistribute it and/or modify it
>> + * under the terms and conditions of the GNU General Public License,
>> + * version 2, as published by the Free Software Foundation.
>> + *
>> + * This program is distributed in the hope it will be useful, but WITHOUT
>> + * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
>> + * FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License for
>> + * more details.
>> + *
>> + * You should have received a copy of the GNU General Public License along with
>> + * this program; If not, see <http://www.gnu.org/licenses/>.
>> + */
>> +
>> +#ifndef __HVM_IOREQ_H__
>> +#define __HVM_IOREQ_H__
>> +
>> +#include <xen/sched.h>
>> +
>> +#include <asm/hvm/ioreq.h>
>> +
>> +#define GET_IOREQ_SERVER(d, id) \
>> +    (d)->arch.hvm.ioreq_server.server[id]
>> +
>> +static inline struct hvm_ioreq_server *get_ioreq_server(const struct domain *d,
>> +                                                        unsigned int id)
>> +{
>> +    if ( id >= MAX_NR_IOREQ_SERVERS )
>> +        return NULL;
>> +
>> +    return GET_IOREQ_SERVER(d, id);
>> +}
>> +
>> +static inline bool hvm_ioreq_needs_completion(const ioreq_t *ioreq)
>> +{
>> +    return ioreq->state == STATE_IOREQ_READY &&
>> +           !ioreq->data_is_ptr &&
>> +           (ioreq->type != IOREQ_TYPE_PIO || ioreq->dir != IOREQ_WRITE);
>> +}
> I don't think having this in common code is correct. The short-cut of not completing PIO reads seems somewhat x86 specific. Does ARM even have the concept of PIO?

I am not 100% sure here, but it seems that it doesn't.

Shall I make hvm_ioreq_needs_completion() per arch? The Arm variant would 
have the same implementation, but without the "ioreq->type != 
IOREQ_TYPE_PIO" check...

-- 
Regards,

Oleksandr Tyshchenko



^ permalink raw reply	[flat|nested] 140+ messages in thread

* RE: [RFC PATCH V1 01/12] hvm/ioreq: Make x86's IOREQ feature common
  2020-08-04 11:10     ` Oleksandr
@ 2020-08-04 11:23       ` Paul Durrant
  2020-08-04 11:51         ` Oleksandr
  2020-08-04 13:52       ` Julien Grall
  1 sibling, 1 reply; 140+ messages in thread
From: Paul Durrant @ 2020-08-04 11:23 UTC (permalink / raw)
  To: 'Oleksandr', xen-devel
  Cc: 'Kevin Tian', 'Stefano Stabellini',
	'Julien Grall', 'Jun Nakajima', 'Wei Liu',
	'Andrew Cooper', 'Ian Jackson',
	'George Dunlap', 'Tim Deegan',
	'Oleksandr Tyshchenko', 'Julien Grall',
	'Jan Beulich', 'Roger Pau Monné'

> -----Original Message-----
> From: Oleksandr <olekstysh@gmail.com>
> Sent: 04 August 2020 12:10
> To: paul@xen.org; xen-devel@lists.xenproject.org
> Cc: 'Oleksandr Tyshchenko' <oleksandr_tyshchenko@epam.com>; 'Jan Beulich' <jbeulich@suse.com>; 'Andrew
> Cooper' <andrew.cooper3@citrix.com>; 'Wei Liu' <wl@xen.org>; 'Roger Pau Monné' <roger.pau@citrix.com>;
> 'George Dunlap' <george.dunlap@citrix.com>; 'Ian Jackson' <ian.jackson@eu.citrix.com>; 'Julien Grall'
> <julien@xen.org>; 'Stefano Stabellini' <sstabellini@kernel.org>; 'Jun Nakajima'
> <jun.nakajima@intel.com>; 'Kevin Tian' <kevin.tian@intel.com>; 'Tim Deegan' <tim@xen.org>; 'Julien
> Grall' <julien.grall@arm.com>
> Subject: Re: [RFC PATCH V1 01/12] hvm/ioreq: Make x86's IOREQ feature common
> 
> 
> On 04.08.20 10:45, Paul Durrant wrote:
> 
> Hi Paul
> 
> >> -----Original Message-----
> >> From: Oleksandr Tyshchenko <olekstysh@gmail.com>
> >> Sent: 03 August 2020 19:21
> >> To: xen-devel@lists.xenproject.org
> >> Cc: Oleksandr Tyshchenko <oleksandr_tyshchenko@epam.com>; Jan Beulich <jbeulich@suse.com>; Andrew
> >> Cooper <andrew.cooper3@citrix.com>; Wei Liu <wl@xen.org>; Roger Pau Monné <roger.pau@citrix.com>;
> >> George Dunlap <george.dunlap@citrix.com>; Ian Jackson <ian.jackson@eu.citrix.com>; Julien Grall
> >> <julien@xen.org>; Stefano Stabellini <sstabellini@kernel.org>; Paul Durrant <paul@xen.org>; Jun
> >> Nakajima <jun.nakajima@intel.com>; Kevin Tian <kevin.tian@intel.com>; Tim Deegan <tim@xen.org>;
> Julien
> >> Grall <julien.grall@arm.com>
> >> Subject: [RFC PATCH V1 01/12] hvm/ioreq: Make x86's IOREQ feature common
> >>
> >> From: Oleksandr Tyshchenko <oleksandr_tyshchenko@epam.com>
> >>
> >> As a lot of x86 code can be re-used on Arm later on, this patch
> >> splits IOREQ support into common and arch specific parts.
> >>
> >> This support is going to be used on Arm to be able run device
> >> emulator outside of Xen hypervisor.
> >>
> >> Please note, this is a split/cleanup of Julien's PoC:
> >> "Add support for Guest IO forwarding to a device emulator"
> >>
> >> Signed-off-by: Julien Grall <julien.grall@arm.com>
> >> Signed-off-by: Oleksandr Tyshchenko <oleksandr_tyshchenko@epam.com>
> >> ---
> >>   xen/arch/x86/Kconfig            |    1 +
> >>   xen/arch/x86/hvm/dm.c           |    2 +-
> >>   xen/arch/x86/hvm/emulate.c      |    2 +-
> >>   xen/arch/x86/hvm/hvm.c          |    2 +-
> >>   xen/arch/x86/hvm/io.c           |    2 +-
> >>   xen/arch/x86/hvm/ioreq.c        | 1431 +--------------------------------------
> >>   xen/arch/x86/hvm/stdvga.c       |    2 +-
> >>   xen/arch/x86/hvm/vmx/realmode.c |    1 +
> >>   xen/arch/x86/hvm/vmx/vvmx.c     |    2 +-
> >>   xen/arch/x86/mm.c               |    2 +-
> >>   xen/arch/x86/mm/shadow/common.c |    2 +-
> >>   xen/common/Kconfig              |    3 +
> >>   xen/common/Makefile             |    1 +
> >>   xen/common/hvm/Makefile         |    1 +
> >>   xen/common/hvm/ioreq.c          | 1430 ++++++++++++++++++++++++++++++++++++++
> >>   xen/include/asm-x86/hvm/ioreq.h |   45 +-
> >>   xen/include/asm-x86/hvm/vcpu.h  |    7 -
> >>   xen/include/xen/hvm/ioreq.h     |   89 +++
> >>   18 files changed, 1575 insertions(+), 1450 deletions(-)
> >>   create mode 100644 xen/common/hvm/Makefile
> >>   create mode 100644 xen/common/hvm/ioreq.c
> >>   create mode 100644 xen/include/xen/hvm/ioreq.h
> > You need to adjust the MAINTAINERS file since there will now be common 'I/O EMULATION' code. Since I
> wrote most of ioreq.c, please retain me as a maintainer of the common code.
> 
> Oh, I completely forgot about MAINTAINERS file. Sure, I will update file
> and retain you.
> 
> 
> >
> > [snip]
> >> @@ -1528,13 +143,19 @@ static int hvm_access_cf8(
> >>       return X86EMUL_UNHANDLEABLE;
> >>   }
> >>
> >> -void hvm_ioreq_init(struct domain *d)
> >> +void arch_hvm_ioreq_init(struct domain *d)
> >>   {
> >>       spin_lock_init(&d->arch.hvm.ioreq_server.lock);
> >>
> >>       register_portio_handler(d, 0xcf8, 4, hvm_access_cf8);
> >>   }
> >>
> >> +void arch_hvm_ioreq_destroy(struct domain *d)
> >> +{
> >> +    if ( !relocate_portio_handler(d, 0xcf8, 0xcf8, 4) )
> >> +        return;
> > There's not really a lot of point in this. I think an empty function here would be ok.
> 
> ok
> 
> 
> >> +struct hvm_ioreq_server *hvm_select_ioreq_server(struct domain *d,
> >> +                                                 ioreq_t *p)
> >> +{
> >> +    struct hvm_ioreq_server *s;
> >> +    uint8_t type;
> >> +    uint64_t addr;
> >> +    unsigned int id;
> >> +
> >> +    if ( p->type != IOREQ_TYPE_COPY && p->type != IOREQ_TYPE_PIO )
> >> +        return NULL;
> >> +
> >> +    hvm_get_ioreq_server_range_type(d, p, &type, &addr);
> > Looking at this, I think it would make more sense to fold the check of p->type into
> hvm_get_ioreq_server_range_type() and have it return success/failure.
> 
> ok, will update.
> 
> 
> > diff --git a/xen/include/xen/hvm/ioreq.h b/xen/include/xen/hvm/ioreq.h
> >> new file mode 100644
> >> index 0000000..40b7b5e
> >> --- /dev/null
> >> +++ b/xen/include/xen/hvm/ioreq.h
> >> @@ -0,0 +1,89 @@
> >> +/*
> >> + * hvm.h: Hardware virtual machine assist interface definitions.
> >> + *
> >> + * Copyright (c) 2016 Citrix Systems Inc.
> >> + *
> >> + * This program is free software; you can redistribute it and/or modify it
> >> + * under the terms and conditions of the GNU General Public License,
> >> + * version 2, as published by the Free Software Foundation.
> >> + *
> >> + * This program is distributed in the hope it will be useful, but WITHOUT
> >> + * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
> >> + * FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License for
> >> + * more details.
> >> + *
> >> + * You should have received a copy of the GNU General Public License along with
> >> + * this program; If not, see <http://www.gnu.org/licenses/>.
> >> + */
> >> +
> >> +#ifndef __HVM_IOREQ_H__
> >> +#define __HVM_IOREQ_H__
> >> +
> >> +#include <xen/sched.h>
> >> +
> >> +#include <asm/hvm/ioreq.h>
> >> +
> >> +#define GET_IOREQ_SERVER(d, id) \
> >> +    (d)->arch.hvm.ioreq_server.server[id]
> >> +
> >> +static inline struct hvm_ioreq_server *get_ioreq_server(const struct domain *d,
> >> +                                                        unsigned int id)
> >> +{
> >> +    if ( id >= MAX_NR_IOREQ_SERVERS )
> >> +        return NULL;
> >> +
> >> +    return GET_IOREQ_SERVER(d, id);
> >> +}
> >> +
> >> +static inline bool hvm_ioreq_needs_completion(const ioreq_t *ioreq)
> >> +{
> >> +    return ioreq->state == STATE_IOREQ_READY &&
> >> +           !ioreq->data_is_ptr &&
> >> +           (ioreq->type != IOREQ_TYPE_PIO || ioreq->dir != IOREQ_WRITE);
> >> +}
> > I don't think having this in common code is correct. The short-cut of not completing PIO reads seems
> somewhat x86 specific. Does ARM even have the concept of PIO?
> 
> I am not 100% sure here, but it seems that doesn't have.
> 
> Shall I make hvm_ioreq_needs_completion() per arch? Arm variant would
> have the same implementation, but without "ioreq->type !=
> IOREQ_TYPE_PIO" check...
> 

With your series applied, does any common code actually call hvm_ioreq_needs_completion()? I suspect it will remain x86 specific, without any need for an Arm variant.

  Paul

> --
> Regards,
> 
> Oleksandr Tyshchenko





* Re: [RFC PATCH V1 01/12] hvm/ioreq: Make x86's IOREQ feature common
  2020-08-04 11:23       ` Paul Durrant
@ 2020-08-04 11:51         ` Oleksandr
  2020-08-04 13:18           ` Paul Durrant
  0 siblings, 1 reply; 140+ messages in thread
From: Oleksandr @ 2020-08-04 11:51 UTC (permalink / raw)
  To: paul, xen-devel
  Cc: 'Kevin Tian', 'Stefano Stabellini',
	'Julien Grall', 'Jun Nakajima', 'Wei Liu',
	'Andrew Cooper', 'Ian Jackson',
	'George Dunlap', 'Tim Deegan',
	'Oleksandr Tyshchenko', 'Julien Grall',
	'Jan Beulich', 'Roger Pau Monné'


On 04.08.20 14:23, Paul Durrant wrote:
>>
>>> diff --git a/xen/include/xen/hvm/ioreq.h b/xen/include/xen/hvm/ioreq.h
>>>> new file mode 100644
>>>> index 0000000..40b7b5e
>>>> --- /dev/null
>>>> +++ b/xen/include/xen/hvm/ioreq.h
>>>> @@ -0,0 +1,89 @@
>>>> +/*
>>>> + * hvm.h: Hardware virtual machine assist interface definitions.
>>>> + *
>>>> + * Copyright (c) 2016 Citrix Systems Inc.
>>>> + *
>>>> + * This program is free software; you can redistribute it and/or modify it
>>>> + * under the terms and conditions of the GNU General Public License,
>>>> + * version 2, as published by the Free Software Foundation.
>>>> + *
>>>> + * This program is distributed in the hope it will be useful, but WITHOUT
>>>> + * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
>>>> + * FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License for
>>>> + * more details.
>>>> + *
>>>> + * You should have received a copy of the GNU General Public License along with
>>>> + * this program; If not, see <http://www.gnu.org/licenses/>.
>>>> + */
>>>> +
>>>> +#ifndef __HVM_IOREQ_H__
>>>> +#define __HVM_IOREQ_H__
>>>> +
>>>> +#include <xen/sched.h>
>>>> +
>>>> +#include <asm/hvm/ioreq.h>
>>>> +
>>>> +#define GET_IOREQ_SERVER(d, id) \
>>>> +    (d)->arch.hvm.ioreq_server.server[id]
>>>> +
>>>> +static inline struct hvm_ioreq_server *get_ioreq_server(const struct domain *d,
>>>> +                                                        unsigned int id)
>>>> +{
>>>> +    if ( id >= MAX_NR_IOREQ_SERVERS )
>>>> +        return NULL;
>>>> +
>>>> +    return GET_IOREQ_SERVER(d, id);
>>>> +}
>>>> +
>>>> +static inline bool hvm_ioreq_needs_completion(const ioreq_t *ioreq)
>>>> +{
>>>> +    return ioreq->state == STATE_IOREQ_READY &&
>>>> +           !ioreq->data_is_ptr &&
>>>> +           (ioreq->type != IOREQ_TYPE_PIO || ioreq->dir != IOREQ_WRITE);
>>>> +}
>>> I don't think having this in common code is correct. The short-cut of not completing PIO reads seems
>> somewhat x86 specific. Does ARM even have the concept of PIO?
>>
>> I am not 100% sure here, but it seems that doesn't have.
>>
>> Shall I make hvm_ioreq_needs_completion() per arch? Arm variant would
>> have the same implementation, but without "ioreq->type !=
>> IOREQ_TYPE_PIO" check...
>>
> With your series applied, does any common code actually call hvm_ioreq_needs_completion()? I suspect it will remain x86 specific, without any need for an Arm variant.
Yes, it does. Please see common usage in hvm_io_assist() and 
handle_hvm_io_completion() (current patch) and usage in Arm code 
(arch/arm/io.c: io_state try_fwd_ioserv) [1]


[1] 
https://lists.xenproject.org/archives/html/xen-devel/2020-08/msg00072.html


-- 
Regards,

Oleksandr Tyshchenko




* RE: [RFC PATCH V1 01/12] hvm/ioreq: Make x86's IOREQ feature common
  2020-08-04 11:51         ` Oleksandr
@ 2020-08-04 13:18           ` Paul Durrant
  0 siblings, 0 replies; 140+ messages in thread
From: Paul Durrant @ 2020-08-04 13:18 UTC (permalink / raw)
  To: 'Oleksandr', xen-devel
  Cc: 'Kevin Tian', 'Stefano Stabellini',
	'Julien Grall', 'Jun Nakajima', 'Wei Liu',
	'Andrew Cooper', 'Ian Jackson',
	'George Dunlap', 'Tim Deegan',
	'Oleksandr Tyshchenko', 'Julien Grall',
	'Jan Beulich', 'Roger Pau Monné'

> -----Original Message-----
> From: Oleksandr <olekstysh@gmail.com>
> Sent: 04 August 2020 12:51
> To: paul@xen.org; xen-devel@lists.xenproject.org
> Cc: 'Oleksandr Tyshchenko' <oleksandr_tyshchenko@epam.com>; 'Jan Beulich' <jbeulich@suse.com>; 'Andrew
> Cooper' <andrew.cooper3@citrix.com>; 'Wei Liu' <wl@xen.org>; 'Roger Pau Monné' <roger.pau@citrix.com>;
> 'George Dunlap' <george.dunlap@citrix.com>; 'Ian Jackson' <ian.jackson@eu.citrix.com>; 'Julien Grall'
> <julien@xen.org>; 'Stefano Stabellini' <sstabellini@kernel.org>; 'Jun Nakajima'
> <jun.nakajima@intel.com>; 'Kevin Tian' <kevin.tian@intel.com>; 'Tim Deegan' <tim@xen.org>; 'Julien
> Grall' <julien.grall@arm.com>
> Subject: Re: [RFC PATCH V1 01/12] hvm/ioreq: Make x86's IOREQ feature common
> 
> 
> On 04.08.20 14:23, Paul Durrant wrote:
> >>
> >>> diff --git a/xen/include/xen/hvm/ioreq.h b/xen/include/xen/hvm/ioreq.h
> >>>> new file mode 100644
> >>>> index 0000000..40b7b5e
> >>>> --- /dev/null
> >>>> +++ b/xen/include/xen/hvm/ioreq.h
> >>>> @@ -0,0 +1,89 @@
> >>>> +/*
> >>>> + * hvm.h: Hardware virtual machine assist interface definitions.
> >>>> + *
> >>>> + * Copyright (c) 2016 Citrix Systems Inc.
> >>>> + *
> >>>> + * This program is free software; you can redistribute it and/or modify it
> >>>> + * under the terms and conditions of the GNU General Public License,
> >>>> + * version 2, as published by the Free Software Foundation.
> >>>> + *
> >>>> + * This program is distributed in the hope it will be useful, but WITHOUT
> >>>> + * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
> >>>> + * FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License for
> >>>> + * more details.
> >>>> + *
> >>>> + * You should have received a copy of the GNU General Public License along with
> >>>> + * this program; If not, see <http://www.gnu.org/licenses/>.
> >>>> + */
> >>>> +
> >>>> +#ifndef __HVM_IOREQ_H__
> >>>> +#define __HVM_IOREQ_H__
> >>>> +
> >>>> +#include <xen/sched.h>
> >>>> +
> >>>> +#include <asm/hvm/ioreq.h>
> >>>> +
> >>>> +#define GET_IOREQ_SERVER(d, id) \
> >>>> +    (d)->arch.hvm.ioreq_server.server[id]
> >>>> +
> >>>> +static inline struct hvm_ioreq_server *get_ioreq_server(const struct domain *d,
> >>>> +                                                        unsigned int id)
> >>>> +{
> >>>> +    if ( id >= MAX_NR_IOREQ_SERVERS )
> >>>> +        return NULL;
> >>>> +
> >>>> +    return GET_IOREQ_SERVER(d, id);
> >>>> +}
> >>>> +
> >>>> +static inline bool hvm_ioreq_needs_completion(const ioreq_t *ioreq)
> >>>> +{
> >>>> +    return ioreq->state == STATE_IOREQ_READY &&
> >>>> +           !ioreq->data_is_ptr &&
> >>>> +           (ioreq->type != IOREQ_TYPE_PIO || ioreq->dir != IOREQ_WRITE);
> >>>> +}
> >>> I don't think having this in common code is correct. The short-cut of not completing PIO reads
> seems
> >> somewhat x86 specific. Does ARM even have the concept of PIO?
> >>
> >> I am not 100% sure here, but it seems that doesn't have.
> >>
> >> Shall I make hvm_ioreq_needs_completion() per arch? Arm variant would
> >> have the same implementation, but without "ioreq->type !=
> >> IOREQ_TYPE_PIO" check...
> >>
> > With your series applied, does any common code actually call hvm_ioreq_needs_completion()? I suspect
> it will remain x86 specific, without any need for an Arm variant.
> Yes, it does. Please see common usage in hvm_io_assist() and
> handle_hvm_io_completion() (current patch) and usage in Arm code
> (arch/arm/io.c: io_state try_fwd_ioserv) [1]
> 
> 
> [1]
> https://lists.xenproject.org/archives/html/xen-devel/2020-08/msg00072.html
> 

Yes, but that code is clearly not finished since, after setting io_completion it says:

/* XXX: Decide what to do */
if ( rc == IO_RETRY )
    rc = IO_HANDLED;

So, it's not clear what the eventual implementation will be and whether it will need to make that call.

  Paul

> 
> --
> Regards,
> 
> Oleksandr Tyshchenko





* Re: [RFC PATCH V1 01/12] hvm/ioreq: Make x86's IOREQ feature common
  2020-08-04 11:10     ` Oleksandr
  2020-08-04 11:23       ` Paul Durrant
@ 2020-08-04 13:52       ` Julien Grall
  2020-08-04 15:41         ` Jan Beulich
  2020-08-04 19:11         ` Stefano Stabellini
  1 sibling, 2 replies; 140+ messages in thread
From: Julien Grall @ 2020-08-04 13:52 UTC (permalink / raw)
  To: Oleksandr, paul, xen-devel
  Cc: 'Kevin Tian', 'Stefano Stabellini',
	'Jun Nakajima', 'Wei Liu',
	'Andrew Cooper', 'Ian Jackson',
	'George Dunlap', 'Tim Deegan',
	'Oleksandr Tyshchenko', 'Julien Grall',
	'Jan Beulich', 'Roger Pau Monné'

Hi,

On 04/08/2020 12:10, Oleksandr wrote:
> On 04.08.20 10:45, Paul Durrant wrote:
>>> +static inline bool hvm_ioreq_needs_completion(const ioreq_t *ioreq)
>>> +{
>>> +    return ioreq->state == STATE_IOREQ_READY &&
>>> +           !ioreq->data_is_ptr &&
>>> +           (ioreq->type != IOREQ_TYPE_PIO || ioreq->dir != 
>>> IOREQ_WRITE);
>>> +}
>> I don't think having this in common code is correct. The short-cut of 
>> not completing PIO reads seems somewhat x86 specific. 

Hmmm, looking at the code, I think it doesn't wait for PIO writes to 
complete (not reads). Did I miss anything?

> Does ARM even 
>> have the concept of PIO?
> 
> I am not 100% sure here, but it seems that doesn't have.

Technically, PIOs exist on Arm; however, they are accessed the same 
way as MMIO and will have a dedicated area defined by the HW.

AFAICT, on Arm64, they are only used for PCI IO Bar.

Now the question is whether we want to expose them to the Device 
Emulator as PIO or MMIO access. From a generic PoV, a DM shouldn't have 
to care about the architecture used. It should just be able to request a 
given IOport region.

So it may make sense to differentiate them in the common ioreq code as well.

I had a quick look at QEMU and wasn't able to tell if the PIO and MMIO 
address spaces are different on Arm as well. Paul, Stefano, do you know 
what they are doing?

Cheers,

-- 
Julien Grall



* Re: [RFC PATCH V1 04/12] xen/arm: Introduce arch specific bits for IOREQ/DM features
  2020-08-04  7:49   ` Paul Durrant
@ 2020-08-04 14:01     ` Julien Grall
  2020-08-04 23:22       ` Stefano Stabellini
  2020-08-15 17:56       ` Julien Grall
  0 siblings, 2 replies; 140+ messages in thread
From: Julien Grall @ 2020-08-04 14:01 UTC (permalink / raw)
  To: paul, 'Oleksandr Tyshchenko', xen-devel
  Cc: 'Stefano Stabellini', 'Wei Liu',
	'Andrew Cooper', 'Ian Jackson',
	'George Dunlap', 'Oleksandr Tyshchenko',
	'Julien Grall', 'Jan Beulich',
	'Daniel De Graaf', 'Volodymyr Babchuk'

Hi Paul,

On 04/08/2020 08:49, Paul Durrant wrote:
>> diff --git a/tools/libxc/xc_dom_arm.c b/tools/libxc/xc_dom_arm.c
>> index 931404c..b5fc066 100644
>> --- a/tools/libxc/xc_dom_arm.c
>> +++ b/tools/libxc/xc_dom_arm.c
>> @@ -26,11 +26,19 @@
>>   #include "xg_private.h"
>>   #include "xc_dom.h"
>>
>> -#define NR_MAGIC_PAGES 4
>> +
>>   #define CONSOLE_PFN_OFFSET 0
>>   #define XENSTORE_PFN_OFFSET 1
>>   #define MEMACCESS_PFN_OFFSET 2
>>   #define VUART_PFN_OFFSET 3
>> +#define IOREQ_SERVER_PFN_OFFSET 4
>> +
>> +#define NR_IOREQ_SERVER_PAGES 8
>> +#define NR_MAGIC_PAGES (4 + NR_IOREQ_SERVER_PAGES)
>> +
>> +#define GUEST_MAGIC_BASE_PFN (GUEST_MAGIC_BASE >> XC_PAGE_SHIFT)
>> +
>> +#define special_pfn(x)  (GUEST_MAGIC_BASE_PFN + (x))
> 
> Why introduce 'magic pages' for Arm? It's quite a horrible hack that we have begun to do away with by adding resource mapping.

This would require us to mandate at least Linux 4.17 in a domain that 
will run an IOREQ server. If we don't mandate this, the minimum version 
would be 4.10 where DM OP was introduced.

Because of XSA-300, we could technically not safely run an IOREQ server 
with existing Linux. So it is probably OK to enforce the use of the 
acquire interface.

Note that I haven't yet looked at the rest of the series. So I am not 
sure if there is more work necessary to enable it.

Cheers,

-- 
Julien Grall



* Re: [RFC PATCH V1 01/12] hvm/ioreq: Make x86's IOREQ feature common
  2020-08-04 13:52       ` Julien Grall
@ 2020-08-04 15:41         ` Jan Beulich
  2020-08-04 19:11         ` Stefano Stabellini
  1 sibling, 0 replies; 140+ messages in thread
From: Jan Beulich @ 2020-08-04 15:41 UTC (permalink / raw)
  To: Julien Grall
  Cc: 'Kevin Tian', 'Stefano Stabellini',
	'Wei Liu', paul, 'Andrew Cooper',
	'Ian Jackson', 'George Dunlap',
	'Tim Deegan', Oleksandr, 'Oleksandr Tyshchenko',
	'Julien Grall', 'Jun Nakajima',
	xen-devel, 'Roger Pau Monné'

On 04.08.2020 15:52, Julien Grall wrote:
> On 04/08/2020 12:10, Oleksandr wrote:
>> On 04.08.20 10:45, Paul Durrant wrote:
>>>> +static inline bool hvm_ioreq_needs_completion(const ioreq_t *ioreq)
>>>> +{
>>>> +    return ioreq->state == STATE_IOREQ_READY &&
>>>> +           !ioreq->data_is_ptr &&
>>>> +           (ioreq->type != IOREQ_TYPE_PIO || ioreq->dir != 
>>>> IOREQ_WRITE);
>>>> +}
>>> I don't think having this in common code is correct. The short-cut of 
>>> not completing PIO reads seems somewhat x86 specific. 
> 
> Hmmm, looking at the code, I think it doesn't wait for PIO writes to 
> complete (not read). Did I miss anything?

The point of the check isn't to determine whether to wait, but
what to do after having waited. Reads need a retry round through
the emulator (to store the result in the designated place),
while writes don't have such a requirement (and hence guest
execution can continue immediately in the general case).

Jan



* Re: [RFC PATCH V1 01/12] hvm/ioreq: Make x86's IOREQ feature common
  2020-08-04 13:52       ` Julien Grall
  2020-08-04 15:41         ` Jan Beulich
@ 2020-08-04 19:11         ` Stefano Stabellini
  2020-08-05  7:01           ` Jan Beulich
  2020-08-05  8:33           ` Julien Grall
  1 sibling, 2 replies; 140+ messages in thread
From: Stefano Stabellini @ 2020-08-04 19:11 UTC (permalink / raw)
  To: Julien Grall
  Cc: 'Kevin Tian', 'Stefano Stabellini',
	'Jun Nakajima', 'Wei Liu',
	paul, 'Andrew Cooper', 'Ian Jackson',
	'George Dunlap', 'Tim Deegan',
	Oleksandr, 'Oleksandr Tyshchenko', 'Julien Grall',
	'Jan Beulich', xen-devel, 'Roger Pau Monné'


On Tue, 4 Aug 2020, Julien Grall wrote:
> On 04/08/2020 12:10, Oleksandr wrote:
> > On 04.08.20 10:45, Paul Durrant wrote:
> > > > +static inline bool hvm_ioreq_needs_completion(const ioreq_t *ioreq)
> > > > +{
> > > > +    return ioreq->state == STATE_IOREQ_READY &&
> > > > +           !ioreq->data_is_ptr &&
> > > > +           (ioreq->type != IOREQ_TYPE_PIO || ioreq->dir !=
> > > > IOREQ_WRITE);
> > > > +}
> > > I don't think having this in common code is correct. The short-cut of not
> > > completing PIO reads seems somewhat x86 specific. 
> 
> Hmmm, looking at the code, I think it doesn't wait for PIO writes to complete
> (not read). Did I miss anything?
> 
> > Does ARM even 
> > > have the concept of PIO?
> > 
> > I am not 100% sure here, but it seems that doesn't have.
> 
> Technically, the PIOs exist on Arm, however they are accessed the same way as
> MMIO and will have a dedicated area defined by the HW.
> 
> AFAICT, on Arm64, they are only used for PCI IO Bar.
> 
> Now the question is whether we want to expose them to the Device Emulator as
> PIO or MMIO access. From a generic PoV, a DM shouldn't have to care about the
> architecture used. It should just be able to request a given IOport region.
> 
> So it may make sense to differentiate them in the common ioreq code as well.
> 
> I had a quick look at QEMU and wasn't able to tell if PIOs and MMIOs address
> space are different on Arm as well. Paul, Stefano, do you know what they are
> doing?

On the QEMU side, it looks like PIO (address_space_io) is used in
connection with the emulation of the "in" or "out" instructions, see
ioport.c:cpu_inb for instance. Some parts of PCI on QEMU emulate PIO
space regardless of the architecture, such as
hw/pci/pci_bridge.c:pci_bridge_initfn.

However, because there is no "in" and "out" on ARM, I don't think
address_space_io can be accessed. Specifically, there is no equivalent
for target/i386/misc_helper.c:helper_inb on ARM.

So I think PIO is unused on ARM in QEMU.


FYI the ioreq type for PCI conf space reads and writes is
IOREQ_TYPE_PCI_CONFIG (neither MMIO nor PIO) which is implemented as
pci_host_config_read_common/pci_host_config_write_common directly
(neither PIO nor MMIO).


It looks like PIO-specific things could be kept x86-specific, without
loss of functionalities on the ARM side.


> The point of the check isn't to determine whether to wait, but
> what to do after having waited. Reads need a retry round through
> the emulator (to store the result in the designated place),
> while writes don't have such a requirement (and hence guest
> execution can continue immediately in the general case).

The x86 code looks like this:

            rc = hvm_send_ioreq(s, &p, 0);
            if ( rc != X86EMUL_RETRY || currd->is_shutting_down )
                vio->io_req.state = STATE_IOREQ_NONE;
            else if ( !hvm_ioreq_needs_completion(&vio->io_req) )
                rc = X86EMUL_OKAY;

Basically hvm_send_ioreq is expected to return RETRY.
Then, if it is a PIO write operation only, it is turned into OKAY right
away. Otherwise, rc stays as RETRY.

So, normally, hvmemul_do_io is expected to return RETRY, because the
emulator is not done yet. Am I understanding the code correctly?

If so, who is handling RETRY on x86? I tried to follow the call chain
but ended up in the x86 emulator and got lost :-)


At some point later, after the emulator (QEMU) has completed the
request, handle_hvm_io_completion gets called, which ends up calling
handle_mmio(), finishing the job on the Xen side too.


In other words:
RETRY ==> emulation in progress
OKAY  ==> emulation completed


Is that correct?


* Re: [RFC PATCH V1 04/12] xen/arm: Introduce arch specific bits for IOREQ/DM features
  2020-08-04 14:01     ` Julien Grall
@ 2020-08-04 23:22       ` Stefano Stabellini
  2020-08-15 17:56       ` Julien Grall
  1 sibling, 0 replies; 140+ messages in thread
From: Stefano Stabellini @ 2020-08-04 23:22 UTC (permalink / raw)
  To: Julien Grall
  Cc: 'Stefano Stabellini', 'Wei Liu',
	paul, 'Andrew Cooper', 'Ian Jackson',
	'George Dunlap', 'Oleksandr Tyshchenko',
	'Oleksandr Tyshchenko', 'Julien Grall',
	'Jan Beulich', xen-devel, 'Daniel De Graaf',
	'Volodymyr Babchuk'

On Tue, 4 Aug 2020, Julien Grall wrote:
> On 04/08/2020 08:49, Paul Durrant wrote:
> > > diff --git a/tools/libxc/xc_dom_arm.c b/tools/libxc/xc_dom_arm.c
> > > index 931404c..b5fc066 100644
> > > --- a/tools/libxc/xc_dom_arm.c
> > > +++ b/tools/libxc/xc_dom_arm.c
> > > @@ -26,11 +26,19 @@
> > >   #include "xg_private.h"
> > >   #include "xc_dom.h"
> > > 
> > > -#define NR_MAGIC_PAGES 4
> > > +
> > >   #define CONSOLE_PFN_OFFSET 0
> > >   #define XENSTORE_PFN_OFFSET 1
> > >   #define MEMACCESS_PFN_OFFSET 2
> > >   #define VUART_PFN_OFFSET 3
> > > +#define IOREQ_SERVER_PFN_OFFSET 4
> > > +
> > > +#define NR_IOREQ_SERVER_PAGES 8
> > > +#define NR_MAGIC_PAGES (4 + NR_IOREQ_SERVER_PAGES)
> > > +
> > > +#define GUEST_MAGIC_BASE_PFN (GUEST_MAGIC_BASE >> XC_PAGE_SHIFT)
> > > +
> > > +#define special_pfn(x)  (GUEST_MAGIC_BASE_PFN + (x))
> > 
> > Why introduce 'magic pages' for Arm? It's quite a horrible hack that we have
> > begun to do away with by adding resource mapping.
> 
> This would require us to mandate at least Linux 4.17 in a domain that will run
> an IOREQ server. If we don't mandate this, the minimum version would be 4.10
> where DM OP was introduced.
> 
> Because of XSA-300, we could technically not safely run an IOREQ server with
> existing Linux. So it is probably OK to enforce the use of the acquire
> interface.

+1



* Re: [RFC PATCH V1 04/12] xen/arm: Introduce arch specific bits for IOREQ/DM features
  2020-08-03 18:21 ` [RFC PATCH V1 04/12] xen/arm: Introduce arch specific bits for IOREQ/DM features Oleksandr Tyshchenko
  2020-08-04  7:49   ` Paul Durrant
@ 2020-08-04 23:22   ` Stefano Stabellini
  2020-08-05  7:05     ` Jan Beulich
  2020-08-05  9:32     ` Julien Grall
  2020-08-05 14:12   ` Julien Grall
  2020-08-05 16:13   ` Jan Beulich
  3 siblings, 2 replies; 140+ messages in thread
From: Stefano Stabellini @ 2020-08-04 23:22 UTC (permalink / raw)
  To: Oleksandr Tyshchenko
  Cc: Stefano Stabellini, Julien Grall, Wei Liu, Andrew Cooper,
	Ian Jackson, George Dunlap, Oleksandr Tyshchenko, Julien Grall,
	Jan Beulich, xen-devel, Daniel De Graaf, Volodymyr Babchuk

On Mon, 3 Aug 2020, Oleksandr Tyshchenko wrote:
> From: Oleksandr Tyshchenko <oleksandr_tyshchenko@epam.com>
> 
> This patch makes it possible to forward Guest MMIO accesses
> to a device emulator on Arm and enables that support for
> Arm64.
> 
> Also update XSM code a bit to let DM op be used on Arm.
> New arch DM op will be introduced in the follow-up patch.
> 
> Please note, at the moment the build on Arm32 is broken
> (see cmpxchg usage in hvm_send_buffered_ioreq()) if someone

Speaking of buffered_ioreq, if I recall correctly, they were only used
for VGA-related things on x86. It looks like it is still true.

If so, do we need it on ARM? Note that I don't think we can get rid of
it from the interface as it is baked into ioreq, but it might be
possible to have a dummy implementation on ARM. Or maybe not: looking at
xen/common/hvm/ioreq.c it looks like it would be difficult to
disentangle bufioreq stuff from the rest of the code.


> wants to enable CONFIG_IOREQ_SERVER due to the lack of
> cmpxchg_64 support on Arm32.
> 
> Please note, this is a split/cleanup of Julien's PoC:
> "Add support for Guest IO forwarding to a device emulator"
> 
> Signed-off-by: Julien Grall <julien.grall@arm.com>
> Signed-off-by: Oleksandr Tyshchenko <oleksandr_tyshchenko@epam.com>

[...]


> @@ -2275,6 +2282,16 @@ static void check_for_vcpu_work(void)
>   */
>  void leave_hypervisor_to_guest(void)
>  {
> +#ifdef CONFIG_IOREQ_SERVER
> +    /*
> +     * XXX: Check the return. Shall we call that in
> +     * continue_running and context_switch instead?
> +     * The benefits would be to avoid calling
> +     * handle_hvm_io_completion on every return.
> +     */

Yeah, that could be a simple and good optimization


> +    local_irq_enable();
> +    handle_hvm_io_completion(current);
> +#endif
>      local_irq_disable();
>  
>      check_for_vcpu_work();
> diff --git a/xen/include/asm-arm/domain.h b/xen/include/asm-arm/domain.h
> index 4e2f582..e060b0a 100644
> --- a/xen/include/asm-arm/domain.h
> +++ b/xen/include/asm-arm/domain.h
> @@ -11,12 +11,64 @@
>  #include <asm/vgic.h>
>  #include <asm/vpl011.h>
>  #include <public/hvm/params.h>
> +#include <public/hvm/dm_op.h>
> +#include <public/hvm/ioreq.h>
>  #include <xen/serial.h>
>  #include <xen/rbtree.h>
>  
> +struct hvm_ioreq_page {
> +    gfn_t gfn;
> +    struct page_info *page;
> +    void *va;
> +};
> +
> +struct hvm_ioreq_vcpu {
> +    struct list_head list_entry;
> +    struct vcpu      *vcpu;
> +    evtchn_port_t    ioreq_evtchn;
> +    bool             pending;
> +};
> +
> +#define NR_IO_RANGE_TYPES (XEN_DMOP_IO_RANGE_PCI + 1)
> +#define MAX_NR_IO_RANGES  256
> +
> +#define MAX_NR_IOREQ_SERVERS 8
> +#define DEFAULT_IOSERVID 0
> +
> +struct hvm_ioreq_server {
> +    struct domain          *target, *emulator;
> +
> +    /* Lock to serialize toolstack modifications */
> +    spinlock_t             lock;
> +
> +    struct hvm_ioreq_page  ioreq;
> +    struct list_head       ioreq_vcpu_list;
> +    struct hvm_ioreq_page  bufioreq;
> +
> +    /* Lock to serialize access to buffered ioreq ring */
> +    spinlock_t             bufioreq_lock;
> +    evtchn_port_t          bufioreq_evtchn;
> +    struct rangeset        *range[NR_IO_RANGE_TYPES];
> +    bool                   enabled;
> +    uint8_t                bufioreq_handling;
> +};
> +
>  struct hvm_domain
>  {
>      uint64_t              params[HVM_NR_PARAMS];
> +
> +    /* Guest page range used for non-default ioreq servers */
> +    struct {
> +        unsigned long base;
> +        unsigned long mask;
> +        unsigned long legacy_mask; /* indexed by HVM param number */
> +    } ioreq_gfn;
> +
> +    /* Lock protects all other values in the sub-struct and the default */
> +    struct {
> +        spinlock_t              lock;
> +        struct hvm_ioreq_server *server[MAX_NR_IOREQ_SERVERS];
> +    } ioreq_server;
>  };
>  
>  #ifdef CONFIG_ARM_64
> @@ -93,6 +145,29 @@ struct arch_domain
>  #endif
>  }  __cacheline_aligned;
>  
> +enum hvm_io_completion {
> +    HVMIO_no_completion,
> +    HVMIO_mmio_completion,
> +    HVMIO_pio_completion,
> +    HVMIO_realmode_completion

realmode is an x86-ism (as is pio); I wonder if we could get rid of it on ARM


> +};
> +
> +struct hvm_vcpu_io {
> +    /* I/O request in flight to device model. */
> +    enum hvm_io_completion io_completion;
> +    ioreq_t                io_req;
> +
> +    /*
> +     * HVM emulation:
> +     *  Linear address @mmio_gla maps to MMIO physical frame @mmio_gpfn.
> +     *  The latter is known to be an MMIO frame (not RAM).
> +     *  This translation is only valid for accesses as per @mmio_access.
> +     */
> +    struct npfec        mmio_access;
> +    unsigned long       mmio_gla;
> +    unsigned long       mmio_gpfn;
> +};
> +
>  struct arch_vcpu
>  {
>      struct {
> @@ -206,6 +281,11 @@ struct arch_vcpu
>       */
>      bool need_flush_to_ram;
>  
> +    struct hvm_vcpu
> +    {
> +        struct hvm_vcpu_io hvm_io;
> +    } hvm;
> +
>  }  __cacheline_aligned;
>  
>  void vcpu_show_execution_state(struct vcpu *);
> diff --git a/xen/include/asm-arm/hvm/ioreq.h b/xen/include/asm-arm/hvm/ioreq.h
> new file mode 100644
> index 0000000..83a560c
> --- /dev/null
> +++ b/xen/include/asm-arm/hvm/ioreq.h
> @@ -0,0 +1,103 @@
> +/*
> + * hvm.h: Hardware virtual machine assist interface definitions.
> + *
> + * Copyright (c) 2016 Citrix Systems Inc.
> + * Copyright (c) 2019 Arm ltd.
> + *
> + * This program is free software; you can redistribute it and/or modify it
> + * under the terms and conditions of the GNU General Public License,
> + * version 2, as published by the Free Software Foundation.
> + *
> + * This program is distributed in the hope it will be useful, but WITHOUT
> + * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
> + * FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License for
> + * more details.
> + *
> + * You should have received a copy of the GNU General Public License along with
> + * this program; If not, see <http://www.gnu.org/licenses/>.
> + */
> +
> +#ifndef __ASM_ARM_HVM_IOREQ_H__
> +#define __ASM_ARM_HVM_IOREQ_H__
> +
> +#include <public/hvm/ioreq.h>
> +#include <public/hvm/dm_op.h>
> +
> +#define has_vpci(d) (false)
> +
> +bool handle_mmio(void);
> +
> +static inline bool handle_pio(uint16_t port, unsigned int size, int dir)
> +{
> +    /* XXX */
> +    BUG();
> +    return true;
> +}
> +
> +static inline paddr_t hvm_mmio_first_byte(const ioreq_t *p)
> +{
> +    return p->addr;
> +}
> +
> +static inline paddr_t hvm_mmio_last_byte(const ioreq_t *p)
> +{
> +    unsigned long size = p->size;
> +
> +    return p->addr + size - 1;
> +}
> +
> +struct hvm_ioreq_server;
> +
> +static inline int p2m_set_ioreq_server(struct domain *d,
> +                                       unsigned int flags,
> +                                       struct hvm_ioreq_server *s)
> +{
> +    return -EOPNOTSUPP;
> +}
> +
> +static inline void msix_write_completion(struct vcpu *v)
> +{
> +}
> +
> +static inline void handle_realmode_completion(void)
> +{
> +    ASSERT_UNREACHABLE();
> +}
> +
> +static inline void paging_mark_pfn_dirty(struct domain *d, pfn_t pfn)
> +{
> +}
> +
> +static inline void hvm_get_ioreq_server_range_type(struct domain *d,
> +                                                   ioreq_t *p,
> +                                                   uint8_t *type,
> +                                                   uint64_t *addr)
> +{
> +    *type = (p->type == IOREQ_TYPE_PIO) ?
> +             XEN_DMOP_IO_RANGE_PORT : XEN_DMOP_IO_RANGE_MEMORY;
> +    *addr = p->addr;
> +}
> +
> +static inline void arch_hvm_ioreq_init(struct domain *d)
> +{
> +}
> +
> +static inline void arch_hvm_ioreq_destroy(struct domain *d)
> +{
> +}
> +
> +#define IOREQ_IO_HANDLED     IO_HANDLED
> +#define IOREQ_IO_UNHANDLED   IO_UNHANDLED
> +#define IOREQ_IO_RETRY       IO_RETRY
> +
> +#endif /* __ASM_ARM_HVM_IOREQ_H__ */
> +
> +/*
> + * Local variables:
> + * mode: C
> + * c-file-style: "BSD"
> + * c-basic-offset: 4
> + * tab-width: 4
> + * indent-tabs-mode: nil
> + * End:
> + */
> diff --git a/xen/include/asm-arm/p2m.h b/xen/include/asm-arm/p2m.h
> index 5fdb6e8..5823f11 100644
> --- a/xen/include/asm-arm/p2m.h
> +++ b/xen/include/asm-arm/p2m.h
> @@ -385,10 +385,11 @@ static inline int set_foreign_p2m_entry(struct domain *d, unsigned long gfn,
>                                          mfn_t mfn)
>  {
>      /*
> -     * NOTE: If this is implemented then proper reference counting of
> -     *       foreign entries will need to be implemented.
> +     * XXX: handle properly reference. It looks like the page may not always
> +     * belong to d.

Just as a reference, and without taking away anything from the comment,
I think that QEMU is doing its own internal reference counting for these
mappings.


>       */
> -    return -EOPNOTSUPP;
> +
> +    return guest_physmap_add_entry(d, _gfn(gfn), mfn, 0, p2m_ram_rw);
>  }
>  
>  /*




* Re: [RFC PATCH V1 05/12] hvm/dm: Introduce xendevicemodel_set_irq_level DM op
  2020-08-03 18:21 ` [RFC PATCH V1 05/12] hvm/dm: Introduce xendevicemodel_set_irq_level DM op Oleksandr Tyshchenko
@ 2020-08-04 23:22   ` Stefano Stabellini
  2020-08-05  9:39     ` Julien Grall
  2020-08-05 16:15   ` Jan Beulich
  1 sibling, 1 reply; 140+ messages in thread
From: Stefano Stabellini @ 2020-08-04 23:22 UTC (permalink / raw)
  To: Oleksandr Tyshchenko
  Cc: Stefano Stabellini, Julien Grall, Wei Liu, Andrew Cooper,
	Ian Jackson, George Dunlap, Oleksandr Tyshchenko, Julien Grall,
	Jan Beulich, xen-devel, Volodymyr Babchuk

On Mon, 3 Aug 2020, Oleksandr Tyshchenko wrote:
> From: Oleksandr Tyshchenko <oleksandr_tyshchenko@epam.com>
> 
> This patch adds the ability for a device emulator to notify the other
> end (some entity running in the guest) using an SPI, and implements the
> Arm specific bits for it. The proposed interface allows the emulator to
> set the logical level of one of a domain's IRQ lines.
> 
> Please note, this is a split/cleanup of Julien's PoC:
> "Add support for Guest IO forwarding to a device emulator"
> 
> Signed-off-by: Julien Grall <julien.grall@arm.com>
> Signed-off-by: Oleksandr Tyshchenko <oleksandr_tyshchenko@epam.com>
> ---
>  tools/libs/devicemodel/core.c                   | 18 ++++++++++++++++++
>  tools/libs/devicemodel/include/xendevicemodel.h |  4 ++++
>  tools/libs/devicemodel/libxendevicemodel.map    |  1 +
>  xen/arch/arm/dm.c                               | 22 +++++++++++++++++++++-
>  xen/common/hvm/dm.c                             |  1 +
>  xen/include/public/hvm/dm_op.h                  | 15 +++++++++++++++
>  6 files changed, 60 insertions(+), 1 deletion(-)
> 
> diff --git a/tools/libs/devicemodel/core.c b/tools/libs/devicemodel/core.c
> index 4d40639..30bd79f 100644
> --- a/tools/libs/devicemodel/core.c
> +++ b/tools/libs/devicemodel/core.c
> @@ -430,6 +430,24 @@ int xendevicemodel_set_isa_irq_level(
>      return xendevicemodel_op(dmod, domid, 1, &op, sizeof(op));
>  }
>  
> +int xendevicemodel_set_irq_level(
> +    xendevicemodel_handle *dmod, domid_t domid, uint32_t irq,
> +    unsigned int level)

It is a pity that, when we already have xen_dm_op_set_pci_intx_level and
xen_dm_op_set_isa_irq_level, we need to add a third one, but from the
names alone I don't think we can reuse either of them.

It is very similar to set_isa_irq_level. We could almost rename
xendevicemodel_set_isa_irq_level to xendevicemodel_set_irq_level or,
better, just add an alias to it so that xendevicemodel_set_irq_level is
implemented by calling xendevicemodel_set_isa_irq_level. Honestly I am
not sure if it is worth doing it though. Any other opinions?


But I think we should plan for not needing two calls (one to set level
to 1, and one to set it to 0):
https://marc.info/?l=xen-devel&m=159535112027405


> +{
> +    struct xen_dm_op op;
> +    struct xen_dm_op_set_irq_level *data;
> +
> +    memset(&op, 0, sizeof(op));
> +
> +    op.op = XEN_DMOP_set_irq_level;
> +    data = &op.u.set_irq_level;
> +
> +    data->irq = irq;
> +    data->level = level;
> +
> +    return xendevicemodel_op(dmod, domid, 1, &op, sizeof(op));
> +}
> +
>  int xendevicemodel_set_pci_link_route(
>      xendevicemodel_handle *dmod, domid_t domid, uint8_t link, uint8_t irq)
>  {
> diff --git a/tools/libs/devicemodel/include/xendevicemodel.h b/tools/libs/devicemodel/include/xendevicemodel.h
> index e877f5c..c06b3c8 100644
> --- a/tools/libs/devicemodel/include/xendevicemodel.h
> +++ b/tools/libs/devicemodel/include/xendevicemodel.h
> @@ -209,6 +209,10 @@ int xendevicemodel_set_isa_irq_level(
>      xendevicemodel_handle *dmod, domid_t domid, uint8_t irq,
>      unsigned int level);
>  
> +int xendevicemodel_set_irq_level(
> +    xendevicemodel_handle *dmod, domid_t domid, unsigned int irq,
> +    unsigned int level);
> +
>  /**
>   * This function maps a PCI INTx line to a an IRQ line.
>   *
> diff --git a/tools/libs/devicemodel/libxendevicemodel.map b/tools/libs/devicemodel/libxendevicemodel.map
> index 561c62d..a0c3012 100644
> --- a/tools/libs/devicemodel/libxendevicemodel.map
> +++ b/tools/libs/devicemodel/libxendevicemodel.map
> @@ -32,6 +32,7 @@ VERS_1.2 {
>  	global:
>  		xendevicemodel_relocate_memory;
>  		xendevicemodel_pin_memory_cacheattr;
> +		xendevicemodel_set_irq_level;
>  } VERS_1.1;
>  
>  VERS_1.3 {
> diff --git a/xen/arch/arm/dm.c b/xen/arch/arm/dm.c
> index 2437099..8431805 100644
> --- a/xen/arch/arm/dm.c
> +++ b/xen/arch/arm/dm.c
> @@ -20,7 +20,27 @@
>  int arch_dm_op(struct xen_dm_op *op, struct domain *d,
>                 const struct dmop_args *op_args, bool *const_op)
>  {
> -    return -EOPNOTSUPP;
> +    int rc;
> +
> +    switch ( op->op )
> +    {
> +    case XEN_DMOP_set_irq_level:
> +    {
> +        const struct xen_dm_op_set_irq_level *data =
> +            &op->u.set_irq_level;
> +
> +        /* XXX: Handle check */
> +        vgic_inject_irq(d, NULL, data->irq, data->level);
> +        rc = 0;
> +        break;
> +    }
> +
> +    default:
> +        rc = -EOPNOTSUPP;
> +        break;
> +    }
> +
> +    return rc;
>  }
>  
>  /*
> diff --git a/xen/common/hvm/dm.c b/xen/common/hvm/dm.c
> index 09e9542..e2e1250 100644
> --- a/xen/common/hvm/dm.c
> +++ b/xen/common/hvm/dm.c
> @@ -47,6 +47,7 @@ static int dm_op(const struct dmop_args *op_args)
>          [XEN_DMOP_remote_shutdown]                  = sizeof(struct xen_dm_op_remote_shutdown),
>          [XEN_DMOP_relocate_memory]                  = sizeof(struct xen_dm_op_relocate_memory),
>          [XEN_DMOP_pin_memory_cacheattr]             = sizeof(struct xen_dm_op_pin_memory_cacheattr),
> +        [XEN_DMOP_set_irq_level]                    = sizeof(struct xen_dm_op_set_irq_level),
>      };
>  
>      rc = rcu_lock_remote_domain_by_id(op_args->domid, &d);
> diff --git a/xen/include/public/hvm/dm_op.h b/xen/include/public/hvm/dm_op.h
> index fd00e9d..c45d29e 100644
> --- a/xen/include/public/hvm/dm_op.h
> +++ b/xen/include/public/hvm/dm_op.h
> @@ -417,6 +417,20 @@ struct xen_dm_op_pin_memory_cacheattr {
>      uint32_t pad;
>  };
>  
> +/*
> + * XEN_DMOP_set_irq_level: Set the logical level of a one of a domain's
> + *                         IRQ lines.
> + * XXX Handle PPIs.
> + */
> +#define XEN_DMOP_set_irq_level 19
> +
> +struct xen_dm_op_set_irq_level {
> +    uint32_t irq;
> +    /* IN - Level: 0 -> deasserted, 1 -> asserted */
> +    uint8_t  level;
> +};
> +
> +
>  struct xen_dm_op {
>      uint32_t op;
>      uint32_t pad;
> @@ -430,6 +444,7 @@ struct xen_dm_op {
>          struct xen_dm_op_track_dirty_vram track_dirty_vram;
>          struct xen_dm_op_set_pci_intx_level set_pci_intx_level;
>          struct xen_dm_op_set_isa_irq_level set_isa_irq_level;
> +        struct xen_dm_op_set_irq_level set_irq_level;
>          struct xen_dm_op_set_pci_link_route set_pci_link_route;
>          struct xen_dm_op_modified_memory modified_memory;
>          struct xen_dm_op_set_mem_type set_mem_type;
> -- 
> 2.7.4
> 



* Re: [RFC PATCH V1 09/12] libxl: Handle virtio-mmio irq in more correct way
  2020-08-03 18:21 ` [RFC PATCH V1 09/12] libxl: Handle virtio-mmio irq in more correct way Oleksandr Tyshchenko
@ 2020-08-04 23:22   ` Stefano Stabellini
  2020-08-05 20:51     ` Oleksandr
  0 siblings, 1 reply; 140+ messages in thread
From: Stefano Stabellini @ 2020-08-04 23:22 UTC (permalink / raw)
  To: Oleksandr Tyshchenko
  Cc: Stefano Stabellini, Julien Grall, Wei Liu, Ian Jackson,
	Oleksandr Tyshchenko, Anthony PERARD, xen-devel

On Mon, 3 Aug 2020, Oleksandr Tyshchenko wrote:
> From: Oleksandr Tyshchenko <oleksandr_tyshchenko@epam.com>
> 
> This patch makes it possible to use device passthrough again.
> 
> Signed-off-by: Oleksandr Tyshchenko <oleksandr_tyshchenko@epam.com>
> ---
>  tools/libxl/libxl_arm.c | 33 +++++++++++++++++++++++----------
>  1 file changed, 23 insertions(+), 10 deletions(-)
> 
> diff --git a/tools/libxl/libxl_arm.c b/tools/libxl/libxl_arm.c
> index 620b499..4f748e3 100644
> --- a/tools/libxl/libxl_arm.c
> +++ b/tools/libxl/libxl_arm.c
> @@ -9,6 +9,10 @@
>  #include <assert.h>
>  #include <xen/device_tree_defs.h>
>  
> +#define GUEST_VIRTIO_MMIO_BASE  xen_mk_ullong(0x02000000)
> +#define GUEST_VIRTIO_MMIO_SIZE  xen_mk_ullong(0x200)
> +#define GUEST_VIRTIO_MMIO_SPI   33

They should be in xen/include/public/arch-arm.h

Is one interrupt enough if there are multiple virtio devices? Is it one
interrupt for all virtio devices, or one for each device?

Of course this patch should be folded in the patch to add virtio support
to libxl.


>  static const char *gicv_to_string(libxl_gic_version gic_version)
>  {
>      switch (gic_version) {
> @@ -27,8 +31,8 @@ int libxl__arch_domain_prepare_config(libxl__gc *gc,
>  {
>      uint32_t nr_spis = 0;
>      unsigned int i;
> -    uint32_t vuart_irq;
> -    bool vuart_enabled = false;
> +    uint32_t vuart_irq, virtio_irq;
> +    bool vuart_enabled = false, virtio_enabled = false;
>  
>      /*
>       * If pl011 vuart is enabled then increment the nr_spis to allow allocation
> @@ -40,6 +44,17 @@ int libxl__arch_domain_prepare_config(libxl__gc *gc,
>          vuart_enabled = true;
>      }
>  
> +    /*
> +     * XXX: Handle properly virtio
> +     * A proper solution would be the toolstack to allocate the interrupts
> +     * used by each virtio backend and let the backend now which one is used
> +     */
> +    if (libxl_defbool_val(d_config->b_info.arch_arm.virtio)) {
> +        nr_spis += (GUEST_VIRTIO_MMIO_SPI - 32) + 1;
> +        virtio_irq = GUEST_VIRTIO_MMIO_SPI;
> +        virtio_enabled = true;
> +    }
> +
>      for (i = 0; i < d_config->b_info.num_irqs; i++) {
>          uint32_t irq = d_config->b_info.irqs[i];
>          uint32_t spi;
> @@ -59,6 +74,12 @@ int libxl__arch_domain_prepare_config(libxl__gc *gc,
>              return ERROR_FAIL;
>          }
>  
> +        /* The same check as for vpl011 */
> +        if (virtio_enabled && irq == virtio_irq) {
> +            LOG(ERROR, "Physical IRQ %u conflicting with virtio SPI\n", irq);
> +            return ERROR_FAIL;
> +        }
> +
>          if (irq < 32)
>              continue;
>  
> @@ -68,10 +89,6 @@ int libxl__arch_domain_prepare_config(libxl__gc *gc,
>              nr_spis = spi + 1;
>      }
>  
> -
> -    /* XXX: Handle properly virtio */
> -    nr_spis = 1;
> -
>      LOG(DEBUG, "Configure the domain");
>  
>      config->arch.nr_spis = nr_spis;
> @@ -663,10 +680,6 @@ static int make_vpl011_uart_node(libxl__gc *gc, void *fdt,
>      return 0;
>  }
>  
> -#define GUEST_VIRTIO_MMIO_BASE  xen_mk_ullong(0x02000000)
> -#define GUEST_VIRTIO_MMIO_SIZE  xen_mk_ullong(0x200)
> -#define GUEST_VIRTIO_MMIO_SPI   33
> -
>  static int make_virtio_mmio_node(libxl__gc *gc, void *fdt)
>  {
>      int res;
> -- 
> 2.7.4
> 



* Re: [RFC PATCH V1 10/12] libxl: Add support for virtio-disk configuration
  2020-08-03 18:21 ` [RFC PATCH V1 10/12] libxl: Add support for virtio-disk configuration Oleksandr Tyshchenko
@ 2020-08-04 23:23   ` Stefano Stabellini
  2020-08-05 21:12     ` Oleksandr
  0 siblings, 1 reply; 140+ messages in thread
From: Stefano Stabellini @ 2020-08-04 23:23 UTC (permalink / raw)
  To: Oleksandr Tyshchenko
  Cc: Stefano Stabellini, Julien Grall, Wei Liu, Ian Jackson,
	Oleksandr Tyshchenko, Anthony PERARD, xen-devel

On Mon, 3 Aug 2020, Oleksandr Tyshchenko wrote:
> From: Oleksandr Tyshchenko <oleksandr_tyshchenko@epam.com>
> 
> This patch adds basic support for configuring and assisting a
> virtio-disk backend (emulator) which is intended to run outside of QEMU
> and could run in any domain.
> 
> Xenstore was chosen as the communication interface so that the emulator
> running in a non-toolstack domain is able to get its configuration
> either by reading Xenstore directly or by receiving command line
> parameters (an updated 'xl devd' running in the same domain would read
> Xenstore beforehand and call the backend executable with the required
> arguments).
> 
> An example of domain configuration (two disks are assigned to the guest,
> the latter is in readonly mode):
> 
> vdisk = [ 'backend=DomD, disks=rw:/dev/mmcblk0p3;ro:/dev/mmcblk1p3' ]
> 
> Where per-disk Xenstore entries are:
> - filename and readonly flag (configured via "vdisk" property)
> - base and irq (allocated dynamically)
> 
> Besides handling the 'visible' params described in the configuration
> file, the patch also allocates virtio-mmio specific ones for each device
> and writes them into Xenstore. The virtio-mmio params (irq and base) are
> unique per guest domain; they are allocated at domain creation time and
> passed through to the emulator. Each VirtIO device has at least one pair
> of these params.
> 
> TODO:
> 1. An extra "virtio" property could be removed.
> 2. Update documentation.
> 
> Signed-off-by: Oleksandr Tyshchenko <oleksandr_tyshchenko@epam.com>
> ---
>  tools/libxl/Makefile                 |   4 +-
>  tools/libxl/libxl_arm.c              |  63 +++++++++++++++----
>  tools/libxl/libxl_create.c           |   1 +
>  tools/libxl/libxl_internal.h         |   1 +
>  tools/libxl/libxl_types.idl          |  15 +++++
>  tools/libxl/libxl_types_internal.idl |   1 +
>  tools/libxl/libxl_virtio_disk.c      | 109 +++++++++++++++++++++++++++++++++
>  tools/xl/Makefile                    |   2 +-
>  tools/xl/xl.h                        |   3 +
>  tools/xl/xl_cmdtable.c               |  15 +++++
>  tools/xl/xl_parse.c                  | 115 +++++++++++++++++++++++++++++++++++
>  tools/xl/xl_virtio_disk.c            |  46 ++++++++++++++
>  12 files changed, 360 insertions(+), 15 deletions(-)
>  create mode 100644 tools/libxl/libxl_virtio_disk.c
>  create mode 100644 tools/xl/xl_virtio_disk.c
> 
> diff --git a/tools/libxl/Makefile b/tools/libxl/Makefile
> index 38cd43a..df94b13 100644
> --- a/tools/libxl/Makefile
> +++ b/tools/libxl/Makefile
> @@ -141,7 +141,9 @@ LIBXL_OBJS = flexarray.o libxl.o libxl_create.o libxl_dm.o libxl_pci.o \
>  			libxl_vtpm.o libxl_nic.o libxl_disk.o libxl_console.o \
>  			libxl_cpupool.o libxl_mem.o libxl_sched.o libxl_tmem.o \
>  			libxl_9pfs.o libxl_domain.o libxl_vdispl.o \
> -			libxl_pvcalls.o libxl_vsnd.o libxl_vkb.o $(LIBXL_OBJS-y)
> +			libxl_pvcalls.o libxl_vsnd.o libxl_vkb.o \
> +			libxl_virtio_disk.o $(LIBXL_OBJS-y)
> +
>  LIBXL_OBJS += libxl_genid.o
>  LIBXL_OBJS += _libxl_types.o libxl_flask.o _libxl_types_internal.o
>  
> diff --git a/tools/libxl/libxl_arm.c b/tools/libxl/libxl_arm.c
> index 4f748e3..469a8b0 100644
> --- a/tools/libxl/libxl_arm.c
> +++ b/tools/libxl/libxl_arm.c
> @@ -13,6 +13,12 @@
>  #define GUEST_VIRTIO_MMIO_SIZE  xen_mk_ullong(0x200)
>  #define GUEST_VIRTIO_MMIO_SPI   33
>  
> +#ifndef container_of
> +#define container_of(ptr, type, member) ({			\
> +        typeof( ((type *)0)->member ) *__mptr = (ptr);	\
> +        (type *)( (char *)__mptr - offsetof(type,member) );})
> +#endif
> +
>  static const char *gicv_to_string(libxl_gic_version gic_version)
>  {
>      switch (gic_version) {
> @@ -44,14 +50,32 @@ int libxl__arch_domain_prepare_config(libxl__gc *gc,
>          vuart_enabled = true;
>      }
>  
> -    /*
> -     * XXX: Handle properly virtio
> -     * A proper solution would be the toolstack to allocate the interrupts
> -     * used by each virtio backend and let the backend now which one is used
> -     */
>      if (libxl_defbool_val(d_config->b_info.arch_arm.virtio)) {
> -        nr_spis += (GUEST_VIRTIO_MMIO_SPI - 32) + 1;
> +        uint64_t virtio_base;
> +        libxl_device_virtio_disk *virtio_disk;
> +
> +        virtio_base = GUEST_VIRTIO_MMIO_BASE;
>          virtio_irq = GUEST_VIRTIO_MMIO_SPI;
> +
> +        if (!d_config->num_virtio_disks) {
> +            LOG(ERROR, "Virtio is enabled, but no Virtio devices present\n");
> +            return ERROR_FAIL;
> +        }
> +        virtio_disk = &d_config->virtio_disks[0];
> +
> +        for (i = 0; i < virtio_disk->num_disks; i++) {
> +            virtio_disk->disks[i].base = virtio_base;
> +            virtio_disk->disks[i].irq = virtio_irq;
> +
> +            LOG(DEBUG, "Allocate Virtio MMIO params: IRQ %u BASE 0x%"PRIx64,
> +                virtio_irq, virtio_base);
> +
> +            virtio_irq ++;
> +            virtio_base += GUEST_VIRTIO_MMIO_SIZE;
> +        }
> +        virtio_irq --;
> +
> +        nr_spis += (virtio_irq - 32) + 1;

It looks like it is an interrupt per device, which could lead to quite a
few of them being allocated.

The issue is that today we don't really handle virtual interrupts
different from physical interrupts in Xen. So, if we end up allocating
let's say 6 virtio interrupts for a domain, the chance of a clash with a
physical interrupt of a passthrough device is real.

I am not entirely sure how to solve it, but these are a few ideas:
- choosing virtio interrupts that are less likely to conflict (maybe >
  1000)
- make the virtio irq (optionally) configurable so that a user could
  override the default irq and specify one that doesn't conflict
- implementing support for virq != pirq (even the xl interface doesn't
  allow to specify the virq number for passthrough devices, see "irqs")



* Re: [RFC PATCH V1 11/12] libxl: Insert "dma-coherent" property into virtio-mmio device node
  2020-08-03 18:21 ` [RFC PATCH V1 11/12] libxl: Insert "dma-coherent" property into virtio-mmio device node Oleksandr Tyshchenko
@ 2020-08-04 23:23   ` Stefano Stabellini
  2020-08-05 20:35     ` Oleksandr
  0 siblings, 1 reply; 140+ messages in thread
From: Stefano Stabellini @ 2020-08-04 23:23 UTC (permalink / raw)
  To: Oleksandr Tyshchenko
  Cc: Stefano Stabellini, Julien Grall, Wei Liu, Ian Jackson,
	Oleksandr Tyshchenko, Anthony PERARD, xen-devel

On Mon, 3 Aug 2020, Oleksandr Tyshchenko wrote:
> From: Oleksandr Tyshchenko <oleksandr_tyshchenko@epam.com>
> 
> Without the "dma-coherent" property present in the virtio-mmio device
> node, the guest assumes it is non-coherent and makes non-cacheable
> accesses to the vring when the DMA API is used for vring operations.
> But a virtio-mmio device which runs on the host side makes cacheable
> accesses to the vring. This may all result in a loss of coherency
> between the guest and host.
> 
> With this patch we can avoid modifying the guest at all; otherwise we
> would need to force the VirtIO framework to not use the DMA API for
> vring operations.
> 
> Signed-off-by: Oleksandr Tyshchenko <oleksandr_tyshchenko@epam.com>

This should also be folded in the first patch for libxl

> ---
>  tools/libxl/libxl_arm.c | 3 +++
>  1 file changed, 3 insertions(+)
> 
> diff --git a/tools/libxl/libxl_arm.c b/tools/libxl/libxl_arm.c
> index 469a8b0..a68fb14 100644
> --- a/tools/libxl/libxl_arm.c
> +++ b/tools/libxl/libxl_arm.c
> @@ -726,6 +726,9 @@ static int make_virtio_mmio_node(libxl__gc *gc, void *fdt,
>      res = fdt_property_interrupts(gc, fdt, &intr, 1);
>      if (res) return res;
>  
> +    res = fdt_property(fdt, "dma-coherent", NULL, 0);
> +    if (res) return res;
> +
>      res = fdt_end_node(fdt);
>      if (res) return res;
>  
> -- 
> 2.7.4
> 



* Re: [RFC PATCH V1 01/12] hvm/ioreq: Make x86's IOREQ feature common
  2020-08-04 19:11         ` Stefano Stabellini
@ 2020-08-05  7:01           ` Jan Beulich
  2020-08-06  0:37             ` Stefano Stabellini
  2020-08-05  8:33           ` Julien Grall
  1 sibling, 1 reply; 140+ messages in thread
From: Jan Beulich @ 2020-08-05  7:01 UTC (permalink / raw)
  To: Stefano Stabellini
  Cc: 'Kevin Tian', Julien Grall, 'Wei Liu',
	paul, 'Andrew Cooper', 'Ian Jackson',
	'George Dunlap', 'Tim Deegan',
	Oleksandr, 'Oleksandr Tyshchenko', 'Julien Grall',
	'Jun Nakajima', xen-devel, 'Roger Pau Monné'

On 04.08.2020 21:11, Stefano Stabellini wrote:
>> The point of the check isn't to determine whether to wait, but
>> what to do after having waited. Reads need a retry round through
>> the emulator (to store the result in the designated place),
>> while writes don't have such a requirement (and hence guest
>> execution can continue immediately in the general case).
> 
> The x86 code looks like this:
> 
>             rc = hvm_send_ioreq(s, &p, 0);
>             if ( rc != X86EMUL_RETRY || currd->is_shutting_down )
>                 vio->io_req.state = STATE_IOREQ_NONE;
>             else if ( !hvm_ioreq_needs_completion(&vio->io_req) )
>                 rc = X86EMUL_OKAY;
> 
> Basically hvm_send_ioreq is expected to return RETRY.
> Then, if it is a PIO write operation only, it is turned into OKAY right
> away. Otherwise, rc stays as RETRY.
> 
> So, normally, hvmemul_do_io is expected to return RETRY, because the
> emulator is not done yet. Am I understanding the code correctly?

"The emulator" unfortunately is ambiguous here: Do you mean qemu
(or whichever else ioreq server) or the x86 emulator inside Xen?
There are various conditions leading to RETRY. As far as
hvm_send_ioreq() goes, it is expected to return RETRY whenever
some sort of response is to be expected (the most notable
exception being the hvm_send_buffered_ioreq() path), or when
submitting the request isn't possible in the first place.

> If so, who is handling RETRY on x86? I tried to follow the call chain
> but ended up in the x86 emulator and got lost :-)

Not sure I understand the question correctly, but I'll try an
answer nevertheless: hvm_send_ioreq() arranges for the vCPU to be
put to sleep (prepare_wait_on_xen_event_channel()). Once the event
channel got signaled (and vCPU unblocked), hvm_do_resume() ->
handle_hvm_io_completion() -> hvm_wait_for_io() then check whether
the wait reason has been satisfied (wait_on_xen_event_channel()),
and ...

> At some point later, after the emulator (QEMU) has completed the
> request, handle_hvm_io_completion gets called which ends up calling
> handle_mmio() finishing the job on the Xen side too.

..., as you say, handle_hvm_io_completion() invokes the retry of
the original operation (handle_mmio() or handle_pio() in
particular) if need be.

What's potentially confusing is that there's a second form of
retry, invoked by the x86 insn emulator itself when it needs to
split complex insns (the repeated string insns being the most
important example). This results in actually exiting back to guest
context without having advanced rIP, but after having updated
other register state suitably (to express the progress made so
far).

Jan



* Re: [RFC PATCH V1 04/12] xen/arm: Introduce arch specific bits for IOREQ/DM features
  2020-08-04 23:22   ` Stefano Stabellini
@ 2020-08-05  7:05     ` Jan Beulich
  2020-08-05 16:41       ` Stefano Stabellini
  2020-08-05  9:32     ` Julien Grall
  1 sibling, 1 reply; 140+ messages in thread
From: Jan Beulich @ 2020-08-05  7:05 UTC (permalink / raw)
  To: Stefano Stabellini
  Cc: Julien Grall, Wei Liu, Andrew Cooper, Ian Jackson, George Dunlap,
	Oleksandr Tyshchenko, Oleksandr Tyshchenko, Julien Grall,
	xen-devel, Daniel De Graaf, Volodymyr Babchuk

On 05.08.2020 01:22, Stefano Stabellini wrote:
> On Mon, 3 Aug 2020, Oleksandr Tyshchenko wrote:
>> --- a/xen/include/asm-arm/p2m.h
>> +++ b/xen/include/asm-arm/p2m.h
>> @@ -385,10 +385,11 @@ static inline int set_foreign_p2m_entry(struct domain *d, unsigned long gfn,
>>                                          mfn_t mfn)
>>  {
>>      /*
>> -     * NOTE: If this is implemented then proper reference counting of
>> -     *       foreign entries will need to be implemented.
>> +     * XXX: handle properly reference. It looks like the page may not always
>> +     * belong to d.
> 
> Just as a reference, and without taking away anything from the comment,
> I think that QEMU is doing its own internal reference counting for these
> mappings.

Which of course in no way replaces the need to do proper ref counting
in Xen. (Just FAOD, as I'm not sure why you've said what you've said.)

Jan



* Re: [RFC PATCH V1 01/12] hvm/ioreq: Make x86's IOREQ feature common
  2020-08-04 19:11         ` Stefano Stabellini
  2020-08-05  7:01           ` Jan Beulich
@ 2020-08-05  8:33           ` Julien Grall
  2020-08-06  0:37             ` Stefano Stabellini
  1 sibling, 1 reply; 140+ messages in thread
From: Julien Grall @ 2020-08-05  8:33 UTC (permalink / raw)
  To: Stefano Stabellini
  Cc: 'Kevin Tian', 'Jun Nakajima', 'Wei Liu',
	paul, 'Andrew Cooper', 'Ian Jackson',
	'George Dunlap', 'Tim Deegan',
	Oleksandr, 'Oleksandr Tyshchenko', 'Julien Grall',
	'Jan Beulich', xen-devel, 'Roger Pau Monné'

Hi,

On 04/08/2020 20:11, Stefano Stabellini wrote:
> On Tue, 4 Aug 2020, Julien Grall wrote:
>> On 04/08/2020 12:10, Oleksandr wrote:
>>> On 04.08.20 10:45, Paul Durrant wrote:
>>>>> +static inline bool hvm_ioreq_needs_completion(const ioreq_t *ioreq)
>>>>> +{
>>>>> +    return ioreq->state == STATE_IOREQ_READY &&
>>>>> +           !ioreq->data_is_ptr &&
>>>>> +           (ioreq->type != IOREQ_TYPE_PIO || ioreq->dir !=
>>>>> IOREQ_WRITE);
>>>>> +}
>>>> I don't think having this in common code is correct. The short-cut of not
>>>> completing PIO reads seems somewhat x86 specific.
>>
>> Hmmm, looking at the code, I think it doesn't wait for PIO writes to complete
>> (not read). Did I miss anything?
>>
>>> Does ARM even
>>>> have the concept of PIO?
>>>
>>> I am not 100% sure here, but it seems that it doesn't.
>>
>> Technically, the PIOs exist on Arm, however they are accessed the same way as
>> MMIO and will have a dedicated area defined by the HW.
>>
>> AFAICT, on Arm64, they are only used for PCI IO Bar.
>>
>> Now the question is whether we want to expose them to the Device Emulator as
>> PIO or MMIO access. From a generic PoV, a DM shouldn't have to care about the
>> architecture used. It should just be able to request a given IOport region.
>>
>> So it may make sense to differentiate them in the common ioreq code as well.
>>
>> I had a quick look at QEMU and wasn't able to tell if PIOs and MMIOs address
>> space are different on Arm as well. Paul, Stefano, do you know what they are
>> doing?
> 
> On the QEMU side, it looks like PIO (address_space_io) is used in
> connection with the emulation of the "in" or "out" instructions, see
> ioport.c:cpu_inb for instance. Some parts of PCI on QEMU emulate PIO
> space regardless of the architecture, such as
> hw/pci/pci_bridge.c:pci_bridge_initfn.
> 
> However, because there is no "in" and "out" on ARM, I don't think
> address_space_io can be accessed. Specifically, there is no equivalent
> for target/i386/misc_helper.c:helper_inb on ARM.
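
Coming back to the read-vs-write point: the quoted predicate can be rebuilt standalone (the ABI constants below are assumed to mirror public/hvm/ioreq.h), which confirms it is PIO *writes* whose completion is skipped:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Constants assumed to mirror public/hvm/ioreq.h. */
#define STATE_IOREQ_READY 1
#define IOREQ_TYPE_PIO    0
#define IOREQ_TYPE_COPY   1   /* MMIO */
#define IOREQ_WRITE       0
#define IOREQ_READ        1

typedef struct {
    uint8_t state, type, dir, data_is_ptr;
} ioreq_t;                    /* trimmed-down stand-in for the public struct */

/* Same logic as the quoted hunk. */
static bool hvm_ioreq_needs_completion(const ioreq_t *ioreq)
{
    return ioreq->state == STATE_IOREQ_READY &&
           !ioreq->data_is_ptr &&
           (ioreq->type != IOREQ_TYPE_PIO || ioreq->dir != IOREQ_WRITE);
}
```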

So how are PCI I/O BARs accessed? Surely they could be used on Arm, right?

Cheers,

-- 
Julien Grall



* Re: [RFC PATCH V1 04/12] xen/arm: Introduce arch specific bits for IOREQ/DM features
  2020-08-04 23:22   ` Stefano Stabellini
  2020-08-05  7:05     ` Jan Beulich
@ 2020-08-05  9:32     ` Julien Grall
  2020-08-05 15:41       ` Oleksandr
                         ` (2 more replies)
  1 sibling, 3 replies; 140+ messages in thread
From: Julien Grall @ 2020-08-05  9:32 UTC (permalink / raw)
  To: Stefano Stabellini, Oleksandr Tyshchenko
  Cc: Wei Liu, Andrew Cooper, Ian Jackson, George Dunlap,
	Oleksandr Tyshchenko, Julien Grall, Jan Beulich, xen-devel,
	Daniel De Graaf, Volodymyr Babchuk

Hi Stefano,

On 05/08/2020 00:22, Stefano Stabellini wrote:
> On Mon, 3 Aug 2020, Oleksandr Tyshchenko wrote:
>> From: Oleksandr Tyshchenko <oleksandr_tyshchenko@epam.com>
>>
>> This patch makes it possible to forward Guest MMIO accesses
>> to a device emulator on Arm and enables that support for
>> Arm64.
>>
>> Also update XSM code a bit to let DM op be used on Arm.
>> New arch DM op will be introduced in the follow-up patch.
>>
>> Please note, at the moment build on Arm32 is broken
>> (see cmpxchg usage in hvm_send_buffered_ioreq()) if someone
> 
> Speaking of buffered_ioreq, if I recall correctly, they were only used
> for VGA-related things on x86. It looks like it is still true.
> 
> If so, do we need it on ARM? Note that I don't think we can get rid of
> it from the interface as it is baked into ioreq, but it might be
> possible to have a dummy implementation on ARM. Or maybe not: looking at
> xen/common/hvm/ioreq.c it looks like it would be difficult to
> disentangle bufioreq stuff from the rest of the code.

We possibly don't need it right now. However, it could be used in the 
future (e.g. for a virtio notification doorbell).

>> @@ -2275,6 +2282,16 @@ static void check_for_vcpu_work(void)
>>    */
>>   void leave_hypervisor_to_guest(void)
>>   {
>> +#ifdef CONFIG_IOREQ_SERVER
>> +    /*
>> +     * XXX: Check the return. Shall we call that in
>> +     * continue_running and context_switch instead?
>> +     * The benefits would be to avoid calling
>> +     * handle_hvm_io_completion on every return.
>> +     */
> 
> Yeah, that could be a simple and good optimization

Well, it is not as simple as it sounds :). handle_hvm_io_completion() is 
the function in charge of marking the vCPU as waiting for I/O. So we would 
at least need to split the function.

I wrote this TODO because I wasn't sure about the complexity of 
handle_hvm_io_completion(current). Looking at it again, the main 
complexity is the looping over the IOREQ servers.

I think it would be better to optimize handle_hvm_io_completion() rather 
than trying to hack the context_switch() or continue_running().

[...]

>> diff --git a/xen/include/asm-arm/p2m.h b/xen/include/asm-arm/p2m.h
>> index 5fdb6e8..5823f11 100644
>> --- a/xen/include/asm-arm/p2m.h
>> +++ b/xen/include/asm-arm/p2m.h
>> @@ -385,10 +385,11 @@ static inline int set_foreign_p2m_entry(struct domain *d, unsigned long gfn,
>>                                           mfn_t mfn)
>>   {
>>       /*
>> -     * NOTE: If this is implemented then proper reference counting of
>> -     *       foreign entries will need to be implemented.
>> +     * XXX: handle properly reference. It looks like the page may not always
>> +     * belong to d.
> 
> Just as a reference, and without taking away anything from the comment,
> I think that QEMU is doing its own internal reference counting for these
> mappings.

I am not sure how this matters here? We can't really trust the DM to do 
the right thing if it is not running in dom0.

But, IIRC, the problem is that some of the pages don't belong to a 
domain, so it is not possible to treat them as foreign mappings (e.g. you 
wouldn't be able to grab a reference). This investigation was done a 
couple of years ago, so this may have changed in recent Xen.

As a side note, I am a bit surprised to see most of my original TODOs 
present in the code. What is the plan to solve them?

Cheers,

-- 
Julien Grall



* Re: [RFC PATCH V1 05/12] hvm/dm: Introduce xendevicemodel_set_irq_level DM op
  2020-08-04 23:22   ` Stefano Stabellini
@ 2020-08-05  9:39     ` Julien Grall
  2020-08-06  0:37       ` Stefano Stabellini
  0 siblings, 1 reply; 140+ messages in thread
From: Julien Grall @ 2020-08-05  9:39 UTC (permalink / raw)
  To: Stefano Stabellini, Oleksandr Tyshchenko
  Cc: Wei Liu, Andrew Cooper, Ian Jackson, George Dunlap,
	Oleksandr Tyshchenko, Julien Grall, Jan Beulich, xen-devel,
	Volodymyr Babchuk

Hi,

On 05/08/2020 00:22, Stefano Stabellini wrote:
> On Mon, 3 Aug 2020, Oleksandr Tyshchenko wrote:
>> From: Oleksandr Tyshchenko <oleksandr_tyshchenko@epam.com>
>>
>> This patch adds the ability for the device emulator to notify the
>> other end (some entity running in the guest) using an SPI and
>> implements the Arm specific bits for it. The proposed interface allows
>> the emulator to set the logical level of one of a domain's IRQ lines.
>>
>> Please note, this is a split/cleanup of Julien's PoC:
>> "Add support for Guest IO forwarding to a device emulator"
>>
>> Signed-off-by: Julien Grall <julien.grall@arm.com>
>> Signed-off-by: Oleksandr Tyshchenko <oleksandr_tyshchenko@epam.com>
>> ---
>>   tools/libs/devicemodel/core.c                   | 18 ++++++++++++++++++
>>   tools/libs/devicemodel/include/xendevicemodel.h |  4 ++++
>>   tools/libs/devicemodel/libxendevicemodel.map    |  1 +
>>   xen/arch/arm/dm.c                               | 22 +++++++++++++++++++++-
>>   xen/common/hvm/dm.c                             |  1 +
>>   xen/include/public/hvm/dm_op.h                  | 15 +++++++++++++++
>>   6 files changed, 60 insertions(+), 1 deletion(-)
>>
>> diff --git a/tools/libs/devicemodel/core.c b/tools/libs/devicemodel/core.c
>> index 4d40639..30bd79f 100644
>> --- a/tools/libs/devicemodel/core.c
>> +++ b/tools/libs/devicemodel/core.c
>> @@ -430,6 +430,24 @@ int xendevicemodel_set_isa_irq_level(
>>       return xendevicemodel_op(dmod, domid, 1, &op, sizeof(op));
>>   }
>>   
>> +int xendevicemodel_set_irq_level(
>> +    xendevicemodel_handle *dmod, domid_t domid, uint32_t irq,
>> +    unsigned int level)
> 
> It is a pity that having xen_dm_op_set_pci_intx_level and
> xen_dm_op_set_isa_irq_level already we need to add a third one, but from
> the names alone I don't think we can reuse either of them.

The problem is not the name...

> 
> It is very similar to set_isa_irq_level. We could almost rename
> xendevicemodel_set_isa_irq_level to xendevicemodel_set_irq_level or,
> better, just add an alias to it so that xendevicemodel_set_irq_level is
> implemented by calling xendevicemodel_set_isa_irq_level. Honestly I am
> not sure if it is worth doing it though. Any other opinions?

... the problem is that the interrupt field is only 8 bits wide. So we 
would only be able to cover IRQs 0 - 255.

It is not entirely clear how the existing subop could be extended 
without breaking existing callers.
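
To make the width problem concrete (the field layouts below are sketched for illustration, not the authoritative ABI):

```c
#include <assert.h>
#include <stdint.h>

/* Existing subop: the ISA IRQ number is only 8 bits wide. */
struct xen_dm_op_set_isa_irq_level {
    uint8_t isa_irq;
    uint8_t level;        /* 0 or 1 */
};

/* New subop proposed in patch 05: 32 bits cover Arm SPIs (32-1019). */
struct xen_dm_op_set_irq_level {
    uint32_t irq;
    uint32_t level;       /* 0 or 1 */
};
```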

> 
> 
> But I think we should plan for not needing two calls (one to set level
> to 1, and one to set it to 0):
> https://marc.info/?l=xen-devel&m=159535112027405

I am not sure I understand your suggestion here. Are you suggesting to 
remove the 'level' parameter?

Cheers,

-- 
Julien Grall



* Re: [RFC PATCH V1 01/12] hvm/ioreq: Make x86's IOREQ feature common
  2020-08-03 18:21 ` [RFC PATCH V1 01/12] hvm/ioreq: Make x86's IOREQ feature common Oleksandr Tyshchenko
  2020-08-04  7:45   ` Paul Durrant
@ 2020-08-05 13:30   ` Julien Grall
  2020-08-06 11:37     ` Oleksandr
  2020-08-05 16:15   ` Andrew Cooper
  2020-08-15 17:30   ` Julien Grall
  3 siblings, 1 reply; 140+ messages in thread
From: Julien Grall @ 2020-08-05 13:30 UTC (permalink / raw)
  To: Oleksandr Tyshchenko, xen-devel
  Cc: Kevin Tian, Stefano Stabellini, Jan Beulich, Wei Liu,
	Paul Durrant, Andrew Cooper, Ian Jackson, George Dunlap,
	Tim Deegan, Oleksandr Tyshchenko, Julien Grall, Jun Nakajima,
	Roger Pau Monné

Hi,

On 03/08/2020 19:21, Oleksandr Tyshchenko wrote:
> From: Oleksandr Tyshchenko <oleksandr_tyshchenko@epam.com>
> 
> As a lot of x86 code can be re-used on Arm later on, this patch
> splits IOREQ support into common and arch specific parts.
> 
> This support is going to be used on Arm to be able to run a device
> emulator outside of the Xen hypervisor.
> 
> Please note, this is a split/cleanup of Julien's PoC:
> "Add support for Guest IO forwarding to a device emulator"
> 
> Signed-off-by: Julien Grall <julien.grall@arm.com>
> Signed-off-by: Oleksandr Tyshchenko <oleksandr_tyshchenko@epam.com>
> ---
>   xen/arch/x86/Kconfig            |    1 +
>   xen/arch/x86/hvm/dm.c           |    2 +-
>   xen/arch/x86/hvm/emulate.c      |    2 +-
>   xen/arch/x86/hvm/hvm.c          |    2 +-
>   xen/arch/x86/hvm/io.c           |    2 +-
>   xen/arch/x86/hvm/ioreq.c        | 1431 +--------------------------------------
>   xen/arch/x86/hvm/stdvga.c       |    2 +-
>   xen/arch/x86/hvm/vmx/realmode.c |    1 +
>   xen/arch/x86/hvm/vmx/vvmx.c     |    2 +-
>   xen/arch/x86/mm.c               |    2 +-
>   xen/arch/x86/mm/shadow/common.c |    2 +-
>   xen/common/Kconfig              |    3 +
>   xen/common/Makefile             |    1 +
>   xen/common/hvm/Makefile         |    1 +
>   xen/common/hvm/ioreq.c          | 1430 ++++++++++++++++++++++++++++++++++++++
>   xen/include/asm-x86/hvm/ioreq.h |   45 +-
>   xen/include/asm-x86/hvm/vcpu.h  |    7 -
>   xen/include/xen/hvm/ioreq.h     |   89 +++
>   18 files changed, 1575 insertions(+), 1450 deletions(-)

That's quite a lot of code moved in a single patch. How can we check that 
the moved code is still correct? Is it a verbatim copy?

>   create mode 100644 xen/common/hvm/Makefile
>   create mode 100644 xen/common/hvm/ioreq.c
>   create mode 100644 xen/include/xen/hvm/ioreq.h

[...]

> +static bool hvm_wait_for_io(struct hvm_ioreq_vcpu *sv, ioreq_t *p)
> +{
> +    unsigned int prev_state = STATE_IOREQ_NONE;
> +
> +    while ( sv->pending )
> +    {
> +        unsigned int state = p->state;
> +
> +        smp_rmb();
> +
> +    recheck:
> +        if ( unlikely(state == STATE_IOREQ_NONE) )
> +        {
> +            /*
> +             * The only reason we should see this case is when an
> +             * emulator is dying and it races with an I/O being
> +             * requested.
> +             */
> +            hvm_io_assist(sv, ~0ul);
> +            break;
> +        }
> +
> +        if ( unlikely(state < prev_state) )
> +        {
> +            gdprintk(XENLOG_ERR, "Weird HVM ioreq state transition %u -> %u\n",
> +                     prev_state, state);
> +            sv->pending = false;
> +            domain_crash(sv->vcpu->domain);
> +            return false; /* bail */
> +        }
> +
> +        switch ( prev_state = state )
> +        {
> +        case STATE_IORESP_READY: /* IORESP_READY -> NONE */
> +            p->state = STATE_IOREQ_NONE;
> +            hvm_io_assist(sv, p->data);
> +            break;
> +        case STATE_IOREQ_READY:  /* IOREQ_{READY,INPROCESS} -> IORESP_READY */
> +        case STATE_IOREQ_INPROCESS:
> +            wait_on_xen_event_channel(sv->ioreq_evtchn,
> +                                      ({ state = p->state;
> +                                         smp_rmb();
> +                                         state != prev_state; }));
> +            goto recheck;

I recall some discussion on security@ about this specific code. An IOREQ 
server can be destroyed at any time. When destroying an IOREQ server, all 
the vCPUs will be paused to avoid races.

On x86, this was considered to be safe because
wait_on_xen_event_channel() will never return if the vCPU is re-scheduled.

However, on Arm, this function will return even after rescheduling. In 
this case, sv and p may point to invalid memory.

IIRC, the suggestion was to harden hvm_wait_for_io(). I guess we could 
fetch the sv and p after wait_on_xen_event_channel.
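
Roughly what I have in mind — a single-threaded toy model of the hardening, not the actual fix:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy model: never cache sv/p across a wait; look them up again instead. */

struct toy_ioreq_server { int state; };      /* 3 ~ STATE_IORESP_READY */

static struct toy_ioreq_server *the_server;  /* NULL once destroyed */

/* Stand-in for re-fetching sv/p from the domain's IOREQ server state. */
static struct toy_ioreq_server *get_pending_server(void)
{
    return the_server;
}

/*
 * Stand-in for wait_on_xen_event_channel(): on Arm this can return after
 * the vCPU was rescheduled, and the server may have been destroyed.
 */
static void fake_wait(bool destroy)
{
    if ( destroy )
        the_server = NULL;
}

static bool wait_for_io(bool destroy_during_wait)
{
    for ( ;; )
    {
        /* Re-fetched on every iteration, never held across a wait. */
        struct toy_ioreq_server *s = get_pending_server();

        if ( !s )
            return false;       /* server gone: bail instead of using s */
        if ( s->state == 3 )
            return true;        /* response ready */

        fake_wait(destroy_during_wait);
    }
}
```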

Any opinions?

Cheers,

-- 
Julien Grall



* Re: [RFC PATCH V1 04/12] xen/arm: Introduce arch specific bits for IOREQ/DM features
  2020-08-03 18:21 ` [RFC PATCH V1 04/12] xen/arm: Introduce arch specific bits for IOREQ/DM features Oleksandr Tyshchenko
  2020-08-04  7:49   ` Paul Durrant
  2020-08-04 23:22   ` Stefano Stabellini
@ 2020-08-05 14:12   ` Julien Grall
  2020-08-05 14:45     ` Jan Beulich
  2020-08-05 19:30     ` Oleksandr
  2020-08-05 16:13   ` Jan Beulich
  3 siblings, 2 replies; 140+ messages in thread
From: Julien Grall @ 2020-08-05 14:12 UTC (permalink / raw)
  To: Oleksandr Tyshchenko, xen-devel
  Cc: Stefano Stabellini, Wei Liu, Andrew Cooper, Ian Jackson,
	George Dunlap, Oleksandr Tyshchenko, Julien Grall, Jan Beulich,
	Daniel De Graaf, Volodymyr Babchuk

Hi,

On 03/08/2020 19:21, Oleksandr Tyshchenko wrote:
> From: Oleksandr Tyshchenko <oleksandr_tyshchenko@epam.com>
> 
> This patch makes it possible to forward Guest MMIO accesses
> to a device emulator on Arm and enables that support for
> Arm64.
> 
> Also update XSM code a bit to let DM op be used on Arm.
> New arch DM op will be introduced in the follow-up patch.
> 
> Please note, at the moment build on Arm32 is broken
> (see cmpxchg usage in hvm_send_buffered_ioreq()) if someone
> wants to enable CONFIG_IOREQ_SERVER due to the lack of
> cmpxchg_64 support on Arm32.
> 
> Please note, this is a split/cleanup of Julien's PoC:
> "Add support for Guest IO forwarding to a device emulator"
> 
> Signed-off-by: Julien Grall <julien.grall@arm.com>
> Signed-off-by: Oleksandr Tyshchenko <oleksandr_tyshchenko@epam.com>
> ---
>   tools/libxc/xc_dom_arm.c        |  25 +++++++---
>   xen/arch/arm/Kconfig            |   1 +
>   xen/arch/arm/Makefile           |   2 +
>   xen/arch/arm/dm.c               |  34 +++++++++++++
>   xen/arch/arm/domain.c           |   9 ++++
>   xen/arch/arm/hvm.c              |  46 +++++++++++++++++-
>   xen/arch/arm/io.c               |  67 +++++++++++++++++++++++++-
>   xen/arch/arm/ioreq.c            |  86 +++++++++++++++++++++++++++++++++
>   xen/arch/arm/traps.c            |  17 +++++++
>   xen/common/memory.c             |   5 +-
>   xen/include/asm-arm/domain.h    |  80 +++++++++++++++++++++++++++++++
>   xen/include/asm-arm/hvm/ioreq.h | 103 ++++++++++++++++++++++++++++++++++++++++
>   xen/include/asm-arm/mmio.h      |   1 +
>   xen/include/asm-arm/p2m.h       |   7 +--
>   xen/include/xsm/dummy.h         |   4 +-
>   xen/include/xsm/xsm.h           |   6 +--
>   xen/xsm/dummy.c                 |   2 +-
>   xen/xsm/flask/hooks.c           |   5 +-
>   18 files changed, 476 insertions(+), 24 deletions(-)
>   create mode 100644 xen/arch/arm/dm.c
>   create mode 100644 xen/arch/arm/ioreq.c
>   create mode 100644 xen/include/asm-arm/hvm/ioreq.h

It feels to me the patch is doing quite a few things that are indirectly 
related. Can this be split to make the review easier?

I would like at least the following split from the series:
    - The tools changes
    - The P2M changes
    - The HVMOP plumbing (if we still require them)

[...]

> diff --git a/xen/arch/arm/dm.c b/xen/arch/arm/dm.c
> new file mode 100644
> index 0000000..2437099
> --- /dev/null
> +++ b/xen/arch/arm/dm.c
> @@ -0,0 +1,34 @@
> +/*
> + * Copyright (c) 2019 Arm ltd.
> + *
> + * This program is free software; you can redistribute it and/or modify it
> + * under the terms and conditions of the GNU General Public License,
> + * version 2, as published by the Free Software Foundation.
> + *
> + * This program is distributed in the hope it will be useful, but WITHOUT
> + * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
> + * FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License for
> + * more details.
> + *
> + * You should have received a copy of the GNU General Public License along with
> + * this program; If not, see <http://www.gnu.org/licenses/>.
> + */
> +
> +#include <xen/hypercall.h>
> +#include <asm/vgic.h>

The list of includes sounds strange. Can we make sure to include only 
necessary headers and add the others when they are required?

> +
> +int arch_dm_op(struct xen_dm_op *op, struct domain *d,
> +               const struct dmop_args *op_args, bool *const_op)
> +{
> +    return -EOPNOTSUPP;
> +}
> +
> +/*
> + * Local variables:
> + * mode: C
> + * c-file-style: "BSD"
> + * c-basic-offset: 4
> + * tab-width: 4
> + * indent-tabs-mode: nil
> + * End:
> + */

>   long do_hvm_op(unsigned long op, XEN_GUEST_HANDLE_PARAM(void) arg)
>   {
>       long rc = 0;
> @@ -111,7 +155,7 @@ long do_hvm_op(unsigned long op, XEN_GUEST_HANDLE_PARAM(void) arg)
>               if ( rc )
>                   goto param_fail;
>   
> -            d->arch.hvm.params[a.index] = a.value;
> +            rc = hvmop_set_param(d, &a);
>           }
>           else
>           {
> diff --git a/xen/arch/arm/io.c b/xen/arch/arm/io.c
> index ae7ef96..436f669 100644
> --- a/xen/arch/arm/io.c
> +++ b/xen/arch/arm/io.c
> @@ -16,6 +16,7 @@
>    * GNU General Public License for more details.
>    */
>   
> +#include <xen/hvm/ioreq.h>
>   #include <xen/lib.h>
>   #include <xen/spinlock.h>
>   #include <xen/sched.h>
> @@ -107,6 +108,62 @@ static const struct mmio_handler *find_mmio_handler(struct domain *d,
>       return handler;
>   }
>   
> +#ifdef CONFIG_IOREQ_SERVER

Can we just implement this function in ioreq.c and provide a stub when 
!CONFIG_IOREQ_SERVER?

> +static enum io_state try_fwd_ioserv(struct cpu_user_regs *regs,
> +                                    struct vcpu *v, mmio_info_t *info)
> +{
> +    struct hvm_vcpu_io *vio = &v->arch.hvm.hvm_io;
> +    ioreq_t p = {
> +        .type = IOREQ_TYPE_COPY,
> +        .addr = info->gpa,
> +        .size = 1 << info->dabt.size,
> +        .count = 0,
> +        .dir = !info->dabt.write,
> +        .df = 0,         /* XXX: What's for? */
> +        .data = get_user_reg(regs, info->dabt.reg),
> +        .state = STATE_IOREQ_READY,
> +    };
> +    struct hvm_ioreq_server *s = NULL;
> +    enum io_state rc;
> +
> +    switch ( vio->io_req.state )
> +    {
> +    case STATE_IOREQ_NONE:
> +        break;
> +    default:
> +        printk("d%u wrong state %u\n", v->domain->domain_id,
> +               vio->io_req.state);

This will likely want to be a gprintk() or gdprintk() to avoid a guest 
spamming Xen.

> +        return IO_ABORT;
> +    }
> +
> +    s = hvm_select_ioreq_server(v->domain, &p);
> +    if ( !s )
> +        return IO_UNHANDLED;
> +
> +    if ( !info->dabt.valid )
> +    {
> +        printk("Valid bit not set\n");

Same here. However, I am not convinced this is a useful message to keep.

> +        return IO_ABORT;
> +    }
> +
> +    vio->io_req = p;
> +
> +    rc = hvm_send_ioreq(s, &p, 0);
> +    if ( rc != IO_RETRY || v->domain->is_shutting_down )
> +        vio->io_req.state = STATE_IOREQ_NONE;
> +    else if ( !hvm_ioreq_needs_completion(&vio->io_req) )
> +        rc = IO_HANDLED;
> +    else
> +        vio->io_completion = HVMIO_mmio_completion;
> +
> +    /* XXX: Decide what to do */

We want to understand how IO_RETRY can happen on x86 first. With that, 
we should be able to understand whether this can happen on Arm as well.

> +    if ( rc == IO_RETRY )
> +        rc = IO_HANDLED;
> +
> +    return rc;
> +}
> +#endif
> +
>   enum io_state try_handle_mmio(struct cpu_user_regs *regs,
>                                 const union hsr hsr,
>                                 paddr_t gpa)
> @@ -123,7 +180,15 @@ enum io_state try_handle_mmio(struct cpu_user_regs *regs,
>   
>       handler = find_mmio_handler(v->domain, info.gpa);
>       if ( !handler )
> -        return IO_UNHANDLED;
> +    {
> +        int rc = IO_UNHANDLED;
> +
> +#ifdef CONFIG_IOREQ_SERVER
> +        rc = try_fwd_ioserv(regs, v, &info);
> +#endif
> +
> +        return rc;
> +    }
>   
>       /* All the instructions used on emulated MMIO region should be valid */
>       if ( !dabt.valid )
> diff --git a/xen/arch/arm/ioreq.c b/xen/arch/arm/ioreq.c
> new file mode 100644
> index 0000000..a9cc839
> --- /dev/null
> +++ b/xen/arch/arm/ioreq.c
> @@ -0,0 +1,86 @@
> +/*
> + * arm/ioreq.c: hardware virtual machine I/O emulation
> + *
> + * Copyright (c) 2019 Arm ltd.
> + *
> + * This program is free software; you can redistribute it and/or modify it
> + * under the terms and conditions of the GNU General Public License,
> + * version 2, as published by the Free Software Foundation.
> + *
> + * This program is distributed in the hope it will be useful, but WITHOUT
> + * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
> + * FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License for
> + * more details.
> + *
> + * You should have received a copy of the GNU General Public License along with
> + * this program; If not, see <http://www.gnu.org/licenses/>.
> + */
> +
> +#include <xen/ctype.h>
> +#include <xen/hvm/ioreq.h>
> +#include <xen/init.h>
> +#include <xen/lib.h>
> +#include <xen/trace.h>
> +#include <xen/sched.h>
> +#include <xen/irq.h>
> +#include <xen/softirq.h>
> +#include <xen/domain.h>
> +#include <xen/domain_page.h>
> +#include <xen/event.h>
> +#include <xen/paging.h>
> +#include <xen/vpci.h>
> +
> +#include <public/hvm/dm_op.h>
> +#include <public/hvm/ioreq.h>
> +
> +bool handle_mmio(void)

The name of the function is pretty generic and can be confusing on Arm 
(we already have a try_handle_mmio()).

What is this function supposed to do?

> +{
> +    struct vcpu *v = current;
> +    struct cpu_user_regs *regs = guest_cpu_user_regs();
> +    const union hsr hsr = { .bits = regs->hsr };
> +    const struct hsr_dabt dabt = hsr.dabt;
> +    /* Code is similar to handle_read */
> +    uint8_t size = (1 << dabt.size) * 8;
> +    register_t r = v->arch.hvm.hvm_io.io_req.data;
> +
> +    /* We should only be here on Guest Data Abort */
> +    ASSERT(dabt.ec == HSR_EC_DATA_ABORT_LOWER_EL);
> +
> +    /* We are done with the IO */
> +    /* XXX: Is it the right place? */
> +    v->arch.hvm.hvm_io.io_req.state = STATE_IOREQ_NONE;
> +
> +    /* XXX: Do we need to take care of write here ? */
> +    if ( dabt.write )
> +        return true;
> +
> +    /*
> +     * Sign extend if required.
> +     * Note that we expect the read handler to have zeroed the bits
> +     * outside the requested access size.
> +     */
> +    if ( dabt.sign && (r & (1UL << (size - 1))) )
> +    {
> +        /*
> +         * We are relying on register_t using the same as
> +         * an unsigned long in order to keep the 32-bit assembly
> +         * code smaller.
> +         */
> +        BUILD_BUG_ON(sizeof(register_t) != sizeof(unsigned long));
> +        r |= (~0UL) << size;
> +    }
> +
> +    set_user_reg(regs, dabt.reg, r);
> +
> +    return true;
> +}
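
FWIW, the sign extension above can be checked standalone (assuming register_t is unsigned long, which the BUILD_BUG_ON enforces):

```c
#include <assert.h>

/* Standalone rebuild of the sign-extension step in handle_mmio() above. */
static unsigned long sign_extend(unsigned long r, unsigned int size_bits,
                                 int sign)
{
    if ( sign && (r & (1UL << (size_bits - 1))) )
        r |= (~0UL) << size_bits;
    return r;
}
```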

[...]

> diff --git a/xen/common/memory.c b/xen/common/memory.c
> index 9283e5e..0000477 100644
> --- a/xen/common/memory.c
> +++ b/xen/common/memory.c
> @@ -8,6 +8,7 @@
>    */
>   
>   #include <xen/domain_page.h>
> +#include <xen/hvm/ioreq.h>
>   #include <xen/types.h>
>   #include <xen/lib.h>
>   #include <xen/mm.h>
> @@ -30,10 +31,6 @@
>   #include <public/memory.h>
>   #include <xsm/xsm.h>
>   
> -#ifdef CONFIG_IOREQ_SERVER
> -#include <xen/hvm/ioreq.h>
> -#endif
> -

Why do you remove something you just introduced?

>   #ifdef CONFIG_X86
>   #include <asm/guest.h>
>   #endif
> diff --git a/xen/include/asm-arm/domain.h b/xen/include/asm-arm/domain.h
> index 4e2f582..e060b0a 100644
> --- a/xen/include/asm-arm/domain.h
> +++ b/xen/include/asm-arm/domain.h
> @@ -11,12 +11,64 @@
>   #include <asm/vgic.h>
>   #include <asm/vpl011.h>
>   #include <public/hvm/params.h>
> +#include <public/hvm/dm_op.h>
> +#include <public/hvm/ioreq.h>
>   #include <xen/serial.h>
>   #include <xen/rbtree.h>
>   
> +struct hvm_ioreq_page {
> +    gfn_t gfn;
> +    struct page_info *page;
> +    void *va;
> +};

AFAICT all the structures/defines you introduced here are used by the 
common code. So it feels to me they should be defined in a common header.

> +
> +struct hvm_ioreq_vcpu {
> +    struct list_head list_entry;
> +    struct vcpu      *vcpu;
> +    evtchn_port_t    ioreq_evtchn;
> +    bool             pending;
> +};
> +
> +#define NR_IO_RANGE_TYPES (XEN_DMOP_IO_RANGE_PCI + 1)
> +#define MAX_NR_IO_RANGES  256
> +
> +#define MAX_NR_IOREQ_SERVERS 8
> +#define DEFAULT_IOSERVID 0
> +
> +struct hvm_ioreq_server {
> +    struct domain          *target, *emulator;
> +
> +    /* Lock to serialize toolstack modifications */
> +    spinlock_t             lock;
> +
> +    struct hvm_ioreq_page  ioreq;
> +    struct list_head       ioreq_vcpu_list;
> +    struct hvm_ioreq_page  bufioreq;
> +
> +    /* Lock to serialize access to buffered ioreq ring */
> +    spinlock_t             bufioreq_lock;
> +    evtchn_port_t          bufioreq_evtchn;
> +    struct rangeset        *range[NR_IO_RANGE_TYPES];
> +    bool                   enabled;
> +    uint8_t                bufioreq_handling;
> +};
> +
>   struct hvm_domain
>   {
>       uint64_t              params[HVM_NR_PARAMS];
> +
> +    /* Guest page range used for non-default ioreq servers */
> +    struct {
> +        unsigned long base;
> +        unsigned long mask;
> +        unsigned long legacy_mask; /* indexed by HVM param number */
> +    } ioreq_gfn;
> +
> +    /* Lock protects all other values in the sub-struct and the default */
> +    struct {
> +        spinlock_t              lock;
> +        struct hvm_ioreq_server *server[MAX_NR_IOREQ_SERVERS];
> +    } ioreq_server;
>   };
>   
>   #ifdef CONFIG_ARM_64
> @@ -93,6 +145,29 @@ struct arch_domain
>   #endif
>   }  __cacheline_aligned;
>   
> +enum hvm_io_completion {
> +    HVMIO_no_completion,
> +    HVMIO_mmio_completion,
> +    HVMIO_pio_completion,
> +    HVMIO_realmode_completion
> +};
> +
> +struct hvm_vcpu_io {
> +    /* I/O request in flight to device model. */
> +    enum hvm_io_completion io_completion;
> +    ioreq_t                io_req;
> +
> +    /*
> +     * HVM emulation:
> +     *  Linear address @mmio_gla maps to MMIO physical frame @mmio_gpfn.
> +     *  The latter is known to be an MMIO frame (not RAM).
> +     *  This translation is only valid for accesses as per @mmio_access.
> +     */
> +    struct npfec        mmio_access;
> +    unsigned long       mmio_gla;
> +    unsigned long       mmio_gpfn;
> +};
> +
>   struct arch_vcpu
>   {
>       struct {
> @@ -206,6 +281,11 @@ struct arch_vcpu
>        */
>       bool need_flush_to_ram;
>   
> +    struct hvm_vcpu
> +    {
> +        struct hvm_vcpu_io hvm_io;
> +    } hvm;
> +
>   }  __cacheline_aligned;
>   
>   void vcpu_show_execution_state(struct vcpu *);
> diff --git a/xen/include/asm-arm/hvm/ioreq.h b/xen/include/asm-arm/hvm/ioreq.h
> new file mode 100644
> index 0000000..83a560c
> --- /dev/null
> +++ b/xen/include/asm-arm/hvm/ioreq.h
> @@ -0,0 +1,103 @@
> +/*
> + * hvm.h: Hardware virtual machine assist interface definitions.
> + *
> + * Copyright (c) 2016 Citrix Systems Inc.
> + * Copyright (c) 2019 Arm ltd.
> + *
> + * This program is free software; you can redistribute it and/or modify it
> + * under the terms and conditions of the GNU General Public License,
> + * version 2, as published by the Free Software Foundation.
> + *
> + * This program is distributed in the hope it will be useful, but WITHOUT
> + * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
> + * FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License for
> + * more details.
> + *
> + * You should have received a copy of the GNU General Public License along with
> + * this program; If not, see <http://www.gnu.org/licenses/>.
> + */
> +
> +#ifndef __ASM_ARM_HVM_IOREQ_H__
> +#define __ASM_ARM_HVM_IOREQ_H__
> +
> +#include <public/hvm/ioreq.h>
> +#include <public/hvm/dm_op.h>
> +
> +#define has_vpci(d) (false)

It feels to me this wants to be defined in vpci.h.

> +
> +bool handle_mmio(void);
> +
> +static inline bool handle_pio(uint16_t port, unsigned int size, int dir)
> +{
> +    /* XXX */

Can you expand this TODO? What do you expect to do?

> +    BUG();
> +    return true;
> +}
> +
> +static inline paddr_t hvm_mmio_first_byte(const ioreq_t *p)
> +{
> +    return p->addr;
> +}

I understand that the x86 version is more complex as it checks p->df. 
However, aside from reducing the complexity, I am not sure why we would 
want to diverge from it.

After all, IOREQ is now meant to be a common feature.

> +
> +static inline paddr_t hvm_mmio_last_byte(const ioreq_t *p)
> +{
> +    unsigned long size = p->size;
> +
> +    return p->addr + size - 1;
> +}

Same.
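
For reference, the x86 variants account for p->df roughly like this (reconstructed here for illustration — double-check against asm-x86/hvm/ioreq.h):

```c
#include <assert.h>
#include <stdint.h>

typedef struct {
    uint64_t addr;
    uint32_t size;
    uint32_t count;
    uint8_t  df;      /* x86 direction flag: addresses descend when set */
} ioreq_t;            /* trimmed-down stand-in for the public struct */

static uint64_t mmio_first_byte(const ioreq_t *p)
{
    return p->df ? p->addr - (p->count - 1ul) * p->size : p->addr;
}

static uint64_t mmio_last_byte(const ioreq_t *p)
{
    uint64_t size = p->size;

    return p->df ? p->addr + size - 1 : p->addr + (p->count * size) - 1;
}
```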

> +
> +struct hvm_ioreq_server;

Why do we need a forward declaration?

> +
> +static inline int p2m_set_ioreq_server(struct domain *d,
> +                                       unsigned int flags,
> +                                       struct hvm_ioreq_server *s)
> +{
> +    return -EOPNOTSUPP;
> +}

This should be defined in p2m.h. But I am not even sure what it is meant 
for. Can you expand it?

> +
> +static inline void msix_write_completion(struct vcpu *v)
> +{
> +}
> +
> +static inline void handle_realmode_completion(void)
> +{
> +    ASSERT_UNREACHABLE();
> +}

realmode is very x86-specific, so I don't think this function should be 
called from common code. It might be worth considering splitting 
handle_hvm_io_completion() into two parts: common and arch-specific.
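One possible shape for such a split (a compilable sketch only — the enum
values, function names and types here are invented for illustration, not
taken from the Xen tree):

```c
#include <assert.h>
#include <stdbool.h>

/* Illustrative completion kinds; the realmode one only exists on x86. */
enum io_completion_sim {
    COMPLETION_none,
    COMPLETION_mmio,
    COMPLETION_realmode,
};

/*
 * Arch-specific leg of the hypothetical split (Arm flavour shown):
 * anything x86-only simply cannot occur here.
 */
static bool arch_io_completion(enum io_completion_sim c)
{
    (void)c;
    return false;   /* no arch-specific completions on Arm */
}

/* Common leg: handle the generic completions, defer the rest to the arch. */
static bool handle_io_completion_split(enum io_completion_sim c)
{
    switch ( c )
    {
    case COMPLETION_none:
    case COMPLETION_mmio:
        return true;    /* handled by common code */
    default:
        return arch_io_completion(c);
    }
}
```

With this shape the common code never needs to name realmode at all;
x86 would simply provide an arch_io_completion() that does.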

> +
> +static inline void paging_mark_pfn_dirty(struct domain *d, pfn_t pfn)
> +{
> +}

This will want to be stubbed in asm-arm/paging.h.

> +
> +static inline void hvm_get_ioreq_server_range_type(struct domain *d,
> +                                                   ioreq_t *p,
> +                                                   uint8_t *type,
> +                                                   uint64_t *addr)
> +{
> +    *type = (p->type == IOREQ_TYPE_PIO) ?
> +             XEN_DMOP_IO_RANGE_PORT : XEN_DMOP_IO_RANGE_MEMORY;
> +    *addr = p->addr;
> +}
> +
> +static inline void arch_hvm_ioreq_init(struct domain *d)
> +{
> +}
> +
> +static inline void arch_hvm_ioreq_destroy(struct domain *d)
> +{
> +}
> +
> +#define IOREQ_IO_HANDLED     IO_HANDLED
> +#define IOREQ_IO_UNHANDLED   IO_UNHANDLED
> +#define IOREQ_IO_RETRY       IO_RETRY
> +
> +#endif /* __ASM_X86_HVM_IOREQ_H__ */

s/X86/ARM/

> +
> +/*
> + * Local variables:
> + * mode: C
> + * c-file-style: "BSD"
> + * c-basic-offset: 4
> + * tab-width: 4
> + * indent-tabs-mode: nil
> + * End:
> + */
> diff --git a/xen/include/asm-arm/mmio.h b/xen/include/asm-arm/mmio.h
> index 8dbfb27..7ab873c 100644
> --- a/xen/include/asm-arm/mmio.h
> +++ b/xen/include/asm-arm/mmio.h
> @@ -37,6 +37,7 @@ enum io_state
>       IO_ABORT,       /* The IO was handled by the helper and led to an abort. */
>       IO_HANDLED,     /* The IO was successfully handled by the helper. */
>       IO_UNHANDLED,   /* The IO was not handled by the helper. */
> +    IO_RETRY,       /* Retry the emulation for some reason */
>   };
>   
>   typedef int (*mmio_read_t)(struct vcpu *v, mmio_info_t *info,
> diff --git a/xen/include/asm-arm/p2m.h b/xen/include/asm-arm/p2m.h
> index 5fdb6e8..5823f11 100644
> --- a/xen/include/asm-arm/p2m.h
> +++ b/xen/include/asm-arm/p2m.h
> @@ -385,10 +385,11 @@ static inline int set_foreign_p2m_entry(struct domain *d, unsigned long gfn,
>                                           mfn_t mfn)
>   {
>       /*
> -     * NOTE: If this is implemented then proper reference counting of
> -     *       foreign entries will need to be implemented.
> +     * XXX: handle properly reference. It looks like the page may not always
> +     * belong to d.
>        */
> -    return -EOPNOTSUPP;
> +
> +    return guest_physmap_add_entry(d, _gfn(gfn), mfn, 0, p2m_ram_rw);

Treating foreign mappings as p2m_ram_rw is more of a hack than a real 
fix. I have answered this separately (see my reply to Stefano's e-mail), 
so we can continue the conversation there.

Cheers,

-- 
Julien Grall


^ permalink raw reply	[flat|nested] 140+ messages in thread

* Re: [RFC PATCH V1 04/12] xen/arm: Introduce arch specific bits for IOREQ/DM features
  2020-08-05 14:12   ` Julien Grall
@ 2020-08-05 14:45     ` Jan Beulich
  2020-08-05 19:30     ` Oleksandr
  1 sibling, 0 replies; 140+ messages in thread
From: Jan Beulich @ 2020-08-05 14:45 UTC (permalink / raw)
  To: Julien Grall
  Cc: Stefano Stabellini, Wei Liu, Andrew Cooper, Ian Jackson,
	George Dunlap, Oleksandr Tyshchenko, Oleksandr Tyshchenko,
	Julien Grall, xen-devel, Daniel De Graaf, Volodymyr Babchuk

On 05.08.2020 16:12, Julien Grall wrote:
> On 03/08/2020 19:21, Oleksandr Tyshchenko wrote:
>> --- /dev/null
>> +++ b/xen/include/asm-arm/hvm/ioreq.h
>> @@ -0,0 +1,103 @@
>> +/*
>> + * hvm.h: Hardware virtual machine assist interface definitions.
>> + *
>> + * Copyright (c) 2016 Citrix Systems Inc.
>> + * Copyright (c) 2019 Arm ltd.
>> + *
>> + * This program is free software; you can redistribute it and/or modify it
>> + * under the terms and conditions of the GNU General Public License,
>> + * version 2, as published by the Free Software Foundation.
>> + *
>> + * This program is distributed in the hope it will be useful, but WITHOUT
>> + * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
>> + * FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License for
>> + * more details.
>> + *
>> + * You should have received a copy of the GNU General Public License along with
>> + * this program; If not, see <http://www.gnu.org/licenses/>.
>> + */
>> +
>> +#ifndef __ASM_ARM_HVM_IOREQ_H__
>> +#define __ASM_ARM_HVM_IOREQ_H__
>> +
>> +#include <public/hvm/ioreq.h>
>> +#include <public/hvm/dm_op.h>
>> +
>> +#define has_vpci(d) (false)
> 
> It feels to me this wants to be defined in vcpi.h.

On x86 it wants to live together with a bunch of other has_v*()
macros.

Jan



* Re: [RFC PATCH V1 04/12] xen/arm: Introduce arch specific bits for IOREQ/DM features
  2020-08-05  9:32     ` Julien Grall
@ 2020-08-05 15:41       ` Oleksandr
  2020-08-06 10:19         ` Julien Grall
  2020-08-10 18:09       ` Oleksandr
  2020-08-11 17:09       ` Oleksandr
  2 siblings, 1 reply; 140+ messages in thread
From: Oleksandr @ 2020-08-05 15:41 UTC (permalink / raw)
  To: Julien Grall
  Cc: Stefano Stabellini, Wei Liu, Andrew Cooper, Ian Jackson,
	George Dunlap, Oleksandr Tyshchenko, Julien Grall, Jan Beulich,
	xen-devel, Daniel De Graaf, Volodymyr Babchuk


On 05.08.20 12:32, Julien Grall wrote:

Hi Julien.

> Hi Stefano,
>
> On 05/08/2020 00:22, Stefano Stabellini wrote:
>> On Mon, 3 Aug 2020, Oleksandr Tyshchenko wrote:
>>> From: Oleksandr Tyshchenko <oleksandr_tyshchenko@epam.com>
>>>
>>> This patch makes it possible to forward Guest MMIO accesses
>>> to a device emulator on Arm and enables that support for
>>> Arm64.
>>>
>>> Also update XSM code a bit to let DM op be used on Arm.
>>> New arch DM op will be introduced in the follow-up patch.
>>>
>>> Please note, at the moment build on Arm32 is broken
>>> (see cmpxchg usage in hvm_send_buffered_ioreq()) if someone
>>
>> Speaking of buffered_ioreq, if I recall correctly, they were only used
>> for VGA-related things on x86. It looks like it is still true.
>>
>> If so, do we need it on ARM? Note that I don't think we can get rid of
>> it from the interface as it is baked into ioreq, but it might be
>> possible to have a dummy implementation on ARM. Or maybe not: looking at
>> xen/common/hvm/ioreq.c it looks like it would be difficult to
>> disentangle bufioreq stuff from the rest of the code.
>
> We possibly don't need it right now. However, this could possibly be 
> used in the future (e.g. virtio notification doorbell).
>
>>> @@ -2275,6 +2282,16 @@ static void check_for_vcpu_work(void)
>>>    */
>>>   void leave_hypervisor_to_guest(void)
>>>   {
>>> +#ifdef CONFIG_IOREQ_SERVER
>>> +    /*
>>> +     * XXX: Check the return. Shall we call that in
>>> +     * continue_running and context_switch instead?
>>> +     * The benefits would be to avoid calling
>>> +     * handle_hvm_io_completion on every return.
>>> +     */
>>
>> Yeah, that could be a simple and good optimization
>
> Well, it is not as simple as it sounds :). handle_hvm_io_completion() 
> is the function in charge of marking the vCPU as waiting for I/O, so we 
> would at least need to split the function.
>
> I wrote this TODO because I wasn't sure about the complexity of 
> handle_hvm_io_completion(current). Looking at it again, the main 
> complexity is the looping over the IOREQ servers.
>
> I think it would be better to optimize handle_hvm_io_completion() 
> rather than trying to hack the context_switch() or continue_running().
>
> [...]
>
>>> diff --git a/xen/include/asm-arm/p2m.h b/xen/include/asm-arm/p2m.h
>>> index 5fdb6e8..5823f11 100644
>>> --- a/xen/include/asm-arm/p2m.h
>>> +++ b/xen/include/asm-arm/p2m.h
>>> @@ -385,10 +385,11 @@ static inline int set_foreign_p2m_entry(struct 
>>> domain *d, unsigned long gfn,
>>>                                           mfn_t mfn)
>>>   {
>>>       /*
>>> -     * NOTE: If this is implemented then proper reference counting of
>>> -     *       foreign entries will need to be implemented.
>>> +     * XXX: handle properly reference. It looks like the page may 
>>> not always
>>> +     * belong to d.
>>
>> Just as a reference, and without taking away anything from the comment,
>> I think that QEMU is doing its own internal reference counting for these
>> mappings.
>
> I am not sure how this matters here? We can't really trust the DM to 
> do the right thing if it is not running in dom0.
>
> But, IIRC, the problem is that some of the pages don't belong to a 
> domain, so it is not possible to treat them as foreign mappings (e.g. 
> you wouldn't be able to grab a reference). This investigation was done 
> a couple of years ago, so this may have changed in recent Xen.
>
> As a side note, I am a bit surprised to see most of my original TODOs 
> present in the code. What is the plan to solve them?
The plan is to solve the most critical TODOs in the current series, and 
the rest in a follow-up series, if there are no objections of course. 
Any pointers on how to solve them properly would be much appreciated. 
Unfortunately, I currently have only a weak understanding of how they 
should be fixed. I see at least 3 major TODOs here:

1. handle the reference properly in set_foreign_p2m_entry()
2. optimize handle_hvm_io_completion()
3. handle IO_RETRY properly in try_fwd_ioserv()
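For TODO 2, one cheap direction would be an early exit before the
IOREQ-server loop, so the common return-to-guest path only pays for the
walk when I/O is actually in flight. A compilable sketch of the idea
only — the real state lives in struct hvm_vcpu_io, and all names below
are stand-ins:

```c
#include <assert.h>
#include <stdbool.h>

enum { SIM_STATE_IOREQ_NONE, SIM_STATE_IOREQ_READY };

struct vcpu_sim {
    int io_req_state;
    int servers_walked;   /* counts how often the expensive loop ran */
};

/* Stand-in for the per-server loop in handle_hvm_io_completion(). */
static bool walk_ioreq_servers(struct vcpu_sim *v)
{
    v->servers_walked++;
    /* ... per-server wait/completion handling would go here ... */
    return true;
}

static bool handle_io_completion_sim(struct vcpu_sim *v)
{
    if ( v->io_req_state == SIM_STATE_IOREQ_NONE )
        return true;               /* fast path: no I/O pending */
    return walk_ioreq_servers(v);  /* slow path as today */
}
```

On every other return to the guest the cost would then be a single
field check rather than a walk over all registered servers.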


-- 
Regards,

Oleksandr Tyshchenko




* Re: [RFC PATCH V1 04/12] xen/arm: Introduce arch specific bits for IOREQ/DM features
  2020-08-03 18:21 ` [RFC PATCH V1 04/12] xen/arm: Introduce arch specific bits for IOREQ/DM features Oleksandr Tyshchenko
                     ` (2 preceding siblings ...)
  2020-08-05 14:12   ` Julien Grall
@ 2020-08-05 16:13   ` Jan Beulich
  2020-08-05 19:47     ` Oleksandr
  3 siblings, 1 reply; 140+ messages in thread
From: Jan Beulich @ 2020-08-05 16:13 UTC (permalink / raw)
  To: Oleksandr Tyshchenko, xen-devel
  Cc: Stefano Stabellini, Julien Grall, Wei Liu, Andrew Cooper,
	Ian Jackson, George Dunlap, Oleksandr Tyshchenko, Julien Grall,
	Daniel De Graaf, Volodymyr Babchuk

On 03.08.2020 20:21, Oleksandr Tyshchenko wrote:
> --- a/xen/include/xsm/dummy.h
> +++ b/xen/include/xsm/dummy.h
> @@ -713,14 +713,14 @@ static XSM_INLINE int xsm_pmu_op (XSM_DEFAULT_ARG struct domain *d, unsigned int
>      }
>  }
>  
> +#endif /* CONFIG_X86 */
> +
>  static XSM_INLINE int xsm_dm_op(XSM_DEFAULT_ARG struct domain *d)
>  {
>      XSM_ASSERT_ACTION(XSM_DM_PRIV);
>      return xsm_default_action(action, current->domain, d);
>  }
>  
> -#endif /* CONFIG_X86 */
> -
>  #ifdef CONFIG_ARGO
>  static XSM_INLINE int xsm_argo_enable(const struct domain *d)
>  {
> diff --git a/xen/include/xsm/xsm.h b/xen/include/xsm/xsm.h
> index a80bcf3..2a9b39d 100644
> --- a/xen/include/xsm/xsm.h
> +++ b/xen/include/xsm/xsm.h
> @@ -177,8 +177,8 @@ struct xsm_operations {
>      int (*ioport_permission) (struct domain *d, uint32_t s, uint32_t e, uint8_t allow);
>      int (*ioport_mapping) (struct domain *d, uint32_t s, uint32_t e, uint8_t allow);
>      int (*pmu_op) (struct domain *d, unsigned int op);
> -    int (*dm_op) (struct domain *d);
>  #endif
> +    int (*dm_op) (struct domain *d);
>      int (*xen_version) (uint32_t cmd);
>      int (*domain_resource_map) (struct domain *d);
>  #ifdef CONFIG_ARGO
> @@ -688,13 +688,13 @@ static inline int xsm_pmu_op (xsm_default_t def, struct domain *d, unsigned int
>      return xsm_ops->pmu_op(d, op);
>  }
>  
> +#endif /* CONFIG_X86 */
> +
>  static inline int xsm_dm_op(xsm_default_t def, struct domain *d)
>  {
>      return xsm_ops->dm_op(d);
>  }
>  
> -#endif /* CONFIG_X86 */
> -
>  static inline int xsm_xen_version (xsm_default_t def, uint32_t op)
>  {
>      return xsm_ops->xen_version(op);
> diff --git a/xen/xsm/dummy.c b/xen/xsm/dummy.c
> index d4cce68..e3afd06 100644
> --- a/xen/xsm/dummy.c
> +++ b/xen/xsm/dummy.c
> @@ -148,8 +148,8 @@ void __init xsm_fixup_ops (struct xsm_operations *ops)
>      set_to_dummy_if_null(ops, ioport_permission);
>      set_to_dummy_if_null(ops, ioport_mapping);
>      set_to_dummy_if_null(ops, pmu_op);
> -    set_to_dummy_if_null(ops, dm_op);
>  #endif
> +    set_to_dummy_if_null(ops, dm_op);
>      set_to_dummy_if_null(ops, xen_version);
>      set_to_dummy_if_null(ops, domain_resource_map);
>  #ifdef CONFIG_ARGO
> diff --git a/xen/xsm/flask/hooks.c b/xen/xsm/flask/hooks.c
> index a314bf8..645192a 100644
> --- a/xen/xsm/flask/hooks.c
> +++ b/xen/xsm/flask/hooks.c
> @@ -1662,14 +1662,13 @@ static int flask_pmu_op (struct domain *d, unsigned int op)
>          return -EPERM;
>      }
>  }
> +#endif /* CONFIG_X86 */
>  
>  static int flask_dm_op(struct domain *d)
>  {
>      return current_has_perm(d, SECCLASS_HVM, HVM__DM);
>  }
>  
> -#endif /* CONFIG_X86 */
> -
>  static int flask_xen_version (uint32_t op)
>  {
>      u32 dsid = domain_sid(current->domain);
> @@ -1872,8 +1871,8 @@ static struct xsm_operations flask_ops = {
>      .ioport_permission = flask_ioport_permission,
>      .ioport_mapping = flask_ioport_mapping,
>      .pmu_op = flask_pmu_op,
> -    .dm_op = flask_dm_op,
>  #endif
> +    .dm_op = flask_dm_op,
>      .xen_version = flask_xen_version,
>      .domain_resource_map = flask_domain_resource_map,
>  #ifdef CONFIG_ARGO

All of this looks to belong into patch 2?

Jan



* Re: [RFC PATCH V1 05/12] hvm/dm: Introduce xendevicemodel_set_irq_level DM op
  2020-08-03 18:21 ` [RFC PATCH V1 05/12] hvm/dm: Introduce xendevicemodel_set_irq_level DM op Oleksandr Tyshchenko
  2020-08-04 23:22   ` Stefano Stabellini
@ 2020-08-05 16:15   ` Jan Beulich
  2020-08-05 22:12     ` Oleksandr
  1 sibling, 1 reply; 140+ messages in thread
From: Jan Beulich @ 2020-08-05 16:15 UTC (permalink / raw)
  To: Oleksandr Tyshchenko
  Cc: Stefano Stabellini, Julien Grall, Wei Liu, Andrew Cooper,
	Ian Jackson, George Dunlap, Oleksandr Tyshchenko, Julien Grall,
	xen-devel, Volodymyr Babchuk

On 03.08.2020 20:21, Oleksandr Tyshchenko wrote:
> --- a/xen/include/public/hvm/dm_op.h
> +++ b/xen/include/public/hvm/dm_op.h
> @@ -417,6 +417,20 @@ struct xen_dm_op_pin_memory_cacheattr {
>      uint32_t pad;
>  };
>  
> +/*
> + * XEN_DMOP_set_irq_level: Set the logical level of a one of a domain's
> + *                         IRQ lines.
> + * XXX Handle PPIs.
> + */
> +#define XEN_DMOP_set_irq_level 19
> +
> +struct xen_dm_op_set_irq_level {
> +    uint32_t irq;
> +    /* IN - Level: 0 -> deasserted, 1 -> asserted */
> +    uint8_t  level;
> +};

If this is the way to go (I've seen other discussion going on),
please make sure you add explicit padding fields and ...

> +
> +
>  struct xen_dm_op {

... you don't add double blank lines.
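Concretely, explicit padding would make the layout unambiguous —
something like the following (illustrative only; the surrounding
xen_dm_op union plumbing is omitted):

```c
#include <assert.h>
#include <stdint.h>

struct xen_dm_op_set_irq_level {
    uint32_t irq;
    /* IN - Level: 0 -> deasserted, 1 -> asserted */
    uint8_t  level;
    uint8_t  pad[3];   /* explicit padding: the 8-byte layout no longer
                          depends on compiler/ABI choices */
};
```

With the padding spelled out, the structure's size and field offsets are
the same on every compiler and architecture that consumes the interface.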

Jan



* Re: [RFC PATCH V1 01/12] hvm/ioreq: Make x86's IOREQ feature common
  2020-08-03 18:21 ` [RFC PATCH V1 01/12] hvm/ioreq: Make x86's IOREQ feature common Oleksandr Tyshchenko
  2020-08-04  7:45   ` Paul Durrant
  2020-08-05 13:30   ` Julien Grall
@ 2020-08-05 16:15   ` Andrew Cooper
  2020-08-06  8:20     ` Oleksandr
  2020-08-15 17:30   ` Julien Grall
  3 siblings, 1 reply; 140+ messages in thread
From: Andrew Cooper @ 2020-08-05 16:15 UTC (permalink / raw)
  To: Oleksandr Tyshchenko, xen-devel
  Cc: Kevin Tian, Stefano Stabellini, Julien Grall, Jan Beulich,
	Wei Liu, Paul Durrant, Tim Deegan, Ian Jackson, George Dunlap,
	Oleksandr Tyshchenko, Julien Grall, Jun Nakajima,
	Roger Pau Monné

On 03/08/2020 19:21, Oleksandr Tyshchenko wrote:
> diff --git a/xen/common/Makefile b/xen/common/Makefile
> index 06881d0..f6fc3f8 100644
> --- a/xen/common/Makefile
> +++ b/xen/common/Makefile
> @@ -70,6 +70,7 @@ extra-y := symbols-dummy.o
>  
>  obj-$(CONFIG_COVERAGE) += coverage/
>  obj-y += sched/
> +obj-$(CONFIG_IOREQ_SERVER) += hvm/
>  obj-$(CONFIG_UBSAN) += ubsan/
>  
>  obj-$(CONFIG_NEEDS_LIBELF) += libelf/
> diff --git a/xen/common/hvm/Makefile b/xen/common/hvm/Makefile
> new file mode 100644
> index 0000000..326215d
> --- /dev/null
> +++ b/xen/common/hvm/Makefile
> @@ -0,0 +1 @@
> +obj-y += ioreq.o
> diff --git a/xen/common/hvm/ioreq.c b/xen/common/hvm/ioreq.c
> new file mode 100644
> index 0000000..7e1fa23
> --- /dev/null
> +++ b/xen/common/hvm/ioreq.c
> <snip>

HVM is an internal detail of arch specific code.  It should not escape
into common code.

From x86's point of view, there is nothing conceptually wrong with
having an IOREQ server for PV guests, although it is very unlikely at
this point that adding support would be a good use of time.

Please make this into a proper top-level common set of functionality.

~Andrew



* Re: [RFC PATCH V1 07/12] A collection of tweaks to be able to run emulator in driver domain
  2020-08-03 18:21 ` [RFC PATCH V1 07/12] A collection of tweaks to be able to run emulator in driver domain Oleksandr Tyshchenko
@ 2020-08-05 16:19   ` Jan Beulich
  2020-08-05 16:40     ` Paul Durrant
  0 siblings, 1 reply; 140+ messages in thread
From: Jan Beulich @ 2020-08-05 16:19 UTC (permalink / raw)
  To: Oleksandr Tyshchenko, Paul Durrant
  Cc: Stefano Stabellini, Julien Grall, Wei Liu, Andrew Cooper,
	Ian Jackson, George Dunlap, Oleksandr Tyshchenko, xen-devel,
	Daniel De Graaf

On 03.08.2020 20:21, Oleksandr Tyshchenko wrote:
> From: Oleksandr Tyshchenko <oleksandr_tyshchenko@epam.com>
> 
> Trying to run an emulator in a driver domain, I ran into various
> issues, mostly policy-related. So this patch tries to resolve them
> all, probably in a hackish way. I would like to get feedback on how
> to implement them properly, as having an emulator in a driver domain
> is a completely valid use-case.

From going over the comments I can only derive you want to run
an emulator in a driver domain, which doesn't really make sense
to me. A driver domain has a different purpose after all. If
instead you mean it to be run in just some other domain (which
also isn't the domain controlling the target), then there may
be more infrastructure changes needed.

Paul - was/is your standalone ioreq server (demu?) able to run
in other than the domain controlling a guest?

Jan



* Re: [RFC PATCH V1 08/12] xen/arm: Invalidate qemu mapcache on XENMEM_decrease_reservation
  2020-08-03 18:21 ` [RFC PATCH V1 08/12] xen/arm: Invalidate qemu mapcache on XENMEM_decrease_reservation Oleksandr Tyshchenko
@ 2020-08-05 16:21   ` Jan Beulich
  2020-08-06 11:35     ` Julien Grall
  0 siblings, 1 reply; 140+ messages in thread
From: Jan Beulich @ 2020-08-05 16:21 UTC (permalink / raw)
  To: Oleksandr Tyshchenko
  Cc: Stefano Stabellini, Julien Grall, Wei Liu, Andrew Cooper,
	Ian Jackson, George Dunlap, Oleksandr Tyshchenko, xen-devel,
	Volodymyr Babchuk

On 03.08.2020 20:21, Oleksandr Tyshchenko wrote:
> --- a/xen/common/memory.c
> +++ b/xen/common/memory.c
> @@ -1652,6 +1652,12 @@ long do_memory_op(unsigned long cmd, XEN_GUEST_HANDLE_PARAM(void) arg)
>          break;
>      }
>  
> +    /* x86 already sets the flag in hvm_memory_op() */
> +#if defined(CONFIG_ARM64) && defined(CONFIG_IOREQ_SERVER)
> +    if ( op == XENMEM_decrease_reservation )
> +        curr_d->arch.hvm.qemu_mapcache_invalidate = true;
> +#endif

Doesn't the comment already indicate a route towards an approach that 
doesn't require altering common code?
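One way to keep common code untouched would be a per-arch hook called
from do_memory_op(), mirroring how x86 sets the flag from
hvm_memory_op(). A compilable sketch — the hook name, the simplified
domain structure, and the op constant's value are all invented here for
illustration:

```c
#include <assert.h>
#include <stdbool.h>

#define SIM_XENMEM_decrease_reservation 1   /* value illustrative only */

/* Stand-in for the relevant bit of d->arch.hvm on Arm. */
struct domain_sim {
    bool qemu_mapcache_invalidate;
};

/*
 * Hypothetical per-arch hook: common do_memory_op() would call this
 * unconditionally, and each arch decides whether anything needs doing.
 * x86 would keep its existing behaviour; Arm sets the flag here.
 */
static void arch_memory_op_completed(struct domain_sim *d, unsigned long op)
{
    if ( op == SIM_XENMEM_decrease_reservation )
        d->qemu_mapcache_invalidate = true;
}
```

The #if defined(CONFIG_ARM64) && defined(CONFIG_IOREQ_SERVER) block in
xen/common/memory.c then disappears into the arch implementation.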

Jan



* RE: [RFC PATCH V1 07/12] A collection of tweaks to be able to run emulator in driver domain
  2020-08-05 16:19   ` Jan Beulich
@ 2020-08-05 16:40     ` Paul Durrant
  2020-08-06  9:22       ` Oleksandr
  0 siblings, 1 reply; 140+ messages in thread
From: Paul Durrant @ 2020-08-05 16:40 UTC (permalink / raw)
  To: 'Jan Beulich', 'Oleksandr Tyshchenko'
  Cc: 'Stefano Stabellini', 'Julien Grall',
	'Wei Liu', 'Andrew Cooper', 'Ian Jackson',
	'George Dunlap', 'Oleksandr Tyshchenko',
	xen-devel, 'Daniel De Graaf'

> -----Original Message-----
> From: Jan Beulich <jbeulich@suse.com>
> Sent: 05 August 2020 17:20
> To: Oleksandr Tyshchenko <olekstysh@gmail.com>; Paul Durrant <paul@xen.org>
> Cc: xen-devel@lists.xenproject.org; Oleksandr Tyshchenko <oleksandr_tyshchenko@epam.com>; Andrew
> Cooper <andrew.cooper3@citrix.com>; George Dunlap <george.dunlap@citrix.com>; Ian Jackson
> <ian.jackson@eu.citrix.com>; Julien Grall <julien@xen.org>; Stefano Stabellini
> <sstabellini@kernel.org>; Wei Liu <wl@xen.org>; Daniel De Graaf <dgdegra@tycho.nsa.gov>
> Subject: Re: [RFC PATCH V1 07/12] A collection of tweaks to be able to run emulator in driver domain
> 
> On 03.08.2020 20:21, Oleksandr Tyshchenko wrote:
> > From: Oleksandr Tyshchenko <oleksandr_tyshchenko@epam.com>
> >
> > Trying to run an emulator in a driver domain, I ran into various
> > issues, mostly policy-related. So this patch tries to resolve them
> > all, probably in a hackish way. I would like to get feedback on how
> > to implement them properly, as having an emulator in a driver domain
> > is a completely valid use-case.
> 
> From going over the comments I can only derive you want to run
> an emulator in a driver domain, which doesn't really make sense
> to me. A driver domain has a different purpose after all. If
> instead you mean it to be run in just some other domain (which
> also isn't the domain controlling the target), then there may
> be more infrastructure changes needed.
> 
> Paul - was/is your standalone ioreq server (demu?) able to run
> in other than the domain controlling a guest?
> 

Not something I've done yet, but it was always part of the idea so that we could e.g. pass through a device to a dedicated domain and then run multiple demu instances there to virtualize it for many domUs. (I'm thinking here of a device that is not SR-IOV and hence would need some bespoke emulation code to share it out). That dedicated domain would be termed the 'driver domain' simply because it is running the device driver for the h/w that underpins the emulation.

  Paul




* Re: [RFC PATCH V1 04/12] xen/arm: Introduce arch specific bits for IOREQ/DM features
  2020-08-05  7:05     ` Jan Beulich
@ 2020-08-05 16:41       ` Stefano Stabellini
  2020-08-05 19:45         ` Oleksandr
  0 siblings, 1 reply; 140+ messages in thread
From: Stefano Stabellini @ 2020-08-05 16:41 UTC (permalink / raw)
  To: Jan Beulich
  Cc: Stefano Stabellini, Julien Grall, Wei Liu, Andrew Cooper,
	Ian Jackson, George Dunlap, Oleksandr Tyshchenko,
	Oleksandr Tyshchenko, Julien Grall, xen-devel, Daniel De Graaf,
	Volodymyr Babchuk

On Wed, 5 Aug 2020, Jan Beulich wrote:
> On 05.08.2020 01:22, Stefano Stabellini wrote:
> > On Mon, 3 Aug 2020, Oleksandr Tyshchenko wrote:
> >> --- a/xen/include/asm-arm/p2m.h
> >> +++ b/xen/include/asm-arm/p2m.h
> >> @@ -385,10 +385,11 @@ static inline int set_foreign_p2m_entry(struct domain *d, unsigned long gfn,
> >>                                          mfn_t mfn)
> >>  {
> >>      /*
> >> -     * NOTE: If this is implemented then proper reference counting of
> >> -     *       foreign entries will need to be implemented.
> >> +     * XXX: handle properly reference. It looks like the page may not always
> >> +     * belong to d.
> > 
> > Just as a reference, and without taking away anything from the comment,
> > I think that QEMU is doing its own internal reference counting for these
> > mappings.
> 
> Which of course in no way replaces the need to do proper ref counting
> in Xen. (Just FAOD, as I'm not sure why you've said what you've said.)

Given the state of the series, which is an RFC, I only meant to say
that the lack of refcounting shouldn't prevent things from working when
using QEMU, in the sense that if somebody wants to give it a try for an
early demo, they should be able to see it running without crashes.

Of course, refcounting needs to be implemented.



* Re: [RFC PATCH V1 04/12] xen/arm: Introduce arch specific bits for IOREQ/DM features
  2020-08-05 14:12   ` Julien Grall
  2020-08-05 14:45     ` Jan Beulich
@ 2020-08-05 19:30     ` Oleksandr
  2020-08-06 11:08       ` Julien Grall
  1 sibling, 1 reply; 140+ messages in thread
From: Oleksandr @ 2020-08-05 19:30 UTC (permalink / raw)
  To: Julien Grall, xen-devel
  Cc: Stefano Stabellini, Wei Liu, Andrew Cooper, Ian Jackson,
	George Dunlap, Oleksandr Tyshchenko, Julien Grall, Jan Beulich,
	Daniel De Graaf, Volodymyr Babchuk


On 05.08.20 17:12, Julien Grall wrote:
> Hi,

Hi Julien


>
> On 03/08/2020 19:21, Oleksandr Tyshchenko wrote:
>> From: Oleksandr Tyshchenko <oleksandr_tyshchenko@epam.com>
>>
>> This patch makes it possible to forward Guest MMIO accesses
>> to a device emulator on Arm and enables that support for
>> Arm64.
>>
>> Also update XSM code a bit to let DM op be used on Arm.
>> New arch DM op will be introduced in the follow-up patch.
>>
>> Please note, at the moment build on Arm32 is broken
>> (see cmpxchg usage in hvm_send_buffered_ioreq()) if someone
>> wants to enable CONFIG_IOREQ_SERVER due to the lack of
>> cmpxchg_64 support on Arm32.
>>
>> Please note, this is a split/cleanup of Julien's PoC:
>> "Add support for Guest IO forwarding to a device emulator"
>>
>> Signed-off-by: Julien Grall <julien.grall@arm.com>
>> Signed-off-by: Oleksandr Tyshchenko <oleksandr_tyshchenko@epam.com>
>> ---
>>   tools/libxc/xc_dom_arm.c        |  25 +++++++---
>>   xen/arch/arm/Kconfig            |   1 +
>>   xen/arch/arm/Makefile           |   2 +
>>   xen/arch/arm/dm.c               |  34 +++++++++++++
>>   xen/arch/arm/domain.c           |   9 ++++
>>   xen/arch/arm/hvm.c              |  46 +++++++++++++++++-
>>   xen/arch/arm/io.c               |  67 +++++++++++++++++++++++++-
>>   xen/arch/arm/ioreq.c            |  86 
>> +++++++++++++++++++++++++++++++++
>>   xen/arch/arm/traps.c            |  17 +++++++
>>   xen/common/memory.c             |   5 +-
>>   xen/include/asm-arm/domain.h    |  80 +++++++++++++++++++++++++++++++
>>   xen/include/asm-arm/hvm/ioreq.h | 103 
>> ++++++++++++++++++++++++++++++++++++++++
>>   xen/include/asm-arm/mmio.h      |   1 +
>>   xen/include/asm-arm/p2m.h       |   7 +--
>>   xen/include/xsm/dummy.h         |   4 +-
>>   xen/include/xsm/xsm.h           |   6 +--
>>   xen/xsm/dummy.c                 |   2 +-
>>   xen/xsm/flask/hooks.c           |   5 +-
>>   18 files changed, 476 insertions(+), 24 deletions(-)
>>   create mode 100644 xen/arch/arm/dm.c
>>   create mode 100644 xen/arch/arm/ioreq.c
>>   create mode 100644 xen/include/asm-arm/hvm/ioreq.h
>
> It feels to me the patch is doing quite a few things that are 
> indirectly related. Can this be split to make the review easier?
>
> I would like at least the following split from the series:
>    - The tools changes
>    - The P2M changes
>    - The HVMOP plumbing (if we still require them)
Sure, will split.
However, I don't quite understand whether we should keep the HVMOP 
plumbing. If I understand correctly, the suggestion was to switch to 
the acquire interface instead (which requires Linux v4.17 at least)?
I suspect this is all about "xen/privcmd: add 
IOCTL_PRIVCMD_MMAP_RESOURCE" support for Linux?
Sorry if I am asking a lot of questions; my development environment is 
based on a vendor's BSP which currently uses v4.14.


>
> [...]
>
>> diff --git a/xen/arch/arm/dm.c b/xen/arch/arm/dm.c
>> new file mode 100644
>> index 0000000..2437099
>> --- /dev/null
>> +++ b/xen/arch/arm/dm.c
>> @@ -0,0 +1,34 @@
>> +/*
>> + * Copyright (c) 2019 Arm ltd.
>> + *
>> + * This program is free software; you can redistribute it and/or 
>> modify it
>> + * under the terms and conditions of the GNU General Public License,
>> + * version 2, as published by the Free Software Foundation.
>> + *
>> + * This program is distributed in the hope it will be useful, but 
>> WITHOUT
>> + * ANY WARRANTY; without even the implied warranty of 
>> MERCHANTABILITY or
>> + * FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public 
>> License for
>> + * more details.
>> + *
>> + * You should have received a copy of the GNU General Public License 
>> along with
>> + * this program; If not, see <http://www.gnu.org/licenses/>.
>> + */
>> +
>> +#include <xen/hypercall.h>
>> +#include <asm/vgic.h>
>
> The list of includes sounds strange. Can we make sure to include only 
> necessary headers and add the others when they are required?

Sure, I moved the arch_dm_op internals to the next patch in this 
series, but forgot to move the corresponding headers as well.


>
>
>> +
>> +int arch_dm_op(struct xen_dm_op *op, struct domain *d,
>> +               const struct dmop_args *op_args, bool *const_op)
>> +{
>> +    return -EOPNOTSUPP;
>> +}
>> +
>> +/*
>> + * Local variables:
>> + * mode: C
>> + * c-file-style: "BSD"
>> + * c-basic-offset: 4
>> + * tab-width: 4
>> + * indent-tabs-mode: nil
>> + * End:
>> + */
>
>>   long do_hvm_op(unsigned long op, XEN_GUEST_HANDLE_PARAM(void) arg)
>>   {
>>       long rc = 0;
>> @@ -111,7 +155,7 @@ long do_hvm_op(unsigned long op, 
>> XEN_GUEST_HANDLE_PARAM(void) arg)
>>               if ( rc )
>>                   goto param_fail;
>>   -            d->arch.hvm.params[a.index] = a.value;
>> +            rc = hvmop_set_param(d, &a);
>>           }
>>           else
>>           {
>> diff --git a/xen/arch/arm/io.c b/xen/arch/arm/io.c
>> index ae7ef96..436f669 100644
>> --- a/xen/arch/arm/io.c
>> +++ b/xen/arch/arm/io.c
>> @@ -16,6 +16,7 @@
>>    * GNU General Public License for more details.
>>    */
>>   +#include <xen/hvm/ioreq.h>
>>   #include <xen/lib.h>
>>   #include <xen/spinlock.h>
>>   #include <xen/sched.h>
>> @@ -107,6 +108,62 @@ static const struct mmio_handler 
>> *find_mmio_handler(struct domain *d,
>>       return handler;
>>   }
>>   +#ifdef CONFIG_IOREQ_SERVER
>
> Can we just implement this function in ioreq.c and provide a stub when 
> !CONFIG_IOREQ_SERVER?

Sure


>
>
>> +static enum io_state try_fwd_ioserv(struct cpu_user_regs *regs,
>> +                                    struct vcpu *v, mmio_info_t *info)
>> +{
>> +    struct hvm_vcpu_io *vio = &v->arch.hvm.hvm_io;
>> +    ioreq_t p = {
>> +        .type = IOREQ_TYPE_COPY,
>> +        .addr = info->gpa,
>> +        .size = 1 << info->dabt.size,
>> +        .count = 0,
>> +        .dir = !info->dabt.write,
>> +        .df = 0,         /* XXX: What's for? */
>> +        .data = get_user_reg(regs, info->dabt.reg),
>> +        .state = STATE_IOREQ_READY,
>> +    };
>> +    struct hvm_ioreq_server *s = NULL;
>> +    enum io_state rc;
>> +
>> +    switch ( vio->io_req.state )
>> +    {
>> +    case STATE_IOREQ_NONE:
>> +        break;
>> +    default:
>> +        printk("d%u wrong state %u\n", v->domain->domain_id,
>> +               vio->io_req.state);
>
> This will likely want to be a gprintk() or gdprintk() to avoid a guest 
> spamming Xen.

ok


>
>> +        return IO_ABORT;
>> +    }
>> +
>> +    s = hvm_select_ioreq_server(v->domain, &p);
>> +    if ( !s )
>> +        return IO_UNHANDLED;
>> +
>> +    if ( !info->dabt.valid )
>> +    {
>> +        printk("Valid bit not set\n");
>
> Same here. However, I am not convinced this is a useful message to keep.

ok


>
>> +        return IO_ABORT;
>> +    }
>> +
>> +    vio->io_req = p;
>> +
>> +    rc = hvm_send_ioreq(s, &p, 0);
>> +    if ( rc != IO_RETRY || v->domain->is_shutting_down )
>> +        vio->io_req.state = STATE_IOREQ_NONE;
>> +    else if ( !hvm_ioreq_needs_completion(&vio->io_req) )
>> +        rc = IO_HANDLED;
>> +    else
>> +        vio->io_completion = HVMIO_mmio_completion;
>> +
>> +    /* XXX: Decide what to do */
>
> We want to understand how IO_RETRY can happen on x86 first. With that, 
> we should be able to understand whether this can happen on Arm as well.

Noted


>
>
>> +    if ( rc == IO_RETRY )
>> +        rc = IO_HANDLED;
>> +
>> +    return rc;
>> +}
>> +#endif
>> +
>>   enum io_state try_handle_mmio(struct cpu_user_regs *regs,
>>                                 const union hsr hsr,
>>                                 paddr_t gpa)
>> @@ -123,7 +180,15 @@ enum io_state try_handle_mmio(struct 
>> cpu_user_regs *regs,
>>         handler = find_mmio_handler(v->domain, info.gpa);
>>       if ( !handler )
>> -        return IO_UNHANDLED;
>> +    {
>> +        int rc = IO_UNHANDLED;
>> +
>> +#ifdef CONFIG_IOREQ_SERVER
>> +        rc = try_fwd_ioserv(regs, v, &info);
>> +#endif
>> +
>> +        return rc;
>> +    }
>>         /* All the instructions used on emulated MMIO region should 
>> be valid */
>>       if ( !dabt.valid )
>> diff --git a/xen/arch/arm/ioreq.c b/xen/arch/arm/ioreq.c
>> new file mode 100644
>> index 0000000..a9cc839
>> --- /dev/null
>> +++ b/xen/arch/arm/ioreq.c
>> @@ -0,0 +1,86 @@
>> +/*
>> + * arm/ioreq.c: hardware virtual machine I/O emulation
>> + *
>> + * Copyright (c) 2019 Arm ltd.
>> + *
>> + * This program is free software; you can redistribute it and/or 
>> modify it
>> + * under the terms and conditions of the GNU General Public License,
>> + * version 2, as published by the Free Software Foundation.
>> + *
>> + * This program is distributed in the hope it will be useful, but 
>> WITHOUT
>> + * ANY WARRANTY; without even the implied warranty of 
>> MERCHANTABILITY or
>> + * FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public 
>> License for
>> + * more details.
>> + *
>> + * You should have received a copy of the GNU General Public License 
>> along with
>> + * this program; If not, see <http://www.gnu.org/licenses/>.
>> + */
>> +
>> +#include <xen/ctype.h>
>> +#include <xen/hvm/ioreq.h>
>> +#include <xen/init.h>
>> +#include <xen/lib.h>
>> +#include <xen/trace.h>
>> +#include <xen/sched.h>
>> +#include <xen/irq.h>
>> +#include <xen/softirq.h>
>> +#include <xen/domain.h>
>> +#include <xen/domain_page.h>
>> +#include <xen/event.h>
>> +#include <xen/paging.h>
>> +#include <xen/vpci.h>
>> +
>> +#include <public/hvm/dm_op.h>
>> +#include <public/hvm/ioreq.h>
>> +
>> +bool handle_mmio(void)
>
> The name of the function is pretty generic and can be confusing on Arm 
> (we already have a try_handle_mmio()).
>
> What is this function supposed to do?
Agree, it sounds a bit confusing. I assume it is supposed to complete a guest 
MMIO access after emulation has finished.

Shall I rename it to something more appropriate (maybe by adding an ioreq prefix)?
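For reference, the sign-extension step this completion performs can be modelled as a standalone snippet (names are illustrative, not the actual Xen ones):

```c
#include <assert.h>
#include <stdint.h>

/*
 * Standalone model of the sign extension done when completing a guest
 * MMIO read: "size" is the access width in bits (< 64 for any
 * sign-extending load), "r" is the value returned by the emulator,
 * which is expected to have the bits above "size" already zeroed.
 */
static uint64_t extend_sign(uint64_t r, unsigned int size, int sign)
{
    if ( sign && (r & ((uint64_t)1 << (size - 1))) )
        r |= ~(uint64_t)0 << size;

    return r;
}
```

With an 8-bit signed load of 0x80 this yields 0xFFFFFFFFFFFFFF80, matching what the quoted code stores back into the destination register.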


>
>
>> +{
>> +    struct vcpu *v = current;
>> +    struct cpu_user_regs *regs = guest_cpu_user_regs();
>> +    const union hsr hsr = { .bits = regs->hsr };
>> +    const struct hsr_dabt dabt = hsr.dabt;
>> +    /* Code is similar to handle_read */
>> +    uint8_t size = (1 << dabt.size) * 8;
>> +    register_t r = v->arch.hvm.hvm_io.io_req.data;
>> +
>> +    /* We should only be here on Guest Data Abort */
>> +    ASSERT(dabt.ec == HSR_EC_DATA_ABORT_LOWER_EL);
>> +
>> +    /* We are done with the IO */
>> +    /* XXX: Is it the right place? */
>> +    v->arch.hvm.hvm_io.io_req.state = STATE_IOREQ_NONE;
>> +
>> +    /* XXX: Do we need to take care of write here ? */
>> +    if ( dabt.write )
>> +        return true;
>> +
>> +    /*
>> +     * Sign extend if required.
>> +     * Note that we expect the read handler to have zeroed the bits
>> +     * outside the requested access size.
>> +     */
>> +    if ( dabt.sign && (r & (1UL << (size - 1))) )
>> +    {
>> +        /*
>> +         * We are relying on register_t using the same as
>> +         * an unsigned long in order to keep the 32-bit assembly
>> +         * code smaller.
>> +         */
>> +        BUILD_BUG_ON(sizeof(register_t) != sizeof(unsigned long));
>> +        r |= (~0UL) << size;
>> +    }
>> +
>> +    set_user_reg(regs, dabt.reg, r);
>> +
>> +    return true;
>> +}
>
> [...]
>
>> diff --git a/xen/common/memory.c b/xen/common/memory.c
>> index 9283e5e..0000477 100644
>> --- a/xen/common/memory.c
>> +++ b/xen/common/memory.c
>> @@ -8,6 +8,7 @@
>>    */
>>     #include <xen/domain_page.h>
>> +#include <xen/hvm/ioreq.h>
>>   #include <xen/types.h>
>>   #include <xen/lib.h>
>>   #include <xen/mm.h>
>> @@ -30,10 +31,6 @@
>>   #include <public/memory.h>
>>   #include <xsm/xsm.h>
>>   -#ifdef CONFIG_IOREQ_SERVER
>> -#include <xen/hvm/ioreq.h>
>> -#endif
>> -
>
> Why do you remove something your just introduced?
The reason I guarded that header was to keep the previous patch ("xen/mm: 
Make x86's XENMEM_resource_ioreq_server handling common") buildable on Arm
without the arch IOREQ header added yet. I tried to make sure that the 
result after each patch was buildable to retain bisectability.
As the current patch adds the Arm IOREQ specific bits (including the header), 
the guard can be removed as it is no longer needed.


>
>>   #ifdef CONFIG_X86
>>   #include <asm/guest.h>
>>   #endif
>> diff --git a/xen/include/asm-arm/domain.h b/xen/include/asm-arm/domain.h
>> index 4e2f582..e060b0a 100644
>> --- a/xen/include/asm-arm/domain.h
>> +++ b/xen/include/asm-arm/domain.h
>> @@ -11,12 +11,64 @@
>>   #include <asm/vgic.h>
>>   #include <asm/vpl011.h>
>>   #include <public/hvm/params.h>
>> +#include <public/hvm/dm_op.h>
>> +#include <public/hvm/ioreq.h>
>>   #include <xen/serial.h>
>>   #include <xen/rbtree.h>
>>   +struct hvm_ioreq_page {
>> +    gfn_t gfn;
>> +    struct page_info *page;
>> +    void *va;
>> +};
>
> AFAICT all the structures/define you introduced here are used by the 
> code common. So it feels to me they should be defined in a common header.

Makes sense; probably worth moving. I assume this also applies to the x86 ones.


>
>
>> +
>> +struct hvm_ioreq_vcpu {
>> +    struct list_head list_entry;
>> +    struct vcpu      *vcpu;
>> +    evtchn_port_t    ioreq_evtchn;
>> +    bool             pending;
>> +};
>> +
>> +#define NR_IO_RANGE_TYPES (XEN_DMOP_IO_RANGE_PCI + 1)
>> +#define MAX_NR_IO_RANGES  256
>> +
>> +#define MAX_NR_IOREQ_SERVERS 8
>> +#define DEFAULT_IOSERVID 0
>> +
>> +struct hvm_ioreq_server {
>> +    struct domain          *target, *emulator;
>> +
>> +    /* Lock to serialize toolstack modifications */
>> +    spinlock_t             lock;
>> +
>> +    struct hvm_ioreq_page  ioreq;
>> +    struct list_head       ioreq_vcpu_list;
>> +    struct hvm_ioreq_page  bufioreq;
>> +
>> +    /* Lock to serialize access to buffered ioreq ring */
>> +    spinlock_t             bufioreq_lock;
>> +    evtchn_port_t          bufioreq_evtchn;
>> +    struct rangeset        *range[NR_IO_RANGE_TYPES];
>> +    bool                   enabled;
>> +    uint8_t                bufioreq_handling;
>> +};
>> +
>>   struct hvm_domain
>>   {
>>       uint64_t              params[HVM_NR_PARAMS];
>> +
>> +    /* Guest page range used for non-default ioreq servers */
>> +    struct {
>> +        unsigned long base;
>> +        unsigned long mask;
>> +        unsigned long legacy_mask; /* indexed by HVM param number */
>> +    } ioreq_gfn;
>> +
>> +    /* Lock protects all other values in the sub-struct and the 
>> default */
>> +    struct {
>> +        spinlock_t              lock;
>> +        struct hvm_ioreq_server *server[MAX_NR_IOREQ_SERVERS];
>> +    } ioreq_server;
>>   };
>>     #ifdef CONFIG_ARM_64
>> @@ -93,6 +145,29 @@ struct arch_domain
>>   #endif
>>   }  __cacheline_aligned;
>>   +enum hvm_io_completion {
>> +    HVMIO_no_completion,
>> +    HVMIO_mmio_completion,
>> +    HVMIO_pio_completion,
>> +    HVMIO_realmode_completion
>> +};
>> +
>> +struct hvm_vcpu_io {
>> +    /* I/O request in flight to device model. */
>> +    enum hvm_io_completion io_completion;
>> +    ioreq_t                io_req;
>> +
>> +    /*
>> +     * HVM emulation:
>> +     *  Linear address @mmio_gla maps to MMIO physical frame 
>> @mmio_gpfn.
>> +     *  The latter is known to be an MMIO frame (not RAM).
>> +     *  This translation is only valid for accesses as per 
>> @mmio_access.
>> +     */
>> +    struct npfec        mmio_access;
>> +    unsigned long       mmio_gla;
>> +    unsigned long       mmio_gpfn;
>> +};
>> +
>>   struct arch_vcpu
>>   {
>>       struct {
>> @@ -206,6 +281,11 @@ struct arch_vcpu
>>        */
>>       bool need_flush_to_ram;
>>   +    struct hvm_vcpu
>> +    {
>> +        struct hvm_vcpu_io hvm_io;
>> +    } hvm;
>> +
>>   }  __cacheline_aligned;
>>     void vcpu_show_execution_state(struct vcpu *);
>> diff --git a/xen/include/asm-arm/hvm/ioreq.h 
>> b/xen/include/asm-arm/hvm/ioreq.h
>> new file mode 100644
>> index 0000000..83a560c
>> --- /dev/null
>> +++ b/xen/include/asm-arm/hvm/ioreq.h
>> @@ -0,0 +1,103 @@
>> +/*
>> + * hvm.h: Hardware virtual machine assist interface definitions.
>> + *
>> + * Copyright (c) 2016 Citrix Systems Inc.
>> + * Copyright (c) 2019 Arm ltd.
>> + *
>> + * This program is free software; you can redistribute it and/or 
>> modify it
>> + * under the terms and conditions of the GNU General Public License,
>> + * version 2, as published by the Free Software Foundation.
>> + *
>> + * This program is distributed in the hope it will be useful, but 
>> WITHOUT
>> + * ANY WARRANTY; without even the implied warranty of 
>> MERCHANTABILITY or
>> + * FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public 
>> License for
>> + * more details.
>> + *
>> + * You should have received a copy of the GNU General Public License 
>> along with
>> + * this program; If not, see <http://www.gnu.org/licenses/>.
>> + */
>> +
>> +#ifndef __ASM_ARM_HVM_IOREQ_H__
>> +#define __ASM_ARM_HVM_IOREQ_H__
>> +
>> +#include <public/hvm/ioreq.h>
>> +#include <public/hvm/dm_op.h>
>> +
>> +#define has_vpci(d) (false)
>
> It feels to me this wants to be defined in vcpi.h.

ok, will move.


>
>
>> +
>> +bool handle_mmio(void);
>> +
>> +static inline bool handle_pio(uint16_t port, unsigned int size, int 
>> dir)
>> +{
>> +    /* XXX */
>
> Can you expand this TODO? What do you expect to do?
I didn't expect this to be called on Arm. Sorry, I am not sure I have an 
idea of how to handle this properly. I would keep it unimplemented until 
there is a real need.
Will expand the TODO.


>
>
>> +    BUG();
>> +    return true;
>> +}
>> +
>> +static inline paddr_t hvm_mmio_first_byte(const ioreq_t *p)
>> +{
>> +    return p->addr;
>> +}
>
> I understand that the x86 version is more complex as it check p->df. 
> However, aside reducing the complexity, I am not sure why we would 
> want to diverge it.
>
> After all, IOREQ is now meant to be a common feature.
Well, no objections at all.
Could you please clarify how 'df' (Direction Flag?) could be 
handled/used on Arm? I see that try_fwd_ioserv() always sets it to 0. Or 
should I just reuse x86's helpers as-is,
which (together with count = df = 0) will result in what we actually 
have here?


>
>> +
>> +static inline paddr_t hvm_mmio_last_byte(const ioreq_t *p)
>> +{
>> +    unsigned long size = p->size;
>> +
>> +    return p->addr + size - 1;
>> +}
>
> Same.

+


>
>> +
>> +struct hvm_ioreq_server;
>
> Why do we need a forward declaration?

I don't remember exactly; probably this way I temporarily solved a build 
issue. Please let me recheck whether we could avoid using it.


>
>
>> +
>> +static inline int p2m_set_ioreq_server(struct domain *d,
>> +                                       unsigned int flags,
>> +                                       struct hvm_ioreq_server *s)
>> +{
>> +    return -EOPNOTSUPP;
>> +}
>
> This should be defined in p2m.h. But I am not even sure what it is 
> meant for. Can you expand it?

ok, will move.


In this series I tried to make as much IOREQ code common as possible and 
to avoid complicating things; to achieve that, a few stubs were 
added here. Please note
that I also considered splitting it into arch parts, but some functions 
couldn't be split easily.
This one is called from the common hvm_destroy_ioreq_server() with the flag 
being 0 (which on x86 results in unmapping the ioreq server from the p2m type).
I could add a comment describing why this stub is present here.


>
>
>> +
>> +static inline void msix_write_completion(struct vcpu *v)
>> +{
>> +}
>> +
>> +static inline void handle_realmode_completion(void)
>> +{
>> +    ASSERT_UNREACHABLE();
>> +}
>
> realmode is very x86 specific. So I don't think this function should 
> be called from common code. It might be worth considering to split 
> handle_hvm_io_completion() is 2 parts: common and arch specific.

I agree with you that realmode is x86 specific and does not look good in an 
Arm header. I was thinking about how to split handle_hvm_io_completion() 
gracefully, but I failed to find a good solution, so I decided to add 
two stubs (msix_write_completion and handle_realmode_completion) on Arm. 
I could add a comment describing why they are here, if appropriate. But 
if you think they shouldn't be called from the common code in any way, I 
will try to split it.
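One possible shape for such a split, as a hedged sketch with hypothetical names: keep the generic completions in common code and delegate anything else to an arch hook, so the Arm side never needs to know about realmode or MSI-X.

```c
#include <assert.h>
#include <stdbool.h>

enum hvm_io_completion {
    HVMIO_no_completion,
    HVMIO_mmio_completion,
    HVMIO_pio_completion,
    HVMIO_realmode_completion,  /* x86-only */
};

/*
 * Hypothetical arch hook: returns true if the completion was handled.
 * Arm has no arch-specific completions, so its hook rejects anything
 * the common code did not recognise.
 */
static bool arm_arch_io_completion(enum hvm_io_completion completion)
{
    return false;
}

/* Common part of handle_hvm_io_completion() (sketch). */
static bool handle_io_completion(enum hvm_io_completion completion,
                                 bool (*arch_hook)(enum hvm_io_completion))
{
    switch ( completion )
    {
    case HVMIO_no_completion:
        return true;

    case HVMIO_mmio_completion:
        /* would call handle_mmio() here */
        return true;

    case HVMIO_pio_completion:
        /* would call handle_pio() here */
        return true;

    default:
        /* realmode, MSI-X writes, ... live behind the arch hook */
        return arch_hook(completion);
    }
}
```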


>
>> +
>> +static inline void paging_mark_pfn_dirty(struct domain *d, pfn_t pfn)
>> +{
>> +}
>
> This will want to be stubbed in asm-arm/paging.h.

ok


>
>
>> +
>> +static inline void hvm_get_ioreq_server_range_type(struct domain *d,
>> +                                                   ioreq_t *p,
>> +                                                   uint8_t *type,
>> +                                                   uint64_t *addr)
>> +{
>> +    *type = (p->type == IOREQ_TYPE_PIO) ?
>> +             XEN_DMOP_IO_RANGE_PORT : XEN_DMOP_IO_RANGE_MEMORY;
>> +    *addr = p->addr;
>> +}
>> +
>> +static inline void arch_hvm_ioreq_init(struct domain *d)
>> +{
>> +}
>> +
>> +static inline void arch_hvm_ioreq_destroy(struct domain *d)
>> +{
>> +}
>> +
>> +#define IOREQ_IO_HANDLED     IO_HANDLED
>> +#define IOREQ_IO_UNHANDLED   IO_UNHANDLED
>> +#define IOREQ_IO_RETRY       IO_RETRY
>> +
>> +#endif /* __ASM_X86_HVM_IOREQ_H__ */
>
> s/X86/ARM/

ok


>
>> +
>> +/*
>> + * Local variables:
>> + * mode: C
>> + * c-file-style: "BSD"
>> + * c-basic-offset: 4
>> + * tab-width: 4
>> + * indent-tabs-mode: nil
>> + * End:
>> + */
>> diff --git a/xen/include/asm-arm/mmio.h b/xen/include/asm-arm/mmio.h
>> index 8dbfb27..7ab873c 100644
>> --- a/xen/include/asm-arm/mmio.h
>> +++ b/xen/include/asm-arm/mmio.h
>> @@ -37,6 +37,7 @@ enum io_state
>>       IO_ABORT,       /* The IO was handled by the helper and led to 
>> an abort. */
>>       IO_HANDLED,     /* The IO was successfully handled by the 
>> helper. */
>>       IO_UNHANDLED,   /* The IO was not handled by the helper. */
>> +    IO_RETRY,       /* Retry the emulation for some reason */
>>   };
>>     typedef int (*mmio_read_t)(struct vcpu *v, mmio_info_t *info,
>> diff --git a/xen/include/asm-arm/p2m.h b/xen/include/asm-arm/p2m.h
>> index 5fdb6e8..5823f11 100644
>> --- a/xen/include/asm-arm/p2m.h
>> +++ b/xen/include/asm-arm/p2m.h
>> @@ -385,10 +385,11 @@ static inline int set_foreign_p2m_entry(struct 
>> domain *d, unsigned long gfn,
>>                                           mfn_t mfn)
>>   {
>>       /*
>> -     * NOTE: If this is implemented then proper reference counting of
>> -     *       foreign entries will need to be implemented.
>> +     * XXX: handle properly reference. It looks like the page may 
>> not always
>> +     * belong to d.
>>        */
>> -    return -EOPNOTSUPP;
>> +
>> +    return guest_physmap_add_entry(d, _gfn(gfn), mfn, 0, p2m_ram_rw);
>
> Treating foreign as p2m_ram_rw is more an hack that a real fix. I have 
> answered to this separately (see my answer on Stefano's e-mail), so we 
> can continue the conversation there.

ok



-- 
Regards,

Oleksandr Tyshchenko



^ permalink raw reply	[flat|nested] 140+ messages in thread

* Re: [RFC PATCH V1 04/12] xen/arm: Introduce arch specific bits for IOREQ/DM features
  2020-08-05 16:41       ` Stefano Stabellini
@ 2020-08-05 19:45         ` Oleksandr
  0 siblings, 0 replies; 140+ messages in thread
From: Oleksandr @ 2020-08-05 19:45 UTC (permalink / raw)
  To: Stefano Stabellini, Jan Beulich
  Cc: Julien Grall, Wei Liu, Andrew Cooper, Ian Jackson, George Dunlap,
	Oleksandr Tyshchenko, Julien Grall, xen-devel, Daniel De Graaf,
	Volodymyr Babchuk


On 05.08.20 19:41, Stefano Stabellini wrote:
Hi Stefano

> On Wed, 5 Aug 2020, Jan Beulich wrote:
>> On 05.08.2020 01:22, Stefano Stabellini wrote:
>>> On Mon, 3 Aug 2020, Oleksandr Tyshchenko wrote:
>>>> --- a/xen/include/asm-arm/p2m.h
>>>> +++ b/xen/include/asm-arm/p2m.h
>>>> @@ -385,10 +385,11 @@ static inline int set_foreign_p2m_entry(struct domain *d, unsigned long gfn,
>>>>                                           mfn_t mfn)
>>>>   {
>>>>       /*
>>>> -     * NOTE: If this is implemented then proper reference counting of
>>>> -     *       foreign entries will need to be implemented.
>>>> +     * XXX: handle properly reference. It looks like the page may not always
>>>> +     * belong to d.
>>> Just as a reference, and without taking away anything from the comment,
>>> I think that QEMU is doing its own internal reference counting for these
>>> mappings.
>> Which of course in no way replaces the need to do proper ref counting
>> in Xen. (Just FAOD, as I'm not sure why you've said what you've said.)
> Given the state of the series, which is a RFC, I only meant to say that
> the lack of refcounting shouldn't prevent things from working when using
> QEMU. In the sense that if somebody wants to give it a try for an early
> demo, they should be able to see it running without crashes.

Yes, for the early demo it works fine; however, I don't use QEMU.


>
> Of course, refcounting needs to be implemented.

+


-- 
Regards,

Oleksandr Tyshchenko




* Re: [RFC PATCH V1 04/12] xen/arm: Introduce arch specific bits for IOREQ/DM features
  2020-08-05 16:13   ` Jan Beulich
@ 2020-08-05 19:47     ` Oleksandr
  0 siblings, 0 replies; 140+ messages in thread
From: Oleksandr @ 2020-08-05 19:47 UTC (permalink / raw)
  To: Jan Beulich, xen-devel
  Cc: Stefano Stabellini, Julien Grall, Wei Liu, Andrew Cooper,
	Ian Jackson, George Dunlap, Oleksandr Tyshchenko, Julien Grall,
	Daniel De Graaf, Volodymyr Babchuk


On 05.08.20 19:13, Jan Beulich wrote:

Hi, Jan

> On 03.08.2020 20:21, Oleksandr Tyshchenko wrote:
>> --- a/xen/include/xsm/dummy.h
>> +++ b/xen/include/xsm/dummy.h
>> @@ -713,14 +713,14 @@ static XSM_INLINE int xsm_pmu_op (XSM_DEFAULT_ARG struct domain *d, unsigned int
>>       }
>>   }
>>   
>> +#endif /* CONFIG_X86 */
>> +
>>   static XSM_INLINE int xsm_dm_op(XSM_DEFAULT_ARG struct domain *d)
>>   {
>>       XSM_ASSERT_ACTION(XSM_DM_PRIV);
>>       return xsm_default_action(action, current->domain, d);
>>   }
>>   
>> -#endif /* CONFIG_X86 */
>> -
>>   #ifdef CONFIG_ARGO
>>   static XSM_INLINE int xsm_argo_enable(const struct domain *d)
>>   {
>> diff --git a/xen/include/xsm/xsm.h b/xen/include/xsm/xsm.h
>> index a80bcf3..2a9b39d 100644
>> --- a/xen/include/xsm/xsm.h
>> +++ b/xen/include/xsm/xsm.h
>> @@ -177,8 +177,8 @@ struct xsm_operations {
>>       int (*ioport_permission) (struct domain *d, uint32_t s, uint32_t e, uint8_t allow);
>>       int (*ioport_mapping) (struct domain *d, uint32_t s, uint32_t e, uint8_t allow);
>>       int (*pmu_op) (struct domain *d, unsigned int op);
>> -    int (*dm_op) (struct domain *d);
>>   #endif
>> +    int (*dm_op) (struct domain *d);
>>       int (*xen_version) (uint32_t cmd);
>>       int (*domain_resource_map) (struct domain *d);
>>   #ifdef CONFIG_ARGO
>> @@ -688,13 +688,13 @@ static inline int xsm_pmu_op (xsm_default_t def, struct domain *d, unsigned int
>>       return xsm_ops->pmu_op(d, op);
>>   }
>>   
>> +#endif /* CONFIG_X86 */
>> +
>>   static inline int xsm_dm_op(xsm_default_t def, struct domain *d)
>>   {
>>       return xsm_ops->dm_op(d);
>>   }
>>   
>> -#endif /* CONFIG_X86 */
>> -
>>   static inline int xsm_xen_version (xsm_default_t def, uint32_t op)
>>   {
>>       return xsm_ops->xen_version(op);
>> diff --git a/xen/xsm/dummy.c b/xen/xsm/dummy.c
>> index d4cce68..e3afd06 100644
>> --- a/xen/xsm/dummy.c
>> +++ b/xen/xsm/dummy.c
>> @@ -148,8 +148,8 @@ void __init xsm_fixup_ops (struct xsm_operations *ops)
>>       set_to_dummy_if_null(ops, ioport_permission);
>>       set_to_dummy_if_null(ops, ioport_mapping);
>>       set_to_dummy_if_null(ops, pmu_op);
>> -    set_to_dummy_if_null(ops, dm_op);
>>   #endif
>> +    set_to_dummy_if_null(ops, dm_op);
>>       set_to_dummy_if_null(ops, xen_version);
>>       set_to_dummy_if_null(ops, domain_resource_map);
>>   #ifdef CONFIG_ARGO
>> diff --git a/xen/xsm/flask/hooks.c b/xen/xsm/flask/hooks.c
>> index a314bf8..645192a 100644
>> --- a/xen/xsm/flask/hooks.c
>> +++ b/xen/xsm/flask/hooks.c
>> @@ -1662,14 +1662,13 @@ static int flask_pmu_op (struct domain *d, unsigned int op)
>>           return -EPERM;
>>       }
>>   }
>> +#endif /* CONFIG_X86 */
>>   
>>   static int flask_dm_op(struct domain *d)
>>   {
>>       return current_has_perm(d, SECCLASS_HVM, HVM__DM);
>>   }
>>   
>> -#endif /* CONFIG_X86 */
>> -
>>   static int flask_xen_version (uint32_t op)
>>   {
>>       u32 dsid = domain_sid(current->domain);
>> @@ -1872,8 +1871,8 @@ static struct xsm_operations flask_ops = {
>>       .ioport_permission = flask_ioport_permission,
>>       .ioport_mapping = flask_ioport_mapping,
>>       .pmu_op = flask_pmu_op,
>> -    .dm_op = flask_dm_op,
>>   #endif
>> +    .dm_op = flask_dm_op,
>>       .xen_version = flask_xen_version,
>>       .domain_resource_map = flask_domain_resource_map,
>>   #ifdef CONFIG_ARGO
> All of this looks to belong into patch 2?


Good point. Will move.

-- 
Regards,

Oleksandr Tyshchenko




* Re: [RFC PATCH V1 11/12] libxl: Insert "dma-coherent" property into virtio-mmio device node
  2020-08-04 23:23   ` Stefano Stabellini
@ 2020-08-05 20:35     ` Oleksandr
  0 siblings, 0 replies; 140+ messages in thread
From: Oleksandr @ 2020-08-05 20:35 UTC (permalink / raw)
  To: Stefano Stabellini
  Cc: Julien Grall, Wei Liu, Ian Jackson, Oleksandr Tyshchenko,
	Anthony PERARD, xen-devel


On 05.08.20 02:23, Stefano Stabellini wrote:

Hi Stefano

> On Mon, 3 Aug 2020, Oleksandr Tyshchenko wrote:
>> From: Oleksandr Tyshchenko <oleksandr_tyshchenko@epam.com>
>>
>> Without "dma-coherent" property present in virtio-mmio device node,
>> guest assumes it is non-coherent and making non-cacheable accesses
>> to the vring when the DMA API is used for vring operations.
>> But the virtio-mmio device which runs on the host side is making cacheable
>> accesses to vring. This all may result in a loss of coherency between
>> the guest and host.
>>
>> With this patch we can avoid modifying guest at all, otherwise we
>> need to force VirtIO framework to not use DMA API for vring operations.
>>
>> Signed-off-by: Oleksandr Tyshchenko <oleksandr_tyshchenko@epam.com>
> This should also be folded in the first patch for libxl

Agree, will do



-- 
Regards,

Oleksandr Tyshchenko




* Re: [RFC PATCH V1 09/12] libxl: Handle virtio-mmio irq in more correct way
  2020-08-04 23:22   ` Stefano Stabellini
@ 2020-08-05 20:51     ` Oleksandr
  0 siblings, 0 replies; 140+ messages in thread
From: Oleksandr @ 2020-08-05 20:51 UTC (permalink / raw)
  To: Stefano Stabellini
  Cc: Julien Grall, Wei Liu, Ian Jackson, Oleksandr Tyshchenko,
	Anthony PERARD, xen-devel


On 05.08.20 02:22, Stefano Stabellini wrote:

Hi Stefano

> On Mon, 3 Aug 2020, Oleksandr Tyshchenko wrote:
>> From: Oleksandr Tyshchenko <oleksandr_tyshchenko@epam.com>
>>
>> This patch makes possible to use device passthrough again.
>>
>> Signed-off-by: Oleksandr Tyshchenko <oleksandr_tyshchenko@epam.com>
>> ---
>>   tools/libxl/libxl_arm.c | 33 +++++++++++++++++++++++----------
>>   1 file changed, 23 insertions(+), 10 deletions(-)
>>
>> diff --git a/tools/libxl/libxl_arm.c b/tools/libxl/libxl_arm.c
>> index 620b499..4f748e3 100644
>> --- a/tools/libxl/libxl_arm.c
>> +++ b/tools/libxl/libxl_arm.c
>> @@ -9,6 +9,10 @@
>>   #include <assert.h>
>>   #include <xen/device_tree_defs.h>
>>   
>> +#define GUEST_VIRTIO_MMIO_BASE  xen_mk_ullong(0x02000000)
>> +#define GUEST_VIRTIO_MMIO_SIZE  xen_mk_ullong(0x200)
>> +#define GUEST_VIRTIO_MMIO_SPI   33
> They should be in xen/include/public/arch-arm.h

ok


>
> Is one interrupt enough if there are multiple virtio devices? Is it one
> interrupt for all virtio devices, or one for each device?

One interrupt for each virtio device. I experimented with the current 
series and assigned 4 disk partitions to the guest. This resulted in 4 
separate device-tree nodes, and each node had an individual SPI and MMIO range.


>
> Of course this patch should be folded in the patch to add virtio support
> to libxl.

ok


-- 
Regards,

Oleksandr Tyshchenko




* Re: [RFC PATCH V1 10/12] libxl: Add support for virtio-disk configuration
  2020-08-04 23:23   ` Stefano Stabellini
@ 2020-08-05 21:12     ` Oleksandr
  2020-08-06  0:37       ` Stefano Stabellini
  0 siblings, 1 reply; 140+ messages in thread
From: Oleksandr @ 2020-08-05 21:12 UTC (permalink / raw)
  To: Stefano Stabellini
  Cc: Julien Grall, Wei Liu, Ian Jackson, Oleksandr Tyshchenko,
	Anthony PERARD, xen-devel


On 05.08.20 02:23, Stefano Stabellini wrote:

Hi Stefano

> On Mon, 3 Aug 2020, Oleksandr Tyshchenko wrote:
>> From: Oleksandr Tyshchenko <oleksandr_tyshchenko@epam.com>
>>
>> This patch adds basic support for configuring and assisting virtio-disk
>> backend (emulator) which is intended to run outside of QEMU and could be run
>> in any domain.
>>
>> Xenstore was chosen as a communication interface for the emulator running
>> in non-toolstack domain to be able to get configuration either by reading
>> Xenstore directly or by receiving command line parameters (an updated 'xl devd'
>> running in the same domain would read Xenstore beforehand and call backend
>> executable with the required arguments).
>>
>> An example of domain configuration (two disks are assigned to the guest,
>> the latter is in readonly mode):
>>
>> vdisk = [ 'backend=DomD, disks=rw:/dev/mmcblk0p3;ro:/dev/mmcblk1p3' ]
>>
>> Where per-disk Xenstore entries are:
>> - filename and readonly flag (configured via "vdisk" property)
>> - base and irq (allocated dynamically)
>>
>> Besides handling 'visible' params described in configuration file,
>> patch also allocates virtio-mmio specific ones for each device and
>> writes them into Xenstore. virtio-mmio params (irq and base) are
>> unique per guest domain, they allocated at the domain creation time
>> and passed through to the emulator. Each VirtIO device has at least
>> one pair of these params.
>>
>> TODO:
>> 1. An extra "virtio" property could be removed.
>> 2. Update documentation.
>>
>> Signed-off-by: Oleksandr Tyshchenko <oleksandr_tyshchenko@epam.com>
>> ---
>>   tools/libxl/Makefile                 |   4 +-
>>   tools/libxl/libxl_arm.c              |  63 +++++++++++++++----
>>   tools/libxl/libxl_create.c           |   1 +
>>   tools/libxl/libxl_internal.h         |   1 +
>>   tools/libxl/libxl_types.idl          |  15 +++++
>>   tools/libxl/libxl_types_internal.idl |   1 +
>>   tools/libxl/libxl_virtio_disk.c      | 109 +++++++++++++++++++++++++++++++++
>>   tools/xl/Makefile                    |   2 +-
>>   tools/xl/xl.h                        |   3 +
>>   tools/xl/xl_cmdtable.c               |  15 +++++
>>   tools/xl/xl_parse.c                  | 115 +++++++++++++++++++++++++++++++++++
>>   tools/xl/xl_virtio_disk.c            |  46 ++++++++++++++
>>   12 files changed, 360 insertions(+), 15 deletions(-)
>>   create mode 100644 tools/libxl/libxl_virtio_disk.c
>>   create mode 100644 tools/xl/xl_virtio_disk.c
>>
>> diff --git a/tools/libxl/Makefile b/tools/libxl/Makefile
>> index 38cd43a..df94b13 100644
>> --- a/tools/libxl/Makefile
>> +++ b/tools/libxl/Makefile
>> @@ -141,7 +141,9 @@ LIBXL_OBJS = flexarray.o libxl.o libxl_create.o libxl_dm.o libxl_pci.o \
>>   			libxl_vtpm.o libxl_nic.o libxl_disk.o libxl_console.o \
>>   			libxl_cpupool.o libxl_mem.o libxl_sched.o libxl_tmem.o \
>>   			libxl_9pfs.o libxl_domain.o libxl_vdispl.o \
>> -			libxl_pvcalls.o libxl_vsnd.o libxl_vkb.o $(LIBXL_OBJS-y)
>> +			libxl_pvcalls.o libxl_vsnd.o libxl_vkb.o \
>> +			libxl_virtio_disk.o $(LIBXL_OBJS-y)
>> +
>>   LIBXL_OBJS += libxl_genid.o
>>   LIBXL_OBJS += _libxl_types.o libxl_flask.o _libxl_types_internal.o
>>   
>> diff --git a/tools/libxl/libxl_arm.c b/tools/libxl/libxl_arm.c
>> index 4f748e3..469a8b0 100644
>> --- a/tools/libxl/libxl_arm.c
>> +++ b/tools/libxl/libxl_arm.c
>> @@ -13,6 +13,12 @@
>>   #define GUEST_VIRTIO_MMIO_SIZE  xen_mk_ullong(0x200)
>>   #define GUEST_VIRTIO_MMIO_SPI   33
>>   
>> +#ifndef container_of
>> +#define container_of(ptr, type, member) ({			\
>> +        typeof( ((type *)0)->member ) *__mptr = (ptr);	\
>> +        (type *)( (char *)__mptr - offsetof(type,member) );})
>> +#endif
>> +
>>   static const char *gicv_to_string(libxl_gic_version gic_version)
>>   {
>>       switch (gic_version) {
>> @@ -44,14 +50,32 @@ int libxl__arch_domain_prepare_config(libxl__gc *gc,
>>           vuart_enabled = true;
>>       }
>>   
>> -    /*
>> -     * XXX: Handle properly virtio
>> -     * A proper solution would be the toolstack to allocate the interrupts
>> -     * used by each virtio backend and let the backend now which one is used
>> -     */
>>       if (libxl_defbool_val(d_config->b_info.arch_arm.virtio)) {
>> -        nr_spis += (GUEST_VIRTIO_MMIO_SPI - 32) + 1;
>> +        uint64_t virtio_base;
>> +        libxl_device_virtio_disk *virtio_disk;
>> +
>> +        virtio_base = GUEST_VIRTIO_MMIO_BASE;
>>           virtio_irq = GUEST_VIRTIO_MMIO_SPI;
>> +
>> +        if (!d_config->num_virtio_disks) {
>> +            LOG(ERROR, "Virtio is enabled, but no Virtio devices present\n");
>> +            return ERROR_FAIL;
>> +        }
>> +        virtio_disk = &d_config->virtio_disks[0];
>> +
>> +        for (i = 0; i < virtio_disk->num_disks; i++) {
>> +            virtio_disk->disks[i].base = virtio_base;
>> +            virtio_disk->disks[i].irq = virtio_irq;
>> +
>> +            LOG(DEBUG, "Allocate Virtio MMIO params: IRQ %u BASE 0x%"PRIx64,
>> +                virtio_irq, virtio_base);
>> +
>> +            virtio_irq ++;
>> +            virtio_base += GUEST_VIRTIO_MMIO_SIZE;
>> +        }
>> +        virtio_irq --;
>> +
>> +        nr_spis += (virtio_irq - 32) + 1;
> It looks like it is an interrupt per device, which could lead to quite a
> few of them being allocated.

Yes.


> So, if we end up allocating
> let's say 6 virtio interrupts for a domain, the chance of a clash with a
> physical interrupt of a passthrough device is real.

Yes.


>
> I am not entirely sure how to solve it, but these are a few ideas:
> - choosing virtio interrupts that are less likely to conflict (maybe >
>    1000)
> - make the virtio irq (optionally) configurable so that a user could
>    override the default irq and specify one that doesn't conflict
> - implementing support for virq != pirq (even the xl interface doesn't
>    allow to specify the virq number for passthrough devices, see "irqs")

Good ideas. The first requires minimal effort. Couldn't we choose the virtio 
interrupt to allocate after making sure it is absent from the guest's "irqs" 
(d_config->b_info.irqs[i]), i.e. find some holes?
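A minimal sketch of that hole-finding (illustrative names, not the actual libxl interface): skip any SPI already claimed in b_info.irqs[] when handing out virtio interrupts.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Is irq already claimed by a passthrough device? */
static bool irq_is_reserved(uint32_t irq, const uint32_t *irqs,
                            unsigned int num_irqs)
{
    for ( unsigned int i = 0; i < num_irqs; i++ )
        if ( irqs[i] == irq )
            return true;

    return false;
}

/*
 * Hand out the next free virtio SPI, skipping holes occupied by
 * passthrough IRQs. "*next" keeps the allocation cursor across calls.
 */
static uint32_t alloc_virtio_irq(uint32_t *next, const uint32_t *irqs,
                                 unsigned int num_irqs)
{
    while ( irq_is_reserved(*next, irqs, num_irqs) )
        (*next)++;

    return (*next)++;
}
```

With IRQs 33 and 35 reserved for passthrough and the cursor starting at 33, successive calls would hand out 34 and 36.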


-- 
Regards,

Oleksandr Tyshchenko




* Re: [RFC PATCH V1 05/12] hvm/dm: Introduce xendevicemodel_set_irq_level DM op
  2020-08-05 16:15   ` Jan Beulich
@ 2020-08-05 22:12     ` Oleksandr
  0 siblings, 0 replies; 140+ messages in thread
From: Oleksandr @ 2020-08-05 22:12 UTC (permalink / raw)
  To: Jan Beulich
  Cc: Stefano Stabellini, Julien Grall, Wei Liu, Andrew Cooper,
	Ian Jackson, George Dunlap, Oleksandr Tyshchenko, Julien Grall,
	xen-devel, Volodymyr Babchuk


On 05.08.20 19:15, Jan Beulich wrote:

Hi, Jan

> On 03.08.2020 20:21, Oleksandr Tyshchenko wrote:
>> --- a/xen/include/public/hvm/dm_op.h
>> +++ b/xen/include/public/hvm/dm_op.h
>> @@ -417,6 +417,20 @@ struct xen_dm_op_pin_memory_cacheattr {
>>       uint32_t pad;
>>   };
>>   
>> +/*
>> + * XEN_DMOP_set_irq_level: Set the logical level of a one of a domain's
>> + *                         IRQ lines.
>> + * XXX Handle PPIs.
>> + */
>> +#define XEN_DMOP_set_irq_level 19
>> +
>> +struct xen_dm_op_set_irq_level {
>> +    uint32_t irq;
>> +    /* IN - Level: 0 -> deasserted, 1 -> asserted */
>> +    uint8_t  level;
>> +};
> If this is the way to go (I've seen other discussion going on),
> please make sure you add explicit padding fields and ...

ok


>
>> +
>> +
>>   struct xen_dm_op {
> ... you don't add double blank lines.

ok


-- 
Regards,

Oleksandr Tyshchenko



^ permalink raw reply	[flat|nested] 140+ messages in thread

* Re: [RFC PATCH V1 05/12] hvm/dm: Introduce xendevicemodel_set_irq_level DM op
  2020-08-05  9:39     ` Julien Grall
@ 2020-08-06  0:37       ` Stefano Stabellini
  2020-08-06 11:32         ` Julien Grall
  0 siblings, 1 reply; 140+ messages in thread
From: Stefano Stabellini @ 2020-08-06  0:37 UTC (permalink / raw)
  To: Julien Grall
  Cc: Stefano Stabellini, Wei Liu, Andrew Cooper, Ian Jackson,
	George Dunlap, Oleksandr Tyshchenko, Oleksandr Tyshchenko,
	Julien Grall, Jan Beulich, xen-devel, Volodymyr Babchuk

On Wed, 5 Aug 2020, Julien Grall wrote:
> On 05/08/2020 00:22, Stefano Stabellini wrote:
> > On Mon, 3 Aug 2020, Oleksandr Tyshchenko wrote:
> > > From: Oleksandr Tyshchenko <oleksandr_tyshchenko@epam.com>
> > > 
> > > This patch adds the ability for the device emulator to notify the other
> > > end (some entity running in the guest) using an SPI, and implements the
> > > Arm-specific bits for it. The proposed interface allows the emulator to
> > > set the logical level of one of a domain's IRQ lines.
> > > 
> > > Please note, this is a split/cleanup of Julien's PoC:
> > > "Add support for Guest IO forwarding to a device emulator"
> > > 
> > > Signed-off-by: Julien Grall <julien.grall@arm.com>
> > > Signed-off-by: Oleksandr Tyshchenko <oleksandr_tyshchenko@epam.com>
> > > ---
> > >   tools/libs/devicemodel/core.c                   | 18 ++++++++++++++++++
> > >   tools/libs/devicemodel/include/xendevicemodel.h |  4 ++++
> > >   tools/libs/devicemodel/libxendevicemodel.map    |  1 +
> > >   xen/arch/arm/dm.c                               | 22 +++++++++++++++++++++-
> > >   xen/common/hvm/dm.c                             |  1 +
> > >   xen/include/public/hvm/dm_op.h                  | 15 +++++++++++++++
> > >   6 files changed, 60 insertions(+), 1 deletion(-)
> > > 
> > > diff --git a/tools/libs/devicemodel/core.c b/tools/libs/devicemodel/core.c
> > > index 4d40639..30bd79f 100644
> > > --- a/tools/libs/devicemodel/core.c
> > > +++ b/tools/libs/devicemodel/core.c
> > > @@ -430,6 +430,24 @@ int xendevicemodel_set_isa_irq_level(
> > >       return xendevicemodel_op(dmod, domid, 1, &op, sizeof(op));
> > >   }
> > >   +int xendevicemodel_set_irq_level(
> > > +    xendevicemodel_handle *dmod, domid_t domid, uint32_t irq,
> > > +    unsigned int level)
> > 
> > It is a pity that having xen_dm_op_set_pci_intx_level and
> > xen_dm_op_set_isa_irq_level already we need to add a third one, but from
> > the names alone I don't think we can reuse either of them.
> 
> The problem is not the name...
> 
> > 
> > It is very similar to set_isa_irq_level. We could almost rename
> > xendevicemodel_set_isa_irq_level to xendevicemodel_set_irq_level or,
> > better, just add an alias to it so that xendevicemodel_set_irq_level is
> > implemented by calling xendevicemodel_set_isa_irq_level. Honestly I am
> > not sure if it is worth doing it though. Any other opinions?
> 
> ... the problem is the interrupt field is only 8-bit. So we would only be able
> to cover IRQ 0 - 255.

Argh, that's not going to work :-(  I wasn't sure if it was a good idea
anyway.


> It is not entirely clear how the existing subop could be extended without
> breaking existing callers.
>
> > But I think we should plan for not needing two calls (one to set level
> > to 1, and one to set it to 0):
> > https://marc.info/?l=xen-devel&m=159535112027405
> 
> I am not sure to understand your suggestion here? Are you suggesting to remove
> the 'level' parameter?

My hope was to make it optional to call the hypercall with level = 0,
not necessarily to remove 'level' from the struct.


^ permalink raw reply	[flat|nested] 140+ messages in thread

* Re: [RFC PATCH V1 01/12] hvm/ioreq: Make x86's IOREQ feature common
  2020-08-05  7:01           ` Jan Beulich
@ 2020-08-06  0:37             ` Stefano Stabellini
  2020-08-06  6:59               ` Jan Beulich
  2020-08-07 16:45               ` Oleksandr
  0 siblings, 2 replies; 140+ messages in thread
From: Stefano Stabellini @ 2020-08-06  0:37 UTC (permalink / raw)
  To: Jan Beulich
  Cc: 'Kevin Tian',
	Stefano Stabellini, Julien Grall, 'Wei Liu',
	paul, 'Andrew Cooper', 'Ian Jackson',
	'George Dunlap', 'Tim Deegan',
	Oleksandr, 'Oleksandr Tyshchenko', 'Julien Grall',
	'Jun Nakajima', xen-devel, 'Roger Pau Monné'

On Wed, 5 Aug 2020, Jan Beulich wrote:
> On 04.08.2020 21:11, Stefano Stabellini wrote:
> >> The point of the check isn't to determine whether to wait, but
> >> what to do after having waited. Reads need a retry round through
> >> the emulator (to store the result in the designated place),
> >> while writes don't have such a requirement (and hence guest
> >> execution can continue immediately in the general case).
> > 
> > The x86 code looks like this:
> > 
> >             rc = hvm_send_ioreq(s, &p, 0);
> >             if ( rc != X86EMUL_RETRY || currd->is_shutting_down )
> >                 vio->io_req.state = STATE_IOREQ_NONE;
> >             else if ( !hvm_ioreq_needs_completion(&vio->io_req) )
> >                 rc = X86EMUL_OKAY;
> > 
> > Basically hvm_send_ioreq is expected to return RETRY.
> > Then, if it is a PIO write operation only, it is turned into OKAY right
> > away. Otherwise, rc stays as RETRY.
> > 
> > So, normally, hvmemul_do_io is expected to return RETRY, because the
> > emulator is not done yet. Am I understanding the code correctly?
> 
> "The emulator" unfortunately is ambiguous here: Do you mean qemu
> (or whichever else ioreq server) or the x86 emulator inside Xen?

I meant QEMU. I'll use "QEMU" instead of "emulator" in this thread going
forward for clarity.


> There are various conditions leading to RETRY. As far as
> hvm_send_ioreq() goes, it is expected to return RETRY whenever
> some sort of response is to be expected (the most notable
> exception being the hvm_send_buffered_ioreq() path), or when
> submitting the request isn't possible in the first place.
> 
> > If so, who is handling RETRY on x86? It tried to follow the call chain
> > but ended up in the x86 emulator and got lost :-)
> 
> Not sure I understand the question correctly, but I'll try an
> answer nevertheless: hvm_send_ioreq() arranges for the vCPU to be
> put to sleep (prepare_wait_on_xen_event_channel()). Once the event
> channel got signaled (and vCPU unblocked), hvm_do_resume() ->
> handle_hvm_io_completion() -> hvm_wait_for_io() then check whether
> the wait reason has been satisfied (wait_on_xen_event_channel()),
> and ...
> 
> > At some point later, after the emulator (QEMU) has completed the
> > request, handle_hvm_io_completion gets called which ends up calling
> > handle_mmio() finishing the job on the Xen side too.
> 
> ..., as you say, handle_hvm_io_completion() invokes the retry of
> the original operation (handle_mmio() or handle_pio() in
> particular) if need be.

OK, thanks for the details. My interpretation seems to be correct.

In which case, it looks like xen/arch/arm/io.c:try_fwd_ioserv should
return IO_RETRY. Then, xen/arch/arm/traps.c:do_trap_stage2_abort_guest
also needs to handle try_handle_mmio returning IO_RETRY the first time
around, and IO_HANDLED after QEMU does its job.

What should do_trap_stage2_abort_guest do on IO_RETRY? Simply return
early and let the scheduler do its job? Something like:

            enum io_state state = try_handle_mmio(regs, hsr, gpa);

            switch ( state )
            {
            case IO_ABORT:
                goto inject_abt;
            case IO_HANDLED:
                advance_pc(regs, hsr);
                return;
            case IO_RETRY:
                /* finish later */
                return;
            case IO_UNHANDLED:
                /* IO unhandled, try another way to handle it. */
                break;
            default:
                ASSERT_UNREACHABLE();
            }

Then, xen/arch/arm/ioreq.c:handle_mmio() gets called by
handle_hvm_io_completion() after QEMU completes the emulation. Today,
handle_mmio just sets the user register with the read value.

But it would be better if it called the original function
do_trap_stage2_abort_guest again to actually retry the original operation.
This time do_trap_stage2_abort_guest calls try_handle_mmio() and gets
IO_HANDLED instead of IO_RETRY, thus, it will advance_pc (the program
counter) completing the handling of this instruction.

The user register with the read value could be set by try_handle_mmio if
try_fwd_ioserv returns IO_HANDLED instead of IO_RETRY.

Is that how the state machine is expected to work?


> What's potentially confusing is that there's a second form of
> retry, invoked by the x86 insn emulator itself when it needs to
> split complex insns (the repeated string insns being the most
> important example). This results in actually exiting back to guest
> context without having advanced rIP, but after having updated
> other register state suitably (to express the progress made so
> far).

Ah! And it seems to be exactly the same label: X86EMUL_RETRY. It would
be a good idea to differentiate between them.


^ permalink raw reply	[flat|nested] 140+ messages in thread

* Re: [RFC PATCH V1 01/12] hvm/ioreq: Make x86's IOREQ feature common
  2020-08-05  8:33           ` Julien Grall
@ 2020-08-06  0:37             ` Stefano Stabellini
  2020-08-06  9:45               ` Julien Grall
  0 siblings, 1 reply; 140+ messages in thread
From: Stefano Stabellini @ 2020-08-06  0:37 UTC (permalink / raw)
  To: Julien Grall
  Cc: 'Kevin Tian', Stefano Stabellini, 'Jun Nakajima',
	'Wei Liu', paul, 'Andrew Cooper',
	'Ian Jackson', 'George Dunlap',
	'Tim Deegan', Oleksandr, 'Oleksandr Tyshchenko',
	'Julien Grall', 'Jan Beulich',
	xen-devel, 'Roger Pau Monné'

On Wed, 5 Aug 2020, Julien Grall wrote:
> On 04/08/2020 20:11, Stefano Stabellini wrote:
> > On Tue, 4 Aug 2020, Julien Grall wrote:
> > > On 04/08/2020 12:10, Oleksandr wrote:
> > > > On 04.08.20 10:45, Paul Durrant wrote:
> > > > > > +static inline bool hvm_ioreq_needs_completion(const ioreq_t *ioreq)
> > > > > > +{
> > > > > > +    return ioreq->state == STATE_IOREQ_READY &&
> > > > > > +           !ioreq->data_is_ptr &&
> > > > > > +           (ioreq->type != IOREQ_TYPE_PIO || ioreq->dir !=
> > > > > > IOREQ_WRITE);
> > > > > > +}
> > > > > I don't think having this in common code is correct. The short-cut of
> > > > > not
> > > > > completing PIO reads seems somewhat x86 specific.
> > > 
> > > Hmmm, looking at the code, I think it doesn't wait for PIO writes to
> > > complete
> > > (not read). Did I miss anything?
> > > 
> > > > Does ARM even
> > > > > have the concept of PIO?
> > > > 
> > > > I am not 100% sure here, but it seems that doesn't have.
> > > 
> > > Technically, the PIOs exist on Arm, however they are accessed the same way
> > > as
> > > MMIO and will have a dedicated area defined by the HW.
> > > 
> > > AFAICT, on Arm64, they are only used for PCI IO Bar.
> > > 
> > > Now the question is whether we want to expose them to the Device Emulator
> > > as
> > > PIO or MMIO access. From a generic PoV, a DM shouldn't have to care about
> > > the
> > > architecture used. It should just be able to request a given IOport
> > > region.
> > > 
> > > So it may make sense to differentiate them in the common ioreq code as
> > > well.
> > > 
> > > I had a quick look at QEMU and wasn't able to tell if PIOs and MMIOs
> > > address
> > > space are different on Arm as well. Paul, Stefano, do you know what they
> > > are
> > > doing?
> > 
> > On the QEMU side, it looks like PIO (address_space_io) is used in
> > connection with the emulation of the "in" or "out" instructions, see
> > ioport.c:cpu_inb for instance. Some parts of PCI on QEMU emulate PIO
> > space regardless of the architecture, such as
> > hw/pci/pci_bridge.c:pci_bridge_initfn.
> > 
> > However, because there is no "in" and "out" on ARM, I don't think
> > address_space_io can be accessed. Specifically, there is no equivalent
> > for target/i386/misc_helper.c:helper_inb on ARM.
> 
> So how PCI I/O BAR are accessed? Surely, they could be used on Arm, right?

PIO is also memory mapped on ARM and it seems to have its own MMIO
address window.

^ permalink raw reply	[flat|nested] 140+ messages in thread

* Re: [RFC PATCH V1 10/12] libxl: Add support for virtio-disk configuration
  2020-08-05 21:12     ` Oleksandr
@ 2020-08-06  0:37       ` Stefano Stabellini
  0 siblings, 0 replies; 140+ messages in thread
From: Stefano Stabellini @ 2020-08-06  0:37 UTC (permalink / raw)
  To: Oleksandr
  Cc: Stefano Stabellini, Julien Grall, Wei Liu, Ian Jackson,
	Oleksandr Tyshchenko, Anthony PERARD, xen-devel

On Thu, 6 Aug 2020, Oleksandr wrote:
> On 05.08.20 02:23, Stefano Stabellini wrote:
> > On Mon, 3 Aug 2020, Oleksandr Tyshchenko wrote:
> > > From: Oleksandr Tyshchenko <oleksandr_tyshchenko@epam.com>
> > > 
> > > This patch adds basic support for configuring and assisting the virtio-disk
> > > backend (emulator), which is intended to run outside of QEMU and could run
> > > in any domain.
> > > 
> > > Xenstore was chosen as the communication interface so that the emulator
> > > running in a non-toolstack domain can get its configuration either by
> > > reading Xenstore directly or by receiving command line parameters (an
> > > updated 'xl devd' running in the same domain would read Xenstore
> > > beforehand and call the backend executable with the required arguments).
> > > 
> > > An example of domain configuration (two disks are assigned to the guest,
> > > the latter is in readonly mode):
> > > 
> > > vdisk = [ 'backend=DomD, disks=rw:/dev/mmcblk0p3;ro:/dev/mmcblk1p3' ]
> > > 
> > > Where per-disk Xenstore entries are:
> > > - filename and readonly flag (configured via "vdisk" property)
> > > - base and irq (allocated dynamically)
> > > 
> > > Besides handling 'visible' params described in configuration file,
> > > patch also allocates virtio-mmio specific ones for each device and
> > > writes them into Xenstore. virtio-mmio params (irq and base) are
> > > unique per guest domain; they are allocated at domain creation time
> > > and passed through to the emulator. Each VirtIO device has at least
> > > one pair of these params.
> > > 
> > > TODO:
> > > 1. An extra "virtio" property could be removed.
> > > 2. Update documentation.
> > > 
> > > Signed-off-by: Oleksandr Tyshchenko <oleksandr_tyshchenko@epam.com>
> > > ---
> > >   tools/libxl/Makefile                 |   4 +-
> > >   tools/libxl/libxl_arm.c              |  63 +++++++++++++++----
> > >   tools/libxl/libxl_create.c           |   1 +
> > >   tools/libxl/libxl_internal.h         |   1 +
> > >   tools/libxl/libxl_types.idl          |  15 +++++
> > >   tools/libxl/libxl_types_internal.idl |   1 +
> > >   tools/libxl/libxl_virtio_disk.c      | 109 +++++++++++++++++++++++++++++++++
> > >   tools/xl/Makefile                    |   2 +-
> > >   tools/xl/xl.h                        |   3 +
> > >   tools/xl/xl_cmdtable.c               |  15 +++++
> > >   tools/xl/xl_parse.c                  | 115 +++++++++++++++++++++++++++++++++++
> > >   tools/xl/xl_virtio_disk.c            |  46 ++++++++++++++
> > >   12 files changed, 360 insertions(+), 15 deletions(-)
> > >   create mode 100644 tools/libxl/libxl_virtio_disk.c
> > >   create mode 100644 tools/xl/xl_virtio_disk.c
> > > 
> > > diff --git a/tools/libxl/Makefile b/tools/libxl/Makefile
> > > index 38cd43a..df94b13 100644
> > > --- a/tools/libxl/Makefile
> > > +++ b/tools/libxl/Makefile
> > > @@ -141,7 +141,9 @@ LIBXL_OBJS = flexarray.o libxl.o libxl_create.o
> > > libxl_dm.o libxl_pci.o \
> > >   			libxl_vtpm.o libxl_nic.o libxl_disk.o libxl_console.o
> > > \
> > >   			libxl_cpupool.o libxl_mem.o libxl_sched.o libxl_tmem.o
> > > \
> > >   			libxl_9pfs.o libxl_domain.o libxl_vdispl.o \
> > > -			libxl_pvcalls.o libxl_vsnd.o libxl_vkb.o
> > > $(LIBXL_OBJS-y)
> > > +			libxl_pvcalls.o libxl_vsnd.o libxl_vkb.o \
> > > +			libxl_virtio_disk.o $(LIBXL_OBJS-y)
> > > +
> > >   LIBXL_OBJS += libxl_genid.o
> > >   LIBXL_OBJS += _libxl_types.o libxl_flask.o _libxl_types_internal.o
> > >   diff --git a/tools/libxl/libxl_arm.c b/tools/libxl/libxl_arm.c
> > > index 4f748e3..469a8b0 100644
> > > --- a/tools/libxl/libxl_arm.c
> > > +++ b/tools/libxl/libxl_arm.c
> > > @@ -13,6 +13,12 @@
> > >   #define GUEST_VIRTIO_MMIO_SIZE  xen_mk_ullong(0x200)
> > >   #define GUEST_VIRTIO_MMIO_SPI   33
> > >   +#ifndef container_of
> > > +#define container_of(ptr, type, member) ({			\
> > > +        typeof( ((type *)0)->member ) *__mptr = (ptr);	\
> > > +        (type *)( (char *)__mptr - offsetof(type,member) );})
> > > +#endif
> > > +
> > >   static const char *gicv_to_string(libxl_gic_version gic_version)
> > >   {
> > >       switch (gic_version) {
> > > @@ -44,14 +50,32 @@ int libxl__arch_domain_prepare_config(libxl__gc *gc,
> > >           vuart_enabled = true;
> > >       }
> > >   -    /*
> > > -     * XXX: Handle properly virtio
> > > -     * A proper solution would be the toolstack to allocate the
> > > interrupts
> > > -     * used by each virtio backend and let the backend now which one is
> > > used
> > > -     */
> > >       if (libxl_defbool_val(d_config->b_info.arch_arm.virtio)) {
> > > -        nr_spis += (GUEST_VIRTIO_MMIO_SPI - 32) + 1;
> > > +        uint64_t virtio_base;
> > > +        libxl_device_virtio_disk *virtio_disk;
> > > +
> > > +        virtio_base = GUEST_VIRTIO_MMIO_BASE;
> > >           virtio_irq = GUEST_VIRTIO_MMIO_SPI;
> > > +
> > > +        if (!d_config->num_virtio_disks) {
> > > +            LOG(ERROR, "Virtio is enabled, but no Virtio devices present\n");
> > > +            return ERROR_FAIL;
> > > +        }
> > > +        virtio_disk = &d_config->virtio_disks[0];
> > > +
> > > +        for (i = 0; i < virtio_disk->num_disks; i++) {
> > > +            virtio_disk->disks[i].base = virtio_base;
> > > +            virtio_disk->disks[i].irq = virtio_irq;
> > > +
> > > +            LOG(DEBUG, "Allocate Virtio MMIO params: IRQ %u BASE 0x%"PRIx64,
> > > +                virtio_irq, virtio_base);
> > > +
> > > +            virtio_irq ++;
> > > +            virtio_base += GUEST_VIRTIO_MMIO_SIZE;
> > > +        }
> > > +        virtio_irq --;
> > > +
> > > +        nr_spis += (virtio_irq - 32) + 1;
> > It looks like it is an interrupt per device, which could lead to quite a
> > few of them being allocated.
> 
> Yes.
> 
> 
> > So, if we end up allocating
> > let's say 6 virtio interrupts for a domain, the chance of a clash with a
> > physical interrupt of a passthrough device is real.
> 
> Yes.
> 
> 
> > 
> > I am not entirely sure how to solve it, but these are a few ideas:
> > - choosing virtio interrupts that are less likely to conflict (maybe >
> >    1000)
> > - make the virtio irq (optionally) configurable so that a user could
> >    override the default irq and specify one that doesn't conflict
> > - implementing support for virq != pirq (even the xl interface doesn't
> >    allow to specify the virq number for passthrough devices, see "irqs")
> 
> Good ideas. The first requires minimal effort. Couldn't we choose the virtio
> interrupt to allocate after making sure it is absent from the guest's "irqs"
> list (d_config->b_info.irqs[i]), i.e. look for holes?

Yes, that might be possible too.

So far we have tried to stay away from dynamic irq allocation for guests'
virtual devices, but we also didn't have to deal with a potentially large
amount of them :-)


^ permalink raw reply	[flat|nested] 140+ messages in thread

* Re: [RFC PATCH V1 01/12] hvm/ioreq: Make x86's IOREQ feature common
  2020-08-06  0:37             ` Stefano Stabellini
@ 2020-08-06  6:59               ` Jan Beulich
  2020-08-06 20:32                 ` Stefano Stabellini
  2020-08-07 16:45               ` Oleksandr
  1 sibling, 1 reply; 140+ messages in thread
From: Jan Beulich @ 2020-08-06  6:59 UTC (permalink / raw)
  To: Stefano Stabellini
  Cc: 'Kevin Tian', Julien Grall, 'Wei Liu',
	paul, 'Andrew Cooper', 'Ian Jackson',
	'George Dunlap', 'Tim Deegan',
	Oleksandr, 'Oleksandr Tyshchenko', 'Julien Grall',
	'Jun Nakajima', xen-devel, 'Roger Pau Monné'

On 06.08.2020 02:37, Stefano Stabellini wrote:
> What should do_trap_stage2_abort_guest do on IO_RETRY? Simply return
> early and let the scheduler do its job? Something like:
> 
>             enum io_state state = try_handle_mmio(regs, hsr, gpa);
> 
>             switch ( state )
>             {
>             case IO_ABORT:
>                 goto inject_abt;
>             case IO_HANDLED:
>                 advance_pc(regs, hsr);
>                 return;
>             case IO_RETRY:
>                 /* finish later */
>                 return;
>             case IO_UNHANDLED:
>                 /* IO unhandled, try another way to handle it. */
>                 break;
>             default:
>                 ASSERT_UNREACHABLE();
>             }
> 
> Then, xen/arch/arm/ioreq.c:handle_mmio() gets called by
> handle_hvm_io_completion() after QEMU completes the emulation. Today,
> handle_mmio just sets the user register with the read value.
> 
> But it would be better if it called the original function
> do_trap_stage2_abort_guest again to actually retry the original operation.
> This time do_trap_stage2_abort_guest calls try_handle_mmio() and gets
> IO_HANDLED instead of IO_RETRY, thus, it will advance_pc (the program
> counter) completing the handling of this instruction.
> 
> The user register with the read value could be set by try_handle_mmio if
> try_fwd_ioserv returns IO_HANDLED instead of IO_RETRY.
> 
> Is that how the state machine is expected to work?

I think so. Just because it has taken us quite some time (years) on
the x86 side to get reasonably close to how hardware would behave
(I think we're still not fully there): The re-execution path needs
to make sure it observes exactly the same machine state as the
original path did. In particular changes to memory (by another vCPU)
must not be observed.

Jan


^ permalink raw reply	[flat|nested] 140+ messages in thread

* Re: [RFC PATCH V1 01/12] hvm/ioreq: Make x86's IOREQ feature common
  2020-08-05 16:15   ` Andrew Cooper
@ 2020-08-06  8:20     ` Oleksandr
  0 siblings, 0 replies; 140+ messages in thread
From: Oleksandr @ 2020-08-06  8:20 UTC (permalink / raw)
  To: Andrew Cooper, xen-devel
  Cc: Kevin Tian, Stefano Stabellini, Julien Grall, Jan Beulich,
	Wei Liu, Paul Durrant, Tim Deegan, Ian Jackson, George Dunlap,
	Oleksandr Tyshchenko, Julien Grall, Jun Nakajima,
	Roger Pau Monné


On 05.08.20 19:15, Andrew Cooper wrote:

Hi Andrew

> On 03/08/2020 19:21, Oleksandr Tyshchenko wrote:
>> diff --git a/xen/common/Makefile b/xen/common/Makefile
>> index 06881d0..f6fc3f8 100644
>> --- a/xen/common/Makefile
>> +++ b/xen/common/Makefile
>> @@ -70,6 +70,7 @@ extra-y := symbols-dummy.o
>>   
>>   obj-$(CONFIG_COVERAGE) += coverage/
>>   obj-y += sched/
>> +obj-$(CONFIG_IOREQ_SERVER) += hvm/
>>   obj-$(CONFIG_UBSAN) += ubsan/
>>   
>>   obj-$(CONFIG_NEEDS_LIBELF) += libelf/
>> diff --git a/xen/common/hvm/Makefile b/xen/common/hvm/Makefile
>> new file mode 100644
>> index 0000000..326215d
>> --- /dev/null
>> +++ b/xen/common/hvm/Makefile
>> @@ -0,0 +1 @@
>> +obj-y += ioreq.o
>> diff --git a/xen/common/hvm/ioreq.c b/xen/common/hvm/ioreq.c
>> new file mode 100644
>> index 0000000..7e1fa23
>> --- /dev/null
>> +++ b/xen/common/hvm/ioreq.c
>> <snip>
> HVM is an internal detail of arch specific code.  It should not escape
> into common code.
>
>  From x86's point of view, there is nothing conceptually wrong with
> having an IOREQ server for PV guests, although it is very unlikely at
> this point that adding support would be a good use of time.

Got it.


> Please make this into a proper top-level common set of functionality.

ok.


-- 
Regards,

Oleksandr Tyshchenko



^ permalink raw reply	[flat|nested] 140+ messages in thread

* Re: [RFC PATCH V1 07/12] A collection of tweaks to be able to run emulator in driver domain
  2020-08-05 16:40     ` Paul Durrant
@ 2020-08-06  9:22       ` Oleksandr
  2020-08-06  9:27         ` Jan Beulich
  0 siblings, 1 reply; 140+ messages in thread
From: Oleksandr @ 2020-08-06  9:22 UTC (permalink / raw)
  To: paul, 'Jan Beulich'
  Cc: 'Stefano Stabellini', 'Julien Grall',
	'Wei Liu', 'Andrew Cooper', 'Ian Jackson',
	'George Dunlap', 'Oleksandr Tyshchenko',
	xen-devel, 'Daniel De Graaf'


On 05.08.20 19:40, Paul Durrant wrote:

Hi Jan, Paul

>> -----Original Message-----
>> From: Jan Beulich <jbeulich@suse.com>
>> Sent: 05 August 2020 17:20
>> To: Oleksandr Tyshchenko <olekstysh@gmail.com>; Paul Durrant <paul@xen.org>
>> Cc: xen-devel@lists.xenproject.org; Oleksandr Tyshchenko <oleksandr_tyshchenko@epam.com>; Andrew
>> Cooper <andrew.cooper3@citrix.com>; George Dunlap <george.dunlap@citrix.com>; Ian Jackson
>> <ian.jackson@eu.citrix.com>; Julien Grall <julien@xen.org>; Stefano Stabellini
>> <sstabellini@kernel.org>; Wei Liu <wl@xen.org>; Daniel De Graaf <dgdegra@tycho.nsa.gov>
>> Subject: Re: [RFC PATCH V1 07/12] A collection of tweaks to be able to run emulator in driver domain
>>
>> On 03.08.2020 20:21, Oleksandr Tyshchenko wrote:
>>> From: Oleksandr Tyshchenko <oleksandr_tyshchenko@epam.com>
>>>
>>> While trying to run an emulator in a driver domain I ran into various
>>> issues, mostly policy-related. So this patch tries to resolve all of
>>> them, probably in a hackish way. I would like to get feedback on how
>>> to implement them properly, as having an emulator in a driver domain
>>> is a completely valid use-case.
>>  From going over the comments I can only derive you want to run
>> an emulator in a driver domain, which doesn't really make sense
>> to me. A driver domain has a different purpose after all. If
>> instead you mean it to be run in just some other domain (which
>> also isn't the domain controlling the target), then there may
>> be more infrastructure changes needed.
>>
>> Paul - was/is your standalone ioreq server (demu?) able to run
>> in other than the domain controlling a guest?
>>
> Not something I've done yet, but it was always part of the idea so that we could e.g. pass through a device to a dedicated domain and then run multiple demu instances there to virtualize it for many domUs. (I'm thinking here of a device that is not SR-IOV and hence would need some bespoke emulation code to share it out.) That dedicated domain would be termed the 'driver domain' simply because it is running the device driver for the h/w that underpins the emulation.

I may be abusing the "driver domain" terminology, but indeed in our 
use-case we pass through a set of H/W devices to a dedicated domain 
which runs the device drivers for that H/W. Our target system comprises 
a thin Dom0 (without H/W devices at all), DomD (which owns most of the 
H/W devices) and DomU (which runs on virtual devices). This patch makes 
changes on the Xen side to be able to run a standalone ioreq server 
(emulator) in that dedicated (driver?) domain. Actually, the virtio-mmio 
PoC is based on the IOREQ/DM features, with an emulator (based on demu) 
acting as a virtio-mmio backend. But there may be various use-cases for 
that (some mediator for sharing a specific H/W resource between guests, 
or a custom PCI emulator, for example). If this is valid from a Xen PoV, 
I would be happy to get feedback on how to transform the tweaks (hacks) 
in the current patch into proper support.


-- 
Regards,

Oleksandr Tyshchenko



^ permalink raw reply	[flat|nested] 140+ messages in thread

* Re: [RFC PATCH V1 07/12] A collection of tweaks to be able to run emulator in driver domain
  2020-08-06  9:22       ` Oleksandr
@ 2020-08-06  9:27         ` Jan Beulich
  2020-08-14 16:30           ` Oleksandr
  0 siblings, 1 reply; 140+ messages in thread
From: Jan Beulich @ 2020-08-06  9:27 UTC (permalink / raw)
  To: Oleksandr
  Cc: 'Stefano Stabellini', 'Julien Grall',
	'Wei Liu', paul, 'Andrew Cooper',
	'Ian Jackson', 'George Dunlap',
	'Oleksandr Tyshchenko',
	xen-devel, 'Daniel De Graaf'

On 06.08.2020 11:22, Oleksandr wrote:
> 
> On 05.08.20 19:40, Paul Durrant wrote:
> 
> Hi Jan, Paul
> 
>>> -----Original Message-----
>>> From: Jan Beulich <jbeulich@suse.com>
>>> Sent: 05 August 2020 17:20
>>> To: Oleksandr Tyshchenko <olekstysh@gmail.com>; Paul Durrant <paul@xen.org>
>>> Cc: xen-devel@lists.xenproject.org; Oleksandr Tyshchenko <oleksandr_tyshchenko@epam.com>; Andrew
>>> Cooper <andrew.cooper3@citrix.com>; George Dunlap <george.dunlap@citrix.com>; Ian Jackson
>>> <ian.jackson@eu.citrix.com>; Julien Grall <julien@xen.org>; Stefano Stabellini
>>> <sstabellini@kernel.org>; Wei Liu <wl@xen.org>; Daniel De Graaf <dgdegra@tycho.nsa.gov>
>>> Subject: Re: [RFC PATCH V1 07/12] A collection of tweaks to be able to run emulator in driver domain
>>>
>>> On 03.08.2020 20:21, Oleksandr Tyshchenko wrote:
>>>> From: Oleksandr Tyshchenko <oleksandr_tyshchenko@epam.com>
>>>>
>>>> While trying to run an emulator in a driver domain I ran into various
>>>> issues, mostly policy-related. So this patch tries to resolve all of
>>>> them, probably in a hackish way. I would like to get feedback on how
>>>> to implement them properly, as having an emulator in a driver domain
>>>> is a completely valid use-case.
>>>  From going over the comments I can only derive you want to run
>>> an emulator in a driver domain, which doesn't really make sense
>>> to me. A driver domain has a different purpose after all. If
>>> instead you mean it to be run in just some other domain (which
>>> also isn't the domain controlling the target), then there may
>>> be more infrastructure changes needed.
>>>
>>> Paul - was/is your standalone ioreq server (demu?) able to run
>>> in other than the domain controlling a guest?
>>>
>> Not something I've done yet, but it was always part of the idea so that we could e.g. pass through a device to a dedicated domain and then run multiple demu instances there to virtualize it for many domUs. (I'm thinking here of a device that is not SR-IOV and hence would need some bespoke emulation code to share it out.) That dedicated domain would be termed the 'driver domain' simply because it is running the device driver for the h/w that underpins the emulation.
> 
> I may abuse "driver domain" terminology, but indeed in our use-case we 
> pass through a set of H/W devices to a dedicated domain which is running 
> the device drivers for that H/Ws. Our target system comprises a thin 
> Dom0 (without H/W devices at all), DomD (which owns most of the H/W 
> devices) and DomU which runs on virtual devices. This patch tries to 
> make changes at Xen side to be able run standalone ioreq server 
> (emulator) in that dedicated (driver?) domain.

Okay, in which case I'm fine with the term. I simply wasn't aware of the
targeted scenario, sorry.

Jan


^ permalink raw reply	[flat|nested] 140+ messages in thread

* Re: [RFC PATCH V1 01/12] hvm/ioreq: Make x86's IOREQ feature common
  2020-08-06  0:37             ` Stefano Stabellini
@ 2020-08-06  9:45               ` Julien Grall
  2020-08-06 23:48                 ` Stefano Stabellini
  0 siblings, 1 reply; 140+ messages in thread
From: Julien Grall @ 2020-08-06  9:45 UTC (permalink / raw)
  To: Stefano Stabellini
  Cc: 'Kevin Tian', 'Jun Nakajima', 'Wei Liu',
	paul, 'Andrew Cooper', 'Ian Jackson',
	'George Dunlap', 'Tim Deegan',
	Oleksandr, 'Oleksandr Tyshchenko', 'Julien Grall',
	'Jan Beulich', xen-devel, 'Roger Pau Monné'

Hi,

On 06/08/2020 01:37, Stefano Stabellini wrote:
> On Wed, 5 Aug 2020, Julien Grall wrote:
>> On 04/08/2020 20:11, Stefano Stabellini wrote:
>>> On Tue, 4 Aug 2020, Julien Grall wrote:
>>>> On 04/08/2020 12:10, Oleksandr wrote:
>>>>> On 04.08.20 10:45, Paul Durrant wrote:
>>>>>>> +static inline bool hvm_ioreq_needs_completion(const ioreq_t *ioreq)
>>>>>>> +{
>>>>>>> +    return ioreq->state == STATE_IOREQ_READY &&
>>>>>>> +           !ioreq->data_is_ptr &&
>>>>>>> +           (ioreq->type != IOREQ_TYPE_PIO || ioreq->dir != IOREQ_WRITE);
>>>>>>> +}
>>>>>> I don't think having this in common code is correct. The short-cut 
>>>>>> of not completing PIO reads seems somewhat x86 specific.
>>>>
>>>> Hmmm, looking at the code, I think it doesn't wait for PIO writes to
>>>> complete (not reads). Did I miss anything?
>>>>
>>>>>> Does ARM even have the concept of PIO?
>>>>>
>>>>> I am not 100% sure here, but it seems that it doesn't.
>>>>
>>>> Technically, PIOs exist on Arm, however they are accessed the same
>>>> way as MMIO and will have a dedicated area defined by the HW.
>>>>
>>>> AFAICT, on Arm64, they are only used for PCI I/O BARs.
>>>>
>>>> Now the question is whether we want to expose them to the Device
>>>> Emulator as PIO or MMIO accesses. From a generic PoV, a DM shouldn't
>>>> have to care about the architecture used. It should just be able to
>>>> request a given IOport region.
>>>>
>>>> So it may make sense to differentiate them in the common ioreq code
>>>> as well.
>>>>
>>>> I had a quick look at QEMU and wasn't able to tell if the PIO and
>>>> MMIO address spaces are different on Arm as well. Paul, Stefano, do
>>>> you know what they are doing?
>>>
>>> On the QEMU side, it looks like PIO (address_space_io) is used in
>>> connection with the emulation of the "in" or "out" instructions, see
>>> ioport.c:cpu_inb for instance. Some parts of PCI on QEMU emulate PIO
>>> space regardless of the architecture, such as
>>> hw/pci/pci_bridge.c:pci_bridge_initfn.
>>>
>>> However, because there is no "in" and "out" on ARM, I don't think
>>> address_space_io can be accessed. Specifically, there is no equivalent
>>> for target/i386/misc_helper.c:helper_inb on ARM.
>>
>> So how PCI I/O BAR are accessed? Surely, they could be used on Arm, right?
> 
> PIO is also memory mapped on ARM and it seems to have its own MMIO
> address window.
This part is already well-understood :). However, this only tells us how 
an OS accesses a PIO.

What I am trying to figure out is how the hardware (or QEMU) is meant to 
work.

 From my understanding, the MMIO access will be received by the 
hostbridge and then forwarded to the appropriate PCI device. The two 
questions I am trying to answer are: how are the I/O BARs configured? 
Will they contain an MMIO address or an offset?

If the answer is the latter, then we will need PIO because a DM will 
never see the MMIO address (the hostbridge will be emulated in Xen).

I am still trying to navigate through the code and didn't manage to find 
an answer so far.
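In the meantime, a toy decode may make the two possibilities concrete. This is an illustration only: the window base/size are made-up values, and the assumption (to be confirmed) is that the hostbridge maps a CPU-visible MMIO window onto PCI I/O space, so an I/O BAR holds an offset within that space rather than a CPU physical address.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Toy hostbridge decode, for illustration only. The window base and size
 * below are made-up numbers, not taken from any real platform. The
 * hostbridge maps a CPU-visible MMIO window onto PCI I/O space, and an
 * I/O BAR then holds an address in that I/O space (an offset), not a
 * CPU physical address.
 */
#define IO_WINDOW_BASE  0x3eff0000UL   /* CPU physical base of the window */
#define IO_WINDOW_SIZE  0x00010000UL   /* 64K of PCI I/O space behind it  */

/* Translate a CPU MMIO access inside the window to a PCI I/O address. */
static uint32_t mmio_to_pio(uint64_t gpa)
{
    assert(gpa >= IO_WINDOW_BASE && gpa < IO_WINDOW_BASE + IO_WINDOW_SIZE);
    return (uint32_t)(gpa - IO_WINDOW_BASE);
}
```

If the BARs indeed hold I/O-space offsets as in this model, a DM would only ever see the right-hand side of this translation, which would be the argument for keeping a PIO-like interface in the common ioreq code.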

Cheers,

-- 
Julien Grall


^ permalink raw reply	[flat|nested] 140+ messages in thread

* Re: [RFC PATCH V1 04/12] xen/arm: Introduce arch specific bits for IOREQ/DM features
  2020-08-05 15:41       ` Oleksandr
@ 2020-08-06 10:19         ` Julien Grall
  0 siblings, 0 replies; 140+ messages in thread
From: Julien Grall @ 2020-08-06 10:19 UTC (permalink / raw)
  To: Oleksandr
  Cc: Stefano Stabellini, Wei Liu, Andrew Cooper, Ian Jackson,
	George Dunlap, Oleksandr Tyshchenko, Julien Grall, Jan Beulich,
	xen-devel, Daniel De Graaf, Volodymyr Babchuk

Hi,

On 05/08/2020 16:41, Oleksandr wrote:
> 
> On 05.08.20 12:32, Julien Grall wrote:
> 
> Hi Julien.
> 
>> Hi Stefano,
>>
>> On 05/08/2020 00:22, Stefano Stabellini wrote:
>>> On Mon, 3 Aug 2020, Oleksandr Tyshchenko wrote:
>>>> From: Oleksandr Tyshchenko <oleksandr_tyshchenko@epam.com>
>>>>
>>>> This patch makes it possible to forward guest MMIO accesses
>>>> to a device emulator on Arm and enables that support for
>>>> Arm64.
>>>>
>>>> Also update XSM code a bit to let DM op be used on Arm.
>>>> New arch DM op will be introduced in the follow-up patch.
>>>>
>>>> Please note, at the moment build on Arm32 is broken
>>>> (see cmpxchg usage in hvm_send_buffered_ioreq()) if someone
>>>
>>> Speaking of buffered_ioreq, if I recall correctly, they were only used
>>> for VGA-related things on x86. It looks like it is still true.
>>>
>>> If so, do we need it on ARM? Note that I don't think we can get rid of
>>> it from the interface as it is baked into ioreq, but it might be
>>> possible to have a dummy implementation on ARM. Or maybe not: looking at
>>> xen/common/hvm/ioreq.c it looks like it would be difficult to
>>> disentangle bufioreq stuff from the rest of the code.
>>
>> We possibly don't need it right now. However, this could be used in 
>> the future (e.g. for a virtio notification doorbell).
>>
>>>> @@ -2275,6 +2282,16 @@ static void check_for_vcpu_work(void)
>>>>    */
>>>>   void leave_hypervisor_to_guest(void)
>>>>   {
>>>> +#ifdef CONFIG_IOREQ_SERVER
>>>> +    /*
>>>> +     * XXX: Check the return. Shall we call that in
>>>> +     * continue_running and context_switch instead?
>>>> +     * The benefits would be to avoid calling
>>>> +     * handle_hvm_io_completion on every return.
>>>> +     */
>>>
>>> Yeah, that could be a simple and good optimization
>>
>> Well, it is not as simple as it sounds :). handle_hvm_io_completion() 
>> is the function in charge of marking the vCPU as waiting for I/O. So we 
>> would at least need to split the function.
>>
>> I wrote this TODO because I wasn't sure about the complexity of 
>> handle_hvm_io_completion(current). Looking at it again, the main 
>> complexity is the looping over the IOREQ servers.
>>
>> I think it would be better to optimize handle_hvm_io_completion() 
>> rather than trying to hack the context_switch() or continue_running().
>>
>> [...]
>>
>>>> diff --git a/xen/include/asm-arm/p2m.h b/xen/include/asm-arm/p2m.h
>>>> index 5fdb6e8..5823f11 100644
>>>> --- a/xen/include/asm-arm/p2m.h
>>>> +++ b/xen/include/asm-arm/p2m.h
>>>> @@ -385,10 +385,11 @@ static inline int set_foreign_p2m_entry(struct 
>>>> domain *d, unsigned long gfn,
>>>>                                           mfn_t mfn)
>>>>   {
>>>>       /*
>>>> -     * NOTE: If this is implemented then proper reference counting of
>>>> -     *       foreign entries will need to be implemented.
>>>> +     * XXX: handle properly reference. It looks like the page may
>>>> +     * not always belong to d.
>>>
>>> Just as a reference, and without taking away anything from the comment,
>>> I think that QEMU is doing its own internal reference counting for these
>>> mappings.
>>
>> I am not sure how this matters here? We can't really trust the DM to 
>> do the right thing if it is not running in dom0.
>>
>> But, IIRC, the problem is that some of the pages don't belong to a 
>> domain, so it is not possible to treat them as foreign mappings (e.g. 
>> you wouldn't be able to grab a reference). This investigation was done 
>> a couple of years ago, so this may have changed in recent Xen.
>>
>> As a side note, I am a bit surprised to see most of my original TODOs 
>> present in the code. What is the plan to solve them?
> The plan is to solve the most critical TODOs in the current series, and 
> the rest in follow-up series if there are no objections, of course. Any 
> pointers on how to solve them properly would be much appreciated. 
> Unfortunately, I currently have only a weak understanding of how they 
> should be fixed.

AFAICT, there is already some discussion happening about those 3 major 
TODOs. I would suggest going through the discussions. We can clarify 
anything if needed.

Cheers,

-- 
Julien Grall


^ permalink raw reply	[flat|nested] 140+ messages in thread

* Re: [RFC PATCH V1 04/12] xen/arm: Introduce arch specific bits for IOREQ/DM features
  2020-08-05 19:30     ` Oleksandr
@ 2020-08-06 11:08       ` Julien Grall
  2020-08-06 11:29         ` Jan Beulich
  2020-08-06 13:27         ` Oleksandr
  0 siblings, 2 replies; 140+ messages in thread
From: Julien Grall @ 2020-08-06 11:08 UTC (permalink / raw)
  To: Oleksandr, xen-devel
  Cc: Stefano Stabellini, Wei Liu, Andrew Cooper, Ian Jackson,
	George Dunlap, Oleksandr Tyshchenko, Julien Grall, Jan Beulich,
	Daniel De Graaf, Volodymyr Babchuk



On 05/08/2020 20:30, Oleksandr wrote:
> 
> On 05.08.20 17:12, Julien Grall wrote:
>> Hi,
> 
> Hi Julien
> 
> 
>>
>> On 03/08/2020 19:21, Oleksandr Tyshchenko wrote:
>>> From: Oleksandr Tyshchenko <oleksandr_tyshchenko@epam.com>
>>>
>>> This patch makes it possible to forward guest MMIO accesses
>>> to a device emulator on Arm and enables that support for
>>> Arm64.
>>>
>>> Also update XSM code a bit to let DM op be used on Arm.
>>> New arch DM op will be introduced in the follow-up patch.
>>>
>>> Please note, at the moment build on Arm32 is broken
>>> (see cmpxchg usage in hvm_send_buffered_ioreq()) if someone
>>> wants to enable CONFIG_IOREQ_SERVER due to the lack of
>>> cmpxchg_64 support on Arm32.
>>>
>>> Please note, this is a split/cleanup of Julien's PoC:
>>> "Add support for Guest IO forwarding to a device emulator"
>>>
>>> Signed-off-by: Julien Grall <julien.grall@arm.com>
>>> Signed-off-by: Oleksandr Tyshchenko <oleksandr_tyshchenko@epam.com>
>>> ---
>>>   tools/libxc/xc_dom_arm.c        |  25 +++++++---
>>>   xen/arch/arm/Kconfig            |   1 +
>>>   xen/arch/arm/Makefile           |   2 +
>>>   xen/arch/arm/dm.c               |  34 +++++++++++++
>>>   xen/arch/arm/domain.c           |   9 ++++
>>>   xen/arch/arm/hvm.c              |  46 +++++++++++++++++-
>>>   xen/arch/arm/io.c               |  67 +++++++++++++++++++++++++-
>>>   xen/arch/arm/ioreq.c            |  86 +++++++++++++++++++++++++++++++++
>>>   xen/arch/arm/traps.c            |  17 +++++++
>>>   xen/common/memory.c             |   5 +-
>>>   xen/include/asm-arm/domain.h    |  80 +++++++++++++++++++++++++++++++
>>>   xen/include/asm-arm/hvm/ioreq.h | 103 ++++++++++++++++++++++++++++++++++++++++
>>>   xen/include/asm-arm/mmio.h      |   1 +
>>>   xen/include/asm-arm/p2m.h       |   7 +--
>>>   xen/include/xsm/dummy.h         |   4 +-
>>>   xen/include/xsm/xsm.h           |   6 +--
>>>   xen/xsm/dummy.c                 |   2 +-
>>>   xen/xsm/flask/hooks.c           |   5 +-
>>>   18 files changed, 476 insertions(+), 24 deletions(-)
>>>   create mode 100644 xen/arch/arm/dm.c
>>>   create mode 100644 xen/arch/arm/ioreq.c
>>>   create mode 100644 xen/include/asm-arm/hvm/ioreq.h
>>
>> It feels to me the patch is doing quite a few things that are 
>> indirectly related. Can this be split to make the review easier?
>>
>> I would like at least the following split from the series:
>>    - The tools changes
>>    - The P2M changes
>>    - The HVMOP plumbing (if we still require them)
> Sure, will split.
> However, I don't quite understand where we should leave HVMOP plumbing.

I think they will need to be dropped if we decide to use the acquire 
interface.

> If I understand correctly the suggestion was to switch to acquire 
> interface instead (which requires a Linux version to be v4.17 at least)?

This was the suggestion.

> I suspect, this is all about "xen/privcmd: add 
> IOCTL_PRIVCMD_MMAP_RESOURCE" support for Linux?

Correct.

>> What is this function supposed to do?
> Agreed, it sounds a bit confusing. I assume it is supposed to complete a 
> guest MMIO access after finishing emulation.
> 
> Shall I rename it to something appropriate (maybe by adding ioreq prefix)?

How about ioreq_handle_complete_mmio()?

>>> diff --git a/xen/common/memory.c b/xen/common/memory.c
>>> index 9283e5e..0000477 100644
>>> --- a/xen/common/memory.c
>>> +++ b/xen/common/memory.c
>>> @@ -8,6 +8,7 @@
>>>    */
>>>     #include <xen/domain_page.h>
>>> +#include <xen/hvm/ioreq.h>
>>>   #include <xen/types.h>
>>>   #include <xen/lib.h>
>>>   #include <xen/mm.h>
>>> @@ -30,10 +31,6 @@
>>>   #include <public/memory.h>
>>>   #include <xsm/xsm.h>
>>>   -#ifdef CONFIG_IOREQ_SERVER
>>> -#include <xen/hvm/ioreq.h>
>>> -#endif
>>> -
>>
>> Why do you remove something your just introduced?
> The reason I guarded that header is to make the previous patch ("xen/mm: 
> Make x86's XENMEM_resource_ioreq_server handling common") buildable on 
> Arm without the arch IOREQ header added yet. I tried to make sure that 
> the result after each patch was buildable to retain bisectability. 
> As the current patch adds the Arm IOREQ specific bits (including the 
> header), that guard could be removed as it is not needed anymore.
I agree we want to have the build bisectable. However, I am still 
puzzled why it is necessary to remove the #ifdef and move it earlier in 
the list.

Do you mind providing more details?

[...]

>>> +
>>> +bool handle_mmio(void);
>>> +
>>> +static inline bool handle_pio(uint16_t port, unsigned int size, int 
>>> dir)
>>> +{
>>> +    /* XXX */
>>
>> Can you expand this TODO? What do you expect to do?
> I didn't expect this to be called on Arm. Sorry, I am not sure I have an 
> idea of how to handle this properly. I would keep it unimplemented until 
> there is a real need.
> Will expand the TODO.

Let see how the conversation on patch#1 goes about PIO vs MMIO.

>>
>>
>>> +    BUG();
>>> +    return true;
>>> +}
>>> +
>>> +static inline paddr_t hvm_mmio_first_byte(const ioreq_t *p)
>>> +{
>>> +    return p->addr;
>>> +}
>>
>> I understand that the x86 version is more complex as it check p->df. 
>> However, aside reducing the complexity, I am not sure why we would 
>> want to diverge it.
>>
>> After all, IOREQ is now meant to be a common feature.
> Well, no objections at all.
>> Could you please clarify how 'df' (the Direction Flag?) could be 
>> handled/used on Arm?

On x86, this is used by the 'rep' instruction to tell the direction in 
which to iterate (forward or backward).

On Arm, all the accesses to an MMIO region will do a single memory 
access. So for now, we can safely always set it to 0.

> I see that try_fwd_ioserv() always sets it to 0. Or do I need to just 
> reuse x86's helpers as is, which (together with count = df = 0) will 
> result in what we actually have here?
AFAIU, both count and df should be 0 on Arm.
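For reference, the x86 helpers reduce to the trivial Arm form once there is no repetition. The sketch below reproduces them from memory (so treat the exact expressions as an approximation, not authoritative Xen code), against a cut-down stand-in for ioreq_t:

```c
#include <assert.h>
#include <stdint.h>

typedef uint64_t paddr_t;

/* Cut-down stand-in for the ioreq_t fields used by the helpers. */
typedef struct {
    paddr_t  addr;   /* address of the (first) access          */
    uint32_t size;   /* bytes per access                       */
    uint32_t count;  /* number of accesses ('rep' on x86)      */
    uint8_t  df;     /* direction flag: 1 => iterate downwards */
} ioreq_t;

/* With df set, addr is the highest element of the repetition, so the
 * first byte of the whole transfer lies below it. */
static paddr_t hvm_mmio_first_byte(const ioreq_t *p)
{
    return p->df ? p->addr - (paddr_t)(p->count - 1) * p->size : p->addr;
}

static paddr_t hvm_mmio_last_byte(const ioreq_t *p)
{
    paddr_t size = p->size;

    return p->df ? p->addr + size - 1
                 : p->addr + (paddr_t)p->count * size - 1;
}
```

With a single access and df = 0 (the Arm case), the first byte is p->addr and the last byte is p->addr + p->size - 1, i.e. reusing the x86 helpers unchanged would give the same result as the simplified Arm stubs.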

>>
>>
>>> +
>>> +static inline int p2m_set_ioreq_server(struct domain *d,
>>> +                                       unsigned int flags,
>>> +                                       struct hvm_ioreq_server *s)
>>> +{
>>> +    return -EOPNOTSUPP;
>>> +}
>>
>> This should be defined in p2m.h. But I am not even sure what it is 
>> meant for. Can you expand it?
> 
> ok, will move.
> 
> 
> In this series I tried to make as much of the IOREQ code common as 
> possible and to avoid complicating things; in order to achieve that, a 
> few stubs were added here. Please note that I also considered splitting 
> it into arch parts, but some functions couldn't be split easily.
> This one is called from the common hvm_destroy_ioreq_server() with the 
> flag being 0 (which will result in unmapping the ioreq server from the 
> p2m type on x86).
> I could add a comment describing why this stub is present here.

Sorry if I wasn't clear. I wasn't asking why the stub is there but what 
should be the expected implementation of the function.

In particular, you are returning -EOPNOTSUPP. The only reason you are 
not getting into trouble is that the caller doesn't check the return 
value.

Would it make sense to have a stub arch_hvm_destroy_ioreq_server()?

> 
> 
>>
>>
>>> +
>>> +static inline void msix_write_completion(struct vcpu *v)
>>> +{
>>> +}
>>> +
>>> +static inline void handle_realmode_completion(void)
>>> +{
>>> +    ASSERT_UNREACHABLE();
>>> +}
>>
>> realmode is very x86 specific. So I don't think this function should 
>> be called from common code. It might be worth considering to split 
>> handle_hvm_io_completion() is 2 parts: common and arch specific.
> 
> I agree with you that realmode is x86 specific and does not look good 
> in an Arm header. 
It is not a problem of looking good or not. Instead, it is about 
abstraction. A developer shouldn't need to understand all the other 
architectures we support in order to follow the common code.

> I was thinking about how to split handle_hvm_io_completion() 
> gracefully but I failed to find a good solution for that, so I decided 
> to add two stubs (msix_write_completion and handle_realmode_completion) 
> on Arm. I could add a comment describing why they are here if 
> appropriate. But if you think they shouldn't be called from the common 
> code in any way, I will try to split it.

I am not entirely sure what msix_write_completion is meant to do on x86. 
Is it dealing with virtual MSIx? Maybe Jan, Roger or Paul could help?

Regarding handle_realmode_completion, I would add a new stub:

arch_ioreq_handle_io_completion() that is called from the default case 
of the switch.

On x86 it would be implemented as:

  switch ( io_completion )
  {
  case HVMIO_realmode_completion:
      ...
      break;

  default:
      ASSERT_UNREACHABLE();
      break;
  }

On Arm, it would be implemented as:

   ASSERT_UNREACHABLE();
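Putting the two pieces together, the split could look roughly like the compilable sketch below. This is only an illustration of the shape of the proposal: the enum values and the hook name follow the discussion above, not any existing code, and the assert() stands in for Xen's ASSERT_UNREACHABLE().

```c
#include <assert.h>
#include <stdbool.h>

/* Cut-down stand-in for the io_completion enumeration. */
enum hvm_io_completion {
    HVMIO_no_completion,
    HVMIO_mmio_completion,
    HVMIO_pio_completion,
    HVMIO_realmode_completion,    /* only ever produced on x86 */
};

/* Arch hook, Arm flavour: nothing arch specific can legitimately reach
 * here. The x86 flavour would instead handle HVMIO_realmode_completion
 * before the unreachable assertion. */
static bool arch_ioreq_handle_io_completion(enum hvm_io_completion c)
{
    (void)c;
    assert(0 && "unreachable");   /* ASSERT_UNREACHABLE() in Xen */
    return true;
}

/* Common dispatcher: only the default case is arch specific. */
static bool handle_io_completion(enum hvm_io_completion c)
{
    switch ( c )
    {
    case HVMIO_no_completion:
        return true;
    case HVMIO_mmio_completion:
        return true;              /* would complete the MMIO access */
    case HVMIO_pio_completion:
        return true;              /* would call handle_pio()        */
    default:
        return arch_ioreq_handle_io_completion(c);
    }
}
```

This way the common code never names a realmode (or any other arch-only) completion, and each architecture keeps its oddities behind the hook.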

Cheers,

-- 
Julien Grall


^ permalink raw reply	[flat|nested] 140+ messages in thread

* Re: [RFC PATCH V1 04/12] xen/arm: Introduce arch specific bits for IOREQ/DM features
  2020-08-06 11:08       ` Julien Grall
@ 2020-08-06 11:29         ` Jan Beulich
  2020-08-20 18:30           ` Oleksandr
  2020-08-06 13:27         ` Oleksandr
  1 sibling, 1 reply; 140+ messages in thread
From: Jan Beulich @ 2020-08-06 11:29 UTC (permalink / raw)
  To: Julien Grall
  Cc: Stefano Stabellini, Wei Liu, Andrew Cooper, Ian Jackson,
	George Dunlap, Oleksandr, Oleksandr Tyshchenko, Julien Grall,
	xen-devel, Daniel De Graaf, Volodymyr Babchuk

On 06.08.2020 13:08, Julien Grall wrote:
> On 05/08/2020 20:30, Oleksandr wrote:
>> I was thinking about how to split handle_hvm_io_completion() 
>> gracefully but I failed to find a good solution for that, so I decided 
>> to add two stubs (msix_write_completion and handle_realmode_completion) 
>> on Arm. I could add a comment describing why they are here if 
>> appropriate. But if you think they shouldn't be called from the common 
>> code in any way, I will try to split it.
> 
> I am not entirely sure what msix_write_completion is meant to do on x86. 
> Is it dealing with virtual MSIx? Maybe Jan, Roger or Paul could help?

Due to the split brain model of handling PCI pass-through (between
Xen and qemu), a guest writing to an MSI-X entry needs this write
handed to qemu, and upon completion of the write there Xen also
needs to take some extra action.

Jan


^ permalink raw reply	[flat|nested] 140+ messages in thread

* Re: [RFC PATCH V1 05/12] hvm/dm: Introduce xendevicemodel_set_irq_level DM op
  2020-08-06  0:37       ` Stefano Stabellini
@ 2020-08-06 11:32         ` Julien Grall
  2020-08-06 23:49           ` Stefano Stabellini
  0 siblings, 1 reply; 140+ messages in thread
From: Julien Grall @ 2020-08-06 11:32 UTC (permalink / raw)
  To: Stefano Stabellini
  Cc: Wei Liu, Andrew Cooper, Ian Jackson, George Dunlap,
	Oleksandr Tyshchenko, Oleksandr Tyshchenko, Julien Grall,
	Jan Beulich, xen-devel, Volodymyr Babchuk

Hi Stefano,

On 06/08/2020 01:37, Stefano Stabellini wrote:
> On Wed, 5 Aug 2020, Julien Grall wrote:
>> On 05/08/2020 00:22, Stefano Stabellini wrote:
>>> On Mon, 3 Aug 2020, Oleksandr Tyshchenko wrote:
>>>> From: Oleksandr Tyshchenko <oleksandr_tyshchenko@epam.com>
>>>>
>>>> This patch adds the ability for the device emulator to notify the
>>>> other end (some entity running in the guest) using an SPI, and
>>>> implements the Arm specific bits for it. The proposed interface
>>>> allows the emulator to set the logical level of one of a domain's
>>>> IRQ lines.
>>>>
>>>> Please note, this is a split/cleanup of Julien's PoC:
>>>> "Add support for Guest IO forwarding to a device emulator"
>>>>
>>>> Signed-off-by: Julien Grall <julien.grall@arm.com>
>>>> Signed-off-by: Oleksandr Tyshchenko <oleksandr_tyshchenko@epam.com>
>>>> ---
>>>>    tools/libs/devicemodel/core.c                   | 18 ++++++++++++++++++
>>>>    tools/libs/devicemodel/include/xendevicemodel.h |  4 ++++
>>>>    tools/libs/devicemodel/libxendevicemodel.map    |  1 +
>>>>    xen/arch/arm/dm.c                               | 22
>>>> +++++++++++++++++++++-
>>>>    xen/common/hvm/dm.c                             |  1 +
>>>>    xen/include/public/hvm/dm_op.h                  | 15 +++++++++++++++
>>>>    6 files changed, 60 insertions(+), 1 deletion(-)
>>>>
>>>> diff --git a/tools/libs/devicemodel/core.c b/tools/libs/devicemodel/core.c
>>>> index 4d40639..30bd79f 100644
>>>> --- a/tools/libs/devicemodel/core.c
>>>> +++ b/tools/libs/devicemodel/core.c
>>>> @@ -430,6 +430,24 @@ int xendevicemodel_set_isa_irq_level(
>>>>        return xendevicemodel_op(dmod, domid, 1, &op, sizeof(op));
>>>>    }
>>>>    +int xendevicemodel_set_irq_level(
>>>> +    xendevicemodel_handle *dmod, domid_t domid, uint32_t irq,
>>>> +    unsigned int level)
>>>
>>> It is a pity that having xen_dm_op_set_pci_intx_level and
>>> xen_dm_op_set_isa_irq_level already we need to add a third one, but from
>>> the names alone I don't think we can reuse either of them.
>>
>> The problem is not the name...
>>
>>>
>>> It is very similar to set_isa_irq_level. We could almost rename
>>> xendevicemodel_set_isa_irq_level to xendevicemodel_set_irq_level or,
>>> better, just add an alias to it so that xendevicemodel_set_irq_level is
>>> implemented by calling xendevicemodel_set_isa_irq_level. Honestly I am
>>> not sure if it is worth doing it though. Any other opinions?
>>
>> ... the problem is the interrupt field is only 8-bit. So we would only be able
>> to cover IRQ 0 - 255.
> 
> Argh, that's not going to work :-(  I wasn't sure if it was a good idea
> anyway.
> 
> 
>> It is not entirely clear how the existing subop could be extended without
>> breaking existing callers.
>>
>>> But I think we should plan for not needing two calls (one to set level
>>> to 1, and one to set it to 0):
>>> https://marc.info/?l=xen-devel&m=159535112027405
>>
>> I am not sure to understand your suggestion here? Are you suggesting to remove
>> the 'level' parameter?
> 
> My hope was to make it optional to call the hypercall with level = 0,
> not necessarily to remove 'level' from the struct.

 From my understanding, the hypercall is meant to represent the status 
of the line between the device and the interrupt controller (either low 
or high).

It is then up to the interrupt controller to decide when the interrupt 
is going to fire:
   - For an edge interrupt, it will fire when the line moves from low to 
high (or vice versa).
   - For a level interrupt, it will fire when the line is high (assuming 
it is level-triggered high) and will keep firing until the device 
decides to lower the line.

For a device, it is common to keep the line high until the OS writes to 
a specific register.

Furthermore, technically, the guest OS is in charge of configuring how an 
interrupt is triggered. Admittedly this information is part of the DT, 
but nothing prevents a guest from changing it.

As a side note, we have a workaround in Xen for some buggy DTs (see the 
arch timer) exposing the wrong trigger type.

Because of that, I don't really see a way to make it optional. Maybe you 
have something different in mind?
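A toy model may help illustrate the point (illustrative C only, not Xen or vGIC code): the emulator merely reports line state via the two calls, and the trigger behaviour falls out of how the controller consumes that state.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy per-line interrupt-controller state. */
struct irq_line {
    bool level;         /* current line state, as set by the emulator */
    bool edge_pending;  /* latched on a low->high transition          */
};

/* Analogue of the proposed set_irq_level: just record the new state. */
static void set_irq_level(struct irq_line *l, bool level)
{
    if ( !l->level && level )
        l->edge_pending = true;   /* rising edge latches exactly once */
    l->level = level;
}

/* An edge-triggered consumer fires once per latched edge... */
static bool edge_fires(struct irq_line *l)
{
    bool fired = l->edge_pending;

    l->edge_pending = false;
    return fired;
}

/* ...while a level-triggered (active-high) consumer keeps firing for as
 * long as the line stays high. */
static bool level_fires(const struct irq_line *l)
{
    return l->level;
}
```

For a level-triggered line the interrupt keeps firing until the emulator makes the level = 0 call, which is why that call cannot simply be made optional.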

Cheers,

-- 
Julien Grall


^ permalink raw reply	[flat|nested] 140+ messages in thread

* Re: [RFC PATCH V1 08/12] xen/arm: Invalidate qemu mapcache on XENMEM_decrease_reservation
  2020-08-05 16:21   ` Jan Beulich
@ 2020-08-06 11:35     ` Julien Grall
  2020-08-06 11:50       ` Jan Beulich
  0 siblings, 1 reply; 140+ messages in thread
From: Julien Grall @ 2020-08-06 11:35 UTC (permalink / raw)
  To: Jan Beulich, Oleksandr Tyshchenko
  Cc: Stefano Stabellini, Wei Liu, Andrew Cooper, Ian Jackson,
	George Dunlap, Oleksandr Tyshchenko, xen-devel,
	Volodymyr Babchuk

Hi Jan,

On 05/08/2020 17:21, Jan Beulich wrote:
> On 03.08.2020 20:21, Oleksandr Tyshchenko wrote:
>> --- a/xen/common/memory.c
>> +++ b/xen/common/memory.c
>> @@ -1652,6 +1652,12 @@ long do_memory_op(unsigned long cmd, XEN_GUEST_HANDLE_PARAM(void) arg)
>>           break;
>>       }
>>   
>> +    /* x86 already sets the flag in hvm_memory_op() */
>> +#if defined(CONFIG_ARM64) && defined(CONFIG_IOREQ_SERVER)
>> +    if ( op == XENMEM_decrease_reservation )
>> +        curr_d->arch.hvm.qemu_mapcache_invalidate = true;
>> +#endif
> 
> Doesn't the comment already indicate a route towards an approach
> not requiring to alter common code?

Given that IOREQ is now moved under common/, I think it would make sense 
to have this set in common code as well for all the architectures.

IOW, I would suggest dropping the #ifdef CONFIG_ARM64. In addition, we 
may want to introduce a helper to check if a domain is using ioreq.

Cheers,

-- 
Julien Grall


^ permalink raw reply	[flat|nested] 140+ messages in thread

* Re: [RFC PATCH V1 01/12] hvm/ioreq: Make x86's IOREQ feature common
  2020-08-05 13:30   ` Julien Grall
@ 2020-08-06 11:37     ` Oleksandr
  2020-08-10 16:29       ` Julien Grall
  0 siblings, 1 reply; 140+ messages in thread
From: Oleksandr @ 2020-08-06 11:37 UTC (permalink / raw)
  To: Julien Grall, xen-devel
  Cc: Kevin Tian, Stefano Stabellini, Jan Beulich, Wei Liu,
	Paul Durrant, Andrew Cooper, Ian Jackson, George Dunlap,
	Tim Deegan, Oleksandr Tyshchenko, Julien Grall, Jun Nakajima,
	Roger Pau Monné


On 05.08.20 16:30, Julien Grall wrote:
> Hi,

Hi Julien


>
> On 03/08/2020 19:21, Oleksandr Tyshchenko wrote:
>> From: Oleksandr Tyshchenko <oleksandr_tyshchenko@epam.com>
>>
>> As a lot of x86 code can be re-used on Arm later on, this patch
>> splits IOREQ support into common and arch specific parts.
>>
>> This support is going to be used on Arm to be able to run a device
>> emulator outside of the Xen hypervisor.
>>
>> Please note, this is a split/cleanup of Julien's PoC:
>> "Add support for Guest IO forwarding to a device emulator"
>>
>> Signed-off-by: Julien Grall <julien.grall@arm.com>
>> Signed-off-by: Oleksandr Tyshchenko <oleksandr_tyshchenko@epam.com>
>> ---
>>   xen/arch/x86/Kconfig            |    1 +
>>   xen/arch/x86/hvm/dm.c           |    2 +-
>>   xen/arch/x86/hvm/emulate.c      |    2 +-
>>   xen/arch/x86/hvm/hvm.c          |    2 +-
>>   xen/arch/x86/hvm/io.c           |    2 +-
>>   xen/arch/x86/hvm/ioreq.c        | 1431 +--------------------------------------
>>   xen/arch/x86/hvm/stdvga.c       |    2 +-
>>   xen/arch/x86/hvm/vmx/realmode.c |    1 +
>>   xen/arch/x86/hvm/vmx/vvmx.c     |    2 +-
>>   xen/arch/x86/mm.c               |    2 +-
>>   xen/arch/x86/mm/shadow/common.c |    2 +-
>>   xen/common/Kconfig              |    3 +
>>   xen/common/Makefile             |    1 +
>>   xen/common/hvm/Makefile         |    1 +
>>   xen/common/hvm/ioreq.c          | 1430 ++++++++++++++++++++++++++++++++++++++
>>   xen/include/asm-x86/hvm/ioreq.h |   45 +-
>>   xen/include/asm-x86/hvm/vcpu.h  |    7 -
>>   xen/include/xen/hvm/ioreq.h     |   89 +++
>>   18 files changed, 1575 insertions(+), 1450 deletions(-)
>
> That's quite a lot of code moved in a single patch. How can we check 
> the code moved is still correct? Is it a verbatim copy?
In this patch I mostly tried to separate out a common IOREQ part, which 
also resulted in updating the x86 sources to include the new header. I 
also moved hvm_ioreq_needs_completion() to the common header (which 
probably wanted to be a separate patch). It was a verbatim copy 
initially (w/o hvm_map_mem_type_to_ioreq_server) and was then updated to 
deal with the arch specific parts.
In what way would you like me to split this patch?

I could think of the following:

1. Copy of x86's ioreq.c/ioreq.h to common code
2. Update common ioreq.c/ioreq.h
3. Update x86's parts to be able to deal with common code
4. Move hvm_ioreq_needs_completion() to common code

-- 
Regards,

Oleksandr Tyshchenko



^ permalink raw reply	[flat|nested] 140+ messages in thread

* Re: [RFC PATCH V1 08/12] xen/arm: Invalidate qemu mapcache on XENMEM_decrease_reservation
  2020-08-06 11:35     ` Julien Grall
@ 2020-08-06 11:50       ` Jan Beulich
  2020-08-06 14:28         ` Oleksandr
  0 siblings, 1 reply; 140+ messages in thread
From: Jan Beulich @ 2020-08-06 11:50 UTC (permalink / raw)
  To: Julien Grall
  Cc: Stefano Stabellini, Wei Liu, Andrew Cooper, Ian Jackson,
	George Dunlap, Oleksandr Tyshchenko, Oleksandr Tyshchenko,
	xen-devel, Volodymyr Babchuk

On 06.08.2020 13:35, Julien Grall wrote:
> On 05/08/2020 17:21, Jan Beulich wrote:
>> On 03.08.2020 20:21, Oleksandr Tyshchenko wrote:
>>> --- a/xen/common/memory.c
>>> +++ b/xen/common/memory.c
>>> @@ -1652,6 +1652,12 @@ long do_memory_op(unsigned long cmd, XEN_GUEST_HANDLE_PARAM(void) arg)
>>>           break;
>>>       }
>>>   
>>> +    /* x86 already sets the flag in hvm_memory_op() */
>>> +#if defined(CONFIG_ARM64) && defined(CONFIG_IOREQ_SERVER)
>>> +    if ( op == XENMEM_decrease_reservation )
>>> +        curr_d->arch.hvm.qemu_mapcache_invalidate = true;
>>> +#endif
>>
>> Doesn't the comment already indicate a route towards an approach
>> not requiring to alter common code?
> 
> Given that IOREQ is now moved under common/, I think it would make sense 
> to have this set in common code as well for all the architectures.
> 
> IOW, I would suggest dropping the #ifdef CONFIG_ARM64. In addition, we 
> may want to introduce a helper to check if a domain is using ioreq.

Of course, with the (part of the) conditional dropped and the struct
field moved out of the arch sub-struct, this is fine to live here.

Jan


^ permalink raw reply	[flat|nested] 140+ messages in thread

* Re: [RFC PATCH V1 04/12] xen/arm: Introduce arch specific bits for IOREQ/DM features
  2020-08-06 11:08       ` Julien Grall
  2020-08-06 11:29         ` Jan Beulich
@ 2020-08-06 13:27         ` Oleksandr
  2020-08-10 18:25           ` Julien Grall
  1 sibling, 1 reply; 140+ messages in thread
From: Oleksandr @ 2020-08-06 13:27 UTC (permalink / raw)
  To: Julien Grall, xen-devel
  Cc: Stefano Stabellini, Wei Liu, Andrew Cooper, Ian Jackson,
	George Dunlap, Oleksandr Tyshchenko, Julien Grall, Jan Beulich,
	Daniel De Graaf, Volodymyr Babchuk


On 06.08.20 14:08, Julien Grall wrote:

Hi Julien

>
>>> What is this function supposed to do?
>> Agreed, it sounds a bit confusing. I assume it is supposed to complete 
>> a guest MMIO access after finishing emulation.
>>
>> Shall I rename it to something appropriate (maybe by adding ioreq 
>> prefix)?
>
> How about ioreq_handle_complete_mmio()?

That sounds fine to me.



>
>>>> diff --git a/xen/common/memory.c b/xen/common/memory.c
>>>> index 9283e5e..0000477 100644
>>>> --- a/xen/common/memory.c
>>>> +++ b/xen/common/memory.c
>>>> @@ -8,6 +8,7 @@
>>>>    */
>>>>     #include <xen/domain_page.h>
>>>> +#include <xen/hvm/ioreq.h>
>>>>   #include <xen/types.h>
>>>>   #include <xen/lib.h>
>>>>   #include <xen/mm.h>
>>>> @@ -30,10 +31,6 @@
>>>>   #include <public/memory.h>
>>>>   #include <xsm/xsm.h>
>>>>   -#ifdef CONFIG_IOREQ_SERVER
>>>> -#include <xen/hvm/ioreq.h>
>>>> -#endif
>>>> -
>>>
>>> Why do you remove something your just introduced?
>> The reason I guarded that header is to make the previous patch 
>> ("xen/mm: Make x86's XENMEM_resource_ioreq_server handling common") 
>> buildable on Arm without the arch IOREQ header added yet. I tried to 
>> make sure that the result after each patch was buildable to retain 
>> bisectability.
>> As the current patch adds the Arm IOREQ specific bits (including the 
>> header), that guard could be removed as it is not needed anymore.
> I agree we want to have the build bisectable. However, I am still 
> puzzled why it is necessary to remove the #ifdef and move it earlier 
> in the list.
>
> Do you mind providing more details?
The previous patch ("xen/mm: Make x86's XENMEM_resource_ioreq_server 
handling common") breaks the build on Arm as it includes xen/hvm/ioreq.h, 
which requires the arch header (asm/hvm/ioreq.h) to be present. But the 
missing arch header, together with the other arch specific bits, is 
introduced here in the current patch. Probably I should have rearranged 
the changes in a way that does not introduce the #ifdef and then remove 
it...


>
> [...]
>
>>>> +
>>>> +bool handle_mmio(void);
>>>> +
>>>> +static inline bool handle_pio(uint16_t port, unsigned int size, 
>>>> int dir)
>>>> +{
>>>> +    /* XXX */
>>>
>>> Can you expand this TODO? What do you expect to do?
>> I didn't expect this to be called on Arm. Sorry, I am not sure I have
>> an idea how to handle this properly. I would keep it unimplemented
>> until there is a real need.
>> I will expand the TODO.
>
> Let's see how the conversation on patch #1 about PIO vs MMIO goes.

ok


>
>>>
>>>
>>>> +    BUG();
>>>> +    return true;
>>>> +}
>>>> +
>>>> +static inline paddr_t hvm_mmio_first_byte(const ioreq_t *p)
>>>> +{
>>>> +    return p->addr;
>>>> +}
>>>
>>> I understand that the x86 version is more complex, as it checks p->df. 
>>> However, aside from reducing the complexity, I am not sure why we 
>>> would want to diverge from it.
>>>
>>> After all, IOREQ is now meant to be a common feature.
>> Well, no objections at all.
>> Could you please clarify how 'df' (the Direction Flag?) could be
>> handled/used on Arm?
>
> On x86, this is used by the 'rep' instruction to tell the direction to 
> iterate (forward or backward).
>
> On Arm, all accesses to an MMIO region are done as a single memory 
> access. So for now, we can safely always set it to 0.
>
>> I see that try_fwd_ioserv() always sets it to 0. Or should I just
>> reuse x86's helpers as they are, which (together with count = df = 0)
>> would result in what we actually have here?
> AFAIU, both count and df should be 0 on Arm.

Thanks for the explanation. The only remaining question is where to put
the common helpers hvm_mmio_first_byte()/hvm_mmio_last_byte() (a common
io.h?).
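For context, here is a compilable sketch of what the shared helpers could look like, modeled on my reading of the x86 implementation; the trimmed-down ioreq_t below is an assumption for illustration, not the real structure:

```c
#include <assert.h>
#include <stdint.h>

typedef uint64_t paddr_t;

/* Trimmed-down ioreq_t: only the fields the helpers look at. */
typedef struct {
    paddr_t addr;      /* guest physical address of the access */
    uint32_t size;     /* bytes per individual access */
    uint32_t count;    /* number of repetitions ('rep' on x86) */
    uint8_t df;        /* direction flag: 1 = decrementing addresses */
} ioreq_t;

/*
 * With df set, a 'rep' access walks downwards, so the first touched
 * byte sits below p->addr. For a single access (count = 1, df = 0),
 * which is all Arm would generate, both helpers reduce to the simple
 * [addr, addr + size - 1] range, i.e. the current Arm variant.
 */
static paddr_t hvm_mmio_first_byte(const ioreq_t *p)
{
    return p->df ? p->addr - (p->count - 1UL) * p->size : p->addr;
}

static paddr_t hvm_mmio_last_byte(const ioreq_t *p)
{
    unsigned long size = p->size;

    return p->df ? p->addr + size - 1 : p->addr + p->count * size - 1;
}
```

So sharing the x86 helpers costs Arm nothing as long as count = 1 (or 0) and df = 0 are guaranteed on the Arm side.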


>>>
>>>> +
>>>> +static inline int p2m_set_ioreq_server(struct domain *d,
>>>> +                                       unsigned int flags,
>>>> +                                       struct hvm_ioreq_server *s)
>>>> +{
>>>> +    return -EOPNOTSUPP;
>>>> +}
>>>
>>> This should be defined in p2m.h. But I am not even sure what it is 
>>> meant for. Can you expand it?
>>
>> ok, will move.
>>
>>
>> In this series I tried to make as much IOREQ code common as possible
>> and to avoid complicating things; in order to achieve that, a few
>> stubs were added here. Please note that I also considered splitting
>> it into arch parts, but some functions couldn't be split easily.
>> This one is called from the common hvm_destroy_ioreq_server() with
>> the flag being 0 (which on x86 results in unmapping the ioreq server
>> from the p2m type).
>> I could add a comment describing why this stub is present here.
>
> Sorry if I wasn't clear. I wasn't asking why the stub is there but 
> what should be the expected implementation of the function.
>
> In particular, you are returning -EOPNOTSUPP. The only reason you are 
> staying out of trouble is that the caller doesn't check the return value.

True.


>
> Would it make sense to have a stub arch_hvm_destroy_ioreq_server()?

Given what has been said above, it makes sense; I will create it.
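The suggested hook could be modeled like this (a toy, self-contained sketch; the structure fields and the name arch_hvm_destroy_ioreq_server are taken from the discussion, everything else is an assumption):

```c
#include <assert.h>
#include <stdbool.h>

/* Toy stand-ins for the hypervisor structures involved. */
struct hvm_ioreq_server { bool enabled; bool mapped_to_p2m; };
struct domain { struct hvm_ioreq_server *server; };

/*
 * Arch hook: the x86 variant would unmap the server from the p2m
 * (what p2m_set_ioreq_server(d, 0, s) does today); the Arm variant
 * is simply empty, so no -EOPNOTSUPP stub with an ignored return
 * value is needed anywhere.
 */
static void arch_hvm_destroy_ioreq_server(struct domain *d,
                                          struct hvm_ioreq_server *s)
{
#ifdef CONFIG_X86
    (void)d;
    s->mapped_to_p2m = false;   /* models p2m_set_ioreq_server(d, 0, s) */
#else
    (void)d;
    (void)s;                    /* Arm: nothing to do */
#endif
}

/* Common code only ever calls the arch hook, never p2m functions. */
static void hvm_destroy_ioreq_server(struct domain *d)
{
    arch_hvm_destroy_ioreq_server(d, d->server);
    d->server->enabled = false;
}
```

This keeps all p2m knowledge out of the common path while letting each architecture do its own teardown.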


>>>> +
>>>> +static inline void msix_write_completion(struct vcpu *v)
>>>> +{
>>>> +}
>>>> +
>>>> +static inline void handle_realmode_completion(void)
>>>> +{
>>>> +    ASSERT_UNREACHABLE();
>>>> +}
>>>
>>> Realmode is very x86-specific, so I don't think this function should 
>>> be called from common code. It might be worth considering splitting 
>>> handle_hvm_io_completion() into two parts: common and arch-specific.
>>
>> I agree with you that realmode is x86-specific and does not look good
>> in an Arm header.
> It is not a problem of looking good or not. Instead, it is about 
> abstraction. A developer shouldn't need to understand all the other 
> architectures we support in order to follow the common code.
>
>> I was thinking about how to split handle_hvm_io_completion()
>> gracefully, but I failed to find a good solution, so I decided to add
>> two stubs (msix_write_completion and handle_realmode_completion) on
>> Arm. I could add a comment describing why they are here, if
>> appropriate. But if you think they shouldn't be called from the
>> common code in any way, I will try to split it.
>
> I am not entirely sure what msix_write_completion is meant to do on 
> x86. Is it dealing with virtual MSIx? Maybe Jan, Roger or Paul could 
> help?
>
> Regarding handle_realmode_completion, I would add a new stub:
>
> arch_ioreq_handle_io_completion() that is called from the default case 
> of the switch.
>
> On x86 it would be implemented as:
>
>  switch (io_completion)
>  {
>     case HVMIO_realmode_completion:
>       ...
>     default:
>       ASSERT_UNREACHABLE();
>  }
>
> On Arm, it would be implemented as:
>
>   ASSERT_UNREACHABLE();


Good point, will update.
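The split Julien sketches above could look roughly like this (a compilable toy model; the hook name comes from the suggestion, the enum values and return convention are assumptions):

```c
#include <assert.h>
#include <stdbool.h>

enum hvm_io_completion {
    HVMIO_no_completion,
    HVMIO_mmio_completion,
    HVMIO_pio_completion,
    HVMIO_realmode_completion,   /* x86-only */
};

/*
 * Arch hook reached from the default case: the x86 variant would
 * handle HVMIO_realmode_completion and ASSERT_UNREACHABLE() otherwise;
 * the Arm variant is unconditionally unreachable. Returns true if the
 * completion was consumed.
 */
static bool arch_ioreq_handle_io_completion(enum hvm_io_completion c)
{
#ifdef CONFIG_X86
    return c == HVMIO_realmode_completion; /* handle_realmode_completion() */
#else
    (void)c;                               /* ASSERT_UNREACHABLE() in Xen */
    return false;
#endif
}

/* Common part of handle_hvm_io_completion(): only generic cases here. */
static bool handle_io_completion(enum hvm_io_completion c)
{
    switch ( c )
    {
    case HVMIO_no_completion:
        return true;
    case HVMIO_mmio_completion:
        return true;    /* would call handle_mmio() */
    case HVMIO_pio_completion:
        return true;    /* would call handle_pio() */
    default:
        return arch_ioreq_handle_io_completion(c);
    }
}
```

With this shape, no x86-only completion name ever appears in common code or in an Arm header.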


-- 
Regards,

Oleksandr Tyshchenko



^ permalink raw reply	[flat|nested] 140+ messages in thread

* Re: [RFC PATCH V1 08/12] xen/arm: Invalidate qemu mapcache on XENMEM_decrease_reservation
  2020-08-06 11:50       ` Jan Beulich
@ 2020-08-06 14:28         ` Oleksandr
  2020-08-06 16:33           ` Jan Beulich
  0 siblings, 1 reply; 140+ messages in thread
From: Oleksandr @ 2020-08-06 14:28 UTC (permalink / raw)
  To: Jan Beulich
  Cc: Stefano Stabellini, Julien Grall, Wei Liu, Andrew Cooper,
	Ian Jackson, George Dunlap, Oleksandr Tyshchenko, xen-devel,
	Volodymyr Babchuk


On 06.08.20 14:50, Jan Beulich wrote:

Hi Jan

>>> On 03.08.2020 20:21, Oleksandr Tyshchenko wrote:
>>>> --- a/xen/common/memory.c
>>>> +++ b/xen/common/memory.c
>>>> @@ -1652,6 +1652,12 @@ long do_memory_op(unsigned long cmd, XEN_GUEST_HANDLE_PARAM(void) arg)
>>>>            break;
>>>>        }
>>>>    
>>>> +    /* x86 already sets the flag in hvm_memory_op() */
>>>> +#if defined(CONFIG_ARM64) && defined(CONFIG_IOREQ_SERVER)
>>>> +    if ( op == XENMEM_decrease_reservation )
>>>> +        curr_d->arch.hvm.qemu_mapcache_invalidate = true;
>>>> +#endif
>>> Doesn't the comment already indicate a route towards an approach
>>> not requiring to alter common code?
>> Given that IOREQ is now moved under common/, I think it would make sense
>> to have this set in common code as well, for all architectures.
>>
>> IOW, I would suggest dropping the #ifdef CONFIG_ARM64. In addition, we
>> may want to introduce a helper to check whether a domain is using ioreq.
> Of course, with the (part of the) conditional dropped and the struct
> field moved out of the arch sub-struct, this is fine to live here.

ok.


I suspect this should *also* live in compat_memory_op(). Please confirm 
whether my understanding is correct.


-- 
Regards,

Oleksandr Tyshchenko




* Re: [RFC PATCH V1 08/12] xen/arm: Invalidate qemu mapcache on XENMEM_decrease_reservation
  2020-08-06 14:28         ` Oleksandr
@ 2020-08-06 16:33           ` Jan Beulich
  2020-08-06 16:57             ` Oleksandr
  0 siblings, 1 reply; 140+ messages in thread
From: Jan Beulich @ 2020-08-06 16:33 UTC (permalink / raw)
  To: Oleksandr
  Cc: Stefano Stabellini, Julien Grall, Wei Liu, Andrew Cooper,
	Ian Jackson, George Dunlap, Oleksandr Tyshchenko, xen-devel,
	Volodymyr Babchuk

On 06.08.2020 16:28, Oleksandr wrote:
> 
> On 06.08.20 14:50, Jan Beulich wrote:
> 
> Hi Jan
> 
>>>> On 03.08.2020 20:21, Oleksandr Tyshchenko wrote:
>>>>> --- a/xen/common/memory.c
>>>>> +++ b/xen/common/memory.c
>>>>> @@ -1652,6 +1652,12 @@ long do_memory_op(unsigned long cmd, XEN_GUEST_HANDLE_PARAM(void) arg)
>>>>>            break;
>>>>>        }
>>>>>    
>>>>> +    /* x86 already sets the flag in hvm_memory_op() */
>>>>> +#if defined(CONFIG_ARM64) && defined(CONFIG_IOREQ_SERVER)
>>>>> +    if ( op == XENMEM_decrease_reservation )
>>>>> +        curr_d->arch.hvm.qemu_mapcache_invalidate = true;
>>>>> +#endif
>>>> Doesn't the comment already indicate a route towards an approach
>>>> not requiring to alter common code?
>>> Given that IOREQ is now moved under common/, I think it would make sense
>>> to have this set in common code as well, for all architectures.
>>>
>>> IOW, I would suggest dropping the #ifdef CONFIG_ARM64. In addition, we
>>> may want to introduce a helper to check whether a domain is using ioreq.
>> Of course, with the (part of the) conditional dropped and the struct
>> field moved out of the arch sub-struct, this is fine to live here.
> 
> ok.
> 
> 
> I suspect this should *also* live in compat_memory_op(). Please confirm 
> whether my understanding is correct.

Doesn't compat_memory_op() simply call here, so will have the flag set
as needed?

Jan



* Re: [RFC PATCH V1 08/12] xen/arm: Invalidate qemu mapcache on XENMEM_decrease_reservation
  2020-08-06 16:33           ` Jan Beulich
@ 2020-08-06 16:57             ` Oleksandr
  0 siblings, 0 replies; 140+ messages in thread
From: Oleksandr @ 2020-08-06 16:57 UTC (permalink / raw)
  To: Jan Beulich
  Cc: Stefano Stabellini, Julien Grall, Wei Liu, Andrew Cooper,
	Ian Jackson, George Dunlap, Oleksandr Tyshchenko, xen-devel,
	Volodymyr Babchuk


On 06.08.20 19:33, Jan Beulich wrote:

Hi Jan.

> On 06.08.2020 16:28, Oleksandr wrote:
>> On 06.08.20 14:50, Jan Beulich wrote:
>>
>> Hi Jan
>>
>>>>> On 03.08.2020 20:21, Oleksandr Tyshchenko wrote:
>>>>>> --- a/xen/common/memory.c
>>>>>> +++ b/xen/common/memory.c
>>>>>> @@ -1652,6 +1652,12 @@ long do_memory_op(unsigned long cmd, XEN_GUEST_HANDLE_PARAM(void) arg)
>>>>>>             break;
>>>>>>         }
>>>>>>     
>>>>>> +    /* x86 already sets the flag in hvm_memory_op() */
>>>>>> +#if defined(CONFIG_ARM64) && defined(CONFIG_IOREQ_SERVER)
>>>>>> +    if ( op == XENMEM_decrease_reservation )
>>>>>> +        curr_d->arch.hvm.qemu_mapcache_invalidate = true;
>>>>>> +#endif
>>>>> Doesn't the comment already indicate a route towards an approach
>>>>> not requiring to alter common code?
>>>> Given that IOREQ is now moved under common/, I think it would make sense
>>>> to have this set in common code as well, for all architectures.
>>>>
>>>> IOW, I would suggest dropping the #ifdef CONFIG_ARM64. In addition, we
>>>> may want to introduce a helper to check whether a domain is using ioreq.
>>> Of course, with the (part of the) conditional dropped and the struct
>>> field moved out of the arch sub-struct, this is fine to live here.
>> ok.
>>
>>
>> I suspect this should *also* live in compat_memory_op(). Please confirm
>> whether my understanding is correct.
> Doesn't compat_memory_op() simply call here, so will have the flag set
> as needed?
Indeed, sorry for the noise.


-- 
Regards,

Oleksandr Tyshchenko




* Re: [RFC PATCH V1 01/12] hvm/ioreq: Make x86's IOREQ feature common
  2020-08-06  6:59               ` Jan Beulich
@ 2020-08-06 20:32                 ` Stefano Stabellini
  2020-08-07 13:19                   ` Oleksandr
  0 siblings, 1 reply; 140+ messages in thread
From: Stefano Stabellini @ 2020-08-06 20:32 UTC (permalink / raw)
  To: Jan Beulich
  Cc: 'Kevin Tian',
	Stefano Stabellini, Julien Grall, 'Wei Liu',
	paul, 'Andrew Cooper', 'Ian Jackson',
	'George Dunlap', 'Tim Deegan',
	Oleksandr, 'Oleksandr Tyshchenko', 'Julien Grall',
	'Jun Nakajima', xen-devel, 'Roger Pau Monné'

On Thu, 6 Aug 2020, Jan Beulich wrote:
> On 06.08.2020 02:37, Stefano Stabellini wrote:
> > What should do_trap_stage2_abort_guest do on IO_RETRY? Simply return
> > early and let the scheduler do its job? Something like:
> > 
> >             enum io_state state = try_handle_mmio(regs, hsr, gpa);
> > 
> >             switch ( state )
> >             {
> >             case IO_ABORT:
> >                 goto inject_abt;
> >             case IO_HANDLED:
> >                 advance_pc(regs, hsr);
> >                 return;
> >             case IO_RETRY:
> >                 /* finish later */
> >                 return;
> >             case IO_UNHANDLED:
> >                 /* IO unhandled, try another way to handle it. */
> >                 break;
> >             default:
> >                 ASSERT_UNREACHABLE();
> >             }
> > 
> > Then, xen/arch/arm/ioreq.c:handle_mmio() gets called by
> > handle_hvm_io_completion() after QEMU completes the emulation. Today,
> > handle_mmio just sets the user register with the read value.
> > 
> > But it would be better if it called again the original function
> > do_trap_stage2_abort_guest to actually retry the original operation.
> > This time do_trap_stage2_abort_guest calls try_handle_mmio() and gets
> > IO_HANDLED instead of IO_RETRY, thus, it will advance_pc (the program
> > counter) completing the handling of this instruction.
> > 
> > The user register with the read value could be set by try_handle_mmio if
> > try_fwd_ioserv returns IO_HANDLED instead of IO_RETRY.
> > 
> > Is that how the state machine is expected to work?
> 
> I think so. Just because it has taken us quite some time (years) on
> the x86 side to get reasonably close to how hardware would behave
> (I think we're still not fully there): The re-execution path needs
> to make sure it observes exactly the same machine state as the
> original path did. In particular changes to memory (by another vCPU)
> must not be observed.

Thanks for the heads up. I think I understand how it is supposed to work
now. I hope Oleksandr is on the same page.
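The agreed state machine can be condensed into a small runnable sketch (names are borrowed from the discussion; the trap handler is of course heavily simplified, and the "emulation done" flag is an assumption standing in for the device model completing the ioreq):

```c
#include <assert.h>
#include <stdbool.h>

enum io_state { IO_ABORT, IO_HANDLED, IO_RETRY, IO_UNHANDLED };

struct vcpu {
    unsigned long pc;
    bool emulation_done;   /* set once the device model completed the ioreq */
};

/*
 * Model of try_handle_mmio()/try_fwd_ioserv(): the first pass forwards
 * the access to the device model and returns IO_RETRY; once the device
 * model has completed, re-execution yields IO_HANDLED.
 */
static enum io_state try_handle_mmio(struct vcpu *v)
{
    return v->emulation_done ? IO_HANDLED : IO_RETRY;
}

/*
 * Model of do_trap_stage2_abort_guest(): advance the PC only once the
 * access has actually been handled; on IO_RETRY just return and finish
 * later, after handle_hvm_io_completion() re-runs the abort path.
 */
static void do_trap_stage2_abort_guest(struct vcpu *v)
{
    switch ( try_handle_mmio(v) )
    {
    case IO_HANDLED:
        v->pc += 4;        /* advance_pc() */
        return;
    case IO_RETRY:
        return;            /* finish later */
    default:
        return;            /* IO_ABORT/IO_UNHANDLED paths omitted */
    }
}
```

The key property, per Jan's warning, is that the second pass must observe the same machine state as the first; in this model that is trivially true because only the completion flag changes.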



* Re: [RFC PATCH V1 01/12] hvm/ioreq: Make x86's IOREQ feature common
  2020-08-06  9:45               ` Julien Grall
@ 2020-08-06 23:48                 ` Stefano Stabellini
  2020-08-10 19:20                   ` Julien Grall
  0 siblings, 1 reply; 140+ messages in thread
From: Stefano Stabellini @ 2020-08-06 23:48 UTC (permalink / raw)
  To: Julien Grall
  Cc: 'Kevin Tian', Stefano Stabellini, 'Jun Nakajima',
	'Wei Liu', paul, 'Andrew Cooper',
	'Ian Jackson', 'George Dunlap',
	'Tim Deegan', Oleksandr, 'Oleksandr Tyshchenko',
	'Julien Grall', 'Jan Beulich',
	xen-devel, 'Roger Pau Monné'


On Thu, 6 Aug 2020, Julien Grall wrote:
> On 06/08/2020 01:37, Stefano Stabellini wrote:
> > On Wed, 5 Aug 2020, Julien Grall wrote:
> > > On 04/08/2020 20:11, Stefano Stabellini wrote:
> > > > On Tue, 4 Aug 2020, Julien Grall wrote:
> > > > > On 04/08/2020 12:10, Oleksandr wrote:
> > > > > > On 04.08.20 10:45, Paul Durrant wrote:
> > > > > > > > +static inline bool hvm_ioreq_needs_completion(const ioreq_t
> > > > > > > > *ioreq)
> > > > > > > > +{
> > > > > > > > +    return ioreq->state == STATE_IOREQ_READY &&
> > > > > > > > +           !ioreq->data_is_ptr &&
> > > > > > > > +           (ioreq->type != IOREQ_TYPE_PIO || ioreq->dir !=
> > > > > > > > IOREQ_WRITE);
> > > > > > > > +}
> > > > > > > I don't think having this in common code is correct. The short-cut
> > > > > > > of
> > > > > > > not
> > > > > > > completing PIO reads seems somewhat x86 specific.
> > > > > 
> > > > > Hmmm, looking at the code, I think it doesn't wait for PIO writes to
> > > > > complete
> > > > > (not read). Did I miss anything?
> > > > > 
> > > > > > Does ARM even
> > > > > > > have the concept of PIO?
> > > > > > 
> > > > > > I am not 100% sure here, but it seems that doesn't have.
> > > > > 
> > > > > Technically, the PIOs exist on Arm, however they are accessed the same
> > > > > way
> > > > > as
> > > > > MMIO and will have a dedicated area defined by the HW.
> > > > > 
> > > > > AFAICT, on Arm64, they are only used for PCI IO Bar.
> > > > > 
> > > > > Now the question is whether we want to expose them to the Device
> > > > > Emulator
> > > > > as
> > > > > PIO or MMIO access. From a generic PoV, a DM shouldn't have to care
> > > > > about
> > > > > the
> > > > > architecture used. It should just be able to request a given IOport
> > > > > region.
> > > > > 
> > > > > So it may make sense to differentiate them in the common ioreq code as
> > > > > well.
> > > > > 
> > > > > I had a quick look at QEMU and wasn't able to tell if PIOs and MMIOs
> > > > > address
> > > > > space are different on Arm as well. Paul, Stefano, do you know what
> > > > > they
> > > > > are
> > > > > doing?
> > > > 
> > > > On the QEMU side, it looks like PIO (address_space_io) is used in
> > > > connection with the emulation of the "in" or "out" instructions, see
> > > > ioport.c:cpu_inb for instance. Some parts of PCI on QEMU emulate PIO
> > > > space regardless of the architecture, such as
> > > > hw/pci/pci_bridge.c:pci_bridge_initfn.
> > > > 
> > > > However, because there is no "in" and "out" on ARM, I don't think
> > > > address_space_io can be accessed. Specifically, there is no equivalent
> > > > for target/i386/misc_helper.c:helper_inb on ARM.
> > > 
> > > So how PCI I/O BAR are accessed? Surely, they could be used on Arm, right?
> > 
> > PIO is also memory mapped on ARM and it seems to have its own MMIO
> > address window.
> This part is already well-understood :). However, this only tells us how an OS
> is accessing a PIO.
> 
> What I am trying to figure out is how the hardware (or QEMU) is meant to work.
> 
> From my understanding, the MMIO access will be received by the hostbridge and
> then forwarded to the appropriate PCI device. The two questions I am trying to
> answer are: how are the I/O BARs configured? Will they contain an MMIO address
> or an offset?
> 
> If the answer is the latter, then we will need PIO because a DM will never see
> the MMIO address (the hostbridge will be emulated in Xen).

Now I understand the question :-)

This is the way I understand it works. Let's say that the PIO aperture
is 0x1000-0x2000 which is aliased to 0x3eff0000-0x3eff1000.
0x1000-0x2000 are addresses that cannot be accessed directly.
0x3eff0000-0x3eff1000 is the range that works.

A PCI device PIO BAR will have an address in the 0x1000-0x2000 range,
for instance 0x1100.

However, when the operating system accesses 0x1100, it will issue a read
to 0x3eff0100.

Xen will trap the read to 0x3eff0100 and send it to QEMU.

QEMU has to know that 0x3eff0000-0x3eff1000 is the alias to the PIO
aperture and that 0x3eff0100 corresponds to PCI device foobar. Similarly,
QEMU has also to know the address range of the MMIO aperture and its
remappings, if any (it is possible to have address remapping for MMIO
addresses too.)

I think today this information is built into QEMU, not configurable. It
works fine because *I think* the PCI aperture is pretty much the same on
x86 boards, at least the one supported by QEMU for Xen.

On ARM, I think we should explicitly declare the PCI MMIO aperture and
its alias/address-remapping. When we do that, we can also declare the
PIO aperture and its alias/address-remapping.


* Re: [RFC PATCH V1 05/12] hvm/dm: Introduce xendevicemodel_set_irq_level DM op
  2020-08-06 11:32         ` Julien Grall
@ 2020-08-06 23:49           ` Stefano Stabellini
  2020-08-07  8:43             ` Jan Beulich
  0 siblings, 1 reply; 140+ messages in thread
From: Stefano Stabellini @ 2020-08-06 23:49 UTC (permalink / raw)
  To: Julien Grall
  Cc: Stefano Stabellini, Wei Liu, Andrew Cooper, Ian Jackson,
	George Dunlap, Oleksandr Tyshchenko, Oleksandr Tyshchenko,
	Julien Grall, Jan Beulich, xen-devel, Volodymyr Babchuk

On Thu, 6 Aug 2020, Julien Grall wrote:
> On 06/08/2020 01:37, Stefano Stabellini wrote:
> > On Wed, 5 Aug 2020, Julien Grall wrote:
> > > On 05/08/2020 00:22, Stefano Stabellini wrote:
> > > > On Mon, 3 Aug 2020, Oleksandr Tyshchenko wrote:
> > > > > From: Oleksandr Tyshchenko <oleksandr_tyshchenko@epam.com>
> > > > > 
> > > > > This patch adds the ability for the device emulator to notify the
> > > > > other end (some entity running in the guest) using an SPI, and
> > > > > implements the Arm-specific bits for it. The proposed interface
> > > > > allows the emulator to set the logical level of one of a domain's
> > > > > IRQ lines.
> > > > > 
> > > > > Please note, this is a split/cleanup of Julien's PoC:
> > > > > "Add support for Guest IO forwarding to a device emulator"
> > > > > 
> > > > > Signed-off-by: Julien Grall <julien.grall@arm.com>
> > > > > Signed-off-by: Oleksandr Tyshchenko <oleksandr_tyshchenko@epam.com>
> > > > > ---
> > > > >    tools/libs/devicemodel/core.c                   | 18
> > > > > ++++++++++++++++++
> > > > >    tools/libs/devicemodel/include/xendevicemodel.h |  4 ++++
> > > > >    tools/libs/devicemodel/libxendevicemodel.map    |  1 +
> > > > >    xen/arch/arm/dm.c                               | 22
> > > > > +++++++++++++++++++++-
> > > > >    xen/common/hvm/dm.c                             |  1 +
> > > > >    xen/include/public/hvm/dm_op.h                  | 15
> > > > > +++++++++++++++
> > > > >    6 files changed, 60 insertions(+), 1 deletion(-)
> > > > > 
> > > > > diff --git a/tools/libs/devicemodel/core.c
> > > > > b/tools/libs/devicemodel/core.c
> > > > > index 4d40639..30bd79f 100644
> > > > > --- a/tools/libs/devicemodel/core.c
> > > > > +++ b/tools/libs/devicemodel/core.c
> > > > > @@ -430,6 +430,24 @@ int xendevicemodel_set_isa_irq_level(
> > > > >        return xendevicemodel_op(dmod, domid, 1, &op, sizeof(op));
> > > > >    }
> > > > >    +int xendevicemodel_set_irq_level(
> > > > > +    xendevicemodel_handle *dmod, domid_t domid, uint32_t irq,
> > > > > +    unsigned int level)
> > > > 
> > > > It is a pity that having xen_dm_op_set_pci_intx_level and
> > > > xen_dm_op_set_isa_irq_level already we need to add a third one, but from
> > > > the names alone I don't think we can reuse either of them.
> > > 
> > > The problem is not the name...
> > > 
> > > > 
> > > > It is very similar to set_isa_irq_level. We could almost rename
> > > > xendevicemodel_set_isa_irq_level to xendevicemodel_set_irq_level or,
> > > > better, just add an alias to it so that xendevicemodel_set_irq_level is
> > > > implemented by calling xendevicemodel_set_isa_irq_level. Honestly I am
> > > > not sure if it is worth doing it though. Any other opinions?
> > > 
> > > ... the problem is the interrupt field is only 8-bit. So we would only be
> > > able
> > > to cover IRQ 0 - 255.
> > 
> > Argh, that's not going to work :-(  I wasn't sure if it was a good idea
> > anyway.
> > 
> > 
> > > It is not entirely clear how the existing subop could be extended without
> > > breaking existing callers.
> > > 
> > > > But I think we should plan for not needing two calls (one to set level
> > > > to 1, and one to set it to 0):
> > > > https://marc.info/?l=xen-devel&m=159535112027405
> > > 
> > > I am not sure to understand your suggestion here? Are you suggesting to
> > > remove
> > > the 'level' parameter?
> > 
> > My hope was to make it optional to call the hypercall with level = 0,
> > not necessarily to remove 'level' from the struct.
> 
> From my understanding, the hypercall is meant to represent the status of the
> line between the device and the interrupt controller (either low or high).
> 
> It is then up to the interrupt controller to decide when the interrupt is
> going to be fired:
>   - For an edge interrupt, this will fire when the line moves from low to
> high (or vice versa).
>   - For a level interrupt, this will fire when the line is high (assuming
> level-triggered high) and will keep firing until the device decides to lower
> the line.
> 
> For a device, it is common to keep the line high until the OS writes to a
> specific register.
> 
> Furthermore, technically, the guest OS is in charge of configuring how an
> interrupt is triggered. Admittedly this information is part of the DT, but
> nothing prevents a guest from changing it.
> 
> As a side note, we have a workaround in Xen for some buggy DTs (see the arch
> timer) exposing the wrong trigger type.
> 
> Because of that, I don't really see a way to make it optional. Maybe you have
> something different in mind?

For level, we need the level parameter. For edge, we are only interested
in the "edge", right? So maybe we don't want to specify the interrupt
line state for edge interrupts (note that virtio interrupts are edge.)

Maybe something like this:

    struct xen_dm_op_set_irq {
        uint32_t irq;
        #define XEN_DMOP_IRQ_LEVEL 0
        #define XEN_DMOP_IRQ_EDGE  1
        uint8_t type;
        /*
         * If type is XEN_DMOP_IRQ_EDGE, level is ignored.
         * If type is XEN_DMOP_IRQ_LEVEL, level means:
         *      0 -> line low (deasserted)
         *      1 -> line high (asserted)
         */
        uint8_t level;
    };

So a level interrupt would get two xen_dm_op_set_irq hypercalls, an edge
interrupt only one.
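The intended semantics can be modeled as a toy interrupt-controller sketch; the struct mirrors the proposal above, while the vGIC behaviour is deliberately simplified to "inject on rising edge" (a real level-triggered line would keep re-firing while high):

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define XEN_DMOP_IRQ_LEVEL 0
#define XEN_DMOP_IRQ_EDGE  1

/* Mirrors the proposed xen_dm_op_set_irq layout. */
struct xen_dm_op_set_irq {
    uint32_t irq;
    uint8_t type;
    uint8_t level;          /* ignored for XEN_DMOP_IRQ_EDGE */
};

/* Toy per-line vGIC state. */
struct virq {
    bool line;              /* last known line state (level only) */
    unsigned int fired;     /* how many times the interrupt was injected */
};

static void dm_op_set_irq(struct virq *v, const struct xen_dm_op_set_irq *op)
{
    if ( op->type == XEN_DMOP_IRQ_EDGE )
    {
        v->fired++;         /* one hypercall == one edge */
        return;
    }
    /* Level: inject on a low -> high transition, remember the line state. */
    if ( op->level && !v->line )
        v->fired++;
    v->line = op->level;
}
```

So an edge (e.g. virtio) interrupt costs a single hypercall, while a level interrupt still needs a second call to drop the line before it can be asserted again.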



* Re: [RFC PATCH V1 05/12] hvm/dm: Introduce xendevicemodel_set_irq_level DM op
  2020-08-06 23:49           ` Stefano Stabellini
@ 2020-08-07  8:43             ` Jan Beulich
  2020-08-07 21:50               ` Stefano Stabellini
  0 siblings, 1 reply; 140+ messages in thread
From: Jan Beulich @ 2020-08-07  8:43 UTC (permalink / raw)
  To: Stefano Stabellini
  Cc: Julien Grall, Wei Liu, Andrew Cooper, Ian Jackson, George Dunlap,
	Oleksandr Tyshchenko, Oleksandr Tyshchenko, Julien Grall,
	xen-devel, Volodymyr Babchuk

On 07.08.2020 01:49, Stefano Stabellini wrote:
> On Thu, 6 Aug 2020, Julien Grall wrote:
>> On 06/08/2020 01:37, Stefano Stabellini wrote:
>>> On Wed, 5 Aug 2020, Julien Grall wrote:
>>>> On 05/08/2020 00:22, Stefano Stabellini wrote:
>>>>> On Mon, 3 Aug 2020, Oleksandr Tyshchenko wrote:
>>>>>> From: Oleksandr Tyshchenko <oleksandr_tyshchenko@epam.com>
>>>>>>
>>>>>> This patch adds the ability for the device emulator to notify the
>>>>>> other end (some entity running in the guest) using an SPI, and
>>>>>> implements the Arm-specific bits for it. The proposed interface
>>>>>> allows the emulator to set the logical level of one of a domain's
>>>>>> IRQ lines.
>>>>>>
>>>>>> Please note, this is a split/cleanup of Julien's PoC:
>>>>>> "Add support for Guest IO forwarding to a device emulator"
>>>>>>
>>>>>> Signed-off-by: Julien Grall <julien.grall@arm.com>
>>>>>> Signed-off-by: Oleksandr Tyshchenko <oleksandr_tyshchenko@epam.com>
>>>>>> ---
>>>>>>    tools/libs/devicemodel/core.c                   | 18
>>>>>> ++++++++++++++++++
>>>>>>    tools/libs/devicemodel/include/xendevicemodel.h |  4 ++++
>>>>>>    tools/libs/devicemodel/libxendevicemodel.map    |  1 +
>>>>>>    xen/arch/arm/dm.c                               | 22
>>>>>> +++++++++++++++++++++-
>>>>>>    xen/common/hvm/dm.c                             |  1 +
>>>>>>    xen/include/public/hvm/dm_op.h                  | 15
>>>>>> +++++++++++++++
>>>>>>    6 files changed, 60 insertions(+), 1 deletion(-)
>>>>>>
>>>>>> diff --git a/tools/libs/devicemodel/core.c
>>>>>> b/tools/libs/devicemodel/core.c
>>>>>> index 4d40639..30bd79f 100644
>>>>>> --- a/tools/libs/devicemodel/core.c
>>>>>> +++ b/tools/libs/devicemodel/core.c
>>>>>> @@ -430,6 +430,24 @@ int xendevicemodel_set_isa_irq_level(
>>>>>>        return xendevicemodel_op(dmod, domid, 1, &op, sizeof(op));
>>>>>>    }
>>>>>>    +int xendevicemodel_set_irq_level(
>>>>>> +    xendevicemodel_handle *dmod, domid_t domid, uint32_t irq,
>>>>>> +    unsigned int level)
>>>>>
>>>>> It is a pity that having xen_dm_op_set_pci_intx_level and
>>>>> xen_dm_op_set_isa_irq_level already we need to add a third one, but from
>>>>> the names alone I don't think we can reuse either of them.
>>>>
>>>> The problem is not the name...
>>>>
>>>>>
>>>>> It is very similar to set_isa_irq_level. We could almost rename
>>>>> xendevicemodel_set_isa_irq_level to xendevicemodel_set_irq_level or,
>>>>> better, just add an alias to it so that xendevicemodel_set_irq_level is
>>>>> implemented by calling xendevicemodel_set_isa_irq_level. Honestly I am
>>>>> not sure if it is worth doing it though. Any other opinions?
>>>>
>>>> ... the problem is the interrupt field is only 8-bit. So we would only be
>>>> able
>>>> to cover IRQ 0 - 255.
>>>
>>> Argh, that's not going to work :-(  I wasn't sure if it was a good idea
>>> anyway.
>>>
>>>
>>>> It is not entirely clear how the existing subop could be extended without
>>>> breaking existing callers.
>>>>
>>>>> But I think we should plan for not needing two calls (one to set level
>>>>> to 1, and one to set it to 0):
>>>>> https://marc.info/?l=xen-devel&m=159535112027405
>>>>
>>>> I am not sure to understand your suggestion here? Are you suggesting to
>>>> remove
>>>> the 'level' parameter?
>>>
>>> My hope was to make it optional to call the hypercall with level = 0,
>>> not necessarily to remove 'level' from the struct.
>>
>> From my understanding, the hypercall is meant to represent the status of the
>> line between the device and the interrupt controller (either low or high).
>>
>> It is then up to the interrupt controller to decide when the interrupt is
>> going to be fired:
>>   - For an edge interrupt, this will fire when the line moves from low to
>> high (or vice versa).
>>   - For a level interrupt, this will fire when the line is high (assuming
>> level-triggered high) and will keep firing until the device decides to lower
>> the line.
>>
>> For a device, it is common to keep the line high until the OS writes to a
>> specific register.
>>
>> Furthermore, technically, the guest OS is in charge of configuring how an
>> interrupt is triggered. Admittedly this information is part of the DT, but
>> nothing prevents a guest from changing it.
>>
>> As a side note, we have a workaround in Xen for some buggy DTs (see the arch
>> timer) exposing the wrong trigger type.
>>
>> Because of that, I don't really see a way to make it optional. Maybe you have
>> something different in mind?
> 
> For level, we need the level parameter. For edge, we are only interested
> in the "edge", right?

I don't think so, unless Arm has special restrictions. Edges can be
both rising and falling ones.

Jan



* Re: [RFC PATCH V1 01/12] hvm/ioreq: Make x86's IOREQ feature common
  2020-08-06 20:32                 ` Stefano Stabellini
@ 2020-08-07 13:19                   ` Oleksandr
  0 siblings, 0 replies; 140+ messages in thread
From: Oleksandr @ 2020-08-07 13:19 UTC (permalink / raw)
  To: Stefano Stabellini
  Cc: 'Kevin Tian', Julien Grall, 'Jun Nakajima',
	'Wei Liu', paul, 'Andrew Cooper',
	'Ian Jackson', 'George Dunlap',
	'Tim Deegan', 'Oleksandr Tyshchenko',
	'Julien Grall',
	Jan Beulich, xen-devel, 'Roger Pau Monné'


On 06.08.20 23:32, Stefano Stabellini wrote:

Hi Stefano

> On Thu, 6 Aug 2020, Jan Beulich wrote:
>> On 06.08.2020 02:37, Stefano Stabellini wrote:
>>> What should do_trap_stage2_abort_guest do on IO_RETRY? Simply return
>>> early and let the scheduler do its job? Something like:
>>>
>>>              enum io_state state = try_handle_mmio(regs, hsr, gpa);
>>>
>>>              switch ( state )
>>>              {
>>>              case IO_ABORT:
>>>                  goto inject_abt;
>>>              case IO_HANDLED:
>>>                  advance_pc(regs, hsr);
>>>                  return;
>>>              case IO_RETRY:
>>>                  /* finish later */
>>>                  return;
>>>              case IO_UNHANDLED:
>>>                  /* IO unhandled, try another way to handle it. */
>>>                  break;
>>>              default:
>>>                  ASSERT_UNREACHABLE();
>>>              }
>>>
>>> Then, xen/arch/arm/ioreq.c:handle_mmio() gets called by
>>> handle_hvm_io_completion() after QEMU completes the emulation. Today,
>>> handle_mmio just sets the user register with the read value.
>>>
>>> But it would be better if it called again the original function
>>> do_trap_stage2_abort_guest to actually retry the original operation.
>>> This time do_trap_stage2_abort_guest calls try_handle_mmio() and gets
>>> IO_HANDLED instead of IO_RETRY, thus, it will advance_pc (the program
>>> counter) completing the handling of this instruction.
>>>
>>> The user register with the read value could be set by try_handle_mmio if
>>> try_fwd_ioserv returns IO_HANDLED instead of IO_RETRY.
>>>
>>> Is that how the state machine is expected to work?
>> I think so. Just because it has taken us quite some time (years) on
>> the x86 side to get reasonably close to how hardware would behave
>> (I think we're still not fully there): The re-execution path needs
>> to make sure it observes exactly the same machine state as the
>> original path did. In particular changes to memory (by another vCPU)
>> must not be observed.
> Thanks for the heads up. I think I understand how it is supposed to work
> now. I hope Oleksandr is on the same page.

Not completely. I am still going through the discussion and navigating 
the code, trying to understand the bits that are still missing for me.

Thanks for trying to clarify how it is supposed to work.


-- 
Regards,

Oleksandr Tyshchenko



^ permalink raw reply	[flat|nested] 140+ messages in thread

* Re: [RFC PATCH V1 01/12] hvm/ioreq: Make x86's IOREQ feature common
  2020-08-06  0:37             ` Stefano Stabellini
  2020-08-06  6:59               ` Jan Beulich
@ 2020-08-07 16:45               ` Oleksandr
  2020-08-07 21:50                 ` Stefano Stabellini
  1 sibling, 1 reply; 140+ messages in thread
From: Oleksandr @ 2020-08-07 16:45 UTC (permalink / raw)
  To: Stefano Stabellini
  Cc: 'Kevin Tian', Julien Grall, 'Jun Nakajima',
	'Wei Liu', paul, 'Andrew Cooper',
	'Ian Jackson', 'George Dunlap',
	'Tim Deegan', 'Oleksandr Tyshchenko',
	'Julien Grall',
	Jan Beulich, xen-devel, 'Roger Pau Monné'


On 06.08.20 03:37, Stefano Stabellini wrote:

Hi Stefano

While trying to simulate the IO_RETRY handling mechanism (according to 
the model below) I continuously get IO_RETRY from try_fwd_ioserv() ...

> OK, thanks for the details. My interpretation seems to be correct.
>
> In which case, it looks like xen/arch/arm/io.c:try_fwd_ioserv should
> return IO_RETRY. Then, xen/arch/arm/traps.c:do_trap_stage2_abort_guest
> also needs to handle try_handle_mmio returning IO_RETRY the first
> around, and IO_HANDLED when after QEMU does its job.
>
> What should do_trap_stage2_abort_guest do on IO_RETRY? Simply return
> early and let the scheduler do its job? Something like:
>
>              enum io_state state = try_handle_mmio(regs, hsr, gpa);
>
>              switch ( state )
>              {
>              case IO_ABORT:
>                  goto inject_abt;
>              case IO_HANDLED:
>                  advance_pc(regs, hsr);
>                  return;
>              case IO_RETRY:
>                  /* finish later */
>                  return;
>              case IO_UNHANDLED:
>                  /* IO unhandled, try another way to handle it. */
>                  break;
>              default:
>                  ASSERT_UNREACHABLE();
>              }
>
> Then, xen/arch/arm/ioreq.c:handle_mmio() gets called by
> handle_hvm_io_completion() after QEMU completes the emulation. Today,
> handle_mmio just sets the user register with the read value.
>
> But it would be better if it called again the original function
> do_trap_stage2_abort_guest to actually retry the original operation.
> This time do_trap_stage2_abort_guest calls try_handle_mmio() and gets
> IO_HANDLED instead of IO_RETRY,
I may be missing some important point, but I fail to see why try_handle_mmio 
(try_fwd_ioserv) would return IO_HANDLED instead of IO_RETRY at this stage.
Or does the current try_fwd_ioserv() logic need rework?


> thus, it will advance_pc (the program
> counter) completing the handling of this instruction.
>
> The user register with the read value could be set by try_handle_mmio if
> try_fwd_ioserv returns IO_HANDLED instead of IO_RETRY.
>
> Is that how the state machine is expected to work?



-- 
Regards,

Oleksandr Tyshchenko




* Re: [RFC PATCH V1 01/12] hvm/ioreq: Make x86's IOREQ feature common
  2020-08-07 16:45               ` Oleksandr
@ 2020-08-07 21:50                 ` Stefano Stabellini
  2020-08-07 22:19                   ` Oleksandr
  2020-08-07 23:45                   ` Oleksandr
  0 siblings, 2 replies; 140+ messages in thread
From: Stefano Stabellini @ 2020-08-07 21:50 UTC (permalink / raw)
  To: Oleksandr
  Cc: 'Kevin Tian',
	Stefano Stabellini, Julien Grall, 'Jun Nakajima',
	'Wei Liu', paul, 'Andrew Cooper',
	'Ian Jackson', 'George Dunlap',
	'Tim Deegan', 'Oleksandr Tyshchenko',
	'Julien Grall',
	Jan Beulich, xen-devel, 'Roger Pau Monné'

On Fri, 7 Aug 2020, Oleksandr wrote:
> On 06.08.20 03:37, Stefano Stabellini wrote:
> 
> Hi Stefano
> 
> Trying to simulate IO_RETRY handling mechanism (according to model below) I
> continuously get IO_RETRY from try_fwd_ioserv() ...
> 
> > OK, thanks for the details. My interpretation seems to be correct.
> > 
> > In which case, it looks like xen/arch/arm/io.c:try_fwd_ioserv should
> > return IO_RETRY. Then, xen/arch/arm/traps.c:do_trap_stage2_abort_guest
> > also needs to handle try_handle_mmio returning IO_RETRY the first
> > around, and IO_HANDLED when after QEMU does its job.
> > 
> > What should do_trap_stage2_abort_guest do on IO_RETRY? Simply return
> > early and let the scheduler do its job? Something like:
> > 
> >              enum io_state state = try_handle_mmio(regs, hsr, gpa);
> > 
> >              switch ( state )
> >              {
> >              case IO_ABORT:
> >                  goto inject_abt;
> >              case IO_HANDLED:
> >                  advance_pc(regs, hsr);
> >                  return;
> >              case IO_RETRY:
> >                  /* finish later */
> >                  return;
> >              case IO_UNHANDLED:
> >                  /* IO unhandled, try another way to handle it. */
> >                  break;
> >              default:
> >                  ASSERT_UNREACHABLE();
> >              }
> > 
> > Then, xen/arch/arm/ioreq.c:handle_mmio() gets called by
> > handle_hvm_io_completion() after QEMU completes the emulation. Today,
> > handle_mmio just sets the user register with the read value.
> > 
> > But it would be better if it called again the original function
> > do_trap_stage2_abort_guest to actually retry the original operation.
> > This time do_trap_stage2_abort_guest calls try_handle_mmio() and gets
> > IO_HANDLED instead of IO_RETRY,
> I may miss some important point, but I failed to see why try_handle_mmio
> (try_fwd_ioserv) will return IO_HANDLED instead of IO_RETRY at this stage.
> Or current try_fwd_ioserv() logic needs rework?

I think you should check ioreq->state in try_fwd_ioserv(): if the
result is ready, then ioreq->state should be STATE_IORESP_READY, and you
can return IO_HANDLED.

That is assuming that you are looking at the live version of the ioreq
shared with QEMU instead of a private copy of it, which I am not sure
about. Looking at try_fwd_ioserv() it would seem that vio->io_req is just
a copy? The live version is returned by get_ioreq()?

Even in handle_hvm_io_completion, instead of setting vio->io_req.state
to STATE_IORESP_READY by hand, it would be better to look at the live
version of the ioreq because QEMU will have already set ioreq->state
to STATE_IORESP_READY (hw/i386/xen/xen-hvm.c:cpu_handle_ioreq).

 
> > thus, it will advance_pc (the program
> > counter) completing the handling of this instruction.
> > 
> > The user register with the read value could be set by try_handle_mmio if
> > try_fwd_ioserv returns IO_HANDLED instead of IO_RETRY.
> > 
> > Is that how the state machine is expected to work?



* Re: [RFC PATCH V1 05/12] hvm/dm: Introduce xendevicemodel_set_irq_level DM op
  2020-08-07  8:43             ` Jan Beulich
@ 2020-08-07 21:50               ` Stefano Stabellini
  2020-08-08  9:27                 ` Julien Grall
  2020-08-17 15:23                 ` Jan Beulich
  0 siblings, 2 replies; 140+ messages in thread
From: Stefano Stabellini @ 2020-08-07 21:50 UTC (permalink / raw)
  To: Jan Beulich
  Cc: Stefano Stabellini, Julien Grall, Wei Liu, Andrew Cooper,
	Ian Jackson, George Dunlap, Oleksandr Tyshchenko,
	Oleksandr Tyshchenko, Julien Grall, xen-devel, Volodymyr Babchuk

On Fri, 7 Aug 2020, Jan Beulich wrote:
> On 07.08.2020 01:49, Stefano Stabellini wrote:
> > On Thu, 6 Aug 2020, Julien Grall wrote:
> >> On 06/08/2020 01:37, Stefano Stabellini wrote:
> >>> On Wed, 5 Aug 2020, Julien Grall wrote:
> >>>> On 05/08/2020 00:22, Stefano Stabellini wrote:
> >>>>> On Mon, 3 Aug 2020, Oleksandr Tyshchenko wrote:
> >>>>>> From: Oleksandr Tyshchenko <oleksandr_tyshchenko@epam.com>
> >>>>>>
> >>>>>> This patch adds ability to the device emulator to notify otherend
> >>>>>> (some entity running in the guest) using a SPI and implements Arm
> >>>>>> specific bits for it. Proposed interface allows emulator to set
> >>>>>> the logical level of a one of a domain's IRQ lines.
> >>>>>>
> >>>>>> Please note, this is a split/cleanup of Julien's PoC:
> >>>>>> "Add support for Guest IO forwarding to a device emulator"
> >>>>>>
> >>>>>> Signed-off-by: Julien Grall <julien.grall@arm.com>
> >>>>>> Signed-off-by: Oleksandr Tyshchenko <oleksandr_tyshchenko@epam.com>
> >>>>>> ---
> >>>>>>    tools/libs/devicemodel/core.c                   | 18
> >>>>>> ++++++++++++++++++
> >>>>>>    tools/libs/devicemodel/include/xendevicemodel.h |  4 ++++
> >>>>>>    tools/libs/devicemodel/libxendevicemodel.map    |  1 +
> >>>>>>    xen/arch/arm/dm.c                               | 22
> >>>>>> +++++++++++++++++++++-
> >>>>>>    xen/common/hvm/dm.c                             |  1 +
> >>>>>>    xen/include/public/hvm/dm_op.h                  | 15
> >>>>>> +++++++++++++++
> >>>>>>    6 files changed, 60 insertions(+), 1 deletion(-)
> >>>>>>
> >>>>>> diff --git a/tools/libs/devicemodel/core.c
> >>>>>> b/tools/libs/devicemodel/core.c
> >>>>>> index 4d40639..30bd79f 100644
> >>>>>> --- a/tools/libs/devicemodel/core.c
> >>>>>> +++ b/tools/libs/devicemodel/core.c
> >>>>>> @@ -430,6 +430,24 @@ int xendevicemodel_set_isa_irq_level(
> >>>>>>        return xendevicemodel_op(dmod, domid, 1, &op, sizeof(op));
> >>>>>>    }
> >>>>>>    +int xendevicemodel_set_irq_level(
> >>>>>> +    xendevicemodel_handle *dmod, domid_t domid, uint32_t irq,
> >>>>>> +    unsigned int level)
> >>>>>
> >>>>> It is a pity that having xen_dm_op_set_pci_intx_level and
> >>>>> xen_dm_op_set_isa_irq_level already we need to add a third one, but from
> >>>>> the names alone I don't think we can reuse either of them.
> >>>>
> >>>> The problem is not the name...
> >>>>
> >>>>>
> >>>>> It is very similar to set_isa_irq_level. We could almost rename
> >>>>> xendevicemodel_set_isa_irq_level to xendevicemodel_set_irq_level or,
> >>>>> better, just add an alias to it so that xendevicemodel_set_irq_level is
> >>>>> implemented by calling xendevicemodel_set_isa_irq_level. Honestly I am
> >>>>> not sure if it is worth doing it though. Any other opinions?
> >>>>
> >>>> ... the problem is the interrupt field is only 8-bit. So we would only be
> >>>> able
> >>>> to cover IRQ 0 - 255.
> >>>
> >>> Argh, that's not going to work :-(  I wasn't sure if it was a good idea
> >>> anyway.
> >>>
> >>>
> >>>> It is not entirely clear how the existing subop could be extended without
> >>>> breaking existing callers.
> >>>>
> >>>>> But I think we should plan for not needing two calls (one to set level
> >>>>> to 1, and one to set it to 0):
> >>>>> https://marc.info/?l=xen-devel&m=159535112027405
> >>>>
> >>>> I am not sure to understand your suggestion here? Are you suggesting to
> >>>> remove
> >>>> the 'level' parameter?
> >>>
> >>> My hope was to make it optional to call the hypercall with level = 0,
> >>> not necessarily to remove 'level' from the struct.
> >>
> >> From my understanding, the hypercall is meant to represent the status of the
> >> line between the device and the interrupt controller (either low or high).
> >>
> >> This is then up to the interrupt controller to decide when the interrupt is
> >> going to be fired:
> >>   - For edge interrupt, this will fire when the line move from low to high (or
> >> vice versa).
> >>   - For level interrupt, this will fire when line is high (assuming level
> >> trigger high) and will keeping firing until the device decided to lower the
> >> line.
> >>
> >> For a device, it is common to keep the line high until an OS wrote to a
> >> specific register.
> >>
> >> Furthermore, technically, the guest OS is in charge to configure how an
> >> interrupt is triggered. Admittely this information is part of the DT, but
> >> nothing prevent a guest to change it.
> >>
> >> As side note, we have a workaround in Xen for some buggy DT (see the arch
> >> timer) exposing the wrong trigger type.
> >>
> >> Because of that, I don't really see a way to make optional. Maybe you have
> >> something different in mind?
> > 
> > For level, we need the level parameter. For edge, we are only interested
> > in the "edge", right?
> 
> I don't think so, unless Arm has special restrictions. Edges can be
> both rising and falling ones.

And the same is true for level interrupts too: they could be active-low
or active-high.


Instead of modelling the state of the line, which seems to be a bit
error prone especially in the case of a single-device emulator that
might not have enough information about the rest of the system (it might
not know if the interrupt is active-high or active-low), we could model
the triggering of the interrupt instead.

In the case of level=1, it would mean that the interrupt line is active,
no matter if it is active-low or active-high. In the case of level=0, it
would mean that it is inactive.

Similarly, in the case of an edge interrupt, edge=1 or level=1 would mean
that there is an edge, no matter if it is rising or falling.



* Re: [RFC PATCH V1 01/12] hvm/ioreq: Make x86's IOREQ feature common
  2020-08-07 21:50                 ` Stefano Stabellini
@ 2020-08-07 22:19                   ` Oleksandr
  2020-08-10 13:41                     ` Oleksandr
  2020-08-07 23:45                   ` Oleksandr
  1 sibling, 1 reply; 140+ messages in thread
From: Oleksandr @ 2020-08-07 22:19 UTC (permalink / raw)
  To: Stefano Stabellini
  Cc: 'Kevin Tian', Julien Grall, 'Jun Nakajima',
	'Wei Liu', paul, 'Andrew Cooper',
	'Ian Jackson', 'George Dunlap',
	'Tim Deegan', 'Oleksandr Tyshchenko',
	'Julien Grall',
	Jan Beulich, xen-devel, 'Roger Pau Monné'


On 08.08.20 00:50, Stefano Stabellini wrote:

Hi Stefano

> On Fri, 7 Aug 2020, Oleksandr wrote:
>> On 06.08.20 03:37, Stefano Stabellini wrote:
>>
>> Hi Stefano
>>
>> Trying to simulate IO_RETRY handling mechanism (according to model below) I
>> continuously get IO_RETRY from try_fwd_ioserv() ...
>>
>>> OK, thanks for the details. My interpretation seems to be correct.
>>>
>>> In which case, it looks like xen/arch/arm/io.c:try_fwd_ioserv should
>>> return IO_RETRY. Then, xen/arch/arm/traps.c:do_trap_stage2_abort_guest
>>> also needs to handle try_handle_mmio returning IO_RETRY the first
>>> around, and IO_HANDLED when after QEMU does its job.
>>>
>>> What should do_trap_stage2_abort_guest do on IO_RETRY? Simply return
>>> early and let the scheduler do its job? Something like:
>>>
>>>               enum io_state state = try_handle_mmio(regs, hsr, gpa);
>>>
>>>               switch ( state )
>>>               {
>>>               case IO_ABORT:
>>>                   goto inject_abt;
>>>               case IO_HANDLED:
>>>                   advance_pc(regs, hsr);
>>>                   return;
>>>               case IO_RETRY:
>>>                   /* finish later */
>>>                   return;
>>>               case IO_UNHANDLED:
>>>                   /* IO unhandled, try another way to handle it. */
>>>                   break;
>>>               default:
>>>                   ASSERT_UNREACHABLE();
>>>               }
>>>
>>> Then, xen/arch/arm/ioreq.c:handle_mmio() gets called by
>>> handle_hvm_io_completion() after QEMU completes the emulation. Today,
>>> handle_mmio just sets the user register with the read value.
>>>
>>> But it would be better if it called again the original function
>>> do_trap_stage2_abort_guest to actually retry the original operation.
>>> This time do_trap_stage2_abort_guest calls try_handle_mmio() and gets
>>> IO_HANDLED instead of IO_RETRY,
>> I may miss some important point, but I failed to see why try_handle_mmio
>> (try_fwd_ioserv) will return IO_HANDLED instead of IO_RETRY at this stage.
>> Or current try_fwd_ioserv() logic needs rework?
> I think you should check the ioreq->state in try_fwd_ioserv(), if the
> result is ready, then ioreq->state should be STATE_IORESP_READY, and you
> can return IO_HANDLED.

Indeed! I had just come to the same conclusion when I saw your answer :)

This is a dirty test patch:


---
  xen/arch/arm/io.c               | 12 ++++++++++++
  xen/arch/arm/ioreq.c            | 12 ++++++++++++
  xen/arch/arm/traps.c            |  6 ++++--
  xen/include/asm-arm/hvm/ioreq.h |  2 ++
  xen/include/asm-arm/traps.h     |  3 +++
  5 files changed, 33 insertions(+), 2 deletions(-)

diff --git a/xen/arch/arm/io.c b/xen/arch/arm/io.c
index 436f669..65a08f8 100644
--- a/xen/arch/arm/io.c
+++ b/xen/arch/arm/io.c
@@ -130,6 +130,10 @@ static enum io_state try_fwd_ioserv(struct cpu_user_regs *regs,
      {
      case STATE_IOREQ_NONE:
          break;
+
+    case STATE_IORESP_READY:
+        return IO_HANDLED;
+
      default:
          printk("d%u wrong state %u\n", v->domain->domain_id,
                 vio->io_req.state);
@@ -156,9 +160,11 @@ static enum io_state try_fwd_ioserv(struct cpu_user_regs *regs,
      else
          vio->io_completion = HVMIO_mmio_completion;

+#if 0
      /* XXX: Decide what to do */
      if ( rc == IO_RETRY )
          rc = IO_HANDLED;
+#endif

      return rc;
  }
@@ -185,6 +191,12 @@ enum io_state try_handle_mmio(struct cpu_user_regs *regs,

  #ifdef CONFIG_IOREQ_SERVER
          rc = try_fwd_ioserv(regs, v, &info);
+        if ( rc == IO_HANDLED )
+        {
+            printk("HANDLED %s[%d]\n", __func__, __LINE__);
+            handle_mmio_finish();
+        } else
+            printk("RETRY %s[%d]\n", __func__, __LINE__);
  #endif

          return rc;
diff --git a/xen/arch/arm/ioreq.c b/xen/arch/arm/ioreq.c
index 8f60c41..c8ed454 100644
--- a/xen/arch/arm/ioreq.c
+++ b/xen/arch/arm/ioreq.c
@@ -33,8 +33,20 @@
  #include <public/hvm/dm_op.h>
  #include <public/hvm/ioreq.h>

+#include <asm/traps.h>
+
  bool handle_mmio(void)
  {
+    struct cpu_user_regs *regs = guest_cpu_user_regs();
+    const union hsr hsr = { .bits = regs->hsr };
+
+    do_trap_stage2_abort_guest(regs, hsr);
+
+    return true;
+}
+
+bool handle_mmio_finish(void)
+{
      struct vcpu *v = current;
      struct cpu_user_regs *regs = guest_cpu_user_regs();
      const union hsr hsr = { .bits = regs->hsr };
diff --git a/xen/arch/arm/traps.c b/xen/arch/arm/traps.c
index ea472d1..3493d77 100644
--- a/xen/arch/arm/traps.c
+++ b/xen/arch/arm/traps.c
@@ -1882,7 +1882,7 @@ static bool try_map_mmio(gfn_t gfn)
      return !map_regions_p2mt(d, gfn, 1, mfn, p2m_mmio_direct_c);
  }

-static void do_trap_stage2_abort_guest(struct cpu_user_regs *regs,
+void do_trap_stage2_abort_guest(struct cpu_user_regs *regs,
                                         const union hsr hsr)
  {
      /*
@@ -1965,11 +1965,13 @@ static void do_trap_stage2_abort_guest(struct cpu_user_regs *regs,
              case IO_HANDLED:
                  advance_pc(regs, hsr);
                  return;
+            case IO_RETRY:
+                /* finish later */
+                return;
              case IO_UNHANDLED:
                  /* IO unhandled, try another way to handle it. */
                  break;
              default:
-                /* XXX: Handle IO_RETRY */
                  ASSERT_UNREACHABLE();
              }
          }
diff --git a/xen/include/asm-arm/hvm/ioreq.h b/xen/include/asm-arm/hvm/ioreq.h
index 392ce64..fb4684d 100644
--- a/xen/include/asm-arm/hvm/ioreq.h
+++ b/xen/include/asm-arm/hvm/ioreq.h
@@ -27,6 +27,8 @@

  bool handle_mmio(void);

+bool handle_mmio_finish(void);
+
  static inline bool handle_pio(uint16_t port, unsigned int size, int dir)
  {
      /* XXX */
diff --git a/xen/include/asm-arm/traps.h b/xen/include/asm-arm/traps.h
index 997c378..392fdb1 100644
--- a/xen/include/asm-arm/traps.h
+++ b/xen/include/asm-arm/traps.h
@@ -40,6 +40,9 @@ void advance_pc(struct cpu_user_regs *regs, const union hsr hsr);

  void inject_undef_exception(struct cpu_user_regs *regs, const union hsr hsr);

+void do_trap_stage2_abort_guest(struct cpu_user_regs *regs,
+                                        const union hsr hsr);
+
  /* read as zero and write ignore */
  void handle_raz_wi(struct cpu_user_regs *regs, int regidx, bool read,
                     const union hsr hsr, int min_el);
-- 
2.7.4


>
> That is assuming that you are looking at the live version of the ioreq
> shared with QEMU instead of a private copy of it, which I am not sure.
> Looking at try_fwd_ioserv() it would seem that vio->io_req is just a
> copy? The live version is returned by get_ioreq() ?
>
> Even in handle_hvm_io_completion, instead of setting vio->io_req.state
> to STATE_IORESP_READY by hand, it would be better to look at the live
> version of the ioreq because QEMU will have already set ioreq->state
> to STATE_IORESP_READY (hw/i386/xen/xen-hvm.c:cpu_handle_ioreq).

I need to recheck that.


Thank you.


-- 
Regards,

Oleksandr Tyshchenko




* Re: [RFC PATCH V1 01/12] hvm/ioreq: Make x86's IOREQ feature common
  2020-08-07 21:50                 ` Stefano Stabellini
  2020-08-07 22:19                   ` Oleksandr
@ 2020-08-07 23:45                   ` Oleksandr
  2020-08-10 23:34                     ` Stefano Stabellini
  1 sibling, 1 reply; 140+ messages in thread
From: Oleksandr @ 2020-08-07 23:45 UTC (permalink / raw)
  To: Stefano Stabellini
  Cc: 'Kevin Tian', Julien Grall, 'Jun Nakajima',
	'Wei Liu', paul, 'Andrew Cooper',
	'Ian Jackson', 'George Dunlap',
	'Tim Deegan', 'Oleksandr Tyshchenko',
	'Julien Grall',
	Jan Beulich, xen-devel, 'Roger Pau Monné'


On 08.08.20 00:50, Stefano Stabellini wrote:

Hi

> On Fri, 7 Aug 2020, Oleksandr wrote:
>> On 06.08.20 03:37, Stefano Stabellini wrote:
>>
>> Hi Stefano
>>
>> Trying to simulate IO_RETRY handling mechanism (according to model below) I
>> continuously get IO_RETRY from try_fwd_ioserv() ...
>>
>>> OK, thanks for the details. My interpretation seems to be correct.
>>>
>>> In which case, it looks like xen/arch/arm/io.c:try_fwd_ioserv should
>>> return IO_RETRY. Then, xen/arch/arm/traps.c:do_trap_stage2_abort_guest
>>> also needs to handle try_handle_mmio returning IO_RETRY the first
>>> around, and IO_HANDLED when after QEMU does its job.
>>>
>>> What should do_trap_stage2_abort_guest do on IO_RETRY? Simply return
>>> early and let the scheduler do its job? Something like:
>>>
>>>               enum io_state state = try_handle_mmio(regs, hsr, gpa);
>>>
>>>               switch ( state )
>>>               {
>>>               case IO_ABORT:
>>>                   goto inject_abt;
>>>               case IO_HANDLED:
>>>                   advance_pc(regs, hsr);
>>>                   return;
>>>               case IO_RETRY:
>>>                   /* finish later */
>>>                   return;
>>>               case IO_UNHANDLED:
>>>                   /* IO unhandled, try another way to handle it. */
>>>                   break;
>>>               default:
>>>                   ASSERT_UNREACHABLE();
>>>               }
>>>
>>> Then, xen/arch/arm/ioreq.c:handle_mmio() gets called by
>>> handle_hvm_io_completion() after QEMU completes the emulation. Today,
>>> handle_mmio just sets the user register with the read value.
>>>
>>> But it would be better if it called again the original function
>>> do_trap_stage2_abort_guest to actually retry the original operation.
>>> This time do_trap_stage2_abort_guest calls try_handle_mmio() and gets
>>> IO_HANDLED instead of IO_RETRY,
>> I may miss some important point, but I failed to see why try_handle_mmio
>> (try_fwd_ioserv) will return IO_HANDLED instead of IO_RETRY at this stage.
>> Or current try_fwd_ioserv() logic needs rework?
> I think you should check the ioreq->state in try_fwd_ioserv(), if the
> result is ready, then ioreq->state should be STATE_IORESP_READY, and you
> can return IO_HANDLED.
>
> That is assuming that you are looking at the live version of the ioreq
> shared with QEMU instead of a private copy of it, which I am not sure.
> Looking at try_fwd_ioserv() it would seem that vio->io_req is just a
> copy? The live version is returned by get_ioreq() ?

If I understand the code correctly then, indeed, get_ioreq() returns the 
live version shared with the emulator.
The desired state change (STATE_IORESP_READY), which is what 
hvm_wait_for_io() actually waits for, is set here (in my case):
https://xenbits.xen.org/gitweb/?p=people/pauldu/demu.git;a=blob;f=demu.c;h=f785b394d0cf141dffa05bdddecf338214358aea;hb=refs/heads/master#l698 


> Even in handle_hvm_io_completion, instead of setting vio->io_req.state
> to STATE_IORESP_READY by hand, it would be better to look at the live
> version of the ioreq because QEMU will have already set ioreq->state
> to STATE_IORESP_READY (hw/i386/xen/xen-hvm.c:cpu_handle_ioreq).
It seems that after STATE_IORESP_READY is detected in hvm_wait_for_io() 
the state of the live version is immediately set to STATE_IOREQ_NONE, so 
looking at the live version further down in handle_hvm_io_completion()
or in try_fwd_ioserv() shows us nothing, I am afraid.


-- 
Regards,

Oleksandr Tyshchenko




* Re: [RFC PATCH V1 05/12] hvm/dm: Introduce xendevicemodel_set_irq_level DM op
  2020-08-07 21:50               ` Stefano Stabellini
@ 2020-08-08  9:27                 ` Julien Grall
  2020-08-08  9:28                   ` Julien Grall
  2020-08-10 23:34                   ` Stefano Stabellini
  2020-08-17 15:23                 ` Jan Beulich
  1 sibling, 2 replies; 140+ messages in thread
From: Julien Grall @ 2020-08-08  9:27 UTC (permalink / raw)
  To: Stefano Stabellini
  Cc: Wei Liu, Andrew Cooper, Ian Jackson, George Dunlap,
	Oleksandr Tyshchenko, Oleksandr Tyshchenko, Julien Grall,
	Jan Beulich, xen-devel, Volodymyr Babchuk

On Fri, 7 Aug 2020 at 22:51, Stefano Stabellini <sstabellini@kernel.org> wrote:
>
> On Fri, 7 Aug 2020, Jan Beulich wrote:
> > On 07.08.2020 01:49, Stefano Stabellini wrote:
> > > On Thu, 6 Aug 2020, Julien Grall wrote:
> > >> On 06/08/2020 01:37, Stefano Stabellini wrote:
> > >>> On Wed, 5 Aug 2020, Julien Grall wrote:
> > >>>> On 05/08/2020 00:22, Stefano Stabellini wrote:
> > >>>>> On Mon, 3 Aug 2020, Oleksandr Tyshchenko wrote:
> > >>>>>> From: Oleksandr Tyshchenko <oleksandr_tyshchenko@epam.com>
> > >>>>>>
> > >>>>>> This patch adds ability to the device emulator to notify otherend
> > >>>>>> (some entity running in the guest) using a SPI and implements Arm
> > >>>>>> specific bits for it. Proposed interface allows emulator to set
> > >>>>>> the logical level of a one of a domain's IRQ lines.
> > >>>>>>
> > >>>>>> Please note, this is a split/cleanup of Julien's PoC:
> > >>>>>> "Add support for Guest IO forwarding to a device emulator"
> > >>>>>>
> > >>>>>> Signed-off-by: Julien Grall <julien.grall@arm.com>
> > >>>>>> Signed-off-by: Oleksandr Tyshchenko <oleksandr_tyshchenko@epam.com>
> > >>>>>> ---
> > >>>>>>    tools/libs/devicemodel/core.c                   | 18
> > >>>>>> ++++++++++++++++++
> > >>>>>>    tools/libs/devicemodel/include/xendevicemodel.h |  4 ++++
> > >>>>>>    tools/libs/devicemodel/libxendevicemodel.map    |  1 +
> > >>>>>>    xen/arch/arm/dm.c                               | 22
> > >>>>>> +++++++++++++++++++++-
> > >>>>>>    xen/common/hvm/dm.c                             |  1 +
> > >>>>>>    xen/include/public/hvm/dm_op.h                  | 15
> > >>>>>> +++++++++++++++
> > >>>>>>    6 files changed, 60 insertions(+), 1 deletion(-)
> > >>>>>>
> > >>>>>> diff --git a/tools/libs/devicemodel/core.c
> > >>>>>> b/tools/libs/devicemodel/core.c
> > >>>>>> index 4d40639..30bd79f 100644
> > >>>>>> --- a/tools/libs/devicemodel/core.c
> > >>>>>> +++ b/tools/libs/devicemodel/core.c
> > >>>>>> @@ -430,6 +430,24 @@ int xendevicemodel_set_isa_irq_level(
> > >>>>>>        return xendevicemodel_op(dmod, domid, 1, &op, sizeof(op));
> > >>>>>>    }
> > >>>>>>    +int xendevicemodel_set_irq_level(
> > >>>>>> +    xendevicemodel_handle *dmod, domid_t domid, uint32_t irq,
> > >>>>>> +    unsigned int level)
> > >>>>>
> > >>>>> It is a pity that having xen_dm_op_set_pci_intx_level and
> > >>>>> xen_dm_op_set_isa_irq_level already we need to add a third one, but from
> > >>>>> the names alone I don't think we can reuse either of them.
> > >>>>
> > >>>> The problem is not the name...
> > >>>>
> > >>>>>
> > >>>>> It is very similar to set_isa_irq_level. We could almost rename
> > >>>>> xendevicemodel_set_isa_irq_level to xendevicemodel_set_irq_level or,
> > >>>>> better, just add an alias to it so that xendevicemodel_set_irq_level is
> > >>>>> implemented by calling xendevicemodel_set_isa_irq_level. Honestly I am
> > >>>>> not sure if it is worth doing it though. Any other opinions?
> > >>>>
> > >>>> ... the problem is the interrupt field is only 8-bit. So we would only be
> > >>>> able
> > >>>> to cover IRQ 0 - 255.
> > >>>
> > >>> Argh, that's not going to work :-(  I wasn't sure if it was a good idea
> > >>> anyway.
> > >>>
> > >>>
> > >>>> It is not entirely clear how the existing subop could be extended without
> > >>>> breaking existing callers.
> > >>>>
> > >>>>> But I think we should plan for not needing two calls (one to set level
> > >>>>> to 1, and one to set it to 0):
> > >>>>> https://marc.info/?l=xen-devel&m=159535112027405
> > >>>>
> > >>>> I am not sure to understand your suggestion here? Are you suggesting to
> > >>>> remove
> > >>>> the 'level' parameter?
> > >>>
> > >>> My hope was to make it optional to call the hypercall with level = 0,
> > >>> not necessarily to remove 'level' from the struct.
> > >>
> > >> From my understanding, the hypercall is meant to represent the status of the
> > >> line between the device and the interrupt controller (either low or high).
> > >>
> > >> This is then up to the interrupt controller to decide when the interrupt is
> > >> going to be fired:
> > >>   - For edge interrupt, this will fire when the line move from low to high (or
> > >> vice versa).
> > >>   - For level interrupt, this will fire when line is high (assuming level
> > >> trigger high) and will keeping firing until the device decided to lower the
> > >> line.
> > >>
> > >> For a device, it is common to keep the line high until an OS wrote to a
> > >> specific register.
> > >>
> > >> Furthermore, technically, the guest OS is in charge to configure how an
> > >> interrupt is triggered. Admittely this information is part of the DT, but
> > >> nothing prevent a guest to change it.
> > >>
> > >> As side note, we have a workaround in Xen for some buggy DT (see the arch
> > >> timer) exposing the wrong trigger type.
> > >>
> > >> Because of that, I don't really see a way to make optional. Maybe you have
> > >> something different in mind?
> > >
> > > For level, we need the level parameter. For edge, we are only interested
> > > in the "edge", right?
> >
> > I don't think so, unless Arm has special restrictions. Edges can be
> > both rising and falling ones.
>
> And the same is true for level interrupts too: they could be active-low
> or active-high.
>
>
> Instead of modelling the state of the line, which seems to be a bit
> error prone especially in the case of a single-device emulator that
> might not have enough information about the rest of the system (it might
> not know if the interrupt is active-high or active-low), we could model
> the triggering of the interrupt instead.

I am not sure I understand why the single (or even multiple) device
emulator needs to know the trigger type. The information about the
trigger type of the interrupt would be described in the firmware table
and is expected to be the same as what the emulator expects.

If the guest OS decides to configure the interrupt trigger type
wrongly, then it may not work properly. But, from my understanding,
this doesn't differ from the HW behavior.

>
> In the case of level=1, it would mean that the interrupt line is active,
> no matter if it is active-low or active-high. In the case of level=0, it
> would mean that it is inactive.
>
> Similarly, in the case of an edge interrupt edge=1 or level=1 would mean
> that there is an edge, no matter if it is rising or falling.

TBH, I think your approach is only going to introduce more headaches in
Xen if a guest OS decides to change the trigger type.

It feels much easier to just ask the emulator to let us know the level
of the line. Then if the guest OS decides to change the trigger type,
we only need to resample the line.
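The resampling approach described here can be illustrated with a small standalone model (hypothetical names and types, not the actual Xen vGIC code): the emulator only ever reports the line level, and the virtual interrupt controller derives the pending state from that stored level plus the guest-configured trigger type, so a trigger-type change only requires re-evaluating the stored level.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model -- hypothetical names, not the actual Xen vGIC code. */
enum trigger { TRIGGER_LEVEL_HIGH, TRIGGER_LEVEL_LOW, TRIGGER_EDGE_RISING };

struct vline {
    bool level;          /* last level reported by the emulator */
    enum trigger type;   /* trigger type configured by the guest OS */
    bool pending;        /* whether the vIRQ should be delivered */
};

/* Re-derive the pending state from the stored line level. */
static void resample(struct vline *l)
{
    switch ( l->type )
    {
    case TRIGGER_LEVEL_HIGH: l->pending = l->level;  break;
    case TRIGGER_LEVEL_LOW:  l->pending = !l->level; break;
    default: break; /* edge: pending is latched in set_level() only */
    }
}

/* DM op handler: the emulator tells us the new line level. */
static void set_level(struct vline *l, bool level)
{
    if ( l->type == TRIGGER_EDGE_RISING && !l->level && level )
        l->pending = true; /* latch the rising edge */
    l->level = level;
    resample(l);
}

/* Guest reconfigures the trigger type: only a resample is needed. */
static void set_trigger(struct vline *l, enum trigger type)
{
    l->type = type;
    resample(l);
}
```

This is only a sketch of the idea being discussed, not a proposal for the actual implementation.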

Cheers,


^ permalink raw reply	[flat|nested] 140+ messages in thread

* Re: [RFC PATCH V1 05/12] hvm/dm: Introduce xendevicemodel_set_irq_level DM op
  2020-08-08  9:27                 ` Julien Grall
@ 2020-08-08  9:28                   ` Julien Grall
  2020-08-10 23:34                   ` Stefano Stabellini
  1 sibling, 0 replies; 140+ messages in thread
From: Julien Grall @ 2020-08-08  9:28 UTC (permalink / raw)
  To: Stefano Stabellini
  Cc: Wei Liu, Andrew Cooper, Ian Jackson, George Dunlap,
	Oleksandr Tyshchenko, Oleksandr Tyshchenko, Julien Grall,
	Jan Beulich, xen-devel, Volodymyr Babchuk

On Sat, 8 Aug 2020 at 10:27, Julien Grall <julien.grall.oss@gmail.com> wrote:
>
> On Fri, 7 Aug 2020 at 22:51, Stefano Stabellini <sstabellini@kernel.org> wrote:
> >
> > On Fri, 7 Aug 2020, Jan Beulich wrote:
> > > On 07.08.2020 01:49, Stefano Stabellini wrote:
> > > > On Thu, 6 Aug 2020, Julien Grall wrote:
> > > >> On 06/08/2020 01:37, Stefano Stabellini wrote:
> > > >>> On Wed, 5 Aug 2020, Julien Grall wrote:
> > > >>>> On 05/08/2020 00:22, Stefano Stabellini wrote:
> > > >>>>> On Mon, 3 Aug 2020, Oleksandr Tyshchenko wrote:
> > > >>>>>> From: Oleksandr Tyshchenko <oleksandr_tyshchenko@epam.com>
> > > >>>>>>
> > > >>>>>> This patch adds the ability for the device emulator to notify the
> > > >>>>>> other end (some entity running in the guest) using an SPI and
> > > >>>>>> implements the Arm specific bits for it. The proposed interface
> > > >>>>>> allows the emulator to set the logical level of one of a domain's
> > > >>>>>> IRQ lines.
> > > >>>>>>
> > > >>>>>> Please note, this is a split/cleanup of Julien's PoC:
> > > >>>>>> "Add support for Guest IO forwarding to a device emulator"
> > > >>>>>>
> > > >>>>>> Signed-off-by: Julien Grall <julien.grall@arm.com>
> > > >>>>>> Signed-off-by: Oleksandr Tyshchenko <oleksandr_tyshchenko@epam.com>
> > > >>>>>> ---
> > > >>>>>>    tools/libs/devicemodel/core.c                   | 18
> > > >>>>>> ++++++++++++++++++
> > > >>>>>>    tools/libs/devicemodel/include/xendevicemodel.h |  4 ++++
> > > >>>>>>    tools/libs/devicemodel/libxendevicemodel.map    |  1 +
> > > >>>>>>    xen/arch/arm/dm.c                               | 22
> > > >>>>>> +++++++++++++++++++++-
> > > >>>>>>    xen/common/hvm/dm.c                             |  1 +
> > > >>>>>>    xen/include/public/hvm/dm_op.h                  | 15
> > > >>>>>> +++++++++++++++
> > > >>>>>>    6 files changed, 60 insertions(+), 1 deletion(-)
> > > >>>>>>
> > > >>>>>> diff --git a/tools/libs/devicemodel/core.c
> > > >>>>>> b/tools/libs/devicemodel/core.c
> > > >>>>>> index 4d40639..30bd79f 100644
> > > >>>>>> --- a/tools/libs/devicemodel/core.c
> > > >>>>>> +++ b/tools/libs/devicemodel/core.c
> > > >>>>>> @@ -430,6 +430,24 @@ int xendevicemodel_set_isa_irq_level(
> > > >>>>>>        return xendevicemodel_op(dmod, domid, 1, &op, sizeof(op));
> > > >>>>>>    }
> > > >>>>>>    +int xendevicemodel_set_irq_level(
> > > >>>>>> +    xendevicemodel_handle *dmod, domid_t domid, uint32_t irq,
> > > >>>>>> +    unsigned int level)
> > > >>>>>
> > > >>>>> It is a pity that having xen_dm_op_set_pci_intx_level and
> > > >>>>> xen_dm_op_set_isa_irq_level already we need to add a third one, but from
> > > >>>>> the names alone I don't think we can reuse either of them.
> > > >>>>
> > > >>>> The problem is not the name...
> > > >>>>
> > > >>>>>
> > > >>>>> It is very similar to set_isa_irq_level. We could almost rename
> > > >>>>> xendevicemodel_set_isa_irq_level to xendevicemodel_set_irq_level or,
> > > >>>>> better, just add an alias to it so that xendevicemodel_set_irq_level is
> > > >>>>> implemented by calling xendevicemodel_set_isa_irq_level. Honestly I am
> > > >>>>> not sure if it is worth doing it though. Any other opinions?
> > > >>>>
> > > >>>> ... the problem is the interrupt field is only 8-bit. So we would only be
> > > >>>> able
> > > >>>> to cover IRQ 0 - 255.
> > > >>>
> > > >>> Argh, that's not going to work :-(  I wasn't sure if it was a good idea
> > > >>> anyway.
> > > >>>
> > > >>>
> > > >>>> It is not entirely clear how the existing subop could be extended without
> > > >>>> breaking existing callers.
> > > >>>>
> > > >>>>> But I think we should plan for not needing two calls (one to set level
> > > >>>>> to 1, and one to set it to 0):
> > > >>>>> https://marc.info/?l=xen-devel&m=159535112027405
> > > >>>>
> > > >>>> I am not sure I understand your suggestion here. Are you suggesting
> > > >>>> removing the 'level' parameter?
> > > >>>
> > > >>> My hope was to make it optional to call the hypercall with level = 0,
> > > >>> not necessarily to remove 'level' from the struct.
> > > >>
> > > >> From my understanding, the hypercall is meant to represent the status of the
> > > >> line between the device and the interrupt controller (either low or high).
> > > >>
> > > >> This is then up to the interrupt controller to decide when the interrupt is
> > > >> going to be fired:
> > > >>   - For an edge interrupt, this will fire when the line moves from low
> > > >> to high (or vice versa).
> > > >>   - For a level interrupt, this will fire when the line is high
> > > >> (assuming level trigger high) and will keep firing until the device
> > > >> decides to lower the line.
> > > >>
> > > >> For a device, it is common to keep the line high until the OS writes to
> > > >> a specific register.
> > > >>
> > > >> Furthermore, technically, the guest OS is in charge of configuring how
> > > >> an interrupt is triggered. Admittedly this information is part of the
> > > >> DT, but nothing prevents a guest from changing it.
> > > >>
> > > >> As a side note, we have a workaround in Xen for some buggy DTs (see the
> > > >> arch timer) exposing the wrong trigger type.
> > > >>
> > > >> Because of that, I don't really see a way to make it optional. Maybe
> > > >> you have something different in mind?
> > > >
> > > > For level, we need the level parameter. For edge, we are only interested
> > > > in the "edge", right?
> > >
> > > I don't think so, unless Arm has special restrictions. Edges can be
> > > both rising and falling ones.
> >
> > And the same is true for level interrupts too: they could be active-low
> > or active-high.
> >
> >
> > Instead of modelling the state of the line, which seems to be a bit
> > error prone especially in the case of a single-device emulator that
> > might not have enough information about the rest of the system (it might
> > not know if the interrupt is active-high or active-low), we could model
> > the triggering of the interrupt instead.
>
> I am not sure I understand why the single (or even multiple) device
> emulator needs to know the trigger type. The information about the

I mean trigger type configured by the OS here. Sorry for the confusion.

> trigger type of the interrupt would be described in the firmware table
> and it is expected to be the same as what the emulator expects.
>
> If the guest OS decides to configure the interrupt trigger type
> incorrectly, then it may not work properly. But, from my understanding,
> this doesn't differ from the HW behavior.
>
> >
> > In the case of level=1, it would mean that the interrupt line is active,
> > no matter if it is active-low or active-high. In the case of level=0, it
> > would mean that it is inactive.
> >
> > Similarly, in the case of an edge interrupt edge=1 or level=1 would mean
> > that there is an edge, no matter if it is rising or falling.
>
> TBH, I think your approach is only going to introduce more headaches in
> Xen if a guest OS decides to change the trigger type.
>
> It feels much easier to just ask the emulator to let us know the level
> of the line. Then if the guest OS decides to change the trigger type,
> we only need to resample the line.
>
> Cheers,


^ permalink raw reply	[flat|nested] 140+ messages in thread

* Re: [RFC PATCH V1 01/12] hvm/ioreq: Make x86's IOREQ feature common
  2020-08-07 22:19                   ` Oleksandr
@ 2020-08-10 13:41                     ` Oleksandr
  2020-08-10 23:34                       ` Stefano Stabellini
  0 siblings, 1 reply; 140+ messages in thread
From: Oleksandr @ 2020-08-10 13:41 UTC (permalink / raw)
  To: Stefano Stabellini, Julien Grall
  Cc: 'Kevin Tian', 'Jun Nakajima', 'Wei Liu',
	paul, 'Andrew Cooper', 'Ian Jackson',
	'George Dunlap', 'Tim Deegan',
	'Oleksandr Tyshchenko', 'Julien Grall',
	Jan Beulich, xen-devel, 'Roger Pau Monné'


Hi


On 08.08.20 01:19, Oleksandr wrote:
>
> On 08.08.20 00:50, Stefano Stabellini wrote:
>
> Hi Stefano
>
>> On Fri, 7 Aug 2020, Oleksandr wrote:
>>> On 06.08.20 03:37, Stefano Stabellini wrote:
>>>
>>> Hi Stefano
>>>
>>> Trying to simulate the IO_RETRY handling mechanism (according to the
>>> model below) I continuously get IO_RETRY from try_fwd_ioserv() ...
>>>
>>>> OK, thanks for the details. My interpretation seems to be correct.
>>>>
>>>> In which case, it looks like xen/arch/arm/io.c:try_fwd_ioserv should
>>>> return IO_RETRY. Then, xen/arch/arm/traps.c:do_trap_stage2_abort_guest
>>>> also needs to handle try_handle_mmio returning IO_RETRY the first time
>>>> around, and IO_HANDLED after QEMU does its job.
>>>>
>>>> What should do_trap_stage2_abort_guest do on IO_RETRY? Simply return
>>>> early and let the scheduler do its job? Something like:
>>>>
>>>>               enum io_state state = try_handle_mmio(regs, hsr, gpa);
>>>>
>>>>               switch ( state )
>>>>               {
>>>>               case IO_ABORT:
>>>>                   goto inject_abt;
>>>>               case IO_HANDLED:
>>>>                   advance_pc(regs, hsr);
>>>>                   return;
>>>>               case IO_RETRY:
>>>>                   /* finish later */
>>>>                   return;
>>>>               case IO_UNHANDLED:
>>>>                   /* IO unhandled, try another way to handle it. */
>>>>                   break;
>>>>               default:
>>>>                   ASSERT_UNREACHABLE();
>>>>               }
>>>>
>>>> Then, xen/arch/arm/ioreq.c:handle_mmio() gets called by
>>>> handle_hvm_io_completion() after QEMU completes the emulation. Today,
>>>> handle_mmio just sets the user register with the read value.
>>>>
>>>> But it would be better if it called again the original function
>>>> do_trap_stage2_abort_guest to actually retry the original operation.
>>>> This time do_trap_stage2_abort_guest calls try_handle_mmio() and gets
>>>> IO_HANDLED instead of IO_RETRY,
>>> I may miss some important point, but I failed to see why 
>>> try_handle_mmio
>>> (try_fwd_ioserv) will return IO_HANDLED instead of IO_RETRY at this 
>>> stage.
>>> Or current try_fwd_ioserv() logic needs rework?
>> I think you should check the ioreq->state in try_fwd_ioserv(), if the
>> result is ready, then ioreq->state should be STATE_IORESP_READY, and you
>> can return IO_HANDLED.
>

I optimized the test patch a bit (now it looks much simpler). I didn't
face any issues during a quick test.
Julien, Stefano, do you think it does the proper things for addressing
the TODO? Or did I miss something?


---
  xen/arch/arm/io.c    | 4 ----
  xen/arch/arm/ioreq.c | 7 ++++++-
  xen/arch/arm/traps.c | 4 +++-
  3 files changed, 9 insertions(+), 6 deletions(-)

diff --git a/xen/arch/arm/io.c b/xen/arch/arm/io.c
index 436f669..3063577 100644
--- a/xen/arch/arm/io.c
+++ b/xen/arch/arm/io.c
@@ -156,10 +156,6 @@ static enum io_state try_fwd_ioserv(struct 
cpu_user_regs *regs,
      else
          vio->io_completion = HVMIO_mmio_completion;

-    /* XXX: Decide what to do */
-    if ( rc == IO_RETRY )
-        rc = IO_HANDLED;
-
      return rc;
  }
  #endif
diff --git a/xen/arch/arm/ioreq.c b/xen/arch/arm/ioreq.c
index 8f60c41..e5235c6 100644
--- a/xen/arch/arm/ioreq.c
+++ b/xen/arch/arm/ioreq.c
@@ -33,6 +33,8 @@
  #include <public/hvm/dm_op.h>
  #include <public/hvm/ioreq.h>

+#include <asm/traps.h>
+
  bool handle_mmio(void)
  {
      struct vcpu *v = current;
@@ -52,7 +54,7 @@ bool handle_mmio(void)

      /* XXX: Do we need to take care of write here ? */
      if ( dabt.write )
-        return true;
+        goto done;

      /*
       * Sign extend if required.
@@ -72,6 +74,9 @@ bool handle_mmio(void)

      set_user_reg(regs, dabt.reg, r);

+done:
+    advance_pc(regs, hsr);
+
      return true;
  }

diff --git a/xen/arch/arm/traps.c b/xen/arch/arm/traps.c
index ea472d1..974c744 100644
--- a/xen/arch/arm/traps.c
+++ b/xen/arch/arm/traps.c
@@ -1965,11 +1965,13 @@ static void do_trap_stage2_abort_guest(struct 
cpu_user_regs *regs,
              case IO_HANDLED:
                  advance_pc(regs, hsr);
                  return;
+            case IO_RETRY:
+                /* finish later */
+                return;
              case IO_UNHANDLED:
                  /* IO unhandled, try another way to handle it. */
                  break;
              default:
-                /* XXX: Handle IO_RETRY */
                  ASSERT_UNREACHABLE();
              }
          }
-- 
2.7.4
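For reference, the retry flow the patch above targets can be sketched outside Xen (simplified states and hypothetical helper names; the real logic lives in try_fwd_ioserv() and the IOREQ state machine): the first attempt hands the request to the emulator and returns IO_RETRY, and once the emulator has moved the ioreq to STATE_IORESP_READY the retried attempt can be treated as handled, as Stefano suggested.

```c
#include <assert.h>

/* Simplified states and results -- a sketch, not the Xen definitions. */
enum ioreq_state { STATE_IOREQ_NONE, STATE_IOREQ_READY, STATE_IORESP_READY };
enum io_state    { IO_ABORT, IO_HANDLED, IO_UNHANDLED, IO_RETRY };

struct toy_ioreq { enum ioreq_state state; };

/* Forward the MMIO access to the emulator, or complete it if the
 * emulator has already produced a response. */
static enum io_state toy_try_fwd_ioserv(struct toy_ioreq *p)
{
    if ( p->state == STATE_IORESP_READY )
        return IO_HANDLED;          /* emulation finished, complete it */
    p->state = STATE_IOREQ_READY;   /* hand the request to the emulator */
    return IO_RETRY;                /* vCPU waits until the response */
}
```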


-- 
Regards,

Oleksandr Tyshchenko



^ permalink raw reply related	[flat|nested] 140+ messages in thread

* Re: [RFC PATCH V1 01/12] hvm/ioreq: Make x86's IOREQ feature common
  2020-08-06 11:37     ` Oleksandr
@ 2020-08-10 16:29       ` Julien Grall
  2020-08-10 17:28         ` Oleksandr
  0 siblings, 1 reply; 140+ messages in thread
From: Julien Grall @ 2020-08-10 16:29 UTC (permalink / raw)
  To: Oleksandr, xen-devel
  Cc: Kevin Tian, Stefano Stabellini, Jan Beulich, Wei Liu,
	Paul Durrant, Andrew Cooper, Ian Jackson, George Dunlap,
	Tim Deegan, Oleksandr Tyshchenko, Julien Grall, Jun Nakajima,
	Roger Pau Monné

Hi,

On 06/08/2020 12:37, Oleksandr wrote:
> 
> On 05.08.20 16:30, Julien Grall wrote:
>> Hi,
> 
> Hi Julien
> 
> 
>>
>> On 03/08/2020 19:21, Oleksandr Tyshchenko wrote:
>>> From: Oleksandr Tyshchenko <oleksandr_tyshchenko@epam.com>
>>>
>>> As a lot of x86 code can be re-used on Arm later on, this patch
>>> splits IOREQ support into common and arch specific parts.
>>>
>>> This support is going to be used on Arm to be able to run a device
>>> emulator outside of the Xen hypervisor.
>>>
>>> Please note, this is a split/cleanup of Julien's PoC:
>>> "Add support for Guest IO forwarding to a device emulator"
>>>
>>> Signed-off-by: Julien Grall <julien.grall@arm.com>
>>> Signed-off-by: Oleksandr Tyshchenko <oleksandr_tyshchenko@epam.com>
>>> ---
>>>   xen/arch/x86/Kconfig            |    1 +
>>>   xen/arch/x86/hvm/dm.c           |    2 +-
>>>   xen/arch/x86/hvm/emulate.c      |    2 +-
>>>   xen/arch/x86/hvm/hvm.c          |    2 +-
>>>   xen/arch/x86/hvm/io.c           |    2 +-
>>>   xen/arch/x86/hvm/ioreq.c        | 1431 
>>> +--------------------------------------
>>>   xen/arch/x86/hvm/stdvga.c       |    2 +-
>>>   xen/arch/x86/hvm/vmx/realmode.c |    1 +
>>>   xen/arch/x86/hvm/vmx/vvmx.c     |    2 +-
>>>   xen/arch/x86/mm.c               |    2 +-
>>>   xen/arch/x86/mm/shadow/common.c |    2 +-
>>>   xen/common/Kconfig              |    3 +
>>>   xen/common/Makefile             |    1 +
>>>   xen/common/hvm/Makefile         |    1 +
>>>   xen/common/hvm/ioreq.c          | 1430 
>>> ++++++++++++++++++++++++++++++++++++++
>>>   xen/include/asm-x86/hvm/ioreq.h |   45 +-
>>>   xen/include/asm-x86/hvm/vcpu.h  |    7 -
>>>   xen/include/xen/hvm/ioreq.h     |   89 +++
>>>   18 files changed, 1575 insertions(+), 1450 deletions(-)
>>
>> That's quite a lot of code moved in a single patch. How can we check 
>> the code moved is still correct? Is it a verbatim copy?
> In this patch I mostly tried to separate a common IOREQ part which also 
> resulted in updating x86 sources to include new header.  Also I moved 
> hvm_ioreq_needs_completion() to common header (which probably wanted to 
> be in a separate patch). It was a verbatim copy initially (w/o 
> hvm_map_mem_type_to_ioreq_server) and then updated to deal with arch 
> specific parts.

I would prefer if the two parts were done separately. IOW, the code
movement should be nearly a verbatim copy.

> In which way do you want me to split this patch?
> 
> I could think of the following:
> 
> 1. Copy of x86's ioreq.c/ioreq.h to common code
> 2. Update common ioreq.c/ioreq.h
> 3. Update x86's parts to be able to deal with common code
> 4. Move hvm_ioreq_needs_completion() to common code
> 

Ideally the code movement should be done in the same patch. This helps 
to check the patch is only moving code and also avoids mistakes on rebase.

So my preference would be to first modify the x86 code (e.g. renaming) 
to make it common and then have one patch that will move the code.

Cheers,

-- 
Julien Grall


^ permalink raw reply	[flat|nested] 140+ messages in thread

* Re: [RFC PATCH V1 01/12] hvm/ioreq: Make x86's IOREQ feature common
  2020-08-10 16:29       ` Julien Grall
@ 2020-08-10 17:28         ` Oleksandr
  0 siblings, 0 replies; 140+ messages in thread
From: Oleksandr @ 2020-08-10 17:28 UTC (permalink / raw)
  To: Julien Grall, xen-devel
  Cc: Kevin Tian, Stefano Stabellini, Jan Beulich, Wei Liu,
	Paul Durrant, Andrew Cooper, Ian Jackson, George Dunlap,
	Tim Deegan, Oleksandr Tyshchenko, Julien Grall, Jun Nakajima,
	Roger Pau Monné


On 10.08.20 19:29, Julien Grall wrote:
> Hi,

Hi Julien


>
> On 06/08/2020 12:37, Oleksandr wrote:
>>
>> On 05.08.20 16:30, Julien Grall wrote:
>>> Hi,
>>
>> Hi Julien
>>
>>
>>>
>>> On 03/08/2020 19:21, Oleksandr Tyshchenko wrote:
>>>> From: Oleksandr Tyshchenko <oleksandr_tyshchenko@epam.com>
>>>>
>>>> As a lot of x86 code can be re-used on Arm later on, this patch
>>>> splits IOREQ support into common and arch specific parts.
>>>>
>>>> This support is going to be used on Arm to be able to run a device
>>>> emulator outside of the Xen hypervisor.
>>>>
>>>> Please note, this is a split/cleanup of Julien's PoC:
>>>> "Add support for Guest IO forwarding to a device emulator"
>>>>
>>>> Signed-off-by: Julien Grall <julien.grall@arm.com>
>>>> Signed-off-by: Oleksandr Tyshchenko <oleksandr_tyshchenko@epam.com>
>>>> ---
>>>>   xen/arch/x86/Kconfig            |    1 +
>>>>   xen/arch/x86/hvm/dm.c           |    2 +-
>>>>   xen/arch/x86/hvm/emulate.c      |    2 +-
>>>>   xen/arch/x86/hvm/hvm.c          |    2 +-
>>>>   xen/arch/x86/hvm/io.c           |    2 +-
>>>>   xen/arch/x86/hvm/ioreq.c        | 1431 
>>>> +--------------------------------------
>>>>   xen/arch/x86/hvm/stdvga.c       |    2 +-
>>>>   xen/arch/x86/hvm/vmx/realmode.c |    1 +
>>>>   xen/arch/x86/hvm/vmx/vvmx.c     |    2 +-
>>>>   xen/arch/x86/mm.c               |    2 +-
>>>>   xen/arch/x86/mm/shadow/common.c |    2 +-
>>>>   xen/common/Kconfig              |    3 +
>>>>   xen/common/Makefile             |    1 +
>>>>   xen/common/hvm/Makefile         |    1 +
>>>>   xen/common/hvm/ioreq.c          | 1430 
>>>> ++++++++++++++++++++++++++++++++++++++
>>>>   xen/include/asm-x86/hvm/ioreq.h |   45 +-
>>>>   xen/include/asm-x86/hvm/vcpu.h  |    7 -
>>>>   xen/include/xen/hvm/ioreq.h     |   89 +++
>>>>   18 files changed, 1575 insertions(+), 1450 deletions(-)
>>>
>>> That's quite a lot of code moved in a single patch. How can we check 
>>> the code moved is still correct? Is it a verbatim copy?
>> In this patch I mostly tried to separate a common IOREQ part which 
>> also resulted in updating x86 sources to include new header.  Also I 
>> moved hvm_ioreq_needs_completion() to common header (which probably 
>> wanted to be in a separate patch). It was a verbatim copy initially 
>> (w/o hvm_map_mem_type_to_ioreq_server) and then updated to deal with 
>> arch specific parts.
>
> I would prefer if the two parts were done separately. IOW, the code
> movement should be nearly a verbatim copy.
>
>> In which way do you want me to split this patch?
>>
>> I could think of the following:
>>
>> 1. Copy of x86's ioreq.c/ioreq.h to common code
>> 2. Update common ioreq.c/ioreq.h
>> 3. Update x86's parts to be able to deal with common code
>> 4. Move hvm_ioreq_needs_completion() to common code
>>
>
> Ideally the code movement should be done in the same patch. This helps 
> to check the patch is only moving code and also avoids mistakes on 
> rebase.
>
> So my preference would be to first modify the x86 code (e.g. renaming) 
> to make it common and then have one patch that will move the code.

ok, will try to split accordingly. Thank you


-- 
Regards,

Oleksandr Tyshchenko



^ permalink raw reply	[flat|nested] 140+ messages in thread

* Re: [RFC PATCH V1 04/12] xen/arm: Introduce arch specific bits for IOREQ/DM features
  2020-08-05  9:32     ` Julien Grall
  2020-08-05 15:41       ` Oleksandr
@ 2020-08-10 18:09       ` Oleksandr
  2020-08-10 18:21         ` Oleksandr
  2020-08-10 19:00         ` Julien Grall
  2020-08-11 17:09       ` Oleksandr
  2 siblings, 2 replies; 140+ messages in thread
From: Oleksandr @ 2020-08-10 18:09 UTC (permalink / raw)
  To: Julien Grall
  Cc: Stefano Stabellini, Wei Liu, Andrew Cooper, Ian Jackson,
	George Dunlap, Oleksandr Tyshchenko, Julien Grall, Jan Beulich,
	xen-devel, Daniel De Graaf, Volodymyr Babchuk


On 05.08.20 12:32, Julien Grall wrote:

Hi Julien

>
>>> @@ -2275,6 +2282,16 @@ static void check_for_vcpu_work(void)
>>>    */
>>>   void leave_hypervisor_to_guest(void)
>>>   {
>>> +#ifdef CONFIG_IOREQ_SERVER
>>> +    /*
>>> +     * XXX: Check the return. Shall we call that in
>>> +     * continue_running and context_switch instead?
>>> +     * The benefits would be to avoid calling
>>> +     * handle_hvm_io_completion on every return.
>>> +     */
>>
>> Yeah, that could be a simple and good optimization
>
> Well, it is not as simple as it sounds :). handle_hvm_io_completion()
> is the function in charge of marking the vCPU as waiting for I/O. So we
> would at least need to split the function.
>
> I wrote this TODO because I wasn't sure about the complexity of 
> handle_hvm_io_completion(current). Looking at it again, the main 
> complexity is the looping over the IOREQ servers.
>
> I think it would be better to optimize handle_hvm_io_completion() 
> rather than trying to hack the context_switch() or continue_running().
Well, is the idea in the proposed dirty test patch below close to what
you expect? The patch optimizes handle_hvm_io_completion() to avoid
extra actions if the vcpu's domain doesn't have an ioreq_server;
alternatively the check could be moved out of handle_hvm_io_completion()
to avoid calling that function at all. BTW, the TODO also suggests
checking the return value of handle_hvm_io_completion(), but I am
completely sure we can simply just return from
leave_hypervisor_to_guest() at this point. Could you please share your
opinion?


---
  xen/common/hvm/ioreq.c       | 12 +++++++++++-
  xen/include/asm-arm/domain.h |  1 +
  xen/include/xen/hvm/ioreq.h  |  5 +++++
  3 files changed, 17 insertions(+), 1 deletion(-)

diff --git a/xen/common/hvm/ioreq.c b/xen/common/hvm/ioreq.c
index 7e1fa23..dc6647a 100644
--- a/xen/common/hvm/ioreq.c
+++ b/xen/common/hvm/ioreq.c
@@ -38,9 +38,15 @@ static void set_ioreq_server(struct domain *d, 
unsigned int id,
                               struct hvm_ioreq_server *s)
  {
      ASSERT(id < MAX_NR_IOREQ_SERVERS);
-    ASSERT(!s || !d->arch.hvm.ioreq_server.server[id]);
+    ASSERT((!s && d->arch.hvm.ioreq_server.server[id]) ||
+           (s && !d->arch.hvm.ioreq_server.server[id]));

      d->arch.hvm.ioreq_server.server[id] = s;
+
+    if ( s )
+        d->arch.hvm.ioreq_server.nr_servers ++;
+    else
+        d->arch.hvm.ioreq_server.nr_servers --;
  }

  /*
@@ -169,6 +175,9 @@ bool handle_hvm_io_completion(struct vcpu *v)
          return false;
      }

+    if ( !hvm_domain_has_ioreq_server(d) )
+        return true;
+
      FOR_EACH_IOREQ_SERVER(d, id, s)
      {
          struct hvm_ioreq_vcpu *sv;
@@ -1415,6 +1424,7 @@ unsigned int hvm_broadcast_ioreq(ioreq_t *p, bool 
buffered)
  void hvm_ioreq_init(struct domain *d)
  {
      spin_lock_init(&d->arch.hvm.ioreq_server.lock);
+    d->arch.hvm.ioreq_server.nr_servers = 0;

      arch_hvm_ioreq_init(d);
  }
diff --git a/xen/include/asm-arm/domain.h b/xen/include/asm-arm/domain.h
index 6a01d69..484bd1a 100644
--- a/xen/include/asm-arm/domain.h
+++ b/xen/include/asm-arm/domain.h
@@ -68,6 +68,7 @@ struct hvm_domain
      struct {
          spinlock_t              lock;
          struct hvm_ioreq_server *server[MAX_NR_IOREQ_SERVERS];
+        unsigned int            nr_servers;
      } ioreq_server;

      bool_t qemu_mapcache_invalidate;
diff --git a/xen/include/xen/hvm/ioreq.h b/xen/include/xen/hvm/ioreq.h
index 40b7b5e..8f78852 100644
--- a/xen/include/xen/hvm/ioreq.h
+++ b/xen/include/xen/hvm/ioreq.h
@@ -23,6 +23,11 @@

  #include <asm/hvm/ioreq.h>

+static inline bool hvm_domain_has_ioreq_server(const struct domain *d)
+{
+    return (d->arch.hvm.ioreq_server.nr_servers > 0);
+}
+
  #define GET_IOREQ_SERVER(d, id) \
      (d)->arch.hvm.ioreq_server.server[id]

-- 
2.7.4
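The bookkeeping added to set_ioreq_server() in the patch above can be exercised in isolation with simplified stand-in types (a sketch, not the real Xen structures): registering a server bumps the counter, unregistering drops it, and hvm_domain_has_ioreq_server() becomes a cheap check.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_NR_IOREQ_SERVERS 8

/* Simplified stand-ins for the Xen structures. */
struct toy_server { int dummy; };
struct toy_domain {
    struct toy_server *server[MAX_NR_IOREQ_SERVERS];
    unsigned int nr_servers;
};

/* Mirrors the patched set_ioreq_server(): registering bumps the
 * counter, unregistering drops it; double-set/clear is rejected. */
static void toy_set_ioreq_server(struct toy_domain *d, unsigned int id,
                                 struct toy_server *s)
{
    assert(id < MAX_NR_IOREQ_SERVERS);
    assert((!s && d->server[id]) || (s && !d->server[id]));

    d->server[id] = s;

    if ( s )
        d->nr_servers++;
    else
        d->nr_servers--;
}

/* Mirrors hvm_domain_has_ioreq_server() from the patch. */
static int toy_domain_has_ioreq_server(const struct toy_domain *d)
{
    return d->nr_servers > 0;
}
```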





-- 
Regards,

Oleksandr Tyshchenko



^ permalink raw reply related	[flat|nested] 140+ messages in thread

* Re: [RFC PATCH V1 04/12] xen/arm: Introduce arch specific bits for IOREQ/DM features
  2020-08-10 18:09       ` Oleksandr
@ 2020-08-10 18:21         ` Oleksandr
  2020-08-10 19:00         ` Julien Grall
  1 sibling, 0 replies; 140+ messages in thread
From: Oleksandr @ 2020-08-10 18:21 UTC (permalink / raw)
  To: Julien Grall
  Cc: Stefano Stabellini, Wei Liu, Andrew Cooper, Ian Jackson,
	George Dunlap, Oleksandr Tyshchenko, Julien Grall, Jan Beulich,
	xen-devel, Daniel De Graaf, Volodymyr Babchuk


Hi


>
>>
>>>> @@ -2275,6 +2282,16 @@ static void check_for_vcpu_work(void)
>>>>    */
>>>>   void leave_hypervisor_to_guest(void)
>>>>   {
>>>> +#ifdef CONFIG_IOREQ_SERVER
>>>> +    /*
>>>> +     * XXX: Check the return. Shall we call that in
>>>> +     * continue_running and context_switch instead?
>>>> +     * The benefits would be to avoid calling
>>>> +     * handle_hvm_io_completion on every return.
>>>> +     */
>>>
>>> Yeah, that could be a simple and good optimization
>>
>> Well, it is not as simple as it sounds :). handle_hvm_io_completion()
>> is the function in charge of marking the vCPU as waiting for I/O. So we
>> would at least need to split the function.
>>
>> I wrote this TODO because I wasn't sure about the complexity of 
>> handle_hvm_io_completion(current). Looking at it again, the main 
>> complexity is the looping over the IOREQ servers.
>>
>> I think it would be better to optimize handle_hvm_io_completion() 
>> rather than trying to hack the context_switch() or continue_running().
> Well, is the idea in the proposed dirty test patch below close to what
> you expect? The patch optimizes handle_hvm_io_completion() to avoid
> extra actions if the vcpu's domain doesn't have an ioreq_server;
> alternatively the check could be moved out of handle_hvm_io_completion()
> to avoid calling that function at all. BTW, the TODO also suggests
> checking the return value of handle_hvm_io_completion(), but I am
> completely sure we can simply just return from
> leave_hypervisor_to_guest() at this point.

Sorry, I made a mistake in the last sentence:
s/I am completely sure/I am not completely sure/


-- 
Regards,

Oleksandr Tyshchenko



^ permalink raw reply	[flat|nested] 140+ messages in thread

* Re: [RFC PATCH V1 04/12] xen/arm: Introduce arch specific bits for IOREQ/DM features
  2020-08-06 13:27         ` Oleksandr
@ 2020-08-10 18:25           ` Julien Grall
  2020-08-10 19:58             ` Oleksandr
  0 siblings, 1 reply; 140+ messages in thread
From: Julien Grall @ 2020-08-10 18:25 UTC (permalink / raw)
  To: Oleksandr, xen-devel
  Cc: Stefano Stabellini, Wei Liu, Andrew Cooper, Ian Jackson,
	George Dunlap, Oleksandr Tyshchenko, Julien Grall, Jan Beulich,
	Daniel De Graaf, Volodymyr Babchuk



On 06/08/2020 14:27, Oleksandr wrote:
> 
> On 06.08.20 14:08, Julien Grall wrote:
> 
> Hi Julien
> 
>>
>>>> What is this function supposed to do?
>>> Agreed, it sounds a bit confusing. I assume it is supposed to complete
>>> the guest MMIO access after finishing the emulation.
>>>
>>> Shall I rename it to something appropriate (maybe by adding ioreq 
>>> prefix)?
>>
>> How about ioreq_handle_complete_mmio()?
> 
> For me it sounds fine.
> 
> 
> 
>>
>>>>> diff --git a/xen/common/memory.c b/xen/common/memory.c
>>>>> index 9283e5e..0000477 100644
>>>>> --- a/xen/common/memory.c
>>>>> +++ b/xen/common/memory.c
>>>>> @@ -8,6 +8,7 @@
>>>>>    */
>>>>>     #include <xen/domain_page.h>
>>>>> +#include <xen/hvm/ioreq.h>
>>>>>   #include <xen/types.h>
>>>>>   #include <xen/lib.h>
>>>>>   #include <xen/mm.h>
>>>>> @@ -30,10 +31,6 @@
>>>>>   #include <public/memory.h>
>>>>>   #include <xsm/xsm.h>
>>>>>   -#ifdef CONFIG_IOREQ_SERVER
>>>>> -#include <xen/hvm/ioreq.h>
>>>>> -#endif
>>>>> -
>>>>
>>>> Why do you remove something your just introduced?
>>> The reason I guarded that header is to make "xen/mm: Make x86's 
>>> XENMEM_resource_ioreq_server handling common" (previous) patch 
>>> buildable on Arm
>>> without arch IOREQ header added yet. I tried to make sure that the 
>>> result after each patch was buildable to retain bisectability.
>>> As current patch adds Arm IOREQ specific bits (including header), 
>>> that guard could be removed as not needed anymore.
>> I agree we want to have the build bisectable. However, I am still 
>> puzzled why it is necessary to remove the #ifdef and move it earlier 
>> in the list.
>>
>> Do you mind to provide more details?
> The previous patch "xen/mm: Make x86's XENMEM_resource_ioreq_server
> handling common" breaks the build on Arm as it includes xen/hvm/ioreq.h,
> which requires the arch header (asm/hvm/ioreq.h) to be present. But the
> missing arch header, together with other arch specific bits, is
> introduced here in the current patch.

I understand that both Arm and x86 now implement asm/hvm/ioreq.h.
However, please keep in mind that there might be other architectures in
the future.

With your change here, you would force a new arch to implement
asm/hvm/ioreq.h even if the developer has no plan to use the feature.

> Probably I should have rearranged the changes in a way that didn't
> introduce the #ifdef and then remove it...

Ideally we want to avoid #ifdef in the common code. But if this can't be
done in a header, then the #ifdef here would be fine.

> 
> 
>>
>> [...]
>>
>>>>> +
>>>>> +bool handle_mmio(void);
>>>>> +
>>>>> +static inline bool handle_pio(uint16_t port, unsigned int size, 
>>>>> int dir)
>>>>> +{
>>>>> +    /* XXX */
>>>>
>>>> Can you expand this TODO? What do you expect to do?
>>> I didn't expect this to be called on Arm. Sorry, I am not sure I have
>>> an idea how to handle this properly. I would keep it unimplemented
>>> until there is a real need.
>>> Will expand the TODO.
>>
>> Let see how the conversation on patch#1 goes about PIO vs MMIO.
> 
> ok
> 
> 
>>
>>>>
>>>>
>>>>> +    BUG();
>>>>> +    return true;
>>>>> +}
>>>>> +
>>>>> +static inline paddr_t hvm_mmio_first_byte(const ioreq_t *p)
>>>>> +{
>>>>> +    return p->addr;
>>>>> +}
>>>>
>>>> I understand that the x86 version is more complex as it check p->df. 
>>>> However, aside reducing the complexity, I am not sure why we would 
>>>> want to diverge it.
>>>>
>>>> After all, IOREQ is now meant to be a common feature.
>>> Well, no objections at all.
>>> Could you please clarify how 'df' (Direction Flag?) could be
>>> handled/used on Arm?
>>
>> On x86, this is used by the 'rep' instruction to tell the direction to
>> iterate (forward or backward).
>>
>> On Arm, all the accesses to MMIO region will do a single memory 
>> access. So for now, we can safely always set to 0.
>>
>>> I see that try_fwd_ioserv() always sets it to 0. Or do I need to just
>>> reuse x86's helpers as is, which (together with count = df = 0) will
>>> result in what we actually have here?
>> AFAIU, both count and df should be 0 on Arm.
> 
> Thanks for the explanation. The only remaining question is where to put 
> the common helpers hvm_mmio_first_byte/hvm_mmio_last_byte (common io.h?)?

It feels to me it should be part of the common ioreq.h.

Cheers,

-- 
Julien Grall


^ permalink raw reply	[flat|nested] 140+ messages in thread

* Re: [RFC PATCH V1 04/12] xen/arm: Introduce arch specific bits for IOREQ/DM features
  2020-08-10 18:09       ` Oleksandr
  2020-08-10 18:21         ` Oleksandr
@ 2020-08-10 19:00         ` Julien Grall
  2020-08-10 20:29           ` Oleksandr
  1 sibling, 1 reply; 140+ messages in thread
From: Julien Grall @ 2020-08-10 19:00 UTC (permalink / raw)
  To: Oleksandr
  Cc: Stefano Stabellini, Wei Liu, Andrew Cooper, Ian Jackson,
	George Dunlap, Oleksandr Tyshchenko, Julien Grall, Jan Beulich,
	xen-devel, Daniel De Graaf, Volodymyr Babchuk



On 10/08/2020 19:09, Oleksandr wrote:
> 
> On 05.08.20 12:32, Julien Grall wrote:
> 
> Hi Julien
> 
>>
>>>> @@ -2275,6 +2282,16 @@ static void check_for_vcpu_work(void)
>>>>    */
>>>>   void leave_hypervisor_to_guest(void)
>>>>   {
>>>> +#ifdef CONFIG_IOREQ_SERVER
>>>> +    /*
>>>> +     * XXX: Check the return. Shall we call that in
>>>> +     * continue_running and context_switch instead?
>>>> +     * The benefits would be to avoid calling
>>>> +     * handle_hvm_io_completion on every return.
>>>> +     */
>>>
>>> Yeah, that could be a simple and good optimization
>>
>> Well, it is not as simple as it sounds :). handle_hvm_io_completion() 
>> is the function in charge to mark the vCPU as waiting for I/O. So we 
>> would at least need to split the function.
>>
>> I wrote this TODO because I wasn't sure about the complexity of 
>> handle_hvm_io_completion(current). Looking at it again, the main 
>> complexity is the looping over the IOREQ servers.
>>
>> I think it would be better to optimize handle_hvm_io_completion() 
>> rather than trying to hack the context_switch() or continue_running().
> Well, is the idea in the proposed dirty test patch below close to what you 
> expect? The patch optimizes handle_hvm_io_completion() to avoid extra 
> actions if the vcpu's domain doesn't have an ioreq server; alternatively
> the check could be moved out of handle_hvm_io_completion() to avoid 
> calling that function at all.

This looks ok to me.

> BTW, the TODO also suggests checking the 
> return value of handle_hvm_io_completion(), but I am not completely sure 
> we can simply
> just return from leave_hypervisor_to_guest() at this point. Could you 
> please share your opinion?

From my understanding, handle_hvm_io_completion() may return false if 
there is pending I/O or a failure.

In the former case, I think we want to call handle_hvm_io_completion() 
later on. Possibly after we call do_softirq().

I am wondering whether check_for_vcpu_work() could return whether there 
is more work to do on behalf of the vCPU.

So we could have:

do
{
   check_for_pcpu_work();
} while (check_for_vcpu_work())

The implementation of check_for_vcpu_work() would be:

if ( !handle_hvm_io_completion() )
   return true;

/* Rest of the existing code */

return false;

Cheers,

-- 
Julien Grall



* Re: [RFC PATCH V1 01/12] hvm/ioreq: Make x86's IOREQ feature common
  2020-08-06 23:48                 ` Stefano Stabellini
@ 2020-08-10 19:20                   ` Julien Grall
  2020-08-10 23:34                     ` Stefano Stabellini
  0 siblings, 1 reply; 140+ messages in thread
From: Julien Grall @ 2020-08-10 19:20 UTC (permalink / raw)
  To: Stefano Stabellini
  Cc: 'Kevin Tian', 'Jun Nakajima', 'Wei Liu',
	paul, 'Andrew Cooper', 'Ian Jackson',
	'George Dunlap', 'Tim Deegan',
	Oleksandr, 'Oleksandr Tyshchenko', 'Julien Grall',
	'Jan Beulich', xen-devel, 'Roger Pau Monné'



On 07/08/2020 00:48, Stefano Stabellini wrote:
> On Thu, 6 Aug 2020, Julien Grall wrote:
>> On 06/08/2020 01:37, Stefano Stabellini wrote:
>>> On Wed, 5 Aug 2020, Julien Grall wrote:
>>>> On 04/08/2020 20:11, Stefano Stabellini wrote:
>>>>> On Tue, 4 Aug 2020, Julien Grall wrote:
>>>>>> On 04/08/2020 12:10, Oleksandr wrote:
>>>>>>> On 04.08.20 10:45, Paul Durrant wrote:
>>>>>>>>> +static inline bool hvm_ioreq_needs_completion(const ioreq_t
>>>>>>>>> *ioreq)
>>>>>>>>> +{
>>>>>>>>> +    return ioreq->state == STATE_IOREQ_READY &&
>>>>>>>>> +           !ioreq->data_is_ptr &&
>>>>>>>>> +           (ioreq->type != IOREQ_TYPE_PIO || ioreq->dir !=
>>>>>>>>> IOREQ_WRITE);
>>>>>>>>> +}
>>>>>>>> I don't think having this in common code is correct. The short-cut
>>>>>>>> of not completing PIO reads seems somewhat x86 specific.
>>>>>>
>>>>>> Hmmm, looking at the code, I think it doesn't wait for PIO writes to
>>>>>> complete (not reads). Did I miss anything?
>>>>>>
>>>>>>>> Does ARM even
>>>>>>>> have the concept of PIO?
>>>>>>>
>>>>>>> I am not 100% sure here, but it seems that it doesn't.
>>>>>>
>>>>>> Technically, the PIOs exist on Arm; however, they are accessed the
>>>>>> same way as MMIO and will have a dedicated area defined by the HW.
>>>>>>
>>>>>> AFAICT, on Arm64, they are only used for PCI IO Bar.
>>>>>>
>>>>>> Now the question is whether we want to expose them to the Device
>>>>>> Emulator as PIO or MMIO access. From a generic PoV, a DM shouldn't
>>>>>> have to care about the architecture used. It should just be able to
>>>>>> request a given IOport region.
>>>>>>
>>>>>> So it may make sense to differentiate them in the common ioreq code as
>>>>>> well.
>>>>>>
>>>>>> I had a quick look at QEMU and wasn't able to tell if the PIO and
>>>>>> MMIO address spaces are different on Arm as well. Paul, Stefano, do
>>>>>> you know what they are doing?
>>>>>
>>>>> On the QEMU side, it looks like PIO (address_space_io) is used in
>>>>> connection with the emulation of the "in" or "out" instructions, see
>>>>> ioport.c:cpu_inb for instance. Some parts of PCI on QEMU emulate PIO
>>>>> space regardless of the architecture, such as
>>>>> hw/pci/pci_bridge.c:pci_bridge_initfn.
>>>>>
>>>>> However, because there is no "in" and "out" on ARM, I don't think
>>>>> address_space_io can be accessed. Specifically, there is no equivalent
>>>>> for target/i386/misc_helper.c:helper_inb on ARM.
>>>>
>>>> So how are PCI I/O BARs accessed? Surely, they could be used on Arm, right?
>>>
>>> PIO is also memory mapped on ARM and it seems to have its own MMIO
>>> address window.
>> This part is already well-understood :). However, this only tells us how an OS
>> is accessing a PIO.
>>
>> What I am trying to figure out is how the hardware (or QEMU) is meant to work.
>>
>>  From my understanding, the MMIO access will be received by the hostbridge and
>> then forwarded to the appropriate PCI device. The two questions I am trying to
>> answer are: how are the I/O BARs configured? Will they contain an MMIO address
>> or an offset?
>>
>> If the answer is the latter, then we will need PIO because a DM will never see
>> the MMIO address (the hostbridge will be emulated in Xen).
> 
> Now I understand the question :-)
> 
> This is the way I understand it works. Let's say that the PIO aperture
> is 0x1000-0x2000 which is aliased to 0x3eff0000-0x3eff1000.
> 0x1000-0x2000 are addresses that cannot be accessed directly.
> 0x3eff0000-0x3eff1000 is the range that works.
> 
> A PCI device PIO BAR will have an address in the 0x1000-0x2000 range,
> for instance 0x1100.
> 
> However, when the operating system accesses 0x1100, it will issue a read
> to 0x3eff0100.
> 
> Xen will trap the read to 0x3eff0100 and send it to QEMU.
> 
> QEMU has to know that 0x3eff0000-0x3eff1000 is the alias to the PIO
> aperture and that 0x3eff0100 corresponds to PCI device foobar. Similarly,
> QEMU has also to know the address range of the MMIO aperture and its
> remappings, if any (it is possible to have address remapping for MMIO
> addresses too.)
> 
> I think today this information is "built-in" QEMU, not configurable. It
> works fine because *I think* the PCI aperture is pretty much the same on
> x86 boards, at least the one supported by QEMU for Xen.

Well, on x86, the OS will access PIO using inb/outb. So the address 
received by Xen is in the 0x1000-0x2000 range and is then forwarded to 
the DM using the PIO type.

> 
> On ARM, I think we should explicitly declare the PCI MMIO aperture and
> its alias/address-remapping. When we do that, we can also declare the
> PIO aperture and its alias/address-remapping.

Well yes, we need to define PCI MMIO and PCI I/O region because the 
guest OS needs to know them.

However, I am unsure how this would help us to solve the question 
whether access to the PCI I/O aperture should be sent as a PIO or MMIO.

Per what you wrote, the PCI I/O BAR would be configured with the range 
0x1000-0x2000. So a device emulator (which may not be QEMU and may only 
emulate one PCI device!) will only see that range.

How does the device-emulator then know that it needs to watch the region 
0x3eff0000-0x3eff1000?

It feels to me that it would be easier/make more sense if the DM only 
said "I want to watch the PIO range 0x1000-0x2000". Xen would then be in 
charge of doing the translation between the OS view and the DM view.

This also means a DM would be completely arch-agnostic. This would 
follow the HW where you can plug your PCI card on any HW.

Cheers,

-- 
Julien Grall



* Re: [RFC PATCH V1 04/12] xen/arm: Introduce arch specific bits for IOREQ/DM features
  2020-08-10 18:25           ` Julien Grall
@ 2020-08-10 19:58             ` Oleksandr
  0 siblings, 0 replies; 140+ messages in thread
From: Oleksandr @ 2020-08-10 19:58 UTC (permalink / raw)
  To: Julien Grall
  Cc: Stefano Stabellini, Wei Liu, Andrew Cooper, Ian Jackson,
	George Dunlap, Oleksandr Tyshchenko, Julien Grall, Jan Beulich,
	xen-devel, Daniel De Graaf, Volodymyr Babchuk


On 10.08.20 21:25, Julien Grall wrote:

Hi Julien

>
>>>
>>> Do you mind providing more details?
>> The previous patch "xen/mm: Make x86's XENMEM_resource_ioreq_server 
>> handling common" breaks the build on Arm as it includes xen/hvm/ioreq.h 
>> which requires the arch header
>> to be present (asm/hvm/ioreq.h). But the missing arch header, together 
>> with other arch specific bits, is introduced here in the current patch. 
>
> I understand that both Arm and x86 now implement the asm/hvm/ioreq.h.
> However, please keep in mind that there might be other architectures 
> in the future.
>
> With your change here, you would force a new arch to implement 
> asm/hvm/ioreq.h even if the developer has no plan to use the feature.
>
>> Probably I should have rearranged
>> changes in a way to not introduce #ifdef and then remove it...
>
> Ideally we want to avoid #ifdef in the common code. But if this can't 
> be done in a header, then the #ifdef here would be fine.

Got it.


>
>>>>> I understand that the x86 version is more complex as it checks 
>>>>> p->df. However, aside from reducing the complexity, I am not sure 
>>>>> why we would want to diverge from it.
>>>>>
>>>>> After all, IOREQ is now meant to be a common feature.
>>>> Well, no objections at all.
>>>> Could you please clarify how 'df' (Direction Flag?) could be 
>>>> handled/used on Arm?
>>>
>>> On x86, this is used by the 'rep' instruction to tell the direction to 
>>> iterate (forward or backward).
>>>
>>> On Arm, all the accesses to an MMIO region will do a single memory 
>>> access. So for now, we can safely always set it to 0.
>>>
>>>> I see that try_fwd_ioserv() always sets it to 0. Or I need to just 
>>>> reuse x86's helpers as is,
>>>> which (together with count = df = 0) will result in what we 
>>>> actually have here?
>>> AFAIU, both count and df should be 0 on Arm.
>>
>> Thanks for the explanation. The only remaining question is where to 
>> put the common helpers hvm_mmio_first_byte/hvm_mmio_last_byte (common 
>> io.h?)?
>
> It feels to me it should be part of the common ioreq.h.

ok, will move.


-- 
Regards,

Oleksandr Tyshchenko




* Re: [RFC PATCH V1 04/12] xen/arm: Introduce arch specific bits for IOREQ/DM features
  2020-08-10 19:00         ` Julien Grall
@ 2020-08-10 20:29           ` Oleksandr
  2020-08-10 22:37             ` Julien Grall
  0 siblings, 1 reply; 140+ messages in thread
From: Oleksandr @ 2020-08-10 20:29 UTC (permalink / raw)
  To: Julien Grall
  Cc: Stefano Stabellini, Wei Liu, Andrew Cooper, Ian Jackson,
	George Dunlap, Oleksandr Tyshchenko, Julien Grall, Jan Beulich,
	xen-devel, Daniel De Graaf, Volodymyr Babchuk


On 10.08.20 22:00, Julien Grall wrote:

Hi Julien

>
>>>
>>>>> @@ -2275,6 +2282,16 @@ static void check_for_vcpu_work(void)
>>>>>    */
>>>>>   void leave_hypervisor_to_guest(void)
>>>>>   {
>>>>> +#ifdef CONFIG_IOREQ_SERVER
>>>>> +    /*
>>>>> +     * XXX: Check the return. Shall we call that in
>>>>> +     * continue_running and context_switch instead?
>>>>> +     * The benefits would be to avoid calling
>>>>> +     * handle_hvm_io_completion on every return.
>>>>> +     */
>>>>
>>>> Yeah, that could be a simple and good optimization
>>>
>>> Well, it is not as simple as it sounds :). 
>>> handle_hvm_io_completion() is the function in charge to mark the 
>>> vCPU as waiting for I/O. So we would at least need to split the 
>>> function.
>>>
>>> I wrote this TODO because I wasn't sure about the complexity of 
>>> handle_hvm_io_completion(current). Looking at it again, the main 
>>> complexity is the looping over the IOREQ servers.
>>>
>>> I think it would be better to optimize handle_hvm_io_completion() 
>>> rather than trying to hack the context_switch() or continue_running().
>> Well, is the idea in the proposed dirty test patch below close to what 
>> you expect? The patch optimizes handle_hvm_io_completion() to avoid extra 
>> actions if the vcpu's domain doesn't have an ioreq server; alternatively
>> the check could be moved out of handle_hvm_io_completion() to avoid 
>> calling that function at all.
>
> This looks ok to me.
>
>> BTW, the TODO also suggests checking the return value of 
>> handle_hvm_io_completion(), but I am not completely sure we can simply
>> just return from leave_hypervisor_to_guest() at this point. Could you 
>> please share your opinion?
>
> From my understanding, handle_hvm_io_completion() may return false if 
> there is pending I/O or a failure.

It seems, yes


>
> In the former case, I think we want to call handle_hvm_io_completion() 
> later on. Possibly after we call do_softirq().
>
> I am wondering whether check_for_vcpu_work() could return whether 
> there is more work to do on behalf of the vCPU.
>
> So we could have:
>
> do
> {
>   check_for_pcpu_work();
> } while (check_for_vcpu_work())
>
> The implementation of check_for_vcpu_work() would be:
>
> if ( !handle_hvm_io_completion() )
>   return true;
>
> /* Rest of the existing code */
>
> return false;

Thank you, will give it a try.

Can we behave the same way for both "pending I/O" and "failure" or do we 
need to distinguish them?

Probably we need some sort of safe timeout/number of attempts in order 
not to spin forever?


-- 
Regards,

Oleksandr Tyshchenko




* Re: [RFC PATCH V1 04/12] xen/arm: Introduce arch specific bits for IOREQ/DM features
  2020-08-10 20:29           ` Oleksandr
@ 2020-08-10 22:37             ` Julien Grall
  2020-08-11  6:13               ` Oleksandr
  0 siblings, 1 reply; 140+ messages in thread
From: Julien Grall @ 2020-08-10 22:37 UTC (permalink / raw)
  To: Oleksandr
  Cc: Stefano Stabellini, Wei Liu, Andrew Cooper, Ian Jackson,
	George Dunlap, Oleksandr Tyshchenko, Julien Grall, Jan Beulich,
	xen-devel, Daniel De Graaf, Volodymyr Babchuk

On Mon, 10 Aug 2020 at 21:29, Oleksandr <olekstysh@gmail.com> wrote:
>
>
> On 10.08.20 22:00, Julien Grall wrote:
>
> Hi Julien
>
> >
> >>>
> >>>>> @@ -2275,6 +2282,16 @@ static void check_for_vcpu_work(void)
> >>>>>    */
> >>>>>   void leave_hypervisor_to_guest(void)
> >>>>>   {
> >>>>> +#ifdef CONFIG_IOREQ_SERVER
> >>>>> +    /*
> >>>>> +     * XXX: Check the return. Shall we call that in
> >>>>> +     * continue_running and context_switch instead?
> >>>>> +     * The benefits would be to avoid calling
> >>>>> +     * handle_hvm_io_completion on every return.
> >>>>> +     */
> >>>>
> >>>> Yeah, that could be a simple and good optimization
> >>>
> >>> Well, it is not as simple as it sounds :).
> >>> handle_hvm_io_completion() is the function in charge to mark the
> >>> vCPU as waiting for I/O. So we would at least need to split the
> >>> function.
> >>>
> >>> I wrote this TODO because I wasn't sure about the complexity of
> >>> handle_hvm_io_completion(current). Looking at it again, the main
> >>> complexity is the looping over the IOREQ servers.
> >>>
> >>> I think it would be better to optimize handle_hvm_io_completion()
> >>> rather than trying to hack the context_switch() or continue_running().
> >> Well, is the idea in the proposed dirty test patch below close to what
> >> you expect? The patch optimizes handle_hvm_io_completion() to avoid extra
> >> actions if the vcpu's domain doesn't have an ioreq server; alternatively
> >> the check could be moved out of handle_hvm_io_completion() to avoid
> >> calling that function at all.
> >
> > This looks ok to me.
> >
> >> BTW, the TODO also suggests checking the return value of
> >> handle_hvm_io_completion(), but I am not completely sure we can simply
> >> just return from leave_hypervisor_to_guest() at this point. Could you
> >> please share your opinion?
> >
> > From my understanding, handle_hvm_io_completion() may return false if
> > there is pending I/O or a failure.
>
> It seems, yes
>
>
> >
> > In the former case, I think we want to call handle_hvm_io_completion()
> > later on. Possibly after we call do_softirq().
> >
> > I am wondering whether check_for_vcpu_work() could return whether
> > there is more work to do on behalf of the vCPU.
> >
> > So we could have:
> >
> > do
> > {
> >   check_for_pcpu_work();
> > } while (check_for_vcpu_work())
> >
> > The implementation of check_for_vcpu_work() would be:
> >
> > if ( !handle_hvm_io_completion() )
> >   return true;
> >
> > /* Rest of the existing code */
> >
> > return false;
>
> Thank you, will give it a try.
>
> Can we behave the same way for both "pending I/O" and "failure" or do we
> need to distinguish them?

We don't need to distinguish them. In both cases, we will want to
process softirqs. In all the failure cases, the domain will have
crashed. Therefore the vCPU will be unscheduled.

>
> Probably we need some sort of safe timeout/number of attempts in order
> not to spin forever?

Well, anything based on timeout/number of attempts is flaky. How do
you know whether the I/O is just taking a "long time" to complete?

But a vCPU shouldn't continue until an I/O has completed. This is
nothing very different than what a processor would do.

In Xen case, if an I/O never completes then it most likely means that
something went horribly wrong with the Device Emulator. So it is most
likely not safe to continue. In HW, when there is a device failure,
the OS may receive an SError (this is implementation defined) and
could act accordingly if it is able to recognize the issue.

It *might* be possible to send a virtual SError but there are a couple
of issues with it:
     * How do you detect a failure?
     * SErrors are implementation defined. You would need to teach
your OS (or the firmware) how to deal with them.

I would expect quite a bit of effort in order to design and implement
it. For now, it is probably best to just let the vCPU spin forever.

This wouldn't be an issue for Xen as do_softirq() would be called at
every loop.

Cheers,



* Re: [RFC PATCH V1 05/12] hvm/dm: Introduce xendevicemodel_set_irq_level DM op
  2020-08-08  9:27                 ` Julien Grall
  2020-08-08  9:28                   ` Julien Grall
@ 2020-08-10 23:34                   ` Stefano Stabellini
  2020-08-11 13:04                     ` Julien Grall
  1 sibling, 1 reply; 140+ messages in thread
From: Stefano Stabellini @ 2020-08-10 23:34 UTC (permalink / raw)
  To: Julien Grall
  Cc: Stefano Stabellini, Wei Liu, Andrew Cooper, Ian Jackson,
	George Dunlap, Oleksandr Tyshchenko, Oleksandr Tyshchenko,
	Julien Grall, Jan Beulich, xen-devel, Volodymyr Babchuk

On Sat, 8 Aug 2020, Julien Grall wrote:
> On Fri, 7 Aug 2020 at 22:51, Stefano Stabellini <sstabellini@kernel.org> wrote:
> >
> > On Fri, 7 Aug 2020, Jan Beulich wrote:
> > > On 07.08.2020 01:49, Stefano Stabellini wrote:
> > > > On Thu, 6 Aug 2020, Julien Grall wrote:
> > > >> On 06/08/2020 01:37, Stefano Stabellini wrote:
> > > >>> On Wed, 5 Aug 2020, Julien Grall wrote:
> > > >>>> On 05/08/2020 00:22, Stefano Stabellini wrote:
> > > >>>>> On Mon, 3 Aug 2020, Oleksandr Tyshchenko wrote:
> > > >>>>>> From: Oleksandr Tyshchenko <oleksandr_tyshchenko@epam.com>
> > > >>>>>>
> > > >>>>>> This patch adds the ability for the device emulator to notify the
> > > >>>>>> other end (some entity running in the guest) using an SPI and
> > > >>>>>> implements the Arm specific bits for it. The proposed interface
> > > >>>>>> allows the emulator to set the logical level of one of a domain's
> > > >>>>>> IRQ lines.
> > > >>>>>>
> > > >>>>>> Please note, this is a split/cleanup of Julien's PoC:
> > > >>>>>> "Add support for Guest IO forwarding to a device emulator"
> > > >>>>>>
> > > >>>>>> Signed-off-by: Julien Grall <julien.grall@arm.com>
> > > >>>>>> Signed-off-by: Oleksandr Tyshchenko <oleksandr_tyshchenko@epam.com>
> > > >>>>>> ---
> > > >>>>>>    tools/libs/devicemodel/core.c                   | 18 ++++++++++++++++++
> > > >>>>>>    tools/libs/devicemodel/include/xendevicemodel.h |  4 ++++
> > > >>>>>>    tools/libs/devicemodel/libxendevicemodel.map    |  1 +
> > > >>>>>>    xen/arch/arm/dm.c                               | 22 +++++++++++++++++++++-
> > > >>>>>>    xen/common/hvm/dm.c                             |  1 +
> > > >>>>>>    xen/include/public/hvm/dm_op.h                  | 15 +++++++++++++++
> > > >>>>>>    6 files changed, 60 insertions(+), 1 deletion(-)
> > > >>>>>>
> > > >>>>>> diff --git a/tools/libs/devicemodel/core.c
> > > >>>>>> b/tools/libs/devicemodel/core.c
> > > >>>>>> index 4d40639..30bd79f 100644
> > > >>>>>> --- a/tools/libs/devicemodel/core.c
> > > >>>>>> +++ b/tools/libs/devicemodel/core.c
> > > >>>>>> @@ -430,6 +430,24 @@ int xendevicemodel_set_isa_irq_level(
> > > >>>>>>        return xendevicemodel_op(dmod, domid, 1, &op, sizeof(op));
> > > >>>>>>    }
> > > >>>>>>    +int xendevicemodel_set_irq_level(
> > > >>>>>> +    xendevicemodel_handle *dmod, domid_t domid, uint32_t irq,
> > > >>>>>> +    unsigned int level)
> > > >>>>>
> > > >>>>> It is a pity that having xen_dm_op_set_pci_intx_level and
> > > >>>>> xen_dm_op_set_isa_irq_level already we need to add a third one, but from
> > > >>>>> the names alone I don't think we can reuse either of them.
> > > >>>>
> > > >>>> The problem is not the name...
> > > >>>>
> > > >>>>>
> > > >>>>> It is very similar to set_isa_irq_level. We could almost rename
> > > >>>>> xendevicemodel_set_isa_irq_level to xendevicemodel_set_irq_level or,
> > > >>>>> better, just add an alias to it so that xendevicemodel_set_irq_level is
> > > >>>>> implemented by calling xendevicemodel_set_isa_irq_level. Honestly I am
> > > >>>>> not sure if it is worth doing it though. Any other opinions?
> > > >>>>
> > > >>>> ... the problem is the interrupt field is only 8-bit. So we would
> > > >>>> only be able to cover IRQ 0 - 255.
> > > >>>
> > > >>> Argh, that's not going to work :-(  I wasn't sure if it was a good idea
> > > >>> anyway.
> > > >>>
> > > >>>
> > > >>>> It is not entirely clear how the existing subop could be extended without
> > > >>>> breaking existing callers.
> > > >>>>
> > > >>>>> But I think we should plan for not needing two calls (one to set level
> > > >>>>> to 1, and one to set it to 0):
> > > >>>>> https://marc.info/?l=xen-devel&m=159535112027405
> > > >>>>
> > > >>>> I am not sure I understand your suggestion here. Are you suggesting
> > > >>>> to remove the 'level' parameter?
> > > >>>
> > > >>> My hope was to make it optional to call the hypercall with level = 0,
> > > >>> not necessarily to remove 'level' from the struct.
> > > >>
> > > >> From my understanding, the hypercall is meant to represent the status of the
> > > >> line between the device and the interrupt controller (either low or high).
> > > >>
> > > >> This is then up to the interrupt controller to decide when the interrupt is
> > > >> going to be fired:
> > > >>   - For an edge interrupt, this will fire when the line moves from low
> > > >> to high (or vice versa).
> > > >>   - For a level interrupt, this will fire when the line is high (assuming
> > > >> level trigger high) and will keep firing until the device decides to
> > > >> lower the line.
> > > >>
> > > >> For a device, it is common to keep the line high until the OS writes to a
> > > >> specific register.
> > > >>
> > > >> Furthermore, technically, the guest OS is in charge of configuring how an
> > > >> interrupt is triggered. Admittedly this information is part of the DT, but
> > > >> nothing prevents a guest from changing it.
> > > >>
> > > >> As a side note, we have a workaround in Xen for some buggy DT (see the arch
> > > >> timer) exposing the wrong trigger type.
> > > >>
> > > >> Because of that, I don't really see a way to make it optional. Maybe you
> > > >> have something different in mind?
> > > >
> > > > For level, we need the level parameter. For edge, we are only interested
> > > > in the "edge", right?
> > >
> > > I don't think so, unless Arm has special restrictions. Edges can be
> > > both rising and falling ones.
> >
> > And the same is true for level interrupts too: they could be active-low
> > or active-high.
> >
> >
> > Instead of modelling the state of the line, which seems to be a bit
> > error prone especially in the case of a single-device emulator that
> > might not have enough information about the rest of the system (it might
> > not know if the interrupt is active-high or active-low), we could model
> > the triggering of the interrupt instead.
> 
> I am not sure I understand why the single (or even multiple) device
> emulator needs to know the trigger type. The information of the
> trigger type of the interrupt would be described in the firmware table
> and it is expected to be the same as what the emulator expects.
> 
> If the guest OS decides to configure the interrupt trigger type wrongly,
> then it may not work properly. But, from my understanding, this
> doesn't differ from the HW behavior.
> 
> >
> > In the case of level=1, it would mean that the interrupt line is active,
> > no matter if it is active-low or active-high. In the case of level=0, it
> > would mean that it is inactive.
> >
> > Similarly, in the case of an edge interrupt edge=1 or level=1 would mean
> > that there is an edge, no matter if it is rising or falling.
> 
> TBH, I think your approach is only going to introduce more headache in
> Xen if a guest OS decides to change the trigger type.
> 
> It feels much easier to just ask the emulator to let us know the level
> of the line. Then if the guest OS decides to change the trigger type,
> we only need to resample the line.

Emulators, at least the ones in QEMU, don't model the hardware closely
enough to care about the trigger type. The only thing they typically care
about is firing a notification.

The trigger type only comes into the picture when there is a bug or a
disagreement between Xen and QEMU. Imagine a device that can be both
level active-high or active-low, if the guest kernel changes the
configuration, Xen would know about it, but QEMU wouldn't. I vaguely
recall a bug 10+ years ago about this with QEMU on x86 and a line that
could be both active-high and active-low. So QEMU would raise the
interrupt but Xen would actually think that QEMU stopped the interrupt.

To do this right, we would have to introduce an interface between Xen
and QEMU to propagate the trigger type. Xen would have to tell QEMU when
the guest changed the configuration. That would work, but it would be
better if we can figure out a way to do without it to reduce complexity.

Instead, given that QEMU and other emulators don't actually care about
active-high or active-low, if we have a Xen interface that just says
"fire the interrupt" we avoid this kind of trouble. It would
also be more efficient because the total number of hypercalls required
would be lower.



* Re: [RFC PATCH V1 01/12] hvm/ioreq: Make x86's IOREQ feature common
  2020-08-07 23:45                   ` Oleksandr
@ 2020-08-10 23:34                     ` Stefano Stabellini
  0 siblings, 0 replies; 140+ messages in thread
From: Stefano Stabellini @ 2020-08-10 23:34 UTC (permalink / raw)
  To: Oleksandr
  Cc: 'Kevin Tian',
	Stefano Stabellini, Julien Grall, 'Jun Nakajima',
	'Wei Liu', paul, 'Andrew Cooper',
	'Ian Jackson', 'George Dunlap',
	'Tim Deegan', 'Oleksandr Tyshchenko',
	'Julien Grall',
	Jan Beulich, xen-devel, 'Roger Pau Monné'

On Sat, 8 Aug 2020, Oleksandr wrote:
> On 08.08.20 00:50, Stefano Stabellini wrote:
> > On Fri, 7 Aug 2020, Oleksandr wrote:
> > > On 06.08.20 03:37, Stefano Stabellini wrote:
> > > 
> > > Hi Stefano
> > > 
> > > Trying to simulate the IO_RETRY handling mechanism (according to the
> > > model below) I continuously get IO_RETRY from try_fwd_ioserv() ...
> > > 
> > > > OK, thanks for the details. My interpretation seems to be correct.
> > > > 
> > > > In which case, it looks like xen/arch/arm/io.c:try_fwd_ioserv should
> > > > return IO_RETRY. Then, xen/arch/arm/traps.c:do_trap_stage2_abort_guest
> > > > also needs to handle try_handle_mmio returning IO_RETRY the first time
> > > > around, and IO_HANDLED after QEMU does its job.
> > > > 
> > > > What should do_trap_stage2_abort_guest do on IO_RETRY? Simply return
> > > > early and let the scheduler do its job? Something like:
> > > > 
> > > >               enum io_state state = try_handle_mmio(regs, hsr, gpa);
> > > > 
> > > >               switch ( state )
> > > >               {
> > > >               case IO_ABORT:
> > > >                   goto inject_abt;
> > > >               case IO_HANDLED:
> > > >                   advance_pc(regs, hsr);
> > > >                   return;
> > > >               case IO_RETRY:
> > > >                   /* finish later */
> > > >                   return;
> > > >               case IO_UNHANDLED:
> > > >                   /* IO unhandled, try another way to handle it. */
> > > >                   break;
> > > >               default:
> > > >                   ASSERT_UNREACHABLE();
> > > >               }
> > > > 
> > > > Then, xen/arch/arm/ioreq.c:handle_mmio() gets called by
> > > > handle_hvm_io_completion() after QEMU completes the emulation. Today,
> > > > handle_mmio just sets the user register with the read value.
> > > > 
> > > > But it would be better if it called again the original function
> > > > do_trap_stage2_abort_guest to actually retry the original operation.
> > > > This time do_trap_stage2_abort_guest calls try_handle_mmio() and gets
> > > > IO_HANDLED instead of IO_RETRY,
> > > I may be missing some important point, but I fail to see why try_handle_mmio
> > > (try_fwd_ioserv) would return IO_HANDLED instead of IO_RETRY at this stage.
> > > Or does the current try_fwd_ioserv() logic need rework?
> > I think you should check the ioreq->state in try_fwd_ioserv(): if the
> > result is ready, then ioreq->state should be STATE_IORESP_READY, and you
> > can return IO_HANDLED.
> > 
> > That is assuming that you are looking at the live version of the ioreq
> > shared with QEMU instead of a private copy of it, which I am not sure about.
> > Looking at try_fwd_ioserv() it would seem that vio->io_req is just a
> > copy? The live version is returned by get_ioreq() ?
> 
> If I understand the code correctly, then indeed get_ioreq() returns the live
> version shared with the emulator.
> The desired state change (STATE_IORESP_READY), which is what hvm_wait_for_io()
> actually waits for, is set here (in my case):
> https://xenbits.xen.org/gitweb/?p=people/pauldu/demu.git;a=blob;f=demu.c;h=f785b394d0cf141dffa05bdddecf338214358aea;hb=refs/heads/master#l698
> 
> > Even in handle_hvm_io_completion, instead of setting vio->io_req.state
> > to STATE_IORESP_READY by hand, it would be better to look at the live
> > version of the ioreq because QEMU will have already set ioreq->state
> > to STATE_IORESP_READY (hw/i386/xen/xen-hvm.c:cpu_handle_ioreq).
> It seems that after detecting STATE_IORESP_READY in hvm_wait_for_io() the
> state of the live version is immediately set to STATE_IOREQ_NONE, so looking at
> the live version later in handle_hvm_io_completion()
> or in try_fwd_ioserv() shows us nothing, I am afraid.

I see. That is because we want to "free" the ioreq as soon as possible,
which is good. handle_hvm_io_completion also sets vio->io_req.state to
STATE_IORESP_READY, so our private copy is still set to
STATE_IORESP_READY. Thus, try_fwd_ioserv should do the right thing by
simply reading vio->io_req.state: try_fwd_ioserv should be able to
return IO_HANDLED when the "state" is STATE_IORESP_READY, right?


^ permalink raw reply	[flat|nested] 140+ messages in thread

* Re: [RFC PATCH V1 01/12] hvm/ioreq: Make x86's IOREQ feature common
  2020-08-10 13:41                     ` Oleksandr
@ 2020-08-10 23:34                       ` Stefano Stabellini
  2020-08-11  9:19                         ` Julien Grall
  0 siblings, 1 reply; 140+ messages in thread
From: Stefano Stabellini @ 2020-08-10 23:34 UTC (permalink / raw)
  To: Oleksandr
  Cc: 'Kevin Tian',
	Stefano Stabellini, Julien Grall, 'Jun Nakajima',
	'Wei Liu', paul, 'Andrew Cooper',
	'Ian Jackson', 'George Dunlap',
	'Tim Deegan', 'Oleksandr Tyshchenko',
	'Julien Grall',
	Jan Beulich, xen-devel, 'Roger Pau Monné'


On Mon, 10 Aug 2020, Oleksandr wrote:
> On 08.08.20 01:19, Oleksandr wrote:
> > On 08.08.20 00:50, Stefano Stabellini wrote:
> > > On Fri, 7 Aug 2020, Oleksandr wrote:
> > > > On 06.08.20 03:37, Stefano Stabellini wrote:
> > > > 
> > > > Hi Stefano
> > > > 
> > > > Trying to simulate IO_RETRY handling mechanism (according to model
> > > > below) I
> > > > continuously get IO_RETRY from try_fwd_ioserv() ...
> > > > 
> > > > > OK, thanks for the details. My interpretation seems to be correct.
> > > > > 
> > > > > In which case, it looks like xen/arch/arm/io.c:try_fwd_ioserv should
> > > > > return IO_RETRY. Then, xen/arch/arm/traps.c:do_trap_stage2_abort_guest
> > > > > also needs to handle try_handle_mmio returning IO_RETRY the first
> > > > > time around, and IO_HANDLED after QEMU does its job.
> > > > > 
> > > > > What should do_trap_stage2_abort_guest do on IO_RETRY? Simply return
> > > > > early and let the scheduler do its job? Something like:
> > > > > 
> > > > >               enum io_state state = try_handle_mmio(regs, hsr, gpa);
> > > > > 
> > > > >               switch ( state )
> > > > >               {
> > > > >               case IO_ABORT:
> > > > >                   goto inject_abt;
> > > > >               case IO_HANDLED:
> > > > >                   advance_pc(regs, hsr);
> > > > >                   return;
> > > > >               case IO_RETRY:
> > > > >                   /* finish later */
> > > > >                   return;
> > > > >               case IO_UNHANDLED:
> > > > >                   /* IO unhandled, try another way to handle it. */
> > > > >                   break;
> > > > >               default:
> > > > >                   ASSERT_UNREACHABLE();
> > > > >               }
> > > > > 
> > > > > Then, xen/arch/arm/ioreq.c:handle_mmio() gets called by
> > > > > handle_hvm_io_completion() after QEMU completes the emulation. Today,
> > > > > handle_mmio just sets the user register with the read value.
> > > > > 
> > > > > But it would be better if it called again the original function
> > > > > do_trap_stage2_abort_guest to actually retry the original operation.
> > > > > This time do_trap_stage2_abort_guest calls try_handle_mmio() and gets
> > > > > IO_HANDLED instead of IO_RETRY,
> > > > I may miss some important point, but I failed to see why try_handle_mmio
> > > > (try_fwd_ioserv) will return IO_HANDLED instead of IO_RETRY at this
> > > > stage.
> > > > Or current try_fwd_ioserv() logic needs rework?
> > > I think you should check the ioreq->state in try_fwd_ioserv(), if the
> > > result is ready, then ioreq->state should be STATE_IORESP_READY, and you
> > > can return IO_HANDLED.
> > 
> 
> I optimized the test patch a bit (now it looks much simpler). I didn't face
> any issues during a quick test.

Both patches get much closer to following the proper state machine,
great! I think this patch is certainly a good improvement. I think the
other patch you sent earlier, slightly larger, is even better. It makes
the following additional changes that would be good to have:

- try_fwd_ioserv returns IO_HANDLED on state == STATE_IORESP_READY
- handle_mmio simply calls do_trap_stage2_abort_guest

I would also remove "handle_mmio_finish" and do the guest register
setting as well as setting vio->io_req.state to STATE_IOREQ_NONE
directly in try_fwd_ioserv.

 
> ---
>  xen/arch/arm/io.c    | 4 ----
>  xen/arch/arm/ioreq.c | 7 ++++++-
>  xen/arch/arm/traps.c | 4 +++-
>  3 files changed, 9 insertions(+), 6 deletions(-)
> 
> diff --git a/xen/arch/arm/io.c b/xen/arch/arm/io.c
> index 436f669..3063577 100644
> --- a/xen/arch/arm/io.c
> +++ b/xen/arch/arm/io.c
> @@ -156,10 +156,6 @@ static enum io_state try_fwd_ioserv(struct cpu_user_regs
> *regs,
>      else
>          vio->io_completion = HVMIO_mmio_completion;
> 
> -    /* XXX: Decide what to do */
> -    if ( rc == IO_RETRY )
> -        rc = IO_HANDLED;
> -
>      return rc;
>  }
>  #endif
> diff --git a/xen/arch/arm/ioreq.c b/xen/arch/arm/ioreq.c
> index 8f60c41..e5235c6 100644
> --- a/xen/arch/arm/ioreq.c
> +++ b/xen/arch/arm/ioreq.c
> @@ -33,6 +33,8 @@
>  #include <public/hvm/dm_op.h>
>  #include <public/hvm/ioreq.h>
> 
> +#include <asm/traps.h>
> +
>  bool handle_mmio(void)
>  {
>      struct vcpu *v = current;
> @@ -52,7 +54,7 @@ bool handle_mmio(void)
> 
>      /* XXX: Do we need to take care of write here ? */
>      if ( dabt.write )
> -        return true;
> +        goto done;
> 
>      /*
>       * Sign extend if required.
> @@ -72,6 +74,9 @@ bool handle_mmio(void)
> 
>      set_user_reg(regs, dabt.reg, r);
> 
> +done:
> +    advance_pc(regs, hsr);
> +
>      return true;
>  }
> 
> diff --git a/xen/arch/arm/traps.c b/xen/arch/arm/traps.c
> index ea472d1..974c744 100644
> --- a/xen/arch/arm/traps.c
> +++ b/xen/arch/arm/traps.c
> @@ -1965,11 +1965,13 @@ static void do_trap_stage2_abort_guest(struct
> cpu_user_regs *regs,
>              case IO_HANDLED:
>                  advance_pc(regs, hsr);
>                  return;
> +            case IO_RETRY:
> +                /* finish later */
> +                return;
>              case IO_UNHANDLED:
>                  /* IO unhandled, try another way to handle it. */
>                  break;
>              default:
> -                /* XXX: Handle IO_RETRY */
>                  ASSERT_UNREACHABLE();
>              }
>          }


* Re: [RFC PATCH V1 01/12] hvm/ioreq: Make x86's IOREQ feature common
  2020-08-10 19:20                   ` Julien Grall
@ 2020-08-10 23:34                     ` Stefano Stabellini
  2020-08-11 11:28                       ` Julien Grall
  0 siblings, 1 reply; 140+ messages in thread
From: Stefano Stabellini @ 2020-08-10 23:34 UTC (permalink / raw)
  To: Julien Grall
  Cc: 'Kevin Tian', Stefano Stabellini, 'Jun Nakajima',
	'Wei Liu', paul, 'Andrew Cooper',
	'Ian Jackson', 'George Dunlap',
	'Tim Deegan', Oleksandr, 'Oleksandr Tyshchenko',
	'Julien Grall', 'Jan Beulich',
	xen-devel, 'Roger Pau Monné'


On Mon, 10 Aug 2020, Julien Grall wrote:
> On 07/08/2020 00:48, Stefano Stabellini wrote:
> > On Thu, 6 Aug 2020, Julien Grall wrote:
> > > On 06/08/2020 01:37, Stefano Stabellini wrote:
> > > > On Wed, 5 Aug 2020, Julien Grall wrote:
> > > > > On 04/08/2020 20:11, Stefano Stabellini wrote:
> > > > > > On Tue, 4 Aug 2020, Julien Grall wrote:
> > > > > > > On 04/08/2020 12:10, Oleksandr wrote:
> > > > > > > > On 04.08.20 10:45, Paul Durrant wrote:
> > > > > > > > > > +static inline bool hvm_ioreq_needs_completion(const ioreq_t
> > > > > > > > > > *ioreq)
> > > > > > > > > > +{
> > > > > > > > > > +    return ioreq->state == STATE_IOREQ_READY &&
> > > > > > > > > > +           !ioreq->data_is_ptr &&
> > > > > > > > > > +           (ioreq->type != IOREQ_TYPE_PIO || ioreq->dir !=
> > > > > > > > > > IOREQ_WRITE);
> > > > > > > > > > +}
> > > > > > > > > I don't think having this in common code is correct. The
> > > > > > > > > short-cut
> > > > > > > > > of
> > > > > > > > > not
> > > > > > > > > completing PIO reads seems somewhat x86 specific.
> > > > > > > 
> > > > > > > Hmmm, looking at the code, I think it doesn't wait for PIO writes
> > > > > > > to
> > > > > > > complete
> > > > > > > (not read). Did I miss anything?
> > > > > > > 
> > > > > > > > Does ARM even
> > > > > > > > > have the concept of PIO?
> > > > > > > > 
> > > > > > > > I am not 100% sure here, but it seems that doesn't have.
> > > > > > > 
> > > > > > > Technically, the PIOs exist on Arm, however they are accessed the
> > > > > > > same
> > > > > > > way
> > > > > > > as
> > > > > > > MMIO and will have a dedicated area defined by the HW.
> > > > > > > 
> > > > > > > AFAICT, on Arm64, they are only used for PCI IO Bar.
> > > > > > > 
> > > > > > > Now the question is whether we want to expose them to the Device
> > > > > > > Emulator
> > > > > > > as
> > > > > > > PIO or MMIO access. From a generic PoV, a DM shouldn't have to
> > > > > > > care
> > > > > > > about
> > > > > > > the
> > > > > > > architecture used. It should just be able to request a given
> > > > > > > IOport
> > > > > > > region.
> > > > > > > 
> > > > > > > So it may make sense to differentiate them in the common ioreq
> > > > > > > code as
> > > > > > > well.
> > > > > > > 
> > > > > > > I had a quick look at QEMU and wasn't able to tell if PIOs and
> > > > > > > MMIOs
> > > > > > > address
> > > > > > > space are different on Arm as well. Paul, Stefano, do you know
> > > > > > > what
> > > > > > > they
> > > > > > > are
> > > > > > > doing?
> > > > > > 
> > > > > > On the QEMU side, it looks like PIO (address_space_io) is used in
> > > > > > connection with the emulation of the "in" or "out" instructions, see
> > > > > > ioport.c:cpu_inb for instance. Some parts of PCI on QEMU emulate PIO
> > > > > > space regardless of the architecture, such as
> > > > > > hw/pci/pci_bridge.c:pci_bridge_initfn.
> > > > > > 
> > > > > > However, because there is no "in" and "out" on ARM, I don't think
> > > > > > address_space_io can be accessed. Specifically, there is no
> > > > > > equivalent
> > > > > > for target/i386/misc_helper.c:helper_inb on ARM.
> > > > > 
> > > > > So how PCI I/O BAR are accessed? Surely, they could be used on Arm,
> > > > > right?
> > > > 
> > > > PIO is also memory mapped on ARM and it seems to have its own MMIO
> > > > address window.
> > > This part is already well-understood :). However, this only tell us how an
> > > OS
> > > is accessing a PIO.
> > > 
> > > What I am trying to figure out is how the hardware (or QEMU) is meant to
> > > work.
> > > 
> > >  From my understanding, the MMIO access will be received by the hostbridge
> > > and
> > > then forwarded to the appropriate PCI device. The two questions I am
> > > trying to
> > > answer is: How the I/O BARs are configured? Will it contain an MMIO
> > > address or
> > > an offset?
> > > 
> > > If the answer is the latter, then we will need PIO because a DM will never
> > > see
> > > the MMIO address (the hostbridge will be emulated in Xen).
> > 
> > Now I understand the question :-)
> > 
> > This is the way I understand it works. Let's say that the PIO aperture
> > is 0x1000-0x2000 which is aliased to 0x3eff0000-0x3eff1000.
> > 0x1000-0x2000 are addresses that cannot be accessed directly.
> > 0x3eff0000-0x3eff1000 is the range that works.
> > 
> > A PCI device PIO BAR will have an address in the 0x1000-0x2000 range,
> > for instance 0x1100.
> > 
> > However, when the operating system access 0x1100, it will issue a read
> > to 0x3eff0100.
> > 
> > Xen will trap the read to 0x3eff0100 and send it to QEMU.
> > 
> > QEMU has to know that 0x3eff0000-0x3eff1000 is the alias to the PIO
> > aperture and that 0x3eff0100 correspond to PCI device foobar. Similarly,
> > QEMU has also to know the address range of the MMIO aperture and its
> > remappings, if any (it is possible to have address remapping for MMIO
> > addresses too.)
> > 
> > I think today this information is "built-in" QEMU, not configurable. It
> > works fine because *I think* the PCI aperture is pretty much the same on
> > x86 boards, at least the one supported by QEMU for Xen.
> 
> Well on x86, the OS will access PIO using inb/outb. So the address received by
> Xen is 0x1000-0x2000 and then forwarded to the DM using the PIO type.
> 
> > On ARM, I think we should explicitly declare the PCI MMIO aperture and
> > its alias/address-remapping. When we do that, we can also declare the
> > PIO aperture and its alias/address-remapping.
> 
> Well yes, we need to define PCI MMIO and PCI I/O region because the guest OS
> needs to know them.

[1]
(see below)


> However, I am unsure how this would help us to solve the question whether
> access to the PCI I/O aperture should be sent as a PIO or MMIO.
> 
> Per what you wrote, the PCI I/O Bar would be configured with the range
> 0x1000-0x2000. So a device emulator (this may not be QEMU and only emulate one
> PCI device!!) will only see that range.
> 
> How does the device-emulator then know that it needs to watch the region
> 0x3eff0000-0x3eff1000?

It would know because the PCI PIO aperture, together with its alias, is
specified [1].


> It feels to me that it would be easier/make more sense if the DM only said "I
> want to watch the PIO range 0x1000-0x2000". Xen would then be in charge of
> doing the translation between the OS view and the DM view.
> 
> This also means a DM would be completely arch-agnostic. This would follow the
> HW where you can plug your PCI card on any HW.

As you know, PIO access is not actually modelled by QEMU for ARM
targets. I worry about its long-term stability, given that it is
untested: qemu-system-aarch64 could have broken PIO emulation and
nobody would find out except us, when we send ioreqs to it.

Thinking about the Xen/emulator interface on ARM, is it wise to rely on
an access type that doesn't exist on the architecture?


* Re: [RFC PATCH V1 04/12] xen/arm: Introduce arch specific bits for IOREQ/DM features
  2020-08-10 22:37             ` Julien Grall
@ 2020-08-11  6:13               ` Oleksandr
  2020-08-12 15:08                 ` Oleksandr
  0 siblings, 1 reply; 140+ messages in thread
From: Oleksandr @ 2020-08-11  6:13 UTC (permalink / raw)
  To: Julien Grall
  Cc: Stefano Stabellini, Wei Liu, Andrew Cooper, Ian Jackson,
	George Dunlap, Oleksandr Tyshchenko, Julien Grall, Jan Beulich,
	xen-devel, Daniel De Graaf, Volodymyr Babchuk


On 11.08.20 01:37, Julien Grall wrote:

Hi Julien

> On Mon, 10 Aug 2020 at 21:29, Oleksandr <olekstysh@gmail.com> wrote:
>>
>> On 10.08.20 22:00, Julien Grall wrote:
>>
>> Hi Julien
>>
>>>>>>> @@ -2275,6 +2282,16 @@ static void check_for_vcpu_work(void)
>>>>>>>     */
>>>>>>>    void leave_hypervisor_to_guest(void)
>>>>>>>    {
>>>>>>> +#ifdef CONFIG_IOREQ_SERVER
>>>>>>> +    /*
>>>>>>> +     * XXX: Check the return. Shall we call that in
>>>>>>> +     * continue_running and context_switch instead?
>>>>>>> +     * The benefits would be to avoid calling
>>>>>>> +     * handle_hvm_io_completion on every return.
>>>>>>> +     */
>>>>>> Yeah, that could be a simple and good optimization
>>>>> Well, it is not simple as it is sounds :).
>>>>> handle_hvm_io_completion() is the function in charge to mark the
>>>>> vCPU as waiting for I/O. So we would at least need to split the
>>>>> function.
>>>>>
>>>>> I wrote this TODO because I wasn't sure about the complexity of
>>>>> handle_hvm_io_completion(current). Looking at it again, the main
>>>>> complexity is the looping over the IOREQ servers.
>>>>>
>>>>> I think it would be better to optimize handle_hvm_io_completion()
>>>>> rather than trying to hack the context_switch() or continue_running().
>>>> Well, is the idea in proposed dirty test patch below close to what
>>>> you expect? Patch optimizes handle_hvm_io_completion() to avoid extra
>>>> actions if vcpu's domain doesn't have ioreq_server, alternatively
>>>> the check could be moved out of handle_hvm_io_completion() to avoid
>>>> calling that function at all.
>>> This looks ok to me.
>>>
>>>> BTW, TODO also suggests checking the return value of
>>>> handle_hvm_io_completion(), but I am not completely sure we can simply
>>>> just return from leave_hypervisor_to_guest() at this point. Could you
>>>> please share your opinion?
>>>  From my understanding, handle_hvm_io_completion() may return false if
>>> there is pending I/O or a failure.
>> It seems, yes
>>
>>
>>> In the former case, I think we want to call handle_hvm_io_completion()
>>> later on. Possibly after we call do_softirq().
>>>
>>> I am wondering whether check_for_vcpu_work() could return whether
>>> there are more work todo on the behalf of the vCPU.
>>>
>>> So we could have:
>>>
>>> do
>>> {
>>>    check_for_pcpu_work();
>>> } while (check_for_vcpu_work())
>>>
>>> The implementation of check_for_vcpu_work() would be:
>>>
>>> if ( !handle_hvm_io_completion() )
>>>    return true;
>>>
>>> /* Rest of the existing code */
>>>
>>> return false;
>> Thank you, will give it a try.
>>
>> Can we behave the same way for both "pending I/O" and "failure", or do we
>> need to distinguish them?
> We don't need to distinguish them. In both cases, we will want to
> process softirqs. In all the failure cases, the domain will have
> crashed. Therefore the vCPU will be unscheduled.

Got it.


>> Probably we need some sort of safe timeout/number of attempts in order
>> not to spin forever?
> Well, anything based on timeout/number of attempts is flaky. How do
> you know whether the I/O is just taking a "long time" to complete?
>
> But a vCPU shouldn't continue until an I/O has completed. This is
> nothing very different than what a processor would do.
>
> In Xen case, if an I/O never completes then it most likely means that
> something went horribly wrong with the Device Emulator. So it is most
> likely not safe to continue. In HW, when there is a device failure,
> the OS may receive an SError (this is implementation defined) and
> could act accordingly if it is able to recognize the issue.
>
> It *might* be possible to send a virtual SError but there are a couple
> of issues with it:
>       * How do you detect a failure?
>       * SErrors are implementation defined. You would need to teach
> your OS (or the firmware) how to deal with them.
>
> I would expect quite a bit of effort in order to design and implement
> it. For now, it is probably best to just let the vCPU spin forever.
>
> This wouldn't be an issue for Xen as do_softirq() would be called at
> every loop.

Thank you for the clarification. Fair enough, that sounds reasonable.


-- 
Regards,

Oleksandr Tyshchenko




* Re: [RFC PATCH V1 01/12] hvm/ioreq: Make x86's IOREQ feature common
  2020-08-10 23:34                       ` Stefano Stabellini
@ 2020-08-11  9:19                         ` Julien Grall
  2020-08-11 10:10                           ` Oleksandr
  0 siblings, 1 reply; 140+ messages in thread
From: Julien Grall @ 2020-08-11  9:19 UTC (permalink / raw)
  To: Stefano Stabellini, Oleksandr
  Cc: 'Kevin Tian', 'Jun Nakajima', 'Wei Liu',
	paul, 'Andrew Cooper', 'Ian Jackson',
	'George Dunlap', 'Tim Deegan',
	'Oleksandr Tyshchenko', 'Julien Grall',
	Jan Beulich, xen-devel, 'Roger Pau Monné'

Hi Stefano,

On 11/08/2020 00:34, Stefano Stabellini wrote:
> On Mon, 10 Aug 2020, Oleksandr wrote:
>> On 08.08.20 01:19, Oleksandr wrote:
>>> On 08.08.20 00:50, Stefano Stabellini wrote:
>>>> On Fri, 7 Aug 2020, Oleksandr wrote:
>>>>> On 06.08.20 03:37, Stefano Stabellini wrote:
>>>>>
>>>>> Hi Stefano
>>>>>
>>>>> Trying to simulate IO_RETRY handling mechanism (according to model
>>>>> below) I
>>>>> continuously get IO_RETRY from try_fwd_ioserv() ...
>>>>>
>>>>>> OK, thanks for the details. My interpretation seems to be correct.
>>>>>>
>>>>>> In which case, it looks like xen/arch/arm/io.c:try_fwd_ioserv should
>>>>>> return IO_RETRY. Then, xen/arch/arm/traps.c:do_trap_stage2_abort_guest
>>>>>> also needs to handle try_handle_mmio returning IO_RETRY the first
>>>>>> time around, and IO_HANDLED after QEMU does its job.
>>>>>>
>>>>>> What should do_trap_stage2_abort_guest do on IO_RETRY? Simply return
>>>>>> early and let the scheduler do its job? Something like:
>>>>>>
>>>>>>                enum io_state state = try_handle_mmio(regs, hsr, gpa);
>>>>>>
>>>>>>                switch ( state )
>>>>>>                {
>>>>>>                case IO_ABORT:
>>>>>>                    goto inject_abt;
>>>>>>                case IO_HANDLED:
>>>>>>                    advance_pc(regs, hsr);
>>>>>>                    return;
>>>>>>                case IO_RETRY:
>>>>>>                    /* finish later */
>>>>>>                    return;
>>>>>>                case IO_UNHANDLED:
>>>>>>                    /* IO unhandled, try another way to handle it. */
>>>>>>                    break;
>>>>>>                default:
>>>>>>                    ASSERT_UNREACHABLE();
>>>>>>                }
>>>>>>
>>>>>> Then, xen/arch/arm/ioreq.c:handle_mmio() gets called by
>>>>>> handle_hvm_io_completion() after QEMU completes the emulation. Today,
>>>>>> handle_mmio just sets the user register with the read value.
>>>>>>
>>>>>> But it would be better if it called again the original function
>>>>>> do_trap_stage2_abort_guest to actually retry the original operation.
>>>>>> This time do_trap_stage2_abort_guest calls try_handle_mmio() and gets
>>>>>> IO_HANDLED instead of IO_RETRY,
>>>>> I may miss some important point, but I failed to see why try_handle_mmio
>>>>> (try_fwd_ioserv) will return IO_HANDLED instead of IO_RETRY at this
>>>>> stage.
>>>>> Or current try_fwd_ioserv() logic needs rework?
>>>> I think you should check the ioreq->state in try_fwd_ioserv(), if the
>>>> result is ready, then ioreq->state should be STATE_IORESP_READY, and you
>>>> can return IO_HANDLED.
>>>
>>
>> I optimized test patch a bit (now it looks much simpler). I didn't face any
>> issues during a quick test.
> 
> Both patches get much closer to following the proper state machine,
> great! I think this patch is certainly a good improvement. I think the
> other patch you sent earlier, slightly larger, is even better. It makes
> the following additional changes that would be good to have:
> 
> - try_fwd_ioserv returns IO_HANDLED on state == STATE_IORESP_READY
> - handle_mmio simply calls do_trap_stage2_abort_guest

I don't think we should call do_trap_stage2_abort_guest() as part of the
completion because:
     * do_trap_stage2_abort_guest() uses registers that are not context
switched (such as FAR_EL2). I/O handling is split in two, likely with a
context switch in the middle. The second part is the completion (i.e. the
call to handle_mmio()), so the system registers will be incorrect.
     * A big chunk of do_trap_stage2_abort_guest() is not necessary for
the completion. For instance, there is no need to try to translate the
guest virtual address to a guest physical address.

Therefore the version below is probably the best approach.

Cheers,

-- 
Julien Grall



* Re: [RFC PATCH V1 01/12] hvm/ioreq: Make x86's IOREQ feature common
  2020-08-11  9:19                         ` Julien Grall
@ 2020-08-11 10:10                           ` Oleksandr
  2020-08-11 22:47                             ` Stefano Stabellini
  0 siblings, 1 reply; 140+ messages in thread
From: Oleksandr @ 2020-08-11 10:10 UTC (permalink / raw)
  To: Julien Grall, Stefano Stabellini
  Cc: 'Kevin Tian', 'Jun Nakajima', 'Wei Liu',
	paul, 'Andrew Cooper', 'Ian Jackson',
	'George Dunlap', 'Tim Deegan',
	'Oleksandr Tyshchenko', 'Julien Grall',
	Jan Beulich, xen-devel, 'Roger Pau Monné'


On 11.08.20 12:19, Julien Grall wrote:

Hi Julien, Stefano

> Hi Stefano,
>
> On 11/08/2020 00:34, Stefano Stabellini wrote:
>> On Mon, 10 Aug 2020, Oleksandr wrote:
>>> On 08.08.20 01:19, Oleksandr wrote:
>>>> On 08.08.20 00:50, Stefano Stabellini wrote:
>>>>> On Fri, 7 Aug 2020, Oleksandr wrote:
>>>>>> On 06.08.20 03:37, Stefano Stabellini wrote:
>>>>>>
>>>>>> Hi Stefano
>>>>>>
>>>>>> Trying to simulate IO_RETRY handling mechanism (according to model
>>>>>> below) I
>>>>>> continuously get IO_RETRY from try_fwd_ioserv() ...
>>>>>>
>>>>>>> OK, thanks for the details. My interpretation seems to be correct.
>>>>>>>
>>>>>>> In which case, it looks like xen/arch/arm/io.c:try_fwd_ioserv 
>>>>>>> should
>>>>>>> return IO_RETRY. Then, 
>>>>>>> xen/arch/arm/traps.c:do_trap_stage2_abort_guest
>>>>>>> also needs to handle try_handle_mmio returning IO_RETRY the first
>>>>>>> time around, and IO_HANDLED after QEMU does its job.
>>>>>>>
>>>>>>> What should do_trap_stage2_abort_guest do on IO_RETRY? Simply 
>>>>>>> return
>>>>>>> early and let the scheduler do its job? Something like:
>>>>>>>
>>>>>>>                enum io_state state = try_handle_mmio(regs, hsr, 
>>>>>>> gpa);
>>>>>>>
>>>>>>>                switch ( state )
>>>>>>>                {
>>>>>>>                case IO_ABORT:
>>>>>>>                    goto inject_abt;
>>>>>>>                case IO_HANDLED:
>>>>>>>                    advance_pc(regs, hsr);
>>>>>>>                    return;
>>>>>>>                case IO_RETRY:
>>>>>>>                    /* finish later */
>>>>>>>                    return;
>>>>>>>                case IO_UNHANDLED:
>>>>>>>                    /* IO unhandled, try another way to handle 
>>>>>>> it. */
>>>>>>>                    break;
>>>>>>>                default:
>>>>>>>                    ASSERT_UNREACHABLE();
>>>>>>>                }
>>>>>>>
>>>>>>> Then, xen/arch/arm/ioreq.c:handle_mmio() gets called by
>>>>>>> handle_hvm_io_completion() after QEMU completes the emulation. 
>>>>>>> Today,
>>>>>>> handle_mmio just sets the user register with the read value.
>>>>>>>
>>>>>>> But it would be better if it called again the original function
>>>>>>> do_trap_stage2_abort_guest to actually retry the original 
>>>>>>> operation.
>>>>>>> This time do_trap_stage2_abort_guest calls try_handle_mmio() and 
>>>>>>> gets
>>>>>>> IO_HANDLED instead of IO_RETRY,
>>>>>> I may miss some important point, but I failed to see why 
>>>>>> try_handle_mmio
>>>>>> (try_fwd_ioserv) will return IO_HANDLED instead of IO_RETRY at this
>>>>>> stage.
>>>>>> Or current try_fwd_ioserv() logic needs rework?
>>>>> I think you should check the ioreq->state in try_fwd_ioserv(), if the
>>>>> result is ready, then ioreq->state should be STATE_IORESP_READY, 
>>>>> and you
>>>>> can return IO_HANDLED.
>>>>
>>>
>>> I optimized test patch a bit (now it looks much simpler). I didn't 
>>> face any
>>> issues during a quick test.
>>
>> Both patches get much closer to following the proper state machine,
>> great! I think this patch is certainly a good improvement. I think the
>> other patch you sent earlier, slightly larger, is even better. It makes
>> the following additional changes that would be good to have:
>>
>> - try_fwd_ioserv returns IO_HANDLED on state == STATE_IORESP_READY
>> - handle_mmio simply calls do_trap_stage2_abort_guest
>
> I don't think we should call do_trap_stage2_abort_guest() as part of 
> the completion because:
>     * The function do_trap_stage2_abort_guest() is using registers 
> that are not context switched (such as FAR_EL2). I/O handling is split 
> in two with likely a context switch in the middle. The second part is 
> the completion (i.e call to handle_mmio()). So the system registers 
> will be incorrect.
>     * A big chunk of do_trap_stage2_abort_guest() is not necessary for 
> the completion. For instance, there is no need to try to translate the 
> guest virtual address to a guest physical address.
>
> Therefore the version below is probably the best approach.


Indeed, the first version (calling do_trap_stage2_abort_guest() for the
completion) is racy. When testing it more heavily I sometimes hit an issue
which resulted in the DomU getting stuck completely.

(XEN) d2v1: vGICD: bad read width 0 r11 offset 0x000f00

I didn't investigate the issue in detail, but I assumed that code in
do_trap_stage2_abort_guest() caused it. This was the main reason I decided
to simplify the initial patch (keeping only advance_pc).
Reading Julien's answer, I now understand what could happen.


-- 
Regards,

Oleksandr Tyshchenko




* Re: [RFC PATCH V1 01/12] hvm/ioreq: Make x86's IOREQ feature common
  2020-08-10 23:34                     ` Stefano Stabellini
@ 2020-08-11 11:28                       ` Julien Grall
  2020-08-11 22:48                         ` Stefano Stabellini
  0 siblings, 1 reply; 140+ messages in thread
From: Julien Grall @ 2020-08-11 11:28 UTC (permalink / raw)
  To: Stefano Stabellini
  Cc: 'Kevin Tian', 'Jun Nakajima', 'Wei Liu',
	paul, 'Andrew Cooper', 'Ian Jackson',
	'George Dunlap', 'Tim Deegan',
	Oleksandr, 'Oleksandr Tyshchenko', 'Julien Grall',
	'Jan Beulich', xen-devel, 'Roger Pau Monné'



On 11/08/2020 00:34, Stefano Stabellini wrote:
> On Mon, 10 Aug 2020, Julien Grall wrote:
>> On 07/08/2020 00:48, Stefano Stabellini wrote:
>>> On Thu, 6 Aug 2020, Julien Grall wrote:
>>>> On 06/08/2020 01:37, Stefano Stabellini wrote:
>>>>> On Wed, 5 Aug 2020, Julien Grall wrote:
>>>>>> On 04/08/2020 20:11, Stefano Stabellini wrote:
>>>>>>> On Tue, 4 Aug 2020, Julien Grall wrote:
>>>>>>>> On 04/08/2020 12:10, Oleksandr wrote:
>>>>>>>>> On 04.08.20 10:45, Paul Durrant wrote:
>>>>>>>>>>> +static inline bool hvm_ioreq_needs_completion(const ioreq_t
>>>>>>>>>>> *ioreq)
>>>>>>>>>>> +{
>>>>>>>>>>> +    return ioreq->state == STATE_IOREQ_READY &&
>>>>>>>>>>> +           !ioreq->data_is_ptr &&
>>>>>>>>>>> +           (ioreq->type != IOREQ_TYPE_PIO || ioreq->dir !=
>>>>>>>>>>> IOREQ_WRITE);
>>>>>>>>>>> +}
>>>>>>>>>> I don't think having this in common code is correct. The
>>>>>>>>>> short-cut
>>>>>>>>>> of
>>>>>>>>>> not
>>>>>>>>>> completing PIO reads seems somewhat x86 specific.
>>>>>>>>
>>>>>>>> Hmmm, looking at the code, I think it doesn't wait for PIO writes
>>>>>>>> to
>>>>>>>> complete
>>>>>>>> (not read). Did I miss anything?
>>>>>>>>
>>>>>>>>> Does ARM even
>>>>>>>>>> have the concept of PIO?
>>>>>>>>>
>>>>>>>>> I am not 100% sure here, but it seems that doesn't have.
>>>>>>>>
>>>>>>>> Technically, the PIOs exist on Arm, however they are accessed the
>>>>>>>> same
>>>>>>>> way
>>>>>>>> as
>>>>>>>> MMIO and will have a dedicated area defined by the HW.
>>>>>>>>
>>>>>>>> AFAICT, on Arm64, they are only used for PCI IO Bar.
>>>>>>>>
>>>>>>>> Now the question is whether we want to expose them to the Device
>>>>>>>> Emulator
>>>>>>>> as
>>>>>>>> PIO or MMIO access. From a generic PoV, a DM shouldn't have to
>>>>>>>> care
>>>>>>>> about
>>>>>>>> the
>>>>>>>> architecture used. It should just be able to request a given
>>>>>>>> IOport
>>>>>>>> region.
>>>>>>>>
>>>>>>>> So it may make sense to differentiate them in the common ioreq
>>>>>>>> code as
>>>>>>>> well.
>>>>>>>>
>>>>>>>> I had a quick look at QEMU and wasn't able to tell if PIOs and
>>>>>>>> MMIOs
>>>>>>>> address
>>>>>>>> space are different on Arm as well. Paul, Stefano, do you know
>>>>>>>> what
>>>>>>>> they
>>>>>>>> are
>>>>>>>> doing?
>>>>>>>
>>>>>>> On the QEMU side, it looks like PIO (address_space_io) is used in
>>>>>>> connection with the emulation of the "in" or "out" instructions, see
>>>>>>> ioport.c:cpu_inb for instance. Some parts of PCI on QEMU emulate PIO
>>>>>>> space regardless of the architecture, such as
>>>>>>> hw/pci/pci_bridge.c:pci_bridge_initfn.
>>>>>>>
>>>>>>> However, because there is no "in" and "out" on ARM, I don't think
>>>>>>> address_space_io can be accessed. Specifically, there is no
>>>>>>> equivalent
>>>>>>> for target/i386/misc_helper.c:helper_inb on ARM.
>>>>>>
>>>>>> So how PCI I/O BAR are accessed? Surely, they could be used on Arm,
>>>>>> right?
>>>>>
>>>>> PIO is also memory mapped on ARM and it seems to have its own MMIO
>>>>> address window.
>>>> This part is already well-understood :). However, this only tell us how an
>>>> OS
>>>> is accessing a PIO.
>>>>
>>>> What I am trying to figure out is how the hardware (or QEMU) is meant to
>>>> work.
>>>>
>>>>   From my understanding, the MMIO access will be received by the hostbridge
>>>> and
>>>> then forwarded to the appropriate PCI device. The two questions I am
>>>> trying to
>>>> answer is: How the I/O BARs are configured? Will it contain an MMIO
>>>> address or
>>>> an offset?
>>>>
>>>> If the answer is the latter, then we will need PIO because a DM will never
>>>> see
>>>> the MMIO address (the hostbridge will be emulated in Xen).
>>>
>>> Now I understand the question :-)
>>>
>>> This is the way I understand it works. Let's say that the PIO aperture
>>> is 0x1000-0x2000 which is aliased to 0x3eff0000-0x3eff1000.
>>> 0x1000-0x2000 are addresses that cannot be accessed directly.
>>> 0x3eff0000-0x3eff1000 is the range that works.
>>>
>>> A PCI device PIO BAR will have an address in the 0x1000-0x2000 range,
>>> for instance 0x1100.

Are you sure about this?

>>>
>>> However, when the operating system access 0x1100, it will issue a read
>>> to 0x3eff0100.
>>>
>>> Xen will trap the read to 0x3eff0100 and send it to QEMU.
>>>
>>> QEMU has to know that 0x3eff0000-0x3eff1000 is the alias to the PIO
>>> aperture and that 0x3eff0100 correspond to PCI device foobar. Similarly,
>>> QEMU has also to know the address range of the MMIO aperture and its
>>> remappings, if any (it is possible to have address remapping for MMIO
>>> addresses too.)
>>>
>>> I think today this information is "built-in" QEMU, not configurable. It
>>> works fine because *I think* the PCI aperture is pretty much the same on
>>> x86 boards, at least the one supported by QEMU for Xen.
>>
>> Well on x86, the OS will access PIO using inb/outb. So the address received by
>> Xen is 0x1000-0x2000 and then forwarded to the DM using the PIO type.
>>
>>> On ARM, I think we should explicitly declare the PCI MMIO aperture and
>>> its alias/address-remapping. When we do that, we can also declare the
>>> PIO aperture and its alias/address-remapping.
>>
>> Well yes, we need to define PCI MMIO and PCI I/O region because the guest OS
>> needs to know them.
> 
> [1]
> (see below)
> 
> 
>> However, I am unsure how this would help us to solve the question whether
>> access to the PCI I/O aperture should be sent as a PIO or MMIO.
>>
>> Per what you wrote, the PCI I/O Bar would be configured with the range
>> 0x1000-0x2000. So a device emulator (this may not be QEMU and only emulate one
>> PCI device!!) will only see that range.
>>
>> How does the device-emulator then know that it needs to watch the region
>> 0x3eff0000-0x3eff1000?
> 
> It would know because the PCI PIO aperture, together with the alias, are
> specified [1].

Are you suggesting to fix it in the ABI, or to pass it as runtime 
information to the Device Emulator?

> 
> 
>> It feels to me that it would be easier/make more sense if the DM only say "I
>> want to watch the PIO range 0x1000-0x2000". So Xen would be in charge to do
>> the translation between the OS view and the DM view.
>>
>> This also means a DM would be completely arch-agnostic. This would follow the
>> HW where you can plug your PCI card on any HW.
> 
> As you know, PIO access is actually not modelled by QEMU for ARM
> targets. I worry about the long term stability of it, given that it is
> untested.  I.e. qemu-system-aarch64 could have a broken PIO emulation
> and nobody would find out except for us when we send ioreqs to it.

There are multiple references to PIO in QEMU for Arm (see 
hw/arm/virt.c). So what do you mean by "not modelled"?

> Thinking from a Xen/Emulator interface on ARM, is it wise to rely on an
> access-type that doesn't exist on the architecture?

The architecture doesn't define an instruction to access PIO, but that 
doesn't mean such accesses don't exist on the platform.

For instance, a PCI device may have an I/O BAR. On Arm64, the hostbridge 
will be responsible for translating the MMIO access into a PIO access 
for the PCI device.

I have the impression that we disagree on what the Device Emulator is 
meant to do. IMHO, the goal of the device emulator is to emulate a 
device in an arch-agnostic way.

Cheers,

-- 
Julien Grall


^ permalink raw reply	[flat|nested] 140+ messages in thread

* Re: [RFC PATCH V1 05/12] hvm/dm: Introduce xendevicemodel_set_irq_level DM op
  2020-08-10 23:34                   ` Stefano Stabellini
@ 2020-08-11 13:04                     ` Julien Grall
  2020-08-11 22:48                       ` Stefano Stabellini
  0 siblings, 1 reply; 140+ messages in thread
From: Julien Grall @ 2020-08-11 13:04 UTC (permalink / raw)
  To: Stefano Stabellini, Julien Grall
  Cc: Wei Liu, Andrew Cooper, Ian Jackson, George Dunlap,
	Oleksandr Tyshchenko, Oleksandr Tyshchenko, Julien Grall,
	Jan Beulich, xen-devel, Volodymyr Babchuk

Hi Stefano,

On 11/08/2020 00:34, Stefano Stabellini wrote:
> On Sat, 8 Aug 2020, Julien Grall wrote:
>> On Fri, 7 Aug 2020 at 22:51, Stefano Stabellini <sstabellini@kernel.org> wrote:
>>>
>>> On Fri, 7 Aug 2020, Jan Beulich wrote:
>>>> On 07.08.2020 01:49, Stefano Stabellini wrote:
>>>>> On Thu, 6 Aug 2020, Julien Grall wrote:
>>>>>> On 06/08/2020 01:37, Stefano Stabellini wrote:
>>>>>>> On Wed, 5 Aug 2020, Julien Grall wrote:
>>>>>>>> On 05/08/2020 00:22, Stefano Stabellini wrote:
>>>>>>>>> On Mon, 3 Aug 2020, Oleksandr Tyshchenko wrote:
>>>>>>>>>> From: Oleksandr Tyshchenko <oleksandr_tyshchenko@epam.com>
>>>>>>>>>>
>>>>>>>>>> This patch adds ability to the device emulator to notify otherend
>>>>>>>>>> (some entity running in the guest) using a SPI and implements Arm
>>>>>>>>>> specific bits for it. Proposed interface allows emulator to set
>>>>>>>>>> the logical level of a one of a domain's IRQ lines.
>>>>>>>>>>
>>>>>>>>>> Please note, this is a split/cleanup of Julien's PoC:
>>>>>>>>>> "Add support for Guest IO forwarding to a device emulator"
>>>>>>>>>>
>>>>>>>>>> Signed-off-by: Julien Grall <julien.grall@arm.com>
>>>>>>>>>> Signed-off-by: Oleksandr Tyshchenko <oleksandr_tyshchenko@epam.com>
>>>>>>>>>> ---
>>>>>>>>>>     tools/libs/devicemodel/core.c                   | 18
>>>>>>>>>> ++++++++++++++++++
>>>>>>>>>>     tools/libs/devicemodel/include/xendevicemodel.h |  4 ++++
>>>>>>>>>>     tools/libs/devicemodel/libxendevicemodel.map    |  1 +
>>>>>>>>>>     xen/arch/arm/dm.c                               | 22
>>>>>>>>>> +++++++++++++++++++++-
>>>>>>>>>>     xen/common/hvm/dm.c                             |  1 +
>>>>>>>>>>     xen/include/public/hvm/dm_op.h                  | 15
>>>>>>>>>> +++++++++++++++
>>>>>>>>>>     6 files changed, 60 insertions(+), 1 deletion(-)
>>>>>>>>>>
>>>>>>>>>> diff --git a/tools/libs/devicemodel/core.c
>>>>>>>>>> b/tools/libs/devicemodel/core.c
>>>>>>>>>> index 4d40639..30bd79f 100644
>>>>>>>>>> --- a/tools/libs/devicemodel/core.c
>>>>>>>>>> +++ b/tools/libs/devicemodel/core.c
>>>>>>>>>> @@ -430,6 +430,24 @@ int xendevicemodel_set_isa_irq_level(
>>>>>>>>>>         return xendevicemodel_op(dmod, domid, 1, &op, sizeof(op));
>>>>>>>>>>     }
>>>>>>>>>>     +int xendevicemodel_set_irq_level(
>>>>>>>>>> +    xendevicemodel_handle *dmod, domid_t domid, uint32_t irq,
>>>>>>>>>> +    unsigned int level)
>>>>>>>>>
>>>>>>>>> It is a pity that having xen_dm_op_set_pci_intx_level and
>>>>>>>>> xen_dm_op_set_isa_irq_level already we need to add a third one, but from
>>>>>>>>> the names alone I don't think we can reuse either of them.
>>>>>>>>
>>>>>>>> The problem is not the name...
>>>>>>>>
>>>>>>>>>
>>>>>>>>> It is very similar to set_isa_irq_level. We could almost rename
>>>>>>>>> xendevicemodel_set_isa_irq_level to xendevicemodel_set_irq_level or,
>>>>>>>>> better, just add an alias to it so that xendevicemodel_set_irq_level is
>>>>>>>>> implemented by calling xendevicemodel_set_isa_irq_level. Honestly I am
>>>>>>>>> not sure if it is worth doing it though. Any other opinions?
>>>>>>>>
>>>>>>>> ... the problem is the interrupt field is only 8-bit. So we would only be
>>>>>>>> able
>>>>>>>> to cover IRQ 0 - 255.
>>>>>>>
>>>>>>> Argh, that's not going to work :-(  I wasn't sure if it was a good idea
>>>>>>> anyway.
>>>>>>>
>>>>>>>
>>>>>>>> It is not entirely clear how the existing subop could be extended without
>>>>>>>> breaking existing callers.
>>>>>>>>
>>>>>>>>> But I think we should plan for not needing two calls (one to set level
>>>>>>>>> to 1, and one to set it to 0):
>>>>>>>>> https://marc.info/?l=xen-devel&m=159535112027405
>>>>>>>>
>>>>>>>> I am not sure to understand your suggestion here? Are you suggesting to
>>>>>>>> remove
>>>>>>>> the 'level' parameter?
>>>>>>>
>>>>>>> My hope was to make it optional to call the hypercall with level = 0,
>>>>>>> not necessarily to remove 'level' from the struct.
>>>>>>
>>>>>>  From my understanding, the hypercall is meant to represent the status of the
>>>>>> line between the device and the interrupt controller (either low or high).
>>>>>>
>>>>>> This is then up to the interrupt controller to decide when the interrupt is
>>>>>> going to be fired:
>>>>>>    - For edge interrupt, this will fire when the line move from low to high (or
>>>>>> vice versa).
>>>>>>    - For level interrupt, this will fire when line is high (assuming level
>>>>>> trigger high) and will keeping firing until the device decided to lower the
>>>>>> line.
>>>>>>
>>>>>> For a device, it is common to keep the line high until an OS wrote to a
>>>>>> specific register.
>>>>>>
>>>>>> Furthermore, technically, the guest OS is in charge to configure how an
>>>>>> interrupt is triggered. Admittely this information is part of the DT, but
>>>>>> nothing prevent a guest to change it.
>>>>>>
>>>>>> As side note, we have a workaround in Xen for some buggy DT (see the arch
>>>>>> timer) exposing the wrong trigger type.
>>>>>>
>>>>>> Because of that, I don't really see a way to make optional. Maybe you have
>>>>>> something different in mind?
>>>>>
>>>>> For level, we need the level parameter. For edge, we are only interested
>>>>> in the "edge", right?
>>>>
>>>> I don't think so, unless Arm has special restrictions. Edges can be
>>>> both rising and falling ones.
>>>
>>> And the same is true for level interrupts too: they could be active-low
>>> or active-high.
>>>
>>>
>>> Instead of modelling the state of the line, which seems to be a bit
>>> error prone especially in the case of a single-device emulator that
>>> might not have enough information about the rest of the system (it might
>>> not know if the interrupt is active-high or active-low), we could model
>>> the triggering of the interrupt instead.
>>
>> I am not sure to understand why the single (or event multiple) device
>> emulator needs to know the trigger type. The information of the
>> trigger type of the interrupt would be described in the firmware table
>> and it is expected to be the same as what the emulator expects.
>>
>> If the guest OS decided to configure wrongly the interrupt trigger
>> type, then it may not work properly. But, from my understanding, this
>> doesn't differ from the HW behavior.
>>
>>>
>>> In the case of level=1, it would mean that the interrupt line is active,
>>> no matter if it is active-low or active-high. In the case of level=0, it
>>> would mean that it is inactive.
>>>
>>> Similarly, in the case of an edge interrupt edge=1 or level=1 would mean
>>> that there is an edge, no matter if it is rising or falling.
>>
>> TBH, I think your approach is only going to introduce more headache in
>> Xen if a guest OS decides to change the trigger type.
>>
>> It feels much easier to just ask the emulator to let us know the level
>> of the line. Then if the guest OS decides to change the trigger type,
>> we only need to resample the line.
> 
> Emulators, at least the ones in QEMU, don't model the hardware so
> closely to care about trigger type. The only thing they typically care
> about is to fire a notification.

I don't think I agree with this. Devices in QEMU will set the level 
(high or low) of the line. It is then up to the interrupt controller 
to decide how to act on it. See the function qemu_set_irq().

In the case of an active-high level interrupt, the interrupt will keep 
firing until the line has been lowered.

> 
> The trigger type only comes into the picture when there is a bug or a
> disagreement between Xen and QEMU. Imagine a device that can be both
> level active-high or active-low, if the guest kernel changes the
> configuration, Xen would know about it, but QEMU wouldn't.

Let's take a step back. From my understanding, on real HW, the OS will 
have to configure the device *and* the interrupt controller in order to 
switch from level active-low to level active-high. Otherwise, there 
would be a discrepancy between the two.

In our situation, Xen is basically the interrupt controller and QEMU the 
device. So both should be aware of any change here. Did I miss anything?

>  I vaguely
> recall a bug 10+ years ago about this with QEMU on x86 and a line that
> could be both active-high and active-low. So QEMU would raise the
> interrupt but Xen would actually think that QEMU stopped the interrupt.
> 
> To do this right, we would have to introduce an interface between Xen
> and QEMU to propagate the trigger type. Xen would have to tell QEMU when
> the guest changed the configuration. That would work, but it would be
> better if we can figure out a way to do without it to reduce complexity.
Per above, I don't think this is necessary.

> 
> Instead, given that QEMU and other emulators don't actually care about
> active-high or active-low, if we have a Xen interface that just says
> "fire the interrupt" we get away from this kind of troubles. It would
> also be more efficient because the total number of hypercalls required
> would be lower.

I read "fire the interrupt" as "Please generate an interrupt once". 
Is that the definition you expect?

Cheers,

-- 
Julien Grall



* Re: [RFC PATCH V1 04/12] xen/arm: Introduce arch specific bits for IOREQ/DM features
  2020-08-05  9:32     ` Julien Grall
  2020-08-05 15:41       ` Oleksandr
  2020-08-10 18:09       ` Oleksandr
@ 2020-08-11 17:09       ` Oleksandr
  2020-08-11 17:50         ` Julien Grall
  2 siblings, 1 reply; 140+ messages in thread
From: Oleksandr @ 2020-08-11 17:09 UTC (permalink / raw)
  To: Julien Grall, Stefano Stabellini
  Cc: Wei Liu, Andrew Cooper, Ian Jackson, George Dunlap,
	Oleksandr Tyshchenko, Julien Grall, Jan Beulich, xen-devel,
	Daniel De Graaf, Volodymyr Babchuk


On 05.08.20 12:32, Julien Grall wrote:

Hi Julien, Stefano

>
>>> diff --git a/xen/include/asm-arm/p2m.h b/xen/include/asm-arm/p2m.h
>>> index 5fdb6e8..5823f11 100644
>>> --- a/xen/include/asm-arm/p2m.h
>>> +++ b/xen/include/asm-arm/p2m.h
>>> @@ -385,10 +385,11 @@ static inline int set_foreign_p2m_entry(struct 
>>> domain *d, unsigned long gfn,
>>>                                           mfn_t mfn)
>>>   {
>>>       /*
>>> -     * NOTE: If this is implemented then proper reference counting of
>>> -     *       foreign entries will need to be implemented.
>>> +     * XXX: handle properly reference. It looks like the page may 
>>> not always
>>> +     * belong to d.
>>
>> Just as a reference, and without taking away anything from the comment,
>> I think that QEMU is doing its own internal reference counting for these
>> mappings.
>
> I am not sure how this matters here? We can't really trust the DM to 
> do the right thing if it is not running in dom0.
>
> But, IIRC, the problem is some of the pages doesn't belong to do a 
> domain, so it is not possible to treat them as foreign mapping (e.g. 
> you wouldn't be able to grab a reference). This investigation was done 
> a couple of years ago, so this may have changed in recent Xen.

Well, the emulator is going to be used in a driver domain, so this TODO 
must be resolved. I suspect that the check for a hardware domain in 
acquire_resource(), which I skipped in a hackish way [1], could simply 
be removed once proper reference counting is implemented in Xen, correct?

Could you please provide some pointers on that problem? Maybe some 
questions need to be investigated again? Unfortunately, it is not 
completely clear to me which direction to follow...

***
I am wondering whether a similar problem exists on x86 as well? The 
FIXME tag (before the check for a hardware domain in acquire_resource()) 
in the common code makes me think it is a common issue. On the other 
hand, x86's implementation of set_foreign_p2m_entry() exists, unlike 
Arm's (which has returned -EOPNOTSUPP so far). Or are these unrelated?
***

[1] https://lists.xen.org/archives/html/xen-devel/2020-08/msg00075.html



-- 
Regards,

Oleksandr Tyshchenko




* Re: [RFC PATCH V1 04/12] xen/arm: Introduce arch specific bits for IOREQ/DM features
  2020-08-11 17:09       ` Oleksandr
@ 2020-08-11 17:50         ` Julien Grall
  2020-08-13 18:41           ` Oleksandr
  0 siblings, 1 reply; 140+ messages in thread
From: Julien Grall @ 2020-08-11 17:50 UTC (permalink / raw)
  To: Oleksandr, Stefano Stabellini
  Cc: Wei Liu, Andrew Cooper, Ian Jackson, George Dunlap,
	Oleksandr Tyshchenko, Julien Grall, Jan Beulich, xen-devel,
	Daniel De Graaf, Volodymyr Babchuk



On 11/08/2020 18:09, Oleksandr wrote:
> 
> On 05.08.20 12:32, Julien Grall wrote:
> 
> Hi Julien, Stefano
> 
>>
>>>> diff --git a/xen/include/asm-arm/p2m.h b/xen/include/asm-arm/p2m.h
>>>> index 5fdb6e8..5823f11 100644
>>>> --- a/xen/include/asm-arm/p2m.h
>>>> +++ b/xen/include/asm-arm/p2m.h
>>>> @@ -385,10 +385,11 @@ static inline int set_foreign_p2m_entry(struct 
>>>> domain *d, unsigned long gfn,
>>>>                                           mfn_t mfn)
>>>>   {
>>>>       /*
>>>> -     * NOTE: If this is implemented then proper reference counting of
>>>> -     *       foreign entries will need to be implemented.
>>>> +     * XXX: handle properly reference. It looks like the page may 
>>>> not always
>>>> +     * belong to d.
>>>
>>> Just as a reference, and without taking away anything from the comment,
>>> I think that QEMU is doing its own internal reference counting for these
>>> mappings.
>>
>> I am not sure how this matters here? We can't really trust the DM to 
>> do the right thing if it is not running in dom0.
>>
>> But, IIRC, the problem is some of the pages doesn't belong to do a 
>> domain, so it is not possible to treat them as foreign mapping (e.g. 
>> you wouldn't be able to grab a reference). This investigation was done 
>> a couple of years ago, so this may have changed in recent Xen.
> 
> Well, emulator is going to be used in driver domain, so this TODO must 
> be resolved. I suspect that the check for a hardware domain in 
> acquire_resource() which I skipped in a hackish way [1] could be simply 
> removed once proper reference counting is implemented in Xen, correct?

It depends on how you are going to solve it. If you manage to solve it 
in a generic way, then yes, the check could be removed. If not (i.e. it 
is solved in an arch-specific way), we would need to keep the check on 
architectures that are not able to deal with it. See more below.

> 
> Could you please provide some pointers on that problem? Maybe some 
> questions need to be investigated again? Unfortunately, it is not 
> completely clear to me the direction to follow...
> 
> ***
> I am wondering whether the similar problem exists on x86 as well?

It is somewhat different. On Arm, we are able to handle foreign 
mappings (i.e. mapping pages from another domain) properly, as we grab 
a reference on the page (see the XENMAPSPACE_gmfn_foreign handling in 
xenmem_add_to_physmap()). The reference is then released when the 
entry is removed from the P2M (see p2m_free_entry()).

If all the pages given to set_foreign_p2m_entry() belong to a domain, 
then you could use the same approach.

However, I remember running into some issues in some of the cases. I 
had a quick look at the callers and wasn't able to find any use case 
that would be an issue.

The refcounting in the IOREQ code has changed after XSA-276 (this was 
found while working on the Arm port). Probably the best way to figure 
out if it works would be to try it and see if it fails.

Note that set_foreign_p2m_entry() doesn't have a parameter for the 
foreign domain. You would need to add an extra parameter for this.

> The 
> FIXME tag (before checking for a hardware domain in acquire_resource()) 
> in the common code makes me think it is a common issue. From other side 
> x86's
> implementation of set_foreign_p2m_entry() is exists unlike Arm's one 
> (which returned -EOPNOTSUPP so far). Or these are unrelated?

At the moment, x86 doesn't support refcounting for foreign mapping. 
Hence the reason to restrict them to the hardware domain.

> ***
> 
> [1] https://lists.xen.org/archives/html/xen-devel/2020-08/msg00075.html
Cheers,

-- 
Julien Grall



* Re: [RFC PATCH V1 01/12] hvm/ioreq: Make x86's IOREQ feature common
  2020-08-11 10:10                           ` Oleksandr
@ 2020-08-11 22:47                             ` Stefano Stabellini
  2020-08-12 14:35                               ` Oleksandr
  0 siblings, 1 reply; 140+ messages in thread
From: Stefano Stabellini @ 2020-08-11 22:47 UTC (permalink / raw)
  To: Oleksandr
  Cc: 'Kevin Tian',
	Stefano Stabellini, Julien Grall, 'Jun Nakajima',
	'Wei Liu', paul, 'Andrew Cooper',
	'Ian Jackson', 'George Dunlap',
	'Tim Deegan', 'Oleksandr Tyshchenko',
	'Julien Grall',
	Jan Beulich, xen-devel, 'Roger Pau Monné'


On Tue, 11 Aug 2020, Oleksandr wrote:
> On 11.08.20 12:19, Julien Grall wrote:
> > On 11/08/2020 00:34, Stefano Stabellini wrote:
> > > On Mon, 10 Aug 2020, Oleksandr wrote:
> > > > On 08.08.20 01:19, Oleksandr wrote:
> > > > > On 08.08.20 00:50, Stefano Stabellini wrote:
> > > > > > On Fri, 7 Aug 2020, Oleksandr wrote:
> > > > > > > On 06.08.20 03:37, Stefano Stabellini wrote:
> > > > > > > 
> > > > > > > Hi Stefano
> > > > > > > 
> > > > > > > Trying to simulate IO_RETRY handling mechanism (according to model
> > > > > > > below) I
> > > > > > > continuously get IO_RETRY from try_fwd_ioserv() ...
> > > > > > > 
> > > > > > > > OK, thanks for the details. My interpretation seems to be
> > > > > > > > correct.
> > > > > > > > 
> > > > > > > > In which case, it looks like xen/arch/arm/io.c:try_fwd_ioserv
> > > > > > > > should
> > > > > > > > return IO_RETRY. Then,
> > > > > > > > xen/arch/arm/traps.c:do_trap_stage2_abort_guest
> > > > > > > > also needs to handle try_handle_mmio returning IO_RETRY the
> > > > > > > > first
> > > > > > > > around, and IO_HANDLED when after QEMU does its job.
> > > > > > > > 
> > > > > > > > What should do_trap_stage2_abort_guest do on IO_RETRY? Simply
> > > > > > > > return
> > > > > > > > early and let the scheduler do its job? Something like:
> > > > > > > > 
> > > > > > > >                enum io_state state = try_handle_mmio(regs, hsr,
> > > > > > > > gpa);
> > > > > > > > 
> > > > > > > >                switch ( state )
> > > > > > > >                {
> > > > > > > >                case IO_ABORT:
> > > > > > > >                    goto inject_abt;
> > > > > > > >                case IO_HANDLED:
> > > > > > > >                    advance_pc(regs, hsr);
> > > > > > > >                    return;
> > > > > > > >                case IO_RETRY:
> > > > > > > >                    /* finish later */
> > > > > > > >                    return;
> > > > > > > >                case IO_UNHANDLED:
> > > > > > > >                    /* IO unhandled, try another way to handle
> > > > > > > > it. */
> > > > > > > >                    break;
> > > > > > > >                default:
> > > > > > > >                    ASSERT_UNREACHABLE();
> > > > > > > >                }
> > > > > > > > 
> > > > > > > > Then, xen/arch/arm/ioreq.c:handle_mmio() gets called by
> > > > > > > > handle_hvm_io_completion() after QEMU completes the emulation.
> > > > > > > > Today,
> > > > > > > > handle_mmio just sets the user register with the read value.
> > > > > > > > 
> > > > > > > > But it would be better if it called again the original function
> > > > > > > > do_trap_stage2_abort_guest to actually retry the original
> > > > > > > > operation.
> > > > > > > > This time do_trap_stage2_abort_guest calls try_handle_mmio() and
> > > > > > > > gets
> > > > > > > > IO_HANDLED instead of IO_RETRY,
> > > > > > > I may miss some important point, but I failed to see why
> > > > > > > try_handle_mmio
> > > > > > > (try_fwd_ioserv) will return IO_HANDLED instead of IO_RETRY at
> > > > > > > this
> > > > > > > stage.
> > > > > > > Or current try_fwd_ioserv() logic needs rework?
> > > > > > I think you should check the ioreq->state in try_fwd_ioserv(), if
> > > > > > the
> > > > > > result is ready, then ioreq->state should be STATE_IORESP_READY, and
> > > > > > you
> > > > > > can return IO_HANDLED.
> > > > > 
> > > > 
> > > > I optimized test patch a bit (now it looks much simpler). I didn't face
> > > > any
> > > > issues during a quick test.
> > > 
> > > Both patches get much closer to following the proper state machine,
> > > great! I think this patch is certainly a good improvement. I think the
> > > other patch you sent earlier, slightly larger, is even better. It makes
> > > the following additional changes that would be good to have:
> > > 
> > > - try_fwd_ioserv returns IO_HANDLED on state == STATE_IORESP_READY
> > > - handle_mmio simply calls do_trap_stage2_abort_guest
> > 
> > I don't think we should call do_trap_stage2_abort_guest() as part of the
> > completion because:
> >     * The function do_trap_stage2_abort_guest() is using registers that are
> > not context switched (such as FAR_EL2). I/O handling is split in two with
> > likely a context switch in the middle. The second part is the completion
> > (i.e call to handle_mmio()). So the system registers will be incorrect.
> >     * A big chunk of do_trap_stage2_abort_guest() is not necessary for the
> > completion. For instance, there is no need to try to translate the guest
> > virtual address to a guest physical address.
> > 
> > Therefore the version below is probably the best approach.
> 
> 
Indeed, the first version (calling do_trap_stage2_abort_guest() for 
the completion) is racy. When testing it more heavily I sometimes 
faced an issue which resulted in the DomU getting stuck completely.
> 
> (XEN) d2v1: vGICD: bad read width 0 r11 offset 0x000f00
> 
I didn't investigate the issue in detail, but I assumed that code in
do_trap_stage2_abort_guest() caused it. This was the main reason why I
decided to optimize the initial patch (and kept only advance_pc).
Reading Julien's answer, I now understand what could happen.

From your and Julien's feedback it is clear that calling
do_trap_stage2_abort_guest() is not possible and not a good idea.


The reason for my suggestion was to complete the implementation of the
state machine so that "RETRY" actually means "let's try again the
emulation" but the second time it will return "HANDLED".

Looking at this again, we could achieve the same goal in a better way by
moving the register setting from "handle_mmio" to "try_handle_mmio" and
also calling "try_handle_mmio" from "handle_mmio". Note that handle_mmio
would become almost empty like on x86.

1) do_trap_stage2_abort_guest ->
       try_handle_mmio ->
            try_fwd_ioserv ->
                IO_RETRY

2) handle_hvm_io_completion ->
       handle_mmio ->
           try_handle_mmio ->
               try_fwd_ioserv ->
                   IO_HANDLED


It is very similar to your second patch with a small change on calling
try_handle_mmio from handle_mmio and setting the register there. Do you
think that would work?


* Re: [RFC PATCH V1 01/12] hvm/ioreq: Make x86's IOREQ feature common
  2020-08-11 11:28                       ` Julien Grall
@ 2020-08-11 22:48                         ` Stefano Stabellini
  2020-08-12  8:19                           ` Julien Grall
  0 siblings, 1 reply; 140+ messages in thread
From: Stefano Stabellini @ 2020-08-11 22:48 UTC (permalink / raw)
  To: Julien Grall
  Cc: 'Kevin Tian', Stefano Stabellini, 'Jun Nakajima',
	'Wei Liu', paul, 'Andrew Cooper',
	'Ian Jackson', 'George Dunlap',
	'Tim Deegan', Oleksandr, 'Oleksandr Tyshchenko',
	'Julien Grall', 'Jan Beulich',
	xen-devel, 'Roger Pau Monné'


On Tue, 11 Aug 2020, Julien Grall wrote:
> On 11/08/2020 00:34, Stefano Stabellini wrote:
> > On Mon, 10 Aug 2020, Julien Grall wrote:
> > > On 07/08/2020 00:48, Stefano Stabellini wrote:
> > > > On Thu, 6 Aug 2020, Julien Grall wrote:
> > > > > On 06/08/2020 01:37, Stefano Stabellini wrote:
> > > > > > On Wed, 5 Aug 2020, Julien Grall wrote:
> > > > > > > On 04/08/2020 20:11, Stefano Stabellini wrote:
> > > > > > > > On Tue, 4 Aug 2020, Julien Grall wrote:
> > > > > > > > > On 04/08/2020 12:10, Oleksandr wrote:
> > > > > > > > > > On 04.08.20 10:45, Paul Durrant wrote:
> > > > > > > > > > > > +static inline bool hvm_ioreq_needs_completion(const
> > > > > > > > > > > > ioreq_t
> > > > > > > > > > > > *ioreq)
> > > > > > > > > > > > +{
> > > > > > > > > > > > +    return ioreq->state == STATE_IOREQ_READY &&
> > > > > > > > > > > > +           !ioreq->data_is_ptr &&
> > > > > > > > > > > > +           (ioreq->type != IOREQ_TYPE_PIO || ioreq->dir
> > > > > > > > > > > > !=
> > > > > > > > > > > > IOREQ_WRITE);
> > > > > > > > > > > > +}
> > > > > > > > > > > I don't think having this in common code is correct. The
> > > > > > > > > > > short-cut
> > > > > > > > > > > of
> > > > > > > > > > > not
> > > > > > > > > > > completing PIO reads seems somewhat x86 specific.
> > > > > > > > > 
> > > > > > > > > Hmmm, looking at the code, I think it doesn't wait for PIO
> > > > > > > > > writes
> > > > > > > > > to
> > > > > > > > > complete
> > > > > > > > > (not read). Did I miss anything?
> > > > > > > > > 
> > > > > > > > > > Does ARM even
> > > > > > > > > > > have the concept of PIO?
> > > > > > > > > > 
> > > > > > > > > > I am not 100% sure here, but it seems that doesn't have.
> > > > > > > > > 
> > > > > > > > > Technically, the PIOs exist on Arm, however they are accessed
> > > > > > > > > the same way as MMIO and will have a dedicated area defined by
> > > > > > > > > the HW.
> > > > > > > > > 
> > > > > > > > > AFAICT, on Arm64, they are only used for PCI I/O BARs.
> > > > > > > > > 
> > > > > > > > > Now the question is whether we want to expose them to the
> > > > > > > > > Device Emulator as PIO or MMIO accesses. From a generic PoV, a
> > > > > > > > > DM shouldn't have to care about the architecture used. It
> > > > > > > > > should just be able to request a given IO port region.
> > > > > > > > > 
> > > > > > > > > So it may make sense to differentiate them in the common ioreq
> > > > > > > > > code as well.
> > > > > > > > > 
> > > > > > > > > I had a quick look at QEMU and wasn't able to tell if the PIO
> > > > > > > > > and MMIO address spaces are different on Arm as well. Paul,
> > > > > > > > > Stefano, do you know what they are doing?
> > > > > > > > 
> > > > > > > > On the QEMU side, it looks like PIO (address_space_io) is used
> > > > > > > > in connection with the emulation of the "in" or "out"
> > > > > > > > instructions, see ioport.c:cpu_inb for instance. Some parts of
> > > > > > > > PCI on QEMU emulate PIO space regardless of the architecture,
> > > > > > > > such as hw/pci/pci_bridge.c:pci_bridge_initfn.
> > > > > > > > 
> > > > > > > > However, because there is no "in" and "out" on ARM, I don't
> > > > > > > > think address_space_io can be accessed. Specifically, there is
> > > > > > > > no equivalent for target/i386/misc_helper.c:helper_inb on ARM.
> > > > > > > 
> > > > > > > So how are PCI I/O BARs accessed? Surely, they could be used on
> > > > > > > Arm, right?
> > > > > > 
> > > > > > PIO is also memory mapped on ARM and it seems to have its own MMIO
> > > > > > address window.
> > > > > This part is already well-understood :). However, this only tells us
> > > > > how an OS accesses a PIO.
> > > > > 
> > > > > What I am trying to figure out is how the hardware (or QEMU) is meant
> > > > > to work.
> > > > > 
> > > > >   From my understanding, the MMIO access will be received by the
> > > > > hostbridge and then forwarded to the appropriate PCI device. The two
> > > > > questions I am trying to answer are: how are the I/O BARs configured?
> > > > > Will they contain an MMIO address or an offset?
> > > > > 
> > > > > If the answer is the latter, then we will need PIO because a DM will
> > > > > never see the MMIO address (the hostbridge will be emulated in Xen).
> > > > 
> > > > Now I understand the question :-)
> > > > 
> > > > This is the way I understand it works. Let's say that the PIO aperture
> > > > is 0x1000-0x2000 which is aliased to 0x3eff0000-0x3eff1000.
> > > > 0x1000-0x2000 are addresses that cannot be accessed directly.
> > > > 0x3eff0000-0x3eff1000 is the range that works.
> > > > 
> > > > A PCI device PIO BAR will have an address in the 0x1000-0x2000 range,
> > > > for instance 0x1100.
> 
> Are you sure about this?

I am pretty sure, but only from reading the code. It would be great if
somebody ran QEMU and actually tested it. This is important because it
could make the whole discussion moot :-)


> > > > However, when the operating system accesses 0x1100, it will issue a
> > > > read to 0x3eff0100.
> > > > 
> > > > Xen will trap the read to 0x3eff0100 and send it to QEMU.
> > > > 
> > > > QEMU has to know that 0x3eff0000-0x3eff1000 is the alias to the PIO
> > > > aperture and that 0x3eff0100 corresponds to PCI device foobar. Similarly,
> > > > QEMU has also to know the address range of the MMIO aperture and its
> > > > remappings, if any (it is possible to have address remapping for MMIO
> > > > addresses too.)
> > > > 
> > > > I think today this information is "built-in" QEMU, not configurable. It
> > > > works fine because *I think* the PCI aperture is pretty much the same on
> > > > x86 boards, at least the one supported by QEMU for Xen.
> > > 
> > > Well, on x86 the OS will access PIO using inb/outb. So the address
> > > received by Xen is in 0x1000-0x2000 and is then forwarded to the DM using
> > > the PIO type.
> > > 
> > > > On ARM, I think we should explicitly declare the PCI MMIO aperture and
> > > > its alias/address-remapping. When we do that, we can also declare the
> > > > PIO aperture and its alias/address-remapping.
> > > 
> > > Well yes, we need to define the PCI MMIO and PCI I/O regions because the
> > > guest OS needs to know them.
> > 
> > [1]
> > (see below)
> > 
> > 
> > > However, I am unsure how this would help us solve the question of whether
> > > access to the PCI I/O aperture should be sent as a PIO or MMIO.
> > > 
> > > Per what you wrote, the PCI I/O BAR would be configured with the range
> > > 0x1000-0x2000. So a device emulator (this may not be QEMU and may only
> > > emulate one PCI device!) will only see that range.
> > > 
> > > How does the device-emulator then know that it needs to watch the region
> > > 0x3eff0000-0x3eff1000?
> > 
> > It would know because the PCI PIO aperture, together with the alias, are
> > specified [1].
> 
> Are you suggesting fixing it in the ABI or passing it as runtime information
> to the Device Emulator?

I am suggesting "fixing" it in the ABI. Whether we pass it at
runtime or not is less important, I think.
 
 
> > > It feels to me that it would be easier/make more sense if the DM only
> > > said "I want to watch the PIO range 0x1000-0x2000". Xen would then be in
> > > charge of doing the translation between the OS view and the DM view.
> > > 
> > > This also means a DM would be completely arch-agnostic. This would follow
> > > the
> > > HW where you can plug your PCI card on any HW.
> > 
> > As you know, PIO access is actually not modelled by QEMU for ARM
> > targets. I worry about the long term stability of it, given that it is
> > untested.  I.e. qemu-system-aarch64 could have a broken PIO emulation
> > and nobody would find out except for us when we send ioreqs to it.
> 
> There are multiple references of PIO in the QEMU for Arm (see hw/arm/virt.c).
> So what do you mean by not modelled?

I mean that PIO is only emulated as MMIO region, not as port-mapped I/O.


> > Thinking about the Xen/emulator interface on ARM, is it wise to rely on an
> > access type that doesn't exist on the architecture?
> 
> The architecture doesn't define an instruction to access PIO, however this
> doesn't mean such access doesn't exist on the platform.
> 
> For instance, a PCI device may have an I/O BAR. On Arm64, the hostbridge will
> be responsible for translating the MMIO access into a PIO access for the PCI
> device.

As far as I understand, the host bridge is responsible for any
translations between the host address space and the address space of
devices. Even for MMIO addresses there are translations. The hostbridge
doesn't do any port-mapped I/O to communicate with the device; there is
a different protocol for those communications. Port-mapped I/O is how it
is exposed to the host by the hostbridge.

If we wanted to emulate the hostbridge/device interface properly we
would end up with something pretty different.


> I have the impression that we disagree on what the Device Emulator is meant
> to do. IMHO, the goal of the device emulator is to emulate a device in an
> arch-agnostic way.

That would be great in theory but I am not sure it is achievable: if we
use an existing emulator like QEMU, even a single device has to fit
into QEMU's view of the world, which makes assumptions about host
bridges and apertures. It is impossible today to build QEMU in an
arch-agnostic way; it has to be tied to an architecture.

I realize we are not building this interface for QEMU specifically, but
even if we try to make the interface arch-agnostic, in reality the
emulators won't be arch-agnostic. If we send a port-mapped I/O request
to qemu-system-aarch64, who knows what is going to happen: it is a code
path that is not explicitly tested.

^ permalink raw reply	[flat|nested] 140+ messages in thread

* Re: [RFC PATCH V1 05/12] hvm/dm: Introduce xendevicemodel_set_irq_level DM op
  2020-08-11 13:04                     ` Julien Grall
@ 2020-08-11 22:48                       ` Stefano Stabellini
  2020-08-18  9:31                         ` Julien Grall
  0 siblings, 1 reply; 140+ messages in thread
From: Stefano Stabellini @ 2020-08-11 22:48 UTC (permalink / raw)
  To: Julien Grall
  Cc: Stefano Stabellini, Wei Liu, Andrew Cooper, Ian Jackson,
	George Dunlap, Oleksandr Tyshchenko, Oleksandr Tyshchenko,
	Julien Grall, Jan Beulich, xen-devel, Volodymyr Babchuk,
	Julien Grall

On Tue, 11 Aug 2020, Julien Grall wrote:
> On 11/08/2020 00:34, Stefano Stabellini wrote:
> > On Sat, 8 Aug 2020, Julien Grall wrote:
> > > On Fri, 7 Aug 2020 at 22:51, Stefano Stabellini <sstabellini@kernel.org>
> > > wrote:
> > > > 
> > > > On Fri, 7 Aug 2020, Jan Beulich wrote:
> > > > > On 07.08.2020 01:49, Stefano Stabellini wrote:
> > > > > > On Thu, 6 Aug 2020, Julien Grall wrote:
> > > > > > > On 06/08/2020 01:37, Stefano Stabellini wrote:
> > > > > > > > On Wed, 5 Aug 2020, Julien Grall wrote:
> > > > > > > > > On 05/08/2020 00:22, Stefano Stabellini wrote:
> > > > > > > > > > On Mon, 3 Aug 2020, Oleksandr Tyshchenko wrote:
> > > > > > > > > > > From: Oleksandr Tyshchenko <oleksandr_tyshchenko@epam.com>
> > > > > > > > > > > 
> > > > > > > > > > > This patch adds the ability for the device emulator to
> > > > > > > > > > > notify the other end (some entity running in the guest)
> > > > > > > > > > > using an SPI and implements the Arm-specific bits for it.
> > > > > > > > > > > The proposed interface allows the emulator to set the
> > > > > > > > > > > logical level of one of a domain's IRQ lines.
> > > > > > > > > > > 
> > > > > > > > > > > Please note, this is a split/cleanup of Julien's PoC:
> > > > > > > > > > > "Add support for Guest IO forwarding to a device emulator"
> > > > > > > > > > > 
> > > > > > > > > > > Signed-off-by: Julien Grall <julien.grall@arm.com>
> > > > > > > > > > > Signed-off-by: Oleksandr Tyshchenko
> > > > > > > > > > > <oleksandr_tyshchenko@epam.com>
> > > > > > > > > > > ---
> > > > > > > > > > >     tools/libs/devicemodel/core.c                   | 18
> > > > > > > > > > > ++++++++++++++++++
> > > > > > > > > > >     tools/libs/devicemodel/include/xendevicemodel.h |  4
> > > > > > > > > > > ++++
> > > > > > > > > > >     tools/libs/devicemodel/libxendevicemodel.map    |  1 +
> > > > > > > > > > >     xen/arch/arm/dm.c                               | 22
> > > > > > > > > > > +++++++++++++++++++++-
> > > > > > > > > > >     xen/common/hvm/dm.c                             |  1 +
> > > > > > > > > > >     xen/include/public/hvm/dm_op.h                  | 15
> > > > > > > > > > > +++++++++++++++
> > > > > > > > > > >     6 files changed, 60 insertions(+), 1 deletion(-)
> > > > > > > > > > > 
> > > > > > > > > > > diff --git a/tools/libs/devicemodel/core.c
> > > > > > > > > > > b/tools/libs/devicemodel/core.c
> > > > > > > > > > > index 4d40639..30bd79f 100644
> > > > > > > > > > > --- a/tools/libs/devicemodel/core.c
> > > > > > > > > > > +++ b/tools/libs/devicemodel/core.c
> > > > > > > > > > > @@ -430,6 +430,24 @@ int xendevicemodel_set_isa_irq_level(
> > > > > > > > > > >         return xendevicemodel_op(dmod, domid, 1, &op,
> > > > > > > > > > > sizeof(op));
> > > > > > > > > > >     }
> > > > > > > > > > >     +int xendevicemodel_set_irq_level(
> > > > > > > > > > > +    xendevicemodel_handle *dmod, domid_t domid, uint32_t
> > > > > > > > > > > irq,
> > > > > > > > > > > +    unsigned int level)
> > > > > > > > > > 
> > > > > > > > > > It is a pity that, having xen_dm_op_set_pci_intx_level and
> > > > > > > > > > xen_dm_op_set_isa_irq_level already, we need to add a third
> > > > > > > > > > one, but from the names alone I don't think we can reuse
> > > > > > > > > > either of them.
> > > > > > > > > 
> > > > > > > > > The problem is not the name...
> > > > > > > > > 
> > > > > > > > > > 
> > > > > > > > > > It is very similar to set_isa_irq_level. We could almost
> > > > > > > > > > rename
> > > > > > > > > > xendevicemodel_set_isa_irq_level to
> > > > > > > > > > xendevicemodel_set_irq_level or,
> > > > > > > > > > better, just add an alias to it so that
> > > > > > > > > > xendevicemodel_set_irq_level is
> > > > > > > > > > implemented by calling xendevicemodel_set_isa_irq_level.
> > > > > > > > > > Honestly I am
> > > > > > > > > > not sure if it is worth doing it though. Any other opinions?
> > > > > > > > > 
> > > > > > > > > ... the problem is the interrupt field is only 8-bit. So we
> > > > > > > > > would only be
> > > > > > > > > able
> > > > > > > > > to cover IRQ 0 - 255.
> > > > > > > > 
> > > > > > > > Argh, that's not going to work :-(  I wasn't sure if it was a
> > > > > > > > good idea
> > > > > > > > anyway.
> > > > > > > > 
> > > > > > > > 
> > > > > > > > > It is not entirely clear how the existing subop could be
> > > > > > > > > extended without
> > > > > > > > > breaking existing callers.
> > > > > > > > > 
> > > > > > > > > > But I think we should plan for not needing two calls (one to
> > > > > > > > > > set level
> > > > > > > > > > to 1, and one to set it to 0):
> > > > > > > > > > https://marc.info/?l=xen-devel&m=159535112027405
> > > > > > > > > 
> > > > > > > > > I am not sure I understand your suggestion here. Are you
> > > > > > > > > suggesting removing the 'level' parameter?
> > > > > > > > 
> > > > > > > > My hope was to make it optional to call the hypercall with level
> > > > > > > > = 0,
> > > > > > > > not necessarily to remove 'level' from the struct.
> > > > > > > 
> > > > > > >  From my understanding, the hypercall is meant to represent the
> > > > > > > status of the line between the device and the interrupt controller
> > > > > > > (either low or high).
> > > > > > > 
> > > > > > > It is then up to the interrupt controller to decide when the
> > > > > > > interrupt is going to be fired:
> > > > > > >    - For an edge interrupt, this will fire when the line moves
> > > > > > > from low to high (or vice versa).
> > > > > > >    - For a level interrupt, this will fire when the line is high
> > > > > > > (assuming level-triggered high) and will keep firing until the
> > > > > > > device decides to lower the line.
> > > > > > > 
> > > > > > > For a device, it is common to keep the line high until the OS
> > > > > > > writes to a specific register.
> > > > > > > 
> > > > > > > Furthermore, technically, the guest OS is in charge of
> > > > > > > configuring how an interrupt is triggered. Admittedly this
> > > > > > > information is part of the DT, but nothing prevents a guest from
> > > > > > > changing it.
> > > > > > > 
> > > > > > > As a side note, we have a workaround in Xen for some buggy DTs
> > > > > > > (see the arch timer) exposing the wrong trigger type.
> > > > > > > 
> > > > > > > Because of that, I don't really see a way to make it optional.
> > > > > > > Maybe you have something different in mind?
> > > > > > 
> > > > > > For level, we need the level parameter. For edge, we are only
> > > > > > interested
> > > > > > in the "edge", right?
> > > > > 
> > > > > I don't think so, unless Arm has special restrictions. Edges can be
> > > > > both rising and falling ones.
> > > > 
> > > > And the same is true for level interrupts too: they could be active-low
> > > > or active-high.
> > > > 
> > > > 
> > > > Instead of modelling the state of the line, which seems to be a bit
> > > > error prone especially in the case of a single-device emulator that
> > > > might not have enough information about the rest of the system (it might
> > > > not know if the interrupt is active-high or active-low), we could model
> > > > the triggering of the interrupt instead.
> > > 
> > > I am not sure I understand why a single- (or even multiple-) device
> > > emulator needs to know the trigger type. The information about the
> > > trigger type of the interrupt would be described in the firmware table
> > > and it is expected to be the same as what the emulator expects.
> > > 
> > > If the guest OS decides to configure the interrupt trigger type wrongly,
> > > then it may not work properly. But, from my understanding, this
> > > doesn't differ from the HW behavior.
> > > 
> > > > 
> > > > In the case of level=1, it would mean that the interrupt line is active,
> > > > no matter if it is active-low or active-high. In the case of level=0, it
> > > > would mean that it is inactive.
> > > > 
> > > > Similarly, in the case of an edge interrupt edge=1 or level=1 would mean
> > > > that there is an edge, no matter if it is rising or falling.
> > > 
> > > TBH, I think your approach is only going to introduce more headache in
> > > Xen if a guest OS decides to change the trigger type.
> > > 
> > > It feels much easier to just ask the emulator to let us know the level
> > > of the line. Then if the guest OS decides to change the trigger type,
> > > we only need to resample the line.
> > 
> > Emulators, at least the ones in QEMU, don't model the hardware closely
> > enough to care about the trigger type. The only thing they typically care
> > about is firing a notification.
> 
> I don't think I agree with this. Devices in QEMU will set the level (high or
> low) of the line. It is then up to the interrupt controller to decide how to
> act on it. See the function qemu_set_irq().
> 
> In the case of an active-high level interrupt, the interrupt would keep
> firing until the line has been lowered.
> 
> > 
> > The trigger type only comes into the picture when there is a bug or a
> > disagreement between Xen and QEMU. Imagine a device that can be both
> > level active-high or active-low, if the guest kernel changes the
> > configuration, Xen would know about it, but QEMU wouldn't.
> 
> Let's take a step back. From my understanding, on real HW, the OS will have
> to configure the device *and* the interrupt controller in order to switch
> from level active-low to level active-high. Otherwise, there would be a
> discrepancy between the two.
> 
> In our situation, Xen is basically the interrupt controller and QEMU the
> device. So both should be aware of any change here. Did I miss anything?

What you wrote looks correct. So now I wonder how they went out of sync
that time. Maybe it was something x86-specific that cannot happen on
ARM? Or maybe it was just a bug in the interrupt controller emulator or QEMU.


> >  I vaguely
> > recall a bug 10+ years ago about this with QEMU on x86 and a line that
> > could be both active-high and active-low. So QEMU would raise the
> > interrupt but Xen would actually think that QEMU stopped the interrupt.
> > 
> > To do this right, we would have to introduce an interface between Xen
> > and QEMU to propagate the trigger type. Xen would have to tell QEMU when
> > the guest changed the configuration. That would work, but it would be
> > better if we can figure out a way to do without it to reduce complexity.
> Per above, I don't think this is necessary.
>
> > 
> > Instead, given that QEMU and other emulators don't actually care about
> > active-high or active-low, if we have a Xen interface that just says
> > "fire the interrupt" we get away from this kind of troubles. It would
> > also be more efficient because the total number of hypercalls required
> > would be lower.
> 
> I read "fire interrupt" the interrupt as "Please generate an interrupt once".
> Is it what you definition you expect?

Yes, that is the idea. It would have to take into account the edge/level
semantic difference: level would have a "start it" and a "stop it".


^ permalink raw reply	[flat|nested] 140+ messages in thread

* Re: [RFC PATCH V1 01/12] hvm/ioreq: Make x86's IOREQ feature common
  2020-08-11 22:48                         ` Stefano Stabellini
@ 2020-08-12  8:19                           ` Julien Grall
  2020-08-20 19:14                             ` Oleksandr
  0 siblings, 1 reply; 140+ messages in thread
From: Julien Grall @ 2020-08-12  8:19 UTC (permalink / raw)
  To: Stefano Stabellini
  Cc: 'Kevin Tian', 'Jun Nakajima', 'Wei Liu',
	paul, 'Andrew Cooper', 'Ian Jackson',
	'George Dunlap', 'Tim Deegan',
	Oleksandr, 'Oleksandr Tyshchenko', 'Julien Grall',
	'Jan Beulich', xen-devel, 'Roger Pau Monné'

Hi,

On 11/08/2020 23:48, Stefano Stabellini wrote:
>> I have the impression that we disagree on what the Device Emulator is meant
>> to do. IMHO, the goal of the device emulator is to emulate a device in an
>> arch-agnostic way.
> 
> That would be great in theory but I am not sure it is achievable: if we
> use an existing emulator like QEMU, even a single device has to fit
> into QEMU's view of the world, which makes assumptions about host
> bridges and apertures. It is impossible today to build QEMU in an
> arch-agnostic way; it has to be tied to an architecture.

AFAICT, the only reason QEMU cannot be built in an arch-agnostic way is 
because of TCG. If this wasn't built in, then you could easily write a 
machine that doesn't depend on the instruction set.

The proof is that, today, we are using QEMU x86 to serve Arm64 guests, 
although this is only for PV drivers.

> 
> I realize we are not building this interface for QEMU specifically, but
> even if we try to make the interface arch-agnostic, in reality the
> emulators won't be arch-agnostic.

This depends on your goal. If your goal is to write a standalone 
emulator for a single device, then it is entirely possible to make it 
arch-agnostic.

Per above, this would even be possible if you were emulating a set of 
devices.

What I want to avoid is requiring all the emulators to contain 
arch-specific code just because it is easier to get QEMU working on Xen 
on Arm.

> If we send a port-mapped I/O request
> to qemu-system-aarch64, who knows what is going to happen: it is a code
> path that is not explicitly tested.

Maybe, maybe not. To me these are mostly software issues that can easily 
be mitigated if we do proper testing...

Cheers,

-- 
Julien Grall


^ permalink raw reply	[flat|nested] 140+ messages in thread

* Re: [RFC PATCH V1 01/12] hvm/ioreq: Make x86's IOREQ feature common
  2020-08-11 22:47                             ` Stefano Stabellini
@ 2020-08-12 14:35                               ` Oleksandr
  2020-08-12 23:08                                 ` Stefano Stabellini
  0 siblings, 1 reply; 140+ messages in thread
From: Oleksandr @ 2020-08-12 14:35 UTC (permalink / raw)
  To: Stefano Stabellini
  Cc: Julien Grall, Jan Beulich, paul, xen-devel,
	'Oleksandr Tyshchenko', 'Andrew Cooper',
	'Wei Liu', 'Roger Pau Monné',
	'George Dunlap', 'Ian Jackson',
	'Jun Nakajima', 'Kevin Tian',
	'Tim Deegan', 'Julien Grall'


On 12.08.20 01:47, Stefano Stabellini wrote:

Hi Stefano

> On Tue, 11 Aug 2020, Oleksandr wrote:
>> On 11.08.20 12:19, Julien Grall wrote:
>>> On 11/08/2020 00:34, Stefano Stabellini wrote:
>>>> On Mon, 10 Aug 2020, Oleksandr wrote:
>>>>> On 08.08.20 01:19, Oleksandr wrote:
>>>>>> On 08.08.20 00:50, Stefano Stabellini wrote:
>>>>>>> On Fri, 7 Aug 2020, Oleksandr wrote:
>>>>>>>> On 06.08.20 03:37, Stefano Stabellini wrote:
>>>>>>>>
>>>>>>>> Hi Stefano
>>>>>>>>
>>>>>>>> Trying to simulate the IO_RETRY handling mechanism (according to the
>>>>>>>> model below) I continuously get IO_RETRY from try_fwd_ioserv() ...
>>>>>>>>
>>>>>>>>> OK, thanks for the details. My interpretation seems to be
>>>>>>>>> correct.
>>>>>>>>>
>>>>>>>>> In which case, it looks like xen/arch/arm/io.c:try_fwd_ioserv
>>>>>>>>> should
>>>>>>>>> return IO_RETRY. Then,
>>>>>>>>> xen/arch/arm/traps.c:do_trap_stage2_abort_guest
>>>>>>>>> also needs to handle try_handle_mmio returning IO_RETRY the
>>>>>>>>> first
>>>>>>>>> around, and IO_HANDLED when after QEMU does its job.
>>>>>>>>>
>>>>>>>>> What should do_trap_stage2_abort_guest do on IO_RETRY? Simply
>>>>>>>>> return
>>>>>>>>> early and let the scheduler do its job? Something like:
>>>>>>>>>
>>>>>>>>>                 enum io_state state = try_handle_mmio(regs, hsr,
>>>>>>>>> gpa);
>>>>>>>>>
>>>>>>>>>                 switch ( state )
>>>>>>>>>                 {
>>>>>>>>>                 case IO_ABORT:
>>>>>>>>>                     goto inject_abt;
>>>>>>>>>                 case IO_HANDLED:
>>>>>>>>>                     advance_pc(regs, hsr);
>>>>>>>>>                     return;
>>>>>>>>>                 case IO_RETRY:
>>>>>>>>>                     /* finish later */
>>>>>>>>>                     return;
>>>>>>>>>                 case IO_UNHANDLED:
>>>>>>>>>                     /* IO unhandled, try another way to handle
>>>>>>>>> it. */
>>>>>>>>>                     break;
>>>>>>>>>                 default:
>>>>>>>>>                     ASSERT_UNREACHABLE();
>>>>>>>>>                 }
>>>>>>>>>
>>>>>>>>> Then, xen/arch/arm/ioreq.c:handle_mmio() gets called by
>>>>>>>>> handle_hvm_io_completion() after QEMU completes the emulation.
>>>>>>>>> Today,
>>>>>>>>> handle_mmio just sets the user register with the read value.
>>>>>>>>>
>>>>>>>>> But it would be better if it called again the original function
>>>>>>>>> do_trap_stage2_abort_guest to actually retry the original
>>>>>>>>> operation.
>>>>>>>>> This time do_trap_stage2_abort_guest calls try_handle_mmio() and
>>>>>>>>> gets
>>>>>>>>> IO_HANDLED instead of IO_RETRY,
>>>>>>>> I may miss some important point, but I failed to see why
>>>>>>>> try_handle_mmio
>>>>>>>> (try_fwd_ioserv) will return IO_HANDLED instead of IO_RETRY at
>>>>>>>> this
>>>>>>>> stage.
>>>>>>>> Or current try_fwd_ioserv() logic needs rework?
>>>>>>> I think you should check the ioreq->state in try_fwd_ioserv(), if
>>>>>>> the
>>>>>>> result is ready, then ioreq->state should be STATE_IORESP_READY, and
>>>>>>> you
>>>>>>> can return IO_HANDLED.
>>>>> I optimized the test patch a bit (now it looks much simpler). I didn't
>>>>> face any issues during a quick test.
>>>> Both patches get much closer to following the proper state machine,
>>>> great! I think this patch is certainly a good improvement. I think the
>>>> other patch you sent earlier, slightly larger, is even better. It makes
>>>> the following additional changes that would be good to have:
>>>>
>>>> - try_fwd_ioserv returns IO_HANDLED on state == STATE_IORESP_READY
>>>> - handle_mmio simply calls do_trap_stage2_abort_guest
>>> I don't think we should call do_trap_stage2_abort_guest() as part of the
>>> completion because:
>>>      * The function do_trap_stage2_abort_guest() is using registers that are
>>> not context switched (such as FAR_EL2). I/O handling is split in two with
>>> likely a context switch in the middle. The second part is the completion
>>> (i.e call to handle_mmio()). So the system registers will be incorrect.
>>>      * A big chunk of do_trap_stage2_abort_guest() is not necessary for the
>>> completion. For instance, there is no need to try to translate the guest
>>> virtual address to a guest physical address.
>>>
>>> Therefore the version below is probably the best approach.
>>
>> Indeed, the first version (with calling do_trap_stage2_abort_guest() for the
>> completion) is racy. When testing it more heavily I faced an issue
>> (sometimes) which resulted in the DomU getting stuck completely.
>>
>> (XEN) d2v1: vGICD: bad read width 0 r11 offset 0x000f00
>>
>> I didn't investigate the issue in detail, but I assumed that code in
>> do_trap_stage2_abort_guest() caused it. This was the main reason why I
>> decided to optimize the initial patch (and took only advance_pc).
>> Reading Julien's answer, I understand now what could happen.
>  From your and Julien's feedback it is clear that calling
> do_trap_stage2_abort_guest() is not possible and not a good idea.
>
>
> The reason for my suggestion was to complete the implementation of the
> state machine so that "RETRY" actually means "let's try again the
> emulation" but the second time it will return "HANDLED".
>
> Looking at this again, we could achieve the same goal in a better way by
> moving the register setting from "handle_mmio" to "try_handle_mmio" and
> also calling "try_handle_mmio" from "handle_mmio". Note that handle_mmio
> would become almost empty like on x86.
>
> 1) do_trap_stage2_abort_guest ->
>         try_handle_mmio ->
>              try_fwd_ioserv ->
>                  IO_RETRY
>
> 2) handle_hvm_io_completion ->
>         handle_mmio ->
>             try_handle_mmio ->
>                 try_fwd_ioserv ->
>                     IO_HANDLED
>
>
> It is very similar to your second patch with a small change on calling
> try_handle_mmio from handle_mmio and setting the register there. Do you
> think that would work?
If I understood correctly what you suggested, and implemented it 
properly, then it works; at least I didn't face any issues during testing.
I think this variant adds some extra operations compared to the 
previous one (for example, an attempt to find an MMIO handler in 
try_handle_mmio()). But, if you think the new variant is cleaner and better 
represents how the state machine should look, I would be absolutely 
OK with taking this variant for the non-RFC series. Please note, there was a 
request to move try_fwd_ioserv() to arm/ioreq.c (I am going to move the new 
handle_ioserv() as well).


---
  xen/arch/arm/io.c    | 47 +++++++++++++++++++++++++++++++++++++++++++----
  xen/arch/arm/ioreq.c | 38 +++++++-------------------------------
  xen/arch/arm/traps.c |  4 +++-
  3 files changed, 53 insertions(+), 36 deletions(-)

diff --git a/xen/arch/arm/io.c b/xen/arch/arm/io.c
index 436f669..4db7c55 100644
--- a/xen/arch/arm/io.c
+++ b/xen/arch/arm/io.c
@@ -109,6 +109,43 @@ static const struct mmio_handler *find_mmio_handler(struct domain *d,
  }

  #ifdef CONFIG_IOREQ_SERVER
+static enum io_state handle_ioserv(struct cpu_user_regs *regs, struct vcpu *v)
+{
+    const union hsr hsr = { .bits = regs->hsr };
+    const struct hsr_dabt dabt = hsr.dabt;
+    /* Code is similar to handle_read */
+    uint8_t size = (1 << dabt.size) * 8;
+    register_t r = v->arch.hvm.hvm_io.io_req.data;
+
+    /* We are done with the IO */
+    /* XXX: Is it the right place? */
+    v->arch.hvm.hvm_io.io_req.state = STATE_IOREQ_NONE;
+
+    /* XXX: Do we need to take care of write here ? */
+    if ( dabt.write )
+        return IO_HANDLED;
+
+    /*
+     * Sign extend if required.
+     * Note that we expect the read handler to have zeroed the bits
+     * outside the requested access size.
+     */
+    if ( dabt.sign && (r & (1UL << (size - 1))) )
+    {
+        /*
+         * We are relying on register_t using the same as
+         * an unsigned long in order to keep the 32-bit assembly
+         * code smaller.
+         */
+        BUILD_BUG_ON(sizeof(register_t) != sizeof(unsigned long));
+        r |= (~0UL) << size;
+    }
+
+    set_user_reg(regs, dabt.reg, r);
+
+    return IO_HANDLED;
+}
+
  static enum io_state try_fwd_ioserv(struct cpu_user_regs *regs,
                                      struct vcpu *v, mmio_info_t *info)
  {
@@ -130,6 +167,10 @@ static enum io_state try_fwd_ioserv(struct cpu_user_regs *regs,
      {
      case STATE_IOREQ_NONE:
          break;
+
+    case STATE_IORESP_READY:
+        return IO_HANDLED;
+
      default:
          printk("d%u wrong state %u\n", v->domain->domain_id,
                 vio->io_req.state);
@@ -156,10 +197,6 @@ static enum io_state try_fwd_ioserv(struct cpu_user_regs *regs,
      else
          vio->io_completion = HVMIO_mmio_completion;

-    /* XXX: Decide what to do */
-    if ( rc == IO_RETRY )
-        rc = IO_HANDLED;
-
      return rc;
  }
  #endif
@@ -185,6 +222,8 @@ enum io_state try_handle_mmio(struct cpu_user_regs *regs,

  #ifdef CONFIG_IOREQ_SERVER
          rc = try_fwd_ioserv(regs, v, &info);
+        if ( rc == IO_HANDLED )
+            return handle_ioserv(regs, v);
  #endif

          return rc;
diff --git a/xen/arch/arm/ioreq.c b/xen/arch/arm/ioreq.c
index 8f60c41..9068b8d 100644
--- a/xen/arch/arm/ioreq.c
+++ b/xen/arch/arm/ioreq.c
@@ -33,46 +33,22 @@
  #include <public/hvm/dm_op.h>
  #include <public/hvm/ioreq.h>

+#include <asm/traps.h>
+
  bool handle_mmio(void)
  {
      struct vcpu *v = current;
      struct cpu_user_regs *regs = guest_cpu_user_regs();
      const union hsr hsr = { .bits = regs->hsr };
-    const struct hsr_dabt dabt = hsr.dabt;
-    /* Code is similar to handle_read */
-    uint8_t size = (1 << dabt.size) * 8;
-    register_t r = v->arch.hvm.hvm_io.io_req.data;
-
-    /* We should only be here on Guest Data Abort */
-    ASSERT(dabt.ec == HSR_EC_DATA_ABORT_LOWER_EL);
+    paddr_t addr = v->arch.hvm.hvm_io.io_req.addr;

-    /* We are done with the IO */
-    /* XXX: Is it the right place? */
-    v->arch.hvm.hvm_io.io_req.state = STATE_IOREQ_NONE;
-
-    /* XXX: Do we need to take care of write here ? */
-    if ( dabt.write )
-        return true;
-
-    /*
-     * Sign extend if required.
-     * Note that we expect the read handler to have zeroed the bits
-     * outside the requested access size.
-     */
-    if ( dabt.sign && (r & (1UL << (size - 1))) )
+    if ( try_handle_mmio(regs, hsr, addr) == IO_HANDLED )
      {
-        /*
-         * We are relying on register_t using the same as
-         * an unsigned long in order to keep the 32-bit assembly
-         * code smaller.
-         */
-        BUILD_BUG_ON(sizeof(register_t) != sizeof(unsigned long));
-        r |= (~0UL) << size;
+        advance_pc(regs, hsr);
+        return true;
      }

-    set_user_reg(regs, dabt.reg, r);
-
-    return true;
+    return false;
  }

  /* Ask ioemu mapcache to invalidate mappings. */
diff --git a/xen/arch/arm/traps.c b/xen/arch/arm/traps.c
index ea472d1..974c744 100644
--- a/xen/arch/arm/traps.c
+++ b/xen/arch/arm/traps.c
@@ -1965,11 +1965,13 @@ static void do_trap_stage2_abort_guest(struct cpu_user_regs *regs,
              case IO_HANDLED:
                  advance_pc(regs, hsr);
                  return;
+            case IO_RETRY:
+                /* finish later */
+                return;
              case IO_UNHANDLED:
                  /* IO unhandled, try another way to handle it. */
                  break;
              default:
-                /* XXX: Handle IO_RETRY */
                  ASSERT_UNREACHABLE();
              }
          }
-- 
2.7.4



-- 
Regards,

Oleksandr Tyshchenko



^ permalink raw reply related	[flat|nested] 140+ messages in thread

* Re: [RFC PATCH V1 04/12] xen/arm: Introduce arch specific bits for IOREQ/DM features
  2020-08-11  6:13               ` Oleksandr
@ 2020-08-12 15:08                 ` Oleksandr
  0 siblings, 0 replies; 140+ messages in thread
From: Oleksandr @ 2020-08-12 15:08 UTC (permalink / raw)
  To: Julien Grall
  Cc: Stefano Stabellini, xen-devel, Oleksandr Tyshchenko, Ian Jackson,
	Wei Liu, Andrew Cooper, George Dunlap, Jan Beulich,
	Volodymyr Babchuk, Daniel De Graaf, Julien Grall


Hi Julien


>>>>>>>> @@ -2275,6 +2282,16 @@ static void check_for_vcpu_work(void)
>>>>>>>>     */
>>>>>>>>    void leave_hypervisor_to_guest(void)
>>>>>>>>    {
>>>>>>>> +#ifdef CONFIG_IOREQ_SERVER
>>>>>>>> +    /*
>>>>>>>> +     * XXX: Check the return. Shall we call that in
>>>>>>>> +     * continue_running and context_switch instead?
>>>>>>>> +     * The benefits would be to avoid calling
>>>>>>>> +     * handle_hvm_io_completion on every return.
>>>>>>>> +     */
>>>>>>> Yeah, that could be a simple and good optimization
>>>>>> Well, it is not as simple as it sounds :).
>>>>>> handle_hvm_io_completion() is the function in charge to mark the
>>>>>> vCPU as waiting for I/O. So we would at least need to split the
>>>>>> function.
>>>>>>
>>>>>> I wrote this TODO because I wasn't sure about the complexity of
>>>>>> handle_hvm_io_completion(current). Looking at it again, the main
>>>>>> complexity is the looping over the IOREQ servers.
>>>>>>
>>>>>> I think it would be better to optimize handle_hvm_io_completion()
>>>>>> rather than trying to hack the context_switch() or 
>>>>>> continue_running().
>>>>> Well, is the idea in the proposed dirty test patch below close to what
>>>>> you expect? The patch optimizes handle_hvm_io_completion() to avoid extra
>>>>> actions if the vcpu's domain doesn't have an ioreq_server; alternatively
>>>>> the check could be moved out of handle_hvm_io_completion() to avoid
>>>>> calling that function at all.
>>>> This looks ok to me.
>>>>
>>>>> BTW, TODO also suggests checking the return value of
>>>>> handle_hvm_io_completion(), but I am not completely sure we can simply
>>>>> just return from leave_hypervisor_to_guest() at this point. Could you
>>>>> please share your opinion?
>>>>  From my understanding, handle_hvm_io_completion() may return false if
>>>> there is pending I/O or a failure.
>>> It seems, yes
>>>
>>>
>>>> In the former case, I think we want to call handle_hvm_io_completion()
>>>> later on. Possibly after we call do_softirq().
>>>>
>>>> I am wondering whether check_for_vcpu_work() could return whether
>>>> there are more work todo on the behalf of the vCPU.
>>>>
>>>> So we could have:
>>>>
>>>> do
>>>> {
>>>>    check_for_pcpu_work();
>>>> } while (check_for_vcpu_work())
>>>>
>>>> The implementation of check_for_vcpu_work() would be:
>>>>
>>>> if ( !handle_hvm_io_completion() )
>>>>    return true;
>>>>
>>>> /* Rest of the existing code */
>>>>
>>>> return false;
>>> Thank you, will give it a try.
>>>
>>> Can we behave the same way for both "pending I/O" and "failure" or do we
>>> need to distinguish them?
>> We don't need to distinguish them. In both cases, we will want to
>> process softirqs. In all the failure cases, the domain will have
>> crashed. Therefore the vCPU will be unscheduled.
>
> Got it.
>
>
>>> Probably we need some sort of safe timeout/number of attempts in order to
>>> not spin forever?
>> Well, anything based on timeout/number of attempts is flaky. How do
>> you know whether the I/O is just taking a "long time" to complete?
>>
>> But a vCPU shouldn't continue until an I/O has completed. This is
>> nothing very different than what a processor would do.
>>
>> In Xen case, if an I/O never completes then it most likely means that
>> something went horribly wrong with the Device Emulator. So it is most
>> likely not safe to continue. In HW, when there is a device failure,
>> the OS may receive an SError (this is implementation defined) and
>> could act accordingly if it is able to recognize the issue.
>>
>> It *might* be possible to send a virtual SError but there are a couple
>> of issues with it:
>>       * How do you detect a failure?
>>       * SErrors are implementation defined. You would need to teach
>> your OS (or the firmware) how to deal with them.
>>
>> I would expect quite a bit of effort in order to design and implement
>> it. For now, it is probably best to just let the vCPU spin forever.
>>
>> This wouldn't be an issue for Xen as do_softirq() would be called at
>> every loop.
>
>  Thank you for clarification. Fair enough and sounds reasonable.
I added logic to properly handle the return value of 
handle_hvm_io_completion() as you had suggested. For test purposes I 
simulated handle_hvm_io_completion() to return false sometimes
(I couldn't detect a real "pending I/O" failure during testing) to see how 
the new logic behaved. I assume I can take this solution for the non-RFC series (?)


---
  xen/arch/arm/traps.c         | 36 ++++++++++++++++++++++--------------
  xen/common/hvm/ioreq.c       |  9 ++++++++-
  xen/include/asm-arm/domain.h |  1 +
  xen/include/xen/hvm/ioreq.h  |  5 +++++
  4 files changed, 36 insertions(+), 15 deletions(-)

diff --git a/xen/arch/arm/traps.c b/xen/arch/arm/traps.c
index 974c744..f74b514 100644
--- a/xen/arch/arm/traps.c
+++ b/xen/arch/arm/traps.c
@@ -2264,12 +2264,26 @@ static void check_for_pcpu_work(void)
   * Process pending work for the vCPU. Any call should be fast or
   * implement preemption.
   */
-static void check_for_vcpu_work(void)
+static bool check_for_vcpu_work(void)
  {
      struct vcpu *v = current;

+#ifdef CONFIG_IOREQ_SERVER
+    if ( hvm_domain_has_ioreq_server(v->domain) )
+    {
+        bool handled;
+
+        local_irq_enable();
+        handled = handle_hvm_io_completion(v);
+        local_irq_disable();
+
+        if ( !handled )
+            return true;
+    }
+#endif
+
      if ( likely(!v->arch.need_flush_to_ram) )
-        return;
+        return false;

      /*
       * Give a chance for the pCPU to process work before handling the vCPU
@@ -2280,6 +2294,8 @@ static void check_for_vcpu_work(void)
      local_irq_enable();
      p2m_flush_vm(v);
      local_irq_disable();
+
+    return false;
  }

  /*
@@ -2290,20 +2306,12 @@ static void check_for_vcpu_work(void)
   */
  void leave_hypervisor_to_guest(void)
  {
-#ifdef CONFIG_IOREQ_SERVER
-    /*
-     * XXX: Check the return. Shall we call that in
-     * continue_running and context_switch instead?
-     * The benefits would be to avoid calling
-     * handle_hvm_io_completion on every return.
-     */
-    local_irq_enable();
-    handle_hvm_io_completion(current);
-#endif
      local_irq_disable();

-    check_for_vcpu_work();
-    check_for_pcpu_work();
+    do
+    {
+        check_for_pcpu_work();
+    } while ( check_for_vcpu_work() );

      vgic_sync_to_lrs();

diff --git a/xen/common/hvm/ioreq.c b/xen/common/hvm/ioreq.c
index 7e1fa23..81b41ab 100644
--- a/xen/common/hvm/ioreq.c
+++ b/xen/common/hvm/ioreq.c
@@ -38,9 +38,15 @@ static void set_ioreq_server(struct domain *d, unsigned int id,
                               struct hvm_ioreq_server *s)
  {
      ASSERT(id < MAX_NR_IOREQ_SERVERS);
-    ASSERT(!s || !d->arch.hvm.ioreq_server.server[id]);
+    ASSERT((!s && d->arch.hvm.ioreq_server.server[id]) ||
+           (s && !d->arch.hvm.ioreq_server.server[id]));

      d->arch.hvm.ioreq_server.server[id] = s;
+
+    if ( s )
+        d->arch.hvm.ioreq_server.nr_servers++;
+    else
+        d->arch.hvm.ioreq_server.nr_servers--;
  }

  /*
@@ -1415,6 +1421,7 @@ unsigned int hvm_broadcast_ioreq(ioreq_t *p, bool buffered)
  void hvm_ioreq_init(struct domain *d)
  {
      spin_lock_init(&d->arch.hvm.ioreq_server.lock);
+    d->arch.hvm.ioreq_server.nr_servers = 0;

      arch_hvm_ioreq_init(d);
  }
diff --git a/xen/include/asm-arm/domain.h b/xen/include/asm-arm/domain.h
index 6a01d69..484bd1a 100644
--- a/xen/include/asm-arm/domain.h
+++ b/xen/include/asm-arm/domain.h
@@ -68,6 +68,7 @@ struct hvm_domain
      struct {
          spinlock_t              lock;
          struct hvm_ioreq_server *server[MAX_NR_IOREQ_SERVERS];
+        unsigned int            nr_servers;
      } ioreq_server;

      bool_t qemu_mapcache_invalidate;
diff --git a/xen/include/xen/hvm/ioreq.h b/xen/include/xen/hvm/ioreq.h
index 40b7b5e..8f78852 100644
--- a/xen/include/xen/hvm/ioreq.h
+++ b/xen/include/xen/hvm/ioreq.h
@@ -23,6 +23,11 @@

  #include <asm/hvm/ioreq.h>

+static inline bool hvm_domain_has_ioreq_server(const struct domain *d)
+{
+    return (d->arch.hvm.ioreq_server.nr_servers > 0);
+}
+
  #define GET_IOREQ_SERVER(d, id) \
      (d)->arch.hvm.ioreq_server.server[id]

-- 
2.7.4


-- 
Regards,

Oleksandr Tyshchenko



^ permalink raw reply related	[flat|nested] 140+ messages in thread

* Re: [RFC PATCH V1 01/12] hvm/ioreq: Make x86's IOREQ feature common
  2020-08-12 14:35                               ` Oleksandr
@ 2020-08-12 23:08                                 ` Stefano Stabellini
  2020-08-13 20:16                                   ` Julien Grall
  0 siblings, 1 reply; 140+ messages in thread
From: Stefano Stabellini @ 2020-08-12 23:08 UTC (permalink / raw)
  To: Oleksandr
  Cc: Stefano Stabellini, Julien Grall, Jan Beulich, paul, xen-devel,
	'Oleksandr Tyshchenko', 'Andrew Cooper',
	'Wei Liu', 'Roger Pau Monné',
	'George Dunlap', 'Ian Jackson',
	'Jun Nakajima', 'Kevin Tian',
	'Tim Deegan', 'Julien Grall'


On Wed, 12 Aug 2020, Oleksandr wrote:
> On 12.08.20 01:47, Stefano Stabellini wrote:
> > On Tue, 11 Aug 2020, Oleksandr wrote:
> > > On 11.08.20 12:19, Julien Grall wrote:
> > > > On 11/08/2020 00:34, Stefano Stabellini wrote:
> > > > > On Mon, 10 Aug 2020, Oleksandr wrote:
> > > > > > On 08.08.20 01:19, Oleksandr wrote:
> > > > > > > On 08.08.20 00:50, Stefano Stabellini wrote:
> > > > > > > > On Fri, 7 Aug 2020, Oleksandr wrote:
> > > > > > > > > On 06.08.20 03:37, Stefano Stabellini wrote:
> > > > > > > > > 
> > > > > > > > > Hi Stefano
> > > > > > > > > 
> > > > > > > > > Trying to simulate IO_RETRY handling mechanism (according to model
> > > > > > > > > below) I continuously get IO_RETRY from try_fwd_ioserv() ...
> > > > > > > > > 
> > > > > > > > > > OK, thanks for the details. My interpretation seems to be correct.
> > > > > > > > > > 
> > > > > > > > > > In which case, it looks like xen/arch/arm/io.c:try_fwd_ioserv should
> > > > > > > > > > return IO_RETRY. Then, xen/arch/arm/traps.c:do_trap_stage2_abort_guest
> > > > > > > > > > also needs to handle try_handle_mmio returning IO_RETRY the first
> > > > > > > > > > time around, and IO_HANDLED after QEMU does its job.
> > > > > > > > > > 
> > > > > > > > > > What should do_trap_stage2_abort_guest do on IO_RETRY? Simply return
> > > > > > > > > > early and let the scheduler do its job? Something like:
> > > > > > > > > > 
> > > > > > > > > >                 enum io_state state = try_handle_mmio(regs, hsr, gpa);
> > > > > > > > > > 
> > > > > > > > > >                 switch ( state )
> > > > > > > > > >                 {
> > > > > > > > > >                 case IO_ABORT:
> > > > > > > > > >                     goto inject_abt;
> > > > > > > > > >                 case IO_HANDLED:
> > > > > > > > > >                     advance_pc(regs, hsr);
> > > > > > > > > >                     return;
> > > > > > > > > >                 case IO_RETRY:
> > > > > > > > > >                     /* finish later */
> > > > > > > > > >                     return;
> > > > > > > > > >                 case IO_UNHANDLED:
> > > > > > > > > >                     /* IO unhandled, try another way to handle it. */
> > > > > > > > > >                     break;
> > > > > > > > > >                 default:
> > > > > > > > > >                     ASSERT_UNREACHABLE();
> > > > > > > > > >                 }
> > > > > > > > > > 
> > > > > > > > > > Then, xen/arch/arm/ioreq.c:handle_mmio() gets called by
> > > > > > > > > > handle_hvm_io_completion() after QEMU completes the emulation. Today,
> > > > > > > > > > handle_mmio just sets the user register with the read value.
> > > > > > > > > > 
> > > > > > > > > > But it would be better if it called again the original function
> > > > > > > > > > do_trap_stage2_abort_guest to actually retry the original operation.
> > > > > > > > > > This time do_trap_stage2_abort_guest calls try_handle_mmio() and gets
> > > > > > > > > > IO_HANDLED instead of IO_RETRY,
> > > > > > > > > I may miss some important point, but I failed to see why try_handle_mmio
> > > > > > > > > (try_fwd_ioserv) will return IO_HANDLED instead of IO_RETRY at this
> > > > > > > > > stage. Or does the current try_fwd_ioserv() logic need rework?
> > > > > > > > I think you should check the ioreq->state in try_fwd_ioserv(): if the
> > > > > > > > result is ready, then ioreq->state should be STATE_IORESP_READY, and you
> > > > > > > > can return IO_HANDLED.
> > > > > > I optimized the test patch a bit (now it looks much simpler). I didn't
> > > > > > face any issues during a quick test.
> > > > > Both patches get much closer to following the proper state machine,
> > > > > great! I think this patch is certainly a good improvement. I think the
> > > > > other patch you sent earlier, slightly larger, is even better. It makes
> > > > > the following additional changes that would be good to have:
> > > > > 
> > > > > - try_fwd_ioserv returns IO_HANDLED on state == STATE_IORESP_READY
> > > > > - handle_mmio simply calls do_trap_stage2_abort_guest
> > > > I don't think we should call do_trap_stage2_abort_guest() as part of the
> > > > completion because:
> > > >      * The function do_trap_stage2_abort_guest() is using registers that are
> > > > not context switched (such as FAR_EL2). I/O handling is split in two with
> > > > likely a context switch in the middle. The second part is the completion
> > > > (i.e. call to handle_mmio()). So the system registers will be incorrect.
> > > >      * A big chunk of do_trap_stage2_abort_guest() is not necessary for the
> > > > completion. For instance, there is no need to try to translate the guest
> > > > virtual address to a guest physical address.
> > > > 
> > > > Therefore the version below is probably the best approach.
> > > 
> > > Indeed, the first version (with calling do_trap_stage2_abort_guest() for a
> > > completion) is racy. When testing it more heavily I faced an issue
> > > (sometimes) which resulted in the DomU getting stuck completely.
> > > 
> > > (XEN) d2v1: vGICD: bad read width 0 r11 offset 0x000f00
> > > 
> > > I didn't investigate the issue in detail, but I assumed that code in
> > > do_trap_stage2_abort_guest() caused it. This was the main reason why I
> > > decided to optimize the initial patch (and took only advance_pc).
> > > Reading Julien's answer I understand now what could happen.
> >  From your and Julien's feedback it is clear that calling
> > do_trap_stage2_abort_guest() is not possible and not a good idea.
> > 
> > 
> > The reason for my suggestion was to complete the implementation of the
> > state machine so that "RETRY" actually means "let's try again the
> > emulation" but the second time it will return "HANDLED".
> > 
> > Looking at this again, we could achieve the same goal in a better way by
> > moving the register setting from "handle_mmio" to "try_handle_mmio" and
> > also calling "try_handle_mmio" from "handle_mmio". Note that handle_mmio
> > would become almost empty like on x86.
> > 
> > 1) do_trap_stage2_abort_guest ->
> >         try_handle_mmio ->
> >              try_fwd_ioserv ->
> >                  IO_RETRY
> > 
> > 2) handle_hvm_io_completion ->
> >         handle_mmio ->
> >             try_handle_mmio ->
> >                 try_fwd_ioserv ->
> >                     IO_HANDLED
> > 
> > 
> > It is very similar to your second patch with a small change on calling
> > try_handle_mmio from handle_mmio and setting the register there. Do you
> > think that would work?
> If I understood correctly what you had suggested and implemented it properly,
> then it works; at least I didn't face any issues during testing.
> I think this variant adds some extra operations compared to the previous one
> (for example an attempt to find a mmio handler in try_handle_mmio). But, if
> you think the new variant is cleaner and better represents how the state machine
> should look, I would be absolutely OK to take this variant for the non-RFC
> series. Please note, there was a request to move try_fwd_ioserv() to
> arm/ioreq.c (I am going to move the new handle_ioserv() as well).
 
Yes, I think this version better represents the state machine, thanks
for looking into it. I think it is good.

In terms of number of operations, it doesn't look very concerning (in
the sense that it doesn't seem to add that many ops.) However we could
maybe improve it by passing a reference to the right mmio handler from
handle_mmio to try_handle_mmio if we have it. Or maybe we could save a
reference to the mmio handler as part of v->arch.hvm.hvm_io.io_req.

In any case, I think it is fine.

 
> ---
>  xen/arch/arm/io.c    | 47 +++++++++++++++++++++++++++++++++++++++++++----
>  xen/arch/arm/ioreq.c | 38 +++++++-------------------------------
>  xen/arch/arm/traps.c |  4 +++-
>  3 files changed, 53 insertions(+), 36 deletions(-)
> 
> diff --git a/xen/arch/arm/io.c b/xen/arch/arm/io.c
> index 436f669..4db7c55 100644
> --- a/xen/arch/arm/io.c
> +++ b/xen/arch/arm/io.c
> @@ -109,6 +109,43 @@ static const struct mmio_handler *find_mmio_handler(struct domain *d,
>  }
> 
>  #ifdef CONFIG_IOREQ_SERVER
> +static enum io_state handle_ioserv(struct cpu_user_regs *regs, struct vcpu *v)
> +{
> +    const union hsr hsr = { .bits = regs->hsr };
> +    const struct hsr_dabt dabt = hsr.dabt;
> +    /* Code is similar to handle_read */
> +    uint8_t size = (1 << dabt.size) * 8;
> +    register_t r = v->arch.hvm.hvm_io.io_req.data;
> +
> +    /* We are done with the IO */
> +    /* XXX: Is it the right place? */
> +    v->arch.hvm.hvm_io.io_req.state = STATE_IOREQ_NONE;
> +
> +    /* XXX: Do we need to take care of write here ? */
> +    if ( dabt.write )
> +        return IO_HANDLED;
> +
> +    /*
> +     * Sign extend if required.
> +     * Note that we expect the read handler to have zeroed the bits
> +     * outside the requested access size.
> +     */
> +    if ( dabt.sign && (r & (1UL << (size - 1))) )
> +    {
> +        /*
> +         * We are relying on register_t using the same as
> +         * an unsigned long in order to keep the 32-bit assembly
> +         * code smaller.
> +         */
> +        BUILD_BUG_ON(sizeof(register_t) != sizeof(unsigned long));
> +        r |= (~0UL) << size;
> +    }
> +
> +    set_user_reg(regs, dabt.reg, r);
> +
> +    return IO_HANDLED;
> +}
> +
>  static enum io_state try_fwd_ioserv(struct cpu_user_regs *regs,
>                                      struct vcpu *v, mmio_info_t *info)
>  {
> @@ -130,6 +167,10 @@ static enum io_state try_fwd_ioserv(struct cpu_user_regs *regs,
>      {
>      case STATE_IOREQ_NONE:
>          break;
> +
> +    case STATE_IORESP_READY:
> +        return IO_HANDLED;
> +
>      default:
>          printk("d%u wrong state %u\n", v->domain->domain_id,
>                 vio->io_req.state);
> @@ -156,10 +197,6 @@ static enum io_state try_fwd_ioserv(struct cpu_user_regs *regs,
>      else
>          vio->io_completion = HVMIO_mmio_completion;
> 
> -    /* XXX: Decide what to do */
> -    if ( rc == IO_RETRY )
> -        rc = IO_HANDLED;
> -
>      return rc;
>  }
>  #endif
> @@ -185,6 +222,8 @@ enum io_state try_handle_mmio(struct cpu_user_regs *regs,
> 
>  #ifdef CONFIG_IOREQ_SERVER
>          rc = try_fwd_ioserv(regs, v, &info);
> +        if ( rc == IO_HANDLED )
> +            return handle_ioserv(regs, v);
>  #endif
> 
>          return rc;
> diff --git a/xen/arch/arm/ioreq.c b/xen/arch/arm/ioreq.c
> index 8f60c41..9068b8d 100644
> --- a/xen/arch/arm/ioreq.c
> +++ b/xen/arch/arm/ioreq.c
> @@ -33,46 +33,22 @@
>  #include <public/hvm/dm_op.h>
>  #include <public/hvm/ioreq.h>
> 
> +#include <asm/traps.h>
> +
>  bool handle_mmio(void)
>  {
>      struct vcpu *v = current;
>      struct cpu_user_regs *regs = guest_cpu_user_regs();
>      const union hsr hsr = { .bits = regs->hsr };
> -    const struct hsr_dabt dabt = hsr.dabt;
> -    /* Code is similar to handle_read */
> -    uint8_t size = (1 << dabt.size) * 8;
> -    register_t r = v->arch.hvm.hvm_io.io_req.data;
> -
> -    /* We should only be here on Guest Data Abort */
> -    ASSERT(dabt.ec == HSR_EC_DATA_ABORT_LOWER_EL);
> +    paddr_t addr = v->arch.hvm.hvm_io.io_req.addr;
> 
> -    /* We are done with the IO */
> -    /* XXX: Is it the right place? */
> -    v->arch.hvm.hvm_io.io_req.state = STATE_IOREQ_NONE;
> -
> -    /* XXX: Do we need to take care of write here ? */
> -    if ( dabt.write )
> -        return true;
> -
> -    /*
> -     * Sign extend if required.
> -     * Note that we expect the read handler to have zeroed the bits
> -     * outside the requested access size.
> -     */
> -    if ( dabt.sign && (r & (1UL << (size - 1))) )
> +    if ( try_handle_mmio(regs, hsr, addr) == IO_HANDLED )
>      {
> -        /*
> -         * We are relying on register_t using the same as
> -         * an unsigned long in order to keep the 32-bit assembly
> -         * code smaller.
> -         */
> -        BUILD_BUG_ON(sizeof(register_t) != sizeof(unsigned long));
> -        r |= (~0UL) << size;
> +        advance_pc(regs, hsr);
> +        return true;
>      }
> 
> -    set_user_reg(regs, dabt.reg, r);
> -
> -    return true;
> +    return false;
>  }
> 
>  /* Ask ioemu mapcache to invalidate mappings. */
> diff --git a/xen/arch/arm/traps.c b/xen/arch/arm/traps.c
> index ea472d1..974c744 100644
> --- a/xen/arch/arm/traps.c
> +++ b/xen/arch/arm/traps.c
> @@ -1965,11 +1965,13 @@ static void do_trap_stage2_abort_guest(struct cpu_user_regs *regs,
>              case IO_HANDLED:
>                  advance_pc(regs, hsr);
>                  return;
> +            case IO_RETRY:
> +                /* finish later */
> +                return;
>              case IO_UNHANDLED:
>                  /* IO unhandled, try another way to handle it. */
>                  break;
>              default:
> -                /* XXX: Handle IO_RETRY */
>                  ASSERT_UNREACHABLE();
>              }
>          }
> -- 
> 2.7.4
> 
> 
> 
> -- 
> Regards,
> 
> Oleksandr Tyshchenko
> 

^ permalink raw reply	[flat|nested] 140+ messages in thread

* Re: [RFC PATCH V1 04/12] xen/arm: Introduce arch specific bits for IOREQ/DM features
  2020-08-11 17:50         ` Julien Grall
@ 2020-08-13 18:41           ` Oleksandr
  2020-08-13 20:36             ` Julien Grall
  2020-08-13 20:39             ` Oleksandr Tyshchenko
  0 siblings, 2 replies; 140+ messages in thread
From: Oleksandr @ 2020-08-13 18:41 UTC (permalink / raw)
  To: Julien Grall
  Cc: Stefano Stabellini, xen-devel, Oleksandr Tyshchenko, Ian Jackson,
	Wei Liu, Andrew Cooper, George Dunlap, Jan Beulich,
	Volodymyr Babchuk, Daniel De Graaf, Julien Grall


On 11.08.20 20:50, Julien Grall wrote:

Hi Julien

>
>
> On 11/08/2020 18:09, Oleksandr wrote:
>>
>> On 05.08.20 12:32, Julien Grall wrote:
>>
>> Hi Julien, Stefano
>>
>>>
>>>>> diff --git a/xen/include/asm-arm/p2m.h b/xen/include/asm-arm/p2m.h
>>>>> index 5fdb6e8..5823f11 100644
>>>>> --- a/xen/include/asm-arm/p2m.h
>>>>> +++ b/xen/include/asm-arm/p2m.h
>>>>> @@ -385,10 +385,11 @@ static inline int set_foreign_p2m_entry(struct domain *d, unsigned long gfn,
>>>>>                                           mfn_t mfn)
>>>>>   {
>>>>>       /*
>>>>> -     * NOTE: If this is implemented then proper reference counting of
>>>>> -     *       foreign entries will need to be implemented.
>>>>> +     * XXX: handle properly reference. It looks like the page may not always
>>>>> +     * belong to d.
>>>>
>>>> Just as a reference, and without taking away anything from the 
>>>> comment,
>>>> I think that QEMU is doing its own internal reference counting for 
>>>> these
>>>> mappings.
>>>
>>> I am not sure how this matters here? We can't really trust the DM to 
>>> do the right thing if it is not running in dom0.
>>>
>>> But, IIRC, the problem is some of the pages don't belong to a
>>> domain, so it is not possible to treat them as a foreign mapping (e.g. 
>>> you wouldn't be able to grab a reference). This investigation was 
>>> done a couple of years ago, so this may have changed in recent Xen.
>>
>> Well, the emulator is going to be used in a driver domain, so this TODO 
>> must be resolved. I suspect that the check for a hardware domain in 
>> acquire_resource() which I skipped in a hackish way [1] could be 
>> simply removed once proper reference counting is implemented in Xen, 
>> correct?
>
> It depends how you are going to solve it. If you manage to solve it in 
> a generic way, then yes you could resolve. If not (i.e. it is solved 
> in an arch-specific way), we would need to keep the check on arch that 
> are not able to deal with it. See more below.
>
>>
>> Could you please provide some pointers on that problem? Maybe some 
>> questions need to be investigated again? Unfortunately, it is not 
>> completely clear to me the direction to follow...
>>
>> ***
>> I am wondering whether the similar problem exists on x86 as well?
>
> It is somewhat different. On Arm, we are able to handle properly 
> foreign mapping (i.e. mapping page from a another domain) as we would 
> grab a reference on the page (see XENMAPSPACE_gmfn_foreign handling in 
> xenmem_add_to_physmap()). The reference will then be released when the 
> entry is removed from the P2M (see p2m_free_entry()).
>
> If all the pages given to set_foreign_p2m_entry() belong to a domain, 
> then you could use the same approach.
>
> However, I remember running into some issues in some of the cases. I 
> had a quick look at the caller and I wasn't able to find any use 
> cases that may be an issue.
>
> The refcounting in the IOREQ code has changed after XSA-276 (this was 
> found while working on the Arm port). Probably the best way to figure 
> out if it works would be to try it and see if it fails.
>
> Note that set_foreign_p2m_entry() doesn't have a parameter for the 
> foreign domain. You would need to add an extra parameter for this.
>
>> The FIXME tag (before checking for a hardware domain in 
>> acquire_resource()) in the common code makes me think it is a common 
>> issue. On the other side, x86's
>> implementation of set_foreign_p2m_entry() exists, unlike Arm's one 
>> (which returned -EOPNOTSUPP so far). Or are these unrelated?
>
> At the moment, x86 doesn't support refcounting for foreign mapping. 
> Hence the reason to restrict them to the hardware domain.


Thank you for the pointers!


I checked that all pages given to set_foreign_p2m_entry() belonged to a 
domain (at least in my use-case). I noticed two calls for acquiring a 
resource at DomU creation time: the first call was for the grant table 
(a single gfn) and the second for the ioreq server, which carried 2 gfns 
(for the shared and buffered rings, I assume). For test purposes, I 
passed these gfns to get_page_from_gfn() in order to grab references on 
the pages, after which I tried to destroy the DomU without calling 
put_page() for these pages. The fact that I couldn't destroy the DomU 
completely (a zombie domain was observed) made me think that the 
references were still held, so it worked as expected.


I implemented a test patch (which uses the approach from 
xenmem_add_to_physmap_one() for the XENMAPSPACE_gmfn_foreign case) to check 
whether it would work.


---
  xen/arch/arm/p2m.c        | 30 ++++++++++++++++++++++++++++++
  xen/common/memory.c       |  2 +-
  xen/include/asm-arm/p2m.h | 12 ++----------
  3 files changed, 33 insertions(+), 11 deletions(-)

diff --git a/xen/arch/arm/p2m.c b/xen/arch/arm/p2m.c
index e9ccba8..7359715 100644
--- a/xen/arch/arm/p2m.c
+++ b/xen/arch/arm/p2m.c
@@ -1385,6 +1385,36 @@ int guest_physmap_remove_page(struct domain *d, 
gfn_t gfn, mfn_t mfn,
      return p2m_remove_mapping(d, gfn, (1 << page_order), mfn);
  }

+int set_foreign_p2m_entry(struct domain *d, struct domain *fd,
+                          unsigned long gfn, mfn_t mfn)
+{
+    struct page_info *page;
+    p2m_type_t p2mt;
+    int rc;
+
+    /*
+     * Take reference to the foreign domain page. Reference will be released
+     * in p2m_put_l3_page().
+     */
+    page = get_page_from_gfn(fd, gfn, &p2mt, P2M_ALLOC);
+    if ( !page )
+        return -EINVAL;
+
+    if ( p2m_is_ram(p2mt) )
+        p2mt = (p2mt == p2m_ram_rw) ? p2m_map_foreign_rw : p2m_map_foreign_ro;
+    else
+    {
+        put_page(page);
+        return -EINVAL;
+    }
+
+    rc = guest_physmap_add_entry(d, _gfn(gfn), mfn, 0, p2mt);
+    if ( rc )
+        put_page(page);
+
+    return 0;
+}
+
  static struct page_info *p2m_allocate_root(void)
  {
      struct page_info *page;
diff --git a/xen/common/memory.c b/xen/common/memory.c
index 8d9f0a8..1de1d4f 100644
--- a/xen/common/memory.c
+++ b/xen/common/memory.c
@@ -1171,7 +1171,7 @@ static int acquire_resource(

          for ( i = 0; !rc && i < xmar.nr_frames; i++ )
          {
-            rc = set_foreign_p2m_entry(currd, gfn_list[i],
+            rc = set_foreign_p2m_entry(currd, d, gfn_list[i],
                                         _mfn(mfn_list[i]));
              /* rc should be -EIO for any iteration other than the first */
              if ( rc && i )
diff --git a/xen/include/asm-arm/p2m.h b/xen/include/asm-arm/p2m.h
index 5823f11..53ce373 100644
--- a/xen/include/asm-arm/p2m.h
+++ b/xen/include/asm-arm/p2m.h
@@ -381,16 +381,8 @@ static inline gfn_t gfn_next_boundary(gfn_t gfn, unsigned int order)
      return gfn_add(gfn, 1UL << order);
  }

-static inline int set_foreign_p2m_entry(struct domain *d, unsigned long gfn,
-                                        mfn_t mfn)
-{
-    /*
-     * XXX: handle properly reference. It looks like the page may not always
-     * belong to d.
-     */
-
-    return guest_physmap_add_entry(d, _gfn(gfn), mfn, 0, p2m_ram_rw);
-}
+int set_foreign_p2m_entry(struct domain *d, struct domain *fd,
+                          unsigned long gfn, mfn_t mfn);

  /*
   * A vCPU has cache enabled only when the MMU is enabled and data cache
-- 
2.7.4


And with that patch applied I was facing a BUG when destroying/rebooting 
DomU. The call of put_page_alloc_ref() in hvm_free_ioreq_mfn() triggered 
that BUG:


Rebooting domain 2
root@generic-armv8-xt-dom0:~# (XEN) Xen BUG at 
...tAUTOINC+bb71237a55-r0/git/xen/include/xen/mm.h:683
(XEN) ----[ Xen-4.14.0  arm64  debug=y   Not tainted ]----
(XEN) CPU:    3
(XEN) PC:     0000000000246f28 ioreq.c#hvm_free_ioreq_mfn+0x68/0x6c
(XEN) LR:     0000000000246ef0
(XEN) SP:     0000800725eafd80
(XEN) CPSR:   60000249 MODE:64-bit EL2h (Hypervisor, handler)
(XEN)      X0: 0000000000000001  X1: 403fffffffffffff  X2: 000000000000001f
(XEN)      X3: 0000000080000000  X4: 0000000000000000  X5: 0000000000400000
(XEN)      X6: 0000800725eafe24  X7: 0000ffffd1ef3e08  X8: 0000000000000020
(XEN)      X9: 0000000000000000 X10: 00e800008ecebf53 X11: 0400000000000000
(XEN)     X12: ffff7e00013b3ac0 X13: 0000000000000002 X14: 0000000000000001
(XEN)     X15: 0000000000000001 X16: 0000000000000029 X17: 0000ffff9badb3d0
(XEN)     X18: 000000000000010f X19: 0000000810e60e38 X20: 0000800725e68ec0
(XEN)     X21: 0000000000000000 X22: 00008004dc0404a0 X23: 000000005a000ea1
(XEN)     X24: ffff8000460ec280 X25: 0000000000000124 X26: 000000000000001d
(XEN)     X27: ffff000008ad1000 X28: ffff800052e65100  FP: ffff0000223dbd20
(XEN)
(XEN)   VTCR_EL2: 80023558
(XEN)  VTTBR_EL2: 0002000765f04000
(XEN)
(XEN)  SCTLR_EL2: 30cd183d
(XEN)    HCR_EL2: 000000008078663f
(XEN)  TTBR0_EL2: 00000000781c5000
(XEN)
(XEN)    ESR_EL2: f2000001
(XEN)  HPFAR_EL2: 0000000000030010
(XEN)    FAR_EL2: ffff000008005f00
(XEN)
(XEN) Xen stack trace from sp=0000800725eafd80:
(XEN)    0000800725e68ec0 0000000000247078 00008004dc040000 00000000002477c8
(XEN)    ffffffffffffffea 0000000000000001 ffff8000460ec500 0000000000000002
(XEN)    000000000024645c 00000000002462dc 0000800725eafeb0 0000800725eafeb0
(XEN)    0000800725eaff30 0000000060000145 000000000027882c 0000800725eafeb0
(XEN)    0000800725eafeb0 01ff00000935de80 00008004dc040000 0000000000000006
(XEN)    ffff800000000000 0000000000000002 000000005a000ea1 000000019bc60002
(XEN)    0000ffffd1ef3e08 0000000000000020 0000000000000004 000000000027c7d8
(XEN)    000000005a000ea1 0000800725eafeb0 000000005a000ea1 0000000000279f98
(XEN)    0000000000000000 ffff8000460ec200 0000800725eaffb8 0000000000262c58
(XEN)    0000000000262c4c 07e0000160000249 0000000000000002 0000000000000001
(XEN)    ffff8000460ec500 ffff8000460ec508 ffff8000460ec208 ffff800052e65100
(XEN)    000000005060b478 0000ffffd20f3000 ffff7e00013c77e0 0000000000000000
(XEN)    00e800008ecebf53 0400000000000000 ffff7e00013b3ac0 0000000000000002
(XEN)    0000000000000001 0000000000000001 0000000000000029 0000ffff9badb3d0
(XEN)    000000000000010f ffff8000460ec210 ffff8000460ec200 ffff8000460ec210
(XEN)    0000000000000001 ffff8000460ec500 ffff8000460ec280 0000000000000124
(XEN)    000000000000001d ffff000008ad1000 ffff800052e65100 ffff0000223dbd20
(XEN)    ffff000008537004 ffffffffffffffff ffff0000080c17e4 5a000ea160000145
(XEN)    0000000060000000 0000000000000000 0000000000000000 ffff800052e65100
(XEN)    ffff0000223dbd20 0000ffff9badb3dc 0000000000000000 0000000000000000
(XEN) Xen call trace:
(XEN)    [<0000000000246f28>] ioreq.c#hvm_free_ioreq_mfn+0x68/0x6c (PC)
(XEN)    [<0000000000246ef0>] ioreq.c#hvm_free_ioreq_mfn+0x30/0x6c (LR)
(XEN)
(XEN)
(XEN) ****************************************
(XEN) Panic on CPU 3:
(XEN) Xen BUG at ...tAUTOINC+bb71237a55-r0/git/xen/include/xen/mm.h:683
(XEN) ****************************************
(XEN)
(XEN) Reboot in five seconds...
(XEN)
(XEN) ****************************************
(XEN) Panic on CPU 0:
(XEN) PSCI cpu off failed for CPU0 err=-3
(XEN) ****************************************
(XEN)
(XEN) Reboot in five seconds...



Either I did something wrong (most likely) or there is an issue with 
page ref-counting in the IOREQ code. I am still trying to understand 
what is going on.
Some notes on that:
1. I checked that put_page() was called for these pages in
p2m_put_l3_page() when destroying the domain. This happened before
hvm_free_ioreq_mfn() execution.
2. There was no BUG detected if I passed "p2m_ram_rw" instead of
"p2m_map_foreign_rw" to guest_physmap_add_entry(), but the DomU
couldn't be fully destroyed because of the reference still being held.

-- 
Regards,

Oleksandr Tyshchenko



^ permalink raw reply related	[flat|nested] 140+ messages in thread

* Re: [RFC PATCH V1 01/12] hvm/ioreq: Make x86's IOREQ feature common
  2020-08-12 23:08                                 ` Stefano Stabellini
@ 2020-08-13 20:16                                   ` Julien Grall
  0 siblings, 0 replies; 140+ messages in thread
From: Julien Grall @ 2020-08-13 20:16 UTC (permalink / raw)
  To: Stefano Stabellini, Oleksandr
  Cc: Jan Beulich, paul, xen-devel, 'Oleksandr Tyshchenko',
	'Andrew Cooper', 'Wei Liu',
	'Roger Pau Monné', 'George Dunlap',
	'Ian Jackson', 'Jun Nakajima',
	'Kevin Tian', 'Tim Deegan',
	'Julien Grall'



On 13/08/2020 00:08, Stefano Stabellini wrote:
>>> It is very similar to your second patch with a small change on calling
>>> try_handle_mmio from handle_mmio and setting the register there. Do you
>>> think that would work?
>> If I understood correctly what you had suggested and properly implemented then
>> it works, at least I didn't face any issues during testing.
>> I think this variant adds some extra operations comparing to the previous one
>> (for example an attempt to find a mmio handler at try_handle_mmio). But, if
>> you think new variant is cleaner and better represents how the state machine
>> should look like, I would be absolutely OK to take this variant for non-RFC
>> series. Please note, there was a request to move try_fwd_ioserv() to
>> arm/ioreq.c (I am going to move new handle_ioserv() as well).
>   
> Yes, I think this version better represents the state machine, thanks
> for looking into it. I think it is good.
> 
> In terms of number of operations, it doesn't look very concerning (in
> the sense that it doesn't seem to add that many ops.) However we could
> maybe improve it by passing a reference to the right mmio handler from
> handle_mmio to try_handle_mmio if we have it. Or maybe we could save a
> reference to the mmio handler as part of v->arch.hvm.hvm_io.io_req.

There is no MMIO handler for the IOREQ handling. So I am not entirely 
sure what you are suggesting.

Cheers,

-- 
Julien Grall


^ permalink raw reply	[flat|nested] 140+ messages in thread

* Re: [RFC PATCH V1 04/12] xen/arm: Introduce arch specific bits for IOREQ/DM features
  2020-08-13 18:41           ` Oleksandr
@ 2020-08-13 20:36             ` Julien Grall
  2020-08-13 21:49               ` Oleksandr
  2020-08-13 20:39             ` Oleksandr Tyshchenko
  1 sibling, 1 reply; 140+ messages in thread
From: Julien Grall @ 2020-08-13 20:36 UTC (permalink / raw)
  To: Oleksandr
  Cc: Stefano Stabellini, xen-devel, Oleksandr Tyshchenko, Ian Jackson,
	Wei Liu, Andrew Cooper, George Dunlap, Jan Beulich,
	Volodymyr Babchuk, Daniel De Graaf, Julien Grall



On 13/08/2020 19:41, Oleksandr wrote:
> Rebooting domain 2
> root@generic-armv8-xt-dom0:~# (XEN) Xen BUG at 
> ...tAUTOINC+bb71237a55-r0/git/xen/include/xen/mm.h:683
> (XEN) ----[ Xen-4.14.0  arm64  debug=y   Not tainted ]----
> (XEN) CPU:    3
> (XEN) PC:     0000000000246f28 ioreq.c#hvm_free_ioreq_mfn+0x68/0x6c
> (XEN) LR:     0000000000246ef0
> (XEN) SP:     0000800725eafd80
> (XEN) CPSR:   60000249 MODE:64-bit EL2h (Hypervisor, handler)
> (XEN)      X0: 0000000000000001  X1: 403fffffffffffff  X2: 000000000000001f
> (XEN)      X3: 0000000080000000  X4: 0000000000000000  X5: 0000000000400000
> (XEN)      X6: 0000800725eafe24  X7: 0000ffffd1ef3e08  X8: 0000000000000020
> (XEN)      X9: 0000000000000000 X10: 00e800008ecebf53 X11: 0400000000000000
> (XEN)     X12: ffff7e00013b3ac0 X13: 0000000000000002 X14: 0000000000000001
> (XEN)     X15: 0000000000000001 X16: 0000000000000029 X17: 0000ffff9badb3d0
> (XEN)     X18: 000000000000010f X19: 0000000810e60e38 X20: 0000800725e68ec0
> (XEN)     X21: 0000000000000000 X22: 00008004dc0404a0 X23: 000000005a000ea1
> (XEN)     X24: ffff8000460ec280 X25: 0000000000000124 X26: 000000000000001d
> (XEN)     X27: ffff000008ad1000 X28: ffff800052e65100  FP: ffff0000223dbd20
> (XEN)
> (XEN)   VTCR_EL2: 80023558
> (XEN)  VTTBR_EL2: 0002000765f04000
> (XEN)
> (XEN)  SCTLR_EL2: 30cd183d
> (XEN)    HCR_EL2: 000000008078663f
> (XEN)  TTBR0_EL2: 00000000781c5000
> (XEN)
> (XEN)    ESR_EL2: f2000001
> (XEN)  HPFAR_EL2: 0000000000030010
> (XEN)    FAR_EL2: ffff000008005f00
> (XEN)
> (XEN) Xen stack trace from sp=0000800725eafd80:
> (XEN)    0000800725e68ec0 0000000000247078 00008004dc040000 
> 00000000002477c8
> (XEN)    ffffffffffffffea 0000000000000001 ffff8000460ec500 
> 0000000000000002
> (XEN)    000000000024645c 00000000002462dc 0000800725eafeb0 
> 0000800725eafeb0
> (XEN)    0000800725eaff30 0000000060000145 000000000027882c 
> 0000800725eafeb0
> (XEN)    0000800725eafeb0 01ff00000935de80 00008004dc040000 
> 0000000000000006
> (XEN)    ffff800000000000 0000000000000002 000000005a000ea1 
> 000000019bc60002
> (XEN)    0000ffffd1ef3e08 0000000000000020 0000000000000004 
> 000000000027c7d8
> (XEN)    000000005a000ea1 0000800725eafeb0 000000005a000ea1 
> 0000000000279f98
> (XEN)    0000000000000000 ffff8000460ec200 0000800725eaffb8 
> 0000000000262c58
> (XEN)    0000000000262c4c 07e0000160000249 0000000000000002 
> 0000000000000001
> (XEN)    ffff8000460ec500 ffff8000460ec508 ffff8000460ec208 
> ffff800052e65100
> (XEN)    000000005060b478 0000ffffd20f3000 ffff7e00013c77e0 
> 0000000000000000
> (XEN)    00e800008ecebf53 0400000000000000 ffff7e00013b3ac0 
> 0000000000000002
> (XEN)    0000000000000001 0000000000000001 0000000000000029 
> 0000ffff9badb3d0
> (XEN)    000000000000010f ffff8000460ec210 ffff8000460ec200 
> ffff8000460ec210
> (XEN)    0000000000000001 ffff8000460ec500 ffff8000460ec280 
> 0000000000000124
> (XEN)    000000000000001d ffff000008ad1000 ffff800052e65100 
> ffff0000223dbd20
> (XEN)    ffff000008537004 ffffffffffffffff ffff0000080c17e4 
> 5a000ea160000145
> (XEN)    0000000060000000 0000000000000000 0000000000000000 
> ffff800052e65100
> (XEN)    ffff0000223dbd20 0000ffff9badb3dc 0000000000000000 
> 0000000000000000
> (XEN) Xen call trace:
> (XEN)    [<0000000000246f28>] ioreq.c#hvm_free_ioreq_mfn+0x68/0x6c (PC)
> (XEN)    [<0000000000246ef0>] ioreq.c#hvm_free_ioreq_mfn+0x30/0x6c (LR)
> (XEN)
> (XEN)
> (XEN) ****************************************
> (XEN) Panic on CPU 3:
> (XEN) Xen BUG at ...tAUTOINC+bb71237a55-r0/git/xen/include/xen/mm.h:683
> (XEN) ****************************************
> (XEN)
> (XEN) Reboot in five seconds...
> (XEN)
> (XEN) ****************************************
> (XEN) Panic on CPU 0:
> (XEN) PSCI cpu off failed for CPU0 err=-3
> (XEN) ****************************************
> (XEN)
> (XEN) Reboot in five seconds...
> 
> 
> 
> Either I did something wrong (most likely) or there is an issue with 
> page ref-counting in the IOREQ code. I am still trying to understand 
> what is going on.

At first glance, the implementation of set_foreign_p2m_entry() looks
fine to me.

> Some notes on that:
> 1. I checked that put_page() was called for these pages in 
> p2m_put_l3_page() when destroying domain. This happened before 
> hvm_free_ioreq_mfn() execution.
> 2. There was no BUG detected if I passed "p2m_ram_rw" instead of 
> "p2m_map_foreign_rw" in guest_physmap_add_entry(), but the DomU couldn't 
> be fully destroyed because of the reference taken.

This definitely looks like a page reference issue. Would it be possible
to print where the page references are dropped? A WARN() in put_page()
would help.

To avoid a lot of messages, I tend to use a global variable that stores
the page I want to watch.

Cheers,

-- 
Julien Grall


^ permalink raw reply	[flat|nested] 140+ messages in thread

* Re: [RFC PATCH V1 04/12] xen/arm: Introduce arch specific bits for IOREQ/DM features
  2020-08-13 18:41           ` Oleksandr
  2020-08-13 20:36             ` Julien Grall
@ 2020-08-13 20:39             ` Oleksandr Tyshchenko
  2020-08-13 22:14               ` Julien Grall
  1 sibling, 1 reply; 140+ messages in thread
From: Oleksandr Tyshchenko @ 2020-08-13 20:39 UTC (permalink / raw)
  To: Julien Grall
  Cc: Stefano Stabellini, xen-devel, Oleksandr Tyshchenko, Ian Jackson,
	Wei Liu, Andrew Cooper, George Dunlap, Jan Beulich,
	Volodymyr Babchuk, Daniel De Graaf, Julien Grall

[-- Attachment #1: Type: text/plain, Size: 12635 bytes --]

Hi

Sorry for the possible format issue.

On Thu, Aug 13, 2020 at 9:42 PM Oleksandr <olekstysh@gmail.com> wrote:

>
> On 11.08.20 20:50, Julien Grall wrote:
>
> Hi Julien
>
> >
> >
> > On 11/08/2020 18:09, Oleksandr wrote:
> >>
> >> On 05.08.20 12:32, Julien Grall wrote:
> >>
> >> Hi Julien, Stefano
> >>
> >>>
> >>>>> diff --git a/xen/include/asm-arm/p2m.h b/xen/include/asm-arm/p2m.h
> >>>>> index 5fdb6e8..5823f11 100644
> >>>>> --- a/xen/include/asm-arm/p2m.h
> >>>>> +++ b/xen/include/asm-arm/p2m.h
> >>>>> @@ -385,10 +385,11 @@ static inline int
> >>>>> set_foreign_p2m_entry(struct domain *d, unsigned long gfn,
> >>>>>                                           mfn_t mfn)
> >>>>>   {
> >>>>>       /*
> >>>>> -     * NOTE: If this is implemented then proper reference
> >>>>> counting of
> >>>>> -     *       foreign entries will need to be implemented.
> >>>>> +     * XXX: handle properly reference. It looks like the page may
> >>>>> not always
> >>>>> +     * belong to d.
> >>>>
> >>>> Just as a reference, and without taking away anything from the
> >>>> comment,
> >>>> I think that QEMU is doing its own internal reference counting for
> >>>> these
> >>>> mappings.
> >>>
> >>> I am not sure how this matters here? We can't really trust the DM to
> >>> do the right thing if it is not running in dom0.
> >>>
> >>> But, IIRC, the problem is some of the pages don't belong to a
> >>> domain, so it is not possible to treat them as foreign mapping (e.g.
> >>> you wouldn't be able to grab a reference). This investigation was
> >>> done a couple of years ago, so this may have changed in recent Xen.
> >>
> >> Well, emulator is going to be used in driver domain, so this TODO
> >> must be resolved. I suspect that the check for a hardware domain in
> >> acquire_resource() which I skipped in a hackish way [1] could be
> >> simply removed once proper reference counting is implemented in Xen,
> >> correct?
> >
> > It depends how you are going to solve it. If you manage to solve it in
> > a generic way, then yes you could resolve. If not (i.e. it is solved
> > in an arch-specific way), we would need to keep the check on arch that
> > are not able to deal with it. See more below.
> >
> >>
> >> Could you please provide some pointers on that problem? Maybe some
> >> questions need to be investigated again? Unfortunately, it is not
> >> completely clear to me the direction to follow...
> >>
> >> ***
> >> I am wondering whether the similar problem exists on x86 as well?
> >
> > It is somewhat different. On Arm, we are able to handle properly
> > foreign mappings (i.e. mapping a page from another domain) as we would
> > grab a reference on the page (see XENMAPSPACE_gmfn_foreign handling in
> > xenmem_add_to_physmap()). The reference will then be released when the
> > entry is removed from the P2M (see p2m_free_entry()).
> >
> > If all the pages given to set_foreign_p2m_entry() belong to a domain,
> > then you could use the same approach.
> >
> > However, I remember to run into some issues in some of the cases. I
> > had a quick looked at the caller and I wasn't able to find any use
> > cases that may be an issue.
> >
> > The refcounting in the IOREQ code has changed after XSA-276 (this was
> > found while working on the Arm port). Probably the best way to figure
> > out if it works would be to try it and see if it fails.
> >
> > Note that set_foreign_p2m_entry() doesn't have a parameter for the
> > foreign domain. You would need to add an extra parameter for this.
> >
> >> The FIXME tag (before checking for a hardware domain in
> >> acquire_resource()) in the common code makes me think it is a common
> >> issue. On the other hand, x86's
> >> implementation of set_foreign_p2m_entry() exists, unlike Arm's one
> >> (which returned -EOPNOTSUPP so far). Or are these unrelated?
> >
> > At the moment, x86 doesn't support refcounting for foreign mapping.
> > Hence the reason to restrict them to the hardware domain.
>
>
> Thank you for the pointers!
>
>
> I checked that all pages given to set_foreign_p2m_entry() belonged to a
> domain (at least in my use-case). I noticed two calls for acquiring
> resource at the DomU creation time, the first call was for grant table
> (single gfn)
> and the second for ioreq server which carried 2 gfns (for shared and
> buffered rings I assume). For the test purpose, I passed these gfns to
> get_page_from_gfn() in order to grab references on the pages, after that
> I tried to destroy DomU without calling put_page() for these pages. The
> fact that I couldn't destroy DomU completely (a zombie domain was
> observed) made me think that references were still taken, so worked as
> expected.
>
>
> I implemented a test patch (which uses approach from
> xenmem_add_to_physmap_one() for XENMAPSPACE_gmfn_foreign case) to check
> whether it would work.
>
>
> ---
>   xen/arch/arm/p2m.c        | 30 ++++++++++++++++++++++++++++++
>   xen/common/memory.c       |  2 +-
>   xen/include/asm-arm/p2m.h | 12 ++----------
>   3 files changed, 33 insertions(+), 11 deletions(-)
>
> diff --git a/xen/arch/arm/p2m.c b/xen/arch/arm/p2m.c
> index e9ccba8..7359715 100644
> --- a/xen/arch/arm/p2m.c
> +++ b/xen/arch/arm/p2m.c
> @@ -1385,6 +1385,36 @@ int guest_physmap_remove_page(struct domain *d,
> gfn_t gfn, mfn_t mfn,
>       return p2m_remove_mapping(d, gfn, (1 << page_order), mfn);
>   }
>
> +int set_foreign_p2m_entry(struct domain *d, struct domain *fd,
> +                          unsigned long gfn, mfn_t mfn)
> +{
> +    struct page_info *page;
> +    p2m_type_t p2mt;
> +    int rc;
> +
> +    /*
> +     * Take reference to the foreign domain page. Reference will be
> released
> +     * in p2m_put_l3_page().
> +     */
> +    page = get_page_from_gfn(fd, gfn, &p2mt, P2M_ALLOC);
> +    if ( !page )
> +        return -EINVAL;
> +
> +    if ( p2m_is_ram(p2mt) )
> +        p2mt = (p2mt == p2m_ram_rw) ? p2m_map_foreign_rw :
> p2m_map_foreign_ro;
> +    else
> +    {
> +        put_page(page);
> +        return -EINVAL;
> +    }
> +
> +    rc = guest_physmap_add_entry(d, _gfn(gfn), mfn, 0, p2mt);
> +    if ( rc )
> +        put_page(page);
> +
> +    return rc;
> +}
> +
>   static struct page_info *p2m_allocate_root(void)
>   {
>       struct page_info *page;
> diff --git a/xen/common/memory.c b/xen/common/memory.c
> index 8d9f0a8..1de1d4f 100644
> --- a/xen/common/memory.c
> +++ b/xen/common/memory.c
> @@ -1171,7 +1171,7 @@ static int acquire_resource(
>
>           for ( i = 0; !rc && i < xmar.nr_frames; i++ )
>           {
> -            rc = set_foreign_p2m_entry(currd, gfn_list[i],
> +            rc = set_foreign_p2m_entry(currd, d, gfn_list[i],
>                                          _mfn(mfn_list[i]));
>               /* rc should be -EIO for any iteration other than the first
> */
>               if ( rc && i )
> diff --git a/xen/include/asm-arm/p2m.h b/xen/include/asm-arm/p2m.h
> index 5823f11..53ce373 100644
> --- a/xen/include/asm-arm/p2m.h
> +++ b/xen/include/asm-arm/p2m.h
> @@ -381,16 +381,8 @@ static inline gfn_t gfn_next_boundary(gfn_t gfn,
> unsigned int order)
>       return gfn_add(gfn, 1UL << order);
>   }
>
> -static inline int set_foreign_p2m_entry(struct domain *d, unsigned long
> gfn,
> -                                        mfn_t mfn)
> -{
> -    /*
> -     * XXX: handle properly reference. It looks like the page may not
> always
> -     * belong to d.
> -     */
> -
> -    return guest_physmap_add_entry(d, _gfn(gfn), mfn, 0, p2m_ram_rw);
> -}
> +int set_foreign_p2m_entry(struct domain *d, struct domain *fd,
> +                          unsigned long gfn,  mfn_t mfn);
>
>   /*
>    * A vCPU has cache enabled only when the MMU is enabled and data cache
> --
> 2.7.4
>
>
> And with that patch applied I was facing a BUG when destroying/rebooting
> DomU. The call of put_page_alloc_ref() in hvm_free_ioreq_mfn() triggered
> that BUG:
>
>
> Rebooting domain 2
> root@generic-armv8-xt-dom0:~# (XEN) Xen BUG at
> ...tAUTOINC+bb71237a55-r0/git/xen/include/xen/mm.h:683
> (XEN) ----[ Xen-4.14.0  arm64  debug=y   Not tainted ]----
> (XEN) CPU:    3
> (XEN) PC:     0000000000246f28 ioreq.c#hvm_free_ioreq_mfn+0x68/0x6c
> (XEN) LR:     0000000000246ef0
> (XEN) SP:     0000800725eafd80
> (XEN) CPSR:   60000249 MODE:64-bit EL2h (Hypervisor, handler)
> (XEN)      X0: 0000000000000001  X1: 403fffffffffffff  X2: 000000000000001f
> (XEN)      X3: 0000000080000000  X4: 0000000000000000  X5: 0000000000400000
> (XEN)      X6: 0000800725eafe24  X7: 0000ffffd1ef3e08  X8: 0000000000000020
> (XEN)      X9: 0000000000000000 X10: 00e800008ecebf53 X11: 0400000000000000
> (XEN)     X12: ffff7e00013b3ac0 X13: 0000000000000002 X14: 0000000000000001
> (XEN)     X15: 0000000000000001 X16: 0000000000000029 X17: 0000ffff9badb3d0
> (XEN)     X18: 000000000000010f X19: 0000000810e60e38 X20: 0000800725e68ec0
> (XEN)     X21: 0000000000000000 X22: 00008004dc0404a0 X23: 000000005a000ea1
> (XEN)     X24: ffff8000460ec280 X25: 0000000000000124 X26: 000000000000001d
> (XEN)     X27: ffff000008ad1000 X28: ffff800052e65100  FP: ffff0000223dbd20
> (XEN)
> (XEN)   VTCR_EL2: 80023558
> (XEN)  VTTBR_EL2: 0002000765f04000
> (XEN)
> (XEN)  SCTLR_EL2: 30cd183d
> (XEN)    HCR_EL2: 000000008078663f
> (XEN)  TTBR0_EL2: 00000000781c5000
> (XEN)
> (XEN)    ESR_EL2: f2000001
> (XEN)  HPFAR_EL2: 0000000000030010
> (XEN)    FAR_EL2: ffff000008005f00
> (XEN)
> (XEN) Xen stack trace from sp=0000800725eafd80:
> (XEN)    0000800725e68ec0 0000000000247078 00008004dc040000
> 00000000002477c8
> (XEN)    ffffffffffffffea 0000000000000001 ffff8000460ec500
> 0000000000000002
> (XEN)    000000000024645c 00000000002462dc 0000800725eafeb0
> 0000800725eafeb0
> (XEN)    0000800725eaff30 0000000060000145 000000000027882c
> 0000800725eafeb0
> (XEN)    0000800725eafeb0 01ff00000935de80 00008004dc040000
> 0000000000000006
> (XEN)    ffff800000000000 0000000000000002 000000005a000ea1
> 000000019bc60002
> (XEN)    0000ffffd1ef3e08 0000000000000020 0000000000000004
> 000000000027c7d8
> (XEN)    000000005a000ea1 0000800725eafeb0 000000005a000ea1
> 0000000000279f98
> (XEN)    0000000000000000 ffff8000460ec200 0000800725eaffb8
> 0000000000262c58
> (XEN)    0000000000262c4c 07e0000160000249 0000000000000002
> 0000000000000001
> (XEN)    ffff8000460ec500 ffff8000460ec508 ffff8000460ec208
> ffff800052e65100
> (XEN)    000000005060b478 0000ffffd20f3000 ffff7e00013c77e0
> 0000000000000000
> (XEN)    00e800008ecebf53 0400000000000000 ffff7e00013b3ac0
> 0000000000000002
> (XEN)    0000000000000001 0000000000000001 0000000000000029
> 0000ffff9badb3d0
> (XEN)    000000000000010f ffff8000460ec210 ffff8000460ec200
> ffff8000460ec210
> (XEN)    0000000000000001 ffff8000460ec500 ffff8000460ec280
> 0000000000000124
> (XEN)    000000000000001d ffff000008ad1000 ffff800052e65100
> ffff0000223dbd20
> (XEN)    ffff000008537004 ffffffffffffffff ffff0000080c17e4
> 5a000ea160000145
> (XEN)    0000000060000000 0000000000000000 0000000000000000
> ffff800052e65100
> (XEN)    ffff0000223dbd20 0000ffff9badb3dc 0000000000000000
> 0000000000000000
> (XEN) Xen call trace:
> (XEN)    [<0000000000246f28>] ioreq.c#hvm_free_ioreq_mfn+0x68/0x6c (PC)
> (XEN)    [<0000000000246ef0>] ioreq.c#hvm_free_ioreq_mfn+0x30/0x6c (LR)
> (XEN)
> (XEN)
> (XEN) ****************************************
> (XEN) Panic on CPU 3:
> (XEN) Xen BUG at ...tAUTOINC+bb71237a55-r0/git/xen/include/xen/mm.h:683
> (XEN) ****************************************
> (XEN)
> (XEN) Reboot in five seconds...
> (XEN)
> (XEN) ****************************************
> (XEN) Panic on CPU 0:
> (XEN) PSCI cpu off failed for CPU0 err=-3
> (XEN) ****************************************
> (XEN)
> (XEN) Reboot in five seconds...
>
>
>
> Either I did something wrong (most likely) or there is an issue with
> page ref-counting in the IOREQ code. I am still trying to understand
> what is going on.
> Some notes on that:
> 1. I checked that put_page() was called for these pages in
> p2m_put_l3_page() when destroying domain. This happened before
> hvm_free_ioreq_mfn() execution.
> 2. There was no BUG detected if I passed "p2m_ram_rw" instead of
> "p2m_map_foreign_rw" in guest_physmap_add_entry(), but the DomU couldn't
> be fully destroyed because of the reference taken.
>

I think I understand why the BUG is triggered.

I checked "page->count_info & PGC_count_mask" and noticed that
get_page_from_gfn() doesn't seem to increase the ref counter (but it
should?):

1. hvm_alloc_ioreq_mfn() -> ref 2
2. set_foreign_p2m_entry() -> ref still 2
3. p2m_put_l3_page() -> ref 1
4. hvm_free_ioreq_mfn() calls put_page_alloc_ref() with ref 1, which
triggers the BUG


-- 
Regards,

Oleksandr Tyshchenko


^ permalink raw reply	[flat|nested] 140+ messages in thread

* Re: [RFC PATCH V1 04/12] xen/arm: Introduce arch specific bits for IOREQ/DM features
  2020-08-13 20:36             ` Julien Grall
@ 2020-08-13 21:49               ` Oleksandr
  0 siblings, 0 replies; 140+ messages in thread
From: Oleksandr @ 2020-08-13 21:49 UTC (permalink / raw)
  To: Julien Grall
  Cc: Stefano Stabellini, xen-devel, Oleksandr Tyshchenko, Ian Jackson,
	Wei Liu, Andrew Cooper, George Dunlap, Jan Beulich,
	Volodymyr Babchuk, Daniel De Graaf, Julien Grall


On 13.08.20 23:36, Julien Grall wrote:

Hi Julien

>
>
> On 13/08/2020 19:41, Oleksandr wrote:
>> Rebooting domain 2
>> root@generic-armv8-xt-dom0:~# (XEN) Xen BUG at 
>> ...tAUTOINC+bb71237a55-r0/git/xen/include/xen/mm.h:683
>> (XEN) ----[ Xen-4.14.0  arm64  debug=y   Not tainted ]----
>> (XEN) CPU:    3
>> (XEN) PC:     0000000000246f28 ioreq.c#hvm_free_ioreq_mfn+0x68/0x6c
>> (XEN) LR:     0000000000246ef0
>> (XEN) SP:     0000800725eafd80
>> (XEN) CPSR:   60000249 MODE:64-bit EL2h (Hypervisor, handler)
>> (XEN)      X0: 0000000000000001  X1: 403fffffffffffff  X2: 
>> 000000000000001f
>> (XEN)      X3: 0000000080000000  X4: 0000000000000000  X5: 
>> 0000000000400000
>> (XEN)      X6: 0000800725eafe24  X7: 0000ffffd1ef3e08  X8: 
>> 0000000000000020
>> (XEN)      X9: 0000000000000000 X10: 00e800008ecebf53 X11: 
>> 0400000000000000
>> (XEN)     X12: ffff7e00013b3ac0 X13: 0000000000000002 X14: 
>> 0000000000000001
>> (XEN)     X15: 0000000000000001 X16: 0000000000000029 X17: 
>> 0000ffff9badb3d0
>> (XEN)     X18: 000000000000010f X19: 0000000810e60e38 X20: 
>> 0000800725e68ec0
>> (XEN)     X21: 0000000000000000 X22: 00008004dc0404a0 X23: 
>> 000000005a000ea1
>> (XEN)     X24: ffff8000460ec280 X25: 0000000000000124 X26: 
>> 000000000000001d
>> (XEN)     X27: ffff000008ad1000 X28: ffff800052e65100  FP: 
>> ffff0000223dbd20
>> (XEN)
>> (XEN)   VTCR_EL2: 80023558
>> (XEN)  VTTBR_EL2: 0002000765f04000
>> (XEN)
>> (XEN)  SCTLR_EL2: 30cd183d
>> (XEN)    HCR_EL2: 000000008078663f
>> (XEN)  TTBR0_EL2: 00000000781c5000
>> (XEN)
>> (XEN)    ESR_EL2: f2000001
>> (XEN)  HPFAR_EL2: 0000000000030010
>> (XEN)    FAR_EL2: ffff000008005f00
>> (XEN)
>> (XEN) Xen stack trace from sp=0000800725eafd80:
>> (XEN)    0000800725e68ec0 0000000000247078 00008004dc040000 
>> 00000000002477c8
>> (XEN)    ffffffffffffffea 0000000000000001 ffff8000460ec500 
>> 0000000000000002
>> (XEN)    000000000024645c 00000000002462dc 0000800725eafeb0 
>> 0000800725eafeb0
>> (XEN)    0000800725eaff30 0000000060000145 000000000027882c 
>> 0000800725eafeb0
>> (XEN)    0000800725eafeb0 01ff00000935de80 00008004dc040000 
>> 0000000000000006
>> (XEN)    ffff800000000000 0000000000000002 000000005a000ea1 
>> 000000019bc60002
>> (XEN)    0000ffffd1ef3e08 0000000000000020 0000000000000004 
>> 000000000027c7d8
>> (XEN)    000000005a000ea1 0000800725eafeb0 000000005a000ea1 
>> 0000000000279f98
>> (XEN)    0000000000000000 ffff8000460ec200 0000800725eaffb8 
>> 0000000000262c58
>> (XEN)    0000000000262c4c 07e0000160000249 0000000000000002 
>> 0000000000000001
>> (XEN)    ffff8000460ec500 ffff8000460ec508 ffff8000460ec208 
>> ffff800052e65100
>> (XEN)    000000005060b478 0000ffffd20f3000 ffff7e00013c77e0 
>> 0000000000000000
>> (XEN)    00e800008ecebf53 0400000000000000 ffff7e00013b3ac0 
>> 0000000000000002
>> (XEN)    0000000000000001 0000000000000001 0000000000000029 
>> 0000ffff9badb3d0
>> (XEN)    000000000000010f ffff8000460ec210 ffff8000460ec200 
>> ffff8000460ec210
>> (XEN)    0000000000000001 ffff8000460ec500 ffff8000460ec280 
>> 0000000000000124
>> (XEN)    000000000000001d ffff000008ad1000 ffff800052e65100 
>> ffff0000223dbd20
>> (XEN)    ffff000008537004 ffffffffffffffff ffff0000080c17e4 
>> 5a000ea160000145
>> (XEN)    0000000060000000 0000000000000000 0000000000000000 
>> ffff800052e65100
>> (XEN)    ffff0000223dbd20 0000ffff9badb3dc 0000000000000000 
>> 0000000000000000
>> (XEN) Xen call trace:
>> (XEN)    [<0000000000246f28>] ioreq.c#hvm_free_ioreq_mfn+0x68/0x6c (PC)
>> (XEN)    [<0000000000246ef0>] ioreq.c#hvm_free_ioreq_mfn+0x30/0x6c (LR)
>> (XEN)
>> (XEN)
>> (XEN) ****************************************
>> (XEN) Panic on CPU 3:
>> (XEN) Xen BUG at ...tAUTOINC+bb71237a55-r0/git/xen/include/xen/mm.h:683
>> (XEN) ****************************************
>> (XEN)
>> (XEN) Reboot in five seconds...
>> (XEN)
>> (XEN) ****************************************
>> (XEN) Panic on CPU 0:
>> (XEN) PSCI cpu off failed for CPU0 err=-3
>> (XEN) ****************************************
>> (XEN)
>> (XEN) Reboot in five seconds...
>>
>>
>>
>> Either I did something wrong (most likely) or there is an issue with 
>> page ref-counting in the IOREQ code. I am still trying to understand 
>> what is going on.
>
> At first glance, the implementation of set_foreign_p2m_entry() looks
> fine to me.
>
>> Some notes on that:
>> 1. I checked that put_page() was called for these pages in 
>> p2m_put_l3_page() when destroying domain. This happened before 
>> hvm_free_ioreq_mfn() execution.
>> 2. There was no BUG detected if I passed "p2m_ram_rw" instead of 
>> "p2m_map_foreign_rw" in guest_physmap_add_entry(), but the DomU 
>> couldn't be fully destroyed because of the reference taken.
>
> This definitely looks like a page reference issue. Would it be
> possible to print where the page references are dropped? A WARN() in
> put_page() would help.
>
> To avoid a lot of messages, I tend to use a global variable that
> stores the page I want to watch.


Unfortunately, it is not clear from the log who calls put_page() twice.
One of the calls is presumably made by p2m_put_l3_page(), but who makes
the second one? This needs debugging.


Rebooting domain 2

root@generic-armv8-xt-dom0:~#

(XEN) put_page[1553] 0000000810e60e38 ---> ref = 3

(XEN) Xen WARN at mm.c:1554
(XEN) ----[ Xen-4.14.0  arm64  debug=y   Not tainted ]----
(XEN) CPU:    2
(XEN) PC:     0000000000272a48 put_page+0xa0/0xc4
(XEN) LR:     0000000000272a48
(XEN) SP:     0000800725eaf990
(XEN) CPSR:   80000249 MODE:64-bit EL2h (Hypervisor, handler)
(XEN)      X0: 0000000000310028  X1: 0000000000000001  X2: 0000800725ca4000
(XEN)      X3: 0000000000000020  X4: 0000000000000000  X5: 0000000000000020
(XEN)      X6: 0080808080808080  X7: fefefefefefeff09  X8: 7f7f7f7f7f7f7f7f
(XEN)      X9: 756e6c64513d313b X10: 7f7f7f7f7f7f7f7f X11: 0101010101010101
(XEN)     X12: 0000000000000008 X13: 000000000028b7d0 X14: 0000800725eaf6e8
(XEN)     X15: 0000000000000020 X16: 0000000000000000 X17: 0000ffffb5eaf070
(XEN)     X18: 000000000000010f X19: 8040000000000003 X20: 0000000810e60e38
(XEN)     X21: 0000800725f07208 X22: 0000800725f07208 X23: 000000000051c041
(XEN)     X24: 0000000000000001 X25: 0000800725eafa78 X26: 000000000009c041
(XEN)     X27: 0000000000000000 X28: 0000000000000007  FP: ffff00002212bd50
(XEN)
(XEN)   VTCR_EL2: 80023558
(XEN)  VTTBR_EL2: 0002000765f04000
(XEN)
(XEN)  SCTLR_EL2: 30cd183d
(XEN)    HCR_EL2: 000000008078663f
(XEN)  TTBR0_EL2: 00000000781c5000
(XEN)
(XEN)    ESR_EL2: f2000001
(XEN)  HPFAR_EL2: 0000000000030010
(XEN)    FAR_EL2: ffff000008005f00
(XEN)
(XEN) Xen stack trace from sp=0000800725eaf990:
(XEN)    034000051c0417ff 0000000000000000 00000000002743e8 0000800725f07208
(XEN)    034000051c0417ff 0000000000000000 0000800725e6d208 0000800725f07208
(XEN)    0000000000000003 0000000000274910 0000000000000000 ffffffffffffffff
(XEN)    0000000000000001 000000000009c041 0000800725f07208 0000000000000000
(XEN)    0000000000000012 0000000000000007 0000000000000001 0000000000000009
(XEN)    0000000000274e00 0000800725f07208 ffffffffffffffff 0000000025eafb0c
(XEN)    0000800725f07000 ffff000008f86528 0000000000000000 0000000200000000
(XEN)    00000041000000e0 0000800725e6d000 0000000000000001 0000800725f07208
(XEN)    000000000009c041 0000000000000000 0000800725f07000 ffff000008f86528
(XEN)    ffff8000501b6540 ffff8000501b6550 ffff8000517bdb40 ffff800052784380
(XEN)    0000000000275770 0000800725eafb08 0000000000000000 0000000025f07000
(XEN)    fffffffffffffffe 0000800725f07000 0000000810e60e38 000000000021a930
(XEN)    0000000000236d18 0000800725f1b990 0000800725eafeb0 0000800725eafeb0
(XEN)    0000800725eaff30 00000000a0000145 000000005a000ea1 ffff000008f86528
(XEN)    0000800725f566a0 00000000002a9088 ffff00000814acc0 0000000a55bb542d
(XEN)    00000000002789d8 0000800725f1b990 00000000002b6cb8 0000800725f03920
(XEN)    00000000002b6cb8 0000000c9084be29 0000000000310228 0000800725eafbf0
(XEN)    ffff00000814f160 0000800725eafbe0 0000000000310228 0000800725eafbf0
(XEN)    0000000000310228 0000000100000002 00000000002b6cb8 0000000000237aa8
(XEN)    0000000000000002 0000000000000000 0000000000000000 0000800725eafc70
(XEN)    0000800725ecd6c8 0000800725ecddf0 00000000000000a0 0000000000000240
(XEN)    000000000026c920 0000800725ecd128 0000000000000002 0000000000000000
(XEN)    0000800725ecd000 0000000000000001 000000000026bfa8 0000000c90bf0b49
(XEN)    0000000000000002 0000000000000000 0000000000276fc4 0000000000000001
(XEN)    0000800725e64000 0000800725e68a60 0000800725e64000 0000800725f566a0
(XEN)    000000000023f53c 000000000027dc14 0000000000311390 0000000000346008
(XEN)    000000000027651c 0000800725eafdd8 0000000000240e44 0000000000000004
(XEN)    0000000000311390 0000000000240e80 0000800725e64000 0000000000240f20
(XEN)    ffff000022127ff0 000000000009c041 000000005a000ea1 ffff800011ae9800
(XEN)    0000000000000124 0000000000240f2c fffffffffffffff2 0000000000000001
(XEN)    0000800725e68ec0 0000800725e68ed0 000000000024694c 00008004dc040000
(XEN)    0000800725e68ec0 00008004dc040000 0000000000247c88 0000000000247c7c
(XEN)    ffffffffffffffea 0000000000000001 ffff800011ae9e80 0000000000000002
(XEN)    000000005a000ea1 0000ffffc52989c0 00000000002463b8 000000000024613c
(XEN)    0000800725eafeb0 0000800725eafeb0 0000800725eaff30 0000000060000145
(XEN)    00000000002789d8 0000800725eafeb0 0000800725eafeb0 01ff00000935de80
(XEN)    0000800725eca000 0000000000310280 0000000000000000 0000000000000000
(XEN)    000000005a000ea1 ffff800011ae9800 0000000000000124 000000000000001d
(XEN)    0000000000000004 000000000027c984 000000005a000ea1 0000800725eafeb0
(XEN)    000000005a000ea1 000000000027a144 0000000000000000 ffff80004e381f80
(XEN) Xen call trace:
(XEN)    [<0000000000272a48>] put_page+0xa0/0xc4 (PC)
(XEN)    [<0000000000272a48>] put_page+0xa0/0xc4 (LR)
(XEN)

(XEN) put_page[1553] 0000000810e60e38 ---> ref = 2

(XEN) Xen WARN at mm.c:1554
(XEN) ----[ Xen-4.14.0  arm64  debug=y   Not tainted ]----
(XEN) CPU:    2
(XEN) PC:     0000000000272a48 put_page+0xa0/0xc4
(XEN) LR:     0000000000272a48
(XEN) SP:     0000800725eafaf0
(XEN) CPSR:   80000249 MODE:64-bit EL2h (Hypervisor, handler)
(XEN)      X0: 0000000000310028  X1: 0000000000000000  X2: 0000800725ca4000
(XEN)      X3: 0000000000000020  X4: 0000000000000000  X5: 0000000000000021
(XEN)      X6: 0080808080808080  X7: fefefefefefeff09  X8: 7f7f7f7f7f7f7f7f
(XEN)      X9: 756e6c64513d313b X10: 7f7f7f7f7f7f7f7f X11: 0101010101010101
(XEN)     X12: 0000000000000008 X13: 000000000028b7d0 X14: 0000800725eaf848
(XEN)     X15: 0000000000000020 X16: 0000000000000000 X17: 0000ffffb5eaf070
(XEN)     X18: 000000000000010f X19: 8040000000000002 X20: 0000000810e60e38
(XEN)     X21: 0000000810e60e38 X22: 0000000000000000 X23: 0000800725f07000
(XEN)     X24: ffff000008f86528 X25: ffff8000501b6540 X26: ffff8000501b6550
(XEN)     X27: ffff8000517bdb40 X28: ffff800052784380  FP: ffff00002212bd50
(XEN)
(XEN)   VTCR_EL2: 80023558
(XEN)  VTTBR_EL2: 0002000765f04000
(XEN)
(XEN)  SCTLR_EL2: 30cd183d
(XEN)    HCR_EL2: 000000008078663f
(XEN)  TTBR0_EL2: 00000000781c5000
(XEN)
(XEN)    ESR_EL2: f2000001
(XEN)  HPFAR_EL2: 0000000000030010
(XEN)    FAR_EL2: ffff000008005f00
(XEN)
(XEN) Xen stack trace from sp=0000800725eafaf0:
(XEN)    0000000000000000 0000800725f07000 000000000021a93c 000000000021a930
(XEN)    0000000000236d18 0000800725f1b990 0000800725eafeb0 0000800725eafeb0
(XEN)    0000800725eaff30 00000000a0000145 000000005a000ea1 ffff000008f86528
(XEN)    0000800725f566a0 00000000002a9088 ffff00000814acc0 0000000a55bb542d
(XEN)    00000000002789d8 0000800725f1b990 00000000002b6cb8 0000800725f03920
(XEN)    00000000002b6cb8 0000000c9084be29 0000000000310228 0000800725eafbf0
(XEN)    ffff00000814f160 0000800725eafbe0 0000000000310228 0000800725eafbf0
(XEN)    0000000000310228 0000000100000002 00000000002b6cb8 0000000000237aa8
(XEN)    0000000000000002 0000000000000000 0000000000000000 0000800725eafc70
(XEN)    0000800725ecd6c8 0000800725ecddf0 00000000000000a0 0000000000000240
(XEN)    000000000026c920 0000800725ecd128 0000000000000002 0000000000000000
(XEN)    0000800725ecd000 0000000000000001 000000000026bfa8 0000000c90bf0b49
(XEN)    0000000000000002 0000000000000000 0000000000276fc4 0000000000000001
(XEN)    0000800725e64000 0000800725e68a60 0000800725e64000 0000800725f566a0
(XEN)    000000000023f53c 000000000027dc14 0000000000311390 0000000000346008
(XEN)    000000000027651c 0000800725eafdd8 0000000000240e44 0000000000000004
(XEN)    0000000000311390 0000000000240e80 0000800725e64000 0000000000240f20
(XEN)    ffff000022127ff0 000000000009c041 000000005a000ea1 ffff800011ae9800
(XEN)    0000000000000124 0000000000240f2c fffffffffffffff2 0000000000000001
(XEN)    0000800725e68ec0 0000800725e68ed0 000000000024694c 00008004dc040000
(XEN)    0000800725e68ec0 00008004dc040000 0000000000247c88 0000000000247c7c
(XEN)    ffffffffffffffea 0000000000000001 ffff800011ae9e80 0000000000000002
(XEN)    000000005a000ea1 0000ffffc52989c0 00000000002463b8 000000000024613c
(XEN)    0000800725eafeb0 0000800725eafeb0 0000800725eaff30 0000000060000145
(XEN)    00000000002789d8 0000800725eafeb0 0000800725eafeb0 01ff00000935de80
(XEN)    0000800725eca000 0000000000310280 0000000000000000 0000000000000000
(XEN)    000000005a000ea1 ffff800011ae9800 0000000000000124 000000000000001d
(XEN)    0000000000000004 000000000027c984 000000005a000ea1 0000800725eafeb0
(XEN)    000000005a000ea1 000000000027a144 0000000000000000 ffff80004e381f80
(XEN)    0000800725eaffb8 0000000000262c58 0000000000262c4c 07e0000160000249
(XEN)    000000000000000f ffff00002212bd90 ffff80004e381f80 000000000009c041
(XEN)    ffff7dffff000000 0000ffffb603d000 0000000000000000 0000ffffb603c000
(XEN)    ffff8000521fdc08 0000000000000200 ffff8000517bdb60 0000000000000000
(XEN)    0000000000000000 0000ffffc529614f 000000000000001b 0000000000000001
(XEN)    000000000000000c 0000ffffb5eaf070 000000000000010f 0000000000000002
(XEN)    ffff80004e381f80 0000000000007ff0 ffff7e0000000000 0000000000000001
(XEN)    ffff000008f86528 ffff8000501b6540 ffff8000501b6550 ffff8000517bdb40
(XEN)    ffff800052784380 ffff00002212bd50 ffff000008537ab8 ffffffffffffffff
(XEN)    ffff0000080c1790 5a000ea1a0000145 0000000060000000 0000000000000000
(XEN)    0000000000000000 ffff800052784380 ffff00002212bd50 0000ffffb5eaf078
(XEN) Xen call trace:
(XEN)    [<0000000000272a48>] put_page+0xa0/0xc4 (PC)
(XEN)    [<0000000000272a48>] put_page+0xa0/0xc4 (LR)
(XEN)


(XEN) hvm_free_ioreq_mfn[417] ---> ref = 1

(XEN) Xen BUG at ...tAUTOINC+bb71237a55-r0/git/xen/include/xen/mm.h:683
(XEN) ----[ Xen-4.14.0  arm64  debug=y   Not tainted ]----
(XEN) CPU:    2
(XEN) PC:     0000000000246e2c ioreq.c#hvm_free_ioreq_mfn+0xbc/0xc0
(XEN) LR:     0000000000246dcc
(XEN) SP:     0000800725eafd70
(XEN) CPSR:   60000249 MODE:64-bit EL2h (Hypervisor, handler)
(XEN)      X0: 0000000000000001  X1: 403fffffffffffff  X2: 000000000000001f
(XEN)      X3: 0000000080000000  X4: 0000000000000000  X5: 0000000000400000
(XEN)      X6: 0080808080808080  X7: fefefefefefeff09  X8: 7f7f7f7f7f7f7f7f
(XEN)      X9: 756e6c64513d313b X10: 7f7f7f7f7f7f7f7f X11: 0101010101010101
(XEN)     X12: 0000000000000008 X13: 000000000028b7d0 X14: 0000800725eafac8
(XEN)     X15: 0000000000000020 X16: 0000000000000000 X17: 0000ffffb5eab3d0
(XEN)     X18: 000000000000010f X19: 0000000810e60e38 X20: 0000000810e60e48
(XEN)     X21: 0000000000000000 X22: 00008004dc0404a0 X23: 000000005a000ea1
(XEN)     X24: ffff800011ae9800 X25: 0000000000000124 X26: 000000000000001d
(XEN)     X27: ffff000008ad1000 X28: ffff800052784380  FP: ffff00002212bd20
(XEN)
(XEN)   VTCR_EL2: 80023558
(XEN)  VTTBR_EL2: 0002000765f04000
(XEN)
(XEN)  SCTLR_EL2: 30cd183d
(XEN)    HCR_EL2: 000000008078663f
(XEN)  TTBR0_EL2: 00000000781c5000
(XEN)
(XEN)    ESR_EL2: f2000001
(XEN)  HPFAR_EL2: 0000000000030010
(XEN)    FAR_EL2: ffff000008005f00
(XEN)
(XEN) Xen stack trace from sp=0000800725eafd70:
(XEN)    0000800725e68ec0 0000800725e68ec0 0000000000246f7c 0000800725e68ec0
(XEN)    00008004dc040000 00000000002476cc ffffffffffffffea 0000000000000001
(XEN)    ffff800011ae9e80 0000000000000002 00000000002462bc 000000000024613c
(XEN)    0000800725eafeb0 0000800725eafeb0 0000800725eaff30 0000000060000145
(XEN)    00000000002789d8 0000800725fb50a4 00000000ffffffff 0100000000277704
(XEN)    00008004dc040000 0000000000000006 0000000000000000 0000000000310288
(XEN)    0000000000000004 0000000125ec0002 0000ffffc52989a8 0000000000000020
(XEN)    0000000000000004 000000000027c984 000000005a000ea1 0000800725eafeb0
(XEN)    000000005a000ea1 000000000027a144 0000000000000000 ffff800011ae9100
(XEN)    0000800725eaffb8 0000000000262c58 0000000000262c4c 07e0000160000249
(XEN)    0000000000000002 0000000000000001 ffff800011ae9e80 ffff800011ae9e88
(XEN)    ffff800011ae9108 ffff800052784380 000000005068e148 0000ffffc5498000
(XEN)    ffff80005156ac10 0000000000000000 00e800008f88bf53 0400000000000000
(XEN)    ffff7e00013e22c0 0000000000000002 0000000000000001 0000000000000001
(XEN)    0000000000000029 0000ffffb5eab3d0 000000000000010f ffff800011ae9110
(XEN)    ffff800011ae9100 ffff800011ae9110 0000000000000001 ffff800011ae9e80
(XEN)    ffff800011ae9800 0000000000000124 000000000000001d ffff000008ad1000
(XEN)    ffff800052784380 ffff00002212bd20 ffff000008537004 ffffffffffffffff
(XEN)    ffff0000080c17e4 5a000ea160000145 0000000060000000 0000000000000000
(XEN)    0000000000000000 ffff800052784380 ffff00002212bd20 0000ffffb5eab3dc
(XEN)    0000000000000000 0000000000000000
(XEN) Xen call trace:
(XEN)    [<0000000000246e2c>] ioreq.c#hvm_free_ioreq_mfn+0xbc/0xc0 (PC)
(XEN)    [<0000000000246dcc>] ioreq.c#hvm_free_ioreq_mfn+0x5c/0xc0 (LR)
(XEN)
(XEN)
(XEN) ****************************************
(XEN) Panic on CPU 2:
(XEN) Xen BUG at ...tAUTOINC+bb71237a55-r0/git/xen/include/xen/mm.h:683
(XEN) ****************************************
(XEN)
(XEN) Reboot in five seconds...
(XEN)
(XEN) ****************************************
(XEN) Panic on CPU 0:
(XEN) PSCI cpu off failed for CPU0 err=-3
(XEN) ****************************************
(XEN)
(XEN) Reboot in five seconds...




-- 
Regards,

Oleksandr Tyshchenko



^ permalink raw reply	[flat|nested] 140+ messages in thread

* Re: [RFC PATCH V1 04/12] xen/arm: Introduce arch specific bits for IOREQ/DM features
  2020-08-13 20:39             ` Oleksandr Tyshchenko
@ 2020-08-13 22:14               ` Julien Grall
  2020-08-14 12:08                 ` Oleksandr
  0 siblings, 1 reply; 140+ messages in thread
From: Julien Grall @ 2020-08-13 22:14 UTC (permalink / raw)
  To: Oleksandr Tyshchenko
  Cc: Stefano Stabellini, xen-devel, Oleksandr Tyshchenko, Ian Jackson,
	Wei Liu, Andrew Cooper, George Dunlap, Jan Beulich,
	Volodymyr Babchuk, Daniel De Graaf, Julien Grall

On Thu, 13 Aug 2020 at 21:40, Oleksandr Tyshchenko <olekstysh@gmail.com> wrote:
>
>
> Hi
>
> Sorry for the possible format issue.
>
> On Thu, Aug 13, 2020 at 9:42 PM Oleksandr <olekstysh@gmail.com> wrote:
>>
>>
>> On 11.08.20 20:50, Julien Grall wrote:
>>
>> Hi Julien
>>
>> >
>> >
>> > On 11/08/2020 18:09, Oleksandr wrote:
>> >>
>> >> On 05.08.20 12:32, Julien Grall wrote:
>> >>
>> >> Hi Julien, Stefano
>> >>
>> >>>
>> >>>>> diff --git a/xen/include/asm-arm/p2m.h b/xen/include/asm-arm/p2m.h
>> >>>>> index 5fdb6e8..5823f11 100644
>> >>>>> --- a/xen/include/asm-arm/p2m.h
>> >>>>> +++ b/xen/include/asm-arm/p2m.h
>> >>>>> @@ -385,10 +385,11 @@ static inline int set_foreign_p2m_entry(struct domain *d, unsigned long gfn,
>> >>>>>                                           mfn_t mfn)
>> >>>>>   {
>> >>>>>       /*
>> >>>>> -     * NOTE: If this is implemented then proper reference counting of
>> >>>>> -     *       foreign entries will need to be implemented.
>> >>>>> +     * XXX: handle properly reference. It looks like the page may not always
>> >>>>> +     * belong to d.
>> >>>>
>> >>>> Just as a reference, and without taking away anything from the
>> >>>> comment,
>> >>>> I think that QEMU is doing its own internal reference counting for
>> >>>> these
>> >>>> mappings.
>> >>>
>> >>> I am not sure how this matters here? We can't really trust the DM to
>> >>> do the right thing if it is not running in dom0.
>> >>>
>> >>> But, IIRC, the problem is that some of the pages don't belong to a
>> >>> domain, so it is not possible to treat them as foreign mappings (e.g. 
>> >>> you wouldn't be able to grab a reference). This investigation was
>> >>> done a couple of years ago, so this may have changed in recent Xen.
>> >>
>> >> Well, the emulator is going to be used in a driver domain, so this TODO 
>> >> must be resolved. I suspect that the check for a hardware domain in
>> >> acquire_resource() which I skipped in a hackish way [1] could be
>> >> simply removed once proper reference counting is implemented in Xen,
>> >> correct?
>> >
>> > It depends how you are going to solve it. If you manage to solve it in
>> > a generic way, then yes, you could remove the check. If not (i.e. it is solved
>> > in an arch-specific way), we would need to keep the check on arch that
>> > are not able to deal with it. See more below.
>> >
>> >>
>> >> Could you please provide some pointers on that problem? Maybe some
>> >> questions need to be investigated again? Unfortunately, it is not
>> >> completely clear to me the direction to follow...
>> >>
>> >> ***
>> >> I am wondering whether the similar problem exists on x86 as well?
>> >
>> > It is somewhat different. On Arm, we are able to handle foreign
>> > mappings properly (i.e. mapping a page from another domain) as we would
>> > grab a reference on the page (see XENMAPSPACE_gmfn_foreign handling in
>> > xenmem_add_to_physmap()). The reference will then be released when the
>> > entry is removed from the P2M (see p2m_free_entry()).
>> >
>> > If all the pages given to set_foreign_p2m_entry() belong to a domain,
>> > then you could use the same approach.
>> >
>> > However, I remember running into some issues in some of the cases. I
>> > had a quick look at the caller and I wasn't able to find any use
>> > cases that may be an issue.
>> >
>> > The refcounting in the IOREQ code has changed after XSA-276 (this was
>> > found while working on the Arm port). Probably the best way to figure
>> > out if it works would be to try it and see if it fails.
>> >
>> > Note that set_foreign_p2m_entry() doesn't have a parameter for the
>> > foreign domain. You would need to add an extra parameter for this.
>> >
>> >> The FIXME tag (before checking for a hardware domain in
>> >> acquire_resource()) in the common code makes me think it is a common
>> >> issue. On the other hand, x86's
>> >> implementation of set_foreign_p2m_entry() exists, unlike Arm's
>> >> (which returned -EOPNOTSUPP so far). Or are these unrelated?
>> >
>> > At the moment, x86 doesn't support refcounting for foreign mapping.
>> > Hence the reason to restrict them to the hardware domain.
>>
>>
>> Thank you for the pointers!
>>
>>
>> I checked that all pages given to set_foreign_p2m_entry() belonged to a
>> domain (at least in my use-case). I noticed two calls for acquiring
>> resource at the DomU creation time, the first call was for grant table
>> (single gfn)
>> and the second for ioreq server which carried 2 gfns (for shared and
>> buffered rings I assume). For test purposes, I passed these gfns to
>> get_page_from_gfn() in order to grab references on the pages; after that
>> I tried to destroy the DomU without calling put_page() for these pages.
>> The fact that I couldn't destroy the DomU completely (a zombie domain was
>> observed) made me think that the references were still held, so it worked
>> as expected.
>>
>>
>> I implemented a test patch (which uses the approach from
>> xenmem_add_to_physmap_one() for the XENMAPSPACE_gmfn_foreign case) to check
>> whether it would work.
>>
>>
>> ---
>>   xen/arch/arm/p2m.c        | 30 ++++++++++++++++++++++++++++++
>>   xen/common/memory.c       |  2 +-
>>   xen/include/asm-arm/p2m.h | 12 ++----------
>>   3 files changed, 33 insertions(+), 11 deletions(-)
>>
>> diff --git a/xen/arch/arm/p2m.c b/xen/arch/arm/p2m.c
>> index e9ccba8..7359715 100644
>> --- a/xen/arch/arm/p2m.c
>> +++ b/xen/arch/arm/p2m.c
>> @@ -1385,6 +1385,36 @@ int guest_physmap_remove_page(struct domain *d, gfn_t gfn, mfn_t mfn,
>>       return p2m_remove_mapping(d, gfn, (1 << page_order), mfn);
>>   }
>>
>> +int set_foreign_p2m_entry(struct domain *d, struct domain *fd,
>> +                          unsigned long gfn, mfn_t mfn)
>> +{
>> +    struct page_info *page;
>> +    p2m_type_t p2mt;
>> +    int rc;
>> +
>> +    /*
>> +     * Take a reference to the foreign domain page. The reference will be
>> +     * released in p2m_put_l3_page().
>> +     */
>> +    page = get_page_from_gfn(fd, gfn, &p2mt, P2M_ALLOC);
>> +    if ( !page )
>> +        return -EINVAL;
>> +
>> +    if ( p2m_is_ram(p2mt) )
>> +        p2mt = (p2mt == p2m_ram_rw) ? p2m_map_foreign_rw : p2m_map_foreign_ro;
>> +    else
>> +    {
>> +        put_page(page);
>> +        return -EINVAL;
>> +    }
>> +
>> +    rc = guest_physmap_add_entry(d, _gfn(gfn), mfn, 0, p2mt);
>> +    if ( rc )
>> +        put_page(page);
>> +
>> +    return rc;
>> +}
>> +
>>   static struct page_info *p2m_allocate_root(void)
>>   {
>>       struct page_info *page;
>> diff --git a/xen/common/memory.c b/xen/common/memory.c
>> index 8d9f0a8..1de1d4f 100644
>> --- a/xen/common/memory.c
>> +++ b/xen/common/memory.c
>> @@ -1171,7 +1171,7 @@ static int acquire_resource(
>>
>>           for ( i = 0; !rc && i < xmar.nr_frames; i++ )
>>           {
>> -            rc = set_foreign_p2m_entry(currd, gfn_list[i],
>> +            rc = set_foreign_p2m_entry(currd, d, gfn_list[i],
>>                                          _mfn(mfn_list[i]));
>>               /* rc should be -EIO for any iteration other than the first */
>>               if ( rc && i )
>> diff --git a/xen/include/asm-arm/p2m.h b/xen/include/asm-arm/p2m.h
>> index 5823f11..53ce373 100644
>> --- a/xen/include/asm-arm/p2m.h
>> +++ b/xen/include/asm-arm/p2m.h
>> @@ -381,16 +381,8 @@ static inline gfn_t gfn_next_boundary(gfn_t gfn, unsigned int order)
>>       return gfn_add(gfn, 1UL << order);
>>   }
>>
>> -static inline int set_foreign_p2m_entry(struct domain *d, unsigned long gfn,
>> -                                        mfn_t mfn)
>> -{
>> -    /*
>> -     * XXX: handle properly reference. It looks like the page may not always
>> -     * belong to d.
>> -     */
>> -
>> -    return guest_physmap_add_entry(d, _gfn(gfn), mfn, 0, p2m_ram_rw);
>> -}
>> +int set_foreign_p2m_entry(struct domain *d, struct domain *fd,
>> +                          unsigned long gfn,  mfn_t mfn);
>>
>>   /*
>>    * A vCPU has cache enabled only when the MMU is enabled and data cache
>> --
>> 2.7.4
>>
>>
>> And with that patch applied I was facing a BUG when destroying/rebooting
>> DomU. The call of put_page_alloc_ref() in hvm_free_ioreq_mfn() triggered
>> that BUG:
>>
>>
>> Rebooting domain 2
>> root@generic-armv8-xt-dom0:~# (XEN) Xen BUG at
>> ...tAUTOINC+bb71237a55-r0/git/xen/include/xen/mm.h:683
>> (XEN) ----[ Xen-4.14.0  arm64  debug=y   Not tainted ]----
>> (XEN) CPU:    3
>> (XEN) PC:     0000000000246f28 ioreq.c#hvm_free_ioreq_mfn+0x68/0x6c
>> (XEN) LR:     0000000000246ef0
>> (XEN) SP:     0000800725eafd80
>> (XEN) CPSR:   60000249 MODE:64-bit EL2h (Hypervisor, handler)
>> (XEN)      X0: 0000000000000001  X1: 403fffffffffffff  X2: 000000000000001f
>> (XEN)      X3: 0000000080000000  X4: 0000000000000000  X5: 0000000000400000
>> (XEN)      X6: 0000800725eafe24  X7: 0000ffffd1ef3e08  X8: 0000000000000020
>> (XEN)      X9: 0000000000000000 X10: 00e800008ecebf53 X11: 0400000000000000
>> (XEN)     X12: ffff7e00013b3ac0 X13: 0000000000000002 X14: 0000000000000001
>> (XEN)     X15: 0000000000000001 X16: 0000000000000029 X17: 0000ffff9badb3d0
>> (XEN)     X18: 000000000000010f X19: 0000000810e60e38 X20: 0000800725e68ec0
>> (XEN)     X21: 0000000000000000 X22: 00008004dc0404a0 X23: 000000005a000ea1
>> (XEN)     X24: ffff8000460ec280 X25: 0000000000000124 X26: 000000000000001d
>> (XEN)     X27: ffff000008ad1000 X28: ffff800052e65100  FP: ffff0000223dbd20
>> (XEN)
>> (XEN)   VTCR_EL2: 80023558
>> (XEN)  VTTBR_EL2: 0002000765f04000
>> (XEN)
>> (XEN)  SCTLR_EL2: 30cd183d
>> (XEN)    HCR_EL2: 000000008078663f
>> (XEN)  TTBR0_EL2: 00000000781c5000
>> (XEN)
>> (XEN)    ESR_EL2: f2000001
>> (XEN)  HPFAR_EL2: 0000000000030010
>> (XEN)    FAR_EL2: ffff000008005f00
>> (XEN)
>> (XEN) Xen stack trace from sp=0000800725eafd80:
>> (XEN)    0000800725e68ec0 0000000000247078 00008004dc040000 00000000002477c8
>> (XEN)    ffffffffffffffea 0000000000000001 ffff8000460ec500 0000000000000002
>> (XEN)    000000000024645c 00000000002462dc 0000800725eafeb0 0000800725eafeb0
>> (XEN)    0000800725eaff30 0000000060000145 000000000027882c 0000800725eafeb0
>> (XEN)    0000800725eafeb0 01ff00000935de80 00008004dc040000 0000000000000006
>> (XEN)    ffff800000000000 0000000000000002 000000005a000ea1 000000019bc60002
>> (XEN)    0000ffffd1ef3e08 0000000000000020 0000000000000004 000000000027c7d8
>> (XEN)    000000005a000ea1 0000800725eafeb0 000000005a000ea1 0000000000279f98
>> (XEN)    0000000000000000 ffff8000460ec200 0000800725eaffb8 0000000000262c58
>> (XEN)    0000000000262c4c 07e0000160000249 0000000000000002 0000000000000001
>> (XEN)    ffff8000460ec500 ffff8000460ec508 ffff8000460ec208 ffff800052e65100
>> (XEN)    000000005060b478 0000ffffd20f3000 ffff7e00013c77e0 0000000000000000
>> (XEN)    00e800008ecebf53 0400000000000000 ffff7e00013b3ac0 0000000000000002
>> (XEN)    0000000000000001 0000000000000001 0000000000000029 0000ffff9badb3d0
>> (XEN)    000000000000010f ffff8000460ec210 ffff8000460ec200 ffff8000460ec210
>> (XEN)    0000000000000001 ffff8000460ec500 ffff8000460ec280 0000000000000124
>> (XEN)    000000000000001d ffff000008ad1000 ffff800052e65100 ffff0000223dbd20
>> (XEN)    ffff000008537004 ffffffffffffffff ffff0000080c17e4 5a000ea160000145
>> (XEN)    0000000060000000 0000000000000000 0000000000000000 ffff800052e65100
>> (XEN)    ffff0000223dbd20 0000ffff9badb3dc 0000000000000000 0000000000000000
>> (XEN) Xen call trace:
>> (XEN)    [<0000000000246f28>] ioreq.c#hvm_free_ioreq_mfn+0x68/0x6c (PC)
>> (XEN)    [<0000000000246ef0>] ioreq.c#hvm_free_ioreq_mfn+0x30/0x6c (LR)
>> (XEN)
>> (XEN)
>> (XEN) ****************************************
>> (XEN) Panic on CPU 3:
>> (XEN) Xen BUG at ...tAUTOINC+bb71237a55-r0/git/xen/include/xen/mm.h:683
>> (XEN) ****************************************
>> (XEN)
>> (XEN) Reboot in five seconds...
>> (XEN)
>> (XEN) ****************************************
>> (XEN) Panic on CPU 0:
>> (XEN) PSCI cpu off failed for CPU0 err=-3
>> (XEN) ****************************************
>> (XEN)
>> (XEN) Reboot in five seconds...
>>
>>
>>
>> Either I did something wrong (most likely) or there is an issue with
>> page ref-counting in the IOREQ code. I am still trying to understand
>> what is going on.
>> Some notes on that:
>> 1. I checked that put_page() was called for these pages in
>> p2m_put_l3_page() when destroying domain. This happened before
>> hvm_free_ioreq_mfn() execution.
>> 2. There was no BUG detected if I passed "p2m_ram_rw" instead of
>> "p2m_map_foreign_rw" in guest_physmap_add_entry(), but the DomU couldn't
>> be fully destroyed because of the reference taken.
>
>
> I think I understand why the BUG is triggered.
>
> I checked "page->count_info & PGC_count_mask" and noticed that get_page_from_gfn() doesn't seem to increase the ref counter (but it should?)
>
> 1. hvm_alloc_ioreq_mfn() -> ref 2
> 2. set_foreign_p2m_entry() -> ref still 2
> 3. p2m_put_l3_page() -> ref 1
> 4. hvm_free_ioreq_mfn() calls put_page_alloc_ref() with ref 1 which triggers BUG

I looked again at your diff. It is actually not doing the right thing:
the parameter 'gfn' is a guest frame number in 'd' (your current
domain), not in 'fd'.
So you end up grabbing a reference on the wrong page. You are quite
lucky that 'gfn' also happens to be valid in your foreign domain.

But in this case, you already have the MFN in hand. So what you want
to do is something like:

if (!get_page(mfn_to_page(mfn), fd))
  return -EINVAL;

/* Map page */


^ permalink raw reply	[flat|nested] 140+ messages in thread

* Re: [RFC PATCH V1 04/12] xen/arm: Introduce arch specific bits for IOREQ/DM features
  2020-08-13 22:14               ` Julien Grall
@ 2020-08-14 12:08                 ` Oleksandr
  0 siblings, 0 replies; 140+ messages in thread
From: Oleksandr @ 2020-08-14 12:08 UTC (permalink / raw)
  To: Julien Grall
  Cc: Stefano Stabellini, xen-devel, Oleksandr Tyshchenko, Ian Jackson,
	Wei Liu, Andrew Cooper, George Dunlap, Jan Beulich,
	Volodymyr Babchuk, Daniel De Graaf, Julien Grall


On 14.08.20 01:14, Julien Grall wrote:

Hi Julien

> On Thu, 13 Aug 2020 at 21:40, Oleksandr Tyshchenko <olekstysh@gmail.com> wrote:
>>
>> Hi
>>
>> Sorry for the possible format issue.
>>
>> On Thu, Aug 13, 2020 at 9:42 PM Oleksandr <olekstysh@gmail.com> wrote:
>>>
>>> On 11.08.20 20:50, Julien Grall wrote:
>>>
>>> Hi Julien
>>>
>>>>
>>>> On 11/08/2020 18:09, Oleksandr wrote:
>>>>> On 05.08.20 12:32, Julien Grall wrote:
>>>>>
>>>>> Hi Julien, Stefano
>>>>>
>>>>>>>> diff --git a/xen/include/asm-arm/p2m.h b/xen/include/asm-arm/p2m.h
>>>>>>>> index 5fdb6e8..5823f11 100644
>>>>>>>> --- a/xen/include/asm-arm/p2m.h
>>>>>>>> +++ b/xen/include/asm-arm/p2m.h
>>>>>>>> @@ -385,10 +385,11 @@ static inline int set_foreign_p2m_entry(struct domain *d, unsigned long gfn,
>>>>>>>>                                            mfn_t mfn)
>>>>>>>>    {
>>>>>>>>        /*
>>>>>>>> -     * NOTE: If this is implemented then proper reference counting of
>>>>>>>> -     *       foreign entries will need to be implemented.
>>>>>>>> +     * XXX: handle properly reference. It looks like the page may not always
>>>>>>>> +     * belong to d.
>>>>>>> Just as a reference, and without taking away anything from the
>>>>>>> comment,
>>>>>>> I think that QEMU is doing its own internal reference counting for
>>>>>>> these
>>>>>>> mappings.
>>>>>> I am not sure how this matters here? We can't really trust the DM to
>>>>>> do the right thing if it is not running in dom0.
>>>>>>
>>>>>> But, IIRC, the problem is that some of the pages don't belong to a
>>>>>> domain, so it is not possible to treat them as foreign mappings (e.g.
>>>>>> you wouldn't be able to grab a reference). This investigation was
>>>>>> done a couple of years ago, so this may have changed in recent Xen.
>>>>> Well, the emulator is going to be used in a driver domain, so this TODO
>>>>> must be resolved. I suspect that the check for a hardware domain in
>>>>> acquire_resource() which I skipped in a hackish way [1] could be
>>>>> simply removed once proper reference counting is implemented in Xen,
>>>>> correct?
>>>> It depends how you are going to solve it. If you manage to solve it in
>>>> a generic way, then yes, you could remove the check. If not (i.e. it is solved
>>>> in an arch-specific way), we would need to keep the check on arch that
>>>> are not able to deal with it. See more below.
>>>>
>>>>> Could you please provide some pointers on that problem? Maybe some
>>>>> questions need to be investigated again? Unfortunately, it is not
>>>>> completely clear to me the direction to follow...
>>>>>
>>>>> ***
>>>>> I am wondering whether the similar problem exists on x86 as well?
>>>> It is somewhat different. On Arm, we are able to handle foreign
>>>> mappings properly (i.e. mapping a page from another domain) as we would
>>>> grab a reference on the page (see XENMAPSPACE_gmfn_foreign handling in
>>>> xenmem_add_to_physmap()). The reference will then be released when the
>>>> entry is removed from the P2M (see p2m_free_entry()).
>>>>
>>>> If all the pages given to set_foreign_p2m_entry() belong to a domain,
>>>> then you could use the same approach.
>>>>
>>>> However, I remember running into some issues in some of the cases. I
>>>> had a quick look at the caller and I wasn't able to find any use
>>>> cases that may be an issue.
>>>>
>>>> The refcounting in the IOREQ code has changed after XSA-276 (this was
>>>> found while working on the Arm port). Probably the best way to figure
>>>> out if it works would be to try it and see if it fails.
>>>>
>>>> Note that set_foreign_p2m_entry() doesn't have a parameter for the
>>>> foreign domain. You would need to add an extra parameter for this.
>>>>
>>>>> The FIXME tag (before checking for a hardware domain in
>>>>> acquire_resource()) in the common code makes me think it is a common
>>>>> issue. On the other hand, x86's
>>>>> implementation of set_foreign_p2m_entry() exists, unlike Arm's
>>>>> (which returned -EOPNOTSUPP so far). Or are these unrelated?
>>>> At the moment, x86 doesn't support refcounting for foreign mappings.
>>>> Hence the restriction to the hardware domain.
>>>
>>> Thank you for the pointers!
>>>
>>>
>>> I checked that all pages given to set_foreign_p2m_entry() belonged to a
>>> domain (at least in my use-case). I noticed two calls acquiring a
>>> resource at DomU creation time: the first was for the grant table
>>> (a single gfn) and the second for the ioreq server, which carried 2
>>> gfns (for the shared and buffered rings, I assume). For test purposes,
>>> I passed these gfns to get_page_from_gfn() in order to grab references
>>> on the pages, and then tried to destroy the DomU without calling
>>> put_page() for these pages. The
>>> fact that I couldn't destroy DomU completely (a zombie domain was
>>> observed) made me think that references were still taken, so it worked
>>> as expected.
>>>
>>>
>>> I implemented a test patch (which uses the approach from
>>> xenmem_add_to_physmap_one() for the XENMAPSPACE_gmfn_foreign case) to
>>> check whether it would work.
>>>
>>>
>>> ---
>>>    xen/arch/arm/p2m.c        | 30 ++++++++++++++++++++++++++++++
>>>    xen/common/memory.c       |  2 +-
>>>    xen/include/asm-arm/p2m.h | 12 ++----------
>>>    3 files changed, 33 insertions(+), 11 deletions(-)
>>>
>>> diff --git a/xen/arch/arm/p2m.c b/xen/arch/arm/p2m.c
>>> index e9ccba8..7359715 100644
>>> --- a/xen/arch/arm/p2m.c
>>> +++ b/xen/arch/arm/p2m.c
>>> @@ -1385,6 +1385,36 @@ int guest_physmap_remove_page(struct domain *d,
>>> gfn_t gfn, mfn_t mfn,
>>>        return p2m_remove_mapping(d, gfn, (1 << page_order), mfn);
>>>    }
>>>
>>> +int set_foreign_p2m_entry(struct domain *d, struct domain *fd,
>>> +                          unsigned long gfn, mfn_t mfn)
>>> +{
>>> +    struct page_info *page;
>>> +    p2m_type_t p2mt;
>>> +    int rc;
>>> +
>>> +    /*
>>> +     * Take reference to the foreign domain page. Reference will be
>>> released
>>> +     * in p2m_put_l3_page().
>>> +     */
>>> +    page = get_page_from_gfn(fd, gfn, &p2mt, P2M_ALLOC);
>>> +    if ( !page )
>>> +        return -EINVAL;
>>> +
>>> +    if ( p2m_is_ram(p2mt) )
>>> +        p2mt = (p2mt == p2m_ram_rw) ? p2m_map_foreign_rw :
>>> p2m_map_foreign_ro;
>>> +    else
>>> +    {
>>> +        put_page(page);
>>> +        return -EINVAL;
>>> +    }
>>> +
>>> +    rc = guest_physmap_add_entry(d, _gfn(gfn), mfn, 0, p2mt);
>>> +    if ( rc )
>>> +        put_page(page);
>>> +
>>> +    return rc;
>>> +}
>>> +
>>>    static struct page_info *p2m_allocate_root(void)
>>>    {
>>>        struct page_info *page;
>>> diff --git a/xen/common/memory.c b/xen/common/memory.c
>>> index 8d9f0a8..1de1d4f 100644
>>> --- a/xen/common/memory.c
>>> +++ b/xen/common/memory.c
>>> @@ -1171,7 +1171,7 @@ static int acquire_resource(
>>>
>>>            for ( i = 0; !rc && i < xmar.nr_frames; i++ )
>>>            {
>>> -            rc = set_foreign_p2m_entry(currd, gfn_list[i],
>>> +            rc = set_foreign_p2m_entry(currd, d, gfn_list[i],
>>>                                           _mfn(mfn_list[i]));
>>>                /* rc should be -EIO for any iteration other than the first */
>>>                if ( rc && i )
>>> diff --git a/xen/include/asm-arm/p2m.h b/xen/include/asm-arm/p2m.h
>>> index 5823f11..53ce373 100644
>>> --- a/xen/include/asm-arm/p2m.h
>>> +++ b/xen/include/asm-arm/p2m.h
>>> @@ -381,16 +381,8 @@ static inline gfn_t gfn_next_boundary(gfn_t gfn,
>>> unsigned int order)
>>>        return gfn_add(gfn, 1UL << order);
>>>    }
>>>
>>> -static inline int set_foreign_p2m_entry(struct domain *d, unsigned long
>>> gfn,
>>> -                                        mfn_t mfn)
>>> -{
>>> -    /*
>>> -     * XXX: handle properly reference. It looks like the page may not
>>> always
>>> -     * belong to d.
>>> -     */
>>> -
>>> -    return guest_physmap_add_entry(d, _gfn(gfn), mfn, 0, p2m_ram_rw);
>>> -}
>>> +int set_foreign_p2m_entry(struct domain *d, struct domain *fd,
>>> +                          unsigned long gfn,  mfn_t mfn);
>>>
>>>    /*
>>>     * A vCPU has cache enabled only when the MMU is enabled and data cache
>>> --
>>> 2.7.4
>>>
>>>
>>> And with that patch applied I faced a BUG when destroying/rebooting
>>> the DomU. The call to put_page_alloc_ref() in hvm_free_ioreq_mfn()
>>> triggered it:
>>>
>>>
>>> Rebooting domain 2
>>> root@generic-armv8-xt-dom0:~# (XEN) Xen BUG at
>>> ...tAUTOINC+bb71237a55-r0/git/xen/include/xen/mm.h:683
>>> (XEN) ----[ Xen-4.14.0  arm64  debug=y   Not tainted ]----
>>> (XEN) CPU:    3
>>> (XEN) PC:     0000000000246f28 ioreq.c#hvm_free_ioreq_mfn+0x68/0x6c
>>> (XEN) LR:     0000000000246ef0
>>> (XEN) SP:     0000800725eafd80
>>> (XEN) CPSR:   60000249 MODE:64-bit EL2h (Hypervisor, handler)
>>> (XEN)      X0: 0000000000000001  X1: 403fffffffffffff  X2: 000000000000001f
>>> (XEN)      X3: 0000000080000000  X4: 0000000000000000  X5: 0000000000400000
>>> (XEN)      X6: 0000800725eafe24  X7: 0000ffffd1ef3e08  X8: 0000000000000020
>>> (XEN)      X9: 0000000000000000 X10: 00e800008ecebf53 X11: 0400000000000000
>>> (XEN)     X12: ffff7e00013b3ac0 X13: 0000000000000002 X14: 0000000000000001
>>> (XEN)     X15: 0000000000000001 X16: 0000000000000029 X17: 0000ffff9badb3d0
>>> (XEN)     X18: 000000000000010f X19: 0000000810e60e38 X20: 0000800725e68ec0
>>> (XEN)     X21: 0000000000000000 X22: 00008004dc0404a0 X23: 000000005a000ea1
>>> (XEN)     X24: ffff8000460ec280 X25: 0000000000000124 X26: 000000000000001d
>>> (XEN)     X27: ffff000008ad1000 X28: ffff800052e65100  FP: ffff0000223dbd20
>>> (XEN)
>>> (XEN)   VTCR_EL2: 80023558
>>> (XEN)  VTTBR_EL2: 0002000765f04000
>>> (XEN)
>>> (XEN)  SCTLR_EL2: 30cd183d
>>> (XEN)    HCR_EL2: 000000008078663f
>>> (XEN)  TTBR0_EL2: 00000000781c5000
>>> (XEN)
>>> (XEN)    ESR_EL2: f2000001
>>> (XEN)  HPFAR_EL2: 0000000000030010
>>> (XEN)    FAR_EL2: ffff000008005f00
>>> (XEN)
>>> (XEN) Xen stack trace from sp=0000800725eafd80:
>>> (XEN)    0000800725e68ec0 0000000000247078 00008004dc040000 00000000002477c8
>>> (XEN)    ffffffffffffffea 0000000000000001 ffff8000460ec500 0000000000000002
>>> (XEN)    000000000024645c 00000000002462dc 0000800725eafeb0 0000800725eafeb0
>>> (XEN)    0000800725eaff30 0000000060000145 000000000027882c 0000800725eafeb0
>>> (XEN)    0000800725eafeb0 01ff00000935de80 00008004dc040000 0000000000000006
>>> (XEN)    ffff800000000000 0000000000000002 000000005a000ea1 000000019bc60002
>>> (XEN)    0000ffffd1ef3e08 0000000000000020 0000000000000004 000000000027c7d8
>>> (XEN)    000000005a000ea1 0000800725eafeb0 000000005a000ea1 0000000000279f98
>>> (XEN)    0000000000000000 ffff8000460ec200 0000800725eaffb8 0000000000262c58
>>> (XEN)    0000000000262c4c 07e0000160000249 0000000000000002 0000000000000001
>>> (XEN)    ffff8000460ec500 ffff8000460ec508 ffff8000460ec208 ffff800052e65100
>>> (XEN)    000000005060b478 0000ffffd20f3000 ffff7e00013c77e0 0000000000000000
>>> (XEN)    00e800008ecebf53 0400000000000000 ffff7e00013b3ac0 0000000000000002
>>> (XEN)    0000000000000001 0000000000000001 0000000000000029 0000ffff9badb3d0
>>> (XEN)    000000000000010f ffff8000460ec210 ffff8000460ec200 ffff8000460ec210
>>> (XEN)    0000000000000001 ffff8000460ec500 ffff8000460ec280 0000000000000124
>>> (XEN)    000000000000001d ffff000008ad1000 ffff800052e65100 ffff0000223dbd20
>>> (XEN)    ffff000008537004 ffffffffffffffff ffff0000080c17e4 5a000ea160000145
>>> (XEN)    0000000060000000 0000000000000000 0000000000000000 ffff800052e65100
>>> (XEN)    ffff0000223dbd20 0000ffff9badb3dc 0000000000000000 0000000000000000
>>> (XEN) Xen call trace:
>>> (XEN)    [<0000000000246f28>] ioreq.c#hvm_free_ioreq_mfn+0x68/0x6c (PC)
>>> (XEN)    [<0000000000246ef0>] ioreq.c#hvm_free_ioreq_mfn+0x30/0x6c (LR)
>>> (XEN)
>>> (XEN)
>>> (XEN) ****************************************
>>> (XEN) Panic on CPU 3:
>>> (XEN) Xen BUG at ...tAUTOINC+bb71237a55-r0/git/xen/include/xen/mm.h:683
>>> (XEN) ****************************************
>>> (XEN)
>>> (XEN) Reboot in five seconds...
>>> (XEN)
>>> (XEN) ****************************************
>>> (XEN) Panic on CPU 0:
>>> (XEN) PSCI cpu off failed for CPU0 err=-3
>>> (XEN) ****************************************
>>> (XEN)
>>> (XEN) Reboot in five seconds...
>>>
>>>
>>>
>>> Either I did something wrong (most likely) or there is an issue with
>>> page ref-counting in the IOREQ code. I am still trying to understand
>>> what is going on.
>>> Some notes on that:
>>> 1. I checked that put_page() was called for these pages in
>>> p2m_put_l3_page() when destroying domain. This happened before
>>> hvm_free_ioreq_mfn() execution.
>>> 2. There was no BUG detected if I passed "p2m_ram_rw" instead of
>>> "p2m_map_foreign_rw" in guest_physmap_add_entry(), but the DomU couldn't
>>> be fully destroyed because of the reference taken.
>>
>> I think I understand why BUG is triggered.
>>
>> I checked "page->count_info & PGC_count_mask" and noticed that get_page_from_gfn() doesn't seem to increase the ref counter (but it should?)
>>
>> 1. hvm_alloc_ioreq_mfn() -> ref 2
>> 2. set_foreign_p2m_entry() -> ref still 2
>> 3. p2m_put_l3_page() -> ref 1
>> 4. hvm_free_ioreq_mfn() calls put_page_alloc_ref() with ref 1 which triggers BUG
> I looked again at your diff. It is actually not doing the right thing.
> The parameter 'gfn' is a physical frame from 'd' (your current domain)
> not 'fd'.
> So you will end up grabbing a reference count on the wrong page. You
> are quite lucky the 'gfn' is also valid for your foreign domain.
>
> But in this case, you already have the MFN in hand. So what you want
> to do is something like:
>
> if (!get_page(mfn_to_page(mfn), fd))
>    return -EINVAL;
>
> /* Map page */


Indeed, thank you for the pointer. Now it is clear to me what went
wrong. With the proposed change I didn't face any issues in my setup!


BTW, below is the IOREQ server page life-cycle, which looks correct.

create domain:

(XEN) 0000000810e60e38(0->1): hvm_alloc_ioreq_mfn() -> alloc_domheap_page()
(XEN) 0000000810e60e38(1->2): hvm_alloc_ioreq_mfn() -> get_page_and_type()
(XEN) 0000000810e60e38(2->3): acquire_resource() -> set_foreign_p2m_entry() -> get_page()

reboot domain:

(XEN) 0000000810e60e38(3->4): do_memory_op() -> get_page_from_gfn()
(XEN) 0000000810e60e38(4->3): do_memory_op() -> guest_physmap_remove_page() -> p2m_put_l3_page() -> put_page()
(XEN) 0000000810e60e38(3->2): do_memory_op() -> put_page()
(XEN) 0000000810e60e38(2->1): hvm_free_ioreq_mfn() -> put_page_alloc_ref()
(XEN) 0000000810e60e38(1->0): hvm_free_ioreq_mfn() -> put_page_and_type()


---
  xen/arch/arm/p2m.c        | 16 ++++++++++++++++
  xen/common/memory.c       |  2 +-
  xen/include/asm-arm/p2m.h | 12 ++----------
  3 files changed, 19 insertions(+), 11 deletions(-)

diff --git a/xen/arch/arm/p2m.c b/xen/arch/arm/p2m.c
index e9ccba8..4c99dd6 100644
--- a/xen/arch/arm/p2m.c
+++ b/xen/arch/arm/p2m.c
@@ -1385,6 +1385,22 @@ int guest_physmap_remove_page(struct domain *d, 
gfn_t gfn, mfn_t mfn,
      return p2m_remove_mapping(d, gfn, (1 << page_order), mfn);
  }

+int set_foreign_p2m_entry(struct domain *d, struct domain *fd,
+                          unsigned long gfn, mfn_t mfn)
+{
+    struct page_info *page = mfn_to_page(mfn);
+    int rc;
+
+    if ( !get_page(page, fd) )
+        return -EINVAL;
+
+    rc = guest_physmap_add_entry(d, _gfn(gfn), mfn, 0, p2m_map_foreign_rw);
+    if ( rc )
+        put_page(page);
+
+    return rc;
+}
+
  static struct page_info *p2m_allocate_root(void)
  {
      struct page_info *page;
diff --git a/xen/common/memory.c b/xen/common/memory.c
index 8d9f0a8..1de1d4f 100644
--- a/xen/common/memory.c
+++ b/xen/common/memory.c
@@ -1171,7 +1171,7 @@ static int acquire_resource(

          for ( i = 0; !rc && i < xmar.nr_frames; i++ )
          {
-            rc = set_foreign_p2m_entry(currd, gfn_list[i],
+            rc = set_foreign_p2m_entry(currd, d, gfn_list[i],
                                         _mfn(mfn_list[i]));
              /* rc should be -EIO for any iteration other than the first */
              if ( rc && i )
diff --git a/xen/include/asm-arm/p2m.h b/xen/include/asm-arm/p2m.h
index 5823f11..53ce373 100644
--- a/xen/include/asm-arm/p2m.h
+++ b/xen/include/asm-arm/p2m.h
@@ -381,16 +381,8 @@ static inline gfn_t gfn_next_boundary(gfn_t gfn, 
unsigned int order)
      return gfn_add(gfn, 1UL << order);
  }

-static inline int set_foreign_p2m_entry(struct domain *d, unsigned long 
gfn,
-                                        mfn_t mfn)
-{
-    /*
-     * XXX: handle properly reference. It looks like the page may not 
always
-     * belong to d.
-     */
-
-    return guest_physmap_add_entry(d, _gfn(gfn), mfn, 0, p2m_ram_rw);
-}
+int set_foreign_p2m_entry(struct domain *d, struct domain *fd,
+                          unsigned long gfn,  mfn_t mfn);

  /*
   * A vCPU has cache enabled only when the MMU is enabled and data cache
-- 
2.7.4



-- 
Regards,

Oleksandr Tyshchenko



^ permalink raw reply related	[flat|nested] 140+ messages in thread

* Re: [RFC PATCH V1 07/12] A collection of tweaks to be able to run emulator in driver domain
  2020-08-06  9:27         ` Jan Beulich
@ 2020-08-14 16:30           ` Oleksandr
  2020-08-16 15:36             ` Julien Grall
  0 siblings, 1 reply; 140+ messages in thread
From: Oleksandr @ 2020-08-14 16:30 UTC (permalink / raw)
  To: xen-devel
  Cc: Jan Beulich, paul, 'Oleksandr Tyshchenko',
	'Andrew Cooper', 'George Dunlap',
	'Ian Jackson', 'Julien Grall',
	'Stefano Stabellini', 'Wei Liu',
	'Daniel De Graaf'


Hello all.


>>>> -----Original Message-----
>>>> From: Jan Beulich <jbeulich@suse.com>
>>>> Sent: 05 August 2020 17:20
>>>> To: Oleksandr Tyshchenko <olekstysh@gmail.com>; Paul Durrant <paul@xen.org>
>>>> Cc: xen-devel@lists.xenproject.org; Oleksandr Tyshchenko <oleksandr_tyshchenko@epam.com>; Andrew
>>>> Cooper <andrew.cooper3@citrix.com>; George Dunlap <george.dunlap@citrix.com>; Ian Jackson
>>>> <ian.jackson@eu.citrix.com>; Julien Grall <julien@xen.org>; Stefano Stabellini
>>>> <sstabellini@kernel.org>; Wei Liu <wl@xen.org>; Daniel De Graaf <dgdegra@tycho.nsa.gov>
>>>> Subject: Re: [RFC PATCH V1 07/12] A collection of tweaks to be able to run emulator in driver domain
>>>>
>>>> On 03.08.2020 20:21, Oleksandr Tyshchenko wrote:
>>>>> From: Oleksandr Tyshchenko <oleksandr_tyshchenko@epam.com>
>>>>>
>>>>> Trying to run an emulator in a driver domain, I ran into various
>>>>> issues, mostly policy-related. So this patch tries to resolve them
>>>>> all, probably in a hackish way. I would like to get feedback on how
>>>>> to implement them properly, as having an emulator in a driver domain
>>>>> is a completely valid use-case.
>>>>   From going over the comments I can only derive you want to run
>>>> an emulator in a driver domain, which doesn't really make sense
>>>> to me. A driver domain has a different purpose after all. If
>>>> instead you mean it to be run in just some other domain (which
>>>> also isn't the domain controlling the target), then there may
>>>> be more infrastructure changes needed.
>>>>
>>>> Paul - was/is your standalone ioreq server (demu?) able to run
>>>> in other than the domain controlling a guest?
>>>>
>>> Not something I've done yet, but it was always part of the idea so that we could e.g. pass through a device to a dedicated domain and then run multiple demu instances there to virtualize it for many domUs. (I'm thinking here of a device that is not SR-IOV and hence would need some bespoke emulation code to share it out.) That dedicated domain would be termed the 'driver domain' simply because it is running the device driver for the h/w that underpins the emulation.
>> I may abuse "driver domain" terminology, but indeed in our use-case we
>> pass through a set of H/W devices to a dedicated domain which is running
>> the device drivers for that H/W. Our target system comprises a thin
>> Dom0 (without H/W devices at all), DomD (which owns most of the H/W
>> devices) and DomU which runs on virtual devices. This patch tries to
>> make changes on the Xen side to be able to run a standalone ioreq server
>> (emulator) in that dedicated (driver?) domain.
> Okay, in which case I'm fine with the term. I simply wasn't aware of the
> targeted scenario, sorry.


May I kindly ask for pointers on how to *properly* resolve the various
policy-related issues described in that patch? Without resolving them,
it won't be possible to run a standalone IOREQ server in a driver
domain.


-- 
Regards,

Oleksandr Tyshchenko



^ permalink raw reply	[flat|nested] 140+ messages in thread

* Re: [RFC PATCH V1 00/12] IOREQ feature (+ virtio-mmio) on Arm
  2020-08-03 18:21 [RFC PATCH V1 00/12] IOREQ feature (+ virtio-mmio) on Arm Oleksandr Tyshchenko
                   ` (11 preceding siblings ...)
  2020-08-03 18:21 ` [RFC PATCH V1 12/12] libxl: Fix duplicate memory node in DT Oleksandr Tyshchenko
@ 2020-08-15 17:24 ` Julien Grall
  2020-08-16 19:34   ` Oleksandr
  12 siblings, 1 reply; 140+ messages in thread
From: Julien Grall @ 2020-08-15 17:24 UTC (permalink / raw)
  To: Oleksandr Tyshchenko, xen-devel
  Cc: Oleksandr Tyshchenko, Jan Beulich, Andrew Cooper, Wei Liu,
	Roger Pau Monné,
	George Dunlap, Ian Jackson, Stefano Stabellini, Paul Durrant,
	Jun Nakajima, Kevin Tian, Tim Deegan, Volodymyr Babchuk,
	Daniel De Graaf, Anthony PERARD, Bertrand Marquis

Hi Oleksandr,

On 03/08/2020 19:21, Oleksandr Tyshchenko wrote:
> From: Oleksandr Tyshchenko <oleksandr_tyshchenko@epam.com>
> 
> Hello all.
> 
> The purpose of this patch series is to add IOREQ/DM support to Xen on Arm.
> You can find an initial discussion at [1]. Xen on Arm requires some implementation
> to forward guest MMIO access to a device model in order to implement virtio-mmio
> backend or even a mediator outside of the hypervisor. As Xen on x86 already
> contains the required support, this patch series tries to make it common and
> introduce Arm-specific bits plus some new functionality. The patch series is
> based on Julien's
> PoC "xen/arm: Add support for Guest IO forwarding to a device emulator".
> Besides splitting existing IOREQ/DM support and introducing Arm side,
> the patch series also includes virtio-mmio related changes (toolstack)
> for the reviewers to be able to see how the whole picture could look.
> For a non-RFC, the IOREQ/DM and virtio-mmio support will be sent separately.
> 
> According to the initial discussion there are a few open questions/concerns
> regarding security, performance in VirtIO solution:
> 1. virtio-mmio vs virtio-pci, SPI vs MSI, different use-cases require different
>     transport...
> 2. virtio backend is able to access all guest memory, some kind of protection
>     is needed: 'virtio-iommu in Xen' vs 'pre-shared-memory & memcpys in guest'
> 3. interface between toolstack and 'out-of-qemu' virtio backend, avoid using
>     Xenstore in virtio backend if possible.
> 4. a lot of 'foreign mapping' could lead to memory exhaustion, Julien
>     has some idea regarding that.
> 
> Looks like all of them are valid and worth considering, but the first thing
> which we need on Arm is a mechanism to forward guest IO to a device emulator,
> so let's focus on it in the first place.
> 
> ***
> 
> Patch series [2] was rebased on Xen v4.14 release and tested on Renesas Salvator-X
> board + H3 ES3.0 SoC (Arm64) with virtio-mmio disk backend (we will share it later)
> running in driver domain and unmodified Linux Guest running on existing
> virtio-blk driver (frontend). No issues were observed. Guest domain 'reboot/destroy'
> use-cases work properly. Patch series was only build-tested on x86.
> 
> Please note, build-test passed for the following modes:
> 1. x86: CONFIG_HVM=y / CONFIG_IOREQ_SERVER=y (default)
> 2. x86: #CONFIG_HVM is not set / #CONFIG_IOREQ_SERVER is not set
> 3. Arm64: CONFIG_HVM=y / CONFIG_IOREQ_SERVER=y (default)
> 4. Arm64: CONFIG_HVM=y / #CONFIG_IOREQ_SERVER is not set
> 5. Arm32: CONFIG_HVM=y / #CONFIG_IOREQ_SERVER is not set
> 
> Build-test didn't pass for Arm32 mode with 'CONFIG_IOREQ_SERVER=y' due to the lack of
> cmpxchg_64 support on Arm32. See cmpxchg usage in hvm_send_buffered_ioreq().

I have sent a patch to implement cmpxchg64() and guest_cmpxchg64() (see 
[1]).

Cheers,

[1] 
https://lore.kernel.org/xen-devel/20200815172143.1327-1-julien@xen.org/T/#u

-- 
Julien Grall


^ permalink raw reply	[flat|nested] 140+ messages in thread

* Re: [RFC PATCH V1 01/12] hvm/ioreq: Make x86's IOREQ feature common
  2020-08-03 18:21 ` [RFC PATCH V1 01/12] hvm/ioreq: Make x86's IOREQ feature common Oleksandr Tyshchenko
                     ` (2 preceding siblings ...)
  2020-08-05 16:15   ` Andrew Cooper
@ 2020-08-15 17:30   ` Julien Grall
  2020-08-16 19:37     ` Oleksandr
  3 siblings, 1 reply; 140+ messages in thread
From: Julien Grall @ 2020-08-15 17:30 UTC (permalink / raw)
  To: Oleksandr Tyshchenko, xen-devel
  Cc: Kevin Tian, Stefano Stabellini, Jun Nakajima, Wei Liu,
	Paul Durrant, Andrew Cooper, Ian Jackson, George Dunlap,
	Tim Deegan, Oleksandr Tyshchenko, Julien Grall, Jan Beulich,
	Roger Pau Monné

Hi Oleksandr,

On 03/08/2020 19:21, Oleksandr Tyshchenko wrote:
> +static int hvm_send_buffered_ioreq(struct hvm_ioreq_server *s, ioreq_t *p)
> +{

[...]

> +    /* Canonicalize read/write pointers to prevent their overflow. */
> +    while ( (s->bufioreq_handling == HVM_IOREQSRV_BUFIOREQ_ATOMIC) &&
> +            qw++ < IOREQ_BUFFER_SLOT_NUM &&
> +            pg->ptrs.read_pointer >= IOREQ_BUFFER_SLOT_NUM )
> +    {
> +        union bufioreq_pointers old = pg->ptrs, new;
> +        unsigned int n = old.read_pointer / IOREQ_BUFFER_SLOT_NUM;
> +
> +        new.read_pointer = old.read_pointer - n * IOREQ_BUFFER_SLOT_NUM;
> +        new.write_pointer = old.write_pointer - n * IOREQ_BUFFER_SLOT_NUM;
> +        cmpxchg(&pg->ptrs.full, old.full, new.full);

While working on the implementation of cmpxchg(), I realized the
operation will happen on memory shared with the emulator.

This will need to be switched to guest_cmpxchg64() to prevent a domain
from DoSing Xen on Arm.

I looked at the rest of the IOREQ and didn't notice any other example.

Cheers,

-- 
Julien Grall


^ permalink raw reply	[flat|nested] 140+ messages in thread

* Re: [RFC PATCH V1 04/12] xen/arm: Introduce arch specific bits for IOREQ/DM features
  2020-08-04 14:01     ` Julien Grall
  2020-08-04 23:22       ` Stefano Stabellini
@ 2020-08-15 17:56       ` Julien Grall
  2020-08-17 14:36         ` Oleksandr
  1 sibling, 1 reply; 140+ messages in thread
From: Julien Grall @ 2020-08-15 17:56 UTC (permalink / raw)
  To: paul, 'Oleksandr Tyshchenko', xen-devel
  Cc: 'Stefano Stabellini', 'Wei Liu',
	'Andrew Cooper', 'Ian Jackson',
	'George Dunlap', 'Oleksandr Tyshchenko',
	'Julien Grall', 'Jan Beulich',
	'Daniel De Graaf', 'Volodymyr Babchuk'

Hi,

On 04/08/2020 15:01, Julien Grall wrote:
> On 04/08/2020 08:49, Paul Durrant wrote:
>>> diff --git a/tools/libxc/xc_dom_arm.c b/tools/libxc/xc_dom_arm.c
>>> index 931404c..b5fc066 100644
>>> --- a/tools/libxc/xc_dom_arm.c
>>> +++ b/tools/libxc/xc_dom_arm.c
>>> @@ -26,11 +26,19 @@
>>>   #include "xg_private.h"
>>>   #include "xc_dom.h"
>>>
>>> -#define NR_MAGIC_PAGES 4
>>> +
>>>   #define CONSOLE_PFN_OFFSET 0
>>>   #define XENSTORE_PFN_OFFSET 1
>>>   #define MEMACCESS_PFN_OFFSET 2
>>>   #define VUART_PFN_OFFSET 3
>>> +#define IOREQ_SERVER_PFN_OFFSET 4
>>> +
>>> +#define NR_IOREQ_SERVER_PAGES 8
>>> +#define NR_MAGIC_PAGES (4 + NR_IOREQ_SERVER_PAGES)
>>> +
>>> +#define GUEST_MAGIC_BASE_PFN (GUEST_MAGIC_BASE >> XC_PAGE_SHIFT)
>>> +
>>> +#define special_pfn(x)  (GUEST_MAGIC_BASE_PFN + (x))
>>
>> Why introduce 'magic pages' for Arm? It's quite a horrible hack that 
>> we have begun to do away with by adding resource mapping.
> 
> This would require us to mandate at least Linux 4.17 in a domain that 
> will run an IOREQ server. If we don't mandate this, the minimum version 
> would be 4.10 where DM OP was introduced.
> 
> Because of XSA-300, we could technically not safely run an IOREQ server 
> with existing Linux. So it is probably OK to enforce the use of the 
> acquire interface.
One more thing. We are using atomic operations on the IOREQ pages. As
our implementation is based on LL/SC instructions so far, we have a
mitigation in place to prevent a domain from DoSing Xen. However, this
relies on the page being mapped in a single domain at a time.

AFAICT, with the legacy interface, the pages will be mapped in both the 
target and the emulator. So this would defeat the mitigation we have in 
place.

Because the legacy interface relies on foreign mapping, the page has
to be mapped in the target P2M. It might be possible to restrict the
access for the target by setting the p2m r/w bits to 0. This would
still allow the foreign mapping to work, as we only check the p2m type
during mapping.

Anyway, I think we agreed that we want to avoid introducing the legacy
interface. But I wanted to answer just for completeness and keep a
record of potential pitfalls with the legacy interface on Arm.

> 
> Note that I haven't yet looked at the rest of the series. So I am not 
> sure if there is more work necessary to enable it.
> 
> Cheers,
> 

-- 
Julien Grall


^ permalink raw reply	[flat|nested] 140+ messages in thread

* Re: [RFC PATCH V1 07/12] A collection of tweaks to be able to run emulator in driver domain
  2020-08-14 16:30           ` Oleksandr
@ 2020-08-16 15:36             ` Julien Grall
  2020-08-17 15:07               ` Oleksandr
  0 siblings, 1 reply; 140+ messages in thread
From: Julien Grall @ 2020-08-16 15:36 UTC (permalink / raw)
  To: Oleksandr, xen-devel
  Cc: Jan Beulich, paul, 'Oleksandr Tyshchenko',
	'Andrew Cooper', 'George Dunlap',
	'Ian Jackson', 'Stefano Stabellini',
	'Wei Liu', 'Daniel De Graaf'



On 14/08/2020 17:30, Oleksandr wrote:
> 
> Hello all.
> 
> 
>>>>> -----Original Message-----
>>>>> From: Jan Beulich <jbeulich@suse.com>
>>>>> Sent: 05 August 2020 17:20
>>>>> To: Oleksandr Tyshchenko <olekstysh@gmail.com>; Paul Durrant 
>>>>> <paul@xen.org>
>>>>> Cc: xen-devel@lists.xenproject.org; Oleksandr Tyshchenko 
>>>>> <oleksandr_tyshchenko@epam.com>; Andrew
>>>>> Cooper <andrew.cooper3@citrix.com>; George Dunlap 
>>>>> <george.dunlap@citrix.com>; Ian Jackson
>>>>> <ian.jackson@eu.citrix.com>; Julien Grall <julien@xen.org>; Stefano 
>>>>> Stabellini
>>>>> <sstabellini@kernel.org>; Wei Liu <wl@xen.org>; Daniel De Graaf 
>>>>> <dgdegra@tycho.nsa.gov>
>>>>> Subject: Re: [RFC PATCH V1 07/12] A collection of tweaks to be able 
>>>>> to run emulator in driver domain
>>>>>
>>>>> On 03.08.2020 20:21, Oleksandr Tyshchenko wrote:
>>>>>> From: Oleksandr Tyshchenko <oleksandr_tyshchenko@epam.com>
>>>>>>
>>>>>> Trying to run an emulator in a driver domain, I ran into various
>>>>>> issues, mostly policy-related. So this patch tries to resolve them
>>>>>> all, probably in a hackish way. I would like to get feedback on how
>>>>>> to implement them properly, as having an emulator in a driver domain
>>>>>> is a completely valid use-case.
>>>>>   From going over the comments I can only derive you want to run
>>>>> an emulator in a driver domain, which doesn't really make sense
>>>>> to me. A driver domain has a different purpose after all. If
>>>>> instead you mean it to be run in just some other domain (which
>>>>> also isn't the domain controlling the target), then there may
>>>>> be more infrastructure changes needed.
>>>>>
>>>>> Paul - was/is your standalone ioreq server (demu?) able to run
>>>>> in other than the domain controlling a guest?
>>>>>
>>>> Not something I've done yet, but it was always part of the idea so 
>>>> that we could e.g. pass through a device to a dedicated domain and 
>>>> then run multiple demu instances there to virtualize it for many 
>>>> domUs. (I'm thinking here of a device that is not SR-IOV and hence 
>>>> would need some bespoke emulation code to share it out.) That 
>>>> dedicated domain would be termed the 'driver domain' simply because 
>>>> it is running the device driver for the h/w that underpins the 
>>>> emulation.
>>> I may abuse "driver domain" terminology, but indeed in our use-case we
>>> pass through a set of H/W devices to a dedicated domain which is running
>>> the device drivers for that H/W. Our target system comprises a thin
>>> Dom0 (without H/W devices at all), DomD (which owns most of the H/W
>>> devices) and DomU which runs on virtual devices. This patch tries to
>>> make changes on the Xen side to be able to run a standalone ioreq server
>>> (emulator) in that dedicated (driver?) domain.
>> Okay, in which case I'm fine with the term. I simply wasn't aware of the
>> targeted scenario, sorry.
> 
> 
> May I kindly ask for pointers on how to *properly* resolve the various
> policy-related issues described in that patch? Without resolving them,
> it won't be possible to run a standalone IOREQ server in a driver
> domain.

You could already do that by writing your own XSM policy. Did you 
explore it? If so, may I ask why this wouldn't be suitable?

Also, I would like to emphasise that because of XSA-295 (Unlimited Arm
Atomics Operations), you can only run emulators in a trusted domain on Arm.

There would be more work to do if you wanted to run them in a
non-trusted environment.

Cheers,

-- 
Julien Grall


^ permalink raw reply	[flat|nested] 140+ messages in thread

* Re: [RFC PATCH V1 00/12] IOREQ feature (+ virtio-mmio) on Arm
  2020-08-15 17:24 ` [RFC PATCH V1 00/12] IOREQ feature (+ virtio-mmio) on Arm Julien Grall
@ 2020-08-16 19:34   ` Oleksandr
  0 siblings, 0 replies; 140+ messages in thread
From: Oleksandr @ 2020-08-16 19:34 UTC (permalink / raw)
  To: Julien Grall, xen-devel
  Cc: Oleksandr Tyshchenko, Jan Beulich, Andrew Cooper, Wei Liu,
	Roger Pau Monné,
	George Dunlap, Ian Jackson, Stefano Stabellini, Paul Durrant,
	Jun Nakajima, Kevin Tian, Tim Deegan, Volodymyr Babchuk,
	Daniel De Graaf, Anthony PERARD, Bertrand Marquis


On 15.08.20 20:24, Julien Grall wrote:
> Hi Oleksandr,

Hi Julien.


>
> On 03/08/2020 19:21, Oleksandr Tyshchenko wrote:
>> From: Oleksandr Tyshchenko <oleksandr_tyshchenko@epam.com>
>>
>> Hello all.
>>
>> The purpose of this patch series is to add IOREQ/DM support to Xen on 
>> Arm.
>> You can find an initial discussion at [1]. Xen on Arm requires some 
>> implementation
>> to forward guest MMIO access to a device model in order to implement 
>> virtio-mmio
>> backend or even mediator outside of hypervisor. As Xen on x86 already 
>> contains
>> required support this patch series tries to make it common and 
>> introduce Arm
>> specific bits plus some new functionality. Patch series is based on 
>> Julien's
>> PoC "xen/arm: Add support for Guest IO forwarding to a device emulator".
>> Besides splitting existing IOREQ/DM support and introducing Arm side,
>> the patch series also includes virtio-mmio related changes (toolstack)
>> for the reviewers to be able to see how the whole picture could look.
>> For a non-RFC, the IOREQ/DM and virtio-mmio support will be sent 
>> separately.
>>
>> According to the initial discussion there are a few open 
>> questions/concerns
>> regarding security, performance in VirtIO solution:
>> 1. virtio-mmio vs virtio-pci, SPI vs MSI, different use-cases require 
>> different
>>     transport...
>> 2. virtio backend is able to access all guest memory, some kind of 
>> protection
>>     is needed: 'virtio-iommu in Xen' vs 'pre-shared-memory & memcpys 
>> in guest'
>> 3. interface between toolstack and 'out-of-qemu' virtio backend, 
>> avoid using
>>     Xenstore in virtio backend if possible.
>> 4. a lot of 'foreign mapping' could lead to memory exhaustion, 
>> Julien
>>     has some idea regarding that.
>>
>> Looks like all of them are valid and worth considering, but the first 
>> thing
>> which we need on Arm is a mechanism to forward guest IO to a device 
>> emulator,
>> so let's focus on it in the first place.
>>
>> ***
>>
>> Patch series [2] was rebased on the Xen v4.14 release and tested on a
>> Renesas Salvator-X board + H3 ES3.0 SoC (Arm64) with a virtio-mmio disk
>> backend (we will share it later) running in a driver domain and an
>> unmodified Linux guest running on the existing virtio-blk driver
>> (frontend). No issues were observed. Guest domain 'reboot/destroy'
>> use-cases work properly. The patch series was only build-tested on x86.
>>
>> Please note, build-test passed for the following modes:
>> 1. x86: CONFIG_HVM=y / CONFIG_IOREQ_SERVER=y (default)
>> 2. x86: #CONFIG_HVM is not set / #CONFIG_IOREQ_SERVER is not set
>> 3. Arm64: CONFIG_HVM=y / CONFIG_IOREQ_SERVER=y (default)
>> 4. Arm64: CONFIG_HVM=y / #CONFIG_IOREQ_SERVER is not set
>> 5. Arm32: CONFIG_HVM=y / #CONFIG_IOREQ_SERVER is not set
>>
>> The build-test didn't pass for the Arm32 mode with
>> 'CONFIG_IOREQ_SERVER=y' due to the lack of cmpxchg_64 support on Arm32
>> (see the cmpxchg usage in hvm_send_buffered_ioreq()).
>
> I have sent a patch to implement cmpxchg64() and guest_cmpxchg64() 
> (see [1]).
>
> Cheers,
>
> [1] 
> https://lore.kernel.org/xen-devel/20200815172143.1327-1-julien@xen.org/T/#u

  Thank you! I have already build-tested it; no issues observed. I will 
update the corresponding patch to select IOREQ_SERVER for "config ARM" 
instead of "config ARM64".
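As a rough sketch, the change mentioned above would move the select from the arm64-only symbol to the common Arm one. The symbol and file names below are assumptions based on this thread, not the final upstream code:

```
# xen/arch/arm/Kconfig (sketch; symbol names assumed)
config ARM
	def_bool y
	# previously this select lived under "config ARM64" only
	select IOREQ_SERVER
```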


-- 
Regards,

Oleksandr Tyshchenko



^ permalink raw reply	[flat|nested] 140+ messages in thread

* Re: [RFC PATCH V1 01/12] hvm/ioreq: Make x86's IOREQ feature common
  2020-08-15 17:30   ` Julien Grall
@ 2020-08-16 19:37     ` Oleksandr
  0 siblings, 0 replies; 140+ messages in thread
From: Oleksandr @ 2020-08-16 19:37 UTC (permalink / raw)
  To: Julien Grall, xen-devel
  Cc: Kevin Tian, Stefano Stabellini, Jun Nakajima, Wei Liu,
	Paul Durrant, Andrew Cooper, Ian Jackson, George Dunlap,
	Tim Deegan, Oleksandr Tyshchenko, Julien Grall, Jan Beulich,
	Roger Pau Monné


On 15.08.20 20:30, Julien Grall wrote:
> Hi Oleksandr,

Hi Julien.


>
> On 03/08/2020 19:21, Oleksandr Tyshchenko wrote:
>> +static int hvm_send_buffered_ioreq(struct hvm_ioreq_server *s, 
>> ioreq_t *p)
>> +{
>
> [...]
>
>> +    /* Canonicalize read/write pointers to prevent their overflow. */
>> +    while ( (s->bufioreq_handling == HVM_IOREQSRV_BUFIOREQ_ATOMIC) &&
>> +            qw++ < IOREQ_BUFFER_SLOT_NUM &&
>> +            pg->ptrs.read_pointer >= IOREQ_BUFFER_SLOT_NUM )
>> +    {
>> +        union bufioreq_pointers old = pg->ptrs, new;
>> +        unsigned int n = old.read_pointer / IOREQ_BUFFER_SLOT_NUM;
>> +
>> +        new.read_pointer = old.read_pointer - n * 
>> IOREQ_BUFFER_SLOT_NUM;
>> +        new.write_pointer = old.write_pointer - n * 
>> IOREQ_BUFFER_SLOT_NUM;
>> +        cmpxchg(&pg->ptrs.full, old.full, new.full);
>
> While working on the implementation of cmpxchg(), I realized the 
> operation will happen on memory shared with the emulator.
>
> This will need to be switched to guest_cmpxchg64() to prevent a domain 
> from DoSing Xen on Arm.

Got it. I will create a separate patch for that purpose.

-- 
Regards,

Oleksandr Tyshchenko




* Re: [RFC PATCH V1 04/12] xen/arm: Introduce arch specific bits for IOREQ/DM features
  2020-08-15 17:56       ` Julien Grall
@ 2020-08-17 14:36         ` Oleksandr
  0 siblings, 0 replies; 140+ messages in thread
From: Oleksandr @ 2020-08-17 14:36 UTC (permalink / raw)
  To: Julien Grall
  Cc: paul, xen-devel, 'Stefano Stabellini', 'Wei Liu',
	'Andrew Cooper', 'Ian Jackson',
	'George Dunlap', 'Oleksandr Tyshchenko',
	'Julien Grall', 'Jan Beulich',
	'Daniel De Graaf', 'Volodymyr Babchuk'


On 15.08.20 20:56, Julien Grall wrote:

Hi Julien.

> Hi,
>
> On 04/08/2020 15:01, Julien Grall wrote:
>> On 04/08/2020 08:49, Paul Durrant wrote:
>>>> diff --git a/tools/libxc/xc_dom_arm.c b/tools/libxc/xc_dom_arm.c
>>>> index 931404c..b5fc066 100644
>>>> --- a/tools/libxc/xc_dom_arm.c
>>>> +++ b/tools/libxc/xc_dom_arm.c
>>>> @@ -26,11 +26,19 @@
>>>>   #include "xg_private.h"
>>>>   #include "xc_dom.h"
>>>>
>>>> -#define NR_MAGIC_PAGES 4
>>>> +
>>>>   #define CONSOLE_PFN_OFFSET 0
>>>>   #define XENSTORE_PFN_OFFSET 1
>>>>   #define MEMACCESS_PFN_OFFSET 2
>>>>   #define VUART_PFN_OFFSET 3
>>>> +#define IOREQ_SERVER_PFN_OFFSET 4
>>>> +
>>>> +#define NR_IOREQ_SERVER_PAGES 8
>>>> +#define NR_MAGIC_PAGES (4 + NR_IOREQ_SERVER_PAGES)
>>>> +
>>>> +#define GUEST_MAGIC_BASE_PFN (GUEST_MAGIC_BASE >> XC_PAGE_SHIFT)
>>>> +
>>>> +#define special_pfn(x)  (GUEST_MAGIC_BASE_PFN + (x))
>>>
>>> Why introduce 'magic pages' for Arm? It's quite a horrible hack that 
>>> we have begun to do away with by adding resource mapping.
>>
>> This would require us to mandate at least Linux 4.17 in a domain that 
>> will run an IOREQ server. If we don't mandate this, the minimum 
>> version would be 4.10 where DM OP was introduced.
>>
>> Because of XSA-300, we could technically not safely run an IOREQ 
>> server with existing Linux. So it is probably OK to enforce the use 
>> of the acquire interface.
> One more thing. We are using atomic operations on the IOREQ pages. As 
> our implementation is based on LL/SC instructions so far, we have a 
> mitigation in place to prevent a domain from DoSing Xen. However, this 
> relies on the page being mapped in a single domain at a time.
>
> AFAICT, with the legacy interface, the pages will be mapped in both 
> the target and the emulator. So this would defeat the mitigation we 
> have in place.
>
> Because the legacy interface is relying on foreign mapping, the page 
> has to be mapped in the target P2M. It might be possible to restrict 
> the access for the target by setting the p2m bits r, w to 0. This 
> would still allow the foreign mapping to work as we only check the p2m 
> type during mapping.
>
> Anyway, I think we agreed that we want to avoid introducing the 
> legacy interface. But I wanted to answer just for completeness and 
> keep a record of potential pitfalls with the legacy interface on Arm.
OK, the HVMOP plumbing on Arm will be dropped for the non-RFC series. It 
seems that xenforeignmemory_map_resource() does what is needed. Of 
course, the corresponding Linux patch to support 
IOCTL_PRIVCMD_MMAP_RESOURCE was cherry-picked for that purpose (I am 
currently using v4.14).

Thank you.


-- 
Regards,

Oleksandr Tyshchenko




* Re: [RFC PATCH V1 07/12] A collection of tweaks to be able to run emulator in driver domain
  2020-08-16 15:36             ` Julien Grall
@ 2020-08-17 15:07               ` Oleksandr
  0 siblings, 0 replies; 140+ messages in thread
From: Oleksandr @ 2020-08-17 15:07 UTC (permalink / raw)
  To: Julien Grall
  Cc: xen-devel, Jan Beulich, paul, 'Oleksandr Tyshchenko',
	'Andrew Cooper', 'George Dunlap',
	'Ian Jackson', 'Stefano Stabellini',
	'Wei Liu', 'Daniel De Graaf'


On 16.08.20 18:36, Julien Grall wrote:

Hi Julien.

>
>
> On 14/08/2020 17:30, Oleksandr wrote:
>>
>> Hello all.
>>
>>
>>>>>> -----Original Message-----
>>>>>> From: Jan Beulich <jbeulich@suse.com>
>>>>>> Sent: 05 August 2020 17:20
>>>>>> To: Oleksandr Tyshchenko <olekstysh@gmail.com>; Paul Durrant 
>>>>>> <paul@xen.org>
>>>>>> Cc: xen-devel@lists.xenproject.org; Oleksandr Tyshchenko 
>>>>>> <oleksandr_tyshchenko@epam.com>; Andrew
>>>>>> Cooper <andrew.cooper3@citrix.com>; George Dunlap 
>>>>>> <george.dunlap@citrix.com>; Ian Jackson
>>>>>> <ian.jackson@eu.citrix.com>; Julien Grall <julien@xen.org>; 
>>>>>> Stefano Stabellini
>>>>>> <sstabellini@kernel.org>; Wei Liu <wl@xen.org>; Daniel De Graaf 
>>>>>> <dgdegra@tycho.nsa.gov>
>>>>>> Subject: Re: [RFC PATCH V1 07/12] A collection of tweaks to be 
>>>>>> able to run emulator in driver domain
>>>>>>
>>>>>> On 03.08.2020 20:21, Oleksandr Tyshchenko wrote:
>>>>>>> From: Oleksandr Tyshchenko <oleksandr_tyshchenko@epam.com>
>>>>>>>
>>>>>>> Trying to run an emulator in a driver domain I ran into various
>>>>>>> issues, mostly policy-related. So this patch tries to resolve
>>>>>>> them all, probably in a hackish way. I would like to get feedback
>>>>>>> on how to implement them properly, as having an emulator in a
>>>>>>> driver domain is a completely valid use-case.
>>>>>>   From going over the comments I can only derive you want to run
>>>>>> an emulator in a driver domain, which doesn't really make sense
>>>>>> to me. A driver domain has a different purpose after all. If
>>>>>> instead you mean it to be run in just some other domain (which
>>>>>> also isn't the domain controlling the target), then there may
>>>>>> be more infrastructure changes needed.
>>>>>>
>>>>>> Paul - was/is your standalone ioreq server (demu?) able to run
>>>>>> in other than the domain controlling a guest?
>>>>>>
>>>>> Not something I've done yet, but it was always part of the idea so 
>>>>> that we could e.g. pass through a device to a dedicated domain and 
>>>>> then run multiple demu instances there to virtualize it for many 
>>>>> domUs. (I'm thinking here of a device that is not SR-IOV and hence 
>>>>> would need some bespoke emulation code to share it out).That 
>>>>> dedicated domain would be termed the 'driver domain' simply 
>>>>> because it is running the device driver for the h/w that underpins 
>>>>> the emulation.
>>>> I may abuse "driver domain" terminology, but indeed in our use-case
>>>> we pass through a set of H/W devices to a dedicated domain which is
>>>> running the device drivers for that H/W. Our target system comprises
>>>> a thin Dom0 (without H/W devices at all), DomD (which owns most of
>>>> the H/W devices) and DomU, which runs on virtual devices. This patch
>>>> tries to make changes on the Xen side to be able to run a standalone
>>>> ioreq server (emulator) in that dedicated (driver?) domain.
>>> Okay, in which case I'm fine with the term. I simply wasn't aware of 
>>> the
>>> targeted scenario, sorry.
>>
>>
>> May I kindly ask to suggest me the pointers how to *properly* resolve 
>> various policy related issues described in that patch? Without having 
>> them resolved it wouldn't be able to run standalone IOREQ server in 
>> driver domain.
>
> You could already do that by writing your own XSM policy. Did you 
> explore it? If so, may I ask why this wouldn't be suitable?
>
> Also, I would like to emphasise that because of XSA-295 (Unlimited Arm 
> Atomics Operations), you can only run emulators in a trusted domain on 
> Arm.
>
> There would be more work to do if you wanted to run them in 
> non-trusted environment.

Thank you for the explanation. Yes, we consider the driver domain a 
trusted domain; there is no plan to run the emulator in non-trusted 
domains. Indeed, it is worth trying to write our own policy which will 
cover our use case (with the emulator in the driver domain) rather than 
tweaking Xen's default one.
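For reference, a custom Flask policy along those lines might contain rules of this shape. The type names dm_dom_t/domU_t are illustrative, and the exact class/permission names should be checked against tools/flask/policy in the Xen tree:

```
# Illustrative fragment for a custom XSM/Flask policy (names assumed):
# let the driver domain (labelled dm_dom_t) act as a device model for
# guests labelled domU_t.
allow dm_dom_t domU_t:hvm { getparam setparam dm send_irq };
allow dm_dom_t domU_t:mmu { map_read map_write };
```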


-- 
Regards,

Oleksandr Tyshchenko




* Re: [RFC PATCH V1 05/12] hvm/dm: Introduce xendevicemodel_set_irq_level DM op
  2020-08-07 21:50               ` Stefano Stabellini
  2020-08-08  9:27                 ` Julien Grall
@ 2020-08-17 15:23                 ` Jan Beulich
  2020-08-17 22:56                   ` Stefano Stabellini
  1 sibling, 1 reply; 140+ messages in thread
From: Jan Beulich @ 2020-08-17 15:23 UTC (permalink / raw)
  To: Stefano Stabellini
  Cc: Julien Grall, Oleksandr Tyshchenko, xen-devel,
	Oleksandr Tyshchenko, Ian Jackson, Wei Liu, Andrew Cooper,
	George Dunlap, Volodymyr Babchuk, Julien Grall

On 07.08.2020 23:50, Stefano Stabellini wrote:
> On Fri, 7 Aug 2020, Jan Beulich wrote:
>> On 07.08.2020 01:49, Stefano Stabellini wrote:
>>> On Thu, 6 Aug 2020, Julien Grall wrote:
>>>> On 06/08/2020 01:37, Stefano Stabellini wrote:
>>>>> On Wed, 5 Aug 2020, Julien Grall wrote:
>>>>>> On 05/08/2020 00:22, Stefano Stabellini wrote:
>>>>>>> On Mon, 3 Aug 2020, Oleksandr Tyshchenko wrote:
>>>>>>>> From: Oleksandr Tyshchenko <oleksandr_tyshchenko@epam.com>
>>>>>>>>
>>>>>>>> This patch adds the ability for the device emulator to notify
>>>>>>>> the other end (some entity running in the guest) using an SPI and
>>>>>>>> implements the Arm-specific bits for it. The proposed interface
>>>>>>>> allows the emulator to set the logical level of one of a domain's
>>>>>>>> IRQ lines.
>>>>>>>>
>>>>>>>> Please note, this is a split/cleanup of Julien's PoC:
>>>>>>>> "Add support for Guest IO forwarding to a device emulator"
>>>>>>>>
>>>>>>>> Signed-off-by: Julien Grall <julien.grall@arm.com>
>>>>>>>> Signed-off-by: Oleksandr Tyshchenko <oleksandr_tyshchenko@epam.com>
>>>>>>>> ---
>>>>>>>>    tools/libs/devicemodel/core.c                   | 18
>>>>>>>> ++++++++++++++++++
>>>>>>>>    tools/libs/devicemodel/include/xendevicemodel.h |  4 ++++
>>>>>>>>    tools/libs/devicemodel/libxendevicemodel.map    |  1 +
>>>>>>>>    xen/arch/arm/dm.c                               | 22
>>>>>>>> +++++++++++++++++++++-
>>>>>>>>    xen/common/hvm/dm.c                             |  1 +
>>>>>>>>    xen/include/public/hvm/dm_op.h                  | 15
>>>>>>>> +++++++++++++++
>>>>>>>>    6 files changed, 60 insertions(+), 1 deletion(-)
>>>>>>>>
>>>>>>>> diff --git a/tools/libs/devicemodel/core.c
>>>>>>>> b/tools/libs/devicemodel/core.c
>>>>>>>> index 4d40639..30bd79f 100644
>>>>>>>> --- a/tools/libs/devicemodel/core.c
>>>>>>>> +++ b/tools/libs/devicemodel/core.c
>>>>>>>> @@ -430,6 +430,24 @@ int xendevicemodel_set_isa_irq_level(
>>>>>>>>        return xendevicemodel_op(dmod, domid, 1, &op, sizeof(op));
>>>>>>>>    }
>>>>>>>>    +int xendevicemodel_set_irq_level(
>>>>>>>> +    xendevicemodel_handle *dmod, domid_t domid, uint32_t irq,
>>>>>>>> +    unsigned int level)
>>>>>>>
>>>>>>> It is a pity that having xen_dm_op_set_pci_intx_level and
>>>>>>> xen_dm_op_set_isa_irq_level already we need to add a third one, but from
>>>>>>> the names alone I don't think we can reuse either of them.
>>>>>>
>>>>>> The problem is not the name...
>>>>>>
>>>>>>>
>>>>>>> It is very similar to set_isa_irq_level. We could almost rename
>>>>>>> xendevicemodel_set_isa_irq_level to xendevicemodel_set_irq_level or,
>>>>>>> better, just add an alias to it so that xendevicemodel_set_irq_level is
>>>>>>> implemented by calling xendevicemodel_set_isa_irq_level. Honestly I am
>>>>>>> not sure if it is worth doing it though. Any other opinions?
>>>>>>
>>>>>> ... the problem is the interrupt field is only 8-bit. So we would only be
>>>>>> able
>>>>>> to cover IRQ 0 - 255.
>>>>>
>>>>> Argh, that's not going to work :-(  I wasn't sure if it was a good idea
>>>>> anyway.
>>>>>
>>>>>
>>>>>> It is not entirely clear how the existing subop could be extended without
>>>>>> breaking existing callers.
>>>>>>
>>>>>>> But I think we should plan for not needing two calls (one to set level
>>>>>>> to 1, and one to set it to 0):
>>>>>>> https://marc.info/?l=xen-devel&m=159535112027405
>>>>>>
>>>>>> I am not sure I understand your suggestion here. Are you
>>>>>> suggesting removing the 'level' parameter?
>>>>>
>>>>> My hope was to make it optional to call the hypercall with level = 0,
>>>>> not necessarily to remove 'level' from the struct.
>>>>
>>>> From my understanding, the hypercall is meant to represent the status of the
>>>> line between the device and the interrupt controller (either low or high).
>>>>
>>>> It is then up to the interrupt controller to decide when the interrupt is
>>>> going to be fired:
>>>>   - For an edge interrupt, this will fire when the line moves from low to
>>>> high (or vice versa).
>>>>   - For a level interrupt, this will fire when the line is high (assuming
>>>> level trigger high) and will keep firing until the device decides to lower
>>>> the line.
>>>>
>>>> For a device, it is common to keep the line high until the OS writes to a
>>>> specific register.
>>>>
>>>> Furthermore, technically, the guest OS is in charge of configuring how an
>>>> interrupt is triggered. Admittedly this information is part of the DT, but
>>>> nothing prevents a guest from changing it.
>>>>
>>>> As a side note, we have a workaround in Xen for some buggy DTs (see the arch
>>>> timer) exposing the wrong trigger type.
>>>>
>>>> Because of that, I don't really see a way to make it optional. Maybe you
>>>> have something different in mind?
>>>
>>> For level, we need the level parameter. For edge, we are only interested
>>> in the "edge", right?
>>
>> I don't think so, unless Arm has special restrictions. Edges can be
>> both rising and falling ones.
> 
> And the same is true for level interrupts too: they could be active-low
> or active-high.
> 
> 
> Instead of modelling the state of the line, which seems to be a bit
> error prone especially in the case of a single-device emulator that
> might not have enough information about the rest of the system (it might
> not know if the interrupt is active-high or active-low), we could model
> the triggering of the interrupt instead.
> 
> In the case of level=1, it would mean that the interrupt line is active,
> no matter if it is active-low or active-high. In the case of level=0, it
> would mean that it is inactive.
> 
> Similarly, in the case of an edge interrupt edge=1 or level=1 would mean
> that there is an edge, no matter if it is a rising or falling.

Am I understanding right that you propose to fold two properties into
a single bit? While this _may_ be sufficient for Arm, wouldn't it be
better to retain both properties separately, to cover possible further
uses of the new sub-op?

Jan



* Re: [RFC PATCH V1 05/12] hvm/dm: Introduce xendevicemodel_set_irq_level DM op
  2020-08-17 15:23                 ` Jan Beulich
@ 2020-08-17 22:56                   ` Stefano Stabellini
  2020-08-18  8:03                     ` Jan Beulich
  0 siblings, 1 reply; 140+ messages in thread
From: Stefano Stabellini @ 2020-08-17 22:56 UTC (permalink / raw)
  To: Jan Beulich
  Cc: Stefano Stabellini, Julien Grall, Oleksandr Tyshchenko,
	xen-devel, Oleksandr Tyshchenko, Ian Jackson, Wei Liu,
	Andrew Cooper, George Dunlap, Volodymyr Babchuk, Julien Grall

On Mon, 17 Aug 2020, Jan Beulich wrote:
> On 07.08.2020 23:50, Stefano Stabellini wrote:
> > On Fri, 7 Aug 2020, Jan Beulich wrote:
> >> On 07.08.2020 01:49, Stefano Stabellini wrote:
> >>> On Thu, 6 Aug 2020, Julien Grall wrote:
> >>>> On 06/08/2020 01:37, Stefano Stabellini wrote:
> >>>>> On Wed, 5 Aug 2020, Julien Grall wrote:
> >>>>>> On 05/08/2020 00:22, Stefano Stabellini wrote:
> >>>>>>> On Mon, 3 Aug 2020, Oleksandr Tyshchenko wrote:
> >>>>>>>> From: Oleksandr Tyshchenko <oleksandr_tyshchenko@epam.com>
> >>>>>>>>
> >>>>>>>> This patch adds the ability for the device emulator to notify
> >>>>>>>> the other end (some entity running in the guest) using an SPI
> >>>>>>>> and implements the Arm-specific bits for it. The proposed
> >>>>>>>> interface allows the emulator to set the logical level of one
> >>>>>>>> of a domain's IRQ lines.
> >>>>>>>>
> >>>>>>>> Please note, this is a split/cleanup of Julien's PoC:
> >>>>>>>> "Add support for Guest IO forwarding to a device emulator"
> >>>>>>>>
> >>>>>>>> Signed-off-by: Julien Grall <julien.grall@arm.com>
> >>>>>>>> Signed-off-by: Oleksandr Tyshchenko <oleksandr_tyshchenko@epam.com>
> >>>>>>>> ---
> >>>>>>>>    tools/libs/devicemodel/core.c                   | 18
> >>>>>>>> ++++++++++++++++++
> >>>>>>>>    tools/libs/devicemodel/include/xendevicemodel.h |  4 ++++
> >>>>>>>>    tools/libs/devicemodel/libxendevicemodel.map    |  1 +
> >>>>>>>>    xen/arch/arm/dm.c                               | 22
> >>>>>>>> +++++++++++++++++++++-
> >>>>>>>>    xen/common/hvm/dm.c                             |  1 +
> >>>>>>>>    xen/include/public/hvm/dm_op.h                  | 15
> >>>>>>>> +++++++++++++++
> >>>>>>>>    6 files changed, 60 insertions(+), 1 deletion(-)
> >>>>>>>>
> >>>>>>>> diff --git a/tools/libs/devicemodel/core.c
> >>>>>>>> b/tools/libs/devicemodel/core.c
> >>>>>>>> index 4d40639..30bd79f 100644
> >>>>>>>> --- a/tools/libs/devicemodel/core.c
> >>>>>>>> +++ b/tools/libs/devicemodel/core.c
> >>>>>>>> @@ -430,6 +430,24 @@ int xendevicemodel_set_isa_irq_level(
> >>>>>>>>        return xendevicemodel_op(dmod, domid, 1, &op, sizeof(op));
> >>>>>>>>    }
> >>>>>>>>    +int xendevicemodel_set_irq_level(
> >>>>>>>> +    xendevicemodel_handle *dmod, domid_t domid, uint32_t irq,
> >>>>>>>> +    unsigned int level)
> >>>>>>>
> >>>>>>> It is a pity that having xen_dm_op_set_pci_intx_level and
> >>>>>>> xen_dm_op_set_isa_irq_level already we need to add a third one, but from
> >>>>>>> the names alone I don't think we can reuse either of them.
> >>>>>>
> >>>>>> The problem is not the name...
> >>>>>>
> >>>>>>>
> >>>>>>> It is very similar to set_isa_irq_level. We could almost rename
> >>>>>>> xendevicemodel_set_isa_irq_level to xendevicemodel_set_irq_level or,
> >>>>>>> better, just add an alias to it so that xendevicemodel_set_irq_level is
> >>>>>>> implemented by calling xendevicemodel_set_isa_irq_level. Honestly I am
> >>>>>>> not sure if it is worth doing it though. Any other opinions?
> >>>>>>
> >>>>>> ... the problem is the interrupt field is only 8-bit. So we would only be
> >>>>>> able
> >>>>>> to cover IRQ 0 - 255.
> >>>>>
> >>>>> Argh, that's not going to work :-(  I wasn't sure if it was a good idea
> >>>>> anyway.
> >>>>>
> >>>>>
> >>>>>> It is not entirely clear how the existing subop could be extended without
> >>>>>> breaking existing callers.
> >>>>>>
> >>>>>>> But I think we should plan for not needing two calls (one to set level
> >>>>>>> to 1, and one to set it to 0):
> >>>>>>> https://marc.info/?l=xen-devel&m=159535112027405
> >>>>>>
> >>>>>> I am not sure I understand your suggestion here. Are you
> >>>>>> suggesting removing the 'level' parameter?
> >>>>>
> >>>>> My hope was to make it optional to call the hypercall with level = 0,
> >>>>> not necessarily to remove 'level' from the struct.
> >>>>
> >>>> From my understanding, the hypercall is meant to represent the status of the
> >>>> line between the device and the interrupt controller (either low or high).
> >>>>
> >>>> It is then up to the interrupt controller to decide when the interrupt is
> >>>> going to be fired:
> >>>>   - For an edge interrupt, this will fire when the line moves from low to
> >>>> high (or vice versa).
> >>>>   - For a level interrupt, this will fire when the line is high (assuming
> >>>> level trigger high) and will keep firing until the device decides to lower
> >>>> the line.
> >>>>
> >>>> For a device, it is common to keep the line high until the OS writes to a
> >>>> specific register.
> >>>>
> >>>> Furthermore, technically, the guest OS is in charge of configuring how an
> >>>> interrupt is triggered. Admittedly this information is part of the DT, but
> >>>> nothing prevents a guest from changing it.
> >>>>
> >>>> As a side note, we have a workaround in Xen for some buggy DTs (see the arch
> >>>> timer) exposing the wrong trigger type.
> >>>>
> >>>> Because of that, I don't really see a way to make it optional. Maybe you
> >>>> have something different in mind?
> >>>
> >>> For level, we need the level parameter. For edge, we are only interested
> >>> in the "edge", right?
> >>
> >> I don't think so, unless Arm has special restrictions. Edges can be
> >> both rising and falling ones.
> > 
> > And the same is true for level interrupts too: they could be active-low
> > or active-high.
> > 
> > 
> > Instead of modelling the state of the line, which seems to be a bit
> > error prone especially in the case of a single-device emulator that
> > might not have enough information about the rest of the system (it might
> > not know if the interrupt is active-high or active-low), we could model
> > the triggering of the interrupt instead.
> > 
> > In the case of level=1, it would mean that the interrupt line is active,
> > no matter if it is active-low or active-high. In the case of level=0, it
> > would mean that it is inactive.
> > 
> > Similarly, in the case of an edge interrupt edge=1 or level=1 would mean
> > that there is an edge, no matter if it is a rising or falling.
> 
> Am I understanding right that you propose to fold two properties into
> a single bit?

I don't think I understand what the two properties are that my proposal
merges into a single bit.

The hypercall specifies the state of the line in terms of "high" and
"low". My proposal is to replace it with "fire the interrupt" for edge
interrupts, and "interrupt enabled/disabled" for level, abstracting away
the state of the line in terms of high/low and instead focusing on
whether the interrupt should be injected or not.


> While this _may_ be sufficient for Arm, wouldn't it be
> better to retain both properties separately, to cover possible further
> uses of the new sub-op?

It would be possible to pass both sets of information, such as:

- line high/low
- "interrupt enabled/disabled" or "fire the interrupt"

If we pass both sets of information at the same time we lose the
benefits of my proposal. So I take it you are suggesting to design the
hypercall so that either set (not both!) could be passed? So either:

- line high/low

or:

- "interrupt enabled/disabled" or "fire the interrupt"

?



* Re: [RFC PATCH V1 05/12] hvm/dm: Introduce xendevicemodel_set_irq_level DM op
  2020-08-17 22:56                   ` Stefano Stabellini
@ 2020-08-18  8:03                     ` Jan Beulich
  0 siblings, 0 replies; 140+ messages in thread
From: Jan Beulich @ 2020-08-18  8:03 UTC (permalink / raw)
  To: Stefano Stabellini
  Cc: Julien Grall, Oleksandr Tyshchenko, xen-devel,
	Oleksandr Tyshchenko, Ian Jackson, Wei Liu, Andrew Cooper,
	George Dunlap, Volodymyr Babchuk, Julien Grall

On 18.08.2020 00:56, Stefano Stabellini wrote:
> On Mon, 17 Aug 2020, Jan Beulich wrote:
>> On 07.08.2020 23:50, Stefano Stabellini wrote:
>>> On Fri, 7 Aug 2020, Jan Beulich wrote:
>>>> On 07.08.2020 01:49, Stefano Stabellini wrote:
>>>>> On Thu, 6 Aug 2020, Julien Grall wrote:
>>>>>> On 06/08/2020 01:37, Stefano Stabellini wrote:
>>>>>>> On Wed, 5 Aug 2020, Julien Grall wrote:
>>>>>>>> On 05/08/2020 00:22, Stefano Stabellini wrote:
>>>>>>>>> On Mon, 3 Aug 2020, Oleksandr Tyshchenko wrote:
>>>>>>>>>> From: Oleksandr Tyshchenko <oleksandr_tyshchenko@epam.com>
>>>>>>>>>>
>>>>>>>>>> This patch adds the ability for the device emulator to notify
>>>>>>>>>> the other end (some entity running in the guest) using an SPI
>>>>>>>>>> and implements the Arm-specific bits for it. The proposed
>>>>>>>>>> interface allows the emulator to set the logical level of one
>>>>>>>>>> of a domain's IRQ lines.
>>>>>>>>>>
>>>>>>>>>> Please note, this is a split/cleanup of Julien's PoC:
>>>>>>>>>> "Add support for Guest IO forwarding to a device emulator"
>>>>>>>>>>
>>>>>>>>>> Signed-off-by: Julien Grall <julien.grall@arm.com>
>>>>>>>>>> Signed-off-by: Oleksandr Tyshchenko <oleksandr_tyshchenko@epam.com>
>>>>>>>>>> ---
>>>>>>>>>>    tools/libs/devicemodel/core.c                   | 18
>>>>>>>>>> ++++++++++++++++++
>>>>>>>>>>    tools/libs/devicemodel/include/xendevicemodel.h |  4 ++++
>>>>>>>>>>    tools/libs/devicemodel/libxendevicemodel.map    |  1 +
>>>>>>>>>>    xen/arch/arm/dm.c                               | 22
>>>>>>>>>> +++++++++++++++++++++-
>>>>>>>>>>    xen/common/hvm/dm.c                             |  1 +
>>>>>>>>>>    xen/include/public/hvm/dm_op.h                  | 15
>>>>>>>>>> +++++++++++++++
>>>>>>>>>>    6 files changed, 60 insertions(+), 1 deletion(-)
>>>>>>>>>>
>>>>>>>>>> diff --git a/tools/libs/devicemodel/core.c
>>>>>>>>>> b/tools/libs/devicemodel/core.c
>>>>>>>>>> index 4d40639..30bd79f 100644
>>>>>>>>>> --- a/tools/libs/devicemodel/core.c
>>>>>>>>>> +++ b/tools/libs/devicemodel/core.c
>>>>>>>>>> @@ -430,6 +430,24 @@ int xendevicemodel_set_isa_irq_level(
>>>>>>>>>>        return xendevicemodel_op(dmod, domid, 1, &op, sizeof(op));
>>>>>>>>>>    }
>>>>>>>>>>    +int xendevicemodel_set_irq_level(
>>>>>>>>>> +    xendevicemodel_handle *dmod, domid_t domid, uint32_t irq,
>>>>>>>>>> +    unsigned int level)
>>>>>>>>>
>>>>>>>>> It is a pity that having xen_dm_op_set_pci_intx_level and
>>>>>>>>> xen_dm_op_set_isa_irq_level already we need to add a third one, but from
>>>>>>>>> the names alone I don't think we can reuse either of them.
>>>>>>>>
>>>>>>>> The problem is not the name...
>>>>>>>>
>>>>>>>>>
>>>>>>>>> It is very similar to set_isa_irq_level. We could almost rename
>>>>>>>>> xendevicemodel_set_isa_irq_level to xendevicemodel_set_irq_level or,
>>>>>>>>> better, just add an alias to it so that xendevicemodel_set_irq_level is
>>>>>>>>> implemented by calling xendevicemodel_set_isa_irq_level. Honestly I am
>>>>>>>>> not sure if it is worth doing it though. Any other opinions?
>>>>>>>>
>>>>>>>> ... the problem is the interrupt field is only 8-bit. So we would only be
>>>>>>>> able
>>>>>>>> to cover IRQ 0 - 255.
>>>>>>>
>>>>>>> Argh, that's not going to work :-(  I wasn't sure if it was a good idea
>>>>>>> anyway.
>>>>>>>
>>>>>>>
>>>>>>>> It is not entirely clear how the existing subop could be extended without
>>>>>>>> breaking existing callers.
>>>>>>>>
>>>>>>>>> But I think we should plan for not needing two calls (one to set level
>>>>>>>>> to 1, and one to set it to 0):
>>>>>>>>> https://marc.info/?l=xen-devel&m=159535112027405
>>>>>>>>
>>>>>>>> I am not sure I understand your suggestion here. Are you
>>>>>>>> suggesting removing the 'level' parameter?
>>>>>>>
>>>>>>> My hope was to make it optional to call the hypercall with level = 0,
>>>>>>> not necessarily to remove 'level' from the struct.
>>>>>>
>>>>>> From my understanding, the hypercall is meant to represent the status of the
>>>>>> line between the device and the interrupt controller (either low or high).
>>>>>>
>>>>>> It is then up to the interrupt controller to decide when the interrupt is
>>>>>> going to be fired:
>>>>>>   - For an edge interrupt, this will fire when the line moves from low to
>>>>>> high (or vice versa).
>>>>>>   - For a level interrupt, this will fire when the line is high (assuming
>>>>>> level trigger high) and will keep firing until the device decides to lower
>>>>>> the line.
>>>>>>
>>>>>> For a device, it is common to keep the line high until the OS writes to a
>>>>>> specific register.
>>>>>>
>>>>>> Furthermore, technically, the guest OS is in charge of configuring how an
>>>>>> interrupt is triggered. Admittedly this information is part of the DT, but
>>>>>> nothing prevents a guest from changing it.
>>>>>>
>>>>>> As a side note, we have a workaround in Xen for some buggy DTs (see the arch
>>>>>> timer) exposing the wrong trigger type.
>>>>>>
>>>>>> Because of that, I don't really see a way to make it optional. Maybe you
>>>>>> have something different in mind?
>>>>>
>>>>> For level, we need the level parameter. For edge, we are only interested
>>>>> in the "edge", right?
>>>>
>>>> I don't think so, unless Arm has special restrictions. Edges can be
>>>> both rising and falling ones.
>>>
>>> And the same is true for level interrupts too: they could be active-low
>>> or active-high.
>>>
>>>
>>> Instead of modelling the state of the line, which seems to be a bit
>>> error prone especially in the case of a single-device emulator that
>>> might not have enough information about the rest of the system (it might
>>> not know if the interrupt is active-high or active-low), we could model
>>> the triggering of the interrupt instead.
>>>
>>> In the case of level=1, it would mean that the interrupt line is active,
>>> no matter if it is active-low or active-high. In the case of level=0, it
>>> would mean that it is inactive.
>>>
>>> Similarly, in the case of an edge interrupt edge=1 or level=1 would mean
>>> that there is an edge, no matter if it is a rising or falling.
>>
>> Am I understanding right that you propose to fold two properties into
>> a single bit?
> 
> I don't think I understand what are the two properties that my proposal
> is merging into a single bit.
> 
> The hypercall specifies the state of the line in terms of "high" and
> "low". My proposal is to replace it with "fire the interrupt" for edge
> interrupts, and "interrupt enabled/disabled" for level, abstracting away
> the state of the line in terms of high/low and instead focusing on
> whether the interrupt should be injected or not.

Okay, I realize I misunderstood. There's a naming issue that I think
gets in the way here: Since this is about triggering an IRQ without
"setting" its specific properties, perhaps "trigger_irq" would be a
better name, with your boolean distinguishing the "assert" and
"deassert" cases (and the other one indicating "edge" vs "level")?

Jan


^ permalink raw reply	[flat|nested] 140+ messages in thread

* Re: [RFC PATCH V1 05/12] hvm/dm: Introduce xendevicemodel_set_irq_level DM op
  2020-08-11 22:48                       ` Stefano Stabellini
@ 2020-08-18  9:31                         ` Julien Grall
  2020-08-21  0:53                           ` Stefano Stabellini
  0 siblings, 1 reply; 140+ messages in thread
From: Julien Grall @ 2020-08-18  9:31 UTC (permalink / raw)
  To: Stefano Stabellini
  Cc: Julien Grall, Jan Beulich, Oleksandr Tyshchenko, xen-devel,
	Oleksandr Tyshchenko, Ian Jackson, Wei Liu, Andrew Cooper,
	George Dunlap, Volodymyr Babchuk, Julien Grall

Hi Stefano,

On 11/08/2020 23:48, Stefano Stabellini wrote:
> On Tue, 11 Aug 2020, Julien Grall wrote:
>>>   I vaguely
>>> recall a bug 10+ years ago about this with QEMU on x86 and a line that
>>> could be both active-high and active-low. So QEMU would raise the
>>> interrupt but Xen would actually think that QEMU stopped the interrupt.
>>>
>>> To do this right, we would have to introduce an interface between Xen
>>> and QEMU to propagate the trigger type. Xen would have to tell QEMU when
>>> the guest changed the configuration. That would work, but it would be
>>> better if we can figure out a way to do without it to reduce complexity.
>> Per above, I don't think this is necessary.
>>
>>>
>>> Instead, given that QEMU and other emulators don't actually care about
>>> active-high or active-low, if we have a Xen interface that just says
>>> "fire the interrupt" we get away from this kind of troubles. It would
>>> also be more efficient because the total number of hypercalls required
>>> would be lower.
>>
>> I read "fire the interrupt" as "Please generate an interrupt once".
>> Is that the definition you expect?
> 
> Yes, that is the idea. It would have to take into account the edge/level
> semantic difference: level would have a "start it" and a "stop it".

I am still struggling to see how this can work:
     - At the moment, QEMU is only providing us with the line state. How can 
we deduce the type of the interrupt? Would it mean a major modification 
of the QEMU API?
     - Can you provide a rough sketch of how this could be implemented in Xen?

Cheers,

-- 
Julien Grall



* Re: [RFC PATCH V1 04/12] xen/arm: Introduce arch specific bits for IOREQ/DM features
  2020-08-06 11:29         ` Jan Beulich
@ 2020-08-20 18:30           ` Oleksandr
  2020-08-21  6:16             ` Jan Beulich
  0 siblings, 1 reply; 140+ messages in thread
From: Oleksandr @ 2020-08-20 18:30 UTC (permalink / raw)
  To: Jan Beulich, Julien Grall, Paul Durrant, Roger Pau Monné
  Cc: xen-devel, Stefano Stabellini, Wei Liu, Andrew Cooper,
	Ian Jackson, George Dunlap, Oleksandr Tyshchenko, Julien Grall,
	Daniel De Graaf, Volodymyr Babchuk


Hello all.


I would like to clarify some questions based on the comments for the 
patch series. I put them together (please see below).


On 06.08.20 14:29, Jan Beulich wrote:
> On 06.08.2020 13:08, Julien Grall wrote:
>> On 05/08/2020 20:30, Oleksandr wrote:
>>> I was thinking how to split handle_hvm_io_completion()
>>> gracefully, but I failed to find a good solution for that, so I decided to add
>>> two stubs (msix_write_completion and handle_realmode_completion) on Arm.
>>> I could add a comment describing why they are here if appropriate. But
>>> if you think they shouldn't be called from the common code in any way, I
>>> will try to split it.
>> I am not entirely sure what msix_write_completion is meant to do on x86.
>> Is it dealing with virtual MSIx? Maybe Jan, Roger or Paul could help?
> Due to the split brain model of handling PCI pass-through (between
> Xen and qemu), a guest writing to an MSI-X entry needs this write
> handed to qemu, and upon completion of the write there Xen also
> needs to take some extra action.


1. Regarding common handle_hvm_io_completion() implementation:

Could msix_write_completion() be called later on, so we would be able to 
split handle_hvm_io_completion() gracefully, or could we call it from 
handle_mmio()?
The reason I am asking is to avoid calling it from the common code, in 
order to avoid introducing a stub on Arm which is never going to be 
implemented (if msix_write_completion() is purely x86 material).

For the non-RFC patch series I moved handle_realmode_completion to the 
x86 code and now my local implementation looks like:

bool handle_hvm_io_completion(struct vcpu *v)
{
     struct domain *d = v->domain;
     struct hvm_vcpu_io *vio = &v->arch.hvm.hvm_io;
     struct hvm_ioreq_server *s;
     struct hvm_ioreq_vcpu *sv;
     enum hvm_io_completion io_completion;

     if ( has_vpci(d) && vpci_process_pending(v) )
     {
         raise_softirq(SCHEDULE_SOFTIRQ);
         return false;
     }

     sv = get_pending_vcpu(v, &s);
     if ( sv && !hvm_wait_for_io(sv, get_ioreq(s, v)) )
         return false;

     vio->io_req.state = hvm_ioreq_needs_completion(&vio->io_req) ?
         STATE_IORESP_READY : STATE_IOREQ_NONE;

     msix_write_completion(v);
     vcpu_end_shutdown_deferral(v);

     io_completion = vio->io_completion;
     vio->io_completion = HVMIO_no_completion;

     switch ( io_completion )
     {
     case HVMIO_no_completion:
         break;

     case HVMIO_mmio_completion:
         return handle_mmio();

     case HVMIO_pio_completion:
         return handle_pio(vio->io_req.addr, vio->io_req.size,
                           vio->io_req.dir);

     default:
         return arch_handle_hvm_io_completion(io_completion);
     }

     return true;
}

2. Regarding renaming common handle_mmio() to ioreq_handle_complete_mmio():

There was a request to consider renaming that function which is called 
from the common code in the context of IOREQ series.
The point is that the name of the function is pretty generic and can be 
confusing on Arm (we already have a try_handle_mmio()).
I noticed that, apart from the common code, that function is called from 
a few places on x86 (I am not even sure whether all of them are IOREQ 
related).
The question is: would x86 folks be happy with such a renaming?

Alternatively, I could provide the following in 
include/asm-arm/hvm/ioreq.h without renaming it in the common code and 
still use a non-confusing variant on Arm (however, I am not sure whether 
this is a good idea):

#define handle_mmio ioreq_handle_complete_mmio


3. Regarding common IOREQ/DM stuff location:

Currently it is located at:
common/hvm/...
include/xen/hvm/...

For the non-RFC patch series I am going to avoid using the "hvm" name 
(which is an internal detail of arch-specific code and shouldn't be 
exposed to the common code).
The question is whether I should use another directory name (probably 
ioreq?) or just place them in the common root directory?


Could you please share your opinion?

-- 
Regards,

Oleksandr Tyshchenko




* Re: [RFC PATCH V1 01/12] hvm/ioreq: Make x86's IOREQ feature common
  2020-08-12  8:19                           ` Julien Grall
@ 2020-08-20 19:14                             ` Oleksandr
  2020-08-21  0:53                               ` Stefano Stabellini
  0 siblings, 1 reply; 140+ messages in thread
From: Oleksandr @ 2020-08-20 19:14 UTC (permalink / raw)
  To: Julien Grall, Stefano Stabellini
  Cc: paul, xen-devel, 'Oleksandr Tyshchenko',
	'Jan Beulich', 'Andrew Cooper', 'Wei Liu',
	'Roger Pau Monné', 'George Dunlap',
	'Ian Jackson', 'Jun Nakajima',
	'Kevin Tian', 'Tim Deegan',
	'Julien Grall'


On 12.08.20 11:19, Julien Grall wrote:
> Hi,

Hi Julien, Stefano


>
> On 11/08/2020 23:48, Stefano Stabellini wrote:
>>> I have the impression that we disagree on what the Device Emulator 
>>> is meant to
>>> do. IMHO, the goal of the device emulator is to emulate a device in an
>>> arch-agnostic way.
>>
>> That would be great in theory but I am not sure it is achievable: if we
>> use an existing emulator like QEMU, even a single device has to fit
>> into QEMU's view of the world, which makes assumptions about host
>> bridges and apertures. It is impossible today to build QEMU in an
>> arch-agnostic way, it has to be tied to an architecture.
>
> AFAICT, the only reason QEMU cannot be built in an arch-agnostic way 
> is because of TCG. If this wasn't built in, then you could easily write a 
> machine that doesn't depend on the instruction set.
>
> The proof is, today, we are using QEMU x86 to serve Arm64 guest. 
> Although this is only for PV drivers.
>
>>
>> I realize we are not building this interface for QEMU specifically, but
>> even if we try to make the interface arch-agnostic, in reality the
>> emulators won't be arch-agnostic.
>
> This depends on your goal. If your goal is to write a standalone 
> emulator for a single device, then it is entirely possible to make it 
> arch-agnostic.
>
> Per above, this would even be possible if you were emulating a set of 
> devices.
>
> What I want to avoid is requiring all the emulators to contain 
> arch-specific code just because it is easier to get QEMU working on 
> Xen on Arm.
>
>> If we send a port-mapped I/O request
>> to qemu-system-aarch64 who knows what is going to happen: it is a code
>> path that is not explicitly tested.
>
> Maybe, maybe not. To me this is mostly software issues that can easily 
> be mitigated if we do proper testing...

Could we please find common ground on whether the PIO handling needs 
to be implemented on Arm or not? At least for the current patch series.


Below are my thoughts:
On one side, I agree that an emulator shouldn't contain any arch-specific 
code; yes, it is hypervisor specific, but it should be arch-agnostic if 
possible. So the PIO case should be handled.
On the other side, I tend to think that it might be possible to skip PIO 
handling for the current patch series (leave it x86-specific for now, as 
we do with handle_realmode_completion()).
I think nothing will prevent us from adding PIO handling later on if 
there is a real need (use case) for that. Please correct me if I am wrong.

I would be absolutely OK with any options.

What do you think?


-- 
Regards,

Oleksandr Tyshchenko




* Re: [RFC PATCH V1 01/12] hvm/ioreq: Make x86's IOREQ feature common
  2020-08-20 19:14                             ` Oleksandr
@ 2020-08-21  0:53                               ` Stefano Stabellini
  2020-08-21 18:54                                 ` Julien Grall
  0 siblings, 1 reply; 140+ messages in thread
From: Stefano Stabellini @ 2020-08-21  0:53 UTC (permalink / raw)
  To: Oleksandr
  Cc: Julien Grall, Stefano Stabellini, paul, xen-devel,
	'Oleksandr Tyshchenko', 'Jan Beulich',
	'Andrew Cooper', 'Wei Liu',
	'Roger Pau Monné', 'George Dunlap',
	'Ian Jackson', 'Jun Nakajima',
	'Kevin Tian', 'Tim Deegan',
	'Julien Grall'

On Thu, 20 Aug 2020, Oleksandr wrote:
> > On 11/08/2020 23:48, Stefano Stabellini wrote:
> > > > I have the impression that we disagree in what the Device Emulator is
> > > > meant to
> > > > do. IHMO, the goal of the device emulator is to emulate a device in an
> > > > arch-agnostic way.
> > > 
> > > That would be great in theory but I am not sure it is achievable: if we
> > > use an existing emulator like QEMU, even a single device has to fit
> > > into QEMU's view of the world, which makes assumptions about host
> > > bridges and apertures. It is impossible today to build QEMU in an
> > > arch-agnostic way, it has to be tied to an architecture.
> > 
> > AFAICT, the only reason QEMU cannot build be in an arch-agnostic way is
> > because of TCG. If this wasn't built then you could easily write a machine
> > that doesn't depend on the instruction set.
> > 
> > The proof is, today, we are using QEMU x86 to serve Arm64 guest. Although
> > this is only for PV drivers.
> > 
> > > 
> > > I realize we are not building this interface for QEMU specifically, but
> > > even if we try to make the interface arch-agnostic, in reality the
> > > emulators won't be arch-agnostic.
> > 
> > This depends on your goal. If your goal is to write a standalone emulator
> > for a single device, then it is entirely possible to make it arch-agnostic.
> > 
> > Per above, this would even be possible if you were emulating a set of
> > devices.
> > 
> > What I want to avoid is requiring all the emulators to contain arch-specific
> > code just because it is easier to get QEMU working on Xen on Arm.
> > 
> > > If we send a port-mapped I/O request
> > > to qemu-system-aarch64 who knows what is going to happen: it is a code
> > > path that it is not explicitly tested.
> > 
> > Maybe, maybe not. To me this is mostly software issues that can easily be
> > mitigated if we do proper testing...
> 
> Could we please find a common ground on whether the PIO handling needs to be
> implemented on Arm or not? At least for the current patch series.

Can you do a test on QEMU to verify which address space the PIO BARs are
using on ARM? I don't know if there is an easy way to test it but it
would be very useful for this conversation.


> Below my thoughts:
> From one side I agree that emulator shouldn't contain any arch-specific code,
> yes it is hypervisor specific but it should be arch agnostic if possible. So
> PIO case should be handled.
> From other side I tend to think that it might be possible to skip PIO handling
> for the current patch series (leave it x86 specific for now as we do with
> handle_realmode_completion()).
> I think nothing will prevent us from adding PIO handling later on if there is
> a real need (use-case) for that. Please correct me if I am wrong.
> 
> I would be absolutely OK with any options.
> 
> What do you think?

I agree that PIO handling is not the most critical thing right now given
that we have quite a few other important TODOs in the series. I'd be
fine reviewing another version of the series with this issue still
pending.


Of course, PIO needs to be handled. The key to me is that QEMU (or other
emulator) should *not* emulate in/out instructions on ARM. PIO ioreq
requests should not be satisfied by using address_space_io directly (the
PIO address space that requires special instructions to access it). In
QEMU the PIO reads/writes should be done via address_space_memory (the
normal memory mapped address space).

So either way of the following approaches should be OK:

1) Xen sends out PIO addresses as memory mapped addresses, QEMU simply
   reads/writes on them
2) Xen sends out PIO addresses as address_space_io, QEMU finds the
   mapping to address_space_memory, then reads/writes on
   address_space_memory

From an interface and implementation perspective, 1) means that
IOREQ_TYPE_PIO is unused on ARM, while 2) means that IOREQ_TYPE_PIO is
still used as part of the ioreq interface, even if QEMU doesn't directly
operate on those addresses.

My preference is 1) because it leads to a simpler solution.



* Re: [RFC PATCH V1 05/12] hvm/dm: Introduce xendevicemodel_set_irq_level DM op
  2020-08-18  9:31                         ` Julien Grall
@ 2020-08-21  0:53                           ` Stefano Stabellini
  0 siblings, 0 replies; 140+ messages in thread
From: Stefano Stabellini @ 2020-08-21  0:53 UTC (permalink / raw)
  To: Julien Grall
  Cc: Stefano Stabellini, Julien Grall, Jan Beulich,
	Oleksandr Tyshchenko, xen-devel, Oleksandr Tyshchenko,
	Ian Jackson, Wei Liu, Andrew Cooper, George Dunlap,
	Volodymyr Babchuk, Julien Grall

On Tue, 18 Aug 2020, Julien Grall wrote:
> On 11/08/2020 23:48, Stefano Stabellini wrote:
> > On Tue, 11 Aug 2020, Julien Grall wrote:
> > > >   I vaguely
> > > > recall a bug 10+ years ago about this with QEMU on x86 and a line that
> > > > could be both active-high and active-low. So QEMU would raise the
> > > > interrupt but Xen would actually think that QEMU stopped the interrupt.
> > > > 
> > > > To do this right, we would have to introduce an interface between Xen
> > > > and QEMU to propagate the trigger type. Xen would have to tell QEMU when
> > > > the guest changed the configuration. That would work, but it would be
> > > > better if we can figure out a way to do without it to reduce complexity.
> > > Per above, I don't think this is necessary.
> > > 
> > > > 
> > > > Instead, given that QEMU and other emulators don't actually care about
> > > > active-high or active-low, if we have a Xen interface that just says
> > > > "fire the interrupt" we get away from this kind of troubles. It would
> > > > also be more efficient because the total number of hypercalls required
> > > > would be lower.
> > > 
> > > I read "fire the interrupt" as "Please generate an interrupt
> > > once".
> > > Is that the definition you expect?
> > 
> > Yes, that is the idea. It would have to take into account the edge/level
> > semantic difference: level would have "start it" and a "stop it".
> 
> I am still struggling to see how this can work:
>     - At the moment, QEMU is only providing us the line state. How can we
> deduce the type of the interrupt? Would it mean a major modification of the
> QEMU API?

Good question. 

I don't think we would need any major modifications of the QEMU APIs.
QEMU already uses two different function calls to trigger an edge
interrupt and to trigger a level interrupt.

Edge interrupts are triggered with qemu_irq_pulse; level interrupts with
qemu_irq_raise/qemu_irq_lower.

It is also possible for devices to call qemu_set_irq directly which
only has the state of the line represented by the "level" argument.
As far as I can tell all interrupts emulated in QEMU (at least the ones
we care about) are active-high.

We have a couple of choices in the implementation, like hooking into
qemu_irq_pulse, and/or checking if the interrupt is level or edge in the
xen interrupt injection function. The latter shouldn't require any
changes in QEMU common code.


FYI, looking into the code, there is something "strange" in virtio-mmio.c:
it only ever calls qemu_set_irq to start a notification. It doesn't look
like it ever calls qemu_set_irq to stop a notification at all. It is
possible that the state of the line is not accurately emulated for
virtio-mmio.c.


>     - Can you provide a rough sketch how this could be implemented in Xen?

It would work similarly to other emulated interrupt injections on the
Xen side, calling vgic_inject_irq.  We have matching info about
level/edge and active-high/active-low in Xen too, so we could do more
precise emulation of the interrupt flow, although I am aware of the
current limitations of the vgic in that regard.

But I have the feeling I didn't address your concern :-)




* Re: [RFC PATCH V1 04/12] xen/arm: Introduce arch specific bits for IOREQ/DM features
  2020-08-20 18:30           ` Oleksandr
@ 2020-08-21  6:16             ` Jan Beulich
  2020-08-21 11:13               ` Oleksandr
  0 siblings, 1 reply; 140+ messages in thread
From: Jan Beulich @ 2020-08-21  6:16 UTC (permalink / raw)
  To: Oleksandr
  Cc: Julien Grall, Paul Durrant, Roger Pau Monné,
	xen-devel, Stefano Stabellini, Wei Liu, Andrew Cooper,
	Ian Jackson, George Dunlap, Oleksandr Tyshchenko, Julien Grall,
	Daniel De Graaf, Volodymyr Babchuk

On 20.08.2020 20:30, Oleksandr wrote:
> On 06.08.20 14:29, Jan Beulich wrote:
>> On 06.08.2020 13:08, Julien Grall wrote:
>>> On 05/08/2020 20:30, Oleksandr wrote:
>>>> I was thinking how to split handle_hvm_io_completion()
>>>> gracefully but I failed find a good solution for that, so decided to add
>>>> two stubs (msix_write_completion and handle_realmode_completion) on Arm.
>>>> I could add a comment describing why they are here if appropriate. But
>>>> if you think they shouldn't be called from the common code in any way, I
>>>> will try to split it.
>>> I am not entirely sure what msix_write_completion is meant to do on x86.
>>> Is it dealing with virtual MSIx? Maybe Jan, Roger or Paul could help?
>> Due to the split brain model of handling PCI pass-through (between
>> Xen and qemu), a guest writing to an MSI-X entry needs this write
>> handed to qemu, and upon completion of the write there Xen also
>> needs to take some extra action.
> 
> 
> 1. Regarding common handle_hvm_io_completion() implementation:
> 
> Could msix_write_completion() be called later on so we would be able to 
> split handle_hvm_io_completion() gracefully or could we call it from 
> handle_mmio()?
> The reason I am asking is to avoid calling it from the common code in 
> order to avoid introducing stub on Arm which is not going to be ever 
> implemented
> (if msix_write_completion() is purely x86 material).

I'm unconvinced of this last fact, but as with about everything it is
quite certainly possible to call the function later. The question is
how ugly this would become, as this may involve redundant conditionals
(i.e. ones which need to remain in sync) and/or extra propagation of
state.

> For the non-RFC patch series I moved handle_realmode_completion to the 
> x86 code and now my local implementation looks like:
> 
> bool handle_hvm_io_completion(struct vcpu *v)
> {
>      struct domain *d = v->domain;
>      struct hvm_vcpu_io *vio = &v->arch.hvm.hvm_io;
>      struct hvm_ioreq_server *s;
>      struct hvm_ioreq_vcpu *sv;
>      enum hvm_io_completion io_completion;
> 
>      if ( has_vpci(d) && vpci_process_pending(v) )
>      {
>          raise_softirq(SCHEDULE_SOFTIRQ);
>          return false;
>      }
> 
>      sv = get_pending_vcpu(v, &s);
>      if ( sv && !hvm_wait_for_io(sv, get_ioreq(s, v)) )
>          return false;
> 
>      vio->io_req.state = hvm_ioreq_needs_completion(&vio->io_req) ?
>          STATE_IORESP_READY : STATE_IOREQ_NONE;
> 
>      msix_write_completion(v);
>      vcpu_end_shutdown_deferral(v);
> 
>      io_completion = vio->io_completion;
>      vio->io_completion = HVMIO_no_completion;
> 
>      switch ( io_completion )
>      {
>      case HVMIO_no_completion:
>          break;
> 
>      case HVMIO_mmio_completion:
>          return handle_mmio();
> 
>      case HVMIO_pio_completion:
>          return handle_pio(vio->io_req.addr, vio->io_req.size,
>                            vio->io_req.dir);
> 
>      default:
>          return arch_handle_hvm_io_completion(io_completion);
>      }
> 
>      return true;
> }
> 
> 2. Regarding renaming common handle_mmio() to ioreq_handle_complete_mmio():
> 
> There was a request to consider renaming that function which is called 
> from the common code in the context of IOREQ series.
> The point is, that the name of the function is pretty generic and can be 
> confusing on Arm (we already have a try_handle_mmio()).
> I noticed that except common code that function is called from a few 
> places on x86 (I am not even sure whether all of them are IOREQ related).
> The question is would x86 folks be happy with such renaming?

handle_mmio() without any parameters and used for a varying set
of purposes was imo never a good choice of name. The situation
has improved, but can do with further improvement. The new name,
if it is to be used for truly renaming the function, needs to fit all
uses though. As such, I don't think ioreq_handle_complete_mmio()
is an appropriate name.

> Alternatively I could provide the following in 
> include/asm-arm/hvm/ioreq.h without renaming it in the common code and
> still using non-confusing variant on Arm (however I am not sure whether 
> this is a good idea):
> 
> #define handle_mmio ioreq_handle_complete_mmio

If anything, for x86 it ought to be the other way around, at
which point you wouldn't need any alias #define on Arm.

> 3. Regarding common IOREQ/DM stuff location:
> 
> Currently it is located at:
> common/hvm/...
> include/xen/hvm/...
> 
> For the non-RFC patch series I am going to avoid using "hvm" name (which 
> is internal detail of arch specific code and shouldn't be exposed to the 
> common code).
> The question is whether I should use another directory name (probably 
> ioreq?) or just place them in common root directory?

I think there are arguments for and against hvm/. I'm not of
the opinion that ioreq/ would be a good name, so if hvm/ was to
be ruled out, I think the file(s) shouldn't go into separate
subdirs at all.

Jan



* Re: [RFC PATCH V1 04/12] xen/arm: Introduce arch specific bits for IOREQ/DM features
  2020-08-21  6:16             ` Jan Beulich
@ 2020-08-21 11:13               ` Oleksandr
  0 siblings, 0 replies; 140+ messages in thread
From: Oleksandr @ 2020-08-21 11:13 UTC (permalink / raw)
  To: Jan Beulich, Julien Grall
  Cc: Paul Durrant, Roger Pau Monné,
	xen-devel, Stefano Stabellini, Wei Liu, Andrew Cooper,
	Ian Jackson, George Dunlap, Oleksandr Tyshchenko, Julien Grall,
	Daniel De Graaf, Volodymyr Babchuk


On 21.08.20 09:16, Jan Beulich wrote:

Hi Jan.

Thank you for your answer.

> On 20.08.2020 20:30, Oleksandr wrote:
>> On 06.08.20 14:29, Jan Beulich wrote:
>>> On 06.08.2020 13:08, Julien Grall wrote:
>>>> On 05/08/2020 20:30, Oleksandr wrote:
>>>>> I was thinking how to split handle_hvm_io_completion()
>>>>> gracefully but I failed find a good solution for that, so decided to add
>>>>> two stubs (msix_write_completion and handle_realmode_completion) on Arm.
>>>>> I could add a comment describing why they are here if appropriate. But
>>>>> if you think they shouldn't be called from the common code in any way, I
>>>>> will try to split it.
>>>> I am not entirely sure what msix_write_completion is meant to do on x86.
>>>> Is it dealing with virtual MSIx? Maybe Jan, Roger or Paul could help?
>>> Due to the split brain model of handling PCI pass-through (between
>>> Xen and qemu), a guest writing to an MSI-X entry needs this write
>>> handed to qemu, and upon completion of the write there Xen also
>>> needs to take some extra action.
>>
>> 1. Regarding common handle_hvm_io_completion() implementation:
>>
>> Could msix_write_completion() be called later on so we would be able to
>> split handle_hvm_io_completion() gracefully or could we call it from
>> handle_mmio()?
>> The reason I am asking is to avoid calling it from the common code in
>> order to avoid introducing stub on Arm which is not going to be ever
>> implemented
>> (if msix_write_completion() is purely x86 material).
> I'm unconvinced of this last fact, but as with about everything it is
> quite certainly possible to call the function later. The question is
> how ugly this would become, as this may involve redundant conditionals
> (i.e. ones which need to remain in sync) and/or extra propagation of
> state.


I understand. Would it be better to make handle_hvm_io_completion() per 
arch then?
This would avoid using various stubs on Arm (we could get rid of 
has_vpci, msix_write_completion, handle_pio, 
arch_handle_hvm_io_completion, etc.)
and avoid renaming handle_mmio().

Julien, what is your opinion on that?


For example the Arm implementation would look like:

bool handle_hvm_io_completion(struct vcpu *v)
{
     struct domain *d = v->domain;
     struct hvm_vcpu_io *vio = &v->arch.hvm.hvm_io;
     struct hvm_ioreq_server *s;
     struct hvm_ioreq_vcpu *sv;
     enum hvm_io_completion io_completion;

     sv = get_pending_vcpu(v, &s);
     if ( sv && !hvm_wait_for_io(sv, get_ioreq(s, v)) )
         return false;

     vio->io_req.state = hvm_ioreq_needs_completion(&vio->io_req) ?
         STATE_IORESP_READY : STATE_IOREQ_NONE;

     vcpu_end_shutdown_deferral(v);

     io_completion = vio->io_completion;
     vio->io_completion = HVMIO_no_completion;

     switch ( io_completion )
     {
     case HVMIO_no_completion:
         break;

     case HVMIO_mmio_completion:
         return ioreq_handle_complete_mmio();

     default:
         ASSERT_UNREACHABLE();
         break;
     }

     return true;
}


>>
>> 2. Regarding renaming common handle_mmio() to ioreq_handle_complete_mmio():
>>
>> There was a request to consider renaming that function which is called
>> from the common code in the context of IOREQ series.
>> The point is, that the name of the function is pretty generic and can be
>> confusing on Arm (we already have a try_handle_mmio()).
>> I noticed that except common code that function is called from a few
>> places on x86 (I am not even sure whether all of them are IOREQ related).
>> The question is would x86 folks be happy with such renaming?
> handle_mmio() without any parameters and used for a varying set
> of purposes was imo never a good choice of name. The situation
> has improved, but can do with further improvement. The new name,
> if to be used for truly renaming the function need to fit all
> uses though. As such, I don't think ioreq_handle_complete_mmio()
> is an appropriate name.
>
>> Alternatively I could provide the following in
>> include/asm-arm/hvm/ioreq.h without renaming it in the common code and
>> still using non-confusing variant on Arm (however I am not sure whether
>> this is a good idea):
>>
>> #define handle_mmio ioreq_handle_complete_mmio
> If anything, for x86 it ought to be the other way around, at
> which point you wouldn't need any alias #define on Arm.
But could this approach be accepted? I think it would be the easiest way
to avoid confusion on Arm and to avoid renaming that function across the 
whole x86 code.


>
>> 3. Regarding common IOREQ/DM stuff location:
>>
>> Currently it is located at:
>> common/hvm/...
>> include/xen/hvm/...
>>
>> For the non-RFC patch series I am going to avoid using "hvm" name (which
>> is internal detail of arch specific code and shouldn't be exposed to the
>> common code).
>> The question is whether I should use another directory name (probably
>> ioreq?) or just place them in common root directory?
> I think there are arguments for and against hvm/. I'm not of
> the opinion that ioreq/ would be a good name, so if hvm/ was to
> be ruled out, I think the file(s) shouldn't go into separate
> subdirs at all.

Got it.


-- 
Regards,

Oleksandr Tyshchenko




* Re: [RFC PATCH V1 01/12] hvm/ioreq: Make x86's IOREQ feature common
  2020-08-21  0:53                               ` Stefano Stabellini
@ 2020-08-21 18:54                                 ` Julien Grall
  0 siblings, 0 replies; 140+ messages in thread
From: Julien Grall @ 2020-08-21 18:54 UTC (permalink / raw)
  To: Stefano Stabellini, Oleksandr
  Cc: paul, xen-devel, 'Oleksandr Tyshchenko',
	'Jan Beulich', 'Andrew Cooper', 'Wei Liu',
	'Roger Pau Monné', 'George Dunlap',
	'Ian Jackson', 'Jun Nakajima',
	'Kevin Tian', 'Tim Deegan',
	'Julien Grall'

Hi Stefano,

On 21/08/2020 01:53, Stefano Stabellini wrote:
> On Thu, 20 Aug 2020, Oleksandr wrote:
>>> On 11/08/2020 23:48, Stefano Stabellini wrote:
>>>>> I have the impression that we disagree in what the Device Emulator is
>>>>> meant to
>>>>> do. IHMO, the goal of the device emulator is to emulate a device in an
>>>>> arch-agnostic way.
>>>>
>>>> That would be great in theory but I am not sure it is achievable: if we
>>>> use an existing emulator like QEMU, even a single device has to fit
>>>> into QEMU's view of the world, which makes assumptions about host
>>>> bridges and apertures. It is impossible today to build QEMU in an
>>>> arch-agnostic way, it has to be tied to an architecture.
>>>
>>> AFAICT, the only reason QEMU cannot build be in an arch-agnostic way is
>>> because of TCG. If this wasn't built then you could easily write a machine
>>> that doesn't depend on the instruction set.
>>>
>>> The proof is, today, we are using QEMU x86 to serve Arm64 guest. Although
>>> this is only for PV drivers.
>>>
>>>>
>>>> I realize we are not building this interface for QEMU specifically, but
>>>> even if we try to make the interface arch-agnostic, in reality the
>>>> emulators won't be arch-agnostic.
>>>
>>> This depends on your goal. If your goal is to write a standalone emulator
>>> for a single device, then it is entirely possible to make it arch-agnostic.
>>>
>>> Per above, this would even be possible if you were emulating a set of
>>> devices.
>>>
>>> What I want to avoid is requiring all the emulators to contain arch-specific
>>> code just because it is easier to get QEMU working on Xen on Arm.
>>>
>>>> If we send a port-mapped I/O request
>>>> to qemu-system-aarch64, who knows what is going to happen: it is a code
>>>> path that is not explicitly tested.
>>>
>>> Maybe, maybe not. To me these are mostly software issues that can easily be
>>> mitigated if we do proper testing...
>>
>> Could we please find a common ground on whether the PIO handling needs to be
>> implemented on Arm or not? At least for the current patch series.
> 
> Can you do a test on QEMU to verify which address space the PIO BARs are
> using on ARM? I don't know if there is an easy way to test it but it
> would be very useful for this conversation.

This is basically configured by the machine itself. See create_pcie() in 
hw/arm/virt.c.

So the host controller is basically unaware that an MMIO region will be 
used instead of a PIO region.

> 
> 
>> Below my thoughts:
>> On one side, I agree that the emulator shouldn't contain any arch-specific code;
>> yes, it is hypervisor specific, but it should be arch agnostic if possible. So
>> the PIO case should be handled.
>> On the other side, I tend to think it might be possible to skip PIO handling
>> for the current patch series (leave it x86 specific for now, as we do with
>> handle_realmode_completion()).
>> I think nothing will prevent us from adding PIO handling later on if there is
>> a real need (use case) for that. Please correct me if I am wrong.
>>
>> I would be absolutely OK with any options.
>>
>> What do you think?
> 
> I agree that PIO handling is not the most critical thing right now given
> that we have quite a few other important TODOs in the series. I'd be
> fine reviewing another version of the series with this issue still
> pending.

For Arm64, the main user will be PCI. So this can be delayed until we 
add support for vPCI.

> 
> 
> Of course, PIO needs to be handled. The key to me is that QEMU (or other
> emulator) should *not* emulate in/out instructions on ARM.

I don't think anyone here suggested that we would emulate in/out 
instructions on Arm. Arm actually has no such instructions.

>  PIO ioreq
> requests should not be satisfied by using address_space_io directly (the
> PIO address space that requires special instructions to access it). In
> QEMU the PIO reads/writes should be done via address_space_memory (the
> normal memory mapped address space).
> 
> So either way of the following approaches should be OK:
> 
> 1) Xen sends out PIO addresses as memory mapped addresses, QEMU simply
>     reads/writes on them
> 2) Xen sends out PIO addresses as address_space_io, QEMU finds the
>     mapping to address_space_memory, then reads/writes on
>     address_space_memory
> 
>  From an interface and implementation perspective, 1) means that
> IOREQ_TYPE_PIO is unused on ARM, while 2) means that IOREQ_TYPE_PIO is
> still used as part of the ioreq interface, even if QEMU doesn't directly
> operate on those addresses.
> 
> My preference is 1) because it leads to a simpler solution.

"simpler" is actually very subjective :). So maybe you can clarify some 
of my concerns with this approach.

One part that has barely been discussed is configuration.

The discussion below is based on having the virtual PCI hostbridges 
implemented in Xen.

In the case of an MMIO BAR, the emulator doesn't need to know where the 
aperture is in advance. This is because the BAR will contain an 
absolute MMIO address, so it can configure the trap correctly.

In the case of a PIO BAR, from my understanding, the BAR will contain a 
relative offset from the base of the PIO aperture. So the emulator needs to 
know the base address of the PIO aperture. How do you plan to pass that 
information to the emulator? What about the case where there are multiple 
hostbridges?

Furthermore, most of the discussion has been focused on a device model 
that provides emulation for all your devices (e.g. QEMU). 
However, I think this is going to be less common than a device model that 
emulates a single device (e.g. DEMU). The latter fits better in the 
disaggregation model.

An emulator for a single PCI device is basically the same as a real PCI 
device. Do you agree with that?

The HW engineer designing the PCI device doesn't need to know about the 
architecture. They just need to understand the interface with the 
hostbridge. The hostbridge will then take care of the differences between 
architectures.

A developer should really be able to do the same with the emulator, 
i.e. write it for x86 and then just recompile it for Arm. With your 
approach, they would have to understand how the architecture works.

I still don't quite understand why we are trying to differ here. Why 
would our hostbridge implementation not abstract things the same way a 
real one does? Can you clarify?

Maybe the problem is just the naming issue?

Cheers,

-- 
Julien Grall



end of thread, other threads:[~2020-08-21 18:55 UTC | newest]

Thread overview: 140+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-08-03 18:21 [RFC PATCH V1 00/12] IOREQ feature (+ virtio-mmio) on Arm Oleksandr Tyshchenko
2020-08-03 18:21 ` [RFC PATCH V1 01/12] hvm/ioreq: Make x86's IOREQ feature common Oleksandr Tyshchenko
2020-08-04  7:45   ` Paul Durrant
2020-08-04 11:10     ` Oleksandr
2020-08-04 11:23       ` Paul Durrant
2020-08-04 11:51         ` Oleksandr
2020-08-04 13:18           ` Paul Durrant
2020-08-04 13:52       ` Julien Grall
2020-08-04 15:41         ` Jan Beulich
2020-08-04 19:11         ` Stefano Stabellini
2020-08-05  7:01           ` Jan Beulich
2020-08-06  0:37             ` Stefano Stabellini
2020-08-06  6:59               ` Jan Beulich
2020-08-06 20:32                 ` Stefano Stabellini
2020-08-07 13:19                   ` Oleksandr
2020-08-07 16:45               ` Oleksandr
2020-08-07 21:50                 ` Stefano Stabellini
2020-08-07 22:19                   ` Oleksandr
2020-08-10 13:41                     ` Oleksandr
2020-08-10 23:34                       ` Stefano Stabellini
2020-08-11  9:19                         ` Julien Grall
2020-08-11 10:10                           ` Oleksandr
2020-08-11 22:47                             ` Stefano Stabellini
2020-08-12 14:35                               ` Oleksandr
2020-08-12 23:08                                 ` Stefano Stabellini
2020-08-13 20:16                                   ` Julien Grall
2020-08-07 23:45                   ` Oleksandr
2020-08-10 23:34                     ` Stefano Stabellini
2020-08-05  8:33           ` Julien Grall
2020-08-06  0:37             ` Stefano Stabellini
2020-08-06  9:45               ` Julien Grall
2020-08-06 23:48                 ` Stefano Stabellini
2020-08-10 19:20                   ` Julien Grall
2020-08-10 23:34                     ` Stefano Stabellini
2020-08-11 11:28                       ` Julien Grall
2020-08-11 22:48                         ` Stefano Stabellini
2020-08-12  8:19                           ` Julien Grall
2020-08-20 19:14                             ` Oleksandr
2020-08-21  0:53                               ` Stefano Stabellini
2020-08-21 18:54                                 ` Julien Grall
2020-08-05 13:30   ` Julien Grall
2020-08-06 11:37     ` Oleksandr
2020-08-10 16:29       ` Julien Grall
2020-08-10 17:28         ` Oleksandr
2020-08-05 16:15   ` Andrew Cooper
2020-08-06  8:20     ` Oleksandr
2020-08-15 17:30   ` Julien Grall
2020-08-16 19:37     ` Oleksandr
2020-08-03 18:21 ` [RFC PATCH V1 02/12] hvm/dm: Make x86's DM " Oleksandr Tyshchenko
2020-08-03 18:21 ` [RFC PATCH V1 03/12] xen/mm: Make x86's XENMEM_resource_ioreq_server handling common Oleksandr Tyshchenko
2020-08-03 18:21 ` [RFC PATCH V1 04/12] xen/arm: Introduce arch specific bits for IOREQ/DM features Oleksandr Tyshchenko
2020-08-04  7:49   ` Paul Durrant
2020-08-04 14:01     ` Julien Grall
2020-08-04 23:22       ` Stefano Stabellini
2020-08-15 17:56       ` Julien Grall
2020-08-17 14:36         ` Oleksandr
2020-08-04 23:22   ` Stefano Stabellini
2020-08-05  7:05     ` Jan Beulich
2020-08-05 16:41       ` Stefano Stabellini
2020-08-05 19:45         ` Oleksandr
2020-08-05  9:32     ` Julien Grall
2020-08-05 15:41       ` Oleksandr
2020-08-06 10:19         ` Julien Grall
2020-08-10 18:09       ` Oleksandr
2020-08-10 18:21         ` Oleksandr
2020-08-10 19:00         ` Julien Grall
2020-08-10 20:29           ` Oleksandr
2020-08-10 22:37             ` Julien Grall
2020-08-11  6:13               ` Oleksandr
2020-08-12 15:08                 ` Oleksandr
2020-08-11 17:09       ` Oleksandr
2020-08-11 17:50         ` Julien Grall
2020-08-13 18:41           ` Oleksandr
2020-08-13 20:36             ` Julien Grall
2020-08-13 21:49               ` Oleksandr
2020-08-13 20:39             ` Oleksandr Tyshchenko
2020-08-13 22:14               ` Julien Grall
2020-08-14 12:08                 ` Oleksandr
2020-08-05 14:12   ` Julien Grall
2020-08-05 14:45     ` Jan Beulich
2020-08-05 19:30     ` Oleksandr
2020-08-06 11:08       ` Julien Grall
2020-08-06 11:29         ` Jan Beulich
2020-08-20 18:30           ` Oleksandr
2020-08-21  6:16             ` Jan Beulich
2020-08-21 11:13               ` Oleksandr
2020-08-06 13:27         ` Oleksandr
2020-08-10 18:25           ` Julien Grall
2020-08-10 19:58             ` Oleksandr
2020-08-05 16:13   ` Jan Beulich
2020-08-05 19:47     ` Oleksandr
2020-08-03 18:21 ` [RFC PATCH V1 05/12] hvm/dm: Introduce xendevicemodel_set_irq_level DM op Oleksandr Tyshchenko
2020-08-04 23:22   ` Stefano Stabellini
2020-08-05  9:39     ` Julien Grall
2020-08-06  0:37       ` Stefano Stabellini
2020-08-06 11:32         ` Julien Grall
2020-08-06 23:49           ` Stefano Stabellini
2020-08-07  8:43             ` Jan Beulich
2020-08-07 21:50               ` Stefano Stabellini
2020-08-08  9:27                 ` Julien Grall
2020-08-08  9:28                   ` Julien Grall
2020-08-10 23:34                   ` Stefano Stabellini
2020-08-11 13:04                     ` Julien Grall
2020-08-11 22:48                       ` Stefano Stabellini
2020-08-18  9:31                         ` Julien Grall
2020-08-21  0:53                           ` Stefano Stabellini
2020-08-17 15:23                 ` Jan Beulich
2020-08-17 22:56                   ` Stefano Stabellini
2020-08-18  8:03                     ` Jan Beulich
2020-08-05 16:15   ` Jan Beulich
2020-08-05 22:12     ` Oleksandr
2020-08-03 18:21 ` [RFC PATCH V1 06/12] libxl: Introduce basic virtio-mmio support on Arm Oleksandr Tyshchenko
2020-08-03 18:21 ` [RFC PATCH V1 07/12] A collection of tweaks to be able to run emulator in driver domain Oleksandr Tyshchenko
2020-08-05 16:19   ` Jan Beulich
2020-08-05 16:40     ` Paul Durrant
2020-08-06  9:22       ` Oleksandr
2020-08-06  9:27         ` Jan Beulich
2020-08-14 16:30           ` Oleksandr
2020-08-16 15:36             ` Julien Grall
2020-08-17 15:07               ` Oleksandr
2020-08-03 18:21 ` [RFC PATCH V1 08/12] xen/arm: Invalidate qemu mapcache on XENMEM_decrease_reservation Oleksandr Tyshchenko
2020-08-05 16:21   ` Jan Beulich
2020-08-06 11:35     ` Julien Grall
2020-08-06 11:50       ` Jan Beulich
2020-08-06 14:28         ` Oleksandr
2020-08-06 16:33           ` Jan Beulich
2020-08-06 16:57             ` Oleksandr
2020-08-03 18:21 ` [RFC PATCH V1 09/12] libxl: Handle virtio-mmio irq in more correct way Oleksandr Tyshchenko
2020-08-04 23:22   ` Stefano Stabellini
2020-08-05 20:51     ` Oleksandr
2020-08-03 18:21 ` [RFC PATCH V1 10/12] libxl: Add support for virtio-disk configuration Oleksandr Tyshchenko
2020-08-04 23:23   ` Stefano Stabellini
2020-08-05 21:12     ` Oleksandr
2020-08-06  0:37       ` Stefano Stabellini
2020-08-03 18:21 ` [RFC PATCH V1 11/12] libxl: Insert "dma-coherent" property into virtio-mmio device node Oleksandr Tyshchenko
2020-08-04 23:23   ` Stefano Stabellini
2020-08-05 20:35     ` Oleksandr
2020-08-03 18:21 ` [RFC PATCH V1 12/12] libxl: Fix duplicate memory node in DT Oleksandr Tyshchenko
2020-08-15 17:24 ` [RFC PATCH V1 00/12] IOREQ feature (+ virtio-mmio) on Arm Julien Grall
2020-08-16 19:34   ` Oleksandr
