* [RFC][PATCH v2 0/3] IVSHMEM version 2 device for QEMU
@ 2020-01-07 14:36 Jan Kiszka
From: Jan Kiszka @ 2020-01-07 14:36 UTC
  To: qemu-devel
  Cc: liang yan, Jailhouse, Claudio Fontana, Michael S. Tsirkin,
	Markus Armbruster, Hannes Reinecke, Stefan Hajnoczi

Overdue update of the ivshmem 2.0 device model as presented at [1].

Changes in v2:
 - changed PCI device ID to a Siemens-granted one,
   adjusted PCI device revision to 0
 - removed unused feature register from device
 - addressed feedback on specification document
 - rebased over master

This version is now fully in sync with the implementation for Jailhouse
that is currently under review [2][3]; the UIO and virtio-ivshmem
drivers are shared. Jailhouse will very likely pick up this revision of
the device in order to move forward with stressing it.

More details on the usage with QEMU, reproduced from the original cover
letter (with adjustments for the new device ID):

If you want to play with this, the basic setup of the shared memory
device is described in patches 1 and 3. The UIO driver and the
virtio-ivshmem prototype can be found at

    http://git.kiszka.org/?p=linux.git;a=shortlog;h=refs/heads/queues/ivshmem2

Accessing the device via UIO is trivial enough. If you want to use it
for virtio, the following is needed on the virtio console backend side,
in addition to the description in patch 3:

    modprobe uio_ivshmem
    echo "110a 4106 1af4 1100 ffc003 ffffff" > /sys/bus/pci/drivers/uio_ivshmem/new_id
    linux/tools/virtio/virtio-ivshmem-console /dev/uio0

And for virtio block:

    echo "110a 4106 1af4 1100 ffc002 ffffff" > /sys/bus/pci/drivers/uio_ivshmem/new_id
    linux/tools/virtio/virtio-ivshmem-block /dev/uio0 /path/to/disk.img

After that, you can start the QEMU frontend instance with the
virtio-ivshmem driver installed, which can then use the new /dev/hvc*
or /dev/vda* devices as usual.
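
A minimal frontend invocation could look like this (just a sketch,
assuming the ivshmem2-server from patch 3 is already listening on the
default socket path used in patch 1):

    qemu-system-x86_64 ... \
        -chardev socket,path=/tmp/ivshmem_socket,id=ivshmem \
        -device ivshmem,chardev=ivshmem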

Any feedback welcome!

Jan

PS: Let me know if I missed someone potentially interested in this topic
on CC - or if you would like to be dropped from the list.

[1] https://kvmforum2019.sched.com/event/TmxI
[2] https://groups.google.com/forum/#!topic/jailhouse-dev/ffnCcRh8LOs
[3] https://groups.google.com/forum/#!topic/jailhouse-dev/HX-0AGF1cjg

Jan Kiszka (3):
  hw/misc: Add implementation of ivshmem revision 2 device
  docs/specs: Add specification of ivshmem device revision 2
  contrib: Add server for ivshmem revision 2

 Makefile                                  |    3 +
 Makefile.objs                             |    1 +
 configure                                 |    1 +
 contrib/ivshmem2-server/Makefile.objs     |    1 +
 contrib/ivshmem2-server/ivshmem2-server.c |  462 ++++++++++++
 contrib/ivshmem2-server/ivshmem2-server.h |  158 +++++
 contrib/ivshmem2-server/main.c            |  313 +++++++++
 docs/specs/ivshmem-2-device-spec.md       |  376 ++++++++++
 hw/misc/Makefile.objs                     |    2 +-
 hw/misc/ivshmem2.c                        | 1085 +++++++++++++++++++++++++++++
 include/hw/misc/ivshmem2.h                |   48 ++
 include/hw/pci/pci_ids.h                  |    2 +
 12 files changed, 2451 insertions(+), 1 deletion(-)
 create mode 100644 contrib/ivshmem2-server/Makefile.objs
 create mode 100644 contrib/ivshmem2-server/ivshmem2-server.c
 create mode 100644 contrib/ivshmem2-server/ivshmem2-server.h
 create mode 100644 contrib/ivshmem2-server/main.c
 create mode 100644 docs/specs/ivshmem-2-device-spec.md
 create mode 100644 hw/misc/ivshmem2.c
 create mode 100644 include/hw/misc/ivshmem2.h

-- 
2.16.4




* [RFC][PATCH v2 1/3] hw/misc: Add implementation of ivshmem revision 2 device
From: Jan Kiszka @ 2020-01-07 14:36 UTC
  To: qemu-devel
  Cc: liang yan, Jailhouse, Claudio Fontana, Michael S. Tsirkin,
	Markus Armbruster, Hannes Reinecke, Stefan Hajnoczi

From: Jan Kiszka <jan.kiszka@siemens.com>

This adds a reimplementation of ivshmem in its new revision 2 as a
separate device. The goal of this is not to enable sharing with v1, but
rather to allow exploring the properties and potential limitations of
the new version prior to discussing its integration with the existing
code.

v2 always requires a server to interconnect two or more QEMU
instances because it provides signaling between peers unconditionally.
Therefore, only the interconnecting chardev, master mode, and the usage
of ioeventfd can be configured at device level. All other parameters
are defined by the server instance.

A new server protocol is introduced along with this. Its primary
difference is the introduction of a single welcome message that
contains all peer parameters, rather than a series of single-word
messages pushing them piece by piece.

A complicating difference in interrupt handling, compared to v1, is the
auto-disable mode of v2: When this is active, interrupt delivery is
disabled by the device after each interrupt event. This prevents the
usage of irqfd on the receiving side, but it lowers the handling cost
for guests that implement interrupt throttling this way (specifically
when exposing the device via UIO).
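
A guest exposing the device via UIO would then re-arm interrupt
delivery from its handler by setting the enable bit in the Interrupt
Control register again. A minimal sketch in C, using the register
layout from the spec in patch 2 (the mapping pointer is hypothetical):

    #include <stdint.h>

    #define IVSHMEM_REG_INT_CTRL    0x08
    #define IVSHMEM_INT_ENABLE      0x1

    /* regs: mapped register region (BAR 0) */
    static void ivshmem_reenable_irq(volatile uint32_t *regs)
    {
        /* one-shot mode cleared the enable bit on delivery */
        regs[IVSHMEM_REG_INT_CTRL / 4] = IVSHMEM_INT_ENABLE;
    }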

No changes have been made to the ivshmem device regarding migration:
Only the master can live-migrate; slave peers have to hot-unplug the
device first.

The details of the device model will be specified in a succeeding
commit. Drivers for this device can currently be found under

http://git.kiszka.org/?p=linux.git;a=shortlog;h=refs/heads/queues/ivshmem2

To instantiate an ivshmem v2 device, just add

 ... -chardev socket,path=/tmp/ivshmem_socket,id=ivshmem \
     -device ivshmem,chardev=ivshmem

provided the server opened its socket under the default path.

Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
---
 hw/misc/Makefile.objs      |    2 +-
 hw/misc/ivshmem2.c         | 1085 ++++++++++++++++++++++++++++++++++++++++++++
 include/hw/misc/ivshmem2.h |   48 ++
 include/hw/pci/pci_ids.h   |    2 +
 4 files changed, 1136 insertions(+), 1 deletion(-)
 create mode 100644 hw/misc/ivshmem2.c
 create mode 100644 include/hw/misc/ivshmem2.h

diff --git a/hw/misc/Makefile.objs b/hw/misc/Makefile.objs
index ba898a5781..90a4a6608c 100644
--- a/hw/misc/Makefile.objs
+++ b/hw/misc/Makefile.objs
@@ -26,7 +26,7 @@ common-obj-$(CONFIG_PUV3) += puv3_pm.o
 
 common-obj-$(CONFIG_MACIO) += macio/
 
-common-obj-$(CONFIG_IVSHMEM_DEVICE) += ivshmem.o
+common-obj-$(CONFIG_IVSHMEM_DEVICE) += ivshmem.o ivshmem2.o
 
 common-obj-$(CONFIG_REALVIEW) += arm_sysctl.o
 common-obj-$(CONFIG_NSERIES) += cbus.o
diff --git a/hw/misc/ivshmem2.c b/hw/misc/ivshmem2.c
new file mode 100644
index 0000000000..d5f88ed0e9
--- /dev/null
+++ b/hw/misc/ivshmem2.c
@@ -0,0 +1,1085 @@
+/*
+ * Inter-VM Shared Memory PCI device, version 2.
+ *
+ * Copyright (c) Siemens AG, 2019
+ *
+ * Authors:
+ *  Jan Kiszka <jan.kiszka@siemens.com>
+ *
+ * Based on ivshmem.c by Cam Macdonell <cam@cs.ualberta.ca>
+ *
+ * This code is licensed under the GNU GPL v2.
+ */
+
+#include "qemu/osdep.h"
+#include "qemu/units.h"
+#include "qapi/error.h"
+#include "qemu/cutils.h"
+#include "hw/hw.h"
+#include "hw/pci/pci.h"
+#include "hw/pci/msi.h"
+#include "hw/pci/msix.h"
+#include "hw/qdev-properties.h"
+#include "sysemu/kvm.h"
+#include "migration/blocker.h"
+#include "migration/vmstate.h"
+#include "qemu/error-report.h"
+#include "qemu/event_notifier.h"
+#include "qemu/module.h"
+#include "qom/object_interfaces.h"
+#include "chardev/char-fe.h"
+#include "sysemu/qtest.h"
+#include "qapi/visitor.h"
+
+#include "hw/misc/ivshmem2.h"
+
+#define PCI_VENDOR_ID_IVSHMEM   PCI_VENDOR_ID_SIEMENS
+#define PCI_DEVICE_ID_IVSHMEM   0x4106
+
+#define IVSHMEM_MAX_PEERS       UINT16_MAX
+#define IVSHMEM_IOEVENTFD       0
+#define IVSHMEM_MSI             1
+
+#define IVSHMEM_REG_BAR_SIZE    0x1000
+
+#define IVSHMEM_REG_ID          0x00
+#define IVSHMEM_REG_MAX_PEERS   0x04
+#define IVSHMEM_REG_INT_CTRL    0x08
+#define IVSHMEM_REG_DOORBELL    0x0c
+#define IVSHMEM_REG_STATE       0x10
+
+#define IVSHMEM_INT_ENABLE      0x1
+
+#define IVSHMEM_ONESHOT_MODE    0x1
+
+#define IVSHMEM_DEBUG 0
+#define IVSHMEM_DPRINTF(fmt, ...)                       \
+    do {                                                \
+        if (IVSHMEM_DEBUG) {                            \
+            printf("IVSHMEM: " fmt, ## __VA_ARGS__);    \
+        }                                               \
+    } while (0)
+
+#define TYPE_IVSHMEM "ivshmem"
+#define IVSHMEM(obj) \
+    OBJECT_CHECK(IVShmemState, (obj), TYPE_IVSHMEM)
+
+typedef struct Peer {
+    int nb_eventfds;
+    EventNotifier *eventfds;
+} Peer;
+
+typedef struct MSIVector {
+    PCIDevice *pdev;
+    int virq;
+    bool unmasked;
+} MSIVector;
+
+typedef struct IVShmemVndrCap {
+    uint8_t id;
+    uint8_t next;
+    uint8_t length;
+    uint8_t priv_ctrl;
+    uint32_t state_tab_sz;
+    uint64_t rw_section_sz;
+    uint64_t output_section_sz;
+} IVShmemVndrCap;
+
+typedef struct IVShmemState {
+    /*< private >*/
+    PCIDevice parent_obj;
+    /*< public >*/
+
+    uint32_t features;
+
+    CharBackend server_chr;
+
+    /* registers */
+    uint8_t *priv_ctrl;
+    uint32_t vm_id;
+    uint32_t intctrl;
+    uint32_t state;
+
+    /* BARs */
+    MemoryRegion ivshmem_mmio; /* BAR 0 (registers) */
+    MemoryRegion ivshmem_bar2; /* BAR 2 (shared memory) */
+
+    void *shmem;
+    size_t shmem_sz;
+    size_t output_section_sz;
+
+    MemoryRegion state_tab;
+    MemoryRegion rw_section;
+    MemoryRegion input_sections;
+    MemoryRegion output_section;
+
+    /* interrupt support */
+    Peer *peers;
+    int nb_peers;               /* space in @peers[] */
+    uint32_t max_peers;
+    uint32_t vectors;
+    MSIVector *msi_vectors;
+
+    uint8_t msg_buf[32];        /* buffer for receiving server messages */
+    int msg_buffered_bytes;     /* #bytes in @msg_buf */
+
+    uint32_t protocol;
+
+    /* migration stuff */
+    OnOffAuto master;
+    Error *migration_blocker;
+} IVShmemState;
+
+static void ivshmem_enable_irqfd(IVShmemState *s);
+static void ivshmem_disable_irqfd(IVShmemState *s);
+
+static inline uint32_t ivshmem_has_feature(IVShmemState *ivs,
+                                           unsigned int feature) {
+    return (ivs->features & (1 << feature));
+}
+
+static inline bool ivshmem_is_master(IVShmemState *s)
+{
+    assert(s->master != ON_OFF_AUTO_AUTO);
+    return s->master == ON_OFF_AUTO_ON;
+}
+
+static bool ivshmem_irqfd_usable(IVShmemState *s)
+{
+    PCIDevice *pdev = PCI_DEVICE(s);
+
+    return (s->intctrl & IVSHMEM_INT_ENABLE) && msix_enabled(pdev) &&
+        !(*s->priv_ctrl & IVSHMEM_ONESHOT_MODE);
+}
+
+static void ivshmem_update_irqfd(IVShmemState *s, bool was_usable)
+{
+    bool is_usable = ivshmem_irqfd_usable(s);
+
+    if (kvm_msi_via_irqfd_enabled()) {
+        if (!was_usable && is_usable) {
+            ivshmem_enable_irqfd(s);
+        } else if (was_usable && !is_usable) {
+            ivshmem_disable_irqfd(s);
+        }
+    }
+}
+
+static void ivshmem_write_intctrl(IVShmemState *s, uint32_t new_state)
+{
+    bool was_usable = ivshmem_irqfd_usable(s);
+
+    s->intctrl = new_state & IVSHMEM_INT_ENABLE;
+    ivshmem_update_irqfd(s, was_usable);
+}
+
+static void ivshmem_write_state(IVShmemState *s, uint32_t new_state)
+{
+    uint32_t *state_table = s->shmem;
+    int peer;
+
+    state_table[s->vm_id] = new_state;
+    smp_mb();
+
+    if (s->state != new_state) {
+        s->state = new_state;
+        for (peer = 0; peer < s->nb_peers; peer++) {
+            if (peer != s->vm_id && s->peers[peer].nb_eventfds > 0) {
+                event_notifier_set(&s->peers[peer].eventfds[0]);
+            }
+        }
+    }
+}
+
+static void ivshmem_io_write(void *opaque, hwaddr addr,
+                             uint64_t val, unsigned size)
+{
+    IVShmemState *s = opaque;
+
+    uint16_t dest = val >> 16;
+    uint16_t vector = val & 0xff;
+
+    addr &= 0xfc;
+
+    IVSHMEM_DPRINTF("writing to addr " TARGET_FMT_plx "\n", addr);
+    switch (addr) {
+    case IVSHMEM_REG_INT_CTRL:
+        ivshmem_write_intctrl(s, val);
+        break;
+
+    case IVSHMEM_REG_DOORBELL:
+        /* check that dest VM ID is reasonable */
+        if (dest >= s->nb_peers) {
+            IVSHMEM_DPRINTF("Invalid destination VM ID (%d)\n", dest);
+            break;
+        }
+
+        /* check doorbell range */
+        if (vector < s->peers[dest].nb_eventfds) {
+            IVSHMEM_DPRINTF("Notifying VM %d on vector %d\n", dest, vector);
+            event_notifier_set(&s->peers[dest].eventfds[vector]);
+        } else {
+            IVSHMEM_DPRINTF("Invalid destination vector %d on VM %d\n",
+                            vector, dest);
+        }
+        break;
+
+    case IVSHMEM_REG_STATE:
+        ivshmem_write_state(s, val);
+        break;
+
+    default:
+        IVSHMEM_DPRINTF("Unhandled write " TARGET_FMT_plx "\n", addr);
+    }
+}
+
+static uint64_t ivshmem_io_read(void *opaque, hwaddr addr,
+                                unsigned size)
+{
+    IVShmemState *s = opaque;
+    uint32_t ret;
+
+    switch (addr) {
+    case IVSHMEM_REG_ID:
+        ret = s->vm_id;
+        break;
+
+    case IVSHMEM_REG_MAX_PEERS:
+        ret = s->max_peers;
+        break;
+
+    case IVSHMEM_REG_INT_CTRL:
+        ret = s->intctrl;
+        break;
+
+    case IVSHMEM_REG_STATE:
+        ret = s->state;
+        break;
+
+    default:
+        IVSHMEM_DPRINTF("why are we reading " TARGET_FMT_plx "\n", addr);
+        ret = 0;
+    }
+
+    return ret;
+}
+
+static const MemoryRegionOps ivshmem_mmio_ops = {
+    .read = ivshmem_io_read,
+    .write = ivshmem_io_write,
+    .endianness = DEVICE_NATIVE_ENDIAN,
+    .impl = {
+        .min_access_size = 4,
+        .max_access_size = 4,
+    },
+};
+
+static void ivshmem_vector_notify(void *opaque)
+{
+    MSIVector *entry = opaque;
+    PCIDevice *pdev = entry->pdev;
+    IVShmemState *s = IVSHMEM(pdev);
+    int vector = entry - s->msi_vectors;
+    EventNotifier *n = &s->peers[s->vm_id].eventfds[vector];
+
+    if (!event_notifier_test_and_clear(n) ||
+        !(s->intctrl & IVSHMEM_INT_ENABLE)) {
+        return;
+    }
+
+    IVSHMEM_DPRINTF("interrupt on vector %p %d\n", pdev, vector);
+    if (ivshmem_has_feature(s, IVSHMEM_MSI)) {
+        if (msix_enabled(pdev)) {
+            msix_notify(pdev, vector);
+        }
+    } else if (pdev->config[PCI_INTERRUPT_PIN]) {
+        pci_set_irq(pdev, 1);
+        pci_set_irq(pdev, 0);
+    }
+    if (*s->priv_ctrl & IVSHMEM_ONESHOT_MODE) {
+        s->intctrl &= ~IVSHMEM_INT_ENABLE;
+    }
+}
+
+static int ivshmem_irqfd_vector_unmask(PCIDevice *dev, unsigned vector,
+                                       MSIMessage msg)
+{
+    IVShmemState *s = IVSHMEM(dev);
+    EventNotifier *n = &s->peers[s->vm_id].eventfds[vector];
+    MSIVector *v = &s->msi_vectors[vector];
+    int ret;
+
+    IVSHMEM_DPRINTF("vector unmask %p %d\n", dev, vector);
+    if (!v->pdev) {
+        error_report("ivshmem: vector %d route does not exist", vector);
+        return -EINVAL;
+    }
+    assert(!v->unmasked);
+
+    ret = kvm_irqchip_add_msi_route(kvm_state, vector, dev);
+    if (ret < 0) {
+        error_report("kvm_irqchip_add_msi_route failed");
+        return ret;
+    }
+    v->virq = ret;
+    kvm_irqchip_commit_routes(kvm_state);
+
+    ret = kvm_irqchip_add_irqfd_notifier_gsi(kvm_state, n, NULL, v->virq);
+    if (ret < 0) {
+        error_report("kvm_irqchip_add_irqfd_notifier_gsi failed");
+        return ret;
+    }
+    v->unmasked = true;
+
+    return 0;
+}
+
+static void ivshmem_irqfd_vector_mask(PCIDevice *dev, unsigned vector)
+{
+    IVShmemState *s = IVSHMEM(dev);
+    EventNotifier *n = &s->peers[s->vm_id].eventfds[vector];
+    MSIVector *v = &s->msi_vectors[vector];
+    int ret;
+
+    IVSHMEM_DPRINTF("vector mask %p %d\n", dev, vector);
+    if (!v->pdev) {
+        error_report("ivshmem: vector %d route does not exist", vector);
+        return;
+    }
+    assert(v->unmasked);
+
+    ret = kvm_irqchip_remove_irqfd_notifier_gsi(kvm_state, n, v->virq);
+    if (ret < 0) {
+        error_report("remove_irqfd_notifier_gsi failed");
+        return;
+    }
+    kvm_irqchip_release_virq(kvm_state, v->virq);
+
+    v->unmasked = false;
+}
+
+static void ivshmem_irqfd_vector_poll(PCIDevice *dev,
+                                      unsigned int vector_start,
+                                      unsigned int vector_end)
+{
+    IVShmemState *s = IVSHMEM(dev);
+    unsigned int vector;
+
+    IVSHMEM_DPRINTF("vector poll %p %d-%d\n", dev, vector_start, vector_end);
+
+    vector_end = MIN(vector_end, s->vectors);
+
+    for (vector = vector_start; vector < vector_end; vector++) {
+        EventNotifier *notifier = &s->peers[s->vm_id].eventfds[vector];
+
+        if (!msix_is_masked(dev, vector)) {
+            continue;
+        }
+
+        if (event_notifier_test_and_clear(notifier)) {
+            msix_set_pending(dev, vector);
+        }
+    }
+}
+
+static void ivshmem_watch_vector_notifier(IVShmemState *s, int vector)
+{
+    EventNotifier *n = &s->peers[s->vm_id].eventfds[vector];
+    int eventfd = event_notifier_get_fd(n);
+
+    assert(!s->msi_vectors[vector].pdev);
+    s->msi_vectors[vector].pdev = PCI_DEVICE(s);
+
+    qemu_set_fd_handler(eventfd, ivshmem_vector_notify,
+                        NULL, &s->msi_vectors[vector]);
+}
+
+static void ivshmem_unwatch_vector_notifier(IVShmemState *s, int vector)
+{
+    EventNotifier *n = &s->peers[s->vm_id].eventfds[vector];
+    int eventfd = event_notifier_get_fd(n);
+
+    if (!s->msi_vectors[vector].pdev) {
+        return;
+    }
+
+    qemu_set_fd_handler(eventfd, NULL, NULL, NULL);
+
+    s->msi_vectors[vector].pdev = NULL;
+}
+
+static void ivshmem_add_eventfd(IVShmemState *s, int posn, int i)
+{
+    memory_region_add_eventfd(&s->ivshmem_mmio,
+                              IVSHMEM_REG_DOORBELL,
+                              4,
+                              true,
+                              (posn << 16) | i,
+                              &s->peers[posn].eventfds[i]);
+}
+
+static void ivshmem_del_eventfd(IVShmemState *s, int posn, int i)
+{
+    memory_region_del_eventfd(&s->ivshmem_mmio,
+                              IVSHMEM_REG_DOORBELL,
+                              4,
+                              true,
+                              (posn << 16) | i,
+                              &s->peers[posn].eventfds[i]);
+}
+
+static void close_peer_eventfds(IVShmemState *s, int posn)
+{
+    int i, n;
+
+    assert(posn >= 0 && posn < s->nb_peers);
+    n = s->peers[posn].nb_eventfds;
+
+    if (ivshmem_has_feature(s, IVSHMEM_IOEVENTFD)) {
+        memory_region_transaction_begin();
+        for (i = 0; i < n; i++) {
+            ivshmem_del_eventfd(s, posn, i);
+        }
+        memory_region_transaction_commit();
+    }
+
+    for (i = 0; i < n; i++) {
+        event_notifier_cleanup(&s->peers[posn].eventfds[i]);
+    }
+
+    g_free(s->peers[posn].eventfds);
+    s->peers[posn].nb_eventfds = 0;
+}
+
+static void resize_peers(IVShmemState *s, int nb_peers)
+{
+    int old_nb_peers = s->nb_peers;
+    int i;
+
+    assert(nb_peers > old_nb_peers);
+    IVSHMEM_DPRINTF("bumping storage to %d peers\n", nb_peers);
+
+    s->peers = g_realloc(s->peers, nb_peers * sizeof(Peer));
+    s->nb_peers = nb_peers;
+
+    for (i = old_nb_peers; i < nb_peers; i++) {
+        s->peers[i].eventfds = NULL;
+        s->peers[i].nb_eventfds = 0;
+    }
+}
+
+static void ivshmem_add_kvm_msi_virq(IVShmemState *s, int vector, Error **errp)
+{
+    PCIDevice *pdev = PCI_DEVICE(s);
+
+    IVSHMEM_DPRINTF("ivshmem_add_kvm_msi_virq vector:%d\n", vector);
+    assert(!s->msi_vectors[vector].pdev);
+
+    s->msi_vectors[vector].unmasked = false;
+    s->msi_vectors[vector].pdev = pdev;
+}
+
+static void ivshmem_remove_kvm_msi_virq(IVShmemState *s, int vector)
+{
+    IVSHMEM_DPRINTF("ivshmem_remove_kvm_msi_virq vector:%d\n", vector);
+
+    if (s->msi_vectors[vector].pdev == NULL) {
+        return;
+    }
+
+    if (s->msi_vectors[vector].unmasked) {
+        ivshmem_irqfd_vector_mask(s->msi_vectors[vector].pdev, vector);
+    }
+
+    s->msi_vectors[vector].pdev = NULL;
+}
+
+static void process_msg_disconnect(IVShmemState *s, IvshmemPeerGone *msg,
+                                   Error **errp)
+{
+    if (msg->header.len < sizeof(*msg)) {
+        error_setg(errp, "Invalid peer-gone message size");
+        return;
+    }
+
+    le32_to_cpus(&msg->id);
+
+    IVSHMEM_DPRINTF("peer %d has gone away\n", msg->id);
+    if (msg->id >= s->nb_peers || msg->id == s->vm_id) {
+        error_setg(errp, "invalid peer %d", msg->id);
+        return;
+    }
+    close_peer_eventfds(s, msg->id);
+    event_notifier_set(&s->peers[s->vm_id].eventfds[0]);
+}
+
+static void process_msg_connect(IVShmemState *s, IvshmemEventFd *msg, int fd,
+                                Error **errp)
+{
+    Peer *peer;
+
+    if (msg->header.len < sizeof(*msg)) {
+        error_setg(errp, "Invalid eventfd message size");
+        close(fd);
+        return;
+    }
+
+    le32_to_cpus(&msg->id);
+    le32_to_cpus(&msg->vector);
+
+    if (msg->id >= s->nb_peers) {
+        resize_peers(s, msg->id + 1);
+    }
+
+    peer = &s->peers[msg->id];
+
+    /*
+     * The N-th connect message for this peer comes with the file
+     * descriptor for vector N-1.
+     */
+    if (msg->vector != peer->nb_eventfds) {
+        error_setg(errp, "Received vector %d out of order", msg->vector);
+        close(fd);
+        return;
+    }
+    if (peer->nb_eventfds >= s->vectors) {
+        error_setg(errp, "Too many eventfd received, device has %d vectors",
+                   s->vectors);
+        close(fd);
+        return;
+    }
+    peer->nb_eventfds++;
+
+    if (msg->vector == 0) {
+        peer->eventfds = g_new0(EventNotifier, s->vectors);
+    }
+
+    IVSHMEM_DPRINTF("eventfds[%d][%d] = %d\n", msg->id, msg->vector, fd);
+    event_notifier_init_fd(&peer->eventfds[msg->vector], fd);
+    fcntl_setfl(fd, O_NONBLOCK); /* msix/irqfd poll non block */
+
+    if (ivshmem_has_feature(s, IVSHMEM_IOEVENTFD)) {
+        ivshmem_add_eventfd(s, msg->id, msg->vector);
+    }
+
+    if (msg->id == s->vm_id) {
+        ivshmem_watch_vector_notifier(s, peer->nb_eventfds - 1);
+    }
+}
+
+static int ivshmem_can_receive(void *opaque)
+{
+    IVShmemState *s = opaque;
+
+    assert(s->msg_buffered_bytes < sizeof(s->msg_buf));
+    return sizeof(s->msg_buf) - s->msg_buffered_bytes;
+}
+
+static void ivshmem_read(void *opaque, const uint8_t *buf, int size)
+{
+    IVShmemState *s = opaque;
+    IvshmemMsgHeader *header = (IvshmemMsgHeader *)&s->msg_buf;
+    Error *err = NULL;
+    int fd;
+
+    assert(size >= 0 && s->msg_buffered_bytes + size <= sizeof(s->msg_buf));
+    memcpy(s->msg_buf + s->msg_buffered_bytes, buf, size);
+    s->msg_buffered_bytes += size;
+    if (s->msg_buffered_bytes < sizeof(*header) ||
+        s->msg_buffered_bytes < le32_to_cpu(header->len)) {
+        return;
+    }
+
+    fd = qemu_chr_fe_get_msgfd(&s->server_chr);
+
+    le32_to_cpus(&header->type);
+    le32_to_cpus(&header->len);
+
+    switch (header->type) {
+    case IVSHMEM_MSG_EVENT_FD:
+        process_msg_connect(s, (IvshmemEventFd *)header, fd, &err);
+        break;
+    case IVSHMEM_MSG_PEER_GONE:
+        process_msg_disconnect(s, (IvshmemPeerGone *)header, &err);
+        break;
+    default:
+        error_setg(&err, "invalid message, type %d", header->type);
+        break;
+    }
+    if (err) {
+        error_report_err(err);
+    }
+
+    s->msg_buffered_bytes -= header->len;
+    memmove(s->msg_buf, s->msg_buf + header->len, s->msg_buffered_bytes);
+}
+
+static void ivshmem_recv_setup(IVShmemState *s, Error **errp)
+{
+    IvshmemInitialInfo msg;
+    struct stat buf;
+    uint8_t dummy;
+    int fd, n, ret;
+
+    n = 0;
+    do {
+        ret = qemu_chr_fe_read_all(&s->server_chr, (uint8_t *)&msg + n,
+                                   sizeof(msg) - n);
+        if (ret < 0) {
+            if (ret == -EINTR) {
+                continue;
+            }
+            error_setg_errno(errp, -ret, "read from server failed");
+            return;
+        }
+        n += ret;
+    } while (n < sizeof(msg));
+
+    fd = qemu_chr_fe_get_msgfd(&s->server_chr);
+
+    le32_to_cpus(&msg.header.type);
+    le32_to_cpus(&msg.header.len);
+    if (msg.header.type != IVSHMEM_MSG_INIT || msg.header.len < sizeof(msg)) {
+        error_setg(errp, "server sent invalid initial info");
+        return;
+    }
+
+    /* consume additional bytes of message */
+    msg.header.len -= sizeof(msg);
+    while (msg.header.len > 0) {
+        ret = qemu_chr_fe_read_all(&s->server_chr, &dummy, 1);
+        if (ret < 0) {
+            if (ret == -EINTR) {
+                continue;
+            }
+            error_setg_errno(errp, -ret, "read from server failed");
+            return;
+        }
+        msg.header.len -= ret;
+    }
+
+    le32_to_cpus(&msg.compatible_version);
+    if (msg.compatible_version != IVSHMEM_PROTOCOL_VERSION) {
+        error_setg(errp, "server sent compatible version %u, expecting %u",
+                   msg.compatible_version, IVSHMEM_PROTOCOL_VERSION);
+        return;
+    }
+
+    le32_to_cpus(&msg.id);
+    if (msg.id > IVSHMEM_MAX_PEERS) {
+        error_setg(errp, "server sent invalid ID");
+        return;
+    }
+    s->vm_id = msg.id;
+
+    if (fstat(fd, &buf) < 0) {
+        error_setg_errno(errp, errno,
+            "can't determine size of shared memory sent by server");
+        close(fd);
+        return;
+    }
+
+    s->shmem_sz = buf.st_size;
+
+    s->shmem = mmap(NULL, s->shmem_sz, PROT_READ | PROT_WRITE, MAP_SHARED,
+                    fd, 0);
+    if (s->shmem == MAP_FAILED) {
+        error_setg_errno(errp, errno,
+                         "can't map shared memory sent by server");
+        return;
+    }
+
+    le32_to_cpus(&msg.vectors);
+    if (msg.vectors < 1 || msg.vectors > 1024) {
+        error_setg(errp, "server sent invalid number of vectors message");
+        return;
+    }
+    s->vectors = msg.vectors;
+
+    s->max_peers = le32_to_cpu(msg.max_peers);
+    s->protocol = le32_to_cpu(msg.protocol);
+    s->output_section_sz = le64_to_cpu(msg.output_section_size);
+}
+
+/*
+ * Select the MSI-X vectors used by device.
+ * ivshmem maps events to vectors statically, so
+ * we just enable all vectors on init and after reset.
+ */
+static void ivshmem_msix_vector_use(IVShmemState *s)
+{
+    PCIDevice *d = PCI_DEVICE(s);
+    int i;
+
+    for (i = 0; i < s->vectors; i++) {
+        msix_vector_use(d, i);
+    }
+}
+
+static void ivshmem_reset(DeviceState *d)
+{
+    IVShmemState *s = IVSHMEM(d);
+
+    ivshmem_disable_irqfd(s);
+
+    s->intctrl = 0;
+    ivshmem_write_state(s, 0);
+    if (ivshmem_has_feature(s, IVSHMEM_MSI)) {
+        ivshmem_msix_vector_use(s);
+    }
+}
+
+static int ivshmem_setup_interrupts(IVShmemState *s, Error **errp)
+{
+    /* allocate QEMU callback data for receiving interrupts */
+    s->msi_vectors = g_malloc0(s->vectors * sizeof(MSIVector));
+
+    if (ivshmem_has_feature(s, IVSHMEM_MSI)) {
+        if (msix_init_exclusive_bar(PCI_DEVICE(s), s->vectors, 1, errp)) {
+            IVSHMEM_DPRINTF("msix requested but not available - disabling\n");
+            s->features &= ~(IVSHMEM_MSI | IVSHMEM_IOEVENTFD);
+        } else {
+            IVSHMEM_DPRINTF("msix initialized (%d vectors)\n", s->vectors);
+            ivshmem_msix_vector_use(s);
+        }
+    }
+
+    return 0;
+}
+
+static void ivshmem_enable_irqfd(IVShmemState *s)
+{
+    PCIDevice *pdev = PCI_DEVICE(s);
+    int i;
+
+    for (i = 0; i < s->peers[s->vm_id].nb_eventfds; i++) {
+        Error *err = NULL;
+
+        ivshmem_unwatch_vector_notifier(s, i);
+
+        ivshmem_add_kvm_msi_virq(s, i, &err);
+        if (err) {
+            error_report_err(err);
+            goto undo;
+        }
+    }
+
+    if (msix_set_vector_notifiers(pdev,
+                                  ivshmem_irqfd_vector_unmask,
+                                  ivshmem_irqfd_vector_mask,
+                                  ivshmem_irqfd_vector_poll)) {
+        error_report("ivshmem: msix_set_vector_notifiers failed");
+        goto undo;
+    }
+    return;
+
+undo:
+    while (--i >= 0) {
+        ivshmem_remove_kvm_msi_virq(s, i);
+    }
+}
+
+static void ivshmem_disable_irqfd(IVShmemState *s)
+{
+    PCIDevice *pdev = PCI_DEVICE(s);
+    int i;
+
+    if (!pdev->msix_vector_use_notifier) {
+        return;
+    }
+
+    msix_unset_vector_notifiers(pdev);
+
+    for (i = 0; i < s->peers[s->vm_id].nb_eventfds; i++) {
+        ivshmem_remove_kvm_msi_virq(s, i);
+        ivshmem_watch_vector_notifier(s, i);
+    }
+}
+
+static void ivshmem_write_config(PCIDevice *pdev, uint32_t address,
+                                 uint32_t val, int len)
+{
+    IVShmemState *s = IVSHMEM(pdev);
+    bool was_usable = ivshmem_irqfd_usable(s);
+
+    pci_default_write_config(pdev, address, val, len);
+    ivshmem_update_irqfd(s, was_usable);
+}
+
+static void ivshmem_exit(PCIDevice *dev)
+{
+    IVShmemState *s = IVSHMEM(dev);
+    int i;
+
+    if (s->migration_blocker) {
+        migrate_del_blocker(s->migration_blocker);
+        error_free(s->migration_blocker);
+    }
+
+    if (memory_region_is_mapped(&s->rw_section)) {
+        void *addr = memory_region_get_ram_ptr(&s->rw_section);
+        int fd;
+
+        if (munmap(addr, memory_region_size(&s->rw_section)) == -1) {
+            error_report("Failed to munmap shared memory %s",
+                         strerror(errno));
+        }
+
+        fd = memory_region_get_fd(&s->rw_section);
+        close(fd);
+
+        vmstate_unregister_ram(&s->state_tab, DEVICE(dev));
+        vmstate_unregister_ram(&s->rw_section, DEVICE(dev));
+    }
+
+    if (s->peers) {
+        for (i = 0; i < s->nb_peers; i++) {
+            close_peer_eventfds(s, i);
+        }
+        g_free(s->peers);
+    }
+
+    if (ivshmem_has_feature(s, IVSHMEM_MSI)) {
+        msix_uninit_exclusive_bar(dev);
+    }
+
+    g_free(s->msi_vectors);
+}
+
+static int ivshmem_pre_load(void *opaque)
+{
+    IVShmemState *s = opaque;
+
+    if (!ivshmem_is_master(s)) {
+        error_report("'peer' devices are not migratable");
+        return -EINVAL;
+    }
+
+    return 0;
+}
+
+static int ivshmem_post_load(void *opaque, int version_id)
+{
+    IVShmemState *s = opaque;
+
+    if (ivshmem_has_feature(s, IVSHMEM_MSI)) {
+        ivshmem_msix_vector_use(s);
+    }
+    return 0;
+}
+
+static const VMStateDescription ivshmem_vmsd = {
+    .name = TYPE_IVSHMEM,
+    .version_id = 0,
+    .minimum_version_id = 0,
+    .pre_load = ivshmem_pre_load,
+    .post_load = ivshmem_post_load,
+    .fields = (VMStateField[]) {
+        VMSTATE_PCI_DEVICE(parent_obj, IVShmemState),
+        VMSTATE_MSIX(parent_obj, IVShmemState),
+        VMSTATE_UINT32(state, IVShmemState),
+        VMSTATE_UINT32(intctrl, IVShmemState),
+        VMSTATE_END_OF_LIST()
+    },
+};
+
+static Property ivshmem_properties[] = {
+    DEFINE_PROP_CHR("chardev", IVShmemState, server_chr),
+    DEFINE_PROP_BIT("ioeventfd", IVShmemState, features, IVSHMEM_IOEVENTFD,
+                    true),
+    DEFINE_PROP_ON_OFF_AUTO("master", IVShmemState, master, ON_OFF_AUTO_OFF),
+    DEFINE_PROP_END_OF_LIST(),
+};
+
+static void ivshmem_init(Object *obj)
+{
+    IVShmemState *s = IVSHMEM(obj);
+
+    s->features |= (1 << IVSHMEM_MSI);
+}
+
+static void ivshmem_realize(PCIDevice *dev, Error **errp)
+{
+    IVShmemState *s = IVSHMEM(dev);
+    Chardev *chr = qemu_chr_fe_get_driver(&s->server_chr);
+    size_t rw_section_sz, input_sections_sz;
+    IVShmemVndrCap *vndr_cap;
+    Error *err = NULL;
+    uint8_t *pci_conf;
+    int offset, priv_ctrl_pos;
+    off_t shmem_pos;
+
+    if (!qemu_chr_fe_backend_connected(&s->server_chr)) {
+        error_setg(errp, "You must specify a 'chardev'");
+        return;
+    }
+
+    /* IRQFD requires MSI */
+    if (ivshmem_has_feature(s, IVSHMEM_IOEVENTFD) &&
+        !ivshmem_has_feature(s, IVSHMEM_MSI)) {
+        error_setg(errp, "ioeventfd/irqfd requires MSI");
+        return;
+    }
+
+    pci_conf = dev->config;
+    pci_conf[PCI_COMMAND] = PCI_COMMAND_IO | PCI_COMMAND_MEMORY;
+
+    memory_region_init_io(&s->ivshmem_mmio, OBJECT(s), &ivshmem_mmio_ops, s,
+                          "ivshmem.mmio", IVSHMEM_REG_BAR_SIZE);
+
+    /* region for registers */
+    pci_register_bar(dev, 0, PCI_BASE_ADDRESS_SPACE_MEMORY,
+                     &s->ivshmem_mmio);
+
+    assert(chr);
+    IVSHMEM_DPRINTF("using shared memory server (socket = %s)\n",
+                    chr->filename);
+
+    /*
+     * Receive setup messages from server synchronously.
+     * Older versions did it asynchronously, but that creates a
+     * number of entertaining race conditions.
+     */
+    ivshmem_recv_setup(s, &err);
+    if (err) {
+        error_propagate(errp, err);
+        return;
+    }
+
+    /* we allocate enough space for 16 peers and grow as needed */
+    resize_peers(s, 16);
+
+    if (s->master == ON_OFF_AUTO_ON && s->vm_id != 0) {
+        error_setg(errp,
+                   "Master must connect to the server before any peers");
+        return;
+    }
+
+    qemu_chr_fe_set_handlers(&s->server_chr, ivshmem_can_receive,
+                             ivshmem_read, NULL, NULL, s, NULL, true);
+
+    if (ivshmem_setup_interrupts(s, errp) < 0) {
+        error_prepend(errp, "Failed to initialize interrupts: ");
+        return;
+    }
+
+    memory_region_init(&s->ivshmem_bar2, OBJECT(s), "ivshmem.bar2",
+                       s->shmem_sz);
+
+    input_sections_sz = s->output_section_sz * s->max_peers;
+    if (input_sections_sz + 4096 > s->shmem_sz) {
+        error_setg(errp,
+                   "Invalid output section size, shared memory too small");
+        return;
+    }
+    rw_section_sz = s->shmem_sz - input_sections_sz - 4096;
+
+    shmem_pos = 0;
+    memory_region_init_ram_ptr(&s->state_tab, OBJECT(s), "ivshmem.state",
+                               4096, s->shmem + shmem_pos);
+    memory_region_set_readonly(&s->state_tab, true);
+    memory_region_add_subregion(&s->ivshmem_bar2, shmem_pos, &s->state_tab);
+
+    vmstate_register_ram(&s->state_tab, DEVICE(s));
+
+    if (rw_section_sz > 0) {
+        shmem_pos += 4096;
+        memory_region_init_ram_ptr(&s->rw_section, OBJECT(s),
+                                   "ivshmem.rw-section",
+                                   rw_section_sz, s->shmem + shmem_pos);
+        memory_region_add_subregion(&s->ivshmem_bar2, shmem_pos,
+                                    &s->rw_section);
+
+        vmstate_register_ram(&s->rw_section, DEVICE(s));
+    }
+
+    if (s->output_section_sz > 0) {
+        shmem_pos += rw_section_sz;
+        memory_region_init_ram_ptr(&s->input_sections, OBJECT(s),
+                                   "ivshmem.input-sections", input_sections_sz,
+                                   s->shmem + shmem_pos);
+        memory_region_set_readonly(&s->input_sections, true);
+        memory_region_add_subregion(&s->ivshmem_bar2, shmem_pos,
+                                    &s->input_sections);
+
+        shmem_pos += s->vm_id * s->output_section_sz;
+        memory_region_init_ram_ptr(&s->output_section, OBJECT(s),
+                                   "ivshmem.output-section",
+                                   s->output_section_sz, s->shmem + shmem_pos);
+        memory_region_add_subregion_overlap(&s->ivshmem_bar2, shmem_pos,
+                                            &s->output_section, 1);
+
+        vmstate_register_ram(&s->input_sections, DEVICE(s));
+    }
+
+    pci_config_set_class(dev->config, 0xff00 | (s->protocol >> 8));
+    pci_config_set_prog_interface(dev->config, (uint8_t)s->protocol);
+
+    offset = pci_add_capability(dev, PCI_CAP_ID_VNDR, 0, 0x18,
+                                &error_abort);
+    vndr_cap = (IVShmemVndrCap *)(pci_conf + offset);
+    vndr_cap->length = 0x18;
+    vndr_cap->state_tab_sz = cpu_to_le32(4096);
+    vndr_cap->rw_section_sz = cpu_to_le64(rw_section_sz);
+    vndr_cap->output_section_sz = cpu_to_le64(s->output_section_sz);
+
+    priv_ctrl_pos = offset + offsetof(IVShmemVndrCap, priv_ctrl);
+    s->priv_ctrl = &dev->config[priv_ctrl_pos];
+    dev->wmask[priv_ctrl_pos] |= IVSHMEM_ONESHOT_MODE;
+
+    if (s->master == ON_OFF_AUTO_AUTO) {
+        s->master = s->vm_id == 0 ? ON_OFF_AUTO_ON : ON_OFF_AUTO_OFF;
+    }
+
+    if (!ivshmem_is_master(s)) {
+        error_setg(&s->migration_blocker,
+                   "Migration is disabled when using feature 'peer mode' in device 'ivshmem'");
+        migrate_add_blocker(s->migration_blocker, &err);
+        if (err) {
+            error_propagate(errp, err);
+            error_free(s->migration_blocker);
+            return;
+        }
+    }
+
+    pci_register_bar(PCI_DEVICE(s), 2,
+                     PCI_BASE_ADDRESS_SPACE_MEMORY |
+                     PCI_BASE_ADDRESS_MEM_PREFETCH |
+                     PCI_BASE_ADDRESS_MEM_TYPE_64,
+                     &s->ivshmem_bar2);
+}
+
+static void ivshmem_class_init(ObjectClass *klass, void *data)
+{
+    DeviceClass *dc = DEVICE_CLASS(klass);
+    PCIDeviceClass *k = PCI_DEVICE_CLASS(klass);
+
+    k->realize = ivshmem_realize;
+    k->exit = ivshmem_exit;
+    k->config_write = ivshmem_write_config;
+    k->vendor_id = PCI_VENDOR_ID_IVSHMEM;
+    k->device_id = PCI_DEVICE_ID_IVSHMEM;
+    dc->reset = ivshmem_reset;
+    set_bit(DEVICE_CATEGORY_MISC, dc->categories);
+    dc->desc = "Inter-VM shared memory v2";
+
+    dc->props = ivshmem_properties;
+    dc->vmsd = &ivshmem_vmsd;
+}
+
+static const TypeInfo ivshmem_info = {
+    .name          = TYPE_IVSHMEM,
+    .parent        = TYPE_PCI_DEVICE,
+    .instance_size = sizeof(IVShmemState),
+    .instance_init = ivshmem_init,
+    .class_init    = ivshmem_class_init,
+    .interfaces = (InterfaceInfo[]) {
+        { INTERFACE_CONVENTIONAL_PCI_DEVICE },
+        { },
+    },
+};
+
+static void ivshmem_register_types(void)
+{
+    type_register_static(&ivshmem_info);
+}
+
+type_init(ivshmem_register_types)
diff --git a/include/hw/misc/ivshmem2.h b/include/hw/misc/ivshmem2.h
new file mode 100644
index 0000000000..ee994699e8
--- /dev/null
+++ b/include/hw/misc/ivshmem2.h
@@ -0,0 +1,48 @@
+/*
+ * Inter-VM Shared Memory PCI device, Version 2.
+ *
+ * Copyright (c) Siemens AG, 2019
+ *
+ * Authors:
+ *  Jan Kiszka <jan.kiszka@siemens.com>
+ *
+ * This work is licensed under the terms of the GNU GPL, version 2 or
+ * (at your option) any later version.
+ */
+#ifndef IVSHMEM2_H
+#define IVSHMEM2_H
+
+#define IVSHMEM_PROTOCOL_VERSION    2
+
+#define IVSHMEM_MSG_INIT            0
+#define IVSHMEM_MSG_EVENT_FD        1
+#define IVSHMEM_MSG_PEER_GONE       2
+
+typedef struct IvshmemMsgHeader {
+    uint32_t type;
+    uint32_t len;
+} IvshmemMsgHeader;
+
+typedef struct IvshmemInitialInfo {
+    IvshmemMsgHeader header;
+    uint32_t version;
+    uint32_t compatible_version;
+    uint32_t id;
+    uint32_t max_peers;
+    uint32_t vectors;
+    uint32_t protocol;
+    uint64_t output_section_size;
+} IvshmemInitialInfo;
+
+typedef struct IvshmemEventFd {
+    IvshmemMsgHeader header;
+    uint32_t id;
+    uint32_t vector;
+} IvshmemEventFd;
+
+typedef struct IvshmemPeerGone {
+    IvshmemMsgHeader header;
+    uint32_t id;
+} IvshmemPeerGone;
+
+#endif /* IVSHMEM2_H */
diff --git a/include/hw/pci/pci_ids.h b/include/hw/pci/pci_ids.h
index 11f8ab7149..f0c5d5ed2b 100644
--- a/include/hw/pci/pci_ids.h
+++ b/include/hw/pci/pci_ids.h
@@ -264,6 +264,8 @@
 #define PCI_VENDOR_ID_NEC                0x1033
 #define PCI_DEVICE_ID_NEC_UPD720200      0x0194
 
+#define PCI_VENDOR_ID_SIEMENS            0x110a
+
 #define PCI_VENDOR_ID_TEWS               0x1498
 #define PCI_DEVICE_ID_TEWS_TPCI200       0x30C8
 
-- 
2.16.4




* [RFC][PATCH v2 2/3] docs/specs: Add specification of ivshmem device revision 2
From: Jan Kiszka @ 2020-01-07 14:36 UTC
  To: qemu-devel
  Cc: liang yan, Jailhouse, Claudio Fontana, Michael S. Tsirkin,
	Markus Armbruster, Hannes Reinecke, Stefan Hajnoczi

From: Jan Kiszka <jan.kiszka@siemens.com>

This imports the ivshmem v2 specification draft from Jailhouse, where
the implementation is about to be merged now. The final home of the
spec is still to be decided; importing it here shall just simplify the
review process at this stage.

Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
---
 docs/specs/ivshmem-2-device-spec.md | 376 ++++++++++++++++++++++++++++++++++++
 1 file changed, 376 insertions(+)
 create mode 100644 docs/specs/ivshmem-2-device-spec.md

diff --git a/docs/specs/ivshmem-2-device-spec.md b/docs/specs/ivshmem-2-device-spec.md
new file mode 100644
index 0000000000..d93cb22b04
--- /dev/null
+++ b/docs/specs/ivshmem-2-device-spec.md
@@ -0,0 +1,376 @@
+IVSHMEM Device Specification
+============================
+
+**NOTE: THIS IS WORK-IN-PROGRESS, NOT YET A STABLE INTERFACE SPECIFICATION!**
+
+The goal of the Inter-VM Shared Memory (IVSHMEM) device model is to
+define the minimal set of building blocks a hypervisor has to provide
+for enabling guest-to-guest communication. The details of
+communication protocols shall remain opaque to the hypervisor so that
+guests are free to define them as simple or as sophisticated as they
+need.
+
+For that purpose, the IVSHMEM device provides the following features
+to its users:
+
+- Interconnection between up to 65536 peers
+
+- Multi-purpose shared memory region
+
+    - common read/writable section
+
+    - output sections that are read/writable for one peer and only
+      readable for the others
+
+    - section with peer states
+
+- Event signaling via interrupt to remote sides
+
+- Support for life-cycle management via state value exchange and
+  interrupt notification on changes, backed by a shared memory
+  section
+
+- Free choice of protocol to be used on top
+
+- Protocol type declaration
+
+- Registers can be implemented either memory-mapped or via I/O,
+  depending on platform support and the respective VM-exit costs
+
+- Unprivileged access to memory-mapped or I/O registers feasible
+
+- Single discovery and configuration via standard PCI, no additional
+  complexity from defining a separate platform device model
+
+
+Hypervisor Model
+----------------
+
+In order to provide a consistent link between peers, all connected
+instances of IVSHMEM devices need to be configured, created and run
+by the hypervisor according to the following requirements:
+
+- The instances of the device shall appear as a PCI device to their
+  users.
+
+- The read/write shared memory section has to be of the same size for
+  all peers. The size can be zero.
+
+- If shared memory output sections are present (non-zero section
+  size), there must be one reserved for each peer with exclusive
+  write access. All output sections must have the same size and must
+  be readable for all peers.
+
+- The State Table must have the same size for all peers, must be
+  large enough to hold the state values of all peers, and must be
+  read-only for the user.
+
+- State register changes (explicit writes, peer resets) have to be
+  propagated to the other peers by updating the corresponding State
+  Table entry and issuing an interrupt to all other peers if they
+  enabled reception.
+
+- Interrupt events triggered by a peer have to be delivered to the
+  target peer, provided the receiving side is valid and has enabled
+  reception.
+
+- All peers must have the same interrupt delivery features available,
+  i.e. MSI-X with the same maximum number of vectors on platforms
+  supporting this mechanism, otherwise INTx with one vector.
+
+
+Guest-side Programming Model
+----------------------------
+
+An IVSHMEM device appears as a PCI device to its users. Unless
+otherwise noted, it conforms to the PCI Local Bus Specification,
+Revision 3.0. As such, it is discoverable via the PCI configuration
+space and provides a number of standard and custom PCI configuration
+registers.
+
+### Shared Memory Region Layout
+
+The shared memory region is divided into several sections.
+
+    +-----------------------------+   -
+    |                             |   :
+    | Output Section for peer n-1 |   : Output Section Size
+    |     (n = Maximum Peers)     |   :
+    +-----------------------------+   -
+    :                             :
+    :                             :
+    :                             :
+    +-----------------------------+   -
+    |                             |   :
+    |  Output Section for peer 1  |   : Output Section Size
+    |                             |   :
+    +-----------------------------+   -
+    |                             |   :
+    |  Output Section for peer 0  |   : Output Section Size
+    |                             |   :
+    +-----------------------------+   -
+    |                             |   :
+    |     Read/Write Section      |   : R/W Section Size
+    |                             |   :
+    +-----------------------------+   -
+    |                             |   :
+    |         State Table         |   : State Table Size
+    |                             |   :
+    +-----------------------------+   <-- Shared memory base address
+
+The first section consists of the mandatory State Table. Its size is
+defined by the State Table Size register and cannot be zero. This
+section is read-only for all peers.
+
+The second section consists of shared memory that is read/writable
+for all peers. Its size is defined by the R/W Section Size register.
+A size of zero is permitted.
+
+The third and following sections are output sections, one for each
+peer. Their sizes are all identical. The size of a single output
+section is defined by the Output Section Size register. An output
+section is read/writable for the corresponding peer and read-only for
+all other peers. E.g., only the peer with ID 3 can write to the
+fourth output section, but all peers can read from this section.
+
+All sizes have to be rounded up to multiples of a mappable page in
+order to allow access control according to the section restrictions.
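+
+For illustration, a peer can derive the section offsets from the
+shared memory base directly from the three Section Size registers
+described above (a sketch; the variable names are illustrative):
+
+    /* sizes as reported by the Section Size registers */
+    uint64_t rw_offs     = state_table_size;
+    uint64_t output_offs = rw_offs + rw_section_size;
+    /* start of the output section owned by the peer with ID n */
+    uint64_t peer_offs   = output_offs + n * output_section_size;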
+
+### Configuration Space Registers
+
+#### Header Registers
+
+| Offset | Register               | Content                                              |
+|-------:|:-----------------------|:-----------------------------------------------------|
+|    00h | Vendor ID              | 110Ah                                                |
+|    02h | Device ID              | 4106h                                                |
+|    04h | Command Register       | 0000h on reset, writable bits are:                   |
+|        |                        | Bit 0: I/O Space (if Register Region uses I/O)       |
+|        |                        | Bit 1: Memory Space (if Register Region uses Memory) |
+|        |                        | Bit 3: Bus Master                                    |
+|        |                        | Bit 10: INTx interrupt disable                       |
+|        |                        | Writes to other bits are ignored                     |
+|    06h | Status Register        | 0010h, static value                                  |
+|        |                        | In deviation to the PCI specification, the Interrupt |
+|        |                        | Status (bit 3) is never set                          |
+|    08h | Revision ID            | 00h                                                  |
+|    09h | Class Code, Interface  | Protocol Type bits 0-7, see [Protocols](#Protocols)  |
+|    0Ah | Class Code, Sub-Class  | Protocol Type bits 8-15, see [Protocols](#Protocols) |
+|    0Bh | Class Code, Base Class | FFh                                                  |
+|    0Eh | Header Type            | 00h                                                  |
+|    10h | BAR 0                  | MMIO or I/O register region                          |
+|    14h | BAR 1                  | MSI-X region                                         |
+|    18h | BAR 2 (with BAR 3)     | optional: 64-bit shared memory region                |
+|    2Ch | Subsystem Vendor ID    | same as Vendor ID, or provider-specific value        |
+|    2Eh | Subsystem ID           | same as Device ID, or provider-specific value        |
+|    34h | Capability Pointer     | First capability                                     |
+|    3Eh | Interrupt Pin          | 01h-04h, must be 00h if MSI-X is available           |
+
+The INTx status bit is never set by an implementation. Users of the
+IVSHMEM device are instead expected to derive the event state from
+protocol-specific information kept in the shared memory. This
+approach is significantly faster, and the complexity of
+register-based status tracking can be avoided.
+
+If BAR 2 is not present, the shared memory region is not relocatable
+by the user. In that case, the hypervisor has to implement the Base
+Address register in the vendor-specific capability.
+
+Subsystem IDs shall encode the provider (hypervisor) in order to
+allow identifying potential deviating implementations in case this
+should ever be required.
+
+If the platform supports MSI-X, an implementation of the IVSHMEM
+device must provide this interrupt model and must not expose INTx
+support.
+
+Other header registers may not be implemented. If not implemented,
+they return 0 on read and ignore write accesses.
+
+#### Vendor Specific Capability (ID 09h)
+
+This capability must always be present.
+
+| Offset | Register            | Content                                        |
+|-------:|:--------------------|:-----------------------------------------------|
+|    00h | ID                  | 09h                                            |
+|    01h | Next Capability     | Pointer to next capability or 00h              |
+|    02h | Length              | 20h if Base Address is present, 18h otherwise  |
+|    03h | Privileged Control  | Bit 0 (read/write): one-shot interrupt mode    |
+|        |                     | Bits 1-7: Reserved (0 on read, writes ignored) |
+|    04h | State Table Size    | 32-bit size of read-only State Table           |
+|    08h | R/W Section Size    | 64-bit size of common read/write section       |
+|    10h | Output Section Size | 64-bit size of output sections                 |
+|    18h | Base Address        | optional: 64-bit base address of shared memory |
+
+All registers are read-only. Writes are ignored, except to bit 0 of
+the Privileged Control register.
+
+When bit 0 in the Privileged Control register is set to 1, the device
+clears bit 0 in the Interrupt Control register on each interrupt
+delivery. This enables automatic interrupt throttling in setups where
+re-enabling is performed by a scheduled, unprivileged instance on the
+user side.
+
+An IVSHMEM device may not support a relocatable shared memory region.
+This supports the hypervisor in locking down the guest-to-host address
+mapping and simplifies the runtime logic. In such a case, BAR 2 must
+not be implemented by the hypervisor. Instead, the Base Address
+register has to be implemented to report the location of the shared
+memory region in the user's address space.
+
+A non-existing shared memory section has to report zero in its
+Section Size register.
+
+#### MSI-X Capability (ID 11h)
+
+On platforms supporting MSI-X, IVSHMEM has to provide interrupt
+delivery via this mechanism. In that case, the MSI-X capability is
+present while the legacy INTx delivery mechanism is not available,
+and the Interrupt Pin configuration register returns 0.
+
+The IVSHMEM device has no notion of pending interrupts. Therefore,
+reading from the MSI-X Pending Bit Array will always return 0. Users
+of the IVSHMEM device are instead expected to derive the event state
+from protocol-specific information kept in the shared memory. This
+approach is significantly faster, and the complexity of
+register-based status tracking can be avoided.
+
+The corresponding MSI-X MMIO region is configured via BAR 1.
+
+The MSI-X table size reported by the MSI-X capability structure is
+identical for all peers.
+
+### Register Region
+
+The register region may be implemented as MMIO or I/O.
+
+When implementing it as MMIO, the hypervisor has to ensure that the
+register region can be mapped as a single page into the address space
+of the user, without causing potential overlaps with other resources.
+Write accesses to MMIO region offsets that are not backed by
+registers have to be ignored; read accesses have to return 0. This
+enables the user to hand out the complete region, along with the
+shared memory, to an unprivileged instance.
+
+The region location in the user's physical address space is
+configured via BAR 0. The following table visualizes the region
+layout:
+
+| Offset | Register                                                            |
+|-------:|:--------------------------------------------------------------------|
+|    00h | ID                                                                  |
+|    04h | Maximum Peers                                                       |
+|    08h | Interrupt Control                                                   |
+|    0Ch | Doorbell                                                            |
+|    10h | State                                                               |
+
+All registers support only aligned 32-bit accesses.
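+
+For illustration, a guest driver might describe this layout as a
+simple C structure of 32-bit fields (a sketch only; the names are
+illustrative, not part of this specification):
+
+    #include <stdint.h>
+
+    struct ivshmem_regs {
+        uint32_t id;         /* 00h: own peer ID (read-only) */
+        uint32_t max_peers;  /* 04h: maximum number of peers (read-only) */
+        uint32_t int_ctrl;   /* 08h: bit 0 enables interrupt generation */
+        uint32_t doorbell;   /* 0Ch: write (Target ID << 16) | vector */
+        uint32_t state;      /* 10h: own state value */
+    };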
+
+#### ID Register (Offset 00h)
+
+Read-only register that reports the ID of the local device. It is
+unique among all connected devices and remains unchanged over their
+lifetime.
+
+#### Maximum Peers Register (Offset 04h)
+
+Read-only register that reports the maximum number of possible peers
+(including the local one). The permitted range is between 2 and 65536
+and remains constant over the lifetime of all peers.
+
+#### Interrupt Control Register (Offset 08h)
+
+This read/write register controls the generation of interrupts
+whenever a peer writes to the Doorbell register or changes its state.
+
+| Bits | Content                                                               |
+|-----:|:----------------------------------------------------------------------|
+|    0 | 1: Enable interrupt generation                                        |
+| 1-31 | Reserved (0 on read, writes ignored)                                  |
+
+Note that bit 0 is reset to 0 on interrupt delivery if one-shot
+interrupt mode is enabled in the Privileged Control register.
+
+The value of this register after device reset is 0.
+
+#### Doorbell Register (Offset 0Ch)
+
+Write-only register that triggers an interrupt vector in the target
+device if it is enabled there.
+
+| Bits  | Content                                                              |
+|------:|:---------------------------------------------------------------------|
+|  0-15 | Vector number                                                        |
+| 16-31 | Target ID                                                            |
+
+Writing a vector number that is not enabled by the target has no
+effect. The peers can derive the number of available vectors from
+their own device capabilities because the provider is required to
+expose an identical number of vectors to all connected peers. The
+peers are expected to define or negotiate the vectors used via the
+selected protocol.
+
+Addressing a non-existing or inactive target has no effect. Peers can
+identify active targets via the State Table.
+
+The behavior on reading from this register is undefined.
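+
+As an example, signaling vector 1 of the peer with ID 3 is a single
+32-bit write (a sketch, assuming `regs` is a volatile mapping of the
+register region as outlined above):
+
+    regs->doorbell = (3 << 16) | 1;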
+
+#### State Register (Offset 10h)
+
+Read/write register that defines the state of the local device.
+Writing to this register sets the state and triggers MSI-X vector 0
+or the INTx interrupt, respectively, on the remote device if the
+written state value differs from the previous one. Users of peer
+devices can read the value written to this register from the State
+Table. They are expected to differentiate state change interrupts from
+doorbell events by comparing the new state value with a locally
+stored copy.
+
+The value of this register after device reset is 0. The semantics of
+all other values can be defined freely by the chosen protocol.
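+
+As an example, a protocol could announce readiness by writing a
+non-zero state (the value 1 is purely illustrative, its meaning being
+protocol-defined):
+
+    /* publish new local state; peers with interrupt generation
+       enabled will receive MSI-X vector 0 or INTx */
+    mmio_write32(regs + IVSHMEM_REG_STATE, 1);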
+
+### State Table
+
+The State Table is a read-only section at the beginning of the shared
+memory region. It contains a 32-bit state value for each of the
+peers. Locating the table in shared memory allows fast checking of
+remote states without register accesses.
+
+The table is updated on each state change of a peer. Whenever a user
+of an IVSHMEM device writes a value to the Local State register, this
+value is copied into the corresponding entry of the State Table. When
+an IVSHMEM device is reset or disconnected from the other peers, zero
+is written into the corresponding table entry. The initial content of
+the table is all zeros.
+
+    +--------------------------------+
+    | 32-bit state value of peer n-1 |
+    +--------------------------------+
+    :                                :
+    +--------------------------------+
+    | 32-bit state value of peer 1   |
+    +--------------------------------+
+    | 32-bit state value of peer 0   |
+    +--------------------------------+ <-- Shared memory base address
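+
+Reading a peer's state and telling state changes apart from doorbell
+events can then be sketched as follows, assuming the shared memory is
+mapped at shmem and the last seen values are kept in a local
+cached_state[] array:
+
+    volatile uint32_t *state_table = (volatile uint32_t *)shmem;
+    uint32_t state = state_table[peer_id];
+
+    if (state != cached_state[peer_id]) {
+        cached_state[peer_id] = state;
+        /* handle peer state change */
+    } else {
+        /* handle doorbell event */
+    }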
+
+
+Protocols
+---------
+
+The IVSHMEM device shall assist the peers of a connection in
+agreeing on the protocol used over the shared memory. For that
+purpose, the interface byte (offset 09h) and the sub-class byte
+(offset 0Ah) of the Class Code register encode a 16-bit protocol
+type for the users. The following type values are defined:
+
+| Protocol Type | Description                                                  |
+|--------------:|:-------------------------------------------------------------|
+|         0000h | Undefined type                                               |
+|         0001h | Virtual peer-to-peer Ethernet                                |
+|   0002h-3FFFh | Reserved                                                     |
+|   4000h-7FFFh | User-defined protocols                                       |
+|   8000h-BFFFh | Virtio over Shared Memory, front-end peer                    |
+|   C000h-FFFFh | Virtio over Shared Memory, back-end peer                     |
+
+Details of the protocols are not in the scope of this specification.
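+
+For illustration, a guest driver can recover the protocol type from
+the PCI configuration space as sketched below (pci_read_config8() is
+a hypothetical accessor):
+
+    uint8_t iface     = pci_read_config8(dev, 0x09);
+    uint8_t subclass  = pci_read_config8(dev, 0x0a);
+    uint16_t protocol = ((uint16_t)subclass << 8) | iface;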
-- 
2.16.4



^ permalink raw reply related	[flat|nested] 14+ messages in thread

* [RFC][PATCH v2 3/3] contrib: Add server for ivshmem revision 2
  2020-01-07 14:36 [RFC][PATCH v2 0/3] IVSHMEM version 2 device for QEMU Jan Kiszka
  2020-01-07 14:36 ` [RFC][PATCH v2 1/3] hw/misc: Add implementation of ivshmem revision 2 device Jan Kiszka
  2020-01-07 14:36 ` [RFC][PATCH v2 2/3] docs/specs: Add specification of ivshmem device revision 2 Jan Kiszka
@ 2020-01-07 14:36 ` Jan Kiszka
  2020-04-29  4:22   ` Liang Yan
  2020-04-09 13:52 ` [RFC][PATCH v2 0/3] IVSHMEM version 2 device for QEMU Liang Yan
  2020-04-29  4:15 ` Liang Yan
  4 siblings, 1 reply; 14+ messages in thread
From: Jan Kiszka @ 2020-01-07 14:36 UTC (permalink / raw)
  To: qemu-devel
  Cc: liang yan, Jailhouse, Claudio Fontana, Michael S . Tsirkin,
	Markus Armbruster, Hannes Reinecke, Stefan Hajnoczi

From: Jan Kiszka <jan.kiszka@siemens.com>

This implements the server process for ivshmem v2 device models of QEMU.
Again, no effort has been spent yet on sharing code with the v1 server.
Parts have been copied, others were rewritten.

In addition to the parameters of v1, this server now also specifies

 - the maximum number of peers to be connected (required to know in
   advance because of v2's state table)
 - the size of the output sections (can be 0)
 - the protocol ID to be published to all peers

When a virtio protocol ID is chosen, only 2 peers can be connected.
Furthermore, the server will signal the backend variant of the ID to the
master instance and the frontend ID to the slave peer.

To start, e.g., a server that allows virtio console over ivshmem, call

ivshmem2-server -F -l 64K -n 2 -V 3 -P 0x8003
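
For virtio block, only the protocol has to be adjusted (a larger
shared memory size is advisable but not required), e.g.:

ivshmem2-server -F -l 1M -n 2 -V 2 -P 0x8002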

TODO: specify the new server protocol.

Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
---
 Makefile                                  |   3 +
 Makefile.objs                             |   1 +
 configure                                 |   1 +
 contrib/ivshmem2-server/Makefile.objs     |   1 +
 contrib/ivshmem2-server/ivshmem2-server.c | 462 ++++++++++++++++++++++++++++++
 contrib/ivshmem2-server/ivshmem2-server.h | 158 ++++++++++
 contrib/ivshmem2-server/main.c            | 313 ++++++++++++++++++++
 7 files changed, 939 insertions(+)
 create mode 100644 contrib/ivshmem2-server/Makefile.objs
 create mode 100644 contrib/ivshmem2-server/ivshmem2-server.c
 create mode 100644 contrib/ivshmem2-server/ivshmem2-server.h
 create mode 100644 contrib/ivshmem2-server/main.c

diff --git a/Makefile b/Makefile
index 6b5ad1121b..33bb0eefdb 100644
--- a/Makefile
+++ b/Makefile
@@ -427,6 +427,7 @@ dummy := $(call unnest-vars,, \
                 elf2dmp-obj-y \
                 ivshmem-client-obj-y \
                 ivshmem-server-obj-y \
+                ivshmem2-server-obj-y \
                 rdmacm-mux-obj-y \
                 libvhost-user-obj-y \
                 vhost-user-scsi-obj-y \
@@ -655,6 +656,8 @@ ivshmem-client$(EXESUF): $(ivshmem-client-obj-y) $(COMMON_LDADDS)
 	$(call LINK, $^)
 ivshmem-server$(EXESUF): $(ivshmem-server-obj-y) $(COMMON_LDADDS)
 	$(call LINK, $^)
+ivshmem2-server$(EXESUF): $(ivshmem2-server-obj-y) $(COMMON_LDADDS)
+	$(call LINK, $^)
 endif
 vhost-user-scsi$(EXESUF): $(vhost-user-scsi-obj-y) libvhost-user.a
 	$(call LINK, $^)
diff --git a/Makefile.objs b/Makefile.objs
index 02bf5ce11d..ce243975ef 100644
--- a/Makefile.objs
+++ b/Makefile.objs
@@ -115,6 +115,7 @@ qga-vss-dll-obj-y = qga/
 elf2dmp-obj-y = contrib/elf2dmp/
 ivshmem-client-obj-$(CONFIG_IVSHMEM) = contrib/ivshmem-client/
 ivshmem-server-obj-$(CONFIG_IVSHMEM) = contrib/ivshmem-server/
+ivshmem2-server-obj-$(CONFIG_IVSHMEM) = contrib/ivshmem2-server/
 libvhost-user-obj-y = contrib/libvhost-user/
 vhost-user-scsi.o-cflags := $(LIBISCSI_CFLAGS)
 vhost-user-scsi.o-libs := $(LIBISCSI_LIBS)
diff --git a/configure b/configure
index 747d3b4120..1cb1427f1b 100755
--- a/configure
+++ b/configure
@@ -6165,6 +6165,7 @@ if test "$want_tools" = "yes" ; then
   fi
   if [ "$ivshmem" = "yes" ]; then
     tools="ivshmem-client\$(EXESUF) ivshmem-server\$(EXESUF) $tools"
+    tools="ivshmem2-server\$(EXESUF) $tools"
   fi
   if [ "$curl" = "yes" ]; then
       tools="elf2dmp\$(EXESUF) $tools"
diff --git a/contrib/ivshmem2-server/Makefile.objs b/contrib/ivshmem2-server/Makefile.objs
new file mode 100644
index 0000000000..d233e18ec8
--- /dev/null
+++ b/contrib/ivshmem2-server/Makefile.objs
@@ -0,0 +1 @@
+ivshmem2-server-obj-y = ivshmem2-server.o main.o
diff --git a/contrib/ivshmem2-server/ivshmem2-server.c b/contrib/ivshmem2-server/ivshmem2-server.c
new file mode 100644
index 0000000000..b341f1fcd0
--- /dev/null
+++ b/contrib/ivshmem2-server/ivshmem2-server.c
@@ -0,0 +1,462 @@
+/*
+ * Copyright 6WIND S.A., 2014
+ * Copyright (c) Siemens AG, 2019
+ *
+ * This work is licensed under the terms of the GNU GPL, version 2 or
+ * (at your option) any later version.  See the COPYING file in the
+ * top-level directory.
+ */
+#include "qemu/osdep.h"
+#include "qemu/host-utils.h"
+#include "qemu/sockets.h"
+#include "qemu/atomic.h"
+
+#include <sys/socket.h>
+#include <sys/un.h>
+
+#include "ivshmem2-server.h"
+
+/* log a message on stdout if verbose=1 */
+#define IVSHMEM_SERVER_DEBUG(server, fmt, ...) do { \
+        if ((server)->args.verbose) {         \
+            printf(fmt, ## __VA_ARGS__); \
+        }                                \
+    } while (0)
+
+/** maximum size of a huge page, used by ivshmem_server_ftruncate() */
+#define IVSHMEM_SERVER_MAX_HUGEPAGE_SIZE (1024 * 1024 * 1024)
+
+/** default listen backlog (number of sockets not accepted) */
+#define IVSHMEM_SERVER_LISTEN_BACKLOG 10
+
+/* send message to a client unix socket */
+static int ivshmem_server_send_msg(int sock_fd, void *payload, int len, int fd)
+{
+    int ret;
+    struct msghdr msg;
+    struct iovec iov[1];
+    union {
+        struct cmsghdr cmsg;
+        char control[CMSG_SPACE(sizeof(int))];
+    } msg_control;
+    struct cmsghdr *cmsg;
+
+    iov[0].iov_base = payload;
+    iov[0].iov_len = len;
+
+    memset(&msg, 0, sizeof(msg));
+    msg.msg_iov = iov;
+    msg.msg_iovlen = 1;
+
+    /* if fd is specified, add it in a cmsg */
+    if (fd >= 0) {
+        memset(&msg_control, 0, sizeof(msg_control));
+        msg.msg_control = &msg_control;
+        msg.msg_controllen = sizeof(msg_control);
+        cmsg = CMSG_FIRSTHDR(&msg);
+        cmsg->cmsg_level = SOL_SOCKET;
+        cmsg->cmsg_type = SCM_RIGHTS;
+        cmsg->cmsg_len = CMSG_LEN(sizeof(int));
+        memcpy(CMSG_DATA(cmsg), &fd, sizeof(fd));
+    }
+
+    ret = sendmsg(sock_fd, &msg, 0);
+    if (ret <= 0) {
+        return -1;
+    }
+
+    return 0;
+}
+
+static int ivshmem_server_send_event_fd(int sock_fd, int peer_id, int vector,
+                                        int fd)
+{
+    IvshmemEventFd msg = {
+        .header = {
+            .type = GUINT32_TO_LE(IVSHMEM_MSG_EVENT_FD),
+            .len = GUINT32_TO_LE(sizeof(msg)),
+        },
+        .id = GUINT32_TO_LE(peer_id),
+        .vector = GUINT32_TO_LE(vector),
+    };
+
+    return ivshmem_server_send_msg(sock_fd, &msg, sizeof(msg), fd);
+}
+
+/* free a peer when the server advertises a disconnection or when the
+ * server is freed */
+static void
+ivshmem_server_free_peer(IvshmemServer *server, IvshmemServerPeer *peer)
+{
+    unsigned vector;
+    IvshmemServerPeer *other_peer;
+    IvshmemPeerGone msg = {
+        .header = {
+            .type = GUINT32_TO_LE(IVSHMEM_MSG_PEER_GONE),
+            .len = GUINT32_TO_LE(sizeof(msg)),
+        },
+        .id = GUINT32_TO_LE(peer->id),
+    };
+
+    IVSHMEM_SERVER_DEBUG(server, "free peer %" PRId64 "\n", peer->id);
+    close(peer->sock_fd);
+    QTAILQ_REMOVE(&server->peer_list, peer, next);
+
+    server->state_table[peer->id] = 0;
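+    /* make the cleared state visible before peers are notified */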
+    smp_mb();
+
+    /* advertise the deletion to other peers */
+    QTAILQ_FOREACH(other_peer, &server->peer_list, next) {
+        ivshmem_server_send_msg(other_peer->sock_fd, &msg, sizeof(msg), -1);
+    }
+
+    for (vector = 0; vector < peer->vectors_count; vector++) {
+        event_notifier_cleanup(&peer->vectors[vector]);
+    }
+
+    g_free(peer);
+}
+
+/* send the peer id and the shm_fd just after a new client connection */
+static int
+ivshmem_server_send_initial_info(IvshmemServer *server, IvshmemServerPeer *peer)
+{
+    IvshmemInitialInfo msg = {
+        .header = {
+            .type = GUINT32_TO_LE(IVSHMEM_MSG_INIT),
+            .len = GUINT32_TO_LE(sizeof(msg)),
+        },
+        .version = GUINT32_TO_LE(IVSHMEM_PROTOCOL_VERSION),
+        .compatible_version = GUINT32_TO_LE(IVSHMEM_PROTOCOL_VERSION),
+        .id = GUINT32_TO_LE(peer->id),
+        .max_peers = GUINT32_TO_LE(server->args.max_peers),
+        .vectors = GUINT32_TO_LE(server->args.vectors),
+        .protocol = GUINT32_TO_LE(server->args.protocol),
+        .output_section_size = GUINT64_TO_LE(server->args.output_section_size),
+    };
+    unsigned virtio_protocol;
+    int ret;
+
+    if (server->args.protocol >= 0x8000) {
+        /*
+         * Virtio protocols: peer 0 (the master) is signaled the
+         * back-end variant of the ID, all other peers the front-end
+         * variant.
+         */
+        virtio_protocol = server->args.protocol & ~0x4000;
+        if (peer->id == 0) {
+            virtio_protocol |= 0x4000;
+        }
+        msg.protocol = GUINT32_TO_LE(virtio_protocol);
+    }
+
+    ret = ivshmem_server_send_msg(peer->sock_fd, &msg, sizeof(msg),
+                                  server->shm_fd);
+    if (ret < 0) {
+        IVSHMEM_SERVER_DEBUG(server, "cannot send initial info: %s\n",
+                             strerror(errno));
+        return -1;
+    }
+
+    return 0;
+}
+
+/* handle message on listening unix socket (new client connection) */
+static int
+ivshmem_server_handle_new_conn(IvshmemServer *server)
+{
+    IvshmemServerPeer *peer, *other_peer;
+    struct sockaddr_un unaddr;
+    socklen_t unaddr_len;
+    int newfd;
+    unsigned i;
+
+    /* accept the incoming connection */
+    unaddr_len = sizeof(unaddr);
+    newfd = qemu_accept(server->sock_fd,
+                        (struct sockaddr *)&unaddr, &unaddr_len);
+
+    if (newfd < 0) {
+        IVSHMEM_SERVER_DEBUG(server, "cannot accept() %s\n", strerror(errno));
+        return -1;
+    }
+
+    qemu_set_nonblock(newfd);
+    IVSHMEM_SERVER_DEBUG(server, "accept()=%d\n", newfd);
+
+    /* allocate new structure for this peer */
+    peer = g_malloc0(sizeof(*peer));
+    peer->sock_fd = newfd;
+
+    /* get an unused peer id */
+    /* XXX: this could use id allocation such as Linux IDA, or simply
+     * a free-list */
+    for (i = 0; i < G_MAXUINT16; i++) {
+        if (ivshmem_server_search_peer(server, i) == NULL) {
+            break;
+        }
+    }
+    if (i >= server->args.max_peers) {
+        IVSHMEM_SERVER_DEBUG(server, "cannot allocate new client id\n");
+        close(newfd);
+        g_free(peer);
+        return -1;
+    }
+    peer->id = i;
+
+    /* create eventfd, one per vector */
+    peer->vectors_count = server->args.vectors;
+    for (i = 0; i < peer->vectors_count; i++) {
+        if (event_notifier_init(&peer->vectors[i], FALSE) < 0) {
+            IVSHMEM_SERVER_DEBUG(server, "cannot create eventfd\n");
+            goto fail;
+        }
+    }
+
+    /* send peer id and shm fd */
+    if (ivshmem_server_send_initial_info(server, peer) < 0) {
+        IVSHMEM_SERVER_DEBUG(server, "cannot send initial info\n");
+        goto fail;
+    }
+
+    /* advertise the new peer to others */
+    QTAILQ_FOREACH(other_peer, &server->peer_list, next) {
+        for (i = 0; i < peer->vectors_count; i++) {
+            ivshmem_server_send_event_fd(other_peer->sock_fd, peer->id, i,
+                                         peer->vectors[i].wfd);
+        }
+    }
+
+    /* advertise the other peers to the new one */
+    QTAILQ_FOREACH(other_peer, &server->peer_list, next) {
+        for (i = 0; i < peer->vectors_count; i++) {
+            ivshmem_server_send_event_fd(peer->sock_fd, other_peer->id, i,
+                                         other_peer->vectors[i].wfd);
+        }
+    }
+
+    /* advertise the new peer to itself */
+    for (i = 0; i < peer->vectors_count; i++) {
+        ivshmem_server_send_event_fd(peer->sock_fd, peer->id, i,
+                                     event_notifier_get_fd(&peer->vectors[i]));
+    }
+
+    QTAILQ_INSERT_TAIL(&server->peer_list, peer, next);
+    IVSHMEM_SERVER_DEBUG(server, "new peer id = %" PRId64 "\n",
+                         peer->id);
+    return 0;
+
+fail:
+    while (i--) {
+        event_notifier_cleanup(&peer->vectors[i]);
+    }
+    close(newfd);
+    g_free(peer);
+    return -1;
+}
+
+/* Try to ftruncate a file to the next power of 2 of shmsize.
+ * If that fails, all powers of 2 above shmsize are tested until
+ * we reach the maximum huge page size. This is useful
+ * if the shm file is in a hugetlbfs that cannot be truncated to the
+ * shm_size value. */
+static int
+ivshmem_server_ftruncate(int fd, unsigned shmsize)
+{
+    int ret;
+    struct stat mapstat;
+
+    /* align shmsize to next power of 2 */
+    shmsize = pow2ceil(shmsize);
+
+    if (fstat(fd, &mapstat) != -1 && mapstat.st_size == shmsize) {
+        return 0;
+    }
+
+    while (shmsize <= IVSHMEM_SERVER_MAX_HUGEPAGE_SIZE) {
+        ret = ftruncate(fd, shmsize);
+        if (ret == 0) {
+            return ret;
+        }
+        shmsize *= 2;
+    }
+
+    return -1;
+}
+
+/* Init a new ivshmem server */
+void ivshmem_server_init(IvshmemServer *server)
+{
+    server->sock_fd = -1;
+    server->shm_fd = -1;
+    server->state_table = NULL;
+    QTAILQ_INIT(&server->peer_list);
+}
+
+/* open shm, create and bind to the unix socket */
+int
+ivshmem_server_start(IvshmemServer *server)
+{
+    struct sockaddr_un sun;
+    int shm_fd, sock_fd, ret;
+    void *state_table;
+
+    /* open shm file */
+    if (server->args.use_shm_open) {
+        IVSHMEM_SERVER_DEBUG(server, "Using POSIX shared memory: %s\n",
+                             server->args.shm_path);
+        shm_fd = shm_open(server->args.shm_path, O_CREAT | O_RDWR, S_IRWXU);
+    } else {
+        gchar *filename = g_strdup_printf("%s/ivshmem.XXXXXX",
+                                          server->args.shm_path);
+        IVSHMEM_SERVER_DEBUG(server, "Using file-backed shared memory: %s\n",
+                             server->args.shm_path);
+        shm_fd = mkstemp(filename);
+        unlink(filename);
+        g_free(filename);
+    }
+
+    if (shm_fd < 0) {
+        fprintf(stderr, "cannot open shm file %s: %s\n", server->args.shm_path,
+                strerror(errno));
+        return -1;
+    }
+    if (ivshmem_server_ftruncate(shm_fd, server->args.shm_size) < 0) {
+        fprintf(stderr, "ftruncate(%s) failed: %s\n", server->args.shm_path,
+                strerror(errno));
+        goto err_close_shm;
+    }
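+    /* map the state table page at the start of the shared memory */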
+    state_table = mmap(NULL, 4096, PROT_READ | PROT_WRITE, MAP_SHARED,
+                       shm_fd, 0);
+    if (state_table == MAP_FAILED) {
+        fprintf(stderr, "mmap failed: %s\n", strerror(errno));
+        goto err_close_shm;
+    }
+
+    IVSHMEM_SERVER_DEBUG(server, "create & bind socket %s\n",
+                         server->args.unix_socket_path);
+
+    /* create the unix listening socket */
+    sock_fd = socket(AF_UNIX, SOCK_STREAM, 0);
+    if (sock_fd < 0) {
+        IVSHMEM_SERVER_DEBUG(server, "cannot create socket: %s\n",
+                             strerror(errno));
+        goto err_unmap;
+    }
+
+    sun.sun_family = AF_UNIX;
+    ret = snprintf(sun.sun_path, sizeof(sun.sun_path), "%s",
+                   server->args.unix_socket_path);
+    if (ret < 0 || ret >= sizeof(sun.sun_path)) {
+        IVSHMEM_SERVER_DEBUG(server, "could not copy unix socket path\n");
+        goto err_close_sock;
+    }
+    if (bind(sock_fd, (struct sockaddr *)&sun, sizeof(sun)) < 0) {
+        IVSHMEM_SERVER_DEBUG(server, "cannot connect to %s: %s\n", sun.sun_path,
+                             strerror(errno));
+        goto err_close_sock;
+    }
+
+    if (listen(sock_fd, IVSHMEM_SERVER_LISTEN_BACKLOG) < 0) {
+        IVSHMEM_SERVER_DEBUG(server, "listen() failed: %s\n", strerror(errno));
+        goto err_close_sock;
+    }
+
+    server->sock_fd = sock_fd;
+    server->shm_fd = shm_fd;
+    server->state_table = state_table;
+
+    return 0;
+
+err_close_sock:
+    close(sock_fd);
+err_unmap:
+    munmap(state_table, 4096);
+err_close_shm:
+    if (server->args.use_shm_open) {
+        shm_unlink(server->args.shm_path);
+    }
+    close(shm_fd);
+    return -1;
+}
+
+/* close connections to clients, the unix socket and the shm fd */
+void
+ivshmem_server_close(IvshmemServer *server)
+{
+    IvshmemServerPeer *peer, *npeer;
+
+    IVSHMEM_SERVER_DEBUG(server, "close server\n");
+
+    QTAILQ_FOREACH_SAFE(peer, &server->peer_list, next, npeer) {
+        ivshmem_server_free_peer(server, peer);
+    }
+
+    unlink(server->args.unix_socket_path);
+    if (server->args.use_shm_open) {
+        shm_unlink(server->args.shm_path);
+    }
+    close(server->sock_fd);
+    munmap(server->state_table, 4096);
+    close(server->shm_fd);
+    server->sock_fd = -1;
+    server->shm_fd = -1;
+}
+
+/* get the fd_set according to the unix socket and the peer list */
+void
+ivshmem_server_get_fds(const IvshmemServer *server, fd_set *fds, int *maxfd)
+{
+    IvshmemServerPeer *peer;
+
+    if (server->sock_fd == -1) {
+        return;
+    }
+
+    FD_SET(server->sock_fd, fds);
+    if (server->sock_fd >= *maxfd) {
+        *maxfd = server->sock_fd + 1;
+    }
+
+    QTAILQ_FOREACH(peer, &server->peer_list, next) {
+        FD_SET(peer->sock_fd, fds);
+        if (peer->sock_fd >= *maxfd) {
+            *maxfd = peer->sock_fd + 1;
+        }
+    }
+}
+
+/* process incoming messages on the sockets in fd_set */
+int
+ivshmem_server_handle_fds(IvshmemServer *server, fd_set *fds, int maxfd)
+{
+    IvshmemServerPeer *peer, *peer_next;
+
+    if (server->sock_fd < maxfd && FD_ISSET(server->sock_fd, fds) &&
+        ivshmem_server_handle_new_conn(server) < 0 && errno != EINTR) {
+        IVSHMEM_SERVER_DEBUG(server, "ivshmem_server_handle_new_conn() "
+                             "failed\n");
+        return -1;
+    }
+
+    QTAILQ_FOREACH_SAFE(peer, &server->peer_list, next, peer_next) {
+        /* any message from a peer socket results in a close() */
+        IVSHMEM_SERVER_DEBUG(server, "peer->sock_fd=%d\n", peer->sock_fd);
+        if (peer->sock_fd < maxfd && FD_ISSET(peer->sock_fd, fds)) {
+            ivshmem_server_free_peer(server, peer);
+        }
+    }
+
+    return 0;
+}
+
+/* lookup peer from its id */
+IvshmemServerPeer *
+ivshmem_server_search_peer(IvshmemServer *server, int64_t peer_id)
+{
+    IvshmemServerPeer *peer;
+
+    QTAILQ_FOREACH(peer, &server->peer_list, next) {
+        if (peer->id == peer_id) {
+            return peer;
+        }
+    }
+    return NULL;
+}
diff --git a/contrib/ivshmem2-server/ivshmem2-server.h b/contrib/ivshmem2-server/ivshmem2-server.h
new file mode 100644
index 0000000000..3fd6166577
--- /dev/null
+++ b/contrib/ivshmem2-server/ivshmem2-server.h
@@ -0,0 +1,158 @@
+/*
+ * Copyright 6WIND S.A., 2014
+ * Copyright (c) Siemens AG, 2019
+ *
+ * This work is licensed under the terms of the GNU GPL, version 2 or
+ * (at your option) any later version.  See the COPYING file in the
+ * top-level directory.
+ */
+
+#ifndef IVSHMEM2_SERVER_H
+#define IVSHMEM2_SERVER_H
+
+/**
+ * The ivshmem server is a daemon that creates a unix socket in listen
+ * mode. The ivshmem clients (qemu or ivshmem-client) connect to this
+ * unix socket. For each client, the server will create one eventfd
+ * (see EVENTFD(2)) per vector. These fds are transmitted to all
+ * clients using the SCM_RIGHTS cmsg message. Therefore, each client is
+ * able to send a notification to another client without being
+ * proxied by the server.
+ *
+ * We use this mechanism to send interrupts between guests.
+ * qemu is able to transform an event on an eventfd into a PCI MSI-X
+ * interrupt in the guest.
+ *
+ * The ivshmem server is also able to share the file descriptor
+ * associated to the ivshmem shared memory.
+ */
+
+#include <sys/select.h>
+
+#include "qemu/event_notifier.h"
+#include "qemu/queue.h"
+#include "hw/misc/ivshmem2.h"
+
+/**
+ * Maximum number of notification vectors supported by the server
+ */
+#define IVSHMEM_SERVER_MAX_VECTORS 64
+
+/**
+ * Structure storing a peer
+ *
+ * Each time a client connects to an ivshmem server, a new
+ * IvshmemServerPeer structure is created. This peer and all its
+ * vectors are advertised to all connected clients through the connected
+ * unix sockets.
+ */
+typedef struct IvshmemServerPeer {
+    QTAILQ_ENTRY(IvshmemServerPeer) next;    /**< next in list*/
+    int sock_fd;                             /**< connected unix sock */
+    int64_t id;                              /**< the id of the peer */
+    EventNotifier vectors[IVSHMEM_SERVER_MAX_VECTORS]; /**< one per vector */
+    unsigned vectors_count;                  /**< number of vectors */
+} IvshmemServerPeer;
+
+/**
+ * Structure describing ivshmem server arguments
+ */
+typedef struct IvshmemServerArgs {
+    bool verbose;                   /**< true to enable verbose mode */
+    const char *unix_socket_path;   /**< pointer to unix socket file name */
+    const char *shm_path;           /**< Path to the shared memory; path
+                                         corresponds to a POSIX shm name or a
+                                         hugetlbfs mount point. */
+    bool use_shm_open;              /**< true to use shm_open, false for
+                                         file-backed shared memory */
+    uint64_t shm_size;              /**< total size of shared memory */
+    uint64_t output_section_size;   /**< size of each output section */
+    unsigned max_peers;             /**< maximum number of peers */
+    unsigned vectors;               /**< interrupt vectors per client */
+    unsigned protocol;              /**< protocol advertised to all clients */
+} IvshmemServerArgs;
+
+/**
+ * Structure describing an ivshmem server
+ *
+ * This structure stores all information related to our server: the name
+ * of the server unix socket and the list of connected peers.
+ */
+typedef struct IvshmemServer {
+    IvshmemServerArgs args;          /**< server arguments */
+    int sock_fd;                     /**< unix sock file descriptor */
+    int shm_fd;                      /**< shm file descriptor */
+    uint32_t *state_table;           /**< mapped state table */
+    QTAILQ_HEAD(, IvshmemServerPeer) peer_list; /**< list of peers */
+} IvshmemServer;
+
+/**
+ * Initialize an ivshmem server
+ *
+ * @server:         A pointer to an uninitialized IvshmemServer structure
+ */
+void ivshmem_server_init(IvshmemServer *server);
+
+/**
+ * Open the shm, then create and bind to the unix socket
+ *
+ * @server: The pointer to the initialized IvshmemServer structure
+ *
+ * Returns: 0 on success, or a negative value on error
+ */
+int ivshmem_server_start(IvshmemServer *server);
+
+/**
+ * Close the server
+ *
+ * Close connections to all clients, close the unix socket and the
+ * shared memory file descriptor. The structure remains initialized, so
+ * it is possible to call ivshmem_server_start() again after a call to
+ * ivshmem_server_close().
+ *
+ * @server: The ivshmem server
+ */
+void ivshmem_server_close(IvshmemServer *server);
+
+/**
+ * Fill a fd_set with file descriptors to be monitored
+ *
+ * This function will fill a fd_set with all file descriptors that must
+ * be polled (unix server socket and peers unix socket). The function
+ * will not initialize the fd_set, it is up to the caller to do it.
+ *
+ * @server: The ivshmem server
+ * @fds:    The fd_set to be updated
+ * @maxfd:  Must be set to the max file descriptor + 1 in fd_set. This value is
+ *          updated if this function adds a greater fd in fd_set.
+ */
+void
+ivshmem_server_get_fds(const IvshmemServer *server, fd_set *fds, int *maxfd);
+
+/**
+ * Read and handle new messages
+ *
+ * Given a fd_set (for instance filled by a call to select()), handle
+ * incoming messages from peers.
+ *
+ * @server: The ivshmem server
+ * @fds:    The fd_set containing the file descriptors to be checked. Note that
+ *          file descriptors that are not related to our server are ignored.
+ * @maxfd:  The maximum fd in fd_set, plus one.
+ *
+ * Returns: 0 on success, or a negative value on error
+ */
+int ivshmem_server_handle_fds(IvshmemServer *server, fd_set *fds, int maxfd);
+
+/**
+ * Search a peer from its identifier
+ *
+ * @server:  The ivshmem server
+ * @peer_id: The identifier of the peer structure
+ *
+ * Returns:  The peer structure, or NULL if not found
+ */
+IvshmemServerPeer *
+ivshmem_server_search_peer(IvshmemServer *server, int64_t peer_id);
+
+#endif /* IVSHMEM2_SERVER_H */
diff --git a/contrib/ivshmem2-server/main.c b/contrib/ivshmem2-server/main.c
new file mode 100644
index 0000000000..35cd6fca0f
--- /dev/null
+++ b/contrib/ivshmem2-server/main.c
@@ -0,0 +1,313 @@
+/*
+ * Copyright 6WIND S.A., 2014
+ * Copyright (c) Siemens AG, 2019
+ *
+ * This work is licensed under the terms of the GNU GPL, version 2 or
+ * (at your option) any later version.  See the COPYING file in the
+ * top-level directory.
+ */
+
+#include "qemu/osdep.h"
+#include "qapi/error.h"
+#include "qemu/cutils.h"
+#include "qemu/option.h"
+#include "ivshmem2-server.h"
+
+#define IVSHMEM_SERVER_DEFAULT_FOREGROUND     0
+#define IVSHMEM_SERVER_DEFAULT_PID_FILE       "/var/run/ivshmem-server.pid"
+
+#define IVSHMEM_SERVER_DEFAULT_VERBOSE        0
+#define IVSHMEM_SERVER_DEFAULT_UNIX_SOCK_PATH "/tmp/ivshmem_socket"
+#define IVSHMEM_SERVER_DEFAULT_SHM_PATH       "ivshmem"
+#define IVSHMEM_SERVER_DEFAULT_SHM_SIZE       (4*1024*1024)
+#define IVSHMEM_SERVER_DEFAULT_OUTPUT_SEC_SZ  0
+#define IVSHMEM_SERVER_DEFAULT_MAX_PEERS      2
+#define IVSHMEM_SERVER_DEFAULT_VECTORS        1
+#define IVSHMEM_SERVER_DEFAULT_PROTOCOL       0
+
+/* used to quit on signal SIGTERM or SIGINT */
+static int ivshmem_server_quit;
+
+static bool foreground = IVSHMEM_SERVER_DEFAULT_FOREGROUND;
+static const char *pid_file = IVSHMEM_SERVER_DEFAULT_PID_FILE;
+
+static void
+ivshmem_server_usage(const char *progname)
+{
+    printf("Usage: %s [OPTION]...\n"
+           "  -h: show this help\n"
+           "  -v: verbose mode\n"
+           "  -F: foreground mode (default is to daemonize)\n"
+           "  -p <pid-file>: path to the PID file (used in daemon mode only)\n"
+           "     default " IVSHMEM_SERVER_DEFAULT_PID_FILE "\n"
+           "  -S <unix-socket-path>: path to the unix socket to listen to\n"
+           "     default " IVSHMEM_SERVER_DEFAULT_UNIX_SOCK_PATH "\n"
+           "  -M <shm-name>: POSIX shared memory object to use\n"
+           "     default " IVSHMEM_SERVER_DEFAULT_SHM_PATH "\n"
+           "  -m <dir-name>: where to create shared memory\n"
+           "  -l <size>: size of shared memory in bytes\n"
+           "     suffixes K, M and G can be used, e.g. 1K means 1024\n"
+           "     default %u\n"
+           "  -o <size>: size of each output section in bytes "
+                "(suffixes supported)\n"
+           "     default %u\n"
+           "  -n <peers>: maximum number of peers\n"
+           "     default %u\n"
+           "  -V <vectors>: number of vectors\n"
+           "     default %u\n"
+           "  -P <protocol>: 16-bit protocol to be advertised\n"
+           "     default 0x%04x\n"
+           "     When using virtio (0x8000...0xffff), only two peers are "
+           "supported, peer 0\n"
+           "     will become backend, peer 1 frontend\n",
+           progname, IVSHMEM_SERVER_DEFAULT_SHM_SIZE,
+           IVSHMEM_SERVER_DEFAULT_OUTPUT_SEC_SZ,
+           IVSHMEM_SERVER_DEFAULT_MAX_PEERS, IVSHMEM_SERVER_DEFAULT_VECTORS,
+           IVSHMEM_SERVER_DEFAULT_PROTOCOL);
+}
+
+static void
+ivshmem_server_help(const char *progname)
+{
+    fprintf(stderr, "Try '%s -h' for more information.\n", progname);
+}
+
+/* parse the program arguments, exit on error */
+static void
+ivshmem_server_parse_args(IvshmemServerArgs *args, int argc, char *argv[])
+{
+    int c;
+    unsigned long long v;
+    Error *err = NULL;
+
+    while ((c = getopt(argc, argv, "hvFp:S:m:M:l:o:n:V:P:")) != -1) {
+
+        switch (c) {
+        case 'h': /* help */
+            ivshmem_server_usage(argv[0]);
+            exit(0);
+            break;
+
+        case 'v': /* verbose */
+            args->verbose = 1;
+            break;
+
+        case 'F': /* foreground */
+            foreground = 1;
+            break;
+
+        case 'p': /* pid file */
+            pid_file = optarg;
+            break;
+
+        case 'S': /* unix socket path */
+            args->unix_socket_path = optarg;
+            break;
+
+        case 'M': /* shm name */
+        case 'm': /* dir name */
+            args->shm_path = optarg;
+            args->use_shm_open = c == 'M';
+            break;
+
+        case 'l': /* shm size */
+            parse_option_size("shm_size", optarg, &args->shm_size, &err);
+            if (err) {
+                error_report_err(err);
+                ivshmem_server_help(argv[0]);
+                exit(1);
+            }
+            break;
+
+        case 'o': /* output section size */
+            parse_option_size("output_section_size", optarg,
+                              &args->output_section_size, &err);
+            if (err) {
+                error_report_err(err);
+                ivshmem_server_help(argv[0]);
+                exit(1);
+            }
+            break;
+
+        case 'n': /* maximum number of peers */
+            if (parse_uint_full(optarg, &v, 0) < 0) {
+                fprintf(stderr, "cannot parse max-peers\n");
+                ivshmem_server_help(argv[0]);
+                exit(1);
+            }
+            args->max_peers = v;
+            break;
+
+        case 'V': /* number of vectors */
+            if (parse_uint_full(optarg, &v, 0) < 0) {
+                fprintf(stderr, "cannot parse vectors\n");
+                ivshmem_server_help(argv[0]);
+                exit(1);
+            }
+            args->vectors = v;
+            break;
+
+        case 'P': /* protocol */
+            if (parse_uint_full(optarg, &v, 0) < 0) {
+                fprintf(stderr, "cannot parse protocol\n");
+                ivshmem_server_help(argv[0]);
+                exit(1);
+            }
+            args->protocol = v;
+            break;
+
+        default:
+            ivshmem_server_usage(argv[0]);
+            exit(1);
+            break;
+        }
+    }
+
+    if (args->vectors > IVSHMEM_SERVER_MAX_VECTORS) {
+        fprintf(stderr, "too many requested vectors (max is %d)\n",
+                IVSHMEM_SERVER_MAX_VECTORS);
+        ivshmem_server_help(argv[0]);
+        exit(1);
+    }
+
+    if (args->protocol >= 0x8000 && args->max_peers > 2) {
+        fprintf(stderr, "virtio protocols only support 2 peers\n");
+        ivshmem_server_help(argv[0]);
+        exit(1);
+    }
+
+    if (args->verbose == 1 && foreground == 0) {
+        fprintf(stderr, "cannot use verbose in daemon mode\n");
+        ivshmem_server_help(argv[0]);
+        exit(1);
+    }
+}
+
+/* wait for events on listening server unix socket and connected client
+ * sockets */
+static int
+ivshmem_server_poll_events(IvshmemServer *server)
+{
+    fd_set fds;
+    int ret = 0, maxfd;
+
+    while (!ivshmem_server_quit) {
+
+        FD_ZERO(&fds);
+        maxfd = 0;
+        ivshmem_server_get_fds(server, &fds, &maxfd);
+
+        ret = select(maxfd, &fds, NULL, NULL, NULL);
+
+        if (ret < 0) {
+            if (errno == EINTR) {
+                continue;
+            }
+
+            fprintf(stderr, "select error: %s\n", strerror(errno));
+            break;
+        }
+        if (ret == 0) {
+            continue;
+        }
+
+        if (ivshmem_server_handle_fds(server, &fds, maxfd) < 0) {
+            fprintf(stderr, "ivshmem_server_handle_fds() failed\n");
+            break;
+        }
+    }
+
+    return ret;
+}
+
+static void
+ivshmem_server_quit_cb(int signum)
+{
+    ivshmem_server_quit = 1;
+}
+
+int
+main(int argc, char *argv[])
+{
+    IvshmemServer server = {
+        .args = {
+            .verbose = IVSHMEM_SERVER_DEFAULT_VERBOSE,
+            .unix_socket_path = IVSHMEM_SERVER_DEFAULT_UNIX_SOCK_PATH,
+            .shm_path = IVSHMEM_SERVER_DEFAULT_SHM_PATH,
+            .use_shm_open = true,
+            .shm_size = IVSHMEM_SERVER_DEFAULT_SHM_SIZE,
+            .output_section_size = IVSHMEM_SERVER_DEFAULT_OUTPUT_SEC_SZ,
+            .max_peers = IVSHMEM_SERVER_DEFAULT_MAX_PEERS,
+            .vectors = IVSHMEM_SERVER_DEFAULT_VECTORS,
+            .protocol = IVSHMEM_SERVER_DEFAULT_PROTOCOL,
+        },
+    };
+    struct sigaction sa, sa_quit;
+    int ret = 1;
+
+    /*
+     * Do not remove this notice without adding proper error handling!
+     * Start with handling ivshmem_server_send_msg() failure.
+     */
+    printf("*** Example code, do not use in production ***\n");
+
+    /* parse arguments, will exit on error */
+    ivshmem_server_parse_args(&server.args, argc, argv);
+
+    /* Ignore SIGPIPE, see this link for more info:
+     * http://www.mail-archive.com/libevent-users@monkey.org/msg01606.html */
+    sa.sa_handler = SIG_IGN;
+    sa.sa_flags = 0;
+    if (sigemptyset(&sa.sa_mask) == -1 ||
+        sigaction(SIGPIPE, &sa, 0) == -1) {
+        perror("failed to ignore SIGPIPE; sigaction");
+        goto err;
+    }
+
+    sa_quit.sa_handler = ivshmem_server_quit_cb;
+    sa_quit.sa_flags = 0;
+    if (sigemptyset(&sa_quit.sa_mask) == -1 ||
+        sigaction(SIGTERM, &sa_quit, 0) == -1 ||
+        sigaction(SIGINT, &sa_quit, 0) == -1) {
+        perror("failed to add signal handler; sigaction");
+        goto err;
+    }
+
+    /* init the ivshmem server structure */
+    ivshmem_server_init(&server);
+
+    /* start the ivshmem server (open shm & unix socket) */
+    if (ivshmem_server_start(&server) < 0) {
+        fprintf(stderr, "cannot bind\n");
+        goto err;
+    }
+
+    /* daemonize if asked to */
+    if (!foreground) {
+        FILE *fp;
+
+        if (qemu_daemon(1, 1) < 0) {
+            fprintf(stderr, "cannot daemonize: %s\n", strerror(errno));
+            goto err_close;
+        }
+
+        /* write pid file */
+        fp = fopen(pid_file, "w");
+        if (fp == NULL) {
+            fprintf(stderr, "cannot write pid file: %s\n", strerror(errno));
+            goto err_close;
+        }
+
+        fprintf(fp, "%d\n", (int) getpid());
+        fclose(fp);
+    }
+
+    ivshmem_server_poll_events(&server);
+    fprintf(stdout, "server disconnected\n");
+    ret = 0;
+
+err_close:
+    ivshmem_server_close(&server);
+err:
+    return ret;
+}
-- 
2.16.4



^ permalink raw reply related	[flat|nested] 14+ messages in thread

* Re: [RFC][PATCH v2 0/3] IVSHMEM version 2 device for QEMU
  2020-01-07 14:36 [RFC][PATCH v2 0/3] IVSHMEM version 2 device for QEMU Jan Kiszka
                   ` (2 preceding siblings ...)
  2020-01-07 14:36 ` [RFC][PATCH v2 3/3] contrib: Add server for ivshmem " Jan Kiszka
@ 2020-04-09 13:52 ` Liang Yan
  2020-04-09 14:11   ` Jan Kiszka
  2020-04-29  4:15 ` Liang Yan
  4 siblings, 1 reply; 14+ messages in thread
From: Liang Yan @ 2020-04-09 13:52 UTC (permalink / raw)
  To: Jan Kiszka, qemu-devel
  Cc: Jailhouse, Claudio Fontana, Michael S . Tsirkin,
	Markus Armbruster, Hannes Reinecke, Stefan Hajnoczi



On 1/7/20 9:36 AM, Jan Kiszka wrote:
> Overdue update of the ivshmem 2.0 device model as presented at [1].
> 
> Changes in v2:
>  - changed PCI device ID to Siemens-granted one,
>    adjusted PCI device revision to 0
>  - removed unused feature register from device
>  - addressed feedback on specification document
>  - rebased over master
> 
> This version is now fully in sync with the implementation for Jailhouse
> that is currently under review [2][3], UIO and virtio-ivshmem drivers
> are shared. Jailhouse will very likely pick up this revision of the
> device in order to move forward with stressing it.
> 
> More details on the usage with QEMU were in the original cover letter
> (with adjustements to the new device ID):
> 
> If you want to play with this, the basic setup of the shared memory
> device is described in patch 1 and 3. UIO driver and also the
> virtio-ivshmem prototype can be found at
> 
>     http://git.kiszka.org/?p=linux.git;a=shortlog;h=refs/heads/queues/ivshmem2
> 
> Accessing the device via UIO is trivial enough. If you want to use it
> for virtio, this is additionally to the description in patch 3 needed on
> the virtio console backend side:
> 
>     modprobe uio_ivshmem
>     echo "110a 4106 1af4 1100 ffc003 ffffff" > /sys/bus/pci/drivers/uio_ivshmem/new_id
>     linux/tools/virtio/virtio-ivshmem-console /dev/uio0
> 
> And for virtio block:
> 
>     echo "110a 4106 1af4 1100 ffc002 ffffff" > /sys/bus/pci/drivers/uio_ivshmem/new_id
>     linux/tools/virtio/virtio-ivshmem-console /dev/uio0 /path/to/disk.img
> 
> After that, you can start the QEMU frontend instance with the
> virtio-ivshmem driver installed which can use the new /dev/hvc* or
> /dev/vda* as usual.
> 
Hi, Jan,

Nice work.

I did a full test of this new version. The QEMU device part looks
good, virtio console worked as expected. Just had some issues with
the virtio-ivshmem-block tests here.

I suppose you mean "linux/tools/virtio/virtio-ivshmem-block"?

Noticed "ffc002" is the main difference; however, I saw no response
when running the echo command here. Is there anything I need to prepare?

I built the driver into the guest kernel already.

Do I need a new protocol or anything for the command line below?
ivshmem2-server -F -l 64K -n 2 -V 3 -P 0x8003

Best,
Liang





^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [RFC][PATCH v2 0/3] IVSHMEM version 2 device for QEMU
  2020-04-09 13:52 ` [RFC][PATCH v2 0/3] IVSHMEM version 2 device for QEMU Liang Yan
@ 2020-04-09 14:11   ` Jan Kiszka
  2020-04-10  0:59     ` Liang Yan
  0 siblings, 1 reply; 14+ messages in thread
From: Jan Kiszka @ 2020-04-09 14:11 UTC (permalink / raw)
  To: Liang Yan, qemu-devel
  Cc: Jailhouse, Claudio Fontana, Michael S . Tsirkin,
	Markus Armbruster, Hannes Reinecke, Stefan Hajnoczi

On 09.04.20 15:52, Liang Yan wrote:
> 
> 
> On 1/7/20 9:36 AM, Jan Kiszka wrote:
>> Overdue update of the ivshmem 2.0 device model as presented at [1].
>>
>> Changes in v2:
>>   - changed PCI device ID to Siemens-granted one,
>>     adjusted PCI device revision to 0
>>   - removed unused feature register from device
>>   - addressed feedback on specification document
>>   - rebased over master
>>
>> This version is now fully in sync with the implementation for Jailhouse
>> that is currently under review [2][3], UIO and virtio-ivshmem drivers
>> are shared. Jailhouse will very likely pick up this revision of the
>> device in order to move forward with stressing it.
>>
>> More details on the usage with QEMU were in the original cover letter
>> (with adjustements to the new device ID):
>>
>> If you want to play with this, the basic setup of the shared memory
>> device is described in patch 1 and 3. UIO driver and also the
>> virtio-ivshmem prototype can be found at
>>
>>      http://git.kiszka.org/?p=linux.git;a=shortlog;h=refs/heads/queues/ivshmem2
>>
>> Accessing the device via UIO is trivial enough. If you want to use it
>> for virtio, this is additionally to the description in patch 3 needed on
>> the virtio console backend side:
>>
>>      modprobe uio_ivshmem
>>      echo "110a 4106 1af4 1100 ffc003 ffffff" > /sys/bus/pci/drivers/uio_ivshmem/new_id
>>      linux/tools/virtio/virtio-ivshmem-console /dev/uio0
>>
>> And for virtio block:
>>
>>      echo "110a 4106 1af4 1100 ffc002 ffffff" > /sys/bus/pci/drivers/uio_ivshmem/new_id
>>      linux/tools/virtio/virtio-ivshmem-console /dev/uio0 /path/to/disk.img
>>
>> After that, you can start the QEMU frontend instance with the
>> virtio-ivshmem driver installed which can use the new /dev/hvc* or
>> /dev/vda* as usual.
>>
> Hi, Jan,
> 
> Nice work.
> 
> I did a full test of this new version. The QEMU device part looks
> good, virtio console worked as expected. Just had some issues with
> the virtio-ivshmem-block tests here.
> 
> I suppose you mean "linux/tools/virtio/virtio-ivshmem-block"?

Yes, copy&paste mistake, had the same issue over in 
https://github.com/siemens/jailhouse/blob/master/Documentation/inter-cell-communication.md

> 
> Noticed "ffc002" is the main difference; however, I saw no response
> when running the echo command here. Is there anything I need to prepare?
> 
> I built the driver into the guest kernel already.
> 
> Do I need a new protocol or anything for the command line below?
> ivshmem2-server -F -l 64K -n 2 -V 3 -P 0x8003

Yes, you need to adjust that command line - didn't I document that 
somewhere? Looks like I didn't:

ivshmem2-server -F -l 1M -n 2 -V 2 -P 0x8002

i.e. a bit more memory is good (but this isn't speed-optimized anyway), 
you only need 2 vectors here (but more do not harm), and the protocol 
indeed needs adjustment (that is the key).

Jan

-- 
Siemens AG, Corporate Technology, CT RDA IOT SES-DE
Corporate Competence Center Embedded Linux


^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [RFC][PATCH v2 0/3] IVSHMEM version 2 device for QEMU
  2020-04-09 14:11   ` Jan Kiszka
@ 2020-04-10  0:59     ` Liang Yan
  0 siblings, 0 replies; 14+ messages in thread
From: Liang Yan @ 2020-04-10  0:59 UTC (permalink / raw)
  To: Jan Kiszka, qemu-devel
  Cc: Jailhouse, Claudio Fontana, Michael S . Tsirkin,
	Markus Armbruster, Hannes Reinecke, Stefan Hajnoczi



On 4/9/20 10:11 AM, Jan Kiszka wrote:
> On 09.04.20 15:52, Liang Yan wrote:
>>
>>
>> On 1/7/20 9:36 AM, Jan Kiszka wrote:
>>> Overdue update of the ivshmem 2.0 device model as presented at [1].
>>>
>>> Changes in v2:
>>>   - changed PCI device ID to Siemens-granted one,
>>>     adjusted PCI device revision to 0
>>>   - removed unused feature register from device
>>>   - addressed feedback on specification document
>>>   - rebased over master
>>>
>>> This version is now fully in sync with the implementation for Jailhouse
>>> that is currently under review [2][3], UIO and virtio-ivshmem drivers
>>> are shared. Jailhouse will very likely pick up this revision of the
>>> device in order to move forward with stressing it.
>>>
>>> More details on the usage with QEMU were in the original cover letter
>>> (with adjustements to the new device ID):
>>>
>>> If you want to play with this, the basic setup of the shared memory
>>> device is described in patch 1 and 3. UIO driver and also the
>>> virtio-ivshmem prototype can be found at
>>>
>>>     
>>> http://git.kiszka.org/?p=linux.git;a=shortlog;h=refs/heads/queues/ivshmem2
>>>
>>>
>>> Accessing the device via UIO is trivial enough. If you want to use it
>>> for virtio, this is additionally to the description in patch 3 needed on
>>> the virtio console backend side:
>>>
>>>      modprobe uio_ivshmem
>>>      echo "110a 4106 1af4 1100 ffc003 ffffff" >
>>> /sys/bus/pci/drivers/uio_ivshmem/new_id
>>>      linux/tools/virtio/virtio-ivshmem-console /dev/uio0
>>>
>>> And for virtio block:
>>>
>>>      echo "110a 4106 1af4 1100 ffc002 ffffff" >
>>> /sys/bus/pci/drivers/uio_ivshmem/new_id
>>>      linux/tools/virtio/virtio-ivshmem-console /dev/uio0
>>> /path/to/disk.img
>>>
>>> After that, you can start the QEMU frontend instance with the
>>> virtio-ivshmem driver installed which can use the new /dev/hvc* or
>>> /dev/vda* as usual.
>>>
>> Hi, Jan,
>>
>> Nice work.
>>
>> I did a full test of this new version. The QEMU device part looks
>> good, virtio console worked as expected. Just had some issues with
>> the virtio-ivshmem-block tests here.
>>
>> I suppose you mean "linux/tools/virtio/virtio-ivshmem-block"?
> 
> Yes, copy&paste mistake, had the same issue over in
> https://github.com/siemens/jailhouse/blob/master/Documentation/inter-cell-communication.md
> 
> 
>>
>> Noticed "ffc002" is the main difference; however, I saw no response
>> when running the echo command here. Is there anything I need to prepare?
>>
>> I built the driver into the guest kernel already.
>>
>> Do I need a new protocol or anything for the command line below?
>> ivshmem2-server -F -l 64K -n 2 -V 3 -P 0x8003
> 
> Yes, you need to adjust that command line - didn't I document that
> somewhere? Looks like I didn't:
> 
> ivshmem2-server -F -l 1M -n 2 -V 2 -P 0x8002
> 
> i.e. a bit more memory is good (but this isn't speed-optimized anyway),
> you only need 2 vectors here (but more do not harm), and the protocol
> indeed needs adjustment (that is the key).
> 

Thanks for the reply. I just confirmed that virtio-ivshmem-block works
with the new configuration; a "vdb" disk is mounted in the frontend VM.
I will send out a full test summary later.

Best,
Liang



> Jan
> 


^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [RFC][PATCH v2 0/3] IVSHMEM version 2 device for QEMU
  2020-01-07 14:36 [RFC][PATCH v2 0/3] IVSHMEM version 2 device for QEMU Jan Kiszka
                   ` (3 preceding siblings ...)
  2020-04-09 13:52 ` [RFC][PATCH v2 0/3] IVSHMEM version 2 device for QEMU Liang Yan
@ 2020-04-29  4:15 ` Liang Yan
  2020-04-29 11:50   ` Jan Kiszka
  4 siblings, 1 reply; 14+ messages in thread
From: Liang Yan @ 2020-04-29  4:15 UTC (permalink / raw)
  To: Jan Kiszka, qemu-devel
  Cc: Jailhouse, Claudio Fontana, Michael S . Tsirkin,
	Markus Armbruster, Hannes Reinecke, Stefan Hajnoczi

Hi, All,

Did a test for these patches, all looked fine.

Test environment:
Host: opensuse tumbleweed + latest upstream qemu  + these three patches
Guest: opensuse tumbleweed root fs + custom kernel(5.5) + related
uio-ivshmem driver + ivshmem-console/ivshmem-block tools


1. lspci show

00:04.0 Unassigned class [ff80]: Siemens AG Device 4106 (prog-if 02)
Subsystem: Red Hat, Inc. Device 1100
Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr-
Stepping- SERR+ FastB2B- DisINTx+
Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort-
<MAbort- >SERR- <PERR- INTx-
Latency: 0
Region 0: Memory at fea56000 (32-bit, non-prefetchable) [size=4K]
Region 1: Memory at fea57000 (32-bit, non-prefetchable) [size=4K]
Region 2: Memory at fd800000 (64-bit, prefetchable) [size=1M]
Capabilities: [4c] Vendor Specific Information: Len=18 <?>
Capabilities: [40] MSI-X: Enable+ Count=2 Masked-
Vector table: BAR=1 offset=00000000
PBA: BAR=1 offset=00000800
Kernel driver in use: virtio-ivshmem


2. virtio-ivshmem-console test
2.1 ivshmem2-server(host)

airey:~/ivshmem/qemu/:[0]# ./ivshmem2-server -F -l 64K -n 2 -V 3 -P 0x8003
*** Example code, do not use in production ***

2.2 guest vm backend(test-01)
localhost:~ # echo "110a 4106 1af4 1100 ffc003 ffffff" >
/sys/bus/pci/drivers/uio_ivshmem/new_id
[  185.831277] uio_ivshmem 0000:00:04.0: state_table at
0x00000000fd800000, size 0x0000000000001000
[  185.835129] uio_ivshmem 0000:00:04.0: rw_section at
0x00000000fd801000, size 0x0000000000007000

localhost:~ # virtio/virtio-ivshmem-console /dev/uio0
Waiting for peer to be ready...

2.3 guest vm frontend(test-02)
need to boot or reboot after backend is done

2.4 backend will serial output of frontend

localhost:~ # virtio/virtio-ivshmem-console /dev/uio0
Waiting for peer to be ready...

localhost:~/virtio # ./virtio-ivshmem-console /dev/uio0
Waiting for peer to be ready...
Starting virtio device
device_status: 0x0
device_status: 0x1
device_status: 0x3
device_features_sel: 1
device_features_sel: 0
driver_features_sel: 1
driver_features[1]: 0x13
driver_features_sel: 0
driver_features[0]: 0x1
device_status: 0xb
queue_sel: 0
queue size: 8
queue driver vector: 1
queue desc: 0x200
queue driver: 0x280
queue device: 0x2c0
queue enable: 1
queue_sel: 1
queue size: 8
queue driver vector: 2
queue desc: 0x400
queue driver: 0x480
queue device: 0x4c0
queue enable: 1
device_status: 0xf

Welcome to openSUSE Tumbleweed 20200326 - Kernel 5.5.0-rc5-1-default+
(hvc0).

enp0s3:


localhost login:

2.5 close backend and frontend will show
localhost:~ # [  185.685041] virtio-ivshmem 0000:00:04.0: backend failed!

3. virtio-ivshmem-block test

3.1 ivshmem2-server(host)
airey:~/ivshmem/qemu/:[0]# ./ivshmem2-server -F -l 1M -n 2 -V 2 -P 0x8002
*** Example code, do not use in production ***

3.2 guest vm backend(test-01)

localhost:~ # echo "110a 4106 1af4 1100 ffc002 ffffff" >
/sys/bus/pci/drivers/uio_ivshmem/new_id
[   77.701462] uio_ivshmem 0000:00:04.0: state_table at
0x00000000fd800000, size 0x0000000000001000
[   77.705231] uio_ivshmem 0000:00:04.0: rw_section at
0x00000000fd801000, size 0x00000000000ff000

localhost:~ # virtio/virtio-ivshmem-block /dev/uio0 /root/disk.img
Waiting for peer to be ready...

3.3 guest vm frontend(test-02)
need to boot or reboot after backend is done

3.4 guest vm backend(test-01)
localhost:~ # virtio/virtio-ivshmem-block /dev/uio0 /root/disk.img
Waiting for peer to be ready...
Starting virtio device
device_status: 0x0
device_status: 0x1
device_status: 0x3
device_features_sel: 1
device_features_sel: 0
driver_features_sel: 1
driver_features[1]: 0x13
driver_features_sel: 0
driver_features[0]: 0x206
device_status: 0xb
queue_sel: 0
queue size: 8
queue driver vector: 1
queue desc: 0x200
queue driver: 0x280
queue device: 0x2c0
queue enable: 1
device_status: 0xf

3.5 guest vm frontend(test-02), a new disk is attached:

fdisk /dev/vdb

Disk /dev/vdb: 192 KiB, 196608 bytes, 384 sectors
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes

3.6 close backend and frontend will show
localhost:~ # [ 1312.284301] virtio-ivshmem 0000:00:04.0: backend failed!



Tested-by: Liang Yan <lyan@suse.com>




* Re: [RFC][PATCH v2 1/3] hw/misc: Add implementation of ivshmem revision 2 device
  2020-01-07 14:36 ` [RFC][PATCH v2 1/3] hw/misc: Add implementation of ivshmem revision 2 device Jan Kiszka
@ 2020-04-29  4:20   ` Liang Yan
  0 siblings, 0 replies; 14+ messages in thread
From: Liang Yan @ 2020-04-29  4:20 UTC (permalink / raw)
  To: Jan Kiszka, qemu-devel
  Cc: Jailhouse, Claudio Fontana, Michael S . Tsirkin,
	Markus Armbruster, Hannes Reinecke, Stefan Hajnoczi

A quick check with checkpatch.pl turned up a few issues, all pretty
straightforward to fix.

ERROR: return is not a function, parentheses are not required
#211: FILE: hw/misc/ivshmem2.c:138:
+    return (ivs->features & (1 << feature));

ERROR: memory barrier without comment
#255: FILE: hw/misc/ivshmem2.c:182:
+    smp_mb();

ERROR: braces {} are necessary for all arms of this statement
#626: FILE: hw/misc/ivshmem2.c:553:
+    if (msg->vector == 0)
[...]
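
For reference, a sketch of what the three flagged spots would look like
after fixing (the barrier comment is my reading of the surrounding code):

    return ivs->features & (1 << feature);

    /* make the state table update visible before notifying the peers */
    smp_mb();

    if (msg->vector == 0) {
        peer->eventfds = g_new0(EventNotifier, s->vectors);
    }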

Best,
Liang


On 1/7/20 9:36 AM, Jan Kiszka wrote:
> From: Jan Kiszka <jan.kiszka@siemens.com>
> 
> This adds a reimplementation of ivshmem in its new revision 2 as a
> separate device. The goal of this is not to enable sharing with v1, but
> rather to allow exploring the properties and potential limitations of
> the new version prior to discussing its integration with the existing
> code.
> 
> v2 always requires a server to interconnect two or more QEMU
> instances because it provides signaling between peers unconditionally.
> Therefore, only the interconnecting chardev, master mode, and the usage
> of ioeventfd can be configured at device level. All other parameters are
> defined by the server instance.
> 
> A new server protocol is introduced along with it. Its primary difference
> is the introduction of a single welcome message that contains all peer
> parameters, rather than a series of single-word messages pushing them
> piece by piece.
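> 
> (The welcome message corresponds to the IvshmemInitialInfo structure in
> include/hw/misc/ivshmem2.h, introduced by this patch.)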
> 
> A complicating difference in interrupt handling, compared to v1, is
> the auto-disable mode of v2: When this is active, interrupt delivery
> is disabled by the device after each interrupt event. This prevents
> the usage of irqfd on the receiving side, but it lowers the handling
> cost for guests that implement interrupt throttling this way
> (specifically when exposing the device via UIO).
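> 
> To illustrate, a minimal guest-side sketch of the resulting interrupt
> loop for UIO (hypothetical code - only the register offset and the
> enable bit are taken from this patch):
> 
>     #include <stdint.h>
>     #include <unistd.h>
> 
>     #define IVSHMEM_REG_INT_CTRL    0x08
>     #define IVSHMEM_INT_ENABLE      0x1
> 
>     /* regs points to the mmap()ed register BAR of the device */
>     static void uio_event_loop(int uio_fd, volatile uint32_t *regs)
>     {
>         uint32_t count;
> 
>         regs[IVSHMEM_REG_INT_CTRL / 4] = IVSHMEM_INT_ENABLE;
>         while (read(uio_fd, &count, sizeof(count)) == sizeof(count)) {
>             /* ... process state changes and doorbell events ... */
> 
>             /* the device cleared INT_ENABLE after the event, so
>              * interrupt delivery has to be re-armed explicitly */
>             regs[IVSHMEM_REG_INT_CTRL / 4] = IVSHMEM_INT_ENABLE;
>         }
>     }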
> 
> No changes have been made to the ivshmem device regarding migration:
> Only the master can live-migrate, slave peers have to hot-unplug the
> device first.
> 
> The details of the device model will be specified in a succeeding
> commit. Drivers for this device can currently be found under
> 
> http://git.kiszka.org/?p=linux.git;a=shortlog;h=refs/heads/queues/ivshmem2
> 
> To instantiate an ivshmem v2 device, just add
> 
>  ... -chardev socket,path=/tmp/ivshmem_socket,id=ivshmem \
>      -device ivshmem,chardev=ivshmem
> 
> provided the server opened its socket under the default path.
> 
> Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
> ---
>  hw/misc/Makefile.objs      |    2 +-
>  hw/misc/ivshmem2.c         | 1085 ++++++++++++++++++++++++++++++++++++++++++++
>  include/hw/misc/ivshmem2.h |   48 ++
>  include/hw/pci/pci_ids.h   |    2 +
>  4 files changed, 1136 insertions(+), 1 deletion(-)
>  create mode 100644 hw/misc/ivshmem2.c
>  create mode 100644 include/hw/misc/ivshmem2.h
> 
> diff --git a/hw/misc/Makefile.objs b/hw/misc/Makefile.objs
> index ba898a5781..90a4a6608c 100644
> --- a/hw/misc/Makefile.objs
> +++ b/hw/misc/Makefile.objs
> @@ -26,7 +26,7 @@ common-obj-$(CONFIG_PUV3) += puv3_pm.o
>  
>  common-obj-$(CONFIG_MACIO) += macio/
>  
> -common-obj-$(CONFIG_IVSHMEM_DEVICE) += ivshmem.o
> +common-obj-$(CONFIG_IVSHMEM_DEVICE) += ivshmem.o ivshmem2.o
>  
>  common-obj-$(CONFIG_REALVIEW) += arm_sysctl.o
>  common-obj-$(CONFIG_NSERIES) += cbus.o
> diff --git a/hw/misc/ivshmem2.c b/hw/misc/ivshmem2.c
> new file mode 100644
> index 0000000000..d5f88ed0e9
> --- /dev/null
> +++ b/hw/misc/ivshmem2.c
> @@ -0,0 +1,1085 @@
> +/*
> + * Inter-VM Shared Memory PCI device, version 2.
> + *
> + * Copyright (c) Siemens AG, 2019
> + *
> + * Authors:
> + *  Jan Kiszka <jan.kiszka@siemens.com>
> + *
> + * Based on ivshmem.c by Cam Macdonell <cam@cs.ualberta.ca>
> + *
> + * This code is licensed under the GNU GPL v2.
> + */
> +
> +#include "qemu/osdep.h"
> +#include "qemu/units.h"
> +#include "qapi/error.h"
> +#include "qemu/cutils.h"
> +#include "hw/hw.h"
> +#include "hw/pci/pci.h"
> +#include "hw/pci/msi.h"
> +#include "hw/pci/msix.h"
> +#include "hw/qdev-properties.h"
> +#include "sysemu/kvm.h"
> +#include "migration/blocker.h"
> +#include "migration/vmstate.h"
> +#include "qemu/error-report.h"
> +#include "qemu/event_notifier.h"
> +#include "qemu/module.h"
> +#include "qom/object_interfaces.h"
> +#include "chardev/char-fe.h"
> +#include "sysemu/qtest.h"
> +#include "qapi/visitor.h"
> +
> +#include "hw/misc/ivshmem2.h"
> +
> +#define PCI_VENDOR_ID_IVSHMEM   PCI_VENDOR_ID_SIEMENS
> +#define PCI_DEVICE_ID_IVSHMEM   0x4106
> +
> +#define IVSHMEM_MAX_PEERS       UINT16_MAX
> +#define IVSHMEM_IOEVENTFD       0
> +#define IVSHMEM_MSI             1
> +
> +#define IVSHMEM_REG_BAR_SIZE    0x1000
> +
> +#define IVSHMEM_REG_ID          0x00
> +#define IVSHMEM_REG_MAX_PEERS   0x04
> +#define IVSHMEM_REG_INT_CTRL    0x08
> +#define IVSHMEM_REG_DOORBELL    0x0c
> +#define IVSHMEM_REG_STATE       0x10
> +
> +#define IVSHMEM_INT_ENABLE      0x1
> +
> +#define IVSHMEM_ONESHOT_MODE    0x1
> +
> +#define IVSHMEM_DEBUG 0
> +#define IVSHMEM_DPRINTF(fmt, ...)                       \
> +    do {                                                \
> +        if (IVSHMEM_DEBUG) {                            \
> +            printf("IVSHMEM: " fmt, ## __VA_ARGS__);    \
> +        }                                               \
> +    } while (0)
> +
> +#define TYPE_IVSHMEM "ivshmem"
> +#define IVSHMEM(obj) \
> +    OBJECT_CHECK(IVShmemState, (obj), TYPE_IVSHMEM)
> +
> +typedef struct Peer {
> +    int nb_eventfds;
> +    EventNotifier *eventfds;
> +} Peer;
> +
> +typedef struct MSIVector {
> +    PCIDevice *pdev;
> +    int virq;
> +    bool unmasked;
> +} MSIVector;
> +
> +typedef struct IVShmemVndrCap {
> +    uint8_t id;
> +    uint8_t next;
> +    uint8_t length;
> +    uint8_t priv_ctrl;
> +    uint32_t state_tab_sz;
> +    uint64_t rw_section_sz;
> +    uint64_t output_section_sz;
> +} IVShmemVndrCap;
> +
> +typedef struct IVShmemState {
> +    /*< private >*/
> +    PCIDevice parent_obj;
> +    /*< public >*/
> +
> +    uint32_t features;
> +
> +    CharBackend server_chr;
> +
> +    /* registers */
> +    uint8_t *priv_ctrl;
> +    uint32_t vm_id;
> +    uint32_t intctrl;
> +    uint32_t state;
> +
> +    /* BARs */
> +    MemoryRegion ivshmem_mmio; /* BAR 0 (registers) */
> +    MemoryRegion ivshmem_bar2; /* BAR 2 (shared memory) */
> +
> +    void *shmem;
> +    size_t shmem_sz;
> +    size_t output_section_sz;
> +
> +    MemoryRegion state_tab;
> +    MemoryRegion rw_section;
> +    MemoryRegion input_sections;
> +    MemoryRegion output_section;
> +
> +    /* interrupt support */
> +    Peer *peers;
> +    int nb_peers;               /* space in @peers[] */
> +    uint32_t max_peers;
> +    uint32_t vectors;
> +    MSIVector *msi_vectors;
> +
> +    uint8_t msg_buf[32];        /* buffer for receiving server messages */
> +    int msg_buffered_bytes;     /* #bytes in @msg_buf */
> +
> +    uint32_t protocol;
> +
> +    /* migration stuff */
> +    OnOffAuto master;
> +    Error *migration_blocker;
> +} IVShmemState;
> +
> +static void ivshmem_enable_irqfd(IVShmemState *s);
> +static void ivshmem_disable_irqfd(IVShmemState *s);
> +
> +static inline uint32_t ivshmem_has_feature(IVShmemState *ivs,
> +                                           unsigned int feature) {
> +    return (ivs->features & (1 << feature));
> +}
> +
> +static inline bool ivshmem_is_master(IVShmemState *s)
> +{
> +    assert(s->master != ON_OFF_AUTO_AUTO);
> +    return s->master == ON_OFF_AUTO_ON;
> +}
> +
> +static bool ivshmem_irqfd_usable(IVShmemState *s)
> +{
> +    PCIDevice *pdev = PCI_DEVICE(s);
> +
> +    return (s->intctrl & IVSHMEM_INT_ENABLE) && msix_enabled(pdev) &&
> +        !(*s->priv_ctrl & IVSHMEM_ONESHOT_MODE);
> +}
> +
> +static void ivshmem_update_irqfd(IVShmemState *s, bool was_usable)
> +{
> +    bool is_usable = ivshmem_irqfd_usable(s);
> +
> +    if (kvm_msi_via_irqfd_enabled()) {
> +        if (!was_usable && is_usable) {
> +            ivshmem_enable_irqfd(s);
> +        } else if (was_usable && !is_usable) {
> +            ivshmem_disable_irqfd(s);
> +        }
> +    }
> +}
> +
> +static void ivshmem_write_intctrl(IVShmemState *s, uint32_t new_state)
> +{
> +    bool was_usable = ivshmem_irqfd_usable(s);
> +
> +    s->intctrl = new_state & IVSHMEM_INT_ENABLE;
> +    ivshmem_update_irqfd(s, was_usable);
> +}
> +
> +static void ivshmem_write_state(IVShmemState *s, uint32_t new_state)
> +{
> +    uint32_t *state_table = s->shmem;
> +    int peer;
> +
> +    state_table[s->vm_id] = new_state;
> +    smp_mb();
> +
> +    if (s->state != new_state) {
> +        s->state = new_state;
> +        for (peer = 0; peer < s->nb_peers; peer++) {
> +            if (peer != s->vm_id && s->peers[peer].nb_eventfds > 0) {
> +                event_notifier_set(&s->peers[peer].eventfds[0]);
> +            }
> +        }
> +    }
> +}
> +
> +static void ivshmem_io_write(void *opaque, hwaddr addr,
> +                             uint64_t val, unsigned size)
> +{
> +    IVShmemState *s = opaque;
> +
> +    uint16_t dest = val >> 16;
> +    uint16_t vector = val & 0xff;
> +
> +    addr &= 0xfc;
> +
> +    IVSHMEM_DPRINTF("writing to addr " TARGET_FMT_plx "\n", addr);
> +    switch (addr) {
> +    case IVSHMEM_REG_INT_CTRL:
> +        ivshmem_write_intctrl(s, val);
> +        break;
> +
> +    case IVSHMEM_REG_DOORBELL:
> +        /* check that dest VM ID is reasonable */
> +        if (dest >= s->nb_peers) {
> +            IVSHMEM_DPRINTF("Invalid destination VM ID (%d)\n", dest);
> +            break;
> +        }
> +
> +        /* check doorbell range */
> +        if (vector < s->peers[dest].nb_eventfds) {
> +            IVSHMEM_DPRINTF("Notifying VM %d on vector %d\n", dest, vector);
> +            event_notifier_set(&s->peers[dest].eventfds[vector]);
> +        } else {
> +            IVSHMEM_DPRINTF("Invalid destination vector %d on VM %d\n",
> +                            vector, dest);
> +        }
> +        break;
> +
> +    case IVSHMEM_REG_STATE:
> +        ivshmem_write_state(s, val);
> +        break;
> +
> +    default:
> +        IVSHMEM_DPRINTF("Unhandled write " TARGET_FMT_plx "\n", addr);
> +    }
> +}
> +
> +static uint64_t ivshmem_io_read(void *opaque, hwaddr addr,
> +                                unsigned size)
> +{
> +    IVShmemState *s = opaque;
> +    uint32_t ret;
> +
> +    switch (addr) {
> +    case IVSHMEM_REG_ID:
> +        ret = s->vm_id;
> +        break;
> +
> +    case IVSHMEM_REG_MAX_PEERS:
> +        ret = s->max_peers;
> +        break;
> +
> +    case IVSHMEM_REG_INT_CTRL:
> +        ret = s->intctrl;
> +        break;
> +
> +    case IVSHMEM_REG_STATE:
> +        ret = s->state;
> +        break;
> +
> +    default:
> +        IVSHMEM_DPRINTF("why are we reading " TARGET_FMT_plx "\n", addr);
> +        ret = 0;
> +    }
> +
> +    return ret;
> +}
> +
> +static const MemoryRegionOps ivshmem_mmio_ops = {
> +    .read = ivshmem_io_read,
> +    .write = ivshmem_io_write,
> +    .endianness = DEVICE_NATIVE_ENDIAN,
> +    .impl = {
> +        .min_access_size = 4,
> +        .max_access_size = 4,
> +    },
> +};
> +
> +static void ivshmem_vector_notify(void *opaque)
> +{
> +    MSIVector *entry = opaque;
> +    PCIDevice *pdev = entry->pdev;
> +    IVShmemState *s = IVSHMEM(pdev);
> +    int vector = entry - s->msi_vectors;
> +    EventNotifier *n = &s->peers[s->vm_id].eventfds[vector];
> +
> +    if (!event_notifier_test_and_clear(n) ||
> +        !(s->intctrl & IVSHMEM_INT_ENABLE)) {
> +        return;
> +    }
> +
> +    IVSHMEM_DPRINTF("interrupt on vector %p %d\n", pdev, vector);
> +    if (ivshmem_has_feature(s, IVSHMEM_MSI)) {
> +        if (msix_enabled(pdev)) {
> +            msix_notify(pdev, vector);
> +        }
> +    } else if (pdev->config[PCI_INTERRUPT_PIN]) {
> +        pci_set_irq(pdev, 1);
> +        pci_set_irq(pdev, 0);
> +    }
> +    if (*s->priv_ctrl & IVSHMEM_ONESHOT_MODE) {
> +        s->intctrl &= ~IVSHMEM_INT_ENABLE;
> +    }
> +}
> +
> +static int ivshmem_irqfd_vector_unmask(PCIDevice *dev, unsigned vector,
> +                                       MSIMessage msg)
> +{
> +    IVShmemState *s = IVSHMEM(dev);
> +    EventNotifier *n = &s->peers[s->vm_id].eventfds[vector];
> +    MSIVector *v = &s->msi_vectors[vector];
> +    int ret;
> +
> +    IVSHMEM_DPRINTF("vector unmask %p %d\n", dev, vector);
> +    if (!v->pdev) {
> +        error_report("ivshmem: vector %d route does not exist", vector);
> +        return -EINVAL;
> +    }
> +    assert(!v->unmasked);
> +
> +    ret = kvm_irqchip_add_msi_route(kvm_state, vector, dev);
> +    if (ret < 0) {
> +        error_report("kvm_irqchip_add_msi_route failed");
> +        return ret;
> +    }
> +    v->virq = ret;
> +    kvm_irqchip_commit_routes(kvm_state);
> +
> +    ret = kvm_irqchip_add_irqfd_notifier_gsi(kvm_state, n, NULL, v->virq);
> +    if (ret < 0) {
> +        error_report("kvm_irqchip_add_irqfd_notifier_gsi failed");
> +        return ret;
> +    }
> +    v->unmasked = true;
> +
> +    return 0;
> +}
> +
> +static void ivshmem_irqfd_vector_mask(PCIDevice *dev, unsigned vector)
> +{
> +    IVShmemState *s = IVSHMEM(dev);
> +    EventNotifier *n = &s->peers[s->vm_id].eventfds[vector];
> +    MSIVector *v = &s->msi_vectors[vector];
> +    int ret;
> +
> +    IVSHMEM_DPRINTF("vector mask %p %d\n", dev, vector);
> +    if (!v->pdev) {
> +        error_report("ivshmem: vector %d route does not exist", vector);
> +        return;
> +    }
> +    assert(v->unmasked);
> +
> +    ret = kvm_irqchip_remove_irqfd_notifier_gsi(kvm_state, n, v->virq);
> +    if (ret < 0) {
> +        error_report("remove_irqfd_notifier_gsi failed");
> +        return;
> +    }
> +    kvm_irqchip_release_virq(kvm_state, v->virq);
> +
> +    v->unmasked = false;
> +}
> +
> +static void ivshmem_irqfd_vector_poll(PCIDevice *dev,
> +                                      unsigned int vector_start,
> +                                      unsigned int vector_end)
> +{
> +    IVShmemState *s = IVSHMEM(dev);
> +    unsigned int vector;
> +
> +    IVSHMEM_DPRINTF("vector poll %p %d-%d\n", dev, vector_start, vector_end);
> +
> +    vector_end = MIN(vector_end, s->vectors);
> +
> +    for (vector = vector_start; vector < vector_end; vector++) {
> +        EventNotifier *notifier = &s->peers[s->vm_id].eventfds[vector];
> +
> +        if (!msix_is_masked(dev, vector)) {
> +            continue;
> +        }
> +
> +        if (event_notifier_test_and_clear(notifier)) {
> +            msix_set_pending(dev, vector);
> +        }
> +    }
> +}
> +
> +static void ivshmem_watch_vector_notifier(IVShmemState *s, int vector)
> +{
> +    EventNotifier *n = &s->peers[s->vm_id].eventfds[vector];
> +    int eventfd = event_notifier_get_fd(n);
> +
> +    assert(!s->msi_vectors[vector].pdev);
> +    s->msi_vectors[vector].pdev = PCI_DEVICE(s);
> +
> +    qemu_set_fd_handler(eventfd, ivshmem_vector_notify,
> +                        NULL, &s->msi_vectors[vector]);
> +}
> +
> +static void ivshmem_unwatch_vector_notifier(IVShmemState *s, int vector)
> +{
> +    EventNotifier *n = &s->peers[s->vm_id].eventfds[vector];
> +    int eventfd = event_notifier_get_fd(n);
> +
> +    if (!s->msi_vectors[vector].pdev) {
> +        return;
> +    }
> +
> +    qemu_set_fd_handler(eventfd, NULL, NULL, NULL);
> +
> +    s->msi_vectors[vector].pdev = NULL;
> +}
> +
> +static void ivshmem_add_eventfd(IVShmemState *s, int posn, int i)
> +{
> +    memory_region_add_eventfd(&s->ivshmem_mmio,
> +                              IVSHMEM_REG_DOORBELL,
> +                              4,
> +                              true,
> +                              (posn << 16) | i,
> +                              &s->peers[posn].eventfds[i]);
> +}
> +
> +static void ivshmem_del_eventfd(IVShmemState *s, int posn, int i)
> +{
> +    memory_region_del_eventfd(&s->ivshmem_mmio,
> +                              IVSHMEM_REG_DOORBELL,
> +                              4,
> +                              true,
> +                              (posn << 16) | i,
> +                              &s->peers[posn].eventfds[i]);
> +}
> +
> +static void close_peer_eventfds(IVShmemState *s, int posn)
> +{
> +    int i, n;
> +
> +    assert(posn >= 0 && posn < s->nb_peers);
> +    n = s->peers[posn].nb_eventfds;
> +
> +    if (ivshmem_has_feature(s, IVSHMEM_IOEVENTFD)) {
> +        memory_region_transaction_begin();
> +        for (i = 0; i < n; i++) {
> +            ivshmem_del_eventfd(s, posn, i);
> +        }
> +        memory_region_transaction_commit();
> +    }
> +
> +    for (i = 0; i < n; i++) {
> +        event_notifier_cleanup(&s->peers[posn].eventfds[i]);
> +    }
> +
> +    g_free(s->peers[posn].eventfds);
> +    s->peers[posn].nb_eventfds = 0;
> +}
> +
> +static void resize_peers(IVShmemState *s, int nb_peers)
> +{
> +    int old_nb_peers = s->nb_peers;
> +    int i;
> +
> +    assert(nb_peers > old_nb_peers);
> +    IVSHMEM_DPRINTF("bumping storage to %d peers\n", nb_peers);
> +
> +    s->peers = g_realloc(s->peers, nb_peers * sizeof(Peer));
> +    s->nb_peers = nb_peers;
> +
> +    for (i = old_nb_peers; i < nb_peers; i++) {
> +        s->peers[i].eventfds = NULL;
> +        s->peers[i].nb_eventfds = 0;
> +    }
> +}
> +
> +static void ivshmem_add_kvm_msi_virq(IVShmemState *s, int vector, Error **errp)
> +{
> +    PCIDevice *pdev = PCI_DEVICE(s);
> +
> +    IVSHMEM_DPRINTF("ivshmem_add_kvm_msi_virq vector:%d\n", vector);
> +    assert(!s->msi_vectors[vector].pdev);
> +
> +    s->msi_vectors[vector].unmasked = false;
> +    s->msi_vectors[vector].pdev = pdev;
> +}
> +
> +static void ivshmem_remove_kvm_msi_virq(IVShmemState *s, int vector)
> +{
> +    IVSHMEM_DPRINTF("ivshmem_remove_kvm_msi_virq vector:%d\n", vector);
> +
> +    if (s->msi_vectors[vector].pdev == NULL) {
> +        return;
> +    }
> +
> +    if (s->msi_vectors[vector].unmasked) {
> +        ivshmem_irqfd_vector_mask(s->msi_vectors[vector].pdev, vector);
> +    }
> +
> +    s->msi_vectors[vector].pdev = NULL;
> +}
> +
> +static void process_msg_disconnect(IVShmemState *s, IvshmemPeerGone *msg,
> +                                   Error **errp)
> +{
> +    if (msg->header.len < sizeof(*msg)) {
> +        error_setg(errp, "Invalid peer-gone message size");
> +        return;
> +    }
> +
> +    le32_to_cpus(&msg->id);
> +
> +    IVSHMEM_DPRINTF("peer %d has gone away\n", msg->id);
> +    if (msg->id >= s->nb_peers || msg->id == s->vm_id) {
> +        error_setg(errp, "invalid peer %d", msg->id);
> +        return;
> +    }
> +    close_peer_eventfds(s, msg->id);
> +    event_notifier_set(&s->peers[s->vm_id].eventfds[0]);
> +}
> +
> +static void process_msg_connect(IVShmemState *s, IvshmemEventFd *msg, int fd,
> +                                Error **errp)
> +{
> +    Peer *peer;
> +
> +    if (msg->header.len < sizeof(*msg)) {
> +        error_setg(errp, "Invalid eventfd message size");
> +        close(fd);
> +        return;
> +    }
> +
> +    le32_to_cpus(&msg->id);
> +    le32_to_cpus(&msg->vector);
> +
> +    if (msg->id >= s->nb_peers) {
> +        resize_peers(s, msg->id + 1);
> +    }
> +
> +    peer = &s->peers[msg->id];
> +
> +    /*
> +     * The N-th connect message for this peer comes with the file
> +     * descriptor for vector N-1.
> +     */
> +    if (msg->vector != peer->nb_eventfds) {
> +        error_setg(errp, "Received vector %d out of order", msg->vector);
> +        close(fd);
> +        return;
> +    }
> +    if (peer->nb_eventfds >= s->vectors) {
> +        error_setg(errp, "Too many eventfds received, device has %d vectors",
> +                   s->vectors);
> +        close(fd);
> +        return;
> +    }
> +    peer->nb_eventfds++;
> +
> +    if (msg->vector == 0)
> +        peer->eventfds = g_new0(EventNotifier, s->vectors);
> +
> +    IVSHMEM_DPRINTF("eventfds[%d][%d] = %d\n", msg->id, msg->vector, fd);
> +    event_notifier_init_fd(&peer->eventfds[msg->vector], fd);
> +    fcntl_setfl(fd, O_NONBLOCK); /* msix/irqfd poll non block */
> +
> +    if (ivshmem_has_feature(s, IVSHMEM_IOEVENTFD)) {
> +        ivshmem_add_eventfd(s, msg->id, msg->vector);
> +    }
> +
> +    if (msg->id == s->vm_id) {
> +        ivshmem_watch_vector_notifier(s, peer->nb_eventfds - 1);
> +    }
> +}
> +
> +static int ivshmem_can_receive(void *opaque)
> +{
> +    IVShmemState *s = opaque;
> +
> +    assert(s->msg_buffered_bytes < sizeof(s->msg_buf));
> +    return sizeof(s->msg_buf) - s->msg_buffered_bytes;
> +}
> +
> +static void ivshmem_read(void *opaque, const uint8_t *buf, int size)
> +{
> +    IVShmemState *s = opaque;
> +    IvshmemMsgHeader *header = (IvshmemMsgHeader *)&s->msg_buf;
> +    Error *err = NULL;
> +    int fd;
> +
> +    assert(size >= 0 && s->msg_buffered_bytes + size <= sizeof(s->msg_buf));
> +    memcpy(s->msg_buf + s->msg_buffered_bytes, buf, size);
> +    s->msg_buffered_bytes += size;
> +    if (s->msg_buffered_bytes < sizeof(*header) ||
> +        s->msg_buffered_bytes < le32_to_cpu(header->len)) {
> +        return;
> +    }
> +
> +    fd = qemu_chr_fe_get_msgfd(&s->server_chr);
> +
> +    le32_to_cpus(&header->type);
> +    le32_to_cpus(&header->len);
> +
> +    switch (header->type) {
> +    case IVSHMEM_MSG_EVENT_FD:
> +        process_msg_connect(s, (IvshmemEventFd *)header, fd, &err);
> +        break;
> +    case IVSHMEM_MSG_PEER_GONE:
> +        process_msg_disconnect(s, (IvshmemPeerGone *)header, &err);
> +        break;
> +    default:
> +        error_setg(&err, "invalid message, type %d", header->type);
> +        break;
> +    }
> +    if (err) {
> +        error_report_err(err);
> +    }
> +
> +    s->msg_buffered_bytes -= header->len;
> +    memmove(s->msg_buf, s->msg_buf + header->len, s->msg_buffered_bytes);
> +}
> +
> +static void ivshmem_recv_setup(IVShmemState *s, Error **errp)
> +{
> +    IvshmemInitialInfo msg;
> +    struct stat buf;
> +    uint8_t dummy;
> +    int fd, n, ret;
> +
> +    n = 0;
> +    do {
> +        ret = qemu_chr_fe_read_all(&s->server_chr, (uint8_t *)&msg + n,
> +                                   sizeof(msg) - n);
> +        if (ret < 0) {
> +            if (ret == -EINTR) {
> +                continue;
> +            }
> +            error_setg_errno(errp, -ret, "read from server failed");
> +            return;
> +        }
> +        n += ret;
> +    } while (n < sizeof(msg));
> +
> +    fd = qemu_chr_fe_get_msgfd(&s->server_chr);
> +
> +    le32_to_cpus(&msg.header.type);
> +    le32_to_cpus(&msg.header.len);
> +    if (msg.header.type != IVSHMEM_MSG_INIT || msg.header.len < sizeof(msg)) {
> +        error_setg(errp, "server sent invalid initial info");
> +        return;
> +    }
> +
> +    /* consume additional bytes of message */
> +    msg.header.len -= sizeof(msg);
> +    while (msg.header.len > 0) {
> +        ret = qemu_chr_fe_read_all(&s->server_chr, &dummy, 1);
> +        if (ret < 0) {
> +            if (ret == -EINTR) {
> +                continue;
> +            }
> +            error_setg_errno(errp, -ret, "read from server failed");
> +            return;
> +        }
> +        msg.header.len -= ret;
> +    }
> +
> +    le32_to_cpus(&msg.compatible_version);
> +    if (msg.compatible_version != IVSHMEM_PROTOCOL_VERSION) {
> +        error_setg(errp, "server sent compatible version %u, expecting %u",
> +                   msg.compatible_version, IVSHMEM_PROTOCOL_VERSION);
> +        return;
> +    }
> +
> +    le32_to_cpus(&msg.id);
> +    if (msg.id > IVSHMEM_MAX_PEERS) {
> +        error_setg(errp, "server sent invalid ID");
> +        return;
> +    }
> +    s->vm_id = msg.id;
> +
> +    if (fstat(fd, &buf) < 0) {
> +        error_setg_errno(errp, errno,
> +            "can't determine size of shared memory sent by server");
> +        close(fd);
> +        return;
> +    }
> +
> +    s->shmem_sz = buf.st_size;
> +
> +    s->shmem = mmap(NULL, s->shmem_sz, PROT_READ | PROT_WRITE, MAP_SHARED,
> +                    fd, 0);
> +    if (s->shmem == MAP_FAILED) {
> +        error_setg_errno(errp, errno,
> +                         "can't map shared memory sent by server");
> +        return;
> +    }
> +
> +    le32_to_cpus(&msg.vectors);
> +    if (msg.vectors < 1 || msg.vectors > 1024) {
> +        error_setg(errp, "server sent invalid number of vectors message");
> +        return;
> +    }
> +    s->vectors = msg.vectors;
> +
> +    s->max_peers = le32_to_cpu(msg.max_peers);
> +    s->protocol = le32_to_cpu(msg.protocol);
> +    s->output_section_sz = le64_to_cpu(msg.output_section_size);
> +}
> +
> +/* Select the MSI-X vectors used by device.
> + * ivshmem maps events to vectors statically, so
> + * we just enable all vectors on init and after reset. */
> +static void ivshmem_msix_vector_use(IVShmemState *s)
> +{
> +    PCIDevice *d = PCI_DEVICE(s);
> +    int i;
> +
> +    for (i = 0; i < s->vectors; i++) {
> +        msix_vector_use(d, i);
> +    }
> +}
> +
> +static void ivshmem_reset(DeviceState *d)
> +{
> +    IVShmemState *s = IVSHMEM(d);
> +
> +    ivshmem_disable_irqfd(s);
> +
> +    s->intctrl = 0;
> +    ivshmem_write_state(s, 0);
> +    if (ivshmem_has_feature(s, IVSHMEM_MSI)) {
> +        ivshmem_msix_vector_use(s);
> +    }
> +}
> +
> +static int ivshmem_setup_interrupts(IVShmemState *s, Error **errp)
> +{
> +    /* allocate QEMU callback data for receiving interrupts */
> +    s->msi_vectors = g_malloc0(s->vectors * sizeof(MSIVector));
> +
> +    if (ivshmem_has_feature(s, IVSHMEM_MSI)) {
> +        if (msix_init_exclusive_bar(PCI_DEVICE(s), s->vectors, 1, errp)) {
> +            IVSHMEM_DPRINTF("msix requested but not available - disabling\n");
> +            s->features &= ~(IVSHMEM_MSI | IVSHMEM_IOEVENTFD);
> +        } else {
> +            IVSHMEM_DPRINTF("msix initialized (%d vectors)\n", s->vectors);
> +            ivshmem_msix_vector_use(s);
> +        }
> +    }
> +
> +    return 0;
> +}
> +
> +static void ivshmem_enable_irqfd(IVShmemState *s)
> +{
> +    PCIDevice *pdev = PCI_DEVICE(s);
> +    int i;
> +
> +    for (i = 0; i < s->peers[s->vm_id].nb_eventfds; i++) {
> +        Error *err = NULL;
> +
> +        ivshmem_unwatch_vector_notifier(s, i);
> +
> +        ivshmem_add_kvm_msi_virq(s, i, &err);
> +        if (err) {
> +            error_report_err(err);
> +            goto undo;
> +        }
> +    }
> +
> +    if (msix_set_vector_notifiers(pdev,
> +                                  ivshmem_irqfd_vector_unmask,
> +                                  ivshmem_irqfd_vector_mask,
> +                                  ivshmem_irqfd_vector_poll)) {
> +        error_report("ivshmem: msix_set_vector_notifiers failed");
> +        goto undo;
> +    }
> +    return;
> +
> +undo:
> +    while (--i >= 0) {
> +        ivshmem_remove_kvm_msi_virq(s, i);
> +    }
> +}
> +
> +static void ivshmem_disable_irqfd(IVShmemState *s)
> +{
> +    PCIDevice *pdev = PCI_DEVICE(s);
> +    int i;
> +
> +    if (!pdev->msix_vector_use_notifier) {
> +        return;
> +    }
> +
> +    msix_unset_vector_notifiers(pdev);
> +
> +    for (i = 0; i < s->peers[s->vm_id].nb_eventfds; i++) {
> +        ivshmem_remove_kvm_msi_virq(s, i);
> +        ivshmem_watch_vector_notifier(s, i);
> +    }
> +
> +}
> +
> +static void ivshmem_write_config(PCIDevice *pdev, uint32_t address,
> +                                 uint32_t val, int len)
> +{
> +    IVShmemState *s = IVSHMEM(pdev);
> +    bool was_usable = ivshmem_irqfd_usable(s);
> +
> +    pci_default_write_config(pdev, address, val, len);
> +    ivshmem_update_irqfd(s, was_usable);
> +}
> +
> +static void ivshmem_exit(PCIDevice *dev)
> +{
> +    IVShmemState *s = IVSHMEM(dev);
> +    int i;
> +
> +    if (s->migration_blocker) {
> +        migrate_del_blocker(s->migration_blocker);
> +        error_free(s->migration_blocker);
> +    }
> +
> +    if (memory_region_is_mapped(&s->rw_section)) {
> +        void *addr = memory_region_get_ram_ptr(&s->rw_section);
> +        int fd;
> +
> +        if (munmap(addr, memory_region_size(&s->rw_section)) == -1) {
> +            error_report("Failed to munmap shared memory %s",
> +                         strerror(errno));
> +        }
> +
> +        fd = memory_region_get_fd(&s->rw_section);
> +        close(fd);
> +
> +        vmstate_unregister_ram(&s->state_tab, DEVICE(dev));
> +        vmstate_unregister_ram(&s->rw_section, DEVICE(dev));
> +    }
> +
> +    if (s->peers) {
> +        for (i = 0; i < s->nb_peers; i++) {
> +            close_peer_eventfds(s, i);
> +        }
> +        g_free(s->peers);
> +    }
> +
> +    if (ivshmem_has_feature(s, IVSHMEM_MSI)) {
> +        msix_uninit_exclusive_bar(dev);
> +    }
> +
> +    g_free(s->msi_vectors);
> +}
> +
> +static int ivshmem_pre_load(void *opaque)
> +{
> +    IVShmemState *s = opaque;
> +
> +    if (!ivshmem_is_master(s)) {
> +        error_report("'peer' devices are not migratable");
> +        return -EINVAL;
> +    }
> +
> +    return 0;
> +}
> +
> +static int ivshmem_post_load(void *opaque, int version_id)
> +{
> +    IVShmemState *s = opaque;
> +
> +    if (ivshmem_has_feature(s, IVSHMEM_MSI)) {
> +        ivshmem_msix_vector_use(s);
> +    }
> +    return 0;
> +}
> +
> +static const VMStateDescription ivshmem_vmsd = {
> +    .name = TYPE_IVSHMEM,
> +    .version_id = 0,
> +    .minimum_version_id = 0,
> +    .pre_load = ivshmem_pre_load,
> +    .post_load = ivshmem_post_load,
> +    .fields = (VMStateField[]) {
> +        VMSTATE_PCI_DEVICE(parent_obj, IVShmemState),
> +        VMSTATE_MSIX(parent_obj, IVShmemState),
> +        VMSTATE_UINT32(state, IVShmemState),
> +        VMSTATE_UINT32(intctrl, IVShmemState),
> +        VMSTATE_END_OF_LIST()
> +    },
> +};
> +
> +static Property ivshmem_properties[] = {
> +    DEFINE_PROP_CHR("chardev", IVShmemState, server_chr),
> +    DEFINE_PROP_BIT("ioeventfd", IVShmemState, features, IVSHMEM_IOEVENTFD,
> +                    true),
> +    DEFINE_PROP_ON_OFF_AUTO("master", IVShmemState, master, ON_OFF_AUTO_OFF),
> +    DEFINE_PROP_END_OF_LIST(),
> +};
> +
> +static void ivshmem_init(Object *obj)
> +{
> +    IVShmemState *s = IVSHMEM(obj);
> +
> +    s->features |= (1 << IVSHMEM_MSI);
> +}
> +
> +static void ivshmem_realize(PCIDevice *dev, Error **errp)
> +{
> +    IVShmemState *s = IVSHMEM(dev);
> +    Chardev *chr = qemu_chr_fe_get_driver(&s->server_chr);
> +    size_t rw_section_sz, input_sections_sz;
> +    IVShmemVndrCap *vndr_cap;
> +    Error *err = NULL;
> +    uint8_t *pci_conf;
> +    int offset, priv_ctrl_pos;
> +    off_t shmem_pos;
> +
> +    if (!qemu_chr_fe_backend_connected(&s->server_chr)) {
> +        error_setg(errp, "You must specify a 'chardev'");
> +        return;
> +    }
> +
> +    /* IRQFD requires MSI */
> +    if (ivshmem_has_feature(s, IVSHMEM_IOEVENTFD) &&
> +        !ivshmem_has_feature(s, IVSHMEM_MSI)) {
> +        error_setg(errp, "ioeventfd/irqfd requires MSI");
> +        return;
> +    }
> +
> +    pci_conf = dev->config;
> +    pci_conf[PCI_COMMAND] = PCI_COMMAND_IO | PCI_COMMAND_MEMORY;
> +
> +    memory_region_init_io(&s->ivshmem_mmio, OBJECT(s), &ivshmem_mmio_ops, s,
> +                          "ivshmem.mmio", IVSHMEM_REG_BAR_SIZE);
> +
> +    /* region for registers */
> +    pci_register_bar(dev, 0, PCI_BASE_ADDRESS_SPACE_MEMORY,
> +                     &s->ivshmem_mmio);
> +
> +    assert(chr);
> +    IVSHMEM_DPRINTF("using shared memory server (socket = %s)\n",
> +                    chr->filename);
> +
> +    /*
> +     * Receive setup messages from server synchronously.
> +     * Older versions did it asynchronously, but that creates a
> +     * number of entertaining race conditions.
> +     */
> +    ivshmem_recv_setup(s, &err);
> +    if (err) {
> +        error_propagate(errp, err);
> +        return;
> +    }
> +
> +    /* we allocate enough space for 16 peers and grow as needed */
> +    resize_peers(s, 16);
> +
> +    if (s->master == ON_OFF_AUTO_ON && s->vm_id != 0) {
> +        error_setg(errp,
> +                   "Master must connect to the server before any peers");
> +        return;
> +    }
> +
> +    qemu_chr_fe_set_handlers(&s->server_chr, ivshmem_can_receive,
> +                             ivshmem_read, NULL, NULL, s, NULL, true);
> +
> +    if (ivshmem_setup_interrupts(s, errp) < 0) {
> +        error_prepend(errp, "Failed to initialize interrupts: ");
> +        return;
> +    }
> +
> +    memory_region_init(&s->ivshmem_bar2, OBJECT(s), "ivshmem.bar2",
> +                       s->shmem_sz);
> +
> +    input_sections_sz = s->output_section_sz * s->max_peers;
> +    if (input_sections_sz + 4096 > s->shmem_sz) {
> +        error_setg(errp,
> +                   "Invalid output section size, shared memory too small");
> +        return;
> +    }
> +    rw_section_sz = s->shmem_sz - input_sections_sz - 4096;
> +
> +    shmem_pos = 0;
> +    memory_region_init_ram_ptr(&s->state_tab, OBJECT(s), "ivshmem.state",
> +                               4096, s->shmem + shmem_pos);
> +    memory_region_set_readonly(&s->state_tab, true);
> +    memory_region_add_subregion(&s->ivshmem_bar2, shmem_pos, &s->state_tab);
> +
> +    vmstate_register_ram(&s->state_tab, DEVICE(s));
> +
> +    if (rw_section_sz > 0) {
> +        shmem_pos += 4096;
> +        memory_region_init_ram_ptr(&s->rw_section, OBJECT(s),
> +                                   "ivshmem.rw-section",
> +                                   rw_section_sz, s->shmem + shmem_pos);
> +        memory_region_add_subregion(&s->ivshmem_bar2, shmem_pos,
> +                                    &s->rw_section);
> +
> +        vmstate_register_ram(&s->rw_section, DEVICE(s));
> +    }
> +
> +    if (s->output_section_sz > 0) {
> +        shmem_pos += rw_section_sz;
> +        memory_region_init_ram_ptr(&s->input_sections, OBJECT(s),
> +                                   "ivshmem.input-sections", input_sections_sz,
> +                                   s->shmem + shmem_pos);
> +        memory_region_set_readonly(&s->input_sections, true);
> +        memory_region_add_subregion(&s->ivshmem_bar2, shmem_pos,
> +                                    &s->input_sections);
> +
> +        shmem_pos += s->vm_id * s->output_section_sz;
> +        memory_region_init_ram_ptr(&s->output_section, OBJECT(s),
> +                                   "ivshmem.output-section",
> +                                   s->output_section_sz, s->shmem + shmem_pos);
> +        memory_region_add_subregion_overlap(&s->ivshmem_bar2, shmem_pos,
> +                                            &s->output_section, 1);
> +
> +        vmstate_register_ram(&s->input_sections, DEVICE(s));
> +    }
> +
> +    pci_config_set_class(dev->config, 0xff00 | (s->protocol >> 8));
> +    pci_config_set_prog_interface(dev->config, (uint8_t)s->protocol);
> +
> +    offset = pci_add_capability(dev, PCI_CAP_ID_VNDR, 0, 0x18,
> +                                &error_abort);
> +    vndr_cap = (IVShmemVndrCap *)(pci_conf + offset);
> +    vndr_cap->length = 0x18;
> +    vndr_cap->state_tab_sz = cpu_to_le32(4096);
> +    vndr_cap->rw_section_sz = cpu_to_le64(rw_section_sz);
> +    vndr_cap->output_section_sz = cpu_to_le64(s->output_section_sz);
> +
> +    priv_ctrl_pos = offset + offsetof(IVShmemVndrCap, priv_ctrl);
> +    s->priv_ctrl = &dev->config[priv_ctrl_pos];
> +    dev->wmask[priv_ctrl_pos] |= IVSHMEM_ONESHOT_MODE;
> +
> +    if (s->master == ON_OFF_AUTO_AUTO) {
> +        s->master = s->vm_id == 0 ? ON_OFF_AUTO_ON : ON_OFF_AUTO_OFF;
> +    }
> +
> +    if (!ivshmem_is_master(s)) {
> +        error_setg(&s->migration_blocker,
> +                   "Migration is disabled when using feature 'peer mode' in device 'ivshmem'");
> +        migrate_add_blocker(s->migration_blocker, &err);
> +        if (err) {
> +            error_propagate(errp, err);
> +            error_free(s->migration_blocker);
> +            return;
> +        }
> +    }
> +
> +    pci_register_bar(PCI_DEVICE(s), 2,
> +                     PCI_BASE_ADDRESS_SPACE_MEMORY |
> +                     PCI_BASE_ADDRESS_MEM_PREFETCH |
> +                     PCI_BASE_ADDRESS_MEM_TYPE_64,
> +                     &s->ivshmem_bar2);
> +}
> +
> +static void ivshmem_class_init(ObjectClass *klass, void *data)
> +{
> +    DeviceClass *dc = DEVICE_CLASS(klass);
> +    PCIDeviceClass *k = PCI_DEVICE_CLASS(klass);
> +
> +    k->realize = ivshmem_realize;
> +    k->exit = ivshmem_exit;
> +    k->config_write = ivshmem_write_config;
> +    k->vendor_id = PCI_VENDOR_ID_IVSHMEM;
> +    k->device_id = PCI_DEVICE_ID_IVSHMEM;
> +    dc->reset = ivshmem_reset;
> +    set_bit(DEVICE_CATEGORY_MISC, dc->categories);
> +    dc->desc = "Inter-VM shared memory v2";
> +
> +    dc->props = ivshmem_properties;
> +    dc->vmsd = &ivshmem_vmsd;
> +}
> +
> +static const TypeInfo ivshmem_info = {
> +    .name          = TYPE_IVSHMEM,
> +    .parent        = TYPE_PCI_DEVICE,
> +    .instance_size = sizeof(IVShmemState),
> +    .instance_init = ivshmem_init,
> +    .class_init    = ivshmem_class_init,
> +    .interfaces = (InterfaceInfo[]) {
> +        { INTERFACE_CONVENTIONAL_PCI_DEVICE },
> +        { },
> +    },
> +};
> +
> +static void ivshmem_register_types(void)
> +{
> +    type_register_static(&ivshmem_info);
> +}
> +
> +type_init(ivshmem_register_types)
> diff --git a/include/hw/misc/ivshmem2.h b/include/hw/misc/ivshmem2.h
> new file mode 100644
> index 0000000000..ee994699e8
> --- /dev/null
> +++ b/include/hw/misc/ivshmem2.h
> @@ -0,0 +1,48 @@
> +/*
> + * Inter-VM Shared Memory PCI device, Version 2.
> + *
> + * Copyright (c) Siemens AG, 2019
> + *
> + * Authors:
> + *  Jan Kiszka <jan.kiszka@siemens.com>
> + *
> + * This work is licensed under the terms of the GNU GPL, version 2 or
> + * (at your option) any later version.
> + */
> +#ifndef IVSHMEM2_H
> +#define IVSHMEM2_H
> +
> +#define IVSHMEM_PROTOCOL_VERSION    2
> +
> +#define IVSHMEM_MSG_INIT            0
> +#define IVSHMEM_MSG_EVENT_FD        1
> +#define IVSHMEM_MSG_PEER_GONE       2
> +
> +typedef struct IvshmemMsgHeader {
> +    uint32_t type;
> +    uint32_t len;
> +} IvshmemMsgHeader;
> +
> +typedef struct IvshmemInitialInfo {
> +    IvshmemMsgHeader header;
> +    uint32_t version;
> +    uint32_t compatible_version;
> +    uint32_t id;
> +    uint32_t max_peers;
> +    uint32_t vectors;
> +    uint32_t protocol;
> +    uint64_t output_section_size;
> +} IvshmemInitialInfo;
> +
> +typedef struct IvshmemEventFd {
> +    IvshmemMsgHeader header;
> +    uint32_t id;
> +    uint32_t vector;
> +} IvshmemEventFd;
> +
> +typedef struct IvshmemPeerGone {
> +    IvshmemMsgHeader header;
> +    uint32_t id;
> +} IvshmemPeerGone;
> +
> +#endif /* IVSHMEM2_H */
> diff --git a/include/hw/pci/pci_ids.h b/include/hw/pci/pci_ids.h
> index 11f8ab7149..f0c5d5ed2b 100644
> --- a/include/hw/pci/pci_ids.h
> +++ b/include/hw/pci/pci_ids.h
> @@ -264,6 +264,8 @@
>  #define PCI_VENDOR_ID_NEC                0x1033
>  #define PCI_DEVICE_ID_NEC_UPD720200      0x0194
>  
> +#define PCI_VENDOR_ID_SIEMENS            0x110a
> +
>  #define PCI_VENDOR_ID_TEWS               0x1498
>  #define PCI_DEVICE_ID_TEWS_TPCI200       0x30C8
>  
> 



* Re: [RFC][PATCH v2 3/3] contrib: Add server for ivshmem revision 2
  2020-01-07 14:36 ` [RFC][PATCH v2 3/3] contrib: Add server for ivshmem " Jan Kiszka
@ 2020-04-29  4:22   ` Liang Yan
  0 siblings, 0 replies; 14+ messages in thread
From: Liang Yan @ 2020-04-29  4:22 UTC (permalink / raw)
  To: Jan Kiszka, qemu-devel
  Cc: Jailhouse, Claudio Fontana, Michael S . Tsirkin,
	Markus Armbruster, Hannes Reinecke, Stefan Hajnoczi

A quick check with checkpatch.pl turned up a few issues, all pretty
straightforward to fix.

ERROR: memory barrier without comment
#205: FILE: contrib/ivshmem2-server/ivshmem2-server.c:106:
+    smp_mb();

ERROR: spaces required around that '*' (ctx:VxV)
#753: FILE: contrib/ivshmem2-server/main.c:22:
+#define IVSHMEM_SERVER_DEFAULT_SHM_SIZE       (4*1024*1024)
                                                 ^

ERROR: spaces required around that '*' (ctx:VxV)
#753: FILE: contrib/ivshmem2-server/main.c:22:
+#define IVSHMEM_SERVER_DEFAULT_SHM_SIZE       (4*1024*1024)
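
Again, a sketch of the fixed lines (the barrier comment is my reading of
the surrounding code):

    server->state_table[peer->id] = 0;
    /* make the cleared state visible before the peer-gone messages */
    smp_mb();

and

    #define IVSHMEM_SERVER_DEFAULT_SHM_SIZE       (4 * 1024 * 1024)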


Best,
Liang



On 1/7/20 9:36 AM, Jan Kiszka wrote:
> From: Jan Kiszka <jan.kiszka@siemens.com>
> 
> This implements the server process for ivshmem v2 device models of QEMU.
> Again, no effort has been spent yet on sharing code with the v1 server.
> Parts have been copied, others were rewritten.
> 
> In addition to parameters of v1, this server now also specifies
> 
>  - the maximum number of peers to be connected (required to know in
>    advance because of v2's state table)
>  - the size of the output sections (can be 0)
>  - the protocol ID to be published to all peers
> 
> When a virtio protocol ID is chosen, only 2 peers can be connected.
> Furthermore, the server will signal the backend variant of the ID to the
> master instance and the frontend ID to the slave peer.
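> 
> For example, with -P 0x8003 (virtio console), the peer connecting as
> ID 0 (the master) is signaled protocol 0xc003, the backend variant,
> while the second peer is signaled 0x8003, the frontend variant. This
> is also where the ffc003 class code in the cover letter's new_id line
> for the console backend comes from.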
> 
> To start, e.g., a server that allows virtio console over ivshmem, call
> 
> ivshmem2-server -F -l 64K -n 2 -V 3 -P 0x8003
> 
> TODO: specify the new server protocol.
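> 
> (In short, as implemented below: on connect, the server sends an
> IVSHMEM_MSG_INIT message carrying IvshmemInitialInfo plus the shared
> memory fd; afterwards, one IVSHMEM_MSG_EVENT_FD message is sent per
> peer and vector, and IVSHMEM_MSG_PEER_GONE announces a disconnected
> peer.)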
> 
> Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
> ---
>  Makefile                                  |   3 +
>  Makefile.objs                             |   1 +
>  configure                                 |   1 +
>  contrib/ivshmem2-server/Makefile.objs     |   1 +
>  contrib/ivshmem2-server/ivshmem2-server.c | 462 ++++++++++++++++++++++++++++++
>  contrib/ivshmem2-server/ivshmem2-server.h | 158 ++++++++++
>  contrib/ivshmem2-server/main.c            | 313 ++++++++++++++++++++
>  7 files changed, 939 insertions(+)
>  create mode 100644 contrib/ivshmem2-server/Makefile.objs
>  create mode 100644 contrib/ivshmem2-server/ivshmem2-server.c
>  create mode 100644 contrib/ivshmem2-server/ivshmem2-server.h
>  create mode 100644 contrib/ivshmem2-server/main.c
> 
> diff --git a/Makefile b/Makefile
> index 6b5ad1121b..33bb0eefdb 100644
> --- a/Makefile
> +++ b/Makefile
> @@ -427,6 +427,7 @@ dummy := $(call unnest-vars,, \
>                  elf2dmp-obj-y \
>                  ivshmem-client-obj-y \
>                  ivshmem-server-obj-y \
> +                ivshmem2-server-obj-y \
>                  rdmacm-mux-obj-y \
>                  libvhost-user-obj-y \
>                  vhost-user-scsi-obj-y \
> @@ -655,6 +656,8 @@ ivshmem-client$(EXESUF): $(ivshmem-client-obj-y) $(COMMON_LDADDS)
>  	$(call LINK, $^)
>  ivshmem-server$(EXESUF): $(ivshmem-server-obj-y) $(COMMON_LDADDS)
>  	$(call LINK, $^)
> +ivshmem2-server$(EXESUF): $(ivshmem2-server-obj-y) $(COMMON_LDADDS)
> +	$(call LINK, $^)
>  endif
>  vhost-user-scsi$(EXESUF): $(vhost-user-scsi-obj-y) libvhost-user.a
>  	$(call LINK, $^)
> diff --git a/Makefile.objs b/Makefile.objs
> index 02bf5ce11d..ce243975ef 100644
> --- a/Makefile.objs
> +++ b/Makefile.objs
> @@ -115,6 +115,7 @@ qga-vss-dll-obj-y = qga/
>  elf2dmp-obj-y = contrib/elf2dmp/
>  ivshmem-client-obj-$(CONFIG_IVSHMEM) = contrib/ivshmem-client/
>  ivshmem-server-obj-$(CONFIG_IVSHMEM) = contrib/ivshmem-server/
> +ivshmem2-server-obj-$(CONFIG_IVSHMEM) = contrib/ivshmem2-server/
>  libvhost-user-obj-y = contrib/libvhost-user/
>  vhost-user-scsi.o-cflags := $(LIBISCSI_CFLAGS)
>  vhost-user-scsi.o-libs := $(LIBISCSI_LIBS)
> diff --git a/configure b/configure
> index 747d3b4120..1cb1427f1b 100755
> --- a/configure
> +++ b/configure
> @@ -6165,6 +6165,7 @@ if test "$want_tools" = "yes" ; then
>    fi
>    if [ "$ivshmem" = "yes" ]; then
>      tools="ivshmem-client\$(EXESUF) ivshmem-server\$(EXESUF) $tools"
> +    tools="ivshmem2-server\$(EXESUF) $tools"
>    fi
>    if [ "$curl" = "yes" ]; then
>        tools="elf2dmp\$(EXESUF) $tools"
> diff --git a/contrib/ivshmem2-server/Makefile.objs b/contrib/ivshmem2-server/Makefile.objs
> new file mode 100644
> index 0000000000..d233e18ec8
> --- /dev/null
> +++ b/contrib/ivshmem2-server/Makefile.objs
> @@ -0,0 +1 @@
> +ivshmem2-server-obj-y = ivshmem2-server.o main.o
> diff --git a/contrib/ivshmem2-server/ivshmem2-server.c b/contrib/ivshmem2-server/ivshmem2-server.c
> new file mode 100644
> index 0000000000..b341f1fcd0
> --- /dev/null
> +++ b/contrib/ivshmem2-server/ivshmem2-server.c
> @@ -0,0 +1,462 @@
> +/*
> + * Copyright 6WIND S.A., 2014
> + * Copyright (c) Siemens AG, 2019
> + *
> + * This work is licensed under the terms of the GNU GPL, version 2 or
> + * (at your option) any later version.  See the COPYING file in the
> + * top-level directory.
> + */
> +#include "qemu/osdep.h"
> +#include "qemu/host-utils.h"
> +#include "qemu/sockets.h"
> +#include "qemu/atomic.h"
> +
> +#include <sys/socket.h>
> +#include <sys/un.h>
> +
> +#include "ivshmem2-server.h"
> +
> +/* log a message on stdout if verbose=1 */
> +#define IVSHMEM_SERVER_DEBUG(server, fmt, ...) do { \
> +        if ((server)->args.verbose) {         \
> +            printf(fmt, ## __VA_ARGS__); \
> +        }                                \
> +    } while (0)
> +
> +/** maximum size of a huge page, used by ivshmem_server_ftruncate() */
> +#define IVSHMEM_SERVER_MAX_HUGEPAGE_SIZE (1024 * 1024 * 1024)
> +
> +/** default listen backlog (number of sockets not accepted) */
> +#define IVSHMEM_SERVER_LISTEN_BACKLOG 10
> +
> +/* send message to a client unix socket */
> +static int ivshmem_server_send_msg(int sock_fd, void *payload, int len, int fd)
> +{
> +    int ret;
> +    struct msghdr msg;
> +    struct iovec iov[1];
> +    union {
> +        struct cmsghdr cmsg;
> +        char control[CMSG_SPACE(sizeof(int))];
> +    } msg_control;
> +    struct cmsghdr *cmsg;
> +
> +    iov[0].iov_base = payload;
> +    iov[0].iov_len = len;
> +
> +    memset(&msg, 0, sizeof(msg));
> +    msg.msg_iov = iov;
> +    msg.msg_iovlen = 1;
> +
> +    /* if fd is specified, add it in a cmsg */
> +    if (fd >= 0) {
> +        memset(&msg_control, 0, sizeof(msg_control));
> +        msg.msg_control = &msg_control;
> +        msg.msg_controllen = sizeof(msg_control);
> +        cmsg = CMSG_FIRSTHDR(&msg);
> +        cmsg->cmsg_level = SOL_SOCKET;
> +        cmsg->cmsg_type = SCM_RIGHTS;
> +        cmsg->cmsg_len = CMSG_LEN(sizeof(int));
> +        memcpy(CMSG_DATA(cmsg), &fd, sizeof(fd));
> +    }
> +
> +    ret = sendmsg(sock_fd, &msg, 0);
> +    if (ret <= 0) {
> +        return -1;
> +    }
> +
> +    return 0;
> +}
> +
> +static int ivshmem_server_send_event_fd(int sock_fd, int peer_id, int vector,
> +                                        int fd)
> +{
> +    IvshmemEventFd msg = {
> +        .header = {
> +            .type = GUINT32_TO_LE(IVSHMEM_MSG_EVENT_FD),
> +            .len = GUINT32_TO_LE(sizeof(msg)),
> +        },
> +        .id = GUINT32_TO_LE(peer_id),
> +        .vector = GUINT32_TO_LE(vector),
> +    };
> +
> +    return ivshmem_server_send_msg(sock_fd, &msg, sizeof(msg), fd);
> +}
> +
> +/* free a peer when it has disconnected or when the server itself is
> + * shut down */
> +static void
> +ivshmem_server_free_peer(IvshmemServer *server, IvshmemServerPeer *peer)
> +{
> +    unsigned vector;
> +    IvshmemServerPeer *other_peer;
> +    IvshmemPeerGone msg = {
> +        .header = {
> +            .type = GUINT32_TO_LE(IVSHMEM_MSG_PEER_GONE),
> +            .len = GUINT32_TO_LE(sizeof(msg)),
> +        },
> +        .id = GUINT32_TO_LE(peer->id),
> +    };
> +
> +    IVSHMEM_SERVER_DEBUG(server, "free peer %" PRId64 "\n", peer->id);
> +    close(peer->sock_fd);
> +    QTAILQ_REMOVE(&server->peer_list, peer, next);
> +
> +    server->state_table[peer->id] = 0;
> +    smp_mb();
> +
> +    /* advertise the deletion to other peers */
> +    QTAILQ_FOREACH(other_peer, &server->peer_list, next) {
> +        ivshmem_server_send_msg(other_peer->sock_fd, &msg, sizeof(msg), -1);
> +    }
> +
> +    for (vector = 0; vector < peer->vectors_count; vector++) {
> +        event_notifier_cleanup(&peer->vectors[vector]);
> +    }
> +
> +    g_free(peer);
> +}
> +
> +/* send the peer id and the shm_fd just after a new client connection */
> +static int
> +ivshmem_server_send_initial_info(IvshmemServer *server, IvshmemServerPeer *peer)
> +{
> +    IvshmemInitialInfo msg = {
> +        .header = {
> +            .type = GUINT32_TO_LE(IVSHMEM_MSG_INIT),
> +            .len = GUINT32_TO_LE(sizeof(msg)),
> +        },
> +        .version = GUINT32_TO_LE(IVSHMEM_PROTOCOL_VERSION),
> +        .compatible_version = GUINT32_TO_LE(IVSHMEM_PROTOCOL_VERSION),
> +        .id = GUINT32_TO_LE(peer->id),
> +        .max_peers = GUINT32_TO_LE(server->args.max_peers),
> +        .vectors = GUINT32_TO_LE(server->args.vectors),
> +        .protocol = GUINT32_TO_LE(server->args.protocol),
> +        .output_section_size = GUINT64_TO_LE(server->args.output_section_size),
> +    };
> +    unsigned virtio_protocol;
> +    int ret;
> +
> +    if (server->args.protocol >= 0x8000) {
> +        virtio_protocol = server->args.protocol & ~0x4000;
> +        msg.protocol &= ~0x4000;
> +        if (peer->id == 0) {
> +            virtio_protocol |= 0x4000;
> +        }
> +        msg.protocol = GUINT32_TO_LE(virtio_protocol);
> +    }
> +
> +    ret = ivshmem_server_send_msg(peer->sock_fd, &msg, sizeof(msg),
> +                                  server->shm_fd);
> +    if (ret < 0) {
> +        IVSHMEM_SERVER_DEBUG(server, "cannot send initial info: %s\n",
> +                             strerror(errno));
> +        return -1;
> +    }
> +
> +    return 0;
> +}
> +
> +/* handle message on listening unix socket (new client connection) */
> +static int
> +ivshmem_server_handle_new_conn(IvshmemServer *server)
> +{
> +    IvshmemServerPeer *peer, *other_peer;
> +    struct sockaddr_un unaddr;
> +    socklen_t unaddr_len;
> +    int newfd;
> +    unsigned i;
> +
> +    /* accept the incoming connection */
> +    unaddr_len = sizeof(unaddr);
> +    newfd = qemu_accept(server->sock_fd,
> +                        (struct sockaddr *)&unaddr, &unaddr_len);
> +
> +    if (newfd < 0) {
> +        IVSHMEM_SERVER_DEBUG(server, "cannot accept() %s\n", strerror(errno));
> +        return -1;
> +    }
> +
> +    qemu_set_nonblock(newfd);
> +    IVSHMEM_SERVER_DEBUG(server, "accept()=%d\n", newfd);
> +
> +    /* allocate new structure for this peer */
> +    peer = g_malloc0(sizeof(*peer));
> +    peer->sock_fd = newfd;
> +
> +    /* get an unused peer id */
> +    /* XXX: this could use id allocation such as Linux IDA, or simply
> +     * a free-list */
> +    for (i = 0; i < G_MAXUINT16; i++) {
> +        if (ivshmem_server_search_peer(server, i) == NULL) {
> +            break;
> +        }
> +    }
> +    if (i >= server->args.max_peers) {
> +        IVSHMEM_SERVER_DEBUG(server, "cannot allocate new client id\n");
> +        close(newfd);
> +        g_free(peer);
> +        return -1;
> +    }
> +    peer->id = i;
> +
> +    /* create eventfd, one per vector */
> +    peer->vectors_count = server->args.vectors;
> +    for (i = 0; i < peer->vectors_count; i++) {
> +        if (event_notifier_init(&peer->vectors[i], FALSE) < 0) {
> +            IVSHMEM_SERVER_DEBUG(server, "cannot create eventfd\n");
> +            goto fail;
> +        }
> +    }
> +
> +    /* send peer id and shm fd */
> +    if (ivshmem_server_send_initial_info(server, peer) < 0) {
> +        IVSHMEM_SERVER_DEBUG(server, "cannot send initial info\n");
> +        goto fail;
> +    }
> +
> +    /* advertise the new peer to others */
> +    QTAILQ_FOREACH(other_peer, &server->peer_list, next) {
> +        for (i = 0; i < peer->vectors_count; i++) {
> +            ivshmem_server_send_event_fd(other_peer->sock_fd, peer->id, i,
> +                                         peer->vectors[i].wfd);
> +        }
> +    }
> +
> +    /* advertise the other peers to the new one */
> +    QTAILQ_FOREACH(other_peer, &server->peer_list, next) {
> +        for (i = 0; i < peer->vectors_count; i++) {
> +            ivshmem_server_send_event_fd(peer->sock_fd, other_peer->id, i,
> +                                         other_peer->vectors[i].wfd);
> +        }
> +    }
> +
> +    /* advertise the new peer to itself */
> +    for (i = 0; i < peer->vectors_count; i++) {
> +        ivshmem_server_send_event_fd(peer->sock_fd, peer->id, i,
> +                                     event_notifier_get_fd(&peer->vectors[i]));
> +    }
> +
> +    QTAILQ_INSERT_TAIL(&server->peer_list, peer, next);
> +    IVSHMEM_SERVER_DEBUG(server, "new peer id = %" PRId64 "\n",
> +                         peer->id);
> +    return 0;
> +
> +fail:
> +    while (i--) {
> +        event_notifier_cleanup(&peer->vectors[i]);
> +    }
> +    close(newfd);
> +    g_free(peer);
> +    return -1;
> +}
> +
> +/* Try to ftruncate a file to next power of 2 of shmsize.
> + * If it fails, all powers of 2 above shmsize are tested until
> + * we reach the maximum huge page size. This is useful
> + * if the shm file is in a hugetlbfs that cannot be truncated to the
> + * shm_size value. */
> +static int
> +ivshmem_server_ftruncate(int fd, unsigned shmsize)
> +{
> +    int ret;
> +    struct stat mapstat;
> +
> +    /* align shmsize to next power of 2 */
> +    shmsize = pow2ceil(shmsize);
> +
> +    if (fstat(fd, &mapstat) != -1 && mapstat.st_size == shmsize) {
> +        return 0;
> +    }
> +
> +    while (shmsize <= IVSHMEM_SERVER_MAX_HUGEPAGE_SIZE) {
> +        ret = ftruncate(fd, shmsize);
> +        if (ret == 0) {
> +            return ret;
> +        }
> +        shmsize *= 2;
> +    }
> +
> +    return -1;
> +}
> +
> +/* Init a new ivshmem server */
> +void ivshmem_server_init(IvshmemServer *server)
> +{
> +    server->sock_fd = -1;
> +    server->shm_fd = -1;
> +    server->state_table = NULL;
> +    QTAILQ_INIT(&server->peer_list);
> +}
> +
> +/* open shm, create and bind to the unix socket */
> +int
> +ivshmem_server_start(IvshmemServer *server)
> +{
> +    struct sockaddr_un sun;
> +    int shm_fd, sock_fd, ret;
> +    void *state_table;
> +
> +    /* open shm file */
> +    if (server->args.use_shm_open) {
> +        IVSHMEM_SERVER_DEBUG(server, "Using POSIX shared memory: %s\n",
> +                             server->args.shm_path);
> +        shm_fd = shm_open(server->args.shm_path, O_CREAT | O_RDWR, S_IRWXU);
> +    } else {
> +        gchar *filename = g_strdup_printf("%s/ivshmem.XXXXXX",
> +                                          server->args.shm_path);
> +        IVSHMEM_SERVER_DEBUG(server, "Using file-backed shared memory: %s\n",
> +                             server->args.shm_path);
> +        shm_fd = mkstemp(filename);
> +        unlink(filename);
> +        g_free(filename);
> +    }
> +
> +    if (shm_fd < 0) {
> +        fprintf(stderr, "cannot open shm file %s: %s\n", server->args.shm_path,
> +                strerror(errno));
> +        return -1;
> +    }
> +    if (ivshmem_server_ftruncate(shm_fd, server->args.shm_size) < 0) {
> +        fprintf(stderr, "ftruncate(%s) failed: %s\n", server->args.shm_path,
> +                strerror(errno));
> +        goto err_close_shm;
> +    }
> +    state_table = mmap(NULL, 4096, PROT_READ | PROT_WRITE, MAP_SHARED,
> +                       shm_fd, 0);
> +    if (state_table == MAP_FAILED) {
> +        fprintf(stderr, "mmap failed: %s\n", strerror(errno));
> +        goto err_close_shm;
> +    }
> +
> +    IVSHMEM_SERVER_DEBUG(server, "create & bind socket %s\n",
> +                         server->args.unix_socket_path);
> +
> +    /* create the unix listening socket */
> +    sock_fd = socket(AF_UNIX, SOCK_STREAM, 0);
> +    if (sock_fd < 0) {
> +        IVSHMEM_SERVER_DEBUG(server, "cannot create socket: %s\n",
> +                             strerror(errno));
> +        goto err_unmap;
> +    }
> +
> +    sun.sun_family = AF_UNIX;
> +    ret = snprintf(sun.sun_path, sizeof(sun.sun_path), "%s",
> +                   server->args.unix_socket_path);
> +    if (ret < 0 || ret >= sizeof(sun.sun_path)) {
> +        IVSHMEM_SERVER_DEBUG(server, "could not copy unix socket path\n");
> +        goto err_close_sock;
> +    }
> +    if (bind(sock_fd, (struct sockaddr *)&sun, sizeof(sun)) < 0) {
> +        IVSHMEM_SERVER_DEBUG(server, "cannot connect to %s: %s\n", sun.sun_path,
> +                             strerror(errno));
> +        goto err_close_sock;
> +    }
> +
> +    if (listen(sock_fd, IVSHMEM_SERVER_LISTEN_BACKLOG) < 0) {
> +        IVSHMEM_SERVER_DEBUG(server, "listen() failed: %s\n", strerror(errno));
> +        goto err_close_sock;
> +    }
> +
> +    server->sock_fd = sock_fd;
> +    server->shm_fd = shm_fd;
> +    server->state_table = state_table;
> +
> +    return 0;
> +
> +err_close_sock:
> +    close(sock_fd);
> +err_unmap:
> +    munmap(state_table, 4096);
> +err_close_shm:
> +    if (server->args.use_shm_open) {
> +        shm_unlink(server->args.shm_path);
> +    }
> +    close(shm_fd);
> +    return -1;
> +}
> +
> +/* close connections to clients, the unix socket and the shm fd */
> +void
> +ivshmem_server_close(IvshmemServer *server)
> +{
> +    IvshmemServerPeer *peer, *npeer;
> +
> +    IVSHMEM_SERVER_DEBUG(server, "close server\n");
> +
> +    QTAILQ_FOREACH_SAFE(peer, &server->peer_list, next, npeer) {
> +        ivshmem_server_free_peer(server, peer);
> +    }
> +
> +    unlink(server->args.unix_socket_path);
> +    if (server->args.use_shm_open) {
> +        shm_unlink(server->args.shm_path);
> +    }
> +    close(server->sock_fd);
> +    munmap(server->state_table, 4096);
> +    close(server->shm_fd);
> +    server->sock_fd = -1;
> +    server->shm_fd = -1;
> +}
> +
> +/* get the fd_set according to the unix socket and the peer list */
> +void
> +ivshmem_server_get_fds(const IvshmemServer *server, fd_set *fds, int *maxfd)
> +{
> +    IvshmemServerPeer *peer;
> +
> +    if (server->sock_fd == -1) {
> +        return;
> +    }
> +
> +    FD_SET(server->sock_fd, fds);
> +    if (server->sock_fd >= *maxfd) {
> +        *maxfd = server->sock_fd + 1;
> +    }
> +
> +    QTAILQ_FOREACH(peer, &server->peer_list, next) {
> +        FD_SET(peer->sock_fd, fds);
> +        if (peer->sock_fd >= *maxfd) {
> +            *maxfd = peer->sock_fd + 1;
> +        }
> +    }
> +}
> +
> +/* process incoming messages on the sockets in fd_set */
> +int
> +ivshmem_server_handle_fds(IvshmemServer *server, fd_set *fds, int maxfd)
> +{
> +    IvshmemServerPeer *peer, *peer_next;
> +
> +    if (server->sock_fd < maxfd && FD_ISSET(server->sock_fd, fds) &&
> +        ivshmem_server_handle_new_conn(server) < 0 && errno != EINTR) {
> +        IVSHMEM_SERVER_DEBUG(server, "ivshmem_server_handle_new_conn() "
> +                             "failed\n");
> +        return -1;
> +    }
> +
> +    QTAILQ_FOREACH_SAFE(peer, &server->peer_list, next, peer_next) {
> +        /* any message from a peer socket results in a close() */
> +        IVSHMEM_SERVER_DEBUG(server, "peer->sock_fd=%d\n", peer->sock_fd);
> +        if (peer->sock_fd < maxfd && FD_ISSET(peer->sock_fd, fds)) {
> +            ivshmem_server_free_peer(server, peer);
> +        }
> +    }
> +
> +    return 0;
> +}
> +
> +/* lookup peer from its id */
> +IvshmemServerPeer *
> +ivshmem_server_search_peer(IvshmemServer *server, int64_t peer_id)
> +{
> +    IvshmemServerPeer *peer;
> +
> +    QTAILQ_FOREACH(peer, &server->peer_list, next) {
> +        if (peer->id == peer_id) {
> +            return peer;
> +        }
> +    }
> +    return NULL;
> +}
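
For illustration, here is how a client that has received shm_fd from this
server might map the state table. This is a minimal sketch: the one-page
state table at offset 0 matches ivshmem_server_start() above, but mapping
it read-only, and everything about the layout beyond the first page, are
assumptions to be checked against the revision 2 spec in patch 2.

    #include <stdint.h>
    #include <sys/mman.h>

    /* Map the first page of the shared memory; the server maps the
     * same page read-write as its state table. */
    static const uint32_t *map_state_table(int shm_fd)
    {
        void *p = mmap(NULL, 4096, PROT_READ, MAP_SHARED, shm_fd, 0);

        return p == MAP_FAILED ? NULL : (const uint32_t *)p;
    }
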
> diff --git a/contrib/ivshmem2-server/ivshmem2-server.h b/contrib/ivshmem2-server/ivshmem2-server.h
> new file mode 100644
> index 0000000000..3fd6166577
> --- /dev/null
> +++ b/contrib/ivshmem2-server/ivshmem2-server.h
> @@ -0,0 +1,158 @@
> +/*
> + * Copyright 6WIND S.A., 2014
> + * Copyright (c) Siemens AG, 2019
> + *
> + * This work is licensed under the terms of the GNU GPL, version 2 or
> + * (at your option) any later version.  See the COPYING file in the
> + * top-level directory.
> + */
> +
> +#ifndef IVSHMEM2_SERVER_H
> +#define IVSHMEM2_SERVER_H
> +
> +/**
> + * The ivshmem server is a daemon that creates a unix socket in listen
> + * mode. The ivshmem clients (qemu or ivshmem-client) connect to this
> + * unix socket. For each client, the server creates several eventfds
> + * (see EVENTFD(2)), one per vector. These fds are transmitted to all
> + * clients using the SCM_RIGHTS cmsg message. Therefore, each client is
> + * able to send a notification to another client without being
> + * proxied by the server.
> + *
> + * We use this mechanism to send interrupts between guests.
> + * qemu is able to transform an event on an eventfd into a PCI MSI-X
> + * interrupt in the guest.
> + *
> + * The ivshmem server is also able to share the file descriptor
> + * associated with the ivshmem shared memory.
> + */
> +
> +#include <sys/select.h>
> +
> +#include "qemu/event_notifier.h"
> +#include "qemu/queue.h"
> +#include "hw/misc/ivshmem2.h"
> +
> +/**
> + * Maximum number of notification vectors supported by the server
> + */
> +#define IVSHMEM_SERVER_MAX_VECTORS 64
> +
> +/**
> + * Structure storing a peer
> + *
> + * Each time a client connects to an ivshmem server, a new
> + * IvshmemServerPeer structure is created. This peer and all its
> + * vectors are advertised to all connected clients through the connected
> + * unix sockets.
> + */
> +typedef struct IvshmemServerPeer {
> +    QTAILQ_ENTRY(IvshmemServerPeer) next;    /**< next in list*/
> +    int sock_fd;                             /**< connected unix sock */
> +    int64_t id;                              /**< the id of the peer */
> +    EventNotifier vectors[IVSHMEM_SERVER_MAX_VECTORS]; /**< one per vector */
> +    unsigned vectors_count;                  /**< number of vectors */
> +} IvshmemServerPeer;
> +
> +/**
> + * Structure describing ivshmem server arguments
> + */
> +typedef struct IvshmemServerArgs {
> +    bool verbose;                   /**< true to enable verbose mode */
> +    const char *unix_socket_path;   /**< pointer to unix socket file name */
> +    const char *shm_path;           /**< Path to the shared memory; path
> +                                         corresponds to a POSIX shm name or a
> +                                         hugetlbfs mount point. */
> +    bool use_shm_open;              /**< true to use shm_open, false for
> +                                         file-backed shared memory */
> +    uint64_t shm_size;              /**< total size of shared memory */
> +    uint64_t output_section_size;   /**< size of each output section */
> +    unsigned max_peers;             /**< maximum number of peers */
> +    unsigned vectors;               /**< interrupt vectors per client */
> +    unsigned protocol;              /**< protocol advertised to all clients */
> +} IvshmemServerArgs;
> +
> +/**
> + * Structure describing an ivshmem server
> + *
> + * This structure stores all information related to our server: the name
> + * of the server unix socket and the list of connected peers.
> + */
> +typedef struct IvshmemServer {
> +    IvshmemServerArgs args;          /**< server arguments */
> +    int sock_fd;                     /**< unix sock file descriptor */
> +    int shm_fd;                      /**< shm file descriptor */
> +    uint32_t *state_table;           /**< mapped state table */
> +    QTAILQ_HEAD(, IvshmemServerPeer) peer_list; /**< list of peers */
> +} IvshmemServer;
> +
> +/**
> + * Initialize an ivshmem server
> + *
> + * @server:         A pointer to an uninitialized IvshmemServer structure
> + */
> +void ivshmem_server_init(IvshmemServer *server);
> +
> +/**
> + * Open the shm, then create and bind to the unix socket
> + *
> + * @server: The pointer to the initialized IvshmemServer structure
> + *
> + * Returns: 0 on success, or a negative value on error
> + */
> +int ivshmem_server_start(IvshmemServer *server);
> +
> +/**
> + * Close the server
> + *
> + * Close connections to all clients, close the unix socket and the
> + * shared memory file descriptor. The structure remains initialized, so
> + * it is possible to call ivshmem_server_start() again after a call to
> + * ivshmem_server_close().
> + *
> + * @server: The ivshmem server
> + */
> +void ivshmem_server_close(IvshmemServer *server);
> +
> +/**
> + * Fill a fd_set with file descriptors to be monitored
> + *
> + * This function fills a fd_set with all file descriptors that must
> + * be polled (the unix server socket and the peers' unix sockets). It
> + * does not initialize the fd_set; that is up to the caller.
> + *
> + * @server: The ivshmem server
> + * @fds:    The fd_set to be updated
> + * @maxfd:  Must be set to the max file descriptor + 1 in fd_set. This value is
> + *          updated if this function adds a greater fd in fd_set.
> + */
> +void
> +ivshmem_server_get_fds(const IvshmemServer *server, fd_set *fds, int *maxfd);
> +
> +/**
> + * Read and handle new messages
> + *
> + * Given a fd_set (for instance filled by a call to select()), handle
> + * incoming messages from peers.
> + *
> + * @server: The ivshmem server
> + * @fds:    The fd_set containing the file descriptors to be checked. Note that
> + *          file descriptors that are not related to our server are ignored.
> + * @maxfd:  The maximum fd in fd_set, plus one.
> + *
> + * Returns: 0 on success, or a negative value on error
> + */
> +int ivshmem_server_handle_fds(IvshmemServer *server, fd_set *fds, int maxfd);
> +
> +/**
> + * Search a peer from its identifier
> + *
> + * @server:  The ivshmem server
> + * @peer_id: The identifier of the peer structure
> + *
> + * Returns:  The peer structure, or NULL if not found
> + */
> +IvshmemServerPeer *
> +ivshmem_server_search_peer(IvshmemServer *server, int64_t peer_id);
> +
> +#endif /* IVSHMEM2_SERVER_H */
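
The doc comment at the top of this header describes the handshake: eventfds
and the shared memory fd travel as SCM_RIGHTS ancillary data over the unix
socket. A minimal client-side receive could look like the sketch below. The
8-byte payload with at most one attached fd per message mirrors the classic
ivshmem protocol and is an assumption here; check the revision 2 framing
against the spec in patch 2.

    #include <stdint.h>
    #include <string.h>
    #include <sys/socket.h>
    #include <sys/uio.h>

    /* Receive one 8-byte message, extracting a file descriptor if the
     * server attached one via SCM_RIGHTS. Returns 0 on success. */
    static int recv_msg_with_fd(int sock, int64_t *payload, int *fd)
    {
        char control[CMSG_SPACE(sizeof(int))] = { 0 };
        struct iovec iov = {
            .iov_base = payload,
            .iov_len = sizeof(*payload),
        };
        struct msghdr msg = {
            .msg_iov = &iov,
            .msg_iovlen = 1,
            .msg_control = control,
            .msg_controllen = sizeof(control),
        };
        struct cmsghdr *cmsg;

        *fd = -1;
        if (recvmsg(sock, &msg, 0) != sizeof(*payload)) {
            return -1;
        }
        for (cmsg = CMSG_FIRSTHDR(&msg); cmsg;
             cmsg = CMSG_NXTHDR(&msg, cmsg)) {
            if (cmsg->cmsg_level == SOL_SOCKET &&
                cmsg->cmsg_type == SCM_RIGHTS) {
                memcpy(fd, CMSG_DATA(cmsg), sizeof(*fd));
            }
        }
        return 0;
    }
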
> diff --git a/contrib/ivshmem2-server/main.c b/contrib/ivshmem2-server/main.c
> new file mode 100644
> index 0000000000..35cd6fca0f
> --- /dev/null
> +++ b/contrib/ivshmem2-server/main.c
> @@ -0,0 +1,313 @@
> +/*
> + * Copyright 6WIND S.A., 2014
> + * Copyright (c) Siemens AG, 2019
> + *
> + * This work is licensed under the terms of the GNU GPL, version 2 or
> + * (at your option) any later version.  See the COPYING file in the
> + * top-level directory.
> + */
> +
> +#include "qemu/osdep.h"
> +#include "qapi/error.h"
> +#include "qemu/cutils.h"
> +#include "qemu/option.h"
> +#include "ivshmem2-server.h"
> +
> +#define IVSHMEM_SERVER_DEFAULT_FOREGROUND     0
> +#define IVSHMEM_SERVER_DEFAULT_PID_FILE       "/var/run/ivshmem-server.pid"
> +
> +#define IVSHMEM_SERVER_DEFAULT_VERBOSE        0
> +#define IVSHMEM_SERVER_DEFAULT_UNIX_SOCK_PATH "/tmp/ivshmem_socket"
> +#define IVSHMEM_SERVER_DEFAULT_SHM_PATH       "ivshmem"
> +#define IVSHMEM_SERVER_DEFAULT_SHM_SIZE       (4*1024*1024)
> +#define IVSHMEM_SERVER_DEFAULT_OUTPUT_SEC_SZ  0
> +#define IVSHMEM_SERVER_DEFAULT_MAX_PEERS      2
> +#define IVSHMEM_SERVER_DEFAULT_VECTORS        1
> +#define IVSHMEM_SERVER_DEFAULT_PROTOCOL       0
> +
> +/* used to quit on signal SIGTERM */
> +static int ivshmem_server_quit;
> +
> +static bool foreground = IVSHMEM_SERVER_DEFAULT_FOREGROUND;
> +static const char *pid_file = IVSHMEM_SERVER_DEFAULT_PID_FILE;
> +
> +static void
> +ivshmem_server_usage(const char *progname)
> +{
> +    printf("Usage: %s [OPTION]...\n"
> +           "  -h: show this help\n"
> +           "  -v: verbose mode\n"
> +           "  -F: foreground mode (default is to daemonize)\n"
> +           "  -p <pid-file>: path to the PID file (used in daemon mode only)\n"
> +           "     default " IVSHMEM_SERVER_DEFAULT_PID_FILE "\n"
> +           "  -S <unix-socket-path>: path to the unix socket to listen to\n"
> +           "     default " IVSHMEM_SERVER_DEFAULT_UNIX_SOCK_PATH "\n"
> +           "  -M <shm-name>: POSIX shared memory object to use\n"
> +           "     default " IVSHMEM_SERVER_DEFAULT_SHM_PATH "\n"
> +           "  -m <dir-name>: where to create shared memory\n"
> +           "  -l <size>: size of shared memory in bytes\n"
> +           "     suffixes K, M and G can be used, e.g. 1K means 1024\n"
> +           "     default %u\n"
> +           "  -o <size>: size of each output section in bytes "
> +                "(suffixes supported)\n"
> +           "     default %u\n"
> +           "  -n <peers>: maximum number of peers\n"
> +           "     default %u\n"
> +           "  -V <vectors>: number of vectors\n"
> +           "     default %u\n"
> +           "  -P <protocol>: 16-bit protocol to be advertised\n"
> +           "     default 0x%04x\n"
> +           "     When using virtio (0x8000...0xffff), only two peers are "
> +           "supported, peer 0\n"
> +           "     will become backend, peer 1 frontend\n",
> +           progname, IVSHMEM_SERVER_DEFAULT_SHM_SIZE,
> +           IVSHMEM_SERVER_DEFAULT_OUTPUT_SEC_SZ,
> +           IVSHMEM_SERVER_DEFAULT_MAX_PEERS, IVSHMEM_SERVER_DEFAULT_VECTORS,
> +           IVSHMEM_SERVER_DEFAULT_PROTOCOL);
> +}
> +
> +static void
> +ivshmem_server_help(const char *progname)
> +{
> +    fprintf(stderr, "Try '%s -h' for more information.\n", progname);
> +}
> +
> +/* parse the program arguments, exit on error */
> +static void
> +ivshmem_server_parse_args(IvshmemServerArgs *args, int argc, char *argv[])
> +{
> +    int c;
> +    unsigned long long v;
> +    Error *err = NULL;
> +
> +    while ((c = getopt(argc, argv, "hvFp:S:m:M:l:o:n:V:P:")) != -1) {
> +
> +        switch (c) {
> +        case 'h': /* help */
> +            ivshmem_server_usage(argv[0]);
> +            exit(0);
> +            break;
> +
> +        case 'v': /* verbose */
> +            args->verbose = 1;
> +            break;
> +
> +        case 'F': /* foreground */
> +            foreground = 1;
> +            break;
> +
> +        case 'p': /* pid file */
> +            pid_file = optarg;
> +            break;
> +
> +        case 'S': /* unix socket path */
> +            args->unix_socket_path = optarg;
> +            break;
> +
> +        case 'M': /* shm name */
> +        case 'm': /* dir name */
> +            args->shm_path = optarg;
> +            args->use_shm_open = c == 'M';
> +            break;
> +
> +        case 'l': /* shm size */
> +            parse_option_size("shm_size", optarg, &args->shm_size, &err);
> +            if (err) {
> +                error_report_err(err);
> +                ivshmem_server_help(argv[0]);
> +                exit(1);
> +            }
> +            break;
> +
> +        case 'o': /* output section size */
> +            parse_option_size("output_section_size", optarg,
> +                              &args->output_section_size, &err);
> +            if (err) {
> +                error_report_err(err);
> +                ivshmem_server_help(argv[0]);
> +                exit(1);
> +            }
> +            break;
> +
> +        case 'n': /* maximum number of peers */
> +            if (parse_uint_full(optarg, &v, 0) < 0) {
> +                fprintf(stderr, "cannot parse max-peers\n");
> +                ivshmem_server_help(argv[0]);
> +                exit(1);
> +            }
> +            args->max_peers = v;
> +            break;
> +
> +        case 'V': /* number of vectors */
> +            if (parse_uint_full(optarg, &v, 0) < 0) {
> +                fprintf(stderr, "cannot parse vectors\n");
> +                ivshmem_server_help(argv[0]);
> +                exit(1);
> +            }
> +            args->vectors = v;
> +            break;
> +
> +        case 'P': /* protocol */
> +            if (parse_uint_full(optarg, &v, 0) < 0) {
> +                fprintf(stderr, "cannot parse protocol\n");
> +                ivshmem_server_help(argv[0]);
> +                exit(1);
> +            }
> +            args->protocol = v;
> +            break;
> +
> +        default:
> +            ivshmem_server_usage(argv[0]);
> +            exit(1);
> +            break;
> +        }
> +    }
> +
> +    if (args->vectors > IVSHMEM_SERVER_MAX_VECTORS) {
> +        fprintf(stderr, "too many requested vectors (max is %d)\n",
> +                IVSHMEM_SERVER_MAX_VECTORS);
> +        ivshmem_server_help(argv[0]);
> +        exit(1);
> +    }
> +
> +    if (args->protocol >= 0x8000 && args->max_peers > 2) {
> +        fprintf(stderr, "virtio protocols only support 2 peers\n");
> +        ivshmem_server_help(argv[0]);
> +        exit(1);
> +    }
> +
> +    if (args->verbose == 1 && foreground == 0) {
> +        fprintf(stderr, "cannot use verbose in daemon mode\n");
> +        ivshmem_server_help(argv[0]);
> +        exit(1);
> +    }
> +}
> +
> +/* wait for events on listening server unix socket and connected client
> + * sockets */
> +static int
> +ivshmem_server_poll_events(IvshmemServer *server)
> +{
> +    fd_set fds;
> +    int ret = 0, maxfd;
> +
> +    while (!ivshmem_server_quit) {
> +
> +        FD_ZERO(&fds);
> +        maxfd = 0;
> +        ivshmem_server_get_fds(server, &fds, &maxfd);
> +
> +        ret = select(maxfd, &fds, NULL, NULL, NULL);
> +
> +        if (ret < 0) {
> +            if (errno == EINTR) {
> +                continue;
> +            }
> +
> +            fprintf(stderr, "select error: %s\n", strerror(errno));
> +            break;
> +        }
> +        if (ret == 0) {
> +            continue;
> +        }
> +
> +        if (ivshmem_server_handle_fds(server, &fds, maxfd) < 0) {
> +            fprintf(stderr, "ivshmem_server_handle_fds() failed\n");
> +            break;
> +        }
> +    }
> +
> +    return ret;
> +}
> +
> +static void
> +ivshmem_server_quit_cb(int signum)
> +{
> +    ivshmem_server_quit = 1;
> +}
> +
> +int
> +main(int argc, char *argv[])
> +{
> +    IvshmemServer server = {
> +        .args = {
> +            .verbose = IVSHMEM_SERVER_DEFAULT_VERBOSE,
> +            .unix_socket_path = IVSHMEM_SERVER_DEFAULT_UNIX_SOCK_PATH,
> +            .shm_path = IVSHMEM_SERVER_DEFAULT_SHM_PATH,
> +            .use_shm_open = true,
> +            .shm_size = IVSHMEM_SERVER_DEFAULT_SHM_SIZE,
> +            .output_section_size = IVSHMEM_SERVER_DEFAULT_OUTPUT_SEC_SZ,
> +            .max_peers = IVSHMEM_SERVER_DEFAULT_MAX_PEERS,
> +            .vectors = IVSHMEM_SERVER_DEFAULT_VECTORS,
> +            .protocol = IVSHMEM_SERVER_DEFAULT_PROTOCOL,
> +        },
> +    };
> +    struct sigaction sa, sa_quit;
> +    int ret = 1;
> +
> +    /*
> +     * Do not remove this notice without adding proper error handling!
> +     * Start with handling ivshmem_server_send_one_msg() failure.
> +     */
> +    printf("*** Example code, do not use in production ***\n");
> +
> +    /* parse arguments, will exit on error */
> +    ivshmem_server_parse_args(&server.args, argc, argv);
> +
> +    /* Ignore SIGPIPE, see this link for more info:
> +     * http://www.mail-archive.com/libevent-users@monkey.org/msg01606.html */
> +    sa.sa_handler = SIG_IGN;
> +    sa.sa_flags = 0;
> +    if (sigemptyset(&sa.sa_mask) == -1 ||
> +        sigaction(SIGPIPE, &sa, 0) == -1) {
> +        perror("failed to ignore SIGPIPE; sigaction");
> +        goto err;
> +    }
> +
> +    sa_quit.sa_handler = ivshmem_server_quit_cb;
> +    sa_quit.sa_flags = 0;
> +    if (sigemptyset(&sa_quit.sa_mask) == -1 ||
> +        sigaction(SIGTERM, &sa_quit, 0) == -1 ||
> +        sigaction(SIGINT, &sa_quit, 0) == -1) {
> +        perror("failed to add signal handler; sigaction");
> +        goto err;
> +    }
> +
> +    /* init the ivshmem server structure */
> +    ivshmem_server_init(&server);
> +
> +    /* start the ivshmem server (open shm & unix socket) */
> +    if (ivshmem_server_start(&server) < 0) {
> +        fprintf(stderr, "cannot bind\n");
> +        goto err;
> +    }
> +
> +    /* daemonize if asked to */
> +    if (!foreground) {
> +        FILE *fp;
> +
> +        if (qemu_daemon(1, 1) < 0) {
> +            fprintf(stderr, "cannot daemonize: %s\n", strerror(errno));
> +            goto err_close;
> +        }
> +
> +        /* write pid file */
> +        fp = fopen(pid_file, "w");
> +        if (fp == NULL) {
> +            fprintf(stderr, "cannot write pid file: %s\n", strerror(errno));
> +            goto err_close;
> +        }
> +
> +        fprintf(fp, "%d\n", (int) getpid());
> +        fclose(fp);
> +    }
> +
> +    ivshmem_server_poll_events(&server);
> +    fprintf(stdout, "server disconnected\n");
> +    ret = 0;
> +
> +err_close:
> +    ivshmem_server_close(&server);
> +err:
> +    return ret;
> +}
> 
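
A note on notification: once the handshake is complete, the server is out
of the data path. As the header's doc comment says, a peer signals another
peer by writing to the eventfd it received for the target (peer, vector)
pair, and QEMU turns that event into an MSI-X interrupt in the guest. A
sketch, relying only on standard EVENTFD(2) semantics:

    #include <stdint.h>
    #include <unistd.h>

    /* Ring the doorbell for one (peer, vector) pair. eventfd requires
     * an 8-byte counter increment; the kernel accumulates the value. */
    static int ring_doorbell(int vector_eventfd)
    {
        uint64_t inc = 1;

        return write(vector_eventfd, &inc, sizeof(inc)) == sizeof(inc)
               ? 0 : -1;
    }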



* Re: [RFC][PATCH v2 0/3] IVSHMEM version 2 device for QEMU
  2020-04-29  4:15 ` Liang Yan
@ 2020-04-29 11:50   ` Jan Kiszka
  0 siblings, 0 replies; 14+ messages in thread
From: Jan Kiszka @ 2020-04-29 11:50 UTC (permalink / raw)
  To: Liang Yan, qemu-devel
  Cc: Jailhouse, Claudio Fontana, Michael S . Tsirkin,
	Markus Armbruster, Hannes Reinecke, Stefan Hajnoczi

On 29.04.20 06:15, Liang Yan wrote:
> Hi, All,
> 
> I tested these patches; everything looked fine.
> 
> Test environment:
> Host: opensuse tumbleweed + latest upstream qemu  + these three patches
> Guest: opensuse tumbleweed root fs + custom kernel(5.5) + related
> uio-ivshmem driver + ivshmem-console/ivshmem-block tools
> 
> 
> 1. lspci show
> 
> 00:04.0 Unassigned class [ff80]: Siemens AG Device 4106 (prog-if 02)
> Subsystem: Red Hat, Inc. Device 1100
> Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr-
> Stepping- SERR+ FastB2B- DisINTx+
> Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort-
> <MAbort- >SERR- <PERR- INTx-
> Latency: 0
> Region 0: Memory at fea56000 (32-bit, non-prefetchable) [size=4K]
> Region 1: Memory at fea57000 (32-bit, non-prefetchable) [size=4K]
> Region 2: Memory at fd800000 (64-bit, prefetchable) [size=1M]
> Capabilities: [4c] Vendor Specific Information: Len=18 <?>
> Capabilities: [40] MSI-X: Enable+ Count=2 Masked-
> Vector table: BAR=1 offset=00000000
> PBA: BAR=1 offset=00000800
> Kernel driver in use: virtio-ivshmem
> 
> 
> 2. virtio-ivshmem-console test
> 2.1 ivshmem2-server(host)
> 
> airey:~/ivshmem/qemu/:[0]# ./ivshmem2-server -F -l 64K -n 2 -V 3 -P 0x8003
> *** Example code, do not use in production ***
> 
> 2.2 guest vm backend(test-01)
> localhost:~ # echo "110a 4106 1af4 1100 ffc003 ffffff" >
> /sys/bus/pci/drivers/uio_ivshmem/new_id
> [  185.831277] uio_ivshmem 0000:00:04.0: state_table at 0x00000000fd800000, size 0x0000000000001000
> [  185.835129] uio_ivshmem 0000:00:04.0: rw_section at 0x00000000fd801000, size 0x0000000000007000
> 
> localhost:~ # virtio/virtio-ivshmem-console /dev/uio0
> Waiting for peer to be ready...
> 
> 2.3 guest vm frontend(test-02)
> needs to be booted or rebooted after the backend is ready
> 
> 2.4 backend will show the serial output of the frontend
> 
> localhost:~ # virtio/virtio-ivshmem-console /dev/uio0
> Waiting for peer to be ready...
> 
> localhost:~/virtio # ./virtio-ivshmem-console /dev/uio0
> Waiting for peer to be ready...
> Starting virtio device
> device_status: 0x0
> device_status: 0x1
> device_status: 0x3
> device_features_sel: 1
> device_features_sel: 0
> driver_features_sel: 1
> driver_features[1]: 0x13
> driver_features_sel: 0
> driver_features[0]: 0x1
> device_status: 0xb
> queue_sel: 0
> queue size: 8
> queue driver vector: 1
> queue desc: 0x200
> queue driver: 0x280
> queue device: 0x2c0
> queue enable: 1
> queue_sel: 1
> queue size: 8
> queue driver vector: 2
> queue desc: 0x400
> queue driver: 0x480
> queue device: 0x4c0
> queue enable: 1
> device_status: 0xf
> 
> Welcome to openSUSE Tumbleweed 20200326 - Kernel 5.5.0-rc5-1-default+
> (hvc0).
> 
> enp0s3:
> 
> 
> localhost login:
> 
> 2.5 closing the backend makes the frontend show
> localhost:~ # [  185.685041] virtio-ivshmem 0000:00:04.0: backend failed!
> 
> 3. virtio-ivshmem-block test
> 
> 3.1 ivshmem2-server(host)
> airey:~/ivshmem/qemu/:[0]# ./ivshmem2-server -F -l 1M -n 2 -V 2 -P 0x8002
> *** Example code, do not use in production ***
> 
> 3.2 guest vm backend(test-01)
> 
> localhost:~ # echo "110a 4106 1af4 1100 ffc002 ffffff" >
> /sys/bus/pci/drivers/uio_ivshmem/new_id
> [   77.701462] uio_ivshmem 0000:00:04.0: state_table at 0x00000000fd800000, size 0x0000000000001000
> [   77.705231] uio_ivshmem 0000:00:04.0: rw_section at 0x00000000fd801000, size 0x00000000000ff000
> 
> localhost:~ # virtio/virtio-ivshmem-block /dev/uio0 /root/disk.img
> Waiting for peer to be ready...
> 
> 3.3 guest vm frontend(test-02)
> needs to be booted or rebooted after the backend is ready
> 
> 3.4 guest vm backend(test-01)
> localhost:~ # virtio/virtio-ivshmem-block /dev/uio0 /root/disk.img
> Waiting for peer to be ready...
> Starting virtio device
> device_status: 0x0
> device_status: 0x1
> device_status: 0x3
> device_features_sel: 1
> device_features_sel: 0
> driver_features_sel: 1
> driver_features[1]: 0x13
> driver_features_sel: 0
> driver_features[0]: 0x206
> device_status: 0xb
> queue_sel: 0
> queue size: 8
> queue driver vector: 1
> queue desc: 0x200
> queue driver: 0x280
> queue device: 0x2c0
> queue enable: 1
> device_status: 0xf
> 
> 3.5 guest vm frontend(test-02), a new disk is attached:
> 
> fdisk /dev/vdb
> 
> Disk /dev/vdb: 192 KiB, 196608 bytes, 384 sectors
> Units: sectors of 1 * 512 = 512 bytes
> Sector size (logical/physical): 512 bytes / 512 bytes
> I/O size (minimum/optimal): 512 bytes / 512 bytes
> 
> 3.6 closing the backend makes the frontend show
> localhost:~ # [ 1312.284301] virtio-ivshmem 0000:00:04.0: backend failed!
> 
> 
> 
> Tested-by: Liang Yan <lyan@suse.com>
> 

Thanks for testing this! I'll look into your patch findings.

Jan

> On 1/7/20 9:36 AM, Jan Kiszka wrote:
>> Overdue update of the ivshmem 2.0 device model as presented at [1].
>>
>> Changes in v2:
>>   - changed PCI device ID to Siemens-granted one,
>>     adjusted PCI device revision to 0
>>   - removed unused feature register from device
>>   - addressed feedback on specification document
>>   - rebased over master
>>
>> This version is now fully in sync with the implementation for Jailhouse
>> that is currently under review [2][3], UIO and virtio-ivshmem drivers
>> are shared. Jailhouse will very likely pick up this revision of the
>> device in order to move forward with stressing it.
>>
>> More details on the usage with QEMU were in the original cover letter
>> (with adjustments to the new device ID):
>>
>> If you want to play with this, the basic setup of the shared memory
>> device is described in patch 1 and 3. UIO driver and also the
>> virtio-ivshmem prototype can be found at
>>
>>      http://git.kiszka.org/?p=linux.git;a=shortlog;h=refs/heads/queues/ivshmem2
>>
>> Accessing the device via UIO is trivial enough. If you want to use it
>> for virtio, this is needed on the virtio console backend side, in
>> addition to the description in patch 3:
>>
>>      modprobe uio_ivshmem
>>      echo "110a 4106 1af4 1100 ffc003 ffffff" > /sys/bus/pci/drivers/uio_ivshmem/new_id
>>      linux/tools/virtio/virtio-ivshmem-console /dev/uio0
>>
>> And for virtio block:
>>
>>      echo "110a 4106 1af4 1100 ffc002 ffffff" > /sys/bus/pci/drivers/uio_ivshmem/new_id
>>      linux/tools/virtio/virtio-ivshmem-block /dev/uio0 /path/to/disk.img
>>
>> After that, you can start the QEMU frontend instance with the
>> virtio-ivshmem driver installed which can use the new /dev/hvc* or
>> /dev/vda* as usual.
>>
>> Any feedback welcome!
>>
>> Jan
>>
>> PS: Let me know if I missed someone potentially interested in this topic
>> on CC - or if you would like to be dropped from the list.
>>
>> [1] https://kvmforum2019.sched.com/event/TmxI
>> [2] https://groups.google.com/forum/#!topic/jailhouse-dev/ffnCcRh8LOs
>> [3] https://groups.google.com/forum/#!topic/jailhouse-dev/HX-0AGF1cjg
>>
>> Jan Kiszka (3):
>>    hw/misc: Add implementation of ivshmem revision 2 device
>>    docs/specs: Add specification of ivshmem device revision 2
>>    contrib: Add server for ivshmem revision 2
>>
>>   Makefile                                  |    3 +
>>   Makefile.objs                             |    1 +
>>   configure                                 |    1 +
>>   contrib/ivshmem2-server/Makefile.objs     |    1 +
>>   contrib/ivshmem2-server/ivshmem2-server.c |  462 ++++++++++++
>>   contrib/ivshmem2-server/ivshmem2-server.h |  158 +++++
>>   contrib/ivshmem2-server/main.c            |  313 +++++++++
>>   docs/specs/ivshmem-2-device-spec.md       |  376 ++++++++++
>>   hw/misc/Makefile.objs                     |    2 +-
>>   hw/misc/ivshmem2.c                        | 1085 +++++++++++++++++++++++++++++
>>   include/hw/misc/ivshmem2.h                |   48 ++
>>   include/hw/pci/pci_ids.h                  |    2 +
>>   12 files changed, 2451 insertions(+), 1 deletion(-)
>>   create mode 100644 contrib/ivshmem2-server/Makefile.objs
>>   create mode 100644 contrib/ivshmem2-server/ivshmem2-server.c
>>   create mode 100644 contrib/ivshmem2-server/ivshmem2-server.h
>>   create mode 100644 contrib/ivshmem2-server/main.c
>>   create mode 100644 docs/specs/ivshmem-2-device-spec.md
>>   create mode 100644 hw/misc/ivshmem2.c
>>   create mode 100644 include/hw/misc/ivshmem2.h
>>

-- 
Siemens AG, Corporate Technology, CT RDA IOT SES-DE
Corporate Competence Center Embedded Linux



* Re: [RFC][PATCH v2 2/3] docs/specs: Add specification of ivshmem device revision 2
  2020-01-07 14:36 ` [RFC][PATCH v2 2/3] docs/specs: Add specification of ivshmem device revision 2 Jan Kiszka
@ 2020-05-21 16:53   ` Alex Bennée
  2020-05-21 18:18     ` Michael S. Tsirkin
  0 siblings, 1 reply; 14+ messages in thread
From: Alex Bennée @ 2020-05-21 16:53 UTC (permalink / raw)
  To: Jan Kiszka
  Cc: liang yan, Jailhouse, Claudio Fontana, Michael S . Tsirkin,
	Markus Armbruster, qemu-devel, Hannes Reinecke, Stefan Hajnoczi


Jan Kiszka <jan.kiszka@siemens.com> writes:

> From: Jan Kiszka <jan.kiszka@siemens.com>
>
> This imports the ivshmem v2 specification draft from Jailhouse where the
> implementation is about to be merged now. The final home of the spec is
> to be decided, this shall just simplify the review process at this
> stage.
>
> Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
> ---
>  docs/specs/ivshmem-2-device-spec.md | 376 ++++++++++++++++++++++++++++++++++++
>  1 file changed, 376 insertions(+)
>  create mode 100644 docs/specs/ivshmem-2-device-spec.md
>
> diff --git a/docs/specs/ivshmem-2-device-spec.md b/docs/specs/ivshmem-2-device-spec.md
> new file mode 100644
> index 0000000000..d93cb22b04
> --- /dev/null
> +++ b/docs/specs/ivshmem-2-device-spec.md
> @@ -0,0 +1,376 @@
> +IVSHMEM Device Specification
> +============================
> +
> > +** NOTE: THIS IS WORK-IN-PROGRESS, NOT YET A STABLE INTERFACE SPECIFICATION! **

Has there been any proposal to the OASIS virtio spec to use this as a
transport for VirtIO?

-- 
Alex Bennée



* Re: [RFC][PATCH v2 2/3] docs/specs: Add specification of ivshmem device revision 2
  2020-05-21 16:53   ` Alex Bennée
@ 2020-05-21 18:18     ` Michael S. Tsirkin
  2020-05-22  7:18       ` Jan Kiszka
  0 siblings, 1 reply; 14+ messages in thread
From: Michael S. Tsirkin @ 2020-05-21 18:18 UTC (permalink / raw)
  To: Alex Bennée
  Cc: liang yan, Jailhouse, Claudio Fontana, Jan Kiszka,
	Markus Armbruster, qemu-devel, Hannes Reinecke, Stefan Hajnoczi

On Thu, May 21, 2020 at 05:53:31PM +0100, Alex Bennée wrote:
> 
> Jan Kiszka <jan.kiszka@siemens.com> writes:
> 
> > From: Jan Kiszka <jan.kiszka@siemens.com>
> >
> > This imports the ivshmem v2 specification draft from Jailhouse where the
> > implementation is about to be merged now. The final home of the spec is
> > to be decided, this shall just simplify the review process at this
> > stage.
> >
> > Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
> > ---
> >  docs/specs/ivshmem-2-device-spec.md | 376 ++++++++++++++++++++++++++++++++++++
> >  1 file changed, 376 insertions(+)
> >  create mode 100644 docs/specs/ivshmem-2-device-spec.md
> >
> > diff --git a/docs/specs/ivshmem-2-device-spec.md b/docs/specs/ivshmem-2-device-spec.md
> > new file mode 100644
> > index 0000000000..d93cb22b04
> > --- /dev/null
> > +++ b/docs/specs/ivshmem-2-device-spec.md
> > @@ -0,0 +1,376 @@
> > +IVSHMEM Device Specification
> > +============================
> > +
> > > +** NOTE: THIS IS WORK-IN-PROGRESS, NOT YET A STABLE INTERFACE SPECIFICATION! **
> 
> Has there been any proposal to the OASIS virtio spec to use this as a
> transport for VirtIO?
> 
> -- 
> Alex Bennée


I think integrating this into the virtio spec need not gate the
implementation, but I'd like to see the current spec copied to the
virtio-comments list to possibly get comments from that community.





* Re: [RFC][PATCH v2 2/3] docs/specs: Add specification of ivshmem device revision 2
  2020-05-21 18:18     ` Michael S. Tsirkin
@ 2020-05-22  7:18       ` Jan Kiszka
  0 siblings, 0 replies; 14+ messages in thread
From: Jan Kiszka @ 2020-05-22  7:18 UTC (permalink / raw)
  To: Michael S. Tsirkin, Alex Bennée
  Cc: liang yan, Jailhouse, Claudio Fontana, Markus Armbruster,
	qemu-devel, Hannes Reinecke, Stefan Hajnoczi

On 21.05.20 20:18, Michael S. Tsirkin wrote:
> On Thu, May 21, 2020 at 05:53:31PM +0100, Alex Bennée wrote:
>>
>> Jan Kiszka <jan.kiszka@siemens.com> writes:
>>
>>> From: Jan Kiszka <jan.kiszka@siemens.com>
>>>
>>> This imports the ivshmem v2 specification draft from Jailhouse where the
>>> implementation is about to be merged now. The final home of the spec is
>>> to be decided, this shall just simplify the review process at this
>>> stage.
>>>
>>> Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
>>> ---
>>>  docs/specs/ivshmem-2-device-spec.md | 376 ++++++++++++++++++++++++++++++++++++
>>>  1 file changed, 376 insertions(+)
>>>  create mode 100644 docs/specs/ivshmem-2-device-spec.md
>>>
>>> diff --git a/docs/specs/ivshmem-2-device-spec.md b/docs/specs/ivshmem-2-device-spec.md
>>> new file mode 100644
>>> index 0000000000..d93cb22b04
>>> --- /dev/null
>>> +++ b/docs/specs/ivshmem-2-device-spec.md
>>> @@ -0,0 +1,376 @@
>>> +IVSHMEM Device Specification
>>> +============================
>>> +
>>> +** NOTE: THIS IS WORK-IN-PROGRESS, NOT YET A STABLE INTERFACE SPECIFICATION! **
>>
>> Has there been any proposal to the OASIS virtio spec to use this as a
>> transport for VirtIO?
>>

Not yet, primarily because the second half - virtio over generic shared
memory, or ivshmem2 in particular - has not been written yet. It exists
only as prototypes. Lacking time.

>> -- 
>> Alex Bennée
> 
> 
> I think intergrating this in the virtio spec needs not gate the
> implementation, but I'd like to see the current spec copied to
> virtio-comments list to possibly get comments from that community.
> 
> 

Will do.

Jan

-- 
Siemens AG, Corporate Technology, CT RDA IOT SES-DE
Corporate Competence Center Embedded Linux



end of thread

Thread overview: 14+ messages
2020-01-07 14:36 [RFC][PATCH v2 0/3] IVSHMEM version 2 device for QEMU Jan Kiszka
2020-01-07 14:36 ` [RFC][PATCH v2 1/3] hw/misc: Add implementation of ivshmem revision 2 device Jan Kiszka
2020-04-29  4:20   ` Liang Yan
2020-01-07 14:36 ` [RFC][PATCH v2 2/3] docs/specs: Add specification of ivshmem device revision 2 Jan Kiszka
2020-05-21 16:53   ` Alex Bennée
2020-05-21 18:18     ` Michael S. Tsirkin
2020-05-22  7:18       ` Jan Kiszka
2020-01-07 14:36 ` [RFC][PATCH v2 3/3] contrib: Add server for ivshmem " Jan Kiszka
2020-04-29  4:22   ` Liang Yan
2020-04-09 13:52 ` [RFC][PATCH v2 0/3] IVSHMEM version 2 device for QEMU Liang Yan
2020-04-09 14:11   ` Jan Kiszka
2020-04-10  0:59     ` Liang Yan
2020-04-29  4:15 ` Liang Yan
2020-04-29 11:50   ` Jan Kiszka
