* [Qemu-devel] [PATCH 00/38] ivshmem: Fixes, cleanups, device model split
@ 2016-02-29 18:40 Markus Armbruster
  2016-02-29 18:40 ` [Qemu-devel] [PATCH 01/38] exec: Fix memory allocation when memory path names new file Markus Armbruster
                   ` (37 more replies)
  0 siblings, 38 replies; 118+ messages in thread
From: Markus Armbruster @ 2016-02-29 18:40 UTC
  To: qemu-devel; +Cc: claudio.fontana, cam, mlureau, david.marchand, pbonzini

Major issues addressed by this series:

* The specification document is incomplete and vague.  Rewritten.

* When a peer goes away, and its ID gets reused for another one,
  interrupts don't work.

* When configured for interrupts, we receive shared memory from the
  server some time after realize().  This creates a (usually
  short-lived) "no shared memory, yet" state.  If the guest wins the
  race, it is exposed to this state (known issue, if you count burying
  in docs/specs/ as "known").  If migration wins the race, it fails or
  corrupts memory.

* Interrupts are unreliable in a (usually small) time window after the
  destination peer connects.  I believe fixing this will require
  changing the client/server protocol, so just document it for now.

* The device cannot tell guest software whether it is
  configured for interrupts.  Fix that in a new, backwards-compatible
  revision of the guest ABI, and bump the PCI revision.  Deprecate the
  old revision.

* The device properties are a confusing mess and badly checked.
  Clean that up.

* Migration with interrupts relies on server behavior not guaranteed
  by the specification.  Tighten the specification.

Paolo, I'd like your opinion on PATCH 1, 3, 29 and 30.  I hope I can
get competent review for my other patches elsewhere, but you're of
course welcome to review them, too.

Markus Armbruster (38):
  exec: Fix memory allocation when memory path names new file
  qemu-doc: Fix ivshmem huge page example
  event_notifier: Make event_notifier_init_fd() #ifdef CONFIG_EVENTFD
  tests/libqos/pci-pc: Fix qpci_pc_iomap() to map BARs aligned
  ivshmem-test: Improve test case /ivshmem/single
  ivshmem-test: Clean up wait for devices to become operational
  ivshmem-test: Improve test cases /ivshmem/server-*
  ivshmem: Rewrite specification document
  ivshmem: Add missing newlines to debug printfs
  ivshmem: Compile debug prints unconditionally to prevent bit-rot
  ivshmem: Clean up after commit 9940c32
  ivshmem: Drop ivshmem_event() stub
  ivshmem: Don't destroy the chardev on version mismatch
  ivshmem: Fix harmless misuse of Error
  ivshmem: Failed realize() can leave migration blocker behind
  ivshmem: Clean up register callbacks
  ivshmem: Clean up MSI-X conditions
  ivshmem: Leave INTx alone when using MSI-X
  ivshmem: Assert interrupts are set up once
  ivshmem: Simplify rejection of invalid peer ID from server
  ivshmem: Disentangle ivshmem_read()
  ivshmem: Plug leaks on unplug, fix peer disconnect
  ivshmem: Receive shared memory synchronously in realize()
  ivshmem: Propagate errors through ivshmem_recv_setup()
  ivshmem: Rely on server sending the ID right after the version
  ivshmem: Drop the hackish test for UNIX domain chardev
  ivshmem: Simplify how we cope with short reads from server
  ivshmem: Tighten check of property "size"
  ivshmem: Implement shm=... with a memory backend
  ivshmem: Simplify memory regions for BAR 2 (shared memory)
  ivshmem: Inline check_shm_size() into its only caller
  qdev: New DEFINE_PROP_ON_OFF_AUTO
  ivshmem: Replace int role_val by OnOffAuto master
  ivshmem: Split ivshmem-plain, ivshmem-doorbell off ivshmem
  ivshmem: Clean up after the previous commit
  ivshmem: Drop ivshmem property x-memdev
  ivshmem: Require master to have ID zero
  contrib/ivshmem-server: Print "not for production" warning

 contrib/ivshmem-server/main.c      |    6 +
 default-configs/pci.mak            |    2 +-
 docs/specs/ivshmem-spec.txt        |  255 +++++++++
 docs/specs/ivshmem_device_spec.txt |  161 ------
 exec.c                             |   50 +-
 hw/core/qdev-properties.c          |   10 +
 hw/misc/ivshmem.c                  | 1115 +++++++++++++++++++-----------------
 include/hw/qdev-properties.h       |    3 +
 qemu-doc.texi                      |   47 +-
 tests/ivshmem-test.c               |   97 ++--
 tests/libqos/pci-pc.c              |    8 +-
 util/event_notifier-posix.c        |    6 +
 12 files changed, 996 insertions(+), 764 deletions(-)
 create mode 100644 docs/specs/ivshmem-spec.txt
 delete mode 100644 docs/specs/ivshmem_device_spec.txt

-- 
2.4.3


* [Qemu-devel] [PATCH 01/38] exec: Fix memory allocation when memory path names new file
  2016-02-29 18:40 [Qemu-devel] [PATCH 00/38] ivshmem: Fixes, cleanups, device model split Markus Armbruster
@ 2016-02-29 18:40 ` Markus Armbruster
  2016-03-01 11:35   ` Paolo Bonzini
  2016-02-29 18:40 ` [Qemu-devel] [PATCH 02/38] qemu-doc: Fix ivshmem huge page example Markus Armbruster
                   ` (36 subsequent siblings)
  37 siblings, 1 reply; 118+ messages in thread
From: Markus Armbruster @ 2016-02-29 18:40 UTC
  To: qemu-devel; +Cc: claudio.fontana, cam, mlureau, david.marchand, pbonzini

Commit 8d31d6b extended file_ram_alloc() to accept file names in
addition to directory names.  Even though it passes O_CREAT to open(),
it actually works only for existing files.  Reproducer adapted from
the commit's qemu-doc.texi update:

    $ qemu-system-x86_64 -object memory-backend-file,size=2M,mem-path=/dev/hugepages/my-shmem-file,id=mb1
    qemu-system-x86_64: -object memory-backend-file,size=2M,mem-path=/dev/hugepages/my-shmem-file,id=mb1: failed to get page size of file /dev/hugepages/my-shmem-file: No such file or directory

Rearrange the code to create the file (if necessary) before getting
its page size.

While there, replace "hugepages" by "guest RAM" in error messages,
because host memory backends can be used for purposes other than huge
pages, e.g. /dev/shm/ shared memory.  Help text of -mem-path agrees.

Cc: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Markus Armbruster <armbru@redhat.com>
---
 exec.c | 50 +++++++++++++++++++++++++++++++-------------------
 1 file changed, 31 insertions(+), 19 deletions(-)

diff --git a/exec.c b/exec.c
index c62c439..81b0063 100644
--- a/exec.c
+++ b/exec.c
@@ -1235,36 +1235,25 @@ static void *file_ram_alloc(RAMBlock *block,
                             const char *path,
                             Error **errp)
 {
+    bool unlink_on_error = false;
     struct stat st;
     char *filename;
     char *sanitized_name;
     char *c;
     void *area;
-    int fd;
+    int ret, fd;
     uint64_t hpagesize;
     Error *local_err = NULL;
 
-    hpagesize = gethugepagesize(path, &local_err);
-    if (local_err) {
-        error_propagate(errp, local_err);
-        goto error;
-    }
-    block->mr->align = hpagesize;
-
-    if (memory < hpagesize) {
-        error_setg(errp, "memory size 0x" RAM_ADDR_FMT " must be equal to "
-                   "or larger than huge page size 0x%" PRIx64,
-                   memory, hpagesize);
-        goto error;
-    }
-
     if (kvm_enabled() && !kvm_has_sync_mmu()) {
         error_setg(errp,
                    "host lacks kvm mmu notifiers, -mem-path unsupported");
-        goto error;
+        return NULL;
     }
 
-    if (!stat(path, &st) && S_ISDIR(st.st_mode)) {
+    ret = stat(path, &st);
+    if (!ret && S_ISDIR(st.st_mode)) {
+        /* path names a directory -> create a temporary file there */
         /* Make name safe to use with mkstemp by replacing '/' with '_'. */
         sanitized_name = g_strdup(memory_region_name(block->mr));
         for (c = sanitized_name; *c != '\0'; c++) {
@@ -1282,13 +1271,32 @@ static void *file_ram_alloc(RAMBlock *block,
             unlink(filename);
         }
         g_free(filename);
+    } else if (!ret) {
+        /* path names an existing file -> use it */
+        fd = open(path, O_RDWR);
     } else {
+        /* create a new file */
         fd = open(path, O_RDWR | O_CREAT, 0644);
+        unlink_on_error = true;
     }
 
     if (fd < 0) {
         error_setg_errno(errp, errno,
-                         "unable to create backing store for hugepages");
+                         "unable to create backing store for guest RAM");
+        return NULL;
+    }
+
+    hpagesize = gethugepagesize(path, &local_err);
+    if (local_err) {
+        error_propagate(errp, local_err);
+        goto error;
+    }
+    block->mr->align = hpagesize;
+
+    if (memory < hpagesize) {
+        error_setg(errp, "memory size 0x" RAM_ADDR_FMT " must be equal to "
+                   "or larger than page size 0x%" PRIx64,
+                   memory, hpagesize);
         goto error;
     }
 
@@ -1307,7 +1315,7 @@ static void *file_ram_alloc(RAMBlock *block,
     area = qemu_ram_mmap(fd, memory, hpagesize, block->flags & RAM_SHARED);
     if (area == MAP_FAILED) {
         error_setg_errno(errp, errno,
-                         "unable to map backing store for hugepages");
+                         "unable to map backing store for guest RAM");
         close(fd);
         goto error;
     }
@@ -1320,6 +1328,10 @@ static void *file_ram_alloc(RAMBlock *block,
     return area;
 
 error:
+    if (unlink_on_error) {
+        unlink(path);
+    }
+    close(fd);
     return NULL;
 }
 #endif
-- 
2.4.3


* [Qemu-devel] [PATCH 02/38] qemu-doc: Fix ivshmem huge page example
  2016-02-29 18:40 [Qemu-devel] [PATCH 00/38] ivshmem: Fixes, cleanups, device model split Markus Armbruster
  2016-02-29 18:40 ` [Qemu-devel] [PATCH 01/38] exec: Fix memory allocation when memory path names new file Markus Armbruster
@ 2016-02-29 18:40 ` Markus Armbruster
  2016-03-01 10:51   ` Marc-André Lureau
  2016-03-01 11:35   ` Paolo Bonzini
  2016-02-29 18:40 ` [Qemu-devel] [PATCH 03/38] event_notifier: Make event_notifier_init_fd() #ifdef CONFIG_EVENTFD Markus Armbruster
                   ` (35 subsequent siblings)
  37 siblings, 2 replies; 118+ messages in thread
From: Markus Armbruster @ 2016-02-29 18:40 UTC
  To: qemu-devel; +Cc: claudio.fontana, cam, mlureau, david.marchand, pbonzini

Option parameter "share" is missing.  Without it, you get a *private*
mmap(), which defeats ivshmem's purpose pretty thoroughly ;)

While there, switch to the conventional hugetlbfs mount point,
/dev/hugepages.

Signed-off-by: Markus Armbruster <armbru@redhat.com>
---
 qemu-doc.texi | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/qemu-doc.texi b/qemu-doc.texi
index bc9dd13..65f3b29 100644
--- a/qemu-doc.texi
+++ b/qemu-doc.texi
@@ -1311,7 +1311,7 @@ Instead of specifying the <shm size> using POSIX shm, you may specify
 a memory backend that has hugepage support:
 
 @example
-qemu-system-i386 -object memory-backend-file,size=1G,mem-path=/mnt/hugepages/my-shmem-file,id=mb1
+qemu-system-i386 -object memory-backend-file,size=1G,mem-path=/dev/hugepages/my-shmem-file,share,id=mb1
                  -device ivshmem,x-memdev=mb1
 @end example
 
-- 
2.4.3


* [Qemu-devel] [PATCH 03/38] event_notifier: Make event_notifier_init_fd() #ifdef CONFIG_EVENTFD
  2016-02-29 18:40 [Qemu-devel] [PATCH 00/38] ivshmem: Fixes, cleanups, device model split Markus Armbruster
  2016-02-29 18:40 ` [Qemu-devel] [PATCH 01/38] exec: Fix memory allocation when memory path names new file Markus Armbruster
  2016-02-29 18:40 ` [Qemu-devel] [PATCH 02/38] qemu-doc: Fix ivshmem huge page example Markus Armbruster
@ 2016-02-29 18:40 ` Markus Armbruster
  2016-03-01 10:57   ` Marc-André Lureau
  2016-03-01 11:35   ` Paolo Bonzini
  2016-02-29 18:40 ` [Qemu-devel] [PATCH 04/38] tests/libqos/pci-pc: Fix qpci_pc_iomap() to map BARs aligned Markus Armbruster
                   ` (34 subsequent siblings)
  37 siblings, 2 replies; 118+ messages in thread
From: Markus Armbruster @ 2016-02-29 18:40 UTC
  To: qemu-devel; +Cc: claudio.fontana, cam, mlureau, david.marchand, pbonzini

Event notifiers are designed for eventfd(2).  They can fall back to
pipes, but according to Paolo, event_notifier_init_fd() really
requires the real thing, and should therefore be under #ifdef
CONFIG_EVENTFD.  Do that.

Its only user is ivshmem, which is currently CONFIG_POSIX.  Narrow it
to CONFIG_EVENTFD.
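
For illustration only, a minimal sketch of why one descriptor is
enough when it is a real eventfd(2): the same fd is both written and
read.  The struct and function names below are hypothetical stand-ins,
not QEMU's EventNotifier API, and the sketch assumes Linux and an fd
created with EFD_NONBLOCK.

```c
#include <assert.h>
#include <stdint.h>
#include <sys/eventfd.h>
#include <unistd.h>

/* Hypothetical stand-in for an event notifier: a read and a write end. */
struct notifier {
    int rfd;
    int wfd;
};

/* Wrap an existing eventfd: one descriptor serves both directions. */
static void notifier_init_fd(struct notifier *n, int fd)
{
    assert(fd >= 0);
    n->rfd = fd;
    n->wfd = fd;
}

/* Signal the notifier by adding 1 to the eventfd counter. */
static int notifier_set(struct notifier *n)
{
    uint64_t one = 1;

    return write(n->wfd, &one, sizeof(one)) == (ssize_t)sizeof(one) ? 0 : -1;
}

/*
 * Return 1 if the notifier was signaled since the last call, else 0.
 * A successful read resets the eventfd counter to zero.  With
 * EFD_NONBLOCK, reading a zero counter fails instead of blocking.
 */
static int notifier_test_and_clear(struct notifier *n)
{
    uint64_t count;

    return read(n->rfd, &count, sizeof(count)) == (ssize_t)sizeof(count);
}
```

With a pipe, the read end and the write end are two different
descriptors, so a single fd cannot fill both roles in this sketch.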

Cc: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Markus Armbruster <armbru@redhat.com>
---
 default-configs/pci.mak     | 2 +-
 util/event_notifier-posix.c | 6 ++++++
 2 files changed, 7 insertions(+), 1 deletion(-)

diff --git a/default-configs/pci.mak b/default-configs/pci.mak
index 4fa9a28..9c8bc68 100644
--- a/default-configs/pci.mak
+++ b/default-configs/pci.mak
@@ -36,5 +36,5 @@ CONFIG_SDHCI=y
 CONFIG_EDU=y
 CONFIG_VGA=y
 CONFIG_VGA_PCI=y
-CONFIG_IVSHMEM=$(CONFIG_POSIX)
+CONFIG_IVSHMEM=$(CONFIG_EVENTFD)
 CONFIG_ROCKER=y
diff --git a/util/event_notifier-posix.c b/util/event_notifier-posix.c
index 2e30e74..c9657a6 100644
--- a/util/event_notifier-posix.c
+++ b/util/event_notifier-posix.c
@@ -20,11 +20,17 @@
 #include <sys/eventfd.h>
 #endif
 
+#ifdef CONFIG_EVENTFD
+/*
+ * Initialize @e with existing file descriptor @fd.
+ * @fd must be a genuine eventfd object, emulation with pipe won't do.
+ */
 void event_notifier_init_fd(EventNotifier *e, int fd)
 {
     e->rfd = fd;
     e->wfd = fd;
 }
+#endif
 
 int event_notifier_init(EventNotifier *e, int active)
 {
-- 
2.4.3


* [Qemu-devel] [PATCH 04/38] tests/libqos/pci-pc: Fix qpci_pc_iomap() to map BARs aligned
  2016-02-29 18:40 [Qemu-devel] [PATCH 00/38] ivshmem: Fixes, cleanups, device model split Markus Armbruster
                   ` (2 preceding siblings ...)
  2016-02-29 18:40 ` [Qemu-devel] [PATCH 03/38] event_notifier: Make event_notifier_init_fd() #ifdef CONFIG_EVENTFD Markus Armbruster
@ 2016-02-29 18:40 ` Markus Armbruster
  2016-03-01 11:05   ` Marc-André Lureau
  2016-02-29 18:40 ` [Qemu-devel] [PATCH 05/38] ivshmem-test: Improve test case /ivshmem/single Markus Armbruster
                   ` (33 subsequent siblings)
  37 siblings, 1 reply; 118+ messages in thread
From: Markus Armbruster @ 2016-02-29 18:40 UTC
  To: qemu-devel; +Cc: claudio.fontana, cam, mlureau, david.marchand, pbonzini

qpci_pc_iomap() maps BARs one after the other, without padding.  This
is wrong.  PCI Local Bus Specification Revision 3.0, 6.2.5.1. Address
Maps: "all address spaces used are a power of two in size and are
naturally aligned".  That's because the size of a BAR is given by the
number of address bits the device decodes, and the BAR needs to be
mapped at a multiple of that size to ensure the address decoding
works.

Fix qpci_pc_iomap() accordingly.  This takes care of a FIXME in
ivshmem-test.
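
The alignment rule can be sketched in isolation as follows.  This is
an illustration, not the libqos code: struct hole and hole_map_bar()
are hypothetical names, and the sketch assumes the hole's start
address is itself aligned to the largest BAR being mapped.

```c
#include <assert.h>
#include <stdint.h>

/* Round @n up to a multiple of @size; @size must be a power of two. */
static uint64_t align_up(uint64_t n, uint64_t size)
{
    assert(size != 0 && (size & (size - 1)) == 0);
    return (n + size - 1) & ~(size - 1);
}

/* Hypothetical bump allocator for an address hole. */
struct hole {
    uint64_t start;   /* base address, assumed suitably aligned */
    uint64_t size;    /* total bytes available */
    uint64_t alloc;   /* bytes handed out so far, including padding */
};

/*
 * Map a BAR of @bar_size bytes at the next naturally aligned offset.
 * Padding is inserted before the BAR when the previous allocation
 * left the offset unaligned.
 */
static uint64_t hole_map_bar(struct hole *h, uint64_t bar_size)
{
    uint64_t off = align_up(h->alloc, bar_size);

    assert(off + bar_size <= h->size);
    h->alloc = off + bar_size;
    return h->start + off;
}
```

Mapping a 256 byte BAR and then a 1 MiB BAR places the second one at
offset 0x100000 rather than 0x100, which is why the BAR order in the
test no longer matters.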

Signed-off-by: Markus Armbruster <armbru@redhat.com>
---
 tests/ivshmem-test.c  | 17 ++++++++---------
 tests/libqos/pci-pc.c |  8 ++++++--
 2 files changed, 14 insertions(+), 11 deletions(-)

diff --git a/tests/ivshmem-test.c b/tests/ivshmem-test.c
index e184c67..e118377 100644
--- a/tests/ivshmem-test.c
+++ b/tests/ivshmem-test.c
@@ -110,19 +110,18 @@ static void setup_vm_cmd(IVState *s, const char *cmd, bool msix)
     s->pcibus = qpci_init_pc();
     s->dev = get_device(s->pcibus);
 
-    /* FIXME: other bar order fails, mappings changes */
-    s->mem_base = qpci_iomap(s->dev, 2, &barsize);
-    g_assert_nonnull(s->mem_base);
-    g_assert_cmpuint(barsize, ==, TMPSHMSIZE);
-
-    if (msix) {
-        qpci_msix_enable(s->dev);
-    }
-
     s->reg_base = qpci_iomap(s->dev, 0, &barsize);
     g_assert_nonnull(s->reg_base);
     g_assert_cmpuint(barsize, ==, 256);
 
+    if (msix) {
+        qpci_msix_enable(s->dev);
+    }
+
+    s->mem_base = qpci_iomap(s->dev, 2, &barsize);
+    g_assert_nonnull(s->mem_base);
+    g_assert_cmpuint(barsize, ==, TMPSHMSIZE);
+
     qpci_device_enable(s->dev);
 }
 
diff --git a/tests/libqos/pci-pc.c b/tests/libqos/pci-pc.c
index 08167c0..77f15e5 100644
--- a/tests/libqos/pci-pc.c
+++ b/tests/libqos/pci-pc.c
@@ -184,7 +184,9 @@ static void *qpci_pc_iomap(QPCIBus *bus, QPCIDevice *dev, int barno, uint64_t *s
     if (io_type == PCI_BASE_ADDRESS_SPACE_IO) {
         uint16_t loc;
 
-        g_assert((s->pci_iohole_alloc + size) <= s->pci_iohole_size);
+        g_assert(QEMU_ALIGN_UP(s->pci_iohole_alloc, size) + size
+                 <= s->pci_iohole_size);
+        s->pci_iohole_alloc = QEMU_ALIGN_UP(s->pci_iohole_alloc, size);
         loc = s->pci_iohole_start + s->pci_iohole_alloc;
         s->pci_iohole_alloc += size;
 
@@ -194,7 +196,9 @@ static void *qpci_pc_iomap(QPCIBus *bus, QPCIDevice *dev, int barno, uint64_t *s
     } else {
         uint64_t loc;
 
-        g_assert((s->pci_hole_alloc + size) <= s->pci_hole_size);
+        g_assert(QEMU_ALIGN_UP(s->pci_hole_alloc, size) + size
+                 <= s->pci_hole_size);
+        s->pci_hole_alloc = QEMU_ALIGN_UP(s->pci_hole_alloc, size);
         loc = s->pci_hole_start + s->pci_hole_alloc;
         s->pci_hole_alloc += size;
 
-- 
2.4.3


* [Qemu-devel] [PATCH 05/38] ivshmem-test: Improve test case /ivshmem/single
  2016-02-29 18:40 [Qemu-devel] [PATCH 00/38] ivshmem: Fixes, cleanups, device model split Markus Armbruster
                   ` (3 preceding siblings ...)
  2016-02-29 18:40 ` [Qemu-devel] [PATCH 04/38] tests/libqos/pci-pc: Fix qpci_pc_iomap() to map BARs aligned Markus Armbruster
@ 2016-02-29 18:40 ` Markus Armbruster
  2016-03-01 11:06   ` Marc-André Lureau
  2016-02-29 18:40 ` [Qemu-devel] [PATCH 06/38] ivshmem-test: Clean up wait for devices to become operational Markus Armbruster
                   ` (32 subsequent siblings)
  37 siblings, 1 reply; 118+ messages in thread
From: Markus Armbruster @ 2016-02-29 18:40 UTC
  To: qemu-devel; +Cc: claudio.fontana, cam, mlureau, david.marchand, pbonzini

Test state of registers after reset.

Test reading Interrupt Status clears it.

Test (invalid) read of Doorbell.

Add more comments.
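
As a rough model of the register behavior these checks rely on:
reading Interrupt Status returns its value and clears it, and
(per the specification, for a device without MSI-X capability) INTx
follows the bit-wise AND of Status and Mask.  The struct and function
names are hypothetical, chosen for this sketch only; this is not the
device code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Toy model of the two interrupt registers in BAR 0. */
struct intr_regs {
    uint32_t mask;     /* Interrupt Mask, offset 0 */
    uint32_t status;   /* Interrupt Status, offset 4 */
};

/* INTx is asserted while (status & mask) is non-zero. */
static bool intx_asserted(const struct intr_regs *r)
{
    return (r->status & r->mask) != 0;
}

/* Reading Interrupt Status returns the current value and clears it. */
static uint32_t read_status(struct intr_regs *r)
{
    uint32_t val = r->status;

    r->status = 0;
    return val;
}

/* Usage mirroring the test: unmask, raise, read twice. */
static void intr_regs_example(void)
{
    struct intr_regs r = { .mask = 0xffffffff, .status = 0 };

    r.status = 1;                   /* interrupt raised */
    assert(intx_asserted(&r));
    assert(read_status(&r) == 1);   /* first read sees it */
    assert(read_status(&r) == 0);   /* reading cleared it */
    assert(!intx_asserted(&r));
}
```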

Signed-off-by: Markus Armbruster <armbru@redhat.com>
---
 tests/ivshmem-test.c | 23 ++++++++++++++++-------
 1 file changed, 16 insertions(+), 7 deletions(-)

diff --git a/tests/ivshmem-test.c b/tests/ivshmem-test.c
index e118377..ba4d9f1 100644
--- a/tests/ivshmem-test.c
+++ b/tests/ivshmem-test.c
@@ -143,32 +143,41 @@ static void test_ivshmem_single(void)
     setup_vm(&state);
     s = &state;
 
-    /* valid io */
-    out_reg(s, INTRMASK, 0);
-    in_reg(s, INTRSTATUS);
-    in_reg(s, IVPOSITION);
+    /* initial state of readable registers */
+    g_assert_cmpuint(in_reg(s, INTRMASK), ==, 0);
+    g_assert_cmpuint(in_reg(s, INTRSTATUS), ==, 0);
+    g_assert_cmpuint(in_reg(s, IVPOSITION), ==, 0);
 
+    /* trigger interrupt via registers */
     out_reg(s, INTRMASK, 0xffffffff);
     g_assert_cmpuint(in_reg(s, INTRMASK), ==, 0xffffffff);
     out_reg(s, INTRSTATUS, 1);
-    /* XXX: intercept IRQ, not seen in resp */
+    /* check interrupt status */
     g_assert_cmpuint(in_reg(s, INTRSTATUS), ==, 1);
+    /* reading clears */
+    g_assert_cmpuint(in_reg(s, INTRSTATUS), ==, 0);
+    /* TODO intercept actual interrupt (needs qtest work) */
 
-    /* invalid io */
+    /* invalid register access */
     out_reg(s, IVPOSITION, 1);
+    in_reg(s, DOORBELL);
+
+    /* ring the (non-functional) doorbell */
     out_reg(s, DOORBELL, 8 << 16);
 
+    /* write shared memory */
     for (i = 0; i < G_N_ELEMENTS(data); i++) {
         data[i] = i;
     }
     qtest_memwrite(s->qtest, (uintptr_t)s->mem_base, data, sizeof(data));
 
+    /* verify write */
     for (i = 0; i < G_N_ELEMENTS(data); i++) {
         g_assert_cmpuint(((uint32_t *)tmpshmem)[i], ==, i);
     }
 
+    /* read it back and verify read */
     memset(data, 0, sizeof(data));
-
     qtest_memread(s->qtest, (uintptr_t)s->mem_base, data, sizeof(data));
     for (i = 0; i < G_N_ELEMENTS(data); i++) {
         g_assert_cmpuint(data[i], ==, i);
-- 
2.4.3


* [Qemu-devel] [PATCH 06/38] ivshmem-test: Clean up wait for devices to become operational
  2016-02-29 18:40 [Qemu-devel] [PATCH 00/38] ivshmem: Fixes, cleanups, device model split Markus Armbruster
                   ` (4 preceding siblings ...)
  2016-02-29 18:40 ` [Qemu-devel] [PATCH 05/38] ivshmem-test: Improve test case /ivshmem/single Markus Armbruster
@ 2016-02-29 18:40 ` Markus Armbruster
  2016-03-01 11:10   ` Marc-André Lureau
  2016-02-29 18:40 ` [Qemu-devel] [PATCH 07/38] ivshmem-test: Improve test cases /ivshmem/server-* Markus Armbruster
                   ` (31 subsequent siblings)
  37 siblings, 1 reply; 118+ messages in thread
From: Markus Armbruster @ 2016-02-29 18:40 UTC
  To: qemu-devel; +Cc: claudio.fontana, cam, mlureau, david.marchand, pbonzini

test_ivshmem_server() waits until the first byte in BAR 2 contains the
0x42 we put into shared memory.  Works because the byte reads zero
until the device maps the shared memory gotten from the server.

Check the IVPosition register instead: it's initially -1, and becomes
non-negative right when the device maps the shared memory, so no
change, just cleaner, because it's what guest software is supposed to
do.
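
A minimal sketch of such a wait, assuming a caller-supplied register
reader.  read_reg_fn, poll_ivposition() and the fake register are
hypothetical names for illustration; the test itself uses in_reg()
with a time limit rather than a poll count.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* IVPosition reads 0xffffffff (-1) until the device has its ID. */
static bool ivposition_valid(uint32_t reg)
{
    /* Non-negative as a signed 32-bit value: top bit clear. */
    return (reg & 0x80000000u) == 0;
}

/* Stand-in for a register read. */
typedef uint32_t (*read_reg_fn)(void *opaque);

/* Poll up to @max_polls times; return the ID, or -1 on timeout. */
static int64_t poll_ivposition(read_reg_fn read_reg, void *opaque,
                               unsigned max_polls)
{
    assert(read_reg);
    for (unsigned i = 0; i < max_polls; i++) {
        uint32_t val = read_reg(opaque);

        if (ivposition_valid(val)) {
            return val;
        }
    }
    return -1;
}

/* Fake register for illustration: valid after a few reads. */
struct fake_reg {
    unsigned reads_until_ready;
    uint32_t id;
};

static uint32_t fake_read(void *opaque)
{
    struct fake_reg *r = opaque;

    if (r->reads_until_ready > 0) {
        r->reads_until_ready--;
        return 0xffffffffu;
    }
    return r->id;
}
```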

Signed-off-by: Markus Armbruster <armbru@redhat.com>
---
 tests/ivshmem-test.c | 10 ++++------
 1 file changed, 4 insertions(+), 6 deletions(-)

diff --git a/tests/ivshmem-test.c b/tests/ivshmem-test.c
index ba4d9f1..f40c3497 100644
--- a/tests/ivshmem-test.c
+++ b/tests/ivshmem-test.c
@@ -301,7 +301,6 @@ static void test_ivshmem_server(bool msi)
     int nvectors = 2;
     guint64 end_time = g_get_monotonic_time() + 5 * G_TIME_SPAN_SECOND;
 
-    memset(tmpshmem, 0x42, TMPSHMSIZE);
     ret = ivshmem_server_init(&server, tmpserver, tmpshm,
                               TMPSHMSIZE, nvectors,
                               g_test_verbose());
@@ -315,9 +314,9 @@ static void test_ivshmem_server(bool msi)
     setup_vm_with_server(&state2, nvectors, msi);
     s2 = &state2;
 
+    /* check state before server sends stuff */
     g_assert_cmpuint(in_reg(s1, IVPOSITION), ==, 0xffffffff);
     g_assert_cmpuint(in_reg(s2, IVPOSITION), ==, 0xffffffff);
-
     g_assert_cmpuint(qtest_readb(s1->qtest, (uintptr_t)s1->mem_base), ==, 0x00);
 
     thread.server = &server;
@@ -326,12 +325,11 @@ static void test_ivshmem_server(bool msi)
     thread.thread = g_thread_new("ivshmem-server", server_thread, &thread);
     g_assert(thread.thread != NULL);
 
-    /* waiting until mapping is done */
+    /* waiting for devices to become operational */
     while (g_get_monotonic_time() < end_time) {
         g_usleep(1000);
-
-        if (qtest_readb(s1->qtest, (uintptr_t)s1->mem_base) == 0x42 &&
-            qtest_readb(s2->qtest, (uintptr_t)s2->mem_base) == 0x42) {
+        if ((int)in_reg(s1, IVPOSITION) >= 0 &&
+            (int)in_reg(s2, IVPOSITION) >= 0) {
             break;
         }
     }
-- 
2.4.3


* [Qemu-devel] [PATCH 07/38] ivshmem-test: Improve test cases /ivshmem/server-*
  2016-02-29 18:40 [Qemu-devel] [PATCH 00/38] ivshmem: Fixes, cleanups, device model split Markus Armbruster
                   ` (5 preceding siblings ...)
  2016-02-29 18:40 ` [Qemu-devel] [PATCH 06/38] ivshmem-test: Clean up wait for devices to become operational Markus Armbruster
@ 2016-02-29 18:40 ` Markus Armbruster
  2016-03-01 11:13   ` Marc-André Lureau
  2016-02-29 18:40 ` [Qemu-devel] [PATCH 08/38] ivshmem: Rewrite specification document Markus Armbruster
                   ` (30 subsequent siblings)
  37 siblings, 1 reply; 118+ messages in thread
From: Markus Armbruster @ 2016-02-29 18:40 UTC
  To: qemu-devel; +Cc: claudio.fontana, cam, mlureau, david.marchand, pbonzini

Document missing test: behavior with MSI-X present but not enabled.

For MSI-X, we test and clear the interrupt pending bit before testing
the interrupt.  For INTx, we only clear.  Change to test and clear for
consistency.

Test MSI-X vector 1 in addition to vector 0.

Improve comments.
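
For reference, the value written to the Doorbell register carries the
peer ID in its high 16 bits and the vector in its low 16 bits, which
is what "vm2 << 16 | 1" in the test expresses.  The helper names in
this sketch are hypothetical and exist only for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Compose a Doorbell value: bits 16..31 peer ID, bits 0..15 vector. */
static uint32_t doorbell_value(uint16_t peer_id, uint16_t vector)
{
    return (uint32_t)peer_id << 16 | vector;
}

/* Extract the peer ID from a Doorbell value. */
static uint16_t doorbell_peer(uint32_t val)
{
    return val >> 16;
}

/* Extract the vector from a Doorbell value. */
static uint16_t doorbell_vector(uint32_t val)
{
    return val & 0xffff;
}

/* Usage: ring vector 1 of peer 3. */
static void doorbell_example(void)
{
    uint32_t val = doorbell_value(3, 1);

    assert(val == 0x00030001);
    assert(doorbell_peer(val) == 3);
    assert(doorbell_vector(val) == 1);
}
```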

Signed-off-by: Markus Armbruster <armbru@redhat.com>
---
 tests/ivshmem-test.c | 17 ++++++++++-------
 1 file changed, 10 insertions(+), 7 deletions(-)

diff --git a/tests/ivshmem-test.c b/tests/ivshmem-test.c
index f40c3497..c1dd7bb 100644
--- a/tests/ivshmem-test.c
+++ b/tests/ivshmem-test.c
@@ -339,18 +339,21 @@ static void test_ivshmem_server(bool msi)
     vm2 = in_reg(s2, IVPOSITION);
     g_assert_cmpuint(vm1, !=, vm2);
 
+    /* check number of MSI-X vectors */
     global_qtest = s1->qtest;
     if (msi) {
         ret = qpci_msix_table_size(s1->dev);
         g_assert_cmpuint(ret, ==, nvectors);
     }
 
-    /* ping vm2 -> vm1 */
+    /* TODO test behavior before MSI-X is enabled */
+
+    /* ping vm2 -> vm1 on vector 0 */
     if (msi) {
         ret = qpci_msix_pending(s1->dev, 0);
         g_assert_cmpuint(ret, ==, 0);
     } else {
-        out_reg(s1, INTRSTATUS, 0);
+        g_assert_cmpuint(in_reg(s1, INTRSTATUS), ==, 0);
     }
     out_reg(s2, DOORBELL, vm1 << 16);
     do {
@@ -359,18 +362,18 @@ static void test_ivshmem_server(bool msi)
     } while (ret == 0 && g_get_monotonic_time() < end_time);
     g_assert_cmpuint(ret, !=, 0);
 
-    /* ping vm1 -> vm2 */
+    /* ping vm1 -> vm2 on vector 1 */
     global_qtest = s2->qtest;
     if (msi) {
-        ret = qpci_msix_pending(s2->dev, 0);
+        ret = qpci_msix_pending(s2->dev, 1);
         g_assert_cmpuint(ret, ==, 0);
     } else {
-        out_reg(s2, INTRSTATUS, 0);
+        g_assert_cmpuint(in_reg(s2, INTRSTATUS), ==, 0);
     }
-    out_reg(s1, DOORBELL, vm2 << 16);
+    out_reg(s1, DOORBELL, vm2 << 16 | 1);
     do {
         g_usleep(10000);
-        ret = msi ? qpci_msix_pending(s2->dev, 0) : in_reg(s2, INTRSTATUS);
+        ret = msi ? qpci_msix_pending(s2->dev, 1) : in_reg(s2, INTRSTATUS);
     } while (ret == 0 && g_get_monotonic_time() < end_time);
     g_assert_cmpuint(ret, !=, 0);
 
-- 
2.4.3


* [Qemu-devel] [PATCH 08/38] ivshmem: Rewrite specification document
  2016-02-29 18:40 [Qemu-devel] [PATCH 00/38] ivshmem: Fixes, cleanups, device model split Markus Armbruster
                   ` (6 preceding siblings ...)
  2016-02-29 18:40 ` [Qemu-devel] [PATCH 07/38] ivshmem-test: Improve test cases /ivshmem/server-* Markus Armbruster
@ 2016-02-29 18:40 ` Markus Armbruster
  2016-03-01 11:25   ` Marc-André Lureau
  2016-03-01 15:46   ` Eric Blake
  2016-02-29 18:40 ` [Qemu-devel] [PATCH 09/38] ivshmem: Add missing newlines to debug printfs Markus Armbruster
                   ` (29 subsequent siblings)
  37 siblings, 2 replies; 118+ messages in thread
From: Markus Armbruster @ 2016-02-29 18:40 UTC
  To: qemu-devel; +Cc: claudio.fontana, cam, mlureau, david.marchand, pbonzini

This started as an attempt to update ivshmem_device_spec.txt for
clarity, accuracy and completeness while working on its code, and
quickly became a full rewrite.  Since the diff would be useless
anyway, I'm using the opportunity to rename the file to
ivshmem-spec.txt.

I tried hard to ensure the new text contradicts neither the old text
nor the code.  If the new text contradicts the old text but not the
code, it's probably a bug in the old text.  If the new text
contradicts both, it's probably a bug in the new text.

Signed-off-by: Markus Armbruster <armbru@redhat.com>
---
 docs/specs/ivshmem-spec.txt        | 244 +++++++++++++++++++++++++++++++++++++
 docs/specs/ivshmem_device_spec.txt | 161 ------------------------
 2 files changed, 244 insertions(+), 161 deletions(-)
 create mode 100644 docs/specs/ivshmem-spec.txt
 delete mode 100644 docs/specs/ivshmem_device_spec.txt

diff --git a/docs/specs/ivshmem-spec.txt b/docs/specs/ivshmem-spec.txt
new file mode 100644
index 0000000..0835ba1
--- /dev/null
+++ b/docs/specs/ivshmem-spec.txt
@@ -0,0 +1,244 @@
+= Device Specification for Inter-VM shared memory device =
+
+The Inter-VM shared memory device (ivshmem) is designed to share a
+memory region between multiple QEMU processes running different guests
+and the host.  In order for all guests to be able to pick up the
+shared memory area, it is modeled by QEMU as a PCI device exposing
+said memory to the guest as a PCI BAR.
+
+The device can use a shared memory object on the host directly, or it
+can obtain one from an ivshmem server.
+
+In the latter case, the device can additionally interrupt its peers, and
+get interrupted by its peers.
+
+
+== Configuring the ivshmem PCI device ==
+
+There are two basic configurations:
+
+- Just shared memory: -device ivshmem,shm=NAME,...
+
+  This uses shared memory object NAME.
+
+- Shared memory plus interrupts: -device ivshmem,chardev=CHR,vectors=N,...
+
+  An ivshmem server must already be running on the host.  The device
+  connects to the server's UNIX domain socket via character device
+  CHR.
+
+  Each peer gets assigned a unique ID by the server.  IDs must be
+  between 0 and 65535.
+
+  Interrupts are message-signaled by default (MSI-X).  With msi=off
+  the device has no MSI-X capability, and uses legacy INTx instead.
+  vectors=N configures the number of vectors to use.
+
+For more details on ivshmem device properties, see The QEMU Emulator
+User Documentation (qemu-doc.*).
+
+
+== The ivshmem PCI device's guest interface ==
+
+The device has vendor ID 1af4, device ID 1110, revision 0.
+
+=== PCI BARs ===
+
+The ivshmem PCI device has two or three BARs:
+
+- BAR0 holds device registers (256 Byte MMIO)
+- BAR1 holds MSI-X table and PBA (only when using MSI-X)
+- BAR2 maps the shared memory object
+
+There are two ways to use this device:
+
+- If you only need the shared memory part, BAR2 suffices.  This way,
+  you have access to the shared memory in the guest and can use it as
+  you see fit.  Memnic, for example, uses ivshmem this way from guest
+  user space (see http://dpdk.org/browse/memnic).
+
+- If you additionally need the capability for peers to interrupt each
+  other, you need BAR0 and, if using MSI-X, BAR1.  You will most
+  likely want to write a kernel driver to handle interrupts.  Requires
+  the device to be configured for interrupts, obviously.
+
+If the device is configured for interrupts, BAR2 is initially invalid.
+It becomes safely accessible only after the ivshmem server provided
+the shared memory.  Guest software should wait for the IVPosition
+register (described below) to become non-negative before accessing
+BAR2.
+
+The device cannot tell guest software whether it is configured for
+interrupts.
+
+=== PCI device registers ===
+
+BAR 0 contains the following registers:
+
+    Offset  Size  Access      On reset  Function
+        0     4   read/write        0   Interrupt Mask
+                                        bit 0: peer interrupt
+                                        bit 1..31: reserved
+        4     4   read/write        0   Interrupt Status
+                                        bit 0: peer interrupt
+                                        bit 1..31: reserved
+        8     4   read-only   0 or -1   IVPosition
+       12     4   write-only      N/A   Doorbell
+                                        bit 0..15: vector
+                                        bit 16..31: peer ID
+       16   240   none            N/A   reserved
+
+Software should only access the registers as specified in column
+"Access".  Reserved bits should be ignored on read, and preserved on
+write.
+
+Interrupt Status and Mask Register together control the legacy INTx
+interrupt when the device has no MSI-X capability: INTx is asserted
+when the bit-wise AND of Status and Mask is non-zero.
+
+Interrupt Status Register bit 0 becomes 1 when an interrupt request
+from a peer is received.  Reading the register clears it.
+
+IVPosition Register: if the device is not configured for interrupts,
+this is zero.  Else, it's -1 for a short while after reset, then
+changes to the device's ID (between 0 and 65535).
+
+There is no good way for software to find out whether the device is
+configured for interrupts.  A positive IVPosition means interrupts,
+but zero could be either.  The initial -1 cannot be reliably observed.
+
+Doorbell Register: writing this register requests to interrupt a peer.
+The written value's high 16 bits are the ID of the peer to interrupt,
+and its low 16 bits select an interrupt vector.
+
+If the device is not configured for interrupts, the write is ignored.
+
+If interrupt setup hasn't completed, the write is ignored.  The
+device cannot tell guest software whether setup is complete.
+Interrupts can regress to this state on migration.
+
+If the peer with the requested ID isn't connected, or it has fewer
+interrupt vectors connected, the write is ignored.  The device cannot
+tell guest software what peers are connected, or how many interrupt
+vectors are connected.
+
+If the peer doesn't use MSI-X, its Interrupt Status register is set to
+1.  This asserts INTx unless masked by the Interrupt Mask register.
+The device cannot communicate the interrupt vector to guest software
+then.
+
+If the peer uses MSI-X, the interrupt for this vector becomes pending.
+There is no way for software to clear the pending bit, and a polling
+mode of operation is therefore impossible with MSI-X.
+
+With multiple MSI-X vectors, different vectors can be used to indicate
+that different events have occurred.  The semantics of interrupt
+vectors are left to the application.
+
+
+== Interrupt infrastructure ==
+
+When configured for interrupts, the peers share eventfd objects in
+addition to shared memory.  The shared resources are managed by an
+ivshmem server.
+
+=== The ivshmem server ===
+
+The server listens on a UNIX domain socket.
+
+For each new client that connects to the server, the server
+- picks an ID,
+- creates eventfd file descriptors for the interrupt vectors,
+- sends the ID and the file descriptor for the shared memory to the
+  new client,
+- sends connect notifications for the new client to the other clients
+  (these contain file descriptors for sending interrupts),
+- sends connect notifications for the other clients to the new client,
+  and
+- sends interrupt setup messages to the new client (these contain file
+  descriptors for receiving interrupts).
+
+When a client disconnects from the server, the server sends disconnect
+notifications to the other clients.
+
+The next section describes the protocol in detail.
+
+If the server terminates without sending disconnect notifications for
+its connected clients, the clients can elect to continue.  They can
+communicate with each other normally, but won't receive disconnect
+notifications when peers disconnect, and no new clients can connect.
+There is no way for the clients to connect to a restarted server.
+The device cannot tell guest software whether the server is still
+up.
+
+Example server code is in contrib/ivshmem-server/.  Not to be used in
+production.  It assumes all clients use the same number of interrupt
+vectors.
+
+A standalone client is in contrib/ivshmem-client/.  It can be useful
+for debugging.
+
+=== The ivshmem Client-Server Protocol ===
+
+An ivshmem device configured for interrupts connects to an ivshmem
+server.  This section details the protocol between the two.
+
+The connection is one-way: the server sends messages to the client.
+Each message consists of a single 8 byte little-endian signed number,
+and may be accompanied by a file descriptor via SCM_RIGHTS.  Both
+client and server close the connection on error.
+
+On connect, the server sends the following messages in order:
+
+1. The protocol version number, currently zero.  The client should
+   close the connection on receipt of versions it can't handle.
+
+2. The client's ID.  This is unique among all clients of this server.
+   IDs must be between 0 and 65535, because the Doorbell register
+   provides only 16 bits for them.
+
+3. The number -1, accompanied by the file descriptor for the shared
+   memory.
+
+4. Connect notifications for existing other clients, if any.  This is
+   a peer ID (number between 0 and 65535 other than the client's ID),
+   repeated N times.  Each repetition is accompanied by one file
+   descriptor.  These are for interrupting the peer with that ID using
+   vector 0,..,N-1, in order.  If the client is configured for fewer
+   vectors, it closes the extra file descriptors.  If it is configured
+   for more, the extra vectors remain unconnected.
+
+5. Interrupt setup.  This is the client's own ID, repeated N times.
+   Each repetition is accompanied by one file descriptor.  These are
+   for receiving interrupts from peers using vector 0,..,N-1, in
+   order.  If the client is configured for fewer vectors, it closes
+   the extra file descriptors.  If it is configured for more, the
+   extra vectors remain unconnected.
+
+From then on, the server sends these kinds of messages:
+
+6. Connection / disconnection notification.  This is a peer ID.
+
+  - If the number comes with a file descriptor, it's a connection
+    notification, exactly like in step 4.
+
+  - Else, it's a disconnection notification for the peer with that ID.
+
+Known bugs:
+
+* The protocol changed incompatibly in QEMU 2.5.  Before, messages
+  were native endian long, and there was no version number.
+
+* The protocol is poorly designed.
+
+=== The ivshmem Client-Client Protocol ===
+
+An ivshmem device configured for interrupts receives eventfd file
+descriptors for interrupting peers and getting interrupted by peers
+from the server, as explained in the previous section.
+
+To interrupt a peer, the device writes the 8-byte integer 1 in native
+byte order to the respective file descriptor.
+
+To receive an interrupt, the device reads and discards as many 8-byte
+integers as it can.
diff --git a/docs/specs/ivshmem_device_spec.txt b/docs/specs/ivshmem_device_spec.txt
deleted file mode 100644
index d318d65..0000000
--- a/docs/specs/ivshmem_device_spec.txt
+++ /dev/null
@@ -1,161 +0,0 @@
-
-Device Specification for Inter-VM shared memory device
-------------------------------------------------------
-
-The Inter-VM shared memory device is designed to share a memory region (created
-on the host via the POSIX shared memory API) between multiple QEMU processes
-running different guests. In order for all guests to be able to pick up the
-shared memory area, it is modeled by QEMU as a PCI device exposing said memory
-to the guest as a PCI BAR.
-The memory region does not belong to any guest, but is a POSIX memory object on
-the host. The host can access this shared memory if needed.
-
-The device also provides an optional communication mechanism between guests
-sharing the same memory object. More details about that in the section 'Guest to
-guest communication' section.
-
-
-The Inter-VM PCI device
------------------------
-
-From the VM point of view, the ivshmem PCI device supports three BARs.
-
-- BAR0 is a 1 Kbyte MMIO region to support registers and interrupts when MSI is
-  not used.
-- BAR1 is used for MSI-X when it is enabled in the device.
-- BAR2 is used to access the shared memory object.
-
-It is your choice how to use the device but you must choose between two
-behaviors :
-
-- basically, if you only need the shared memory part, you will map BAR2.
-  This way, you have access to the shared memory in guest and can use it as you
-  see fit (memnic, for example, uses it in userland
-  http://dpdk.org/browse/memnic).
-
-- BAR0 and BAR1 are used to implement an optional communication mechanism
-  through interrupts in the guests. If you need an event mechanism between the
-  guests accessing the shared memory, you will most likely want to write a
-  kernel driver that will handle interrupts. See details in the section 'Guest
-  to guest communication' section.
-
-The behavior is chosen when starting your QEMU processes:
-- no communication mechanism needed, the first QEMU to start creates the shared
-  memory on the host, subsequent QEMU processes will use it.
-
-- communication mechanism needed, an ivshmem server must be started before any
-  QEMU processes, then each QEMU process connects to the server unix socket.
-
-For more details on the QEMU ivshmem parameters, see qemu-doc documentation.
-
-
-Guest to guest communication
-----------------------------
-
-This section details the communication mechanism between the guests accessing
-the ivhsmem shared memory.
-
-*ivshmem server*
-
-This server code is available in qemu.git/contrib/ivshmem-server.
-
-The server must be started on the host before any guest.
-It creates a shared memory object then waits for clients to connect on a unix
-socket. All the messages are little-endian int64_t integer.
-
-For each client (QEMU process) that connects to the server:
-- the server sends a protocol version, if client does not support it, the client
-  closes the communication,
-- the server assigns an ID for this client and sends this ID to him as the first
-  message,
-- the server sends a fd to the shared memory object to this client,
-- the server creates a new set of host eventfds associated to the new client and
-  sends this set to all already connected clients,
-- finally, the server sends all the eventfds sets for all clients to the new
-  client.
-
-The server signals all clients when one of them disconnects.
-
-The client IDs are limited to 16 bits because of the current implementation (see
-Doorbell register in 'PCI device registers' subsection). Hence only 65536
-clients are supported.
-
-All the file descriptors (fd to the shared memory, eventfds for each client)
-are passed to clients using SCM_RIGHTS over the server unix socket.
-
-Apart from the current ivshmem implementation in QEMU, an ivshmem client has
-been provided in qemu.git/contrib/ivshmem-client for debug.
-
-*QEMU as an ivshmem client*
-
-At initialisation, when creating the ivshmem device, QEMU first receives a
-protocol version and closes communication with server if it does not match.
-Then, QEMU gets its ID from the server then makes it available through BAR0
-IVPosition register for the VM to use (see 'PCI device registers' subsection).
-QEMU then uses the fd to the shared memory to map it to BAR2.
-eventfds for all other clients received from the server are stored to implement
-BAR0 Doorbell register (see 'PCI device registers' subsection).
-Finally, eventfds assigned to this QEMU process are used to send interrupts in
-this VM.
-
-*PCI device registers*
-
-From the VM point of view, the ivshmem PCI device supports 4 registers of
-32-bits each.
-
-enum ivshmem_registers {
-    IntrMask = 0,
-    IntrStatus = 4,
-    IVPosition = 8,
-    Doorbell = 12
-};
-
-The first two registers are the interrupt mask and status registers.  Mask and
-status are only used with pin-based interrupts.  They are unused with MSI
-interrupts.
-
-Status Register: The status register is set to 1 when an interrupt occurs.
-
-Mask Register: The mask register is bitwise ANDed with the interrupt status
-and the result will raise an interrupt if it is non-zero.  However, since 1 is
-the only value the status will be set to, it is only the first bit of the mask
-that has any effect.  Therefore interrupts can be masked by setting the first
-bit to 0 and unmasked by setting the first bit to 1.
-
-IVPosition Register: The IVPosition register is read-only and reports the
-guest's ID number.  The guest IDs are non-negative integers.  When using the
-server, since the server is a separate process, the VM ID will only be set when
-the device is ready (shared memory is received from the server and accessible
-via the device).  If the device is not ready, the IVPosition will return -1.
-Applications should ensure that they have a valid VM ID before accessing the
-shared memory.
-
-Doorbell Register:  To interrupt another guest, a guest must write to the
-Doorbell register.  The doorbell register is 32-bits, logically divided into
-two 16-bit fields.  The high 16-bits are the guest ID to interrupt and the low
-16-bits are the interrupt vector to trigger.  The semantics of the value
-written to the doorbell depends on whether the device is using MSI or a regular
-pin-based interrupt.  In short, MSI uses vectors while regular interrupts set
-the status register.
-
-Regular Interrupts
-
-If regular interrupts are used (due to either a guest not supporting MSI or the
-user specifying not to use them on startup) then the value written to the lower
-16-bits of the Doorbell register results is arbitrary and will trigger an
-interrupt in the destination guest.
-
-Message Signalled Interrupts
-
-An ivshmem device may support multiple MSI vectors.  If so, the lower 16-bits
-written to the Doorbell register must be between 0 and the maximum number of
-vectors the guest supports.  The lower 16 bits written to the doorbell is the
-MSI vector that will be raised in the destination guest.  The number of MSI
-vectors is configurable but it is set when the VM is started.
-
-The important thing to remember with MSI is that it is only a signal, no status
-is set (since MSI interrupts are not shared).  All information other than the
-interrupt itself should be communicated via the shared memory region.  Devices
-supporting multiple MSI vectors can use different vectors to indicate different
-events have occurred.  The semantics of interrupt vectors are left to the
-user's discretion.
-- 
2.4.3


* [Qemu-devel] [PATCH 09/38] ivshmem: Add missing newlines to debug printfs
  2016-02-29 18:40 [Qemu-devel] [PATCH 00/38] ivshmem: Fixes, cleanups, device model split Markus Armbruster
                   ` (7 preceding siblings ...)
  2016-02-29 18:40 ` [Qemu-devel] [PATCH 08/38] ivshmem: Rewrite specification document Markus Armbruster
@ 2016-02-29 18:40 ` Markus Armbruster
  2016-03-01 12:20   ` Marc-André Lureau
  2016-02-29 18:40 ` [Qemu-devel] [PATCH 10/38] ivshmem: Compile debug prints unconditionally to prevent bit-rot Markus Armbruster
                   ` (28 subsequent siblings)
  37 siblings, 1 reply; 118+ messages in thread
From: Markus Armbruster @ 2016-02-29 18:40 UTC (permalink / raw)
  To: qemu-devel; +Cc: claudio.fontana, cam, mlureau, david.marchand, pbonzini

Signed-off-by: Markus Armbruster <armbru@redhat.com>
---
 hw/misc/ivshmem.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/hw/misc/ivshmem.c b/hw/misc/ivshmem.c
index 48b7a34..b74b02c 100644
--- a/hw/misc/ivshmem.c
+++ b/hw/misc/ivshmem.c
@@ -568,10 +568,10 @@ static void setup_interrupt(IVShmemState *s, int vector)
     IVSHMEM_DPRINTF("setting up interrupt for vector: %d\n", vector);
 
     if (!with_irqfd) {
-        IVSHMEM_DPRINTF("with eventfd");
+        IVSHMEM_DPRINTF("with eventfd\n");
         watch_vector_notifier(s, n, vector);
     } else if (msix_enabled(pdev)) {
-        IVSHMEM_DPRINTF("with irqfd");
+        IVSHMEM_DPRINTF("with irqfd\n");
         if (ivshmem_add_kvm_msi_virq(s, vector) < 0) {
             return;
         }
@@ -582,7 +582,7 @@ static void setup_interrupt(IVShmemState *s, int vector)
         }
     } else {
         /* it will be delayed until msix is enabled, in write_config */
-        IVSHMEM_DPRINTF("with irqfd, delayed until msix enabled");
+        IVSHMEM_DPRINTF("with irqfd, delayed until msix enabled\n");
     }
 }
 
-- 
2.4.3


* [Qemu-devel] [PATCH 10/38] ivshmem: Compile debug prints unconditionally to prevent bit-rot
  2016-02-29 18:40 [Qemu-devel] [PATCH 00/38] ivshmem: Fixes, cleanups, device model split Markus Armbruster
                   ` (8 preceding siblings ...)
  2016-02-29 18:40 ` [Qemu-devel] [PATCH 09/38] ivshmem: Add missing newlines to debug printfs Markus Armbruster
@ 2016-02-29 18:40 ` Markus Armbruster
  2016-03-01 12:22   ` Marc-André Lureau
  2016-02-29 18:40 ` [Qemu-devel] [PATCH 11/38] ivshmem: Clean up after commit 9940c32 Markus Armbruster
                   ` (27 subsequent siblings)
  37 siblings, 1 reply; 118+ messages in thread
From: Markus Armbruster @ 2016-02-29 18:40 UTC (permalink / raw)
  To: qemu-devel; +Cc: claudio.fontana, cam, mlureau, david.marchand, pbonzini

Signed-off-by: Markus Armbruster <armbru@redhat.com>
---
 hw/misc/ivshmem.c | 14 +++++++-------
 1 file changed, 7 insertions(+), 7 deletions(-)
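
Note for readers: the point of the change can be sketched in isolation.
With a constant "if (0)" guard the debug print is still parsed and its
format string is still checked against its arguments, whereas the old
#ifdef variant hides the code from the compiler entirely.  The names
DEMO_DEBUG, DEMO_DPRINTF and demo_double() below are made up for
illustration and are not part of the patch.

```c
#include <stdio.h>

/* Hypothetical stand-ins for IVSHMEM_DEBUG / IVSHMEM_DPRINTF. */
#define DEMO_DEBUG 0
#define DEMO_DPRINTF(fmt, ...)                          \
    do {                                                \
        if (DEMO_DEBUG) {                               \
            printf("DEMO: " fmt, ## __VA_ARGS__);       \
        }                                               \
    } while (0)

/*
 * Returns twice its argument.  The debug print is compiled and
 * type-checked even though it never executes while DEMO_DEBUG is 0;
 * the compiler is free to drop the dead branch afterwards.
 */
int demo_double(int x)
{
    DEMO_DPRINTF("doubling %d\n", x);
    return 2 * x;
}
```

Changing DEMO_DEBUG to 1 enables the output without touching any call
site.  The "## __VA_ARGS__" form is a GNU extension, as in the patch.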

diff --git a/hw/misc/ivshmem.c b/hw/misc/ivshmem.c
index b74b02c..395f357 100644
--- a/hw/misc/ivshmem.c
+++ b/hw/misc/ivshmem.c
@@ -48,13 +48,13 @@
 
 #define IVSHMEM_REG_BAR_SIZE 0x100
 
-//#define DEBUG_IVSHMEM
-#ifdef DEBUG_IVSHMEM
-#define IVSHMEM_DPRINTF(fmt, ...)        \
-    do {printf("IVSHMEM: " fmt, ## __VA_ARGS__); } while (0)
-#else
-#define IVSHMEM_DPRINTF(fmt, ...)
-#endif
+#define IVSHMEM_DEBUG 0
+#define IVSHMEM_DPRINTF(fmt, ...)                       \
+    do {                                                \
+        if (IVSHMEM_DEBUG) {                            \
+            printf("IVSHMEM: " fmt, ## __VA_ARGS__);    \
+        }                                               \
+    } while (0)
 
 #define TYPE_IVSHMEM "ivshmem"
 #define IVSHMEM(obj) \
-- 
2.4.3


* [Qemu-devel] [PATCH 11/38] ivshmem: Clean up after commit 9940c32
  2016-02-29 18:40 [Qemu-devel] [PATCH 00/38] ivshmem: Fixes, cleanups, device model split Markus Armbruster
                   ` (9 preceding siblings ...)
  2016-02-29 18:40 ` [Qemu-devel] [PATCH 10/38] ivshmem: Compile debug prints unconditionally to prevent bit-rot Markus Armbruster
@ 2016-02-29 18:40 ` Markus Armbruster
  2016-03-01 12:47   ` Marc-André Lureau
  2016-02-29 18:40 ` [Qemu-devel] [PATCH 12/38] ivshmem: Drop ivshmem_event() stub Markus Armbruster
                   ` (26 subsequent siblings)
  37 siblings, 1 reply; 118+ messages in thread
From: Markus Armbruster @ 2016-02-29 18:40 UTC (permalink / raw)
  To: qemu-devel; +Cc: claudio.fontana, cam, mlureau, david.marchand, pbonzini

IVShmemState member eventfd_chr is useless since commit 9940c32.  Drop
it.

Signed-off-by: Markus Armbruster <armbru@redhat.com>
---
 hw/misc/ivshmem.c | 12 ------------
 1 file changed, 12 deletions(-)

diff --git a/hw/misc/ivshmem.c b/hw/misc/ivshmem.c
index 395f357..b087dc3 100644
--- a/hw/misc/ivshmem.c
+++ b/hw/misc/ivshmem.c
@@ -79,7 +79,6 @@ typedef struct IVShmemState {
     uint32_t intrmask;
     uint32_t intrstatus;
 
-    CharDriverState **eventfd_chr;
     CharDriverState *server_chr;
     Fifo8 incoming_fifo;
     MemoryRegion ivshmem_mmio;
@@ -941,8 +940,6 @@ static void pci_ivshmem_realize(PCIDevice *dev, Error **errp)
 
         pci_register_bar(dev, 2, attr, &s->bar);
 
-        s->eventfd_chr = g_malloc0(s->vectors * sizeof(CharDriverState *));
-
         qemu_chr_add_handlers(s->server_chr, ivshmem_can_receive,
                               ivshmem_check_version, ivshmem_event, s);
     } else {
@@ -1004,15 +1001,6 @@ static void pci_ivshmem_exit(PCIDevice *dev)
         memory_region_del_subregion(&s->bar, &s->ivshmem);
     }
 
-    if (s->eventfd_chr) {
-        for (i = 0; i < s->vectors; i++) {
-            if (s->eventfd_chr[i]) {
-                qemu_chr_free(s->eventfd_chr[i]);
-            }
-        }
-        g_free(s->eventfd_chr);
-    }
-
     if (s->peers) {
         for (i = 0; i < s->nb_peers; i++) {
             close_peer_eventfds(s, i);
-- 
2.4.3


* [Qemu-devel] [PATCH 12/38] ivshmem: Drop ivshmem_event() stub
  2016-02-29 18:40 [Qemu-devel] [PATCH 00/38] ivshmem: Fixes, cleanups, device model split Markus Armbruster
                   ` (10 preceding siblings ...)
  2016-02-29 18:40 ` [Qemu-devel] [PATCH 11/38] ivshmem: Clean up after commit 9940c32 Markus Armbruster
@ 2016-02-29 18:40 ` Markus Armbruster
  2016-03-01 12:48   ` Marc-André Lureau
  2016-02-29 18:40 ` [Qemu-devel] [PATCH 13/38] ivshmem: Don't destroy the chardev on version mismatch Markus Armbruster
                   ` (25 subsequent siblings)
  37 siblings, 1 reply; 118+ messages in thread
From: Markus Armbruster @ 2016-02-29 18:40 UTC (permalink / raw)
  To: qemu-devel; +Cc: claudio.fontana, cam, mlureau, david.marchand, pbonzini

Signed-off-by: Markus Armbruster <armbru@redhat.com>
---
 hw/misc/ivshmem.c | 9 ++-------
 1 file changed, 2 insertions(+), 7 deletions(-)

diff --git a/hw/misc/ivshmem.c b/hw/misc/ivshmem.c
index b087dc3..7119a07 100644
--- a/hw/misc/ivshmem.c
+++ b/hw/misc/ivshmem.c
@@ -267,11 +267,6 @@ static int ivshmem_can_receive(void * opaque)
     return sizeof(int64_t);
 }
 
-static void ivshmem_event(void *opaque, int event)
-{
-    IVSHMEM_DPRINTF("ivshmem_event %d\n", event);
-}
-
 static void ivshmem_vector_notify(void *opaque)
 {
     MSIVector *entry = opaque;
@@ -719,7 +714,7 @@ static void ivshmem_check_version(void *opaque, const uint8_t * buf, int size)
 
     IVSHMEM_DPRINTF("version check ok, switch to real chardev handler\n");
     qemu_chr_add_handlers(s->server_chr, ivshmem_can_receive, ivshmem_read,
-                          ivshmem_event, s);
+                          NULL, s);
 }
 
 /* Select the MSI-X vectors used by device.
@@ -941,7 +936,7 @@ static void pci_ivshmem_realize(PCIDevice *dev, Error **errp)
         pci_register_bar(dev, 2, attr, &s->bar);
 
         qemu_chr_add_handlers(s->server_chr, ivshmem_can_receive,
-                              ivshmem_check_version, ivshmem_event, s);
+                              ivshmem_check_version, NULL, s);
     } else {
         /* just map the file immediately, we're not using a server */
         int fd;
-- 
2.4.3


* [Qemu-devel] [PATCH 13/38] ivshmem: Don't destroy the chardev on version mismatch
  2016-02-29 18:40 [Qemu-devel] [PATCH 00/38] ivshmem: Fixes, cleanups, device model split Markus Armbruster
                   ` (11 preceding siblings ...)
  2016-02-29 18:40 ` [Qemu-devel] [PATCH 12/38] ivshmem: Drop ivshmem_event() stub Markus Armbruster
@ 2016-02-29 18:40 ` Markus Armbruster
  2016-03-01 15:39   ` Marc-André Lureau
  2016-02-29 18:40 ` [Qemu-devel] [PATCH 14/38] ivshmem: Fix harmless misuse of Error Markus Armbruster
                   ` (24 subsequent siblings)
  37 siblings, 1 reply; 118+ messages in thread
From: Markus Armbruster @ 2016-02-29 18:40 UTC (permalink / raw)
  To: qemu-devel; +Cc: claudio.fontana, cam, mlureau, david.marchand, pbonzini

Yes, the chardev is commonly useless after we read a bad version from
it, but destroying it is inappropriate anyway: the user created it, so
the user should be able to hold on to it as long as he likes.  We
don't destroy it on other errors.  Screwed up in commit 5105b1d.

Stop reading instead.

Also note QEMU's behavior in ivshmem-spec.txt.

Signed-off-by: Markus Armbruster <armbru@redhat.com>
---
 docs/specs/ivshmem-spec.txt | 3 +++
 hw/misc/ivshmem.c           | 3 +--
 2 files changed, 4 insertions(+), 2 deletions(-)
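
Note for readers: a minimal sketch of the intended client behavior,
assuming only what the spec says about framing (each message is a
single 8 byte little-endian signed number, and the version is currently
zero).  DemoClient, demo_msg_decode() and demo_check_version() are
hypothetical names, not QEMU code, and the file descriptor check done
by the real function is left out.

```c
#include <stdint.h>

#define DEMO_PROTOCOL_VERSION 0     /* "currently zero" per the spec */

/* Hypothetical client state: 1 while server messages are consumed. */
typedef struct DemoClient {
    int reading;
} DemoClient;

/* Decode one protocol message: 8 bytes, little-endian, signed. */
int64_t demo_msg_decode(const uint8_t buf[8])
{
    uint64_t v = 0;

    for (int i = 7; i >= 0; i--) {
        v = (v << 8) | buf[i];
    }
    return (int64_t)v;
}

/*
 * Returns 1 if the version is acceptable.  On mismatch, stop reading
 * but leave the connection object alone: whoever created it decides
 * when to destroy it.
 */
int demo_check_version(DemoClient *c, const uint8_t buf[8])
{
    if (demo_msg_decode(buf) != DEMO_PROTOCOL_VERSION) {
        c->reading = 0;
        return 0;
    }
    return 1;
}
```

In the patch itself, "stop reading" is done by clearing the chardev
handlers rather than by a flag.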

diff --git a/docs/specs/ivshmem-spec.txt b/docs/specs/ivshmem-spec.txt
index 0835ba1..4fc6f37 100644
--- a/docs/specs/ivshmem-spec.txt
+++ b/docs/specs/ivshmem-spec.txt
@@ -188,6 +188,9 @@ Each message consists of a single 8 byte little-endian signed number,
 and may be accompanied by a file descriptor via SCM_RIGHTS.  Both
 client and server close the connection on error.
 
+Note: QEMU currently doesn't close the connection right on error, but
+only when the character device is destroyed.
+
 On connect, the server sends the following messages in order:
 
 1. The protocol version number, currently zero.  The client should
diff --git a/hw/misc/ivshmem.c b/hw/misc/ivshmem.c
index 7119a07..2850e8a 100644
--- a/hw/misc/ivshmem.c
+++ b/hw/misc/ivshmem.c
@@ -707,8 +707,7 @@ static void ivshmem_check_version(void *opaque, const uint8_t * buf, int size)
     if (tmp != -1 || version != IVSHMEM_PROTOCOL_VERSION) {
         fprintf(stderr, "incompatible version, you are connecting to a ivshmem-"
                 "server using a different protocol please check your setup\n");
-        qemu_chr_delete(s->server_chr);
-        s->server_chr = NULL;
+        qemu_chr_add_handlers(s->server_chr, NULL, NULL, NULL, s);
         return;
     }
 
-- 
2.4.3


* [Qemu-devel] [PATCH 14/38] ivshmem: Fix harmless misuse of Error
  2016-02-29 18:40 [Qemu-devel] [PATCH 00/38] ivshmem: Fixes, cleanups, device model split Markus Armbruster
                   ` (12 preceding siblings ...)
  2016-02-29 18:40 ` [Qemu-devel] [PATCH 13/38] ivshmem: Don't destroy the chardev on version mismatch Markus Armbruster
@ 2016-02-29 18:40 ` Markus Armbruster
  2016-03-01 15:47   ` Marc-André Lureau
  2016-02-29 18:40 ` [Qemu-devel] [PATCH 15/38] ivshmem: Failed realize() can leave migration blocker behind Markus Armbruster
                   ` (23 subsequent siblings)
  37 siblings, 1 reply; 118+ messages in thread
From: Markus Armbruster @ 2016-02-29 18:40 UTC (permalink / raw)
  To: qemu-devel; +Cc: claudio.fontana, cam, mlureau, david.marchand, pbonzini

We reuse errp after passing it to host_memory_backend_get_memory().  If
both host_memory_backend_get_memory() and the reuse set an error, the
reuse will fail the assertion in error_setv().  Fortunately,
host_memory_backend_get_memory() can't fail.

Pass it &error_abort to make our assumption explicit, and to get the
assertion failure in the right place should it become invalid.

Signed-off-by: Markus Armbruster <armbru@redhat.com>
---
 hw/misc/ivshmem.c | 7 ++++---
 1 file changed, 4 insertions(+), 3 deletions(-)
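
Note for readers: the hazard can be sketched with a toy error type.
DemoError and the demo_*() functions are hypothetical and only mimic
the rule that an error destination must not already hold an error when
it is set; they are not QEMU's Error API.  The local assertion plays
the role that &error_abort plays in the patch.

```c
#include <assert.h>
#include <stddef.h>

typedef struct DemoError {
    const char *msg;
} DemoError;

static DemoError demo_busy = { "can't use already busy memdev" };

/* Setting an error into a destination that already has one is a bug. */
static void demo_error_set(DemoError **errp, DemoError *err)
{
    if (errp) {
        assert(*errp == NULL);
        *errp = err;
    }
}

/* Stand-in for a callee that takes an error pointer but cannot fail. */
static int demo_get_region(DemoError **errp)
{
    (void)errp;
    return 1;
}

/*
 * The callee gets its own error destination instead of the caller's
 * errp, so errp is set at most once.  Returns -1 and sets *errp when
 * the region is already mapped, 0 otherwise.
 */
int demo_check_busy(int mapped, DemoError **errp)
{
    DemoError *local = NULL;
    int region = demo_get_region(&local);

    assert(local == NULL);      /* make the "can't fail" assumption explicit */
    if (region && mapped) {
        demo_error_set(errp, &demo_busy);
        return -1;
    }
    return 0;
}
```

Had the callee been handed errp and set it, the later demo_error_set()
would have tripped its assertion far from the real cause.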

diff --git a/hw/misc/ivshmem.c b/hw/misc/ivshmem.c
index 2850e8a..eb53d9a 100644
--- a/hw/misc/ivshmem.c
+++ b/hw/misc/ivshmem.c
@@ -841,7 +841,7 @@ static void pci_ivshmem_realize(PCIDevice *dev, Error **errp)
             g_warning("size argument ignored with hostmem");
         }
 
-        mr = host_memory_backend_get_memory(s->hostmem, errp);
+        mr = host_memory_backend_get_memory(s->hostmem, &error_abort);
         s->ivshmem_size = memory_region_size(mr);
     } else if (s->sizearg == NULL) {
         s->ivshmem_size = 4 << 20; /* 4 MB default */
@@ -906,7 +906,8 @@ static void pci_ivshmem_realize(PCIDevice *dev, Error **errp)
 
         IVSHMEM_DPRINTF("using hostmem\n");
 
-        mr = host_memory_backend_get_memory(MEMORY_BACKEND(s->hostmem), errp);
+        mr = host_memory_backend_get_memory(MEMORY_BACKEND(s->hostmem),
+                                            &error_abort);
         vmstate_register_ram(mr, DEVICE(s));
         memory_region_add_subregion(&s->bar, 0, mr);
         pci_register_bar(PCI_DEVICE(s), 2, attr, &s->bar);
@@ -1131,7 +1132,7 @@ static void ivshmem_check_memdev_is_busy(Object *obj, const char *name,
 {
     MemoryRegion *mr;
 
-    mr = host_memory_backend_get_memory(MEMORY_BACKEND(val), errp);
+    mr = host_memory_backend_get_memory(MEMORY_BACKEND(val), &error_abort);
     if (memory_region_is_mapped(mr)) {
         char *path = object_get_canonical_path_component(val);
         error_setg(errp, "can't use already busy memdev: %s", path);
-- 
2.4.3


* [Qemu-devel] [PATCH 15/38] ivshmem: Failed realize() can leave migration blocker behind
  2016-02-29 18:40 [Qemu-devel] [PATCH 00/38] ivshmem: Fixes, cleanups, device model split Markus Armbruster
                   ` (13 preceding siblings ...)
  2016-02-29 18:40 ` [Qemu-devel] [PATCH 14/38] ivshmem: Fix harmless misuse of Error Markus Armbruster
@ 2016-02-29 18:40 ` Markus Armbruster
  2016-03-01 15:59   ` Marc-André Lureau
  2016-02-29 18:40 ` [Qemu-devel] [PATCH 16/38] ivshmem: Clean up register callbacks Markus Armbruster
                   ` (22 subsequent siblings)
  37 siblings, 1 reply; 118+ messages in thread
From: Markus Armbruster @ 2016-02-29 18:40 UTC (permalink / raw)
  To: qemu-devel; +Cc: claudio.fontana, cam, mlureau, david.marchand, pbonzini

If pci_ivshmem_realize() fails after it created its migration blocker,
the blocker is left in place.  Fix that by creating it last.

Likewise, if it fails after it called fifo8_create(), it leaks fifo
memory.  Fix that the same way.

Signed-off-by: Markus Armbruster <armbru@redhat.com>
---
 hw/misc/ivshmem.c | 23 ++++++++++++++---------
 1 file changed, 14 insertions(+), 9 deletions(-)
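
Note for readers: the fix relies on ordering rather than on cleanup
code.  A minimal sketch, with a hypothetical DemoState and
demo_realize() standing in for the device state and realize function:
every step that can fail runs first, and side effects that would need
undoing are created only once nothing can fail anymore.

```c
#include <stdbool.h>

/* Hypothetical state: which side effects have been created so far. */
typedef struct DemoState {
    bool fifo_created;
    bool blocker_added;
} DemoState;

/* Stand-in for a setup step that may fail. */
static bool demo_setup_bar(bool ok)
{
    return ok;
}

/* Returns true on success.  On failure, nothing is left behind. */
bool demo_realize(DemoState *s, bool bar_ok)
{
    if (!demo_setup_bar(bar_ok)) {
        return false;           /* nothing to undo yet */
    }

    /* Past the last failure point: now create what would need cleanup. */
    s->fifo_created = true;
    s->blocker_added = true;
    return true;
}
```

The alternative would be an error path that unwinds each side effect,
which is more code and easier to get wrong.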

diff --git a/hw/misc/ivshmem.c b/hw/misc/ivshmem.c
index eb53d9a..1392426 100644
--- a/hw/misc/ivshmem.c
+++ b/hw/misc/ivshmem.c
@@ -824,6 +824,7 @@ static void ivshmem_write_config(PCIDevice *pdev, uint32_t address,
 static void pci_ivshmem_realize(PCIDevice *dev, Error **errp)
 {
     IVShmemState *s = IVSHMEM(dev);
+    Error *err = NULL;
     uint8_t *pci_conf;
     uint8_t attr = PCI_BASE_ADDRESS_SPACE_MEMORY |
         PCI_BASE_ADDRESS_MEM_PREFETCH;
@@ -855,8 +856,6 @@ static void pci_ivshmem_realize(PCIDevice *dev, Error **errp)
         s->ivshmem_size = size;
     }
 
-    fifo8_create(&s->incoming_fifo, sizeof(int64_t));
-
     /* IRQFD requires MSI */
     if (ivshmem_has_feature(s, IVSHMEM_IOEVENTFD) &&
         !ivshmem_has_feature(s, IVSHMEM_MSI)) {
@@ -878,12 +877,6 @@ static void pci_ivshmem_realize(PCIDevice *dev, Error **errp)
         s->role_val = IVSHMEM_MASTER; /* default */
     }
 
-    if (s->role_val == IVSHMEM_PEER) {
-        error_setg(&s->migration_blocker,
-                   "Migration is disabled when using feature 'peer mode' in device 'ivshmem'");
-        migrate_add_blocker(s->migration_blocker);
-    }
-
     pci_conf = dev->config;
     pci_conf[PCI_COMMAND] = PCI_COMMAND_IO | PCI_COMMAND_MEMORY;
 
@@ -962,7 +955,19 @@ static void pci_ivshmem_realize(PCIDevice *dev, Error **errp)
             return;
         }
 
-        create_shared_memory_BAR(s, fd, attr, errp);
+        create_shared_memory_BAR(s, fd, attr, &err);
+        if (err) {
+            error_propagate(errp, err);
+            return;
+        }
+    }
+
+    fifo8_create(&s->incoming_fifo, sizeof(int64_t));
+
+    if (s->role_val == IVSHMEM_PEER) {
+        error_setg(&s->migration_blocker,
+                   "Migration is disabled when using feature 'peer mode' in device 'ivshmem'");
+        migrate_add_blocker(s->migration_blocker);
     }
 }
 
-- 
2.4.3


* [Qemu-devel] [PATCH 16/38] ivshmem: Clean up register callbacks
  2016-02-29 18:40 [Qemu-devel] [PATCH 00/38] ivshmem: Fixes, cleanups, device model split Markus Armbruster
                   ` (14 preceding siblings ...)
  2016-02-29 18:40 ` [Qemu-devel] [PATCH 15/38] ivshmem: Failed realize() can leave migration blocker behind Markus Armbruster
@ 2016-02-29 18:40 ` Markus Armbruster
  2016-03-01 16:04   ` Marc-André Lureau
  2016-02-29 18:40 ` [Qemu-devel] [PATCH 17/38] ivshmem: Clean up MSI-X conditions Markus Armbruster
                   ` (21 subsequent siblings)
  37 siblings, 1 reply; 118+ messages in thread
From: Markus Armbruster @ 2016-02-29 18:40 UTC (permalink / raw)
  To: qemu-devel; +Cc: claudio.fontana, cam, mlureau, david.marchand, pbonzini

Signed-off-by: Markus Armbruster <armbru@redhat.com>
---
 hw/misc/ivshmem.c | 11 ++---------
 1 file changed, 2 insertions(+), 9 deletions(-)
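
Note for readers: the register semantics these callbacks implement can
be modelled in a few lines, assuming a device without MSI-X.  INTx is
asserted while the bit-wise AND of status and mask is non-zero, a peer
interrupt sets status to 1, and reading status clears it.  DemoRegs and
the demo_*() helpers are hypothetical and touch no PCI code.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model of the two registers plus the INTx line level. */
typedef struct DemoRegs {
    uint32_t intrmask;
    uint32_t intrstatus;
    bool irq;
} DemoRegs;

/* INTx follows the bit-wise AND of status and mask. */
static void demo_update_irq(DemoRegs *r)
{
    r->irq = (r->intrstatus & r->intrmask) != 0;
}

/* Guest writes the Interrupt Mask register. */
void demo_mask_write(DemoRegs *r, uint32_t val)
{
    r->intrmask = val;
    demo_update_irq(r);
}

/* A peer rang our doorbell: status becomes 1. */
void demo_peer_interrupt(DemoRegs *r)
{
    r->intrstatus = 1;
    demo_update_irq(r);
}

/* Guest reads the Interrupt Status register; reading clears it. */
uint32_t demo_status_read(DemoRegs *r)
{
    uint32_t ret = r->intrstatus;

    r->intrstatus = 0;
    demo_update_irq(r);
    return ret;
}
```

A pending interrupt stays invisible on INTx until the mask bit is set,
and a single status read both reports and acknowledges it.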

diff --git a/hw/misc/ivshmem.c b/hw/misc/ivshmem.c
index 1392426..7191914 100644
--- a/hw/misc/ivshmem.c
+++ b/hw/misc/ivshmem.c
@@ -121,12 +121,10 @@ static inline uint32_t ivshmem_has_feature(IVShmemState *ivs,
     return (ivs->features & (1 << feature));
 }
 
-/* accessing registers - based on rtl8139 */
 static void ivshmem_update_irq(IVShmemState *s)
 {
     PCIDevice *d = PCI_DEVICE(s);
-    int isr;
-    isr = (s->intrstatus & s->intrmask) & 0xffffffff;
+    uint32_t isr = s->intrstatus & s->intrmask;
 
     /* don't print ISR resets */
     if (isr) {
@@ -134,7 +132,7 @@ static void ivshmem_update_irq(IVShmemState *s)
                         isr ? 1 : 0, s->intrstatus, s->intrmask);
     }
 
-    pci_set_irq(d, (isr != 0));
+    pci_set_irq(d, isr != 0);
 }
 
 static void ivshmem_IntrMask_write(IVShmemState *s, uint32_t val)
@@ -142,7 +140,6 @@ static void ivshmem_IntrMask_write(IVShmemState *s, uint32_t val)
     IVSHMEM_DPRINTF("IntrMask write(w) val = 0x%04x\n", val);
 
     s->intrmask = val;
-
     ivshmem_update_irq(s);
 }
 
@@ -151,7 +148,6 @@ static uint32_t ivshmem_IntrMask_read(IVShmemState *s)
     uint32_t ret = s->intrmask;
 
     IVSHMEM_DPRINTF("intrmask read(w) val = 0x%04x\n", ret);
-
     return ret;
 }
 
@@ -160,7 +156,6 @@ static void ivshmem_IntrStatus_write(IVShmemState *s, uint32_t val)
     IVSHMEM_DPRINTF("IntrStatus write(w) val = 0x%04x\n", val);
 
     s->intrstatus = val;
-
     ivshmem_update_irq(s);
 }
 
@@ -170,9 +165,7 @@ static uint32_t ivshmem_IntrStatus_read(IVShmemState *s)
 
     /* reading ISR clears all interrupts */
     s->intrstatus = 0;
-
     ivshmem_update_irq(s);
-
     return ret;
 }
 
-- 
2.4.3


* [Qemu-devel] [PATCH 17/38] ivshmem: Clean up MSI-X conditions
  2016-02-29 18:40 [Qemu-devel] [PATCH 00/38] ivshmem: Fixes, cleanups, device model split Markus Armbruster
                   ` (15 preceding siblings ...)
  2016-02-29 18:40 ` [Qemu-devel] [PATCH 16/38] ivshmem: Clean up register callbacks Markus Armbruster
@ 2016-02-29 18:40 ` Markus Armbruster
  2016-03-01 16:57   ` Marc-André Lureau
  2016-02-29 18:40 ` [Qemu-devel] [PATCH 18/38] ivshmem: Leave INTx alone when using MSI-X Markus Armbruster
                   ` (20 subsequent siblings)
  37 siblings, 1 reply; 118+ messages in thread
From: Markus Armbruster @ 2016-02-29 18:40 UTC (permalink / raw)
  To: qemu-devel; +Cc: claudio.fontana, cam, mlureau, david.marchand, pbonzini

There are three predicates related to MSI-X:

* ivshmem_has_feature(s, IVSHMEM_MSI) is true unless the non-MSI-X
  variant of the device is selected with msi=off.

* msix_present() is true when the device has the PCI capability MSI-X.
  It's initially false, and becomes true during successful realize of
  the MSI-X variant of the device.  Thus, it's the same as
  ivshmem_has_feature(s, IVSHMEM_MSI) for realized devices.

* msix_enabled() is true when msix_present() is true and guest software
  has enabled MSI-X.

Code that differs between the non-MSI-X and the MSI-X variant of the
device needs to be guarded by ivshmem_has_feature(s, IVSHMEM_MSI) or
by msix_present(), except the latter works only for realized devices.

Code that depends on whether MSI-X is in use needs to be guarded with
msix_enabled().
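
To make the relationship between the three predicates concrete, here
is a minimal toy model.  It is only a sketch: struct toy_dev and the
toy_*() helpers are hypothetical stand-ins, not QEMU's IVShmemState,
msix_present() or msix_enabled().

```c
#include <stdbool.h>

/* Hypothetical stand-in for the device state; not QEMU's IVShmemState. */
struct toy_dev {
    bool msi_feature;   /* models ivshmem_has_feature(s, IVSHMEM_MSI) */
    bool realized;      /* realize() of the device succeeded */
    bool guest_enabled; /* guest software set the MSI-X enable bit */
};

/* Capability exists only after successful realize of the MSI-X variant. */
static bool toy_msix_present(const struct toy_dev *d)
{
    return d->realized && d->msi_feature;
}

/* MSI-X is in use only if the capability exists and the guest enabled it. */
static bool toy_msix_enabled(const struct toy_dev *d)
{
    return toy_msix_present(d) && d->guest_enabled;
}
```

In this model, toy_msix_present() equals msi_feature for every
realized device, which is why either one can guard variant-specific
code once realize has succeeded.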

Code review led me to two minor messes:

* ivshmem_vector_notify() calls msix_notify() even when
  !msix_enabled(), unlike most other MSI-X-capable devices.  As far as
  I can tell, msix_notify() does nothing when !msix_enabled().  Add
  the guard anyway.

* Most callers of ivshmem_use_msix() guard it with
  ivshmem_has_feature(s, IVSHMEM_MSI).  Not necessary, because
  ivshmem_use_msix() does nothing when !msix_present().  That's
  ivshmem's only use of msix_present(), though.  Rename
  ivshmem_use_msix() to ivshmem_vector_use(), replace msix_present()
  by ivshmem_has_feature() there, and drop the redundant guards.

Signed-off-by: Markus Armbruster <armbru@redhat.com>
---
 hw/misc/ivshmem.c | 22 +++++++++-------------
 1 file changed, 9 insertions(+), 13 deletions(-)

diff --git a/hw/misc/ivshmem.c b/hw/misc/ivshmem.c
index 7191914..cfea151 100644
--- a/hw/misc/ivshmem.c
+++ b/hw/misc/ivshmem.c
@@ -274,7 +274,9 @@ static void ivshmem_vector_notify(void *opaque)
 
     IVSHMEM_DPRINTF("interrupt on vector %p %d\n", pdev, vector);
     if (ivshmem_has_feature(s, IVSHMEM_MSI)) {
-        msix_notify(pdev, vector);
+        if (msix_enabled(pdev)) {
+            msix_notify(pdev, vector);
+        }
     } else {
         ivshmem_IntrStatus_write(s, 1);
     }
@@ -712,13 +714,12 @@ static void ivshmem_check_version(void *opaque, const uint8_t * buf, int size)
 /* Select the MSI-X vectors used by device.
  * ivshmem maps events to vectors statically, so
  * we just enable all vectors on init and after reset. */
-static void ivshmem_use_msix(IVShmemState * s)
+static void ivshmem_vector_use(IVShmemState *s)
 {
     PCIDevice *d = PCI_DEVICE(s);
     int i;
 
-    IVSHMEM_DPRINTF("%s, msix present: %d\n", __func__, msix_present(d));
-    if (!msix_present(d)) {
+    if (!ivshmem_has_feature(s, IVSHMEM_MSI)) {
         return;
     }
 
@@ -733,7 +734,7 @@ static void ivshmem_reset(DeviceState *d)
 
     s->intrstatus = 0;
     s->intrmask = 0;
-    ivshmem_use_msix(s);
+    ivshmem_vector_use(s);
 }
 
 static int ivshmem_setup_interrupts(IVShmemState *s)
@@ -747,9 +748,9 @@ static int ivshmem_setup_interrupts(IVShmemState *s)
         }
 
         IVSHMEM_DPRINTF("msix initialized (%d vectors)\n", s->vectors);
-        ivshmem_use_msix(s);
     }
 
+    ivshmem_vector_use(s);
     return 0;
 }
 
@@ -1034,12 +1035,7 @@ static int ivshmem_pre_load(void *opaque)
 
 static int ivshmem_post_load(void *opaque, int version_id)
 {
-    IVShmemState *s = opaque;
-
-    if (ivshmem_has_feature(s, IVSHMEM_MSI)) {
-        ivshmem_use_msix(s);
-    }
-
+    ivshmem_vector_use(opaque);
     return 0;
 }
 
@@ -1067,11 +1063,11 @@ static int ivshmem_load_old(QEMUFile *f, void *opaque, int version_id)
 
     if (ivshmem_has_feature(s, IVSHMEM_MSI)) {
         msix_load(pdev, f);
-        ivshmem_use_msix(s);
     } else {
         s->intrstatus = qemu_get_be32(f);
         s->intrmask = qemu_get_be32(f);
     }
+    ivshmem_vector_use(s);
 
     return 0;
 }
-- 
2.4.3


* [Qemu-devel] [PATCH 18/38] ivshmem: Leave INTx alone when using MSI-X
  2016-02-29 18:40 [Qemu-devel] [PATCH 00/38] ivshmem: Fixes, cleanups, device model split Markus Armbruster
                   ` (16 preceding siblings ...)
  2016-02-29 18:40 ` [Qemu-devel] [PATCH 17/38] ivshmem: Clean up MSI-X conditions Markus Armbruster
@ 2016-02-29 18:40 ` Markus Armbruster
  2016-03-01 17:14   ` Marc-André Lureau
  2016-02-29 18:40 ` [Qemu-devel] [PATCH 19/38] ivshmem: Assert interrupts are set up once Markus Armbruster
                   ` (19 subsequent siblings)
  37 siblings, 1 reply; 118+ messages in thread
From: Markus Armbruster @ 2016-02-29 18:40 UTC (permalink / raw)
  To: qemu-devel; +Cc: claudio.fontana, cam, mlureau, david.marchand, pbonzini

The ivshmem device can either use MSI-X or legacy INTx for interrupts.

With MSI-X enabled, peer interrupt events trigger an MSI as they
should.  But software can still raise INTx via interrupt status and
mask register in BAR 0.  This is explicitly prohibited by PCI Local
Bus Specification Revision 3.0, section 6.8.3.3:

    While enabled for MSI or MSI-X operation, a function is prohibited
    from using its INTx# pin (if implemented) to request service (MSI,
    MSI-X, and INTx# are mutually exclusive).

Fix the device model to leave INTx alone when using MSI-X.

Document that we claim to use INTx in config space even when we don't.
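
The intended behavior can be sketched as a pure function.  This is an
illustration only: toy_intx_level() and its parameters are
hypothetical, with msi_variant standing in for
ivshmem_has_feature(s, IVSHMEM_MSI).

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Return the INTx level after a write to the status or mask register.
 * The MSI-X variant never touches the pin, so the current level is
 * returned unchanged.  The non-MSI-X variant raises INTx whenever an
 * unmasked status bit is set.
 */
static int toy_intx_level(bool msi_variant, uint32_t intrstatus,
                          uint32_t intrmask, int current_level)
{
    if (msi_variant) {
        return current_level;
    }
    return (intrstatus & intrmask) != 0;
}
```

Note that the guard is on the device variant, not on whether the guest
has enabled MSI-X; the model leaves INTx alone in both cases.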

Signed-off-by: Markus Armbruster <armbru@redhat.com>
---
 hw/misc/ivshmem.c | 9 +++++++++
 1 file changed, 9 insertions(+)

diff --git a/hw/misc/ivshmem.c b/hw/misc/ivshmem.c
index cfea151..fc37feb 100644
--- a/hw/misc/ivshmem.c
+++ b/hw/misc/ivshmem.c
@@ -126,6 +126,11 @@ static void ivshmem_update_irq(IVShmemState *s)
     PCIDevice *d = PCI_DEVICE(s);
     uint32_t isr = s->intrstatus & s->intrmask;
 
+    /* No INTx with msi=on, whether the guest enabled MSI-X or not */
+    if (ivshmem_has_feature(s, IVSHMEM_MSI)) {
+        return;
+    }
+
     /* don't print ISR resets */
     if (isr) {
         IVSHMEM_DPRINTF("Set IRQ to %d (%04x %04x)\n",
@@ -874,6 +879,10 @@ static void pci_ivshmem_realize(PCIDevice *dev, Error **errp)
     pci_conf = dev->config;
     pci_conf[PCI_COMMAND] = PCI_COMMAND_IO | PCI_COMMAND_MEMORY;
 
+    /*
+     * Note: we don't use INTx with IVSHMEM_MSI at all, so this is a
+     * bald-faced lie then.  But it's a backwards compatible lie.
+     */
     pci_config_set_interrupt_pin(pci_conf, 1);
 
     memory_region_init_io(&s->ivshmem_mmio, OBJECT(s), &ivshmem_mmio_ops, s,
-- 
2.4.3


* [Qemu-devel] [PATCH 19/38] ivshmem: Assert interrupts are set up once
  2016-02-29 18:40 [Qemu-devel] [PATCH 00/38] ivshmem: Fixes, cleanups, device model split Markus Armbruster
                   ` (17 preceding siblings ...)
  2016-02-29 18:40 ` [Qemu-devel] [PATCH 18/38] ivshmem: Leave INTx alone when using MSI-X Markus Armbruster
@ 2016-02-29 18:40 ` Markus Armbruster
  2016-03-02 12:02   ` Marc-André Lureau
  2016-02-29 18:40 ` [Qemu-devel] [PATCH 20/38] ivshmem: Simplify rejection of invalid peer ID from server Markus Armbruster
                   ` (18 subsequent siblings)
  37 siblings, 1 reply; 118+ messages in thread
From: Markus Armbruster @ 2016-02-29 18:40 UTC (permalink / raw)
  To: qemu-devel; +Cc: claudio.fontana, cam, mlureau, david.marchand, pbonzini

An interrupt is set up when the interrupt's file descriptor is
received.  Each message applies to the next interrupt vector.
Therefore, each vector cannot be set up more than once.

ivshmem_add_kvm_msi_virq() half-heartedly tries not to rely on this by
doing nothing then, but that's not going to recover from this error
should it become possible in the future.  watch_vector_notifier()
doesn't even try.

Simply assert what is the case, so we get alerted if we ever screw it
up.
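
As a rough sketch of the invariant being asserted (the names
toy_vector, toy_setup_vector() and TOY_VECTORS are made up for this
example and do not exist in QEMU):

```c
#include <assert.h>

#define TOY_VECTORS 4

/* Hypothetical per-vector slot; a non-null pdev means "already set up". */
struct toy_vector {
    void *pdev;
};

/* Set up one vector, asserting that nobody has set it up before. */
static void toy_setup_vector(struct toy_vector *tab, int vector, void *pdev)
{
    assert(vector >= 0 && vector < TOY_VECTORS);
    assert(!tab[vector].pdev);
    tab[vector].pdev = pdev;
}
```

A second call for the same vector aborts instead of silently returning,
which is the point: the condition is a programming error, not a
recoverable runtime state.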

Signed-off-by: Markus Armbruster <armbru@redhat.com>
---
 hw/misc/ivshmem.c | 7 ++-----
 1 file changed, 2 insertions(+), 5 deletions(-)

diff --git a/hw/misc/ivshmem.c b/hw/misc/ivshmem.c
index fc37feb..9d2209d 100644
--- a/hw/misc/ivshmem.c
+++ b/hw/misc/ivshmem.c
@@ -349,7 +349,7 @@ static void watch_vector_notifier(IVShmemState *s, EventNotifier *n,
 {
     int eventfd = event_notifier_get_fd(n);
 
-    /* if MSI is supported we need multiple interrupts */
+    assert(!s->msi_vectors[vector].pdev);
     s->msi_vectors[vector].pdev = PCI_DEVICE(s);
 
     qemu_set_fd_handler(eventfd, ivshmem_vector_notify,
@@ -535,10 +535,7 @@ static int ivshmem_add_kvm_msi_virq(IVShmemState *s, int vector)
     int ret;
 
     IVSHMEM_DPRINTF("ivshmem_add_kvm_msi_virq vector:%d\n", vector);
-
-    if (s->msi_vectors[vector].pdev != NULL) {
-        return 0;
-    }
+    assert(!s->msi_vectors[vector].pdev);
 
     ret = kvm_irqchip_add_msi_route(kvm_state, msg, pdev);
     if (ret < 0) {
-- 
2.4.3


* [Qemu-devel] [PATCH 20/38] ivshmem: Simplify rejection of invalid peer ID from server
  2016-02-29 18:40 [Qemu-devel] [PATCH 00/38] ivshmem: Fixes, cleanups, device model split Markus Armbruster
                   ` (18 preceding siblings ...)
  2016-02-29 18:40 ` [Qemu-devel] [PATCH 19/38] ivshmem: Assert interrupts are set up once Markus Armbruster
@ 2016-02-29 18:40 ` Markus Armbruster
  2016-03-02 15:08   ` Marc-André Lureau
  2016-02-29 18:40 ` [Qemu-devel] [PATCH 21/38] ivshmem: Disentangle ivshmem_read() Markus Armbruster
                   ` (17 subsequent siblings)
  37 siblings, 1 reply; 118+ messages in thread
From: Markus Armbruster @ 2016-02-29 18:40 UTC (permalink / raw)
  To: qemu-devel; +Cc: claudio.fontana, cam, mlureau, david.marchand, pbonzini

ivshmem_read() processes server messages.  These are 64-bit signed
integers.  -1 is shared memory setup, a value that fits in 16 bits
unsigned is a peer ID, anything else is invalid.

ivshmem_read() rejects invalid negative messages right away, silently.

Invalid positive messages get rejected only in resize_peers(), and
ivshmem_read() then prints the rather cryptic message "failed to
resize peers array".

Extend the first check to cover all invalid messages, make it report
"server sent invalid message", and drop the second check.

Now resize_peers() can't fail anymore; simplify.
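
The combined validity check amounts to the following classification.
This is a sketch with made-up names (toy_msg_kind, toy_classify_msg());
the upper bound mirrors the patch's IVSHMEM_MAX_PEERS, which is
UINT16_MAX.

```c
#include <stdint.h>

enum toy_msg_kind {
    TOY_MSG_INVALID,    /* rejected with "server sent invalid message" */
    TOY_MSG_SHMEM,      /* -1: shared memory setup */
    TOY_MSG_PEER        /* 0..UINT16_MAX: peer ID */
};

/* Classify a 64-bit server message with a single range check up front. */
static enum toy_msg_kind toy_classify_msg(int64_t msg)
{
    if (msg < -1 || msg > UINT16_MAX) {
        return TOY_MSG_INVALID;
    }
    if (msg == -1) {
        return TOY_MSG_SHMEM;
    }
    return TOY_MSG_PEER;
}
```

Because everything outside [-1, UINT16_MAX] is filtered here, any code
that later grows the peers array only ever sees a valid peer ID.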

Signed-off-by: Markus Armbruster <armbru@redhat.com>
---
 hw/misc/ivshmem.c | 61 ++++++++++++++++++++-----------------------------------
 1 file changed, 22 insertions(+), 39 deletions(-)

diff --git a/hw/misc/ivshmem.c b/hw/misc/ivshmem.c
index 9d2209d..5d33be4 100644
--- a/hw/misc/ivshmem.c
+++ b/hw/misc/ivshmem.c
@@ -39,7 +39,7 @@
 #define PCI_VENDOR_ID_IVSHMEM   PCI_VENDOR_ID_REDHAT_QUMRANET
 #define PCI_DEVICE_ID_IVSHMEM   0x1110
 
-#define IVSHMEM_MAX_PEERS G_MAXUINT16
+#define IVSHMEM_MAX_PEERS UINT16_MAX
 #define IVSHMEM_IOEVENTFD   0
 #define IVSHMEM_MSI     1
 
@@ -93,7 +93,7 @@ typedef struct IVShmemState {
     uint32_t ivshmem_64bit;
 
     Peer *peers;
-    int nb_peers; /* how many peers we have space for */
+    int nb_peers;               /* space in @peers[] */
 
     int vm_id;
     uint32_t vectors;
@@ -451,34 +451,21 @@ static void close_peer_eventfds(IVShmemState *s, int posn)
     s->peers[posn].nb_eventfds = 0;
 }
 
-/* this function increase the dynamic storage need to store data about other
- * peers */
-static int resize_peers(IVShmemState *s, int new_min_size)
+static void resize_peers(IVShmemState *s, int nb_peers)
 {
+    int old_nb_peers = s->nb_peers;
+    int i;
 
-    int j, old_size;
+    assert(nb_peers > old_nb_peers);
+    IVSHMEM_DPRINTF("bumping storage to %d peers\n", nb_peers);
 
-    /* limit number of max peers */
-    if (new_min_size <= 0 || new_min_size > IVSHMEM_MAX_PEERS) {
-        return -1;
-    }
-    if (new_min_size <= s->nb_peers) {
-        return 0;
-    }
-
-    old_size = s->nb_peers;
-    s->nb_peers = new_min_size;
+    s->peers = g_realloc(s->peers, nb_peers * sizeof(Peer));
+    s->nb_peers = nb_peers;
 
-    IVSHMEM_DPRINTF("bumping storage to %d peers\n", s->nb_peers);
-
-    s->peers = g_realloc(s->peers, s->nb_peers * sizeof(Peer));
-
-    for (j = old_size; j < s->nb_peers; j++) {
-        s->peers[j].eventfds = g_new0(EventNotifier, s->vectors);
-        s->peers[j].nb_eventfds = 0;
+    for (i = old_nb_peers; i < nb_peers; i++) {
+        s->peers[i].eventfds = g_new0(EventNotifier, s->vectors);
+        s->peers[i].nb_eventfds = 0;
     }
-
-    return 0;
 }
 
 static bool fifo_update_and_get(IVShmemState *s, const uint8_t *buf, int size,
@@ -590,25 +577,21 @@ static void ivshmem_read(void *opaque, const uint8_t *buf, int size)
         return;
     }
 
-    if (incoming_posn < -1) {
-        IVSHMEM_DPRINTF("invalid incoming_posn %" PRId64 "\n", incoming_posn);
-        return;
-    }
-
-    /* pick off s->server_chr->msgfd and store it, posn should accompany msg */
     incoming_fd = qemu_chr_fe_get_msgfd(s->server_chr);
     IVSHMEM_DPRINTF("posn is %" PRId64 ", fd is %d\n",
                     incoming_posn, incoming_fd);
 
-    /* make sure we have enough space for this peer */
+    if (incoming_posn < -1 || incoming_posn > IVSHMEM_MAX_PEERS) {
+        error_report("server sent invalid message %" PRId64,
+                     incoming_posn);
+        if (incoming_fd != -1) {
+            close(incoming_fd);
+        }
+        return;
+    }
+
     if (incoming_posn >= s->nb_peers) {
-        if (resize_peers(s, incoming_posn + 1) < 0) {
-            error_report("failed to resize peers array");
-            if (incoming_fd != -1) {
-                close(incoming_fd);
-            }
-            return;
-        }
+        resize_peers(s, incoming_posn + 1);
     }
 
     peer = &s->peers[incoming_posn];
-- 
2.4.3


* [Qemu-devel] [PATCH 21/38] ivshmem: Disentangle ivshmem_read()
  2016-02-29 18:40 [Qemu-devel] [PATCH 00/38] ivshmem: Fixes, cleanups, device model split Markus Armbruster
                   ` (19 preceding siblings ...)
  2016-02-29 18:40 ` [Qemu-devel] [PATCH 20/38] ivshmem: Simplify rejection of invalid peer ID from server Markus Armbruster
@ 2016-02-29 18:40 ` Markus Armbruster
  2016-03-02 15:28   ` Marc-André Lureau
  2016-02-29 18:40 ` [Qemu-devel] [PATCH 22/38] ivshmem: Plug leaks on unplug, fix peer disconnect Markus Armbruster
                   ` (16 subsequent siblings)
  37 siblings, 1 reply; 118+ messages in thread
From: Markus Armbruster @ 2016-02-29 18:40 UTC (permalink / raw)
  To: qemu-devel; +Cc: claudio.fontana, cam, mlureau, david.marchand, pbonzini

Signed-off-by: Markus Armbruster <armbru@redhat.com>
---
 hw/misc/ivshmem.c | 189 +++++++++++++++++++++++++++---------------------------
 1 file changed, 96 insertions(+), 93 deletions(-)

diff --git a/hw/misc/ivshmem.c b/hw/misc/ivshmem.c
index 5d33be4..fc46666 100644
--- a/hw/misc/ivshmem.c
+++ b/hw/misc/ivshmem.c
@@ -564,14 +564,106 @@ static void setup_interrupt(IVShmemState *s, int vector)
     }
 }
 
+static void process_msg_shmem(IVShmemState *s, int fd)
+{
+    Error *err = NULL;
+    void *ptr;
+
+    if (memory_region_is_mapped(&s->ivshmem)) {
+        error_report("shm already initialized");
+        close(fd);
+        return;
+    }
+
+    if (check_shm_size(s, fd, &err) == -1) {
+        error_report_err(err);
+        close(fd);
+        return;
+    }
+
+    /* mmap the region and map into the BAR2 */
+    ptr = mmap(0, s->ivshmem_size, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
+    if (ptr == MAP_FAILED) {
+        error_report("Failed to mmap shared memory %s", strerror(errno));
+        close(fd);
+        return;
+    }
+    memory_region_init_ram_ptr(&s->ivshmem, OBJECT(s),
+                               "ivshmem.bar2", s->ivshmem_size, ptr);
+    qemu_set_ram_fd(s->ivshmem.ram_addr, fd);
+    vmstate_register_ram(&s->ivshmem, DEVICE(s));
+    memory_region_add_subregion(&s->bar, 0, &s->ivshmem);
+}
+
+static void process_msg_disconnect(IVShmemState *s, uint16_t posn)
+{
+    IVSHMEM_DPRINTF("posn %d has gone away\n", posn);
+    close_peer_eventfds(s, posn);
+}
+
+static void process_msg_connect(IVShmemState *s, uint16_t posn, int fd)
+{
+    Peer *peer = &s->peers[posn];
+    int vector;
+
+    /*
+     * The N-th connect message for this peer comes with the file
+     * descriptor for vector N-1.  Count messages to find the vector.
+     */
+    if (peer->nb_eventfds >= s->vectors) {
+        error_report("Too many eventfd received, device has %d vectors",
+                     s->vectors);
+        close(fd);
+        return;
+    }
+    vector = peer->nb_eventfds++;
+
+    IVSHMEM_DPRINTF("eventfds[%d][%d] = %d\n", posn, vector, fd);
+    event_notifier_init_fd(&peer->eventfds[vector], fd);
+    fcntl_setfl(fd, O_NONBLOCK); /* msix/irqfd poll non block */
+
+    if (posn == s->vm_id) {
+        setup_interrupt(s, vector);
+    }
+
+    if (ivshmem_has_feature(s, IVSHMEM_IOEVENTFD)) {
+        ivshmem_add_eventfd(s, posn, vector);
+    }
+}
+
+static void process_msg(IVShmemState *s, int64_t msg, int fd)
+{
+    IVSHMEM_DPRINTF("posn is %" PRId64 ", fd is %d\n", msg, fd);
+
+    if (msg < -1 || msg > IVSHMEM_MAX_PEERS) {
+        error_report("server sent invalid message %" PRId64, msg);
+        close(fd);
+        return;
+    }
+
+    if (msg == -1) {
+        process_msg_shmem(s, fd);
+        return;
+    }
+
+    if (msg >= s->nb_peers) {
+        resize_peers(s, msg + 1);
+    }
+
+    if (fd >= 0) {
+        process_msg_connect(s, msg, fd);
+    } else if (s->vm_id == -1) {
+        s->vm_id = msg;
+    } else {
+        process_msg_disconnect(s, msg);
+    }
+}
+
 static void ivshmem_read(void *opaque, const uint8_t *buf, int size)
 {
     IVShmemState *s = opaque;
     int incoming_fd;
-    int new_eventfd;
     int64_t incoming_posn;
-    Error *err = NULL;
-    Peer *peer;
 
     if (!fifo_update_and_get_i64(s, buf, size, &incoming_posn)) {
         return;
@@ -581,96 +673,7 @@ static void ivshmem_read(void *opaque, const uint8_t *buf, int size)
     IVSHMEM_DPRINTF("posn is %" PRId64 ", fd is %d\n",
                     incoming_posn, incoming_fd);
 
-    if (incoming_posn < -1 || incoming_posn > IVSHMEM_MAX_PEERS) {
-        error_report("server sent invalid message %" PRId64,
-                     incoming_posn);
-        if (incoming_fd != -1) {
-            close(incoming_fd);
-        }
-        return;
-    }
-
-    if (incoming_posn >= s->nb_peers) {
-        resize_peers(s, incoming_posn + 1);
-    }
-
-    peer = &s->peers[incoming_posn];
-
-    if (incoming_fd == -1) {
-        /* if posn is positive and unseen before then this is our posn*/
-        if (incoming_posn >= 0 && s->vm_id == -1) {
-            /* receive our posn */
-            s->vm_id = incoming_posn;
-        } else {
-            /* otherwise an fd == -1 means an existing peer has gone away */
-            IVSHMEM_DPRINTF("posn %" PRId64 " has gone away\n", incoming_posn);
-            close_peer_eventfds(s, incoming_posn);
-        }
-        return;
-    }
-
-    /* if the position is -1, then it's shared memory region fd */
-    if (incoming_posn == -1) {
-        void * map_ptr;
-
-        if (memory_region_is_mapped(&s->ivshmem)) {
-            error_report("shm already initialized");
-            close(incoming_fd);
-            return;
-        }
-
-        if (check_shm_size(s, incoming_fd, &err) == -1) {
-            error_report_err(err);
-            close(incoming_fd);
-            return;
-        }
-
-        /* mmap the region and map into the BAR2 */
-        map_ptr = mmap(0, s->ivshmem_size, PROT_READ|PROT_WRITE, MAP_SHARED,
-                                                            incoming_fd, 0);
-        if (map_ptr == MAP_FAILED) {
-            error_report("Failed to mmap shared memory %s", strerror(errno));
-            close(incoming_fd);
-            return;
-        }
-        memory_region_init_ram_ptr(&s->ivshmem, OBJECT(s),
-                                   "ivshmem.bar2", s->ivshmem_size, map_ptr);
-        qemu_set_ram_fd(s->ivshmem.ram_addr, incoming_fd);
-        vmstate_register_ram(&s->ivshmem, DEVICE(s));
-
-        IVSHMEM_DPRINTF("guest h/w addr = %p, size = %" PRIu64 "\n",
-                        map_ptr, s->ivshmem_size);
-
-        memory_region_add_subregion(&s->bar, 0, &s->ivshmem);
-
-        return;
-    }
-
-    /* each peer has an associated array of eventfds, and we keep
-     * track of how many eventfds received so far */
-    /* get a new eventfd: */
-    if (peer->nb_eventfds >= s->vectors) {
-        error_report("Too many eventfd received, device has %d vectors",
-                     s->vectors);
-        close(incoming_fd);
-        return;
-    }
-
-    new_eventfd = peer->nb_eventfds++;
-
-    /* this is an eventfd for a particular peer VM */
-    IVSHMEM_DPRINTF("eventfds[%" PRId64 "][%d] = %d\n", incoming_posn,
-                    new_eventfd, incoming_fd);
-    event_notifier_init_fd(&peer->eventfds[new_eventfd], incoming_fd);
-    fcntl_setfl(incoming_fd, O_NONBLOCK); /* msix/irqfd poll non block */
-
-    if (incoming_posn == s->vm_id) {
-        setup_interrupt(s, new_eventfd);
-    }
-
-    if (ivshmem_has_feature(s, IVSHMEM_IOEVENTFD)) {
-        ivshmem_add_eventfd(s, incoming_posn, new_eventfd);
-    }
+    process_msg(s, incoming_posn, incoming_fd);
 }
 
 static void ivshmem_check_version(void *opaque, const uint8_t * buf, int size)
-- 
2.4.3


* [Qemu-devel] [PATCH 22/38] ivshmem: Plug leaks on unplug, fix peer disconnect
  2016-02-29 18:40 [Qemu-devel] [PATCH 00/38] ivshmem: Fixes, cleanups, device model split Markus Armbruster
                   ` (20 preceding siblings ...)
  2016-02-29 18:40 ` [Qemu-devel] [PATCH 21/38] ivshmem: Disentangle ivshmem_read() Markus Armbruster
@ 2016-02-29 18:40 ` Markus Armbruster
  2016-03-02 17:47   ` Marc-André Lureau
  2016-02-29 18:40 ` [Qemu-devel] [PATCH 23/38] ivshmem: Receive shared memory synchronously in realize() Markus Armbruster
                   ` (15 subsequent siblings)
  37 siblings, 1 reply; 118+ messages in thread
From: Markus Armbruster @ 2016-02-29 18:40 UTC (permalink / raw)
  To: qemu-devel; +Cc: claudio.fontana, cam, mlureau, david.marchand, pbonzini

close_peer_eventfds() cleans up three things: ioeventfd triggers if
they exist, eventfds, and the array to store them.

Commit 98609cd (v1.2.0) fixed it not to clean up ioeventfd triggers
when they don't exist (property ioeventfd=off, which is the default).
Unfortunately, the fix also made it skip cleanup of the eventfds and
the array then.  This is a memory and file descriptor leak on unplug.

Additionally, the reset of nb_eventfds is skipped.  Doesn't matter on
unplug.  On peer disconnect, however, this permanently wedges the
interrupt vectors used for that peer's ID.  The eventfds stay behind,
but aren't connected to a peer anymore.  When the ID gets recycled for
a new peer, the new peer's eventfds get assigned to vectors after the
old ones.  Commonly, the device's number of vectors matches the
server's, so the new ones get dropped with a "Too many eventfd
received" message.  Interrupts either don't work (common case) or go
to the wrong vector.

Fix by narrowing the conditional to just the ioeventfd trigger
cleanup.

While there, move the "invalid" peer check to the only caller where it
can actually happen.
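
The wedging can be reproduced with a toy counter model.  It is a
sketch under simplifying assumptions: toy_peer, toy_connect() and
toy_disconnect() are hypothetical, and the reset_count flag stands for
"disconnect cleanup actually ran".

```c
#include <stdbool.h>

#define TOY_VECTORS 2   /* device's number of vectors, same as server's */

struct toy_peer {
    int nb_eventfds;    /* eventfds received so far for this peer ID */
};

/*
 * One connect message: returns the vector the eventfd is assigned to,
 * or -1 if it is dropped ("Too many eventfd received").
 */
static int toy_connect(struct toy_peer *p)
{
    if (p->nb_eventfds >= TOY_VECTORS) {
        return -1;
    }
    return p->nb_eventfds++;
}

/* Disconnect; with reset_count false, the count is left stale. */
static void toy_disconnect(struct toy_peer *p, bool reset_count)
{
    if (reset_count) {
        p->nb_eventfds = 0;
    }
}
```

With reset_count false, a peer that reuses the ID gets -1 for every
vector; with reset_count true, it starts again at vector 0.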

Cc: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Markus Armbruster <armbru@redhat.com>
---
 hw/misc/ivshmem.c | 24 ++++++++++++------------
 1 file changed, 12 insertions(+), 12 deletions(-)

diff --git a/hw/misc/ivshmem.c b/hw/misc/ivshmem.c
index fc46666..c366087 100644
--- a/hw/misc/ivshmem.c
+++ b/hw/misc/ivshmem.c
@@ -428,21 +428,17 @@ static void close_peer_eventfds(IVShmemState *s, int posn)
 {
     int i, n;
 
-    if (!ivshmem_has_feature(s, IVSHMEM_IOEVENTFD)) {
-        return;
-    }
-    if (posn < 0 || posn >= s->nb_peers) {
-        error_report("invalid peer %d", posn);
-        return;
-    }
-
+    assert(posn >= 0 && posn < s->nb_peers);
     n = s->peers[posn].nb_eventfds;
 
-    memory_region_transaction_begin();
-    for (i = 0; i < n; i++) {
-        ivshmem_del_eventfd(s, posn, i);
+    if (ivshmem_has_feature(s, IVSHMEM_IOEVENTFD)) {
+        memory_region_transaction_begin();
+        for (i = 0; i < n; i++) {
+            ivshmem_del_eventfd(s, posn, i);
+        }
+        memory_region_transaction_commit();
     }
-    memory_region_transaction_commit();
+
     for (i = 0; i < n; i++) {
         event_notifier_cleanup(&s->peers[posn].eventfds[i]);
     }
@@ -598,6 +594,10 @@ static void process_msg_shmem(IVShmemState *s, int fd)
 static void process_msg_disconnect(IVShmemState *s, uint16_t posn)
 {
     IVSHMEM_DPRINTF("posn %d has gone away\n", posn);
+    if (posn >= s->nb_peers) {
+        error_report("invalid peer %d", posn);
+        return;
+    }
     close_peer_eventfds(s, posn);
 }
 
-- 
2.4.3


* [Qemu-devel] [PATCH 23/38] ivshmem: Receive shared memory synchronously in realize()
  2016-02-29 18:40 [Qemu-devel] [PATCH 00/38] ivshmem: Fixes, cleanups, device model split Markus Armbruster
                   ` (21 preceding siblings ...)
  2016-02-29 18:40 ` [Qemu-devel] [PATCH 22/38] ivshmem: Plug leaks on unplug, fix peer disconnect Markus Armbruster
@ 2016-02-29 18:40 ` Markus Armbruster
  2016-03-02 18:11   ` Marc-André Lureau
  2016-02-29 18:40 ` [Qemu-devel] [PATCH 24/38] ivshmem: Propagate errors through ivshmem_recv_setup() Markus Armbruster
                   ` (14 subsequent siblings)
  37 siblings, 1 reply; 118+ messages in thread
From: Markus Armbruster @ 2016-02-29 18:40 UTC (permalink / raw)
  To: qemu-devel; +Cc: claudio.fontana, cam, mlureau, david.marchand, pbonzini

When configured for interrupts (property "chardev" given), we receive
the shared memory from an ivshmem server.  We do so asynchronously
after realize() completes, by setting up callbacks with
qemu_chr_add_handlers().

Keeping server I/O out of realize() that way avoids delays due to a
slow server.  This is probably relevant only for hot plug.

However, this funny "no shared memory, yet" state of the device also
causes a raft of issues that are hard or impossible to work around:

* The guest is exposed to this state: when we enter and leave it, its
  shared memory contents are abruptly replaced, and device register
  IVPosition changes.

  This is a known issue.  We document that guests should not access
  the shared memory after device initialization until the IVPosition
  register becomes non-negative.

  For cold plug, the funny state is unlikely to be visible in
  practice, because we normally receive the shared memory long before
  the guest gets around to messing with the device.

  For hot plug, the timing is tighter, but the relative slowness of
  PCI device configuration has a good chance to hide the funny state.

  In either case, guests complying with the documented procedure are
  safe.

* Migration becomes racy.

  If migration completes before the shared memory setup completes on
  the source, shared memory contents are silently lost.  Fortunately,
  migration is rather unlikely to win this race.

  If the shared memory's ramblock arrives at the destination before
  shared memory setup completes, migration fails.

  There is no known way for a management application to wait for
  shared memory setup to complete.

  All you can do is retry failed migration.  You can improve your
  chances by leaving more time between running the destination QEMU
  and the migrate command.

  To mitigate silent memory loss, you need to ensure the server
  initializes shared memory exactly the same on source and
  destination.

  These issues are entirely undocumented so far.

I'd expect the server to be almost always fast enough to hide these
issues.  But then rare catastrophic races are in a way the worst kind.

This is way more trouble than I'm willing to take from any device.
Kill the funny state by receiving shared memory synchronously in
realize().  If your hot plug hangs, go kill your ivshmem server.

For easier review, this commit only makes the receive synchronous; it
doesn't add the necessary error propagation.  Without that, the funny
state persists.  The next commit will do that, and kill it off for
real.
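
For readers unfamiliar with synchronous receive loops, here is a
minimal sketch of reading one 64-bit message from a blocking file
descriptor.  It uses plain POSIX read() rather than QEMU's chardev
layer, assumes the message arrives in the reader's native byte order,
and toy_recv_i64() is a made-up name.

```c
#include <errno.h>
#include <stdint.h>
#include <unistd.h>

/*
 * Block until a whole int64_t has been read from @fd.
 * Returns 0 on success, -1 on read error or premature end of file.
 */
static int toy_recv_i64(int fd, int64_t *msg)
{
    unsigned char *p = (unsigned char *)msg;
    size_t n = 0;

    while (n < sizeof(*msg)) {
        ssize_t ret = read(fd, p + n, sizeof(*msg) - n);

        if (ret < 0) {
            if (errno == EINTR) {
                continue;       /* interrupted, nothing consumed: retry */
            }
            return -1;
        }
        if (ret == 0) {
            return -1;          /* sender closed before a full message */
        }
        n += (size_t)ret;       /* short read: keep going */
    }
    return 0;
}
```

The caller simply does not return until the message is complete, so no
intermediate "half a message" state is visible to the rest of the
device.  The price is that a stalled sender stalls the caller.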

Signed-off-by: Markus Armbruster <armbru@redhat.com>
---
 hw/misc/ivshmem.c    | 70 +++++++++++++++++++++++++++++++++++++---------------
 tests/ivshmem-test.c | 26 ++++++-------------
 2 files changed, 57 insertions(+), 39 deletions(-)

diff --git a/hw/misc/ivshmem.c b/hw/misc/ivshmem.c
index c366087..352937f 100644
--- a/hw/misc/ivshmem.c
+++ b/hw/misc/ivshmem.c
@@ -676,27 +676,47 @@ static void ivshmem_read(void *opaque, const uint8_t *buf, int size)
     process_msg(s, incoming_posn, incoming_fd);
 }
 
-static void ivshmem_check_version(void *opaque, const uint8_t * buf, int size)
+static int64_t ivshmem_recv_msg(IVShmemState *s, int *pfd)
 {
-    IVShmemState *s = opaque;
-    int tmp;
-    int64_t version;
+    int64_t msg;
+    int n, ret;
 
-    if (!fifo_update_and_get_i64(s, buf, size, &version)) {
-        return;
-    }
+    n = 0;
+    do {
+        ret = qemu_chr_fe_read_all(s->server_chr, (uint8_t *)&msg + n,
+                                 sizeof(msg) - n);
+        if (ret < 0 && ret != -EINTR) {
+            /* TODO error handling */
+            return INT64_MIN;
+        }
+        n += ret;
+    } while (n < sizeof(msg));
 
-    tmp = qemu_chr_fe_get_msgfd(s->server_chr);
-    if (tmp != -1 || version != IVSHMEM_PROTOCOL_VERSION) {
+    *pfd = qemu_chr_fe_get_msgfd(s->server_chr);
+    return msg;
+}
+
+static void ivshmem_recv_setup(IVShmemState *s)
+{
+    int64_t msg;
+    int fd;
+
+    msg = ivshmem_recv_msg(s, &fd);
+    if (fd != -1 || msg != IVSHMEM_PROTOCOL_VERSION) {
         fprintf(stderr, "incompatible version, you are connecting to a ivshmem-"
                 "server using a different protocol please check your setup\n");
-        qemu_chr_add_handlers(s->server_chr, NULL, NULL, NULL, s);
         return;
     }
 
-    IVSHMEM_DPRINTF("version check ok, switch to real chardev handler\n");
-    qemu_chr_add_handlers(s->server_chr, ivshmem_can_receive, ivshmem_read,
-                          NULL, s);
+    /*
+     * Receive more messages until we got shared memory.
+     */
+    do {
+        msg = ivshmem_recv_msg(s, &fd);
+        process_msg(s, msg, fd);
+    } while (msg != -1);
+
+    assert(memory_region_is_mapped(&s->ivshmem));
 }
 
 /* Select the MSI-X vectors used by device.
@@ -903,19 +923,29 @@ static void pci_ivshmem_realize(PCIDevice *dev, Error **errp)
         IVSHMEM_DPRINTF("using shared memory server (socket = %s)\n",
                         s->server_chr->filename);
 
-        if (ivshmem_setup_interrupts(s) < 0) {
-            error_setg(errp, "failed to initialize interrupts");
-            return;
-        }
-
         /* we allocate enough space for 16 peers and grow as needed */
         resize_peers(s, 16);
         s->vm_id = -1;
 
         pci_register_bar(dev, 2, attr, &s->bar);
 
-        qemu_chr_add_handlers(s->server_chr, ivshmem_can_receive,
-                              ivshmem_check_version, NULL, s);
+        /*
+         * Receive setup messages from server synchronously.
+         * Older versions did it asynchronously, but that creates a
+         * number of entertaining race conditions.
+         * TODO Propagate errors!  Without that, we still have races
+         * on errors.
+         */
+        ivshmem_recv_setup(s);
+        if (memory_region_is_mapped(&s->ivshmem)) {
+            qemu_chr_add_handlers(s->server_chr, ivshmem_can_receive,
+                                  ivshmem_read, NULL, s);
+        }
+
+        if (ivshmem_setup_interrupts(s) < 0) {
+            error_setg(errp, "failed to initialize interrupts");
+            return;
+        }
     } else {
         /* just map the file immediately, we're not using a server */
         int fd;
diff --git a/tests/ivshmem-test.c b/tests/ivshmem-test.c
index c1dd7bb..68d6840 100644
--- a/tests/ivshmem-test.c
+++ b/tests/ivshmem-test.c
@@ -309,35 +309,23 @@ static void test_ivshmem_server(bool msi)
     ret = ivshmem_server_start(&server);
     g_assert_cmpint(ret, ==, 0);
 
-    setup_vm_with_server(&state1, nvectors, msi);
-    s1 = &state1;
-    setup_vm_with_server(&state2, nvectors, msi);
-    s2 = &state2;
-
-    /* check state before server sends stuff */
-    g_assert_cmpuint(in_reg(s1, IVPOSITION), ==, 0xffffffff);
-    g_assert_cmpuint(in_reg(s2, IVPOSITION), ==, 0xffffffff);
-    g_assert_cmpuint(qtest_readb(s1->qtest, (uintptr_t)s1->mem_base), ==, 0x00);
-
     thread.server = &server;
     ret = pipe(thread.pipe);
     g_assert_cmpint(ret, ==, 0);
     thread.thread = g_thread_new("ivshmem-server", server_thread, &thread);
     g_assert(thread.thread != NULL);
 
-    /* waiting for devices to become operational */
-    while (g_get_monotonic_time() < end_time) {
-        g_usleep(1000);
-        if ((int)in_reg(s1, IVPOSITION) >= 0 &&
-            (int)in_reg(s2, IVPOSITION) >= 0) {
-            break;
-        }
-    }
+    setup_vm_with_server(&state1, nvectors, msi);
+    s1 = &state1;
+    setup_vm_with_server(&state2, nvectors, msi);
+    s2 = &state2;
 
     /* check got different VM ids */
     vm1 = in_reg(s1, IVPOSITION);
     vm2 = in_reg(s2, IVPOSITION);
-    g_assert_cmpuint(vm1, !=, vm2);
+    g_assert_cmpint(vm1, >=, 0);
+    g_assert_cmpint(vm2, >=, 0);
+    g_assert_cmpint(vm1, !=, vm2);
 
     /* check number of MSI-X vectors */
     global_qtest = s1->qtest;
-- 
2.4.3

* [Qemu-devel] [PATCH 24/38] ivshmem: Propagate errors through ivshmem_recv_setup()
  2016-02-29 18:40 [Qemu-devel] [PATCH 00/38] ivshmem: Fixes, cleanups, device model split Markus Armbruster
                   ` (22 preceding siblings ...)
  2016-02-29 18:40 ` [Qemu-devel] [PATCH 23/38] ivshmem: Receive shared memory synchronously in realize() Markus Armbruster
@ 2016-02-29 18:40 ` Markus Armbruster
  2016-03-02 18:27   ` Marc-André Lureau
  2016-02-29 18:40 ` [Qemu-devel] [PATCH 25/38] ivshmem: Rely on server sending the ID right after the version Markus Armbruster
                   ` (13 subsequent siblings)
  37 siblings, 1 reply; 118+ messages in thread
From: Markus Armbruster @ 2016-02-29 18:40 UTC (permalink / raw)
  To: qemu-devel; +Cc: claudio.fontana, cam, mlureau, david.marchand, pbonzini

This kills off the funny state described in the previous commit.

Simplify ivshmem_io_read() accordingly, and update documentation.

Signed-off-by: Markus Armbruster <armbru@redhat.com>
---
 docs/specs/ivshmem-spec.txt |  20 ++++----
 hw/misc/ivshmem.c           | 121 +++++++++++++++++++++++++++-----------------
 qemu-doc.texi               |   9 +---
 3 files changed, 87 insertions(+), 63 deletions(-)

diff --git a/docs/specs/ivshmem-spec.txt b/docs/specs/ivshmem-spec.txt
index 4fc6f37..3eb8c97 100644
--- a/docs/specs/ivshmem-spec.txt
+++ b/docs/specs/ivshmem-spec.txt
@@ -62,11 +62,11 @@ There are two ways to use this device:
   likely want to write a kernel driver to handle interrupts.  Requires
   the device to be configured for interrupts, obviously.
 
-If the device is configured for interrupts, BAR2 is initially invalid.
-It becomes safely accessible only after the ivshmem server provided
-the shared memory.  Guest software should wait for the IVPosition
-register (described below) to become non-negative before accessing
-BAR2.
+Before QEMU 2.6.0, BAR2 can initially be invalid if the device is
+configured for interrupts.  It becomes safely accessible only after
+the ivshmem server provided the shared memory.  Guest software should
+wait for the IVPosition register (described below) to become
+non-negative before accessing BAR2.
 
 The device is not capable to tell guest software whether it is
 configured for interrupts.
@@ -82,7 +82,7 @@ BAR 0 contains the following registers:
         4     4   read/write        0   Interrupt Status
                                         bit 0: peer interrupt
                                         bit 1..31: reserved
-        8     4   read-only   0 or -1   IVPosition
+        8     4   read-only   0 or ID   IVPosition
        12     4   write-only      N/A   Doorbell
                                         bit 0..15: vector
                                         bit 16..31: peer ID
@@ -100,12 +100,14 @@ when an interrupt request from a peer is received.  Reading the
 register clears it.
 
 IVPosition Register: if the device is not configured for interrupts,
-this is zero.  Else, it's -1 for a short while after reset, then
-changes to the device's ID (between 0 and 65535).
+this is zero.  Else, it is the device's ID (between 0 and 65535).
+
+Before QEMU 2.6.0, the register may read -1 for a short while after
+reset.
 
 There is no good way for software to find out whether the device is
 configured for interrupts.  A positive IVPosition means interrupts,
-but zero could be either.  The initial -1 cannot be reliably observed.
+but zero could be either.
 
 Doorbell Register: writing this register requests to interrupt a peer.
 The written value's high 16 bits are the ID of the peer to interrupt,
diff --git a/hw/misc/ivshmem.c b/hw/misc/ivshmem.c
index 352937f..831da53 100644
--- a/hw/misc/ivshmem.c
+++ b/hw/misc/ivshmem.c
@@ -234,12 +234,7 @@ static uint64_t ivshmem_io_read(void *opaque, hwaddr addr,
             break;
 
         case IVPOSITION:
-            /* return my VM ID if the memory is mapped */
-            if (memory_region_is_mapped(&s->ivshmem)) {
-                ret = s->vm_id;
-            } else {
-                ret = -1;
-            }
+            ret = s->vm_id;
             break;
 
         default:
@@ -511,7 +506,8 @@ static bool fifo_update_and_get_i64(IVShmemState *s,
     return false;
 }
 
-static int ivshmem_add_kvm_msi_virq(IVShmemState *s, int vector)
+static void ivshmem_add_kvm_msi_virq(IVShmemState *s, int vector,
+                                     Error **errp)
 {
     PCIDevice *pdev = PCI_DEVICE(s);
     MSIMessage msg = msix_get_message(pdev, vector);
@@ -522,22 +518,21 @@ static int ivshmem_add_kvm_msi_virq(IVShmemState *s, int vector)
 
     ret = kvm_irqchip_add_msi_route(kvm_state, msg, pdev);
     if (ret < 0) {
-        error_report("ivshmem: kvm_irqchip_add_msi_route failed");
-        return -1;
+        error_setg(errp, "kvm_irqchip_add_msi_route failed");
+        return;
     }
 
     s->msi_vectors[vector].virq = ret;
     s->msi_vectors[vector].pdev = pdev;
-
-    return 0;
 }
 
-static void setup_interrupt(IVShmemState *s, int vector)
+static void setup_interrupt(IVShmemState *s, int vector, Error **errp)
 {
     EventNotifier *n = &s->peers[s->vm_id].eventfds[vector];
     bool with_irqfd = kvm_msi_via_irqfd_enabled() &&
         ivshmem_has_feature(s, IVSHMEM_MSI);
     PCIDevice *pdev = PCI_DEVICE(s);
+    Error *err = NULL;
 
     IVSHMEM_DPRINTF("setting up interrupt for vector: %d\n", vector);
 
@@ -546,13 +541,16 @@ static void setup_interrupt(IVShmemState *s, int vector)
         watch_vector_notifier(s, n, vector);
     } else if (msix_enabled(pdev)) {
         IVSHMEM_DPRINTF("with irqfd\n");
-        if (ivshmem_add_kvm_msi_virq(s, vector) < 0) {
+        ivshmem_add_kvm_msi_virq(s, vector, &err);
+        if (err) {
+            error_propagate(errp, err);
             return;
         }
 
         if (!msix_is_masked(pdev, vector)) {
             kvm_irqchip_add_irqfd_notifier_gsi(kvm_state, n, NULL,
                                                s->msi_vectors[vector].virq);
+            /* TODO handle error */
         }
     } else {
         /* it will be delayed until msix is enabled, in write_config */
@@ -560,19 +558,19 @@ static void setup_interrupt(IVShmemState *s, int vector)
     }
 }
 
-static void process_msg_shmem(IVShmemState *s, int fd)
+static void process_msg_shmem(IVShmemState *s, int fd, Error **errp)
 {
     Error *err = NULL;
     void *ptr;
 
     if (memory_region_is_mapped(&s->ivshmem)) {
-        error_report("shm already initialized");
+        error_setg(errp, "server sent unexpected shared memory message");
         close(fd);
         return;
     }
 
     if (check_shm_size(s, fd, &err) == -1) {
-        error_report_err(err);
+        error_propagate(errp, err);
         close(fd);
         return;
     }
@@ -580,7 +578,7 @@ static void process_msg_shmem(IVShmemState *s, int fd)
     /* mmap the region and map into the BAR2 */
     ptr = mmap(0, s->ivshmem_size, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
     if (ptr == MAP_FAILED) {
-        error_report("Failed to mmap shared memory %s", strerror(errno));
+        error_setg_errno(errp, errno, "Failed to mmap shared memory");
         close(fd);
         return;
     }
@@ -591,17 +589,19 @@ static void process_msg_shmem(IVShmemState *s, int fd)
     memory_region_add_subregion(&s->bar, 0, &s->ivshmem);
 }
 
-static void process_msg_disconnect(IVShmemState *s, uint16_t posn)
+static void process_msg_disconnect(IVShmemState *s, uint16_t posn,
+                                   Error **errp)
 {
     IVSHMEM_DPRINTF("posn %d has gone away\n", posn);
     if (posn >= s->nb_peers) {
-        error_report("invalid peer %d", posn);
+        error_setg(errp, "invalid peer %d", posn);
         return;
     }
     close_peer_eventfds(s, posn);
 }
 
-static void process_msg_connect(IVShmemState *s, uint16_t posn, int fd)
+static void process_msg_connect(IVShmemState *s, uint16_t posn, int fd,
+                                Error **errp)
 {
     Peer *peer = &s->peers[posn];
     int vector;
@@ -611,8 +611,8 @@ static void process_msg_connect(IVShmemState *s, uint16_t posn, int fd)
      * descriptor for vector N-1.  Count messages to find the vector.
      */
     if (peer->nb_eventfds >= s->vectors) {
-        error_report("Too many eventfd received, device has %d vectors",
-                     s->vectors);
+        error_setg(errp, "Too many eventfd received, device has %d vectors",
+                   s->vectors);
         close(fd);
         return;
     }
@@ -623,7 +623,8 @@ static void process_msg_connect(IVShmemState *s, uint16_t posn, int fd)
     fcntl_setfl(fd, O_NONBLOCK); /* msix/irqfd poll non block */
 
     if (posn == s->vm_id) {
-        setup_interrupt(s, vector);
+        setup_interrupt(s, vector, errp);
+        /* TODO do we need to handle the error? */
     }
 
     if (ivshmem_has_feature(s, IVSHMEM_IOEVENTFD)) {
@@ -631,18 +632,18 @@ static void process_msg_connect(IVShmemState *s, uint16_t posn, int fd)
     }
 }
 
-static void process_msg(IVShmemState *s, int64_t msg, int fd)
+static void process_msg(IVShmemState *s, int64_t msg, int fd, Error **errp)
 {
     IVSHMEM_DPRINTF("posn is %" PRId64 ", fd is %d\n", msg, fd);
 
     if (msg < -1 || msg > IVSHMEM_MAX_PEERS) {
-        error_report("server sent invalid message %" PRId64, msg);
+        error_setg(errp, "server sent invalid message %" PRId64, msg);
         close(fd);
         return;
     }
 
     if (msg == -1) {
-        process_msg_shmem(s, fd);
+        process_msg_shmem(s, fd, errp);
         return;
     }
 
@@ -651,17 +652,18 @@ static void process_msg(IVShmemState *s, int64_t msg, int fd)
     }
 
     if (fd >= 0) {
-        process_msg_connect(s, msg, fd);
+        process_msg_connect(s, msg, fd, errp);
     } else if (s->vm_id == -1) {
         s->vm_id = msg;
     } else {
-        process_msg_disconnect(s, msg);
+        process_msg_disconnect(s, msg, errp);
     }
 }
 
 static void ivshmem_read(void *opaque, const uint8_t *buf, int size)
 {
     IVShmemState *s = opaque;
+    Error *err = NULL;
     int incoming_fd;
     int64_t incoming_posn;
 
@@ -673,10 +675,13 @@ static void ivshmem_read(void *opaque, const uint8_t *buf, int size)
     IVSHMEM_DPRINTF("posn is %" PRId64 ", fd is %d\n",
                     incoming_posn, incoming_fd);
 
-    process_msg(s, incoming_posn, incoming_fd);
+    process_msg(s, incoming_posn, incoming_fd, &err);
+    if (err) {
+        error_report_err(err);
+    }
 }
 
-static int64_t ivshmem_recv_msg(IVShmemState *s, int *pfd)
+static int64_t ivshmem_recv_msg(IVShmemState *s, int *pfd, Error **errp)
 {
     int64_t msg;
     int n, ret;
@@ -686,7 +691,7 @@ static int64_t ivshmem_recv_msg(IVShmemState *s, int *pfd)
         ret = qemu_chr_fe_read_all(s->server_chr, (uint8_t *)&msg + n,
                                  sizeof(msg) - n);
         if (ret < 0 && ret != -EINTR) {
-            /* TODO error handling */
+            error_setg_errno(errp, -ret, "read from server failed");
             return INT64_MIN;
         }
         n += ret;
@@ -696,15 +701,24 @@ static int64_t ivshmem_recv_msg(IVShmemState *s, int *pfd)
     return msg;
 }
 
-static void ivshmem_recv_setup(IVShmemState *s)
+static void ivshmem_recv_setup(IVShmemState *s, Error **errp)
 {
+    Error *err = NULL;
     int64_t msg;
     int fd;
 
-    msg = ivshmem_recv_msg(s, &fd);
-    if (fd != -1 || msg != IVSHMEM_PROTOCOL_VERSION) {
-        fprintf(stderr, "incompatible version, you are connecting to a ivshmem-"
-                "server using a different protocol please check your setup\n");
+    msg = ivshmem_recv_msg(s, &fd, &err);
+    if (err) {
+        error_propagate(errp, err);
+        return;
+    }
+    if (msg != IVSHMEM_PROTOCOL_VERSION) {
+        error_setg(errp, "server sent version %" PRId64 ", expecting %d",
+                   msg, IVSHMEM_PROTOCOL_VERSION);
+        return;
+    }
+    if (fd != -1) {
+        error_setg(errp, "server sent invalid version message");
         return;
     }
 
@@ -712,8 +726,16 @@ static void ivshmem_recv_setup(IVShmemState *s)
      * Receive more messages until we got shared memory.
      */
     do {
-        msg = ivshmem_recv_msg(s, &fd);
-        process_msg(s, msg, fd);
+        msg = ivshmem_recv_msg(s, &fd, &err);
+        if (err) {
+            error_propagate(errp, err);
+            return;
+        }
+        process_msg(s, msg, fd, &err);
+        if (err) {
+            error_propagate(errp, err);
+            return;
+        }
     } while (msg != -1);
 
     assert(memory_region_is_mapped(&s->ivshmem));
@@ -768,7 +790,13 @@ static void ivshmem_enable_irqfd(IVShmemState *s)
     int i;
 
     for (i = 0; i < s->peers[s->vm_id].nb_eventfds; i++) {
-        ivshmem_add_kvm_msi_virq(s, i);
+        Error *err = NULL;
+
+        ivshmem_add_kvm_msi_virq(s, i, &err);
+        if (err) {
+            error_report_err(err);
+            /* TODO do we need to handle the error? */
+        }
     }
 
     if (msix_set_vector_notifiers(pdev,
@@ -814,7 +842,7 @@ static void ivshmem_write_config(PCIDevice *pdev, uint32_t address,
     pci_default_write_config(pdev, address, val, len);
     is_enabled = msix_enabled(pdev);
 
-    if (kvm_msi_via_irqfd_enabled() && s->vm_id != -1) {
+    if (kvm_msi_via_irqfd_enabled()) {
         if (!was_enabled && is_enabled) {
             ivshmem_enable_irqfd(s);
         } else if (was_enabled && !is_enabled) {
@@ -933,15 +961,16 @@ static void pci_ivshmem_realize(PCIDevice *dev, Error **errp)
          * Receive setup messages from server synchronously.
          * Older versions did it asynchronously, but that creates a
          * number of entertaining race conditions.
-         * TODO Propagate errors!  Without that, we still have races
-         * on errors.
          */
-        ivshmem_recv_setup(s);
-        if (memory_region_is_mapped(&s->ivshmem)) {
-            qemu_chr_add_handlers(s->server_chr, ivshmem_can_receive,
-                                  ivshmem_read, NULL, s);
+        ivshmem_recv_setup(s, &err);
+        if (err) {
+            error_propagate(errp, err);
+            return;
         }
 
+        qemu_chr_add_handlers(s->server_chr, ivshmem_can_receive,
+                              ivshmem_read, NULL, s);
+
         if (ivshmem_setup_interrupts(s) < 0) {
             error_setg(errp, "failed to initialize interrupts");
             return;
diff --git a/qemu-doc.texi b/qemu-doc.texi
index 65f3b29..8afbbcd 100644
--- a/qemu-doc.texi
+++ b/qemu-doc.texi
@@ -1289,14 +1289,7 @@ qemu-system-i386 -device ivshmem,size=@var{shm-size},vectors=@var{vectors},chard
 
 When using the server, the guest will be assigned a VM ID (>=0) that allows guests
 using the same server to communicate via interrupts.  Guests can read their
-VM ID from a device register (see example code).  Since receiving the shared
-memory region from the server is asynchronous, there is a (small) chance the
-guest may boot before the shared memory is attached.  To allow an application
-to ensure shared memory is attached, the VM ID register will return -1 (an
-invalid VM ID) until the memory is attached.  Once the shared memory is
-attached, the VM ID will return the guest's valid VM ID.  With these semantics,
-the guest application can check to ensure the shared memory is attached to the
-guest before proceeding.
+VM ID from a device register (see ivshmem-spec.txt).
 
 The @option{role} argument can be set to either master or peer and will affect
 how the shared memory is migrated.  With @option{role=master}, the guest will
-- 
2.4.3

* [Qemu-devel] [PATCH 25/38] ivshmem: Rely on server sending the ID right after the version
  2016-02-29 18:40 [Qemu-devel] [PATCH 00/38] ivshmem: Fixes, cleanups, device model split Markus Armbruster
                   ` (23 preceding siblings ...)
  2016-02-29 18:40 ` [Qemu-devel] [PATCH 24/38] ivshmem: Propagate errors through ivshmem_recv_setup() Markus Armbruster
@ 2016-02-29 18:40 ` Markus Armbruster
  2016-03-02 18:36   ` Marc-André Lureau
  2016-02-29 18:40 ` [Qemu-devel] [PATCH 26/38] ivshmem: Drop the hackish test for UNIX domain chardev Markus Armbruster
                   ` (12 subsequent siblings)
  37 siblings, 1 reply; 118+ messages in thread
From: Markus Armbruster @ 2016-02-29 18:40 UTC (permalink / raw)
  To: qemu-devel; +Cc: claudio.fontana, cam, mlureau, david.marchand, pbonzini

The protocol specification (ivshmem-spec.txt, formerly
ivshmem_device_spec.txt) has always required the ID message to be sent
right at the beginning, and ivshmem-server has always complied.  The
device, however, accepts it out of order.  If an interrupt setup
message arrived before it, though, it would be misinterpreted as a
connect notification.  Fix the latent bug by relying on the spec and
ivshmem-server's actual behavior.
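
A rough model of the setup sequence this relies on, as a sketch under
stated assumptions rather than the device code: version first, then the
ID (no file descriptor attached), then further messages until the
shared memory message (-1) arrives.  struct setup_msg, parse_setup()
and the two SKETCH_ constants are hypothetical; the constants merely
stand in for IVSHMEM_PROTOCOL_VERSION and IVSHMEM_MAX_PEERS, and their
values are assumptions.  Requiring a file descriptor on the shared
memory message is a choice of this sketch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_PROTOCOL_VERSION 0   /* assumed value */
#define SKETCH_MAX_PEERS 65535      /* assumed value */

/* One message from the server; fd is -1 if none came with it */
struct setup_msg {
    int64_t val;
    int fd;
};

/*
 * Hypothetical model of the setup sequence.  Returns the device's ID
 * if the sequence is acceptable, -1 otherwise.
 */
static int64_t parse_setup(const struct setup_msg *m, size_t n)
{
    int64_t id;
    size_t i;

    assert(m != NULL);

    if (n < 2) {
        return -1;
    }
    /* 1. version message, without file descriptor */
    if (m[0].val != SKETCH_PROTOCOL_VERSION || m[0].fd != -1) {
        return -1;
    }
    /* 2. ID message, without file descriptor */
    if (m[1].fd != -1 || m[1].val < 0 || m[1].val > SKETCH_MAX_PEERS) {
        return -1;
    }
    id = m[1].val;
    /* 3. anything else, until shared memory (-1) shows up */
    for (i = 2; i < n; i++) {
        if (m[i].val == -1) {
            return m[i].fd >= 0 ? id : -1;
        }
        if (m[i].val < 0 || m[i].val > SKETCH_MAX_PEERS) {
            return -1;
        }
    }
    return -1;                  /* never got shared memory */
}
```

With the ID pinned to the second position, a message that carries a
file descriptor there is rejected outright instead of being taken for
a connect notification.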

Signed-off-by: Markus Armbruster <armbru@redhat.com>
---
 hw/misc/ivshmem.c | 27 ++++++++++++++++++++++++---
 1 file changed, 24 insertions(+), 3 deletions(-)

diff --git a/hw/misc/ivshmem.c b/hw/misc/ivshmem.c
index 831da53..8f976ca 100644
--- a/hw/misc/ivshmem.c
+++ b/hw/misc/ivshmem.c
@@ -653,8 +653,6 @@ static void process_msg(IVShmemState *s, int64_t msg, int fd, Error **errp)
 
     if (fd >= 0) {
         process_msg_connect(s, msg, fd, errp);
-    } else if (s->vm_id == -1) {
-        s->vm_id = msg;
     } else {
         process_msg_disconnect(s, msg, errp);
     }
@@ -723,6 +721,30 @@ static void ivshmem_recv_setup(IVShmemState *s, Error **errp)
     }
 
     /*
+     * ivshmem-server sends the remaining initial messages in a fixed
+     * order, but the device has always accepted them in any order.
+     * Stay as compatible as practical, just in case people use
+     * servers that behave differently.
+     */
+
+    /*
+     * ivshmem_device_spec.txt has always required the ID message
+     * right here, and ivshmem-server has always complied.  However,
+     * older versions of the device accepted it out of order, but
+     * broke when an interrupt setup message arrived before it.
+     */
+    msg = ivshmem_recv_msg(s, &fd, &err);
+    if (err) {
+        error_propagate(errp, err);
+        return;
+    }
+    if (fd != -1 || msg < 0 || msg > IVSHMEM_MAX_PEERS) {
+        error_setg(errp, "server sent invalid ID message");
+        return;
+    }
+    s->vm_id = msg;
+
+    /*
      * Receive more messages until we got shared memory.
      */
     do {
@@ -953,7 +975,6 @@ static void pci_ivshmem_realize(PCIDevice *dev, Error **errp)
 
         /* we allocate enough space for 16 peers and grow as needed */
         resize_peers(s, 16);
-        s->vm_id = -1;
 
         pci_register_bar(dev, 2, attr, &s->bar);
 
-- 
2.4.3

* [Qemu-devel] [PATCH 26/38] ivshmem: Drop the hackish test for UNIX domain chardev
  2016-02-29 18:40 [Qemu-devel] [PATCH 00/38] ivshmem: Fixes, cleanups, device model split Markus Armbruster
                   ` (24 preceding siblings ...)
  2016-02-29 18:40 ` [Qemu-devel] [PATCH 25/38] ivshmem: Rely on server sending the ID right after the version Markus Armbruster
@ 2016-02-29 18:40 ` Markus Armbruster
  2016-03-02 18:38   ` Marc-André Lureau
  2016-02-29 18:40 ` [Qemu-devel] [PATCH 27/38] ivshmem: Simplify how we cope with short reads from server Markus Armbruster
                   ` (11 subsequent siblings)
  37 siblings, 1 reply; 118+ messages in thread
From: Markus Armbruster @ 2016-02-29 18:40 UTC (permalink / raw)
  To: qemu-devel; +Cc: claudio.fontana, cam, mlureau, david.marchand, pbonzini

The chardev must be capable of transmitting SCM_RIGHTS ancillary
messages.  We check it by comparing CharDriverState member filename to
"unix:".  That's almost as brittle as it is disgusting.

When the actual transmission all happened asynchronously, this check
was all we could do in realize(), and thus better than nothing.  But
now that we receive at least one SCM_RIGHTS message synchronously in
realize(), the check is not worth its keep anymore.  Drop it.
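
For readers unfamiliar with SCM_RIGHTS, here is a minimal sketch of
receiving a message together with an optional file descriptor over a
UNIX domain socket, using plain recvmsg().  It is not how the chardev
layer does it; recv_with_fd() is a hypothetical helper.  A transport
that cannot carry ancillary data simply never yields a descriptor, and
that is exactly what a synchronous receive notices.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <sys/uio.h>

/*
 * Hypothetical helper: receive up to len bytes into buf, and store a
 * file descriptor passed via SCM_RIGHTS in *pfd (-1 if none came).
 * Returns the byte count from recvmsg(), or -1 on error.
 */
static ssize_t recv_with_fd(int sock, void *buf, size_t len, int *pfd)
{
    union {
        struct cmsghdr align;   /* forces suitable alignment */
        char buf[CMSG_SPACE(sizeof(int))];
    } ctl;
    struct iovec iov = { .iov_base = buf, .iov_len = len };
    struct msghdr mh;
    struct cmsghdr *cmsg;
    ssize_t ret;

    assert(pfd != NULL);
    *pfd = -1;

    memset(&mh, 0, sizeof(mh));
    mh.msg_iov = &iov;
    mh.msg_iovlen = 1;
    mh.msg_control = ctl.buf;
    mh.msg_controllen = sizeof(ctl.buf);

    ret = recvmsg(sock, &mh, 0);
    if (ret < 0) {
        return ret;
    }

    /* Walk the ancillary data looking for exactly one descriptor */
    for (cmsg = CMSG_FIRSTHDR(&mh); cmsg; cmsg = CMSG_NXTHDR(&mh, cmsg)) {
        if (cmsg->cmsg_level == SOL_SOCKET
            && cmsg->cmsg_type == SCM_RIGHTS
            && cmsg->cmsg_len == CMSG_LEN(sizeof(int))) {
            memcpy(pfd, CMSG_DATA(cmsg), sizeof(int));
        }
    }
    return ret;
}
```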

Signed-off-by: Markus Armbruster <armbru@redhat.com>
---
 hw/misc/ivshmem.c | 9 ---------
 1 file changed, 9 deletions(-)

diff --git a/hw/misc/ivshmem.c b/hw/misc/ivshmem.c
index 8f976ca..e578b8a 100644
--- a/hw/misc/ivshmem.c
+++ b/hw/misc/ivshmem.c
@@ -961,15 +961,6 @@ static void pci_ivshmem_realize(PCIDevice *dev, Error **errp)
         memory_region_add_subregion(&s->bar, 0, mr);
         pci_register_bar(PCI_DEVICE(s), 2, attr, &s->bar);
     } else if (s->server_chr != NULL) {
-        /* FIXME do not rely on what chr drivers put into filename */
-        if (strncmp(s->server_chr->filename, "unix:", 5)) {
-            error_setg(errp, "chardev is not a unix client socket");
-            return;
-        }
-
-        /* if we get a UNIX socket as the parameter we will talk
-         * to the ivshmem server to receive the memory region */
-
         IVSHMEM_DPRINTF("using shared memory server (socket = %s)\n",
                         s->server_chr->filename);
 
-- 
2.4.3

* [Qemu-devel] [PATCH 27/38] ivshmem: Simplify how we cope with short reads from server
  2016-02-29 18:40 [Qemu-devel] [PATCH 00/38] ivshmem: Fixes, cleanups, device model split Markus Armbruster
                   ` (25 preceding siblings ...)
  2016-02-29 18:40 ` [Qemu-devel] [PATCH 26/38] ivshmem: Drop the hackish test for UNIX domain chardev Markus Armbruster
@ 2016-02-29 18:40 ` Markus Armbruster
  2016-03-02 18:41   ` Marc-André Lureau
  2016-02-29 18:40 ` [Qemu-devel] [PATCH 28/38] ivshmem: Tighten check of property "size" Markus Armbruster
                   ` (10 subsequent siblings)
  37 siblings, 1 reply; 118+ messages in thread
From: Markus Armbruster @ 2016-02-29 18:40 UTC (permalink / raw)
  To: qemu-devel; +Cc: claudio.fontana, cam, mlureau, david.marchand, pbonzini

Short reads from a UNIX domain socket are exceedingly unlikely when
the other side always sends eight bytes and we always read eight
bytes.  We cope with them anyway.  However, the code doing that is
rather convoluted.  Dumb it down radically.

Replace the convoluted code with a simple message buffer.
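
The idea can be sketched in isolation as follows.  This is a
standalone illustration, not the device code: struct msg_rx,
msg_rx_can_receive() and msg_rx_feed() are hypothetical names, and the
sketch leaves the message in host byte order where the device converts
from little-endian.  The receiver never accepts more bytes than are
missing from the current message, so a message is complete exactly
when the buffer is full.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical receive state: one message buffer plus a fill count */
struct msg_rx {
    uint64_t buf;       /* bytes of the message received so far */
    size_t buffered;    /* number of valid bytes in buf */
};

/* How many more bytes we are willing to take right now */
static size_t msg_rx_can_receive(const struct msg_rx *rx)
{
    assert(rx->buffered < sizeof(rx->buf));
    return sizeof(rx->buf) - rx->buffered;
}

/*
 * Feed size bytes, where size <= msg_rx_can_receive(rx).  Returns true
 * and stores the message in *msg once all eight bytes have arrived;
 * returns false while the message is still incomplete.
 */
static bool msg_rx_feed(struct msg_rx *rx, const uint8_t *data,
                        size_t size, uint64_t *msg)
{
    assert(size <= msg_rx_can_receive(rx));
    memcpy((unsigned char *)&rx->buf + rx->buffered, data, size);
    rx->buffered += size;

    if (rx->buffered < sizeof(rx->buf)) {
        return false;           /* short read, wait for the rest */
    }

    *msg = rx->buf;             /* host byte order in this sketch */
    rx->buffered = 0;
    return true;
}
```

Note the early return on an incomplete message: the caller must not
act on *msg until msg_rx_feed() has returned true.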

Signed-off-by: Markus Armbruster <armbru@redhat.com>
---
 hw/misc/ivshmem.c | 76 ++++++++++++-------------------------------------------
 1 file changed, 16 insertions(+), 60 deletions(-)

diff --git a/hw/misc/ivshmem.c b/hw/misc/ivshmem.c
index e578b8a..fb8a4f7 100644
--- a/hw/misc/ivshmem.c
+++ b/hw/misc/ivshmem.c
@@ -26,7 +26,6 @@
 #include "migration/migration.h"
 #include "qemu/error-report.h"
 #include "qemu/event_notifier.h"
-#include "qemu/fifo8.h"
 #include "sysemu/char.h"
 #include "sysemu/hostmem.h"
 #include "qapi/visitor.h"
@@ -80,7 +79,6 @@ typedef struct IVShmemState {
     uint32_t intrstatus;
 
     CharDriverState *server_chr;
-    Fifo8 incoming_fifo;
     MemoryRegion ivshmem_mmio;
 
     /* We might need to register the BAR before we actually have the memory.
@@ -99,6 +97,8 @@ typedef struct IVShmemState {
     uint32_t vectors;
     uint32_t features;
     MSIVector *msi_vectors;
+    uint64_t msg_buf;           /* buffer for receiving server messages */
+    int msg_buffered_bytes;     /* #bytes in @msg_buf */
 
     Error *migration_blocker;
 
@@ -255,11 +255,6 @@ static const MemoryRegionOps ivshmem_mmio_ops = {
     },
 };
 
-static int ivshmem_can_receive(void * opaque)
-{
-    return sizeof(int64_t);
-}
-
 static void ivshmem_vector_notify(void *opaque)
 {
     MSIVector *entry = opaque;
@@ -459,53 +454,6 @@ static void resize_peers(IVShmemState *s, int nb_peers)
     }
 }
 
-static bool fifo_update_and_get(IVShmemState *s, const uint8_t *buf, int size,
-                                void *data, size_t len)
-{
-    const uint8_t *p;
-    uint32_t num;
-
-    assert(len <= sizeof(int64_t)); /* limitation of the fifo */
-    if (fifo8_is_empty(&s->incoming_fifo) && size == len) {
-        memcpy(data, buf, size);
-        return true;
-    }
-
-    IVSHMEM_DPRINTF("short read of %d bytes\n", size);
-
-    num = MIN(size, sizeof(int64_t) - fifo8_num_used(&s->incoming_fifo));
-    fifo8_push_all(&s->incoming_fifo, buf, num);
-
-    if (fifo8_num_used(&s->incoming_fifo) < len) {
-        assert(num == 0);
-        return false;
-    }
-
-    size -= num;
-    buf += num;
-    p = fifo8_pop_buf(&s->incoming_fifo, len, &num);
-    assert(num == len);
-
-    memcpy(data, p, len);
-
-    if (size > 0) {
-        fifo8_push_all(&s->incoming_fifo, buf, size);
-    }
-
-    return true;
-}
-
-static bool fifo_update_and_get_i64(IVShmemState *s,
-                                    const uint8_t *buf, int size, int64_t *i64)
-{
-    if (fifo_update_and_get(s, buf, size, i64, sizeof(*i64))) {
-        *i64 = GINT64_FROM_LE(*i64);
-        return true;
-    }
-
-    return false;
-}
-
 static void ivshmem_add_kvm_msi_virq(IVShmemState *s, int vector,
                                      Error **errp)
 {
@@ -658,6 +606,14 @@ static void process_msg(IVShmemState *s, int64_t msg, int fd, Error **errp)
     }
 }
 
+static int ivshmem_can_receive(void *opaque)
+{
+    IVShmemState *s = opaque;
+
+    assert(s->msg_buffered_bytes < sizeof(s->msg_buf));
+    return sizeof(s->msg_buf) - s->msg_buffered_bytes;
+}
+
 static void ivshmem_read(void *opaque, const uint8_t *buf, int size)
 {
     IVShmemState *s = opaque;
@@ -665,8 +621,12 @@ static void ivshmem_read(void *opaque, const uint8_t *buf, int size)
     int incoming_fd;
     int64_t incoming_posn;
 
-    if (!fifo_update_and_get_i64(s, buf, size, &incoming_posn)) {
-        return;
+    assert(size >= 0 && s->msg_buffered_bytes + size <= sizeof(s->msg_buf));
+    memcpy((unsigned char *)&s->msg_buf + s->msg_buffered_bytes, buf, size);
+    s->msg_buffered_bytes += size;
+    if (s->msg_buffered_bytes == sizeof(s->msg_buf)) {
+        incoming_posn = le64_to_cpu(s->msg_buf);
+        s->msg_buffered_bytes = 0;
     }
 
     incoming_fd = qemu_chr_fe_get_msgfd(s->server_chr);
@@ -1019,8 +979,6 @@ static void pci_ivshmem_realize(PCIDevice *dev, Error **errp)
         }
     }
 
-    fifo8_create(&s->incoming_fifo, sizeof(int64_t));
-
     if (s->role_val == IVSHMEM_PEER) {
         error_setg(&s->migration_blocker,
                    "Migration is disabled when using feature 'peer mode' in device 'ivshmem'");
@@ -1033,8 +991,6 @@ static void pci_ivshmem_exit(PCIDevice *dev)
     IVShmemState *s = IVSHMEM(dev);
     int i;
 
-    fifo8_destroy(&s->incoming_fifo);
-
     if (s->migration_blocker) {
         migrate_del_blocker(s->migration_blocker);
         error_free(s->migration_blocker);
-- 
2.4.3

* [Qemu-devel] [PATCH 28/38] ivshmem: Tighten check of property "size"
  2016-02-29 18:40 [Qemu-devel] [PATCH 00/38] ivshmem: Fixes, cleanups, device model split Markus Armbruster
                   ` (26 preceding siblings ...)
  2016-02-29 18:40 ` [Qemu-devel] [PATCH 27/38] ivshmem: Simplify how we cope with short reads from server Markus Armbruster
@ 2016-02-29 18:40 ` Markus Armbruster
  2016-03-02 18:44   ` Marc-André Lureau
  2016-02-29 18:40 ` [Qemu-devel] [PATCH 29/38] ivshmem: Implement shm=... with a memory backend Markus Armbruster
                   ` (9 subsequent siblings)
  37 siblings, 1 reply; 118+ messages in thread
From: Markus Armbruster @ 2016-02-29 18:40 UTC (permalink / raw)
  To: qemu-devel; +Cc: claudio.fontana, cam, mlureau, david.marchand, pbonzini

If size_t is narrower than 64 bits, passing uint64_t ivshmem_size to
mmap() truncates.  Reject such sizes.
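
As a standalone sketch of the tightened check (size_is_valid() is a
hypothetical name, and the power-of-two test is spelled out here
instead of calling is_power_of_2()): a parsed 64-bit size is accepted
only if it is positive, fits in size_t, and is a power of two.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper mirroring the checks applied to "size" */
static bool size_is_valid(int64_t size)
{
    if (size <= 0) {
        return false;           /* parse error, negative or zero */
    }
    if ((uint64_t)size > SIZE_MAX) {
        return false;           /* would truncate when passed to mmap() */
    }
    return (size & (size - 1)) == 0;    /* power of two */
}
```

On hosts where size_t is 64 bits wide, the SIZE_MAX comparison can
never fail; it only matters on 32-bit hosts.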

Signed-off-by: Markus Armbruster <armbru@redhat.com>
---
 hw/misc/ivshmem.c | 7 ++++---
 1 file changed, 4 insertions(+), 3 deletions(-)

diff --git a/hw/misc/ivshmem.c b/hw/misc/ivshmem.c
index fb8a4f7..8d54fa9 100644
--- a/hw/misc/ivshmem.c
+++ b/hw/misc/ivshmem.c
@@ -87,7 +87,7 @@ typedef struct IVShmemState {
      */
     MemoryRegion bar;
     MemoryRegion ivshmem;
-    uint64_t ivshmem_size; /* size of shared memory region */
+    size_t ivshmem_size; /* size of shared memory region */
     uint32_t ivshmem_64bit;
 
     Peer *peers;
@@ -361,7 +361,7 @@ static int check_shm_size(IVShmemState *s, int fd, Error **errp)
 
     if (s->ivshmem_size > buf.st_size) {
         error_setg(errp, "Requested memory size greater"
-                   " than shared object size (%" PRIu64 " > %" PRIu64")",
+                   " than shared object size (%zu > %" PRIu64")",
                    s->ivshmem_size, (uint64_t)buf.st_size);
         return -1;
     } else {
@@ -861,7 +861,8 @@ static void pci_ivshmem_realize(PCIDevice *dev, Error **errp)
     } else {
         char *end;
         int64_t size = qemu_strtosz(s->sizearg, &end);
-        if (size < 0 || *end != '\0' || !is_power_of_2(size)) {
+        if (size < 0 || (size_t)size != size || *end != '\0'
+            || !is_power_of_2(size)) {
             error_setg(errp, "Invalid size %s", s->sizearg);
             return;
         }
-- 
2.4.3

* [Qemu-devel] [PATCH 29/38] ivshmem: Implement shm=... with a memory backend
  2016-02-29 18:40 [Qemu-devel] [PATCH 00/38] ivshmem: Fixes, cleanups, device model split Markus Armbruster
                   ` (27 preceding siblings ...)
  2016-02-29 18:40 ` [Qemu-devel] [PATCH 28/38] ivshmem: Tighten check of property "size" Markus Armbruster
@ 2016-02-29 18:40 ` Markus Armbruster
  2016-03-01 11:37   ` Paolo Bonzini
  2016-02-29 18:40 ` [Qemu-devel] [PATCH 30/38] ivshmem: Simplify memory regions for BAR 2 (shared memory) Markus Armbruster
                   ` (8 subsequent siblings)
  37 siblings, 1 reply; 118+ messages in thread
From: Markus Armbruster @ 2016-02-29 18:40 UTC (permalink / raw)
  To: qemu-devel; +Cc: claudio.fontana, cam, mlureau, david.marchand, pbonzini

ivshmem has its very own code to create and map shared memory.
Replace that with an implicitly created memory backend.  Reduces the
number of ways we create BAR 2 from three to two.
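
To make the desugaring concrete, here is a sketch that formats the
-object argument corresponding to the options desugar_shm() sets in the
diff below (qom-type, id, mem-path, size, share).  The helper name
format_shm_backend() is an assumption of this sketch; the patch itself
builds the backend through QemuOpts rather than a string.

```c
#include <stddef.h>
#include <stdio.h>

/*
 * Hypothetical helper, not part of QEMU: write the -object argument
 * matching the implicitly created backend into @buf.  Returns the
 * snprintf() result.
 */
static int format_shm_backend(char *buf, size_t len, int id,
                              const char *shm, size_t size)
{
    return snprintf(buf, len,
                    "memory-backend-file,id=ivshmem-backend-%d,"
                    "mem-path=/dev/shm/%s,size=%zu,share=on",
                    id, shm, size);
}
```

For shm=ivshmem with a 4 MiB region and counter 0, this yields
memory-backend-file,id=ivshmem-backend-0,mem-path=/dev/shm/ivshmem,size=4194304,share=on.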

Signed-off-by: Markus Armbruster <armbru@redhat.com>
---
 hw/misc/ivshmem.c | 89 +++++++++++++++++++++----------------------------------
 1 file changed, 33 insertions(+), 56 deletions(-)

diff --git a/hw/misc/ivshmem.c b/hw/misc/ivshmem.c
index 8d54fa9..9931d5e 100644
--- a/hw/misc/ivshmem.c
+++ b/hw/misc/ivshmem.c
@@ -26,6 +26,7 @@
 #include "migration/migration.h"
 #include "qemu/error-report.h"
 #include "qemu/event_notifier.h"
+#include "qom/object_interfaces.h"
 #include "sysemu/char.h"
 #include "sysemu/hostmem.h"
 #include "qapi/visitor.h"
@@ -369,31 +370,6 @@ static int check_shm_size(IVShmemState *s, int fd, Error **errp)
     }
 }
 
-/* create the shared memory BAR when we are not using the server, so we can
- * create the BAR and map the memory immediately */
-static int create_shared_memory_BAR(IVShmemState *s, int fd, uint8_t attr,
-                                    Error **errp)
-{
-    void * ptr;
-
-    ptr = mmap(0, s->ivshmem_size, PROT_READ|PROT_WRITE, MAP_SHARED, fd, 0);
-    if (ptr == MAP_FAILED) {
-        error_setg_errno(errp, errno, "Failed to mmap shared memory");
-        return -1;
-    }
-
-    memory_region_init_ram_ptr(&s->ivshmem, OBJECT(s), "ivshmem.bar2",
-                               s->ivshmem_size, ptr);
-    qemu_set_ram_fd(s->ivshmem.ram_addr, fd);
-    vmstate_register_ram(&s->ivshmem, DEVICE(s));
-    memory_region_add_subregion(&s->bar, 0, &s->ivshmem);
-
-    /* region for shared memory */
-    pci_register_bar(PCI_DEVICE(s), 2, attr, &s->bar);
-
-    return 0;
-}
-
 static void ivshmem_add_eventfd(IVShmemState *s, int posn, int i)
 {
     memory_region_add_eventfd(&s->ivshmem_mmio,
@@ -833,6 +809,33 @@ static void ivshmem_write_config(PCIDevice *pdev, uint32_t address,
     }
 }
 
+static HostMemoryBackend *desugar_shm(const char *shm, size_t size)
+{
+    /* TODO avoid the detour through QemuOpts */
+    static int counter;
+    QemuOpts *opts = qemu_opts_create(qemu_find_opts("object"),
+                                      NULL, 0, &error_abort);
+    char *path;
+    Object *obj;
+
+    qemu_opt_set(opts, "qom-type", "memory-backend-file",
+                 &error_abort);
+    /* FIXME need a better way to make up an ID */
+    qemu_opts_set_id(opts, g_strdup_printf("ivshmem-backend-%d", counter++));
+    path = g_strdup_printf("/dev/shm/%s", shm);
+    qemu_opt_set(opts, "mem-path", path, &error_abort);
+    qemu_opt_set_number(opts, "size", size, &error_abort);
+    qemu_opt_set_bool(opts, "share", true, &error_abort);
+    g_free(path);
+
+    obj = user_creatable_add_opts(opts, &error_abort);
+    qemu_opts_del(opts);
+
+    user_creatable_complete(obj, &error_abort);
+
+    return MEMORY_BACKEND(obj);
+}
+
 static void pci_ivshmem_realize(PCIDevice *dev, Error **errp)
 {
     IVShmemState *s = IVSHMEM(dev);
@@ -911,6 +914,10 @@ static void pci_ivshmem_realize(PCIDevice *dev, Error **errp)
         attr |= PCI_BASE_ADDRESS_MEM_TYPE_64;
     }
 
+    if (s->shmobj) {
+        s->hostmem = desugar_shm(s->shmobj, s->ivshmem_size);
+    }
+
     if (s->hostmem != NULL) {
         MemoryRegion *mr;
 
@@ -921,7 +928,7 @@ static void pci_ivshmem_realize(PCIDevice *dev, Error **errp)
         vmstate_register_ram(mr, DEVICE(s));
         memory_region_add_subregion(&s->bar, 0, mr);
         pci_register_bar(PCI_DEVICE(s), 2, attr, &s->bar);
-    } else if (s->server_chr != NULL) {
+    } else {
         IVSHMEM_DPRINTF("using shared memory server (socket = %s)\n",
                         s->server_chr->filename);
 
@@ -948,36 +955,6 @@ static void pci_ivshmem_realize(PCIDevice *dev, Error **errp)
             error_setg(errp, "failed to initialize interrupts");
             return;
         }
-    } else {
-        /* just map the file immediately, we're not using a server */
-        int fd;
-
-        IVSHMEM_DPRINTF("using shm_open (shm object = %s)\n", s->shmobj);
-
-        /* try opening with O_EXCL and if it succeeds zero the memory
-         * by truncating to 0 */
-        if ((fd = shm_open(s->shmobj, O_CREAT|O_RDWR|O_EXCL,
-                        S_IRWXU|S_IRWXG|S_IRWXO)) > 0) {
-           /* truncate file to length PCI device's memory */
-            if (ftruncate(fd, s->ivshmem_size) != 0) {
-                error_report("could not truncate shared file");
-            }
-
-        } else if ((fd = shm_open(s->shmobj, O_CREAT|O_RDWR,
-                        S_IRWXU|S_IRWXG|S_IRWXO)) < 0) {
-            error_setg(errp, "could not open shared file");
-            return;
-        }
-
-        if (check_shm_size(s, fd, errp) == -1) {
-            return;
-        }
-
-        create_shared_memory_BAR(s, fd, attr, &err);
-        if (err) {
-            error_propagate(errp, err);
-            return;
-        }
     }
 
     if (s->role_val == IVSHMEM_PEER) {
-- 
2.4.3

* [Qemu-devel] [PATCH 30/38] ivshmem: Simplify memory regions for BAR 2 (shared memory)
  2016-02-29 18:40 [Qemu-devel] [PATCH 00/38] ivshmem: Fixes, cleanups, device model split Markus Armbruster
                   ` (28 preceding siblings ...)
  2016-02-29 18:40 ` [Qemu-devel] [PATCH 29/38] ivshmem: Implement shm=... with a memory backend Markus Armbruster
@ 2016-02-29 18:40 ` Markus Armbruster
  2016-03-01 11:42   ` Paolo Bonzini
  2016-03-01 11:46   ` Paolo Bonzini
  2016-02-29 18:40 ` [Qemu-devel] [PATCH 31/38] ivshmem: Inline check_shm_size() into its only caller Markus Armbruster
                   ` (7 subsequent siblings)
  37 siblings, 2 replies; 118+ messages in thread
From: Markus Armbruster @ 2016-02-29 18:40 UTC (permalink / raw)
  To: qemu-devel; +Cc: claudio.fontana, cam, mlureau, david.marchand, pbonzini

ivshmem_realize() puts the shared memory region in a container region.
Used to be necessary to permit delayed mapping of the shared memory.
Now that we don't do that anymore, the container is redundant.  Drop it.
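
With the container gone, a plain pointer doubles as the "do we have
shared memory yet" flag.  A toy model of that pattern, with ToyState and
toy_receive_shmem() as stand-ins invented for this sketch (the real code
uses MemoryRegion *ivshmem_bar2 in IVShmemState):

```c
#include <stddef.h>

/* Stand-in for the device state; only the field of interest is modeled */
typedef struct {
    void *bar2;     /* NULL until shared memory has been received */
} ToyState;

/* Accept shared memory once; reject a second, unexpected message */
static int toy_receive_shmem(ToyState *s, void *mem)
{
    if (s->bar2) {
        return -1;
    }
    s->bar2 = mem;
    return 0;
}
```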

Signed-off-by: Markus Armbruster <armbru@redhat.com>
---
 hw/misc/ivshmem.c | 44 +++++++++++++++-----------------------------
 1 file changed, 15 insertions(+), 29 deletions(-)

diff --git a/hw/misc/ivshmem.c b/hw/misc/ivshmem.c
index 9931d5e..0440bca 100644
--- a/hw/misc/ivshmem.c
+++ b/hw/misc/ivshmem.c
@@ -82,12 +82,7 @@ typedef struct IVShmemState {
     CharDriverState *server_chr;
     MemoryRegion ivshmem_mmio;
 
-    /* We might need to register the BAR before we actually have the memory.
-     * So prepare a container MemoryRegion for the BAR immediately and
-     * add a subregion when we have the memory.
-     */
-    MemoryRegion bar;
-    MemoryRegion ivshmem;
+    MemoryRegion *ivshmem_bar2; /* BAR 2 (shared memory) */
     size_t ivshmem_size; /* size of shared memory region */
     uint32_t ivshmem_64bit;
 
@@ -487,7 +482,7 @@ static void process_msg_shmem(IVShmemState *s, int fd, Error **errp)
     Error *err = NULL;
     void *ptr;
 
-    if (memory_region_is_mapped(&s->ivshmem)) {
+    if (s->ivshmem_bar2) {
         error_setg(errp, "server sent unexpected shared memory message");
         close(fd);
         return;
@@ -506,11 +501,10 @@ static void process_msg_shmem(IVShmemState *s, int fd, Error **errp)
         close(fd);
         return;
     }
-    memory_region_init_ram_ptr(&s->ivshmem, OBJECT(s),
+    s->ivshmem_bar2 = g_new(MemoryRegion, 1);
+    memory_region_init_ram_ptr(s->ivshmem_bar2, OBJECT(s),
                                "ivshmem.bar2", s->ivshmem_size, ptr);
-    qemu_set_ram_fd(s->ivshmem.ram_addr, fd);
-    vmstate_register_ram(&s->ivshmem, DEVICE(s));
-    memory_region_add_subregion(&s->bar, 0, &s->ivshmem);
+    qemu_set_ram_fd(s->ivshmem_bar2->ram_addr, fd);
 }
 
 static void process_msg_disconnect(IVShmemState *s, uint16_t posn,
@@ -696,7 +690,7 @@ static void ivshmem_recv_setup(IVShmemState *s, Error **errp)
         }
     } while (msg != -1);
 
-    assert(memory_region_is_mapped(&s->ivshmem));
+    assert(s->ivshmem_bar2);
 }
 
 /* Select the MSI-X vectors used by device.
@@ -909,7 +903,6 @@ static void pci_ivshmem_realize(PCIDevice *dev, Error **errp)
     pci_register_bar(dev, 0, PCI_BASE_ADDRESS_SPACE_MEMORY,
                      &s->ivshmem_mmio);
 
-    memory_region_init(&s->bar, OBJECT(s), "ivshmem-bar2-container", s->ivshmem_size);
     if (s->ivshmem_64bit) {
         attr |= PCI_BASE_ADDRESS_MEM_TYPE_64;
     }
@@ -919,15 +912,10 @@ static void pci_ivshmem_realize(PCIDevice *dev, Error **errp)
     }
 
     if (s->hostmem != NULL) {
-        MemoryRegion *mr;
-
         IVSHMEM_DPRINTF("using hostmem\n");
 
-        mr = host_memory_backend_get_memory(MEMORY_BACKEND(s->hostmem),
-                                            &error_abort);
-        vmstate_register_ram(mr, DEVICE(s));
-        memory_region_add_subregion(&s->bar, 0, mr);
-        pci_register_bar(PCI_DEVICE(s), 2, attr, &s->bar);
+        s->ivshmem_bar2 = host_memory_backend_get_memory(s->hostmem,
+                                                         &error_abort);
     } else {
         IVSHMEM_DPRINTF("using shared memory server (socket = %s)\n",
                         s->server_chr->filename);
@@ -935,8 +923,6 @@ static void pci_ivshmem_realize(PCIDevice *dev, Error **errp)
         /* we allocate enough space for 16 peers and grow as needed */
         resize_peers(s, 16);
 
-        pci_register_bar(dev, 2, attr, &s->bar);
-
         /*
          * Receive setup messages from server synchronously.
          * Older versions did it asynchronously, but that creates a
@@ -957,6 +943,9 @@ static void pci_ivshmem_realize(PCIDevice *dev, Error **errp)
         }
     }
 
+    vmstate_register_ram(s->ivshmem_bar2, DEVICE(s));
+    pci_register_bar(PCI_DEVICE(s), 2, attr, s->ivshmem_bar2);
+
     if (s->role_val == IVSHMEM_PEER) {
         error_setg(&s->migration_blocker,
                    "Migration is disabled when using feature 'peer mode' in device 'ivshmem'");
@@ -974,22 +963,19 @@ static void pci_ivshmem_exit(PCIDevice *dev)
         error_free(s->migration_blocker);
     }
 
-    if (memory_region_is_mapped(&s->ivshmem)) {
+    if (memory_region_is_mapped(s->ivshmem_bar2)) {
         if (!s->hostmem) {
-            void *addr = memory_region_get_ram_ptr(&s->ivshmem);
-            int fd;
+            void *addr = memory_region_get_ram_ptr(s->ivshmem_bar2);
 
             if (munmap(addr, s->ivshmem_size) == -1) {
                 error_report("Failed to munmap shared memory %s",
                              strerror(errno));
             }
 
-            if ((fd = qemu_get_ram_fd(s->ivshmem.ram_addr)) != -1)
-                close(fd);
+            close(qemu_get_ram_fd(s->ivshmem_bar2->ram_addr));
         }
 
-        vmstate_unregister_ram(&s->ivshmem, DEVICE(dev));
-        memory_region_del_subregion(&s->bar, &s->ivshmem);
+        vmstate_unregister_ram(s->ivshmem_bar2, DEVICE(dev));
     }
 
     if (s->peers) {
-- 
2.4.3

* [Qemu-devel] [PATCH 31/38] ivshmem: Inline check_shm_size() into its only caller
  2016-02-29 18:40 [Qemu-devel] [PATCH 00/38] ivshmem: Fixes, cleanups, device model split Markus Armbruster
                   ` (29 preceding siblings ...)
  2016-02-29 18:40 ` [Qemu-devel] [PATCH 30/38] ivshmem: Simplify memory regions for BAR 2 (shared memory) Markus Armbruster
@ 2016-02-29 18:40 ` Markus Armbruster
  2016-03-02 18:49   ` Marc-André Lureau
  2016-02-29 18:40 ` [Qemu-devel] [PATCH 32/38] qdev: New DEFINE_PROP_ON_OFF_AUTO Markus Armbruster
                   ` (6 subsequent siblings)
  37 siblings, 1 reply; 118+ messages in thread
From: Markus Armbruster @ 2016-02-29 18:40 UTC (permalink / raw)
  To: qemu-devel; +Cc: claudio.fontana, cam, mlureau, david.marchand, pbonzini

Improve the error messages while there.

Signed-off-by: Markus Armbruster <armbru@redhat.com>
---
 hw/misc/ivshmem.c | 37 +++++++++++--------------------------
 1 file changed, 11 insertions(+), 26 deletions(-)

diff --git a/hw/misc/ivshmem.c b/hw/misc/ivshmem.c
index 0440bca..785ed1c 100644
--- a/hw/misc/ivshmem.c
+++ b/hw/misc/ivshmem.c
@@ -342,29 +342,6 @@ static void watch_vector_notifier(IVShmemState *s, EventNotifier *n,
                         NULL, &s->msi_vectors[vector]);
 }
 
-static int check_shm_size(IVShmemState *s, int fd, Error **errp)
-{
-    /* check that the guest isn't going to try and map more memory than the
-     * the object has allocated return -1 to indicate error */
-
-    struct stat buf;
-
-    if (fstat(fd, &buf) < 0) {
-        error_setg(errp, "exiting: fstat on fd %d failed: %s",
-                   fd, strerror(errno));
-        return -1;
-    }
-
-    if (s->ivshmem_size > buf.st_size) {
-        error_setg(errp, "Requested memory size greater"
-                   " than shared object size (%zu > %" PRIu64")",
-                   s->ivshmem_size, (uint64_t)buf.st_size);
-        return -1;
-    } else {
-        return 0;
-    }
-}
-
 static void ivshmem_add_eventfd(IVShmemState *s, int posn, int i)
 {
     memory_region_add_eventfd(&s->ivshmem_mmio,
@@ -479,7 +456,7 @@ static void setup_interrupt(IVShmemState *s, int vector, Error **errp)
 
 static void process_msg_shmem(IVShmemState *s, int fd, Error **errp)
 {
-    Error *err = NULL;
+    struct stat buf;
     void *ptr;
 
     if (s->ivshmem_bar2) {
@@ -488,8 +465,16 @@ static void process_msg_shmem(IVShmemState *s, int fd, Error **errp)
         return;
     }
 
-    if (check_shm_size(s, fd, &err) == -1) {
-        error_propagate(errp, err);
+    if (fstat(fd, &buf) < 0) {
+        error_setg_errno(errp, errno,
+            "can't determine size of shared memory sent by server");
+        close(fd);
+        return;
+    }
+
+    if (s->ivshmem_size > buf.st_size) {
+        error_setg(errp, "server sent only %zu bytes of shared memory",
+                   (size_t)buf.st_size);
         close(fd);
         return;
     }
-- 
2.4.3

* [Qemu-devel] [PATCH 32/38] qdev: New DEFINE_PROP_ON_OFF_AUTO
  2016-02-29 18:40 [Qemu-devel] [PATCH 00/38] ivshmem: Fixes, cleanups, device model split Markus Armbruster
                   ` (30 preceding siblings ...)
  2016-02-29 18:40 ` [Qemu-devel] [PATCH 31/38] ivshmem: Inline check_shm_size() into its only caller Markus Armbruster
@ 2016-02-29 18:40 ` Markus Armbruster
  2016-03-02 18:54   ` Marc-André Lureau
  2016-02-29 18:40 ` [Qemu-devel] [PATCH 33/38] ivshmem: Replace int role_val by OnOffAuto master Markus Armbruster
                   ` (5 subsequent siblings)
  37 siblings, 1 reply; 118+ messages in thread
From: Markus Armbruster @ 2016-02-29 18:40 UTC (permalink / raw)
  To: qemu-devel; +Cc: claudio.fontana, cam, mlureau, david.marchand, pbonzini

Signed-off-by: Markus Armbruster <armbru@redhat.com>
---
 hw/core/qdev-properties.c    | 10 ++++++++++
 include/hw/qdev-properties.h |  3 +++
 2 files changed, 13 insertions(+)

diff --git a/hw/core/qdev-properties.c b/hw/core/qdev-properties.c
index bc89800..d2f5a08 100644
--- a/hw/core/qdev-properties.c
+++ b/hw/core/qdev-properties.c
@@ -516,6 +516,16 @@ PropertyInfo qdev_prop_macaddr = {
     .set   = set_mac,
 };
 
+/* --- on/off/auto --- */
+
+PropertyInfo qdev_prop_on_off_auto = {
+    .name = "OnOffAuto",
+    .description = "on/off/auto",
+    .enum_table = OnOffAuto_lookup,
+    .get = get_enum,
+    .set = set_enum,
+};
+
 /* --- lost tick policy --- */
 
 QEMU_BUILD_BUG_ON(sizeof(LostTickPolicy) != sizeof(int));
diff --git a/include/hw/qdev-properties.h b/include/hw/qdev-properties.h
index 03a1b91..0586cac 100644
--- a/include/hw/qdev-properties.h
+++ b/include/hw/qdev-properties.h
@@ -18,6 +18,7 @@ extern PropertyInfo qdev_prop_string;
 extern PropertyInfo qdev_prop_chr;
 extern PropertyInfo qdev_prop_ptr;
 extern PropertyInfo qdev_prop_macaddr;
+extern PropertyInfo qdev_prop_on_off_auto;
 extern PropertyInfo qdev_prop_losttickpolicy;
 extern PropertyInfo qdev_prop_bios_chs_trans;
 extern PropertyInfo qdev_prop_fdc_drive_type;
@@ -155,6 +156,8 @@ extern PropertyInfo qdev_prop_arraylen;
     DEFINE_PROP(_n, _s, _f, qdev_prop_drive, BlockBackend *)
 #define DEFINE_PROP_MACADDR(_n, _s, _f)         \
     DEFINE_PROP(_n, _s, _f, qdev_prop_macaddr, MACAddr)
+#define DEFINE_PROP_ON_OFF_AUTO(_n, _s, _f, _d) \
+    DEFINE_PROP_DEFAULT(_n, _s, _f, _d, qdev_prop_on_off_auto, OnOffAuto)
 #define DEFINE_PROP_LOSTTICKPOLICY(_n, _s, _f, _d) \
     DEFINE_PROP_DEFAULT(_n, _s, _f, _d, qdev_prop_losttickpolicy, \
                         LostTickPolicy)
-- 
2.4.3

* [Qemu-devel] [PATCH 33/38] ivshmem: Replace int role_val by OnOffAuto master
  2016-02-29 18:40 [Qemu-devel] [PATCH 00/38] ivshmem: Fixes, cleanups, device model split Markus Armbruster
                   ` (31 preceding siblings ...)
  2016-02-29 18:40 ` [Qemu-devel] [PATCH 32/38] qdev: New DEFINE_PROP_ON_OFF_AUTO Markus Armbruster
@ 2016-02-29 18:40 ` Markus Armbruster
  2016-03-02 18:56   ` Marc-André Lureau
  2016-02-29 18:40 ` [Qemu-devel] [PATCH 34/38] ivshmem: Split ivshmem-plain, ivshmem-doorbell off ivshmem Markus Armbruster
                   ` (4 subsequent siblings)
  37 siblings, 1 reply; 118+ messages in thread
From: Markus Armbruster @ 2016-02-29 18:40 UTC (permalink / raw)
  To: qemu-devel; +Cc: claudio.fontana, cam, mlureau, david.marchand, pbonzini

In preparation for making it a qdev property.
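
The intended mapping is role=peer to off, role=master to on, and no role
to auto, with auto settled once the device ID is known (ID 0 becomes
master).  A minimal sketch of that resolution step; the enum is a local
stand-in for QEMU's OnOffAuto and resolve_master() is a hypothetical
name:

```c
/* Local stand-in for QEMU's QAPI-generated OnOffAuto enum */
typedef enum {
    ON_OFF_AUTO_AUTO,
    ON_OFF_AUTO_ON,
    ON_OFF_AUTO_OFF,
} OnOffAuto;

/* Hypothetical helper: settle "auto" once the device ID is known */
static OnOffAuto resolve_master(OnOffAuto master, int vm_id)
{
    if (master == ON_OFF_AUTO_AUTO) {
        return vm_id == 0 ? ON_OFF_AUTO_ON : ON_OFF_AUTO_OFF;
    }
    return master;
}
```

An explicit on or off is passed through unchanged, so only an unset role
depends on the ID.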

Signed-off-by: Markus Armbruster <armbru@redhat.com>
---
 hw/misc/ivshmem.c | 31 +++++++++++++++++++------------
 1 file changed, 19 insertions(+), 12 deletions(-)

diff --git a/hw/misc/ivshmem.c b/hw/misc/ivshmem.c
index 785ed1c..b39ea27 100644
--- a/hw/misc/ivshmem.c
+++ b/hw/misc/ivshmem.c
@@ -43,9 +43,6 @@
 #define IVSHMEM_IOEVENTFD   0
 #define IVSHMEM_MSI     1
 
-#define IVSHMEM_PEER    0
-#define IVSHMEM_MASTER  1
-
 #define IVSHMEM_REG_BAR_SIZE 0x100
 
 #define IVSHMEM_DEBUG 0
@@ -96,12 +93,12 @@ typedef struct IVShmemState {
     uint64_t msg_buf;           /* buffer for receiving server messages */
     int msg_buffered_bytes;     /* #bytes in @msg_buf */
 
+    OnOffAuto master;
     Error *migration_blocker;
 
     char * shmobj;
     char * sizearg;
     char * role;
-    int role_val;   /* scalar to avoid multiple string comparisons */
 } IVShmemState;
 
 /* registers for the Inter-VM shared memory device */
@@ -117,6 +114,12 @@ static inline uint32_t ivshmem_has_feature(IVShmemState *ivs,
     return (ivs->features & (1 << feature));
 }
 
+static inline bool ivshmem_is_master(IVShmemState *s)
+{
+    assert(s->master != ON_OFF_AUTO_AUTO);
+    return s->master == ON_OFF_AUTO_ON;
+}
+
 static void ivshmem_update_irq(IVShmemState *s)
 {
     PCIDevice *d = PCI_DEVICE(s);
@@ -861,15 +864,15 @@ static void pci_ivshmem_realize(PCIDevice *dev, Error **errp)
     /* check that role is reasonable */
     if (s->role) {
         if (strncmp(s->role, "peer", 5) == 0) {
-            s->role_val = IVSHMEM_PEER;
+            s->master = ON_OFF_AUTO_OFF;
         } else if (strncmp(s->role, "master", 7) == 0) {
-            s->role_val = IVSHMEM_MASTER;
+            s->master = ON_OFF_AUTO_ON;
         } else {
             error_setg(errp, "'role' must be 'peer' or 'master'");
             return;
         }
     } else {
-        s->role_val = IVSHMEM_MASTER; /* default */
+        s->master = ON_OFF_AUTO_AUTO;
     }
 
     pci_conf = dev->config;
@@ -931,7 +934,11 @@ static void pci_ivshmem_realize(PCIDevice *dev, Error **errp)
     vmstate_register_ram(s->ivshmem_bar2, DEVICE(s));
     pci_register_bar(PCI_DEVICE(s), 2, attr, s->ivshmem_bar2);
 
-    if (s->role_val == IVSHMEM_PEER) {
+    if (s->master == ON_OFF_AUTO_AUTO) {
+        s->master = s->vm_id == 0 ? ON_OFF_AUTO_ON : ON_OFF_AUTO_OFF;
+    }
+
+    if (!ivshmem_is_master(s)) {
         error_setg(&s->migration_blocker,
                    "Migration is disabled when using feature 'peer mode' in device 'ivshmem'");
         migrate_add_blocker(s->migration_blocker);
@@ -993,7 +1000,7 @@ static int ivshmem_pre_load(void *opaque)
 {
     IVShmemState *s = opaque;
 
-    if (s->role_val == IVSHMEM_PEER) {
+    if (!ivshmem_is_master(s)) {
         error_report("'peer' devices are not migratable");
         return -EINVAL;
     }
@@ -1019,9 +1026,9 @@ static int ivshmem_load_old(QEMUFile *f, void *opaque, int version_id)
         return -EINVAL;
     }
 
-    if (s->role_val == IVSHMEM_PEER) {
-        error_report("'peer' devices are not migratable");
-        return -EINVAL;
+    ret = ivshmem_pre_load(s);
+    if (ret) {
+        return ret;
     }
 
     ret = pci_device_load(pdev, f);
-- 
2.4.3

* [Qemu-devel] [PATCH 34/38] ivshmem: Split ivshmem-plain, ivshmem-doorbell off ivshmem
  2016-02-29 18:40 [Qemu-devel] [PATCH 00/38] ivshmem: Fixes, cleanups, device model split Markus Armbruster
                   ` (32 preceding siblings ...)
  2016-02-29 18:40 ` [Qemu-devel] [PATCH 33/38] ivshmem: Replace int role_val by OnOffAuto master Markus Armbruster
@ 2016-02-29 18:40 ` Markus Armbruster
  2016-03-03 13:53   ` Marc-André Lureau
  2016-02-29 18:40 ` [Qemu-devel] [PATCH 35/38] ivshmem: Clean up after the previous commit Markus Armbruster
                   ` (3 subsequent siblings)
  37 siblings, 1 reply; 118+ messages in thread
From: Markus Armbruster @ 2016-02-29 18:40 UTC (permalink / raw)
  To: qemu-devel; +Cc: claudio.fontana, cam, mlureau, david.marchand, pbonzini

ivshmem can be configured with and without interrupt capability
(a.k.a. "doorbell").  The two configurations have largely disjoint
options, which makes for a confusing (and badly checked) user
interface.  Moreover, the device can't tell the guest whether its
doorbell is enabled.

Create two new device models ivshmem-plain and ivshmem-doorbell, and
deprecate the old one.

Changes from ivshmem:

* PCI revision is 1 instead of 0.  The new revision is fully backwards
  compatible for guests.  Guests may elect to require at least
  revision 1 to make sure they're not exposed to the funny "no shared
  memory, yet" state.

* Property "role" replaced by "master".  role=master becomes
  master=on, role=peer becomes master=off.  Default is off instead of
  auto.

* Property "use64" is gone.  The new devices always have 64 bit BARs.

Changes from ivshmem to ivshmem-plain:

* The Interrupt Pin register in PCI config space is zero (does not use
  an interrupt pin) instead of one (uses INTA).

* Property "x-memdev" is renamed to "memdev".

* Properties "shm" and "size" are gone.  Use property "memdev"
  instead.

* Property "msi" is gone.  The new device can't have MSI-X capability.
  It can't interrupt anyway.

* Properties "ioeventfd" and "vectors" are gone.  They're meaningless
  without interrupts anyway.

Changes from ivshmem to ivshmem-doorbell:

* Property "msi" is gone.  The new device always has MSI-X capability.

* Property "ioeventfd" defaults to on instead of off.

* Property "size" is gone.  The new device can only map all the shared
  memory received from the server.

Guests can easily find out whether the device is configured for
interrupts by checking for MSI-X capability.
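
A guest-side sketch of that check, assuming the guest already has a copy
of the device's first 256 bytes of PCI config space.  The helper name
ivshmem_has_doorbell() is made up for this sketch; the offsets and the
capability ID are the standard PCI ones (status register at 0x06,
capabilities pointer at 0x34, MSI-X capability ID 0x11).

```c
#include <stdbool.h>
#include <stdint.h>

#define PCI_STATUS            0x06
#define PCI_STATUS_CAP_LIST   0x10
#define PCI_CAPABILITY_LIST   0x34
#define PCI_CAP_ID_MSIX       0x11

/*
 * Hypothetical guest-side helper: scan a copy of the first 256 bytes of
 * PCI config space for the MSI-X capability.
 */
static bool ivshmem_has_doorbell(const uint8_t cfg[256])
{
    uint8_t pos;
    int ttl = 48;   /* bound the walk in case the list is malformed */

    if (!(cfg[PCI_STATUS] & PCI_STATUS_CAP_LIST)) {
        return false;           /* device has no capability list */
    }
    pos = cfg[PCI_CAPABILITY_LIST] & 0xfc;
    while (pos >= 0x40 && ttl-- > 0) {
        if (cfg[pos] == PCI_CAP_ID_MSIX) {
            return true;
        }
        pos = cfg[pos + 1] & 0xfc;      /* follow the next pointer */
    }
    return false;
}
```

A real driver would normally use its operating system's PCI capability
lookup instead of walking the list by hand.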

Note: some code added in sub-optimal places to make the diff easier to
review.  The next commit will move it to more sensible places.

Signed-off-by: Markus Armbruster <armbru@redhat.com>
---
 docs/specs/ivshmem-spec.txt |  66 +++++-----
 hw/misc/ivshmem.c           | 310 ++++++++++++++++++++++++++++++++------------
 qemu-doc.texi               |  33 ++---
 tests/ivshmem-test.c        |  12 +-
 4 files changed, 288 insertions(+), 133 deletions(-)

diff --git a/docs/specs/ivshmem-spec.txt b/docs/specs/ivshmem-spec.txt
index 3eb8c97..047dfc0 100644
--- a/docs/specs/ivshmem-spec.txt
+++ b/docs/specs/ivshmem-spec.txt
@@ -17,9 +17,10 @@ get interrupted by its peers.
 
 There are two basic configurations:
 
-- Just shared memory: -device ivshmem,shm=NAME,...
+- Just shared memory: -device ivshmem-plain,memdev=HMB,...
 
-  This uses shared memory object NAME.
+  This uses host memory backend HMB.  It should have option "share"
+  set.
 
 - Shared memory plus interrupts: -device ivshmem,chardev=CHR,vectors=N,...
 
@@ -30,9 +31,8 @@ There are two basic configurations:
   Each peer gets assigned a unique ID by the server.  IDs must be
   between 0 and 65535.
 
-  Interrupts are message-signaled by default (MSI-X).  With msi=off
-  the device has no MSI-X capability, and uses legacy INTx instead.
-  vectors=N configures the number of vectors to use.
+  Interrupts are message-signaled (MSI-X).  vectors=N configures the
+  number of vectors to use.
 
 For more details on ivshmem device properties, see The QEMU Emulator
 User Documentation (qemu-doc.*).
@@ -40,14 +40,15 @@ User Documentation (qemu-doc.*).
 
 == The ivshmem PCI device's guest interface ==
 
-The device has vendor ID 1af4, device ID 1110, revision 0.
+The device has vendor ID 1af4, device ID 1110, revision 1.  Before
+QEMU 2.6.0, it had revision 0.
 
 === PCI BARs ===
 
 The ivshmem PCI device has two or three BARs:
 
 - BAR0 holds device registers (256 Byte MMIO)
-- BAR1 holds MSI-X table and PBA (only when using MSI-X)
+- BAR1 holds MSI-X table and PBA (only ivshmem-doorbell)
 - BAR2 maps the shared memory object
 
 There are two ways to use this device:
@@ -58,18 +59,19 @@ There are two ways to use this device:
   user space (see http://dpdk.org/browse/memnic).
 
 - If you additionally need the capability for peers to interrupt each
-  other, you need BAR0 and, if using MSI-X, BAR1.  You will most
-  likely want to write a kernel driver to handle interrupts.  Requires
-  the device to be configured for interrupts, obviously.
+  other, you need BAR0 and BAR1.  You will most likely want to write a
+  kernel driver to handle interrupts.  Requires the device to be
+  configured for interrupts, obviously.
 
 Before QEMU 2.6.0, BAR2 can initially be invalid if the device is
 configured for interrupts.  It becomes safely accessible only after
-the ivshmem server provided the shared memory.  Guest software should
-wait for the IVPosition register (described below) to become
-non-negative before accessing BAR2.
+the ivshmem server provided the shared memory.  These devices have PCI
+revision 0 rather than 1.  Guest software should wait for the
+IVPosition register (described below) to become non-negative before
+accessing BAR2.
 
-The device is not capable to tell guest software whether it is
-configured for interrupts.
+Revision 0 of the device is not capable to tell guest software whether
+it is configured for interrupts.
 
 === PCI device registers ===
 
@@ -77,10 +79,12 @@ BAR 0 contains the following registers:
 
     Offset  Size  Access      On reset  Function
         0     4   read/write        0   Interrupt Mask
-                                        bit 0: peer interrupt
+                                        bit 0: peer interrupt (rev 0)
+                                               reserved       (rev 1)
                                         bit 1..31: reserved
         4     4   read/write        0   Interrupt Status
-                                        bit 0: peer interrupt
+                                        bit 0: peer interrupt (rev 0)
+                                               reserved       (rev 1)
                                         bit 1..31: reserved
         8     4   read-only   0 or ID   IVPosition
        12     4   write-only      N/A   Doorbell
@@ -92,18 +96,18 @@ Software should only access the registers as specified in column
 "Access".  Reserved bits should be ignored on read, and preserved on
 write.
 
-Interrupt Status and Mask Register together control the legacy INTx
-interrupt when the device has no MSI-X capability: INTx is asserted
-when the bit-wise AND of Status and Mask is non-zero and the device
-has no MSI-X capability.  Interrupt Status Register bit 0 becomes 1
-when an interrupt request from a peer is received.  Reading the
-register clears it.
+In revision 0 of the device, Interrupt Status and Mask Register
+together control the legacy INTx interrupt when the device has no
+MSI-X capability: INTx is asserted when the bit-wise AND of Status and
+Mask is non-zero and the device has no MSI-X capability.  Interrupt
+Status Register bit 0 becomes 1 when an interrupt request from a peer
+is received.  Reading the register clears it.
 
 IVPosition Register: if the device is not configured for interrupts,
 this is zero.  Else, it is the device's ID (between 0 and 65535).
 
 Before QEMU 2.6.0, the register may read -1 for a short while after
-reset.
+reset.  These devices have PCI revision 0 rather than 1.
 
 There is no good way for software to find out whether the device is
 configured for interrupts.  A positive IVPosition means interrupts,
@@ -124,14 +128,14 @@ interrupt vectors connected, the write is ignored.  The device is not
 capable to tell guest software what peers are connected, or how many
 interrupt vectors are connected.
 
-If the peer doesn't use MSI-X, its Interrupt Status register is set to
-1.  This asserts INTx unless masked by the Interrupt Mask register.
-The device is not capable to communicate the interrupt vector to guest
-software then.
+The peer's interrupt for this vector then becomes pending.  There is
+no way for software to clear the pending bit, and a polling mode of
+operation is therefore impossible.
 
-If the peer uses MSI-X, the interrupt for this vector becomes pending.
-There is no way for software to clear the pending bit, and a polling
-mode of operation is therefore impossible with MSI-X.
+If the peer is a revision 0 device without MSI-X capability, its
+Interrupt Status register is set to 1.  This asserts INTx unless
+masked by the Interrupt Mask register.  The device is not capable to
+communicate the interrupt vector to guest software then.
 
 With multiple MSI-X vectors, different vectors can be used to indicate
 different events have occurred.  The semantics of interrupt vectors
diff --git a/hw/misc/ivshmem.c b/hw/misc/ivshmem.c
index b39ea27..f7f5b3b 100644
--- a/hw/misc/ivshmem.c
+++ b/hw/misc/ivshmem.c
@@ -29,6 +29,7 @@
 #include "qom/object_interfaces.h"
 #include "sysemu/char.h"
 #include "sysemu/hostmem.h"
+#include "sysemu/qtest.h"
 #include "qapi/visitor.h"
 #include "exec/ram_addr.h"
 
@@ -53,6 +54,18 @@
         }                                               \
     } while (0)
 
+#define TYPE_IVSHMEM_COMMON "ivshmem-common"
+#define IVSHMEM_COMMON(obj) \
+    OBJECT_CHECK(IVShmemState, (obj), TYPE_IVSHMEM_COMMON)
+
+#define TYPE_IVSHMEM_PLAIN "ivshmem-plain"
+#define IVSHMEM_PLAIN(obj) \
+    OBJECT_CHECK(IVShmemState, (obj), TYPE_IVSHMEM_PLAIN)
+
+#define TYPE_IVSHMEM_DOORBELL "ivshmem-doorbell"
+#define IVSHMEM_DOORBELL(obj) \
+    OBJECT_CHECK(IVShmemState, (obj), TYPE_IVSHMEM_DOORBELL)
+
 #define TYPE_IVSHMEM "ivshmem"
 #define IVSHMEM(obj) \
     OBJECT_CHECK(IVShmemState, (obj), TYPE_IVSHMEM)
@@ -80,8 +93,6 @@ typedef struct IVShmemState {
     MemoryRegion ivshmem_mmio;
 
     MemoryRegion *ivshmem_bar2; /* BAR 2 (shared memory) */
-    size_t ivshmem_size; /* size of shared memory region */
-    uint32_t ivshmem_64bit;
 
     Peer *peers;
     int nb_peers;               /* space in @peers[] */
@@ -96,9 +107,12 @@ typedef struct IVShmemState {
     OnOffAuto master;
     Error *migration_blocker;
 
-    char * shmobj;
-    char * sizearg;
-    char * role;
+    /* legacy cruft */
+    char *role;
+    char *shmobj;
+    char *sizearg;
+    size_t legacy_size;
+    uint32_t not_legacy_32bit;
 } IVShmemState;
 
 /* registers for the Inter-VM shared memory device */
@@ -258,7 +272,7 @@ static void ivshmem_vector_notify(void *opaque)
 {
     MSIVector *entry = opaque;
     PCIDevice *pdev = entry->pdev;
-    IVShmemState *s = IVSHMEM(pdev);
+    IVShmemState *s = IVSHMEM_COMMON(pdev);
     int vector = entry - s->msi_vectors;
     EventNotifier *n = &s->peers[s->vm_id].eventfds[vector];
 
@@ -279,7 +293,7 @@ static void ivshmem_vector_notify(void *opaque)
 static int ivshmem_vector_unmask(PCIDevice *dev, unsigned vector,
                                  MSIMessage msg)
 {
-    IVShmemState *s = IVSHMEM(dev);
+    IVShmemState *s = IVSHMEM_COMMON(dev);
     EventNotifier *n = &s->peers[s->vm_id].eventfds[vector];
     MSIVector *v = &s->msi_vectors[vector];
     int ret;
@@ -296,7 +310,7 @@ static int ivshmem_vector_unmask(PCIDevice *dev, unsigned vector,
 
 static void ivshmem_vector_mask(PCIDevice *dev, unsigned vector)
 {
-    IVShmemState *s = IVSHMEM(dev);
+    IVShmemState *s = IVSHMEM_COMMON(dev);
     EventNotifier *n = &s->peers[s->vm_id].eventfds[vector];
     int ret;
 
@@ -313,7 +327,7 @@ static void ivshmem_vector_poll(PCIDevice *dev,
                                 unsigned int vector_start,
                                 unsigned int vector_end)
 {
-    IVShmemState *s = IVSHMEM(dev);
+    IVShmemState *s = IVSHMEM_COMMON(dev);
     unsigned int vector;
 
     IVSHMEM_DPRINTF("vector poll %p %d-%d\n", dev, vector_start, vector_end);
@@ -460,6 +474,7 @@ static void setup_interrupt(IVShmemState *s, int vector, Error **errp)
 static void process_msg_shmem(IVShmemState *s, int fd, Error **errp)
 {
     struct stat buf;
+    size_t size;
     void *ptr;
 
     if (s->ivshmem_bar2) {
@@ -475,15 +490,21 @@ static void process_msg_shmem(IVShmemState *s, int fd, Error **errp)
         return;
     }
 
-    if (s->ivshmem_size > buf.st_size) {
-        error_setg(errp, "server sent only %zd bytes of shared memory",
-                   (size_t)buf.st_size);
-        close(fd);
-        return;
+    size = buf.st_size;
+
+    /* Legacy cruft */
+    if (s->legacy_size != SIZE_MAX) {
+        if (size < s->legacy_size) {
+            error_setg(errp, "server sent only %zd bytes of shared memory",
+                       (size_t)buf.st_size);
+            close(fd);
+            return;
+        }
+        size = s->legacy_size;
     }
 
     /* mmap the region and map into the BAR2 */
-    ptr = mmap(0, s->ivshmem_size, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
+    ptr = mmap(0, size, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
     if (ptr == MAP_FAILED) {
         error_setg_errno(errp, errno, "Failed to mmap shared memory");
         close(fd);
@@ -491,7 +512,7 @@ static void process_msg_shmem(IVShmemState *s, int fd, Error **errp)
     }
     s->ivshmem_bar2 = g_new(MemoryRegion, 1);
     memory_region_init_ram_ptr(s->ivshmem_bar2, OBJECT(s),
-                               "ivshmem.bar2", s->ivshmem_size, ptr);
+                               "ivshmem.bar2", size, ptr);
     qemu_set_ram_fd(s->ivshmem_bar2->ram_addr, fd);
 }
 
@@ -700,7 +721,7 @@ static void ivshmem_vector_use(IVShmemState *s)
 
 static void ivshmem_reset(DeviceState *d)
 {
-    IVShmemState *s = IVSHMEM(d);
+    IVShmemState *s = IVSHMEM_COMMON(d);
 
     s->intrstatus = 0;
     s->intrmask = 0;
@@ -776,7 +797,7 @@ static void ivshmem_disable_irqfd(IVShmemState *s)
 static void ivshmem_write_config(PCIDevice *pdev, uint32_t address,
                                  uint32_t val, int len)
 {
-    IVShmemState *s = IVSHMEM(pdev);
+    IVShmemState *s = IVSHMEM_COMMON(pdev);
     int is_enabled, was_enabled = msix_enabled(pdev);
 
     pci_default_write_config(pdev, address, val, len);
@@ -818,42 +839,14 @@ static HostMemoryBackend *desugar_shm(const char *shm, size_t size)
     return MEMORY_BACKEND(obj);
 }
 
-static void pci_ivshmem_realize(PCIDevice *dev, Error **errp)
+static void ivshmem_common_realize(PCIDevice *dev, Error **errp)
 {
-    IVShmemState *s = IVSHMEM(dev);
+    IVShmemState *s = IVSHMEM_COMMON(dev);
     Error *err = NULL;
     uint8_t *pci_conf;
     uint8_t attr = PCI_BASE_ADDRESS_SPACE_MEMORY |
         PCI_BASE_ADDRESS_MEM_PREFETCH;
 
-    if (!!s->server_chr + !!s->shmobj + !!s->hostmem != 1) {
-        error_setg(errp,
-                   "You must specify either 'shm', 'chardev' or 'x-memdev'");
-        return;
-    }
-
-    if (s->hostmem) {
-        MemoryRegion *mr;
-
-        if (s->sizearg) {
-            g_warning("size argument ignored with hostmem");
-        }
-
-        mr = host_memory_backend_get_memory(s->hostmem, &error_abort);
-        s->ivshmem_size = memory_region_size(mr);
-    } else if (s->sizearg == NULL) {
-        s->ivshmem_size = 4 << 20; /* 4 MB default */
-    } else {
-        char *end;
-        int64_t size = qemu_strtosz(s->sizearg, &end);
-        if (size < 0 || (size_t)size != size || *end != '\0'
-            || !is_power_of_2(size)) {
-            error_setg(errp, "Invalid size %s", s->sizearg);
-            return;
-        }
-        s->ivshmem_size = size;
-    }
-
     /* IRQFD requires MSI */
     if (ivshmem_has_feature(s, IVSHMEM_IOEVENTFD) &&
         !ivshmem_has_feature(s, IVSHMEM_MSI)) {
@@ -861,29 +854,9 @@ static void pci_ivshmem_realize(PCIDevice *dev, Error **errp)
         return;
     }
 
-    /* check that role is reasonable */
-    if (s->role) {
-        if (strncmp(s->role, "peer", 5) == 0) {
-            s->master = ON_OFF_AUTO_OFF;
-        } else if (strncmp(s->role, "master", 7) == 0) {
-            s->master = ON_OFF_AUTO_ON;
-        } else {
-            error_setg(errp, "'role' must be 'peer' or 'master'");
-            return;
-        }
-    } else {
-        s->master = ON_OFF_AUTO_AUTO;
-    }
-
     pci_conf = dev->config;
     pci_conf[PCI_COMMAND] = PCI_COMMAND_IO | PCI_COMMAND_MEMORY;
 
-    /*
-     * Note: we don't use INTx with IVSHMEM_MSI at all, so this is a
-     * bald-faced lie then.  But it's a backwards compatible lie.
-     */
-    pci_config_set_interrupt_pin(pci_conf, 1);
-
     memory_region_init_io(&s->ivshmem_mmio, OBJECT(s), &ivshmem_mmio_ops, s,
                           "ivshmem-mmio", IVSHMEM_REG_BAR_SIZE);
 
@@ -891,14 +864,10 @@ static void pci_ivshmem_realize(PCIDevice *dev, Error **errp)
     pci_register_bar(dev, 0, PCI_BASE_ADDRESS_SPACE_MEMORY,
                      &s->ivshmem_mmio);
 
-    if (s->ivshmem_64bit) {
+    if (!s->not_legacy_32bit) {
         attr |= PCI_BASE_ADDRESS_MEM_TYPE_64;
     }
 
-    if (s->shmobj) {
-        s->hostmem = desugar_shm(s->shmobj, s->ivshmem_size);
-    }
-
     if (s->hostmem != NULL) {
         IVSHMEM_DPRINTF("using hostmem\n");
 
@@ -945,9 +914,68 @@ static void pci_ivshmem_realize(PCIDevice *dev, Error **errp)
     }
 }
 
-static void pci_ivshmem_exit(PCIDevice *dev)
+static void ivshmem_realize(PCIDevice *dev, Error **errp)
 {
-    IVShmemState *s = IVSHMEM(dev);
+    IVShmemState *s = IVSHMEM_COMMON(dev);
+
+    if (!qtest_enabled()) {
+        error_report("ivshmem is deprecated, please use ivshmem-plain"
+                     " or ivshmem-doorbell instead");
+    }
+
+    if (!!s->server_chr + !!s->shmobj + !!s->hostmem != 1) {
+        error_setg(errp,
+                   "You must specify either 'shm', 'chardev' or 'x-memdev'");
+        return;
+    }
+
+    if (s->hostmem) {
+        if (s->sizearg) {
+            g_warning("size argument ignored with hostmem");
+        }
+    } else if (s->sizearg == NULL) {
+        s->legacy_size = 4 << 20; /* 4 MB default */
+    } else {
+        char *end;
+        int64_t size = qemu_strtosz(s->sizearg, &end);
+        if (size < 0 || (size_t)size != size || *end != '\0'
+            || !is_power_of_2(size)) {
+            error_setg(errp, "Invalid size %s", s->sizearg);
+            return;
+        }
+        s->legacy_size = size;
+    }
+
+    /* check that role is reasonable */
+    if (s->role) {
+        if (strncmp(s->role, "peer", 5) == 0) {
+            s->master = ON_OFF_AUTO_OFF;
+        } else if (strncmp(s->role, "master", 7) == 0) {
+            s->master = ON_OFF_AUTO_ON;
+        } else {
+            error_setg(errp, "'role' must be 'peer' or 'master'");
+            return;
+        }
+    } else {
+        s->master = ON_OFF_AUTO_AUTO;
+    }
+
+    if (s->shmobj) {
+        s->hostmem = desugar_shm(s->shmobj, s->legacy_size);
+    }
+
+    /*
+     * Note: we don't use INTx with IVSHMEM_MSI at all, so this is a
+     * bald-faced lie then.  But it's a backwards compatible lie.
+     */
+    pci_config_set_interrupt_pin(dev->config, 1);
+
+    ivshmem_common_realize(dev, errp);
+}
+
+static void ivshmem_exit(PCIDevice *dev)
+{
+    IVShmemState *s = IVSHMEM_COMMON(dev);
     int i;
 
     if (s->migration_blocker) {
@@ -959,7 +987,7 @@ static void pci_ivshmem_exit(PCIDevice *dev)
         if (!s->hostmem) {
             void *addr = memory_region_get_ram_ptr(s->ivshmem_bar2);
 
-            if (munmap(addr, s->ivshmem_size) == -1) {
+            if (munmap(addr, memory_region_size(s->ivshmem_bar2)) == -1) {
                 error_report("Failed to munmap shared memory %s",
                              strerror(errno));
             }
@@ -1074,28 +1102,39 @@ static Property ivshmem_properties[] = {
     DEFINE_PROP_BIT("msi", IVShmemState, features, IVSHMEM_MSI, true),
     DEFINE_PROP_STRING("shm", IVShmemState, shmobj),
     DEFINE_PROP_STRING("role", IVShmemState, role),
-    DEFINE_PROP_UINT32("use64", IVShmemState, ivshmem_64bit, 1),
+    DEFINE_PROP_UINT32("use64", IVShmemState, not_legacy_32bit, 1),
     DEFINE_PROP_END_OF_LIST(),
 };
 
-static void ivshmem_class_init(ObjectClass *klass, void *data)
+static void ivshmem_common_class_init(ObjectClass *klass, void *data)
 {
     DeviceClass *dc = DEVICE_CLASS(klass);
     PCIDeviceClass *k = PCI_DEVICE_CLASS(klass);
 
-    k->realize = pci_ivshmem_realize;
-    k->exit = pci_ivshmem_exit;
+    k->realize = ivshmem_common_realize;
+    k->exit = ivshmem_exit;
     k->config_write = ivshmem_write_config;
     k->vendor_id = PCI_VENDOR_ID_IVSHMEM;
     k->device_id = PCI_DEVICE_ID_IVSHMEM;
     k->class_id = PCI_CLASS_MEMORY_RAM;
+    k->revision = 1;
     dc->reset = ivshmem_reset;
-    dc->props = ivshmem_properties;
-    dc->vmsd = &ivshmem_vmsd;
     set_bit(DEVICE_CATEGORY_MISC, dc->categories);
     dc->desc = "Inter-VM shared memory";
 }
 
+static void ivshmem_class_init(ObjectClass *klass, void *data)
+{
+    DeviceClass *dc = DEVICE_CLASS(klass);
+    PCIDeviceClass *k = PCI_DEVICE_CLASS(klass);
+
+    k->realize = ivshmem_realize;
+    k->revision = 0;
+    dc->desc = "Inter-VM shared memory (legacy)";
+    dc->props = ivshmem_properties;
+    dc->vmsd = &ivshmem_vmsd;
+}
+
 static void ivshmem_check_memdev_is_busy(Object *obj, const char *name,
                                          Object *val, Error **errp)
 {
@@ -1122,16 +1161,121 @@ static void ivshmem_init(Object *obj)
                              &error_abort);
 }
 
+static const TypeInfo ivshmem_common_info = {
+    .name          = TYPE_IVSHMEM_COMMON,
+    .parent        = TYPE_PCI_DEVICE,
+    .instance_size = sizeof(IVShmemState),
+    .abstract      = true,
+    .class_init    = ivshmem_common_class_init,
+};
+
 static const TypeInfo ivshmem_info = {
     .name          = TYPE_IVSHMEM,
-    .parent        = TYPE_PCI_DEVICE,
+    .parent        = TYPE_IVSHMEM_COMMON,
     .instance_size = sizeof(IVShmemState),
     .instance_init = ivshmem_init,
     .class_init    = ivshmem_class_init,
 };
 
+static const VMStateDescription ivshmem_plain_vmsd = {
+    .name = TYPE_IVSHMEM_PLAIN,
+    .version_id = 0,
+    .minimum_version_id = 0,
+    .pre_load = ivshmem_pre_load,
+    .post_load = ivshmem_post_load,
+    .fields = (VMStateField[]) {
+        VMSTATE_PCI_DEVICE(parent_obj, IVShmemState),
+        VMSTATE_UINT32(intrstatus, IVShmemState),
+        VMSTATE_UINT32(intrmask, IVShmemState),
+        VMSTATE_END_OF_LIST()
+    },
+};
+
+static Property ivshmem_plain_properties[] = {
+    DEFINE_PROP_ON_OFF_AUTO("master", IVShmemState, master, ON_OFF_AUTO_OFF),
+    DEFINE_PROP_END_OF_LIST(),
+};
+
+static void ivshmem_plain_init(Object *obj)
+{
+    IVShmemState *s = IVSHMEM_PLAIN(obj);
+
+    object_property_add_link(obj, "memdev", TYPE_MEMORY_BACKEND,
+                             (Object **)&s->hostmem,
+                             ivshmem_check_memdev_is_busy,
+                             OBJ_PROP_LINK_UNREF_ON_RELEASE,
+                             &error_abort);
+}
+
+static void ivshmem_plain_class_init(ObjectClass *klass, void *data)
+{
+    DeviceClass *dc = DEVICE_CLASS(klass);
+
+    dc->props = ivshmem_plain_properties;
+    dc->vmsd = &ivshmem_plain_vmsd;
+}
+
+static const TypeInfo ivshmem_plain_info = {
+    .name          = TYPE_IVSHMEM_PLAIN,
+    .parent        = TYPE_IVSHMEM_COMMON,
+    .instance_size = sizeof(IVShmemState),
+    .instance_init = ivshmem_plain_init,
+    .class_init    = ivshmem_plain_class_init,
+};
+
+static const VMStateDescription ivshmem_doorbell_vmsd = {
+    .name = TYPE_IVSHMEM_DOORBELL,
+    .version_id = 0,
+    .minimum_version_id = 0,
+    .pre_load = ivshmem_pre_load,
+    .post_load = ivshmem_post_load,
+    .fields = (VMStateField[]) {
+        VMSTATE_PCI_DEVICE(parent_obj, IVShmemState),
+        VMSTATE_MSIX(parent_obj, IVShmemState),
+        VMSTATE_UINT32(intrstatus, IVShmemState),
+        VMSTATE_UINT32(intrmask, IVShmemState),
+        VMSTATE_END_OF_LIST()
+    },
+};
+
+static Property ivshmem_doorbell_properties[] = {
+    DEFINE_PROP_CHR("chardev", IVShmemState, server_chr),
+    DEFINE_PROP_UINT32("vectors", IVShmemState, vectors, 1),
+    DEFINE_PROP_BIT("ioeventfd", IVShmemState, features, IVSHMEM_IOEVENTFD,
+                    true),
+    DEFINE_PROP_ON_OFF_AUTO("master", IVShmemState, master, ON_OFF_AUTO_OFF),
+    DEFINE_PROP_END_OF_LIST(),
+};
+
+static void ivshmem_doorbell_init(Object *obj)
+{
+    IVShmemState *s = IVSHMEM_DOORBELL(obj);
+
+    s->features |= (1 << IVSHMEM_MSI);
+    s->legacy_size = SIZE_MAX;  /* whatever the server sends */
+}
+
+static void ivshmem_doorbell_class_init(ObjectClass *klass, void *data)
+{
+    DeviceClass *dc = DEVICE_CLASS(klass);
+
+    dc->props = ivshmem_doorbell_properties;
+    dc->vmsd = &ivshmem_doorbell_vmsd;
+}
+
+static const TypeInfo ivshmem_doorbell_info = {
+    .name          = TYPE_IVSHMEM_DOORBELL,
+    .parent        = TYPE_IVSHMEM_COMMON,
+    .instance_size = sizeof(IVShmemState),
+    .instance_init = ivshmem_doorbell_init,
+    .class_init    = ivshmem_doorbell_class_init,
+};
+
 static void ivshmem_register_types(void)
 {
+    type_register_static(&ivshmem_common_info);
+    type_register_static(&ivshmem_plain_info);
+    type_register_static(&ivshmem_doorbell_info);
     type_register_static(&ivshmem_info);
 }
 
diff --git a/qemu-doc.texi b/qemu-doc.texi
index 8afbbcd..0dd01c7 100644
--- a/qemu-doc.texi
+++ b/qemu-doc.texi
@@ -1262,13 +1262,18 @@ basic example.
 
 @subsection Inter-VM Shared Memory device
 
-With KVM enabled on a Linux host, a shared memory device is available.  Guests
-map a POSIX shared memory region into the guest as a PCI device that enables
-zero-copy communication to the application level of the guests.  The basic
-syntax is:
+On Linux hosts, a shared memory device is available.  The basic syntax
+is:
 
 @example
-qemu-system-i386 -device ivshmem,size=@var{size},shm=@var{shm-name}
+qemu-system-x86_64 -device ivshmem-plain,memdev=@var{hostmem}
+@end example
+
+where @var{hostmem} names a host memory backend.  For a POSIX shared
+memory backend, use something like
+
+@example
+-object memory-backend-file,size=1M,share,mem-path=/dev/shm/ivshmem,id=@var{hostmem}
 @end example
 
 If desired, interrupts can be sent between guest VMs accessing the same shared
@@ -1282,8 +1287,7 @@ memory server is:
 ivshmem-server -p @var{pidfile} -S @var{path} -m @var{shm-name} -l @var{shm-size} -n @var{vectors}
 
 # Then start your qemu instances with matching arguments
-qemu-system-i386 -device ivshmem,size=@var{shm-size},vectors=@var{vectors},chardev=@var{id}
-                 [,msi=on][,ioeventfd=on][,role=peer|master]
+qemu-system-x86_64 -device ivshmem-doorbell,vectors=@var{vectors},chardev=@var{id}
                  -chardev socket,path=@var{path},id=@var{id}
 @end example
 
@@ -1291,12 +1295,11 @@ When using the server, the guest will be assigned a VM ID (>=0) that allows gues
 using the same server to communicate via interrupts.  Guests can read their
 VM ID from a device register (see ivshmem-spec.txt).
 
-The @option{role} argument can be set to either master or peer and will affect
-how the shared memory is migrated.  With @option{role=master}, the guest will
-copy the shared memory on migration to the destination host.  With
-@option{role=peer}, the guest will not be able to migrate with the device attached.
-With the @option{peer} case, the device should be detached and then reattached
-after migration using the PCI hotplug support.
+With device property @option{master=on}, the guest will copy the shared
+memory on migration to the destination host.  With @option{master=off},
+the guest will not be able to migrate with the device attached.  In the
+latter case, the device should be detached and then reattached after
+migration using the PCI hotplug support.
 
 @subsubsection ivshmem and hugepages
 
@@ -1304,8 +1307,8 @@ Instead of specifying the <shm size> using POSIX shm, you may specify
 a memory backend that has hugepage support:
 
 @example
-qemu-system-i386 -object memory-backend-file,size=1G,mem-path=/dev/hugepages/my-shmem-file,share,id=mb1
-                 -device ivshmem,x-memdev=mb1
+qemu-system-x86_64 -object memory-backend-file,size=1G,mem-path=/dev/hugepages/my-shmem-file,share,id=mb1
+                 -device ivshmem-plain,memdev=mb1
 @end example
 
 ivshmem-server also supports hugepages mount points with the
diff --git a/tests/ivshmem-test.c b/tests/ivshmem-test.c
index 68d6840..891b6b8 100644
--- a/tests/ivshmem-test.c
+++ b/tests/ivshmem-test.c
@@ -127,7 +127,9 @@ static void setup_vm_cmd(IVState *s, const char *cmd, bool msix)
 
 static void setup_vm(IVState *s)
 {
-    char *cmd = g_strdup_printf("-device ivshmem,shm=%s,size=1M", tmpshm);
+    char *cmd = g_strdup_printf("-object memory-backend-file"
+                                ",id=mb1,size=1M,share,mem-path=/dev/shm%s"
+                                " -device ivshmem-plain,memdev=mb1", tmpshm);
 
     setup_vm_cmd(s, cmd, false);
 
@@ -284,8 +286,10 @@ static void *server_thread(void *data)
 static void setup_vm_with_server(IVState *s, int nvectors, bool msi)
 {
     char *cmd = g_strdup_printf("-chardev socket,id=chr0,path=%s,nowait "
-                                "-device ivshmem,size=1M,chardev=chr0,vectors=%d,msi=%s",
-                                tmpserver, nvectors, msi ? "true" : "false");
+                                "-device ivshmem%s,chardev=chr0,vectors=%d",
+                                tmpserver,
+                                msi ? "-doorbell" : ",size=1M,msi=off",
+                                nvectors);
 
     setup_vm_cmd(s, cmd, msi);
 
@@ -412,7 +416,7 @@ static void test_ivshmem_memdev(void)
 
     /* just for the sake of checking memory-backend property */
     setup_vm_cmd(&state, "-object memory-backend-ram,size=1M,id=mb1"
-                 " -device ivshmem,x-memdev=mb1", false);
+                 " -device ivshmem-plain,memdev=mb1", false);
 
     cleanup_vm(&state);
 }
-- 
2.4.3

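A note on the size handling in process_msg_shmem() above: the legacy
device maps exactly its configured size and rejects a smaller object
from the server, while ivshmem-doorbell sets legacy_size to SIZE_MAX
and maps whatever the server sends.  The following is only a minimal
sketch of that decision, not code from the patch.  The helper name
pick_shmem_size() and its return convention are made up for
illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper (not in the patch): decide how many bytes of the
 * server's shared memory object to map.
 *
 * legacy_size == SIZE_MAX means "whatever the server sends", as set up
 * by ivshmem_doorbell_init().  Any other value is the size configured
 * for the legacy device.
 *
 * Returns 0 and stores the size to map in *size, or -1 when the server
 * sent fewer bytes than the legacy device was configured for.
 */
static int pick_shmem_size(size_t legacy_size, size_t server_size,
                           size_t *size)
{
    assert(size != NULL);

    if (legacy_size == SIZE_MAX) {
        *size = server_size;        /* no legacy constraint */
        return 0;
    }
    if (server_size < legacy_size) {
        return -1;                  /* server sent too little */
    }
    *size = legacy_size;            /* map only the configured part */
    return 0;
}
```

With a 4096-byte object from the server, a legacy size of 1024 maps
1024 bytes, a legacy size of 8192 is rejected, and SIZE_MAX maps all
4096 bytes.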
* [Qemu-devel] [PATCH 35/38] ivshmem: Clean up after the previous commit
  2016-02-29 18:40 [Qemu-devel] [PATCH 00/38] ivshmem: Fixes, cleanups, device model split Markus Armbruster
                   ` (33 preceding siblings ...)
  2016-02-29 18:40 ` [Qemu-devel] [PATCH 34/38] ivshmem: Split ivshmem-plain, ivshmem-doorbell off ivshmem Markus Armbruster
@ 2016-02-29 18:40 ` Markus Armbruster
  2016-03-03 13:56   ` Marc-André Lureau
  2016-02-29 18:40 ` [Qemu-devel] [PATCH 36/38] ivshmem: Drop ivshmem property x-memdev Markus Armbruster
                   ` (2 subsequent siblings)
  37 siblings, 1 reply; 118+ messages in thread
From: Markus Armbruster @ 2016-02-29 18:40 UTC (permalink / raw)
  To: qemu-devel; +Cc: claudio.fontana, cam, mlureau, david.marchand, pbonzini

Move code to more sensible places.  Use the opportunity to reorder and
document IVShmemState members.

Signed-off-by: Markus Armbruster <armbru@redhat.com>
---
 hw/misc/ivshmem.c | 420 +++++++++++++++++++++++++++---------------------------
 1 file changed, 213 insertions(+), 207 deletions(-)

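The reordered IVShmemState below documents that exactly one of hostmem
and server_chr may be set.  A minimal sketch of how such a constraint
can be checked, using the same "!!a + !!b" idiom as the legacy
property check in ivshmem_realize(), might look like this.  The struct
and function names are hypothetical stand-ins, not part of the patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the two backend pointers in IVShmemState. */
struct backend_choice {
    const void *hostmem;        /* memory backend, or NULL */
    const void *server_chr;     /* server chardev, or NULL */
};

/*
 * Return true when exactly one backend is configured.
 * !!p turns a pointer into 0 (NULL) or 1 (non-NULL), so the sum counts
 * the configured backends.
 */
static bool exactly_one_backend(const struct backend_choice *c)
{
    assert(c != NULL);
    return !!c->hostmem + !!c->server_chr == 1;
}
```

The check fails both when neither pointer is set and when both are,
which is the behavior the struct comment asks for.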
diff --git a/hw/misc/ivshmem.c b/hw/misc/ivshmem.c
index f7f5b3b..33b6842 100644
--- a/hw/misc/ivshmem.c
+++ b/hw/misc/ivshmem.c
@@ -85,25 +85,30 @@ typedef struct IVShmemState {
     PCIDevice parent_obj;
     /*< public >*/
 
-    HostMemoryBackend *hostmem;
+    uint32_t features;
+
+    /* exactly one of these two may be set */
+    HostMemoryBackend *hostmem; /* without interrupts */
+    CharDriverState *server_chr; /* with interrupts */
+
+    /* registers */
     uint32_t intrmask;
     uint32_t intrstatus;
+    int vm_id;
 
-    CharDriverState *server_chr;
-    MemoryRegion ivshmem_mmio;
-
+    /* BARs */
+    MemoryRegion ivshmem_mmio;  /* BAR 0 (registers) */
     MemoryRegion *ivshmem_bar2; /* BAR 2 (shared memory) */
 
+    /* interrupt support */
     Peer *peers;
     int nb_peers;               /* space in @peers[] */
-
-    int vm_id;
     uint32_t vectors;
-    uint32_t features;
     MSIVector *msi_vectors;
     uint64_t msg_buf;           /* buffer for receiving server messages */
     int msg_buffered_bytes;     /* #bytes in @msg_buf */
 
+    /* migration stuff */
     OnOffAuto master;
     Error *migration_blocker;
 
@@ -812,33 +817,6 @@ static void ivshmem_write_config(PCIDevice *pdev, uint32_t address,
     }
 }
 
-static HostMemoryBackend *desugar_shm(const char *shm, size_t size)
-{
-    /* TODO avoid the detour through QemuOpts */
-    static int counter;
-    QemuOpts *opts = qemu_opts_create(qemu_find_opts("object"),
-                                      NULL, 0, &error_abort);
-    char *path;
-    Object *obj;
-
-    qemu_opt_set(opts, "qom-type", "memory-backend-file",
-    &error_abort);
-    /* FIXME need a better way to make up an ID */
-    qemu_opts_set_id(opts, g_strdup_printf("ivshmem-backend-%d", counter++));
-    path = g_strdup_printf("/dev/shm/%s", shm);
-    qemu_opt_set(opts, "mem-path", path, &error_abort);
-    qemu_opt_set_number(opts, "size", size, &error_abort);
-    qemu_opt_set_bool(opts, "share", true, &error_abort);
-    g_free(path);
-
-    obj = user_creatable_add_opts(opts, &error_abort);
-    qemu_opts_del(opts);
-
-    user_creatable_complete(obj, &error_abort);
-
-    return MEMORY_BACKEND(obj);
-}
-
 static void ivshmem_common_realize(PCIDevice *dev, Error **errp)
 {
     IVShmemState *s = IVSHMEM_COMMON(dev);
@@ -914,65 +892,6 @@ static void ivshmem_common_realize(PCIDevice *dev, Error **errp)
     }
 }
 
-static void ivshmem_realize(PCIDevice *dev, Error **errp)
-{
-    IVShmemState *s = IVSHMEM_COMMON(dev);
-
-    if (!qtest_enabled()) {
-        error_report("ivshmem is deprecated, please use ivshmem-plain"
-                     " or ivshmem-doorbell instead");
-    }
-
-    if (!!s->server_chr + !!s->shmobj + !!s->hostmem != 1) {
-        error_setg(errp,
-                   "You must specify either 'shm', 'chardev' or 'x-memdev'");
-        return;
-    }
-
-    if (s->hostmem) {
-        if (s->sizearg) {
-            g_warning("size argument ignored with hostmem");
-        }
-    } else if (s->sizearg == NULL) {
-        s->legacy_size = 4 << 20; /* 4 MB default */
-    } else {
-        char *end;
-        int64_t size = qemu_strtosz(s->sizearg, &end);
-        if (size < 0 || (size_t)size != size || *end != '\0'
-            || !is_power_of_2(size)) {
-            error_setg(errp, "Invalid size %s", s->sizearg);
-            return;
-        }
-        s->legacy_size = size;
-    }
-
-    /* check that role is reasonable */
-    if (s->role) {
-        if (strncmp(s->role, "peer", 5) == 0) {
-            s->master = ON_OFF_AUTO_OFF;
-        } else if (strncmp(s->role, "master", 7) == 0) {
-            s->master = ON_OFF_AUTO_ON;
-        } else {
-            error_setg(errp, "'role' must be 'peer' or 'master'");
-            return;
-        }
-    } else {
-        s->master = ON_OFF_AUTO_AUTO;
-    }
-
-    if (s->shmobj) {
-        s->hostmem = desugar_shm(s->shmobj, s->legacy_size);
-    }
-
-    /*
-     * Note: we don't use INTx with IVSHMEM_MSI at all, so this is a
-     * bald-faced lie then.  But it's a backwards compatible lie.
-     */
-    pci_config_set_interrupt_pin(dev->config, 1);
-
-    ivshmem_common_realize(dev, errp);
-}
-
 static void ivshmem_exit(PCIDevice *dev)
 {
     IVShmemState *s = IVSHMEM_COMMON(dev);
@@ -1012,18 +931,6 @@ static void ivshmem_exit(PCIDevice *dev)
     g_free(s->msi_vectors);
 }
 
-static bool test_msix(void *opaque, int version_id)
-{
-    IVShmemState *s = opaque;
-
-    return ivshmem_has_feature(s, IVSHMEM_MSI);
-}
-
-static bool test_no_msix(void *opaque, int version_id)
-{
-    return !test_msix(opaque, version_id);
-}
-
 static int ivshmem_pre_load(void *opaque)
 {
     IVShmemState *s = opaque;
@@ -1042,70 +949,6 @@ static int ivshmem_post_load(void *opaque, int version_id)
     return 0;
 }
 
-static int ivshmem_load_old(QEMUFile *f, void *opaque, int version_id)
-{
-    IVShmemState *s = opaque;
-    PCIDevice *pdev = PCI_DEVICE(s);
-    int ret;
-
-    IVSHMEM_DPRINTF("ivshmem_load_old\n");
-
-    if (version_id != 0) {
-        return -EINVAL;
-    }
-
-    ret = ivshmem_pre_load(s);
-    if (ret) {
-        return ret;
-    }
-
-    ret = pci_device_load(pdev, f);
-    if (ret) {
-        return ret;
-    }
-
-    if (ivshmem_has_feature(s, IVSHMEM_MSI)) {
-        msix_load(pdev, f);
-    } else {
-        s->intrstatus = qemu_get_be32(f);
-        s->intrmask = qemu_get_be32(f);
-    }
-    ivshmem_vector_use(s);
-
-    return 0;
-}
-
-static const VMStateDescription ivshmem_vmsd = {
-    .name = "ivshmem",
-    .version_id = 1,
-    .minimum_version_id = 1,
-    .pre_load = ivshmem_pre_load,
-    .post_load = ivshmem_post_load,
-    .fields = (VMStateField[]) {
-        VMSTATE_PCI_DEVICE(parent_obj, IVShmemState),
-
-        VMSTATE_MSIX_TEST(parent_obj, IVShmemState, test_msix),
-        VMSTATE_UINT32_TEST(intrstatus, IVShmemState, test_no_msix),
-        VMSTATE_UINT32_TEST(intrmask, IVShmemState, test_no_msix),
-
-        VMSTATE_END_OF_LIST()
-    },
-    .load_state_old = ivshmem_load_old,
-    .minimum_version_id_old = 0
-};
-
-static Property ivshmem_properties[] = {
-    DEFINE_PROP_CHR("chardev", IVShmemState, server_chr),
-    DEFINE_PROP_STRING("size", IVShmemState, sizearg),
-    DEFINE_PROP_UINT32("vectors", IVShmemState, vectors, 1),
-    DEFINE_PROP_BIT("ioeventfd", IVShmemState, features, IVSHMEM_IOEVENTFD, false),
-    DEFINE_PROP_BIT("msi", IVShmemState, features, IVSHMEM_MSI, true),
-    DEFINE_PROP_STRING("shm", IVShmemState, shmobj),
-    DEFINE_PROP_STRING("role", IVShmemState, role),
-    DEFINE_PROP_UINT32("use64", IVShmemState, not_legacy_32bit, 1),
-    DEFINE_PROP_END_OF_LIST(),
-};
-
 static void ivshmem_common_class_init(ObjectClass *klass, void *data)
 {
     DeviceClass *dc = DEVICE_CLASS(klass);
@@ -1123,17 +966,13 @@ static void ivshmem_common_class_init(ObjectClass *klass, void *data)
     dc->desc = "Inter-VM shared memory";
 }
 
-static void ivshmem_class_init(ObjectClass *klass, void *data)
-{
-    DeviceClass *dc = DEVICE_CLASS(klass);
-    PCIDeviceClass *k = PCI_DEVICE_CLASS(klass);
-
-    k->realize = ivshmem_realize;
-    k->revision = 0;
-    dc->desc = "Inter-VM shared memory (legacy)";
-    dc->props = ivshmem_properties;
-    dc->vmsd = &ivshmem_vmsd;
-}
+static const TypeInfo ivshmem_common_info = {
+    .name          = TYPE_IVSHMEM_COMMON,
+    .parent        = TYPE_PCI_DEVICE,
+    .instance_size = sizeof(IVShmemState),
+    .abstract      = true,
+    .class_init    = ivshmem_common_class_init,
+};
 
 static void ivshmem_check_memdev_is_busy(Object *obj, const char *name,
                                          Object *val, Error **errp)
@@ -1150,33 +989,6 @@ static void ivshmem_check_memdev_is_busy(Object *obj, const char *name,
     }
 }
 
-static void ivshmem_init(Object *obj)
-{
-    IVShmemState *s = IVSHMEM(obj);
-
-    object_property_add_link(obj, "x-memdev", TYPE_MEMORY_BACKEND,
-                             (Object **)&s->hostmem,
-                             ivshmem_check_memdev_is_busy,
-                             OBJ_PROP_LINK_UNREF_ON_RELEASE,
-                             &error_abort);
-}
-
-static const TypeInfo ivshmem_common_info = {
-    .name          = TYPE_IVSHMEM_COMMON,
-    .parent        = TYPE_PCI_DEVICE,
-    .instance_size = sizeof(IVShmemState),
-    .abstract      = true,
-    .class_init    = ivshmem_common_class_init,
-};
-
-static const TypeInfo ivshmem_info = {
-    .name          = TYPE_IVSHMEM,
-    .parent        = TYPE_IVSHMEM_COMMON,
-    .instance_size = sizeof(IVShmemState),
-    .instance_init = ivshmem_init,
-    .class_init    = ivshmem_class_init,
-};
-
 static const VMStateDescription ivshmem_plain_vmsd = {
     .name = TYPE_IVSHMEM_PLAIN,
     .version_id = 0,
@@ -1271,6 +1083,200 @@ static const TypeInfo ivshmem_doorbell_info = {
     .class_init    = ivshmem_doorbell_class_init,
 };
 
+static int ivshmem_load_old(QEMUFile *f, void *opaque, int version_id)
+{
+    IVShmemState *s = opaque;
+    PCIDevice *pdev = PCI_DEVICE(s);
+    int ret;
+
+    IVSHMEM_DPRINTF("ivshmem_load_old\n");
+
+    if (version_id != 0) {
+        return -EINVAL;
+    }
+
+    ret = ivshmem_pre_load(s);
+    if (ret) {
+        return ret;
+    }
+
+    ret = pci_device_load(pdev, f);
+    if (ret) {
+        return ret;
+    }
+
+    if (ivshmem_has_feature(s, IVSHMEM_MSI)) {
+        msix_load(pdev, f);
+    } else {
+        s->intrstatus = qemu_get_be32(f);
+        s->intrmask = qemu_get_be32(f);
+    }
+    ivshmem_vector_use(s);
+
+    return 0;
+}
+
+static bool test_msix(void *opaque, int version_id)
+{
+    IVShmemState *s = opaque;
+
+    return ivshmem_has_feature(s, IVSHMEM_MSI);
+}
+
+static bool test_no_msix(void *opaque, int version_id)
+{
+    return !test_msix(opaque, version_id);
+}
+
+static const VMStateDescription ivshmem_vmsd = {
+    .name = "ivshmem",
+    .version_id = 1,
+    .minimum_version_id = 1,
+    .pre_load = ivshmem_pre_load,
+    .post_load = ivshmem_post_load,
+    .fields = (VMStateField[]) {
+        VMSTATE_PCI_DEVICE(parent_obj, IVShmemState),
+
+        VMSTATE_MSIX_TEST(parent_obj, IVShmemState, test_msix),
+        VMSTATE_UINT32_TEST(intrstatus, IVShmemState, test_no_msix),
+        VMSTATE_UINT32_TEST(intrmask, IVShmemState, test_no_msix),
+
+        VMSTATE_END_OF_LIST()
+    },
+    .load_state_old = ivshmem_load_old,
+    .minimum_version_id_old = 0
+};
+
+static Property ivshmem_properties[] = {
+    DEFINE_PROP_CHR("chardev", IVShmemState, server_chr),
+    DEFINE_PROP_STRING("size", IVShmemState, sizearg),
+    DEFINE_PROP_UINT32("vectors", IVShmemState, vectors, 1),
+    DEFINE_PROP_BIT("ioeventfd", IVShmemState, features, IVSHMEM_IOEVENTFD,
+                    false),
+    DEFINE_PROP_BIT("msi", IVShmemState, features, IVSHMEM_MSI, true),
+    DEFINE_PROP_STRING("shm", IVShmemState, shmobj),
+    DEFINE_PROP_STRING("role", IVShmemState, role),
+    DEFINE_PROP_UINT32("use64", IVShmemState, not_legacy_32bit, 1),
+    DEFINE_PROP_END_OF_LIST(),
+};
+
+static HostMemoryBackend *desugar_shm(const char *shm, size_t size)
+{
+    /* TODO avoid the detour through QemuOpts */
+    static int counter;
+    QemuOpts *opts = qemu_opts_create(qemu_find_opts("object"),
+                                      NULL, 0, &error_abort);
+    char *path;
+    Object *obj;
+
+    qemu_opt_set(opts, "qom-type", "memory-backend-file",
+                 &error_abort);
+    /* FIXME need a better way to make up an ID */
+    qemu_opts_set_id(opts, g_strdup_printf("ivshmem-backend-%d", counter++));
+    path = g_strdup_printf("/dev/shm/%s", shm);
+    qemu_opt_set(opts, "mem-path", path, &error_abort);
+    qemu_opt_set_number(opts, "size", size, &error_abort);
+    qemu_opt_set_bool(opts, "share", true, &error_abort);
+    g_free(path);
+
+    obj = user_creatable_add_opts(opts, &error_abort);
+    qemu_opts_del(opts);
+
+    user_creatable_complete(obj, &error_abort);
+
+    return MEMORY_BACKEND(obj);
+}
+
+static void ivshmem_realize(PCIDevice *dev, Error **errp)
+{
+    IVShmemState *s = IVSHMEM_COMMON(dev);
+
+    if (!qtest_enabled()) {
+        error_report("ivshmem is deprecated, please use ivshmem-plain"
+                     " or ivshmem-doorbell instead");
+    }
+
+    if (!!s->server_chr + !!s->shmobj + !!s->hostmem != 1) {
+        error_setg(errp,
+                   "You must specify either 'shm', 'chardev' or 'x-memdev'");
+        return;
+    }
+
+    if (s->hostmem) {
+        if (s->sizearg) {
+            g_warning("size argument ignored with hostmem");
+        }
+    } else if (s->sizearg == NULL) {
+        s->legacy_size = 4 << 20; /* 4 MB default */
+    } else {
+        char *end;
+        int64_t size = qemu_strtosz(s->sizearg, &end);
+        if (size < 0 || (size_t)size != size || *end != '\0'
+            || !is_power_of_2(size)) {
+            error_setg(errp, "Invalid size %s", s->sizearg);
+            return;
+        }
+        s->legacy_size = size;
+    }
+
+    /* check that role is reasonable */
+    if (s->role) {
+        if (strncmp(s->role, "peer", 5) == 0) {
+            s->master = ON_OFF_AUTO_OFF;
+        } else if (strncmp(s->role, "master", 7) == 0) {
+            s->master = ON_OFF_AUTO_ON;
+        } else {
+            error_setg(errp, "'role' must be 'peer' or 'master'");
+            return;
+        }
+    } else {
+        s->master = ON_OFF_AUTO_AUTO;
+    }
+
+    if (s->shmobj) {
+        s->hostmem = desugar_shm(s->shmobj, s->legacy_size);
+    }
+
+    /*
+     * Note: we don't use INTx with IVSHMEM_MSI at all, so this is a
+     * bald-faced lie then.  But it's a backwards compatible lie.
+     */
+    pci_config_set_interrupt_pin(dev->config, 1);
+
+    ivshmem_common_realize(dev, errp);
+}
+
+static void ivshmem_init(Object *obj)
+{
+    IVShmemState *s = IVSHMEM(obj);
+
+    object_property_add_link(obj, "x-memdev", TYPE_MEMORY_BACKEND,
+                             (Object **)&s->hostmem,
+                             ivshmem_check_memdev_is_busy,
+                             OBJ_PROP_LINK_UNREF_ON_RELEASE,
+                             &error_abort);
+}
+
+static void ivshmem_class_init(ObjectClass *klass, void *data)
+{
+    DeviceClass *dc = DEVICE_CLASS(klass);
+    PCIDeviceClass *k = PCI_DEVICE_CLASS(klass);
+
+    k->realize = ivshmem_realize;
+    k->revision = 0;
+    dc->desc = "Inter-VM shared memory (legacy)";
+    dc->props = ivshmem_properties;
+    dc->vmsd = &ivshmem_vmsd;
+}
+
+static const TypeInfo ivshmem_info = {
+    .name          = TYPE_IVSHMEM,
+    .parent        = TYPE_IVSHMEM_COMMON,
+    .instance_size = sizeof(IVShmemState),
+    .instance_init = ivshmem_init,
+    .class_init    = ivshmem_class_init,
+};
+
 static void ivshmem_register_types(void)
 {
     type_register_static(&ivshmem_common_info);
-- 
2.4.3

* [Qemu-devel] [PATCH 36/38] ivshmem: Drop ivshmem property x-memdev
  2016-02-29 18:40 [Qemu-devel] [PATCH 00/38] ivshmem: Fixes, cleanups, device model split Markus Armbruster
                   ` (34 preceding siblings ...)
  2016-02-29 18:40 ` [Qemu-devel] [PATCH 35/38] ivshmem: Clean up after the previous commit Markus Armbruster
@ 2016-02-29 18:40 ` Markus Armbruster
  2016-03-03 14:03   ` Marc-André Lureau
  2016-02-29 18:40 ` [Qemu-devel] [PATCH 37/38] ivshmem: Require master to have ID zero Markus Armbruster
  2016-02-29 18:40 ` [Qemu-devel] [PATCH 38/38] contrib/ivshmem-server: Print "not for production" warning Markus Armbruster
  37 siblings, 1 reply; 118+ messages in thread
From: Markus Armbruster @ 2016-02-29 18:40 UTC (permalink / raw)
  To: qemu-devel; +Cc: claudio.fontana, cam, mlureau, david.marchand, pbonzini

Use ivshmem-plain instead.

Signed-off-by: Markus Armbruster <armbru@redhat.com>
---
 hw/misc/ivshmem.c | 15 +--------------
 1 file changed, 1 insertion(+), 14 deletions(-)

diff --git a/hw/misc/ivshmem.c b/hw/misc/ivshmem.c
index 33b6842..f6fce15 100644
--- a/hw/misc/ivshmem.c
+++ b/hw/misc/ivshmem.c
@@ -1197,8 +1197,7 @@ static void ivshmem_realize(PCIDevice *dev, Error **errp)
     }
 
     if (!!s->server_chr + !!s->shmobj + !!s->hostmem != 1) {
-        error_setg(errp,
-                   "You must specify either 'shm', 'chardev' or 'x-memdev'");
+        error_setg(errp, "You must specify either 'shm' or 'chardev'");
         return;
     }
 
@@ -1246,17 +1245,6 @@ static void ivshmem_realize(PCIDevice *dev, Error **errp)
     ivshmem_common_realize(dev, errp);
 }
 
-static void ivshmem_init(Object *obj)
-{
-    IVShmemState *s = IVSHMEM(obj);
-
-    object_property_add_link(obj, "x-memdev", TYPE_MEMORY_BACKEND,
-                             (Object **)&s->hostmem,
-                             ivshmem_check_memdev_is_busy,
-                             OBJ_PROP_LINK_UNREF_ON_RELEASE,
-                             &error_abort);
-}
-
 static void ivshmem_class_init(ObjectClass *klass, void *data)
 {
     DeviceClass *dc = DEVICE_CLASS(klass);
@@ -1273,7 +1261,6 @@ static const TypeInfo ivshmem_info = {
     .name          = TYPE_IVSHMEM,
     .parent        = TYPE_IVSHMEM_COMMON,
     .instance_size = sizeof(IVShmemState),
-    .instance_init = ivshmem_init,
     .class_init    = ivshmem_class_init,
 };
 
-- 
2.4.3

* [Qemu-devel] [PATCH 37/38] ivshmem: Require master to have ID zero
  2016-02-29 18:40 [Qemu-devel] [PATCH 00/38] ivshmem: Fixes, cleanups, device model split Markus Armbruster
                   ` (35 preceding siblings ...)
  2016-02-29 18:40 ` [Qemu-devel] [PATCH 36/38] ivshmem: Drop ivshmem property x-memdev Markus Armbruster
@ 2016-02-29 18:40 ` Markus Armbruster
  2016-03-03 14:11   ` Marc-André Lureau
  2016-02-29 18:40 ` [Qemu-devel] [PATCH 38/38] contrib/ivshmem-server: Print "not for production" warning Markus Armbruster
  37 siblings, 1 reply; 118+ messages in thread
From: Markus Armbruster @ 2016-02-29 18:40 UTC (permalink / raw)
  To: qemu-devel; +Cc: claudio.fontana, cam, mlureau, david.marchand, pbonzini

Migration with ivshmem needs to be carefully orchestrated to work.
Exactly one peer (the "master") migrates to the destination, all other
peers need to unplug (and disconnect), migrate, plug back (and
reconnect).  This is sort of documented in qemu-doc.

If peers connect on the destination before migration completes, the
shared memory can get messed up.  This isn't documented anywhere.  Fix
that in qemu-doc.

To avoid messing up register IVPosition on migration, the server must
assign the same ID on source and destination.  ivshmem-spec.txt leaves
ID assignment unspecified, however.

Amend ivshmem-spec.txt to require the first client to receive ID zero.
The example ivshmem-server complies: it always assigns the first
unused ID.

For a bit of additional safety, enforce ID zero for the master.  This
does nothing when we're not using a server, because the ID is zero for
all peers then.
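
The "first unused ID" policy mentioned above can be sketched as
follows.  This is only an illustration, not the example server's
actual code; IdTable, id_assign(), id_release() and MAX_PEERS are
hypothetical names made up for the sketch.

```c
#include <assert.h>
#include <stdbool.h>

#define MAX_PEERS 8             /* hypothetical limit, small for illustration */

/* Hypothetical bookkeeping: which IDs are currently handed out */
typedef struct {
    bool in_use[MAX_PEERS];
} IdTable;

/* Hand out the lowest ID not in use; return -1 when all are taken */
static int id_assign(IdTable *t)
{
    for (int i = 0; i < MAX_PEERS; i++) {
        if (!t->in_use[i]) {
            t->in_use[i] = true;
            return i;
        }
    }
    return -1;
}

/* Make an ID available again after its client disconnects */
static void id_release(IdTable *t, int id)
{
    assert(id >= 0 && id < MAX_PEERS);
    t->in_use[id] = false;
}
```

With such a policy, the first client to connect to a fresh server
always gets ID zero, which is what the master check relies on.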

Signed-off-by: Markus Armbruster <armbru@redhat.com>
---
 docs/specs/ivshmem-spec.txt | 2 ++
 hw/misc/ivshmem.c           | 6 ++++++
 qemu-doc.texi               | 5 +++++
 3 files changed, 13 insertions(+)

diff --git a/docs/specs/ivshmem-spec.txt b/docs/specs/ivshmem-spec.txt
index 047dfc0..b062acd 100644
--- a/docs/specs/ivshmem-spec.txt
+++ b/docs/specs/ivshmem-spec.txt
@@ -164,6 +164,8 @@ For each new client that connects to the server, the server
 - sends interrupt setup messages to the new client (these contain file
   descriptors for receiving interrupts).
 
+The first client to connect to the server receives ID zero.
+
 When a client disconnects from the server, the server sends disconnect
 notifications to the other clients.
 
diff --git a/hw/misc/ivshmem.c b/hw/misc/ivshmem.c
index f6fce15..9a292e5 100644
--- a/hw/misc/ivshmem.c
+++ b/hw/misc/ivshmem.c
@@ -869,6 +869,12 @@ static void ivshmem_common_realize(PCIDevice *dev, Error **errp)
             return;
         }
 
+        if (s->master == ON_OFF_AUTO_ON && s->vm_id != 0) {
+            error_setg(errp,
+                       "master must connect to the server before any peers");
+            return;
+        }
+
         qemu_chr_add_handlers(s->server_chr, ivshmem_can_receive,
                               ivshmem_read, NULL, s);
 
diff --git a/qemu-doc.texi b/qemu-doc.texi
index 0dd01c7..79141d3 100644
--- a/qemu-doc.texi
+++ b/qemu-doc.texi
@@ -1295,12 +1295,17 @@ When using the server, the guest will be assigned a VM ID (>=0) that allows gues
 using the same server to communicate via interrupts.  Guests can read their
 VM ID from a device register (see ivshmem-spec.txt).
 
+@subsubsection Migration with ivshmem
+
 With device property @option{master=on}, the guest will copy the shared
 memory on migration to the destination host.  With @option{master=off},
 the guest will not be able to migrate with the device attached.  In the
 latter case, the device should be detached and then reattached after
 migration using the PCI hotplug support.
 
+At most one of the devices sharing the same memory can be master.  The
+master must complete migration before you plug back the other devices.
+
 @subsubsection ivshmem and hugepages
 
 Instead of specifying the <shm size> using POSIX shm, you may specify
-- 
2.4.3

* [Qemu-devel] [PATCH 38/38] contrib/ivshmem-server: Print "not for production" warning
  2016-02-29 18:40 [Qemu-devel] [PATCH 00/38] ivshmem: Fixes, cleanups, device model split Markus Armbruster
                   ` (36 preceding siblings ...)
  2016-02-29 18:40 ` [Qemu-devel] [PATCH 37/38] ivshmem: Require master to have ID zero Markus Armbruster
@ 2016-02-29 18:40 ` Markus Armbruster
  2016-03-03 14:15   ` Marc-André Lureau
  37 siblings, 1 reply; 118+ messages in thread
From: Markus Armbruster @ 2016-02-29 18:40 UTC (permalink / raw)
  To: qemu-devel; +Cc: claudio.fontana, cam, mlureau, david.marchand, pbonzini

The code is okay for illustrating how things work and for testing, but
its error handling makes it unfit for production use.  Print a warning
to protect the innocent.

Signed-off-by: Markus Armbruster <armbru@redhat.com>
---
 contrib/ivshmem-server/main.c | 6 ++++++
 1 file changed, 6 insertions(+)

diff --git a/contrib/ivshmem-server/main.c b/contrib/ivshmem-server/main.c
index cca1061..97488dc 100644
--- a/contrib/ivshmem-server/main.c
+++ b/contrib/ivshmem-server/main.c
@@ -197,6 +197,12 @@ main(int argc, char *argv[])
     };
     int ret = 1;
 
+    /*
+     * Do not remove this notice without adding proper error handling!
+     * Start with handling ivshmem_server_send_one_msg() failure.
+     */
+    printf("*** Example code, do not use in production ***\n");
+
     /* parse arguments, will exit on error */
     ivshmem_server_parse_args(&args, argc, argv);
 
-- 
2.4.3

* Re: [Qemu-devel] [PATCH 02/38] qemu-doc: Fix ivshmem huge page example
  2016-02-29 18:40 ` [Qemu-devel] [PATCH 02/38] qemu-doc: Fix ivshmem huge page example Markus Armbruster
@ 2016-03-01 10:51   ` Marc-André Lureau
  2016-03-01 11:35   ` Paolo Bonzini
  1 sibling, 0 replies; 118+ messages in thread
From: Marc-André Lureau @ 2016-03-01 10:51 UTC (permalink / raw)
  To: Markus Armbruster
  Cc: Paolo Bonzini, cam, Claudio Fontana, QEMU, David Marchand

On Mon, Feb 29, 2016 at 7:40 PM, Markus Armbruster <armbru@redhat.com> wrote:
> Option parameter "share" is missing.  Without it, you get a *private*
> mmap(), which defeats ivshmem's purpose pretty thoroughly ;)
>
> While there, switch to the conventional mountpoint of hugetlbfs
> /dev/hugepages.
>
> Signed-off-by: Markus Armbruster <armbru@redhat.com>

Reviewed-by: Marc-André Lureau <marcandre.lureau@redhat.com>


> ---
>  qemu-doc.texi | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/qemu-doc.texi b/qemu-doc.texi
> index bc9dd13..65f3b29 100644
> --- a/qemu-doc.texi
> +++ b/qemu-doc.texi
> @@ -1311,7 +1311,7 @@ Instead of specifying the <shm size> using POSIX shm, you may specify
>  a memory backend that has hugepage support:
>
>  @example
> -qemu-system-i386 -object memory-backend-file,size=1G,mem-path=/mnt/hugepages/my-shmem-file,id=mb1
> +qemu-system-i386 -object memory-backend-file,size=1G,mem-path=/dev/hugepages/my-shmem-file,share,id=mb1
>                   -device ivshmem,x-memdev=mb1
>  @end example
>
> --
> 2.4.3
>
>



-- 
Marc-André Lureau

* Re: [Qemu-devel] [PATCH 03/38] event_notifier: Make event_notifier_init_fd() #ifdef CONFIG_EVENTFD
  2016-02-29 18:40 ` [Qemu-devel] [PATCH 03/38] event_notifier: Make event_notifier_init_fd() #ifdef CONFIG_EVENTFD Markus Armbruster
@ 2016-03-01 10:57   ` Marc-André Lureau
  2016-03-01 12:00     ` Markus Armbruster
  2016-03-01 11:35   ` Paolo Bonzini
  1 sibling, 1 reply; 118+ messages in thread
From: Marc-André Lureau @ 2016-03-01 10:57 UTC (permalink / raw)
  To: Markus Armbruster
  Cc: Paolo Bonzini, cam, Claudio Fontana, QEMU, David Marchand

Hi

On Mon, Feb 29, 2016 at 7:40 PM, Markus Armbruster <armbru@redhat.com> wrote:
> Event notifiers are designed for eventfd(2).  They can fall back to
> pipes, but according to Paolo, event_notifier_init_fd() really
> requires the real thing, and should therefore be under #ifdef
> CONFIG_EVENTFD.  Do that.
>
> Its only user is ivshmem, which is currently CONFIG_POSIX.  Narrow it
> to CONFIG_EVENTFD.
>
> Cc: Paolo Bonzini <pbonzini@redhat.com>
> Signed-off-by: Markus Armbruster <armbru@redhat.com>
> ---
>  default-configs/pci.mak     | 2 +-
>  util/event_notifier-posix.c | 6 ++++++
>  2 files changed, 7 insertions(+), 1 deletion(-)
>
> diff --git a/default-configs/pci.mak b/default-configs/pci.mak
> index 4fa9a28..9c8bc68 100644
> --- a/default-configs/pci.mak
> +++ b/default-configs/pci.mak
> @@ -36,5 +36,5 @@ CONFIG_SDHCI=y
>  CONFIG_EDU=y
>  CONFIG_VGA=y
>  CONFIG_VGA_PCI=y
> -CONFIG_IVSHMEM=$(CONFIG_POSIX)
> +CONFIG_IVSHMEM=$(CONFIG_EVENTFD)

This narrows ivshmem to OSes that provide eventfd.  After the split, it
should be easier to bring POSIX support back for ivshmem-plain, but
this change is important to highlight.

>  CONFIG_ROCKER=y
> diff --git a/util/event_notifier-posix.c b/util/event_notifier-posix.c
> index 2e30e74..c9657a6 100644
> --- a/util/event_notifier-posix.c
> +++ b/util/event_notifier-posix.c
> @@ -20,11 +20,17 @@
>  #include <sys/eventfd.h>
>  #endif
>
> +#ifdef CONFIG_EVENTFD
> +/*
> + * Initialize @e with existing file descriptor @fd.
> + * @fd must be a genuine eventfd object, emulation with pipe won't do.
> + */
>  void event_notifier_init_fd(EventNotifier *e, int fd)
>  {
>      e->rfd = fd;
>      e->wfd = fd;
>  }
> +#endif
>
>  int event_notifier_init(EventNotifier *e, int active)
>  {
> --
> 2.4.3
>
>



-- 
Marc-André Lureau

* Re: [Qemu-devel] [PATCH 04/38] tests/libqos/pci-pc: Fix qpci_pc_iomap() to map BARs aligned
  2016-02-29 18:40 ` [Qemu-devel] [PATCH 04/38] tests/libqos/pci-pc: Fix qpci_pc_iomap() to map BARs aligned Markus Armbruster
@ 2016-03-01 11:05   ` Marc-André Lureau
  2016-03-01 12:05     ` Markus Armbruster
  0 siblings, 1 reply; 118+ messages in thread
From: Marc-André Lureau @ 2016-03-01 11:05 UTC (permalink / raw)
  To: Markus Armbruster
  Cc: Paolo Bonzini, cam, Claudio Fontana, QEMU, David Marchand

On Mon, Feb 29, 2016 at 7:40 PM, Markus Armbruster <armbru@redhat.com> wrote:
> qpci_pc_iomap() maps BARs one after the other, without padding.  This
> is wrong.  PCI Local Bus Specification Revision 3.0, 6.2.5.1. Address
> Maps: "all address spaces used are a power of two in size and are
> naturally aligned".  That's because the size of a BAR is given by the
> number of address bits the device decodes, and the BAR needs to be
> mapped at a multiple of that size to ensure the address decoding
> works.
>
> Fix qpci_pc_iomap() accordingly.  This takes care of a FIXME in
> ivshmem-test.
>
> Signed-off-by: Markus Armbruster <armbru@redhat.com>

Reviewed-by: Marc-André Lureau <marcandre.lureau@redhat.com>

Neat, thanks for fixing my fixme ;)
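
For readers following along, the natural-alignment rule from the
commit message can be sketched like this.  It is a standalone
illustration, not the libqos code: HoleAlloc, align_up() and
hole_alloc_bar() are hypothetical names, and the sketch assumes the
hole's start address is itself aligned to at least the largest BAR
size.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical allocator state, loosely modeled on the pci_hole_* fields */
typedef struct {
    uint64_t start;     /* base address of the hole */
    uint64_t size;      /* total size of the hole */
    uint64_t alloc;     /* offset of the first unallocated byte */
} HoleAlloc;

/* Round @n up to the next multiple of @align, which must be a power of two */
static uint64_t align_up(uint64_t n, uint64_t align)
{
    assert(align != 0 && (align & (align - 1)) == 0);
    return (n + align - 1) & ~(align - 1);
}

/*
 * Allocate a BAR of @size bytes at an offset that is a multiple of @size.
 * Returns the address, or UINT64_MAX if the BAR does not fit.
 */
static uint64_t hole_alloc_bar(HoleAlloc *h, uint64_t size)
{
    uint64_t off = align_up(h->alloc, size);

    if (off + size > h->size) {
        return UINT64_MAX;
    }
    h->alloc = off + size;
    return h->start + off;
}
```

Mapping a 256 byte BAR followed by a 1 MiB BAR leaves a gap between
the two, which is exactly the padding the old code lacked.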

> ---
>  tests/ivshmem-test.c  | 17 ++++++++---------
>  tests/libqos/pci-pc.c |  8 ++++++--
>  2 files changed, 14 insertions(+), 11 deletions(-)
>
> diff --git a/tests/ivshmem-test.c b/tests/ivshmem-test.c
> index e184c67..e118377 100644
> --- a/tests/ivshmem-test.c
> +++ b/tests/ivshmem-test.c
> @@ -110,19 +110,18 @@ static void setup_vm_cmd(IVState *s, const char *cmd, bool msix)
>      s->pcibus = qpci_init_pc();
>      s->dev = get_device(s->pcibus);
>
> -    /* FIXME: other bar order fails, mappings changes */
> -    s->mem_base = qpci_iomap(s->dev, 2, &barsize);
> -    g_assert_nonnull(s->mem_base);
> -    g_assert_cmpuint(barsize, ==, TMPSHMSIZE);
> -
> -    if (msix) {
> -        qpci_msix_enable(s->dev);
> -    }
> -
>      s->reg_base = qpci_iomap(s->dev, 0, &barsize);
>      g_assert_nonnull(s->reg_base);
>      g_assert_cmpuint(barsize, ==, 256);
>
> +    if (msix) {
> +        qpci_msix_enable(s->dev);
> +    }
> +
> +    s->mem_base = qpci_iomap(s->dev, 2, &barsize);
> +    g_assert_nonnull(s->mem_base);
> +    g_assert_cmpuint(barsize, ==, TMPSHMSIZE);
> +
>      qpci_device_enable(s->dev);
>  }
>
> diff --git a/tests/libqos/pci-pc.c b/tests/libqos/pci-pc.c
> index 08167c0..77f15e5 100644
> --- a/tests/libqos/pci-pc.c
> +++ b/tests/libqos/pci-pc.c
> @@ -184,7 +184,9 @@ static void *qpci_pc_iomap(QPCIBus *bus, QPCIDevice *dev, int barno, uint64_t *s
>      if (io_type == PCI_BASE_ADDRESS_SPACE_IO) {
>          uint16_t loc;
>
> -        g_assert((s->pci_iohole_alloc + size) <= s->pci_iohole_size);
> +        g_assert(QEMU_ALIGN_UP(s->pci_iohole_alloc, size) + size
> +                 <= s->pci_iohole_size);
> +        s->pci_iohole_alloc = QEMU_ALIGN_UP(s->pci_iohole_alloc, size);
>          loc = s->pci_iohole_start + s->pci_iohole_alloc;
>          s->pci_iohole_alloc += size;
>
> @@ -194,7 +196,9 @@ static void *qpci_pc_iomap(QPCIBus *bus, QPCIDevice *dev, int barno, uint64_t *s
>      } else {
>          uint64_t loc;
>
> -        g_assert((s->pci_hole_alloc + size) <= s->pci_hole_size);
> +        g_assert(QEMU_ALIGN_UP(s->pci_hole_alloc, size) + size
> +                 <= s->pci_hole_size);
> +        s->pci_hole_alloc = QEMU_ALIGN_UP(s->pci_hole_alloc, size);
>          loc = s->pci_hole_start + s->pci_hole_alloc;
>          s->pci_hole_alloc += size;
>
> --
> 2.4.3
>
>



-- 
Marc-André Lureau

* Re: [Qemu-devel] [PATCH 05/38] ivshmem-test: Improve test case /ivshmem/single
  2016-02-29 18:40 ` [Qemu-devel] [PATCH 05/38] ivshmem-test: Improve test case /ivshmem/single Markus Armbruster
@ 2016-03-01 11:06   ` Marc-André Lureau
  0 siblings, 0 replies; 118+ messages in thread
From: Marc-André Lureau @ 2016-03-01 11:06 UTC (permalink / raw)
  To: Markus Armbruster
  Cc: Paolo Bonzini, cam, Claudio Fontana, QEMU, David Marchand

On Mon, Feb 29, 2016 at 7:40 PM, Markus Armbruster <armbru@redhat.com> wrote:
> Test state of registers after reset.
>
> Test reading Interrupt Status clears it.
>
> Test (invalid) read of Doorbell.
>
> Add more comments.
>
> Signed-off-by: Markus Armbruster <armbru@redhat.com>

Reviewed-by: Marc-André Lureau <marcandre.lureau@redhat.com>


> ---
>  tests/ivshmem-test.c | 23 ++++++++++++++++-------
>  1 file changed, 16 insertions(+), 7 deletions(-)
>
> diff --git a/tests/ivshmem-test.c b/tests/ivshmem-test.c
> index e118377..ba4d9f1 100644
> --- a/tests/ivshmem-test.c
> +++ b/tests/ivshmem-test.c
> @@ -143,32 +143,41 @@ static void test_ivshmem_single(void)
>      setup_vm(&state);
>      s = &state;
>
> -    /* valid io */
> -    out_reg(s, INTRMASK, 0);
> -    in_reg(s, INTRSTATUS);
> -    in_reg(s, IVPOSITION);
> +    /* initial state of readable registers */
> +    g_assert_cmpuint(in_reg(s, INTRMASK), ==, 0);
> +    g_assert_cmpuint(in_reg(s, INTRSTATUS), ==, 0);
> +    g_assert_cmpuint(in_reg(s, IVPOSITION), ==, 0);
>
> +    /* trigger interrupt via registers */
>      out_reg(s, INTRMASK, 0xffffffff);
>      g_assert_cmpuint(in_reg(s, INTRMASK), ==, 0xffffffff);
>      out_reg(s, INTRSTATUS, 1);
> -    /* XXX: intercept IRQ, not seen in resp */
> +    /* check interrupt status */
>      g_assert_cmpuint(in_reg(s, INTRSTATUS), ==, 1);
> +    /* reading clears */
> +    g_assert_cmpuint(in_reg(s, INTRSTATUS), ==, 0);
> +    /* TODO intercept actual interrupt (needs qtest work) */
>
> -    /* invalid io */
> +    /* invalid register access */
>      out_reg(s, IVPOSITION, 1);
> +    in_reg(s, DOORBELL);
> +
> +    /* ring the (non-functional) doorbell */
>      out_reg(s, DOORBELL, 8 << 16);
>
> +    /* write shared memory */
>      for (i = 0; i < G_N_ELEMENTS(data); i++) {
>          data[i] = i;
>      }
>      qtest_memwrite(s->qtest, (uintptr_t)s->mem_base, data, sizeof(data));
>
> +    /* verify write */
>      for (i = 0; i < G_N_ELEMENTS(data); i++) {
>          g_assert_cmpuint(((uint32_t *)tmpshmem)[i], ==, i);
>      }
>
> +    /* read it back and verify read */
>      memset(data, 0, sizeof(data));
> -
>      qtest_memread(s->qtest, (uintptr_t)s->mem_base, data, sizeof(data));
>      for (i = 0; i < G_N_ELEMENTS(data); i++) {
>          g_assert_cmpuint(data[i], ==, i);
> --
> 2.4.3
>
>



-- 
Marc-André Lureau

* Re: [Qemu-devel] [PATCH 06/38] ivshmem-test: Clean up wait for devices to become operational
  2016-02-29 18:40 ` [Qemu-devel] [PATCH 06/38] ivshmem-test: Clean up wait for devices to become operational Markus Armbruster
@ 2016-03-01 11:10   ` Marc-André Lureau
  0 siblings, 0 replies; 118+ messages in thread
From: Marc-André Lureau @ 2016-03-01 11:10 UTC (permalink / raw)
  To: Markus Armbruster
  Cc: Paolo Bonzini, cam, Claudio Fontana, QEMU, David Marchand

On Mon, Feb 29, 2016 at 7:40 PM, Markus Armbruster <armbru@redhat.com> wrote:
> test_ivshmem_server() waits until the first byte in BAR 2 contains the
> 0x42 we put into shared memory.  Works because the byte reads zero
> until the device maps the shared memory gotten from the server.
>
> Check the IVPosition register instead: it's initially -1, and becomes
> non-negative right when the device maps the shared memory, so no
> change, just cleaner, because it's what guest software is supposed to
> do.
>
> Signed-off-by: Markus Armbruster <armbru@redhat.com>

Reviewed-by: Marc-André Lureau <marcandre.lureau@redhat.com>

> ---
>  tests/ivshmem-test.c | 10 ++++------
>  1 file changed, 4 insertions(+), 6 deletions(-)
>
> diff --git a/tests/ivshmem-test.c b/tests/ivshmem-test.c
> index ba4d9f1..f40c3497 100644
> --- a/tests/ivshmem-test.c
> +++ b/tests/ivshmem-test.c
> @@ -301,7 +301,6 @@ static void test_ivshmem_server(bool msi)
>      int nvectors = 2;
>      guint64 end_time = g_get_monotonic_time() + 5 * G_TIME_SPAN_SECOND;
>
> -    memset(tmpshmem, 0x42, TMPSHMSIZE);
>      ret = ivshmem_server_init(&server, tmpserver, tmpshm,
>                                TMPSHMSIZE, nvectors,
>                                g_test_verbose());
> @@ -315,9 +314,9 @@ static void test_ivshmem_server(bool msi)
>      setup_vm_with_server(&state2, nvectors, msi);
>      s2 = &state2;
>
> +    /* check state before server sends stuff */
>      g_assert_cmpuint(in_reg(s1, IVPOSITION), ==, 0xffffffff);
>      g_assert_cmpuint(in_reg(s2, IVPOSITION), ==, 0xffffffff);
> -
>      g_assert_cmpuint(qtest_readb(s1->qtest, (uintptr_t)s1->mem_base), ==, 0x00);
>
>      thread.server = &server;
> @@ -326,12 +325,11 @@ static void test_ivshmem_server(bool msi)
>      thread.thread = g_thread_new("ivshmem-server", server_thread, &thread);
>      g_assert(thread.thread != NULL);
>
> -    /* waiting until mapping is done */
> +    /* waiting for devices to become operational */
>      while (g_get_monotonic_time() < end_time) {
>          g_usleep(1000);
> -
> -        if (qtest_readb(s1->qtest, (uintptr_t)s1->mem_base) == 0x42 &&
> -            qtest_readb(s2->qtest, (uintptr_t)s2->mem_base) == 0x42) {
> +        if ((int)in_reg(s1, IVPOSITION) >= 0 &&
> +            (int)in_reg(s2, IVPOSITION) >= 0) {
>              break;
>          }
>      }
> --
> 2.4.3
>
>



-- 
Marc-André Lureau

* Re: [Qemu-devel] [PATCH 07/38] ivshmem-test: Improve test cases /ivshmem/server-*
  2016-02-29 18:40 ` [Qemu-devel] [PATCH 07/38] ivshmem-test: Improve test cases /ivshmem/server-* Markus Armbruster
@ 2016-03-01 11:13   ` Marc-André Lureau
  0 siblings, 0 replies; 118+ messages in thread
From: Marc-André Lureau @ 2016-03-01 11:13 UTC (permalink / raw)
  To: Markus Armbruster
  Cc: Paolo Bonzini, cam, Claudio Fontana, QEMU, David Marchand

On Mon, Feb 29, 2016 at 7:40 PM, Markus Armbruster <armbru@redhat.com> wrote:
> Document missing test: behavior with MSI-X present but not enabled.
>
> For MSI-X, we test and clear the interrupt pending bit before testing
> the interrupt.  For INTx, we only clear.  Change to test and clear for
> consistency.
>
> Test MSI-X vector 1 in addition to vector 0.
>
> Improve comments.
>
> Signed-off-by: Markus Armbruster <armbru@redhat.com>

Reviewed-by: Marc-André Lureau <marcandre.lureau@redhat.com>
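
The Doorbell writes in this test (vm1 << 16, and vm2 << 16 | 1) put
the destination peer ID in the upper 16 bits and the vector in the
lower 16 bits.  A small sketch of that encoding; the helper names
doorbell_value(), doorbell_peer() and doorbell_vector() are made up
for illustration and are not part of the test or the device code.

```c
#include <assert.h>
#include <stdint.h>

/* Compose a Doorbell register value: peer ID in bits 16..31, vector in 0..15 */
static uint32_t doorbell_value(uint32_t peer_id, uint32_t vector)
{
    assert(peer_id <= 0xffff && vector <= 0xffff);
    return (peer_id << 16) | vector;
}

/* Extract the destination peer ID from a Doorbell value */
static uint32_t doorbell_peer(uint32_t value)
{
    return value >> 16;
}

/* Extract the vector number from a Doorbell value */
static uint32_t doorbell_vector(uint32_t value)
{
    return value & 0xffff;
}
```

For example, doorbell_value(vm2, 1) is the value the test writes to
ping vm2 on vector 1.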


> ---
>  tests/ivshmem-test.c | 17 ++++++++++-------
>  1 file changed, 10 insertions(+), 7 deletions(-)
>
> diff --git a/tests/ivshmem-test.c b/tests/ivshmem-test.c
> index f40c3497..c1dd7bb 100644
> --- a/tests/ivshmem-test.c
> +++ b/tests/ivshmem-test.c
> @@ -339,18 +339,21 @@ static void test_ivshmem_server(bool msi)
>      vm2 = in_reg(s2, IVPOSITION);
>      g_assert_cmpuint(vm1, !=, vm2);
>
> +    /* check number of MSI-X vectors */
>      global_qtest = s1->qtest;
>      if (msi) {
>          ret = qpci_msix_table_size(s1->dev);
>          g_assert_cmpuint(ret, ==, nvectors);
>      }
>
> -    /* ping vm2 -> vm1 */
> +    /* TODO test behavior before MSI-X is enabled */
> +
> +    /* ping vm2 -> vm1 on vector 0 */
>      if (msi) {
>          ret = qpci_msix_pending(s1->dev, 0);
>          g_assert_cmpuint(ret, ==, 0);
>      } else {
> -        out_reg(s1, INTRSTATUS, 0);
> +        g_assert_cmpuint(in_reg(s1, INTRSTATUS), ==, 0);
>      }
>      out_reg(s2, DOORBELL, vm1 << 16);
>      do {
> @@ -359,18 +362,18 @@ static void test_ivshmem_server(bool msi)
>      } while (ret == 0 && g_get_monotonic_time() < end_time);
>      g_assert_cmpuint(ret, !=, 0);
>
> -    /* ping vm1 -> vm2 */
> +    /* ping vm1 -> vm2 on vector 1 */
>      global_qtest = s2->qtest;
>      if (msi) {
> -        ret = qpci_msix_pending(s2->dev, 0);
> +        ret = qpci_msix_pending(s2->dev, 1);
>          g_assert_cmpuint(ret, ==, 0);
>      } else {
> -        out_reg(s2, INTRSTATUS, 0);
> +        g_assert_cmpuint(in_reg(s2, INTRSTATUS), ==, 0);
>      }
> -    out_reg(s1, DOORBELL, vm2 << 16);
> +    out_reg(s1, DOORBELL, vm2 << 16 | 1);
>      do {
>          g_usleep(10000);
> -        ret = msi ? qpci_msix_pending(s2->dev, 0) : in_reg(s2, INTRSTATUS);
> +        ret = msi ? qpci_msix_pending(s2->dev, 1) : in_reg(s2, INTRSTATUS);
>      } while (ret == 0 && g_get_monotonic_time() < end_time);
>      g_assert_cmpuint(ret, !=, 0);
>
> --
> 2.4.3
>
>



-- 
Marc-André Lureau

* Re: [Qemu-devel] [PATCH 08/38] ivshmem: Rewrite specification document
  2016-02-29 18:40 ` [Qemu-devel] [PATCH 08/38] ivshmem: Rewrite specification document Markus Armbruster
@ 2016-03-01 11:25   ` Marc-André Lureau
  2016-03-01 15:46   ` Eric Blake
  1 sibling, 0 replies; 118+ messages in thread
From: Marc-André Lureau @ 2016-03-01 11:25 UTC (permalink / raw)
  To: Markus Armbruster
  Cc: Paolo Bonzini, cam, Claudio Fontana, QEMU, David Marchand

On Mon, Feb 29, 2016 at 7:40 PM, Markus Armbruster <armbru@redhat.com> wrote:
> This started as an attempt to update ivshmem_device_spec.txt for
> clarity, accuracy and completeness while working on its code, and
> quickly became a full rewrite.  Since the diff would be useless
> anyway, I'm using the opportunity to rename the file to
> ivshmem-spec.txt.
>
> I tried hard to ensure the new text contradicts neither the old text
> nor the code.  If the new text contradicts the old text but not the
> code, it's probably a bug in the old text.  If the new text
> contradicts both, it's probably a bug in the new text.
>
> Signed-off-by: Markus Armbruster <armbru@redhat.com>

Reviewed-by: Marc-André Lureau <marcandre.lureau@redhat.com>


> ---
>  docs/specs/ivshmem-spec.txt        | 244 +++++++++++++++++++++++++++++++++++++
>  docs/specs/ivshmem_device_spec.txt | 161 ------------------------
>  2 files changed, 244 insertions(+), 161 deletions(-)
>  create mode 100644 docs/specs/ivshmem-spec.txt
>  delete mode 100644 docs/specs/ivshmem_device_spec.txt
>
> diff --git a/docs/specs/ivshmem-spec.txt b/docs/specs/ivshmem-spec.txt
> new file mode 100644
> index 0000000..0835ba1
> --- /dev/null
> +++ b/docs/specs/ivshmem-spec.txt
> @@ -0,0 +1,244 @@
> += Device Specification for Inter-VM shared memory device =
> +
> +The Inter-VM shared memory device (ivshmem) is designed to share a
> +memory region between multiple QEMU processes running different guests
> +and the host.  In order for all guests to be able to pick up the
> +shared memory area, it is modeled by QEMU as a PCI device exposing
> +said memory to the guest as a PCI BAR.
> +
> +The device can use a shared memory object on the host directly, or it
> +can obtain one from an ivshmem server.
> +
> +In the latter case, the device can additionally interrupt its peers, and
> +get interrupted by its peers.
> +
> +
> +== Configuring the ivshmem PCI device ==
> +
> +There are two basic configurations:
> +
> +- Just shared memory: -device ivshmem,shm=NAME,...
> +
> +  This uses shared memory object NAME.
> +
> +- Shared memory plus interrupts: -device ivshmem,chardev=CHR,vectors=N,...
> +
> +  An ivshmem server must already be running on the host.  The device
> +  connects to the server's UNIX domain socket via character device
> +  CHR.
> +
> +  Each peer gets assigned a unique ID by the server.  IDs must be
> +  between 0 and 65535.
> +
> +  Interrupts are message-signaled by default (MSI-X).  With msi=off
> +  the device has no MSI-X capability, and uses legacy INTx instead.
> +  vectors=N configures the number of vectors to use.
> +
> +For more details on ivshmem device properties, see The QEMU Emulator
> +User Documentation (qemu-doc.*).
> +
> +
> +== The ivshmem PCI device's guest interface ==
> +
> +The device has vendor ID 1af4, device ID 1110, revision 0.
> +
> +=== PCI BARs ===
> +
> +The ivshmem PCI device has two or three BARs:
> +
> +- BAR0 holds device registers (256 Byte MMIO)
> +- BAR1 holds MSI-X table and PBA (only when using MSI-X)
> +- BAR2 maps the shared memory object
> +
> +There are two ways to use this device:
> +
> +- If you only need the shared memory part, BAR2 suffices.  This way,
> +  you have access to the shared memory in the guest and can use it as
> +  you see fit.  Memnic, for example, uses ivshmem this way from guest
> +  user space (see http://dpdk.org/browse/memnic).
> +
> +- If you additionally need the capability for peers to interrupt each
> +  other, you need BAR0 and, if using MSI-X, BAR1.  You will most
> +  likely want to write a kernel driver to handle interrupts.  Requires
> +  the device to be configured for interrupts, obviously.
> +
> +If the device is configured for interrupts, BAR2 is initially invalid.
> +It becomes safely accessible only after the ivshmem server provided
> +the shared memory.  Guest software should wait for the IVPosition
> +register (described below) to become non-negative before accessing
> +BAR2.
> +
> +The device cannot tell guest software whether it is
> +configured for interrupts.
> +
> +=== PCI device registers ===
> +
> +BAR 0 contains the following registers:
> +
> +    Offset  Size  Access      On reset  Function
> +        0     4   read/write        0   Interrupt Mask
> +                                        bit 0: peer interrupt
> +                                        bit 1..31: reserved
> +        4     4   read/write        0   Interrupt Status
> +                                        bit 0: peer interrupt
> +                                        bit 1..31: reserved
> +        8     4   read-only   0 or -1   IVPosition
> +       12     4   write-only      N/A   Doorbell
> +                                        bit 0..15: vector
> +                                        bit 16..31: peer ID
> +       16   240   none            N/A   reserved
> +
> +Software should only access the registers as specified in column
> +"Access".  Reserved bits should be ignored on read, and preserved on
> +write.
> +
> +Interrupt Status and Mask Register together control the legacy INTx
> +interrupt when the device has no MSI-X capability: INTx is asserted
> +when the bit-wise AND of Status and Mask is non-zero and the device
> +has no MSI-X capability.  Interrupt Status Register bit 0 becomes 1
> +when an interrupt request from a peer is received.  Reading the
> +register clears it.
> +
> +IVPosition Register: if the device is not configured for interrupts,
> +this is zero.  Else, it's -1 for a short while after reset, then
> +changes to the device's ID (between 0 and 65535).
> +
> +There is no good way for software to find out whether the device is
> +configured for interrupts.  A positive IVPosition means interrupts,
> +but zero could be either.  The initial -1 cannot be reliably observed.
> +
> +Doorbell Register: writing this register requests to interrupt a peer.
> +The written value's high 16 bits are the ID of the peer to interrupt,
> +and its low 16 bits select an interrupt vector.
> +
> +If the device is not configured for interrupts, the write is ignored.
> +
> +If the interrupt hasn't completed setup, the write is ignored.  The
> +device cannot tell guest software whether setup is
> +complete.  Interrupts can regress to this state on migration.
> +
> +If the peer with the requested ID isn't connected, or it has fewer
> +interrupt vectors connected, the write is ignored.  The device
> +cannot tell guest software which peers are connected, or how many
> +interrupt vectors are connected.
> +
> +If the peer doesn't use MSI-X, its Interrupt Status register is set to
> +1.  This asserts INTx unless masked by the Interrupt Mask register.
> +The device cannot communicate the interrupt vector to guest
> +software then.
> +
> +If the peer uses MSI-X, the interrupt for this vector becomes pending.
> +There is no way for software to clear the pending bit, and a polling
> +mode of operation is therefore impossible with MSI-X.
> +
> +With multiple MSI-X vectors, different vectors can be used to indicate
> +different events have occurred.  The semantics of interrupt vectors
> +are left to the application.
> +
> +
> +== Interrupt infrastructure ==
> +
> +When configured for interrupts, the peers share eventfd objects in
> +addition to shared memory.  The shared resources are managed by an
> +ivshmem server.
> +
> +=== The ivshmem server ===
> +
> +The server listens on a UNIX domain socket.
> +
> +For each new client that connects to the server, the server
> +- picks an ID,
> +- creates eventfd file descriptors for the interrupt vectors,
> +- sends the ID and the file descriptor for the shared memory to the
> +  new client,
> +- sends connect notifications for the new client to the other clients
> +  (these contain file descriptors for sending interrupts),
> +- sends connect notifications for the other clients to the new client,
> +  and
> +- sends interrupt setup messages to the new client (these contain file
> +  descriptors for receiving interrupts).
> +
> +When a client disconnects from the server, the server sends disconnect
> +notifications to the other clients.
> +
> +The next section describes the protocol in detail.
> +
> +If the server terminates without sending disconnect notifications for
> +its connected clients, the clients can elect to continue.  They can
> +communicate with each other normally, but won't receive disconnect
> +notification on disconnect, and no new clients can connect.  There is
> +no way for the clients to connect to a restarted server.  The
> +device cannot tell guest software whether the server is
> +still up.
> +
> +Example server code is in contrib/ivshmem-server/.  Not to be used in
> +production.  It assumes all clients use the same number of interrupt
> +vectors.
> +
> +A standalone client is in contrib/ivshmem-client/.  It can be useful
> +for debugging.
> +
> +=== The ivshmem Client-Server Protocol ===
> +
> +An ivshmem device configured for interrupts connects to an ivshmem
> +server.  This section details the protocol between the two.
> +
> +The connection is one-way: the server sends messages to the client.
> +Each message consists of a single 8 byte little-endian signed number,
> +and may be accompanied by a file descriptor via SCM_RIGHTS.  Both
> +client and server close the connection on error.
> +
> +On connect, the server sends the following messages in order:
> +
> +1. The protocol version number, currently zero.  The client should
> +   close the connection on receipt of versions it can't handle.
> +
> +2. The client's ID.  This is unique among all clients of this server.
> +   IDs must be between 0 and 65535, because the Doorbell register
> +   provides only 16 bits for them.
> +
> +3. The number -1, accompanied by the file descriptor for the shared
> +   memory.
> +
> +4. Connect notifications for existing other clients, if any.  This is
> +   a peer ID (number between 0 and 65535 other than the client's ID),
> +   repeated N times.  Each repetition is accompanied by one file
> +   descriptor.  These are for interrupting the peer with that ID using
> +   vector 0,..,N-1, in order.  If the client is configured for fewer
> +   vectors, it closes the extra file descriptors.  If it is configured
> +   for more, the extra vectors remain unconnected.
> +
> +5. Interrupt setup.  This is the client's own ID, repeated N times.
> +   Each repetition is accompanied by one file descriptor.  These are
> +   for receiving interrupts from peers using vector 0,..,N-1, in
> +   order.  If the client is configured for fewer vectors, it closes
> +   the extra file descriptors.  If it is configured for more, the
> +   extra vectors remain unconnected.
> +
> +From then on, the server sends these kinds of messages:
> +
> +6. Connection / disconnection notification.  This is a peer ID.
> +
> +  - If the number comes with a file descriptor, it's a connection
> +    notification, exactly like in step 4.
> +
> +  - Else, it's a disconnection notification for the peer with that ID.
> +
> +Known bugs:
> +
> +* The protocol changed incompatibly in QEMU 2.5.  Before, messages
> +  were native endian long, and there was no version number.
> +
> +* The protocol is poorly designed.
> +
> +=== The ivshmem Client-Client Protocol ===
> +
> +An ivshmem device configured for interrupts receives eventfd file
> +descriptors for interrupting peers and getting interrupted by peers
> +from the server, as explained in the previous section.
> +
> +To interrupt a peer, the device writes the 8-byte integer 1 in native
> +byte order to the respective file descriptor.
> +
> +To receive an interrupt, the device reads and discards as many 8-byte
> +integers as it can.
> diff --git a/docs/specs/ivshmem_device_spec.txt b/docs/specs/ivshmem_device_spec.txt
> deleted file mode 100644
> index d318d65..0000000
> --- a/docs/specs/ivshmem_device_spec.txt
> +++ /dev/null
> @@ -1,161 +0,0 @@
> -
> -Device Specification for Inter-VM shared memory device
> -------------------------------------------------------
> -
> -The Inter-VM shared memory device is designed to share a memory region (created
> -on the host via the POSIX shared memory API) between multiple QEMU processes
> -running different guests. In order for all guests to be able to pick up the
> -shared memory area, it is modeled by QEMU as a PCI device exposing said memory
> -to the guest as a PCI BAR.
> -The memory region does not belong to any guest, but is a POSIX memory object on
> -the host. The host can access this shared memory if needed.
> -
> -The device also provides an optional communication mechanism between guests
> -sharing the same memory object. More details about that in the section 'Guest to
> -guest communication' section.
> -
> -
> -The Inter-VM PCI device
> ------------------------
> -
> -From the VM point of view, the ivshmem PCI device supports three BARs.
> -
> -- BAR0 is a 1 Kbyte MMIO region to support registers and interrupts when MSI is
> -  not used.
> -- BAR1 is used for MSI-X when it is enabled in the device.
> -- BAR2 is used to access the shared memory object.
> -
> -It is your choice how to use the device but you must choose between two
> -behaviors :
> -
> -- basically, if you only need the shared memory part, you will map BAR2.
> -  This way, you have access to the shared memory in guest and can use it as you
> -  see fit (memnic, for example, uses it in userland
> -  http://dpdk.org/browse/memnic).
> -
> -- BAR0 and BAR1 are used to implement an optional communication mechanism
> -  through interrupts in the guests. If you need an event mechanism between the
> -  guests accessing the shared memory, you will most likely want to write a
> -  kernel driver that will handle interrupts. See details in the section 'Guest
> -  to guest communication' section.
> -
> -The behavior is chosen when starting your QEMU processes:
> -- no communication mechanism needed, the first QEMU to start creates the shared
> -  memory on the host, subsequent QEMU processes will use it.
> -
> -- communication mechanism needed, an ivshmem server must be started before any
> -  QEMU processes, then each QEMU process connects to the server unix socket.
> -
> -For more details on the QEMU ivshmem parameters, see qemu-doc documentation.
> -
> -
> -Guest to guest communication
> -----------------------------
> -
> -This section details the communication mechanism between the guests accessing
> -the ivhsmem shared memory.
> -
> -*ivshmem server*
> -
> -This server code is available in qemu.git/contrib/ivshmem-server.
> -
> -The server must be started on the host before any guest.
> -It creates a shared memory object then waits for clients to connect on a unix
> -socket. All the messages are little-endian int64_t integer.
> -
> -For each client (QEMU process) that connects to the server:
> -- the server sends a protocol version, if client does not support it, the client
> -  closes the communication,
> -- the server assigns an ID for this client and sends this ID to him as the first
> -  message,
> -- the server sends a fd to the shared memory object to this client,
> -- the server creates a new set of host eventfds associated to the new client and
> -  sends this set to all already connected clients,
> -- finally, the server sends all the eventfds sets for all clients to the new
> -  client.
> -
> -The server signals all clients when one of them disconnects.
> -
> -The client IDs are limited to 16 bits because of the current implementation (see
> -Doorbell register in 'PCI device registers' subsection). Hence only 65536
> -clients are supported.
> -
> -All the file descriptors (fd to the shared memory, eventfds for each client)
> -are passed to clients using SCM_RIGHTS over the server unix socket.
> -
> -Apart from the current ivshmem implementation in QEMU, an ivshmem client has
> -been provided in qemu.git/contrib/ivshmem-client for debug.
> -
> -*QEMU as an ivshmem client*
> -
> -At initialisation, when creating the ivshmem device, QEMU first receives a
> -protocol version and closes communication with server if it does not match.
> -Then, QEMU gets its ID from the server then makes it available through BAR0
> -IVPosition register for the VM to use (see 'PCI device registers' subsection).
> -QEMU then uses the fd to the shared memory to map it to BAR2.
> -eventfds for all other clients received from the server are stored to implement
> -BAR0 Doorbell register (see 'PCI device registers' subsection).
> -Finally, eventfds assigned to this QEMU process are used to send interrupts in
> -this VM.
> -
> -*PCI device registers*
> -
> -From the VM point of view, the ivshmem PCI device supports 4 registers of
> -32-bits each.
> -
> -enum ivshmem_registers {
> -    IntrMask = 0,
> -    IntrStatus = 4,
> -    IVPosition = 8,
> -    Doorbell = 12
> -};
> -
> -The first two registers are the interrupt mask and status registers.  Mask and
> -status are only used with pin-based interrupts.  They are unused with MSI
> -interrupts.
> -
> -Status Register: The status register is set to 1 when an interrupt occurs.
> -
> -Mask Register: The mask register is bitwise ANDed with the interrupt status
> -and the result will raise an interrupt if it is non-zero.  However, since 1 is
> -the only value the status will be set to, it is only the first bit of the mask
> -that has any effect.  Therefore interrupts can be masked by setting the first
> -bit to 0 and unmasked by setting the first bit to 1.
> -
> -IVPosition Register: The IVPosition register is read-only and reports the
> -guest's ID number.  The guest IDs are non-negative integers.  When using the
> -server, since the server is a separate process, the VM ID will only be set when
> -the device is ready (shared memory is received from the server and accessible
> -via the device).  If the device is not ready, the IVPosition will return -1.
> -Applications should ensure that they have a valid VM ID before accessing the
> -shared memory.
> -
> -Doorbell Register:  To interrupt another guest, a guest must write to the
> -Doorbell register.  The doorbell register is 32-bits, logically divided into
> -two 16-bit fields.  The high 16-bits are the guest ID to interrupt and the low
> -16-bits are the interrupt vector to trigger.  The semantics of the value
> -written to the doorbell depends on whether the device is using MSI or a regular
> -pin-based interrupt.  In short, MSI uses vectors while regular interrupts set
> -the status register.
> -
> -Regular Interrupts
> -
> -If regular interrupts are used (due to either a guest not supporting MSI or the
> -user specifying not to use them on startup) then the value written to the lower
> -16-bits of the Doorbell register results is arbitrary and will trigger an
> -interrupt in the destination guest.
> -
> -Message Signalled Interrupts
> -
> -An ivshmem device may support multiple MSI vectors.  If so, the lower 16-bits
> -written to the Doorbell register must be between 0 and the maximum number of
> -vectors the guest supports.  The lower 16 bits written to the doorbell is the
> -MSI vector that will be raised in the destination guest.  The number of MSI
> -vectors is configurable but it is set when the VM is started.
> -
> -The important thing to remember with MSI is that it is only a signal, no status
> -is set (since MSI interrupts are not shared).  All information other than the
> -interrupt itself should be communicated via the shared memory region.  Devices
> -supporting multiple MSI vectors can use different vectors to indicate different
> -events have occurred.  The semantics of interrupt vectors are left to the
> -user's discretion.
> --
> 2.4.3
>
>



-- 
Marc-André Lureau

* Re: [Qemu-devel] [PATCH 01/38] exec: Fix memory allocation when memory path names new file
  2016-02-29 18:40 ` [Qemu-devel] [PATCH 01/38] exec: Fix memory allocation when memory path names new file Markus Armbruster
@ 2016-03-01 11:35   ` Paolo Bonzini
  2016-03-01 11:58     ` Markus Armbruster
  2016-03-04 18:50     ` Markus Armbruster
  0 siblings, 2 replies; 118+ messages in thread
From: Paolo Bonzini @ 2016-03-01 11:35 UTC (permalink / raw)
  To: Markus Armbruster, qemu-devel
  Cc: mlureau, cam, claudio.fontana, david.marchand



On 29/02/2016 19:40, Markus Armbruster wrote:
> -    if (!stat(path, &st) && S_ISDIR(st.st_mode)) {
> +    ret = stat(path, &st);
> +    if (!ret && S_ISDIR(st.st_mode)) {
> +        /* path names a directory -> create a temporary file there */
>          /* Make name safe to use with mkstemp by replacing '/' with '_'. */
>          sanitized_name = g_strdup(memory_region_name(block->mr));
>          for (c = sanitized_name; *c != '\0'; c++) {
> @@ -1282,13 +1271,32 @@ static void *file_ram_alloc(RAMBlock *block,
>              unlink(filename);
>          }
>          g_free(filename);
> +    } else if (!ret) {
> +        /* path names an existing file -> use it */
> +        fd = open(path, O_RDWR);
>      } else {
> +        /* create a new file */
>          fd = open(path, O_RDWR | O_CREAT, 0644);
> +        unlink_on_error = true;
>      }

While at it, let's avoid TOCTTOU conditions:

    for (;;) {
        fd = open(path, O_RDWR);
        if (fd != -1) {
            break;
        }
        if (errno == ENOENT) {
            fd = open(path, O_RDWR | O_CREAT | O_EXCL, 0644);
            if (fd != -1) {
                unlink_on_error = true;
                break;
            }
        } else if (errno == EISDIR) {
            ... mkstemp ...
            if (fd != -1) {
                unlink_on_error = true;
                break;
            }
        }
        if (errno != EEXIST && errno != EINTR) {
            goto error;
        }
    }

and use fstatfs in gethugepagesize.
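
A minimal, self-contained sketch of both suggestions, outside of QEMU
and Linux-only.  It assumes the directory case (the mkstemp branch) is
handled elsewhere, so EISDIR is simply reported as an error here.  The
names open_or_create() and fs_block_size() are hypothetical and not
taken from exec.c.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stdbool.h>
#include <stddef.h>
#include <sys/vfs.h>
#include <unistd.h>

#ifndef HUGETLBFS_MAGIC
#define HUGETLBFS_MAGIC 0x958458f6
#endif

/*
 * Open @path read/write, creating it if it does not exist yet.
 * No stat() beforehand, so there is no check-then-use window.
 * *created is set when this call created the file, so the caller
 * knows whether to unlink it on a later error.
 * Returns the file descriptor, or -1 with errno set.
 */
static int open_or_create(const char *path, bool *created)
{
    assert(path != NULL && created != NULL);
    *created = false;

    for (;;) {
        int fd = open(path, O_RDWR);
        if (fd != -1) {
            return fd;
        }
        if (errno == ENOENT) {
            fd = open(path, O_RDWR | O_CREAT | O_EXCL, 0644);
            if (fd != -1) {
                *created = true;
                return fd;
            }
            if (errno == EEXIST) {
                continue;   /* somebody else created it: open it instead */
            }
        }
        if (errno != EINTR) {
            return -1;
        }
    }
}

/*
 * Query the file system behind an already open @fd instead of looking
 * up the path a second time.  On hugetlbfs the block size is the huge
 * page size.  Returns the block size, or -1 with errno set.
 */
static long fs_block_size(int fd, bool *is_hugetlbfs)
{
    struct statfs fs;
    int ret;

    do {
        ret = fstatfs(fd, &fs);
    } while (ret != 0 && errno == EINTR);
    if (ret != 0) {
        return -1;
    }
    *is_hugetlbfs = (fs.f_type == HUGETLBFS_MAGIC);
    return (long)fs.f_bsize;
}
```

Working on the descriptor returned by open_or_create() means the size
check and the later mmap() refer to the same file even if the path is
renamed or replaced in between.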

Paolo

* Re: [Qemu-devel] [PATCH 02/38] qemu-doc: Fix ivshmem huge page example
  2016-02-29 18:40 ` [Qemu-devel] [PATCH 02/38] qemu-doc: Fix ivshmem huge page example Markus Armbruster
  2016-03-01 10:51   ` Marc-André Lureau
@ 2016-03-01 11:35   ` Paolo Bonzini
  1 sibling, 0 replies; 118+ messages in thread
From: Paolo Bonzini @ 2016-03-01 11:35 UTC (permalink / raw)
  To: Markus Armbruster, qemu-devel
  Cc: mlureau, cam, claudio.fontana, david.marchand



On 29/02/2016 19:40, Markus Armbruster wrote:
> Option parameter "share" is missing.  Without it, you get a *private*
> mmap(), which defeats ivshmem's purpose pretty thoroughly ;)
> 
> While there, switch to the conventional mountpoint of hugetlbfs
> /dev/hugepages.
> 
> Signed-off-by: Markus Armbruster <armbru@redhat.com>
> ---
>  qemu-doc.texi | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/qemu-doc.texi b/qemu-doc.texi
> index bc9dd13..65f3b29 100644
> --- a/qemu-doc.texi
> +++ b/qemu-doc.texi
> @@ -1311,7 +1311,7 @@ Instead of specifying the <shm size> using POSIX shm, you may specify
>  a memory backend that has hugepage support:
>  
>  @example
> -qemu-system-i386 -object memory-backend-file,size=1G,mem-path=/mnt/hugepages/my-shmem-file,id=mb1
> +qemu-system-i386 -object memory-backend-file,size=1G,mem-path=/dev/hugepages/my-shmem-file,share,id=mb1
>                   -device ivshmem,x-memdev=mb1
>  @end example
>  
> 

Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>

* Re: [Qemu-devel] [PATCH 03/38] event_notifier: Make event_notifier_init_fd() #ifdef CONFIG_EVENTFD
  2016-02-29 18:40 ` [Qemu-devel] [PATCH 03/38] event_notifier: Make event_notifier_init_fd() #ifdef CONFIG_EVENTFD Markus Armbruster
  2016-03-01 10:57   ` Marc-André Lureau
@ 2016-03-01 11:35   ` Paolo Bonzini
  1 sibling, 0 replies; 118+ messages in thread
From: Paolo Bonzini @ 2016-03-01 11:35 UTC (permalink / raw)
  To: Markus Armbruster, qemu-devel
  Cc: mlureau, cam, claudio.fontana, david.marchand



On 29/02/2016 19:40, Markus Armbruster wrote:
> Event notifiers are designed for eventfd(2).  They can fall back to
> pipes, but according to Paolo, event_notifier_init_fd() really
> requires the real thing, and should therefore be under #ifdef
> CONFIG_EVENTFD.  Do that.
> 
> Its only user is ivshmem, which is currently CONFIG_POSIX.  Narrow it
> to CONFIG_EVENTFD.
> 
> Cc: Paolo Bonzini <pbonzini@redhat.com>
> Signed-off-by: Markus Armbruster <armbru@redhat.com>
> ---
>  default-configs/pci.mak     | 2 +-
>  util/event_notifier-posix.c | 6 ++++++
>  2 files changed, 7 insertions(+), 1 deletion(-)
> 
> diff --git a/default-configs/pci.mak b/default-configs/pci.mak
> index 4fa9a28..9c8bc68 100644
> --- a/default-configs/pci.mak
> +++ b/default-configs/pci.mak
> @@ -36,5 +36,5 @@ CONFIG_SDHCI=y
>  CONFIG_EDU=y
>  CONFIG_VGA=y
>  CONFIG_VGA_PCI=y
> -CONFIG_IVSHMEM=$(CONFIG_POSIX)
> +CONFIG_IVSHMEM=$(CONFIG_EVENTFD)
>  CONFIG_ROCKER=y
> diff --git a/util/event_notifier-posix.c b/util/event_notifier-posix.c
> index 2e30e74..c9657a6 100644
> --- a/util/event_notifier-posix.c
> +++ b/util/event_notifier-posix.c
> @@ -20,11 +20,17 @@
>  #include <sys/eventfd.h>
>  #endif
>  
> +#ifdef CONFIG_EVENTFD
> +/*
> + * Initialize @e with existing file descriptor @fd.
> + * @fd must be a genuine eventfd object, emulation with pipe won't do.
> + */
>  void event_notifier_init_fd(EventNotifier *e, int fd)
>  {
>      e->rfd = fd;
>      e->wfd = fd;
>  }
> +#endif
>  
>  int event_notifier_init(EventNotifier *e, int active)
>  {
> 

Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
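
For context, a small Linux-only sketch of why one descriptor can serve
as both rfd and wfd when it is a genuine eventfd: writes add to a
kernel counter and a read returns and resets it, all on the same file
descriptor, which a pipe cannot offer.  The function names below are
hypothetical and this is not QEMU's EventNotifier code.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <sys/eventfd.h>
#include <unistd.h>

/* Signal the eventfd: adds 1 to its counter.  Returns 0 or -1. */
static int notify_peer(int efd)
{
    uint64_t one = 1;

    assert(efd >= 0);
    return write(efd, &one, sizeof(one)) == (ssize_t)sizeof(one) ? 0 : -1;
}

/*
 * Consume pending notifications on a non-blocking eventfd.
 * Returns the number of notifications coalesced into the counter,
 * 0 if none were pending, or -1 on error.
 */
static int64_t drain_notifications(int efd)
{
    uint64_t count;
    ssize_t n = read(efd, &count, sizeof(count));

    if (n == (ssize_t)sizeof(count)) {
        return (int64_t)count;
    }
    return (n == -1 && errno == EAGAIN) ? 0 : -1;
}
```

With a descriptor from eventfd(0, EFD_NONBLOCK), two notify_peer()
calls followed by one drain_notifications() call yield 2, and a second
drain yields 0.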

* Re: [Qemu-devel] [PATCH 29/38] ivshmem: Implement shm=... with a memory backend
  2016-02-29 18:40 ` [Qemu-devel] [PATCH 29/38] ivshmem: Implement shm=... with a memory backend Markus Armbruster
@ 2016-03-01 11:37   ` Paolo Bonzini
  2016-03-01 12:08     ` Markus Armbruster
  0 siblings, 1 reply; 118+ messages in thread
From: Paolo Bonzini @ 2016-03-01 11:37 UTC (permalink / raw)
  To: Markus Armbruster, qemu-devel
  Cc: mlureau, cam, claudio.fontana, david.marchand



On 29/02/2016 19:40, Markus Armbruster wrote:
> ivshmem has its very own code to create and map shared memory.
> Replace that with an implicitly created memory backend.  Reduces the
> number of ways we create BAR 2 from three to two.
> 
> Signed-off-by: Markus Armbruster <armbru@redhat.com>

Much appreciated, but do not use user_creatable_add_opts.  Instead,
create the object with object_initialize, object_property_set_* and
user_creatable_complete.  After the object_initialize, add it with
object_property_add_child *under the ivshmem device itself*, giving it a
name like "internal-shm-backend".

This matches what virtio-blk dataplane used to do for x-dataplane (now
removed).

Thanks,

Paolo

> +static HostMemoryBackend *desugar_shm(const char *shm, size_t size)
> +{
> +    /* TODO avoid the detour through QemuOpts */
> +    static int counter;
> +    QemuOpts *opts = qemu_opts_create(qemu_find_opts("object"),
> +                                      NULL, 0, &error_abort);
> +    char *path;
> +    Object *obj;
> +
> +    qemu_opt_set(opts, "qom-type", "memory-backend-file",
> +                 &error_abort);
> +    /* FIXME need a better way to make up an ID */
> +    qemu_opts_set_id(opts, g_strdup_printf("ivshmem-backend-%d", counter++));
> +    path = g_strdup_printf("/dev/shm/%s", shm);
> +    qemu_opt_set(opts, "mem-path", path, &error_abort);
> +    qemu_opt_set_number(opts, "size", size, &error_abort);
> +    qemu_opt_set_bool(opts, "share", true, &error_abort);
> +    g_free(path);
> +
> +    obj = user_creatable_add_opts(opts, &error_abort);
> +    qemu_opts_del(opts);
> +
> +    user_creatable_complete(obj, &error_abort);
> +
> +    return MEMORY_BACKEND(obj);
> +}
> +
>  static void pci_ivshmem_realize(PCIDevice *dev, Error **errp)
>  {
>      IVShmemState *s = IVSHMEM(dev);
> @@ -911,6 +914,10 @@ static void pci_ivshmem_realize(PCIDevice *dev, Error **errp)
>          attr |= PCI_BASE_ADDRESS_MEM_TYPE_64;
>      }
>  
> +    if (s->shmobj) {
> +        s->hostmem = desugar_shm(s->shmobj, s->ivshmem_size);
> +    }
> +
>      if (s->hostmem != NULL) {
>          MemoryRegion *mr;
>  
> @@ -921,7 +928,7 @@ static void pci_ivshmem_realize(PCIDevice *dev, Error **errp)
>          vmstate_register_ram(mr, DEVICE(s));
>          memory_region_add_subregion(&s->bar, 0, mr);
>          pci_register_bar(PCI_DEVICE(s), 2, attr, &s->bar);
> -    } else if (s->server_chr != NULL) {
> +    } else {
>          IVSHMEM_DPRINTF("using shared memory server (socket = %s)\n",
>                          s->server_chr->filename);
>  
> @@ -948,36 +955,6 @@ static void pci_ivshmem_realize(PCIDevice *dev, Error **errp)
>              error_setg(errp, "failed to initialize interrupts");
>              return;
>          }
> -    } else {
> -        /* just map the file immediately, we're not using a server */
> -        int fd;
> -
> -        IVSHMEM_DPRINTF("using shm_open (shm object = %s)\n", s->shmobj);
> -
> -        /* try opening with O_EXCL and if it succeeds zero the memory
> -         * by truncating to 0 */
> -        if ((fd = shm_open(s->shmobj, O_CREAT|O_RDWR|O_EXCL,
> -                        S_IRWXU|S_IRWXG|S_IRWXO)) > 0) {
> -           /* truncate file to length PCI device's memory */
> -            if (ftruncate(fd, s->ivshmem_size) != 0) {
> -                error_report("could not truncate shared file");
> -            }
> -
> -        } else if ((fd = shm_open(s->shmobj, O_CREAT|O_RDWR,
> -                        S_IRWXU|S_IRWXG|S_IRWXO)) < 0) {
> -            error_setg(errp, "could not open shared file");
> -            return;
> -        }
> -
> -        if (check_shm_size(s, fd, errp) == -1) {
> -            return;
> -        }
> -
> -        create_shared_memory_BAR(s, fd, attr, &err);
> -        if (err) {
> -            error_propagate(errp, err);
> -            return;
> -        }
>      }
>  
>      if (s->role_val == IVSHMEM_PEER) {
> 

* Re: [Qemu-devel] [PATCH 30/38] ivshmem: Simplify memory regions for BAR 2 (shared memory)
  2016-02-29 18:40 ` [Qemu-devel] [PATCH 30/38] ivshmem: Simplify memory regions for BAR 2 (shared memory) Markus Armbruster
@ 2016-03-01 11:42   ` Paolo Bonzini
  2016-03-01 12:14     ` Markus Armbruster
  2016-03-01 11:46   ` Paolo Bonzini
  1 sibling, 1 reply; 118+ messages in thread
From: Paolo Bonzini @ 2016-03-01 11:42 UTC (permalink / raw)
  To: Markus Armbruster, qemu-devel
  Cc: mlureau, cam, claudio.fontana, david.marchand



On 29/02/2016 19:40, Markus Armbruster wrote:
> ivshmem_realize() puts the shared memory region in a container region.
> Used to be necessary to permit delayed mapping of the shared memory.
> Now we don't do that anymore, the container is redundant.  Drop it.

Can you explain why we don't do that anymore to someone who hasn't read
patches 4 to 28? :-)  Is it patch 23?

Paolo

* Re: [Qemu-devel] [PATCH 30/38] ivshmem: Simplify memory regions for BAR 2 (shared memory)
  2016-02-29 18:40 ` [Qemu-devel] [PATCH 30/38] ivshmem: Simplify memory regions for BAR 2 (shared memory) Markus Armbruster
  2016-03-01 11:42   ` Paolo Bonzini
@ 2016-03-01 11:46   ` Paolo Bonzini
  2016-03-01 14:06     ` Markus Armbruster
  1 sibling, 1 reply; 118+ messages in thread
From: Paolo Bonzini @ 2016-03-01 11:46 UTC (permalink / raw)
  To: Markus Armbruster, qemu-devel
  Cc: mlureau, cam, claudio.fontana, david.marchand



On 29/02/2016 19:40, Markus Armbruster wrote:
> -    memory_region_init_ram_ptr(&s->ivshmem, OBJECT(s),
> +    s->ivshmem_bar2 = g_new(MemoryRegion, 1);
> +    memory_region_init_ram_ptr(s->ivshmem_bar2, OBJECT(s),
>                                 "ivshmem.bar2", s->ivshmem_size, ptr);
> -    qemu_set_ram_fd(s->ivshmem.ram_addr, fd);
> -    vmstate_register_ram(&s->ivshmem, DEVICE(s));
> -    memory_region_add_subregion(&s->bar, 0, &s->ivshmem);
> +    qemu_set_ram_fd(s->ivshmem_bar2->ram_addr, fd);

This is missing an instance_finalize callback to do

    if (s->ivshmem_bar2) {
        object_unparent(s->ivshmem_bar2);
        g_free(s->ivshmem_bar2);
    }

or, alternatively just use a flag (e.g. s->bar2_mapped) and allocate it
directly in the IVShmemState struct.

Paolo

* Re: [Qemu-devel] [PATCH 01/38] exec: Fix memory allocation when memory path names new file
  2016-03-01 11:35   ` Paolo Bonzini
@ 2016-03-01 11:58     ` Markus Armbruster
  2016-03-04 18:50     ` Markus Armbruster
  1 sibling, 0 replies; 118+ messages in thread
From: Markus Armbruster @ 2016-03-01 11:58 UTC (permalink / raw)
  To: Paolo Bonzini; +Cc: claudio.fontana, cam, mlureau, qemu-devel, david.marchand

Paolo Bonzini <pbonzini@redhat.com> writes:

> On 29/02/2016 19:40, Markus Armbruster wrote:
>> -    if (!stat(path, &st) && S_ISDIR(st.st_mode)) {
>> +    ret = stat(path, &st);
>> +    if (!ret && S_ISDIR(st.st_mode)) {
>> +        /* path names a directory -> create a temporary file there */
>>          /* Make name safe to use with mkstemp by replacing '/' with '_'. */
>>          sanitized_name = g_strdup(memory_region_name(block->mr));
>>          for (c = sanitized_name; *c != '\0'; c++) {
>> @@ -1282,13 +1271,32 @@ static void *file_ram_alloc(RAMBlock *block,
>>              unlink(filename);
>>          }
>>          g_free(filename);
>> +    } else if (!ret) {
>> +        /* path names an existing file -> use it */
>> +        fd = open(path, O_RDWR);
>>      } else {
>> +        /* create a new file */
>>          fd = open(path, O_RDWR | O_CREAT, 0644);
>> +        unlink_on_error = true;
>>      }
>
> While at it, let's avoid TOCTTOU conditions:
>
>     for (;;) {
>         fd = open(path, O_RDWR);
>         if (fd != -1) {
>             break;
>         }
>         if (errno == ENOENT) {
>             fd = open(path, O_RDWR | O_CREAT | O_EXCL, 0644);
>             if (fd != -1) {
>                 unlink_on_error = true;
>                 break;
>             }
>         } else if (errno == EISDIR) {
>             ... mkstemp ...
>             if (fd != -1) {
>                 unlink_on_error = true;
>                 break;
>             }
>         }
>         if (errno != EEXIST && errno != EINTR) {
>             goto error;
>         }
>     }
>
> and use fstatfs in gethugepagesize.

Good point, will do!
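
For anyone following along, the retry loop quoted above can be sketched
as a self-contained helper.  This is only an illustration, not the code
that will go into exec.c: the name open_or_create() and the "created"
out-parameter are made up here, and the directory (mkstemp) branch is
left out entirely.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stdbool.h>
#include <stdio.h>
#include <unistd.h>

/*
 * Hypothetical helper, not QEMU code: open @path read-write, creating
 * it if it does not exist yet.  Sets *created when this call created
 * the file.  Returns a file descriptor, or -1 with errno set.
 */
static int open_or_create(const char *path, bool *created)
{
    int fd;

    assert(path && created);
    *created = false;
    for (;;) {
        fd = open(path, O_RDWR);
        if (fd != -1) {
            return fd;              /* path names an existing file */
        }
        if (errno == ENOENT) {
            /* O_EXCL fails with EEXIST if someone else created it first */
            fd = open(path, O_RDWR | O_CREAT | O_EXCL, 0644);
            if (fd != -1) {
                *created = true;
                return fd;
            }
        }
        if (errno != EEXIST && errno != EINTR) {
            return -1;              /* real error (directories included) */
        }
        /* lost the creation race or got interrupted: try again */
    }
}
```

The point of the loop is that no decision is based on an earlier
stat(): each open() either succeeds or reports why it failed, and a
lost race simply sends us around again.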

^ permalink raw reply	[flat|nested] 118+ messages in thread

* Re: [Qemu-devel] [PATCH 03/38] event_notifier: Make event_notifier_init_fd() #ifdef CONFIG_EVENTFD
  2016-03-01 10:57   ` Marc-André Lureau
@ 2016-03-01 12:00     ` Markus Armbruster
  2016-03-01 12:05       ` Paolo Bonzini
  0 siblings, 1 reply; 118+ messages in thread
From: Markus Armbruster @ 2016-03-01 12:00 UTC (permalink / raw)
  To: Marc-André Lureau
  Cc: Paolo Bonzini, cam, Claudio Fontana, QEMU, David Marchand

Marc-André Lureau <marcandre.lureau@gmail.com> writes:

> Hi
>
> On Mon, Feb 29, 2016 at 7:40 PM, Markus Armbruster <armbru@redhat.com> wrote:
>> Event notifiers are designed for eventfd(2).  They can fall back to
>> pipes, but according to Paolo, event_notifier_init_fd() really
>> requires the real thing, and should therefore be under #ifdef
>> CONFIG_EVENTFD.  Do that.
>>
>> Its only user is ivshmem, which is currently CONFIG_POSIX.  Narrow it
>> to CONFIG_EVENTFD.
>>
>> Cc: Paolo Bonzini <pbonzini@redhat.com>
>> Signed-off-by: Markus Armbruster <armbru@redhat.com>
>> ---
>>  default-configs/pci.mak     | 2 +-
>>  util/event_notifier-posix.c | 6 ++++++
>>  2 files changed, 7 insertions(+), 1 deletion(-)
>>
>> diff --git a/default-configs/pci.mak b/default-configs/pci.mak
>> index 4fa9a28..9c8bc68 100644
>> --- a/default-configs/pci.mak
>> +++ b/default-configs/pci.mak
>> @@ -36,5 +36,5 @@ CONFIG_SDHCI=y
>>  CONFIG_EDU=y
>>  CONFIG_VGA=y
>>  CONFIG_VGA_PCI=y
>> -CONFIG_IVSHMEM=$(CONFIG_POSIX)
>> +CONFIG_IVSHMEM=$(CONFIG_EVENTFD)
>
> This narrows ivshmem to eventfd os only. Eventually after the split,
> it is easier to bring back posix for ivshmem-plain,

Good point.

>                                                     but it's important
> to highlight this change.

Yes.  Any ideas on how to highlight it more?

[...]

^ permalink raw reply	[flat|nested] 118+ messages in thread

* Re: [Qemu-devel] [PATCH 03/38] event_notifier: Make event_notifier_init_fd() #ifdef CONFIG_EVENTFD
  2016-03-01 12:00     ` Markus Armbruster
@ 2016-03-01 12:05       ` Paolo Bonzini
  0 siblings, 0 replies; 118+ messages in thread
From: Paolo Bonzini @ 2016-03-01 12:05 UTC (permalink / raw)
  To: Markus Armbruster, Marc-André Lureau
  Cc: cam, Claudio Fontana, QEMU, David Marchand



On 01/03/2016 13:00, Markus Armbruster wrote:
> Marc-André Lureau <marcandre.lureau@gmail.com> writes:
> 
>> Hi
>>
>> On Mon, Feb 29, 2016 at 7:40 PM, Markus Armbruster <armbru@redhat.com> wrote:
>>> Event notifiers are designed for eventfd(2).  They can fall back to
>>> pipes, but according to Paolo, event_notifier_init_fd() really
>>> requires the real thing, and should therefore be under #ifdef
>>> CONFIG_EVENTFD.  Do that.
>>>
>>> Its only user is ivshmem, which is currently CONFIG_POSIX.  Narrow it
>>> to CONFIG_EVENTFD.
>>>
>>> Cc: Paolo Bonzini <pbonzini@redhat.com>
>>> Signed-off-by: Markus Armbruster <armbru@redhat.com>
>>> ---
>>>  default-configs/pci.mak     | 2 +-
>>>  util/event_notifier-posix.c | 6 ++++++
>>>  2 files changed, 7 insertions(+), 1 deletion(-)
>>>
>>> diff --git a/default-configs/pci.mak b/default-configs/pci.mak
>>> index 4fa9a28..9c8bc68 100644
>>> --- a/default-configs/pci.mak
>>> +++ b/default-configs/pci.mak
>>> @@ -36,5 +36,5 @@ CONFIG_SDHCI=y
>>>  CONFIG_EDU=y
>>>  CONFIG_VGA=y
>>>  CONFIG_VGA_PCI=y
>>> -CONFIG_IVSHMEM=$(CONFIG_POSIX)
>>> +CONFIG_IVSHMEM=$(CONFIG_EVENTFD)
>>
>> This narrows ivshmem to eventfd os only. Eventually after the split,
>> it is easier to bring back posix for ivshmem-plain,
> 
> Good point.
> 
>>                                                     but it's important
>> to highlight this change.
> 
> Yes.  Any ideas on how to highlight it more?

Release notes should do, under "Build dependencies".

Paolo

^ permalink raw reply	[flat|nested] 118+ messages in thread

* Re: [Qemu-devel] [PATCH 04/38] tests/libqos/pci-pc: Fix qpci_pc_iomap() to map BARs aligned
  2016-03-01 11:05   ` Marc-André Lureau
@ 2016-03-01 12:05     ` Markus Armbruster
  0 siblings, 0 replies; 118+ messages in thread
From: Markus Armbruster @ 2016-03-01 12:05 UTC (permalink / raw)
  To: Marc-André Lureau
  Cc: Paolo Bonzini, cam, Claudio Fontana, QEMU, David Marchand

Marc-André Lureau <marcandre.lureau@gmail.com> writes:

> On Mon, Feb 29, 2016 at 7:40 PM, Markus Armbruster <armbru@redhat.com> wrote:
>> qpci_pc_iomap() maps BARs one after the other, without padding.  This
>> is wrong.  PCI Local Bus Specification Revision 3.0, 6.2.5.1. Address
>> Maps: "all address spaces used are a power of two in size and are
>> naturally aligned".  That's because the size of a BAR is given by the
>> number of address bits the device decodes, and the BAR needs to be
>> mapped at a multiple of that size to ensure the address decoding
>> works.
>>
>> Fix qpci_pc_iomap() accordingly.  This takes care of a FIXME in
>> ivshmem-test.
>>
>> Signed-off-by: Markus Armbruster <armbru@redhat.com>
>
> Reviewed-by: Marc-André Lureau <marcandre.lureau@redhat.com>
>
> Neat, thanks for fixing my fixme ;)

You're welcome :)

Of course, I found it the hard way regardless.  I experimented with new
tests, and suddenly *nothing* worked anymore.  WTF?!?  After "some"
debugging, I noticed a BAR mapped at a funny address...  what happens if
I align it properly?  Everything works, that's what happens.  And then I
remembered your FIXME.

Thanks!
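
For the archives, the alignment rule from the commit message boils down
to rounding the next free address up to a multiple of the BAR size.  A
minimal sketch, assuming the size is a power of two as the PCI spec
requires; bar_align_up() is a made-up name, not the actual libqos code:

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helper: round @addr up to the next multiple of @size.
 * @size must be a power of two, as PCI BAR sizes are.
 */
static uint64_t bar_align_up(uint64_t addr, uint64_t size)
{
    assert(size != 0 && (size & (size - 1)) == 0);
    return (addr + size - 1) & ~(size - 1);
}
```

With an allocation cursor at 0x100 (say, right behind a 256 byte
register BAR), a 1 MiB BAR would land at 0x100000 rather than at 0x100.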

^ permalink raw reply	[flat|nested] 118+ messages in thread

* Re: [Qemu-devel] [PATCH 29/38] ivshmem: Implement shm=... with a memory backend
  2016-03-01 11:37   ` Paolo Bonzini
@ 2016-03-01 12:08     ` Markus Armbruster
  0 siblings, 0 replies; 118+ messages in thread
From: Markus Armbruster @ 2016-03-01 12:08 UTC (permalink / raw)
  To: Paolo Bonzini; +Cc: claudio.fontana, cam, mlureau, qemu-devel, david.marchand

Paolo Bonzini <pbonzini@redhat.com> writes:

> On 29/02/2016 19:40, Markus Armbruster wrote:
>> ivshmem has its very own code to create and map shared memory.
>> Replace that with an implicitly created memory backend.  Reduces the
>> number of ways we create BAR 2 from three to two.
>> 
>> Signed-off-by: Markus Armbruster <armbru@redhat.com>
>
> Very appreciated, but do not use user_creatable_add_opts.  Instead,
> create the object with object_initialize, object_property_set_* and
> user_creatable_complete.  After the object_initialize, add it with
> object_property_add_child *under the ivshmem device itself*, giving it a
> name like "internal-shm-backend".

Will do.

> This matches what virtio-blk dataplane used to do for x-dataplane (now
> removed).

Note to self: commit a616fb7, try to steal that code.

Thanks!

^ permalink raw reply	[flat|nested] 118+ messages in thread

* Re: [Qemu-devel] [PATCH 30/38] ivshmem: Simplify memory regions for BAR 2 (shared memory)
  2016-03-01 11:42   ` Paolo Bonzini
@ 2016-03-01 12:14     ` Markus Armbruster
  2016-03-01 12:17       ` Paolo Bonzini
  0 siblings, 1 reply; 118+ messages in thread
From: Markus Armbruster @ 2016-03-01 12:14 UTC (permalink / raw)
  To: Paolo Bonzini; +Cc: claudio.fontana, cam, mlureau, qemu-devel, david.marchand

Paolo Bonzini <pbonzini@redhat.com> writes:

> On 29/02/2016 19:40, Markus Armbruster wrote:
>> ivshmem_realize() puts the shared memory region in a container region.
>> Used to be necessary to permit delayed mapping of the shared memory.
>> Now we don't do that anymore, the container is redundant.  Drop it.
>
> Can you explain why we don't do that anymore to someone who hasn't read
> patches 4 to 28? :-)  Is it patch 23?

Yes, but you also need 24 to complete the job.

Commit message could perhaps explain it like this:

    ivshmem_realize() puts the shared memory region in a container
    region.  Used to be necessary to permit delayed mapping of the
    shared memory.  However, we recently moved to synchronous mapping,
    in "ivshmem: Receive shared memory synchronously in realize()" and
    the commit following it.  The container is redundant since then.
    Drop it.

Better?

^ permalink raw reply	[flat|nested] 118+ messages in thread

* Re: [Qemu-devel] [PATCH 30/38] ivshmem: Simplify memory regions for BAR 2 (shared memory)
  2016-03-01 12:14     ` Markus Armbruster
@ 2016-03-01 12:17       ` Paolo Bonzini
  0 siblings, 0 replies; 118+ messages in thread
From: Paolo Bonzini @ 2016-03-01 12:17 UTC (permalink / raw)
  To: Markus Armbruster
  Cc: claudio.fontana, cam, mlureau, qemu-devel, david.marchand



On 01/03/2016 13:14, Markus Armbruster wrote:
>> > Can you explain why we don't do that anymore to someone who hasn't read
>> > patches 4 to 28? :-)  Is it patch 23?
> Yes, but you also need 24 to complete the job.
> 
> Commit message could perhaps explain it like this:
> 
>     ivshmem_realize() puts the shared memory region in a container
>     region.  Used to be necessary to permit delayed mapping of the
>     shared memory.  However, we recently moved to synchronous mapping,
>     in "ivshmem: Receive shared memory synchronously in realize()" and
>     the commit following it.  The container is redundant since then.
>     Drop it.
> 
> Better?

Yes, thanks!

Paolo

^ permalink raw reply	[flat|nested] 118+ messages in thread

* Re: [Qemu-devel] [PATCH 09/38] ivshmem: Add missing newlines to debug printfs
  2016-02-29 18:40 ` [Qemu-devel] [PATCH 09/38] ivshmem: Add missing newlines to debug printfs Markus Armbruster
@ 2016-03-01 12:20   ` Marc-André Lureau
  0 siblings, 0 replies; 118+ messages in thread
From: Marc-André Lureau @ 2016-03-01 12:20 UTC (permalink / raw)
  To: Markus Armbruster
  Cc: Paolo Bonzini, cam, Claudio Fontana, QEMU, David Marchand

On Mon, Feb 29, 2016 at 7:40 PM, Markus Armbruster <armbru@redhat.com> wrote:
> Signed-off-by: Markus Armbruster <armbru@redhat.com>
> ---

Reviewed-by: Marc-André Lureau <marcandre.lureau@redhat.com>


>  hw/misc/ivshmem.c | 6 +++---
>  1 file changed, 3 insertions(+), 3 deletions(-)
>
> diff --git a/hw/misc/ivshmem.c b/hw/misc/ivshmem.c
> index 48b7a34..b74b02c 100644
> --- a/hw/misc/ivshmem.c
> +++ b/hw/misc/ivshmem.c
> @@ -568,10 +568,10 @@ static void setup_interrupt(IVShmemState *s, int vector)
>      IVSHMEM_DPRINTF("setting up interrupt for vector: %d\n", vector);
>
>      if (!with_irqfd) {
> -        IVSHMEM_DPRINTF("with eventfd");
> +        IVSHMEM_DPRINTF("with eventfd\n");
>          watch_vector_notifier(s, n, vector);
>      } else if (msix_enabled(pdev)) {
> -        IVSHMEM_DPRINTF("with irqfd");
> +        IVSHMEM_DPRINTF("with irqfd\n");
>          if (ivshmem_add_kvm_msi_virq(s, vector) < 0) {
>              return;
>          }
> @@ -582,7 +582,7 @@ static void setup_interrupt(IVShmemState *s, int vector)
>          }
>      } else {
>          /* it will be delayed until msix is enabled, in write_config */
> -        IVSHMEM_DPRINTF("with irqfd, delayed until msix enabled");
> +        IVSHMEM_DPRINTF("with irqfd, delayed until msix enabled\n");
>      }
>  }
>
> --
> 2.4.3
>
>



-- 
Marc-André Lureau

^ permalink raw reply	[flat|nested] 118+ messages in thread

* Re: [Qemu-devel] [PATCH 10/38] ivshmem: Compile debug prints unconditionally to prevent bit-rot
  2016-02-29 18:40 ` [Qemu-devel] [PATCH 10/38] ivshmem: Compile debug prints unconditionally to prevent bit-rot Markus Armbruster
@ 2016-03-01 12:22   ` Marc-André Lureau
  2016-03-01 15:49     ` Eric Blake
  0 siblings, 1 reply; 118+ messages in thread
From: Marc-André Lureau @ 2016-03-01 12:22 UTC (permalink / raw)
  To: Markus Armbruster
  Cc: Paolo Bonzini, cam, Claudio Fontana, QEMU, David Marchand

On Mon, Feb 29, 2016 at 7:40 PM, Markus Armbruster <armbru@redhat.com> wrote:
> Signed-off-by: Markus Armbruster <armbru@redhat.com>
> ---

Reviewed-by: Marc-André Lureau <marcandre.lureau@redhat.com>

(apparently, there are other places in qemu where this conversion could be done)

>  hw/misc/ivshmem.c | 14 +++++++-------
>  1 file changed, 7 insertions(+), 7 deletions(-)
>
> diff --git a/hw/misc/ivshmem.c b/hw/misc/ivshmem.c
> index b74b02c..395f357 100644
> --- a/hw/misc/ivshmem.c
> +++ b/hw/misc/ivshmem.c
> @@ -48,13 +48,13 @@
>
>  #define IVSHMEM_REG_BAR_SIZE 0x100
>
> -//#define DEBUG_IVSHMEM
> -#ifdef DEBUG_IVSHMEM
> -#define IVSHMEM_DPRINTF(fmt, ...)        \
> -    do {printf("IVSHMEM: " fmt, ## __VA_ARGS__); } while (0)
> -#else
> -#define IVSHMEM_DPRINTF(fmt, ...)
> -#endif
> +#define IVSHMEM_DEBUG 0
> +#define IVSHMEM_DPRINTF(fmt, ...)                       \
> +    do {                                                \
> +        if (IVSHMEM_DEBUG) {                            \
> +            printf("IVSHMEM: " fmt, ## __VA_ARGS__);    \
> +        }                                               \
> +    } while (0)
>
>  #define TYPE_IVSHMEM "ivshmem"
>  #define IVSHMEM(obj) \
> --
> 2.4.3
>
>



-- 
Marc-André Lureau

^ permalink raw reply	[flat|nested] 118+ messages in thread

* Re: [Qemu-devel] [PATCH 11/38] ivshmem: Clean up after commit 9940c32
  2016-02-29 18:40 ` [Qemu-devel] [PATCH 11/38] ivshmem: Clean up after commit 9940c32 Markus Armbruster
@ 2016-03-01 12:47   ` Marc-André Lureau
  0 siblings, 0 replies; 118+ messages in thread
From: Marc-André Lureau @ 2016-03-01 12:47 UTC (permalink / raw)
  To: Markus Armbruster
  Cc: Paolo Bonzini, cam, Claudio Fontana, QEMU, David Marchand

On Mon, Feb 29, 2016 at 7:40 PM, Markus Armbruster <armbru@redhat.com> wrote:
> IVShmemState member eventfd_chr is useless since commit 9940c32.  Drop
> it.
>
> Signed-off-by: Markus Armbruster <armbru@redhat.com>

oops indeed,
Reviewed-by: Marc-André Lureau <marcandre.lureau@redhat.com>


> ---
>  hw/misc/ivshmem.c | 12 ------------
>  1 file changed, 12 deletions(-)
>
> diff --git a/hw/misc/ivshmem.c b/hw/misc/ivshmem.c
> index 395f357..b087dc3 100644
> --- a/hw/misc/ivshmem.c
> +++ b/hw/misc/ivshmem.c
> @@ -79,7 +79,6 @@ typedef struct IVShmemState {
>      uint32_t intrmask;
>      uint32_t intrstatus;
>
> -    CharDriverState **eventfd_chr;
>      CharDriverState *server_chr;
>      Fifo8 incoming_fifo;
>      MemoryRegion ivshmem_mmio;
> @@ -941,8 +940,6 @@ static void pci_ivshmem_realize(PCIDevice *dev, Error **errp)
>
>          pci_register_bar(dev, 2, attr, &s->bar);
>
> -        s->eventfd_chr = g_malloc0(s->vectors * sizeof(CharDriverState *));
> -
>          qemu_chr_add_handlers(s->server_chr, ivshmem_can_receive,
>                                ivshmem_check_version, ivshmem_event, s);
>      } else {
> @@ -1004,15 +1001,6 @@ static void pci_ivshmem_exit(PCIDevice *dev)
>          memory_region_del_subregion(&s->bar, &s->ivshmem);
>      }
>
> -    if (s->eventfd_chr) {
> -        for (i = 0; i < s->vectors; i++) {
> -            if (s->eventfd_chr[i]) {
> -                qemu_chr_free(s->eventfd_chr[i]);
> -            }
> -        }
> -        g_free(s->eventfd_chr);
> -    }
> -
>      if (s->peers) {
>          for (i = 0; i < s->nb_peers; i++) {
>              close_peer_eventfds(s, i);
> --
> 2.4.3
>
>



-- 
Marc-André Lureau

^ permalink raw reply	[flat|nested] 118+ messages in thread

* Re: [Qemu-devel] [PATCH 12/38] ivshmem: Drop ivshmem_event() stub
  2016-02-29 18:40 ` [Qemu-devel] [PATCH 12/38] ivshmem: Drop ivshmem_event() stub Markus Armbruster
@ 2016-03-01 12:48   ` Marc-André Lureau
  0 siblings, 0 replies; 118+ messages in thread
From: Marc-André Lureau @ 2016-03-01 12:48 UTC (permalink / raw)
  To: Markus Armbruster
  Cc: Paolo Bonzini, cam, Claudio Fontana, QEMU, David Marchand

On Mon, Feb 29, 2016 at 7:40 PM, Markus Armbruster <armbru@redhat.com> wrote:
> Signed-off-by: Markus Armbruster <armbru@redhat.com>
> ---

Reviewed-by: Marc-André Lureau <marcandre.lureau@redhat.com>


>  hw/misc/ivshmem.c | 9 ++-------
>  1 file changed, 2 insertions(+), 7 deletions(-)
>
> diff --git a/hw/misc/ivshmem.c b/hw/misc/ivshmem.c
> index b087dc3..7119a07 100644
> --- a/hw/misc/ivshmem.c
> +++ b/hw/misc/ivshmem.c
> @@ -267,11 +267,6 @@ static int ivshmem_can_receive(void * opaque)
>      return sizeof(int64_t);
>  }
>
> -static void ivshmem_event(void *opaque, int event)
> -{
> -    IVSHMEM_DPRINTF("ivshmem_event %d\n", event);
> -}
> -
>  static void ivshmem_vector_notify(void *opaque)
>  {
>      MSIVector *entry = opaque;
> @@ -719,7 +714,7 @@ static void ivshmem_check_version(void *opaque, const uint8_t * buf, int size)
>
>      IVSHMEM_DPRINTF("version check ok, switch to real chardev handler\n");
>      qemu_chr_add_handlers(s->server_chr, ivshmem_can_receive, ivshmem_read,
> -                          ivshmem_event, s);
> +                          NULL, s);
>  }
>
>  /* Select the MSI-X vectors used by device.
> @@ -941,7 +936,7 @@ static void pci_ivshmem_realize(PCIDevice *dev, Error **errp)
>          pci_register_bar(dev, 2, attr, &s->bar);
>
>          qemu_chr_add_handlers(s->server_chr, ivshmem_can_receive,
> -                              ivshmem_check_version, ivshmem_event, s);
> +                              ivshmem_check_version, NULL, s);
>      } else {
>          /* just map the file immediately, we're not using a server */
>          int fd;
> --
> 2.4.3
>
>



-- 
Marc-André Lureau

^ permalink raw reply	[flat|nested] 118+ messages in thread

* Re: [Qemu-devel] [PATCH 30/38] ivshmem: Simplify memory regions for BAR 2 (shared memory)
  2016-03-01 11:46   ` Paolo Bonzini
@ 2016-03-01 14:06     ` Markus Armbruster
  2016-03-01 15:15       ` Paolo Bonzini
  0 siblings, 1 reply; 118+ messages in thread
From: Markus Armbruster @ 2016-03-01 14:06 UTC (permalink / raw)
  To: Paolo Bonzini; +Cc: claudio.fontana, cam, mlureau, qemu-devel, david.marchand

Paolo Bonzini <pbonzini@redhat.com> writes:

> On 29/02/2016 19:40, Markus Armbruster wrote:
>> -    memory_region_init_ram_ptr(&s->ivshmem, OBJECT(s),
>> +    s->ivshmem_bar2 = g_new(MemoryRegion, 1);
>> +    memory_region_init_ram_ptr(s->ivshmem_bar2, OBJECT(s),
>>                                 "ivshmem.bar2", s->ivshmem_size, ptr);
>> -    qemu_set_ram_fd(s->ivshmem.ram_addr, fd);
>> -    vmstate_register_ram(&s->ivshmem, DEVICE(s));
>> -    memory_region_add_subregion(&s->bar, 0, &s->ivshmem);
>> +    qemu_set_ram_fd(s->ivshmem_bar2->ram_addr, fd);
>
> This is missing an instance_finalize callback to do
>
>     if (s->ivshmem_bar2) {
>         object_unparent(s->ivshmem_bar2);
>         g_free(s->ivshmem_bar2);
>     }

Since it's allocated within ivshmem_realize(), I guess I could free it
in ivshmem_exit().

> or, alternatively just use a flag (e.g. s->bar2_mapped) and allocate it
> directly in the IVShmemState struct.

I'll see what comes out nicer.  Thanks!

^ permalink raw reply	[flat|nested] 118+ messages in thread

* Re: [Qemu-devel] [PATCH 30/38] ivshmem: Simplify memory regions for BAR 2 (shared memory)
  2016-03-01 14:06     ` Markus Armbruster
@ 2016-03-01 15:15       ` Paolo Bonzini
  2016-03-02 11:06         ` Markus Armbruster
  0 siblings, 1 reply; 118+ messages in thread
From: Paolo Bonzini @ 2016-03-01 15:15 UTC (permalink / raw)
  To: Markus Armbruster
  Cc: claudio.fontana, cam, mlureau, qemu-devel, david.marchand



On 01/03/2016 15:06, Markus Armbruster wrote:
> Paolo Bonzini <pbonzini@redhat.com> writes:
> 
>> On 29/02/2016 19:40, Markus Armbruster wrote:
>>> -    memory_region_init_ram_ptr(&s->ivshmem, OBJECT(s),
>>> +    s->ivshmem_bar2 = g_new(MemoryRegion, 1);
>>> +    memory_region_init_ram_ptr(s->ivshmem_bar2, OBJECT(s),
>>>                                 "ivshmem.bar2", s->ivshmem_size, ptr);
>>> -    qemu_set_ram_fd(s->ivshmem.ram_addr, fd);
>>> -    vmstate_register_ram(&s->ivshmem, DEVICE(s));
>>> -    memory_region_add_subregion(&s->bar, 0, &s->ivshmem);
>>> +    qemu_set_ram_fd(s->ivshmem_bar2->ram_addr, fd);
>>
>> This is missing an instance_finalize callback to do
>>
>>     if (s->ivshmem_bar2) {
>>         object_unparent(s->ivshmem_bar2);
>>         g_free(s->ivshmem_bar2);
>>     }
> 
> Since it's allocated within ivshmem_realize(), I guess I could free it
> in ivshmem_exit().

Unfortunately you can't, because the guest might be using it at the time
of hot-unplug (e.g. DMAing from disk to it).  Unrealize is the place
where you hide stuff, and in this case the PCI core does it for you;
finalize is the place where you free stuff.

This is mentioned (though not really in these terms) in docs/memory.txt.
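
To illustrate the split with a toy model (plain C, nothing QEMU-specific;
ToyDev and all the toy_*() names are invented for this example, and a
bare reference count stands in for whatever keeps the region alive):

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Toy stand-in for a device owning a RAM region. */
typedef struct ToyDev {
    int refs;       /* device's own reference plus in-flight users */
    bool visible;   /* can new accesses still find the region? */
    char *ram;
} ToyDev;

static int toy_finalized;   /* counts finalize calls, for the demo */

static ToyDev *toy_realize(size_t size)
{
    ToyDev *d = calloc(1, sizeof(*d));

    assert(d);
    d->ram = calloc(1, size);
    assert(d->ram);
    d->refs = 1;
    d->visible = true;
    return d;
}

static void toy_ref(ToyDev *d)
{
    d->refs++;
}

static void toy_unref(ToyDev *d)
{
    if (--d->refs == 0) {
        /* "finalize": the last user is gone, now it is safe to free */
        free(d->ram);
        free(d);
        toy_finalized++;
    }
}

/* "unrealize": hide the region, but do not free it here */
static void toy_unrealize(ToyDev *d)
{
    d->visible = false;
    toy_unref(d);
}
```

If a "DMA" took a reference before toy_unrealize(), it can still write
to d->ram afterwards; the memory only goes away when that last
reference is dropped.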

Paolo

>> or, alternatively just use a flag (e.g. s->bar2_mapped) and allocate it
>> directly in the IVShmemState struct.
> 
> I'll see what comes out nicer.  Thanks!
> 

^ permalink raw reply	[flat|nested] 118+ messages in thread

* Re: [Qemu-devel] [PATCH 13/38] ivshmem: Don't destroy the chardev on version mismatch
  2016-02-29 18:40 ` [Qemu-devel] [PATCH 13/38] ivshmem: Don't destroy the chardev on version mismatch Markus Armbruster
@ 2016-03-01 15:39   ` Marc-André Lureau
  2016-03-02  9:52     ` Markus Armbruster
  0 siblings, 1 reply; 118+ messages in thread
From: Marc-André Lureau @ 2016-03-01 15:39 UTC (permalink / raw)
  To: Markus Armbruster
  Cc: Paolo Bonzini, cam, Claudio Fontana, QEMU, David Marchand

Hi

On Mon, Feb 29, 2016 at 7:40 PM, Markus Armbruster <armbru@redhat.com> wrote:
> Yes, the chardev is commonly useless after we read a bad version from
> it, but destroying it is inappropriate anyway: the user created it, so
> the user should be able to hold on to it as long as he likes.  We
> don't destroy it on other errors.  Screwed up in commit 5105b1d.
>
> Stop reading instead.
>
> Also note QEMU's behavior in ivshmem-spec.txt.
>
> Signed-off-by: Markus Armbruster <armbru@redhat.com>

Interestingly enough, it seems both smartcard "passthru" and usb
redirect do chr_delete(). It would be a nice follow-up to standardize
this there too.

Reviewed-by: Marc-André Lureau <marcandre.lureau@redhat.com>



> ---
>  docs/specs/ivshmem-spec.txt | 3 +++
>  hw/misc/ivshmem.c           | 3 +--
>  2 files changed, 4 insertions(+), 2 deletions(-)
>
> diff --git a/docs/specs/ivshmem-spec.txt b/docs/specs/ivshmem-spec.txt
> index 0835ba1..4fc6f37 100644
> --- a/docs/specs/ivshmem-spec.txt
> +++ b/docs/specs/ivshmem-spec.txt
> @@ -188,6 +188,9 @@ Each message consists of a single 8 byte little-endian signed number,
>  and may be accompanied by a file descriptor via SCM_RIGHTS.  Both
>  client and server close the connection on error.
>
> +Note: QEMU currently doesn't close the connection right on error, but
> +only when the character device is destroyed.
> +
>  On connect, the server sends the following messages in order:
>
>  1. The protocol version number, currently zero.  The client should
> diff --git a/hw/misc/ivshmem.c b/hw/misc/ivshmem.c
> index 7119a07..2850e8a 100644
> --- a/hw/misc/ivshmem.c
> +++ b/hw/misc/ivshmem.c
> @@ -707,8 +707,7 @@ static void ivshmem_check_version(void *opaque, const uint8_t * buf, int size)
>      if (tmp != -1 || version != IVSHMEM_PROTOCOL_VERSION) {
>          fprintf(stderr, "incompatible version, you are connecting to a ivshmem-"
>                  "server using a different protocol please check your setup\n");
> -        qemu_chr_delete(s->server_chr);
> -        s->server_chr = NULL;
> +        qemu_chr_add_handlers(s->server_chr, NULL, NULL, NULL, s);
>          return;
>      }
>
> --
> 2.4.3
>
>



-- 
Marc-André Lureau

^ permalink raw reply	[flat|nested] 118+ messages in thread

* Re: [Qemu-devel] [PATCH 08/38] ivshmem: Rewrite specification document
  2016-02-29 18:40 ` [Qemu-devel] [PATCH 08/38] ivshmem: Rewrite specification document Markus Armbruster
  2016-03-01 11:25   ` Marc-André Lureau
@ 2016-03-01 15:46   ` Eric Blake
  2016-03-02  9:50     ` Markus Armbruster
  1 sibling, 1 reply; 118+ messages in thread
From: Eric Blake @ 2016-03-01 15:46 UTC (permalink / raw)
  To: Markus Armbruster, qemu-devel
  Cc: mlureau, cam, claudio.fontana, david.marchand, pbonzini

On 02/29/2016 11:40 AM, Markus Armbruster wrote:
> This started as an attempt to update ivshmem_device_spec.txt for
> clarity, accuracy and completeness while working on its code, and
> quickly became a full rewrite.  Since the diff would be useless
> anyway, I'm using the opportunity to rename the file to
> ivshmem-spec.txt.
> 
> I tried hard to ensure the new text contradicts neither the old text
> nor the code.  If the new text contradicts the old text but not the
> code, it's probably a bug in the old text.  If the new text
> contradicts both, its probably a bug in the new text.
> 
> Signed-off-by: Markus Armbruster <armbru@redhat.com>
> ---

> +If the server terminates without sending disconnect notifications for
> +its connected clients, the clients can elect to continue.  They can
> +communicate with each other normally, but won't receive disconnect
> +notification on disconnect, and no new clients can connect.  There is
> +no way for the clients to connect to a restarted the server.  The

s/the server/server/

> +device is not capable to tell guest software whether the server is
> +still up.

Wow - lots of shortcomings in the server protocol.  Food for thought for
future improvements, but I'm happy with your approach of just
documenting pitfalls for now.

> +
> +Known bugs:
> +
> +* The protocol changed incompatibly in QEMU 2.5.  Before, messages
> +  were native endian long, and there was no version number.
> +
> +* The protocol is poorly designed.


-- 
Eric Blake   eblake redhat com    +1-919-301-3266
Libvirt virtualization library http://libvirt.org



^ permalink raw reply	[flat|nested] 118+ messages in thread

* Re: [Qemu-devel] [PATCH 14/38] ivshmem: Fix harmless misuse of Error
  2016-02-29 18:40 ` [Qemu-devel] [PATCH 14/38] ivshmem: Fix harmless misuse of Error Markus Armbruster
@ 2016-03-01 15:47   ` Marc-André Lureau
  0 siblings, 0 replies; 118+ messages in thread
From: Marc-André Lureau @ 2016-03-01 15:47 UTC (permalink / raw)
  To: Markus Armbruster
  Cc: Paolo Bonzini, cam, Claudio Fontana, QEMU, David Marchand

On Mon, Feb 29, 2016 at 7:40 PM, Markus Armbruster <armbru@redhat.com> wrote:
> We reuse errp after passing it host_memory_backend_get_memory().  If
> both host_memory_backend_get_memory() and the reuse set an error, the
> reuse will fail the assertion in error_setv().  Fortunately,
> host_memory_backend_get_memory() can't fail.
>
> Pass it &error_abort to make our assumption explicit, and to get the
> assertion failure in the right place should it become invalid.
>
> Signed-off-by: Markus Armbruster <armbru@redhat.com>
> ---

Reviewed-by: Marc-André Lureau <marcandre.lureau@redhat.com>



>  hw/misc/ivshmem.c | 7 ++++---
>  1 file changed, 4 insertions(+), 3 deletions(-)
>
> diff --git a/hw/misc/ivshmem.c b/hw/misc/ivshmem.c
> index 2850e8a..eb53d9a 100644
> --- a/hw/misc/ivshmem.c
> +++ b/hw/misc/ivshmem.c
> @@ -841,7 +841,7 @@ static void pci_ivshmem_realize(PCIDevice *dev, Error **errp)
>              g_warning("size argument ignored with hostmem");
>          }
>
> -        mr = host_memory_backend_get_memory(s->hostmem, errp);
> +        mr = host_memory_backend_get_memory(s->hostmem, &error_abort);
>          s->ivshmem_size = memory_region_size(mr);
>      } else if (s->sizearg == NULL) {
>          s->ivshmem_size = 4 << 20; /* 4 MB default */
> @@ -906,7 +906,8 @@ static void pci_ivshmem_realize(PCIDevice *dev, Error **errp)
>
>          IVSHMEM_DPRINTF("using hostmem\n");
>
> -        mr = host_memory_backend_get_memory(MEMORY_BACKEND(s->hostmem), errp);
> +        mr = host_memory_backend_get_memory(MEMORY_BACKEND(s->hostmem),
> +                                            &error_abort);
>          vmstate_register_ram(mr, DEVICE(s));
>          memory_region_add_subregion(&s->bar, 0, mr);
>          pci_register_bar(PCI_DEVICE(s), 2, attr, &s->bar);
> @@ -1131,7 +1132,7 @@ static void ivshmem_check_memdev_is_busy(Object *obj, const char *name,
>  {
>      MemoryRegion *mr;
>
> -    mr = host_memory_backend_get_memory(MEMORY_BACKEND(val), errp);
> +    mr = host_memory_backend_get_memory(MEMORY_BACKEND(val), &error_abort);
>      if (memory_region_is_mapped(mr)) {
>          char *path = object_get_canonical_path_component(val);
>          error_setg(errp, "can't use already busy memdev: %s", path);
> --
> 2.4.3
>
>



-- 
Marc-André Lureau

^ permalink raw reply	[flat|nested] 118+ messages in thread

* Re: [Qemu-devel] [PATCH 10/38] ivshmem: Compile debug prints unconditionally to prevent bit-rot
  2016-03-01 12:22   ` Marc-André Lureau
@ 2016-03-01 15:49     ` Eric Blake
  2016-03-02  9:51       ` Markus Armbruster
  0 siblings, 1 reply; 118+ messages in thread
From: Eric Blake @ 2016-03-01 15:49 UTC (permalink / raw)
  To: Marc-André Lureau, Markus Armbruster
  Cc: Paolo Bonzini, cam, Claudio Fontana, QEMU, David Marchand

On 03/01/2016 05:22 AM, Marc-André Lureau wrote:
> On Mon, Feb 29, 2016 at 7:40 PM, Markus Armbruster <armbru@redhat.com> wrote:
>> Signed-off-by: Markus Armbruster <armbru@redhat.com>
>> ---
> 
> Reviewed-by: Marc-André Lureau <marcandre.lureau@redhat.com>
> 
> (apparently, there are other places in qemu where this conversion could be done)

Yep. I try to flag them when I see someone touch one, but a global
search-and-replace would be a nice beginner's project.
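
For whoever picks that up, here is a stand-alone sketch of the pattern
(DEMO_DEBUG, DEMO_DPRINTF, bump() and demo() are made-up names; the
macro body mirrors the one in the patch).  The compiler still parses
and type-checks the printf() call, so a stale format string shows up at
build time, yet with the flag at 0 nothing runs, not even the argument
expressions:

```c
#include <assert.h>
#include <stdio.h>

#define DEMO_DEBUG 0
#define DEMO_DPRINTF(fmt, ...)                          \
    do {                                                \
        if (DEMO_DEBUG) {                               \
            printf("DEMO: " fmt, ## __VA_ARGS__);       \
        }                                               \
    } while (0)

static int calls;

/* Side effect lets us observe whether the argument was evaluated. */
static int bump(void)
{
    return ++calls;
}

static void demo(void)
{
    /* Compiled and checked against the format, but never executed */
    DEMO_DPRINTF("bump returned %d\n", bump());
    assert(calls == 0);
}
```

Like the original, this relies on the GNU "## __VA_ARGS__" extension,
which gcc and clang both accept.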

-- 
Eric Blake   eblake redhat com    +1-919-301-3266
Libvirt virtualization library http://libvirt.org



^ permalink raw reply	[flat|nested] 118+ messages in thread

* Re: [Qemu-devel] [PATCH 15/38] ivshmem: Failed realize() can leave migration blocker behind
  2016-02-29 18:40 ` [Qemu-devel] [PATCH 15/38] ivshmem: Failed realize() can leave migration blocker behind Markus Armbruster
@ 2016-03-01 15:59   ` Marc-André Lureau
  2016-03-02  9:54     ` Markus Armbruster
  0 siblings, 1 reply; 118+ messages in thread
From: Marc-André Lureau @ 2016-03-01 15:59 UTC (permalink / raw)
  To: Markus Armbruster
  Cc: Paolo Bonzini, cam, Claudio Fontana, QEMU, David Marchand

Hi

On Mon, Feb 29, 2016 at 7:40 PM, Markus Armbruster <armbru@redhat.com> wrote:
> If pci_ivshmem_realize() fails after it created its migration blocker,
> the blocker is left in place.  Fix that by creating it last.
>
> Likewise, if it fails after it called fifo8_create(), it leaks fifo
> memory.  Fix that the same way.
>
> Signed-off-by: Markus Armbruster <armbru@redhat.com>
> ---

Makes sense, I didn't fully get that "realize" was supposed to handle
failure properly.

Btw, why do you introduce a new err variable? I guess that's easier to
deal with, perhaps in a following patch.

other than that
Reviewed-by: Marc-André Lureau <marcandre.lureau@redhat.com>



>  hw/misc/ivshmem.c | 23 ++++++++++++++---------
>  1 file changed, 14 insertions(+), 9 deletions(-)
>
> diff --git a/hw/misc/ivshmem.c b/hw/misc/ivshmem.c
> index eb53d9a..1392426 100644
> --- a/hw/misc/ivshmem.c
> +++ b/hw/misc/ivshmem.c
> @@ -824,6 +824,7 @@ static void ivshmem_write_config(PCIDevice *pdev, uint32_t address,
>  static void pci_ivshmem_realize(PCIDevice *dev, Error **errp)
>  {
>      IVShmemState *s = IVSHMEM(dev);
> +    Error *err = NULL;
>      uint8_t *pci_conf;
>      uint8_t attr = PCI_BASE_ADDRESS_SPACE_MEMORY |
>          PCI_BASE_ADDRESS_MEM_PREFETCH;
> @@ -855,8 +856,6 @@ static void pci_ivshmem_realize(PCIDevice *dev, Error **errp)
>          s->ivshmem_size = size;
>      }
>
> -    fifo8_create(&s->incoming_fifo, sizeof(int64_t));
> -
>      /* IRQFD requires MSI */
>      if (ivshmem_has_feature(s, IVSHMEM_IOEVENTFD) &&
>          !ivshmem_has_feature(s, IVSHMEM_MSI)) {
> @@ -878,12 +877,6 @@ static void pci_ivshmem_realize(PCIDevice *dev, Error **errp)
>          s->role_val = IVSHMEM_MASTER; /* default */
>      }
>
> -    if (s->role_val == IVSHMEM_PEER) {
> -        error_setg(&s->migration_blocker,
> -                   "Migration is disabled when using feature 'peer mode' in device 'ivshmem'");
> -        migrate_add_blocker(s->migration_blocker);
> -    }
> -
>      pci_conf = dev->config;
>      pci_conf[PCI_COMMAND] = PCI_COMMAND_IO | PCI_COMMAND_MEMORY;
>
> @@ -962,7 +955,19 @@ static void pci_ivshmem_realize(PCIDevice *dev, Error **errp)
>              return;
>          }
>
> -        create_shared_memory_BAR(s, fd, attr, errp);
> +        create_shared_memory_BAR(s, fd, attr, &err);
> +        if (err) {
> +            error_propagate(errp, err);
> +            return;
> +        }
> +    }
> +
> +    fifo8_create(&s->incoming_fifo, sizeof(int64_t));
> +
> +    if (s->role_val == IVSHMEM_PEER) {
> +        error_setg(&s->migration_blocker,
> +                   "Migration is disabled when using feature 'peer mode' in device 'ivshmem'");
> +        migrate_add_blocker(s->migration_blocker);
>      }
>  }
>
> --
> 2.4.3
>
>



-- 
Marc-André Lureau

* Re: [Qemu-devel] [PATCH 16/38] ivshmem: Clean up register callbacks
  2016-02-29 18:40 ` [Qemu-devel] [PATCH 16/38] ivshmem: Clean up register callbacks Markus Armbruster
@ 2016-03-01 16:04   ` Marc-André Lureau
  0 siblings, 0 replies; 118+ messages in thread
From: Marc-André Lureau @ 2016-03-01 16:04 UTC (permalink / raw)
  To: Markus Armbruster
  Cc: Paolo Bonzini, cam, Claudio Fontana, QEMU, David Marchand

On Mon, Feb 29, 2016 at 7:40 PM, Markus Armbruster <armbru@redhat.com> wrote:
> Signed-off-by: Markus Armbruster <armbru@redhat.com>
> ---

Reviewed-by: Marc-André Lureau <marcandre.lureau@redhat.com>


>  hw/misc/ivshmem.c | 11 ++---------
>  1 file changed, 2 insertions(+), 9 deletions(-)
>
> diff --git a/hw/misc/ivshmem.c b/hw/misc/ivshmem.c
> index 1392426..7191914 100644
> --- a/hw/misc/ivshmem.c
> +++ b/hw/misc/ivshmem.c
> @@ -121,12 +121,10 @@ static inline uint32_t ivshmem_has_feature(IVShmemState *ivs,
>      return (ivs->features & (1 << feature));
>  }
>
> -/* accessing registers - based on rtl8139 */
>  static void ivshmem_update_irq(IVShmemState *s)
>  {
>      PCIDevice *d = PCI_DEVICE(s);
> -    int isr;
> -    isr = (s->intrstatus & s->intrmask) & 0xffffffff;
> +    uint32_t isr = s->intrstatus & s->intrmask;
>
>      /* don't print ISR resets */
>      if (isr) {
> @@ -134,7 +132,7 @@ static void ivshmem_update_irq(IVShmemState *s)
>                          isr ? 1 : 0, s->intrstatus, s->intrmask);
>      }
>
> -    pci_set_irq(d, (isr != 0));
> +    pci_set_irq(d, isr != 0);
>  }
>
>  static void ivshmem_IntrMask_write(IVShmemState *s, uint32_t val)
> @@ -142,7 +140,6 @@ static void ivshmem_IntrMask_write(IVShmemState *s, uint32_t val)
>      IVSHMEM_DPRINTF("IntrMask write(w) val = 0x%04x\n", val);
>
>      s->intrmask = val;
> -
>      ivshmem_update_irq(s);
>  }
>
> @@ -151,7 +148,6 @@ static uint32_t ivshmem_IntrMask_read(IVShmemState *s)
>      uint32_t ret = s->intrmask;
>
>      IVSHMEM_DPRINTF("intrmask read(w) val = 0x%04x\n", ret);
> -
>      return ret;
>  }
>
> @@ -160,7 +156,6 @@ static void ivshmem_IntrStatus_write(IVShmemState *s, uint32_t val)
>      IVSHMEM_DPRINTF("IntrStatus write(w) val = 0x%04x\n", val);
>
>      s->intrstatus = val;
> -
>      ivshmem_update_irq(s);
>  }
>
> @@ -170,9 +165,7 @@ static uint32_t ivshmem_IntrStatus_read(IVShmemState *s)
>
>      /* reading ISR clears all interrupts */
>      s->intrstatus = 0;
> -
>      ivshmem_update_irq(s);
> -
>      return ret;
>  }
>
> --
> 2.4.3
>
>



-- 
Marc-André Lureau

* Re: [Qemu-devel] [PATCH 17/38] ivshmem: Clean up MSI-X conditions
  2016-02-29 18:40 ` [Qemu-devel] [PATCH 17/38] ivshmem: Clean up MSI-X conditions Markus Armbruster
@ 2016-03-01 16:57   ` Marc-André Lureau
  2016-03-02 10:25     ` Markus Armbruster
  0 siblings, 1 reply; 118+ messages in thread
From: Marc-André Lureau @ 2016-03-01 16:57 UTC (permalink / raw)
  To: Markus Armbruster
  Cc: Paolo Bonzini, cam, Claudio Fontana, QEMU, David Marchand

On Mon, Feb 29, 2016 at 7:40 PM, Markus Armbruster <armbru@redhat.com> wrote:
> There are three predicates related to MSI-X:
>
> * ivshmem_has_feature(s, IVSHMEM_MSI) is true unless the non-MSI-X
>   variant of the device is selected with msi=off.
>
> * msix_present() is true when the device has the PCI capability MSI-X.
>   It's initially false, and becomes true during successful realize of
>   the MSI-X variant of the device.  Thus, it's the same as
>   ivshmem_has_feature(s, IVSHMEM_MSI) for realized devices.
>
> * msix_enabled() is true when msix_present() is true and guest software
>   has enabled MSI-X.
>
> Code that differs between the non-MSI-X and the MSI-X variant of the
> device needs to be guarded by ivshmem_has_feature(s, IVSHMEM_MSI) or
> by msix_present(), except the latter works only for realized devices.
>
> Code that depends on whether MSI-X is in use needs to be guarded with
> msix_enabled().
>
> Code review led me to two minor messes:
>
> * ivshmem_vector_notify() calls msix_notify() even when
>   !msix_enabled(), unlike most other MSI-X-capable devices.  As far as
>   I can tell, msix_notify() does nothing when !msix_enabled().  Add
>   the guard anyway.
>

Sure, feel free to split this into a separate patch with my Reviewed-by.

> * Most callers of ivshmem_use_msix() guard it with
>   ivshmem_has_feature(s, IVSHMEM_MSI).  Not necessary, because
>   ivshmem_use_msix() does nothing when !msix_present().  That's
>   ivshmem's only use of msix_present(), though.  Rename
>   ivshmem_use_msix() to ivshmem_vector_use(), replace msix_present()
>   by ivshmem_has_feature() there, and drop the redundant guards.

I prefer that code related to msix remains within msix blocks if
possible, improving readability imho.

Furthermore, since the function is msix specific, I think it's worth
keeping the "msix" in the name. Since ivshmem_msix_use() wasn't good
enough for you, perhaps we need the full-blown
ivshmem_msix_vectors_use() instead.

>
> Signed-off-by: Markus Armbruster <armbru@redhat.com>
> ---
>  hw/misc/ivshmem.c | 22 +++++++++-------------
>  1 file changed, 9 insertions(+), 13 deletions(-)
>
> diff --git a/hw/misc/ivshmem.c b/hw/misc/ivshmem.c
> index 7191914..cfea151 100644
> --- a/hw/misc/ivshmem.c
> +++ b/hw/misc/ivshmem.c
> @@ -274,7 +274,9 @@ static void ivshmem_vector_notify(void *opaque)
>
>      IVSHMEM_DPRINTF("interrupt on vector %p %d\n", pdev, vector);
>      if (ivshmem_has_feature(s, IVSHMEM_MSI)) {
> -        msix_notify(pdev, vector);
> +        if (msix_enabled(pdev)) {
> +            msix_notify(pdev, vector);
> +        }
>      } else {
>          ivshmem_IntrStatus_write(s, 1);
>      }
> @@ -712,13 +714,12 @@ static void ivshmem_check_version(void *opaque, const uint8_t * buf, int size)
>  /* Select the MSI-X vectors used by device.
>   * ivshmem maps events to vectors statically, so
>   * we just enable all vectors on init and after reset. */
> -static void ivshmem_use_msix(IVShmemState * s)
> +static void ivshmem_vector_use(IVShmemState *s)
>  {
>      PCIDevice *d = PCI_DEVICE(s);
>      int i;
>
> -    IVSHMEM_DPRINTF("%s, msix present: %d\n", __func__, msix_present(d));
> -    if (!msix_present(d)) {
> +    if (!ivshmem_has_feature(s, IVSHMEM_MSI)) {
>          return;
>      }
>
> @@ -733,7 +734,7 @@ static void ivshmem_reset(DeviceState *d)
>
>      s->intrstatus = 0;
>      s->intrmask = 0;
> -    ivshmem_use_msix(s);
> +    ivshmem_vector_use(s);
>  }
>
>  static int ivshmem_setup_interrupts(IVShmemState *s)
> @@ -747,9 +748,9 @@ static int ivshmem_setup_interrupts(IVShmemState *s)
>          }
>
>          IVSHMEM_DPRINTF("msix initialized (%d vectors)\n", s->vectors);
> -        ivshmem_use_msix(s);
>      }
>
> +    ivshmem_vector_use(s);
>      return 0;
>  }
>
> @@ -1034,12 +1035,7 @@ static int ivshmem_pre_load(void *opaque)
>
>  static int ivshmem_post_load(void *opaque, int version_id)
>  {
> -    IVShmemState *s = opaque;
> -
> -    if (ivshmem_has_feature(s, IVSHMEM_MSI)) {
> -        ivshmem_use_msix(s);
> -    }
> -
> +    ivshmem_vector_use(opaque);
>      return 0;
>  }
>
> @@ -1067,11 +1063,11 @@ static int ivshmem_load_old(QEMUFile *f, void *opaque, int version_id)
>
>      if (ivshmem_has_feature(s, IVSHMEM_MSI)) {
>          msix_load(pdev, f);
> -        ivshmem_use_msix(s);
>      } else {
>          s->intrstatus = qemu_get_be32(f);
>          s->intrmask = qemu_get_be32(f);
>      }
> +    ivshmem_vector_use(s);
>
>      return 0;
>  }
> --
> 2.4.3
>
>



-- 
Marc-André Lureau

* Re: [Qemu-devel] [PATCH 18/38] ivshmem: Leave INTx alone when using MSI-X
  2016-02-29 18:40 ` [Qemu-devel] [PATCH 18/38] ivshmem: Leave INTx alone when using MSI-X Markus Armbruster
@ 2016-03-01 17:14   ` Marc-André Lureau
  2016-03-01 17:30     ` Paolo Bonzini
  0 siblings, 1 reply; 118+ messages in thread
From: Marc-André Lureau @ 2016-03-01 17:14 UTC (permalink / raw)
  To: Markus Armbruster
  Cc: Paolo Bonzini, cam, Claudio Fontana, QEMU, David Marchand

On Mon, Feb 29, 2016 at 7:40 PM, Markus Armbruster <armbru@redhat.com> wrote:
> The ivshmem device can either use MSI-X or legacy INTx for interrupts.
>
> With MSI-X enabled, peer interrupt events trigger an MSI as they
> should.  But software can still raise INTx via interrupt status and
> mask register in BAR 0.  This is explicitly prohibited by PCI Local
> Bus Specification Revision 3.0, section 6.8.3.3:
>
>     While enabled for MSI or MSI-X operation, a function is prohibited
>     from using its INTx# pin (if implemented) to request service (MSI,
>     MSI-X, and INTx# are mutually exclusive).
>
> Fix the device model to leave INTx alone when using MSI-X.
>
> Document that we claim to use INTx in config space even when we don't.
>
> Signed-off-by: Markus Armbruster <armbru@redhat.com>
> ---
>  hw/misc/ivshmem.c | 9 +++++++++
>  1 file changed, 9 insertions(+)
>
> diff --git a/hw/misc/ivshmem.c b/hw/misc/ivshmem.c
> index cfea151..fc37feb 100644
> --- a/hw/misc/ivshmem.c
> +++ b/hw/misc/ivshmem.c
> @@ -126,6 +126,11 @@ static void ivshmem_update_irq(IVShmemState *s)
>      PCIDevice *d = PCI_DEVICE(s);
>      uint32_t isr = s->intrstatus & s->intrmask;
>
> +    /* No INTx with msi=on, whether the guest enabled MSI-X or not */
> +    if (ivshmem_has_feature(s, IVSHMEM_MSI)) {
> +        return;
> +    }
> +
>      /* don't print ISR resets */
>      if (isr) {
>          IVSHMEM_DPRINTF("Set IRQ to %d (%04x %04x)\n",
> @@ -874,6 +879,10 @@ static void pci_ivshmem_realize(PCIDevice *dev, Error **errp)
>      pci_conf = dev->config;
>      pci_conf[PCI_COMMAND] = PCI_COMMAND_IO | PCI_COMMAND_MEMORY;
>
> +    /*
> +     * Note: we don't use INTx with IVSHMEM_MSI at all, so this is a
> +     * bald-faced lie then.  But it's a backwards compatible lie.
> +     */
>      pci_config_set_interrupt_pin(pci_conf, 1);

I am not sure how much of a problem this is. Apparently, other devices
claim interrupt and msi (ich, hda, pvscsi)

Better ask someone more familiar with PCI details.

>
>      memory_region_init_io(&s->ivshmem_mmio, OBJECT(s), &ivshmem_mmio_ops, s,
> --
> 2.4.3
>
>



-- 
Marc-André Lureau

* Re: [Qemu-devel] [PATCH 18/38] ivshmem: Leave INTx alone when using MSI-X
  2016-03-01 17:14   ` Marc-André Lureau
@ 2016-03-01 17:30     ` Paolo Bonzini
  2016-03-02 11:04       ` Markus Armbruster
  0 siblings, 1 reply; 118+ messages in thread
From: Paolo Bonzini @ 2016-03-01 17:30 UTC (permalink / raw)
  To: Marc-André Lureau, Markus Armbruster
  Cc: cam, Claudio Fontana, QEMU, David Marchand



On 01/03/2016 18:14, Marc-André Lureau wrote:
> > +    /*
> > +     * Note: we don't use INTx with IVSHMEM_MSI at all, so this is a
> > +     * bald-faced lie then.  But it's a backwards compatible lie.
> > +     */
> >      pci_config_set_interrupt_pin(pci_conf, 1);
> 
> I am not sure how much of a problem this is. Apparently, other devices
> claim interrupt and msi (ich, hda, pvscsi)
> 
> Better ask someone more familiar with PCI details.

The interrupt pin is read-only and just helps the OS figure out which
interrupt is routed to intx.  If you return early from
ivshmem_update_irq if IVSHMEM_MSI, you should skip this line too.

I think it's better to leave this line in and check

    if (msix_enabled(pci_dev)) {
        return;
    }

in ivshmem_update_irq instead.  This matches what xhci does, for example.

Paolo

* Re: [Qemu-devel] [PATCH 08/38] ivshmem: Rewrite specification document
  2016-03-01 15:46   ` Eric Blake
@ 2016-03-02  9:50     ` Markus Armbruster
  0 siblings, 0 replies; 118+ messages in thread
From: Markus Armbruster @ 2016-03-02  9:50 UTC (permalink / raw)
  To: Eric Blake
  Cc: claudio.fontana, qemu-devel, david.marchand, mlureau, pbonzini, cam

Eric Blake <eblake@redhat.com> writes:

> On 02/29/2016 11:40 AM, Markus Armbruster wrote:
>> This started as an attempt to update ivshmem_device_spec.txt for
>> clarity, accuracy and completeness while working on its code, and
>> quickly became a full rewrite.  Since the diff would be useless
>> anyway, I'm using the opportunity to rename the file to
>> ivshmem-spec.txt.
>> 
>> I tried hard to ensure the new text contradicts neither the old text
>> nor the code.  If the new text contradicts the old text but not the
>> code, it's probably a bug in the old text.  If the new text
>> contradicts both, it's probably a bug in the new text.
>> 
>> Signed-off-by: Markus Armbruster <armbru@redhat.com>
>> ---
>
>> +If the server terminates without sending disconnect notifications for
>> +its connected clients, the clients can elect to continue.  They can
>> +communicate with each other normally, but won't receive disconnect
>> +notification on disconnect, and no new clients can connect.  There is
>> +no way for the clients to connect to a restarted the server.  The
>
> s/the server/server/

Will fix, thanks!

>> +device is not capable to tell guest software whether the server is
>> +still up.
>
> Wow - lots of shortcomings in the server protocol.  Food for thought for
> future improvements, but I'm happy with your approach of just
> documenting pitfalls for now.

Best we can do for 2.6 anyway :)

>> +
>> +Known bugs:
>> +
>> +* The protocol changed incompatibly in QEMU 2.5.  Before, messages
>> +  were native endian long, and there was no version number.
>> +
>> +* The protocol is poorly designed.

* Re: [Qemu-devel] [PATCH 10/38] ivshmem: Compile debug prints unconditionally to prevent bit-rot
  2016-03-01 15:49     ` Eric Blake
@ 2016-03-02  9:51       ` Markus Armbruster
  2016-03-02 15:52         ` Eric Blake
  0 siblings, 1 reply; 118+ messages in thread
From: Markus Armbruster @ 2016-03-02  9:51 UTC (permalink / raw)
  To: Eric Blake
  Cc: Claudio Fontana, David Marchand, QEMU, Marc-André Lureau,
	Paolo Bonzini, cam

Eric Blake <eblake@redhat.com> writes:

> On 03/01/2016 05:22 AM, Marc-André Lureau wrote:
>> On Mon, Feb 29, 2016 at 7:40 PM, Markus Armbruster <armbru@redhat.com> wrote:
>>> Signed-off-by: Markus Armbruster <armbru@redhat.com>
>>> ---
>> 
>> Reviewed-by: Marc-André Lureau <marcandre.lureau@redhat.com>
>> 
>> (apparently, there are other places in qemu where this conversion could be done)
>
> Yep. I try to flag them when I see someone touch one, but a global
> search-and-replace would be a nice beginner's project.

Would you like to add it to http://wiki.qemu.org/BiteSizedTasks ?

* Re: [Qemu-devel] [PATCH 13/38] ivshmem: Don't destroy the chardev on version mismatch
  2016-03-01 15:39   ` Marc-André Lureau
@ 2016-03-02  9:52     ` Markus Armbruster
  0 siblings, 0 replies; 118+ messages in thread
From: Markus Armbruster @ 2016-03-02  9:52 UTC (permalink / raw)
  To: Marc-André Lureau
  Cc: Paolo Bonzini, cam, Claudio Fontana, QEMU, David Marchand

Marc-André Lureau <marcandre.lureau@gmail.com> writes:

> Hi
>
> On Mon, Feb 29, 2016 at 7:40 PM, Markus Armbruster <armbru@redhat.com> wrote:
>> Yes, the chardev is commonly useless after we read a bad version from
>> it, but destroying it is inappropriate anyway: the user created it, so
>> the user should be able to hold on to it as long as he likes.  We
>> don't destroy it on other errors.  Screwed up in commit 5105b1d.
>>
>> Stop reading instead.
>>
>> Also note QEMU's behavior in ivshmem-spec.txt.
>>
>> Signed-off-by: Markus Armbruster <armbru@redhat.com>
>
> Interestingly enough, it seems both smartcard "passthru" and usb
> redirect do chr_delete(). It would be a nice follow-up to standardize
> this there too.

Wasn't aware of them.  I'll see what I can do.

> Reviewed-by: Marc-André Lureau <marcandre.lureau@redhat.com>

Thanks!

* Re: [Qemu-devel] [PATCH 15/38] ivshmem: Failed realize() can leave migration blocker behind
  2016-03-01 15:59   ` Marc-André Lureau
@ 2016-03-02  9:54     ` Markus Armbruster
  2016-03-02 10:50       ` Marc-André Lureau
  0 siblings, 1 reply; 118+ messages in thread
From: Markus Armbruster @ 2016-03-02  9:54 UTC (permalink / raw)
  To: Marc-André Lureau
  Cc: Paolo Bonzini, cam, Claudio Fontana, QEMU, David Marchand

Marc-André Lureau <marcandre.lureau@gmail.com> writes:

> Hi
>
> On Mon, Feb 29, 2016 at 7:40 PM, Markus Armbruster <armbru@redhat.com> wrote:
>> If pci_ivshmem_realize() fails after it created its migration blocker,
>> the blocker is left in place.  Fix that by creating it last.
>>
>> Likewise, if it fails after it called fifo8_create(), it leaks fifo
>> memory.  Fix that the same way.
>>
>> Signed-off-by: Markus Armbruster <armbru@redhat.com>
>> ---
>
> Makes sense, I didn't fully get that "realize" was supposed to handle
> failure properly.
>
> Btw, why do you introduce a new err variable? I guess that's easier to
> deal with, perhaps in a following patch.

See explanation inline.

> other than that
> Reviewed-by: Marc-André Lureau <marcandre.lureau@redhat.com>
>
>
>
>>  hw/misc/ivshmem.c | 23 ++++++++++++++---------
>>  1 file changed, 14 insertions(+), 9 deletions(-)
>>
>> diff --git a/hw/misc/ivshmem.c b/hw/misc/ivshmem.c
>> index eb53d9a..1392426 100644
>> --- a/hw/misc/ivshmem.c
>> +++ b/hw/misc/ivshmem.c
>> @@ -824,6 +824,7 @@ static void ivshmem_write_config(PCIDevice *pdev, uint32_t address,
>>  static void pci_ivshmem_realize(PCIDevice *dev, Error **errp)
>>  {
>>      IVShmemState *s = IVSHMEM(dev);
>> +    Error *err = NULL;
>>      uint8_t *pci_conf;
>>      uint8_t attr = PCI_BASE_ADDRESS_SPACE_MEMORY |
>>          PCI_BASE_ADDRESS_MEM_PREFETCH;
>> @@ -855,8 +856,6 @@ static void pci_ivshmem_realize(PCIDevice *dev, Error **errp)
>>          s->ivshmem_size = size;
>>      }
>>
>> -    fifo8_create(&s->incoming_fifo, sizeof(int64_t));
>> -
>>      /* IRQFD requires MSI */
>>      if (ivshmem_has_feature(s, IVSHMEM_IOEVENTFD) &&
>>          !ivshmem_has_feature(s, IVSHMEM_MSI)) {
>> @@ -878,12 +877,6 @@ static void pci_ivshmem_realize(PCIDevice *dev, Error **errp)
>>          s->role_val = IVSHMEM_MASTER; /* default */
>>      }
>>
>> -    if (s->role_val == IVSHMEM_PEER) {
>> -        error_setg(&s->migration_blocker,
>> -                   "Migration is disabled when using feature 'peer mode' in device 'ivshmem'");
>> -        migrate_add_blocker(s->migration_blocker);
>> -    }
>> -
>>      pci_conf = dev->config;
>>      pci_conf[PCI_COMMAND] = PCI_COMMAND_IO | PCI_COMMAND_MEMORY;
>>
>> @@ -962,7 +955,19 @@ static void pci_ivshmem_realize(PCIDevice *dev, Error **errp)
>>              return;
>>          }
>>
>> -        create_shared_memory_BAR(s, fd, attr, errp);
>> +        create_shared_memory_BAR(s, fd, attr, &err);
>> +        if (err) {
>> +            error_propagate(errp, err);
>> +            return;
>> +        }

Before my patch, passing errp to create_shared_memory_BAR() was fine,
because it was the last thing the function does.

Now, it isn't: we must bypass the rest of the function on error.

All clear now?
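
A minimal standalone sketch of the pattern, not the actual QEMU code: the
Error type and every mock_* name below are stand-ins invented for
illustration.  It assumes the usual convention that the caller may pass a
NULL errp, which is why the function tests a local err instead of *errp,
and it shows the blocker being created only after the last failure point.

```c
#include <assert.h>
#include <stddef.h>

/* Mock stand-in for QEMU's Error; only here to keep the sketch standalone. */
typedef struct Error {
    const char *msg;
} Error;

static Error bar_error = { "cannot create shared memory BAR" };

/* Mock of error_propagate(): hand a local error to the caller's errp. */
static void mock_error_propagate(Error **dst_errp, Error *local_err)
{
    if (local_err && dst_errp) {
        *dst_errp = local_err;
    }
}

/* Hypothetical step that may fail, playing create_shared_memory_BAR(). */
static void mock_create_bar(int fail, Error **errp)
{
    if (fail && errp) {
        *errp = &bar_error;
    }
}

static int blockers;    /* counts migration blockers "added" */

/* Mock realize(): the blocker comes last, after every failure point. */
static void mock_realize(int fail_bar, int peer, Error **errp)
{
    Error *err = NULL;

    mock_create_bar(fail_bar, &err);
    if (err) {
        mock_error_propagate(errp, err);
        return;         /* bypass the rest; nothing to undo yet */
    }

    if (peer) {
        blockers++;     /* stands in for migrate_add_blocker() */
    }
}
```

With this shape, a failing mock_create_bar() leaves blockers untouched even
when the caller passes errp == NULL.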

>> +    }
>> +
>> +    fifo8_create(&s->incoming_fifo, sizeof(int64_t));
>> +
>> +    if (s->role_val == IVSHMEM_PEER) {
>> +        error_setg(&s->migration_blocker,
>> +                   "Migration is disabled when using feature 'peer mode' in device 'ivshmem'");
>> +        migrate_add_blocker(s->migration_blocker);
>>      }
>>  }
>>
>> --
>> 2.4.3
>>
>>

* Re: [Qemu-devel] [PATCH 17/38] ivshmem: Clean up MSI-X conditions
  2016-03-01 16:57   ` Marc-André Lureau
@ 2016-03-02 10:25     ` Markus Armbruster
  0 siblings, 0 replies; 118+ messages in thread
From: Markus Armbruster @ 2016-03-02 10:25 UTC (permalink / raw)
  To: Marc-André Lureau
  Cc: Paolo Bonzini, cam, Claudio Fontana, QEMU, David Marchand

Marc-André Lureau <marcandre.lureau@gmail.com> writes:

> On Mon, Feb 29, 2016 at 7:40 PM, Markus Armbruster <armbru@redhat.com> wrote:
>> There are three predicates related to MSI-X:
>>
>> * ivshmem_has_feature(s, IVSHMEM_MSI) is true unless the non-MSI-X
>>   variant of the device is selected with msi=off.
>>
>> * msix_present() is true when the device has the PCI capability MSI-X.
>>   It's initially false, and becomes true during successful realize of
>>   the MSI-X variant of the device.  Thus, it's the same as
>>   ivshmem_has_feature(s, IVSHMEM_MSI) for realized devices.
>>
>> * msix_enabled() is true when msix_present() is true and guest software
>>   has enabled MSI-X.
>>
>> Code that differs between the non-MSI-X and the MSI-X variant of the
>> device needs to be guarded by ivshmem_has_feature(s, IVSHMEM_MSI) or
>> by msix_present(), except the latter works only for realized devices.
>>
>> Code that depends on whether MSI-X is in use needs to be guarded with
>> msix_enabled().
>>
>> Code review led me to two minor messes:
>>
>> * ivshmem_vector_notify() calls msix_notify() even when
>>   !msix_enabled(), unlike most other MSI-X-capable devices.  As far as
>>   I can tell, msix_notify() does nothing when !msix_enabled().  Add
>>   the guard anyway.
>>
>
> Sure, feel free to split this into a separate patch with my Reviewed-by.
>
>> * Most callers of ivshmem_use_msix() guard it with
>>   ivshmem_has_feature(s, IVSHMEM_MSI).  Not necessary, because
>>   ivshmem_use_msix() does nothing when !msix_present().  That's
>>   ivshmem's only use of msix_present(), though.  Rename
>>   ivshmem_use_msix() to ivshmem_vector_use(), replace msix_present()
>>   by ivshmem_has_feature() there, and drop the redundant guards.
>
> I prefer that code related to msix remains within msix blocks if
> possible, improving readability imho.
>
> Furthermore, since the function is msix specific, I think it's worth
> keeping the "msix" in the name. Since ivshmem_msix_use() wasn't good
> enough for you, perhaps we need the full-blown
> ivshmem_msix_vectors_use() instead.

"Vectors" actually means two related but distinct things with ivshmem:

* the communication channels to transmit interrupts among peers, and

* the MSI-X vectors.

You can have the former without the latter, with msi=off.

I guess there are two views of the function, both reasonable:

1. Prepare usage of "vectors", i.e. either kind.  Name the function
ivshmem_vector_use(), and call it unconditionally.  The fact that
it does only MSI-X stuff is implementation detail.

2. Prepare usage of MSI-X vectors.  Name the function
ivshmem_msix_vectors_use() or similar, and call it only when
ivshmem_has_feature(s, IVSHMEM_MSI), for consistency with other MSI-X
functions.

You prefer 2, I prefer 1.  But it's not a deal-breaker for me; if you
feel strongly, I can do 2.
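
For illustration only, here is a standalone sketch of the three predicates
from the commit message together with view 1.  MockDev, its fields and the
mock_* functions are hypothetical; they model the relationships described
above and are not QEMU's implementation.

```c
#include <assert.h>
#include <stdbool.h>

/* Mock device state; field names are illustrative, not QEMU's. */
typedef struct MockDev {
    bool msi_feature;    /* plays ivshmem_has_feature(s, IVSHMEM_MSI) */
    bool realized;       /* the MSI-X capability appears during realize */
    bool guest_enabled;  /* guest software turned MSI-X on */
    int vectors;
    int vectors_in_use;
} MockDev;

/* True only for a realized device of the MSI-X variant. */
static bool mock_msix_present(const MockDev *d)
{
    return d->msi_feature && d->realized;
}

/* True when the capability is present and the guest enabled it. */
static bool mock_msix_enabled(const MockDev *d)
{
    return mock_msix_present(d) && d->guest_enabled;
}

/* View 1: callable unconditionally, a no-op for the msi=off variant. */
static void mock_vector_use(MockDev *d)
{
    if (!d->msi_feature) {
        return;
    }
    d->vectors_in_use = d->vectors;
}
```

Under view 2 the guard would move out of mock_vector_use() and into each
caller instead.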

* Re: [Qemu-devel] [PATCH 15/38] ivshmem: Failed realize() can leave migration blocker behind
  2016-03-02  9:54     ` Markus Armbruster
@ 2016-03-02 10:50       ` Marc-André Lureau
  0 siblings, 0 replies; 118+ messages in thread
From: Marc-André Lureau @ 2016-03-02 10:50 UTC (permalink / raw)
  To: Markus Armbruster
  Cc: Paolo Bonzini, cam, Claudio Fontana, QEMU, David Marchand

Hi

On Wed, Mar 2, 2016 at 10:54 AM, Markus Armbruster <armbru@redhat.com> wrote:
>>> -        create_shared_memory_BAR(s, fd, attr, errp);
>>> +        create_shared_memory_BAR(s, fd, attr, &err);
>>> +        if (err) {
>>> +            error_propagate(errp, err);
>>> +            return;
>>> +        }
>
> Before my patch, passing errp to create_shared_memory_BAR() was fine,
> because it was the last thing the function does.
>
> Now, it isn't: we must bypass the rest of the function on error.
>
> All clear now?


Got it, thanks.

-- 
Marc-André Lureau

* Re: [Qemu-devel] [PATCH 18/38] ivshmem: Leave INTx alone when using MSI-X
  2016-03-01 17:30     ` Paolo Bonzini
@ 2016-03-02 11:04       ` Markus Armbruster
  2016-03-02 14:15         ` Paolo Bonzini
  0 siblings, 1 reply; 118+ messages in thread
From: Markus Armbruster @ 2016-03-02 11:04 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: Claudio Fontana, cam, Marc-André Lureau, QEMU, David Marchand

Paolo Bonzini <pbonzini@redhat.com> writes:

> On 01/03/2016 18:14, Marc-André Lureau wrote:
>> > +    /*
>> > +     * Note: we don't use INTx with IVSHMEM_MSI at all, so this is a
>> > +     * bald-faced lie then.  But it's a backwards compatible lie.
>> > +     */
>> >      pci_config_set_interrupt_pin(pci_conf, 1);
>> 
>> I am not sure how much of a problem this is. Apparently, other devices
>> claim interrupt and msi (ich, hda, pvscsi)
>> 
>> Better ask someone more familiar with PCI details.
>
> The interrupt pin is read-only and just helps the OS figure out which
> interrupt is routed to intx.  If you return early from
> ivshmem_update_irq if IVSHMEM_MSI, you should skip this line too.
>
> I think it's better to leave this line in and check
>
>     if (msix_enabled(pci_dev)) {
>         return;
>     }
>
> in ivshmem_update_irq instead.  This matches what xhci does, for example.

Yes, but it's not what ivshmem has ever done.  In other words, it's a
backward-incompatible change.

A PCI function declares whether it can do MSI or MSI-X with
capabilities.

Use of MSI and MSI-X is optional.  Software can enable either MSI or
MSI-X, but not both.  When MSI-X is enabled, the function must signal
interrupts via MSI-X.  When MSI is enabled, it must signal interrupts
via MSI.  When neither is enabled, it signals interrupts via INTx *if*
it has the pin wired up.  PCI Local Bus Specification Revision 3.0,
section 6.8 Message Signaled Interrupts:

    It is recommended that devices implement interrupt pins to provide
    compatibility in systems that do not support MSI (devices default to
    interrupt pins).  However, it is expected that the need for
    interrupt pins will diminish over time.  Devices that do not support
    interrupt pins due to pin constraints (rely on polling for device
    service) may implement messages to increase performance without
    adding additional pins.  Therefore, system configuration software
    must not assume that a message capable device has an interrupt pin.

The xhci device *does* implement this fallback to INTx.

For better or worse, fallback to INTx has never been implemented in
ivshmem.  You can either ask for an INTx-only device (msi=off), or for
an MSI-X-only device (msi=on).  The latter *cannot* do interrupts until
you enable MSI-X.

Similarly, the ivshmem-doorbell device introduced later in this series
can only do MSI-X, and the ivshmem-plain device cannot do interrupts at
all.

We could of course implement the fallback in ivshmem, too.  It's not
quite as simple as making ivshmem_update_irq() do nothing when
msix_enabled(), we also have to adapt ivshmem_vector_notify(), update
ivshmem-spec.txt, and cover the fallback in the tests.  Also limit the
change to revision 1 of the device for compatibility.  I very much doubt
this is worth the trouble.
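
To make the difference concrete, here is a small standalone sketch.  The
Signal enum and both function names are made up for this example; the first
function models a generic PCI function with INTx fallback as described
above, the second models ivshmem's current behavior without fallback.

```c
#include <assert.h>
#include <stdbool.h>

typedef enum {
    SIG_NONE,   /* cannot signal interrupts at all */
    SIG_INTX,
    SIG_MSI,
    SIG_MSIX
} Signal;

/* Generic function with fallback: MSI-X, else MSI, else INTx if wired. */
static Signal mock_generic_signal(bool msix_enabled, bool msi_enabled,
                                  bool has_pin)
{
    if (msix_enabled) {
        return SIG_MSIX;
    }
    if (msi_enabled) {
        return SIG_MSI;
    }
    return has_pin ? SIG_INTX : SIG_NONE;
}

/* ivshmem as described: msi=off is INTx-only, msi=on is MSI-X-only. */
static Signal mock_ivshmem_signal(bool msi_prop, bool msix_enabled)
{
    if (!msi_prop) {
        return SIG_INTX;
    }
    return msix_enabled ? SIG_MSIX : SIG_NONE;
}
```

The msi=on case with MSI-X still disabled yields SIG_NONE, which is exactly
the missing fallback.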

A PCI function declares its INTx use in config space register Interrupt
Pin.  Ibid., section 6.2.4. Miscellaneous Registers:

    The Interrupt Pin register tells which interrupt pin the device (or
    device function) uses.  A value of 1 corresponds to INTA#.  A value
    of 2 corresponds to INTB#.  A value of 3 corresponds to INTC#.  A
    value of 4 corresponds to INTD#.  Devices (or device functions) that
    do not use an interrupt pin must put a 0 in this register.

ivshmem with msi=on should therefore put 0 in this register.  It
doesn't, but I feel it's better to let it remain bug-compatible.
ivshmem-doorbell and ivshmem-plain get it right.
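
As a quick illustration of the register encoding quoted above, a standalone
decoder might look like this.  The function name is hypothetical, and
treating values above 4 as reserved is my reading of the specification
rather than something stated in the quote.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Decode the Interrupt Pin config space register into a pin name. */
static const char *mock_interrupt_pin_name(uint8_t pin)
{
    switch (pin) {
    case 0:
        return "none";      /* function does not use an interrupt pin */
    case 1:
        return "INTA#";
    case 2:
        return "INTB#";
    case 3:
        return "INTC#";
    case 4:
        return "INTD#";
    default:
        return "reserved";  /* assumption: values above 4 are reserved */
    }
}
```

ivshmem with msi=on reports 1 here although it never raises INTA#, while
ivshmem-doorbell and ivshmem-plain report 0.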

Aside: xhci falls back to INTx, and should therefore declare its use of
INTx in the Interrupt Pin register, but I can't see where it does.

* Re: [Qemu-devel] [PATCH 30/38] ivshmem: Simplify memory regions for BAR 2 (shared memory)
  2016-03-01 15:15       ` Paolo Bonzini
@ 2016-03-02 11:06         ` Markus Armbruster
  0 siblings, 0 replies; 118+ messages in thread
From: Markus Armbruster @ 2016-03-02 11:06 UTC (permalink / raw)
  To: Paolo Bonzini; +Cc: mlureau, cam, claudio.fontana, qemu-devel, david.marchand

Paolo Bonzini <pbonzini@redhat.com> writes:

> On 01/03/2016 15:06, Markus Armbruster wrote:
>> Paolo Bonzini <pbonzini@redhat.com> writes:
>> 
>>> On 29/02/2016 19:40, Markus Armbruster wrote:
>>>> -    memory_region_init_ram_ptr(&s->ivshmem, OBJECT(s),
>>>> +    s->ivshmem_bar2 = g_new(MemoryRegion, 1);
>>>> +    memory_region_init_ram_ptr(s->ivshmem_bar2, OBJECT(s),
>>>>                                 "ivshmem.bar2", s->ivshmem_size, ptr);
>>>> -    qemu_set_ram_fd(s->ivshmem.ram_addr, fd);
>>>> -    vmstate_register_ram(&s->ivshmem, DEVICE(s));
>>>> -    memory_region_add_subregion(&s->bar, 0, &s->ivshmem);
>>>> +    qemu_set_ram_fd(s->ivshmem_bar2->ram_addr, fd);
>>>
>>> This is missing an instance_finalize callback to do
>>>
>>>     if (s->ivshmem_bar2) {
>>>         object_unparent(s->ivshmem_bar2);
>>>         g_free(s->ivshmem_bar2);
>>>     }
>> 
>> Since it's allocated within ivshmem_realize(), I guess I could free it
>> in ivshmem_exit().
>
> Unfortunately you can't, because the guest might be using it at the time
> of hot-unplug (e.g. DMAing from disk to it).  Unrealize is the place
> where you hide stuff, and in this case the PCI core does it for you;
> finalize is the place where you free stuff.
>
> This is mentioned (though not really in these terms) in docs/memory.txt.

You mean I'm supposed to have read and understood that?!?  ;-}
Thanks!

[...]

* Re: [Qemu-devel] [PATCH 19/38] ivshmem: Assert interrupts are set up once
  2016-02-29 18:40 ` [Qemu-devel] [PATCH 19/38] ivshmem: Assert interrupts are set up once Markus Armbruster
@ 2016-03-02 12:02   ` Marc-André Lureau
  0 siblings, 0 replies; 118+ messages in thread
From: Marc-André Lureau @ 2016-03-02 12:02 UTC (permalink / raw)
  To: Markus Armbruster
  Cc: Paolo Bonzini, cam, Claudio Fontana, QEMU, David Marchand

Hi

On Mon, Feb 29, 2016 at 7:40 PM, Markus Armbruster <armbru@redhat.com> wrote:
> An interrupt is set up when the interrupt's file descriptor is
> received.  Each message applies to the next interrupt vector.
> Therefore, each vector cannot be set up more than once.
>
> ivshmem_add_kvm_msi_virq() half-heartedly tries not to rely on this by
> doing nothing then, but that's not going to recover from this error
> should it become possible in the future.  watch_vector_notifier()
> doesn't even try.
>
> Simply assert what is the case, so we get alerted if we ever screw it
> up.
>
> Signed-off-by: Markus Armbruster <armbru@redhat.com>
> ---
>  hw/misc/ivshmem.c | 7 ++-----
>  1 file changed, 2 insertions(+), 5 deletions(-)
>
> diff --git a/hw/misc/ivshmem.c b/hw/misc/ivshmem.c
> index fc37feb..9d2209d 100644
> --- a/hw/misc/ivshmem.c
> +++ b/hw/misc/ivshmem.c
> @@ -349,7 +349,7 @@ static void watch_vector_notifier(IVShmemState *s, EventNotifier *n,
>  {
>      int eventfd = event_notifier_get_fd(n);
>
> -    /* if MSI is supported we need multiple interrupts */
> +    assert(!s->msi_vectors[vector].pdev);

ok, why not

>      s->msi_vectors[vector].pdev = PCI_DEVICE(s);
>
>      qemu_set_fd_handler(eventfd, ivshmem_vector_notify,
> @@ -535,10 +535,7 @@ static int ivshmem_add_kvm_msi_virq(IVShmemState *s, int vector)
>      int ret;
>
>      IVSHMEM_DPRINTF("ivshmem_add_kvm_msi_virq vector:%d\n", vector);
> -
> -    if (s->msi_vectors[vector].pdev != NULL) {
> -        return 0;
> -    }
> +    assert(!s->msi_vectors[vector].pdev);

That one is trickier, since irqfd may be enabled or disabled
dynamically from ivshmem_write_config(), and
ivshmem_add_kvm_msi_virq() may be called at different times.  However,
I think an assert is correct, as there shouldn't be a valid state where
ivshmem_add_kvm_msi_virq() is called twice for the same vector while
irqfd is enabled.

Reviewed-by: Marc-André Lureau <marcandre.lureau@redhat.com>
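
The "set up once" invariant the patch asserts can be sketched in
isolation.  The names below (ToyVector, toy_vectors, toy_setup_vector)
are hypothetical; the owner pointer merely plays the role of
msi_vectors[vector].pdev.

```c
#include <assert.h>
#include <stddef.h>

#define TOY_VECTORS 4

/* Hypothetical per-vector state; owner is NULL until the vector is set up. */
typedef struct ToyVector {
    const void *owner;
} ToyVector;

static ToyVector toy_vectors[TOY_VECTORS];

/* Set up a vector.  Setting one up twice is a programming error. */
static void toy_setup_vector(int vector, const void *owner)
{
    assert(vector >= 0 && vector < TOY_VECTORS);
    assert(owner != NULL);
    assert(!toy_vectors[vector].owner);   /* trips on a second setup */
    toy_vectors[vector].owner = owner;
}
```

Unlike a silent early return, the assertion makes a second setup of
the same vector fail loudly instead of leaving stale state behind.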


* Re: [Qemu-devel] [PATCH 18/38] ivshmem: Leave INTx alone when using MSI-X
  2016-03-02 11:04       ` Markus Armbruster
@ 2016-03-02 14:15         ` Paolo Bonzini
  2016-03-02 15:50           ` Markus Armbruster
  0 siblings, 1 reply; 118+ messages in thread
From: Paolo Bonzini @ 2016-03-02 14:15 UTC (permalink / raw)
  To: Markus Armbruster
  Cc: Claudio Fontana, cam, Marc-André Lureau, QEMU, David Marchand



On 02/03/2016 12:04, Markus Armbruster wrote:
> For better or worse, fallback to INTx has never been implemented in
> ivshmem.  You can either ask for an INTx-only device (msi=off), or for
> an MSI-X-only device (msi=on).  The latter *cannot* do interrupts until
> you enable MSI-X.

Aha, now I see what you mean:

    if (ivshmem_has_feature(s, IVSHMEM_MSI)) {
        msix_notify(pdev, vector);
    } else {
        ivshmem_IntrStatus_write(s, 1);
    }

So I believe your patch is okay.  Perhaps you could also change the
interrupt pin for new machine types (even without changing the
revision), but it's not necessary to do it.

> Similarly, the ivshmem-doorbell device introduced later in this series
> can only do MSI-X, and the ivshmem-plain device cannot do interrupts at
> all.

Here:

    dev->config[PCI_INTERRUPT_PIN] = 0x01; /* interrupt pin 1 */

Paolo


* Re: [Qemu-devel] [PATCH 20/38] ivshmem: Simplify rejection of invalid peer ID from server
  2016-02-29 18:40 ` [Qemu-devel] [PATCH 20/38] ivshmem: Simplify rejection of invalid peer ID from server Markus Armbruster
@ 2016-03-02 15:08   ` Marc-André Lureau
  0 siblings, 0 replies; 118+ messages in thread
From: Marc-André Lureau @ 2016-03-02 15:08 UTC (permalink / raw)
  To: Markus Armbruster
  Cc: Paolo Bonzini, cam, Claudio Fontana, QEMU, David Marchand

On Mon, Feb 29, 2016 at 7:40 PM, Markus Armbruster <armbru@redhat.com> wrote:
> ivshmem_read() processes server messages.  These are 64 bit signed
> integers.  -1 is shared memory setup, 16 bit unsigned is a peer ID,
> anything else is invalid.
>
> ivshmem_read() rejects invalid negative messages right away, silently.
>
> Invalid positive messages get rejected only in resize_peers(), and
> ivshmem_read() then prints the rather cryptic message "failed to
> resize peers array".
>
> Extend the first check to cover all invalid messages, make it report
> "server sent invalid message", and drop the second check.
>
> Now resize_peers() can't fail anymore; simplify.
>
> Signed-off-by: Markus Armbruster <armbru@redhat.com>
> ---

Reviewed-by: Marc-André Lureau <marcandre.lureau@redhat.com>


>  hw/misc/ivshmem.c | 61 ++++++++++++++++++++-----------------------------------
>  1 file changed, 22 insertions(+), 39 deletions(-)
>
> diff --git a/hw/misc/ivshmem.c b/hw/misc/ivshmem.c
> index 9d2209d..5d33be4 100644
> --- a/hw/misc/ivshmem.c
> +++ b/hw/misc/ivshmem.c
> @@ -39,7 +39,7 @@
>  #define PCI_VENDOR_ID_IVSHMEM   PCI_VENDOR_ID_REDHAT_QUMRANET
>  #define PCI_DEVICE_ID_IVSHMEM   0x1110
>
> -#define IVSHMEM_MAX_PEERS G_MAXUINT16
> +#define IVSHMEM_MAX_PEERS UINT16_MAX
>  #define IVSHMEM_IOEVENTFD   0
>  #define IVSHMEM_MSI     1
>
> @@ -93,7 +93,7 @@ typedef struct IVShmemState {
>      uint32_t ivshmem_64bit;
>
>      Peer *peers;
> -    int nb_peers; /* how many peers we have space for */
> +    int nb_peers;               /* space in @peers[] */
>
>      int vm_id;
>      uint32_t vectors;
> @@ -451,34 +451,21 @@ static void close_peer_eventfds(IVShmemState *s, int posn)
>      s->peers[posn].nb_eventfds = 0;
>  }
>
> -/* this function increase the dynamic storage need to store data about other
> - * peers */
> -static int resize_peers(IVShmemState *s, int new_min_size)
> +static void resize_peers(IVShmemState *s, int nb_peers)
>  {
> +    int old_nb_peers = s->nb_peers;
> +    int i;
>
> -    int j, old_size;
> +    assert(nb_peers > old_nb_peers);
> +    IVSHMEM_DPRINTF("bumping storage to %d peers\n", nb_peers);
>
> -    /* limit number of max peers */
> -    if (new_min_size <= 0 || new_min_size > IVSHMEM_MAX_PEERS) {
> -        return -1;
> -    }
> -    if (new_min_size <= s->nb_peers) {
> -        return 0;
> -    }
> -
> -    old_size = s->nb_peers;
> -    s->nb_peers = new_min_size;
> +    s->peers = g_realloc(s->peers, nb_peers * sizeof(Peer));
> +    s->nb_peers = nb_peers;
>
> -    IVSHMEM_DPRINTF("bumping storage to %d peers\n", s->nb_peers);
> -
> -    s->peers = g_realloc(s->peers, s->nb_peers * sizeof(Peer));
> -
> -    for (j = old_size; j < s->nb_peers; j++) {
> -        s->peers[j].eventfds = g_new0(EventNotifier, s->vectors);
> -        s->peers[j].nb_eventfds = 0;
> +    for (i = old_nb_peers; i < nb_peers; i++) {
> +        s->peers[i].eventfds = g_new0(EventNotifier, s->vectors);
> +        s->peers[i].nb_eventfds = 0;
>      }
> -
> -    return 0;
>  }
>
>  static bool fifo_update_and_get(IVShmemState *s, const uint8_t *buf, int size,
> @@ -590,25 +577,21 @@ static void ivshmem_read(void *opaque, const uint8_t *buf, int size)
>          return;
>      }
>
> -    if (incoming_posn < -1) {
> -        IVSHMEM_DPRINTF("invalid incoming_posn %" PRId64 "\n", incoming_posn);
> -        return;
> -    }
> -
> -    /* pick off s->server_chr->msgfd and store it, posn should accompany msg */
>      incoming_fd = qemu_chr_fe_get_msgfd(s->server_chr);
>      IVSHMEM_DPRINTF("posn is %" PRId64 ", fd is %d\n",
>                      incoming_posn, incoming_fd);
>
> -    /* make sure we have enough space for this peer */
> +    if (incoming_posn < -1 || incoming_posn > IVSHMEM_MAX_PEERS) {
> +        error_report("server sent invalid message %" PRId64,
> +                     incoming_posn);
> +        if (incoming_fd != -1) {
> +            close(incoming_fd);
> +        }
> +        return;
> +    }
> +
>      if (incoming_posn >= s->nb_peers) {
> -        if (resize_peers(s, incoming_posn + 1) < 0) {
> -            error_report("failed to resize peers array");
> -            if (incoming_fd != -1) {
> -                close(incoming_fd);
> -            }
> -            return;
> -        }
> +        resize_peers(s, incoming_posn + 1);
>      }
>
>      peer = &s->peers[incoming_posn];
> --
> 2.4.3
>
>



-- 
Marc-André Lureau
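
The idea of the patch, validate the message once up front so that
growing the peers array can no longer fail on bad input, can be
sketched as follows.  This is a standalone model under assumptions:
ToyState, ToyPeer and the toy_*() helpers are hypothetical, and plain
realloc() plus abort() stands in for g_realloc().

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>

#define TOY_MAX_PEERS UINT16_MAX

typedef struct ToyPeer {
    int nb_eventfds;
} ToyPeer;

typedef struct ToyState {
    ToyPeer *peers;
    int nb_peers;       /* space in peers[] */
} ToyState;

/* -1 announces shared memory, 0..TOY_MAX_PEERS is a peer ID. */
static bool toy_msg_is_valid(int64_t msg)
{
    return msg >= -1 && msg <= TOY_MAX_PEERS;
}

/*
 * Grow peers[].  Callers have already validated the ID, so there is
 * no error to return; running out of memory aborts.
 */
static void toy_resize_peers(ToyState *s, int nb_peers)
{
    int old_nb_peers = s->nb_peers;
    ToyPeer *p;
    int i;

    assert(nb_peers > old_nb_peers);
    p = realloc(s->peers, nb_peers * sizeof(*p));
    if (!p) {
        abort();
    }
    s->peers = p;
    s->nb_peers = nb_peers;
    for (i = old_nb_peers; i < nb_peers; i++) {
        s->peers[i].nb_eventfds = 0;
    }
}
```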


* Re: [Qemu-devel] [PATCH 21/38] ivshmem: Disentangle ivshmem_read()
  2016-02-29 18:40 ` [Qemu-devel] [PATCH 21/38] ivshmem: Disentangle ivshmem_read() Markus Armbruster
@ 2016-03-02 15:28   ` Marc-André Lureau
  2016-03-02 15:53     ` Markus Armbruster
  0 siblings, 1 reply; 118+ messages in thread
From: Marc-André Lureau @ 2016-03-02 15:28 UTC (permalink / raw)
  To: Markus Armbruster
  Cc: Paolo Bonzini, cam, Claudio Fontana, QEMU, David Marchand

Hi

On Mon, Feb 29, 2016 at 7:40 PM, Markus Armbruster <armbru@redhat.com> wrote:
> Signed-off-by: Markus Armbruster <armbru@redhat.com>
> ---
>  hw/misc/ivshmem.c | 189 +++++++++++++++++++++++++++---------------------------
>  1 file changed, 96 insertions(+), 93 deletions(-)
>
> diff --git a/hw/misc/ivshmem.c b/hw/misc/ivshmem.c
> index 5d33be4..fc46666 100644
> --- a/hw/misc/ivshmem.c
> +++ b/hw/misc/ivshmem.c
> @@ -564,14 +564,106 @@ static void setup_interrupt(IVShmemState *s, int vector)
>      }
>  }
>
> +static void process_msg_shmem(IVShmemState *s, int fd)
> +{
> +    Error *err = NULL;
> +    void *ptr;
> +
> +    if (memory_region_is_mapped(&s->ivshmem)) {
> +        error_report("shm already initialized");
> +        close(fd);
> +        return;
> +    }
> +
> +    if (check_shm_size(s, fd, &err) == -1) {
> +        error_report_err(err);
> +        close(fd);
> +        return;
> +    }
> +
> +    /* mmap the region and map into the BAR2 */
> +    ptr = mmap(0, s->ivshmem_size, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
> +    if (ptr == MAP_FAILED) {
> +        error_report("Failed to mmap shared memory %s", strerror(errno));
> +        close(fd);
> +        return;
> +    }
> +    memory_region_init_ram_ptr(&s->ivshmem, OBJECT(s),
> +                               "ivshmem.bar2", s->ivshmem_size, ptr);
> +    qemu_set_ram_fd(s->ivshmem.ram_addr, fd);
> +    vmstate_register_ram(&s->ivshmem, DEVICE(s));
> +    memory_region_add_subregion(&s->bar, 0, &s->ivshmem);
> +}
> +
> +static void process_msg_disconnect(IVShmemState *s, uint16_t posn)
> +{
> +    IVSHMEM_DPRINTF("posn %d has gone away\n", posn);
> +    close_peer_eventfds(s, posn);
> +}
> +
> +static void process_msg_connect(IVShmemState *s, uint16_t posn, int fd)
> +{
> +    Peer *peer = &s->peers[posn];
> +    int vector;
> +
> +    /*
> +     * The N-th connect message for this peer comes with the file
> +     * descriptor for vector N-1.  Count messages to find the vector.
> +     */
> +    if (peer->nb_eventfds >= s->vectors) {
> +        error_report("Too many eventfd received, device has %d vectors",
> +                     s->vectors);
> +        close(fd);
> +        return;
> +    }
> +    vector = peer->nb_eventfds++;
> +
> +    IVSHMEM_DPRINTF("eventfds[%d][%d] = %d\n", posn, vector, fd);
> +    event_notifier_init_fd(&peer->eventfds[vector], fd);
> +    fcntl_setfl(fd, O_NONBLOCK); /* msix/irqfd poll non block */
> +
> +    if (posn == s->vm_id) {
> +        setup_interrupt(s, vector);
> +    }
> +
> +    if (ivshmem_has_feature(s, IVSHMEM_IOEVENTFD)) {
> +        ivshmem_add_eventfd(s, posn, vector);
> +    }
> +}
> +
> +static void process_msg(IVShmemState *s, int64_t msg, int fd)
> +{
> +    IVSHMEM_DPRINTF("posn is %" PRId64 ", fd is %d\n", msg, fd);
> +
> +    if (msg < -1 || msg > IVSHMEM_MAX_PEERS) {
> +        error_report("server sent invalid message %" PRId64, msg);
> +        close(fd);
> +        return;
> +    }
> +
> +    if (msg == -1) {
> +        process_msg_shmem(s, fd);

The previous code used to close fd if any; it's worth keeping that, IMHO.

> +        return;
> +    }
> +
> +    if (msg >= s->nb_peers) {
> +        resize_peers(s, msg + 1);
> +    }
> +
> +    if (fd >= 0) {
> +        process_msg_connect(s, msg, fd);
> +    } else if (s->vm_id == -1) {
> +        s->vm_id = msg;
> +    } else {
> +        process_msg_disconnect(s, msg);
> +    }
> +}
> +
>  static void ivshmem_read(void *opaque, const uint8_t *buf, int size)
>  {
>      IVShmemState *s = opaque;
>      int incoming_fd;
> -    int new_eventfd;
>      int64_t incoming_posn;
> -    Error *err = NULL;
> -    Peer *peer;
>
>      if (!fifo_update_and_get_i64(s, buf, size, &incoming_posn)) {
>          return;
> @@ -581,96 +673,7 @@ static void ivshmem_read(void *opaque, const uint8_t *buf, int size)
>      IVSHMEM_DPRINTF("posn is %" PRId64 ", fd is %d\n",
>                      incoming_posn, incoming_fd);
>
> -    if (incoming_posn < -1 || incoming_posn > IVSHMEM_MAX_PEERS) {
> -        error_report("server sent invalid message %" PRId64,
> -                     incoming_posn);
> -        if (incoming_fd != -1) {
> -            close(incoming_fd);
> -        }
> -        return;
> -    }
> -
> -    if (incoming_posn >= s->nb_peers) {
> -        resize_peers(s, incoming_posn + 1);
> -    }
> -
> -    peer = &s->peers[incoming_posn];
> -
> -    if (incoming_fd == -1) {
> -        /* if posn is positive and unseen before then this is our posn*/
> -        if (incoming_posn >= 0 && s->vm_id == -1) {
> -            /* receive our posn */
> -            s->vm_id = incoming_posn;
> -        } else {
> -            /* otherwise an fd == -1 means an existing peer has gone away */
> -            IVSHMEM_DPRINTF("posn %" PRId64 " has gone away\n", incoming_posn);
> -            close_peer_eventfds(s, incoming_posn);
> -        }
> -        return;
> -    }
> -
> -    /* if the position is -1, then it's shared memory region fd */
> -    if (incoming_posn == -1) {
> -        void * map_ptr;
> -
> -        if (memory_region_is_mapped(&s->ivshmem)) {
> -            error_report("shm already initialized");
> -            close(incoming_fd);
> -            return;
> -        }
> -
> -        if (check_shm_size(s, incoming_fd, &err) == -1) {
> -            error_report_err(err);
> -            close(incoming_fd);
> -            return;
> -        }
> -
> -        /* mmap the region and map into the BAR2 */
> -        map_ptr = mmap(0, s->ivshmem_size, PROT_READ|PROT_WRITE, MAP_SHARED,
> -                                                            incoming_fd, 0);
> -        if (map_ptr == MAP_FAILED) {
> -            error_report("Failed to mmap shared memory %s", strerror(errno));
> -            close(incoming_fd);
> -            return;
> -        }
> -        memory_region_init_ram_ptr(&s->ivshmem, OBJECT(s),
> -                                   "ivshmem.bar2", s->ivshmem_size, map_ptr);
> -        qemu_set_ram_fd(s->ivshmem.ram_addr, incoming_fd);
> -        vmstate_register_ram(&s->ivshmem, DEVICE(s));
> -
> -        IVSHMEM_DPRINTF("guest h/w addr = %p, size = %" PRIu64 "\n",
> -                        map_ptr, s->ivshmem_size);
> -
> -        memory_region_add_subregion(&s->bar, 0, &s->ivshmem);
> -
> -        return;
> -    }
> -
> -    /* each peer has an associated array of eventfds, and we keep
> -     * track of how many eventfds received so far */
> -    /* get a new eventfd: */
> -    if (peer->nb_eventfds >= s->vectors) {
> -        error_report("Too many eventfd received, device has %d vectors",
> -                     s->vectors);
> -        close(incoming_fd);
> -        return;
> -    }
> -
> -    new_eventfd = peer->nb_eventfds++;
> -
> -    /* this is an eventfd for a particular peer VM */
> -    IVSHMEM_DPRINTF("eventfds[%" PRId64 "][%d] = %d\n", incoming_posn,
> -                    new_eventfd, incoming_fd);
> -    event_notifier_init_fd(&peer->eventfds[new_eventfd], incoming_fd);
> -    fcntl_setfl(incoming_fd, O_NONBLOCK); /* msix/irqfd poll non block */
> -
> -    if (incoming_posn == s->vm_id) {
> -        setup_interrupt(s, new_eventfd);
> -    }
> -
> -    if (ivshmem_has_feature(s, IVSHMEM_IOEVENTFD)) {
> -        ivshmem_add_eventfd(s, incoming_posn, new_eventfd);
> -    }
> +    process_msg(s, incoming_posn, incoming_fd);

while at it, you may want to rename incoming_posn to msg for consistency.

>  }
>
>  static void ivshmem_check_version(void *opaque, const uint8_t * buf, int size)
> --
> 2.4.3
>
>

looks good otherwise

-- 
Marc-André Lureau
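
The dispatch order of the quoted process_msg() can be summarized as a
pure function, which may make the cases easier to review.  This is
only a sketch: ToyMsgKind and toy_classify() are hypothetical names,
and the function mirrors the order of the checks in the patch rather
than replacing any QEMU code.

```c
#include <assert.h>
#include <stdint.h>

typedef enum ToyMsgKind {
    TOY_MSG_INVALID,      /* out of range */
    TOY_MSG_SHMEM,        /* -1: shared memory setup */
    TOY_MSG_OWN_ID,       /* first ID without fd: our own ID */
    TOY_MSG_CONNECT,      /* ID with fd: eventfd for a peer's next vector */
    TOY_MSG_DISCONNECT,   /* later ID without fd: peer has gone away */
} ToyMsgKind;

/* Classify a server message; vm_id is -1 until our own ID is known. */
static ToyMsgKind toy_classify(int64_t msg, int fd, int vm_id)
{
    assert(fd >= -1);     /* -1 means no descriptor came with the message */

    if (msg < -1 || msg > UINT16_MAX) {
        return TOY_MSG_INVALID;
    }
    if (msg == -1) {
        return TOY_MSG_SHMEM;
    }
    if (fd >= 0) {
        return TOY_MSG_CONNECT;
    }
    if (vm_id == -1) {
        return TOY_MSG_OWN_ID;
    }
    return TOY_MSG_DISCONNECT;
}
```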


* Re: [Qemu-devel] [PATCH 18/38] ivshmem: Leave INTx alone when using MSI-X
  2016-03-02 14:15         ` Paolo Bonzini
@ 2016-03-02 15:50           ` Markus Armbruster
  0 siblings, 0 replies; 118+ messages in thread
From: Markus Armbruster @ 2016-03-02 15:50 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: Marc-André Lureau, cam, Claudio Fontana, QEMU, David Marchand

Paolo Bonzini <pbonzini@redhat.com> writes:

> On 02/03/2016 12:04, Markus Armbruster wrote:
>> For better or worse, fallback to INTx has never been implemented in
>> ivshmem.  You can either ask for an INTx-only device (msi=off), or for
>> an MSI-X-only device (msi=on).  The latter *cannot* do interrupts until
>> you enable MSI-X.
>
> Aha, now I see what you mean:
>
>     if (ivshmem_has_feature(s, IVSHMEM_MSI)) {
>         msix_notify(pdev, vector);
>     } else {
>         ivshmem_IntrStatus_write(s, 1);
>     }
>
> So I believe your patch is okay.

I can try to explain it a bit better in the comment and/or commit
message when I respin.

>                                   Perhaps you could also change the
> interrupt pin for new machine types (even without changing the
> revision), but it's not necessary to do it.

I chose not to bother, because after PATCH 34, the non-deprecated
devices are all revision 1 (correct Interrupt Pin register).

>> Similarly, the ivshmem-doorbell device introduced later in this series
>> can only do MSI-X, and the ivshmem-plain device cannot do interrupts at
>> all.
>
> Here:
>
>     dev->config[PCI_INTERRUPT_PIN] = 0x01; /* interrupt pin 1 */

Ah!  I looked only for pci_config_set_interrupt_pin().


* Re: [Qemu-devel] [PATCH 10/38] ivshmem: Compile debug prints unconditionally to prevent bit-rot
  2016-03-02  9:51       ` Markus Armbruster
@ 2016-03-02 15:52         ` Eric Blake
  0 siblings, 0 replies; 118+ messages in thread
From: Eric Blake @ 2016-03-02 15:52 UTC (permalink / raw)
  To: Markus Armbruster
  Cc: Claudio Fontana, David Marchand, QEMU, Marc-André Lureau,
	Paolo Bonzini, cam


On 03/02/2016 02:51 AM, Markus Armbruster wrote:
> Eric Blake <eblake@redhat.com> writes:
> 
>> On 03/01/2016 05:22 AM, Marc-André Lureau wrote:
>>> On Mon, Feb 29, 2016 at 7:40 PM, Markus Armbruster <armbru@redhat.com> wrote:
>>>> Signed-off-by: Markus Armbruster <armbru@redhat.com>
>>>> ---
>>>
>>> Reviewed-by: Marc-André Lureau <marcandre.lureau@redhat.com>
>>>
>>> (apparently, there are other places in qemu where this conversion could be done)
>>
>> Yep. I try to flag them when I see someone touch one, but a global
>> search-and-replace would be a nice beginner's project.
> 
> Would you like to add it to http://wiki.qemu.org/BiteSizedTasks ?
> 

Done.

-- 
Eric Blake   eblake redhat com    +1-919-301-3266
Libvirt virtualization library http://libvirt.org




* Re: [Qemu-devel] [PATCH 21/38] ivshmem: Disentangle ivshmem_read()
  2016-03-02 15:28   ` Marc-André Lureau
@ 2016-03-02 15:53     ` Markus Armbruster
  2016-03-02 17:33       ` Marc-André Lureau
  0 siblings, 1 reply; 118+ messages in thread
From: Markus Armbruster @ 2016-03-02 15:53 UTC (permalink / raw)
  To: Marc-André Lureau
  Cc: Paolo Bonzini, cam, Claudio Fontana, QEMU, David Marchand

Marc-André Lureau <marcandre.lureau@gmail.com> writes:

> Hi
>
> On Mon, Feb 29, 2016 at 7:40 PM, Markus Armbruster <armbru@redhat.com> wrote:
>> Signed-off-by: Markus Armbruster <armbru@redhat.com>
>> ---
>>  hw/misc/ivshmem.c | 189 +++++++++++++++++++++++++++---------------------------
>>  1 file changed, 96 insertions(+), 93 deletions(-)
>>
>> diff --git a/hw/misc/ivshmem.c b/hw/misc/ivshmem.c
>> index 5d33be4..fc46666 100644
>> --- a/hw/misc/ivshmem.c
>> +++ b/hw/misc/ivshmem.c
>> @@ -564,14 +564,106 @@ static void setup_interrupt(IVShmemState *s, int vector)
>>      }
>>  }
>>
>> +static void process_msg_shmem(IVShmemState *s, int fd)
>> +{
>> +    Error *err = NULL;
>> +    void *ptr;
>> +
>> +    if (memory_region_is_mapped(&s->ivshmem)) {
>> +        error_report("shm already initialized");
>> +        close(fd);
>> +        return;
>> +    }
>> +
>> +    if (check_shm_size(s, fd, &err) == -1) {
>> +        error_report_err(err);
>> +        close(fd);
>> +        return;
>> +    }
>> +
>> +    /* mmap the region and map into the BAR2 */
>> +    ptr = mmap(0, s->ivshmem_size, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
>> +    if (ptr == MAP_FAILED) {
>> +        error_report("Failed to mmap shared memory %s", strerror(errno));
>> +        close(fd);
>> +        return;
>> +    }
>> +    memory_region_init_ram_ptr(&s->ivshmem, OBJECT(s),
>> +                               "ivshmem.bar2", s->ivshmem_size, ptr);
>> +    qemu_set_ram_fd(s->ivshmem.ram_addr, fd);
>> +    vmstate_register_ram(&s->ivshmem, DEVICE(s));
>> +    memory_region_add_subregion(&s->bar, 0, &s->ivshmem);
>> +}
>> +
>> +static void process_msg_disconnect(IVShmemState *s, uint16_t posn)
>> +{
>> +    IVSHMEM_DPRINTF("posn %d has gone away\n", posn);
>> +    close_peer_eventfds(s, posn);
>> +}
>> +
>> +static void process_msg_connect(IVShmemState *s, uint16_t posn, int fd)
>> +{
>> +    Peer *peer = &s->peers[posn];
>> +    int vector;
>> +
>> +    /*
>> +     * The N-th connect message for this peer comes with the file
>> +     * descriptor for vector N-1.  Count messages to find the vector.
>> +     */
>> +    if (peer->nb_eventfds >= s->vectors) {
>> +        error_report("Too many eventfd received, device has %d vectors",
>> +                     s->vectors);
>> +        close(fd);
>> +        return;
>> +    }
>> +    vector = peer->nb_eventfds++;
>> +
>> +    IVSHMEM_DPRINTF("eventfds[%d][%d] = %d\n", posn, vector, fd);
>> +    event_notifier_init_fd(&peer->eventfds[vector], fd);
>> +    fcntl_setfl(fd, O_NONBLOCK); /* msix/irqfd poll non block */
>> +
>> +    if (posn == s->vm_id) {
>> +        setup_interrupt(s, vector);
>> +    }
>> +
>> +    if (ivshmem_has_feature(s, IVSHMEM_IOEVENTFD)) {
>> +        ivshmem_add_eventfd(s, posn, vector);
>> +    }
>> +}
>> +
>> +static void process_msg(IVShmemState *s, int64_t msg, int fd)
>> +{
>> +    IVSHMEM_DPRINTF("posn is %" PRId64 ", fd is %d\n", msg, fd);
>> +
>> +    if (msg < -1 || msg > IVSHMEM_MAX_PEERS) {
>> +        error_report("server sent invalid message %" PRId64, msg);
>> +        close(fd);
>> +        return;
>> +    }
>> +
>> +    if (msg == -1) {
>> +        process_msg_shmem(s, fd);
>
> The previous code used to close fd if any; it's worth keeping that, IMHO.

I'm blind.  Where?

>> +        return;
>> +    }
>> +
>> +    if (msg >= s->nb_peers) {
>> +        resize_peers(s, msg + 1);
>> +    }
>> +
>> +    if (fd >= 0) {
>> +        process_msg_connect(s, msg, fd);
>> +    } else if (s->vm_id == -1) {
>> +        s->vm_id = msg;
>> +    } else {
>> +        process_msg_disconnect(s, msg);
>> +    }
>> +}
>> +
>>  static void ivshmem_read(void *opaque, const uint8_t *buf, int size)
>>  {
>>      IVShmemState *s = opaque;
>>      int incoming_fd;
>> -    int new_eventfd;
>>      int64_t incoming_posn;
>> -    Error *err = NULL;
>> -    Peer *peer;
>>
>>      if (!fifo_update_and_get_i64(s, buf, size, &incoming_posn)) {
>>          return;
>> @@ -581,96 +673,7 @@ static void ivshmem_read(void *opaque, const uint8_t *buf, int size)
>>      IVSHMEM_DPRINTF("posn is %" PRId64 ", fd is %d\n",
>>                      incoming_posn, incoming_fd);
>>
>> -    if (incoming_posn < -1 || incoming_posn > IVSHMEM_MAX_PEERS) {
>> -        error_report("server sent invalid message %" PRId64,
>> -                     incoming_posn);
>> -        if (incoming_fd != -1) {
>> -            close(incoming_fd);
>> -        }
>> -        return;
>> -    }
>> -
>> -    if (incoming_posn >= s->nb_peers) {
>> -        resize_peers(s, incoming_posn + 1);
>> -    }
>> -
>> -    peer = &s->peers[incoming_posn];
>> -
>> -    if (incoming_fd == -1) {
>> -        /* if posn is positive and unseen before then this is our posn*/
>> -        if (incoming_posn >= 0 && s->vm_id == -1) {
>> -            /* receive our posn */
>> -            s->vm_id = incoming_posn;
>> -        } else {
>> -            /* otherwise an fd == -1 means an existing peer has gone away */
>> -            IVSHMEM_DPRINTF("posn %" PRId64 " has gone away\n", incoming_posn);
>> -            close_peer_eventfds(s, incoming_posn);
>> -        }
>> -        return;
>> -    }
>> -
>> -    /* if the position is -1, then it's shared memory region fd */
>> -    if (incoming_posn == -1) {
>> -        void * map_ptr;
>> -
>> -        if (memory_region_is_mapped(&s->ivshmem)) {
>> -            error_report("shm already initialized");
>> -            close(incoming_fd);
>> -            return;
>> -        }
>> -
>> -        if (check_shm_size(s, incoming_fd, &err) == -1) {
>> -            error_report_err(err);
>> -            close(incoming_fd);
>> -            return;
>> -        }
>> -
>> -        /* mmap the region and map into the BAR2 */
>> -        map_ptr = mmap(0, s->ivshmem_size, PROT_READ|PROT_WRITE, MAP_SHARED,
>> -                                                            incoming_fd, 0);
>> -        if (map_ptr == MAP_FAILED) {
>> -            error_report("Failed to mmap shared memory %s", strerror(errno));
>> -            close(incoming_fd);
>> -            return;
>> -        }
>> -        memory_region_init_ram_ptr(&s->ivshmem, OBJECT(s),
>> -                                   "ivshmem.bar2", s->ivshmem_size, map_ptr);
>> -        qemu_set_ram_fd(s->ivshmem.ram_addr, incoming_fd);
>> -        vmstate_register_ram(&s->ivshmem, DEVICE(s));
>> -
>> -        IVSHMEM_DPRINTF("guest h/w addr = %p, size = %" PRIu64 "\n",
>> -                        map_ptr, s->ivshmem_size);
>> -
>> -        memory_region_add_subregion(&s->bar, 0, &s->ivshmem);
>> -
>> -        return;
>> -    }
>> -
>> -    /* each peer has an associated array of eventfds, and we keep
>> -     * track of how many eventfds received so far */
>> -    /* get a new eventfd: */
>> -    if (peer->nb_eventfds >= s->vectors) {
>> -        error_report("Too many eventfd received, device has %d vectors",
>> -                     s->vectors);
>> -        close(incoming_fd);
>> -        return;
>> -    }
>> -
>> -    new_eventfd = peer->nb_eventfds++;
>> -
>> -    /* this is an eventfd for a particular peer VM */
>> -    IVSHMEM_DPRINTF("eventfds[%" PRId64 "][%d] = %d\n", incoming_posn,
>> -                    new_eventfd, incoming_fd);
>> -    event_notifier_init_fd(&peer->eventfds[new_eventfd], incoming_fd);
>> -    fcntl_setfl(incoming_fd, O_NONBLOCK); /* msix/irqfd poll non block */
>> -
>> -    if (incoming_posn == s->vm_id) {
>> -        setup_interrupt(s, new_eventfd);
>> -    }
>> -
>> -    if (ivshmem_has_feature(s, IVSHMEM_IOEVENTFD)) {
>> -        ivshmem_add_eventfd(s, incoming_posn, new_eventfd);
>> -    }
>> +    process_msg(s, incoming_posn, incoming_fd);
>
> while at it, you may want to rename incoming_posn to msg for consistency.

I didn't to minimize churn, but you're probably right.  I'll try and see
how the diff comes out.

>>  }
>>
>>  static void ivshmem_check_version(void *opaque, const uint8_t * buf, int size)
>> --
>> 2.4.3
>>
>>
>
> looks good otherwise

Thanks!


* Re: [Qemu-devel] [PATCH 21/38] ivshmem: Disentangle ivshmem_read()
  2016-03-02 15:53     ` Markus Armbruster
@ 2016-03-02 17:33       ` Marc-André Lureau
  2016-03-02 19:15         ` Markus Armbruster
  0 siblings, 1 reply; 118+ messages in thread
From: Marc-André Lureau @ 2016-03-02 17:33 UTC (permalink / raw)
  To: Markus Armbruster
  Cc: Paolo Bonzini, cam, Claudio Fontana, QEMU, David Marchand

On Wed, Mar 2, 2016 at 4:53 PM, Markus Armbruster <armbru@redhat.com> wrote:
>>> +    if (msg == -1) {
>>> +        process_msg_shmem(s, fd);
>>
>> The previous code used to close fd if any; it's worth keeping that, IMHO.
>
> I'm blind.  Where?

Sorry, I looked at the wrong place; it seems you got them all.

    if (msg < -1 || msg > IVSHMEM_MAX_PEERS) {
        error_report("server sent invalid message %" PRId64, msg);
        close(fd);
        return;
    }


However, why not keep the "if (fd != -1)" check here?  Calling close()
on -1 is not a great idea.

-- 
Marc-André Lureau
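
A guarded close along the lines suggested here could look like the
sketch below.  toy_close_if_valid() is a hypothetical helper, not
something in the patch; it assumes -1 is the only "no descriptor"
value, as in the quoted code.  Calling close(-1) merely fails with
EBADF, so the guard is about clarity rather than correctness.

```c
#include <assert.h>
#include <stdbool.h>
#include <unistd.h>

/*
 * Close fd only if a descriptor actually came with the message.
 * Returns true if close() was called.
 */
static bool toy_close_if_valid(int fd)
{
    assert(fd >= -1);
    if (fd == -1) {
        return false;   /* nothing to close */
    }
    close(fd);
    return true;
}
```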


* Re: [Qemu-devel] [PATCH 22/38] ivshmem: Plug leaks on unplug, fix peer disconnect
  2016-02-29 18:40 ` [Qemu-devel] [PATCH 22/38] ivshmem: Plug leaks on unplug, fix peer disconnect Markus Armbruster
@ 2016-03-02 17:47   ` Marc-André Lureau
  2016-03-02 19:19     ` Markus Armbruster
  0 siblings, 1 reply; 118+ messages in thread
From: Marc-André Lureau @ 2016-03-02 17:47 UTC (permalink / raw)
  To: Markus Armbruster
  Cc: Paolo Bonzini, cam, Claudio Fontana, QEMU, David Marchand

Hi

On Mon, Feb 29, 2016 at 7:40 PM, Markus Armbruster <armbru@redhat.com> wrote:
> close_peer_eventfds() cleans up three things: ioeventfd triggers if
> they exist, eventfds, and the array to store them.
>
> Commit 98609cd (v1.2.0) fixed it not to clean up ioeventfd triggers
> when they don't exist (property ioeventfd=off, which is the default).
> Unfortunately, the fix also made it skip cleanup of the eventfds and
> the array then.  This is a memory and file descriptor leak on unplug.
>
> Additionally, the reset of nb_eventfds is skipped.  Doesn't matter on
> unplug.  On peer disconnect, however, this permanently wedges the
> interrupt vectors used for that peer's ID.  The eventfds stay behind,
> but aren't connected to a peer anymore.  When the ID gets recycled for
> a new peer, the new peer's eventfds get assigned to vectors after the
> old ones.  Commonly, the device's number of vectors matches the
> server's, so the new ones get dropped with a "Too many eventfd
> received" message.  Interrupts either don't work (common case) or go
> to the wrong vector.
>
> Fix by narrowing the conditional to just the ioeventfd trigger
> cleanup.
>
> While there, move the "invalid" peer check to the only caller where it
> can actually happen.
>
> Cc: Paolo Bonzini <pbonzini@redhat.com>
> Signed-off-by: Markus Armbruster <armbru@redhat.com>
> ---
>  hw/misc/ivshmem.c | 24 ++++++++++++------------
>  1 file changed, 12 insertions(+), 12 deletions(-)
>
> diff --git a/hw/misc/ivshmem.c b/hw/misc/ivshmem.c
> index fc46666..c366087 100644
> --- a/hw/misc/ivshmem.c
> +++ b/hw/misc/ivshmem.c
> @@ -428,21 +428,17 @@ static void close_peer_eventfds(IVShmemState *s, int posn)
>  {
>      int i, n;
>
> -    if (!ivshmem_has_feature(s, IVSHMEM_IOEVENTFD)) {
> -        return;
> -    }
> -    if (posn < 0 || posn >= s->nb_peers) {
> -        error_report("invalid peer %d", posn);
> -        return;
> -    }
> -
> +    assert(posn >= 0 && posn < s->nb_peers);
>      n = s->peers[posn].nb_eventfds;
>
> -    memory_region_transaction_begin();
> -    for (i = 0; i < n; i++) {
> -        ivshmem_del_eventfd(s, posn, i);
> +    if (ivshmem_has_feature(s, IVSHMEM_IOEVENTFD)) {
> +        memory_region_transaction_begin();
> +        for (i = 0; i < n; i++) {
> +            ivshmem_del_eventfd(s, posn, i);
> +        }
> +        memory_region_transaction_commit();
>      }
> -    memory_region_transaction_commit();
> +
>      for (i = 0; i < n; i++) {
>          event_notifier_cleanup(&s->peers[posn].eventfds[i]);
>      }

Looks good.  That makes me wonder: what would happen if posn == vm_id?
I think this should either be made an invalid condition, or it should
revert setup_interrupt().

> @@ -598,6 +594,10 @@ static void process_msg_shmem(IVShmemState *s, int fd)
>  static void process_msg_disconnect(IVShmemState *s, uint16_t posn)
>  {
>      IVSHMEM_DPRINTF("posn %d has gone away\n", posn);
> +    if (posn >= s->nb_peers) {
> +        error_report("invalid peer %d", posn);
> +        return;
> +    }
>      close_peer_eventfds(s, posn);
>  }
>
> --
> 2.4.3
>
>



-- 
Marc-André Lureau


* Re: [Qemu-devel] [PATCH 23/38] ivshmem: Receive shared memory synchronously in realize()
  2016-02-29 18:40 ` [Qemu-devel] [PATCH 23/38] ivshmem: Receive shared memory synchronously in realize() Markus Armbruster
@ 2016-03-02 18:11   ` Marc-André Lureau
  2016-03-02 19:28     ` Markus Armbruster
  0 siblings, 1 reply; 118+ messages in thread
From: Marc-André Lureau @ 2016-03-02 18:11 UTC (permalink / raw)
  To: Markus Armbruster
  Cc: Paolo Bonzini, cam, Claudio Fontana, QEMU, David Marchand

Hi

On Mon, Feb 29, 2016 at 7:40 PM, Markus Armbruster <armbru@redhat.com> wrote:
> When configured for interrupts (property "chardev" given), we receive
> the shared memory from an ivshmem server.  We do so asynchronously
> after realize() completes, by setting up callbacks with
> qemu_chr_add_handlers().
>
> Keeping server I/O out of realize() that way avoids delays due to a
> slow server.  This is probably relevant only for hot plug.
>
> However, this funny "no shared memory, yet" state of the device also
> causes a raft of issues that are hard or impossible to work around:
>
> * The guest is exposed to this state: when we enter and leave it, its
>   shared memory contents are abruptly replaced, and device register
>   IVPosition changes.
>
>   This is a known issue.  We document that guests should not access
>   the shared memory after device initialization until the IVPosition
>   register becomes non-negative.
>
>   For cold plug, the funny state is unlikely to be visible in
>   practice, because we normally receive the shared memory long before
>   the guest gets around to messing with the device.
>
>   For hot plug, the timing is tighter, but the relative slowness of
>   PCI device configuration has a good chance to hide the funny state.
>
>   In either case, guests complying with the documented procedure are
>   safe.
>
> * Migration becomes racy.
>
>   If migration completes before the shared memory setup completes on
>   the source, shared memory contents is silently lost.  Fortunately,
>   migration is rather unlikely to win this race.
>
>   If the shared memory's ramblock arrives at the destination before
>   shared memory setup completes, migration fails.
>
>   There is no known way for a management application to wait for
>   shared memory setup to complete.
>
>   All you can do is retry failed migration.  You can improve your
>   chances by leaving more time between running the destination QEMU
>   and the migrate command.
>
>   To mitigate silent memory loss, you need to ensure the server
>   initializes shared memory exactly the same on source and
>   destination.
>
>   These issues are entirely undocumented so far.
>
> I'd expect the server to be almost always fast enough to hide these
> issues.  But then rare catastrophic races are in a way the worst kind.
>
> This is way more trouble than I'm willing to take from any device.
> Kill the funny state by receiving shared memory synchronously in
> realize().  If your hot plug hangs, go kill your ivshmem server.
>
> For easier review, this commit only makes the receive synchronous, it
> doesn't add the necessary error propagation.  Without that, the funny
> state persists.  The next commit will do that, and kill it off for
> real.
>
> Signed-off-by: Markus Armbruster <armbru@redhat.com>
> ---
>  hw/misc/ivshmem.c    | 70 +++++++++++++++++++++++++++++++++++++---------------
>  tests/ivshmem-test.c | 26 ++++++-------------
>  2 files changed, 57 insertions(+), 39 deletions(-)
>
> diff --git a/hw/misc/ivshmem.c b/hw/misc/ivshmem.c
> index c366087..352937f 100644
> --- a/hw/misc/ivshmem.c
> +++ b/hw/misc/ivshmem.c
> @@ -676,27 +676,47 @@ static void ivshmem_read(void *opaque, const uint8_t *buf, int size)
>      process_msg(s, incoming_posn, incoming_fd);
>  }
>
> -static void ivshmem_check_version(void *opaque, const uint8_t * buf, int size)
> +static int64_t ivshmem_recv_msg(IVShmemState *s, int *pfd)
>  {
> -    IVShmemState *s = opaque;
> -    int tmp;
> -    int64_t version;
> +    int64_t msg;
> +    int n, ret;
>
> -    if (!fifo_update_and_get_i64(s, buf, size, &version)) {
> -        return;
> -    }
> +    n = 0;
> +    do {
> +        ret = qemu_chr_fe_read_all(s->server_chr, (uint8_t *)&msg + n,
> +                                 sizeof(msg) - n);
> +        if (ret < 0 && ret != -EINTR) {
> +            /* TODO error handling */
> +            return INT64_MIN;
> +        }
> +        n += ret;
> +    } while (n < sizeof(msg));
>
> -    tmp = qemu_chr_fe_get_msgfd(s->server_chr);
> -    if (tmp != -1 || version != IVSHMEM_PROTOCOL_VERSION) {
> +    *pfd = qemu_chr_fe_get_msgfd(s->server_chr);
> +    return msg;
> +}
> +
> +static void ivshmem_recv_setup(IVShmemState *s)
> +{
> +    int64_t msg;
> +    int fd;
> +
> +    msg = ivshmem_recv_msg(s, &fd);
> +    if (fd != -1 || msg != IVSHMEM_PROTOCOL_VERSION) {
>          fprintf(stderr, "incompatible version, you are connecting to a ivshmem-"
>                  "server using a different protocol please check your setup\n");
> -        qemu_chr_add_handlers(s->server_chr, NULL, NULL, NULL, s);
>          return;
>      }
>
> -    IVSHMEM_DPRINTF("version check ok, switch to real chardev handler\n");
> -    qemu_chr_add_handlers(s->server_chr, ivshmem_can_receive, ivshmem_read,
> -                          NULL, s);
> +    /*
> +     * Receive more messages until we got shared memory.
> +     */
> +    do {
> +        msg = ivshmem_recv_msg(s, &fd);
> +        process_msg(s, msg, fd);
> +    } while (msg != -1);
> +
> +    assert(memory_region_is_mapped(&s->ivshmem));

This assertion is easy to trigger at runtime; I suggest reporting an
error instead.

looks good otherwise
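
To illustrate the suggestion, here is a small self-contained model of
the setup loop that returns an error where the patch asserts.  All
names (struct sketch_dev, sketch_process_msg(), sketch_recv_setup())
are hypothetical, and the map_ok flag merely models a failing mmap()
or size check.  Since this patch does not yet propagate errors, the
loop can end on the shared memory message without anything being
mapped:

```c
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical stand-in for the device state; not QEMU's IVShmemState. */
struct sketch_dev {
    bool shmem_mapped;      /* models memory_region_is_mapped(&s->ivshmem) */
};

/*
 * Model of processing one setup message.  Message -1 carries the shared
 * memory; @map_ok says whether mapping it succeeds.
 */
static void sketch_process_msg(struct sketch_dev *d, int64_t msg, bool map_ok)
{
    if (msg == -1 && map_ok) {
        d->shmem_mapped = true;
    }
}

/*
 * Process up to @n messages, stopping after the shared memory message.
 * Returns 0 on success, -1 if shared memory did not get mapped.
 */
static int sketch_recv_setup(struct sketch_dev *d, const int64_t *msgs,
                             int n, bool map_ok)
{
    int i;

    for (i = 0; i < n; i++) {
        sketch_process_msg(d, msgs[i], map_ok);
        if (msgs[i] == -1) {
            break;
        }
    }

    /* Report an error here instead of asserting. */
    if (!d->shmem_mapped) {
        fprintf(stderr, "server did not provide usable shared memory\n");
        return -1;
    }
    return 0;
}
```

With error propagation in place, the failure would travel up to
realize() rather than being printed here.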

>  }
>
>  /* Select the MSI-X vectors used by device.
> @@ -903,19 +923,29 @@ static void pci_ivshmem_realize(PCIDevice *dev, Error **errp)
>          IVSHMEM_DPRINTF("using shared memory server (socket = %s)\n",
>                          s->server_chr->filename);
>
> -        if (ivshmem_setup_interrupts(s) < 0) {
> -            error_setg(errp, "failed to initialize interrupts");
> -            return;
> -        }
> -
>          /* we allocate enough space for 16 peers and grow as needed */
>          resize_peers(s, 16);
>          s->vm_id = -1;
>
>          pci_register_bar(dev, 2, attr, &s->bar);
>
> -        qemu_chr_add_handlers(s->server_chr, ivshmem_can_receive,
> -                              ivshmem_check_version, NULL, s);
> +        /*
> +         * Receive setup messages from server synchronously.
> +         * Older versions did it asynchronously, but that creates a
> +         * number of entertaining race conditions.
> +         * TODO Propagate errors!  Without that, we still have races
> +         * on errors.
> +         */
> +        ivshmem_recv_setup(s);
> +        if (memory_region_is_mapped(&s->ivshmem)) {
> +            qemu_chr_add_handlers(s->server_chr, ivshmem_can_receive,
> +                                  ivshmem_read, NULL, s);
> +        }
> +
> +        if (ivshmem_setup_interrupts(s) < 0) {
> +            error_setg(errp, "failed to initialize interrupts");
> +            return;
> +        }
>      } else {
>          /* just map the file immediately, we're not using a server */
>          int fd;
> diff --git a/tests/ivshmem-test.c b/tests/ivshmem-test.c
> index c1dd7bb..68d6840 100644
> --- a/tests/ivshmem-test.c
> +++ b/tests/ivshmem-test.c
> @@ -309,35 +309,23 @@ static void test_ivshmem_server(bool msi)
>      ret = ivshmem_server_start(&server);
>      g_assert_cmpint(ret, ==, 0);
>
> -    setup_vm_with_server(&state1, nvectors, msi);
> -    s1 = &state1;
> -    setup_vm_with_server(&state2, nvectors, msi);
> -    s2 = &state2;
> -
> -    /* check state before server sends stuff */
> -    g_assert_cmpuint(in_reg(s1, IVPOSITION), ==, 0xffffffff);
> -    g_assert_cmpuint(in_reg(s2, IVPOSITION), ==, 0xffffffff);
> -    g_assert_cmpuint(qtest_readb(s1->qtest, (uintptr_t)s1->mem_base), ==, 0x00);
> -
>      thread.server = &server;
>      ret = pipe(thread.pipe);
>      g_assert_cmpint(ret, ==, 0);
>      thread.thread = g_thread_new("ivshmem-server", server_thread, &thread);
>      g_assert(thread.thread != NULL);
>
> -    /* waiting for devices to become operational */
> -    while (g_get_monotonic_time() < end_time) {
> -        g_usleep(1000);
> -        if ((int)in_reg(s1, IVPOSITION) >= 0 &&
> -            (int)in_reg(s2, IVPOSITION) >= 0) {
> -            break;
> -        }
> -    }
> +    setup_vm_with_server(&state1, nvectors, msi);
> +    s1 = &state1;
> +    setup_vm_with_server(&state2, nvectors, msi);
> +    s2 = &state2;
>
>      /* check got different VM ids */
>      vm1 = in_reg(s1, IVPOSITION);
>      vm2 = in_reg(s2, IVPOSITION);
> -    g_assert_cmpuint(vm1, !=, vm2);
> +    g_assert_cmpint(vm1, >=, 0);
> +    g_assert_cmpint(vm2, >=, 0);
> +    g_assert_cmpint(vm1, !=, vm2);
>
>      /* check number of MSI-X vectors */
>      global_qtest = s1->qtest;
> --
> 2.4.3
>
>



-- 
Marc-André Lureau


* Re: [Qemu-devel] [PATCH 24/38] ivshmem: Propagate errors through ivshmem_recv_setup()
  2016-02-29 18:40 ` [Qemu-devel] [PATCH 24/38] ivshmem: Propagate errors through ivshmem_recv_setup() Markus Armbruster
@ 2016-03-02 18:27   ` Marc-André Lureau
  2016-03-02 19:35     ` Markus Armbruster
  0 siblings, 1 reply; 118+ messages in thread
From: Marc-André Lureau @ 2016-03-02 18:27 UTC (permalink / raw)
  To: Markus Armbruster
  Cc: Paolo Bonzini, cam, Claudio Fontana, QEMU, David Marchand

On Mon, Feb 29, 2016 at 7:40 PM, Markus Armbruster <armbru@redhat.com> wrote:
> This kills off the funny state described in the previous commit.
>
> Simplify ivshmem_io_read() accordingly, and update documentation.
>
> Signed-off-by: Markus Armbruster <armbru@redhat.com>
> ---
>  docs/specs/ivshmem-spec.txt |  20 ++++----
>  hw/misc/ivshmem.c           | 121 +++++++++++++++++++++++++++-----------------
>  qemu-doc.texi               |   9 +---
>  3 files changed, 87 insertions(+), 63 deletions(-)
>
> diff --git a/docs/specs/ivshmem-spec.txt b/docs/specs/ivshmem-spec.txt
> index 4fc6f37..3eb8c97 100644
> --- a/docs/specs/ivshmem-spec.txt
> +++ b/docs/specs/ivshmem-spec.txt
> @@ -62,11 +62,11 @@ There are two ways to use this device:
>    likely want to write a kernel driver to handle interrupts.  Requires
>    the device to be configured for interrupts, obviously.
>
> -If the device is configured for interrupts, BAR2 is initially invalid.
> -It becomes safely accessible only after the ivshmem server provided
> -the shared memory.  Guest software should wait for the IVPosition
> -register (described below) to become non-negative before accessing
> -BAR2.
> +Before QEMU 2.6.0, BAR2 can initially be invalid if the device is
> +configured for interrupts.  It becomes safely accessible only after
> +the ivshmem server provided the shared memory.  Guest software should
> +wait for the IVPosition register (described below) to become
> +non-negative before accessing BAR2.
>
>  The device is not capable to tell guest software whether it is
>  configured for interrupts.
> @@ -82,7 +82,7 @@ BAR 0 contains the following registers:
>          4     4   read/write        0   Interrupt Status
>                                          bit 0: peer interrupt
>                                          bit 1..31: reserved
> -        8     4   read-only   0 or -1   IVPosition
> +        8     4   read-only   0 or ID   IVPosition
>         12     4   write-only      N/A   Doorbell
>                                          bit 0..15: vector
>                                          bit 16..31: peer ID
> @@ -100,12 +100,14 @@ when an interrupt request from a peer is received.  Reading the
>  register clears it.
>
>  IVPosition Register: if the device is not configured for interrupts,
> -this is zero.  Else, it's -1 for a short while after reset, then
> -changes to the device's ID (between 0 and 65535).
> +this is zero.  Else, it is the device's ID (between 0 and 65535).
> +
> +Before QEMU 2.6.0, the register may read -1 for a short while after
> +reset.
>
>  There is no good way for software to find out whether the device is
>  configured for interrupts.  A positive IVPosition means interrupts,
> -but zero could be either.  The initial -1 cannot be reliably observed.
> +but zero could be either.
>
>  Doorbell Register: writing this register requests to interrupt a peer.
>  The written value's high 16 bits are the ID of the peer to interrupt,
> diff --git a/hw/misc/ivshmem.c b/hw/misc/ivshmem.c
> index 352937f..831da53 100644
> --- a/hw/misc/ivshmem.c
> +++ b/hw/misc/ivshmem.c
> @@ -234,12 +234,7 @@ static uint64_t ivshmem_io_read(void *opaque, hwaddr addr,
>              break;
>
>          case IVPOSITION:
> -            /* return my VM ID if the memory is mapped */
> -            if (memory_region_is_mapped(&s->ivshmem)) {
> -                ret = s->vm_id;
> -            } else {
> -                ret = -1;
> -            }
> +            ret = s->vm_id;
>              break;
>
>          default:
> @@ -511,7 +506,8 @@ static bool fifo_update_and_get_i64(IVShmemState *s,
>      return false;
>  }
>
> -static int ivshmem_add_kvm_msi_virq(IVShmemState *s, int vector)
> +static void ivshmem_add_kvm_msi_virq(IVShmemState *s, int vector,
> +                                     Error **errp)

I would prefer to return -1 in case of error, even though an Error is
also set through errp.

>  {
>      PCIDevice *pdev = PCI_DEVICE(s);
>      MSIMessage msg = msix_get_message(pdev, vector);
> @@ -522,22 +518,21 @@ static int ivshmem_add_kvm_msi_virq(IVShmemState *s, int vector)
>
>      ret = kvm_irqchip_add_msi_route(kvm_state, msg, pdev);
>      if (ret < 0) {
> -        error_report("ivshmem: kvm_irqchip_add_msi_route failed");
> -        return -1;
> +        error_setg(errp, "kvm_irqchip_add_msi_route failed");
> +        return;
>      }
>
>      s->msi_vectors[vector].virq = ret;
>      s->msi_vectors[vector].pdev = pdev;
> -
> -    return 0;
>  }
>
> -static void setup_interrupt(IVShmemState *s, int vector)
> +static void setup_interrupt(IVShmemState *s, int vector, Error **errp)
>  {
>      EventNotifier *n = &s->peers[s->vm_id].eventfds[vector];
>      bool with_irqfd = kvm_msi_via_irqfd_enabled() &&
>          ivshmem_has_feature(s, IVSHMEM_MSI);
>      PCIDevice *pdev = PCI_DEVICE(s);
> +    Error *err = NULL;
>
>      IVSHMEM_DPRINTF("setting up interrupt for vector: %d\n", vector);
>
> @@ -546,13 +541,16 @@ static void setup_interrupt(IVShmemState *s, int vector)
>          watch_vector_notifier(s, n, vector);
>      } else if (msix_enabled(pdev)) {
>          IVSHMEM_DPRINTF("with irqfd\n");
> -        if (ivshmem_add_kvm_msi_virq(s, vector) < 0) {
> +        ivshmem_add_kvm_msi_virq(s, vector, &err);
> +        if (err) {
> +            error_propagate(errp, err);
>              return;

Returning a status would make this simpler, avoiding the local err
variable and the error_propagate() call. But you seem to prefer this
form. I don't know if there are any conventions here (I am used to
GLib conventions, where a bool success value is usually returned even
if the function takes a GError).
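
The two styles can be compared in a self-contained sketch.  It uses a
plain "const char **" as a stand-in for QEMU's Error ** (or GLib's
GError **), and every function name below is hypothetical:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Style 1: void function; failure is visible only through *errp. */
static void add_route_void(int vector, const char **errp)
{
    assert(errp != NULL);       /* callers must pass somewhere to store it */
    if (vector < 0) {
        *errp = "invalid vector";
    }
}

/* Style 2: additionally return a status the caller can test directly. */
static bool add_route_bool(int vector, const char **errp)
{
    if (vector < 0) {
        if (errp) {
            *errp = "invalid vector";
        }
        return false;
    }
    return true;
}

/* Caller for style 1: needs a local error and an explicit hand-over. */
static bool caller_style1(int vector, const char **errp)
{
    const char *err = NULL;

    add_route_void(vector, &err);
    if (err) {
        *errp = err;            /* models error_propagate() */
        return false;
    }
    return true;
}

/* Caller for style 2: passes errp straight through. */
static bool caller_style2(int vector, const char **errp)
{
    return add_route_bool(vector, errp);
}
```

Both callers behave the same from the outside; the difference is only
in how much bookkeeping each needs.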

>
>          if (!msix_is_masked(pdev, vector)) {
>              kvm_irqchip_add_irqfd_notifier_gsi(kvm_state, n, NULL,
>                                                 s->msi_vectors[vector].virq);
> +            /* TODO handle error */
>          }
>      } else {
>          /* it will be delayed until msix is enabled, in write_config */
> @@ -560,19 +558,19 @@ static void setup_interrupt(IVShmemState *s, int vector)
>      }
>  }
>
> -static void process_msg_shmem(IVShmemState *s, int fd)
> +static void process_msg_shmem(IVShmemState *s, int fd, Error **errp)
>  {
>      Error *err = NULL;
>      void *ptr;
>
>      if (memory_region_is_mapped(&s->ivshmem)) {
> -        error_report("shm already initialized");
> +        error_setg(errp, "server sent unexpected shared memory message");
>          close(fd);
>          return;
>      }
>
>      if (check_shm_size(s, fd, &err) == -1) {
> -        error_report_err(err);
> +        error_propagate(errp, err);
>          close(fd);
>          return;
>      }
> @@ -580,7 +578,7 @@ static void process_msg_shmem(IVShmemState *s, int fd)
>      /* mmap the region and map into the BAR2 */
>      ptr = mmap(0, s->ivshmem_size, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
>      if (ptr == MAP_FAILED) {
> -        error_report("Failed to mmap shared memory %s", strerror(errno));
> +        error_setg_errno(errp, errno, "Failed to mmap shared memory");
>          close(fd);
>          return;
>      }
> @@ -591,17 +589,19 @@ static void process_msg_shmem(IVShmemState *s, int fd)
>      memory_region_add_subregion(&s->bar, 0, &s->ivshmem);
>  }
>
> -static void process_msg_disconnect(IVShmemState *s, uint16_t posn)
> +static void process_msg_disconnect(IVShmemState *s, uint16_t posn,
> +                                   Error **errp)
>  {
>      IVSHMEM_DPRINTF("posn %d has gone away\n", posn);
>      if (posn >= s->nb_peers) {
> -        error_report("invalid peer %d", posn);
> +        error_setg(errp, "invalid peer %d", posn);
>          return;
>      }
>      close_peer_eventfds(s, posn);
>  }
>
> -static void process_msg_connect(IVShmemState *s, uint16_t posn, int fd)
> +static void process_msg_connect(IVShmemState *s, uint16_t posn, int fd,
> +                                Error **errp)
>  {
>      Peer *peer = &s->peers[posn];
>      int vector;
> @@ -611,8 +611,8 @@ static void process_msg_connect(IVShmemState *s, uint16_t posn, int fd)
>       * descriptor for vector N-1.  Count messages to find the vector.
>       */
>      if (peer->nb_eventfds >= s->vectors) {
> -        error_report("Too many eventfd received, device has %d vectors",
> -                     s->vectors);
> +        error_setg(errp, "Too many eventfd received, device has %d vectors",
> +                   s->vectors);
>          close(fd);
>          return;
>      }
> @@ -623,7 +623,8 @@ static void process_msg_connect(IVShmemState *s, uint16_t posn, int fd)
>      fcntl_setfl(fd, O_NONBLOCK); /* msix/irqfd poll non block */
>
>      if (posn == s->vm_id) {
> -        setup_interrupt(s, vector);
> +        setup_interrupt(s, vector, errp);
> +        /* TODO do we need to handle the error? */
>      }
>
>      if (ivshmem_has_feature(s, IVSHMEM_IOEVENTFD)) {
> @@ -631,18 +632,18 @@ static void process_msg_connect(IVShmemState *s, uint16_t posn, int fd)
>      }
>  }
>
> -static void process_msg(IVShmemState *s, int64_t msg, int fd)
> +static void process_msg(IVShmemState *s, int64_t msg, int fd, Error **errp)
>  {
>      IVSHMEM_DPRINTF("posn is %" PRId64 ", fd is %d\n", msg, fd);
>
>      if (msg < -1 || msg > IVSHMEM_MAX_PEERS) {
> -        error_report("server sent invalid message %" PRId64, msg);
> +        error_setg(errp, "server sent invalid message %" PRId64, msg);
>          close(fd);
>          return;
>      }
>
>      if (msg == -1) {
> -        process_msg_shmem(s, fd);
> +        process_msg_shmem(s, fd, errp);
>          return;
>      }
>
> @@ -651,17 +652,18 @@ static void process_msg(IVShmemState *s, int64_t msg, int fd)
>      }
>
>      if (fd >= 0) {
> -        process_msg_connect(s, msg, fd);
> +        process_msg_connect(s, msg, fd, errp);
>      } else if (s->vm_id == -1) {
>          s->vm_id = msg;
>      } else {
> -        process_msg_disconnect(s, msg);
> +        process_msg_disconnect(s, msg, errp);
>      }
>  }
>
>  static void ivshmem_read(void *opaque, const uint8_t *buf, int size)
>  {
>      IVShmemState *s = opaque;
> +    Error *err = NULL;
>      int incoming_fd;
>      int64_t incoming_posn;
>
> @@ -673,10 +675,13 @@ static void ivshmem_read(void *opaque, const uint8_t *buf, int size)
>      IVSHMEM_DPRINTF("posn is %" PRId64 ", fd is %d\n",
>                      incoming_posn, incoming_fd);
>
> -    process_msg(s, incoming_posn, incoming_fd);
> +    process_msg(s, incoming_posn, incoming_fd, &err);
> +    if (err) {
> +        error_report_err(err);
> +    }
>  }
>
> -static int64_t ivshmem_recv_msg(IVShmemState *s, int *pfd)
> +static int64_t ivshmem_recv_msg(IVShmemState *s, int *pfd, Error **errp)
>  {
>      int64_t msg;
>      int n, ret;
> @@ -686,7 +691,7 @@ static int64_t ivshmem_recv_msg(IVShmemState *s, int *pfd)
>          ret = qemu_chr_fe_read_all(s->server_chr, (uint8_t *)&msg + n,
>                                   sizeof(msg) - n);
>          if (ret < 0 && ret != -EINTR) {
> -            /* TODO error handling */
> +            error_setg_errno(errp, -ret, "read from server failed");
>              return INT64_MIN;
>          }
>          n += ret;
> @@ -696,15 +701,24 @@ static int64_t ivshmem_recv_msg(IVShmemState *s, int *pfd)
>      return msg;
>  }
>
> -static void ivshmem_recv_setup(IVShmemState *s)
> +static void ivshmem_recv_setup(IVShmemState *s, Error **errp)
>  {
> +    Error *err = NULL;
>      int64_t msg;
>      int fd;
>
> -    msg = ivshmem_recv_msg(s, &fd);
> -    if (fd != -1 || msg != IVSHMEM_PROTOCOL_VERSION) {
> -        fprintf(stderr, "incompatible version, you are connecting to a ivshmem-"
> -                "server using a different protocol please check your setup\n");
> +    msg = ivshmem_recv_msg(s, &fd, &err);
> +    if (err) {
> +        error_propagate(errp, err);
> +        return;
> +    }
> +    if (msg != IVSHMEM_PROTOCOL_VERSION) {
> +        error_setg(errp, "server sent version %" PRId64 ", expecting %d",
> +                   msg, IVSHMEM_PROTOCOL_VERSION);
> +        return;
> +    }
> +    if (fd != -1) {
> +        error_setg(errp, "server sent invalid version message");
>          return;
>      }
>
> @@ -712,8 +726,16 @@ static void ivshmem_recv_setup(IVShmemState *s)
>       * Receive more messages until we got shared memory.
>       */
>      do {
> -        msg = ivshmem_recv_msg(s, &fd);
> -        process_msg(s, msg, fd);
> +        msg = ivshmem_recv_msg(s, &fd, &err);
> +        if (err) {
> +            error_propagate(errp, err);
> +            return;
> +        }
> +        process_msg(s, msg, fd, &err);
> +        if (err) {
> +            error_propagate(errp, err);
> +            return;
> +        }
>      } while (msg != -1);
>
>      assert(memory_region_is_mapped(&s->ivshmem));
> @@ -768,7 +790,13 @@ static void ivshmem_enable_irqfd(IVShmemState *s)
>      int i;
>
>      for (i = 0; i < s->peers[s->vm_id].nb_eventfds; i++) {
> -        ivshmem_add_kvm_msi_virq(s, i);
> +        Error *err = NULL;
> +
> +        ivshmem_add_kvm_msi_virq(s, i, &err);
> +        if (err) {
> +            error_report_err(err);
> +            /* TODO do we need to handle the error? */
> +        }
>      }
>
>      if (msix_set_vector_notifiers(pdev,
> @@ -814,7 +842,7 @@ static void ivshmem_write_config(PCIDevice *pdev, uint32_t address,
>      pci_default_write_config(pdev, address, val, len);
>      is_enabled = msix_enabled(pdev);
>
> -    if (kvm_msi_via_irqfd_enabled() && s->vm_id != -1) {
> +    if (kvm_msi_via_irqfd_enabled()) {
>          if (!was_enabled && is_enabled) {
>              ivshmem_enable_irqfd(s);
>          } else if (was_enabled && !is_enabled) {
> @@ -933,15 +961,16 @@ static void pci_ivshmem_realize(PCIDevice *dev, Error **errp)
>           * Receive setup messages from server synchronously.
>           * Older versions did it asynchronously, but that creates a
>           * number of entertaining race conditions.
> -         * TODO Propagate errors!  Without that, we still have races
> -         * on errors.
>           */
> -        ivshmem_recv_setup(s);
> -        if (memory_region_is_mapped(&s->ivshmem)) {
> -            qemu_chr_add_handlers(s->server_chr, ivshmem_can_receive,
> -                                  ivshmem_read, NULL, s);
> +        ivshmem_recv_setup(s, &err);
> +        if (err) {
> +            error_propagate(errp, err);
> +            return;
>          }
>
> +        qemu_chr_add_handlers(s->server_chr, ivshmem_can_receive,
> +                              ivshmem_read, NULL, s);
> +
>          if (ivshmem_setup_interrupts(s) < 0) {
>              error_setg(errp, "failed to initialize interrupts");
>              return;
> diff --git a/qemu-doc.texi b/qemu-doc.texi
> index 65f3b29..8afbbcd 100644
> --- a/qemu-doc.texi
> +++ b/qemu-doc.texi
> @@ -1289,14 +1289,7 @@ qemu-system-i386 -device ivshmem,size=@var{shm-size},vectors=@var{vectors},chard
>
>  When using the server, the guest will be assigned a VM ID (>=0) that allows guests
>  using the same server to communicate via interrupts.  Guests can read their
> -VM ID from a device register (see example code).  Since receiving the shared
> -memory region from the server is asynchronous, there is a (small) chance the
> -guest may boot before the shared memory is attached.  To allow an application
> -to ensure shared memory is attached, the VM ID register will return -1 (an
> -invalid VM ID) until the memory is attached.  Once the shared memory is
> -attached, the VM ID will return the guest's valid VM ID.  With these semantics,
> -the guest application can check to ensure the shared memory is attached to the
> -guest before proceeding.
> +VM ID from a device register (see ivshmem-spec.txt).
>
>  The @option{role} argument can be set to either master or peer and will affect
>  how the shared memory is migrated.  With @option{role=master}, the guest will
> --
> 2.4.3
>
>

looks good otherwise

-- 
Marc-André Lureau


* Re: [Qemu-devel] [PATCH 25/38] ivshmem: Rely on server sending the ID right after the version
  2016-02-29 18:40 ` [Qemu-devel] [PATCH 25/38] ivshmem: Rely on server sending the ID right after the version Markus Armbruster
@ 2016-03-02 18:36   ` Marc-André Lureau
  0 siblings, 0 replies; 118+ messages in thread
From: Marc-André Lureau @ 2016-03-02 18:36 UTC (permalink / raw)
  To: Markus Armbruster
  Cc: Paolo Bonzini, cam, Claudio Fontana, QEMU, David Marchand

On Mon, Feb 29, 2016 at 7:40 PM, Markus Armbruster <armbru@redhat.com> wrote:
> The protocol specification (ivshmem-spec.txt, formerly
> ivshmem_device_spec.txt) has always required the ID message to be sent
> right at the beginning, and ivshmem-server has always complied.  The
> device, however, accepts it out of order.  If an interrupt setup
> arrived before it, though, it would be misinterpreted as connect
> notification.  Fix the latent bug by relying on the spec and
> ivshmem-server's actual behavior.
>
> Signed-off-by: Markus Armbruster <armbru@redhat.com>
> ---

Reviewed-by: Marc-André Lureau <marcandre.lureau@redhat.com>
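
For reference, the acceptance rule for the ID message can be written
as a standalone predicate.  This is only a sketch: the function name
is made up, and SKETCH_MAX_PEERS is assumed to be 65535 based on the
ID range given in ivshmem-spec.txt, not taken from the QEMU headers:

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed upper bound; the spec text gives IDs between 0 and 65535. */
#define SKETCH_MAX_PEERS 65535

/*
 * An ID message carries no file descriptor (@fd is -1) and a
 * non-negative ID within the permitted range.
 */
static bool sketch_id_msg_is_valid(int64_t msg, int fd)
{
    return fd == -1 && msg >= 0 && msg <= SKETCH_MAX_PEERS;
}
```

Anything else arriving in that position, such as an interrupt setup
message with a file descriptor attached, is rejected rather than
misinterpreted.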


>  hw/misc/ivshmem.c | 27 ++++++++++++++++++++++++---
>  1 file changed, 24 insertions(+), 3 deletions(-)
>
> diff --git a/hw/misc/ivshmem.c b/hw/misc/ivshmem.c
> index 831da53..8f976ca 100644
> --- a/hw/misc/ivshmem.c
> +++ b/hw/misc/ivshmem.c
> @@ -653,8 +653,6 @@ static void process_msg(IVShmemState *s, int64_t msg, int fd, Error **errp)
>
>      if (fd >= 0) {
>          process_msg_connect(s, msg, fd, errp);
> -    } else if (s->vm_id == -1) {
> -        s->vm_id = msg;
>      } else {
>          process_msg_disconnect(s, msg, errp);
>      }
> @@ -723,6 +721,30 @@ static void ivshmem_recv_setup(IVShmemState *s, Error **errp)
>      }
>
>      /*
> +     * ivshmem-server sends the remaining initial messages in a fixed
> +     * order, but the device has always accepted them in any order.
> +     * Stay as compatible as practical, just in case people use
> +     * servers that behave differently.
> +     */
> +
> +    /*
> +     * ivshmem_device_spec.txt has always required the ID message
> +     * right here, and ivshmem-server has always complied.  However,
> +     * older versions of the device accepted it out of order, but
> +     * broke when an interrupt setup message arrived before it.
> +     */
> +    msg = ivshmem_recv_msg(s, &fd, &err);
> +    if (err) {
> +        error_propagate(errp, err);
> +        return;
> +    }
> +    if (fd != -1 || msg < 0 || msg > IVSHMEM_MAX_PEERS) {
> +        error_setg(errp, "server sent invalid ID message");
> +        return;
> +    }
> +    s->vm_id = msg;
> +
> +    /*
>       * Receive more messages until we got shared memory.
>       */
>      do {
> @@ -953,7 +975,6 @@ static void pci_ivshmem_realize(PCIDevice *dev, Error **errp)
>
>          /* we allocate enough space for 16 peers and grow as needed */
>          resize_peers(s, 16);
> -        s->vm_id = -1;
>
>          pci_register_bar(dev, 2, attr, &s->bar);
>
> --
> 2.4.3
>
>



-- 
Marc-André Lureau


* Re: [Qemu-devel] [PATCH 26/38] ivshmem: Drop the hackish test for UNIX domain chardev
  2016-02-29 18:40 ` [Qemu-devel] [PATCH 26/38] ivshmem: Drop the hackish test for UNIX domain chardev Markus Armbruster
@ 2016-03-02 18:38   ` Marc-André Lureau
  0 siblings, 0 replies; 118+ messages in thread
From: Marc-André Lureau @ 2016-03-02 18:38 UTC (permalink / raw)
  To: Markus Armbruster
  Cc: Paolo Bonzini, cam, Claudio Fontana, QEMU, David Marchand

Hi

On Mon, Feb 29, 2016 at 7:40 PM, Markus Armbruster <armbru@redhat.com> wrote:
> The chardev must be capable of transmitting SCM_RIGHTS ancillary
> messages.  We check it by comparing CharDriverState member filename to
> "unix:".  That's almost as brittle as it is disgusting.
>
> When the actual transmission all happened asynchronously, this check
> was all we could do in realize(), and thus better than nothing.  But
> now we receive at least one SCM_RIGHTS synchronously in realize(),
> it's not worth its keep anymore.  Drop it.
>
> Signed-off-by: Markus Armbruster <armbru@redhat.com>
> ---

It didn't look that horrible to me, and it could actually be more
useful than a later error. But I don't think this is an issue, so why
not drop a few lines.

Reviewed-by: Marc-André Lureau <marcandre.lureau@redhat.com>


>  hw/misc/ivshmem.c | 9 ---------
>  1 file changed, 9 deletions(-)
>
> diff --git a/hw/misc/ivshmem.c b/hw/misc/ivshmem.c
> index 8f976ca..e578b8a 100644
> --- a/hw/misc/ivshmem.c
> +++ b/hw/misc/ivshmem.c
> @@ -961,15 +961,6 @@ static void pci_ivshmem_realize(PCIDevice *dev, Error **errp)
>          memory_region_add_subregion(&s->bar, 0, mr);
>          pci_register_bar(PCI_DEVICE(s), 2, attr, &s->bar);
>      } else if (s->server_chr != NULL) {
> -        /* FIXME do not rely on what chr drivers put into filename */
> -        if (strncmp(s->server_chr->filename, "unix:", 5)) {
> -            error_setg(errp, "chardev is not a unix client socket");
> -            return;
> -        }
> -
> -        /* if we get a UNIX socket as the parameter we will talk
> -         * to the ivshmem server to receive the memory region */
> -
>          IVSHMEM_DPRINTF("using shared memory server (socket = %s)\n",
>                          s->server_chr->filename);
>
> --
> 2.4.3
>
>



-- 
Marc-André Lureau


* Re: [Qemu-devel] [PATCH 27/38] ivshmem: Simplify how we cope with short reads from server
  2016-02-29 18:40 ` [Qemu-devel] [PATCH 27/38] ivshmem: Simplify how we cope with short reads from server Markus Armbruster
@ 2016-03-02 18:41   ` Marc-André Lureau
  2016-03-02 19:38     ` Markus Armbruster
  0 siblings, 1 reply; 118+ messages in thread
From: Marc-André Lureau @ 2016-03-02 18:41 UTC (permalink / raw)
  To: Markus Armbruster
  Cc: Paolo Bonzini, cam, Claudio Fontana, QEMU, David Marchand

Hi

On Mon, Feb 29, 2016 at 7:40 PM, Markus Armbruster <armbru@redhat.com> wrote:
> Short reads from a UNIX domain socket are exceedingly unlikely when
> the other side always sends eight bytes and we always read eight
> bytes.  We cope with them anyway.  However, the code doing that is
> rather convoluted.  Dumb it down radically.
>
> Replace the convoluted code

agreed

>
> Signed-off-by: Markus Armbruster <armbru@redhat.com>
> ---
>  hw/misc/ivshmem.c | 76 ++++++++++++-------------------------------------------
>  1 file changed, 16 insertions(+), 60 deletions(-)
>
> diff --git a/hw/misc/ivshmem.c b/hw/misc/ivshmem.c
> index e578b8a..fb8a4f7 100644
> --- a/hw/misc/ivshmem.c
> +++ b/hw/misc/ivshmem.c
> @@ -26,7 +26,6 @@
>  #include "migration/migration.h"
>  #include "qemu/error-report.h"
>  #include "qemu/event_notifier.h"
> -#include "qemu/fifo8.h"
>  #include "sysemu/char.h"
>  #include "sysemu/hostmem.h"
>  #include "qapi/visitor.h"
> @@ -80,7 +79,6 @@ typedef struct IVShmemState {
>      uint32_t intrstatus;
>
>      CharDriverState *server_chr;
> -    Fifo8 incoming_fifo;
>      MemoryRegion ivshmem_mmio;
>
>      /* We might need to register the BAR before we actually have the memory.
> @@ -99,6 +97,8 @@ typedef struct IVShmemState {
>      uint32_t vectors;
>      uint32_t features;
>      MSIVector *msi_vectors;
> +    uint64_t msg_buf;           /* buffer for receiving server messages */
> +    int msg_buffered_bytes;     /* #bytes in @msg_buf */
>
>      Error *migration_blocker;
>
> @@ -255,11 +255,6 @@ static const MemoryRegionOps ivshmem_mmio_ops = {
>      },
>  };
>
> -static int ivshmem_can_receive(void * opaque)
> -{
> -    return sizeof(int64_t);
> -}
> -
>  static void ivshmem_vector_notify(void *opaque)
>  {
>      MSIVector *entry = opaque;
> @@ -459,53 +454,6 @@ static void resize_peers(IVShmemState *s, int nb_peers)
>      }
>  }
>
> -static bool fifo_update_and_get(IVShmemState *s, const uint8_t *buf, int size,
> -                                void *data, size_t len)
> -{
> -    const uint8_t *p;
> -    uint32_t num;
> -
> -    assert(len <= sizeof(int64_t)); /* limitation of the fifo */
> -    if (fifo8_is_empty(&s->incoming_fifo) && size == len) {
> -        memcpy(data, buf, size);
> -        return true;
> -    }
> -
> -    IVSHMEM_DPRINTF("short read of %d bytes\n", size);
> -
> -    num = MIN(size, sizeof(int64_t) - fifo8_num_used(&s->incoming_fifo));
> -    fifo8_push_all(&s->incoming_fifo, buf, num);
> -
> -    if (fifo8_num_used(&s->incoming_fifo) < len) {
> -        assert(num == 0);
> -        return false;
> -    }
> -
> -    size -= num;
> -    buf += num;
> -    p = fifo8_pop_buf(&s->incoming_fifo, len, &num);
> -    assert(num == len);
> -
> -    memcpy(data, p, len);
> -
> -    if (size > 0) {
> -        fifo8_push_all(&s->incoming_fifo, buf, size);
> -    }
> -
> -    return true;
> -}
> -
> -static bool fifo_update_and_get_i64(IVShmemState *s,
> -                                    const uint8_t *buf, int size, int64_t *i64)
> -{
> -    if (fifo_update_and_get(s, buf, size, i64, sizeof(*i64))) {
> -        *i64 = GINT64_FROM_LE(*i64);
> -        return true;
> -    }
> -
> -    return false;
> -}
> -
>  static void ivshmem_add_kvm_msi_virq(IVShmemState *s, int vector,
>                                       Error **errp)
>  {
> @@ -658,6 +606,14 @@ static void process_msg(IVShmemState *s, int64_t msg, int fd, Error **errp)
>      }
>  }
>
> +static int ivshmem_can_receive(void *opaque)
> +{
> +    IVShmemState *s = opaque;
> +
> +    assert(s->msg_buffered_bytes < sizeof(s->msg_buf));
> +    return sizeof(s->msg_buf) - s->msg_buffered_bytes;
> +}
> +
>  static void ivshmem_read(void *opaque, const uint8_t *buf, int size)
>  {
>      IVShmemState *s = opaque;
> @@ -665,8 +621,12 @@ static void ivshmem_read(void *opaque, const uint8_t *buf, int size)
>      int incoming_fd;
>      int64_t incoming_posn;
>
> -    if (!fifo_update_and_get_i64(s, buf, size, &incoming_posn)) {
> -        return;
> +    assert(size >= 0 && s->msg_buffered_bytes + size <= sizeof(s->msg_buf));
> +    memcpy((unsigned char *)&s->msg_buf + s->msg_buffered_bytes, buf, size);
> +    s->msg_buffered_bytes += size;
> +    if (s->msg_buffered_bytes == sizeof(s->msg_buf)) {
> +        incoming_posn = le64_to_cpu(s->msg_buf);
> +        s->msg_buffered_bytes = 0;
>      }
>

Missing an "else return" here, though: on a short read we now fall
through and use incoming_posn uninitialized.
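
To illustrate the control flow I mean, here is a minimal standalone
sketch. It is not the QEMU code: struct msg_state and feed_bytes() are
made-up names, and host byte order stands in for le64_to_cpu(). Bytes
accumulate in a fixed eight-byte buffer, and the caller only gets a
message once the buffer is full; a short read returns early.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for the two new IVShmemState fields. */
struct msg_state {
    uint64_t msg_buf;        /* buffer for a partially received message */
    int msg_buffered_bytes;  /* number of bytes currently in msg_buf */
};

/*
 * Append @size bytes to the buffer.  Returns true and stores the message
 * in *@msg once all eight bytes have arrived; returns false on a short
 * read, so the caller never touches an unset *@msg.
 */
static bool feed_bytes(struct msg_state *s, const uint8_t *buf, int size,
                       uint64_t *msg)
{
    assert(size >= 0 &&
           s->msg_buffered_bytes + size <= (int)sizeof(s->msg_buf));
    memcpy((unsigned char *)&s->msg_buf + s->msg_buffered_bytes, buf, size);
    s->msg_buffered_bytes += size;
    if (s->msg_buffered_bytes < (int)sizeof(s->msg_buf)) {
        return false;        /* the "else return" case */
    }
    *msg = s->msg_buf;       /* the real code would convert from LE here */
    s->msg_buffered_bytes = 0;
    return true;
}
```

A caller would bail out when feed_bytes() returns false and only go on
to fetch the fd and process the message when it returns true.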

>      incoming_fd = qemu_chr_fe_get_msgfd(s->server_chr);
> @@ -1019,8 +979,6 @@ static void pci_ivshmem_realize(PCIDevice *dev, Error **errp)
>          }
>      }
>
> -    fifo8_create(&s->incoming_fifo, sizeof(int64_t));
> -
>      if (s->role_val == IVSHMEM_PEER) {
>          error_setg(&s->migration_blocker,
>                     "Migration is disabled when using feature 'peer mode' in device 'ivshmem'");
> @@ -1033,8 +991,6 @@ static void pci_ivshmem_exit(PCIDevice *dev)
>      IVShmemState *s = IVSHMEM(dev);
>      int i;
>
> -    fifo8_destroy(&s->incoming_fifo);
> -
>      if (s->migration_blocker) {
>          migrate_del_blocker(s->migration_blocker);
>          error_free(s->migration_blocker);
> --
> 2.4.3
>
>



-- 
Marc-André Lureau


* Re: [Qemu-devel] [PATCH 28/38] ivshmem: Tighten check of property "size"
  2016-02-29 18:40 ` [Qemu-devel] [PATCH 28/38] ivshmem: Tighten check of property "size" Markus Armbruster
@ 2016-03-02 18:44   ` Marc-André Lureau
  0 siblings, 0 replies; 118+ messages in thread
From: Marc-André Lureau @ 2016-03-02 18:44 UTC (permalink / raw)
  To: Markus Armbruster
  Cc: Paolo Bonzini, cam, Claudio Fontana, QEMU, David Marchand

On Mon, Feb 29, 2016 at 7:40 PM, Markus Armbruster <armbru@redhat.com> wrote:
> If size_t is narrower than 64 bits, passing uint64_t ivshmem_size to
> mmap() truncates.  Reject such sizes.
>
> Signed-off-by: Markus Armbruster <armbru@redhat.com>
> ---

Reviewed-by: Marc-André Lureau <marcandre.lureau@redhat.com>
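
For reference, the round-trip check the patch relies on can be sketched
standalone as below. fits_in_size_t() and checked_size() are made-up
helpers for illustration, not something in the tree; the explicit casts
to uint64_t are my choice to keep the comparison free of mixed-sign
surprises.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper: true if the non-negative 64-bit @size survives
 * conversion to size_t unchanged, i.e. passing it to mmap() would not
 * truncate it.
 */
static bool fits_in_size_t(int64_t size)
{
    return size >= 0 && (uint64_t)(size_t)size == (uint64_t)size;
}

/* Convert after checking; callers must have validated @size already. */
static size_t checked_size(int64_t size)
{
    assert(fits_in_size_t(size));
    return (size_t)size;
}
```

On a host with 64-bit size_t every non-negative int64_t passes; on a
32-bit host anything above SIZE_MAX is rejected instead of silently
wrapping.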


>  hw/misc/ivshmem.c | 7 ++++---
>  1 file changed, 4 insertions(+), 3 deletions(-)
>
> diff --git a/hw/misc/ivshmem.c b/hw/misc/ivshmem.c
> index fb8a4f7..8d54fa9 100644
> --- a/hw/misc/ivshmem.c
> +++ b/hw/misc/ivshmem.c
> @@ -87,7 +87,7 @@ typedef struct IVShmemState {
>       */
>      MemoryRegion bar;
>      MemoryRegion ivshmem;
> -    uint64_t ivshmem_size; /* size of shared memory region */
> +    size_t ivshmem_size; /* size of shared memory region */
>      uint32_t ivshmem_64bit;
>
>      Peer *peers;
> @@ -361,7 +361,7 @@ static int check_shm_size(IVShmemState *s, int fd, Error **errp)
>
>      if (s->ivshmem_size > buf.st_size) {
>          error_setg(errp, "Requested memory size greater"
> -                   " than shared object size (%" PRIu64 " > %" PRIu64")",
> +                   " than shared object size (%zu > %" PRIu64")",
>                     s->ivshmem_size, (uint64_t)buf.st_size);
>          return -1;
>      } else {
> @@ -861,7 +861,8 @@ static void pci_ivshmem_realize(PCIDevice *dev, Error **errp)
>      } else {
>          char *end;
>          int64_t size = qemu_strtosz(s->sizearg, &end);
> -        if (size < 0 || *end != '\0' || !is_power_of_2(size)) {
> +        if (size < 0 || (size_t)size != size || *end != '\0'
> +            || !is_power_of_2(size)) {
>              error_setg(errp, "Invalid size %s", s->sizearg);
>              return;
>          }
> --
> 2.4.3
>
>



-- 
Marc-André Lureau


* Re: [Qemu-devel] [PATCH 31/38] ivshmem: Inline check_shm_size() into its only caller
  2016-02-29 18:40 ` [Qemu-devel] [PATCH 31/38] ivshmem: Inline check_shm_size() into its only caller Markus Armbruster
@ 2016-03-02 18:49   ` Marc-André Lureau
  0 siblings, 0 replies; 118+ messages in thread
From: Marc-André Lureau @ 2016-03-02 18:49 UTC (permalink / raw)
  To: Markus Armbruster
  Cc: Paolo Bonzini, cam, Claudio Fontana, QEMU, David Marchand

Hi

On Mon, Feb 29, 2016 at 7:40 PM, Markus Armbruster <armbru@redhat.com> wrote:
> Improve the error messages while there.
>
> Signed-off-by: Markus Armbruster <armbru@redhat.com>
> ---

I am not convinced this improves readability much. I would clean up
the function a bit, but keep it.

>  hw/misc/ivshmem.c | 37 +++++++++++--------------------------
>  1 file changed, 11 insertions(+), 26 deletions(-)
>
> diff --git a/hw/misc/ivshmem.c b/hw/misc/ivshmem.c
> index 0440bca..785ed1c 100644
> --- a/hw/misc/ivshmem.c
> +++ b/hw/misc/ivshmem.c
> @@ -342,29 +342,6 @@ static void watch_vector_notifier(IVShmemState *s, EventNotifier *n,
>                          NULL, &s->msi_vectors[vector]);
>  }
>
> -static int check_shm_size(IVShmemState *s, int fd, Error **errp)
> -{
> -    /* check that the guest isn't going to try and map more memory than the
> -     * the object has allocated return -1 to indicate error */
> -
> -    struct stat buf;
> -
> -    if (fstat(fd, &buf) < 0) {
> -        error_setg(errp, "exiting: fstat on fd %d failed: %s",
> -                   fd, strerror(errno));
> -        return -1;
> -    }
> -
> -    if (s->ivshmem_size > buf.st_size) {
> -        error_setg(errp, "Requested memory size greater"
> -                   " than shared object size (%zu > %" PRIu64")",
> -                   s->ivshmem_size, (uint64_t)buf.st_size);
> -        return -1;
> -    } else {
> -        return 0;
> -    }
> -}
> -
>  static void ivshmem_add_eventfd(IVShmemState *s, int posn, int i)
>  {
>      memory_region_add_eventfd(&s->ivshmem_mmio,
> @@ -479,7 +456,7 @@ static void setup_interrupt(IVShmemState *s, int vector, Error **errp)
>
>  static void process_msg_shmem(IVShmemState *s, int fd, Error **errp)
>  {
> -    Error *err = NULL;
> +    struct stat buf;
>      void *ptr;
>
>      if (s->ivshmem_bar2) {
> @@ -488,8 +465,16 @@ static void process_msg_shmem(IVShmemState *s, int fd, Error **errp)
>          return;
>      }
>
> -    if (check_shm_size(s, fd, &err) == -1) {
> -        error_propagate(errp, err);
> +    if (fstat(fd, &buf) < 0) {
> +        error_setg_errno(errp, errno,
> +            "can't determine size of shared memory sent by server");
> +        close(fd);
> +        return;
> +    }
> +
> +    if (s->ivshmem_size > buf.st_size) {
> +        error_setg(errp, "server sent only %zd bytes of shared memory",
> +                   (size_t)buf.st_size);
>          close(fd);
>          return;
>      }
> --
> 2.4.3
>
>



-- 
Marc-André Lureau


* Re: [Qemu-devel] [PATCH 32/38] qdev: New DEFINE_PROP_ON_OFF_AUTO
  2016-02-29 18:40 ` [Qemu-devel] [PATCH 32/38] qdev: New DEFINE_PROP_ON_OFF_AUTO Markus Armbruster
@ 2016-03-02 18:54   ` Marc-André Lureau
  0 siblings, 0 replies; 118+ messages in thread
From: Marc-André Lureau @ 2016-03-02 18:54 UTC (permalink / raw)
  To: Markus Armbruster
  Cc: Paolo Bonzini, cam, Claudio Fontana, QEMU, David Marchand

Hi

On Mon, Feb 29, 2016 at 7:40 PM, Markus Armbruster <armbru@redhat.com> wrote:
> Signed-off-by: Markus Armbruster <armbru@redhat.com>
> ---

Reviewed-by: Marc-André Lureau <marcandre.lureau@redhat.com>


>  hw/core/qdev-properties.c    | 10 ++++++++++
>  include/hw/qdev-properties.h |  3 +++
>  2 files changed, 13 insertions(+)
>
> diff --git a/hw/core/qdev-properties.c b/hw/core/qdev-properties.c
> index bc89800..d2f5a08 100644
> --- a/hw/core/qdev-properties.c
> +++ b/hw/core/qdev-properties.c
> @@ -516,6 +516,16 @@ PropertyInfo qdev_prop_macaddr = {
>      .set   = set_mac,
>  };
>
> +/* --- on/off/auto --- */
> +
> +PropertyInfo qdev_prop_on_off_auto = {
> +    .name = "OnOffAuto",
> +    .description = "on/off/auto",
> +    .enum_table = OnOffAuto_lookup,
> +    .get = get_enum,
> +    .set = set_enum,
> +};
> +
>  /* --- lost tick policy --- */
>
>  QEMU_BUILD_BUG_ON(sizeof(LostTickPolicy) != sizeof(int));
> diff --git a/include/hw/qdev-properties.h b/include/hw/qdev-properties.h
> index 03a1b91..0586cac 100644
> --- a/include/hw/qdev-properties.h
> +++ b/include/hw/qdev-properties.h
> @@ -18,6 +18,7 @@ extern PropertyInfo qdev_prop_string;
>  extern PropertyInfo qdev_prop_chr;
>  extern PropertyInfo qdev_prop_ptr;
>  extern PropertyInfo qdev_prop_macaddr;
> +extern PropertyInfo qdev_prop_on_off_auto;
>  extern PropertyInfo qdev_prop_losttickpolicy;
>  extern PropertyInfo qdev_prop_bios_chs_trans;
>  extern PropertyInfo qdev_prop_fdc_drive_type;
> @@ -155,6 +156,8 @@ extern PropertyInfo qdev_prop_arraylen;
>      DEFINE_PROP(_n, _s, _f, qdev_prop_drive, BlockBackend *)
>  #define DEFINE_PROP_MACADDR(_n, _s, _f)         \
>      DEFINE_PROP(_n, _s, _f, qdev_prop_macaddr, MACAddr)
> +#define DEFINE_PROP_ON_OFF_AUTO(_n, _s, _f, _d) \
> +    DEFINE_PROP_DEFAULT(_n, _s, _f, _d, qdev_prop_on_off_auto, OnOffAuto)
>  #define DEFINE_PROP_LOSTTICKPOLICY(_n, _s, _f, _d) \
>      DEFINE_PROP_DEFAULT(_n, _s, _f, _d, qdev_prop_losttickpolicy, \
>                          LostTickPolicy)
> --
> 2.4.3
>
>



-- 
Marc-André Lureau


* Re: [Qemu-devel] [PATCH 33/38] ivshmem: Replace int role_val by OnOffAuto master
  2016-02-29 18:40 ` [Qemu-devel] [PATCH 33/38] ivshmem: Replace int role_val by OnOffAuto master Markus Armbruster
@ 2016-03-02 18:56   ` Marc-André Lureau
  2016-03-02 19:39     ` Markus Armbruster
  0 siblings, 1 reply; 118+ messages in thread
From: Marc-André Lureau @ 2016-03-02 18:56 UTC (permalink / raw)
  To: Markus Armbruster
  Cc: Paolo Bonzini, cam, Claudio Fontana, QEMU, David Marchand

Hi

On Mon, Feb 29, 2016 at 7:40 PM, Markus Armbruster <armbru@redhat.com> wrote:
> In preparation of making it a qdev property.
>
> Signed-off-by: Markus Armbruster <armbru@redhat.com>
> --
>  hw/misc/ivshmem.c | 31 +++++++++++++++++++------------
>  1 file changed, 19 insertions(+), 12 deletions(-)
>
> diff --git a/hw/misc/ivshmem.c b/hw/misc/ivshmem.c
> index 785ed1c..b39ea27 100644
> --- a/hw/misc/ivshmem.c
> +++ b/hw/misc/ivshmem.c
> @@ -43,9 +43,6 @@
>  #define IVSHMEM_IOEVENTFD   0
>  #define IVSHMEM_MSI     1
>
> -#define IVSHMEM_PEER    0
> -#define IVSHMEM_MASTER  1
> -
>  #define IVSHMEM_REG_BAR_SIZE 0x100
>
>  #define IVSHMEM_DEBUG 0
> @@ -96,12 +93,12 @@ typedef struct IVShmemState {
>      uint64_t msg_buf;           /* buffer for receiving server messages */
>      int msg_buffered_bytes;     /* #bytes in @msg_buf */
>
> +    OnOffAuto master;
>      Error *migration_blocker;
>
>      char * shmobj;
>      char * sizearg;
>      char * role;
> -    int role_val;   /* scalar to avoid multiple string comparisons */
>  } IVShmemState;
>
>  /* registers for the Inter-VM shared memory device */
> @@ -117,6 +114,12 @@ static inline uint32_t ivshmem_has_feature(IVShmemState *ivs,
>      return (ivs->features & (1 << feature));
>  }
>
> +static inline bool ivshmem_is_master(IVShmemState *s)
> +{
> +    assert(s->master != ON_OFF_AUTO_AUTO);
> +    return s->master == ON_OFF_AUTO_ON;
> +}
> +
>  static void ivshmem_update_irq(IVShmemState *s)
>  {
>      PCIDevice *d = PCI_DEVICE(s);
> @@ -861,15 +864,15 @@ static void pci_ivshmem_realize(PCIDevice *dev, Error **errp)
>      /* check that role is reasonable */
>      if (s->role) {
>          if (strncmp(s->role, "peer", 5) == 0) {
> -            s->role_val = IVSHMEM_PEER;
> +            s->master = ON_OFF_AUTO_OFF;
>          } else if (strncmp(s->role, "master", 7) == 0) {
> -            s->role_val = IVSHMEM_MASTER;
> +            s->master = ON_OFF_AUTO_ON;
>          } else {
>              error_setg(errp, "'role' must be 'peer' or 'master'");
>              return;
>          }
>      } else {
> -        s->role_val = IVSHMEM_MASTER; /* default */
> +        s->master = ON_OFF_AUTO_AUTO;
>      }
>
>      pci_conf = dev->config;
> @@ -931,7 +934,11 @@ static void pci_ivshmem_realize(PCIDevice *dev, Error **errp)
>      vmstate_register_ram(s->ivshmem_bar2, DEVICE(s));
>      pci_register_bar(PCI_DEVICE(s), 2, attr, s->ivshmem_bar2);
>
> -    if (s->role_val == IVSHMEM_PEER) {
> +    if (s->master == ON_OFF_AUTO_AUTO) {
> +        s->master = s->vm_id == 0 ? ON_OFF_AUTO_ON : ON_OFF_AUTO_OFF;
> +    }
> +
> +    if (ivshmem_is_master(s)) {

This should be !ivshmem_is_master() instead (or a new
ivshmem_is_peer()): the old check was for IVSHMEM_PEER, so the
condition is inverted as written.
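
To spell out the polarity, here is a small standalone sketch. The enum
is only a stand-in for OnOffAuto, and resolve_master() and is_peer()
are made-up names; the point is just that "auto" resolves to master for
ID 0 and that the migration blocker belongs to the peer case.

```c
#include <assert.h>
#include <stdbool.h>

/* Stand-in for QEMU's OnOffAuto; values here are illustrative only. */
enum on_off_auto { AUTO, ON, OFF };

/* Resolve "auto" the way the patch does: ID 0 is master, others are peers. */
static enum on_off_auto resolve_master(enum on_off_auto master, int vm_id)
{
    if (master == AUTO) {
        return vm_id == 0 ? ON : OFF;
    }
    return master;
}

/* Migration must be blocked for peers, i.e. when master resolved to OFF. */
static bool is_peer(enum on_off_auto master)
{
    assert(master != AUTO);
    return master == OFF;
}
```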

>          error_setg(&s->migration_blocker,
>                     "Migration is disabled when using feature 'peer mode' in device 'ivshmem'");
>          migrate_add_blocker(s->migration_blocker);
> @@ -993,7 +1000,7 @@ static int ivshmem_pre_load(void *opaque)
>  {
>      IVShmemState *s = opaque;
>
> -    if (s->role_val == IVSHMEM_PEER) {
> +    if (ivshmem_is_master(s)) {

Same here: this should be !ivshmem_is_master().

>          error_report("'peer' devices are not migratable");
>          return -EINVAL;
>      }
> @@ -1019,9 +1026,9 @@ static int ivshmem_load_old(QEMUFile *f, void *opaque, int version_id)
>          return -EINVAL;
>      }
>
> -    if (s->role_val == IVSHMEM_PEER) {
> -        error_report("'peer' devices are not migratable");
> -        return -EINVAL;
> +    ret = ivshmem_pre_load(s);
> +    if (ret) {
> +        return ret;
>      }
>
>      ret = pci_device_load(pdev, f);
> --
> 2.4.3
>
>



-- 
Marc-André Lureau


* Re: [Qemu-devel] [PATCH 21/38] ivshmem: Disentangle ivshmem_read()
  2016-03-02 17:33       ` Marc-André Lureau
@ 2016-03-02 19:15         ` Markus Armbruster
  0 siblings, 0 replies; 118+ messages in thread
From: Markus Armbruster @ 2016-03-02 19:15 UTC (permalink / raw)
  To: Marc-André Lureau
  Cc: Paolo Bonzini, cam, Claudio Fontana, QEMU, David Marchand

Marc-André Lureau <marcandre.lureau@gmail.com> writes:

> On Wed, Mar 2, 2016 at 4:53 PM, Markus Armbruster <armbru@redhat.com> wrote:
>>>> +    if (msg == -1) {
>>>> +        process_msg_shmem(s, fd);
>>>
>>> the previous code used to close fd if any, it's worth to keep that imho
>>
>> I'm blind.  Where?
>
> Sorry, wrong place I looked at, seems you got them all.
>
>     if (msg < -1 || msg > IVSHMEM_MAX_PEERS) {
>         error_report("server sent invalid message %" PRId64, msg);
>         close(fd);
>         return;
>     }
>
>
> However, why not keep the if fd != -1 here (not a great idea to call
> close otherwise)

We refuse to make the code more verbose to avoid free(NULL), and I very
much agree with that.

close(-1) is like free(NULL) in that it is perfectly safe.  Where they
differ is performance: free() checks for null right away, but close()
checks only after switching to supervisor mode.  Doesn't matter on an
error path.
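
A quick standalone check of the claim, assuming a POSIX system.
errno_of_close_invalid() is a made-up name for illustration; it shows
that close(-1) simply fails with EBADF and has no other effect.

```c
#include <assert.h>
#include <errno.h>
#include <unistd.h>

/*
 * Returns the errno left behind by close(-1).  POSIX specifies EBADF for
 * an fd that is not a valid open descriptor; nothing else happens.
 */
static int errno_of_close_invalid(void)
{
    int ret;

    errno = 0;
    ret = close(-1);
    assert(ret == -1);      /* the call fails, harmlessly */
    return errno;
}
```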


* Re: [Qemu-devel] [PATCH 22/38] ivshmem: Plug leaks on unplug, fix peer disconnect
  2016-03-02 17:47   ` Marc-André Lureau
@ 2016-03-02 19:19     ` Markus Armbruster
  2016-03-02 23:52       ` Marc-André Lureau
  0 siblings, 1 reply; 118+ messages in thread
From: Markus Armbruster @ 2016-03-02 19:19 UTC (permalink / raw)
  To: Marc-André Lureau
  Cc: Paolo Bonzini, cam, Claudio Fontana, QEMU, David Marchand

Marc-André Lureau <marcandre.lureau@gmail.com> writes:

> Hi
>
> On Mon, Feb 29, 2016 at 7:40 PM, Markus Armbruster <armbru@redhat.com> wrote:
>> close_peer_eventfds() cleans up three things: ioeventfd triggers if
>> they exist, eventfds, and the array to store them.
>>
>> Commit 98609cd (v1.2.0) fixed it not to clean up ioeventfd triggers
>> when they don't exist (property ioeventfd=off, which is the default).
>> Unfortunately, the fix also made it skip cleanup of the eventfds and
>> the array then.  This is a memory and file descriptor leak on unplug.
>>
>> Additionally, the reset of nb_eventfds is skipped.  Doesn't matter on
>> unplug.  On peer disconnect, however, this permanently wedges the
>> interrupt vectors used for that peer's ID.  The eventfds stay behind,
>> but aren't connected to a peer anymore.  When the ID gets recycled for
>> a new peer, the new peer's eventfds get assigned to vectors after the
>> old ones.  Commonly, the device's number of vectors matches the
>> server's, so the new ones get dropped with a "Too many eventfd
>> received" message.  Interrupts either don't work (common case) or go
>> to the wrong vector.
>>
>> Fix by narrowing the conditional to just the ioeventfd trigger
>> cleanup.
>>
>> While there, move the "invalid" peer check to the only caller where it
>> can actually happen.
>>
>> Cc: Paolo Bonzini <pbonzini@redhat.com>
>> Signed-off-by: Markus Armbruster <armbru@redhat.com>
>> ---
>>  hw/misc/ivshmem.c | 24 ++++++++++++------------
>>  1 file changed, 12 insertions(+), 12 deletions(-)
>>
>> diff --git a/hw/misc/ivshmem.c b/hw/misc/ivshmem.c
>> index fc46666..c366087 100644
>> --- a/hw/misc/ivshmem.c
>> +++ b/hw/misc/ivshmem.c
>> @@ -428,21 +428,17 @@ static void close_peer_eventfds(IVShmemState *s, int posn)
>>  {
>>      int i, n;
>>
>> -    if (!ivshmem_has_feature(s, IVSHMEM_IOEVENTFD)) {
>> -        return;
>> -    }
>> -    if (posn < 0 || posn >= s->nb_peers) {
>> -        error_report("invalid peer %d", posn);
>> -        return;
>> -    }
>> -
>> +    assert(posn >= 0 && posn < s->nb_peers);
>>      n = s->peers[posn].nb_eventfds;
>>
>> -    memory_region_transaction_begin();
>> -    for (i = 0; i < n; i++) {
>> -        ivshmem_del_eventfd(s, posn, i);
>> +    if (ivshmem_has_feature(s, IVSHMEM_IOEVENTFD)) {
>> +        memory_region_transaction_begin();
>> +        for (i = 0; i < n; i++) {
>> +            ivshmem_del_eventfd(s, posn, i);
>> +        }
>> +        memory_region_transaction_commit();
>>      }
>> -    memory_region_transaction_commit();
>> +
>>      for (i = 0; i < n; i++) {
>>          event_notifier_cleanup(&s->peers[posn].eventfds[i]);
>>      }
>
> Looks good, that makes me wonder, what would happen if posn == vm_id?
> I think this should be made an invalid condition or it should revert
> setup_interrupt().

When called from pci_ivshmem_exit(): perfectly fine.

When called from process_msg_disconnect(): invalid as long as
ivshmem-spec.txt doesn't assign a sane meaning to it.  Let's make it an
error there, okay?

>> @@ -598,6 +594,10 @@ static void process_msg_shmem(IVShmemState *s, int fd)
>>  static void process_msg_disconnect(IVShmemState *s, uint16_t posn)
>>  {
>>      IVSHMEM_DPRINTF("posn %d has gone away\n", posn);
>> +    if (posn >= s->nb_peers) {
>> +        error_report("invalid peer %d", posn);
>> +        return;
>> +    }
>>      close_peer_eventfds(s, posn);
>>  }
>>
>> --
>> 2.4.3
>>
>>


* Re: [Qemu-devel] [PATCH 23/38] ivshmem: Receive shared memory synchronously in realize()
  2016-03-02 18:11   ` Marc-André Lureau
@ 2016-03-02 19:28     ` Markus Armbruster
  0 siblings, 0 replies; 118+ messages in thread
From: Markus Armbruster @ 2016-03-02 19:28 UTC (permalink / raw)
  To: Marc-André Lureau
  Cc: Paolo Bonzini, cam, Claudio Fontana, QEMU, David Marchand

Marc-André Lureau <marcandre.lureau@gmail.com> writes:

> Hi
>
> On Mon, Feb 29, 2016 at 7:40 PM, Markus Armbruster <armbru@redhat.com> wrote:
>> When configured for interrupts (property "chardev" given), we receive
>> the shared memory from an ivshmem server.  We do so asynchronously
>> after realize() completes, by setting up callbacks with
>> qemu_chr_add_handlers().
>>
>> Keeping server I/O out of realize() that way avoids delays due to a
>> slow server.  This is probably relevant only for hot plug.
>>
>> However, this funny "no shared memory, yet" state of the device also
>> causes a raft of issues that are hard or impossible to work around:
>>
>> * The guest is exposed to this state: when we enter and leave it its
>>   shared memory contents is abruptly replaced, and device register
>>   IVPosition changes.
>>
>>   This is a known issue.  We document that guests should not access
>>   the shared memory after device initialization until the IVPosition
>>   register becomes non-negative.
>>
>>   For cold plug, the funny state is unlikely to be visible in
>>   practice, because we normally receive the shared memory long before
>>   the guest gets around to mess with the device.
>>
>>   For hot plug, the timing is tighter, but the relative slowness of
>>   PCI device configuration has a good chance to hide the funny state.
>>
>>   In either case, guests complying with the documented procedure are
>>   safe.
>>
>> * Migration becomes racy.
>>
>>   If migration completes before the shared memory setup completes on
>>   the source, shared memory contents is silently lost.  Fortunately,
>>   migration is rather unlikely to win this race.
>>
>>   If the shared memory's ramblock arrives at the destination before
>>   shared memory setup completes, migration fails.
>>
>>   There is no known way for a management application to wait for
>>   shared memory setup to complete.
>>
>>   All you can do is retry failed migration.  You can improve your
>>   chances by leaving more time between running the destination QEMU
>>   and the migrate command.
>>
>>   To mitigate silent memory loss, you need to ensure the server
>>   initializes shared memory exactly the same on source and
>>   destination.
>>
>>   These issues are entirely undocumented so far.
>>
>> I'd expect the server to be almost always fast enough to hide these
>> issues.  But then rare catastrophic races are in a way the worst kind.
>>
>> This is way more trouble than I'm willing to take from any device.
>> Kill the funny state by receiving shared memory synchronously in
>> realize().  If your hot plug hangs, go kill your ivshmem server.
>>
>> For easier review, this commit only makes the receive synchronous, it
>> doesn't add the necessary error propagation.  Without that, the funny
>> state persists.  The next commit will do that, and kill it off for
>> real.
>>
>> Signed-off-by: Markus Armbruster <armbru@redhat.com>
>> ---
>>  hw/misc/ivshmem.c    | 70 +++++++++++++++++++++++++++++++++++++---------------
>>  tests/ivshmem-test.c | 26 ++++++-------------
>>  2 files changed, 57 insertions(+), 39 deletions(-)
>>
>> diff --git a/hw/misc/ivshmem.c b/hw/misc/ivshmem.c
>> index c366087..352937f 100644
>> --- a/hw/misc/ivshmem.c
>> +++ b/hw/misc/ivshmem.c
>> @@ -676,27 +676,47 @@ static void ivshmem_read(void *opaque, const uint8_t *buf, int size)
>>      process_msg(s, incoming_posn, incoming_fd);
>>  }
>>
>> -static void ivshmem_check_version(void *opaque, const uint8_t * buf, int size)
>> +static int64_t ivshmem_recv_msg(IVShmemState *s, int *pfd)
>>  {
>> -    IVShmemState *s = opaque;
>> -    int tmp;
>> -    int64_t version;
>> +    int64_t msg;
>> +    int n, ret;
>>
>> -    if (!fifo_update_and_get_i64(s, buf, size, &version)) {
>> -        return;
>> -    }
>> +    n = 0;
>> +    do {
>> +        ret = qemu_chr_fe_read_all(s->server_chr, (uint8_t *)&msg + n,
>> +                                 sizeof(msg) - n);
>> +        if (ret < 0 && ret != -EINTR) {
>> +            /* TODO error handling */
>> +            return INT64_MIN;
>> +        }
>> +        n += ret;
>> +    } while (n < sizeof(msg));
>>
>> -    tmp = qemu_chr_fe_get_msgfd(s->server_chr);
>> -    if (tmp != -1 || version != IVSHMEM_PROTOCOL_VERSION) {
>> +    *pfd = qemu_chr_fe_get_msgfd(s->server_chr);
>> +    return msg;
>> +}
>> +
>> +static void ivshmem_recv_setup(IVShmemState *s)
>> +{
>> +    int64_t msg;
>> +    int fd;
>> +
>> +    msg = ivshmem_recv_msg(s, &fd);
>> +    if (fd != -1 || msg != IVSHMEM_PROTOCOL_VERSION) {
>>          fprintf(stderr, "incompatible version, you are connecting to a ivshmem-"
>>                  "server using a different protocol please check your setup\n");
>> -        qemu_chr_add_handlers(s->server_chr, NULL, NULL, NULL, s);
>>          return;
>>      }
>>
>> -    IVSHMEM_DPRINTF("version check ok, switch to real chardev handler\n");
>> -    qemu_chr_add_handlers(s->server_chr, ivshmem_can_receive, ivshmem_read,
>> -                          NULL, s);
>> +    /*
>> +     * Receive more messages until we got shared memory.
>> +     */
>> +    do {
>> +        msg = ivshmem_recv_msg(s, &fd);
>> +        process_msg(s, msg, fd);
>> +    } while (msg != -1);
>> +
>> +    assert(memory_region_is_mapped(&s->ivshmem));
>
> This is easy to trigger at runtime; I suggest reporting an error instead.

On successful return, the shared memory must be mapped.  To ensure that,
the code receives messages until it is.  The assertion double-checks the
code got it right.

However, you're right in that in this half-finished state, the assertion
is problematic: since we don't yet propagate errors, we take the success
path even when process_msg_shmem() failed.

I guess I'll have to turn the assertion into a comment until the next
patch.

> looks good otherwise

Thanks!


* Re: [Qemu-devel] [PATCH 24/38] ivshmem: Propagate errors through ivshmem_recv_setup()
  2016-03-02 18:27   ` Marc-André Lureau
@ 2016-03-02 19:35     ` Markus Armbruster
  2016-03-03  0:03       ` Marc-André Lureau
  0 siblings, 1 reply; 118+ messages in thread
From: Markus Armbruster @ 2016-03-02 19:35 UTC (permalink / raw)
  To: Marc-André Lureau
  Cc: Paolo Bonzini, cam, Claudio Fontana, QEMU, David Marchand

Marc-André Lureau <marcandre.lureau@gmail.com> writes:

> On Mon, Feb 29, 2016 at 7:40 PM, Markus Armbruster <armbru@redhat.com> wrote:
>> This kills off the funny state described in the previous commit.
>>
>> Simplify ivshmem_io_read() accordingly, and update documentation.
>>
>> Signed-off-by: Markus Armbruster <armbru@redhat.com>
>> ---
>>  docs/specs/ivshmem-spec.txt |  20 ++++----
>>  hw/misc/ivshmem.c           | 121 +++++++++++++++++++++++++++-----------------
>>  qemu-doc.texi               |   9 +---
>>  3 files changed, 87 insertions(+), 63 deletions(-)
>>
>> diff --git a/docs/specs/ivshmem-spec.txt b/docs/specs/ivshmem-spec.txt
>> index 4fc6f37..3eb8c97 100644
>> --- a/docs/specs/ivshmem-spec.txt
>> +++ b/docs/specs/ivshmem-spec.txt
>> @@ -62,11 +62,11 @@ There are two ways to use this device:
>>    likely want to write a kernel driver to handle interrupts.  Requires
>>    the device to be configured for interrupts, obviously.
>>
>> -If the device is configured for interrupts, BAR2 is initially invalid.
>> -It becomes safely accessible only after the ivshmem server provided
>> -the shared memory.  Guest software should wait for the IVPosition
>> -register (described below) to become non-negative before accessing
>> -BAR2.
>> +Before QEMU 2.6.0, BAR2 can initially be invalid if the device is
>> +configured for interrupts.  It becomes safely accessible only after
>> +the ivshmem server provided the shared memory.  Guest software should
>> +wait for the IVPosition register (described below) to become
>> +non-negative before accessing BAR2.
>>
>>  The device is not capable to tell guest software whether it is
>>  configured for interrupts.
>> @@ -82,7 +82,7 @@ BAR 0 contains the following registers:
>>          4     4   read/write        0   Interrupt Status
>>                                          bit 0: peer interrupt
>>                                          bit 1..31: reserved
>> -        8     4   read-only   0 or -1   IVPosition
>> +        8     4   read-only   0 or ID   IVPosition
>>         12     4   write-only      N/A   Doorbell
>>                                          bit 0..15: vector
>>                                          bit 16..31: peer ID
>> @@ -100,12 +100,14 @@ when an interrupt request from a peer is received.  Reading the
>>  register clears it.
>>
>>  IVPosition Register: if the device is not configured for interrupts,
>> -this is zero.  Else, it's -1 for a short while after reset, then
>> -changes to the device's ID (between 0 and 65535).
>> +this is zero.  Else, it is the device's ID (between 0 and 65535).
>> +
>> +Before QEMU 2.6.0, the register may read -1 for a short while after
>> +reset.
>>
>>  There is no good way for software to find out whether the device is
>>  configured for interrupts.  A positive IVPosition means interrupts,
>> -but zero could be either.  The initial -1 cannot be reliably observed.
>> +but zero could be either.
>>
>>  Doorbell Register: writing this register requests to interrupt a peer.
>>  The written value's high 16 bits are the ID of the peer to interrupt,
>> diff --git a/hw/misc/ivshmem.c b/hw/misc/ivshmem.c
>> index 352937f..831da53 100644
>> --- a/hw/misc/ivshmem.c
>> +++ b/hw/misc/ivshmem.c
>> @@ -234,12 +234,7 @@ static uint64_t ivshmem_io_read(void *opaque, hwaddr addr,
>>              break;
>>
>>          case IVPOSITION:
>> -            /* return my VM ID if the memory is mapped */
>> -            if (memory_region_is_mapped(&s->ivshmem)) {
>> -                ret = s->vm_id;
>> -            } else {
>> -                ret = -1;
>> -            }
>> +            ret = s->vm_id;
>>              break;
>>
>>          default:
>> @@ -511,7 +506,8 @@ static bool fifo_update_and_get_i64(IVShmemState *s,
>>      return false;
>>  }
>>
>> -static int ivshmem_add_kvm_msi_virq(IVShmemState *s, int vector)
>> +static void ivshmem_add_kvm_msi_virq(IVShmemState *s, int vector,
>> +                                     Error **errp)
>
> I prefer to return -1 in case of error, even if Error** is also returned.

You know, I'd prefer that, too, and I've argued for it unsuccessfully.
As it is, we fairly consistently return void when the function returns
errors through Error ** and has no non-error value.

>>  {
>>      PCIDevice *pdev = PCI_DEVICE(s);
>>      MSIMessage msg = msix_get_message(pdev, vector);
>> @@ -522,22 +518,21 @@ static int ivshmem_add_kvm_msi_virq(IVShmemState *s, int vector)
>>
>>      ret = kvm_irqchip_add_msi_route(kvm_state, msg, pdev);
>>      if (ret < 0) {
>> -        error_report("ivshmem: kvm_irqchip_add_msi_route failed");
>> -        return -1;
>> +        error_setg(errp, "kvm_irqchip_add_msi_route failed");
>> +        return;
>>      }
>>
>>      s->msi_vectors[vector].virq = ret;
>>      s->msi_vectors[vector].pdev = pdev;
>> -
>> -    return 0;
>>  }
>>
>> -static void setup_interrupt(IVShmemState *s, int vector)
>> +static void setup_interrupt(IVShmemState *s, int vector, Error **errp)
>>  {
>>      EventNotifier *n = &s->peers[s->vm_id].eventfds[vector];
>>      bool with_irqfd = kvm_msi_via_irqfd_enabled() &&
>>          ivshmem_has_feature(s, IVSHMEM_MSI);
>>      PCIDevice *pdev = PCI_DEVICE(s);
>> +    Error *err = NULL;
>>
>>      IVSHMEM_DPRINTF("setting up interrupt for vector: %d\n", vector);
>>
>> @@ -546,13 +541,16 @@ static void setup_interrupt(IVShmemState *s, int vector)
>>          watch_vector_notifier(s, n, vector);
>>      } else if (msix_enabled(pdev)) {
>>          IVSHMEM_DPRINTF("with irqfd\n");
>> -        if (ivshmem_add_kvm_msi_virq(s, vector) < 0) {
>> +        ivshmem_add_kvm_msi_virq(s, vector, &err);
>> +        if (err) {
>> +            error_propagate(errp, err);
>>              return;
>
> That would make this simpler, avoiding local err variables, and
> propagate. But you seem to prefer that form. I don't know if there is
> any conventions (I am used to glib conventions, and usually a bool
> success is returned, even if the function takes a GError)

Does GLib spell out this convention somewhere?

I can perhaps try to cook up a patch to demonstrate the advantages of
returning a success/failure value even with Error **, and try to get our
convention changed.

Until then, we'd better stick to the existing convention, whether we like
it or not.
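
For reference, the two styles under discussion can be sketched roughly
as below.  This is only an illustration: the Error type, toy_error_set()
and the add_route_*() / setup_*() functions are hypothetical stand-ins
made up for this sketch, not QEMU's real error API or the ivshmem code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-in for an error object; hypothetical, for illustration only. */
typedef struct Error {
    const char *msg;
} Error;

static Error route_failed = { "adding route failed" };

/* Toy stand-in for setting an error: store it if the caller wants one. */
static void toy_error_set(Error **errp, Error *err)
{
    if (errp) {
        assert(*errp == NULL);  /* must not overwrite an earlier error */
        *errp = err;
    }
}

/* Existing convention: return void, report failure only through errp. */
static void add_route_void(int vector, Error **errp)
{
    if (vector < 0) {
        toy_error_set(errp, &route_failed);
        return;
    }
}

/* Alternative: additionally return a success/failure value. */
static bool add_route_bool(int vector, Error **errp)
{
    if (vector < 0) {
        toy_error_set(errp, &route_failed);
        return false;
    }
    return true;
}

/* Caller, existing convention: needs a local err plus a propagate step. */
static void setup_void(int vector, Error **errp)
{
    Error *err = NULL;

    add_route_void(vector, &err);
    if (err) {
        toy_error_set(errp, err);   /* stands in for propagating the error */
        return;
    }
}

/* Caller, alternative convention: errp is passed straight through. */
static bool setup_bool(int vector, Error **errp)
{
    return add_route_bool(vector, errp);
}
```

With the void style, a caller that passes errp straight through cannot
tell whether the callee failed (errp may be NULL), hence the local
variable.  With the bool style the return value carries that
information.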

Thanks!

[...]


* Re: [Qemu-devel] [PATCH 27/38] ivshmem: Simplify how we cope with short reads from server
  2016-03-02 18:41   ` Marc-André Lureau
@ 2016-03-02 19:38     ` Markus Armbruster
  0 siblings, 0 replies; 118+ messages in thread
From: Markus Armbruster @ 2016-03-02 19:38 UTC (permalink / raw)
  To: Marc-André Lureau
  Cc: Paolo Bonzini, cam, Claudio Fontana, QEMU, David Marchand

Marc-André Lureau <marcandre.lureau@gmail.com> writes:

> Hi
>
> On Mon, Feb 29, 2016 at 7:40 PM, Markus Armbruster <armbru@redhat.com> wrote:
>> Short reads from a UNIX domain socket are exceedingly unlikely when
>> the other side always sends eight bytes and we always read eight
>> bytes.  We cope with them anyway.  However, the code doing that is
>> rather convoluted.  Dumb it down radically.
>>
>> Replace the convoluted code
>
> agreed
>
>>
>> Signed-off-by: Markus Armbruster <armbru@redhat.com>
>> ---
>>  hw/misc/ivshmem.c | 76 ++++++++++++-------------------------------------------
>>  1 file changed, 16 insertions(+), 60 deletions(-)
>>
>> diff --git a/hw/misc/ivshmem.c b/hw/misc/ivshmem.c
>> index e578b8a..fb8a4f7 100644
>> --- a/hw/misc/ivshmem.c
>> +++ b/hw/misc/ivshmem.c
>> @@ -26,7 +26,6 @@
>>  #include "migration/migration.h"
>>  #include "qemu/error-report.h"
>>  #include "qemu/event_notifier.h"
>> -#include "qemu/fifo8.h"
>>  #include "sysemu/char.h"
>>  #include "sysemu/hostmem.h"
>>  #include "qapi/visitor.h"
>> @@ -80,7 +79,6 @@ typedef struct IVShmemState {
>>      uint32_t intrstatus;
>>
>>      CharDriverState *server_chr;
>> -    Fifo8 incoming_fifo;
>>      MemoryRegion ivshmem_mmio;
>>
>>      /* We might need to register the BAR before we actually have the memory.
>> @@ -99,6 +97,8 @@ typedef struct IVShmemState {
>>      uint32_t vectors;
>>      uint32_t features;
>>      MSIVector *msi_vectors;
>> +    uint64_t msg_buf;           /* buffer for receiving server messages */
>> +    int msg_buffered_bytes;     /* #bytes in @msg_buf */
>>
>>      Error *migration_blocker;
>>
>> @@ -255,11 +255,6 @@ static const MemoryRegionOps ivshmem_mmio_ops = {
>>      },
>>  };
>>
>> -static int ivshmem_can_receive(void * opaque)
>> -{
>> -    return sizeof(int64_t);
>> -}
>> -
>>  static void ivshmem_vector_notify(void *opaque)
>>  {
>>      MSIVector *entry = opaque;
>> @@ -459,53 +454,6 @@ static void resize_peers(IVShmemState *s, int nb_peers)
>>      }
>>  }
>>
>> -static bool fifo_update_and_get(IVShmemState *s, const uint8_t *buf, int size,
>> -                                void *data, size_t len)
>> -{
>> -    const uint8_t *p;
>> -    uint32_t num;
>> -
>> -    assert(len <= sizeof(int64_t)); /* limitation of the fifo */
>> -    if (fifo8_is_empty(&s->incoming_fifo) && size == len) {
>> -        memcpy(data, buf, size);
>> -        return true;
>> -    }
>> -
>> -    IVSHMEM_DPRINTF("short read of %d bytes\n", size);
>> -
>> -    num = MIN(size, sizeof(int64_t) - fifo8_num_used(&s->incoming_fifo));
>> -    fifo8_push_all(&s->incoming_fifo, buf, num);
>> -
>> -    if (fifo8_num_used(&s->incoming_fifo) < len) {
>> -        assert(num == 0);
>> -        return false;
>> -    }
>> -
>> -    size -= num;
>> -    buf += num;
>> -    p = fifo8_pop_buf(&s->incoming_fifo, len, &num);
>> -    assert(num == len);
>> -
>> -    memcpy(data, p, len);
>> -
>> -    if (size > 0) {
>> -        fifo8_push_all(&s->incoming_fifo, buf, size);
>> -    }
>> -
>> -    return true;
>> -}
>> -
>> -static bool fifo_update_and_get_i64(IVShmemState *s,
>> -                                    const uint8_t *buf, int size, int64_t *i64)
>> -{
>> -    if (fifo_update_and_get(s, buf, size, i64, sizeof(*i64))) {
>> -        *i64 = GINT64_FROM_LE(*i64);
>> -        return true;
>> -    }
>> -
>> -    return false;
>> -}
>> -
>>  static void ivshmem_add_kvm_msi_virq(IVShmemState *s, int vector,
>>                                       Error **errp)
>>  {
>> @@ -658,6 +606,14 @@ static void process_msg(IVShmemState *s, int64_t msg, int fd, Error **errp)
>>      }
>>  }
>>
>> +static int ivshmem_can_receive(void *opaque)
>> +{
>> +    IVShmemState *s = opaque;
>> +
>> +    assert(s->msg_buffered_bytes < sizeof(s->msg_buf));
>> +    return sizeof(s->msg_buf) - s->msg_buffered_bytes;
>> +}
>> +
>>  static void ivshmem_read(void *opaque, const uint8_t *buf, int size)
>>  {
>>      IVShmemState *s = opaque;
>> @@ -665,8 +621,12 @@ static void ivshmem_read(void *opaque, const uint8_t *buf, int size)
>>      int incoming_fd;
>>      int64_t incoming_posn;
>>
>> -    if (!fifo_update_and_get_i64(s, buf, size, &incoming_posn)) {
>> -        return;
>> +    assert(size >= 0 && s->msg_buffered_bytes + size <= sizeof(s->msg_buf));
>> +    memcpy((unsigned char *)&s->msg_buf + s->msg_buffered_bytes, buf, size);
>> +    s->msg_buffered_bytes += size;
>> +    if (s->msg_buffered_bytes == sizeof(s->msg_buf)) {
>> +        incoming_posn = le64_to_cpu(s->msg_buf);
>> +        s->msg_buffered_bytes = 0;
>>      }
>>
>
> missing "else return" though.

Indeed.  Glad you caught my screwup.
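
To spell out the intended control flow, here is a minimal standalone
sketch of the accumulate-until-complete logic with the early return in
place.  The MsgReceiver type and the receiver_*() names are hypothetical
and exist only for this sketch; it mirrors the idea of msg_buf and
msg_buffered_bytes, not the exact code that will go into the respin.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical receive state, modelled on msg_buf / msg_buffered_bytes. */
typedef struct MsgReceiver {
    uint8_t buf[8];         /* one 8-byte little-endian message */
    int buffered_bytes;     /* #bytes currently in @buf */
} MsgReceiver;

/* How many more bytes we are prepared to accept right now. */
static int receiver_can_receive(const MsgReceiver *r)
{
    assert(r->buffered_bytes < (int)sizeof(r->buf));
    return (int)sizeof(r->buf) - r->buffered_bytes;
}

/*
 * Feed @size bytes.  Returns true and stores the decoded message in
 * @msg once all eight bytes have arrived, false while it is incomplete.
 */
static bool receiver_feed(MsgReceiver *r, const uint8_t *data, int size,
                          int64_t *msg)
{
    uint64_t v = 0;
    int i;

    assert(size >= 0 && size <= receiver_can_receive(r));
    memcpy(r->buf + r->buffered_bytes, data, size);
    r->buffered_bytes += size;

    if (r->buffered_bytes < (int)sizeof(r->buf)) {
        return false;       /* the "else return": short read, wait for more */
    }

    /* Decode little-endian without depending on host byte order. */
    for (i = 7; i >= 0; i--) {
        v = (v << 8) | r->buf[i];
    }
    r->buffered_bytes = 0;
    *msg = (int64_t)v;
    return true;
}
```

Without the early return, an incomplete message would fall through and
be processed with an uninitialized value.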

>>      incoming_fd = qemu_chr_fe_get_msgfd(s->server_chr);
>> @@ -1019,8 +979,6 @@ static void pci_ivshmem_realize(PCIDevice *dev, Error **errp)
>>          }
>>      }
>>
>> -    fifo8_create(&s->incoming_fifo, sizeof(int64_t));
>> -
>>      if (s->role_val == IVSHMEM_PEER) {
>>          error_setg(&s->migration_blocker,
>>                     "Migration is disabled when using feature 'peer mode' in device 'ivshmem'");
>> @@ -1033,8 +991,6 @@ static void pci_ivshmem_exit(PCIDevice *dev)
>>      IVShmemState *s = IVSHMEM(dev);
>>      int i;
>>
>> -    fifo8_destroy(&s->incoming_fifo);
>> -
>>      if (s->migration_blocker) {
>>          migrate_del_blocker(s->migration_blocker);
>>          error_free(s->migration_blocker);
>> --
>> 2.4.3
>>
>>


* Re: [Qemu-devel] [PATCH 33/38] ivshmem: Replace int role_val by OnOffAuto master
  2016-03-02 18:56   ` Marc-André Lureau
@ 2016-03-02 19:39     ` Markus Armbruster
  0 siblings, 0 replies; 118+ messages in thread
From: Markus Armbruster @ 2016-03-02 19:39 UTC (permalink / raw)
  To: Marc-André Lureau
  Cc: Paolo Bonzini, cam, Claudio Fontana, QEMU, David Marchand

Marc-André Lureau <marcandre.lureau@gmail.com> writes:

> Hi
>
> On Mon, Feb 29, 2016 at 7:40 PM, Markus Armbruster <armbru@redhat.com> wrote:
>> In preparation of making it a qdev property.
>>
>> Signed-off-by: Markus Armbruster <armbru@redhat.com>
>> --
>>  hw/misc/ivshmem.c | 31 +++++++++++++++++++------------
>>  1 file changed, 19 insertions(+), 12 deletions(-)
>>
>> diff --git a/hw/misc/ivshmem.c b/hw/misc/ivshmem.c
>> index 785ed1c..b39ea27 100644
>> --- a/hw/misc/ivshmem.c
>> +++ b/hw/misc/ivshmem.c
>> @@ -43,9 +43,6 @@
>>  #define IVSHMEM_IOEVENTFD   0
>>  #define IVSHMEM_MSI     1
>>
>> -#define IVSHMEM_PEER    0
>> -#define IVSHMEM_MASTER  1
>> -
>>  #define IVSHMEM_REG_BAR_SIZE 0x100
>>
>>  #define IVSHMEM_DEBUG 0
>> @@ -96,12 +93,12 @@ typedef struct IVShmemState {
>>      uint64_t msg_buf;           /* buffer for receiving server messages */
>>      int msg_buffered_bytes;     /* #bytes in @msg_buf */
>>
>> +    OnOffAuto master;
>>      Error *migration_blocker;
>>
>>      char * shmobj;
>>      char * sizearg;
>>      char * role;
>> -    int role_val;   /* scalar to avoid multiple string comparisons */
>>  } IVShmemState;
>>
>>  /* registers for the Inter-VM shared memory device */
>> @@ -117,6 +114,12 @@ static inline uint32_t ivshmem_has_feature(IVShmemState *ivs,
>>      return (ivs->features & (1 << feature));
>>  }
>>
>> +static inline bool ivshmem_is_master(IVShmemState *s)
>> +{
>> +    assert(s->master != ON_OFF_AUTO_AUTO);
>> +    return s->master == ON_OFF_AUTO_ON;
>> +}
>> +
>>  static void ivshmem_update_irq(IVShmemState *s)
>>  {
>>      PCIDevice *d = PCI_DEVICE(s);
>> @@ -861,15 +864,15 @@ static void pci_ivshmem_realize(PCIDevice *dev, Error **errp)
>>      /* check that role is reasonable */
>>      if (s->role) {
>>          if (strncmp(s->role, "peer", 5) == 0) {
>> -            s->role_val = IVSHMEM_PEER;
>> +            s->master = ON_OFF_AUTO_OFF;
>>          } else if (strncmp(s->role, "master", 7) == 0) {
>> -            s->role_val = IVSHMEM_MASTER;
>> +            s->master = ON_OFF_AUTO_ON;
>>          } else {
>>              error_setg(errp, "'role' must be 'peer' or 'master'");
>>              return;
>>          }
>>      } else {
>> -        s->role_val = IVSHMEM_MASTER; /* default */
>> +        s->master = ON_OFF_AUTO_AUTO;
>>      }
>>
>>      pci_conf = dev->config;
>> @@ -931,7 +934,11 @@ static void pci_ivshmem_realize(PCIDevice *dev, Error **errp)
>>      vmstate_register_ram(s->ivshmem_bar2, DEVICE(s));
>>      pci_register_bar(PCI_DEVICE(s), 2, attr, s->ivshmem_bar2);
>>
>> -    if (s->role_val == IVSHMEM_PEER) {
>> +    if (s->master == ON_OFF_AUTO_AUTO) {
>> +        s->master = s->vm_id == 0 ? ON_OFF_AUTO_ON : ON_OFF_AUTO_OFF;
>> +    }
>> +
>> +    if (ivshmem_is_master(s)) {
>
> !ivshmem_is_master() instead, or ivshmem_is_peer().

Another stupid mistake...

>>          error_setg(&s->migration_blocker,
>>                     "Migration is disabled when using feature 'peer mode' in device 'ivshmem'");

Note to self: improve this message while there.

>>          migrate_add_blocker(s->migration_blocker);
>> @@ -993,7 +1000,7 @@ static int ivshmem_pre_load(void *opaque)
>>  {
>>      IVShmemState *s = opaque;
>>
>> -    if (s->role_val == IVSHMEM_PEER) {
>> +    if (ivshmem_is_master(s)) {
>
> same here

Yup.  Thanks!
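
For clarity, the corrected logic as I understand it, as a standalone
sketch.  ToyOnOffAuto and the function names below are hypothetical
stand-ins for this illustration only, not the real QEMU definitions:
"auto" resolves to master exactly when the ID is zero, and only
non-masters (peers) block migration.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for a tri-state on/off/auto setting. */
typedef enum { TOY_AUTO, TOY_ON, TOY_OFF } ToyOnOffAuto;

/* Resolve "auto": the device with ID 0 becomes master, others peers. */
static ToyOnOffAuto resolve_master(ToyOnOffAuto master, int vm_id)
{
    if (master == TOY_AUTO) {
        return vm_id == 0 ? TOY_ON : TOY_OFF;
    }
    return master;
}

/* Only valid after "auto" has been resolved. */
static bool is_master(ToyOnOffAuto master)
{
    assert(master != TOY_AUTO);
    return master == TOY_ON;
}

/* Peers must not migrate, so the check needs the negation. */
static bool migration_blocked(ToyOnOffAuto master, int vm_id)
{
    return !is_master(resolve_master(master, vm_id));
}
```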

>>          error_report("'peer' devices are not migratable");
>>          return -EINVAL;
>>      }
>> @@ -1019,9 +1026,9 @@ static int ivshmem_load_old(QEMUFile *f, void *opaque, int version_id)
>>          return -EINVAL;
>>      }
>>
>> -    if (s->role_val == IVSHMEM_PEER) {
>> -        error_report("'peer' devices are not migratable");
>> -        return -EINVAL;
>> +    ret = ivshmem_pre_load(s);
>> +    if (ret) {
>> +        return ret;
>>      }
>>
>>      ret = pci_device_load(pdev, f);
>> --
>> 2.4.3
>>
>>


* Re: [Qemu-devel] [PATCH 22/38] ivshmem: Plug leaks on unplug, fix peer disconnect
  2016-03-02 19:19     ` Markus Armbruster
@ 2016-03-02 23:52       ` Marc-André Lureau
  0 siblings, 0 replies; 118+ messages in thread
From: Marc-André Lureau @ 2016-03-02 23:52 UTC (permalink / raw)
  To: Markus Armbruster
  Cc: Paolo Bonzini, cam, Claudio Fontana, QEMU, David Marchand

On Wed, Mar 2, 2016 at 8:19 PM, Markus Armbruster <armbru@redhat.com> wrote:
> When called from process_msg_disconnect(): invalid as long as
> ivshmem-spec.txt doesn't assign a sane meaning to it.  Let's make it an
> error there, okay?


Sounds fine to me too

thanks

-- 
Marc-André Lureau


* Re: [Qemu-devel] [PATCH 24/38] ivshmem: Propagate errors through ivshmem_recv_setup()
  2016-03-02 19:35     ` Markus Armbruster
@ 2016-03-03  0:03       ` Marc-André Lureau
  2016-03-03  7:16         ` Markus Armbruster
  0 siblings, 1 reply; 118+ messages in thread
From: Marc-André Lureau @ 2016-03-03  0:03 UTC (permalink / raw)
  To: Markus Armbruster
  Cc: Paolo Bonzini, cam, Claudio Fontana, QEMU, David Marchand

Hi

On Wed, Mar 2, 2016 at 8:35 PM, Markus Armbruster <armbru@redhat.com> wrote:
> You know, I'd prefer that, too, and I've argued for it unsuccessfully.
> As it is, we fairly consistently return void when the function returns
> errors through Error ** and has no non-error value.

Good to know we are on the same side.

>>>  {
>>>      PCIDevice *pdev = PCI_DEVICE(s);
>>>      MSIMessage msg = msix_get_message(pdev, vector);
>>> @@ -522,22 +518,21 @@ static int ivshmem_add_kvm_msi_virq(IVShmemState *s, int vector)
>>>
>>>      ret = kvm_irqchip_add_msi_route(kvm_state, msg, pdev);
>>>      if (ret < 0) {
>>> -        error_report("ivshmem: kvm_irqchip_add_msi_route failed");
>>> -        return -1;
>>> +        error_setg(errp, "kvm_irqchip_add_msi_route failed");
>>> +        return;
>>>      }
>>>
>>>      s->msi_vectors[vector].virq = ret;
>>>      s->msi_vectors[vector].pdev = pdev;
>>> -
>>> -    return 0;
>>>  }
>>>
>>> -static void setup_interrupt(IVShmemState *s, int vector)
>>> +static void setup_interrupt(IVShmemState *s, int vector, Error **errp)
>>>  {
>>>      EventNotifier *n = &s->peers[s->vm_id].eventfds[vector];
>>>      bool with_irqfd = kvm_msi_via_irqfd_enabled() &&
>>>          ivshmem_has_feature(s, IVSHMEM_MSI);
>>>      PCIDevice *pdev = PCI_DEVICE(s);
>>> +    Error *err = NULL;
>>>
>>>      IVSHMEM_DPRINTF("setting up interrupt for vector: %d\n", vector);
>>>
>>> @@ -546,13 +541,16 @@ static void setup_interrupt(IVShmemState *s, int vector)
>>>          watch_vector_notifier(s, n, vector);
>>>      } else if (msix_enabled(pdev)) {
>>>          IVSHMEM_DPRINTF("with irqfd\n");
>>> -        if (ivshmem_add_kvm_msi_virq(s, vector) < 0) {
>>> +        ivshmem_add_kvm_msi_virq(s, vector, &err);
>>> +        if (err) {
>>> +            error_propagate(errp, err);
>>>              return;
>>
>> That would make this simpler, avoiding local err variables, and
>> propagate. But you seem to prefer that form. I don't know if there is
>> any conventions (I am used to glib conventions, and usually a bool
>> success is returned, even if the function takes a GError)
>
> Does GLib spell out this convention somewhere?

For glib, there is a paragraph about the bool return / GError convention
(which is usually adapted to other return types):
https://developer.gnome.org/glib/unstable/glib-Error-Reporting.html

>
> I can perhaps try to cook up a patch to demonstrate the advantages of
> returning a success/failure value even with Error **, and try to get our
> convention changed.
>
> Until then, we better stick to the existing convention, whether we like
> it or not.

ok




-- 
Marc-André Lureau


* Re: [Qemu-devel] [PATCH 24/38] ivshmem: Propagate errors through ivshmem_recv_setup()
  2016-03-03  0:03       ` Marc-André Lureau
@ 2016-03-03  7:16         ` Markus Armbruster
  0 siblings, 0 replies; 118+ messages in thread
From: Markus Armbruster @ 2016-03-03  7:16 UTC (permalink / raw)
  To: Marc-André Lureau
  Cc: Paolo Bonzini, cam, Claudio Fontana, QEMU, David Marchand

Marc-André Lureau <marcandre.lureau@gmail.com> writes:

> Hi
>
> On Wed, Mar 2, 2016 at 8:35 PM, Markus Armbruster <armbru@redhat.com> wrote:
>> You know, I'd prefer that, too, and I've argued for it unsuccessfully.
>> As it is, we fairly consistently return void when the function returns
>> errors through Error ** and has no non-error value.
>
> Good to know we are on the same side.
>
>>>>  {
>>>>      PCIDevice *pdev = PCI_DEVICE(s);
>>>>      MSIMessage msg = msix_get_message(pdev, vector);
>>>> @@ -522,22 +518,21 @@ static int ivshmem_add_kvm_msi_virq(IVShmemState *s, int vector)
>>>>
>>>>      ret = kvm_irqchip_add_msi_route(kvm_state, msg, pdev);
>>>>      if (ret < 0) {
>>>> -        error_report("ivshmem: kvm_irqchip_add_msi_route failed");
>>>> -        return -1;
>>>> +        error_setg(errp, "kvm_irqchip_add_msi_route failed");
>>>> +        return;
>>>>      }
>>>>
>>>>      s->msi_vectors[vector].virq = ret;
>>>>      s->msi_vectors[vector].pdev = pdev;
>>>> -
>>>> -    return 0;
>>>>  }
>>>>
>>>> -static void setup_interrupt(IVShmemState *s, int vector)
>>>> +static void setup_interrupt(IVShmemState *s, int vector, Error **errp)
>>>>  {
>>>>      EventNotifier *n = &s->peers[s->vm_id].eventfds[vector];
>>>>      bool with_irqfd = kvm_msi_via_irqfd_enabled() &&
>>>>          ivshmem_has_feature(s, IVSHMEM_MSI);
>>>>      PCIDevice *pdev = PCI_DEVICE(s);
>>>> +    Error *err = NULL;
>>>>
>>>>      IVSHMEM_DPRINTF("setting up interrupt for vector: %d\n", vector);
>>>>
>>>> @@ -546,13 +541,16 @@ static void setup_interrupt(IVShmemState *s, int vector)
>>>>          watch_vector_notifier(s, n, vector);
>>>>      } else if (msix_enabled(pdev)) {
>>>>          IVSHMEM_DPRINTF("with irqfd\n");
>>>> -        if (ivshmem_add_kvm_msi_virq(s, vector) < 0) {
>>>> +        ivshmem_add_kvm_msi_virq(s, vector, &err);
>>>> +        if (err) {
>>>> +            error_propagate(errp, err);
>>>>              return;
>>>
>>> That would make this simpler, avoiding local err variables, and
>>> propagate. But you seem to prefer that form. I don't know if there is
>>> any conventions (I am used to glib conventions, and usually a bool
>>> success is returned, even if the function takes a GError)
>>
>> Does GLib spell out this convention somewhere?
>
> For glib, there is a paragraph about the bool return / GError convention
> (which is usually adapted to other return types):
> https://developer.gnome.org/glib/unstable/glib-Error-Reporting.html

While I can't see a hard-and-fast rule there, the text clearly shows a
strong preference for making the function value a reliable error
indicator whenever possible.

Thanks!

>>
>> I can perhaps try to cook up a patch to demonstrate the advantages of
>> returning a success/failure value even with Error **, and try to get our
>> convention changed.
>>
>> Until then, we better stick to the existing convention, whether we like
>> it or not.
>
> ok


* Re: [Qemu-devel] [PATCH 34/38] ivshmem: Split ivshmem-plain, ivshmem-doorbell off ivshmem
  2016-02-29 18:40 ` [Qemu-devel] [PATCH 34/38] ivshmem: Split ivshmem-plain, ivshmem-doorbell off ivshmem Markus Armbruster
@ 2016-03-03 13:53   ` Marc-André Lureau
  0 siblings, 0 replies; 118+ messages in thread
From: Marc-André Lureau @ 2016-03-03 13:53 UTC (permalink / raw)
  To: Markus Armbruster
  Cc: Paolo Bonzini, cam, Claudio Fontana, QEMU, David Marchand

Hi

On Mon, Feb 29, 2016 at 7:40 PM, Markus Armbruster <armbru@redhat.com> wrote:
> ivshmem can be configured with and without interrupt capability
> (a.k.a. "doorbell").  The two configurations have largely disjoint
> options, which makes for a confusing (and badly checked) user
> interface.  Moreover, the device can't tell the guest whether its
> doorbell is enabled.
>
> Create two new device models ivshmem-plain and ivshmem-doorbell, and
> deprecate the old one.
>
> Changes from ivshmem:
>
> * PCI revision is 1 instead of 0.  The new revision is fully backwards
>   compatible for guests.  Guests may elect to require at least
>   revision 1 to make sure they're not exposed to the funny "no shared
>   memory, yet" state.
>
> * Property "role" replaced by "master".  role=master becomes
>   master=on, role=peer becomes master=off.  Default is off instead of
>   auto.
>
> * Property "use64" is gone.  The new devices always have 64 bit BARs.
>
> Changes from ivshmem to ivshmem-plain:
>
> * The Interrupt Pin register in PCI config space is zero (does not use
>   an interrupt pin) instead of one (uses INTA).
>
> * Property "x-memdev" is renamed to "memdev".
>
> * Properties "shm" and "size" are gone.  Use property "memdev"
>   instead.
>
> * Property "msi" is gone.  The new device can't have MSI-X capability.
>   It can't interrupt anyway.
>
> * Properties "ioeventfd" and "vectors" are gone.  They're meaningless
>   without interrupts anyway.
>
> Changes from ivshmem to ivshmem-doorbell:
>
> * Property "msi" is gone.  The new device always has MSI-X capability.
>
> * Property "ioeventfd" defaults to on instead of off.
>
> * Property "size" is gone.  The new device can only map all the shared
>   memory received from the server.
>
> Guests can easily find out whether the device is configured for
> interrupts by checking for MSI-X capability.
>
> Note: some code added in sub-optimal places to make the diff easier to
> review.  The next commit will move it to more sensible places.
>
> Signed-off-by: Markus Armbruster <armbru@redhat.com>
> ---

This is a large patch that could perhaps be split, but I didn't see
any issue with it, so:
Reviewed-by: Marc-André Lureau <marcandre.lureau@redhat.com>
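
On the "checking for MSI-X capability" point in the commit message: a
guest-side check could look roughly like the sketch below.  It walks
the standard PCI capability list in a copy of the first 256 bytes of
config space.  The register offsets and the MSI-X capability ID (0x11)
are the standard PCI ones; has_msix_capability() and the way the config
space copy is obtained are assumptions for illustration, not an
existing driver API.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define PCI_STATUS            0x06  /* status register, low byte */
#define PCI_STATUS_CAP_LIST   0x10  /* bit 4: capability list present */
#define PCI_CAPABILITY_LIST   0x34  /* offset of first capability pointer */
#define PCI_CAP_ID_MSIX       0x11  /* MSI-X capability ID */

/*
 * Hypothetical helper: returns true if the config space copy @config
 * advertises an MSI-X capability.
 */
static bool has_msix_capability(const uint8_t config[256])
{
    uint8_t pos;
    int guard;

    assert(config != NULL);

    if (!(config[PCI_STATUS] & PCI_STATUS_CAP_LIST)) {
        return false;               /* no capability list at all */
    }

    pos = config[PCI_CAPABILITY_LIST] & 0xfc;
    /* The guard bounds the walk in case of a malformed, looping list. */
    for (guard = 0; pos >= 0x40 && guard < 48; guard++) {
        if (config[pos] == PCI_CAP_ID_MSIX) {
            return true;
        }
        pos = config[pos + 1] & 0xfc;   /* next capability pointer */
    }
    return false;
}
```

Under the split described above, this would be expected to return true
for ivshmem-doorbell and false for ivshmem-plain.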


>  docs/specs/ivshmem-spec.txt |  66 +++++-----
>  hw/misc/ivshmem.c           | 310 ++++++++++++++++++++++++++++++++------------
>  qemu-doc.texi               |  33 ++---
>  tests/ivshmem-test.c        |  12 +-
>  4 files changed, 288 insertions(+), 133 deletions(-)
>
> diff --git a/docs/specs/ivshmem-spec.txt b/docs/specs/ivshmem-spec.txt
> index 3eb8c97..047dfc0 100644
> --- a/docs/specs/ivshmem-spec.txt
> +++ b/docs/specs/ivshmem-spec.txt
> @@ -17,9 +17,10 @@ get interrupted by its peers.
>
>  There are two basic configurations:
>
> -- Just shared memory: -device ivshmem,shm=NAME,...
> +- Just shared memory: -device ivshmem-plain,memdev=HMB,...
>
> -  This uses shared memory object NAME.
> +  This uses host memory backend HMB.  It should have option "share"
> +  set.
>
>  - Shared memory plus interrupts: -device ivshmem,chardev=CHR,vectors=N,...
>
> @@ -30,9 +31,8 @@ There are two basic configurations:
>    Each peer gets assigned a unique ID by the server.  IDs must be
>    between 0 and 65535.
>
> -  Interrupts are message-signaled by default (MSI-X).  With msi=off
> -  the device has no MSI-X capability, and uses legacy INTx instead.
> -  vectors=N configures the number of vectors to use.
> +  Interrupts are message-signaled (MSI-X).  vectors=N configures the
> +  number of vectors to use.
>
>  For more details on ivshmem device properties, see The QEMU Emulator
>  User Documentation (qemu-doc.*).
> @@ -40,14 +40,15 @@ User Documentation (qemu-doc.*).
>
>  == The ivshmem PCI device's guest interface ==
>
> -The device has vendor ID 1af4, device ID 1110, revision 0.
> +The device has vendor ID 1af4, device ID 1110, revision 1.  Before
> +QEMU 2.6.0, it had revision 0.
>
>  === PCI BARs ===
>
>  The ivshmem PCI device has two or three BARs:
>
>  - BAR0 holds device registers (256 Byte MMIO)
> -- BAR1 holds MSI-X table and PBA (only when using MSI-X)
> +- BAR1 holds MSI-X table and PBA (only ivshmem-doorbell)
>  - BAR2 maps the shared memory object
>
>  There are two ways to use this device:
> @@ -58,18 +59,19 @@ There are two ways to use this device:
>    user space (see http://dpdk.org/browse/memnic).
>
>  - If you additionally need the capability for peers to interrupt each
> -  other, you need BAR0 and, if using MSI-X, BAR1.  You will most
> -  likely want to write a kernel driver to handle interrupts.  Requires
> -  the device to be configured for interrupts, obviously.
> +  other, you need BAR0 and BAR1.  You will most likely want to write a
> +  kernel driver to handle interrupts.  Requires the device to be
> +  configured for interrupts, obviously.
>
>  Before QEMU 2.6.0, BAR2 can initially be invalid if the device is
>  configured for interrupts.  It becomes safely accessible only after
> -the ivshmem server provided the shared memory.  Guest software should
> -wait for the IVPosition register (described below) to become
> -non-negative before accessing BAR2.
> +the ivshmem server provided the shared memory.  These devices have PCI
> +revision 0 rather than 1.  Guest software should wait for the
> +IVPosition register (described below) to become non-negative before
> +accessing BAR2.
>
> -The device is not capable to tell guest software whether it is
> -configured for interrupts.
> +Revision 0 of the device is not capable to tell guest software whether
> +it is configured for interrupts.
>
>  === PCI device registers ===
>
> @@ -77,10 +79,12 @@ BAR 0 contains the following registers:
>
>      Offset  Size  Access      On reset  Function
>          0     4   read/write        0   Interrupt Mask
> -                                        bit 0: peer interrupt
> +                                        bit 0: peer interrupt (rev 0)
> +                                               reserved       (rev 1)
>                                          bit 1..31: reserved
>          4     4   read/write        0   Interrupt Status
> -                                        bit 0: peer interrupt
> +                                        bit 0: peer interrupt (rev 0)
> +                                               reserved       (rev 1)
>                                          bit 1..31: reserved
>          8     4   read-only   0 or ID   IVPosition
>         12     4   write-only      N/A   Doorbell
> @@ -92,18 +96,18 @@ Software should only access the registers as specified in column
>  "Access".  Reserved bits should be ignored on read, and preserved on
>  write.
>
> -Interrupt Status and Mask Register together control the legacy INTx
> -interrupt when the device has no MSI-X capability: INTx is asserted
> -when the bit-wise AND of Status and Mask is non-zero and the device
> -has no MSI-X capability.  Interrupt Status Register bit 0 becomes 1
> -when an interrupt request from a peer is received.  Reading the
> -register clears it.
> +In revision 0 of the device, Interrupt Status and Mask Register
> +together control the legacy INTx interrupt when the device has no
> +MSI-X capability: INTx is asserted when the bit-wise AND of Status and
> +Mask is non-zero and the device has no MSI-X capability.  Interrupt
> +Status Register bit 0 becomes 1 when an interrupt request from a peer
> +is received.  Reading the register clears it.
>
>  IVPosition Register: if the device is not configured for interrupts,
>  this is zero.  Else, it is the device's ID (between 0 and 65535).
>
>  Before QEMU 2.6.0, the register may read -1 for a short while after
> -reset.
> +reset.  These devices have PCI revision 0 rather than 1.
>
>  There is no good way for software to find out whether the device is
>  configured for interrupts.  A positive IVPosition means interrupts,
> @@ -124,14 +128,14 @@ interrupt vectors connected, the write is ignored.  The device is not
>  capable to tell guest software what peers are connected, or how many
>  interrupt vectors are connected.
>
> -If the peer doesn't use MSI-X, its Interrupt Status register is set to
> -1.  This asserts INTx unless masked by the Interrupt Mask register.
> -The device is not capable to communicate the interrupt vector to guest
> -software then.
> +The peer's interrupt for this vector then becomes pending.  There is
> +no way for software to clear the pending bit, and a polling mode of
> +operation is therefore impossible.
>
> -If the peer uses MSI-X, the interrupt for this vector becomes pending.
> -There is no way for software to clear the pending bit, and a polling
> -mode of operation is therefore impossible with MSI-X.
> +If the peer is a revision 0 device without MSI-X capability, its
> +Interrupt Status register is set to 1.  This asserts INTx unless
> +masked by the Interrupt Mask register.  The device is not capable to
> +communicate the interrupt vector to guest software then.
>
>  With multiple MSI-X vectors, different vectors can be used to indicate
>  different events have occurred.  The semantics of interrupt vectors
> diff --git a/hw/misc/ivshmem.c b/hw/misc/ivshmem.c
> index b39ea27..f7f5b3b 100644
> --- a/hw/misc/ivshmem.c
> +++ b/hw/misc/ivshmem.c
> @@ -29,6 +29,7 @@
>  #include "qom/object_interfaces.h"
>  #include "sysemu/char.h"
>  #include "sysemu/hostmem.h"
> +#include "sysemu/qtest.h"
>  #include "qapi/visitor.h"
>  #include "exec/ram_addr.h"
>
> @@ -53,6 +54,18 @@
>          }                                               \
>      } while (0)
>
> +#define TYPE_IVSHMEM_COMMON "ivshmem-common"
> +#define IVSHMEM_COMMON(obj) \
> +    OBJECT_CHECK(IVShmemState, (obj), TYPE_IVSHMEM_COMMON)
> +
> +#define TYPE_IVSHMEM_PLAIN "ivshmem-plain"
> +#define IVSHMEM_PLAIN(obj) \
> +    OBJECT_CHECK(IVShmemState, (obj), TYPE_IVSHMEM_PLAIN)
> +
> +#define TYPE_IVSHMEM_DOORBELL "ivshmem-doorbell"
> +#define IVSHMEM_DOORBELL(obj) \
> +    OBJECT_CHECK(IVShmemState, (obj), TYPE_IVSHMEM_DOORBELL)
> +
>  #define TYPE_IVSHMEM "ivshmem"
>  #define IVSHMEM(obj) \
>      OBJECT_CHECK(IVShmemState, (obj), TYPE_IVSHMEM)
> @@ -80,8 +93,6 @@ typedef struct IVShmemState {
>      MemoryRegion ivshmem_mmio;
>
>      MemoryRegion *ivshmem_bar2; /* BAR 2 (shared memory) */
> -    size_t ivshmem_size; /* size of shared memory region */
> -    uint32_t ivshmem_64bit;
>
>      Peer *peers;
>      int nb_peers;               /* space in @peers[] */
> @@ -96,9 +107,12 @@ typedef struct IVShmemState {
>      OnOffAuto master;
>      Error *migration_blocker;
>
> -    char * shmobj;
> -    char * sizearg;
> -    char * role;
> +    /* legacy cruft */
> +    char *role;
> +    char *shmobj;
> +    char *sizearg;
> +    size_t legacy_size;
> +    uint32_t not_legacy_32bit;
>  } IVShmemState;
>
>  /* registers for the Inter-VM shared memory device */
> @@ -258,7 +272,7 @@ static void ivshmem_vector_notify(void *opaque)
>  {
>      MSIVector *entry = opaque;
>      PCIDevice *pdev = entry->pdev;
> -    IVShmemState *s = IVSHMEM(pdev);
> +    IVShmemState *s = IVSHMEM_COMMON(pdev);
>      int vector = entry - s->msi_vectors;
>      EventNotifier *n = &s->peers[s->vm_id].eventfds[vector];
>
> @@ -279,7 +293,7 @@ static void ivshmem_vector_notify(void *opaque)
>  static int ivshmem_vector_unmask(PCIDevice *dev, unsigned vector,
>                                   MSIMessage msg)
>  {
> -    IVShmemState *s = IVSHMEM(dev);
> +    IVShmemState *s = IVSHMEM_COMMON(dev);
>      EventNotifier *n = &s->peers[s->vm_id].eventfds[vector];
>      MSIVector *v = &s->msi_vectors[vector];
>      int ret;
> @@ -296,7 +310,7 @@ static int ivshmem_vector_unmask(PCIDevice *dev, unsigned vector,
>
>  static void ivshmem_vector_mask(PCIDevice *dev, unsigned vector)
>  {
> -    IVShmemState *s = IVSHMEM(dev);
> +    IVShmemState *s = IVSHMEM_COMMON(dev);
>      EventNotifier *n = &s->peers[s->vm_id].eventfds[vector];
>      int ret;
>
> @@ -313,7 +327,7 @@ static void ivshmem_vector_poll(PCIDevice *dev,
>                                  unsigned int vector_start,
>                                  unsigned int vector_end)
>  {
> -    IVShmemState *s = IVSHMEM(dev);
> +    IVShmemState *s = IVSHMEM_COMMON(dev);
>      unsigned int vector;
>
>      IVSHMEM_DPRINTF("vector poll %p %d-%d\n", dev, vector_start, vector_end);
> @@ -460,6 +474,7 @@ static void setup_interrupt(IVShmemState *s, int vector, Error **errp)
>  static void process_msg_shmem(IVShmemState *s, int fd, Error **errp)
>  {
>      struct stat buf;
> +    size_t size;
>      void *ptr;
>
>      if (s->ivshmem_bar2) {
> @@ -475,15 +490,21 @@ static void process_msg_shmem(IVShmemState *s, int fd, Error **errp)
>          return;
>      }
>
> -    if (s->ivshmem_size > buf.st_size) {
> -        error_setg(errp, "server sent only %zd bytes of shared memory",
> -                   (size_t)buf.st_size);
> -        close(fd);
> -        return;
> +    size = buf.st_size;
> +
> +    /* Legacy cruft */
> +    if (s->legacy_size != SIZE_MAX) {
> +        if (size < s->legacy_size) {
> +            error_setg(errp, "server sent only %zd bytes of shared memory",
> +                       (size_t)buf.st_size);
> +            close(fd);
> +            return;
> +        }
> +        size = s->legacy_size;
>      }
>
>      /* mmap the region and map into the BAR2 */
> -    ptr = mmap(0, s->ivshmem_size, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
> +    ptr = mmap(0, size, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
>      if (ptr == MAP_FAILED) {
>          error_setg_errno(errp, errno, "Failed to mmap shared memory");
>          close(fd);
> @@ -491,7 +512,7 @@ static void process_msg_shmem(IVShmemState *s, int fd, Error **errp)
>      }
>      s->ivshmem_bar2 = g_new(MemoryRegion, 1);
>      memory_region_init_ram_ptr(s->ivshmem_bar2, OBJECT(s),
> -                               "ivshmem.bar2", s->ivshmem_size, ptr);
> +                               "ivshmem.bar2", size, ptr);
>      qemu_set_ram_fd(s->ivshmem_bar2->ram_addr, fd);
>  }
>
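
The size selection in this hunk can be read in isolation. Below is a minimal sketch in plain C; pick_map_size() is a hypothetical helper, not part of the patch. It assumes, as the patch does, that SIZE_MAX in legacy_size means "map whatever the server sends".

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Pick the number of bytes to map.
 * server_size: size of the object the server sent (st_size in the patch)
 * legacy_size: SIZE_MAX means "whatever the server sends"
 * Returns 0 and stores the size in *out, or -1 if the server sent
 * less than the legacy device asked for.
 */
static int pick_map_size(size_t server_size, size_t legacy_size, size_t *out)
{
    assert(out != NULL);

    if (legacy_size != SIZE_MAX) {
        if (server_size < legacy_size) {
            return -1;          /* "server sent only N bytes" */
        }
        *out = legacy_size;     /* legacy device maps exactly 'size' */
        return 0;
    }

    *out = server_size;         /* doorbell device takes it all */
    return 0;
}
```

So a legacy device with size=4M connected to an 8M server still maps 4M, while ivshmem-doorbell maps all 8M.
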
> @@ -700,7 +721,7 @@ static void ivshmem_vector_use(IVShmemState *s)
>
>  static void ivshmem_reset(DeviceState *d)
>  {
> -    IVShmemState *s = IVSHMEM(d);
> +    IVShmemState *s = IVSHMEM_COMMON(d);
>
>      s->intrstatus = 0;
>      s->intrmask = 0;
> @@ -776,7 +797,7 @@ static void ivshmem_disable_irqfd(IVShmemState *s)
>  static void ivshmem_write_config(PCIDevice *pdev, uint32_t address,
>                                   uint32_t val, int len)
>  {
> -    IVShmemState *s = IVSHMEM(pdev);
> +    IVShmemState *s = IVSHMEM_COMMON(pdev);
>      int is_enabled, was_enabled = msix_enabled(pdev);
>
>      pci_default_write_config(pdev, address, val, len);
> @@ -818,42 +839,14 @@ static HostMemoryBackend *desugar_shm(const char *shm, size_t size)
>      return MEMORY_BACKEND(obj);
>  }
>
> -static void pci_ivshmem_realize(PCIDevice *dev, Error **errp)
> +static void ivshmem_common_realize(PCIDevice *dev, Error **errp)
>  {
> -    IVShmemState *s = IVSHMEM(dev);
> +    IVShmemState *s = IVSHMEM_COMMON(dev);
>      Error *err = NULL;
>      uint8_t *pci_conf;
>      uint8_t attr = PCI_BASE_ADDRESS_SPACE_MEMORY |
>          PCI_BASE_ADDRESS_MEM_PREFETCH;
>
> -    if (!!s->server_chr + !!s->shmobj + !!s->hostmem != 1) {
> -        error_setg(errp,
> -                   "You must specify either 'shm', 'chardev' or 'x-memdev'");
> -        return;
> -    }
> -
> -    if (s->hostmem) {
> -        MemoryRegion *mr;
> -
> -        if (s->sizearg) {
> -            g_warning("size argument ignored with hostmem");
> -        }
> -
> -        mr = host_memory_backend_get_memory(s->hostmem, &error_abort);
> -        s->ivshmem_size = memory_region_size(mr);
> -    } else if (s->sizearg == NULL) {
> -        s->ivshmem_size = 4 << 20; /* 4 MB default */
> -    } else {
> -        char *end;
> -        int64_t size = qemu_strtosz(s->sizearg, &end);
> -        if (size < 0 || (size_t)size != size || *end != '\0'
> -            || !is_power_of_2(size)) {
> -            error_setg(errp, "Invalid size %s", s->sizearg);
> -            return;
> -        }
> -        s->ivshmem_size = size;
> -    }
> -
>      /* IRQFD requires MSI */
>      if (ivshmem_has_feature(s, IVSHMEM_IOEVENTFD) &&
>          !ivshmem_has_feature(s, IVSHMEM_MSI)) {
> @@ -861,29 +854,9 @@ static void pci_ivshmem_realize(PCIDevice *dev, Error **errp)
>          return;
>      }
>
> -    /* check that role is reasonable */
> -    if (s->role) {
> -        if (strncmp(s->role, "peer", 5) == 0) {
> -            s->master = ON_OFF_AUTO_OFF;
> -        } else if (strncmp(s->role, "master", 7) == 0) {
> -            s->master = ON_OFF_AUTO_ON;
> -        } else {
> -            error_setg(errp, "'role' must be 'peer' or 'master'");
> -            return;
> -        }
> -    } else {
> -        s->master = ON_OFF_AUTO_AUTO;
> -    }
> -
>      pci_conf = dev->config;
>      pci_conf[PCI_COMMAND] = PCI_COMMAND_IO | PCI_COMMAND_MEMORY;
>
> -    /*
> -     * Note: we don't use INTx with IVSHMEM_MSI at all, so this is a
> -     * bald-faced lie then.  But it's a backwards compatible lie.
> -     */
> -    pci_config_set_interrupt_pin(pci_conf, 1);
> -
>      memory_region_init_io(&s->ivshmem_mmio, OBJECT(s), &ivshmem_mmio_ops, s,
>                            "ivshmem-mmio", IVSHMEM_REG_BAR_SIZE);
>
> @@ -891,14 +864,10 @@ static void pci_ivshmem_realize(PCIDevice *dev, Error **errp)
>      pci_register_bar(dev, 0, PCI_BASE_ADDRESS_SPACE_MEMORY,
>                       &s->ivshmem_mmio);
>
> -    if (s->ivshmem_64bit) {
> +    if (!s->not_legacy_32bit) {
>          attr |= PCI_BASE_ADDRESS_MEM_TYPE_64;
>      }
>
> -    if (s->shmobj) {
> -        s->hostmem = desugar_shm(s->shmobj, s->ivshmem_size);
> -    }
> -
>      if (s->hostmem != NULL) {
>          IVSHMEM_DPRINTF("using hostmem\n");
>
> @@ -945,9 +914,68 @@ static void pci_ivshmem_realize(PCIDevice *dev, Error **errp)
>      }
>  }
>
> -static void pci_ivshmem_exit(PCIDevice *dev)
> +static void ivshmem_realize(PCIDevice *dev, Error **errp)
>  {
> -    IVShmemState *s = IVSHMEM(dev);
> +    IVShmemState *s = IVSHMEM_COMMON(dev);
> +
> +    if (!qtest_enabled()) {
> +        error_report("ivshmem is deprecated, please use ivshmem-plain"
> +                     " or ivshmem-doorbell instead");
> +    }
> +
> +    if (!!s->server_chr + !!s->shmobj + !!s->hostmem != 1) {
> +        error_setg(errp,
> +                   "You must specify either 'shm', 'chardev' or 'x-memdev'");
> +        return;
> +    }
> +
> +    if (s->hostmem) {
> +        if (s->sizearg) {
> +            g_warning("size argument ignored with hostmem");
> +        }
> +    } else if (s->sizearg == NULL) {
> +        s->legacy_size = 4 << 20; /* 4 MB default */
> +    } else {
> +        char *end;
> +        int64_t size = qemu_strtosz(s->sizearg, &end);
> +        if (size < 0 || (size_t)size != size || *end != '\0'
> +            || !is_power_of_2(size)) {
> +            error_setg(errp, "Invalid size %s", s->sizearg);
> +            return;
> +        }
> +        s->legacy_size = size;
> +    }
> +
> +    /* check that role is reasonable */
> +    if (s->role) {
> +        if (strncmp(s->role, "peer", 5) == 0) {
> +            s->master = ON_OFF_AUTO_OFF;
> +        } else if (strncmp(s->role, "master", 7) == 0) {
> +            s->master = ON_OFF_AUTO_ON;
> +        } else {
> +            error_setg(errp, "'role' must be 'peer' or 'master'");
> +            return;
> +        }
> +    } else {
> +        s->master = ON_OFF_AUTO_AUTO;
> +    }
> +
> +    if (s->shmobj) {
> +        s->hostmem = desugar_shm(s->shmobj, s->legacy_size);
> +    }
> +
> +    /*
> +     * Note: we don't use INTx with IVSHMEM_MSI at all, so this is a
> +     * bald-faced lie then.  But it's a backwards compatible lie.
> +     */
> +    pci_config_set_interrupt_pin(dev->config, 1);
> +
> +    ivshmem_common_realize(dev, errp);
> +}
> +
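
Two idioms in ivshmem_realize() are easy to misread, so here is a small standalone sketch of both. The names exactly_one(), role_to_master() and enum on_off_auto are hypothetical stand-ins (the last one for QEMU's OnOffAuto), not code from the patch.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

enum on_off_auto { AUTO, ON, OFF };  /* hypothetical stand-in for OnOffAuto */

/* True when exactly one of the three pointers is non-NULL. */
static int exactly_one(const void *a, const void *b, const void *c)
{
    /* !!p turns a pointer into 0 or 1, so the sum counts the set ones */
    return !!a + !!b + !!c == 1;
}

/* Map 'role' to master; returns 0 on success, -1 for an unknown role. */
static int role_to_master(const char *role, enum on_off_auto *master)
{
    assert(master != NULL);

    if (role == NULL) {
        *master = AUTO;
        return 0;
    }
    /* n includes the terminating NUL, so this behaves like strcmp() */
    if (strncmp(role, "peer", 5) == 0) {
        *master = OFF;
        return 0;
    }
    if (strncmp(role, "master", 7) == 0) {
        *master = ON;
        return 0;
    }
    return -1;
}
```

Because the length passed to strncmp() covers the NUL, "peers" or "masterful" are rejected just as they would be with strcmp().
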
> +static void ivshmem_exit(PCIDevice *dev)
> +{
> +    IVShmemState *s = IVSHMEM_COMMON(dev);
>      int i;
>
>      if (s->migration_blocker) {
> @@ -959,7 +987,7 @@ static void pci_ivshmem_exit(PCIDevice *dev)
>          if (!s->hostmem) {
>              void *addr = memory_region_get_ram_ptr(s->ivshmem_bar2);
>
> -            if (munmap(addr, s->ivshmem_size) == -1) {
> +            if (munmap(addr, memory_region_size(s->ivshmem_bar2)) == -1) {
>                  error_report("Failed to munmap shared memory %s",
>                               strerror(errno));
>              }
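
Parenthesis placement matters in the munmap() call of this hunk: the comparison with -1 has to apply to munmap()'s return value. If it ends up inside the argument list, the length passed is the result of "size == -1", which is normally 0. A minimal standalone sketch of the intended pattern follows; map_region() and unmap_region() are hypothetical helpers, and the anonymous mapping only stands in for the shared memory object.

```c
#define _DEFAULT_SOURCE         /* for MAP_ANONYMOUS with strict -std= */
#include <assert.h>
#include <stddef.h>
#include <sys/mman.h>

/* Map 'size' bytes of anonymous memory; returns NULL on failure. */
static void *map_region(size_t size)
{
    void *p = mmap(NULL, size, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);

    return p == MAP_FAILED ? NULL : p;
}

/* Unmap the whole region; returns 0 on success, -1 on failure. */
static int unmap_region(void *addr, size_t size)
{
    assert(addr != NULL);

    /* the length is 'size'; the result of munmap() is compared to -1 */
    if (munmap(addr, size) == -1) {
        return -1;
    }
    return 0;
}
```
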
> @@ -1074,28 +1102,39 @@ static Property ivshmem_properties[] = {
>      DEFINE_PROP_BIT("msi", IVShmemState, features, IVSHMEM_MSI, true),
>      DEFINE_PROP_STRING("shm", IVShmemState, shmobj),
>      DEFINE_PROP_STRING("role", IVShmemState, role),
> -    DEFINE_PROP_UINT32("use64", IVShmemState, ivshmem_64bit, 1),
> +    DEFINE_PROP_UINT32("use64", IVShmemState, not_legacy_32bit, 1),
>      DEFINE_PROP_END_OF_LIST(),
>  };
>
> -static void ivshmem_class_init(ObjectClass *klass, void *data)
> +static void ivshmem_common_class_init(ObjectClass *klass, void *data)
>  {
>      DeviceClass *dc = DEVICE_CLASS(klass);
>      PCIDeviceClass *k = PCI_DEVICE_CLASS(klass);
>
> -    k->realize = pci_ivshmem_realize;
> -    k->exit = pci_ivshmem_exit;
> +    k->realize = ivshmem_common_realize;
> +    k->exit = ivshmem_exit;
>      k->config_write = ivshmem_write_config;
>      k->vendor_id = PCI_VENDOR_ID_IVSHMEM;
>      k->device_id = PCI_DEVICE_ID_IVSHMEM;
>      k->class_id = PCI_CLASS_MEMORY_RAM;
> +    k->revision = 1;
>      dc->reset = ivshmem_reset;
> -    dc->props = ivshmem_properties;
> -    dc->vmsd = &ivshmem_vmsd;
>      set_bit(DEVICE_CATEGORY_MISC, dc->categories);
>      dc->desc = "Inter-VM shared memory";
>  }
>
> +static void ivshmem_class_init(ObjectClass *klass, void *data)
> +{
> +    DeviceClass *dc = DEVICE_CLASS(klass);
> +    PCIDeviceClass *k = PCI_DEVICE_CLASS(klass);
> +
> +    k->realize = ivshmem_realize;
> +    k->revision = 0;
> +    dc->desc = "Inter-VM shared memory (legacy)";
> +    dc->props = ivshmem_properties;
> +    dc->vmsd = &ivshmem_vmsd;
> +}
> +
>  static void ivshmem_check_memdev_is_busy(Object *obj, const char *name,
>                                           Object *val, Error **errp)
>  {
> @@ -1122,16 +1161,121 @@ static void ivshmem_init(Object *obj)
>                               &error_abort);
>  }
>
> +static const TypeInfo ivshmem_common_info = {
> +    .name          = TYPE_IVSHMEM_COMMON,
> +    .parent        = TYPE_PCI_DEVICE,
> +    .instance_size = sizeof(IVShmemState),
> +    .abstract      = true,
> +    .class_init    = ivshmem_common_class_init,
> +};
> +
>  static const TypeInfo ivshmem_info = {
>      .name          = TYPE_IVSHMEM,
> -    .parent        = TYPE_PCI_DEVICE,
> +    .parent        = TYPE_IVSHMEM_COMMON,
>      .instance_size = sizeof(IVShmemState),
>      .instance_init = ivshmem_init,
>      .class_init    = ivshmem_class_init,
>  };
>
> +static const VMStateDescription ivshmem_plain_vmsd = {
> +    .name = TYPE_IVSHMEM_PLAIN,
> +    .version_id = 0,
> +    .minimum_version_id = 0,
> +    .pre_load = ivshmem_pre_load,
> +    .post_load = ivshmem_post_load,
> +    .fields = (VMStateField[]) {
> +        VMSTATE_PCI_DEVICE(parent_obj, IVShmemState),
> +        VMSTATE_UINT32(intrstatus, IVShmemState),
> +        VMSTATE_UINT32(intrmask, IVShmemState),
> +        VMSTATE_END_OF_LIST()
> +    },
> +};
> +
> +static Property ivshmem_plain_properties[] = {
> +    DEFINE_PROP_ON_OFF_AUTO("master", IVShmemState, master, ON_OFF_AUTO_OFF),
> +    DEFINE_PROP_END_OF_LIST(),
> +};
> +
> +static void ivshmem_plain_init(Object *obj)
> +{
> +    IVShmemState *s = IVSHMEM_PLAIN(obj);
> +
> +    object_property_add_link(obj, "memdev", TYPE_MEMORY_BACKEND,
> +                             (Object **)&s->hostmem,
> +                             ivshmem_check_memdev_is_busy,
> +                             OBJ_PROP_LINK_UNREF_ON_RELEASE,
> +                             &error_abort);
> +}
> +
> +static void ivshmem_plain_class_init(ObjectClass *klass, void *data)
> +{
> +    DeviceClass *dc = DEVICE_CLASS(klass);
> +
> +    dc->props = ivshmem_plain_properties;
> +    dc->vmsd = &ivshmem_plain_vmsd;
> +}
> +
> +static const TypeInfo ivshmem_plain_info = {
> +    .name          = TYPE_IVSHMEM_PLAIN,
> +    .parent        = TYPE_IVSHMEM_COMMON,
> +    .instance_size = sizeof(IVShmemState),
> +    .instance_init = ivshmem_plain_init,
> +    .class_init    = ivshmem_plain_class_init,
> +};
> +
> +static const VMStateDescription ivshmem_doorbell_vmsd = {
> +    .name = TYPE_IVSHMEM_DOORBELL,
> +    .version_id = 0,
> +    .minimum_version_id = 0,
> +    .pre_load = ivshmem_pre_load,
> +    .post_load = ivshmem_post_load,
> +    .fields = (VMStateField[]) {
> +        VMSTATE_PCI_DEVICE(parent_obj, IVShmemState),
> +        VMSTATE_MSIX(parent_obj, IVShmemState),
> +        VMSTATE_UINT32(intrstatus, IVShmemState),
> +        VMSTATE_UINT32(intrmask, IVShmemState),
> +        VMSTATE_END_OF_LIST()
> +    },
> +};
> +
> +static Property ivshmem_doorbell_properties[] = {
> +    DEFINE_PROP_CHR("chardev", IVShmemState, server_chr),
> +    DEFINE_PROP_UINT32("vectors", IVShmemState, vectors, 1),
> +    DEFINE_PROP_BIT("ioeventfd", IVShmemState, features, IVSHMEM_IOEVENTFD,
> +                    true),
> +    DEFINE_PROP_ON_OFF_AUTO("master", IVShmemState, master, ON_OFF_AUTO_OFF),
> +    DEFINE_PROP_END_OF_LIST(),
> +};
> +
> +static void ivshmem_doorbell_init(Object *obj)
> +{
> +    IVShmemState *s = IVSHMEM_DOORBELL(obj);
> +
> +    s->features |= (1 << IVSHMEM_MSI);
> +    s->legacy_size = SIZE_MAX;  /* whatever the server sends */
> +}
> +
> +static void ivshmem_doorbell_class_init(ObjectClass *klass, void *data)
> +{
> +    DeviceClass *dc = DEVICE_CLASS(klass);
> +
> +    dc->props = ivshmem_doorbell_properties;
> +    dc->vmsd = &ivshmem_doorbell_vmsd;
> +}
> +
> +static const TypeInfo ivshmem_doorbell_info = {
> +    .name          = TYPE_IVSHMEM_DOORBELL,
> +    .parent        = TYPE_IVSHMEM_COMMON,
> +    .instance_size = sizeof(IVShmemState),
> +    .instance_init = ivshmem_doorbell_init,
> +    .class_init    = ivshmem_doorbell_class_init,
> +};
> +
>  static void ivshmem_register_types(void)
>  {
> +    type_register_static(&ivshmem_common_info);
> +    type_register_static(&ivshmem_plain_info);
> +    type_register_static(&ivshmem_doorbell_info);
>      type_register_static(&ivshmem_info);
>  }
>
> diff --git a/qemu-doc.texi b/qemu-doc.texi
> index 8afbbcd..0dd01c7 100644
> --- a/qemu-doc.texi
> +++ b/qemu-doc.texi
> @@ -1262,13 +1262,18 @@ basic example.
>
>  @subsection Inter-VM Shared Memory device
>
> -With KVM enabled on a Linux host, a shared memory device is available.  Guests
> -map a POSIX shared memory region into the guest as a PCI device that enables
> -zero-copy communication to the application level of the guests.  The basic
> -syntax is:
> +On Linux hosts, a shared memory device is available.  The basic syntax
> +is:
>
>  @example
> -qemu-system-i386 -device ivshmem,size=@var{size},shm=@var{shm-name}
> +qemu-system-x86_64 -device ivshmem-plain,memdev=@var{hostmem}
> +@end example
> +
> +where @var{hostmem} names a host memory backend.  For a POSIX shared
> +memory backend, use something like
> +
> +@example
> +-object memory-backend-file,size=1M,share,mem-path=/dev/shm/ivshmem,id=@var{hostmem}
>  @end example
>
>  If desired, interrupts can be sent between guest VMs accessing the same shared
> @@ -1282,8 +1287,7 @@ memory server is:
>  ivshmem-server -p @var{pidfile} -S @var{path} -m @var{shm-name} -l @var{shm-size} -n @var{vectors}
>
>  # Then start your qemu instances with matching arguments
> -qemu-system-i386 -device ivshmem,size=@var{shm-size},vectors=@var{vectors},chardev=@var{id}
> -                 [,msi=on][,ioeventfd=on][,role=peer|master]
> +qemu-system-x86_64 -device ivshmem-doorbell,vectors=@var{vectors},chardev=@var{id}
>                   -chardev socket,path=@var{path},id=@var{id}
>  @end example
>
> @@ -1291,12 +1295,11 @@ When using the server, the guest will be assigned a VM ID (>=0) that allows gues
>  using the same server to communicate via interrupts.  Guests can read their
>  VM ID from a device register (see ivshmem-spec.txt).
>
> -The @option{role} argument can be set to either master or peer and will affect
> -how the shared memory is migrated.  With @option{role=master}, the guest will
> -copy the shared memory on migration to the destination host.  With
> -@option{role=peer}, the guest will not be able to migrate with the device attached.
> -With the @option{peer} case, the device should be detached and then reattached
> -after migration using the PCI hotplug support.
> +With device property @option{master=on}, the guest will copy the shared
> +memory on migration to the destination host.  With @option{master=off},
> +the guest will not be able to migrate with the device attached.  In the
> +latter case, the device should be detached and then reattached after
> +migration using the PCI hotplug support.
>
>  @subsubsection ivshmem and hugepages
>
> @@ -1304,8 +1307,8 @@ Instead of specifying the <shm size> using POSIX shm, you may specify
>  a memory backend that has hugepage support:
>
>  @example
> -qemu-system-i386 -object memory-backend-file,size=1G,mem-path=/dev/hugepages/my-shmem-file,share,id=mb1
> -                 -device ivshmem,x-memdev=mb1
> +qemu-system-x86_64 -object memory-backend-file,size=1G,mem-path=/dev/hugepages/my-shmem-file,share,id=mb1
> +                 -device ivshmem-plain,memdev=mb1
>  @end example
>
>  ivshmem-server also supports hugepages mount points with the
> diff --git a/tests/ivshmem-test.c b/tests/ivshmem-test.c
> index 68d6840..891b6b8 100644
> --- a/tests/ivshmem-test.c
> +++ b/tests/ivshmem-test.c
> @@ -127,7 +127,9 @@ static void setup_vm_cmd(IVState *s, const char *cmd, bool msix)
>
>  static void setup_vm(IVState *s)
>  {
> -    char *cmd = g_strdup_printf("-device ivshmem,shm=%s,size=1M", tmpshm);
> +    char *cmd = g_strdup_printf("-object memory-backend-file"
> +                                ",id=mb1,size=1M,share,mem-path=/dev/shm%s"
> +                                " -device ivshmem-plain,memdev=mb1", tmpshm);
>
>      setup_vm_cmd(s, cmd, false);
>
> @@ -284,8 +286,10 @@ static void *server_thread(void *data)
>  static void setup_vm_with_server(IVState *s, int nvectors, bool msi)
>  {
>      char *cmd = g_strdup_printf("-chardev socket,id=chr0,path=%s,nowait "
> -                                "-device ivshmem,size=1M,chardev=chr0,vectors=%d,msi=%s",
> -                                tmpserver, nvectors, msi ? "true" : "false");
> +                                "-device ivshmem%s,chardev=chr0,vectors=%d",
> +                                tmpserver,
> +                                msi ? "-doorbell" : ",size=1M,msi=off",
> +                                nvectors);
>
>      setup_vm_cmd(s, cmd, msi);
>
> @@ -412,7 +416,7 @@ static void test_ivshmem_memdev(void)
>
>      /* just for the sake of checking memory-backend property */
>      setup_vm_cmd(&state, "-object memory-backend-ram,size=1M,id=mb1"
> -                 " -device ivshmem,x-memdev=mb1", false);
> +                 " -device ivshmem-plain,memdev=mb1", false);
>
>      cleanup_vm(&state);
>  }
> --
> 2.4.3
>
>



-- 
Marc-André Lureau


* Re: [Qemu-devel] [PATCH 35/38] ivshmem: Clean up after the previous commit
  2016-02-29 18:40 ` [Qemu-devel] [PATCH 35/38] ivshmem: Clean up after the previous commit Markus Armbruster
@ 2016-03-03 13:56   ` Marc-André Lureau
  0 siblings, 0 replies; 118+ messages in thread
From: Marc-André Lureau @ 2016-03-03 13:56 UTC (permalink / raw)
  To: Markus Armbruster
  Cc: Paolo Bonzini, cam, Claudio Fontana, QEMU, David Marchand

On Mon, Feb 29, 2016 at 7:40 PM, Markus Armbruster <armbru@redhat.com> wrote:
> Move code to more sensible places.  Use the opportunity to reorder and
> document IVShmemState members.
>
> Signed-off-by: Markus Armbruster <armbru@redhat.com>

Reviewed-by: Marc-André Lureau <marcandre.lureau@redhat.com>


> ---
>  hw/misc/ivshmem.c | 420 +++++++++++++++++++++++++++---------------------------
>  1 file changed, 213 insertions(+), 207 deletions(-)
>
> diff --git a/hw/misc/ivshmem.c b/hw/misc/ivshmem.c
> index f7f5b3b..33b6842 100644
> --- a/hw/misc/ivshmem.c
> +++ b/hw/misc/ivshmem.c
> @@ -85,25 +85,30 @@ typedef struct IVShmemState {
>      PCIDevice parent_obj;
>      /*< public >*/
>
> -    HostMemoryBackend *hostmem;
> +    uint32_t features;
> +
> +    /* exactly one of these two may be set */
> +    HostMemoryBackend *hostmem; /* without interrupts */
> +    CharDriverState *server_chr; /* with interrupts */
> +
> +    /* registers */
>      uint32_t intrmask;
>      uint32_t intrstatus;
> +    int vm_id;
>
> -    CharDriverState *server_chr;
> -    MemoryRegion ivshmem_mmio;
> -
> +    /* BARs */
> +    MemoryRegion ivshmem_mmio;  /* BAR 0 (registers) */
>      MemoryRegion *ivshmem_bar2; /* BAR 2 (shared memory) */
>
> +    /* interrupt support */
>      Peer *peers;
>      int nb_peers;               /* space in @peers[] */
> -
> -    int vm_id;
>      uint32_t vectors;
> -    uint32_t features;
>      MSIVector *msi_vectors;
>      uint64_t msg_buf;           /* buffer for receiving server messages */
>      int msg_buffered_bytes;     /* #bytes in @msg_buf */
>
> +    /* migration stuff */
>      OnOffAuto master;
>      Error *migration_blocker;
>
> @@ -812,33 +817,6 @@ static void ivshmem_write_config(PCIDevice *pdev, uint32_t address,
>      }
>  }
>
> -static HostMemoryBackend *desugar_shm(const char *shm, size_t size)
> -{
> -    /* TODO avoid the detour through QemuOpts */
> -    static int counter;
> -    QemuOpts *opts = qemu_opts_create(qemu_find_opts("object"),
> -                                      NULL, 0, &error_abort);
> -    char *path;
> -    Object *obj;
> -
> -    qemu_opt_set(opts, "qom-type", "memory-backend-file",
> -    &error_abort);
> -    /* FIXME need a better way to make up an ID */
> -    qemu_opts_set_id(opts, g_strdup_printf("ivshmem-backend-%d", counter++));
> -    path = g_strdup_printf("/dev/shm/%s", shm);
> -    qemu_opt_set(opts, "mem-path", path, &error_abort);
> -    qemu_opt_set_number(opts, "size", size, &error_abort);
> -    qemu_opt_set_bool(opts, "share", true, &error_abort);
> -    g_free(path);
> -
> -    obj = user_creatable_add_opts(opts, &error_abort);
> -    qemu_opts_del(opts);
> -
> -    user_creatable_complete(obj, &error_abort);
> -
> -    return MEMORY_BACKEND(obj);
> -}
> -
>  static void ivshmem_common_realize(PCIDevice *dev, Error **errp)
>  {
>      IVShmemState *s = IVSHMEM_COMMON(dev);
> @@ -914,65 +892,6 @@ static void ivshmem_common_realize(PCIDevice *dev, Error **errp)
>      }
>  }
>
> -static void ivshmem_realize(PCIDevice *dev, Error **errp)
> -{
> -    IVShmemState *s = IVSHMEM_COMMON(dev);
> -
> -    if (!qtest_enabled()) {
> -        error_report("ivshmem is deprecated, please use ivshmem-plain"
> -                     " or ivshmem-doorbell instead");
> -    }
> -
> -    if (!!s->server_chr + !!s->shmobj + !!s->hostmem != 1) {
> -        error_setg(errp,
> -                   "You must specify either 'shm', 'chardev' or 'x-memdev'");
> -        return;
> -    }
> -
> -    if (s->hostmem) {
> -        if (s->sizearg) {
> -            g_warning("size argument ignored with hostmem");
> -        }
> -    } else if (s->sizearg == NULL) {
> -        s->legacy_size = 4 << 20; /* 4 MB default */
> -    } else {
> -        char *end;
> -        int64_t size = qemu_strtosz(s->sizearg, &end);
> -        if (size < 0 || (size_t)size != size || *end != '\0'
> -            || !is_power_of_2(size)) {
> -            error_setg(errp, "Invalid size %s", s->sizearg);
> -            return;
> -        }
> -        s->legacy_size = size;
> -    }
> -
> -    /* check that role is reasonable */
> -    if (s->role) {
> -        if (strncmp(s->role, "peer", 5) == 0) {
> -            s->master = ON_OFF_AUTO_OFF;
> -        } else if (strncmp(s->role, "master", 7) == 0) {
> -            s->master = ON_OFF_AUTO_ON;
> -        } else {
> -            error_setg(errp, "'role' must be 'peer' or 'master'");
> -            return;
> -        }
> -    } else {
> -        s->master = ON_OFF_AUTO_AUTO;
> -    }
> -
> -    if (s->shmobj) {
> -        s->hostmem = desugar_shm(s->shmobj, s->legacy_size);
> -    }
> -
> -    /*
> -     * Note: we don't use INTx with IVSHMEM_MSI at all, so this is a
> -     * bald-faced lie then.  But it's a backwards compatible lie.
> -     */
> -    pci_config_set_interrupt_pin(dev->config, 1);
> -
> -    ivshmem_common_realize(dev, errp);
> -}
> -
>  static void ivshmem_exit(PCIDevice *dev)
>  {
>      IVShmemState *s = IVSHMEM_COMMON(dev);
> @@ -1012,18 +931,6 @@ static void ivshmem_exit(PCIDevice *dev)
>      g_free(s->msi_vectors);
>  }
>
> -static bool test_msix(void *opaque, int version_id)
> -{
> -    IVShmemState *s = opaque;
> -
> -    return ivshmem_has_feature(s, IVSHMEM_MSI);
> -}
> -
> -static bool test_no_msix(void *opaque, int version_id)
> -{
> -    return !test_msix(opaque, version_id);
> -}
> -
>  static int ivshmem_pre_load(void *opaque)
>  {
>      IVShmemState *s = opaque;
> @@ -1042,70 +949,6 @@ static int ivshmem_post_load(void *opaque, int version_id)
>      return 0;
>  }
>
> -static int ivshmem_load_old(QEMUFile *f, void *opaque, int version_id)
> -{
> -    IVShmemState *s = opaque;
> -    PCIDevice *pdev = PCI_DEVICE(s);
> -    int ret;
> -
> -    IVSHMEM_DPRINTF("ivshmem_load_old\n");
> -
> -    if (version_id != 0) {
> -        return -EINVAL;
> -    }
> -
> -    ret = ivshmem_pre_load(s);
> -    if (ret) {
> -        return ret;
> -    }
> -
> -    ret = pci_device_load(pdev, f);
> -    if (ret) {
> -        return ret;
> -    }
> -
> -    if (ivshmem_has_feature(s, IVSHMEM_MSI)) {
> -        msix_load(pdev, f);
> -    } else {
> -        s->intrstatus = qemu_get_be32(f);
> -        s->intrmask = qemu_get_be32(f);
> -    }
> -    ivshmem_vector_use(s);
> -
> -    return 0;
> -}
> -
> -static const VMStateDescription ivshmem_vmsd = {
> -    .name = "ivshmem",
> -    .version_id = 1,
> -    .minimum_version_id = 1,
> -    .pre_load = ivshmem_pre_load,
> -    .post_load = ivshmem_post_load,
> -    .fields = (VMStateField[]) {
> -        VMSTATE_PCI_DEVICE(parent_obj, IVShmemState),
> -
> -        VMSTATE_MSIX_TEST(parent_obj, IVShmemState, test_msix),
> -        VMSTATE_UINT32_TEST(intrstatus, IVShmemState, test_no_msix),
> -        VMSTATE_UINT32_TEST(intrmask, IVShmemState, test_no_msix),
> -
> -        VMSTATE_END_OF_LIST()
> -    },
> -    .load_state_old = ivshmem_load_old,
> -    .minimum_version_id_old = 0
> -};
> -
> -static Property ivshmem_properties[] = {
> -    DEFINE_PROP_CHR("chardev", IVShmemState, server_chr),
> -    DEFINE_PROP_STRING("size", IVShmemState, sizearg),
> -    DEFINE_PROP_UINT32("vectors", IVShmemState, vectors, 1),
> -    DEFINE_PROP_BIT("ioeventfd", IVShmemState, features, IVSHMEM_IOEVENTFD, false),
> -    DEFINE_PROP_BIT("msi", IVShmemState, features, IVSHMEM_MSI, true),
> -    DEFINE_PROP_STRING("shm", IVShmemState, shmobj),
> -    DEFINE_PROP_STRING("role", IVShmemState, role),
> -    DEFINE_PROP_UINT32("use64", IVShmemState, not_legacy_32bit, 1),
> -    DEFINE_PROP_END_OF_LIST(),
> -};
> -
>  static void ivshmem_common_class_init(ObjectClass *klass, void *data)
>  {
>      DeviceClass *dc = DEVICE_CLASS(klass);
> @@ -1123,17 +966,13 @@ static void ivshmem_common_class_init(ObjectClass *klass, void *data)
>      dc->desc = "Inter-VM shared memory";
>  }
>
> -static void ivshmem_class_init(ObjectClass *klass, void *data)
> -{
> -    DeviceClass *dc = DEVICE_CLASS(klass);
> -    PCIDeviceClass *k = PCI_DEVICE_CLASS(klass);
> -
> -    k->realize = ivshmem_realize;
> -    k->revision = 0;
> -    dc->desc = "Inter-VM shared memory (legacy)";
> -    dc->props = ivshmem_properties;
> -    dc->vmsd = &ivshmem_vmsd;
> -}
> +static const TypeInfo ivshmem_common_info = {
> +    .name          = TYPE_IVSHMEM_COMMON,
> +    .parent        = TYPE_PCI_DEVICE,
> +    .instance_size = sizeof(IVShmemState),
> +    .abstract      = true,
> +    .class_init    = ivshmem_common_class_init,
> +};
>
>  static void ivshmem_check_memdev_is_busy(Object *obj, const char *name,
>                                           Object *val, Error **errp)
> @@ -1150,33 +989,6 @@ static void ivshmem_check_memdev_is_busy(Object *obj, const char *name,
>      }
>  }
>
> -static void ivshmem_init(Object *obj)
> -{
> -    IVShmemState *s = IVSHMEM(obj);
> -
> -    object_property_add_link(obj, "x-memdev", TYPE_MEMORY_BACKEND,
> -                             (Object **)&s->hostmem,
> -                             ivshmem_check_memdev_is_busy,
> -                             OBJ_PROP_LINK_UNREF_ON_RELEASE,
> -                             &error_abort);
> -}
> -
> -static const TypeInfo ivshmem_common_info = {
> -    .name          = TYPE_IVSHMEM_COMMON,
> -    .parent        = TYPE_PCI_DEVICE,
> -    .instance_size = sizeof(IVShmemState),
> -    .abstract      = true,
> -    .class_init    = ivshmem_common_class_init,
> -};
> -
> -static const TypeInfo ivshmem_info = {
> -    .name          = TYPE_IVSHMEM,
> -    .parent        = TYPE_IVSHMEM_COMMON,
> -    .instance_size = sizeof(IVShmemState),
> -    .instance_init = ivshmem_init,
> -    .class_init    = ivshmem_class_init,
> -};
> -
>  static const VMStateDescription ivshmem_plain_vmsd = {
>      .name = TYPE_IVSHMEM_PLAIN,
>      .version_id = 0,
> @@ -1271,6 +1083,200 @@ static const TypeInfo ivshmem_doorbell_info = {
>      .class_init    = ivshmem_doorbell_class_init,
>  };
>
> +static int ivshmem_load_old(QEMUFile *f, void *opaque, int version_id)
> +{
> +    IVShmemState *s = opaque;
> +    PCIDevice *pdev = PCI_DEVICE(s);
> +    int ret;
> +
> +    IVSHMEM_DPRINTF("ivshmem_load_old\n");
> +
> +    if (version_id != 0) {
> +        return -EINVAL;
> +    }
> +
> +    ret = ivshmem_pre_load(s);
> +    if (ret) {
> +        return ret;
> +    }
> +
> +    ret = pci_device_load(pdev, f);
> +    if (ret) {
> +        return ret;
> +    }
> +
> +    if (ivshmem_has_feature(s, IVSHMEM_MSI)) {
> +        msix_load(pdev, f);
> +    } else {
> +        s->intrstatus = qemu_get_be32(f);
> +        s->intrmask = qemu_get_be32(f);
> +    }
> +    ivshmem_vector_use(s);
> +
> +    return 0;
> +}
> +
> +static bool test_msix(void *opaque, int version_id)
> +{
> +    IVShmemState *s = opaque;
> +
> +    return ivshmem_has_feature(s, IVSHMEM_MSI);
> +}
> +
> +static bool test_no_msix(void *opaque, int version_id)
> +{
> +    return !test_msix(opaque, version_id);
> +}
> +
> +static const VMStateDescription ivshmem_vmsd = {
> +    .name = "ivshmem",
> +    .version_id = 1,
> +    .minimum_version_id = 1,
> +    .pre_load = ivshmem_pre_load,
> +    .post_load = ivshmem_post_load,
> +    .fields = (VMStateField[]) {
> +        VMSTATE_PCI_DEVICE(parent_obj, IVShmemState),
> +
> +        VMSTATE_MSIX_TEST(parent_obj, IVShmemState, test_msix),
> +        VMSTATE_UINT32_TEST(intrstatus, IVShmemState, test_no_msix),
> +        VMSTATE_UINT32_TEST(intrmask, IVShmemState, test_no_msix),
> +
> +        VMSTATE_END_OF_LIST()
> +    },
> +    .load_state_old = ivshmem_load_old,
> +    .minimum_version_id_old = 0
> +};
> +
> +static Property ivshmem_properties[] = {
> +    DEFINE_PROP_CHR("chardev", IVShmemState, server_chr),
> +    DEFINE_PROP_STRING("size", IVShmemState, sizearg),
> +    DEFINE_PROP_UINT32("vectors", IVShmemState, vectors, 1),
> +    DEFINE_PROP_BIT("ioeventfd", IVShmemState, features, IVSHMEM_IOEVENTFD,
> +                    false),
> +    DEFINE_PROP_BIT("msi", IVShmemState, features, IVSHMEM_MSI, true),
> +    DEFINE_PROP_STRING("shm", IVShmemState, shmobj),
> +    DEFINE_PROP_STRING("role", IVShmemState, role),
> +    DEFINE_PROP_UINT32("use64", IVShmemState, not_legacy_32bit, 1),
> +    DEFINE_PROP_END_OF_LIST(),
> +};
> +
> +static HostMemoryBackend *desugar_shm(const char *shm, size_t size)
> +{
> +    /* TODO avoid the detour through QemuOpts */
> +    static int counter;
> +    QemuOpts *opts = qemu_opts_create(qemu_find_opts("object"),
> +                                      NULL, 0, &error_abort);
> +    char *path;
> +    Object *obj;
> +
> +    qemu_opt_set(opts, "qom-type", "memory-backend-file",
> +    &error_abort);
> +    /* FIXME need a better way to make up an ID */
> +    qemu_opts_set_id(opts, g_strdup_printf("ivshmem-backend-%d", counter++));
> +    path = g_strdup_printf("/dev/shm/%s", shm);
> +    qemu_opt_set(opts, "mem-path", path, &error_abort);
> +    qemu_opt_set_number(opts, "size", size, &error_abort);
> +    qemu_opt_set_bool(opts, "share", true, &error_abort);
> +    g_free(path);
> +
> +    obj = user_creatable_add_opts(opts, &error_abort);
> +    qemu_opts_del(opts);
> +
> +    user_creatable_complete(obj, &error_abort);
> +
> +    return MEMORY_BACKEND(obj);
> +}
> +
> +static void ivshmem_realize(PCIDevice *dev, Error **errp)
> +{
> +    IVShmemState *s = IVSHMEM_COMMON(dev);
> +
> +    if (!qtest_enabled()) {
> +        error_report("ivshmem is deprecated, please use ivshmem-plain"
> +                     " or ivshmem-doorbell instead");
> +    }
> +
> +    if (!!s->server_chr + !!s->shmobj + !!s->hostmem != 1) {
> +        error_setg(errp,
> +                   "You must specify either 'shm', 'chardev' or 'x-memdev'");
> +        return;
> +    }
> +
> +    if (s->hostmem) {
> +        if (s->sizearg) {
> +            g_warning("size argument ignored with hostmem");
> +        }
> +    } else if (s->sizearg == NULL) {
> +        s->legacy_size = 4 << 20; /* 4 MB default */
> +    } else {
> +        char *end;
> +        int64_t size = qemu_strtosz(s->sizearg, &end);
> +        if (size < 0 || (size_t)size != size || *end != '\0'
> +            || !is_power_of_2(size)) {
> +            error_setg(errp, "Invalid size %s", s->sizearg);
> +            return;
> +        }
> +        s->legacy_size = size;
> +    }
> +
> +    /* check that role is reasonable */
> +    if (s->role) {
> +        if (strncmp(s->role, "peer", 5) == 0) {
> +            s->master = ON_OFF_AUTO_OFF;
> +        } else if (strncmp(s->role, "master", 7) == 0) {
> +            s->master = ON_OFF_AUTO_ON;
> +        } else {
> +            error_setg(errp, "'role' must be 'peer' or 'master'");
> +            return;
> +        }
> +    } else {
> +        s->master = ON_OFF_AUTO_AUTO;
> +    }
> +
> +    if (s->shmobj) {
> +        s->hostmem = desugar_shm(s->shmobj, s->legacy_size);
> +    }
> +
> +    /*
> +     * Note: we don't use INTx with IVSHMEM_MSI at all, so this is a
> +     * bald-faced lie then.  But it's a backwards compatible lie.
> +     */
> +    pci_config_set_interrupt_pin(dev->config, 1);
> +
> +    ivshmem_common_realize(dev, errp);
> +}
> +
> +static void ivshmem_init(Object *obj)
> +{
> +    IVShmemState *s = IVSHMEM(obj);
> +
> +    object_property_add_link(obj, "x-memdev", TYPE_MEMORY_BACKEND,
> +                             (Object **)&s->hostmem,
> +                             ivshmem_check_memdev_is_busy,
> +                             OBJ_PROP_LINK_UNREF_ON_RELEASE,
> +                             &error_abort);
> +}
> +
> +static void ivshmem_class_init(ObjectClass *klass, void *data)
> +{
> +    DeviceClass *dc = DEVICE_CLASS(klass);
> +    PCIDeviceClass *k = PCI_DEVICE_CLASS(klass);
> +
> +    k->realize = ivshmem_realize;
> +    k->revision = 0;
> +    dc->desc = "Inter-VM shared memory (legacy)";
> +    dc->props = ivshmem_properties;
> +    dc->vmsd = &ivshmem_vmsd;
> +}
> +
> +static const TypeInfo ivshmem_info = {
> +    .name          = TYPE_IVSHMEM,
> +    .parent        = TYPE_IVSHMEM_COMMON,
> +    .instance_size = sizeof(IVShmemState),
> +    .instance_init = ivshmem_init,
> +    .class_init    = ivshmem_class_init,
> +};
> +
>  static void ivshmem_register_types(void)
>  {
>      type_register_static(&ivshmem_common_info);
> --
> 2.4.3
>
>



-- 
Marc-André Lureau

* Re: [Qemu-devel] [PATCH 36/38] ivshmem: Drop ivshmem property x-memdev
  2016-02-29 18:40 ` [Qemu-devel] [PATCH 36/38] ivshmem: Drop ivshmem property x-memdev Markus Armbruster
@ 2016-03-03 14:03   ` Marc-André Lureau
  2016-03-03 14:17     ` Markus Armbruster
  0 siblings, 1 reply; 118+ messages in thread
From: Marc-André Lureau @ 2016-03-03 14:03 UTC (permalink / raw)
  To: Markus Armbruster
  Cc: Paolo Bonzini, cam, Claudio Fontana, QEMU, David Marchand

Hi

On Mon, Feb 29, 2016 at 7:40 PM, Markus Armbruster <armbru@redhat.com> wrote:
> Use ivshmem-plain instead.
>
> Signed-off-by: Markus Armbruster <armbru@redhat.com>
> ---
>  hw/misc/ivshmem.c | 15 +--------------
>  1 file changed, 1 insertion(+), 14 deletions(-)
>
> diff --git a/hw/misc/ivshmem.c b/hw/misc/ivshmem.c
> index 33b6842..f6fce15 100644
> --- a/hw/misc/ivshmem.c
> +++ b/hw/misc/ivshmem.c
> @@ -1197,8 +1197,7 @@ static void ivshmem_realize(PCIDevice *dev, Error **errp)
>      }
>
>      if (!!s->server_chr + !!s->shmobj + !!s->hostmem != 1) {
> -        error_setg(errp,
> -                   "You must specify either 'shm', 'chardev' or 'x-memdev'");
> +        error_setg(errp, "You must specify either 'shm' or 'chardev'");
>          return;
>      }

You could also get rid of hostmem checks here:

-    if (!!s->server_chr + !!s->shmobj + !!s->hostmem != 1) {
+    if (!!s->server_chr + !!s->shmobj > 1) {
         error_setg(errp, "You must specify either 'shm' or 'chardev'");
         return;
     }

-    if (s->hostmem) {
-        if (s->sizearg) {
-            g_warning("size argument ignored with hostmem");
-        }
-    } else if (s->sizearg == NULL) {
+    if (s->sizearg == NULL) {

otherwise, looks good
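
For illustration only, here is how the two forms of the check differ. The `> 1` form suggested above rejects only the case where both options are given, while `!= 1` also rejects the case where neither is given, which is what the error message describes. The helper names below are hypothetical and not part of hw/misc/ivshmem.c.

```c
#include <stdbool.h>
#include <stddef.h>

/*
 * Hypothetical helper: true when exactly one of the two option
 * pointers is set.  This corresponds to the "!= 1" form of the check.
 */
static bool exactly_one_set(const void *chardev, const void *shm)
{
    /* !!p collapses any non-NULL pointer to 1 and NULL to 0 */
    return !!chardev + !!shm == 1;
}

/*
 * Hypothetical helper: true unless both are set.  This corresponds to
 * the "> 1" form, which lets "neither set" through.
 */
static bool at_most_one_set(const void *chardev, const void *shm)
{
    return !(!!chardev + !!shm > 1);
}
```

If "neither 'shm' nor 'chardev'" is meant to stay an error, the first form is probably the one to keep.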

> @@ -1246,17 +1245,6 @@ static void ivshmem_realize(PCIDevice *dev, Error **errp)
>      ivshmem_common_realize(dev, errp);
>  }
>
> -static void ivshmem_init(Object *obj)
> -{
> -    IVShmemState *s = IVSHMEM(obj);
> -
> -    object_property_add_link(obj, "x-memdev", TYPE_MEMORY_BACKEND,
> -                             (Object **)&s->hostmem,
> -                             ivshmem_check_memdev_is_busy,
> -                             OBJ_PROP_LINK_UNREF_ON_RELEASE,
> -                             &error_abort);
> -}
> -
>  static void ivshmem_class_init(ObjectClass *klass, void *data)
>  {
>      DeviceClass *dc = DEVICE_CLASS(klass);
> @@ -1273,7 +1261,6 @@ static const TypeInfo ivshmem_info = {
>      .name          = TYPE_IVSHMEM,
>      .parent        = TYPE_IVSHMEM_COMMON,
>      .instance_size = sizeof(IVShmemState),
> -    .instance_init = ivshmem_init,
>      .class_init    = ivshmem_class_init,
>  };
>
> --
> 2.4.3
>
>



-- 
Marc-André Lureau

* Re: [Qemu-devel] [PATCH 37/38] ivshmem: Require master to have ID zero
  2016-02-29 18:40 ` [Qemu-devel] [PATCH 37/38] ivshmem: Require master to have ID zero Markus Armbruster
@ 2016-03-03 14:11   ` Marc-André Lureau
  0 siblings, 0 replies; 118+ messages in thread
From: Marc-André Lureau @ 2016-03-03 14:11 UTC (permalink / raw)
  To: Markus Armbruster
  Cc: Paolo Bonzini, cam, Claudio Fontana, QEMU, David Marchand

On Mon, Feb 29, 2016 at 7:40 PM, Markus Armbruster <armbru@redhat.com> wrote:
> Migration with ivshmem needs to be carefully orchestrated to work.
> Exactly one peer (the "master") migrates to the destination, all other
> peers need to unplug (and disconnect), migrate, plug back (and
> reconnect).  This is sort of documented in qemu-doc.
>
> If peers connect on the destination before migration completes, the
> shared memory can get messed up.  This isn't documented anywhere.  Fix
> that in qemu-doc.
>
> To avoid messing up register IVPosition on migration, the server must
> assign the same ID on source and destination.  ivshmem-spec.txt leaves
> ID assignment unspecified, however.
>
> Amend ivshmem-spec.txt to require the first client to receive ID zero.
> The example ivshmem-server complies: it always assigns the first
> unused ID.
>
> For a bit of additional safety, enforce ID zero for the master.  This
> does nothing when we're not using a server, because the ID is zero for
> all peers then.
>
> Signed-off-by: Markus Armbruster <armbru@redhat.com>
> ---

Reviewed-by: Marc-André Lureau <marcandre.lureau@redhat.com>
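
As a rough sketch of the "first unused ID" policy the commit message refers to: the table, its size limit and the function names below are all made up for illustration and are not taken from the example ivshmem-server. The point is only that, under such a policy, the first client to connect to a fresh server gets ID zero.

```c
#include <stdbool.h>
#include <stddef.h>

#define MAX_PEERS 8   /* illustrative limit, an assumption */

/* Hypothetical table of IDs currently in use; index == peer ID. */
struct id_table {
    bool used[MAX_PEERS];
};

/* Return the lowest free ID and mark it used, or -1 when the table is full. */
static int id_alloc_first_unused(struct id_table *t)
{
    for (int id = 0; id < MAX_PEERS; id++) {
        if (!t->used[id]) {
            t->used[id] = true;
            return id;
        }
    }
    return -1;
}

/* Mark an ID free again, e.g. when its peer disconnects. */
static void id_release(struct id_table *t, int id)
{
    if (id >= 0 && id < MAX_PEERS) {
        t->used[id] = false;
    }
}
```

With this scheme, a master that connects first on both source and destination sees the same ID (zero) on both sides.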


>  docs/specs/ivshmem-spec.txt | 2 ++
>  hw/misc/ivshmem.c           | 6 ++++++
>  qemu-doc.texi               | 5 +++++
>  3 files changed, 13 insertions(+)
>
> diff --git a/docs/specs/ivshmem-spec.txt b/docs/specs/ivshmem-spec.txt
> index 047dfc0..b062acd 100644
> --- a/docs/specs/ivshmem-spec.txt
> +++ b/docs/specs/ivshmem-spec.txt
> @@ -164,6 +164,8 @@ For each new client that connects to the server, the server
>  - sends interrupt setup messages to the new client (these contain file
>    descriptors for receiving interrupts).
>
> +The first client to connect to the server receives ID zero.
> +
>  When a client disconnects from the server, the server sends disconnect
>  notifications to the other clients.
>
> diff --git a/hw/misc/ivshmem.c b/hw/misc/ivshmem.c
> index f6fce15..9a292e5 100644
> --- a/hw/misc/ivshmem.c
> +++ b/hw/misc/ivshmem.c
> @@ -869,6 +869,12 @@ static void ivshmem_common_realize(PCIDevice *dev, Error **errp)
>              return;
>          }
>
> +        if (s->master == ON_OFF_AUTO_ON && s->vm_id != 0) {
> +            error_setg(errp,
> +                       "master must connect to the server before any peers");
> +            return;
> +        }
> +
>          qemu_chr_add_handlers(s->server_chr, ivshmem_can_receive,
>                                ivshmem_read, NULL, s);
>
> diff --git a/qemu-doc.texi b/qemu-doc.texi
> index 0dd01c7..79141d3 100644
> --- a/qemu-doc.texi
> +++ b/qemu-doc.texi
> @@ -1295,12 +1295,17 @@ When using the server, the guest will be assigned a VM ID (>=0) that allows gues
>  using the same server to communicate via interrupts.  Guests can read their
>  VM ID from a device register (see ivshmem-spec.txt).
>
> +@subsubsection Migration with ivshmem
> +
>  With device property @option{master=on}, the guest will copy the shared
>  memory on migration to the destination host.  With @option{master=off},
>  the guest will not be able to migrate with the device attached.  In the
>  latter case, the device should be detached and then reattached after
>  migration using the PCI hotplug support.
>
> +At most one of the devices sharing the same memory can be master.  The
> +master must complete migration before you plug back the other devices.
> +
>  @subsubsection ivshmem and hugepages
>
>  Instead of specifying the <shm size> using POSIX shm, you may specify
> --
> 2.4.3
>
>



-- 
Marc-André Lureau

* Re: [Qemu-devel] [PATCH 38/38] contrib/ivshmem-server: Print "not for production" warning
  2016-02-29 18:40 ` [Qemu-devel] [PATCH 38/38] contrib/ivshmem-server: Print "not for production" warning Markus Armbruster
@ 2016-03-03 14:15   ` Marc-André Lureau
  2016-03-07 18:42     ` Markus Armbruster
  0 siblings, 1 reply; 118+ messages in thread
From: Marc-André Lureau @ 2016-03-03 14:15 UTC (permalink / raw)
  To: Markus Armbruster
  Cc: Paolo Bonzini, cam, Claudio Fontana, QEMU, David Marchand

Hi

On Mon, Feb 29, 2016 at 7:40 PM, Markus Armbruster <armbru@redhat.com> wrote:
> The code is okay for illustrating how things work and for testing, but
> its error handling makes it unfit for production use.  Print a warning
> to protect the innocent.
>
> Signed-off-by: Markus Armbruster <armbru@redhat.com>
> ---

I guess David would do something about the problems you report. Tbh, I
don't think ivshmem-server is so bad wrt error handling.

Meanwhile:
Reviewed-by: Marc-André Lureau <marcandre.lureau@redhat.com>

>  contrib/ivshmem-server/main.c | 6 ++++++
>  1 file changed, 6 insertions(+)
>
> diff --git a/contrib/ivshmem-server/main.c b/contrib/ivshmem-server/main.c
> index cca1061..97488dc 100644
> --- a/contrib/ivshmem-server/main.c
> +++ b/contrib/ivshmem-server/main.c
> @@ -197,6 +197,12 @@ main(int argc, char *argv[])
>      };
>      int ret = 1;
>
> +    /*
> +     * Do not remove this notice without adding proper error handling!
> +     * Start with handling ivshmem_server_send_one_msg() failure.
> +     */
> +    printf("*** Example code, do not use in production ***\n");
> +
>      /* parse arguments, will exit on error */
>      ivshmem_server_parse_args(&args, argc, argv);
>
> --
> 2.4.3
>
>



-- 
Marc-André Lureau

* Re: [Qemu-devel] [PATCH 36/38] ivshmem: Drop ivshmem property x-memdev
  2016-03-03 14:03   ` Marc-André Lureau
@ 2016-03-03 14:17     ` Markus Armbruster
  0 siblings, 0 replies; 118+ messages in thread
From: Markus Armbruster @ 2016-03-03 14:17 UTC (permalink / raw)
  To: Marc-André Lureau
  Cc: Paolo Bonzini, cam, Claudio Fontana, QEMU, David Marchand

Marc-André Lureau <marcandre.lureau@gmail.com> writes:

> Hi
>
> On Mon, Feb 29, 2016 at 7:40 PM, Markus Armbruster <armbru@redhat.com> wrote:
>> Use ivshmem-plain instead.
>>
>> Signed-off-by: Markus Armbruster <armbru@redhat.com>
>> ---
>>  hw/misc/ivshmem.c | 15 +--------------
>>  1 file changed, 1 insertion(+), 14 deletions(-)
>>
>> diff --git a/hw/misc/ivshmem.c b/hw/misc/ivshmem.c
>> index 33b6842..f6fce15 100644
>> --- a/hw/misc/ivshmem.c
>> +++ b/hw/misc/ivshmem.c
>> @@ -1197,8 +1197,7 @@ static void ivshmem_realize(PCIDevice *dev, Error **errp)
>>      }
>>
>>      if (!!s->server_chr + !!s->shmobj + !!s->hostmem != 1) {
>> -        error_setg(errp,
>> -                   "You must specify either 'shm', 'chardev' or 'x-memdev'");
>> +        error_setg(errp, "You must specify either 'shm' or 'chardev'");
>>          return;
>>      }
>
> You could also get rid of hostmem checks here:
>
> -    if (!!s->server_chr + !!s->shmobj + !!s->hostmem != 1) {
> +    if (!!s->server_chr + !!s->shmobj > 1) {
>          error_setg(errp, "You must specify either 'shm' or 'chardev'");
>          return;
>      }
>
> -    if (s->hostmem) {
> -        if (s->sizearg) {
> -            g_warning("size argument ignored with hostmem");
> -        }
> -    } else if (s->sizearg == NULL) {
> +    if (s->sizearg == NULL) {

Will do.  No idea how I missed those :)

> otherwise, looks good

Thanks!

* Re: [Qemu-devel] [PATCH 01/38] exec: Fix memory allocation when memory path names new file
  2016-03-01 11:35   ` Paolo Bonzini
  2016-03-01 11:58     ` Markus Armbruster
@ 2016-03-04 18:50     ` Markus Armbruster
  2016-03-07 13:12       ` Paolo Bonzini
  1 sibling, 1 reply; 118+ messages in thread
From: Markus Armbruster @ 2016-03-04 18:50 UTC (permalink / raw)
  To: Paolo Bonzini; +Cc: claudio.fontana, cam, mlureau, qemu-devel, david.marchand

Paolo Bonzini <pbonzini@redhat.com> writes:

> On 29/02/2016 19:40, Markus Armbruster wrote:
>> -    if (!stat(path, &st) && S_ISDIR(st.st_mode)) {
>> +    ret = stat(path, &st);
>> +    if (!ret && S_ISDIR(st.st_mode)) {
>> +        /* path names a directory -> create a temporary file there */
>>          /* Make name safe to use with mkstemp by replacing '/' with '_'. */
>>          sanitized_name = g_strdup(memory_region_name(block->mr));
>>          for (c = sanitized_name; *c != '\0'; c++) {
>> @@ -1282,13 +1271,32 @@ static void *file_ram_alloc(RAMBlock *block,
>>              unlink(filename);
>>          }
>>          g_free(filename);
>> +    } else if (!ret) {
>> +        /* path names an existing file -> use it */
>> +        fd = open(path, O_RDWR);
>>      } else {
>> +        /* create a new file */
>>          fd = open(path, O_RDWR | O_CREAT, 0644);
>> +        unlink_on_error = true;
>>      }
>
> While at it, let's avoid TOCTTOU conditions:
>
>     for (;;) {
>         fd = open(path, O_RDWR);
>         if (fd != -1) {
>             break;
>         }
>         if (errno == ENOENT) {
>             fd = open(path, O_RDWR | O_CREAT | O_EXCL, 0644);
>             if (fd != -1) {
>                 unlink_on_error = true;
>                 break;
>             }
>         } else if (errno == EISDIR) {
>             ... mkstemp ...
>             if (fd != -1) {
>                 unlink_on_error = true;
>                 break;
>             }
>         }
>         if (errno != EEXIST && errno != EINTR) {
>             goto error;
>         }
>     }
>
> and use fstatfs in gethugepagesize.
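
A self-contained sketch of the file-only part of the loop quoted above, for readers who want something compilable. The function name and the `created` out-parameter are hypothetical, and the directory/mkstemp branch is deliberately left out, so a directory path simply fails with EISDIR here.

```c
#include <errno.h>
#include <fcntl.h>
#include <stdbool.h>
#include <sys/types.h>
#include <sys/stat.h>

/*
 * Hypothetical helper: open an existing file, or create it if it is
 * missing, without a separate stat() step.  *created tells the caller
 * whether it should unlink the file on a later error.
 */
static int open_or_create(const char *path, bool *created)
{
    int fd;

    *created = false;
    for (;;) {
        fd = open(path, O_RDWR);
        if (fd != -1) {
            return fd;              /* existing file */
        }
        if (errno == ENOENT) {
            /* O_EXCL makes creation fail with EEXIST if we lost a race */
            fd = open(path, O_RDWR | O_CREAT | O_EXCL, 0644);
            if (fd != -1) {
                *created = true;
                return fd;
            }
        }
        if (errno != EEXIST && errno != EINTR) {
            return -1;              /* real error, errno is set */
        }
        /* EEXIST or EINTR: the file appeared meanwhile, or we were
         * interrupted; go around again */
    }
}
```

Because every decision is based on the result of open() itself, there is no window between a check and the use of its result.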

A question on gethugepagesize().  We have a couple of copies.

Here's target-ppc/kvm.c's:

    static long gethugepagesize(const char *mem_path)
    {
        struct statfs fs;
        int ret;

        do {
            ret = statfs(mem_path, &fs);
        } while (ret != 0 && errno == EINTR);

        if (ret != 0) {
            fprintf(stderr, "Couldn't statfs() memory path: %s\n",
                    strerror(errno));
            exit(1);
        }

    #define HUGETLBFS_MAGIC       0x958458f6

        if (fs.f_type != HUGETLBFS_MAGIC) {
            /* Explicit mempath, but it's ordinary pages */
            return getpagesize();
        }

        /* It's hugepage, return the huge page size */
        return fs.f_bsize;
    }

I guess the use of HUGETLBFS_MAGIC is fine since kvm.c is Linux-specific.

There's another one in ivshmem_server.c, functionally identical and
wrapped in CONFIG_LINUX.

Here's exec.c's:

    #define HUGETLBFS_MAGIC       0x958458f6

    static long gethugepagesize(const char *path, Error **errp)
    {
        struct statfs fs;
        int ret;

        do {
            ret = statfs(path, &fs);
        } while (ret != 0 && errno == EINTR);

        if (ret != 0) {
            error_setg_errno(errp, errno, "failed to get page size of file %s",
                             path);
            return 0;
        }

        return fs.f_bsize;
    }

Before commit bfc2a1a, it additionally had

    if (fs.f_type != HUGETLBFS_MAGIC)
        fprintf(stderr, "Warning: path not on HugeTLBFS: %s\n", path);

Note the lack of "if not hugetlbfs, use getpagesize()" logic.

Here's util/mmap-alloc.c's:

    #define HUGETLBFS_MAGIC       0x958458f6

    #ifdef CONFIG_LINUX
    #include <sys/vfs.h>
    #endif

    size_t qemu_fd_getpagesize(int fd)
    {
    #ifdef CONFIG_LINUX
        struct statfs fs;
        int ret;

        if (fd != -1) {
            do {
                ret = fstatfs(fd, &fs);
            } while (ret != 0 && errno == EINTR);

            if (ret == 0 && fs.f_type == HUGETLBFS_MAGIC) {
                return fs.f_bsize;
            }
        }
    #endif

        return getpagesize();
    }

Would you like me to convert the other users to this one and drop the
dupes?

* Re: [Qemu-devel] [PATCH 01/38] exec: Fix memory allocation when memory path names new file
  2016-03-04 18:50     ` Markus Armbruster
@ 2016-03-07 13:12       ` Paolo Bonzini
  0 siblings, 0 replies; 118+ messages in thread
From: Paolo Bonzini @ 2016-03-07 13:12 UTC (permalink / raw)
  To: Markus Armbruster
  Cc: claudio.fontana, cam, mlureau, qemu-devel, david.marchand



On 04/03/2016 19:50, Markus Armbruster wrote:
> There's another one in ivshmem_server.c, functionally identical and
> wrapped in CONFIG_LINUX.

Not quite identical, since it returns -1 for non-hugetlbfs.  It should
return getpagesize().

> Here's exec.c's:
> 
>     #define HUGETLBFS_MAGIC       0x958458f6
> 
>     static long gethugepagesize(const char *path, Error **errp)
>     {
>         struct statfs fs;
>         int ret;
> 
>         do {
>             ret = statfs(path, &fs);
>         } while (ret != 0 && errno == EINTR);
> 
>         if (ret != 0) {
>             error_setg_errno(errp, errno, "failed to get page size of file %s",
>                              path);
>             return 0;
>         }
> 
>         return fs.f_bsize;
>     }
> 
> Before commit bfc2a1a, it additionally had
> 
>     if (fs.f_type != HUGETLBFS_MAGIC)
>         fprintf(stderr, "Warning: path not on HugeTLBFS: %s\n", path);
> 
> Note the lack of "if not hugetlbfs, use getpagesize()" logic.
> 
> Here's util/mmap-alloc.c's:
> 
>     #define HUGETLBFS_MAGIC       0x958458f6
> 
>     #ifdef CONFIG_LINUX
>     #include <sys/vfs.h>
>     #endif
> 
>     size_t qemu_fd_getpagesize(int fd)
>     {
>     #ifdef CONFIG_LINUX
>         struct statfs fs;
>         int ret;
> 
>         if (fd != -1) {
>             do {
>                 ret = fstatfs(fd, &fs);
>             } while (ret != 0 && errno == EINTR);
> 
>             if (ret == 0 && fs.f_type == HUGETLBFS_MAGIC) {
>                 return fs.f_bsize;
>             }
>         }
>     #endif
> 
>         return getpagesize();
>     }
> 
> Would you like me to convert the other users to this one and drop the
> dupes?

That would be great, since all of them really should use fstatfs instead
of statfs.

Paolo

* Re: [Qemu-devel] [PATCH 38/38] contrib/ivshmem-server: Print "not for production" warning
  2016-03-03 14:15   ` Marc-André Lureau
@ 2016-03-07 18:42     ` Markus Armbruster
  0 siblings, 0 replies; 118+ messages in thread
From: Markus Armbruster @ 2016-03-07 18:42 UTC (permalink / raw)
  To: Marc-André Lureau
  Cc: Paolo Bonzini, cam, Claudio Fontana, QEMU, David Marchand

Marc-André Lureau <marcandre.lureau@gmail.com> writes:

> Hi
>
> On Mon, Feb 29, 2016 at 7:40 PM, Markus Armbruster <armbru@redhat.com> wrote:
>> The code is okay for illustrating how things work and for testing, but
>> its error handling makes it unfit for production use.  Print a warning
>> to protect the innocent.
>>
>> Signed-off-by: Markus Armbruster <armbru@redhat.com>
>> ---
>
> I guess David would do something about the problems you report. Tbh, I

That would be nice.

> don't think ivshmem-server is so bad wrt error handling.

ivshmem_server_send_one_msg() returns -1 on error with errno set.  Okay.

ivshmem_server_send_initial_info() fails in turn.
ivshmem_server_handle_new_conn() handles this by closing the connection.
Okay, except for EAGAIN and EINTR.

All other callers ignore ivshmem_server_send_one_msg() failures.  Not
okay.

Here's an example of how things could go haywire:

* The server handles connections one after the other.  It makes the file
  descriptors non-blocking.

* When a client connects, ivshmem-server sends 3 + N*V messages to the
  new client, and V messages to each existing client, where N is the
  number of existing clients, and V is the number of vectors.  Of these,
  only the 3 to the new client are checked for errors.  The unchecked
  messages transmit eventfds for interrupts in groups of V messages.

* With a sufficiently large N*V and a sluggish client, the server can
  conceivably hit EAGAIN.  When it happens, the server drops messages
  silently.

* InterVM interrupts corresponding to dropped eventfds will be silently
  dropped.

* If out of a group of V messages any non-trailing messages get dropped,
  the trailing ones get silently miswired to the wrong vector.

Good luck debugging this in the field!

A thorough review of error handling is called for.  Since I can't do
that now, I'm adding the warning.

> Meanwhile:
> Reviewed-by: Marc-André Lureau <marcandre.lureau@redhat.com>

Thanks!

Thread overview: 118+ messages
2016-02-29 18:40 [Qemu-devel] [PATCH 00/38] ivshmem: Fixes, cleanups, device model split Markus Armbruster
2016-02-29 18:40 ` [Qemu-devel] [PATCH 01/38] exec: Fix memory allocation when memory path names new file Markus Armbruster
2016-03-01 11:35   ` Paolo Bonzini
2016-03-01 11:58     ` Markus Armbruster
2016-03-04 18:50     ` Markus Armbruster
2016-03-07 13:12       ` Paolo Bonzini
2016-02-29 18:40 ` [Qemu-devel] [PATCH 02/38] qemu-doc: Fix ivshmem huge page example Markus Armbruster
2016-03-01 10:51   ` Marc-André Lureau
2016-03-01 11:35   ` Paolo Bonzini
2016-02-29 18:40 ` [Qemu-devel] [PATCH 03/38] event_notifier: Make event_notifier_init_fd() #ifdef CONFIG_EVENTFD Markus Armbruster
2016-03-01 10:57   ` Marc-André Lureau
2016-03-01 12:00     ` Markus Armbruster
2016-03-01 12:05       ` Paolo Bonzini
2016-03-01 11:35   ` Paolo Bonzini
2016-02-29 18:40 ` [Qemu-devel] [PATCH 04/38] tests/libqos/pci-pc: Fix qpci_pc_iomap() to map BARs aligned Markus Armbruster
2016-03-01 11:05   ` Marc-André Lureau
2016-03-01 12:05     ` Markus Armbruster
2016-02-29 18:40 ` [Qemu-devel] [PATCH 05/38] ivshmem-test: Improve test case /ivshmem/single Markus Armbruster
2016-03-01 11:06   ` Marc-André Lureau
2016-02-29 18:40 ` [Qemu-devel] [PATCH 06/38] ivshmem-test: Clean up wait for devices to become operational Markus Armbruster
2016-03-01 11:10   ` Marc-André Lureau
2016-02-29 18:40 ` [Qemu-devel] [PATCH 07/38] ivshmem-test: Improve test cases /ivshmem/server-* Markus Armbruster
2016-03-01 11:13   ` Marc-André Lureau
2016-02-29 18:40 ` [Qemu-devel] [PATCH 08/38] ivshmem: Rewrite specification document Markus Armbruster
2016-03-01 11:25   ` Marc-André Lureau
2016-03-01 15:46   ` Eric Blake
2016-03-02  9:50     ` Markus Armbruster
2016-02-29 18:40 ` [Qemu-devel] [PATCH 09/38] ivshmem: Add missing newlines to debug printfs Markus Armbruster
2016-03-01 12:20   ` Marc-André Lureau
2016-02-29 18:40 ` [Qemu-devel] [PATCH 10/38] ivshmem: Compile debug prints unconditionally to prevent bit-rot Markus Armbruster
2016-03-01 12:22   ` Marc-André Lureau
2016-03-01 15:49     ` Eric Blake
2016-03-02  9:51       ` Markus Armbruster
2016-03-02 15:52         ` Eric Blake
2016-02-29 18:40 ` [Qemu-devel] [PATCH 11/38] ivshmem: Clean up after commit 9940c32 Markus Armbruster
2016-03-01 12:47   ` Marc-André Lureau
2016-02-29 18:40 ` [Qemu-devel] [PATCH 12/38] ivshmem: Drop ivshmem_event() stub Markus Armbruster
2016-03-01 12:48   ` Marc-André Lureau
2016-02-29 18:40 ` [Qemu-devel] [PATCH 13/38] ivshmem: Don't destroy the chardev on version mismatch Markus Armbruster
2016-03-01 15:39   ` Marc-André Lureau
2016-03-02  9:52     ` Markus Armbruster
2016-02-29 18:40 ` [Qemu-devel] [PATCH 14/38] ivshmem: Fix harmless misuse of Error Markus Armbruster
2016-03-01 15:47   ` Marc-André Lureau
2016-02-29 18:40 ` [Qemu-devel] [PATCH 15/38] ivshmem: Failed realize() can leave migration blocker behind Markus Armbruster
2016-03-01 15:59   ` Marc-André Lureau
2016-03-02  9:54     ` Markus Armbruster
2016-03-02 10:50       ` Marc-André Lureau
2016-02-29 18:40 ` [Qemu-devel] [PATCH 16/38] ivshmem: Clean up register callbacks Markus Armbruster
2016-03-01 16:04   ` Marc-André Lureau
2016-02-29 18:40 ` [Qemu-devel] [PATCH 17/38] ivshmem: Clean up MSI-X conditions Markus Armbruster
2016-03-01 16:57   ` Marc-André Lureau
2016-03-02 10:25     ` Markus Armbruster
2016-02-29 18:40 ` [Qemu-devel] [PATCH 18/38] ivshmem: Leave INTx alone when using MSI-X Markus Armbruster
2016-03-01 17:14   ` Marc-André Lureau
2016-03-01 17:30     ` Paolo Bonzini
2016-03-02 11:04       ` Markus Armbruster
2016-03-02 14:15         ` Paolo Bonzini
2016-03-02 15:50           ` Markus Armbruster
2016-02-29 18:40 ` [Qemu-devel] [PATCH 19/38] ivshmem: Assert interrupts are set up once Markus Armbruster
2016-03-02 12:02   ` Marc-André Lureau
2016-02-29 18:40 ` [Qemu-devel] [PATCH 20/38] ivshmem: Simplify rejection of invalid peer ID from server Markus Armbruster
2016-03-02 15:08   ` Marc-André Lureau
2016-02-29 18:40 ` [Qemu-devel] [PATCH 21/38] ivshmem: Disentangle ivshmem_read() Markus Armbruster
2016-03-02 15:28   ` Marc-André Lureau
2016-03-02 15:53     ` Markus Armbruster
2016-03-02 17:33       ` Marc-André Lureau
2016-03-02 19:15         ` Markus Armbruster
2016-02-29 18:40 ` [Qemu-devel] [PATCH 22/38] ivshmem: Plug leaks on unplug, fix peer disconnect Markus Armbruster
2016-03-02 17:47   ` Marc-André Lureau
2016-03-02 19:19     ` Markus Armbruster
2016-03-02 23:52       ` Marc-André Lureau
2016-02-29 18:40 ` [Qemu-devel] [PATCH 23/38] ivshmem: Receive shared memory synchronously in realize() Markus Armbruster
2016-03-02 18:11   ` Marc-André Lureau
2016-03-02 19:28     ` Markus Armbruster
2016-02-29 18:40 ` [Qemu-devel] [PATCH 24/38] ivshmem: Propagate errors through ivshmem_recv_setup() Markus Armbruster
2016-03-02 18:27   ` Marc-André Lureau
2016-03-02 19:35     ` Markus Armbruster
2016-03-03  0:03       ` Marc-André Lureau
2016-03-03  7:16         ` Markus Armbruster
2016-02-29 18:40 ` [Qemu-devel] [PATCH 25/38] ivshmem: Rely on server sending the ID right after the version Markus Armbruster
2016-03-02 18:36   ` Marc-André Lureau
2016-02-29 18:40 ` [Qemu-devel] [PATCH 26/38] ivshmem: Drop the hackish test for UNIX domain chardev Markus Armbruster
2016-03-02 18:38   ` Marc-André Lureau
2016-02-29 18:40 ` [Qemu-devel] [PATCH 27/38] ivshmem: Simplify how we cope with short reads from server Markus Armbruster
2016-03-02 18:41   ` Marc-André Lureau
2016-03-02 19:38     ` Markus Armbruster
2016-02-29 18:40 ` [Qemu-devel] [PATCH 28/38] ivshmem: Tighten check of property "size" Markus Armbruster
2016-03-02 18:44   ` Marc-André Lureau
2016-02-29 18:40 ` [Qemu-devel] [PATCH 29/38] ivshmem: Implement shm=... with a memory backend Markus Armbruster
2016-03-01 11:37   ` Paolo Bonzini
2016-03-01 12:08     ` Markus Armbruster
2016-02-29 18:40 ` [Qemu-devel] [PATCH 30/38] ivshmem: Simplify memory regions for BAR 2 (shared memory) Markus Armbruster
2016-03-01 11:42   ` Paolo Bonzini
2016-03-01 12:14     ` Markus Armbruster
2016-03-01 12:17       ` Paolo Bonzini
2016-03-01 11:46   ` Paolo Bonzini
2016-03-01 14:06     ` Markus Armbruster
2016-03-01 15:15       ` Paolo Bonzini
2016-03-02 11:06         ` Markus Armbruster
2016-02-29 18:40 ` [Qemu-devel] [PATCH 31/38] ivshmem: Inline check_shm_size() into its only caller Markus Armbruster
2016-03-02 18:49   ` Marc-André Lureau
2016-02-29 18:40 ` [Qemu-devel] [PATCH 32/38] qdev: New DEFINE_PROP_ON_OFF_AUTO Markus Armbruster
2016-03-02 18:54   ` Marc-André Lureau
2016-02-29 18:40 ` [Qemu-devel] [PATCH 33/38] ivshmem: Replace int role_val by OnOffAuto master Markus Armbruster
2016-03-02 18:56   ` Marc-André Lureau
2016-03-02 19:39     ` Markus Armbruster
2016-02-29 18:40 ` [Qemu-devel] [PATCH 34/38] ivshmem: Split ivshmem-plain, ivshmem-doorbell off ivshmem Markus Armbruster
2016-03-03 13:53   ` Marc-André Lureau
2016-02-29 18:40 ` [Qemu-devel] [PATCH 35/38] ivshmem: Clean up after the previous commit Markus Armbruster
2016-03-03 13:56   ` Marc-André Lureau
2016-02-29 18:40 ` [Qemu-devel] [PATCH 36/38] ivshmem: Drop ivshmem property x-memdev Markus Armbruster
2016-03-03 14:03   ` Marc-André Lureau
2016-03-03 14:17     ` Markus Armbruster
2016-02-29 18:40 ` [Qemu-devel] [PATCH 37/38] ivshmem: Require master to have ID zero Markus Armbruster
2016-03-03 14:11   ` Marc-André Lureau
2016-02-29 18:40 ` [Qemu-devel] [PATCH 38/38] contrib/ivshmem-server: Print "not for production" warning Markus Armbruster
2016-03-03 14:15   ` Marc-André Lureau
2016-03-07 18:42     ` Markus Armbruster
