All of lore.kernel.org
 help / color / mirror / Atom feed
* [Qemu-devel] [PATCH 0/5] proposal to make hostmem listener RAM unplug safe
@ 2013-04-01  8:20 Liu Ping Fan
  2013-04-01  8:20 ` [Qemu-devel] [PATCH 1/5] memory: add ref/unref interface for MemroyRegionOps Liu Ping Fan
                   ` (4 more replies)
  0 siblings, 5 replies; 35+ messages in thread
From: Liu Ping Fan @ 2013-04-01  8:20 UTC (permalink / raw)
  To: qemu-devel
  Cc: Peter Maydell, Anthony Liguori, Jan Kiszka, Marcelo Tosatti,
	Vasilis Liaskovitis, Stefan Hajnoczi, Paolo Bonzini

From: Liu Ping Fan <pingfank@linux.vnet.ibm.com>

This is an proposal similar to push core listener out of biglock
     for core listener out of biglock, 
     refer to orignal plan posted by Marcelo Tosatti,
     http://lists.gnu.org/archive/html/qemu-devel/2012-06/msg04315.html

While core listener's out of biglock still no driven use case, as to
HostMem listner, it is already out of biglock, we need to ensure it RAM
unplug safe.  Currently, to handle such issue, we need to call
bdrv_drain_all() in hostmem->commit(), while these patch fix this issue
with refcnt.

RAM_Device {
  int refcnt; ---> this is what we to inc/dec when hostmem_lookup()/"req cb"
....
  MemoryRegion mr;
}

For RAM hotunplug, please refer to Vasilis Liaskovitis's patches
   http://lists.gnu.org/archive/html/qemu-devel/2012-12/msg02693.html


Liu Ping Fan (5):
  memory: add ref/unref interface for MemroyRegionOps
  hostmem: make hostmem global and RAM hotunplg safe
  vring: use hostmem's RAM safe api
  virtio-blk: release reference to RAM's memoryRegion
  hostmem: init/finalize hostmem listener

 hw/dataplane/hostmem.c    |  130 +++++++++++++++++++++++++++++++-------------
 hw/dataplane/hostmem.h    |   19 ++-----
 hw/dataplane/virtio-blk.c |   52 ++++++++++++++----
 hw/dataplane/vring.c      |   88 ++++++++++++++++++++++--------
 hw/dataplane/vring.h      |    5 +-
 include/exec/memory.h     |   10 ++++
 include/qemu/main-loop.h  |    2 +
 memory.c                  |   18 ++++++
 vl.c                      |    2 +
 9 files changed, 235 insertions(+), 91 deletions(-)

-- 
1.7.4.4

^ permalink raw reply	[flat|nested] 35+ messages in thread

* [Qemu-devel] [PATCH 1/5] memory: add ref/unref interface for MemroyRegionOps
  2013-04-01  8:20 [Qemu-devel] [PATCH 0/5] proposal to make hostmem listener RAM unplug safe Liu Ping Fan
@ 2013-04-01  8:20 ` Liu Ping Fan
  2013-04-11  9:49   ` Stefan Hajnoczi
  2013-04-01  8:20 ` [Qemu-devel] [PATCH 2/5] hostmem: make hostmem global and RAM hotunplg safe Liu Ping Fan
                   ` (3 subsequent siblings)
  4 siblings, 1 reply; 35+ messages in thread
From: Liu Ping Fan @ 2013-04-01  8:20 UTC (permalink / raw)
  To: qemu-devel
  Cc: Peter Maydell, Anthony Liguori, Jan Kiszka, Marcelo Tosatti,
	Vasilis Liaskovitis, Stefan Hajnoczi, Paolo Bonzini

From: Liu Ping Fan <pingfank@linux.vnet.ibm.com>

This pair of interface are optinal, except for those device which is
used outside the biglock's protection for hot unplug. Currently,
HostMem used by virtio-blk dataplane is outside biglock, so the RAM
device should implement this.

Signed-off-by: Liu Ping Fan <pingfank@linux.vnet.ibm.com>
---
 include/exec/memory.h |   10 ++++++++++
 memory.c              |   18 ++++++++++++++++++
 2 files changed, 28 insertions(+), 0 deletions(-)

diff --git a/include/exec/memory.h b/include/exec/memory.h
index 2322732..4289e62 100644
--- a/include/exec/memory.h
+++ b/include/exec/memory.h
@@ -55,6 +55,12 @@ struct MemoryRegionIORange {
  * Memory region callbacks
  */
 struct MemoryRegionOps {
+
+    /* ref/unref pair is optional;  ref.
+     * inc refcnt of object who store MemoryRegion
+     */
+    void (*ref)(void);
+    void (*unref)(void);
     /* Read from the memory region. @addr is relative to @mr; @size is
      * in bytes. */
     uint64_t (*read)(void *opaque,
@@ -228,6 +234,10 @@ struct MemoryListener {
     QTAILQ_ENTRY(MemoryListener) link;
 };
 
+/**/
+bool memory_region_ref(MemoryRegion *mr);
+bool memory_region_unref(MemoryRegion *mr);
+
 /**
  * memory_region_init: Initialize a memory region
  *
diff --git a/memory.c b/memory.c
index 75ca281..c29998d 100644
--- a/memory.c
+++ b/memory.c
@@ -786,6 +786,24 @@ static bool memory_region_wrong_endianness(MemoryRegion *mr)
 #endif
 }
 
+bool memory_region_ref(MemoryRegion *mr)
+{
+    if (mr->ops && mr->ops->ref) {
+        mr->ops->ref();
+        return true;
+    }
+    return false;
+}
+
+bool memory_region_unref(MemoryRegion *mr)
+{
+    if (mr->ops && mr->ops->unref) {
+        mr->ops->unref();
+        return true;
+    }
+    return false;
+}
+
 void memory_region_init(MemoryRegion *mr,
                         const char *name,
                         uint64_t size)
-- 
1.7.4.4

^ permalink raw reply related	[flat|nested] 35+ messages in thread

* [Qemu-devel] [PATCH 2/5] hostmem: make hostmem global and RAM hotunplg safe
  2013-04-01  8:20 [Qemu-devel] [PATCH 0/5] proposal to make hostmem listener RAM unplug safe Liu Ping Fan
  2013-04-01  8:20 ` [Qemu-devel] [PATCH 1/5] memory: add ref/unref interface for MemroyRegionOps Liu Ping Fan
@ 2013-04-01  8:20 ` Liu Ping Fan
  2013-04-02  6:11   ` li guang
                     ` (2 more replies)
  2013-04-01  8:20 ` [Qemu-devel] [PATCH 3/5] vring: use hostmem's RAM safe api Liu Ping Fan
                   ` (2 subsequent siblings)
  4 siblings, 3 replies; 35+ messages in thread
From: Liu Ping Fan @ 2013-04-01  8:20 UTC (permalink / raw)
  To: qemu-devel
  Cc: Peter Maydell, Anthony Liguori, Jan Kiszka, Marcelo Tosatti,
	Vasilis Liaskovitis, Stefan Hajnoczi, Paolo Bonzini

From: Liu Ping Fan <pingfank@linux.vnet.ibm.com>

The hostmem listener will translate gpa to hva quickly. Make it
global, so other component can use it.
Also this patch adopt MemoryRegionOps's ref/unref interface to
make it survive the RAM hotunplug.

Signed-off-by: Liu Ping Fan <pingfank@linux.vnet.ibm.com>
---
 hw/dataplane/hostmem.c |  130 +++++++++++++++++++++++++++++++++--------------
 hw/dataplane/hostmem.h |   19 ++------
 2 files changed, 95 insertions(+), 54 deletions(-)

diff --git a/hw/dataplane/hostmem.c b/hw/dataplane/hostmem.c
index 380537e..86c02cd 100644
--- a/hw/dataplane/hostmem.c
+++ b/hw/dataplane/hostmem.c
@@ -13,6 +13,12 @@
 
 #include "exec/address-spaces.h"
 #include "hostmem.h"
+#include "qemu/main-loop.h"
+
+/* lock to protect the access to cur_hostmem */
+static QemuMutex hostmem_lock;
+static HostMem *cur_hostmem;
+static HostMem *next_hostmem;
 
 static int hostmem_lookup_cmp(const void *phys_, const void *region_)
 {
@@ -28,16 +34,49 @@ static int hostmem_lookup_cmp(const void *phys_, const void *region_)
     }
 }
 
+static void hostmem_ref(HostMem *hostmem)
+{
+    assert(__sync_add_and_fetch(&hostmem->ref, 1) > 0);
+}
+
+void hostmem_unref(HostMem *hostmem)
+{
+    int i;
+    HostMemRegion *hmr;
+
+    if (!__sync_sub_and_fetch(&hostmem->ref, 1)) {
+        for (i = 0; i < hostmem->num_current_regions; i++) {
+            hmr = &hostmem->current_regions[i];
+            /* Fix me, when memory hotunplug implement
+             * assert(hmr->mr_ops->unref)
+             */
+            if (hmr->mr->ops && hmr->mr->ops->unref) {
+                hmr->mr->ops->unref();
+            }
+        }
+        g_free(hostmem->current_regions);
+        g_free(hostmem);
+    }
+}
+
 /**
  * Map guest physical address to host pointer
+ * also inc refcnt of *mr, caller need to unref later
  */
-void *hostmem_lookup(HostMem *hostmem, hwaddr phys, hwaddr len, bool is_write)
+void *hostmem_lookup(hwaddr phys, hwaddr len, MemoryRegion **mr, bool is_write)
 {
     HostMemRegion *region;
     void *host_addr = NULL;
     hwaddr offset_within_region;
+    HostMem *hostmem;
+
+    assert(mr);
+    *mr = NULL;
+    qemu_mutex_lock(&hostmem_lock);
+    hostmem = cur_hostmem;
+    hostmem_ref(hostmem);
+    qemu_mutex_unlock(&hostmem_lock);
 
-    qemu_mutex_lock(&hostmem->current_regions_lock);
     region = bsearch(&phys, hostmem->current_regions,
                      hostmem->num_current_regions,
                      sizeof(hostmem->current_regions[0]),
@@ -52,28 +91,33 @@ void *hostmem_lookup(HostMem *hostmem, hwaddr phys, hwaddr len, bool is_write)
     if (len <= region->size - offset_within_region) {
         host_addr = region->host_addr + offset_within_region;
     }
-out:
-    qemu_mutex_unlock(&hostmem->current_regions_lock);
+    *mr = region->mr;
+    memory_region_ref(*mr);
 
+out:
+    hostmem_unref(hostmem);
     return host_addr;
 }
 
+static void hostmem_listener_begin(MemoryListener *listener)
+{
+    next_hostmem = g_new0(HostMem, 1);
+    next_hostmem->ref = 1;
+}
+
 /**
  * Install new regions list
  */
 static void hostmem_listener_commit(MemoryListener *listener)
 {
-    HostMem *hostmem = container_of(listener, HostMem, listener);
+    HostMem *tmp;
 
-    qemu_mutex_lock(&hostmem->current_regions_lock);
-    g_free(hostmem->current_regions);
-    hostmem->current_regions = hostmem->new_regions;
-    hostmem->num_current_regions = hostmem->num_new_regions;
-    qemu_mutex_unlock(&hostmem->current_regions_lock);
+    tmp = cur_hostmem;
+    qemu_mutex_lock(&hostmem_lock);
+    cur_hostmem = next_hostmem;
+    qemu_mutex_unlock(&hostmem_lock);
+    hostmem_unref(tmp);
 
-    /* Reset new regions list */
-    hostmem->new_regions = NULL;
-    hostmem->num_new_regions = 0;
 }
 
 /**
@@ -83,23 +127,24 @@ static void hostmem_append_new_region(HostMem *hostmem,
                                       MemoryRegionSection *section)
 {
     void *ram_ptr = memory_region_get_ram_ptr(section->mr);
-    size_t num = hostmem->num_new_regions;
-    size_t new_size = (num + 1) * sizeof(hostmem->new_regions[0]);
+    size_t num = hostmem->num_current_regions;
+    size_t new_size = (num + 1) * sizeof(hostmem->current_regions[0]);
 
-    hostmem->new_regions = g_realloc(hostmem->new_regions, new_size);
-    hostmem->new_regions[num] = (HostMemRegion){
+    hostmem->current_regions = g_realloc(hostmem->current_regions, new_size);
+    hostmem->current_regions[num] = (HostMemRegion){
         .host_addr = ram_ptr + section->offset_within_region,
         .guest_addr = section->offset_within_address_space,
+        .mr = section->mr,
         .size = section->size,
         .readonly = section->readonly,
     };
-    hostmem->num_new_regions++;
+    hostmem->num_current_regions++;
 }
 
-static void hostmem_listener_append_region(MemoryListener *listener,
+static void hostmem_listener_nop_region(MemoryListener *listener,
                                            MemoryRegionSection *section)
 {
-    HostMem *hostmem = container_of(listener, HostMem, listener);
+    HostMem *hostmem = next_hostmem;
 
     /* Ignore non-RAM regions, we may not be able to map them */
     if (!memory_region_is_ram(section->mr)) {
@@ -114,6 +159,18 @@ static void hostmem_listener_append_region(MemoryListener *listener,
     hostmem_append_new_region(hostmem, section);
 }
 
+static void hostmem_listener_append_region(MemoryListener *listener,
+                                           MemoryRegionSection *section)
+{
+    hostmem_listener_nop_region(listener, section);
+    /* Fix me, when memory hotunplug implement
+     * assert(section->mr->ops->ref)
+     */
+    if (section->mr->ops && section->mr->ops->ref) {
+        section->mr->ops->ref();
+    }
+}
+
 /* We don't implement most MemoryListener callbacks, use these nop stubs */
 static void hostmem_listener_dummy(MemoryListener *listener)
 {
@@ -137,18 +194,12 @@ static void hostmem_listener_coalesced_mmio_dummy(MemoryListener *listener,
 {
 }
 
-void hostmem_init(HostMem *hostmem)
-{
-    memset(hostmem, 0, sizeof(*hostmem));
-
-    qemu_mutex_init(&hostmem->current_regions_lock);
-
-    hostmem->listener = (MemoryListener){
-        .begin = hostmem_listener_dummy,
+static MemoryListener hostmem_listener = {
+        .begin = hostmem_listener_begin,
         .commit = hostmem_listener_commit,
         .region_add = hostmem_listener_append_region,
         .region_del = hostmem_listener_section_dummy,
-        .region_nop = hostmem_listener_append_region,
+        .region_nop = hostmem_listener_nop_region,
         .log_start = hostmem_listener_section_dummy,
         .log_stop = hostmem_listener_section_dummy,
         .log_sync = hostmem_listener_section_dummy,
@@ -159,18 +210,19 @@ void hostmem_init(HostMem *hostmem)
         .coalesced_mmio_add = hostmem_listener_coalesced_mmio_dummy,
         .coalesced_mmio_del = hostmem_listener_coalesced_mmio_dummy,
         .priority = 10,
-    };
+};
 
-    memory_listener_register(&hostmem->listener, &address_space_memory);
-    if (hostmem->num_new_regions > 0) {
-        hostmem_listener_commit(&hostmem->listener);
-    }
+void hostmem_init(void)
+{
+    qemu_mutex_init(&hostmem_lock);
+    cur_hostmem = g_new0(HostMem, 1);
+    cur_hostmem->ref = 1;
+    memory_listener_register(&hostmem_listener, &address_space_memory);
 }
 
-void hostmem_finalize(HostMem *hostmem)
+void hostmem_finalize(void)
 {
-    memory_listener_unregister(&hostmem->listener);
-    g_free(hostmem->new_regions);
-    g_free(hostmem->current_regions);
-    qemu_mutex_destroy(&hostmem->current_regions_lock);
+    memory_listener_unregister(&hostmem_listener);
+    hostmem_unref(cur_hostmem);
+    qemu_mutex_destroy(&hostmem_lock);
 }
diff --git a/hw/dataplane/hostmem.h b/hw/dataplane/hostmem.h
index b2cf093..883ba74 100644
--- a/hw/dataplane/hostmem.h
+++ b/hw/dataplane/hostmem.h
@@ -20,29 +20,18 @@
 typedef struct {
     void *host_addr;
     hwaddr guest_addr;
+    MemoryRegion *mr;
     uint64_t size;
     bool readonly;
 } HostMemRegion;
 
 typedef struct {
-    /* The listener is invoked when regions change and a new list of regions is
-     * built up completely before they are installed.
-     */
-    MemoryListener listener;
-    HostMemRegion *new_regions;
-    size_t num_new_regions;
-
-    /* Current regions are accessed from multiple threads either to lookup
-     * addresses or to install a new list of regions.  The lock protects the
-     * pointer and the regions.
-     */
-    QemuMutex current_regions_lock;
+    int ref;
     HostMemRegion *current_regions;
     size_t num_current_regions;
 } HostMem;
 
-void hostmem_init(HostMem *hostmem);
-void hostmem_finalize(HostMem *hostmem);
+void hostmem_unref(HostMem *hostmem);
 
 /**
  * Map a guest physical address to a pointer
@@ -52,6 +41,6 @@ void hostmem_finalize(HostMem *hostmem);
  * can be done with other mechanisms like bdrv_drain_all() that quiesce
  * in-flight I/O.
  */
-void *hostmem_lookup(HostMem *hostmem, hwaddr phys, hwaddr len, bool is_write);
+void *hostmem_lookup(hwaddr phys, hwaddr len, MemoryRegion **mr, bool is_write);
 
 #endif /* HOSTMEM_H */
-- 
1.7.4.4

^ permalink raw reply related	[flat|nested] 35+ messages in thread

* [Qemu-devel] [PATCH 3/5] vring: use hostmem's RAM safe api
  2013-04-01  8:20 [Qemu-devel] [PATCH 0/5] proposal to make hostmem listener RAM unplug safe Liu Ping Fan
  2013-04-01  8:20 ` [Qemu-devel] [PATCH 1/5] memory: add ref/unref interface for MemroyRegionOps Liu Ping Fan
  2013-04-01  8:20 ` [Qemu-devel] [PATCH 2/5] hostmem: make hostmem global and RAM hotunplg safe Liu Ping Fan
@ 2013-04-01  8:20 ` Liu Ping Fan
  2013-04-11 10:15   ` Stefan Hajnoczi
  2013-04-01  8:20 ` [Qemu-devel] [PATCH 4/5] virtio-blk: release reference to RAM's memoryRegion Liu Ping Fan
  2013-04-01  8:20 ` [Qemu-devel] [PATCH 5/5] hostmem: init/finalize hostmem listener Liu Ping Fan
  4 siblings, 1 reply; 35+ messages in thread
From: Liu Ping Fan @ 2013-04-01  8:20 UTC (permalink / raw)
  To: qemu-devel
  Cc: Peter Maydell, Anthony Liguori, Jan Kiszka, Marcelo Tosatti,
	Vasilis Liaskovitis, Stefan Hajnoczi, Paolo Bonzini

From: Liu Ping Fan <pingfank@linux.vnet.ibm.com>

When lookup gpa to hva, the corresponding MemoryRegion will be exposed
to caller, and will release by caller later

Signed-off-by: Liu Ping Fan <pingfank@linux.vnet.ibm.com>
---
 hw/dataplane/vring.c     |   88 ++++++++++++++++++++++++++++++++++------------
 hw/dataplane/vring.h     |    5 ++-
 include/qemu/main-loop.h |    2 +
 3 files changed, 70 insertions(+), 25 deletions(-)

diff --git a/hw/dataplane/vring.c b/hw/dataplane/vring.c
index e3b2253..5a9b335 100644
--- a/hw/dataplane/vring.c
+++ b/hw/dataplane/vring.c
@@ -27,8 +27,7 @@ bool vring_setup(Vring *vring, VirtIODevice *vdev, int n)
 
     vring->broken = false;
 
-    hostmem_init(&vring->hostmem);
-    vring_ptr = hostmem_lookup(&vring->hostmem, vring_addr, vring_size, true);
+    vring_ptr = hostmem_lookup(vring_addr, vring_size, &vring->vring_mr, true);
     if (!vring_ptr) {
         error_report("Failed to map vring "
                      "addr %#" HWADDR_PRIx " size %" HWADDR_PRIu,
@@ -51,7 +50,7 @@ bool vring_setup(Vring *vring, VirtIODevice *vdev, int n)
 
 void vring_teardown(Vring *vring)
 {
-    hostmem_finalize(&vring->hostmem);
+    memory_region_unref(vring->vring_mr);
 }
 
 /* Disable guest->host notifies */
@@ -111,11 +110,14 @@ bool vring_should_notify(VirtIODevice *vdev, Vring *vring)
 static int get_indirect(Vring *vring,
                         struct iovec iov[], struct iovec *iov_end,
                         unsigned int *out_num, unsigned int *in_num,
-                        struct vring_desc *indirect)
+                        struct vring_desc *indirect,
+                        MemoryRegion ***mrs)
 {
     struct vring_desc desc;
     unsigned int i = 0, count, found = 0;
-
+    MemoryRegion **cur = *mrs;
+    int ret = 0;
+    MemoryRegion *tmp;
     /* Sanity check */
     if (unlikely(indirect->len % sizeof(desc))) {
         error_report("Invalid length in indirect descriptor: "
@@ -138,16 +140,18 @@ static int get_indirect(Vring *vring,
         struct vring_desc *desc_ptr;
 
         /* Translate indirect descriptor */
-        desc_ptr = hostmem_lookup(&vring->hostmem,
-                                  indirect->addr + found * sizeof(desc),
-                                  sizeof(desc), false);
+        desc_ptr = hostmem_lookup(indirect->addr + found * sizeof(desc),
+                                  sizeof(desc),
+                                  &tmp,
+                                  false);
         if (!desc_ptr) {
             error_report("Failed to map indirect descriptor "
                          "addr %#" PRIx64 " len %zu",
                          (uint64_t)indirect->addr + found * sizeof(desc),
                          sizeof(desc));
             vring->broken = true;
-            return -EFAULT;
+            ret = -EFAULT;
+            goto fail;
         }
         desc = *desc_ptr;
 
@@ -158,31 +162,41 @@ static int get_indirect(Vring *vring,
             error_report("Loop detected: last one at %u "
                          "indirect size %u", i, count);
             vring->broken = true;
-            return -EFAULT;
+            memory_region_unref(tmp);
+            ret = -EFAULT;
+            goto fail;
         }
 
         if (unlikely(desc.flags & VRING_DESC_F_INDIRECT)) {
             error_report("Nested indirect descriptor");
             vring->broken = true;
-            return -EFAULT;
+            memory_region_unref(tmp);
+            ret = -EFAULT;
+            goto fail;
         }
 
         /* Stop for now if there are not enough iovecs available. */
         if (iov >= iov_end) {
-            return -ENOBUFS;
+            memory_region_unref(tmp);
+            ret = -ENOBUFS;
+            goto fail;
         }
 
-        iov->iov_base = hostmem_lookup(&vring->hostmem, desc.addr, desc.len,
+        iov->iov_base = hostmem_lookup(desc.addr, desc.len,
+                                       cur,
                                        desc.flags & VRING_DESC_F_WRITE);
         if (!iov->iov_base) {
             error_report("Failed to map indirect descriptor"
                          "addr %#" PRIx64 " len %u",
                          (uint64_t)desc.addr, desc.len);
             vring->broken = true;
-            return -EFAULT;
+            memory_region_unref(tmp);
+            ret = -EFAULT;
+            goto fail;
         }
         iov->iov_len = desc.len;
         iov++;
+        cur++;
 
         /* If this is an input descriptor, increment that count. */
         if (desc.flags & VRING_DESC_F_WRITE) {
@@ -194,13 +208,23 @@ static int get_indirect(Vring *vring,
                 error_report("Indirect descriptor "
                              "has out after in: idx %u", i);
                 vring->broken = true;
-                return -EFAULT;
+                memory_region_unref(tmp);
+                ret = -EFAULT;
+                goto fail;
             }
             *out_num += 1;
         }
         i = desc.next;
+        memory_region_unref(tmp);
     } while (desc.flags & VRING_DESC_F_NEXT);
+    *mrs = cur;
     return 0;
+
+fail:
+    for (; cur > *mrs; cur--) {
+        memory_region_unref(*cur);
+    }
+    return ret;
 }
 
 /* This looks in the virtqueue and for the first available buffer, and converts
@@ -212,15 +236,20 @@ static int get_indirect(Vring *vring,
  * never a valid descriptor number) if none was found.  A negative code is
  * returned on error.
  *
+ * @mrs should be no less than iov[]
+ *
  * Stolen from linux/drivers/vhost/vhost.c.
  */
 int vring_pop(VirtIODevice *vdev, Vring *vring,
               struct iovec iov[], struct iovec *iov_end,
-              unsigned int *out_num, unsigned int *in_num)
+              unsigned int *out_num, unsigned int *in_num,
+              MemoryRegion **mrs)
 {
     struct vring_desc desc;
     unsigned int i, head, found = 0, num = vring->vr.num;
     uint16_t avail_idx, last_avail_idx;
+    MemoryRegion **cur = mrs;
+    int ret = 0;
 
     /* If there was a fatal error then refuse operation */
     if (vring->broken) {
@@ -270,13 +299,15 @@ int vring_pop(VirtIODevice *vdev, Vring *vring,
         if (unlikely(i >= num)) {
             error_report("Desc index is %u > %u, head = %u", i, num, head);
             vring->broken = true;
-            return -EFAULT;
+            ret = -EFAULT;
+            goto fail;
         }
         if (unlikely(++found > num)) {
             error_report("Loop detected: last one at %u vq size %u head %u",
                          i, num, head);
             vring->broken = true;
-            return -EFAULT;
+            ret = -EFAULT;
+            goto fail;
         }
         desc = vring->vr.desc[i];
 
@@ -284,7 +315,8 @@ int vring_pop(VirtIODevice *vdev, Vring *vring,
         barrier();
 
         if (desc.flags & VRING_DESC_F_INDIRECT) {
-            int ret = get_indirect(vring, iov, iov_end, out_num, in_num, &desc);
+            int ret = get_indirect(vring, iov, iov_end, out_num, in_num, &desc,
+                        &cur);
             if (ret < 0) {
                 return ret;
             }
@@ -296,20 +328,24 @@ int vring_pop(VirtIODevice *vdev, Vring *vring,
          * with the current set.
          */
         if (iov >= iov_end) {
-            return -ENOBUFS;
+            ret = -ENOBUFS;
+            goto fail;
         }
 
         /* TODO handle non-contiguous memory across region boundaries */
-        iov->iov_base = hostmem_lookup(&vring->hostmem, desc.addr, desc.len,
+        iov->iov_base = hostmem_lookup(desc.addr, desc.len,
+                                       cur,
                                        desc.flags & VRING_DESC_F_WRITE);
         if (!iov->iov_base) {
             error_report("Failed to map vring desc addr %#" PRIx64 " len %u",
                          (uint64_t)desc.addr, desc.len);
             vring->broken = true;
-            return -EFAULT;
+            ret = -EFAULT;
+            goto fail;
         }
         iov->iov_len  = desc.len;
         iov++;
+        cur++;
 
         if (desc.flags & VRING_DESC_F_WRITE) {
             /* If this is an input descriptor,
@@ -321,7 +357,8 @@ int vring_pop(VirtIODevice *vdev, Vring *vring,
             if (unlikely(*in_num)) {
                 error_report("Descriptor has out after in: idx %d", i);
                 vring->broken = true;
-                return -EFAULT;
+                ret = -EFAULT;
+                goto fail;
             }
             *out_num += 1;
         }
@@ -331,6 +368,11 @@ int vring_pop(VirtIODevice *vdev, Vring *vring,
     /* On success, increment avail index. */
     vring->last_avail_idx++;
     return head;
+fail:
+    for (; cur > mrs; cur--) {
+        memory_region_unref(*cur);
+    }
+    return ret;
 }
 
 /* After we've used one of their buffers, we tell them about it.
diff --git a/hw/dataplane/vring.h b/hw/dataplane/vring.h
index defb1ef..e00deaa 100644
--- a/hw/dataplane/vring.h
+++ b/hw/dataplane/vring.h
@@ -23,7 +23,7 @@
 #include "hw/virtio.h"
 
 typedef struct {
-    HostMem hostmem;                /* guest memory mapper */
+    MemoryRegion *vring_mr;   /* RAM's memoryRegion on which this vring sits */
     struct vring vr;                /* virtqueue vring mapped to host memory */
     uint16_t last_avail_idx;        /* last processed avail ring index */
     uint16_t last_used_idx;         /* last processed used ring index */
@@ -56,7 +56,8 @@ bool vring_enable_notification(VirtIODevice *vdev, Vring *vring);
 bool vring_should_notify(VirtIODevice *vdev, Vring *vring);
 int vring_pop(VirtIODevice *vdev, Vring *vring,
               struct iovec iov[], struct iovec *iov_end,
-              unsigned int *out_num, unsigned int *in_num);
+              unsigned int *out_num, unsigned int *in_num,
+              MemoryRegion **mrs);
 void vring_push(Vring *vring, unsigned int head, int len);
 
 #endif /* VRING_H */
diff --git a/include/qemu/main-loop.h b/include/qemu/main-loop.h
index 6f0200a..58592be 100644
--- a/include/qemu/main-loop.h
+++ b/include/qemu/main-loop.h
@@ -308,4 +308,6 @@ void qemu_iohandler_poll(GArray *pollfds, int rc);
 QEMUBH *qemu_bh_new(QEMUBHFunc *cb, void *opaque);
 void qemu_bh_schedule_idle(QEMUBH *bh);
 
+void hostmem_init(void);
+void hostmem_finalize(void);
 #endif
-- 
1.7.4.4

^ permalink raw reply related	[flat|nested] 35+ messages in thread

* [Qemu-devel] [PATCH 4/5] virtio-blk: release reference to RAM's memoryRegion
  2013-04-01  8:20 [Qemu-devel] [PATCH 0/5] proposal to make hostmem listener RAM unplug safe Liu Ping Fan
                   ` (2 preceding siblings ...)
  2013-04-01  8:20 ` [Qemu-devel] [PATCH 3/5] vring: use hostmem's RAM safe api Liu Ping Fan
@ 2013-04-01  8:20 ` Liu Ping Fan
  2013-04-02  5:58   ` li guang
  2013-04-11 10:20   ` Stefan Hajnoczi
  2013-04-01  8:20 ` [Qemu-devel] [PATCH 5/5] hostmem: init/finalize hostmem listener Liu Ping Fan
  4 siblings, 2 replies; 35+ messages in thread
From: Liu Ping Fan @ 2013-04-01  8:20 UTC (permalink / raw)
  To: qemu-devel
  Cc: Peter Maydell, Anthony Liguori, Jan Kiszka, Marcelo Tosatti,
	Vasilis Liaskovitis, Stefan Hajnoczi, Paolo Bonzini

From: Liu Ping Fan <pingfank@linux.vnet.ibm.com>

virtio-blk will reference to RAM's memoryRegion when the req has been
done.  So we can avoid to call bdrv_drain_all() when RAM hot unplug.

Signed-off-by: Liu Ping Fan <pingfank@linux.vnet.ibm.com>
---
 hw/dataplane/virtio-blk.c |   52 ++++++++++++++++++++++++++++++++++----------
 1 files changed, 40 insertions(+), 12 deletions(-)

diff --git a/hw/dataplane/virtio-blk.c b/hw/dataplane/virtio-blk.c
index 1242d61..437174d 100644
--- a/hw/dataplane/virtio-blk.c
+++ b/hw/dataplane/virtio-blk.c
@@ -34,6 +34,8 @@ enum {
 
 typedef struct {
     struct iocb iocb;               /* Linux AIO control block */
+    MemoryRegion *mrs[VRING_MAX];
+    int mrs_cnt;
     QEMUIOVector *inhdr;            /* iovecs for virtio_blk_inhdr */
     unsigned int head;              /* vring descriptor index */
     struct iovec *bounce_iov;       /* used if guest buffers are unaligned */
@@ -120,6 +122,9 @@ static void complete_request(struct iocb *iocb, ssize_t ret, void *opaque)
      * transferred plus the status bytes.
      */
     vring_push(&s->vring, req->head, len + sizeof(hdr));
+    while (--req->mrs_cnt >= 0) {
+        memory_region_unref(req->mrs[req->mrs_cnt]);
+    }
 
     s->num_reqs--;
 }
@@ -155,7 +160,8 @@ static void do_get_id_cmd(VirtIOBlockDataPlane *s,
 static int do_rdwr_cmd(VirtIOBlockDataPlane *s, bool read,
                        struct iovec *iov, unsigned int iov_cnt,
                        long long offset, unsigned int head,
-                       QEMUIOVector *inhdr)
+                       QEMUIOVector *inhdr,
+                       MemoryRegion **mrs, int cnt)
 {
     struct iocb *iocb;
     QEMUIOVector qiov;
@@ -187,6 +193,8 @@ static int do_rdwr_cmd(VirtIOBlockDataPlane *s, bool read,
 
     /* Fill in virtio block metadata needed for completion */
     VirtIOBlockRequest *req = container_of(iocb, VirtIOBlockRequest, iocb);
+    memcpy(req->mrs, mrs, cnt*sizeof(MemoryRegion *));
+    req->mrs_cnt = cnt;
     req->head = head;
     req->inhdr = inhdr;
     req->bounce_iov = bounce_iov;
@@ -196,19 +204,22 @@ static int do_rdwr_cmd(VirtIOBlockDataPlane *s, bool read,
 
 static int process_request(IOQueue *ioq, struct iovec iov[],
                            unsigned int out_num, unsigned int in_num,
-                           unsigned int head)
+                           unsigned int head, MemoryRegion **mrs)
 {
     VirtIOBlockDataPlane *s = container_of(ioq, VirtIOBlockDataPlane, ioqueue);
     struct iovec *in_iov = &iov[out_num];
     struct virtio_blk_outhdr outhdr;
     QEMUIOVector *inhdr;
     size_t in_size;
+    unsigned int i, cnt = out_num+in_num;
+    int ret;
 
     /* Copy in outhdr */
     if (unlikely(iov_to_buf(iov, out_num, 0, &outhdr,
                             sizeof(outhdr)) != sizeof(outhdr))) {
         error_report("virtio-blk request outhdr too short");
-        return -EFAULT;
+        ret = -EFAULT;
+        goto free_mrs;
     }
     iov_discard_front(&iov, &out_num, sizeof(outhdr));
 
@@ -216,7 +227,8 @@ static int process_request(IOQueue *ioq, struct iovec iov[],
     in_size = iov_size(in_iov, in_num);
     if (in_size < sizeof(struct virtio_blk_inhdr)) {
         error_report("virtio_blk request inhdr too short");
-        return -EFAULT;
+        ret = -EFAULT;
+        goto free_mrs;
     }
     inhdr = g_slice_new(QEMUIOVector);
     qemu_iovec_init(inhdr, 1);
@@ -229,18 +241,22 @@ static int process_request(IOQueue *ioq, struct iovec iov[],
     outhdr.type &= ~VIRTIO_BLK_T_BARRIER;
 
     switch (outhdr.type) {
+    /* For VIRTIO_BLK_T_IN/OUT, MemoryRegion will be release when cb */
     case VIRTIO_BLK_T_IN:
-        do_rdwr_cmd(s, true, in_iov, in_num, outhdr.sector * 512, head, inhdr);
+        do_rdwr_cmd(s, true, in_iov, in_num, outhdr.sector * 512, head, inhdr,
+            mrs, cnt);
         return 0;
 
     case VIRTIO_BLK_T_OUT:
-        do_rdwr_cmd(s, false, iov, out_num, outhdr.sector * 512, head, inhdr);
+        do_rdwr_cmd(s, false, iov, out_num, outhdr.sector * 512, head, inhdr,
+            mrs, cnt);
         return 0;
 
     case VIRTIO_BLK_T_SCSI_CMD:
         /* TODO support SCSI commands */
         complete_request_early(s, head, inhdr, VIRTIO_BLK_S_UNSUPP);
-        return 0;
+        ret = 0;
+        goto free_mrs;
 
     case VIRTIO_BLK_T_FLUSH:
         /* TODO fdsync not supported by Linux AIO, do it synchronously here! */
@@ -249,18 +265,27 @@ static int process_request(IOQueue *ioq, struct iovec iov[],
         } else {
             complete_request_early(s, head, inhdr, VIRTIO_BLK_S_OK);
         }
-        return 0;
+        ret = 0;
+        goto free_mrs;
 
     case VIRTIO_BLK_T_GET_ID:
         do_get_id_cmd(s, in_iov, in_num, head, inhdr);
-        return 0;
+        ret = 0;
+        goto free_mrs;
 
     default:
         error_report("virtio-blk unsupported request type %#x", outhdr.type);
         qemu_iovec_destroy(inhdr);
         g_slice_free(QEMUIOVector, inhdr);
-        return -EFAULT;
+        ret = -EFAULT;
+        goto free_mrs;
+    }
+
+free_mrs:
+    for (i = 0; i < cnt; i++) {
+        memory_region_unref(mrs[i]);
     }
+    return ret;
 }
 
 static int flush_true(EventNotifier *e)
@@ -286,6 +311,7 @@ static void handle_notify(EventNotifier *e)
     struct iovec iovec[VRING_MAX];
     struct iovec *end = &iovec[VRING_MAX];
     struct iovec *iov = iovec;
+    MemoryRegion *mrs[VRING_MAX];
 
     /* When a request is read from the vring, the index of the first descriptor
      * (aka head) is returned so that the completed request can be pushed onto
@@ -304,7 +330,8 @@ static void handle_notify(EventNotifier *e)
         vring_disable_notification(s->vdev, &s->vring);
 
         for (;;) {
-            head = vring_pop(s->vdev, &s->vring, iov, end, &out_num, &in_num);
+            head = vring_pop(s->vdev, &s->vring, iov, end, &out_num, &in_num,
+                            mrs);
             if (head < 0) {
                 break; /* no more requests */
             }
@@ -312,7 +339,8 @@ static void handle_notify(EventNotifier *e)
             trace_virtio_blk_data_plane_process_request(s, out_num, in_num,
                                                         head);
 
-            if (process_request(&s->ioqueue, iov, out_num, in_num, head) < 0) {
+            if (process_request(&s->ioqueue, iov, out_num, in_num, head,
+                    mrs) < 0) {
                 vring_set_broken(&s->vring);
                 break;
             }
-- 
1.7.4.4

^ permalink raw reply related	[flat|nested] 35+ messages in thread

* [Qemu-devel] [PATCH 5/5] hostmem: init/finalize hostmem listener
  2013-04-01  8:20 [Qemu-devel] [PATCH 0/5] proposal to make hostmem listener RAM unplug safe Liu Ping Fan
                   ` (3 preceding siblings ...)
  2013-04-01  8:20 ` [Qemu-devel] [PATCH 4/5] virtio-blk: release reference to RAM's memoryRegion Liu Ping Fan
@ 2013-04-01  8:20 ` Liu Ping Fan
  2013-04-11 10:02   ` Paolo Bonzini
  2013-06-13  4:38   ` [Qemu-devel] about atexit() (was: [PATCH 5/5] hostmem: init/finalize hostmem listener) Amos Kong
  4 siblings, 2 replies; 35+ messages in thread
From: Liu Ping Fan @ 2013-04-01  8:20 UTC (permalink / raw)
  To: qemu-devel
  Cc: Peter Maydell, Anthony Liguori, Jan Kiszka, Marcelo Tosatti,
	Vasilis Liaskovitis, Stefan Hajnoczi, Paolo Bonzini

From: Liu Ping Fan <pingfank@linux.vnet.ibm.com>

Signed-off-by: Liu Ping Fan <pingfank@linux.vnet.ibm.com>
---
 vl.c |    2 ++
 1 files changed, 2 insertions(+), 0 deletions(-)

diff --git a/vl.c b/vl.c
index 7643f16..46a25cf 100644
--- a/vl.c
+++ b/vl.c
@@ -4157,6 +4157,7 @@ int main(int argc, char **argv, char **envp)
     }
 
     os_set_line_buffering();
+    hostmem_init();
 
     qemu_init_cpu_loop();
     qemu_mutex_lock_iothread();
@@ -4174,6 +4175,7 @@ int main(int argc, char **argv, char **envp)
 
     /* clean up network at qemu process termination */
     atexit(&net_cleanup);
+    atexit(&hostmem_finalize);
 
     if (net_init_clients() < 0) {
         exit(1);
-- 
1.7.4.4

^ permalink raw reply related	[flat|nested] 35+ messages in thread

* Re: [Qemu-devel] [PATCH 4/5] virtio-blk: release reference to RAM's memoryRegion
  2013-04-01  8:20 ` [Qemu-devel] [PATCH 4/5] virtio-blk: release reference to RAM's memoryRegion Liu Ping Fan
@ 2013-04-02  5:58   ` li guang
  2013-04-12  4:44     ` liu ping fan
  2013-04-11 10:20   ` Stefan Hajnoczi
  1 sibling, 1 reply; 35+ messages in thread
From: li guang @ 2013-04-02  5:58 UTC (permalink / raw)
  To: Liu Ping Fan
  Cc: Peter Maydell, Anthony Liguori, Jan Kiszka, Marcelo Tosatti,
	qemu-devel, Vasilis Liaskovitis, Stefan Hajnoczi, Paolo Bonzini

在 2013-04-01一的 16:20 +0800,Liu Ping Fan写道:
> From: Liu Ping Fan <pingfank@linux.vnet.ibm.com>
> 
> virtio-blk will reference to RAM's memoryRegion when the req has been
> done. 

  do you mean unreference after req completed?

> So we can avoid to call bdrv_drain_all() when RAM hot unplug.
> 
> Signed-off-by: Liu Ping Fan <pingfank@linux.vnet.ibm.com>
> ---
>  hw/dataplane/virtio-blk.c |   52 ++++++++++++++++++++++++++++++++++----------
>  1 files changed, 40 insertions(+), 12 deletions(-)
> 
> diff --git a/hw/dataplane/virtio-blk.c b/hw/dataplane/virtio-blk.c
> index 1242d61..437174d 100644
> --- a/hw/dataplane/virtio-blk.c
> +++ b/hw/dataplane/virtio-blk.c
> @@ -34,6 +34,8 @@ enum {
>  
>  typedef struct {
>      struct iocb iocb;               /* Linux AIO control block */
> +    MemoryRegion *mrs[VRING_MAX];
> +    int mrs_cnt;
>      QEMUIOVector *inhdr;            /* iovecs for virtio_blk_inhdr */
>      unsigned int head;              /* vring descriptor index */
>      struct iovec *bounce_iov;       /* used if guest buffers are unaligned */
> @@ -120,6 +122,9 @@ static void complete_request(struct iocb *iocb, ssize_t ret, void *opaque)
>       * transferred plus the status bytes.
>       */
>      vring_push(&s->vring, req->head, len + sizeof(hdr));
> +    while (--req->mrs_cnt >= 0) {
> +        memory_region_unref(req->mrs[req->mrs_cnt]);
> +    }
>  
>      s->num_reqs--;
>  }
> @@ -155,7 +160,8 @@ static void do_get_id_cmd(VirtIOBlockDataPlane *s,
>  static int do_rdwr_cmd(VirtIOBlockDataPlane *s, bool read,
>                         struct iovec *iov, unsigned int iov_cnt,
>                         long long offset, unsigned int head,
> -                       QEMUIOVector *inhdr)
> +                       QEMUIOVector *inhdr,
> +                       MemoryRegion **mrs, int cnt)
>  {
>      struct iocb *iocb;
>      QEMUIOVector qiov;
> @@ -187,6 +193,8 @@ static int do_rdwr_cmd(VirtIOBlockDataPlane *s, bool read,
>  
>      /* Fill in virtio block metadata needed for completion */
>      VirtIOBlockRequest *req = container_of(iocb, VirtIOBlockRequest, iocb);
> +    memcpy(req->mrs, mrs, cnt*sizeof(MemoryRegion *));
> +    req->mrs_cnt = cnt;
>      req->head = head;
>      req->inhdr = inhdr;
>      req->bounce_iov = bounce_iov;
> @@ -196,19 +204,22 @@ static int do_rdwr_cmd(VirtIOBlockDataPlane *s, bool read,
>  
>  static int process_request(IOQueue *ioq, struct iovec iov[],
>                             unsigned int out_num, unsigned int in_num,
> -                           unsigned int head)
> +                           unsigned int head, MemoryRegion **mrs)
>  {
>      VirtIOBlockDataPlane *s = container_of(ioq, VirtIOBlockDataPlane, ioqueue);
>      struct iovec *in_iov = &iov[out_num];
>      struct virtio_blk_outhdr outhdr;
>      QEMUIOVector *inhdr;
>      size_t in_size;
> +    unsigned int i, cnt = out_num+in_num;
> +    int ret;
>  
>      /* Copy in outhdr */
>      if (unlikely(iov_to_buf(iov, out_num, 0, &outhdr,
>                              sizeof(outhdr)) != sizeof(outhdr))) {
>          error_report("virtio-blk request outhdr too short");
> -        return -EFAULT;
> +        ret = -EFAULT;
> +        goto free_mrs;
>      }
>      iov_discard_front(&iov, &out_num, sizeof(outhdr));
>  
> @@ -216,7 +227,8 @@ static int process_request(IOQueue *ioq, struct iovec iov[],
>      in_size = iov_size(in_iov, in_num);
>      if (in_size < sizeof(struct virtio_blk_inhdr)) {
>          error_report("virtio_blk request inhdr too short");
> -        return -EFAULT;
> +        ret = -EFAULT;
> +        goto free_mrs;
>      }
>      inhdr = g_slice_new(QEMUIOVector);
>      qemu_iovec_init(inhdr, 1);
> @@ -229,18 +241,22 @@ static int process_request(IOQueue *ioq, struct iovec iov[],
>      outhdr.type &= ~VIRTIO_BLK_T_BARRIER;
>  
>      switch (outhdr.type) {
> +    /* For VIRTIO_BLK_T_IN/OUT, MemoryRegion will be release when cb */
>      case VIRTIO_BLK_T_IN:
> -        do_rdwr_cmd(s, true, in_iov, in_num, outhdr.sector * 512, head, inhdr);
> +        do_rdwr_cmd(s, true, in_iov, in_num, outhdr.sector * 512, head, inhdr,
> +            mrs, cnt);
>          return 0;
>  
>      case VIRTIO_BLK_T_OUT:
> -        do_rdwr_cmd(s, false, iov, out_num, outhdr.sector * 512, head, inhdr);
> +        do_rdwr_cmd(s, false, iov, out_num, outhdr.sector * 512, head, inhdr,
> +            mrs, cnt);
>          return 0;
>  
>      case VIRTIO_BLK_T_SCSI_CMD:
>          /* TODO support SCSI commands */
>          complete_request_early(s, head, inhdr, VIRTIO_BLK_S_UNSUPP);
> -        return 0;
> +        ret = 0;
> +        goto free_mrs;

IMHO, 's/goto free_mrs/break/g', so free_mrs label can be removed.

>  
>      case VIRTIO_BLK_T_FLUSH:
>          /* TODO fdsync not supported by Linux AIO, do it synchronously here! */
> @@ -249,18 +265,27 @@ static int process_request(IOQueue *ioq, struct iovec iov[],
>          } else {
>              complete_request_early(s, head, inhdr, VIRTIO_BLK_S_OK);
>          }
> -        return 0;
> +        ret = 0;
> +        goto free_mrs;
>  
>      case VIRTIO_BLK_T_GET_ID:
>          do_get_id_cmd(s, in_iov, in_num, head, inhdr);
> -        return 0;
> +        ret = 0;
> +        goto free_mrs;
>  
>      default:
>          error_report("virtio-blk unsupported request type %#x", outhdr.type);
>          qemu_iovec_destroy(inhdr);
>          g_slice_free(QEMUIOVector, inhdr);
> -        return -EFAULT;
> +        ret = -EFAULT;
> +        goto free_mrs;
> +    }
> +
> +free_mrs:
> +    for (i = 0; i < cnt; i++) {
> +        memory_region_unref(mrs[i]);
>      }
> +    return ret;
>  }
>  
>  static int flush_true(EventNotifier *e)
> @@ -286,6 +311,7 @@ static void handle_notify(EventNotifier *e)
>      struct iovec iovec[VRING_MAX];
>      struct iovec *end = &iovec[VRING_MAX];
>      struct iovec *iov = iovec;
> +    MemoryRegion *mrs[VRING_MAX];
>  
>      /* When a request is read from the vring, the index of the first descriptor
>       * (aka head) is returned so that the completed request can be pushed onto
> @@ -304,7 +330,8 @@ static void handle_notify(EventNotifier *e)
>          vring_disable_notification(s->vdev, &s->vring);
>  
>          for (;;) {
> -            head = vring_pop(s->vdev, &s->vring, iov, end, &out_num, &in_num);
> +            head = vring_pop(s->vdev, &s->vring, iov, end, &out_num, &in_num,
> +                            mrs);
>              if (head < 0) {
>                  break; /* no more requests */
>              }
> @@ -312,7 +339,8 @@ static void handle_notify(EventNotifier *e)
>              trace_virtio_blk_data_plane_process_request(s, out_num, in_num,
>                                                          head);
>  
> -            if (process_request(&s->ioqueue, iov, out_num, in_num, head) < 0) {
> +            if (process_request(&s->ioqueue, iov, out_num, in_num, head,
> +                    mrs) < 0) {
>                  vring_set_broken(&s->vring);
>                  break;
>              }

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [Qemu-devel] [PATCH 2/5] hostmem: make hostmem global and RAM hotunplg safe
  2013-04-01  8:20 ` [Qemu-devel] [PATCH 2/5] hostmem: make hostmem global and RAM hotunplg safe Liu Ping Fan
@ 2013-04-02  6:11   ` li guang
  2013-04-11 10:09   ` Paolo Bonzini
  2013-04-11 10:11   ` Stefan Hajnoczi
  2 siblings, 0 replies; 35+ messages in thread
From: li guang @ 2013-04-02  6:11 UTC (permalink / raw)
  To: Liu Ping Fan
  Cc: Peter Maydell, Anthony Liguori, Jan Kiszka, Marcelo Tosatti,
	qemu-devel, Vasilis Liaskovitis, Stefan Hajnoczi, Paolo Bonzini

在 2013-04-01一的 16:20 +0800,Liu Ping Fan写道:
> From: Liu Ping Fan <pingfank@linux.vnet.ibm.com>
> 
> The hostmem listener will translate gpa to hva quickly. Make it
> global, so other component can use it.
> Also this patch adopt MemoryRegionOps's ref/unref interface to
> make it survive the RAM hotunplug.
> 
> Signed-off-by: Liu Ping Fan <pingfank@linux.vnet.ibm.com>
> ---
>  hw/dataplane/hostmem.c |  130 +++++++++++++++++++++++++++++++++--------------
>  hw/dataplane/hostmem.h |   19 ++------
>  2 files changed, 95 insertions(+), 54 deletions(-)
> 
> diff --git a/hw/dataplane/hostmem.c b/hw/dataplane/hostmem.c
> index 380537e..86c02cd 100644
> --- a/hw/dataplane/hostmem.c
> +++ b/hw/dataplane/hostmem.c
> @@ -13,6 +13,12 @@
>  
>  #include "exec/address-spaces.h"
>  #include "hostmem.h"
> +#include "qemu/main-loop.h"
> +
> +/* lock to protect the access to cur_hostmem */
> +static QemuMutex hostmem_lock;
> +static HostMem *cur_hostmem;
> +static HostMem *next_hostmem;
>  
>  static int hostmem_lookup_cmp(const void *phys_, const void *region_)
>  {
> @@ -28,16 +34,49 @@ static int hostmem_lookup_cmp(const void *phys_, const void *region_)
>      }
>  }
>  
> +static void hostmem_ref(HostMem *hostmem)
> +{
> +    assert(__sync_add_and_fetch(&hostmem->ref, 1) > 0);
> +}
> +
> +void hostmem_unref(HostMem *hostmem)
> +{
> +    int i;
> +    HostMemRegion *hmr;
> +
> +    if (!__sync_sub_and_fetch(&hostmem->ref, 1)) {

maybe '__sync_sub_and_fetch(&hostmem->ref, 1) > 0' is more safer

> +        for (i = 0; i < hostmem->num_current_regions; i++) {
> +            hmr = &hostmem->current_regions[i];
> +            /* Fix me, when memory hotunplug implement
> +             * assert(hmr->mr_ops->unref)
> +             */
> +            if (hmr->mr->ops && hmr->mr->ops->unref) {
> +                hmr->mr->ops->unref();
> +            }
> +        }
> +        g_free(hostmem->current_regions);
> +        g_free(hostmem);
> +    }
> +}
> +
>  /**
>   * Map guest physical address to host pointer
> + * also inc refcnt of *mr, caller need to unref later
>   */
> -void *hostmem_lookup(HostMem *hostmem, hwaddr phys, hwaddr len, bool is_write)
> +void *hostmem_lookup(hwaddr phys, hwaddr len, MemoryRegion **mr, bool is_write)
>  {
>      HostMemRegion *region;
>      void *host_addr = NULL;
>      hwaddr offset_within_region;
> +    HostMem *hostmem;
> +
> +    assert(mr);
> +    *mr = NULL;
> +    qemu_mutex_lock(&hostmem_lock);
> +    hostmem = cur_hostmem;
> +    hostmem_ref(hostmem);
> +    qemu_mutex_unlock(&hostmem_lock);
>  
> -    qemu_mutex_lock(&hostmem->current_regions_lock);
>      region = bsearch(&phys, hostmem->current_regions,
>                       hostmem->num_current_regions,
>                       sizeof(hostmem->current_regions[0]),
> @@ -52,28 +91,33 @@ void *hostmem_lookup(HostMem *hostmem, hwaddr phys, hwaddr len, bool is_write)
>      if (len <= region->size - offset_within_region) {
>          host_addr = region->host_addr + offset_within_region;
>      }
> -out:
> -    qemu_mutex_unlock(&hostmem->current_regions_lock);
> +    *mr = region->mr;
> +    memory_region_ref(*mr);
>  
> +out:
> +    hostmem_unref(hostmem);
>      return host_addr;
>  }
>  
> +static void hostmem_listener_begin(MemoryListener *listener)
> +{
> +    next_hostmem = g_new0(HostMem, 1);
> +    next_hostmem->ref = 1;
> +}
> +
>  /**
>   * Install new regions list
>   */
>  static void hostmem_listener_commit(MemoryListener *listener)
>  {
> -    HostMem *hostmem = container_of(listener, HostMem, listener);
> +    HostMem *tmp;
>  
> -    qemu_mutex_lock(&hostmem->current_regions_lock);
> -    g_free(hostmem->current_regions);
> -    hostmem->current_regions = hostmem->new_regions;
> -    hostmem->num_current_regions = hostmem->num_new_regions;
> -    qemu_mutex_unlock(&hostmem->current_regions_lock);
> +    tmp = cur_hostmem;
> +    qemu_mutex_lock(&hostmem_lock);
> +    cur_hostmem = next_hostmem;
> +    qemu_mutex_unlock(&hostmem_lock);
> +    hostmem_unref(tmp);
>  
> -    /* Reset new regions list */
> -    hostmem->new_regions = NULL;
> -    hostmem->num_new_regions = 0;
>  }
>  
>  /**
> @@ -83,23 +127,24 @@ static void hostmem_append_new_region(HostMem *hostmem,
>                                        MemoryRegionSection *section)
>  {
>      void *ram_ptr = memory_region_get_ram_ptr(section->mr);
> -    size_t num = hostmem->num_new_regions;
> -    size_t new_size = (num + 1) * sizeof(hostmem->new_regions[0]);
> +    size_t num = hostmem->num_current_regions;
> +    size_t new_size = (num + 1) * sizeof(hostmem->current_regions[0]);
>  
> -    hostmem->new_regions = g_realloc(hostmem->new_regions, new_size);
> -    hostmem->new_regions[num] = (HostMemRegion){
> +    hostmem->current_regions = g_realloc(hostmem->current_regions, new_size);
> +    hostmem->current_regions[num] = (HostMemRegion){
>          .host_addr = ram_ptr + section->offset_within_region,
>          .guest_addr = section->offset_within_address_space,
> +        .mr = section->mr,
>          .size = section->size,
>          .readonly = section->readonly,
>      };
> -    hostmem->num_new_regions++;
> +    hostmem->num_current_regions++;
>  }
>  
> -static void hostmem_listener_append_region(MemoryListener *listener,
> +static void hostmem_listener_nop_region(MemoryListener *listener,
>                                             MemoryRegionSection *section)
>  {
> -    HostMem *hostmem = container_of(listener, HostMem, listener);
> +    HostMem *hostmem = next_hostmem;
>  
>      /* Ignore non-RAM regions, we may not be able to map them */
>      if (!memory_region_is_ram(section->mr)) {
> @@ -114,6 +159,18 @@ static void hostmem_listener_append_region(MemoryListener *listener,
>      hostmem_append_new_region(hostmem, section);
>  }
>  
> +static void hostmem_listener_append_region(MemoryListener *listener,
> +                                           MemoryRegionSection *section)
> +{
> +    hostmem_listener_nop_region(listener, section);
> +    /* Fix me, when memory hotunplug implement
> +     * assert(section->mr->ops->ref)
> +     */
> +    if (section->mr->ops && section->mr->ops->ref) {
> +        section->mr->ops->ref();
> +    }
> +}
> +
>  /* We don't implement most MemoryListener callbacks, use these nop stubs */
>  static void hostmem_listener_dummy(MemoryListener *listener)
>  {
> @@ -137,18 +194,12 @@ static void hostmem_listener_coalesced_mmio_dummy(MemoryListener *listener,
>  {
>  }
>  
> -void hostmem_init(HostMem *hostmem)
> -{
> -    memset(hostmem, 0, sizeof(*hostmem));
> -
> -    qemu_mutex_init(&hostmem->current_regions_lock);
> -
> -    hostmem->listener = (MemoryListener){
> -        .begin = hostmem_listener_dummy,
> +static MemoryListener hostmem_listener = {
> +        .begin = hostmem_listener_begin,
>          .commit = hostmem_listener_commit,
>          .region_add = hostmem_listener_append_region,
>          .region_del = hostmem_listener_section_dummy,
> -        .region_nop = hostmem_listener_append_region,
> +        .region_nop = hostmem_listener_nop_region,
>          .log_start = hostmem_listener_section_dummy,
>          .log_stop = hostmem_listener_section_dummy,
>          .log_sync = hostmem_listener_section_dummy,
> @@ -159,18 +210,19 @@ void hostmem_init(HostMem *hostmem)
>          .coalesced_mmio_add = hostmem_listener_coalesced_mmio_dummy,
>          .coalesced_mmio_del = hostmem_listener_coalesced_mmio_dummy,
>          .priority = 10,
> -    };
> +};
>  
> -    memory_listener_register(&hostmem->listener, &address_space_memory);
> -    if (hostmem->num_new_regions > 0) {
> -        hostmem_listener_commit(&hostmem->listener);
> -    }
> +void hostmem_init(void)
> +{
> +    qemu_mutex_init(&hostmem_lock);
> +    cur_hostmem = g_new0(HostMem, 1);
> +    cur_hostmem->ref = 1;
> +    memory_listener_register(&hostmem_listener, &address_space_memory);
>  }
>  
> -void hostmem_finalize(HostMem *hostmem)
> +void hostmem_finalize(void)
>  {
> -    memory_listener_unregister(&hostmem->listener);
> -    g_free(hostmem->new_regions);
> -    g_free(hostmem->current_regions);
> -    qemu_mutex_destroy(&hostmem->current_regions_lock);
> +    memory_listener_unregister(&hostmem_listener);
> +    hostmem_unref(cur_hostmem);
> +    qemu_mutex_destroy(&hostmem_lock);
>  }
> diff --git a/hw/dataplane/hostmem.h b/hw/dataplane/hostmem.h
> index b2cf093..883ba74 100644
> --- a/hw/dataplane/hostmem.h
> +++ b/hw/dataplane/hostmem.h
> @@ -20,29 +20,18 @@
>  typedef struct {
>      void *host_addr;
>      hwaddr guest_addr;
> +    MemoryRegion *mr;
>      uint64_t size;
>      bool readonly;
>  } HostMemRegion;
>  
>  typedef struct {
> -    /* The listener is invoked when regions change and a new list of regions is
> -     * built up completely before they are installed.
> -     */
> -    MemoryListener listener;
> -    HostMemRegion *new_regions;
> -    size_t num_new_regions;
> -
> -    /* Current regions are accessed from multiple threads either to lookup
> -     * addresses or to install a new list of regions.  The lock protects the
> -     * pointer and the regions.
> -     */
> -    QemuMutex current_regions_lock;
> +    int ref;
>      HostMemRegion *current_regions;
>      size_t num_current_regions;
>  } HostMem;
>  
> -void hostmem_init(HostMem *hostmem);
> -void hostmem_finalize(HostMem *hostmem);
> +void hostmem_unref(HostMem *hostmem);
>  
>  /**
>   * Map a guest physical address to a pointer
> @@ -52,6 +41,6 @@ void hostmem_finalize(HostMem *hostmem);
>   * can be done with other mechanisms like bdrv_drain_all() that quiesce
>   * in-flight I/O.
>   */
> -void *hostmem_lookup(HostMem *hostmem, hwaddr phys, hwaddr len, bool is_write);
> +void *hostmem_lookup(hwaddr phys, hwaddr len, MemoryRegion **mr, bool is_write);
>  
>  #endif /* HOSTMEM_H */

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [Qemu-devel] [PATCH 1/5] memory: add ref/unref interface for MemroyRegionOps
  2013-04-01  8:20 ` [Qemu-devel] [PATCH 1/5] memory: add ref/unref interface for MemroyRegionOps Liu Ping Fan
@ 2013-04-11  9:49   ` Stefan Hajnoczi
  2013-04-12  4:12     ` liu ping fan
  0 siblings, 1 reply; 35+ messages in thread
From: Stefan Hajnoczi @ 2013-04-11  9:49 UTC (permalink / raw)
  To: Liu Ping Fan
  Cc: Peter Maydell, Anthony Liguori, Jan Kiszka, Marcelo Tosatti,
	qemu-devel, Vasilis Liaskovitis, Paolo Bonzini

On Mon, Apr 01, 2013 at 04:20:30PM +0800, Liu Ping Fan wrote:
> From: Liu Ping Fan <pingfank@linux.vnet.ibm.com>
> 
> This pair of interface are optinal, except for those device which is
> used outside the biglock's protection for hot unplug.

Not sure if this comment is true.  Memory unplug safety is not about the
big lock, it's about whether a reference to memory is held *across* a
hot unplug operation.

So even code that is under the big lock can use a guest RAM buffer
across the event loop, and therefore be exposed to a RAM unplug!

Therefore inc/dec must be used if guest RAM is held across event loop
handler calls.  If the guest RAM access happens completely inside a
handler function, then it is not affected by hot plug and doesn't need
to do inc/dec.

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [Qemu-devel] [PATCH 5/5] hostmem: init/finalize hostmem listener
  2013-04-01  8:20 ` [Qemu-devel] [PATCH 5/5] hostmem: init/finalize hostmem listener Liu Ping Fan
@ 2013-04-11 10:02   ` Paolo Bonzini
  2013-04-11 12:08     ` liu ping fan
  2013-06-13  4:38   ` [Qemu-devel] about atexit() (was: [PATCH 5/5] hostmem: init/finalize hostmem listener) Amos Kong
  1 sibling, 1 reply; 35+ messages in thread
From: Paolo Bonzini @ 2013-04-11 10:02 UTC (permalink / raw)
  To: Liu Ping Fan
  Cc: Peter Maydell, Anthony Liguori, Jan Kiszka, Marcelo Tosatti,
	qemu-devel, Vasilis Liaskovitis, Stefan Hajnoczi

Il 01/04/2013 10:20, Liu Ping Fan ha scritto:
> From: Liu Ping Fan <pingfank@linux.vnet.ibm.com>
> 
> Signed-off-by: Liu Ping Fan <pingfank@linux.vnet.ibm.com>
> ---
>  vl.c |    2 ++
>  1 files changed, 2 insertions(+), 0 deletions(-)
> 
> diff --git a/vl.c b/vl.c
> index 7643f16..46a25cf 100644
> --- a/vl.c
> +++ b/vl.c
> @@ -4157,6 +4157,7 @@ int main(int argc, char **argv, char **envp)
>      }
>  
>      os_set_line_buffering();
> +    hostmem_init();
>  
>      qemu_init_cpu_loop();
>      qemu_mutex_lock_iothread();
> @@ -4174,6 +4175,7 @@ int main(int argc, char **argv, char **envp)
>  
>      /* clean up network at qemu process termination */
>      atexit(&net_cleanup);
> +    atexit(&hostmem_finalize);

This should be in hostmem_init.

Paolo

>  
>      if (net_init_clients() < 0) {
>          exit(1);
> 

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [Qemu-devel] [PATCH 2/5] hostmem: make hostmem global and RAM hotunplg safe
  2013-04-01  8:20 ` [Qemu-devel] [PATCH 2/5] hostmem: make hostmem global and RAM hotunplg safe Liu Ping Fan
  2013-04-02  6:11   ` li guang
@ 2013-04-11 10:09   ` Paolo Bonzini
  2013-04-12  6:46     ` liu ping fan
  2013-04-11 10:11   ` Stefan Hajnoczi
  2 siblings, 1 reply; 35+ messages in thread
From: Paolo Bonzini @ 2013-04-11 10:09 UTC (permalink / raw)
  To: Liu Ping Fan
  Cc: Peter Maydell, Anthony Liguori, Jan Kiszka, Marcelo Tosatti,
	qemu-devel, Vasilis Liaskovitis, Stefan Hajnoczi

Il 01/04/2013 10:20, Liu Ping Fan ha scritto:
> From: Liu Ping Fan <pingfank@linux.vnet.ibm.com>
> 
> The hostmem listener will translate gpa to hva quickly. Make it
> global, so other component can use it.
> Also this patch adopt MemoryRegionOps's ref/unref interface to
> make it survive the RAM hotunplug.
> 
> Signed-off-by: Liu Ping Fan <pingfank@linux.vnet.ibm.com>
> ---
>  hw/dataplane/hostmem.c |  130 +++++++++++++++++++++++++++++++++--------------
>  hw/dataplane/hostmem.h |   19 ++------
>  2 files changed, 95 insertions(+), 54 deletions(-)
> 
> diff --git a/hw/dataplane/hostmem.c b/hw/dataplane/hostmem.c
> index 380537e..86c02cd 100644
> --- a/hw/dataplane/hostmem.c
> +++ b/hw/dataplane/hostmem.c
> @@ -13,6 +13,12 @@
>  
>  #include "exec/address-spaces.h"
>  #include "hostmem.h"
> +#include "qemu/main-loop.h"
> +
> +/* lock to protect the access to cur_hostmem */
> +static QemuMutex hostmem_lock;
> +static HostMem *cur_hostmem;
> +static HostMem *next_hostmem;
>  
>  static int hostmem_lookup_cmp(const void *phys_, const void *region_)
>  {
> @@ -28,16 +34,49 @@ static int hostmem_lookup_cmp(const void *phys_, const void *region_)
>      }
>  }
>  
> +static void hostmem_ref(HostMem *hostmem)
> +{
> +    assert(__sync_add_and_fetch(&hostmem->ref, 1) > 0);
> +}
> +
> +void hostmem_unref(HostMem *hostmem)
> +{
> +    int i;
> +    HostMemRegion *hmr;
> +
> +    if (!__sync_sub_and_fetch(&hostmem->ref, 1)) {
> +        for (i = 0; i < hostmem->num_current_regions; i++) {
> +            hmr = &hostmem->current_regions[i];
> +            /* Fix me, when memory hotunplug implement
> +             * assert(hmr->mr_ops->unref)
> +             */
> +            if (hmr->mr->ops && hmr->mr->ops->unref) {
> +                hmr->mr->ops->unref();
> +            }
> +        }
> +        g_free(hostmem->current_regions);
> +        g_free(hostmem);
> +    }
> +}
> +
>  /**
>   * Map guest physical address to host pointer
> + * also inc refcnt of *mr, caller need to unref later
>   */
> -void *hostmem_lookup(HostMem *hostmem, hwaddr phys, hwaddr len, bool is_write)
> +void *hostmem_lookup(hwaddr phys, hwaddr len, MemoryRegion **mr, bool is_write)
>  {
>      HostMemRegion *region;
>      void *host_addr = NULL;
>      hwaddr offset_within_region;
> +    HostMem *hostmem;
> +
> +    assert(mr);
> +    *mr = NULL;

I think it's simpler if you allow a NULL MemoryRegion.

Also, it is better if you keep the HostMem type, because in principle
different HostMems could be used for different AddressSpaces.
Basically, what is now HostMem would become HostMemRegions, and the new
HostMem type would have these fields:

   QemuMutex         hostmem_lock;
   HostMemRegions   *cur_regions;
   HostMemRegions   *next_regions;

matching your three globals.  I like the rcu-ready design.

Finally, please split this patch and patch 3 differently:

1) add HostMemRegions

2) add ref/unref of memory regions to hostmem

3) add MemoryRegion argument to hostmem_lookup, leave it NULL in vring

4) add ref/unref of memory regions to vring

Paolo

> +    qemu_mutex_lock(&hostmem_lock);
> +    hostmem = cur_hostmem;
> +    hostmem_ref(hostmem);
> +    qemu_mutex_unlock(&hostmem_lock);
>  
> -    qemu_mutex_lock(&hostmem->current_regions_lock);
>      region = bsearch(&phys, hostmem->current_regions,
>                       hostmem->num_current_regions,
>                       sizeof(hostmem->current_regions[0]),
> @@ -52,28 +91,33 @@ void *hostmem_lookup(HostMem *hostmem, hwaddr phys, hwaddr len, bool is_write)
>      if (len <= region->size - offset_within_region) {
>          host_addr = region->host_addr + offset_within_region;
>      }
> -out:
> -    qemu_mutex_unlock(&hostmem->current_regions_lock);
> +    *mr = region->mr;
> +    memory_region_ref(*mr);
>  
> +out:
> +    hostmem_unref(hostmem);
>      return host_addr;
>  }
>  
> +static void hostmem_listener_begin(MemoryListener *listener)
> +{
> +    next_hostmem = g_new0(HostMem, 1);
> +    next_hostmem->ref = 1;
> +}
> +
>  /**
>   * Install new regions list
>   */
>  static void hostmem_listener_commit(MemoryListener *listener)
>  {
> -    HostMem *hostmem = container_of(listener, HostMem, listener);
> +    HostMem *tmp;
>  
> -    qemu_mutex_lock(&hostmem->current_regions_lock);
> -    g_free(hostmem->current_regions);
> -    hostmem->current_regions = hostmem->new_regions;
> -    hostmem->num_current_regions = hostmem->num_new_regions;
> -    qemu_mutex_unlock(&hostmem->current_regions_lock);
> +    tmp = cur_hostmem;
> +    qemu_mutex_lock(&hostmem_lock);
> +    cur_hostmem = next_hostmem;
> +    qemu_mutex_unlock(&hostmem_lock);
> +    hostmem_unref(tmp);
>  
> -    /* Reset new regions list */
> -    hostmem->new_regions = NULL;
> -    hostmem->num_new_regions = 0;
>  }
>  
>  /**
> @@ -83,23 +127,24 @@ static void hostmem_append_new_region(HostMem *hostmem,
>                                        MemoryRegionSection *section)
>  {
>      void *ram_ptr = memory_region_get_ram_ptr(section->mr);
> -    size_t num = hostmem->num_new_regions;
> -    size_t new_size = (num + 1) * sizeof(hostmem->new_regions[0]);
> +    size_t num = hostmem->num_current_regions;
> +    size_t new_size = (num + 1) * sizeof(hostmem->current_regions[0]);
>  
> -    hostmem->new_regions = g_realloc(hostmem->new_regions, new_size);
> -    hostmem->new_regions[num] = (HostMemRegion){
> +    hostmem->current_regions = g_realloc(hostmem->current_regions, new_size);
> +    hostmem->current_regions[num] = (HostMemRegion){
>          .host_addr = ram_ptr + section->offset_within_region,
>          .guest_addr = section->offset_within_address_space,
> +        .mr = section->mr,
>          .size = section->size,
>          .readonly = section->readonly,
>      };
> -    hostmem->num_new_regions++;
> +    hostmem->num_current_regions++;
>  }
>  
> -static void hostmem_listener_append_region(MemoryListener *listener,
> +static void hostmem_listener_nop_region(MemoryListener *listener,
>                                             MemoryRegionSection *section)
>  {
> -    HostMem *hostmem = container_of(listener, HostMem, listener);
> +    HostMem *hostmem = next_hostmem;
>  
>      /* Ignore non-RAM regions, we may not be able to map them */
>      if (!memory_region_is_ram(section->mr)) {
> @@ -114,6 +159,18 @@ static void hostmem_listener_append_region(MemoryListener *listener,
>      hostmem_append_new_region(hostmem, section);
>  }
>  
> +static void hostmem_listener_append_region(MemoryListener *listener,
> +                                           MemoryRegionSection *section)
> +{
> +    hostmem_listener_nop_region(listener, section);
> +    /* Fix me, when memory hotunplug implement
> +     * assert(section->mr->ops->ref)
> +     */
> +    if (section->mr->ops && section->mr->ops->ref) {
> +        section->mr->ops->ref();
> +    }
> +}
> +
>  /* We don't implement most MemoryListener callbacks, use these nop stubs */
>  static void hostmem_listener_dummy(MemoryListener *listener)
>  {
> @@ -137,18 +194,12 @@ static void hostmem_listener_coalesced_mmio_dummy(MemoryListener *listener,
>  {
>  }
>  
> -void hostmem_init(HostMem *hostmem)
> -{
> -    memset(hostmem, 0, sizeof(*hostmem));
> -
> -    qemu_mutex_init(&hostmem->current_regions_lock);
> -
> -    hostmem->listener = (MemoryListener){
> -        .begin = hostmem_listener_dummy,
> +static MemoryListener hostmem_listener = {
> +        .begin = hostmem_listener_begin,
>          .commit = hostmem_listener_commit,
>          .region_add = hostmem_listener_append_region,
>          .region_del = hostmem_listener_section_dummy,
> -        .region_nop = hostmem_listener_append_region,
> +        .region_nop = hostmem_listener_nop_region,
>          .log_start = hostmem_listener_section_dummy,
>          .log_stop = hostmem_listener_section_dummy,
>          .log_sync = hostmem_listener_section_dummy,
> @@ -159,18 +210,19 @@ void hostmem_init(HostMem *hostmem)
>          .coalesced_mmio_add = hostmem_listener_coalesced_mmio_dummy,
>          .coalesced_mmio_del = hostmem_listener_coalesced_mmio_dummy,
>          .priority = 10,
> -    };
> +};
>  
> -    memory_listener_register(&hostmem->listener, &address_space_memory);
> -    if (hostmem->num_new_regions > 0) {
> -        hostmem_listener_commit(&hostmem->listener);
> -    }
> +void hostmem_init(void)
> +{
> +    qemu_mutex_init(&hostmem_lock);
> +    cur_hostmem = g_new0(HostMem, 1);
> +    cur_hostmem->ref = 1;
> +    memory_listener_register(&hostmem_listener, &address_space_memory);
>  }
>  
> -void hostmem_finalize(HostMem *hostmem)
> +void hostmem_finalize(void)
>  {
> -    memory_listener_unregister(&hostmem->listener);
> -    g_free(hostmem->new_regions);
> -    g_free(hostmem->current_regions);
> -    qemu_mutex_destroy(&hostmem->current_regions_lock);
> +    memory_listener_unregister(&hostmem_listener);
> +    hostmem_unref(cur_hostmem);
> +    qemu_mutex_destroy(&hostmem_lock);
>  }
> diff --git a/hw/dataplane/hostmem.h b/hw/dataplane/hostmem.h
> index b2cf093..883ba74 100644
> --- a/hw/dataplane/hostmem.h
> +++ b/hw/dataplane/hostmem.h
> @@ -20,29 +20,18 @@
>  typedef struct {
>      void *host_addr;
>      hwaddr guest_addr;
> +    MemoryRegion *mr;
>      uint64_t size;
>      bool readonly;
>  } HostMemRegion;
>  
>  typedef struct {
> -    /* The listener is invoked when regions change and a new list of regions is
> -     * built up completely before they are installed.
> -     */
> -    MemoryListener listener;
> -    HostMemRegion *new_regions;
> -    size_t num_new_regions;
> -
> -    /* Current regions are accessed from multiple threads either to lookup
> -     * addresses or to install a new list of regions.  The lock protects the
> -     * pointer and the regions.
> -     */
> -    QemuMutex current_regions_lock;
> +    int ref;
>      HostMemRegion *current_regions;
>      size_t num_current_regions;
>  } HostMem;
>  
> -void hostmem_init(HostMem *hostmem);
> -void hostmem_finalize(HostMem *hostmem);
> +void hostmem_unref(HostMem *hostmem);
>  
>  /**
>   * Map a guest physical address to a pointer
> @@ -52,6 +41,6 @@ void hostmem_finalize(HostMem *hostmem);
>   * can be done with other mechanisms like bdrv_drain_all() that quiesce
>   * in-flight I/O.
>   */
> -void *hostmem_lookup(HostMem *hostmem, hwaddr phys, hwaddr len, bool is_write);
> +void *hostmem_lookup(hwaddr phys, hwaddr len, MemoryRegion **mr, bool is_write);
>  
>  #endif /* HOSTMEM_H */
> 

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [Qemu-devel] [PATCH 2/5] hostmem: make hostmem global and RAM hotunplg safe
  2013-04-01  8:20 ` [Qemu-devel] [PATCH 2/5] hostmem: make hostmem global and RAM hotunplg safe Liu Ping Fan
  2013-04-02  6:11   ` li guang
  2013-04-11 10:09   ` Paolo Bonzini
@ 2013-04-11 10:11   ` Stefan Hajnoczi
  2013-04-11 10:26     ` Paolo Bonzini
  2013-04-12  3:55     ` liu ping fan
  2 siblings, 2 replies; 35+ messages in thread
From: Stefan Hajnoczi @ 2013-04-11 10:11 UTC (permalink / raw)
  To: Liu Ping Fan
  Cc: Peter Maydell, Anthony Liguori, Jan Kiszka, Marcelo Tosatti,
	qemu-devel, Vasilis Liaskovitis, Paolo Bonzini

On Mon, Apr 01, 2013 at 04:20:31PM +0800, Liu Ping Fan wrote:
> From: Liu Ping Fan <pingfank@linux.vnet.ibm.com>
> 
> The hostmem listener will translate gpa to hva quickly. Make it
> global, so other component can use it.
> Also this patch adopt MemoryRegionOps's ref/unref interface to
> make it survive the RAM hotunplug.
> 
> Signed-off-by: Liu Ping Fan <pingfank@linux.vnet.ibm.com>
> ---
>  hw/dataplane/hostmem.c |  130 +++++++++++++++++++++++++++++++++--------------
>  hw/dataplane/hostmem.h |   19 ++------
>  2 files changed, 95 insertions(+), 54 deletions(-)
> 
> diff --git a/hw/dataplane/hostmem.c b/hw/dataplane/hostmem.c
> index 380537e..86c02cd 100644
> --- a/hw/dataplane/hostmem.c
> +++ b/hw/dataplane/hostmem.c
> @@ -13,6 +13,12 @@
>  
>  #include "exec/address-spaces.h"
>  #include "hostmem.h"
> +#include "qemu/main-loop.h"
> +
> +/* lock to protect the access to cur_hostmem */
> +static QemuMutex hostmem_lock;
> +static HostMem *cur_hostmem;
> +static HostMem *next_hostmem;
>  
>  static int hostmem_lookup_cmp(const void *phys_, const void *region_)
>  {
> @@ -28,16 +34,49 @@ static int hostmem_lookup_cmp(const void *phys_, const void *region_)
>      }
>  }
>  
> +static void hostmem_ref(HostMem *hostmem)
> +{
> +    assert(__sync_add_and_fetch(&hostmem->ref, 1) > 0);

man 3 assert:
"If  the macro NDEBUG was defined at the moment <assert.h> was last
included, the macro assert() generates no code, and hence does nothing
at all."

Never rely on a side-effect within an assert() call.  Store the return
value into a local variable and test the local in the assert().

Also, these sync gcc builtins are not available on all platforms or gcc
versions.  We need to be a little careful to avoid breaking builds here.
Maybe __sync_add_and_fetch() is fine but I wanted to mention it because
I've had trouble with these in the past.

I suggest using glib atomics instead, if possible.

> +}
> +
> +void hostmem_unref(HostMem *hostmem)
> +{
> +    int i;
> +    HostMemRegion *hmr;
> +
> +    if (!__sync_sub_and_fetch(&hostmem->ref, 1)) {
> +        for (i = 0; i < hostmem->num_current_regions; i++) {
> +            hmr = &hostmem->current_regions[i];
> +            /* Fix me, when memory hotunplug implement
> +             * assert(hmr->mr_ops->unref)
> +             */

Why this fixme?  Can you resolve it by making ->unref() return bool from
the start of this patch series?

> +            if (hmr->mr->ops && hmr->mr->ops->unref) {
> +                hmr->mr->ops->unref();
> +            }
> +        }
> +        g_free(hostmem->current_regions);
> +        g_free(hostmem);
> +    }
> +}
> +
>  /**
>   * Map guest physical address to host pointer
> + * also inc refcnt of *mr, caller need to unref later
>   */
> -void *hostmem_lookup(HostMem *hostmem, hwaddr phys, hwaddr len, bool is_write)
> +void *hostmem_lookup(hwaddr phys, hwaddr len, MemoryRegion **mr, bool is_write)

A cleaner approach is to keep the hostmem_foo(HostMem *) functions and
have a global interface without the HostMem*.  That way the global
wrapper focusses on acquiring cur_hostmem while the existing functions
stay unchanged and focus on performing the actual operation.

>  {
>      HostMemRegion *region;
>      void *host_addr = NULL;
>      hwaddr offset_within_region;
> +    HostMem *hostmem;
> +
> +    assert(mr);
> +    *mr = NULL;
> +    qemu_mutex_lock(&hostmem_lock);
> +    hostmem = cur_hostmem;
> +    hostmem_ref(hostmem);
> +    qemu_mutex_unlock(&hostmem_lock);
>  
> -    qemu_mutex_lock(&hostmem->current_regions_lock);

Why is it safe to drop this lock?  The memory API could invoke callbacks
in another thread which causes us to update regions.

>      region = bsearch(&phys, hostmem->current_regions,
>                       hostmem->num_current_regions,
>                       sizeof(hostmem->current_regions[0]),
> @@ -52,28 +91,33 @@ void *hostmem_lookup(HostMem *hostmem, hwaddr phys, hwaddr len, bool is_write)
>      if (len <= region->size - offset_within_region) {
>          host_addr = region->host_addr + offset_within_region;
>      }
> -out:
> -    qemu_mutex_unlock(&hostmem->current_regions_lock);
> +    *mr = region->mr;
> +    memory_region_ref(*mr);

How does this protect against the QEMU thread hot unplugging while we
are searching hostmem->current_regions[]?  Our mr would be stale and the
ref operation would corrupt memory if mr has already been freed!

>  
> +out:
> +    hostmem_unref(hostmem);
>      return host_addr;
>  }
>  
> +static void hostmem_listener_begin(MemoryListener *listener)
> +{
> +    next_hostmem = g_new0(HostMem, 1);
> +    next_hostmem->ref = 1;
> +}
> +
>  /**
>   * Install new regions list
>   */
>  static void hostmem_listener_commit(MemoryListener *listener)
>  {
> -    HostMem *hostmem = container_of(listener, HostMem, listener);
> +    HostMem *tmp;
>  
> -    qemu_mutex_lock(&hostmem->current_regions_lock);
> -    g_free(hostmem->current_regions);
> -    hostmem->current_regions = hostmem->new_regions;
> -    hostmem->num_current_regions = hostmem->num_new_regions;
> -    qemu_mutex_unlock(&hostmem->current_regions_lock);
> +    tmp = cur_hostmem;
> +    qemu_mutex_lock(&hostmem_lock);

In hostmem_lookup() you accessed cur_hostmem inside the lock.  Does the
lock protect cur_hostmem or not?

> +    cur_hostmem = next_hostmem;
> +    qemu_mutex_unlock(&hostmem_lock);
> +    hostmem_unref(tmp);
>  
> -    /* Reset new regions list */
> -    hostmem->new_regions = NULL;
> -    hostmem->num_new_regions = 0;
>  }
>  
>  /**

I gave up here.  The approach you are trying to take isn't clear in this
patch.  I've pointed out some inconsistencies but they make it hard to
review more since I don't understand what you're trying to do.

Please split patches into logical steps and especially document
locks/refcounts to explain their scope - what do they protect?

Stefan

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [Qemu-devel] [PATCH 3/5] vring: use hostmem's RAM safe api
  2013-04-01  8:20 ` [Qemu-devel] [PATCH 3/5] vring: use hostmem's RAM safe api Liu Ping Fan
@ 2013-04-11 10:15   ` Stefan Hajnoczi
  2013-04-12  4:49     ` liu ping fan
  0 siblings, 1 reply; 35+ messages in thread
From: Stefan Hajnoczi @ 2013-04-11 10:15 UTC (permalink / raw)
  To: Liu Ping Fan
  Cc: Peter Maydell, Anthony Liguori, Jan Kiszka, Marcelo Tosatti,
	qemu-devel, Vasilis Liaskovitis, Paolo Bonzini

On Mon, Apr 01, 2013 at 04:20:32PM +0800, Liu Ping Fan wrote:
> @@ -51,7 +50,7 @@ bool vring_setup(Vring *vring, VirtIODevice *vdev, int n)
>  
>  void vring_teardown(Vring *vring)
>  {
> -    hostmem_finalize(&vring->hostmem);
> +    memory_region_unref(vring->vring_mr);
>  }

dataplane keeps a reference to the vring.  This prevents memory hot
unplug while the device is up.  If this is a problem we'll have to
reduce the lifespan of the vring mapping.

Stefan

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [Qemu-devel] [PATCH 4/5] virtio-blk: release reference to RAM's memoryRegion
  2013-04-01  8:20 ` [Qemu-devel] [PATCH 4/5] virtio-blk: release reference to RAM's memoryRegion Liu Ping Fan
  2013-04-02  5:58   ` li guang
@ 2013-04-11 10:20   ` Stefan Hajnoczi
  2013-04-12  4:48     ` liu ping fan
  1 sibling, 1 reply; 35+ messages in thread
From: Stefan Hajnoczi @ 2013-04-11 10:20 UTC (permalink / raw)
  To: Liu Ping Fan
  Cc: Peter Maydell, Anthony Liguori, Jan Kiszka, Marcelo Tosatti,
	qemu-devel, Vasilis Liaskovitis, Paolo Bonzini

On Mon, Apr 01, 2013 at 04:20:33PM +0800, Liu Ping Fan wrote:
> From: Liu Ping Fan <pingfank@linux.vnet.ibm.com>
> 
> virtio-blk will reference to RAM's memoryRegion when the req has been
> done.  So we can avoid to call bdrv_drain_all() when RAM hot unplug.

How does the hot unplug operation work without bdrv_drain_all()?  In
other words, how do we safely remove a MemoryRegion and wait for it to
become unreferenced?

Stefan

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [Qemu-devel] [PATCH 2/5] hostmem: make hostmem global and RAM hotunplg safe
  2013-04-11 10:11   ` Stefan Hajnoczi
@ 2013-04-11 10:26     ` Paolo Bonzini
  2013-04-12  3:55     ` liu ping fan
  1 sibling, 0 replies; 35+ messages in thread
From: Paolo Bonzini @ 2013-04-11 10:26 UTC (permalink / raw)
  To: Stefan Hajnoczi
  Cc: Peter Maydell, Anthony Liguori, Jan Kiszka, Marcelo Tosatti,
	qemu-devel, Liu Ping Fan, Vasilis Liaskovitis

Il 11/04/2013 12:11, Stefan Hajnoczi ha scritto:
> Also, these sync gcc builtins are not available on all platforms or gcc
> versions.  We need to be a little careful to avoid breaking builds here.
> Maybe __sync_add_and_fetch() is fine but I wanted to mention it because
> I've had trouble with these in the past.

We are already using GCC atomics elsewhere (spice, migration, vhost).
They are really ugly to type and not complete, so we should have our own
wrapper include, but they are fine.

Paolo

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [Qemu-devel] [PATCH 5/5] hostmem: init/finalize hostmem listener
  2013-04-11 10:02   ` Paolo Bonzini
@ 2013-04-11 12:08     ` liu ping fan
  2013-04-11 13:14       ` Paolo Bonzini
  0 siblings, 1 reply; 35+ messages in thread
From: liu ping fan @ 2013-04-11 12:08 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: Peter Maydell, Anthony Liguori, Jan Kiszka, Marcelo Tosatti,
	qemu-devel, Vasilis Liaskovitis, Stefan Hajnoczi

On Thu, Apr 11, 2013 at 6:02 PM, Paolo Bonzini <pbonzini@redhat.com> wrote:
> Il 01/04/2013 10:20, Liu Ping Fan ha scritto:
>> From: Liu Ping Fan <pingfank@linux.vnet.ibm.com>
>>
>> Signed-off-by: Liu Ping Fan <pingfank@linux.vnet.ibm.com>
>> ---
>>  vl.c |    2 ++
>>  1 files changed, 2 insertions(+), 0 deletions(-)
>>
>> diff --git a/vl.c b/vl.c
>> index 7643f16..46a25cf 100644
>> --- a/vl.c
>> +++ b/vl.c
>> @@ -4157,6 +4157,7 @@ int main(int argc, char **argv, char **envp)
>>      }
>>
>>      os_set_line_buffering();
>> +    hostmem_init();
>>
>>      qemu_init_cpu_loop();
>>      qemu_mutex_lock_iothread();
>> @@ -4174,6 +4175,7 @@ int main(int argc, char **argv, char **envp)
>>
>>      /* clean up network at qemu process termination */
>>      atexit(&net_cleanup);
>> +    atexit(&hostmem_finalize);
>
> This should be in hostmem_init.
>
Ok, thanks. And extra, for the hostmem_init, what about something like
#define block_init(function) module_init(function, MODULE_INIT_BLOCK)

Regards,
Pingfan
> Paolo
>
>>
>>      if (net_init_clients() < 0) {
>>          exit(1);
>>
>

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [Qemu-devel] [PATCH 5/5] hostmem: init/finalize hostmem listener
  2013-04-11 12:08     ` liu ping fan
@ 2013-04-11 13:14       ` Paolo Bonzini
  0 siblings, 0 replies; 35+ messages in thread
From: Paolo Bonzini @ 2013-04-11 13:14 UTC (permalink / raw)
  To: liu ping fan
  Cc: Peter Maydell, Anthony Liguori, Jan Kiszka, Marcelo Tosatti,
	qemu-devel, Vasilis Liaskovitis, Stefan Hajnoczi

Il 11/04/2013 14:08, liu ping fan ha scritto:
>>> >>      /* clean up network at qemu process termination */
>>> >>      atexit(&net_cleanup);
>>> >> +    atexit(&hostmem_finalize);
>> >
>> > This should be in hostmem_init.
>> >
> Ok, thanks. And extra, for the hostmem_init, what about something like
> #define block_init(function) module_init(function, MODULE_INIT_BLOCK)

Actually, I would prefer that you do not use a global, so the above
comment doesn't really apply unless someone overrides me and tell you to
use a global.

Paolo

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [Qemu-devel] [PATCH 2/5] hostmem: make hostmem global and RAM hotunplg safe
  2013-04-11 10:11   ` Stefan Hajnoczi
  2013-04-11 10:26     ` Paolo Bonzini
@ 2013-04-12  3:55     ` liu ping fan
  2013-04-12  8:38       ` Stefan Hajnoczi
  1 sibling, 1 reply; 35+ messages in thread
From: liu ping fan @ 2013-04-12  3:55 UTC (permalink / raw)
  To: Stefan Hajnoczi
  Cc: Peter Maydell, Anthony Liguori, Jan Kiszka, Marcelo Tosatti,
	qemu-devel, Vasilis Liaskovitis, Paolo Bonzini

On Thu, Apr 11, 2013 at 6:11 PM, Stefan Hajnoczi <stefanha@gmail.com> wrote:
> On Mon, Apr 01, 2013 at 04:20:31PM +0800, Liu Ping Fan wrote:
>> From: Liu Ping Fan <pingfank@linux.vnet.ibm.com>
>>
>> The hostmem listener will translate gpa to hva quickly. Make it
>> global, so other component can use it.
>> Also this patch adopt MemoryRegionOps's ref/unref interface to
>> make it survive the RAM hotunplug.
>>
>> Signed-off-by: Liu Ping Fan <pingfank@linux.vnet.ibm.com>
>> ---
>>  hw/dataplane/hostmem.c |  130 +++++++++++++++++++++++++++++++++--------------
>>  hw/dataplane/hostmem.h |   19 ++------
>>  2 files changed, 95 insertions(+), 54 deletions(-)
>>
>> diff --git a/hw/dataplane/hostmem.c b/hw/dataplane/hostmem.c
>> index 380537e..86c02cd 100644
>> --- a/hw/dataplane/hostmem.c
>> +++ b/hw/dataplane/hostmem.c
>> @@ -13,6 +13,12 @@
>>
>>  #include "exec/address-spaces.h"
>>  #include "hostmem.h"
>> +#include "qemu/main-loop.h"
>> +
>> +/* lock to protect the access to cur_hostmem */
>> +static QemuMutex hostmem_lock;
>> +static HostMem *cur_hostmem;
>> +static HostMem *next_hostmem;
>>
>>  static int hostmem_lookup_cmp(const void *phys_, const void *region_)
>>  {
>> @@ -28,16 +34,49 @@ static int hostmem_lookup_cmp(const void *phys_, const void *region_)
>>      }
>>  }
>>
>> +static void hostmem_ref(HostMem *hostmem)
>> +{
>> +    assert(__sync_add_and_fetch(&hostmem->ref, 1) > 0);
>
> man 3 assert:
> "If  the macro NDEBUG was defined at the moment <assert.h> was last
> included, the macro assert() generates no code, and hence does nothing
> at all."
>
So if needed, using abort()?
> Never rely on a side-effect within an assert() call.  Store the return
> value into a local variable and test the local in the assert().
>
Ok.
> Also, these sync gcc builtins are not available on all platforms or gcc
> versions.  We need to be a little careful to avoid breaking builds here.
> Maybe __sync_add_and_fetch() is fine but I wanted to mention it because
> I've had trouble with these in the past.
>
> I suggest using glib atomics instead, if possible.
>
>> +}
>> +
>> +void hostmem_unref(HostMem *hostmem)
>> +{
>> +    int i;
>> +    HostMemRegion *hmr;
>> +
>> +    if (!__sync_sub_and_fetch(&hostmem->ref, 1)) {
>> +        for (i = 0; i < hostmem->num_current_regions; i++) {
>> +            hmr = &hostmem->current_regions[i];
>> +            /* Fix me, when memory hotunplug implement
>> +             * assert(hmr->mr_ops->unref)
>> +             */
>
> Why this fixme?  Can you resolve it by making ->unref() return bool from
> the start of this patch series?
>
This is used to notice the developer that the ref/unref interface
should be implemented for RAM device, since it is can be used during
RAM unplug.  And as you mentioned, here s/assert/abort/  Right?
>> +            if (hmr->mr->ops && hmr->mr->ops->unref) {
>> +                hmr->mr->ops->unref();
>> +            }
>> +        }
>> +        g_free(hostmem->current_regions);
>> +        g_free(hostmem);
>> +    }
>> +}
>> +
>>  /**
>>   * Map guest physical address to host pointer
>> + * also inc refcnt of *mr, caller need to unref later
>>   */
>> -void *hostmem_lookup(HostMem *hostmem, hwaddr phys, hwaddr len, bool is_write)
>> +void *hostmem_lookup(hwaddr phys, hwaddr len, MemoryRegion **mr, bool is_write)
>
> A cleaner approach is to keep the hostmem_foo(HostMem *) functions and
> have a global interface without the HostMem*.  That way the global
> wrapper focusses on acquiring cur_hostmem while the existing functions
> stay unchanged and focus on performing the actual operation.
>
The new API enforce a param "MemoryRegion **mr",  and rely on the
refcnt on it to survive the RAM unplug.  The caller should unref this
mr when done with it.  But the old API can not provide this, and not
easy to provide a wrapper.
>>  {
>>      HostMemRegion *region;
>>      void *host_addr = NULL;
>>      hwaddr offset_within_region;
>> +    HostMem *hostmem;
>> +
>> +    assert(mr);
>> +    *mr = NULL;
>> +    qemu_mutex_lock(&hostmem_lock);
>> +    hostmem = cur_hostmem;
>> +    hostmem_ref(hostmem);
>> +    qemu_mutex_unlock(&hostmem_lock);
>>
>> -    qemu_mutex_lock(&hostmem->current_regions_lock);
>
> Why is it safe to drop this lock?  The memory API could invoke callbacks
> in another thread which causes us to update regions.
>
The trick is the RCU. The lookup work will cur_hostmem, meanwhile if
there is a updater, it changes next_hostmem. So there is no race
between them.
>>      region = bsearch(&phys, hostmem->current_regions,
>>                       hostmem->num_current_regions,
>>                       sizeof(hostmem->current_regions[0]),
>> @@ -52,28 +91,33 @@ void *hostmem_lookup(HostMem *hostmem, hwaddr phys, hwaddr len, bool is_write)
>>      if (len <= region->size - offset_within_region) {
>>          host_addr = region->host_addr + offset_within_region;
>>      }
>> -out:
>> -    qemu_mutex_unlock(&hostmem->current_regions_lock);
>> +    *mr = region->mr;
>> +    memory_region_ref(*mr);
>
> How does this protect against the QEMU thread hot unplugging while we
> are searching hostmem->current_regions[]?  Our mr would be stale and the
> ref operation would corrupt memory if mr has already been freed!
>
When hostmem_listener_append_region, we inc refcnt of mr, this will
pin the mr. During the lookup, it will help us agaist unplug.  Then
after cur_hostmem retired to past, we will release the corresponding
refcnt which it holds.

>>
>> +out:
>> +    hostmem_unref(hostmem);
>>      return host_addr;
>>  }
>>
>> +static void hostmem_listener_begin(MemoryListener *listener)
>> +{
>> +    next_hostmem = g_new0(HostMem, 1);
>> +    next_hostmem->ref = 1;
>> +}
>> +
>>  /**
>>   * Install new regions list
>>   */
>>  static void hostmem_listener_commit(MemoryListener *listener)
>>  {
>> -    HostMem *hostmem = container_of(listener, HostMem, listener);
>> +    HostMem *tmp;
>>
>> -    qemu_mutex_lock(&hostmem->current_regions_lock);
>> -    g_free(hostmem->current_regions);
>> -    hostmem->current_regions = hostmem->new_regions;
>> -    hostmem->num_current_regions = hostmem->num_new_regions;
>> -    qemu_mutex_unlock(&hostmem->current_regions_lock);
>> +    tmp = cur_hostmem;
>> +    qemu_mutex_lock(&hostmem_lock);
>
> In hostmem_lookup() you accessed cur_hostmem inside the lock.  Does the
> lock protect cur_hostmem or not?
>
Yes, protect. Supposed we have HostMem A, and it will become B. Then
hostmem_lookup will either see A or B.  If it see A, it should use A
refcnt agaist hostmem_listener_commit to drop A.  This refcnt has no
relation with mr's object's refcnt.
>> +    cur_hostmem = next_hostmem;
>> +    qemu_mutex_unlock(&hostmem_lock);
>> +    hostmem_unref(tmp);
>>
>> -    /* Reset new regions list */
>> -    hostmem->new_regions = NULL;
>> -    hostmem->num_new_regions = 0;
>>  }
>>
>>  /**
>
> I gave up here.  The approach you are trying to take isn't clear in this

The main idea differ from origianl code is rcu, we have two version of
HostMem, reader uses cur_hostmem, updater uses next_hostmem
> patch.  I've pointed out some inconsistencies but they make it hard to
> review more since I don't understand what you're trying to do.
>
> Please split patches into logical steps and especially document
> locks/refcounts to explain their scope - what do they protect?
>
Sorry for that, I will try to imrove it, and more document.

Thanks and regards,
Pingfan

> Stefan

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [Qemu-devel] [PATCH 1/5] memory: add ref/unref interface for MemroyRegionOps
  2013-04-11  9:49   ` Stefan Hajnoczi
@ 2013-04-12  4:12     ` liu ping fan
  0 siblings, 0 replies; 35+ messages in thread
From: liu ping fan @ 2013-04-12  4:12 UTC (permalink / raw)
  To: Stefan Hajnoczi
  Cc: Peter Maydell, Anthony Liguori, Jan Kiszka, Marcelo Tosatti,
	qemu-devel, Vasilis Liaskovitis, Paolo Bonzini

On Thu, Apr 11, 2013 at 5:49 PM, Stefan Hajnoczi <stefanha@gmail.com> wrote:
> On Mon, Apr 01, 2013 at 04:20:30PM +0800, Liu Ping Fan wrote:
>> From: Liu Ping Fan <pingfank@linux.vnet.ibm.com>
>>
>> This pair of interface are optinal, except for those device which is
>> used outside the biglock's protection for hot unplug.
>
> Not sure if this comment is true.  Memory unplug safety is not about the
> big lock, it's about whether a reference to memory is held *across* a
> hot unplug operation.
>
What I exactly mean is DeviceX unplug is under biglock, so if
operations on DeviceX are all within the biglock, they are safe. But
that is not the trueth with RAM. So using ref/unref to manage the
reference to memory

> So even code that is under the big lock can use a guest RAM buffer
> across the event loop, and therefore be exposed to a RAM unplug!
>
> Therefore inc/dec must be used if guest RAM is held across event loop

The ref/unref interface wrapper the inc/dec, exposes them from MemoryRegionOps.

> handler calls.  If the guest RAM access happens completely inside a
> handler function, then it is not affected by hot plug and doesn't need
> to do inc/dec.

Yes, that is truth. And that is why only calling ref/unref in
HostMemListener region_add, but not in mem_add().

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [Qemu-devel] [PATCH 4/5] virtio-blk: release reference to RAM's memoryRegion
  2013-04-02  5:58   ` li guang
@ 2013-04-12  4:44     ` liu ping fan
  0 siblings, 0 replies; 35+ messages in thread
From: liu ping fan @ 2013-04-12  4:44 UTC (permalink / raw)
  To: li guang
  Cc: Peter Maydell, Anthony Liguori, Jan Kiszka, Marcelo Tosatti,
	qemu-devel, Vasilis Liaskovitis, Stefan Hajnoczi, Paolo Bonzini

On Tue, Apr 2, 2013 at 1:58 PM, li guang <lig.fnst@cn.fujitsu.com> wrote:
> 在 2013-04-01一的 16:20 +0800,Liu Ping Fan写道:
>> From: Liu Ping Fan <pingfank@linux.vnet.ibm.com>
>>
>> virtio-blk will reference to RAM's memoryRegion when the req has been
>> done.
>
>   do you mean unreference after req completed?
>
Yes, should s/reference/unreference/

>> So we can avoid to call bdrv_drain_all() when RAM hot unplug.
>>
>> Signed-off-by: Liu Ping Fan <pingfank@linux.vnet.ibm.com>
>> ---
>>  hw/dataplane/virtio-blk.c |   52 ++++++++++++++++++++++++++++++++++----------
>>  1 files changed, 40 insertions(+), 12 deletions(-)
>>
>> diff --git a/hw/dataplane/virtio-blk.c b/hw/dataplane/virtio-blk.c
>> index 1242d61..437174d 100644
>> --- a/hw/dataplane/virtio-blk.c
>> +++ b/hw/dataplane/virtio-blk.c
>> @@ -34,6 +34,8 @@ enum {
>>
>>  typedef struct {
>>      struct iocb iocb;               /* Linux AIO control block */
>> +    MemoryRegion *mrs[VRING_MAX];
>> +    int mrs_cnt;
>>      QEMUIOVector *inhdr;            /* iovecs for virtio_blk_inhdr */
>>      unsigned int head;              /* vring descriptor index */
>>      struct iovec *bounce_iov;       /* used if guest buffers are unaligned */
>> @@ -120,6 +122,9 @@ static void complete_request(struct iocb *iocb, ssize_t ret, void *opaque)
>>       * transferred plus the status bytes.
>>       */
>>      vring_push(&s->vring, req->head, len + sizeof(hdr));
>> +    while (--req->mrs_cnt >= 0) {
>> +        memory_region_unref(req->mrs[req->mrs_cnt]);
>> +    }
>>
>>      s->num_reqs--;
>>  }
>> @@ -155,7 +160,8 @@ static void do_get_id_cmd(VirtIOBlockDataPlane *s,
>>  static int do_rdwr_cmd(VirtIOBlockDataPlane *s, bool read,
>>                         struct iovec *iov, unsigned int iov_cnt,
>>                         long long offset, unsigned int head,
>> -                       QEMUIOVector *inhdr)
>> +                       QEMUIOVector *inhdr,
>> +                       MemoryRegion **mrs, int cnt)
>>  {
>>      struct iocb *iocb;
>>      QEMUIOVector qiov;
>> @@ -187,6 +193,8 @@ static int do_rdwr_cmd(VirtIOBlockDataPlane *s, bool read,
>>
>>      /* Fill in virtio block metadata needed for completion */
>>      VirtIOBlockRequest *req = container_of(iocb, VirtIOBlockRequest, iocb);
>> +    memcpy(req->mrs, mrs, cnt*sizeof(MemoryRegion *));
>> +    req->mrs_cnt = cnt;
>>      req->head = head;
>>      req->inhdr = inhdr;
>>      req->bounce_iov = bounce_iov;
>> @@ -196,19 +204,22 @@ static int do_rdwr_cmd(VirtIOBlockDataPlane *s, bool read,
>>
>>  static int process_request(IOQueue *ioq, struct iovec iov[],
>>                             unsigned int out_num, unsigned int in_num,
>> -                           unsigned int head)
>> +                           unsigned int head, MemoryRegion **mrs)
>>  {
>>      VirtIOBlockDataPlane *s = container_of(ioq, VirtIOBlockDataPlane, ioqueue);
>>      struct iovec *in_iov = &iov[out_num];
>>      struct virtio_blk_outhdr outhdr;
>>      QEMUIOVector *inhdr;
>>      size_t in_size;
>> +    unsigned int i, cnt = out_num+in_num;
>> +    int ret;
>>
>>      /* Copy in outhdr */
>>      if (unlikely(iov_to_buf(iov, out_num, 0, &outhdr,
>>                              sizeof(outhdr)) != sizeof(outhdr))) {
>>          error_report("virtio-blk request outhdr too short");
>> -        return -EFAULT;
>> +        ret = -EFAULT;
>> +        goto free_mrs;
>>      }
>>      iov_discard_front(&iov, &out_num, sizeof(outhdr));
>>
>> @@ -216,7 +227,8 @@ static int process_request(IOQueue *ioq, struct iovec iov[],
>>      in_size = iov_size(in_iov, in_num);
>>      if (in_size < sizeof(struct virtio_blk_inhdr)) {
>>          error_report("virtio_blk request inhdr too short");
>> -        return -EFAULT;
>> +        ret = -EFAULT;
>> +        goto free_mrs;
>>      }
>>      inhdr = g_slice_new(QEMUIOVector);
>>      qemu_iovec_init(inhdr, 1);
>> @@ -229,18 +241,22 @@ static int process_request(IOQueue *ioq, struct iovec iov[],
>>      outhdr.type &= ~VIRTIO_BLK_T_BARRIER;
>>
>>      switch (outhdr.type) {
>> +    /* For VIRTIO_BLK_T_IN/OUT, MemoryRegion will be release when cb */
>>      case VIRTIO_BLK_T_IN:
>> -        do_rdwr_cmd(s, true, in_iov, in_num, outhdr.sector * 512, head, inhdr);
>> +        do_rdwr_cmd(s, true, in_iov, in_num, outhdr.sector * 512, head, inhdr,
>> +            mrs, cnt);
>>          return 0;
>>
>>      case VIRTIO_BLK_T_OUT:
>> -        do_rdwr_cmd(s, false, iov, out_num, outhdr.sector * 512, head, inhdr);
>> +        do_rdwr_cmd(s, false, iov, out_num, outhdr.sector * 512, head, inhdr,
>> +            mrs, cnt);
>>          return 0;
>>
>>      case VIRTIO_BLK_T_SCSI_CMD:
>>          /* TODO support SCSI commands */
>>          complete_request_early(s, head, inhdr, VIRTIO_BLK_S_UNSUPP);
>> -        return 0;
>> +        ret = 0;
>> +        goto free_mrs;
>
> IMHO, 's/goto free_mrs/break/g', so free_mrs label can be removed.
>
Adopt.  Thanks.
Pingfan
>>
>>      case VIRTIO_BLK_T_FLUSH:
>>          /* TODO fdsync not supported by Linux AIO, do it synchronously here! */
>> @@ -249,18 +265,27 @@ static int process_request(IOQueue *ioq, struct iovec iov[],
>>          } else {
>>              complete_request_early(s, head, inhdr, VIRTIO_BLK_S_OK);
>>          }
>> -        return 0;
>> +        ret = 0;
>> +        goto free_mrs;
>>
>>      case VIRTIO_BLK_T_GET_ID:
>>          do_get_id_cmd(s, in_iov, in_num, head, inhdr);
>> -        return 0;
>> +        ret = 0;
>> +        goto free_mrs;
>>
>>      default:
>>          error_report("virtio-blk unsupported request type %#x", outhdr.type);
>>          qemu_iovec_destroy(inhdr);
>>          g_slice_free(QEMUIOVector, inhdr);
>> -        return -EFAULT;
>> +        ret = -EFAULT;
>> +        goto free_mrs;
>> +    }
>> +
>> +free_mrs:
>> +    for (i = 0; i < cnt; i++) {
>> +        memory_region_unref(mrs[i]);
>>      }
>> +    return ret;
>>  }
>>
>>  static int flush_true(EventNotifier *e)
>> @@ -286,6 +311,7 @@ static void handle_notify(EventNotifier *e)
>>      struct iovec iovec[VRING_MAX];
>>      struct iovec *end = &iovec[VRING_MAX];
>>      struct iovec *iov = iovec;
>> +    MemoryRegion *mrs[VRING_MAX];
>>
>>      /* When a request is read from the vring, the index of the first descriptor
>>       * (aka head) is returned so that the completed request can be pushed onto
>> @@ -304,7 +330,8 @@ static void handle_notify(EventNotifier *e)
>>          vring_disable_notification(s->vdev, &s->vring);
>>
>>          for (;;) {
>> -            head = vring_pop(s->vdev, &s->vring, iov, end, &out_num, &in_num);
>> +            head = vring_pop(s->vdev, &s->vring, iov, end, &out_num, &in_num,
>> +                            mrs);
>>              if (head < 0) {
>>                  break; /* no more requests */
>>              }
>> @@ -312,7 +339,8 @@ static void handle_notify(EventNotifier *e)
>>              trace_virtio_blk_data_plane_process_request(s, out_num, in_num,
>>                                                          head);
>>
>> -            if (process_request(&s->ioqueue, iov, out_num, in_num, head) < 0) {
>> +            if (process_request(&s->ioqueue, iov, out_num, in_num, head,
>> +                    mrs) < 0) {
>>                  vring_set_broken(&s->vring);
>>                  break;
>>              }
>
>

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [Qemu-devel] [PATCH 4/5] virtio-blk: release reference to RAM's memoryRegion
  2013-04-11 10:20   ` Stefan Hajnoczi
@ 2013-04-12  4:48     ` liu ping fan
  2013-04-12  8:45       ` Stefan Hajnoczi
  0 siblings, 1 reply; 35+ messages in thread
From: liu ping fan @ 2013-04-12  4:48 UTC (permalink / raw)
  To: Stefan Hajnoczi
  Cc: Peter Maydell, Anthony Liguori, Jan Kiszka, Marcelo Tosatti,
	qemu-devel, Vasilis Liaskovitis, Paolo Bonzini

On Thu, Apr 11, 2013 at 6:20 PM, Stefan Hajnoczi <stefanha@gmail.com> wrote:
> On Mon, Apr 01, 2013 at 04:20:33PM +0800, Liu Ping Fan wrote:
>> From: Liu Ping Fan <pingfank@linux.vnet.ibm.com>
>>
>> virtio-blk will reference to RAM's memoryRegion when the req has been
>> done.  So we can avoid to call bdrv_drain_all() when RAM hot unplug.
>
> How does the hot unplug operation work without bdrv_drain_all()?  In
> other words, how do we safely remove a MemoryRegion and wait for it to
> become unreferenced?
>
bdrv_drain_all() forces the end of usage of memoryRegion. But we can
let the req done callback ( marks this req finish the end of usage of
mr) to release the refcnt of memoryRegion.

> Stefan

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [Qemu-devel] [PATCH 3/5] vring: use hostmem's RAM safe api
  2013-04-11 10:15   ` Stefan Hajnoczi
@ 2013-04-12  4:49     ` liu ping fan
  2013-04-12  8:41       ` Stefan Hajnoczi
  0 siblings, 1 reply; 35+ messages in thread
From: liu ping fan @ 2013-04-12  4:49 UTC (permalink / raw)
  To: Stefan Hajnoczi
  Cc: Peter Maydell, Anthony Liguori, Jan Kiszka, Marcelo Tosatti,
	qemu-devel, Vasilis Liaskovitis, Paolo Bonzini

On Thu, Apr 11, 2013 at 6:15 PM, Stefan Hajnoczi <stefanha@gmail.com> wrote:
> On Mon, Apr 01, 2013 at 04:20:32PM +0800, Liu Ping Fan wrote:
>> @@ -51,7 +50,7 @@ bool vring_setup(Vring *vring, VirtIODevice *vdev, int n)
>>
>>  void vring_teardown(Vring *vring)
>>  {
>> -    hostmem_finalize(&vring->hostmem);
>> +    memory_region_unref(vring->vring_mr);
>>  }
>
> dataplane keeps a reference to the vring.  This prevents memory hot
> unplug while the device is up.  If this is a problem we'll have to
> reduce the lifespan of the vring mapping.
>
hot unplug is rare case, maybe we can ignore the delay of device's finalize.

> Stefan

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [Qemu-devel] [PATCH 2/5] hostmem: make hostmem global and RAM hotunplg safe
  2013-04-11 10:09   ` Paolo Bonzini
@ 2013-04-12  6:46     ` liu ping fan
  2013-04-12  8:21       ` Stefan Hajnoczi
  0 siblings, 1 reply; 35+ messages in thread
From: liu ping fan @ 2013-04-12  6:46 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: Peter Maydell, Anthony Liguori, Jan Kiszka, Marcelo Tosatti,
	qemu-devel, Vasilis Liaskovitis, Stefan Hajnoczi

On Thu, Apr 11, 2013 at 6:09 PM, Paolo Bonzini <pbonzini@redhat.com> wrote:
> Il 01/04/2013 10:20, Liu Ping Fan ha scritto:
>> From: Liu Ping Fan <pingfank@linux.vnet.ibm.com>
>>
>> The hostmem listener will translate gpa to hva quickly. Make it
>> global, so other component can use it.
>> Also this patch adopt MemoryRegionOps's ref/unref interface to
>> make it survive the RAM hotunplug.
>>
>> Signed-off-by: Liu Ping Fan <pingfank@linux.vnet.ibm.com>
>> ---
>>  hw/dataplane/hostmem.c |  130 +++++++++++++++++++++++++++++++++--------------
>>  hw/dataplane/hostmem.h |   19 ++------
>>  2 files changed, 95 insertions(+), 54 deletions(-)
>>
>> diff --git a/hw/dataplane/hostmem.c b/hw/dataplane/hostmem.c
>> index 380537e..86c02cd 100644
>> --- a/hw/dataplane/hostmem.c
>> +++ b/hw/dataplane/hostmem.c
>> @@ -13,6 +13,12 @@
>>
>>  #include "exec/address-spaces.h"
>>  #include "hostmem.h"
>> +#include "qemu/main-loop.h"
>> +
>> +/* lock to protect the access to cur_hostmem */
>> +static QemuMutex hostmem_lock;
>> +static HostMem *cur_hostmem;
>> +static HostMem *next_hostmem;
>>
>>  static int hostmem_lookup_cmp(const void *phys_, const void *region_)
>>  {
>> @@ -28,16 +34,49 @@ static int hostmem_lookup_cmp(const void *phys_, const void *region_)
>>      }
>>  }
>>
>> +static void hostmem_ref(HostMem *hostmem)
>> +{
>> +    assert(__sync_add_and_fetch(&hostmem->ref, 1) > 0);
>> +}
>> +
>> +void hostmem_unref(HostMem *hostmem)
>> +{
>> +    int i;
>> +    HostMemRegion *hmr;
>> +
>> +    if (!__sync_sub_and_fetch(&hostmem->ref, 1)) {
>> +        for (i = 0; i < hostmem->num_current_regions; i++) {
>> +            hmr = &hostmem->current_regions[i];
>> +            /* Fix me, when memory hotunplug implement
>> +             * assert(hmr->mr_ops->unref)
>> +             */
>> +            if (hmr->mr->ops && hmr->mr->ops->unref) {
>> +                hmr->mr->ops->unref();
>> +            }
>> +        }
>> +        g_free(hostmem->current_regions);
>> +        g_free(hostmem);
>> +    }
>> +}
>> +
>>  /**
>>   * Map guest physical address to host pointer
>> + * also inc refcnt of *mr, caller need to unref later
>>   */
>> -void *hostmem_lookup(HostMem *hostmem, hwaddr phys, hwaddr len, bool is_write)
>> +void *hostmem_lookup(hwaddr phys, hwaddr len, MemoryRegion **mr, bool is_write)
>>  {
>>      HostMemRegion *region;
>>      void *host_addr = NULL;
>>      hwaddr offset_within_region;
>> +    HostMem *hostmem;
>> +
>> +    assert(mr);
>> +    *mr = NULL;
>
> I think it's simpler if you allow a NULL MemoryRegion.
>
But how can we guarantee the caller to release the refcnt?  In fact,
if @mr is not exposed to external component directly, it is still
required internally, for example, cpu_physical_memory_map() ported
onto HostMem, we will still rely on _unmap() to release the refcnt.

> Also, it is better if you keep the HostMem type, because in principle
> different HostMems could be used for different AddressSpaces.
> Basically, what is now HostMem would become HostMemRegions, and the new
> HostMem type would have these fields:
>
>    QemuMutex         hostmem_lock;
>    HostMemRegions   *cur_regions;
>    HostMemRegions   *next_regions;
>
> matching your three globals.  I like the rcu-ready design.
>
What about
AddressSpaceMemoryRegion {
  QemuMutex hostmem_lock;
  HostMem *cur_hostmem;
  HostMem *next_hostmem;
  MemoryListener listener;
}
So each AddressSpaceMemoryRegion can be bound with specific AddressSpace.

> Finally, please split this patch and patch 3 differently:
>
> 1) add HostMemRegions
>
> 2) add ref/unref of memory regions to hostmem
>
> 3) add MemoryRegion argument to hostmem_lookup, leave it NULL in vring
>
As explained above, seem we can not leave it NULL. (In theory, vring
seated on RAM can not be unplug, since it is kernel.  But I think we
can not rely on the guest's behavior)
> 4) add ref/unref of memory regions to vring
>
Thank you very much :-).  Will follow that, and arrange patches more clearly.

Regards,
Pingfan
> Paolo
>
>> +    qemu_mutex_lock(&hostmem_lock);
>> +    hostmem = cur_hostmem;
>> +    hostmem_ref(hostmem);
>> +    qemu_mutex_unlock(&hostmem_lock);
>>
>> -    qemu_mutex_lock(&hostmem->current_regions_lock);
>>      region = bsearch(&phys, hostmem->current_regions,
>>                       hostmem->num_current_regions,
>>                       sizeof(hostmem->current_regions[0]),
>> @@ -52,28 +91,33 @@ void *hostmem_lookup(HostMem *hostmem, hwaddr phys, hwaddr len, bool is_write)
>>      if (len <= region->size - offset_within_region) {
>>          host_addr = region->host_addr + offset_within_region;
>>      }
>> -out:
>> -    qemu_mutex_unlock(&hostmem->current_regions_lock);
>> +    *mr = region->mr;
>> +    memory_region_ref(*mr);
>>
>> +out:
>> +    hostmem_unref(hostmem);
>>      return host_addr;
>>  }
>>
>> +static void hostmem_listener_begin(MemoryListener *listener)
>> +{
>> +    next_hostmem = g_new0(HostMem, 1);
>> +    next_hostmem->ref = 1;
>> +}
>> +
>>  /**
>>   * Install new regions list
>>   */
>>  static void hostmem_listener_commit(MemoryListener *listener)
>>  {
>> -    HostMem *hostmem = container_of(listener, HostMem, listener);
>> +    HostMem *tmp;
>>
>> -    qemu_mutex_lock(&hostmem->current_regions_lock);
>> -    g_free(hostmem->current_regions);
>> -    hostmem->current_regions = hostmem->new_regions;
>> -    hostmem->num_current_regions = hostmem->num_new_regions;
>> -    qemu_mutex_unlock(&hostmem->current_regions_lock);
>> +    tmp = cur_hostmem;
>> +    qemu_mutex_lock(&hostmem_lock);
>> +    cur_hostmem = next_hostmem;
>> +    qemu_mutex_unlock(&hostmem_lock);
>> +    hostmem_unref(tmp);
>>
>> -    /* Reset new regions list */
>> -    hostmem->new_regions = NULL;
>> -    hostmem->num_new_regions = 0;
>>  }
>>
>>  /**
>> @@ -83,23 +127,24 @@ static void hostmem_append_new_region(HostMem *hostmem,
>>                                        MemoryRegionSection *section)
>>  {
>>      void *ram_ptr = memory_region_get_ram_ptr(section->mr);
>> -    size_t num = hostmem->num_new_regions;
>> -    size_t new_size = (num + 1) * sizeof(hostmem->new_regions[0]);
>> +    size_t num = hostmem->num_current_regions;
>> +    size_t new_size = (num + 1) * sizeof(hostmem->current_regions[0]);
>>
>> -    hostmem->new_regions = g_realloc(hostmem->new_regions, new_size);
>> -    hostmem->new_regions[num] = (HostMemRegion){
>> +    hostmem->current_regions = g_realloc(hostmem->current_regions, new_size);
>> +    hostmem->current_regions[num] = (HostMemRegion){
>>          .host_addr = ram_ptr + section->offset_within_region,
>>          .guest_addr = section->offset_within_address_space,
>> +        .mr = section->mr,
>>          .size = section->size,
>>          .readonly = section->readonly,
>>      };
>> -    hostmem->num_new_regions++;
>> +    hostmem->num_current_regions++;
>>  }
>>
>> -static void hostmem_listener_append_region(MemoryListener *listener,
>> +static void hostmem_listener_nop_region(MemoryListener *listener,
>>                                             MemoryRegionSection *section)
>>  {
>> -    HostMem *hostmem = container_of(listener, HostMem, listener);
>> +    HostMem *hostmem = next_hostmem;
>>
>>      /* Ignore non-RAM regions, we may not be able to map them */
>>      if (!memory_region_is_ram(section->mr)) {
>> @@ -114,6 +159,18 @@ static void hostmem_listener_append_region(MemoryListener *listener,
>>      hostmem_append_new_region(hostmem, section);
>>  }
>>
>> +static void hostmem_listener_append_region(MemoryListener *listener,
>> +                                           MemoryRegionSection *section)
>> +{
>> +    hostmem_listener_nop_region(listener, section);
>> +    /* Fix me, when memory hotunplug implement
>> +     * assert(section->mr->ops->ref)
>> +     */
>> +    if (section->mr->ops && section->mr->ops->ref) {
>> +        section->mr->ops->ref();
>> +    }
>> +}
>> +
>>  /* We don't implement most MemoryListener callbacks, use these nop stubs */
>>  static void hostmem_listener_dummy(MemoryListener *listener)
>>  {
>> @@ -137,18 +194,12 @@ static void hostmem_listener_coalesced_mmio_dummy(MemoryListener *listener,
>>  {
>>  }
>>
>> -void hostmem_init(HostMem *hostmem)
>> -{
>> -    memset(hostmem, 0, sizeof(*hostmem));
>> -
>> -    qemu_mutex_init(&hostmem->current_regions_lock);
>> -
>> -    hostmem->listener = (MemoryListener){
>> -        .begin = hostmem_listener_dummy,
>> +static MemoryListener hostmem_listener = {
>> +        .begin = hostmem_listener_begin,
>>          .commit = hostmem_listener_commit,
>>          .region_add = hostmem_listener_append_region,
>>          .region_del = hostmem_listener_section_dummy,
>> -        .region_nop = hostmem_listener_append_region,
>> +        .region_nop = hostmem_listener_nop_region,
>>          .log_start = hostmem_listener_section_dummy,
>>          .log_stop = hostmem_listener_section_dummy,
>>          .log_sync = hostmem_listener_section_dummy,
>> @@ -159,18 +210,19 @@ void hostmem_init(HostMem *hostmem)
>>          .coalesced_mmio_add = hostmem_listener_coalesced_mmio_dummy,
>>          .coalesced_mmio_del = hostmem_listener_coalesced_mmio_dummy,
>>          .priority = 10,
>> -    };
>> +};
>>
>> -    memory_listener_register(&hostmem->listener, &address_space_memory);
>> -    if (hostmem->num_new_regions > 0) {
>> -        hostmem_listener_commit(&hostmem->listener);
>> -    }
>> +void hostmem_init(void)
>> +{
>> +    qemu_mutex_init(&hostmem_lock);
>> +    cur_hostmem = g_new0(HostMem, 1);
>> +    cur_hostmem->ref = 1;
>> +    memory_listener_register(&hostmem_listener, &address_space_memory);
>>  }
>>
>> -void hostmem_finalize(HostMem *hostmem)
>> +void hostmem_finalize(void)
>>  {
>> -    memory_listener_unregister(&hostmem->listener);
>> -    g_free(hostmem->new_regions);
>> -    g_free(hostmem->current_regions);
>> -    qemu_mutex_destroy(&hostmem->current_regions_lock);
>> +    memory_listener_unregister(&hostmem_listener);
>> +    hostmem_unref(cur_hostmem);
>> +    qemu_mutex_destroy(&hostmem_lock);
>>  }
>> diff --git a/hw/dataplane/hostmem.h b/hw/dataplane/hostmem.h
>> index b2cf093..883ba74 100644
>> --- a/hw/dataplane/hostmem.h
>> +++ b/hw/dataplane/hostmem.h
>> @@ -20,29 +20,18 @@
>>  typedef struct {
>>      void *host_addr;
>>      hwaddr guest_addr;
>> +    MemoryRegion *mr;
>>      uint64_t size;
>>      bool readonly;
>>  } HostMemRegion;
>>
>>  typedef struct {
>> -    /* The listener is invoked when regions change and a new list of regions is
>> -     * built up completely before they are installed.
>> -     */
>> -    MemoryListener listener;
>> -    HostMemRegion *new_regions;
>> -    size_t num_new_regions;
>> -
>> -    /* Current regions are accessed from multiple threads either to lookup
>> -     * addresses or to install a new list of regions.  The lock protects the
>> -     * pointer and the regions.
>> -     */
>> -    QemuMutex current_regions_lock;
>> +    int ref;
>>      HostMemRegion *current_regions;
>>      size_t num_current_regions;
>>  } HostMem;
>>
>> -void hostmem_init(HostMem *hostmem);
>> -void hostmem_finalize(HostMem *hostmem);
>> +void hostmem_unref(HostMem *hostmem);
>>
>>  /**
>>   * Map a guest physical address to a pointer
>> @@ -52,6 +41,6 @@ void hostmem_finalize(HostMem *hostmem);
>>   * can be done with other mechanisms like bdrv_drain_all() that quiesce
>>   * in-flight I/O.
>>   */
>> -void *hostmem_lookup(HostMem *hostmem, hwaddr phys, hwaddr len, bool is_write);
>> +void *hostmem_lookup(hwaddr phys, hwaddr len, MemoryRegion **mr, bool is_write);
>>
>>  #endif /* HOSTMEM_H */
>>
>

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [Qemu-devel] [PATCH 2/5] hostmem: make hostmem global and RAM hotunplg safe
  2013-04-12  6:46     ` liu ping fan
@ 2013-04-12  8:21       ` Stefan Hajnoczi
  0 siblings, 0 replies; 35+ messages in thread
From: Stefan Hajnoczi @ 2013-04-12  8:21 UTC (permalink / raw)
  To: liu ping fan
  Cc: Peter Maydell, Anthony Liguori, Jan Kiszka, Marcelo Tosatti,
	qemu-devel, Vasilis Liaskovitis, Paolo Bonzini

On Fri, Apr 12, 2013 at 02:46:41PM +0800, liu ping fan wrote:
> On Thu, Apr 11, 2013 at 6:09 PM, Paolo Bonzini <pbonzini@redhat.com> wrote:
> > Il 01/04/2013 10:20, Liu Ping Fan ha scritto:
> > Finally, please split this patch and patch 3 differently:
> >
> > 1) add HostMemRegions
> >
> > 2) add ref/unref of memory regions to hostmem
> >
> > 3) add MemoryRegion argument to hostmem_lookup, leave it NULL in vring
> >
> As explained above, seem we can not leave it NULL. (In theory, vring
> seated on RAM can not be unplug, since it is kernel.  But I think we
> can not rely on the guest's behavior)

It should be possible to evacuate the vring.  The guest needs to reset
the device so the vring is unmapped.  Then memory can be unplugged.
Reinitializing the device will allocate a new vring and things work
again.

Stefan

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [Qemu-devel] [PATCH 2/5] hostmem: make hostmem global and RAM hotunplg safe
  2013-04-12  3:55     ` liu ping fan
@ 2013-04-12  8:38       ` Stefan Hajnoczi
  2013-04-12 10:03         ` Paolo Bonzini
  0 siblings, 1 reply; 35+ messages in thread
From: Stefan Hajnoczi @ 2013-04-12  8:38 UTC (permalink / raw)
  To: liu ping fan
  Cc: Peter Maydell, Anthony Liguori, Jan Kiszka, Marcelo Tosatti,
	qemu-devel, Vasilis Liaskovitis, Paolo Bonzini

On Fri, Apr 12, 2013 at 11:55:39AM +0800, liu ping fan wrote:
> On Thu, Apr 11, 2013 at 6:11 PM, Stefan Hajnoczi <stefanha@gmail.com> wrote:
> > On Mon, Apr 01, 2013 at 04:20:31PM +0800, Liu Ping Fan wrote:
> >> From: Liu Ping Fan <pingfank@linux.vnet.ibm.com>
> >>
> >> The hostmem listener will translate gpa to hva quickly. Make it
> >> global, so other component can use it.
> >> Also this patch adopt MemoryRegionOps's ref/unref interface to
> >> make it survive the RAM hotunplug.
> >>
> >> Signed-off-by: Liu Ping Fan <pingfank@linux.vnet.ibm.com>
> >> ---
> >>  hw/dataplane/hostmem.c |  130 +++++++++++++++++++++++++++++++++--------------
> >>  hw/dataplane/hostmem.h |   19 ++------
> >>  2 files changed, 95 insertions(+), 54 deletions(-)
> >>
> >> diff --git a/hw/dataplane/hostmem.c b/hw/dataplane/hostmem.c
> >> index 380537e..86c02cd 100644
> >> --- a/hw/dataplane/hostmem.c
> >> +++ b/hw/dataplane/hostmem.c
> >> @@ -13,6 +13,12 @@
> >>
> >>  #include "exec/address-spaces.h"
> >>  #include "hostmem.h"
> >> +#include "qemu/main-loop.h"
> >> +
> >> +/* lock to protect the access to cur_hostmem */
> >> +static QemuMutex hostmem_lock;
> >> +static HostMem *cur_hostmem;
> >> +static HostMem *next_hostmem;
> >>
> >>  static int hostmem_lookup_cmp(const void *phys_, const void *region_)
> >>  {
> >> @@ -28,16 +34,49 @@ static int hostmem_lookup_cmp(const void *phys_, const void *region_)
> >>      }
> >>  }
> >>
> >> +static void hostmem_ref(HostMem *hostmem)
> >> +{
> >> +    assert(__sync_add_and_fetch(&hostmem->ref, 1) > 0);
> >
> > man 3 assert:
> > "If  the macro NDEBUG was defined at the moment <assert.h> was last
> > included, the macro assert() generates no code, and hence does nothing
> > at all."
> >
> So if needed, using abort()?

No, you can still use assert but don't put side-effects into the
assertion:

int old = __sync_add_and_fetch(&hostmem->ref, 1);
assert(old > 0);

> > Never rely on a side-effect within an assert() call.  Store the return
> > value into a local variable and test the local in the assert().
> >
> Ok.
> > Also, these sync gcc builtins are not available on all platforms or gcc
> > versions.  We need to be a little careful to avoid breaking builds here.
> > Maybe __sync_add_and_fetch() is fine but I wanted to mention it because
> > I've had trouble with these in the past.
> >
> > I suggest using glib atomics instead, if possible.
> >
> >> +}
> >> +
> >> +void hostmem_unref(HostMem *hostmem)
> >> +{
> >> +    int i;
> >> +    HostMemRegion *hmr;
> >> +
> >> +    if (!__sync_sub_and_fetch(&hostmem->ref, 1)) {
> >> +        for (i = 0; i < hostmem->num_current_regions; i++) {
> >> +            hmr = &hostmem->current_regions[i];
> >> +            /* Fix me, when memory hotunplug implement
> >> +             * assert(hmr->mr_ops->unref)
> >> +             */
> >
> > Why this fixme?  Can you resolve it by making ->unref() return bool from
> > the start of this patch series?
> >
> This is used to notice the developer that the ref/unref interface
> should be implemented for RAM device, since it is can be used during
> RAM unplug.  And as you mentioned, here s/assert/abort/  Right?

Ah, sorry.  I misread the assertion, you want to require that ->unref()
is implemented.

The assertion is fine the way it is.

> >> +            if (hmr->mr->ops && hmr->mr->ops->unref) {
> >> +                hmr->mr->ops->unref();
> >> +            }
> >> +        }
> >> +        g_free(hostmem->current_regions);
> >> +        g_free(hostmem);
> >> +    }
> >> +}
> >> +
> >>  /**
> >>   * Map guest physical address to host pointer
> >> + * also inc refcnt of *mr, caller need to unref later
> >>   */
> >> -void *hostmem_lookup(HostMem *hostmem, hwaddr phys, hwaddr len, bool is_write)
> >> +void *hostmem_lookup(hwaddr phys, hwaddr len, MemoryRegion **mr, bool is_write)
> >
> > A cleaner approach is to keep the hostmem_foo(HostMem *) functions and
> > have a global interface without the HostMem*.  That way the global
> > wrapper focusses on acquiring cur_hostmem while the existing functions
> > stay unchanged and focus on performing the actual operation.
> >
> The new API enforce a param "MemoryRegion **mr",  and rely on the
> refcnt on it to survive the RAM unplug.  The caller should unref this
> mr when done with it.  But the old API can not provide this, and not
> easy to provide a wrapper.

I understand, MemoryRegion needs to be added to the function.

But it would be nice to keep the hostmem_lock and cur_hostmem stuff
separate to make this function easier to understand.  In other words,
keep the globals away from the code that operates on a HostMem.  It
makes the code easier to read and guarantees to the reader that you are
not mixing in assumptions about global state.

> >>  {
> >>      HostMemRegion *region;
> >>      void *host_addr = NULL;
> >>      hwaddr offset_within_region;
> >> +    HostMem *hostmem;
> >> +
> >> +    assert(mr);
> >> +    *mr = NULL;
> >> +    qemu_mutex_lock(&hostmem_lock);
> >> +    hostmem = cur_hostmem;
> >> +    hostmem_ref(hostmem);
> >> +    qemu_mutex_unlock(&hostmem_lock);
> >>
> >> -    qemu_mutex_lock(&hostmem->current_regions_lock);
> >
> > Why is it safe to drop this lock?  The memory API could invoke callbacks
> > in another thread which causes us to update regions.
> >
> The trick is the RCU. The lookup work will cur_hostmem, meanwhile if
> there is a updater, it changes next_hostmem. So there is no race
> between them.

I see.  Please document the nature of the cur/next variables in
comments.

You can make the code review process smoother by laying out patches in a
way that makes them logical and easy to understand.  Right now it feels
like I have to reverse-engineer what you were thinking in order to
understand the patches.

> >>      region = bsearch(&phys, hostmem->current_regions,
> >>                       hostmem->num_current_regions,
> >>                       sizeof(hostmem->current_regions[0]),
> >> @@ -52,28 +91,33 @@ void *hostmem_lookup(HostMem *hostmem, hwaddr phys, hwaddr len, bool is_write)
> >>      if (len <= region->size - offset_within_region) {
> >>          host_addr = region->host_addr + offset_within_region;
> >>      }
> >> -out:
> >> -    qemu_mutex_unlock(&hostmem->current_regions_lock);
> >> +    *mr = region->mr;
> >> +    memory_region_ref(*mr);
> >
> > How does this protect against the QEMU thread hot unplugging while we
> > are searching hostmem->current_regions[]?  Our mr would be stale and the
> > ref operation would corrupt memory if mr has already been freed!
> >
> When hostmem_listener_append_region, we inc refcnt of mr, this will
> pin the mr. During the lookup, it will help us agaist unplug.  Then
> after cur_hostmem retired to past, we will release the corresponding
> refcnt which it holds.

I see.  This also explain why hostmem_ref()/hostmem_unref() a
asymmetric: ref() just increments hostmem's refcount while unref()
actually decrements refcounts for memory regions.  It's something I
wondered about when looking at those functions.

> >>
> >> +out:
> >> +    hostmem_unref(hostmem);
> >>      return host_addr;
> >>  }
> >>
> >> +static void hostmem_listener_begin(MemoryListener *listener)
> >> +{
> >> +    next_hostmem = g_new0(HostMem, 1);
> >> +    next_hostmem->ref = 1;
> >> +}
> >> +
> >>  /**
> >>   * Install new regions list
> >>   */
> >>  static void hostmem_listener_commit(MemoryListener *listener)
> >>  {
> >> -    HostMem *hostmem = container_of(listener, HostMem, listener);
> >> +    HostMem *tmp;
> >>
> >> -    qemu_mutex_lock(&hostmem->current_regions_lock);
> >> -    g_free(hostmem->current_regions);
> >> -    hostmem->current_regions = hostmem->new_regions;
> >> -    hostmem->num_current_regions = hostmem->num_new_regions;
> >> -    qemu_mutex_unlock(&hostmem->current_regions_lock);
> >> +    tmp = cur_hostmem;
> >> +    qemu_mutex_lock(&hostmem_lock);
> >
> > In hostmem_lookup() you accessed cur_hostmem inside the lock.  Does the
> > lock protect cur_hostmem or not?
> >
> Yes, protect. Supposed we have HostMem A, and it will become B. Then
> hostmem_lookup will either see A or B.  If it see A, it should use A
> refcnt agaist hostmem_listener_commit to drop A.  This refcnt has no
> relation with mr's object's refcnt.

My question is why you are accessing cur_hostmem outside hostmem_lock
but then assigning it inside the lock on the next line...

> >> +    cur_hostmem = next_hostmem;

...here.

If you want an atomic exchange then tmp = cur_hostmem should be inside
the lock.

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [Qemu-devel] [PATCH 3/5] vring: use hostmem's RAM safe api
  2013-04-12  4:49     ` liu ping fan
@ 2013-04-12  8:41       ` Stefan Hajnoczi
  0 siblings, 0 replies; 35+ messages in thread
From: Stefan Hajnoczi @ 2013-04-12  8:41 UTC (permalink / raw)
  To: liu ping fan
  Cc: Peter Maydell, Anthony Liguori, Jan Kiszka, Marcelo Tosatti,
	qemu-devel, Vasilis Liaskovitis, Paolo Bonzini

On Fri, Apr 12, 2013 at 12:49:45PM +0800, liu ping fan wrote:
> On Thu, Apr 11, 2013 at 6:15 PM, Stefan Hajnoczi <stefanha@gmail.com> wrote:
> > On Mon, Apr 01, 2013 at 04:20:32PM +0800, Liu Ping Fan wrote:
> >> @@ -51,7 +50,7 @@ bool vring_setup(Vring *vring, VirtIODevice *vdev, int n)
> >>
> >>  void vring_teardown(Vring *vring)
> >>  {
> >> -    hostmem_finalize(&vring->hostmem);
> >> +    memory_region_unref(vring->vring_mr);
> >>  }
> >
> > dataplane keeps a reference to the vring.  This prevents memory hot
> > unplug while the device is up.  If this is a problem we'll have to
> > reduce the lifespan of the vring mapping.
> >
> hot unplug is rare case, maybe we can ignore the delay of device's finalize.

I thought about the vring more and I think we can keep the current
behavior.  dataplane keeps the vring up when the device is active.

When the guest kernel/device driver resets the device then we unmap the
vring.  It's reasonable that the guest needs to release the virtio
device before hot unplugging memory used for the vring :).

Therefore, in practice this behavior should be fine.

Stefan

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [Qemu-devel] [PATCH 4/5] virtio-blk: release reference to RAM's memoryRegion
  2013-04-12  4:48     ` liu ping fan
@ 2013-04-12  8:45       ` Stefan Hajnoczi
  2013-04-12  9:05         ` liu ping fan
  0 siblings, 1 reply; 35+ messages in thread
From: Stefan Hajnoczi @ 2013-04-12  8:45 UTC (permalink / raw)
  To: liu ping fan
  Cc: Peter Maydell, Anthony Liguori, Jan Kiszka, Marcelo Tosatti,
	qemu-devel, Vasilis Liaskovitis, Paolo Bonzini

On Fri, Apr 12, 2013 at 12:48:12PM +0800, liu ping fan wrote:
> On Thu, Apr 11, 2013 at 6:20 PM, Stefan Hajnoczi <stefanha@gmail.com> wrote:
> > On Mon, Apr 01, 2013 at 04:20:33PM +0800, Liu Ping Fan wrote:
> >> From: Liu Ping Fan <pingfank@linux.vnet.ibm.com>
> >>
> >> virtio-blk will reference to RAM's memoryRegion when the req has been
> >> done.  So we can avoid to call bdrv_drain_all() when RAM hot unplug.
> >
> > How does the hot unplug operation work without bdrv_drain_all()?  In
> > other words, how do we safely remove a MemoryRegion and wait for it to
> > become unreferenced?
> >
> bdrv_drain_all() forces the end of usage of memoryRegion. But we can
> let the req done callback ( marks this req finish the end of usage of
> mr) to release the refcnt of memoryRegion.

Yes.  What I'm interested in is the wait mechanism for the QEMU thread
to wait until the memory region(s) become unreferenced.

This patch series is only one half of the memory unplug puzzle and I'd
like to understand how the other half - the unplug operation - will be
implemented.

Just a summary would be interesting - especially how a QEMU thread will
wait until memory regions have been released.  The reference counter
doesn't have any notification that would allow a blocking wait.

Then you have the RCU concept.  So maybe the unplug operation will not
block but instead be called several times from the event loop until all
references have been released?

Stefan

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [Qemu-devel] [PATCH 4/5] virtio-blk: release reference to RAM's memoryRegion
  2013-04-12  8:45       ` Stefan Hajnoczi
@ 2013-04-12  9:05         ` liu ping fan
  2013-04-16  7:57           ` Stefan Hajnoczi
  0 siblings, 1 reply; 35+ messages in thread
From: liu ping fan @ 2013-04-12  9:05 UTC (permalink / raw)
  To: Stefan Hajnoczi
  Cc: Peter Maydell, Anthony Liguori, Jan Kiszka, Marcelo Tosatti,
	qemu-devel, Vasilis Liaskovitis, Paolo Bonzini

On Fri, Apr 12, 2013 at 4:45 PM, Stefan Hajnoczi <stefanha@gmail.com> wrote:
> On Fri, Apr 12, 2013 at 12:48:12PM +0800, liu ping fan wrote:
>> On Thu, Apr 11, 2013 at 6:20 PM, Stefan Hajnoczi <stefanha@gmail.com> wrote:
>> > On Mon, Apr 01, 2013 at 04:20:33PM +0800, Liu Ping Fan wrote:
>> >> From: Liu Ping Fan <pingfank@linux.vnet.ibm.com>
>> >>
>> >> virtio-blk will reference to RAM's memoryRegion when the req has been
>> >> done.  So we can avoid to call bdrv_drain_all() when RAM hot unplug.
>> >
>> > How does the hot unplug operation work without bdrv_drain_all()?  In
>> > other words, how do we safely remove a MemoryRegion and wait for it to
>> > become unreferenced?
>> >
>> bdrv_drain_all() forces the end of usage of memoryRegion. But we can
>> let the req done callback ( marks this req finish the end of usage of
>> mr) to release the refcnt of memoryRegion.
>
> Yes.  What I'm interested in is the wait mechanism for the QEMU thread
> to wait until the memory region(s) become unreferenced.
>
> This patch series is only one half of the memory unplug puzzle and I'd
> like to understand how the other half - the unplug operation - will be
> implemented.
>
The unplug patch is still under developed, more detail please refer to
Vasilis Liaskovitis's patches:
   http://lists.gnu.org/archive/html/qemu-devel/2012-12/msg02693.html

> Just a summary would be interesting - especially how a QEMU thread will
> wait until memory regions have been released.  The reference counter
> doesn't have any notification that would allow a blocking wait.
>
Sorry, not understand "a blocking wait".  To summary, when
initializing, RamDevice's refcnt == 1, and unplug will release this
one. Meanwhile, all the MemoryListeners which are async with unplug,
will inc refcnt to against the unplug event.
> Then you have the RCU concept.  So maybe the unplug operation will not
> block but instead be called several times from the event loop until all
> references have been released?
>
As mentioned above, unplug will put its own refcnt. And unplug will not block.


> Stefan

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [Qemu-devel] [PATCH 2/5] hostmem: make hostmem global and RAM hotunplg safe
  2013-04-12  8:38       ` Stefan Hajnoczi
@ 2013-04-12 10:03         ` Paolo Bonzini
  2013-04-15  1:42           ` liu ping fan
  0 siblings, 1 reply; 35+ messages in thread
From: Paolo Bonzini @ 2013-04-12 10:03 UTC (permalink / raw)
  To: Stefan Hajnoczi
  Cc: Peter Maydell, Anthony Liguori, Jan Kiszka, Marcelo Tosatti,
	liu ping fan, qemu-devel, Vasilis Liaskovitis

Il 12/04/2013 10:38, Stefan Hajnoczi ha scritto:
>> > Yes, protect. Supposed we have HostMem A, and it will become B. Then
>> > hostmem_lookup will either see A or B.  If it see A, it should use A
>> > refcnt agaist hostmem_listener_commit to drop A.  This refcnt has no
>> > relation with mr's object's refcnt.
> My question is why you are accessing cur_hostmem outside hostmem_lock
> but then assigning it inside the lock on the next line...
> 
>>>> > >> +    cur_hostmem = next_hostmem;
> ...here.
> 
> If you want an atomic exchange then tmp = cur_hostmem should be inside
> the lock.

It will work because readers will grab either the hostmem_lock or the
BQL, while writers will grab both.  A kind of local/global lock, but I'm
not sure it was intentional. :)  It's simpler to just move the read
inside the lock.

Paolo

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [Qemu-devel] [PATCH 2/5] hostmem: make hostmem global and RAM hotunplg safe
  2013-04-12 10:03         ` Paolo Bonzini
@ 2013-04-15  1:42           ` liu ping fan
  2013-04-15  6:38             ` Paolo Bonzini
  0 siblings, 1 reply; 35+ messages in thread
From: liu ping fan @ 2013-04-15  1:42 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: Peter Maydell, Anthony Liguori, Stefan Hajnoczi, Marcelo Tosatti,
	qemu-devel, Vasilis Liaskovitis, Jan Kiszka

On Fri, Apr 12, 2013 at 6:03 PM, Paolo Bonzini <pbonzini@redhat.com> wrote:
> Il 12/04/2013 10:38, Stefan Hajnoczi ha scritto:
>>> > Yes, protect. Supposed we have HostMem A, and it will become B. Then
>>> > hostmem_lookup will either see A or B.  If it see A, it should use A
>>> > refcnt agaist hostmem_listener_commit to drop A.  This refcnt has no
>>> > relation with mr's object's refcnt.
>> My question is why you are accessing cur_hostmem outside hostmem_lock
>> but then assigning it inside the lock on the next line...
>>
>>>>> > >> +    cur_hostmem = next_hostmem;
>> ...here.
>>
>> If you want an atomic exchange then tmp = cur_hostmem should be inside
>> the lock.
>
> It will work because readers will grab either the hostmem_lock or the
> BQL, while writers will grab both.  A kind of local/global lock, but I'm
No only hostmem_lock is used to protect readers from writers.  While
the writers are protected agaist each other by biglock.  So leaving
the "cur_hostmem = next_hostmem" outside to reflect the lock's rules.

Regards,
Pingfan
> not sure it was intentional. :)  It's simpler to just move the read
> inside the lock.
>
> Paolo

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [Qemu-devel] [PATCH 2/5] hostmem: make hostmem global and RAM hotunplg safe
  2013-04-15  1:42           ` liu ping fan
@ 2013-04-15  6:38             ` Paolo Bonzini
  0 siblings, 0 replies; 35+ messages in thread
From: Paolo Bonzini @ 2013-04-15  6:38 UTC (permalink / raw)
  To: liu ping fan
  Cc: Peter Maydell, Anthony Liguori, Stefan Hajnoczi, Marcelo Tosatti,
	qemu-devel, Vasilis Liaskovitis, Jan Kiszka

Il 15/04/2013 03:42, liu ping fan ha scritto:
> > It will work because readers will grab either the hostmem_lock or the
> > BQL, while writers will grab both.  A kind of local/global lock, but I'm
> 
> No only hostmem_lock is used to protect readers from writers.  While
> the writers are protected agaist each other by biglock.  So leaving
> the "cur_hostmem = next_hostmem" outside to reflect the lock's rules.

Doesn't help much if you do not document them...

Paolo

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [Qemu-devel] [PATCH 4/5] virtio-blk: release reference to RAM's memoryRegion
  2013-04-12  9:05         ` liu ping fan
@ 2013-04-16  7:57           ` Stefan Hajnoczi
  2013-04-16  8:12             ` Paolo Bonzini
  0 siblings, 1 reply; 35+ messages in thread
From: Stefan Hajnoczi @ 2013-04-16  7:57 UTC (permalink / raw)
  To: liu ping fan
  Cc: Peter Maydell, Anthony Liguori, Jan Kiszka, Marcelo Tosatti,
	qemu-devel, Vasilis Liaskovitis, Paolo Bonzini

On Fri, Apr 12, 2013 at 05:05:41PM +0800, liu ping fan wrote:
> On Fri, Apr 12, 2013 at 4:45 PM, Stefan Hajnoczi <stefanha@gmail.com> wrote:
> > On Fri, Apr 12, 2013 at 12:48:12PM +0800, liu ping fan wrote:
> >> On Thu, Apr 11, 2013 at 6:20 PM, Stefan Hajnoczi <stefanha@gmail.com> wrote:
> >> > On Mon, Apr 01, 2013 at 04:20:33PM +0800, Liu Ping Fan wrote:
> >> >> From: Liu Ping Fan <pingfank@linux.vnet.ibm.com>
> >> >>
> >> >> virtio-blk will reference to RAM's memoryRegion when the req has been
> >> >> done.  So we can avoid to call bdrv_drain_all() when RAM hot unplug.
> >> >
> >> > How does the hot unplug operation work without bdrv_drain_all()?  In
> >> > other words, how do we safely remove a MemoryRegion and wait for it to
> >> > become unreferenced?
> >> >
> >> bdrv_drain_all() forces the end of usage of memoryRegion. But we can
> >> let the req done callback ( marks this req finish the end of usage of
> >> mr) to release the refcnt of memoryRegion.
> >
> > Yes.  What I'm interested in is the wait mechanism for the QEMU thread
> > to wait until the memory region(s) become unreferenced.
> >
> > This patch series is only one half of the memory unplug puzzle and I'd
> > like to understand how the other half - the unplug operation - will be
> > implemented.
> >
> The unplug patch is still under developed, more detail please refer to
> Vasilis Liaskovitis's patches:
>    http://lists.gnu.org/archive/html/qemu-devel/2012-12/msg02693.html
> 
> > Just a summary would be interesting - especially how a QEMU thread will
> > wait until memory regions have been released.  The reference counter
> > doesn't have any notification that would allow a blocking wait.
> >
> Sorry, not understand "a blocking wait".  To summary, when
> initializing, RamDevice's refcnt == 1, and unplug will release this
> one. Meanwhile, all the MemoryListeners which are async with unplug,
> will inc refcnt to against the unplug event.

Okay, thanks for the summary.  I don't need to see patches, I just want
to understand how the changes implemented in this series will be used.

> > Then you have the RCU concept.  So maybe the unplug operation will not
> > block but instead be called several times from the event loop until all
> > references have been released?
> >
> As mentioned above, unplug will put its own refcnt. And unplug will not block.

So it sounds like unplug will not block and there is no guarantee the
memory is actually unplugged when the monitor command completes.  The
memory region is only released when the last reference count holder lets
go.

This means that pending I/O to a hung NFS mount can delay the actual
unplug for unbounded time (by default the kernel NFS client keeps
retrying and does not fail the I/O request).  The user will be able to
issue additional monitor commands and see that memory is not yet
unplugged?

Stefan

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [Qemu-devel] [PATCH 4/5] virtio-blk: release reference to RAM's memoryRegion
  2013-04-16  7:57           ` Stefan Hajnoczi
@ 2013-04-16  8:12             ` Paolo Bonzini
  0 siblings, 0 replies; 35+ messages in thread
From: Paolo Bonzini @ 2013-04-16  8:12 UTC (permalink / raw)
  To: Stefan Hajnoczi
  Cc: Peter Maydell, Anthony Liguori, Jan Kiszka, Marcelo Tosatti,
	liu ping fan, qemu-devel, Vasilis Liaskovitis

Il 16/04/2013 09:57, Stefan Hajnoczi ha scritto:
> So it sounds like unplug will not block and there is no guarantee the
> memory is actually unplugged when the monitor command completes.  The
> memory region is only released when the last reference count holder lets
> go.
> 
> This means that pending I/O to a hung NFS mount can delay the actual
> unplug for unbounded time (by default the kernel NFS client keeps
> retrying and does not fail the I/O request).  The user will be able to
> issue additional monitor commands and see that memory is not yet
> unplugged?

I think "info mtree" would provide information.  We can add an event
too, similar to the recently added DEVICE_DELETED.

Paolo

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [Qemu-devel] about atexit() (was: [PATCH 5/5] hostmem: init/finalize hostmem listener)
  2013-04-01  8:20 ` [Qemu-devel] [PATCH 5/5] hostmem: init/finalize hostmem listener Liu Ping Fan
  2013-04-11 10:02   ` Paolo Bonzini
@ 2013-06-13  4:38   ` Amos Kong
  2013-06-13  8:51     ` liu ping fan
  1 sibling, 1 reply; 35+ messages in thread
From: Amos Kong @ 2013-06-13  4:38 UTC (permalink / raw)
  To: Liu Ping Fan
  Cc: Peter Maydell, Anthony Liguori, Jan Kiszka, Marcelo Tosatti,
	qemu-devel, Vasilis Liaskovitis, Stefan Hajnoczi, Paolo Bonzini

On Mon, Apr 01, 2013 at 04:20:34PM +0800, Liu Ping Fan wrote:
> From: Liu Ping Fan <pingfank@linux.vnet.ibm.com>
> 
> Signed-off-by: Liu Ping Fan <pingfank@linux.vnet.ibm.com>
> ---
>  vl.c |    2 ++
>  1 files changed, 2 insertions(+), 0 deletions(-)
> 
> diff --git a/vl.c b/vl.c
> index 7643f16..46a25cf 100644
> --- a/vl.c
> +++ b/vl.c
> @@ -4157,6 +4157,7 @@ int main(int argc, char **argv, char **envp)
>      }
>  
>      os_set_line_buffering();
> +    hostmem_init();
>  
>      qemu_init_cpu_loop();
>      qemu_mutex_lock_iothread();
> @@ -4174,6 +4175,7 @@ int main(int argc, char **argv, char **envp)
>  
>      /* clean up network at qemu process termination */
>      atexit(&net_cleanup);
> +    atexit(&hostmem_finalize);


The func registered by atexit() can only be called at normal termination.
If qemu process is abort() or killed by 'kill -9', the func won't be
called.

A known issue: at the abnormal termination, downscript could not be
executed to cleanup tap device. Can we suggest user to clean network
manually in this condition?


>  
>      if (net_init_clients() < 0) {
>          exit(1);
> -- 

-- 
			Amos.

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [Qemu-devel] about atexit() (was: [PATCH 5/5] hostmem: init/finalize hostmem listener)
  2013-06-13  4:38   ` [Qemu-devel] about atexit() (was: [PATCH 5/5] hostmem: init/finalize hostmem listener) Amos Kong
@ 2013-06-13  8:51     ` liu ping fan
  0 siblings, 0 replies; 35+ messages in thread
From: liu ping fan @ 2013-06-13  8:51 UTC (permalink / raw)
  To: Amos Kong
  Cc: Peter Maydell, Anthony Liguori, Jan Kiszka, Marcelo Tosatti,
	qemu-devel, Vasilis Liaskovitis, Stefan Hajnoczi, Paolo Bonzini

On Thu, Jun 13, 2013 at 12:38 PM, Amos Kong <akong@redhat.com> wrote:
> On Mon, Apr 01, 2013 at 04:20:34PM +0800, Liu Ping Fan wrote:
>> From: Liu Ping Fan <pingfank@linux.vnet.ibm.com>
>>
>> Signed-off-by: Liu Ping Fan <pingfank@linux.vnet.ibm.com>
>> ---
>>  vl.c |    2 ++
>>  1 files changed, 2 insertions(+), 0 deletions(-)
>>
>> diff --git a/vl.c b/vl.c
>> index 7643f16..46a25cf 100644
>> --- a/vl.c
>> +++ b/vl.c
>> @@ -4157,6 +4157,7 @@ int main(int argc, char **argv, char **envp)
>>      }
>>
>>      os_set_line_buffering();
>> +    hostmem_init();
>>
>>      qemu_init_cpu_loop();
>>      qemu_mutex_lock_iothread();
>> @@ -4174,6 +4175,7 @@ int main(int argc, char **argv, char **envp)
>>
>>      /* clean up network at qemu process termination */
>>      atexit(&net_cleanup);
>> +    atexit(&hostmem_finalize);
>
>
> The func registered by atexit() can only be called at normal termination.
> If qemu process is abort() or killed by 'kill -9', the func won't be
> called.
>
> A known issue: at the abnormal termination, downscript could not be
> executed to cleanup tap device. Can we suggest user to clean network
> manually in this condition?
>
SIG_KILL leaves no opportunity for us to do extra things, so I think
your suggestion is the only way out.
>
>>
>>      if (net_init_clients() < 0) {
>>          exit(1);
>> --
>
> --
>                         Amos.

^ permalink raw reply	[flat|nested] 35+ messages in thread

end of thread, other threads:[~2013-06-13  8:51 UTC | newest]

Thread overview: 35+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2013-04-01  8:20 [Qemu-devel] [PATCH 0/5] proposal to make hostmem listener RAM unplug safe Liu Ping Fan
2013-04-01  8:20 ` [Qemu-devel] [PATCH 1/5] memory: add ref/unref interface for MemroyRegionOps Liu Ping Fan
2013-04-11  9:49   ` Stefan Hajnoczi
2013-04-12  4:12     ` liu ping fan
2013-04-01  8:20 ` [Qemu-devel] [PATCH 2/5] hostmem: make hostmem global and RAM hotunplg safe Liu Ping Fan
2013-04-02  6:11   ` li guang
2013-04-11 10:09   ` Paolo Bonzini
2013-04-12  6:46     ` liu ping fan
2013-04-12  8:21       ` Stefan Hajnoczi
2013-04-11 10:11   ` Stefan Hajnoczi
2013-04-11 10:26     ` Paolo Bonzini
2013-04-12  3:55     ` liu ping fan
2013-04-12  8:38       ` Stefan Hajnoczi
2013-04-12 10:03         ` Paolo Bonzini
2013-04-15  1:42           ` liu ping fan
2013-04-15  6:38             ` Paolo Bonzini
2013-04-01  8:20 ` [Qemu-devel] [PATCH 3/5] vring: use hostmem's RAM safe api Liu Ping Fan
2013-04-11 10:15   ` Stefan Hajnoczi
2013-04-12  4:49     ` liu ping fan
2013-04-12  8:41       ` Stefan Hajnoczi
2013-04-01  8:20 ` [Qemu-devel] [PATCH 4/5] virtio-blk: release reference to RAM's memoryRegion Liu Ping Fan
2013-04-02  5:58   ` li guang
2013-04-12  4:44     ` liu ping fan
2013-04-11 10:20   ` Stefan Hajnoczi
2013-04-12  4:48     ` liu ping fan
2013-04-12  8:45       ` Stefan Hajnoczi
2013-04-12  9:05         ` liu ping fan
2013-04-16  7:57           ` Stefan Hajnoczi
2013-04-16  8:12             ` Paolo Bonzini
2013-04-01  8:20 ` [Qemu-devel] [PATCH 5/5] hostmem: init/finalize hostmem listener Liu Ping Fan
2013-04-11 10:02   ` Paolo Bonzini
2013-04-11 12:08     ` liu ping fan
2013-04-11 13:14       ` Paolo Bonzini
2013-06-13  4:38   ` [Qemu-devel] about atexit() (was: [PATCH 5/5] hostmem: init/finalize hostmem listener) Amos Kong
2013-06-13  8:51     ` liu ping fan

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.