All of lore.kernel.org
 help / color / mirror / Atom feed
* [Qemu-devel] [PATCH v3 0/9] Migration: Remove copying of guest ram pages
@ 2013-03-21 16:05 Orit Wasserman
  2013-03-21 16:05 ` [Qemu-devel] [PATCH v3 1/9] Add QemuFileWritevBuffer QemuFileOps Orit Wasserman
                   ` (10 more replies)
  0 siblings, 11 replies; 37+ messages in thread
From: Orit Wasserman @ 2013-03-21 16:05 UTC (permalink / raw)
  To: qemu-devel; +Cc: Orit Wasserman, pbonzini, mst, chegu_vinod, quintela

In migration all data is copied to a static buffer in QEMUFile,
this hurts our network bandwidth and CPU usage especially with large guests.
We switched to iovec for storing different buffers to send (even a byte field is
considered as a buffer) and use writev to send the iovec.
writev was chosen (as apposed to sendmsg) because it supprts non socket fds.
  
Guest memory pages are not copied by calling a new function 
qemu_put_buffer_no_copy.
The page header data and device state data are still copied into the static
buffer. This data consists of a lot of bytes and integer fields and the static
buffer is used to store it during batching.
Another improvement is changing qemu_putbe64/32/16 to create a single
buffer instead of several byte sized buffer.

git repository: git://github.com/oritwas/qemu.git sendv_v2

Change from v2:
Always send data for the iovec even if writev_buffer is not implemented.
Coalesce adjacent iovecs to create one big buffer from small adjacent buffer.

Changes from v1:
Use iov_send for socket.
Make writev_buffer optional and if it is not implemented use put_buffer

Future work: Make number of iovec changeable

Orit Wasserman (9):
  Add QemuFileWritevBuffer QemuFileOps
  Add socket_writev_buffer function
  Update bytes_xfer in qemu_put_byte
  Store the data to send also in iovec
  Use writev ops if available
  More optimized qemu_put_be64/32/16
  Add qemu_put_buffer_no_copy
  Use qemu_put_buffer_no_copy for guest memory pages
  coalesce adjacent iovecs

 arch_init.c                   |   2 +-
 include/migration/qemu-file.h |  12 ++++
 savevm.c                      | 127 ++++++++++++++++++++++++++++++++++--------
 3 files changed, 116 insertions(+), 25 deletions(-)

-- 
1.7.11.7

^ permalink raw reply	[flat|nested] 37+ messages in thread

* [Qemu-devel] [PATCH v3 1/9] Add QemuFileWritevBuffer QemuFileOps
  2013-03-21 16:05 [Qemu-devel] [PATCH v3 0/9] Migration: Remove copying of guest ram pages Orit Wasserman
@ 2013-03-21 16:05 ` Orit Wasserman
  2013-03-21 16:57   ` Juan Quintela
  2013-03-21 16:05 ` [Qemu-devel] [PATCH v3 2/9] Add socket_writev_buffer function Orit Wasserman
                   ` (9 subsequent siblings)
  10 siblings, 1 reply; 37+ messages in thread
From: Orit Wasserman @ 2013-03-21 16:05 UTC (permalink / raw)
  To: qemu-devel; +Cc: Orit Wasserman, pbonzini, mst, chegu_vinod, quintela

This will allow us to write an iovec

Signed-off-by: Orit Wasserman <owasserm@redhat.com>
---
 include/migration/qemu-file.h | 7 +++++++
 1 file changed, 7 insertions(+)

diff --git a/include/migration/qemu-file.h b/include/migration/qemu-file.h
index df81261..8d3da9b 100644
--- a/include/migration/qemu-file.h
+++ b/include/migration/qemu-file.h
@@ -51,11 +51,18 @@ typedef int (QEMUFileCloseFunc)(void *opaque);
  */
 typedef int (QEMUFileGetFD)(void *opaque);
 
+/*
+ * This function writes an iovec to file.
+ */
+typedef int (QEMUFileWritevBufferFunc)(void *opaque, struct iovec *iov,
+                                       int iovcnt);
+
 typedef struct QEMUFileOps {
     QEMUFilePutBufferFunc *put_buffer;
     QEMUFileGetBufferFunc *get_buffer;
     QEMUFileCloseFunc *close;
     QEMUFileGetFD *get_fd;
+    QEMUFileWritevBufferFunc *writev_buffer;
 } QEMUFileOps;
 
 QEMUFile *qemu_fopen_ops(void *opaque, const QEMUFileOps *ops);
-- 
1.7.11.7

^ permalink raw reply related	[flat|nested] 37+ messages in thread

* [Qemu-devel] [PATCH v3 2/9] Add socket_writev_buffer function
  2013-03-21 16:05 [Qemu-devel] [PATCH v3 0/9] Migration: Remove copying of guest ram pages Orit Wasserman
  2013-03-21 16:05 ` [Qemu-devel] [PATCH v3 1/9] Add QemuFileWritevBuffer QemuFileOps Orit Wasserman
@ 2013-03-21 16:05 ` Orit Wasserman
  2013-03-21 16:55   ` Juan Quintela
  2013-03-21 16:05 ` [Qemu-devel] [PATCH v3 3/9] Update bytes_xfer in qemu_put_byte Orit Wasserman
                   ` (8 subsequent siblings)
  10 siblings, 1 reply; 37+ messages in thread
From: Orit Wasserman @ 2013-03-21 16:05 UTC (permalink / raw)
  To: qemu-devel; +Cc: Orit Wasserman, pbonzini, mst, chegu_vinod, quintela

Signed-off-by: Orit Wasserman <owasserm@redhat.com>
---
 savevm.c | 14 ++++++++++++++
 1 file changed, 14 insertions(+)

diff --git a/savevm.c b/savevm.c
index 35c8d1e..baa45ae 100644
--- a/savevm.c
+++ b/savevm.c
@@ -39,6 +39,7 @@
 #include "qmp-commands.h"
 #include "trace.h"
 #include "qemu/bitops.h"
+#include "qemu/iov.h"
 
 #define SELF_ANNOUNCE_ROUNDS 5
 
@@ -171,6 +172,18 @@ static void coroutine_fn yield_until_fd_readable(int fd)
     qemu_coroutine_yield();
 }
 
+static int socket_writev_buffer(void *opaque, struct iovec *iov, int iovcnt)
+{
+    QEMUFileSocket *s = opaque;
+    ssize_t len;
+
+    len = iov_send(s->fd, iov, iovcnt, 0, iov_size(iov, iovcnt));
+    if (len < iov_size(iov, iovcnt)) {
+        len = -socket_error();
+    }
+    return len;
+}
+
 static int socket_get_fd(void *opaque)
 {
     QEMUFileSocket *s = opaque;
@@ -387,6 +400,7 @@ static const QEMUFileOps socket_read_ops = {
 static const QEMUFileOps socket_write_ops = {
     .get_fd =     socket_get_fd,
     .put_buffer = socket_put_buffer,
+    .writev_buffer = socket_writev_buffer,
     .close =      socket_close
 };
 
-- 
1.7.11.7

^ permalink raw reply related	[flat|nested] 37+ messages in thread

* [Qemu-devel] [PATCH v3 3/9] Update bytes_xfer in qemu_put_byte
  2013-03-21 16:05 [Qemu-devel] [PATCH v3 0/9] Migration: Remove copying of guest ram pages Orit Wasserman
  2013-03-21 16:05 ` [Qemu-devel] [PATCH v3 1/9] Add QemuFileWritevBuffer QemuFileOps Orit Wasserman
  2013-03-21 16:05 ` [Qemu-devel] [PATCH v3 2/9] Add socket_writev_buffer function Orit Wasserman
@ 2013-03-21 16:05 ` Orit Wasserman
  2013-03-21 17:06   ` Juan Quintela
  2013-03-21 17:38   ` Eric Blake
  2013-03-21 16:05 ` [Qemu-devel] [PATCH v3 4/9] Store the data to send also in iovec Orit Wasserman
                   ` (7 subsequent siblings)
  10 siblings, 2 replies; 37+ messages in thread
From: Orit Wasserman @ 2013-03-21 16:05 UTC (permalink / raw)
  To: qemu-devel; +Cc: Orit Wasserman, pbonzini, mst, chegu_vinod, quintela

Signed-off-by: Orit Wasserman <owasserm@redhat.com>
---
 savevm.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/savevm.c b/savevm.c
index baa45ae..686c8c8 100644
--- a/savevm.c
+++ b/savevm.c
@@ -647,6 +647,8 @@ void qemu_put_byte(QEMUFile *f, int v)
 
     f->buf[f->buf_index++] = v;
     f->is_write = 1;
+    f->bytes_xfer += 1;
+
     if (f->buf_index >= IO_BUF_SIZE) {
         qemu_fflush(f);
     }
-- 
1.7.11.7

^ permalink raw reply related	[flat|nested] 37+ messages in thread

* [Qemu-devel] [PATCH v3 4/9] Store the data to send also in iovec
  2013-03-21 16:05 [Qemu-devel] [PATCH v3 0/9] Migration: Remove copying of guest ram pages Orit Wasserman
                   ` (2 preceding siblings ...)
  2013-03-21 16:05 ` [Qemu-devel] [PATCH v3 3/9] Update bytes_xfer in qemu_put_byte Orit Wasserman
@ 2013-03-21 16:05 ` Orit Wasserman
  2013-03-21 17:11   ` Juan Quintela
  2013-03-21 16:05 ` [Qemu-devel] [PATCH v3 5/9] Use writev ops if available Orit Wasserman
                   ` (6 subsequent siblings)
  10 siblings, 1 reply; 37+ messages in thread
From: Orit Wasserman @ 2013-03-21 16:05 UTC (permalink / raw)
  To: qemu-devel; +Cc: Orit Wasserman, pbonzini, mst, chegu_vinod, quintela

All data is still copied into the static buffer.

Signed-off-by: Orit Wasserman <owasserm@redhat.com>
---
 savevm.c | 13 +++++++++++--
 1 file changed, 11 insertions(+), 2 deletions(-)

diff --git a/savevm.c b/savevm.c
index 686c8c8..ab81dd3 100644
--- a/savevm.c
+++ b/savevm.c
@@ -114,6 +114,7 @@ void qemu_announce_self(void)
 /* savevm/loadvm support */
 
 #define IO_BUF_SIZE 32768
+#define MAX_IOV_SIZE MIN(IOV_MAX, 64)
 
 struct QEMUFile {
     const QEMUFileOps *ops;
@@ -129,6 +130,9 @@ struct QEMUFile {
     int buf_size; /* 0 when writing */
     uint8_t buf[IO_BUF_SIZE];
 
+    struct iovec iov[MAX_IOV_SIZE];
+    unsigned int iovcnt;
+
     int last_error;
 };
 
@@ -527,6 +531,7 @@ static void qemu_fflush(QEMUFile *f)
             f->pos += f->buf_index;
         }
         f->buf_index = 0;
+        f->iovcnt = 0;
     }
     if (ret < 0) {
         qemu_file_set_error(f, ret);
@@ -619,12 +624,14 @@ void qemu_put_buffer(QEMUFile *f, const uint8_t *buf, int size)
         if (l > size)
             l = size;
         memcpy(f->buf + f->buf_index, buf, l);
+        f->iov[f->iovcnt].iov_base = f->buf + f->buf_index;
+        f->iov[f->iovcnt++].iov_len = l;
         f->is_write = 1;
         f->buf_index += l;
         f->bytes_xfer += l;
         buf += l;
         size -= l;
-        if (f->buf_index >= IO_BUF_SIZE) {
+        if (f->buf_index >= IO_BUF_SIZE || f->iovcnt >= MAX_IOV_SIZE) {
             qemu_fflush(f);
             if (qemu_file_get_error(f)) {
                 break;
@@ -648,8 +655,10 @@ void qemu_put_byte(QEMUFile *f, int v)
     f->buf[f->buf_index++] = v;
     f->is_write = 1;
     f->bytes_xfer += 1;
+    f->iov[f->iovcnt].iov_base = f->buf + (f->buf_index - 1);
+    f->iov[f->iovcnt++].iov_len = 1;
 
-    if (f->buf_index >= IO_BUF_SIZE) {
+    if (f->buf_index >= IO_BUF_SIZE || f->iovcnt >= MAX_IOV_SIZE) {
         qemu_fflush(f);
     }
 }
-- 
1.7.11.7

^ permalink raw reply related	[flat|nested] 37+ messages in thread

* [Qemu-devel] [PATCH v3 5/9] Use writev ops if available
  2013-03-21 16:05 [Qemu-devel] [PATCH v3 0/9] Migration: Remove copying of guest ram pages Orit Wasserman
                   ` (3 preceding siblings ...)
  2013-03-21 16:05 ` [Qemu-devel] [PATCH v3 4/9] Store the data to send also in iovec Orit Wasserman
@ 2013-03-21 16:05 ` Orit Wasserman
  2013-03-21 17:17   ` Juan Quintela
  2013-03-21 16:05 ` [Qemu-devel] [PATCH v3 6/9] More optimized qemu_put_be64/32/16 Orit Wasserman
                   ` (5 subsequent siblings)
  10 siblings, 1 reply; 37+ messages in thread
From: Orit Wasserman @ 2013-03-21 16:05 UTC (permalink / raw)
  To: qemu-devel; +Cc: Orit Wasserman, pbonzini, mst, chegu_vinod, quintela

Update qemu_fflush and stdio_close to use writev ops if they are available
Use the buffers stored in the iovec.

Signed-off-by: Orit Wasserman <owasserm@redhat.com>
---
 savevm.c | 30 +++++++++++++++++++++++-------
 1 file changed, 23 insertions(+), 7 deletions(-)

diff --git a/savevm.c b/savevm.c
index ab81dd3..f69dce3 100644
--- a/savevm.c
+++ b/savevm.c
@@ -292,7 +292,7 @@ static int stdio_fclose(void *opaque)
     QEMUFileStdio *s = opaque;
     int ret = 0;
 
-    if (s->file->ops->put_buffer) {
+    if (s->file->ops->put_buffer || s->file->ops->writev_buffer) {
         int fd = fileno(s->stdio_file);
         struct stat st;
 
@@ -515,24 +515,40 @@ static void qemu_file_set_error(QEMUFile *f, int ret)
     }
 }
 
-/** Flushes QEMUFile buffer
+/**
+ * Flushes QEMUFile buffer
  *
+ * If there is writev_buffer QEMUFileOps it uses it otherwise uses
+ * put_buffer ops.
  */
 static void qemu_fflush(QEMUFile *f)
 {
     int ret = 0;
+    int i = 0;
 
-    if (!f->ops->put_buffer) {
+    if (!f->ops->writev_buffer && !f->ops->put_buffer) {
         return;
     }
-    if (f->is_write && f->buf_index > 0) {
-        ret = f->ops->put_buffer(f->opaque, f->buf, f->pos, f->buf_index);
-        if (ret >= 0) {
-            f->pos += f->buf_index;
+
+    if (f->is_write && f->iovcnt > 0) {
+        if (f->ops->writev_buffer) {
+            ret = f->ops->writev_buffer(f->opaque, f->iov, f->iovcnt);
+            if (ret >= 0) {
+                f->pos += ret;
+            }
+        } else {
+            for (i = 0; i < f->iovcnt && ret >= 0; i++) {
+                ret = f->ops->put_buffer(f->opaque, f->iov[i].iov_base, f->pos,
+                                         f->iov[i].iov_len);
+                if (ret >= 0) {
+                    f->pos += ret;
+                }
+            }
         }
         f->buf_index = 0;
         f->iovcnt = 0;
     }
+
     if (ret < 0) {
         qemu_file_set_error(f, ret);
     }
-- 
1.7.11.7

^ permalink raw reply related	[flat|nested] 37+ messages in thread

* [Qemu-devel] [PATCH v3 6/9] More optimized qemu_put_be64/32/16
  2013-03-21 16:05 [Qemu-devel] [PATCH v3 0/9] Migration: Remove copying of guest ram pages Orit Wasserman
                   ` (4 preceding siblings ...)
  2013-03-21 16:05 ` [Qemu-devel] [PATCH v3 5/9] Use writev ops if available Orit Wasserman
@ 2013-03-21 16:05 ` Orit Wasserman
  2013-03-21 17:18   ` Juan Quintela
  2013-03-21 16:05 ` [Qemu-devel] [PATCH v3 7/9] Add qemu_put_buffer_no_copy Orit Wasserman
                   ` (4 subsequent siblings)
  10 siblings, 1 reply; 37+ messages in thread
From: Orit Wasserman @ 2013-03-21 16:05 UTC (permalink / raw)
  To: qemu-devel; +Cc: Orit Wasserman, pbonzini, mst, chegu_vinod, quintela

qemu_put_be functions used qemu_put_byte this caused lots of 1 bytes buffers
in the iovec.
we move to use cpu_put_be64/32/16wu and put a single buffer per call.

Signed-off-by: Orit Wasserman <owasserm@redhat.com>
---
 savevm.c | 20 ++++++++++++--------
 1 file changed, 12 insertions(+), 8 deletions(-)

diff --git a/savevm.c b/savevm.c
index f69dce3..83aa9e7 100644
--- a/savevm.c
+++ b/savevm.c
@@ -795,22 +795,26 @@ void qemu_file_reset_rate_limit(QEMUFile *f)
 
 void qemu_put_be16(QEMUFile *f, unsigned int v)
 {
-    qemu_put_byte(f, v >> 8);
-    qemu_put_byte(f, v);
+    uint16_t p;
+    cpu_to_be16wu(&p, v);
+
+    qemu_put_buffer(f, (uint8_t *)&p, sizeof(p));
 }
 
 void qemu_put_be32(QEMUFile *f, unsigned int v)
 {
-    qemu_put_byte(f, v >> 24);
-    qemu_put_byte(f, v >> 16);
-    qemu_put_byte(f, v >> 8);
-    qemu_put_byte(f, v);
+    uint32_t p;
+    cpu_to_be32wu(&p, v);
+
+    qemu_put_buffer(f, (uint8_t *)&p, sizeof(p));
 }
 
 void qemu_put_be64(QEMUFile *f, uint64_t v)
 {
-    qemu_put_be32(f, v >> 32);
-    qemu_put_be32(f, v);
+    uint64_t p;
+    cpu_to_be64wu(&p, v);
+
+    qemu_put_buffer(f, (uint8_t *)&p, sizeof(p));
 }
 
 unsigned int qemu_get_be16(QEMUFile *f)
-- 
1.7.11.7

^ permalink raw reply related	[flat|nested] 37+ messages in thread

* [Qemu-devel] [PATCH v3 7/9] Add qemu_put_buffer_no_copy
  2013-03-21 16:05 [Qemu-devel] [PATCH v3 0/9] Migration: Remove copying of guest ram pages Orit Wasserman
                   ` (5 preceding siblings ...)
  2013-03-21 16:05 ` [Qemu-devel] [PATCH v3 6/9] More optimized qemu_put_be64/32/16 Orit Wasserman
@ 2013-03-21 16:05 ` Orit Wasserman
  2013-03-21 17:34   ` Juan Quintela
  2013-03-21 16:05 ` [Qemu-devel] [PATCH v3 8/9] Use qemu_put_buffer_no_copy for guest memory pages Orit Wasserman
                   ` (3 subsequent siblings)
  10 siblings, 1 reply; 37+ messages in thread
From: Orit Wasserman @ 2013-03-21 16:05 UTC (permalink / raw)
  To: qemu-devel; +Cc: Orit Wasserman, pbonzini, mst, chegu_vinod, quintela

This allow us to add a buffer to the iovec to send without copying it
into the static buffer.

Signed-off-by: Orit Wasserman <owasserm@redhat.com>
---
 include/migration/qemu-file.h |  5 +++++
 savevm.c                      | 37 ++++++++++++++++++++++++++++---------
 2 files changed, 33 insertions(+), 9 deletions(-)

diff --git a/include/migration/qemu-file.h b/include/migration/qemu-file.h
index 8d3da9b..5168be2 100644
--- a/include/migration/qemu-file.h
+++ b/include/migration/qemu-file.h
@@ -75,6 +75,11 @@ int qemu_fclose(QEMUFile *f);
 int64_t qemu_ftell(QEMUFile *f);
 void qemu_put_buffer(QEMUFile *f, const uint8_t *buf, int size);
 void qemu_put_byte(QEMUFile *f, int v);
+/*
+ * put_buffer without copying the buffer.
+ * The buffer should be available till it is sent.
+ */
+void qemu_put_buffer_no_copy(QEMUFile *f, const uint8_t *buf, int size);
 
 static inline void qemu_put_ubyte(QEMUFile *f, unsigned int v)
 {
diff --git a/savevm.c b/savevm.c
index 83aa9e7..fbfb9e3 100644
--- a/savevm.c
+++ b/savevm.c
@@ -621,6 +621,30 @@ int qemu_fclose(QEMUFile *f)
     return ret;
 }
 
+
+void qemu_put_buffer_no_copy(QEMUFile *f, const uint8_t *buf, int size)
+{
+    if (f->last_error) {
+        return;
+    }
+
+    if (f->is_write == 0 && f->buf_index > 0) {
+        fprintf(stderr,
+                "Attempted to write to buffer while read buffer is not empty\n");
+        abort();
+    }
+
+    f->iov[f->iovcnt].iov_base = f->buf + f->buf_index;
+    f->iov[f->iovcnt++].iov_len = size;
+
+    f->is_write = 1;
+    f->bytes_xfer += size;
+
+    if (f->buf_index >= IO_BUF_SIZE || f->iovcnt >= MAX_IOV_SIZE) {
+        qemu_fflush(f);
+    }
+}
+
 void qemu_put_buffer(QEMUFile *f, const uint8_t *buf, int size)
 {
     int l;
@@ -640,19 +664,14 @@ void qemu_put_buffer(QEMUFile *f, const uint8_t *buf, int size)
         if (l > size)
             l = size;
         memcpy(f->buf + f->buf_index, buf, l);
-        f->iov[f->iovcnt].iov_base = f->buf + f->buf_index;
-        f->iov[f->iovcnt++].iov_len = l;
         f->is_write = 1;
         f->buf_index += l;
-        f->bytes_xfer += l;
+        qemu_put_buffer_no_copy(f, f->buf + (f->buf_index - l), l);
+        if (qemu_file_get_error(f)) {
+            break;
+        }
         buf += l;
         size -= l;
-        if (f->buf_index >= IO_BUF_SIZE || f->iovcnt >= MAX_IOV_SIZE) {
-            qemu_fflush(f);
-            if (qemu_file_get_error(f)) {
-                break;
-            }
-        }
     }
 }
 
-- 
1.7.11.7

^ permalink raw reply related	[flat|nested] 37+ messages in thread

* [Qemu-devel] [PATCH v3 8/9] Use qemu_put_buffer_no_copy for guest memory pages
  2013-03-21 16:05 [Qemu-devel] [PATCH v3 0/9] Migration: Remove copying of guest ram pages Orit Wasserman
                   ` (6 preceding siblings ...)
  2013-03-21 16:05 ` [Qemu-devel] [PATCH v3 7/9] Add qemu_put_buffer_no_copy Orit Wasserman
@ 2013-03-21 16:05 ` Orit Wasserman
  2013-03-21 16:14   ` Michael S. Tsirkin
  2013-03-21 17:37   ` Juan Quintela
  2013-03-21 16:05 ` [Qemu-devel] [PATCH v3 9/9] coalesce adjacent iovecs Orit Wasserman
                   ` (2 subsequent siblings)
  10 siblings, 2 replies; 37+ messages in thread
From: Orit Wasserman @ 2013-03-21 16:05 UTC (permalink / raw)
  To: qemu-devel; +Cc: Orit Wasserman, pbonzini, mst, chegu_vinod, quintela

This will remove an unneeded copy of guest memory pages.
For the page header and device state we still copy the data to the
static buffer the other option is to allocate the memory on demand
which is more expensive.

Signed-off-by: Orit Wasserman <owasserm@redhat.com>
---
 arch_init.c | 2 +-
 savevm.c    | 2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/arch_init.c b/arch_init.c
index 98e2bc6..27b53eb 100644
--- a/arch_init.c
+++ b/arch_init.c
@@ -481,7 +481,7 @@ static int ram_save_block(QEMUFile *f, bool last_stage)
             /* XBZRLE overflow or normal page */
             if (bytes_sent == -1) {
                 bytes_sent = save_block_hdr(f, block, offset, cont, RAM_SAVE_FLAG_PAGE);
-                qemu_put_buffer(f, p, TARGET_PAGE_SIZE);
+                qemu_put_buffer_no_copy(f, p, TARGET_PAGE_SIZE);
                 bytes_sent += TARGET_PAGE_SIZE;
                 acct_info.norm_pages++;
             }
diff --git a/savevm.c b/savevm.c
index fbfb9e3..50e8fb2 100644
--- a/savevm.c
+++ b/savevm.c
@@ -634,7 +634,7 @@ void qemu_put_buffer_no_copy(QEMUFile *f, const uint8_t *buf, int size)
         abort();
     }
 
-    f->iov[f->iovcnt].iov_base = f->buf + f->buf_index;
+    f->iov[f->iovcnt].iov_base = (uint8_t *)buf;
     f->iov[f->iovcnt++].iov_len = size;
 
     f->is_write = 1;
-- 
1.7.11.7

^ permalink raw reply related	[flat|nested] 37+ messages in thread

* [Qemu-devel] [PATCH v3 9/9] coalesce adjacent iovecs
  2013-03-21 16:05 [Qemu-devel] [PATCH v3 0/9] Migration: Remove copying of guest ram pages Orit Wasserman
                   ` (7 preceding siblings ...)
  2013-03-21 16:05 ` [Qemu-devel] [PATCH v3 8/9] Use qemu_put_buffer_no_copy for guest memory pages Orit Wasserman
@ 2013-03-21 16:05 ` Orit Wasserman
  2013-03-21 16:16   ` Michael S. Tsirkin
  2013-03-21 17:41   ` Juan Quintela
  2013-03-21 17:12 ` [Qemu-devel] [PATCH v3 0/9] Migration: Remove copying of guest ram pages Paolo Bonzini
  2013-03-21 17:42 ` Juan Quintela
  10 siblings, 2 replies; 37+ messages in thread
From: Orit Wasserman @ 2013-03-21 16:05 UTC (permalink / raw)
  To: qemu-devel; +Cc: Orit Wasserman, pbonzini, mst, chegu_vinod, quintela

This way we send one big buffer instead of many small ones

Signed-off-by: Orit Wasserman <owasserm@redhat.com>
---
 savevm.c | 21 +++++++++++++++++----
 1 file changed, 17 insertions(+), 4 deletions(-)

diff --git a/savevm.c b/savevm.c
index 50e8fb2..13a533b 100644
--- a/savevm.c
+++ b/savevm.c
@@ -634,8 +634,14 @@ void qemu_put_buffer_no_copy(QEMUFile *f, const uint8_t *buf, int size)
         abort();
     }
 
-    f->iov[f->iovcnt].iov_base = (uint8_t *)buf;
-    f->iov[f->iovcnt++].iov_len = size;
+    /* check for adjoint buffer and colace them */
+    if (f->iovcnt > 0 && buf == f->iov[f->iovcnt - 1].iov_base +
+        f->iov[f->iovcnt - 1].iov_len) {
+        f->iov[f->iovcnt - 1].iov_len += size;
+    } else {
+        f->iov[f->iovcnt].iov_base = (uint8_t *)buf;
+        f->iov[f->iovcnt++].iov_len = size;
+    }
 
     f->is_write = 1;
     f->bytes_xfer += size;
@@ -690,8 +696,15 @@ void qemu_put_byte(QEMUFile *f, int v)
     f->buf[f->buf_index++] = v;
     f->is_write = 1;
     f->bytes_xfer += 1;
-    f->iov[f->iovcnt].iov_base = f->buf + (f->buf_index - 1);
-    f->iov[f->iovcnt++].iov_len = 1;
+
+    /* check for adjoint buffer and colace them */
+    if (f->iovcnt > 0 && f->buf + (f->buf_index - 1)  ==
+        f->iov[f->iovcnt - 1].iov_base + f->iov[f->iovcnt - 1].iov_len) {
+        f->iov[f->iovcnt - 1].iov_len += 1;
+    } else {
+        f->iov[f->iovcnt].iov_base = f->buf + (f->buf_index - 1);
+        f->iov[f->iovcnt++].iov_len = 1;
+    }
 
     if (f->buf_index >= IO_BUF_SIZE || f->iovcnt >= MAX_IOV_SIZE) {
         qemu_fflush(f);
-- 
1.7.11.7

^ permalink raw reply related	[flat|nested] 37+ messages in thread

* Re: [Qemu-devel] [PATCH v3 8/9] Use qemu_put_buffer_no_copy for guest memory pages
  2013-03-21 16:05 ` [Qemu-devel] [PATCH v3 8/9] Use qemu_put_buffer_no_copy for guest memory pages Orit Wasserman
@ 2013-03-21 16:14   ` Michael S. Tsirkin
  2013-03-21 17:37   ` Juan Quintela
  1 sibling, 0 replies; 37+ messages in thread
From: Michael S. Tsirkin @ 2013-03-21 16:14 UTC (permalink / raw)
  To: Orit Wasserman; +Cc: pbonzini, chegu_vinod, qemu-devel, quintela

On Thu, Mar 21, 2013 at 06:05:39PM +0200, Orit Wasserman wrote:
> This will remove an unneeded copy of guest memory pages.
> For the page header and device state we still copy the data to the
> static buffer the other option is to allocate the memory on demand
> which is more expensive.
> 
> Signed-off-by: Orit Wasserman <owasserm@redhat.com>

Okay so with this, the _nocopy can be later rewritten to do vmsplice to
save another copy? Cool.

> ---
>  arch_init.c | 2 +-
>  savevm.c    | 2 +-
>  2 files changed, 2 insertions(+), 2 deletions(-)
> 
> diff --git a/arch_init.c b/arch_init.c
> index 98e2bc6..27b53eb 100644
> --- a/arch_init.c
> +++ b/arch_init.c
> @@ -481,7 +481,7 @@ static int ram_save_block(QEMUFile *f, bool last_stage)
>              /* XBZRLE overflow or normal page */
>              if (bytes_sent == -1) {
>                  bytes_sent = save_block_hdr(f, block, offset, cont, RAM_SAVE_FLAG_PAGE);
> -                qemu_put_buffer(f, p, TARGET_PAGE_SIZE);
> +                qemu_put_buffer_no_copy(f, p, TARGET_PAGE_SIZE);
>                  bytes_sent += TARGET_PAGE_SIZE;
>                  acct_info.norm_pages++;
>              }
> diff --git a/savevm.c b/savevm.c
> index fbfb9e3..50e8fb2 100644
> --- a/savevm.c
> +++ b/savevm.c
> @@ -634,7 +634,7 @@ void qemu_put_buffer_no_copy(QEMUFile *f, const uint8_t *buf, int size)
>          abort();
>      }
>  
> -    f->iov[f->iovcnt].iov_base = f->buf + f->buf_index;
> +    f->iov[f->iovcnt].iov_base = (uint8_t *)buf;
>      f->iov[f->iovcnt++].iov_len = size;
>  
>      f->is_write = 1;
> -- 
> 1.7.11.7

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [Qemu-devel] [PATCH v3 9/9] coalesce adjacent iovecs
  2013-03-21 16:05 ` [Qemu-devel] [PATCH v3 9/9] coalesce adjacent iovecs Orit Wasserman
@ 2013-03-21 16:16   ` Michael S. Tsirkin
  2013-03-21 16:27     ` Orit Wasserman
  2013-03-21 17:41   ` Juan Quintela
  1 sibling, 1 reply; 37+ messages in thread
From: Michael S. Tsirkin @ 2013-03-21 16:16 UTC (permalink / raw)
  To: Orit Wasserman; +Cc: pbonzini, chegu_vinod, qemu-devel, quintela

On Thu, Mar 21, 2013 at 06:05:40PM +0200, Orit Wasserman wrote:
> This way we send one big buffer instead of many small ones
> 
> Signed-off-by: Orit Wasserman <owasserm@redhat.com>

Why does this happen BTW?

> ---
>  savevm.c | 21 +++++++++++++++++----
>  1 file changed, 17 insertions(+), 4 deletions(-)
> 
> diff --git a/savevm.c b/savevm.c
> index 50e8fb2..13a533b 100644
> --- a/savevm.c
> +++ b/savevm.c
> @@ -634,8 +634,14 @@ void qemu_put_buffer_no_copy(QEMUFile *f, const uint8_t *buf, int size)
>          abort();
>      }
>  
> -    f->iov[f->iovcnt].iov_base = (uint8_t *)buf;
> -    f->iov[f->iovcnt++].iov_len = size;
> +    /* check for adjoint buffer and colace them */
> +    if (f->iovcnt > 0 && buf == f->iov[f->iovcnt - 1].iov_base +
> +        f->iov[f->iovcnt - 1].iov_len) {
> +        f->iov[f->iovcnt - 1].iov_len += size;
> +    } else {
> +        f->iov[f->iovcnt].iov_base = (uint8_t *)buf;
> +        f->iov[f->iovcnt++].iov_len = size;
> +    }
>  
>      f->is_write = 1;
>      f->bytes_xfer += size;
> @@ -690,8 +696,15 @@ void qemu_put_byte(QEMUFile *f, int v)
>      f->buf[f->buf_index++] = v;
>      f->is_write = 1;
>      f->bytes_xfer += 1;
> -    f->iov[f->iovcnt].iov_base = f->buf + (f->buf_index - 1);
> -    f->iov[f->iovcnt++].iov_len = 1;
> +
> +    /* check for adjoint buffer and colace them */
> +    if (f->iovcnt > 0 && f->buf + (f->buf_index - 1)  ==
> +        f->iov[f->iovcnt - 1].iov_base + f->iov[f->iovcnt - 1].iov_len) {
> +        f->iov[f->iovcnt - 1].iov_len += 1;
> +    } else {
> +        f->iov[f->iovcnt].iov_base = f->buf + (f->buf_index - 1);
> +        f->iov[f->iovcnt++].iov_len = 1;
> +    }
>  
>      if (f->buf_index >= IO_BUF_SIZE || f->iovcnt >= MAX_IOV_SIZE) {
>          qemu_fflush(f);
> -- 
> 1.7.11.7

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [Qemu-devel] [PATCH v3 9/9] coalesce adjacent iovecs
  2013-03-21 16:16   ` Michael S. Tsirkin
@ 2013-03-21 16:27     ` Orit Wasserman
  2013-03-21 16:29       ` Michael S. Tsirkin
  0 siblings, 1 reply; 37+ messages in thread
From: Orit Wasserman @ 2013-03-21 16:27 UTC (permalink / raw)
  To: Michael S. Tsirkin; +Cc: pbonzini, chegu_vinod, qemu-devel, quintela

On 03/21/2013 06:16 PM, Michael S. Tsirkin wrote:
> On Thu, Mar 21, 2013 at 06:05:40PM +0200, Orit Wasserman wrote:
>> This way we send one big buffer instead of many small ones
>>
>> Signed-off-by: Orit Wasserman <owasserm@redhat.com>
> 
> Why does this happen BTW?

It happens in the last phase when we send the device state that consists of a lot
bytes and int field that are written using qemu_put_byte/be16/...


> 
>> ---
>>  savevm.c | 21 +++++++++++++++++----
>>  1 file changed, 17 insertions(+), 4 deletions(-)
>>
>> diff --git a/savevm.c b/savevm.c
>> index 50e8fb2..13a533b 100644
>> --- a/savevm.c
>> +++ b/savevm.c
>> @@ -634,8 +634,14 @@ void qemu_put_buffer_no_copy(QEMUFile *f, const uint8_t *buf, int size)
>>          abort();
>>      }
>>  
>> -    f->iov[f->iovcnt].iov_base = (uint8_t *)buf;
>> -    f->iov[f->iovcnt++].iov_len = size;
>> +    /* check for adjoint buffer and colace them */
>> +    if (f->iovcnt > 0 && buf == f->iov[f->iovcnt - 1].iov_base +
>> +        f->iov[f->iovcnt - 1].iov_len) {
>> +        f->iov[f->iovcnt - 1].iov_len += size;
>> +    } else {
>> +        f->iov[f->iovcnt].iov_base = (uint8_t *)buf;
>> +        f->iov[f->iovcnt++].iov_len = size;
>> +    }
>>  
>>      f->is_write = 1;
>>      f->bytes_xfer += size;
>> @@ -690,8 +696,15 @@ void qemu_put_byte(QEMUFile *f, int v)
>>      f->buf[f->buf_index++] = v;
>>      f->is_write = 1;
>>      f->bytes_xfer += 1;
>> -    f->iov[f->iovcnt].iov_base = f->buf + (f->buf_index - 1);
>> -    f->iov[f->iovcnt++].iov_len = 1;
>> +
>> +    /* check for adjoint buffer and colace them */
>> +    if (f->iovcnt > 0 && f->buf + (f->buf_index - 1)  ==
>> +        f->iov[f->iovcnt - 1].iov_base + f->iov[f->iovcnt - 1].iov_len) {
>> +        f->iov[f->iovcnt - 1].iov_len += 1;
>> +    } else {
>> +        f->iov[f->iovcnt].iov_base = f->buf + (f->buf_index - 1);
>> +        f->iov[f->iovcnt++].iov_len = 1;
>> +    }
>>  
>>      if (f->buf_index >= IO_BUF_SIZE || f->iovcnt >= MAX_IOV_SIZE) {
>>          qemu_fflush(f);
>> -- 
>> 1.7.11.7

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [Qemu-devel] [PATCH v3 9/9] coalesce adjacent iovecs
  2013-03-21 16:27     ` Orit Wasserman
@ 2013-03-21 16:29       ` Michael S. Tsirkin
  2013-03-21 16:40         ` Orit Wasserman
  2013-03-21 17:44         ` Juan Quintela
  0 siblings, 2 replies; 37+ messages in thread
From: Michael S. Tsirkin @ 2013-03-21 16:29 UTC (permalink / raw)
  To: Orit Wasserman; +Cc: pbonzini, chegu_vinod, qemu-devel, quintela

On Thu, Mar 21, 2013 at 06:27:42PM +0200, Orit Wasserman wrote:
> On 03/21/2013 06:16 PM, Michael S. Tsirkin wrote:
> > On Thu, Mar 21, 2013 at 06:05:40PM +0200, Orit Wasserman wrote:
> >> This way we send one big buffer instead of many small ones
> >>
> >> Signed-off-by: Orit Wasserman <owasserm@redhat.com>
> > 
> > Why does this happen BTW?
> 
> It happens in the last phase when we send the device state that consists of a lot
> bytes and int field that are written using qemu_put_byte/be16/...
> 

Confused  I thought device_state does not use _nocopy?
My idea of using vmsplice relies exactly on this:
we can not splice device state ...

> > 
> >> ---
> >>  savevm.c | 21 +++++++++++++++++----
> >>  1 file changed, 17 insertions(+), 4 deletions(-)
> >>
> >> diff --git a/savevm.c b/savevm.c
> >> index 50e8fb2..13a533b 100644
> >> --- a/savevm.c
> >> +++ b/savevm.c
> >> @@ -634,8 +634,14 @@ void qemu_put_buffer_no_copy(QEMUFile *f, const uint8_t *buf, int size)
> >>          abort();
> >>      }
> >>  
> >> -    f->iov[f->iovcnt].iov_base = (uint8_t *)buf;
> >> -    f->iov[f->iovcnt++].iov_len = size;
> >> +    /* check for adjoint buffer and colace them */
> >> +    if (f->iovcnt > 0 && buf == f->iov[f->iovcnt - 1].iov_base +
> >> +        f->iov[f->iovcnt - 1].iov_len) {
> >> +        f->iov[f->iovcnt - 1].iov_len += size;
> >> +    } else {
> >> +        f->iov[f->iovcnt].iov_base = (uint8_t *)buf;
> >> +        f->iov[f->iovcnt++].iov_len = size;
> >> +    }
> >>  
> >>      f->is_write = 1;
> >>      f->bytes_xfer += size;
> >> @@ -690,8 +696,15 @@ void qemu_put_byte(QEMUFile *f, int v)
> >>      f->buf[f->buf_index++] = v;
> >>      f->is_write = 1;
> >>      f->bytes_xfer += 1;
> >> -    f->iov[f->iovcnt].iov_base = f->buf + (f->buf_index - 1);
> >> -    f->iov[f->iovcnt++].iov_len = 1;
> >> +
> >> +    /* check for adjoint buffer and colace them */
> >> +    if (f->iovcnt > 0 && f->buf + (f->buf_index - 1)  ==
> >> +        f->iov[f->iovcnt - 1].iov_base + f->iov[f->iovcnt - 1].iov_len) {
> >> +        f->iov[f->iovcnt - 1].iov_len += 1;
> >> +    } else {
> >> +        f->iov[f->iovcnt].iov_base = f->buf + (f->buf_index - 1);
> >> +        f->iov[f->iovcnt++].iov_len = 1;
> >> +    }
> >>  
> >>      if (f->buf_index >= IO_BUF_SIZE || f->iovcnt >= MAX_IOV_SIZE) {
> >>          qemu_fflush(f);
> >> -- 
> >> 1.7.11.7

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [Qemu-devel] [PATCH v3 9/9] coalesce adjacent iovecs
  2013-03-21 16:29       ` Michael S. Tsirkin
@ 2013-03-21 16:40         ` Orit Wasserman
  2013-03-21 17:10           ` Michael S. Tsirkin
  2013-03-21 17:44         ` Juan Quintela
  1 sibling, 1 reply; 37+ messages in thread
From: Orit Wasserman @ 2013-03-21 16:40 UTC (permalink / raw)
  To: Michael S. Tsirkin; +Cc: pbonzini, chegu_vinod, qemu-devel, quintela

On 03/21/2013 06:29 PM, Michael S. Tsirkin wrote:
> On Thu, Mar 21, 2013 at 06:27:42PM +0200, Orit Wasserman wrote:
>> On 03/21/2013 06:16 PM, Michael S. Tsirkin wrote:
>>> On Thu, Mar 21, 2013 at 06:05:40PM +0200, Orit Wasserman wrote:
>>>> This way we send one big buffer instead of many small ones
>>>>
>>>> Signed-off-by: Orit Wasserman <owasserm@redhat.com>
>>>
>>> Why does this happen BTW?
>>
>> It happens in the last phase when we send the device state that consists of a lot
>> bytes and int field that are written using qemu_put_byte/be16/...
>>
> 
> Confused  I thought device_state does not use _nocopy?
> My idea of using vmsplice relies exactly on this:
> we can not splice device state ...

qemu_put_buffer calls qemu_put_buffer_no_copy (you know code reuse)
So I guess we will need to add some flag if we want to use vmsplice or not
or split no_copy into external and internal. It won't be a big change

> 
>>>
>>>> ---
>>>>  savevm.c | 21 +++++++++++++++++----
>>>>  1 file changed, 17 insertions(+), 4 deletions(-)
>>>>
>>>> diff --git a/savevm.c b/savevm.c
>>>> index 50e8fb2..13a533b 100644
>>>> --- a/savevm.c
>>>> +++ b/savevm.c
>>>> @@ -634,8 +634,14 @@ void qemu_put_buffer_no_copy(QEMUFile *f, const uint8_t *buf, int size)
>>>>          abort();
>>>>      }
>>>>  
>>>> -    f->iov[f->iovcnt].iov_base = (uint8_t *)buf;
>>>> -    f->iov[f->iovcnt++].iov_len = size;
>>>> +    /* check for adjoint buffer and colace them */
>>>> +    if (f->iovcnt > 0 && buf == f->iov[f->iovcnt - 1].iov_base +
>>>> +        f->iov[f->iovcnt - 1].iov_len) {
>>>> +        f->iov[f->iovcnt - 1].iov_len += size;
>>>> +    } else {
>>>> +        f->iov[f->iovcnt].iov_base = (uint8_t *)buf;
>>>> +        f->iov[f->iovcnt++].iov_len = size;
>>>> +    }
>>>>  
>>>>      f->is_write = 1;
>>>>      f->bytes_xfer += size;
>>>> @@ -690,8 +696,15 @@ void qemu_put_byte(QEMUFile *f, int v)
>>>>      f->buf[f->buf_index++] = v;
>>>>      f->is_write = 1;
>>>>      f->bytes_xfer += 1;
>>>> -    f->iov[f->iovcnt].iov_base = f->buf + (f->buf_index - 1);
>>>> -    f->iov[f->iovcnt++].iov_len = 1;
>>>> +
>>>> +    /* check for adjoint buffer and colace them */
>>>> +    if (f->iovcnt > 0 && f->buf + (f->buf_index - 1)  ==
>>>> +        f->iov[f->iovcnt - 1].iov_base + f->iov[f->iovcnt - 1].iov_len) {
>>>> +        f->iov[f->iovcnt - 1].iov_len += 1;
>>>> +    } else {
>>>> +        f->iov[f->iovcnt].iov_base = f->buf + (f->buf_index - 1);
>>>> +        f->iov[f->iovcnt++].iov_len = 1;
>>>> +    }
>>>>  
>>>>      if (f->buf_index >= IO_BUF_SIZE || f->iovcnt >= MAX_IOV_SIZE) {
>>>>          qemu_fflush(f);
>>>> -- 
>>>> 1.7.11.7

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [Qemu-devel] [PATCH v3 2/9] Add socket_writev_buffer function
  2013-03-21 16:05 ` [Qemu-devel] [PATCH v3 2/9] Add socket_writev_buffer function Orit Wasserman
@ 2013-03-21 16:55   ` Juan Quintela
  0 siblings, 0 replies; 37+ messages in thread
From: Juan Quintela @ 2013-03-21 16:55 UTC (permalink / raw)
  To: Orit Wasserman; +Cc: pbonzini, chegu_vinod, qemu-devel, mst

Orit Wasserman <owasserm@redhat.com> wrote:
D> Signed-off-by: Orit Wasserman <owasserm@redhat.com>
> ---
>  savevm.c | 14 ++++++++++++++
>  1 file changed, 14 insertions(+)
>
> diff --git a/savevm.c b/savevm.c
> index 35c8d1e..baa45ae 100644
> --- a/savevm.c
> +++ b/savevm.c
> @@ -39,6 +39,7 @@
>  #include "qmp-commands.h"
>  #include "trace.h"
>  #include "qemu/bitops.h"
> +#include "qemu/iov.h"
>  
>  #define SELF_ANNOUNCE_ROUNDS 5
>  
> @@ -171,6 +172,18 @@ static void coroutine_fn yield_until_fd_readable(int fd)
>      qemu_coroutine_yield();
>  }
>  
> +static int socket_writev_buffer(void *opaque, struct iovec *iov, int iovcnt)
> +{
> +    QEMUFileSocket *s = opaque;
> +    ssize_t len;
> +
> +    len = iov_send(s->fd, iov, iovcnt, 0, iov_size(iov, iovcnt));
> +    if (len < iov_size(iov, iovcnt)) {

Not the more expensive operation ever, but we can factor
  iov_size(iov, iovcnt)
value into one variable.

> +        len = -socket_error();
> +    }
> +    return len;
> +}
> +
>  static int socket_get_fd(void *opaque)
>  {
>      QEMUFileSocket *s = opaque;
> @@ -387,6 +400,7 @@ static const QEMUFileOps socket_read_ops = {
>  static const QEMUFileOps socket_write_ops = {
>      .get_fd =     socket_get_fd,
>      .put_buffer = socket_put_buffer,
> +    .writev_buffer = socket_writev_buffer,
>      .close =      socket_close
>  };

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [Qemu-devel] [PATCH v3 1/9] Add QemuFileWritevBuffer QemuFileOps
  2013-03-21 16:05 ` [Qemu-devel] [PATCH v3 1/9] Add QemuFileWritevBuffer QemuFileOps Orit Wasserman
@ 2013-03-21 16:57   ` Juan Quintela
  0 siblings, 0 replies; 37+ messages in thread
From: Juan Quintela @ 2013-03-21 16:57 UTC (permalink / raw)
  To: Orit Wasserman; +Cc: pbonzini, chegu_vinod, qemu-devel, mst

Orit Wasserman <owasserm@redhat.com> wrote:
> This will allow us to write an iovec
>
> Signed-off-by: Orit Wasserman <owasserm@redhat.com>

Reviewed-by: Juan Quintela <quintela@redhat.com>

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [Qemu-devel] [PATCH v3 3/9] Update bytes_xfer in qemu_put_byte
  2013-03-21 16:05 ` [Qemu-devel] [PATCH v3 3/9] Update bytes_xfer in qemu_put_byte Orit Wasserman
@ 2013-03-21 17:06   ` Juan Quintela
  2013-03-21 17:38   ` Eric Blake
  1 sibling, 0 replies; 37+ messages in thread
From: Juan Quintela @ 2013-03-21 17:06 UTC (permalink / raw)
  To: Orit Wasserman; +Cc: pbonzini, chegu_vinod, qemu-devel, mst

Orit Wasserman <owasserm@redhat.com> wrote:
> Signed-off-by: Orit Wasserman <owasserm@redhat.com>

Acked-by: Juan Quintela <quintela@redhat.com>

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [Qemu-devel] [PATCH v3 9/9] coalesce adjacent iovecs
  2013-03-21 16:40         ` Orit Wasserman
@ 2013-03-21 17:10           ` Michael S. Tsirkin
  0 siblings, 0 replies; 37+ messages in thread
From: Michael S. Tsirkin @ 2013-03-21 17:10 UTC (permalink / raw)
  To: Orit Wasserman; +Cc: pbonzini, chegu_vinod, qemu-devel, quintela

On Thu, Mar 21, 2013 at 06:40:24PM +0200, Orit Wasserman wrote:
> On 03/21/2013 06:29 PM, Michael S. Tsirkin wrote:
> > On Thu, Mar 21, 2013 at 06:27:42PM +0200, Orit Wasserman wrote:
> >> On 03/21/2013 06:16 PM, Michael S. Tsirkin wrote:
> >>> On Thu, Mar 21, 2013 at 06:05:40PM +0200, Orit Wasserman wrote:
> >>>> This way we send one big buffer instead of many small ones
> >>>>
> >>>> Signed-off-by: Orit Wasserman <owasserm@redhat.com>
> >>>
> >>> Why does this happen BTW?
> >>
> >> It happens in the last phase when we send the device state that consists of a lot
> >> bytes and int field that are written using qemu_put_byte/be16/...
> >>
> > 
> > Confused  I thought device_state does not use _nocopy?
> > My idea of using vmsplice relies exactly on this:
> > we can not splice device state ...
> 
> qemu_put_buffer calls qemu_put_buffer_no_copy (you know code reuse)
> So I guess we will need to add some flag if we want to use vmsplice or not
> or split no_copy into external and internal. It won't be a big change

Fair enough. Have time to look into this? Data copies is like 15% CPU,
more with swap ...

> > 
> >>>
> >>>> ---
> >>>>  savevm.c | 21 +++++++++++++++++----
> >>>>  1 file changed, 17 insertions(+), 4 deletions(-)
> >>>>
> >>>> diff --git a/savevm.c b/savevm.c
> >>>> index 50e8fb2..13a533b 100644
> >>>> --- a/savevm.c
> >>>> +++ b/savevm.c
> >>>> @@ -634,8 +634,14 @@ void qemu_put_buffer_no_copy(QEMUFile *f, const uint8_t *buf, int size)
> >>>>          abort();
> >>>>      }
> >>>>  
> >>>> -    f->iov[f->iovcnt].iov_base = (uint8_t *)buf;
> >>>> -    f->iov[f->iovcnt++].iov_len = size;
> >>>> +    /* check for adjoint buffer and colace them */
> >>>> +    if (f->iovcnt > 0 && buf == f->iov[f->iovcnt - 1].iov_base +
> >>>> +        f->iov[f->iovcnt - 1].iov_len) {
> >>>> +        f->iov[f->iovcnt - 1].iov_len += size;
> >>>> +    } else {
> >>>> +        f->iov[f->iovcnt].iov_base = (uint8_t *)buf;
> >>>> +        f->iov[f->iovcnt++].iov_len = size;
> >>>> +    }
> >>>>  
> >>>>      f->is_write = 1;
> >>>>      f->bytes_xfer += size;
> >>>> @@ -690,8 +696,15 @@ void qemu_put_byte(QEMUFile *f, int v)
> >>>>      f->buf[f->buf_index++] = v;
> >>>>      f->is_write = 1;
> >>>>      f->bytes_xfer += 1;
> >>>> -    f->iov[f->iovcnt].iov_base = f->buf + (f->buf_index - 1);
> >>>> -    f->iov[f->iovcnt++].iov_len = 1;
> >>>> +
> >>>> +    /* check for adjoint buffer and colace them */
> >>>> +    if (f->iovcnt > 0 && f->buf + (f->buf_index - 1)  ==
> >>>> +        f->iov[f->iovcnt - 1].iov_base + f->iov[f->iovcnt - 1].iov_len) {
> >>>> +        f->iov[f->iovcnt - 1].iov_len += 1;
> >>>> +    } else {
> >>>> +        f->iov[f->iovcnt].iov_base = f->buf + (f->buf_index - 1);
> >>>> +        f->iov[f->iovcnt++].iov_len = 1;
> >>>> +    }
> >>>>  
> >>>>      if (f->buf_index >= IO_BUF_SIZE || f->iovcnt >= MAX_IOV_SIZE) {
> >>>>          qemu_fflush(f);
> >>>> -- 
> >>>> 1.7.11.7

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [Qemu-devel] [PATCH v3 4/9] Store the data to send also in iovec
  2013-03-21 16:05 ` [Qemu-devel] [PATCH v3 4/9] Store the data to send also in iovec Orit Wasserman
@ 2013-03-21 17:11   ` Juan Quintela
  0 siblings, 0 replies; 37+ messages in thread
From: Juan Quintela @ 2013-03-21 17:11 UTC (permalink / raw)
  To: Orit Wasserman; +Cc: pbonzini, chegu_vinod, qemu-devel, mst

Orit Wasserman <owasserm@redhat.com> wrote:
> All data is still copied into the static buffer.
>
> Signed-off-by: Orit Wasserman <owasserm@redhat.com>
> ---
>  savevm.c | 13 +++++++++++--
>  1 file changed, 11 insertions(+), 2 deletions(-)
>
> diff --git a/savevm.c b/savevm.c
> index 686c8c8..ab81dd3 100644
> --- a/savevm.c
> +++ b/savevm.c
> @@ -114,6 +114,7 @@ void qemu_announce_self(void)
>  /* savevm/loadvm support */
>  
>  #define IO_BUF_SIZE 32768
> +#define MAX_IOV_SIZE MIN(IOV_MAX, 64)
>  
>  struct QEMUFile {
>      const QEMUFileOps *ops;
> @@ -129,6 +130,9 @@ struct QEMUFile {
>      int buf_size; /* 0 when writing */
>      uint8_t buf[IO_BUF_SIZE];
>  
> +    struct iovec iov[MAX_IOV_SIZE];
> +    unsigned int iovcnt;
> +
>      int last_error;
>  };
>  
> @@ -527,6 +531,7 @@ static void qemu_fflush(QEMUFile *f)
>              f->pos += f->buf_index;
>          }
>          f->buf_index = 0;
> +        f->iovcnt = 0;
>      }
>      if (ret < 0) {
>          qemu_file_set_error(f, ret);
> @@ -619,12 +624,14 @@ void qemu_put_buffer(QEMUFile *f, const uint8_t *buf, int size)
>          if (l > size)
>              l = size;
>          memcpy(f->buf + f->buf_index, buf, l);
> +        f->iov[f->iovcnt].iov_base = f->buf + f->buf_index;
> +        f->iov[f->iovcnt++].iov_len = l;
>          f->is_write = 1;
>          f->buf_index += l;
>          f->bytes_xfer += l;

After reading the patch several times, we need to do a s/l/len/ on this
function, 1 and l are too similar, and I was reading the wrong value :p

>          buf += l;
>          size -= l;
> -        if (f->buf_index >= IO_BUF_SIZE) {
> +        if (f->buf_index >= IO_BUF_SIZE || f->iovcnt >= MAX_IOV_SIZE) {
>              qemu_fflush(f);
>              if (qemu_file_get_error(f)) {
>                  break;
> @@ -648,8 +655,10 @@ void qemu_put_byte(QEMUFile *f, int v)
>      f->buf[f->buf_index++] = v;
>      f->is_write = 1;
>      f->bytes_xfer += 1;
> +    f->iov[f->iovcnt].iov_base = f->buf + (f->buf_index - 1);
> +    f->iov[f->iovcnt++].iov_len = 1;
>  
> -    if (f->buf_index >= IO_BUF_SIZE) {
> +    if (f->buf_index >= IO_BUF_SIZE || f->iovcnt >= MAX_IOV_SIZE) {
>          qemu_fflush(f);
>      }
>  }

Reviewed-by: Juan Quintela <quintela@redhat.com>

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [Qemu-devel] [PATCH v3 0/9] Migration: Remove copying of guest ram pages
  2013-03-21 16:05 [Qemu-devel] [PATCH v3 0/9] Migration: Remove copying of guest ram pages Orit Wasserman
                   ` (8 preceding siblings ...)
  2013-03-21 16:05 ` [Qemu-devel] [PATCH v3 9/9] coalesce adjacent iovecs Orit Wasserman
@ 2013-03-21 17:12 ` Paolo Bonzini
  2013-03-21 17:35   ` Orit Wasserman
  2013-03-21 17:42 ` Juan Quintela
  10 siblings, 1 reply; 37+ messages in thread
From: Paolo Bonzini @ 2013-03-21 17:12 UTC (permalink / raw)
  To: Orit Wasserman; +Cc: mst, chegu_vinod, qemu-devel, quintela

Il 21/03/2013 17:05, Orit Wasserman ha scritto:
> In migration all data is copied to a static buffer in QEMUFile,
> this hurts our network bandwidth and CPU usage especially with large guests.
> We switched to iovec for storing different buffers to send (even a byte field is
> considered as a buffer) and use writev to send the iovec.
> writev was chosen (as apposed to sendmsg) because it supprts non socket fds.
>   
> Guest memory pages are not copied by calling a new function 
> qemu_put_buffer_no_copy.
> The page header data and device state data are still copied into the static
> buffer. This data consists of a lot of bytes and integer fields and the static
> buffer is used to store it during batching.
> Another improvement is changing qemu_putbe64/32/16 to create a single
> buffer instead of several byte sized buffer.
> 
> git repository: git://github.com/oritwas/qemu.git sendv_v2
> 
> Change from v2:
> Always send data for the iovec even if writev_buffer is not implemented.
> Coalesce adjacent iovecs to create one big buffer from small adjacent buffer.
> 
> Changes from v1:
> Use iov_send for socket.
> Make writev_buffer optional and if it is not implemented use put_buffer

I didn't review it too closely, so I don't feel like giving my
Reviewed-by.  Still, I must say it is a very nice and simple solution.
Kudos!

Just one question: patch 6 is not strictly needed anymore now that you
have coalescing, no?  Should Juan leave it out?

Paolo

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [Qemu-devel] [PATCH v3 5/9] Use writev ops if available
  2013-03-21 16:05 ` [Qemu-devel] [PATCH v3 5/9] Use writev ops if available Orit Wasserman
@ 2013-03-21 17:17   ` Juan Quintela
  0 siblings, 0 replies; 37+ messages in thread
From: Juan Quintela @ 2013-03-21 17:17 UTC (permalink / raw)
  To: Orit Wasserman; +Cc: pbonzini, chegu_vinod, qemu-devel, mst

Orit Wasserman <owasserm@redhat.com> wrote:
> Update qemu_fflush and stdio_close to use writev ops if they are available
> Use the buffers stored in the iovec.
>
> Signed-off-by: Orit Wasserman <owasserm@redhat.com>

Reviewed-by: Juan Quintela <quintela@redhat.com>

I think that qemu_fflush() should only be called for QEMUFiles opened in
write mode, but that is independent of this series.

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [Qemu-devel] [PATCH v3 6/9] More optimized qemu_put_be64/32/16
  2013-03-21 16:05 ` [Qemu-devel] [PATCH v3 6/9] More optimized qemu_put_be64/32/16 Orit Wasserman
@ 2013-03-21 17:18   ` Juan Quintela
  0 siblings, 0 replies; 37+ messages in thread
From: Juan Quintela @ 2013-03-21 17:18 UTC (permalink / raw)
  To: Orit Wasserman; +Cc: pbonzini, chegu_vinod, qemu-devel, mst

Orit Wasserman <owasserm@redhat.com> wrote:
> qemu_put_be functions used qemu_put_byte this caused lots of 1 bytes buffers
> in the iovec.
> we move to use cpu_put_be64/32/16wu and put a single buffer per call.
>
> Signed-off-by: Orit Wasserman <owasserm@redhat.com>

Reviewed-by: Juan Quintela <quintela@redhat.com>

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [Qemu-devel] [PATCH v3 7/9] Add qemu_put_buffer_no_copy
  2013-03-21 16:05 ` [Qemu-devel] [PATCH v3 7/9] Add qemu_put_buffer_no_copy Orit Wasserman
@ 2013-03-21 17:34   ` Juan Quintela
  2013-03-21 17:39     ` Orit Wasserman
  0 siblings, 1 reply; 37+ messages in thread
From: Juan Quintela @ 2013-03-21 17:34 UTC (permalink / raw)
  To: Orit Wasserman; +Cc: pbonzini, chegu_vinod, qemu-devel, mst

Orit Wasserman <owasserm@redhat.com> wrote:
> This allow us to add a buffer to the iovec to send without copying it
> into the static buffer.
>
> Signed-off-by: Orit Wasserman <owasserm@redhat.com>
> ---
>  include/migration/qemu-file.h |  5 +++++
>  savevm.c                      | 37 ++++++++++++++++++++++++++++---------
>  2 files changed, 33 insertions(+), 9 deletions(-)
>
> diff --git a/include/migration/qemu-file.h b/include/migration/qemu-file.h
> index 8d3da9b..5168be2 100644
> --- a/include/migration/qemu-file.h
> +++ b/include/migration/qemu-file.h
> @@ -75,6 +75,11 @@ int qemu_fclose(QEMUFile *f);
>  int64_t qemu_ftell(QEMUFile *f);
>  void qemu_put_buffer(QEMUFile *f, const uint8_t *buf, int size);
>  void qemu_put_byte(QEMUFile *f, int v);
> +/*
> + * put_buffer without copying the buffer.
> + * The buffer should be available till it is sent.
> + */
> +void qemu_put_buffer_no_copy(QEMUFile *f, const uint8_t *buf, int size);
>  
>  static inline void qemu_put_ubyte(QEMUFile *f, unsigned int v)
>  {
> diff --git a/savevm.c b/savevm.c
> index 83aa9e7..fbfb9e3 100644
> --- a/savevm.c
> +++ b/savevm.c
> @@ -621,6 +621,30 @@ int qemu_fclose(QEMUFile *f)
>      return ret;
>  }
>  
> +
> +void qemu_put_buffer_no_copy(QEMUFile *f, const uint8_t *buf, int size)
> +{
> +    if (f->last_error) {
> +        return;
> +    }
> +
> +    if (f->is_write == 0 && f->buf_index > 0) {
> +        fprintf(stderr,
> +                "Attempted to write to buffer while read buffer is not empty\n");
> +        abort();
> +    }

I don't understand this test at all (yes, I know that the test already
existed).

We shouldn't never arrived qemu_put_buffer() with a QEMUFile with
f->is_write == 0.

Change it for one assert()?

> +    f->iov[f->iovcnt].iov_base = f->buf + f->buf_index;
> +    f->iov[f->iovcnt++].iov_len = size;

This is clearly wrong, or I have completely missunderstood it (I will
give a 50% to each posiblitiy).

Here we should be using "buf" and "size" passed as paramenters, f->buf
and f->buf_index shouldn't be used, no?

> +
> +    f->is_write = 1;

is_write is completely redundant, and should just be a:

f->ops->put_buffer test (or now with writev).  We only set it up when
there is anything to be written?

But again, this is independent of this series.

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [Qemu-devel] [PATCH v3 0/9] Migration: Remove copying of guest ram pages
  2013-03-21 17:12 ` [Qemu-devel] [PATCH v3 0/9] Migration: Remove copying of guest ram pages Paolo Bonzini
@ 2013-03-21 17:35   ` Orit Wasserman
  0 siblings, 0 replies; 37+ messages in thread
From: Orit Wasserman @ 2013-03-21 17:35 UTC (permalink / raw)
  To: Paolo Bonzini; +Cc: mst, chegu_vinod, qemu-devel, quintela

On 03/21/2013 07:12 PM, Paolo Bonzini wrote:
> Il 21/03/2013 17:05, Orit Wasserman ha scritto:
>> In migration all data is copied to a static buffer in QEMUFile,
>> this hurts our network bandwidth and CPU usage especially with large guests.
>> We switched to iovec for storing different buffers to send (even a byte field is
>> considered as a buffer) and use writev to send the iovec.
>> writev was chosen (as apposed to sendmsg) because it supprts non socket fds.
>>   
>> Guest memory pages are not copied by calling a new function 
>> qemu_put_buffer_no_copy.
>> The page header data and device state data are still copied into the static
>> buffer. This data consists of a lot of bytes and integer fields and the static
>> buffer is used to store it during batching.
>> Another improvement is changing qemu_putbe64/32/16 to create a single
>> buffer instead of several byte sized buffer.
>>
>> git repository: git://github.com/oritwas/qemu.git sendv_v2
>>
>> Change from v2:
>> Always send data for the iovec even if writev_buffer is not implemented.
>> Coalesce adjacent iovecs to create one big buffer from small adjacent buffer.
>>
>> Changes from v1:
>> Use iov_send for socket.
>> Make writev_buffer optional and if it is not implemented use put_buffer
> 
> I didn't review it too closely, so I don't feel like giving my
> Reviewed-by.  Still, I must say it is a very nice and simple solution.
> Kudos!
> 
> Just one question: patch 6 is not strictly needed anymore now that you
> have coalescing, no?  Should Juan leave it out?
true, Juan had a small fix so I'm sending v4 I will remove it.
> 
> Paolo
> 

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [Qemu-devel] [PATCH v3 8/9] Use qemu_put_buffer_no_copy for guest memory pages
  2013-03-21 16:05 ` [Qemu-devel] [PATCH v3 8/9] Use qemu_put_buffer_no_copy for guest memory pages Orit Wasserman
  2013-03-21 16:14   ` Michael S. Tsirkin
@ 2013-03-21 17:37   ` Juan Quintela
  2013-03-21 18:08     ` Orit Wasserman
  1 sibling, 1 reply; 37+ messages in thread
From: Juan Quintela @ 2013-03-21 17:37 UTC (permalink / raw)
  To: Orit Wasserman; +Cc: pbonzini, chegu_vinod, qemu-devel, mst

Orit Wasserman <owasserm@redhat.com> wrote:
> This will remove an unneeded copy of guest memory pages.
> For the page header and device state we still copy the data to the
> static buffer the other option is to allocate the memory on demand
> which is more expensive.
>
> Signed-off-by: Orit Wasserman <owasserm@redhat.com>
> ---
>  arch_init.c | 2 +-
>  savevm.c    | 2 +-
>  2 files changed, 2 insertions(+), 2 deletions(-)
>
> diff --git a/arch_init.c b/arch_init.c
> index 98e2bc6..27b53eb 100644
> --- a/arch_init.c
> +++ b/arch_init.c
> @@ -481,7 +481,7 @@ static int ram_save_block(QEMUFile *f, bool last_stage)
>              /* XBZRLE overflow or normal page */
>              if (bytes_sent == -1) {
>                  bytes_sent = save_block_hdr(f, block, offset, cont, RAM_SAVE_FLAG_PAGE);
> -                qemu_put_buffer(f, p, TARGET_PAGE_SIZE);
> +                qemu_put_buffer_no_copy(f, p, TARGET_PAGE_SIZE);
>                  bytes_sent += TARGET_PAGE_SIZE;
>                  acct_info.norm_pages++;
>              }

Once here, shouldn't we also change:

block-migration.c::blk_send()

    qemu_put_buffer(f, blk->buf, BLOCK_SIZE);

to nocopy?

Again, this can be an additional patch.


> diff --git a/savevm.c b/savevm.c
> index fbfb9e3..50e8fb2 100644
> --- a/savevm.c
> +++ b/savevm.c
> @@ -634,7 +634,7 @@ void qemu_put_buffer_no_copy(QEMUFile *f, const uint8_t *buf, int size)
>          abort();
>      }
>  
> -    f->iov[f->iovcnt].iov_base = f->buf + f->buf_index;
> +    f->iov[f->iovcnt].iov_base = (uint8_t *)buf;
>      f->iov[f->iovcnt++].iov_len = size;

This bit should be in the previous patch, the error that I pointed?

Later, Juan.

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [Qemu-devel] [PATCH v3 3/9] Update bytes_xfer in qemu_put_byte
  2013-03-21 16:05 ` [Qemu-devel] [PATCH v3 3/9] Update bytes_xfer in qemu_put_byte Orit Wasserman
  2013-03-21 17:06   ` Juan Quintela
@ 2013-03-21 17:38   ` Eric Blake
  2013-03-21 17:41     ` Orit Wasserman
  1 sibling, 1 reply; 37+ messages in thread
From: Eric Blake @ 2013-03-21 17:38 UTC (permalink / raw)
  To: Orit Wasserman; +Cc: pbonzini, quintela, chegu_vinod, qemu-devel, mst

[-- Attachment #1: Type: text/plain, Size: 585 bytes --]

On 03/21/2013 10:05 AM, Orit Wasserman wrote:
> Signed-off-by: Orit Wasserman <owasserm@redhat.com>
> ---
>  savevm.c | 2 ++
>  1 file changed, 2 insertions(+)
> 
> diff --git a/savevm.c b/savevm.c
> index baa45ae..686c8c8 100644
> --- a/savevm.c
> +++ b/savevm.c
> @@ -647,6 +647,8 @@ void qemu_put_byte(QEMUFile *f, int v)
>  
>      f->buf[f->buf_index++] = v;
>      f->is_write = 1;
> +    f->bytes_xfer += 1;

Why ' += 1' instead of the shorter '++'?

-- 
Eric Blake   eblake redhat com    +1-919-301-3266
Libvirt virtualization library http://libvirt.org


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 621 bytes --]

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [Qemu-devel] [PATCH v3 7/9] Add qemu_put_buffer_no_copy
  2013-03-21 17:34   ` Juan Quintela
@ 2013-03-21 17:39     ` Orit Wasserman
  0 siblings, 0 replies; 37+ messages in thread
From: Orit Wasserman @ 2013-03-21 17:39 UTC (permalink / raw)
  To: quintela; +Cc: pbonzini, chegu_vinod, qemu-devel, mst

On 03/21/2013 07:34 PM, Juan Quintela wrote:
> Orit Wasserman <owasserm@redhat.com> wrote:
>> This allow us to add a buffer to the iovec to send without copying it
>> into the static buffer.
>>
>> Signed-off-by: Orit Wasserman <owasserm@redhat.com>
>> ---
>>  include/migration/qemu-file.h |  5 +++++
>>  savevm.c                      | 37 ++++++++++++++++++++++++++++---------
>>  2 files changed, 33 insertions(+), 9 deletions(-)
>>
>> diff --git a/include/migration/qemu-file.h b/include/migration/qemu-file.h
>> index 8d3da9b..5168be2 100644
>> --- a/include/migration/qemu-file.h
>> +++ b/include/migration/qemu-file.h
>> @@ -75,6 +75,11 @@ int qemu_fclose(QEMUFile *f);
>>  int64_t qemu_ftell(QEMUFile *f);
>>  void qemu_put_buffer(QEMUFile *f, const uint8_t *buf, int size);
>>  void qemu_put_byte(QEMUFile *f, int v);
>> +/*
>> + * put_buffer without copying the buffer.
>> + * The buffer should be available till it is sent.
>> + */
>> +void qemu_put_buffer_no_copy(QEMUFile *f, const uint8_t *buf, int size);
>>  
>>  static inline void qemu_put_ubyte(QEMUFile *f, unsigned int v)
>>  {
>> diff --git a/savevm.c b/savevm.c
>> index 83aa9e7..fbfb9e3 100644
>> --- a/savevm.c
>> +++ b/savevm.c
>> @@ -621,6 +621,30 @@ int qemu_fclose(QEMUFile *f)
>>      return ret;
>>  }
>>  
>> +
>> +void qemu_put_buffer_no_copy(QEMUFile *f, const uint8_t *buf, int size)
>> +{
>> +    if (f->last_error) {
>> +        return;
>> +    }
>> +
>> +    if (f->is_write == 0 && f->buf_index > 0) {
>> +        fprintf(stderr,
>> +                "Attempted to write to buffer while read buffer is not empty\n");
>> +        abort();
>> +    }
> 
> I don't understand this test at all (yes, I know that the test already
> existed).
> 
> We shouldn't never arrived qemu_put_buffer() with a QEMUFile with
> f->is_write == 0.
> 
> Change it for one assert()?
> 
>> +    f->iov[f->iovcnt].iov_base = f->buf + f->buf_index;
>> +    f->iov[f->iovcnt++].iov_len = size;
> 
> This is clearly wrong, or I have completely missunderstood it (I will
> give a 50% to each posiblitiy).
> 
> Here we should be using "buf" and "size" passed as paramenters, f->buf
> and f->buf_index shouldn't be used, no?
you are right, it somehow moved to the next patch :(
I will send a fixed v4 after you finish review the series ..
> 
>> +
>> +    f->is_write = 1;
> 
> is_write is completely redundant, and should just be a:
> 
> f->ops->put_buffer test (or now with writev).  We only set it up when
> there is anything to be written?
> 
> But again, this is independent of this series.
> 
I agree, maybe we need a cleanup series and remove it.
It will certainly simplify this code

Orit

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [Qemu-devel] [PATCH v3 9/9] coalesce adjacent iovecs
  2013-03-21 16:05 ` [Qemu-devel] [PATCH v3 9/9] coalesce adjacent iovecs Orit Wasserman
  2013-03-21 16:16   ` Michael S. Tsirkin
@ 2013-03-21 17:41   ` Juan Quintela
  1 sibling, 0 replies; 37+ messages in thread
From: Juan Quintela @ 2013-03-21 17:41 UTC (permalink / raw)
  To: Orit Wasserman; +Cc: pbonzini, chegu_vinod, qemu-devel, mst

Orit Wasserman <owasserm@redhat.com> wrote:
> This way we send one big buffer instead of many small ones
>
> Signed-off-by: Orit Wasserman <owasserm@redhat.com>
> ---
>  savevm.c | 21 +++++++++++++++++----
>  1 file changed, 17 insertions(+), 4 deletions(-)
>
> diff --git a/savevm.c b/savevm.c
> index 50e8fb2..13a533b 100644
> --- a/savevm.c
> +++ b/savevm.c
> @@ -634,8 +634,14 @@ void qemu_put_buffer_no_copy(QEMUFile *f, const uint8_t *buf, int size)
>          abort();
>      }
>  
> -    f->iov[f->iovcnt].iov_base = (uint8_t *)buf;
> -    f->iov[f->iovcnt++].iov_len = size;


> +    /* check for adjoint buffer and colace them */

s/colace/coalesce/

> +    if (f->iovcnt > 0 && buf == f->iov[f->iovcnt - 1].iov_base +
> +        f->iov[f->iovcnt - 1].iov_len) {
> +        f->iov[f->iovcnt - 1].iov_len += size;
> +    } else {
> +        f->iov[f->iovcnt].iov_base = (uint8_t *)buf;
> +        f->iov[f->iovcnt++].iov_len = size;
> +    }

Can we create a function for this?
Would help make sure that we don't forget to "fix" both places when we
have changes.

Later, Juan.

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [Qemu-devel] [PATCH v3 3/9] Update bytes_xfer in qemu_put_byte
  2013-03-21 17:38   ` Eric Blake
@ 2013-03-21 17:41     ` Orit Wasserman
  0 siblings, 0 replies; 37+ messages in thread
From: Orit Wasserman @ 2013-03-21 17:41 UTC (permalink / raw)
  To: Eric Blake; +Cc: pbonzini, quintela, chegu_vinod, qemu-devel, mst

On 03/21/2013 07:38 PM, Eric Blake wrote:
> On 03/21/2013 10:05 AM, Orit Wasserman wrote:
>> Signed-off-by: Orit Wasserman <owasserm@redhat.com>
>> ---
>>  savevm.c | 2 ++
>>  1 file changed, 2 insertions(+)
>>
>> diff --git a/savevm.c b/savevm.c
>> index baa45ae..686c8c8 100644
>> --- a/savevm.c
>> +++ b/savevm.c
>> @@ -647,6 +647,8 @@ void qemu_put_byte(QEMUFile *f, int v)
>>  
>>      f->buf[f->buf_index++] = v;
>>      f->is_write = 1;
>> +    f->bytes_xfer += 1;
> 
> Why ' += 1' instead of the shorter '++'?
> 
It compiles to the same code but I can change it :)

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [Qemu-devel] [PATCH v3 0/9] Migration: Remove copying of guest ram pages
  2013-03-21 16:05 [Qemu-devel] [PATCH v3 0/9] Migration: Remove copying of guest ram pages Orit Wasserman
                   ` (9 preceding siblings ...)
  2013-03-21 17:12 ` [Qemu-devel] [PATCH v3 0/9] Migration: Remove copying of guest ram pages Paolo Bonzini
@ 2013-03-21 17:42 ` Juan Quintela
  10 siblings, 0 replies; 37+ messages in thread
From: Juan Quintela @ 2013-03-21 17:42 UTC (permalink / raw)
  To: Orit Wasserman; +Cc: pbonzini, chegu_vinod, qemu-devel, mst

Orit Wasserman <owasserm@redhat.com> wrote:
> In migration all data is copied to a static buffer in QEMUFile,
> this hurts our network bandwidth and CPU usage especially with large guests.
> We switched to iovec for storing different buffers to send (even a byte field is
> considered as a buffer) and use writev to send the iovec.
> writev was chosen (as apposed to sendmsg) because it supprts non socket fds.
>   
> Guest memory pages are not copied by calling a new function 
> qemu_put_buffer_no_copy.
> The page header data and device state data are still copied into the static
> buffer. This data consists of a lot of bytes and integer fields and the static
> buffer is used to store it during batching.
> Another improvement is changing qemu_putbe64/32/16 to create a single
> buffer instead of several byte sized buffer.
>
> git repository: git://github.com/oritwas/qemu.git sendv_v2

it is still sendv_v2 or sendv_v3 O:-)

> Change from v2:
> Always send data for the iovec even if writev_buffer is not implemented.
> Coalesce adjacent iovecs to create one big buffer from small adjacent buffer.
>
> Changes from v1:
> Use iov_send for socket.
> Make writev_buffer optional and if it is not implemented use put_buffer
>
> Future work: Make number of iovec changeable
>
> Orit Wasserman (9):
>   Add QemuFileWritevBuffer QemuFileOps
>   Add socket_writev_buffer function
>   Update bytes_xfer in qemu_put_byte
>   Store the data to send also in iovec
>   Use writev ops if available
>   More optimized qemu_put_be64/32/16
>   Add qemu_put_buffer_no_copy
>   Use qemu_put_buffer_no_copy for guest memory pages
>   coalesce adjacent iovecs
>
>  arch_init.c                   |   2 +-
>  include/migration/qemu-file.h |  12 ++++
>  savevm.c                      | 127 ++++++++++++++++++++++++++++++++++--------
>  3 files changed, 116 insertions(+), 25 deletions(-)

Really nice series.  Just a couple of small nits.

Later, Juan.

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [Qemu-devel] [PATCH v3 9/9] coalesce adjacent iovecs
  2013-03-21 16:29       ` Michael S. Tsirkin
  2013-03-21 16:40         ` Orit Wasserman
@ 2013-03-21 17:44         ` Juan Quintela
  2013-03-21 17:46           ` Michael S. Tsirkin
  1 sibling, 1 reply; 37+ messages in thread
From: Juan Quintela @ 2013-03-21 17:44 UTC (permalink / raw)
  To: Michael S. Tsirkin; +Cc: Orit Wasserman, chegu_vinod, qemu-devel, pbonzini

"Michael S. Tsirkin" <mst@redhat.com> wrote:
> On Thu, Mar 21, 2013 at 06:27:42PM +0200, Orit Wasserman wrote:
>> On 03/21/2013 06:16 PM, Michael S. Tsirkin wrote:
>> > On Thu, Mar 21, 2013 at 06:05:40PM +0200, Orit Wasserman wrote:
>> >> This way we send one big buffer instead of many small ones
>> >>
>> >> Signed-off-by: Orit Wasserman <owasserm@redhat.com>
>> > 
>> > Why does this happen BTW?
>> 
>> It happens in the last phase when we send the device state that
>> consists of a lot
>> bytes and int field that are written using qemu_put_byte/be16/...
>> 
>
> Confused  I thought device_state does not use _nocopy?
> My idea of using vmsplice relies exactly on this:
> we can not splice device state ...


As it is today, I am not sure that we can use vmsplice() because we
are sending:


<header>
<page>
<header>
<page>
<header>
<page>

We can optimize at some pount to write a bigger/different header and
sent a bunch of pages together, but just now we don't have that code.

Later, Juan.

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [Qemu-devel] [PATCH v3 9/9] coalesce adjacent iovecs
  2013-03-21 17:44         ` Juan Quintela
@ 2013-03-21 17:46           ` Michael S. Tsirkin
  2013-03-21 18:22             ` Juan Quintela
  0 siblings, 1 reply; 37+ messages in thread
From: Michael S. Tsirkin @ 2013-03-21 17:46 UTC (permalink / raw)
  To: Juan Quintela; +Cc: Orit Wasserman, chegu_vinod, qemu-devel, pbonzini

On Thu, Mar 21, 2013 at 06:44:14PM +0100, Juan Quintela wrote:
> "Michael S. Tsirkin" <mst@redhat.com> wrote:
> > On Thu, Mar 21, 2013 at 06:27:42PM +0200, Orit Wasserman wrote:
> >> On 03/21/2013 06:16 PM, Michael S. Tsirkin wrote:
> >> > On Thu, Mar 21, 2013 at 06:05:40PM +0200, Orit Wasserman wrote:
> >> >> This way we send one big buffer instead of many small ones
> >> >>
> >> >> Signed-off-by: Orit Wasserman <owasserm@redhat.com>
> >> > 
> >> > Why does this happen BTW?
> >> 
> >> It happens in the last phase when we send the device state that
> >> consists of a lot
> >> bytes and int field that are written using qemu_put_byte/be16/...
> >> 
> >
> > Confused  I thought device_state does not use _nocopy?
> > My idea of using vmsplice relies exactly on this:
> > we can not splice device state ...
> 
> 
> As it is today, I am not sure that we can use vmsplice() because we
> are sending:
> 
> 
> <header>
> <page>
> <header>
> <page>
> <header>
> <page>
> 
> We can optimize at some pount to write a bigger/different header and
> sent a bunch of pages together, but just now we don't have that code.
> 
> Later, Juan.

Sending the page can do vmsplice, can't it?
Multipage is likely a good idea anyway, e.g. RDMA wants to
do this too.

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [Qemu-devel] [PATCH v3 8/9] Use qemu_put_buffer_no_copy for guest memory pages
  2013-03-21 17:37   ` Juan Quintela
@ 2013-03-21 18:08     ` Orit Wasserman
  2013-03-21 18:21       ` Michael S. Tsirkin
  0 siblings, 1 reply; 37+ messages in thread
From: Orit Wasserman @ 2013-03-21 18:08 UTC (permalink / raw)
  To: quintela; +Cc: pbonzini, chegu_vinod, qemu-devel, mst

On 03/21/2013 07:37 PM, Juan Quintela wrote:
> Orit Wasserman <owasserm@redhat.com> wrote:
>> This will remove an unneeded copy of guest memory pages.
>> For the page header and device state we still copy the data to the
>> static buffer the other option is to allocate the memory on demand
>> which is more expensive.
>>
>> Signed-off-by: Orit Wasserman <owasserm@redhat.com>
>> ---
>>  arch_init.c | 2 +-
>>  savevm.c    | 2 +-
>>  2 files changed, 2 insertions(+), 2 deletions(-)
>>
>> diff --git a/arch_init.c b/arch_init.c
>> index 98e2bc6..27b53eb 100644
>> --- a/arch_init.c
>> +++ b/arch_init.c
>> @@ -481,7 +481,7 @@ static int ram_save_block(QEMUFile *f, bool last_stage)
>>              /* XBZRLE overflow or normal page */
>>              if (bytes_sent == -1) {
>>                  bytes_sent = save_block_hdr(f, block, offset, cont, RAM_SAVE_FLAG_PAGE);
>> -                qemu_put_buffer(f, p, TARGET_PAGE_SIZE);
>> +                qemu_put_buffer_no_copy(f, p, TARGET_PAGE_SIZE);
>>                  bytes_sent += TARGET_PAGE_SIZE;
>>                  acct_info.norm_pages++;
>>              }
> 
> Once here, shouldn't we also change:
> 
> block-migration.c::blk_send()
> 
>     qemu_put_buffer(f, blk->buf, BLOCK_SIZE);
> 
> to nocopy?
Sadly no, I looked at the code and I saw the buffer is freed immediately
after call blk_send :(. look at mig_save_device_dirty.

Orit
> 
> Again, this can be an additional patch.
> 
> 
>> diff --git a/savevm.c b/savevm.c
>> index fbfb9e3..50e8fb2 100644
>> --- a/savevm.c
>> +++ b/savevm.c
>> @@ -634,7 +634,7 @@ void qemu_put_buffer_no_copy(QEMUFile *f, const uint8_t *buf, int size)
>>          abort();
>>      }
>>  
>> -    f->iov[f->iovcnt].iov_base = f->buf + f->buf_index;
>> +    f->iov[f->iovcnt].iov_base = (uint8_t *)buf;
>>      f->iov[f->iovcnt++].iov_len = size;
> 
> This bit should be in the previous patch, the error that I pointed?
> 
> Later, Juan.
> 

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [Qemu-devel] [PATCH v3 8/9] Use qemu_put_buffer_no_copy for guest memory pages
  2013-03-21 18:08     ` Orit Wasserman
@ 2013-03-21 18:21       ` Michael S. Tsirkin
  0 siblings, 0 replies; 37+ messages in thread
From: Michael S. Tsirkin @ 2013-03-21 18:21 UTC (permalink / raw)
  To: Orit Wasserman; +Cc: pbonzini, chegu_vinod, qemu-devel, quintela

On Thu, Mar 21, 2013 at 08:08:58PM +0200, Orit Wasserman wrote:
> On 03/21/2013 07:37 PM, Juan Quintela wrote:
> > Orit Wasserman <owasserm@redhat.com> wrote:
> >> This will remove an unneeded copy of guest memory pages.
> >> For the page header and device state we still copy the data to the
> >> static buffer the other option is to allocate the memory on demand
> >> which is more expensive.
> >>
> >> Signed-off-by: Orit Wasserman <owasserm@redhat.com>
> >> ---
> >>  arch_init.c | 2 +-
> >>  savevm.c    | 2 +-
> >>  2 files changed, 2 insertions(+), 2 deletions(-)
> >>
> >> diff --git a/arch_init.c b/arch_init.c
> >> index 98e2bc6..27b53eb 100644
> >> --- a/arch_init.c
> >> +++ b/arch_init.c
> >> @@ -481,7 +481,7 @@ static int ram_save_block(QEMUFile *f, bool last_stage)
> >>              /* XBZRLE overflow or normal page */
> >>              if (bytes_sent == -1) {
> >>                  bytes_sent = save_block_hdr(f, block, offset, cont, RAM_SAVE_FLAG_PAGE);
> >> -                qemu_put_buffer(f, p, TARGET_PAGE_SIZE);
> >> +                qemu_put_buffer_no_copy(f, p, TARGET_PAGE_SIZE);
> >>                  bytes_sent += TARGET_PAGE_SIZE;
> >>                  acct_info.norm_pages++;
> >>              }
> > 
> > Once here, shouldn't we also change:
> > 
> > block-migration.c::blk_send()
> > 
> >     qemu_put_buffer(f, blk->buf, BLOCK_SIZE);
> > 
> > to nocopy?
> Sadly no, I looked at the code and I saw the buffer is freed immediately
> after call blk_send :(. look at mig_save_device_dirty.
> 
> Orit

Hmm if _nocopy also implies it is not safe to free the
buffer immediately, then I'd rather put this in a name
e.g. _async instead of _nocopy.

> > 
> > Again, this can be an additional patch.
> > 
> > 
> >> diff --git a/savevm.c b/savevm.c
> >> index fbfb9e3..50e8fb2 100644
> >> --- a/savevm.c
> >> +++ b/savevm.c
> >> @@ -634,7 +634,7 @@ void qemu_put_buffer_no_copy(QEMUFile *f, const uint8_t *buf, int size)
> >>          abort();
> >>      }
> >>  
> >> -    f->iov[f->iovcnt].iov_base = f->buf + f->buf_index;
> >> +    f->iov[f->iovcnt].iov_base = (uint8_t *)buf;
> >>      f->iov[f->iovcnt++].iov_len = size;
> > 
> > This bit should be in the previous patch, the error that I pointed?
> > 
> > Later, Juan.
> > 

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [Qemu-devel] [PATCH v3 9/9] coalesce adjacent iovecs
  2013-03-21 17:46           ` Michael S. Tsirkin
@ 2013-03-21 18:22             ` Juan Quintela
  2013-03-21 18:33               ` Michael S. Tsirkin
  0 siblings, 1 reply; 37+ messages in thread
From: Juan Quintela @ 2013-03-21 18:22 UTC (permalink / raw)
  To: Michael S. Tsirkin; +Cc: Orit Wasserman, chegu_vinod, qemu-devel, pbonzini

"Michael S. Tsirkin" <mst@redhat.com> wrote:

>> <header>
>> <page>
>> <header>
>> <page>
>> <header>
>> <page>
>> 
>> We can optimize at some pount to write a bigger/different header and
>> sent a bunch of pages together, but just now we don't have that code.
>> 
>> Later, Juan.
>
> Sending the page can do vmsplice, can't it?
> Multipage is likely a good idea anyway, e.g. RDMA wants to
> do this too.

RDMA requires de pages lock into memory, no?

Later, Juan.

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [Qemu-devel] [PATCH v3 9/9] coalesce adjacent iovecs
  2013-03-21 18:22             ` Juan Quintela
@ 2013-03-21 18:33               ` Michael S. Tsirkin
  0 siblings, 0 replies; 37+ messages in thread
From: Michael S. Tsirkin @ 2013-03-21 18:33 UTC (permalink / raw)
  To: Juan Quintela; +Cc: Orit Wasserman, chegu_vinod, qemu-devel, pbonzini

On Thu, Mar 21, 2013 at 07:22:01PM +0100, Juan Quintela wrote:
> "Michael S. Tsirkin" <mst@redhat.com> wrote:
> 
> >> <header>
> >> <page>
> >> <header>
> >> <page>
> >> <header>
> >> <page>
> >> 
> >> We can optimize at some pount to write a bigger/different header and
> >> sent a bunch of pages together, but just now we don't have that code.
> >> 
> >> Later, Juan.
> >
> > Sending the page can do vmsplice, can't it?
> > Multipage is likely a good idea anyway, e.g. RDMA wants to
> > do this too.
> 
> RDMA requires de pages lock into memory, no?
> 
> Later, Juan.

That's a separate issue, but the point is it is asynchronous.
It can be made to behave exactly the same implementing this API:

	- send async: lock, send
	- flush: poll for completion, unlock

^ permalink raw reply	[flat|nested] 37+ messages in thread

end of thread, other threads:[~2013-03-21 18:35 UTC | newest]

Thread overview: 37+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2013-03-21 16:05 [Qemu-devel] [PATCH v3 0/9] Migration: Remove copying of guest ram pages Orit Wasserman
2013-03-21 16:05 ` [Qemu-devel] [PATCH v3 1/9] Add QemuFileWritevBuffer QemuFileOps Orit Wasserman
2013-03-21 16:57   ` Juan Quintela
2013-03-21 16:05 ` [Qemu-devel] [PATCH v3 2/9] Add socket_writev_buffer function Orit Wasserman
2013-03-21 16:55   ` Juan Quintela
2013-03-21 16:05 ` [Qemu-devel] [PATCH v3 3/9] Update bytes_xfer in qemu_put_byte Orit Wasserman
2013-03-21 17:06   ` Juan Quintela
2013-03-21 17:38   ` Eric Blake
2013-03-21 17:41     ` Orit Wasserman
2013-03-21 16:05 ` [Qemu-devel] [PATCH v3 4/9] Store the data to send also in iovec Orit Wasserman
2013-03-21 17:11   ` Juan Quintela
2013-03-21 16:05 ` [Qemu-devel] [PATCH v3 5/9] Use writev ops if available Orit Wasserman
2013-03-21 17:17   ` Juan Quintela
2013-03-21 16:05 ` [Qemu-devel] [PATCH v3 6/9] More optimized qemu_put_be64/32/16 Orit Wasserman
2013-03-21 17:18   ` Juan Quintela
2013-03-21 16:05 ` [Qemu-devel] [PATCH v3 7/9] Add qemu_put_buffer_no_copy Orit Wasserman
2013-03-21 17:34   ` Juan Quintela
2013-03-21 17:39     ` Orit Wasserman
2013-03-21 16:05 ` [Qemu-devel] [PATCH v3 8/9] Use qemu_put_buffer_no_copy for guest memory pages Orit Wasserman
2013-03-21 16:14   ` Michael S. Tsirkin
2013-03-21 17:37   ` Juan Quintela
2013-03-21 18:08     ` Orit Wasserman
2013-03-21 18:21       ` Michael S. Tsirkin
2013-03-21 16:05 ` [Qemu-devel] [PATCH v3 9/9] coalesce adjacent iovecs Orit Wasserman
2013-03-21 16:16   ` Michael S. Tsirkin
2013-03-21 16:27     ` Orit Wasserman
2013-03-21 16:29       ` Michael S. Tsirkin
2013-03-21 16:40         ` Orit Wasserman
2013-03-21 17:10           ` Michael S. Tsirkin
2013-03-21 17:44         ` Juan Quintela
2013-03-21 17:46           ` Michael S. Tsirkin
2013-03-21 18:22             ` Juan Quintela
2013-03-21 18:33               ` Michael S. Tsirkin
2013-03-21 17:41   ` Juan Quintela
2013-03-21 17:12 ` [Qemu-devel] [PATCH v3 0/9] Migration: Remove copying of guest ram pages Paolo Bonzini
2013-03-21 17:35   ` Orit Wasserman
2013-03-21 17:42 ` Juan Quintela

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.