All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH v3 1/6] tools/libxl: Introduce min and max macros
@ 2015-03-16 13:29 Ross Lagerwall
  2015-03-16 13:29 ` [PATCH v3 2/6] tools/libxl: Update datacopier to support sending data only Ross Lagerwall
                   ` (4 more replies)
  0 siblings, 5 replies; 29+ messages in thread
From: Ross Lagerwall @ 2015-03-16 13:29 UTC (permalink / raw)
  To: xen-devel; +Cc: Andrew Cooper, Ian Jackson, Ian Campbell, Wei Liu

From: Andrew Cooper <andrew.cooper3@citrix.com>

This is the same set used by libxc.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Acked-by: Ian Campbell <ian.campbell@citrix.com>
CC: Ian Campbell <Ian.Campbell@citrix.com>
CC: Ian Jackson <Ian.Jackson@eu.citrix.com>
CC: Wei Liu <wei.liu2@citrix.com>
---
 tools/libxl/libxl_internal.h | 16 ++++++++++++++++
 1 file changed, 16 insertions(+)

diff --git a/tools/libxl/libxl_internal.h b/tools/libxl/libxl_internal.h
index 934465a..fcbec7f 100644
--- a/tools/libxl/libxl_internal.h
+++ b/tools/libxl/libxl_internal.h
@@ -108,6 +108,22 @@
 
 #define ARRAY_SIZE(a) (sizeof(a) / sizeof(a[0]))
 
+#define min(X, Y) ({                             \
+            const typeof (X) _x = (X);           \
+            const typeof (Y) _y = (Y);           \
+            (void) (&_x == &_y);                 \
+            (_x < _y) ? _x : _y; })
+#define max(X, Y) ({                             \
+            const typeof (X) _x = (X);           \
+            const typeof (Y) _y = (Y);           \
+            (void) (&_x == &_y);                 \
+            (_x > _y) ? _x : _y; })
+
+#define min_t(type, x, y)                                               \
+    ({ const type _x = (x); const type _y = (y); _x < _y ? _x: _y; })
+#define max_t(type, x, y)                                               \
+    ({ const type _x = (x); const type _y = (y); _x > _y ? _x: _y; })
+
 #define LIBXL__LOGGING_ENABLED
 
 #ifdef LIBXL__LOGGING_ENABLED
-- 
2.1.0

^ permalink raw reply related	[flat|nested] 29+ messages in thread

* [PATCH v3 2/6] tools/libxl: Update datacopier to support sending data only
  2015-03-16 13:29 [PATCH v3 1/6] tools/libxl: Introduce min and max macros Ross Lagerwall
@ 2015-03-16 13:29 ` Ross Lagerwall
  2015-03-16 13:29 ` [PATCH v3 3/6] tools/libxl: Avoid overrunning static buffer with prefixdata Ross Lagerwall
                   ` (3 subsequent siblings)
  4 siblings, 0 replies; 29+ messages in thread
From: Ross Lagerwall @ 2015-03-16 13:29 UTC (permalink / raw)
  To: xen-devel; +Cc: Andrew Cooper, Ian Jackson, Ian Campbell, Wen Congyang, Wei Liu

From: Wen Congyang <wency@cn.fujitsu.com>

Currently, starting a datacopier requires a valid read and write fd, but this
is a problem when purely sending data from a local buffer to a writable fd.

The prefixdata mechanism already exists and works for inserting data from a
local buffer ahead of reading from the read fd.

Make the lack of a read fd non-fatal.  A datacopier with no read fd, but some
prefixdata will write the prefixdata to the write fd and complete successfully.

Signed-off-by: Wen Congyang <wency@cn.fujitsu.com>
[Rewrite commit message]
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Acked-by: Ian Campbell <ian.campbell@citrix.com>
CC: Ian Campbell <Ian.Campbell@citrix.com>
CC: Ian Jackson <Ian.Jackson@eu.citrix.com>
CC: Wei Liu <wei.liu2@citrix.com>
---
 tools/libxl/libxl_aoutils.c | 8 +++++---
 1 file changed, 5 insertions(+), 3 deletions(-)

diff --git a/tools/libxl/libxl_aoutils.c b/tools/libxl/libxl_aoutils.c
index b10d2e1..3e0c0ae 100644
--- a/tools/libxl/libxl_aoutils.c
+++ b/tools/libxl/libxl_aoutils.c
@@ -309,9 +309,11 @@ int libxl__datacopier_start(libxl__datacopier_state *dc)
 
     libxl__datacopier_init(dc);
 
-    rc = libxl__ev_fd_register(gc, &dc->toread, datacopier_readable,
-                               dc->readfd, POLLIN);
-    if (rc) goto out;
+    if (dc->readfd >= 0) {
+        rc = libxl__ev_fd_register(gc, &dc->toread, datacopier_readable,
+                                   dc->readfd, POLLIN);
+        if (rc) goto out;
+    }
 
     rc = libxl__ev_fd_register(gc, &dc->towrite, datacopier_writable,
                                dc->writefd, POLLOUT);
-- 
2.1.0

^ permalink raw reply related	[flat|nested] 29+ messages in thread

* [PATCH v3 3/6] tools/libxl: Avoid overrunning static buffer with prefixdata
  2015-03-16 13:29 [PATCH v3 1/6] tools/libxl: Introduce min and max macros Ross Lagerwall
  2015-03-16 13:29 ` [PATCH v3 2/6] tools/libxl: Update datacopier to support sending data only Ross Lagerwall
@ 2015-03-16 13:29 ` Ross Lagerwall
  2015-03-16 13:29 ` [PATCH v3 4/6] tools/libxl: Allow limiting amount copied by datacopier Ross Lagerwall
                   ` (2 subsequent siblings)
  4 siblings, 0 replies; 29+ messages in thread
From: Ross Lagerwall @ 2015-03-16 13:29 UTC (permalink / raw)
  To: xen-devel
  Cc: Ross Lagerwall, Wei Liu, Ian Jackson, Ian Campbell, Andrew Cooper

An individual datacopier_buf contains a static buffer of 1000 bytes.
Attempting to add prefixdata of more than 1000 bytes would overrun the buffer
and cause heap corruption.

Instead, split the prefixdata and chain together multiple datacopier buffers.
This allows for an arbitrary quantity of prefixdata to be added to a
datacopier.

Signed-off-by: Ross Lagerwall <ross.lagerwall@citrix.com>
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Acked-by: Ian Campbell <ian.campbell@citrix.com>
CC: Ian Campbell <Ian.Campbell@citrix.com>
CC: Ian Jackson <Ian.Jackson@eu.citrix.com>
CC: Wei Liu <wei.liu2@citrix.com>
---
 tools/libxl/libxl_aoutils.c | 14 +++++++++-----
 1 file changed, 9 insertions(+), 5 deletions(-)

diff --git a/tools/libxl/libxl_aoutils.c b/tools/libxl/libxl_aoutils.c
index 3e0c0ae..6882ca3 100644
--- a/tools/libxl/libxl_aoutils.c
+++ b/tools/libxl/libxl_aoutils.c
@@ -160,6 +160,8 @@ void libxl__datacopier_prefixdata(libxl__egc *egc, libxl__datacopier_state *dc,
 {
     EGC_GC;
     libxl__datacopier_buf *buf;
+    const uint8_t *ptr;
+
     /*
      * It is safe for this to be called immediately after _start, as
      * is documented in the public comment.  _start's caller must have
@@ -170,12 +172,14 @@ void libxl__datacopier_prefixdata(libxl__egc *egc, libxl__datacopier_state *dc,
 
     assert(len < dc->maxsz - dc->used);
 
-    buf = libxl__zalloc(NOGC, sizeof(*buf));
-    buf->used = len;
-    memcpy(buf->buf, data, len);
+    for (ptr = data; len; len -= buf->used, ptr += buf->used) {
+        buf = libxl__malloc(NOGC, sizeof(*buf));
+        buf->used = min(len, sizeof(buf->buf));
+        memcpy(buf->buf, ptr, buf->used);
 
-    dc->used += len;
-    LIBXL_TAILQ_INSERT_TAIL(&dc->bufs, buf, entry);
+        dc->used += buf->used;
+        LIBXL_TAILQ_INSERT_TAIL(&dc->bufs, buf, entry);
+    }
 }
 
 static int datacopier_pollhup_handled(libxl__egc *egc,
-- 
2.1.0

^ permalink raw reply related	[flat|nested] 29+ messages in thread

* [PATCH v3 4/6] tools/libxl: Allow limiting amount copied by datacopier
  2015-03-16 13:29 [PATCH v3 1/6] tools/libxl: Introduce min and max macros Ross Lagerwall
  2015-03-16 13:29 ` [PATCH v3 2/6] tools/libxl: Update datacopier to support sending data only Ross Lagerwall
  2015-03-16 13:29 ` [PATCH v3 3/6] tools/libxl: Avoid overrunning static buffer with prefixdata Ross Lagerwall
@ 2015-03-16 13:29 ` Ross Lagerwall
  2015-03-18 11:12   ` Ian Campbell
  2015-03-16 13:29 ` [PATCH v3 5/6] tools/libxl: Extend datacopier to support reading into a buffer Ross Lagerwall
  2015-03-16 13:29 ` [PATCH v3 6/6] tools/libxl: Adjust datacopiers POLLHUP handling when the fd is also readable Ross Lagerwall
  4 siblings, 1 reply; 29+ messages in thread
From: Ross Lagerwall @ 2015-03-16 13:29 UTC (permalink / raw)
  To: xen-devel
  Cc: Ross Lagerwall, Wei Liu, Ian Jackson, Ian Campbell, Andrew Cooper

Currently, a datacopier will unconditionally read until EOF on its read fd.

For migration v2, libxl needs to read records of a specific length out of the
migration stream, without reading any further data.

Introduce a parameter, bytes_to_read, which may be used to stop the datacopier
ahead of reaching EOF. If bytes_to_read is set to -1, then the datacopier will
read until EOF.

Signed-off-by: Ross Lagerwall <ross.lagerwall@citrix.com>
[Rewrite commit message]
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
CC: Ian Campbell <Ian.Campbell@citrix.com>
CC: Ian Jackson <Ian.Jackson@eu.citrix.com>
CC: Wei Liu <wei.liu2@citrix.com>
---
 tools/libxl/libxl_aoutils.c    | 13 +++++++++----
 tools/libxl/libxl_bootloader.c |  2 ++
 tools/libxl/libxl_dom.c        |  1 +
 tools/libxl/libxl_internal.h   |  1 +
 4 files changed, 13 insertions(+), 4 deletions(-)

diff --git a/tools/libxl/libxl_aoutils.c b/tools/libxl/libxl_aoutils.c
index 6882ca3..dff5d71 100644
--- a/tools/libxl/libxl_aoutils.c
+++ b/tools/libxl/libxl_aoutils.c
@@ -145,7 +145,8 @@ static void datacopier_check_state(libxl__egc *egc, libxl__datacopier_state *dc)
                 return;
             }
         }
-    } else if (!libxl__ev_fd_isregistered(&dc->toread)) {
+    } else if (!libxl__ev_fd_isregistered(&dc->toread) ||
+               dc->bytes_to_read == 0) {
         /* we have had eof */
         datacopier_callback(egc, dc, 0, 0);
         return;
@@ -231,9 +232,9 @@ static void datacopier_readable(libxl__egc *egc, libxl__ev_fd *ev,
             buf->used = 0;
             LIBXL_TAILQ_INSERT_TAIL(&dc->bufs, buf, entry);
         }
-        int r = read(ev->fd,
-                     buf->buf + buf->used,
-                     sizeof(buf->buf) - buf->used);
+        int r = read(ev->fd, buf->buf + buf->used,
+                     min_t(size_t, sizeof(buf->buf) - buf->used,
+                           (dc->bytes_to_read == -1) ? SIZE_MAX : dc->bytes_to_read));
         if (r < 0) {
             if (errno == EINTR) continue;
             if (errno == EWOULDBLOCK) break;
@@ -259,6 +260,10 @@ static void datacopier_readable(libxl__egc *egc, libxl__ev_fd *ev,
         buf->used += r;
         dc->used += r;
         assert(buf->used <= sizeof(buf->buf));
+        if (dc->bytes_to_read > 0)
+            dc->bytes_to_read -= r;
+        if (dc->bytes_to_read == 0)
+            break;
     }
     datacopier_check_state(egc, dc);
 }
diff --git a/tools/libxl/libxl_bootloader.c b/tools/libxl/libxl_bootloader.c
index 79947d4..0263ef9 100644
--- a/tools/libxl/libxl_bootloader.c
+++ b/tools/libxl/libxl_bootloader.c
@@ -516,6 +516,7 @@ static void bootloader_gotptys(libxl__egc *egc, libxl__openpty_state *op)
 
     bl->keystrokes.ao = ao;
     bl->keystrokes.maxsz = BOOTLOADER_BUF_OUT;
+    bl->keystrokes.bytes_to_read = -1;
     bl->keystrokes.copywhat =
         GCSPRINTF("bootloader input for domain %"PRIu32, bl->domid);
     bl->keystrokes.callback =         bootloader_keystrokes_copyfail;
@@ -527,6 +528,7 @@ static void bootloader_gotptys(libxl__egc *egc, libxl__openpty_state *op)
 
     bl->display.ao = ao;
     bl->display.maxsz = BOOTLOADER_BUF_IN;
+    bl->display.bytes_to_read = -1;
     bl->display.copywhat =
         GCSPRINTF("bootloader output for domain %"PRIu32, bl->domid);
     bl->display.callback =         bootloader_display_copyfail;
diff --git a/tools/libxl/libxl_dom.c b/tools/libxl/libxl_dom.c
index a16d4a1..108676f 100644
--- a/tools/libxl/libxl_dom.c
+++ b/tools/libxl/libxl_dom.c
@@ -1922,6 +1922,7 @@ void libxl__domain_save_device_model(libxl__egc *egc,
     dc->readfd = -1;
     dc->writefd = fd;
     dc->maxsz = INT_MAX;
+    dc->bytes_to_read = -1;
     dc->copywhat = GCSPRINTF("qemu save file for domain %"PRIu32, dss->domid);
     dc->writewhat = "save/migration stream";
     dc->callback = save_device_model_datacopier_done;
diff --git a/tools/libxl/libxl_internal.h b/tools/libxl/libxl_internal.h
index fcbec7f..58ac658 100644
--- a/tools/libxl/libxl_internal.h
+++ b/tools/libxl/libxl_internal.h
@@ -2545,6 +2545,7 @@ struct libxl__datacopier_state {
     libxl__ao *ao;
     int readfd, writefd;
     ssize_t maxsz;
+    ssize_t bytes_to_read; /* set to -1 to read until EOF */
     const char *copywhat, *readwhat, *writewhat; /* for error msgs */
     FILE *log; /* gets a copy of everything */
     libxl__datacopier_callback *callback;
-- 
2.1.0

^ permalink raw reply related	[flat|nested] 29+ messages in thread

* [PATCH v3 5/6] tools/libxl: Extend datacopier to support reading into a buffer
  2015-03-16 13:29 [PATCH v3 1/6] tools/libxl: Introduce min and max macros Ross Lagerwall
                   ` (2 preceding siblings ...)
  2015-03-16 13:29 ` [PATCH v3 4/6] tools/libxl: Allow limiting amount copied by datacopier Ross Lagerwall
@ 2015-03-16 13:29 ` Ross Lagerwall
  2015-03-18 11:18   ` Ian Campbell
  2015-04-01 15:53   ` Ian Jackson
  2015-03-16 13:29 ` [PATCH v3 6/6] tools/libxl: Adjust datacopiers POLLHUP handling when the fd is also readable Ross Lagerwall
  4 siblings, 2 replies; 29+ messages in thread
From: Ross Lagerwall @ 2015-03-16 13:29 UTC (permalink / raw)
  To: xen-devel
  Cc: Ross Lagerwall, Wei Liu, Ian Jackson, Ian Campbell, Andrew Cooper

Currently a datacopier may source its data from an fd or local buffer, but its
destination must be an fd.  For migration v2, libxl needs to read from the
migration stream into a local buffer.

Implement a "read into local buffer" mode, invoked when readbuf is set and
writefd is -1.  On success, the callback passes the number of bytes read.

Signed-off-by: Ross Lagerwall <ross.lagerwall@citrix.com>
[Rewrite commit message]
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
CC: Ian Campbell <Ian.Campbell@citrix.com>
CC: Ian Jackson <Ian.Jackson@eu.citrix.com>
CC: Wei Liu <wei.liu2@citrix.com>
---
 tools/libxl/libxl_aoutils.c  | 59 +++++++++++++++++++++++++++-----------------
 tools/libxl/libxl_internal.h |  8 ++++--
 2 files changed, 42 insertions(+), 25 deletions(-)

diff --git a/tools/libxl/libxl_aoutils.c b/tools/libxl/libxl_aoutils.c
index dff5d71..a5a10fb 100644
--- a/tools/libxl/libxl_aoutils.c
+++ b/tools/libxl/libxl_aoutils.c
@@ -134,7 +134,7 @@ static void datacopier_check_state(libxl__egc *egc, libxl__datacopier_state *dc)
     STATE_AO_GC(dc->ao);
     int rc;
     
-    if (dc->used) {
+    if (dc->used && !dc->readbuf) {
         if (!libxl__ev_fd_isregistered(&dc->towrite)) {
             rc = libxl__ev_fd_register(gc, &dc->towrite, datacopier_writable,
                                        dc->writefd, POLLOUT);
@@ -148,7 +148,7 @@ static void datacopier_check_state(libxl__egc *egc, libxl__datacopier_state *dc)
     } else if (!libxl__ev_fd_isregistered(&dc->toread) ||
                dc->bytes_to_read == 0) {
         /* we have had eof */
-        datacopier_callback(egc, dc, 0, 0);
+        datacopier_callback(egc, dc, 0, dc->readbuf ? dc->used : 0);
         return;
     } else {
         /* nothing buffered, but still reading */
@@ -216,25 +216,31 @@ static void datacopier_readable(libxl__egc *egc, libxl__ev_fd *ev,
     }
     assert(revents & POLLIN);
     for (;;) {
-        while (dc->used >= dc->maxsz) {
-            libxl__datacopier_buf *rm = LIBXL_TAILQ_FIRST(&dc->bufs);
-            dc->used -= rm->used;
-            assert(dc->used >= 0);
-            LIBXL_TAILQ_REMOVE(&dc->bufs, rm, entry);
-            free(rm);
-        }
+        libxl__datacopier_buf *buf = NULL;
+        int r;
+
+        if (dc->readbuf) {
+            r = read(ev->fd, dc->readbuf + dc->used, dc->bytes_to_read);
+        } else {
+            while (dc->used >= dc->maxsz) {
+                libxl__datacopier_buf *rm = LIBXL_TAILQ_FIRST(&dc->bufs);
+                dc->used -= rm->used;
+                assert(dc->used >= 0);
+                LIBXL_TAILQ_REMOVE(&dc->bufs, rm, entry);
+                free(rm);
+            }
 
-        libxl__datacopier_buf *buf =
-            LIBXL_TAILQ_LAST(&dc->bufs, libxl__datacopier_bufs);
-        if (!buf || buf->used >= sizeof(buf->buf)) {
-            buf = malloc(sizeof(*buf));
-            if (!buf) libxl__alloc_failed(CTX, __func__, 1, sizeof(*buf));
-            buf->used = 0;
-            LIBXL_TAILQ_INSERT_TAIL(&dc->bufs, buf, entry);
-        }
-        int r = read(ev->fd, buf->buf + buf->used,
+            buf = LIBXL_TAILQ_LAST(&dc->bufs, libxl__datacopier_bufs);
+            if (!buf || buf->used >= sizeof(buf->buf)) {
+                buf = malloc(sizeof(*buf));
+                if (!buf) libxl__alloc_failed(CTX, __func__, 1, sizeof(*buf));
+                buf->used = 0;
+                LIBXL_TAILQ_INSERT_TAIL(&dc->bufs, buf, entry);
+            }
+            r = read(ev->fd, buf->buf + buf->used,
                      min_t(size_t, sizeof(buf->buf) - buf->used,
                            (dc->bytes_to_read == -1) ? SIZE_MAX : dc->bytes_to_read));
+        }
         if (r < 0) {
             if (errno == EINTR) continue;
             if (errno == EWOULDBLOCK) break;
@@ -257,9 +263,11 @@ static void datacopier_readable(libxl__egc *egc, libxl__ev_fd *ev,
                 return;
             }
         }
-        buf->used += r;
+        if (!dc->readbuf) {
+            buf->used += r;
+            assert(buf->used <= sizeof(buf->buf));
+        }
         dc->used += r;
-        assert(buf->used <= sizeof(buf->buf));
         if (dc->bytes_to_read > 0)
             dc->bytes_to_read -= r;
         if (dc->bytes_to_read == 0)
@@ -318,15 +326,20 @@ int libxl__datacopier_start(libxl__datacopier_state *dc)
 
     libxl__datacopier_init(dc);
 
+    assert(dc->readfd >= 0 || dc->writefd >= 0);
+    assert(!(dc->readbuf && dc->bytes_to_read == -1));
+
     if (dc->readfd >= 0) {
         rc = libxl__ev_fd_register(gc, &dc->toread, datacopier_readable,
                                    dc->readfd, POLLIN);
         if (rc) goto out;
     }
 
-    rc = libxl__ev_fd_register(gc, &dc->towrite, datacopier_writable,
-                               dc->writefd, POLLOUT);
-    if (rc) goto out;
+    if (dc->writefd >= 0) {
+        rc = libxl__ev_fd_register(gc, &dc->towrite, datacopier_writable,
+                                   dc->writefd, POLLOUT);
+        if (rc) goto out;
+    }
 
     return 0;
 
diff --git a/tools/libxl/libxl_internal.h b/tools/libxl/libxl_internal.h
index 58ac658..931c00c 100644
--- a/tools/libxl/libxl_internal.h
+++ b/tools/libxl/libxl_internal.h
@@ -2524,8 +2524,9 @@ typedef struct libxl__datacopier_buf libxl__datacopier_buf;
 
 /* onwrite==1 means failure happened when writing, logged, errnoval is valid
  * onwrite==0 means failure happened when reading
- *     errnoval==0 means we got eof and all data was written
- *     errnoval!=0 means we had a read error, logged
+ *     errnoval>=0 means we got eof and all data was written or number of bytes
+ *                 written when in read mode
+ *     errnoval<0 means we had a read error, logged
  * onwrite==-1 means some other internal failure, errnoval not valid, logged
  * If we get POLLHUP, we call callback_pollhup(..., onwrite, -1);
  * or if callback_pollhup==0 this is an internal failure, as above.
@@ -2550,6 +2551,9 @@ struct libxl__datacopier_state {
     FILE *log; /* gets a copy of everything */
     libxl__datacopier_callback *callback;
     libxl__datacopier_callback *callback_pollhup;
+    void *readbuf; /* Set this to read data into it without writing to an
+                      fd. The buffer should be at least as large as the
+                      bytes_to_read parameter, which should not be -1. */
     /* remaining fields are private to datacopier */
     libxl__ev_fd toread, towrite;
     ssize_t used;
-- 
2.1.0

^ permalink raw reply related	[flat|nested] 29+ messages in thread

* [PATCH v3 6/6] tools/libxl: Adjust datacopiers POLLHUP handling when the fd is also readable
  2015-03-16 13:29 [PATCH v3 1/6] tools/libxl: Introduce min and max macros Ross Lagerwall
                   ` (3 preceding siblings ...)
  2015-03-16 13:29 ` [PATCH v3 5/6] tools/libxl: Extend datacopier to support reading into a buffer Ross Lagerwall
@ 2015-03-16 13:29 ` Ross Lagerwall
  2015-03-18 11:31   ` Ian Campbell
  2015-03-26 15:20   ` Roger Pau Monné
  4 siblings, 2 replies; 29+ messages in thread
From: Ross Lagerwall @ 2015-03-16 13:29 UTC (permalink / raw)
  To: xen-devel; +Cc: Andrew Cooper, Ian Jackson, Ian Campbell, Wei Liu

From: Andrew Cooper <andrew.cooper3@citrix.com>

POLLHUP|POLLIN is a valid revent to receive when there is readable data in a
pipe, but the writable fd has been closed.  This occurs in migration v2 when
the legacy conversion process (which transforms the data inline) completes and
exits successfully.

In the case that there is data to read, suppress the POLLHUP.  POSIX states
that the hangup state is latched[1], which means it will reoccur on subsequent
poll() calls.  The datacopier is thus provided the opportunity to read until
EOF, if possible.

A POLLHUP on its own is treated exactly as before, indicating a different
error with the fd.

[1] http://pubs.opengroup.org/onlinepubs/9699919799/functions/poll.html

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
CC: Ian Campbell <Ian.Campbell@citrix.com>
CC: Ian Jackson <Ian.Jackson@eu.citrix.com>
CC: Wei Liu <wei.liu2@citrix.com>
---
 tools/libxl/libxl_aoutils.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/tools/libxl/libxl_aoutils.c b/tools/libxl/libxl_aoutils.c
index a5a10fb..0d4c8af 100644
--- a/tools/libxl/libxl_aoutils.c
+++ b/tools/libxl/libxl_aoutils.c
@@ -205,6 +205,9 @@ static void datacopier_readable(libxl__egc *egc, libxl__ev_fd *ev,
     libxl__datacopier_state *dc = CONTAINER_OF(ev, *dc, toread);
     STATE_AO_GC(dc->ao);
 
+    if ((revents & (POLLHUP|POLLIN)) == (POLLHUP|POLLIN))
+        revents &= ~POLLHUP;
+
     if (datacopier_pollhup_handled(egc, dc, revents, 0))
         return;
 
-- 
2.1.0

^ permalink raw reply related	[flat|nested] 29+ messages in thread

* Re: [PATCH v3 4/6] tools/libxl: Allow limiting amount copied by datacopier
  2015-03-16 13:29 ` [PATCH v3 4/6] tools/libxl: Allow limiting amount copied by datacopier Ross Lagerwall
@ 2015-03-18 11:12   ` Ian Campbell
  0 siblings, 0 replies; 29+ messages in thread
From: Ian Campbell @ 2015-03-18 11:12 UTC (permalink / raw)
  To: Ross Lagerwall; +Cc: Andrew Cooper, Ian Jackson, Wei Liu, xen-devel

On Mon, 2015-03-16 at 13:29 +0000, Ross Lagerwall wrote:
> Currently, a datacopier will unconditionally read until EOF on its read fd.
> 
> For migration v2, libxl needs to read records of a specific length out of the
> migration stream, without reading any further data.
> 
> Introduce a parameter, bytes_to_read, which may be used to stop the datacopier
> ahead of reaching EOF. If bytes_to_read is set to -1, then the datacopier will
> read until EOF.
> 
> Signed-off-by: Ross Lagerwall <ross.lagerwall@citrix.com>
> [Rewrite commit message]
> Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>

Acked-by: Ian Campbell <Ian.Campbell@citrix.com>

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH v3 5/6] tools/libxl: Extend datacopier to support reading into a buffer
  2015-03-16 13:29 ` [PATCH v3 5/6] tools/libxl: Extend datacopier to support reading into a buffer Ross Lagerwall
@ 2015-03-18 11:18   ` Ian Campbell
  2015-04-01 15:53   ` Ian Jackson
  1 sibling, 0 replies; 29+ messages in thread
From: Ian Campbell @ 2015-03-18 11:18 UTC (permalink / raw)
  To: Ross Lagerwall; +Cc: Andrew Cooper, Ian Jackson, Wei Liu, xen-devel

On Mon, 2015-03-16 at 13:29 +0000, Ross Lagerwall wrote:
> Currently a datacopier may source its data from an fd or local buffer, but its
> destination must be an fd.  For migration v2, libxl needs to read from the
> migration stream into a local buffer.
> 
> Implement a "read into local buffer" mode, invoked when readbuf is set and
> writefd is -1.  On success, the callback passes the number of bytes read.
> 
> Signed-off-by: Ross Lagerwall <ross.lagerwall@citrix.com>
> [Rewrite commit message]
> Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>

Acked-by: Ian Campbell <Ian.Campbell@citrix.com>

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH v3 6/6] tools/libxl: Adjust datacopiers POLLHUP handling when the fd is also readable
  2015-03-16 13:29 ` [PATCH v3 6/6] tools/libxl: Adjust datacopiers POLLHUP handling when the fd is also readable Ross Lagerwall
@ 2015-03-18 11:31   ` Ian Campbell
  2015-03-26 15:20   ` Roger Pau Monné
  1 sibling, 0 replies; 29+ messages in thread
From: Ian Campbell @ 2015-03-18 11:31 UTC (permalink / raw)
  To: Ross Lagerwall; +Cc: Andrew Cooper, Ian Jackson, Wei Liu, xen-devel

On Mon, 2015-03-16 at 13:29 +0000, Ross Lagerwall wrote:
> From: Andrew Cooper <andrew.cooper3@citrix.com>
> 
> POLLHUP|POLLIN is a valid revent to receive when there is readable data in a
> pipe, but the writable fd has been closed.  This occurs in migration v2 when
> the legacy conversion process (which transforms the data inline) completes and
> exits successfully.
> 
> In the case that there is data to read, suppress the POLLHUP.  POSIX states
> that the hangup state is latched[1], which means it will reoccur on subsequent
> poll() calls.  The datacopier is thus provided the opportunity to read until
> EOF, if possible.
> 
> A POLLHUP on its own is treated exactly as before, indicating a different
> error with the fd.
> 
> [1] http://pubs.opengroup.org/onlinepubs/9699919799/functions/poll.html
> 
> Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>

Acked-by: Ian Campbell <Ian.Campbell@citrix.com>

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH v3 6/6] tools/libxl: Adjust datacopiers POLLHUP handling when the fd is also readable
  2015-03-16 13:29 ` [PATCH v3 6/6] tools/libxl: Adjust datacopiers POLLHUP handling when the fd is also readable Ross Lagerwall
  2015-03-18 11:31   ` Ian Campbell
@ 2015-03-26 15:20   ` Roger Pau Monné
  2015-03-30 10:40     ` Ian Campbell
  1 sibling, 1 reply; 29+ messages in thread
From: Roger Pau Monné @ 2015-03-26 15:20 UTC (permalink / raw)
  To: Ross Lagerwall, xen-devel
  Cc: Andrew Cooper, Ian Jackson, Ian Campbell, Wei Liu

El 16/03/15 a les 14.29, Ross Lagerwall ha escrit:
> From: Andrew Cooper <andrew.cooper3@citrix.com>
> 
> POLLHUP|POLLIN is a valid revent to receive when there is readable data in a
> pipe, but the writable fd has been closed.  This occurs in migration v2 when
> the legacy conversion process (which transforms the data inline) completes and
> exits successfully.
> 
> In the case that there is data to read, suppress the POLLHUP.  POSIX states
> that the hangup state is latched[1], which means it will reoccur on subsequent
> poll() calls.  The datacopier is thus provided the opportunity to read until
> EOF, if possible.
> 
> A POLLHUP on its own is treated exactly as before, indicating a different
> error with the fd.
> 
> [1] http://pubs.opengroup.org/onlinepubs/9699919799/functions/poll.html

This patch breaks booting pygrub guests on FreeBSD, it seems to trigger 
some kind of error when pygrub exits:

libxl: error: libxl_bootloader.c:596:bootloader_copyfail: unexpected eof copying bootloader output
libxl: error: libxl_create.c:1129:domcreate_rebuild_done: cannot (re-)build domain: -3

I will try to look into it, have you tested if the same happens on 
Linux?

Roger.

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH v3 6/6] tools/libxl: Adjust datacopiers POLLHUP handling when the fd is also readable
  2015-03-26 15:20   ` Roger Pau Monné
@ 2015-03-30 10:40     ` Ian Campbell
  2015-04-01 10:34       ` Roger Pau Monné
  0 siblings, 1 reply; 29+ messages in thread
From: Ian Campbell @ 2015-03-30 10:40 UTC (permalink / raw)
  To: Roger Pau Monné
  Cc: Ross Lagerwall, Andrew Cooper, Ian Jackson, Wei Liu, xen-devel

On Thu, 2015-03-26 at 16:20 +0100, Roger Pau Monné wrote:
> El 16/03/15 a les 14.29, Ross Lagerwall ha escrit:
> > From: Andrew Cooper <andrew.cooper3@citrix.com>
> > 
> > POLLHUP|POLLIN is a valid revent to receive when there is readable data in a
> > pipe, but the writable fd has been closed.  This occurs in migration v2 when
> > the legacy conversion process (which transforms the data inline) completes and
> > exits successfully.
> > 
> > In the case that there is data to read, suppress the POLLHUP.  POSIX states
> > that the hangup state is latched[1], which means it will reoccur on subsequent
> > poll() calls.  The datacopier is thus provided the opportunity to read until
> > EOF, if possible.
> > 
> > A POLLHUP on its own is treated exactly as before, indicating a different
> > error with the fd.
> > 
> > [1] http://pubs.opengroup.org/onlinepubs/9699919799/functions/poll.html
> 
> This patch breaks booting pygrub guests on FreeBSD, it seems to trigger 
> some kind of error when pygrub exits:
> 
> libxl: error: libxl_bootloader.c:596:bootloader_copyfail: unexpected eof copying bootloader output
> libxl: error: libxl_create.c:1129:domcreate_rebuild_done: cannot (re-)build domain: -3
> 
> I will try to look into it, have you tested if the same happens on 
> Linux?

Ross/Andrew, have you tried this on Linux?

Roger, Does xl -vvv or even strace give any clues as to the sequences of
events which lead to the error?

Did/does commit 916735e1814d "tools/libxl: Restore errnoval behavior for
datacopier callback" make any difference?

Ian.


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH v3 6/6] tools/libxl: Adjust datacopiers POLLHUP handling when the fd is also readable
  2015-03-30 10:40     ` Ian Campbell
@ 2015-04-01 10:34       ` Roger Pau Monné
  2015-04-01 14:36         ` Andrew Cooper
  0 siblings, 1 reply; 29+ messages in thread
From: Roger Pau Monné @ 2015-04-01 10:34 UTC (permalink / raw)
  To: Ian Campbell
  Cc: Ross Lagerwall, Andrew Cooper, Ian Jackson, Wei Liu, xen-devel

Hello,

El 30/03/15 a les 12.40, Ian Campbell ha escrit:
> On Thu, 2015-03-26 at 16:20 +0100, Roger Pau Monné wrote:
>> El 16/03/15 a les 14.29, Ross Lagerwall ha escrit:
>>> From: Andrew Cooper <andrew.cooper3@citrix.com>
>>>
>>> POLLHUP|POLLIN is a valid revent to receive when there is readable data in a
>>> pipe, but the writable fd has been closed.  This occurs in migration v2 when
>>> the legacy conversion process (which transforms the data inline) completes and
>>> exits successfully.
>>>
>>> In the case that there is data to read, suppress the POLLHUP.  POSIX states
>>> that the hangup state is latched[1], which means it will reoccur on subsequent
>>> poll() calls.  The datacopier is thus provided the opportunity to read until
>>> EOF, if possible.
>>>
>>> A POLLHUP on its own is treated exactly as before, indicating a different
>>> error with the fd.
>>>
>>> [1] http://pubs.opengroup.org/onlinepubs/9699919799/functions/poll.html
>>
>> This patch breaks booting pygrub guests on FreeBSD, it seems to trigger 
>> some kind of error when pygrub exits:
>>
>> libxl: error: libxl_bootloader.c:596:bootloader_copyfail: unexpected eof copying bootloader output
>> libxl: error: libxl_create.c:1129:domcreate_rebuild_done: cannot (re-)build domain: -3
>>
>> I will try to look into it, have you tested if the same happens on 
>> Linux?
> 
> Ross/Andrew, have you tried this on Linux?
> 
> Roger, Does xl -vvv or even strace give any clues as to the sequences of
> events which lead to the error?

Here is the output of xl -vvv, it doesn't seem to contain any useful 
info apart from what I've already posted:

# xl -vvv create debian.cfg
Parsing config from debian.cfg
libxl: debug: libxl_create.c:1521:do_domain_create: ao 0x802834080: create: how=0x0 callback=0x0 poller=0x80281f0a0
libxl: debug: libxl_device.c:269:libxl__device_disk_set_backend: Disk vdev=xvda spec.backend=unknown
libxl: debug: libxl_device.c:298:libxl__device_disk_set_backend: Disk vdev=xvda, using backend phy
libxl: debug: libxl_create.c:924:initiate_domain_create: running bootloader
libxl: debug: libxl_device.c:269:libxl__device_disk_set_backend: Disk vdev=(null) spec.backend=phy
libxl: debug: libxl.c:3065:libxl__device_disk_local_initiate_attach: locally attaching PHY disk /dev/zvol/tank/debian
libxl: debug: libxl_bootloader.c:411:bootloader_disk_attached_cb: Config bootloader value: pygrub
libxl: debug: libxl_bootloader.c:427:bootloader_disk_attached_cb: Checking for bootloader in libexec path: /usr/local/lib/xen/bin/pygrub
libxl: debug: libxl_create.c:1537:do_domain_create: ao 0x802834080: inprogress: poller=0x80281f0a0, flags=i
libxl: debug: libxl_event.c:577:libxl__ev_xswatch_register: watch w=0x8028367a0 wpath=/local/domain/2 token=3/0: register slotnum=3
libxl: debug: libxl_event.c:1942:libxl__ao_progress_report: ao 0x802834080: progress report: ignored
libxl: debug: libxl_bootloader.c:539:bootloader_gotptys: executing bootloader: /usr/local/lib/xen/bin/pygrub
libxl: debug: libxl_bootloader.c:543:bootloader_gotptys:   bootloader arg: /usr/local/lib/xen/bin/pygrub
libxl: debug: libxl_bootloader.c:543:bootloader_gotptys:   bootloader arg: --output=/var/run/xen/bootloader.2.out
libxl: debug: libxl_bootloader.c:543:bootloader_gotptys:   bootloader arg: --output-format=simple0
libxl: debug: libxl_bootloader.c:543:bootloader_gotptys:   bootloader arg: --output-directory=/var/run/xen/bootloader.2.d
libxl: debug: libxl_bootloader.c:543:bootloader_gotptys:   bootloader arg: /dev/zvol/tank/debian
libxl: debug: libxl_event.c:514:watchfd_callback: watch w=0x8028367a0 wpath=/local/domain/2 token=3/0: event epath=/local/domain/2
libxl: error: libxl_bootloader.c:596:bootloader_copyfail: unexpected eof copying bootloader output
libxl: debug: libxl_bootloader.c:638:bootloader_finished: bootloader completed
libxl: debug: libxl_event.c:615:libxl__ev_xswatch_deregister: watch w=0x8028367a0 wpath=/local/domain/2 token=3/0: deregister slotnum=3
libxl: error: libxl_create.c:1138:domcreate_rebuild_done: cannot (re-)build domain: -3
libxl: info: libxl.c:1699:devices_destroy_cb: forked pid 763 for destroy of domain 2
libxl: debug: libxl_event.c:1766:libxl__ao_complete: ao 0x802834080: complete, rc=-3
libxl: debug: libxl_event.c:1738:libxl__ao__destroy: ao 0x802834080: destroy
xc: debug: hypercall buffer: total allocations:44 total releases:44
xc: debug: hypercall buffer: current allocations:0 maximum allocations:2
xc: debug: hypercall buffer: cache current size:2
xc: debug: hypercall buffer: cache hits:35 misses:2 toobig:7

And this is (I think) the relevant output from ktrace/kdump:

   766 xl       RET   read 20/0x14
   766 xl       CALL  read(0xc,0x80280702f,0x3cd)
   766 xl       RET   read -1 errno 35 Resource temporarily unavailable
   766 xl       CALL  gettimeofday(0x7fffffffde90,0)
   766 xl       RET   gettimeofday 0
   766 xl       CALL  poll(0x802816180,0x6,0xffffffff)
   766 xl       RET   poll 1
   766 xl       CALL  gettimeofday(0x7fffffffde90,0)
   766 xl       RET   gettimeofday 0
   766 xl       CALL  write(0x10,0x802807014,0x1b)
   766 xl       GIO   fd 16 wrote 27 bytes
       0x0000 1b5b 3f31 6c1b 3e1b 5b32 343b 3148 0000 0000 000d 1b5b  |.[?1l.>.[24;1H.......[|
       0x0016 3f31 6c1b 3e                                            |?1l.>|

   766 xl       RET   write 27/0x1b
   766 xl       CALL  gettimeofday(0x7fffffffde90,0)
   766 xl       RET   gettimeofday 0
   766 xl       CALL  poll(0x802816180,0x5,0xffffffff)
   766 xl       RET   poll 1
   766 xl       PSIG  SIGCHLD caught handler=0x80114bdf0 mask=0x0 code=CLD_EXITED
   766 xl       CALL  sigprocmask(SIG_SETMASK,0x7fffffffd81c,0)
   766 xl       RET   sigprocmask 0
   766 xl       CALL  write(0xe,0x800af1fd2,0x1)
   766 xl       GIO   fd 14 wrote 1 byte
       "\0"
   766 xl       RET   write 1
   766 xl       CALL  sigreturn(0x7fffffffd440)
   766 xl       RET   sigreturn JUSTRETURN
   766 xl       CALL  gettimeofday(0x7fffffffde90,0)
   766 xl       RET   gettimeofday 0
   766 xl       CALL  read(0xc,0x802807014,0x3e8)
   766 xl       GIO   fd 12 read 0 bytes
       ""
   766 xl       RET   read 0
   766 xl       CALL  write(0x2,0x7fffffffd0b0,0x7)
   766 xl       GIO   fd 2 wrote 7 bytes
       "libxl: "
   766 xl       RET   write 7
   766 xl       CALL  write(0x2,0x7fffffffd0b0,0x7)
   766 xl       GIO   fd 2 wrote 7 bytes
       "error: "

> Did/does commit 916735e1814d "tools/libxl: Restore errnoval behavior for
> datacopier callback" make any difference?

No, I still get the same error. I will try to take a closer look into this 
when I finish some pending stuff...

Roger.


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH v3 6/6] tools/libxl: Adjust datacopiers POLLHUP handling when the fd is also readable
  2015-04-01 10:34       ` Roger Pau Monné
@ 2015-04-01 14:36         ` Andrew Cooper
  2015-04-02 15:03           ` [PATCH 0/3] datacopier POLLHUP fixes " Ian Jackson
  0 siblings, 1 reply; 29+ messages in thread
From: Andrew Cooper @ 2015-04-01 14:36 UTC (permalink / raw)
  To: Roger Pau Monné, Ian Campbell
  Cc: Ross Lagerwall, Ian Jackson, Wei Liu, xen-devel

On 01/04/15 11:34, Roger Pau Monné wrote:
> Hello,
>
> El 30/03/15 a les 12.40, Ian Campbell ha escrit:
>> On Thu, 2015-03-26 at 16:20 +0100, Roger Pau Monné wrote:
>>> El 16/03/15 a les 14.29, Ross Lagerwall ha escrit:
>>>> From: Andrew Cooper <andrew.cooper3@citrix.com>
>>>>
>>>> POLLHUP|POLLIN is a valid revent to receive when there is readable data in a
>>>> pipe, but the writable fd has been closed.  This occurs in migration v2 when
>>>> the legacy conversion process (which transforms the data inline) completes and
>>>> exits successfully.
>>>>
>>>> In the case that there is data to read, suppress the POLLHUP.  POSIX states
>>>> that the hangup state is latched[1], which means it will reoccur on subsequent
>>>> poll() calls.  The datacopier is thus provided the opportunity to read until
>>>> EOF, if possible.
>>>>
>>>> A POLLHUP on its own is treated exactly as before, indicating a different
>>>> error with the fd.
>>>>
>>>> [1] http://pubs.opengroup.org/onlinepubs/9699919799/functions/poll.html
>>> This patch breaks booting pygrub guests on FreeBSD, it seems to trigger 
>>> some kind of error when pygrub exits:
>>>
>>> libxl: error: libxl_bootloader.c:596:bootloader_copyfail: unexpected eof copying bootloader output
>>> libxl: error: libxl_create.c:1129:domcreate_rebuild_done: cannot (re-)build domain: -3
>>>
>>> I will try to look into it, have you tested if the same happens on 
>>> Linux?
>> Ross/Andrew, have you tried this on Linux?
>>
>> Roger, Does xl -vvv or even strace give any clues as to the sequences of
>> events which lead to the error?
> Here is the output of xl -vvv, it doesn't seem to contain any useful 
> info apart from what I've already posted:
>
> # xl -vvv create debian.cfg
> Parsing config from debian.cfg
> libxl: debug: libxl_create.c:1521:do_domain_create: ao 0x802834080: create: how=0x0 callback=0x0 poller=0x80281f0a0
> libxl: debug: libxl_device.c:269:libxl__device_disk_set_backend: Disk vdev=xvda spec.backend=unknown
> libxl: debug: libxl_device.c:298:libxl__device_disk_set_backend: Disk vdev=xvda, using backend phy
> libxl: debug: libxl_create.c:924:initiate_domain_create: running bootloader
> libxl: debug: libxl_device.c:269:libxl__device_disk_set_backend: Disk vdev=(null) spec.backend=phy
> libxl: debug: libxl.c:3065:libxl__device_disk_local_initiate_attach: locally attaching PHY disk /dev/zvol/tank/debian
> libxl: debug: libxl_bootloader.c:411:bootloader_disk_attached_cb: Config bootloader value: pygrub
> libxl: debug: libxl_bootloader.c:427:bootloader_disk_attached_cb: Checking for bootloader in libexec path: /usr/local/lib/xen/bin/pygrub
> libxl: debug: libxl_create.c:1537:do_domain_create: ao 0x802834080: inprogress: poller=0x80281f0a0, flags=i
> libxl: debug: libxl_event.c:577:libxl__ev_xswatch_register: watch w=0x8028367a0 wpath=/local/domain/2 token=3/0: register slotnum=3
> libxl: debug: libxl_event.c:1942:libxl__ao_progress_report: ao 0x802834080: progress report: ignored
> libxl: debug: libxl_bootloader.c:539:bootloader_gotptys: executing bootloader: /usr/local/lib/xen/bin/pygrub
> libxl: debug: libxl_bootloader.c:543:bootloader_gotptys:   bootloader arg: /usr/local/lib/xen/bin/pygrub
> libxl: debug: libxl_bootloader.c:543:bootloader_gotptys:   bootloader arg: --output=/var/run/xen/bootloader.2.out
> libxl: debug: libxl_bootloader.c:543:bootloader_gotptys:   bootloader arg: --output-format=simple0
> libxl: debug: libxl_bootloader.c:543:bootloader_gotptys:   bootloader arg: --output-directory=/var/run/xen/bootloader.2.d
> libxl: debug: libxl_bootloader.c:543:bootloader_gotptys:   bootloader arg: /dev/zvol/tank/debian
> libxl: debug: libxl_event.c:514:watchfd_callback: watch w=0x8028367a0 wpath=/local/domain/2 token=3/0: event epath=/local/domain/2
> libxl: error: libxl_bootloader.c:596:bootloader_copyfail: unexpected eof copying bootloader output
> libxl: debug: libxl_bootloader.c:638:bootloader_finished: bootloader completed
> libxl: debug: libxl_event.c:615:libxl__ev_xswatch_deregister: watch w=0x8028367a0 wpath=/local/domain/2 token=3/0: deregister slotnum=3
> libxl: error: libxl_create.c:1138:domcreate_rebuild_done: cannot (re-)build domain: -3
> libxl: info: libxl.c:1699:devices_destroy_cb: forked pid 763 for destroy of domain 2
> libxl: debug: libxl_event.c:1766:libxl__ao_complete: ao 0x802834080: complete, rc=-3
> libxl: debug: libxl_event.c:1738:libxl__ao__destroy: ao 0x802834080: destroy
> xc: debug: hypercall buffer: total allocations:44 total releases:44
> xc: debug: hypercall buffer: current allocations:0 maximum allocations:2
> xc: debug: hypercall buffer: cache current size:2
> xc: debug: hypercall buffer: cache hits:35 misses:2 toobig:7
>
> And this is (I think) the relevant output from ktrace/kdump:
>
>    766 xl       RET   read 20/0x14
>    766 xl       CALL  read(0xc,0x80280702f,0x3cd)
>    766 xl       RET   read -1 errno 35 Resource temporarily unavailable
>    766 xl       CALL  gettimeofday(0x7fffffffde90,0)
>    766 xl       RET   gettimeofday 0
>    766 xl       CALL  poll(0x802816180,0x6,0xffffffff)
>    766 xl       RET   poll 1
>    766 xl       CALL  gettimeofday(0x7fffffffde90,0)
>    766 xl       RET   gettimeofday 0
>    766 xl       CALL  write(0x10,0x802807014,0x1b)
>    766 xl       GIO   fd 16 wrote 27 bytes
>        0x0000 1b5b 3f31 6c1b 3e1b 5b32 343b 3148 0000 0000 000d 1b5b  |.[?1l.>.[24;1H.......[|
>        0x0016 3f31 6c1b 3e                                            |?1l.>|
>
>    766 xl       RET   write 27/0x1b
>    766 xl       CALL  gettimeofday(0x7fffffffde90,0)
>    766 xl       RET   gettimeofday 0
>    766 xl       CALL  poll(0x802816180,0x5,0xffffffff)
>    766 xl       RET   poll 1
>    766 xl       PSIG  SIGCHLD caught handler=0x80114bdf0 mask=0x0 code=CLD_EXITED
>    766 xl       CALL  sigprocmask(SIG_SETMASK,0x7fffffffd81c,0)
>    766 xl       RET   sigprocmask 0
>    766 xl       CALL  write(0xe,0x800af1fd2,0x1)
>    766 xl       GIO   fd 14 wrote 1 byte
>        "\0"
>    766 xl       RET   write 1
>    766 xl       CALL  sigreturn(0x7fffffffd440)
>    766 xl       RET   sigreturn JUSTRETURN
>    766 xl       CALL  gettimeofday(0x7fffffffde90,0)
>    766 xl       RET   gettimeofday 0
>    766 xl       CALL  read(0xc,0x802807014,0x3e8)
>    766 xl       GIO   fd 12 read 0 bytes
>        ""
>    766 xl       RET   read 0
>    766 xl       CALL  write(0x2,0x7fffffffd0b0,0x7)
>    766 xl       GIO   fd 2 wrote 7 bytes
>        "libxl: "
>    766 xl       RET   write 7
>    766 xl       CALL  write(0x2,0x7fffffffd0b0,0x7)
>    766 xl       GIO   fd 2 wrote 7 bytes
>        "error: "
>
>> Did/does commit 916735e1814d "tools/libxl: Restore errnoval behavior for
>> datacopier callback" make any difference?
> No, I still get the same error. I will try to take a closer look into this 
> when I finish some pending stuff...

This works on Linux

libxl: debug: libxl_event.c:577:libxl__ev_xswatch_register: watch
w=0xb50870 wpath=/local/domain/11 token=3/0: register slotnum=3
libxl: debug: libxl_event.c:1942:libxl__ao_progress_report: ao 0xb53700:
progress report: ignored
libxl: debug: libxl_bootloader.c:539:bootloader_gotptys: executing
bootloader: /usr/lib/xen/bin/pygrub
libxl: debug: libxl_bootloader.c:543:bootloader_gotptys:   bootloader
arg: /usr/lib/xen/bin/pygrub
libxl: debug: libxl_bootloader.c:543:bootloader_gotptys:   bootloader
arg: --args=root=/dev/xvda2 ro
libxl: debug: libxl_bootloader.c:543:bootloader_gotptys:   bootloader
arg: --output=/var/run/xen/bootloader.11.out
libxl: debug: libxl_bootloader.c:543:bootloader_gotptys:   bootloader
arg: --output-format=simple0
libxl: debug: libxl_bootloader.c:543:bootloader_gotptys:   bootloader
arg: --output-directory=/var/run/xen/bootloader.11.d
libxl: debug: libxl_bootloader.c:543:bootloader_gotptys:   bootloader
arg: /dev/vm/andrewcoop-jessie-disk
libxl: debug: libxl_event.c:514:watchfd_callback: watch w=0xb50870
wpath=/local/domain/11 token=3/0: event epath=/local/domain/11
libxl: debug: libxl_bootloader.c:638:bootloader_finished: bootloader
completed
libxl: debug: libxl_bootloader.c:138:bootloader_result_command:
bootloader output contained kernel
/var/run/xen/bootloader.11.d/boot_kernel.t2OsEV
libxl: debug: libxl_bootloader.c:138:bootloader_result_command:
bootloader output contained ramdisk
/var/run/xen/bootloader.11.d/boot_ramdisk.3i2X1b
libxl: debug: libxl_bootloader.c:138:bootloader_result_command:
bootloader output contained args root=/dev/xvda2 ro elevator=noop
root=/dev/xvda2 ro
libxl: debug: libxl_bootloader.c:651:bootloader_finished: bootloader
execution successful


As 7e9ec5 has positively been identified as the cause of the regression
on FreeBSD, that presumably means that FreeBSD won't permit a read() of
an FD which has POLLHUP set.

POSIX appears to specifically allow this action.

~Andrew

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH v3 5/6] tools/libxl: Extend datacopier to support reading into a buffer
  2015-03-16 13:29 ` [PATCH v3 5/6] tools/libxl: Extend datacopier to support reading into a buffer Ross Lagerwall
  2015-03-18 11:18   ` Ian Campbell
@ 2015-04-01 15:53   ` Ian Jackson
  2015-04-01 15:59     ` Ian Campbell
  1 sibling, 1 reply; 29+ messages in thread
From: Ian Jackson @ 2015-04-01 15:53 UTC (permalink / raw)
  To: Ross Lagerwall; +Cc: Andrew Cooper, Wei Liu, Ian Campbell, xen-devel

Ross Lagerwall writes ("[PATCH v3 5/6] tools/libxl: Extend datacopier to support reading into a buffer"):
> Currently a datacopier may source its data from an fd or local buffer, but \
its
> destination must be an fd.  For migration v2, libxl needs to read from the
> migration stream into a local buffer.

Can you please wrap your commit messages to 70 columns or less.

> Implement a "read into local buffer" mode, invoked when readbuf is set and
> writefd is -1.  On success, the callback passes the number of bytes read.
...
> diff --git a/tools/libxl/libxl_internal.h b/tools/libxl/libxl_internal.h
> index 58ac658..931c00c 100644
> --- a/tools/libxl/libxl_internal.h
> +++ b/tools/libxl/libxl_internal.h
> @@ -2524,8 +2524,9 @@ typedef struct libxl__datacopier_buf libxl__datacopier_buf;
>  
>  /* onwrite==1 means failure happened when writing, logged, errnoval is valid
>   * onwrite==0 means failure happened when reading
> - *     errnoval==0 means we got eof and all data was written
> - *     errnoval!=0 means we had a read error, logged
> + *     errnoval>=0 means we got eof and all data was written or number of bytes
> + *                 written when in read mode
> + *     errnoval<0 means we had a read error, logged

I don't like this API.  Are you encoding a size in errnoval ?

Also, what does "errnoval<0 means we had a read error, logged" mean
for the actual value of errnoval ?  Is it a negated errno value ?

I think negated errno values should be entirely excluded from userland
code.  They just cause confusion.

Perhaps it would be better to include "bytes read" and "bytes written"
fields in the dc state structure.

Ian.

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH v3 5/6] tools/libxl: Extend datacopier to support reading into a buffer
  2015-04-01 15:53   ` Ian Jackson
@ 2015-04-01 15:59     ` Ian Campbell
  0 siblings, 0 replies; 29+ messages in thread
From: Ian Campbell @ 2015-04-01 15:59 UTC (permalink / raw)
  To: Ian Jackson; +Cc: Ross Lagerwall, Andrew Cooper, Wei Liu, xen-devel

On Wed, 2015-04-01 at 16:53 +0100, Ian Jackson wrote:
> >  /* onwrite==1 means failure happened when writing, logged, errnoval is valid
> >   * onwrite==0 means failure happened when reading
> > - *     errnoval==0 means we got eof and all data was written
> > - *     errnoval!=0 means we had a read error, logged
> > + *     errnoval>=0 means we got eof and all data was written or number of bytes
> > + *                 written when in read mode
> > + *     errnoval<0 means we had a read error, logged
> 
> I don't like this API.  Are you encoding a size in errnoval ?

Note that this aspect has since been reverted since it also broke
things. Ross was going to look for another way to achieve it, and this:

> Perhaps it would be better to include "bytes read" and "bytes written"
> fields in the dc state structure.

Sounds like a very plausible answer.

Ian.

^ permalink raw reply	[flat|nested] 29+ messages in thread

* [PATCH 0/3] datacopier POLLHUP fixes handling when the fd is also readable
  2015-04-01 14:36         ` Andrew Cooper
@ 2015-04-02 15:03           ` Ian Jackson
  2015-04-02 15:04             ` [PATCH 1/3] Revert "tools/libxl: Adjust datacopiers POLLHUP handling when the fd is also readable" Ian Jackson
                               ` (3 more replies)
  0 siblings, 4 replies; 29+ messages in thread
From: Ian Jackson @ 2015-04-02 15:03 UTC (permalink / raw)
  To: Andrew Cooper
  Cc: Ross Lagerwall, xen-devel, Wei Liu, Ian Campbell, Roger Pau Monné

Andrew Cooper writes ("Re: [Xen-devel] [PATCH v3 6/6] tools/libxl: Adjust datacopiers POLLHUP handling when the fd is also readable"):
> > 
> > libxl: error: libxl_bootloader.c:596:bootloader_copyfail:
> > unexpected eof copying bootloader output
> > 
> > libxl: debug: libxl_bootloader.c:638:bootloader_finished:
> > bootloader completed

This shows that the problem is with 7e9ec50b0535.  The bootloader code
relies on the difference between POLLHUP[|POLLIN] and just POLLIN.

I think this small series will fix the problem.  The commit messages
have more detailed explanations.

Note that I have NOT EXECUTED this code.  I have neither a FreeBSD
dom0 nor the migration v2 patches to hand.

I propose to push 1/3 (the revert of 7e9ec50b0535) immediately.
2/3 and 3/3 should await testing.

Andrew, do you have the capability to test 2/3 ?

3/3 is hard to test because the condition is probably very hard to
reproduce.  I will see if I can have a go.

Thanks,
Ian.

^ permalink raw reply	[flat|nested] 29+ messages in thread

* [PATCH 1/3] Revert "tools/libxl: Adjust datacopiers POLLHUP handling when the fd is also readable"
  2015-04-02 15:03           ` [PATCH 0/3] datacopier POLLHUP fixes " Ian Jackson
@ 2015-04-02 15:04             ` Ian Jackson
  2015-04-02 15:08               ` Ian Campbell
                                 ` (2 more replies)
  2015-04-02 15:04             ` [PATCH 2/3] libxl: Cope with pipes which signal POLLHUP|POLLIN on read eof Ian Jackson
                               ` (2 subsequent siblings)
  3 siblings, 3 replies; 29+ messages in thread
From: Ian Jackson @ 2015-04-02 15:04 UTC (permalink / raw)
  To: xen-devel
  Cc: Wei Liu, Ian Campbell, Andrew Cooper, Ian Jackson,
	Ross Lagerwall, Roger Pau Monné

The bootloader code is relying on detecting POLLHUP, and 7e9ec50b
breaks that.  7e9ec50b, when handling a pty master, violates the
specification of the datacopier interface (as defined).

When the bootloader exits, several things change, all at once:
 (a) The master pty fd (held by libxl) starts to signal POLLHUP
    and maybe also POLLIN.
 (b) The child exits (so that the SIGCHLD self-pipe signals POLLIN,
    which will be handled by the libxl child process code.
 (c) reads on the master pty fd start to return EOF

From the point of view of the datacopier these might happen in any
order.  I think there is a latent bug with (c), which I will discuss
later in this email.

In a recent bug report from a FreeBSD installation, the datacopier
gets told about (a) before (b).  But 7e9ec50b filters the POLLHUP out,
so that the dc signals eof rather than hup.  As a result in
bootloader_copyfail we take the error path.

This reverts commit 7e9ec50b0535bf2630da9d279a060775817d136d.

Signed-off-by: Ian Jackson <ian.jackson@eu.citrix.com>
CC: Andrew Cooper <andrew.cooper3@citrix.com>
CC: Roger Pau Monné <roger.pau@citrix.com>
CC: Ian Campbell <ian.campbell@citrix.com>
CC: Ross Lagerwall <ross.lagerwall@citrix.com>
CC: Wei Liu <wei.liu2@citrix.com>
---
 tools/libxl/libxl_aoutils.c |    3 ---
 1 file changed, 3 deletions(-)

diff --git a/tools/libxl/libxl_aoutils.c b/tools/libxl/libxl_aoutils.c
index da102a0..3942634 100644
--- a/tools/libxl/libxl_aoutils.c
+++ b/tools/libxl/libxl_aoutils.c
@@ -205,9 +205,6 @@ static void datacopier_readable(libxl__egc *egc, libxl__ev_fd *ev,
     libxl__datacopier_state *dc = CONTAINER_OF(ev, *dc, toread);
     STATE_AO_GC(dc->ao);
 
-    if ((revents & (POLLHUP|POLLIN)) == (POLLHUP|POLLIN))
-        revents &= ~POLLHUP;
-
     if (datacopier_pollhup_handled(egc, dc, revents, 0))
         return;
 
-- 
1.7.10.4


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

^ permalink raw reply related	[flat|nested] 29+ messages in thread

* [PATCH 2/3] libxl: Cope with pipes which signal POLLHUP|POLLIN on read eof
  2015-04-02 15:03           ` [PATCH 0/3] datacopier POLLHUP fixes " Ian Jackson
  2015-04-02 15:04             ` [PATCH 1/3] Revert "tools/libxl: Adjust datacopiers POLLHUP handling when the fd is also readable" Ian Jackson
@ 2015-04-02 15:04             ` Ian Jackson
  2015-04-02 15:13               ` Ian Campbell
  2015-04-02 15:04             ` [PATCH 3/3] libxl: datacopier: Avoid theoretical eof/POLLHUP race Ian Jackson
  2015-04-02 15:09             ` [PATCH 0/3] datacopier POLLHUP fixes handling when the fd is also readable Ian Jackson
  3 siblings, 1 reply; 29+ messages in thread
From: Ian Jackson @ 2015-04-02 15:04 UTC (permalink / raw)
  To: xen-devel
  Cc: Wei Liu, Ian Campbell, Andrew Cooper, Ian Jackson,
	Ross Lagerwall, Roger Pau Monné

Some operating systems (including FreeBSD[1]) signal not just POLLIN
when a reading pipe reaches EOF, but also POLLHUP.  This is
permitted[2].  But datacopiers mishandle this, because they always
treat POLLHUP exceptionally (either reporting it via callback_pollhup,
or treating it as an error).  datacopiers reading from pipes on such
OSs can fail (perhaps leaving some data unprocessed) rather than
completing successfully.

[1] http://www.greenend.org.uk/rjk/tech/poll.html
[2] http://pubs.opengroup.org/onlinepubs/9699919799/functions/poll.html

Most callers in libxl do not care about POLLHUP except as an error
condition.

So change the datacopier semantics so that unless callback_pollhup is
specified we treat POLLHUP|POLLIN as if it were POLLIN - and, read all
available data).

This fixes the problem which 7e9ec50b0535 was aimed at.

Signed-off-by: Ian Jackson <Ian.Jackson@eu.citrix.com>
CC: Andrew Cooper <andrew.cooper3@citrix.com>
CC: Roger Pau Monné <roger.pau@citrix.com>
CC: Ian Campbell <ian.campbell@citrix.com>
CC: Ross Lagerwall <ross.lagerwall@citrix.com>
CC: Wei Liu <wei.liu2@citrix.com>
---
 tools/libxl/libxl_aoutils.c  |    2 +-
 tools/libxl/libxl_internal.h |    3 ++-
 2 files changed, 3 insertions(+), 2 deletions(-)

diff --git a/tools/libxl/libxl_aoutils.c b/tools/libxl/libxl_aoutils.c
index 3942634..c3232a6 100644
--- a/tools/libxl/libxl_aoutils.c
+++ b/tools/libxl/libxl_aoutils.c
@@ -208,7 +208,7 @@ static void datacopier_readable(libxl__egc *egc, libxl__ev_fd *ev,
     if (datacopier_pollhup_handled(egc, dc, revents, 0))
         return;
 
-    if (revents & ~POLLIN) {
+    if (revents & ~POLLIN && revents != (POLLIN|POLLHUP)) {
         LOG(ERROR, "unexpected poll event 0x%x (should be POLLIN)"
             " on %s during copy of %s", revents, dc->readwhat, dc->copywhat);
         datacopier_callback(egc, dc, -1, 0);
diff --git a/tools/libxl/libxl_internal.h b/tools/libxl/libxl_internal.h
index 7f81dd3..c6969f0 100644
--- a/tools/libxl/libxl_internal.h
+++ b/tools/libxl/libxl_internal.h
@@ -2535,7 +2535,8 @@ typedef struct libxl__datacopier_buf libxl__datacopier_buf;
  *     errnoval!=0 means we had a read error, logged
  * onwrite==-1 means some other internal failure, errnoval not valid, logged
  * If we get POLLHUP, we call callback_pollhup(..., onwrite, -1);
- * or if callback_pollhup==0 this is an internal failure, as above.
+ * or if callback_pollhup==0 this is treated as eof (if POLLIN|POLLHUP
+ * on the reading fd) or an internal failure (otherwise), as above.
  * In all cases copier is killed before calling this callback */
 typedef void libxl__datacopier_callback(libxl__egc *egc,
      libxl__datacopier_state *dc, int onwrite, int errnoval);
-- 
1.7.10.4


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

^ permalink raw reply related	[flat|nested] 29+ messages in thread

* [PATCH 3/3] libxl: datacopier: Avoid theoretical eof/POLLHUP race
  2015-04-02 15:03           ` [PATCH 0/3] datacopier POLLHUP fixes " Ian Jackson
  2015-04-02 15:04             ` [PATCH 1/3] Revert "tools/libxl: Adjust datacopiers POLLHUP handling when the fd is also readable" Ian Jackson
  2015-04-02 15:04             ` [PATCH 2/3] libxl: Cope with pipes which signal POLLHUP|POLLIN on read eof Ian Jackson
@ 2015-04-02 15:04             ` Ian Jackson
  2015-04-02 15:15               ` Ian Campbell
  2015-04-02 16:29               ` Ian Jackson
  2015-04-02 15:09             ` [PATCH 0/3] datacopier POLLHUP fixes handling when the fd is also readable Ian Jackson
  3 siblings, 2 replies; 29+ messages in thread
From: Ian Jackson @ 2015-04-02 15:04 UTC (permalink / raw)
  To: xen-devel
  Cc: Wei Liu, Ian Campbell, Andrew Cooper, Ian Jackson,
	Ross Lagerwall, Roger Pau Monné

When the bootloader exits, several things change, all at once:
 (a) The master pty fd (held by libxl) starts to signal POLLHUP
    and maybe also POLLIN.
 (b) The child exits (so that the SIGCHLD self-pipe signals POLLIN,
    which will be handled by the libxl child process code.
 (c) reads on the master pty fd start to return EOF

From the point of view of the datacopier these might happen in any
order.

(c) can be detected only after a previous POLLIN without POLLHUP and
that previous POLLIN would be associated with data which was read,
which must therefore have ended up in the dc's buffer.  But nothing
stops the dc from writing that data into the output fd and reporting
eof before it calls poll again.

This race is unlikely.  Indeed it might be actually 100% precluded, by
luck, by the current organisation of the fd handling code.  But
nevertheless it should be fixed.

We solve the race with a poll of the reading fd, to double-check, when
we detect eof via read.  (This is only necessary if the caller has
specified callback_pollhup, as otherwise POLLHUP|POLLIN - and,
presumably, POLLIN followed perhaps by POLLHUP|POLLIN, is to be
treated as eof anyway.)

Signed-off-by: Ian Jackson <Ian.Jackson@eu.citrix.com>
CC: Andrew Cooper <andrew.cooper3@citrix.com>
CC: Roger Pau Monné <roger.pau@citrix.com>
CC: Ian Campbell <ian.campbell@citrix.com>
CC: Ross Lagerwall <ross.lagerwall@citrix.com>
CC: Wei Liu <wei.liu2@citrix.com>
---
 tools/libxl/libxl_aoutils.c |   16 ++++++++++++++++
 1 file changed, 16 insertions(+)

diff --git a/tools/libxl/libxl_aoutils.c b/tools/libxl/libxl_aoutils.c
index c3232a6..9460fb4 100644
--- a/tools/libxl/libxl_aoutils.c
+++ b/tools/libxl/libxl_aoutils.c
@@ -250,6 +250,22 @@ static void datacopier_readable(libxl__egc *egc, libxl__ev_fd *ev,
             return;
         }
         if (r == 0) {
+            if (dc->callback_pollhup) {
+                /* It might be that this "eof" is actually a HUP.  If
+                 * the caller cares about the difference,
+                 * double-check using poll(2). */
+                struct pollfd hupchk;
+                hupchk.fd = ev->fd;
+                hupchk.events = POLLIN;
+                hupchk.revents = 0;
+                r = poll(&hupchk, 1, 0);
+                if (r < 0)
+                    LIBXL__EVENT_DISASTER(egc,
+     "unexpected failure polling fd for datacopier eof hup check",
+                                  errno, 0);
+                if (datacopier_pollhup_handled(egc, dc, hupchk.revents, 0))
+                    return;
+            }
             libxl__ev_fd_deregister(gc, &dc->toread);
             break;
         }
-- 
1.7.10.4


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

^ permalink raw reply related	[flat|nested] 29+ messages in thread

* Re: [PATCH 1/3] Revert "tools/libxl: Adjust datacopiers POLLHUP handling when the fd is also readable"
  2015-04-02 15:04             ` [PATCH 1/3] Revert "tools/libxl: Adjust datacopiers POLLHUP handling when the fd is also readable" Ian Jackson
@ 2015-04-02 15:08               ` Ian Campbell
  2015-04-02 15:27                 ` Ian Jackson
  2015-04-02 15:10               ` Andrew Cooper
  2015-04-02 15:11               ` Ian Jackson
  2 siblings, 1 reply; 29+ messages in thread
From: Ian Campbell @ 2015-04-02 15:08 UTC (permalink / raw)
  To: Ian Jackson
  Cc: Andrew Cooper, xen-devel, Ross Lagerwall, Wei Liu, Roger Pau Monné

On Thu, 2015-04-02 at 16:04 +0100, Ian Jackson wrote:
> The bootloader code is relying on detecting POLLHUP, and 7e9ec50b
> breaks that.  7e9ec50b, when handling a pty master, violates the
> specification of the datacopier interface (as defined).
> 
> When the bootloader exits, several things change, all at once:
>  (a) The master pty fd (held by libxl) starts to signal POLLHUP
>     and maybe also POLLIN.
>  (b) The child exits (so that the SIGCHLD self-pipe signals POLLIN,
>     which will be handled by the libxl child process code.
>  (c) reads on the master pty fd start to return EOF
> 
> From the point of view of the datacopier these might happen in any
> order.  I think there is a latent bug with (c), which I will discuss
> later in this email.

later in another mail maybe?

> In a recent bug report from a FreeBSD installation, the datacopier
> gets told about (a) before (b).  But 7e9ec50b filters the POLLHUP out,
> so that the dc signals eof rather than hup.  As a result in
> bootloader_copyfail we take the error path.
> 
> This reverts commit 7e9ec50b0535bf2630da9d279a060775817d136d.
> 
> Signed-off-by: Ian Jackson <ian.jackson@eu.citrix.com>
> CC: Andrew Cooper <andrew.cooper3@citrix.com>
> CC: Roger Pau Monné <roger.pau@citrix.com>

Acked-by: Ian Campbell <ian.campbell@citrix.com>

> CC: Ross Lagerwall <ross.lagerwall@citrix.com>
> CC: Wei Liu <wei.liu2@citrix.com>
> ---
>  tools/libxl/libxl_aoutils.c |    3 ---
>  1 file changed, 3 deletions(-)
> 
> diff --git a/tools/libxl/libxl_aoutils.c b/tools/libxl/libxl_aoutils.c
> index da102a0..3942634 100644
> --- a/tools/libxl/libxl_aoutils.c
> +++ b/tools/libxl/libxl_aoutils.c
> @@ -205,9 +205,6 @@ static void datacopier_readable(libxl__egc *egc, libxl__ev_fd *ev,
>      libxl__datacopier_state *dc = CONTAINER_OF(ev, *dc, toread);
>      STATE_AO_GC(dc->ao);
>  
> -    if ((revents & (POLLHUP|POLLIN)) == (POLLHUP|POLLIN))
> -        revents &= ~POLLHUP;
> -
>      if (datacopier_pollhup_handled(egc, dc, revents, 0))
>          return;
>  



_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH 0/3] datacopier POLLHUP fixes handling when the fd is also readable
  2015-04-02 15:03           ` [PATCH 0/3] datacopier POLLHUP fixes " Ian Jackson
                               ` (2 preceding siblings ...)
  2015-04-02 15:04             ` [PATCH 3/3] libxl: datacopier: Avoid theoretical eof/POLLHUP race Ian Jackson
@ 2015-04-02 15:09             ` Ian Jackson
  3 siblings, 0 replies; 29+ messages in thread
From: Ian Jackson @ 2015-04-02 15:09 UTC (permalink / raw)
  To: Andrew Cooper, Roger Pau Monné,
	Ian Campbell, Ross Lagerwall, xen-devel, Wei Liu

Ian Jackson writes ("[PATCH 0/3] datacopier POLLHUP fixes handling when the fd is also readable"):
> I think this small series will fix the problem.  The commit messages
> have more detailed explanations.

Now here:
  http://xenbits.xen.org/gitweb/?p=people/iwj/xen.git;a=shortlog;h=refs/heads/wip.datacopier-pollhup-fixes.v1
  xenbits.xen.org:/home/iwj/ext/xen.git#wip.datacopier-pollhup-fixes.v1

Ian.

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH 1/3] Revert "tools/libxl: Adjust datacopiers POLLHUP handling when the fd is also readable"
  2015-04-02 15:04             ` [PATCH 1/3] Revert "tools/libxl: Adjust datacopiers POLLHUP handling when the fd is also readable" Ian Jackson
  2015-04-02 15:08               ` Ian Campbell
@ 2015-04-02 15:10               ` Andrew Cooper
  2015-04-02 15:11               ` Ian Jackson
  2 siblings, 0 replies; 29+ messages in thread
From: Andrew Cooper @ 2015-04-02 15:10 UTC (permalink / raw)
  To: Ian Jackson, xen-devel
  Cc: Ross Lagerwall, Wei Liu, Ian Campbell, Roger Pau Monné

On 02/04/15 16:04, Ian Jackson wrote:
> The bootloader code is relying on detecting POLLHUP, and 7e9ec50b
> breaks that.  7e9ec50b, when handling a pty master, violates the
> specification of the datacopier interface (as defined).
>
> When the bootloader exits, several things change, all at once:
>  (a) The master pty fd (held by libxl) starts to signal POLLHUP
>     and maybe also POLLIN.
>  (b) The child exits (so that the SIGCHLD self-pipe signals POLLIN,
>     which will be handled by the libxl child process code.
>  (c) reads on the master pty fd start to return EOF
>
> From the point of view of the datacopier these might happen in any
> order.  I think there is a latent bug with (c), which I will discuss
> later in this email.
>
> In a recent bug report from a FreeBSD installation, the datacopier
> gets told about (a) before (b).  But 7e9ec50b filters the POLLHUP out,
> so that the dc signals eof rather than hup.  As a result in
> bootloader_copyfail we take the error path.
>
> This reverts commit 7e9ec50b0535bf2630da9d279a060775817d136d.
>
> Signed-off-by: Ian Jackson <ian.jackson@eu.citrix.com>
> CC: Andrew Cooper <andrew.cooper3@citrix.com>

Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>

> CC: Roger Pau Monné <roger.pau@citrix.com>
> CC: Ian Campbell <ian.campbell@citrix.com>
> CC: Ross Lagerwall <ross.lagerwall@citrix.com>
> CC: Wei Liu <wei.liu2@citrix.com>
> ---
>  tools/libxl/libxl_aoutils.c |    3 ---
>  1 file changed, 3 deletions(-)
>
> diff --git a/tools/libxl/libxl_aoutils.c b/tools/libxl/libxl_aoutils.c
> index da102a0..3942634 100644
> --- a/tools/libxl/libxl_aoutils.c
> +++ b/tools/libxl/libxl_aoutils.c
> @@ -205,9 +205,6 @@ static void datacopier_readable(libxl__egc *egc, libxl__ev_fd *ev,
>      libxl__datacopier_state *dc = CONTAINER_OF(ev, *dc, toread);
>      STATE_AO_GC(dc->ao);
>  
> -    if ((revents & (POLLHUP|POLLIN)) == (POLLHUP|POLLIN))
> -        revents &= ~POLLHUP;
> -
>      if (datacopier_pollhup_handled(egc, dc, revents, 0))
>          return;
>  


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH 1/3] Revert "tools/libxl: Adjust datacopiers POLLHUP handling when the fd is also readable"
  2015-04-02 15:04             ` [PATCH 1/3] Revert "tools/libxl: Adjust datacopiers POLLHUP handling when the fd is also readable" Ian Jackson
  2015-04-02 15:08               ` Ian Campbell
  2015-04-02 15:10               ` Andrew Cooper
@ 2015-04-02 15:11               ` Ian Jackson
  2 siblings, 0 replies; 29+ messages in thread
From: Ian Jackson @ 2015-04-02 15:11 UTC (permalink / raw)
  To: xen-devel, Andrew Cooper, Roger Pau Monné,
	Ian Campbell, Ross Lagerwall, Wei Liu

Ian Jackson writes ("[PATCH 1/3] Revert "tools/libxl: Adjust datacopiers POLLHUP handling when the fd is also readable""):
> The bootloader code is relying on detecting POLLHUP, and 7e9ec50b
> breaks that.  7e9ec50b, when handling a pty master, violates the
> specification of the datacopier interface (as defined).
...
> This reverts commit 7e9ec50b0535bf2630da9d279a060775817d136d.

Now cherry picked and pushed onto staging, after consultation with
Andrew.

Thanks,
Ian.

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH 2/3] libxl: Cope with pipes which signal POLLHUP|POLLIN on read eof
  2015-04-02 15:04             ` [PATCH 2/3] libxl: Cope with pipes which signal POLLHUP|POLLIN on read eof Ian Jackson
@ 2015-04-02 15:13               ` Ian Campbell
  0 siblings, 0 replies; 29+ messages in thread
From: Ian Campbell @ 2015-04-02 15:13 UTC (permalink / raw)
  To: Ian Jackson
  Cc: Andrew Cooper, xen-devel, Ross Lagerwall, Wei Liu, Roger Pau Monné

On Thu, 2015-04-02 at 16:04 +0100, Ian Jackson wrote:
> Some operating systems (including FreeBSD[1]) signal not just POLLIN
> when a reading pipe reaches EOF, but also POLLHUP.  This is
> permitted[2].  But datacopiers mishandle this, because they always
> treat POLLHUP exceptionally (either reporting it via callback_pollhup,
> or treating it as an error).  datacopiers reading from pipes on such
> OSs can fail (perhaps leaving some data unprocessed) rather than
> completing successfully.
> 
> [1] http://www.greenend.org.uk/rjk/tech/poll.html
> [2] http://pubs.opengroup.org/onlinepubs/9699919799/functions/poll.html
> 
> Most callers in libxl do not care about POLLHUP except as an error
> condition.
> 
> So change the datacopier semantics so that unless callback_pollhup is
> specified we treat POLLHUP|POLLIN as if it were POLLIN - and, read all
> available data).

Punctuation is odd towards the end, a stray looking "-" and an unmatched
")". 

> This fixes the problem which 7e9ec50b0535 was aimed at.
> 
> Signed-off-by: Ian Jackson <Ian.Jackson@eu.citrix.com>
> CC: Andrew Cooper <andrew.cooper3@citrix.com>
> CC: Roger Pau Monné <roger.pau@citrix.com>

Acked-by: Ian Campbell <ian.campbell@citrix.com>

> CC: Ross Lagerwall <ross.lagerwall@citrix.com>
> CC: Wei Liu <wei.liu2@citrix.com>
> ---
>  tools/libxl/libxl_aoutils.c  |    2 +-
>  tools/libxl/libxl_internal.h |    3 ++-
>  2 files changed, 3 insertions(+), 2 deletions(-)
> 
> diff --git a/tools/libxl/libxl_aoutils.c b/tools/libxl/libxl_aoutils.c
> index 3942634..c3232a6 100644
> --- a/tools/libxl/libxl_aoutils.c
> +++ b/tools/libxl/libxl_aoutils.c
> @@ -208,7 +208,7 @@ static void datacopier_readable(libxl__egc *egc, libxl__ev_fd *ev,
>      if (datacopier_pollhup_handled(egc, dc, revents, 0))
>          return;
>  
> -    if (revents & ~POLLIN) {
> +    if (revents & ~POLLIN && revents != (POLLIN|POLLHUP)) {

AIUI you are checking "revents != (POLLIN|POLLHUP)" rather than
"(revents&((POLLIN|POLLHUP)) != (POLLIN|POLLHUP)" because e.g. POLLIN|
POLLHUP|POLLOUT is nonsensical and is never expected.

>          LOG(ERROR, "unexpected poll event 0x%x (should be POLLIN)"
>              " on %s during copy of %s", revents, dc->readwhat, dc->copywhat);
>          datacopier_callback(egc, dc, -1, 0);
> diff --git a/tools/libxl/libxl_internal.h b/tools/libxl/libxl_internal.h
> index 7f81dd3..c6969f0 100644
> --- a/tools/libxl/libxl_internal.h
> +++ b/tools/libxl/libxl_internal.h
> @@ -2535,7 +2535,8 @@ typedef struct libxl__datacopier_buf libxl__datacopier_buf;
>   *     errnoval!=0 means we had a read error, logged
>   * onwrite==-1 means some other internal failure, errnoval not valid, logged
>   * If we get POLLHUP, we call callback_pollhup(..., onwrite, -1);
> - * or if callback_pollhup==0 this is an internal failure, as above.
> + * or if callback_pollhup==0 this is treated as eof (if POLLIN|POLLHUP
> + * on the reading fd) or an internal failure (otherwise), as above.
>   * In all cases copier is killed before calling this callback */
>  typedef void libxl__datacopier_callback(libxl__egc *egc,
>       libxl__datacopier_state *dc, int onwrite, int errnoval);



_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH 3/3] libxl: datacopier: Avoid theoretical eof/POLLHUP race
  2015-04-02 15:04             ` [PATCH 3/3] libxl: datacopier: Avoid theoretical eof/POLLHUP race Ian Jackson
@ 2015-04-02 15:15               ` Ian Campbell
  2015-04-02 16:29               ` Ian Jackson
  1 sibling, 0 replies; 29+ messages in thread
From: Ian Campbell @ 2015-04-02 15:15 UTC (permalink / raw)
  To: Ian Jackson
  Cc: Andrew Cooper, xen-devel, Ross Lagerwall, Wei Liu, Roger Pau Monné

On Thu, 2015-04-02 at 16:04 +0100, Ian Jackson wrote:
> When the bootloader exits, several things change, all at once:
>  (a) The master pty fd (held by libxl) starts to signal POLLHUP
>     and maybe also POLLIN.
>  (b) The child exits (so that the SIGCHLD self-pipe signals POLLIN,
>     which will be handled by the libxl child process code.
>  (c) reads on the master pty fd start to return EOF
> 
> From the point of view of the datacopier these might happen in any
> order.
> 
> (c) can be detected only after a previous POLLIN without POLLHUP and
> that previous POLLIN would be associated with data which was read,
> which must therefore have ended up in the dc's buffer.  But nothing
> stops the dc from writing that data into the output fd and reporting
> eof before it calls poll again.
> 
> This race is unlikely.  Indeed it might be actually 100% precluded, by
> luck, by the current organisation of the fd handling code.  But
> nevertheless it should be fixed.
> 
> We solve the race with a poll of the reading fd, to double-check, when
> we detect eof via read.  (This is only necessary if the caller has
> specified callback_pollhup, as otherwise POLLHUP|POLLIN - and,
> presumably, POLLIN followed perhaps by POLLHUP|POLLIN, is to be
> treated as eof anyway.)
> 
> Signed-off-by: Ian Jackson <Ian.Jackson@eu.citrix.com>
> CC: Andrew Cooper <andrew.cooper3@citrix.com>
> CC: Roger Pau Monné <roger.pau@citrix.com>

Acked-by: Ian Campbell <ian.campbell@citrix.com>




_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH 1/3] Revert "tools/libxl: Adjust datacopiers POLLHUP handling when the fd is also readable"
  2015-04-02 15:08               ` Ian Campbell
@ 2015-04-02 15:27                 ` Ian Jackson
  0 siblings, 0 replies; 29+ messages in thread
From: Ian Jackson @ 2015-04-02 15:27 UTC (permalink / raw)
  To: Ian Campbell
  Cc: Andrew Cooper, xen-devel, Ross Lagerwall, Wei Liu, Roger Pau Monné

Ian Campbell writes ("Re: [PATCH 1/3] Revert "tools/libxl: Adjust datacopiers POLLHUP handling when the fd is also readable""):
> On Thu, 2015-04-02 at 16:04 +0100, Ian Jackson wrote:
> > The bootloader code is relying on detecting POLLHUP, and 7e9ec50b
> > breaks that.  7e9ec50b, when handling a pty master, violates the
> > specification of the datacopier interface (as defined).
> > 
> > When the bootloader exits, several things change, all at once:
> >  (a) The master pty fd (held by libxl) starts to signal POLLHUP
> >     and maybe also POLLIN.
> >  (b) The child exits (so that the SIGCHLD self-pipe signals POLLIN,
> >     which will be handled by the libxl child process code.
> >  (c) reads on the master pty fd start to return EOF
> > 
> > From the point of view of the datacopier these might happen in any
> > order.  I think there is a latent bug with (c), which I will discuss
> > later in this email.
> 
> later in another mail maybe?

In 3/3.  Oh well, I have pushed it now.

Thanks,
Ian.

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH 3/3] libxl: datacopier: Avoid theoretical eof/POLLHUP race
  2015-04-02 15:04             ` [PATCH 3/3] libxl: datacopier: Avoid theoretical eof/POLLHUP race Ian Jackson
  2015-04-02 15:15               ` Ian Campbell
@ 2015-04-02 16:29               ` Ian Jackson
  2015-04-07 11:14                 ` Roger Pau Monné
  1 sibling, 1 reply; 29+ messages in thread
From: Ian Jackson @ 2015-04-02 16:29 UTC (permalink / raw)
  To: xen-devel, Andrew Cooper, Roger Pau Monné,
	Ian Campbell, Ross Lagerwall, Wei Liu

Ian Jackson writes ("[PATCH 3/3] libxl: datacopier: Avoid theoretical eof/POLLHUP race"):
> We solve the race with a poll of the reading fd, to double-check, when
> we detect eof via read.  (This is only necessary if the caller has
> specified callback_pollhup, as otherwise POLLHUP|POLLIN - and,
> presumably, POLLIN followed perhaps by POLLHUP|POLLIN, is to be
> treated as eof anyway.)

I have been unable to reproduce this race.  On Linux, the pty master
returns EGAIN from read, not EOF, when the bootloader exits.

Below is the patch I used to test this.  I think this should repro the
bug on FreeBSD when doing "xl create" of a guest which (a) uses pygrub
(b) autoboots within 15 seconds.

Roger, can you conveniently test this ?  With the hacky patch below
you should see the bug, which should in turn be fixed by the 3/3 I am
just replying to.

Thanks,
Ian.

diff --git a/tools/libxl/libxl_aoutils.c b/tools/libxl/libxl_aoutils.c
index a08f780..5ba6d73 100644
--- a/tools/libxl/libxl_aoutils.c
+++ b/tools/libxl/libxl_aoutils.c
@@ -200,6 +200,10 @@ static void datacopier_readable(libxl__egc *egc, libxl__ev_fd *ev,
     libxl__datacopier_state *dc = CONTAINER_OF(ev, *dc, toread);
     STATE_AO_GC(dc->ao);
 
+fprintf(stderr, "DC READABLE\n");
+sleep(15);
+fprintf(stderr, "DC READABLE CONTINUING %o\n", revents);
+
     if (datacopier_pollhup_handled(egc, dc, revents, 0))
         return;
 
@@ -211,6 +215,7 @@ static void datacopier_readable(libxl__egc *egc, libxl__ev_fd *ev,
     }
     assert(revents & POLLIN);
     for (;;) {
+fprintf(stderr, "DC READABLE LOOP\n");
         while (dc->used >= dc->maxsz) {
             libxl__datacopier_buf *rm = LIBXL_TAILQ_FIRST(&dc->bufs);
             dc->used -= rm->used;
@@ -230,6 +235,7 @@ static void datacopier_readable(libxl__egc *egc, libxl__ev_fd *ev,
         int r = read(ev->fd,
                      buf->buf + buf->used,
                      sizeof(buf->buf) - buf->used);
+fprintf(stderr, "DC READABLE READ r=%d %s\n",r,strerror(errno));
         if (r < 0) {
             if (errno == EINTR) continue;
             if (errno == EWOULDBLOCK) break;

^ permalink raw reply related	[flat|nested] 29+ messages in thread

* Re: [PATCH 3/3] libxl: datacopier: Avoid theoretical eof/POLLHUP race
  2015-04-02 16:29               ` Ian Jackson
@ 2015-04-07 11:14                 ` Roger Pau Monné
  2015-04-07 12:26                   ` Ian Jackson
  0 siblings, 1 reply; 29+ messages in thread
From: Roger Pau Monné @ 2015-04-07 11:14 UTC (permalink / raw)
  To: Ian Jackson, xen-devel, Andrew Cooper, Ian Campbell,
	Ross Lagerwall, Wei Liu

El 02/04/15 a les 18.29, Ian Jackson ha escrit:
> Ian Jackson writes ("[PATCH 3/3] libxl: datacopier: Avoid theoretical eof/POLLHUP race"):
>> We solve the race with a poll of the reading fd, to double-check, when
>> we detect eof via read.  (This is only necessary if the caller has
>> specified callback_pollhup, as otherwise POLLHUP|POLLIN - and,
>> presumably, POLLIN followed perhaps by POLLHUP|POLLIN, is to be
>> treated as eof anyway.)
> 
> I have been unable to reproduce this race.  On Linux, the pty master
> returns EGAIN from read, not EOF, when the bootloader exits.
> 
> Below is the patch I used to test this.  I think this should repro the
> bug on FreeBSD when doing "xl create" of a guest which (a) uses pygrub
> (b) autoboots within 15 seconds.
> 
> Roger, can you conveniently test this ?  With the hacky patch below
> you should see the bug, which should in turn be fixed by the 3/3 I am
> just replying to.

Hello,

I don't seem to be able to trigger the issue with the debug patch 
applied. AFAICT pygrub is blocked (probably because the output buffer 
is full) and total execution greatly exceeds 15s seconds. Here is the 
full output:

# xl -vvv create debian.cfg
Parsing config from debian.cfg
libxl: debug: libxl_create.c:1512:do_domain_create: ao 0x802834080: create: how=0x0 callback=0x0 poller=0x80281f0a0
libxl: debug: libxl_device.c:269:libxl__device_disk_set_backend: Disk vdev=xvda spec.backend=unknown
libxl: debug: libxl_device.c:298:libxl__device_disk_set_backend: Disk vdev=xvda, using backend phy
libxl: debug: libxl_create.c:915:initiate_domain_create: running bootloader
libxl: debug: libxl_device.c:269:libxl__device_disk_set_backend: Disk vdev=(null) spec.backend=phy
libxl: debug: libxl.c:3020:libxl__device_disk_local_initiate_attach: locally attaching PHY disk /dev/zvol/tank/debian
libxl: debug: libxl_bootloader.c:411:bootloader_disk_attached_cb: Config bootloader value: pygrub
libxl: debug: libxl_bootloader.c:427:bootloader_disk_attached_cb: Checking for bootloader in libexec path: /usr/local/lib/xen/bin/pygrub
libxl: debug: libxl_create.c:1528:do_domain_create: ao 0x802834080: inprogress: poller=0x80281f0a0, flags=i
libxl: debug: libxl_event.c:577:libxl__ev_xswatch_register: watch w=0x802836780 wpath=/local/domain/4 token=3/0: register slotnum=3
libxl: debug: libxl_event.c:1942:libxl__ao_progress_report: ao 0x802834080: progress report: ignored
libxl: debug: libxl_bootloader.c:537:bootloader_gotptys: executing bootloader: /usr/local/lib/xen/bin/pygrub
libxl: debug: libxl_bootloader.c:541:bootloader_gotptys:   bootloader arg: /usr/local/lib/xen/bin/pygrub
libxl: debug: libxl_bootloader.c:541:bootloader_gotptys:   bootloader arg: --output=/var/run/xen/bootloader.4.out
libxl: debug: libxl_bootloader.c:541:bootloader_gotptys:   bootloader arg: --output-format=simple0
libxl: debug: libxl_bootloader.c:541:bootloader_gotptys:   bootloader arg: --output-directory=/var/run/xen/bootloader.4.d
libxl: debug: libxl_bootloader.c:541:bootloader_gotptys:   bootloader arg: /dev/zvol/tank/debian
libxl: debug: libxl_event.c:514:watchfd_callback: watch w=0x802836780 wpath=/local/domain/4 token=3/0: event epath=/local/domain/4
DC READABLE
DC READABLE CONTINUING 1
DC READABLE LOOP
DC READABLE READ r=315 Resource temporarily unavailable
DC READABLE LOOP
DC READABLE READ r=-1 Resource temporarily unavailable
DC READABLE
DC READABLE CONTINUING 1
DC READABLE LOOP
DC READABLE READ r=71 Resource temporarily unavailable
DC READABLE LOOP
DC READABLE READ r=-1 Resource temporarily unavailable
DC READABLE
DC READABLE CONTINUING 1
DC READABLE LOOP
DC READABLE READ r=988 Resource temporarily unavailable
DC READABLE LOOP
DC READABLE READ r=-1 Resource temporarily unavailable
DC READABLE
DC READABLE CONTINUING 1
DC READABLE LOOP
DC READABLE READ r=127 Resource temporarily unavailable
DC READABLE LOOP
DC READABLE READ r=-1 Resource temporarily unavailable
DC READABLE
DC READABLE CONTINUING 1
DC READABLE LOOP
DC READABLE READ r=127 Resource temporarily unavailable
DC READABLE LOOP
DC READABLE READ r=-1 Resource temporarily unavailable
DC READABLE
DC READABLE CONTINUING 1
DC READABLE LOOP
DC READABLE READ r=127 Resource temporarily unavailable
DC READABLE LOOP
DC READABLE READ r=-1 Resource temporarily unavailable
DC READABLE
DC READABLE CONTINUING 1
DC READABLE LOOP
DC READABLE READ r=127 Resource temporarily unavailable
DC READABLE LOOP
DC READABLE READ r=-1 Resource temporarily unavailable
DC READABLE
DC READABLE CONTINUING 1
DC READABLE LOOP
DC READABLE READ r=127 Resource temporarily unavailable
DC READABLE LOOP
DC READABLE READ r=-1 Resource temporarily unavailable
DC READABLE
DC READABLE CONTINUING 1
DC READABLE LOOP
DC READABLE READ r=127 Resource temporarily unavailable
DC READABLE LOOP
DC READABLE READ r=-1 Resource temporarily unavailable
DC READABLE
DC READABLE CONTINUING 1
DC READABLE LOOP
DC READABLE READ r=127 Resource temporarily unavailable
DC READABLE LOOP
DC READABLE READ r=-1 Resource temporarily unavailable
DC READABLE
DC READABLE CONTINUING 1
DC READABLE LOOP
DC READABLE READ r=127 Resource temporarily unavailable
DC READABLE LOOP
DC READABLE READ r=-1 Resource temporarily unavailable
DC READABLE
DC READABLE CONTINUING 1
DC READABLE LOOP
DC READABLE READ r=127 Resource temporarily unavailable
DC READABLE LOOP
DC READABLE READ r=-1 Resource temporarily unavailable
DC READABLE
DC READABLE CONTINUING 1
DC READABLE LOOP
DC READABLE READ r=127 Resource temporarily unavailable
DC READABLE LOOP
DC READABLE READ r=-1 Resource temporarily unavailable
DC READABLE
DC READABLE CONTINUING 1
DC READABLE LOOP
DC READABLE READ r=127 Resource temporarily unavailable
DC READABLE LOOP
DC READABLE READ r=-1 Resource temporarily unavailable
DC READABLE
DC READABLE CONTINUING 1
DC READABLE LOOP
DC READABLE READ r=127 Resource temporarily unavailable
DC READABLE LOOP
DC READABLE READ r=-1 Resource temporarily unavailable
DC READABLE
DC READABLE CONTINUING 1
DC READABLE LOOP
DC READABLE READ r=22 Resource temporarily unavailable
DC READABLE LOOP
DC READABLE READ r=105 Resource temporarily unavailable
DC READABLE LOOP
DC READABLE READ r=-1 Resource temporarily unavailable
DC READABLE
DC READABLE CONTINUING 1
DC READABLE LOOP
DC READABLE READ r=134 Resource temporarily unavailable
DC READABLE LOOP
DC READABLE READ r=-1 Resource temporarily unavailable
DC READABLE
DC READABLE CONTINUING 1
DC READABLE LOOP
DC READABLE READ r=20 Resource temporarily unavailable
DC READABLE LOOP
DC READABLE READ r=-1 Resource temporarily unavailable
DC READABLE
DC READABLE CONTINUING 21
libxl: debug: libxl_aoutils.c:190:datacopier_pollhup_handled: received POLLHUP on bootloader pty during copy of bootloader output for domain 4
libxl: debug: libxl_bootloader.c:636:bootloader_finished: bootloader completed
libxl: debug: libxl_bootloader.c:138:bootloader_result_command: bootloader output contained kernel /var/run/xen/bootloader.4.d/boot_kernel._RYFNa
libxl: debug: libxl_bootloader.c:138:bootloader_result_command: bootloader output contained ramdisk /var/run/xen/bootloader.4.d/boot_ramdisk.Tj3qTl
libxl: debug: libxl_bootloader.c:138:bootloader_result_command: bootloader output contained args root=UUID=d1854b6d-e90f-4497-a614-5566ececc345 ro  quiet
libxl: debug: libxl_bootloader.c:649:bootloader_finished: bootloader execution successful
libxl: debug: libxl_event.c:615:libxl__ev_xswatch_deregister: watch w=0x802836780 wpath=/local/domain/4 token=3/0: deregister slotnum=3
domainbuilder: detail: xc_dom_allocate: cmdline="root=UUID=d1854b6d-e90f-4497-a614-5566ececc345 ro  quiet", features="(null)"
libxl: debug: libxl_dom.c:536:libxl__build_pv: pv kernel mapped 1 path /var/run/xen/bootloader.4.d/boot_kernel._RYFNa
domainbuilder: detail: xc_dom_kernel_mem: called
domainbuilder: detail: xc_dom_ramdisk_mem: called
domainbuilder: detail: xc_dom_boot_xen_init: ver 4.6, caps xen-3.0-x86_64 xen-3.0-x86_32p hvm-3.0-x86_32 hvm-3.0-x86_32p hvm-3.0-x86_64
domainbuilder: detail: xc_dom_parse_image: called
domainbuilder: detail: xc_dom_find_loader: trying multiboot-binary loader ...
domainbuilder: detail: loader probe failed
domainbuilder: detail: xc_dom_find_loader: trying Linux bzImage loader ...
domainbuilder: detail: xc_dom_malloc            : 11430 kB
domainbuilder: detail: xc_dom_do_gunzip: unzip ok, 0x2aaa36 -> 0xb299c0
domainbuilder: detail: loader probe OK
xc: detail: elf_parse_binary: phdr: paddr=0x1000000 memsz=0x554000
xc: detail: elf_parse_binary: phdr: paddr=0x1600000 memsz=0x950e0
xc: detail: elf_parse_binary: phdr: paddr=0x1696000 memsz=0x13400
xc: detail: elf_parse_binary: phdr: paddr=0x16aa000 memsz=0x294000
xc: detail: elf_parse_binary: memory: 0x1000000 -> 0x193e000
xc: detail: elf_xen_parse_note: GUEST_OS = "linux"
xc: detail: elf_xen_parse_note: GUEST_VERSION = "2.6"
xc: detail: elf_xen_parse_note: XEN_VERSION = "xen-3.0"
xc: detail: elf_xen_parse_note: VIRT_BASE = 0xffffffff80000000
xc: detail: elf_xen_parse_note: ENTRY = 0xffffffff816aa200
xc: detail: elf_xen_parse_note: HYPERCALL_PAGE = 0xffffffff81001000
xc: detail: elf_xen_parse_note: FEATURES = "!writable_page_tables|pae_pgdir_above_4gb"
xc: detail: elf_xen_parse_note: PAE_MODE = "yes"
xc: detail: elf_xen_parse_note: LOADER = "generic"
xc: detail: elf_xen_parse_note: unknown xen elf note (0xd)
xc: detail: elf_xen_parse_note: SUSPEND_CANCEL = 0x1
xc: detail: elf_xen_parse_note: HV_START_LOW = 0xffff800000000000
xc: detail: elf_xen_parse_note: PADDR_OFFSET = 0x0
xc: detail: elf_xen_addr_calc_check: addresses:
xc: detail:     virt_base        = 0xffffffff80000000
xc: detail:     elf_paddr_offset = 0x0
xc: detail:     virt_offset      = 0xffffffff80000000
xc: detail:     virt_kstart      = 0xffffffff81000000
xc: detail:     virt_kend        = 0xffffffff8193e000
xc: detail:     virt_entry       = 0xffffffff816aa200
xc: detail:     p2m_base         = 0xffffffffffffffff
domainbuilder: detail: xc_dom_parse_elf_kernel: xen-3.0-x86_64: 0xffffffff81000000 -> 0xffffffff8193e000
domainbuilder: detail: xc_dom_mem_init: mem 1024 MB, pages 0x40000 pages, 4k each
domainbuilder: detail: xc_dom_mem_init: 0x40000 pages
domainbuilder: detail: xc_dom_boot_mem_init: called
domainbuilder: detail: x86_compat: guest xen-3.0-x86_64, address size 64
domainbuilder: detail: xc_dom_malloc            : 2048 kB
domainbuilder: detail: xc_dom_build_image: called
domainbuilder: detail: xc_dom_alloc_segment:   kernel       : 0xffffffff81000000 -> 0xffffffff8193e000  (pfn 0x1000 + 0x93e pages)
domainbuilder: detail: xc_dom_pfn_to_ptr_retcount: domU mapping: pfn 0x1000+0x93e at 0x804c00000
xc: detail: elf_load_binary: phdr 0 at 0x804c00000 -> 0x805154000
xc: detail: elf_load_binary: phdr 1 at 0x805200000 -> 0x8052950e0
xc: detail: elf_load_binary: phdr 2 at 0x805296000 -> 0x8052a9400
xc: detail: elf_load_binary: phdr 3 at 0x8052aa000 -> 0x805329000
domainbuilder: detail: xc_dom_alloc_segment:   ramdisk      : 0xffffffff8193e000 -> 0xffffffff836b9000  (pfn 0x193e + 0x1d7b pages)
domainbuilder: detail: xc_dom_malloc            : 176 kB
domainbuilder: detail: xc_dom_pfn_to_ptr_retcount: domU mapping: pfn 0x193e+0x1d7b at 0x805600000
domainbuilder: detail: xc_dom_do_gunzip: unzip ok, 0x9c2a27 -> 0x1d7ac10
domainbuilder: detail: xc_dom_alloc_segment:   phys2mach    : 0xffffffff836b9000 -> 0xffffffff838b9000  (pfn 0x36b9 + 0x200 pages)
domainbuilder: detail: xc_dom_pfn_to_ptr_retcount: domU mapping: pfn 0x36b9+0x200 at 0x803e00000
domainbuilder: detail: xc_dom_alloc_page   :   start info   : 0xffffffff838b9000 (pfn 0x38b9)
domainbuilder: detail: xc_dom_alloc_page   :   xenstore     : 0xffffffff838ba000 (pfn 0x38ba)
domainbuilder: detail: xc_dom_alloc_page   :   console      : 0xffffffff838bb000 (pfn 0x38bb)
domainbuilder: detail: nr_page_tables: 0x0000ffffffffffff/48: 0xffff000000000000 -> 0xffffffffffffffff, 1 table(s)
domainbuilder: detail: nr_page_tables: 0x0000007fffffffff/39: 0xffffff8000000000 -> 0xffffffffffffffff, 1 table(s)
domainbuilder: detail: nr_page_tables: 0x000000003fffffff/30: 0xffffffff80000000 -> 0xffffffffbfffffff, 1 table(s)
domainbuilder: detail: nr_page_tables: 0x00000000001fffff/21: 0xffffffff80000000 -> 0xffffffff83bfffff, 30 table(s)
domainbuilder: detail: xc_dom_alloc_segment:   page tables  : 0xffffffff838bc000 -> 0xffffffff838dd000  (pfn 0x38bc + 0x21 pages)
domainbuilder: detail: xc_dom_pfn_to_ptr_retcount: domU mapping: pfn 0x38bc+0x21 at 0x800681000
domainbuilder: detail: xc_dom_alloc_page   :   boot stack   : 0xffffffff838dd000 (pfn 0x38dd)
domainbuilder: detail: xc_dom_build_image  : virt_alloc_end : 0xffffffff838de000
domainbuilder: detail: xc_dom_build_image  : virt_pgtab_end : 0xffffffff83c00000
domainbuilder: detail: xc_dom_boot_image: called
domainbuilder: detail: arch_setup_bootearly: doing nothing
domainbuilder: detail: xc_dom_compat_check: supported guest type: xen-3.0-x86_64 <= matches
domainbuilder: detail: xc_dom_compat_check: supported guest type: xen-3.0-x86_32p
domainbuilder: detail: xc_dom_compat_check: supported guest type: hvm-3.0-x86_32
domainbuilder: detail: xc_dom_compat_check: supported guest type: hvm-3.0-x86_32p
domainbuilder: detail: xc_dom_compat_check: supported guest type: hvm-3.0-x86_64
domainbuilder: detail: xc_dom_update_guest_p2m: dst 64bit, pages 0x40000
domainbuilder: detail: clear_page: pfn 0x38bb, mfn 0x194457
domainbuilder: detail: clear_page: pfn 0x38ba, mfn 0x194458
domainbuilder: detail: xc_dom_pfn_to_ptr_retcount: domU mapping: pfn 0x38b9+0x1 at 0x80067e000
domainbuilder: detail: start_info_x86_64: called
domainbuilder: detail: setup_hypercall_page: vaddr=0xffffffff81001000 pfn=0x1001
domainbuilder: detail: domain builder memory footprint
domainbuilder: detail:    allocated
domainbuilder: detail:       malloc             : 13726 kB
domainbuilder: detail:       anon mmap          : 0 bytes
domainbuilder: detail:    mapped
domainbuilder: detail:       file mmap          : 0 bytes
domainbuilder: detail:       domU mmap          : 40 MB
domainbuilder: detail: arch_setup_bootlate: shared_info: pfn 0x0, mfn 0xd75fd
domainbuilder: detail: shared_info_x86_64: called
domainbuilder: detail: vcpu_x86_64: called
domainbuilder: detail: vcpu_x86_64: cr3: pfn 0x38bc mfn 0x194456
domainbuilder: detail: launch_vm: called, ctxt=0x80067c004
domainbuilder: detail: xc_dom_release: called
libxl: debug: libxl_device.c:269:libxl__device_disk_set_backend: Disk vdev=xvda spec.backend=phy
libxl: debug: libxl_event.c:577:libxl__ev_xswatch_register: watch w=0x8028693c8 wpath=/local/domain/0/backend/vbd/4/51712/state token=3/1: register slotnum=3
libxl: debug: libxl_event.c:514:watchfd_callback: watch w=0x8028693c8 wpath=/local/domain/0/backend/vbd/4/51712/state token=3/1: event epath=/local/domain/0/backend/vbd/4/51712/state
libxl: debug: libxl_event.c:834:devstate_watch_callback: backend /local/domain/0/backend/vbd/4/51712/state wanted state 2 still waiting state 1
libxl: debug: libxl_event.c:514:watchfd_callback: watch w=0x8028693c8 wpath=/local/domain/0/backend/vbd/4/51712/state token=3/1: event epath=/local/domain/0/backend/vbd/4/51712/state
libxl: debug: libxl_event.c:830:devstate_watch_callback: backend /local/domain/0/backend/vbd/4/51712/state wanted state 2 ok
libxl: debug: libxl_event.c:615:libxl__ev_xswatch_deregister: watch w=0x8028693c8 wpath=/local/domain/0/backend/vbd/4/51712/state token=3/1: deregister slotnum=3
libxl: debug: libxl_event.c:629:libxl__ev_xswatch_deregister: watch w=0x8028693c8: deregister unregistered
libxl: debug: libxl_event.c:629:libxl__ev_xswatch_deregister: watch w=0x802869450: deregister unregistered
libxl: debug: libxl_event.c:577:libxl__ev_xswatch_register: watch w=0x802869748 wpath=/local/domain/0/backend/vif/4/0/state token=3/2: register slotnum=3
libxl: debug: libxl_event.c:514:watchfd_callback: watch w=0x802869748 wpath=/local/domain/0/backend/vif/4/0/state token=3/2: event epath=/local/domain/0/backend/vif/4/0/state
libxl: debug: libxl_event.c:834:devstate_watch_callback: backend /local/domain/0/backend/vif/4/0/state wanted state 2 still waiting state 1
libxl: debug: libxl_event.c:514:watchfd_callback: watch w=0x802869748 wpath=/local/domain/0/backend/vif/4/0/state token=3/2: event epath=/local/domain/0/backend/vif/4/0/state
libxl: debug: libxl_event.c:830:devstate_watch_callback: backend /local/domain/0/backend/vif/4/0/state wanted state 2 ok
libxl: debug: libxl_event.c:615:libxl__ev_xswatch_deregister: watch w=0x802869748 wpath=/local/domain/0/backend/vif/4/0/state token=3/2: deregister slotnum=3
libxl: debug: libxl_event.c:629:libxl__ev_xswatch_deregister: watch w=0x802869748: deregister unregistered
libxl: debug: libxl_device.c:1030:device_hotplug: calling hotplug script: /usr/local/etc/xen/scripts/vif-bridge /local/domain/0/backend/vif/4/0
libxl: debug: libxl_aoutils.c:519:libxl__async_exec_start: forking to execute: /usr/local/etc/xen/scripts/vif-bridge /local/domain/0/backend/vif/4/0
libxl: debug: libxl_event.c:629:libxl__ev_xswatch_deregister: watch w=0x8028697d0: deregister unregistered
libxl: debug: libxl_event.c:629:libxl__ev_xswatch_deregister: watch w=0x8028697d0: deregister unregistered
libxl: debug: libxl_event.c:1942:libxl__ao_progress_report: ao 0x802834080: progress report: ignored
libxl: debug: libxl_event.c:1766:libxl__ao_complete: ao 0x802834080: complete, rc=0
libxl: debug: libxl_event.c:1738:libxl__ao__destroy: ao 0x802834080: destroy
xc: debug: hypercall buffer: total allocations:390 total releases:390
xc: debug: hypercall buffer: current allocations:0 maximum allocations:4
xc: debug: hypercall buffer: cache current size:4
xc: debug: hypercall buffer: cache hits:374 misses:4 toobig:12

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH 3/3] libxl: datacopier: Avoid theoretical eof/POLLHUP race
  2015-04-07 11:14                 ` Roger Pau Monné
@ 2015-04-07 12:26                   ` Ian Jackson
  0 siblings, 0 replies; 29+ messages in thread
From: Ian Jackson @ 2015-04-07 12:26 UTC (permalink / raw)
  To: Roger Pau Monné
  Cc: Andrew Cooper, xen-devel, Wei Liu, Ian Campbell, Ross Lagerwall

Roger Pau Monné writes ("Re: [PATCH 3/3] libxl: datacopier: Avoid theoretical eof/POLLHUP race"):
> I don't seem to be able to trigger the issue with the debug patch 
> applied. AFAICT pygrub is blocked (probably because the output buffer 
> is full) and total execution greatly exceeds 15s seconds. Here is the 
> full output:

For reference, here is the relevant bits of the IRC conversation I
just had with Roger:

12:53 <Diziet> Maybe replace the bootloader with a script which does    sleep 
               1; echo hi; sleep 1; exit 0
13:05 <royger> Diziet: OK, with the dummy script I'm able to trigger the issue: 
               http://pastebin.com/raw.php?i=uUeJviV1
13:05 <royger> now I'm going to try with your wip series
13:15 <royger> Diziet: patch 3/3 seems to solve the issue, here's the output of 
               the debug patch appplied on top of 
               wip.datacopier-pollhup-fixes.v1 and with the dummy pygrub 
               script: http://pastebin.com/raw.php?i=cQqsFuf3
13:17 <Diziet> royger: Marvellous, those are the outputs I'm expecting.  Thank 
               you very much.
13:18 <Diziet> So, I will repost these with your Tested-by on the final patch, 
               if that's OK.
13:18 <royger> Diziet: sure :)

Ian.

^ permalink raw reply	[flat|nested] 29+ messages in thread

end of thread, other threads:[~2015-04-07 12:26 UTC | newest]

Thread overview: 29+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2015-03-16 13:29 [PATCH v3 1/6] tools/libxl: Introduce min and max macros Ross Lagerwall
2015-03-16 13:29 ` [PATCH v3 2/6] tools/libxl: Update datacopier to support sending data only Ross Lagerwall
2015-03-16 13:29 ` [PATCH v3 3/6] tools/libxl: Avoid overrunning static buffer with prefixdata Ross Lagerwall
2015-03-16 13:29 ` [PATCH v3 4/6] tools/libxl: Allow limiting amount copied by datacopier Ross Lagerwall
2015-03-18 11:12   ` Ian Campbell
2015-03-16 13:29 ` [PATCH v3 5/6] tools/libxl: Extend datacopier to support reading into a buffer Ross Lagerwall
2015-03-18 11:18   ` Ian Campbell
2015-04-01 15:53   ` Ian Jackson
2015-04-01 15:59     ` Ian Campbell
2015-03-16 13:29 ` [PATCH v3 6/6] tools/libxl: Adjust datacopiers POLLHUP handling when the fd is also readable Ross Lagerwall
2015-03-18 11:31   ` Ian Campbell
2015-03-26 15:20   ` Roger Pau Monné
2015-03-30 10:40     ` Ian Campbell
2015-04-01 10:34       ` Roger Pau Monné
2015-04-01 14:36         ` Andrew Cooper
2015-04-02 15:03           ` [PATCH 0/3] datacopier POLLHUP fixes " Ian Jackson
2015-04-02 15:04             ` [PATCH 1/3] Revert "tools/libxl: Adjust datacopiers POLLHUP handling when the fd is also readable" Ian Jackson
2015-04-02 15:08               ` Ian Campbell
2015-04-02 15:27                 ` Ian Jackson
2015-04-02 15:10               ` Andrew Cooper
2015-04-02 15:11               ` Ian Jackson
2015-04-02 15:04             ` [PATCH 2/3] libxl: Cope with pipes which signal POLLHUP|POLLIN on read eof Ian Jackson
2015-04-02 15:13               ` Ian Campbell
2015-04-02 15:04             ` [PATCH 3/3] libxl: datacopier: Avoid theoretical eof/POLLHUP race Ian Jackson
2015-04-02 15:15               ` Ian Campbell
2015-04-02 16:29               ` Ian Jackson
2015-04-07 11:14                 ` Roger Pau Monné
2015-04-07 12:26                   ` Ian Jackson
2015-04-02 15:09             ` [PATCH 0/3] datacopier POLLHUP fixes handling when the fd is also readable Ian Jackson

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.