All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH v2 0/2]: libxl: datacopier POLLHUP fixes
@ 2015-04-07 13:05 Ian Jackson
  2015-04-07 13:05 ` [PATCH 1/2] libxl: datacopier: Avoid eof/POLLHUP race Ian Jackson
  0 siblings, 1 reply; 5+ messages in thread
From: Ian Jackson @ 2015-04-07 13:05 UTC (permalink / raw)
  To: Andrew Cooper, Roger Pau Monné,
	Ian Campbell, Ross Lagerwall, xen-devel, Wei Liu

7e9ec50b0535 "tools/libxl: Adjust datacopiers POLLHUP handling when
the fd is also readable" has been reverted by the application of what
was 1/3 of v1 of this mini-series.

(Sorry about the odd reference in the commit message to `later in this
mail', which was a reference to `Avoid [theoretical] eof/POLLHUP race'
and ought to have just read `later'.)

These are the remaining two patches:

[PATCH 1/2] libxl: datacopier: Avoid eof/POLLHUP race

This was 3/3 and was described as "theoretical" but with a test patch
Roger has been able to reproduce it and verify that this patch fixes
the problem.  This bug is still (I think) quite unlikely, but
this patch is a IMO candidate for backport.

[PATCH 2/2] libxl: Cope with pipes which signal POLLHUP|POLLIN on

An earlier (wrong, I now think) approach to this was 2/3 of the
previous series.  I have removed Ian Campbell's ack.  It would be good
to know whether this revised version actually fixes the bug that
7e9ec50b0535 was aimed at.

Thanks,
Ian.

^ permalink raw reply	[flat|nested] 5+ messages in thread

* [PATCH 1/2] libxl: datacopier: Avoid eof/POLLHUP race
  2015-04-07 13:05 [PATCH v2 0/2]: libxl: datacopier POLLHUP fixes Ian Jackson
@ 2015-04-07 13:05 ` Ian Jackson
  2015-04-07 13:05   ` [PATCH 2/2] libxl: Cope with pipes which signal POLLHUP|POLLIN on read eof Ian Jackson
  0 siblings, 1 reply; 5+ messages in thread
From: Ian Jackson @ 2015-04-07 13:05 UTC (permalink / raw)
  To: xen-devel; +Cc: Andrew Cooper, Ian Jackson, Wei Liu, Ross Lagerwall

When the bootloader exits, several things change, all at once:
 (a) The master pty fd (held by libxl) starts to signal POLLHUP
    and maybe also POLLIN.
 (b) The child exits (so that the SIGCHLD self-pipe signals POLLIN,
    which will be handled by the libxl child process code.
 (c) reads on the master pty fd start to return EOF

From the point of view of the datacopier these might happen in any
order.

(c) can be detected only after a previous POLLIN without POLLHUP and
that previous POLLIN would be associated with data which was read,
which must therefore have ended up in the dc's buffer.  But nothing
stops the dc from writing that data into the output fd and reporting
eof before it calls poll again.

This race is unlikely.  But  nevertheless it should be fixed.

We solve the race with a poll of the reading fd, to double-check, when
we detect eof via read.  (This is only necessary if the caller has
specified callback_pollhup, as otherwise POLLHUP|POLLIN - and,
presumably, POLLIN followed perhaps by POLLHUP|POLLIN, is to be
treated as eof anyway.)

With a testing patch supplied by me, Roger Pau Monné has reproduced
the failure on FreeBSD and verified that this patch fixes the problem.

Signed-off-by: Ian Jackson <Ian.Jackson@eu.citrix.com>
Tested-by: Roger Pau Monné <roger.pau@citrix.com>
Acked-by: Ian Campbell <ian.campbell@citrix.com>
CC: Andrew Cooper <andrew.cooper3@citrix.com>
CC: Ross Lagerwall <ross.lagerwall@citrix.com>
CC: Wei Liu <wei.liu2@citrix.com>
---
 tools/libxl/libxl_aoutils.c |   16 ++++++++++++++++
 1 file changed, 16 insertions(+)

diff --git a/tools/libxl/libxl_aoutils.c b/tools/libxl/libxl_aoutils.c
index 3942634..ddbe6ae 100644
--- a/tools/libxl/libxl_aoutils.c
+++ b/tools/libxl/libxl_aoutils.c
@@ -250,6 +250,22 @@ static void datacopier_readable(libxl__egc *egc, libxl__ev_fd *ev,
             return;
         }
         if (r == 0) {
+            if (dc->callback_pollhup) {
+                /* It might be that this "eof" is actually a HUP.  If
+                 * the caller cares about the difference,
+                 * double-check using poll(2). */
+                struct pollfd hupchk;
+                hupchk.fd = ev->fd;
+                hupchk.events = POLLIN;
+                hupchk.revents = 0;
+                r = poll(&hupchk, 1, 0);
+                if (r < 0)
+                    LIBXL__EVENT_DISASTER(egc,
+     "unexpected failure polling fd for datacopier eof hup check",
+                                  errno, 0);
+                if (datacopier_pollhup_handled(egc, dc, hupchk.revents, 0))
+                    return;
+            }
             libxl__ev_fd_deregister(gc, &dc->toread);
             break;
         }
-- 
1.7.10.4


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

^ permalink raw reply related	[flat|nested] 5+ messages in thread

* [PATCH 2/2] libxl: Cope with pipes which signal POLLHUP|POLLIN on read eof
  2015-04-07 13:05 ` [PATCH 1/2] libxl: datacopier: Avoid eof/POLLHUP race Ian Jackson
@ 2015-04-07 13:05   ` Ian Jackson
  2015-04-08 11:05     ` Wei Liu
  0 siblings, 1 reply; 5+ messages in thread
From: Ian Jackson @ 2015-04-07 13:05 UTC (permalink / raw)
  To: xen-devel
  Cc: Wei Liu, Ian Campbell, Andrew Cooper, Ian Jackson,
	Ross Lagerwall, Roger Pau Monné

Some operating systems (including Linux and FreeBSD[1]) signal not
(only) POLLIN when a reading pipe reaches EOF, but POLLHUP (with or
without POLLIN).  This is permitted[2].  The implications are that in
the general case it is not possible to determine whether POLLHUP
indicates an error or simply eof without attempting a read.

Datacopiers mishandle this, because they always treat POLLHUP
exceptionally (either reporting it via callback_pollhup, or treating
it as an error).  datacopiers reading from pipes on such OSs can fail
(perhaps leaving some data unprocessed) rather than completing
successfully.

[1] http://www.greenend.org.uk/rjk/tech/poll.html
[2] http://pubs.opengroup.org/onlinepubs/9699919799/functions/poll.html

Distinguishing POLLHUP is needed for pty fds, but most callers in
libxl do not care about POLLHUP except as an error or eof condition.

So change the datacopier semantics so that if callback_pollhup is not
specified we treat POLLHUP almost like POLLIN.  The difference is that
if we get HUP from poll, but EWOULDBLOCK from read, we must signal an
error ratehr than attempting the read again.

This fixes the problem which 7e9ec50b0535 was aimed at.

Signed-off-by: Ian Jackson <Ian.Jackson@eu.citrix.com>
CC: Ian Campbell <ian.campbell@citrix.com>
CC: Andrew Cooper <andrew.cooper3@citrix.com>
CC: Roger Pau Monné <roger.pau@citrix.com>
CC: Ross Lagerwall <ross.lagerwall@citrix.com>
CC: Wei Liu <wei.liu2@citrix.com>
---
v2: Take a completely different approach which is correct for OSes
    which always signal pipe EOF with POLLHUP.
---
 tools/libxl/libxl_aoutils.c  |   19 +++++++++++++++----
 tools/libxl/libxl_internal.h |    3 ++-
 2 files changed, 17 insertions(+), 5 deletions(-)

diff --git a/tools/libxl/libxl_aoutils.c b/tools/libxl/libxl_aoutils.c
index ddbe6ae..ef679dd 100644
--- a/tools/libxl/libxl_aoutils.c
+++ b/tools/libxl/libxl_aoutils.c
@@ -208,13 +208,14 @@ static void datacopier_readable(libxl__egc *egc, libxl__ev_fd *ev,
     if (datacopier_pollhup_handled(egc, dc, revents, 0))
         return;
 
-    if (revents & ~POLLIN) {
-        LOG(ERROR, "unexpected poll event 0x%x (should be POLLIN)"
+    if (revents & ~(POLLIN|POLLHUP)) {
+        LOG(ERROR,
+            "unexpected poll event 0x%x (expected POLLIN and/or POLLHUP)"
             " on %s during copy of %s", revents, dc->readwhat, dc->copywhat);
         datacopier_callback(egc, dc, -1, 0);
         return;
     }
-    assert(revents & POLLIN);
+    assert(revents & (POLLIN|POLLHUP));
     for (;;) {
         libxl__datacopier_buf *buf = NULL;
         int r;
@@ -243,7 +244,17 @@ static void datacopier_readable(libxl__egc *egc, libxl__ev_fd *ev,
         }
         if (r < 0) {
             if (errno == EINTR) continue;
-            if (errno == EWOULDBLOCK) break;
+            if (errno == EWOULDBLOCK) {
+                if (revents & POLLHUP) {
+                    LOG(ERROR,
+                        "poll reported HUP but fd read gave EWOULDBLOCK"
+                        " on %s during copy of %s",
+                        dc->readwhat, dc->copywhat);
+                    datacopier_callback(egc, dc, -1, 0);
+                    return;
+                }
+                break;
+            }
             LOGE(ERROR, "error reading %s during copy of %s",
                  dc->readwhat, dc->copywhat);
             datacopier_callback(egc, dc, 0, errno);
diff --git a/tools/libxl/libxl_internal.h b/tools/libxl/libxl_internal.h
index 3aba221..b6aa962 100644
--- a/tools/libxl/libxl_internal.h
+++ b/tools/libxl/libxl_internal.h
@@ -2538,7 +2538,8 @@ typedef struct libxl__datacopier_buf libxl__datacopier_buf;
  *     errnoval!=0 means we had a read error, logged
  * onwrite==-1 means some other internal failure, errnoval not valid, logged
  * If we get POLLHUP, we call callback_pollhup(..., onwrite, -1);
- * or if callback_pollhup==0 this is an internal failure, as above.
+ * or if callback_pollhup==0 this is treated as eof (if POLLIN|POLLHUP
+ * on the reading fd) or an internal failure (otherwise), as above.
  * In all cases copier is killed before calling this callback */
 typedef void libxl__datacopier_callback(libxl__egc *egc,
      libxl__datacopier_state *dc, int onwrite, int errnoval);
-- 
1.7.10.4


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

^ permalink raw reply related	[flat|nested] 5+ messages in thread

* Re: [PATCH 2/2] libxl: Cope with pipes which signal POLLHUP|POLLIN on read eof
  2015-04-07 13:05   ` [PATCH 2/2] libxl: Cope with pipes which signal POLLHUP|POLLIN on read eof Ian Jackson
@ 2015-04-08 11:05     ` Wei Liu
  2015-04-15 13:31       ` Ian Campbell
  0 siblings, 1 reply; 5+ messages in thread
From: Wei Liu @ 2015-04-08 11:05 UTC (permalink / raw)
  To: Ian Jackson
  Cc: xen-devel, Wei Liu, Ian Campbell, Andrew Cooper, Ross Lagerwall,
	Roger Pau Monné

On Tue, Apr 07, 2015 at 02:05:28PM +0100, Ian Jackson wrote:
> Some operating systems (including Linux and FreeBSD[1]) signal not
> (only) POLLIN when a reading pipe reaches EOF, but POLLHUP (with or
> without POLLIN).  This is permitted[2].  The implications are that in
> the general case it is not possible to determine whether POLLHUP
> indicates an error or simply eof without attempting a read.
> 
> Datacopiers mishandle this, because they always treat POLLHUP
> exceptionally (either reporting it via callback_pollhup, or treating
> it as an error).  datacopiers reading from pipes on such OSs can fail
> (perhaps leaving some data unprocessed) rather than completing
> successfully.
> 
> [1] http://www.greenend.org.uk/rjk/tech/poll.html
> [2] http://pubs.opengroup.org/onlinepubs/9699919799/functions/poll.html
> 
> Distinguishing POLLHUP is needed for pty fds, but most callers in
> libxl do not care about POLLHUP except as an error or eof condition.
> 
> So change the datacopier semantics so that if callback_pollhup is not
> specified we treat POLLHUP almost like POLLIN.  The difference is that
> if we get HUP from poll, but EWOULDBLOCK from read, we must signal an
> error ratehr than attempting the read again.
> 
> This fixes the problem which 7e9ec50b0535 was aimed at.
> 
> Signed-off-by: Ian Jackson <Ian.Jackson@eu.citrix.com>
> CC: Ian Campbell <ian.campbell@citrix.com>
> CC: Andrew Cooper <andrew.cooper3@citrix.com>
> CC: Roger Pau Monné <roger.pau@citrix.com>
> CC: Ross Lagerwall <ross.lagerwall@citrix.com>
> CC: Wei Liu <wei.liu2@citrix.com>

The code matches documents and commit log so:

Acked-by: Wei Liu <wei.liu2@citrix.com>

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [PATCH 2/2] libxl: Cope with pipes which signal POLLHUP|POLLIN on read eof
  2015-04-08 11:05     ` Wei Liu
@ 2015-04-15 13:31       ` Ian Campbell
  0 siblings, 0 replies; 5+ messages in thread
From: Ian Campbell @ 2015-04-15 13:31 UTC (permalink / raw)
  To: Wei Liu
  Cc: Ross Lagerwall, Andrew Cooper, xen-devel, Ian Jackson,
	Roger Pau Monné

On Wed, 2015-04-08 at 12:05 +0100, Wei Liu wrote:
> On Tue, Apr 07, 2015 at 02:05:28PM +0100, Ian Jackson wrote:
> > Some operating systems (including Linux and FreeBSD[1]) signal not
> > (only) POLLIN when a reading pipe reaches EOF, but POLLHUP (with or
> > without POLLIN).  This is permitted[2].  The implications are that in
> > the general case it is not possible to determine whether POLLHUP
> > indicates an error or simply eof without attempting a read.
> > 
> > Datacopiers mishandle this, because they always treat POLLHUP
> > exceptionally (either reporting it via callback_pollhup, or treating
> > it as an error).  datacopiers reading from pipes on such OSs can fail
> > (perhaps leaving some data unprocessed) rather than completing
> > successfully.
> > 
> > [1] http://www.greenend.org.uk/rjk/tech/poll.html
> > [2] http://pubs.opengroup.org/onlinepubs/9699919799/functions/poll.html
> > 
> > Distinguishing POLLHUP is needed for pty fds, but most callers in
> > libxl do not care about POLLHUP except as an error or eof condition.
> > 
> > So change the datacopier semantics so that if callback_pollhup is not
> > specified we treat POLLHUP almost like POLLIN.  The difference is that
> > if we get HUP from poll, but EWOULDBLOCK from read, we must signal an
> > error ratehr than attempting the read again.
> > 
> > This fixes the problem which 7e9ec50b0535 was aimed at.
> > 
> > Signed-off-by: Ian Jackson <Ian.Jackson@eu.citrix.com>
> > CC: Ian Campbell <ian.campbell@citrix.com>
> > CC: Andrew Cooper <andrew.cooper3@citrix.com>
> > CC: Roger Pau Monné <roger.pau@citrix.com>
> > CC: Ross Lagerwall <ross.lagerwall@citrix.com>
> > CC: Wei Liu <wei.liu2@citrix.com>
> 
> The code matches documents and commit log so:
> 
> Acked-by: Wei Liu <wei.liu2@citrix.com>

Likewise, so I applied, doing s/ratehr/rather/ on the commit message.



_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 5+ messages in thread

end of thread, other threads:[~2015-04-15 13:31 UTC | newest]

Thread overview: 5+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2015-04-07 13:05 [PATCH v2 0/2]: libxl: datacopier POLLHUP fixes Ian Jackson
2015-04-07 13:05 ` [PATCH 1/2] libxl: datacopier: Avoid eof/POLLHUP race Ian Jackson
2015-04-07 13:05   ` [PATCH 2/2] libxl: Cope with pipes which signal POLLHUP|POLLIN on read eof Ian Jackson
2015-04-08 11:05     ` Wei Liu
2015-04-15 13:31       ` Ian Campbell

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.