* [PATCH v2 0/2]: libxl: datacopier POLLHUP fixes @ 2015-04-07 13:05 Ian Jackson 2015-04-07 13:05 ` [PATCH 1/2] libxl: datacopier: Avoid eof/POLLHUP race Ian Jackson 0 siblings, 1 reply; 5+ messages in thread From: Ian Jackson @ 2015-04-07 13:05 UTC (permalink / raw) To: Andrew Cooper, Roger Pau Monné, Ian Campbell, Ross Lagerwall, xen-devel, Wei Liu 7e9ec50b0535 "tools/libxl: Adjust datacopiers POLLHUP handling when the fd is also readable" has been reverted by the application of what was 1/3 of v1 of this mini-series. (Sorry about the odd reference in the commit message to `later in this mail', which was a reference to `Avoid [theoretical] eof/POLLHUP race' and ought to have just read `later'.) These are the remaining two patches: [PATCH 1/2] libxl: datacopier: Avoid eof/POLLHUP race This was 3/3 and was described as "theoretical" but with a test patch Roger has been able to reproduce it and verify that this patch fixes the problem. This bug is still (I think) quite unlikely, but this patch is a IMO candidate for backport. [PATCH 2/2] libxl: Cope with pipes which signal POLLHUP|POLLIN on An earlier (wrong, I now think) approach to this was 2/3 of the previous series. I have removed Ian Campbell's ack. It would be good to know whether this revised version actually fixes the bug that 7e9ec50b0535 was aimed at. Thanks, Ian. ^ permalink raw reply [flat|nested] 5+ messages in thread
* [PATCH 1/2] libxl: datacopier: Avoid eof/POLLHUP race 2015-04-07 13:05 [PATCH v2 0/2]: libxl: datacopier POLLHUP fixes Ian Jackson @ 2015-04-07 13:05 ` Ian Jackson 2015-04-07 13:05 ` [PATCH 2/2] libxl: Cope with pipes which signal POLLHUP|POLLIN on read eof Ian Jackson 0 siblings, 1 reply; 5+ messages in thread From: Ian Jackson @ 2015-04-07 13:05 UTC (permalink / raw) To: xen-devel; +Cc: Andrew Cooper, Ian Jackson, Wei Liu, Ross Lagerwall When the bootloader exits, several things change, all at once: (a) The master pty fd (held by libxl) starts to signal POLLHUP and maybe also POLLIN. (b) The child exits (so that the SIGCHLD self-pipe signals POLLIN, which will be handled by the libxl child process code. (c) reads on the master pty fd start to return EOF From the point of view of the datacopier these might happen in any order. (c) can be detected only after a previous POLLIN without POLLHUP and that previous POLLIN would be associated with data which was read, which must therefore have ended up in the dc's buffer. But nothing stops the dc from writing that data into the output fd and reporting eof before it calls poll again. This race is unlikely. But nevertheless it should be fixed. We solve the race with a poll of the reading fd, to double-check, when we detect eof via read. (This is only necessary if the caller has specified callback_pollhup, as otherwise POLLHUP|POLLIN - and, presumably, POLLIN followed perhaps by POLLHUP|POLLIN, is to be treated as eof anyway.) With a testing patch supplied by me, Roger Pau Monné has reproduced the failure on FreeBSD and verified that this patch fixes the problem. Signed-off-by: Ian Jackson <Ian.Jackson@eu.citrix.com> Tested-by: Roger Pau Monné <roger.pau@citrix.com> Acked-by: Ian Campbell <ian.campbell@citrix.com> CC: Andrew Cooper <andrew.cooper3@citrix.com> CC: Ross Lagerwall <ross.lagerwall@citrix.com> CC: Wei Liu <wei.liu2@citrix.com> --- tools/libxl/libxl_aoutils.c | 16 ++++++++++++++++ 1 file changed, 16 insertions(+) diff --git a/tools/libxl/libxl_aoutils.c b/tools/libxl/libxl_aoutils.c index 3942634..ddbe6ae 100644 --- a/tools/libxl/libxl_aoutils.c +++ b/tools/libxl/libxl_aoutils.c @@ -250,6 +250,22 @@ static void datacopier_readable(libxl__egc *egc, libxl__ev_fd *ev, return; } if (r == 0) { + if (dc->callback_pollhup) { + /* It might be that this "eof" is actually a HUP. If + * the caller cares about the difference, + * double-check using poll(2). */ + struct pollfd hupchk; + hupchk.fd = ev->fd; + hupchk.events = POLLIN; + hupchk.revents = 0; + r = poll(&hupchk, 1, 0); + if (r < 0) + LIBXL__EVENT_DISASTER(egc, + "unexpected failure polling fd for datacopier eof hup check", + errno, 0); + if (datacopier_pollhup_handled(egc, dc, hupchk.revents, 0)) + return; + } libxl__ev_fd_deregister(gc, &dc->toread); break; } -- 1.7.10.4 _______________________________________________ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel ^ permalink raw reply related [flat|nested] 5+ messages in thread
* [PATCH 2/2] libxl: Cope with pipes which signal POLLHUP|POLLIN on read eof 2015-04-07 13:05 ` [PATCH 1/2] libxl: datacopier: Avoid eof/POLLHUP race Ian Jackson @ 2015-04-07 13:05 ` Ian Jackson 2015-04-08 11:05 ` Wei Liu 0 siblings, 1 reply; 5+ messages in thread From: Ian Jackson @ 2015-04-07 13:05 UTC (permalink / raw) To: xen-devel Cc: Wei Liu, Ian Campbell, Andrew Cooper, Ian Jackson, Ross Lagerwall, Roger Pau Monné Some operating systems (including Linux and FreeBSD[1]) signal not (only) POLLIN when a reading pipe reaches EOF, but POLLHUP (with or without POLLIN). This is permitted[2]. The implications are that in the general case it is not possible to determine whether POLLHUP indicates an error or simply eof without attempting a read. Datacopiers mishandle this, because they always treat POLLHUP exceptionally (either reporting it via callback_pollhup, or treating it as an error). datacopiers reading from pipes on such OSs can fail (perhaps leaving some data unprocessed) rather than completing successfully. [1] http://www.greenend.org.uk/rjk/tech/poll.html [2] http://pubs.opengroup.org/onlinepubs/9699919799/functions/poll.html Distinguishing POLLHUP is needed for pty fds, but most callers in libxl do not care about POLLHUP except as an error or eof condition. So change the datacopier semantics so that if callback_pollhup is not specified we treat POLLHUP almost like POLLIN. The difference is that if we get HUP from poll, but EWOULDBLOCK from read, we must signal an error ratehr than attempting the read again. This fixes the problem which 7e9ec50b0535 was aimed at. Signed-off-by: Ian Jackson <Ian.Jackson@eu.citrix.com> CC: Ian Campbell <ian.campbell@citrix.com> CC: Andrew Cooper <andrew.cooper3@citrix.com> CC: Roger Pau Monné <roger.pau@citrix.com> CC: Ross Lagerwall <ross.lagerwall@citrix.com> CC: Wei Liu <wei.liu2@citrix.com> --- v2: Take a completely different approach which is correct for OSes which always signal pipe EOF with POLLHUP. --- tools/libxl/libxl_aoutils.c | 19 +++++++++++++++---- tools/libxl/libxl_internal.h | 3 ++- 2 files changed, 17 insertions(+), 5 deletions(-) diff --git a/tools/libxl/libxl_aoutils.c b/tools/libxl/libxl_aoutils.c index ddbe6ae..ef679dd 100644 --- a/tools/libxl/libxl_aoutils.c +++ b/tools/libxl/libxl_aoutils.c @@ -208,13 +208,14 @@ static void datacopier_readable(libxl__egc *egc, libxl__ev_fd *ev, if (datacopier_pollhup_handled(egc, dc, revents, 0)) return; - if (revents & ~POLLIN) { - LOG(ERROR, "unexpected poll event 0x%x (should be POLLIN)" + if (revents & ~(POLLIN|POLLHUP)) { + LOG(ERROR, + "unexpected poll event 0x%x (expected POLLIN and/or POLLHUP)" " on %s during copy of %s", revents, dc->readwhat, dc->copywhat); datacopier_callback(egc, dc, -1, 0); return; } - assert(revents & POLLIN); + assert(revents & (POLLIN|POLLHUP)); for (;;) { libxl__datacopier_buf *buf = NULL; int r; @@ -243,7 +244,17 @@ static void datacopier_readable(libxl__egc *egc, libxl__ev_fd *ev, } if (r < 0) { if (errno == EINTR) continue; - if (errno == EWOULDBLOCK) break; + if (errno == EWOULDBLOCK) { + if (revents & POLLHUP) { + LOG(ERROR, + "poll reported HUP but fd read gave EWOULDBLOCK" + " on %s during copy of %s", + dc->readwhat, dc->copywhat); + datacopier_callback(egc, dc, -1, 0); + return; + } + break; + } LOGE(ERROR, "error reading %s during copy of %s", dc->readwhat, dc->copywhat); datacopier_callback(egc, dc, 0, errno); diff --git a/tools/libxl/libxl_internal.h b/tools/libxl/libxl_internal.h index 3aba221..b6aa962 100644 --- a/tools/libxl/libxl_internal.h +++ b/tools/libxl/libxl_internal.h @@ -2538,7 +2538,8 @@ typedef struct libxl__datacopier_buf libxl__datacopier_buf; * errnoval!=0 means we had a read error, logged * onwrite==-1 means some other internal failure, errnoval not valid, logged * If we get POLLHUP, we call callback_pollhup(..., onwrite, -1); - * or if callback_pollhup==0 this is an internal failure, as above. + * or if callback_pollhup==0 this is treated as eof (if POLLIN|POLLHUP + * on the reading fd) or an internal failure (otherwise), as above. * In all cases copier is killed before calling this callback */ typedef void libxl__datacopier_callback(libxl__egc *egc, libxl__datacopier_state *dc, int onwrite, int errnoval); -- 1.7.10.4 _______________________________________________ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel ^ permalink raw reply related [flat|nested] 5+ messages in thread
* Re: [PATCH 2/2] libxl: Cope with pipes which signal POLLHUP|POLLIN on read eof 2015-04-07 13:05 ` [PATCH 2/2] libxl: Cope with pipes which signal POLLHUP|POLLIN on read eof Ian Jackson @ 2015-04-08 11:05 ` Wei Liu 2015-04-15 13:31 ` Ian Campbell 0 siblings, 1 reply; 5+ messages in thread From: Wei Liu @ 2015-04-08 11:05 UTC (permalink / raw) To: Ian Jackson Cc: xen-devel, Wei Liu, Ian Campbell, Andrew Cooper, Ross Lagerwall, Roger Pau Monné On Tue, Apr 07, 2015 at 02:05:28PM +0100, Ian Jackson wrote: > Some operating systems (including Linux and FreeBSD[1]) signal not > (only) POLLIN when a reading pipe reaches EOF, but POLLHUP (with or > without POLLIN). This is permitted[2]. The implications are that in > the general case it is not possible to determine whether POLLHUP > indicates an error or simply eof without attempting a read. > > Datacopiers mishandle this, because they always treat POLLHUP > exceptionally (either reporting it via callback_pollhup, or treating > it as an error). datacopiers reading from pipes on such OSs can fail > (perhaps leaving some data unprocessed) rather than completing > successfully. > > [1] http://www.greenend.org.uk/rjk/tech/poll.html > [2] http://pubs.opengroup.org/onlinepubs/9699919799/functions/poll.html > > Distinguishing POLLHUP is needed for pty fds, but most callers in > libxl do not care about POLLHUP except as an error or eof condition. > > So change the datacopier semantics so that if callback_pollhup is not > specified we treat POLLHUP almost like POLLIN. The difference is that > if we get HUP from poll, but EWOULDBLOCK from read, we must signal an > error ratehr than attempting the read again. > > This fixes the problem which 7e9ec50b0535 was aimed at. > > Signed-off-by: Ian Jackson <Ian.Jackson@eu.citrix.com> > CC: Ian Campbell <ian.campbell@citrix.com> > CC: Andrew Cooper <andrew.cooper3@citrix.com> > CC: Roger Pau Monné <roger.pau@citrix.com> > CC: Ross Lagerwall <ross.lagerwall@citrix.com> > CC: Wei Liu <wei.liu2@citrix.com> The code matches documents and commit log so: Acked-by: Wei Liu <wei.liu2@citrix.com> ^ permalink raw reply [flat|nested] 5+ messages in thread
* Re: [PATCH 2/2] libxl: Cope with pipes which signal POLLHUP|POLLIN on read eof 2015-04-08 11:05 ` Wei Liu @ 2015-04-15 13:31 ` Ian Campbell 0 siblings, 0 replies; 5+ messages in thread From: Ian Campbell @ 2015-04-15 13:31 UTC (permalink / raw) To: Wei Liu Cc: Ross Lagerwall, Andrew Cooper, xen-devel, Ian Jackson, Roger Pau Monné On Wed, 2015-04-08 at 12:05 +0100, Wei Liu wrote: > On Tue, Apr 07, 2015 at 02:05:28PM +0100, Ian Jackson wrote: > > Some operating systems (including Linux and FreeBSD[1]) signal not > > (only) POLLIN when a reading pipe reaches EOF, but POLLHUP (with or > > without POLLIN). This is permitted[2]. The implications are that in > > the general case it is not possible to determine whether POLLHUP > > indicates an error or simply eof without attempting a read. > > > > Datacopiers mishandle this, because they always treat POLLHUP > > exceptionally (either reporting it via callback_pollhup, or treating > > it as an error). datacopiers reading from pipes on such OSs can fail > > (perhaps leaving some data unprocessed) rather than completing > > successfully. > > > > [1] http://www.greenend.org.uk/rjk/tech/poll.html > > [2] http://pubs.opengroup.org/onlinepubs/9699919799/functions/poll.html > > > > Distinguishing POLLHUP is needed for pty fds, but most callers in > > libxl do not care about POLLHUP except as an error or eof condition. > > > > So change the datacopier semantics so that if callback_pollhup is not > > specified we treat POLLHUP almost like POLLIN. The difference is that > > if we get HUP from poll, but EWOULDBLOCK from read, we must signal an > > error ratehr than attempting the read again. > > > > This fixes the problem which 7e9ec50b0535 was aimed at. > > > > Signed-off-by: Ian Jackson <Ian.Jackson@eu.citrix.com> > > CC: Ian Campbell <ian.campbell@citrix.com> > > CC: Andrew Cooper <andrew.cooper3@citrix.com> > > CC: Roger Pau Monné <roger.pau@citrix.com> > > CC: Ross Lagerwall <ross.lagerwall@citrix.com> > > CC: Wei Liu <wei.liu2@citrix.com> > > The code matches documents and commit log so: > > Acked-by: Wei Liu <wei.liu2@citrix.com> Likewise, so I applied, doing s/ratehr/rather/ on the commit message. _______________________________________________ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel ^ permalink raw reply [flat|nested] 5+ messages in thread
end of thread, other threads:[~2015-04-15 13:31 UTC | newest] Thread overview: 5+ messages (download: mbox.gz / follow: Atom feed) -- links below jump to the message on this page -- 2015-04-07 13:05 [PATCH v2 0/2]: libxl: datacopier POLLHUP fixes Ian Jackson 2015-04-07 13:05 ` [PATCH 1/2] libxl: datacopier: Avoid eof/POLLHUP race Ian Jackson 2015-04-07 13:05 ` [PATCH 2/2] libxl: Cope with pipes which signal POLLHUP|POLLIN on read eof Ian Jackson 2015-04-08 11:05 ` Wei Liu 2015-04-15 13:31 ` Ian Campbell
This is an external index of several public inboxes, see mirroring instructions on how to clone and mirror all data and code used by this external index.