All of lore.kernel.org
 help / color / mirror / Atom feed
* [Qemu-devel] Question about nonblocking stderr and lost logs
@ 2015-11-30  5:34 Sam Bobroff
  2015-12-04  6:33 ` Stefan Hajnoczi
  0 siblings, 1 reply; 8+ messages in thread
From: Sam Bobroff @ 2015-11-30  5:34 UTC (permalink / raw)
  To: qemu-devel

Hi QEMU programmers,

While doing some experimental work on QEMU that has involved adding a lot of
new log messages (using qemu_log_mask()), I've discovered that under some
conditions a lot of my log messages go missing.

I've tracked the issue down to qemu_logfile being left at the default (stderr)
(so when not using -D) and according to strace what is happening is that the
debug messages are being passed to write() but write() is returning EWOULDBLOCK
and the messags are dropped.

This seems to be happening because stderr is being set non-blocking (which is a
bit odd to me), and it seems like NONBLOCK is being set as qmp_chardev_add() adds a
device for stdout (yes stdout, not stderr; perhaps file descriptors have been
dup'd by that point?).

Is this by design to prevent a slow console from blocking QEMU? If not, should
I delve further and try to prevent non-blocking being set on stderr?

(Unfortunately I don't have a replication for this using an unmodified QEMU but
I suspect I could find one if necessary.)

Cheers,
Sam.

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [Qemu-devel] Question about nonblocking stderr and lost logs
  2015-11-30  5:34 [Qemu-devel] Question about nonblocking stderr and lost logs Sam Bobroff
@ 2015-12-04  6:33 ` Stefan Hajnoczi
  2015-12-07  3:22   ` Sam Bobroff
  0 siblings, 1 reply; 8+ messages in thread
From: Stefan Hajnoczi @ 2015-12-04  6:33 UTC (permalink / raw)
  To: Sam Bobroff; +Cc: qemu-devel

[-- Attachment #1: Type: text/plain, Size: 1672 bytes --]

On Mon, Nov 30, 2015 at 04:34:47PM +1100, Sam Bobroff wrote:
> Hi QEMU programmers,
> 
> While doing some experimental work on QEMU that has involved adding a lot of
> new log messages (using qemu_log_mask()), I've discovered that under some
> conditions a lot of my log messages go missing.
> 
> I've tracked the issue down to qemu_logfile being left at the default (stderr)
> (so when not using -D) and according to strace what is happening is that the
> debug messages are being passed to write() but write() is returning EWOULDBLOCK
> and the messags are dropped.
> 
> This seems to be happening because stderr is being set non-blocking (which is a
> bit odd to me), and it seems like NONBLOCK is being set as qmp_chardev_add() adds a
> device for stdout (yes stdout, not stderr; perhaps file descriptors have been
> dup'd by that point?).
> 
> Is this by design to prevent a slow console from blocking QEMU? If not, should
> I delve further and try to prevent non-blocking being set on stderr?
> 
> (Unfortunately I don't have a replication for this using an unmodified QEMU but
> I suspect I could find one if necessary.)

This sounds like a bug.  stderr should be blocking.

But it's specific to your QEMU build or your command-line.  Here are my
results on an x86 host:

$ qemu-system-x86_64 -enable-kvm -m 1024 -cpu host \
                     -drive if=virtio,cache=none,file=test.img,format=raw
$ cat /proc/$(pgrep qemu)/fdinfo/2
pos:	0
flags:	0100002
mnt_id:	22

The flags are O_RDWR | O_LARGEFILE.

O_NONBLOCK would have been 04000 (octal).

Please retry with qemu.git/master and post your QEMU command-line.

Stefan

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 473 bytes --]

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [Qemu-devel] Question about nonblocking stderr and lost logs
  2015-12-04  6:33 ` Stefan Hajnoczi
@ 2015-12-07  3:22   ` Sam Bobroff
  2015-12-10  8:37     ` Stefan Hajnoczi
  0 siblings, 1 reply; 8+ messages in thread
From: Sam Bobroff @ 2015-12-07  3:22 UTC (permalink / raw)
  To: Stefan Hajnoczi; +Cc: qemu-devel

On Fri, Dec 04, 2015 at 02:33:01PM +0800, Stefan Hajnoczi wrote:
> On Mon, Nov 30, 2015 at 04:34:47PM +1100, Sam Bobroff wrote:
> > Hi QEMU programmers,
> > 
> > While doing some experimental work on QEMU that has involved adding a lot of
> > new log messages (using qemu_log_mask()), I've discovered that under some
> > conditions a lot of my log messages go missing.
> > 
> > I've tracked the issue down to qemu_logfile being left at the default (stderr)
> > (so when not using -D) and according to strace what is happening is that the
> > debug messages are being passed to write() but write() is returning EWOULDBLOCK
> > and the messags are dropped.
> > 
> > This seems to be happening because stderr is being set non-blocking (which is a
> > bit odd to me), and it seems like NONBLOCK is being set as qmp_chardev_add() adds a
> > device for stdout (yes stdout, not stderr; perhaps file descriptors have been
> > dup'd by that point?).
> > 
> > Is this by design to prevent a slow console from blocking QEMU? If not, should
> > I delve further and try to prevent non-blocking being set on stderr?
> > 
> > (Unfortunately I don't have a replication for this using an unmodified QEMU but
> > I suspect I could find one if necessary.)
> 
> This sounds like a bug.  stderr should be blocking.
> 
> But it's specific to your QEMU build or your command-line.  Here are my
> results on an x86 host:
> 
> $ qemu-system-x86_64 -enable-kvm -m 1024 -cpu host \
>                      -drive if=virtio,cache=none,file=test.img,format=raw
> $ cat /proc/$(pgrep qemu)/fdinfo/2
> pos:	0
> flags:	0100002
> mnt_id:	22
> 
> The flags are O_RDWR | O_LARGEFILE.
> 
> O_NONBLOCK would have been 04000 (octal).
> 
> Please retry with qemu.git/master and post your QEMU command-line.
> 
> Stefan

Hi Stefan,

Thanks for checking this! I think I've worked out what's happening and it's a
bit complicated ;-)

It appears that the behaviour is not directly related to the command line or
build but rather the way the shell sets up the environment.

Here's a test very similar to yours using origin/master QEMU on my Debian
stable machine, with bash as my shell:

$ qemu-system-x86_64 -S -vga none -nographic
( In another virtual console: )
$ cat /proc/$(pgrep qemu)/fdinfo/2
pos:    0
flags:  0104002
mnt_id: 19

So I seem to have O_NONBLOCK set on stderr.

However, if I perform that test by backgrounding QEMU and using the same
console:
$ qemu-system-x86_64 -S -vga none -nographic &
[1] 4970
[1]+  Stopped                 qemu-system-x86_64 -S -nographic -vga none
$ cat /proc/$(pgrep qemu)/fdinfo/2
pos:    0
flags:  0100002
mnt_id: 19

Hmm!

After a bit of reading, it seems like this is caused by the fact that bash,
when there is no redirection, sets up fds 0, 1 and 2 by dup'ing a single file
descriptor. I found this surprising!

A small test program that sets O_NONBLOCK on fd 1 and checks for it on fd 2
seems to corroborate it:

#include <stdio.h>
#include <fcntl.h>
#include <assert.h>

int main() {
        int f1, f2;

        f1 = fcntl(1, F_GETFL, 0);
        f2 = fcntl(2, F_GETFL, 0);
        assert(!(f1 & O_NONBLOCK));
        assert(!(f2 & O_NONBLOCK));
        assert(!fcntl(1, F_SETFL, f1 | O_NONBLOCK));
        if (fcntl(2, F_GETFL, 0) & O_NONBLOCK)
                fprintf(stderr, "File descriptors are duplicates.\n");
        else
                fprintf(stderr, "File descriptors are separate.\n");
        return 0;
}

gcc -Wall foo.c
$ ./a.out
File descriptors are duplicates.
$ ./a.out > /dev/null
File descriptors are separate.

The nonblocking flag is always (?) being set on stdin and stdout during
qemu_chr_open_stdio(), but it doesn't affect stderr unless they are dup'd.

So this is probably not going to occur often, and certainly not when running
under libvirt or other tools. But if it is actually what is happening, I'm not
sure what to do about it. Any ideas?

Cheers,
Sam.

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [Qemu-devel] Question about nonblocking stderr and lost logs
  2015-12-07  3:22   ` Sam Bobroff
@ 2015-12-10  8:37     ` Stefan Hajnoczi
  2015-12-14  5:19       ` Sam Bobroff
  0 siblings, 1 reply; 8+ messages in thread
From: Stefan Hajnoczi @ 2015-12-10  8:37 UTC (permalink / raw)
  To: Sam Bobroff; +Cc: qemu-devel

[-- Attachment #1: Type: text/plain, Size: 4426 bytes --]

On Mon, Dec 07, 2015 at 02:22:18PM +1100, Sam Bobroff wrote:
> On Fri, Dec 04, 2015 at 02:33:01PM +0800, Stefan Hajnoczi wrote:
> > On Mon, Nov 30, 2015 at 04:34:47PM +1100, Sam Bobroff wrote:
> > > Hi QEMU programmers,
> > > 
> > > While doing some experimental work on QEMU that has involved adding a lot of
> > > new log messages (using qemu_log_mask()), I've discovered that under some
> > > conditions a lot of my log messages go missing.
> > > 
> > > I've tracked the issue down to qemu_logfile being left at the default (stderr)
> > > (so when not using -D) and according to strace what is happening is that the
> > > debug messages are being passed to write() but write() is returning EWOULDBLOCK
> > > and the messags are dropped.
> > > 
> > > This seems to be happening because stderr is being set non-blocking (which is a
> > > bit odd to me), and it seems like NONBLOCK is being set as qmp_chardev_add() adds a
> > > device for stdout (yes stdout, not stderr; perhaps file descriptors have been
> > > dup'd by that point?).
> > > 
> > > Is this by design to prevent a slow console from blocking QEMU? If not, should
> > > I delve further and try to prevent non-blocking being set on stderr?
> > > 
> > > (Unfortunately I don't have a replication for this using an unmodified QEMU but
> > > I suspect I could find one if necessary.)
> > 
> > This sounds like a bug.  stderr should be blocking.
> > 
> > But it's specific to your QEMU build or your command-line.  Here are my
> > results on an x86 host:
> > 
> > $ qemu-system-x86_64 -enable-kvm -m 1024 -cpu host \
> >                      -drive if=virtio,cache=none,file=test.img,format=raw
> > $ cat /proc/$(pgrep qemu)/fdinfo/2
> > pos:	0
> > flags:	0100002
> > mnt_id:	22
> > 
> > The flags are O_RDWR | O_LARGEFILE.
> > 
> > O_NONBLOCK would have been 04000 (octal).
> > 
> > Please retry with qemu.git/master and post your QEMU command-line.
> > 
> > Stefan
> 
> Hi Stefan,
> 
> Thanks for checking this! I think I've worked out what's happening and it's a
> bit complicated ;-)
> 
> It appears that the behaviour is not directly related to the command line or
> build but rather the way the shell sets up the environment.
> 
> Here's a test very similar to yours using origin/master QEMU on my Debian
> stable machine, with bash as my shell:
> 
> $ qemu-system-x86_64 -S -vga none -nographic
> ( In another virtual console: )
> $ cat /proc/$(pgrep qemu)/fdinfo/2
> pos:    0
> flags:  0104002
> mnt_id: 19
> 
> So I seem to have O_NONBLOCK set on stderr.
> 
> However, if I perform that test by backgrounding QEMU and using the same
> console:
> $ qemu-system-x86_64 -S -vga none -nographic &
> [1] 4970
> [1]+  Stopped                 qemu-system-x86_64 -S -nographic -vga none
> $ cat /proc/$(pgrep qemu)/fdinfo/2
> pos:    0
> flags:  0100002
> mnt_id: 19
> 
> Hmm!
> 
> After a bit of reading, it seems like this is caused by the fact that bash,
> when there is no redirection, sets up fds 0, 1 and 2 by dup'ing a single file
> descriptor. I found this surprising!
> 
> A small test program that sets O_NONBLOCK on fd 1 and checks for it on fd 2
> seems to corroborate it:
> 
> #include <stdio.h>
> #include <fcntl.h>
> #include <assert.h>
> 
> int main() {
>         int f1, f2;
> 
>         f1 = fcntl(1, F_GETFL, 0);
>         f2 = fcntl(2, F_GETFL, 0);
>         assert(!(f1 & O_NONBLOCK));
>         assert(!(f2 & O_NONBLOCK));
>         assert(!fcntl(1, F_SETFL, f1 | O_NONBLOCK));
>         if (fcntl(2, F_GETFL, 0) & O_NONBLOCK)
>                 fprintf(stderr, "File descriptors are duplicates.\n");
>         else
>                 fprintf(stderr, "File descriptors are separate.\n");
>         return 0;
> }
> 
> gcc -Wall foo.c
> $ ./a.out
> File descriptors are duplicates.
> $ ./a.out > /dev/null
> File descriptors are separate.
> 
> The nonblocking flag is always (?) being set on stdin and stdout during
> qemu_chr_open_stdio(), but it doesn't affect stderr unless they are dup'd.
> 
> So this is probably not going to occur often, and certainly not when running
> under libvirt or other tools. But if it is actually what is happening, I'm not
> sure what to do about it. Any ideas?

This is unfortunate.  I wonder if opening /proc/fd/<n> is just a dup or
if it actually creates separate file status flags.

Stefan

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 473 bytes --]

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [Qemu-devel] Question about nonblocking stderr and lost logs
  2015-12-10  8:37     ` Stefan Hajnoczi
@ 2015-12-14  5:19       ` Sam Bobroff
  2015-12-14  7:29         ` Stefan Hajnoczi
  0 siblings, 1 reply; 8+ messages in thread
From: Sam Bobroff @ 2015-12-14  5:19 UTC (permalink / raw)
  To: Stefan Hajnoczi; +Cc: qemu-devel

On Thu, Dec 10, 2015 at 04:37:22PM +0800, Stefan Hajnoczi wrote:
> On Mon, Dec 07, 2015 at 02:22:18PM +1100, Sam Bobroff wrote:
> > On Fri, Dec 04, 2015 at 02:33:01PM +0800, Stefan Hajnoczi wrote:
> > > On Mon, Nov 30, 2015 at 04:34:47PM +1100, Sam Bobroff wrote:
> > > > Hi QEMU programmers,
> > > > 
> > > > While doing some experimental work on QEMU that has involved adding a lot of
> > > > new log messages (using qemu_log_mask()), I've discovered that under some
> > > > conditions a lot of my log messages go missing.
> > > > 
> > > > I've tracked the issue down to qemu_logfile being left at the default (stderr)
> > > > (so when not using -D) and according to strace what is happening is that the
> > > > debug messages are being passed to write() but write() is returning EWOULDBLOCK
> > > > and the messags are dropped.
> > > > 
> > > > This seems to be happening because stderr is being set non-blocking (which is a
> > > > bit odd to me), and it seems like NONBLOCK is being set as qmp_chardev_add() adds a
> > > > device for stdout (yes stdout, not stderr; perhaps file descriptors have been
> > > > dup'd by that point?).
> > > > 
> > > > Is this by design to prevent a slow console from blocking QEMU? If not, should
> > > > I delve further and try to prevent non-blocking being set on stderr?
> > > > 
> > > > (Unfortunately I don't have a replication for this using an unmodified QEMU but
> > > > I suspect I could find one if necessary.)
> > > 
> > > This sounds like a bug.  stderr should be blocking.
> > > 
> > > But it's specific to your QEMU build or your command-line.  Here are my
> > > results on an x86 host:
> > > 
> > > $ qemu-system-x86_64 -enable-kvm -m 1024 -cpu host \
> > >                      -drive if=virtio,cache=none,file=test.img,format=raw
> > > $ cat /proc/$(pgrep qemu)/fdinfo/2
> > > pos:	0
> > > flags:	0100002
> > > mnt_id:	22
> > > 
> > > The flags are O_RDWR | O_LARGEFILE.
> > > 
> > > O_NONBLOCK would have been 04000 (octal).
> > > 
> > > Please retry with qemu.git/master and post your QEMU command-line.
> > > 
> > > Stefan
> > 
> > Hi Stefan,
> > 
> > Thanks for checking this! I think I've worked out what's happening and it's a
> > bit complicated ;-)
> > 
> > It appears that the behaviour is not directly related to the command line or
> > build but rather the way the shell sets up the environment.
> > 
> > Here's a test very similar to yours using origin/master QEMU on my Debian
> > stable machine, with bash as my shell:
> > 
> > $ qemu-system-x86_64 -S -vga none -nographic
> > ( In another virtual console: )
> > $ cat /proc/$(pgrep qemu)/fdinfo/2
> > pos:    0
> > flags:  0104002
> > mnt_id: 19
> > 
> > So I seem to have O_NONBLOCK set on stderr.
> > 
> > However, if I perform that test by backgrounding QEMU and using the same
> > console:
> > $ qemu-system-x86_64 -S -vga none -nographic &
> > [1] 4970
> > [1]+  Stopped                 qemu-system-x86_64 -S -nographic -vga none
> > $ cat /proc/$(pgrep qemu)/fdinfo/2
> > pos:    0
> > flags:  0100002
> > mnt_id: 19
> > 
> > Hmm!
> > 
> > After a bit of reading, it seems like this is caused by the fact that bash,
> > when there is no redirection, sets up fds 0, 1 and 2 by dup'ing a single file
> > descriptor. I found this surprising!
> > 
> > A small test program that sets O_NONBLOCK on fd 1 and checks for it on fd 2
> > seems to corroborate it:
> > 
> > #include <stdio.h>
> > #include <fcntl.h>
> > #include <assert.h>
> > 
> > int main() {
> >         int f1, f2;
> > 
> >         f1 = fcntl(1, F_GETFL, 0);
> >         f2 = fcntl(2, F_GETFL, 0);
> >         assert(!(f1 & O_NONBLOCK));
> >         assert(!(f2 & O_NONBLOCK));
> >         assert(!fcntl(1, F_SETFL, f1 | O_NONBLOCK));
> >         if (fcntl(2, F_GETFL, 0) & O_NONBLOCK)
> >                 fprintf(stderr, "File descriptors are duplicates.\n");
> >         else
> >                 fprintf(stderr, "File descriptors are separate.\n");
> >         return 0;
> > }
> > 
> > gcc -Wall foo.c
> > $ ./a.out
> > File descriptors are duplicates.
> > $ ./a.out > /dev/null
> > File descriptors are separate.
> > 
> > The nonblocking flag is always (?) being set on stdin and stdout during
> > qemu_chr_open_stdio(), but it doesn't affect stderr unless they are dup'd.
> > 
> > So this is probably not going to occur often, and certainly not when running
> > under libvirt or other tools. But if it is actually what is happening, I'm not
> > sure what to do about it. Any ideas?
> 
> This is unfortunate.  I wonder if opening /proc/fd/<n> is just a dup or
> if it actually creates separate file status flags.

Calling open() always creates a new "file description" (which contains file
status flags), so it's certainly a way to approach this.

I've testing doing this if we're going to use stderr as the qemu_logfile:

	dup2(open("/proc/self/fd/2", O_WRONLY), 2)

... in do_qemu_set_log() and it fixes the issue, however...

* Doing the dup2() causes it to fix any other fprintf()s to stderr, but
  I feel bad for changing the fd that's inside the stderr "FILE *" "underneath
  it". Would it be better to do the following...
	qemu_logfile = fopen("/proc/self/fd/2", "a")
  ... even though that will leave other fprintf()s unfixed. (I think that is
  probably OK.)
* We could also unconditionally do the dup2(), maybe early in main().
* I'm not sure which platforms other than modern Linux support
  /proc/self/fd/<n>. (Just fixing it for Linux seems a lot better than
  nothing.)
* We could print a warning if stderr is nonblocking and we can't work around
  it. This seems like a good idea even if don't fix it for Linux.
* Alternatively, we could take this approach to stdin and stdout as they are
  added to the char driver and set nonblocking. This somehow feels more
  invasive, although I'm not sure why.
* We could also handle it by converting the qemu_log_mask calls from fprintf()
  to sprintf() + write(), as we'd then be able to handle EAGAIN correctly. This
  doesn't feel any cleaner to me.

Cheers,
Sam.

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [Qemu-devel] Question about nonblocking stderr and lost logs
  2015-12-14  5:19       ` Sam Bobroff
@ 2015-12-14  7:29         ` Stefan Hajnoczi
  2015-12-16  0:53           ` Sam Bobroff
  0 siblings, 1 reply; 8+ messages in thread
From: Stefan Hajnoczi @ 2015-12-14  7:29 UTC (permalink / raw)
  To: Sam Bobroff; +Cc: qemu-devel

[-- Attachment #1: Type: text/plain, Size: 6711 bytes --]

On Mon, Dec 14, 2015 at 04:19:50PM +1100, Sam Bobroff wrote:
> On Thu, Dec 10, 2015 at 04:37:22PM +0800, Stefan Hajnoczi wrote:
> > On Mon, Dec 07, 2015 at 02:22:18PM +1100, Sam Bobroff wrote:
> > > On Fri, Dec 04, 2015 at 02:33:01PM +0800, Stefan Hajnoczi wrote:
> > > > On Mon, Nov 30, 2015 at 04:34:47PM +1100, Sam Bobroff wrote:
> > > > > Hi QEMU programmers,
> > > > > 
> > > > > While doing some experimental work on QEMU that has involved adding a lot of
> > > > > new log messages (using qemu_log_mask()), I've discovered that under some
> > > > > conditions a lot of my log messages go missing.
> > > > > 
> > > > > I've tracked the issue down to qemu_logfile being left at the default (stderr)
> > > > > (so when not using -D) and according to strace what is happening is that the
> > > > > debug messages are being passed to write() but write() is returning EWOULDBLOCK
> > > > > and the messags are dropped.
> > > > > 
> > > > > This seems to be happening because stderr is being set non-blocking (which is a
> > > > > bit odd to me), and it seems like NONBLOCK is being set as qmp_chardev_add() adds a
> > > > > device for stdout (yes stdout, not stderr; perhaps file descriptors have been
> > > > > dup'd by that point?).
> > > > > 
> > > > > Is this by design to prevent a slow console from blocking QEMU? If not, should
> > > > > I delve further and try to prevent non-blocking being set on stderr?
> > > > > 
> > > > > (Unfortunately I don't have a replication for this using an unmodified QEMU but
> > > > > I suspect I could find one if necessary.)
> > > > 
> > > > This sounds like a bug.  stderr should be blocking.
> > > > 
> > > > But it's specific to your QEMU build or your command-line.  Here are my
> > > > results on an x86 host:
> > > > 
> > > > $ qemu-system-x86_64 -enable-kvm -m 1024 -cpu host \
> > > >                      -drive if=virtio,cache=none,file=test.img,format=raw
> > > > $ cat /proc/$(pgrep qemu)/fdinfo/2
> > > > pos:	0
> > > > flags:	0100002
> > > > mnt_id:	22
> > > > 
> > > > The flags are O_RDWR | O_LARGEFILE.
> > > > 
> > > > O_NONBLOCK would have been 04000 (octal).
> > > > 
> > > > Please retry with qemu.git/master and post your QEMU command-line.
> > > > 
> > > > Stefan
> > > 
> > > Hi Stefan,
> > > 
> > > Thanks for checking this! I think I've worked out what's happening and it's a
> > > bit complicated ;-)
> > > 
> > > It appears that the behaviour is not directly related to the command line or
> > > build but rather the way the shell sets up the environment.
> > > 
> > > Here's a test very similar to yours using origin/master QEMU on my Debian
> > > stable machine, with bash as my shell:
> > > 
> > > $ qemu-system-x86_64 -S -vga none -nographic
> > > ( In another virtual console: )
> > > $ cat /proc/$(pgrep qemu)/fdinfo/2
> > > pos:    0
> > > flags:  0104002
> > > mnt_id: 19
> > > 
> > > So I seem to have O_NONBLOCK set on stderr.
> > > 
> > > However, if I perform that test by backgrounding QEMU and using the same
> > > console:
> > > $ qemu-system-x86_64 -S -vga none -nographic &
> > > [1] 4970
> > > [1]+  Stopped                 qemu-system-x86_64 -S -nographic -vga none
> > > $ cat /proc/$(pgrep qemu)/fdinfo/2
> > > pos:    0
> > > flags:  0100002
> > > mnt_id: 19
> > > 
> > > Hmm!
> > > 
> > > After a bit of reading, it seems like this is caused by the fact that bash,
> > > when there is no redirection, sets up fds 0, 1 and 2 by dup'ing a single file
> > > descriptor. I found this surprising!
> > > 
> > > A small test program that sets O_NONBLOCK on fd 1 and checks for it on fd 2
> > > seems to corroborate it:
> > > 
> > > #include <stdio.h>
> > > #include <fcntl.h>
> > > #include <assert.h>
> > > 
> > > int main() {
> > >         int f1, f2;
> > > 
> > >         f1 = fcntl(1, F_GETFL, 0);
> > >         f2 = fcntl(2, F_GETFL, 0);
> > >         assert(!(f1 & O_NONBLOCK));
> > >         assert(!(f2 & O_NONBLOCK));
> > >         assert(!fcntl(1, F_SETFL, f1 | O_NONBLOCK));
> > >         if (fcntl(2, F_GETFL, 0) & O_NONBLOCK)
> > >                 fprintf(stderr, "File descriptors are duplicates.\n");
> > >         else
> > >                 fprintf(stderr, "File descriptors are separate.\n");
> > >         return 0;
> > > }
> > > 
> > > gcc -Wall foo.c
> > > $ ./a.out
> > > File descriptors are duplicates.
> > > $ ./a.out > /dev/null
> > > File descriptors are separate.
> > > 
> > > The nonblocking flag is always (?) being set on stdin and stdout during
> > > qemu_chr_open_stdio(), but it doesn't affect stderr unless they are dup'd.
> > > 
> > > So this is probably not going to occur often, and certainly not when running
> > > under libvirt or other tools. But if it is actually what is happening, I'm not
> > > sure what to do about it. Any ideas?
> > 
> > This is unfortunate.  I wonder if opening /proc/fd/<n> is just a dup or
> > if it actually creates separate file status flags.
> 
> Calling open() always creates a new "file description" (which contains file
> status flags), so it's certainly a way to approach this.
> 
> I've testing doing this if we're going to use stderr as the qemu_logfile:
> 
> 	dup2(open("/proc/self/fd/2", O_WRONLY), 2)
> 
> ... in do_qemu_set_log() and it fixes the issue, however...

Thanks for trying it out.  Will you send a patch?

> * Doing the dup2() causes it to fix any other fprintf()s to stderr, but
>   I feel bad for changing the fd that's inside the stderr "FILE *" "underneath
>   it". Would it be better to do the following...
> 	qemu_logfile = fopen("/proc/self/fd/2", "a")
>   ... even though that will leave other fprintf()s unfixed. (I think that is
>   probably OK.)
> * We could also unconditionally do the dup2(), maybe early in main().

This is probably the cleanest way to solve the problem.  Early on during
startup QEMU can "undup" stderr and never worry about it again.

> * I'm not sure which platforms other than modern Linux support
>   /proc/self/fd/<n>. (Just fixing it for Linux seems a lot better than
>   nothing.)

This workaround is platform-specific but still useful (on Linux).

> * We could print a warning if stderr is nonblocking and we can't work around
>   it. This seems like a good idea even if don't fix it for Linux.

This is good when the workaround is not possible.

> * We could also handle it by converting the qemu_log_mask calls from fprintf()
>   to sprintf() + write(), as we'd then be able to handle EAGAIN correctly. This
>   doesn't feel any cleaner to me.

I agree.  Bypassing fprintf() doesn't help since the fd is still
O_NONBLOCK.

Stefan

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 473 bytes --]

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [Qemu-devel] Question about nonblocking stderr and lost logs
  2015-12-14  7:29         ` Stefan Hajnoczi
@ 2015-12-16  0:53           ` Sam Bobroff
  2015-12-22  6:43             ` Stefan Hajnoczi
  0 siblings, 1 reply; 8+ messages in thread
From: Sam Bobroff @ 2015-12-16  0:53 UTC (permalink / raw)
  To: Stefan Hajnoczi; +Cc: qemu-devel

On Mon, Dec 14, 2015 at 03:29:49PM +0800, Stefan Hajnoczi wrote:
> On Mon, Dec 14, 2015 at 04:19:50PM +1100, Sam Bobroff wrote:
> > On Thu, Dec 10, 2015 at 04:37:22PM +0800, Stefan Hajnoczi wrote:
> > > On Mon, Dec 07, 2015 at 02:22:18PM +1100, Sam Bobroff wrote:
> > > > On Fri, Dec 04, 2015 at 02:33:01PM +0800, Stefan Hajnoczi wrote:
> > > > > On Mon, Nov 30, 2015 at 04:34:47PM +1100, Sam Bobroff wrote:
> > > > > > Hi QEMU programmers,
> > > > > > 
> > > > > > While doing some experimental work on QEMU that has involved adding a lot of
> > > > > > new log messages (using qemu_log_mask()), I've discovered that under some
> > > > > > conditions a lot of my log messages go missing.
> > > > > > 
> > > > > > I've tracked the issue down to qemu_logfile being left at the default (stderr)
> > > > > > (so when not using -D) and according to strace what is happening is that the
> > > > > > debug messages are being passed to write() but write() is returning EWOULDBLOCK
> > > > > > and the messags are dropped.
> > > > > > 
> > > > > > This seems to be happening because stderr is being set non-blocking (which is a
> > > > > > bit odd to me), and it seems like NONBLOCK is being set as qmp_chardev_add() adds a
> > > > > > device for stdout (yes stdout, not stderr; perhaps file descriptors have been
> > > > > > dup'd by that point?).
> > > > > > 
> > > > > > Is this by design to prevent a slow console from blocking QEMU? If not, should
> > > > > > I delve further and try to prevent non-blocking being set on stderr?
> > > > > > 
> > > > > > (Unfortunately I don't have a replication for this using an unmodified QEMU but
> > > > > > I suspect I could find one if necessary.)
> > > > > 
> > > > > This sounds like a bug.  stderr should be blocking.
> > > > > 
> > > > > But it's specific to your QEMU build or your command-line.  Here are my
> > > > > results on an x86 host:
> > > > > 
> > > > > $ qemu-system-x86_64 -enable-kvm -m 1024 -cpu host \
> > > > >                      -drive if=virtio,cache=none,file=test.img,format=raw
> > > > > $ cat /proc/$(pgrep qemu)/fdinfo/2
> > > > > pos:	0
> > > > > flags:	0100002
> > > > > mnt_id:	22
> > > > > 
> > > > > The flags are O_RDWR | O_LARGEFILE.
> > > > > 
> > > > > O_NONBLOCK would have been 04000 (octal).
> > > > > 
> > > > > Please retry with qemu.git/master and post your QEMU command-line.
> > > > > 
> > > > > Stefan
> > > > 
> > > > Hi Stefan,
> > > > 
> > > > Thanks for checking this! I think I've worked out what's happening and it's a
> > > > bit complicated ;-)
> > > > 
> > > > It appears that the behaviour is not directly related to the command line or
> > > > build but rather the way the shell sets up the environment.
> > > > 
> > > > Here's a test very similar to yours using origin/master QEMU on my Debian
> > > > stable machine, with bash as my shell:
> > > > 
> > > > $ qemu-system-x86_64 -S -vga none -nographic
> > > > ( In another virtual console: )
> > > > $ cat /proc/$(pgrep qemu)/fdinfo/2
> > > > pos:    0
> > > > flags:  0104002
> > > > mnt_id: 19
> > > > 
> > > > So I seem to have O_NONBLOCK set on stderr.
> > > > 
> > > > However, if I perform that test by backgrounding QEMU and using the same
> > > > console:
> > > > $ qemu-system-x86_64 -S -vga none -nographic &
> > > > [1] 4970
> > > > [1]+  Stopped                 qemu-system-x86_64 -S -nographic -vga none
> > > > $ cat /proc/$(pgrep qemu)/fdinfo/2
> > > > pos:    0
> > > > flags:  0100002
> > > > mnt_id: 19
> > > > 
> > > > Hmm!
> > > > 
> > > > After a bit of reading, it seems like this is caused by the fact that bash,
> > > > when there is no redirection, sets up fds 0, 1 and 2 by dup'ing a single file
> > > > descriptor. I found this surprising!
> > > > 
> > > > A small test program that sets O_NONBLOCK on fd 1 and checks for it on fd 2
> > > > seems to corroborate it:
> > > > 
> > > > #include <stdio.h>
> > > > #include <fcntl.h>
> > > > #include <assert.h>
> > > > 
> > > > int main() {
> > > >         int f1, f2;
> > > > 
> > > >         f1 = fcntl(1, F_GETFL, 0);
> > > >         f2 = fcntl(2, F_GETFL, 0);
> > > >         assert(!(f1 & O_NONBLOCK));
> > > >         assert(!(f2 & O_NONBLOCK));
> > > >         assert(!fcntl(1, F_SETFL, f1 | O_NONBLOCK));
> > > >         if (fcntl(2, F_GETFL, 0) & O_NONBLOCK)
> > > >                 fprintf(stderr, "File descriptors are duplicates.\n");
> > > >         else
> > > >                 fprintf(stderr, "File descriptors are separate.\n");
> > > >         return 0;
> > > > }
> > > > 
> > > > gcc -Wall foo.c
> > > > $ ./a.out
> > > > File descriptors are duplicates.
> > > > $ ./a.out > /dev/null
> > > > File descriptors are separate.
> > > > 
> > > > The nonblocking flag is always (?) being set on stdin and stdout during
> > > > qemu_chr_open_stdio(), but it doesn't affect stderr unless they are dup'd.
> > > > 
> > > > So this is probably not going to occur often, and certainly not when running
> > > > under libvirt or other tools. But if it is actually what is happening, I'm not
> > > > sure what to do about it. Any ideas?
> > > 
> > > This is unfortunate.  I wonder if opening /proc/fd/<n> is just a dup or
> > > if it actually creates separate file status flags.
> > 
> > Calling open() always creates a new "file description" (which contains file
> > status flags), so it's certainly a way to approach this.
> > 
> > I've testing doing this if we're going to use stderr as the qemu_logfile:
> > 
> > 	dup2(open("/proc/self/fd/2", O_WRONLY), 2)
> > 
> > ... in do_qemu_set_log() and it fixes the issue, however...
> 
> Thanks for trying it out.  Will you send a patch?
>
> > * Doing the dup2() causes it to fix any other fprintf()s to stderr, but
> >   I feel bad for changing the fd that's inside the stderr "FILE *" "underneath
> >   it". Would it be better to do the following...
> > 	qemu_logfile = fopen("/proc/self/fd/2", "a")
> >   ... even though that will leave other fprintf()s unfixed. (I think that is
> >   probably OK.)
> > * We could also unconditionally do the dup2(), maybe early in main().
> 
> This is probably the cleanest way to solve the problem.  Early on during
> startup QEMU can "undup" stderr and never worry about it again.

OK.

> > * I'm not sure which platforms other than modern Linux support
> >   /proc/self/fd/<n>. (Just fixing it for Linux seems a lot better than
> >   nothing.)
> 
> This workaround is platform-specific but still useful (on Linux).
> 
> > * We could print a warning if stderr is nonblocking and we can't work around
> >   it. This seems like a good idea even if don't fix it for Linux.
> 
> This is good when the workaround is not possible.

OK, I'll include this too.

> > * We could also handle it by converting the qemu_log_mask calls from fprintf()
> >   to sprintf() + write(), as we'd then be able to handle EAGAIN correctly. This
> >   doesn't feel any cleaner to me.
> 
> I agree.  Bypassing fprintf() doesn't help since the fd is still
> O_NONBLOCK.

That wasn't exactly what I was trying to say, what I meant was that it *would*
help but that I don't think it's a good solution. I'll explain (sorry if I'm
over-explaining!):

The problem with fprintf() is that it may write "some" bytes and then return
failure, so it's impossible to handle the EAGAIN case correctly even if we
wanted to: we can't know where to continue writing. IMHO, this seems like it
could be considered a bug in fprintf().

Because write() always indicates how many bytes were written, we could manually
block on EAGAIN and resume writing at the correct place. But this would be
slower, require a local buffer (that may need to be resized sometimes) and more
complex code, so I don't think it's a good fix.  Although, it would be much more
portable and it would work.

Would you like me to post a patch for the re-open approach or explore the
sprintf()/write() idea?

Cheers,
Sam.

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [Qemu-devel] Question about nonblocking stderr and lost logs
  2015-12-16  0:53           ` Sam Bobroff
@ 2015-12-22  6:43             ` Stefan Hajnoczi
  0 siblings, 0 replies; 8+ messages in thread
From: Stefan Hajnoczi @ 2015-12-22  6:43 UTC (permalink / raw)
  To: Sam Bobroff; +Cc: qemu-devel

[-- Attachment #1: Type: text/plain, Size: 8622 bytes --]

On Wed, Dec 16, 2015 at 11:53:16AM +1100, Sam Bobroff wrote:
> On Mon, Dec 14, 2015 at 03:29:49PM +0800, Stefan Hajnoczi wrote:
> > On Mon, Dec 14, 2015 at 04:19:50PM +1100, Sam Bobroff wrote:
> > > On Thu, Dec 10, 2015 at 04:37:22PM +0800, Stefan Hajnoczi wrote:
> > > > On Mon, Dec 07, 2015 at 02:22:18PM +1100, Sam Bobroff wrote:
> > > > > On Fri, Dec 04, 2015 at 02:33:01PM +0800, Stefan Hajnoczi wrote:
> > > > > > On Mon, Nov 30, 2015 at 04:34:47PM +1100, Sam Bobroff wrote:
> > > > > > > Hi QEMU programmers,
> > > > > > > 
> > > > > > > While doing some experimental work on QEMU that has involved adding a lot of
> > > > > > > new log messages (using qemu_log_mask()), I've discovered that under some
> > > > > > > conditions a lot of my log messages go missing.
> > > > > > > 
> > > > > > > I've tracked the issue down to qemu_logfile being left at the default (stderr)
> > > > > > > (so when not using -D) and according to strace what is happening is that the
> > > > > > > debug messages are being passed to write() but write() is returning EWOULDBLOCK
> > > > > > > and the messags are dropped.
> > > > > > > 
> > > > > > > This seems to be happening because stderr is being set non-blocking (which is a
> > > > > > > bit odd to me), and it seems like NONBLOCK is being set as qmp_chardev_add() adds a
> > > > > > > device for stdout (yes stdout, not stderr; perhaps file descriptors have been
> > > > > > > dup'd by that point?).
> > > > > > > 
> > > > > > > Is this by design to prevent a slow console from blocking QEMU? If not, should
> > > > > > > I delve further and try to prevent non-blocking being set on stderr?
> > > > > > > 
> > > > > > > (Unfortunately I don't have a replication for this using an unmodified QEMU but
> > > > > > > I suspect I could find one if necessary.)
> > > > > > 
> > > > > > This sounds like a bug.  stderr should be blocking.
> > > > > > 
> > > > > > But it's specific to your QEMU build or your command-line.  Here are my
> > > > > > results on an x86 host:
> > > > > > 
> > > > > > $ qemu-system-x86_64 -enable-kvm -m 1024 -cpu host \
> > > > > >                      -drive if=virtio,cache=none,file=test.img,format=raw
> > > > > > $ cat /proc/$(pgrep qemu)/fdinfo/2
> > > > > > pos:	0
> > > > > > flags:	0100002
> > > > > > mnt_id:	22
> > > > > > 
> > > > > > The flags are O_RDWR | O_LARGEFILE.
> > > > > > 
> > > > > > O_NONBLOCK would have been 04000 (octal).
> > > > > > 
> > > > > > Please retry with qemu.git/master and post your QEMU command-line.
> > > > > > 
> > > > > > Stefan
> > > > > 
> > > > > Hi Stefan,
> > > > > 
> > > > > Thanks for checking this! I think I've worked out what's happening and it's a
> > > > > bit complicated ;-)
> > > > > 
> > > > > It appears that the behaviour is not directly related to the command line or
> > > > > build but rather the way the shell sets up the environment.
> > > > > 
> > > > > Here's a test very similar to yours using origin/master QEMU on my Debian
> > > > > stable machine, with bash as my shell:
> > > > > 
> > > > > $ qemu-system-x86_64 -S -vga none -nographic
> > > > > ( In another virtual console: )
> > > > > $ cat /proc/$(pgrep qemu)/fdinfo/2
> > > > > pos:    0
> > > > > flags:  0104002
> > > > > mnt_id: 19
> > > > > 
> > > > > So I seem to have O_NONBLOCK set on stderr.
> > > > > 
> > > > > However, if I perform that test by backgrounding QEMU and using the same
> > > > > console:
> > > > > $ qemu-system-x86_64 -S -vga none -nographic &
> > > > > [1] 4970
> > > > > [1]+  Stopped                 qemu-system-x86_64 -S -nographic -vga none
> > > > > $ cat /proc/$(pgrep qemu)/fdinfo/2
> > > > > pos:    0
> > > > > flags:  0100002
> > > > > mnt_id: 19
> > > > > 
> > > > > Hmm!
> > > > > 
> > > > > After a bit of reading, it seems like this is caused by the fact that bash,
> > > > > when there is no redirection, sets up fds 0, 1 and 2 by dup'ing a single file
> > > > > descriptor. I found this surprising!
> > > > > 
> > > > > A small test program that sets O_NONBLOCK on fd 1 and checks for it on fd 2
> > > > > seems to corroborate it:
> > > > > 
> > > > > #include <stdio.h>
> > > > > #include <fcntl.h>
> > > > > #include <assert.h>
> > > > > 
> > > > > int main() {
> > > > >         int f1, f2;
> > > > > 
> > > > >         f1 = fcntl(1, F_GETFL, 0);
> > > > >         f2 = fcntl(2, F_GETFL, 0);
> > > > >         assert(!(f1 & O_NONBLOCK));
> > > > >         assert(!(f2 & O_NONBLOCK));
> > > > >         assert(!fcntl(1, F_SETFL, f1 | O_NONBLOCK));
> > > > >         if (fcntl(2, F_GETFL, 0) & O_NONBLOCK)
> > > > >                 fprintf(stderr, "File descriptors are duplicates.\n");
> > > > >         else
> > > > >                 fprintf(stderr, "File descriptors are separate.\n");
> > > > >         return 0;
> > > > > }
> > > > > 
> > > > > gcc -Wall foo.c
> > > > > $ ./a.out
> > > > > File descriptors are duplicates.
> > > > > $ ./a.out > /dev/null
> > > > > File descriptors are separate.
> > > > > 
> > > > > The nonblocking flag is always (?) being set on stdin and stdout during
> > > > > qemu_chr_open_stdio(), but it doesn't affect stderr unless they are dup'd.
> > > > > 
> > > > > So this is probably not going to occur often, and certainly not when running
> > > > > under libvirt or other tools. But if it is actually what is happening, I'm not
> > > > > sure what to do about it. Any ideas?
> > > > 
> > > > This is unfortunate.  I wonder if opening /proc/fd/<n> is just a dup or
> > > > if it actually creates separate file status flags.
> > > 
> > > Calling open() always creates a new "file description" (which contains file
> > > status flags), so it's certainly a way to approach this.
> > > 
> > > I've testing doing this if we're going to use stderr as the qemu_logfile:
> > > 
> > > 	dup2(open("/proc/self/fd/2", O_WRONLY), 2)
> > > 
> > > ... in do_qemu_set_log() and it fixes the issue, however...
> > 
> > Thanks for trying it out.  Will you send a patch?
> >
> > > * Doing the dup2() causes it to fix any other fprintf()s to stderr, but
> > >   I feel bad for changing the fd that's inside the stderr "FILE *" "underneath
> > >   it". Would it be better to do the following...
> > > 	qemu_logfile = fopen("/proc/self/fd/2", "a")
> > >   ... even though that will leave other fprintf()s unfixed. (I think that is
> > >   probably OK.)
> > > * We could also unconditionally do the dup2(), maybe early in main().
> > 
> > This is probably the cleanest way to solve the problem.  Early on during
> > startup QEMU can "undup" stderr and never worry about it again.
> 
> OK.
> 
> > > * I'm not sure which platforms other than modern Linux support
> > >   /proc/self/fd/<n>. (Just fixing it for Linux seems a lot better than
> > >   nothing.)
> > 
> > This workaround is platform-specific but still useful (on Linux).
> > 
> > > * We could print a warning if stderr is nonblocking and we can't work around
> > >   it. This seems like a good idea even if don't fix it for Linux.
> > 
> > This is good when the workaround is not possible.
> 
> OK, I'll include this too.
> 
> > > * We could also handle it by converting the qemu_log_mask calls from fprintf()
> > >   to sprintf() + write(), as we'd then be able to handle EAGAIN correctly. This
> > >   doesn't feel any cleaner to me.
> > 
> > I agree.  Bypassing fprintf() doesn't help since the fd is still
> > O_NONBLOCK.
> 
> That wasn't exactly what I was trying to say, what I meant was that it *would*
> help but that I don't think it's a good solution. I'll explain (sorry if I'm
> over-explaining!):
> 
> The problem with fprintf() is that it may write "some" bytes and then return
> failure, so it's impossible to handle the EAGAIN case correctly even if we
> wanted to: we can't know where to continue writing. IMHO, this seems like it
> could be considered a bug in fprintf().
> 
> Because write() always indicates how many bytes were written, we could manually
> block on EAGAIN and resume writing at the correct place. But this would be
> slower, require a local buffer (that may need to be resized sometimes) and more
> complex code, so I don't think it's a good fix.  Although, it would be much more
> portable and it would work.
> 
> Would you like me to post a patch for the re-open approach or explore the
> sprintf()/write() idea?

The program may crash/terminate before stderr becomes writeable after
EAGAIN.

I prefer a reliable, blocking stderr so re-open seems like a good idea.

Stefan

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 473 bytes --]

^ permalink raw reply	[flat|nested] 8+ messages in thread

end of thread, other threads:[~2015-12-22  6:43 UTC | newest]

Thread overview: 8+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2015-11-30  5:34 [Qemu-devel] Question about nonblocking stderr and lost logs Sam Bobroff
2015-12-04  6:33 ` Stefan Hajnoczi
2015-12-07  3:22   ` Sam Bobroff
2015-12-10  8:37     ` Stefan Hajnoczi
2015-12-14  5:19       ` Sam Bobroff
2015-12-14  7:29         ` Stefan Hajnoczi
2015-12-16  0:53           ` Sam Bobroff
2015-12-22  6:43             ` Stefan Hajnoczi

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.