All of lore.kernel.org
 help / color / mirror / Atom feed
* [Qemu-devel] [PATCH] libqtest: fail if child coredumps
@ 2018-05-24 14:30 Michael S. Tsirkin
  2018-05-24 14:45 ` Thomas Huth
  2018-05-24 15:02 ` [Qemu-devel] [PATCH] libqtest: fail if child coredumps Peter Maydell
  0 siblings, 2 replies; 26+ messages in thread
From: Michael S. Tsirkin @ 2018-05-24 14:30 UTC (permalink / raw)
  To: qemu-devel
  Cc: Thomas Huth, Eric Blake, Philippe Mathieu-Daudé,
	Markus Armbruster, Tiwei Bie

Right now tests report OK status if QEMU crashes during cleanup.
Let's catch that case and fail the test.

Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
---
 tests/libqtest.c | 9 ++++++++-
 1 file changed, 8 insertions(+), 1 deletion(-)

diff --git a/tests/libqtest.c b/tests/libqtest.c
index 43fb97e..f869854 100644
--- a/tests/libqtest.c
+++ b/tests/libqtest.c
@@ -103,8 +103,15 @@ static int socket_accept(int sock)
 static void kill_qemu(QTestState *s)
 {
     if (s->qemu_pid != -1) {
+        int wstatus = 0;
+        pid_t pid;
+
         kill(s->qemu_pid, SIGTERM);
-        waitpid(s->qemu_pid, NULL, 0);
+        pid = waitpid(s->qemu_pid, &wstatus, 0);
+
+        if (pid == s->qemu_pid && WIFSIGNALED(wstatus)) {
+            assert(!WCOREDUMP(wstatus));
+        }
     }
 }
 
-- 
MST

^ permalink raw reply related	[flat|nested] 26+ messages in thread

* Re: [Qemu-devel] [PATCH] libqtest: fail if child coredumps
  2018-05-24 14:30 [Qemu-devel] [PATCH] libqtest: fail if child coredumps Michael S. Tsirkin
@ 2018-05-24 14:45 ` Thomas Huth
  2018-05-24 15:00   ` Michael S. Tsirkin
                     ` (3 more replies)
  2018-05-24 15:02 ` [Qemu-devel] [PATCH] libqtest: fail if child coredumps Peter Maydell
  1 sibling, 4 replies; 26+ messages in thread
From: Thomas Huth @ 2018-05-24 14:45 UTC (permalink / raw)
  To: Michael S. Tsirkin, qemu-devel
  Cc: Eric Blake, Philippe Mathieu-Daudé, Markus Armbruster, Tiwei Bie

On 24.05.2018 16:30, Michael S. Tsirkin wrote:
> Right now tests report OK status if QEMU crashes during cleanup.
> Let's catch that case and fail the test.
> 
> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
> ---
>  tests/libqtest.c | 9 ++++++++-
>  1 file changed, 8 insertions(+), 1 deletion(-)
> 
> diff --git a/tests/libqtest.c b/tests/libqtest.c
> index 43fb97e..f869854 100644
> --- a/tests/libqtest.c
> +++ b/tests/libqtest.c
> @@ -103,8 +103,15 @@ static int socket_accept(int sock)
>  static void kill_qemu(QTestState *s)
>  {
>      if (s->qemu_pid != -1) {
> +        int wstatus = 0;
> +        pid_t pid;
> +
>          kill(s->qemu_pid, SIGTERM);
> -        waitpid(s->qemu_pid, NULL, 0);
> +        pid = waitpid(s->qemu_pid, &wstatus, 0);
> +
> +        if (pid == s->qemu_pid && WIFSIGNALED(wstatus)) {
> +            assert(!WCOREDUMP(wstatus));
> +        }
>      }
>  }

That's basically a good idea ... but I've already seen yet another issue
in the past already: QEMU sometimes simply hangs in an endless loop
during clean up and never terminates. I think we should detect that
situation, too.

So instead of killing QEMU at the end of the testing, I think we should
rather try to terminate it with the QMP "quit" command. If QEMU does not
terminate with an exit code of 0, then the test should be flagged a
failed (and only if QEMU did not terminate at all, it should be killed
with SIGKILL).

 Thomas

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [Qemu-devel] [PATCH] libqtest: fail if child coredumps
  2018-05-24 14:45 ` Thomas Huth
@ 2018-05-24 15:00   ` Michael S. Tsirkin
  2018-05-24 15:04     ` Marc-André Lureau
  2018-05-24 15:46     ` Thomas Huth
  2018-05-24 15:17   ` [Qemu-devel] [PATCH 2/1] libqtest: add more exit status checks Michael S. Tsirkin
                     ` (2 subsequent siblings)
  3 siblings, 2 replies; 26+ messages in thread
From: Michael S. Tsirkin @ 2018-05-24 15:00 UTC (permalink / raw)
  To: Thomas Huth
  Cc: qemu-devel, Eric Blake, Philippe Mathieu-Daudé,
	Markus Armbruster, Tiwei Bie

On Thu, May 24, 2018 at 04:45:31PM +0200, Thomas Huth wrote:
> On 24.05.2018 16:30, Michael S. Tsirkin wrote:
> > Right now tests report OK status if QEMU crashes during cleanup.
> > Let's catch that case and fail the test.
> > 
> > Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
> > ---
> >  tests/libqtest.c | 9 ++++++++-
> >  1 file changed, 8 insertions(+), 1 deletion(-)
> > 
> > diff --git a/tests/libqtest.c b/tests/libqtest.c
> > index 43fb97e..f869854 100644
> > --- a/tests/libqtest.c
> > +++ b/tests/libqtest.c
> > @@ -103,8 +103,15 @@ static int socket_accept(int sock)
> >  static void kill_qemu(QTestState *s)
> >  {
> >      if (s->qemu_pid != -1) {
> > +        int wstatus = 0;
> > +        pid_t pid;
> > +
> >          kill(s->qemu_pid, SIGTERM);
> > -        waitpid(s->qemu_pid, NULL, 0);
> > +        pid = waitpid(s->qemu_pid, &wstatus, 0);
> > +
> > +        if (pid == s->qemu_pid && WIFSIGNALED(wstatus)) {
> > +            assert(!WCOREDUMP(wstatus));
> > +        }
> >      }
> >  }
> 
> That's basically a good idea ... but I've already seen yet another issue
> in the past already: QEMU sometimes simply hangs in an endless loop
> during clean up and never terminates. I think we should detect that
> situation, too. So instead of killing QEMU at the end of the testing, I think we should
> rather try to terminate it with the QMP "quit" command. If QEMU does not
> terminate with an exit code of 0, then the test should be flagged a
> failed (and only if QEMU did not terminate at all, it should be killed
> with SIGKILL).
> 
>  Thomas

Fine but can we agree to do this as a patch on top? And do you have
the time to implement this?

I'm seeing patches that cause crash on cleanup, it's not a theoretical
problem for me, so I'd like this one to go in first.

-- 
MST

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [Qemu-devel] [PATCH] libqtest: fail if child coredumps
  2018-05-24 14:30 [Qemu-devel] [PATCH] libqtest: fail if child coredumps Michael S. Tsirkin
  2018-05-24 14:45 ` Thomas Huth
@ 2018-05-24 15:02 ` Peter Maydell
  1 sibling, 0 replies; 26+ messages in thread
From: Peter Maydell @ 2018-05-24 15:02 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: QEMU Developers, Thomas Huth, Philippe Mathieu-Daudé,
	Tiwei Bie, Markus Armbruster

On 24 May 2018 at 15:30, Michael S. Tsirkin <mst@redhat.com> wrote:
> Right now tests report OK status if QEMU crashes during cleanup.
> Let's catch that case and fail the test.
>

Ha -- I've occasionally wondered why some obvious
test failures got reported as OK. Thanks for tracking
down what was going on.

-- PMM

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [Qemu-devel] [PATCH] libqtest: fail if child coredumps
  2018-05-24 15:00   ` Michael S. Tsirkin
@ 2018-05-24 15:04     ` Marc-André Lureau
  2018-05-24 15:18       ` Michael S. Tsirkin
  2018-05-24 15:46     ` Thomas Huth
  1 sibling, 1 reply; 26+ messages in thread
From: Marc-André Lureau @ 2018-05-24 15:04 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: Thomas Huth, Markus Armbruster, QEMU, Tiwei Bie,
	Philippe Mathieu-Daudé

Hi

On Thu, May 24, 2018 at 5:00 PM, Michael S. Tsirkin <mst@redhat.com> wrote:
> On Thu, May 24, 2018 at 04:45:31PM +0200, Thomas Huth wrote:
>> On 24.05.2018 16:30, Michael S. Tsirkin wrote:
>> > Right now tests report OK status if QEMU crashes during cleanup.
>> > Let's catch that case and fail the test.
>> >
>> > Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
>> > ---
>> >  tests/libqtest.c | 9 ++++++++-
>> >  1 file changed, 8 insertions(+), 1 deletion(-)
>> >
>> > diff --git a/tests/libqtest.c b/tests/libqtest.c
>> > index 43fb97e..f869854 100644
>> > --- a/tests/libqtest.c
>> > +++ b/tests/libqtest.c
>> > @@ -103,8 +103,15 @@ static int socket_accept(int sock)
>> >  static void kill_qemu(QTestState *s)
>> >  {
>> >      if (s->qemu_pid != -1) {
>> > +        int wstatus = 0;
>> > +        pid_t pid;
>> > +
>> >          kill(s->qemu_pid, SIGTERM);
>> > -        waitpid(s->qemu_pid, NULL, 0);
>> > +        pid = waitpid(s->qemu_pid, &wstatus, 0);
>> > +
>> > +        if (pid == s->qemu_pid && WIFSIGNALED(wstatus)) {
>> > +            assert(!WCOREDUMP(wstatus));
>> > +        }
>> >      }
>> >  }
>>
>> That's basically a good idea ... but I've already seen yet another issue
>> in the past already: QEMU sometimes simply hangs in an endless loop
>> during clean up and never terminates. I think we should detect that
>> situation, too. So instead of killing QEMU at the end of the testing, I think we should
>> rather try to terminate it with the QMP "quit" command. If QEMU does not
>> terminate with an exit code of 0, then the test should be flagged a
>> failed (and only if QEMU did not terminate at all, it should be killed
>> with SIGKILL).
>>
>>  Thomas
>
> Fine but can we agree to do this as a patch on top? And do you have
> the time to implement this?
>
> I'm seeing patches that cause crash on cleanup, it's not a theoretical
> problem for me, so I'd like this one to go in first.

What is the difference between sending "quit" and SIGTERM (qemu has
code to handle it in termsig_handler()). Is this documented somewhere?


-- 
Marc-André Lureau

^ permalink raw reply	[flat|nested] 26+ messages in thread

* [Qemu-devel] [PATCH 2/1] libqtest: add more exit status checks
  2018-05-24 14:45 ` Thomas Huth
  2018-05-24 15:00   ` Michael S. Tsirkin
@ 2018-05-24 15:17   ` Michael S. Tsirkin
  2018-05-24 15:24     ` Eric Blake
  2018-05-24 15:38   ` [Qemu-devel] [PATCH v2 " Michael S. Tsirkin
  2018-05-24 16:11   ` [Qemu-devel] [PATCH v3 " Michael S. Tsirkin
  3 siblings, 1 reply; 26+ messages in thread
From: Michael S. Tsirkin @ 2018-05-24 15:17 UTC (permalink / raw)
  To: qemu-devel
  Cc: Thomas Huth, Eric Blake, Philippe Mathieu-Daudé, Markus Armbruster

Add more checks on how did QEMU exit.

Legal ways to exit right now:
- exit(0) or return from main
- kill(SIGTERM) - sent by testing infrastructure

Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Suggested-by: Thomas Huth <thuth@redhat.com>
---

TODO: refactor testing infrastructure to stop
QEMU through QMP as opposed to SIGTERM.


 tests/libqtest.c | 7 +++++++
 1 file changed, 7 insertions(+)

diff --git a/tests/libqtest.c b/tests/libqtest.c
index f869854..2470f57 100644
--- a/tests/libqtest.c
+++ b/tests/libqtest.c
@@ -110,7 +110,14 @@ static void kill_qemu(QTestState *s)
         pid = waitpid(s->qemu_pid, &wstatus, 0);
 
         if (pid == s->qemu_pid && WIFSIGNALED(wstatus)) {
+            /* Core dump is never OK */
             assert(!WCOREDUMP(wstatus));
+            /* Either exit normally or get killed */
+            assert((WIFEXITED(wstatus) || WIFSIGNALED(wstatus)));
+            /* Exited normally - check exit status. */
+            assert(!WIFEXITED(wstatus) || WEXITSTATUS(wstatus) == 0);
+            /* Killed - we only ever send SIGTERM. */
+            assert(!WIFSIGNALED(wstatus) || WTERMSIG(wstatus) == SIGTERM);
         }
     }
 }
-- 
MST

^ permalink raw reply related	[flat|nested] 26+ messages in thread

* Re: [Qemu-devel] [PATCH] libqtest: fail if child coredumps
  2018-05-24 15:04     ` Marc-André Lureau
@ 2018-05-24 15:18       ` Michael S. Tsirkin
  0 siblings, 0 replies; 26+ messages in thread
From: Michael S. Tsirkin @ 2018-05-24 15:18 UTC (permalink / raw)
  To: Marc-André Lureau
  Cc: Thomas Huth, Markus Armbruster, QEMU, Tiwei Bie,
	Philippe Mathieu-Daudé

On Thu, May 24, 2018 at 05:04:56PM +0200, Marc-André Lureau wrote:
> Hi
> 
> On Thu, May 24, 2018 at 5:00 PM, Michael S. Tsirkin <mst@redhat.com> wrote:
> > On Thu, May 24, 2018 at 04:45:31PM +0200, Thomas Huth wrote:
> >> On 24.05.2018 16:30, Michael S. Tsirkin wrote:
> >> > Right now tests report OK status if QEMU crashes during cleanup.
> >> > Let's catch that case and fail the test.
> >> >
> >> > Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
> >> > ---
> >> >  tests/libqtest.c | 9 ++++++++-
> >> >  1 file changed, 8 insertions(+), 1 deletion(-)
> >> >
> >> > diff --git a/tests/libqtest.c b/tests/libqtest.c
> >> > index 43fb97e..f869854 100644
> >> > --- a/tests/libqtest.c
> >> > +++ b/tests/libqtest.c
> >> > @@ -103,8 +103,15 @@ static int socket_accept(int sock)
> >> >  static void kill_qemu(QTestState *s)
> >> >  {
> >> >      if (s->qemu_pid != -1) {
> >> > +        int wstatus = 0;
> >> > +        pid_t pid;
> >> > +
> >> >          kill(s->qemu_pid, SIGTERM);
> >> > -        waitpid(s->qemu_pid, NULL, 0);
> >> > +        pid = waitpid(s->qemu_pid, &wstatus, 0);
> >> > +
> >> > +        if (pid == s->qemu_pid && WIFSIGNALED(wstatus)) {
> >> > +            assert(!WCOREDUMP(wstatus));
> >> > +        }
> >> >      }
> >> >  }
> >>
> >> That's basically a good idea ... but I've already seen yet another issue
> >> in the past already: QEMU sometimes simply hangs in an endless loop
> >> during clean up and never terminates. I think we should detect that
> >> situation, too. So instead of killing QEMU at the end of the testing, I think we should
> >> rather try to terminate it with the QMP "quit" command. If QEMU does not
> >> terminate with an exit code of 0, then the test should be flagged a
> >> failed (and only if QEMU did not terminate at all, it should be killed
> >> with SIGKILL).
> >>
> >>  Thomas
> >
> > Fine but can we agree to do this as a patch on top? And do you have
> > the time to implement this?
> >
> > I'm seeing patches that cause crash on cleanup, it's not a theoretical
> > problem for me, so I'd like this one to go in first.
> 
> What is the difference between sending "quit" and SIGTERM (qemu has
> code to handle it in termsig_handler()). Is this documented somewhere?

SIGTERM might wake QEMU if it's blocked somewhere in kernel.
Sending quit also tests QMP isn't blocked.

> 
> -- 
> Marc-André Lureau

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [Qemu-devel] [PATCH 2/1] libqtest: add more exit status checks
  2018-05-24 15:17   ` [Qemu-devel] [PATCH 2/1] libqtest: add more exit status checks Michael S. Tsirkin
@ 2018-05-24 15:24     ` Eric Blake
  0 siblings, 0 replies; 26+ messages in thread
From: Eric Blake @ 2018-05-24 15:24 UTC (permalink / raw)
  To: Michael S. Tsirkin, qemu-devel
  Cc: Thomas Huth, Philippe Mathieu-Daudé, Markus Armbruster

On 05/24/2018 10:17 AM, Michael S. Tsirkin wrote:
> Add more checks on how did QEMU exit.
> 
> Legal ways to exit right now:
> - exit(0) or return from main
> - kill(SIGTERM) - sent by testing infrastructure

Yes, but kill(SIGTERM) causes qemu proper to exit with status 0, because 
we handle the signal rather than letting it have default behavior or 
rethrowing it.

> 
> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
> Suggested-by: Thomas Huth <thuth@redhat.com>
> ---
> 
> TODO: refactor testing infrastructure to stop
> QEMU through QMP as opposed to SIGTERM.

>           if (pid == s->qemu_pid && WIFSIGNALED(wstatus)) {
> +            /* Core dump is never OK */
>               assert(!WCOREDUMP(wstatus));
> +            /* Either exit normally or get killed */
> +            assert((WIFEXITED(wstatus) || WIFSIGNALED(wstatus)));

Eww. This is true if someone does SIGKILL, but not if someone does SIGTERM.

> +            /* Exited normally - check exit status. */
> +            assert(!WIFEXITED(wstatus) || WEXITSTATUS(wstatus) == 0);

This is fine.

> +            /* Killed - we only ever send SIGTERM. */
> +            assert(!WIFSIGNALED(wstatus) || WTERMSIG(wstatus) == SIGTERM);

But this is bogus.  Sending SIGTERM does NOT cause WIFSIGNALED(); you'd 
be hard-pressed to get qemu to exit with WTERMSIG(wstatus) == SIGTERM.

-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.           +1-919-301-3266
Virtualization:  qemu.org | libvirt.org

^ permalink raw reply	[flat|nested] 26+ messages in thread

* [Qemu-devel] [PATCH v2 2/1] libqtest: add more exit status checks
  2018-05-24 14:45 ` Thomas Huth
  2018-05-24 15:00   ` Michael S. Tsirkin
  2018-05-24 15:17   ` [Qemu-devel] [PATCH 2/1] libqtest: add more exit status checks Michael S. Tsirkin
@ 2018-05-24 15:38   ` Michael S. Tsirkin
  2018-05-24 15:52     ` Eric Blake
  2018-05-24 15:54     ` Thomas Huth
  2018-05-24 16:11   ` [Qemu-devel] [PATCH v3 " Michael S. Tsirkin
  3 siblings, 2 replies; 26+ messages in thread
From: Michael S. Tsirkin @ 2018-05-24 15:38 UTC (permalink / raw)
  To: qemu-devel
  Cc: Eric Blake, Thomas Huth, Philippe Mathieu-Daudé, Markus Armbruster

Add more checks on how did QEMU exit.

Legal ways to exit right now:
- exit(0) or return from main
- kill(SIGTERM) - sent by testing infrastructure

Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
---

Changes from v1:
- drop SIGTERM as suggested by Eric

 tests/libqtest.c | 5 +++++
 1 file changed, 5 insertions(+)

diff --git a/tests/libqtest.c b/tests/libqtest.c
index f869854..0576874 100644
--- a/tests/libqtest.c
+++ b/tests/libqtest.c
@@ -110,7 +110,12 @@ static void kill_qemu(QTestState *s)
         pid = waitpid(s->qemu_pid, &wstatus, 0);
 
         if (pid == s->qemu_pid && WIFSIGNALED(wstatus)) {
+            /* Core dump is never OK */
             assert(!WCOREDUMP(wstatus));
+            /* Must exit normally */
+            assert(WIFEXITED(wstatus));
+            /* If exited normally - check exit status */
+            assert(!WIFEXITED(wstatus) || !WEXITSTATUS(wstatus));
         }
     }
 }
-- 
MST

^ permalink raw reply related	[flat|nested] 26+ messages in thread

* Re: [Qemu-devel] [PATCH] libqtest: fail if child coredumps
  2018-05-24 15:00   ` Michael S. Tsirkin
  2018-05-24 15:04     ` Marc-André Lureau
@ 2018-05-24 15:46     ` Thomas Huth
  2018-05-24 15:51       ` Michael S. Tsirkin
  1 sibling, 1 reply; 26+ messages in thread
From: Thomas Huth @ 2018-05-24 15:46 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: qemu-devel, Eric Blake, Philippe Mathieu-Daudé,
	Markus Armbruster, Tiwei Bie

On 24.05.2018 17:00, Michael S. Tsirkin wrote:
> On Thu, May 24, 2018 at 04:45:31PM +0200, Thomas Huth wrote:
>> On 24.05.2018 16:30, Michael S. Tsirkin wrote:
>>> Right now tests report OK status if QEMU crashes during cleanup.
>>> Let's catch that case and fail the test.
>>>
>>> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
>>> ---
>>>  tests/libqtest.c | 9 ++++++++-
>>>  1 file changed, 8 insertions(+), 1 deletion(-)
>>>
>>> diff --git a/tests/libqtest.c b/tests/libqtest.c
>>> index 43fb97e..f869854 100644
>>> --- a/tests/libqtest.c
>>> +++ b/tests/libqtest.c
>>> @@ -103,8 +103,15 @@ static int socket_accept(int sock)
>>>  static void kill_qemu(QTestState *s)
>>>  {
>>>      if (s->qemu_pid != -1) {
>>> +        int wstatus = 0;
>>> +        pid_t pid;
>>> +
>>>          kill(s->qemu_pid, SIGTERM);
>>> -        waitpid(s->qemu_pid, NULL, 0);
>>> +        pid = waitpid(s->qemu_pid, &wstatus, 0);
>>> +
>>> +        if (pid == s->qemu_pid && WIFSIGNALED(wstatus)) {
>>> +            assert(!WCOREDUMP(wstatus));
>>> +        }
>>>      }
>>>  }
>>
>> That's basically a good idea ... but I've already seen yet another issue
>> in the past already: QEMU sometimes simply hangs in an endless loop
>> during clean up and never terminates. I think we should detect that
>> situation, too. So instead of killing QEMU at the end of the testing, I think we should
>> rather try to terminate it with the QMP "quit" command. If QEMU does not
>> terminate with an exit code of 0, then the test should be flagged a
>> failed (and only if QEMU did not terminate at all, it should be killed
>> with SIGKILL).
>>
>>  Thomas
> 
> Fine but can we agree to do this as a patch on top? And do you have
> the time to implement this?

Fine for me if we do that later. And no, I currently don't have time to
work on this (but I've got it on my TODO list somewhere, so I hope I
won't forget about it later...).

> I'm seeing patches that cause crash on cleanup, it's not a theoretical
> problem for me, so I'd like this one to go in first.

Ok, so here are the two problems that I remember:

1)

git checkout 17bd9597be45b96ae00716b0ae01a4d11bbee1ab~1
make -j4 subdir-nios2-softmmu
nios2-softmmu/qemu-system-nios2 -monitor stdio

==> You can neither "quit" from the HMP prompt, nor kill QEMU with
SIGTERM, you've got to use SIGKILL instead.

Ok, libqtest likely would not have reported success in this case, too,
we just did not notice since there is no libqtest in place that tests
the nios2 machine in TCG mode. Anyway, it would be nice if qtest would
properly detect the situation and report an error instead of just
hanging in waitpid().

2)

git checkout b39b61e410022f96ceb53d4381d25cba5126ac44~1
make -j4 subdir-ppc-softmmu
ppc-softmmu/qemu-system-ppc -M 40p -monitor stdio

===> QEMU asserts here with both, HMP "quit" and SIGTERM. This was the
problem where libqtest did not report an error though it should have
reported one. So QEMU was not hanging in an endless loop here, but core
dumped ... Sorry, I apparently mixed this up in my mind with the first
case. That means we should be fine here with your patch.

 Thomas

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [Qemu-devel] [PATCH] libqtest: fail if child coredumps
  2018-05-24 15:46     ` Thomas Huth
@ 2018-05-24 15:51       ` Michael S. Tsirkin
  2018-05-24 16:01         ` Thomas Huth
  0 siblings, 1 reply; 26+ messages in thread
From: Michael S. Tsirkin @ 2018-05-24 15:51 UTC (permalink / raw)
  To: Thomas Huth
  Cc: qemu-devel, Eric Blake, Philippe Mathieu-Daudé,
	Markus Armbruster, Tiwei Bie

On Thu, May 24, 2018 at 05:46:05PM +0200, Thomas Huth wrote:
> On 24.05.2018 17:00, Michael S. Tsirkin wrote:
> > On Thu, May 24, 2018 at 04:45:31PM +0200, Thomas Huth wrote:
> >> On 24.05.2018 16:30, Michael S. Tsirkin wrote:
> >>> Right now tests report OK status if QEMU crashes during cleanup.
> >>> Let's catch that case and fail the test.
> >>>
> >>> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
> >>> ---
> >>>  tests/libqtest.c | 9 ++++++++-
> >>>  1 file changed, 8 insertions(+), 1 deletion(-)
> >>>
> >>> diff --git a/tests/libqtest.c b/tests/libqtest.c
> >>> index 43fb97e..f869854 100644
> >>> --- a/tests/libqtest.c
> >>> +++ b/tests/libqtest.c
> >>> @@ -103,8 +103,15 @@ static int socket_accept(int sock)
> >>>  static void kill_qemu(QTestState *s)
> >>>  {
> >>>      if (s->qemu_pid != -1) {
> >>> +        int wstatus = 0;
> >>> +        pid_t pid;
> >>> +
> >>>          kill(s->qemu_pid, SIGTERM);
> >>> -        waitpid(s->qemu_pid, NULL, 0);
> >>> +        pid = waitpid(s->qemu_pid, &wstatus, 0);
> >>> +
> >>> +        if (pid == s->qemu_pid && WIFSIGNALED(wstatus)) {
> >>> +            assert(!WCOREDUMP(wstatus));
> >>> +        }
> >>>      }
> >>>  }
> >>
> >> That's basically a good idea ... but I've already seen yet another issue
> >> in the past already: QEMU sometimes simply hangs in an endless loop
> >> during clean up and never terminates. I think we should detect that
> >> situation, too. So instead of killing QEMU at the end of the testing, I think we should
> >> rather try to terminate it with the QMP "quit" command. If QEMU does not
> >> terminate with an exit code of 0, then the test should be flagged a
> >> failed (and only if QEMU did not terminate at all, it should be killed
> >> with SIGKILL).
> >>
> >>  Thomas
> > 
> > Fine but can we agree to do this as a patch on top? And do you have
> > the time to implement this?
> 
> Fine for me if we do that later. And no, I currently don't have time to
> work on this (but I've got it on my TODO list somewhere, so I hope I
> won't forget about it later...).
> 
> > I'm seeing patches that cause crash on cleanup, it's not a theoretical
> > problem for me, so I'd like this one to go in first.
> 
> Ok, so here are the two problems that I remember:
> 
> 1)
> 
> git checkout 17bd9597be45b96ae00716b0ae01a4d11bbee1ab~1
> make -j4 subdir-nios2-softmmu
> nios2-softmmu/qemu-system-nios2 -monitor stdio
> 
> ==> You can neither "quit" from the HMP prompt, nor kill QEMU with
> SIGTERM, you've got to use SIGKILL instead.
> 
> Ok, libqtest likely would not have reported success in this case, too,
> we just did not notice since there is no libqtest in place that tests
> the nios2 machine in TCG mode. Anyway, it would be nice if qtest would
> properly detect the situation and report an error instead of just
> hanging in waitpid().

That's a tall order, it assumes we can define a reasonable
time during which QEMU should exit.

> 2)
> 
> git checkout b39b61e410022f96ceb53d4381d25cba5126ac44~1
> make -j4 subdir-ppc-softmmu
> ppc-softmmu/qemu-system-ppc -M 40p -monitor stdio
> 
> ===> QEMU asserts here with both, HMP "quit" and SIGTERM. This was the
> problem where libqtest did not report an error though it should have
> reported one. So QEMU was not hanging in an endless loop here, but core
> dumped ... Sorry, I apparently mixed this up in my mind with the first
> case. That means we should be fine here with your patch.
> 
>  Thomas

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [Qemu-devel] [PATCH v2 2/1] libqtest: add more exit status checks
  2018-05-24 15:38   ` [Qemu-devel] [PATCH v2 " Michael S. Tsirkin
@ 2018-05-24 15:52     ` Eric Blake
  2018-05-24 16:00       ` Eric Blake
  2018-05-24 16:00       ` Michael S. Tsirkin
  2018-05-24 15:54     ` Thomas Huth
  1 sibling, 2 replies; 26+ messages in thread
From: Eric Blake @ 2018-05-24 15:52 UTC (permalink / raw)
  To: Michael S. Tsirkin, qemu-devel
  Cc: Thomas Huth, Philippe Mathieu-Daudé, Markus Armbruster

On 05/24/2018 10:38 AM, Michael S. Tsirkin wrote:
> Add more checks on how did QEMU exit.
> 
> Legal ways to exit right now:
> - exit(0) or return from main
> - kill(SIGTERM) - sent by testing infrastructure
> 
> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
> ---
> 
> Changes from v1:
> - drop SIGTERM as suggested by Eric
> 

> +++ b/tests/libqtest.c
> @@ -110,7 +110,12 @@ static void kill_qemu(QTestState *s)
>           pid = waitpid(s->qemu_pid, &wstatus, 0);
>   
>           if (pid == s->qemu_pid && WIFSIGNALED(wstatus)) {

Wait a moment. If WIFSIGNALED() is true...

> +            /* Core dump is never OK */
>               assert(!WCOREDUMP(wstatus));
> +            /* Must exit normally */
> +            assert(WIFEXITED(wstatus));

...then WIFEXITED() is false.  This is bogus.

> +            /* If exited normally - check exit status */
> +            assert(!WIFEXITED(wstatus) || !WEXITSTATUS(wstatus));

And you have some redundancy - !WIFEXITED() is not possible if you just 
asserted WIFEXITED().

Better would be:

if (pid == s->qemu_pid) {
     /*
      * Since sending SIGTERM turns into a normal exit, we want to flag
      * any non-normal exit, whether or not it dumped core, as a test
      * failure (even if it was a SIGKILL from someone desperate to stop
      * the testsuite).
      */
     assert(WIFEXITED(wstatus) && !WEXITSTATUS(wstatus));
}

Also, since waitpid() can only return either s->qemu_pid or -1 as we 
aren't using WNOHANG, it may also be worth asserting that if pid == -1, 
we either have EAGAIN (but why aren't we looping in that case?) or ECHILD.

-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.           +1-919-301-3266
Virtualization:  qemu.org | libvirt.org

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [Qemu-devel] [PATCH v2 2/1] libqtest: add more exit status checks
  2018-05-24 15:38   ` [Qemu-devel] [PATCH v2 " Michael S. Tsirkin
  2018-05-24 15:52     ` Eric Blake
@ 2018-05-24 15:54     ` Thomas Huth
  2018-05-24 16:01       ` Michael S. Tsirkin
  1 sibling, 1 reply; 26+ messages in thread
From: Thomas Huth @ 2018-05-24 15:54 UTC (permalink / raw)
  To: Michael S. Tsirkin, qemu-devel
  Cc: Eric Blake, Philippe Mathieu-Daudé, Markus Armbruster

On 24.05.2018 17:38, Michael S. Tsirkin wrote:
> Add more checks on how did QEMU exit.
> 
> Legal ways to exit right now:
> - exit(0) or return from main
> - kill(SIGTERM) - sent by testing infrastructure
> 
> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
> ---
> 
> Changes from v1:
> - drop SIGTERM as suggested by Eric
> 
>  tests/libqtest.c | 5 +++++
>  1 file changed, 5 insertions(+)
> 
> diff --git a/tests/libqtest.c b/tests/libqtest.c
> index f869854..0576874 100644
> --- a/tests/libqtest.c
> +++ b/tests/libqtest.c
> @@ -110,7 +110,12 @@ static void kill_qemu(QTestState *s)
>          pid = waitpid(s->qemu_pid, &wstatus, 0);
>  
>          if (pid == s->qemu_pid && WIFSIGNALED(wstatus)) {

Since we're only waiting for one pid ... wouldn't it be better to do

 assert(pid == s->qemu_pid)

instead?

> +            /* Core dump is never OK */
>              assert(!WCOREDUMP(wstatus));
> +            /* Must exit normally */
> +            assert(WIFEXITED(wstatus));

So you asserted that WIFEXITED(wstatus) != 0 here ...

> +            /* If exited normally - check exit status */
> +            assert(!WIFEXITED(wstatus) || !WEXITSTATUS(wstatus));

So "!WIFEXITED(wstatus)" is always 0 here? That's confusing...?

 Thomas

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [Qemu-devel] [PATCH v2 2/1] libqtest: add more exit status checks
  2018-05-24 15:52     ` Eric Blake
@ 2018-05-24 16:00       ` Eric Blake
  2018-05-24 16:01         ` Michael S. Tsirkin
  2018-05-24 16:00       ` Michael S. Tsirkin
  1 sibling, 1 reply; 26+ messages in thread
From: Eric Blake @ 2018-05-24 16:00 UTC (permalink / raw)
  To: Michael S. Tsirkin, qemu-devel
  Cc: Thomas Huth, Philippe Mathieu-Daudé, Markus Armbruster

On 05/24/2018 10:52 AM, Eric Blake wrote:

> Also, since waitpid() can only return either s->qemu_pid or -1 as we 
> aren't using WNOHANG, it may also be worth asserting that if pid == -1, 
> we either have EAGAIN (but why aren't we looping in that case?) or ECHILD.

I meant EINTR, not EAGAIN.  But in general, using waitpid() to collect 
process status without doing it in a loop is risky.

-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.           +1-919-301-3266
Virtualization:  qemu.org | libvirt.org

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [Qemu-devel] [PATCH v2 2/1] libqtest: add more exit status checks
  2018-05-24 15:52     ` Eric Blake
  2018-05-24 16:00       ` Eric Blake
@ 2018-05-24 16:00       ` Michael S. Tsirkin
  1 sibling, 0 replies; 26+ messages in thread
From: Michael S. Tsirkin @ 2018-05-24 16:00 UTC (permalink / raw)
  To: Eric Blake
  Cc: qemu-devel, Thomas Huth, Philippe Mathieu-Daudé, Markus Armbruster

On Thu, May 24, 2018 at 10:52:26AM -0500, Eric Blake wrote:
> On 05/24/2018 10:38 AM, Michael S. Tsirkin wrote:
> > Add more checks on how did QEMU exit.
> > 
> > Legal ways to exit right now:
> > - exit(0) or return from main
> > - kill(SIGTERM) - sent by testing infrastructure
> > 
> > Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
> > ---
> > 
> > Changes from v1:
> > - drop SIGTERM as suggested by Eric
> > 
> 
> > +++ b/tests/libqtest.c
> > @@ -110,7 +110,12 @@ static void kill_qemu(QTestState *s)
> >           pid = waitpid(s->qemu_pid, &wstatus, 0);
> >           if (pid == s->qemu_pid && WIFSIGNALED(wstatus)) {
> 
> Wait a moment. If WIFSIGNALED() is true...
> 
> > +            /* Core dump is never OK */
> >               assert(!WCOREDUMP(wstatus));
> > +            /* Must exit normally */
> > +            assert(WIFEXITED(wstatus));
> 
> ...then WIFEXITED() is false.  This is bogus.
> 
> > +            /* If exited normally - check exit status */
> > +            assert(!WIFEXITED(wstatus) || !WEXITSTATUS(wstatus));
> 
> And you have some redundancy - !WIFEXITED() is not possible if you just
> asserted WIFEXITED().
> 
> Better would be:
> 
> if (pid == s->qemu_pid) {
>     /*
>      * Since sending SIGTERM turns into a normal exit, we want to flag
>      * any non-normal exit, whether or not it dumped core, as a test
>      * failure (even if it was a SIGKILL from someone desperate to stop
>      * the testsuite).
>      */
>     assert(WIFEXITED(wstatus) && !WEXITSTATUS(wstatus));
> }
> 
> Also, since waitpid() can only return either s->qemu_pid or -1 as we aren't
> using WNOHANG,


> it may also be worth asserting that if pid == -1, we either
> have EAGAIN (but why aren't we looping in that case?)

I don't think waitpid can return EAGAIN

> or ECHILD.

Right but checking for known failures explicitly is helpful for debugging.


> -- 
> Eric Blake, Principal Software Engineer
> Red Hat, Inc.           +1-919-301-3266
> Virtualization:  qemu.org | libvirt.org

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [Qemu-devel] [PATCH] libqtest: fail if child coredumps
  2018-05-24 15:51       ` Michael S. Tsirkin
@ 2018-05-24 16:01         ` Thomas Huth
  0 siblings, 0 replies; 26+ messages in thread
From: Thomas Huth @ 2018-05-24 16:01 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: qemu-devel, Eric Blake, Philippe Mathieu-Daudé,
	Markus Armbruster, Tiwei Bie

On 24.05.2018 17:51, Michael S. Tsirkin wrote:
> On Thu, May 24, 2018 at 05:46:05PM +0200, Thomas Huth wrote:
>> On 24.05.2018 17:00, Michael S. Tsirkin wrote:
>>> On Thu, May 24, 2018 at 04:45:31PM +0200, Thomas Huth wrote:
>>>> On 24.05.2018 16:30, Michael S. Tsirkin wrote:
>>>>> Right now tests report OK status if QEMU crashes during cleanup.
>>>>> Let's catch that case and fail the test.
>>>>>
>>>>> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
>>>>> ---
>>>>>  tests/libqtest.c | 9 ++++++++-
>>>>>  1 file changed, 8 insertions(+), 1 deletion(-)
>>>>>
>>>>> diff --git a/tests/libqtest.c b/tests/libqtest.c
>>>>> index 43fb97e..f869854 100644
>>>>> --- a/tests/libqtest.c
>>>>> +++ b/tests/libqtest.c
>>>>> @@ -103,8 +103,15 @@ static int socket_accept(int sock)
>>>>>  static void kill_qemu(QTestState *s)
>>>>>  {
>>>>>      if (s->qemu_pid != -1) {
>>>>> +        int wstatus = 0;
>>>>> +        pid_t pid;
>>>>> +
>>>>>          kill(s->qemu_pid, SIGTERM);
>>>>> -        waitpid(s->qemu_pid, NULL, 0);
>>>>> +        pid = waitpid(s->qemu_pid, &wstatus, 0);
>>>>> +
>>>>> +        if (pid == s->qemu_pid && WIFSIGNALED(wstatus)) {
>>>>> +            assert(!WCOREDUMP(wstatus));
>>>>> +        }
>>>>>      }
>>>>>  }
>>>>
>>>> That's basically a good idea ... but I've already seen yet another issue
>>>> in the past already: QEMU sometimes simply hangs in an endless loop
>>>> during clean up and never terminates. I think we should detect that
>>>> situation, too. So instead of killing QEMU at the end of the testing, I think we should
>>>> rather try to terminate it with the QMP "quit" command. If QEMU does not
>>>> terminate with an exit code of 0, then the test should be flagged a
>>>> failed (and only if QEMU did not terminate at all, it should be killed
>>>> with SIGKILL).
>>>>
>>>>  Thomas
>>>
>>> Fine but can we agree to do this as a patch on top? And do you have
>>> the time to implement this?
>>
>> Fine for me if we do that later. And no, I currently don't have time to
>> work on this (but I've got it on my TODO list somewhere, so I hope I
>> won't forget about it later...).
>>
>>> I'm seeing patches that cause crash on cleanup, it's not a theoretical
>>> problem for me, so I'd like this one to go in first.
>>
>> Ok, so here are the two problems that I remember:
>>
>> 1)
>>
>> git checkout 17bd9597be45b96ae00716b0ae01a4d11bbee1ab~1
>> make -j4 subdir-nios2-softmmu
>> nios2-softmmu/qemu-system-nios2 -monitor stdio
>>
>> ==> You can neither "quit" from the HMP prompt, nor kill QEMU with
>> SIGTERM, you've got to use SIGKILL instead.
>>
>> Ok, libqtest likely would not have reported success in this case, too,
>> we just did not notice since there is no libqtest in place that tests
>> the nios2 machine in TCG mode. Anyway, it would be nice if qtest would
>> properly detect the situation and report an error instead of just
>> hanging in waitpid().
> 
> That's a tall order, it assumes we can define a reasonable
> time during which QEMU should exit.

Yeah, thinking about this again, and considering the problems that we've
seen with overloaded test machines in the past, it's probably not worth
the effort to find a reasonable timeout here.

And since the problem with an endless loop won't go unnoticed (I
originally thought SIGTERM would kill QEMU, so that endless loops would
not be noticed here - I was just not aware of the signal handler that we
have here in QEMU), let's simply forget about that idea with QMP "quit"
+ timeout + SIGKILL and continue with your patch instead.

 Thomas


>> 2)
>>
>> git checkout b39b61e410022f96ceb53d4381d25cba5126ac44~1
>> make -j4 subdir-ppc-softmmu
>> ppc-softmmu/qemu-system-ppc -M 40p -monitor stdio
>>
>> ===> QEMU asserts here with both, HMP "quit" and SIGTERM. This was the
>> problem where libqtest did not report an error though it should have
>> reported one. So QEMU was not hanging in an endless loop here, but core
>> dumped ... Sorry, I apparently mixed this up in my mind with the first
>> case. That means we should be fine here with your patch.
>>
>>  Thomas

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [Qemu-devel] [PATCH v2 2/1] libqtest: add more exit status checks
  2018-05-24 15:54     ` Thomas Huth
@ 2018-05-24 16:01       ` Michael S. Tsirkin
  0 siblings, 0 replies; 26+ messages in thread
From: Michael S. Tsirkin @ 2018-05-24 16:01 UTC (permalink / raw)
  To: Thomas Huth
  Cc: qemu-devel, Eric Blake, Philippe Mathieu-Daudé, Markus Armbruster

On Thu, May 24, 2018 at 05:54:37PM +0200, Thomas Huth wrote:
> On 24.05.2018 17:38, Michael S. Tsirkin wrote:
> > Add more checks on how did QEMU exit.
> > 
> > Legal ways to exit right now:
> > - exit(0) or return from main
> > - kill(SIGTERM) - sent by testing infrastructure
> > 
> > Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
> > ---
> > 
> > Changes from v1:
> > - drop SIGTERM as suggested by Eric
> > 
> >  tests/libqtest.c | 5 +++++
> >  1 file changed, 5 insertions(+)
> > 
> > diff --git a/tests/libqtest.c b/tests/libqtest.c
> > index f869854..0576874 100644
> > --- a/tests/libqtest.c
> > +++ b/tests/libqtest.c
> > @@ -110,7 +110,12 @@ static void kill_qemu(QTestState *s)
> >          pid = waitpid(s->qemu_pid, &wstatus, 0);
> >  
> >          if (pid == s->qemu_pid && WIFSIGNALED(wstatus)) {
> 
> Since we're only waiting for one pid ... wouldn't it be better to do
> 
>  assert(pid == s->qemu_pid)
> 
> instead?
> 
> > +            /* Core dump is never OK */
> >              assert(!WCOREDUMP(wstatus));
> > +            /* Must exit normally */
> > +            assert(WIFEXITED(wstatus));
> 
> So you asserted that WIFEXITED(wstatus) != 0 here ...
> 
> > +            /* If exited normally - check exit status */
> > +            assert(!WIFEXITED(wstatus) || !WEXITSTATUS(wstatus));
> 
> So "!WIFEXITED(wstatus)" is always 0 here? That's confusing...?
> 
>  Thomas

It will allow adding more conditions if we ever need to though.

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [Qemu-devel] [PATCH v2 2/1] libqtest: add more exit status checks
  2018-05-24 16:00       ` Eric Blake
@ 2018-05-24 16:01         ` Michael S. Tsirkin
  2018-05-24 18:16           ` Eric Blake
  0 siblings, 1 reply; 26+ messages in thread
From: Michael S. Tsirkin @ 2018-05-24 16:01 UTC (permalink / raw)
  To: Eric Blake
  Cc: qemu-devel, Thomas Huth, Philippe Mathieu-Daudé, Markus Armbruster

On Thu, May 24, 2018 at 11:00:19AM -0500, Eric Blake wrote:
> On 05/24/2018 10:52 AM, Eric Blake wrote:
> 
> > Also, since waitpid() can only return either s->qemu_pid or -1 as we
> > aren't using WNOHANG, it may also be worth asserting that if pid == -1,
> > we either have EAGAIN (but why aren't we looping in that case?) or
> > ECHILD.
> 
> I meant EINTR, not EAGAIN.  But in general, using waitpid() to collect
> process status without doing it in a loop is risky.

Interesting. Risky how?

> -- 
> Eric Blake, Principal Software Engineer
> Red Hat, Inc.           +1-919-301-3266
> Virtualization:  qemu.org | libvirt.org

^ permalink raw reply	[flat|nested] 26+ messages in thread

* [Qemu-devel] [PATCH v3 2/1] libqtest: add more exit status checks
  2018-05-24 14:45 ` Thomas Huth
                     ` (2 preceding siblings ...)
  2018-05-24 15:38   ` [Qemu-devel] [PATCH v2 " Michael S. Tsirkin
@ 2018-05-24 16:11   ` Michael S. Tsirkin
  2018-05-24 17:26     ` Thomas Huth
  3 siblings, 1 reply; 26+ messages in thread
From: Michael S. Tsirkin @ 2018-05-24 16:11 UTC (permalink / raw)
  To: qemu-devel
  Cc: Eric Blake, Thomas Huth, Philippe Mathieu-Daudé, Markus Armbruster

Add more checks on how did QEMU exit.

Legal ways to exit right now:
- exit(0) or return from main
- kill(SIGTERM) - sent by testing infrastructure

Anything else is illegal.

Include explicit asserts on bad behaviour encountered
to make debugging easier.

Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
---

Changes from v2:
- bugfix
- assert returned pid
- rework complex asserts for clarity

Changes from v1:
- drop SIGTERM as suggested by Eric

 tests/libqtest.c | 5 +++++


 tests/libqtest.c | 12 +++++++++++-
 1 file changed, 11 insertions(+), 1 deletion(-)

diff --git a/tests/libqtest.c b/tests/libqtest.c
index f869854..36ca859 100644
--- a/tests/libqtest.c
+++ b/tests/libqtest.c
@@ -109,9 +109,19 @@ static void kill_qemu(QTestState *s)
         kill(s->qemu_pid, SIGTERM);
         pid = waitpid(s->qemu_pid, &wstatus, 0);
 
-        if (pid == s->qemu_pid && WIFSIGNALED(wstatus)) {
+        /* waitpid returns child PID on success */
+        assert(pid == s->qemu_pid);
+
+        /* If exited on signal - check the reason: core dump is never OK */
+        if (WIFSIGNALED(wstatus)) {
             assert(!WCOREDUMP(wstatus));
         }
+        /* If exited normally - check exit status */
+        if (WIFEXITED(wstatus)) {
+            assert(!WEXITSTATUS(wstatus));
+        }
+        /* Valid ways to exit: right now only return from main or exit */
+        assert(WIFEXITED(wstatus));
     }
 }
 
-- 
MST

^ permalink raw reply related	[flat|nested] 26+ messages in thread

* Re: [Qemu-devel] [PATCH v3 2/1] libqtest: add more exit status checks
  2018-05-24 16:11   ` [Qemu-devel] [PATCH v3 " Michael S. Tsirkin
@ 2018-05-24 17:26     ` Thomas Huth
  2018-05-24 17:33       ` Michael S. Tsirkin
  0 siblings, 1 reply; 26+ messages in thread
From: Thomas Huth @ 2018-05-24 17:26 UTC (permalink / raw)
  To: Michael S. Tsirkin, qemu-devel
  Cc: Eric Blake, Philippe Mathieu-Daudé, Markus Armbruster

On 24.05.2018 18:11, Michael S. Tsirkin wrote:
> Add more checks on how did QEMU exit.
> 
> Legal ways to exit right now:
> - exit(0) or return from main
> - kill(SIGTERM) - sent by testing infrastructure
> 
> Anything else is illegal.
[...]
> -        if (pid == s->qemu_pid && WIFSIGNALED(wstatus)) {
> +        /* waitpid returns child PID on success */
> +        assert(pid == s->qemu_pid);
> +
> +        /* If exited on signal - check the reason: core dump is never OK */
> +        if (WIFSIGNALED(wstatus)) {
>              assert(!WCOREDUMP(wstatus));
>          }
> +        /* If exited normally - check exit status */
> +        if (WIFEXITED(wstatus)) {
> +            assert(!WEXITSTATUS(wstatus));
> +        }
> +        /* Valid ways to exit: right now only return from main or exit */
> +        assert(WIFEXITED(wstatus));
>      }
>  }

It's strange that you always get WIFEXITED(wstatus) == true here, even
if QEMU has been terminated by SIGTERM? I assume that's due to the fact
that QEMU intercepts SIGTERM and terminates via exit() instead? So I
think you could simply replace the last three asserts with:

	assert(WIFEXITED(wstatus) && !WEXITSTATUS(wstatus));

 Thomas

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [Qemu-devel] [PATCH v3 2/1] libqtest: add more exit status checks
  2018-05-24 17:26     ` Thomas Huth
@ 2018-05-24 17:33       ` Michael S. Tsirkin
  2018-05-24 17:58         ` Thomas Huth
  0 siblings, 1 reply; 26+ messages in thread
From: Michael S. Tsirkin @ 2018-05-24 17:33 UTC (permalink / raw)
  To: Thomas Huth
  Cc: qemu-devel, Eric Blake, Philippe Mathieu-Daudé, Markus Armbruster

On Thu, May 24, 2018 at 07:26:16PM +0200, Thomas Huth wrote:
> On 24.05.2018 18:11, Michael S. Tsirkin wrote:
> > Add more checks on how did QEMU exit.
> > 
> > Legal ways to exit right now:
> > - exit(0) or return from main
> > - kill(SIGTERM) - sent by testing infrastructure
> > 
> > Anything else is illegal.
> [...]
> > -        if (pid == s->qemu_pid && WIFSIGNALED(wstatus)) {
> > +        /* waitpid returns child PID on success */
> > +        assert(pid == s->qemu_pid);
> > +
> > +        /* If exited on signal - check the reason: core dump is never OK */
> > +        if (WIFSIGNALED(wstatus)) {
> >              assert(!WCOREDUMP(wstatus));
> >          }
> > +        /* If exited normally - check exit status */
> > +        if (WIFEXITED(wstatus)) {
> > +            assert(!WEXITSTATUS(wstatus));
> > +        }
> > +        /* Valid ways to exit: right now only return from main or exit */
> > +        assert(WIFEXITED(wstatus));
> >      }
> >  }
> 
> It's strange that you always get WIFEXITED(wstatus) == true here, even
> if QEMU has been terminated by SIGTERM? I assume that's due to the fact
> that QEMU intercepts SIGTERM and terminates via exit() instead?

Right now, yes. This can of course change, so it's not
a good idea hard-coding this assumption to deep
in the code, imho.

> So I
> think you could simply replace the last three asserts with:
> 
> 	assert(WIFEXITED(wstatus) && !WEXITSTATUS(wstatus));
> 
>  Thomas

I could but they would be harder to debug.

If I see
	"assertion failed: !WCOREDUMP(wstatus)"
then that is very readable.

If I just see
	"assertion failed: WIFEXITED(wstatus) && !WEXITSTATUS(wstatus)"
then I just know something went wrong.

-- 
MST

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [Qemu-devel] [PATCH v3 2/1] libqtest: add more exit status checks
  2018-05-24 17:33       ` Michael S. Tsirkin
@ 2018-05-24 17:58         ` Thomas Huth
  2018-05-24 18:15           ` Michael S. Tsirkin
  0 siblings, 1 reply; 26+ messages in thread
From: Thomas Huth @ 2018-05-24 17:58 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: qemu-devel, Eric Blake, Philippe Mathieu-Daudé, Markus Armbruster

On 24.05.2018 19:33, Michael S. Tsirkin wrote:
> On Thu, May 24, 2018 at 07:26:16PM +0200, Thomas Huth wrote:
>> On 24.05.2018 18:11, Michael S. Tsirkin wrote:
>>> Add more checks on how did QEMU exit.
>>>
>>> Legal ways to exit right now:
>>> - exit(0) or return from main
>>> - kill(SIGTERM) - sent by testing infrastructure
>>>
>>> Anything else is illegal.
>> [...]
>>> -        if (pid == s->qemu_pid && WIFSIGNALED(wstatus)) {
>>> +        /* waitpid returns child PID on success */
>>> +        assert(pid == s->qemu_pid);
>>> +
>>> +        /* If exited on signal - check the reason: core dump is never OK */
>>> +        if (WIFSIGNALED(wstatus)) {
>>>              assert(!WCOREDUMP(wstatus));
>>>          }
>>> +        /* If exited normally - check exit status */
>>> +        if (WIFEXITED(wstatus)) {
>>> +            assert(!WEXITSTATUS(wstatus));
>>> +        }
>>> +        /* Valid ways to exit: right now only return from main or exit */
>>> +        assert(WIFEXITED(wstatus));
>>>      }
>>>  }
>>
>> It's strange that you always get WIFEXITED(wstatus) == true here, even
>> if QEMU has been terminated by SIGTERM? I assume that's due to the fact
>> that QEMU intercepts SIGTERM and terminates via exit() instead?
> 
> Right now, yes. This can of course change, so it's not
> a good idea hard-coding this assumption to deep
> in the code, imho.
> 
>> So I
>> think you could simply replace the last three asserts with:
>>
>> 	assert(WIFEXITED(wstatus) && !WEXITSTATUS(wstatus));
>>
>>  Thomas
> 
> I could but they would be harder to debug.
> 
> If I see
> 	"assertion failed: !WCOREDUMP(wstatus)"
> then that is very readable.
> 
> If I just see
> 	"assertion failed: WIFEXITED(wstatus) && !WEXITSTATUS(wstatus)"
> then I just know something went wrong.

Then simply use:

	assert(WIFEXITED(wstatus));
	assert(!WEXITSTATUS(wstatus));

If QEMU exited due to a signal, you'll see the first assert, and if it
returned a non-zero exit code, you'll see the second assert. That's all
you really need to know here, I think.

I don't think that you gain anything by checking WCOREDUMP() here. And
according to the man-page of waitpid:

 This macro is not specified in POSIX.1-2001 and is not
 available on some UNIX implementations (e.g., AIX, SunOS).
 Only use this enclosed in #ifdef WCOREDUMP ... #endif.

So if you insist on using that macro, you might need to add some #ifdef
code around it, I think.

 Thomas

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [Qemu-devel] [PATCH v3 2/1] libqtest: add more exit status checks
  2018-05-24 17:58         ` Thomas Huth
@ 2018-05-24 18:15           ` Michael S. Tsirkin
  0 siblings, 0 replies; 26+ messages in thread
From: Michael S. Tsirkin @ 2018-05-24 18:15 UTC (permalink / raw)
  To: Thomas Huth
  Cc: qemu-devel, Eric Blake, Philippe Mathieu-Daudé, Markus Armbruster

On Thu, May 24, 2018 at 07:58:06PM +0200, Thomas Huth wrote:
> On 24.05.2018 19:33, Michael S. Tsirkin wrote:
> > On Thu, May 24, 2018 at 07:26:16PM +0200, Thomas Huth wrote:
> >> On 24.05.2018 18:11, Michael S. Tsirkin wrote:
> >>> Add more checks on how did QEMU exit.
> >>>
> >>> Legal ways to exit right now:
> >>> - exit(0) or return from main
> >>> - kill(SIGTERM) - sent by testing infrastructure
> >>>
> >>> Anything else is illegal.
> >> [...]
> >>> -        if (pid == s->qemu_pid && WIFSIGNALED(wstatus)) {
> >>> +        /* waitpid returns child PID on success */
> >>> +        assert(pid == s->qemu_pid);
> >>> +
> >>> +        /* If exited on signal - check the reason: core dump is never OK */
> >>> +        if (WIFSIGNALED(wstatus)) {
> >>>              assert(!WCOREDUMP(wstatus));
> >>>          }
> >>> +        /* If exited normally - check exit status */
> >>> +        if (WIFEXITED(wstatus)) {
> >>> +            assert(!WEXITSTATUS(wstatus));
> >>> +        }
> >>> +        /* Valid ways to exit: right now only return from main or exit */
> >>> +        assert(WIFEXITED(wstatus));
> >>>      }
> >>>  }
> >>
> >> It's strange that you always get WIFEXITED(wstatus) == true here, even
> >> if QEMU has been terminated by SIGTERM? I assume that's due to the fact
> >> that QEMU intercepts SIGTERM and terminates via exit() instead?
> > 
> > Right now, yes. This can of course change, so it's not
> > a good idea hard-coding this assumption to deep
> > in the code, imho.
> > 
> >> So I
> >> think you could simply replace the last three asserts with:
> >>
> >> 	assert(WIFEXITED(wstatus) && !WEXITSTATUS(wstatus));
> >>
> >>  Thomas
> > 
> > I could but they would be harder to debug.
> > 
> > If I see
> > 	"assertion failed: !WCOREDUMP(wstatus)"
> > then that is very readable.
> > 
> > If I just see
> > 	"assertion failed: WIFEXITED(wstatus) && !WEXITSTATUS(wstatus)"
> > then I just know something went wrong.
> 
> Then simply use:
> 
> 	assert(WIFEXITED(wstatus));
> 	assert(!WEXITSTATUS(wstatus));
> 
> If QEMU exited due to a signal, you'll see the first assert, and if it
> returned a non-zero exit code, you'll see the second assert. That's all
> you really need to know here, I think.
> 
> I don't think that you gain anything by checking WCOREDUMP() here.
> And
> according to the man-page of waitpid:
> 
>  This macro is not specified in POSIX.1-2001 and is not
>  available on some UNIX implementations (e.g., AIX, SunOS).
>  Only use this enclosed in #ifdef WCOREDUMP ... #endif.
> 
> So if you insist on using that macro, you might need to add some #ifdef
> code around it, I think.
> 
>  Thomas

I think we are better off defining it if it's not defined.
Will post an osdep patch.

-- 
MST

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [Qemu-devel] [PATCH v2 2/1] libqtest: add more exit status checks
  2018-05-24 16:01         ` Michael S. Tsirkin
@ 2018-05-24 18:16           ` Eric Blake
  2018-05-24 18:20             ` Michael S. Tsirkin
  0 siblings, 1 reply; 26+ messages in thread
From: Eric Blake @ 2018-05-24 18:16 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: qemu-devel, Thomas Huth, Philippe Mathieu-Daudé, Markus Armbruster

On 05/24/2018 11:01 AM, Michael S. Tsirkin wrote:
> On Thu, May 24, 2018 at 11:00:19AM -0500, Eric Blake wrote:
>> On 05/24/2018 10:52 AM, Eric Blake wrote:
>>
>>> Also, since waitpid() can only return either s->qemu_pid or -1 as we
>>> aren't using WNOHANG, it may also be worth asserting that if pid == -1,
>>> we either have EAGAIN (but why aren't we looping in that case?) or
>>> ECHILD.
>>
>> I meant EINTR, not EAGAIN.  But in general, using waitpid() to collect
>> process status without doing it in a loop is risky.
> 
> Interesting. Risky how?

If your process has any signal handler installed, then an EINTR failure 
means you interpret a transient failure to grab process status (because 
your check was interrupted by something else) as a permanent failure, 
unless you go back to another waitpid() in a loop.

-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.           +1-919-301-3266
Virtualization:  qemu.org | libvirt.org

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [Qemu-devel] [PATCH v2 2/1] libqtest: add more exit status checks
  2018-05-24 18:16           ` Eric Blake
@ 2018-05-24 18:20             ` Michael S. Tsirkin
  2018-05-25  5:40               ` Markus Armbruster
  0 siblings, 1 reply; 26+ messages in thread
From: Michael S. Tsirkin @ 2018-05-24 18:20 UTC (permalink / raw)
  To: Eric Blake
  Cc: qemu-devel, Thomas Huth, Philippe Mathieu-Daudé, Markus Armbruster

On Thu, May 24, 2018 at 01:16:24PM -0500, Eric Blake wrote:
> On 05/24/2018 11:01 AM, Michael S. Tsirkin wrote:
> > On Thu, May 24, 2018 at 11:00:19AM -0500, Eric Blake wrote:
> > > On 05/24/2018 10:52 AM, Eric Blake wrote:
> > > 
> > > > Also, since waitpid() can only return either s->qemu_pid or -1 as we
> > > > aren't using WNOHANG, it may also be worth asserting that if pid == -1,
> > > > we either have EAGAIN (but why aren't we looping in that case?) or
> > > > ECHILD.
> > > 
> > > I meant EINTR, not EAGAIN.  But in general, using waitpid() to collect
> > > process status without doing it in a loop is risky.
> > 
> > Interesting. Risky how?
> 
> If your process has any signal handler installed, then an EINTR failure
> means you interpret a transient failure to grab process status (because your
> check was interrupted by something else) as a permanent failure, unless you
> go back to another waitpid() in a loop.

I don't think we have a handler installed, though.

> -- 
> Eric Blake, Principal Software Engineer
> Red Hat, Inc.           +1-919-301-3266
> Virtualization:  qemu.org | libvirt.org

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [Qemu-devel] [PATCH v2 2/1] libqtest: add more exit status checks
  2018-05-24 18:20             ` Michael S. Tsirkin
@ 2018-05-25  5:40               ` Markus Armbruster
  0 siblings, 0 replies; 26+ messages in thread
From: Markus Armbruster @ 2018-05-25  5:40 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: Eric Blake, Thomas Huth, qemu-devel, Philippe Mathieu-Daudé

"Michael S. Tsirkin" <mst@redhat.com> writes:

> On Thu, May 24, 2018 at 01:16:24PM -0500, Eric Blake wrote:
>> On 05/24/2018 11:01 AM, Michael S. Tsirkin wrote:
>> > On Thu, May 24, 2018 at 11:00:19AM -0500, Eric Blake wrote:
>> > > On 05/24/2018 10:52 AM, Eric Blake wrote:
>> > > 
>> > > > Also, since waitpid() can only return either s->qemu_pid or -1 as we
>> > > > aren't using WNOHANG, it may also be worth asserting that if pid == -1,
>> > > > we either have EAGAIN (but why aren't we looping in that case?) or
>> > > > ECHILD.
>> > > 
>> > > I meant EINTR, not EAGAIN.  But in general, using waitpid() to collect
>> > > process status without doing it in a loop is risky.
>> > 
>> > Interesting. Risky how?
>> 
>> If your process has any signal handler installed, then an EINTR failure
>> means you interpret a transient failure to grab process status (because your
>> check was interrupted by something else) as a permanent failure, unless you
>> go back to another waitpid() in a loop.
>
> I don't think we have a handler installed, though.

That's a nasty assumption to make for a library.

^ permalink raw reply	[flat|nested] 26+ messages in thread

end of thread, other threads:[~2018-05-25  5:40 UTC | newest]

Thread overview: 26+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2018-05-24 14:30 [Qemu-devel] [PATCH] libqtest: fail if child coredumps Michael S. Tsirkin
2018-05-24 14:45 ` Thomas Huth
2018-05-24 15:00   ` Michael S. Tsirkin
2018-05-24 15:04     ` Marc-André Lureau
2018-05-24 15:18       ` Michael S. Tsirkin
2018-05-24 15:46     ` Thomas Huth
2018-05-24 15:51       ` Michael S. Tsirkin
2018-05-24 16:01         ` Thomas Huth
2018-05-24 15:17   ` [Qemu-devel] [PATCH 2/1] libqtest: add more exit status checks Michael S. Tsirkin
2018-05-24 15:24     ` Eric Blake
2018-05-24 15:38   ` [Qemu-devel] [PATCH v2 " Michael S. Tsirkin
2018-05-24 15:52     ` Eric Blake
2018-05-24 16:00       ` Eric Blake
2018-05-24 16:01         ` Michael S. Tsirkin
2018-05-24 18:16           ` Eric Blake
2018-05-24 18:20             ` Michael S. Tsirkin
2018-05-25  5:40               ` Markus Armbruster
2018-05-24 16:00       ` Michael S. Tsirkin
2018-05-24 15:54     ` Thomas Huth
2018-05-24 16:01       ` Michael S. Tsirkin
2018-05-24 16:11   ` [Qemu-devel] [PATCH v3 " Michael S. Tsirkin
2018-05-24 17:26     ` Thomas Huth
2018-05-24 17:33       ` Michael S. Tsirkin
2018-05-24 17:58         ` Thomas Huth
2018-05-24 18:15           ` Michael S. Tsirkin
2018-05-24 15:02 ` [Qemu-devel] [PATCH] libqtest: fail if child coredumps Peter Maydell

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.