All of lore.kernel.org
 help / color / mirror / Atom feed
* [Qemu-devel] [PATCH v3 0/3] scripts/qemu.py small fixes
@ 2017-07-20  9:18 Amador Pahim
  2017-07-20  9:19 ` [Qemu-devel] [PATCH v3 1/3] qemu.py: fix is_running() Amador Pahim
                   ` (2 more replies)
  0 siblings, 3 replies; 17+ messages in thread
From: Amador Pahim @ 2017-07-20  9:18 UTC (permalink / raw)
  To: qemu-devel
  Cc: famz, berrange, ehabkost, mreitz, kwolf, armbru, crosa, ldoktor,
	Amador Pahim

First commit fixes the 'is_running()' method, which is not
working currently.

Second commit includes the qemu command line and its output when
there's an exception during the launch() and the VM is not started.

Last commit renames self._args to self.args. The leading underscore
represents that the variable is private and as such it should not be
accessed externally. But that variable is the main API for inclusion
of Qemu command arguments, so the rename makes it public.

Changes v1->v2:
 - Style fixes to make checkpatch.pl happy.
 - Rebased.
Changes v2->v3:
 - Fix typo in patch 3 ("qemu.py: make 'args' public") commit message.

Amador Pahim (3):
  qemu.py: fix is_running()
  qemu.py: include debug information on launch error
  qemu.py: make 'args' public

 scripts/qemu.py               | 33 ++++++++++++++++++++++++---------
 tests/qemu-iotests/iotests.py | 18 +++++++++---------
 2 files changed, 33 insertions(+), 18 deletions(-)

-- 
2.13.3

^ permalink raw reply	[flat|nested] 17+ messages in thread

* [Qemu-devel] [PATCH v3 1/3] qemu.py: fix is_running()
  2017-07-20  9:18 [Qemu-devel] [PATCH v3 0/3] scripts/qemu.py small fixes Amador Pahim
@ 2017-07-20  9:19 ` Amador Pahim
  2017-07-20 11:49   ` Markus Armbruster
  2017-07-20  9:19 ` [Qemu-devel] [PATCH v3 2/3] qemu.py: include debug information on launch error Amador Pahim
  2017-07-20  9:19 ` [Qemu-devel] [PATCH v3 3/3] qemu.py: make 'args' public Amador Pahim
  2 siblings, 1 reply; 17+ messages in thread
From: Amador Pahim @ 2017-07-20  9:19 UTC (permalink / raw)
  To: qemu-devel
  Cc: famz, berrange, ehabkost, mreitz, kwolf, armbru, crosa, ldoktor,
	Amador Pahim

Current implementation is broken. It does not really test if the child
process is running.

The Popen.returncode will only be set after by a poll(), wait() or
communicate(). If the Popen fails to launch a VM, the Popen.returncode
will not turn to None by itself.

Instead of using Popen.returncode, let's use Popen.poll(), which
actually checks if child process has terminated.

Signed-off-by: Amador Pahim <apahim@redhat.com>
Reviewed-by: Eduardo Habkost <ehabkost@redhat.com>
Reviewed-by: Fam Zheng <famz@redhat.com>
---
 scripts/qemu.py | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/scripts/qemu.py b/scripts/qemu.py
index 880e3e8219..f0fade32bd 100644
--- a/scripts/qemu.py
+++ b/scripts/qemu.py
@@ -86,7 +86,7 @@ class QEMUMachine(object):
             raise
 
     def is_running(self):
-        return self._popen and (self._popen.returncode is None)
+        return self._popen and (self._popen.poll() is None)
 
     def exitcode(self):
         if self._popen is None:
-- 
2.13.3

^ permalink raw reply related	[flat|nested] 17+ messages in thread

* [Qemu-devel] [PATCH v3 2/3] qemu.py: include debug information on launch error
  2017-07-20  9:18 [Qemu-devel] [PATCH v3 0/3] scripts/qemu.py small fixes Amador Pahim
  2017-07-20  9:19 ` [Qemu-devel] [PATCH v3 1/3] qemu.py: fix is_running() Amador Pahim
@ 2017-07-20  9:19 ` Amador Pahim
  2017-07-20 11:58   ` Markus Armbruster
  2017-07-20  9:19 ` [Qemu-devel] [PATCH v3 3/3] qemu.py: make 'args' public Amador Pahim
  2 siblings, 1 reply; 17+ messages in thread
From: Amador Pahim @ 2017-07-20  9:19 UTC (permalink / raw)
  To: qemu-devel
  Cc: famz, berrange, ehabkost, mreitz, kwolf, armbru, crosa, ldoktor,
	Amador Pahim

When launching a VM, if an exception happens and the VM is not
initiated, it is useful to see the qemu command line that was executed
and the output of that command.

Before the patch:

    >>> VM = qemu.QEMUMachine('../aarch64-softmmu/qemu-system-aarch64')
    >>> VM.launch()
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
      File "qemu.py", line 137, in launch
        self._post_launch()
      File "qemu.py", line 121, in _post_launch
        self._qmp.accept()
      File "qmp/qmp.py", line 145, in accept
        self.__sock, _ = self.__sock.accept()
      File "/usr/lib64/python2.7/socket.py", line 206, in accept
        sock, addr = self._sock.accept()
    socket.timeout: timed out

After the patch:

    >>> VM = qemu.QEMUMachine('../aarch64-softmmu/qemu-system-aarch64')
    >>> VM.launch()
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
      File "qemu.py", line 156, in launch
        raise RuntimeError(msg)
    RuntimeError: Error launching VM.
    Original Exception:
    Traceback (most recent call last):
      File "qemu.py", line 138, in launch
        self._post_launch()
      File "qemu.py", line 122, in _post_launch
        self._qmp.accept()
      File "qmp/qmp.py", line 145, in accept
        self.__sock, _ = self.__sock.accept()
      File "/usr/lib64/python2.7/socket.py", line 206, in accept
        sock, addr = self._sock.accept()
    timeout: timed out
    Command:
    /usr/bin/qemu-system-aarch64 -chardev socket,id=mon,
    path=/var/tmp/qemu-23958-monitor.sock -mon chardev=mon,mode=control
    -display none -vga none
    Output:
    qemu-system-aarch64: No machine specified, and there is no default
    Use -machine help to list supported machines

Also, if the launch() faces an exception, the 'except' now will use args
to fill the debug information. So this patch assigns 'args' earlier,
assuring it will be available for the 'except'.

Signed-off-by: Amador Pahim <apahim@redhat.com>
Reviewed-by: Fam Zheng <famz@redhat.com>
---
 scripts/qemu.py | 18 ++++++++++++++++--
 1 file changed, 16 insertions(+), 2 deletions(-)

diff --git a/scripts/qemu.py b/scripts/qemu.py
index f0fade32bd..2707ae7f75 100644
--- a/scripts/qemu.py
+++ b/scripts/qemu.py
@@ -18,6 +18,7 @@ import os
 import sys
 import subprocess
 import qmp.qmp
+import traceback
 
 
 class QEMUMachine(object):
@@ -129,17 +130,30 @@ class QEMUMachine(object):
         '''Launch the VM and establish a QMP connection'''
         devnull = open('/dev/null', 'rb')
         qemulog = open(self._qemu_log_path, 'wb')
+        args = self._wrapper + [self._binary] + self._base_args() + self.args
         try:
             self._pre_launch()
-            args = self._wrapper + [self._binary] + self._base_args() + self._args
             self._popen = subprocess.Popen(args, stdin=devnull, stdout=qemulog,
                                            stderr=subprocess.STDOUT, shell=False)
             self._post_launch()
         except:
+            self._load_io_log()
             if self.is_running():
                 self._popen.kill()
                 self._popen.wait()
-            self._load_io_log()
+            else:
+                exc_type, exc_value, exc_traceback = sys.exc_info()
+                msg = ('Error launching VM.\n'
+                       'Original Exception: \n%s'
+                       'Command:\n%s\n'
+                       'Output:\n%s\n' %
+                       (''.join(traceback.format_exception(exc_type,
+                                                           exc_value,
+                                                           exc_traceback)),
+                        ' '.join(args),
+                        self._iolog))
+                self._post_shutdown()
+                raise RuntimeError(msg)
             self._post_shutdown()
             raise
 
-- 
2.13.3

^ permalink raw reply related	[flat|nested] 17+ messages in thread

* [Qemu-devel] [PATCH v3 3/3] qemu.py: make 'args' public
  2017-07-20  9:18 [Qemu-devel] [PATCH v3 0/3] scripts/qemu.py small fixes Amador Pahim
  2017-07-20  9:19 ` [Qemu-devel] [PATCH v3 1/3] qemu.py: fix is_running() Amador Pahim
  2017-07-20  9:19 ` [Qemu-devel] [PATCH v3 2/3] qemu.py: include debug information on launch error Amador Pahim
@ 2017-07-20  9:19 ` Amador Pahim
  2 siblings, 0 replies; 17+ messages in thread
From: Amador Pahim @ 2017-07-20  9:19 UTC (permalink / raw)
  To: qemu-devel
  Cc: famz, berrange, ehabkost, mreitz, kwolf, armbru, crosa, ldoktor,
	Amador Pahim

Let's make args public so users can extend it without feeling like
abusing the internal API.

Signed-off-by: Amador Pahim <apahim@redhat.com>
Reviewed-by: Fam Zheng <famz@redhat.com>
---
 scripts/qemu.py               | 13 +++++++------
 tests/qemu-iotests/iotests.py | 18 +++++++++---------
 2 files changed, 16 insertions(+), 15 deletions(-)

diff --git a/scripts/qemu.py b/scripts/qemu.py
index 2707ae7f75..2c2043f89a 100644
--- a/scripts/qemu.py
+++ b/scripts/qemu.py
@@ -34,7 +34,7 @@ class QEMUMachine(object):
         self._qemu_log_path = os.path.join(test_dir, name + ".log")
         self._popen = None
         self._binary = binary
-        self._args = list(args) # Force copy args in case we modify them
+        self.args = list(args) # Force copy args in case we modify them
         self._wrapper = wrapper
         self._events = []
         self._iolog = None
@@ -44,8 +44,8 @@ class QEMUMachine(object):
     # This can be used to add an unused monitor instance.
     def add_monitor_telnet(self, ip, port):
         args = 'tcp:%s:%d,server,nowait,telnet' % (ip, port)
-        self._args.append('-monitor')
-        self._args.append(args)
+        self.args.append('-monitor')
+        self.args.append(args)
 
     def add_fd(self, fd, fdset, opaque, opts=''):
         '''Pass a file descriptor to the VM'''
@@ -55,8 +55,8 @@ class QEMUMachine(object):
         if opts:
             options.append(opts)
 
-        self._args.append('-add-fd')
-        self._args.append(','.join(options))
+        self.args.append('-add-fd')
+        self.args.append(','.join(options))
         return self
 
     def send_fd_scm(self, fd_file_path):
@@ -168,7 +168,8 @@ class QEMUMachine(object):
 
             exitcode = self._popen.wait()
             if exitcode < 0:
-                sys.stderr.write('qemu received signal %i: %s\n' % (-exitcode, ' '.join(self._args)))
+                sys.stderr.write('qemu received signal %i: %s\n' %
+                                 (-exitcode, ' '.join(self.args)))
             self._load_io_log()
             self._post_shutdown()
 
diff --git a/tests/qemu-iotests/iotests.py b/tests/qemu-iotests/iotests.py
index abcf3c10e2..6925d8841e 100644
--- a/tests/qemu-iotests/iotests.py
+++ b/tests/qemu-iotests/iotests.py
@@ -150,13 +150,13 @@ class VM(qtest.QEMUQtestMachine):
         self._num_drives = 0
 
     def add_device(self, opts):
-        self._args.append('-device')
-        self._args.append(opts)
+        self.args.append('-device')
+        self.args.append(opts)
         return self
 
     def add_drive_raw(self, opts):
-        self._args.append('-drive')
-        self._args.append(opts)
+        self.args.append('-drive')
+        self.args.append(opts)
         return self
 
     def add_drive(self, path, opts='', interface='virtio', format=imgfmt):
@@ -172,17 +172,17 @@ class VM(qtest.QEMUQtestMachine):
         if opts:
             options.append(opts)
 
-        self._args.append('-drive')
-        self._args.append(','.join(options))
+        self.args.append('-drive')
+        self.args.append(','.join(options))
         self._num_drives += 1
         return self
 
     def add_blockdev(self, opts):
-        self._args.append('-blockdev')
+        self.args.append('-blockdev')
         if isinstance(opts, str):
-            self._args.append(opts)
+            self.args.append(opts)
         else:
-            self._args.append(','.join(opts))
+            self.args.append(','.join(opts))
         return self
 
     def pause_drive(self, drive, event=None):
-- 
2.13.3

^ permalink raw reply related	[flat|nested] 17+ messages in thread

* Re: [Qemu-devel] [PATCH v3 1/3] qemu.py: fix is_running()
  2017-07-20  9:19 ` [Qemu-devel] [PATCH v3 1/3] qemu.py: fix is_running() Amador Pahim
@ 2017-07-20 11:49   ` Markus Armbruster
  2017-07-20 12:57     ` Amador Pahim
  0 siblings, 1 reply; 17+ messages in thread
From: Markus Armbruster @ 2017-07-20 11:49 UTC (permalink / raw)
  To: Amador Pahim; +Cc: qemu-devel, kwolf, ldoktor, ehabkost, mreitz, crosa, famz

Amador Pahim <apahim@redhat.com> writes:

> Current implementation is broken. It does not really test if the child
> process is running.

What usage exactly is broken by this?  Got a reproducer for me?

> The Popen.returncode will only be set after by a poll(), wait() or
> communicate(). If the Popen fails to launch a VM, the Popen.returncode
> will not turn to None by itself.

Hmm.  What is the value of .returncode then?

> Instead of using Popen.returncode, let's use Popen.poll(), which
> actually checks if child process has terminated.
>
> Signed-off-by: Amador Pahim <apahim@redhat.com>
> Reviewed-by: Eduardo Habkost <ehabkost@redhat.com>
> Reviewed-by: Fam Zheng <famz@redhat.com>
> ---
>  scripts/qemu.py | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/scripts/qemu.py b/scripts/qemu.py
> index 880e3e8219..f0fade32bd 100644
> --- a/scripts/qemu.py
> +++ b/scripts/qemu.py
> @@ -86,7 +86,7 @@ class QEMUMachine(object):
>              raise
>  
>      def is_running(self):
> -        return self._popen and (self._popen.returncode is None)
> +        return self._popen and (self._popen.poll() is None)
>  
>      def exitcode(self):
>          if self._popen is None:
               return None
           return self._popen.returncode

Why is this one safe?

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [Qemu-devel] [PATCH v3 2/3] qemu.py: include debug information on launch error
  2017-07-20  9:19 ` [Qemu-devel] [PATCH v3 2/3] qemu.py: include debug information on launch error Amador Pahim
@ 2017-07-20 11:58   ` Markus Armbruster
  2017-07-20 13:14     ` Amador Pahim
  0 siblings, 1 reply; 17+ messages in thread
From: Markus Armbruster @ 2017-07-20 11:58 UTC (permalink / raw)
  To: Amador Pahim; +Cc: qemu-devel, kwolf, ldoktor, ehabkost, mreitz, crosa, famz

Amador Pahim <apahim@redhat.com> writes:

> When launching a VM, if an exception happens and the VM is not
> initiated, it is useful to see the qemu command line that was executed
> and the output of that command.
>
> Before the patch:
>
>     >>> VM = qemu.QEMUMachine('../aarch64-softmmu/qemu-system-aarch64')
>     >>> VM.launch()
>     Traceback (most recent call last):
>       File "<stdin>", line 1, in <module>
>       File "qemu.py", line 137, in launch
>         self._post_launch()
>       File "qemu.py", line 121, in _post_launch
>         self._qmp.accept()
>       File "qmp/qmp.py", line 145, in accept
>         self.__sock, _ = self.__sock.accept()
>       File "/usr/lib64/python2.7/socket.py", line 206, in accept
>         sock, addr = self._sock.accept()
>     socket.timeout: timed out
>
> After the patch:
>
>     >>> VM = qemu.QEMUMachine('../aarch64-softmmu/qemu-system-aarch64')
>     >>> VM.launch()
>     Traceback (most recent call last):
>       File "<stdin>", line 1, in <module>
>       File "qemu.py", line 156, in launch
>         raise RuntimeError(msg)
>     RuntimeError: Error launching VM.
>     Original Exception:
>     Traceback (most recent call last):
>       File "qemu.py", line 138, in launch
>         self._post_launch()
>       File "qemu.py", line 122, in _post_launch
>         self._qmp.accept()
>       File "qmp/qmp.py", line 145, in accept
>         self.__sock, _ = self.__sock.accept()
>       File "/usr/lib64/python2.7/socket.py", line 206, in accept
>         sock, addr = self._sock.accept()
>     timeout: timed out
>     Command:
>     /usr/bin/qemu-system-aarch64 -chardev socket,id=mon,
>     path=/var/tmp/qemu-23958-monitor.sock -mon chardev=mon,mode=control
>     -display none -vga none
>     Output:
>     qemu-system-aarch64: No machine specified, and there is no default
>     Use -machine help to list supported machines
>
> Also, if the launch() faces an exception, the 'except' now will use args
> to fill the debug information. So this patch assigns 'args' earlier,
> assuring it will be available for the 'except'.
>
> Signed-off-by: Amador Pahim <apahim@redhat.com>
> Reviewed-by: Fam Zheng <famz@redhat.com>
> ---
>  scripts/qemu.py | 18 ++++++++++++++++--
>  1 file changed, 16 insertions(+), 2 deletions(-)
>
> diff --git a/scripts/qemu.py b/scripts/qemu.py
> index f0fade32bd..2707ae7f75 100644
> --- a/scripts/qemu.py
> +++ b/scripts/qemu.py
> @@ -18,6 +18,7 @@ import os
>  import sys
>  import subprocess
>  import qmp.qmp
> +import traceback
>  
>  
>  class QEMUMachine(object):
> @@ -129,17 +130,30 @@ class QEMUMachine(object):
>          '''Launch the VM and establish a QMP connection'''
>          devnull = open('/dev/null', 'rb')
>          qemulog = open(self._qemu_log_path, 'wb')
> +        args = self._wrapper + [self._binary] + self._base_args() + self.args
>          try:
>              self._pre_launch()
> -            args = self._wrapper + [self._binary] + self._base_args() + self._args
>              self._popen = subprocess.Popen(args, stdin=devnull, stdout=qemulog,
>                                             stderr=subprocess.STDOUT, shell=False)
>              self._post_launch()

Lifting the assignment out of the try makes sense, because exceptions
there are programming errors.

Why are ._pre_launch() and ._post_launch() inside the try?

>          except:

Why do we catch all exceptions, and not just
subprocess.CalledProcessError?

> +            self._load_io_log()
>              if self.is_running():

I suspect we can't get here only for "odd" exceptions, not for
subprocess.CalledProcessError.

>                  self._popen.kill()
>                  self._popen.wait()
> -            self._load_io_log()
> +            else:
> +                exc_type, exc_value, exc_traceback = sys.exc_info()
> +                msg = ('Error launching VM.\n'
> +                       'Original Exception: \n%s'
> +                       'Command:\n%s\n'
> +                       'Output:\n%s\n' %
> +                       (''.join(traceback.format_exception(exc_type,
> +                                                           exc_value,
> +                                                           exc_traceback)),
> +                        ' '.join(args),
> +                        self._iolog))
> +                self._post_shutdown()
> +                raise RuntimeError(msg)
>              self._post_shutdown()
>              raise

Why do we print a traceback when subprocess.CalledProcessError?

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [Qemu-devel] [PATCH v3 1/3] qemu.py: fix is_running()
  2017-07-20 11:49   ` Markus Armbruster
@ 2017-07-20 12:57     ` Amador Pahim
  2017-07-20 15:09       ` Markus Armbruster
  0 siblings, 1 reply; 17+ messages in thread
From: Amador Pahim @ 2017-07-20 12:57 UTC (permalink / raw)
  To: Markus Armbruster
  Cc: qemu-devel, kwolf, Lukáš Doktor, ehabkost, mreitz,
	Cleber Rosa, Fam Zheng

On Thu, Jul 20, 2017 at 1:49 PM, Markus Armbruster <armbru@redhat.com> wrote:
> Amador Pahim <apahim@redhat.com> writes:
>
>> Current implementation is broken. It does not really test if the child
>> process is running.
>
> What usage exactly is broken by this?  Got a reproducer for me?

Problem is that 'returncode' is not set without a calling
poll()/wait()/communicate(), so it's only useful to test if the
process is running after such calls. But if we use 'poll()' instead,
it will, according to the docs, "Check if child process has
terminated. Set and return returncode attribute."

Reproducer is:

 >>> import subprocess
 >>> devnull = open('/dev/null', 'rb')
 >>> p = subprocess.Popen(['qemu-system-x86_64', '-broken'],
stdin=devnull, stdout=devnull, stderr=devnull, shell=False)
 >>> print p.returncode
 None
 >>> print p.poll()
 1
 >>> print p.returncode
 1


>
>> The Popen.returncode will only be set after by a poll(), wait() or
>> communicate(). If the Popen fails to launch a VM, the Popen.returncode
>> will not turn to None by itself.
>
> Hmm.  What is the value of .returncode then?

returncode starts with None and becomes the process exit code when the
process is over and one of that three methods is called (poll(),
wait() or communicate()).

There's an error in my description though. The correct would be: "The
Popen.returncode will only be set after a call to poll(), wait() or
communicate(). If the Popen fails to launch a VM, the Popen.returncode
will not turn from None to the actual return code by itself."


>
>> Instead of using Popen.returncode, let's use Popen.poll(), which
>> actually checks if child process has terminated.
>>
>> Signed-off-by: Amador Pahim <apahim@redhat.com>
>> Reviewed-by: Eduardo Habkost <ehabkost@redhat.com>
>> Reviewed-by: Fam Zheng <famz@redhat.com>
>> ---
>>  scripts/qemu.py | 2 +-
>>  1 file changed, 1 insertion(+), 1 deletion(-)
>>
>> diff --git a/scripts/qemu.py b/scripts/qemu.py
>> index 880e3e8219..f0fade32bd 100644
>> --- a/scripts/qemu.py
>> +++ b/scripts/qemu.py
>> @@ -86,7 +86,7 @@ class QEMUMachine(object):
>>              raise
>>
>>      def is_running(self):
>> -        return self._popen and (self._popen.returncode is None)
>> +        return self._popen and (self._popen.poll() is None)
>>
>>      def exitcode(self):
>>          if self._popen is None:
>                return None
>            return self._popen.returncode
>
> Why is this one safe?

Here it's used just to retrieve the value from the Popen.returncode.
It's not being used to check whether the process is running or not.

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [Qemu-devel] [PATCH v3 2/3] qemu.py: include debug information on launch error
  2017-07-20 11:58   ` Markus Armbruster
@ 2017-07-20 13:14     ` Amador Pahim
  2017-07-20 14:43       ` Eduardo Habkost
  2017-07-20 15:01       ` Markus Armbruster
  0 siblings, 2 replies; 17+ messages in thread
From: Amador Pahim @ 2017-07-20 13:14 UTC (permalink / raw)
  To: Markus Armbruster
  Cc: qemu-devel, kwolf, Lukáš Doktor, ehabkost, mreitz,
	Cleber Rosa, Fam Zheng

On Thu, Jul 20, 2017 at 1:58 PM, Markus Armbruster <armbru@redhat.com> wrote:
> Amador Pahim <apahim@redhat.com> writes:
>
>> When launching a VM, if an exception happens and the VM is not
>> initiated, it is useful to see the qemu command line that was executed
>> and the output of that command.
>>
>> Before the patch:
>>
>>     >>> VM = qemu.QEMUMachine('../aarch64-softmmu/qemu-system-aarch64')
>>     >>> VM.launch()
>>     Traceback (most recent call last):
>>       File "<stdin>", line 1, in <module>
>>       File "qemu.py", line 137, in launch
>>         self._post_launch()
>>       File "qemu.py", line 121, in _post_launch
>>         self._qmp.accept()
>>       File "qmp/qmp.py", line 145, in accept
>>         self.__sock, _ = self.__sock.accept()
>>       File "/usr/lib64/python2.7/socket.py", line 206, in accept
>>         sock, addr = self._sock.accept()
>>     socket.timeout: timed out
>>
>> After the patch:
>>
>>     >>> VM = qemu.QEMUMachine('../aarch64-softmmu/qemu-system-aarch64')
>>     >>> VM.launch()
>>     Traceback (most recent call last):
>>       File "<stdin>", line 1, in <module>
>>       File "qemu.py", line 156, in launch
>>         raise RuntimeError(msg)
>>     RuntimeError: Error launching VM.
>>     Original Exception:
>>     Traceback (most recent call last):
>>       File "qemu.py", line 138, in launch
>>         self._post_launch()
>>       File "qemu.py", line 122, in _post_launch
>>         self._qmp.accept()
>>       File "qmp/qmp.py", line 145, in accept
>>         self.__sock, _ = self.__sock.accept()
>>       File "/usr/lib64/python2.7/socket.py", line 206, in accept
>>         sock, addr = self._sock.accept()
>>     timeout: timed out
>>     Command:
>>     /usr/bin/qemu-system-aarch64 -chardev socket,id=mon,
>>     path=/var/tmp/qemu-23958-monitor.sock -mon chardev=mon,mode=control
>>     -display none -vga none
>>     Output:
>>     qemu-system-aarch64: No machine specified, and there is no default
>>     Use -machine help to list supported machines
>>
>> Also, if the launch() faces an exception, the 'except' now will use args
>> to fill the debug information. So this patch assigns 'args' earlier,
>> assuring it will be available for the 'except'.
>>
>> Signed-off-by: Amador Pahim <apahim@redhat.com>
>> Reviewed-by: Fam Zheng <famz@redhat.com>
>> ---
>>  scripts/qemu.py | 18 ++++++++++++++++--
>>  1 file changed, 16 insertions(+), 2 deletions(-)
>>
>> diff --git a/scripts/qemu.py b/scripts/qemu.py
>> index f0fade32bd..2707ae7f75 100644
>> --- a/scripts/qemu.py
>> +++ b/scripts/qemu.py
>> @@ -18,6 +18,7 @@ import os
>>  import sys
>>  import subprocess
>>  import qmp.qmp
>> +import traceback
>>
>>
>>  class QEMUMachine(object):
>> @@ -129,17 +130,30 @@ class QEMUMachine(object):
>>          '''Launch the VM and establish a QMP connection'''
>>          devnull = open('/dev/null', 'rb')
>>          qemulog = open(self._qemu_log_path, 'wb')
>> +        args = self._wrapper + [self._binary] + self._base_args() + self.args
>>          try:
>>              self._pre_launch()
>> -            args = self._wrapper + [self._binary] + self._base_args() + self._args
>>              self._popen = subprocess.Popen(args, stdin=devnull, stdout=qemulog,
>>                                             stderr=subprocess.STDOUT, shell=False)
>>              self._post_launch()
>
> Lifting the assignment out of the try makes sense, because exceptions
> there are programming errors.
>
> Why are ._pre_launch() and ._post_launch() inside the try?

The _pre_launch()/_post_launch() are probably the main source of
exceptions here and I assume we want to handle the exceptions coming
from them.

The call to subprocess.Popen() is unlikely to raise exceptions. I
mean, it's easy to trigger an 'OSError' exception providing a
non-existent command to Popen, but once we provide something
executable, we will not see an exception here if the command fails.
The CalledProcessError is only raised by subprocess.check_call() or
subprocess.check_output().


>
>>          except:
>
> Why do we catch all exceptions, and not just
> subprocess.CalledProcessError?
>
>> +            self._load_io_log()
>>              if self.is_running():
>
> I suspect we can't get here only for "odd" exceptions, not for
> subprocess.CalledProcessError.
>
>>                  self._popen.kill()
>>                  self._popen.wait()
>> -            self._load_io_log()
>> +            else:
>> +                exc_type, exc_value, exc_traceback = sys.exc_info()
>> +                msg = ('Error launching VM.\n'
>> +                       'Original Exception: \n%s'
>> +                       'Command:\n%s\n'
>> +                       'Output:\n%s\n' %
>> +                       (''.join(traceback.format_exception(exc_type,
>> +                                                           exc_value,
>> +                                                           exc_traceback)),
>> +                        ' '.join(args),
>> +                        self._iolog))
>> +                self._post_shutdown()
>> +                raise RuntimeError(msg)
>>              self._post_shutdown()
>>              raise
>
> Why do we print a traceback when subprocess.CalledProcessError?

subprocess.CalledProcessError will not happen with subprocess.Popen,
even if the command fails. But if the virtual machine does not start,
we will, for example, see a 'socket.timeout' exception coming from
_post_launch() when it tries to connect to the qemu monitor.

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [Qemu-devel] [PATCH v3 2/3] qemu.py: include debug information on launch error
  2017-07-20 13:14     ` Amador Pahim
@ 2017-07-20 14:43       ` Eduardo Habkost
  2017-07-20 15:51         ` Amador Pahim
  2017-07-20 15:01       ` Markus Armbruster
  1 sibling, 1 reply; 17+ messages in thread
From: Eduardo Habkost @ 2017-07-20 14:43 UTC (permalink / raw)
  To: Amador Pahim
  Cc: Markus Armbruster, qemu-devel, kwolf, Lukáš Doktor,
	mreitz, Cleber Rosa, Fam Zheng

On Thu, Jul 20, 2017 at 03:14:28PM +0200, Amador Pahim wrote:
> On Thu, Jul 20, 2017 at 1:58 PM, Markus Armbruster <armbru@redhat.com> wrote:
> > Amador Pahim <apahim@redhat.com> writes:
> >
> >> When launching a VM, if an exception happens and the VM is not
> >> initiated, it is useful to see the qemu command line that was executed
> >> and the output of that command.
> >>
> >> Before the patch:
> >>
> >>     >>> VM = qemu.QEMUMachine('../aarch64-softmmu/qemu-system-aarch64')
> >>     >>> VM.launch()
> >>     Traceback (most recent call last):
> >>       File "<stdin>", line 1, in <module>
> >>       File "qemu.py", line 137, in launch
> >>         self._post_launch()
> >>       File "qemu.py", line 121, in _post_launch
> >>         self._qmp.accept()
> >>       File "qmp/qmp.py", line 145, in accept
> >>         self.__sock, _ = self.__sock.accept()
> >>       File "/usr/lib64/python2.7/socket.py", line 206, in accept
> >>         sock, addr = self._sock.accept()
> >>     socket.timeout: timed out
> >>
> >> After the patch:
> >>
> >>     >>> VM = qemu.QEMUMachine('../aarch64-softmmu/qemu-system-aarch64')
> >>     >>> VM.launch()
> >>     Traceback (most recent call last):
> >>       File "<stdin>", line 1, in <module>
> >>       File "qemu.py", line 156, in launch
> >>         raise RuntimeError(msg)
> >>     RuntimeError: Error launching VM.
> >>     Original Exception:
> >>     Traceback (most recent call last):
> >>       File "qemu.py", line 138, in launch
> >>         self._post_launch()
> >>       File "qemu.py", line 122, in _post_launch
> >>         self._qmp.accept()
> >>       File "qmp/qmp.py", line 145, in accept
> >>         self.__sock, _ = self.__sock.accept()
> >>       File "/usr/lib64/python2.7/socket.py", line 206, in accept
> >>         sock, addr = self._sock.accept()
> >>     timeout: timed out
> >>     Command:
> >>     /usr/bin/qemu-system-aarch64 -chardev socket,id=mon,
> >>     path=/var/tmp/qemu-23958-monitor.sock -mon chardev=mon,mode=control
> >>     -display none -vga none
> >>     Output:
> >>     qemu-system-aarch64: No machine specified, and there is no default
> >>     Use -machine help to list supported machines
> >>
> >> Also, if the launch() faces an exception, the 'except' now will use args
> >> to fill the debug information. So this patch assigns 'args' earlier,
> >> assuring it will be available for the 'except'.
> >>
> >> Signed-off-by: Amador Pahim <apahim@redhat.com>
> >> Reviewed-by: Fam Zheng <famz@redhat.com>
> >> ---
> >>  scripts/qemu.py | 18 ++++++++++++++++--
> >>  1 file changed, 16 insertions(+), 2 deletions(-)
> >>
> >> diff --git a/scripts/qemu.py b/scripts/qemu.py
> >> index f0fade32bd..2707ae7f75 100644
> >> --- a/scripts/qemu.py
> >> +++ b/scripts/qemu.py
> >> @@ -18,6 +18,7 @@ import os
> >>  import sys
> >>  import subprocess
> >>  import qmp.qmp
> >> +import traceback
> >>
> >>
> >>  class QEMUMachine(object):
> >> @@ -129,17 +130,30 @@ class QEMUMachine(object):
> >>          '''Launch the VM and establish a QMP connection'''
> >>          devnull = open('/dev/null', 'rb')
> >>          qemulog = open(self._qemu_log_path, 'wb')
> >> +        args = self._wrapper + [self._binary] + self._base_args() + self.args
> >>          try:
> >>              self._pre_launch()
> >> -            args = self._wrapper + [self._binary] + self._base_args() + self._args
> >>              self._popen = subprocess.Popen(args, stdin=devnull, stdout=qemulog,
> >>                                             stderr=subprocess.STDOUT, shell=False)
> >>              self._post_launch()
> >
> > Lifting the assignment out of the try makes sense, because exceptions
> > there are programming errors.
> >
> > Why are ._pre_launch() and ._post_launch() inside the try?
> 
> The _pre_launch()/_post_launch() are probably the main source of
> exceptions here and I assume we want to handle the exceptions coming
> from them.
> 
> The call to subprocess.Popen() is unlikely to raise exceptions. I
> mean, it's easy to trigger an 'OSError' exception providing a
> non-existent command to Popen, but once we provide something
> executable, we will not see an exception here if the command fails.
> The CalledProcessError is only raised by subprocess.check_call() or
> subprocess.check_output().
> 
> 

That's correct.  QEMU exiting won't cause an exception, it
doesn't matter the error code.  Failing to connect to QMP or
qtest sockets on _post_launch() will.

However, another purpose of the try/except block is to ensure we
clean up the process and log file in case _any_ exception happens
(programming error or not).  That's why everything after
open(self._qemu_log_path) is inside the try block.  We should be
careful to not break that.

I think this code really needs to be cleaned up.  The existing
try/except block has a huge number of responsibilities:
1) Ensuring we kill the process in case _post_launch() raises an
   exception.
2) Ensuring we load the log file even if Popen() or _post_launch()
   raise an exception.
3) Ensuring we remove the monitor socket if anything after
   _pre_launch() raises an exception (through _post_shutdown())
4) Ensuring the log file is deleted if anything after
   open(_qemu_log_path) raises an exception (through
   _post_shutdown())
5) Ensuring _post_shutdown() is called if anything after
   _pre_launch() raises an exception (e.g. QEMUQTestMachine
   deletes the socket created by _pre_launch() inside
   _post_shutdown()
6) (New, added by this patch) Generating a more useful exception
   message if something (what exactly?) fails.

Some of the cases above are just programming errors, but it's
still nice to do proper cleanup in case of programming errors.
e.g.: if _base_args() raise an exception, we can ensure we delete
the log file anyway (this patch breaks that).

If we don't clean this up, future cleanups (e.g. moving
_pre_launch() outside the try block) would easily break one or
more responsibilities above, and this would be very easy to miss
on patch review.


> >
> >>          except:
> >
> > Why do we catch all exceptions, and not just
> > subprocess.CalledProcessError?

See above.

> >
> >> +            self._load_io_log()
> >>              if self.is_running():
> >
> > I suspect we can't get here only for "odd" exceptions, not for
> > subprocess.CalledProcessError.

It depends on what you mean by "odd".

> >
> >>                  self._popen.kill()
> >>                  self._popen.wait()
> >> -            self._load_io_log()
> >> +            else:
> >> +                exc_type, exc_value, exc_traceback = sys.exc_info()
> >> +                msg = ('Error launching VM.\n'
> >> +                       'Original Exception: \n%s'
> >> +                       'Command:\n%s\n'
> >> +                       'Output:\n%s\n' %
> >> +                       (''.join(traceback.format_exception(exc_type,
> >> +                                                           exc_value,
> >> +                                                           exc_traceback)),
> >> +                        ' '.join(args),
> >> +                        self._iolog))
> >> +                self._post_shutdown()
> >> +                raise RuntimeError(msg)
> >>              self._post_shutdown()
> >>              raise
> >
> > Why do we print a traceback when subprocess.CalledProcessError?
> 
> subprocess.CalledProcessError will not happen with subprocess.Popen,
> even if the command fails. But if the virtual machine does not start,
> we will, for example, see a 'socket.timeout' exception coming from
> _post_launch() when it tries to connect to the qemu monitor.

Exactly.

-- 
Eduardo

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [Qemu-devel] [PATCH v3 2/3] qemu.py: include debug information on launch error
  2017-07-20 13:14     ` Amador Pahim
  2017-07-20 14:43       ` Eduardo Habkost
@ 2017-07-20 15:01       ` Markus Armbruster
  2017-07-20 15:50         ` Amador Pahim
  1 sibling, 1 reply; 17+ messages in thread
From: Markus Armbruster @ 2017-07-20 15:01 UTC (permalink / raw)
  To: Amador Pahim
  Cc: kwolf, Lukáš Doktor, Fam Zheng, ehabkost, qemu-devel,
	mreitz, Cleber Rosa

Amador Pahim <apahim@redhat.com> writes:

> On Thu, Jul 20, 2017 at 1:58 PM, Markus Armbruster <armbru@redhat.com> wrote:
>> Amador Pahim <apahim@redhat.com> writes:
>>
>>> When launching a VM, if an exception happens and the VM is not
>>> initiated, it is useful to see the qemu command line that was executed
>>> and the output of that command.
>>>
>>> Before the patch:
>>>
>>>     >>> VM = qemu.QEMUMachine('../aarch64-softmmu/qemu-system-aarch64')
>>>     >>> VM.launch()
>>>     Traceback (most recent call last):
>>>       File "<stdin>", line 1, in <module>
>>>       File "qemu.py", line 137, in launch
>>>         self._post_launch()
>>>       File "qemu.py", line 121, in _post_launch
>>>         self._qmp.accept()
>>>       File "qmp/qmp.py", line 145, in accept
>>>         self.__sock, _ = self.__sock.accept()
>>>       File "/usr/lib64/python2.7/socket.py", line 206, in accept
>>>         sock, addr = self._sock.accept()
>>>     socket.timeout: timed out
>>>
>>> After the patch:
>>>
>>>     >>> VM = qemu.QEMUMachine('../aarch64-softmmu/qemu-system-aarch64')
>>>     >>> VM.launch()
>>>     Traceback (most recent call last):
>>>       File "<stdin>", line 1, in <module>
>>>       File "qemu.py", line 156, in launch
>>>         raise RuntimeError(msg)
>>>     RuntimeError: Error launching VM.
>>>     Original Exception:
>>>     Traceback (most recent call last):
>>>       File "qemu.py", line 138, in launch
>>>         self._post_launch()
>>>       File "qemu.py", line 122, in _post_launch
>>>         self._qmp.accept()
>>>       File "qmp/qmp.py", line 145, in accept
>>>         self.__sock, _ = self.__sock.accept()
>>>       File "/usr/lib64/python2.7/socket.py", line 206, in accept
>>>         sock, addr = self._sock.accept()
>>>     timeout: timed out
>>>     Command:
>>>     /usr/bin/qemu-system-aarch64 -chardev socket,id=mon,
>>>     path=/var/tmp/qemu-23958-monitor.sock -mon chardev=mon,mode=control
>>>     -display none -vga none
>>>     Output:
>>>     qemu-system-aarch64: No machine specified, and there is no default
>>>     Use -machine help to list supported machines
>>>
>>> Also, if the launch() faces an exception, the 'except' now will use args
>>> to fill the debug information. So this patch assigns 'args' earlier,
>>> assuring it will be available for the 'except'.
>>>
>>> Signed-off-by: Amador Pahim <apahim@redhat.com>
>>> Reviewed-by: Fam Zheng <famz@redhat.com>
>>> ---
>>>  scripts/qemu.py | 18 ++++++++++++++++--
>>>  1 file changed, 16 insertions(+), 2 deletions(-)
>>>
>>> diff --git a/scripts/qemu.py b/scripts/qemu.py
>>> index f0fade32bd..2707ae7f75 100644
>>> --- a/scripts/qemu.py
>>> +++ b/scripts/qemu.py
>>> @@ -18,6 +18,7 @@ import os
>>>  import sys
>>>  import subprocess
>>>  import qmp.qmp
>>> +import traceback
>>>
>>>
>>>  class QEMUMachine(object):
>>> @@ -129,17 +130,30 @@ class QEMUMachine(object):
>>>          '''Launch the VM and establish a QMP connection'''
>>>          devnull = open('/dev/null', 'rb')
>>>          qemulog = open(self._qemu_log_path, 'wb')
>>> +        args = self._wrapper + [self._binary] + self._base_args() + self.args
>>>          try:
>>>              self._pre_launch()
>>> -            args = self._wrapper + [self._binary] + self._base_args() + self._args
>>>              self._popen = subprocess.Popen(args, stdin=devnull, stdout=qemulog,
>>>                                             stderr=subprocess.STDOUT, shell=False)
>>>              self._post_launch()
>>
>> Lifting the assignment out of the try makes sense, because exceptions
>> there are programming errors.
>>
>> Why are ._pre_launch() and ._post_launch() inside the try?
>
> The _pre_launch()/_post_launch() are probably the main source of
> exceptions here and I assume we want to handle the exceptions coming
> from them.
>
> The call to subprocess.Popen() is unlikely to raise exceptions. I
> mean, it's easy to trigger an 'OSError' exception providing a
> non-existent command to Popen, but once we provide something
> executable, we will not see an exception here if the command fails.
> The CalledProcessError is only raised by subprocess.check_call() or
> subprocess.check_output().

Okay.

Catching all exceptions is too sloppy for my taste.  But it's not your
patch's fault.

>>>          except:
>>
>> Why do we catch all exceptions, and not just
>> subprocess.CalledProcessError?
>>
>>> +            self._load_io_log()
>>>              if self.is_running():
>>
>> I suspect we can't get here only for "odd" exceptions, not for
>> subprocess.CalledProcessError.
>>
>>>                  self._popen.kill()
>>>                  self._popen.wait()
>>> -            self._load_io_log()
>>> +            else:
>>> +                exc_type, exc_value, exc_traceback = sys.exc_info()
>>> +                msg = ('Error launching VM.\n'
>>> +                       'Original Exception: \n%s'
>>> +                       'Command:\n%s\n'
>>> +                       'Output:\n%s\n' %
>>> +                       (''.join(traceback.format_exception(exc_type,
>>> +                                                           exc_value,
>>> +                                                           exc_traceback)),
>>> +                        ' '.join(args),
>>> +                        self._iolog))
>>> +                self._post_shutdown()
>>> +                raise RuntimeError(msg)
>>>              self._post_shutdown()
>>>              raise
>>
>> Why do we print a traceback when subprocess.CalledProcessError?
>
> subprocess.CalledProcessError will not happen with subprocess.Popen,
> even if the command fails. But if the virtual machine does not start,
> we will, for example, see a 'socket.timeout' exception coming from
> _post_launch() when it tries to connect to the qemu monitor.

A traceback is a inexcusably lazy way to report such errors.  You're
putting lipstick on a pig.

When putting on lipstick is all we can do, it can be better than
nothing.  Is it all we can do?

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [Qemu-devel] [PATCH v3 1/3] qemu.py: fix is_running()
  2017-07-20 12:57     ` Amador Pahim
@ 2017-07-20 15:09       ` Markus Armbruster
  2017-07-20 15:46         ` Amador Pahim
  2017-07-20 17:49         ` Eduardo Habkost
  0 siblings, 2 replies; 17+ messages in thread
From: Markus Armbruster @ 2017-07-20 15:09 UTC (permalink / raw)
  To: Amador Pahim
  Cc: kwolf, Lukáš Doktor, Fam Zheng, ehabkost, qemu-devel,
	mreitz, Cleber Rosa

Amador Pahim <apahim@redhat.com> writes:

> On Thu, Jul 20, 2017 at 1:49 PM, Markus Armbruster <armbru@redhat.com> wrote:
>> Amador Pahim <apahim@redhat.com> writes:
>>
>>> Current implementation is broken. It does not really test if the child
>>> process is running.
>>
>> What usage exactly is broken by this?  Got a reproducer for me?
>
> Problem is that 'returncode' is not set without a calling
> poll()/wait()/communicate(), so it's only useful to test if the
> process is running after such calls. But if we use 'poll()' instead,
> it will, according to the docs, "Check if child process has
> terminated. Set and return returncode attribute."
>
> Reproducer is:
>
>  >>> import subprocess
>  >>> devnull = open('/dev/null', 'rb')
>  >>> p = subprocess.Popen(['qemu-system-x86_64', '-broken'],
> stdin=devnull, stdout=devnull, stderr=devnull, shell=False)
>  >>> print p.returncode
>  None
>  >>> print p.poll()
>  1
>  >>> print p.returncode
>  1
>
>>> The Popen.returncode will only be set after by a poll(), wait() or
>>> communicate(). If the Popen fails to launch a VM, the Popen.returncode
>>> will not turn to None by itself.
>>
>> Hmm.  What is the value of .returncode then?
>
> returncode starts with None and becomes the process exit code when the
> process is over and one of that three methods is called (poll(),
> wait() or communicate()).
>
> There's an error in my description though. The correct would be: "The
> Popen.returncode will only be set after a call to poll(), wait() or
> communicate(). If the Popen fails to launch a VM, the Popen.returncode
> will not turn from None to the actual return code by itself."

Suggest to add ", and is_running() continues to report True".

>>> Instead of using Popen.returncode, let's use Popen.poll(), which
>>> actually checks if child process has terminated.
>>>
>>> Signed-off-by: Amador Pahim <apahim@redhat.com>
>>> Reviewed-by: Eduardo Habkost <ehabkost@redhat.com>
>>> Reviewed-by: Fam Zheng <famz@redhat.com>
>>> ---
>>>  scripts/qemu.py | 2 +-
>>>  1 file changed, 1 insertion(+), 1 deletion(-)
>>>
>>> diff --git a/scripts/qemu.py b/scripts/qemu.py
>>> index 880e3e8219..f0fade32bd 100644
>>> --- a/scripts/qemu.py
>>> +++ b/scripts/qemu.py
>>> @@ -86,7 +86,7 @@ class QEMUMachine(object):
>>>              raise
>>>
>>>      def is_running(self):
>>> -        return self._popen and (self._popen.returncode is None)
>>> +        return self._popen and (self._popen.poll() is None)
>>>
>>>      def exitcode(self):
>>>          if self._popen is None:
>>                return None
>>            return self._popen.returncode
>>
>> Why is this one safe?
>
> Here it's used just to retrieve the value from the Popen.returncode.
> It's not being used to check whether the process is running or not.

If self._popen is not None, we return self._popen.returncode.  It's None
if .poll() etc. haven't been called.  Can this happen?  If not, why not?
If yes, why is returning None then okay?

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [Qemu-devel] [PATCH v3 1/3] qemu.py: fix is_running()
  2017-07-20 15:09       ` Markus Armbruster
@ 2017-07-20 15:46         ` Amador Pahim
  2017-07-20 17:49         ` Eduardo Habkost
  1 sibling, 0 replies; 17+ messages in thread
From: Amador Pahim @ 2017-07-20 15:46 UTC (permalink / raw)
  To: Markus Armbruster
  Cc: kwolf, Lukáš Doktor, Fam Zheng, ehabkost, qemu-devel,
	mreitz, Cleber Rosa

On Thu, Jul 20, 2017 at 5:09 PM, Markus Armbruster <armbru@redhat.com> wrote:
> Amador Pahim <apahim@redhat.com> writes:
>
>> On Thu, Jul 20, 2017 at 1:49 PM, Markus Armbruster <armbru@redhat.com> wrote:
>>> Amador Pahim <apahim@redhat.com> writes:
>>>
>>>> Current implementation is broken. It does not really test if the child
>>>> process is running.
>>>
>>> What usage exactly is broken by this?  Got a reproducer for me?
>>
>> Problem is that 'returncode' is not set without a calling
>> poll()/wait()/communicate(), so it's only useful to test if the
>> process is running after such calls. But if we use 'poll()' instead,
>> it will, according to the docs, "Check if child process has
>> terminated. Set and return returncode attribute."
>>
>> Reproducer is:
>>
>>  >>> import subprocess
>>  >>> devnull = open('/dev/null', 'rb')
>>  >>> p = subprocess.Popen(['qemu-system-x86_64', '-broken'],
>> stdin=devnull, stdout=devnull, stderr=devnull, shell=False)
>>  >>> print p.returncode
>>  None
>>  >>> print p.poll()
>>  1
>>  >>> print p.returncode
>>  1
>>
>>>> The Popen.returncode will only be set after by a poll(), wait() or
>>>> communicate(). If the Popen fails to launch a VM, the Popen.returncode
>>>> will not turn to None by itself.
>>>
>>> Hmm.  What is the value of .returncode then?
>>
>> returncode starts with None and becomes the process exit code when the
>> process is over and one of that three methods is called (poll(),
>> wait() or communicate()).
>>
>> There's an error in my description though. The correct would be: "The
>> Popen.returncode will only be set after a call to poll(), wait() or
>> communicate(). If the Popen fails to launch a VM, the Popen.returncode
>> will not turn from None to the actual return code by itself."
>
> Suggest to add ", and is_running() continues to report True".
>
>>>> Instead of using Popen.returncode, let's use Popen.poll(), which
>>>> actually checks if child process has terminated.
>>>>
>>>> Signed-off-by: Amador Pahim <apahim@redhat.com>
>>>> Reviewed-by: Eduardo Habkost <ehabkost@redhat.com>
>>>> Reviewed-by: Fam Zheng <famz@redhat.com>
>>>> ---
>>>>  scripts/qemu.py | 2 +-
>>>>  1 file changed, 1 insertion(+), 1 deletion(-)
>>>>
>>>> diff --git a/scripts/qemu.py b/scripts/qemu.py
>>>> index 880e3e8219..f0fade32bd 100644
>>>> --- a/scripts/qemu.py
>>>> +++ b/scripts/qemu.py
>>>> @@ -86,7 +86,7 @@ class QEMUMachine(object):
>>>>              raise
>>>>
>>>>      def is_running(self):
>>>> -        return self._popen and (self._popen.returncode is None)
>>>> +        return self._popen and (self._popen.poll() is None)
>>>>
>>>>      def exitcode(self):
>>>>          if self._popen is None:
>>>                return None
>>>            return self._popen.returncode
>>>
>>> Why is this one safe?
>>
>> Here it's used just to retrieve the value from the Popen.returncode.
>> It's not being used to check whether the process is running or not.
>
> If self._popen is not None, we return self._popen.returncode.  It's None
> if .poll() etc. haven't been called.  Can this happen?  If not, why not?
> If yes, why is returning None then okay?

Yes, that can happen. This method is not returning an up-to-date
returncode, it's serving just as a wrapper to the attribute, being the
attribute updated or not.
I lack the background on why it was coded that way, but considering
the API-user perspective, I agree with you. We should return
self._popen.poll() here indeed. Fixing that.

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [Qemu-devel] [PATCH v3 2/3] qemu.py: include debug information on launch error
  2017-07-20 15:01       ` Markus Armbruster
@ 2017-07-20 15:50         ` Amador Pahim
  0 siblings, 0 replies; 17+ messages in thread
From: Amador Pahim @ 2017-07-20 15:50 UTC (permalink / raw)
  To: Markus Armbruster
  Cc: kwolf, Lukáš Doktor, Fam Zheng, ehabkost, qemu-devel,
	mreitz, Cleber Rosa

On Thu, Jul 20, 2017 at 5:01 PM, Markus Armbruster <armbru@redhat.com> wrote:
> Amador Pahim <apahim@redhat.com> writes:
>
>> On Thu, Jul 20, 2017 at 1:58 PM, Markus Armbruster <armbru@redhat.com> wrote:
>>> Amador Pahim <apahim@redhat.com> writes:
>>>
>>>> When launching a VM, if an exception happens and the VM is not
>>>> initiated, it is useful to see the qemu command line that was executed
>>>> and the output of that command.
>>>>
>>>> Before the patch:
>>>>
>>>>     >>> VM = qemu.QEMUMachine('../aarch64-softmmu/qemu-system-aarch64')
>>>>     >>> VM.launch()
>>>>     Traceback (most recent call last):
>>>>       File "<stdin>", line 1, in <module>
>>>>       File "qemu.py", line 137, in launch
>>>>         self._post_launch()
>>>>       File "qemu.py", line 121, in _post_launch
>>>>         self._qmp.accept()
>>>>       File "qmp/qmp.py", line 145, in accept
>>>>         self.__sock, _ = self.__sock.accept()
>>>>       File "/usr/lib64/python2.7/socket.py", line 206, in accept
>>>>         sock, addr = self._sock.accept()
>>>>     socket.timeout: timed out
>>>>
>>>> After the patch:
>>>>
>>>>     >>> VM = qemu.QEMUMachine('../aarch64-softmmu/qemu-system-aarch64')
>>>>     >>> VM.launch()
>>>>     Traceback (most recent call last):
>>>>       File "<stdin>", line 1, in <module>
>>>>       File "qemu.py", line 156, in launch
>>>>         raise RuntimeError(msg)
>>>>     RuntimeError: Error launching VM.
>>>>     Original Exception:
>>>>     Traceback (most recent call last):
>>>>       File "qemu.py", line 138, in launch
>>>>         self._post_launch()
>>>>       File "qemu.py", line 122, in _post_launch
>>>>         self._qmp.accept()
>>>>       File "qmp/qmp.py", line 145, in accept
>>>>         self.__sock, _ = self.__sock.accept()
>>>>       File "/usr/lib64/python2.7/socket.py", line 206, in accept
>>>>         sock, addr = self._sock.accept()
>>>>     timeout: timed out
>>>>     Command:
>>>>     /usr/bin/qemu-system-aarch64 -chardev socket,id=mon,
>>>>     path=/var/tmp/qemu-23958-monitor.sock -mon chardev=mon,mode=control
>>>>     -display none -vga none
>>>>     Output:
>>>>     qemu-system-aarch64: No machine specified, and there is no default
>>>>     Use -machine help to list supported machines
>>>>
>>>> Also, if the launch() faces an exception, the 'except' now will use args
>>>> to fill the debug information. So this patch assigns 'args' earlier,
>>>> assuring it will be available for the 'except'.
>>>>
>>>> Signed-off-by: Amador Pahim <apahim@redhat.com>
>>>> Reviewed-by: Fam Zheng <famz@redhat.com>
>>>> ---
>>>>  scripts/qemu.py | 18 ++++++++++++++++--
>>>>  1 file changed, 16 insertions(+), 2 deletions(-)
>>>>
>>>> diff --git a/scripts/qemu.py b/scripts/qemu.py
>>>> index f0fade32bd..2707ae7f75 100644
>>>> --- a/scripts/qemu.py
>>>> +++ b/scripts/qemu.py
>>>> @@ -18,6 +18,7 @@ import os
>>>>  import sys
>>>>  import subprocess
>>>>  import qmp.qmp
>>>> +import traceback
>>>>
>>>>
>>>>  class QEMUMachine(object):
>>>> @@ -129,17 +130,30 @@ class QEMUMachine(object):
>>>>          '''Launch the VM and establish a QMP connection'''
>>>>          devnull = open('/dev/null', 'rb')
>>>>          qemulog = open(self._qemu_log_path, 'wb')
>>>> +        args = self._wrapper + [self._binary] + self._base_args() + self.args
>>>>          try:
>>>>              self._pre_launch()
>>>> -            args = self._wrapper + [self._binary] + self._base_args() + self._args
>>>>              self._popen = subprocess.Popen(args, stdin=devnull, stdout=qemulog,
>>>>                                             stderr=subprocess.STDOUT, shell=False)
>>>>              self._post_launch()
>>>
>>> Lifting the assignment out of the try makes sense, because exceptions
>>> there are programming errors.
>>>
>>> Why are ._pre_launch() and ._post_launch() inside the try?
>>
>> The _pre_launch()/_post_launch() are probably the main source of
>> exceptions here and I assume we want to handle the exceptions coming
>> from them.
>>
>> The call to subprocess.Popen() is unlikely to raise exceptions. I
>> mean, it's easy to trigger an 'OSError' exception providing a
>> non-existent command to Popen, but once we provide something
>> executable, we will not see an exception here if the command fails.
>> The CalledProcessError is only raised by subprocess.check_call() or
>> subprocess.check_output().
>
> Okay.
>
> Catching all exceptions is too sloppy for my taste.  But it's not your
> patch's fault.
>
>>>>          except:
>>>
>>> Why do we catch all exceptions, and not just
>>> subprocess.CalledProcessError?
>>>
>>>> +            self._load_io_log()
>>>>              if self.is_running():
>>>
>>> I suspect we can't get here only for "odd" exceptions, not for
>>> subprocess.CalledProcessError.
>>>
>>>>                  self._popen.kill()
>>>>                  self._popen.wait()
>>>> -            self._load_io_log()
>>>> +            else:
>>>> +                exc_type, exc_value, exc_traceback = sys.exc_info()
>>>> +                msg = ('Error launching VM.\n'
>>>> +                       'Original Exception: \n%s'
>>>> +                       'Command:\n%s\n'
>>>> +                       'Output:\n%s\n' %
>>>> +                       (''.join(traceback.format_exception(exc_type,
>>>> +                                                           exc_value,
>>>> +                                                           exc_traceback)),
>>>> +                        ' '.join(args),
>>>> +                        self._iolog))
>>>> +                self._post_shutdown()
>>>> +                raise RuntimeError(msg)
>>>>              self._post_shutdown()
>>>>              raise
>>>
>>> Why do we print a traceback when subprocess.CalledProcessError?
>>
>> subprocess.CalledProcessError will not happen with subprocess.Popen,
>> even if the command fails. But if the virtual machine does not start,
>> we will, for example, see a 'socket.timeout' exception coming from
>> _post_launch() when it tries to connect to the qemu monitor.
>
> A traceback is a inexcusably lazy way to report such errors.  You're
> putting lipstick on a pig.
>
> When putting on lipstick is all we can do, it can be better than
> nothing.  Is it all we can do?

Ok, so let me take this opportunity to cleanup this code at once.

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [Qemu-devel] [PATCH v3 2/3] qemu.py: include debug information on launch error
  2017-07-20 14:43       ` Eduardo Habkost
@ 2017-07-20 15:51         ` Amador Pahim
  0 siblings, 0 replies; 17+ messages in thread
From: Amador Pahim @ 2017-07-20 15:51 UTC (permalink / raw)
  To: Eduardo Habkost
  Cc: Markus Armbruster, qemu-devel, kwolf, Lukáš Doktor,
	mreitz, Cleber Rosa, Fam Zheng

On Thu, Jul 20, 2017 at 4:43 PM, Eduardo Habkost <ehabkost@redhat.com> wrote:
> On Thu, Jul 20, 2017 at 03:14:28PM +0200, Amador Pahim wrote:
>> On Thu, Jul 20, 2017 at 1:58 PM, Markus Armbruster <armbru@redhat.com> wrote:
>> > Amador Pahim <apahim@redhat.com> writes:
>> >
>> >> When launching a VM, if an exception happens and the VM is not
>> >> initiated, it is useful to see the qemu command line that was executed
>> >> and the output of that command.
>> >>
>> >> Before the patch:
>> >>
>> >>     >>> VM = qemu.QEMUMachine('../aarch64-softmmu/qemu-system-aarch64')
>> >>     >>> VM.launch()
>> >>     Traceback (most recent call last):
>> >>       File "<stdin>", line 1, in <module>
>> >>       File "qemu.py", line 137, in launch
>> >>         self._post_launch()
>> >>       File "qemu.py", line 121, in _post_launch
>> >>         self._qmp.accept()
>> >>       File "qmp/qmp.py", line 145, in accept
>> >>         self.__sock, _ = self.__sock.accept()
>> >>       File "/usr/lib64/python2.7/socket.py", line 206, in accept
>> >>         sock, addr = self._sock.accept()
>> >>     socket.timeout: timed out
>> >>
>> >> After the patch:
>> >>
>> >>     >>> VM = qemu.QEMUMachine('../aarch64-softmmu/qemu-system-aarch64')
>> >>     >>> VM.launch()
>> >>     Traceback (most recent call last):
>> >>       File "<stdin>", line 1, in <module>
>> >>       File "qemu.py", line 156, in launch
>> >>         raise RuntimeError(msg)
>> >>     RuntimeError: Error launching VM.
>> >>     Original Exception:
>> >>     Traceback (most recent call last):
>> >>       File "qemu.py", line 138, in launch
>> >>         self._post_launch()
>> >>       File "qemu.py", line 122, in _post_launch
>> >>         self._qmp.accept()
>> >>       File "qmp/qmp.py", line 145, in accept
>> >>         self.__sock, _ = self.__sock.accept()
>> >>       File "/usr/lib64/python2.7/socket.py", line 206, in accept
>> >>         sock, addr = self._sock.accept()
>> >>     timeout: timed out
>> >>     Command:
>> >>     /usr/bin/qemu-system-aarch64 -chardev socket,id=mon,
>> >>     path=/var/tmp/qemu-23958-monitor.sock -mon chardev=mon,mode=control
>> >>     -display none -vga none
>> >>     Output:
>> >>     qemu-system-aarch64: No machine specified, and there is no default
>> >>     Use -machine help to list supported machines
>> >>
>> >> Also, if the launch() faces an exception, the 'except' now will use args
>> >> to fill the debug information. So this patch assigns 'args' earlier,
>> >> assuring it will be available for the 'except'.
>> >>
>> >> Signed-off-by: Amador Pahim <apahim@redhat.com>
>> >> Reviewed-by: Fam Zheng <famz@redhat.com>
>> >> ---
>> >>  scripts/qemu.py | 18 ++++++++++++++++--
>> >>  1 file changed, 16 insertions(+), 2 deletions(-)
>> >>
>> >> diff --git a/scripts/qemu.py b/scripts/qemu.py
>> >> index f0fade32bd..2707ae7f75 100644
>> >> --- a/scripts/qemu.py
>> >> +++ b/scripts/qemu.py
>> >> @@ -18,6 +18,7 @@ import os
>> >>  import sys
>> >>  import subprocess
>> >>  import qmp.qmp
>> >> +import traceback
>> >>
>> >>
>> >>  class QEMUMachine(object):
>> >> @@ -129,17 +130,30 @@ class QEMUMachine(object):
>> >>          '''Launch the VM and establish a QMP connection'''
>> >>          devnull = open('/dev/null', 'rb')
>> >>          qemulog = open(self._qemu_log_path, 'wb')
>> >> +        args = self._wrapper + [self._binary] + self._base_args() + self.args
>> >>          try:
>> >>              self._pre_launch()
>> >> -            args = self._wrapper + [self._binary] + self._base_args() + self._args
>> >>              self._popen = subprocess.Popen(args, stdin=devnull, stdout=qemulog,
>> >>                                             stderr=subprocess.STDOUT, shell=False)
>> >>              self._post_launch()
>> >
>> > Lifting the assignment out of the try makes sense, because exceptions
>> > there are programming errors.
>> >
>> > Why are ._pre_launch() and ._post_launch() inside the try?
>>
>> The _pre_launch()/_post_launch() are probably the main source of
>> exceptions here and I assume we want to handle the exceptions coming
>> from them.
>>
>> The call to subprocess.Popen() is unlikely to raise exceptions. I
>> mean, it's easy to trigger an 'OSError' exception providing a
>> non-existent command to Popen, but once we provide something
>> executable, we will not see an exception here if the command fails.
>> The CalledProcessError is only raised by subprocess.check_call() or
>> subprocess.check_output().
>>
>>
>
> That's correct.  QEMU exiting won't cause an exception, it
> doesn't matter the error code.  Failing to connect to QMP or
> qtest sockets on _post_launch() will.
>
> However, another purpose of the try/except block is to ensure we
> clean up the process and log file in case _any_ exception happens
> (programming error or not).  That's why everything after
> open(self._qemu_log_path) is inside the try block.  We should be
> careful to not break that.
>
> I think this code really needs to be cleaned up.  The existing
> try/except block has a huge number of responsibilities:
> 1) Ensuring we kill the process in case _post_launch() raises an
>    exception.
> 2) Ensuring we load the log file even if Popen() or _post_launch()
>    raise an exception.
> 3) Ensuring we remove the monitor socket if anything after
>    _pre_launch() raises an exception (through _post_shutdown())
> 4) Ensuring the log file is deleted if anything after
>    open(_qemu_log_path) raises an exception (through
>    _post_shutdown())
> 5) Ensuring _post_shutdown() is called if anything after
>    _pre_launch() raises an exception (e.g. QEMUQTestMachine
>    deletes the socket created by _pre_launch() inside
>    _post_shutdown()
> 6) (New, added by this patch) Generating a more useful exception
>    message if something (what exactly?) fails.
>
> Some of the cases above are just programming errors, but it's
> still nice to do proper cleanup in case of programming errors.
> e.g.: if _base_args() raise an exception, we can ensure we delete
> the log file anyway (this patch breaks that).
>
> If we don't clean this up, future cleanups (e.g. moving
> _pre_launch() outside the try block) would easily break one or
> more responsibilities above, and this would be very easy to miss
> on patch review.

Agreed. Working on that.

>
>
>> >
>> >>          except:
>> >
>> > Why do we catch all exceptions, and not just
>> > subprocess.CalledProcessError?
>
> See above.
>
>> >
>> >> +            self._load_io_log()
>> >>              if self.is_running():
>> >
>> > I suspect we can't get here only for "odd" exceptions, not for
>> > subprocess.CalledProcessError.
>
> It depends on what you mean by "odd".
>
>> >
>> >>                  self._popen.kill()
>> >>                  self._popen.wait()
>> >> -            self._load_io_log()
>> >> +            else:
>> >> +                exc_type, exc_value, exc_traceback = sys.exc_info()
>> >> +                msg = ('Error launching VM.\n'
>> >> +                       'Original Exception: \n%s'
>> >> +                       'Command:\n%s\n'
>> >> +                       'Output:\n%s\n' %
>> >> +                       (''.join(traceback.format_exception(exc_type,
>> >> +                                                           exc_value,
>> >> +                                                           exc_traceback)),
>> >> +                        ' '.join(args),
>> >> +                        self._iolog))
>> >> +                self._post_shutdown()
>> >> +                raise RuntimeError(msg)
>> >>              self._post_shutdown()
>> >>              raise
>> >
>> > Why do we print a traceback when subprocess.CalledProcessError?
>>
>> subprocess.CalledProcessError will not happen with subprocess.Popen,
>> even if the command fails. But if the virtual machine does not start,
>> we will, for example, see a 'socket.timeout' exception coming from
>> _post_launch() when it tries to connect to the qemu monitor.
>
> Exactly.
>
> --
> Eduardo

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [Qemu-devel] [PATCH v3 1/3] qemu.py: fix is_running()
  2017-07-20 15:09       ` Markus Armbruster
  2017-07-20 15:46         ` Amador Pahim
@ 2017-07-20 17:49         ` Eduardo Habkost
  2017-07-20 20:14           ` Amador Pahim
  1 sibling, 1 reply; 17+ messages in thread
From: Eduardo Habkost @ 2017-07-20 17:49 UTC (permalink / raw)
  To: Markus Armbruster
  Cc: Amador Pahim, kwolf, Lukáš Doktor, Fam Zheng,
	qemu-devel, mreitz, Cleber Rosa

On Thu, Jul 20, 2017 at 05:09:11PM +0200, Markus Armbruster wrote:
> Amador Pahim <apahim@redhat.com> writes:
> 
> > On Thu, Jul 20, 2017 at 1:49 PM, Markus Armbruster <armbru@redhat.com> wrote:
> >> Amador Pahim <apahim@redhat.com> writes:
> >>
> >>> Current implementation is broken. It does not really test if the child
> >>> process is running.
> >>
> >> What usage exactly is broken by this?  Got a reproducer for me?
> >
> > Problem is that 'returncode' is not set without a calling
> > poll()/wait()/communicate(), so it's only useful to test if the
> > process is running after such calls. But if we use 'poll()' instead,
> > it will, according to the docs, "Check if child process has
> > terminated. Set and return returncode attribute."
> >
> > Reproducer is:
> >
> >  >>> import subprocess
> >  >>> devnull = open('/dev/null', 'rb')
> >  >>> p = subprocess.Popen(['qemu-system-x86_64', '-broken'],
> > stdin=devnull, stdout=devnull, stderr=devnull, shell=False)
> >  >>> print p.returncode
> >  None
> >  >>> print p.poll()
> >  1
> >  >>> print p.returncode
> >  1
> >
> >>> The Popen.returncode will only be set after by a poll(), wait() or
> >>> communicate(). If the Popen fails to launch a VM, the Popen.returncode
> >>> will not turn to None by itself.
> >>
> >> Hmm.  What is the value of .returncode then?
> >
> > returncode starts with None and becomes the process exit code when the
> > process is over and one of that three methods is called (poll(),
> > wait() or communicate()).
> >
> > There's an error in my description though. The correct would be: "The
> > Popen.returncode will only be set after a call to poll(), wait() or
> > communicate(). If the Popen fails to launch a VM, the Popen.returncode
> > will not turn from None to the actual return code by itself."
> 
> Suggest to add ", and is_running() continues to report True".
> 
> >>> Instead of using Popen.returncode, let's use Popen.poll(), which
> >>> actually checks if child process has terminated.
> >>>
> >>> Signed-off-by: Amador Pahim <apahim@redhat.com>
> >>> Reviewed-by: Eduardo Habkost <ehabkost@redhat.com>
> >>> Reviewed-by: Fam Zheng <famz@redhat.com>
> >>> ---
> >>>  scripts/qemu.py | 2 +-
> >>>  1 file changed, 1 insertion(+), 1 deletion(-)
> >>>
> >>> diff --git a/scripts/qemu.py b/scripts/qemu.py
> >>> index 880e3e8219..f0fade32bd 100644
> >>> --- a/scripts/qemu.py
> >>> +++ b/scripts/qemu.py
> >>> @@ -86,7 +86,7 @@ class QEMUMachine(object):
> >>>              raise
> >>>
> >>>      def is_running(self):
> >>> -        return self._popen and (self._popen.returncode is None)
> >>> +        return self._popen and (self._popen.poll() is None)
> >>>

After re-reading shutdown(), I think this is _not_ OK: if
is_running() return False before we call .wait(), we will never
load the log file or run _post_shutdown() if QEMU exits between
the launch() and shutdown() calls.

Yes, it's fragile.

The root problem on both launch() and shutdown() seems to be
coupling the external "is QEMU running?" state with the internal
"did we load the log file and ran _post_shutdown() already?"
state.

I see two possible approaches for this:

1) Benefit from the fact that the internal Popen state will not
   change under our feet unless we explicitly call
   poll()/wait()/etc, and keep the existing code.  (Not my
   favorite option)

2) Rewrite the code so that we don't depend on the subtle Popen
   internal state rules, and track our own internal state in
   a QEMUMachine attribute.  e.g.:

    def _handle_shutdown(self):
        '''Load log file and call _post_shutdown() hook if necessary'''
        # Must be called only after QEMU actually exited.
        assert not self.is_running()
        if self._shutdown_pending:
            if self.exitcode() < 0:
                sys.stderr.write('qemu received signal %i: %s\n' % (-exitcode, ' '.join(self._args)))
            self._load_io_log()
            self._post_shutdown()
            self._shutdown_pending = False

    def _terminate(self):
        '''Terminate QEMU if it's still running'''
        if self.is_running():
            try:
                self._qmp.cmd('quit')
                self._qmp.close()
            except:
                self._popen.kill()
                self._popen.wait()

    def _launch(self):
        '''Launch the VM and establish a QMP connection'''
        devnull = open('/dev/null', 'rb')
        qemulog = open(self._qemu_log_path, 'wb')
        self._shutdown_pending = True
        self._pre_launch()
        args = self._wrapper + [self._binary] + self._base_args() + self._args
        self._popen = subprocess.Popen(args, stdin=devnull, stdout=qemulog,
                                       stderr=subprocess.STDOUT, shell=False)
        self._post_launch()

    def launch(self):
        try:
            self._launch()
        except:
            self._terminate()
            self._handle_shutdown()
            raise

    def shutdown(self):
        '''Terminate the VM and clean up'''
        self._terminate()
        self._handle_shutdown()


Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>

> >>>      def exitcode(self):
> >>>          if self._popen is None:
> >>                return None
> >>            return self._popen.returncode
> >>
> >> Why is this one safe?
> >
> > Here it's used just to retrieve the value from the Popen.returncode.
> > It's not being used to check whether the process is running or not.
> 
> If self._popen is not None, we return self._popen.returncode.  It's None
> if .poll() etc. haven't been called.  Can this happen?  If not, why not?
> If yes, why is returning None then okay?

It can't happen because the only caller of exitcode()
(device-crash-test) calls it immediately after shutdown().  But
it would be nice to make exitcode() behavior consistent with
is_running().

-- 
Eduardo

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [Qemu-devel] [PATCH v3 1/3] qemu.py: fix is_running()
  2017-07-20 17:49         ` Eduardo Habkost
@ 2017-07-20 20:14           ` Amador Pahim
  2017-07-21  7:34             ` Lukáš Doktor
  0 siblings, 1 reply; 17+ messages in thread
From: Amador Pahim @ 2017-07-20 20:14 UTC (permalink / raw)
  To: Eduardo Habkost
  Cc: Markus Armbruster, kwolf, Lukáš Doktor, Fam Zheng,
	qemu-devel, mreitz, Cleber Rosa

On Thu, Jul 20, 2017 at 7:49 PM, Eduardo Habkost <ehabkost@redhat.com> wrote:
> On Thu, Jul 20, 2017 at 05:09:11PM +0200, Markus Armbruster wrote:
>> Amador Pahim <apahim@redhat.com> writes:
>>
>> > On Thu, Jul 20, 2017 at 1:49 PM, Markus Armbruster <armbru@redhat.com> wrote:
>> >> Amador Pahim <apahim@redhat.com> writes:
>> >>
>> >>> Current implementation is broken. It does not really test if the child
>> >>> process is running.
>> >>
>> >> What usage exactly is broken by this?  Got a reproducer for me?
>> >
>> > Problem is that 'returncode' is not set without a calling
>> > poll()/wait()/communicate(), so it's only useful to test if the
>> > process is running after such calls. But if we use 'poll()' instead,
>> > it will, according to the docs, "Check if child process has
>> > terminated. Set and return returncode attribute."
>> >
>> > Reproducer is:
>> >
>> >  >>> import subprocess
>> >  >>> devnull = open('/dev/null', 'rb')
>> >  >>> p = subprocess.Popen(['qemu-system-x86_64', '-broken'],
>> > stdin=devnull, stdout=devnull, stderr=devnull, shell=False)
>> >  >>> print p.returncode
>> >  None
>> >  >>> print p.poll()
>> >  1
>> >  >>> print p.returncode
>> >  1
>> >
>> >>> The Popen.returncode will only be set after by a poll(), wait() or
>> >>> communicate(). If the Popen fails to launch a VM, the Popen.returncode
>> >>> will not turn to None by itself.
>> >>
>> >> Hmm.  What is the value of .returncode then?
>> >
>> > returncode starts with None and becomes the process exit code when the
>> > process is over and one of that three methods is called (poll(),
>> > wait() or communicate()).
>> >
>> > There's an error in my description though. The correct would be: "The
>> > Popen.returncode will only be set after a call to poll(), wait() or
>> > communicate(). If the Popen fails to launch a VM, the Popen.returncode
>> > will not turn from None to the actual return code by itself."
>>
>> Suggest to add ", and is_running() continues to report True".
>>
>> >>> Instead of using Popen.returncode, let's use Popen.poll(), which
>> >>> actually checks if child process has terminated.
>> >>>
>> >>> Signed-off-by: Amador Pahim <apahim@redhat.com>
>> >>> Reviewed-by: Eduardo Habkost <ehabkost@redhat.com>
>> >>> Reviewed-by: Fam Zheng <famz@redhat.com>
>> >>> ---
>> >>>  scripts/qemu.py | 2 +-
>> >>>  1 file changed, 1 insertion(+), 1 deletion(-)
>> >>>
>> >>> diff --git a/scripts/qemu.py b/scripts/qemu.py
>> >>> index 880e3e8219..f0fade32bd 100644
>> >>> --- a/scripts/qemu.py
>> >>> +++ b/scripts/qemu.py
>> >>> @@ -86,7 +86,7 @@ class QEMUMachine(object):
>> >>>              raise
>> >>>
>> >>>      def is_running(self):
>> >>> -        return self._popen and (self._popen.returncode is None)
>> >>> +        return self._popen and (self._popen.poll() is None)
>> >>>
>
> After re-reading shutdown(), I think this is _not_ OK: if
> is_running() return False before we call .wait(), we will never
> load the log file or run _post_shutdown() if QEMU exits between
> the launch() and shutdown() calls.

Yes, I just noticed that while cleaning up the code.

>
> Yes, it's fragile.
>
> The root problem on both launch() and shutdown() seems to be
> coupling the external "is QEMU running?" state with the internal
> "did we load the log file and ran _post_shutdown() already?"
> state.
>
> I see two possible approaches for this:
>
> 1) Benefit from the fact that the internal Popen state will not
>    change under our feet unless we explicitly call
>    poll()/wait()/etc, and keep the existing code.  (Not my
>    favorite option)
>
> 2) Rewrite the code so that we don't depend on the subtle Popen
>    internal state rules, and track our own internal state in
>    a QEMUMachine attribute.  e.g.:

+1 for this approach. I'm working on something similar, thanks for the
detailed "e.g." code here.

>
>     def _handle_shutdown(self):
>         '''Load log file and call _post_shutdown() hook if necessary'''
>         # Must be called only after QEMU actually exited.
>         assert not self.is_running()
>         if self._shutdown_pending:
>             if self.exitcode() < 0:
>                 sys.stderr.write('qemu received signal %i: %s\n' % (-exitcode, ' '.join(self._args)))
>             self._load_io_log()
>             self._post_shutdown()
>             self._shutdown_pending = False
>
>     def _terminate(self):
>         '''Terminate QEMU if it's still running'''
>         if self.is_running():
>             try:
>                 self._qmp.cmd('quit')
>                 self._qmp.close()
>             except:
>                 self._popen.kill()
>                 self._popen.wait()
>
>     def _launch(self):
>         '''Launch the VM and establish a QMP connection'''
>         devnull = open('/dev/null', 'rb')
>         qemulog = open(self._qemu_log_path, 'wb')
>         self._shutdown_pending = True
>         self._pre_launch()
>         args = self._wrapper + [self._binary] + self._base_args() + self._args
>         self._popen = subprocess.Popen(args, stdin=devnull, stdout=qemulog,
>                                        stderr=subprocess.STDOUT, shell=False)
>         self._post_launch()
>
>     def launch(self):
>         try:
>             self._launch()
>         except:
>             self._terminate()
>             self._handle_shutdown()
>             raise
>
>     def shutdown(self):
>         '''Terminate the VM and clean up'''
>         self._terminate()
>         self._handle_shutdown()
>
>
> Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
>
>> >>>      def exitcode(self):
>> >>>          if self._popen is None:
>> >>                return None
>> >>            return self._popen.returncode
>> >>
>> >> Why is this one safe?
>> >
>> > Here it's used just to retrieve the value from the Popen.returncode.
>> > It's not being used to check whether the process is running or not.
>>
>> If self._popen is not None, we return self._popen.returncode.  It's None
>> if .poll() etc. haven't been called.  Can this happen?  If not, why not?
>> If yes, why is returning None then okay?
>
> It can't happen because the only caller of exitcode()
> (device-crash-test) calls it immediately after shutdown().  But
> it would be nice to make exitcode() behavior consistent with
> is_running().
>
> --
> Eduardo

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [Qemu-devel] [PATCH v3 1/3] qemu.py: fix is_running()
  2017-07-20 20:14           ` Amador Pahim
@ 2017-07-21  7:34             ` Lukáš Doktor
  0 siblings, 0 replies; 17+ messages in thread
From: Lukáš Doktor @ 2017-07-21  7:34 UTC (permalink / raw)
  To: Amador Pahim, Eduardo Habkost
  Cc: Markus Armbruster, kwolf, Fam Zheng, qemu-devel, mreitz, Cleber Rosa

[-- Attachment #1: Type: text/plain, Size: 7800 bytes --]

Dne 20.7.2017 v 22:14 Amador Pahim napsal(a):
> On Thu, Jul 20, 2017 at 7:49 PM, Eduardo Habkost <ehabkost@redhat.com> wrote:
>> On Thu, Jul 20, 2017 at 05:09:11PM +0200, Markus Armbruster wrote:
>>> Amador Pahim <apahim@redhat.com> writes:
>>>
>>>> On Thu, Jul 20, 2017 at 1:49 PM, Markus Armbruster <armbru@redhat.com> wrote:
>>>>> Amador Pahim <apahim@redhat.com> writes:
>>>>>
>>>>>> Current implementation is broken. It does not really test if the child
>>>>>> process is running.
>>>>>
>>>>> What usage exactly is broken by this?  Got a reproducer for me?
>>>>
>>>> Problem is that 'returncode' is not set without a calling
>>>> poll()/wait()/communicate(), so it's only useful to test if the
>>>> process is running after such calls. But if we use 'poll()' instead,
>>>> it will, according to the docs, "Check if child process has
>>>> terminated. Set and return returncode attribute."
>>>>
>>>> Reproducer is:
>>>>
>>>>  >>> import subprocess
>>>>  >>> devnull = open('/dev/null', 'rb')
>>>>  >>> p = subprocess.Popen(['qemu-system-x86_64', '-broken'],
>>>> stdin=devnull, stdout=devnull, stderr=devnull, shell=False)
>>>>  >>> print p.returncode
>>>>  None
>>>>  >>> print p.poll()
>>>>  1
>>>>  >>> print p.returncode
>>>>  1
>>>>
>>>>>> The Popen.returncode will only be set after by a poll(), wait() or
>>>>>> communicate(). If the Popen fails to launch a VM, the Popen.returncode
>>>>>> will not turn to None by itself.
>>>>>
>>>>> Hmm.  What is the value of .returncode then?
>>>>
>>>> returncode starts with None and becomes the process exit code when the
>>>> process is over and one of that three methods is called (poll(),
>>>> wait() or communicate()).
>>>>
>>>> There's an error in my description though. The correct would be: "The
>>>> Popen.returncode will only be set after a call to poll(), wait() or
>>>> communicate(). If the Popen fails to launch a VM, the Popen.returncode
>>>> will not turn from None to the actual return code by itself."
>>>
>>> Suggest to add ", and is_running() continues to report True".
>>>
>>>>>> Instead of using Popen.returncode, let's use Popen.poll(), which
>>>>>> actually checks if child process has terminated.
>>>>>>
>>>>>> Signed-off-by: Amador Pahim <apahim@redhat.com>
>>>>>> Reviewed-by: Eduardo Habkost <ehabkost@redhat.com>
>>>>>> Reviewed-by: Fam Zheng <famz@redhat.com>
>>>>>> ---
>>>>>>  scripts/qemu.py | 2 +-
>>>>>>  1 file changed, 1 insertion(+), 1 deletion(-)
>>>>>>
>>>>>> diff --git a/scripts/qemu.py b/scripts/qemu.py
>>>>>> index 880e3e8219..f0fade32bd 100644
>>>>>> --- a/scripts/qemu.py
>>>>>> +++ b/scripts/qemu.py
>>>>>> @@ -86,7 +86,7 @@ class QEMUMachine(object):
>>>>>>              raise
>>>>>>
>>>>>>      def is_running(self):
>>>>>> -        return self._popen and (self._popen.returncode is None)
>>>>>> +        return self._popen and (self._popen.poll() is None)
>>>>>>
>>
>> After re-reading shutdown(), I think this is _not_ OK: if
>> is_running() return False before we call .wait(), we will never
>> load the log file or run _post_shutdown() if QEMU exits between
>> the launch() and shutdown() calls.
> 
> Yes, I just noticed that while cleaning up the code.
> 
>>
>> Yes, it's fragile.
>>
>> The root problem on both launch() and shutdown() seems to be
>> coupling the external "is QEMU running?" state with the internal
>> "did we load the log file and ran _post_shutdown() already?"
>> state.
>>
>> I see two possible approaches for this:
>>
>> 1) Benefit from the fact that the internal Popen state will not
>>    change under our feet unless we explicitly call
>>    poll()/wait()/etc, and keep the existing code.  (Not my
>>    favorite option)
>>
>> 2) Rewrite the code so that we don't depend on the subtle Popen
>>    internal state rules, and track our own internal state in
>>    a QEMUMachine attribute.  e.g.:
> 
> +1 for this approach. I'm working on something similar, thanks for the
> detailed "e.g." code here.
> 
>>
>>     def _handle_shutdown(self):
>>         '''Load log file and call _post_shutdown() hook if necessary'''
>>         # Must be called only after QEMU actually exited.
>>         assert not self.is_running()
>>         if self._shutdown_pending:
>>             if self.exitcode() < 0:
>>                 sys.stderr.write('qemu received signal %i: %s\n' % (-exitcode, ' '.join(self._args)))
>>             self._load_io_log()
>>             self._post_shutdown()
>>             self._shutdown_pending = False
>>
>>     def _terminate(self):
>>         '''Terminate QEMU if it's still running'''
>>         if self.is_running():
>>             try:
>>                 self._qmp.cmd('quit')
>>                 self._qmp.close()
>>             except:
>>                 self._popen.kill()
>>                 self._popen.wait()
>>
>>     def _launch(self):
>>         '''Launch the VM and establish a QMP connection'''
>>         devnull = open('/dev/null', 'rb')
>>         qemulog = open(self._qemu_log_path, 'wb')
>>         self._shutdown_pending = True
>>         self._pre_launch()
>>         args = self._wrapper + [self._binary] + self._base_args() + self._args
>>         self._popen = subprocess.Popen(args, stdin=devnull, stdout=qemulog,
>>                                        stderr=subprocess.STDOUT, shell=False)
>>         self._post_launch()
>>
>>     def launch(self):
>>         try:
>>             self._launch()
>>         except:
>>             self._terminate()
>>             self._handle_shutdown()
>>             raise
>>
>>     def shutdown(self):
>>         '''Terminate the VM and clean up'''
>>         self._terminate()
>>         self._handle_shutdown()
>>
This part also caught my attention and I also meant to improve it when this series is merged. Anyway let's state my suggestions here, take it or let it go:

1. `get_log` should check whether `self._iolog` is `None` and then it should check for process status
2. the `self._iolog` is `None` is the indication whether `shutdown` was called or not (not whether the process exists or not)
3. add `__del__` to cleanup in case one forgets to call `shutdown` (currently the files and processes are left behind)
4. use `name = "qemu-%d-%d" % (os.getpid(), id(self))` to allow multiple instances with default name at the same time.


Also I just realized that even with just this patch (as is) files/processes can be left behind:

    >>> import qemu, os
    >>> a=qemu.QEMUMachine("/usr/bin/qemu-kvm", debug=True)
    >>> a.launch()
    QMP:>>> {'execute': 'qmp_capabilities'}
    QMP:<<< {u'return': {}}
    >>> a.is_running()
    False
    >>> a.shutdown()
    >>> os.path.exists(a._qemu_log_path)
    True

Before this patch it worked well as (as Eduardo mentioned) the `is_running` was tracing internal state, not the process state.

Regards,
Lukáš

>>
>> Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
>>
>>>>>>      def exitcode(self):
>>>>>>          if self._popen is None:
>>>>>                return None
>>>>>            return self._popen.returncode
>>>>>
>>>>> Why is this one safe?
>>>>
>>>> Here it's used just to retrieve the value from the Popen.returncode.
>>>> It's not being used to check whether the process is running or not.
>>>
>>> If self._popen is not None, we return self._popen.returncode.  It's None
>>> if .poll() etc. haven't been called.  Can this happen?  If not, why not?
>>> If yes, why is returning None then okay?
>>
>> It can't happen because the only caller of exitcode()
>> (device-crash-test) calls it immediately after shutdown().  But
>> it would be nice to make exitcode() behavior consistent with
>> is_running().
>>
>> --
>> Eduardo



[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 502 bytes --]

^ permalink raw reply	[flat|nested] 17+ messages in thread

end of thread, other threads:[~2017-07-21  7:34 UTC | newest]

Thread overview: 17+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2017-07-20  9:18 [Qemu-devel] [PATCH v3 0/3] scripts/qemu.py small fixes Amador Pahim
2017-07-20  9:19 ` [Qemu-devel] [PATCH v3 1/3] qemu.py: fix is_running() Amador Pahim
2017-07-20 11:49   ` Markus Armbruster
2017-07-20 12:57     ` Amador Pahim
2017-07-20 15:09       ` Markus Armbruster
2017-07-20 15:46         ` Amador Pahim
2017-07-20 17:49         ` Eduardo Habkost
2017-07-20 20:14           ` Amador Pahim
2017-07-21  7:34             ` Lukáš Doktor
2017-07-20  9:19 ` [Qemu-devel] [PATCH v3 2/3] qemu.py: include debug information on launch error Amador Pahim
2017-07-20 11:58   ` Markus Armbruster
2017-07-20 13:14     ` Amador Pahim
2017-07-20 14:43       ` Eduardo Habkost
2017-07-20 15:51         ` Amador Pahim
2017-07-20 15:01       ` Markus Armbruster
2017-07-20 15:50         ` Amador Pahim
2017-07-20  9:19 ` [Qemu-devel] [PATCH v3 3/3] qemu.py: make 'args' public Amador Pahim

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.