All of lore.kernel.org
 help / color / mirror / Atom feed
From: Eduardo Habkost <ehabkost@redhat.com>
To: "Philippe Mathieu-Daudé" <philmd@redhat.com>
Cc: kwolf@redhat.com, Markus Armbruster <armbru@redhat.com>,
	John Snow <jsnow@redhat.com>,
	qemu-devel@nongnu.org, Cleber Rosa <crosa@redhat.com>
Subject: Re: [PATCH v3 0/3] python/machine.py: refactor shutdown
Date: Mon, 15 Jun 2020 18:21:52 -0400	[thread overview]
Message-ID: <20200615222152.GT2366737@habkost.net> (raw)
In-Reply-To: <5d20012b-923c-0bf4-232d-272977087fe3@redhat.com>

On Mon, Jun 15, 2020 at 05:21:18PM +0200, Philippe Mathieu-Daudé wrote:
> On 6/9/20 11:55 PM, John Snow wrote:
> > 
> > 
> > On 6/9/20 9:08 AM, Philippe Mathieu-Daudé wrote:
> >> Hi John,
> >>
> >> On 6/4/20 9:52 PM, John Snow wrote:
> >>> v3:
> >>>  - Split _post_shutdown refactor into own patch (now 1/3)
> >>>  - Re-add sigkill warning squelch (now 3/3)
> >>>
> >>> NOTE: I re-added the squelch in its own patch for review purposes, but
> >>> for the purposes of avoiding temporary breakage, a maintainer may wish
> >>> to squash patches 2 and 3 if they are accepted.
> >>>
> >>> v2: Philippe took patches 1, 3 and 4.
> >>>
> >>> This is a re-write of what was:
> >>> [PATCH RFC 03/32] python//machine.py: remove bare except
> >>> [PATCH 2/4] python/machine.py: remove bare except
> >>>
> >>> It's a bit heavier handed, but it should address some of kwolf's
> >>> feedback from the RFC version.
> >>>
> >>> Applies straight to origin/master, ought to pass pylint and flake8:
> >>>
> >>>> cd ~/src/qemu/python/qemu
> >>>> pylint *.py
> >>>> flake8 *.py
> >>>
> >>> John Snow (3):
> >>>   python/machine.py: consolidate _post_shutdown()
> >>>   python/machine.py: refactor shutdown
> >>>   python/machine.py: re-add sigkill warning suppression
> >>>
> >>>  python/qemu/machine.py | 100 +++++++++++++++++++++++++++++------------
> >>>  1 file changed, 71 insertions(+), 29 deletions(-)
> >>>
> >>
> >> I'm now seeing this error:
> >>
> >> 21:31:58 DEBUG| / # reboot
> >> 21:32:01 DEBUG| / # reboot: Restarting system
> >> 21:32:01 DEBUG| >>> {'execute': 'quit'}
> >> 21:32:01 WARNI| qemu received signal 9; command:
> >> "mips-softmmu/qemu-system-mips -display none -vga none -chardev
> >> socket,id=mon,path=/tmp/tmp679upvrk/qemu-10292-monitor.sock -mon
> >> chardev=mon,mode=control -machine malta -chardev
> >> socket,id=console,path=/tmp/tmp679upvrk/qemu-10292-console.sock,server,nowait
> >> -serial chardev:console -kernel
> >> /tmp/avocado_b3aaagr9/avocado_job_5bj0xe1h/12-tests_acceptance_boot_linux_console.py_BootLinuxConsole.test_mips_malta_cpio/boot/vmlinux-4.5.0-2-4kc-malta
> >> -initrd
> >> /tmp/avocado_b3aaagr9/avocado_job_5bj0xe1h/12-tests_acceptance_boot_linux_console.py_BootLinuxConsole.test_mips_malta_cpiorootfs.cpio
> >> -append printk.time=0 console=ttyS0 console=tty rdinit=/sbin/init
> >> noreboot -no-reboot"
> >> 21:32:01 ERROR|
> >> 21:32:01 ERROR| Reproduced traceback from:
> >> /home/travis/build/philmd/qemu/build/tests/venv/lib/python3.6/site-packages/avocado/core/test.py:886
> >> 21:32:01 ERROR| Traceback (most recent call last):
> >> 21:32:01 ERROR|   File
> >> "/home/travis/build/philmd/qemu/build/tests/acceptance/avocado_qemu/__init__.py",
> >> line 195, in tearDown
> >> 21:32:01 ERROR|     vm.shutdown()
> >> 21:32:01 ERROR|   File
> >> "/home/travis/build/philmd/qemu/python/qemu/machine.py", line 449, in
> >> shutdown
> >> 21:32:01 ERROR|     self._do_shutdown(has_quit)
> >> 21:32:01 ERROR|   File
> >> "/home/travis/build/philmd/qemu/python/qemu/machine.py", line 426, in
> >> _do_shutdown
> >> 21:32:01 ERROR|     self._soft_shutdown(has_quit, timeout)
> >> 21:32:01 ERROR|   File
> >> "/home/travis/build/philmd/qemu/python/qemu/machine.py", line 413, in
> >> _soft_shutdown
> >> 21:32:01 ERROR|     self._qmp.cmd('quit')
> >> 21:32:01 ERROR|   File
> >> "/home/travis/build/philmd/qemu/python/qemu/qmp.py", line 271, in cmd
> >> 21:32:01 ERROR|     return self.cmd_obj(qmp_cmd)
> >> 21:32:01 ERROR|   File
> >> "/home/travis/build/philmd/qemu/python/qemu/qmp.py", line 249, in cmd_obj
> >> 21:32:01 ERROR|     self.__sock.sendall(json.dumps(qmp_cmd).encode('utf-8'))
> >> 21:32:01 ERROR| BrokenPipeError: [Errno 32] Broken pipe
> >> 21:32:01 ERROR|
> >> 21:32:01 DEBUG| DATA (filename=output.expected) => NOT FOUND (data
> >> sources: variant, test, file)
> >> 21:32:01 DEBUG| DATA (filename=stdout.expected) => NOT FOUND (data
> >> sources: variant, test, file)
> >> 21:32:01 DEBUG| DATA (filename=stderr.expected) => NOT FOUND (data
> >> sources: variant, test, file)
> >> 21:32:01 DEBUG| Not logging /var/log/syslog (lack of permissions)
> >> 21:32:01 ERROR| ERROR
> >> 12-tests/acceptance/boot_linux_console.py:BootLinuxConsole.test_mips_malta_cpio
> >> -> TestSetupFail: [Errno 32] Broken pipe
> >> 21:32:01 INFO |
> >>
> >> https://travis-ci.org/github/philmd/qemu/jobs/696142277#L5329
> >>
> > 
> > Gotcha.
> > 
> > The problem here is that `test_mips_malta_cpio` in boot_linux_console.py
> > does this:
> > 
> >         self.vm.add_args('-kernel', kernel_path,
> >                          '-initrd', initrd_path,
> >                          '-append', kernel_command_line,
> >                          '-no-reboot')
> > 
> > and then:
> > 
> > exec_command_and_wait_for_pattern(self, 'reboot',
> >                          'reboot: Restarting system')
> > 
> > and (in avocado_qemu/) __init__.py does this:
> > 
> >     def tearDown(self):
> >         for vm in self._vms.values():
> >             vm.shutdown()
> > 
> > 
> > 
> > What's happening here is that we are instructing QEMU to *close* when
> > the guest reboots instead of allowing it to reboot. Then, we are issuing
> > a reboot command to the guest, which will effectively terminate QEMU as
> > well. Finally, we are trying to send a shutdown command to QEMU, but
> > QEMU has already gone.
> > 
> > Now, in the shutdown code, we do make an attempt to catch this:
> > 
> > def is_running(self):
> >     """Returns true if the VM is running."""
> >     return self._popen is not None and self._popen.poll() is None
> > 
> > But, well, race conditions.
> > 
> > When we make it here:
> > 
> >         if self._qmp is not None:
> >             if not has_quit:
> >                 self._qmp.cmd('quit')
> >             self._qmp.close()
> > 
> > We believe we are running and we believe we have an open QMP socket.
> > Attempting to engage the socket by sending 'quit' causes the error.
> > 
> > It's a tight window: if quit happens earlier, we send the command
> > successfully and everything's OK. If quit happens later, we realize QEMU
> > isn't running and proceed to cleanup.
> 
> Nice debugging :)
> 
> > 
> > Ultimately:
> > 
> > - Avocado should not try to shut down QEMU twice, but
> > - machine.py shouldn't enable the race condition either.
> > 
> > 
> > 
> > for my part, how about this:
> > 
> > diff --git a/python/qemu/machine.py b/python/qemu/machine.py
> > index 99bcb499878..813f8e477db 100644
> > --- a/python/qemu/machine.py
> > +++ b/python/qemu/machine.py
> > @@ -385,7 +385,14 @@ def _soft_shutdown(self, has_quit: bool = False,
> > timeout: int = 3) -> None:
> > 
> >          if self._qmp is not None:
> >              if not has_quit:
> > -                self._qmp.cmd('quit')
> > +                try:
> > +                    self._qmp.cmd('quit')
> > +                except (BrokenPipeError, ConnectionResetError):
> > +                    # QMP went away just before or just after sending
> > 'quit'
> > +                    if not self.is_running():
> > +                        # "Mission Accomplished"
> > +                        pass
> > +                    raise
> 
> Looks OK to me, Cleber/Eduardo can you Ack?

Looks good to me.

Reviewed-by: Eduardo Habkost <ehabkost@redhat.com>


> 
> >              self._qmp.close()
> > 
> >          self._popen.wait(timeout=timeout)
> > 
> 

-- 
Eduardo



  reply	other threads:[~2020-06-15 22:22 UTC|newest]

Thread overview: 20+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2020-06-04 19:52 [PATCH v3 0/3] python/machine.py: refactor shutdown John Snow
2020-06-04 19:52 ` [PATCH v3 1/3] python/machine.py: consolidate _post_shutdown() John Snow
2020-06-04 19:52 ` [PATCH v3 2/3] python/machine.py: refactor shutdown John Snow
2020-06-04 19:52 ` [PATCH v3 3/3] python/machine.py: re-add sigkill warning suppression John Snow
2020-06-05  9:34 ` [PATCH v3 0/3] python/machine.py: refactor shutdown Kevin Wolf
2020-06-09 13:08 ` Philippe Mathieu-Daudé
2020-06-09 21:55   ` John Snow
2020-06-15 15:21     ` Philippe Mathieu-Daudé
2020-06-15 22:21       ` Eduardo Habkost [this message]
2020-06-16 15:06         ` John Snow
2020-06-16 15:36           ` Philippe Mathieu-Daudé
2020-06-16 21:49       ` Cleber Rosa
2020-06-17 17:13         ` Philippe Mathieu-Daudé
2020-06-20 10:14           ` Philippe Mathieu-Daudé
2020-06-22 17:51             ` John Snow
2020-06-22 17:55               ` Philippe Mathieu-Daudé
2020-06-22 18:08                 ` John Snow
2020-06-22 18:28                   ` Philippe Mathieu-Daudé
2020-06-26 17:28             ` John Snow
2020-06-17 17:15 ` Philippe Mathieu-Daudé

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20200615222152.GT2366737@habkost.net \
    --to=ehabkost@redhat.com \
    --cc=armbru@redhat.com \
    --cc=crosa@redhat.com \
    --cc=jsnow@redhat.com \
    --cc=kwolf@redhat.com \
    --cc=philmd@redhat.com \
    --cc=qemu-devel@nongnu.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.