From: Eric Blake <eblake@redhat.com>
To: "Max Reitz" <mreitz@redhat.com>,
"Lukáš Doktor" <ldoktor@redhat.com>,
qemu-block@nongnu.org
Cc: kwolf@redhat.com, qemu-devel@nongnu.org,
Stefan Hajnoczi <stefanha@redhat.com>,
Aarushi Mehta <mehta.aaru20@gmail.com>
Subject: Re: [PATCH] tests/qemu_iotests: Minimize usage of used ports
Date: Thu, 6 Feb 2020 10:48:26 -0600 [thread overview]
Message-ID: <797578d5-bfab-5eb7-8921-0fcf1f3ee40e@redhat.com> (raw)
In-Reply-To: <726ca911-be83-c2d5-ff3f-efa32bc2233e@redhat.com>
On 2/6/20 10:37 AM, Max Reitz wrote:
>>
>> thank you and I am sorry for not digging deep enough. This week my CI failed with:
>>
>> 01:24:06 DEBUG| [stdout] +ERROR: test_inet (__main__.QemuNBD)
>> 01:24:06 DEBUG| [stdout] +----------------------------------------------------------------------
>> 01:24:06 DEBUG| [stdout] +Traceback (most recent call last):
>> 01:24:06 DEBUG| [stdout] + File "147", line 85, in setUp
>> 01:24:06 DEBUG| [stdout] + self.vm.launch()
>> 01:24:06 DEBUG| [stdout] + File "/home/jenkins/ppc64le/qemu-master/tests/qemu-iotests/../../python/qemu/machine.py", line 302, in launch
>> 01:24:06 DEBUG| [stdout] + self._launch()
>> 01:24:06 DEBUG| [stdout] + File "/home/jenkins/ppc64le/qemu-master/tests/qemu-iotests/../../python/qemu/machine.py", line 319, in _launch
>> 01:24:06 DEBUG| [stdout] + self._pre_launch()
>> 01:24:06 DEBUG| [stdout] + File "/home/jenkins/ppc64le/qemu-master/tests/qemu-iotests/../../python/qemu/qtest.py", line 106, in _pre_launch
>> 01:24:06 DEBUG| [stdout] + super(QEMUQtestMachine, self)._pre_launch()
>> 01:24:06 DEBUG| [stdout] + File "/home/jenkins/ppc64le/qemu-master/tests/qemu-iotests/../../python/qemu/machine.py", line 270, in _pre_launch
>> 01:24:06 DEBUG| [stdout] + self._qmp = qmp.QEMUMonitorProtocol(self._vm_monitor, server=True)
>> 01:24:06 DEBUG| [stdout] + File "/home/jenkins/ppc64le/qemu-master/tests/qemu-iotests/../../python/qemu/qmp.py", line 60, in __init__
>> 01:24:06 DEBUG| [stdout] + self.__sock.bind(self.__address)
>> 01:24:06 DEBUG| [stdout] +OSError: [Errno 98] Address already in use
Was this test 147? If so, see:
https://lists.gnu.org/archive/html/qemu-devel/2020-02/msg01469.html
because that failure matches what I was seeing.
>>
>> I made the mistake of reproducing this on my home system using the qemu revision that I had and assuming it's caused by a used port. So I limited the port range and used nc to occupy the port. It sort-of reproduced but instead of Address already in use it hanged until I kill the nc process. Then it failed with:
>>
>> +Traceback (most recent call last):
>> + File "147", line 124, in test_inet
>> + flatten_sock_addr(address))
>> + File "147", line 59, in client_test
>> + self.assert_qmp(result, 'return', {})
>> + File "/home/medic/Work/Projekty/qemu/tests/qemu-iotests/iotests.py", line 821, in assert_qmp
>> + result = self.dictpath(d, path)
>> + File "/home/medic/Work/Projekty/qemu/tests/qemu-iotests/iotests.py", line 797, in dictpath
>> + self.fail('failed path traversal for "%s" in "%s"' % (path, str(d)))
>> +AssertionError: failed path traversal for "return" in "{'error': {'class': 'GenericError', 'desc': 'Failed to read initial magic: Unexpected end-of-file before all bytes were read'}}"
>>
That's a secondary failure, I assume if the initial bug is fixed we are
less likely to hit the secondary one; but the secondary one may still be
worth fixing.
>> After a brief study I thought qemu is not doing the job well enough and wanted to add a protection. Anyway after a more thorough overview I came to a different conclusion and that is that we are facing the same issue as with incoming migration about a year ago. What happened is that I started "nc -l localhost 32789" which results in:
>>
>> COMMAND PID USER FD TYPE DEVICE SIZE/OFF NODE NAME
>> nc 26758 medic 3u IPv6 9579487 0t0 TCP localhost:32789 (LISTEN)
>>
>> Then we start the server by "_try_server_up" where qemu-nbd detects the port is occupied on IPv6 but available on IPv4, so it claims it:
>> COMMAND PID USER FD TYPE DEVICE SIZE/OFF NODE NAME
>> nc 26758 medic 3u IPv6 9579487 0t0 TCP localhost:32789 (LISTEN)
>> qemu-nbd 26927 medic 4u IPv4 9591857 0t0 TCP localhost:32789 (LISTEN)
>>
>> and reports success. Then we try to connect but the hotplugged VM first attempts to connect on the IPv6 address and hangs for infinity.
>>
>> Now is this an expected behavior? If so then we need the find_free_address (but preferably directly in _try_server_up just before starting the qemu-nbd) to leave as little time-frame for collision as possible. Otherwise the test is alright and qemu-nbd needs a fix to bail out in case some address is already used (IIRC this is what incoming migration does).
>
> Ah, OK.
>
> Well, expected behavior... It’s a shame, that’s what it is.
In libnbd, we recently improved the testsuite by switching over to
systemd-style fd passing: instead of asking qemu-nbd to open a random
port (and hoping it is available), we instead pre-open the port (where
failure is under our control) and then pass in that fd with environment
variables to qemu-nbd, which in turn guarantees that qemu-nbd won't hit
failures in trying to use the port. Maybe we should utilize that more
in qemu's own testsuite.
Also, I need to revisit my proposed patches for letting qemu-nbd support
TLS over Unix sockets, as that's another way to avoid TCP contention
(right now, qemu has an anachronistic prohibition against the
combination of TLS and Unix sockets).
>
>> My second mistake was testing this on the old code-base and rebasing it only before sending the patch (without testing after the rebase). If I were to test it first, I would have found out that the real reproducer is simply running the test as the commit 8dff69b9415b4287e900358744b732195e1ab2e2 broke it.
>>
>>
>> So basically there are 2 actions:
>>
>> 1. fix the test as on my system it fails in 100% of cases, bisect says the first bad commit is 8dff69b9415b4287e900358744b732195e1ab2e2. Would anyone have time in digging into this? I already spent way too much time on this and don't really know what that commit is trying to do.
>
> Yep, I’ve sent a patch:
>
> https://lists.nongnu.org/archive/html/qemu-block/2020-02/msg00294.html
Ah, so we did notice the same problem.
>
>> 2. decide on the behavior when IPv4/6 is already in use (bail-out or start).
>> 2a. In case it should bail-out than the test is correct and there is no need for my patch. On the other hand qemu-nbd has to be fixed
>
> I don’t think it makes much sense to let qemu’s NBD server ensure that
> it claims both IPv4 and IPv6 in case the user specifies a
> non-descriptive hostname.
>
>> 2b. Otherwise I can send a v2 that will check the port in the _try_server_up just before starting qemu-nbd to minimize the risk of using a utilized port (or should you decide it's not worth checking, I can simply forget about this)
>
> Hm. It wouldn’t be fully reliable, but, well... The risk would be minimal.
>
> OTOH, would it work if we just did a %s/localhost/127.0.0.1/ in the
> test? We have specific cases for IPv6, so I think it makes sense to
> force IPv4 in all other cases.
Except then it will fail on machines configured for IPv6-only.
--
Eric Blake, Principal Software Engineer
Red Hat, Inc. +1-919-301-3226
Virtualization: qemu.org | libvirt.org
next prev parent reply other threads:[~2020-02-06 16:50 UTC|newest]
Thread overview: 9+ messages / expand[flat|nested] mbox.gz Atom feed top
2020-02-03 7:59 [PATCH] tests/qemu_iotests: Minimize usage of used ports Lukáš Doktor
2020-02-03 15:32 ` Eric Blake
2020-02-06 15:03 ` Max Reitz
2020-02-06 16:27 ` Lukáš Doktor
2020-02-06 16:37 ` Max Reitz
2020-02-06 16:48 ` Eric Blake [this message]
2020-02-06 16:59 ` Max Reitz
2020-02-06 18:33 ` Lukáš Doktor
2020-02-07 8:24 ` Max Reitz
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=797578d5-bfab-5eb7-8921-0fcf1f3ee40e@redhat.com \
--to=eblake@redhat.com \
--cc=kwolf@redhat.com \
--cc=ldoktor@redhat.com \
--cc=mehta.aaru20@gmail.com \
--cc=mreitz@redhat.com \
--cc=qemu-block@nongnu.org \
--cc=qemu-devel@nongnu.org \
--cc=stefanha@redhat.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.