All of lore.kernel.org
 help / color / mirror / Atom feed
From: Eric Blake <eblake@redhat.com>
To: "Daniel P. Berrangé" <berrange@redhat.com>, qemu-devel@nongnu.org
Cc: "Laurent Vivier" <lvivier@redhat.com>,
	"Thomas Huth" <thuth@redhat.com>,
	"Yongji Xie" <elohimes@gmail.com>,
	"Marc-André Lureau" <marcandre.lureau@redhat.com>,
	"Paolo Bonzini" <pbonzini@redhat.com>,
	"Max Reitz" <mreitz@redhat.com>
Subject: Re: [Qemu-devel] [PATCH v3 00/16] chardev: refactoring & many bugfixes related tcp_chr_wait_connected
Date: Mon, 22 Apr 2019 09:51:17 -0500	[thread overview]
Message-ID: <46b6b751-4e3f-1b11-9ac7-d0d73cca2227@redhat.com> (raw)
In-Reply-To: <20190211182442.8542-1-berrange@redhat.com>

[-- Attachment #1: Type: text/plain, Size: 2938 bytes --]

On 2/11/19 12:24 PM, Daniel P. Berrangé wrote:
> This is a followup to
> 
>   v1: https://lists.gnu.org/archive/html/qemu-devel/2019-01/msg03344.html
>   v2: http://lists.nongnu.org/archive/html/qemu-devel/2019-01/msg05947.html
> 
> This series comes out of a discussion between myself & Yongji Xie in:
> 
>   https://lists.gnu.org/archive/html/qemu-devel/2019-01/msg01881.html
> 
> I eventually understood that the problem faced was that
> tcp_chr_wait_connected was racing with the background connection attempt
> previously started, causing two connections to be established. This
> broke because some vhost user servers only allow a single connection.
> 
> After messing around with the code alot the final solution was in fact
> very easy. We simply have to delay the first background connection
> attempt until the main loop is running. It will then automatically
> turn into a no-op if tcp_chr_wait_connected has been run. This is
> dealt with in the last patch in this series
> 
> I believe this should solve the problem Yongji Xie faced, and thus not
> require us to add support for "nowait" option with client sockets at
> all. The reconnect=1 option effectively already implements nowait
> semantics, and now plays nicely with tcp_chr_wait_connected.
> 
> In investigating this I found various other bugs that needed fixing and
> identified some useful refactoring to simplify / clarify the code, hence
> this very long series.

Even with this series applied, I'm still seeing sporadic failures of
iotest 169. Max posted a hack patch a while back that tries to work
around the race:

https://lists.gnu.org/archive/html/qemu-devel/2019-01/msg05907.html

which he originally diagnosed in iotest 147:
https://lists.nongnu.org/archive/html/qemu-devel/2018-12/msg05579.html

but as it was a hack, he has not pursued it further, and so the symptoms
are still there, although not completely reproducible:

169 10s ... - output mismatch (see 169.out.bad)
--- /home/eblake/qemu/tests/qemu-iotests/169.out	2018-11-16
15:48:12.018526748 -0600
+++ /home/eblake/qemu/tests/qemu-iotests/169.out.bad	2019-04-22
09:38:45.481517132 -0500
@@ -1,3 +1,5 @@
+WARNING:qemu:qemu received signal 11:
/home/eblake/qemu/tests/qemu-iotests/../../x86_64-softmmu/qemu-system-x86_64
-chardev
socket,id=mon,path=/home/eblake/qemu/tests/qemu-iotests/scratch/tmp4clmPF/qemua-26803-monitor.sock
-mon chardev=mon,mode=control -display none -vga none -qtest
unix:path=/home/eblake/qemu/tests/qemu-iotests/scratch/qemua-26803-qtest.sock
-machine accel=qtest -nodefaults -machine accel=qtest -drive
if=virtio,id=drive0,file=/home/eblake/qemu/tests/qemu-iotests/scratch/disk_a,format=qcow2,cache=writeback

Any chance you can take a look as to what a non-hack fix should be?

-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.           +1-919-301-3226
Virtualization:  qemu.org | libvirt.org


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 488 bytes --]

WARNING: multiple messages have this Message-ID (diff)
From: Eric Blake <eblake@redhat.com>
To: "Daniel P. Berrangé" <berrange@redhat.com>, qemu-devel@nongnu.org
Cc: "Laurent Vivier" <lvivier@redhat.com>,
	"Thomas Huth" <thuth@redhat.com>, "Max Reitz" <mreitz@redhat.com>,
	"Yongji Xie" <elohimes@gmail.com>,
	"Marc-André Lureau" <marcandre.lureau@redhat.com>,
	"Paolo Bonzini" <pbonzini@redhat.com>
Subject: Re: [Qemu-devel] [PATCH v3 00/16] chardev: refactoring & many bugfixes related tcp_chr_wait_connected
Date: Mon, 22 Apr 2019 09:51:17 -0500	[thread overview]
Message-ID: <46b6b751-4e3f-1b11-9ac7-d0d73cca2227@redhat.com> (raw)
Message-ID: <20190422145117.awsPh8lsu3V5jIdPnhdPxlpUJ-GL5kIJ9-KJ95CHrKc@z> (raw)
In-Reply-To: <20190211182442.8542-1-berrange@redhat.com>

[-- Attachment #1: Type: text/plain, Size: 2938 bytes --]

On 2/11/19 12:24 PM, Daniel P. Berrangé wrote:
> This is a followup to
> 
>   v1: https://lists.gnu.org/archive/html/qemu-devel/2019-01/msg03344.html
>   v2: http://lists.nongnu.org/archive/html/qemu-devel/2019-01/msg05947.html
> 
> This series comes out of a discussion between myself & Yongji Xie in:
> 
>   https://lists.gnu.org/archive/html/qemu-devel/2019-01/msg01881.html
> 
> I eventually understood that the problem faced was that
> tcp_chr_wait_connected was racing with the background connection attempt
> previously started, causing two connections to be established. This
> broke because some vhost user servers only allow a single connection.
> 
> After messing around with the code alot the final solution was in fact
> very easy. We simply have to delay the first background connection
> attempt until the main loop is running. It will then automatically
> turn into a no-op if tcp_chr_wait_connected has been run. This is
> dealt with in the last patch in this series
> 
> I believe this should solve the problem Yongji Xie faced, and thus not
> require us to add support for "nowait" option with client sockets at
> all. The reconnect=1 option effectively already implements nowait
> semantics, and now plays nicely with tcp_chr_wait_connected.
> 
> In investigating this I found various other bugs that needed fixing and
> identified some useful refactoring to simplify / clarify the code, hence
> this very long series.

Even with this series applied, I'm still seeing sporadic failures of
iotest 169. Max posted a hack patch a while back that tries to work
around the race:

https://lists.gnu.org/archive/html/qemu-devel/2019-01/msg05907.html

which he originally diagnosed in iotest 147:
https://lists.nongnu.org/archive/html/qemu-devel/2018-12/msg05579.html

but as it was a hack, he has not pursued it further, and so the symptoms
are still there, although not completely reproducible:

169 10s ... - output mismatch (see 169.out.bad)
--- /home/eblake/qemu/tests/qemu-iotests/169.out	2018-11-16
15:48:12.018526748 -0600
+++ /home/eblake/qemu/tests/qemu-iotests/169.out.bad	2019-04-22
09:38:45.481517132 -0500
@@ -1,3 +1,5 @@
+WARNING:qemu:qemu received signal 11:
/home/eblake/qemu/tests/qemu-iotests/../../x86_64-softmmu/qemu-system-x86_64
-chardev
socket,id=mon,path=/home/eblake/qemu/tests/qemu-iotests/scratch/tmp4clmPF/qemua-26803-monitor.sock
-mon chardev=mon,mode=control -display none -vga none -qtest
unix:path=/home/eblake/qemu/tests/qemu-iotests/scratch/qemua-26803-qtest.sock
-machine accel=qtest -nodefaults -machine accel=qtest -drive
if=virtio,id=drive0,file=/home/eblake/qemu/tests/qemu-iotests/scratch/disk_a,format=qcow2,cache=writeback

Any chance you can take a look as to what a non-hack fix should be?

-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.           +1-919-301-3226
Virtualization:  qemu.org | libvirt.org


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 488 bytes --]

  parent reply	other threads:[~2019-04-22 14:51 UTC|newest]

Thread overview: 21+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2019-02-11 18:24 [Qemu-devel] [PATCH v3 00/16] chardev: refactoring & many bugfixes related tcp_chr_wait_connected Daniel P. Berrangé
2019-02-11 18:24 ` [Qemu-devel] [PATCH v3 01/16] io: store reference to thread information in the QIOTask struct Daniel P. Berrangé
2019-02-11 18:24 ` [Qemu-devel] [PATCH v3 02/16] io: add qio_task_wait_thread to join with a background thread Daniel P. Berrangé
2019-02-11 18:24 ` [Qemu-devel] [PATCH v3 03/16] chardev: fix validation of options for QMP created chardevs Daniel P. Berrangé
2019-02-11 18:24 ` [Qemu-devel] [PATCH v3 04/16] chardev: forbid 'reconnect' option with server sockets Daniel P. Berrangé
2019-02-11 18:24 ` [Qemu-devel] [PATCH v3 05/16] chardev: forbid 'wait' option with client sockets Daniel P. Berrangé
2019-02-11 18:24 ` [Qemu-devel] [PATCH v3 06/16] chardev: remove many local variables in qemu_chr_parse_socket Daniel P. Berrangé
2019-02-11 18:24 ` [Qemu-devel] [PATCH v3 07/16] chardev: ensure qemu_chr_parse_compat reports missing driver error Daniel P. Berrangé
2019-02-11 18:24 ` [Qemu-devel] [PATCH v3 08/16] chardev: remove unused 'sioc' variable & cleanup paths Daniel P. Berrangé
2019-02-11 18:24 ` [Qemu-devel] [PATCH v3 09/16] chardev: split tcp_chr_wait_connected into two methods Daniel P. Berrangé
2019-02-11 18:24 ` [Qemu-devel] [PATCH v3 10/16] chardev: split up qmp_chardev_open_socket connection code Daniel P. Berrangé
2019-02-11 18:24 ` [Qemu-devel] [PATCH v3 11/16] chardev: use a state machine for socket connection state Daniel P. Berrangé
2019-02-11 18:24 ` [Qemu-devel] [PATCH v3 12/16] chardev: honour the reconnect setting in tcp_chr_wait_connected Daniel P. Berrangé
2019-02-11 18:24 ` [Qemu-devel] [PATCH v3 13/16] chardev: disallow TLS/telnet/websocket with tcp_chr_wait_connected Daniel P. Berrangé
2019-02-11 18:24 ` [Qemu-devel] [PATCH v3 14/16] chardev: fix race with client connections in tcp_chr_wait_connected Daniel P. Berrangé
2019-02-11 18:24 ` [Qemu-devel] [PATCH v3 15/16] tests: expand coverage of socket chardev test Daniel P. Berrangé
2019-02-11 18:24 ` [Qemu-devel] [PATCH v3 16/16] chardev: ensure termios is fully initialized Daniel P. Berrangé
2019-04-22 14:51 ` Eric Blake [this message]
2019-04-22 14:51   ` [Qemu-devel] [PATCH v3 00/16] chardev: refactoring & many bugfixes related tcp_chr_wait_connected Eric Blake
2019-04-23 14:13   ` Daniel P. Berrangé
2019-04-23 14:13     ` Daniel P. Berrangé

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=46b6b751-4e3f-1b11-9ac7-d0d73cca2227@redhat.com \
    --to=eblake@redhat.com \
    --cc=berrange@redhat.com \
    --cc=elohimes@gmail.com \
    --cc=lvivier@redhat.com \
    --cc=marcandre.lureau@redhat.com \
    --cc=mreitz@redhat.com \
    --cc=pbonzini@redhat.com \
    --cc=qemu-devel@nongnu.org \
    --cc=thuth@redhat.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.