From: "Daniel P. Berrangé" <berrange@redhat.com>
To: "manish.mishra" <manish.mishra@nutanix.com>
Cc: qemu-devel@nongnu.org, peterx@redhat.com,
	prerna.saxena@nutanix.com, quintela@redhat.com,
	dgilbert@redhat.com, lsoaresp@redhat.com
Subject: Re: [PATCH v3 1/2] io: Add support for MSG_PEEK for socket channel
Date: Tue, 22 Nov 2022 10:31:59 +0000	[thread overview]
Message-ID: <Y3ylH0J7rl5o5KrI@redhat.com> (raw)
In-Reply-To: <f03d5744-8369-ed73-49f8-9a53a9507afb@nutanix.com>

On Tue, Nov 22, 2022 at 03:43:55PM +0530, manish.mishra wrote:
> 
> On 22/11/22 3:23 pm, Daniel P. Berrangé wrote:
> > On Tue, Nov 22, 2022 at 03:10:53PM +0530, manish.mishra wrote:
> > > On 22/11/22 2:59 pm, Daniel P. Berrangé wrote:
> > > > On Tue, Nov 22, 2022 at 02:38:53PM +0530, manish.mishra wrote:
> > > > > On 22/11/22 2:30 pm, Daniel P. Berrangé wrote:
> > > > > > On Sat, Nov 19, 2022 at 09:36:14AM +0000, manish.mishra wrote:
> > > > > > > MSG_PEEK peeks at the head of the channel: the data is treated as
> > > > > > > unread and the next read will still return it. This support is
> > > > > > > currently added only for the socket class. An extra parameter
> > > > > > > 'flags' is added to the io_readv calls to pass extra read flags
> > > > > > > like MSG_PEEK.
> > > > > > > 
> > > > > > > Suggested-by: Daniel P. Berrangé <berrange@redhat.com>
> > > > > > > Signed-off-by: manish.mishra <manish.mishra@nutanix.com>
> > > > > > > ---
> > > > > > >     chardev/char-socket.c               |  4 +-
> > > > > > >     include/io/channel.h                | 83 +++++++++++++++++++++++++++++
> > > > > > >     io/channel-buffer.c                 |  1 +
> > > > > > >     io/channel-command.c                |  1 +
> > > > > > >     io/channel-file.c                   |  1 +
> > > > > > >     io/channel-null.c                   |  1 +
> > > > > > >     io/channel-socket.c                 | 16 +++++-
> > > > > > >     io/channel-tls.c                    |  1 +
> > > > > > >     io/channel-websock.c                |  1 +
> > > > > > >     io/channel.c                        | 73 +++++++++++++++++++++++--
> > > > > > >     migration/channel-block.c           |  1 +
> > > > > > >     scsi/qemu-pr-helper.c               |  2 +-
> > > > > > >     tests/qtest/tpm-emu.c               |  2 +-
> > > > > > >     tests/unit/test-io-channel-socket.c |  1 +
> > > > > > >     util/vhost-user-server.c            |  2 +-
> > > > > > >     15 files changed, 179 insertions(+), 11 deletions(-)
> > > > > > > diff --git a/io/channel-socket.c b/io/channel-socket.c
> > > > > > > index b76dca9cc1..a06b24766d 100644
> > > > > > > --- a/io/channel-socket.c
> > > > > > > +++ b/io/channel-socket.c
> > > > > > > @@ -406,6 +406,8 @@ qio_channel_socket_accept(QIOChannelSocket *ioc,
> > > > > > >         }
> > > > > > >     #endif /* WIN32 */
> > > > > > > +    qio_channel_set_feature(QIO_CHANNEL(cioc), QIO_CHANNEL_FEATURE_READ_MSG_PEEK);
> > > > > > > +
> > > > > > This covers the incoming server-side socket.
> > > > > > 
> > > > > > This also needs to be set on the outgoing client-side socket in
> > > > > > qio_channel_socket_connect_async.
> > > > > Yes, sorry, I considered only the current use-case, but as it is a generic one both should be set. Thanks, will update it.
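
(For illustration: the client-side counterpart would presumably mirror the
accept-path hunk above, along these lines; this sketch is not part of the
v3 patch.)

    /* Sketch only: advertise MSG_PEEK support on the outgoing socket as
     * well, mirroring what qio_channel_socket_accept() does above. */
    qio_channel_set_feature(QIO_CHANNEL(ioc),
                            QIO_CHANNEL_FEATURE_READ_MSG_PEEK);
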
> > > > > 
> > > > > > > @@ -705,7 +718,6 @@ static ssize_t qio_channel_socket_writev(QIOChannel *ioc,
> > > > > > >     }
> > > > > > >     #endif /* WIN32 */
> > > > > > > -
> > > > > > >     #ifdef QEMU_MSG_ZEROCOPY
> > > > > > >     static int qio_channel_socket_flush(QIOChannel *ioc,
> > > > > > >                                         Error **errp)
> > > > > > Please remove this unrelated whitespace change.
> > > > > > 
> > > > > > 
> > > > > > > @@ -109,6 +117,37 @@ int qio_channel_readv_all_eof(QIOChannel *ioc,
> > > > > > >         return qio_channel_readv_full_all_eof(ioc, iov, niov, NULL, NULL, errp);
> > > > > > >     }
> > > > > > > +int qio_channel_readv_peek_all_eof(QIOChannel *ioc,
> > > > > > > +                                   const struct iovec *iov,
> > > > > > > +                                   size_t niov,
> > > > > > > +                                   Error **errp)
> > > > > > > +{
> > > > > > > +   ssize_t len = 0;
> > > > > > > +   ssize_t total = iov_size(iov, niov);
> > > > > > > +
> > > > > > > +   while (len < total) {
> > > > > > > +       len = qio_channel_readv_full(ioc, iov, niov, NULL,
> > > > > > > +                                    NULL, QIO_CHANNEL_READ_FLAG_MSG_PEEK, errp);
> > > > > > > +
> > > > > > > +       if (len == QIO_CHANNEL_ERR_BLOCK) {
> > > > > > > +            if (qemu_in_coroutine()) {
> > > > > > > +                qio_channel_yield(ioc, G_IO_IN);
> > > > > > > +            } else {
> > > > > > > +                qio_channel_wait(ioc, G_IO_IN);
> > > > > > > +            }
> > > > > > > +            continue;
> > > > > > > +       }
> > > > > > > +       if (len == 0) {
> > > > > > > +           return 0;
> > > > > > > +       }
> > > > > > > +       if (len < 0) {
> > > > > > > +           return -1;
> > > > > > > +       }
> > > > > > > +   }
> > > > > > This will busy wait burning CPU where there is a read > 0 and < total.
> > > > > > 
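
To make the failure mode concrete (an illustrative trace, assuming an
8-byte magic of which the peer has so far sent only 5 bytes): because
MSG_PEEK leaves the data queued, the socket stays readable and every
retry sees the same partial buffer.

    /* Peek never consumes, so a short peek repeats forever:             */
    /*   iteration 1: len = 5  (< total of 8), bytes remain queued       */
    /*   iteration 2: socket still readable, len = 5 again               */
    /*   iteration 3: len = 5 again ... the loop spins without blocking  */
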
> > > > > Daniel, I could use MSG_WAITALL too if that works, but then we would
> > > > > lose the opportunity to yield. Or let me know if you have some other idea.
> > > > I fear this is an inherent problem with the idea of using PEEK to
> > > > look at the magic data.
> > > > 
> > > > If we actually read the magic bytes off the wire, then we could have
> > > > the same code path for TLS and non-TLS. We would have to modify the
> > > > existing later code paths though to take account of the fact that the
> > > > magic was already read by an earlier code path.
> > > > 
> > > > With regards,
> > > > Daniel
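
(Concretely, the suggestion amounts to something like the sketch below.
qio_channel_read_all() is the existing helper; the surrounding context, the
32-bit big-endian framing of the magic, and the error handling are
assumptions, not code from the patch.)

    /* Sketch of the alternative: actually consume the magic instead of
     * peeking at it, for TLS and non-TLS alike.  The later code paths
     * that currently expect to read the magic themselves would then
     * have to be taught that these bytes are already gone. */
    uint32_t magic;
    if (qio_channel_read_all(ioc, (char *)&magic, sizeof(magic), errp) < 0) {
        return -1;
    }
    magic = be32_to_cpu(magic);
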
> > > 
> > > Sure Daniel, I am happy to drop the use of MSG_PEEK, but that way we
> > > also have an issue with TLS, for the reason we discussed in v2. Is it
> > > okay to send a patch with an actual read-ahead, but not for the TLS
> > > case? TLS does not have this bug anyway, as it does a handshake.
> > I've re-read the previous threads, but I don't see what the problem
> > with TLS is.  We already decided that TLS is not affected by the
> > race condition. So there should be no problem in reading the magic
> > bytes early on the TLS channels, while reading the bytes early on
> > a non-TLS channel will fix the race condition.
> 
> 
> Actually, with TLS every channel requires its handshake to complete before
> it is considered established, and on the source side we do the initial
> qemu_flush only once all channels are established. But on the destination
> side we would be stuck reading the magic on the main channel itself, which
> never arrives because the source has not flushed any data, so no new
> connections (e.g. multiFD) can be established. So basically the destination
> cannot accept any new channel until it reads from the main channel, and the
> source does not put any data on the main channel until all channels are
> established. So if we read ahead in ioc_process_incoming_channel there is
> this deadlock with TLS. This issue is not there in the non-TLS case,
> because there the source side assumes a connection is established once the
> connect() call succeeds.
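
(In other words, the read-ahead has to skip the TLS case. A minimal sketch
of that gating, assuming the QOM type check below is an acceptable way to
detect a TLS channel; none of this is code from the thread.)

    /* Sketch: only read the magic up front on non-TLS channels.  A TLS
     * main channel would otherwise deadlock here, waiting for data the
     * source only flushes once every channel's handshake has finished. */
    if (!object_dynamic_cast(OBJECT(ioc), TYPE_QIO_CHANNEL_TLS)) {
        /* ... read-ahead of the magic as sketched earlier ... */
    }
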

Ah yes, I forgot about the 'flush' problem. Reading the magic in the non-TLS
case is OK then, I guess.

With regards,
Daniel
-- 
|: https://berrange.com      -o-    https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org         -o-            https://fstop138.berrange.com :|
|: https://entangle-photo.org    -o-    https://www.instagram.com/dberrange :|




Thread overview: 24+ messages
2022-11-19  9:36 [PATCH 1/2] io: Add support for MSG_PEEK for socket channel manish.mishra
2022-11-19  9:36 ` [PATCH 2/2] migration: check magic value for deciding the mapping of channels manish.mishra
2022-11-19  9:36 ` manish.mishra
2022-11-19  9:36 ` [PATCH v3 1/2] io: Add support for MSG_PEEK for socket channel manish.mishra
2022-11-22  9:00   ` Daniel P. Berrangé
2022-11-22  9:08     ` manish.mishra
2022-11-22  9:29       ` Daniel P. Berrangé
2022-11-22  9:40         ` manish.mishra
2022-11-22  9:53           ` Daniel P. Berrangé
2022-11-22 10:13             ` manish.mishra
2022-11-22 10:31               ` Daniel P. Berrangé [this message]
2022-11-22 14:41       ` Peter Xu
2022-11-22 14:49         ` Daniel P. Berrangé
2022-11-22 15:31           ` manish.mishra
2022-11-22 16:10             ` Peter Xu
2022-11-22 16:29               ` Peter Xu
2022-11-22 16:33                 ` Peter Xu
2022-11-22 16:42                   ` manish.mishra
2022-11-22 17:16                     ` Peter Xu
2022-11-22 17:31                       ` Daniel P. Berrangé
2022-11-19  9:36 ` [PATCH v3 2/2] migration: check magic value for deciding the mapping of channels manish.mishra
2022-11-21 21:59   ` Peter Xu
2022-11-22  9:01   ` Daniel P. Berrangé
2022-11-19  9:40 ` [PATCH 1/2] io: Add support for MSG_PEEK for socket channel manish.mishra
