qemu-devel.nongnu.org archive mirror
 help / color / mirror / Atom feed
* [Qemu-devel] [PATCH 0/3] Avoid retrying on serial_xmit with EPIPE
@ 2018-05-31  7:45 Sergio Lopez
  2018-05-31  7:45 ` [Qemu-devel] [PATCH 1/3] io: Implement QIO_CHANNEL_ERR_BROKEN Sergio Lopez
                   ` (2 more replies)
  0 siblings, 3 replies; 18+ messages in thread
From: Sergio Lopez @ 2018-05-31  7:45 UTC (permalink / raw)
  To: berrange, pbonzini, marcandre.lureau, mst; +Cc: qemu-devel, Sergio Lopez

EPIPE is not a recoverable error, so retrying on serial_xmit is a waste of CPU
cycles, and can potentially comprosie the Guest OS stability if both the vCPU
and the emulator thread are pinned to the same pCPU, with the first one
actively polling the serial device and barely giving time to the second to make
any actual progress.

Sergio Lopez (3):
  io: Implement QIO_CHANNEL_ERR_BROKEN
  chardev: On QIO_CHANNEL_ERR_BROKEN set errno to EPIPE
  hw/char/serial: Don't retry on serial_xmit if errno == EPIPE

 chardev/char-io.c    | 3 +++
 hw/char/serial.c     | 1 +
 include/io/channel.h | 1 +
 io/channel-file.c    | 3 +++
 4 files changed, 8 insertions(+)

-- 
1.8.3.1

^ permalink raw reply	[flat|nested] 18+ messages in thread

* [Qemu-devel] [PATCH 1/3] io: Implement QIO_CHANNEL_ERR_BROKEN
  2018-05-31  7:45 [Qemu-devel] [PATCH 0/3] Avoid retrying on serial_xmit with EPIPE Sergio Lopez
@ 2018-05-31  7:45 ` Sergio Lopez
  2018-06-01 11:50   ` Daniel P. Berrangé
  2018-05-31  7:46 ` [Qemu-devel] [PATCH 2/3] chardev: On QIO_CHANNEL_ERR_BROKEN set errno to EPIPE Sergio Lopez
  2018-05-31  7:46 ` [Qemu-devel] [PATCH 3/3] hw/char/serial: Don't retry on serial_xmit if errno == EPIPE Sergio Lopez
  2 siblings, 1 reply; 18+ messages in thread
From: Sergio Lopez @ 2018-05-31  7:45 UTC (permalink / raw)
  To: berrange, pbonzini, marcandre.lureau, mst; +Cc: qemu-devel, Sergio Lopez

QIO_CHANNEL_ERR_BROKEN is used to identify a potentially unrecoverable
error in the channel, like EPIPE.
---
 include/io/channel.h | 1 +
 io/channel-file.c    | 3 +++
 2 files changed, 4 insertions(+)

diff --git a/include/io/channel.h b/include/io/channel.h
index e8cdadb..bbe45f6 100644
--- a/include/io/channel.h
+++ b/include/io/channel.h
@@ -38,6 +38,7 @@ typedef struct QIOChannel QIOChannel;
 typedef struct QIOChannelClass QIOChannelClass;
 
 #define QIO_CHANNEL_ERR_BLOCK -2
+#define QIO_CHANNEL_ERR_BROKEN -3
 
 typedef enum QIOChannelFeature QIOChannelFeature;
 
diff --git a/io/channel-file.c b/io/channel-file.c
index db948ab..a990f67 100644
--- a/io/channel-file.c
+++ b/io/channel-file.c
@@ -124,6 +124,9 @@ static ssize_t qio_channel_file_writev(QIOChannel *ioc,
         if (errno == EAGAIN) {
             return QIO_CHANNEL_ERR_BLOCK;
         }
+        if (errno == EPIPE) {
+            return QIO_CHANNEL_ERR_BROKEN;
+        }
         if (errno == EINTR) {
             goto retry;
         }
-- 
1.8.3.1

^ permalink raw reply related	[flat|nested] 18+ messages in thread

* [Qemu-devel] [PATCH 2/3] chardev: On QIO_CHANNEL_ERR_BROKEN set errno to EPIPE
  2018-05-31  7:45 [Qemu-devel] [PATCH 0/3] Avoid retrying on serial_xmit with EPIPE Sergio Lopez
  2018-05-31  7:45 ` [Qemu-devel] [PATCH 1/3] io: Implement QIO_CHANNEL_ERR_BROKEN Sergio Lopez
@ 2018-05-31  7:46 ` Sergio Lopez
  2018-05-31  9:48   ` Marc-André Lureau
  2018-06-01 11:51   ` Daniel P. Berrangé
  2018-05-31  7:46 ` [Qemu-devel] [PATCH 3/3] hw/char/serial: Don't retry on serial_xmit if errno == EPIPE Sergio Lopez
  2 siblings, 2 replies; 18+ messages in thread
From: Sergio Lopez @ 2018-05-31  7:46 UTC (permalink / raw)
  To: berrange, pbonzini, marcandre.lureau, mst; +Cc: qemu-devel, Sergio Lopez

This allows callers to identify this potentially unrecoverable error.
---
 chardev/char-io.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/chardev/char-io.c b/chardev/char-io.c
index f810524..f934eb9 100644
--- a/chardev/char-io.c
+++ b/chardev/char-io.c
@@ -168,6 +168,9 @@ int io_channel_send_full(QIOChannel *ioc,
 
             errno = EAGAIN;
             return -1;
+        } else if (ret == QIO_CHANNEL_ERR_BROKEN) {
+            errno = EPIPE;
+            return -1;
         } else if (ret < 0) {
             errno = EINVAL;
             return -1;
-- 
1.8.3.1

^ permalink raw reply related	[flat|nested] 18+ messages in thread

* [Qemu-devel] [PATCH 3/3] hw/char/serial: Don't retry on serial_xmit if errno == EPIPE
  2018-05-31  7:45 [Qemu-devel] [PATCH 0/3] Avoid retrying on serial_xmit with EPIPE Sergio Lopez
  2018-05-31  7:45 ` [Qemu-devel] [PATCH 1/3] io: Implement QIO_CHANNEL_ERR_BROKEN Sergio Lopez
  2018-05-31  7:46 ` [Qemu-devel] [PATCH 2/3] chardev: On QIO_CHANNEL_ERR_BROKEN set errno to EPIPE Sergio Lopez
@ 2018-05-31  7:46 ` Sergio Lopez
  2018-05-31  9:52   ` Marc-André Lureau
  2 siblings, 1 reply; 18+ messages in thread
From: Sergio Lopez @ 2018-05-31  7:46 UTC (permalink / raw)
  To: berrange, pbonzini, marcandre.lureau, mst; +Cc: qemu-devel, Sergio Lopez

If writing to the frontend channel failed with EPIPE, don't set up a
retry. EPIPE is not a recoverable error, so trying again is a waste of CPU
cycles.

If the vCPU writing to the serial device and emulator thread are pinned
to the same pCPU, it can also compromise the stability of the Guest OS,
as both threads will be competing for pCPU's time, with the vCPU
actively polling the serial device and barely giving time to the
emulator thread to make actual progress.
---
 hw/char/serial.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/hw/char/serial.c b/hw/char/serial.c
index 2c080c9..f26e86b 100644
--- a/hw/char/serial.c
+++ b/hw/char/serial.c
@@ -262,6 +262,7 @@ static void serial_xmit(SerialState *s)
             /* in loopback mode, say that we just received a char */
             serial_receive1(s, &s->tsr, 1);
         } else if (qemu_chr_fe_write(&s->chr, &s->tsr, 1) != 1 &&
+                   errno != EPIPE &&
                    s->tsr_retry < MAX_XMIT_RETRY) {
             assert(s->watch_tag == 0);
             s->watch_tag =
-- 
1.8.3.1

^ permalink raw reply related	[flat|nested] 18+ messages in thread

* Re: [Qemu-devel] [PATCH 2/3] chardev: On QIO_CHANNEL_ERR_BROKEN set errno to EPIPE
  2018-05-31  7:46 ` [Qemu-devel] [PATCH 2/3] chardev: On QIO_CHANNEL_ERR_BROKEN set errno to EPIPE Sergio Lopez
@ 2018-05-31  9:48   ` Marc-André Lureau
  2018-05-31 11:03     ` Sergio Lopez
  2018-06-01 11:51   ` Daniel P. Berrangé
  1 sibling, 1 reply; 18+ messages in thread
From: Marc-André Lureau @ 2018-05-31  9:48 UTC (permalink / raw)
  To: Sergio Lopez; +Cc: Daniel P. Berrange, Paolo Bonzini, Michael S. Tsirkin, QEMU

On Thu, May 31, 2018 at 9:46 AM, Sergio Lopez <slp@redhat.com> wrote:
> This allows callers to identify this potentially unrecoverable error.

(missing signed-off)

Reviewed-by: Marc-André Lureau <marcandre.lureau@redhat.com>

> ---
>  chardev/char-io.c | 3 +++
>  1 file changed, 3 insertions(+)
>
> diff --git a/chardev/char-io.c b/chardev/char-io.c
> index f810524..f934eb9 100644
> --- a/chardev/char-io.c
> +++ b/chardev/char-io.c
> @@ -168,6 +168,9 @@ int io_channel_send_full(QIOChannel *ioc,
>
>              errno = EAGAIN;
>              return -1;
> +        } else if (ret == QIO_CHANNEL_ERR_BROKEN) {
> +            errno = EPIPE;
> +            return -1;
>          } else if (ret < 0) {
>              errno = EINVAL;
>              return -1;
> --
> 1.8.3.1
>
>



-- 
Marc-André Lureau

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [Qemu-devel] [PATCH 3/3] hw/char/serial: Don't retry on serial_xmit if errno == EPIPE
  2018-05-31  7:46 ` [Qemu-devel] [PATCH 3/3] hw/char/serial: Don't retry on serial_xmit if errno == EPIPE Sergio Lopez
@ 2018-05-31  9:52   ` Marc-André Lureau
  2018-05-31 11:03     ` Sergio Lopez
  2018-06-01 11:52     ` Daniel P. Berrangé
  0 siblings, 2 replies; 18+ messages in thread
From: Marc-André Lureau @ 2018-05-31  9:52 UTC (permalink / raw)
  To: Sergio Lopez; +Cc: Daniel P. Berrange, Paolo Bonzini, Michael S. Tsirkin, QEMU

Hi

On Thu, May 31, 2018 at 9:46 AM, Sergio Lopez <slp@redhat.com> wrote:
> If writing to the frontend channel failed with EPIPE, don't set up a
> retry. EPIPE is not a recoverable error, so trying again is a waste of CPU
> cycles.
>
> If the vCPU writing to the serial device and emulator thread are pinned
> to the same pCPU, it can also compromise the stability of the Guest OS,
> as both threads will be competing for pCPU's time, with the vCPU
> actively polling the serial device and barely giving time to the
> emulator thread to make actual progress.
> ---
>  hw/char/serial.c | 1 +
>  1 file changed, 1 insertion(+)
>
> diff --git a/hw/char/serial.c b/hw/char/serial.c
> index 2c080c9..f26e86b 100644
> --- a/hw/char/serial.c
> +++ b/hw/char/serial.c
> @@ -262,6 +262,7 @@ static void serial_xmit(SerialState *s)
>              /* in loopback mode, say that we just received a char */
>              serial_receive1(s, &s->tsr, 1);
>          } else if (qemu_chr_fe_write(&s->chr, &s->tsr, 1) != 1 &&
> +                   errno != EPIPE &&
>                     s->tsr_retry < MAX_XMIT_RETRY) {

Instead of adding explicit handling of EPIPE, shouldn't the code be
rewritten to treat -1 return && errno != EAGAIN as fatal?

>              assert(s->watch_tag == 0);
>              s->watch_tag =
> --
> 1.8.3.1
>
>



-- 
Marc-André Lureau

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [Qemu-devel] [PATCH 3/3] hw/char/serial: Don't retry on serial_xmit if errno == EPIPE
  2018-05-31  9:52   ` Marc-André Lureau
@ 2018-05-31 11:03     ` Sergio Lopez
  2018-05-31 11:07       ` Marc-André Lureau
  2018-06-01 11:52     ` Daniel P. Berrangé
  1 sibling, 1 reply; 18+ messages in thread
From: Sergio Lopez @ 2018-05-31 11:03 UTC (permalink / raw)
  To: Marc-André Lureau
  Cc: Daniel P. Berrange, Paolo Bonzini, Michael S. Tsirkin, QEMU

On Thu, May 31, 2018 at 11:52:01AM +0200, Marc-André Lureau wrote:
> On Thu, May 31, 2018 at 9:46 AM, Sergio Lopez <slp@redhat.com> wrote:
> > If writing to the frontend channel failed with EPIPE, don't set up a
> > retry. EPIPE is not a recoverable error, so trying again is a waste of CPU
> > cycles.
> >
> > If the vCPU writing to the serial device and emulator thread are pinned
> > to the same pCPU, it can also compromise the stability of the Guest OS,
> > as both threads will be competing for pCPU's time, with the vCPU
> > actively polling the serial device and barely giving time to the
> > emulator thread to make actual progress.
> > ---
> >  hw/char/serial.c | 1 +
> >  1 file changed, 1 insertion(+)
> >
> > diff --git a/hw/char/serial.c b/hw/char/serial.c
> > index 2c080c9..f26e86b 100644
> > --- a/hw/char/serial.c
> > +++ b/hw/char/serial.c
> > @@ -262,6 +262,7 @@ static void serial_xmit(SerialState *s)
> >              /* in loopback mode, say that we just received a char */
> >              serial_receive1(s, &s->tsr, 1);
> >          } else if (qemu_chr_fe_write(&s->chr, &s->tsr, 1) != 1 &&
> > +                   errno != EPIPE &&
> >                     s->tsr_retry < MAX_XMIT_RETRY) {
> 
> Instead of adding explicit handling of EPIPE, shouldn't the code be
> rewritten to treat -1 return && errno != EAGAIN as fatal?

What kind of action should it trigger if treating errno != EAGAIN as fatal?

If you meant something like calling 'abort()', I think that, while this
could be considered as the "correct" behavior, it's a bit unpractical
for the users. The most common situation for this to happen is
restarting virtlogd while having the console/serial redirected to it.
The VM will stop writing to the serial device, but otherwise the Guest
OS can probably keep working without any issues.

Sergio.

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [Qemu-devel] [PATCH 2/3] chardev: On QIO_CHANNEL_ERR_BROKEN set errno to EPIPE
  2018-05-31  9:48   ` Marc-André Lureau
@ 2018-05-31 11:03     ` Sergio Lopez
  0 siblings, 0 replies; 18+ messages in thread
From: Sergio Lopez @ 2018-05-31 11:03 UTC (permalink / raw)
  To: Marc-André Lureau
  Cc: Daniel P. Berrange, Paolo Bonzini, Michael S. Tsirkin, QEMU

On Thu, May 31, 2018 at 11:48:25AM +0200, Marc-André Lureau wrote:
> On Thu, May 31, 2018 at 9:46 AM, Sergio Lopez <slp@redhat.com> wrote:
> > This allows callers to identify this potentially unrecoverable error.
> 
> (missing signed-off)

Ouch, sorry, will add it in v2.

Sergio.

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [Qemu-devel] [PATCH 3/3] hw/char/serial: Don't retry on serial_xmit if errno == EPIPE
  2018-05-31 11:03     ` Sergio Lopez
@ 2018-05-31 11:07       ` Marc-André Lureau
  0 siblings, 0 replies; 18+ messages in thread
From: Marc-André Lureau @ 2018-05-31 11:07 UTC (permalink / raw)
  To: Sergio Lopez; +Cc: Daniel P. Berrange, Paolo Bonzini, Michael S. Tsirkin, QEMU

Hi

On Thu, May 31, 2018 at 1:03 PM, Sergio Lopez <slp@redhat.com> wrote:
> On Thu, May 31, 2018 at 11:52:01AM +0200, Marc-André Lureau wrote:
>> On Thu, May 31, 2018 at 9:46 AM, Sergio Lopez <slp@redhat.com> wrote:
>> > If writing to the frontend channel failed with EPIPE, don't set up a
>> > retry. EPIPE is not a recoverable error, so trying again is a waste of CPU
>> > cycles.
>> >
>> > If the vCPU writing to the serial device and emulator thread are pinned
>> > to the same pCPU, it can also compromise the stability of the Guest OS,
>> > as both threads will be competing for pCPU's time, with the vCPU
>> > actively polling the serial device and barely giving time to the
>> > emulator thread to make actual progress.
>> > ---
>> >  hw/char/serial.c | 1 +
>> >  1 file changed, 1 insertion(+)
>> >
>> > diff --git a/hw/char/serial.c b/hw/char/serial.c
>> > index 2c080c9..f26e86b 100644
>> > --- a/hw/char/serial.c
>> > +++ b/hw/char/serial.c
>> > @@ -262,6 +262,7 @@ static void serial_xmit(SerialState *s)
>> >              /* in loopback mode, say that we just received a char */
>> >              serial_receive1(s, &s->tsr, 1);
>> >          } else if (qemu_chr_fe_write(&s->chr, &s->tsr, 1) != 1 &&
>> > +                   errno != EPIPE &&
>> >                     s->tsr_retry < MAX_XMIT_RETRY) {
>>
>> Instead of adding explicit handling of EPIPE, shouldn't the code be
>> rewritten to treat -1 return && errno != EAGAIN as fatal?
>
> What kind of action should it trigger if treating errno != EAGAIN as fatal?

"fatal" was perhaps the wrong word. I mean that we don't go into the
retry loop and treat the chardev as disconnected (without explicit
handling of EPIPE).

>
> If you meant something like calling 'abort()', I think that, while this
> could be considered as the "correct" behavior, it's a bit unpractical
> for the users. The most common situation for this to happen is
> restarting virtlogd while having the console/serial redirected to it.
> The VM will stop writing to the serial device, but otherwise the Guest
> OS can probably keep working without any issues.
>
> Sergio.



-- 
Marc-André Lureau

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [Qemu-devel] [PATCH 1/3] io: Implement QIO_CHANNEL_ERR_BROKEN
  2018-05-31  7:45 ` [Qemu-devel] [PATCH 1/3] io: Implement QIO_CHANNEL_ERR_BROKEN Sergio Lopez
@ 2018-06-01 11:50   ` Daniel P. Berrangé
  0 siblings, 0 replies; 18+ messages in thread
From: Daniel P. Berrangé @ 2018-06-01 11:50 UTC (permalink / raw)
  To: Sergio Lopez; +Cc: pbonzini, marcandre.lureau, mst, qemu-devel

On Thu, May 31, 2018 at 09:45:59AM +0200, Sergio Lopez wrote:
> QIO_CHANNEL_ERR_BROKEN is used to identify a potentially unrecoverable
> error in the channel, like EPIPE.
> ---
>  include/io/channel.h | 1 +
>  io/channel-file.c    | 3 +++
>  2 files changed, 4 insertions(+)
> 
> diff --git a/include/io/channel.h b/include/io/channel.h
> index e8cdadb..bbe45f6 100644
> --- a/include/io/channel.h
> +++ b/include/io/channel.h
> @@ -38,6 +38,7 @@ typedef struct QIOChannel QIOChannel;
>  typedef struct QIOChannelClass QIOChannelClass;
>  
>  #define QIO_CHANNEL_ERR_BLOCK -2
> +#define QIO_CHANNEL_ERR_BROKEN -3
>  
>  typedef enum QIOChannelFeature QIOChannelFeature;
>  
> diff --git a/io/channel-file.c b/io/channel-file.c
> index db948ab..a990f67 100644
> --- a/io/channel-file.c
> +++ b/io/channel-file.c
> @@ -124,6 +124,9 @@ static ssize_t qio_channel_file_writev(QIOChannel *ioc,
>          if (errno == EAGAIN) {
>              return QIO_CHANNEL_ERR_BLOCK;
>          }
> +        if (errno == EPIPE) {
> +            return QIO_CHANNEL_ERR_BROKEN;
> +        }
>          if (errno == EINTR) {
>              goto retry;
>          }

I don't see a compelling reason to add this.  EAGAIN needs special
handling because we do not wish to set the "Error **errp" object.
EPIPE can be just handled by callers in the same way as any other
errno value, by closing the connection.

Regards,
Daniel
-- 
|: https://berrange.com      -o-    https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org         -o-            https://fstop138.berrange.com :|
|: https://entangle-photo.org    -o-    https://www.instagram.com/dberrange :|

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [Qemu-devel] [PATCH 2/3] chardev: On QIO_CHANNEL_ERR_BROKEN set errno to EPIPE
  2018-05-31  7:46 ` [Qemu-devel] [PATCH 2/3] chardev: On QIO_CHANNEL_ERR_BROKEN set errno to EPIPE Sergio Lopez
  2018-05-31  9:48   ` Marc-André Lureau
@ 2018-06-01 11:51   ` Daniel P. Berrangé
  1 sibling, 0 replies; 18+ messages in thread
From: Daniel P. Berrangé @ 2018-06-01 11:51 UTC (permalink / raw)
  To: Sergio Lopez; +Cc: pbonzini, marcandre.lureau, mst, qemu-devel

On Thu, May 31, 2018 at 09:46:00AM +0200, Sergio Lopez wrote:
> This allows callers to identify this potentially unrecoverable error.
> ---
>  chardev/char-io.c | 3 +++
>  1 file changed, 3 insertions(+)
> 
> diff --git a/chardev/char-io.c b/chardev/char-io.c
> index f810524..f934eb9 100644
> --- a/chardev/char-io.c
> +++ b/chardev/char-io.c
> @@ -168,6 +168,9 @@ int io_channel_send_full(QIOChannel *ioc,
>  
>              errno = EAGAIN;
>              return -1;
> +        } else if (ret == QIO_CHANNEL_ERR_BROKEN) {
> +            errno = EPIPE;
> +            return -1;
>          } else if (ret < 0) {
>              errno = EINVAL;
>              return -1;

Again, I don't see any reason to add this special case for EPIPE


Regards,
Daniel
-- 
|: https://berrange.com      -o-    https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org         -o-            https://fstop138.berrange.com :|
|: https://entangle-photo.org    -o-    https://www.instagram.com/dberrange :|

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [Qemu-devel] [PATCH 3/3] hw/char/serial: Don't retry on serial_xmit if errno == EPIPE
  2018-05-31  9:52   ` Marc-André Lureau
  2018-05-31 11:03     ` Sergio Lopez
@ 2018-06-01 11:52     ` Daniel P. Berrangé
  2018-06-01 12:05       ` Sergio Lopez
  1 sibling, 1 reply; 18+ messages in thread
From: Daniel P. Berrangé @ 2018-06-01 11:52 UTC (permalink / raw)
  To: Marc-André Lureau
  Cc: Sergio Lopez, Paolo Bonzini, Michael S. Tsirkin, QEMU

On Thu, May 31, 2018 at 11:52:01AM +0200, Marc-André Lureau wrote:
> Hi
> 
> On Thu, May 31, 2018 at 9:46 AM, Sergio Lopez <slp@redhat.com> wrote:
> > If writing to the frontend channel failed with EPIPE, don't set up a
> > retry. EPIPE is not a recoverable error, so trying again is a waste of CPU
> > cycles.
> >
> > If the vCPU writing to the serial device and emulator thread are pinned
> > to the same pCPU, it can also compromise the stability of the Guest OS,
> > as both threads will be competing for pCPU's time, with the vCPU
> > actively polling the serial device and barely giving time to the
> > emulator thread to make actual progress.
> > ---
> >  hw/char/serial.c | 1 +
> >  1 file changed, 1 insertion(+)
> >
> > diff --git a/hw/char/serial.c b/hw/char/serial.c
> > index 2c080c9..f26e86b 100644
> > --- a/hw/char/serial.c
> > +++ b/hw/char/serial.c
> > @@ -262,6 +262,7 @@ static void serial_xmit(SerialState *s)
> >              /* in loopback mode, say that we just received a char */
> >              serial_receive1(s, &s->tsr, 1);
> >          } else if (qemu_chr_fe_write(&s->chr, &s->tsr, 1) != 1 &&
> > +                   errno != EPIPE &&
> >                     s->tsr_retry < MAX_XMIT_RETRY) {
> 
> Instead of adding explicit handling of EPIPE, shouldn't the code be
> rewritten to treat -1 return && errno != EAGAIN as fatal?

Yes, exactly this code is already broken for every single errno
value, not simply EPIPE. It needs fixing to treat '-1' return code
correctly instead of retrying everything.

> 
> >              assert(s->watch_tag == 0);
> >              s->watch_tag =

Regards,
Daniel
-- 
|: https://berrange.com      -o-    https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org         -o-            https://fstop138.berrange.com :|
|: https://entangle-photo.org    -o-    https://www.instagram.com/dberrange :|

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [Qemu-devel] [PATCH 3/3] hw/char/serial: Don't retry on serial_xmit if errno == EPIPE
  2018-06-01 11:52     ` Daniel P. Berrangé
@ 2018-06-01 12:05       ` Sergio Lopez
  2018-06-01 12:16         ` Paolo Bonzini
  0 siblings, 1 reply; 18+ messages in thread
From: Sergio Lopez @ 2018-06-01 12:05 UTC (permalink / raw)
  To: Daniel P. Berrangé
  Cc: Marc-André Lureau, Paolo Bonzini, Michael S. Tsirkin, QEMU

On Fri, Jun 01, 2018 at 12:52:12PM +0100, Daniel P. Berrangé wrote:
> On Thu, May 31, 2018 at 11:52:01AM +0200, Marc-André Lureau wrote:
> > Hi
> > 
> > On Thu, May 31, 2018 at 9:46 AM, Sergio Lopez <slp@redhat.com> wrote:
> > > If writing to the frontend channel failed with EPIPE, don't set up a
> > > retry. EPIPE is not a recoverable error, so trying again is a waste of CPU
> > > cycles.
> > >
> > > If the vCPU writing to the serial device and emulator thread are pinned
> > > to the same pCPU, it can also compromise the stability of the Guest OS,
> > > as both threads will be competing for pCPU's time, with the vCPU
> > > actively polling the serial device and barely giving time to the
> > > emulator thread to make actual progress.
> > > ---
> > >  hw/char/serial.c | 1 +
> > >  1 file changed, 1 insertion(+)
> > >
> > > diff --git a/hw/char/serial.c b/hw/char/serial.c
> > > index 2c080c9..f26e86b 100644
> > > --- a/hw/char/serial.c
> > > +++ b/hw/char/serial.c
> > > @@ -262,6 +262,7 @@ static void serial_xmit(SerialState *s)
> > >              /* in loopback mode, say that we just received a char */
> > >              serial_receive1(s, &s->tsr, 1);
> > >          } else if (qemu_chr_fe_write(&s->chr, &s->tsr, 1) != 1 &&
> > > +                   errno != EPIPE &&
> > >                     s->tsr_retry < MAX_XMIT_RETRY) {
> > 
> > Instead of adding explicit handling of EPIPE, shouldn't the code be
> > rewritten to treat -1 return && errno != EAGAIN as fatal?
> 
> Yes, exactly this code is already broken for every single errno
> value, not simply EPIPE. It needs fixing to treat '-1' return code
> correctly instead of retrying everything.

Given that EAGAIN is already taken care of in
chardev/char.c:qemu_chr_write_buffer, in which cases should we retry? Or
should we just drop all the tsr_retry logic?

Sergio.

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [Qemu-devel] [PATCH 3/3] hw/char/serial: Don't retry on serial_xmit if errno == EPIPE
  2018-06-01 12:05       ` Sergio Lopez
@ 2018-06-01 12:16         ` Paolo Bonzini
  2018-06-01 12:28           ` Sergio Lopez
  0 siblings, 1 reply; 18+ messages in thread
From: Paolo Bonzini @ 2018-06-01 12:16 UTC (permalink / raw)
  To: Sergio Lopez, Daniel P. Berrangé
  Cc: Marc-André Lureau, Michael S. Tsirkin, QEMU

On 01/06/2018 14:05, Sergio Lopez wrote:
>>> Instead of adding explicit handling of EPIPE, shouldn't the code be
>>> rewritten to treat -1 return && errno != EAGAIN as fatal?
>> Yes, exactly this code is already broken for every single errno
>> value, not simply EPIPE. It needs fixing to treat '-1' return code
>> correctly instead of retrying everything.
> Given that EAGAIN is already taken care of in
> chardev/char.c:qemu_chr_write_buffer, in which cases should we retry? Or
> should we just drop all the tsr_retry logic?

It's not handled there, the call from qemu_chr_fe_write has write_all ==
false.

Paolo

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [Qemu-devel] [PATCH 3/3] hw/char/serial: Don't retry on serial_xmit if errno == EPIPE
  2018-06-01 12:16         ` Paolo Bonzini
@ 2018-06-01 12:28           ` Sergio Lopez
  2018-06-01 12:31             ` Paolo Bonzini
  2018-06-01 12:43             ` Peter Maydell
  0 siblings, 2 replies; 18+ messages in thread
From: Sergio Lopez @ 2018-06-01 12:28 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: Daniel P. Berrangé,
	Marc-André Lureau, Michael S. Tsirkin, QEMU

On Fri, Jun 01, 2018 at 02:16:59PM +0200, Paolo Bonzini wrote:
> On 01/06/2018 14:05, Sergio Lopez wrote:
> >>> Instead of adding explicit handling of EPIPE, shouldn't the code be
> >>> rewritten to treat -1 return && errno != EAGAIN as fatal?
> >> Yes, exactly this code is already broken for every single errno
> >> value, not simply EPIPE. It needs fixing to treat '-1' return code
> >> correctly instead of retrying everything.
> > Given that EAGAIN is already taken care of in
> > chardev/char.c:qemu_chr_write_buffer, in which cases should we retry? Or
> > should we just drop all the tsr_retry logic?
> 
> It's not handled there, the call from qemu_chr_fe_write has write_all ==
> false.

You're right. So should we just retry only for -1 && errno == EAGAIN and
just ignore the error otherwise?

Sergio.

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [Qemu-devel] [PATCH 3/3] hw/char/serial: Don't retry on serial_xmit if errno == EPIPE
  2018-06-01 12:28           ` Sergio Lopez
@ 2018-06-01 12:31             ` Paolo Bonzini
  2018-06-01 12:43             ` Peter Maydell
  1 sibling, 0 replies; 18+ messages in thread
From: Paolo Bonzini @ 2018-06-01 12:31 UTC (permalink / raw)
  To: Sergio Lopez
  Cc: Daniel P. Berrangé,
	Marc-André Lureau, Michael S. Tsirkin, QEMU

On 01/06/2018 14:28, Sergio Lopez wrote:
> On Fri, Jun 01, 2018 at 02:16:59PM +0200, Paolo Bonzini wrote:
>> On 01/06/2018 14:05, Sergio Lopez wrote:
>>>>> Instead of adding explicit handling of EPIPE, shouldn't the code be
>>>>> rewritten to treat -1 return && errno != EAGAIN as fatal?
>>>> Yes, exactly this code is already broken for every single errno
>>>> value, not simply EPIPE. It needs fixing to treat '-1' return code
>>>> correctly instead of retrying everything.
>>> Given that EAGAIN is already taken care of in
>>> chardev/char.c:qemu_chr_write_buffer, in which cases should we retry? Or
>>> should we just drop all the tsr_retry logic?
>>
>> It's not handled there, the call from qemu_chr_fe_write has write_all ==
>> false.
> 
> You're right. So should we just retry only for -1 && errno == EAGAIN and
> just ignore the error otherwise?

Probably, yes.

Paolo

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [Qemu-devel] [PATCH 3/3] hw/char/serial: Don't retry on serial_xmit if errno == EPIPE
  2018-06-01 12:28           ` Sergio Lopez
  2018-06-01 12:31             ` Paolo Bonzini
@ 2018-06-01 12:43             ` Peter Maydell
  2018-06-01 16:20               ` Sergio Lopez
  1 sibling, 1 reply; 18+ messages in thread
From: Peter Maydell @ 2018-06-01 12:43 UTC (permalink / raw)
  To: Sergio Lopez
  Cc: Paolo Bonzini, Marc-André Lureau, QEMU, Michael S. Tsirkin

On 1 June 2018 at 13:28, Sergio Lopez <slp@redhat.com> wrote:
> On Fri, Jun 01, 2018 at 02:16:59PM +0200, Paolo Bonzini wrote:
>> On 01/06/2018 14:05, Sergio Lopez wrote:
>> >>> Instead of adding explicit handling of EPIPE, shouldn't the code be
>> >>> rewritten to treat -1 return && errno != EAGAIN as fatal?
>> >> Yes, exactly this code is already broken for every single errno
>> >> value, not simply EPIPE. It needs fixing to treat '-1' return code
>> >> correctly instead of retrying everything.
>> > Given that EAGAIN is already taken care of in
>> > chardev/char.c:qemu_chr_write_buffer, in which cases should we retry? Or
>> > should we just drop all the tsr_retry logic?
>>
>> It's not handled there, the call from qemu_chr_fe_write has write_all ==
>> false.
>
> You're right. So should we just retry only for -1 && errno == EAGAIN and
> just ignore the error otherwise?

I don't think we should make all of qemu_chr_fe_write()'s callers
have to handle EAGAIN. That function already has a way to tell
the caller that it didn't consume any data, which is to return 0.

(EAGAIN is a curse and we should strive to avoid it spreading its
poison into areas of the codebase that we can keep it out of.)

In general there are three cases I think qemu_chr_fe_write() callers
might care about:
 * data was consumed (return >0)
 * data wasn't consumed, try again later (return 0)
 * data wasn't consumed, and a later attempt won't work either
   (return -1)

NB that hw/char/serial.c is not the only serial device using
this pattern and probably needing adjustment (though I think it's
the only one with the complicated tsr_retry logic).

In the patch that started this thread the report is that we
use lots of CPU. Can you explain why that happens? I was expecting
that we would set up the qemu_chr_fe_add_watch() on the dead
chr FE and it would just never fire since the FE can never
accept further data...

thanks
-- PMM

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [Qemu-devel] [PATCH 3/3] hw/char/serial: Don't retry on serial_xmit if errno == EPIPE
  2018-06-01 12:43             ` Peter Maydell
@ 2018-06-01 16:20               ` Sergio Lopez
  0 siblings, 0 replies; 18+ messages in thread
From: Sergio Lopez @ 2018-06-01 16:20 UTC (permalink / raw)
  To: Peter Maydell
  Cc: Paolo Bonzini, Marc-André Lureau, QEMU, Michael S. Tsirkin

On Fri, Jun 01, 2018 at 01:43:56PM +0100, Peter Maydell wrote:
> On 1 June 2018 at 13:28, Sergio Lopez <slp@redhat.com> wrote:
> > On Fri, Jun 01, 2018 at 02:16:59PM +0200, Paolo Bonzini wrote:
> >> On 01/06/2018 14:05, Sergio Lopez wrote:
> >> >>> Instead of adding explicit handling of EPIPE, shouldn't the code be
> >> >>> rewritten to treat -1 return && errno != EAGAIN as fatal?
> >> >> Yes, exactly this code is already broken for every single errno
> >> >> value, not simply EPIPE. It needs fixing to treat '-1' return code
> >> >> correctly instead of retrying everything.
> >> > Given that EAGAIN is already taken care of in
> >> > chardev/char.c:qemu_chr_write_buffer, in which cases should we retry? Or
> >> > should we just drop all the tsr_retry logic?
> >>
> >> It's not handled there, the call from qemu_chr_fe_write has write_all ==
> >> false.
> >
> > You're right. So should we just retry only for -1 && errno == EAGAIN and
> > just ignore the error otherwise?
> 
> I don't think we should make all of qemu_chr_fe_write()'s callers
> have to handle EAGAIN. That function already has a way to tell
> the caller that it didn't consume any data, which is to return 0.
> 
> (EAGAIN is a curse and we should strive to avoid it spreading its
> poison into areas of the codebase that we can keep it out of.)
> 
> In general there are three cases I think qemu_chr_fe_write() callers
> might care about:
>  * data was consumed (return >0)
>  * data wasn't consumed, try again later (return 0)
>  * data wasn't consumed, and a later attempt won't work either
>    (return -1)
> 
> NB that hw/char/serial.c is not the only serial device using
> this pattern and probably needing adjustment (though I think it's
> the only one with the complicated tsr_retry logic).
> 
> In the patch that started this thread the report is that we
> use lots of CPU. Can you explain why that happens? I was expecting
> that we would set up the qemu_chr_fe_add_watch() on the dead
> chr FE and it would just never fire since the FE can never
> accept further data...

The callback is indeed triggered (I haven't looked why yet), and both
the main thread and the vCPU that issued the write to the serial port
(which is now polling the device to see if the transmitter is empty) are
fighting for the qemu_global_mutex.

This issue is heavily amplified if both threads are sharing the same
pCPU (by chance or pinning).

Some debugging info from a simulation:


 - vCPU thread waiting to acquire qemu_global_mutex:

Thread 3 (Thread 0x7fb6c7fff700 (LWP 31641)):
#0  0x00007fb6e591c4cd in __lll_lock_wait () at ../nptl/sysdeps/unix/sysv/linux/x86_64/lowlevellock.S:135
#1  0x00007fb6e5917dcb in _L_lock_812 () at /lib64/libpthread.so.0
#2  0x00007fb6e5917c98 in __GI___pthread_mutex_lock (mutex=mutex@entry=0x556dfb7b6420 <qemu_global_mutex>) at ../nptl/pthread_mutex_lock.c:79
#3  0x0000556dfaebcfd7 in qemu_mutex_lock_impl (mutex=mutex@entry=0x556dfb7b6420 <qemu_global_mutex>, file=file@entry=0x556dfaf57477 "/root/Projects/qemu/cpus.c", line=line@entry=1765)
    at util/qemu-thread-posix.c:67
#4  0x0000556dfaacfe58 in qemu_mutex_lock_iothread () at /root/Projects/qemu/cpus.c:1765
#5  0x0000556dfaa90bcd in prepare_mmio_access (mr=0x556dfe71c0e0, mr=0x556dfe71c0e0)
    at /root/Projects/qemu/exec.c:3068
#6  0x0000556dfaa95f98 in flatview_read_continue (fv=fv@entry=0x7fb6c028aa80, addr=addr@entry=1021, attrs=..., attrs@entry=..., buf=buf@entry=0x7fb6e739f000 "", len=len@entry=1, addr1=5, l=1, mr=0x556dfe71c0e0)
    at /root/Projects/qemu/exec.c:3189
---Type <return> to continue, or q <return> to quit---
#7  0x0000556dfaa9617c in flatview_read (fv=0x7fb6c028aa80, addr=1021, attrs=..., buf=0x7fb6e739f000 "", len=1) at /root/Projects/qemu/exec.c:3255
#8  0x0000556dfaa9629f in address_space_read_full (as=<optimized out>, addr=addr@entry=1021, attrs=..., buf=<optimized out>, len=len@entry=1) at /root/Projects/qemu/exec.c:3268
#9  0x0000556dfaa963fa in address_space_rw (as=<optimized out>, addr=addr@entry=1021, attrs=...,
    attrs@entry=..., buf=<optimized out>, len=len@entry=1, is_write=is_write@entry=false)
    at /root/Projects/qemu/exec.c:3298
#10 0x0000556dfaaf6536 in kvm_cpu_exec (count=1, size=1, direction=<optimized out>, data=<optimized out>, attrs=..., port=1021) at /root/Projects/qemu/accel/kvm/kvm-all.c:1730
#11 0x0000556dfaaf6536 in kvm_cpu_exec (cpu=cpu@entry=0x556dfd3d8020)
    at /root/Projects/qemu/accel/kvm/kvm-all.c:1970
#12 0x0000556dfaad0006 in qemu_kvm_cpu_thread_fn (arg=0x556dfd3d8020) at /root/Projects/qemu/cpus.c:1215
#13 0x00007fb6e5915dd5 in start_thread (arg=0x7fb6c7fff700) at pthread_create.c:308
#14 0x00007fb6d9e97b3d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:113


 - The owner is LWP 31634, the main thread

 (gdb) p qemu_global_mutex
$1 = {lock = {__data = {__lock = 2, __count = 0, __owner = 31634, __nusers = 1, __kind = 0, __spins = 0,
      __elision = 0, __list = {__prev = 0x0, __next = 0x0}},
    __size = "\002\000\000\000\000\000\000\000\222{\000\000\001", '\000' <repeats 26 times>,
    __align = 2}, initialized = true}


 - The main thead is in the callback

 Thread 1 (Thread 0x7fb6e7355cc0 (LWP 31634)):
---Type <return> to continue, or q <return> to quit---
#0  0x0000556dfac59420 in serial_watch_cb (chan=0x556dfd321400, cond=G_IO_OUT, opaque=0x556dfe71c000)
    at hw/char/serial.c:233
#1  0x00007fb6e68638f9 in g_main_context_dispatch (context=0x556dfd314210) at gmain.c:3146
#2  0x00007fb6e68638f9 in g_main_context_dispatch (context=context@entry=0x556dfd314210) at gmain.c:3811
#3  0x0000556dfaeba126 in main_loop_wait () at util/main-loop.c:215
#4  0x0000556dfaeba126 in main_loop_wait (timeout=<optimized out>) at util/main-loop.c:263
#5  0x0000556dfaeba126 in main_loop_wait (nonblocking=nonblocking@entry=0) at util/main-loop.c:522
#6  0x0000556dfaa89a2f in main () at vl.c:1943
#7  0x0000556dfaa89a2f in main (argc=<optimized out>, argv=<optimized out>, envp=<optimized out>)
    at vl.c:4679


Given your question, I was wondering what would happen if the callback
never fired. The result is much, much worse, as QEMU doesn't set
UART_LSR_TEMT, and the Guest OS (RHEL7.5) keeps polling the serial
device for ~10 seconds. Of course this can be regarded as a issue if the
Guest's serial driver, but I'm pretty sure we don't want to break it ;-)

This is not the first time I'm fighting the retry logic in serial.c.
Some versions back, serial_xmit would happily add up watchers with no
limit, eventually exceeding FD_SETSIZE limit. I'm starting to thing that
tsr_retry logic does more harm than good.

In any case, I think the question on the table is when should
serial_xmit retry writing to the FE, and we seem to have two options:

 1. When ret != 1 and errno == EAGAIN.

 2. When ret == 0.

I'm good with either, so I leave the decision to you.

Sergio.

^ permalink raw reply	[flat|nested] 18+ messages in thread

end of thread, other threads:[~2018-06-01 16:20 UTC | newest]

Thread overview: 18+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2018-05-31  7:45 [Qemu-devel] [PATCH 0/3] Avoid retrying on serial_xmit with EPIPE Sergio Lopez
2018-05-31  7:45 ` [Qemu-devel] [PATCH 1/3] io: Implement QIO_CHANNEL_ERR_BROKEN Sergio Lopez
2018-06-01 11:50   ` Daniel P. Berrangé
2018-05-31  7:46 ` [Qemu-devel] [PATCH 2/3] chardev: On QIO_CHANNEL_ERR_BROKEN set errno to EPIPE Sergio Lopez
2018-05-31  9:48   ` Marc-André Lureau
2018-05-31 11:03     ` Sergio Lopez
2018-06-01 11:51   ` Daniel P. Berrangé
2018-05-31  7:46 ` [Qemu-devel] [PATCH 3/3] hw/char/serial: Don't retry on serial_xmit if errno == EPIPE Sergio Lopez
2018-05-31  9:52   ` Marc-André Lureau
2018-05-31 11:03     ` Sergio Lopez
2018-05-31 11:07       ` Marc-André Lureau
2018-06-01 11:52     ` Daniel P. Berrangé
2018-06-01 12:05       ` Sergio Lopez
2018-06-01 12:16         ` Paolo Bonzini
2018-06-01 12:28           ` Sergio Lopez
2018-06-01 12:31             ` Paolo Bonzini
2018-06-01 12:43             ` Peter Maydell
2018-06-01 16:20               ` Sergio Lopez

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).