* [PATCH 1/2] xen-9pfs: Fix log messages of reply errors
  2020-05-10 17:41 [PATCH 0/2] 9pfs: regression init_in_iov_from_pdu truncating size Christian Schoenebeck
@ 2020-05-10 17:05 ` Christian Schoenebeck
  2020-05-11 22:09   ` Stefano Stabellini
  2020-05-10 17:18 ` [PATCH 2/2] 9pfs: fix init_in_iov_from_pdu truncating size Christian Schoenebeck
  2020-05-12  9:38 ` [PATCH 0/2] 9pfs: regression " Greg Kurz
  2 siblings, 1 reply; 19+ messages in thread
From: Christian Schoenebeck @ 2020-05-10 17:05 UTC (permalink / raw)
  To: qemu-devel; +Cc: Greg Kurz, Stefano Stabellini, Anthony Perard, Paul Durrant

If delivery of some 9pfs response fails for some reason, log the
error message with the 9P protocol reply type, not the client's
request type. The latter could mislead readers into thinking the
error already occurred while handling the request input.

Signed-off-by: Christian Schoenebeck <qemu_oss@crudebyte.com>
---
 hw/9pfs/xen-9p-backend.c | 9 +++++----
 1 file changed, 5 insertions(+), 4 deletions(-)

diff --git a/hw/9pfs/xen-9p-backend.c b/hw/9pfs/xen-9p-backend.c
index 18fe5b7c92..f04caabfe5 100644
--- a/hw/9pfs/xen-9p-backend.c
+++ b/hw/9pfs/xen-9p-backend.c
@@ -137,7 +137,8 @@ static ssize_t xen_9pfs_pdu_vmarshal(V9fsPDU *pdu,
     ret = v9fs_iov_vmarshal(in_sg, num, offset, 0, fmt, ap);
     if (ret < 0) {
         xen_pv_printf(&xen_9pfs->xendev, 0,
-                      "Failed to encode VirtFS request type %d\n", pdu->id + 1);
+                      "Failed to encode VirtFS reply type %d\n",
+                      pdu->id + 1);
         xen_be_set_state(&xen_9pfs->xendev, XenbusStateClosing);
         xen_9pfs_disconnect(&xen_9pfs->xendev);
     }
@@ -201,9 +202,9 @@ static void xen_9pfs_init_in_iov_from_pdu(V9fsPDU *pdu,
 
     buf_size = iov_size(ring->sg, num);
     if (buf_size  < P9_IOHDRSZ) {
-        xen_pv_printf(&xen_9pfs->xendev, 0, "Xen 9pfs request type %d"
-                "needs %zu bytes, buffer has %zu, less than minimum\n",
-                pdu->id, *size, buf_size);
+        xen_pv_printf(&xen_9pfs->xendev, 0, "Xen 9pfs reply type %d needs "
+                      "%zu bytes, buffer has %zu, less than minimum\n",
+                      pdu->id + 1, *size, buf_size);
         xen_be_set_state(&xen_9pfs->xendev, XenbusStateClosing);
         xen_9pfs_disconnect(&xen_9pfs->xendev);
     }
-- 
2.20.1



^ permalink raw reply related	[flat|nested] 19+ messages in thread

* [PATCH 2/2] 9pfs: fix init_in_iov_from_pdu truncating size
  2020-05-10 17:41 [PATCH 0/2] 9pfs: regression init_in_iov_from_pdu truncating size Christian Schoenebeck
  2020-05-10 17:05 ` [PATCH 1/2] xen-9pfs: Fix log messages of reply errors Christian Schoenebeck
@ 2020-05-10 17:18 ` Christian Schoenebeck
  2020-05-10 18:43   ` Christian Schoenebeck
  2020-05-11 22:09   ` Stefano Stabellini
  2020-05-12  9:38 ` [PATCH 0/2] 9pfs: regression " Greg Kurz
  2 siblings, 2 replies; 19+ messages in thread
From: Christian Schoenebeck @ 2020-05-10 17:18 UTC (permalink / raw)
  To: qemu-devel; +Cc: Greg Kurz, Stefano Stabellini, Anthony Perard, Paul Durrant

Commit 16724a173049ac29c7b5ade741da93a0f46edff7 introduced
truncating the response to the currently available transport buffer
size, which was supposed to fix a 9pfs error on Xen boot where the
transport buffer might still be smaller than required for the
response.

Unfortunately this change broke small reads (with less than 12
bytes).

To address both concerns, check the actual response type and only
truncate the reply for Rreaddir responses, and only if the truncated
reply would still return at least one payload byte to the client.
Use Rreaddir's precise header size (11) for this instead of
P9_IOHDRSZ.

Fixes: 16724a173049ac29c7b5ade741da93a0f46edff7
Fixes: https://bugs.launchpad.net/bugs/1877688
Signed-off-by: Christian Schoenebeck <qemu_oss@crudebyte.com>
---
 hw/9pfs/virtio-9p-device.c | 35 +++++++++++++++++++++++++++--------
 hw/9pfs/xen-9p-backend.c   | 38 +++++++++++++++++++++++++++++---------
 2 files changed, 56 insertions(+), 17 deletions(-)

diff --git a/hw/9pfs/virtio-9p-device.c b/hw/9pfs/virtio-9p-device.c
index 536447a355..57e4d92ecb 100644
--- a/hw/9pfs/virtio-9p-device.c
+++ b/hw/9pfs/virtio-9p-device.c
@@ -154,15 +154,34 @@ static void virtio_init_in_iov_from_pdu(V9fsPDU *pdu, struct iovec **piov,
     VirtQueueElement *elem = v->elems[pdu->idx];
     size_t buf_size = iov_size(elem->in_sg, elem->in_num);
 
-    if (buf_size < P9_IOHDRSZ) {
-        VirtIODevice *vdev = VIRTIO_DEVICE(v);
+    if (pdu->id + 1 == P9_RREAD) {
+        /* size[4] Rread tag[2] count[4] data[count] */
+        const size_t hdr_size = 11;
+        /*
+         * If current transport buffer size is smaller than actually required
+         * for this Rreaddir response, then truncate the response to the
+         * currently available transport buffer size, however only if it would
+         * at least allow to return 1 payload byte to client.
+         */
+        if (buf_size < hdr_size + 1) {
+            VirtIODevice *vdev = VIRTIO_DEVICE(v);
 
-        virtio_error(vdev,
-                     "VirtFS reply type %d needs %zu bytes, buffer has %zu, less than minimum",
-                     pdu->id + 1, *size, buf_size);
-    }
-    if (buf_size < *size) {
-        *size = buf_size;
+            virtio_error(vdev,
+                         "VirtFS reply type %d needs %zu bytes, buffer has "
+                         "%zu, less than minimum (%zu)",
+                         pdu->id + 1, *size, buf_size, hdr_size + 1);
+        }
+        if (buf_size < *size) {
+            *size = buf_size;
+        }
+    } else {
+        if (buf_size < *size) {
+            VirtIODevice *vdev = VIRTIO_DEVICE(v);
+
+            virtio_error(vdev,
+                         "VirtFS reply type %d needs %zu bytes, buffer has %zu",
+                         pdu->id + 1, *size, buf_size);
+        }
     }
 
     *piov = elem->in_sg;
diff --git a/hw/9pfs/xen-9p-backend.c b/hw/9pfs/xen-9p-backend.c
index f04caabfe5..98f340d24b 100644
--- a/hw/9pfs/xen-9p-backend.c
+++ b/hw/9pfs/xen-9p-backend.c
@@ -201,15 +201,35 @@ static void xen_9pfs_init_in_iov_from_pdu(V9fsPDU *pdu,
     xen_9pfs_in_sg(ring, ring->sg, &num, pdu->idx, *size);
 
     buf_size = iov_size(ring->sg, num);
-    if (buf_size  < P9_IOHDRSZ) {
-        xen_pv_printf(&xen_9pfs->xendev, 0, "Xen 9pfs reply type %d needs "
-                      "%zu bytes, buffer has %zu, less than minimum\n",
-                      pdu->id + 1, *size, buf_size);
-        xen_be_set_state(&xen_9pfs->xendev, XenbusStateClosing);
-        xen_9pfs_disconnect(&xen_9pfs->xendev);
-    }
-    if (buf_size  < *size) {
-        *size = buf_size;
+    if (pdu->id + 1 == P9_RREAD) {
+        /* size[4] Rread tag[2] count[4] data[count] */
+        const size_t hdr_size = 11;
+        /*
+         * If current transport buffer size is smaller than actually required
+         * for this Rreaddir response, then truncate the response to the
+         * currently available transport buffer size, however only if it would
+         * at least allow to return 1 payload byte to client.
+         */
+        if (buf_size < hdr_size + 1) {
+            xen_pv_printf(&xen_9pfs->xendev, 0, "Xen 9pfs reply type %d "
+                          "needs %zu bytes, buffer has %zu, less than "
+                          "minimum (%zu)\n",
+                          pdu->id + 1, *size, buf_size, hdr_size + 1);
+            xen_be_set_state(&xen_9pfs->xendev, XenbusStateClosing);
+            xen_9pfs_disconnect(&xen_9pfs->xendev);
+        }
+        if (buf_size < *size) {
+            *size = buf_size;
+        }
+    } else {
+        if (buf_size < *size) {
+            xen_pv_printf(&xen_9pfs->xendev, 0, "Xen 9pfs reply type %d "
+                         "needs %zu bytes, buffer has %zu\n", pdu->id + 1,
+                          *size, buf_size);
+
+            xen_be_set_state(&xen_9pfs->xendev, XenbusStateClosing);
+            xen_9pfs_disconnect(&xen_9pfs->xendev);
+        }
     }
 
     *piov = ring->sg;
-- 
2.20.1



^ permalink raw reply related	[flat|nested] 19+ messages in thread

* [PATCH 0/2] 9pfs: regression init_in_iov_from_pdu truncating size
@ 2020-05-10 17:41 Christian Schoenebeck
  2020-05-10 17:05 ` [PATCH 1/2] xen-9pfs: Fix log messages of reply errors Christian Schoenebeck
                   ` (2 more replies)
  0 siblings, 3 replies; 19+ messages in thread
From: Christian Schoenebeck @ 2020-05-10 17:41 UTC (permalink / raw)
  To: qemu-devel; +Cc: Greg Kurz, Stefano Stabellini, Anthony Perard, Paul Durrant

Stefano, looks like your original patch needs some more fine tuning:

https://bugs.launchpad.net/bugs/1877688

Please check whether the assumptions I made about Xen are correct, and please
also test whether these changes still work for you with Xen as you intended.

Christian Schoenebeck (2):
  xen-9pfs: Fix log messages of reply errors
  9pfs: fix init_in_iov_from_pdu truncating size

 hw/9pfs/virtio-9p-device.c | 35 ++++++++++++++++++++++++--------
 hw/9pfs/xen-9p-backend.c   | 41 ++++++++++++++++++++++++++++----------
 2 files changed, 58 insertions(+), 18 deletions(-)

-- 
2.20.1



^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH 2/2] 9pfs: fix init_in_iov_from_pdu truncating size
  2020-05-10 17:18 ` [PATCH 2/2] 9pfs: fix init_in_iov_from_pdu truncating size Christian Schoenebeck
@ 2020-05-10 18:43   ` Christian Schoenebeck
  2020-05-11 22:09   ` Stefano Stabellini
  1 sibling, 0 replies; 19+ messages in thread
From: Christian Schoenebeck @ 2020-05-10 18:43 UTC (permalink / raw)
  To: qemu-devel; +Cc: Greg Kurz, Stefano Stabellini, Anthony Perard, Paul Durrant

On Sonntag, 10. Mai 2020 19:18:21 CEST Christian Schoenebeck wrote:
> Commit SHA-1 16724a173049ac29c7b5ade741da93a0f46edff7 introduced
> truncating the response to the currently available transport buffer
> size, which was supposed to fix an 9pfs error on Xen boot where
> transport buffer might still be smaller than required for response.
> 
> Unfortunately this change broke small reads (with less than 12
> bytes).
> 
> To address both concerns, check the actual response type and only
> truncate reply for Rreaddir responses, and only if truncated reply
> would at least return one payload byte to client. Use Rreaddir's
> precise header size (11) for this instead of P9_IOHDRSZ.

Oops, that should have been "Rread's" of course, not "Rreaddir's". Apparently
I've already spent too much time on readdir() code.

If there's nothing more to do, then maybe you could just amend this log
message typo, Greg? I would appreciate it.

Best regards,
Christian Schoenebeck




^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH 2/2] 9pfs: fix init_in_iov_from_pdu truncating size
  2020-05-10 17:18 ` [PATCH 2/2] 9pfs: fix init_in_iov_from_pdu truncating size Christian Schoenebeck
  2020-05-10 18:43   ` Christian Schoenebeck
@ 2020-05-11 22:09   ` Stefano Stabellini
  2020-05-12 11:29     ` Christian Schoenebeck
  1 sibling, 1 reply; 19+ messages in thread
From: Stefano Stabellini @ 2020-05-11 22:09 UTC (permalink / raw)
  To: Christian Schoenebeck
  Cc: Anthony Perard, Stefano Stabellini, Paul Durrant, qemu-devel, Greg Kurz

On Sun, 10 May 2020, Christian Schoenebeck wrote:
> Commit SHA-1 16724a173049ac29c7b5ade741da93a0f46edff7 introduced
> truncating the response to the currently available transport buffer
> size, which was supposed to fix an 9pfs error on Xen boot where
> transport buffer might still be smaller than required for response.
> 
> Unfortunately this change broke small reads (with less than 12
> bytes).
> 
> To address both concerns, check the actual response type and only
> truncate reply for Rreaddir responses, 

I realize you mean "Rread" (not Rreaddir). Are we sure that truncation
can only happen with Rread? I checked the spec and it looks like directories
are pretty much like files from the spec's point of view. So it seems to
me that truncation might be possible there too.


> and only if truncated reply would at least return one payload byte to
> client. Use Rreaddir's precise header size (11) for this instead of
> P9_IOHDRSZ.

Ah! That's the underlying error, isn't it? That P9_IOHDRSZ is not really
the size of the reply header; it is bigger. Hence the check:

  if (buf_size < P9_IOHDRSZ) {

can be wrong for very small sizes.
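
To make that concrete (assuming P9_IOHDRSZ is 24, i.e. the size of a
read/write request header, and that the client posts a reply buffer of just
11 + count bytes): a Tread with count = 4 arrives with a

	11 + 4 = 15

byte reply buffer, which trips the buf_size < P9_IOHDRSZ check even though
the 11-byte Rread header plus 4 payload bytes would fit easily.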



> Fixes: 16724a173049ac29c7b5ade741da93a0f46edff7
> Fixes: https://bugs.launchpad.net/bugs/1877688
> Signed-off-by: Christian Schoenebeck <qemu_oss@crudebyte.com>
> ---
>  hw/9pfs/virtio-9p-device.c | 35 +++++++++++++++++++++++++++--------
>  hw/9pfs/xen-9p-backend.c   | 38 +++++++++++++++++++++++++++++---------
>  2 files changed, 56 insertions(+), 17 deletions(-)
> 
> diff --git a/hw/9pfs/virtio-9p-device.c b/hw/9pfs/virtio-9p-device.c
> index 536447a355..57e4d92ecb 100644
> --- a/hw/9pfs/virtio-9p-device.c
> +++ b/hw/9pfs/virtio-9p-device.c
> @@ -154,15 +154,34 @@ static void virtio_init_in_iov_from_pdu(V9fsPDU *pdu, struct iovec **piov,
>      VirtQueueElement *elem = v->elems[pdu->idx];
>      size_t buf_size = iov_size(elem->in_sg, elem->in_num);
>  
> -    if (buf_size < P9_IOHDRSZ) {
> -        VirtIODevice *vdev = VIRTIO_DEVICE(v);
> +    if (pdu->id + 1 == P9_RREAD) {
> +        /* size[4] Rread tag[2] count[4] data[count] */

4+2+4 = 10


> +        const size_t hdr_size = 11;

Are you adding 1 to account for "count"?


> +        /*
> +         * If current transport buffer size is smaller than actually required
> +         * for this Rreaddir response, then truncate the response to the
> +         * currently available transport buffer size, however only if it would
> +         * at least allow to return 1 payload byte to client.
> +         */
> +        if (buf_size < hdr_size + 1) {

If you have already added 1 before, why do we need to add 1 again here?


> +            VirtIODevice *vdev = VIRTIO_DEVICE(v);
>  
> -        virtio_error(vdev,
> -                     "VirtFS reply type %d needs %zu bytes, buffer has %zu, less than minimum",
> -                     pdu->id + 1, *size, buf_size);
> -    }
> -    if (buf_size < *size) {
> -        *size = buf_size;
> +            virtio_error(vdev,
> +                         "VirtFS reply type %d needs %zu bytes, buffer has "
> +                         "%zu, less than minimum (%zu)",
> +                         pdu->id + 1, *size, buf_size, hdr_size + 1);
> +        }

I think we want to return here


> +        if (buf_size < *size) {
> +            *size = buf_size;
> +        }
> +    } else {
> +        if (buf_size < *size) {
> +            VirtIODevice *vdev = VIRTIO_DEVICE(v);
> +
> +            virtio_error(vdev,
> +                         "VirtFS reply type %d needs %zu bytes, buffer has %zu",
> +                         pdu->id + 1, *size, buf_size);
> +        }
>      }
>  
>      *piov = elem->in_sg;
> diff --git a/hw/9pfs/xen-9p-backend.c b/hw/9pfs/xen-9p-backend.c
> index f04caabfe5..98f340d24b 100644
> --- a/hw/9pfs/xen-9p-backend.c
> +++ b/hw/9pfs/xen-9p-backend.c
> @@ -201,15 +201,35 @@ static void xen_9pfs_init_in_iov_from_pdu(V9fsPDU *pdu,
>      xen_9pfs_in_sg(ring, ring->sg, &num, pdu->idx, *size);
>  
>      buf_size = iov_size(ring->sg, num);
> -    if (buf_size  < P9_IOHDRSZ) {
> -        xen_pv_printf(&xen_9pfs->xendev, 0, "Xen 9pfs reply type %d needs "
> -                      "%zu bytes, buffer has %zu, less than minimum\n",
> -                      pdu->id + 1, *size, buf_size);
> -        xen_be_set_state(&xen_9pfs->xendev, XenbusStateClosing);
> -        xen_9pfs_disconnect(&xen_9pfs->xendev);
> -    }
> -    if (buf_size  < *size) {
> -        *size = buf_size;
> +    if (pdu->id + 1 == P9_RREAD) {
> +        /* size[4] Rread tag[2] count[4] data[count] */
> +        const size_t hdr_size = 11;
> +        /*
> +         * If current transport buffer size is smaller than actually required
> +         * for this Rreaddir response, then truncate the response to the
> +         * currently available transport buffer size, however only if it would
> +         * at least allow to return 1 payload byte to client.
> +         */
> +        if (buf_size < hdr_size + 1) {
> +            xen_pv_printf(&xen_9pfs->xendev, 0, "Xen 9pfs reply type %d "
> +                          "needs %zu bytes, buffer has %zu, less than "
> +                          "minimum (%zu)\n",
> +                          pdu->id + 1, *size, buf_size, hdr_size + 1);
> +            xen_be_set_state(&xen_9pfs->xendev, XenbusStateClosing);
> +            xen_9pfs_disconnect(&xen_9pfs->xendev);

I think we want to return here.


> +        }
> +        if (buf_size < *size) {
> +            *size = buf_size;
> +        }
> +    } else {
> +        if (buf_size < *size) {
> +            xen_pv_printf(&xen_9pfs->xendev, 0, "Xen 9pfs reply type %d "
> +                         "needs %zu bytes, buffer has %zu\n", pdu->id + 1,
> +                          *size, buf_size);
> +
> +            xen_be_set_state(&xen_9pfs->xendev, XenbusStateClosing);
> +            xen_9pfs_disconnect(&xen_9pfs->xendev);
> +        }
>      }
>  
>      *piov = ring->sg;
> -- 
> 2.20.1
> 


^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH 1/2] xen-9pfs: Fix log messages of reply errors
  2020-05-10 17:05 ` [PATCH 1/2] xen-9pfs: Fix log messages of reply errors Christian Schoenebeck
@ 2020-05-11 22:09   ` Stefano Stabellini
  0 siblings, 0 replies; 19+ messages in thread
From: Stefano Stabellini @ 2020-05-11 22:09 UTC (permalink / raw)
  To: Christian Schoenebeck
  Cc: Anthony Perard, Stefano Stabellini, Paul Durrant, qemu-devel, Greg Kurz

On Sun, 10 May 2020, Christian Schoenebeck wrote:
> If delivery of some 9pfs response fails for some reason, log the
> error message by mentioning the 9P protocol reply type, not by
> client's request type. The latter could be misleading that the
> error occurred already when handling the request input.
> 
> Signed-off-by: Christian Schoenebeck <qemu_oss@crudebyte.com>

Acked-by: Stefano Stabellini <sstabellini@kernel.org>

> ---
>  hw/9pfs/xen-9p-backend.c | 9 +++++----
>  1 file changed, 5 insertions(+), 4 deletions(-)
> 
> diff --git a/hw/9pfs/xen-9p-backend.c b/hw/9pfs/xen-9p-backend.c
> index 18fe5b7c92..f04caabfe5 100644
> --- a/hw/9pfs/xen-9p-backend.c
> +++ b/hw/9pfs/xen-9p-backend.c
> @@ -137,7 +137,8 @@ static ssize_t xen_9pfs_pdu_vmarshal(V9fsPDU *pdu,
>      ret = v9fs_iov_vmarshal(in_sg, num, offset, 0, fmt, ap);
>      if (ret < 0) {
>          xen_pv_printf(&xen_9pfs->xendev, 0,
> -                      "Failed to encode VirtFS request type %d\n", pdu->id + 1);
> +                      "Failed to encode VirtFS reply type %d\n",
> +                      pdu->id + 1);
>          xen_be_set_state(&xen_9pfs->xendev, XenbusStateClosing);
>          xen_9pfs_disconnect(&xen_9pfs->xendev);
>      }
> @@ -201,9 +202,9 @@ static void xen_9pfs_init_in_iov_from_pdu(V9fsPDU *pdu,
>  
>      buf_size = iov_size(ring->sg, num);
>      if (buf_size  < P9_IOHDRSZ) {
> -        xen_pv_printf(&xen_9pfs->xendev, 0, "Xen 9pfs request type %d"
> -                "needs %zu bytes, buffer has %zu, less than minimum\n",
> -                pdu->id, *size, buf_size);
> +        xen_pv_printf(&xen_9pfs->xendev, 0, "Xen 9pfs reply type %d needs "
> +                      "%zu bytes, buffer has %zu, less than minimum\n",
> +                      pdu->id + 1, *size, buf_size);
>          xen_be_set_state(&xen_9pfs->xendev, XenbusStateClosing);
>          xen_9pfs_disconnect(&xen_9pfs->xendev);
>      }
> -- 
> 2.20.1
> 


^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH 0/2] 9pfs: regression init_in_iov_from_pdu truncating size
  2020-05-10 17:41 [PATCH 0/2] 9pfs: regression init_in_iov_from_pdu truncating size Christian Schoenebeck
  2020-05-10 17:05 ` [PATCH 1/2] xen-9pfs: Fix log messages of reply errors Christian Schoenebeck
  2020-05-10 17:18 ` [PATCH 2/2] 9pfs: fix init_in_iov_from_pdu truncating size Christian Schoenebeck
@ 2020-05-12  9:38 ` Greg Kurz
  2020-05-13 23:07   ` Greg Kurz
  2 siblings, 1 reply; 19+ messages in thread
From: Greg Kurz @ 2020-05-12  9:38 UTC (permalink / raw)
  To: Christian Schoenebeck
  Cc: Anthony Perard, Stefano Stabellini, qemu-devel, Paul Durrant

On Sun, 10 May 2020 19:41:52 +0200
Christian Schoenebeck <qemu_oss@crudebyte.com> wrote:

> Stefano, looks like your original patch needs some more fine tuning:
> 
> https://bugs.launchpad.net/bugs/1877688
> 
> Please check if the assumptions I made about Xen are correct, and please
> also test whether these changes still work for you with Xen as intended by
> you.
> 
> Christian Schoenebeck (2):
>   xen-9pfs: Fix log messages of reply errors
>   9pfs: fix init_in_iov_from_pdu truncating size
> 
>  hw/9pfs/virtio-9p-device.c | 35 ++++++++++++++++++++++++--------
>  hw/9pfs/xen-9p-backend.c   | 41 ++++++++++++++++++++++++++++----------
>  2 files changed, 58 insertions(+), 18 deletions(-)
> 

Sorry, I'm off this week, not sure I'll have time to review.
So I've only applied patch 1 for now and I'll let Stefano
and you sort out what should be done for patch 2.

Cheers,

--
Greg


^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH 2/2] 9pfs: fix init_in_iov_from_pdu truncating size
  2020-05-11 22:09   ` Stefano Stabellini
@ 2020-05-12 11:29     ` Christian Schoenebeck
  2020-05-12 23:24       ` Stefano Stabellini
  0 siblings, 1 reply; 19+ messages in thread
From: Christian Schoenebeck @ 2020-05-12 11:29 UTC (permalink / raw)
  To: qemu-devel; +Cc: Stefano Stabellini, Anthony Perard, Paul Durrant, Greg Kurz

On Dienstag, 12. Mai 2020 00:09:46 CEST Stefano Stabellini wrote:
> On Sun, 10 May 2020, Christian Schoenebeck wrote:
> > Commit SHA-1 16724a173049ac29c7b5ade741da93a0f46edff7 introduced
> > truncating the response to the currently available transport buffer
> > size, which was supposed to fix an 9pfs error on Xen boot where
> > transport buffer might still be smaller than required for response.
> > 
> > Unfortunately this change broke small reads (with less than 12
> > bytes).
> > 
> > To address both concerns, check the actual response type and only
> > truncate reply for Rreaddir responses,
> 
> I realize you mean "Rread" (not Rreaddir).

Yes, that's currently an unfortunate auto completion issue in my head due to 
having spent too much time on readdir() code lately. I'm working on it. ;-)

> Are we sure that truncation
> can only happen with Rread? I checked the spec it looks like Directories
> are pretty much like files from the spec point of view. So it seems to
> me that truncation might be possible there too.

That's the big question here: do we need to address this for types other than
Rread? So far I have only addressed this for Rread because, from what you
described, you were encountering that Xen boot issue only with reads, right?

What we cannot do is simply truncate any response arbitrarily without
looking at the precise response type. With Rread (9p2000.L) it's quite simple,
because it is OK (i.e. from the client & guest OS perspective) to return
*arbitrarily* less data than the client originally requested.

A problem with Rread would be the protocol variant 9p2000.u, because such
clients would also use Rread on directories. In that case the client would end
up with trash data -> undefined behaviour.

Likewise, for Rreaddir (9p2000.L) it would be much more complicated: we could
truncate the response, but we must not truncate individual directory entries of
that response. So not only must the Rreaddir header be preserved, we would also
have to look at the individual entries returned (their sizes vary from entry to
entry) and truncate only at the boundaries between entries inside the response;
otherwise the client would receive trash data -> undefined behaviour.

And that's it; for all other 9P types we can't truncate the response at all.
For those types we could probably just return EAGAIN, but would it help? It
would probably also require changes on the client side to handle this
correctly.

> > and only if truncated reply would at least return one payload byte to
> > client. Use Rreaddir's precise header size (11) for this instead of
> > P9_IOHDRSZ.
> 
> Ah! That's the underlying error isn't it? That P9_IOHDRSZ is not really
> the size of the reply header, it is bigger. Hence the check:
> 
>   if (buf_size < P9_IOHDRSZ) {
> 
> can be wrong for very small sizes.

Exactly. We need to handle this according to the precise header size of the
respective response type. But unfortunately that's not enough, as described
above in detail.

> > Fixes: 16724a173049ac29c7b5ade741da93a0f46edff7
> > Fixes: https://bugs.launchpad.net/bugs/1877688
> > Signed-off-by: Christian Schoenebeck <qemu_oss@crudebyte.com>
> > ---
> > 
> >  hw/9pfs/virtio-9p-device.c | 35 +++++++++++++++++++++++++++--------
> >  hw/9pfs/xen-9p-backend.c   | 38 +++++++++++++++++++++++++++++---------
> >  2 files changed, 56 insertions(+), 17 deletions(-)
> > 
> > diff --git a/hw/9pfs/virtio-9p-device.c b/hw/9pfs/virtio-9p-device.c
> > index 536447a355..57e4d92ecb 100644
> > --- a/hw/9pfs/virtio-9p-device.c
> > +++ b/hw/9pfs/virtio-9p-device.c
> > @@ -154,15 +154,34 @@ static void virtio_init_in_iov_from_pdu(V9fsPDU
> > *pdu, struct iovec **piov,> 
> >      VirtQueueElement *elem = v->elems[pdu->idx];
> >      size_t buf_size = iov_size(elem->in_sg, elem->in_num);
> > 
> > -    if (buf_size < P9_IOHDRSZ) {
> > -        VirtIODevice *vdev = VIRTIO_DEVICE(v);
> > +    if (pdu->id + 1 == P9_RREAD) {
> > +        /* size[4] Rread tag[2] count[4] data[count] */
> 
> 4+2+4 = 10

That's 4+1+2+4 = 11

You were missing "Rread" here, which identifies the (numeric) 9P response type
and is always 1 byte in size.

> > +        const size_t hdr_size = 11;
> 
> Are you adding 1 to account for "count"?

The idea was that, from the client's perspective, a successful read() call must
return at least 1 byte. We must not return 0 bytes, because that would indicate
EOF on POSIX systems. For that reason the minimum Rread response size would
be:

	header_size + 1 payload_byte

and hence

	11 + 1 = 12
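
Spelled out as a sketch (illustrative only, not part of the patch):

    /* Rread reply layout: size[4] Rread tag[2] count[4] data[count] */
    enum {
        RREAD_HDR_SIZE = 4 + 1 + 2 + 4,     /* = 11 bytes            */
        RREAD_MIN_SIZE = RREAD_HDR_SIZE + 1 /* + 1 payload byte = 12 */
    };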

> > +        /*
> > +         * If current transport buffer size is smaller than actually
> > required +         * for this Rreaddir response, then truncate the
> > response to the +         * currently available transport buffer size,
> > however only if it would +         * at least allow to return 1 payload
> > byte to client.
> > +         */
> > +        if (buf_size < hdr_size + 1) {
> 
> If you have already added 1 before, why do we need to add 1 again here?

See description above. Variable 'hdr_size' is the Rread header size (11 bytes),
and +1 is added here for the minimum payload data returned.

> > +            VirtIODevice *vdev = VIRTIO_DEVICE(v);
> > 
> > -        virtio_error(vdev,
> > -                     "VirtFS reply type %d needs %zu bytes, buffer has
> > %zu, less than minimum", -                     pdu->id + 1, *size,
> > buf_size);
> > -    }
> > -    if (buf_size < *size) {
> > -        *size = buf_size;
> > +            virtio_error(vdev,
> > +                         "VirtFS reply type %d needs %zu bytes, buffer
> > has " +                         "%zu, less than minimum (%zu)",
> > +                         pdu->id + 1, *size, buf_size, hdr_size + 1);
> > +        }
> 
> I think we want to return here

I just preserved your structure as it was. If you tell me it should return 
here, I will add a return. NP.

> 
> > +        if (buf_size < *size) {
> > +            *size = buf_size;
> > +        }
> > +    } else {
> > +        if (buf_size < *size) {
> > +            VirtIODevice *vdev = VIRTIO_DEVICE(v);
> > +
> > +            virtio_error(vdev,
> > +                         "VirtFS reply type %d needs %zu bytes, buffer
> > has %zu", +                         pdu->id + 1, *size, buf_size);
> > +        }
> > 
> >      }
> >      
> >      *piov = elem->in_sg;
> > 
> > diff --git a/hw/9pfs/xen-9p-backend.c b/hw/9pfs/xen-9p-backend.c
> > index f04caabfe5..98f340d24b 100644
> > --- a/hw/9pfs/xen-9p-backend.c
> > +++ b/hw/9pfs/xen-9p-backend.c
> > @@ -201,15 +201,35 @@ static void xen_9pfs_init_in_iov_from_pdu(V9fsPDU
> > *pdu,> 
> >      xen_9pfs_in_sg(ring, ring->sg, &num, pdu->idx, *size);
> >      
> >      buf_size = iov_size(ring->sg, num);
> > 
> > -    if (buf_size  < P9_IOHDRSZ) {
> > -        xen_pv_printf(&xen_9pfs->xendev, 0, "Xen 9pfs reply type %d needs
> > " -                      "%zu bytes, buffer has %zu, less than
> > minimum\n", -                      pdu->id + 1, *size, buf_size);
> > -        xen_be_set_state(&xen_9pfs->xendev, XenbusStateClosing);
> > -        xen_9pfs_disconnect(&xen_9pfs->xendev);
> > -    }
> > -    if (buf_size  < *size) {
> > -        *size = buf_size;
> > +    if (pdu->id + 1 == P9_RREAD) {
> > +        /* size[4] Rread tag[2] count[4] data[count] */
> > +        const size_t hdr_size = 11;
> > +        /*
> > +         * If current transport buffer size is smaller than actually
> > required +         * for this Rreaddir response, then truncate the
> > response to the +         * currently available transport buffer size,
> > however only if it would +         * at least allow to return 1 payload
> > byte to client.
> > +         */
> > +        if (buf_size < hdr_size + 1) {
> > +            xen_pv_printf(&xen_9pfs->xendev, 0, "Xen 9pfs reply type %d "
> > +                          "needs %zu bytes, buffer has %zu, less than "
> > +                          "minimum (%zu)\n",
> > +                          pdu->id + 1, *size, buf_size, hdr_size + 1);
> > +            xen_be_set_state(&xen_9pfs->xendev, XenbusStateClosing);
> > +            xen_9pfs_disconnect(&xen_9pfs->xendev);
> 
> I htink we want to return here.

Same thing: if you want a return here, I will add it.

Best regards,
Christian Schoenebeck




^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH 2/2] 9pfs: fix init_in_iov_from_pdu truncating size
  2020-05-12 11:29     ` Christian Schoenebeck
@ 2020-05-12 23:24       ` Stefano Stabellini
  2020-05-13 13:11         ` Christian Schoenebeck
  0 siblings, 1 reply; 19+ messages in thread
From: Stefano Stabellini @ 2020-05-12 23:24 UTC (permalink / raw)
  To: Christian Schoenebeck
  Cc: Anthony Perard, Greg Kurz, Stefano Stabellini, qemu-devel, Paul Durrant

On Tue, 12 May 2020, Christian Schoenebeck wrote:
> On Dienstag, 12. Mai 2020 00:09:46 CEST Stefano Stabellini wrote:
> > On Sun, 10 May 2020, Christian Schoenebeck wrote:
> > > Commit SHA-1 16724a173049ac29c7b5ade741da93a0f46edff7 introduced
> > > truncating the response to the currently available transport buffer
> > > size, which was supposed to fix an 9pfs error on Xen boot where
> > > transport buffer might still be smaller than required for response.
> > > 
> > > Unfortunately this change broke small reads (with less than 12
> > > bytes).
> > > 
> > > To address both concerns, check the actual response type and only
> > > truncate reply for Rreaddir responses,
> > 
> > I realize you mean "Rread" (not Rreaddir).
> 
> Yes, that's currently an unfortunate auto completion issue in my head due to 
> having spent too much time on readdir() code lately. I'm working on it. ;-)

ahahah


> > Are we sure that truncation
> > can only happen with Rread? I checked the spec it looks like Directories
> > are pretty much like files from the spec point of view. So it seems to
> > me that truncation might be possible there too.
> 
> That's the big question here: do we need to address this for other types than 
> Rread? So far I only addressed this for Rread because from what you described 
> you were encountering that Xen boot issue only with reads, right?

That's right


> What we cannot do is simply truncating any response arbitrarily without 
> looking at the precise response type. With Rread (9p2000.L) it's quite simple, 
> because it is Ok (i.e. from client & guest OS perspective) to return 
> *arbitrarily* less data than originally requested by client.
> 
> A problem with Rread would be protocol variant 9p2000.u, because such clients 
> would also use Rread on directories. In that case client would end up with 
> trash data -> undefined behaviour.
> 
> Likewise for Rreaddir (9p2000.L) it would be much more complicated, we could 
> truncate the response, but we may not truncate individual directory entries of 
> that response. So not only the Rreaddir header must be preserved, we also 
> would have to look at the individual entries (their sizes vary for each entry 
> individually) returned and only truncate at the boundaries of individual 
> entries inside the response, otherwise client would receive trash data -> 
> undefined behaviour.

Let me premise that this patch fixes an important bug, so I am not
asking you to do any more work to get this patch committed right now :-)

But I think it would be good to test what would happen if the client did
a read on a directory with hundreds of entries, such as a maildir
directory. There has to be a point where the directory entries no longer
fit in the shared buffer. What happens then?

I am guessing that we'll have to implement the "truncate at the
boundaries of individual entries" approach to get it right in all cases.

Given that it doesn't look like truncation would work right with
Rreaddir anyway today, I think it would be OK to fix it in a separate
patch.


> And that's it, for all other 9P types we can't truncate response at all. For 
> those types we could probably just return EAGAIN, but would it help? Probably 
> would require changes on client side as well then to handle this correctly.

I think the critical ones are Rread and Rreaddir. I wonder if we could
even reproduce a bug with any of the other 9p types. Probably not. So
returning an error might be good enough. Returning an error is also way
better than calling virtio_error or xen_9pfs_disconnect; more on this
below.


> > > and only if truncated reply would at least return one payload byte to
> > > client. Use Rreaddir's precise header size (11) for this instead of
> > > P9_IOHDRSZ.
> > 
> > Ah! That's the underlying error isn't it? That P9_IOHDRSZ is not really
> > the size of the reply header, it is bigger. Hence the check:
> > 
> >   if (buf_size < P9_IOHDRSZ) {
> > 
> > can be wrong for very small sizes.
> 
> Exactly. We need to handle this according to the precise header size of the 
> respective response type. But unfortunately that's not enough, like described 
> above in detail.

OK got it


> > > Fixes: 16724a173049ac29c7b5ade741da93a0f46edff7
> > > Fixes: https://bugs.launchpad.net/bugs/1877688
> > > Signed-off-by: Christian Schoenebeck <qemu_oss@crudebyte.com>
> > > ---
> > > 
> > >  hw/9pfs/virtio-9p-device.c | 35 +++++++++++++++++++++++++++--------
> > >  hw/9pfs/xen-9p-backend.c   | 38 +++++++++++++++++++++++++++++---------
> > >  2 files changed, 56 insertions(+), 17 deletions(-)
> > > 
> > > diff --git a/hw/9pfs/virtio-9p-device.c b/hw/9pfs/virtio-9p-device.c
> > > index 536447a355..57e4d92ecb 100644
> > > --- a/hw/9pfs/virtio-9p-device.c
> > > +++ b/hw/9pfs/virtio-9p-device.c
> > > @@ -154,15 +154,34 @@ static void virtio_init_in_iov_from_pdu(V9fsPDU
> > > *pdu, struct iovec **piov,> 
> > >      VirtQueueElement *elem = v->elems[pdu->idx];
> > >      size_t buf_size = iov_size(elem->in_sg, elem->in_num);
> > > 
> > > -    if (buf_size < P9_IOHDRSZ) {
> > > -        VirtIODevice *vdev = VIRTIO_DEVICE(v);
> > > +    if (pdu->id + 1 == P9_RREAD) {
> > > +        /* size[4] Rread tag[2] count[4] data[count] */
> > 
> > 4+2+4 = 10
> 
> That's 4+1+2+4 = 11
> 
> You were missing "Rread" here which identifies the (numeric) 9P response type 
> and which is always 1 byte in size.

I thought so. Could you please update the comment in the code? As is, it
is a bit confusing.

 
> > > +        const size_t hdr_size = 11;
> > 
> > Are you adding 1 to account for "count"?
> 
> The idea was that from client perspective a successful read() call must at 
> least return 1 byte. We must not return 0 bytes, because that would indicate 
> EOF for POSIX systems. For that reason the minimum Rread response size would 
> be:
> 
> 	header_size + 1 payload_byte
> 
> and hence
> 
> 	11 + 1 = 12

Yes I think you are right


> > > +        /*
> > > +         * If current transport buffer size is smaller than actually
> > > required +         * for this Rreaddir response, then truncate the
> > > response to the +         * currently available transport buffer size,
> > > however only if it would +         * at least allow to return 1 payload
> > > byte to client.
> > > +         */
> > > +        if (buf_size < hdr_size + 1) {
> > 
> > If you have already added 1 before, why do we need to add 1 again here?
> 
> See desription above. Variable 'hdr_size' is the Rread header size (11 bytes) 
> and +1 is added here for the minimum payload data returned.

OK


> > > +            VirtIODevice *vdev = VIRTIO_DEVICE(v);
> > > 
> > > -        virtio_error(vdev,
> > > -                     "VirtFS reply type %d needs %zu bytes, buffer has
> > > %zu, less than minimum", -                     pdu->id + 1, *size,
> > > buf_size);
> > > -    }
> > > -    if (buf_size < *size) {
> > > -        *size = buf_size;
> > > +            virtio_error(vdev,
> > > +                         "VirtFS reply type %d needs %zu bytes, buffer
> > > has " +                         "%zu, less than minimum (%zu)",
> > > +                         pdu->id + 1, *size, buf_size, hdr_size + 1);
> > > +        }
> > 
> > I think we want to return here
> 
> I just preserved your structure as it was. If you tell me it should return 
> here, I will add a return. NP.

The point I was trying to make is that it doesn't make a lot of sense to
try to do any operations after calling virtio_error. I am not a virtio
expert, but certainly in the case of Xen below we cannot continue after
calling xen_9pfs_disconnect. The whole connection gets closed; more on
this below.


> > 
> > > +        if (buf_size < *size) {
> > > +            *size = buf_size;
> > > +        }
> > > +    } else {
> > > +        if (buf_size < *size) {
> > > +            VirtIODevice *vdev = VIRTIO_DEVICE(v);
> > > +
> > > +            virtio_error(vdev,
> > > +                         "VirtFS reply type %d needs %zu bytes, buffer
> > > has %zu", +                         pdu->id + 1, *size, buf_size);
> > > +        }
> > > 
> > >      }
> > >      
> > >      *piov = elem->in_sg;
> > > 
> > > diff --git a/hw/9pfs/xen-9p-backend.c b/hw/9pfs/xen-9p-backend.c
> > > index f04caabfe5..98f340d24b 100644
> > > --- a/hw/9pfs/xen-9p-backend.c
> > > +++ b/hw/9pfs/xen-9p-backend.c
> > > @@ -201,15 +201,35 @@ static void xen_9pfs_init_in_iov_from_pdu(V9fsPDU
> > > *pdu,> 
> > >      xen_9pfs_in_sg(ring, ring->sg, &num, pdu->idx, *size);
> > >      
> > >      buf_size = iov_size(ring->sg, num);
> > > 
> > > -    if (buf_size  < P9_IOHDRSZ) {
> > > -        xen_pv_printf(&xen_9pfs->xendev, 0, "Xen 9pfs reply type %d needs
> > > " -                      "%zu bytes, buffer has %zu, less than
> > > minimum\n", -                      pdu->id + 1, *size, buf_size);
> > > -        xen_be_set_state(&xen_9pfs->xendev, XenbusStateClosing);
> > > -        xen_9pfs_disconnect(&xen_9pfs->xendev);
> > > -    }
> > > -    if (buf_size  < *size) {
> > > -        *size = buf_size;
> > > +    if (pdu->id + 1 == P9_RREAD) {
> > > +        /* size[4] Rread tag[2] count[4] data[count] */
> > > +        const size_t hdr_size = 11;
> > > +        /*
> > > +         * If current transport buffer size is smaller than actually
> > > required +         * for this Rreaddir response, then truncate the
> > > response to the +         * currently available transport buffer size,
> > > however only if it would +         * at least allow to return 1 payload
> > > byte to client.
> > > +         */
> > > +        if (buf_size < hdr_size + 1) {
> > > +            xen_pv_printf(&xen_9pfs->xendev, 0, "Xen 9pfs reply type %d "
> > > +                          "needs %zu bytes, buffer has %zu, less than "
> > > +                          "minimum (%zu)\n",
> > > +                          pdu->id + 1, *size, buf_size, hdr_size + 1);
> > > +            xen_be_set_state(&xen_9pfs->xendev, XenbusStateClosing);
> > > +            xen_9pfs_disconnect(&xen_9pfs->xendev);
> > 
> > I htink we want to return here.
> 
> Same thing: if you want a return here, I will add it.

I think we have to return here because the connection gets closed by
xen_9pfs_disconnect. I wonder if we should return -EAGAIN like you
suggested instead of calling xen_9pfs_disconnect() which is
irrecoverable. But that could be done later in a separate patch.


^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH 2/2] 9pfs: fix init_in_iov_from_pdu truncating size
  2020-05-12 23:24       ` Stefano Stabellini
@ 2020-05-13 13:11         ` Christian Schoenebeck
  2020-05-13 23:31           ` Stefano Stabellini
  0 siblings, 1 reply; 19+ messages in thread
From: Christian Schoenebeck @ 2020-05-13 13:11 UTC (permalink / raw)
  To: qemu-devel; +Cc: Stefano Stabellini, Anthony Perard, Greg Kurz, Paul Durrant

On Mittwoch, 13. Mai 2020 01:24:16 CEST Stefano Stabellini wrote:
> Let me premise that this patch fixes an important bug, so I am not
> asking you to do any more work to get this patch committed right now :-)

Actually, I realized that this overall issue is much more delicate to handle
than I thought. My patch would still miss other important things; e.g. it
does not roll back the read / readdir streaming positions of the respective
FID on the 9P server side -> data loss / undefined behaviour on subsequent
read / readdir.

Also the 'count' parameter in the Rread response would need to be updated as 
well. -> undefined behaviour

For these and other reasons already described, I am starting to think that it
is actually not a good idea to truncate the response at all, at least not at
that final transport stage (as my patch or yours does).

Question: is there really no way to handle this overall issue exclusively on
the Xen side, i.e. delaying response delivery until a large enough transport
buffer is available, or acquiring the required buffer size on demand? Or by
looking at msize in advance (see below)?

From what I see now, I would recommend completely reverting response
truncation, at least on the virtio transport side, because so far I have not
seen a single case with virtio where the transport buffer was too small to
deliver a response. And as you can see, response truncation at the late
transport level opens up all kinds of severe issues on other ends. It is not
easy to do and requires a lot of code!

> But I think it would be good to test what would happen if the client did
> a read on a directory with hundreds of entries, such as a maildir
> directory. There has to be point where the number of directory entries
> is larger than the shared buffer. What happens then?

I'm talking about virtio only now: that scenario does not happen. When the
client opens a 9P session, it transmits 'msize' as a parameter for the 9P
session, which is the "maximum message size" ever to be assumed on both sides
for client requests and server responses. By default msize is 4 kiB (4096
bytes) with Linux clients, and our server would immediately error out if any
client tried to open a session with 'msize' < 4 kiB (for similar reasons as
discussed; because some 9P response types cannot be truncated at all):

https://github.com/qemu/qemu/commit/e16453a31a00c1c0a199cab0617e8aa888f6419a#diff-f3e98ed0a65e27743b14785fa85b7d79

The Rread and Rreaddir responses are already truncated (in a sane and correct
way) in 9p.c by the server's request handler in case the response would exceed
'msize'. For that reason it is actually unexpected that the transport would not
be able to provide the required buffer size for a response.
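
For context, the client-side msize is chosen at mount time; with a Linux guest
a typical invocation looks like this (usage example, 'hostshare' being a
placeholder mount tag):

    mount -t 9p -o trans=virtio,version=9p2000.L,msize=131072 hostshare /mnt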

> I am guessing that we'll have to implement the "truncate at the
> boundaries of individual entries" to get it right in all cases.
> 
> Given that it doesn't look like truncation would work right with
> Rreaddir anyway today, I think it would be OK to fix it in a separate
> patch.

If there is really no way with Xen to ensure there is always a buffer size
according to msize, then I fear this would need to be addressed in a
completely different way: by changing the transport interface, asking the
transport early what the biggest buffer size would be, and then passing that
info early over to the 9P request handler (9p.c), so that it could already
assemble the individual responses *correctly* by using

	max_response_size = MIN(msize, client_requested_size, transport_size)
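
A rough sketch of what that could look like (all names are hypothetical; this
is not an existing QEMU interface):

    #include <stddef.h>

    /* Hypothetical helper: the transport reports its current reply capacity
     * up front, so 9p.c can clamp the response while assembling it instead
     * of the transport truncating an already finished reply. */
    #define MIN(a, b) ((a) < (b) ? (a) : (b))   /* as in qemu/osdep.h */

    static size_t clamp_response_size(size_t msize, size_t client_requested_size,
                                      size_t transport_size)
    {
        return MIN(MIN(msize, client_requested_size), transport_size);
    }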

> > And that's it, for all other 9P types we can't truncate response at all.
> > For those types we could probably just return EAGAIN, but would it help?
> > Probably would require changes on client side as well then to handle this
> > correctly.
> I think the critical ones are Rread and Rreaddir. I wonder if we could
> even reproduce a bug with any of the other 9p types. Probably not. So
> returning an error might be good enough. Returning an error is also way
> better than calling virtio_error or xen_9pfs_disconnect, more on this
> below.

Xen *MUST* always deliver a transport buffer at least 4 kiB in size for 9P
responses. See my commit log message at the commit link provided above for a
detailed explanation why.

And IMO Xen should always be able to provide a transport buffer that's at 
least msize large. If that's the case, then we don't have to worry about 
anything else and can drop this transport truncation code.

> > > > Fixes: 16724a173049ac29c7b5ade741da93a0f46edff7
> > > > Fixes: https://bugs.launchpad.net/bugs/1877688
> > > > Signed-off-by: Christian Schoenebeck <qemu_oss@crudebyte.com>
> > > > ---
> > > > 
> > > >  hw/9pfs/virtio-9p-device.c | 35 +++++++++++++++++++++++++++--------
> > > >  hw/9pfs/xen-9p-backend.c   | 38
> > > >  +++++++++++++++++++++++++++++---------
> > > >  2 files changed, 56 insertions(+), 17 deletions(-)
> > > > 
> > > > diff --git a/hw/9pfs/virtio-9p-device.c b/hw/9pfs/virtio-9p-device.c
> > > > index 536447a355..57e4d92ecb 100644
> > > > --- a/hw/9pfs/virtio-9p-device.c
> > > > +++ b/hw/9pfs/virtio-9p-device.c
> > > > @@ -154,15 +154,34 @@ static void virtio_init_in_iov_from_pdu(V9fsPDU
> > > > *pdu, struct iovec **piov,>
> > > > 
> > > >      VirtQueueElement *elem = v->elems[pdu->idx];
> > > >      size_t buf_size = iov_size(elem->in_sg, elem->in_num);
> > > > 
> > > > -    if (buf_size < P9_IOHDRSZ) {
> > > > -        VirtIODevice *vdev = VIRTIO_DEVICE(v);
> > > > +    if (pdu->id + 1 == P9_RREAD) {
> > > > +        /* size[4] Rread tag[2] count[4] data[count] */
> > > 
> > > 4+2+4 = 10
> > 
> > That's 4+1+2+4 = 11
> > 
> > You were missing "Rread" here which identifies the (numeric) 9P response
> > type and which is always 1 byte in size.
> 
> I thought so. Could you please update the comment in the code? As is it
> is a bit confusing.

I just copy-pasted the 9P spec here, but NP, that's the smallest issue to
address.

> I think we have to return here because the connection gets closed by
> xen_9pfs_disconnect. I wonder if we should return -EAGAIN like you
> suggested instead of calling xen_9pfs_disconnect() which is
> irrecoverable. But that could be done later in a separate patch.

Even -EAGAIN would require a bunch of code, since the precise error response
type depends a) on the protocol dialect (9p2000.L vs. 9p2000.u) and b) on the
precise request type. For some 9P request types it is not even allowed to
return an error response at all.

As you can see, it all boils down to one thing: better to look for a way for
Xen to deliver the required buffer size, or, in the worst case, yield the
coroutine, i.e. delay delivery of the response until Xen can assure that.

Best regards,
Christian Schoenebeck




^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH 0/2] 9pfs: regression init_in_iov_from_pdu truncating size
  2020-05-12  9:38 ` [PATCH 0/2] 9pfs: regression " Greg Kurz
@ 2020-05-13 23:07   ` Greg Kurz
  2020-05-13 23:33     ` Stefano Stabellini
  0 siblings, 1 reply; 19+ messages in thread
From: Greg Kurz @ 2020-05-13 23:07 UTC (permalink / raw)
  To: Christian Schoenebeck
  Cc: Anthony Perard, Stefano Stabellini, qemu-devel, Paul Durrant

On Tue, 12 May 2020 11:38:23 +0200
Greg Kurz <groug@kaod.org> wrote:

> On Sun, 10 May 2020 19:41:52 +0200
> Christian Schoenebeck <qemu_oss@crudebyte.com> wrote:
> 
> > Stefano, looks like your original patch needs some more fine tuning:
> > 
> > https://bugs.launchpad.net/bugs/1877688
> > 
> > Please check if the assumptions I made about Xen are correct, and please
> > also test whether these changes still work for you with Xen as intended by
> > you.
> > 
> > Christian Schoenebeck (2):
> >   xen-9pfs: Fix log messages of reply errors
> >   9pfs: fix init_in_iov_from_pdu truncating size
> > 
> >  hw/9pfs/virtio-9p-device.c | 35 ++++++++++++++++++++++++--------
> >  hw/9pfs/xen-9p-backend.c   | 41 ++++++++++++++++++++++++++++----------
> >  2 files changed, 58 insertions(+), 18 deletions(-)
> > 
> 
> Sorry, I'm off this week, not sure I'll have time to review.
> So I've only applied patch 1 for now and I'll let Stefano
> and you sort out what should be done for patch 2.
> 

IIUC this requires more thinking and may end up in a complete rewrite of
the truncating logic. I intend to send a PR soon: is it worth keeping
patch 1 anyway?

> Cheers,
> 
> --
> Greg
> 



^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH 2/2] 9pfs: fix init_in_iov_from_pdu truncating size
  2020-05-13 13:11         ` Christian Schoenebeck
@ 2020-05-13 23:31           ` Stefano Stabellini
  2020-05-14 14:24             ` Christian Schoenebeck
  0 siblings, 1 reply; 19+ messages in thread
From: Stefano Stabellini @ 2020-05-13 23:31 UTC (permalink / raw)
  To: Christian Schoenebeck
  Cc: Anthony Perard, Stefano Stabellini, Paul Durrant, qemu-devel, Greg Kurz

On Wed, 13 May 2020, Christian Schoenebeck wrote:
> On Mittwoch, 13. Mai 2020 01:24:16 CEST Stefano Stabellini wrote:
> > Let me premise that this patch fixes an important bug, so I am not
> > asking you to do any more work to get this patch committed right now :-)
> 
> Actually, I realized that this overall issue is much more delicate to handle 
> than I thought. My patch would still miss other important things, like e.g. it 
> does not roll back the read / readdir streaming positions of the respective 
> FID on 9P server side. -> data loss / undefined behaviour on subsequent read / 
> readdir
> 
> Also the 'count' parameter in the Rread response would need to be updated as 
> well. -> undefined behaviour
> 
> For these and other reasons already described, I start to think that it is 
> actually not a good idea to truncate the response at all, at least not at that 
> final transport stage (as my or your patch do).
> 
> Question: is there really no way to handle this overall issue exlusively on 
> Xen side, i.e. delaying response delivery until large enough transport buffer 
> would be available or acquiring on-demand the required buffer size? Or by 
> looking ahead at msize in advance (see below)?
> 
> From what I see now, I would recommend completely reverting response 
> truncation on virtio transport side at least. Because so far I have not seen a 
> single case with virtio where the transport buffer was too small for 
> delivering any response. And as you can see, response truncation at late 
> transport level opens all kinds of severe issues on other ends. It is not easy 
> to do and requires a lot of code!
> 
> > But I think it would be good to test what would happen if the client did
> > a read on a directory with hundreds of entries, such as a maildir
> > directory. There has to be point where the number of directory entries
> > is larger than the shared buffer. What happens then?
> 
> I'm talking about virtio only now: that scenario does not happen. When client 
> opens a 9P session, client transmits 'msize' as parameter for the 9P sesssion 
> which is the "maximum message size" ever to be assumed on both sides for  
> client requests and server responses. By default msize is 4 kiB (4096 bytes) 
> with Linux clients, and our server would immediately error out if any client 
> tries to open a session with 'msize' < 4 kiB (for similar reasons discussed; 
> because some 9P response types cannot be truncated at all):
> 
> https://github.com/qemu/qemu/commit/
> e16453a31a00c1c0a199cab0617e8aa888f6419a#diff-f3e98ed0a65e27743b14785fa85b7d79
> 
> The Rread and Rreaddir responses are already truncated (in a sane and correct 
> way) in 9p.c by server's request handler in case the response would exceed 
> 'msize'. For that reason it is actually unexpected that transport would not be 
> able to provide the required transport buffer size for any response.
>
> > I am guessing that we'll have to implement the "truncate at the
> > boundaries of individual entries" to get it right in all cases.
> > 
> > Given that it doesn't look like truncation would work right with
> > Rreaddir anyway today, I think it would be OK to fix it in a separate
> > patch.
> 
> If there is really no way with Xen to ensure there's always a buffer size 
> according to msize, 

Thank you for the long writeup. Let's see if we can come up with a
good plan.

The Xen transport
(https://xenbits.xenproject.org/docs/4.13-testing/misc/9pfs.html) has a
shared memory area with 64 pages, so 256KB. It is split in half: 128KB
for requests and 128KB for responses. 

The original error message is the following:

  xen be: 9pfs-0: xen be: 9pfs-0: Xen 9pfs request type 116 needs 126987 bytes, buffer has 126965
  Xen 9pfs request type 116needs 126987 bytes, buffer has 126965

and msize was set to 131072 in the Linux client.


So transport_size == msize. However, there can be multiple requests and
responses in flight at any given time. It means that even with
transport_size=128KB and msize=4KB we could still have so many outstanding
requests that the available memory is less than 4KB.

I admit that with transport_size=128KB and msize=4KB it would be so
unlikely to happen that it would be close to impossible to reproduce the
error. But theoretically it is possible.

If we can't come up with any better ideas, we could simply limit the
msize in the Linux client to something like transport_size/2 or
transport_size/4, but it doesn't feel right.


> > I think we have to return here because the connection gets closed by
> > xen_9pfs_disconnect. I wonder if we should return -EAGAIN like you
> > suggested instead of calling xen_9pfs_disconnect() which is
> > irrecoverable. But that could be done later in a separate patch.
> 
> Even -EAGAIN would require a bunch of code, since the precise error response 
> type depends a) on the protocol dialect (9p2000.L vs. 9p2000.u) and b) even 
> depending on the precise request type. For some 9P request types it is not 
> even allowed to return an error response at all.
> 
> As you can see it all boils down to one thing: better look for a solution for 
> Xen to deliver the required buffer size, or in worst case: yield the 
> coroutine, i.e. delay delivery of response until that can be assured by Xen.

That might be the best option. But would it have to be
xen_9pfs_init_in_iov_from_pdu that calls qemu_coroutine_yield() in a loop?
Like:


diff --git a/hw/9pfs/xen-9p-backend.c b/hw/9pfs/xen-9p-backend.c
index 18fe5b7c92..ef1db47d0c 100644
--- a/hw/9pfs/xen-9p-backend.c
+++ b/hw/9pfs/xen-9p-backend.c
@@ -197,18 +197,13 @@ static void xen_9pfs_init_in_iov_from_pdu(V9fsPDU *pdu,
     g_free(ring->sg);
 
     ring->sg = g_new0(struct iovec, 2);
-    xen_9pfs_in_sg(ring, ring->sg, &num, pdu->idx, *size);
 
+again:
+    xen_9pfs_in_sg(ring, ring->sg, &num, pdu->idx, *size);
     buf_size = iov_size(ring->sg, num);
-    if (buf_size  < P9_IOHDRSZ) {
-        xen_pv_printf(&xen_9pfs->xendev, 0, "Xen 9pfs request type %d"
-                "needs %zu bytes, buffer has %zu, less than minimum\n",
-                pdu->id, *size, buf_size);
-        xen_be_set_state(&xen_9pfs->xendev, XenbusStateClosing);
-        xen_9pfs_disconnect(&xen_9pfs->xendev);
-    }
     if (buf_size  < *size) {
-        *size = buf_size;
+        qemu_coroutine_yield();
+        goto again;
     }
 
     *piov = ring->sg;


How is the coroutine going to get scheduled again? Is it automatic or do
we need to call it again explicitly when we get the next notification
from the client, which should arrive as soon as the client finishes
reading the responses?



* Re: [PATCH 0/2] 9pfs: regression init_in_iov_from_pdu truncating size
  2020-05-13 23:07   ` Greg Kurz
@ 2020-05-13 23:33     ` Stefano Stabellini
  0 siblings, 0 replies; 19+ messages in thread
From: Stefano Stabellini @ 2020-05-13 23:33 UTC (permalink / raw)
  To: Greg Kurz
  Cc: Anthony Perard, Stefano Stabellini, Christian Schoenebeck,
	qemu-devel, Paul Durrant

On Thu, 14 May 2020, Greg Kurz wrote:
> On Tue, 12 May 2020 11:38:23 +0200
> Greg Kurz <groug@kaod.org> wrote:
> 
> > On Sun, 10 May 2020 19:41:52 +0200
> > Christian Schoenebeck <qemu_oss@crudebyte.com> wrote:
> > 
> > > Stefano, looks like your original patch needs some more fine tuning:
> > > 
> > > https://bugs.launchpad.net/bugs/1877688
> > > 
> > > Please check if the assumptions I made about Xen are correct, and please
> > > also test whether these changes still work for you with Xen as intended by
> > > you.
> > > 
> > > Christian Schoenebeck (2):
> > >   xen-9pfs: Fix log messages of reply errors
> > >   9pfs: fix init_in_iov_from_pdu truncating size
> > > 
> > >  hw/9pfs/virtio-9p-device.c | 35 ++++++++++++++++++++++++--------
> > >  hw/9pfs/xen-9p-backend.c   | 41 ++++++++++++++++++++++++++++----------
> > >  2 files changed, 58 insertions(+), 18 deletions(-)
> > > 
> > 
> > Sorry, I'm off this week, not sure I'll have time to review.
> > So I've only applied patch 1 for now and I'll let Stefano
> > and you sort out what should be done for patch 2.
> > 
> 
> IIUC this requires more thinking and may end up in a complete rewrite of
> the truncating logic. I intend to send a PR soon : is it worth keeping
> patch 1 anyway ? 

Patch 1 is fine. For patch 2 we might need a little more time.



* Re: [PATCH 2/2] 9pfs: fix init_in_iov_from_pdu truncating size
  2020-05-13 23:31           ` Stefano Stabellini
@ 2020-05-14 14:24             ` Christian Schoenebeck
  2020-05-14 15:51               ` Stefano Stabellini
  2020-05-14 16:24               ` Stefano Stabellini
  0 siblings, 2 replies; 19+ messages in thread
From: Christian Schoenebeck @ 2020-05-14 14:24 UTC (permalink / raw)
  To: qemu-devel; +Cc: Stefano Stabellini, Anthony Perard, Paul Durrant, Greg Kurz

On Donnerstag, 14. Mai 2020 01:31:30 CEST Stefano Stabellini wrote:
> > If there is really no way with Xen to ensure there's always a buffer size
> > according to msize,
> 
> Thank you for the long writeup. Let's see if we can come up with a
> good plan.
> 
> The Xen transport
> (https://xenbits.xenproject.org/docs/4.13-testing/misc/9pfs.html) has a
> shared memory area with 64 pages, so 256KB. It is split in half: 128KB
> for requests and 128KB for responses.

That's very little. It also means you won't ever be able to achieve decent 
9pfs performance with Xen, because that requires a much larger msize of 
approximately msize >= 4MB, depending on the underlying I/O hardware, with a 
growing tendency (due to rising I/O hardware bandwidth).

> The original error message is the following:
> 
>   xen be: 9pfs-0: xen be: 9pfs-0: Xen 9pfs request type 116 needs 126987
> bytes, buffer has 126965 Xen 9pfs request type 116needs 126987 bytes,
> buffer has 126965
> 
> and msize was set to 131072 in the Linux client.
> 
> So transport_size == msize. However, there can be multiple requests and
> responses inflight at any given time. It means that even with
> transport_size=128KB and msize=4KB we could still have so many outstanding
> requests that the available memory is less then 4KB.
> 
> I admit that with transport_size=128KB and msize=4KB it would be so
> unlikely to happen that it would be close to impossible to reproduce the
> error. But theoretically it is possible.

Even though msize=4kiB is the default value with Linux clients, I would never 
use such a small value, as it would cause extremely poor 9pfs performance. 
My plan BTW is to log a performance warning soon if a client is using such a 
small msize.

> If we can't come up with any better ideas, we could simply limit the
> msize in the Linux client to something like transport_size/2 or
> transport_size/4, but it doesn't feel right.

That sounds like it would just reduce the likelihood of the low buffer 
situation happening with Xen; it would not prevent it, though.

As far as I understand you now, the problem with the Xen transport seems to be 
that clients can submit requests more quickly than the 9pfs server can process 
them. That would change the overall situation completely, because the hack 
solution of delaying delivery (yield) I suggested previously was based on the 
assumption that this low transport buffer scenario only happens once at boot, 
not repeatedly at arbitrary later points.

> > As you can see it all boils down to one thing: better look for a solution
> > for Xen to deliver the required buffer size, or in worst case: yield the
> > coroutine, i.e. delay delivery of response until that can be assured by
> > Xen.
> That's might be the best option. But it would have to be
> xen_9pfs_init_in_iov_from_pdu to call qemu_coroutine_yield() in a loop?
> Like:
> 
> 
> diff --git a/hw/9pfs/xen-9p-backend.c b/hw/9pfs/xen-9p-backend.c
> index 18fe5b7c92..ef1db47d0c 100644
> --- a/hw/9pfs/xen-9p-backend.c
> +++ b/hw/9pfs/xen-9p-backend.c
> @@ -197,18 +197,13 @@ static void xen_9pfs_init_in_iov_from_pdu(V9fsPDU
> *pdu, g_free(ring->sg);
> 
>      ring->sg = g_new0(struct iovec, 2);
> -    xen_9pfs_in_sg(ring, ring->sg, &num, pdu->idx, *size);
> 
> +again:
> +    xen_9pfs_in_sg(ring, ring->sg, &num, pdu->idx, *size);
>      buf_size = iov_size(ring->sg, num);
> -    if (buf_size  < P9_IOHDRSZ) {
> -        xen_pv_printf(&xen_9pfs->xendev, 0, "Xen 9pfs request type %d"
> -                "needs %zu bytes, buffer has %zu, less than minimum\n",
> -                pdu->id, *size, buf_size);
> -        xen_be_set_state(&xen_9pfs->xendev, XenbusStateClosing);
> -        xen_9pfs_disconnect(&xen_9pfs->xendev);
> -    }
>      if (buf_size  < *size) {
> -        *size = buf_size;
> +        qemu_coroutine_yield();
> +        goto again;
>      }
> 
>      *piov = ring->sg;
> 
> 
> How is the coroutine going to get scheduled again? Is it automatic or do
> we need to call it again explicitly when we get the next notification
> from the client, which should arrive as soon as the client finishes
> reading the responses?

I "think" the coroutine would be resumed in the next main loop iteration. But 
I am still reading up on qemu's coroutine implementation details myself. So 
that might work, yes, even though in a hackish way.

And as explained above, if this situation does not only happen at boot, then 
you might consider a different approach. There is also 
yield_until_fd_readable(int fd), BTW. However, the obvious step would be to 
increase the Xen transport capacity tremendously in the first place.
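
If it turns out that a plain qemu_coroutine_yield() is not resumed
automatically, the explicit variant could look roughly like this (just a
sketch; co_waiting_for_ring is a hypothetical field, it does not exist in the
current code):

    /* in xen_9pfs_init_in_iov_from_pdu(): park the coroutine until the
     * client has consumed some replies and ring space is available again */
    buf_size = iov_size(ring->sg, num);
    while (buf_size < *size) {
        ring->co_waiting_for_ring = qemu_coroutine_self(); /* hypothetical */
        qemu_coroutine_yield();
        xen_9pfs_in_sg(ring, ring->sg, &num, pdu->idx, *size);
        buf_size = iov_size(ring->sg, num);
    }

    /* in the handler that runs when the client notifies us that it has read
     * responses: wake up the parked coroutine, if there is one */
    if (ring->co_waiting_for_ring) {
        Coroutine *co = ring->co_waiting_for_ring;
        ring->co_waiting_for_ring = NULL;
        qemu_coroutine_enter(co);
    }

That way the coroutine would only be re-entered when there is a realistic
chance that ring space has actually been freed.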

Looks like this issue will still take quite some time to be fixed with Xen. If 
you don't mind I'll send out a patch to revert truncation on virtio side, so 
that at least this bug is fixed with virtio ASAP.

Best regards,
Christian Schoenebeck





* Re: [PATCH 2/2] 9pfs: fix init_in_iov_from_pdu truncating size
  2020-05-14 14:24             ` Christian Schoenebeck
@ 2020-05-14 15:51               ` Stefano Stabellini
  2020-05-14 16:10                 ` Christian Schoenebeck
  2020-05-14 16:24               ` Stefano Stabellini
  1 sibling, 1 reply; 19+ messages in thread
From: Stefano Stabellini @ 2020-05-14 15:51 UTC (permalink / raw)
  To: Christian Schoenebeck
  Cc: Anthony Perard, Greg Kurz, Stefano Stabellini, qemu-devel, Paul Durrant

On Thu, 14 May 2020, Christian Schoenebeck wrote:
> Looks like this issue will still take quite some time to be fixed with Xen. If 
> you don't mind I'll send out a patch to revert truncation on virtio side, so 
> that at least this bug is fixed with virtio ASAP.

Let me answer this quickly so that if you want to get the patch out
today, you can.


Yes, I think it is OK to revert truncation in virtio now. Only one
thing: would there still be any value in doing this for Xen:

+    if (pdu->id + 1 == P9_RREAD) {
+        /* size[4] Rread tag[2] count[4] data[count] */
+        const size_t hdr_size = 11;
+        /*
+         * If current transport buffer size is smaller than actually required
+         * for this Rreaddir response, then truncate the response to the
+         * currently available transport buffer size, however only if it would
+         * at least allow to return 1 payload byte to client.
+         */
+        if (buf_size < hdr_size + 1) {


like your patch here does? Although not a complete solution it looks
like it would still be a good improvement over the current situation for
Xen.
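
(Side note on the code quoted above: the 11 bytes are just the fixed part of
the Rread reply, i.e. size[4] + type[1] + tag[2] + count[4]; a trivial
standalone check, purely for illustration:)

    #include <assert.h>
    #include <stddef.h>

    int main(void)
    {
        /* Rread reply layout: size[4] type[1] tag[2] count[4] data[count] */
        const size_t rread_hdr = 4 /* size */ + 1 /* type */
                               + 2 /* tag */ + 4 /* count */;
        assert(rread_hdr == 11);
        return 0;
    }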



* Re: [PATCH 2/2] 9pfs: fix init_in_iov_from_pdu truncating size
  2020-05-14 15:51               ` Stefano Stabellini
@ 2020-05-14 16:10                 ` Christian Schoenebeck
  2020-05-14 16:23                   ` Stefano Stabellini
  0 siblings, 1 reply; 19+ messages in thread
From: Christian Schoenebeck @ 2020-05-14 16:10 UTC (permalink / raw)
  To: qemu-devel; +Cc: Stefano Stabellini, Anthony Perard, Greg Kurz, Paul Durrant

On Donnerstag, 14. Mai 2020 17:51:27 CEST Stefano Stabellini wrote:
> On Thu, 14 May 2020, Christian Schoenebeck wrote:
> > Looks like this issue will still take quite some time to be fixed with
> > Xen. If you don't mind I'll send out a patch to revert truncation on
> > virtio side, so that at least this bug is fixed with virtio ASAP.
> 
> Let me answer to this quickly so that if you want to get the patch out
> today you can.
> 
> 
> Yes, I think it is OK to revert truncation in virtio now. 

Good

> Only one
> thing: would there still be any value in doing for Xen:
> 
> +    if (pdu->id + 1 == P9_RREAD) {
> +        /* size[4] Rread tag[2] count[4] data[count] */
> +        const size_t hdr_size = 11;
> +        /*
> +         * If current transport buffer size is smaller than actually
> required +         * for this Rreaddir response, then truncate the response
> to the +         * currently available transport buffer size, however only
> if it would +         * at least allow to return 1 payload byte to client.
> +         */
> +        if (buf_size < hdr_size + 1) {
> 
> 
> like your patch here does? Although not a complete solution it looks
> like it would still be a good improvement over the current situation for
> Xen.

IMO in its current form, no. It would just move the problem from a clearly 
visible 9pfs server termination with an error towards silent data loss 
(without any error) on the client side. Remember: this patch does not roll 
back the filesystem driver's read position.

Best regards,
Christian Schoenebeck





* Re: [PATCH 2/2] 9pfs: fix init_in_iov_from_pdu truncating size
  2020-05-14 16:10                 ` Christian Schoenebeck
@ 2020-05-14 16:23                   ` Stefano Stabellini
  0 siblings, 0 replies; 19+ messages in thread
From: Stefano Stabellini @ 2020-05-14 16:23 UTC (permalink / raw)
  To: Christian Schoenebeck
  Cc: Anthony Perard, Stefano Stabellini, Paul Durrant, qemu-devel, Greg Kurz

On Thu, 14 May 2020, Christian Schoenebeck wrote:
> On Donnerstag, 14. Mai 2020 17:51:27 CEST Stefano Stabellini wrote:
> > On Thu, 14 May 2020, Christian Schoenebeck wrote:
> > > Looks like this issue will still take quite some time to be fixed with
> > > Xen. If you don't mind I'll send out a patch to revert truncation on
> > > virtio side, so that at least this bug is fixed with virtio ASAP.
> > 
> > Let me answer to this quickly so that if you want to get the patch out
> > today you can.
> > 
> > 
> > Yes, I think it is OK to revert truncation in virtio now. 
> 
> Good
> 
> > Only one
> > thing: would there still be any value in doing for Xen:
> > 
> > +    if (pdu->id + 1 == P9_RREAD) {
> > +        /* size[4] Rread tag[2] count[4] data[count] */
> > +        const size_t hdr_size = 11;
> > +        /*
> > +         * If current transport buffer size is smaller than actually
> > required +         * for this Rreaddir response, then truncate the response
> > to the +         * currently available transport buffer size, however only
> > if it would +         * at least allow to return 1 payload byte to client.
> > +         */
> > +        if (buf_size < hdr_size + 1) {
> > 
> > 
> > like your patch here does? Although not a complete solution it looks
> > like it would still be a good improvement over the current situation for
> > Xen.
> 
> IMO in its current form, no. It would just move the problematic from a clearly 
> visible 9pfs server termination with error, towards a silent data loss 
> (without any error) on client side. Remember: this patch does not roll back 
> the filesystem driver's read position.

OK



* Re: [PATCH 2/2] 9pfs: fix init_in_iov_from_pdu truncating size
  2020-05-14 14:24             ` Christian Schoenebeck
  2020-05-14 15:51               ` Stefano Stabellini
@ 2020-05-14 16:24               ` Stefano Stabellini
  2020-05-14 17:21                 ` Christian Schoenebeck
  1 sibling, 1 reply; 19+ messages in thread
From: Stefano Stabellini @ 2020-05-14 16:24 UTC (permalink / raw)
  To: Christian Schoenebeck
  Cc: Anthony Perard, Greg Kurz, Stefano Stabellini, qemu-devel, Paul Durrant

On Thu, 14 May 2020, Christian Schoenebeck wrote:
> On Donnerstag, 14. Mai 2020 01:31:30 CEST Stefano Stabellini wrote:
> > > If there is really no way with Xen to ensure there's always a buffer size
> > > according to msize,
> > 
> > Thank you for the long writeup. Let's see if we can come up with a
> > good plan.
> > 
> > The Xen transport
> > (https://xenbits.xenproject.org/docs/4.13-testing/misc/9pfs.html) has a
> > shared memory area with 64 pages, so 256KB. It is split in half: 128KB
> > for requests and 128KB for responses.
> 
> That's very little. That also means you won't be able to ever achieve decent 
> 9pfs performance with Xen, because that requires a much larger msize of 
> approximately msize >= 4MB, depending on the underlying I/O hardware, tendency 
> growing (due to rising I/O hardware bandwidth).

That's great feedback. Fortunately the size of the transport is
configurable, so it is a one-liner away from becoming much bigger (well,
two one-liners: see net/9p/trans_xen.c:XEN_9PFS_RING_ORDER in Linux, and
hw/9pfs/xen-9p-backend.c:MAX_RING_ORDER in QEMU.)
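
To make the sizing concrete, here is the arithmetic I have in mind (a
standalone sketch; it assumes the per-direction ring is half of the 2^order
shared pages, as in Xen's XEN_FLEX_RING_SIZE() definition, so please
double-check against the actual headers):

    #include <stdio.h>

    #define XEN_PAGE_SHIFT 12  /* 4KB pages */

    /* per-direction ring size for a given ring order: the shared area is
     * 2^order pages, split in half between requests and responses */
    static unsigned long ring_bytes(unsigned int order)
    {
        return 1UL << (order + XEN_PAGE_SHIFT - 1);
    }

    int main(void)
    {
        /* order 6 matches the 64-page / 128KB-per-direction layout above;
         * order 11 would give 4MB each way, i.e. 8MB of shared memory */
        for (unsigned int order = 6; order <= 11; order++) {
            printf("ring order %2u -> %6lu KB per direction\n",
                   order, ring_bytes(order) / 1024);
        }
        return 0;
    }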

Would you recommend 4MB each way, so a total of 8MB?


> > The original error message is the following:
> > 
> >   xen be: 9pfs-0: xen be: 9pfs-0: Xen 9pfs request type 116 needs 126987
> > bytes, buffer has 126965 Xen 9pfs request type 116needs 126987 bytes,
> > buffer has 126965
> > 
> > and msize was set to 131072 in the Linux client.
> > 
> > So transport_size == msize. However, there can be multiple requests and
> > responses inflight at any given time. It means that even with
> > transport_size=128KB and msize=4KB we could still have so many outstanding
> > requests that the available memory is less then 4KB.
> > 
> > I admit that with transport_size=128KB and msize=4KB it would be so
> > unlikely to happen that it would be close to impossible to reproduce the
> > error. But theoretically it is possible.
> 
> Even though msize=4kiB is the default value with Linux clients, I would never 
> use such a small value as it would cause an extremely poor 9pfs performance. 
> My plan BTW is to log a performance warning soon if a client is using such a 
> little msize.

Sounds good to me too


> > If we can't come up with any better ideas, we could simply limit the
> > msize in the Linux client to something like transport_size/2 or
> > transport_size/4, but it doesn't feel right.
> 
> That sounds like it would just reduce likeliness for the low buffer situation 
> to happen with Xen, it would not prevent it though.

Right


> As far as I understand you now, the problem with Xen transport seems to be 
> that clients can submit requests more quickly than the 9pfs server could 
> process. That would change the overall situation completely, because my hack 
> solution with delaying delivery (yield) suggested previously, was based on the 
> assumption that this low transport buffer scenario only happens once on boot, 
> but not possibly at any later point again and again.

Yes I think it could happen at any time, not just at boot.

From my understanding, I don't think it is necessarily due to QEMU being
slow. In fact, if QEMU was slow and the client fast, the reply ring
would be empty because the client would be up to speed, therefore msize
== transport_available_size, and there is no problem.

I think it happens when the client submits a bunch of requests but then
it is slow in reading back the replies. Hence, replies accumulate on the
ring, and transport_available_size < msize.

Given that currently the total ring size for replies is equal to msize,
we just need 2 inflight requests to potentially cause the situation.
For instance, a read of 10 bytes and a read of 128KB (max msize): if
issued together they lead to QEMU writing 10 bytes on the reply ring,
and then not having enough room to write 128KB anymore. This example is
a bit of an oversimplification because it doesn't account for the size of
the headers, but I hope you got my point.
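
Putting rough numbers on it, this time including the headers (Rread has an
11-byte fixed part; purely illustrative):

    #include <stdio.h>

    int main(void)
    {
        const unsigned long ring_size  = 128 * 1024;      /* reply ring == msize */
        const unsigned long rread_hdr  = 11;              /* size[4] type[1] tag[2] count[4] */
        const unsigned long small_resp = rread_hdr + 10;  /* Rread of 10 bytes, still unread */
        const unsigned long big_resp   = 128 * 1024;      /* a full-msize Rread */
        const unsigned long free_space = ring_size - small_resp;

        printf("free after the small reply: %lu bytes\n", free_space);
        printf("needed for the big reply:   %lu bytes\n", big_resp);
        printf("short by:                   %lu bytes\n", big_resp - free_space);
        return 0;
    }

So as soon as the small reply sits unread on the ring, the full-msize reply no
longer fits, which is the same kind of small shortfall as in the original
error message (126987 bytes needed vs. 126965 available).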

So maybe it is OK to delay delivery, because it seems to me that is
the issue.

Also I am thinking that if we are going to increase the ring size to
4MB, maybe we should take the opportunity to set msize=transport_size/2
just to reduce the likelihood of this kind of issue, in addition to a
proper fix.

So maybe transport=8MB and msize=4MB or transport=4MB and msize=2MB.


> > > As you can see it all boils down to one thing: better look for a solution
> > > for Xen to deliver the required buffer size, or in worst case: yield the
> > > coroutine, i.e. delay delivery of response until that can be assured by
> > > Xen.
> > That's might be the best option. But it would have to be
> > xen_9pfs_init_in_iov_from_pdu to call qemu_coroutine_yield() in a loop?
> > Like:
> > 
> > 
> > diff --git a/hw/9pfs/xen-9p-backend.c b/hw/9pfs/xen-9p-backend.c
> > index 18fe5b7c92..ef1db47d0c 100644
> > --- a/hw/9pfs/xen-9p-backend.c
> > +++ b/hw/9pfs/xen-9p-backend.c
> > @@ -197,18 +197,13 @@ static void xen_9pfs_init_in_iov_from_pdu(V9fsPDU
> > *pdu, g_free(ring->sg);
> > 
> >      ring->sg = g_new0(struct iovec, 2);
> > -    xen_9pfs_in_sg(ring, ring->sg, &num, pdu->idx, *size);
> > 
> > +again:
> > +    xen_9pfs_in_sg(ring, ring->sg, &num, pdu->idx, *size);
> >      buf_size = iov_size(ring->sg, num);
> > -    if (buf_size  < P9_IOHDRSZ) {
> > -        xen_pv_printf(&xen_9pfs->xendev, 0, "Xen 9pfs request type %d"
> > -                "needs %zu bytes, buffer has %zu, less than minimum\n",
> > -                pdu->id, *size, buf_size);
> > -        xen_be_set_state(&xen_9pfs->xendev, XenbusStateClosing);
> > -        xen_9pfs_disconnect(&xen_9pfs->xendev);
> > -    }
> >      if (buf_size  < *size) {
> > -        *size = buf_size;
> > +        qemu_coroutine_yield();
> > +        goto again;
> >      }
> > 
> >      *piov = ring->sg;
> > 
> > 
> > How is the coroutine going to get scheduled again? Is it automatic or do
> > we need to call it again explicitly when we get the next notification
> > from the client, which should arrive as soon as the client finishes
> > reading the responses?
> 
> I "think" the coroutine would be resumed in the next main loop iteration. But 
> I am still reading up on qemu's coroutine implementation details myself. So 
> that might work, yes, even though in a hackish way.
> 
> And as explained above, if this situation not only happens on boot, then you 
> might consider a different approach. There is also yield_until_fd_readable(int 
> fd); BTW. However the obvious step would be increasing Xen transport capacity 
> tremendously at first place.

Yes, I think we want to do three things:
- increase the transport size for performance
- set msize=transport_size/2 to decrease the likelihood of this kind of
  issues
- introduce a slow but safe fix when the issue happens, and
  qemu_coroutine_yield() or yield_until_fd_readable() could be OK if it
  is very rare



* Re: [PATCH 2/2] 9pfs: fix init_in_iov_from_pdu truncating size
  2020-05-14 16:24               ` Stefano Stabellini
@ 2020-05-14 17:21                 ` Christian Schoenebeck
  0 siblings, 0 replies; 19+ messages in thread
From: Christian Schoenebeck @ 2020-05-14 17:21 UTC (permalink / raw)
  To: Stefano Stabellini; +Cc: qemu-devel, Anthony Perard, Paul Durrant, Greg Kurz

On Donnerstag, 14. Mai 2020 18:24:58 CEST Stefano Stabellini wrote:
> > > The Xen transport
> > > (https://xenbits.xenproject.org/docs/4.13-testing/misc/9pfs.html) has a
> > > shared memory area with 64 pages, so 256KB. It is split in half: 128KB
> > > for requests and 128KB for responses.
> > 
> > That's very little. That also means you won't be able to ever achieve
> > decent 9pfs performance with Xen, because that requires a much larger
> > msize of approximately msize >= 4MB, depending on the underlying I/O
> > hardware, tendency growing (due to rising I/O hardware bandwidth).
> 
> That's great feedback. Fortunately the size of the transport is
> configurable so it is one liner away from becoming much bigger (well,
> two oneliners, see net/9p/trans_xen.c:XEN_9PFS_RING_ORDER in Linux, and
> hw/9pfs/xen-9p-backend.c:MAX_RING_ORDER in QEMU.)
> 
> Would you recommend 4MB each way, so a total of 8MB?

It would be an improvement for Xen, sure. But what if a user picks e.g. 
msize=100MB? Then you are back at square one, because if the guest does this 
on a huge file:

	const size_t sz = 5*1024*1024;
	char* buf = malloc(sz);
	// read file in 5MiB chunks
	while (read(fd, buf, sz) > 0) {
		...
	}

then you are still unable to handle this; the yield hack would loop forever.

> > As far as I understand you now, the problem with Xen transport seems to be
> > that clients can submit requests more quickly than the 9pfs server could
> > process. That would change the overall situation completely, because my
> > hack solution with delaying delivery (yield) suggested previously, was
> > based on the assumption that this low transport buffer scenario only
> > happens once on boot, but not possibly at any later point again and
> > again.
> 
> Yes I think it could happen at any time, not just at boot.
> 
> From my understanding, I don't think it is necessarily due to QEMU being
> slow. In fact, if QEMU was slow and the client fast, the reply ring
> would be empty because the client would be up to speed, therefore msize
> == transport_available_size, and there is no problem.

Right, if msize == transport_max_size then Xen easily runs into this 
situation, no matter how fast the client is at pulling responses. In this 
situation, a single response still pending to be pulled by the client is 
enough to trigger this Xen transport buffer underrun.

> So maybe it is OK to delaying delivery because it seems to me that is
> the issue.
> 
> Also I am thinking that if we are going to increase the ring size to
> 4MB, maybe we should take the opportunity to set msize=transport_size/2
> just to reduce the likelihood of this kind of issue, in addition to a
> proper fix.

To prevent the yield hack from looping forever, yes, you would at least need 
to limit 'msize' according to Xen's maximum transport buffer size. This must 
be done when the client opens a 9P session, i.e. when the client sends 
Tversion:

http://ericvh.github.io/9p-rfc/rfc9p2000.html#version

However, I would at least make the Xen transport buffer size configurable for 
administrators. You never know how large people want to pick 'msize' for 
performance reasons.
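
Conceptually something like this (a sketch only; the helper name
xen_9pfs_max_reply_size() and the s->msize / client_msize names are
placeholders, as is the exact spot in the Tversion handler):

    /* hypothetical helper: clamp the msize requested by the client in
     * Tversion to what the transport can deliver in a single response */
    static uint32_t clamp_msize(uint32_t requested, size_t transport_max)
    {
        return requested > transport_max ? (uint32_t)transport_max : requested;
    }

    /* e.g. in the Tversion handler, with a transport-provided limit: */
    s->msize = clamp_msize(client_msize, xen_9pfs_max_reply_size());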

> Yes, I think we want to do three things:
> - increase the transport size for performance
> - set msize=transport_size/2 to decrease the likehood of this kind of
>   issues

Sounds like a plan to me!

> - introduce a slow but safe fix when the issue happens, and
>   qemu_coroutine_yield() or yield_until_fd_readable() could be OK if it
>   is very rare

Yeah, and better test that this would not hang. ;-)

I would also call warn_report_once() in case this slow yield hack is 
triggered, to let the admin know that Xen's transport buffer should be 
increased to avoid this performance issue.

Likewise, call warn_report_once() if the server's Tversion request handler 
had to reduce msize.
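
I.e. roughly along these lines (the messages and the requested_msize /
effective_msize variable names are just placeholders):

    /* when the slow yield path kicks in because the reply ring is full */
    warn_report_once("xen-9pfs: reply delayed, transport ring is full; "
                     "consider increasing the Xen 9p ring size");

    /* when the Tversion handler had to shrink the client's msize */
    warn_report_once("9pfs: msize %u is larger than the transport supports, "
                     "reducing it to %u", requested_msize, effective_msize);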

Best regards,
Christian Schoenebeck




