[virtio-dev] [PATCH] virtio-net: use mtu size as buffer length for big packets

All of lore.kernel.org
 help / color / mirror / Atom feed

* [virtio-dev] [PATCH] virtio-net: use mtu size as buffer length for big packets
@ 2022-08-02  4:45 Gavin Li
  2022-08-04  5:00   ` Jason Wang
  2022-08-05 22:11   ` Si-Wei Liu
  0 siblings, 2 replies; 102+ messages in thread
From: Gavin Li @ 2022-08-02  4:45 UTC (permalink / raw)
  To: mst, stephen, davem, virtualization, virtio-dev,
	jesse.brandeburg, alexander.h.duyck, kubakici, sridhar.samudrala,
	jasowang, loseweigh
  Cc: gavinl, parav, gavi

Currently add_recvbuf_big() allocates MAX_SKB_FRAGS segments for big
packets even when GUEST_* offloads are not present on the device.
However, if GSO is not supported, it would be sufficient to allocate
segments to cover just up the MTU size and no further. Allocating the
maximum amount of segments results in a large waste of buffer space in
the queue, which limits the number of packets that can be buffered and
can result in reduced performance.

Therefore, if GSO is not supported, use the MTU to calculate the
optimal amount of segments required.

Below is the iperf TCP test results over a Mellanox NIC, using vDPA for
1 VQ, queue size 1024, before and after the change, with the iperf
server running over the virtio-net interface.

MTU(Bytes)/Bandwidth (Gbit/s)
             Before   After
  1500        22.5     22.4
  9000        12.8     25.9

Signed-off-by: Gavin Li <gavinl@nvidia.com>
Reviewed-by: Gavi Teitz <gavi@nvidia.com>
Reviewed-by: Parav Pandit <parav@nvidia.com>
---
 drivers/net/virtio_net.c | 20 ++++++++++++++++----
 1 file changed, 16 insertions(+), 4 deletions(-)

diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
index ec8e1b3108c3..d36918c1809d 100644
--- a/drivers/net/virtio_net.c
+++ b/drivers/net/virtio_net.c
@@ -222,6 +222,9 @@ struct virtnet_info {
 	/* I like... big packets and I cannot lie! */
 	bool big_packets;
 
+	/* Indicates GSO support */
+	bool gso_is_supported;
+
 	/* Host will merge rx buffers for big packets (shake it! shake it!) */
 	bool mergeable_rx_bufs;
 
@@ -1312,14 +1315,21 @@ static int add_recvbuf_small(struct virtnet_info *vi, struct receive_queue *rq,
 static int add_recvbuf_big(struct virtnet_info *vi, struct receive_queue *rq,
 			   gfp_t gfp)
 {
+	unsigned int sg_num = MAX_SKB_FRAGS;
 	struct page *first, *list = NULL;
 	char *p;
 	int i, err, offset;
 
-	sg_init_table(rq->sg, MAX_SKB_FRAGS + 2);
+	if (!vi->gso_is_supported) {
+		unsigned int mtu = vi->dev->mtu;
+
+		sg_num = (mtu % PAGE_SIZE) ? mtu / PAGE_SIZE + 1 : mtu / PAGE_SIZE;
+	}
+
+	sg_init_table(rq->sg, sg_num + 2);
 
 	/* page in rq->sg[MAX_SKB_FRAGS + 1] is list tail */
-	for (i = MAX_SKB_FRAGS + 1; i > 1; --i) {
+	for (i = sg_num + 1; i > 1; --i) {
 		first = get_a_page(rq, gfp);
 		if (!first) {
 			if (list)
@@ -1350,7 +1360,7 @@ static int add_recvbuf_big(struct virtnet_info *vi, struct receive_queue *rq,
 
 	/* chain first in list head */
 	first->private = (unsigned long)list;
-	err = virtqueue_add_inbuf(rq->vq, rq->sg, MAX_SKB_FRAGS + 2,
+	err = virtqueue_add_inbuf(rq->vq, rq->sg, sg_num + 2,
 				  first, gfp);
 	if (err < 0)
 		give_pages(rq, first);
@@ -3571,8 +3581,10 @@ static int virtnet_probe(struct virtio_device *vdev)
 	if (virtio_has_feature(vdev, VIRTIO_NET_F_GUEST_TSO4) ||
 	    virtio_has_feature(vdev, VIRTIO_NET_F_GUEST_TSO6) ||
 	    virtio_has_feature(vdev, VIRTIO_NET_F_GUEST_ECN) ||
-	    virtio_has_feature(vdev, VIRTIO_NET_F_GUEST_UFO))
+	    virtio_has_feature(vdev, VIRTIO_NET_F_GUEST_UFO)) {
 		vi->big_packets = true;
+		vi->gso_is_supported = true;
+	}
 
 	if (virtio_has_feature(vdev, VIRTIO_NET_F_MRG_RXBUF))
 		vi->mergeable_rx_bufs = true;
-- 
2.31.1


---------------------------------------------------------------------
To unsubscribe, e-mail: virtio-dev-unsubscribe@lists.oasis-open.org
For additional commands, e-mail: virtio-dev-help@lists.oasis-open.org


^ permalink raw reply related	[flat|nested] 102+ messages in thread

* Re: [virtio-dev] [PATCH] virtio-net: use mtu size as buffer length for big packets
  2022-08-02  4:45 [virtio-dev] [PATCH] virtio-net: use mtu size as buffer length for big packets Gavin Li
@ 2022-08-04  5:00   ` Jason Wang
  2022-08-05 22:11   ` Si-Wei Liu
  1 sibling, 0 replies; 102+ messages in thread
From: Jason Wang @ 2022-08-04  5:00 UTC (permalink / raw)
  To: Gavin Li
  Cc: alexander.h.duyck, Virtio-Dev, mst, kubakici, sridhar.samudrala,
	jesse.brandeburg, gavi, virtualization, Hemminger, Stephen,
	loseweigh, davem

On Tue, Aug 2, 2022 at 12:47 PM Gavin Li <gavinl@nvidia.com> wrote:
>
> Currently add_recvbuf_big() allocates MAX_SKB_FRAGS segments for big
> packets even when GUEST_* offloads are not present on the device.
> However, if GSO is not supported, it would be sufficient to allocate
> segments to cover just up the MTU size and no further. Allocating the
> maximum amount of segments results in a large waste of buffer space in
> the queue, which limits the number of packets that can be buffered and
> can result in reduced performance.
>
> Therefore, if GSO is not supported, use the MTU to calculate the
> optimal amount of segments required.
>
> Below is the iperf TCP test results over a Mellanox NIC, using vDPA for
> 1 VQ, queue size 1024, before and after the change, with the iperf
> server running over the virtio-net interface.
>
> MTU(Bytes)/Bandwidth (Gbit/s)
>              Before   After
>   1500        22.5     22.4
>   9000        12.8     25.9
>
> Signed-off-by: Gavin Li <gavinl@nvidia.com>
> Reviewed-by: Gavi Teitz <gavi@nvidia.com>
> Reviewed-by: Parav Pandit <parav@nvidia.com>
> ---
>  drivers/net/virtio_net.c | 20 ++++++++++++++++----
>  1 file changed, 16 insertions(+), 4 deletions(-)
>
> diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
> index ec8e1b3108c3..d36918c1809d 100644
> --- a/drivers/net/virtio_net.c
> +++ b/drivers/net/virtio_net.c
> @@ -222,6 +222,9 @@ struct virtnet_info {
>         /* I like... big packets and I cannot lie! */
>         bool big_packets;
>
> +       /* Indicates GSO support */
> +       bool gso_is_supported;
> +
>         /* Host will merge rx buffers for big packets (shake it! shake it!) */
>         bool mergeable_rx_bufs;
>
> @@ -1312,14 +1315,21 @@ static int add_recvbuf_small(struct virtnet_info *vi, struct receive_queue *rq,
>  static int add_recvbuf_big(struct virtnet_info *vi, struct receive_queue *rq,
>                            gfp_t gfp)
>  {
> +       unsigned int sg_num = MAX_SKB_FRAGS;
>         struct page *first, *list = NULL;
>         char *p;
>         int i, err, offset;
>
> -       sg_init_table(rq->sg, MAX_SKB_FRAGS + 2);
> +       if (!vi->gso_is_supported) {
> +               unsigned int mtu = vi->dev->mtu;
> +
> +               sg_num = (mtu % PAGE_SIZE) ? mtu / PAGE_SIZE + 1 : mtu / PAGE_SIZE;
> +       }
> +
> +       sg_init_table(rq->sg, sg_num + 2);
>
>         /* page in rq->sg[MAX_SKB_FRAGS + 1] is list tail */
> -       for (i = MAX_SKB_FRAGS + 1; i > 1; --i) {
> +       for (i = sg_num + 1; i > 1; --i) {
>                 first = get_a_page(rq, gfp);
>                 if (!first) {
>                         if (list)
> @@ -1350,7 +1360,7 @@ static int add_recvbuf_big(struct virtnet_info *vi, struct receive_queue *rq,
>
>         /* chain first in list head */
>         first->private = (unsigned long)list;
> -       err = virtqueue_add_inbuf(rq->vq, rq->sg, MAX_SKB_FRAGS + 2,
> +       err = virtqueue_add_inbuf(rq->vq, rq->sg, sg_num + 2,
>                                   first, gfp);
>         if (err < 0)
>                 give_pages(rq, first);
> @@ -3571,8 +3581,10 @@ static int virtnet_probe(struct virtio_device *vdev)
>         if (virtio_has_feature(vdev, VIRTIO_NET_F_GUEST_TSO4) ||
>             virtio_has_feature(vdev, VIRTIO_NET_F_GUEST_TSO6) ||
>             virtio_has_feature(vdev, VIRTIO_NET_F_GUEST_ECN) ||
> -           virtio_has_feature(vdev, VIRTIO_NET_F_GUEST_UFO))
> +           virtio_has_feature(vdev, VIRTIO_NET_F_GUEST_UFO)) {
>                 vi->big_packets = true;
> +               vi->gso_is_supported = true;

Why not simply re-use big_packets here?

Thanks

> +       }
>
>         if (virtio_has_feature(vdev, VIRTIO_NET_F_MRG_RXBUF))
>                 vi->mergeable_rx_bufs = true;
> --
> 2.31.1
>
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: virtio-dev-unsubscribe@lists.oasis-open.org
> For additional commands, e-mail: virtio-dev-help@lists.oasis-open.org
>

_______________________________________________
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization

^ permalink raw reply	[flat|nested] 102+ messages in thread

* Re: [virtio-dev] [PATCH] virtio-net: use mtu size as buffer length for big packets
@ 2022-08-04  5:00   ` Jason Wang
  0 siblings, 0 replies; 102+ messages in thread
From: Jason Wang @ 2022-08-04  5:00 UTC (permalink / raw)
  To: Gavin Li
  Cc: mst, Hemminger, Stephen, davem, virtualization, Virtio-Dev,
	jesse.brandeburg, alexander.h.duyck, kubakici, sridhar.samudrala,
	loseweigh, Parav Pandit, gavi

On Tue, Aug 2, 2022 at 12:47 PM Gavin Li <gavinl@nvidia.com> wrote:
>
> Currently add_recvbuf_big() allocates MAX_SKB_FRAGS segments for big
> packets even when GUEST_* offloads are not present on the device.
> However, if GSO is not supported, it would be sufficient to allocate
> segments to cover just up the MTU size and no further. Allocating the
> maximum amount of segments results in a large waste of buffer space in
> the queue, which limits the number of packets that can be buffered and
> can result in reduced performance.
>
> Therefore, if GSO is not supported, use the MTU to calculate the
> optimal amount of segments required.
>
> Below is the iperf TCP test results over a Mellanox NIC, using vDPA for
> 1 VQ, queue size 1024, before and after the change, with the iperf
> server running over the virtio-net interface.
>
> MTU(Bytes)/Bandwidth (Gbit/s)
>              Before   After
>   1500        22.5     22.4
>   9000        12.8     25.9
>
> Signed-off-by: Gavin Li <gavinl@nvidia.com>
> Reviewed-by: Gavi Teitz <gavi@nvidia.com>
> Reviewed-by: Parav Pandit <parav@nvidia.com>
> ---
>  drivers/net/virtio_net.c | 20 ++++++++++++++++----
>  1 file changed, 16 insertions(+), 4 deletions(-)
>
> diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
> index ec8e1b3108c3..d36918c1809d 100644
> --- a/drivers/net/virtio_net.c
> +++ b/drivers/net/virtio_net.c
> @@ -222,6 +222,9 @@ struct virtnet_info {
>         /* I like... big packets and I cannot lie! */
>         bool big_packets;
>
> +       /* Indicates GSO support */
> +       bool gso_is_supported;
> +
>         /* Host will merge rx buffers for big packets (shake it! shake it!) */
>         bool mergeable_rx_bufs;
>
> @@ -1312,14 +1315,21 @@ static int add_recvbuf_small(struct virtnet_info *vi, struct receive_queue *rq,
>  static int add_recvbuf_big(struct virtnet_info *vi, struct receive_queue *rq,
>                            gfp_t gfp)
>  {
> +       unsigned int sg_num = MAX_SKB_FRAGS;
>         struct page *first, *list = NULL;
>         char *p;
>         int i, err, offset;
>
> -       sg_init_table(rq->sg, MAX_SKB_FRAGS + 2);
> +       if (!vi->gso_is_supported) {
> +               unsigned int mtu = vi->dev->mtu;
> +
> +               sg_num = (mtu % PAGE_SIZE) ? mtu / PAGE_SIZE + 1 : mtu / PAGE_SIZE;
> +       }
> +
> +       sg_init_table(rq->sg, sg_num + 2);
>
>         /* page in rq->sg[MAX_SKB_FRAGS + 1] is list tail */
> -       for (i = MAX_SKB_FRAGS + 1; i > 1; --i) {
> +       for (i = sg_num + 1; i > 1; --i) {
>                 first = get_a_page(rq, gfp);
>                 if (!first) {
>                         if (list)
> @@ -1350,7 +1360,7 @@ static int add_recvbuf_big(struct virtnet_info *vi, struct receive_queue *rq,
>
>         /* chain first in list head */
>         first->private = (unsigned long)list;
> -       err = virtqueue_add_inbuf(rq->vq, rq->sg, MAX_SKB_FRAGS + 2,
> +       err = virtqueue_add_inbuf(rq->vq, rq->sg, sg_num + 2,
>                                   first, gfp);
>         if (err < 0)
>                 give_pages(rq, first);
> @@ -3571,8 +3581,10 @@ static int virtnet_probe(struct virtio_device *vdev)
>         if (virtio_has_feature(vdev, VIRTIO_NET_F_GUEST_TSO4) ||
>             virtio_has_feature(vdev, VIRTIO_NET_F_GUEST_TSO6) ||
>             virtio_has_feature(vdev, VIRTIO_NET_F_GUEST_ECN) ||
> -           virtio_has_feature(vdev, VIRTIO_NET_F_GUEST_UFO))
> +           virtio_has_feature(vdev, VIRTIO_NET_F_GUEST_UFO)) {
>                 vi->big_packets = true;
> +               vi->gso_is_supported = true;

Why not simply re-use big_packets here?

Thanks

> +       }
>
>         if (virtio_has_feature(vdev, VIRTIO_NET_F_MRG_RXBUF))
>                 vi->mergeable_rx_bufs = true;
> --
> 2.31.1
>
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: virtio-dev-unsubscribe@lists.oasis-open.org
> For additional commands, e-mail: virtio-dev-help@lists.oasis-open.org
>


---------------------------------------------------------------------
To unsubscribe, e-mail: virtio-dev-unsubscribe@lists.oasis-open.org
For additional commands, e-mail: virtio-dev-help@lists.oasis-open.org


^ permalink raw reply	[flat|nested] 102+ messages in thread

* Re: [virtio-dev] [PATCH] virtio-net: use mtu size as buffer length for big packets
  2022-08-04  5:00   ` Jason Wang
@ 2022-08-04  7:10     ` Michael S. Tsirkin
  -1 siblings, 0 replies; 102+ messages in thread
From: Michael S. Tsirkin @ 2022-08-04  7:10 UTC (permalink / raw)
  To: Jason Wang
  Cc: alexander.h.duyck, Virtio-Dev, kubakici, sridhar.samudrala,
	jesse.brandeburg, gavi, virtualization, Hemminger, Stephen,
	loseweigh, davem, Gavin Li

On Thu, Aug 04, 2022 at 01:00:46PM +0800, Jason Wang wrote:
> On Tue, Aug 2, 2022 at 12:47 PM Gavin Li <gavinl@nvidia.com> wrote:
> >
> > Currently add_recvbuf_big() allocates MAX_SKB_FRAGS segments for big
> > packets even when GUEST_* offloads are not present on the device.
> > However, if GSO is not supported, it would be sufficient to allocate
> > segments to cover just up the MTU size and no further. Allocating the
> > maximum amount of segments results in a large waste of buffer space in
> > the queue, which limits the number of packets that can be buffered and
> > can result in reduced performance.
> >
> > Therefore, if GSO is not supported, use the MTU to calculate the
> > optimal amount of segments required.
> >
> > Below is the iperf TCP test results over a Mellanox NIC, using vDPA for
> > 1 VQ, queue size 1024, before and after the change, with the iperf
> > server running over the virtio-net interface.
> >
> > MTU(Bytes)/Bandwidth (Gbit/s)
> >              Before   After
> >   1500        22.5     22.4
> >   9000        12.8     25.9
> >
> > Signed-off-by: Gavin Li <gavinl@nvidia.com>
> > Reviewed-by: Gavi Teitz <gavi@nvidia.com>
> > Reviewed-by: Parav Pandit <parav@nvidia.com>
> > ---
> >  drivers/net/virtio_net.c | 20 ++++++++++++++++----
> >  1 file changed, 16 insertions(+), 4 deletions(-)
> >
> > diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
> > index ec8e1b3108c3..d36918c1809d 100644
> > --- a/drivers/net/virtio_net.c
> > +++ b/drivers/net/virtio_net.c
> > @@ -222,6 +222,9 @@ struct virtnet_info {
> >         /* I like... big packets and I cannot lie! */
> >         bool big_packets;
> >
> > +       /* Indicates GSO support */
> > +       bool gso_is_supported;
> > +
> >         /* Host will merge rx buffers for big packets (shake it! shake it!) */
> >         bool mergeable_rx_bufs;
> >
> > @@ -1312,14 +1315,21 @@ static int add_recvbuf_small(struct virtnet_info *vi, struct receive_queue *rq,
> >  static int add_recvbuf_big(struct virtnet_info *vi, struct receive_queue *rq,
> >                            gfp_t gfp)
> >  {
> > +       unsigned int sg_num = MAX_SKB_FRAGS;
> >         struct page *first, *list = NULL;
> >         char *p;
> >         int i, err, offset;
> >
> > -       sg_init_table(rq->sg, MAX_SKB_FRAGS + 2);
> > +       if (!vi->gso_is_supported) {
> > +               unsigned int mtu = vi->dev->mtu;
> > +
> > +               sg_num = (mtu % PAGE_SIZE) ? mtu / PAGE_SIZE + 1 : mtu / PAGE_SIZE;
> > +       }
> > +
> > +       sg_init_table(rq->sg, sg_num + 2);
> >
> >         /* page in rq->sg[MAX_SKB_FRAGS + 1] is list tail */
> > -       for (i = MAX_SKB_FRAGS + 1; i > 1; --i) {
> > +       for (i = sg_num + 1; i > 1; --i) {
> >                 first = get_a_page(rq, gfp);
> >                 if (!first) {
> >                         if (list)
> > @@ -1350,7 +1360,7 @@ static int add_recvbuf_big(struct virtnet_info *vi, struct receive_queue *rq,
> >
> >         /* chain first in list head */
> >         first->private = (unsigned long)list;
> > -       err = virtqueue_add_inbuf(rq->vq, rq->sg, MAX_SKB_FRAGS + 2,
> > +       err = virtqueue_add_inbuf(rq->vq, rq->sg, sg_num + 2,
> >                                   first, gfp);
> >         if (err < 0)
> >                 give_pages(rq, first);
> > @@ -3571,8 +3581,10 @@ static int virtnet_probe(struct virtio_device *vdev)
> >         if (virtio_has_feature(vdev, VIRTIO_NET_F_GUEST_TSO4) ||
> >             virtio_has_feature(vdev, VIRTIO_NET_F_GUEST_TSO6) ||
> >             virtio_has_feature(vdev, VIRTIO_NET_F_GUEST_ECN) ||
> > -           virtio_has_feature(vdev, VIRTIO_NET_F_GUEST_UFO))
> > +           virtio_has_feature(vdev, VIRTIO_NET_F_GUEST_UFO)) {
> >                 vi->big_packets = true;
> > +               vi->gso_is_supported = true;
> 
> Why not simply re-use big_packets here?
> 
> Thanks

I don't get this question. The patch does use big_packets, it wants
to figure out guest GSO is off so MTU limits the size. 
The name "gso_is_supported" is confusing, should be e.g. guest_gso.


> > +       }
> >
> >         if (virtio_has_feature(vdev, VIRTIO_NET_F_MRG_RXBUF))
> >                 vi->mergeable_rx_bufs = true;
> > --
> > 2.31.1
> >
> >
> > ---------------------------------------------------------------------
> > To unsubscribe, e-mail: virtio-dev-unsubscribe@lists.oasis-open.org
> > For additional commands, e-mail: virtio-dev-help@lists.oasis-open.org
> >

_______________________________________________
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization

^ permalink raw reply	[flat|nested] 102+ messages in thread

* Re: [virtio-dev] [PATCH] virtio-net: use mtu size as buffer length for big packets
@ 2022-08-04  7:10     ` Michael S. Tsirkin
  0 siblings, 0 replies; 102+ messages in thread
From: Michael S. Tsirkin @ 2022-08-04  7:10 UTC (permalink / raw)
  To: Jason Wang
  Cc: Gavin Li, Hemminger, Stephen, davem, virtualization, Virtio-Dev,
	jesse.brandeburg, alexander.h.duyck, kubakici, sridhar.samudrala,
	loseweigh, Parav Pandit, gavi

On Thu, Aug 04, 2022 at 01:00:46PM +0800, Jason Wang wrote:
> On Tue, Aug 2, 2022 at 12:47 PM Gavin Li <gavinl@nvidia.com> wrote:
> >
> > Currently add_recvbuf_big() allocates MAX_SKB_FRAGS segments for big
> > packets even when GUEST_* offloads are not present on the device.
> > However, if GSO is not supported, it would be sufficient to allocate
> > segments to cover just up the MTU size and no further. Allocating the
> > maximum amount of segments results in a large waste of buffer space in
> > the queue, which limits the number of packets that can be buffered and
> > can result in reduced performance.
> >
> > Therefore, if GSO is not supported, use the MTU to calculate the
> > optimal amount of segments required.
> >
> > Below is the iperf TCP test results over a Mellanox NIC, using vDPA for
> > 1 VQ, queue size 1024, before and after the change, with the iperf
> > server running over the virtio-net interface.
> >
> > MTU(Bytes)/Bandwidth (Gbit/s)
> >              Before   After
> >   1500        22.5     22.4
> >   9000        12.8     25.9
> >
> > Signed-off-by: Gavin Li <gavinl@nvidia.com>
> > Reviewed-by: Gavi Teitz <gavi@nvidia.com>
> > Reviewed-by: Parav Pandit <parav@nvidia.com>
> > ---
> >  drivers/net/virtio_net.c | 20 ++++++++++++++++----
> >  1 file changed, 16 insertions(+), 4 deletions(-)
> >
> > diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
> > index ec8e1b3108c3..d36918c1809d 100644
> > --- a/drivers/net/virtio_net.c
> > +++ b/drivers/net/virtio_net.c
> > @@ -222,6 +222,9 @@ struct virtnet_info {
> >         /* I like... big packets and I cannot lie! */
> >         bool big_packets;
> >
> > +       /* Indicates GSO support */
> > +       bool gso_is_supported;
> > +
> >         /* Host will merge rx buffers for big packets (shake it! shake it!) */
> >         bool mergeable_rx_bufs;
> >
> > @@ -1312,14 +1315,21 @@ static int add_recvbuf_small(struct virtnet_info *vi, struct receive_queue *rq,
> >  static int add_recvbuf_big(struct virtnet_info *vi, struct receive_queue *rq,
> >                            gfp_t gfp)
> >  {
> > +       unsigned int sg_num = MAX_SKB_FRAGS;
> >         struct page *first, *list = NULL;
> >         char *p;
> >         int i, err, offset;
> >
> > -       sg_init_table(rq->sg, MAX_SKB_FRAGS + 2);
> > +       if (!vi->gso_is_supported) {
> > +               unsigned int mtu = vi->dev->mtu;
> > +
> > +               sg_num = (mtu % PAGE_SIZE) ? mtu / PAGE_SIZE + 1 : mtu / PAGE_SIZE;
> > +       }
> > +
> > +       sg_init_table(rq->sg, sg_num + 2);
> >
> >         /* page in rq->sg[MAX_SKB_FRAGS + 1] is list tail */
> > -       for (i = MAX_SKB_FRAGS + 1; i > 1; --i) {
> > +       for (i = sg_num + 1; i > 1; --i) {
> >                 first = get_a_page(rq, gfp);
> >                 if (!first) {
> >                         if (list)
> > @@ -1350,7 +1360,7 @@ static int add_recvbuf_big(struct virtnet_info *vi, struct receive_queue *rq,
> >
> >         /* chain first in list head */
> >         first->private = (unsigned long)list;
> > -       err = virtqueue_add_inbuf(rq->vq, rq->sg, MAX_SKB_FRAGS + 2,
> > +       err = virtqueue_add_inbuf(rq->vq, rq->sg, sg_num + 2,
> >                                   first, gfp);
> >         if (err < 0)
> >                 give_pages(rq, first);
> > @@ -3571,8 +3581,10 @@ static int virtnet_probe(struct virtio_device *vdev)
> >         if (virtio_has_feature(vdev, VIRTIO_NET_F_GUEST_TSO4) ||
> >             virtio_has_feature(vdev, VIRTIO_NET_F_GUEST_TSO6) ||
> >             virtio_has_feature(vdev, VIRTIO_NET_F_GUEST_ECN) ||
> > -           virtio_has_feature(vdev, VIRTIO_NET_F_GUEST_UFO))
> > +           virtio_has_feature(vdev, VIRTIO_NET_F_GUEST_UFO)) {
> >                 vi->big_packets = true;
> > +               vi->gso_is_supported = true;
> 
> Why not simply re-use big_packets here?
> 
> Thanks

I don't get this question. The patch does use big_packets, it wants
to figure out guest GSO is off so MTU limits the size. 
The name "gso_is_supported" is confusing, should be e.g. guest_gso.


> > +       }
> >
> >         if (virtio_has_feature(vdev, VIRTIO_NET_F_MRG_RXBUF))
> >                 vi->mergeable_rx_bufs = true;
> > --
> > 2.31.1
> >
> >
> > ---------------------------------------------------------------------
> > To unsubscribe, e-mail: virtio-dev-unsubscribe@lists.oasis-open.org
> > For additional commands, e-mail: virtio-dev-help@lists.oasis-open.org
> >


---------------------------------------------------------------------
To unsubscribe, e-mail: virtio-dev-unsubscribe@lists.oasis-open.org
For additional commands, e-mail: virtio-dev-help@lists.oasis-open.org


^ permalink raw reply	[flat|nested] 102+ messages in thread

* Re: [virtio-dev] [PATCH] virtio-net: use mtu size as buffer length for big packets
  2022-08-04  7:10     ` Michael S. Tsirkin
@ 2022-08-04  7:23       ` Jason Wang
  -1 siblings, 0 replies; 102+ messages in thread
From: Jason Wang @ 2022-08-04  7:23 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: alexander.h.duyck, Virtio-Dev, kubakici, sridhar.samudrala,
	jesse.brandeburg, gavi, virtualization, Hemminger, Stephen,
	loseweigh, davem, Gavin Li

On Thu, Aug 4, 2022 at 3:10 PM Michael S. Tsirkin <mst@redhat.com> wrote:
>
> On Thu, Aug 04, 2022 at 01:00:46PM +0800, Jason Wang wrote:
> > On Tue, Aug 2, 2022 at 12:47 PM Gavin Li <gavinl@nvidia.com> wrote:
> > >
> > > Currently add_recvbuf_big() allocates MAX_SKB_FRAGS segments for big
> > > packets even when GUEST_* offloads are not present on the device.
> > > However, if GSO is not supported, it would be sufficient to allocate
> > > segments to cover just up the MTU size and no further. Allocating the
> > > maximum amount of segments results in a large waste of buffer space in
> > > the queue, which limits the number of packets that can be buffered and
> > > can result in reduced performance.
> > >
> > > Therefore, if GSO is not supported, use the MTU to calculate the
> > > optimal amount of segments required.
> > >
> > > Below is the iperf TCP test results over a Mellanox NIC, using vDPA for
> > > 1 VQ, queue size 1024, before and after the change, with the iperf
> > > server running over the virtio-net interface.
> > >
> > > MTU(Bytes)/Bandwidth (Gbit/s)
> > >              Before   After
> > >   1500        22.5     22.4
> > >   9000        12.8     25.9
> > >
> > > Signed-off-by: Gavin Li <gavinl@nvidia.com>
> > > Reviewed-by: Gavi Teitz <gavi@nvidia.com>
> > > Reviewed-by: Parav Pandit <parav@nvidia.com>
> > > ---
> > >  drivers/net/virtio_net.c | 20 ++++++++++++++++----
> > >  1 file changed, 16 insertions(+), 4 deletions(-)
> > >
> > > diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
> > > index ec8e1b3108c3..d36918c1809d 100644
> > > --- a/drivers/net/virtio_net.c
> > > +++ b/drivers/net/virtio_net.c
> > > @@ -222,6 +222,9 @@ struct virtnet_info {
> > >         /* I like... big packets and I cannot lie! */
> > >         bool big_packets;
> > >
> > > +       /* Indicates GSO support */
> > > +       bool gso_is_supported;
> > > +
> > >         /* Host will merge rx buffers for big packets (shake it! shake it!) */
> > >         bool mergeable_rx_bufs;
> > >
> > > @@ -1312,14 +1315,21 @@ static int add_recvbuf_small(struct virtnet_info *vi, struct receive_queue *rq,
> > >  static int add_recvbuf_big(struct virtnet_info *vi, struct receive_queue *rq,
> > >                            gfp_t gfp)
> > >  {
> > > +       unsigned int sg_num = MAX_SKB_FRAGS;
> > >         struct page *first, *list = NULL;
> > >         char *p;
> > >         int i, err, offset;
> > >
> > > -       sg_init_table(rq->sg, MAX_SKB_FRAGS + 2);
> > > +       if (!vi->gso_is_supported) {
> > > +               unsigned int mtu = vi->dev->mtu;
> > > +
> > > +               sg_num = (mtu % PAGE_SIZE) ? mtu / PAGE_SIZE + 1 : mtu / PAGE_SIZE;
> > > +       }
> > > +
> > > +       sg_init_table(rq->sg, sg_num + 2);
> > >
> > >         /* page in rq->sg[MAX_SKB_FRAGS + 1] is list tail */
> > > -       for (i = MAX_SKB_FRAGS + 1; i > 1; --i) {
> > > +       for (i = sg_num + 1; i > 1; --i) {
> > >                 first = get_a_page(rq, gfp);
> > >                 if (!first) {
> > >                         if (list)
> > > @@ -1350,7 +1360,7 @@ static int add_recvbuf_big(struct virtnet_info *vi, struct receive_queue *rq,
> > >
> > >         /* chain first in list head */
> > >         first->private = (unsigned long)list;
> > > -       err = virtqueue_add_inbuf(rq->vq, rq->sg, MAX_SKB_FRAGS + 2,
> > > +       err = virtqueue_add_inbuf(rq->vq, rq->sg, sg_num + 2,
> > >                                   first, gfp);
> > >         if (err < 0)
> > >                 give_pages(rq, first);
> > > @@ -3571,8 +3581,10 @@ static int virtnet_probe(struct virtio_device *vdev)
> > >         if (virtio_has_feature(vdev, VIRTIO_NET_F_GUEST_TSO4) ||
> > >             virtio_has_feature(vdev, VIRTIO_NET_F_GUEST_TSO6) ||
> > >             virtio_has_feature(vdev, VIRTIO_NET_F_GUEST_ECN) ||
> > > -           virtio_has_feature(vdev, VIRTIO_NET_F_GUEST_UFO))
> > > +           virtio_has_feature(vdev, VIRTIO_NET_F_GUEST_UFO)) {
> > >                 vi->big_packets = true;
> > > +               vi->gso_is_supported = true;
> >
> > Why not simply re-use big_packets here?
> >
> > Thanks
>
> I don't get this question. The patch does use big_packets, it wants
> to figure out guest GSO is off so MTU limits the size.

Yes.

Thanks

> The name "gso_is_supported" is confusing, should be e.g. guest_gso.
>
>
> > > +       }
> > >
> > >         if (virtio_has_feature(vdev, VIRTIO_NET_F_MRG_RXBUF))
> > >                 vi->mergeable_rx_bufs = true;
> > > --
> > > 2.31.1
> > >
> > >
> > > ---------------------------------------------------------------------
> > > To unsubscribe, e-mail: virtio-dev-unsubscribe@lists.oasis-open.org
> > > For additional commands, e-mail: virtio-dev-help@lists.oasis-open.org
> > >
>

_______________________________________________
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization

^ permalink raw reply	[flat|nested] 102+ messages in thread

* Re: [virtio-dev] [PATCH] virtio-net: use mtu size as buffer length for big packets
@ 2022-08-04  7:23       ` Jason Wang
  0 siblings, 0 replies; 102+ messages in thread
From: Jason Wang @ 2022-08-04  7:23 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: Gavin Li, Hemminger, Stephen, davem, virtualization, Virtio-Dev,
	jesse.brandeburg, alexander.h.duyck, kubakici, sridhar.samudrala,
	loseweigh, Parav Pandit, gavi

On Thu, Aug 4, 2022 at 3:10 PM Michael S. Tsirkin <mst@redhat.com> wrote:
>
> On Thu, Aug 04, 2022 at 01:00:46PM +0800, Jason Wang wrote:
> > On Tue, Aug 2, 2022 at 12:47 PM Gavin Li <gavinl@nvidia.com> wrote:
> > >
> > > Currently add_recvbuf_big() allocates MAX_SKB_FRAGS segments for big
> > > packets even when GUEST_* offloads are not present on the device.
> > > However, if GSO is not supported, it would be sufficient to allocate
> > > segments to cover just up the MTU size and no further. Allocating the
> > > maximum amount of segments results in a large waste of buffer space in
> > > the queue, which limits the number of packets that can be buffered and
> > > can result in reduced performance.
> > >
> > > Therefore, if GSO is not supported, use the MTU to calculate the
> > > optimal amount of segments required.
> > >
> > > Below is the iperf TCP test results over a Mellanox NIC, using vDPA for
> > > 1 VQ, queue size 1024, before and after the change, with the iperf
> > > server running over the virtio-net interface.
> > >
> > > MTU(Bytes)/Bandwidth (Gbit/s)
> > >              Before   After
> > >   1500        22.5     22.4
> > >   9000        12.8     25.9
> > >
> > > Signed-off-by: Gavin Li <gavinl@nvidia.com>
> > > Reviewed-by: Gavi Teitz <gavi@nvidia.com>
> > > Reviewed-by: Parav Pandit <parav@nvidia.com>
> > > ---
> > >  drivers/net/virtio_net.c | 20 ++++++++++++++++----
> > >  1 file changed, 16 insertions(+), 4 deletions(-)
> > >
> > > diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
> > > index ec8e1b3108c3..d36918c1809d 100644
> > > --- a/drivers/net/virtio_net.c
> > > +++ b/drivers/net/virtio_net.c
> > > @@ -222,6 +222,9 @@ struct virtnet_info {
> > >         /* I like... big packets and I cannot lie! */
> > >         bool big_packets;
> > >
> > > +       /* Indicates GSO support */
> > > +       bool gso_is_supported;
> > > +
> > >         /* Host will merge rx buffers for big packets (shake it! shake it!) */
> > >         bool mergeable_rx_bufs;
> > >
> > > @@ -1312,14 +1315,21 @@ static int add_recvbuf_small(struct virtnet_info *vi, struct receive_queue *rq,
> > >  static int add_recvbuf_big(struct virtnet_info *vi, struct receive_queue *rq,
> > >                            gfp_t gfp)
> > >  {
> > > +       unsigned int sg_num = MAX_SKB_FRAGS;
> > >         struct page *first, *list = NULL;
> > >         char *p;
> > >         int i, err, offset;
> > >
> > > -       sg_init_table(rq->sg, MAX_SKB_FRAGS + 2);
> > > +       if (!vi->gso_is_supported) {
> > > +               unsigned int mtu = vi->dev->mtu;
> > > +
> > > +               sg_num = (mtu % PAGE_SIZE) ? mtu / PAGE_SIZE + 1 : mtu / PAGE_SIZE;
> > > +       }
> > > +
> > > +       sg_init_table(rq->sg, sg_num + 2);
> > >
> > >         /* page in rq->sg[MAX_SKB_FRAGS + 1] is list tail */
> > > -       for (i = MAX_SKB_FRAGS + 1; i > 1; --i) {
> > > +       for (i = sg_num + 1; i > 1; --i) {
> > >                 first = get_a_page(rq, gfp);
> > >                 if (!first) {
> > >                         if (list)
> > > @@ -1350,7 +1360,7 @@ static int add_recvbuf_big(struct virtnet_info *vi, struct receive_queue *rq,
> > >
> > >         /* chain first in list head */
> > >         first->private = (unsigned long)list;
> > > -       err = virtqueue_add_inbuf(rq->vq, rq->sg, MAX_SKB_FRAGS + 2,
> > > +       err = virtqueue_add_inbuf(rq->vq, rq->sg, sg_num + 2,
> > >                                   first, gfp);
> > >         if (err < 0)
> > >                 give_pages(rq, first);
> > > @@ -3571,8 +3581,10 @@ static int virtnet_probe(struct virtio_device *vdev)
> > >         if (virtio_has_feature(vdev, VIRTIO_NET_F_GUEST_TSO4) ||
> > >             virtio_has_feature(vdev, VIRTIO_NET_F_GUEST_TSO6) ||
> > >             virtio_has_feature(vdev, VIRTIO_NET_F_GUEST_ECN) ||
> > > -           virtio_has_feature(vdev, VIRTIO_NET_F_GUEST_UFO))
> > > +           virtio_has_feature(vdev, VIRTIO_NET_F_GUEST_UFO)) {
> > >                 vi->big_packets = true;
> > > +               vi->gso_is_supported = true;
> >
> > Why not simply re-use big_packets here?
> >
> > Thanks
>
> I don't get this question. The patch does use big_packets, it wants
> to figure out guest GSO is off so MTU limits the size.

Yes.

Thanks

> The name "gso_is_supported" is confusing, should be e.g. guest_gso.
>
>
> > > +       }
> > >
> > >         if (virtio_has_feature(vdev, VIRTIO_NET_F_MRG_RXBUF))
> > >                 vi->mergeable_rx_bufs = true;
> > > --
> > > 2.31.1
> > >
> > >
> > > ---------------------------------------------------------------------
> > > To unsubscribe, e-mail: virtio-dev-unsubscribe@lists.oasis-open.org
> > > For additional commands, e-mail: virtio-dev-help@lists.oasis-open.org
> > >
>


---------------------------------------------------------------------
To unsubscribe, e-mail: virtio-dev-unsubscribe@lists.oasis-open.org
For additional commands, e-mail: virtio-dev-help@lists.oasis-open.org


^ permalink raw reply	[flat|nested] 102+ messages in thread

* Re: [virtio-dev] [PATCH] virtio-net: use mtu size as buffer length for big packets
  2022-08-04  7:23       ` Jason Wang
@ 2022-08-04  7:24         ` Jason Wang
  -1 siblings, 0 replies; 102+ messages in thread
From: Jason Wang @ 2022-08-04  7:24 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: alexander.h.duyck, Virtio-Dev, kubakici, sridhar.samudrala,
	jesse.brandeburg, gavi, virtualization, Hemminger, Stephen,
	loseweigh, davem, Gavin Li

On Thu, Aug 4, 2022 at 3:23 PM Jason Wang <jasowang@redhat.com> wrote:
>
> On Thu, Aug 4, 2022 at 3:10 PM Michael S. Tsirkin <mst@redhat.com> wrote:
> >
> > On Thu, Aug 04, 2022 at 01:00:46PM +0800, Jason Wang wrote:
> > > On Tue, Aug 2, 2022 at 12:47 PM Gavin Li <gavinl@nvidia.com> wrote:
> > > >
> > > > Currently add_recvbuf_big() allocates MAX_SKB_FRAGS segments for big
> > > > packets even when GUEST_* offloads are not present on the device.
> > > > However, if GSO is not supported, it would be sufficient to allocate
> > > > segments to cover just up the MTU size and no further. Allocating the
> > > > maximum amount of segments results in a large waste of buffer space in
> > > > the queue, which limits the number of packets that can be buffered and
> > > > can result in reduced performance.
> > > >
> > > > Therefore, if GSO is not supported, use the MTU to calculate the
> > > > optimal amount of segments required.
> > > >
> > > > Below is the iperf TCP test results over a Mellanox NIC, using vDPA for
> > > > 1 VQ, queue size 1024, before and after the change, with the iperf
> > > > server running over the virtio-net interface.
> > > >
> > > > MTU(Bytes)/Bandwidth (Gbit/s)
> > > >              Before   After
> > > >   1500        22.5     22.4
> > > >   9000        12.8     25.9
> > > >
> > > > Signed-off-by: Gavin Li <gavinl@nvidia.com>
> > > > Reviewed-by: Gavi Teitz <gavi@nvidia.com>
> > > > Reviewed-by: Parav Pandit <parav@nvidia.com>
> > > > ---
> > > >  drivers/net/virtio_net.c | 20 ++++++++++++++++----
> > > >  1 file changed, 16 insertions(+), 4 deletions(-)
> > > >
> > > > diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
> > > > index ec8e1b3108c3..d36918c1809d 100644
> > > > --- a/drivers/net/virtio_net.c
> > > > +++ b/drivers/net/virtio_net.c
> > > > @@ -222,6 +222,9 @@ struct virtnet_info {
> > > >         /* I like... big packets and I cannot lie! */
> > > >         bool big_packets;
> > > >
> > > > +       /* Indicates GSO support */
> > > > +       bool gso_is_supported;
> > > > +
> > > >         /* Host will merge rx buffers for big packets (shake it! shake it!) */
> > > >         bool mergeable_rx_bufs;
> > > >
> > > > @@ -1312,14 +1315,21 @@ static int add_recvbuf_small(struct virtnet_info *vi, struct receive_queue *rq,
> > > >  static int add_recvbuf_big(struct virtnet_info *vi, struct receive_queue *rq,
> > > >                            gfp_t gfp)
> > > >  {
> > > > +       unsigned int sg_num = MAX_SKB_FRAGS;
> > > >         struct page *first, *list = NULL;
> > > >         char *p;
> > > >         int i, err, offset;
> > > >
> > > > -       sg_init_table(rq->sg, MAX_SKB_FRAGS + 2);
> > > > +       if (!vi->gso_is_supported) {
> > > > +               unsigned int mtu = vi->dev->mtu;
> > > > +
> > > > +               sg_num = (mtu % PAGE_SIZE) ? mtu / PAGE_SIZE + 1 : mtu / PAGE_SIZE;
> > > > +       }
> > > > +
> > > > +       sg_init_table(rq->sg, sg_num + 2);
> > > >
> > > >         /* page in rq->sg[MAX_SKB_FRAGS + 1] is list tail */
> > > > -       for (i = MAX_SKB_FRAGS + 1; i > 1; --i) {
> > > > +       for (i = sg_num + 1; i > 1; --i) {
> > > >                 first = get_a_page(rq, gfp);
> > > >                 if (!first) {
> > > >                         if (list)
> > > > @@ -1350,7 +1360,7 @@ static int add_recvbuf_big(struct virtnet_info *vi, struct receive_queue *rq,
> > > >
> > > >         /* chain first in list head */
> > > >         first->private = (unsigned long)list;
> > > > -       err = virtqueue_add_inbuf(rq->vq, rq->sg, MAX_SKB_FRAGS + 2,
> > > > +       err = virtqueue_add_inbuf(rq->vq, rq->sg, sg_num + 2,
> > > >                                   first, gfp);
> > > >         if (err < 0)
> > > >                 give_pages(rq, first);
> > > > @@ -3571,8 +3581,10 @@ static int virtnet_probe(struct virtio_device *vdev)
> > > >         if (virtio_has_feature(vdev, VIRTIO_NET_F_GUEST_TSO4) ||
> > > >             virtio_has_feature(vdev, VIRTIO_NET_F_GUEST_TSO6) ||
> > > >             virtio_has_feature(vdev, VIRTIO_NET_F_GUEST_ECN) ||
> > > > -           virtio_has_feature(vdev, VIRTIO_NET_F_GUEST_UFO))
> > > > +           virtio_has_feature(vdev, VIRTIO_NET_F_GUEST_UFO)) {
> > > >                 vi->big_packets = true;
> > > > +               vi->gso_is_supported = true;
> > >
> > > Why not simply re-use big_packets here?
> > >
> > > Thanks
> >
> > I don't get this question. The patch does use big_packets, it wants
> > to figure out guest GSO is off so MTU limits the size.
>
> Yes.
>
> Thanks

I wonder if it's better to introduce the boolean here:

        /* TODO: size buffers correctly in this case. */
                if (dev->mtu > ETH_DATA_LEN)
                        vi->big_packets = true;

Thanks

>
> > The name "gso_is_supported" is confusing, should be e.g. guest_gso.
> >
> >
> > > > +       }
> > > >
> > > >         if (virtio_has_feature(vdev, VIRTIO_NET_F_MRG_RXBUF))
> > > >                 vi->mergeable_rx_bufs = true;
> > > > --
> > > > 2.31.1
> > > >
> > > >
> > > > ---------------------------------------------------------------------
> > > > To unsubscribe, e-mail: virtio-dev-unsubscribe@lists.oasis-open.org
> > > > For additional commands, e-mail: virtio-dev-help@lists.oasis-open.org
> > > >
> >

_______________________________________________
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization

^ permalink raw reply	[flat|nested] 102+ messages in thread

* Re: [virtio-dev] [PATCH] virtio-net: use mtu size as buffer length for big packets
@ 2022-08-04  7:24         ` Jason Wang
  0 siblings, 0 replies; 102+ messages in thread
From: Jason Wang @ 2022-08-04  7:24 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: Gavin Li, Hemminger, Stephen, davem, virtualization, Virtio-Dev,
	jesse.brandeburg, alexander.h.duyck, kubakici, sridhar.samudrala,
	loseweigh, Parav Pandit, gavi

On Thu, Aug 4, 2022 at 3:23 PM Jason Wang <jasowang@redhat.com> wrote:
>
> On Thu, Aug 4, 2022 at 3:10 PM Michael S. Tsirkin <mst@redhat.com> wrote:
> >
> > On Thu, Aug 04, 2022 at 01:00:46PM +0800, Jason Wang wrote:
> > > On Tue, Aug 2, 2022 at 12:47 PM Gavin Li <gavinl@nvidia.com> wrote:
> > > >
> > > > Currently add_recvbuf_big() allocates MAX_SKB_FRAGS segments for big
> > > > packets even when GUEST_* offloads are not present on the device.
> > > > However, if GSO is not supported, it would be sufficient to allocate
> > > > segments to cover just up the MTU size and no further. Allocating the
> > > > maximum amount of segments results in a large waste of buffer space in
> > > > the queue, which limits the number of packets that can be buffered and
> > > > can result in reduced performance.
> > > >
> > > > Therefore, if GSO is not supported, use the MTU to calculate the
> > > > optimal amount of segments required.
> > > >
> > > > Below is the iperf TCP test results over a Mellanox NIC, using vDPA for
> > > > 1 VQ, queue size 1024, before and after the change, with the iperf
> > > > server running over the virtio-net interface.
> > > >
> > > > MTU(Bytes)/Bandwidth (Gbit/s)
> > > >              Before   After
> > > >   1500        22.5     22.4
> > > >   9000        12.8     25.9
> > > >
> > > > Signed-off-by: Gavin Li <gavinl@nvidia.com>
> > > > Reviewed-by: Gavi Teitz <gavi@nvidia.com>
> > > > Reviewed-by: Parav Pandit <parav@nvidia.com>
> > > > ---
> > > >  drivers/net/virtio_net.c | 20 ++++++++++++++++----
> > > >  1 file changed, 16 insertions(+), 4 deletions(-)
> > > >
> > > > diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
> > > > index ec8e1b3108c3..d36918c1809d 100644
> > > > --- a/drivers/net/virtio_net.c
> > > > +++ b/drivers/net/virtio_net.c
> > > > @@ -222,6 +222,9 @@ struct virtnet_info {
> > > >         /* I like... big packets and I cannot lie! */
> > > >         bool big_packets;
> > > >
> > > > +       /* Indicates GSO support */
> > > > +       bool gso_is_supported;
> > > > +
> > > >         /* Host will merge rx buffers for big packets (shake it! shake it!) */
> > > >         bool mergeable_rx_bufs;
> > > >
> > > > @@ -1312,14 +1315,21 @@ static int add_recvbuf_small(struct virtnet_info *vi, struct receive_queue *rq,
> > > >  static int add_recvbuf_big(struct virtnet_info *vi, struct receive_queue *rq,
> > > >                            gfp_t gfp)
> > > >  {
> > > > +       unsigned int sg_num = MAX_SKB_FRAGS;
> > > >         struct page *first, *list = NULL;
> > > >         char *p;
> > > >         int i, err, offset;
> > > >
> > > > -       sg_init_table(rq->sg, MAX_SKB_FRAGS + 2);
> > > > +       if (!vi->gso_is_supported) {
> > > > +               unsigned int mtu = vi->dev->mtu;
> > > > +
> > > > +               sg_num = (mtu % PAGE_SIZE) ? mtu / PAGE_SIZE + 1 : mtu / PAGE_SIZE;
> > > > +       }
> > > > +
> > > > +       sg_init_table(rq->sg, sg_num + 2);
> > > >
> > > >         /* page in rq->sg[MAX_SKB_FRAGS + 1] is list tail */
> > > > -       for (i = MAX_SKB_FRAGS + 1; i > 1; --i) {
> > > > +       for (i = sg_num + 1; i > 1; --i) {
> > > >                 first = get_a_page(rq, gfp);
> > > >                 if (!first) {
> > > >                         if (list)
> > > > @@ -1350,7 +1360,7 @@ static int add_recvbuf_big(struct virtnet_info *vi, struct receive_queue *rq,
> > > >
> > > >         /* chain first in list head */
> > > >         first->private = (unsigned long)list;
> > > > -       err = virtqueue_add_inbuf(rq->vq, rq->sg, MAX_SKB_FRAGS + 2,
> > > > +       err = virtqueue_add_inbuf(rq->vq, rq->sg, sg_num + 2,
> > > >                                   first, gfp);
> > > >         if (err < 0)
> > > >                 give_pages(rq, first);
> > > > @@ -3571,8 +3581,10 @@ static int virtnet_probe(struct virtio_device *vdev)
> > > >         if (virtio_has_feature(vdev, VIRTIO_NET_F_GUEST_TSO4) ||
> > > >             virtio_has_feature(vdev, VIRTIO_NET_F_GUEST_TSO6) ||
> > > >             virtio_has_feature(vdev, VIRTIO_NET_F_GUEST_ECN) ||
> > > > -           virtio_has_feature(vdev, VIRTIO_NET_F_GUEST_UFO))
> > > > +           virtio_has_feature(vdev, VIRTIO_NET_F_GUEST_UFO)) {
> > > >                 vi->big_packets = true;
> > > > +               vi->gso_is_supported = true;
> > >
> > > Why not simply re-use big_packets here?
> > >
> > > Thanks
> >
> > I don't get this question. The patch does use big_packets, it wants
> > to figure out guest GSO is off so MTU limits the size.
>
> Yes.
>
> Thanks

I wonder if it's better to introduce the boolean here:

        /* TODO: size buffers correctly in this case. */
                if (dev->mtu > ETH_DATA_LEN)
                        vi->big_packets = true;

Thanks

>
> > The name "gso_is_supported" is confusing, should be e.g. guest_gso.
> >
> >
> > > > +       }
> > > >
> > > >         if (virtio_has_feature(vdev, VIRTIO_NET_F_MRG_RXBUF))
> > > >                 vi->mergeable_rx_bufs = true;
> > > > --
> > > > 2.31.1
> > > >
> > > >
> > > > ---------------------------------------------------------------------
> > > > To unsubscribe, e-mail: virtio-dev-unsubscribe@lists.oasis-open.org
> > > > For additional commands, e-mail: virtio-dev-help@lists.oasis-open.org
> > > >
> >


---------------------------------------------------------------------
To unsubscribe, e-mail: virtio-dev-unsubscribe@lists.oasis-open.org
For additional commands, e-mail: virtio-dev-help@lists.oasis-open.org


^ permalink raw reply	[flat|nested] 102+ messages in thread

* Re: [virtio-dev] [PATCH] virtio-net: use mtu size as buffer length for big packets
  2022-08-02  4:45 [virtio-dev] [PATCH] virtio-net: use mtu size as buffer length for big packets Gavin Li
@ 2022-08-05 22:11   ` Si-Wei Liu
  2022-08-05 22:11   ` Si-Wei Liu
  1 sibling, 0 replies; 102+ messages in thread
From: Si-Wei Liu @ 2022-08-05 22:11 UTC (permalink / raw)
  To: Gavin Li, mst, stephen, davem, virtualization, virtio-dev,
	jesse.brandeburg, alexander.h.duyck, kubakici, sridhar.samudrala,
	jasowang, loseweigh
  Cc: gavi



On 8/1/2022 9:45 PM, Gavin Li wrote:
> Currently add_recvbuf_big() allocates MAX_SKB_FRAGS segments for big
> packets even when GUEST_* offloads are not present on the device.
> However, if GSO is not supported,
GUEST GSO (virtio term), or GRO HW (netdev core term) it should have 
been be called.

>   it would be sufficient to allocate
> segments to cover just up the MTU size and no further. Allocating the
> maximum amount of segments results in a large waste of buffer space in
> the queue, which limits the number of packets that can be buffered and
> can result in reduced performance.
>
> Therefore, if GSO is not supported,
Ditto.

> use the MTU to calculate the
> optimal amount of segments required.
>
> Below is the iperf TCP test results over a Mellanox NIC, using vDPA for
> 1 VQ, queue size 1024, before and after the change, with the iperf
> server running over the virtio-net interface.
>
> MTU(Bytes)/Bandwidth (Gbit/s)
>               Before   After
>    1500        22.5     22.4
>    9000        12.8     25.9
>
> Signed-off-by: Gavin Li <gavinl@nvidia.com>
> Reviewed-by: Gavi Teitz <gavi@nvidia.com>
> Reviewed-by: Parav Pandit <parav@nvidia.com>
> ---
>   drivers/net/virtio_net.c | 20 ++++++++++++++++----
>   1 file changed, 16 insertions(+), 4 deletions(-)
>
> diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
> index ec8e1b3108c3..d36918c1809d 100644
> --- a/drivers/net/virtio_net.c
> +++ b/drivers/net/virtio_net.c
> @@ -222,6 +222,9 @@ struct virtnet_info {
>   	/* I like... big packets and I cannot lie! */
>   	bool big_packets;
>   
> +	/* Indicates GSO support */
> +	bool gso_is_supported;
> +
>   	/* Host will merge rx buffers for big packets (shake it! shake it!) */
>   	bool mergeable_rx_bufs;
>   
> @@ -1312,14 +1315,21 @@ static int add_recvbuf_small(struct virtnet_info *vi, struct receive_queue *rq,
>   static int add_recvbuf_big(struct virtnet_info *vi, struct receive_queue *rq,
>   			   gfp_t gfp)
>   {
> +	unsigned int sg_num = MAX_SKB_FRAGS;
>   	struct page *first, *list = NULL;
>   	char *p;
>   	int i, err, offset;
>   
> -	sg_init_table(rq->sg, MAX_SKB_FRAGS + 2);
> +	if (!vi->gso_is_supported) {
> +		unsigned int mtu = vi->dev->mtu;
> +
> +		sg_num = (mtu % PAGE_SIZE) ? mtu / PAGE_SIZE + 1 : mtu / PAGE_SIZE;
DIV_ROUND_UP() can be used?

Since this branch slightly adds up cost to the datapath, I wonder if 
this sg_num can be saved and set only once (generally in virtnet_probe 
time) in struct virtnet_info?
> +	}
> +
> +	sg_init_table(rq->sg, sg_num + 2);
>   
>   	/* page in rq->sg[MAX_SKB_FRAGS + 1] is list tail */
Comment doesn't match code.
> -	for (i = MAX_SKB_FRAGS + 1; i > 1; --i) {
> +	for (i = sg_num + 1; i > 1; --i) {
>   		first = get_a_page(rq, gfp);
>   		if (!first) {
>   			if (list)
> @@ -1350,7 +1360,7 @@ static int add_recvbuf_big(struct virtnet_info *vi, struct receive_queue *rq,
>   
>   	/* chain first in list head */
>   	first->private = (unsigned long)list;
> -	err = virtqueue_add_inbuf(rq->vq, rq->sg, MAX_SKB_FRAGS + 2,
> +	err = virtqueue_add_inbuf(rq->vq, rq->sg, sg_num + 2,
>   				  first, gfp);
>   	if (err < 0)
>   		give_pages(rq, first);
> @@ -3571,8 +3581,10 @@ static int virtnet_probe(struct virtio_device *vdev)
>   	if (virtio_has_feature(vdev, VIRTIO_NET_F_GUEST_TSO4) ||
>   	    virtio_has_feature(vdev, VIRTIO_NET_F_GUEST_TSO6) ||
>   	    virtio_has_feature(vdev, VIRTIO_NET_F_GUEST_ECN) ||
> -	    virtio_has_feature(vdev, VIRTIO_NET_F_GUEST_UFO))
> +	    virtio_has_feature(vdev, VIRTIO_NET_F_GUEST_UFO)) {
>   		vi->big_packets = true;
> +		vi->gso_is_supported = true;
Please do the same for virtnet_clear_guest_offloads(), and 
correspondingly virtnet_restore_guest_offloads() as well. Not sure why 
virtnet_clear_guest_offloads() or the caller doesn't unset big_packet on 
successful return, seems like a bug to me.


Thanks,
-Siwei
> +	}
>   
>   	if (virtio_has_feature(vdev, VIRTIO_NET_F_MRG_RXBUF))
>   		vi->mergeable_rx_bufs = true;

_______________________________________________
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization

^ permalink raw reply	[flat|nested] 102+ messages in thread

* Re: [virtio-dev] [PATCH] virtio-net: use mtu size as buffer length for big packets
@ 2022-08-05 22:11   ` Si-Wei Liu
  0 siblings, 0 replies; 102+ messages in thread
From: Si-Wei Liu @ 2022-08-05 22:11 UTC (permalink / raw)
  To: Gavin Li, mst, stephen, davem, virtualization, virtio-dev,
	jesse.brandeburg, alexander.h.duyck, kubakici, sridhar.samudrala,
	jasowang, loseweigh
  Cc: parav, gavi



On 8/1/2022 9:45 PM, Gavin Li wrote:
> Currently add_recvbuf_big() allocates MAX_SKB_FRAGS segments for big
> packets even when GUEST_* offloads are not present on the device.
> However, if GSO is not supported,
GUEST GSO (virtio term), or GRO HW (netdev core term) it should have 
been be called.

>   it would be sufficient to allocate
> segments to cover just up the MTU size and no further. Allocating the
> maximum amount of segments results in a large waste of buffer space in
> the queue, which limits the number of packets that can be buffered and
> can result in reduced performance.
>
> Therefore, if GSO is not supported,
Ditto.

> use the MTU to calculate the
> optimal amount of segments required.
>
> Below is the iperf TCP test results over a Mellanox NIC, using vDPA for
> 1 VQ, queue size 1024, before and after the change, with the iperf
> server running over the virtio-net interface.
>
> MTU(Bytes)/Bandwidth (Gbit/s)
>               Before   After
>    1500        22.5     22.4
>    9000        12.8     25.9
>
> Signed-off-by: Gavin Li <gavinl@nvidia.com>
> Reviewed-by: Gavi Teitz <gavi@nvidia.com>
> Reviewed-by: Parav Pandit <parav@nvidia.com>
> ---
>   drivers/net/virtio_net.c | 20 ++++++++++++++++----
>   1 file changed, 16 insertions(+), 4 deletions(-)
>
> diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
> index ec8e1b3108c3..d36918c1809d 100644
> --- a/drivers/net/virtio_net.c
> +++ b/drivers/net/virtio_net.c
> @@ -222,6 +222,9 @@ struct virtnet_info {
>   	/* I like... big packets and I cannot lie! */
>   	bool big_packets;
>   
> +	/* Indicates GSO support */
> +	bool gso_is_supported;
> +
>   	/* Host will merge rx buffers for big packets (shake it! shake it!) */
>   	bool mergeable_rx_bufs;
>   
> @@ -1312,14 +1315,21 @@ static int add_recvbuf_small(struct virtnet_info *vi, struct receive_queue *rq,
>   static int add_recvbuf_big(struct virtnet_info *vi, struct receive_queue *rq,
>   			   gfp_t gfp)
>   {
> +	unsigned int sg_num = MAX_SKB_FRAGS;
>   	struct page *first, *list = NULL;
>   	char *p;
>   	int i, err, offset;
>   
> -	sg_init_table(rq->sg, MAX_SKB_FRAGS + 2);
> +	if (!vi->gso_is_supported) {
> +		unsigned int mtu = vi->dev->mtu;
> +
> +		sg_num = (mtu % PAGE_SIZE) ? mtu / PAGE_SIZE + 1 : mtu / PAGE_SIZE;
DIV_ROUND_UP() can be used?

Since this branch slightly adds up cost to the datapath, I wonder if 
this sg_num can be saved and set only once (generally in virtnet_probe 
time) in struct virtnet_info?
> +	}
> +
> +	sg_init_table(rq->sg, sg_num + 2);
>   
>   	/* page in rq->sg[MAX_SKB_FRAGS + 1] is list tail */
Comment doesn't match code.
> -	for (i = MAX_SKB_FRAGS + 1; i > 1; --i) {
> +	for (i = sg_num + 1; i > 1; --i) {
>   		first = get_a_page(rq, gfp);
>   		if (!first) {
>   			if (list)
> @@ -1350,7 +1360,7 @@ static int add_recvbuf_big(struct virtnet_info *vi, struct receive_queue *rq,
>   
>   	/* chain first in list head */
>   	first->private = (unsigned long)list;
> -	err = virtqueue_add_inbuf(rq->vq, rq->sg, MAX_SKB_FRAGS + 2,
> +	err = virtqueue_add_inbuf(rq->vq, rq->sg, sg_num + 2,
>   				  first, gfp);
>   	if (err < 0)
>   		give_pages(rq, first);
> @@ -3571,8 +3581,10 @@ static int virtnet_probe(struct virtio_device *vdev)
>   	if (virtio_has_feature(vdev, VIRTIO_NET_F_GUEST_TSO4) ||
>   	    virtio_has_feature(vdev, VIRTIO_NET_F_GUEST_TSO6) ||
>   	    virtio_has_feature(vdev, VIRTIO_NET_F_GUEST_ECN) ||
> -	    virtio_has_feature(vdev, VIRTIO_NET_F_GUEST_UFO))
> +	    virtio_has_feature(vdev, VIRTIO_NET_F_GUEST_UFO)) {
>   		vi->big_packets = true;
> +		vi->gso_is_supported = true;
Please do the same for virtnet_clear_guest_offloads(), and 
correspondingly virtnet_restore_guest_offloads() as well. Not sure why 
virtnet_clear_guest_offloads() or the caller doesn't unset big_packet on 
successful return, seems like a bug to me.


Thanks,
-Siwei
> +	}
>   
>   	if (virtio_has_feature(vdev, VIRTIO_NET_F_MRG_RXBUF))
>   		vi->mergeable_rx_bufs = true;


---------------------------------------------------------------------
To unsubscribe, e-mail: virtio-dev-unsubscribe@lists.oasis-open.org
For additional commands, e-mail: virtio-dev-help@lists.oasis-open.org


^ permalink raw reply	[flat|nested] 102+ messages in thread

* Re: [virtio-dev] [PATCH] virtio-net: use mtu size as buffer length for big packets
  2022-08-05 22:11   ` Si-Wei Liu
@ 2022-08-05 23:26     ` Si-Wei Liu
  -1 siblings, 0 replies; 102+ messages in thread
From: Si-Wei Liu @ 2022-08-05 23:26 UTC (permalink / raw)
  To: Gavin Li, mst, stephen, davem, virtualization, virtio-dev,
	jesse.brandeburg, alexander.h.duyck, kubakici, sridhar.samudrala,
	jasowang, loseweigh
  Cc: gavi



On 8/5/2022 3:11 PM, Si-Wei Liu wrote:
>
>
> On 8/1/2022 9:45 PM, Gavin Li wrote:
>> Currently add_recvbuf_big() allocates MAX_SKB_FRAGS segments for big
>> packets even when GUEST_* offloads are not present on the device.
>> However, if GSO is not supported,
> GUEST GSO (virtio term), or GRO HW (netdev core term) it should have 
> been be called.
>
>>   it would be sufficient to allocate
>> segments to cover just up the MTU size and no further. Allocating the
>> maximum amount of segments results in a large waste of buffer space in
>> the queue, which limits the number of packets that can be buffered and
>> can result in reduced performance.
>>
>> Therefore, if GSO is not supported,
> Ditto.
>
>> use the MTU to calculate the
>> optimal amount of segments required.
>>
>> Below is the iperf TCP test results over a Mellanox NIC, using vDPA for
>> 1 VQ, queue size 1024, before and after the change, with the iperf
>> server running over the virtio-net interface.
>>
>> MTU(Bytes)/Bandwidth (Gbit/s)
>>               Before   After
>>    1500        22.5     22.4
>>    9000        12.8     25.9
>>
>> Signed-off-by: Gavin Li <gavinl@nvidia.com>
>> Reviewed-by: Gavi Teitz <gavi@nvidia.com>
>> Reviewed-by: Parav Pandit <parav@nvidia.com>
>> ---
>>   drivers/net/virtio_net.c | 20 ++++++++++++++++----
>>   1 file changed, 16 insertions(+), 4 deletions(-)
>>
>> diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
>> index ec8e1b3108c3..d36918c1809d 100644
>> --- a/drivers/net/virtio_net.c
>> +++ b/drivers/net/virtio_net.c
>> @@ -222,6 +222,9 @@ struct virtnet_info {
>>       /* I like... big packets and I cannot lie! */
>>       bool big_packets;
>>   +    /* Indicates GSO support */
>> +    bool gso_is_supported;
>> +
>>       /* Host will merge rx buffers for big packets (shake it! shake 
>> it!) */
>>       bool mergeable_rx_bufs;
>>   @@ -1312,14 +1315,21 @@ static int add_recvbuf_small(struct 
>> virtnet_info *vi, struct receive_queue *rq,
>>   static int add_recvbuf_big(struct virtnet_info *vi, struct 
>> receive_queue *rq,
>>                  gfp_t gfp)
>>   {
>> +    unsigned int sg_num = MAX_SKB_FRAGS;
>>       struct page *first, *list = NULL;
>>       char *p;
>>       int i, err, offset;
>>   -    sg_init_table(rq->sg, MAX_SKB_FRAGS + 2);
>> +    if (!vi->gso_is_supported) {
>> +        unsigned int mtu = vi->dev->mtu;
>> +
>> +        sg_num = (mtu % PAGE_SIZE) ? mtu / PAGE_SIZE + 1 : mtu / 
>> PAGE_SIZE;
> DIV_ROUND_UP() can be used?
>
> Since this branch slightly adds up cost to the datapath, I wonder if 
> this sg_num can be saved and set only once (generally in virtnet_probe 
> time
... , but can align with new mtu during .ndo_change_mtu(), too.
> ) in struct virtnet_info?

Thanks,
-Siwei

>> +    }
>> +
>> +    sg_init_table(rq->sg, sg_num + 2);
>>         /* page in rq->sg[MAX_SKB_FRAGS + 1] is list tail */
> Comment doesn't match code.
>> -    for (i = MAX_SKB_FRAGS + 1; i > 1; --i) {
>> +    for (i = sg_num + 1; i > 1; --i) {
>>           first = get_a_page(rq, gfp);
>>           if (!first) {
>>               if (list)
>> @@ -1350,7 +1360,7 @@ static int add_recvbuf_big(struct virtnet_info 
>> *vi, struct receive_queue *rq,
>>         /* chain first in list head */
>>       first->private = (unsigned long)list;
>> -    err = virtqueue_add_inbuf(rq->vq, rq->sg, MAX_SKB_FRAGS + 2,
>> +    err = virtqueue_add_inbuf(rq->vq, rq->sg, sg_num + 2,
>>                     first, gfp);
>>       if (err < 0)
>>           give_pages(rq, first);
>> @@ -3571,8 +3581,10 @@ static int virtnet_probe(struct virtio_device 
>> *vdev)
>>       if (virtio_has_feature(vdev, VIRTIO_NET_F_GUEST_TSO4) ||
>>           virtio_has_feature(vdev, VIRTIO_NET_F_GUEST_TSO6) ||
>>           virtio_has_feature(vdev, VIRTIO_NET_F_GUEST_ECN) ||
>> -        virtio_has_feature(vdev, VIRTIO_NET_F_GUEST_UFO))
>> +        virtio_has_feature(vdev, VIRTIO_NET_F_GUEST_UFO)) {
>>           vi->big_packets = true;
>> +        vi->gso_is_supported = true;
> Please do the same for virtnet_clear_guest_offloads(), and 
> correspondingly virtnet_restore_guest_offloads() as well. Not sure why 
> virtnet_clear_guest_offloads() or the caller doesn't unset big_packet 
> on successful return, seems like a bug to me.
>
>
> Thanks,
> -Siwei
>> +    }
>>         if (virtio_has_feature(vdev, VIRTIO_NET_F_MRG_RXBUF))
>>           vi->mergeable_rx_bufs = true;
>

_______________________________________________
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization

^ permalink raw reply	[flat|nested] 102+ messages in thread

* Re: [virtio-dev] [PATCH] virtio-net: use mtu size as buffer length for big packets
@ 2022-08-05 23:26     ` Si-Wei Liu
  0 siblings, 0 replies; 102+ messages in thread
From: Si-Wei Liu @ 2022-08-05 23:26 UTC (permalink / raw)
  To: Gavin Li, mst, stephen, davem, virtualization, virtio-dev,
	jesse.brandeburg, alexander.h.duyck, kubakici, sridhar.samudrala,
	jasowang, loseweigh
  Cc: parav, gavi



On 8/5/2022 3:11 PM, Si-Wei Liu wrote:
>
>
> On 8/1/2022 9:45 PM, Gavin Li wrote:
>> Currently add_recvbuf_big() allocates MAX_SKB_FRAGS segments for big
>> packets even when GUEST_* offloads are not present on the device.
>> However, if GSO is not supported,
> GUEST GSO (virtio term), or GRO HW (netdev core term) it should have 
> been be called.
>
>>   it would be sufficient to allocate
>> segments to cover just up the MTU size and no further. Allocating the
>> maximum amount of segments results in a large waste of buffer space in
>> the queue, which limits the number of packets that can be buffered and
>> can result in reduced performance.
>>
>> Therefore, if GSO is not supported,
> Ditto.
>
>> use the MTU to calculate the
>> optimal amount of segments required.
>>
>> Below is the iperf TCP test results over a Mellanox NIC, using vDPA for
>> 1 VQ, queue size 1024, before and after the change, with the iperf
>> server running over the virtio-net interface.
>>
>> MTU(Bytes)/Bandwidth (Gbit/s)
>>               Before   After
>>    1500        22.5     22.4
>>    9000        12.8     25.9
>>
>> Signed-off-by: Gavin Li <gavinl@nvidia.com>
>> Reviewed-by: Gavi Teitz <gavi@nvidia.com>
>> Reviewed-by: Parav Pandit <parav@nvidia.com>
>> ---
>>   drivers/net/virtio_net.c | 20 ++++++++++++++++----
>>   1 file changed, 16 insertions(+), 4 deletions(-)
>>
>> diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
>> index ec8e1b3108c3..d36918c1809d 100644
>> --- a/drivers/net/virtio_net.c
>> +++ b/drivers/net/virtio_net.c
>> @@ -222,6 +222,9 @@ struct virtnet_info {
>>       /* I like... big packets and I cannot lie! */
>>       bool big_packets;
>>   +    /* Indicates GSO support */
>> +    bool gso_is_supported;
>> +
>>       /* Host will merge rx buffers for big packets (shake it! shake 
>> it!) */
>>       bool mergeable_rx_bufs;
>>   @@ -1312,14 +1315,21 @@ static int add_recvbuf_small(struct 
>> virtnet_info *vi, struct receive_queue *rq,
>>   static int add_recvbuf_big(struct virtnet_info *vi, struct 
>> receive_queue *rq,
>>                  gfp_t gfp)
>>   {
>> +    unsigned int sg_num = MAX_SKB_FRAGS;
>>       struct page *first, *list = NULL;
>>       char *p;
>>       int i, err, offset;
>>   -    sg_init_table(rq->sg, MAX_SKB_FRAGS + 2);
>> +    if (!vi->gso_is_supported) {
>> +        unsigned int mtu = vi->dev->mtu;
>> +
>> +        sg_num = (mtu % PAGE_SIZE) ? mtu / PAGE_SIZE + 1 : mtu / 
>> PAGE_SIZE;
> DIV_ROUND_UP() can be used?
>
> Since this branch slightly adds up cost to the datapath, I wonder if 
> this sg_num can be saved and set only once (generally in virtnet_probe 
> time
... , but can align with new mtu during .ndo_change_mtu(), too.
> ) in struct virtnet_info?

Thanks,
-Siwei

>> +    }
>> +
>> +    sg_init_table(rq->sg, sg_num + 2);
>>         /* page in rq->sg[MAX_SKB_FRAGS + 1] is list tail */
> Comment doesn't match code.
>> -    for (i = MAX_SKB_FRAGS + 1; i > 1; --i) {
>> +    for (i = sg_num + 1; i > 1; --i) {
>>           first = get_a_page(rq, gfp);
>>           if (!first) {
>>               if (list)
>> @@ -1350,7 +1360,7 @@ static int add_recvbuf_big(struct virtnet_info 
>> *vi, struct receive_queue *rq,
>>         /* chain first in list head */
>>       first->private = (unsigned long)list;
>> -    err = virtqueue_add_inbuf(rq->vq, rq->sg, MAX_SKB_FRAGS + 2,
>> +    err = virtqueue_add_inbuf(rq->vq, rq->sg, sg_num + 2,
>>                     first, gfp);
>>       if (err < 0)
>>           give_pages(rq, first);
>> @@ -3571,8 +3581,10 @@ static int virtnet_probe(struct virtio_device 
>> *vdev)
>>       if (virtio_has_feature(vdev, VIRTIO_NET_F_GUEST_TSO4) ||
>>           virtio_has_feature(vdev, VIRTIO_NET_F_GUEST_TSO6) ||
>>           virtio_has_feature(vdev, VIRTIO_NET_F_GUEST_ECN) ||
>> -        virtio_has_feature(vdev, VIRTIO_NET_F_GUEST_UFO))
>> +        virtio_has_feature(vdev, VIRTIO_NET_F_GUEST_UFO)) {
>>           vi->big_packets = true;
>> +        vi->gso_is_supported = true;
> Please do the same for virtnet_clear_guest_offloads(), and 
> correspondingly virtnet_restore_guest_offloads() as well. Not sure why 
> virtnet_clear_guest_offloads() or the caller doesn't unset big_packet 
> on successful return, seems like a bug to me.
>
>
> Thanks,
> -Siwei
>> +    }
>>         if (virtio_has_feature(vdev, VIRTIO_NET_F_MRG_RXBUF))
>>           vi->mergeable_rx_bufs = true;
>


---------------------------------------------------------------------
To unsubscribe, e-mail: virtio-dev-unsubscribe@lists.oasis-open.org
For additional commands, e-mail: virtio-dev-help@lists.oasis-open.org


^ permalink raw reply	[flat|nested] 102+ messages in thread

* Re: [virtio-dev] [PATCH] virtio-net: use mtu size as buffer length for big packets
  2022-08-04  7:10     ` Michael S. Tsirkin
  (?)
  (?)
@ 2022-08-08  6:24     ` Gavin Li
  -1 siblings, 0 replies; 102+ messages in thread
From: Gavin Li @ 2022-08-08  6:24 UTC (permalink / raw)
  To: Michael S. Tsirkin, Jason Wang
  Cc: Hemminger, Stephen, davem, virtualization, Virtio-Dev,
	jesse.brandeburg, alexander.h.duyck, kubakici, sridhar.samudrala,
	loseweigh, Parav Pandit, gavi


On 8/4/2022 3:10 PM, Michael S. Tsirkin wrote:
> External email: Use caution opening links or attachments
>
>
> On Thu, Aug 04, 2022 at 01:00:46PM +0800, Jason Wang wrote:
>> On Tue, Aug 2, 2022 at 12:47 PM Gavin Li <gavinl@nvidia.com> wrote:
>>> Currently add_recvbuf_big() allocates MAX_SKB_FRAGS segments for big
>>> packets even when GUEST_* offloads are not present on the device.
>>> However, if GSO is not supported, it would be sufficient to allocate
>>> segments to cover just up the MTU size and no further. Allocating the
>>> maximum amount of segments results in a large waste of buffer space in
>>> the queue, which limits the number of packets that can be buffered and
>>> can result in reduced performance.
>>>
>>> Therefore, if GSO is not supported, use the MTU to calculate the
>>> optimal amount of segments required.
>>>
>>> Below is the iperf TCP test results over a Mellanox NIC, using vDPA for
>>> 1 VQ, queue size 1024, before and after the change, with the iperf
>>> server running over the virtio-net interface.
>>>
>>> MTU(Bytes)/Bandwidth (Gbit/s)
>>>               Before   After
>>>    1500        22.5     22.4
>>>    9000        12.8     25.9
>>>
>>> Signed-off-by: Gavin Li <gavinl@nvidia.com>
>>> Reviewed-by: Gavi Teitz <gavi@nvidia.com>
>>> Reviewed-by: Parav Pandit <parav@nvidia.com>
>>> ---
>>>   drivers/net/virtio_net.c | 20 ++++++++++++++++----
>>>   1 file changed, 16 insertions(+), 4 deletions(-)
>>>
>>> diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
>>> index ec8e1b3108c3..d36918c1809d 100644
>>> --- a/drivers/net/virtio_net.c
>>> +++ b/drivers/net/virtio_net.c
>>> @@ -222,6 +222,9 @@ struct virtnet_info {
>>>          /* I like... big packets and I cannot lie! */
>>>          bool big_packets;
>>>
>>> +       /* Indicates GSO support */
>>> +       bool gso_is_supported;
>>> +
>>>          /* Host will merge rx buffers for big packets (shake it! shake it!) */
>>>          bool mergeable_rx_bufs;
>>>
>>> @@ -1312,14 +1315,21 @@ static int add_recvbuf_small(struct virtnet_info *vi, struct receive_queue *rq,
>>>   static int add_recvbuf_big(struct virtnet_info *vi, struct receive_queue *rq,
>>>                             gfp_t gfp)
>>>   {
>>> +       unsigned int sg_num = MAX_SKB_FRAGS;
>>>          struct page *first, *list = NULL;
>>>          char *p;
>>>          int i, err, offset;
>>>
>>> -       sg_init_table(rq->sg, MAX_SKB_FRAGS + 2);
>>> +       if (!vi->gso_is_supported) {
>>> +               unsigned int mtu = vi->dev->mtu;
>>> +
>>> +               sg_num = (mtu % PAGE_SIZE) ? mtu / PAGE_SIZE + 1 : mtu / PAGE_SIZE;
>>> +       }
>>> +
>>> +       sg_init_table(rq->sg, sg_num + 2);
>>>
>>>          /* page in rq->sg[MAX_SKB_FRAGS + 1] is list tail */
>>> -       for (i = MAX_SKB_FRAGS + 1; i > 1; --i) {
>>> +       for (i = sg_num + 1; i > 1; --i) {
>>>                  first = get_a_page(rq, gfp);
>>>                  if (!first) {
>>>                          if (list)
>>> @@ -1350,7 +1360,7 @@ static int add_recvbuf_big(struct virtnet_info *vi, struct receive_queue *rq,
>>>
>>>          /* chain first in list head */
>>>          first->private = (unsigned long)list;
>>> -       err = virtqueue_add_inbuf(rq->vq, rq->sg, MAX_SKB_FRAGS + 2,
>>> +       err = virtqueue_add_inbuf(rq->vq, rq->sg, sg_num + 2,
>>>                                    first, gfp);
>>>          if (err < 0)
>>>                  give_pages(rq, first);
>>> @@ -3571,8 +3581,10 @@ static int virtnet_probe(struct virtio_device *vdev)
>>>          if (virtio_has_feature(vdev, VIRTIO_NET_F_GUEST_TSO4) ||
>>>              virtio_has_feature(vdev, VIRTIO_NET_F_GUEST_TSO6) ||
>>>              virtio_has_feature(vdev, VIRTIO_NET_F_GUEST_ECN) ||
>>> -           virtio_has_feature(vdev, VIRTIO_NET_F_GUEST_UFO))
>>> +           virtio_has_feature(vdev, VIRTIO_NET_F_GUEST_UFO)) {
>>>                  vi->big_packets = true;
>>> +               vi->gso_is_supported = true;
>> Why not simply re-use big_packets here?
>>
>> Thanks
> I don't get this question. The patch does use big_packets, it wants
> to figure out guest GSO is off so MTU limits the size.
> The name "gso_is_supported" is confusing, should be e.g. guest_gso.
ACK.
>
>>> +       }
>>>
>>>          if (virtio_has_feature(vdev, VIRTIO_NET_F_MRG_RXBUF))
>>>                  vi->mergeable_rx_bufs = true;
>>> --
>>> 2.31.1
>>>
>>>
>>> ---------------------------------------------------------------------
>>> To unsubscribe, e-mail: virtio-dev-unsubscribe@lists.oasis-open.org
>>> For additional commands, e-mail: virtio-dev-help@lists.oasis-open.org
>>>

---------------------------------------------------------------------
To unsubscribe, e-mail: virtio-dev-unsubscribe@lists.oasis-open.org
For additional commands, e-mail: virtio-dev-help@lists.oasis-open.org


^ permalink raw reply	[flat|nested] 102+ messages in thread

* Re: [virtio-dev] [PATCH] virtio-net: use mtu size as buffer length for big packets
  2022-08-04  7:24         ` Jason Wang
  (?)
@ 2022-08-08  6:54         ` Gavin Li
  -1 siblings, 0 replies; 102+ messages in thread
From: Gavin Li @ 2022-08-08  6:54 UTC (permalink / raw)
  To: Jason Wang, Michael S. Tsirkin
  Cc: Hemminger, Stephen, davem, virtualization, Virtio-Dev,
	jesse.brandeburg, alexander.h.duyck, kubakici, sridhar.samudrala,
	loseweigh, Parav Pandit, gavi


On 8/4/2022 3:24 PM, Jason Wang wrote:
> External email: Use caution opening links or attachments
>
>
> On Thu, Aug 4, 2022 at 3:23 PM Jason Wang <jasowang@redhat.com> wrote:
>> On Thu, Aug 4, 2022 at 3:10 PM Michael S. Tsirkin <mst@redhat.com> wrote:
>>> On Thu, Aug 04, 2022 at 01:00:46PM +0800, Jason Wang wrote:
>>>> On Tue, Aug 2, 2022 at 12:47 PM Gavin Li <gavinl@nvidia.com> wrote:
>>>>> Currently add_recvbuf_big() allocates MAX_SKB_FRAGS segments for big
>>>>> packets even when GUEST_* offloads are not present on the device.
>>>>> However, if GSO is not supported, it would be sufficient to allocate
>>>>> segments to cover just up the MTU size and no further. Allocating the
>>>>> maximum amount of segments results in a large waste of buffer space in
>>>>> the queue, which limits the number of packets that can be buffered and
>>>>> can result in reduced performance.
>>>>>
>>>>> Therefore, if GSO is not supported, use the MTU to calculate the
>>>>> optimal amount of segments required.
>>>>>
>>>>> Below is the iperf TCP test results over a Mellanox NIC, using vDPA for
>>>>> 1 VQ, queue size 1024, before and after the change, with the iperf
>>>>> server running over the virtio-net interface.
>>>>>
>>>>> MTU(Bytes)/Bandwidth (Gbit/s)
>>>>>               Before   After
>>>>>    1500        22.5     22.4
>>>>>    9000        12.8     25.9
>>>>>
>>>>> Signed-off-by: Gavin Li <gavinl@nvidia.com>
>>>>> Reviewed-by: Gavi Teitz <gavi@nvidia.com>
>>>>> Reviewed-by: Parav Pandit <parav@nvidia.com>
>>>>> ---
>>>>>   drivers/net/virtio_net.c | 20 ++++++++++++++++----
>>>>>   1 file changed, 16 insertions(+), 4 deletions(-)
>>>>>
>>>>> diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
>>>>> index ec8e1b3108c3..d36918c1809d 100644
>>>>> --- a/drivers/net/virtio_net.c
>>>>> +++ b/drivers/net/virtio_net.c
>>>>> @@ -222,6 +222,9 @@ struct virtnet_info {
>>>>>          /* I like... big packets and I cannot lie! */
>>>>>          bool big_packets;
>>>>>
>>>>> +       /* Indicates GSO support */
>>>>> +       bool gso_is_supported;
>>>>> +
>>>>>          /* Host will merge rx buffers for big packets (shake it! shake it!) */
>>>>>          bool mergeable_rx_bufs;
>>>>>
>>>>> @@ -1312,14 +1315,21 @@ static int add_recvbuf_small(struct virtnet_info *vi, struct receive_queue *rq,
>>>>>   static int add_recvbuf_big(struct virtnet_info *vi, struct receive_queue *rq,
>>>>>                             gfp_t gfp)
>>>>>   {
>>>>> +       unsigned int sg_num = MAX_SKB_FRAGS;
>>>>>          struct page *first, *list = NULL;
>>>>>          char *p;
>>>>>          int i, err, offset;
>>>>>
>>>>> -       sg_init_table(rq->sg, MAX_SKB_FRAGS + 2);
>>>>> +       if (!vi->gso_is_supported) {
>>>>> +               unsigned int mtu = vi->dev->mtu;
>>>>> +
>>>>> +               sg_num = (mtu % PAGE_SIZE) ? mtu / PAGE_SIZE + 1 : mtu / PAGE_SIZE;
>>>>> +       }
>>>>> +
>>>>> +       sg_init_table(rq->sg, sg_num + 2);
>>>>>
>>>>>          /* page in rq->sg[MAX_SKB_FRAGS + 1] is list tail */
>>>>> -       for (i = MAX_SKB_FRAGS + 1; i > 1; --i) {
>>>>> +       for (i = sg_num + 1; i > 1; --i) {
>>>>>                  first = get_a_page(rq, gfp);
>>>>>                  if (!first) {
>>>>>                          if (list)
>>>>> @@ -1350,7 +1360,7 @@ static int add_recvbuf_big(struct virtnet_info *vi, struct receive_queue *rq,
>>>>>
>>>>>          /* chain first in list head */
>>>>>          first->private = (unsigned long)list;
>>>>> -       err = virtqueue_add_inbuf(rq->vq, rq->sg, MAX_SKB_FRAGS + 2,
>>>>> +       err = virtqueue_add_inbuf(rq->vq, rq->sg, sg_num + 2,
>>>>>                                    first, gfp);
>>>>>          if (err < 0)
>>>>>                  give_pages(rq, first);
>>>>> @@ -3571,8 +3581,10 @@ static int virtnet_probe(struct virtio_device *vdev)
>>>>>          if (virtio_has_feature(vdev, VIRTIO_NET_F_GUEST_TSO4) ||
>>>>>              virtio_has_feature(vdev, VIRTIO_NET_F_GUEST_TSO6) ||
>>>>>              virtio_has_feature(vdev, VIRTIO_NET_F_GUEST_ECN) ||
>>>>> -           virtio_has_feature(vdev, VIRTIO_NET_F_GUEST_UFO))
>>>>> +           virtio_has_feature(vdev, VIRTIO_NET_F_GUEST_UFO)) {
>>>>>                  vi->big_packets = true;
>>>>> +               vi->gso_is_supported = true;
>>>> Why not simply re-use big_packets here?
>>>>
>>>> Thanks
>>> I don't get this question. The patch does use big_packets, it wants
>>> to figure out guest GSO is off so MTU limits the size.
>> Yes.
>>
>> Thanks
> I wonder if it's better to introduce the boolean here:
>
>          /* TODO: size buffers correctly in this case. */
>                  if (dev->mtu > ETH_DATA_LEN)
>                          vi->big_packets = true;
>
> Thanks
This is not a safe or straightforward way to determine that guest GSO is 
not supported, as big_packets just indicates the packets can be big, not 
specifically guest GSO.
>>> The name "gso_is_supported" is confusing, should be e.g. guest_gso.
>>>
>>>
>>>>> +       }
>>>>>
>>>>>          if (virtio_has_feature(vdev, VIRTIO_NET_F_MRG_RXBUF))
>>>>>                  vi->mergeable_rx_bufs = true;
>>>>> --
>>>>> 2.31.1
>>>>>
>>>>>
>>>>> ---------------------------------------------------------------------
>>>>> To unsubscribe, e-mail: virtio-dev-unsubscribe@lists.oasis-open.org
>>>>> For additional commands, e-mail: virtio-dev-help@lists.oasis-open.org
>>>>>
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: virtio-dev-unsubscribe@lists.oasis-open.org
> For additional commands, e-mail: virtio-dev-help@lists.oasis-open.org
>

---------------------------------------------------------------------
To unsubscribe, e-mail: virtio-dev-unsubscribe@lists.oasis-open.org
For additional commands, e-mail: virtio-dev-help@lists.oasis-open.org


^ permalink raw reply	[flat|nested] 102+ messages in thread

* Re: [virtio-dev] [PATCH] virtio-net: use mtu size as buffer length for big packets
  2022-08-05 22:11   ` Si-Wei Liu
  (?)
  (?)
@ 2022-08-08  7:31   ` Gavin Li
  2022-08-08 23:56       ` Si-Wei Liu
  -1 siblings, 1 reply; 102+ messages in thread
From: Gavin Li @ 2022-08-08  7:31 UTC (permalink / raw)
  To: Si-Wei Liu, mst, stephen, davem, virtualization, virtio-dev,
	jesse.brandeburg, alexander.h.duyck, kubakici, sridhar.samudrala,
	jasowang, loseweigh
  Cc: parav, gavi


On 8/6/2022 6:11 AM, Si-Wei Liu wrote:
> External email: Use caution opening links or attachments
>
>
> On 8/1/2022 9:45 PM, Gavin Li wrote:
>> Currently add_recvbuf_big() allocates MAX_SKB_FRAGS segments for big
>> packets even when GUEST_* offloads are not present on the device.
>> However, if GSO is not supported,
> GUEST GSO (virtio term), or GRO HW (netdev core term) it should have
> been be called.
ACK
>
>>   it would be sufficient to allocate
>> segments to cover just up the MTU size and no further. Allocating the
>> maximum amount of segments results in a large waste of buffer space in
>> the queue, which limits the number of packets that can be buffered and
>> can result in reduced performance.
>>
>> Therefore, if GSO is not supported,
> Ditto.
ACK
>
>> use the MTU to calculate the
>> optimal amount of segments required.
>>
>> Below is the iperf TCP test results over a Mellanox NIC, using vDPA for
>> 1 VQ, queue size 1024, before and after the change, with the iperf
>> server running over the virtio-net interface.
>>
>> MTU(Bytes)/Bandwidth (Gbit/s)
>>               Before   After
>>    1500        22.5     22.4
>>    9000        12.8     25.9
>>
>> Signed-off-by: Gavin Li <gavinl@nvidia.com>
>> Reviewed-by: Gavi Teitz <gavi@nvidia.com>
>> Reviewed-by: Parav Pandit <parav@nvidia.com>
>> ---
>>   drivers/net/virtio_net.c | 20 ++++++++++++++++----
>>   1 file changed, 16 insertions(+), 4 deletions(-)
>>
>> diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
>> index ec8e1b3108c3..d36918c1809d 100644
>> --- a/drivers/net/virtio_net.c
>> +++ b/drivers/net/virtio_net.c
>> @@ -222,6 +222,9 @@ struct virtnet_info {
>>       /* I like... big packets and I cannot lie! */
>>       bool big_packets;
>>
>> +     /* Indicates GSO support */
>> +     bool gso_is_supported;
>> +
>>       /* Host will merge rx buffers for big packets (shake it! shake 
>> it!) */
>>       bool mergeable_rx_bufs;
>>
>> @@ -1312,14 +1315,21 @@ static int add_recvbuf_small(struct 
>> virtnet_info *vi, struct receive_queue *rq,
>>   static int add_recvbuf_big(struct virtnet_info *vi, struct 
>> receive_queue *rq,
>>                          gfp_t gfp)
>>   {
>> +     unsigned int sg_num = MAX_SKB_FRAGS;
>>       struct page *first, *list = NULL;
>>       char *p;
>>       int i, err, offset;
>>
>> -     sg_init_table(rq->sg, MAX_SKB_FRAGS + 2);
>> +     if (!vi->gso_is_supported) {
>> +             unsigned int mtu = vi->dev->mtu;
>> +
>> +             sg_num = (mtu % PAGE_SIZE) ? mtu / PAGE_SIZE + 1 : mtu 
>> / PAGE_SIZE;
> DIV_ROUND_UP() can be used?
ACK
>
> Since this branch slightly adds up cost to the datapath, I wonder if
> this sg_num can be saved and set only once (generally in virtnet_probe
> time) in struct virtnet_info?
Not sure how to do it and align it with align with new mtu during 
.ndo_change_mtu()---as you mentioned in the following mail. Any idea? 
ndo_change_mtu might be in vendor specific code and unmanageable. In my 
case, the mtu can only be changed in the xml of the guest vm.
>> +     }
>> +
>> +     sg_init_table(rq->sg, sg_num + 2);
>>
>>       /* page in rq->sg[MAX_SKB_FRAGS + 1] is list tail */
> Comment doesn't match code.
ACK
>> -     for (i = MAX_SKB_FRAGS + 1; i > 1; --i) {
>> +     for (i = sg_num + 1; i > 1; --i) {
>>               first = get_a_page(rq, gfp);
>>               if (!first) {
>>                       if (list)
>> @@ -1350,7 +1360,7 @@ static int add_recvbuf_big(struct virtnet_info 
>> *vi, struct receive_queue *rq,
>>
>>       /* chain first in list head */
>>       first->private = (unsigned long)list;
>> -     err = virtqueue_add_inbuf(rq->vq, rq->sg, MAX_SKB_FRAGS + 2,
>> +     err = virtqueue_add_inbuf(rq->vq, rq->sg, sg_num + 2,
>>                                 first, gfp);
>>       if (err < 0)
>>               give_pages(rq, first);
>> @@ -3571,8 +3581,10 @@ static int virtnet_probe(struct virtio_device 
>> *vdev)
>>       if (virtio_has_feature(vdev, VIRTIO_NET_F_GUEST_TSO4) ||
>>           virtio_has_feature(vdev, VIRTIO_NET_F_GUEST_TSO6) ||
>>           virtio_has_feature(vdev, VIRTIO_NET_F_GUEST_ECN) ||
>> -         virtio_has_feature(vdev, VIRTIO_NET_F_GUEST_UFO))
>> +         virtio_has_feature(vdev, VIRTIO_NET_F_GUEST_UFO)) {
>>               vi->big_packets = true;
>> +             vi->gso_is_supported = true;
> Please do the same for virtnet_clear_guest_offloads(), and
> correspondingly virtnet_restore_guest_offloads() as well. Not sure why
> virtnet_clear_guest_offloads() or the caller doesn't unset big_packet on
> successful return, seems like a bug to me.
ACK. The two calls virtnet_set_guest_offloads and 
virtnet_set_guest_offloads is also called by virtnet_set_features. Do 
you think if I can do this in virtnet_set_guest_offloads?
>
>
> Thanks,
> -Siwei
>> +     }
>>
>>       if (virtio_has_feature(vdev, VIRTIO_NET_F_MRG_RXBUF))
>>               vi->mergeable_rx_bufs = true;
>

---------------------------------------------------------------------
To unsubscribe, e-mail: virtio-dev-unsubscribe@lists.oasis-open.org
For additional commands, e-mail: virtio-dev-help@lists.oasis-open.org


^ permalink raw reply	[flat|nested] 102+ messages in thread

* Re: [virtio-dev] [PATCH] virtio-net: use mtu size as buffer length for big packets
  2022-08-05 23:26     ` Si-Wei Liu
  (?)
@ 2022-08-08  7:34     ` Gavin Li
  -1 siblings, 0 replies; 102+ messages in thread
From: Gavin Li @ 2022-08-08  7:34 UTC (permalink / raw)
  To: Si-Wei Liu, mst, stephen, davem, virtualization, virtio-dev,
	jesse.brandeburg, alexander.h.duyck, kubakici, sridhar.samudrala,
	jasowang, loseweigh
  Cc: parav, gavi


On 8/6/2022 7:26 AM, Si-Wei Liu wrote:
> External email: Use caution opening links or attachments
>
>
> On 8/5/2022 3:11 PM, Si-Wei Liu wrote:
>>
>>
>> On 8/1/2022 9:45 PM, Gavin Li wrote:
>>> Currently add_recvbuf_big() allocates MAX_SKB_FRAGS segments for big
>>> packets even when GUEST_* offloads are not present on the device.
>>> However, if GSO is not supported,
>> GUEST GSO (virtio term), or GRO HW (netdev core term) it should have
>> been be called.
>>
>>>   it would be sufficient to allocate
>>> segments to cover just up the MTU size and no further. Allocating the
>>> maximum amount of segments results in a large waste of buffer space in
>>> the queue, which limits the number of packets that can be buffered and
>>> can result in reduced performance.
>>>
>>> Therefore, if GSO is not supported,
>> Ditto.
>>
>>> use the MTU to calculate the
>>> optimal amount of segments required.
>>>
>>> Below is the iperf TCP test results over a Mellanox NIC, using vDPA for
>>> 1 VQ, queue size 1024, before and after the change, with the iperf
>>> server running over the virtio-net interface.
>>>
>>> MTU(Bytes)/Bandwidth (Gbit/s)
>>>               Before   After
>>>    1500        22.5     22.4
>>>    9000        12.8     25.9
>>>
>>> Signed-off-by: Gavin Li <gavinl@nvidia.com>
>>> Reviewed-by: Gavi Teitz <gavi@nvidia.com>
>>> Reviewed-by: Parav Pandit <parav@nvidia.com>
>>> ---
>>>   drivers/net/virtio_net.c | 20 ++++++++++++++++----
>>>   1 file changed, 16 insertions(+), 4 deletions(-)
>>>
>>> diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
>>> index ec8e1b3108c3..d36918c1809d 100644
>>> --- a/drivers/net/virtio_net.c
>>> +++ b/drivers/net/virtio_net.c
>>> @@ -222,6 +222,9 @@ struct virtnet_info {
>>>       /* I like... big packets and I cannot lie! */
>>>       bool big_packets;
>>>   +    /* Indicates GSO support */
>>> +    bool gso_is_supported;
>>> +
>>>       /* Host will merge rx buffers for big packets (shake it! shake
>>> it!) */
>>>       bool mergeable_rx_bufs;
>>>   @@ -1312,14 +1315,21 @@ static int add_recvbuf_small(struct
>>> virtnet_info *vi, struct receive_queue *rq,
>>>   static int add_recvbuf_big(struct virtnet_info *vi, struct
>>> receive_queue *rq,
>>>                  gfp_t gfp)
>>>   {
>>> +    unsigned int sg_num = MAX_SKB_FRAGS;
>>>       struct page *first, *list = NULL;
>>>       char *p;
>>>       int i, err, offset;
>>>   -    sg_init_table(rq->sg, MAX_SKB_FRAGS + 2);
>>> +    if (!vi->gso_is_supported) {
>>> +        unsigned int mtu = vi->dev->mtu;
>>> +
>>> +        sg_num = (mtu % PAGE_SIZE) ? mtu / PAGE_SIZE + 1 : mtu /
>>> PAGE_SIZE;
>> DIV_ROUND_UP() can be used?
>>
>> Since this branch slightly adds up cost to the datapath, I wonder if
>> this sg_num can be saved and set only once (generally in virtnet_probe
>> time
> ... , but can align with new mtu during .ndo_change_mtu(), too.
ACK but don't know how to do this.
>> ) in struct virtnet_info?
>
> Thanks,
> -Siwei
>
>>> +    }
>>> +
>>> +    sg_init_table(rq->sg, sg_num + 2);
>>>         /* page in rq->sg[MAX_SKB_FRAGS + 1] is list tail */
>> Comment doesn't match code.
>>> -    for (i = MAX_SKB_FRAGS + 1; i > 1; --i) {
>>> +    for (i = sg_num + 1; i > 1; --i) {
>>>           first = get_a_page(rq, gfp);
>>>           if (!first) {
>>>               if (list)
>>> @@ -1350,7 +1360,7 @@ static int add_recvbuf_big(struct virtnet_info
>>> *vi, struct receive_queue *rq,
>>>         /* chain first in list head */
>>>       first->private = (unsigned long)list;
>>> -    err = virtqueue_add_inbuf(rq->vq, rq->sg, MAX_SKB_FRAGS + 2,
>>> +    err = virtqueue_add_inbuf(rq->vq, rq->sg, sg_num + 2,
>>>                     first, gfp);
>>>       if (err < 0)
>>>           give_pages(rq, first);
>>> @@ -3571,8 +3581,10 @@ static int virtnet_probe(struct virtio_device
>>> *vdev)
>>>       if (virtio_has_feature(vdev, VIRTIO_NET_F_GUEST_TSO4) ||
>>>           virtio_has_feature(vdev, VIRTIO_NET_F_GUEST_TSO6) ||
>>>           virtio_has_feature(vdev, VIRTIO_NET_F_GUEST_ECN) ||
>>> -        virtio_has_feature(vdev, VIRTIO_NET_F_GUEST_UFO))
>>> +        virtio_has_feature(vdev, VIRTIO_NET_F_GUEST_UFO)) {
>>>           vi->big_packets = true;
>>> +        vi->gso_is_supported = true;
>> Please do the same for virtnet_clear_guest_offloads(), and
>> correspondingly virtnet_restore_guest_offloads() as well. Not sure why
>> virtnet_clear_guest_offloads() or the caller doesn't unset big_packet
>> on successful return, seems like a bug to me.
>>
>>
>> Thanks,
>> -Siwei
>>> +    }
>>>         if (virtio_has_feature(vdev, VIRTIO_NET_F_MRG_RXBUF))
>>>           vi->mergeable_rx_bufs = true;
>>
>

---------------------------------------------------------------------
To unsubscribe, e-mail: virtio-dev-unsubscribe@lists.oasis-open.org
For additional commands, e-mail: virtio-dev-help@lists.oasis-open.org


^ permalink raw reply	[flat|nested] 102+ messages in thread

* Re: [virtio-dev] [PATCH] virtio-net: use mtu size as buffer length for big packets
  2022-08-08  7:31   ` Gavin Li
@ 2022-08-08 23:56       ` Si-Wei Liu
  0 siblings, 0 replies; 102+ messages in thread
From: Si-Wei Liu @ 2022-08-08 23:56 UTC (permalink / raw)
  To: Gavin Li, mst, stephen, davem, virtualization, virtio-dev,
	jesse.brandeburg, alexander.h.duyck, kubakici, sridhar.samudrala,
	jasowang, loseweigh
  Cc: gavi



On 8/8/2022 12:31 AM, Gavin Li wrote:
>
> On 8/6/2022 6:11 AM, Si-Wei Liu wrote:
>> External email: Use caution opening links or attachments
>>
>>
>> On 8/1/2022 9:45 PM, Gavin Li wrote:
>>> Currently add_recvbuf_big() allocates MAX_SKB_FRAGS segments for big
>>> packets even when GUEST_* offloads are not present on the device.
>>> However, if GSO is not supported,
>> GUEST GSO (virtio term), or GRO HW (netdev core term) it should have
>> been be called.
> ACK
>>
>>>   it would be sufficient to allocate
>>> segments to cover just up the MTU size and no further. Allocating the
>>> maximum amount of segments results in a large waste of buffer space in
>>> the queue, which limits the number of packets that can be buffered and
>>> can result in reduced performance.
>>>
>>> Therefore, if GSO is not supported,
>> Ditto.
> ACK
>>
>>> use the MTU to calculate the
>>> optimal amount of segments required.
>>>
>>> Below is the iperf TCP test results over a Mellanox NIC, using vDPA for
>>> 1 VQ, queue size 1024, before and after the change, with the iperf
>>> server running over the virtio-net interface.
>>>
>>> MTU(Bytes)/Bandwidth (Gbit/s)
>>>               Before   After
>>>    1500        22.5     22.4
>>>    9000        12.8     25.9
>>>
>>> Signed-off-by: Gavin Li <gavinl@nvidia.com>
>>> Reviewed-by: Gavi Teitz <gavi@nvidia.com>
>>> Reviewed-by: Parav Pandit <parav@nvidia.com>
>>> ---
>>>   drivers/net/virtio_net.c | 20 ++++++++++++++++----
>>>   1 file changed, 16 insertions(+), 4 deletions(-)
>>>
>>> diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
>>> index ec8e1b3108c3..d36918c1809d 100644
>>> --- a/drivers/net/virtio_net.c
>>> +++ b/drivers/net/virtio_net.c
>>> @@ -222,6 +222,9 @@ struct virtnet_info {
>>>       /* I like... big packets and I cannot lie! */
>>>       bool big_packets;
>>>
>>> +     /* Indicates GSO support */
>>> +     bool gso_is_supported;
>>> +
>>>       /* Host will merge rx buffers for big packets (shake it! shake 
>>> it!) */
>>>       bool mergeable_rx_bufs;
>>>
>>> @@ -1312,14 +1315,21 @@ static int add_recvbuf_small(struct 
>>> virtnet_info *vi, struct receive_queue *rq,
>>>   static int add_recvbuf_big(struct virtnet_info *vi, struct 
>>> receive_queue *rq,
>>>                          gfp_t gfp)
>>>   {
>>> +     unsigned int sg_num = MAX_SKB_FRAGS;
>>>       struct page *first, *list = NULL;
>>>       char *p;
>>>       int i, err, offset;
>>>
>>> -     sg_init_table(rq->sg, MAX_SKB_FRAGS + 2);
>>> +     if (!vi->gso_is_supported) {
>>> +             unsigned int mtu = vi->dev->mtu;
>>> +
>>> +             sg_num = (mtu % PAGE_SIZE) ? mtu / PAGE_SIZE + 1 : mtu 
>>> / PAGE_SIZE;
>> DIV_ROUND_UP() can be used?
> ACK
>>
>> Since this branch slightly adds up cost to the datapath, I wonder if
>> this sg_num can be saved and set only once (generally in virtnet_probe
>> time) in struct virtnet_info?
> Not sure how to do it and align it with align with new mtu during 
> .ndo_change_mtu()---as you mentioned in the following mail. Any idea? 
> ndo_change_mtu might be in vendor specific code and unmanageable. In 
> my case, the mtu can only be changed in the xml of the guest vm.
Nope, for e.g. "ip link dev eth0 set mtu 1500" can be done from guest on 
a virtio-net device with 9000 MTU (as defined in guest xml). Basically 
guest user can set MTU to any valid value lower than the original 
HOST_MTU. In the vendor defined .ndo_change_mtu() op, dev_validate_mtu() 
should have validated the MTU value before coming down to it. And I 
suspect you might want to do virtnet_close() and virtnet_open() 
before/after changing the buffer size on the fly (the netif_running() 
case), implementing .ndo_change_mtu() will be needed anyway.

>>> +     }
>>> +
>>> +     sg_init_table(rq->sg, sg_num + 2);
>>>
>>>       /* page in rq->sg[MAX_SKB_FRAGS + 1] is list tail */
>> Comment doesn't match code.
> ACK
>>> -     for (i = MAX_SKB_FRAGS + 1; i > 1; --i) {
>>> +     for (i = sg_num + 1; i > 1; --i) {
>>>               first = get_a_page(rq, gfp);
>>>               if (!first) {
>>>                       if (list)
>>> @@ -1350,7 +1360,7 @@ static int add_recvbuf_big(struct virtnet_info 
>>> *vi, struct receive_queue *rq,
>>>
>>>       /* chain first in list head */
>>>       first->private = (unsigned long)list;
>>> -     err = virtqueue_add_inbuf(rq->vq, rq->sg, MAX_SKB_FRAGS + 2,
>>> +     err = virtqueue_add_inbuf(rq->vq, rq->sg, sg_num + 2,
>>>                                 first, gfp);
>>>       if (err < 0)
>>>               give_pages(rq, first);
>>> @@ -3571,8 +3581,10 @@ static int virtnet_probe(struct virtio_device 
>>> *vdev)
>>>       if (virtio_has_feature(vdev, VIRTIO_NET_F_GUEST_TSO4) ||
>>>           virtio_has_feature(vdev, VIRTIO_NET_F_GUEST_TSO6) ||
>>>           virtio_has_feature(vdev, VIRTIO_NET_F_GUEST_ECN) ||
>>> -         virtio_has_feature(vdev, VIRTIO_NET_F_GUEST_UFO))
>>> +         virtio_has_feature(vdev, VIRTIO_NET_F_GUEST_UFO)) {
>>>               vi->big_packets = true;
>>> +             vi->gso_is_supported = true;
>> Please do the same for virtnet_clear_guest_offloads(), and
>> correspondingly virtnet_restore_guest_offloads() as well. Not sure why
>> virtnet_clear_guest_offloads() or the caller doesn't unset big_packet on
>> successful return, seems like a bug to me.
> ACK. The two calls virtnet_set_guest_offloads and 
> virtnet_set_guest_offloads is also called by virtnet_set_features. Do 
> you think if I can do this in virtnet_set_guest_offloads?
I think that it should be fine, though you may want to deal with the XDP 
path not to regress it.

-Siwei

>>
>>
>> Thanks,
>> -Siwei
>>> +     }
>>>
>>>       if (virtio_has_feature(vdev, VIRTIO_NET_F_MRG_RXBUF))
>>>               vi->mergeable_rx_bufs = true;
>>

_______________________________________________
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization

^ permalink raw reply	[flat|nested] 102+ messages in thread

* Re: [virtio-dev] [PATCH] virtio-net: use mtu size as buffer length for big packets
@ 2022-08-08 23:56       ` Si-Wei Liu
  0 siblings, 0 replies; 102+ messages in thread
From: Si-Wei Liu @ 2022-08-08 23:56 UTC (permalink / raw)
  To: Gavin Li, mst, stephen, davem, virtualization, virtio-dev,
	jesse.brandeburg, alexander.h.duyck, kubakici, sridhar.samudrala,
	jasowang, loseweigh
  Cc: parav, gavi



On 8/8/2022 12:31 AM, Gavin Li wrote:
>
> On 8/6/2022 6:11 AM, Si-Wei Liu wrote:
>> External email: Use caution opening links or attachments
>>
>>
>> On 8/1/2022 9:45 PM, Gavin Li wrote:
>>> Currently add_recvbuf_big() allocates MAX_SKB_FRAGS segments for big
>>> packets even when GUEST_* offloads are not present on the device.
>>> However, if GSO is not supported,
>> GUEST GSO (virtio term), or GRO HW (netdev core term) it should have
>> been be called.
> ACK
>>
>>>   it would be sufficient to allocate
>>> segments to cover just up the MTU size and no further. Allocating the
>>> maximum amount of segments results in a large waste of buffer space in
>>> the queue, which limits the number of packets that can be buffered and
>>> can result in reduced performance.
>>>
>>> Therefore, if GSO is not supported,
>> Ditto.
> ACK
>>
>>> use the MTU to calculate the
>>> optimal amount of segments required.
>>>
>>> Below is the iperf TCP test results over a Mellanox NIC, using vDPA for
>>> 1 VQ, queue size 1024, before and after the change, with the iperf
>>> server running over the virtio-net interface.
>>>
>>> MTU(Bytes)/Bandwidth (Gbit/s)
>>>               Before   After
>>>    1500        22.5     22.4
>>>    9000        12.8     25.9
>>>
>>> Signed-off-by: Gavin Li <gavinl@nvidia.com>
>>> Reviewed-by: Gavi Teitz <gavi@nvidia.com>
>>> Reviewed-by: Parav Pandit <parav@nvidia.com>
>>> ---
>>>   drivers/net/virtio_net.c | 20 ++++++++++++++++----
>>>   1 file changed, 16 insertions(+), 4 deletions(-)
>>>
>>> diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
>>> index ec8e1b3108c3..d36918c1809d 100644
>>> --- a/drivers/net/virtio_net.c
>>> +++ b/drivers/net/virtio_net.c
>>> @@ -222,6 +222,9 @@ struct virtnet_info {
>>>       /* I like... big packets and I cannot lie! */
>>>       bool big_packets;
>>>
>>> +     /* Indicates GSO support */
>>> +     bool gso_is_supported;
>>> +
>>>       /* Host will merge rx buffers for big packets (shake it! shake 
>>> it!) */
>>>       bool mergeable_rx_bufs;
>>>
>>> @@ -1312,14 +1315,21 @@ static int add_recvbuf_small(struct 
>>> virtnet_info *vi, struct receive_queue *rq,
>>>   static int add_recvbuf_big(struct virtnet_info *vi, struct 
>>> receive_queue *rq,
>>>                          gfp_t gfp)
>>>   {
>>> +     unsigned int sg_num = MAX_SKB_FRAGS;
>>>       struct page *first, *list = NULL;
>>>       char *p;
>>>       int i, err, offset;
>>>
>>> -     sg_init_table(rq->sg, MAX_SKB_FRAGS + 2);
>>> +     if (!vi->gso_is_supported) {
>>> +             unsigned int mtu = vi->dev->mtu;
>>> +
>>> +             sg_num = (mtu % PAGE_SIZE) ? mtu / PAGE_SIZE + 1 : mtu 
>>> / PAGE_SIZE;
>> DIV_ROUND_UP() can be used?
> ACK
>>
>> Since this branch slightly adds up cost to the datapath, I wonder if
>> this sg_num can be saved and set only once (generally in virtnet_probe
>> time) in struct virtnet_info?
> Not sure how to do it and align it with align with new mtu during 
> .ndo_change_mtu()---as you mentioned in the following mail. Any idea? 
> ndo_change_mtu might be in vendor specific code and unmanageable. In 
> my case, the mtu can only be changed in the xml of the guest vm.
Nope, for e.g. "ip link dev eth0 set mtu 1500" can be done from guest on 
a virtio-net device with 9000 MTU (as defined in guest xml). Basically 
guest user can set MTU to any valid value lower than the original 
HOST_MTU. In the vendor defined .ndo_change_mtu() op, dev_validate_mtu() 
should have validated the MTU value before coming down to it. And I 
suspect you might want to do virtnet_close() and virtnet_open() 
before/after changing the buffer size on the fly (the netif_running() 
case), implementing .ndo_change_mtu() will be needed anyway.

>>> +     }
>>> +
>>> +     sg_init_table(rq->sg, sg_num + 2);
>>>
>>>       /* page in rq->sg[MAX_SKB_FRAGS + 1] is list tail */
>> Comment doesn't match code.
> ACK
>>> -     for (i = MAX_SKB_FRAGS + 1; i > 1; --i) {
>>> +     for (i = sg_num + 1; i > 1; --i) {
>>>               first = get_a_page(rq, gfp);
>>>               if (!first) {
>>>                       if (list)
>>> @@ -1350,7 +1360,7 @@ static int add_recvbuf_big(struct virtnet_info 
>>> *vi, struct receive_queue *rq,
>>>
>>>       /* chain first in list head */
>>>       first->private = (unsigned long)list;
>>> -     err = virtqueue_add_inbuf(rq->vq, rq->sg, MAX_SKB_FRAGS + 2,
>>> +     err = virtqueue_add_inbuf(rq->vq, rq->sg, sg_num + 2,
>>>                                 first, gfp);
>>>       if (err < 0)
>>>               give_pages(rq, first);
>>> @@ -3571,8 +3581,10 @@ static int virtnet_probe(struct virtio_device 
>>> *vdev)
>>>       if (virtio_has_feature(vdev, VIRTIO_NET_F_GUEST_TSO4) ||
>>>           virtio_has_feature(vdev, VIRTIO_NET_F_GUEST_TSO6) ||
>>>           virtio_has_feature(vdev, VIRTIO_NET_F_GUEST_ECN) ||
>>> -         virtio_has_feature(vdev, VIRTIO_NET_F_GUEST_UFO))
>>> +         virtio_has_feature(vdev, VIRTIO_NET_F_GUEST_UFO)) {
>>>               vi->big_packets = true;
>>> +             vi->gso_is_supported = true;
>> Please do the same for virtnet_clear_guest_offloads(), and
>> correspondingly virtnet_restore_guest_offloads() as well. Not sure why
>> virtnet_clear_guest_offloads() or the caller doesn't unset big_packet on
>> successful return, seems like a bug to me.
> ACK. The two calls virtnet_set_guest_offloads and 
> virtnet_set_guest_offloads is also called by virtnet_set_features. Do 
> you think if I can do this in virtnet_set_guest_offloads?
I think that it should be fine, though you may want to deal with the XDP 
path not to regress it.

-Siwei

>>
>>
>> Thanks,
>> -Siwei
>>> +     }
>>>
>>>       if (virtio_has_feature(vdev, VIRTIO_NET_F_MRG_RXBUF))
>>>               vi->mergeable_rx_bufs = true;
>>


---------------------------------------------------------------------
To unsubscribe, e-mail: virtio-dev-unsubscribe@lists.oasis-open.org
For additional commands, e-mail: virtio-dev-help@lists.oasis-open.org


^ permalink raw reply	[flat|nested] 102+ messages in thread

* Re: [virtio-dev] [PATCH] virtio-net: use mtu size as buffer length for big packets
  2022-08-08 23:56       ` Si-Wei Liu
  (?)
@ 2022-08-09  7:06       ` Gavin Li
  2022-08-09  7:44           ` Jason Wang
  2022-08-09 18:06           ` Si-Wei Liu
  -1 siblings, 2 replies; 102+ messages in thread
From: Gavin Li @ 2022-08-09  7:06 UTC (permalink / raw)
  To: Si-Wei Liu, mst, stephen, davem, virtualization, virtio-dev,
	jesse.brandeburg, alexander.h.duyck, kubakici, sridhar.samudrala,
	jasowang, loseweigh
  Cc: parav, gavi

[-- Attachment #1: Type: text/plain, Size: 6860 bytes --]


On 8/9/2022 7:56 AM, Si-Wei Liu wrote:
> External email: Use caution opening links or attachments
>
>
> On 8/8/2022 12:31 AM, Gavin Li wrote:
>>
>> On 8/6/2022 6:11 AM, Si-Wei Liu wrote:
>>> External email: Use caution opening links or attachments
>>>
>>>
>>> On 8/1/2022 9:45 PM, Gavin Li wrote:
>>>> Currently add_recvbuf_big() allocates MAX_SKB_FRAGS segments for big
>>>> packets even when GUEST_* offloads are not present on the device.
>>>> However, if GSO is not supported,
>>> GUEST GSO (virtio term), or GRO HW (netdev core term) it should have
>>> been be called.
>> ACK
>>>
>>>>   it would be sufficient to allocate
>>>> segments to cover just up the MTU size and no further. Allocating the
>>>> maximum amount of segments results in a large waste of buffer space in
>>>> the queue, which limits the number of packets that can be buffered and
>>>> can result in reduced performance.
>>>>
>>>> Therefore, if GSO is not supported,
>>> Ditto.
>> ACK
>>>
>>>> use the MTU to calculate the
>>>> optimal amount of segments required.
>>>>
>>>> Below is the iperf TCP test results over a Mellanox NIC, using vDPA 
>>>> for
>>>> 1 VQ, queue size 1024, before and after the change, with the iperf
>>>> server running over the virtio-net interface.
>>>>
>>>> MTU(Bytes)/Bandwidth (Gbit/s)
>>>>               Before   After
>>>>    1500        22.5     22.4
>>>>    9000        12.8     25.9
>>>>
>>>> Signed-off-by: Gavin Li <gavinl@nvidia.com>
>>>> Reviewed-by: Gavi Teitz <gavi@nvidia.com>
>>>> Reviewed-by: Parav Pandit <parav@nvidia.com>
>>>> ---
>>>>   drivers/net/virtio_net.c | 20 ++++++++++++++++----
>>>>   1 file changed, 16 insertions(+), 4 deletions(-)
>>>>
>>>> diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
>>>> index ec8e1b3108c3..d36918c1809d 100644
>>>> --- a/drivers/net/virtio_net.c
>>>> +++ b/drivers/net/virtio_net.c
>>>> @@ -222,6 +222,9 @@ struct virtnet_info {
>>>>       /* I like... big packets and I cannot lie! */
>>>>       bool big_packets;
>>>>
>>>> +     /* Indicates GSO support */
>>>> +     bool gso_is_supported;
>>>> +
>>>>       /* Host will merge rx buffers for big packets (shake it! shake
>>>> it!) */
>>>>       bool mergeable_rx_bufs;
>>>>
>>>> @@ -1312,14 +1315,21 @@ static int add_recvbuf_small(struct
>>>> virtnet_info *vi, struct receive_queue *rq,
>>>>   static int add_recvbuf_big(struct virtnet_info *vi, struct
>>>> receive_queue *rq,
>>>>                          gfp_t gfp)
>>>>   {
>>>> +     unsigned int sg_num = MAX_SKB_FRAGS;
>>>>       struct page *first, *list = NULL;
>>>>       char *p;
>>>>       int i, err, offset;
>>>>
>>>> -     sg_init_table(rq->sg, MAX_SKB_FRAGS + 2);
>>>> +     if (!vi->gso_is_supported) {
>>>> +             unsigned int mtu = vi->dev->mtu;
>>>> +
>>>> +             sg_num = (mtu % PAGE_SIZE) ? mtu / PAGE_SIZE + 1 : mtu
>>>> / PAGE_SIZE;
>>> DIV_ROUND_UP() can be used?
>> ACK
>>>
>>> Since this branch slightly adds up cost to the datapath, I wonder if
>>> this sg_num can be saved and set only once (generally in virtnet_probe
>>> time) in struct virtnet_info?
>> Not sure how to do it and align it with align with new mtu during
>> .ndo_change_mtu()---as you mentioned in the following mail. Any idea?
>> ndo_change_mtu might be in vendor specific code and unmanageable. In
>> my case, the mtu can only be changed in the xml of the guest vm.
> Nope, for e.g. "ip link dev eth0 set mtu 1500" can be done from guest on
> a virtio-net device with 9000 MTU (as defined in guest xml). Basically
> guest user can set MTU to any valid value lower than the original
> HOST_MTU. In the vendor defined .ndo_change_mtu() op, dev_validate_mtu()
> should have validated the MTU value before coming down to it. And I
> suspect you might want to do virtnet_close() and virtnet_open()
> before/after changing the buffer size on the fly (the netif_running()
> case), implementing .ndo_change_mtu() will be needed anyway.
a guest VM driver changing mtu to smaller one is valid use case. 
However, current optimization suggested in the patch doesn't degrade any 
performance. Performing close() and open() sequence is good idea, that I 
would like to take up next after this patch as its going to be more than 
one patch to achieve it.
>
>>>> +     }
>>>> +
>>>> +     sg_init_table(rq->sg, sg_num + 2);
>>>>
>>>>       /* page in rq->sg[MAX_SKB_FRAGS + 1] is list tail */
>>> Comment doesn't match code.
>> ACK
>>>> -     for (i = MAX_SKB_FRAGS + 1; i > 1; --i) {
>>>> +     for (i = sg_num + 1; i > 1; --i) {
>>>>               first = get_a_page(rq, gfp);
>>>>               if (!first) {
>>>>                       if (list)
>>>> @@ -1350,7 +1360,7 @@ static int add_recvbuf_big(struct virtnet_info
>>>> *vi, struct receive_queue *rq,
>>>>
>>>>       /* chain first in list head */
>>>>       first->private = (unsigned long)list;
>>>> -     err = virtqueue_add_inbuf(rq->vq, rq->sg, MAX_SKB_FRAGS + 2,
>>>> +     err = virtqueue_add_inbuf(rq->vq, rq->sg, sg_num + 2,
>>>>                                 first, gfp);
>>>>       if (err < 0)
>>>>               give_pages(rq, first);
>>>> @@ -3571,8 +3581,10 @@ static int virtnet_probe(struct virtio_device
>>>> *vdev)
>>>>       if (virtio_has_feature(vdev, VIRTIO_NET_F_GUEST_TSO4) ||
>>>>           virtio_has_feature(vdev, VIRTIO_NET_F_GUEST_TSO6) ||
>>>>           virtio_has_feature(vdev, VIRTIO_NET_F_GUEST_ECN) ||
>>>> -         virtio_has_feature(vdev, VIRTIO_NET_F_GUEST_UFO))
>>>> +         virtio_has_feature(vdev, VIRTIO_NET_F_GUEST_UFO)) {
>>>>               vi->big_packets = true;
>>>> +             vi->gso_is_supported = true;
>>> Please do the same for virtnet_clear_guest_offloads(), and
>>> correspondingly virtnet_restore_guest_offloads() as well. Not sure why
>>> virtnet_clear_guest_offloads() or the caller doesn't unset 
>>> big_packet on
>>> successful return, seems like a bug to me.
>> ACK. The two calls virtnet_set_guest_offloads and
>> virtnet_set_guest_offloads is also called by virtnet_set_features. Do
>> you think if I can do this in virtnet_set_guest_offloads?
> I think that it should be fine, though you may want to deal with the XDP
> path not to regress it.
>
> -Siwei
>
>>>
>>>
>>> Thanks,
>>> -Siwei
>>>> +     }
>>>>
>>>>       if (virtio_has_feature(vdev, VIRTIO_NET_F_MRG_RXBUF))
>>>>               vi->mergeable_rx_bufs = true;
>>>
>

[-- Attachment #2: Type: text/html, Size: 14159 bytes --]

^ permalink raw reply	[flat|nested] 102+ messages in thread

* Re: [virtio-dev] [PATCH] virtio-net: use mtu size as buffer length for big packets
  2022-08-09  7:06       ` Gavin Li
@ 2022-08-09  7:44           ` Jason Wang
  2022-08-09 18:06           ` Si-Wei Liu
  1 sibling, 0 replies; 102+ messages in thread
From: Jason Wang @ 2022-08-09  7:44 UTC (permalink / raw)
  To: Gavin Li
  Cc: alexander.h.duyck, Virtio-Dev, mst, kubakici, sridhar.samudrala,
	jesse.brandeburg, gavi, virtualization, Hemminger, Stephen,
	loseweigh, davem

On Tue, Aug 9, 2022 at 3:07 PM Gavin Li <gavinl@nvidia.com> wrote:
>
>
> On 8/9/2022 7:56 AM, Si-Wei Liu wrote:
>
> External email: Use caution opening links or attachments
>
>
> On 8/8/2022 12:31 AM, Gavin Li wrote:
>
>
> On 8/6/2022 6:11 AM, Si-Wei Liu wrote:
>
> External email: Use caution opening links or attachments
>
>
> On 8/1/2022 9:45 PM, Gavin Li wrote:
>
> Currently add_recvbuf_big() allocates MAX_SKB_FRAGS segments for big
> packets even when GUEST_* offloads are not present on the device.
> However, if GSO is not supported,
>
> GUEST GSO (virtio term), or GRO HW (netdev core term) it should have
> been be called.
>
> ACK
>
>
>   it would be sufficient to allocate
> segments to cover just up the MTU size and no further. Allocating the
> maximum amount of segments results in a large waste of buffer space in
> the queue, which limits the number of packets that can be buffered and
> can result in reduced performance.
>
> Therefore, if GSO is not supported,
>
> Ditto.
>
> ACK
>
>
> use the MTU to calculate the
> optimal amount of segments required.
>
> Below is the iperf TCP test results over a Mellanox NIC, using vDPA for
> 1 VQ, queue size 1024, before and after the change, with the iperf
> server running over the virtio-net interface.
>
> MTU(Bytes)/Bandwidth (Gbit/s)
>               Before   After
>    1500        22.5     22.4
>    9000        12.8     25.9
>
> Signed-off-by: Gavin Li <gavinl@nvidia.com>
> Reviewed-by: Gavi Teitz <gavi@nvidia.com>
> Reviewed-by: Parav Pandit <parav@nvidia.com>
> ---
>   drivers/net/virtio_net.c | 20 ++++++++++++++++----
>   1 file changed, 16 insertions(+), 4 deletions(-)
>
> diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
> index ec8e1b3108c3..d36918c1809d 100644
> --- a/drivers/net/virtio_net.c
> +++ b/drivers/net/virtio_net.c
> @@ -222,6 +222,9 @@ struct virtnet_info {
>       /* I like... big packets and I cannot lie! */
>       bool big_packets;
>
> +     /* Indicates GSO support */
> +     bool gso_is_supported;
> +
>       /* Host will merge rx buffers for big packets (shake it! shake
> it!) */
>       bool mergeable_rx_bufs;
>
> @@ -1312,14 +1315,21 @@ static int add_recvbuf_small(struct
> virtnet_info *vi, struct receive_queue *rq,
>   static int add_recvbuf_big(struct virtnet_info *vi, struct
> receive_queue *rq,
>                          gfp_t gfp)
>   {
> +     unsigned int sg_num = MAX_SKB_FRAGS;
>       struct page *first, *list = NULL;
>       char *p;
>       int i, err, offset;
>
> -     sg_init_table(rq->sg, MAX_SKB_FRAGS + 2);
> +     if (!vi->gso_is_supported) {
> +             unsigned int mtu = vi->dev->mtu;
> +
> +             sg_num = (mtu % PAGE_SIZE) ? mtu / PAGE_SIZE + 1 : mtu
> / PAGE_SIZE;
>
> DIV_ROUND_UP() can be used?
>
> ACK
>
>
> Since this branch slightly adds up cost to the datapath, I wonder if
> this sg_num can be saved and set only once (generally in virtnet_probe
> time) in struct virtnet_info?
>
> Not sure how to do it and align it with align with new mtu during
> .ndo_change_mtu()---as you mentioned in the following mail. Any idea?
> ndo_change_mtu might be in vendor specific code and unmanageable. In
> my case, the mtu can only be changed in the xml of the guest vm.
>
> Nope, for e.g. "ip link dev eth0 set mtu 1500" can be done from guest on
> a virtio-net device with 9000 MTU (as defined in guest xml). Basically
> guest user can set MTU to any valid value lower than the original
> HOST_MTU. In the vendor defined .ndo_change_mtu() op, dev_validate_mtu()
> should have validated the MTU value before coming down to it. And I
> suspect you might want to do virtnet_close() and virtnet_open()
> before/after changing the buffer size on the fly (the netif_running()
> case), implementing .ndo_change_mtu() will be needed anyway.
>
> a guest VM driver changing mtu to smaller one is valid use case. However, current optimization suggested in the patch doesn't degrade any performance. Performing close() and open() sequence is good idea, that I would like to take up next after this patch as its going to be more than one patch to achieve it.

Right, it could be done on top.

But another note is that, it would still be better to support GUEST GSO feature:

1) can work for the case for path MTU
2) (migration)compatibility with software backends

>
>
> +     }
> +
> +     sg_init_table(rq->sg, sg_num + 2);
>
>       /* page in rq->sg[MAX_SKB_FRAGS + 1] is list tail */
>
> Comment doesn't match code.
>
> ACK
>
> -     for (i = MAX_SKB_FRAGS + 1; i > 1; --i) {
> +     for (i = sg_num + 1; i > 1; --i) {
>               first = get_a_page(rq, gfp);
>               if (!first) {
>                       if (list)
> @@ -1350,7 +1360,7 @@ static int add_recvbuf_big(struct virtnet_info
> *vi, struct receive_queue *rq,
>
>       /* chain first in list head */
>       first->private = (unsigned long)list;
> -     err = virtqueue_add_inbuf(rq->vq, rq->sg, MAX_SKB_FRAGS + 2,
> +     err = virtqueue_add_inbuf(rq->vq, rq->sg, sg_num + 2,
>                                 first, gfp);
>       if (err < 0)
>               give_pages(rq, first);
> @@ -3571,8 +3581,10 @@ static int virtnet_probe(struct virtio_device
> *vdev)
>       if (virtio_has_feature(vdev, VIRTIO_NET_F_GUEST_TSO4) ||
>           virtio_has_feature(vdev, VIRTIO_NET_F_GUEST_TSO6) ||
>           virtio_has_feature(vdev, VIRTIO_NET_F_GUEST_ECN) ||
> -         virtio_has_feature(vdev, VIRTIO_NET_F_GUEST_UFO))
> +         virtio_has_feature(vdev, VIRTIO_NET_F_GUEST_UFO)) {
>               vi->big_packets = true;
> +             vi->gso_is_supported = true;
>
> Please do the same for virtnet_clear_guest_offloads(), and
> correspondingly virtnet_restore_guest_offloads() as well. Not sure why
> virtnet_clear_guest_offloads() or the caller doesn't unset big_packet on
> successful return, seems like a bug to me.

It is fine as long as

1) we don't implement ethtool API for changing guest offloads
2) big mode XDP is not enabled

So that code works only for XDP but we forbid big packets in the case
of XDP right now.

Thanks

>
> ACK. The two calls virtnet_set_guest_offloads and
> virtnet_set_guest_offloads is also called by virtnet_set_features. Do
> you think if I can do this in virtnet_set_guest_offloads?
>
> I think that it should be fine, though you may want to deal with the XDP
> path not to regress it.
>
> -Siwei
>
>
>
> Thanks,
> -Siwei
>
> +     }
>
>       if (virtio_has_feature(vdev, VIRTIO_NET_F_MRG_RXBUF))
>               vi->mergeable_rx_bufs = true;
>
>
>

_______________________________________________
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization

^ permalink raw reply	[flat|nested] 102+ messages in thread

* Re: [virtio-dev] [PATCH] virtio-net: use mtu size as buffer length for big packets
@ 2022-08-09  7:44           ` Jason Wang
  0 siblings, 0 replies; 102+ messages in thread
From: Jason Wang @ 2022-08-09  7:44 UTC (permalink / raw)
  To: Gavin Li
  Cc: Si-Wei Liu, mst, Hemminger, Stephen, davem, virtualization,
	Virtio-Dev, jesse.brandeburg, alexander.h.duyck, kubakici,
	sridhar.samudrala, loseweigh, Parav Pandit, gavi

On Tue, Aug 9, 2022 at 3:07 PM Gavin Li <gavinl@nvidia.com> wrote:
>
>
> On 8/9/2022 7:56 AM, Si-Wei Liu wrote:
>
> External email: Use caution opening links or attachments
>
>
> On 8/8/2022 12:31 AM, Gavin Li wrote:
>
>
> On 8/6/2022 6:11 AM, Si-Wei Liu wrote:
>
> External email: Use caution opening links or attachments
>
>
> On 8/1/2022 9:45 PM, Gavin Li wrote:
>
> Currently add_recvbuf_big() allocates MAX_SKB_FRAGS segments for big
> packets even when GUEST_* offloads are not present on the device.
> However, if GSO is not supported,
>
> GUEST GSO (virtio term), or GRO HW (netdev core term) it should have
> been be called.
>
> ACK
>
>
>   it would be sufficient to allocate
> segments to cover just up the MTU size and no further. Allocating the
> maximum amount of segments results in a large waste of buffer space in
> the queue, which limits the number of packets that can be buffered and
> can result in reduced performance.
>
> Therefore, if GSO is not supported,
>
> Ditto.
>
> ACK
>
>
> use the MTU to calculate the
> optimal amount of segments required.
>
> Below is the iperf TCP test results over a Mellanox NIC, using vDPA for
> 1 VQ, queue size 1024, before and after the change, with the iperf
> server running over the virtio-net interface.
>
> MTU(Bytes)/Bandwidth (Gbit/s)
>               Before   After
>    1500        22.5     22.4
>    9000        12.8     25.9
>
> Signed-off-by: Gavin Li <gavinl@nvidia.com>
> Reviewed-by: Gavi Teitz <gavi@nvidia.com>
> Reviewed-by: Parav Pandit <parav@nvidia.com>
> ---
>   drivers/net/virtio_net.c | 20 ++++++++++++++++----
>   1 file changed, 16 insertions(+), 4 deletions(-)
>
> diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
> index ec8e1b3108c3..d36918c1809d 100644
> --- a/drivers/net/virtio_net.c
> +++ b/drivers/net/virtio_net.c
> @@ -222,6 +222,9 @@ struct virtnet_info {
>       /* I like... big packets and I cannot lie! */
>       bool big_packets;
>
> +     /* Indicates GSO support */
> +     bool gso_is_supported;
> +
>       /* Host will merge rx buffers for big packets (shake it! shake
> it!) */
>       bool mergeable_rx_bufs;
>
> @@ -1312,14 +1315,21 @@ static int add_recvbuf_small(struct
> virtnet_info *vi, struct receive_queue *rq,
>   static int add_recvbuf_big(struct virtnet_info *vi, struct
> receive_queue *rq,
>                          gfp_t gfp)
>   {
> +     unsigned int sg_num = MAX_SKB_FRAGS;
>       struct page *first, *list = NULL;
>       char *p;
>       int i, err, offset;
>
> -     sg_init_table(rq->sg, MAX_SKB_FRAGS + 2);
> +     if (!vi->gso_is_supported) {
> +             unsigned int mtu = vi->dev->mtu;
> +
> +             sg_num = (mtu % PAGE_SIZE) ? mtu / PAGE_SIZE + 1 : mtu
> / PAGE_SIZE;
>
> DIV_ROUND_UP() can be used?
>
> ACK
>
>
> Since this branch slightly adds up cost to the datapath, I wonder if
> this sg_num can be saved and set only once (generally in virtnet_probe
> time) in struct virtnet_info?
>
> Not sure how to do it and align it with align with new mtu during
> .ndo_change_mtu()---as you mentioned in the following mail. Any idea?
> ndo_change_mtu might be in vendor specific code and unmanageable. In
> my case, the mtu can only be changed in the xml of the guest vm.
>
> Nope, for e.g. "ip link dev eth0 set mtu 1500" can be done from guest on
> a virtio-net device with 9000 MTU (as defined in guest xml). Basically
> guest user can set MTU to any valid value lower than the original
> HOST_MTU. In the vendor defined .ndo_change_mtu() op, dev_validate_mtu()
> should have validated the MTU value before coming down to it. And I
> suspect you might want to do virtnet_close() and virtnet_open()
> before/after changing the buffer size on the fly (the netif_running()
> case), implementing .ndo_change_mtu() will be needed anyway.
>
> a guest VM driver changing mtu to smaller one is valid use case. However, current optimization suggested in the patch doesn't degrade any performance. Performing close() and open() sequence is good idea, that I would like to take up next after this patch as its going to be more than one patch to achieve it.

Right, it could be done on top.

But another note is that, it would still be better to support GUEST GSO feature:

1) can work for the case for path MTU
2) (migration)compatibility with software backends

>
>
> +     }
> +
> +     sg_init_table(rq->sg, sg_num + 2);
>
>       /* page in rq->sg[MAX_SKB_FRAGS + 1] is list tail */
>
> Comment doesn't match code.
>
> ACK
>
> -     for (i = MAX_SKB_FRAGS + 1; i > 1; --i) {
> +     for (i = sg_num + 1; i > 1; --i) {
>               first = get_a_page(rq, gfp);
>               if (!first) {
>                       if (list)
> @@ -1350,7 +1360,7 @@ static int add_recvbuf_big(struct virtnet_info
> *vi, struct receive_queue *rq,
>
>       /* chain first in list head */
>       first->private = (unsigned long)list;
> -     err = virtqueue_add_inbuf(rq->vq, rq->sg, MAX_SKB_FRAGS + 2,
> +     err = virtqueue_add_inbuf(rq->vq, rq->sg, sg_num + 2,
>                                 first, gfp);
>       if (err < 0)
>               give_pages(rq, first);
> @@ -3571,8 +3581,10 @@ static int virtnet_probe(struct virtio_device
> *vdev)
>       if (virtio_has_feature(vdev, VIRTIO_NET_F_GUEST_TSO4) ||
>           virtio_has_feature(vdev, VIRTIO_NET_F_GUEST_TSO6) ||
>           virtio_has_feature(vdev, VIRTIO_NET_F_GUEST_ECN) ||
> -         virtio_has_feature(vdev, VIRTIO_NET_F_GUEST_UFO))
> +         virtio_has_feature(vdev, VIRTIO_NET_F_GUEST_UFO)) {
>               vi->big_packets = true;
> +             vi->gso_is_supported = true;
>
> Please do the same for virtnet_clear_guest_offloads(), and
> correspondingly virtnet_restore_guest_offloads() as well. Not sure why
> virtnet_clear_guest_offloads() or the caller doesn't unset big_packet on
> successful return, seems like a bug to me.

It is fine as long as

1) we don't implement ethtool API for changing guest offloads
2) big mode XDP is not enabled

So that code works only for XDP but we forbid big packets in the case
of XDP right now.

Thanks

>
> ACK. The two calls virtnet_set_guest_offloads and
> virtnet_set_guest_offloads is also called by virtnet_set_features. Do
> you think if I can do this in virtnet_set_guest_offloads?
>
> I think that it should be fine, though you may want to deal with the XDP
> path not to regress it.
>
> -Siwei
>
>
>
> Thanks,
> -Siwei
>
> +     }
>
>       if (virtio_has_feature(vdev, VIRTIO_NET_F_MRG_RXBUF))
>               vi->mergeable_rx_bufs = true;
>
>
>


---------------------------------------------------------------------
To unsubscribe, e-mail: virtio-dev-unsubscribe@lists.oasis-open.org
For additional commands, e-mail: virtio-dev-help@lists.oasis-open.org


^ permalink raw reply	[flat|nested] 102+ messages in thread

* Re: [virtio-dev] [PATCH] virtio-net: use mtu size as buffer length for big packets
  2022-08-09  7:44           ` Jason Wang
@ 2022-08-09  9:22             ` Michael S. Tsirkin
  -1 siblings, 0 replies; 102+ messages in thread
From: Michael S. Tsirkin @ 2022-08-09  9:22 UTC (permalink / raw)
  To: Jason Wang
  Cc: alexander.h.duyck, Virtio-Dev, kubakici, sridhar.samudrala,
	jesse.brandeburg, gavi, virtualization, Hemminger, Stephen,
	loseweigh, davem, Gavin Li

On Tue, Aug 09, 2022 at 03:44:22PM +0800, Jason Wang wrote:
> > @@ -3571,8 +3581,10 @@ static int virtnet_probe(struct virtio_device
> > *vdev)
> >       if (virtio_has_feature(vdev, VIRTIO_NET_F_GUEST_TSO4) ||
> >           virtio_has_feature(vdev, VIRTIO_NET_F_GUEST_TSO6) ||
> >           virtio_has_feature(vdev, VIRTIO_NET_F_GUEST_ECN) ||
> > -         virtio_has_feature(vdev, VIRTIO_NET_F_GUEST_UFO))
> > +         virtio_has_feature(vdev, VIRTIO_NET_F_GUEST_UFO)) {
> >               vi->big_packets = true;
> > +             vi->gso_is_supported = true;
> >
> > Please do the same for virtnet_clear_guest_offloads(), and
> > correspondingly virtnet_restore_guest_offloads() as well. Not sure why
> > virtnet_clear_guest_offloads() or the caller doesn't unset big_packet on
> > successful return, seems like a bug to me.
> 
> It is fine as long as
> 
> 1) we don't implement ethtool API for changing guest offloads
> 2) big mode XDP is not enabled
> 
> So that code works only for XDP but we forbid big packets in the case
> of XDP right now.
> 
> Thanks

To put it another way, changing big_packets after probe requires a bunch
of work as current code assumes this flag never changes.
Adding a TODO to handle dynamic offload config is fine but
I don't think it should block this.

-- 
MST

_______________________________________________
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization

^ permalink raw reply	[flat|nested] 102+ messages in thread

* Re: [virtio-dev] [PATCH] virtio-net: use mtu size as buffer length for big packets
@ 2022-08-09  9:22             ` Michael S. Tsirkin
  0 siblings, 0 replies; 102+ messages in thread
From: Michael S. Tsirkin @ 2022-08-09  9:22 UTC (permalink / raw)
  To: Jason Wang
  Cc: Gavin Li, Si-Wei Liu, Hemminger, Stephen, davem, virtualization,
	Virtio-Dev, jesse.brandeburg, alexander.h.duyck, kubakici,
	sridhar.samudrala, loseweigh, Parav Pandit, gavi

On Tue, Aug 09, 2022 at 03:44:22PM +0800, Jason Wang wrote:
> > @@ -3571,8 +3581,10 @@ static int virtnet_probe(struct virtio_device
> > *vdev)
> >       if (virtio_has_feature(vdev, VIRTIO_NET_F_GUEST_TSO4) ||
> >           virtio_has_feature(vdev, VIRTIO_NET_F_GUEST_TSO6) ||
> >           virtio_has_feature(vdev, VIRTIO_NET_F_GUEST_ECN) ||
> > -         virtio_has_feature(vdev, VIRTIO_NET_F_GUEST_UFO))
> > +         virtio_has_feature(vdev, VIRTIO_NET_F_GUEST_UFO)) {
> >               vi->big_packets = true;
> > +             vi->gso_is_supported = true;
> >
> > Please do the same for virtnet_clear_guest_offloads(), and
> > correspondingly virtnet_restore_guest_offloads() as well. Not sure why
> > virtnet_clear_guest_offloads() or the caller doesn't unset big_packet on
> > successful return, seems like a bug to me.
> 
> It is fine as long as
> 
> 1) we don't implement ethtool API for changing guest offloads
> 2) big mode XDP is not enabled
> 
> So that code works only for XDP but we forbid big packets in the case
> of XDP right now.
> 
> Thanks

To put it another way, changing big_packets after probe requires a bunch
of work as current code assumes this flag never changes.
Adding a TODO to handle dynamic offload config is fine but
I don't think it should block this.

-- 
MST


---------------------------------------------------------------------
To unsubscribe, e-mail: virtio-dev-unsubscribe@lists.oasis-open.org
For additional commands, e-mail: virtio-dev-help@lists.oasis-open.org


^ permalink raw reply	[flat|nested] 102+ messages in thread

* Re: [virtio-dev] [PATCH] virtio-net: use mtu size as buffer length for big packets
  2022-08-09  7:44           ` Jason Wang
@ 2022-08-09  9:25             ` Michael S. Tsirkin
  -1 siblings, 0 replies; 102+ messages in thread
From: Michael S. Tsirkin @ 2022-08-09  9:25 UTC (permalink / raw)
  To: Jason Wang
  Cc: alexander.h.duyck, Virtio-Dev, kubakici, sridhar.samudrala,
	jesse.brandeburg, gavi, virtualization, Hemminger, Stephen,
	loseweigh, davem, Gavin Li

On Tue, Aug 09, 2022 at 03:44:22PM +0800, Jason Wang wrote:
> > +             unsigned int mtu = vi->dev->mtu;

BTW should this not be max_mtu?  Otherwise if user configures mtu that
is too small we'll add buffers that are too small.  some backends simply
lock up if this happens (I think vhost does).
Maybe we should add a feature to allow packet drop if it's too small.
And send mtu guest to host while we are at it?

-- 
MST

_______________________________________________
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization

^ permalink raw reply	[flat|nested] 102+ messages in thread

* Re: [virtio-dev] [PATCH] virtio-net: use mtu size as buffer length for big packets
@ 2022-08-09  9:25             ` Michael S. Tsirkin
  0 siblings, 0 replies; 102+ messages in thread
From: Michael S. Tsirkin @ 2022-08-09  9:25 UTC (permalink / raw)
  To: Jason Wang
  Cc: Gavin Li, Si-Wei Liu, Hemminger, Stephen, davem, virtualization,
	Virtio-Dev, jesse.brandeburg, alexander.h.duyck, kubakici,
	sridhar.samudrala, loseweigh, Parav Pandit, gavi

On Tue, Aug 09, 2022 at 03:44:22PM +0800, Jason Wang wrote:
> > +             unsigned int mtu = vi->dev->mtu;

BTW should this not be max_mtu?  Otherwise if user configures mtu that
is too small we'll add buffers that are too small.  some backends simply
lock up if this happens (I think vhost does).
Maybe we should add a feature to allow packet drop if it's too small.
And send mtu guest to host while we are at it?

-- 
MST


---------------------------------------------------------------------
To unsubscribe, e-mail: virtio-dev-unsubscribe@lists.oasis-open.org
For additional commands, e-mail: virtio-dev-help@lists.oasis-open.org


^ permalink raw reply	[flat|nested] 102+ messages in thread

* Re: [virtio-dev] [PATCH] virtio-net: use mtu size as buffer length for big packets
  2022-08-09  9:22             ` Michael S. Tsirkin
@ 2022-08-09  9:28               ` Jason Wang
  -1 siblings, 0 replies; 102+ messages in thread
From: Jason Wang @ 2022-08-09  9:28 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: alexander.h.duyck, Virtio-Dev, kubakici, sridhar.samudrala,
	jesse.brandeburg, gavi, virtualization, Hemminger, Stephen,
	loseweigh, davem, Gavin Li

On Tue, Aug 9, 2022 at 5:22 PM Michael S. Tsirkin <mst@redhat.com> wrote:
>
> On Tue, Aug 09, 2022 at 03:44:22PM +0800, Jason Wang wrote:
> > > @@ -3571,8 +3581,10 @@ static int virtnet_probe(struct virtio_device
> > > *vdev)
> > >       if (virtio_has_feature(vdev, VIRTIO_NET_F_GUEST_TSO4) ||
> > >           virtio_has_feature(vdev, VIRTIO_NET_F_GUEST_TSO6) ||
> > >           virtio_has_feature(vdev, VIRTIO_NET_F_GUEST_ECN) ||
> > > -         virtio_has_feature(vdev, VIRTIO_NET_F_GUEST_UFO))
> > > +         virtio_has_feature(vdev, VIRTIO_NET_F_GUEST_UFO)) {
> > >               vi->big_packets = true;
> > > +             vi->gso_is_supported = true;
> > >
> > > Please do the same for virtnet_clear_guest_offloads(), and
> > > correspondingly virtnet_restore_guest_offloads() as well. Not sure why
> > > virtnet_clear_guest_offloads() or the caller doesn't unset big_packet on
> > > successful return, seems like a bug to me.
> >
> > It is fine as long as
> >
> > 1) we don't implement ethtool API for changing guest offloads
> > 2) big mode XDP is not enabled
> >
> > So that code works only for XDP but we forbid big packets in the case
> > of XDP right now.
> >
> > Thanks
>
> To put it another way, changing big_packets after probe requires a bunch
> of work as current code assumes this flag never changes.
> Adding a TODO to handle dynamic offload config is fine but
> I don't think it should block this.

Yes, this is what I mean.

Thanks

>
> --
> MST
>

_______________________________________________
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization

^ permalink raw reply	[flat|nested] 102+ messages in thread

* Re: [virtio-dev] [PATCH] virtio-net: use mtu size as buffer length for big packets
@ 2022-08-09  9:28               ` Jason Wang
  0 siblings, 0 replies; 102+ messages in thread
From: Jason Wang @ 2022-08-09  9:28 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: Gavin Li, Si-Wei Liu, Hemminger, Stephen, davem, virtualization,
	Virtio-Dev, jesse.brandeburg, alexander.h.duyck, kubakici,
	sridhar.samudrala, loseweigh, Parav Pandit, gavi

On Tue, Aug 9, 2022 at 5:22 PM Michael S. Tsirkin <mst@redhat.com> wrote:
>
> On Tue, Aug 09, 2022 at 03:44:22PM +0800, Jason Wang wrote:
> > > @@ -3571,8 +3581,10 @@ static int virtnet_probe(struct virtio_device
> > > *vdev)
> > >       if (virtio_has_feature(vdev, VIRTIO_NET_F_GUEST_TSO4) ||
> > >           virtio_has_feature(vdev, VIRTIO_NET_F_GUEST_TSO6) ||
> > >           virtio_has_feature(vdev, VIRTIO_NET_F_GUEST_ECN) ||
> > > -         virtio_has_feature(vdev, VIRTIO_NET_F_GUEST_UFO))
> > > +         virtio_has_feature(vdev, VIRTIO_NET_F_GUEST_UFO)) {
> > >               vi->big_packets = true;
> > > +             vi->gso_is_supported = true;
> > >
> > > Please do the same for virtnet_clear_guest_offloads(), and
> > > correspondingly virtnet_restore_guest_offloads() as well. Not sure why
> > > virtnet_clear_guest_offloads() or the caller doesn't unset big_packet on
> > > successful return, seems like a bug to me.
> >
> > It is fine as long as
> >
> > 1) we don't implement ethtool API for changing guest offloads
> > 2) big mode XDP is not enabled
> >
> > So that code works only for XDP but we forbid big packets in the case
> > of XDP right now.
> >
> > Thanks
>
> To put it another way, changing big_packets after probe requires a bunch
> of work as current code assumes this flag never changes.
> Adding a TODO to handle dynamic offload config is fine but
> I don't think it should block this.

Yes, this is what I mean.

Thanks

>
> --
> MST
>


---------------------------------------------------------------------
To unsubscribe, e-mail: virtio-dev-unsubscribe@lists.oasis-open.org
For additional commands, e-mail: virtio-dev-help@lists.oasis-open.org


^ permalink raw reply	[flat|nested] 102+ messages in thread

* Re: [virtio-dev] [PATCH] virtio-net: use mtu size as buffer length for big packets
  2022-08-09  9:25             ` Michael S. Tsirkin
@ 2022-08-09  9:40               ` Jason Wang
  -1 siblings, 0 replies; 102+ messages in thread
From: Jason Wang @ 2022-08-09  9:40 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: alexander.h.duyck, Virtio-Dev, kubakici, sridhar.samudrala,
	jesse.brandeburg, gavi, virtualization, Hemminger, Stephen,
	loseweigh, davem, Gavin Li

On Tue, Aug 9, 2022 at 5:25 PM Michael S. Tsirkin <mst@redhat.com> wrote:
>
> On Tue, Aug 09, 2022 at 03:44:22PM +0800, Jason Wang wrote:
> > > +             unsigned int mtu = vi->dev->mtu;
>
> BTW should this not be max_mtu?

Yes.

> Otherwise if user configures mtu that
> is too small we'll add buffers that are too small.  some backends simply
> lock up if this happens (I think vhost does).

Probably not? If we run out of buffers, we will wait for the next
kick. (Otherwise it would be an guest triggerable behaviour)

> Maybe we should add a feature to allow packet drop if it's too small.

I may miss something but isn't this what most devices will do? (e.g
ethernet have a minimal packet length)

Thanks

> And send mtu guest to host while we are at it?
>
> --
> MST
>

_______________________________________________
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization

^ permalink raw reply	[flat|nested] 102+ messages in thread

* Re: [virtio-dev] [PATCH] virtio-net: use mtu size as buffer length for big packets
@ 2022-08-09  9:40               ` Jason Wang
  0 siblings, 0 replies; 102+ messages in thread
From: Jason Wang @ 2022-08-09  9:40 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: Gavin Li, Si-Wei Liu, Hemminger, Stephen, davem, virtualization,
	Virtio-Dev, jesse.brandeburg, alexander.h.duyck, kubakici,
	sridhar.samudrala, loseweigh, Parav Pandit, gavi

On Tue, Aug 9, 2022 at 5:25 PM Michael S. Tsirkin <mst@redhat.com> wrote:
>
> On Tue, Aug 09, 2022 at 03:44:22PM +0800, Jason Wang wrote:
> > > +             unsigned int mtu = vi->dev->mtu;
>
> BTW should this not be max_mtu?

Yes.

> Otherwise if user configures mtu that
> is too small we'll add buffers that are too small.  some backends simply
> lock up if this happens (I think vhost does).

Probably not? If we run out of buffers, we will wait for the next
kick. (Otherwise it would be an guest triggerable behaviour)

> Maybe we should add a feature to allow packet drop if it's too small.

I may miss something but isn't this what most devices will do? (e.g
ethernet have a minimal packet length)

Thanks

> And send mtu guest to host while we are at it?
>
> --
> MST
>


---------------------------------------------------------------------
To unsubscribe, e-mail: virtio-dev-unsubscribe@lists.oasis-open.org
For additional commands, e-mail: virtio-dev-help@lists.oasis-open.org


^ permalink raw reply	[flat|nested] 102+ messages in thread

* Re: [virtio-dev] [PATCH] virtio-net: use mtu size as buffer length for big packets
  2022-08-09  7:06       ` Gavin Li
@ 2022-08-09 18:06           ` Si-Wei Liu
  2022-08-09 18:06           ` Si-Wei Liu
  1 sibling, 0 replies; 102+ messages in thread
From: Si-Wei Liu @ 2022-08-09 18:06 UTC (permalink / raw)
  To: Gavin Li, mst, stephen, davem, virtualization, virtio-dev,
	jesse.brandeburg, alexander.h.duyck, kubakici, sridhar.samudrala,
	jasowang, loseweigh
  Cc: gavi


[-- Attachment #1.1: Type: text/plain, Size: 7257 bytes --]



On 8/9/2022 12:06 AM, Gavin Li wrote:
>
>
> On 8/9/2022 7:56 AM, Si-Wei Liu wrote:
>> External email: Use caution opening links or attachments
>>
>>
>> On 8/8/2022 12:31 AM, Gavin Li wrote:
>>>
>>> On 8/6/2022 6:11 AM, Si-Wei Liu wrote:
>>>> External email: Use caution opening links or attachments
>>>>
>>>>
>>>> On 8/1/2022 9:45 PM, Gavin Li wrote:
>>>>> Currently add_recvbuf_big() allocates MAX_SKB_FRAGS segments for big
>>>>> packets even when GUEST_* offloads are not present on the device.
>>>>> However, if GSO is not supported,
>>>> GUEST GSO (virtio term), or GRO HW (netdev core term) it should have
>>>> been be called.
>>> ACK
>>>>
>>>>>   it would be sufficient to allocate
>>>>> segments to cover just up the MTU size and no further. Allocating the
>>>>> maximum amount of segments results in a large waste of buffer 
>>>>> space in
>>>>> the queue, which limits the number of packets that can be buffered 
>>>>> and
>>>>> can result in reduced performance.
>>>>>
>>>>> Therefore, if GSO is not supported,
>>>> Ditto.
>>> ACK
>>>>
>>>>> use the MTU to calculate the
>>>>> optimal amount of segments required.
>>>>>
>>>>> Below is the iperf TCP test results over a Mellanox NIC, using 
>>>>> vDPA for
>>>>> 1 VQ, queue size 1024, before and after the change, with the iperf
>>>>> server running over the virtio-net interface.
>>>>>
>>>>> MTU(Bytes)/Bandwidth (Gbit/s)
>>>>>               Before   After
>>>>>    1500        22.5     22.4
>>>>>    9000        12.8     25.9
>>>>>
>>>>> Signed-off-by: Gavin Li <gavinl@nvidia.com>
>>>>> Reviewed-by: Gavi Teitz <gavi@nvidia.com>
>>>>> Reviewed-by: Parav Pandit <parav@nvidia.com>
>>>>> ---
>>>>>   drivers/net/virtio_net.c | 20 ++++++++++++++++----
>>>>>   1 file changed, 16 insertions(+), 4 deletions(-)
>>>>>
>>>>> diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
>>>>> index ec8e1b3108c3..d36918c1809d 100644
>>>>> --- a/drivers/net/virtio_net.c
>>>>> +++ b/drivers/net/virtio_net.c
>>>>> @@ -222,6 +222,9 @@ struct virtnet_info {
>>>>>       /* I like... big packets and I cannot lie! */
>>>>>       bool big_packets;
>>>>>
>>>>> +     /* Indicates GSO support */
>>>>> +     bool gso_is_supported;
>>>>> +
>>>>>       /* Host will merge rx buffers for big packets (shake it! shake
>>>>> it!) */
>>>>>       bool mergeable_rx_bufs;
>>>>>
>>>>> @@ -1312,14 +1315,21 @@ static int add_recvbuf_small(struct
>>>>> virtnet_info *vi, struct receive_queue *rq,
>>>>>   static int add_recvbuf_big(struct virtnet_info *vi, struct
>>>>> receive_queue *rq,
>>>>>                          gfp_t gfp)
>>>>>   {
>>>>> +     unsigned int sg_num = MAX_SKB_FRAGS;
>>>>>       struct page *first, *list = NULL;
>>>>>       char *p;
>>>>>       int i, err, offset;
>>>>>
>>>>> -     sg_init_table(rq->sg, MAX_SKB_FRAGS + 2);
>>>>> +     if (!vi->gso_is_supported) {
>>>>> +             unsigned int mtu = vi->dev->mtu;
>>>>> +
>>>>> +             sg_num = (mtu % PAGE_SIZE) ? mtu / PAGE_SIZE + 1 : mtu
>>>>> / PAGE_SIZE;
>>>> DIV_ROUND_UP() can be used?
>>> ACK
>>>>
>>>> Since this branch slightly adds up cost to the datapath, I wonder if
>>>> this sg_num can be saved and set only once (generally in virtnet_probe
>>>> time) in struct virtnet_info?
>>> Not sure how to do it and align it with align with new mtu during
>>> .ndo_change_mtu()---as you mentioned in the following mail. Any idea?
>>> ndo_change_mtu might be in vendor specific code and unmanageable. In
>>> my case, the mtu can only be changed in the xml of the guest vm.
>> Nope, for e.g. "ip link dev eth0 set mtu 1500" can be done from guest 
>> on a virtio-net device with 9000 MTU (as defined in guest xml). 
>> Basically
>> guest user can set MTU to any valid value lower than the original
>> HOST_MTU. In the vendor defined .ndo_change_mtu() op, dev_validate_mtu()
>> should have validated the MTU value before coming down to it. And I
>> suspect you might want to do virtnet_close() and virtnet_open()
>> before/after changing the buffer size on the fly (the netif_running()
>> case), implementing .ndo_change_mtu() will be needed anyway.
> a guest VM driver changing mtu to smaller one is valid use case. 
> However, current optimization suggested in the patch doesn't degrade 
> any performance. Performing close() and open() sequence is good idea, 
> that I would like to take up next after this patch as its going to be 
> more than one patch to achieve it.
Sure, it's fine to separate it out into another patch and optimize on 
top later on. Though the previous comment of avoiding repeatedly 
computing sg_num in datapath still holds: set sg_num only once in 
probe() time then, in which case the sg_num needed can be simply 
inferred from dev->max_mtu rather than dev->mtu.

-Siwei

>>
>>>>> +     }
>>>>> +
>>>>> +     sg_init_table(rq->sg, sg_num + 2);
>>>>>
>>>>>       /* page in rq->sg[MAX_SKB_FRAGS + 1] is list tail */
>>>> Comment doesn't match code.
>>> ACK
>>>>> -     for (i = MAX_SKB_FRAGS + 1; i > 1; --i) {
>>>>> +     for (i = sg_num + 1; i > 1; --i) {
>>>>>               first = get_a_page(rq, gfp);
>>>>>               if (!first) {
>>>>>                       if (list)
>>>>> @@ -1350,7 +1360,7 @@ static int add_recvbuf_big(struct virtnet_info
>>>>> *vi, struct receive_queue *rq,
>>>>>
>>>>>       /* chain first in list head */
>>>>>       first->private = (unsigned long)list;
>>>>> -     err = virtqueue_add_inbuf(rq->vq, rq->sg, MAX_SKB_FRAGS + 2,
>>>>> +     err = virtqueue_add_inbuf(rq->vq, rq->sg, sg_num + 2,
>>>>>                                 first, gfp);
>>>>>       if (err < 0)
>>>>>               give_pages(rq, first);
>>>>> @@ -3571,8 +3581,10 @@ static int virtnet_probe(struct virtio_device
>>>>> *vdev)
>>>>>       if (virtio_has_feature(vdev, VIRTIO_NET_F_GUEST_TSO4) ||
>>>>>           virtio_has_feature(vdev, VIRTIO_NET_F_GUEST_TSO6) ||
>>>>>           virtio_has_feature(vdev, VIRTIO_NET_F_GUEST_ECN) ||
>>>>> -         virtio_has_feature(vdev, VIRTIO_NET_F_GUEST_UFO))
>>>>> +         virtio_has_feature(vdev, VIRTIO_NET_F_GUEST_UFO)) {
>>>>>               vi->big_packets = true;
>>>>> +             vi->gso_is_supported = true;
>>>> Please do the same for virtnet_clear_guest_offloads(), and
>>>> correspondingly virtnet_restore_guest_offloads() as well. Not sure why
>>>> virtnet_clear_guest_offloads() or the caller doesn't unset 
>>>> big_packet on
>>>> successful return, seems like a bug to me.
>>> ACK. The two calls virtnet_set_guest_offloads and
>>> virtnet_set_guest_offloads is also called by virtnet_set_features. Do
>>> you think if I can do this in virtnet_set_guest_offloads?
>> I think that it should be fine, though you may want to deal with the XDP
>> path not to regress it.
>>
>> -Siwei
>>
>>>>
>>>>
>>>> Thanks,
>>>> -Siwei
>>>>> +     }
>>>>>
>>>>>       if (virtio_has_feature(vdev, VIRTIO_NET_F_MRG_RXBUF))
>>>>>               vi->mergeable_rx_bufs = true;
>>>>
>>

[-- Attachment #1.2: Type: text/html, Size: 13783 bytes --]

[-- Attachment #2: Type: text/plain, Size: 183 bytes --]

_______________________________________________
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization

^ permalink raw reply	[flat|nested] 102+ messages in thread

* Re: [virtio-dev] [PATCH] virtio-net: use mtu size as buffer length for big packets
@ 2022-08-09 18:06           ` Si-Wei Liu
  0 siblings, 0 replies; 102+ messages in thread
From: Si-Wei Liu @ 2022-08-09 18:06 UTC (permalink / raw)
  To: Gavin Li, mst, stephen, davem, virtualization, virtio-dev,
	jesse.brandeburg, alexander.h.duyck, kubakici, sridhar.samudrala,
	jasowang, loseweigh
  Cc: parav, gavi

[-- Attachment #1: Type: text/plain, Size: 7431 bytes --]



On 8/9/2022 12:06 AM, Gavin Li wrote:
>
>
> On 8/9/2022 7:56 AM, Si-Wei Liu wrote:
>> External email: Use caution opening links or attachments
>>
>>
>> On 8/8/2022 12:31 AM, Gavin Li wrote:
>>>
>>> On 8/6/2022 6:11 AM, Si-Wei Liu wrote:
>>>> External email: Use caution opening links or attachments
>>>>
>>>>
>>>> On 8/1/2022 9:45 PM, Gavin Li wrote:
>>>>> Currently add_recvbuf_big() allocates MAX_SKB_FRAGS segments for big
>>>>> packets even when GUEST_* offloads are not present on the device.
>>>>> However, if GSO is not supported,
>>>> GUEST GSO (virtio term), or GRO HW (netdev core term) it should have
>>>> been be called.
>>> ACK
>>>>
>>>>>   it would be sufficient to allocate
>>>>> segments to cover just up the MTU size and no further. Allocating the
>>>>> maximum amount of segments results in a large waste of buffer 
>>>>> space in
>>>>> the queue, which limits the number of packets that can be buffered 
>>>>> and
>>>>> can result in reduced performance.
>>>>>
>>>>> Therefore, if GSO is not supported,
>>>> Ditto.
>>> ACK
>>>>
>>>>> use the MTU to calculate the
>>>>> optimal amount of segments required.
>>>>>
>>>>> Below is the iperf TCP test results over a Mellanox NIC, using 
>>>>> vDPA for
>>>>> 1 VQ, queue size 1024, before and after the change, with the iperf
>>>>> server running over the virtio-net interface.
>>>>>
>>>>> MTU(Bytes)/Bandwidth (Gbit/s)
>>>>>               Before   After
>>>>>    1500        22.5     22.4
>>>>>    9000        12.8     25.9
>>>>>
>>>>> Signed-off-by: Gavin Li <gavinl@nvidia.com>
>>>>> Reviewed-by: Gavi Teitz <gavi@nvidia.com>
>>>>> Reviewed-by: Parav Pandit <parav@nvidia.com>
>>>>> ---
>>>>>   drivers/net/virtio_net.c | 20 ++++++++++++++++----
>>>>>   1 file changed, 16 insertions(+), 4 deletions(-)
>>>>>
>>>>> diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
>>>>> index ec8e1b3108c3..d36918c1809d 100644
>>>>> --- a/drivers/net/virtio_net.c
>>>>> +++ b/drivers/net/virtio_net.c
>>>>> @@ -222,6 +222,9 @@ struct virtnet_info {
>>>>>       /* I like... big packets and I cannot lie! */
>>>>>       bool big_packets;
>>>>>
>>>>> +     /* Indicates GSO support */
>>>>> +     bool gso_is_supported;
>>>>> +
>>>>>       /* Host will merge rx buffers for big packets (shake it! shake
>>>>> it!) */
>>>>>       bool mergeable_rx_bufs;
>>>>>
>>>>> @@ -1312,14 +1315,21 @@ static int add_recvbuf_small(struct
>>>>> virtnet_info *vi, struct receive_queue *rq,
>>>>>   static int add_recvbuf_big(struct virtnet_info *vi, struct
>>>>> receive_queue *rq,
>>>>>                          gfp_t gfp)
>>>>>   {
>>>>> +     unsigned int sg_num = MAX_SKB_FRAGS;
>>>>>       struct page *first, *list = NULL;
>>>>>       char *p;
>>>>>       int i, err, offset;
>>>>>
>>>>> -     sg_init_table(rq->sg, MAX_SKB_FRAGS + 2);
>>>>> +     if (!vi->gso_is_supported) {
>>>>> +             unsigned int mtu = vi->dev->mtu;
>>>>> +
>>>>> +             sg_num = (mtu % PAGE_SIZE) ? mtu / PAGE_SIZE + 1 : mtu
>>>>> / PAGE_SIZE;
>>>> DIV_ROUND_UP() can be used?
>>> ACK
>>>>
>>>> Since this branch slightly adds up cost to the datapath, I wonder if
>>>> this sg_num can be saved and set only once (generally in virtnet_probe
>>>> time) in struct virtnet_info?
>>> Not sure how to do it and align it with align with new mtu during
>>> .ndo_change_mtu()---as you mentioned in the following mail. Any idea?
>>> ndo_change_mtu might be in vendor specific code and unmanageable. In
>>> my case, the mtu can only be changed in the xml of the guest vm.
>> Nope, for e.g. "ip link dev eth0 set mtu 1500" can be done from guest 
>> on a virtio-net device with 9000 MTU (as defined in guest xml). 
>> Basically
>> guest user can set MTU to any valid value lower than the original
>> HOST_MTU. In the vendor defined .ndo_change_mtu() op, dev_validate_mtu()
>> should have validated the MTU value before coming down to it. And I
>> suspect you might want to do virtnet_close() and virtnet_open()
>> before/after changing the buffer size on the fly (the netif_running()
>> case), implementing .ndo_change_mtu() will be needed anyway.
> a guest VM driver changing mtu to smaller one is valid use case. 
> However, current optimization suggested in the patch doesn't degrade 
> any performance. Performing close() and open() sequence is good idea, 
> that I would like to take up next after this patch as its going to be 
> more than one patch to achieve it.
Sure, it's fine to separate it out into another patch and optimize on 
top later on. Though the previous comment of avoiding repeatedly 
computing sg_num in datapath still holds: set sg_num only once in 
probe() time then, in which case the sg_num needed can be simply 
inferred from dev->max_mtu rather than dev->mtu.

-Siwei

>>
>>>>> +     }
>>>>> +
>>>>> +     sg_init_table(rq->sg, sg_num + 2);
>>>>>
>>>>>       /* page in rq->sg[MAX_SKB_FRAGS + 1] is list tail */
>>>> Comment doesn't match code.
>>> ACK
>>>>> -     for (i = MAX_SKB_FRAGS + 1; i > 1; --i) {
>>>>> +     for (i = sg_num + 1; i > 1; --i) {
>>>>>               first = get_a_page(rq, gfp);
>>>>>               if (!first) {
>>>>>                       if (list)
>>>>> @@ -1350,7 +1360,7 @@ static int add_recvbuf_big(struct virtnet_info
>>>>> *vi, struct receive_queue *rq,
>>>>>
>>>>>       /* chain first in list head */
>>>>>       first->private = (unsigned long)list;
>>>>> -     err = virtqueue_add_inbuf(rq->vq, rq->sg, MAX_SKB_FRAGS + 2,
>>>>> +     err = virtqueue_add_inbuf(rq->vq, rq->sg, sg_num + 2,
>>>>>                                 first, gfp);
>>>>>       if (err < 0)
>>>>>               give_pages(rq, first);
>>>>> @@ -3571,8 +3581,10 @@ static int virtnet_probe(struct virtio_device
>>>>> *vdev)
>>>>>       if (virtio_has_feature(vdev, VIRTIO_NET_F_GUEST_TSO4) ||
>>>>>           virtio_has_feature(vdev, VIRTIO_NET_F_GUEST_TSO6) ||
>>>>>           virtio_has_feature(vdev, VIRTIO_NET_F_GUEST_ECN) ||
>>>>> -         virtio_has_feature(vdev, VIRTIO_NET_F_GUEST_UFO))
>>>>> +         virtio_has_feature(vdev, VIRTIO_NET_F_GUEST_UFO)) {
>>>>>               vi->big_packets = true;
>>>>> +             vi->gso_is_supported = true;
>>>> Please do the same for virtnet_clear_guest_offloads(), and
>>>> correspondingly virtnet_restore_guest_offloads() as well. Not sure why
>>>> virtnet_clear_guest_offloads() or the caller doesn't unset 
>>>> big_packet on
>>>> successful return, seems like a bug to me.
>>> ACK. The two calls virtnet_set_guest_offloads and
>>> virtnet_set_guest_offloads is also called by virtnet_set_features. Do
>>> you think if I can do this in virtnet_set_guest_offloads?
>> I think that it should be fine, though you may want to deal with the XDP
>> path not to regress it.
>>
>> -Siwei
>>
>>>>
>>>>
>>>> Thanks,
>>>> -Siwei
>>>>> +     }
>>>>>
>>>>>       if (virtio_has_feature(vdev, VIRTIO_NET_F_MRG_RXBUF))
>>>>>               vi->mergeable_rx_bufs = true;
>>>>
>>

[-- Attachment #2: Type: text/html, Size: 14047 bytes --]

^ permalink raw reply	[flat|nested] 102+ messages in thread

* Re: [virtio-dev] [PATCH] virtio-net: use mtu size as buffer length for big packets
  2022-08-09  7:44           ` Jason Wang
@ 2022-08-09 18:38             ` Si-Wei Liu
  -1 siblings, 0 replies; 102+ messages in thread
From: Si-Wei Liu @ 2022-08-09 18:38 UTC (permalink / raw)
  To: Jason Wang, Gavin Li
  Cc: alexander.h.duyck, Virtio-Dev, mst, kubakici, sridhar.samudrala,
	jesse.brandeburg, gavi, virtualization, Hemminger, Stephen,
	loseweigh, davem



On 8/9/2022 12:44 AM, Jason Wang wrote:
> On Tue, Aug 9, 2022 at 3:07 PM Gavin Li <gavinl@nvidia.com> wrote:
>>
>> On 8/9/2022 7:56 AM, Si-Wei Liu wrote:
>>
>> External email: Use caution opening links or attachments
>>
>>
>> On 8/8/2022 12:31 AM, Gavin Li wrote:
>>
>>
>> On 8/6/2022 6:11 AM, Si-Wei Liu wrote:
>>
>> External email: Use caution opening links or attachments
>>
>>
>> On 8/1/2022 9:45 PM, Gavin Li wrote:
>>
>> Currently add_recvbuf_big() allocates MAX_SKB_FRAGS segments for big
>> packets even when GUEST_* offloads are not present on the device.
>> However, if GSO is not supported,
>>
>> GUEST GSO (virtio term), or GRO HW (netdev core term) it should have
>> been be called.
>>
>> ACK
>>
>>
>>    it would be sufficient to allocate
>> segments to cover just up the MTU size and no further. Allocating the
>> maximum amount of segments results in a large waste of buffer space in
>> the queue, which limits the number of packets that can be buffered and
>> can result in reduced performance.
>>
>> Therefore, if GSO is not supported,
>>
>> Ditto.
>>
>> ACK
>>
>>
>> use the MTU to calculate the
>> optimal amount of segments required.
>>
>> Below is the iperf TCP test results over a Mellanox NIC, using vDPA for
>> 1 VQ, queue size 1024, before and after the change, with the iperf
>> server running over the virtio-net interface.
>>
>> MTU(Bytes)/Bandwidth (Gbit/s)
>>                Before   After
>>     1500        22.5     22.4
>>     9000        12.8     25.9
>>
>> Signed-off-by: Gavin Li <gavinl@nvidia.com>
>> Reviewed-by: Gavi Teitz <gavi@nvidia.com>
>> Reviewed-by: Parav Pandit <parav@nvidia.com>
>> ---
>>    drivers/net/virtio_net.c | 20 ++++++++++++++++----
>>    1 file changed, 16 insertions(+), 4 deletions(-)
>>
>> diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
>> index ec8e1b3108c3..d36918c1809d 100644
>> --- a/drivers/net/virtio_net.c
>> +++ b/drivers/net/virtio_net.c
>> @@ -222,6 +222,9 @@ struct virtnet_info {
>>        /* I like... big packets and I cannot lie! */
>>        bool big_packets;
>>
>> +     /* Indicates GSO support */
>> +     bool gso_is_supported;
>> +
>>        /* Host will merge rx buffers for big packets (shake it! shake
>> it!) */
>>        bool mergeable_rx_bufs;
>>
>> @@ -1312,14 +1315,21 @@ static int add_recvbuf_small(struct
>> virtnet_info *vi, struct receive_queue *rq,
>>    static int add_recvbuf_big(struct virtnet_info *vi, struct
>> receive_queue *rq,
>>                           gfp_t gfp)
>>    {
>> +     unsigned int sg_num = MAX_SKB_FRAGS;
>>        struct page *first, *list = NULL;
>>        char *p;
>>        int i, err, offset;
>>
>> -     sg_init_table(rq->sg, MAX_SKB_FRAGS + 2);
>> +     if (!vi->gso_is_supported) {
>> +             unsigned int mtu = vi->dev->mtu;
>> +
>> +             sg_num = (mtu % PAGE_SIZE) ? mtu / PAGE_SIZE + 1 : mtu
>> / PAGE_SIZE;
>>
>> DIV_ROUND_UP() can be used?
>>
>> ACK
>>
>>
>> Since this branch slightly adds up cost to the datapath, I wonder if
>> this sg_num can be saved and set only once (generally in virtnet_probe
>> time) in struct virtnet_info?
>>
>> Not sure how to do it and align it with align with new mtu during
>> .ndo_change_mtu()---as you mentioned in the following mail. Any idea?
>> ndo_change_mtu might be in vendor specific code and unmanageable. In
>> my case, the mtu can only be changed in the xml of the guest vm.
>>
>> Nope, for e.g. "ip link dev eth0 set mtu 1500" can be done from guest on
>> a virtio-net device with 9000 MTU (as defined in guest xml). Basically
>> guest user can set MTU to any valid value lower than the original
>> HOST_MTU. In the vendor defined .ndo_change_mtu() op, dev_validate_mtu()
>> should have validated the MTU value before coming down to it. And I
>> suspect you might want to do virtnet_close() and virtnet_open()
>> before/after changing the buffer size on the fly (the netif_running()
>> case), implementing .ndo_change_mtu() will be needed anyway.
>>
>> a guest VM driver changing mtu to smaller one is valid use case. However, current optimization suggested in the patch doesn't degrade any performance. Performing close() and open() sequence is good idea, that I would like to take up next after this patch as its going to be more than one patch to achieve it.
> Right, it could be done on top.
>
> But another note is that, it would still be better to support GUEST GSO feature:
>
> 1) can work for the case for path MTU
> 2) (migration)compatibility with software backends
>
>>
>> +     }
>> +
>> +     sg_init_table(rq->sg, sg_num + 2);
>>
>>        /* page in rq->sg[MAX_SKB_FRAGS + 1] is list tail */
>>
>> Comment doesn't match code.
>>
>> ACK
>>
>> -     for (i = MAX_SKB_FRAGS + 1; i > 1; --i) {
>> +     for (i = sg_num + 1; i > 1; --i) {
>>                first = get_a_page(rq, gfp);
>>                if (!first) {
>>                        if (list)
>> @@ -1350,7 +1360,7 @@ static int add_recvbuf_big(struct virtnet_info
>> *vi, struct receive_queue *rq,
>>
>>        /* chain first in list head */
>>        first->private = (unsigned long)list;
>> -     err = virtqueue_add_inbuf(rq->vq, rq->sg, MAX_SKB_FRAGS + 2,
>> +     err = virtqueue_add_inbuf(rq->vq, rq->sg, sg_num + 2,
>>                                  first, gfp);
>>        if (err < 0)
>>                give_pages(rq, first);
>> @@ -3571,8 +3581,10 @@ static int virtnet_probe(struct virtio_device
>> *vdev)
>>        if (virtio_has_feature(vdev, VIRTIO_NET_F_GUEST_TSO4) ||
>>            virtio_has_feature(vdev, VIRTIO_NET_F_GUEST_TSO6) ||
>>            virtio_has_feature(vdev, VIRTIO_NET_F_GUEST_ECN) ||
>> -         virtio_has_feature(vdev, VIRTIO_NET_F_GUEST_UFO))
>> +         virtio_has_feature(vdev, VIRTIO_NET_F_GUEST_UFO)) {
>>                vi->big_packets = true;
>> +             vi->gso_is_supported = true;
>>
>> Please do the same for virtnet_clear_guest_offloads(), and
>> correspondingly virtnet_restore_guest_offloads() as well. Not sure why
>> virtnet_clear_guest_offloads() or the caller doesn't unset big_packet on
>> successful return, seems like a bug to me.
> It is fine as long as
>
> 1) we don't implement ethtool API for changing guest offloads
Not sure if I missed something, but it looks the current 
virtnet_set_features() already supports toggling on/off GRO HW through 
commit a02e8964eaf9271a8a5fcc0c55bd13f933bafc56 (formerly misnamed as 
LRO). Sorry, I realized I had a typo in email: 
"virtnet_set_guest_offloads() or the caller doesn't unset big_packet ...".

> 2) big mode XDP is not enabled
Currently it is not. Not a single patch nor this patch, but the context 
for the eventual goal is to allow XDP on a MTU=9000 link when guest 
users intentionally lower down MTU to 1500.

Regards,
-Siwei
>
> So that code works only for XDP but we forbid big packets in the case
> of XDP right now.
>
> Thanks
>
>> ACK. The two calls virtnet_set_guest_offloads and
>> virtnet_set_guest_offloads is also called by virtnet_set_features. Do
>> you think if I can do this in virtnet_set_guest_offloads?
>>
>> I think that it should be fine, though you may want to deal with the XDP
>> path not to regress it.
>>
>> -Siwei
>>
>>
>>
>> Thanks,
>> -Siwei
>>
>> +     }
>>
>>        if (virtio_has_feature(vdev, VIRTIO_NET_F_MRG_RXBUF))
>>                vi->mergeable_rx_bufs = true;
>>
>>
>>

_______________________________________________
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization

^ permalink raw reply	[flat|nested] 102+ messages in thread

* Re: [virtio-dev] [PATCH] virtio-net: use mtu size as buffer length for big packets
@ 2022-08-09 18:38             ` Si-Wei Liu
  0 siblings, 0 replies; 102+ messages in thread
From: Si-Wei Liu @ 2022-08-09 18:38 UTC (permalink / raw)
  To: Jason Wang, Gavin Li
  Cc: mst, Hemminger, Stephen, davem, virtualization, Virtio-Dev,
	jesse.brandeburg, alexander.h.duyck, kubakici, sridhar.samudrala,
	loseweigh, Parav Pandit, gavi



On 8/9/2022 12:44 AM, Jason Wang wrote:
> On Tue, Aug 9, 2022 at 3:07 PM Gavin Li <gavinl@nvidia.com> wrote:
>>
>> On 8/9/2022 7:56 AM, Si-Wei Liu wrote:
>>
>> External email: Use caution opening links or attachments
>>
>>
>> On 8/8/2022 12:31 AM, Gavin Li wrote:
>>
>>
>> On 8/6/2022 6:11 AM, Si-Wei Liu wrote:
>>
>> External email: Use caution opening links or attachments
>>
>>
>> On 8/1/2022 9:45 PM, Gavin Li wrote:
>>
>> Currently add_recvbuf_big() allocates MAX_SKB_FRAGS segments for big
>> packets even when GUEST_* offloads are not present on the device.
>> However, if GSO is not supported,
>>
>> GUEST GSO (virtio term), or GRO HW (netdev core term) it should have
>> been be called.
>>
>> ACK
>>
>>
>>    it would be sufficient to allocate
>> segments to cover just up the MTU size and no further. Allocating the
>> maximum amount of segments results in a large waste of buffer space in
>> the queue, which limits the number of packets that can be buffered and
>> can result in reduced performance.
>>
>> Therefore, if GSO is not supported,
>>
>> Ditto.
>>
>> ACK
>>
>>
>> use the MTU to calculate the
>> optimal amount of segments required.
>>
>> Below is the iperf TCP test results over a Mellanox NIC, using vDPA for
>> 1 VQ, queue size 1024, before and after the change, with the iperf
>> server running over the virtio-net interface.
>>
>> MTU(Bytes)/Bandwidth (Gbit/s)
>>                Before   After
>>     1500        22.5     22.4
>>     9000        12.8     25.9
>>
>> Signed-off-by: Gavin Li <gavinl@nvidia.com>
>> Reviewed-by: Gavi Teitz <gavi@nvidia.com>
>> Reviewed-by: Parav Pandit <parav@nvidia.com>
>> ---
>>    drivers/net/virtio_net.c | 20 ++++++++++++++++----
>>    1 file changed, 16 insertions(+), 4 deletions(-)
>>
>> diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
>> index ec8e1b3108c3..d36918c1809d 100644
>> --- a/drivers/net/virtio_net.c
>> +++ b/drivers/net/virtio_net.c
>> @@ -222,6 +222,9 @@ struct virtnet_info {
>>        /* I like... big packets and I cannot lie! */
>>        bool big_packets;
>>
>> +     /* Indicates GSO support */
>> +     bool gso_is_supported;
>> +
>>        /* Host will merge rx buffers for big packets (shake it! shake
>> it!) */
>>        bool mergeable_rx_bufs;
>>
>> @@ -1312,14 +1315,21 @@ static int add_recvbuf_small(struct
>> virtnet_info *vi, struct receive_queue *rq,
>>    static int add_recvbuf_big(struct virtnet_info *vi, struct
>> receive_queue *rq,
>>                           gfp_t gfp)
>>    {
>> +     unsigned int sg_num = MAX_SKB_FRAGS;
>>        struct page *first, *list = NULL;
>>        char *p;
>>        int i, err, offset;
>>
>> -     sg_init_table(rq->sg, MAX_SKB_FRAGS + 2);
>> +     if (!vi->gso_is_supported) {
>> +             unsigned int mtu = vi->dev->mtu;
>> +
>> +             sg_num = (mtu % PAGE_SIZE) ? mtu / PAGE_SIZE + 1 : mtu
>> / PAGE_SIZE;
>>
>> DIV_ROUND_UP() can be used?
>>
>> ACK
>>
>>
>> Since this branch slightly adds up cost to the datapath, I wonder if
>> this sg_num can be saved and set only once (generally in virtnet_probe
>> time) in struct virtnet_info?
>>
>> Not sure how to do it and align it with align with new mtu during
>> .ndo_change_mtu()---as you mentioned in the following mail. Any idea?
>> ndo_change_mtu might be in vendor specific code and unmanageable. In
>> my case, the mtu can only be changed in the xml of the guest vm.
>>
>> Nope, for e.g. "ip link dev eth0 set mtu 1500" can be done from guest on
>> a virtio-net device with 9000 MTU (as defined in guest xml). Basically
>> guest user can set MTU to any valid value lower than the original
>> HOST_MTU. In the vendor defined .ndo_change_mtu() op, dev_validate_mtu()
>> should have validated the MTU value before coming down to it. And I
>> suspect you might want to do virtnet_close() and virtnet_open()
>> before/after changing the buffer size on the fly (the netif_running()
>> case), implementing .ndo_change_mtu() will be needed anyway.
>>
>> a guest VM driver changing mtu to smaller one is valid use case. However, current optimization suggested in the patch doesn't degrade any performance. Performing close() and open() sequence is good idea, that I would like to take up next after this patch as its going to be more than one patch to achieve it.
> Right, it could be done on top.
>
> But another note is that, it would still be better to support GUEST GSO feature:
>
> 1) can work for the case for path MTU
> 2) (migration)compatibility with software backends
>
>>
>> +     }
>> +
>> +     sg_init_table(rq->sg, sg_num + 2);
>>
>>        /* page in rq->sg[MAX_SKB_FRAGS + 1] is list tail */
>>
>> Comment doesn't match code.
>>
>> ACK
>>
>> -     for (i = MAX_SKB_FRAGS + 1; i > 1; --i) {
>> +     for (i = sg_num + 1; i > 1; --i) {
>>                first = get_a_page(rq, gfp);
>>                if (!first) {
>>                        if (list)
>> @@ -1350,7 +1360,7 @@ static int add_recvbuf_big(struct virtnet_info
>> *vi, struct receive_queue *rq,
>>
>>        /* chain first in list head */
>>        first->private = (unsigned long)list;
>> -     err = virtqueue_add_inbuf(rq->vq, rq->sg, MAX_SKB_FRAGS + 2,
>> +     err = virtqueue_add_inbuf(rq->vq, rq->sg, sg_num + 2,
>>                                  first, gfp);
>>        if (err < 0)
>>                give_pages(rq, first);
>> @@ -3571,8 +3581,10 @@ static int virtnet_probe(struct virtio_device
>> *vdev)
>>        if (virtio_has_feature(vdev, VIRTIO_NET_F_GUEST_TSO4) ||
>>            virtio_has_feature(vdev, VIRTIO_NET_F_GUEST_TSO6) ||
>>            virtio_has_feature(vdev, VIRTIO_NET_F_GUEST_ECN) ||
>> -         virtio_has_feature(vdev, VIRTIO_NET_F_GUEST_UFO))
>> +         virtio_has_feature(vdev, VIRTIO_NET_F_GUEST_UFO)) {
>>                vi->big_packets = true;
>> +             vi->gso_is_supported = true;
>>
>> Please do the same for virtnet_clear_guest_offloads(), and
>> correspondingly virtnet_restore_guest_offloads() as well. Not sure why
>> virtnet_clear_guest_offloads() or the caller doesn't unset big_packet on
>> successful return, seems like a bug to me.
> It is fine as long as
>
> 1) we don't implement ethtool API for changing guest offloads
Not sure if I missed something, but it looks the current 
virtnet_set_features() already supports toggling on/off GRO HW through 
commit a02e8964eaf9271a8a5fcc0c55bd13f933bafc56 (formerly misnamed as 
LRO). Sorry, I realized I had a typo in email: 
"virtnet_set_guest_offloads() or the caller doesn't unset big_packet ...".

> 2) big mode XDP is not enabled
Currently it is not. Not a single patch nor this patch, but the context 
for the eventual goal is to allow XDP on a MTU=9000 link when guest 
users intentionally lower down MTU to 1500.

Regards,
-Siwei
>
> So that code works only for XDP but we forbid big packets in the case
> of XDP right now.
>
> Thanks
>
>> ACK. The two calls virtnet_set_guest_offloads and
>> virtnet_set_guest_offloads is also called by virtnet_set_features. Do
>> you think if I can do this in virtnet_set_guest_offloads?
>>
>> I think that it should be fine, though you may want to deal with the XDP
>> path not to regress it.
>>
>> -Siwei
>>
>>
>>
>> Thanks,
>> -Siwei
>>
>> +     }
>>
>>        if (virtio_has_feature(vdev, VIRTIO_NET_F_MRG_RXBUF))
>>                vi->mergeable_rx_bufs = true;
>>
>>
>>


---------------------------------------------------------------------
To unsubscribe, e-mail: virtio-dev-unsubscribe@lists.oasis-open.org
For additional commands, e-mail: virtio-dev-help@lists.oasis-open.org


^ permalink raw reply	[flat|nested] 102+ messages in thread

* RE: [virtio-dev] [PATCH] virtio-net: use mtu size as buffer length for big packets
  2022-08-09 18:38             ` Si-Wei Liu
@ 2022-08-09 18:42               ` Parav Pandit
  -1 siblings, 0 replies; 102+ messages in thread
From: Parav Pandit via Virtualization @ 2022-08-09 18:42 UTC (permalink / raw)
  To: Si-Wei Liu, Jason Wang, Gavin Li
  Cc: alexander.h.duyck, Virtio-Dev, mst, kubakici, sridhar.samudrala,
	jesse.brandeburg, Gavi Teitz, virtualization, Hemminger, Stephen,
	loseweigh, davem

> From: Si-Wei Liu <si-wei.liu@oracle.com>
> Sent: Tuesday, August 9, 2022 2:39 PM

> Currently it is not. Not a single patch nor this patch, but the context for the
> eventual goal is to allow XDP on a MTU=9000 link when guest users
> intentionally lower down MTU to 1500.

Which application benefit by having asymmetry by lowering mtu to 1500 to send packets but want to receive 9K packets?
_______________________________________________
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization

^ permalink raw reply	[flat|nested] 102+ messages in thread

* RE: [virtio-dev] [PATCH] virtio-net: use mtu size as buffer length for big packets
@ 2022-08-09 18:42               ` Parav Pandit
  0 siblings, 0 replies; 102+ messages in thread
From: Parav Pandit @ 2022-08-09 18:42 UTC (permalink / raw)
  To: Si-Wei Liu, Jason Wang, Gavin Li
  Cc: mst, Hemminger, Stephen, davem, virtualization, Virtio-Dev,
	jesse.brandeburg, alexander.h.duyck, kubakici, sridhar.samudrala,
	loseweigh, Gavi Teitz

> From: Si-Wei Liu <si-wei.liu@oracle.com>
> Sent: Tuesday, August 9, 2022 2:39 PM

> Currently it is not. Not a single patch nor this patch, but the context for the
> eventual goal is to allow XDP on a MTU=9000 link when guest users
> intentionally lower down MTU to 1500.

Which application benefit by having asymmetry by lowering mtu to 1500 to send packets but want to receive 9K packets?

^ permalink raw reply	[flat|nested] 102+ messages in thread

* Re: [virtio-dev] [PATCH] virtio-net: use mtu size as buffer length for big packets
  2022-08-09 18:42               ` Parav Pandit
@ 2022-08-09 19:08                 ` Si-Wei Liu
  -1 siblings, 0 replies; 102+ messages in thread
From: Si-Wei Liu @ 2022-08-09 19:08 UTC (permalink / raw)
  To: Parav Pandit, Jason Wang, Gavin Li
  Cc: alexander.h.duyck, Virtio-Dev, mst, kubakici, sridhar.samudrala,
	jesse.brandeburg, Gavi Teitz, virtualization, Hemminger, Stephen,
	loseweigh, davem



On 8/9/2022 11:42 AM, Parav Pandit wrote:
>> From: Si-Wei Liu <si-wei.liu@oracle.com>
>> Sent: Tuesday, August 9, 2022 2:39 PM
>> Currently it is not. Not a single patch nor this patch, but the context for the
>> eventual goal is to allow XDP on a MTU=9000 link when guest users
>> intentionally lower down MTU to 1500.
> Which application benefit by having asymmetry by lowering mtu to 1500 to send packets but want to receive 9K packets?
I think virtio-net driver doesn't differentiate MTU and MRU, in which 
case the receive buffer will be reduced to fit the 1500B payload size 
when mtu is lowered down to 1500 from 9000. What I actually tried to say 
is that as our current use case (software virtio) supports XDP 
applications with 1500 guest mtu and mergeable buffer enabled on a 9000 
MTU link, technically it's a legitimate use case regardless of mergeable 
buffer capability. Otherwise it's considered to be a usability 
regression in driver software side when migrating (not live migrating) 
existing user to vdpa, due to lack of hardware implementation of certain 
relevant feature.

Thanks,
-Siwei
_______________________________________________
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization

^ permalink raw reply	[flat|nested] 102+ messages in thread

* Re: [virtio-dev] [PATCH] virtio-net: use mtu size as buffer length for big packets
@ 2022-08-09 19:08                 ` Si-Wei Liu
  0 siblings, 0 replies; 102+ messages in thread
From: Si-Wei Liu @ 2022-08-09 19:08 UTC (permalink / raw)
  To: Parav Pandit, Jason Wang, Gavin Li
  Cc: mst, Hemminger, Stephen, davem, virtualization, Virtio-Dev,
	jesse.brandeburg, alexander.h.duyck, kubakici, sridhar.samudrala,
	loseweigh, Gavi Teitz



On 8/9/2022 11:42 AM, Parav Pandit wrote:
>> From: Si-Wei Liu <si-wei.liu@oracle.com>
>> Sent: Tuesday, August 9, 2022 2:39 PM
>> Currently it is not. Not a single patch nor this patch, but the context for the
>> eventual goal is to allow XDP on a MTU=9000 link when guest users
>> intentionally lower down MTU to 1500.
> Which application benefit by having asymmetry by lowering mtu to 1500 to send packets but want to receive 9K packets?
I think virtio-net driver doesn't differentiate MTU and MRU, in which 
case the receive buffer will be reduced to fit the 1500B payload size 
when mtu is lowered down to 1500 from 9000. What I actually tried to say 
is that as our current use case (software virtio) supports XDP 
applications with 1500 guest mtu and mergeable buffer enabled on a 9000 
MTU link, technically it's a legitimate use case regardless of mergeable 
buffer capability. Otherwise it's considered to be a usability 
regression in driver software side when migrating (not live migrating) 
existing user to vdpa, due to lack of hardware implementation of certain 
relevant feature.

Thanks,
-Siwei

---------------------------------------------------------------------
To unsubscribe, e-mail: virtio-dev-unsubscribe@lists.oasis-open.org
For additional commands, e-mail: virtio-dev-help@lists.oasis-open.org


^ permalink raw reply	[flat|nested] 102+ messages in thread

* RE: [virtio-dev] [PATCH] virtio-net: use mtu size as buffer length for big packets
  2022-08-09 19:08                 ` Si-Wei Liu
@ 2022-08-09 19:18                   ` Parav Pandit
  -1 siblings, 0 replies; 102+ messages in thread
From: Parav Pandit via Virtualization @ 2022-08-09 19:18 UTC (permalink / raw)
  To: Si-Wei Liu, Jason Wang, Gavin Li
  Cc: alexander.h.duyck, Virtio-Dev, mst, kubakici, sridhar.samudrala,
	jesse.brandeburg, Gavi Teitz, virtualization, Hemminger, Stephen,
	loseweigh, davem

> From: Si-Wei Liu <si-wei.liu@oracle.com>
> Sent: Tuesday, August 9, 2022 3:09 PM

> >> From: Si-Wei Liu <si-wei.liu@oracle.com>
> >> Sent: Tuesday, August 9, 2022 2:39 PM Currently it is not. Not a
> >> single patch nor this patch, but the context for the eventual goal is
> >> to allow XDP on a MTU=9000 link when guest users intentionally lower
> >> down MTU to 1500.
> > Which application benefit by having asymmetry by lowering mtu to 1500
> to send packets but want to receive 9K packets?

Below details doesn’t answer the question of asymmetry. :)

> I think virtio-net driver doesn't differentiate MTU and MRU, in which case
> the receive buffer will be reduced to fit the 1500B payload size when mtu is
> lowered down to 1500 from 9000. 
How? Driver reduced the mXu to 1500, say it is improved to post buffers of 1500 bytes.

Device doesn't know about it because mtu in config space is RO field.
Device keep dropping 9K packets because buffers posted are 1500 bytes.
This is because device follows the spec " The device MUST NOT pass received packets that exceed mtu".

So, I am lost what virtio net device user application is trying to achieve by sending smaller packets and dropping all receive packets.
(it doesn’t have any relation to mergeable or otherwise).
_______________________________________________
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization

^ permalink raw reply	[flat|nested] 102+ messages in thread

* RE: [virtio-dev] [PATCH] virtio-net: use mtu size as buffer length for big packets
@ 2022-08-09 19:18                   ` Parav Pandit
  0 siblings, 0 replies; 102+ messages in thread
From: Parav Pandit @ 2022-08-09 19:18 UTC (permalink / raw)
  To: Si-Wei Liu, Jason Wang, Gavin Li
  Cc: mst, Hemminger, Stephen, davem, virtualization, Virtio-Dev,
	jesse.brandeburg, alexander.h.duyck, kubakici, sridhar.samudrala,
	loseweigh, Gavi Teitz

> From: Si-Wei Liu <si-wei.liu@oracle.com>
> Sent: Tuesday, August 9, 2022 3:09 PM

> >> From: Si-Wei Liu <si-wei.liu@oracle.com>
> >> Sent: Tuesday, August 9, 2022 2:39 PM Currently it is not. Not a
> >> single patch nor this patch, but the context for the eventual goal is
> >> to allow XDP on a MTU=9000 link when guest users intentionally lower
> >> down MTU to 1500.
> > Which application benefit by having asymmetry by lowering mtu to 1500
> to send packets but want to receive 9K packets?

Below details doesn’t answer the question of asymmetry. :)

> I think virtio-net driver doesn't differentiate MTU and MRU, in which case
> the receive buffer will be reduced to fit the 1500B payload size when mtu is
> lowered down to 1500 from 9000. 
How? Driver reduced the mXu to 1500, say it is improved to post buffers of 1500 bytes.

Device doesn't know about it because mtu in config space is RO field.
Device keep dropping 9K packets because buffers posted are 1500 bytes.
This is because device follows the spec " The device MUST NOT pass received packets that exceed mtu".

So, I am lost what virtio net device user application is trying to achieve by sending smaller packets and dropping all receive packets.
(it doesn’t have any relation to mergeable or otherwise).

^ permalink raw reply	[flat|nested] 102+ messages in thread

* Re: [virtio-dev] [PATCH] virtio-net: use mtu size as buffer length for big packets
  2022-08-09 19:18                   ` Parav Pandit
@ 2022-08-09 20:32                     ` Si-Wei Liu
  -1 siblings, 0 replies; 102+ messages in thread
From: Si-Wei Liu @ 2022-08-09 20:32 UTC (permalink / raw)
  To: Parav Pandit, Jason Wang, Gavin Li
  Cc: alexander.h.duyck, Virtio-Dev, mst, kubakici, sridhar.samudrala,
	jesse.brandeburg, Gavi Teitz, virtualization, Hemminger, Stephen,
	loseweigh, davem



On 8/9/2022 12:18 PM, Parav Pandit wrote:
>> From: Si-Wei Liu <si-wei.liu@oracle.com>
>> Sent: Tuesday, August 9, 2022 3:09 PM
>>>> From: Si-Wei Liu <si-wei.liu@oracle.com>
>>>> Sent: Tuesday, August 9, 2022 2:39 PM Currently it is not. Not a
>>>> single patch nor this patch, but the context for the eventual goal is
>>>> to allow XDP on a MTU=9000 link when guest users intentionally lower
>>>> down MTU to 1500.
>>> Which application benefit by having asymmetry by lowering mtu to 1500
>> to send packets but want to receive 9K packets?
> Below details doesn’t answer the question of asymmetry. :)
>
>> I think virtio-net driver doesn't differentiate MTU and MRU, in which case
>> the receive buffer will be reduced to fit the 1500B payload size when mtu is
>> lowered down to 1500 from 9000.
> How? Driver reduced the mXu to 1500, say it is improved to post buffers of 1500 bytes.
For big_packet path, yes, we need improvement; for mergeable, it's 
adaptable to any incoming packet size so 1500 is what it is today.
>
> Device doesn't know about it because mtu in config space is RO field.
> Device keep dropping 9K packets because buffers posted are 1500 bytes.
> This is because device follows the spec " The device MUST NOT pass received packets that exceed mtu".
Right, that's what it happens today on device side (i.e. vhost-net, btw 
mlx5 vdpa device seems to have a bug not pro-actively dropping packets 
that exceed the MTU size, causing guest panic in small packet path).
>
> So, I am lost what virtio net device user application is trying to achieve by sending smaller packets and dropping all receive packets.
> (it doesn’t have any relation to mergeable or otherwise).

Usually, the use case I'm aware of would set the peer's MTU to 1500 
(e.g. on a virtual network appliance), or it would rely on path mtu 
discovery to avoid packet drop across links.

-Siwei


_______________________________________________
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization

^ permalink raw reply	[flat|nested] 102+ messages in thread

* Re: [virtio-dev] [PATCH] virtio-net: use mtu size as buffer length for big packets
@ 2022-08-09 20:32                     ` Si-Wei Liu
  0 siblings, 0 replies; 102+ messages in thread
From: Si-Wei Liu @ 2022-08-09 20:32 UTC (permalink / raw)
  To: Parav Pandit, Jason Wang, Gavin Li
  Cc: mst, Hemminger, Stephen, davem, virtualization, Virtio-Dev,
	jesse.brandeburg, alexander.h.duyck, kubakici, sridhar.samudrala,
	loseweigh, Gavi Teitz



On 8/9/2022 12:18 PM, Parav Pandit wrote:
>> From: Si-Wei Liu <si-wei.liu@oracle.com>
>> Sent: Tuesday, August 9, 2022 3:09 PM
>>>> From: Si-Wei Liu <si-wei.liu@oracle.com>
>>>> Sent: Tuesday, August 9, 2022 2:39 PM Currently it is not. Not a
>>>> single patch nor this patch, but the context for the eventual goal is
>>>> to allow XDP on a MTU=9000 link when guest users intentionally lower
>>>> down MTU to 1500.
>>> Which application benefit by having asymmetry by lowering mtu to 1500
>> to send packets but want to receive 9K packets?
> Below details doesn’t answer the question of asymmetry. :)
>
>> I think virtio-net driver doesn't differentiate MTU and MRU, in which case
>> the receive buffer will be reduced to fit the 1500B payload size when mtu is
>> lowered down to 1500 from 9000.
> How? Driver reduced the mXu to 1500, say it is improved to post buffers of 1500 bytes.
For big_packet path, yes, we need improvement; for mergeable, it's 
adaptable to any incoming packet size so 1500 is what it is today.
>
> Device doesn't know about it because mtu in config space is RO field.
> Device keep dropping 9K packets because buffers posted are 1500 bytes.
> This is because device follows the spec " The device MUST NOT pass received packets that exceed mtu".
Right, that's what it happens today on device side (i.e. vhost-net, btw 
mlx5 vdpa device seems to have a bug not pro-actively dropping packets 
that exceed the MTU size, causing guest panic in small packet path).
>
> So, I am lost what virtio net device user application is trying to achieve by sending smaller packets and dropping all receive packets.
> (it doesn’t have any relation to mergeable or otherwise).

Usually, the use case I'm aware of would set the peer's MTU to 1500 
(e.g. on a virtual network appliance), or it would rely on path mtu 
discovery to avoid packet drop across links.

-Siwei



---------------------------------------------------------------------
To unsubscribe, e-mail: virtio-dev-unsubscribe@lists.oasis-open.org
For additional commands, e-mail: virtio-dev-help@lists.oasis-open.org


^ permalink raw reply	[flat|nested] 102+ messages in thread

* RE: [virtio-dev] [PATCH] virtio-net: use mtu size as buffer length for big packets
  2022-08-09 20:32                     ` Si-Wei Liu
@ 2022-08-09 21:13                       ` Parav Pandit
  -1 siblings, 0 replies; 102+ messages in thread
From: Parav Pandit via Virtualization @ 2022-08-09 21:13 UTC (permalink / raw)
  To: Si-Wei Liu, Jason Wang, Gavin Li
  Cc: alexander.h.duyck, Virtio-Dev, mst, kubakici, sridhar.samudrala,
	jesse.brandeburg, Gavi Teitz, virtualization, Hemminger, Stephen,
	loseweigh, davem

> From: Si-Wei Liu <si-wei.liu@oracle.com>
> Sent: Tuesday, August 9, 2022 4:33 PM
> 
> On 8/9/2022 12:18 PM, Parav Pandit wrote:
> >> From: Si-Wei Liu <si-wei.liu@oracle.com>
> >> Sent: Tuesday, August 9, 2022 3:09 PM
> >>>> From: Si-Wei Liu <si-wei.liu@oracle.com>
> >>>> Sent: Tuesday, August 9, 2022 2:39 PM Currently it is not. Not a
> >>>> single patch nor this patch, but the context for the eventual goal
> >>>> is to allow XDP on a MTU=9000 link when guest users intentionally
> >>>> lower down MTU to 1500.
> >>> Which application benefit by having asymmetry by lowering mtu to
> >>> 1500
> >> to send packets but want to receive 9K packets?
> > Below details doesn’t answer the question of asymmetry. :)
> >
> >> I think virtio-net driver doesn't differentiate MTU and MRU, in which
> >> case the receive buffer will be reduced to fit the 1500B payload size
> >> when mtu is lowered down to 1500 from 9000.
> > How? Driver reduced the mXu to 1500, say it is improved to post buffers of
> 1500 bytes.
> For big_packet path, yes, we need improvement; for mergeable, it's
> adaptable to any incoming packet size so 1500 is what it is today.
> >
> > Device doesn't know about it because mtu in config space is RO field.
> > Device keep dropping 9K packets because buffers posted are 1500 bytes.
> > This is because device follows the spec " The device MUST NOT pass
> received packets that exceed mtu".
> Right, that's what it happens today on device side (i.e. vhost-net, btw
> mlx5 vdpa device seems to have a bug not pro-actively dropping packets that
> exceed the MTU size, causing guest panic in small packet path).
> >
> > So, I am lost what virtio net device user application is trying to achieve by
> sending smaller packets and dropping all receive packets.
> > (it doesn’t have any relation to mergeable or otherwise).
> 
> Usually, the use case I'm aware of would set the peer's MTU to 1500 (e.g. on
> a virtual network appliance), or it would rely on path mtu discovery to avoid
> packet drop across links.
Ok. Somehow the application knows the mtu to set on (all) peer(s) and hope for the best.
Understood.
_______________________________________________
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization

^ permalink raw reply	[flat|nested] 102+ messages in thread

* RE: [virtio-dev] [PATCH] virtio-net: use mtu size as buffer length for big packets
@ 2022-08-09 21:13                       ` Parav Pandit
  0 siblings, 0 replies; 102+ messages in thread
From: Parav Pandit @ 2022-08-09 21:13 UTC (permalink / raw)
  To: Si-Wei Liu, Jason Wang, Gavin Li
  Cc: mst, Hemminger, Stephen, davem, virtualization, Virtio-Dev,
	jesse.brandeburg, alexander.h.duyck, kubakici, sridhar.samudrala,
	loseweigh, Gavi Teitz

> From: Si-Wei Liu <si-wei.liu@oracle.com>
> Sent: Tuesday, August 9, 2022 4:33 PM
> 
> On 8/9/2022 12:18 PM, Parav Pandit wrote:
> >> From: Si-Wei Liu <si-wei.liu@oracle.com>
> >> Sent: Tuesday, August 9, 2022 3:09 PM
> >>>> From: Si-Wei Liu <si-wei.liu@oracle.com>
> >>>> Sent: Tuesday, August 9, 2022 2:39 PM Currently it is not. Not a
> >>>> single patch nor this patch, but the context for the eventual goal
> >>>> is to allow XDP on a MTU=9000 link when guest users intentionally
> >>>> lower down MTU to 1500.
> >>> Which application benefit by having asymmetry by lowering mtu to
> >>> 1500
> >> to send packets but want to receive 9K packets?
> > Below details doesn’t answer the question of asymmetry. :)
> >
> >> I think virtio-net driver doesn't differentiate MTU and MRU, in which
> >> case the receive buffer will be reduced to fit the 1500B payload size
> >> when mtu is lowered down to 1500 from 9000.
> > How? Driver reduced the mXu to 1500, say it is improved to post buffers of
> 1500 bytes.
> For big_packet path, yes, we need improvement; for mergeable, it's
> adaptable to any incoming packet size so 1500 is what it is today.
> >
> > Device doesn't know about it because mtu in config space is RO field.
> > Device keep dropping 9K packets because buffers posted are 1500 bytes.
> > This is because device follows the spec " The device MUST NOT pass
> received packets that exceed mtu".
> Right, that's what it happens today on device side (i.e. vhost-net, btw
> mlx5 vdpa device seems to have a bug not pro-actively dropping packets that
> exceed the MTU size, causing guest panic in small packet path).
> >
> > So, I am lost what virtio net device user application is trying to achieve by
> sending smaller packets and dropping all receive packets.
> > (it doesn’t have any relation to mergeable or otherwise).
> 
> Usually, the use case I'm aware of would set the peer's MTU to 1500 (e.g. on
> a virtual network appliance), or it would rely on path mtu discovery to avoid
> packet drop across links.
Ok. Somehow the application knows the mtu to set on (all) peer(s) and hope for the best.
Understood.

^ permalink raw reply	[flat|nested] 102+ messages in thread

* Re: [virtio-dev] [PATCH] virtio-net: use mtu size as buffer length for big packets
  2022-08-09 21:13                       ` Parav Pandit
@ 2022-08-09 21:32                         ` Michael S. Tsirkin
  -1 siblings, 0 replies; 102+ messages in thread
From: Michael S. Tsirkin @ 2022-08-09 21:32 UTC (permalink / raw)
  To: Parav Pandit
  Cc: alexander.h.duyck, Virtio-Dev, kubakici, sridhar.samudrala,
	jesse.brandeburg, Gavi Teitz, virtualization, Hemminger, Stephen,
	loseweigh, davem, Gavin Li

On Tue, Aug 09, 2022 at 09:13:42PM +0000, Parav Pandit wrote:
> > From: Si-Wei Liu <si-wei.liu@oracle.com>
> > Sent: Tuesday, August 9, 2022 4:33 PM
> > 
> > On 8/9/2022 12:18 PM, Parav Pandit wrote:
> > >> From: Si-Wei Liu <si-wei.liu@oracle.com>
> > >> Sent: Tuesday, August 9, 2022 3:09 PM
> > >>>> From: Si-Wei Liu <si-wei.liu@oracle.com>
> > >>>> Sent: Tuesday, August 9, 2022 2:39 PM Currently it is not. Not a
> > >>>> single patch nor this patch, but the context for the eventual goal
> > >>>> is to allow XDP on a MTU=9000 link when guest users intentionally
> > >>>> lower down MTU to 1500.
> > >>> Which application benefit by having asymmetry by lowering mtu to
> > >>> 1500
> > >> to send packets but want to receive 9K packets?
> > > Below details doesn’t answer the question of asymmetry. :)
> > >
> > >> I think virtio-net driver doesn't differentiate MTU and MRU, in which
> > >> case the receive buffer will be reduced to fit the 1500B payload size
> > >> when mtu is lowered down to 1500 from 9000.
> > > How? Driver reduced the mXu to 1500, say it is improved to post buffers of
> > 1500 bytes.
> > For big_packet path, yes, we need improvement; for mergeable, it's
> > adaptable to any incoming packet size so 1500 is what it is today.
> > >
> > > Device doesn't know about it because mtu in config space is RO field.
> > > Device keep dropping 9K packets because buffers posted are 1500 bytes.
> > > This is because device follows the spec " The device MUST NOT pass
> > received packets that exceed mtu".
> > Right, that's what it happens today on device side (i.e. vhost-net, btw
> > mlx5 vdpa device seems to have a bug not pro-actively dropping packets that
> > exceed the MTU size, causing guest panic in small packet path).
> > >
> > > So, I am lost what virtio net device user application is trying to achieve by
> > sending smaller packets and dropping all receive packets.
> > > (it doesn’t have any relation to mergeable or otherwise).
> > 
> > Usually, the use case I'm aware of would set the peer's MTU to 1500 (e.g. on
> > a virtual network appliance), or it would rely on path mtu discovery to avoid
> > packet drop across links.
> Ok. Somehow the application knows the mtu to set on (all) peer(s) and hope for the best.
> Understood.

That's generally what one has to do with mtu, yes - it has to be set
consistently across the LAN. While e.g. pMTU might help work around some
misconfigured LANs with a mix of different MTUs it was never designed
for that.

-- 
MST

_______________________________________________
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization

^ permalink raw reply	[flat|nested] 102+ messages in thread

* Re: [virtio-dev] [PATCH] virtio-net: use mtu size as buffer length for big packets
@ 2022-08-09 21:32                         ` Michael S. Tsirkin
  0 siblings, 0 replies; 102+ messages in thread
From: Michael S. Tsirkin @ 2022-08-09 21:32 UTC (permalink / raw)
  To: Parav Pandit
  Cc: Si-Wei Liu, Jason Wang, Gavin Li, Hemminger, Stephen, davem,
	virtualization, Virtio-Dev, jesse.brandeburg, alexander.h.duyck,
	kubakici, sridhar.samudrala, loseweigh, Gavi Teitz

On Tue, Aug 09, 2022 at 09:13:42PM +0000, Parav Pandit wrote:
> > From: Si-Wei Liu <si-wei.liu@oracle.com>
> > Sent: Tuesday, August 9, 2022 4:33 PM
> > 
> > On 8/9/2022 12:18 PM, Parav Pandit wrote:
> > >> From: Si-Wei Liu <si-wei.liu@oracle.com>
> > >> Sent: Tuesday, August 9, 2022 3:09 PM
> > >>>> From: Si-Wei Liu <si-wei.liu@oracle.com>
> > >>>> Sent: Tuesday, August 9, 2022 2:39 PM Currently it is not. Not a
> > >>>> single patch nor this patch, but the context for the eventual goal
> > >>>> is to allow XDP on a MTU=9000 link when guest users intentionally
> > >>>> lower down MTU to 1500.
> > >>> Which application benefit by having asymmetry by lowering mtu to
> > >>> 1500
> > >> to send packets but want to receive 9K packets?
> > > Below details doesn’t answer the question of asymmetry. :)
> > >
> > >> I think virtio-net driver doesn't differentiate MTU and MRU, in which
> > >> case the receive buffer will be reduced to fit the 1500B payload size
> > >> when mtu is lowered down to 1500 from 9000.
> > > How? Driver reduced the mXu to 1500, say it is improved to post buffers of
> > 1500 bytes.
> > For big_packet path, yes, we need improvement; for mergeable, it's
> > adaptable to any incoming packet size so 1500 is what it is today.
> > >
> > > Device doesn't know about it because mtu in config space is RO field.
> > > Device keep dropping 9K packets because buffers posted are 1500 bytes.
> > > This is because device follows the spec " The device MUST NOT pass
> > received packets that exceed mtu".
> > Right, that's what it happens today on device side (i.e. vhost-net, btw
> > mlx5 vdpa device seems to have a bug not pro-actively dropping packets that
> > exceed the MTU size, causing guest panic in small packet path).
> > >
> > > So, I am lost what virtio net device user application is trying to achieve by
> > sending smaller packets and dropping all receive packets.
> > > (it doesn’t have any relation to mergeable or otherwise).
> > 
> > Usually, the use case I'm aware of would set the peer's MTU to 1500 (e.g. on
> > a virtual network appliance), or it would rely on path mtu discovery to avoid
> > packet drop across links.
> Ok. Somehow the application knows the mtu to set on (all) peer(s) and hope for the best.
> Understood.

That's generally what one has to do with mtu, yes - it has to be set
consistently across the LAN. While e.g. pMTU might help work around some
misconfigured LANs with a mix of different MTUs it was never designed
for that.

-- 
MST


---------------------------------------------------------------------
To unsubscribe, e-mail: virtio-dev-unsubscribe@lists.oasis-open.org
For additional commands, e-mail: virtio-dev-help@lists.oasis-open.org


^ permalink raw reply	[flat|nested] 102+ messages in thread

* Re: [virtio-dev] [PATCH] virtio-net: use mtu size as buffer length for big packets
  2022-08-09 18:38             ` Si-Wei Liu
@ 2022-08-09 21:34               ` Michael S. Tsirkin
  -1 siblings, 0 replies; 102+ messages in thread
From: Michael S. Tsirkin @ 2022-08-09 21:34 UTC (permalink / raw)
  To: Si-Wei Liu
  Cc: alexander.h.duyck, Virtio-Dev, kubakici, sridhar.samudrala,
	jesse.brandeburg, gavi, virtualization, Hemminger, Stephen,
	loseweigh, davem, Gavin Li

On Tue, Aug 09, 2022 at 11:38:52AM -0700, Si-Wei Liu wrote:
> 
> 
> On 8/9/2022 12:44 AM, Jason Wang wrote:
> > On Tue, Aug 9, 2022 at 3:07 PM Gavin Li <gavinl@nvidia.com> wrote:
> > > 
> > > On 8/9/2022 7:56 AM, Si-Wei Liu wrote:
> > > 
> > > External email: Use caution opening links or attachments
> > > 
> > > 
> > > On 8/8/2022 12:31 AM, Gavin Li wrote:
> > > 
> > > 
> > > On 8/6/2022 6:11 AM, Si-Wei Liu wrote:
> > > 
> > > External email: Use caution opening links or attachments
> > > 
> > > 
> > > On 8/1/2022 9:45 PM, Gavin Li wrote:
> > > 
> > > Currently add_recvbuf_big() allocates MAX_SKB_FRAGS segments for big
> > > packets even when GUEST_* offloads are not present on the device.
> > > However, if GSO is not supported,
> > > 
> > > GUEST GSO (virtio term), or GRO HW (netdev core term) it should have
> > > been be called.
> > > 
> > > ACK
> > > 
> > > 
> > >    it would be sufficient to allocate
> > > segments to cover just up the MTU size and no further. Allocating the
> > > maximum amount of segments results in a large waste of buffer space in
> > > the queue, which limits the number of packets that can be buffered and
> > > can result in reduced performance.
> > > 
> > > Therefore, if GSO is not supported,
> > > 
> > > Ditto.
> > > 
> > > ACK
> > > 
> > > 
> > > use the MTU to calculate the
> > > optimal amount of segments required.
> > > 
> > > Below is the iperf TCP test results over a Mellanox NIC, using vDPA for
> > > 1 VQ, queue size 1024, before and after the change, with the iperf
> > > server running over the virtio-net interface.
> > > 
> > > MTU(Bytes)/Bandwidth (Gbit/s)
> > >                Before   After
> > >     1500        22.5     22.4
> > >     9000        12.8     25.9
> > > 
> > > Signed-off-by: Gavin Li <gavinl@nvidia.com>
> > > Reviewed-by: Gavi Teitz <gavi@nvidia.com>
> > > Reviewed-by: Parav Pandit <parav@nvidia.com>
> > > ---
> > >    drivers/net/virtio_net.c | 20 ++++++++++++++++----
> > >    1 file changed, 16 insertions(+), 4 deletions(-)
> > > 
> > > diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
> > > index ec8e1b3108c3..d36918c1809d 100644
> > > --- a/drivers/net/virtio_net.c
> > > +++ b/drivers/net/virtio_net.c
> > > @@ -222,6 +222,9 @@ struct virtnet_info {
> > >        /* I like... big packets and I cannot lie! */
> > >        bool big_packets;
> > > 
> > > +     /* Indicates GSO support */
> > > +     bool gso_is_supported;
> > > +
> > >        /* Host will merge rx buffers for big packets (shake it! shake
> > > it!) */
> > >        bool mergeable_rx_bufs;
> > > 
> > > @@ -1312,14 +1315,21 @@ static int add_recvbuf_small(struct
> > > virtnet_info *vi, struct receive_queue *rq,
> > >    static int add_recvbuf_big(struct virtnet_info *vi, struct
> > > receive_queue *rq,
> > >                           gfp_t gfp)
> > >    {
> > > +     unsigned int sg_num = MAX_SKB_FRAGS;
> > >        struct page *first, *list = NULL;
> > >        char *p;
> > >        int i, err, offset;
> > > 
> > > -     sg_init_table(rq->sg, MAX_SKB_FRAGS + 2);
> > > +     if (!vi->gso_is_supported) {
> > > +             unsigned int mtu = vi->dev->mtu;
> > > +
> > > +             sg_num = (mtu % PAGE_SIZE) ? mtu / PAGE_SIZE + 1 : mtu
> > > / PAGE_SIZE;
> > > 
> > > DIV_ROUND_UP() can be used?
> > > 
> > > ACK
> > > 
> > > 
> > > Since this branch slightly adds up cost to the datapath, I wonder if
> > > this sg_num can be saved and set only once (generally in virtnet_probe
> > > time) in struct virtnet_info?
> > > 
> > > Not sure how to do it and align it with align with new mtu during
> > > .ndo_change_mtu()---as you mentioned in the following mail. Any idea?
> > > ndo_change_mtu might be in vendor specific code and unmanageable. In
> > > my case, the mtu can only be changed in the xml of the guest vm.
> > > 
> > > Nope, for e.g. "ip link dev eth0 set mtu 1500" can be done from guest on
> > > a virtio-net device with 9000 MTU (as defined in guest xml). Basically
> > > guest user can set MTU to any valid value lower than the original
> > > HOST_MTU. In the vendor defined .ndo_change_mtu() op, dev_validate_mtu()
> > > should have validated the MTU value before coming down to it. And I
> > > suspect you might want to do virtnet_close() and virtnet_open()
> > > before/after changing the buffer size on the fly (the netif_running()
> > > case), implementing .ndo_change_mtu() will be needed anyway.
> > > 
> > > a guest VM driver changing mtu to smaller one is valid use case. However, current optimization suggested in the patch doesn't degrade any performance. Performing close() and open() sequence is good idea, that I would like to take up next after this patch as its going to be more than one patch to achieve it.
> > Right, it could be done on top.
> > 
> > But another note is that, it would still be better to support GUEST GSO feature:
> > 
> > 1) can work for the case for path MTU
> > 2) (migration)compatibility with software backends
> > 
> > > 
> > > +     }
> > > +
> > > +     sg_init_table(rq->sg, sg_num + 2);
> > > 
> > >        /* page in rq->sg[MAX_SKB_FRAGS + 1] is list tail */
> > > 
> > > Comment doesn't match code.
> > > 
> > > ACK
> > > 
> > > -     for (i = MAX_SKB_FRAGS + 1; i > 1; --i) {
> > > +     for (i = sg_num + 1; i > 1; --i) {
> > >                first = get_a_page(rq, gfp);
> > >                if (!first) {
> > >                        if (list)
> > > @@ -1350,7 +1360,7 @@ static int add_recvbuf_big(struct virtnet_info
> > > *vi, struct receive_queue *rq,
> > > 
> > >        /* chain first in list head */
> > >        first->private = (unsigned long)list;
> > > -     err = virtqueue_add_inbuf(rq->vq, rq->sg, MAX_SKB_FRAGS + 2,
> > > +     err = virtqueue_add_inbuf(rq->vq, rq->sg, sg_num + 2,
> > >                                  first, gfp);
> > >        if (err < 0)
> > >                give_pages(rq, first);
> > > @@ -3571,8 +3581,10 @@ static int virtnet_probe(struct virtio_device
> > > *vdev)
> > >        if (virtio_has_feature(vdev, VIRTIO_NET_F_GUEST_TSO4) ||
> > >            virtio_has_feature(vdev, VIRTIO_NET_F_GUEST_TSO6) ||
> > >            virtio_has_feature(vdev, VIRTIO_NET_F_GUEST_ECN) ||
> > > -         virtio_has_feature(vdev, VIRTIO_NET_F_GUEST_UFO))
> > > +         virtio_has_feature(vdev, VIRTIO_NET_F_GUEST_UFO)) {
> > >                vi->big_packets = true;
> > > +             vi->gso_is_supported = true;
> > > 
> > > Please do the same for virtnet_clear_guest_offloads(), and
> > > correspondingly virtnet_restore_guest_offloads() as well. Not sure why
> > > virtnet_clear_guest_offloads() or the caller doesn't unset big_packet on
> > > successful return, seems like a bug to me.
> > It is fine as long as
> > 
> > 1) we don't implement ethtool API for changing guest offloads
> Not sure if I missed something, but it looks the current
> virtnet_set_features() already supports toggling on/off GRO HW through
> commit a02e8964eaf9271a8a5fcc0c55bd13f933bafc56 (formerly misnamed as LRO).
> Sorry, I realized I had a typo in email: "virtnet_set_guest_offloads() or
> the caller doesn't unset big_packet ...".

"we" here is the device, not the driver.

> > 2) big mode XDP is not enabled
> Currently it is not. Not a single patch nor this patch, but the context for
> the eventual goal is to allow XDP on a MTU=9000 link when guest users
> intentionally lower down MTU to 1500.
> 
> Regards,
> -Siwei
> > 
> > So that code works only for XDP but we forbid big packets in the case
> > of XDP right now.
> > 
> > Thanks
> > 
> > > ACK. The two calls virtnet_set_guest_offloads and
> > > virtnet_set_guest_offloads is also called by virtnet_set_features. Do
> > > you think if I can do this in virtnet_set_guest_offloads?
> > > 
> > > I think that it should be fine, though you may want to deal with the XDP
> > > path not to regress it.
> > > 
> > > -Siwei
> > > 
> > > 
> > > 
> > > Thanks,
> > > -Siwei
> > > 
> > > +     }
> > > 
> > >        if (virtio_has_feature(vdev, VIRTIO_NET_F_MRG_RXBUF))
> > >                vi->mergeable_rx_bufs = true;
> > > 
> > > 
> > > 

_______________________________________________
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization

^ permalink raw reply	[flat|nested] 102+ messages in thread

* Re: [virtio-dev] [PATCH] virtio-net: use mtu size as buffer length for big packets
@ 2022-08-09 21:34               ` Michael S. Tsirkin
  0 siblings, 0 replies; 102+ messages in thread
From: Michael S. Tsirkin @ 2022-08-09 21:34 UTC (permalink / raw)
  To: Si-Wei Liu
  Cc: Jason Wang, Gavin Li, Hemminger, Stephen, davem, virtualization,
	Virtio-Dev, jesse.brandeburg, alexander.h.duyck, kubakici,
	sridhar.samudrala, loseweigh, Parav Pandit, gavi

On Tue, Aug 09, 2022 at 11:38:52AM -0700, Si-Wei Liu wrote:
> 
> 
> On 8/9/2022 12:44 AM, Jason Wang wrote:
> > On Tue, Aug 9, 2022 at 3:07 PM Gavin Li <gavinl@nvidia.com> wrote:
> > > 
> > > On 8/9/2022 7:56 AM, Si-Wei Liu wrote:
> > > 
> > > External email: Use caution opening links or attachments
> > > 
> > > 
> > > On 8/8/2022 12:31 AM, Gavin Li wrote:
> > > 
> > > 
> > > On 8/6/2022 6:11 AM, Si-Wei Liu wrote:
> > > 
> > > External email: Use caution opening links or attachments
> > > 
> > > 
> > > On 8/1/2022 9:45 PM, Gavin Li wrote:
> > > 
> > > Currently add_recvbuf_big() allocates MAX_SKB_FRAGS segments for big
> > > packets even when GUEST_* offloads are not present on the device.
> > > However, if GSO is not supported,
> > > 
> > > GUEST GSO (virtio term), or GRO HW (netdev core term) it should have
> > > been be called.
> > > 
> > > ACK
> > > 
> > > 
> > >    it would be sufficient to allocate
> > > segments to cover just up the MTU size and no further. Allocating the
> > > maximum amount of segments results in a large waste of buffer space in
> > > the queue, which limits the number of packets that can be buffered and
> > > can result in reduced performance.
> > > 
> > > Therefore, if GSO is not supported,
> > > 
> > > Ditto.
> > > 
> > > ACK
> > > 
> > > 
> > > use the MTU to calculate the
> > > optimal amount of segments required.
> > > 
> > > Below is the iperf TCP test results over a Mellanox NIC, using vDPA for
> > > 1 VQ, queue size 1024, before and after the change, with the iperf
> > > server running over the virtio-net interface.
> > > 
> > > MTU(Bytes)/Bandwidth (Gbit/s)
> > >                Before   After
> > >     1500        22.5     22.4
> > >     9000        12.8     25.9
> > > 
> > > Signed-off-by: Gavin Li <gavinl@nvidia.com>
> > > Reviewed-by: Gavi Teitz <gavi@nvidia.com>
> > > Reviewed-by: Parav Pandit <parav@nvidia.com>
> > > ---
> > >    drivers/net/virtio_net.c | 20 ++++++++++++++++----
> > >    1 file changed, 16 insertions(+), 4 deletions(-)
> > > 
> > > diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
> > > index ec8e1b3108c3..d36918c1809d 100644
> > > --- a/drivers/net/virtio_net.c
> > > +++ b/drivers/net/virtio_net.c
> > > @@ -222,6 +222,9 @@ struct virtnet_info {
> > >        /* I like... big packets and I cannot lie! */
> > >        bool big_packets;
> > > 
> > > +     /* Indicates GSO support */
> > > +     bool gso_is_supported;
> > > +
> > >        /* Host will merge rx buffers for big packets (shake it! shake
> > > it!) */
> > >        bool mergeable_rx_bufs;
> > > 
> > > @@ -1312,14 +1315,21 @@ static int add_recvbuf_small(struct
> > > virtnet_info *vi, struct receive_queue *rq,
> > >    static int add_recvbuf_big(struct virtnet_info *vi, struct
> > > receive_queue *rq,
> > >                           gfp_t gfp)
> > >    {
> > > +     unsigned int sg_num = MAX_SKB_FRAGS;
> > >        struct page *first, *list = NULL;
> > >        char *p;
> > >        int i, err, offset;
> > > 
> > > -     sg_init_table(rq->sg, MAX_SKB_FRAGS + 2);
> > > +     if (!vi->gso_is_supported) {
> > > +             unsigned int mtu = vi->dev->mtu;
> > > +
> > > +             sg_num = (mtu % PAGE_SIZE) ? mtu / PAGE_SIZE + 1 : mtu
> > > / PAGE_SIZE;
> > > 
> > > DIV_ROUND_UP() can be used?
> > > 
> > > ACK
> > > 
> > > 
> > > Since this branch slightly adds up cost to the datapath, I wonder if
> > > this sg_num can be saved and set only once (generally in virtnet_probe
> > > time) in struct virtnet_info?
> > > 
> > > Not sure how to do it and align it with align with new mtu during
> > > .ndo_change_mtu()---as you mentioned in the following mail. Any idea?
> > > ndo_change_mtu might be in vendor specific code and unmanageable. In
> > > my case, the mtu can only be changed in the xml of the guest vm.
> > > 
> > > Nope, for e.g. "ip link dev eth0 set mtu 1500" can be done from guest on
> > > a virtio-net device with 9000 MTU (as defined in guest xml). Basically
> > > guest user can set MTU to any valid value lower than the original
> > > HOST_MTU. In the vendor defined .ndo_change_mtu() op, dev_validate_mtu()
> > > should have validated the MTU value before coming down to it. And I
> > > suspect you might want to do virtnet_close() and virtnet_open()
> > > before/after changing the buffer size on the fly (the netif_running()
> > > case), implementing .ndo_change_mtu() will be needed anyway.
> > > 
> > > a guest VM driver changing mtu to smaller one is valid use case. However, current optimization suggested in the patch doesn't degrade any performance. Performing close() and open() sequence is good idea, that I would like to take up next after this patch as its going to be more than one patch to achieve it.
> > Right, it could be done on top.
> > 
> > But another note is that, it would still be better to support GUEST GSO feature:
> > 
> > 1) can work for the case for path MTU
> > 2) (migration)compatibility with software backends
> > 
> > > 
> > > +     }
> > > +
> > > +     sg_init_table(rq->sg, sg_num + 2);
> > > 
> > >        /* page in rq->sg[MAX_SKB_FRAGS + 1] is list tail */
> > > 
> > > Comment doesn't match code.
> > > 
> > > ACK
> > > 
> > > -     for (i = MAX_SKB_FRAGS + 1; i > 1; --i) {
> > > +     for (i = sg_num + 1; i > 1; --i) {
> > >                first = get_a_page(rq, gfp);
> > >                if (!first) {
> > >                        if (list)
> > > @@ -1350,7 +1360,7 @@ static int add_recvbuf_big(struct virtnet_info
> > > *vi, struct receive_queue *rq,
> > > 
> > >        /* chain first in list head */
> > >        first->private = (unsigned long)list;
> > > -     err = virtqueue_add_inbuf(rq->vq, rq->sg, MAX_SKB_FRAGS + 2,
> > > +     err = virtqueue_add_inbuf(rq->vq, rq->sg, sg_num + 2,
> > >                                  first, gfp);
> > >        if (err < 0)
> > >                give_pages(rq, first);
> > > @@ -3571,8 +3581,10 @@ static int virtnet_probe(struct virtio_device
> > > *vdev)
> > >        if (virtio_has_feature(vdev, VIRTIO_NET_F_GUEST_TSO4) ||
> > >            virtio_has_feature(vdev, VIRTIO_NET_F_GUEST_TSO6) ||
> > >            virtio_has_feature(vdev, VIRTIO_NET_F_GUEST_ECN) ||
> > > -         virtio_has_feature(vdev, VIRTIO_NET_F_GUEST_UFO))
> > > +         virtio_has_feature(vdev, VIRTIO_NET_F_GUEST_UFO)) {
> > >                vi->big_packets = true;
> > > +             vi->gso_is_supported = true;
> > > 
> > > Please do the same for virtnet_clear_guest_offloads(), and
> > > correspondingly virtnet_restore_guest_offloads() as well. Not sure why
> > > virtnet_clear_guest_offloads() or the caller doesn't unset big_packet on
> > > successful return, seems like a bug to me.
> > It is fine as long as
> > 
> > 1) we don't implement ethtool API for changing guest offloads
> Not sure if I missed something, but it looks the current
> virtnet_set_features() already supports toggling on/off GRO HW through
> commit a02e8964eaf9271a8a5fcc0c55bd13f933bafc56 (formerly misnamed as LRO).
> Sorry, I realized I had a typo in email: "virtnet_set_guest_offloads() or
> the caller doesn't unset big_packet ...".

"we" here is the device, not the driver.

> > 2) big mode XDP is not enabled
> Currently it is not. Not a single patch nor this patch, but the context for
> the eventual goal is to allow XDP on a MTU=9000 link when guest users
> intentionally lower down MTU to 1500.
> 
> Regards,
> -Siwei
> > 
> > So that code works only for XDP but we forbid big packets in the case
> > of XDP right now.
> > 
> > Thanks
> > 
> > > ACK. The two calls virtnet_set_guest_offloads and
> > > virtnet_set_guest_offloads is also called by virtnet_set_features. Do
> > > you think if I can do this in virtnet_set_guest_offloads?
> > > 
> > > I think that it should be fine, though you may want to deal with the XDP
> > > path not to regress it.
> > > 
> > > -Siwei
> > > 
> > > 
> > > 
> > > Thanks,
> > > -Siwei
> > > 
> > > +     }
> > > 
> > >        if (virtio_has_feature(vdev, VIRTIO_NET_F_MRG_RXBUF))
> > >                vi->mergeable_rx_bufs = true;
> > > 
> > > 
> > > 


---------------------------------------------------------------------
To unsubscribe, e-mail: virtio-dev-unsubscribe@lists.oasis-open.org
For additional commands, e-mail: virtio-dev-help@lists.oasis-open.org


^ permalink raw reply	[flat|nested] 102+ messages in thread

* Re: [virtio-dev] [PATCH] virtio-net: use mtu size as buffer length for big packets
  2022-08-09 19:18                   ` Parav Pandit
@ 2022-08-09 21:37                     ` Michael S. Tsirkin
  -1 siblings, 0 replies; 102+ messages in thread
From: Michael S. Tsirkin @ 2022-08-09 21:37 UTC (permalink / raw)
  To: Parav Pandit
  Cc: alexander.h.duyck, Virtio-Dev, kubakici, sridhar.samudrala,
	jesse.brandeburg, Gavi Teitz, virtualization, Hemminger, Stephen,
	loseweigh, davem, Gavin Li

On Tue, Aug 09, 2022 at 07:18:30PM +0000, Parav Pandit wrote:
> > From: Si-Wei Liu <si-wei.liu@oracle.com>
> > Sent: Tuesday, August 9, 2022 3:09 PM
> 
> > >> From: Si-Wei Liu <si-wei.liu@oracle.com>
> > >> Sent: Tuesday, August 9, 2022 2:39 PM Currently it is not. Not a
> > >> single patch nor this patch, but the context for the eventual goal is
> > >> to allow XDP on a MTU=9000 link when guest users intentionally lower
> > >> down MTU to 1500.
> > > Which application benefit by having asymmetry by lowering mtu to 1500
> > to send packets but want to receive 9K packets?
> 
> Below details doesn’t answer the question of asymmetry. :)
> 
> > I think virtio-net driver doesn't differentiate MTU and MRU, in which case
> > the receive buffer will be reduced to fit the 1500B payload size when mtu is
> > lowered down to 1500 from 9000. 
> How? Driver reduced the mXu to 1500, say it is improved to post buffers of 1500 bytes.
> 
> Device doesn't know about it because mtu in config space is RO field.
> Device keep dropping 9K packets because buffers posted are 1500 bytes.
> This is because device follows the spec " The device MUST NOT pass received packets that exceed mtu".


The "mtu" here is the device config field, which is

        /* Default maximum transmit unit advice */

there is no guarantee device will not get a bigger packet.
And there is no guarantee such a packet will be dropped
as opposed to wedging the device if userspace insists on
adding smaller buffers.


> So, I am lost what virtio net device user application is trying to achieve by sending smaller packets and dropping all receive packets.
> (it doesn’t have any relation to mergeable or otherwise).

_______________________________________________
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization

^ permalink raw reply	[flat|nested] 102+ messages in thread

* Re: [virtio-dev] [PATCH] virtio-net: use mtu size as buffer length for big packets
@ 2022-08-09 21:37                     ` Michael S. Tsirkin
  0 siblings, 0 replies; 102+ messages in thread
From: Michael S. Tsirkin @ 2022-08-09 21:37 UTC (permalink / raw)
  To: Parav Pandit
  Cc: Si-Wei Liu, Jason Wang, Gavin Li, Hemminger, Stephen, davem,
	virtualization, Virtio-Dev, jesse.brandeburg, alexander.h.duyck,
	kubakici, sridhar.samudrala, loseweigh, Gavi Teitz

On Tue, Aug 09, 2022 at 07:18:30PM +0000, Parav Pandit wrote:
> > From: Si-Wei Liu <si-wei.liu@oracle.com>
> > Sent: Tuesday, August 9, 2022 3:09 PM
> 
> > >> From: Si-Wei Liu <si-wei.liu@oracle.com>
> > >> Sent: Tuesday, August 9, 2022 2:39 PM Currently it is not. Not a
> > >> single patch nor this patch, but the context for the eventual goal is
> > >> to allow XDP on a MTU=9000 link when guest users intentionally lower
> > >> down MTU to 1500.
> > > Which application benefit by having asymmetry by lowering mtu to 1500
> > to send packets but want to receive 9K packets?
> 
> Below details doesn’t answer the question of asymmetry. :)
> 
> > I think virtio-net driver doesn't differentiate MTU and MRU, in which case
> > the receive buffer will be reduced to fit the 1500B payload size when mtu is
> > lowered down to 1500 from 9000. 
> How? Driver reduced the mXu to 1500, say it is improved to post buffers of 1500 bytes.
> 
> Device doesn't know about it because mtu in config space is RO field.
> Device keep dropping 9K packets because buffers posted are 1500 bytes.
> This is because device follows the spec " The device MUST NOT pass received packets that exceed mtu".


The "mtu" here is the device config field, which is

        /* Default maximum transmit unit advice */

there is no guarantee device will not get a bigger packet.
And there is no guarantee such a packet will be dropped
as opposed to wedging the device if userspace insists on
adding smaller buffers.


> So, I am lost what virtio net device user application is trying to achieve by sending smaller packets and dropping all receive packets.
> (it doesn’t have any relation to mergeable or otherwise).


---------------------------------------------------------------------
To unsubscribe, e-mail: virtio-dev-unsubscribe@lists.oasis-open.org
For additional commands, e-mail: virtio-dev-help@lists.oasis-open.org


^ permalink raw reply	[flat|nested] 102+ messages in thread

* Re: [virtio-dev] [PATCH] virtio-net: use mtu size as buffer length for big packets
  2022-08-09 21:34               ` Michael S. Tsirkin
@ 2022-08-09 21:39                 ` Si-Wei Liu
  -1 siblings, 0 replies; 102+ messages in thread
From: Si-Wei Liu @ 2022-08-09 21:39 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: alexander.h.duyck, Virtio-Dev, kubakici, sridhar.samudrala,
	jesse.brandeburg, gavi, virtualization, Hemminger, Stephen,
	loseweigh, davem, Gavin Li



On 8/9/2022 2:34 PM, Michael S. Tsirkin wrote:
> On Tue, Aug 09, 2022 at 11:38:52AM -0700, Si-Wei Liu wrote:
>>
>> On 8/9/2022 12:44 AM, Jason Wang wrote:
>>> On Tue, Aug 9, 2022 at 3:07 PM Gavin Li <gavinl@nvidia.com> wrote:
>>>> On 8/9/2022 7:56 AM, Si-Wei Liu wrote:
>>>>
>>>> External email: Use caution opening links or attachments
>>>>
>>>>
>>>> On 8/8/2022 12:31 AM, Gavin Li wrote:
>>>>
>>>>
>>>> On 8/6/2022 6:11 AM, Si-Wei Liu wrote:
>>>>
>>>> External email: Use caution opening links or attachments
>>>>
>>>>
>>>> On 8/1/2022 9:45 PM, Gavin Li wrote:
>>>>
>>>> Currently add_recvbuf_big() allocates MAX_SKB_FRAGS segments for big
>>>> packets even when GUEST_* offloads are not present on the device.
>>>> However, if GSO is not supported,
>>>>
>>>> GUEST GSO (virtio term), or GRO HW (netdev core term) it should have
>>>> been be called.
>>>>
>>>> ACK
>>>>
>>>>
>>>>     it would be sufficient to allocate
>>>> segments to cover just up the MTU size and no further. Allocating the
>>>> maximum amount of segments results in a large waste of buffer space in
>>>> the queue, which limits the number of packets that can be buffered and
>>>> can result in reduced performance.
>>>>
>>>> Therefore, if GSO is not supported,
>>>>
>>>> Ditto.
>>>>
>>>> ACK
>>>>
>>>>
>>>> use the MTU to calculate the
>>>> optimal amount of segments required.
>>>>
>>>> Below is the iperf TCP test results over a Mellanox NIC, using vDPA for
>>>> 1 VQ, queue size 1024, before and after the change, with the iperf
>>>> server running over the virtio-net interface.
>>>>
>>>> MTU(Bytes)/Bandwidth (Gbit/s)
>>>>                 Before   After
>>>>      1500        22.5     22.4
>>>>      9000        12.8     25.9
>>>>
>>>> Signed-off-by: Gavin Li <gavinl@nvidia.com>
>>>> Reviewed-by: Gavi Teitz <gavi@nvidia.com>
>>>> Reviewed-by: Parav Pandit <parav@nvidia.com>
>>>> ---
>>>>     drivers/net/virtio_net.c | 20 ++++++++++++++++----
>>>>     1 file changed, 16 insertions(+), 4 deletions(-)
>>>>
>>>> diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
>>>> index ec8e1b3108c3..d36918c1809d 100644
>>>> --- a/drivers/net/virtio_net.c
>>>> +++ b/drivers/net/virtio_net.c
>>>> @@ -222,6 +222,9 @@ struct virtnet_info {
>>>>         /* I like... big packets and I cannot lie! */
>>>>         bool big_packets;
>>>>
>>>> +     /* Indicates GSO support */
>>>> +     bool gso_is_supported;
>>>> +
>>>>         /* Host will merge rx buffers for big packets (shake it! shake
>>>> it!) */
>>>>         bool mergeable_rx_bufs;
>>>>
>>>> @@ -1312,14 +1315,21 @@ static int add_recvbuf_small(struct
>>>> virtnet_info *vi, struct receive_queue *rq,
>>>>     static int add_recvbuf_big(struct virtnet_info *vi, struct
>>>> receive_queue *rq,
>>>>                            gfp_t gfp)
>>>>     {
>>>> +     unsigned int sg_num = MAX_SKB_FRAGS;
>>>>         struct page *first, *list = NULL;
>>>>         char *p;
>>>>         int i, err, offset;
>>>>
>>>> -     sg_init_table(rq->sg, MAX_SKB_FRAGS + 2);
>>>> +     if (!vi->gso_is_supported) {
>>>> +             unsigned int mtu = vi->dev->mtu;
>>>> +
>>>> +             sg_num = (mtu % PAGE_SIZE) ? mtu / PAGE_SIZE + 1 : mtu
>>>> / PAGE_SIZE;
>>>>
>>>> DIV_ROUND_UP() can be used?
>>>>
>>>> ACK
>>>>
>>>>
>>>> Since this branch slightly adds up cost to the datapath, I wonder if
>>>> this sg_num can be saved and set only once (generally in virtnet_probe
>>>> time) in struct virtnet_info?
>>>>
>>>> Not sure how to do it and align it with align with new mtu during
>>>> .ndo_change_mtu()---as you mentioned in the following mail. Any idea?
>>>> ndo_change_mtu might be in vendor specific code and unmanageable. In
>>>> my case, the mtu can only be changed in the xml of the guest vm.
>>>>
>>>> Nope, for e.g. "ip link dev eth0 set mtu 1500" can be done from guest on
>>>> a virtio-net device with 9000 MTU (as defined in guest xml). Basically
>>>> guest user can set MTU to any valid value lower than the original
>>>> HOST_MTU. In the vendor defined .ndo_change_mtu() op, dev_validate_mtu()
>>>> should have validated the MTU value before coming down to it. And I
>>>> suspect you might want to do virtnet_close() and virtnet_open()
>>>> before/after changing the buffer size on the fly (the netif_running()
>>>> case), implementing .ndo_change_mtu() will be needed anyway.
>>>>
>>>> a guest VM driver changing mtu to smaller one is valid use case. However, current optimization suggested in the patch doesn't degrade any performance. Performing close() and open() sequence is good idea, that I would like to take up next after this patch as its going to be more than one patch to achieve it.
>>> Right, it could be done on top.
>>>
>>> But another note is that, it would still be better to support GUEST GSO feature:
>>>
>>> 1) can work for the case for path MTU
>>> 2) (migration)compatibility with software backends
>>>
>>>> +     }
>>>> +
>>>> +     sg_init_table(rq->sg, sg_num + 2);
>>>>
>>>>         /* page in rq->sg[MAX_SKB_FRAGS + 1] is list tail */
>>>>
>>>> Comment doesn't match code.
>>>>
>>>> ACK
>>>>
>>>> -     for (i = MAX_SKB_FRAGS + 1; i > 1; --i) {
>>>> +     for (i = sg_num + 1; i > 1; --i) {
>>>>                 first = get_a_page(rq, gfp);
>>>>                 if (!first) {
>>>>                         if (list)
>>>> @@ -1350,7 +1360,7 @@ static int add_recvbuf_big(struct virtnet_info
>>>> *vi, struct receive_queue *rq,
>>>>
>>>>         /* chain first in list head */
>>>>         first->private = (unsigned long)list;
>>>> -     err = virtqueue_add_inbuf(rq->vq, rq->sg, MAX_SKB_FRAGS + 2,
>>>> +     err = virtqueue_add_inbuf(rq->vq, rq->sg, sg_num + 2,
>>>>                                   first, gfp);
>>>>         if (err < 0)
>>>>                 give_pages(rq, first);
>>>> @@ -3571,8 +3581,10 @@ static int virtnet_probe(struct virtio_device
>>>> *vdev)
>>>>         if (virtio_has_feature(vdev, VIRTIO_NET_F_GUEST_TSO4) ||
>>>>             virtio_has_feature(vdev, VIRTIO_NET_F_GUEST_TSO6) ||
>>>>             virtio_has_feature(vdev, VIRTIO_NET_F_GUEST_ECN) ||
>>>> -         virtio_has_feature(vdev, VIRTIO_NET_F_GUEST_UFO))
>>>> +         virtio_has_feature(vdev, VIRTIO_NET_F_GUEST_UFO)) {
>>>>                 vi->big_packets = true;
>>>> +             vi->gso_is_supported = true;
>>>>
>>>> Please do the same for virtnet_clear_guest_offloads(), and
>>>> correspondingly virtnet_restore_guest_offloads() as well. Not sure why
>>>> virtnet_clear_guest_offloads() or the caller doesn't unset big_packet on
>>>> successful return, seems like a bug to me.
>>> It is fine as long as
>>>
>>> 1) we don't implement ethtool API for changing guest offloads
>> Not sure if I missed something, but it looks the current
>> virtnet_set_features() already supports toggling on/off GRO HW through
>> commit a02e8964eaf9271a8a5fcc0c55bd13f933bafc56 (formerly misnamed as LRO).
>> Sorry, I realized I had a typo in email: "virtnet_set_guest_offloads() or
>> the caller doesn't unset big_packet ...".
> "we" here is the device, not the driver.
What is the ethtool API from device level? 
VIRTIO_NET_F_CTRL_GUEST_OFFLOADS for sure, right?

It's implemented in software backend as far as I know. I see no reason 
*technically* this is infeasible, regardless what you name it, being a 
bug or TODO.

-Siwei

>
>>> 2) big mode XDP is not enabled
>> Currently it is not. Not a single patch nor this patch, but the context for
>> the eventual goal is to allow XDP on a MTU=9000 link when guest users
>> intentionally lower down MTU to 1500.
>>
>> Regards,
>> -Siwei
>>> So that code works only for XDP but we forbid big packets in the case
>>> of XDP right now.
>>>
>>> Thanks
>>>
>>>> ACK. The two calls virtnet_set_guest_offloads and
>>>> virtnet_set_guest_offloads is also called by virtnet_set_features. Do
>>>> you think if I can do this in virtnet_set_guest_offloads?
>>>>
>>>> I think that it should be fine, though you may want to deal with the XDP
>>>> path not to regress it.
>>>>
>>>> -Siwei
>>>>
>>>>
>>>>
>>>> Thanks,
>>>> -Siwei
>>>>
>>>> +     }
>>>>
>>>>         if (virtio_has_feature(vdev, VIRTIO_NET_F_MRG_RXBUF))
>>>>                 vi->mergeable_rx_bufs = true;
>>>>
>>>>
>>>>

_______________________________________________
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization

^ permalink raw reply	[flat|nested] 102+ messages in thread

* Re: [virtio-dev] [PATCH] virtio-net: use mtu size as buffer length for big packets
@ 2022-08-09 21:39                 ` Si-Wei Liu
  0 siblings, 0 replies; 102+ messages in thread
From: Si-Wei Liu @ 2022-08-09 21:39 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: Jason Wang, Gavin Li, Hemminger, Stephen, davem, virtualization,
	Virtio-Dev, jesse.brandeburg, alexander.h.duyck, kubakici,
	sridhar.samudrala, loseweigh, Parav Pandit, gavi



On 8/9/2022 2:34 PM, Michael S. Tsirkin wrote:
> On Tue, Aug 09, 2022 at 11:38:52AM -0700, Si-Wei Liu wrote:
>>
>> On 8/9/2022 12:44 AM, Jason Wang wrote:
>>> On Tue, Aug 9, 2022 at 3:07 PM Gavin Li <gavinl@nvidia.com> wrote:
>>>> On 8/9/2022 7:56 AM, Si-Wei Liu wrote:
>>>>
>>>> External email: Use caution opening links or attachments
>>>>
>>>>
>>>> On 8/8/2022 12:31 AM, Gavin Li wrote:
>>>>
>>>>
>>>> On 8/6/2022 6:11 AM, Si-Wei Liu wrote:
>>>>
>>>> External email: Use caution opening links or attachments
>>>>
>>>>
>>>> On 8/1/2022 9:45 PM, Gavin Li wrote:
>>>>
>>>> Currently add_recvbuf_big() allocates MAX_SKB_FRAGS segments for big
>>>> packets even when GUEST_* offloads are not present on the device.
>>>> However, if GSO is not supported,
>>>>
>>>> GUEST GSO (virtio term), or GRO HW (netdev core term) it should have
>>>> been be called.
>>>>
>>>> ACK
>>>>
>>>>
>>>>     it would be sufficient to allocate
>>>> segments to cover just up the MTU size and no further. Allocating the
>>>> maximum amount of segments results in a large waste of buffer space in
>>>> the queue, which limits the number of packets that can be buffered and
>>>> can result in reduced performance.
>>>>
>>>> Therefore, if GSO is not supported,
>>>>
>>>> Ditto.
>>>>
>>>> ACK
>>>>
>>>>
>>>> use the MTU to calculate the
>>>> optimal amount of segments required.
>>>>
>>>> Below is the iperf TCP test results over a Mellanox NIC, using vDPA for
>>>> 1 VQ, queue size 1024, before and after the change, with the iperf
>>>> server running over the virtio-net interface.
>>>>
>>>> MTU(Bytes)/Bandwidth (Gbit/s)
>>>>                 Before   After
>>>>      1500        22.5     22.4
>>>>      9000        12.8     25.9
>>>>
>>>> Signed-off-by: Gavin Li <gavinl@nvidia.com>
>>>> Reviewed-by: Gavi Teitz <gavi@nvidia.com>
>>>> Reviewed-by: Parav Pandit <parav@nvidia.com>
>>>> ---
>>>>     drivers/net/virtio_net.c | 20 ++++++++++++++++----
>>>>     1 file changed, 16 insertions(+), 4 deletions(-)
>>>>
>>>> diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
>>>> index ec8e1b3108c3..d36918c1809d 100644
>>>> --- a/drivers/net/virtio_net.c
>>>> +++ b/drivers/net/virtio_net.c
>>>> @@ -222,6 +222,9 @@ struct virtnet_info {
>>>>         /* I like... big packets and I cannot lie! */
>>>>         bool big_packets;
>>>>
>>>> +     /* Indicates GSO support */
>>>> +     bool gso_is_supported;
>>>> +
>>>>         /* Host will merge rx buffers for big packets (shake it! shake
>>>> it!) */
>>>>         bool mergeable_rx_bufs;
>>>>
>>>> @@ -1312,14 +1315,21 @@ static int add_recvbuf_small(struct
>>>> virtnet_info *vi, struct receive_queue *rq,
>>>>     static int add_recvbuf_big(struct virtnet_info *vi, struct
>>>> receive_queue *rq,
>>>>                            gfp_t gfp)
>>>>     {
>>>> +     unsigned int sg_num = MAX_SKB_FRAGS;
>>>>         struct page *first, *list = NULL;
>>>>         char *p;
>>>>         int i, err, offset;
>>>>
>>>> -     sg_init_table(rq->sg, MAX_SKB_FRAGS + 2);
>>>> +     if (!vi->gso_is_supported) {
>>>> +             unsigned int mtu = vi->dev->mtu;
>>>> +
>>>> +             sg_num = (mtu % PAGE_SIZE) ? mtu / PAGE_SIZE + 1 : mtu
>>>> / PAGE_SIZE;
>>>>
>>>> DIV_ROUND_UP() can be used?
>>>>
>>>> ACK
>>>>
>>>>
>>>> Since this branch slightly adds up cost to the datapath, I wonder if
>>>> this sg_num can be saved and set only once (generally in virtnet_probe
>>>> time) in struct virtnet_info?
>>>>
>>>> Not sure how to do it and align it with align with new mtu during
>>>> .ndo_change_mtu()---as you mentioned in the following mail. Any idea?
>>>> ndo_change_mtu might be in vendor specific code and unmanageable. In
>>>> my case, the mtu can only be changed in the xml of the guest vm.
>>>>
>>>> Nope, for e.g. "ip link dev eth0 set mtu 1500" can be done from guest on
>>>> a virtio-net device with 9000 MTU (as defined in guest xml). Basically
>>>> guest user can set MTU to any valid value lower than the original
>>>> HOST_MTU. In the vendor defined .ndo_change_mtu() op, dev_validate_mtu()
>>>> should have validated the MTU value before coming down to it. And I
>>>> suspect you might want to do virtnet_close() and virtnet_open()
>>>> before/after changing the buffer size on the fly (the netif_running()
>>>> case), implementing .ndo_change_mtu() will be needed anyway.
>>>>
>>>> a guest VM driver changing mtu to smaller one is valid use case. However, current optimization suggested in the patch doesn't degrade any performance. Performing close() and open() sequence is good idea, that I would like to take up next after this patch as its going to be more than one patch to achieve it.
>>> Right, it could be done on top.
>>>
>>> But another note is that, it would still be better to support GUEST GSO feature:
>>>
>>> 1) can work for the case for path MTU
>>> 2) (migration)compatibility with software backends
>>>
>>>> +     }
>>>> +
>>>> +     sg_init_table(rq->sg, sg_num + 2);
>>>>
>>>>         /* page in rq->sg[MAX_SKB_FRAGS + 1] is list tail */
>>>>
>>>> Comment doesn't match code.
>>>>
>>>> ACK
>>>>
>>>> -     for (i = MAX_SKB_FRAGS + 1; i > 1; --i) {
>>>> +     for (i = sg_num + 1; i > 1; --i) {
>>>>                 first = get_a_page(rq, gfp);
>>>>                 if (!first) {
>>>>                         if (list)
>>>> @@ -1350,7 +1360,7 @@ static int add_recvbuf_big(struct virtnet_info
>>>> *vi, struct receive_queue *rq,
>>>>
>>>>         /* chain first in list head */
>>>>         first->private = (unsigned long)list;
>>>> -     err = virtqueue_add_inbuf(rq->vq, rq->sg, MAX_SKB_FRAGS + 2,
>>>> +     err = virtqueue_add_inbuf(rq->vq, rq->sg, sg_num + 2,
>>>>                                   first, gfp);
>>>>         if (err < 0)
>>>>                 give_pages(rq, first);
>>>> @@ -3571,8 +3581,10 @@ static int virtnet_probe(struct virtio_device
>>>> *vdev)
>>>>         if (virtio_has_feature(vdev, VIRTIO_NET_F_GUEST_TSO4) ||
>>>>             virtio_has_feature(vdev, VIRTIO_NET_F_GUEST_TSO6) ||
>>>>             virtio_has_feature(vdev, VIRTIO_NET_F_GUEST_ECN) ||
>>>> -         virtio_has_feature(vdev, VIRTIO_NET_F_GUEST_UFO))
>>>> +         virtio_has_feature(vdev, VIRTIO_NET_F_GUEST_UFO)) {
>>>>                 vi->big_packets = true;
>>>> +             vi->gso_is_supported = true;
>>>>
>>>> Please do the same for virtnet_clear_guest_offloads(), and
>>>> correspondingly virtnet_restore_guest_offloads() as well. Not sure why
>>>> virtnet_clear_guest_offloads() or the caller doesn't unset big_packet on
>>>> successful return, seems like a bug to me.
>>> It is fine as long as
>>>
>>> 1) we don't implement ethtool API for changing guest offloads
>> Not sure if I missed something, but it looks the current
>> virtnet_set_features() already supports toggling on/off GRO HW through
>> commit a02e8964eaf9271a8a5fcc0c55bd13f933bafc56 (formerly misnamed as LRO).
>> Sorry, I realized I had a typo in email: "virtnet_set_guest_offloads() or
>> the caller doesn't unset big_packet ...".
> "we" here is the device, not the driver.
What is the ethtool API from device level? 
VIRTIO_NET_F_CTRL_GUEST_OFFLOADS for sure, right?

It's implemented in software backend as far as I know. I see no reason 
*technically* this is infeasible, regardless what you name it, being a 
bug or TODO.

-Siwei

>
>>> 2) big mode XDP is not enabled
>> Currently it is not. Not a single patch nor this patch, but the context for
>> the eventual goal is to allow XDP on a MTU=9000 link when guest users
>> intentionally lower down MTU to 1500.
>>
>> Regards,
>> -Siwei
>>> So that code works only for XDP but we forbid big packets in the case
>>> of XDP right now.
>>>
>>> Thanks
>>>
>>>> ACK. The two calls virtnet_set_guest_offloads and
>>>> virtnet_set_guest_offloads is also called by virtnet_set_features. Do
>>>> you think if I can do this in virtnet_set_guest_offloads?
>>>>
>>>> I think that it should be fine, though you may want to deal with the XDP
>>>> path not to regress it.
>>>>
>>>> -Siwei
>>>>
>>>>
>>>>
>>>> Thanks,
>>>> -Siwei
>>>>
>>>> +     }
>>>>
>>>>         if (virtio_has_feature(vdev, VIRTIO_NET_F_MRG_RXBUF))
>>>>                 vi->mergeable_rx_bufs = true;
>>>>
>>>>
>>>>


---------------------------------------------------------------------
To unsubscribe, e-mail: virtio-dev-unsubscribe@lists.oasis-open.org
For additional commands, e-mail: virtio-dev-help@lists.oasis-open.org


^ permalink raw reply	[flat|nested] 102+ messages in thread

* RE: [virtio-dev] [PATCH] virtio-net: use mtu size as buffer length for big packets
  2022-08-09 21:37                     ` Michael S. Tsirkin
@ 2022-08-09 21:49                       ` Parav Pandit
  -1 siblings, 0 replies; 102+ messages in thread
From: Parav Pandit via Virtualization @ 2022-08-09 21:49 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: alexander.h.duyck, Virtio-Dev, kubakici, sridhar.samudrala,
	jesse.brandeburg, Gavi Teitz, virtualization, Hemminger, Stephen,
	loseweigh, davem, Gavin Li

> From: Michael S. Tsirkin <mst@redhat.com>
> Sent: Tuesday, August 9, 2022 5:38 PM

[..]
> > > I think virtio-net driver doesn't differentiate MTU and MRU, in
> > > which case the receive buffer will be reduced to fit the 1500B
> > > payload size when mtu is lowered down to 1500 from 9000.
> > How? Driver reduced the mXu to 1500, say it is improved to post buffers of
> 1500 bytes.
> >
> > Device doesn't know about it because mtu in config space is RO field.
> > Device keep dropping 9K packets because buffers posted are 1500 bytes.
> > This is because device follows the spec " The device MUST NOT pass
> received packets that exceed mtu".
> 
> 
> The "mtu" here is the device config field, which is
> 
>         /* Default maximum transmit unit advice */
> 

It is the field from struct virtio_net_config.mtu. right?
This is RO field for driver.

> there is no guarantee device will not get a bigger packet.
Right. That is what I also hinted.
Hence, allocating buffers worth upto mtu is safer.
When user overrides it, driver can be further optimized to honor such new value on rx buffer posting.

> And there is no guarantee such a packet will be dropped as opposed to
> wedging the device if userspace insists on adding smaller buffers.
>
If user space insists on small buffers, so be it. It only works when user exactly know what user is doing in the whole network.
When user prefers to override the device RO field, device is in the dark and things work on best effort basis.
This must be a reasonably advance user who has good knowledge of its network topology etc.

For such case, may be yes, driver should be further optimized.

_______________________________________________
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization

^ permalink raw reply	[flat|nested] 102+ messages in thread

* RE: [virtio-dev] [PATCH] virtio-net: use mtu size as buffer length for big packets
@ 2022-08-09 21:49                       ` Parav Pandit
  0 siblings, 0 replies; 102+ messages in thread
From: Parav Pandit @ 2022-08-09 21:49 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: Si-Wei Liu, Jason Wang, Gavin Li, Hemminger, Stephen, davem,
	virtualization, Virtio-Dev, jesse.brandeburg, alexander.h.duyck,
	kubakici, sridhar.samudrala, loseweigh, Gavi Teitz

> From: Michael S. Tsirkin <mst@redhat.com>
> Sent: Tuesday, August 9, 2022 5:38 PM

[..]
> > > I think virtio-net driver doesn't differentiate MTU and MRU, in
> > > which case the receive buffer will be reduced to fit the 1500B
> > > payload size when mtu is lowered down to 1500 from 9000.
> > How? Driver reduced the mXu to 1500, say it is improved to post buffers of
> 1500 bytes.
> >
> > Device doesn't know about it because mtu in config space is RO field.
> > Device keep dropping 9K packets because buffers posted are 1500 bytes.
> > This is because device follows the spec " The device MUST NOT pass
> received packets that exceed mtu".
> 
> 
> The "mtu" here is the device config field, which is
> 
>         /* Default maximum transmit unit advice */
> 

It is the field from struct virtio_net_config.mtu. right?
This is RO field for driver.

> there is no guarantee device will not get a bigger packet.
Right. That is what I also hinted.
Hence, allocating buffers worth upto mtu is safer.
When user overrides it, driver can be further optimized to honor such new value on rx buffer posting.

> And there is no guarantee such a packet will be dropped as opposed to
> wedging the device if userspace insists on adding smaller buffers.
>
If user space insists on small buffers, so be it. It only works when user exactly know what user is doing in the whole network.
When user prefers to override the device RO field, device is in the dark and things work on best effort basis.
This must be a reasonably advance user who has good knowledge of its network topology etc.

For such case, may be yes, driver should be further optimized.


^ permalink raw reply	[flat|nested] 102+ messages in thread

* Re: [virtio-dev] [PATCH] virtio-net: use mtu size as buffer length for big packets
  2022-08-09 21:49                       ` Parav Pandit
@ 2022-08-09 22:25                         ` Michael S. Tsirkin
  -1 siblings, 0 replies; 102+ messages in thread
From: Michael S. Tsirkin @ 2022-08-09 22:25 UTC (permalink / raw)
  To: Parav Pandit
  Cc: alexander.h.duyck, Virtio-Dev, kubakici, sridhar.samudrala,
	jesse.brandeburg, Gavi Teitz, virtualization, Hemminger, Stephen,
	loseweigh, davem, Gavin Li

On Tue, Aug 09, 2022 at 09:49:03PM +0000, Parav Pandit wrote:
> > From: Michael S. Tsirkin <mst@redhat.com>
> > Sent: Tuesday, August 9, 2022 5:38 PM
> 
> [..]
> > > > I think virtio-net driver doesn't differentiate MTU and MRU, in
> > > > which case the receive buffer will be reduced to fit the 1500B
> > > > payload size when mtu is lowered down to 1500 from 9000.
> > > How? Driver reduced the mXu to 1500, say it is improved to post buffers of
> > 1500 bytes.
> > >
> > > Device doesn't know about it because mtu in config space is RO field.
> > > Device keep dropping 9K packets because buffers posted are 1500 bytes.
> > > This is because device follows the spec " The device MUST NOT pass
> > received packets that exceed mtu".
> > 
> > 
> > The "mtu" here is the device config field, which is
> > 
> >         /* Default maximum transmit unit advice */
> > 
> 
> It is the field from struct virtio_net_config.mtu. right?
> This is RO field for driver.
> 
> > there is no guarantee device will not get a bigger packet.
> Right. That is what I also hinted.
> Hence, allocating buffers worth upto mtu is safer.

yes

> When user overrides it, driver can be further optimized to honor such new value on rx buffer posting.

no, not without a feature bit promising device won't get wedged.

> > And there is no guarantee such a packet will be dropped as opposed to
> > wedging the device if userspace insists on adding smaller buffers.
> >
> If user space insists on small buffers, so be it.

If previously things worked, the "so be it" is a regression and blaming
users won't help us. 

> It only works when user exactly know what user is doing in the whole network.

If you want to claim this you need a new feature bit.

> When user prefers to override the device RO field, device is in the dark and things work on best effort basis.

Dropping packets is best effort. Getting stuck forever isn't, that's
a quality of implementation issue.

> This must be a reasonably advance user who has good knowledge of its network topology etc.
> 
> For such case, may be yes, driver should be further optimized.
> 

_______________________________________________
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization

^ permalink raw reply	[flat|nested] 102+ messages in thread

* Re: [virtio-dev] [PATCH] virtio-net: use mtu size as buffer length for big packets
@ 2022-08-09 22:25                         ` Michael S. Tsirkin
  0 siblings, 0 replies; 102+ messages in thread
From: Michael S. Tsirkin @ 2022-08-09 22:25 UTC (permalink / raw)
  To: Parav Pandit
  Cc: Si-Wei Liu, Jason Wang, Gavin Li, Hemminger, Stephen, davem,
	virtualization, Virtio-Dev, jesse.brandeburg, alexander.h.duyck,
	kubakici, sridhar.samudrala, loseweigh, Gavi Teitz

On Tue, Aug 09, 2022 at 09:49:03PM +0000, Parav Pandit wrote:
> > From: Michael S. Tsirkin <mst@redhat.com>
> > Sent: Tuesday, August 9, 2022 5:38 PM
> 
> [..]
> > > > I think virtio-net driver doesn't differentiate MTU and MRU, in
> > > > which case the receive buffer will be reduced to fit the 1500B
> > > > payload size when mtu is lowered down to 1500 from 9000.
> > > How? Driver reduced the mXu to 1500, say it is improved to post buffers of
> > 1500 bytes.
> > >
> > > Device doesn't know about it because mtu in config space is RO field.
> > > Device keep dropping 9K packets because buffers posted are 1500 bytes.
> > > This is because device follows the spec " The device MUST NOT pass
> > received packets that exceed mtu".
> > 
> > 
> > The "mtu" here is the device config field, which is
> > 
> >         /* Default maximum transmit unit advice */
> > 
> 
> It is the field from struct virtio_net_config.mtu. right?
> This is RO field for driver.
> 
> > there is no guarantee device will not get a bigger packet.
> Right. That is what I also hinted.
> Hence, allocating buffers worth upto mtu is safer.

yes

> When user overrides it, driver can be further optimized to honor such new value on rx buffer posting.

no, not without a feature bit promising device won't get wedged.

> > And there is no guarantee such a packet will be dropped as opposed to
> > wedging the device if userspace insists on adding smaller buffers.
> >
> If user space insists on small buffers, so be it.

If previously things worked, the "so be it" is a regression and blaming
users won't help us. 

> It only works when user exactly know what user is doing in the whole network.

If you want to claim this you need a new feature bit.

> When user prefers to override the device RO field, device is in the dark and things work on best effort basis.

Dropping packets is best effort. Getting stuck forever isn't, that's
a quality of implementation issue.

> This must be a reasonably advance user who has good knowledge of its network topology etc.
> 
> For such case, may be yes, driver should be further optimized.
> 


---------------------------------------------------------------------
To unsubscribe, e-mail: virtio-dev-unsubscribe@lists.oasis-open.org
For additional commands, e-mail: virtio-dev-help@lists.oasis-open.org


^ permalink raw reply	[flat|nested] 102+ messages in thread

* Re: [virtio-dev] [PATCH] virtio-net: use mtu size as buffer length for big packets
  2022-08-09 21:39                 ` Si-Wei Liu
@ 2022-08-09 22:27                   ` Michael S. Tsirkin
  -1 siblings, 0 replies; 102+ messages in thread
From: Michael S. Tsirkin @ 2022-08-09 22:27 UTC (permalink / raw)
  To: Si-Wei Liu
  Cc: alexander.h.duyck, Virtio-Dev, kubakici, sridhar.samudrala,
	jesse.brandeburg, gavi, virtualization, Hemminger, Stephen,
	loseweigh, davem, Gavin Li

On Tue, Aug 09, 2022 at 02:39:49PM -0700, Si-Wei Liu wrote:
> 
> 
> On 8/9/2022 2:34 PM, Michael S. Tsirkin wrote:
> > On Tue, Aug 09, 2022 at 11:38:52AM -0700, Si-Wei Liu wrote:
> > > 
> > > On 8/9/2022 12:44 AM, Jason Wang wrote:
> > > > On Tue, Aug 9, 2022 at 3:07 PM Gavin Li <gavinl@nvidia.com> wrote:
> > > > > On 8/9/2022 7:56 AM, Si-Wei Liu wrote:
> > > > > 
> > > > > External email: Use caution opening links or attachments
> > > > > 
> > > > > 
> > > > > On 8/8/2022 12:31 AM, Gavin Li wrote:
> > > > > 
> > > > > 
> > > > > On 8/6/2022 6:11 AM, Si-Wei Liu wrote:
> > > > > 
> > > > > External email: Use caution opening links or attachments
> > > > > 
> > > > > 
> > > > > On 8/1/2022 9:45 PM, Gavin Li wrote:
> > > > > 
> > > > > Currently add_recvbuf_big() allocates MAX_SKB_FRAGS segments for big
> > > > > packets even when GUEST_* offloads are not present on the device.
> > > > > However, if GSO is not supported,
> > > > > 
> > > > > GUEST GSO (virtio term), or GRO HW (netdev core term) it should have
> > > > > been be called.
> > > > > 
> > > > > ACK
> > > > > 
> > > > > 
> > > > >     it would be sufficient to allocate
> > > > > segments to cover just up the MTU size and no further. Allocating the
> > > > > maximum amount of segments results in a large waste of buffer space in
> > > > > the queue, which limits the number of packets that can be buffered and
> > > > > can result in reduced performance.
> > > > > 
> > > > > Therefore, if GSO is not supported,
> > > > > 
> > > > > Ditto.
> > > > > 
> > > > > ACK
> > > > > 
> > > > > 
> > > > > use the MTU to calculate the
> > > > > optimal amount of segments required.
> > > > > 
> > > > > Below is the iperf TCP test results over a Mellanox NIC, using vDPA for
> > > > > 1 VQ, queue size 1024, before and after the change, with the iperf
> > > > > server running over the virtio-net interface.
> > > > > 
> > > > > MTU(Bytes)/Bandwidth (Gbit/s)
> > > > >                 Before   After
> > > > >      1500        22.5     22.4
> > > > >      9000        12.8     25.9
> > > > > 
> > > > > Signed-off-by: Gavin Li <gavinl@nvidia.com>
> > > > > Reviewed-by: Gavi Teitz <gavi@nvidia.com>
> > > > > Reviewed-by: Parav Pandit <parav@nvidia.com>
> > > > > ---
> > > > >     drivers/net/virtio_net.c | 20 ++++++++++++++++----
> > > > >     1 file changed, 16 insertions(+), 4 deletions(-)
> > > > > 
> > > > > diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
> > > > > index ec8e1b3108c3..d36918c1809d 100644
> > > > > --- a/drivers/net/virtio_net.c
> > > > > +++ b/drivers/net/virtio_net.c
> > > > > @@ -222,6 +222,9 @@ struct virtnet_info {
> > > > >         /* I like... big packets and I cannot lie! */
> > > > >         bool big_packets;
> > > > > 
> > > > > +     /* Indicates GSO support */
> > > > > +     bool gso_is_supported;
> > > > > +
> > > > >         /* Host will merge rx buffers for big packets (shake it! shake
> > > > > it!) */
> > > > >         bool mergeable_rx_bufs;
> > > > > 
> > > > > @@ -1312,14 +1315,21 @@ static int add_recvbuf_small(struct
> > > > > virtnet_info *vi, struct receive_queue *rq,
> > > > >     static int add_recvbuf_big(struct virtnet_info *vi, struct
> > > > > receive_queue *rq,
> > > > >                            gfp_t gfp)
> > > > >     {
> > > > > +     unsigned int sg_num = MAX_SKB_FRAGS;
> > > > >         struct page *first, *list = NULL;
> > > > >         char *p;
> > > > >         int i, err, offset;
> > > > > 
> > > > > -     sg_init_table(rq->sg, MAX_SKB_FRAGS + 2);
> > > > > +     if (!vi->gso_is_supported) {
> > > > > +             unsigned int mtu = vi->dev->mtu;
> > > > > +
> > > > > +             sg_num = (mtu % PAGE_SIZE) ? mtu / PAGE_SIZE + 1 : mtu
> > > > > / PAGE_SIZE;
> > > > > 
> > > > > DIV_ROUND_UP() can be used?
> > > > > 
> > > > > ACK
> > > > > 
> > > > > 
> > > > > Since this branch slightly adds up cost to the datapath, I wonder if
> > > > > this sg_num can be saved and set only once (generally in virtnet_probe
> > > > > time) in struct virtnet_info?
> > > > > 
> > > > > Not sure how to do it and align it with align with new mtu during
> > > > > .ndo_change_mtu()---as you mentioned in the following mail. Any idea?
> > > > > ndo_change_mtu might be in vendor specific code and unmanageable. In
> > > > > my case, the mtu can only be changed in the xml of the guest vm.
> > > > > 
> > > > > Nope, for e.g. "ip link dev eth0 set mtu 1500" can be done from guest on
> > > > > a virtio-net device with 9000 MTU (as defined in guest xml). Basically
> > > > > guest user can set MTU to any valid value lower than the original
> > > > > HOST_MTU. In the vendor defined .ndo_change_mtu() op, dev_validate_mtu()
> > > > > should have validated the MTU value before coming down to it. And I
> > > > > suspect you might want to do virtnet_close() and virtnet_open()
> > > > > before/after changing the buffer size on the fly (the netif_running()
> > > > > case), implementing .ndo_change_mtu() will be needed anyway.
> > > > > 
> > > > > a guest VM driver changing mtu to smaller one is valid use case. However, current optimization suggested in the patch doesn't degrade any performance. Performing close() and open() sequence is good idea, that I would like to take up next after this patch as its going to be more than one patch to achieve it.
> > > > Right, it could be done on top.
> > > > 
> > > > But another note is that, it would still be better to support GUEST GSO feature:
> > > > 
> > > > 1) can work for the case for path MTU
> > > > 2) (migration)compatibility with software backends
> > > > 
> > > > > +     }
> > > > > +
> > > > > +     sg_init_table(rq->sg, sg_num + 2);
> > > > > 
> > > > >         /* page in rq->sg[MAX_SKB_FRAGS + 1] is list tail */
> > > > > 
> > > > > Comment doesn't match code.
> > > > > 
> > > > > ACK
> > > > > 
> > > > > -     for (i = MAX_SKB_FRAGS + 1; i > 1; --i) {
> > > > > +     for (i = sg_num + 1; i > 1; --i) {
> > > > >                 first = get_a_page(rq, gfp);
> > > > >                 if (!first) {
> > > > >                         if (list)
> > > > > @@ -1350,7 +1360,7 @@ static int add_recvbuf_big(struct virtnet_info
> > > > > *vi, struct receive_queue *rq,
> > > > > 
> > > > >         /* chain first in list head */
> > > > >         first->private = (unsigned long)list;
> > > > > -     err = virtqueue_add_inbuf(rq->vq, rq->sg, MAX_SKB_FRAGS + 2,
> > > > > +     err = virtqueue_add_inbuf(rq->vq, rq->sg, sg_num + 2,
> > > > >                                   first, gfp);
> > > > >         if (err < 0)
> > > > >                 give_pages(rq, first);
> > > > > @@ -3571,8 +3581,10 @@ static int virtnet_probe(struct virtio_device
> > > > > *vdev)
> > > > >         if (virtio_has_feature(vdev, VIRTIO_NET_F_GUEST_TSO4) ||
> > > > >             virtio_has_feature(vdev, VIRTIO_NET_F_GUEST_TSO6) ||
> > > > >             virtio_has_feature(vdev, VIRTIO_NET_F_GUEST_ECN) ||
> > > > > -         virtio_has_feature(vdev, VIRTIO_NET_F_GUEST_UFO))
> > > > > +         virtio_has_feature(vdev, VIRTIO_NET_F_GUEST_UFO)) {
> > > > >                 vi->big_packets = true;
> > > > > +             vi->gso_is_supported = true;
> > > > > 
> > > > > Please do the same for virtnet_clear_guest_offloads(), and
> > > > > correspondingly virtnet_restore_guest_offloads() as well. Not sure why
> > > > > virtnet_clear_guest_offloads() or the caller doesn't unset big_packet on
> > > > > successful return, seems like a bug to me.
> > > > It is fine as long as
> > > > 
> > > > 1) we don't implement ethtool API for changing guest offloads
> > > Not sure if I missed something, but it looks the current
> > > virtnet_set_features() already supports toggling on/off GRO HW through
> > > commit a02e8964eaf9271a8a5fcc0c55bd13f933bafc56 (formerly misnamed as LRO).
> > > Sorry, I realized I had a typo in email: "virtnet_set_guest_offloads() or
> > > the caller doesn't unset big_packet ...".
> > "we" here is the device, not the driver.
> What is the ethtool API from device level? VIRTIO_NET_F_CTRL_GUEST_OFFLOADS
> for sure, right?

Yes.

> It's implemented in software backend as far as I know. I see no reason
> *technically* this is infeasible, regardless what you name it, being a bug
> or TODO.
> 
> -Siwei

It's feasible but it's more work. Whether we bother depends in
particular on whether anyone cares.

> > 
> > > > 2) big mode XDP is not enabled
> > > Currently it is not. Not a single patch nor this patch, but the context for
> > > the eventual goal is to allow XDP on a MTU=9000 link when guest users
> > > intentionally lower down MTU to 1500.
> > > 
> > > Regards,
> > > -Siwei
> > > > So that code works only for XDP but we forbid big packets in the case
> > > > of XDP right now.
> > > > 
> > > > Thanks
> > > > 
> > > > > ACK. The two calls virtnet_set_guest_offloads and
> > > > > virtnet_set_guest_offloads is also called by virtnet_set_features. Do
> > > > > you think if I can do this in virtnet_set_guest_offloads?
> > > > > 
> > > > > I think that it should be fine, though you may want to deal with the XDP
> > > > > path not to regress it.
> > > > > 
> > > > > -Siwei
> > > > > 
> > > > > 
> > > > > 
> > > > > Thanks,
> > > > > -Siwei
> > > > > 
> > > > > +     }
> > > > > 
> > > > >         if (virtio_has_feature(vdev, VIRTIO_NET_F_MRG_RXBUF))
> > > > >                 vi->mergeable_rx_bufs = true;
> > > > > 
> > > > > 
> > > > > 

_______________________________________________
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization

^ permalink raw reply	[flat|nested] 102+ messages in thread

* Re: [virtio-dev] [PATCH] virtio-net: use mtu size as buffer length for big packets
@ 2022-08-09 22:27                   ` Michael S. Tsirkin
  0 siblings, 0 replies; 102+ messages in thread
From: Michael S. Tsirkin @ 2022-08-09 22:27 UTC (permalink / raw)
  To: Si-Wei Liu
  Cc: Jason Wang, Gavin Li, Hemminger, Stephen, davem, virtualization,
	Virtio-Dev, jesse.brandeburg, alexander.h.duyck, kubakici,
	sridhar.samudrala, loseweigh, Parav Pandit, gavi

On Tue, Aug 09, 2022 at 02:39:49PM -0700, Si-Wei Liu wrote:
> 
> 
> On 8/9/2022 2:34 PM, Michael S. Tsirkin wrote:
> > On Tue, Aug 09, 2022 at 11:38:52AM -0700, Si-Wei Liu wrote:
> > > 
> > > On 8/9/2022 12:44 AM, Jason Wang wrote:
> > > > On Tue, Aug 9, 2022 at 3:07 PM Gavin Li <gavinl@nvidia.com> wrote:
> > > > > On 8/9/2022 7:56 AM, Si-Wei Liu wrote:
> > > > > 
> > > > > External email: Use caution opening links or attachments
> > > > > 
> > > > > 
> > > > > On 8/8/2022 12:31 AM, Gavin Li wrote:
> > > > > 
> > > > > 
> > > > > On 8/6/2022 6:11 AM, Si-Wei Liu wrote:
> > > > > 
> > > > > External email: Use caution opening links or attachments
> > > > > 
> > > > > 
> > > > > On 8/1/2022 9:45 PM, Gavin Li wrote:
> > > > > 
> > > > > Currently add_recvbuf_big() allocates MAX_SKB_FRAGS segments for big
> > > > > packets even when GUEST_* offloads are not present on the device.
> > > > > However, if GSO is not supported,
> > > > > 
> > > > > GUEST GSO (virtio term), or GRO HW (netdev core term) it should have
> > > > > been be called.
> > > > > 
> > > > > ACK
> > > > > 
> > > > > 
> > > > >     it would be sufficient to allocate
> > > > > segments to cover just up the MTU size and no further. Allocating the
> > > > > maximum amount of segments results in a large waste of buffer space in
> > > > > the queue, which limits the number of packets that can be buffered and
> > > > > can result in reduced performance.
> > > > > 
> > > > > Therefore, if GSO is not supported,
> > > > > 
> > > > > Ditto.
> > > > > 
> > > > > ACK
> > > > > 
> > > > > 
> > > > > use the MTU to calculate the
> > > > > optimal amount of segments required.
> > > > > 
> > > > > Below is the iperf TCP test results over a Mellanox NIC, using vDPA for
> > > > > 1 VQ, queue size 1024, before and after the change, with the iperf
> > > > > server running over the virtio-net interface.
> > > > > 
> > > > > MTU(Bytes)/Bandwidth (Gbit/s)
> > > > >                 Before   After
> > > > >      1500        22.5     22.4
> > > > >      9000        12.8     25.9
> > > > > 
> > > > > Signed-off-by: Gavin Li <gavinl@nvidia.com>
> > > > > Reviewed-by: Gavi Teitz <gavi@nvidia.com>
> > > > > Reviewed-by: Parav Pandit <parav@nvidia.com>
> > > > > ---
> > > > >     drivers/net/virtio_net.c | 20 ++++++++++++++++----
> > > > >     1 file changed, 16 insertions(+), 4 deletions(-)
> > > > > 
> > > > > diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
> > > > > index ec8e1b3108c3..d36918c1809d 100644
> > > > > --- a/drivers/net/virtio_net.c
> > > > > +++ b/drivers/net/virtio_net.c
> > > > > @@ -222,6 +222,9 @@ struct virtnet_info {
> > > > >         /* I like... big packets and I cannot lie! */
> > > > >         bool big_packets;
> > > > > 
> > > > > +     /* Indicates GSO support */
> > > > > +     bool gso_is_supported;
> > > > > +
> > > > >         /* Host will merge rx buffers for big packets (shake it! shake
> > > > > it!) */
> > > > >         bool mergeable_rx_bufs;
> > > > > 
> > > > > @@ -1312,14 +1315,21 @@ static int add_recvbuf_small(struct
> > > > > virtnet_info *vi, struct receive_queue *rq,
> > > > >     static int add_recvbuf_big(struct virtnet_info *vi, struct
> > > > > receive_queue *rq,
> > > > >                            gfp_t gfp)
> > > > >     {
> > > > > +     unsigned int sg_num = MAX_SKB_FRAGS;
> > > > >         struct page *first, *list = NULL;
> > > > >         char *p;
> > > > >         int i, err, offset;
> > > > > 
> > > > > -     sg_init_table(rq->sg, MAX_SKB_FRAGS + 2);
> > > > > +     if (!vi->gso_is_supported) {
> > > > > +             unsigned int mtu = vi->dev->mtu;
> > > > > +
> > > > > +             sg_num = (mtu % PAGE_SIZE) ? mtu / PAGE_SIZE + 1 : mtu
> > > > > / PAGE_SIZE;
> > > > > 
> > > > > DIV_ROUND_UP() can be used?
> > > > > 
> > > > > ACK
> > > > > 
> > > > > 
> > > > > Since this branch slightly adds up cost to the datapath, I wonder if
> > > > > this sg_num can be saved and set only once (generally in virtnet_probe
> > > > > time) in struct virtnet_info?
> > > > > 
> > > > > Not sure how to do it and align it with align with new mtu during
> > > > > .ndo_change_mtu()---as you mentioned in the following mail. Any idea?
> > > > > ndo_change_mtu might be in vendor specific code and unmanageable. In
> > > > > my case, the mtu can only be changed in the xml of the guest vm.
> > > > > 
> > > > > Nope, for e.g. "ip link dev eth0 set mtu 1500" can be done from guest on
> > > > > a virtio-net device with 9000 MTU (as defined in guest xml). Basically
> > > > > guest user can set MTU to any valid value lower than the original
> > > > > HOST_MTU. In the vendor defined .ndo_change_mtu() op, dev_validate_mtu()
> > > > > should have validated the MTU value before coming down to it. And I
> > > > > suspect you might want to do virtnet_close() and virtnet_open()
> > > > > before/after changing the buffer size on the fly (the netif_running()
> > > > > case), implementing .ndo_change_mtu() will be needed anyway.
> > > > > 
> > > > > a guest VM driver changing mtu to smaller one is valid use case. However, current optimization suggested in the patch doesn't degrade any performance. Performing close() and open() sequence is good idea, that I would like to take up next after this patch as its going to be more than one patch to achieve it.
> > > > Right, it could be done on top.
> > > > 
> > > > But another note is that, it would still be better to support GUEST GSO feature:
> > > > 
> > > > 1) can work for the case for path MTU
> > > > 2) (migration)compatibility with software backends
> > > > 
> > > > > +     }
> > > > > +
> > > > > +     sg_init_table(rq->sg, sg_num + 2);
> > > > > 
> > > > >         /* page in rq->sg[MAX_SKB_FRAGS + 1] is list tail */
> > > > > 
> > > > > Comment doesn't match code.
> > > > > 
> > > > > ACK
> > > > > 
> > > > > -     for (i = MAX_SKB_FRAGS + 1; i > 1; --i) {
> > > > > +     for (i = sg_num + 1; i > 1; --i) {
> > > > >                 first = get_a_page(rq, gfp);
> > > > >                 if (!first) {
> > > > >                         if (list)
> > > > > @@ -1350,7 +1360,7 @@ static int add_recvbuf_big(struct virtnet_info
> > > > > *vi, struct receive_queue *rq,
> > > > > 
> > > > >         /* chain first in list head */
> > > > >         first->private = (unsigned long)list;
> > > > > -     err = virtqueue_add_inbuf(rq->vq, rq->sg, MAX_SKB_FRAGS + 2,
> > > > > +     err = virtqueue_add_inbuf(rq->vq, rq->sg, sg_num + 2,
> > > > >                                   first, gfp);
> > > > >         if (err < 0)
> > > > >                 give_pages(rq, first);
> > > > > @@ -3571,8 +3581,10 @@ static int virtnet_probe(struct virtio_device
> > > > > *vdev)
> > > > >         if (virtio_has_feature(vdev, VIRTIO_NET_F_GUEST_TSO4) ||
> > > > >             virtio_has_feature(vdev, VIRTIO_NET_F_GUEST_TSO6) ||
> > > > >             virtio_has_feature(vdev, VIRTIO_NET_F_GUEST_ECN) ||
> > > > > -         virtio_has_feature(vdev, VIRTIO_NET_F_GUEST_UFO))
> > > > > +         virtio_has_feature(vdev, VIRTIO_NET_F_GUEST_UFO)) {
> > > > >                 vi->big_packets = true;
> > > > > +             vi->gso_is_supported = true;
> > > > > 
> > > > > Please do the same for virtnet_clear_guest_offloads(), and
> > > > > correspondingly virtnet_restore_guest_offloads() as well. Not sure why
> > > > > virtnet_clear_guest_offloads() or the caller doesn't unset big_packet on
> > > > > successful return, seems like a bug to me.
> > > > It is fine as long as
> > > > 
> > > > 1) we don't implement ethtool API for changing guest offloads
> > > Not sure if I missed something, but it looks the current
> > > virtnet_set_features() already supports toggling on/off GRO HW through
> > > commit a02e8964eaf9271a8a5fcc0c55bd13f933bafc56 (formerly misnamed as LRO).
> > > Sorry, I realized I had a typo in email: "virtnet_set_guest_offloads() or
> > > the caller doesn't unset big_packet ...".
> > "we" here is the device, not the driver.
> What is the ethtool API from device level? VIRTIO_NET_F_CTRL_GUEST_OFFLOADS
> for sure, right?

Yes.

> It's implemented in software backend as far as I know. I see no reason
> *technically* this is infeasible, regardless what you name it, being a bug
> or TODO.
> 
> -Siwei

It's feasible but it's more work. Whether we bother depends in
particular on whether anyone cares.

> > 
> > > > 2) big mode XDP is not enabled
> > > Currently it is not. Not a single patch nor this patch, but the context for
> > > the eventual goal is to allow XDP on a MTU=9000 link when guest users
> > > intentionally lower down MTU to 1500.
> > > 
> > > Regards,
> > > -Siwei
> > > > So that code works only for XDP but we forbid big packets in the case
> > > > of XDP right now.
> > > > 
> > > > Thanks
> > > > 
> > > > > ACK. The two calls virtnet_set_guest_offloads and
> > > > > virtnet_set_guest_offloads is also called by virtnet_set_features. Do
> > > > > you think if I can do this in virtnet_set_guest_offloads?
> > > > > 
> > > > > I think that it should be fine, though you may want to deal with the XDP
> > > > > path not to regress it.
> > > > > 
> > > > > -Siwei
> > > > > 
> > > > > 
> > > > > 
> > > > > Thanks,
> > > > > -Siwei
> > > > > 
> > > > > +     }
> > > > > 
> > > > >         if (virtio_has_feature(vdev, VIRTIO_NET_F_MRG_RXBUF))
> > > > >                 vi->mergeable_rx_bufs = true;
> > > > > 
> > > > > 
> > > > > 


---------------------------------------------------------------------
To unsubscribe, e-mail: virtio-dev-unsubscribe@lists.oasis-open.org
For additional commands, e-mail: virtio-dev-help@lists.oasis-open.org


^ permalink raw reply	[flat|nested] 102+ messages in thread

* Re: [virtio-dev] [PATCH] virtio-net: use mtu size as buffer length for big packets
  2022-08-09 21:37                     ` Michael S. Tsirkin
@ 2022-08-09 22:32                       ` Si-Wei Liu
  -1 siblings, 0 replies; 102+ messages in thread
From: Si-Wei Liu @ 2022-08-09 22:32 UTC (permalink / raw)
  To: Michael S. Tsirkin, Parav Pandit
  Cc: alexander.h.duyck, Virtio-Dev, kubakici, sridhar.samudrala,
	jesse.brandeburg, Gavi Teitz, virtualization, Hemminger, Stephen,
	loseweigh, davem, Gavin Li



On 8/9/2022 2:37 PM, Michael S. Tsirkin wrote:
> On Tue, Aug 09, 2022 at 07:18:30PM +0000, Parav Pandit wrote:
>>> From: Si-Wei Liu <si-wei.liu@oracle.com>
>>> Sent: Tuesday, August 9, 2022 3:09 PM
>>>>> From: Si-Wei Liu <si-wei.liu@oracle.com>
>>>>> Sent: Tuesday, August 9, 2022 2:39 PM Currently it is not. Not a
>>>>> single patch nor this patch, but the context for the eventual goal is
>>>>> to allow XDP on a MTU=9000 link when guest users intentionally lower
>>>>> down MTU to 1500.
>>>> Which application benefit by having asymmetry by lowering mtu to 1500
>>> to send packets but want to receive 9K packets?
>> Below details doesn’t answer the question of asymmetry. :)
>>
>>> I think virtio-net driver doesn't differentiate MTU and MRU, in which case
>>> the receive buffer will be reduced to fit the 1500B payload size when mtu is
>>> lowered down to 1500 from 9000.
>> How? Driver reduced the mXu to 1500, say it is improved to post buffers of 1500 bytes.
>>
>> Device doesn't know about it because mtu in config space is RO field.
>> Device keep dropping 9K packets because buffers posted are 1500 bytes.
>> This is because device follows the spec " The device MUST NOT pass received packets that exceed mtu".
>
> The "mtu" here is the device config field, which is
>
>          /* Default maximum transmit unit advice */
>
> there is no guarantee device will not get a bigger packet.
> And there is no guarantee such a packet will be dropped
> as opposed to wedging the device if userspace insists on
> adding smaller buffers.
It'd be nice to document this requirement or statement to the spec for 
clarity purpose. Otherwise various device implementations are hard to 
follow. The capture is that vhost-net drops bigger packets while the 
driver only supplied smaller buffers. This is the status quo, and users 
seemingly have relied on this behavior for some while.

-Siwei
>
>
>> So, I am lost what virtio net device user application is trying to achieve by sending smaller packets and dropping all receive packets.
>> (it doesn’t have any relation to mergeable or otherwise).
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: virtio-dev-unsubscribe@lists.oasis-open.org
> For additional commands, e-mail: virtio-dev-help@lists.oasis-open.org
>

_______________________________________________
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization

^ permalink raw reply	[flat|nested] 102+ messages in thread

* Re: [virtio-dev] [PATCH] virtio-net: use mtu size as buffer length for big packets
@ 2022-08-09 22:32                       ` Si-Wei Liu
  0 siblings, 0 replies; 102+ messages in thread
From: Si-Wei Liu @ 2022-08-09 22:32 UTC (permalink / raw)
  To: Michael S. Tsirkin, Parav Pandit
  Cc: Jason Wang, Gavin Li, Hemminger, Stephen, davem, virtualization,
	Virtio-Dev, jesse.brandeburg, alexander.h.duyck, kubakici,
	sridhar.samudrala, loseweigh, Gavi Teitz



On 8/9/2022 2:37 PM, Michael S. Tsirkin wrote:
> On Tue, Aug 09, 2022 at 07:18:30PM +0000, Parav Pandit wrote:
>>> From: Si-Wei Liu <si-wei.liu@oracle.com>
>>> Sent: Tuesday, August 9, 2022 3:09 PM
>>>>> From: Si-Wei Liu <si-wei.liu@oracle.com>
>>>>> Sent: Tuesday, August 9, 2022 2:39 PM Currently it is not. Not a
>>>>> single patch nor this patch, but the context for the eventual goal is
>>>>> to allow XDP on a MTU=9000 link when guest users intentionally lower
>>>>> down MTU to 1500.
>>>> Which application benefit by having asymmetry by lowering mtu to 1500
>>> to send packets but want to receive 9K packets?
>> Below details doesn’t answer the question of asymmetry. :)
>>
>>> I think virtio-net driver doesn't differentiate MTU and MRU, in which case
>>> the receive buffer will be reduced to fit the 1500B payload size when mtu is
>>> lowered down to 1500 from 9000.
>> How? Driver reduced the mXu to 1500, say it is improved to post buffers of 1500 bytes.
>>
>> Device doesn't know about it because mtu in config space is RO field.
>> Device keep dropping 9K packets because buffers posted are 1500 bytes.
>> This is because device follows the spec " The device MUST NOT pass received packets that exceed mtu".
>
> The "mtu" here is the device config field, which is
>
>          /* Default maximum transmit unit advice */
>
> there is no guarantee device will not get a bigger packet.
> And there is no guarantee such a packet will be dropped
> as opposed to wedging the device if userspace insists on
> adding smaller buffers.
It'd be nice to document this requirement or statement to the spec for 
clarity purpose. Otherwise various device implementations are hard to 
follow. The capture is that vhost-net drops bigger packets while the 
driver only supplied smaller buffers. This is the status quo, and users 
seemingly have relied on this behavior for some while.

-Siwei
>
>
>> So, I am lost what virtio net device user application is trying to achieve by sending smaller packets and dropping all receive packets.
>> (it doesn’t have any relation to mergeable or otherwise).
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: virtio-dev-unsubscribe@lists.oasis-open.org
> For additional commands, e-mail: virtio-dev-help@lists.oasis-open.org
>


---------------------------------------------------------------------
To unsubscribe, e-mail: virtio-dev-unsubscribe@lists.oasis-open.org
For additional commands, e-mail: virtio-dev-help@lists.oasis-open.org


^ permalink raw reply	[flat|nested] 102+ messages in thread

* Re: [virtio-dev] [PATCH] virtio-net: use mtu size as buffer length for big packets
  2022-08-09 22:32                       ` Si-Wei Liu
@ 2022-08-09 22:37                         ` Michael S. Tsirkin
  -1 siblings, 0 replies; 102+ messages in thread
From: Michael S. Tsirkin @ 2022-08-09 22:37 UTC (permalink / raw)
  To: Si-Wei Liu
  Cc: alexander.h.duyck, Virtio-Dev, kubakici, sridhar.samudrala,
	jesse.brandeburg, Gavi Teitz, virtualization, Hemminger, Stephen,
	loseweigh, davem, Gavin Li

On Tue, Aug 09, 2022 at 03:32:26PM -0700, Si-Wei Liu wrote:
> 
> 
> On 8/9/2022 2:37 PM, Michael S. Tsirkin wrote:
> > On Tue, Aug 09, 2022 at 07:18:30PM +0000, Parav Pandit wrote:
> > > > From: Si-Wei Liu <si-wei.liu@oracle.com>
> > > > Sent: Tuesday, August 9, 2022 3:09 PM
> > > > > > From: Si-Wei Liu <si-wei.liu@oracle.com>
> > > > > > Sent: Tuesday, August 9, 2022 2:39 PM Currently it is not. Not a
> > > > > > single patch nor this patch, but the context for the eventual goal is
> > > > > > to allow XDP on a MTU=9000 link when guest users intentionally lower
> > > > > > down MTU to 1500.
> > > > > Which application benefit by having asymmetry by lowering mtu to 1500
> > > > to send packets but want to receive 9K packets?
> > > Below details doesn’t answer the question of asymmetry. :)
> > > 
> > > > I think virtio-net driver doesn't differentiate MTU and MRU, in which case
> > > > the receive buffer will be reduced to fit the 1500B payload size when mtu is
> > > > lowered down to 1500 from 9000.
> > > How? Driver reduced the mXu to 1500, say it is improved to post buffers of 1500 bytes.
> > > 
> > > Device doesn't know about it because mtu in config space is RO field.
> > > Device keep dropping 9K packets because buffers posted are 1500 bytes.
> > > This is because device follows the spec " The device MUST NOT pass received packets that exceed mtu".
> > 
> > The "mtu" here is the device config field, which is
> > 
> >          /* Default maximum transmit unit advice */
> > 
> > there is no guarantee device will not get a bigger packet.
> > And there is no guarantee such a packet will be dropped
> > as opposed to wedging the device if userspace insists on
> > adding smaller buffers.
> It'd be nice to document this requirement or statement to the spec for
> clarity purpose.

It's not a requirement, more of a bug. But it's been like this for
years.

> Otherwise various device implementations are hard to
> follow. The capture is that vhost-net drops bigger packets while the driver
> only supplied smaller buffers. This is the status quo, and users seemingly
> have relied on this behavior for some while.
> 
> -Siwei

Weird where do you see this in code? I see

                sock_len = vhost_net_rx_peek_head_len(net, sock->sk,
                                                      &busyloop_intr);
                if (!sock_len)
                        break;
                sock_len += sock_hlen;
                vhost_len = sock_len + vhost_hlen;
                headcount = get_rx_bufs(vq, vq->heads + nvq->done_idx,
                                        vhost_len, &in, vq_log, &log,
                                        likely(mergeable) ? UIO_MAXIOV : 1);
                /* On error, stop handling until the next kick. */
                if (unlikely(headcount < 0))
                        goto out;


so it does not drop a packet, it just stops processing the queue.



> > 
> > 
> > > So, I am lost what virtio net device user application is trying to achieve by sending smaller packets and dropping all receive packets.
> > > (it doesn’t have any relation to mergeable or otherwise).
> > 
> > ---------------------------------------------------------------------
> > To unsubscribe, e-mail: virtio-dev-unsubscribe@lists.oasis-open.org
> > For additional commands, e-mail: virtio-dev-help@lists.oasis-open.org
> > 

_______________________________________________
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization

^ permalink raw reply	[flat|nested] 102+ messages in thread

* Re: [virtio-dev] [PATCH] virtio-net: use mtu size as buffer length for big packets
@ 2022-08-09 22:37                         ` Michael S. Tsirkin
  0 siblings, 0 replies; 102+ messages in thread
From: Michael S. Tsirkin @ 2022-08-09 22:37 UTC (permalink / raw)
  To: Si-Wei Liu
  Cc: Parav Pandit, Jason Wang, Gavin Li, Hemminger, Stephen, davem,
	virtualization, Virtio-Dev, jesse.brandeburg, alexander.h.duyck,
	kubakici, sridhar.samudrala, loseweigh, Gavi Teitz

On Tue, Aug 09, 2022 at 03:32:26PM -0700, Si-Wei Liu wrote:
> 
> 
> On 8/9/2022 2:37 PM, Michael S. Tsirkin wrote:
> > On Tue, Aug 09, 2022 at 07:18:30PM +0000, Parav Pandit wrote:
> > > > From: Si-Wei Liu <si-wei.liu@oracle.com>
> > > > Sent: Tuesday, August 9, 2022 3:09 PM
> > > > > > From: Si-Wei Liu <si-wei.liu@oracle.com>
> > > > > > Sent: Tuesday, August 9, 2022 2:39 PM Currently it is not. Not a
> > > > > > single patch nor this patch, but the context for the eventual goal is
> > > > > > to allow XDP on a MTU=9000 link when guest users intentionally lower
> > > > > > down MTU to 1500.
> > > > > Which application benefit by having asymmetry by lowering mtu to 1500
> > > > to send packets but want to receive 9K packets?
> > > Below details doesn’t answer the question of asymmetry. :)
> > > 
> > > > I think virtio-net driver doesn't differentiate MTU and MRU, in which case
> > > > the receive buffer will be reduced to fit the 1500B payload size when mtu is
> > > > lowered down to 1500 from 9000.
> > > How? Driver reduced the mXu to 1500, say it is improved to post buffers of 1500 bytes.
> > > 
> > > Device doesn't know about it because mtu in config space is RO field.
> > > Device keep dropping 9K packets because buffers posted are 1500 bytes.
> > > This is because device follows the spec " The device MUST NOT pass received packets that exceed mtu".
> > 
> > The "mtu" here is the device config field, which is
> > 
> >          /* Default maximum transmit unit advice */
> > 
> > there is no guarantee device will not get a bigger packet.
> > And there is no guarantee such a packet will be dropped
> > as opposed to wedging the device if userspace insists on
> > adding smaller buffers.
> It'd be nice to document this requirement or statement to the spec for
> clarity purpose.

It's not a requirement, more of a bug. But it's been like this for
years.

> Otherwise various device implementations are hard to
> follow. The capture is that vhost-net drops bigger packets while the driver
> only supplied smaller buffers. This is the status quo, and users seemingly
> have relied on this behavior for some while.
> 
> -Siwei

Weird where do you see this in code? I see

                sock_len = vhost_net_rx_peek_head_len(net, sock->sk,
                                                      &busyloop_intr);
                if (!sock_len)
                        break;
                sock_len += sock_hlen;
                vhost_len = sock_len + vhost_hlen;
                headcount = get_rx_bufs(vq, vq->heads + nvq->done_idx,
                                        vhost_len, &in, vq_log, &log,
                                        likely(mergeable) ? UIO_MAXIOV : 1);
                /* On error, stop handling until the next kick. */
                if (unlikely(headcount < 0))
                        goto out;


so it does not drop a packet, it just stops processing the queue.



> > 
> > 
> > > So, I am lost what virtio net device user application is trying to achieve by sending smaller packets and dropping all receive packets.
> > > (it doesn’t have any relation to mergeable or otherwise).
> > 
> > ---------------------------------------------------------------------
> > To unsubscribe, e-mail: virtio-dev-unsubscribe@lists.oasis-open.org
> > For additional commands, e-mail: virtio-dev-help@lists.oasis-open.org
> > 


---------------------------------------------------------------------
To unsubscribe, e-mail: virtio-dev-unsubscribe@lists.oasis-open.org
For additional commands, e-mail: virtio-dev-help@lists.oasis-open.org


^ permalink raw reply	[flat|nested] 102+ messages in thread

* RE: [virtio-dev] [PATCH] virtio-net: use mtu size as buffer length for big packets
  2022-08-09 22:25                         ` Michael S. Tsirkin
@ 2022-08-09 22:49                           ` Parav Pandit
  -1 siblings, 0 replies; 102+ messages in thread
From: Parav Pandit via Virtualization @ 2022-08-09 22:49 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: alexander.h.duyck, Virtio-Dev, kubakici, sridhar.samudrala,
	jesse.brandeburg, Gavi Teitz, virtualization, Hemminger, Stephen,
	loseweigh, davem, Gavin Li

> From: Michael S. Tsirkin <mst@redhat.com>
> Sent: Tuesday, August 9, 2022 6:26 PM
> To: Parav Pandit <parav@nvidia.com>
> Cc: Si-Wei Liu <si-wei.liu@oracle.com>; Jason Wang
> <jasowang@redhat.com>; Gavin Li <gavinl@nvidia.com>; Hemminger,
> Stephen <stephen@networkplumber.org>; davem
> <davem@davemloft.net>; virtualization <virtualization@lists.linux-
> foundation.org>; Virtio-Dev <virtio-dev@lists.oasis-open.org>;
> jesse.brandeburg@intel.com; alexander.h.duyck@intel.com;
> kubakici@wp.pl; sridhar.samudrala@intel.com; loseweigh@gmail.com; Gavi
> Teitz <gavi@nvidia.com>
> Subject: Re: [virtio-dev] [PATCH] virtio-net: use mtu size as buffer length for
> big packets
> 
> On Tue, Aug 09, 2022 at 09:49:03PM +0000, Parav Pandit wrote:
> > > From: Michael S. Tsirkin <mst@redhat.com>
> > > Sent: Tuesday, August 9, 2022 5:38 PM
> >
> > [..]
> > > > > I think virtio-net driver doesn't differentiate MTU and MRU, in
> > > > > which case the receive buffer will be reduced to fit the 1500B
> > > > > payload size when mtu is lowered down to 1500 from 9000.
> > > > How? Driver reduced the mXu to 1500, say it is improved to post
> > > > buffers of
> > > 1500 bytes.
> > > >
> > > > Device doesn't know about it because mtu in config space is RO field.
> > > > Device keep dropping 9K packets because buffers posted are 1500
> bytes.
> > > > This is because device follows the spec " The device MUST NOT pass
> > > received packets that exceed mtu".
> > >
> > >
> > > The "mtu" here is the device config field, which is
> > >
> > >         /* Default maximum transmit unit advice */
> > >
> >
> > It is the field from struct virtio_net_config.mtu. right?
> > This is RO field for driver.
> >
> > > there is no guarantee device will not get a bigger packet.
> > Right. That is what I also hinted.
> > Hence, allocating buffers worth upto mtu is safer.
> 
> yes
> 
> > When user overrides it, driver can be further optimized to honor such new
> value on rx buffer posting.
> 
> no, not without a feature bit promising device won't get wedged.
> 
I mean to say as_it_stands today, driver can decide to post smaller buffers with larger mtu.
Why device should be affected with it?
( I am not proposing such weird configuration but asking for sake of correctness).

> > > And there is no guarantee such a packet will be dropped as opposed
> > > to wedging the device if userspace insists on adding smaller buffers.
> > >
> > If user space insists on small buffers, so be it.
> 
> If previously things worked, the "so be it" is a regression and blaming users
> won't help us.
> 
I am not suggesting above.
This was Si-Wei's suggestion that somehow driver wants to post smaller buffers than the mtu because user knows what peer is doing.
So may be driver can be extended to give more weight on user config.

> > It only works when user exactly know what user is doing in the whole
> network.
> 
> If you want to claim this you need a new feature bit.
> 
Why is a new bit needed to tell device?
User is doing something its own config mismatching the buffers and mtu.
A solid use case hasn't emerged for this yet.

If user wants to modify the mtu, we should just make virtio_net_config.mtu as RW field using new feature bit.
Is that what you mean?
If so, yes, it makes things very neat where driver and device are aligned to each other, the way they are today.
Only limitation is that its one-way. = device tells to driver.

> > When user prefers to override the device RO field, device is in the dark and
> things work on best effort basis.
> 
> Dropping packets is best effort. Getting stuck forever isn't, that's a quality of
> implementation issue.
>
Not sure, why things get stuck for ever. Maybe you have example to explain.
I am mostly missing something.
 
> > This must be a reasonably advance user who has good knowledge of its
> network topology etc.
> >
> > For such case, may be yes, driver should be further optimized.
> >

_______________________________________________
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization

^ permalink raw reply	[flat|nested] 102+ messages in thread

* RE: [virtio-dev] [PATCH] virtio-net: use mtu size as buffer length for big packets
@ 2022-08-09 22:49                           ` Parav Pandit
  0 siblings, 0 replies; 102+ messages in thread
From: Parav Pandit @ 2022-08-09 22:49 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: Si-Wei Liu, Jason Wang, Gavin Li, Hemminger, Stephen, davem,
	virtualization, Virtio-Dev, jesse.brandeburg, alexander.h.duyck,
	kubakici, sridhar.samudrala, loseweigh, Gavi Teitz

> From: Michael S. Tsirkin <mst@redhat.com>
> Sent: Tuesday, August 9, 2022 6:26 PM
> To: Parav Pandit <parav@nvidia.com>
> Cc: Si-Wei Liu <si-wei.liu@oracle.com>; Jason Wang
> <jasowang@redhat.com>; Gavin Li <gavinl@nvidia.com>; Hemminger,
> Stephen <stephen@networkplumber.org>; davem
> <davem@davemloft.net>; virtualization <virtualization@lists.linux-
> foundation.org>; Virtio-Dev <virtio-dev@lists.oasis-open.org>;
> jesse.brandeburg@intel.com; alexander.h.duyck@intel.com;
> kubakici@wp.pl; sridhar.samudrala@intel.com; loseweigh@gmail.com; Gavi
> Teitz <gavi@nvidia.com>
> Subject: Re: [virtio-dev] [PATCH] virtio-net: use mtu size as buffer length for
> big packets
> 
> On Tue, Aug 09, 2022 at 09:49:03PM +0000, Parav Pandit wrote:
> > > From: Michael S. Tsirkin <mst@redhat.com>
> > > Sent: Tuesday, August 9, 2022 5:38 PM
> >
> > [..]
> > > > > I think virtio-net driver doesn't differentiate MTU and MRU, in
> > > > > which case the receive buffer will be reduced to fit the 1500B
> > > > > payload size when mtu is lowered down to 1500 from 9000.
> > > > How? Driver reduced the mXu to 1500, say it is improved to post
> > > > buffers of
> > > 1500 bytes.
> > > >
> > > > Device doesn't know about it because mtu in config space is RO field.
> > > > Device keep dropping 9K packets because buffers posted are 1500
> bytes.
> > > > This is because device follows the spec " The device MUST NOT pass
> > > received packets that exceed mtu".
> > >
> > >
> > > The "mtu" here is the device config field, which is
> > >
> > >         /* Default maximum transmit unit advice */
> > >
> >
> > It is the field from struct virtio_net_config.mtu. right?
> > This is RO field for driver.
> >
> > > there is no guarantee device will not get a bigger packet.
> > Right. That is what I also hinted.
> > Hence, allocating buffers worth upto mtu is safer.
> 
> yes
> 
> > When user overrides it, driver can be further optimized to honor such new
> value on rx buffer posting.
> 
> no, not without a feature bit promising device won't get wedged.
> 
I mean to say as_it_stands today, driver can decide to post smaller buffers with larger mtu.
Why device should be affected with it?
( I am not proposing such weird configuration but asking for sake of correctness).

> > > And there is no guarantee such a packet will be dropped as opposed
> > > to wedging the device if userspace insists on adding smaller buffers.
> > >
> > If user space insists on small buffers, so be it.
> 
> If previously things worked, the "so be it" is a regression and blaming users
> won't help us.
> 
I am not suggesting above.
This was Si-Wei's suggestion that somehow driver wants to post smaller buffers than the mtu because user knows what peer is doing.
So may be driver can be extended to give more weight on user config.

> > It only works when user exactly know what user is doing in the whole
> network.
> 
> If you want to claim this you need a new feature bit.
> 
Why is a new bit needed to tell device?
User is doing something its own config mismatching the buffers and mtu.
A solid use case hasn't emerged for this yet.

If user wants to modify the mtu, we should just make virtio_net_config.mtu as RW field using new feature bit.
Is that what you mean?
If so, yes, it makes things very neat where driver and device are aligned to each other, the way they are today.
Only limitation is that its one-way. = device tells to driver.

> > When user prefers to override the device RO field, device is in the dark and
> things work on best effort basis.
> 
> Dropping packets is best effort. Getting stuck forever isn't, that's a quality of
> implementation issue.
>
Not sure, why things get stuck for ever. Maybe you have example to explain.
I am mostly missing something.
 
> > This must be a reasonably advance user who has good knowledge of its
> network topology etc.
> >
> > For such case, may be yes, driver should be further optimized.
> >


---------------------------------------------------------------------
To unsubscribe, e-mail: virtio-dev-unsubscribe@lists.oasis-open.org
For additional commands, e-mail: virtio-dev-help@lists.oasis-open.org


^ permalink raw reply	[flat|nested] 102+ messages in thread

* Re: [virtio-dev] [PATCH] virtio-net: use mtu size as buffer length for big packets
  2022-08-09 22:37                         ` Michael S. Tsirkin
@ 2022-08-09 22:54                           ` Si-Wei Liu
  -1 siblings, 0 replies; 102+ messages in thread
From: Si-Wei Liu @ 2022-08-09 22:54 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: alexander.h.duyck, Virtio-Dev, kubakici, sridhar.samudrala,
	jesse.brandeburg, Gavi Teitz, virtualization, Hemminger, Stephen,
	loseweigh, davem, Gavin Li



On 8/9/2022 3:37 PM, Michael S. Tsirkin wrote:
> On Tue, Aug 09, 2022 at 03:32:26PM -0700, Si-Wei Liu wrote:
>>
>> On 8/9/2022 2:37 PM, Michael S. Tsirkin wrote:
>>> On Tue, Aug 09, 2022 at 07:18:30PM +0000, Parav Pandit wrote:
>>>>> From: Si-Wei Liu <si-wei.liu@oracle.com>
>>>>> Sent: Tuesday, August 9, 2022 3:09 PM
>>>>>>> From: Si-Wei Liu <si-wei.liu@oracle.com>
>>>>>>> Sent: Tuesday, August 9, 2022 2:39 PM Currently it is not. Not a
>>>>>>> single patch nor this patch, but the context for the eventual goal is
>>>>>>> to allow XDP on a MTU=9000 link when guest users intentionally lower
>>>>>>> down MTU to 1500.
>>>>>> Which application benefit by having asymmetry by lowering mtu to 1500
>>>>> to send packets but want to receive 9K packets?
>>>> Below details doesn’t answer the question of asymmetry. :)
>>>>
>>>>> I think virtio-net driver doesn't differentiate MTU and MRU, in which case
>>>>> the receive buffer will be reduced to fit the 1500B payload size when mtu is
>>>>> lowered down to 1500 from 9000.
>>>> How? Driver reduced the mXu to 1500, say it is improved to post buffers of 1500 bytes.
>>>>
>>>> Device doesn't know about it because mtu in config space is RO field.
>>>> Device keep dropping 9K packets because buffers posted are 1500 bytes.
>>>> This is because device follows the spec " The device MUST NOT pass received packets that exceed mtu".
>>> The "mtu" here is the device config field, which is
>>>
>>>           /* Default maximum transmit unit advice */
>>>
>>> there is no guarantee device will not get a bigger packet.
>>> And there is no guarantee such a packet will be dropped
>>> as opposed to wedging the device if userspace insists on
>>> adding smaller buffers.
>> It'd be nice to document this requirement or statement to the spec for
>> clarity purpose.
> It's not a requirement, more of a bug. But it's been like this for
> years.
Well, I'm not sure how it may wedge the device if not capable of posting 
to smaller buffers, is there other option than drop? Truncate to what 
the buffer size may fit and deliver up? Seems even worse than drop...

>
>> Otherwise various device implementations are hard to
>> follow. The capture is that vhost-net drops bigger packets while the driver
>> only supplied smaller buffers. This is the status quo, and users seemingly
>> have relied on this behavior for some while.
>>
>> -Siwei
> Weird where do you see this in code? I see
>
>                  sock_len = vhost_net_rx_peek_head_len(net, sock->sk,
>                                                        &busyloop_intr);
>                  if (!sock_len)
>                          break;
>                  sock_len += sock_hlen;
>                  vhost_len = sock_len + vhost_hlen;
>                  headcount = get_rx_bufs(vq, vq->heads + nvq->done_idx,
>                                          vhost_len, &in, vq_log, &log,
>                                          likely(mergeable) ? UIO_MAXIOV : 1);
>                  /* On error, stop handling until the next kick. */
>                  if (unlikely(headcount < 0))
>                          goto out;
>
>
> so it does not drop a packet, it just stops processing the queue.
Here

                 /* On overrun, truncate and discard */
                 if (unlikely(headcount > UIO_MAXIOV)) {
                         iov_iter_init(&msg.msg_iter, READ, vq->iov, 1, 1);
                         err = sock->ops->recvmsg(sock, &msg,
                                                  1, MSG_DONTWAIT | 
MSG_TRUNC);
                         pr_debug("Discarded rx packet: len %zd\n", 
sock_len);
                         continue;
                 }

recvmsg(, , 1, ) is essentially to drop the oversized packet.


In get_rx_bufs(), overrun detection will return something larger than 
UIO_MAXIOV as indicator:

static int get_rx_bufs()
{
:
;
         /* Detect overrun */
         if (unlikely(datalen > 0)) {
                 r = UIO_MAXIOV + 1;
                 goto err;
         }
:
:


-Siwei

>
>
>>>
>>>> So, I am lost what virtio net device user application is trying to achieve by sending smaller packets and dropping all receive packets.
>>>> (it doesn’t have any relation to mergeable or otherwise).
>>> ---------------------------------------------------------------------
>>> To unsubscribe, e-mail: virtio-dev-unsubscribe@lists.oasis-open.org
>>> For additional commands, e-mail: virtio-dev-help@lists.oasis-open.org
>>>

_______________________________________________
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization

^ permalink raw reply	[flat|nested] 102+ messages in thread

* Re: [virtio-dev] [PATCH] virtio-net: use mtu size as buffer length for big packets
@ 2022-08-09 22:54                           ` Si-Wei Liu
  0 siblings, 0 replies; 102+ messages in thread
From: Si-Wei Liu @ 2022-08-09 22:54 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: Parav Pandit, Jason Wang, Gavin Li, Hemminger, Stephen, davem,
	virtualization, Virtio-Dev, jesse.brandeburg, alexander.h.duyck,
	kubakici, sridhar.samudrala, loseweigh, Gavi Teitz



On 8/9/2022 3:37 PM, Michael S. Tsirkin wrote:
> On Tue, Aug 09, 2022 at 03:32:26PM -0700, Si-Wei Liu wrote:
>>
>> On 8/9/2022 2:37 PM, Michael S. Tsirkin wrote:
>>> On Tue, Aug 09, 2022 at 07:18:30PM +0000, Parav Pandit wrote:
>>>>> From: Si-Wei Liu <si-wei.liu@oracle.com>
>>>>> Sent: Tuesday, August 9, 2022 3:09 PM
>>>>>>> From: Si-Wei Liu <si-wei.liu@oracle.com>
>>>>>>> Sent: Tuesday, August 9, 2022 2:39 PM Currently it is not. Not a
>>>>>>> single patch nor this patch, but the context for the eventual goal is
>>>>>>> to allow XDP on a MTU=9000 link when guest users intentionally lower
>>>>>>> down MTU to 1500.
>>>>>> Which application benefit by having asymmetry by lowering mtu to 1500
>>>>> to send packets but want to receive 9K packets?
>>>> Below details doesn’t answer the question of asymmetry. :)
>>>>
>>>>> I think virtio-net driver doesn't differentiate MTU and MRU, in which case
>>>>> the receive buffer will be reduced to fit the 1500B payload size when mtu is
>>>>> lowered down to 1500 from 9000.
>>>> How? Driver reduced the mXu to 1500, say it is improved to post buffers of 1500 bytes.
>>>>
>>>> Device doesn't know about it because mtu in config space is RO field.
>>>> Device keep dropping 9K packets because buffers posted are 1500 bytes.
>>>> This is because device follows the spec " The device MUST NOT pass received packets that exceed mtu".
>>> The "mtu" here is the device config field, which is
>>>
>>>           /* Default maximum transmit unit advice */
>>>
>>> there is no guarantee device will not get a bigger packet.
>>> And there is no guarantee such a packet will be dropped
>>> as opposed to wedging the device if userspace insists on
>>> adding smaller buffers.
>> It'd be nice to document this requirement or statement to the spec for
>> clarity purpose.
> It's not a requirement, more of a bug. But it's been like this for
> years.
Well, I'm not sure how it may wedge the device if not capable of posting 
to smaller buffers, is there other option than drop? Truncate to what 
the buffer size may fit and deliver up? Seems even worse than drop...

>
>> Otherwise various device implementations are hard to
>> follow. The capture is that vhost-net drops bigger packets while the driver
>> only supplied smaller buffers. This is the status quo, and users seemingly
>> have relied on this behavior for some while.
>>
>> -Siwei
> Weird where do you see this in code? I see
>
>                  sock_len = vhost_net_rx_peek_head_len(net, sock->sk,
>                                                        &busyloop_intr);
>                  if (!sock_len)
>                          break;
>                  sock_len += sock_hlen;
>                  vhost_len = sock_len + vhost_hlen;
>                  headcount = get_rx_bufs(vq, vq->heads + nvq->done_idx,
>                                          vhost_len, &in, vq_log, &log,
>                                          likely(mergeable) ? UIO_MAXIOV : 1);
>                  /* On error, stop handling until the next kick. */
>                  if (unlikely(headcount < 0))
>                          goto out;
>
>
> so it does not drop a packet, it just stops processing the queue.
Here

                 /* On overrun, truncate and discard */
                 if (unlikely(headcount > UIO_MAXIOV)) {
                         iov_iter_init(&msg.msg_iter, READ, vq->iov, 1, 1);
                         err = sock->ops->recvmsg(sock, &msg,
                                                  1, MSG_DONTWAIT | 
MSG_TRUNC);
                         pr_debug("Discarded rx packet: len %zd\n", 
sock_len);
                         continue;
                 }

recvmsg(, , 1, ) is essentially to drop the oversized packet.


In get_rx_bufs(), overrun detection will return something larger than 
UIO_MAXIOV as indicator:

static int get_rx_bufs()
{
:
;
         /* Detect overrun */
         if (unlikely(datalen > 0)) {
                 r = UIO_MAXIOV + 1;
                 goto err;
         }
:
:


-Siwei

>
>
>>>
>>>> So, I am lost what virtio net device user application is trying to achieve by sending smaller packets and dropping all receive packets.
>>>> (it doesn’t have any relation to mergeable or otherwise).
>>> ---------------------------------------------------------------------
>>> To unsubscribe, e-mail: virtio-dev-unsubscribe@lists.oasis-open.org
>>> For additional commands, e-mail: virtio-dev-help@lists.oasis-open.org
>>>


---------------------------------------------------------------------
To unsubscribe, e-mail: virtio-dev-unsubscribe@lists.oasis-open.org
For additional commands, e-mail: virtio-dev-help@lists.oasis-open.org


^ permalink raw reply	[flat|nested] 102+ messages in thread

* Re: [virtio-dev] [PATCH] virtio-net: use mtu size as buffer length for big packets
  2022-08-09 22:49                           ` Parav Pandit
@ 2022-08-09 22:59                             ` Michael S. Tsirkin
  -1 siblings, 0 replies; 102+ messages in thread
From: Michael S. Tsirkin @ 2022-08-09 22:59 UTC (permalink / raw)
  To: Parav Pandit
  Cc: alexander.h.duyck, Virtio-Dev, kubakici, sridhar.samudrala,
	jesse.brandeburg, Gavi Teitz, virtualization, Hemminger, Stephen,
	loseweigh, davem, Gavin Li

On Tue, Aug 09, 2022 at 10:49:48PM +0000, Parav Pandit wrote:
> > From: Michael S. Tsirkin <mst@redhat.com>
> > Sent: Tuesday, August 9, 2022 6:26 PM
> > To: Parav Pandit <parav@nvidia.com>
> > Cc: Si-Wei Liu <si-wei.liu@oracle.com>; Jason Wang
> > <jasowang@redhat.com>; Gavin Li <gavinl@nvidia.com>; Hemminger,
> > Stephen <stephen@networkplumber.org>; davem
> > <davem@davemloft.net>; virtualization <virtualization@lists.linux-
> > foundation.org>; Virtio-Dev <virtio-dev@lists.oasis-open.org>;
> > jesse.brandeburg@intel.com; alexander.h.duyck@intel.com;
> > kubakici@wp.pl; sridhar.samudrala@intel.com; loseweigh@gmail.com; Gavi
> > Teitz <gavi@nvidia.com>
> > Subject: Re: [virtio-dev] [PATCH] virtio-net: use mtu size as buffer length for
> > big packets
> > 
> > On Tue, Aug 09, 2022 at 09:49:03PM +0000, Parav Pandit wrote:
> > > > From: Michael S. Tsirkin <mst@redhat.com>
> > > > Sent: Tuesday, August 9, 2022 5:38 PM
> > >
> > > [..]
> > > > > > I think virtio-net driver doesn't differentiate MTU and MRU, in
> > > > > > which case the receive buffer will be reduced to fit the 1500B
> > > > > > payload size when mtu is lowered down to 1500 from 9000.
> > > > > How? Driver reduced the mXu to 1500, say it is improved to post
> > > > > buffers of
> > > > 1500 bytes.
> > > > >
> > > > > Device doesn't know about it because mtu in config space is RO field.
> > > > > Device keep dropping 9K packets because buffers posted are 1500
> > bytes.
> > > > > This is because device follows the spec " The device MUST NOT pass
> > > > received packets that exceed mtu".
> > > >
> > > >
> > > > The "mtu" here is the device config field, which is
> > > >
> > > >         /* Default maximum transmit unit advice */
> > > >
> > >
> > > It is the field from struct virtio_net_config.mtu. right?
> > > This is RO field for driver.
> > >
> > > > there is no guarantee device will not get a bigger packet.
> > > Right. That is what I also hinted.
> > > Hence, allocating buffers worth upto mtu is safer.
> > 
> > yes
> > 
> > > When user overrides it, driver can be further optimized to honor such new
> > value on rx buffer posting.
> > 
> > no, not without a feature bit promising device won't get wedged.
> > 
> I mean to say as_it_stands today, driver can decide to post smaller buffers with larger mtu.
> Why device should be affected with it?
> ( I am not proposing such weird configuration but asking for sake of correctness).

They just are because drivers did not do this.

> > > > And there is no guarantee such a packet will be dropped as opposed
> > > > to wedging the device if userspace insists on adding smaller buffers.
> > > >
> > > If user space insists on small buffers, so be it.
> > 
> > If previously things worked, the "so be it" is a regression and blaming users
> > won't help us.
> > 
> I am not suggesting above.
> This was Si-Wei's suggestion that somehow driver wants to post smaller buffers than the mtu because user knows what peer is doing.
> So may be driver can be extended to give more weight on user config.
> 
> > > It only works when user exactly know what user is doing in the whole
> > network.
> > 
> > If you want to claim this you need a new feature bit.
> > 
> Why is a new bit needed to tell device?
> User is doing something its own config mismatching the buffers and mtu.
> A solid use case hasn't emerged for this yet.
> 
> If user wants to modify the mtu, we should just make virtio_net_config.mtu as RW field using new feature bit.
> Is that what you mean?
> If so, yes, it makes things very neat where driver and device are aligned to each other, the way they are today.
> Only limitation is that its one-way. = device tells to driver.
> 
> > > When user prefers to override the device RO field, device is in the dark and
> > things work on best effort basis.
> > 
> > Dropping packets is best effort. Getting stuck forever isn't, that's a quality of
> > implementation issue.
> >
> Not sure, why things get stuck for ever. Maybe you have example to explain.
> I am mostly missing something.

I sent an explanation a bit earlier. It's more or less a bug.

> > > This must be a reasonably advance user who has good knowledge of its
> > network topology etc.
> > >
> > > For such case, may be yes, driver should be further optimized.
> > >

_______________________________________________
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization

^ permalink raw reply	[flat|nested] 102+ messages in thread

* Re: [virtio-dev] [PATCH] virtio-net: use mtu size as buffer length for big packets
@ 2022-08-09 22:59                             ` Michael S. Tsirkin
  0 siblings, 0 replies; 102+ messages in thread
From: Michael S. Tsirkin @ 2022-08-09 22:59 UTC (permalink / raw)
  To: Parav Pandit
  Cc: Si-Wei Liu, Jason Wang, Gavin Li, Hemminger, Stephen, davem,
	virtualization, Virtio-Dev, jesse.brandeburg, alexander.h.duyck,
	kubakici, sridhar.samudrala, loseweigh, Gavi Teitz

On Tue, Aug 09, 2022 at 10:49:48PM +0000, Parav Pandit wrote:
> > From: Michael S. Tsirkin <mst@redhat.com>
> > Sent: Tuesday, August 9, 2022 6:26 PM
> > To: Parav Pandit <parav@nvidia.com>
> > Cc: Si-Wei Liu <si-wei.liu@oracle.com>; Jason Wang
> > <jasowang@redhat.com>; Gavin Li <gavinl@nvidia.com>; Hemminger,
> > Stephen <stephen@networkplumber.org>; davem
> > <davem@davemloft.net>; virtualization <virtualization@lists.linux-
> > foundation.org>; Virtio-Dev <virtio-dev@lists.oasis-open.org>;
> > jesse.brandeburg@intel.com; alexander.h.duyck@intel.com;
> > kubakici@wp.pl; sridhar.samudrala@intel.com; loseweigh@gmail.com; Gavi
> > Teitz <gavi@nvidia.com>
> > Subject: Re: [virtio-dev] [PATCH] virtio-net: use mtu size as buffer length for
> > big packets
> > 
> > On Tue, Aug 09, 2022 at 09:49:03PM +0000, Parav Pandit wrote:
> > > > From: Michael S. Tsirkin <mst@redhat.com>
> > > > Sent: Tuesday, August 9, 2022 5:38 PM
> > >
> > > [..]
> > > > > > I think virtio-net driver doesn't differentiate MTU and MRU, in
> > > > > > which case the receive buffer will be reduced to fit the 1500B
> > > > > > payload size when mtu is lowered down to 1500 from 9000.
> > > > > How? Driver reduced the mXu to 1500, say it is improved to post
> > > > > buffers of
> > > > 1500 bytes.
> > > > >
> > > > > Device doesn't know about it because mtu in config space is RO field.
> > > > > Device keep dropping 9K packets because buffers posted are 1500
> > bytes.
> > > > > This is because device follows the spec " The device MUST NOT pass
> > > > received packets that exceed mtu".
> > > >
> > > >
> > > > The "mtu" here is the device config field, which is
> > > >
> > > >         /* Default maximum transmit unit advice */
> > > >
> > >
> > > It is the field from struct virtio_net_config.mtu. right?
> > > This is RO field for driver.
> > >
> > > > there is no guarantee device will not get a bigger packet.
> > > Right. That is what I also hinted.
> > > Hence, allocating buffers worth upto mtu is safer.
> > 
> > yes
> > 
> > > When user overrides it, driver can be further optimized to honor such new
> > value on rx buffer posting.
> > 
> > no, not without a feature bit promising device won't get wedged.
> > 
> I mean to say as_it_stands today, driver can decide to post smaller buffers with larger mtu.
> Why device should be affected with it?
> ( I am not proposing such weird configuration but asking for sake of correctness).

They just are because drivers did not do this.

> > > > And there is no guarantee such a packet will be dropped as opposed
> > > > to wedging the device if userspace insists on adding smaller buffers.
> > > >
> > > If user space insists on small buffers, so be it.
> > 
> > If previously things worked, the "so be it" is a regression and blaming users
> > won't help us.
> > 
> I am not suggesting above.
> This was Si-Wei's suggestion that somehow driver wants to post smaller buffers than the mtu because user knows what peer is doing.
> So may be driver can be extended to give more weight on user config.
> 
> > > It only works when user exactly know what user is doing in the whole
> > network.
> > 
> > If you want to claim this you need a new feature bit.
> > 
> Why is a new bit needed to tell device?
> User is doing something its own config mismatching the buffers and mtu.
> A solid use case hasn't emerged for this yet.
> 
> If user wants to modify the mtu, we should just make virtio_net_config.mtu as RW field using new feature bit.
> Is that what you mean?
> If so, yes, it makes things very neat where driver and device are aligned to each other, the way they are today.
> Only limitation is that its one-way. = device tells to driver.
> 
> > > When user prefers to override the device RO field, device is in the dark and
> > things work on best effort basis.
> > 
> > Dropping packets is best effort. Getting stuck forever isn't, that's a quality of
> > implementation issue.
> >
> Not sure, why things get stuck for ever. Maybe you have example to explain.
> I am mostly missing something.

I sent an explanation a bit earlier. It's more or less a bug.

> > > This must be a reasonably advance user who has good knowledge of its
> > network topology etc.
> > >
> > > For such case, may be yes, driver should be further optimized.
> > >


---------------------------------------------------------------------
To unsubscribe, e-mail: virtio-dev-unsubscribe@lists.oasis-open.org
For additional commands, e-mail: virtio-dev-help@lists.oasis-open.org


^ permalink raw reply	[flat|nested] 102+ messages in thread

* Re: [virtio-dev] [PATCH] virtio-net: use mtu size as buffer length for big packets
  2022-08-09 22:54                           ` Si-Wei Liu
@ 2022-08-09 23:03                             ` Michael S. Tsirkin
  -1 siblings, 0 replies; 102+ messages in thread
From: Michael S. Tsirkin @ 2022-08-09 23:03 UTC (permalink / raw)
  To: Si-Wei Liu
  Cc: alexander.h.duyck, Virtio-Dev, kubakici, sridhar.samudrala,
	jesse.brandeburg, Gavi Teitz, virtualization, Hemminger, Stephen,
	loseweigh, davem, Gavin Li

On Tue, Aug 09, 2022 at 03:54:57PM -0700, Si-Wei Liu wrote:
> 
> 
> On 8/9/2022 3:37 PM, Michael S. Tsirkin wrote:
> > On Tue, Aug 09, 2022 at 03:32:26PM -0700, Si-Wei Liu wrote:
> > > 
> > > On 8/9/2022 2:37 PM, Michael S. Tsirkin wrote:
> > > > On Tue, Aug 09, 2022 at 07:18:30PM +0000, Parav Pandit wrote:
> > > > > > From: Si-Wei Liu <si-wei.liu@oracle.com>
> > > > > > Sent: Tuesday, August 9, 2022 3:09 PM
> > > > > > > > From: Si-Wei Liu <si-wei.liu@oracle.com>
> > > > > > > > Sent: Tuesday, August 9, 2022 2:39 PM Currently it is not. Not a
> > > > > > > > single patch nor this patch, but the context for the eventual goal is
> > > > > > > > to allow XDP on a MTU=9000 link when guest users intentionally lower
> > > > > > > > down MTU to 1500.
> > > > > > > Which application benefit by having asymmetry by lowering mtu to 1500
> > > > > > to send packets but want to receive 9K packets?
> > > > > Below details doesn’t answer the question of asymmetry. :)
> > > > > 
> > > > > > I think virtio-net driver doesn't differentiate MTU and MRU, in which case
> > > > > > the receive buffer will be reduced to fit the 1500B payload size when mtu is
> > > > > > lowered down to 1500 from 9000.
> > > > > How? Driver reduced the mXu to 1500, say it is improved to post buffers of 1500 bytes.
> > > > > 
> > > > > Device doesn't know about it because mtu in config space is RO field.
> > > > > Device keep dropping 9K packets because buffers posted are 1500 bytes.
> > > > > This is because device follows the spec " The device MUST NOT pass received packets that exceed mtu".
> > > > The "mtu" here is the device config field, which is
> > > > 
> > > >           /* Default maximum transmit unit advice */
> > > > 
> > > > there is no guarantee device will not get a bigger packet.
> > > > And there is no guarantee such a packet will be dropped
> > > > as opposed to wedging the device if userspace insists on
> > > > adding smaller buffers.
> > > It'd be nice to document this requirement or statement to the spec for
> > > clarity purpose.
> > It's not a requirement, more of a bug. But it's been like this for
> > years.
> Well, I'm not sure how it may wedge the device if not capable of posting to
> smaller buffers, is there other option than drop? Truncate to what the
> buffer size may fit and deliver up? Seems even worse than drop...
> 
> > 
> > > Otherwise various device implementations are hard to
> > > follow. The capture is that vhost-net drops bigger packets while the driver
> > > only supplied smaller buffers. This is the status quo, and users seemingly
> > > have relied on this behavior for some while.
> > > 
> > > -Siwei
> > Weird where do you see this in code? I see
> > 
> >                  sock_len = vhost_net_rx_peek_head_len(net, sock->sk,
> >                                                        &busyloop_intr);
> >                  if (!sock_len)
> >                          break;
> >                  sock_len += sock_hlen;
> >                  vhost_len = sock_len + vhost_hlen;
> >                  headcount = get_rx_bufs(vq, vq->heads + nvq->done_idx,
> >                                          vhost_len, &in, vq_log, &log,
> >                                          likely(mergeable) ? UIO_MAXIOV : 1);
> >                  /* On error, stop handling until the next kick. */
> >                  if (unlikely(headcount < 0))
> >                          goto out;
> > 
> > 
> > so it does not drop a packet, it just stops processing the queue.
> Here
> 
>                 /* On overrun, truncate and discard */
>                 if (unlikely(headcount > UIO_MAXIOV)) {
>                         iov_iter_init(&msg.msg_iter, READ, vq->iov, 1, 1);
>                         err = sock->ops->recvmsg(sock, &msg,
>                                                  1, MSG_DONTWAIT |
> MSG_TRUNC);
>                         pr_debug("Discarded rx packet: len %zd\n",
> sock_len);
>                         continue;
>                 }
> 
> recvmsg(, , 1, ) is essentially to drop the oversized packet.
> 
> 
> In get_rx_bufs(), overrun detection will return something larger than
> UIO_MAXIOV as indicator:
> 
> static int get_rx_bufs()
> {
> :
> ;
>         /* Detect overrun */
>         if (unlikely(datalen > 0)) {
>                 r = UIO_MAXIOV + 1;
>                 goto err;
>         }
> :
> :
> 
> 
> -Siwei


Hmm you are right. I'll check but it seems I have misread the code.
Sorry about wasting your time on this.
So maybe the approach is ok then.
It's late, I'll recheck tomorrow.


> > 
> > 
> > > > 
> > > > > So, I am lost what virtio net device user application is trying to achieve by sending smaller packets and dropping all receive packets.
> > > > > (it doesn’t have any relation to mergeable or otherwise).
> > > > ---------------------------------------------------------------------
> > > > To unsubscribe, e-mail: virtio-dev-unsubscribe@lists.oasis-open.org
> > > > For additional commands, e-mail: virtio-dev-help@lists.oasis-open.org
> > > > 

_______________________________________________
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization

^ permalink raw reply	[flat|nested] 102+ messages in thread

* Re: [virtio-dev] [PATCH] virtio-net: use mtu size as buffer length for big packets
@ 2022-08-09 23:03                             ` Michael S. Tsirkin
  0 siblings, 0 replies; 102+ messages in thread
From: Michael S. Tsirkin @ 2022-08-09 23:03 UTC (permalink / raw)
  To: Si-Wei Liu
  Cc: Parav Pandit, Jason Wang, Gavin Li, Hemminger, Stephen, davem,
	virtualization, Virtio-Dev, jesse.brandeburg, alexander.h.duyck,
	kubakici, sridhar.samudrala, loseweigh, Gavi Teitz

On Tue, Aug 09, 2022 at 03:54:57PM -0700, Si-Wei Liu wrote:
> 
> 
> On 8/9/2022 3:37 PM, Michael S. Tsirkin wrote:
> > On Tue, Aug 09, 2022 at 03:32:26PM -0700, Si-Wei Liu wrote:
> > > 
> > > On 8/9/2022 2:37 PM, Michael S. Tsirkin wrote:
> > > > On Tue, Aug 09, 2022 at 07:18:30PM +0000, Parav Pandit wrote:
> > > > > > From: Si-Wei Liu <si-wei.liu@oracle.com>
> > > > > > Sent: Tuesday, August 9, 2022 3:09 PM
> > > > > > > > From: Si-Wei Liu <si-wei.liu@oracle.com>
> > > > > > > > Sent: Tuesday, August 9, 2022 2:39 PM Currently it is not. Not a
> > > > > > > > single patch nor this patch, but the context for the eventual goal is
> > > > > > > > to allow XDP on a MTU=9000 link when guest users intentionally lower
> > > > > > > > down MTU to 1500.
> > > > > > > Which application benefit by having asymmetry by lowering mtu to 1500
> > > > > > to send packets but want to receive 9K packets?
> > > > > Below details doesn’t answer the question of asymmetry. :)
> > > > > 
> > > > > > I think virtio-net driver doesn't differentiate MTU and MRU, in which case
> > > > > > the receive buffer will be reduced to fit the 1500B payload size when mtu is
> > > > > > lowered down to 1500 from 9000.
> > > > > How? Driver reduced the mXu to 1500, say it is improved to post buffers of 1500 bytes.
> > > > > 
> > > > > Device doesn't know about it because mtu in config space is RO field.
> > > > > Device keep dropping 9K packets because buffers posted are 1500 bytes.
> > > > > This is because device follows the spec " The device MUST NOT pass received packets that exceed mtu".
> > > > The "mtu" here is the device config field, which is
> > > > 
> > > >           /* Default maximum transmit unit advice */
> > > > 
> > > > there is no guarantee device will not get a bigger packet.
> > > > And there is no guarantee such a packet will be dropped
> > > > as opposed to wedging the device if userspace insists on
> > > > adding smaller buffers.
> > > It'd be nice to document this requirement or statement to the spec for
> > > clarity purpose.
> > It's not a requirement, more of a bug. But it's been like this for
> > years.
> Well, I'm not sure how it may wedge the device if not capable of posting to
> smaller buffers, is there other option than drop? Truncate to what the
> buffer size may fit and deliver up? Seems even worse than drop...
> 
> > 
> > > Otherwise various device implementations are hard to
> > > follow. The capture is that vhost-net drops bigger packets while the driver
> > > only supplied smaller buffers. This is the status quo, and users seemingly
> > > have relied on this behavior for some while.
> > > 
> > > -Siwei
> > Weird where do you see this in code? I see
> > 
> >                  sock_len = vhost_net_rx_peek_head_len(net, sock->sk,
> >                                                        &busyloop_intr);
> >                  if (!sock_len)
> >                          break;
> >                  sock_len += sock_hlen;
> >                  vhost_len = sock_len + vhost_hlen;
> >                  headcount = get_rx_bufs(vq, vq->heads + nvq->done_idx,
> >                                          vhost_len, &in, vq_log, &log,
> >                                          likely(mergeable) ? UIO_MAXIOV : 1);
> >                  /* On error, stop handling until the next kick. */
> >                  if (unlikely(headcount < 0))
> >                          goto out;
> > 
> > 
> > so it does not drop a packet, it just stops processing the queue.
> Here
> 
>                 /* On overrun, truncate and discard */
>                 if (unlikely(headcount > UIO_MAXIOV)) {
>                         iov_iter_init(&msg.msg_iter, READ, vq->iov, 1, 1);
>                         err = sock->ops->recvmsg(sock, &msg,
>                                                  1, MSG_DONTWAIT |
> MSG_TRUNC);
>                         pr_debug("Discarded rx packet: len %zd\n",
> sock_len);
>                         continue;
>                 }
> 
> recvmsg(, , 1, ) is essentially to drop the oversized packet.
> 
> 
> In get_rx_bufs(), overrun detection will return something larger than
> UIO_MAXIOV as indicator:
> 
> static int get_rx_bufs()
> {
> :
> ;
>         /* Detect overrun */
>         if (unlikely(datalen > 0)) {
>                 r = UIO_MAXIOV + 1;
>                 goto err;
>         }
> :
> :
> 
> 
> -Siwei


Hmm you are right. I'll check but it seems I have misread the code.
Sorry about wasting your time on this.
So maybe the approach is ok then.
It's late, I'll recheck tomorrow.


> > 
> > 
> > > > 
> > > > > So, I am lost what virtio net device user application is trying to achieve by sending smaller packets and dropping all receive packets.
> > > > > (it doesn’t have any relation to mergeable or otherwise).
> > > > ---------------------------------------------------------------------
> > > > To unsubscribe, e-mail: virtio-dev-unsubscribe@lists.oasis-open.org
> > > > For additional commands, e-mail: virtio-dev-help@lists.oasis-open.org
> > > > 


---------------------------------------------------------------------
To unsubscribe, e-mail: virtio-dev-unsubscribe@lists.oasis-open.org
For additional commands, e-mail: virtio-dev-help@lists.oasis-open.org


^ permalink raw reply	[flat|nested] 102+ messages in thread

* Re: [virtio-dev] [PATCH] virtio-net: use mtu size as buffer length for big packets
  2022-08-09 22:49                           ` Parav Pandit
@ 2022-08-09 23:04                             ` Michael S. Tsirkin
  -1 siblings, 0 replies; 102+ messages in thread
From: Michael S. Tsirkin @ 2022-08-09 23:04 UTC (permalink / raw)
  To: Parav Pandit
  Cc: alexander.h.duyck, Virtio-Dev, kubakici, sridhar.samudrala,
	jesse.brandeburg, Gavi Teitz, virtualization, Hemminger, Stephen,
	loseweigh, davem, Gavin Li

On Tue, Aug 09, 2022 at 10:49:48PM +0000, Parav Pandit wrote:
> > > When user prefers to override the device RO field, device is in the dark and
> > things work on best effort basis.
> > 
> > Dropping packets is best effort. Getting stuck forever isn't, that's a quality of
> > implementation issue.
> >
> Not sure, why things get stuck for ever. Maybe you have example to explain.
> I am mostly missing something.

I'm no longer sure I'm right. Will recheck tomorrow, it's late here.

-- 
MST

_______________________________________________
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization

^ permalink raw reply	[flat|nested] 102+ messages in thread

* Re: [virtio-dev] [PATCH] virtio-net: use mtu size as buffer length for big packets
@ 2022-08-09 23:04                             ` Michael S. Tsirkin
  0 siblings, 0 replies; 102+ messages in thread
From: Michael S. Tsirkin @ 2022-08-09 23:04 UTC (permalink / raw)
  To: Parav Pandit
  Cc: Si-Wei Liu, Jason Wang, Gavin Li, Hemminger, Stephen, davem,
	virtualization, Virtio-Dev, jesse.brandeburg, alexander.h.duyck,
	kubakici, sridhar.samudrala, loseweigh, Gavi Teitz

On Tue, Aug 09, 2022 at 10:49:48PM +0000, Parav Pandit wrote:
> > > When user prefers to override the device RO field, device is in the dark and
> > things work on best effort basis.
> > 
> > Dropping packets is best effort. Getting stuck forever isn't, that's a quality of
> > implementation issue.
> >
> Not sure, why things get stuck for ever. Maybe you have example to explain.
> I am mostly missing something.

I'm no longer sure I'm right. Will recheck tomorrow, it's late here.

-- 
MST


---------------------------------------------------------------------
To unsubscribe, e-mail: virtio-dev-unsubscribe@lists.oasis-open.org
For additional commands, e-mail: virtio-dev-help@lists.oasis-open.org


^ permalink raw reply	[flat|nested] 102+ messages in thread

* Re: [virtio-dev] [PATCH] virtio-net: use mtu size as buffer length for big packets
  2022-08-09 22:49                           ` Parav Pandit
@ 2022-08-09 23:24                             ` Si-Wei Liu
  -1 siblings, 0 replies; 102+ messages in thread
From: Si-Wei Liu @ 2022-08-09 23:24 UTC (permalink / raw)
  To: Parav Pandit, Michael S. Tsirkin
  Cc: alexander.h.duyck, Virtio-Dev, kubakici, sridhar.samudrala,
	jesse.brandeburg, Gavi Teitz, virtualization, Hemminger, Stephen,
	loseweigh, davem, Gavin Li



On 8/9/2022 3:49 PM, Parav Pandit wrote:
>> From: Michael S. Tsirkin <mst@redhat.com>
>> Sent: Tuesday, August 9, 2022 6:26 PM
>> To: Parav Pandit <parav@nvidia.com>
>> Cc: Si-Wei Liu <si-wei.liu@oracle.com>; Jason Wang
>> <jasowang@redhat.com>; Gavin Li <gavinl@nvidia.com>; Hemminger,
>> Stephen <stephen@networkplumber.org>; davem
>> <davem@davemloft.net>; virtualization <virtualization@lists.linux-
>> foundation.org>; Virtio-Dev <virtio-dev@lists.oasis-open.org>;
>> jesse.brandeburg@intel.com; alexander.h.duyck@intel.com;
>> kubakici@wp.pl; sridhar.samudrala@intel.com; loseweigh@gmail.com; Gavi
>> Teitz <gavi@nvidia.com>
>> Subject: Re: [virtio-dev] [PATCH] virtio-net: use mtu size as buffer length for
>> big packets
>>
>> On Tue, Aug 09, 2022 at 09:49:03PM +0000, Parav Pandit wrote:
>>>> From: Michael S. Tsirkin <mst@redhat.com>
>>>> Sent: Tuesday, August 9, 2022 5:38 PM
>>> [..]
>>>>>> I think virtio-net driver doesn't differentiate MTU and MRU, in
>>>>>> which case the receive buffer will be reduced to fit the 1500B
>>>>>> payload size when mtu is lowered down to 1500 from 9000.
>>>>> How? Driver reduced the mXu to 1500, say it is improved to post
>>>>> buffers of
>>>> 1500 bytes.
>>>>> Device doesn't know about it because mtu in config space is RO field.
>>>>> Device keep dropping 9K packets because buffers posted are 1500
>> bytes.
>>>>> This is because device follows the spec " The device MUST NOT pass
>>>> received packets that exceed mtu".
>>>>
>>>>
>>>> The "mtu" here is the device config field, which is
>>>>
>>>>          /* Default maximum transmit unit advice */
>>>>
>>> It is the field from struct virtio_net_config.mtu. right?
>>> This is RO field for driver.
>>>
>>>> there is no guarantee device will not get a bigger packet.
>>> Right. That is what I also hinted.
>>> Hence, allocating buffers worth upto mtu is safer.
>> yes
>>
>>> When user overrides it, driver can be further optimized to honor such new
>> value on rx buffer posting.
>>
>> no, not without a feature bit promising device won't get wedged.
>>
> I mean to say as_it_stands today, driver can decide to post smaller buffers with larger mtu.
> Why device should be affected with it?
> ( I am not proposing such weird configuration but asking for sake of correctness).
I am also confused how the device can be wedged in this case.

>
>>>> And there is no guarantee such a packet will be dropped as opposed
>>>> to wedging the device if userspace insists on adding smaller buffers.
>>>>
>>> If user space insists on small buffers, so be it.
>> If previously things worked, the "so be it" is a regression and blaming users
>> won't help us.
>>
> I am not suggesting above.
> This was Si-Wei's suggestion that somehow driver wants to post smaller buffers than the mtu because user knows what peer is doing.
> So may be driver can be extended to give more weight on user config.
It's not me, it's from our customers with real use cases. Some of which 
have very dedicate network setup, and it's not at odd they know virtio 
internals quite well. At one point they even customized the driver to 
disable mergeable buffer ahead of us offering them the opt-out from 
device level. And their appliance indeed has assumption of 1460 mtu 
everywhere.

-Siwei

>
>>> It only works when user exactly know what user is doing in the whole
>> network.
>>
>> If you want to claim this you need a new feature bit.
>>
> Why is a new bit needed to tell device?
> User is doing something its own config mismatching the buffers and mtu.
> A solid use case hasn't emerged for this yet.
>
> If user wants to modify the mtu, we should just make virtio_net_config.mtu as RW field using new feature bit.
> Is that what you mean?
> If so, yes, it makes things very neat where driver and device are aligned to each other, the way they are today.
> Only limitation is that its one-way. = device tells to driver.
>
>>> When user prefers to override the device RO field, device is in the dark and
>> things work on best effort basis.
>>
>> Dropping packets is best effort. Getting stuck forever isn't, that's a quality of
>> implementation issue.
>>
> Not sure, why things get stuck for ever. Maybe you have example to explain.
> I am mostly missing something.
>   
>>> This must be a reasonably advance user who has good knowledge of its
>> network topology etc.
>>> For such case, may be yes, driver should be further optimized.
>>>

_______________________________________________
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization

^ permalink raw reply	[flat|nested] 102+ messages in thread

* Re: [virtio-dev] [PATCH] virtio-net: use mtu size as buffer length for big packets
@ 2022-08-09 23:24                             ` Si-Wei Liu
  0 siblings, 0 replies; 102+ messages in thread
From: Si-Wei Liu @ 2022-08-09 23:24 UTC (permalink / raw)
  To: Parav Pandit, Michael S. Tsirkin
  Cc: Jason Wang, Gavin Li, Hemminger, Stephen, davem, virtualization,
	Virtio-Dev, jesse.brandeburg, alexander.h.duyck, kubakici,
	sridhar.samudrala, loseweigh, Gavi Teitz



On 8/9/2022 3:49 PM, Parav Pandit wrote:
>> From: Michael S. Tsirkin <mst@redhat.com>
>> Sent: Tuesday, August 9, 2022 6:26 PM
>> To: Parav Pandit <parav@nvidia.com>
>> Cc: Si-Wei Liu <si-wei.liu@oracle.com>; Jason Wang
>> <jasowang@redhat.com>; Gavin Li <gavinl@nvidia.com>; Hemminger,
>> Stephen <stephen@networkplumber.org>; davem
>> <davem@davemloft.net>; virtualization <virtualization@lists.linux-
>> foundation.org>; Virtio-Dev <virtio-dev@lists.oasis-open.org>;
>> jesse.brandeburg@intel.com; alexander.h.duyck@intel.com;
>> kubakici@wp.pl; sridhar.samudrala@intel.com; loseweigh@gmail.com; Gavi
>> Teitz <gavi@nvidia.com>
>> Subject: Re: [virtio-dev] [PATCH] virtio-net: use mtu size as buffer length for
>> big packets
>>
>> On Tue, Aug 09, 2022 at 09:49:03PM +0000, Parav Pandit wrote:
>>>> From: Michael S. Tsirkin <mst@redhat.com>
>>>> Sent: Tuesday, August 9, 2022 5:38 PM
>>> [..]
>>>>>> I think virtio-net driver doesn't differentiate MTU and MRU, in
>>>>>> which case the receive buffer will be reduced to fit the 1500B
>>>>>> payload size when mtu is lowered down to 1500 from 9000.
>>>>> How? Driver reduced the mXu to 1500, say it is improved to post
>>>>> buffers of
>>>> 1500 bytes.
>>>>> Device doesn't know about it because mtu in config space is RO field.
>>>>> Device keep dropping 9K packets because buffers posted are 1500
>> bytes.
>>>>> This is because device follows the spec " The device MUST NOT pass
>>>> received packets that exceed mtu".
>>>>
>>>>
>>>> The "mtu" here is the device config field, which is
>>>>
>>>>          /* Default maximum transmit unit advice */
>>>>
>>> It is the field from struct virtio_net_config.mtu. right?
>>> This is RO field for driver.
>>>
>>>> there is no guarantee device will not get a bigger packet.
>>> Right. That is what I also hinted.
>>> Hence, allocating buffers worth upto mtu is safer.
>> yes
>>
>>> When user overrides it, driver can be further optimized to honor such new
>> value on rx buffer posting.
>>
>> no, not without a feature bit promising device won't get wedged.
>>
> I mean to say as_it_stands today, driver can decide to post smaller buffers with larger mtu.
> Why device should be affected with it?
> ( I am not proposing such weird configuration but asking for sake of correctness).
I am also confused how the device can be wedged in this case.

>
>>>> And there is no guarantee such a packet will be dropped as opposed
>>>> to wedging the device if userspace insists on adding smaller buffers.
>>>>
>>> If user space insists on small buffers, so be it.
>> If previously things worked, the "so be it" is a regression and blaming users
>> won't help us.
>>
> I am not suggesting above.
> This was Si-Wei's suggestion that somehow driver wants to post smaller buffers than the mtu because user knows what peer is doing.
> So may be driver can be extended to give more weight on user config.
It's not me, it's from our customers with real use cases. Some of which 
have very dedicate network setup, and it's not at odd they know virtio 
internals quite well. At one point they even customized the driver to 
disable mergeable buffer ahead of us offering them the opt-out from 
device level. And their appliance indeed has assumption of 1460 mtu 
everywhere.

-Siwei

>
>>> It only works when user exactly know what user is doing in the whole
>> network.
>>
>> If you want to claim this you need a new feature bit.
>>
> Why is a new bit needed to tell device?
> User is doing something its own config mismatching the buffers and mtu.
> A solid use case hasn't emerged for this yet.
>
> If user wants to modify the mtu, we should just make virtio_net_config.mtu as RW field using new feature bit.
> Is that what you mean?
> If so, yes, it makes things very neat where driver and device are aligned to each other, the way they are today.
> Only limitation is that its one-way. = device tells to driver.
>
>>> When user prefers to override the device RO field, device is in the dark and
>> things work on best effort basis.
>>
>> Dropping packets is best effort. Getting stuck forever isn't, that's a quality of
>> implementation issue.
>>
> Not sure, why things get stuck for ever. Maybe you have example to explain.
> I am mostly missing something.
>   
>>> This must be a reasonably advance user who has good knowledge of its
>> network topology etc.
>>> For such case, may be yes, driver should be further optimized.
>>>


---------------------------------------------------------------------
To unsubscribe, e-mail: virtio-dev-unsubscribe@lists.oasis-open.org
For additional commands, e-mail: virtio-dev-help@lists.oasis-open.org


^ permalink raw reply	[flat|nested] 102+ messages in thread

* Re: [virtio-dev] [PATCH] virtio-net: use mtu size as buffer length for big packets
  2022-08-09 18:38             ` Si-Wei Liu
@ 2022-08-10  1:15               ` Jason Wang
  -1 siblings, 0 replies; 102+ messages in thread
From: Jason Wang @ 2022-08-10  1:15 UTC (permalink / raw)
  To: Si-Wei Liu, Gavin Li
  Cc: alexander.h.duyck, Virtio-Dev, mst, kubakici, sridhar.samudrala,
	jesse.brandeburg, gavi, virtualization, Hemminger, Stephen,
	loseweigh, davem


在 2022/8/10 02:38, Si-Wei Liu 写道:
>
>
> On 8/9/2022 12:44 AM, Jason Wang wrote:
>> On Tue, Aug 9, 2022 at 3:07 PM Gavin Li <gavinl@nvidia.com> wrote:
>>>
>>> On 8/9/2022 7:56 AM, Si-Wei Liu wrote:
>>>
>>> External email: Use caution opening links or attachments
>>>
>>>
>>> On 8/8/2022 12:31 AM, Gavin Li wrote:
>>>
>>>
>>> On 8/6/2022 6:11 AM, Si-Wei Liu wrote:
>>>
>>> External email: Use caution opening links or attachments
>>>
>>>
>>> On 8/1/2022 9:45 PM, Gavin Li wrote:
>>>
>>> Currently add_recvbuf_big() allocates MAX_SKB_FRAGS segments for big
>>> packets even when GUEST_* offloads are not present on the device.
>>> However, if GSO is not supported,
>>>
>>> GUEST GSO (virtio term), or GRO HW (netdev core term) it should have
>>> been be called.
>>>
>>> ACK
>>>
>>>
>>>    it would be sufficient to allocate
>>> segments to cover just up the MTU size and no further. Allocating the
>>> maximum amount of segments results in a large waste of buffer space in
>>> the queue, which limits the number of packets that can be buffered and
>>> can result in reduced performance.
>>>
>>> Therefore, if GSO is not supported,
>>>
>>> Ditto.
>>>
>>> ACK
>>>
>>>
>>> use the MTU to calculate the
>>> optimal amount of segments required.
>>>
>>> Below is the iperf TCP test results over a Mellanox NIC, using vDPA for
>>> 1 VQ, queue size 1024, before and after the change, with the iperf
>>> server running over the virtio-net interface.
>>>
>>> MTU(Bytes)/Bandwidth (Gbit/s)
>>>                Before   After
>>>     1500        22.5     22.4
>>>     9000        12.8     25.9
>>>
>>> Signed-off-by: Gavin Li <gavinl@nvidia.com>
>>> Reviewed-by: Gavi Teitz <gavi@nvidia.com>
>>> Reviewed-by: Parav Pandit <parav@nvidia.com>
>>> ---
>>>    drivers/net/virtio_net.c | 20 ++++++++++++++++----
>>>    1 file changed, 16 insertions(+), 4 deletions(-)
>>>
>>> diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
>>> index ec8e1b3108c3..d36918c1809d 100644
>>> --- a/drivers/net/virtio_net.c
>>> +++ b/drivers/net/virtio_net.c
>>> @@ -222,6 +222,9 @@ struct virtnet_info {
>>>        /* I like... big packets and I cannot lie! */
>>>        bool big_packets;
>>>
>>> +     /* Indicates GSO support */
>>> +     bool gso_is_supported;
>>> +
>>>        /* Host will merge rx buffers for big packets (shake it! shake
>>> it!) */
>>>        bool mergeable_rx_bufs;
>>>
>>> @@ -1312,14 +1315,21 @@ static int add_recvbuf_small(struct
>>> virtnet_info *vi, struct receive_queue *rq,
>>>    static int add_recvbuf_big(struct virtnet_info *vi, struct
>>> receive_queue *rq,
>>>                           gfp_t gfp)
>>>    {
>>> +     unsigned int sg_num = MAX_SKB_FRAGS;
>>>        struct page *first, *list = NULL;
>>>        char *p;
>>>        int i, err, offset;
>>>
>>> -     sg_init_table(rq->sg, MAX_SKB_FRAGS + 2);
>>> +     if (!vi->gso_is_supported) {
>>> +             unsigned int mtu = vi->dev->mtu;
>>> +
>>> +             sg_num = (mtu % PAGE_SIZE) ? mtu / PAGE_SIZE + 1 : mtu
>>> / PAGE_SIZE;
>>>
>>> DIV_ROUND_UP() can be used?
>>>
>>> ACK
>>>
>>>
>>> Since this branch slightly adds up cost to the datapath, I wonder if
>>> this sg_num can be saved and set only once (generally in virtnet_probe
>>> time) in struct virtnet_info?
>>>
>>> Not sure how to do it and align it with align with new mtu during
>>> .ndo_change_mtu()---as you mentioned in the following mail. Any idea?
>>> ndo_change_mtu might be in vendor specific code and unmanageable. In
>>> my case, the mtu can only be changed in the xml of the guest vm.
>>>
>>> Nope, for e.g. "ip link dev eth0 set mtu 1500" can be done from 
>>> guest on
>>> a virtio-net device with 9000 MTU (as defined in guest xml). Basically
>>> guest user can set MTU to any valid value lower than the original
>>> HOST_MTU. In the vendor defined .ndo_change_mtu() op, 
>>> dev_validate_mtu()
>>> should have validated the MTU value before coming down to it. And I
>>> suspect you might want to do virtnet_close() and virtnet_open()
>>> before/after changing the buffer size on the fly (the netif_running()
>>> case), implementing .ndo_change_mtu() will be needed anyway.
>>>
>>> a guest VM driver changing mtu to smaller one is valid use case. 
>>> However, current optimization suggested in the patch doesn't degrade 
>>> any performance. Performing close() and open() sequence is good 
>>> idea, that I would like to take up next after this patch as its 
>>> going to be more than one patch to achieve it.
>> Right, it could be done on top.
>>
>> But another note is that, it would still be better to support GUEST 
>> GSO feature:
>>
>> 1) can work for the case for path MTU
>> 2) (migration)compatibility with software backends
>>
>>>
>>> +     }
>>> +
>>> +     sg_init_table(rq->sg, sg_num + 2);
>>>
>>>        /* page in rq->sg[MAX_SKB_FRAGS + 1] is list tail */
>>>
>>> Comment doesn't match code.
>>>
>>> ACK
>>>
>>> -     for (i = MAX_SKB_FRAGS + 1; i > 1; --i) {
>>> +     for (i = sg_num + 1; i > 1; --i) {
>>>                first = get_a_page(rq, gfp);
>>>                if (!first) {
>>>                        if (list)
>>> @@ -1350,7 +1360,7 @@ static int add_recvbuf_big(struct virtnet_info
>>> *vi, struct receive_queue *rq,
>>>
>>>        /* chain first in list head */
>>>        first->private = (unsigned long)list;
>>> -     err = virtqueue_add_inbuf(rq->vq, rq->sg, MAX_SKB_FRAGS + 2,
>>> +     err = virtqueue_add_inbuf(rq->vq, rq->sg, sg_num + 2,
>>>                                  first, gfp);
>>>        if (err < 0)
>>>                give_pages(rq, first);
>>> @@ -3571,8 +3581,10 @@ static int virtnet_probe(struct virtio_device
>>> *vdev)
>>>        if (virtio_has_feature(vdev, VIRTIO_NET_F_GUEST_TSO4) ||
>>>            virtio_has_feature(vdev, VIRTIO_NET_F_GUEST_TSO6) ||
>>>            virtio_has_feature(vdev, VIRTIO_NET_F_GUEST_ECN) ||
>>> -         virtio_has_feature(vdev, VIRTIO_NET_F_GUEST_UFO))
>>> +         virtio_has_feature(vdev, VIRTIO_NET_F_GUEST_UFO)) {
>>>                vi->big_packets = true;
>>> +             vi->gso_is_supported = true;
>>>
>>> Please do the same for virtnet_clear_guest_offloads(), and
>>> correspondingly virtnet_restore_guest_offloads() as well. Not sure why
>>> virtnet_clear_guest_offloads() or the caller doesn't unset 
>>> big_packet on
>>> successful return, seems like a bug to me.
>> It is fine as long as
>>
>> 1) we don't implement ethtool API for changing guest offloads
> Not sure if I missed something, but it looks the current 
> virtnet_set_features() already supports toggling on/off GRO HW through 
> commit a02e8964eaf9271a8a5fcc0c55bd13f933bafc56 (formerly misnamed as 
> LRO). Sorry, I realized I had a typo in email: 
> "virtnet_set_guest_offloads() or the caller doesn't unset big_packet 
> ...".


Yes, I miss that.


>
>> 2) big mode XDP is not enabled
> Currently it is not. Not a single patch nor this patch, but the 
> context for the eventual goal is to allow XDP on a MTU=9000 link when 
> guest users intentionally lower down MTU to 1500.


AFAIK, this requires more changes since mergeable path allocates 
PAGE_SIZE while small path allocates ~1500. This is something that needs 
to be fixed.

Thanks


>
> Regards,
> -Siwei
>>
>> So that code works only for XDP but we forbid big packets in the case
>> of XDP right now.
>>
>> Thanks
>>
>>> ACK. The two calls virtnet_set_guest_offloads and
>>> virtnet_set_guest_offloads is also called by virtnet_set_features. Do
>>> you think if I can do this in virtnet_set_guest_offloads?
>>>
>>> I think that it should be fine, though you may want to deal with the 
>>> XDP
>>> path not to regress it.
>>>
>>> -Siwei
>>>
>>>
>>>
>>> Thanks,
>>> -Siwei
>>>
>>> +     }
>>>
>>>        if (virtio_has_feature(vdev, VIRTIO_NET_F_MRG_RXBUF))
>>>                vi->mergeable_rx_bufs = true;
>>>
>>>
>>>
>

_______________________________________________
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization

^ permalink raw reply	[flat|nested] 102+ messages in thread

* Re: [virtio-dev] [PATCH] virtio-net: use mtu size as buffer length for big packets
@ 2022-08-10  1:15               ` Jason Wang
  0 siblings, 0 replies; 102+ messages in thread
From: Jason Wang @ 2022-08-10  1:15 UTC (permalink / raw)
  To: Si-Wei Liu, Gavin Li
  Cc: mst, Hemminger, Stephen, davem, virtualization, Virtio-Dev,
	jesse.brandeburg, alexander.h.duyck, kubakici, sridhar.samudrala,
	loseweigh, Parav Pandit, gavi


在 2022/8/10 02:38, Si-Wei Liu 写道:
>
>
> On 8/9/2022 12:44 AM, Jason Wang wrote:
>> On Tue, Aug 9, 2022 at 3:07 PM Gavin Li <gavinl@nvidia.com> wrote:
>>>
>>> On 8/9/2022 7:56 AM, Si-Wei Liu wrote:
>>>
>>> External email: Use caution opening links or attachments
>>>
>>>
>>> On 8/8/2022 12:31 AM, Gavin Li wrote:
>>>
>>>
>>> On 8/6/2022 6:11 AM, Si-Wei Liu wrote:
>>>
>>> External email: Use caution opening links or attachments
>>>
>>>
>>> On 8/1/2022 9:45 PM, Gavin Li wrote:
>>>
>>> Currently add_recvbuf_big() allocates MAX_SKB_FRAGS segments for big
>>> packets even when GUEST_* offloads are not present on the device.
>>> However, if GSO is not supported,
>>>
>>> GUEST GSO (virtio term), or GRO HW (netdev core term) it should have
>>> been be called.
>>>
>>> ACK
>>>
>>>
>>>    it would be sufficient to allocate
>>> segments to cover just up the MTU size and no further. Allocating the
>>> maximum amount of segments results in a large waste of buffer space in
>>> the queue, which limits the number of packets that can be buffered and
>>> can result in reduced performance.
>>>
>>> Therefore, if GSO is not supported,
>>>
>>> Ditto.
>>>
>>> ACK
>>>
>>>
>>> use the MTU to calculate the
>>> optimal amount of segments required.
>>>
>>> Below is the iperf TCP test results over a Mellanox NIC, using vDPA for
>>> 1 VQ, queue size 1024, before and after the change, with the iperf
>>> server running over the virtio-net interface.
>>>
>>> MTU(Bytes)/Bandwidth (Gbit/s)
>>>                Before   After
>>>     1500        22.5     22.4
>>>     9000        12.8     25.9
>>>
>>> Signed-off-by: Gavin Li <gavinl@nvidia.com>
>>> Reviewed-by: Gavi Teitz <gavi@nvidia.com>
>>> Reviewed-by: Parav Pandit <parav@nvidia.com>
>>> ---
>>>    drivers/net/virtio_net.c | 20 ++++++++++++++++----
>>>    1 file changed, 16 insertions(+), 4 deletions(-)
>>>
>>> diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
>>> index ec8e1b3108c3..d36918c1809d 100644
>>> --- a/drivers/net/virtio_net.c
>>> +++ b/drivers/net/virtio_net.c
>>> @@ -222,6 +222,9 @@ struct virtnet_info {
>>>        /* I like... big packets and I cannot lie! */
>>>        bool big_packets;
>>>
>>> +     /* Indicates GSO support */
>>> +     bool gso_is_supported;
>>> +
>>>        /* Host will merge rx buffers for big packets (shake it! shake
>>> it!) */
>>>        bool mergeable_rx_bufs;
>>>
>>> @@ -1312,14 +1315,21 @@ static int add_recvbuf_small(struct
>>> virtnet_info *vi, struct receive_queue *rq,
>>>    static int add_recvbuf_big(struct virtnet_info *vi, struct
>>> receive_queue *rq,
>>>                           gfp_t gfp)
>>>    {
>>> +     unsigned int sg_num = MAX_SKB_FRAGS;
>>>        struct page *first, *list = NULL;
>>>        char *p;
>>>        int i, err, offset;
>>>
>>> -     sg_init_table(rq->sg, MAX_SKB_FRAGS + 2);
>>> +     if (!vi->gso_is_supported) {
>>> +             unsigned int mtu = vi->dev->mtu;
>>> +
>>> +             sg_num = (mtu % PAGE_SIZE) ? mtu / PAGE_SIZE + 1 : mtu
>>> / PAGE_SIZE;
>>>
>>> DIV_ROUND_UP() can be used?
>>>
>>> ACK
>>>
>>>
>>> Since this branch slightly adds up cost to the datapath, I wonder if
>>> this sg_num can be saved and set only once (generally in virtnet_probe
>>> time) in struct virtnet_info?
>>>
>>> Not sure how to do it and align it with align with new mtu during
>>> .ndo_change_mtu()---as you mentioned in the following mail. Any idea?
>>> ndo_change_mtu might be in vendor specific code and unmanageable. In
>>> my case, the mtu can only be changed in the xml of the guest vm.
>>>
>>> Nope, for e.g. "ip link dev eth0 set mtu 1500" can be done from 
>>> guest on
>>> a virtio-net device with 9000 MTU (as defined in guest xml). Basically
>>> guest user can set MTU to any valid value lower than the original
>>> HOST_MTU. In the vendor defined .ndo_change_mtu() op, 
>>> dev_validate_mtu()
>>> should have validated the MTU value before coming down to it. And I
>>> suspect you might want to do virtnet_close() and virtnet_open()
>>> before/after changing the buffer size on the fly (the netif_running()
>>> case), implementing .ndo_change_mtu() will be needed anyway.
>>>
>>> a guest VM driver changing mtu to smaller one is valid use case. 
>>> However, current optimization suggested in the patch doesn't degrade 
>>> any performance. Performing close() and open() sequence is good 
>>> idea, that I would like to take up next after this patch as its 
>>> going to be more than one patch to achieve it.
>> Right, it could be done on top.
>>
>> But another note is that, it would still be better to support GUEST 
>> GSO feature:
>>
>> 1) can work for the case for path MTU
>> 2) (migration)compatibility with software backends
>>
>>>
>>> +     }
>>> +
>>> +     sg_init_table(rq->sg, sg_num + 2);
>>>
>>>        /* page in rq->sg[MAX_SKB_FRAGS + 1] is list tail */
>>>
>>> Comment doesn't match code.
>>>
>>> ACK
>>>
>>> -     for (i = MAX_SKB_FRAGS + 1; i > 1; --i) {
>>> +     for (i = sg_num + 1; i > 1; --i) {
>>>                first = get_a_page(rq, gfp);
>>>                if (!first) {
>>>                        if (list)
>>> @@ -1350,7 +1360,7 @@ static int add_recvbuf_big(struct virtnet_info
>>> *vi, struct receive_queue *rq,
>>>
>>>        /* chain first in list head */
>>>        first->private = (unsigned long)list;
>>> -     err = virtqueue_add_inbuf(rq->vq, rq->sg, MAX_SKB_FRAGS + 2,
>>> +     err = virtqueue_add_inbuf(rq->vq, rq->sg, sg_num + 2,
>>>                                  first, gfp);
>>>        if (err < 0)
>>>                give_pages(rq, first);
>>> @@ -3571,8 +3581,10 @@ static int virtnet_probe(struct virtio_device
>>> *vdev)
>>>        if (virtio_has_feature(vdev, VIRTIO_NET_F_GUEST_TSO4) ||
>>>            virtio_has_feature(vdev, VIRTIO_NET_F_GUEST_TSO6) ||
>>>            virtio_has_feature(vdev, VIRTIO_NET_F_GUEST_ECN) ||
>>> -         virtio_has_feature(vdev, VIRTIO_NET_F_GUEST_UFO))
>>> +         virtio_has_feature(vdev, VIRTIO_NET_F_GUEST_UFO)) {
>>>                vi->big_packets = true;
>>> +             vi->gso_is_supported = true;
>>>
>>> Please do the same for virtnet_clear_guest_offloads(), and
>>> correspondingly virtnet_restore_guest_offloads() as well. Not sure why
>>> virtnet_clear_guest_offloads() or the caller doesn't unset 
>>> big_packet on
>>> successful return, seems like a bug to me.
>> It is fine as long as
>>
>> 1) we don't implement ethtool API for changing guest offloads
> Not sure if I missed something, but it looks the current 
> virtnet_set_features() already supports toggling on/off GRO HW through 
> commit a02e8964eaf9271a8a5fcc0c55bd13f933bafc56 (formerly misnamed as 
> LRO). Sorry, I realized I had a typo in email: 
> "virtnet_set_guest_offloads() or the caller doesn't unset big_packet 
> ...".


Yes, I miss that.


>
>> 2) big mode XDP is not enabled
> Currently it is not. Not a single patch nor this patch, but the 
> context for the eventual goal is to allow XDP on a MTU=9000 link when 
> guest users intentionally lower down MTU to 1500.


AFAIK, this requires more changes since mergeable path allocates 
PAGE_SIZE while small path allocates ~1500. This is something that needs 
to be fixed.

Thanks


>
> Regards,
> -Siwei
>>
>> So that code works only for XDP but we forbid big packets in the case
>> of XDP right now.
>>
>> Thanks
>>
>>> ACK. The two calls virtnet_set_guest_offloads and
>>> virtnet_set_guest_offloads is also called by virtnet_set_features. Do
>>> you think if I can do this in virtnet_set_guest_offloads?
>>>
>>> I think that it should be fine, though you may want to deal with the 
>>> XDP
>>> path not to regress it.
>>>
>>> -Siwei
>>>
>>>
>>>
>>> Thanks,
>>> -Siwei
>>>
>>> +     }
>>>
>>>        if (virtio_has_feature(vdev, VIRTIO_NET_F_MRG_RXBUF))
>>>                vi->mergeable_rx_bufs = true;
>>>
>>>
>>>
>


---------------------------------------------------------------------
To unsubscribe, e-mail: virtio-dev-unsubscribe@lists.oasis-open.org
For additional commands, e-mail: virtio-dev-help@lists.oasis-open.org


^ permalink raw reply	[flat|nested] 102+ messages in thread

* Re: [virtio-dev] [PATCH] virtio-net: use mtu size as buffer length for big packets
  2022-08-09 22:54                           ` Si-Wei Liu
@ 2022-08-10  1:24                             ` Jason Wang
  -1 siblings, 0 replies; 102+ messages in thread
From: Jason Wang @ 2022-08-10  1:24 UTC (permalink / raw)
  To: Si-Wei Liu, Michael S. Tsirkin
  Cc: alexander.h.duyck, Virtio-Dev, kubakici, sridhar.samudrala,
	jesse.brandeburg, Gavi Teitz, virtualization, Hemminger, Stephen,
	loseweigh, davem, Gavin Li


在 2022/8/10 06:54, Si-Wei Liu 写道:
>
>
> On 8/9/2022 3:37 PM, Michael S. Tsirkin wrote:
>> On Tue, Aug 09, 2022 at 03:32:26PM -0700, Si-Wei Liu wrote:
>>>
>>> On 8/9/2022 2:37 PM, Michael S. Tsirkin wrote:
>>>> On Tue, Aug 09, 2022 at 07:18:30PM +0000, Parav Pandit wrote:
>>>>>> From: Si-Wei Liu <si-wei.liu@oracle.com>
>>>>>> Sent: Tuesday, August 9, 2022 3:09 PM
>>>>>>>> From: Si-Wei Liu <si-wei.liu@oracle.com>
>>>>>>>> Sent: Tuesday, August 9, 2022 2:39 PM Currently it is not. Not a
>>>>>>>> single patch nor this patch, but the context for the eventual 
>>>>>>>> goal is
>>>>>>>> to allow XDP on a MTU=9000 link when guest users intentionally 
>>>>>>>> lower
>>>>>>>> down MTU to 1500.
>>>>>>> Which application benefit by having asymmetry by lowering mtu to 
>>>>>>> 1500
>>>>>> to send packets but want to receive 9K packets?
>>>>> Below details doesn’t answer the question of asymmetry. :)
>>>>>
>>>>>> I think virtio-net driver doesn't differentiate MTU and MRU, in 
>>>>>> which case
>>>>>> the receive buffer will be reduced to fit the 1500B payload size 
>>>>>> when mtu is
>>>>>> lowered down to 1500 from 9000.
>>>>> How? Driver reduced the mXu to 1500, say it is improved to post 
>>>>> buffers of 1500 bytes.
>>>>>
>>>>> Device doesn't know about it because mtu in config space is RO field.
>>>>> Device keep dropping 9K packets because buffers posted are 1500 
>>>>> bytes.
>>>>> This is because device follows the spec " The device MUST NOT pass 
>>>>> received packets that exceed mtu".
>>>> The "mtu" here is the device config field, which is
>>>>
>>>>           /* Default maximum transmit unit advice */
>>>>
>>>> there is no guarantee device will not get a bigger packet.
>>>> And there is no guarantee such a packet will be dropped
>>>> as opposed to wedging the device if userspace insists on
>>>> adding smaller buffers.
>>> It'd be nice to document this requirement or statement to the spec for
>>> clarity purpose.
>> It's not a requirement, more of a bug. But it's been like this for
>> years.
> Well, I'm not sure how it may wedge the device if not capable of 
> posting to smaller buffers, is there other option than drop? Truncate 
> to what the buffer size may fit and deliver up? Seems even worse than 
> drop...
>
>>
>>> Otherwise various device implementations are hard to
>>> follow. The capture is that vhost-net drops bigger packets while the 
>>> driver
>>> only supplied smaller buffers. This is the status quo, and users 
>>> seemingly
>>> have relied on this behavior for some while.
>>>
>>> -Siwei
>> Weird where do you see this in code? I see
>>
>>                  sock_len = vhost_net_rx_peek_head_len(net, sock->sk,
>> &busyloop_intr);
>>                  if (!sock_len)
>>                          break;
>>                  sock_len += sock_hlen;
>>                  vhost_len = sock_len + vhost_hlen;
>>                  headcount = get_rx_bufs(vq, vq->heads + nvq->done_idx,
>>                                          vhost_len, &in, vq_log, &log,
>>                                          likely(mergeable) ? 
>> UIO_MAXIOV : 1);
>>                  /* On error, stop handling until the next kick. */
>>                  if (unlikely(headcount < 0))
>>                          goto out;
>>
>>
>> so it does not drop a packet, it just stops processing the queue.
> Here
>
>                 /* On overrun, truncate and discard */
>                 if (unlikely(headcount > UIO_MAXIOV)) {
>                         iov_iter_init(&msg.msg_iter, READ, vq->iov, 1, 
> 1);
>                         err = sock->ops->recvmsg(sock, &msg,
>                                                  1, MSG_DONTWAIT | 
> MSG_TRUNC);
>                         pr_debug("Discarded rx packet: len %zd\n", 
> sock_len);
>                         continue;
>                 }
>
> recvmsg(, , 1, ) is essentially to drop the oversized packet.


It's not necessarily the oversized packet but the packet that has too 
many sgs.

This issues has been discussed in the past, (for example we disable 
large rx queue size for vhost-net in Qemu). Where it could be solved by 
doing piece-wise copy

Thanks


>
>
> In get_rx_bufs(), overrun detection will return something larger than 
> UIO_MAXIOV as indicator:
>
> static int get_rx_bufs()
> {
> :
> ;
>         /* Detect overrun */
>         if (unlikely(datalen > 0)) {
>                 r = UIO_MAXIOV + 1;
>                 goto err;
>         }
> :
> :
>
>
> -Siwei
>
>>
>>
>>>>
>>>>> So, I am lost what virtio net device user application is trying to 
>>>>> achieve by sending smaller packets and dropping all receive packets.
>>>>> (it doesn’t have any relation to mergeable or otherwise).
>>>> ---------------------------------------------------------------------
>>>> To unsubscribe, e-mail: virtio-dev-unsubscribe@lists.oasis-open.org
>>>> For additional commands, e-mail: virtio-dev-help@lists.oasis-open.org
>>>>
>

_______________________________________________
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization

^ permalink raw reply	[flat|nested] 102+ messages in thread

* Re: [virtio-dev] [PATCH] virtio-net: use mtu size as buffer length for big packets
@ 2022-08-10  1:24                             ` Jason Wang
  0 siblings, 0 replies; 102+ messages in thread
From: Jason Wang @ 2022-08-10  1:24 UTC (permalink / raw)
  To: Si-Wei Liu, Michael S. Tsirkin
  Cc: Parav Pandit, Gavin Li, Hemminger, Stephen, davem,
	virtualization, Virtio-Dev, jesse.brandeburg, alexander.h.duyck,
	kubakici, sridhar.samudrala, loseweigh, Gavi Teitz


在 2022/8/10 06:54, Si-Wei Liu 写道:
>
>
> On 8/9/2022 3:37 PM, Michael S. Tsirkin wrote:
>> On Tue, Aug 09, 2022 at 03:32:26PM -0700, Si-Wei Liu wrote:
>>>
>>> On 8/9/2022 2:37 PM, Michael S. Tsirkin wrote:
>>>> On Tue, Aug 09, 2022 at 07:18:30PM +0000, Parav Pandit wrote:
>>>>>> From: Si-Wei Liu <si-wei.liu@oracle.com>
>>>>>> Sent: Tuesday, August 9, 2022 3:09 PM
>>>>>>>> From: Si-Wei Liu <si-wei.liu@oracle.com>
>>>>>>>> Sent: Tuesday, August 9, 2022 2:39 PM Currently it is not. Not a
>>>>>>>> single patch nor this patch, but the context for the eventual 
>>>>>>>> goal is
>>>>>>>> to allow XDP on a MTU=9000 link when guest users intentionally 
>>>>>>>> lower
>>>>>>>> down MTU to 1500.
>>>>>>> Which application benefit by having asymmetry by lowering mtu to 
>>>>>>> 1500
>>>>>> to send packets but want to receive 9K packets?
>>>>> Below details doesn’t answer the question of asymmetry. :)
>>>>>
>>>>>> I think virtio-net driver doesn't differentiate MTU and MRU, in 
>>>>>> which case
>>>>>> the receive buffer will be reduced to fit the 1500B payload size 
>>>>>> when mtu is
>>>>>> lowered down to 1500 from 9000.
>>>>> How? Driver reduced the mXu to 1500, say it is improved to post 
>>>>> buffers of 1500 bytes.
>>>>>
>>>>> Device doesn't know about it because mtu in config space is RO field.
>>>>> Device keep dropping 9K packets because buffers posted are 1500 
>>>>> bytes.
>>>>> This is because device follows the spec " The device MUST NOT pass 
>>>>> received packets that exceed mtu".
>>>> The "mtu" here is the device config field, which is
>>>>
>>>>           /* Default maximum transmit unit advice */
>>>>
>>>> there is no guarantee device will not get a bigger packet.
>>>> And there is no guarantee such a packet will be dropped
>>>> as opposed to wedging the device if userspace insists on
>>>> adding smaller buffers.
>>> It'd be nice to document this requirement or statement to the spec for
>>> clarity purpose.
>> It's not a requirement, more of a bug. But it's been like this for
>> years.
> Well, I'm not sure how it may wedge the device if not capable of 
> posting to smaller buffers, is there other option than drop? Truncate 
> to what the buffer size may fit and deliver up? Seems even worse than 
> drop...
>
>>
>>> Otherwise various device implementations are hard to
>>> follow. The capture is that vhost-net drops bigger packets while the 
>>> driver
>>> only supplied smaller buffers. This is the status quo, and users 
>>> seemingly
>>> have relied on this behavior for some while.
>>>
>>> -Siwei
>> Weird where do you see this in code? I see
>>
>>                  sock_len = vhost_net_rx_peek_head_len(net, sock->sk,
>> &busyloop_intr);
>>                  if (!sock_len)
>>                          break;
>>                  sock_len += sock_hlen;
>>                  vhost_len = sock_len + vhost_hlen;
>>                  headcount = get_rx_bufs(vq, vq->heads + nvq->done_idx,
>>                                          vhost_len, &in, vq_log, &log,
>>                                          likely(mergeable) ? 
>> UIO_MAXIOV : 1);
>>                  /* On error, stop handling until the next kick. */
>>                  if (unlikely(headcount < 0))
>>                          goto out;
>>
>>
>> so it does not drop a packet, it just stops processing the queue.
> Here
>
>                 /* On overrun, truncate and discard */
>                 if (unlikely(headcount > UIO_MAXIOV)) {
>                         iov_iter_init(&msg.msg_iter, READ, vq->iov, 1, 
> 1);
>                         err = sock->ops->recvmsg(sock, &msg,
>                                                  1, MSG_DONTWAIT | 
> MSG_TRUNC);
>                         pr_debug("Discarded rx packet: len %zd\n", 
> sock_len);
>                         continue;
>                 }
>
> recvmsg(, , 1, ) is essentially to drop the oversized packet.


It's not necessarily the oversized packet but the packet that has too 
many sgs.

This issues has been discussed in the past, (for example we disable 
large rx queue size for vhost-net in Qemu). Where it could be solved by 
doing piece-wise copy

Thanks


>
>
> In get_rx_bufs(), overrun detection will return something larger than 
> UIO_MAXIOV as indicator:
>
> static int get_rx_bufs()
> {
> :
> ;
>         /* Detect overrun */
>         if (unlikely(datalen > 0)) {
>                 r = UIO_MAXIOV + 1;
>                 goto err;
>         }
> :
> :
>
>
> -Siwei
>
>>
>>
>>>>
>>>>> So, I am lost what virtio net device user application is trying to 
>>>>> achieve by sending smaller packets and dropping all receive packets.
>>>>> (it doesn’t have any relation to mergeable or otherwise).
>>>> ---------------------------------------------------------------------
>>>> To unsubscribe, e-mail: virtio-dev-unsubscribe@lists.oasis-open.org
>>>> For additional commands, e-mail: virtio-dev-help@lists.oasis-open.org
>>>>
>


---------------------------------------------------------------------
To unsubscribe, e-mail: virtio-dev-unsubscribe@lists.oasis-open.org
For additional commands, e-mail: virtio-dev-help@lists.oasis-open.org


^ permalink raw reply	[flat|nested] 102+ messages in thread

* Re: [virtio-dev] [PATCH] virtio-net: use mtu size as buffer length for big packets
  2022-08-09 23:24                             ` Si-Wei Liu
@ 2022-08-10  6:14                               ` Michael S. Tsirkin
  -1 siblings, 0 replies; 102+ messages in thread
From: Michael S. Tsirkin @ 2022-08-10  6:14 UTC (permalink / raw)
  To: Si-Wei Liu
  Cc: alexander.h.duyck, Virtio-Dev, kubakici, sridhar.samudrala,
	jesse.brandeburg, Gavi Teitz, virtualization, Hemminger, Stephen,
	loseweigh, davem, Gavin Li

On Tue, Aug 09, 2022 at 04:24:23PM -0700, Si-Wei Liu wrote:
> 
> 
> On 8/9/2022 3:49 PM, Parav Pandit wrote:
> > > From: Michael S. Tsirkin <mst@redhat.com>
> > > Sent: Tuesday, August 9, 2022 6:26 PM
> > > To: Parav Pandit <parav@nvidia.com>
> > > Cc: Si-Wei Liu <si-wei.liu@oracle.com>; Jason Wang
> > > <jasowang@redhat.com>; Gavin Li <gavinl@nvidia.com>; Hemminger,
> > > Stephen <stephen@networkplumber.org>; davem
> > > <davem@davemloft.net>; virtualization <virtualization@lists.linux-
> > > foundation.org>; Virtio-Dev <virtio-dev@lists.oasis-open.org>;
> > > jesse.brandeburg@intel.com; alexander.h.duyck@intel.com;
> > > kubakici@wp.pl; sridhar.samudrala@intel.com; loseweigh@gmail.com; Gavi
> > > Teitz <gavi@nvidia.com>
> > > Subject: Re: [virtio-dev] [PATCH] virtio-net: use mtu size as buffer length for
> > > big packets
> > > 
> > > On Tue, Aug 09, 2022 at 09:49:03PM +0000, Parav Pandit wrote:
> > > > > From: Michael S. Tsirkin <mst@redhat.com>
> > > > > Sent: Tuesday, August 9, 2022 5:38 PM
> > > > [..]
> > > > > > > I think virtio-net driver doesn't differentiate MTU and MRU, in
> > > > > > > which case the receive buffer will be reduced to fit the 1500B
> > > > > > > payload size when mtu is lowered down to 1500 from 9000.
> > > > > > How? Driver reduced the mXu to 1500, say it is improved to post
> > > > > > buffers of
> > > > > 1500 bytes.
> > > > > > Device doesn't know about it because mtu in config space is RO field.
> > > > > > Device keep dropping 9K packets because buffers posted are 1500
> > > bytes.
> > > > > > This is because device follows the spec " The device MUST NOT pass
> > > > > received packets that exceed mtu".
> > > > > 
> > > > > 
> > > > > The "mtu" here is the device config field, which is
> > > > > 
> > > > >          /* Default maximum transmit unit advice */
> > > > > 
> > > > It is the field from struct virtio_net_config.mtu. right?
> > > > This is RO field for driver.
> > > > 
> > > > > there is no guarantee device will not get a bigger packet.
> > > > Right. That is what I also hinted.
> > > > Hence, allocating buffers worth upto mtu is safer.
> > > yes
> > > 
> > > > When user overrides it, driver can be further optimized to honor such new
> > > value on rx buffer posting.
> > > 
> > > no, not without a feature bit promising device won't get wedged.
> > > 
> > I mean to say as_it_stands today, driver can decide to post smaller buffers with larger mtu.
> > Why device should be affected with it?
> > ( I am not proposing such weird configuration but asking for sake of correctness).
> I am also confused how the device can be wedged in this case.

Yea sorry. I misunderstood the code. It can't be.

-- 
MST

_______________________________________________
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization

^ permalink raw reply	[flat|nested] 102+ messages in thread

* Re: [virtio-dev] [PATCH] virtio-net: use mtu size as buffer length for big packets
@ 2022-08-10  6:14                               ` Michael S. Tsirkin
  0 siblings, 0 replies; 102+ messages in thread
From: Michael S. Tsirkin @ 2022-08-10  6:14 UTC (permalink / raw)
  To: Si-Wei Liu
  Cc: Parav Pandit, Jason Wang, Gavin Li, Hemminger, Stephen, davem,
	virtualization, Virtio-Dev, jesse.brandeburg, alexander.h.duyck,
	kubakici, sridhar.samudrala, loseweigh, Gavi Teitz

On Tue, Aug 09, 2022 at 04:24:23PM -0700, Si-Wei Liu wrote:
> 
> 
> On 8/9/2022 3:49 PM, Parav Pandit wrote:
> > > From: Michael S. Tsirkin <mst@redhat.com>
> > > Sent: Tuesday, August 9, 2022 6:26 PM
> > > To: Parav Pandit <parav@nvidia.com>
> > > Cc: Si-Wei Liu <si-wei.liu@oracle.com>; Jason Wang
> > > <jasowang@redhat.com>; Gavin Li <gavinl@nvidia.com>; Hemminger,
> > > Stephen <stephen@networkplumber.org>; davem
> > > <davem@davemloft.net>; virtualization <virtualization@lists.linux-
> > > foundation.org>; Virtio-Dev <virtio-dev@lists.oasis-open.org>;
> > > jesse.brandeburg@intel.com; alexander.h.duyck@intel.com;
> > > kubakici@wp.pl; sridhar.samudrala@intel.com; loseweigh@gmail.com; Gavi
> > > Teitz <gavi@nvidia.com>
> > > Subject: Re: [virtio-dev] [PATCH] virtio-net: use mtu size as buffer length for
> > > big packets
> > > 
> > > On Tue, Aug 09, 2022 at 09:49:03PM +0000, Parav Pandit wrote:
> > > > > From: Michael S. Tsirkin <mst@redhat.com>
> > > > > Sent: Tuesday, August 9, 2022 5:38 PM
> > > > [..]
> > > > > > > I think virtio-net driver doesn't differentiate MTU and MRU, in
> > > > > > > which case the receive buffer will be reduced to fit the 1500B
> > > > > > > payload size when mtu is lowered down to 1500 from 9000.
> > > > > > How? Driver reduced the mXu to 1500, say it is improved to post
> > > > > > buffers of
> > > > > 1500 bytes.
> > > > > > Device doesn't know about it because mtu in config space is RO field.
> > > > > > Device keep dropping 9K packets because buffers posted are 1500
> > > bytes.
> > > > > > This is because device follows the spec " The device MUST NOT pass
> > > > > received packets that exceed mtu".
> > > > > 
> > > > > 
> > > > > The "mtu" here is the device config field, which is
> > > > > 
> > > > >          /* Default maximum transmit unit advice */
> > > > > 
> > > > It is the field from struct virtio_net_config.mtu. right?
> > > > This is RO field for driver.
> > > > 
> > > > > there is no guarantee device will not get a bigger packet.
> > > > Right. That is what I also hinted.
> > > > Hence, allocating buffers worth upto mtu is safer.
> > > yes
> > > 
> > > > When user overrides it, driver can be further optimized to honor such new
> > > value on rx buffer posting.
> > > 
> > > no, not without a feature bit promising device won't get wedged.
> > > 
> > I mean to say as_it_stands today, driver can decide to post smaller buffers with larger mtu.
> > Why device should be affected with it?
> > ( I am not proposing such weird configuration but asking for sake of correctness).
> I am also confused how the device can be wedged in this case.

Yea sorry. I misunderstood the code. It can't be.

-- 
MST


---------------------------------------------------------------------
To unsubscribe, e-mail: virtio-dev-unsubscribe@lists.oasis-open.org
For additional commands, e-mail: virtio-dev-help@lists.oasis-open.org


^ permalink raw reply	[flat|nested] 102+ messages in thread

* Re: [virtio-dev] [PATCH] virtio-net: use mtu size as buffer length for big packets
  2022-08-10  6:14                               ` Michael S. Tsirkin
@ 2022-08-10  6:15                                 ` Michael S. Tsirkin
  -1 siblings, 0 replies; 102+ messages in thread
From: Michael S. Tsirkin @ 2022-08-10  6:15 UTC (permalink / raw)
  To: Si-Wei Liu
  Cc: alexander.h.duyck, Virtio-Dev, kubakici, sridhar.samudrala,
	jesse.brandeburg, Gavi Teitz, virtualization, Hemminger, Stephen,
	loseweigh, davem, Gavin Li

On Wed, Aug 10, 2022 at 02:14:07AM -0400, Michael S. Tsirkin wrote:
> On Tue, Aug 09, 2022 at 04:24:23PM -0700, Si-Wei Liu wrote:
> > 
> > 
> > On 8/9/2022 3:49 PM, Parav Pandit wrote:
> > > > From: Michael S. Tsirkin <mst@redhat.com>
> > > > Sent: Tuesday, August 9, 2022 6:26 PM
> > > > To: Parav Pandit <parav@nvidia.com>
> > > > Cc: Si-Wei Liu <si-wei.liu@oracle.com>; Jason Wang
> > > > <jasowang@redhat.com>; Gavin Li <gavinl@nvidia.com>; Hemminger,
> > > > Stephen <stephen@networkplumber.org>; davem
> > > > <davem@davemloft.net>; virtualization <virtualization@lists.linux-
> > > > foundation.org>; Virtio-Dev <virtio-dev@lists.oasis-open.org>;
> > > > jesse.brandeburg@intel.com; alexander.h.duyck@intel.com;
> > > > kubakici@wp.pl; sridhar.samudrala@intel.com; loseweigh@gmail.com; Gavi
> > > > Teitz <gavi@nvidia.com>
> > > > Subject: Re: [virtio-dev] [PATCH] virtio-net: use mtu size as buffer length for
> > > > big packets
> > > > 
> > > > On Tue, Aug 09, 2022 at 09:49:03PM +0000, Parav Pandit wrote:
> > > > > > From: Michael S. Tsirkin <mst@redhat.com>
> > > > > > Sent: Tuesday, August 9, 2022 5:38 PM
> > > > > [..]
> > > > > > > > I think virtio-net driver doesn't differentiate MTU and MRU, in
> > > > > > > > which case the receive buffer will be reduced to fit the 1500B
> > > > > > > > payload size when mtu is lowered down to 1500 from 9000.
> > > > > > > How? Driver reduced the mXu to 1500, say it is improved to post
> > > > > > > buffers of
> > > > > > 1500 bytes.
> > > > > > > Device doesn't know about it because mtu in config space is RO field.
> > > > > > > Device keep dropping 9K packets because buffers posted are 1500
> > > > bytes.
> > > > > > > This is because device follows the spec " The device MUST NOT pass
> > > > > > received packets that exceed mtu".
> > > > > > 
> > > > > > 
> > > > > > The "mtu" here is the device config field, which is
> > > > > > 
> > > > > >          /* Default maximum transmit unit advice */
> > > > > > 
> > > > > It is the field from struct virtio_net_config.mtu. right?
> > > > > This is RO field for driver.
> > > > > 
> > > > > > there is no guarantee device will not get a bigger packet.
> > > > > Right. That is what I also hinted.
> > > > > Hence, allocating buffers worth upto mtu is safer.
> > > > yes
> > > > 
> > > > > When user overrides it, driver can be further optimized to honor such new
> > > > value on rx buffer posting.
> > > > 
> > > > no, not without a feature bit promising device won't get wedged.
> > > > 
> > > I mean to say as_it_stands today, driver can decide to post smaller buffers with larger mtu.
> > > Why device should be affected with it?
> > > ( I am not proposing such weird configuration but asking for sake of correctness).
> > I am also confused how the device can be wedged in this case.
> 
> Yea sorry. I misunderstood the code. It can't be.

Here's a problem as I see it. Let's say we reduce mtu.
Small buffers are added. Now we increase mtu.
Device will drop all packets until small buffers are consumed.

Should we make this depend on the vq reset ability maybe?

> -- 
> MST

_______________________________________________
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization

^ permalink raw reply	[flat|nested] 102+ messages in thread

* Re: [virtio-dev] [PATCH] virtio-net: use mtu size as buffer length for big packets
@ 2022-08-10  6:15                                 ` Michael S. Tsirkin
  0 siblings, 0 replies; 102+ messages in thread
From: Michael S. Tsirkin @ 2022-08-10  6:15 UTC (permalink / raw)
  To: Si-Wei Liu
  Cc: Parav Pandit, Jason Wang, Gavin Li, Hemminger, Stephen, davem,
	virtualization, Virtio-Dev, jesse.brandeburg, alexander.h.duyck,
	kubakici, sridhar.samudrala, loseweigh, Gavi Teitz

On Wed, Aug 10, 2022 at 02:14:07AM -0400, Michael S. Tsirkin wrote:
> On Tue, Aug 09, 2022 at 04:24:23PM -0700, Si-Wei Liu wrote:
> > 
> > 
> > On 8/9/2022 3:49 PM, Parav Pandit wrote:
> > > > From: Michael S. Tsirkin <mst@redhat.com>
> > > > Sent: Tuesday, August 9, 2022 6:26 PM
> > > > To: Parav Pandit <parav@nvidia.com>
> > > > Cc: Si-Wei Liu <si-wei.liu@oracle.com>; Jason Wang
> > > > <jasowang@redhat.com>; Gavin Li <gavinl@nvidia.com>; Hemminger,
> > > > Stephen <stephen@networkplumber.org>; davem
> > > > <davem@davemloft.net>; virtualization <virtualization@lists.linux-
> > > > foundation.org>; Virtio-Dev <virtio-dev@lists.oasis-open.org>;
> > > > jesse.brandeburg@intel.com; alexander.h.duyck@intel.com;
> > > > kubakici@wp.pl; sridhar.samudrala@intel.com; loseweigh@gmail.com; Gavi
> > > > Teitz <gavi@nvidia.com>
> > > > Subject: Re: [virtio-dev] [PATCH] virtio-net: use mtu size as buffer length for
> > > > big packets
> > > > 
> > > > On Tue, Aug 09, 2022 at 09:49:03PM +0000, Parav Pandit wrote:
> > > > > > From: Michael S. Tsirkin <mst@redhat.com>
> > > > > > Sent: Tuesday, August 9, 2022 5:38 PM
> > > > > [..]
> > > > > > > > I think virtio-net driver doesn't differentiate MTU and MRU, in
> > > > > > > > which case the receive buffer will be reduced to fit the 1500B
> > > > > > > > payload size when mtu is lowered down to 1500 from 9000.
> > > > > > > How? Driver reduced the mXu to 1500, say it is improved to post
> > > > > > > buffers of
> > > > > > 1500 bytes.
> > > > > > > Device doesn't know about it because mtu in config space is RO field.
> > > > > > > Device keep dropping 9K packets because buffers posted are 1500
> > > > bytes.
> > > > > > > This is because device follows the spec " The device MUST NOT pass
> > > > > > received packets that exceed mtu".
> > > > > > 
> > > > > > 
> > > > > > The "mtu" here is the device config field, which is
> > > > > > 
> > > > > >          /* Default maximum transmit unit advice */
> > > > > > 
> > > > > It is the field from struct virtio_net_config.mtu. right?
> > > > > This is RO field for driver.
> > > > > 
> > > > > > there is no guarantee device will not get a bigger packet.
> > > > > Right. That is what I also hinted.
> > > > > Hence, allocating buffers worth upto mtu is safer.
> > > > yes
> > > > 
> > > > > When user overrides it, driver can be further optimized to honor such new
> > > > value on rx buffer posting.
> > > > 
> > > > no, not without a feature bit promising device won't get wedged.
> > > > 
> > > I mean to say as_it_stands today, driver can decide to post smaller buffers with larger mtu.
> > > Why device should be affected with it?
> > > ( I am not proposing such weird configuration but asking for sake of correctness).
> > I am also confused how the device can be wedged in this case.
> 
> Yea sorry. I misunderstood the code. It can't be.

Here's a problem as I see it. Let's say we reduce mtu.
Small buffers are added. Now we increase mtu.
Device will drop all packets until small buffers are consumed.

Should we make this depend on the vq reset ability maybe?

> -- 
> MST


---------------------------------------------------------------------
To unsubscribe, e-mail: virtio-dev-unsubscribe@lists.oasis-open.org
For additional commands, e-mail: virtio-dev-help@lists.oasis-open.org


^ permalink raw reply	[flat|nested] 102+ messages in thread

* Re: [virtio-dev] [PATCH] virtio-net: use mtu size as buffer length for big packets
  2022-08-10  6:15                                 ` Michael S. Tsirkin
@ 2022-08-10  6:59                                   ` Jason Wang
  -1 siblings, 0 replies; 102+ messages in thread
From: Jason Wang @ 2022-08-10  6:59 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: alexander.h.duyck, Virtio-Dev, kubakici, sridhar.samudrala,
	jesse.brandeburg, Gavi Teitz, virtualization, Hemminger, Stephen,
	loseweigh, davem, Gavin Li

On Wed, Aug 10, 2022 at 2:15 PM Michael S. Tsirkin <mst@redhat.com> wrote:
>
> On Wed, Aug 10, 2022 at 02:14:07AM -0400, Michael S. Tsirkin wrote:
> > On Tue, Aug 09, 2022 at 04:24:23PM -0700, Si-Wei Liu wrote:
> > >
> > >
> > > On 8/9/2022 3:49 PM, Parav Pandit wrote:
> > > > > From: Michael S. Tsirkin <mst@redhat.com>
> > > > > Sent: Tuesday, August 9, 2022 6:26 PM
> > > > > To: Parav Pandit <parav@nvidia.com>
> > > > > Cc: Si-Wei Liu <si-wei.liu@oracle.com>; Jason Wang
> > > > > <jasowang@redhat.com>; Gavin Li <gavinl@nvidia.com>; Hemminger,
> > > > > Stephen <stephen@networkplumber.org>; davem
> > > > > <davem@davemloft.net>; virtualization <virtualization@lists.linux-
> > > > > foundation.org>; Virtio-Dev <virtio-dev@lists.oasis-open.org>;
> > > > > jesse.brandeburg@intel.com; alexander.h.duyck@intel.com;
> > > > > kubakici@wp.pl; sridhar.samudrala@intel.com; loseweigh@gmail.com; Gavi
> > > > > Teitz <gavi@nvidia.com>
> > > > > Subject: Re: [virtio-dev] [PATCH] virtio-net: use mtu size as buffer length for
> > > > > big packets
> > > > >
> > > > > On Tue, Aug 09, 2022 at 09:49:03PM +0000, Parav Pandit wrote:
> > > > > > > From: Michael S. Tsirkin <mst@redhat.com>
> > > > > > > Sent: Tuesday, August 9, 2022 5:38 PM
> > > > > > [..]
> > > > > > > > > I think virtio-net driver doesn't differentiate MTU and MRU, in
> > > > > > > > > which case the receive buffer will be reduced to fit the 1500B
> > > > > > > > > payload size when mtu is lowered down to 1500 from 9000.
> > > > > > > > How? Driver reduced the mXu to 1500, say it is improved to post
> > > > > > > > buffers of
> > > > > > > 1500 bytes.
> > > > > > > > Device doesn't know about it because mtu in config space is RO field.
> > > > > > > > Device keep dropping 9K packets because buffers posted are 1500
> > > > > bytes.
> > > > > > > > This is because device follows the spec " The device MUST NOT pass
> > > > > > > received packets that exceed mtu".
> > > > > > >
> > > > > > >
> > > > > > > The "mtu" here is the device config field, which is
> > > > > > >
> > > > > > >          /* Default maximum transmit unit advice */
> > > > > > >
> > > > > > It is the field from struct virtio_net_config.mtu. right?
> > > > > > This is RO field for driver.
> > > > > >
> > > > > > > there is no guarantee device will not get a bigger packet.
> > > > > > Right. That is what I also hinted.
> > > > > > Hence, allocating buffers worth upto mtu is safer.
> > > > > yes
> > > > >
> > > > > > When user overrides it, driver can be further optimized to honor such new
> > > > > value on rx buffer posting.
> > > > >
> > > > > no, not without a feature bit promising device won't get wedged.
> > > > >
> > > > I mean to say as_it_stands today, driver can decide to post smaller buffers with larger mtu.
> > > > Why device should be affected with it?
> > > > ( I am not proposing such weird configuration but asking for sake of correctness).
> > > I am also confused how the device can be wedged in this case.
> >
> > Yea sorry. I misunderstood the code. It can't be.
>
> Here's a problem as I see it. Let's say we reduce mtu.
> Small buffers are added. Now we increase mtu.
> Device will drop all packets until small buffers are consumed.
>
> Should we make this depend on the vq reset ability maybe?

The advantage of this is to keep TX working. Or we can use device
reset as a fallback if there's no vq reset.

Thanks


>
> > --
> > MST
>

_______________________________________________
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization

^ permalink raw reply	[flat|nested] 102+ messages in thread

* Re: [virtio-dev] [PATCH] virtio-net: use mtu size as buffer length for big packets
@ 2022-08-10  6:59                                   ` Jason Wang
  0 siblings, 0 replies; 102+ messages in thread
From: Jason Wang @ 2022-08-10  6:59 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: Si-Wei Liu, Parav Pandit, Gavin Li, Hemminger, Stephen, davem,
	virtualization, Virtio-Dev, jesse.brandeburg, alexander.h.duyck,
	kubakici, sridhar.samudrala, loseweigh, Gavi Teitz

On Wed, Aug 10, 2022 at 2:15 PM Michael S. Tsirkin <mst@redhat.com> wrote:
>
> On Wed, Aug 10, 2022 at 02:14:07AM -0400, Michael S. Tsirkin wrote:
> > On Tue, Aug 09, 2022 at 04:24:23PM -0700, Si-Wei Liu wrote:
> > >
> > >
> > > On 8/9/2022 3:49 PM, Parav Pandit wrote:
> > > > > From: Michael S. Tsirkin <mst@redhat.com>
> > > > > Sent: Tuesday, August 9, 2022 6:26 PM
> > > > > To: Parav Pandit <parav@nvidia.com>
> > > > > Cc: Si-Wei Liu <si-wei.liu@oracle.com>; Jason Wang
> > > > > <jasowang@redhat.com>; Gavin Li <gavinl@nvidia.com>; Hemminger,
> > > > > Stephen <stephen@networkplumber.org>; davem
> > > > > <davem@davemloft.net>; virtualization <virtualization@lists.linux-
> > > > > foundation.org>; Virtio-Dev <virtio-dev@lists.oasis-open.org>;
> > > > > jesse.brandeburg@intel.com; alexander.h.duyck@intel.com;
> > > > > kubakici@wp.pl; sridhar.samudrala@intel.com; loseweigh@gmail.com; Gavi
> > > > > Teitz <gavi@nvidia.com>
> > > > > Subject: Re: [virtio-dev] [PATCH] virtio-net: use mtu size as buffer length for
> > > > > big packets
> > > > >
> > > > > On Tue, Aug 09, 2022 at 09:49:03PM +0000, Parav Pandit wrote:
> > > > > > > From: Michael S. Tsirkin <mst@redhat.com>
> > > > > > > Sent: Tuesday, August 9, 2022 5:38 PM
> > > > > > [..]
> > > > > > > > > I think virtio-net driver doesn't differentiate MTU and MRU, in
> > > > > > > > > which case the receive buffer will be reduced to fit the 1500B
> > > > > > > > > payload size when mtu is lowered down to 1500 from 9000.
> > > > > > > > How? Driver reduced the mXu to 1500, say it is improved to post
> > > > > > > > buffers of
> > > > > > > 1500 bytes.
> > > > > > > > Device doesn't know about it because mtu in config space is RO field.
> > > > > > > > Device keep dropping 9K packets because buffers posted are 1500
> > > > > bytes.
> > > > > > > > This is because device follows the spec " The device MUST NOT pass
> > > > > > > received packets that exceed mtu".
> > > > > > >
> > > > > > >
> > > > > > > The "mtu" here is the device config field, which is
> > > > > > >
> > > > > > >          /* Default maximum transmit unit advice */
> > > > > > >
> > > > > > It is the field from struct virtio_net_config.mtu. right?
> > > > > > This is RO field for driver.
> > > > > >
> > > > > > > there is no guarantee device will not get a bigger packet.
> > > > > > Right. That is what I also hinted.
> > > > > > Hence, allocating buffers worth upto mtu is safer.
> > > > > yes
> > > > >
> > > > > > When user overrides it, driver can be further optimized to honor such new
> > > > > value on rx buffer posting.
> > > > >
> > > > > no, not without a feature bit promising device won't get wedged.
> > > > >
> > > > I mean to say as_it_stands today, driver can decide to post smaller buffers with larger mtu.
> > > > Why device should be affected with it?
> > > > ( I am not proposing such weird configuration but asking for sake of correctness).
> > > I am also confused how the device can be wedged in this case.
> >
> > Yea sorry. I misunderstood the code. It can't be.
>
> Here's a problem as I see it. Let's say we reduce mtu.
> Small buffers are added. Now we increase mtu.
> Device will drop all packets until small buffers are consumed.
>
> Should we make this depend on the vq reset ability maybe?

The advantage of this is to keep TX working. Or we can use device
reset as a fallback if there's no vq reset.

Thanks


>
> > --
> > MST
>


---------------------------------------------------------------------
To unsubscribe, e-mail: virtio-dev-unsubscribe@lists.oasis-open.org
For additional commands, e-mail: virtio-dev-help@lists.oasis-open.org


^ permalink raw reply	[flat|nested] 102+ messages in thread

* Re: [virtio-dev] [PATCH] virtio-net: use mtu size as buffer length for big packets
  2022-08-10  6:59                                   ` Jason Wang
@ 2022-08-10  9:03                                     ` Michael S. Tsirkin
  -1 siblings, 0 replies; 102+ messages in thread
From: Michael S. Tsirkin @ 2022-08-10  9:03 UTC (permalink / raw)
  To: Jason Wang
  Cc: alexander.h.duyck, Virtio-Dev, kubakici, sridhar.samudrala,
	jesse.brandeburg, Gavi Teitz, virtualization, Hemminger, Stephen,
	loseweigh, davem, Gavin Li

On Wed, Aug 10, 2022 at 02:59:55PM +0800, Jason Wang wrote:
> On Wed, Aug 10, 2022 at 2:15 PM Michael S. Tsirkin <mst@redhat.com> wrote:
> >
> > On Wed, Aug 10, 2022 at 02:14:07AM -0400, Michael S. Tsirkin wrote:
> > > On Tue, Aug 09, 2022 at 04:24:23PM -0700, Si-Wei Liu wrote:
> > > >
> > > >
> > > > On 8/9/2022 3:49 PM, Parav Pandit wrote:
> > > > > > From: Michael S. Tsirkin <mst@redhat.com>
> > > > > > Sent: Tuesday, August 9, 2022 6:26 PM
> > > > > > To: Parav Pandit <parav@nvidia.com>
> > > > > > Cc: Si-Wei Liu <si-wei.liu@oracle.com>; Jason Wang
> > > > > > <jasowang@redhat.com>; Gavin Li <gavinl@nvidia.com>; Hemminger,
> > > > > > Stephen <stephen@networkplumber.org>; davem
> > > > > > <davem@davemloft.net>; virtualization <virtualization@lists.linux-
> > > > > > foundation.org>; Virtio-Dev <virtio-dev@lists.oasis-open.org>;
> > > > > > jesse.brandeburg@intel.com; alexander.h.duyck@intel.com;
> > > > > > kubakici@wp.pl; sridhar.samudrala@intel.com; loseweigh@gmail.com; Gavi
> > > > > > Teitz <gavi@nvidia.com>
> > > > > > Subject: Re: [virtio-dev] [PATCH] virtio-net: use mtu size as buffer length for
> > > > > > big packets
> > > > > >
> > > > > > On Tue, Aug 09, 2022 at 09:49:03PM +0000, Parav Pandit wrote:
> > > > > > > > From: Michael S. Tsirkin <mst@redhat.com>
> > > > > > > > Sent: Tuesday, August 9, 2022 5:38 PM
> > > > > > > [..]
> > > > > > > > > > I think virtio-net driver doesn't differentiate MTU and MRU, in
> > > > > > > > > > which case the receive buffer will be reduced to fit the 1500B
> > > > > > > > > > payload size when mtu is lowered down to 1500 from 9000.
> > > > > > > > > How? Driver reduced the mXu to 1500, say it is improved to post
> > > > > > > > > buffers of
> > > > > > > > 1500 bytes.
> > > > > > > > > Device doesn't know about it because mtu in config space is RO field.
> > > > > > > > > Device keep dropping 9K packets because buffers posted are 1500
> > > > > > bytes.
> > > > > > > > > This is because device follows the spec " The device MUST NOT pass
> > > > > > > > received packets that exceed mtu".
> > > > > > > >
> > > > > > > >
> > > > > > > > The "mtu" here is the device config field, which is
> > > > > > > >
> > > > > > > >          /* Default maximum transmit unit advice */
> > > > > > > >
> > > > > > > It is the field from struct virtio_net_config.mtu. right?
> > > > > > > This is RO field for driver.
> > > > > > >
> > > > > > > > there is no guarantee device will not get a bigger packet.
> > > > > > > Right. That is what I also hinted.
> > > > > > > Hence, allocating buffers worth upto mtu is safer.
> > > > > > yes
> > > > > >
> > > > > > > When user overrides it, driver can be further optimized to honor such new
> > > > > > value on rx buffer posting.
> > > > > >
> > > > > > no, not without a feature bit promising device won't get wedged.
> > > > > >
> > > > > I mean to say as_it_stands today, driver can decide to post smaller buffers with larger mtu.
> > > > > Why device should be affected with it?
> > > > > ( I am not proposing such weird configuration but asking for sake of correctness).
> > > > I am also confused how the device can be wedged in this case.
> > >
> > > Yea sorry. I misunderstood the code. It can't be.
> >
> > Here's a problem as I see it. Let's say we reduce mtu.
> > Small buffers are added. Now we increase mtu.
> > Device will drop all packets until small buffers are consumed.
> >
> > Should we make this depend on the vq reset ability maybe?
> 
> The advantage of this is to keep TX working. Or we can use device
> reset as a fallback if there's no vq reset.
> 
> Thanks

Device reset is really annoying in that it loses all the state:
rx filters etc etc.

> 
> >
> > > --
> > > MST
> >

_______________________________________________
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization

^ permalink raw reply	[flat|nested] 102+ messages in thread

* Re: [virtio-dev] [PATCH] virtio-net: use mtu size as buffer length for big packets
@ 2022-08-10  9:03                                     ` Michael S. Tsirkin
  0 siblings, 0 replies; 102+ messages in thread
From: Michael S. Tsirkin @ 2022-08-10  9:03 UTC (permalink / raw)
  To: Jason Wang
  Cc: Si-Wei Liu, Parav Pandit, Gavin Li, Hemminger, Stephen, davem,
	virtualization, Virtio-Dev, jesse.brandeburg, alexander.h.duyck,
	kubakici, sridhar.samudrala, loseweigh, Gavi Teitz

On Wed, Aug 10, 2022 at 02:59:55PM +0800, Jason Wang wrote:
> On Wed, Aug 10, 2022 at 2:15 PM Michael S. Tsirkin <mst@redhat.com> wrote:
> >
> > On Wed, Aug 10, 2022 at 02:14:07AM -0400, Michael S. Tsirkin wrote:
> > > On Tue, Aug 09, 2022 at 04:24:23PM -0700, Si-Wei Liu wrote:
> > > >
> > > >
> > > > On 8/9/2022 3:49 PM, Parav Pandit wrote:
> > > > > > From: Michael S. Tsirkin <mst@redhat.com>
> > > > > > Sent: Tuesday, August 9, 2022 6:26 PM
> > > > > > To: Parav Pandit <parav@nvidia.com>
> > > > > > Cc: Si-Wei Liu <si-wei.liu@oracle.com>; Jason Wang
> > > > > > <jasowang@redhat.com>; Gavin Li <gavinl@nvidia.com>; Hemminger,
> > > > > > Stephen <stephen@networkplumber.org>; davem
> > > > > > <davem@davemloft.net>; virtualization <virtualization@lists.linux-
> > > > > > foundation.org>; Virtio-Dev <virtio-dev@lists.oasis-open.org>;
> > > > > > jesse.brandeburg@intel.com; alexander.h.duyck@intel.com;
> > > > > > kubakici@wp.pl; sridhar.samudrala@intel.com; loseweigh@gmail.com; Gavi
> > > > > > Teitz <gavi@nvidia.com>
> > > > > > Subject: Re: [virtio-dev] [PATCH] virtio-net: use mtu size as buffer length for
> > > > > > big packets
> > > > > >
> > > > > > On Tue, Aug 09, 2022 at 09:49:03PM +0000, Parav Pandit wrote:
> > > > > > > > From: Michael S. Tsirkin <mst@redhat.com>
> > > > > > > > Sent: Tuesday, August 9, 2022 5:38 PM
> > > > > > > [..]
> > > > > > > > > > I think virtio-net driver doesn't differentiate MTU and MRU, in
> > > > > > > > > > which case the receive buffer will be reduced to fit the 1500B
> > > > > > > > > > payload size when mtu is lowered down to 1500 from 9000.
> > > > > > > > > How? Driver reduced the mXu to 1500, say it is improved to post
> > > > > > > > > buffers of
> > > > > > > > 1500 bytes.
> > > > > > > > > Device doesn't know about it because mtu in config space is RO field.
> > > > > > > > > Device keep dropping 9K packets because buffers posted are 1500
> > > > > > bytes.
> > > > > > > > > This is because device follows the spec " The device MUST NOT pass
> > > > > > > > received packets that exceed mtu".
> > > > > > > >
> > > > > > > >
> > > > > > > > The "mtu" here is the device config field, which is
> > > > > > > >
> > > > > > > >          /* Default maximum transmit unit advice */
> > > > > > > >
> > > > > > > It is the field from struct virtio_net_config.mtu. right?
> > > > > > > This is RO field for driver.
> > > > > > >
> > > > > > > > there is no guarantee device will not get a bigger packet.
> > > > > > > Right. That is what I also hinted.
> > > > > > > Hence, allocating buffers worth upto mtu is safer.
> > > > > > yes
> > > > > >
> > > > > > > When user overrides it, driver can be further optimized to honor such new
> > > > > > value on rx buffer posting.
> > > > > >
> > > > > > no, not without a feature bit promising device won't get wedged.
> > > > > >
> > > > > I mean to say as_it_stands today, driver can decide to post smaller buffers with larger mtu.
> > > > > Why device should be affected with it?
> > > > > ( I am not proposing such weird configuration but asking for sake of correctness).
> > > > I am also confused how the device can be wedged in this case.
> > >
> > > Yea sorry. I misunderstood the code. It can't be.
> >
> > Here's a problem as I see it. Let's say we reduce mtu.
> > Small buffers are added. Now we increase mtu.
> > Device will drop all packets until small buffers are consumed.
> >
> > Should we make this depend on the vq reset ability maybe?
> 
> The advantage of this is to keep TX working. Or we can use device
> reset as a fallback if there's no vq reset.
> 
> Thanks

Device reset is really annoying in that it loses all the state:
rx filters etc etc.

> 
> >
> > > --
> > > MST
> >


---------------------------------------------------------------------
To unsubscribe, e-mail: virtio-dev-unsubscribe@lists.oasis-open.org
For additional commands, e-mail: virtio-dev-help@lists.oasis-open.org


^ permalink raw reply	[flat|nested] 102+ messages in thread

* RE: [virtio-dev] [PATCH] virtio-net: use mtu size as buffer length for big packets
  2022-08-10  9:03                                     ` Michael S. Tsirkin
@ 2022-08-10 16:00                                       ` Parav Pandit
  -1 siblings, 0 replies; 102+ messages in thread
From: Parav Pandit via Virtualization @ 2022-08-10 16:00 UTC (permalink / raw)
  To: Michael S. Tsirkin, Jason Wang
  Cc: alexander.h.duyck, Virtio-Dev, kubakici, sridhar.samudrala,
	jesse.brandeburg, Gavi Teitz, virtualization, Hemminger, Stephen,
	loseweigh, davem, Gavin Li

> From: Michael S. Tsirkin <mst@redhat.com>
> Sent: Wednesday, August 10, 2022 5:03 AM
> > >
> > > Should we make this depend on the vq reset ability maybe?
> >
> > The advantage of this is to keep TX working. Or we can use device
> > reset as a fallback if there's no vq reset.
> >
> > Thanks
> 
> Device reset is really annoying in that it loses all the state:
> rx filters etc etc.

The elegant solution is let driver tell the new mtu to the device.
One way to do so is by using existing ctrl vq.
If merged buffer is done, and new mtu is > minimum posting size, no need to undergo vq reset.
If merged buffer is not done, and buffer posted are smaller than new mtu, undergo vq reset optionally.
_______________________________________________
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization

^ permalink raw reply	[flat|nested] 102+ messages in thread

* RE: [virtio-dev] [PATCH] virtio-net: use mtu size as buffer length for big packets
@ 2022-08-10 16:00                                       ` Parav Pandit
  0 siblings, 0 replies; 102+ messages in thread
From: Parav Pandit @ 2022-08-10 16:00 UTC (permalink / raw)
  To: Michael S. Tsirkin, Jason Wang
  Cc: Si-Wei Liu, Gavin Li, Hemminger, Stephen, davem, virtualization,
	Virtio-Dev, jesse.brandeburg, alexander.h.duyck, kubakici,
	sridhar.samudrala, loseweigh, Gavi Teitz


> From: Michael S. Tsirkin <mst@redhat.com>
> Sent: Wednesday, August 10, 2022 5:03 AM
> > >
> > > Should we make this depend on the vq reset ability maybe?
> >
> > The advantage of this is to keep TX working. Or we can use device
> > reset as a fallback if there's no vq reset.
> >
> > Thanks
> 
> Device reset is really annoying in that it loses all the state:
> rx filters etc etc.

The elegant solution is let driver tell the new mtu to the device.
One way to do so is by using existing ctrl vq.
If merged buffer is done, and new mtu is > minimum posting size, no need to undergo vq reset.
If merged buffer is not done, and buffer posted are smaller than new mtu, undergo vq reset optionally.

---------------------------------------------------------------------
To unsubscribe, e-mail: virtio-dev-unsubscribe@lists.oasis-open.org
For additional commands, e-mail: virtio-dev-help@lists.oasis-open.org


^ permalink raw reply	[flat|nested] 102+ messages in thread

* Re: [virtio-dev] [PATCH] virtio-net: use mtu size as buffer length for big packets
  2022-08-10 16:00                                       ` Parav Pandit
@ 2022-08-10 16:05                                         ` Michael S. Tsirkin
  -1 siblings, 0 replies; 102+ messages in thread
From: Michael S. Tsirkin @ 2022-08-10 16:05 UTC (permalink / raw)
  To: Parav Pandit
  Cc: alexander.h.duyck, Virtio-Dev, kubakici, sridhar.samudrala,
	jesse.brandeburg, Gavi Teitz, virtualization, Hemminger, Stephen,
	loseweigh, davem, Gavin Li

On Wed, Aug 10, 2022 at 04:00:08PM +0000, Parav Pandit wrote:
> 
> > From: Michael S. Tsirkin <mst@redhat.com>
> > Sent: Wednesday, August 10, 2022 5:03 AM
> > > >
> > > > Should we make this depend on the vq reset ability maybe?
> > >
> > > The advantage of this is to keep TX working. Or we can use device
> > > reset as a fallback if there's no vq reset.
> > >
> > > Thanks
> > 
> > Device reset is really annoying in that it loses all the state:
> > rx filters etc etc.
> 
> The elegant solution is let driver tell the new mtu to the device.
> One way to do so is by using existing ctrl vq.

That will need a new feature bit.

> If merged buffer is done, and new mtu is > minimum posting size, no need to undergo vq reset.
> If merged buffer is not done, and buffer posted are smaller than new mtu, undergo vq reset optionally.

This can be done with or without sending mtu to device.

-- 
MST

_______________________________________________
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization

^ permalink raw reply	[flat|nested] 102+ messages in thread

* Re: [virtio-dev] [PATCH] virtio-net: use mtu size as buffer length for big packets
@ 2022-08-10 16:05                                         ` Michael S. Tsirkin
  0 siblings, 0 replies; 102+ messages in thread
From: Michael S. Tsirkin @ 2022-08-10 16:05 UTC (permalink / raw)
  To: Parav Pandit
  Cc: Jason Wang, Si-Wei Liu, Gavin Li, Hemminger, Stephen, davem,
	virtualization, Virtio-Dev, jesse.brandeburg, alexander.h.duyck,
	kubakici, sridhar.samudrala, loseweigh, Gavi Teitz

On Wed, Aug 10, 2022 at 04:00:08PM +0000, Parav Pandit wrote:
> 
> > From: Michael S. Tsirkin <mst@redhat.com>
> > Sent: Wednesday, August 10, 2022 5:03 AM
> > > >
> > > > Should we make this depend on the vq reset ability maybe?
> > >
> > > The advantage of this is to keep TX working. Or we can use device
> > > reset as a fallback if there's no vq reset.
> > >
> > > Thanks
> > 
> > Device reset is really annoying in that it loses all the state:
> > rx filters etc etc.
> 
> The elegant solution is let driver tell the new mtu to the device.
> One way to do so is by using existing ctrl vq.

That will need a new feature bit.

> If merged buffer is done, and new mtu is > minimum posting size, no need to undergo vq reset.
> If merged buffer is not done, and buffer posted are smaller than new mtu, undergo vq reset optionally.

This can be done with or without sending mtu to device.

-- 
MST


---------------------------------------------------------------------
To unsubscribe, e-mail: virtio-dev-unsubscribe@lists.oasis-open.org
For additional commands, e-mail: virtio-dev-help@lists.oasis-open.org


^ permalink raw reply	[flat|nested] 102+ messages in thread

* RE: [virtio-dev] [PATCH] virtio-net: use mtu size as buffer length for big packets
  2022-08-10 16:05                                         ` Michael S. Tsirkin
@ 2022-08-10 16:22                                           ` Parav Pandit
  -1 siblings, 0 replies; 102+ messages in thread
From: Parav Pandit via Virtualization @ 2022-08-10 16:22 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: alexander.h.duyck, Virtio-Dev, kubakici, sridhar.samudrala,
	jesse.brandeburg, Gavi Teitz, virtualization, Hemminger, Stephen,
	loseweigh, davem, Gavin Li


> From: Michael S. Tsirkin <mst@redhat.com>
> Sent: Wednesday, August 10, 2022 12:05 PM
> 
> On Wed, Aug 10, 2022 at 04:00:08PM +0000, Parav Pandit wrote:
> >
> > > From: Michael S. Tsirkin <mst@redhat.com>
> > > Sent: Wednesday, August 10, 2022 5:03 AM
> > > > >
> > > > > Should we make this depend on the vq reset ability maybe?
> > > >
> > > > The advantage of this is to keep TX working. Or we can use device
> > > > reset as a fallback if there's no vq reset.
> > > >
> > > > Thanks
> > >
> > > Device reset is really annoying in that it loses all the state:
> > > rx filters etc etc.
> >
> > The elegant solution is let driver tell the new mtu to the device.
> > One way to do so is by using existing ctrl vq.
> 
> That will need a new feature bit.
> 
Yes. ctrl vq can tell what all configuration does it allow. :)
Or you prefer feature bit?

> > If merged buffer is done, and new mtu is > minimum posting size, no need
> to undergo vq reset.
> > If merged buffer is not done, and buffer posted are smaller than new mtu,
> undergo vq reset optionally.
> 
> This can be done with or without sending mtu to device.
Yes, telling mtu to device helps device to optimize and adhere to the spec line " The device MUST NOT pass received packets that exceed mtu" in section 5.1.4.1.
_______________________________________________
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization

^ permalink raw reply	[flat|nested] 102+ messages in thread

* RE: [virtio-dev] [PATCH] virtio-net: use mtu size as buffer length for big packets
@ 2022-08-10 16:22                                           ` Parav Pandit
  0 siblings, 0 replies; 102+ messages in thread
From: Parav Pandit @ 2022-08-10 16:22 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: Jason Wang, Si-Wei Liu, Gavin Li, Hemminger, Stephen, davem,
	virtualization, Virtio-Dev, jesse.brandeburg, alexander.h.duyck,
	kubakici, sridhar.samudrala, loseweigh, Gavi Teitz


> From: Michael S. Tsirkin <mst@redhat.com>
> Sent: Wednesday, August 10, 2022 12:05 PM
> 
> On Wed, Aug 10, 2022 at 04:00:08PM +0000, Parav Pandit wrote:
> >
> > > From: Michael S. Tsirkin <mst@redhat.com>
> > > Sent: Wednesday, August 10, 2022 5:03 AM
> > > > >
> > > > > Should we make this depend on the vq reset ability maybe?
> > > >
> > > > The advantage of this is to keep TX working. Or we can use device
> > > > reset as a fallback if there's no vq reset.
> > > >
> > > > Thanks
> > >
> > > Device reset is really annoying in that it loses all the state:
> > > rx filters etc etc.
> >
> > The elegant solution is let driver tell the new mtu to the device.
> > One way to do so is by using existing ctrl vq.
> 
> That will need a new feature bit.
> 
Yes. ctrl vq can tell what all configuration does it allow. :)
Or you prefer feature bit?

> > If merged buffer is done, and new mtu is > minimum posting size, no need
> to undergo vq reset.
> > If merged buffer is not done, and buffer posted are smaller than new mtu,
> undergo vq reset optionally.
> 
> This can be done with or without sending mtu to device.
Yes, telling mtu to device helps device to optimize and adhere to the spec line " The device MUST NOT pass received packets that exceed mtu" in section 5.1.4.1.

---------------------------------------------------------------------
To unsubscribe, e-mail: virtio-dev-unsubscribe@lists.oasis-open.org
For additional commands, e-mail: virtio-dev-help@lists.oasis-open.org


^ permalink raw reply	[flat|nested] 102+ messages in thread

* Re: [virtio-dev] [PATCH] virtio-net: use mtu size as buffer length for big packets
  2022-08-10 16:22                                           ` Parav Pandit
@ 2022-08-10 16:58                                             ` Michael S. Tsirkin
  -1 siblings, 0 replies; 102+ messages in thread
From: Michael S. Tsirkin @ 2022-08-10 16:58 UTC (permalink / raw)
  To: Parav Pandit
  Cc: alexander.h.duyck, Virtio-Dev, kubakici, sridhar.samudrala,
	jesse.brandeburg, Gavi Teitz, virtualization, Hemminger, Stephen,
	loseweigh, davem, Gavin Li

On Wed, Aug 10, 2022 at 04:22:41PM +0000, Parav Pandit wrote:
> 
> > From: Michael S. Tsirkin <mst@redhat.com>
> > Sent: Wednesday, August 10, 2022 12:05 PM
> > 
> > On Wed, Aug 10, 2022 at 04:00:08PM +0000, Parav Pandit wrote:
> > >
> > > > From: Michael S. Tsirkin <mst@redhat.com>
> > > > Sent: Wednesday, August 10, 2022 5:03 AM
> > > > > >
> > > > > > Should we make this depend on the vq reset ability maybe?
> > > > >
> > > > > The advantage of this is to keep TX working. Or we can use device
> > > > > reset as a fallback if there's no vq reset.
> > > > >
> > > > > Thanks
> > > >
> > > > Device reset is really annoying in that it loses all the state:
> > > > rx filters etc etc.
> > >
> > > The elegant solution is let driver tell the new mtu to the device.
> > > One way to do so is by using existing ctrl vq.
> > 
> > That will need a new feature bit.
> > 
> Yes. ctrl vq can tell what all configuration does it allow. :)
> Or you prefer feature bit?

We did feature bits for this in the past.

> > > If merged buffer is done, and new mtu is > minimum posting size, no need
> > to undergo vq reset.
> > > If merged buffer is not done, and buffer posted are smaller than new mtu,
> > undergo vq reset optionally.
> > 
> > This can be done with or without sending mtu to device.
> Yes, telling mtu to device helps device to optimize and adhere to the spec line " The device MUST NOT pass received packets that exceed mtu" in section 5.1.4.1.

Again, that line refers to \field{mtu} which is the max mtu supported,
irrespective to anything driver does.

-- 
MST

_______________________________________________
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization

^ permalink raw reply	[flat|nested] 102+ messages in thread

* Re: [virtio-dev] [PATCH] virtio-net: use mtu size as buffer length for big packets
@ 2022-08-10 16:58                                             ` Michael S. Tsirkin
  0 siblings, 0 replies; 102+ messages in thread
From: Michael S. Tsirkin @ 2022-08-10 16:58 UTC (permalink / raw)
  To: Parav Pandit
  Cc: Jason Wang, Si-Wei Liu, Gavin Li, Hemminger, Stephen, davem,
	virtualization, Virtio-Dev, jesse.brandeburg, alexander.h.duyck,
	kubakici, sridhar.samudrala, loseweigh, Gavi Teitz

On Wed, Aug 10, 2022 at 04:22:41PM +0000, Parav Pandit wrote:
> 
> > From: Michael S. Tsirkin <mst@redhat.com>
> > Sent: Wednesday, August 10, 2022 12:05 PM
> > 
> > On Wed, Aug 10, 2022 at 04:00:08PM +0000, Parav Pandit wrote:
> > >
> > > > From: Michael S. Tsirkin <mst@redhat.com>
> > > > Sent: Wednesday, August 10, 2022 5:03 AM
> > > > > >
> > > > > > Should we make this depend on the vq reset ability maybe?
> > > > >
> > > > > The advantage of this is to keep TX working. Or we can use device
> > > > > reset as a fallback if there's no vq reset.
> > > > >
> > > > > Thanks
> > > >
> > > > Device reset is really annoying in that it loses all the state:
> > > > rx filters etc etc.
> > >
> > > The elegant solution is let driver tell the new mtu to the device.
> > > One way to do so is by using existing ctrl vq.
> > 
> > That will need a new feature bit.
> > 
> Yes. ctrl vq can tell what all configuration does it allow. :)
> Or you prefer feature bit?

We did feature bits for this in the past.

> > > If merged buffer is done, and new mtu is > minimum posting size, no need
> > to undergo vq reset.
> > > If merged buffer is not done, and buffer posted are smaller than new mtu,
> > undergo vq reset optionally.
> > 
> > This can be done with or without sending mtu to device.
> Yes, telling mtu to device helps device to optimize and adhere to the spec line " The device MUST NOT pass received packets that exceed mtu" in section 5.1.4.1.

Again, that line refers to \field{mtu} which is the max mtu supported,
irrespective to anything driver does.

-- 
MST


---------------------------------------------------------------------
To unsubscribe, e-mail: virtio-dev-unsubscribe@lists.oasis-open.org
For additional commands, e-mail: virtio-dev-help@lists.oasis-open.org


^ permalink raw reply	[flat|nested] 102+ messages in thread

* Re: [virtio-dev] [PATCH] virtio-net: use mtu size as buffer length for big packets
  2022-08-10 16:58                                             ` Michael S. Tsirkin
@ 2022-08-10 17:02                                               ` Michael S. Tsirkin
  -1 siblings, 0 replies; 102+ messages in thread
From: Michael S. Tsirkin @ 2022-08-10 17:02 UTC (permalink / raw)
  To: Parav Pandit
  Cc: alexander.h.duyck, Virtio-Dev, kubakici, sridhar.samudrala,
	jesse.brandeburg, Gavi Teitz, virtualization, Hemminger, Stephen,
	loseweigh, davem, Gavin Li

On Wed, Aug 10, 2022 at 12:58:58PM -0400, Michael S. Tsirkin wrote:
> On Wed, Aug 10, 2022 at 04:22:41PM +0000, Parav Pandit wrote:
> > 
> > > From: Michael S. Tsirkin <mst@redhat.com>
> > > Sent: Wednesday, August 10, 2022 12:05 PM
> > > 
> > > On Wed, Aug 10, 2022 at 04:00:08PM +0000, Parav Pandit wrote:
> > > >
> > > > > From: Michael S. Tsirkin <mst@redhat.com>
> > > > > Sent: Wednesday, August 10, 2022 5:03 AM
> > > > > > >
> > > > > > > Should we make this depend on the vq reset ability maybe?
> > > > > >
> > > > > > The advantage of this is to keep TX working. Or we can use device
> > > > > > reset as a fallback if there's no vq reset.
> > > > > >
> > > > > > Thanks
> > > > >
> > > > > Device reset is really annoying in that it loses all the state:
> > > > > rx filters etc etc.
> > > >
> > > > The elegant solution is let driver tell the new mtu to the device.
> > > > One way to do so is by using existing ctrl vq.
> > > 
> > > That will need a new feature bit.
> > > 
> > Yes. ctrl vq can tell what all configuration does it allow. :)
> > Or you prefer feature bit?
> 
> We did feature bits for this in the past.
> 
> > > > If merged buffer is done, and new mtu is > minimum posting size, no need
> > > to undergo vq reset.
> > > > If merged buffer is not done, and buffer posted are smaller than new mtu,
> > > undergo vq reset optionally.
> > > 
> > > This can be done with or without sending mtu to device.
> > Yes, telling mtu to device helps device to optimize and adhere to the spec line " The device MUST NOT pass received packets that exceed mtu" in section 5.1.4.1.
> 
> Again, that line refers to \field{mtu} which is the max mtu supported,
> irrespective to anything driver does.

BTW with any such ctrl vq interface we need to think how cases such as
increasing and decreasing MTU work.  The normal behaviour for linux
drivers is to limit this to when the link is down.  Which reminds me, we
do not have a command to bring link down, either.

> -- 
> MST

_______________________________________________
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization

^ permalink raw reply	[flat|nested] 102+ messages in thread

* Re: [virtio-dev] [PATCH] virtio-net: use mtu size as buffer length for big packets
@ 2022-08-10 17:02                                               ` Michael S. Tsirkin
  0 siblings, 0 replies; 102+ messages in thread
From: Michael S. Tsirkin @ 2022-08-10 17:02 UTC (permalink / raw)
  To: Parav Pandit
  Cc: Jason Wang, Si-Wei Liu, Gavin Li, Hemminger, Stephen, davem,
	virtualization, Virtio-Dev, jesse.brandeburg, alexander.h.duyck,
	kubakici, sridhar.samudrala, loseweigh, Gavi Teitz

On Wed, Aug 10, 2022 at 12:58:58PM -0400, Michael S. Tsirkin wrote:
> On Wed, Aug 10, 2022 at 04:22:41PM +0000, Parav Pandit wrote:
> > 
> > > From: Michael S. Tsirkin <mst@redhat.com>
> > > Sent: Wednesday, August 10, 2022 12:05 PM
> > > 
> > > On Wed, Aug 10, 2022 at 04:00:08PM +0000, Parav Pandit wrote:
> > > >
> > > > > From: Michael S. Tsirkin <mst@redhat.com>
> > > > > Sent: Wednesday, August 10, 2022 5:03 AM
> > > > > > >
> > > > > > > Should we make this depend on the vq reset ability maybe?
> > > > > >
> > > > > > The advantage of this is to keep TX working. Or we can use device
> > > > > > reset as a fallback if there's no vq reset.
> > > > > >
> > > > > > Thanks
> > > > >
> > > > > Device reset is really annoying in that it loses all the state:
> > > > > rx filters etc etc.
> > > >
> > > > The elegant solution is let driver tell the new mtu to the device.
> > > > One way to do so is by using existing ctrl vq.
> > > 
> > > That will need a new feature bit.
> > > 
> > Yes. ctrl vq can tell what all configuration does it allow. :)
> > Or you prefer feature bit?
> 
> We did feature bits for this in the past.
> 
> > > > If merged buffer is done, and new mtu is > minimum posting size, no need
> > > to undergo vq reset.
> > > > If merged buffer is not done, and buffer posted are smaller than new mtu,
> > > undergo vq reset optionally.
> > > 
> > > This can be done with or without sending mtu to device.
> > Yes, telling mtu to device helps device to optimize and adhere to the spec line " The device MUST NOT pass received packets that exceed mtu" in section 5.1.4.1.
> 
> Again, that line refers to \field{mtu} which is the max mtu supported,
> irrespective to anything driver does.

BTW with any such ctrl vq interface we need to think how cases such as
increasing and decreasing MTU work.  The normal behaviour for linux
drivers is to limit this to when the link is down.  Which reminds me, we
do not have a command to bring link down, either.

> -- 
> MST


---------------------------------------------------------------------
To unsubscribe, e-mail: virtio-dev-unsubscribe@lists.oasis-open.org
For additional commands, e-mail: virtio-dev-help@lists.oasis-open.org


^ permalink raw reply	[flat|nested] 102+ messages in thread

* RE: [virtio-dev] [PATCH] virtio-net: use mtu size as buffer length for big packets
  2022-08-10 16:58                                             ` Michael S. Tsirkin
@ 2022-08-10 17:06                                               ` Parav Pandit
  -1 siblings, 0 replies; 102+ messages in thread
From: Parav Pandit via Virtualization @ 2022-08-10 17:06 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: alexander.h.duyck, Virtio-Dev, kubakici, sridhar.samudrala,
	jesse.brandeburg, Gavi Teitz, virtualization, Hemminger, Stephen,
	loseweigh, davem, Gavin Li


> From: Michael S. Tsirkin <mst@redhat.com>
> Sent: Wednesday, August 10, 2022 12:59 PM
> 
> On Wed, Aug 10, 2022 at 04:22:41PM +0000, Parav Pandit wrote:
> >
> > > From: Michael S. Tsirkin <mst@redhat.com>
> > > Sent: Wednesday, August 10, 2022 12:05 PM
> > >
> > > On Wed, Aug 10, 2022 at 04:00:08PM +0000, Parav Pandit wrote:
> > > >
> > > > > From: Michael S. Tsirkin <mst@redhat.com>
> > > > > Sent: Wednesday, August 10, 2022 5:03 AM
> > > > > > >
> > > > > > > Should we make this depend on the vq reset ability maybe?
> > > > > >
> > > > > > The advantage of this is to keep TX working. Or we can use
> > > > > > device reset as a fallback if there's no vq reset.
> > > > > >
> > > > > > Thanks
> > > > >
> > > > > Device reset is really annoying in that it loses all the state:
> > > > > rx filters etc etc.
> > > >
> > > > The elegant solution is let driver tell the new mtu to the device.
> > > > One way to do so is by using existing ctrl vq.
> > >
> > > That will need a new feature bit.
> > >
> > Yes. ctrl vq can tell what all configuration does it allow. :) Or you
> > prefer feature bit?
> 
> We did feature bits for this in the past.
> 
Ok. Will try to draft the update for future.

Gavin should repost the patch to address comments unrelated to this future bit anyway.
Right?

> > > > If merged buffer is done, and new mtu is > minimum posting size,
> > > > no need
> > > to undergo vq reset.
> > > > If merged buffer is not done, and buffer posted are smaller than
> > > > new mtu,
> > > undergo vq reset optionally.
> > >
> > > This can be done with or without sending mtu to device.
> > Yes, telling mtu to device helps device to optimize and adhere to the spec
> line " The device MUST NOT pass received packets that exceed mtu" in
> section 5.1.4.1.
> 
> Again, that line refers to \field{mtu} which is the max mtu supported,
> irrespective to anything driver does.
> 
> --
> MST

_______________________________________________
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization

^ permalink raw reply	[flat|nested] 102+ messages in thread

* RE: [virtio-dev] [PATCH] virtio-net: use mtu size as buffer length for big packets
@ 2022-08-10 17:06                                               ` Parav Pandit
  0 siblings, 0 replies; 102+ messages in thread
From: Parav Pandit @ 2022-08-10 17:06 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: Jason Wang, Si-Wei Liu, Gavin Li, Hemminger, Stephen, davem,
	virtualization, Virtio-Dev, jesse.brandeburg, alexander.h.duyck,
	kubakici, sridhar.samudrala, loseweigh, Gavi Teitz


> From: Michael S. Tsirkin <mst@redhat.com>
> Sent: Wednesday, August 10, 2022 12:59 PM
> 
> On Wed, Aug 10, 2022 at 04:22:41PM +0000, Parav Pandit wrote:
> >
> > > From: Michael S. Tsirkin <mst@redhat.com>
> > > Sent: Wednesday, August 10, 2022 12:05 PM
> > >
> > > On Wed, Aug 10, 2022 at 04:00:08PM +0000, Parav Pandit wrote:
> > > >
> > > > > From: Michael S. Tsirkin <mst@redhat.com>
> > > > > Sent: Wednesday, August 10, 2022 5:03 AM
> > > > > > >
> > > > > > > Should we make this depend on the vq reset ability maybe?
> > > > > >
> > > > > > The advantage of this is to keep TX working. Or we can use
> > > > > > device reset as a fallback if there's no vq reset.
> > > > > >
> > > > > > Thanks
> > > > >
> > > > > Device reset is really annoying in that it loses all the state:
> > > > > rx filters etc etc.
> > > >
> > > > The elegant solution is let driver tell the new mtu to the device.
> > > > One way to do so is by using existing ctrl vq.
> > >
> > > That will need a new feature bit.
> > >
> > Yes. ctrl vq can tell what all configuration does it allow. :) Or you
> > prefer feature bit?
> 
> We did feature bits for this in the past.
> 
Ok. Will try to draft the update for future.

Gavin should repost the patch to address comments unrelated to this future bit anyway.
Right?

> > > > If merged buffer is done, and new mtu is > minimum posting size,
> > > > no need
> > > to undergo vq reset.
> > > > If merged buffer is not done, and buffer posted are smaller than
> > > > new mtu,
> > > undergo vq reset optionally.
> > >
> > > This can be done with or without sending mtu to device.
> > Yes, telling mtu to device helps device to optimize and adhere to the spec
> line " The device MUST NOT pass received packets that exceed mtu" in
> section 5.1.4.1.
> 
> Again, that line refers to \field{mtu} which is the max mtu supported,
> irrespective to anything driver does.
> 
> --
> MST


---------------------------------------------------------------------
To unsubscribe, e-mail: virtio-dev-unsubscribe@lists.oasis-open.org
For additional commands, e-mail: virtio-dev-help@lists.oasis-open.org


^ permalink raw reply	[flat|nested] 102+ messages in thread

* Re: [virtio-dev] [PATCH] virtio-net: use mtu size as buffer length for big packets
  2022-08-10 17:06                                               ` Parav Pandit
@ 2022-08-10 17:12                                                 ` Michael S. Tsirkin
  -1 siblings, 0 replies; 102+ messages in thread
From: Michael S. Tsirkin @ 2022-08-10 17:12 UTC (permalink / raw)
  To: Parav Pandit
  Cc: alexander.h.duyck, Virtio-Dev, kubakici, sridhar.samudrala,
	jesse.brandeburg, Gavi Teitz, virtualization, Hemminger, Stephen,
	loseweigh, davem, Gavin Li

On Wed, Aug 10, 2022 at 05:06:33PM +0000, Parav Pandit wrote:
> 
> > From: Michael S. Tsirkin <mst@redhat.com>
> > Sent: Wednesday, August 10, 2022 12:59 PM
> > 
> > On Wed, Aug 10, 2022 at 04:22:41PM +0000, Parav Pandit wrote:
> > >
> > > > From: Michael S. Tsirkin <mst@redhat.com>
> > > > Sent: Wednesday, August 10, 2022 12:05 PM
> > > >
> > > > On Wed, Aug 10, 2022 at 04:00:08PM +0000, Parav Pandit wrote:
> > > > >
> > > > > > From: Michael S. Tsirkin <mst@redhat.com>
> > > > > > Sent: Wednesday, August 10, 2022 5:03 AM
> > > > > > > >
> > > > > > > > Should we make this depend on the vq reset ability maybe?
> > > > > > >
> > > > > > > The advantage of this is to keep TX working. Or we can use
> > > > > > > device reset as a fallback if there's no vq reset.
> > > > > > >
> > > > > > > Thanks
> > > > > >
> > > > > > Device reset is really annoying in that it loses all the state:
> > > > > > rx filters etc etc.
> > > > >
> > > > > The elegant solution is let driver tell the new mtu to the device.
> > > > > One way to do so is by using existing ctrl vq.
> > > >
> > > > That will need a new feature bit.
> > > >
> > > Yes. ctrl vq can tell what all configuration does it allow. :) Or you
> > > prefer feature bit?
> > 
> > We did feature bits for this in the past.
> > 
> Ok. Will try to draft the update for future.
> 
> Gavin should repost the patch to address comments unrelated to this future bit anyway.
> Right?

Right.

> > > > > If merged buffer is done, and new mtu is > minimum posting size,
> > > > > no need
> > > > to undergo vq reset.
> > > > > If merged buffer is not done, and buffer posted are smaller than
> > > > > new mtu,
> > > > undergo vq reset optionally.
> > > >
> > > > This can be done with or without sending mtu to device.
> > > Yes, telling mtu to device helps device to optimize and adhere to the spec
> > line " The device MUST NOT pass received packets that exceed mtu" in
> > section 5.1.4.1.
> > 
> > Again, that line refers to \field{mtu} which is the max mtu supported,
> > irrespective to anything driver does.
> > 
> > --
> > MST

_______________________________________________
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization

^ permalink raw reply	[flat|nested] 102+ messages in thread

* Re: [virtio-dev] [PATCH] virtio-net: use mtu size as buffer length for big packets
@ 2022-08-10 17:12                                                 ` Michael S. Tsirkin
  0 siblings, 0 replies; 102+ messages in thread
From: Michael S. Tsirkin @ 2022-08-10 17:12 UTC (permalink / raw)
  To: Parav Pandit
  Cc: Jason Wang, Si-Wei Liu, Gavin Li, Hemminger, Stephen, davem,
	virtualization, Virtio-Dev, jesse.brandeburg, alexander.h.duyck,
	kubakici, sridhar.samudrala, loseweigh, Gavi Teitz

On Wed, Aug 10, 2022 at 05:06:33PM +0000, Parav Pandit wrote:
> 
> > From: Michael S. Tsirkin <mst@redhat.com>
> > Sent: Wednesday, August 10, 2022 12:59 PM
> > 
> > On Wed, Aug 10, 2022 at 04:22:41PM +0000, Parav Pandit wrote:
> > >
> > > > From: Michael S. Tsirkin <mst@redhat.com>
> > > > Sent: Wednesday, August 10, 2022 12:05 PM
> > > >
> > > > On Wed, Aug 10, 2022 at 04:00:08PM +0000, Parav Pandit wrote:
> > > > >
> > > > > > From: Michael S. Tsirkin <mst@redhat.com>
> > > > > > Sent: Wednesday, August 10, 2022 5:03 AM
> > > > > > > >
> > > > > > > > Should we make this depend on the vq reset ability maybe?
> > > > > > >
> > > > > > > The advantage of this is to keep TX working. Or we can use
> > > > > > > device reset as a fallback if there's no vq reset.
> > > > > > >
> > > > > > > Thanks
> > > > > >
> > > > > > Device reset is really annoying in that it loses all the state:
> > > > > > rx filters etc etc.
> > > > >
> > > > > The elegant solution is let driver tell the new mtu to the device.
> > > > > One way to do so is by using existing ctrl vq.
> > > >
> > > > That will need a new feature bit.
> > > >
> > > Yes. ctrl vq can tell what all configuration does it allow. :) Or you
> > > prefer feature bit?
> > 
> > We did feature bits for this in the past.
> > 
> Ok. Will try to draft the update for future.
> 
> Gavin should repost the patch to address comments unrelated to this future bit anyway.
> Right?

Right.

> > > > > If merged buffer is done, and new mtu is > minimum posting size,
> > > > > no need
> > > > to undergo vq reset.
> > > > > If merged buffer is not done, and buffer posted are smaller than
> > > > > new mtu,
> > > > undergo vq reset optionally.
> > > >
> > > > This can be done with or without sending mtu to device.
> > > Yes, telling mtu to device helps device to optimize and adhere to the spec
> > line " The device MUST NOT pass received packets that exceed mtu" in
> > section 5.1.4.1.
> > 
> > Again, that line refers to \field{mtu} which is the max mtu supported,
> > irrespective to anything driver does.
> > 
> > --
> > MST


---------------------------------------------------------------------
To unsubscribe, e-mail: virtio-dev-unsubscribe@lists.oasis-open.org
For additional commands, e-mail: virtio-dev-help@lists.oasis-open.org


^ permalink raw reply	[flat|nested] 102+ messages in thread

* Re: [virtio-dev] [PATCH] virtio-net: use mtu size as buffer length for big packets
  2022-08-10  6:15                                 ` Michael S. Tsirkin
@ 2022-08-11  0:26                                   ` Si-Wei Liu
  -1 siblings, 0 replies; 102+ messages in thread
From: Si-Wei Liu @ 2022-08-11  0:26 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: alexander.h.duyck, Virtio-Dev, kubakici, sridhar.samudrala,
	jesse.brandeburg, Gavi Teitz, virtualization, Hemminger, Stephen,
	loseweigh, davem, Gavin Li



On 8/9/2022 11:15 PM, Michael S. Tsirkin wrote:
> On Wed, Aug 10, 2022 at 02:14:07AM -0400, Michael S. Tsirkin wrote:
>> On Tue, Aug 09, 2022 at 04:24:23PM -0700, Si-Wei Liu wrote:
>>>
>>> On 8/9/2022 3:49 PM, Parav Pandit wrote:
>>>>> From: Michael S. Tsirkin <mst@redhat.com>
>>>>> Sent: Tuesday, August 9, 2022 6:26 PM
>>>>> To: Parav Pandit <parav@nvidia.com>
>>>>> Cc: Si-Wei Liu <si-wei.liu@oracle.com>; Jason Wang
>>>>> <jasowang@redhat.com>; Gavin Li <gavinl@nvidia.com>; Hemminger,
>>>>> Stephen <stephen@networkplumber.org>; davem
>>>>> <davem@davemloft.net>; virtualization <virtualization@lists.linux-
>>>>> foundation.org>; Virtio-Dev <virtio-dev@lists.oasis-open.org>;
>>>>> jesse.brandeburg@intel.com; alexander.h.duyck@intel.com;
>>>>> kubakici@wp.pl; sridhar.samudrala@intel.com; loseweigh@gmail.com; Gavi
>>>>> Teitz <gavi@nvidia.com>
>>>>> Subject: Re: [virtio-dev] [PATCH] virtio-net: use mtu size as buffer length for
>>>>> big packets
>>>>>
>>>>> On Tue, Aug 09, 2022 at 09:49:03PM +0000, Parav Pandit wrote:
>>>>>>> From: Michael S. Tsirkin <mst@redhat.com>
>>>>>>> Sent: Tuesday, August 9, 2022 5:38 PM
>>>>>> [..]
>>>>>>>>> I think virtio-net driver doesn't differentiate MTU and MRU, in
>>>>>>>>> which case the receive buffer will be reduced to fit the 1500B
>>>>>>>>> payload size when mtu is lowered down to 1500 from 9000.
>>>>>>>> How? Driver reduced the mXu to 1500, say it is improved to post
>>>>>>>> buffers of
>>>>>>> 1500 bytes.
>>>>>>>> Device doesn't know about it because mtu in config space is RO field.
>>>>>>>> Device keep dropping 9K packets because buffers posted are 1500
>>>>> bytes.
>>>>>>>> This is because device follows the spec " The device MUST NOT pass
>>>>>>> received packets that exceed mtu".
>>>>>>>
>>>>>>>
>>>>>>> The "mtu" here is the device config field, which is
>>>>>>>
>>>>>>>           /* Default maximum transmit unit advice */
>>>>>>>
>>>>>> It is the field from struct virtio_net_config.mtu. right?
>>>>>> This is RO field for driver.
>>>>>>
>>>>>>> there is no guarantee device will not get a bigger packet.
>>>>>> Right. That is what I also hinted.
>>>>>> Hence, allocating buffers worth upto mtu is safer.
>>>>> yes
>>>>>
>>>>>> When user overrides it, driver can be further optimized to honor such new
>>>>> value on rx buffer posting.
>>>>>
>>>>> no, not without a feature bit promising device won't get wedged.
>>>>>
>>>> I mean to say as_it_stands today, driver can decide to post smaller buffers with larger mtu.
>>>> Why device should be affected with it?
>>>> ( I am not proposing such weird configuration but asking for sake of correctness).
>>> I am also confused how the device can be wedged in this case.
>> Yea sorry. I misunderstood the code. It can't be.
> Here's a problem as I see it. Let's say we reduce mtu.
> Small buffers are added. Now we increase mtu.
> Device will drop all packets until small buffers are consumed.
>
> Should we make this depend on the vq reset ability maybe?
To be honest I am not sure if worth it, very few user changes MTU on the 
fly with traffic ongoing, for the most cases I've seen users just change 
it only once in deployment time. Even if they change it on the fly they 
may need to be aware of the consequence and implication of loss of 
packets. In real devices, mtu change could end up with link status 
change and in that case there's usually no guarantee arrived  packet 
will be kept during the window.

While I could understand this would slightly introduce regression on 
functionality, as the worst case for packet loss if device dropping 
packets, it would be all smaller buffers of the full queue size. For 
correctness and elegance I don't mind introducing specific feature that 
changes MTU, or even relying on vq_reset is fine. device_reset would be 
too overwhelmed for this special use case IMHO.

-Siwei

>
>> -- 
>> MST

_______________________________________________
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization

^ permalink raw reply	[flat|nested] 102+ messages in thread

* Re: [virtio-dev] [PATCH] virtio-net: use mtu size as buffer length for big packets
@ 2022-08-11  0:26                                   ` Si-Wei Liu
  0 siblings, 0 replies; 102+ messages in thread
From: Si-Wei Liu @ 2022-08-11  0:26 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: Parav Pandit, Jason Wang, Gavin Li, Hemminger, Stephen, davem,
	virtualization, Virtio-Dev, jesse.brandeburg, alexander.h.duyck,
	kubakici, sridhar.samudrala, loseweigh, Gavi Teitz



On 8/9/2022 11:15 PM, Michael S. Tsirkin wrote:
> On Wed, Aug 10, 2022 at 02:14:07AM -0400, Michael S. Tsirkin wrote:
>> On Tue, Aug 09, 2022 at 04:24:23PM -0700, Si-Wei Liu wrote:
>>>
>>> On 8/9/2022 3:49 PM, Parav Pandit wrote:
>>>>> From: Michael S. Tsirkin <mst@redhat.com>
>>>>> Sent: Tuesday, August 9, 2022 6:26 PM
>>>>> To: Parav Pandit <parav@nvidia.com>
>>>>> Cc: Si-Wei Liu <si-wei.liu@oracle.com>; Jason Wang
>>>>> <jasowang@redhat.com>; Gavin Li <gavinl@nvidia.com>; Hemminger,
>>>>> Stephen <stephen@networkplumber.org>; davem
>>>>> <davem@davemloft.net>; virtualization <virtualization@lists.linux-
>>>>> foundation.org>; Virtio-Dev <virtio-dev@lists.oasis-open.org>;
>>>>> jesse.brandeburg@intel.com; alexander.h.duyck@intel.com;
>>>>> kubakici@wp.pl; sridhar.samudrala@intel.com; loseweigh@gmail.com; Gavi
>>>>> Teitz <gavi@nvidia.com>
>>>>> Subject: Re: [virtio-dev] [PATCH] virtio-net: use mtu size as buffer length for
>>>>> big packets
>>>>>
>>>>> On Tue, Aug 09, 2022 at 09:49:03PM +0000, Parav Pandit wrote:
>>>>>>> From: Michael S. Tsirkin <mst@redhat.com>
>>>>>>> Sent: Tuesday, August 9, 2022 5:38 PM
>>>>>> [..]
>>>>>>>>> I think virtio-net driver doesn't differentiate MTU and MRU, in
>>>>>>>>> which case the receive buffer will be reduced to fit the 1500B
>>>>>>>>> payload size when mtu is lowered down to 1500 from 9000.
>>>>>>>> How? Driver reduced the mXu to 1500, say it is improved to post
>>>>>>>> buffers of
>>>>>>> 1500 bytes.
>>>>>>>> Device doesn't know about it because mtu in config space is RO field.
>>>>>>>> Device keep dropping 9K packets because buffers posted are 1500
>>>>> bytes.
>>>>>>>> This is because device follows the spec " The device MUST NOT pass
>>>>>>> received packets that exceed mtu".
>>>>>>>
>>>>>>>
>>>>>>> The "mtu" here is the device config field, which is
>>>>>>>
>>>>>>>           /* Default maximum transmit unit advice */
>>>>>>>
>>>>>> It is the field from struct virtio_net_config.mtu. right?
>>>>>> This is RO field for driver.
>>>>>>
>>>>>>> there is no guarantee device will not get a bigger packet.
>>>>>> Right. That is what I also hinted.
>>>>>> Hence, allocating buffers worth upto mtu is safer.
>>>>> yes
>>>>>
>>>>>> When user overrides it, driver can be further optimized to honor such new
>>>>> value on rx buffer posting.
>>>>>
>>>>> no, not without a feature bit promising device won't get wedged.
>>>>>
>>>> I mean to say as_it_stands today, driver can decide to post smaller buffers with larger mtu.
>>>> Why device should be affected with it?
>>>> ( I am not proposing such weird configuration but asking for sake of correctness).
>>> I am also confused how the device can be wedged in this case.
>> Yea sorry. I misunderstood the code. It can't be.
> Here's a problem as I see it. Let's say we reduce mtu.
> Small buffers are added. Now we increase mtu.
> Device will drop all packets until small buffers are consumed.
>
> Should we make this depend on the vq reset ability maybe?
To be honest I am not sure if worth it, very few user changes MTU on the 
fly with traffic ongoing, for the most cases I've seen users just change 
it only once in deployment time. Even if they change it on the fly they 
may need to be aware of the consequence and implication of loss of 
packets. In real devices, mtu change could end up with link status 
change and in that case there's usually no guarantee arrived  packet 
will be kept during the window.

While I could understand this would slightly introduce regression on 
functionality, as the worst case for packet loss if device dropping 
packets, it would be all smaller buffers of the full queue size. For 
correctness and elegance I don't mind introducing specific feature that 
changes MTU, or even relying on vq_reset is fine. device_reset would be 
too overwhelmed for this special use case IMHO.

-Siwei

>
>> -- 
>> MST


---------------------------------------------------------------------
To unsubscribe, e-mail: virtio-dev-unsubscribe@lists.oasis-open.org
For additional commands, e-mail: virtio-dev-help@lists.oasis-open.org


^ permalink raw reply	[flat|nested] 102+ messages in thread

end of thread, other threads:[~2022-08-11  0:26 UTC | newest]

Thread overview: 102+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2022-08-02  4:45 [virtio-dev] [PATCH] virtio-net: use mtu size as buffer length for big packets Gavin Li
2022-08-04  5:00 ` Jason Wang
2022-08-04  5:00   ` Jason Wang
2022-08-04  7:10   ` Michael S. Tsirkin
2022-08-04  7:10     ` Michael S. Tsirkin
2022-08-04  7:23     ` Jason Wang
2022-08-04  7:23       ` Jason Wang
2022-08-04  7:24       ` Jason Wang
2022-08-04  7:24         ` Jason Wang
2022-08-08  6:54         ` Gavin Li
2022-08-08  6:24     ` Gavin Li
2022-08-05 22:11 ` Si-Wei Liu
2022-08-05 22:11   ` Si-Wei Liu
2022-08-05 23:26   ` Si-Wei Liu
2022-08-05 23:26     ` Si-Wei Liu
2022-08-08  7:34     ` Gavin Li
2022-08-08  7:31   ` Gavin Li
2022-08-08 23:56     ` Si-Wei Liu
2022-08-08 23:56       ` Si-Wei Liu
2022-08-09  7:06       ` Gavin Li
2022-08-09  7:44         ` Jason Wang
2022-08-09  7:44           ` Jason Wang
2022-08-09  9:22           ` Michael S. Tsirkin
2022-08-09  9:22             ` Michael S. Tsirkin
2022-08-09  9:28             ` Jason Wang
2022-08-09  9:28               ` Jason Wang
2022-08-09  9:25           ` Michael S. Tsirkin
2022-08-09  9:25             ` Michael S. Tsirkin
2022-08-09  9:40             ` Jason Wang
2022-08-09  9:40               ` Jason Wang
2022-08-09 18:38           ` Si-Wei Liu
2022-08-09 18:38             ` Si-Wei Liu
2022-08-09 18:42             ` Parav Pandit via Virtualization
2022-08-09 18:42               ` Parav Pandit
2022-08-09 19:08               ` Si-Wei Liu
2022-08-09 19:08                 ` Si-Wei Liu
2022-08-09 19:18                 ` Parav Pandit via Virtualization
2022-08-09 19:18                   ` Parav Pandit
2022-08-09 20:32                   ` Si-Wei Liu
2022-08-09 20:32                     ` Si-Wei Liu
2022-08-09 21:13                     ` Parav Pandit via Virtualization
2022-08-09 21:13                       ` Parav Pandit
2022-08-09 21:32                       ` Michael S. Tsirkin
2022-08-09 21:32                         ` Michael S. Tsirkin
2022-08-09 21:37                   ` Michael S. Tsirkin
2022-08-09 21:37                     ` Michael S. Tsirkin
2022-08-09 21:49                     ` Parav Pandit via Virtualization
2022-08-09 21:49                       ` Parav Pandit
2022-08-09 22:25                       ` Michael S. Tsirkin
2022-08-09 22:25                         ` Michael S. Tsirkin
2022-08-09 22:49                         ` Parav Pandit via Virtualization
2022-08-09 22:49                           ` Parav Pandit
2022-08-09 22:59                           ` Michael S. Tsirkin
2022-08-09 22:59                             ` Michael S. Tsirkin
2022-08-09 23:04                           ` Michael S. Tsirkin
2022-08-09 23:04                             ` Michael S. Tsirkin
2022-08-09 23:24                           ` Si-Wei Liu
2022-08-09 23:24                             ` Si-Wei Liu
2022-08-10  6:14                             ` Michael S. Tsirkin
2022-08-10  6:14                               ` Michael S. Tsirkin
2022-08-10  6:15                               ` Michael S. Tsirkin
2022-08-10  6:15                                 ` Michael S. Tsirkin
2022-08-10  6:59                                 ` Jason Wang
2022-08-10  6:59                                   ` Jason Wang
2022-08-10  9:03                                   ` Michael S. Tsirkin
2022-08-10  9:03                                     ` Michael S. Tsirkin
2022-08-10 16:00                                     ` Parav Pandit via Virtualization
2022-08-10 16:00                                       ` Parav Pandit
2022-08-10 16:05                                       ` Michael S. Tsirkin
2022-08-10 16:05                                         ` Michael S. Tsirkin
2022-08-10 16:22                                         ` Parav Pandit via Virtualization
2022-08-10 16:22                                           ` Parav Pandit
2022-08-10 16:58                                           ` Michael S. Tsirkin
2022-08-10 16:58                                             ` Michael S. Tsirkin
2022-08-10 17:02                                             ` Michael S. Tsirkin
2022-08-10 17:02                                               ` Michael S. Tsirkin
2022-08-10 17:06                                             ` Parav Pandit via Virtualization
2022-08-10 17:06                                               ` Parav Pandit
2022-08-10 17:12                                               ` Michael S. Tsirkin
2022-08-10 17:12                                                 ` Michael S. Tsirkin
2022-08-11  0:26                                 ` Si-Wei Liu
2022-08-11  0:26                                   ` Si-Wei Liu
2022-08-09 22:32                     ` Si-Wei Liu
2022-08-09 22:32                       ` Si-Wei Liu
2022-08-09 22:37                       ` Michael S. Tsirkin
2022-08-09 22:37                         ` Michael S. Tsirkin
2022-08-09 22:54                         ` Si-Wei Liu
2022-08-09 22:54                           ` Si-Wei Liu
2022-08-09 23:03                           ` Michael S. Tsirkin
2022-08-09 23:03                             ` Michael S. Tsirkin
2022-08-10  1:24                           ` Jason Wang
2022-08-10  1:24                             ` Jason Wang
2022-08-09 21:34             ` Michael S. Tsirkin
2022-08-09 21:34               ` Michael S. Tsirkin
2022-08-09 21:39               ` Si-Wei Liu
2022-08-09 21:39                 ` Si-Wei Liu
2022-08-09 22:27                 ` Michael S. Tsirkin
2022-08-09 22:27                   ` Michael S. Tsirkin
2022-08-10  1:15             ` Jason Wang
2022-08-10  1:15               ` Jason Wang
2022-08-09 18:06         ` Si-Wei Liu
2022-08-09 18:06           ` Si-Wei Liu

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.