* [PATCH] virtio_net: Use BUG_ON instead of if condition followed by BUG
@ 2021-05-17 13:31 Xianting Tian
  2021-05-17 14:35   ` Andrew Lunn
  2021-05-17 18:43 ` virtio_net: BQL? Dave Taht
  0 siblings, 2 replies; 20+ messages in thread
From: Xianting Tian @ 2021-05-17 13:31 UTC (permalink / raw)
  To: mst, jasowang, davem, kuba; +Cc: virtualization, netdev, linux-kernel

BUG_ON() uses unlikely in if(), which can be optimized at compile time.

Signed-off-by: Xianting Tian <xianting.tian@linux.alibaba.com>
---
  drivers/net/virtio_net.c | 5 ++---
  1 file changed, 2 insertions(+), 3 deletions(-)

diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
index c921ebf3ae82..212d52204884 100644
--- a/drivers/net/virtio_net.c
+++ b/drivers/net/virtio_net.c
@@ -1646,10 +1646,9 @@ static int xmit_skb(struct send_queue *sq, struct sk_buff *skb)
  	else
  		hdr = skb_vnet_hdr(skb);

-	if (virtio_net_hdr_from_skb(skb, &hdr->hdr,
+	BUG_ON(virtio_net_hdr_from_skb(skb, &hdr->hdr,
  				    virtio_is_little_endian(vi->vdev), false,
-				    0))
-		BUG();
+				    0));

  	if (vi->mergeable_rx_bufs)
  		hdr->num_buffers = 0;
-- 
2.17.1


^ permalink raw reply related	[flat|nested] 20+ messages in thread

* Re: [PATCH] virtio_net: Use BUG_ON instead of if condition followed by BUG
  2021-05-17 13:31 [PATCH] virtio_net: Use BUG_ON instead of if condition followed by BUG Xianting Tian
@ 2021-05-17 14:35   ` Andrew Lunn
  2021-05-17 18:43 ` virtio_net: BQL? Dave Taht
  1 sibling, 0 replies; 20+ messages in thread
From: Andrew Lunn @ 2021-05-17 14:35 UTC (permalink / raw)
  To: Xianting Tian
  Cc: mst, jasowang, davem, kuba, virtualization, netdev, linux-kernel

On Mon, May 17, 2021 at 09:31:19PM +0800, Xianting Tian wrote:
> BUG_ON() uses unlikely in if(), which can be optimized at compile time.
> 
> Signed-off-by: Xianting Tian <xianting.tian@linux.alibaba.com>
> ---
>  drivers/net/virtio_net.c | 5 ++---
>  1 file changed, 2 insertions(+), 3 deletions(-)
> 
> diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
> index c921ebf3ae82..212d52204884 100644
> --- a/drivers/net/virtio_net.c
> +++ b/drivers/net/virtio_net.c
> @@ -1646,10 +1646,9 @@ static int xmit_skb(struct send_queue *sq, struct sk_buff *skb)
>  	else
>  		hdr = skb_vnet_hdr(skb);
> 
> -	if (virtio_net_hdr_from_skb(skb, &hdr->hdr,

How fatal is it not being able to get the header from the skb? There
has been push back on the use of BUG() or its variants, since it kills
the machine dead. Would it be possible to turn this into a WARN_ON and
return -EPROTO or something?

       Andrew

^ permalink raw reply	[flat|nested] 20+ messages in thread


* virtio_net: BQL?
  2021-05-17 13:31 [PATCH] virtio_net: Use BUG_ON instead of if condition followed by BUG Xianting Tian
  2021-05-17 14:35   ` Andrew Lunn
@ 2021-05-17 18:43 ` Dave Taht
  2021-05-17 20:23     ` Willem de Bruijn
  2021-05-19  8:35     ` Michael S. Tsirkin
  1 sibling, 2 replies; 20+ messages in thread
From: Dave Taht @ 2021-05-17 18:43 UTC (permalink / raw)
  To: Xianting Tian
  Cc: Michael S. Tsirkin, Jason Wang, David S. Miller, Jakub Kicinski,
	virtualization, Linux Kernel Network Developers, LKML

Not really related to this patch, but is there some reason why virtio
has no support for BQL?

On Mon, May 17, 2021 at 11:41 AM Xianting Tian
<xianting.tian@linux.alibaba.com> wrote:
>
> BUG_ON() uses unlikely in if(), which can be optimized at compile time.
>
> Signed-off-by: Xianting Tian <xianting.tian@linux.alibaba.com>
> ---
>   drivers/net/virtio_net.c | 5 ++---
>   1 file changed, 2 insertions(+), 3 deletions(-)
>
> diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
> index c921ebf3ae82..212d52204884 100644
> --- a/drivers/net/virtio_net.c
> +++ b/drivers/net/virtio_net.c
> @@ -1646,10 +1646,9 @@ static int xmit_skb(struct send_queue *sq, struct sk_buff *skb)
>         else
>                 hdr = skb_vnet_hdr(skb);
>
> -       if (virtio_net_hdr_from_skb(skb, &hdr->hdr,
> +       BUG_ON(virtio_net_hdr_from_skb(skb, &hdr->hdr,
>                                     virtio_is_little_endian(vi->vdev), false,
> -                                   0))
> -               BUG();
> +                                   0));
>
>         if (vi->mergeable_rx_bufs)
>                 hdr->num_buffers = 0;
> --
> 2.17.1
>


-- 
Latest Podcast:
https://www.linkedin.com/feed/update/urn:li:activity:6791014284936785920/

Dave Täht CTO, TekLibre, LLC

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: virtio_net: BQL?
  2021-05-17 18:43 ` virtio_net: BQL? Dave Taht
@ 2021-05-17 20:23     ` Willem de Bruijn
  2021-05-19  8:35     ` Michael S. Tsirkin
  1 sibling, 0 replies; 20+ messages in thread
From: Willem de Bruijn @ 2021-05-17 20:23 UTC (permalink / raw)
  To: Dave Taht
  Cc: Xianting Tian, Michael S. Tsirkin, Jason Wang, David S. Miller,
	Jakub Kicinski, virtualization, Linux Kernel Network Developers,
	LKML

On Mon, May 17, 2021 at 2:44 PM Dave Taht <dave.taht@gmail.com> wrote:
>
> Not really related to this patch, but is there some reason why virtio
> has no support for BQL?

There have been a few attempts to add it over the years.

Most recently, https://lore.kernel.org/lkml/20181205225323.12555-2-mst@redhat.com/

That thread has a long discussion. I think the key open issue remains

"The tricky part is the mode switching between napi and no napi."

> On Mon, May 17, 2021 at 11:41 AM Xianting Tian
> <xianting.tian@linux.alibaba.com> wrote:
> >
> > BUG_ON() uses unlikely in if(), which can be optimized at compile time.
> >
> > Signed-off-by: Xianting Tian <xianting.tian@linux.alibaba.com>
> > ---
> >   drivers/net/virtio_net.c | 5 ++---
> >   1 file changed, 2 insertions(+), 3 deletions(-)
> >
> > diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
> > index c921ebf3ae82..212d52204884 100644
> > --- a/drivers/net/virtio_net.c
> > +++ b/drivers/net/virtio_net.c
> > @@ -1646,10 +1646,9 @@ static int xmit_skb(struct send_queue *sq, struct sk_buff *skb)
> >         else
> >                 hdr = skb_vnet_hdr(skb);
> >
> > -       if (virtio_net_hdr_from_skb(skb, &hdr->hdr,
> > +       BUG_ON(virtio_net_hdr_from_skb(skb, &hdr->hdr,
> >                                     virtio_is_little_endian(vi->vdev), false,
> > -                                   0))
> > -               BUG();
> > +                                   0));
> >
> >         if (vi->mergeable_rx_bufs)
> >                 hdr->num_buffers = 0;
> > --
> > 2.17.1
> >
>
>
> --
> Latest Podcast:
> https://www.linkedin.com/feed/update/urn:li:activity:6791014284936785920/
>
> Dave Täht CTO, TekLibre, LLC

^ permalink raw reply	[flat|nested] 20+ messages in thread


* Re: virtio_net: BQL?
  2021-05-17 20:23     ` Willem de Bruijn
  (?)
@ 2021-05-17 21:48     ` Dave Taht
  2021-05-17 23:00       ` [Bloat] " Stephen Hemminger
  2021-05-24  2:53         ` Jason Wang
  -1 siblings, 2 replies; 20+ messages in thread
From: Dave Taht @ 2021-05-17 21:48 UTC (permalink / raw)
  To: Willem de Bruijn
  Cc: Xianting Tian, Michael S. Tsirkin, Jason Wang, David S. Miller,
	Jakub Kicinski, virtualization, Linux Kernel Network Developers,
	LKML, bloat

On Mon, May 17, 2021 at 1:23 PM Willem de Bruijn
<willemdebruijn.kernel@gmail.com> wrote:
>
> On Mon, May 17, 2021 at 2:44 PM Dave Taht <dave.taht@gmail.com> wrote:
> >
> > Not really related to this patch, but is there some reason why virtio
> > has no support for BQL?
>
> There have been a few attempts to add it over the years.
>
> Most recently, https://lore.kernel.org/lkml/20181205225323.12555-2-mst@redhat.com/
>
> That thread has a long discussion. I think the key open issue remains
>
> "The tricky part is the mode switching between napi and no napi."

Oy, vey.

I didn't pay any attention to that discussion, sadly enough.

It's been about that long (2018) since I paid any attention to
bufferbloat in the cloud and my cloudy provider (linode) switched to
using virtio when I wasn't looking. For over a year now, I'd been
getting reports saying that comcast's pie rollout wasn't working as
well as expected, that evenroute's implementation of sch_cake and sqm
on inbound wasn't working right, nor pf_sense's and numerous other
issues at Internet scale.

Last week I ran a string of benchmarks against starlink's new services
and was really aghast at what I found there, too. But the problem
seemed to run deeper than just the dishy...

Without BQL, there's no backpressure for fq_codel to do its thing.
None. My measurement servers aren't FQ-codeling
no matter how much load I put on them. Since that qdisc is the default
now in most linux distributions, I imagine that the bulk of the cloud
is now behaving as erratically as linux was in 2011 with enormous
swings in throughput and latency from GSO/TSO hitting overlarge rx/tx
rings, [1], breaking various rate estimators in codel, pie and the tcp
stack itself.

See:

http://fremont.starlink.taht.net/~d/virtio_nobql/rrul_-_evenroute_v3_server_fq_codel.png

See the swings in latency there? That's symptomatic of tx/rx rings
filling and emptying.

It wasn't until I switched my measurement server temporarily over to
sch_fq that I got a rrul result that was close to the results we used
to get from the virtualized e1000e drivers we were using in 2014.

http://fremont.starlink.taht.net/~d/virtio_nobql/rrul_-_evenroute_v3_server_fq.png

While I have long supported the use of sch_fq for tcp-heavy workloads,
it still behaves better with bql in place, and fq_codel is better for
generic workloads... but needs bql based backpressure to kick in.

[1] I really hope I'm overreacting but, um, er, could someone(s) spin
up a new patch that does bql in some way even half right for this
driver and help test it? I haven't built a kernel in a while.


> > On Mon, May 17, 2021 at 11:41 AM Xianting Tian
> > <xianting.tian@linux.alibaba.com> wrote:
> > >
> > > BUG_ON() uses unlikely in if(), which can be optimized at compile time.
> > >
> > > Signed-off-by: Xianting Tian <xianting.tian@linux.alibaba.com>
> > > ---
> > >   drivers/net/virtio_net.c | 5 ++---
> > >   1 file changed, 2 insertions(+), 3 deletions(-)
> > >
> > > diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
> > > index c921ebf3ae82..212d52204884 100644
> > > --- a/drivers/net/virtio_net.c
> > > +++ b/drivers/net/virtio_net.c
> > > @@ -1646,10 +1646,9 @@ static int xmit_skb(struct send_queue *sq, struct sk_buff *skb)
> > >         else
> > >                 hdr = skb_vnet_hdr(skb);
> > >
> > > -       if (virtio_net_hdr_from_skb(skb, &hdr->hdr,
> > > +       BUG_ON(virtio_net_hdr_from_skb(skb, &hdr->hdr,
> > >                                     virtio_is_little_endian(vi->vdev), false,
> > > -                                   0))
> > > -               BUG();
> > > +                                   0));
> > >
> > >         if (vi->mergeable_rx_bufs)
> > >                 hdr->num_buffers = 0;
> > > --
> > > 2.17.1
> > >
> >
> >
> > --
> > Latest Podcast:
> > https://www.linkedin.com/feed/update/urn:li:activity:6791014284936785920/
> >
> > Dave Täht CTO, TekLibre, LLC



--
Latest Podcast:
https://www.linkedin.com/feed/update/urn:li:activity:6791014284936785920/

Dave Täht CTO, TekLibre, LLC

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [Bloat] virtio_net: BQL?
  2021-05-17 21:48     ` Dave Taht
@ 2021-05-17 23:00       ` Stephen Hemminger
  2021-05-17 23:32         ` Dave Taht
  2021-05-19 14:31           ` Eric Dumazet
  2021-05-24  2:53         ` Jason Wang
  1 sibling, 2 replies; 20+ messages in thread
From: Stephen Hemminger @ 2021-05-17 23:00 UTC (permalink / raw)
  To: Dave Taht
  Cc: Willem de Bruijn, Michael S. Tsirkin, Xianting Tian,
	Linux Kernel Network Developers, LKML, virtualization, bloat,
	Jakub Kicinski, David S. Miller

On Mon, 17 May 2021 14:48:46 -0700
Dave Taht <dave.taht@gmail.com> wrote:

> On Mon, May 17, 2021 at 1:23 PM Willem de Bruijn
> <willemdebruijn.kernel@gmail.com> wrote:
> >
> > On Mon, May 17, 2021 at 2:44 PM Dave Taht <dave.taht@gmail.com> wrote:  
> > >
> > > Not really related to this patch, but is there some reason why virtio
> > > has no support for BQL?  
> >
> > There have been a few attempts to add it over the years.
> >
> > Most recently, https://lore.kernel.org/lkml/20181205225323.12555-2-mst@redhat.com/
> >
> > That thread has a long discussion. I think the key open issue remains
> >
> > "The tricky part is the mode switching between napi and no napi."  
> 
> Oy, vey.
> 
> I didn't pay any attention to that discussion, sadly enough.
> 
> It's been about that long (2018) since I paid any attention to
> bufferbloat in the cloud and my cloudy provider (linode) switched to
> using virtio when I wasn't looking. For over a year now, I'd been
> getting reports saying that comcast's pie rollout wasn't working as
> well as expected, that evenroute's implementation of sch_cake and sqm
> on inbound wasn't working right, nor pf_sense's and numerous other
> issues at Internet scale.
> 
> Last week I ran a string of benchmarks against starlink's new services
> and was really aghast at what I found there, too. but the problem
> seemed deeper than in just the dishy...
> 
> Without BQL, there's no backpressure for fq_codel to do its thing.
> None. My measurement servers aren't FQ-codeling
> no matter how much load I put on them. Since that qdisc is the default
> now in most linux distributions, I imagine that the bulk of the cloud
> is now behaving as erratically as linux was in 2011 with enormous
> swings in throughput and latency from GSO/TSO hitting overlarge rx/tx
> rings, [1], breaking various rate estimators in codel, pie and the tcp
> stack itself.
> 
> See:
> 
> http://fremont.starlink.taht.net/~d/virtio_nobql/rrul_-_evenroute_v3_server_fq_codel.png
> 
> See the swings in latency there? that's symptomatic of tx/rx rings
> filling and emptying.
> 
> it wasn't until I switched my measurement server temporarily over to
> sch_fq that I got a rrul result that was close to the results we used
> to get from the virtualized e1000e drivers we were using in 2014.
> 
> http://fremont.starlink.taht.net/~d/virtio_nobql/rrul_-_evenroute_v3_server_fq.png
> 
> While I have long supported the use of sch_fq for tcp-heavy workloads,
> it still behaves better with bql in place, and fq_codel is better for
> generic workloads... but needs bql based backpressure to kick in.
> 
> [1] I really hope I'm overreacting but, um, er, could someone(s) spin
> up a new patch that does bql in some way even half right for this
> driver and help test it? I haven't built a kernel in a while.
> 

The Azure network driver (netvsc) also does not have BQL. Several years ago
I tried adding it but it benchmarked worse and there is the added complexity
of handling the accelerated networking VF path.


^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [Bloat] virtio_net: BQL?
  2021-05-17 23:00       ` [Bloat] " Stephen Hemminger
@ 2021-05-17 23:32         ` Dave Taht
  2021-05-18  2:48           ` Stephen Hemminger
  2021-05-19 14:31           ` Eric Dumazet
  1 sibling, 1 reply; 20+ messages in thread
From: Dave Taht @ 2021-05-17 23:32 UTC (permalink / raw)
  To: Stephen Hemminger
  Cc: Willem de Bruijn, Michael S. Tsirkin, Xianting Tian,
	Linux Kernel Network Developers, LKML, virtualization, bloat,
	Jakub Kicinski, David S. Miller

On Mon, May 17, 2021 at 4:00 PM Stephen Hemminger
<stephen@networkplumber.org> wrote:
>
> On Mon, 17 May 2021 14:48:46 -0700
> Dave Taht <dave.taht@gmail.com> wrote:
>
> > On Mon, May 17, 2021 at 1:23 PM Willem de Bruijn
> > <willemdebruijn.kernel@gmail.com> wrote:
> > >
> > > On Mon, May 17, 2021 at 2:44 PM Dave Taht <dave.taht@gmail.com> wrote:
> > > >
> > > > Not really related to this patch, but is there some reason why virtio
> > > > has no support for BQL?
> > >
> > > There have been a few attempts to add it over the years.
> > >
> > > Most recently, https://lore.kernel.org/lkml/20181205225323.12555-2-mst@redhat.com/
> > >
> > > That thread has a long discussion. I think the key open issue remains
> > >
> > > "The tricky part is the mode switching between napi and no napi."
> >
> > Oy, vey.
> >
> > I didn't pay any attention to that discussion, sadly enough.
> >
> > It's been about that long (2018) since I paid any attention to
> > bufferbloat in the cloud and my cloudy provider (linode) switched to
> > using virtio when I wasn't looking. For over a year now, I'd been
> > getting reports saying that comcast's pie rollout wasn't working as
> > well as expected, that evenroute's implementation of sch_cake and sqm
> > on inbound wasn't working right, nor pf_sense's and numerous other
> > issues at Internet scale.
> >
> > Last week I ran a string of benchmarks against starlink's new services
> > and was really aghast at what I found there, too. but the problem
> > seemed deeper than in just the dishy...
> >
> > Without BQL, there's no backpressure for fq_codel to do its thing.
> > None. My measurement servers aren't FQ-codeling
> > no matter how much load I put on them. Since that qdisc is the default
> > now in most linux distributions, I imagine that the bulk of the cloud
> > is now behaving as erratically as linux was in 2011 with enormous
> > swings in throughput and latency from GSO/TSO hitting overlarge rx/tx
> > rings, [1], breaking various rate estimators in codel, pie and the tcp
> > stack itself.
> >
> > See:
> >
> > http://fremont.starlink.taht.net/~d/virtio_nobql/rrul_-_evenroute_v3_server_fq_codel.png
> >
> > See the swings in latency there? that's symptomatic of tx/rx rings
> > filling and emptying.
> >
> > it wasn't until I switched my measurement server temporarily over to
> > sch_fq that I got a rrul result that was close to the results we used
> > to get from the virtualized e1000e drivers we were using in 2014.
> >
> > http://fremont.starlink.taht.net/~d/virtio_nobql/rrul_-_evenroute_v3_server_fq.png
> >
> > While I have long supported the use of sch_fq for tcp-heavy workloads,
> > it still behaves better with bql in place, and fq_codel is better for
> > generic workloads... but needs bql based backpressure to kick in.
> >
> > [1] I really hope I'm overreacting but, um, er, could someone(s) spin
> > up a new patch that does bql in some way even half right for this
> > driver and help test it? I haven't built a kernel in a while.
> >
>
> The Azure network driver (netvsc) also does not have BQL. Several years ago
> I tried adding it but it benchmarked worse and there is the added complexity
> of handling the accelerated networking VF path.

I certainly agree it adds complexity, but the question is what sort of
network behavior resulted without backpressure inside the
vm?

What sorts of benchmarks did you do?

I will get setup to do some testing of this that is less adhoc.


-- 
Latest Podcast:
https://www.linkedin.com/feed/update/urn:li:activity:6791014284936785920/

Dave Täht CTO, TekLibre, LLC

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [Bloat] virtio_net: BQL?
  2021-05-17 23:32         ` Dave Taht
@ 2021-05-18  2:48           ` Stephen Hemminger
  0 siblings, 0 replies; 20+ messages in thread
From: Stephen Hemminger @ 2021-05-18  2:48 UTC (permalink / raw)
  To: Dave Taht
  Cc: Willem de Bruijn, Michael S. Tsirkin, Xianting Tian,
	Linux Kernel Network Developers, LKML, virtualization, bloat,
	Jakub Kicinski, David S. Miller

On Mon, 17 May 2021 16:32:21 -0700
Dave Taht <dave.taht@gmail.com> wrote:

> On Mon, May 17, 2021 at 4:00 PM Stephen Hemminger
> <stephen@networkplumber.org> wrote:
> >
> > On Mon, 17 May 2021 14:48:46 -0700
> > Dave Taht <dave.taht@gmail.com> wrote:
> >  
> > > On Mon, May 17, 2021 at 1:23 PM Willem de Bruijn
> > > <willemdebruijn.kernel@gmail.com> wrote:  
> > > >
> > > > On Mon, May 17, 2021 at 2:44 PM Dave Taht <dave.taht@gmail.com> wrote:  
> > > > >
> > > > > Not really related to this patch, but is there some reason why virtio
> > > > > has no support for BQL?  
> > > >
> > > > There have been a few attempts to add it over the years.
> > > >
> > > > Most recently, https://lore.kernel.org/lkml/20181205225323.12555-2-mst@redhat.com/
> > > >
> > > > That thread has a long discussion. I think the key open issue remains
> > > >
> > > > "The tricky part is the mode switching between napi and no napi."  
> > >
> > > Oy, vey.
> > >
> > > I didn't pay any attention to that discussion, sadly enough.
> > >
> > > It's been about that long (2018) since I paid any attention to
> > > bufferbloat in the cloud and my cloudy provider (linode) switched to
> > > using virtio when I wasn't looking. For over a year now, I'd been
> > > getting reports saying that comcast's pie rollout wasn't working as
> > > well as expected, that evenroute's implementation of sch_cake and sqm
> > > on inbound wasn't working right, nor pf_sense's and numerous other
> > > issues at Internet scale.
> > >
> > > Last week I ran a string of benchmarks against starlink's new services
> > > and was really aghast at what I found there, too. but the problem
> > > seemed deeper than in just the dishy...
> > >
> > > Without BQL, there's no backpressure for fq_codel to do its thing.
> > > None. My measurement servers aren't FQ-codeling
> > > no matter how much load I put on them. Since that qdisc is the default
> > > now in most linux distributions, I imagine that the bulk of the cloud
> > > is now behaving as erratically as linux was in 2011 with enormous
> > > swings in throughput and latency from GSO/TSO hitting overlarge rx/tx
> > > rings, [1], breaking various rate estimators in codel, pie and the tcp
> > > stack itself.
> > >
> > > See:
> > >
> > > http://fremont.starlink.taht.net/~d/virtio_nobql/rrul_-_evenroute_v3_server_fq_codel.png
> > >
> > > See the swings in latency there? that's symptomatic of tx/rx rings
> > > filling and emptying.
> > >
> > > it wasn't until I switched my measurement server temporarily over to
> > > sch_fq that I got a rrul result that was close to the results we used
> > > to get from the virtualized e1000e drivers we were using in 2014.
> > >
> > > http://fremont.starlink.taht.net/~d/virtio_nobql/rrul_-_evenroute_v3_server_fq.png
> > >
> > > While I have long supported the use of sch_fq for tcp-heavy workloads,
> > > it still behaves better with bql in place, and fq_codel is better for
> > > generic workloads... but needs bql based backpressure to kick in.
> > >
> > > [1] I really hope I'm overreacting but, um, er, could someone(s) spin
> > > up a new patch that does bql in some way even half right for this
> > > driver and help test it? I haven't built a kernel in a while.
> > >  
> >
> > The Azure network driver (netvsc) also does not have BQL. Several years ago
> > I tried adding it but it benchmarked worse and there is the added complexity
> > of handling the accelerated networking VF path.  
> 
> I certainly agree it adds complexity, but the question is what sort of
> network behavior resulted without backpressure inside the
> vm?
> 
> What sorts of benchmarks did you do?
> 
> I will get setup to do some testing of this that is less adhoc.

Less of an issue than it seems for most users.

For the most common case, all transmits are passed through to the underlying
VF network device (Mellanox). So since Mellanox supports BQL, that works.
The special case is if accelerated networking is disabled or host is being
serviced and the slow path is used. Optimizing the slow path is not that
interesting.

I wonder if the use of SRIOV with virtio (which requires another layer
with the failover device) behaves the same way?

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH] virtio_net: Use BUG_ON instead of if condition followed by BUG
  2021-05-17 14:35   ` Andrew Lunn
  (?)
@ 2021-05-18  9:20   ` Xianting Tian
  -1 siblings, 0 replies; 20+ messages in thread
From: Xianting Tian @ 2021-05-18  9:20 UTC (permalink / raw)
  To: Andrew Lunn
  Cc: mst, jasowang, davem, kuba, virtualization, netdev, linux-kernel

Thanks for your comments.
It is a good idea; I think we can follow the similar logic used in the
function 'receive_buf':
	if (virtio_net_hdr_to_skb(skb, &hdr->hdr,
				  virtio_is_little_endian(vi->vdev))) {
		net_warn_ratelimited("%s: bad gso: type: %u, size:%u\n",
				     dev->name, hdr->hdr.gso_type,
				     hdr->hdr.gso_size);
		goto frame_err;
	}

I will submit a new patch later.


On 2021/5/17 10:35 PM, Andrew Lunn wrote:
> On Mon, May 17, 2021 at 09:31:19PM +0800, Xianting Tian wrote:
>> BUG_ON() uses unlikely in if(), which can be optimized at compile time.
>>
>> Signed-off-by: Xianting Tian <xianting.tian@linux.alibaba.com>
>> ---
>>   drivers/net/virtio_net.c | 5 ++---
>>   1 file changed, 2 insertions(+), 3 deletions(-)
>>
>> diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
>> index c921ebf3ae82..212d52204884 100644
>> --- a/drivers/net/virtio_net.c
>> +++ b/drivers/net/virtio_net.c
>> @@ -1646,10 +1646,9 @@ static int xmit_skb(struct send_queue *sq, struct sk_buff *skb)
>>   	else
>>   		hdr = skb_vnet_hdr(skb);
>>
>> -	if (virtio_net_hdr_from_skb(skb, &hdr->hdr,
> 
> How fatal is it not being able to get the header from the skb? There
> has been push back on the use of BUG() or its variants, since it kills
> the machine dead. Would it be possible to turn this into a WARN_ON and
> return -EPROTO or something?
> 
>         Andrew
> 

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: virtio_net: BQL?
  2021-05-17 18:43 ` virtio_net: BQL? Dave Taht
@ 2021-05-19  8:35     ` Michael S. Tsirkin
  2021-05-19  8:35     ` Michael S. Tsirkin
  1 sibling, 0 replies; 20+ messages in thread
From: Michael S. Tsirkin @ 2021-05-19  8:35 UTC (permalink / raw)
  To: Dave Taht
  Cc: Xianting Tian, Jason Wang, David S. Miller, Jakub Kicinski,
	virtualization, Linux Kernel Network Developers, LKML

On Mon, May 17, 2021 at 11:43:43AM -0700, Dave Taht wrote:
> Not really related to this patch, but is there some reason why virtio
> has no support for BQL?

So just so you can try it out, I rebased my old patch.
XDP is handled incorrectly by it so we shouldn't apply it as is,
but it should be good enough for you to see whether it helps.

* Re: virtio_net: BQL?
@ 2021-05-19  8:35     ` Michael S. Tsirkin
  0 siblings, 0 replies; 20+ messages in thread
From: Michael S. Tsirkin @ 2021-05-19  8:35 UTC (permalink / raw)
  To: Dave Taht
  Cc: Xianting Tian, Linux Kernel Network Developers, LKML,
	virtualization, Jakub Kicinski, David S. Miller

On Mon, May 17, 2021 at 11:43:43AM -0700, Dave Taht wrote:
> Not really related to this patch, but is there some reason why virtio
> has no support for BQL?

So just so you can try it out, I rebased my old patch.
XDP is handled incorrectly by it so we shouldn't apply it as is,
but it should be good enough for you to see whether it helps.
Completely untested!

Signed-off-by: Michael S. Tsirkin <mst@redhat.com>



diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
index 7be93ca01650..4bfb682a20b2 100644
--- a/drivers/net/virtio_net.c
+++ b/drivers/net/virtio_net.c
@@ -556,6 +556,7 @@ static int virtnet_xdp_xmit(struct net_device *dev,
 			kicks = 1;
 	}
 out:
+	/* TODO: netdev_tx_completed_queue? */
 	u64_stats_update_begin(&sq->stats.syncp);
 	sq->stats.bytes += bytes;
 	sq->stats.packets += packets;
@@ -1376,7 +1377,7 @@ static int virtnet_receive(struct receive_queue *rq, int budget,
 	return stats.packets;
 }
 
-static void free_old_xmit_skbs(struct send_queue *sq, bool in_napi)
+static void free_old_xmit_skbs(struct netdev_queue *txq, struct send_queue *sq, bool in_napi)
 {
 	unsigned int len;
 	unsigned int packets = 0;
@@ -1406,6 +1407,8 @@ static void free_old_xmit_skbs(struct send_queue *sq, bool in_napi)
 	if (!packets)
 		return;
 
+	netdev_tx_completed_queue(txq, packets, bytes);
+
 	u64_stats_update_begin(&sq->stats.syncp);
 	sq->stats.bytes += bytes;
 	sq->stats.packets += packets;
@@ -1434,7 +1437,7 @@ static void virtnet_poll_cleantx(struct receive_queue *rq)
 
 	if (__netif_tx_trylock(txq)) {
 		virtqueue_disable_cb(sq->vq);
-		free_old_xmit_skbs(sq, true);
+		free_old_xmit_skbs(txq, sq, true);
 
 		if (sq->vq->num_free >= 2 + MAX_SKB_FRAGS)
 			netif_tx_wake_queue(txq);
@@ -1522,7 +1525,7 @@ static int virtnet_poll_tx(struct napi_struct *napi, int budget)
 	txq = netdev_get_tx_queue(vi->dev, index);
 	__netif_tx_lock(txq, raw_smp_processor_id());
 	virtqueue_disable_cb(sq->vq);
-	free_old_xmit_skbs(sq, true);
+	free_old_xmit_skbs(txq, sq, true);
 
 	if (sq->vq->num_free >= 2 + MAX_SKB_FRAGS)
 		netif_tx_wake_queue(txq);
@@ -1606,10 +1609,11 @@ static netdev_tx_t start_xmit(struct sk_buff *skb, struct net_device *dev)
 	struct netdev_queue *txq = netdev_get_tx_queue(dev, qnum);
 	bool kick = !netdev_xmit_more();
 	bool use_napi = sq->napi.weight;
+	unsigned int bytes = skb->len;
 
 	/* Free up any pending old buffers before queueing new ones. */
 	virtqueue_disable_cb(sq->vq);
-	free_old_xmit_skbs(sq, false);
+	free_old_xmit_skbs(txq, sq, false);
 
 	if (use_napi && kick)
 		virtqueue_enable_cb_delayed(sq->vq);
@@ -1638,6 +1642,8 @@ static netdev_tx_t start_xmit(struct sk_buff *skb, struct net_device *dev)
 		nf_reset_ct(skb);
 	}
 
+	netdev_tx_sent_queue(txq, bytes);
+
 	/* If running out of space, stop queue to avoid getting packets that we
 	 * are then unable to transmit.
 	 * An alternative would be to force queuing layer to requeue the skb by
@@ -1653,7 +1659,7 @@ static netdev_tx_t start_xmit(struct sk_buff *skb, struct net_device *dev)
 		if (!use_napi &&
 		    unlikely(!virtqueue_enable_cb_delayed(sq->vq))) {
 			/* More just got used, free them then recheck. */
-			free_old_xmit_skbs(sq, false);
+			free_old_xmit_skbs(txq, sq, false);
 			if (sq->vq->num_free >= 2+MAX_SKB_FRAGS) {
 				netif_start_subqueue(dev, qnum);
 				virtqueue_disable_cb(sq->vq);

_______________________________________________
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization

^ permalink raw reply related	[flat|nested] 20+ messages in thread

* Re: [Bloat] virtio_net: BQL?
  2021-05-17 23:00       ` [Bloat] " Stephen Hemminger
@ 2021-05-19 14:31           ` Eric Dumazet
  2021-05-19 14:31           ` Eric Dumazet
  1 sibling, 0 replies; 20+ messages in thread
From: Eric Dumazet @ 2021-05-19 14:31 UTC (permalink / raw)
  To: Stephen Hemminger, Dave Taht
  Cc: Willem de Bruijn, Michael S. Tsirkin, Xianting Tian,
	Linux Kernel Network Developers, LKML, virtualization, bloat,
	Jakub Kicinski, David S. Miller



On 5/18/21 1:00 AM, Stephen Hemminger wrote:

> 
> The Azure network driver (netvsc) also does not have BQL. Several years ago
> I tried adding it but it benchmarked worse and there is the added complexity
> of handling the accelerated networking VF path.
> 


Note that NICs with many TX queues make BQL almost useless, only adding extra
overhead.

We should probably make BQL something that can be manually turned on/off.

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [Bloat] virtio_net: BQL?
  2021-05-19 14:31           ` Eric Dumazet
  (?)
@ 2021-05-19 22:53           ` Jakub Kicinski
  -1 siblings, 0 replies; 20+ messages in thread
From: Jakub Kicinski @ 2021-05-19 22:53 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: Stephen Hemminger, Dave Taht, Willem de Bruijn,
	Michael S. Tsirkin, Xianting Tian,
	Linux Kernel Network Developers, LKML, virtualization, bloat,
	David S. Miller

On Wed, 19 May 2021 16:31:10 +0200 Eric Dumazet wrote:
> On 5/18/21 1:00 AM, Stephen Hemminger wrote:
> > The Azure network driver (netvsc) also does not have BQL. Several years ago
> > I tried adding it but it benchmarked worse and there is the added complexity
> > of handling the accelerated networking VF path.
> 
> Note that NICs with many TX queues make BQL almost useless, only adding extra
> overhead.
> 
> We should probably make BQL something that can be manually turned on/off.

Ah, I've been pondering this. Are you thinking of a bit in
dev_queue->state? Not perfect, because with a careful driver design 
one can avoid most dev_queue accesses from the completion path.
It's still much better than recompiling the kernel to set BQL=n, tho.

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: virtio_net: BQL?
  2021-05-17 21:48     ` Dave Taht
@ 2021-05-24  2:53         ` Jason Wang
  2021-05-24  2:53         ` Jason Wang
  1 sibling, 0 replies; 20+ messages in thread
From: Jason Wang @ 2021-05-24  2:53 UTC (permalink / raw)
  To: Dave Taht, Willem de Bruijn
  Cc: Xianting Tian, Michael S. Tsirkin, David S. Miller,
	Jakub Kicinski, virtualization, Linux Kernel Network Developers,
	LKML, bloat


On 2021/5/18 5:48 AM, Dave Taht wrote:
> On Mon, May 17, 2021 at 1:23 PM Willem de Bruijn
> <willemdebruijn.kernel@gmail.com> wrote:
>> On Mon, May 17, 2021 at 2:44 PM Dave Taht <dave.taht@gmail.com> wrote:
>>> Not really related to this patch, but is there some reason why virtio
>>> has no support for BQL?
>> There have been a few attempts to add it over the years.
>>
>> Most recently, https://lore.kernel.org/lkml/20181205225323.12555-2-mst@redhat.com/
>>
>> That thread has a long discussion. I think the key open issue remains
>>
>> "The tricky part is the mode switching between napi and no napi."
> Oy, vey.
>
> I didn't pay any attention to that discussion, sadly enough.
>
> It's been about that long (2018) since I paid any attention to
> bufferbloat in the cloud and my cloudy provider (linode) switched to
> using virtio when I wasn't looking. For over a year now, I'd been
> getting reports saying that comcast's pie rollout wasn't working as
> well as expected, that evenroute's implementation of sch_cake and sqm
> on inbound wasn't working right, nor pf_sense's and numerous other
> issues at Internet scale.
>
> Last week I ran a string of benchmarks against starlink's new services
> and was really aghast at what I found there, too. but the problem
> seemed deeper than in just the dishy...
>
> Without BQL, there's no backpressure for fq_codel to do its thing.
> None. My measurement servers aren't FQ-codeling
> no matter how much load I put on them. Since that qdisc is the default
> now in most linux distributions, I imagine that the bulk of the cloud
> is now behaving as erratically as linux was in 2011 with enormous
> swings in throughput and latency from GSO/TSO hitting overlarge rx/tx
> rings, [1], breaking various rate estimators in codel, pie and the tcp
> stack itself.
>
> See:
>
> http://fremont.starlink.taht.net/~d/virtio_nobql/rrul_-_evenroute_v3_server_fq_codel.png
>
> See the swings in latency there? that's symptomatic of tx/rx rings
> filling and emptying.
>
> it wasn't until I switched my measurement server temporarily over to
> sch_fq that I got a rrul result that was close to the results we used
> to get from the virtualized e1000e drivers we were using in 2014.
>
> http://fremont.starlink.taht.net/~d/virtio_nobql/rrul_-_evenroute_v3_server_fq.png
>
> While I have long supported the use of sch_fq for tcp-heavy workloads,
> it still behaves better with bql in place, and fq_codel is better for
> generic workloads... but needs bql based backpressure to kick in.
>
> [1] I really hope I'm overreacting but, um, er, could someone(s) spin
> up a new patch that does bql in some way even half right for this
> driver and help test it? I haven't built a kernel in a while.


I think it's time to obsolete skb_orphan() for virtio-net to get rid of
a bunch of tricky code in the current virtio-net driver.

Then we can do BQL on top.

I will prepare some patches to do this (probably with Michael's BQL patch).

Thanks


>
>
>>> On Mon, May 17, 2021 at 11:41 AM Xianting Tian
>>> <xianting.tian@linux.alibaba.com> wrote:
>>>> BUG_ON() uses unlikely in if(), which can be optimized at compile time.
>>>>
>>>> Signed-off-by: Xianting Tian <xianting.tian@linux.alibaba.com>
>>>> ---
>>>>    drivers/net/virtio_net.c | 5 ++---
>>>>    1 file changed, 2 insertions(+), 3 deletions(-)
>>>>
>>>> diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
>>>> index c921ebf3ae82..212d52204884 100644
>>>> --- a/drivers/net/virtio_net.c
>>>> +++ b/drivers/net/virtio_net.c
>>>> @@ -1646,10 +1646,9 @@ static int xmit_skb(struct send_queue *sq, struct
>>>> sk_buff *skb)
>>>>          else
>>>>                  hdr = skb_vnet_hdr(skb);
>>>>
>>>> -       if (virtio_net_hdr_from_skb(skb, &hdr->hdr,
>>>> +       BUG_ON(virtio_net_hdr_from_skb(skb, &hdr->hdr,
>>>>                                      virtio_is_little_endian(vi->vdev), false,
>>>> -                                   0))
>>>> -               BUG();
>>>> +                                   0));
>>>>
>>>>          if (vi->mergeable_rx_bufs)
>>>>                  hdr->num_buffers = 0;
>>>> --
>>>> 2.17.1
>>>>
>>>
>>> --
>>> Latest Podcast:
>>> https://www.linkedin.com/feed/update/urn:li:activity:6791014284936785920/
>>>
>>> Dave Täht CTO, TekLibre, LLC
>
>
> --
> Latest Podcast:
> https://www.linkedin.com/feed/update/urn:li:activity:6791014284936785920/
>
> Dave Täht CTO, TekLibre, LLC
>


^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: virtio_net: BQL?
  2021-05-24  2:53         ` Jason Wang
@ 2021-05-24  9:10           ` Michael S. Tsirkin
  -1 siblings, 0 replies; 20+ messages in thread
From: Michael S. Tsirkin @ 2021-05-24  9:10 UTC (permalink / raw)
  To: Jason Wang
  Cc: Dave Taht, Willem de Bruijn, Xianting Tian, David S. Miller,
	Jakub Kicinski, virtualization, Linux Kernel Network Developers,
	LKML, bloat

On Mon, May 24, 2021 at 10:53:08AM +0800, Jason Wang wrote:
> 
> On 2021/5/18 5:48 AM, Dave Taht wrote:
> > On Mon, May 17, 2021 at 1:23 PM Willem de Bruijn
> > <willemdebruijn.kernel@gmail.com> wrote:
> > > On Mon, May 17, 2021 at 2:44 PM Dave Taht <dave.taht@gmail.com> wrote:
> > > > Not really related to this patch, but is there some reason why virtio
> > > > has no support for BQL?
> > > There have been a few attempts to add it over the years.
> > > 
> > > Most recently, https://lore.kernel.org/lkml/20181205225323.12555-2-mst@redhat.com/
> > > 
> > > That thread has a long discussion. I think the key open issue remains
> > > 
> > > "The tricky part is the mode switching between napi and no napi."
> > Oy, vey.
> > 
> > I didn't pay any attention to that discussion, sadly enough.
> > 
> > It's been about that long (2018) since I paid any attention to
> > bufferbloat in the cloud and my cloudy provider (linode) switched to
> > using virtio when I wasn't looking. For over a year now, I'd been
> > getting reports saying that comcast's pie rollout wasn't working as
> > well as expected, that evenroute's implementation of sch_cake and sqm
> > on inbound wasn't working right, nor pf_sense's and numerous other
> > issues at Internet scale.
> > 
> > Last week I ran a string of benchmarks against starlink's new services
> > and was really aghast at what I found there, too. but the problem
> > seemed deeper than in just the dishy...
> > 
> > Without BQL, there's no backpressure for fq_codel to do its thing.
> > None. My measurement servers aren't FQ-codeling
> > no matter how much load I put on them. Since that qdisc is the default
> > now in most linux distributions, I imagine that the bulk of the cloud
> > is now behaving as erratically as linux was in 2011 with enormous
> > swings in throughput and latency from GSO/TSO hitting overlarge rx/tx
> > rings, [1], breaking various rate estimators in codel, pie and the tcp
> > stack itself.
> > 
> > See:
> > 
> > http://fremont.starlink.taht.net/~d/virtio_nobql/rrul_-_evenroute_v3_server_fq_codel.png
> > 
> > See the swings in latency there? that's symptomatic of tx/rx rings
> > filling and emptying.
> > 
> > it wasn't until I switched my measurement server temporarily over to
> > sch_fq that I got a rrul result that was close to the results we used
> > to get from the virtualized e1000e drivers we were using in 2014.
> > 
> > http://fremont.starlink.taht.net/~d/virtio_nobql/rrul_-_evenroute_v3_server_fq.png
> > 
> > While I have long supported the use of sch_fq for tcp-heavy workloads,
> > it still behaves better with bql in place, and fq_codel is better for
> > generic workloads... but needs bql based backpressure to kick in.
> > 
> > [1] I really hope I'm overreacting but, um, er, could someone(s) spin
> > up a new patch that does bql in some way even half right for this
> > driver and help test it? I haven't built a kernel in a while.
> 
> 
> I think it's time to obsolete skb_orphan() for virtio-net to get rid of a
> bunch of tricky code in the current virtio-net driver.
> 
> Then we can do BQL on top.
> 
> I will prepare some patches to do this (probably with Michael's BQL patch).
> 
> Thanks

First step would be to fix up and test the BQL part.
IIRC it didn't seem to help performance in our benchmarking,
and Eric seems to say that's expected ...


> 
> > 
> > 
> > > > On Mon, May 17, 2021 at 11:41 AM Xianting Tian
> > > > <xianting.tian@linux.alibaba.com> wrote:
> > > > > BUG_ON() uses unlikely in if(), which can be optimized at compile time.
> > > > > 
> > > > > Signed-off-by: Xianting Tian <xianting.tian@linux.alibaba.com>
> > > > > ---
> > > > >    drivers/net/virtio_net.c | 5 ++---
> > > > >    1 file changed, 2 insertions(+), 3 deletions(-)
> > > > > 
> > > > > diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
> > > > > index c921ebf3ae82..212d52204884 100644
> > > > > --- a/drivers/net/virtio_net.c
> > > > > +++ b/drivers/net/virtio_net.c
> > > > > @@ -1646,10 +1646,9 @@ static int xmit_skb(struct send_queue *sq, struct
> > > > > sk_buff *skb)
> > > > >          else
> > > > >                  hdr = skb_vnet_hdr(skb);
> > > > > 
> > > > > -       if (virtio_net_hdr_from_skb(skb, &hdr->hdr,
> > > > > +       BUG_ON(virtio_net_hdr_from_skb(skb, &hdr->hdr,
> > > > >                                      virtio_is_little_endian(vi->vdev), false,
> > > > > -                                   0))
> > > > > -               BUG();
> > > > > +                                   0));
> > > > > 
> > > > >          if (vi->mergeable_rx_bufs)
> > > > >                  hdr->num_buffers = 0;
> > > > > --
> > > > > 2.17.1
> > > > > 
> > > > 
> > > > --
> > > > Latest Podcast:
> > > > https://www.linkedin.com/feed/update/urn:li:activity:6791014284936785920/
> > > > 
> > > > Dave Täht CTO, TekLibre, LLC
> > 
> > 
> > --
> > Latest Podcast:
> > https://www.linkedin.com/feed/update/urn:li:activity:6791014284936785920/
> > 
> > Dave Täht CTO, TekLibre, LLC
> > 


^ permalink raw reply	[flat|nested] 20+ messages in thread

end of thread, other threads:[~2021-05-24  9:10 UTC | newest]

Thread overview: 20+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-05-17 13:31 [PATCH] virtio_net: Use BUG_ON instead of if condition followed by BUG Xianting Tian
2021-05-17 14:35 ` Andrew Lunn
2021-05-17 14:35   ` Andrew Lunn
2021-05-18  9:20   ` Xianting Tian
2021-05-17 18:43 ` virtio_net: BQL? Dave Taht
2021-05-17 20:23   ` Willem de Bruijn
2021-05-17 20:23     ` Willem de Bruijn
2021-05-17 21:48     ` Dave Taht
2021-05-17 23:00       ` [Bloat] " Stephen Hemminger
2021-05-17 23:32         ` Dave Taht
2021-05-18  2:48           ` Stephen Hemminger
2021-05-19 14:31         ` Eric Dumazet
2021-05-19 14:31           ` Eric Dumazet
2021-05-19 22:53           ` Jakub Kicinski
2021-05-24  2:53       ` Jason Wang
2021-05-24  2:53         ` Jason Wang
2021-05-24  9:10         ` Michael S. Tsirkin
2021-05-24  9:10           ` Michael S. Tsirkin
2021-05-19  8:35   ` Michael S. Tsirkin
2021-05-19  8:35     ` Michael S. Tsirkin

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.