All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH net-next] net: stmmac: add BQL support
@ 2014-12-28 14:57 Beniamino Galvani
  2014-12-28 16:25 ` Dave Taht
  2014-12-29  7:48 ` Florian Fainelli
  0 siblings, 2 replies; 7+ messages in thread
From: Beniamino Galvani @ 2014-12-28 14:57 UTC (permalink / raw)
  To: David S. Miller
  Cc: Giuseppe Cavallaro, netdev, linux-kernel, Beniamino Galvani

Add support for Byte Queue Limits to the STMicro MAC driver.

Tested on a Amlogic S805 Cortex-A5 board, where the use of BQL
slightly decreases the ping latency from ~10ms to ~3ms when the
100Mbps link is saturated by TCP streams. No difference is
observed at 1Gbps.

Signed-off-by: Beniamino Galvani <b.galvani@gmail.com>
---
 drivers/net/ethernet/stmicro/stmmac/stmmac_main.c | 9 +++++++++
 1 file changed, 9 insertions(+)

diff --git a/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c b/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c
index 118a427..c5af3d8 100644
--- a/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c
+++ b/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c
@@ -1097,6 +1097,7 @@ static int init_dma_desc_rings(struct net_device *dev, gfp_t flags)
 
 	priv->dirty_tx = 0;
 	priv->cur_tx = 0;
+	netdev_reset_queue(priv->dev);
 
 	stmmac_clear_descriptors(priv);
 
@@ -1300,6 +1301,7 @@ static void stmmac_dma_operation_mode(struct stmmac_priv *priv)
 static void stmmac_tx_clean(struct stmmac_priv *priv)
 {
 	unsigned int txsize = priv->dma_tx_size;
+	unsigned int bytes_compl = 0, pkts_compl = 0;
 
 	spin_lock(&priv->tx_lock);
 
@@ -1356,6 +1358,8 @@ static void stmmac_tx_clean(struct stmmac_priv *priv)
 		priv->hw->mode->clean_desc3(priv, p);
 
 		if (likely(skb != NULL)) {
+			pkts_compl++;
+			bytes_compl += skb->len;
 			dev_consume_skb_any(skb);
 			priv->tx_skbuff[entry] = NULL;
 		}
@@ -1364,6 +1368,9 @@ static void stmmac_tx_clean(struct stmmac_priv *priv)
 
 		priv->dirty_tx++;
 	}
+
+	netdev_completed_queue(priv->dev, pkts_compl, bytes_compl);
+
 	if (unlikely(netif_queue_stopped(priv->dev) &&
 		     stmmac_tx_avail(priv) > STMMAC_TX_THRESH(priv))) {
 		netif_tx_lock(priv->dev);
@@ -1418,6 +1425,7 @@ static void stmmac_tx_err(struct stmmac_priv *priv)
 						     (i == txsize - 1));
 	priv->dirty_tx = 0;
 	priv->cur_tx = 0;
+	netdev_reset_queue(priv->dev);
 	priv->hw->dma->start_tx(priv->ioaddr);
 
 	priv->dev->stats.tx_errors++;
@@ -2049,6 +2057,7 @@ static netdev_tx_t stmmac_xmit(struct sk_buff *skb, struct net_device *dev)
 		skb_tx_timestamp(skb);
 
 	priv->hw->dma->enable_dma_transmission(priv->ioaddr);
+	netdev_sent_queue(dev, skb->len);
 
 	spin_unlock(&priv->tx_lock);
 	return NETDEV_TX_OK;
-- 
2.1.4


^ permalink raw reply related	[flat|nested] 7+ messages in thread

* Re: [PATCH net-next] net: stmmac: add BQL support
  2014-12-28 14:57 [PATCH net-next] net: stmmac: add BQL support Beniamino Galvani
@ 2014-12-28 16:25 ` Dave Taht
  2014-12-28 21:48   ` Beniamino Galvani
  2014-12-29  7:48 ` Florian Fainelli
  1 sibling, 1 reply; 7+ messages in thread
From: Dave Taht @ 2014-12-28 16:25 UTC (permalink / raw)
  To: Beniamino Galvani; +Cc: Giuseppe Cavallaro, netdev, linux-kernel

On Sun, Dec 28, 2014 at 6:57 AM, Beniamino Galvani <b.galvani@gmail.com> wrote:
> Add support for Byte Queue Limits to the STMicro MAC driver.

Thank you!

> Tested on a Amlogic S805 Cortex-A5 board, where the use of BQL
> slightly decreases the ping latency from ~10ms to ~3ms when the
> 100Mbps link is saturated by TCP streams. No difference is
> observed at 1Gbps.

I see the plural. With TSQ in place it is hard (without something like
the rrul test driving multiple streams) to drive a driver to
saturation with small numbers of flows. This was with pfifo_fast, not
sch_fq, at 100mbit?

Can this board actually drive a full gigabit in the first place? Until
now most of the low end arm boards I have seen only came with
a 100mbit mac, and the gig ones lacking offloads seemed to peak
out at about 600mbit.

Under my christmas tree landed a quad core A5 (odroid-c1), also an
xgene and zedboard - both of the latter are a-needing BQL,
and I haven't booted the udroid yet. Hopefully it is the
same driver you just improved.

(https://plus.google.com/u/0/107942175615993706558/posts/f1D43umhm7E )

> Signed-off-by: Beniamino Galvani <b.galvani@gmail.com>
> ---
>  drivers/net/ethernet/stmicro/stmmac/stmmac_main.c | 9 +++++++++
>  1 file changed, 9 insertions(+)
>
> diff --git a/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c b/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c
> index 118a427..c5af3d8 100644
> --- a/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c
> +++ b/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c
> @@ -1097,6 +1097,7 @@ static int init_dma_desc_rings(struct net_device *dev, gfp_t flags)
>
>         priv->dirty_tx = 0;
>         priv->cur_tx = 0;
> +       netdev_reset_queue(priv->dev);
>
>         stmmac_clear_descriptors(priv);
>
> @@ -1300,6 +1301,7 @@ static void stmmac_dma_operation_mode(struct stmmac_priv *priv)
>  static void stmmac_tx_clean(struct stmmac_priv *priv)
>  {
>         unsigned int txsize = priv->dma_tx_size;
> +       unsigned int bytes_compl = 0, pkts_compl = 0;
>
>         spin_lock(&priv->tx_lock);
>
> @@ -1356,6 +1358,8 @@ static void stmmac_tx_clean(struct stmmac_priv *priv)
>                 priv->hw->mode->clean_desc3(priv, p);
>
>                 if (likely(skb != NULL)) {
> +                       pkts_compl++;
> +                       bytes_compl += skb->len;
>                         dev_consume_skb_any(skb);
>                         priv->tx_skbuff[entry] = NULL;
>                 }
> @@ -1364,6 +1368,9 @@ static void stmmac_tx_clean(struct stmmac_priv *priv)
>
>                 priv->dirty_tx++;
>         }
> +
> +       netdev_completed_queue(priv->dev, pkts_compl, bytes_compl);
> +
>         if (unlikely(netif_queue_stopped(priv->dev) &&
>                      stmmac_tx_avail(priv) > STMMAC_TX_THRESH(priv))) {
>                 netif_tx_lock(priv->dev);
> @@ -1418,6 +1425,7 @@ static void stmmac_tx_err(struct stmmac_priv *priv)
>                                                      (i == txsize - 1));
>         priv->dirty_tx = 0;
>         priv->cur_tx = 0;
> +       netdev_reset_queue(priv->dev);
>         priv->hw->dma->start_tx(priv->ioaddr);
>
>         priv->dev->stats.tx_errors++;
> @@ -2049,6 +2057,7 @@ static netdev_tx_t stmmac_xmit(struct sk_buff *skb, struct net_device *dev)
>                 skb_tx_timestamp(skb);
>
>         priv->hw->dma->enable_dma_transmission(priv->ioaddr);
> +       netdev_sent_queue(dev, skb->len);
>
>         spin_unlock(&priv->tx_lock);
>         return NETDEV_TX_OK;
> --
> 2.1.4
>
> --
> To unsubscribe from this list: send the line "unsubscribe netdev" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html



-- 
Dave Täht

thttp://www.bufferbloat.net/projects/bloat/wiki/Upcoming_Talks

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH net-next] net: stmmac: add BQL support
  2014-12-28 16:25 ` Dave Taht
@ 2014-12-28 21:48   ` Beniamino Galvani
  2014-12-29 17:42     ` Dave Taht
  0 siblings, 1 reply; 7+ messages in thread
From: Beniamino Galvani @ 2014-12-28 21:48 UTC (permalink / raw)
  To: Dave Taht; +Cc: Giuseppe Cavallaro, netdev, linux-kernel

On Sun, Dec 28, 2014 at 08:25:40AM -0800, Dave Taht wrote:
> On Sun, Dec 28, 2014 at 6:57 AM, Beniamino Galvani <b.galvani@gmail.com> wrote:
> > Add support for Byte Queue Limits to the STMicro MAC driver.
> 
> Thank you!
> 
> > Tested on a Amlogic S805 Cortex-A5 board, where the use of BQL
> > slightly decreases the ping latency from ~10ms to ~3ms when the
> > 100Mbps link is saturated by TCP streams. No difference is
> > observed at 1Gbps.
> 
> I see the plural. With TSQ in place it is hard (without something like
> the rrul test driving multiple streams) to drive a driver to
> saturation with small numbers of flows. This was with pfifo_fast, not
> sch_fq, at 100mbit?

Hi Dave,

yes, this was with pfifo_fast and I used 4 iperf TCP streams. The total
throughput didn't seem to increase adding more streams.

> 
> Can this board actually drive a full gigabit in the first place? Until
> now most of the low end arm boards I have seen only came with
> a 100mbit mac, and the gig ones lacking offloads seemed to peak
> out at about 600mbit.

I measured a throughput of 650mbit in rx and 600mbit in tx.

> 
> Under my christmas tree landed a quad core A5 (odroid-c1), also an
> xgene and zedboard - both of the latter are a-needing BQL,
> and I haven't booted the udroid yet. Hopefully it is the
> same driver you just improved.

I'm using the odroid-c1 too, with this tree based on the recent
Amlogic mainline work:

  https://github.com/bengal/linux/tree/meson8b

Unfortunately at the moment the support for the board is very basic
(for example, SMP is not working yet) but it's enough to do some NIC
tests.

Beniamino

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH net-next] net: stmmac: add BQL support
  2014-12-28 14:57 [PATCH net-next] net: stmmac: add BQL support Beniamino Galvani
  2014-12-28 16:25 ` Dave Taht
@ 2014-12-29  7:48 ` Florian Fainelli
  2014-12-30 21:27   ` Beniamino Galvani
  1 sibling, 1 reply; 7+ messages in thread
From: Florian Fainelli @ 2014-12-29  7:48 UTC (permalink / raw)
  To: Beniamino Galvani
  Cc: David S. Miller, Giuseppe Cavallaro, netdev, linux-kernel

2014-12-28 6:57 GMT-08:00 Beniamino Galvani <b.galvani@gmail.com>:
> Add support for Byte Queue Limits to the STMicro MAC driver.
>
> Tested on a Amlogic S805 Cortex-A5 board, where the use of BQL
> slightly decreases the ping latency from ~10ms to ~3ms when the
> 100Mbps link is saturated by TCP streams. No difference is
> observed at 1Gbps.
>
> Signed-off-by: Beniamino Galvani <b.galvani@gmail.com>
> ---

[snip]

>         priv->dev->stats.tx_errors++;
> @@ -2049,6 +2057,7 @@ static netdev_tx_t stmmac_xmit(struct sk_buff *skb, struct net_device *dev)
>                 skb_tx_timestamp(skb);
>
>         priv->hw->dma->enable_dma_transmission(priv->ioaddr);
> +       netdev_sent_queue(dev, skb->len);

You are introducing a potential use after free here in case tx_lock is
eliminated one day and your TX reclaim logic kicks in and frees the
freshly transmitted SKB, it would be safer to just cache skb->len in a
local variable, and use it here.

>
>         spin_unlock(&priv->tx_lock);
>         return NETDEV_TX_OK;
-- 
Florian

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH net-next] net: stmmac: add BQL support
  2014-12-28 21:48   ` Beniamino Galvani
@ 2014-12-29 17:42     ` Dave Taht
  2014-12-30 23:13       ` Beniamino Galvani
  0 siblings, 1 reply; 7+ messages in thread
From: Dave Taht @ 2014-12-29 17:42 UTC (permalink / raw)
  To: Beniamino Galvani; +Cc: Giuseppe Cavallaro, netdev, linux-kernel

On Sun, Dec 28, 2014 at 1:48 PM, Beniamino Galvani <b.galvani@gmail.com> wrote:
> On Sun, Dec 28, 2014 at 08:25:40AM -0800, Dave Taht wrote:
>> On Sun, Dec 28, 2014 at 6:57 AM, Beniamino Galvani <b.galvani@gmail.com> wrote:
>> > Add support for Byte Queue Limits to the STMicro MAC driver.
>>
>> Thank you!
>>
>> > Tested on a Amlogic S805 Cortex-A5 board, where the use of BQL
>> > slightly decreases the ping latency from ~10ms to ~3ms when the
>> > 100Mbps link is saturated by TCP streams. No difference is
>> > observed at 1Gbps.
>>
>> I see the plural. With TSQ in place it is hard (without something like
>> the rrul test driving multiple streams) to drive a driver to
>> saturation with small numbers of flows. This was with pfifo_fast, not
>> sch_fq, at 100mbit?
>
> Hi Dave,
>
> yes, this was with pfifo_fast and I used 4 iperf TCP streams. The total
> throughput didn't seem to increase adding more streams.

>>
>> Can this board actually drive a full gigabit in the first place? Until
>> now most of the low end arm boards I have seen only came with
>> a 100mbit mac, and the gig ones lacking offloads seemed to peak
>> out at about 600mbit.
>
> I measured a throughput of 650mbit in rx and 600mbit in tx.

You might want to try the rrul test which tests both directions and
latency at the same time.

In my case I have been trying to find a low-cost chip that could do soft
rate limiting (htb) + fq_codel at up to 300mbit/sec, as that is about
the peak speed
we will be getting from cable modems, and these are horribly overbuffered,
at these speeds too, with 1.2sec of bidirectional latency observed at
120mbit/12mbit.

I'm open to crazy ideas like trying to find a use for the gpu, etc, to
get there.

>
>>
>> Under my christmas tree landed a quad core A5 (odroid-c1), also an
>> xgene and zedboard - both of the latter are a-needing BQL,
>> and I haven't booted the udroid yet. Hopefully it is the
>> same driver you just improved.
>
> I'm using the odroid-c1 too, with this tree based on the recent
> Amlogic mainline work:
>
>   https://github.com/bengal/linux/tree/meson8b

Oh, cool, thx!

> Unfortunately at the moment the support for the board is very basic
> (for example, SMP is not working yet) but it's enough to do some NIC
> tests.

Good to know. Have you looked at xmit_more yet?

http://lwn.net/Articles/615238/


> Beniamino



-- 
Dave Täht

http://www.bufferbloat.net/projects/bloat/wiki/Upcoming_Talks

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH net-next] net: stmmac: add BQL support
  2014-12-29  7:48 ` Florian Fainelli
@ 2014-12-30 21:27   ` Beniamino Galvani
  0 siblings, 0 replies; 7+ messages in thread
From: Beniamino Galvani @ 2014-12-30 21:27 UTC (permalink / raw)
  To: Florian Fainelli
  Cc: David S. Miller, Giuseppe Cavallaro, netdev, linux-kernel

On Sun, Dec 28, 2014 at 11:48:14PM -0800, Florian Fainelli wrote:
> 2014-12-28 6:57 GMT-08:00 Beniamino Galvani <b.galvani@gmail.com>:
> > Add support for Byte Queue Limits to the STMicro MAC driver.
> >
> > Tested on a Amlogic S805 Cortex-A5 board, where the use of BQL
> > slightly decreases the ping latency from ~10ms to ~3ms when the
> > 100Mbps link is saturated by TCP streams. No difference is
> > observed at 1Gbps.
> >
> > Signed-off-by: Beniamino Galvani <b.galvani@gmail.com>
> > ---
> 
> [snip]
> 
> >         priv->dev->stats.tx_errors++;
> > @@ -2049,6 +2057,7 @@ static netdev_tx_t stmmac_xmit(struct sk_buff *skb, struct net_device *dev)
> >                 skb_tx_timestamp(skb);
> >
> >         priv->hw->dma->enable_dma_transmission(priv->ioaddr);
> > +       netdev_sent_queue(dev, skb->len);
> 
> You are introducing a potential use after free here in case tx_lock is
> eliminated one day and your TX reclaim logic kicks in and frees the
> freshly transmitted SKB, it would be safer to just cache skb->len in a
> local variable, and use it here.

Ok, I will change this part; probably a simpler solution is to call
netdev_sent_queue() before enabling the DMA like in other drivers.

BTW, I'm not sure this lock is really needed, since it should be
possible to safely access the ring without locks from both the
transmit and the reclaim functions if the pointers are updated
carefully. So maybe it will be really removed one day.

Beniamino

> 
> >
> >         spin_unlock(&priv->tx_lock);
> >         return NETDEV_TX_OK;


^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH net-next] net: stmmac: add BQL support
  2014-12-29 17:42     ` Dave Taht
@ 2014-12-30 23:13       ` Beniamino Galvani
  0 siblings, 0 replies; 7+ messages in thread
From: Beniamino Galvani @ 2014-12-30 23:13 UTC (permalink / raw)
  To: Dave Taht; +Cc: Giuseppe Cavallaro, netdev, linux-kernel

On Mon, Dec 29, 2014 at 09:42:01AM -0800, Dave Taht wrote:
> On Sun, Dec 28, 2014 at 1:48 PM, Beniamino Galvani <b.galvani@gmail.com> wrote:
> > On Sun, Dec 28, 2014 at 08:25:40AM -0800, Dave Taht wrote:
> >> On Sun, Dec 28, 2014 at 6:57 AM, Beniamino Galvani <b.galvani@gmail.com> wrote:
> >> > Add support for Byte Queue Limits to the STMicro MAC driver.
> >>
> >> Thank you!
> >>
> >> > Tested on a Amlogic S805 Cortex-A5 board, where the use of BQL
> >> > slightly decreases the ping latency from ~10ms to ~3ms when the
> >> > 100Mbps link is saturated by TCP streams. No difference is
> >> > observed at 1Gbps.
> >>
> >> I see the plural. With TSQ in place it is hard (without something like
> >> the rrul test driving multiple streams) to drive a driver to
> >> saturation with small numbers of flows. This was with pfifo_fast, not
> >> sch_fq, at 100mbit?
> >
> > Hi Dave,
> >
> > yes, this was with pfifo_fast and I used 4 iperf TCP streams. The total
> > throughput didn't seem to increase adding more streams.
> 
> >>
> >> Can this board actually drive a full gigabit in the first place? Until
> >> now most of the low end arm boards I have seen only came with
> >> a 100mbit mac, and the gig ones lacking offloads seemed to peak
> >> out at about 600mbit.
> >
> > I measured a throughput of 650mbit in rx and 600mbit in tx.
> 
> You might want to try the rrul test which tests both directions and
> latency at the same time.

I will try it, thanks.

> 
> In my case I have been trying to find a low-cost chip that could do soft
> rate limiting (htb) + fq_codel at up to 300mbit/sec, as that is about
> the peak speed
> we will be getting from cable modems, and these are horribly overbuffered,
> at these speeds too, with 1.2sec of bidirectional latency observed at
> 120mbit/12mbit.
> 
> I'm open to crazy ideas like trying to find a use for the gpu, etc, to
> get there.
> 
> >
> >>
> >> Under my christmas tree landed a quad core A5 (odroid-c1), also an
> >> xgene and zedboard - both of the latter are a-needing BQL,
> >> and I haven't booted the udroid yet. Hopefully it is the
> >> same driver you just improved.
> >
> > I'm using the odroid-c1 too, with this tree based on the recent
> > Amlogic mainline work:
> >
> >   https://github.com/bengal/linux/tree/meson8b
> 
> Oh, cool, thx!
> 
> > Unfortunately at the moment the support for the board is very basic
> > (for example, SMP is not working yet) but it's enough to do some NIC
> > tests.
> 
> Good to know. Have you looked at xmit_more yet?
> 
> http://lwn.net/Articles/615238/

I don't know if I have implemented it correctly, but I found that the
improvement with xmit_more is so small to be barely observable, maybe
because the cost for starting the hardware transmission is very low
(it's a single mmio write).

Beniamino

^ permalink raw reply	[flat|nested] 7+ messages in thread

end of thread, other threads:[~2014-12-30 23:13 UTC | newest]

Thread overview: 7+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2014-12-28 14:57 [PATCH net-next] net: stmmac: add BQL support Beniamino Galvani
2014-12-28 16:25 ` Dave Taht
2014-12-28 21:48   ` Beniamino Galvani
2014-12-29 17:42     ` Dave Taht
2014-12-30 23:13       ` Beniamino Galvani
2014-12-29  7:48 ` Florian Fainelli
2014-12-30 21:27   ` Beniamino Galvani

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.