All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH] ethernet:arc: Fix racing of TX ring buffer
@ 2016-05-14 16:16 Shuyu Wei
  2016-05-14 20:03 ` Francois Romieu
  0 siblings, 1 reply; 7+ messages in thread
From: Shuyu Wei @ 2016-05-14 16:16 UTC (permalink / raw)
  To: wxt, davem; +Cc: heiko, linux-rockchip, netdev

[-- Attachment #1: mail1.txt --]
[-- Type: text/plain, Size: 1474 bytes --]

The tail of the ring buffer(txbd_dirty) should never go ahead of the
head(txbd_curr) or the ring buffer will corrupt. 

This is the root cause of racing.

Besides, setting the FOR_EMAC flag should be the last step of modifying
the buffer descriptor, or possible racing will occur.

Signed-off-by: Shuyu Wei <sy.w@outlook.com>
---

diff --git a/drivers/net/ethernet/arc/emac_main.c b/drivers/net/ethernet/arc/emac_main.c
index a3a9392..5ece05b 100644
--- a/drivers/net/ethernet/arc/emac_main.c
+++ b/drivers/net/ethernet/arc/emac_main.c
@@ -155,7 +155,7 @@ static void arc_emac_tx_clean(struct net_device *ndev)
        struct net_device_stats *stats = &ndev->stats;
        unsigned int i;
 
-       for (i = 0; i < TX_BD_NUM; i++) {
+       for (i = priv->txbd_dirty; i != priv->txbd_curr; i = (i + 1) % TX_BD_NUM) {
                unsigned int *txbd_dirty = &priv->txbd_dirty;
                struct arc_emac_bd *txbd = &priv->txbd[*txbd_dirty];
                struct buffer_state *tx_buff = &priv->tx_buff[*txbd_dirty];
@@ -686,12 +686,12 @@ static int arc_emac_tx(struct sk_buff *skb, struct net_device *ndev)
 
        skb_tx_timestamp(skb);
 
+       priv->tx_buff[*txbd_curr].skb = skb;
        *info = cpu_to_le32(FOR_EMAC | FIRST_OR_LAST_MASK | len);
 
        /* Make sure info word is set */
        wmb();
 
-       priv->tx_buff[*txbd_curr].skb = skb;
 
        /* Increment index to point to the next BD */
        *txbd_curr = (*txbd_curr + 1) % TX_BD_NUM;

^ permalink raw reply related	[flat|nested] 7+ messages in thread

* Re: [PATCH] ethernet:arc: Fix racing of TX ring buffer
  2016-05-14 16:16 [PATCH] ethernet:arc: Fix racing of TX ring buffer Shuyu Wei
@ 2016-05-14 20:03 ` Francois Romieu
  2016-05-14 23:10   ` Shuyu Wei
  0 siblings, 1 reply; 7+ messages in thread
From: Francois Romieu @ 2016-05-14 20:03 UTC (permalink / raw)
  To: Shuyu Wei; +Cc: wxt, davem, heiko, linux-rockchip, netdev

Shuyu Wei <wsy2220@gmail.com> :
> The tail of the ring buffer(txbd_dirty) should never go ahead of the
> head(txbd_curr) or the ring buffer will corrupt. 
> 
> This is the root cause of racing.

No (see below).

It may suffer from some barrier illness though.

> Besides, setting the FOR_EMAC flag should be the last step of modifying
> the buffer descriptor, or possible racing will occur.

(s/Besides//)

Yes. Good catch.

> Signed-off-by: Shuyu Wei <sy.w@outlook.com>
> ---
> 
> diff --git a/drivers/net/ethernet/arc/emac_main.c b/drivers/net/ethernet/arc/emac_main.c
> index a3a9392..5ece05b 100644
> --- a/drivers/net/ethernet/arc/emac_main.c
> +++ b/drivers/net/ethernet/arc/emac_main.c
> @@ -155,7 +155,7 @@ static void arc_emac_tx_clean(struct net_device *ndev)
>         struct net_device_stats *stats = &ndev->stats;
>         unsigned int i;
>  
> -       for (i = 0; i < TX_BD_NUM; i++) {
> +       for (i = priv->txbd_dirty; i != priv->txbd_curr; i = (i + 1) % TX_BD_NUM) {
>                 unsigned int *txbd_dirty = &priv->txbd_dirty;
>                 struct arc_emac_bd *txbd = &priv->txbd[*txbd_dirty];
>                 struct buffer_state *tx_buff = &priv->tx_buff[*txbd_dirty];

"i" is only used as a loop counter in arc_emac_tx_clean. It is not even
used as an index to dereference an array or whatever. Only "priv->txbd_dirty"
is used.

arc_emac_tx_clean() checks FOR_EMAC, skb, and dirty tx data. It takes care of
clearing those itself. Thus, (memory / io barrier considerations apart) it can
only proceed beyond its own "if ((info & FOR_EMAC) || !txbd->data || !skb)"
check if arc_emac_tx wrote all of those.

Where they are used as loop counters, both TX_BD_NUM and txbd_curr - txbd_dirty
can be considered as hints (please note that unsigned arithmetic can replace
the "%" sludgehammer here).

> @@ -686,12 +686,12 @@ static int arc_emac_tx(struct sk_buff *skb, struct net_device *ndev)
>  
>         skb_tx_timestamp(skb);

> +       priv->tx_buff[*txbd_curr].skb = skb;

	dma_wmb();

(sync writes to memory before releasing descriptor)

>         *info = cpu_to_le32(FOR_EMAC | FIRST_OR_LAST_MASK | len);
>  
>         /* Make sure info word is set */
>         wmb();

arc_emac_tx_clean can run from this point.

txbd_curr is still not set (and it does need to). So, if you insist
on txbd_curr appearing in arc_emac_tx_clean::for(...), it's perfectly
possible to ignore a sent packet.

I ignored arc_reg_set() at the end of arc_emac_tx(). I have no idea
if it is posted nor if it forces the chipset to read the descriptors
(synchronously ?) so part of the sentence above could be wrong.

You have found a big offender in arc_emac_tx() but the arc_emac_tx_clean()
part is imho useless, incorrectly understood or misworded.

-- 
Ueimor

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH] ethernet:arc: Fix racing of TX ring buffer
  2016-05-14 20:03 ` Francois Romieu
@ 2016-05-14 23:10   ` Shuyu Wei
  2016-05-14 23:24     ` Shuyu Wei
  2016-05-15  9:19     ` Francois Romieu
  0 siblings, 2 replies; 7+ messages in thread
From: Shuyu Wei @ 2016-05-14 23:10 UTC (permalink / raw)
  To: Francois Romieu; +Cc: wxt, davem, heiko, linux-rockchip, netdev

On Sat, May 14, 2016 at 10:03:56PM +0200, Francois Romieu wrote:
> Shuyu Wei <wsy2220@gmail.com> :
> > The tail of the ring buffer(txbd_dirty) should never go ahead of the
> > head(txbd_curr) or the ring buffer will corrupt. 
> > 
> > This is the root cause of racing.
> 
> No (see below).
> 
> It may suffer from some barrier illness though.
> 
> > Besides, setting the FOR_EMAC flag should be the last step of modifying
> > the buffer descriptor, or possible racing will occur.
> 
> (s/Besides//)
> 
> Yes. Good catch.
> 
> > Signed-off-by: Shuyu Wei <sy.w@outlook.com>
> > ---
> > 
> > diff --git a/drivers/net/ethernet/arc/emac_main.c b/drivers/net/ethernet/arc/emac_main.c
> > index a3a9392..5ece05b 100644
> > --- a/drivers/net/ethernet/arc/emac_main.c
> > +++ b/drivers/net/ethernet/arc/emac_main.c
> > @@ -155,7 +155,7 @@ static void arc_emac_tx_clean(struct net_device *ndev)
> >         struct net_device_stats *stats = &ndev->stats;
> >         unsigned int i;
> >  
> > -       for (i = 0; i < TX_BD_NUM; i++) {
> > +       for (i = priv->txbd_dirty; i != priv->txbd_curr; i = (i + 1) % TX_BD_NUM) {
> >                 unsigned int *txbd_dirty = &priv->txbd_dirty;
> >                 struct arc_emac_bd *txbd = &priv->txbd[*txbd_dirty];
> >                 struct buffer_state *tx_buff = &priv->tx_buff[*txbd_dirty];
> 
> "i" is only used as a loop counter in arc_emac_tx_clean. It is not even
> used as an index to dereference an array or whatever. Only "priv->txbd_dirty"
> is used.
> 
> arc_emac_tx_clean() checks FOR_EMAC, skb, and dirty tx data. It takes care of
> clearing those itself. Thus, (memory / io barrier considerations apart) it can
> only proceed beyond its own "if ((info & FOR_EMAC) || !txbd->data || !skb)"
> check if arc_emac_tx wrote all of those.
> 
> Where they are used as loop counters, both TX_BD_NUM and txbd_curr - txbd_dirty
> can be considered as hints (please note that unsigned arithmetic can replace
> the "%" sludgehammer here).
> 
> > @@ -686,12 +686,12 @@ static int arc_emac_tx(struct sk_buff *skb, struct net_device *ndev)
> >  
> >         skb_tx_timestamp(skb);
> 
> > +       priv->tx_buff[*txbd_curr].skb = skb;
> 
> 	dma_wmb();
> 
> (sync writes to memory before releasing descriptor)
> 
> >         *info = cpu_to_le32(FOR_EMAC | FIRST_OR_LAST_MASK | len);
> >  
> >         /* Make sure info word is set */
> >         wmb();
> 
> arc_emac_tx_clean can run from this point.
> 
> txbd_curr is still not set (and it does need to). So, if you insist
> on txbd_curr appearing in arc_emac_tx_clean::for(...), it's perfectly
> possible to ignore a sent packet.
> 
> I ignored arc_reg_set() at the end of arc_emac_tx(). I have no idea
> if it is posted nor if it forces the chipset to read the descriptors
> (synchronously ?) so part of the sentence above could be wrong.
> 
> You have found a big offender in arc_emac_tx() but the arc_emac_tx_clean()
> part is imho useless, incorrectly understood or misworded.
> 
> -- 
> Ueimor


Hi, Ueimor. Thanks for your reply.

I don't think taking txbd_curr and txbd_dirty only as hints is a good idea.
That could be a big waste, since tx_clean have to go through all the txbds.
I tried your advice, Tx throughput can only reach 5.52MB/s.

Leaving one sent packet in tx_clean is acceptable if we respect to txbd_curr
and txbd_dirty, since the ignored packet will be cleaned when new packets
arrive.

>  for (i = priv->txbd_dirty; i != priv->txbd_curr; i = (i + 1) % TX_BD_NUM) {
In fact, the loop above will always ignore one or two sent packet, the loop
below can free all packets or leave one if txbd_curr is not updated. I
use the above one since it is clearer.

  for (i = priv->txbd_dirty; (i + 1) % TX_BD_NUM != priv->txbd_curr; i = (i + 1) % TX_BD_NUM) {

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH] ethernet:arc: Fix racing of TX ring buffer
  2016-05-14 23:10   ` Shuyu Wei
@ 2016-05-14 23:24     ` Shuyu Wei
  2016-05-15  9:19     ` Francois Romieu
  1 sibling, 0 replies; 7+ messages in thread
From: Shuyu Wei @ 2016-05-14 23:24 UTC (permalink / raw)
  To: Francois Romieu; +Cc: wxt, davem, heiko, linux-rockchip, netdev

Sorry, the last two lines is wrong, ignore it.

I mean I intended to ignore one or two packets.
It's just a trade-off for performance, but it
doesn't cause any memory leak.

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH] ethernet:arc: Fix racing of TX ring buffer
  2016-05-14 23:10   ` Shuyu Wei
  2016-05-14 23:24     ` Shuyu Wei
@ 2016-05-15  9:19     ` Francois Romieu
       [not found]       ` <20160515091953.GA19983-lmTtMILVy1jWQcoT9B9Ug5SCg42XY1Uw0E9HWUfgJXw@public.gmane.org>
  1 sibling, 1 reply; 7+ messages in thread
From: Francois Romieu @ 2016-05-15  9:19 UTC (permalink / raw)
  To: Shuyu Wei
  Cc: linux-rockchip-IAPFreCvJWM7uuMidbF8XUB+6BGkLq7r,
	netdev-u79uwXL29TY76Z2rM5mHXA, heiko-4mtYJXux2i+zQB+pC5nmwQ,
	davem-fT/PcQaiUtIeIZ0/mPfg9Q, wxt-TNX95d0MmH7DzftRWevZcw

Shuyu Wei <wsy2220-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org> :
[...]
> I don't think taking txbd_curr and txbd_dirty only as hints is a good idea.
> That could be a big waste, since tx_clean have to go through all the txbds.

Sorry if my point was not clear: arc_emac_tx_clean() does not need
to change (at least not for the reason given in the commit message) :o)

Current code:

static void arc_emac_tx_clean(struct net_device *ndev)
{
[...]
        for (i = 0; i < TX_BD_NUM; i++) {
                unsigned int *txbd_dirty = &priv->txbd_dirty;
                struct arc_emac_bd *txbd = &priv->txbd[*txbd_dirty];
                struct buffer_state *tx_buff = &priv->tx_buff[*txbd_dirty];
                struct sk_buff *skb = tx_buff->skb;
                unsigned int info = le32_to_cpu(txbd->info);

                if ((info & FOR_EMAC) || !txbd->data || !skb)
                        break;
                        ^^^^^

-> the "break" statement prevents reading all txbds. At most one extra
   descriptor is read and this driver isn't in the Mpps business.

> I tried your advice, Tx throughput can only reach 5.52MB/s.

Even with the original code above ?

> Leaving one sent packet in tx_clean is acceptable if we respect to txbd_curr
> and txbd_dirty, since the ignored packet will be cleaned when new packets
> arrive.

There is no reason to leave tx packet roting in the first place. Really.
I doubt it would help bql for one.

Packet may rot because of unexpected hardware behavior and driver should
cope with it when it is diagnosed, sure. However, you don't want the driver
to opens it own unbounded window. Next packet: 10 us, 10 ms, 10 s ?

-- 
Ueimor

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH] ethernet:arc: Fix racing of TX ring buffer
       [not found]       ` <20160515091953.GA19983-lmTtMILVy1jWQcoT9B9Ug5SCg42XY1Uw0E9HWUfgJXw@public.gmane.org>
@ 2016-05-15 12:38         ` Shuyu Wei
  2016-05-15 18:07           ` Francois Romieu
  0 siblings, 1 reply; 7+ messages in thread
From: Shuyu Wei @ 2016-05-15 12:38 UTC (permalink / raw)
  To: Francois Romieu
  Cc: linux-rockchip-IAPFreCvJWM7uuMidbF8XUB+6BGkLq7r,
	netdev-u79uwXL29TY76Z2rM5mHXA, heiko-4mtYJXux2i+zQB+pC5nmwQ,
	davem-fT/PcQaiUtIeIZ0/mPfg9Q, wxt-TNX95d0MmH7DzftRWevZcw

On Sun, May 15, 2016 at 11:19:53AM +0200, Francois Romieu wrote:
> 
> static void arc_emac_tx_clean(struct net_device *ndev)
> {
> [...]
>         for (i = 0; i < TX_BD_NUM; i++) {
>                 unsigned int *txbd_dirty = &priv->txbd_dirty;
>                 struct arc_emac_bd *txbd = &priv->txbd[*txbd_dirty];
>                 struct buffer_state *tx_buff = &priv->tx_buff[*txbd_dirty];
>                 struct sk_buff *skb = tx_buff->skb;
>                 unsigned int info = le32_to_cpu(txbd->info);
> 
>                 if ((info & FOR_EMAC) || !txbd->data || !skb)
>                         break;
>                         ^^^^^
> 
> -> the "break" statement prevents reading all txbds. At most one extra
>    descriptor is read and this driver isn't in the Mpps business.
> 

You are right, I forgot the break statement.

> > I tried your advice, Tx throughput can only reach 5.52MB/s.
> 
> Even with the original code above ?

Yes, I left tx_clean unmodified, and took your advice below.
I tested it again just now, this time throughput do reach 9.8MB/s,
Maybe last time cpu is not idle.

I still have a question, is it possible that tx_clean() run
between   priv->tx_buff[*txbd_curr].skb = skb   and   dma_wmb()?

--- a/drivers/net/ethernet/arc/emac_main.c
+++ b/drivers/net/ethernet/arc/emac_main.c
@@ -685,13 +685,15 @@ static int arc_emac_tx(struct sk_buff *skb, struct net_device *ndev)
        wmb();
 
        skb_tx_timestamp(skb);
+       priv->tx_buff[*txbd_curr].skb = skb;
+
+       dma_wmb();
 
        *info = cpu_to_le32(FOR_EMAC | FIRST_OR_LAST_MASK | len);
 
        /* Make sure info word is set */
        wmb();
 
-       priv->tx_buff[*txbd_curr].skb = skb;
 
        /* Increment index to point to the next BD */
        *txbd_curr = (*txbd_curr + 1) % TX_BD_NUM;

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH] ethernet:arc: Fix racing of TX ring buffer
  2016-05-15 12:38         ` Shuyu Wei
@ 2016-05-15 18:07           ` Francois Romieu
  0 siblings, 0 replies; 7+ messages in thread
From: Francois Romieu @ 2016-05-15 18:07 UTC (permalink / raw)
  To: Shuyu Wei
  Cc: linux-rockchip-IAPFreCvJWM7uuMidbF8XUB+6BGkLq7r,
	netdev-u79uwXL29TY76Z2rM5mHXA, heiko-4mtYJXux2i+zQB+pC5nmwQ,
	davem-fT/PcQaiUtIeIZ0/mPfg9Q, wxt-TNX95d0MmH7DzftRWevZcw

Shuyu Wei <wsy2220-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org> :
[...]
> I still have a question, is it possible that tx_clean() run
> between   priv->tx_buff[*txbd_curr].skb = skb   and   dma_wmb()?

A (previous) run can take place after priv->tx_buff[*txbd_curr].skb and
before *info = cpu_to_le32(FOR_EMAC | FIRST_OR_LAST_MASK | len).

So, yes, the driver must check in arc_emac_tx_clean() that 1) either
txbd_dirty != txbd_curr or 2) "info" is not consistent with a still-not-used
status word. Please be patient with me and get rid of the useless "i"

diff --git a/drivers/net/ethernet/arc/emac_main.c b/drivers/net/ethernet/arc/emac_main.c
index a3a9392..337ea3b 100644
--- a/drivers/net/ethernet/arc/emac_main.c
+++ b/drivers/net/ethernet/arc/emac_main.c
@@ -153,9 +153,8 @@ static void arc_emac_tx_clean(struct net_device *ndev)
 {
 	struct arc_emac_priv *priv = netdev_priv(ndev);
 	struct net_device_stats *stats = &ndev->stats;
-	unsigned int i;
 
-	for (i = 0; i < TX_BD_NUM; i++) {
+	while (priv->txbd_dirty != priv->txbd_curr) {
 		unsigned int *txbd_dirty = &priv->txbd_dirty;
 		struct arc_emac_bd *txbd = &priv->txbd[*txbd_dirty];
 		struct buffer_state *tx_buff = &priv->tx_buff[*txbd_dirty];

-- 
Ueimor

^ permalink raw reply related	[flat|nested] 7+ messages in thread

end of thread, other threads:[~2016-05-15 18:07 UTC | newest]

Thread overview: 7+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2016-05-14 16:16 [PATCH] ethernet:arc: Fix racing of TX ring buffer Shuyu Wei
2016-05-14 20:03 ` Francois Romieu
2016-05-14 23:10   ` Shuyu Wei
2016-05-14 23:24     ` Shuyu Wei
2016-05-15  9:19     ` Francois Romieu
     [not found]       ` <20160515091953.GA19983-lmTtMILVy1jWQcoT9B9Ug5SCg42XY1Uw0E9HWUfgJXw@public.gmane.org>
2016-05-15 12:38         ` Shuyu Wei
2016-05-15 18:07           ` Francois Romieu

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.