Netdev Archive on lore.kernel.org
 help / color / Atom feed
* [BUG] pfifo_fast may cause out-of-order CAN frame transmission
@ 2020-01-08 14:55 Ahmad Fatoum
  2020-01-09 12:51 ` Paolo Abeni
  0 siblings, 1 reply; 10+ messages in thread
From: Ahmad Fatoum @ 2020-01-08 14:55 UTC (permalink / raw)
  To: linux-can, netdev, pabeni; +Cc: Sascha Hauer

Hello,

I've run into an issue of CAN frames being sent out-of-order on an i.MX6 Dual
with Linux v5.5-rc5. Bisecting has lead me down to this commit:

ba27b4cdaaa ("net: dev: introduce support for sch BYPASS for lockless qdisc")

With it, using pfifo_fast, every few hundred frames, FlexCAN's .ndo_start_xmit is
passed frames in an order different from how userspace stuffed them into the same
socket.

Reverting it fixes the issue as does booting with maxcpus=1 or using pfifo
instead of pfifo_fast.

According to [1], such reordering shouldn't be happening.

Details on my setup:
Kernel version: v5.5-rc5, (occurs much more often with LOCKDEP turned on)
CAN-Bitrate: 250 kbit/s
CAN frames are generated with:
cangen canX -I2 -L1 -Di -i -g0.12 -p 100
which keeps polling after ENOBUFS until socket is writable, sends out a CAN
frame with one incrementing payload byte and then waits 120 usec before repeating.

Please let me know if any additional info is needed.

Cheers
Ahmad

[1]: http://linux-tc-notes.sourceforge.net/tc/doc/sch_pfifo_fast.txt

-- 
Pengutronix e.K.                           |                             |
Steuerwalder Str. 21                       | http://www.pengutronix.de/  |
31137 Hildesheim, Germany                  | Phone: +49-5121-206917-0    |
Amtsgericht Hildesheim, HRA 2686           | Fax:   +49-5121-206917-5555 |

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [BUG] pfifo_fast may cause out-of-order CAN frame transmission
  2020-01-08 14:55 [BUG] pfifo_fast may cause out-of-order CAN frame transmission Ahmad Fatoum
@ 2020-01-09 12:51 ` Paolo Abeni
  2020-01-09 17:39   ` Ahmad Fatoum
  0 siblings, 1 reply; 10+ messages in thread
From: Paolo Abeni @ 2020-01-09 12:51 UTC (permalink / raw)
  To: Ahmad Fatoum, linux-can, netdev; +Cc: Sascha Hauer

On Wed, 2020-01-08 at 15:55 +0100, Ahmad Fatoum wrote:
> I've run into an issue of CAN frames being sent out-of-order on an i.MX6 Dual
> with Linux v5.5-rc5. Bisecting has lead me down to this commit:
> 
> ba27b4cdaaa ("net: dev: introduce support for sch BYPASS for lockless qdisc")
> 
> With it, using pfifo_fast, every few hundred frames, FlexCAN's .ndo_start_xmit is
> passed frames in an order different from how userspace stuffed them into the same
> socket.
> 
> Reverting it fixes the issue as does booting with maxcpus=1 or using pfifo
> instead of pfifo_fast.
> 
> According to [1], such reordering shouldn't be happening.
> 
> Details on my setup:
> Kernel version: v5.5-rc5, (occurs much more often with LOCKDEP turned on)
> CAN-Bitrate: 250 kbit/s
> CAN frames are generated with:
> cangen canX -I2 -L1 -Di -i -g0.12 -p 100
> which keeps polling after ENOBUFS until socket is writable, sends out a CAN
> frame with one incrementing payload byte and then waits 120 usec before repeating.
> 
> Please let me know if any additional info is needed.

Thank you for the report.

I think there is a possible race condition in the 'empty' flag update
schema:

CPU 0					CPU1
(running e.g. net_tx_action)		(can xmit)

qdisc_run()				__dev_xmit_skb()
pfifo_fast_dequeue				
// queue is empty, returns NULL
WRITE_ONCE(qdisc->empty, true);
					pfifo_fast_enqueue
					qdisc_run_begin() 	
					// still locked by CPU 0,
					// return false and do nothing, 
					// qdisc->empty is still true

					(next can xmit)
					// BYPASS code path
					sch_direct_xmit()
					// send pkt 2
					__qdisc_run()
					// send pkt 1

The following patch try to addresses the above, clearing 'empty' flag
in a more aggressive way. We can end-up skipping the bypass
optimization more often than strictly needed in some contended
scenarios, but I guess that is preferrable to the current issue.

The code is only build-tested, could you please try it in your setup?

Thanks,

Paolo
---
diff --git a/include/net/sch_generic.h b/include/net/sch_generic.h
index fceddf89592a..df460fe0773a 100644
--- a/include/net/sch_generic.h
+++ b/include/net/sch_generic.h
@@ -158,7 +158,6 @@ static inline bool qdisc_run_begin(struct Qdisc *qdisc)
 	if (qdisc->flags & TCQ_F_NOLOCK) {
 		if (!spin_trylock(&qdisc->seqlock))
 			return false;
-		WRITE_ONCE(qdisc->empty, false);
 	} else if (qdisc_is_running(qdisc)) {
 		return false;
 	}
diff --git a/net/core/dev.c b/net/core/dev.c
index 0ad39c87b7fd..3c46575a5af5 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -3625,6 +3625,8 @@ static inline int __dev_xmit_skb(struct sk_buff *skb, struct Qdisc *q,
 			qdisc_run_end(q);
 		} else {
 			rc = q->enqueue(skb, q, &to_free) & NET_XMIT_MASK;
+			if (rc != NET_XMIT_DROP && READ_ONCE(q->empty))
+				WRITE_ONCE(q->empty, false);
 			qdisc_run(q);
 		}
 


^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [BUG] pfifo_fast may cause out-of-order CAN frame transmission
  2020-01-09 12:51 ` Paolo Abeni
@ 2020-01-09 17:39   ` Ahmad Fatoum
  2020-01-10 16:31     ` Paolo Abeni
  0 siblings, 1 reply; 10+ messages in thread
From: Ahmad Fatoum @ 2020-01-09 17:39 UTC (permalink / raw)
  To: Paolo Abeni, linux-can, netdev; +Cc: Sascha Hauer

Hello Paolo,

On 1/9/20 1:51 PM, Paolo Abeni wrote:
> On Wed, 2020-01-08 at 15:55 +0100, Ahmad Fatoum wrote:
>> I've run into an issue of CAN frames being sent out-of-order on an i.MX6 Dual
>> with Linux v5.5-rc5. Bisecting has lead me down to this commit:
> 
> Thank you for the report.

Thanks for the prompt patch. :-)

> The code is only build-tested, could you please try it in your setup?

Issue still persists, albeit appears to have become much less frequent. Took 2 million
frames till first two were swapped. What I usually saw was a swap every few thousand
frames at least and quite often more frequent than that. Might just be noise though.

Thanks
Ahmad

> 
> Thanks,
> 
> Paolo
> ---
> diff --git a/include/net/sch_generic.h b/include/net/sch_generic.h
> index fceddf89592a..df460fe0773a 100644
> --- a/include/net/sch_generic.h
> +++ b/include/net/sch_generic.h
> @@ -158,7 +158,6 @@ static inline bool qdisc_run_begin(struct Qdisc *qdisc)
>  	if (qdisc->flags & TCQ_F_NOLOCK) {
>  		if (!spin_trylock(&qdisc->seqlock))
>  			return false;
> -		WRITE_ONCE(qdisc->empty, false);
>  	} else if (qdisc_is_running(qdisc)) {
>  		return false;
>  	}
> diff --git a/net/core/dev.c b/net/core/dev.c
> index 0ad39c87b7fd..3c46575a5af5 100644
> --- a/net/core/dev.c
> +++ b/net/core/dev.c
> @@ -3625,6 +3625,8 @@ static inline int __dev_xmit_skb(struct sk_buff *skb, struct Qdisc *q,
>  			qdisc_run_end(q);
>  		} else {
>  			rc = q->enqueue(skb, q, &to_free) & NET_XMIT_MASK;
> +			if (rc != NET_XMIT_DROP && READ_ONCE(q->empty))
> +				WRITE_ONCE(q->empty, false);
>  			qdisc_run(q);
>  		}
>  
> 
> 

-- 
Pengutronix e.K.                           |                             |
Steuerwalder Str. 21                       | http://www.pengutronix.de/  |
31137 Hildesheim, Germany                  | Phone: +49-5121-206917-0    |
Amtsgericht Hildesheim, HRA 2686           | Fax:   +49-5121-206917-5555 |

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [BUG] pfifo_fast may cause out-of-order CAN frame transmission
  2020-01-09 17:39   ` Ahmad Fatoum
@ 2020-01-10 16:31     ` Paolo Abeni
  2020-01-12 21:29       ` Ahmad Fatoum
  0 siblings, 1 reply; 10+ messages in thread
From: Paolo Abeni @ 2020-01-10 16:31 UTC (permalink / raw)
  To: Ahmad Fatoum, linux-can, netdev; +Cc: Sascha Hauer

On Thu, 2020-01-09 at 18:39 +0100, Ahmad Fatoum wrote:
> Hello Paolo,
> 
> On 1/9/20 1:51 PM, Paolo Abeni wrote:
> > On Wed, 2020-01-08 at 15:55 +0100, Ahmad Fatoum wrote:
> > > I've run into an issue of CAN frames being sent out-of-order on an i.MX6 Dual
> > > with Linux v5.5-rc5. Bisecting has lead me down to this commit:
> > 
> > Thank you for the report.
> 
> Thanks for the prompt patch. :-)
> 
> > The code is only build-tested, could you please try it in your setup?
> 
> Issue still persists, albeit appears to have become much less frequent. Took 2 million
> frames till first two were swapped. What I usually saw was a swap every few thousand
> frames at least and quite often more frequent than that. Might just be noise though.

Thank you for testing. Even with the proposed patch there is still a
possible race condition: the CPU holding the seqlock can clear the
'empty' flag after that the CPU xmitting the packet enqueue it and set
the 'empty' flag.

The only option I can think of - beyond plain revert - is updating the
'empty' flag in a even a more coarse way, as in the following patch.

Again, the code only build tested and very rough, but it would be
helpful if you could give it a spin.

Thank you!

Paolo

---
diff --git a/include/net/pkt_sched.h b/include/net/pkt_sched.h
index 6a70845bd9ab..fb365fbf65f8 100644
--- a/include/net/pkt_sched.h
+++ b/include/net/pkt_sched.h
@@ -113,7 +113,7 @@ bool sch_direct_xmit(struct sk_buff *skb, struct Qdisc *q,
 		     struct net_device *dev, struct netdev_queue *txq,
 		     spinlock_t *root_lock, bool validate);
 
-void __qdisc_run(struct Qdisc *q);
+int __qdisc_run(struct Qdisc *q);
 
 static inline void qdisc_run(struct Qdisc *q)
 {
diff --git a/include/net/sch_generic.h b/include/net/sch_generic.h
index fceddf89592a..df460fe0773a 100644
--- a/include/net/sch_generic.h
+++ b/include/net/sch_generic.h
@@ -158,7 +158,6 @@ static inline bool qdisc_run_begin(struct Qdisc *qdisc)
 	if (qdisc->flags & TCQ_F_NOLOCK) {
 		if (!spin_trylock(&qdisc->seqlock))
 			return false;
-		WRITE_ONCE(qdisc->empty, false);
 	} else if (qdisc_is_running(qdisc)) {
 		return false;
 	}
diff --git a/net/core/dev.c b/net/core/dev.c
index 0ad39c87b7fd..b6378bb7b64a 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -3624,10 +3624,22 @@ static inline int __dev_xmit_skb(struct sk_buff *skb, struct Qdisc *q,
 end_run:
 			qdisc_run_end(q);
 		} else {
+			int quota = 0;
+
 			rc = q->enqueue(skb, q, &to_free) & NET_XMIT_MASK;
-			qdisc_run(q);
+			if (!qdisc_run_begin(q))
+				goto out;
+
+			WRITE_ONCE(q->empty, false);
+			if (likely(!test_bit(__QDISC_STATE_DEACTIVATED,
+					     &q->state)))
+				quota = __qdisc_run(q);
+			if (quota > 0)
+				WRITE_ONCE(q->empty, true);
+			qdisc_run_end(q);
 		}
 
+out:
 		if (unlikely(to_free))
 			kfree_skb_list(to_free);
 		return rc;
diff --git a/net/sched/sch_generic.c b/net/sched/sch_generic.c
index 5ab696efca95..1bd2c4e9c4c2 100644
--- a/net/sched/sch_generic.c
+++ b/net/sched/sch_generic.c
@@ -376,7 +376,7 @@ static inline bool qdisc_restart(struct Qdisc *q, int *packets)
 	return sch_direct_xmit(skb, q, dev, txq, root_lock, validate);
 }
 
-void __qdisc_run(struct Qdisc *q)
+int __qdisc_run(struct Qdisc *q)
 {
 	int quota = dev_tx_weight;
 	int packets;
@@ -388,6 +388,7 @@ void __qdisc_run(struct Qdisc *q)
 			break;
 		}
 	}
+	return quota;
 }
 
 unsigned long dev_trans_start(struct net_device *dev)
@@ -649,12 +650,9 @@ static struct sk_buff *pfifo_fast_dequeue(struct Qdisc *qdisc)
 
 		skb = __skb_array_consume(q);
 	}
-	if (likely(skb)) {
-		qdisc_update_stats_at_dequeue(qdisc, skb);
-	} else {
-		WRITE_ONCE(qdisc->empty, true);
-	}
 
+	if (likely(skb))
+		qdisc_update_stats_at_dequeue(qdisc, skb);
 	return skb;
 }
 


^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [BUG] pfifo_fast may cause out-of-order CAN frame transmission
  2020-01-10 16:31     ` Paolo Abeni
@ 2020-01-12 21:29       ` Ahmad Fatoum
       [not found]         ` <57a2352dfc442ea2aa9cd653f8e09db277bf67c7.camel@redhat.com>
  0 siblings, 1 reply; 10+ messages in thread
From: Ahmad Fatoum @ 2020-01-12 21:29 UTC (permalink / raw)
  To: Paolo Abeni, linux-can, netdev; +Cc: kernel

Hello Paolo,

On 1/10/20 5:31 PM, Paolo Abeni wrote:
> On Thu, 2020-01-09 at 18:39 +0100, Ahmad Fatoum wrote:
>> Hello Paolo,
>>
>> On 1/9/20 1:51 PM, Paolo Abeni wrote:
>>> On Wed, 2020-01-08 at 15:55 +0100, Ahmad Fatoum wrote:
>>>> I've run into an issue of CAN frames being sent out-of-order on an i.MX6 Dual
>>>> with Linux v5.5-rc5. Bisecting has lead me down to this commit:
>>>
>>> Thank you for the report.
>>
>> Thanks for the prompt patch. :-)
>>
>>> The code is only build-tested, could you please try it in your setup?
>>
>> Issue still persists, albeit appears to have become much less frequent. Took 2 million
>> frames till first two were swapped. What I usually saw was a swap every few thousand
>> frames at least and quite often more frequent than that. Might just be noise though.
> 
> Thank you for testing. Even with the proposed patch there is still a
> possible race condition: the CPU holding the seqlock can clear the
> 'empty' flag after that the CPU xmitting the packet enqueue it and set
> the 'empty' flag.
> 
> The only option I can think of - beyond plain revert - is updating the
> 'empty' flag in a even a more coarse way, as in the following patch.
> 
> Again, the code only build tested and very rough, but it would be
> helpful if you could give it a spin.

Issue still reproducible despite the new patch.

Thanks
Ahmad

> 
> Thank you!
> 
> Paolo
> 
> ---
> diff --git a/include/net/pkt_sched.h b/include/net/pkt_sched.h
> index 6a70845bd9ab..fb365fbf65f8 100644
> --- a/include/net/pkt_sched.h
> +++ b/include/net/pkt_sched.h
> @@ -113,7 +113,7 @@ bool sch_direct_xmit(struct sk_buff *skb, struct Qdisc *q,
>  		     struct net_device *dev, struct netdev_queue *txq,
>  		     spinlock_t *root_lock, bool validate);
>  
> -void __qdisc_run(struct Qdisc *q);
> +int __qdisc_run(struct Qdisc *q);
>  
>  static inline void qdisc_run(struct Qdisc *q)
>  {
> diff --git a/include/net/sch_generic.h b/include/net/sch_generic.h
> index fceddf89592a..df460fe0773a 100644
> --- a/include/net/sch_generic.h
> +++ b/include/net/sch_generic.h
> @@ -158,7 +158,6 @@ static inline bool qdisc_run_begin(struct Qdisc *qdisc)
>  	if (qdisc->flags & TCQ_F_NOLOCK) {
>  		if (!spin_trylock(&qdisc->seqlock))
>  			return false;
> -		WRITE_ONCE(qdisc->empty, false);
>  	} else if (qdisc_is_running(qdisc)) {
>  		return false;
>  	}
> diff --git a/net/core/dev.c b/net/core/dev.c
> index 0ad39c87b7fd..b6378bb7b64a 100644
> --- a/net/core/dev.c
> +++ b/net/core/dev.c
> @@ -3624,10 +3624,22 @@ static inline int __dev_xmit_skb(struct sk_buff *skb, struct Qdisc *q,
>  end_run:
>  			qdisc_run_end(q);
>  		} else {
> +			int quota = 0;
> +
>  			rc = q->enqueue(skb, q, &to_free) & NET_XMIT_MASK;
> -			qdisc_run(q);
> +			if (!qdisc_run_begin(q))
> +				goto out;
> +
> +			WRITE_ONCE(q->empty, false);
> +			if (likely(!test_bit(__QDISC_STATE_DEACTIVATED,
> +					     &q->state)))
> +				quota = __qdisc_run(q);
> +			if (quota > 0)
> +				WRITE_ONCE(q->empty, true);
> +			qdisc_run_end(q);
>  		}
>  
> +out:
>  		if (unlikely(to_free))
>  			kfree_skb_list(to_free);
>  		return rc;
> diff --git a/net/sched/sch_generic.c b/net/sched/sch_generic.c
> index 5ab696efca95..1bd2c4e9c4c2 100644
> --- a/net/sched/sch_generic.c
> +++ b/net/sched/sch_generic.c
> @@ -376,7 +376,7 @@ static inline bool qdisc_restart(struct Qdisc *q, int *packets)
>  	return sch_direct_xmit(skb, q, dev, txq, root_lock, validate);
>  }
>  
> -void __qdisc_run(struct Qdisc *q)
> +int __qdisc_run(struct Qdisc *q)
>  {
>  	int quota = dev_tx_weight;
>  	int packets;
> @@ -388,6 +388,7 @@ void __qdisc_run(struct Qdisc *q)
>  			break;
>  		}
>  	}
> +	return quota;
>  }
>  
>  unsigned long dev_trans_start(struct net_device *dev)
> @@ -649,12 +650,9 @@ static struct sk_buff *pfifo_fast_dequeue(struct Qdisc *qdisc)
>  
>  		skb = __skb_array_consume(q);
>  	}
> -	if (likely(skb)) {
> -		qdisc_update_stats_at_dequeue(qdisc, skb);
> -	} else {
> -		WRITE_ONCE(qdisc->empty, true);
> -	}
>  
> +	if (likely(skb))
> +		qdisc_update_stats_at_dequeue(qdisc, skb);
>  	return skb;
>  }
>  
> 
> 

-- 
Pengutronix e.K.                           |                             |
Steuerwalder Str. 21                       | http://www.pengutronix.de/  |
31137 Hildesheim, Germany                  | Phone: +49-5121-206917-0    |
Amtsgericht Hildesheim, HRA 2686           | Fax:   +49-5121-206917-5555 |

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [BUG] pfifo_fast may cause out-of-order CAN frame transmission
       [not found]         ` <57a2352dfc442ea2aa9cd653f8e09db277bf67c7.camel@redhat.com>
@ 2020-01-20 16:06           ` Ahmad Fatoum
  2020-02-04 16:25             ` Ahmad Fatoum
  0 siblings, 1 reply; 10+ messages in thread
From: Ahmad Fatoum @ 2020-01-20 16:06 UTC (permalink / raw)
  To: Paolo Abeni, netdev, linux-can, Pengutronix Kernel Team

Hello Paolo,

On 1/16/20 1:40 PM, Paolo Abeni wrote:
> I'm sorry for this trial & error experience. I tried to reproduce the
> issue on top of the vcan virtual device, but it looks like it requires
> the timing imposed by a real device, and it's missing here (TL;DR: I
> can't reproduce the issue locally).

No worries. I don't mind testing.

> 
> Code wise, the 2nd patch closed a possible race, but it dumbly re-
> opened the one addressed by the first attempt - the 'empty' field must
> be cleared prior to the trylock operation, or we may end-up with such
> field set and the queue not empty.
> 
> So, could you please try the following code?

Unfortunately, I still see observe reodering.

Thanks
Ahmad

> 
> Many thanks!
> ---
> diff --git a/include/net/pkt_sched.h b/include/net/pkt_sched.h
> index 6a70845bd9ab..fb365fbf65f8 100644
> --- a/include/net/pkt_sched.h
> +++ b/include/net/pkt_sched.h
> @@ -113,7 +113,7 @@ bool sch_direct_xmit(struct sk_buff *skb, struct Qdisc *q,
>  		     struct net_device *dev, struct netdev_queue *txq,
>  		     spinlock_t *root_lock, bool validate);
>  
> -void __qdisc_run(struct Qdisc *q);
> +int __qdisc_run(struct Qdisc *q);
>  
>  static inline void qdisc_run(struct Qdisc *q)
>  {
> diff --git a/include/net/sch_generic.h b/include/net/sch_generic.h
> index fceddf89592a..df460fe0773a 100644
> --- a/include/net/sch_generic.h
> +++ b/include/net/sch_generic.h
> @@ -158,7 +158,6 @@ static inline bool qdisc_run_begin(struct Qdisc *qdisc)
>  	if (qdisc->flags & TCQ_F_NOLOCK) {
>  		if (!spin_trylock(&qdisc->seqlock))
>  			return false;
> -		WRITE_ONCE(qdisc->empty, false);
>  	} else if (qdisc_is_running(qdisc)) {
>  		return false;
>  	}
> diff --git a/net/core/dev.c b/net/core/dev.c
> index 0ad39c87b7fd..41e89796cc6b 100644
> --- a/net/core/dev.c
> +++ b/net/core/dev.c
> @@ -3624,10 +3624,23 @@ static inline int __dev_xmit_skb(struct sk_buff *skb, struct Qdisc *q,
>  end_run:
>  			qdisc_run_end(q);
>  		} else {
> +			int quota = 0;
> +
>  			rc = q->enqueue(skb, q, &to_free) & NET_XMIT_MASK;
> -			qdisc_run(q);
> +			if (READ_ONCE(q->empty))
> +				WRITE_ONCE(q->empty, false);
> +			if (!qdisc_run_begin(q))
> +				goto out;
> +
> +			if (likely(!test_bit(__QDISC_STATE_DEACTIVATED,
> +					     &q->state)))
> +				quota = __qdisc_run(q);
> +			if (quota > 0)
> +				WRITE_ONCE(q->empty, true);
> +			qdisc_run_end(q);
>  		}
>  
> +out:
>  		if (unlikely(to_free))
>  			kfree_skb_list(to_free);
>  		return rc;
> diff --git a/net/sched/sch_generic.c b/net/sched/sch_generic.c
> index 5ab696efca95..1bd2c4e9c4c2 100644
> --- a/net/sched/sch_generic.c
> +++ b/net/sched/sch_generic.c
> @@ -376,7 +376,7 @@ static inline bool qdisc_restart(struct Qdisc *q, int *packets)
>  	return sch_direct_xmit(skb, q, dev, txq, root_lock, validate);
>  }
>  
> -void __qdisc_run(struct Qdisc *q)
> +int __qdisc_run(struct Qdisc *q)
>  {
>  	int quota = dev_tx_weight;
>  	int packets;
> @@ -388,6 +388,7 @@ void __qdisc_run(struct Qdisc *q)
>  			break;
>  		}
>  	}
> +	return quota;
>  }
>  
>  unsigned long dev_trans_start(struct net_device *dev)
> @@ -649,12 +650,9 @@ static struct sk_buff *pfifo_fast_dequeue(struct Qdisc *qdisc)
>  
>  		skb = __skb_array_consume(q);
>  	}
> -	if (likely(skb)) {
> -		qdisc_update_stats_at_dequeue(qdisc, skb);
> -	} else {
> -		WRITE_ONCE(qdisc->empty, true);
> -	}
>  
> +	if (likely(skb))
> +		qdisc_update_stats_at_dequeue(qdisc, skb);
>  	return skb;
>  }
>  
> 
> 

-- 
Pengutronix e.K.                           |                             |
Steuerwalder Str. 21                       | http://www.pengutronix.de/  |
31137 Hildesheim, Germany                  | Phone: +49-5121-206917-0    |
Amtsgericht Hildesheim, HRA 2686           | Fax:   +49-5121-206917-5555 |

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [BUG] pfifo_fast may cause out-of-order CAN frame transmission
  2020-01-20 16:06           ` Ahmad Fatoum
@ 2020-02-04 16:25             ` Ahmad Fatoum
  2020-02-06 13:21               ` Paolo Abeni
  0 siblings, 1 reply; 10+ messages in thread
From: Ahmad Fatoum @ 2020-02-04 16:25 UTC (permalink / raw)
  To: Paolo Abeni, netdev, linux-can, Pengutronix Kernel Team

Hello Paolo,

On 1/20/20 5:06 PM, Ahmad Fatoum wrote:
> Hello Paolo,
> 
> On 1/16/20 1:40 PM, Paolo Abeni wrote:
>> I'm sorry for this trial & error experience. I tried to reproduce the
>> issue on top of the vcan virtual device, but it looks like it requires
>> the timing imposed by a real device, and it's missing here (TL;DR: I
>> can't reproduce the issue locally).
> 
> No worries. I don't mind testing.
> 
>>
>> Code wise, the 2nd patch closed a possible race, but it dumbly re-
>> opened the one addressed by the first attempt - the 'empty' field must
>> be cleared prior to the trylock operation, or we may end-up with such
>> field set and the queue not empty.
>>
>> So, could you please try the following code?
> 
> Unfortunately, I still see observe reodering.

Any news?

Thanks
Ahmad

-- 
Pengutronix e.K.                           |                             |
Steuerwalder Str. 21                       | http://www.pengutronix.de/  |
31137 Hildesheim, Germany                  | Phone: +49-5121-206917-0    |
Amtsgericht Hildesheim, HRA 2686           | Fax:   +49-5121-206917-5555 |

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [BUG] pfifo_fast may cause out-of-order CAN frame transmission
  2020-02-04 16:25             ` Ahmad Fatoum
@ 2020-02-06 13:21               ` Paolo Abeni
  2020-02-06 17:06                 ` Oliver Hartkopp
  2020-02-14 16:03                 ` Ahmad Fatoum
  0 siblings, 2 replies; 10+ messages in thread
From: Paolo Abeni @ 2020-02-06 13:21 UTC (permalink / raw)
  To: Ahmad Fatoum, netdev, linux-can, Pengutronix Kernel Team

On Tue, 2020-02-04 at 17:25 +0100, Ahmad Fatoum wrote:
> Hello Paolo,
> 
> On 1/20/20 5:06 PM, Ahmad Fatoum wrote:
> > Hello Paolo,
> > 
> > On 1/16/20 1:40 PM, Paolo Abeni wrote:
> > > I'm sorry for this trial & error experience. I tried to reproduce the
> > > issue on top of the vcan virtual device, but it looks like it requires
> > > the timing imposed by a real device, and it's missing here (TL;DR: I
> > > can't reproduce the issue locally).
> > 
> > No worries. I don't mind testing.
> > 
> > > Code wise, the 2nd patch closed a possible race, but it dumbly re-
> > > opened the one addressed by the first attempt - the 'empty' field must
> > > be cleared prior to the trylock operation, or we may end-up with such
> > > field set and the queue not empty.
> > > 
> > > So, could you please try the following code?
> > 
> > Unfortunately, I still see observe reodering.
> 
> Any news?

I'm unable to find any better solution than a revert. That will cost
some small performace regression, so I'm a bit reluctant to go ahead.
If there is agreement I can post the revert.

Cheers,

Paolo


^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [BUG] pfifo_fast may cause out-of-order CAN frame transmission
  2020-02-06 13:21               ` Paolo Abeni
@ 2020-02-06 17:06                 ` Oliver Hartkopp
  2020-02-14 16:03                 ` Ahmad Fatoum
  1 sibling, 0 replies; 10+ messages in thread
From: Oliver Hartkopp @ 2020-02-06 17:06 UTC (permalink / raw)
  To: Paolo Abeni, Ahmad Fatoum, netdev, linux-can, Pengutronix Kernel Team

Hi Paolo,

On 06/02/2020 14.21, Paolo Abeni wrote:
> On Tue, 2020-02-04 at 17:25 +0100, Ahmad Fatoum wrote:
>> Hello Paolo,
>>
>> On 1/20/20 5:06 PM, Ahmad Fatoum wrote:
>>> Hello Paolo,
>>>
>>> On 1/16/20 1:40 PM, Paolo Abeni wrote:
>>>> I'm sorry for this trial & error experience. I tried to reproduce the
>>>> issue on top of the vcan virtual device, but it looks like it requires
>>>> the timing imposed by a real device, and it's missing here (TL;DR: I
>>>> can't reproduce the issue locally).
>>>
>>> No worries. I don't mind testing.
>>>
>>>> Code wise, the 2nd patch closed a possible race, but it dumbly re-
>>>> opened the one addressed by the first attempt - the 'empty' field must
>>>> be cleared prior to the trylock operation, or we may end-up with such
>>>> field set and the queue not empty.
>>>>
>>>> So, could you please try the following code?
>>>
>>> Unfortunately, I still see observe reodering.
>>
>> Any news?
> 
> I'm unable to find any better solution than a revert. That will cost
> some small performace regression, so I'm a bit reluctant to go ahead.
> If there is agreement I can post the revert.

Is it possible that the current pfifo_fast handling has some additional 
problems:

https://marc.info/?l=linux-netdev&m=158092393613669&w=2

Or is this unrelated?

Best,
Oliver

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [BUG] pfifo_fast may cause out-of-order CAN frame transmission
  2020-02-06 13:21               ` Paolo Abeni
  2020-02-06 17:06                 ` Oliver Hartkopp
@ 2020-02-14 16:03                 ` Ahmad Fatoum
  1 sibling, 0 replies; 10+ messages in thread
From: Ahmad Fatoum @ 2020-02-14 16:03 UTC (permalink / raw)
  To: Paolo Abeni, netdev, linux-can, Pengutronix Kernel Team

Hello Paolo,

On 2/6/20 2:21 PM, Paolo Abeni wrote:
> On Tue, 2020-02-04 at 17:25 +0100, Ahmad Fatoum wrote:
>> Hello Paolo,
>>
>> On 1/20/20 5:06 PM, Ahmad Fatoum wrote:
>>> Hello Paolo,
>>>
>>> On 1/16/20 1:40 PM, Paolo Abeni wrote:
>>>> I'm sorry for this trial & error experience. I tried to reproduce the
>>>> issue on top of the vcan virtual device, but it looks like it requires
>>>> the timing imposed by a real device, and it's missing here (TL;DR: I
>>>> can't reproduce the issue locally).
>>>
>>> No worries. I don't mind testing.
>>>
>>>> Code wise, the 2nd patch closed a possible race, but it dumbly re-
>>>> opened the one addressed by the first attempt - the 'empty' field must
>>>> be cleared prior to the trylock operation, or we may end-up with such
>>>> field set and the queue not empty.
>>>>
>>>> So, could you please try the following code?
>>>
>>> Unfortunately, I still see observe reodering.
>>
>> Any news?
> 
> I'm unable to find any better solution than a revert. That will cost
> some small performace regression, so I'm a bit reluctant to go ahead.
> If there is agreement I can post the revert.

Could you draft the revert?
d518d2ed864 ("net/sched: fix race between deactivation and dequeue for NOLOCK qdisc")
looks applicable even with the revert, so I don't want to inadvertently cause
a regression by wrongly reverting.

Cheers
Ahmad


> 
> Cheers,
> 
> Paolo
> 
> 

-- 
Pengutronix e.K.                           |                             |
Steuerwalder Str. 21                       | http://www.pengutronix.de/  |
31137 Hildesheim, Germany                  | Phone: +49-5121-206917-0    |
Amtsgericht Hildesheim, HRA 2686           | Fax:   +49-5121-206917-5555 |

^ permalink raw reply	[flat|nested] 10+ messages in thread

end of thread, back to index

Thread overview: 10+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-01-08 14:55 [BUG] pfifo_fast may cause out-of-order CAN frame transmission Ahmad Fatoum
2020-01-09 12:51 ` Paolo Abeni
2020-01-09 17:39   ` Ahmad Fatoum
2020-01-10 16:31     ` Paolo Abeni
2020-01-12 21:29       ` Ahmad Fatoum
     [not found]         ` <57a2352dfc442ea2aa9cd653f8e09db277bf67c7.camel@redhat.com>
2020-01-20 16:06           ` Ahmad Fatoum
2020-02-04 16:25             ` Ahmad Fatoum
2020-02-06 13:21               ` Paolo Abeni
2020-02-06 17:06                 ` Oliver Hartkopp
2020-02-14 16:03                 ` Ahmad Fatoum

Netdev Archive on lore.kernel.org

Archives are clonable:
	git clone --mirror https://lore.kernel.org/netdev/0 netdev/git/0.git
	git clone --mirror https://lore.kernel.org/netdev/1 netdev/git/1.git

	# If you have public-inbox 1.1+ installed, you may
	# initialize and index your mirror using the following commands:
	public-inbox-init -V2 netdev netdev/ https://lore.kernel.org/netdev \
		netdev@vger.kernel.org
	public-inbox-index netdev

Example config snippet for mirrors

Newsgroup available over NNTP:
	nntp://nntp.lore.kernel.org/org.kernel.vger.netdev


AGPL code for this site: git clone https://public-inbox.org/public-inbox.git