Netdev Archive on lore.kernel.org
* Packet gets stuck in NOLOCK pfifo_fast qdisc
@ 2019-10-09  6:46 Jonas Bonn
  2019-10-09 19:14 ` Paolo Abeni
  0 siblings, 1 reply; 4+ messages in thread
From: Jonas Bonn @ 2019-10-09  6:46 UTC (permalink / raw)
  To: netdev, LKML, David S . Miller, pabeni

Hi,

The lockless pfifo_fast qdisc has an issue with packets getting stuck in 
the queue.  What appears to happen is:

i)  Thread 1 holds the 'seqlock' on the qdisc and dequeues packets.
ii)  Thread 1 dequeues the last packet in the queue.
iii)  Thread 1 iterates through the qdisc->dequeue function again and 
determines that the queue is empty.

iv)  Thread 2 queues up a packet.  Since 'seqlock' is busy, it just 
assumes the packet will be dequeued by whoever is holding the lock.

v)  Thread 1 releases 'seqlock'.

After v), nobody will check whether there are packets in the queue until 
a new packet is enqueued.  As a result, the packet enqueued by Thread 2 
may be delayed indefinitely.
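
For context, step iv) comes from the trylock in qdisc_run_begin().
Roughly, the lockless enqueue path looks like this (a paraphrase, not a
literal quote of net/core/dev.c; details vary by kernel version):

	/* __dev_xmit_skb(), NOLOCK branch, roughly: */
	if (q->flags & TCQ_F_NOLOCK) {
		rc = q->enqueue(skb, q, &to_free) & NET_XMIT_MASK;
		/* qdisc_run() -> qdisc_run_begin() does
		 * spin_trylock(&q->seqlock).  If the trylock fails, we
		 * rely on the current 'seqlock' holder to flush the
		 * queue -- but that holder may already have concluded
		 * in steps ii)/iii) that the queue is empty.
		 */
		qdisc_run(q);
	}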

What I think should happen is that Thread 1 re-checks the queue after 
releasing 'seqlock' and reschedules if it is non-empty.  I poked at 
this, but it wasn't obvious how to go about it given the way the 
layering works here.  Roughly:

qdisc_run_end() {
...
	spin_unlock(seqlock);
	if (!qdisc_is_empty(qdisc))
		qdisc_run();
...
}

Calling qdisc_run() from qdisc_run_end() doesn't feel right!

There's a qdisc->empty property (qdisc_is_empty() relies on it), but 
it's not particularly useful in this case: there's a race in setting 
this property, which makes it not quite reliable.
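
For reference, qdisc_is_empty() is just an unordered read of that flag
(roughly, from include/net/sch_generic.h of this era):

	static inline bool qdisc_is_empty(const struct Qdisc *qdisc)
	{
		/* ->empty is set true by the dequeue path when the ring
		 * looks empty and cleared again on the enqueue side;
		 * neither write is ordered against the ring operations
		 * themselves, so a reader can observe 'empty' while a
		 * packet sits in the ring.
		 */
		return READ_ONCE(qdisc->empty);
	}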

Hope someone can shine some light on how to proceed here.

/Jonas


* Re: Packet gets stuck in NOLOCK pfifo_fast qdisc
  2019-10-09  6:46 Packet gets stuck in NOLOCK pfifo_fast qdisc Jonas Bonn
@ 2019-10-09 19:14 ` Paolo Abeni
  2019-10-10  6:27   ` Jonas Bonn
  2019-10-11  0:39   ` Jonas Bonn
  0 siblings, 2 replies; 4+ messages in thread
From: Paolo Abeni @ 2019-10-09 19:14 UTC (permalink / raw)
  To: Jonas Bonn, netdev, LKML, David S . Miller, John Fastabend

On Wed, 2019-10-09 at 08:46 +0200, Jonas Bonn wrote:
> Hi,
> 
> The lockless pfifo_fast qdisc has an issue with packets getting stuck in 
> the queue.  What appears to happen is:
> 
> i)  Thread 1 holds the 'seqlock' on the qdisc and dequeues packets.
> ii)  Thread 1 dequeues the last packet in the queue.
> iii)  Thread 1 iterates through the qdisc->dequeue function again and 
> determines that the queue is empty.
> 
> iv)  Thread 2 queues up a packet.  Since 'seqlock' is busy, it just 
> assumes the packet will be dequeued by whoever is holding the lock.
> 
> v)  Thread 1 releases 'seqlock'.
> 
> After v), nobody will check whether there are packets in the queue until 
> a new packet is enqueued.  As a result, the packet enqueued by Thread 2 
> may be delayed indefinitely.

I think you are right.

It looks like this possible race has been present since the initial
lockless implementation - commit 6b3ba9146fe6 ("net: sched: allow
qdiscs to handle locking").

Anyhow, the race window looks quite tiny - I never observed this issue
in my tests. Do you have a working reproducer?

Something like the following code - completely untested - can possibly
address the issue, but it's a bit rough and I would prefer not to add
additional complexity to the lockless qdiscs.  The idea is to make
__qdisc_run() return its leftover quota, so that qdisc_run() can notice
- after releasing 'seqlock' - that another thread enqueued a packet in
the meantime, and reschedule the qdisc in that case.  Can you please
give it a spin?

Thanks,

Paolo
---
diff --git a/include/net/pkt_sched.h b/include/net/pkt_sched.h
index 6a70845bd9ab..65a1c03330d6 100644
--- a/include/net/pkt_sched.h
+++ b/include/net/pkt_sched.h
@@ -113,18 +113,23 @@ bool sch_direct_xmit(struct sk_buff *skb, struct Qdisc *q,
 		     struct net_device *dev, struct netdev_queue *txq,
 		     spinlock_t *root_lock, bool validate);
 
-void __qdisc_run(struct Qdisc *q);
+int __qdisc_run(struct Qdisc *q);
 
 static inline void qdisc_run(struct Qdisc *q)
 {
+	int quota = 0;
+
 	if (qdisc_run_begin(q)) {
 		/* NOLOCK qdisc must check 'state' under the qdisc seqlock
 		 * to avoid racing with dev_qdisc_reset()
 		 */
 		if (!(q->flags & TCQ_F_NOLOCK) ||
 		    likely(!test_bit(__QDISC_STATE_DEACTIVATED, &q->state)))
-			__qdisc_run(q);
+			quota = __qdisc_run(q);
 		qdisc_run_end(q);
+
+		if (quota > 0 && q->flags & TCQ_F_NOLOCK && q->ops->peek(q))
+			__netif_schedule(q);
 	}
 }
 
diff --git a/net/sched/sch_generic.c b/net/sched/sch_generic.c
index 17bd8f539bc7..013480f6a794 100644
--- a/net/sched/sch_generic.c
+++ b/net/sched/sch_generic.c
@@ -376,7 +376,7 @@ static inline bool qdisc_restart(struct Qdisc *q, int *packets)
 	return sch_direct_xmit(skb, q, dev, txq, root_lock, validate);
 }
 
-void __qdisc_run(struct Qdisc *q)
+int __qdisc_run(struct Qdisc *q)
 {
 	int quota = dev_tx_weight;
 	int packets;
@@ -390,9 +390,10 @@ void __qdisc_run(struct Qdisc *q)
 		quota -= packets;
 		if (quota <= 0 || need_resched()) {
 			__netif_schedule(q);
-			break;
+			return 0;
 		}
 	}
+	return quota;
 }
 
 unsigned long dev_trans_start(struct net_device *dev)



* Re: Packet gets stuck in NOLOCK pfifo_fast qdisc
  2019-10-09 19:14 ` Paolo Abeni
@ 2019-10-10  6:27   ` Jonas Bonn
  2019-10-11  0:39   ` Jonas Bonn
  1 sibling, 0 replies; 4+ messages in thread
From: Jonas Bonn @ 2019-10-10  6:27 UTC (permalink / raw)
  To: Paolo Abeni, netdev, LKML, David S . Miller, John Fastabend

Hi Paolo,

On 09/10/2019 21:14, Paolo Abeni wrote:
> On Wed, 2019-10-09 at 08:46 +0200, Jonas Bonn wrote:
>> Hi,
>>
>> The lockless pfifo_fast qdisc has an issue with packets getting stuck in
>> the queue.  What appears to happen is:
>>
>> i)  Thread 1 holds the 'seqlock' on the qdisc and dequeues packets.
>> ii)  Thread 1 dequeues the last packet in the queue.
>> iii)  Thread 1 iterates through the qdisc->dequeue function again and
>> determines that the queue is empty.
>>
>> iv)  Thread 2 queues up a packet.  Since 'seqlock' is busy, it just
>> assumes the packet will be dequeued by whoever is holding the lock.
>>
>> v)  Thread 1 releases 'seqlock'.
>>
>> After v), nobody will check whether there are packets in the queue until
>> a new packet is enqueued.  As a result, the packet enqueued by Thread 2
>> may be delayed indefinitely.
> 
> I think you are right.
> 
> It looks like this possible race has been present since the initial
> lockless implementation - commit 6b3ba9146fe6 ("net: sched: allow
> qdiscs to handle locking").
> 
> Anyhow, the race window looks quite tiny - I never observed this issue
> in my tests. Do you have a working reproducer?

Yes, it's reliably reproducible.  We do network latency measurements 
and see latency spikes for the packets that get stuck in the queue.
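
A minimal sketch of the kind of multi-threaded sender that exercises
the race (hypothetical details - address, port, pacing; our actual
measurement rig is more elaborate and timestamps on the receiver side):

	#include <arpa/inet.h>
	#include <pthread.h>
	#include <string.h>
	#include <sys/socket.h>
	#include <time.h>
	#include <unistd.h>

	#define NTHREADS 4	/* several senders racing on one qdisc */

	static void *blast(void *arg)
	{
		struct sockaddr_in dst = {
			.sin_family = AF_INET,
			.sin_port   = htons(9000),	/* hypothetical sink */
		};
		int fd = socket(AF_INET, SOCK_DGRAM, 0);
		struct timespec ts;
		char buf[64] = { 0 };

		inet_pton(AF_INET, "192.0.2.1", &dst.sin_addr);
		for (;;) {
			/* carry the send timestamp in the payload so
			 * the receiver can compute per-packet latency
			 */
			clock_gettime(CLOCK_MONOTONIC, &ts);
			memcpy(buf, &ts, sizeof(ts));
			sendto(fd, buf, sizeof(buf), 0,
			       (struct sockaddr *)&dst, sizeof(dst));
			usleep(100);	/* pace the senders; tune to taste */
		}
		return NULL;
	}

	int main(void)
	{
		pthread_t th[NTHREADS];
		int i;

		for (i = 0; i < NTHREADS; i++)
			pthread_create(&th[i], NULL, blast, NULL);
		for (i = 0; i < NTHREADS; i++)
			pthread_join(th[i], NULL);
		return 0;	/* build with: gcc -pthread ... */
	}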

> 
> Something like the following code - completely untested - can possibly
> address the issue, but it's a bit rough and I would prefer not to add
> additional complexity to the lockless qdiscs.  Can you please give it
> a spin?

Your change looks reasonable.  I'll give it a try.


> 
> Thanks,
> 
> Paolo
> ---
> diff --git a/include/net/pkt_sched.h b/include/net/pkt_sched.h
> index 6a70845bd9ab..65a1c03330d6 100644
> --- a/include/net/pkt_sched.h
> +++ b/include/net/pkt_sched.h
> @@ -113,18 +113,23 @@ bool sch_direct_xmit(struct sk_buff *skb, struct Qdisc *q,
>   		     struct net_device *dev, struct netdev_queue *txq,
>   		     spinlock_t *root_lock, bool validate);
>   
> -void __qdisc_run(struct Qdisc *q);
> +int __qdisc_run(struct Qdisc *q);
>   
>   static inline void qdisc_run(struct Qdisc *q)
>   {
> +	int quota = 0;
> +
>   	if (qdisc_run_begin(q)) {
>   		/* NOLOCK qdisc must check 'state' under the qdisc seqlock
>   		 * to avoid racing with dev_qdisc_reset()
>   		 */
>   		if (!(q->flags & TCQ_F_NOLOCK) ||
>   		    likely(!test_bit(__QDISC_STATE_DEACTIVATED, &q->state)))
> -			__qdisc_run(q);
> +			quota = __qdisc_run(q);
>   		qdisc_run_end(q);
> +
> +		if (quota > 0 && q->flags & TCQ_F_NOLOCK && q->ops->peek(q))
> +			__netif_schedule(q);

Not sure this is relevant, but there's a subtle difference between the 
way the underlying ptr_ring peeks at the queue head and the way it 
checks whether the queue is empty.

For peek it's:

READ_ONCE(r->queue[r->consumer_head]);

For is_empty it's:

!r->queue[READ_ONCE(r->consumer_head)];

The placement of the READ_ONCE differs between the two.  I can't get my 
head around whether this difference is significant.  If it is, then 
perhaps an is_empty() method is needed on the qdisc_ops?
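
For reference, the two helpers in full (roughly as they appear in
include/linux/ptr_ring.h of this era):

	/* callers are expected to hold the consumer lock */
	static inline void *__ptr_ring_peek(struct ptr_ring *r)
	{
		if (likely(r->size))
			return READ_ONCE(r->queue[r->consumer_head]);
		return NULL;
	}

	/* may be called without the consumer lock; if consumers run
	 * concurrently, the result is only a hint and the caller must
	 * re-check or synchronize before trusting it
	 */
	static inline bool __ptr_ring_empty(struct ptr_ring *r)
	{
		if (likely(r->size))
			return !r->queue[READ_ONCE(r->consumer_head)];
		return true;
	}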

/Jonas


* Re: Packet gets stuck in NOLOCK pfifo_fast qdisc
  2019-10-09 19:14 ` Paolo Abeni
  2019-10-10  6:27   ` Jonas Bonn
@ 2019-10-11  0:39   ` Jonas Bonn
  1 sibling, 0 replies; 4+ messages in thread
From: Jonas Bonn @ 2019-10-11  0:39 UTC (permalink / raw)
  To: Paolo Abeni, netdev, LKML, David S . Miller, John Fastabend

Hi Paolo,

On 09/10/2019 21:14, Paolo Abeni wrote:
> Something like the following code - completely untested - can possibly
> address the issue, but it's a bit rough and I would prefer not to add
> additional complexity to the lockless qdiscs.  Can you please give it
> a spin?

We've tested a couple of variants of this patch today, but unfortunately 
it doesn't fix the problem of packets getting stuck in the queue.

A couple of comments:

i)  On 5.4, there is the BYPASS path that also needs the same 
treatment, as it essentially replicates the behaviour of qdisc_run(), 
just without the enqueue/dequeue steps (see the sketch after this list).

ii)  We work a lot with the 4.19 kernel, so I backported the patch to 
that version and tested there.  There the solution should be more 
robust, as the BYPASS path does not exist.
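
For reference, the 5.4 BYPASS branch mentioned in i) looks roughly like
this (paraphrased from __dev_xmit_skb() in net/core/dev.c; not a
literal quote):

	if (q->flags & TCQ_F_NOLOCK) {
		if ((q->flags & TCQ_F_CAN_BYPASS) && q->empty &&
		    qdisc_run_begin(q)) {
			/* the queue looked empty: transmit directly,
			 * skipping the enqueue/dequeue steps
			 */
			if (sch_direct_xmit(skb, q, dev, txq, NULL, true))
				__qdisc_run(q);
			qdisc_run_end(q);
			/* a packet enqueued by another thread while we
			 * held 'seqlock' needs the same re-check here
			 * as in qdisc_run()
			 */
		} else {
			rc = q->enqueue(skb, q, &to_free) & NET_XMIT_MASK;
			qdisc_run(q);
		}
	}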

Unfortunately, in both cases we continue to see the issue of the "last 
packet" getting stuck in the queue.

/Jonas


> 
> Thanks,
> 
> Paolo
> ---
> diff --git a/include/net/pkt_sched.h b/include/net/pkt_sched.h
> index 6a70845bd9ab..65a1c03330d6 100644
> --- a/include/net/pkt_sched.h
> +++ b/include/net/pkt_sched.h
> @@ -113,18 +113,23 @@ bool sch_direct_xmit(struct sk_buff *skb, struct Qdisc *q,
>   		     struct net_device *dev, struct netdev_queue *txq,
>   		     spinlock_t *root_lock, bool validate);
>   
> -void __qdisc_run(struct Qdisc *q);
> +int __qdisc_run(struct Qdisc *q);
>   
>   static inline void qdisc_run(struct Qdisc *q)
>   {
> +	int quota = 0;
> +
>   	if (qdisc_run_begin(q)) {
>   		/* NOLOCK qdisc must check 'state' under the qdisc seqlock
>   		 * to avoid racing with dev_qdisc_reset()
>   		 */
>   		if (!(q->flags & TCQ_F_NOLOCK) ||
>   		    likely(!test_bit(__QDISC_STATE_DEACTIVATED, &q->state)))
> -			__qdisc_run(q);
> +			quota = __qdisc_run(q);
>   		qdisc_run_end(q);
> +
> +		if (quota > 0 && q->flags & TCQ_F_NOLOCK && q->ops->peek(q))
> +			__netif_schedule(q);
>   	}
>   }
>   
> diff --git a/net/sched/sch_generic.c b/net/sched/sch_generic.c
> index 17bd8f539bc7..013480f6a794 100644
> --- a/net/sched/sch_generic.c
> +++ b/net/sched/sch_generic.c
> @@ -376,7 +376,7 @@ static inline bool qdisc_restart(struct Qdisc *q, int *packets)
>   	return sch_direct_xmit(skb, q, dev, txq, root_lock, validate);
>   }
>   
> -void __qdisc_run(struct Qdisc *q)
> +int __qdisc_run(struct Qdisc *q)
>   {
>   	int quota = dev_tx_weight;
>   	int packets;
> @@ -390,9 +390,10 @@ void __qdisc_run(struct Qdisc *q)
>   		quota -= packets;
>   		if (quota <= 0 || need_resched()) {
>   			__netif_schedule(q);
> -			break;
> +			return 0;
>   		}
>   	}
> +	return quota;
>   }
>   
>   unsigned long dev_trans_start(struct net_device *dev)
> 

