* rawip: delayed and mis-sequenced transmits
@ 2022-07-06 15:54 David Laight
  2022-07-07  1:54 ` Jakub Kicinski
  0 siblings, 1 reply; 4+ messages in thread
From: David Laight @ 2022-07-06 15:54 UTC (permalink / raw)
  To: netdev

I'm seeing some UDP packets being significantly delayed (even 200ms)
and then being received immediately after another packet.
During the delay other packets are received on the same UDP socket.
The receiving program is very simple - so the problem must be
with the sender.

The sender is designed to send a lot of packets, but in this case
two packets are sent every 20ms.
While the two packets have the same UDP port numbers, they are
different application data streams.

The sender has a lot of threads that can send packets, each uses
a single 'rawip' socket (to avoid contention on the socket lock).
It is pretty random (but heavily biased) which threads actually send
the packets.
However it is pretty much certain that the two sends will be done
by different threads at almost exactly the same time.

I've not yet tried to find where the packets get queued, never
mind why. But my best guess is they are stuck on a queue
associated with the sending socket.

But the only reason I can see for a queue on the socket is to
allow the socket lock to be dropped while a packet is passed to
the ethernet driver.
So a send from a second thread would queue a packet and it would
be picked up when the earlier transmit completes.
This is actually consistent with what I'm seeing - the end of tx
picks up a packet that was queued earlier.
But in my case the packet was queued much, much earlier.
And the earlier send was done by the same thread.

So what I think must be happening is that contention further
down the protocol stack is causing the packet to be queued on the
sending socket instead of (probably) the qdisc layer.

Usually tx packets (seem to) get queued in the qdisc layer and
collected when the thread currently doing a tx setup returns
from the ethernet driver.
So I'm not at all sure why, at least sometimes, packets are
being queued on the socket.

I have got 'threaded napi' enabled (to avoid losing rx packets)
but nothing unusual is enabled on the tx side.
Ethernet is tg3 - single tx ring.

Anyone any ideas before I start digging through the kernel code?

	David

-
Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes, MK1 1PT, UK
Registration No: 1397386 (Wales)



* Re: rawip: delayed and mis-sequenced transmits
  2022-07-06 15:54 rawip: delayed and mis-sequenced transmits David Laight
@ 2022-07-07  1:54 ` Jakub Kicinski
  2022-07-07  8:02   ` David Laight
  2022-07-07  9:34   ` David Laight
  0 siblings, 2 replies; 4+ messages in thread
From: Jakub Kicinski @ 2022-07-07  1:54 UTC (permalink / raw)
  To: David Laight; +Cc: netdev

On Wed, 6 Jul 2022 15:54:18 +0000 David Laight wrote:
> Anyone any ideas before I start digging through the kernel code?

If the qdisc is pfifo_fast and the kernel is old there could be races.
But I don't think that's likely given you probably run something
recent, and the next packet tx would usually flush the stuck packet.
In any case - switching qdisc could be a useful test; also bpftrace
is your friend for catching packets with long sojourn times.


* RE: rawip: delayed and mis-sequenced transmits
  2022-07-07  1:54 ` Jakub Kicinski
@ 2022-07-07  8:02   ` David Laight
  2022-07-07  9:34   ` David Laight
  1 sibling, 0 replies; 4+ messages in thread
From: David Laight @ 2022-07-07  8:02 UTC (permalink / raw)
  To: 'Jakub Kicinski'; +Cc: netdev

From: Jakub Kicinski
> Sent: 07 July 2022 02:54
> 
> On Wed, 6 Jul 2022 15:54:18 +0000 David Laight wrote:
> > Anyone any ideas before I start digging through the kernel code?
> 
> If the qdisc is pfifo_fast and the kernel is old there could be races.
> But I don't think that's likely given you probably run something
> recent, and the next packet tx would usually flush the stuck packet.
> In any case - switching qdisc could be a useful test; also bpftrace
> is your friend for catching packets with long sojourn times.

Yes, I forgot to mention the system is running a 5.18-rc6 kernel.
(I've already got some diagnostics in the receive path.)

I'd expect bugs in the qdisc layer to (mostly) keep packets
in order.
In this case I'm sending several packets at 20ms intervals that
overtake the 'stuck' packet.
So it really must be on a per-socket queue.
Also the 'stuck' packet is sent after the packet that unblocks it.

I'm not sure normal tracing will help.
I'm seeing one mis-sequence every couple of million packets.
I can add code to the ethernet tx code to call ftrace_stop()
when the delayed packet gets sent.
(I've got the same check in the rx path to detect lost packets.)
That should show where it came from and probably why it got queued.

	David




* RE: rawip: delayed and mis-sequenced transmits
  2022-07-07  1:54 ` Jakub Kicinski
  2022-07-07  8:02   ` David Laight
@ 2022-07-07  9:34   ` David Laight
  1 sibling, 0 replies; 4+ messages in thread
From: David Laight @ 2022-07-07  9:34 UTC (permalink / raw)
  To: 'Jakub Kicinski'
  Cc: netdev, Vladimir Oltean, 'linyunsheng@huawei.com'

From: Jakub Kicinski
> Sent: 07 July 2022 02:54
> 
> On Wed, 6 Jul 2022 15:54:18 +0000 David Laight wrote:
> > Anyone any ideas before I start digging through the kernel code?
> 
> If the qdisc is pfifo_fast and the kernel is old there could be races.
> But I don't think that's likely given you probably run something
> recent, and the next packet tx would usually flush the stuck packet.
> In any case - switching qdisc could be a useful test; also bpftrace
> is your friend for catching packets with long sojourn times.

Reading the sources I think I've found something:
In net/core/dev.c, at line 3818, there is:

static inline int __dev_xmit_skb(struct sk_buff *skb, struct Qdisc *q,
				 struct net_device *dev,
				 struct netdev_queue *txq)
{
	spinlock_t *root_lock = qdisc_lock(q);
	struct sk_buff *to_free = NULL;
	bool contended;
	int rc;

	qdisc_calculate_pkt_len(skb, q);

	if (q->flags & TCQ_F_NOLOCK) {
		if (q->flags & TCQ_F_CAN_BYPASS && nolock_qdisc_is_empty(q) &&
		    qdisc_run_begin(q)) {
			/* Retest nolock_qdisc_is_empty() within the protection
			 * of q->seqlock to protect from racing with requeuing.
			 */
			if (unlikely(!nolock_qdisc_is_empty(q))) {
				rc = dev_qdisc_enqueue(skb, q, &to_free, txq);
				__qdisc_run(q);
				qdisc_run_end(q);

				goto no_lock_out;
			}

I think I'm getting into the code below with a packet already queued.
Unlike the code above, this sends the current packet before the
queued one - which is exactly what I'm seeing.
That must mean the global flags are out of sync with
the per-cpu flags, and a transmit from the cpu that queued
the packet is needed to unblock things.

This seems to have been added by commit c4fef01ba4793

			qdisc_bstats_cpu_update(q, skb);
			if (sch_direct_xmit(skb, q, dev, txq, NULL, true) &&
			    !nolock_qdisc_is_empty(q))
				__qdisc_run(q);

			qdisc_run_end(q);
			return NET_XMIT_SUCCESS;
		}

	David



