All of lore.kernel.org
 help / color / mirror / Atom feed
* r8169 tx batching(?) causing performance problems
@ 2018-10-03 10:27 David Howells
  2018-10-03 15:23 ` David Howells
                   ` (2 more replies)
  0 siblings, 3 replies; 5+ messages in thread
From: David Howells @ 2018-10-03 10:27 UTC (permalink / raw)
  To: Realtek linux nic maintainers; +Cc: dhowells, netdev

Hi,

Can someone help me figure out a performance issue that seems to be caused by
an RTL8168g/8111g NIC that seems to be batching up transmissions - or, at
least, not starting immediately that it's given something to transmit?

The setup that I'm dealing with is an AFS filesystem client (the test machine
with the troublesome NIC) and an AFS server machine connected by GigE through
a switch by 2m cables.  The network is pretty much uncontended and the server
is only serving files to the test machine.

AFS uses the RxRPC protocol which runs over UDP, one RxRPC packet per UDP
packet.  RxRPC has two packet types that are of relevance to this issue: DATA
and ACK.  Linux contains an implementation of RxRPC as the AF_RXRPC socket
family and an AFS filesystem that can be found in fs/afs/.

The symptoms I'm seeing are that whilst the client is doing a multi-megabyte
direct-I/O reads over AFS, the server is occasionally stalling in its sending
of DATA packets because the ACK packets being sent by the client are appearing
in batches on the server.

I stuck some tracepoints into the r8169 driver to investigate the issue (see
attached patch).  Here's an excerpt from the trace (note that I didn't enable
the rx tracepoint):

 dd-3179 [000] .N.3    37.095116: net_rtl8169_tx: enp3s0 p=178-179 skb=ba987d36
 nd-2961 [001] d.h1    37.095250: net_rtl8169_interrupt: enp3s0 st=81
 nd-2961 [001] ..s1    37.095253: net_rtl8169_poll: enp3s0 st=81
 dd-3179 [000] ...3    37.095286: net_rtl8169_tx: enp3s0 p=178-180 skb=44d352cf
 dd-3179 [000] .N.3    37.095307: net_rtl8169_tx: enp3s0 p=178-181 skb=53ac88f1
 dd-3179 [000] .N.3    37.095315: net_rtl8169_tx: enp3s0 p=178-182 skb=4f3eedd9
 dd-3179 [000] .N.3    37.095328: net_rtl8169_tx: enp3s0 p=178-183 skb=38c81784
 dd-3179 [000] .N.3    37.095338: net_rtl8169_tx: enp3s0 p=178-184 skb=1f0a8fc3
 dd-3179 [000] .N.3    37.095345: net_rtl8169_tx: enp3s0 p=178-185 skb=281a484d
 dd-3179 [000] .N.3    37.095362: net_rtl8169_tx: enp3s0 p=178-186 skb=13d8a01a
 dd-3179 [000] .N.3    37.095370: net_rtl8169_tx: enp3s0 p=178-187 skb=68fe3d70
 dd-3179 [000] .N.3    37.095382: net_rtl8169_tx: enp3s0 p=178-188 skb=3cf64dd8
 dd-3179 [000] .N.3    37.095390: net_rtl8169_tx: enp3s0 p=178-189 skb=da35591a
 dd-3179 [000] .N.3    37.095403: net_rtl8169_tx: enp3s0 p=178-190 skb=3013974a
 dd-3179 [000] .N.3    37.095410: net_rtl8169_tx: enp3s0 p=178-191 skb=7ecddd8e
 dd-3179 [000] .N.3    37.095427: net_rtl8169_tx: enp3s0 p=178-192 skb=aa30c686
 nd-2961 [001] d.h1    37.095433: net_rtl8169_interrupt: enp3s0 st=85
 nd-2961 [001] ..s1    37.095434: net_rtl8169_poll: enp3s0 st=85
 dd-3179 [000] .N.3    37.095435: net_rtl8169_tx: enp3s0 p=178-193 skb=006b0947
 nd-2961 [001] ..s1    37.095439: net_rtl8169_tx_done: enp3s0 p=178 skb=ba987d36
 nd-2961 [001] ..s1    37.095440: net_rtl8169_tx_done: enp3s0 p=179 skb=44d352cf
 nd-2961 [001] ..s1    37.095441: net_rtl8169_tx_done: enp3s0 p=180 skb=53ac88f1
 nd-2961 [001] ..s1    37.095441: net_rtl8169_tx_done: enp3s0 p=181 skb=4f3eedd9
 nd-2961 [001] ..s1    37.095442: net_rtl8169_tx_done: enp3s0 p=182 skb=38c81784
 nd-2961 [001] ..s1    37.095443: net_rtl8169_tx_done: enp3s0 p=183 skb=1f0a8fc3
 nd-2961 [001] ..s1    37.095443: net_rtl8169_tx_done: enp3s0 p=184 skb=281a484d
 nd-2961 [001] ..s1    37.095444: net_rtl8169_tx_done: enp3s0 p=185 skb=13d8a01a
 nd-2961 [001] ..s1    37.095445: net_rtl8169_tx_done: enp3s0 p=186 skb=68fe3d70
 nd-2961 [001] ..s1    37.095445: net_rtl8169_tx_done: enp3s0 p=187 skb=3cf64dd8
 nd-2961 [001] ..s1    37.095446: net_rtl8169_tx_done: enp3s0 p=188 skb=da35591a
 nd-2961 [001] ..s1    37.095446: net_rtl8169_tx_done: enp3s0 p=189 skb=3013974a
 nd-2961 [001] ..s1    37.095447: net_rtl8169_tx_done: enp3s0 p=190 skb=7ecddd8e

As can be seen, there are a number of packets being transmitted by AF_RXRPC
from the dd command.  Each of these is an ACK packet.  ACK packets are around
850-900 bits in size on the wire according to wireshark.

The first Tx interrupt (status 0x85: TxOk is 0x04) occurs 317uS after the
packet 178 is added to an empty buffer and is deleted 6uS after that.  Packet
190 is deleted 37uS after being added.

What surprises me is that it would seem that an ACK packet should take about
~1uS to transmit and the average interval between ACK packets being added to
the queue is ~12uS, but the ring is not obviously making progress in being
transmitted.

Now, I understand that the dirty_tx and cur_tx ring indices are private to the
driver and not seen by the device, but I added an extra bit of code to count
up the number of descriptors with DescOwn still set, and it always appears to
match the number of packets in the buffer.

So I'm guessing that the NIC does one of a number of things:

 (1) It delays starting transmission until it's got a large enough batch.

 (2) It delays starting transmission until some time has passed since being
     poked.

 (3) It takes time to get the transmitter going for some reason.

 (4) It's batching the update of the Tx ring descriptors and does this right
     before interrupting the CPU.

 (5) Reading/writing the Tx descriptor ring from the device is slow.

But again note that tshark on the server shows the ACKs arriving as a batch -
I've also instrumented the NIC driver for that machine and observed the NIC
apparently receiving ACK packets in batches.

Also, what does the TxDescUnavail status flag signify?  The NIC is setting it,
but nothing in the driver takes any notice of it or clears it.

Thanks,
David
---
commit 4ce09fa309d8fb4b1b910091607c2290c45541f1
Author: David Howells <dhowells@redhat.com>
Date:   Tue Oct 2 21:14:04 2018 +0100

    Add tracepoints for Realtek r8169

diff --git a/drivers/net/ethernet/realtek/r8169.c b/drivers/net/ethernet/realtek/r8169.c
index ab30aaeac6d3..8d3a86a674b7 100644
--- a/drivers/net/ethernet/realtek/r8169.c
+++ b/drivers/net/ethernet/realtek/r8169.c
@@ -31,6 +31,9 @@
 #include <linux/ipv6.h>
 #include <net/ip6_checksum.h>
 
+#define CREATE_TRACE_POINTS
+#include <trace/events/net_rtl8169.h>
+
 #define MODULENAME "r8169"
 
 #define FIRMWARE_8168D_1	"rtl_nic/rtl8168d-1.fw"
@@ -6197,6 +6200,7 @@ static netdev_tx_t rtl8169_start_xmit(struct sk_buff *skb,
 	RTL_W8(tp, TxPoll, NPQ);
 
 	mmiowb();
+	trace_net_rtl8169_tx(tp->dev, tp->dirty_tx, tp->cur_tx, skb);
 
 	if (!TX_FRAGS_READY_FOR(tp, MAX_SKB_FRAGS)) {
 		/* Avoid wrongly optimistic queue wake-up: rtl_tx thread must
@@ -6302,6 +6306,7 @@ static void rtl_tx(struct net_device *dev, struct rtl8169_private *tp)
 		rtl8169_unmap_tx_skb(tp_to_dev(tp), tx_skb,
 				     tp->TxDescArray + entry);
 		if (status & LastFrag) {
+			trace_net_rtl8169_tx_done(dev, dirty_tx, tx_skb->skb);
 			u64_stats_update_begin(&tp->tx_stats.syncp);
 			tp->tx_stats.packets++;
 			tp->tx_stats.bytes += tx_skb->skb->len;
@@ -6453,6 +6458,7 @@ static int rtl_rx(struct net_device *dev, struct rtl8169_private *tp, u32 budget
 				dev->stats.multicast++;
 
 			napi_gro_receive(&tp->napi, skb);
+			trace_net_rtl8169_napi_rx(dev, skb);
 
 			u64_stats_update_begin(&tp->rx_stats.syncp);
 			tp->rx_stats.packets++;
@@ -6466,7 +6472,6 @@ static int rtl_rx(struct net_device *dev, struct rtl8169_private *tp, u32 budget
 
 	count = cur_rx - tp->cur_rx;
 	tp->cur_rx = cur_rx;
-
 	return count;
 }
 
@@ -6478,6 +6483,8 @@ static irqreturn_t rtl8169_interrupt(int irq, void *dev_instance)
 	if (status == 0xffff || !(status & (RTL_EVENT_NAPI | tp->event_slow)))
 		return IRQ_NONE;
 
+	trace_net_rtl8169_interrupt(tp->napi.dev, status);
+
 	rtl_irq_disable(tp);
 	napi_schedule_irqoff(&tp->napi);
 
@@ -6559,6 +6566,7 @@ static int rtl8169_poll(struct napi_struct *napi, int budget)
 
 	status = rtl_get_events(tp);
 	rtl_ack_events(tp, status & ~tp->event_slow);
+	trace_net_rtl8169_poll(dev, status);
 
 	if (status & RTL_EVENT_NAPI_RX)
 		work_done = rtl_rx(dev, tp, (u32) budget);
diff --git a/include/trace/events/net_rtl8169.h b/include/trace/events/net_rtl8169.h
new file mode 100644
index 000000000000..92049078b3fa
--- /dev/null
+++ b/include/trace/events/net_rtl8169.h
@@ -0,0 +1,125 @@
+/* Realtek RTL8169 tracepoints
+ *
+ * Copyright (C) 2018 Red Hat, Inc. All Rights Reserved.
+ * Written by David Howells (dhowells@redhat.com)
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public Licence
+ * as published by the Free Software Foundation; either version
+ * 2 of the Licence, or (at your option) any later version.
+ */
+#undef TRACE_SYSTEM
+#define TRACE_SYSTEM net_rtl8169
+
+#if !defined(_TRACE_NET_RTL8169_H) || defined(TRACE_HEADER_MULTI_READ)
+#define _TRACE_NET_RTL8169_H
+
+#include <linux/tracepoint.h>
+#include <linux/errqueue.h>
+
+
+TRACE_EVENT(net_rtl8169_interrupt,
+	    TP_PROTO(struct net_device *netdev, u16 status),
+
+	    TP_ARGS(netdev, status),
+
+	    TP_STRUCT__entry(
+		    __field(u16,			status		)
+		    __array(char,			name, IFNAMSIZ	)
+			     ),
+
+	    TP_fast_assign(
+		    __entry->status = status;
+		    memcpy(__entry->name, netdev->name, IFNAMSIZ);
+			   ),
+
+	    TP_printk("%s st=%x", __entry->name, __entry->status)
+	    );
+
+TRACE_EVENT(net_rtl8169_poll,
+	    TP_PROTO(struct net_device *netdev, u16 status),
+
+	    TP_ARGS(netdev, status),
+
+	    TP_STRUCT__entry(
+		    __field(u16,			status		)
+		    __array(char,			name, IFNAMSIZ	)
+			     ),
+
+	    TP_fast_assign(
+		    __entry->status = status;
+		    memcpy(__entry->name, netdev->name, IFNAMSIZ);
+			   ),
+
+	    TP_printk("%s st=%x", __entry->name, __entry->status)
+	    );
+
+TRACE_EVENT(net_rtl8169_napi_rx,
+	    TP_PROTO(struct net_device *netdev, struct sk_buff *skb),
+
+	    TP_ARGS(netdev, skb),
+
+	    TP_STRUCT__entry(
+		    __field(struct sk_buff *,		skb		)
+		    __array(char,			name, IFNAMSIZ	)
+			     ),
+
+	    TP_fast_assign(
+		    __entry->skb = skb;
+		    memcpy(__entry->name, netdev->name, IFNAMSIZ);
+			   ),
+
+	    TP_printk("%s skb=%p", __entry->name, __entry->skb)
+	    );
+
+TRACE_EVENT(net_rtl8169_tx_done,
+	    TP_PROTO(struct net_device *netdev, unsigned int dirty_tx,
+		     struct sk_buff *skb),
+
+	    TP_ARGS(netdev, dirty_tx, skb),
+
+	    TP_STRUCT__entry(
+		    __field(struct sk_buff *,		skb)
+		    __field(unsigned int,		dirty_tx)
+		    __array(char,			name, IFNAMSIZ	)
+			     ),
+
+	    TP_fast_assign(
+		    __entry->skb = skb;
+		    __entry->dirty_tx = dirty_tx;
+		    memcpy(__entry->name, netdev->name, IFNAMSIZ);
+			   ),
+
+	    TP_printk("%s p=%u skb=%p",
+		      __entry->name, __entry->dirty_tx, __entry->skb)
+	    );
+
+TRACE_EVENT(net_rtl8169_tx,
+	    TP_PROTO(struct net_device *netdev, unsigned int dirty_tx,
+		     unsigned int cur_tx, struct sk_buff *skb),
+
+	    TP_ARGS(netdev, dirty_tx, cur_tx, skb),
+
+	    TP_STRUCT__entry(
+		    __field(struct sk_buff *,		skb		)
+		    __field(unsigned int,		dirty_tx	)
+		    __field(unsigned int,		cur_tx		)
+		    __array(char,			name, IFNAMSIZ	)
+			     ),
+
+	    TP_fast_assign(
+		    __entry->skb = skb;
+		    __entry->dirty_tx = dirty_tx;
+		    __entry->cur_tx = cur_tx;
+		    memcpy(__entry->name, netdev->name, IFNAMSIZ);
+			   ),
+
+	    TP_printk("%s p=%u-%u skb=%p",
+		      __entry->name, __entry->dirty_tx, __entry->cur_tx,
+		      __entry->skb)
+	    );
+
+#endif /* _TRACE_NET_RTL8169_H */
+
+/* This part must be outside protection */
+#include <trace/define_trace.h>

^ permalink raw reply related	[flat|nested] 5+ messages in thread

* Re: r8169 tx batching(?) causing performance problems
  2018-10-03 10:27 r8169 tx batching(?) causing performance problems David Howells
@ 2018-10-03 15:23 ` David Howells
  2018-10-03 16:31 ` David Miller
  2018-10-03 20:19 ` David Howells
  2 siblings, 0 replies; 5+ messages in thread
From: David Howells @ 2018-10-03 15:23 UTC (permalink / raw)
  To: Realtek linux nic maintainers; +Cc: dhowells, netdev, hkallweit1

David Howells <dhowells@redhat.com> wrote:

> Can someone help me figure out a performance issue that seems to be caused by
> an RTL8168g/8111g NIC that seems to be batching up transmissions - or, at
> least, not starting immediately that it's given something to transmit?

I've been told that:

	commit ad5f97faff4231e72b96bd96adbe1b6e977a9b86
	Author: Heiner Kallweit <hkallweit1@gmail.com>
	Date:   Fri Sep 28 23:51:54 2018 +0200
	r8169: fix network stalls due to missing bit TXCFG_AUTO_FIFO

might fix the problem, however it doesn't seem to help entirely.  Whilst it
does now seem to be transmitting whilst I'm generating up the queue, it still
seems that I'm able to load up the queue faster the packets are being cleared
from the queue.

So in the following excerpt:

id-0    [001] d.h2    41.284702: net_rtl8169_interrupt: enp3s0 st=85
id-0    [001] ..s2    41.284715: net_rtl8169_poll: enp3s0 st=85
dd-3186 [003] ...3    41.284741: net_rtl8169_tx: enp3s0 p=213-216 skb=2e2b0c3e
dd-3186 [003] ...3    41.284777: net_rtl8169_tx: enp3s0 p=213-217 skb=3950deca
dd-3186 [003] ...3    41.284790: net_rtl8169_tx: enp3s0 p=213-218 skb=471b2bc2
dd-3186 [003] ...3    41.284826: net_rtl8169_tx: enp3s0 p=213-219 skb=7c25ae16
dd-3186 [003] ...3    41.284839: net_rtl8169_tx: enp3s0 p=213-220 skb=cfbf719f
dd-3186 [003] ...3    41.284870: net_rtl8169_tx: enp3s0 p=213-221 skb=d34a1f67
dd-3186 [003] ...3    41.284883: net_rtl8169_tx: enp3s0 p=213-222 skb=466e20e8
dd-3186 [003] ...3    41.284914: net_rtl8169_tx: enp3s0 p=213-223 skb=3d36cb1c
dd-3186 [003] ...3    41.284927: net_rtl8169_tx: enp3s0 p=213-224 skb=399c06ea
id-0    [001] ..s2    41.284938: net_rtl8169_tx_done: enp3s0 p=213 skb=c797fea6
id-0    [001] ..s2    41.284939: net_rtl8169_tx_done: enp3s0 p=214 skb=c0e4d6f0
id-0    [001] ..s2    41.284940: net_rtl8169_tx_done: enp3s0 p=215 skb=2e2b0c3e
id-0    [001] ..s2    41.284941: net_rtl8169_tx_done: enp3s0 p=216 skb=3950deca
id-0    [001] ..s2    41.284941: net_rtl8169_tx_done: enp3s0 p=217 skb=471b2bc2
id-0    [001] ..s2    41.284942: net_rtl8169_tx_done: enp3s0 p=218 skb=7c25ae16
id-0    [001] ..s2    41.284943: net_rtl8169_tx_done: enp3s0 p=219 skb=cfbf719f
id-0    [001] ..s2    41.284944: net_rtl8169_tx_done: enp3s0 p=220 skb=d34a1f67
id-0    [001] ..s2    41.284945: net_rtl8169_tx_done: enp3s0 p=221 skb=466e20e8
id-0    [001] ..s2    41.284946: net_rtl8169_tx_done: enp3s0 p=222 skb=3d36cb1c
id-0    [001] ..s2    41.284947: net_rtl8169_tx_done: enp3s0 p=223 skb=399c06ea
id-0    [001] d.h2    41.284954: net_rtl8169_interrupt: enp3s0 st=85

packets are being queued something like 11-13uS apart, but there seems like a
big gap of about 200uS in the idle thread between the poll and the first
tx_done that might be masking things.

David

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: r8169 tx batching(?) causing performance problems
  2018-10-03 10:27 r8169 tx batching(?) causing performance problems David Howells
  2018-10-03 15:23 ` David Howells
@ 2018-10-03 16:31 ` David Miller
  2018-10-03 20:19 ` David Howells
  2 siblings, 0 replies; 5+ messages in thread
From: David Miller @ 2018-10-03 16:31 UTC (permalink / raw)
  To: dhowells; +Cc: nic_swsd, netdev


Probably you are seeing some interrupt mitigation.

It seems there is a difference in how the interrupt mitigation is
programmed on for 8168 chips vs. others by default.  Most get
all zeros in the IntrMitigate register, whilst for 8168 chips
a value of 0x5151 is programmed.

You can play with ethtool to mess with the coalescing settings
to see if this is part of the problem.

I bet this might explain the behavior you see after including
even Heiner's TXCFG_AUTO_FIFO patch.

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: r8169 tx batching(?) causing performance problems
  2018-10-03 10:27 r8169 tx batching(?) causing performance problems David Howells
  2018-10-03 15:23 ` David Howells
  2018-10-03 16:31 ` David Miller
@ 2018-10-03 20:19 ` David Howells
  2018-10-03 21:43   ` David Miller
  2 siblings, 1 reply; 5+ messages in thread
From: David Howells @ 2018-10-03 20:19 UTC (permalink / raw)
  To: David Miller; +Cc: dhowells, nic_swsd, netdev

David Miller <davem@davemloft.net> wrote:

> Probably you are seeing some interrupt mitigation.
> 
> It seems there is a difference in how the interrupt mitigation is
> programmed on for 8168 chips vs. others by default.  Most get
> all zeros in the IntrMitigate register, whilst for 8168 chips
> a value of 0x5151 is programmed.

I'm not sure what that means.  I can't seem to find a programmer's manual for
the chip.

> You can play with ethtool to mess with the coalescing settings
> to see if this is part of the problem.

These bits from "ethtool -c enp3s0"?

	rx-usecs: 200
	rx-frames: 4
	rx-usecs-irq: 0
	rx-frames-irq: 0

	tx-usecs: 200
	tx-frames: 4
	tx-usecs-irq: 0
	tx-frames-irq: 0

> I bet this might explain the behavior you see after including
> even Heiner's TXCFG_AUTO_FIFO patch.

Thanks,
David

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: r8169 tx batching(?) causing performance problems
  2018-10-03 20:19 ` David Howells
@ 2018-10-03 21:43   ` David Miller
  0 siblings, 0 replies; 5+ messages in thread
From: David Miller @ 2018-10-03 21:43 UTC (permalink / raw)
  To: dhowells; +Cc: nic_swsd, netdev

From: David Howells <dhowells@redhat.com>
Date: Wed, 03 Oct 2018 21:19:40 +0100

> David Miller <davem@davemloft.net> wrote:
> 
>> Probably you are seeing some interrupt mitigation.
>> 
>> It seems there is a difference in how the interrupt mitigation is
>> programmed on for 8168 chips vs. others by default.  Most get
>> all zeros in the IntrMitigate register, whilst for 8168 chips
>> a value of 0x5151 is programmed.
> 
> I'm not sure what that means.  I can't seem to find a programmer's manual for
> the chip.

There is a comment which documents what might be the register layout
elsewhere in the driver:

	/*
	 * Undocumented corner. Supposedly:
	 * (TxTimer << 12) | (TxPackets << 8) | (RxTimer << 4) | RxPackets
	 */
	RTL_W16(tp, IntrMitigate, 0x0000);

^ permalink raw reply	[flat|nested] 5+ messages in thread

end of thread, other threads:[~2018-10-04  4:33 UTC | newest]

Thread overview: 5+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2018-10-03 10:27 r8169 tx batching(?) causing performance problems David Howells
2018-10-03 15:23 ` David Howells
2018-10-03 16:31 ` David Miller
2018-10-03 20:19 ` David Howells
2018-10-03 21:43   ` David Miller

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.