All of lore.kernel.org
 help / color / mirror / Atom feed
From: Sven Van Asbroeck <thesven73@gmail.com>
To: Bryan Whitehead <bryan.whitehead@microchip.com>,
	UNGLinuxDriver@microchip.com,
	David S Miller <davem@davemloft.net>,
	Jakub Kicinski <kuba@kernel.org>
Cc: "Sven Van Asbroeck" <thesven73@gmail.com>,
	"Andrew Lunn" <andrew@lunn.ch>,
	"Alexey Denisov" <rtgbnm@gmail.com>,
	"Sergej Bauer" <sbauer@blackbox.su>,
	"Tim Harvey" <tharvey@gateworks.com>,
	"Anders Rønningen" <anders@ronningen.priv.no>,
	netdev@vger.kernel.org, linux-kernel@vger.kernel.org
Subject: [PATCH net-next v1 1/6] lan743x: boost performance on cpu archs w/o dma cache snooping
Date: Fri, 29 Jan 2021 14:52:35 -0500	[thread overview]
Message-ID: <20210129195240.31871-2-TheSven73@gmail.com> (raw)
In-Reply-To: <20210129195240.31871-1-TheSven73@gmail.com>

From: Sven Van Asbroeck <thesven73@gmail.com>

The buffers in the lan743x driver's receive ring are always 9K,
even when the largest packet that can be received (the mtu) is
much smaller. This performs particularly badly on cpu archs
without dma cache snooping (such as ARM): each received packet
results in a 9K dma_{map|unmap} operation, which is very expensive
because cpu caches need to be invalidated.

Careful measurement of the driver rx path on armv7 reveals that
the cpu spends the majority of its time waiting for cache
invalidation.

Optimize as follows:

1. set rx ring buffer size equal to the mtu. this limits the
   amount of cache that needs to be invalidated per dma_map().

2. when dma_unmap()ping, skip cpu sync. Sync only the packet data
   actually received, the size of which the chip will indicate in
   its rx ring descriptors. this limits the amount of cache that
   needs to be invalidated per dma_unmap().

These optimizations double the rx performance on armv7.
Third parties report 3x rx speedup on armv8.

Performance on dma cache snooping architectures (such as x86)
is expected to stay the same.

Tested with iperf3 on a freescale imx6qp + lan7430, both sides
set to mtu 1500 bytes, measure rx performance:

Before:
[ ID] Interval           Transfer     Bandwidth       Retr
[  4]   0.00-20.00  sec   550 MBytes   231 Mbits/sec    0
After:
[ ID] Interval           Transfer     Bandwidth       Retr
[  4]   0.00-20.00  sec  1.33 GBytes   570 Mbits/sec    0

Test by Anders Roenningen (anders@ronningen.priv.no) on armv8,
    rx iperf3:
Before 102 Mbits/sec
After  279 Mbits/sec

Signed-off-by: Sven Van Asbroeck <thesven73@gmail.com>
---

Tree: git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next.git # 46eb3c108fe1

To: Bryan Whitehead <bryan.whitehead@microchip.com>
To: UNGLinuxDriver@microchip.com
To: "David S. Miller" <davem@davemloft.net>
To: Jakub Kicinski <kuba@kernel.org>
Cc: Andrew Lunn <andrew@lunn.ch>
Cc: Alexey Denisov <rtgbnm@gmail.com>
Cc: Sergej Bauer <sbauer@blackbox.su>
Cc: Tim Harvey <tharvey@gateworks.com>
Cc: Anders Rønningen <anders@ronningen.priv.no>
Cc: netdev@vger.kernel.org
Cc: linux-kernel@vger.kernel.org (open list)

 drivers/net/ethernet/microchip/lan743x_main.c | 35 ++++++++++++-------
 1 file changed, 23 insertions(+), 12 deletions(-)

diff --git a/drivers/net/ethernet/microchip/lan743x_main.c b/drivers/net/ethernet/microchip/lan743x_main.c
index f1f6eba4ace4..f485320e5784 100644
--- a/drivers/net/ethernet/microchip/lan743x_main.c
+++ b/drivers/net/ethernet/microchip/lan743x_main.c
@@ -1957,11 +1957,11 @@ static int lan743x_rx_next_index(struct lan743x_rx *rx, int index)
 
 static struct sk_buff *lan743x_rx_allocate_skb(struct lan743x_rx *rx)
 {
-	int length = 0;
+	struct net_device *netdev = rx->adapter->netdev;
 
-	length = (LAN743X_MAX_FRAME_SIZE + ETH_HLEN + 4 + RX_HEAD_PADDING);
-	return __netdev_alloc_skb(rx->adapter->netdev,
-				  length, GFP_ATOMIC | GFP_DMA);
+	return __netdev_alloc_skb(netdev,
+				  netdev->mtu + ETH_HLEN + 4 + RX_HEAD_PADDING,
+				  GFP_ATOMIC | GFP_DMA);
 }
 
 static void lan743x_rx_update_tail(struct lan743x_rx *rx, int index)
@@ -1977,9 +1977,10 @@ static int lan743x_rx_init_ring_element(struct lan743x_rx *rx, int index,
 {
 	struct lan743x_rx_buffer_info *buffer_info;
 	struct lan743x_rx_descriptor *descriptor;
-	int length = 0;
+	struct net_device *netdev = rx->adapter->netdev;
+	int length;
 
-	length = (LAN743X_MAX_FRAME_SIZE + ETH_HLEN + 4 + RX_HEAD_PADDING);
+	length = netdev->mtu + ETH_HLEN + 4 + RX_HEAD_PADDING;
 	descriptor = &rx->ring_cpu_ptr[index];
 	buffer_info = &rx->buffer_info[index];
 	buffer_info->skb = skb;
@@ -2148,11 +2149,18 @@ static int lan743x_rx_process_packet(struct lan743x_rx *rx)
 			descriptor = &rx->ring_cpu_ptr[first_index];
 
 			/* unmap from dma */
+			packet_length =	RX_DESC_DATA0_FRAME_LENGTH_GET_
+					(descriptor->data0);
 			if (buffer_info->dma_ptr) {
-				dma_unmap_single(&rx->adapter->pdev->dev,
-						 buffer_info->dma_ptr,
-						 buffer_info->buffer_length,
-						 DMA_FROM_DEVICE);
+				dma_sync_single_for_cpu(&rx->adapter->pdev->dev,
+							buffer_info->dma_ptr,
+							packet_length,
+							DMA_FROM_DEVICE);
+				dma_unmap_single_attrs(&rx->adapter->pdev->dev,
+						       buffer_info->dma_ptr,
+						       buffer_info->buffer_length,
+						       DMA_FROM_DEVICE,
+						       DMA_ATTR_SKIP_CPU_SYNC);
 				buffer_info->dma_ptr = 0;
 				buffer_info->buffer_length = 0;
 			}
@@ -2167,8 +2175,8 @@ static int lan743x_rx_process_packet(struct lan743x_rx *rx)
 			int index = first_index;
 
 			/* multi buffer packet not supported */
-			/* this should not happen since
-			 * buffers are allocated to be at least jumbo size
+			/* this should not happen since buffers are allocated
+			 * to be at least the mtu size configured in the mac.
 			 */
 
 			/* clean up buffers */
@@ -2628,6 +2636,9 @@ static int lan743x_netdev_change_mtu(struct net_device *netdev, int new_mtu)
 	struct lan743x_adapter *adapter = netdev_priv(netdev);
 	int ret = 0;
 
+	if (netif_running(netdev))
+		return -EBUSY;
+
 	ret = lan743x_mac_set_mtu(adapter, new_mtu);
 	if (!ret)
 		netdev->mtu = new_mtu;
-- 
2.17.1


  reply	other threads:[~2021-01-29 20:11 UTC|newest]

Thread overview: 32+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-01-29 19:52 [PATCH net-next v1 0/6] lan743x speed boost Sven Van Asbroeck
2021-01-29 19:52 ` Sven Van Asbroeck [this message]
2021-01-29 20:36   ` [PATCH net-next v1 1/6] lan743x: boost performance on cpu archs w/o dma cache snooping Andrew Lunn
2021-01-29 22:49     ` Sven Van Asbroeck
2021-01-29 22:01   ` Jakub Kicinski
2021-01-29 22:46     ` Sven Van Asbroeck
2021-01-30 22:10   ` Bryan.Whitehead
2021-01-30 23:59     ` Sven Van Asbroeck
2021-01-31  0:14     ` Sven Van Asbroeck
     [not found]   ` <20210204060210.2362-1-hdanton@sina.com>
2021-02-05  9:31     ` Christoph Hellwig
2021-02-05 14:01       ` Sven Van Asbroeck
2021-02-05 12:44   ` Sergej Bauer
2021-02-05 14:07     ` Sven Van Asbroeck
2021-02-05 15:09       ` Sergej Bauer
2021-02-05 16:39         ` Sven Van Asbroeck
2021-02-05 16:59           ` Sergej Bauer
2021-01-29 19:52 ` [PATCH net-next v1 2/6] lan743x: support rx multi-buffer packets Sven Van Asbroeck
2021-01-29 22:11   ` Willem de Bruijn
2021-01-29 23:02     ` Sven Van Asbroeck
2021-01-29 23:08       ` Willem de Bruijn
2021-01-29 23:10         ` Sven Van Asbroeck
2021-01-31  7:06   ` Bryan.Whitehead
2021-01-31 15:25     ` Sven Van Asbroeck
2021-02-01 18:04       ` Bryan.Whitehead
2021-02-03 18:53         ` Sven Van Asbroeck
2021-02-03 20:14           ` Bryan.Whitehead
2021-02-03 20:25             ` Sven Van Asbroeck
2021-02-03 20:41               ` Bryan.Whitehead
2021-01-29 19:52 ` [PATCH net-next v1 3/6] lan743x: allow mtu change while network interface is up Sven Van Asbroeck
2021-01-29 19:52 ` [PATCH net-next v1 4/6] TEST ONLY: lan743x: limit rx ring buffer size to 500 bytes Sven Van Asbroeck
2021-01-29 19:52 ` [PATCH net-next v1 5/6] TEST ONLY: lan743x: skb_alloc failure test Sven Van Asbroeck
2021-01-29 19:52 ` [PATCH net-next v1 6/6] TEST ONLY: lan743x: skb_trim " Sven Van Asbroeck

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20210129195240.31871-2-TheSven73@gmail.com \
    --to=thesven73@gmail.com \
    --cc=UNGLinuxDriver@microchip.com \
    --cc=anders@ronningen.priv.no \
    --cc=andrew@lunn.ch \
    --cc=bryan.whitehead@microchip.com \
    --cc=davem@davemloft.net \
    --cc=kuba@kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=netdev@vger.kernel.org \
    --cc=rtgbnm@gmail.com \
    --cc=sbauer@blackbox.su \
    --cc=tharvey@gateworks.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.