netdev.vger.kernel.org archive mirror
From: Andrew Lunn <andrew@lunn.ch>
To: "Rafał Miłecki" <zajec5@gmail.com>
Cc: Felix Fietkau <nbd@nbd.name>, Arnd Bergmann <arnd@arndb.de>,
	Alexander Lobakin <alexandr.lobakin@intel.com>,
	Network Development <netdev@vger.kernel.org>,
	linux-arm-kernel <linux-arm-kernel@lists.infradead.org>,
	Russell King <linux@armlinux.org.uk>,
	"openwrt-devel@lists.openwrt.org"
	<openwrt-devel@lists.openwrt.org>,
	Florian Fainelli <f.fainelli@gmail.com>
Subject: Re: Optimizing kernel compilation / alignments for network performance
Date: Fri, 6 May 2022 14:42:49 +0200
Message-ID: <YnUXyQbLRn4BmJYr@lunn.ch>
In-Reply-To: <04fa6560-e6f4-005f-cddb-7bc9b4859ba2@gmail.com>

> > I just took a quick look at the driver. It allocates and maps rx buffers that can cover a packet size of BGMAC_RX_MAX_FRAME_SIZE = 9724.
> > This seems rather excessive, especially since most people are going to use a MTU of 1500.
> > My proposal would be to add support for making rx buffer size dependent on MTU, reallocating the ring on MTU changes.
> > This should significantly reduce the time spent on flushing caches.
> 
> Oh, that's important too, it was changed by commit 8c7da63978f1 ("bgmac:
> configure MTU and add support for frames beyond 8192 byte size"):
> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=8c7da63978f1672eb4037bbca6e7eac73f908f03
> 
> It lowered NAT speed with bgmac by 60% (362 Mbps → 140 Mbps).
> 
> I do all my testing with
> #define BGMAC_RX_MAX_FRAME_SIZE			1536

That helps show that cache operations are part of your bottleneck.
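
As a rough illustration of why the buffer size matters (a sketch, not
driver code): the cache maintenance done per rx buffer scales with the
number of cache lines the buffer covers. The 32-byte line size below is
an assumption and should be checked against the actual SoC; the helper
name is hypothetical, and the small header offsets the driver adds to
each buffer are ignored.

```c
/* Assumed cache line size in bytes; check the real value for the SoC. */
#define ASSUMED_CACHE_LINE 32u

/* Number of cache lines a buffer of 'bytes' bytes covers, rounded up. */
unsigned int cache_lines_for(unsigned int bytes)
{
	return (bytes + ASSUMED_CACHE_LINE - 1) / ASSUMED_CACHE_LINE;
}
```

Under that assumption a 9724 byte buffer covers 304 lines and a 1536
byte buffer covers 48, so roughly six times fewer lines per packet.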

Taking a quick look at the driver, on the receive side:

                        /* Unmap buffer to make it accessible to the CPU */
                        dma_unmap_single(dma_dev, dma_addr,
                                         BGMAC_RX_BUF_SIZE, DMA_FROM_DEVICE);

Here the data is made ready for the CPU to use.

                        /* Get info from the header */
                        len = le16_to_cpu(rx->len);
                        flags = le16_to_cpu(rx->flags);

                        /* Check for poison and drop or pass the packet */
                        if (len == 0xdead && flags == 0xbeef) {
                                netdev_err(bgmac->net_dev, "Found poisoned packet at slot %d, DMA issue!\n",
                                           ring->start);
                                put_page(virt_to_head_page(buf));
                                bgmac->net_dev->stats.rx_errors++;
                                break;
                        }

                        if (len > BGMAC_RX_ALLOC_SIZE) {
                                netdev_err(bgmac->net_dev, "Found oversized packet at slot %d, DMA issue!\n",
                                           ring->start);
                                put_page(virt_to_head_page(buf));
                                bgmac->net_dev->stats.rx_length_errors++;
                                bgmac->net_dev->stats.rx_errors++;
                                break;
                        }

                        /* Omit CRC. */
                        len -= ETH_FCS_LEN;

                        skb = build_skb(buf, BGMAC_RX_ALLOC_SIZE);
                        if (unlikely(!skb)) {
                                netdev_err(bgmac->net_dev, "build_skb failed\n");
                                put_page(virt_to_head_page(buf));
                                bgmac->net_dev->stats.rx_errors++;
                                break;
                        }
                        skb_put(skb, BGMAC_RX_FRAME_OFFSET +
                                BGMAC_RX_BUF_OFFSET + len);
                        skb_pull(skb, BGMAC_RX_FRAME_OFFSET +
                                 BGMAC_RX_BUF_OFFSET);

                        skb_checksum_none_assert(skb);
                        skb->protocol = eth_type_trans(skb, bgmac->net_dev);

and this is the first access of the actual data. You can make the
cache work for you, rather than against you, by adding a call to

	prefetch(buf);

just after the dma_unmap_single(). That will start getting the frame
header from DRAM into cache, so hopefully it is available by the time
eth_type_trans() is called and you don't have a cache miss.
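
The ordering described above can be sketched in userspace as follows.
This is only an illustration under stated assumptions, not the bgmac
code: rx_first_byte() and its parameters are hypothetical, and
prefetch() is defined here as the GCC/Clang builtin
__builtin_prefetch() to stand in for the kernel helper. A prefetch is
only a hint, so whether it helps has to be measured.

```c
#include <stddef.h>
#include <stdint.h>

/* Userspace stand-in for the kernel's prefetch(); GCC/Clang builtin. */
#define prefetch(p) __builtin_prefetch(p)

/*
 * Hypothetical receive step: issue the prefetch first, do the header
 * checks while the fetch is in flight, then touch the frame data.
 * Returns the first frame byte, or -1 if a check fails.
 */
int rx_first_byte(const uint8_t *buf, size_t buf_size, size_t frame_off)
{
	uint16_t len;

	/* Hint only: begin fetching the start of the buffer into cache. */
	prefetch(buf);

	if (buf_size < 2 || frame_off >= buf_size)
		return -1;

	/* Little-endian 16-bit length at the start of the rx header. */
	len = (uint16_t)(buf[0] | (buf[1] << 8));
	if (len == 0 || len > buf_size - frame_off)
		return -1;

	/* First access to the frame data itself. */
	return buf[frame_off];
}
```

The point is only the placement: the prefetch is issued as early as
possible, so the length and poison checks overlap with the memory
access instead of waiting behind it.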

	Andrew
