From: "Rafał Miłecki" <zajec5@gmail.com>
To: Andrew Lunn <andrew@lunn.ch>
Cc: Felix Fietkau <nbd@nbd.name>, Arnd Bergmann <arnd@arndb.de>,
Alexander Lobakin <alexandr.lobakin@intel.com>,
Network Development <netdev@vger.kernel.org>,
linux-arm-kernel <linux-arm-kernel@lists.infradead.org>,
Russell King <linux@armlinux.org.uk>,
"openwrt-devel@lists.openwrt.org"
<openwrt-devel@lists.openwrt.org>,
Florian Fainelli <f.fainelli@gmail.com>
Subject: Re: Optimizing kernel compilation / alignments for network performance
Date: Tue, 10 May 2022 12:29:32 +0200
Message-ID: <391ca2d1-6977-0c9b-588c-31ad9bb68c82@gmail.com>
In-Reply-To: <YnUXyQbLRn4BmJYr@lunn.ch>

On 6.05.2022 14:42, Andrew Lunn wrote:
>>> I just took a quick look at the driver. It allocates and maps rx buffers that can cover a packet size of BGMAC_RX_MAX_FRAME_SIZE = 9724.
>>> This seems rather excessive, especially since most people are going to use a MTU of 1500.
>>> My proposal would be to add support for making rx buffer size dependent on MTU, reallocating the ring on MTU changes.
>>> This should significantly reduce the time spent on flushing caches.
>>
>> Oh, that's important too, it was changed by commit 8c7da63978f1 ("bgmac:
>> configure MTU and add support for frames beyond 8192 byte size"):
>> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=8c7da63978f1672eb4037bbca6e7eac73f908f03
>>
>> It lowered NAT speed with bgmac by 60% (362 Mbps → 140 Mbps).
>>
>> I do all my testing with
>> #define BGMAC_RX_MAX_FRAME_SIZE 1536
>
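As a rough illustration of the MTU-dependent sizing proposed above, here is a minimal userspace sketch. It is not bgmac code: rx_frame_size_for_mtu() is a hypothetical helper, and rounding up to a 64-byte cache line is my assumption. The framing constants have the same values as the kernel's ETH_HLEN, VLAN_HLEN and ETH_FCS_LEN.

```c
/* Ethernet framing overhead, same values as the kernel constants. */
#define ETH_HLEN	14	/* destination MAC + source MAC + EtherType */
#define VLAN_HLEN	4	/* one 802.1Q tag */
#define ETH_FCS_LEN	4	/* trailing CRC */

/* Assumption: 64-byte cache lines; this value is not taken from bgmac. */
#define RX_CACHE_LINE	64

/*
 * Hypothetical helper (not part of the driver): largest frame a given
 * MTU can produce, rounded up to a whole number of cache lines.
 */
static unsigned int rx_frame_size_for_mtu(unsigned int mtu)
{
	unsigned int len = mtu + ETH_HLEN + VLAN_HLEN + ETH_FCS_LEN;

	return (len + RX_CACHE_LINE - 1) / RX_CACHE_LINE * RX_CACHE_LINE;
}
```

Under these assumptions an MTU of 1500 gives 1536 bytes and an MTU of 9000 gives 9024 bytes. A real implementation would also have to reallocate the rx ring on MTU changes, as suggested above.
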
> That helps show that cache operations are part of your bottleneck.
>
> Taking a quick look at the driver. On the receive side:
>
> 	/* Unmap buffer to make it accessible to the CPU */
> 	dma_unmap_single(dma_dev, dma_addr,
> 			 BGMAC_RX_BUF_SIZE, DMA_FROM_DEVICE);
>
> Here the data is mapped, ready for the CPU to use it.
>
> 	/* Get info from the header */
> 	len = le16_to_cpu(rx->len);
> 	flags = le16_to_cpu(rx->flags);
>
> 	/* Check for poison and drop or pass the packet */
> 	if (len == 0xdead && flags == 0xbeef) {
> 		netdev_err(bgmac->net_dev, "Found poisoned packet at slot %d, DMA issue!\n",
> 			   ring->start);
> 		put_page(virt_to_head_page(buf));
> 		bgmac->net_dev->stats.rx_errors++;
> 		break;
> 	}
>
> 	if (len > BGMAC_RX_ALLOC_SIZE) {
> 		netdev_err(bgmac->net_dev, "Found oversized packet at slot %d, DMA issue!\n",
> 			   ring->start);
> 		put_page(virt_to_head_page(buf));
> 		bgmac->net_dev->stats.rx_length_errors++;
> 		bgmac->net_dev->stats.rx_errors++;
> 		break;
> 	}
>
> 	/* Omit CRC. */
> 	len -= ETH_FCS_LEN;
>
> 	skb = build_skb(buf, BGMAC_RX_ALLOC_SIZE);
> 	if (unlikely(!skb)) {
> 		netdev_err(bgmac->net_dev, "build_skb failed\n");
> 		put_page(virt_to_head_page(buf));
> 		bgmac->net_dev->stats.rx_errors++;
> 		break;
> 	}
> 	skb_put(skb, BGMAC_RX_FRAME_OFFSET +
> 		BGMAC_RX_BUF_OFFSET + len);
> 	skb_pull(skb, BGMAC_RX_FRAME_OFFSET +
> 		 BGMAC_RX_BUF_OFFSET);
>
> 	skb_checksum_none_assert(skb);
> 	skb->protocol = eth_type_trans(skb, bgmac->net_dev);
>
> and this is the first access of the actual data. You can make the
> cache actually work for you, rather than against you, by adding a call to
>
> prefetch(buf);
>
> just after the dma_unmap_single(). That will start getting the frame
> header from DRAM into cache, so hopefully it is available by the time
> eth_type_trans() is called and you don't have a cache miss.

I don't think that analysis is correct.

Please take a look at the following lines:
	struct bgmac_rx_header *rx = slot->buf + BGMAC_RX_BUF_OFFSET;
	void *buf = slot->buf;

The first thing we do after the dma_unmap_single() call is read rx->len.
That read already touches the DMA data, so there is no other work we could
keep the CPU busy with while prefetching it.

FWIW I tried adding prefetch(buf); anyway. It didn't change NAT speed by
even 1 Mb/s. Speed was exactly the same as without the prefetch() call.
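
To make the ordering argument concrete, here is a minimal userspace sketch of it. Everything in it is a stand-in: struct rx_header and RX_BUF_OFFSET only mimic the driver's rx header and BGMAC_RX_BUF_OFFSET (the value 32 is made up), and __builtin_prefetch() is the GCC/Clang builtin used in place of the kernel's prefetch().

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical offset of the rx header inside the buffer (made-up value). */
#define RX_BUF_OFFSET	32

/* Stand-in for the driver's rx header; both fields are little-endian. */
struct rx_header {
	uint8_t len[2];
	uint8_t flags[2];
};

/* Decode a little-endian 16-bit value byte by byte (host independent). */
static uint16_t get_le16(const uint8_t *p)
{
	return (uint16_t)(p[0] | (p[1] << 8));
}

/*
 * Mimics the start of the rx path: the header lives in the same buffer
 * the device wrote to, and its length field is the very first thing read.
 */
static uint16_t rx_read_len(const void *buf)
{
	struct rx_header rx;

	/* Prefetch hint, issued right where the suggestion would put it. */
	__builtin_prefetch(buf);

	/*
	 * This load touches the freshly written buffer on the very next
	 * step, so no independent work overlaps with the prefetch.
	 */
	memcpy(&rx, (const uint8_t *)buf + RX_BUF_OFFSET, sizeof(rx));

	return get_le16(rx.len);
}
```

Since the first dependent load follows the hint immediately, the prefetch has no latency to hide in this arrangement, which would be consistent with the unchanged NAT speed I measured.
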