netdev.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* Optimizing kernel compilation / alignments for network performance
@ 2022-04-27 12:04 Rafał Miłecki
  2022-04-27 12:56 ` Alexander Lobakin
  0 siblings, 1 reply; 22+ messages in thread
From: Rafał Miłecki @ 2022-04-27 12:04 UTC (permalink / raw)
  To: Network Development, linux-arm-kernel, Russell King, Andrew Lunn,
	Felix Fietkau
  Cc: openwrt-devel, Florian Fainelli

Hi,

I noticed years ago that kernel changes touching code - that I don't use
at all - can affect network performance for me.

I work with home routers based on Broadcom Northstar platform. Those
are SoCs with not-so-powerful 2 x ARM Cortex-A9 CPU cores. Main task of
those devices is NAT masquerade and that is what I test with iperf
running on two x86 machines.

***

Example of such unused code change:
ce5013ff3bec ("mtd: spi-nor: Add support for XM25QH64A and XM25QH128A").
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=ce5013ff3bec05cf2a8a05c75fcd520d9914d92b
It lowered my NAT speed from 381 Mb/s to 367 Mb/s (-3,5%).

I first reported that issue it in the e-mail thread:
ARM router NAT performance affected by random/unrelated commits
https://lkml.org/lkml/2019/5/21/349
https://www.spinics.net/lists/linux-block/msg40624.html

Back then it was commit 5b0890a97204 ("flow_dissector: Parse batman-adv
unicast headers")
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=9316a9ed6895c4ad2f0cde171d486f80c55d8283
that increased my NAT speed from 741 Mb/s to 773 Mb/s (+4,3%).

***

It appears Northstar CPUs have little cache size and so any change in
location of kernel symbols can affect NAT performance. That explains why
changing unrelated code affects anything & it has been partially proven
aligning some of cache-v7.S code.

My question is: is there a way to find out & force an optimal symbols
locations?

Adding .align 5 to the cache-v7.S is a partial success. I'd like to find
out what other functions are worth optimizing (aligning) and force that
(I guess  __attribute__((aligned(32))) could be used).

I can't really draw any conclusions from comparing System.map before and
after above commits as they relocate thousands of symbols in one go.

Optimizing is pretty important for me for two reasons:
1. I want to reach maximum possible NAT masquerade performance
2. I need stable performance across random commits to detect regressions

^ permalink raw reply	[flat|nested] 22+ messages in thread

end of thread, other threads:[~2022-05-10 19:15 UTC | newest]

Thread overview: 22+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2022-04-27 12:04 Optimizing kernel compilation / alignments for network performance Rafał Miłecki
2022-04-27 12:56 ` Alexander Lobakin
2022-04-27 17:31   ` Rafał Miłecki
2022-04-29 14:18     ` Rafał Miłecki
2022-04-29 14:49     ` Arnd Bergmann
2022-05-05 15:42       ` Rafał Miłecki
2022-05-05 16:04         ` Andrew Lunn
2022-05-05 16:46           ` Felix Fietkau
2022-05-06  7:47             ` Rafał Miłecki
2022-05-06 12:42               ` Andrew Lunn
2022-05-10 10:29                 ` Rafał Miłecki
2022-05-10 14:09                   ` Dave Taht
2022-05-10 19:15                     ` Dave Taht
2022-05-06  7:44           ` Rafał Miłecki
2022-05-06  8:45             ` Arnd Bergmann
2022-05-06  8:55               ` Rafał Miłecki
2022-05-06  9:44                 ` Arnd Bergmann
2022-05-10 12:51                   ` Rafał Miłecki
2022-05-10 13:19                     ` Arnd Bergmann
2022-05-10 11:23               ` Rafał Miłecki
2022-05-10 13:18                 ` Arnd Bergmann
2022-05-08  9:53             ` Rafał Miłecki

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).