* [PATCH net-next V1] net: adjust net_device layout for cacheline usage
@ 2021-01-26 17:39 Jesper Dangaard Brouer
2021-01-29 3:51 ` David Ahern
` (2 more replies)
0 siblings, 3 replies; 9+ messages in thread
From: Jesper Dangaard Brouer @ 2021-01-26 17:39 UTC (permalink / raw)
To: David Ahern, netdev, Jakub Kicinski, David S. Miller
Cc: Jesper Dangaard Brouer, bpf, Eric Dumazet, Daniel Borkmann,
Alexei Starovoitov
The current layout of net_device is not optimal for cacheline usage.
The member adj_list.lower linked list is split between cacheline 2 and 3.
The ifindex is placed together with stats (struct net_device_stats),
although most modern drivers don't update this stats member.
The members netdev_ops, mtu and hard_header_len are placed on three
different cachelines. These members are accessed for XDP redirect into
devmap, which were noticeably with perf tool. When not using the map
redirect variant (like TC-BPF does), then ifindex is also used, which is
placed on a separate fourth cacheline. These members are also accessed
during forwarding with regular network stack. The members priv_flags and
flags are on fast-path for network stack transmit path in __dev_queue_xmit
(currently located together with mtu cacheline).
This patch creates a read mostly cacheline, with the purpose of keeping the
above mentioned members on the same cacheline.
Some netdev_features_t members also becomes part of this cacheline, which is
on purpose, as function netif_skb_features() is on fast-path via
validate_xmit_skb().
Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
---
include/linux/netdevice.h | 53 +++++++++++++++++++++++----------------------
1 file changed, 27 insertions(+), 26 deletions(-)
diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
index b7915484369c..2645f114de54 100644
--- a/include/linux/netdevice.h
+++ b/include/linux/netdevice.h
@@ -1855,7 +1855,6 @@ struct net_device {
unsigned long mem_end;
unsigned long mem_start;
unsigned long base_addr;
- int irq;
/*
* Some hardware also needs these fields (state,dev_list,
@@ -1877,6 +1876,23 @@ struct net_device {
struct list_head lower;
} adj_list;
+ /* Read-mostly cache-line for fast-path access */
+ unsigned int flags;
+ unsigned int priv_flags;
+ const struct net_device_ops *netdev_ops;
+ int ifindex;
+ unsigned short gflags;
+ unsigned short hard_header_len;
+
+ /* Note : dev->mtu is often read without holding a lock.
+ * Writers usually hold RTNL.
+ * It is recommended to use READ_ONCE() to annotate the reads,
+ * and to use WRITE_ONCE() to annotate the writes.
+ */
+ unsigned int mtu;
+ unsigned short needed_headroom;
+ unsigned short needed_tailroom;
+
netdev_features_t features;
netdev_features_t hw_features;
netdev_features_t wanted_features;
@@ -1885,10 +1901,15 @@ struct net_device {
netdev_features_t mpls_features;
netdev_features_t gso_partial_features;
- int ifindex;
+ unsigned int min_mtu;
+ unsigned int max_mtu;
+ unsigned short type;
+ unsigned char min_header_len;
+ unsigned char name_assign_type;
+
int group;
- struct net_device_stats stats;
+ struct net_device_stats stats; /* not used by modern drivers */
atomic_long_t rx_dropped;
atomic_long_t tx_dropped;
@@ -1902,7 +1923,6 @@ struct net_device {
const struct iw_handler_def *wireless_handlers;
struct iw_public_data *wireless_data;
#endif
- const struct net_device_ops *netdev_ops;
const struct ethtool_ops *ethtool_ops;
#ifdef CONFIG_NET_L3_MASTER_DEV
const struct l3mdev_ops *l3mdev_ops;
@@ -1921,34 +1941,12 @@ struct net_device {
const struct header_ops *header_ops;
- unsigned int flags;
- unsigned int priv_flags;
-
- unsigned short gflags;
- unsigned short padded;
-
unsigned char operstate;
unsigned char link_mode;
unsigned char if_port;
unsigned char dma;
- /* Note : dev->mtu is often read without holding a lock.
- * Writers usually hold RTNL.
- * It is recommended to use READ_ONCE() to annotate the reads,
- * and to use WRITE_ONCE() to annotate the writes.
- */
- unsigned int mtu;
- unsigned int min_mtu;
- unsigned int max_mtu;
- unsigned short type;
- unsigned short hard_header_len;
- unsigned char min_header_len;
- unsigned char name_assign_type;
-
- unsigned short needed_headroom;
- unsigned short needed_tailroom;
-
/* Interface address info. */
unsigned char perm_addr[MAX_ADDR_LEN];
unsigned char addr_assign_type;
@@ -1959,7 +1957,10 @@ struct net_device {
unsigned short neigh_priv_len;
unsigned short dev_id;
unsigned short dev_port;
+ unsigned short padded;
+
spinlock_t addr_list_lock;
+ int irq;
struct netdev_hw_addr_list uc;
struct netdev_hw_addr_list mc;
^ permalink raw reply related [flat|nested] 9+ messages in thread
* Re: [PATCH net-next V1] net: adjust net_device layout for cacheline usage
2021-01-26 17:39 [PATCH net-next V1] net: adjust net_device layout for cacheline usage Jesper Dangaard Brouer
@ 2021-01-29 3:51 ` David Ahern
2021-01-29 7:58 ` Jesper Dangaard Brouer
2021-01-29 4:50 ` patchwork-bot+netdevbpf
[not found] ` <52835f1f-96e1-b36e-2631-1182649ac3a8@gmail.com>
2 siblings, 1 reply; 9+ messages in thread
From: David Ahern @ 2021-01-29 3:51 UTC (permalink / raw)
To: Jesper Dangaard Brouer, netdev, Jakub Kicinski, David S. Miller
Cc: bpf, Eric Dumazet, Daniel Borkmann, Alexei Starovoitov
On 1/26/21 10:39 AM, Jesper Dangaard Brouer wrote:
> The current layout of net_device is not optimal for cacheline usage.
>
> The member adj_list.lower linked list is split between cacheline 2 and 3.
> The ifindex is placed together with stats (struct net_device_stats),
> although most modern drivers don't update this stats member.
>
> The members netdev_ops, mtu and hard_header_len are placed on three
> different cachelines. These members are accessed for XDP redirect into
> devmap, which were noticeably with perf tool. When not using the map
> redirect variant (like TC-BPF does), then ifindex is also used, which is
> placed on a separate fourth cacheline. These members are also accessed
> during forwarding with regular network stack. The members priv_flags and
> flags are on fast-path for network stack transmit path in __dev_queue_xmit
> (currently located together with mtu cacheline).
>
> This patch creates a read mostly cacheline, with the purpose of keeping the
> above mentioned members on the same cacheline.
>
> Some netdev_features_t members also becomes part of this cacheline, which is
> on purpose, as function netif_skb_features() is on fast-path via
> validate_xmit_skb().
A long over due look at the organization of this struct. Do you have
performance numbers for the XDP case?
^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [PATCH net-next V1] net: adjust net_device layout for cacheline usage
2021-01-26 17:39 [PATCH net-next V1] net: adjust net_device layout for cacheline usage Jesper Dangaard Brouer
2021-01-29 3:51 ` David Ahern
@ 2021-01-29 4:50 ` patchwork-bot+netdevbpf
[not found] ` <52835f1f-96e1-b36e-2631-1182649ac3a8@gmail.com>
2 siblings, 0 replies; 9+ messages in thread
From: patchwork-bot+netdevbpf @ 2021-01-29 4:50 UTC (permalink / raw)
To: Jesper Dangaard Brouer
Cc: dsahern, netdev, kuba, davem, bpf, eric.dumazet, borkmann,
alexei.starovoitov
Hello:
This patch was applied to netdev/net-next.git (refs/heads/master):
On Tue, 26 Jan 2021 18:39:39 +0100 you wrote:
> The current layout of net_device is not optimal for cacheline usage.
>
> The member adj_list.lower linked list is split between cacheline 2 and 3.
> The ifindex is placed together with stats (struct net_device_stats),
> although most modern drivers don't update this stats member.
>
> The members netdev_ops, mtu and hard_header_len are placed on three
> different cachelines. These members are accessed for XDP redirect into
> devmap, which were noticeably with perf tool. When not using the map
> redirect variant (like TC-BPF does), then ifindex is also used, which is
> placed on a separate fourth cacheline. These members are also accessed
> during forwarding with regular network stack. The members priv_flags and
> flags are on fast-path for network stack transmit path in __dev_queue_xmit
> (currently located together with mtu cacheline).
>
> [...]
Here is the summary with links:
- [net-next,V1] net: adjust net_device layout for cacheline usage
https://git.kernel.org/netdev/net-next/c/28af22c6c8df
You are awesome, thank you!
--
Deet-doot-dot, I am a bot.
https://korg.docs.kernel.org/patchwork/pwbot.html
^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [PATCH net-next V1] net: adjust net_device layout for cacheline usage
2021-01-29 3:51 ` David Ahern
@ 2021-01-29 7:58 ` Jesper Dangaard Brouer
[not found] ` <20210129114642.139cb7dc@carbon>
0 siblings, 1 reply; 9+ messages in thread
From: Jesper Dangaard Brouer @ 2021-01-29 7:58 UTC (permalink / raw)
To: David Ahern
Cc: netdev, Jakub Kicinski, David S. Miller, bpf, Eric Dumazet,
Daniel Borkmann, Alexei Starovoitov, brouer
On Thu, 28 Jan 2021 20:51:23 -0700
David Ahern <dsahern@gmail.com> wrote:
> On 1/26/21 10:39 AM, Jesper Dangaard Brouer wrote:
> > The current layout of net_device is not optimal for cacheline usage.
> >
> > The member adj_list.lower linked list is split between cacheline 2 and 3.
> > The ifindex is placed together with stats (struct net_device_stats),
> > although most modern drivers don't update this stats member.
> >
> > The members netdev_ops, mtu and hard_header_len are placed on three
> > different cachelines. These members are accessed for XDP redirect into
> > devmap, which were noticeably with perf tool. When not using the map
> > redirect variant (like TC-BPF does), then ifindex is also used, which is
> > placed on a separate fourth cacheline. These members are also accessed
> > during forwarding with regular network stack. The members priv_flags and
> > flags are on fast-path for network stack transmit path in __dev_queue_xmit
> > (currently located together with mtu cacheline).
> >
> > This patch creates a read mostly cacheline, with the purpose of keeping the
> > above mentioned members on the same cacheline.
> >
> > Some netdev_features_t members also becomes part of this cacheline, which is
> > on purpose, as function netif_skb_features() is on fast-path via
> > validate_xmit_skb().
>
> A long over due look at the organization of this struct. Do you have
> performance numbers for the XDP case?
Yes, my measurements are documented here:
https://github.com/xdp-project/xdp-project/blob/master/areas/core/xdp_redir01_net_device.org
Calc improvements of xdp_redirect_map on driver i40e:
* (1/12115061-1/12906785)*10^9 = 5.06 ns
* ((12906785/12115061)-1)*100 = 6.54%
--
Best regards,
Jesper Dangaard Brouer
MSc.CS, Principal Kernel Engineer at Red Hat
LinkedIn: http://www.linkedin.com/in/brouer
^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [PATCH net-next V1] net: adjust net_device layout for cacheline usage
[not found] ` <20210129114642.139cb7dc@carbon>
@ 2021-01-29 19:35 ` Jakub Kicinski
2021-01-29 19:47 ` Eric Dumazet
0 siblings, 1 reply; 9+ messages in thread
From: Jakub Kicinski @ 2021-01-29 19:35 UTC (permalink / raw)
To: Jesper Dangaard Brouer
Cc: David Ahern, netdev, David S. Miller, bpf, Eric Dumazet,
Daniel Borkmann, Alexei Starovoitov
On Fri, 29 Jan 2021 11:46:42 +0100 Jesper Dangaard Brouer wrote:
> > On Thu, 28 Jan 2021 20:51:23 -0700
> > David Ahern <dsahern@gmail.com> wrote:
> > > A long over due look at the organization of this struct.
>
> Yes, I was surprised that the cache-lines used in fast-path was this
> spread out.
I tried measuring the cache misses on struct netdevice running
relatively network-heavy production workload once but they were
really deep in the noise. Things become much easier to optimize
with a XDP micro-benchmark, but obviously should benefit all.
> There is a comment /* Cache lines mostly used on receive path */
> but that comment no-longer start on a cacheline, so I suspect that this
> have slowly diverted over time (Eric's commit 9356b8fc07 dates back to 2005).
>
> Patch is already applied. I expected people to would say that I also
> needed to adjust the doc-type comments. The comments describing the
> members, seems to be ordered the same way as defined. Should we/I keep
> that order intact? (when moving members)
kdoc didn't complain, and as you say it's already a mess, plus it's
two screen-fulls of scrolling away...
I think converting to inline kdoc of members would be an improvement,
if you want to sign up for that? Otherwise -EDIDNTCARE on my side :)
^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [PATCH net-next V1] net: adjust net_device layout for cacheline usage
2021-01-29 19:35 ` Jakub Kicinski
@ 2021-01-29 19:47 ` Eric Dumazet
2021-01-29 20:07 ` Jakub Kicinski
0 siblings, 1 reply; 9+ messages in thread
From: Eric Dumazet @ 2021-01-29 19:47 UTC (permalink / raw)
To: Jakub Kicinski, Jesper Dangaard Brouer
Cc: David Ahern, netdev, David S. Miller, bpf, Eric Dumazet,
Daniel Borkmann, Alexei Starovoitov
On 1/29/21 8:35 PM, Jakub Kicinski wrote:
> kdoc didn't complain, and as you say it's already a mess, plus it's
> two screen-fulls of scrolling away...
>
> I think converting to inline kdoc of members would be an improvement,
> if you want to sign up for that? Otherwise -EDIDNTCARE on my side :)
>
What about removing this kdoc ?
kdoc for a huge structure is mostly useless...
^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [PATCH net-next V1] net: adjust net_device layout for cacheline usage
2021-01-29 19:47 ` Eric Dumazet
@ 2021-01-29 20:07 ` Jakub Kicinski
2021-02-12 15:57 ` Jesper Dangaard Brouer
0 siblings, 1 reply; 9+ messages in thread
From: Jakub Kicinski @ 2021-01-29 20:07 UTC (permalink / raw)
To: Eric Dumazet
Cc: Jesper Dangaard Brouer, David Ahern, netdev, David S. Miller,
bpf, Daniel Borkmann, Alexei Starovoitov
On Fri, 29 Jan 2021 20:47:41 +0100 Eric Dumazet wrote:
> On 1/29/21 8:35 PM, Jakub Kicinski wrote:
>
> > kdoc didn't complain, and as you say it's already a mess, plus it's
> > two screen-fulls of scrolling away...
> >
> > I think converting to inline kdoc of members would be an improvement,
> > if you want to sign up for that? Otherwise -EDIDNTCARE on my side :)
> >
>
> What about removing this kdoc ?
>
> kdoc for a huge structure is mostly useless...
It's definitely not useful for "us", I'd guess most seasoned developers
will just grep for uses of the field - but maybe it is useful for noobs
trying to have high-level sense of the code?
Either way is fine by me, we can always preserve meaningful comments
inline without the kdoc decorator.
^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [PATCH net-next V1] net: adjust net_device layout for cacheline usage
[not found] ` <20210129150058.34e3a855@carbon>
@ 2021-02-12 15:49 ` Jesper Dangaard Brouer
0 siblings, 0 replies; 9+ messages in thread
From: Jesper Dangaard Brouer @ 2021-02-12 15:49 UTC (permalink / raw)
To: Eric Dumazet
Cc: David Ahern, netdev, Jakub Kicinski, David S. Miller, bpf,
Daniel Borkmann, Alexei Starovoitov, brouer
On Fri, 29 Jan 2021 15:00:58 +0100
Jesper Dangaard Brouer <brouer@redhat.com> wrote:
> On Fri, 29 Jan 2021 14:33:02 +0100
> Eric Dumazet <eric.dumazet@gmail.com> wrote:
>
> > On 1/26/21 6:39 PM, Jesper Dangaard Brouer wrote:
> > > The current layout of net_device is not optimal for cacheline usage.
> > >
[...]
> > > @@ -1877,6 +1876,23 @@ struct net_device {
[...]
> > > +
> > > netdev_features_t features;
> > > netdev_features_t hw_features;
> > > netdev_features_t wanted_features;
> >
> > Probably wanted_features, hw_features are not used in fast path, only
> > in control path ?
>
> Yes, that was also my analysis. I did consider moving those down, but
> I wanted to keep the first iteration simple ;-)
I've send a followup[1] to address this, thanks for pointing it out.
[1] https://patchwork.kernel.org/project/netdevbpf/patch/161313782625.1008639.6000589679659428869.stgit@firesoul/
--
Best regards,
Jesper Dangaard Brouer
MSc.CS, Principal Kernel Engineer at Red Hat
LinkedIn: http://www.linkedin.com/in/brouer
^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [PATCH net-next V1] net: adjust net_device layout for cacheline usage
2021-01-29 20:07 ` Jakub Kicinski
@ 2021-02-12 15:57 ` Jesper Dangaard Brouer
0 siblings, 0 replies; 9+ messages in thread
From: Jesper Dangaard Brouer @ 2021-02-12 15:57 UTC (permalink / raw)
To: Jakub Kicinski
Cc: Eric Dumazet, David Ahern, netdev, David S. Miller, bpf,
Daniel Borkmann, Alexei Starovoitov, brouer
On Fri, 29 Jan 2021 12:07:23 -0800
Jakub Kicinski <kuba@kernel.org> wrote:
> On Fri, 29 Jan 2021 20:47:41 +0100 Eric Dumazet wrote:
> > On 1/29/21 8:35 PM, Jakub Kicinski wrote:
> >
> > > kdoc didn't complain, and as you say it's already a mess, plus it's
> > > two screen-fulls of scrolling away...
> > >
> > > I think converting to inline kdoc of members would be an improvement,
> > > if you want to sign up for that? Otherwise -EDIDNTCARE on my side :)
> > >
> >
> > What about removing this kdoc ?
> >
> > kdoc for a huge structure is mostly useless...
>
> It's definitely not useful for "us", I'd guess most seasoned developers
> will just grep for uses of the field - but maybe it is useful for noobs
> trying to have high-level sense of the code?
>
> Either way is fine by me, we can always preserve meaningful comments
> inline without the kdoc decorator.
I agree that removing this kdoc makes sense. But as Jakub says we
should preserve meaningful comments inline. I'll add this task to my
TODO list, but anyone feel free to do the work before I get around to it.
--
Best regards,
Jesper Dangaard Brouer
MSc.CS, Principal Kernel Engineer at Red Hat
LinkedIn: http://www.linkedin.com/in/brouer
^ permalink raw reply [flat|nested] 9+ messages in thread
end of thread, other threads:[~2021-02-12 15:59 UTC | newest]
Thread overview: 9+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-01-26 17:39 [PATCH net-next V1] net: adjust net_device layout for cacheline usage Jesper Dangaard Brouer
2021-01-29 3:51 ` David Ahern
2021-01-29 7:58 ` Jesper Dangaard Brouer
[not found] ` <20210129114642.139cb7dc@carbon>
2021-01-29 19:35 ` Jakub Kicinski
2021-01-29 19:47 ` Eric Dumazet
2021-01-29 20:07 ` Jakub Kicinski
2021-02-12 15:57 ` Jesper Dangaard Brouer
2021-01-29 4:50 ` patchwork-bot+netdevbpf
[not found] ` <52835f1f-96e1-b36e-2631-1182649ac3a8@gmail.com>
[not found] ` <20210129150058.34e3a855@carbon>
2021-02-12 15:49 ` Jesper Dangaard Brouer
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).