netdev.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* PATCH] net: Cache align dst refcnt
@ 2013-12-17 18:15 Tom Herbert
  2013-12-17 18:29 ` Eric Dumazet
  0 siblings, 1 reply; 5+ messages in thread
From: Tom Herbert @ 2013-12-17 18:15 UTC (permalink / raw)
  To: davem, edumazet, netdev

ipv4_dst_check is coming up very high in perf top for TCP_RR tests and
it really is not doing much interesting. Looks like we have false
sharing with dst->refcnt. This patch cache aligns refcnt. There were
already some comments in the code that refcnt needs to be cache aligned.

Signed-off-by: Tom Herbert <therbert@google.com>
---
 include/net/dst.h | 11 +++--------
 1 file changed, 3 insertions(+), 8 deletions(-)

diff --git a/include/net/dst.h b/include/net/dst.h
index 44995c1..265b4cd 100644
--- a/include/net/dst.h
+++ b/include/net/dst.h
@@ -85,17 +85,12 @@ struct dst_entry {
 #endif
 
 	/*
-	 * Align __refcnt to a 64 bytes alignment
-	 * (L1_CACHE_SIZE would be too much)
-	 */
-#ifdef CONFIG_64BIT
-	long			__pad_to_align_refcnt[2];
-#endif
-	/*
 	 * __refcnt wants to be on a different cache line from
 	 * input/output/ops or performance tanks badly
 	 */
-	atomic_t		__refcnt;	/* client references	*/
+	atomic_t		__refcnt ____cacheline_aligned_in_smp;
+				/* client references	*/
+
 	int			__use;
 	unsigned long		lastuse;
 	union {
-- 
1.8.5.1

^ permalink raw reply related	[flat|nested] 5+ messages in thread

* Re: PATCH] net: Cache align dst refcnt
  2013-12-17 18:15 PATCH] net: Cache align dst refcnt Tom Herbert
@ 2013-12-17 18:29 ` Eric Dumazet
  2013-12-17 18:47   ` Eric Dumazet
  2013-12-17 19:51   ` Tom Herbert
  0 siblings, 2 replies; 5+ messages in thread
From: Eric Dumazet @ 2013-12-17 18:29 UTC (permalink / raw)
  To: Tom Herbert; +Cc: davem, edumazet, netdev

On Tue, 2013-12-17 at 10:15 -0800, Tom Herbert wrote:
> ipv4_dst_check is coming up very high in perf top for TCP_RR tests and
> it really is not doing much interesting. Looks like we have false
> sharing with dst->refcnt. This patch cache aligns refcnt. There were
> already some comments in the code that refcnt needs to be cache aligned.
> 
> Signed-off-by: Tom Herbert <therbert@google.com>
> ---
>  include/net/dst.h | 11 +++--------
>  1 file changed, 3 insertions(+), 8 deletions(-)

This is not needed. I wonder how you tested this.

Problem was solved in commit 5635c10d976716
("net: make sure struct dst_entry refcount is aligned on 64 bytes")

If this BUILD_BUG_ON() doesn't work for you, we should investigate !

static inline void dst_hold(struct dst_entry *dst)
{
        /*
         * If your kernel compilation stops here, please check
         * __pad_to_align_refcnt declaration in struct dst_entry
         */
        BUILD_BUG_ON(offsetof(struct dst_entry, __refcnt) & 63);
        atomic_inc(&dst->__refcnt);
}

Thanks

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: PATCH] net: Cache align dst refcnt
  2013-12-17 18:29 ` Eric Dumazet
@ 2013-12-17 18:47   ` Eric Dumazet
  2013-12-17 19:51   ` Tom Herbert
  1 sibling, 0 replies; 5+ messages in thread
From: Eric Dumazet @ 2013-12-17 18:47 UTC (permalink / raw)
  To: Tom Herbert; +Cc: davem, edumazet, netdev

On Tue, 2013-12-17 at 10:29 -0800, Eric Dumazet wrote:
> On Tue, 2013-12-17 at 10:15 -0800, Tom Herbert wrote:
> > ipv4_dst_check is coming up very high in perf top for TCP_RR tests and
> > it really is not doing much interesting. Looks like we have false
> > sharing with dst->refcnt. This patch cache aligns refcnt. There were
> > already some comments in the code that refcnt needs to be cache aligned.
> > 
> > Signed-off-by: Tom Herbert <therbert@google.com>
> > ---
> >  include/net/dst.h | 11 +++--------
> >  1 file changed, 3 insertions(+), 8 deletions(-)

BTW, this dst refcount dirtying is because of netperf using blocking
reads, enabling prequeuing (assuming /proc/sys/net/ipv4/tcp_low_latency
is 0)

And the rx dst is shared by all cpus, if you use concurrent netperfs all
coming from one particular source.

This could be solved (if we really care about this case on real
workloads) by making nh_rth_input a per cpu var, as we did for
nh_pcpu_rth_output

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: PATCH] net: Cache align dst refcnt
  2013-12-17 18:29 ` Eric Dumazet
  2013-12-17 18:47   ` Eric Dumazet
@ 2013-12-17 19:51   ` Tom Herbert
  2013-12-17 20:10     ` David Miller
  1 sibling, 1 reply; 5+ messages in thread
From: Tom Herbert @ 2013-12-17 19:51 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: David Miller, Eric Dumazet, Linux Netdev List

On Tue, Dec 17, 2013 at 10:29 AM, Eric Dumazet <eric.dumazet@gmail.com> wrote:
> On Tue, 2013-12-17 at 10:15 -0800, Tom Herbert wrote:
>> ipv4_dst_check is coming up very high in perf top for TCP_RR tests and
>> it really is not doing much interesting. Looks like we have false
>> sharing with dst->refcnt. This patch cache aligns refcnt. There were
>> already some comments in the code that refcnt needs to be cache aligned.
>>
>> Signed-off-by: Tom Herbert <therbert@google.com>
>> ---
>>  include/net/dst.h | 11 +++--------
>>  1 file changed, 3 insertions(+), 8 deletions(-)
>
> This is not needed. I wonder how you tested this.
>
Well I tested with simple netperf:

./super_netperf 200 -t TCP_RR -l 120 -H host -- -r 1,1

Without fix, ipv4_dst_check is 1.66% of CPU utilization (in top 10
functions). With fix, don't see ipv4_dst_check anymore and I don't see
any the cache misses being moved to other functions. This tiny
function is using more CPU that tcp_ack! :-(

> Problem was solved in commit 5635c10d976716
> ("net: make sure struct dst_entry refcount is aligned on 64 bytes")
>
> If this BUILD_BUG_ON() doesn't work for you, we should investigate !
>
Definitely not hitting the bug on. In any case, I'm wondering why is
cache alignment being hard coded here instead of using compiler
directives?

> static inline void dst_hold(struct dst_entry *dst)
> {
>         /*
>          * If your kernel compilation stops here, please check
>          * __pad_to_align_refcnt declaration in struct dst_entry
>          */
>         BUILD_BUG_ON(offsetof(struct dst_entry, __refcnt) & 63);
>         atomic_inc(&dst->__refcnt);
> }
>
> Thanks
>
>

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: PATCH] net: Cache align dst refcnt
  2013-12-17 19:51   ` Tom Herbert
@ 2013-12-17 20:10     ` David Miller
  0 siblings, 0 replies; 5+ messages in thread
From: David Miller @ 2013-12-17 20:10 UTC (permalink / raw)
  To: therbert; +Cc: eric.dumazet, edumazet, netdev

From: Tom Herbert <therbert@google.com>
Date: Tue, 17 Dec 2013 11:51:02 -0800

> Definitely not hitting the bug on. In any case, I'm wondering why is
> cache alignment being hard coded here instead of using compiler
> directives?

To tightly control the size of this structure, it is one of the most
critical in the whole networking stack so any wasted bytes is a huge
deal.

^ permalink raw reply	[flat|nested] 5+ messages in thread

end of thread, other threads:[~2013-12-17 20:10 UTC | newest]

Thread overview: 5+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2013-12-17 18:15 PATCH] net: Cache align dst refcnt Tom Herbert
2013-12-17 18:29 ` Eric Dumazet
2013-12-17 18:47   ` Eric Dumazet
2013-12-17 19:51   ` Tom Herbert
2013-12-17 20:10     ` David Miller

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).