From: Eric Dumazet <dada1@cosmosbay.com>
To: Evgeniy Polyakov <johnpol@2ka.mipt.ru>
Cc: akepner@sgi.com, linux@horizon.com, davem@davemloft.net,
netdev@vger.kernel.org, bcrl@kvack.org
Subject: Re: Extensible hashing and RCU
Date: Tue, 20 Feb 2007 12:09:51 +0100
Message-ID: <200702201209.52388.dada1@cosmosbay.com>
In-Reply-To: <20070220104418.GA5262@2ka.mipt.ru>
On Tuesday 20 February 2007 11:44, Evgeniy Polyakov wrote:
> On Tue, Feb 20, 2007 at 11:04:15AM +0100, Eric Dumazet (dada1@cosmosbay.com) wrote:
> > You totally miss the fact that the 1-2-4 MB cache is not available for
> > you at all. It is filled by User accesses. I don't care about DoS. I care
> > about real servers, servicing tcp clients. The TCP service/stack should
> > not take more than 10% of CPU (cycles and caches). The User application
> > is certainly more important because it hosts the real added value.
>
> TCP socket is 4k in size, one tree entry can be reduced to 200 bytes?
>
> No one talks about _that_ cache miss, it is considered OK to have, but
> a tree cache miss becomes the worst thing ever.
> In softirq we process the socket's state, lock, reference counter, several
> pointers, and if we are lucky - the whole TCP state machine fields - and
> most of it stays there when the kernel is done - userspace issues syscalls
> which must populate it back. Why don't we see that it is moved into
> cache each time a syscall is invoked? Because it is in the cache as well
> as the part of the hash table associated with the most recently used hash
> entries, which should not be there, and instead part of the tree can be.
No, I see cache misses everywhere...
This is because my machines are doing real work in user land. They are not lab
machines. Even if I had cpus with 16-32MB cache, it would be the same,
because User land wants GBs ...
For example, sock_wfree() uses 1.6613 % of cpu because of false sharing of
sk_flags (dirtied each time SOCK_QUEUE_SHRUNK is set) :(
ffffffff803c2850 <sock_wfree>: /* sock_wfree total: 714241 1.6613 */
  1307  0.0030 :ffffffff803c2850:   push   %rbp
 55056  0.1281 :ffffffff803c2851:   mov    %rsp,%rbp
    94 2.2e-04 :ffffffff803c2854:   push   %rbx
               :ffffffff803c2855:   sub    $0x8,%rsp
  1090  0.0025 :ffffffff803c2859:   mov    0x10(%rdi),%rbx
     3 7.0e-06 :ffffffff803c285d:   mov    0xb8(%rdi),%eax
    38 8.8e-05 :ffffffff803c2863:   lock sub %eax,0x90(%rbx)
 /* HOT : access to sk_flags */
 81979  0.1907 :ffffffff803c286a:   mov    0x100(%rbx),%eax
512119  1.1912 :ffffffff803c2870:   test   $0x2,%ah
   262 6.1e-04 :ffffffff803c2873:   jne    ffffffff803c2880 <sock_wfree+0x30>
   142 3.3e-04 :ffffffff803c2875:   mov    %rbx,%rdi
 14467  0.0336 :ffffffff803c2878:   callq  *0x200(%rbx)
    63 1.5e-04 :ffffffff803c287e:   data16
               :ffffffff803c287f:   nop
  9046  0.0210 :ffffffff803c2880:   lock decl 0x28(%rbx)
 29792  0.0693 :ffffffff803c2884:   sete   %al
    56 1.3e-04 :ffffffff803c2887:   test   %al,%al
   789  0.0018 :ffffffff803c2889:   je     ffffffff803c2893 <sock_wfree+0x43>
               :ffffffff803c288b:   mov    %rbx,%rdi
   144 3.3e-04 :ffffffff803c288e:   callq  ffffffff803c0f90 <sk_free>
  1685  0.0039 :ffffffff803c2893:   add    $0x8,%rsp
  2462  0.0057 :ffffffff803c2897:   pop    %rbx
   684  0.0016 :ffffffff803c2898:   leaveq
  2963  0.0069 :ffffffff803c2899:   retq
This is why tcp lookups should not take more than 1% themselves: other parts
of the stack *want* to make many cache misses too.
If we want to optimize tcp, we should reorder fields to reduce the number of
cache lines touched, not change algos. struct sock fields are currently placed
to reduce holes, while they should be grouped so that related fields share
cache lines.