From: Ingo Molnar <mingo@elte.hu>
To: Andrea Arcangeli <andrea@suse.de>
Cc: Eric Whiting <ewhiting@amis.com>,
	akpm@osdl.org, linux-kernel@vger.kernel.org
Subject: Re: -mmX 4G patches feedback [numbers: how much performance impact]
Date: Wed, 7 Apr 2004 10:23:50 +0200
Message-ID: <20040407082350.GB32140@elte.hu>
In-Reply-To: <20040407072349.GC26888@dualathlon.random>


* Andrea Arcangeli <andrea@suse.de> wrote:

> > That area of the curve is quite suspect at first sight. With a TLB flush
> 
> the cache size of the l2 is 512k, that's the point where slowing down
> walking pagetables out of l2 hurts most. It made perfect sense to me.
> Likely on a 1M cache machine you'll get the same huge slowdown at 1M
> working set and so on with bigger cache sizes in more expensive x86
> big iron cpus.

ah! i assumed a 1MB cache.

yes, there could be caching effects around the cache size, but the
magnitude still looks _way_ off. Here are a number of independent
calculations and measurements to support this point:

64 dTLBs means 64 pages. Even assuming the most spread-out layout in PAE
mode, a single pagetable walk needs to access the pte and the pmd
pointer; if each pagewalk-path lies on separate cachelines (worst-case),
that means 2x64 == 128 bytes of footprint per page. [64-byte L2
cacheline size on your box.] This means 128x64 == 8K of footprint in the
L2 cache for the 64 TLB entries.

this is only 1.5% of the L2 cache, so it should not make such a huge
difference. Even considering P4's habit of fetching two cachelines on a
miss, the footprint could at most be 3%.

the real pagetable footprint is likely much lower - there's likely a
fair amount of sharing at the pmd pointer level.

so the theory that it's the pagetables falling out of the cache that
makes the difference doesn't seem plausible.

> > every 1 msec [*], for a 'double digit' slowdown to happen it means the
> > effect of the TLB flush has to be on the order of 100-200 usecs. This is
> > near impossible, the dTLB+iTLB on your CPU is only 64+64. This means
> > that a simple mmap() or a context-switch done by a number-cruncher
> > (which does a TLB flush too) would have a 100-200 usecs secondary cost -
> > this has never been seen or reported before!
> 
> note that it's not only the tlb flush having the cost, the cost is the
> later code going slow due the tlb misses. [...]

this is what i called the 'secondary' cost of the TLB flush.

> [...] So if you rdtsc around the mmap syscall it'll return quickly,
> just like the irq returns quickly to userspace. the cost of the tlb
> misses causing pte walks out of l2 cache isn't easily measurable
> in other ways than I did.

i know that it's not measurable directly, but a 100-200 usecs slowdown
due to a TLB flush would be easily noticeable as a slowdown in userspace
performance.

let's assume all of the userspace pagetable cachelines fall out of the
L2 cache during the 1 msec slice, and let's assume the worst-case spread
of the pagetables, necessitating 128 cachemisses. With 300 cycles per L2
cachemiss, this makes for 38400 cycles - about 15 usecs on your 2.5 GHz
Xeon. This is the very worst case, and it's still an order of magnitude
smaller than the 100-200 usecs.

I've attached tlb2.c which measures cold-cache miss costs with a
randomized access pattern over a memory range of 512 MB, using 131072
separate pages as a TLB-miss target. (you should run it as root, it uses
cli.) So this cost combines the data-cache miss cost and the TLB-miss
costs.

This should give the worst-case cost of TLB misses. According to these
measurements the worst-case for 64 dTLB misses is ~25000 cycles, or 10
usecs on your box - well below the conservative calculation above. This
is the very worst-case TLB-thrashing example i could create, and note
that it still over-estimates the TLB-miss cost because the data-cache
miss factors in too. (The data-cache miss cannot be done in parallel
with the TLB miss, because without the translation the CPU cannot know
which physical page to fetch.) [the TLB miss is two cachemisses, the
data-cache fetch is one cachemiss. The two TLB-related pte and pmd
cachemisses are serialized too, because the pmd value is needed for the
pte fetch.] So the real cost of the TLB misses alone would be around 6
usecs, way below the 100-200 usecs effect you measured.

> > to eliminate layout effects, could you do another curve? Plain -aa4 (no
> 
> with page layout effects you mean different page coloring I assume,
> but it's well proven that page coloring has no significant effect on
> those multi associative x86 caches [...]

E.g. the P4 has associativity constraints in the VM-linear space too
(!). Just try to access 10 cachelines spaced exactly 128 KB away from
each other (in virtual space).

> > 4:4 patch) but a __flush_tlb() added before and after do_IRQ(), in
> > arch/i386/kernel/irq.c? This should simulate much of the TLB flushing
> > effect of 4:4 on -aa4, without any of the other layout changes in the
> > kernel. [it's not a full simulation of all effects of 4:4, but it should
> > simulate the TLB flush effect quite well.]
> 
> sure I can try it (though not right now, but I'll try before the
> weekend).

thanks for the testing! It will be interesting to see.

	Ingo


Thread overview: 41+ messages
2004-04-05 16:36 -mmX 4G patches feedback Eric Whiting
2004-04-05 17:46 ` Andrea Arcangeli
2004-04-05 21:35   ` Eric Whiting
2004-04-05 22:16     ` Andrea Arcangeli
2004-04-06 11:55       ` -mmX 4G patches feedback [numbers: how much performance impact] Ingo Molnar
2004-04-06 14:49         ` Eric Whiting
2004-04-06 15:59         ` Andrea Arcangeli
2004-04-06 16:13           ` Arjan van de Ven
2004-04-06 16:39             ` Andrea Arcangeli
2004-04-06 17:24           ` Ingo Molnar
2004-04-06 17:57             ` Andrea Arcangeli
2004-04-07 22:54               ` Martin J. Bligh
2004-04-07 22:50                 ` Andrea Arcangeli
2004-04-06 19:25           ` Ingo Molnar
2004-04-06 20:25             ` Andrea Arcangeli
2004-04-07  6:03               ` Andrea Arcangeli
2004-04-07  6:46                 ` Ingo Molnar
2004-04-07  7:23                   ` Andrea Arcangeli
2004-04-07  8:23                     ` Ingo Molnar [this message]
2004-04-07 21:35                       ` Andrea Arcangeli
2004-04-07 17:27                   ` Andrea Arcangeli
2004-04-07  7:25               ` Ingo Molnar
2004-04-07 21:39                 ` Andrea Arcangeli
2004-04-07 22:58             ` Martin J. Bligh
2004-04-07 23:01               ` Andrea Arcangeli
2004-04-07 23:21                 ` Martin J. Bligh
2004-04-07 23:18                   ` Andrea Arcangeli
2004-04-07 23:34                     ` Martin J. Bligh
2004-04-08  0:18                       ` Andrea Arcangeli
2004-04-08  6:24                         ` Martin J. Bligh
2004-04-08 21:59                           ` Andrea Arcangeli
2004-04-08 22:19                             ` Martin J. Bligh
2004-04-08 22:19                               ` Andrea Arcangeli
2004-04-08 23:14                                 ` Martin J. Bligh
2004-04-08 23:22                                   ` Andrea Arcangeli
2004-04-08 23:42                                     ` Martin J. Bligh
2004-04-08 23:49                                       ` Andrea Arcangeli
2004-04-07 21:19       ` -mmX 4G patches feedback Martin J. Bligh
2004-04-07 21:49         ` Andrea Arcangeli
2004-04-06 17:59 -mmX 4G patches feedback [numbers: how much performance impact] Manfred Spraul
2004-04-06 18:41 ` Andrea Arcangeli
