From: Linus Torvalds <torvalds@linux-foundation.org>
To: Ingo Molnar <mingo@elte.hu>
Cc: Nick Piggin <nickpiggin@yahoo.com.au>,
Salman Qazi <sqazi@google.com>,
davem@davemloft.net, linux-kernel@vger.kernel.org,
Thomas Gleixner <tglx@linutronix.de>,
"H. Peter Anvin" <hpa@zytor.com>,
Andi Kleen <andi@firstfloor.org>
Subject: Re: [patch] x86, mm: pass in 'total' to __copy_from_user_*nocache()
Date: Sat, 28 Feb 2009 09:16:21 -0800 (PST)
Message-ID: <alpine.LFD.2.00.0902280904271.3111@localhost.localdomain>
In-Reply-To: <20090228125816.GA14917@elte.hu>
On Sat, 28 Feb 2009, Ingo Molnar wrote:
>
> Can you suggest some other workload that should show sensitivity
> to this detail too? Like a simple write() loop of non-4K-sized
> files or so?
I bet you can find it, but I also suspect that it will depend quite a bit
on the microarchitecture. What does 'movntq' actually _do_ on different
CPUs (bypass L1 or L2, or just turn the L1 cache policy to "write through
and invalidate")? How expensive is the sfence when there are still stores
in the write buffer? Does 'movntq' even use the write buffer for cached
stores, or does it take some special path to the last-level cache?
If you want to be really subtle, ask questions like what are the
implications for last-level caches that are inclusive? The last-level
cache would take not just the new write, but it also has logic to make
sure that it's a superset of the inner caches, so what does that do to
replacement policy for that cache? Or does it cause invalidations in the
inner caches?
Non-temporal stores are really quite different from normal stores.
Depending on microarchitecture, that may be totally a non-issue (bypassing
the L1 may be trivial and have no impact on anything else at all). Or it
could be that a movntq is really expensive because it needs to do odd
things.
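To make the "non-temporal store plus sfence" pattern concrete, here is a minimal user-space sketch. It is an illustration under stated assumptions, not the kernel's copy routine: it uses the SSE2 intrinsic _mm_stream_si32 (which emits movnti, a sibling of movntq) and falls back to ordinary stores when SSE2 is not available. The helper name fill_nontemporal is hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#if defined(__SSE2__)
#include <emmintrin.h>  /* _mm_stream_si32; also pulls in _mm_sfence */
#endif

/*
 * Hypothetical helper: fill dst[0..n) with `value`.
 * With SSE2 it uses non-temporal stores (movnti) followed by an sfence,
 * otherwise it falls back to ordinary cached stores.
 */
void fill_nontemporal(int32_t *dst, size_t n, int32_t value)
{
    assert(dst != NULL || n == 0);
#if defined(__SSE2__)
    for (size_t i = 0; i < n; i++)
        _mm_stream_si32((int *)&dst[i], value);  /* non-temporal store */
    /* Order the weakly-ordered streaming stores before later stores. */
    _mm_sfence();
#else
    for (size_t i = 0; i < n; i++)
        dst[i] = value;                          /* normal store */
#endif
}
```

Whether the streaming path is cheaper or more expensive than the fallback is exactly the microarchitecture-dependent question raised above; the sketch only shows the shape of the code, not a performance claim.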
So if you want to test this, I'd suggest using the same program that did
the 256-byte writes (Unixbench's fstime thing), but just change the
numbers, and just try different things. But I'd _also_ suggest that if
you're going for anything more complicated (ie if you really want to
have a good argument for that 'total_size' thing), then you should try out
at least three different microarchitectures.
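A minimal sketch of such a "change the numbers" test, assuming a plain write() loop is an acceptable stand-in for fstime: the helper below is hypothetical (it is not Unixbench code), writes a fixed number of equally sized chunks to a scratch file, and returns the elapsed wall-clock time. It measures writes into the page cache only; nothing is synced to disk.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L  /* for clock_gettime() */
#endif
#include <assert.h>
#include <fcntl.h>
#include <stdlib.h>
#include <string.h>
#include <time.h>
#include <unistd.h>

/*
 * Hypothetical helper: write `count` chunks of `chunk` bytes to `path`
 * using write(), then delete the file. Returns elapsed seconds, or -1.0
 * on any error (allocation, open, or short write).
 */
double time_write_loop(const char *path, size_t chunk, long count)
{
    assert(path != NULL);

    char *buf = malloc(chunk);
    if (!buf)
        return -1.0;
    memset(buf, 0x5a, chunk);

    int fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0600);
    if (fd < 0) {
        free(buf);
        return -1.0;
    }

    struct timespec t0, t1;
    int failed = 0;

    clock_gettime(CLOCK_MONOTONIC, &t0);
    for (long i = 0; i < count; i++) {
        if (write(fd, buf, chunk) != (ssize_t)chunk) {
            failed = 1;
            break;
        }
    }
    clock_gettime(CLOCK_MONOTONIC, &t1);

    close(fd);
    unlink(path);
    free(buf);

    if (failed)
        return -1.0;
    return (double)(t1.tv_sec - t0.tv_sec) +
           (double)(t1.tv_nsec - t0.tv_nsec) / 1e9;
}
```

Calling it with chunk sizes such as 256, 4096 and a non-4K value like 4100, on each machine under test, gives one number per size to compare across microarchitectures.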
The "different" ones would be at a minimum P4, Core2 and Opteron. They
really could have very different behavior.
I suspect Core2 and Core i7 are fairly similar, but at the same time Ci7
has that L3 cache thing, so it's quite possible that movntq is actually
fundamentally different (does it bypass both L1 and L2? If so, latencies
to the L3 on Ci7 are _much_ longer than the very cheap L2 latencies on
C2).
Linus