linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Ingo Molnar <mingo@elte.hu>
To: Nick Piggin <nickpiggin@yahoo.com.au>
Cc: Linus Torvalds <torvalds@linux-foundation.org>,
	Salman Qazi <sqazi@google.com>,
	davem@davemloft.net, linux-kernel@vger.kernel.org,
	Thomas Gleixner <tglx@linutronix.de>,
	"H. Peter Anvin" <hpa@zytor.com>,
	Andi Kleen <andi@firstfloor.org>
Subject: Re: [patch] x86, mm: pass in 'total' to __copy_from_user_*nocache()
Date: Wed, 25 Feb 2009 13:01:39 +0100	[thread overview]
Message-ID: <20090225120139.GB14279@elte.hu> (raw)
In-Reply-To: <200902251959.05853.nickpiggin@yahoo.com.au>


* Nick Piggin <nickpiggin@yahoo.com.au> wrote:

> No I'm talking about this next case:
> 
> > We can do little about user-space doing stupid things as 
> > doing a big write as a series of many smaller-than-4K 
> > writes.
> 
> Not necessarily smaller than 4K writes, but even as a series 
> of 4K writes. It isn't stupid thing to do if the source memory 
> is always in cache. But if you destination is unlikely to be 
> used, then you still would want nontemporal stores.

I dont disagree that it would be nice to handle that case too, i 
just dont see how.

Unless you suggest some new logic that tracks the length of a 
continuous write to a file, and whether it got read back 
recently, i dont see how this could be done sanely.

That's the deal generally: if an app gives the kernel enough 
information in a syscall, the kernel can act on it reasonably. 

Sometimes, for important cases we allow apps to set attributes 
that function across syscalls too - like here we could extend 
madvise() to hint the kind of access ... but i doubt it would be 
used widely.

Sometimes, for _really_ important cases the kernel will also try 
to auto-detect patterns of use. We do that for readahead and we 
do that for socket buffers - and a few other things. Do you 
suggest we should do it here too?

Anyway ... i wouldnt mind if the lowlevel code used more hints 
if they are present and useful.

And unlike the 'final tail' case which indeed was quirky 
behavior and was worth fixing (hence the 'total' patch), the 
'should the kernel detect many small writes being one real big 
write' question is not a quirk but a highlevel question that the 
lowlevel copy code (nor the lowlevel pagecache code) can answer. 
So it's all a bit different.

> > The new numbers from Salman are convincing too - and his fix
> 
> I'm not exactly convinced. The boundary behaviour condition is 
> a real negative. What I question is whether that benchmark is 
> not doing something stupid. He is quoting the write(2)-only 
> portion of the benchmark, so the speedup does not come from 
> the app reading back results from cache. It comes from either 
> overwriting the same dirty cachelines (a performance critical 
> program should really avoid doing this if possible anyway); or 
> the cached stores simply pipelining better with non-store 
> operations (but in that case you probably still want 
> non-temporal stores anyway because if your workload is doing 
> any real work, you don't want to push its cache out with these 
> stores).
> 
> So, can we find something that is more realistic? Doesn't gcc 
> create several stages of temporary files?

i dont think this is really about performance critical apps, and 
i suspect the numbers will be even more convincing if a read() 
is inserted inbetween.

Lets face it, 99% of the Linux apps out there are not coded with 
'performance critical' aspects in mind.

So what we have to do is to watch out for common and still sane 
patterns of kernel usage - and optimize them, not dismiss them 
with 'this could be done even faster with XYZ'. (As long as it 
does not hurt sane usages - which i think this does not.)

	Ingo

  reply	other threads:[~2009-02-25 12:02 UTC|newest]

Thread overview: 50+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2009-02-24  2:03 Performance regression in write() syscall Salman Qazi
2009-02-24  4:10 ` Nick Piggin
2009-02-24  4:28   ` Linus Torvalds
2009-02-24  9:02     ` Nick Piggin
2009-02-24 15:52       ` Linus Torvalds
2009-02-24 16:24         ` Andi Kleen
2009-02-24 16:51         ` Ingo Molnar
2009-02-25  3:23         ` Nick Piggin
2009-02-25  7:25           ` [patch] x86, mm: pass in 'total' to __copy_from_user_*nocache() Ingo Molnar
2009-02-25  8:09             ` Nick Piggin
2009-02-25  8:29               ` Ingo Molnar
2009-02-25  8:59                 ` Nick Piggin
2009-02-25 12:01                   ` Ingo Molnar [this message]
2009-02-25 16:04             ` Linus Torvalds
2009-02-25 16:29               ` Ingo Molnar
2009-02-27 12:05               ` Nick Piggin
2009-02-28  8:29                 ` Ingo Molnar
2009-02-28 11:49                   ` Nick Piggin
2009-02-28 12:58                     ` Ingo Molnar
2009-02-28 17:16                       ` Linus Torvalds
2009-02-28 17:24                         ` Arjan van de Ven
2009-02-28 17:42                           ` Linus Torvalds
2009-02-28 17:53                             ` Arjan van de Ven
2009-02-28 18:05                             ` Andi Kleen
2009-02-28 18:27                             ` Ingo Molnar
2009-02-28 18:39                               ` Arjan van de Ven
2009-03-02 10:39                                 ` [PATCH] x86, mm: dont use non-temporal stores in pagecache accesses Ingo Molnar
2009-02-28 18:52                               ` [patch] x86, mm: pass in 'total' to __copy_from_user_*nocache() Linus Torvalds
2009-03-01 14:19                                 ` Nick Piggin
2009-03-01  0:06                             ` David Miller
2009-03-01  0:40                               ` Andi Kleen
2009-03-01  0:28                                 ` H. Peter Anvin
2009-03-01  0:38                                   ` Arjan van de Ven
2009-03-01  1:48                                     ` Andi Kleen
2009-03-01  1:38                                       ` Arjan van de Ven
2009-03-01  1:40                                         ` H. Peter Anvin
2009-03-01 14:06                                           ` Nick Piggin
2009-03-02  4:46                                             ` H. Peter Anvin
2009-03-02  6:18                                               ` Nick Piggin
2009-03-02 21:16                                             ` Linus Torvalds
2009-03-02 21:25                                               ` Ingo Molnar
2009-03-03  4:30                                                 ` Nick Piggin
2009-03-03  4:20                                               ` Nick Piggin
2009-03-03  9:02                                                 ` Ingo Molnar
2009-03-04  3:37                                                   ` Nick Piggin
2009-03-01  2:07                                         ` Andi Kleen
2009-02-24  5:43   ` Performance regression in write() syscall Salman Qazi
2009-02-24 10:09 ` Andi Kleen
2009-02-24 16:13   ` Ingo Molnar
2009-02-24 16:51     ` Andi Kleen

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20090225120139.GB14279@elte.hu \
    --to=mingo@elte.hu \
    --cc=andi@firstfloor.org \
    --cc=davem@davemloft.net \
    --cc=hpa@zytor.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=nickpiggin@yahoo.com.au \
    --cc=sqazi@google.com \
    --cc=tglx@linutronix.de \
    --cc=torvalds@linux-foundation.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).