From: "Martin J. Bligh" <Martin.Bligh@us.ibm.com>
To: Andrea Arcangeli <andrea@suse.de>
Cc: linux-kernel <linux-kernel@vger.kernel.org>
Subject: Re: Scalability problem (kmap_lock) with -aa kernels
Date: Tue, 19 Mar 2002 22:15:30 -0800	[thread overview]
Message-ID: <221408128.1016576129@[10.10.2.3]> (raw)
In-Reply-To: <20020320024008.A4268@dualathlon.random>

> If you have PIII CPUs, pre3 mainline has a bug in the machine check
> code; this one-liner will fix it:

Thanks - that'll probably save me a bunch of time ... 

May I assume, for the sake of discussion, that there's not much 
difference between pre1-aa1 and your latest stuff in the area we're 
talking about here?

> One thing I see is not only a scalability problem with the locking:
> it seems kmap_high is also spending a huge amount of time in the
> kernel compared to the "standard" profiling.

One other weird thing to note in the profiling is that kunmap_high
is way up in the profile as well - seeing as kunmap_high doesn't
do the same sort of scanning as kmap_high, it shouldn't be O(N).
I'm wondering if somehow the profiling attributes some of the spinning
cost to the calling function. Pure speculation, but there's definitely
something strange there ....
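
For reference, here's my mental model of the two paths, as a little
user-space toy paraphrased from memory of 2.4's mm/highmem.c - names
and details are approximate, and it ignores the page-already-mapped
fast path, so treat it as a sketch rather than the real code:

#include <stdio.h>

/*
 * Toy model of the persistent kmap pool.  pkmap_count[i] is the
 * refcount of virtual slot i: 0 = free and TLB-clean, 1 = unused
 * but a stale TLB entry may exist, >= 2 = in use.
 */
#define LAST_PKMAP 1024                 /* the pool size "N" */

static int pkmap_count[LAST_PKMAP];
static unsigned int last_pkmap_nr;
static unsigned long global_tlb_flushes;

static void flush_all_zero_pkmaps(void)
{
        int i;

        /* reclaim every slot whose mapping was already dropped ... */
        for (i = 0; i < LAST_PKMAP; i++)
                if (pkmap_count[i] == 1)
                        pkmap_count[i] = 0;
        /* ... at the price of one global flush_tlb_all() */
        global_tlb_flushes++;
}

/* kmap side: O(N) - scans the pool for a free slot; in the kernel
 * the whole scan runs under the single global kmap_lock. */
static int kmap_high_model(void)
{
        for (;;) {
                last_pkmap_nr = (last_pkmap_nr + 1) % LAST_PKMAP;
                if (last_pkmap_nr == 0)
                        flush_all_zero_pkmaps();    /* wrapped around */
                if (pkmap_count[last_pkmap_nr] == 0) {
                        pkmap_count[last_pkmap_nr] = 2;  /* in use */
                        return last_pkmap_nr;
                }
                /* (the real code sleeps on pkmap_map_wait if a whole
                 *  pass finds nothing; the toy just keeps scanning) */
        }
}

/* kunmap side: O(1) - drops the reference, no scan at all, which is
 * why kunmap_high showing up high in the profile looks so odd. */
static void kunmap_high_model(int nr)
{
        pkmap_count[nr]--;              /* leaves 1: stale TLB entry */
}

int main(void)
{
        int i;

        for (i = 0; i < 100000; i++)
                kunmap_high_model(kmap_high_model());
        printf("%d kmaps -> %lu global flushes (~1 per %d kmaps)\n",
               i, global_tlb_flushes, LAST_PKMAP);
        return 0;
}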

> That may be because I increased the size of the pool too much
> (the algorithm is O(N)). Can you try again with this incremental
> patch applied?

Sure ... will retest. But by shrinking the size of the pool back down, 
won't you just increase the number of global tlbflushes? Any way you 
cut it, the kmaps are going to be expensive ... according to the 
lockmeter stuff, you're doing about 3.5 times as many kmaps.
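
Just to make that trade-off concrete, a back-of-envelope sketch - the
cost numbers here are completely made up, it's only meant to show the
shape of the curve (scan cost grows with N, amortized flush cost
shrinks as 1/N):

#include <stdio.h>

int main(void)
{
        double scan_cost  = 1.0;    /* hypothetical per-entry scan cost */
        double flush_cost = 2000.0; /* hypothetical flush_tlb_all() cost */
        int n;

        /* per kmap: ~n/2 entries scanned, plus 1/n of a global flush */
        for (n = 128; n <= 4096; n *= 2)
                printf("N=%4d: ~%6.1f scan + %5.1f amortized flush\n",
                       n, scan_cost * n / 2, flush_cost / n);
        return 0;
}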

> The pte-highmem stuff has nothing to do with the kmap_high O(N)
> complexity that may be the real reason for this slowdown. (The above
> patch decreases N by an order of magnitude, so we'll see whether
> that was the real problem or not.)

I appreciate that the scanning doesn't scale well, but is the
pte-highmem stuff the cause of the increased kmap frequency? There
seems to be both a frequency and a duration problem.

> So avoiding persistent kmaps in the pte handling would in turn give
> you additional scalability __if__ your workload is very pagetable
> intensive (a kernel compile is very pagetable intensive,
> incidentally). But the very same scalability problem you can find
> with the pagetables you will also hit with the cache in different
> workloads, because all the pagecache is in highmem too, and every
> time you execute a read syscall you will also need a persistent
> kmap of a pagecache page.

I don't think anyone would deny that making kmap faster / more scalable
would be a Good Thing ;-) I haven't stared at the pagecache code too
much - once we avoid the bounce buffers with Jens' patches, do we 
still need to do a kmap for the pagecache situation you mention?
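
(For reference, the path I have in mind is roughly this - a
user-space toy paraphrased from memory of file_read_actor() in 2.4's
mm/filemap.c, names approximate. My guess is the kmap for the
copy_to_user stays even with the bounce buffers gone, since the
kernel still has to be able to address the highmem page:)

#include <stdio.h>
#include <string.h>

struct page { char data[4096]; };   /* stand-in for a highmem page */

static char *kmap(struct page *page)   { return page->data; } /* really: O(N) pool scan */
static void  kunmap(struct page *page) { (void)page; }        /* really: pkmap_count-- */

static int file_read_actor_model(char *user_buf, struct page *page,
                                 unsigned long offset, unsigned long size)
{
        char *kaddr;

        kaddr = kmap(page);                     /* persistent kmap on every read() */
        memcpy(user_buf, kaddr + offset, size); /* really: __copy_to_user() */
        kunmap(page);
        return 0;
}

int main(void)
{
        struct page page;
        char buf[16];

        memcpy(page.data, "hello, highmem!", 16);
        file_read_actor_model(buf, &page, 0, 16);
        printf("%s\n", buf);
        return 0;
}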

> The 2.5 kernel avoids using persistent kmaps for pagetables; that's
> the only interesting difference with pte-highmem in 2.4 (all the
> other differences are not interesting, and I prefer pte-highmem for
> all the other parts for sure). But note that you will still have to
> pay a hit if you want the feature, compared to the "standard" 2.4
> that you benchmarked against: in 2.5 the CPU will have to walk the
> pagetables for the kmap areas after every new kmap, because without
> persistence each kmap is forced to flush the TLB entry. The
> pagetables for the kmap atomic area are shared across all CPUs, and
> the CPU issues locked cycles to walk them.

OK, I guess we're back to the question of whether a local
flush_tlb_one per kmap is cheaper than a global flush_tlb_all once
per LAST_PKMAP kmaps. Not just in terms of time to execute, but in
terms of how much we slow down others by trashing the cache ... I
guess that's going to be tough to really measure.
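
My mental model of the atomic side, for comparison - again a
user-space toy paraphrased from memory, names approximate: each CPU
gets a few fixed slots of its own, so there's no shared pool, no
global kmap_lock and no global flush, just one local single-entry
flush per map (the cost you describe being the re-walk of the shared
kmap pagetables after each of those flushes):

#include <stdio.h>

#define NR_CPUS    4
#define KM_TYPE_NR 2                    /* e.g. KM_USER0, KM_USER1 */

static void *slot_page[NR_CPUS * KM_TYPE_NR];  /* stand-in for fixmap ptes */
static unsigned long local_tlb_flushes[NR_CPUS];

static void *kmap_atomic_model(void *page, int type, int cpu)
{
        int idx = type + KM_TYPE_NR * cpu;  /* per-cpu slot: no locking */

        slot_page[idx] = page;      /* really: set_pte(kmap_pte - idx, ...) */
        local_tlb_flushes[cpu]++;   /* really: __flush_tlb_one(vaddr) */
        return slot_page[idx];      /* really: the fixed vaddr of slot idx */
}

static void kunmap_atomic_model(int type, int cpu)
{
        slot_page[type + KM_TYPE_NR * cpu] = NULL;  /* no flush here */
}

int main(void)
{
        char page[4096];
        void *p;

        p = kmap_atomic_model(page, 0, 0);
        kunmap_atomic_model(0, 0);
        printf("cpu0 mapped %p, local single-entry flushes: %lu\n",
               p, local_tlb_flushes[0]);
        return 0;
}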

It would be nice to be able to compare the two different kmap
approaches against each other - AFAIK, the 2.5 implementation isn't
available for 2.4 to compare though ... if your stuff is easy to
change over to kmap_atomic, I'd be happy to compare it (unless
shrinking the pool size fixes it, in which case we're done).

Thanks for taking the time to explain all of this - I have a much 
better idea what's going on now. I'll get you the new numbers tomorrow.

M.



Thread overview: 15+ messages
2002-03-19  4:25 Scalability problem (kmap_lock) with -aa kernels Martin J. Bligh
2002-03-19  8:58 ` Rik van Riel
2002-03-20  1:40 ` Andrea Arcangeli
2002-03-20  6:15   ` Martin J. Bligh [this message]
2002-03-20 12:30     ` Andrea Arcangeli
2002-03-20 16:14 Martin J. Bligh
2002-03-20 16:39 ` Andrea Arcangeli
2002-03-20 17:41   ` Rik van Riel
2002-03-20 18:26     ` Andrea Arcangeli
2002-03-20 19:35       ` Rik van Riel
2002-03-20 18:16   ` Martin J. Bligh
2002-03-20 18:29     ` Martin J. Bligh
2002-03-20 18:40     ` Andrea Arcangeli
2002-03-20 18:15 ` Hugh Dickins
2002-03-20 18:56   ` Andrea Arcangeli
