* Overhead of highpte
@ 2003-07-02 22:53 Martin J. Bligh
  2003-07-02 23:15 ` William Lee Irwin III
  2003-07-04  2:34 ` Dave Hansen
  0 siblings, 2 replies; 7+ messages in thread
From: Martin J. Bligh @ 2003-07-02 22:53 UTC (permalink / raw)
  To: linux-kernel; +Cc: Andrew Morton, Bill Irwin, haveblue

Some people were saying they couldn't see an overhead with highpte.
Seems pretty obvious to me still. It should help *more* on the NUMA
box, as PTEs become node-local.

The kmap_atomic is, of course, perfectly understandable. The increase
in the rmap functions is a bit of a mystery to me.
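
(Why the kmap_atomic hit itself is understandable: on i386 every atomic
map is a per-CPU fixmap PTE write plus a one-page TLB flush. A sketch
from memory of the i386 kmap_atomic() path, not the exact 2.5.73 code:)

/*
 * Sketch only; approximates arch/i386/mm/highmem.c, not verbatim.
 * Lowmem pages short-circuit; highmem pages cost a set_pte() on a
 * per-CPU fixmap slot plus an invlpg, and kunmap_atomic() undoes it.
 */
void *kmap_atomic(struct page *page, enum km_type type)
{
	enum fixed_addresses idx;
	unsigned long vaddr;

	inc_preempt_count();
	if (!PageHighMem(page))
		return page_address(page);	/* lowmem: direct mapping */

	idx = type + KM_TYPE_NR * smp_processor_id();
	vaddr = __fix_to_virt(FIX_KMAP_BEGIN + idx);
	set_pte(kmap_pte - idx, mk_pte(page, kmap_prot));
	__flush_tlb_one(vaddr);		/* single-page TLB flush per map */

	return (void *)vaddr;
}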

M.

Kernbench: (make -j vmlinux, maximal tasks)
                              Elapsed      System        User         CPU
               2.5.73-mm3       45.38      114.91      565.81     1497.75
       2.5.73-mm3-highpte       46.54      130.41      566.84     1498.00

(note system time; the profile diff below is delta ticks, % change, function)

      1480     9.1% total
      1236    52.7% page_remove_rmap
       113    18.5% page_add_rmap
        90   150.0% kmap_atomic
        89    54.6% kmem_cache_free
        45    15.0% zap_pte_range
        37     0.0% kmap_atomic_to_page
        28    87.5% __pte_chain_free
        26   216.7% kunmap_atomic
        17    13.4% release_pages
        12    10.5% file_move
        11    42.3% filemap_nopage
        10    13.7% handle_mm_fault
...
       -10   -16.1% generic_file_open
       -10    -5.2% atomic_dec_and_lock
       -13    -2.4% __copy_to_user_ll
       -13    -3.7% find_get_page
       -13    -7.0% path_lookup
       -21    -2.6% __d_lookup
       -36   -78.3% page_address
       -49   -74.2% pte_alloc_one
      -104    -2.2% default_idle


SDET 32  (see disclaimer)
                           Throughput    Std. Dev
               2.5.73-mm3       100.0%         0.8%
       2.5.73-mm3-highpte        95.3%         0.1%


(highpte hung above 32 load).

       971     5.5% total
       399     3.9% default_idle
       329    23.1% page_remove_rmap
       124    15.6% page_add_rmap
       119    94.4% kmem_cache_free
        39   205.3% kmap_atomic
        24     9.8% release_pages
        21   131.2% __pte_chain_free
        15  1500.0% kmap_atomic_to_page
        13    76.5% __kmalloc
        11   183.3% kunmap_atomic
...
       -10   -35.7% __copy_from_user_ll
       -10    -3.9% find_get_page
       -16   -11.7% .text.lock.filemap
       -16   -45.7% page_address
       -16   -18.2% atomic_dec_and_lock
       -40   -74.1% pte_alloc_one



* Re: Overhead of highpte
  2003-07-02 22:53 Overhead of highpte Martin J. Bligh
@ 2003-07-02 23:15 ` William Lee Irwin III
  2003-07-03  0:02   ` William Lee Irwin III
  2003-07-04  2:34 ` Dave Hansen
  1 sibling, 1 reply; 7+ messages in thread
From: William Lee Irwin III @ 2003-07-02 23:15 UTC (permalink / raw)
  To: Martin J. Bligh; +Cc: linux-kernel, Andrew Morton, haveblue

On Wed, Jul 02, 2003 at 03:53:24PM -0700, Martin J. Bligh wrote:
> Some people were saying they couldn't see an overhead with highpte.
> Seems pretty obvious to me still. It should help *more* on the NUMA
> box, as PTEs become node-local.
> The kmap_atomic is, of course, perfectly understandable. The increase
> in the rmap functions is a bit of a mystery to me.

The rmap functions perform kmap_atomic() internally while traversing
pte_chains, and so take additional TLB misses and incur extra
computational expense given i386's kmap_atomic() semantics.
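
Per pte_chain entry that works out to roughly the following (a sketch of
the helper involved, not a verbatim copy of the 2.5.73 rmap code; the
names, types and the KM_PTE2 slot are from memory and simplified):

#ifdef CONFIG_HIGHPTE
/* chain entries hold the pte's physical address: map it to look at it */
static inline pte_t *rmap_ptep_map(unsigned long pte_paddr)
{
	struct page *ptepage = pfn_to_page(pte_paddr >> PAGE_SHIFT);
	char *base = kmap_atomic(ptepage, KM_PTE2);

	/* fixmap PTE write + single-page TLB flush per entry visited */
	return (pte_t *)(base + (pte_paddr & ~PAGE_MASK));
}

static inline void rmap_ptep_unmap(pte_t *pte)
{
	kunmap_atomic(pte, KM_PTE2);	/* and again on the way back out */
}
#else
/* !HIGHPTE: pte pages live in lowmem, so a direct pointer does the job */
static inline pte_t *rmap_ptep_map(unsigned long pte_paddr)
{
	return (pte_t *)__va(pte_paddr);
}

static inline void rmap_ptep_unmap(pte_t *pte)
{
}
#endif

page_add_rmap() and page_remove_rmap() do that map/unmap for every chain
entry they touch, which is where the extra system time shows up.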

The observations I made were in combination with both highpmd and an
as-yet-unmerged full 3-level pagetable caching patch.


-- wli


* Re: Overhead of highpte
  2003-07-02 23:15 ` William Lee Irwin III
@ 2003-07-03  0:02   ` William Lee Irwin III
  0 siblings, 0 replies; 7+ messages in thread
From: William Lee Irwin III @ 2003-07-03  0:02 UTC (permalink / raw)
  To: Martin J. Bligh, linux-kernel, Andrew Morton, haveblue

On Wed, Jul 02, 2003 at 03:53:24PM -0700, Martin J. Bligh wrote:
>> Some people were saying they couldn't see an overhead with highpte.
>> Seems pretty obvious to me still. It should help *more* on the NUMA
>> box, as PTEs become node-local.
>> The kmap_atomic is, of course, perfectly understandable. The increase
>> in the rmap functions is a bit of a mystery to me.

On Wed, Jul 02, 2003 at 04:15:02PM -0700, William Lee Irwin III wrote:
> The rmap functions perform kmap_atomic() internally while traversing
> pte_chains, and so take additional TLB misses and incur extra
> computational expense given i386's kmap_atomic() semantics.
> The observations I made were in combination with both highpmd and an
> as-yet-unmerged full 3-level pagetable caching patch.

Actually, those observations were made with something approaching 0.5MB
of diff against the core VM and i386 arch code, so they aren't remotely
likely to resemble any results on mainline.

Now, OTOH, my test was done on the same tree with different .config
settings, so it wasn't completely useless.


-- wli


* Re: Overhead of highpte
  2003-07-02 22:53 Overhead of highpte Martin J. Bligh
  2003-07-02 23:15 ` William Lee Irwin III
@ 2003-07-04  2:34 ` Dave Hansen
  2003-07-04  2:46   ` Martin J. Bligh
  1 sibling, 1 reply; 7+ messages in thread
From: Dave Hansen @ 2003-07-04  2:34 UTC (permalink / raw)
  To: Martin J. Bligh; +Cc: linux-kernel, Andrew Morton, William Lee Irwin III

On Wed, 2003-07-02 at 15:53, Martin J. Bligh wrote:
> Some people were saying they couldn't see an overhead with highpte.
> Seems pretty obvious to me still. It should help *more* on the NUMA
> box, as PTEs become node-local.
> 
> The kmap_atomic is, of course, perfectly understandable. The increase
> in the rmap functions is a bit of a mystery to me.
> 
> M.
> 
> Kernbench: (make -j vmlinux, maximal tasks)
>                               Elapsed      System        User         CPU
>                2.5.73-mm3       45.38      114.91      565.81     1497.75
>        2.5.73-mm3-highpte       46.54      130.41      566.84     1498.00

OK, let's add to the mystery.  Here's my run, on virtually the same
hardware, except that I don't do a bzImage.  bzImage is pretty useless
because I don't want to benchmark gzip, so I just do vmlinux.  My times
should be _faster_ than yours, right?

                   Elapsed:     User:   System:    CPU:
2.5.73-mjb2         77.008s  937.756s       90s   1334%
2.5.73-mjb2-highpte 76.756s  935.464s   93.116s   1339%

Yeah, system time goes up.  Something funky is going on.  We should have
the same machines, except that I have twice the RAM, right?  What kind
of fs are you doing your tests on?  I'm doing ramfs.

-- 
Dave Hansen
haveblue@us.ibm.com



* Re: Overhead of highpte
  2003-07-04  2:34 ` Dave Hansen
@ 2003-07-04  2:46   ` Martin J. Bligh
  2003-07-04  2:54     ` Dave Hansen
  2003-07-04  3:53     ` Overhead of highpte (or not :) Dave Hansen
  0 siblings, 2 replies; 7+ messages in thread
From: Martin J. Bligh @ 2003-07-04  2:46 UTC (permalink / raw)
  To: Dave Hansen; +Cc: linux-kernel, Andrew Morton, William Lee Irwin III

> On Wed, 2003-07-02 at 15:53, Martin J. Bligh wrote:
>> Some people were saying they couldn't see an overhead with highpte.
>> Seems pretty obvious to me still. It should help *more* on the NUMA
>> box, as PTEs become node-local.
>> 
>> The kmap_atomic is, of course, perfectly understandable. The increase
>> in the rmap functions is a bit of a mystery to me.
>> 
>> M.
>> 
>> Kernbench: (make -j vmlinux, maximal tasks)
>>                               Elapsed      System        User         CPU
>>                2.5.73-mm3       45.38      114.91      565.81     1497.75
>>        2.5.73-mm3-highpte       46.54      130.41      566.84     1498.00
> 
> OK, let's add to the mystery.  Here's my run, on virtually the same
> hardware, except that I don't do a bzImage.  bzImage is pretty useless
> because I don't want to benchmark gzip, so I just do vmlinux.  My times
> should be _faster_ than yours, right?

I do vmlinux as well.
 
>                    Elapsed:     User:   System:    CPU:
> 2.5.73-mjb2         77.008s  937.756s       90s   1334%
> 2.5.73-mjb2-highpte 76.756s  935.464s   93.116s   1339%
> 
> Yeah, system time goes up.  Something funky is going on.  We should have
> the same machines, except that I have twice the RAM, right?  What kind
> of fs are you doing your tests on?  I'm doing ramfs.

ext2.

I suspect the problem is that your gcc is such a slow piece of shit
that you're totally userspace-bound. Try 2.95 (just move the
/usr/bin/gcc symlink on Debian).

M.



* Re: Overhead of highpte
  2003-07-04  2:46   ` Martin J. Bligh
@ 2003-07-04  2:54     ` Dave Hansen
  2003-07-04  3:53     ` Overhead of highpte (or not :) Dave Hansen
  1 sibling, 0 replies; 7+ messages in thread
From: Dave Hansen @ 2003-07-04  2:54 UTC (permalink / raw)
  To: Martin J. Bligh; +Cc: linux-kernel, Andrew Morton, William Lee Irwin III

On Thu, 2003-07-03 at 19:46, Martin J. Bligh wrote:
> I suspect the problem is that your gcc is such a slow piece of shit
> that you're totally userspace-bound. Try 2.95 (just move the
> /usr/bin/gcc symlink on Debian).

Yep, you're right.  I thought I _was_ using 2.95.  Oh well.

-- 
Dave Hansen
haveblue@us.ibm.com



* Re: Overhead of highpte (or not :)
  2003-07-04  2:46   ` Martin J. Bligh
  2003-07-04  2:54     ` Dave Hansen
@ 2003-07-04  3:53     ` Dave Hansen
  1 sibling, 0 replies; 7+ messages in thread
From: Dave Hansen @ 2003-07-04  3:53 UTC (permalink / raw)
  To: Martin J. Bligh; +Cc: linux-kernel, Andrew Morton, William Lee Irwin III

After fixing my gcc stupidity:
             Elapsed:    User:   System:     CPU:
high pte:    49.962s  578.888s   99.048s  1356.2%
  lowpte:    49.630s  576.242s   90.158s  1342.0%
    ukva:    50.122s  577.164s   88.958s  1328.4%

The increase in elapsed is probably within tolerances.  The system time
isn't :)  The decrease in system time compared to lowpte comes because
the PTE pages are allocated in highmem, and are node-local.  
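
The allocation path in question looks roughly like this (a sketch along
the lines of the i386 pte_alloc_one(), not the exact 2.5.73-mjb2 code;
the point is just that __GFP_HIGHMEM lets the PTE page come out of the
allocating node's own highmem):

struct page *pte_alloc_one(struct mm_struct *mm, unsigned long address)
{
	struct page *pte;

#ifdef CONFIG_HIGHPTE
	/* highmem allowed: on these boxes that can be node-local memory */
	pte = alloc_pages(GFP_KERNEL | __GFP_HIGHMEM, 0);
#else
	/* lowmem only: every PTE page comes from wherever ZONE_NORMAL lives */
	pte = alloc_pages(GFP_KERNEL, 0);
#endif
	if (pte)
		clear_highpage(pte);	/* zero it through a temporary kmap */
	return pte;
}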

Differential profiles are up in a bit.

-- 
Dave Hansen
haveblue@us.ibm.com


