* Overhead of highpte
@ 2003-07-02 22:53 Martin J. Bligh
2003-07-02 23:15 ` William Lee Irwin III
2003-07-04 2:34 ` Dave Hansen
0 siblings, 2 replies; 7+ messages in thread
From: Martin J. Bligh @ 2003-07-02 22:53 UTC (permalink / raw)
To: linux-kernel; +Cc: Andrew Morton, Bill Irwin, haveblue
Some people were saying they couldn't see an overhead with highpte.
Seems pretty obvious to me still. It should help *more* on the NUMA
box, as PTEs become node-local.
The kmap_atomic is, of course, perfectly understandable. The increase
in the rmap functions is a bit of a mystery to me.
M.
Kernbench: (make -j vmlinux, maximal tasks)
                     Elapsed   System    User      CPU
2.5.73-mm3           45.38     114.91    565.81    1497.75
2.5.73-mm3-highpte   46.54     130.41    566.84    1498.00
(note system time)
  1480    9.1%   total
  1236   52.7%   page_remove_rmap
   113   18.5%   page_add_rmap
    90  150.0%   kmap_atomic
    89   54.6%   kmem_cache_free
    45   15.0%   zap_pte_range
    37    0.0%   kmap_atomic_to_page
    28   87.5%   __pte_chain_free
    26  216.7%   kunmap_atomic
    17   13.4%   release_pages
    12   10.5%   file_move
    11   42.3%   filemap_nopage
    10   13.7%   handle_mm_fault
  ...
   -10  -16.1%   generic_file_open
   -10   -5.2%   atomic_dec_and_lock
   -13   -2.4%   __copy_to_user_ll
   -13   -3.7%   find_get_page
   -13   -7.0%   path_lookup
   -21   -2.6%   __d_lookup
   -36  -78.3%   page_address
   -49  -74.2%   pte_alloc_one
  -104   -2.2%   default_idle
SDET 32 (see disclaimer)
                     Throughput   Std. Dev
2.5.73-mm3               100.0%       0.8%
2.5.73-mm3-highpte        95.3%       0.1%
(highpte hung above 32 load).
   971    5.5%   total
   399    3.9%   default_idle
   329   23.1%   page_remove_rmap
   124   15.6%   page_add_rmap
   119   94.4%   kmem_cache_free
    39  205.3%   kmap_atomic
    24    9.8%   release_pages
    21  131.2%   __pte_chain_free
    15 1500.0%   kmap_atomic_to_page
    13   76.5%   __kmalloc
    11  183.3%   kunmap_atomic
  ...
   -10  -35.7%   __copy_from_user_ll
   -10   -3.9%   find_get_page
   -16  -11.7%   .text.lock.filemap
   -16  -45.7%   page_address
   -16  -18.2%   atomic_dec_and_lock
   -40  -74.1%   pte_alloc_one
* Re: Overhead of highpte
2003-07-02 22:53 Overhead of highpte Martin J. Bligh
@ 2003-07-02 23:15 ` William Lee Irwin III
2003-07-03 0:02 ` William Lee Irwin III
2003-07-04 2:34 ` Dave Hansen
1 sibling, 1 reply; 7+ messages in thread
From: William Lee Irwin III @ 2003-07-02 23:15 UTC (permalink / raw)
To: Martin J. Bligh; +Cc: linux-kernel, Andrew Morton, haveblue
On Wed, Jul 02, 2003 at 03:53:24PM -0700, Martin J. Bligh wrote:
> Some people were saying they couldn't see an overhead with highpte.
> Seems pretty obvious to me still. It should help *more* on the NUMA
> box, as PTEs become node-local.
> The kmap_atomic is, of course, perfectly understandable. The increase
> in the rmap functions is a bit of a mystery to me.
The rmap functions perform kmap_atomic() internally while traversing
pte_chains and so will take various additional TLB misses and incur
various bits of computational expense with i386's kmap_atomic() semantics.
The observations I made were in combination with both highpmd and an
as-yet-unmerged full 3-level pagetable caching patch.
-- wli
* Re: Overhead of highpte
2003-07-02 23:15 ` William Lee Irwin III
@ 2003-07-03 0:02 ` William Lee Irwin III
0 siblings, 0 replies; 7+ messages in thread
From: William Lee Irwin III @ 2003-07-03 0:02 UTC (permalink / raw)
To: Martin J. Bligh, linux-kernel, Andrew Morton, haveblue
On Wed, Jul 02, 2003 at 03:53:24PM -0700, Martin J. Bligh wrote:
>> Some people were saying they couldn't see an overhead with highpte.
>> Seems pretty obvious to me still. It should help *more* on the NUMA
>> box, as PTEs become node-local.
>> The kmap_atomic is, of course, perfectly understandable. The increase
>> in the rmap functions is a bit of a mystery to me.
On Wed, Jul 02, 2003 at 04:15:02PM -0700, William Lee Irwin III wrote:
> The rmap functions perform kmap_atomic() internally while traversing
> pte_chains and so will take various additional TLB misses and incur
> various bits of computational expense with i386's kmap_atomic() semantics.
> The observations I made were in combination with both highpmd and an
> as-yet-unmerged full 3-level pagetable caching patch.
Actually, they were made with something approaching 0.5MB of diff vs.
the core VM and i386 arch code, so they aren't remotely likely to
resemble any results on mainline.
Now, OTOH, my test was done on the same tree with different .config
settings, so it wasn't completely useless.
-- wli
* Re: Overhead of highpte
2003-07-02 22:53 Overhead of highpte Martin J. Bligh
2003-07-02 23:15 ` William Lee Irwin III
@ 2003-07-04 2:34 ` Dave Hansen
2003-07-04 2:46 ` Martin J. Bligh
1 sibling, 1 reply; 7+ messages in thread
From: Dave Hansen @ 2003-07-04 2:34 UTC (permalink / raw)
To: Martin J. Bligh; +Cc: linux-kernel, Andrew Morton, William Lee Irwin III
On Wed, 2003-07-02 at 15:53, Martin J. Bligh wrote:
> Some people were saying they couldn't see an overhead with highpte.
> Seems pretty obvious to me still. It should help *more* on the NUMA
> box, as PTEs become node-local.
>
> The kmap_atomic is, of course, perfectly understandable. The increase
> in the rmap functions is a bit of a mystery to me.
>
> M.
>
> Kernbench: (make -j vmlinux, maximal tasks)
>                      Elapsed   System    User      CPU
> 2.5.73-mm3           45.38     114.91    565.81    1497.75
> 2.5.73-mm3-highpte   46.54     130.41    566.84    1498.00
OK, let's add to the mystery. Here's my run, on virtually the same
hardware, except I don't do a bzImage. bzImage is pretty useless
because I don't want to benchmark gzip, so I just do vmlinux. My times
should be _faster_ than yours, right?
                      Elapsed    User        System    CPU
2.5.73-mjb2           77.008s    937.756s    90s       1334%
2.5.73-mjb2-highpte   76.756s    935.464s    93.116s   1339%
Yeah, system time goes up. Something funky is going on. We should have
the same machines, except that I have twice the RAM, right? What kind
of fs are you doing your tests on? I'm doing ramfs.
--
Dave Hansen
haveblue@us.ibm.com
* Re: Overhead of highpte
2003-07-04 2:34 ` Dave Hansen
@ 2003-07-04 2:46 ` Martin J. Bligh
2003-07-04 2:54 ` Dave Hansen
2003-07-04 3:53 ` Overhead of highpte (or not :) Dave Hansen
0 siblings, 2 replies; 7+ messages in thread
From: Martin J. Bligh @ 2003-07-04 2:46 UTC (permalink / raw)
To: Dave Hansen; +Cc: linux-kernel, Andrew Morton, William Lee Irwin III
> On Wed, 2003-07-02 at 15:53, Martin J. Bligh wrote:
>> Some people were saying they couldn't see an overhead with highpte.
>> Seems pretty obvious to me still. It should help *more* on the NUMA
>> box, as PTEs become node-local.
>>
>> The kmap_atomic is, of course, perfectly understandable. The increase
>> in the rmap functions is a bit of a mystery to me.
>>
>> M.
>>
>> Kernbench: (make -j vmlinux, maximal tasks)
>>                      Elapsed   System    User      CPU
>> 2.5.73-mm3           45.38     114.91    565.81    1497.75
>> 2.5.73-mm3-highpte   46.54     130.41    566.84    1498.00
>
> OK, let's add to the mystery. Here's my run, on virtually the same
> hardware, except I don't do a bzImage. bzImage is pretty useless
> because I don't want to benchmark gzip, so I just do vmlinux. My times
> should be _faster_ than yours, right?
I do vmlinux as well.
>                       Elapsed    User        System    CPU
> 2.5.73-mjb2           77.008s    937.756s    90s       1334%
> 2.5.73-mjb2-highpte   76.756s    935.464s    93.116s   1339%
>
> Yeah, system time goes up. Something funky is going on. We should have
> the same machines, except that I have twice the RAM, right? What kind
> of fs are you doing your tests on? I'm doing ramfs.
ext2.
I suspect the problem is that your gcc is such a slow piece of shit,
you're totally userspace bound. Try 2.95 (just move the /usr/bin/gcc
symlink on debian).
M.
* Re: Overhead of highpte
2003-07-04 2:46 ` Martin J. Bligh
@ 2003-07-04 2:54 ` Dave Hansen
2003-07-04 3:53 ` Overhead of highpte (or not :) Dave Hansen
1 sibling, 0 replies; 7+ messages in thread
From: Dave Hansen @ 2003-07-04 2:54 UTC (permalink / raw)
To: Martin J. Bligh; +Cc: linux-kernel, Andrew Morton, William Lee Irwin III
On Thu, 2003-07-03 at 19:46, Martin J. Bligh wrote:
> I suspect the problem is that your gcc is such a slow piece of shit,
> you're totally userspace bound. Try 2.95 (just move the /usr/bin/gcc
> symlink on debian).
Yep, you're right. I thought I _was_ using 2.95. Oh well.
--
Dave Hansen
haveblue@us.ibm.com
* Re: Overhead of highpte (or not :)
2003-07-04 2:46 ` Martin J. Bligh
2003-07-04 2:54 ` Dave Hansen
@ 2003-07-04 3:53 ` Dave Hansen
1 sibling, 0 replies; 7+ messages in thread
From: Dave Hansen @ 2003-07-04 3:53 UTC (permalink / raw)
To: Martin J. Bligh; +Cc: linux-kernel, Andrew Morton, William Lee Irwin III
After fixing my gcc stupidity:
          Elapsed    User        System     CPU
highpte   49.962s    578.888s    99.048s    1356.2%
lowpte    49.630s    576.242s    90.158s    1342.0%
ukva      50.122s    577.164s    88.958s    1328.4%
The increase in elapsed time is probably within tolerances. The system
time isn't :) The decrease in ukva's system time relative to lowpte
comes because the PTE pages are allocated in highmem, and so are
node-local.
Differential profiles are up in a bit.
--
Dave Hansen
haveblue@us.ibm.com