* [PATCH -mm 0/2] speed up arch_get_unmapped_area
@ 2012-02-23 19:54 Rik van Riel
  2012-02-23 19:56 ` [PATCH -mm 1/2] mm: fix quadratic behaviour in get_unmapped_area_topdown Rik van Riel
  2012-02-23 20:00 ` [PATCH -mm 2/2] mm: do not reset mm->free_area_cache on every single munmap Rik van Riel
  0 siblings, 2 replies; 12+ messages in thread
From: Rik van Riel @ 2012-02-23 19:54 UTC (permalink / raw)
  To: linux-mm
  Cc: linux-kernel, akpm, Mel Gorman, Johannes Weiner, KOSAKI Motohiro,
	Andrea Arcangeli, hughd

Many years ago, we introduced a limit on the number of VMAs per
process and set that limit to 64k, because there are processes
that end up using tens of thousands of VMAs.

Unfortunately, arch_get_unmapped_area and 
arch_get_unmapped_area_topdown have serious scalability issues
when a process has thousands of VMAs.

Luckily, it turns out those are fairly easy to fix.

I have torture tested the arch_get_unmapped_area code with a
little program that does tens of thousands of anonymous mmaps,
followed by a bunch of munmaps, followed by more mmaps, in a loop.
The program measures the time each mmap call takes. I have run
the program in both 64 and 32 bit mode; performance between
them is indistinguishable.
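
For reference, here is a minimal sketch of such a torture test. This
is not the actual agua_frag_test_64 program; the VMA count, chunk
size and map/unmap pattern are illustrative assumptions:

	/*
	 * Sketch of a VMA fragmentation torture test: build tens of
	 * thousands of VMAs, punch holes between them, then time new
	 * mmap calls into the fragmented address space.
	 */
	#include <stdio.h>
	#include <sys/mman.h>
	#include <time.h>

	#define NMAPS	40000
	#define CHUNK	(64 * 1024)

	static long now_ms(void)
	{
		struct timespec ts;

		clock_gettime(CLOCK_MONOTONIC, &ts);
		return ts.tv_sec * 1000 + ts.tv_nsec / 1000000;
	}

	int main(void)
	{
		static void *map[NMAPS];
		long t0, t1, worst = 0, total = 0;
		int i;

		/* Alternate protections so adjacent anonymous mappings
		 * do not get merged into a single VMA. */
		for (i = 0; i < NMAPS; i++)
			map[i] = mmap(NULL, CHUNK,
				      (i & 1) ? PROT_READ
					      : (PROT_READ | PROT_WRITE),
				      MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);

		/* Unmap every other area, fragmenting the address space. */
		for (i = 0; i < NMAPS; i += 2)
			munmap(map[i], CHUNK);

		/* Time new mmaps into the fragmented space. */
		for (i = 0; i < NMAPS; i += 2) {
			t0 = now_ms();
			map[i] = mmap(NULL, CHUNK, PROT_READ | PROT_WRITE,
				      MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
			t1 = now_ms();
			total += t1 - t0;
			if (t1 - t0 > worst)
				worst = t1 - t0;
		}
		printf("Avg. Time (ms): %.4f\n",
		       (double)total / (NMAPS / 2));
		printf("Max Time (ms): %ld\n", worst);
		return 0;
	}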

Without my patches, the average time for mmap is 242 milliseconds,
with the maximum being close to half a second.  The number of VMAs
in the process seems to vary between about 35k and 60k.

$ ./agua_frag_test_64 
..........

Min Time (ms): 4
Avg. Time (ms): 242.0000
Max Time (ms): 454
Std Dev (ms): 91.5856
Standard deviation exceeds 10

With my patches, the average time for mmap is 8 milliseconds, with
the maximum up to about 20 milliseconds in many test runs. The number
of VMAs in the process seems to vary between about 40k and 70k.

$ ./agua_frag_test_64 
..........

Min Time (ms): 5
Avg. Time (ms): 8.0000
Max Time (ms): 15
Std Dev (ms): 1.3715
All checks pass

In short, my patches introduce a little extra space overhead (about 1/8th
more virtual address space), but reduce the amount of CPU time taken by
mmap in this test case by about a factor of 30.

-- 
All Rights Reversed


* [PATCH -mm 1/2] mm: fix quadratic behaviour in get_unmapped_area_topdown
  2012-02-23 19:54 [PATCH -mm 0/2] speed up arch_get_unmapped_area Rik van Riel
@ 2012-02-23 19:56 ` Rik van Riel
  2012-02-23 21:50   ` Andrew Morton
  2012-02-23 21:57   ` Johannes Weiner
  2012-02-23 20:00 ` [PATCH -mm 2/2] mm: do not reset mm->free_area_cache on every single munmap Rik van Riel
  1 sibling, 2 replies; 12+ messages in thread
From: Rik van Riel @ 2012-02-23 19:56 UTC (permalink / raw)
  To: linux-mm
  Cc: linux-kernel, akpm, Mel Gorman, Johannes Weiner, KOSAKI Motohiro,
	Andrea Arcangeli, hughd

When we look for a VMA smaller than the cached_hole_size, we set the
starting search address to mm->mmap_base, to try to find our hole.

However, even in the case where we fall through and find nothing at
mm->free_area_cache, we still reset the search address to mm->mmap_base.
This bug results in quadratic behaviour, with observed mmap times of 0.4
seconds for processes that have very fragmented memory.

If there is no hole small enough to fit the VMA, and no good spot
right at mm->free_area_cache, we are much better off continuing the
search down from mm->free_area_cache, instead of all the way from
the top.
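
For clarity, this is the simplified control flow of the generic
mm/mmap.c arch_get_unmapped_area_topdown() with this patch applied;
the cached_hole_size check, alignment handling and the bottomup
fallback are omitted from the sketch:

	addr = mm->free_area_cache;

	/* fast path: does the request fit just below the cache? */
	if (addr > len) {
		vma = find_vma(mm, addr - len);
		if (!vma || addr <= vma->vm_start)
			return (mm->free_area_cache = addr - len);
	}

	if (mm->mmap_base < len)
		goto bottomup;

	/* addr = mm->mmap_base - len;   <- removed: this restarted
	 *                                  every search from the top */
	do {
		vma = find_vma(mm, addr);
		if (!vma || addr + len <= vma->vm_start)
			return (mm->free_area_cache = addr);
		/* try just below the current vma->vm_start */
		addr = vma->vm_start - len;
	} while (len < vma->vm_start);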

Signed-off-by: Rik van Riel <riel@redhat.com>
---
Tested on x86-64; the other architectures have the exact same bug cut'n'pasted.

 arch/sh/mm/mmap.c            |    1 -
 arch/sparc/mm/hugetlbpage.c  |    2 --
 arch/x86/kernel/sys_x86_64.c |    2 --
 mm/mmap.c                    |    2 --
 4 files changed, 0 insertions(+), 7 deletions(-)

diff --git a/arch/sh/mm/mmap.c b/arch/sh/mm/mmap.c
index afeb710..fba1b32 100644
--- a/arch/sh/mm/mmap.c
+++ b/arch/sh/mm/mmap.c
@@ -188,7 +188,6 @@ arch_get_unmapped_area_topdown(struct file *filp, const unsigned long addr0,
 	if (unlikely(mm->mmap_base < len))
 		goto bottomup;
 
-	addr = mm->mmap_base-len;
 	if (do_colour_align)
 		addr = COLOUR_ALIGN_DOWN(addr, pgoff);
 
diff --git a/arch/sparc/mm/hugetlbpage.c b/arch/sparc/mm/hugetlbpage.c
index 07e1453..603a01d 100644
--- a/arch/sparc/mm/hugetlbpage.c
+++ b/arch/sparc/mm/hugetlbpage.c
@@ -115,8 +115,6 @@ hugetlb_get_unmapped_area_topdown(struct file *filp, const unsigned long addr0,
 	if (unlikely(mm->mmap_base < len))
 		goto bottomup;
 
-	addr = (mm->mmap_base-len) & HPAGE_MASK;
-
 	do {
 		/*
 		 * Lookup failure means no vma is above this address,
diff --git a/arch/x86/kernel/sys_x86_64.c b/arch/x86/kernel/sys_x86_64.c
index 0514890..1a3fa81 100644
--- a/arch/x86/kernel/sys_x86_64.c
+++ b/arch/x86/kernel/sys_x86_64.c
@@ -240,8 +240,6 @@ arch_get_unmapped_area_topdown(struct file *filp, const unsigned long addr0,
 	if (mm->mmap_base < len)
 		goto bottomup;
 
-	addr = mm->mmap_base-len;
-
 	do {
 		addr = align_addr(addr, filp, ALIGN_TOPDOWN);
 
diff --git a/mm/mmap.c b/mm/mmap.c
index 3f758c7..5eafe26 100644
--- a/mm/mmap.c
+++ b/mm/mmap.c
@@ -1479,8 +1479,6 @@ arch_get_unmapped_area_topdown(struct file *filp, const unsigned long addr0,
 	if (mm->mmap_base < len)
 		goto bottomup;
 
-	addr = mm->mmap_base-len;
-
 	do {
 		/*
 		 * Lookup failure means no vma is above this address,



* [PATCH -mm 2/2] mm: do not reset mm->free_area_cache on every single munmap
  2012-02-23 19:54 [PATCH -mm 0/2] speed up arch_get_unmapped_area Rik van Riel
  2012-02-23 19:56 ` [PATCH -mm 1/2] mm: fix quadratic behaviour in get_unmapped_area_topdown Rik van Riel
@ 2012-02-23 20:00 ` Rik van Riel
  2012-02-23 21:56   ` Andrew Morton
  2012-02-23 21:57   ` Andi Kleen
  1 sibling, 2 replies; 12+ messages in thread
From: Rik van Riel @ 2012-02-23 20:00 UTC (permalink / raw)
  To: linux-mm
  Cc: linux-kernel, akpm, Mel Gorman, Johannes Weiner, KOSAKI Motohiro,
	Andrea Arcangeli, hughd

Some programs have a large number of VMAs, and make frequent calls
to mmap and munmap. Having munmap constantly reset the search
pointer for get_unmapped_area can cause a significant slowdown
for such programs.

Likewise, starting all the way from the top any time we mmap a small
VMA can greatly increase the amount of time spent in
arch_get_unmapped_area_topdown.

For programs with many VMAs, a next-fit algorithm would be fastest;
however, that could waste a lot of virtual address space, and
potentially page table memory.

A compromise is to reset the search pointer for get_unmapped_area
after we have unmapped 1/8th of the normal memory in a process. For
a process with 1000 similar sized VMAs, that means the search pointer
will only be reset once every 125 or so munmaps.  The cost is that
the program may use about 1/8th more virtual space for these VMAs,
and up to 1/8th more page tables.

We do not count special mappings, since there are programs that
use a large fraction of their address space mapping device memory,
etc.
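
Condensed from the mm/mmap.c hunks below, the heuristic boils down
to this (an excerpt, not a standalone function): remove_vma_list()
accounts every unmapped page, and the arch_unmap_area helpers only
reset the search pointer once the threshold is crossed:

	/* in remove_vma_list(): */
	mm->total_vm -= nrpages;
	mm->freed_area += nrpages;

	/* in arch_unmap_area_topdown(): */
	if (mm->freed_area > (mm->total_vm - mm->reserved_vm) / 8) {
		mm->free_area_cache = mm->mmap_base;
		mm->freed_area = 0;
	}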

The benefit is that things scale a lot better, and we remove about
200 lines of code.

Signed-off-by: Rik van Riel <riel@redhat.com>
---
Tested on x86-64; the other architectures have the same bug cut'n'pasted.

 arch/arm/mm/mmap.c               |   23 +------------------
 arch/mips/mm/mmap.c              |   16 -------------
 arch/powerpc/mm/slice.c          |   26 ++-------------------
 arch/sh/mm/mmap.c                |   23 +------------------
 arch/sparc/kernel/sys_sparc_64.c |   23 +------------------
 arch/sparc/mm/hugetlbpage.c      |   23 +------------------
 arch/tile/mm/hugetlbpage.c       |   27 +---------------------
 arch/x86/ia32/ia32_aout.c        |    2 +-
 arch/x86/kernel/sys_x86_64.c     |   22 +----------------
 arch/x86/mm/hugetlbpage.c        |   28 +----------------------
 fs/binfmt_aout.c                 |    2 +-
 fs/binfmt_elf.c                  |    2 +-
 include/linux/mm_types.h         |    2 +-
 kernel/fork.c                    |    4 +-
 mm/mmap.c                        |   45 +++++++++++---------------------------
 15 files changed, 32 insertions(+), 236 deletions(-)

diff --git a/arch/arm/mm/mmap.c b/arch/arm/mm/mmap.c
index ce8cb19..e435e59 100644
--- a/arch/arm/mm/mmap.c
+++ b/arch/arm/mm/mmap.c
@@ -104,12 +104,7 @@ arch_get_unmapped_area(struct file *filp, unsigned long addr,
 		    (!vma || addr + len <= vma->vm_start))
 			return addr;
 	}
-	if (len > mm->cached_hole_size) {
-	        start_addr = addr = mm->free_area_cache;
-	} else {
-	        start_addr = addr = mm->mmap_base;
-	        mm->cached_hole_size = 0;
-	}
+	start_addr = addr = mm->free_area_cache;
 
 full_search:
 	if (do_align)
@@ -126,7 +121,6 @@ full_search:
 			 */
 			if (start_addr != TASK_UNMAPPED_BASE) {
 				start_addr = addr = TASK_UNMAPPED_BASE;
-				mm->cached_hole_size = 0;
 				goto full_search;
 			}
 			return -ENOMEM;
@@ -138,8 +132,6 @@ full_search:
 			mm->free_area_cache = addr + len;
 			return addr;
 		}
-		if (addr + mm->cached_hole_size < vma->vm_start)
-		        mm->cached_hole_size = vma->vm_start - addr;
 		addr = vma->vm_end;
 		if (do_align)
 			addr = COLOUR_ALIGN(addr, pgoff);
@@ -187,13 +179,6 @@ arch_get_unmapped_area_topdown(struct file *filp, const unsigned long addr0,
 			return addr;
 	}
 
-	/* check if free_area_cache is useful for us */
-	if (len <= mm->cached_hole_size) {
-		mm->cached_hole_size = 0;
-		mm->free_area_cache = mm->mmap_base;
-	}
-
-	/* either no address requested or can't fit in requested address hole */
 	addr = mm->free_area_cache;
 	if (do_align) {
 		unsigned long base = COLOUR_ALIGN_DOWN(addr - len, pgoff);
@@ -226,10 +211,6 @@ arch_get_unmapped_area_topdown(struct file *filp, const unsigned long addr0,
 			/* remember the address as a hint for next time */
 			return (mm->free_area_cache = addr);
 
-		/* remember the largest hole we saw so far */
-		if (addr + mm->cached_hole_size < vma->vm_start)
-			mm->cached_hole_size = vma->vm_start - addr;
-
 		/* try just below the current vma->vm_start */
 		addr = vma->vm_start - len;
 		if (do_align)
@@ -243,14 +224,12 @@ bottomup:
 	 * can happen with large stack limits and large mmap()
 	 * allocations.
 	 */
-	mm->cached_hole_size = ~0UL;
 	mm->free_area_cache = TASK_UNMAPPED_BASE;
 	addr = arch_get_unmapped_area(filp, addr0, len, pgoff, flags);
 	/*
 	 * Restore the topdown base:
 	 */
 	mm->free_area_cache = mm->mmap_base;
-	mm->cached_hole_size = ~0UL;
 
 	return addr;
 }
diff --git a/arch/mips/mm/mmap.c b/arch/mips/mm/mmap.c
index 302d779..eb00860 100644
--- a/arch/mips/mm/mmap.c
+++ b/arch/mips/mm/mmap.c
@@ -125,16 +125,6 @@ static unsigned long arch_get_unmapped_area_common(struct file *filp,
 				addr = COLOUR_ALIGN(addr, pgoff);
 		 }
 	 } else {
-		/* check if free_area_cache is useful for us */
-		if (len <= mm->cached_hole_size) {
-			mm->cached_hole_size = 0;
-			mm->free_area_cache = mm->mmap_base;
-		}
-
-		/*
-		 * either no address requested, or the mapping can't fit into
-		 * the requested address hole
-		 */
 		addr = mm->free_area_cache;
 		if (do_color_align) {
 			unsigned long base =
@@ -170,10 +160,6 @@ static unsigned long arch_get_unmapped_area_common(struct file *filp,
 				return mm->free_area_cache = addr;
 			}
 
-			/* remember the largest hole we saw so far */
-			if (addr + mm->cached_hole_size < vma->vm_start)
-				mm->cached_hole_size = vma->vm_start - addr;
-
 			/* try just below the current vma->vm_start */
 			addr = vma->vm_start - len;
 			if (do_color_align)
@@ -187,14 +173,12 @@ bottomup:
 		 * can happen with large stack limits and large mmap()
 		 * allocations.
 		 */
-		mm->cached_hole_size = ~0UL;
 		mm->free_area_cache = TASK_UNMAPPED_BASE;
 		addr = arch_get_unmapped_area(filp, addr0, len, pgoff, flags);
 		/*
 		 * Restore the topdown base:
 		 */
 		mm->free_area_cache = mm->mmap_base;
-		mm->cached_hole_size = ~0UL;
 
 		return addr;
 	}
diff --git a/arch/powerpc/mm/slice.c b/arch/powerpc/mm/slice.c
index 73709f7..6435c53 100644
--- a/arch/powerpc/mm/slice.c
+++ b/arch/powerpc/mm/slice.c
@@ -231,13 +231,9 @@ static unsigned long slice_find_area_bottomup(struct mm_struct *mm,
 	struct slice_mask mask;
 	int pshift = max_t(int, mmu_psize_defs[psize].shift, PAGE_SHIFT);
 
-	if (use_cache) {
-		if (len <= mm->cached_hole_size) {
-			start_addr = addr = TASK_UNMAPPED_BASE;
-			mm->cached_hole_size = 0;
-		} else
-			start_addr = addr = mm->free_area_cache;
-	} else
+	if (use_cache)
+		start_addr = addr = mm->free_area_cache;
+	else
 		start_addr = addr = TASK_UNMAPPED_BASE;
 
 full_search:
@@ -264,15 +260,12 @@ full_search:
 				mm->free_area_cache = addr + len;
 			return addr;
 		}
-		if (use_cache && (addr + mm->cached_hole_size) < vma->vm_start)
-		        mm->cached_hole_size = vma->vm_start - addr;
 		addr = vma->vm_end;
 	}
 
 	/* Make sure we didn't miss any holes */
 	if (use_cache && start_addr != TASK_UNMAPPED_BASE) {
 		start_addr = addr = TASK_UNMAPPED_BASE;
-		mm->cached_hole_size = 0;
 		goto full_search;
 	}
 	return -ENOMEM;
@@ -290,14 +283,6 @@ static unsigned long slice_find_area_topdown(struct mm_struct *mm,
 
 	/* check if free_area_cache is useful for us */
 	if (use_cache) {
-		if (len <= mm->cached_hole_size) {
-			mm->cached_hole_size = 0;
-			mm->free_area_cache = mm->mmap_base;
-		}
-
-		/* either no address requested or can't fit in requested
-		 * address hole
-		 */
 		addr = mm->free_area_cache;
 
 		/* make sure it can fit in the remaining address space */
@@ -343,10 +328,6 @@ static unsigned long slice_find_area_topdown(struct mm_struct *mm,
 			return addr;
 		}
 
-		/* remember the largest hole we saw so far */
-		if (use_cache && (addr + mm->cached_hole_size) < vma->vm_start)
-		        mm->cached_hole_size = vma->vm_start - addr;
-
 		/* try just below the current vma->vm_start */
 		addr = vma->vm_start;
 	}
@@ -364,7 +345,6 @@ static unsigned long slice_find_area_topdown(struct mm_struct *mm,
 	 */
 	if (use_cache) {
 		mm->free_area_cache = mm->mmap_base;
-		mm->cached_hole_size = ~0UL;
 	}
 
 	return addr;
diff --git a/arch/sh/mm/mmap.c b/arch/sh/mm/mmap.c
index fba1b32..a6e373a 100644
--- a/arch/sh/mm/mmap.c
+++ b/arch/sh/mm/mmap.c
@@ -79,12 +79,7 @@ unsigned long arch_get_unmapped_area(struct file *filp, unsigned long addr,
 			return addr;
 	}
 
-	if (len > mm->cached_hole_size) {
-		start_addr = addr = mm->free_area_cache;
-	} else {
-	        mm->cached_hole_size = 0;
-		start_addr = addr = TASK_UNMAPPED_BASE;
-	}
+	start_addr = addr = mm->free_area_cache;
 
 full_search:
 	if (do_colour_align)
@@ -101,7 +96,6 @@ full_search:
 			 */
 			if (start_addr != TASK_UNMAPPED_BASE) {
 				start_addr = addr = TASK_UNMAPPED_BASE;
-				mm->cached_hole_size = 0;
 				goto full_search;
 			}
 			return -ENOMEM;
@@ -113,8 +107,6 @@ full_search:
 			mm->free_area_cache = addr + len;
 			return addr;
 		}
-		if (addr + mm->cached_hole_size < vma->vm_start)
-		        mm->cached_hole_size = vma->vm_start - addr;
 
 		addr = vma->vm_end;
 		if (do_colour_align)
@@ -162,13 +154,6 @@ arch_get_unmapped_area_topdown(struct file *filp, const unsigned long addr0,
 			return addr;
 	}
 
-	/* check if free_area_cache is useful for us */
-	if (len <= mm->cached_hole_size) {
-	        mm->cached_hole_size = 0;
-		mm->free_area_cache = mm->mmap_base;
-	}
-
-	/* either no address requested or can't fit in requested address hole */
 	addr = mm->free_area_cache;
 	if (do_colour_align) {
 		unsigned long base = COLOUR_ALIGN_DOWN(addr-len, pgoff);
@@ -203,10 +188,6 @@ arch_get_unmapped_area_topdown(struct file *filp, const unsigned long addr0,
 			return (mm->free_area_cache = addr);
 		}
 
-		/* remember the largest hole we saw so far */
-		if (addr + mm->cached_hole_size < vma->vm_start)
-		        mm->cached_hole_size = vma->vm_start - addr;
-
 		/* try just below the current vma->vm_start */
 		addr = vma->vm_start-len;
 		if (do_colour_align)
@@ -220,14 +201,12 @@ bottomup:
 	 * can happen with large stack limits and large mmap()
 	 * allocations.
 	 */
-	mm->cached_hole_size = ~0UL;
 	mm->free_area_cache = TASK_UNMAPPED_BASE;
 	addr = arch_get_unmapped_area(filp, addr0, len, pgoff, flags);
 	/*
 	 * Restore the topdown base:
 	 */
 	mm->free_area_cache = mm->mmap_base;
-	mm->cached_hole_size = ~0UL;
 
 	return addr;
 }
diff --git a/arch/sparc/kernel/sys_sparc_64.c b/arch/sparc/kernel/sys_sparc_64.c
index 232df99..edd5657 100644
--- a/arch/sparc/kernel/sys_sparc_64.c
+++ b/arch/sparc/kernel/sys_sparc_64.c
@@ -151,12 +151,7 @@ unsigned long arch_get_unmapped_area(struct file *filp, unsigned long addr, unsi
 			return addr;
 	}
 
-	if (len > mm->cached_hole_size) {
-	        start_addr = addr = mm->free_area_cache;
-	} else {
-	        start_addr = addr = TASK_UNMAPPED_BASE;
-	        mm->cached_hole_size = 0;
-	}
+	start_addr = addr = mm->free_area_cache;
 
 	task_size -= len;
 
@@ -176,7 +171,6 @@ full_search:
 		if (unlikely(task_size < addr)) {
 			if (start_addr != TASK_UNMAPPED_BASE) {
 				start_addr = addr = TASK_UNMAPPED_BASE;
-				mm->cached_hole_size = 0;
 				goto full_search;
 			}
 			return -ENOMEM;
@@ -188,8 +182,6 @@ full_search:
 			mm->free_area_cache = addr + len;
 			return addr;
 		}
-		if (addr + mm->cached_hole_size < vma->vm_start)
-		        mm->cached_hole_size = vma->vm_start - addr;
 
 		addr = vma->vm_end;
 		if (do_color_align)
@@ -241,13 +233,6 @@ arch_get_unmapped_area_topdown(struct file *filp, const unsigned long addr0,
 			return addr;
 	}
 
-	/* check if free_area_cache is useful for us */
-	if (len <= mm->cached_hole_size) {
- 	        mm->cached_hole_size = 0;
- 		mm->free_area_cache = mm->mmap_base;
- 	}
-
-	/* either no address requested or can't fit in requested address hole */
 	addr = mm->free_area_cache;
 	if (do_color_align) {
 		unsigned long base = COLOUR_ALIGN_DOWN(addr-len, pgoff);
@@ -283,10 +268,6 @@ arch_get_unmapped_area_topdown(struct file *filp, const unsigned long addr0,
 			return (mm->free_area_cache = addr);
 		}
 
- 		/* remember the largest hole we saw so far */
- 		if (addr + mm->cached_hole_size < vma->vm_start)
- 		        mm->cached_hole_size = vma->vm_start - addr;
-
 		/* try just below the current vma->vm_start */
 		addr = vma->vm_start-len;
 		if (do_color_align)
@@ -300,14 +281,12 @@ bottomup:
 	 * can happen with large stack limits and large mmap()
 	 * allocations.
 	 */
-	mm->cached_hole_size = ~0UL;
   	mm->free_area_cache = TASK_UNMAPPED_BASE;
 	addr = arch_get_unmapped_area(filp, addr0, len, pgoff, flags);
 	/*
 	 * Restore the topdown base:
 	 */
 	mm->free_area_cache = mm->mmap_base;
-	mm->cached_hole_size = ~0UL;
 
 	return addr;
 }
diff --git a/arch/sparc/mm/hugetlbpage.c b/arch/sparc/mm/hugetlbpage.c
index 603a01d..ecc0703 100644
--- a/arch/sparc/mm/hugetlbpage.c
+++ b/arch/sparc/mm/hugetlbpage.c
@@ -40,12 +40,7 @@ static unsigned long hugetlb_get_unmapped_area_bottomup(struct file *filp,
 	if (unlikely(len >= VA_EXCLUDE_START))
 		return -ENOMEM;
 
-	if (len > mm->cached_hole_size) {
-	        start_addr = addr = mm->free_area_cache;
-	} else {
-	        start_addr = addr = TASK_UNMAPPED_BASE;
-	        mm->cached_hole_size = 0;
-	}
+	start_addr = addr = mm->free_area_cache;
 
 	task_size -= len;
 
@@ -62,7 +57,6 @@ full_search:
 		if (unlikely(task_size < addr)) {
 			if (start_addr != TASK_UNMAPPED_BASE) {
 				start_addr = addr = TASK_UNMAPPED_BASE;
-				mm->cached_hole_size = 0;
 				goto full_search;
 			}
 			return -ENOMEM;
@@ -74,8 +68,6 @@ full_search:
 			mm->free_area_cache = addr + len;
 			return addr;
 		}
-		if (addr + mm->cached_hole_size < vma->vm_start)
-		        mm->cached_hole_size = vma->vm_start - addr;
 
 		addr = ALIGN(vma->vm_end, HPAGE_SIZE);
 	}
@@ -94,13 +86,6 @@ hugetlb_get_unmapped_area_topdown(struct file *filp, const unsigned long addr0,
 	/* This should only ever run for 32-bit processes.  */
 	BUG_ON(!test_thread_flag(TIF_32BIT));
 
-	/* check if free_area_cache is useful for us */
-	if (len <= mm->cached_hole_size) {
- 	        mm->cached_hole_size = 0;
- 		mm->free_area_cache = mm->mmap_base;
- 	}
-
-	/* either no address requested or can't fit in requested address hole */
 	addr = mm->free_area_cache & HPAGE_MASK;
 
 	/* make sure it can fit in the remaining address space */
@@ -127,10 +112,6 @@ hugetlb_get_unmapped_area_topdown(struct file *filp, const unsigned long addr0,
 			return (mm->free_area_cache = addr);
 		}
 
- 		/* remember the largest hole we saw so far */
- 		if (addr + mm->cached_hole_size < vma->vm_start)
- 		        mm->cached_hole_size = vma->vm_start - addr;
-
 		/* try just below the current vma->vm_start */
 		addr = (vma->vm_start-len) & HPAGE_MASK;
 	} while (likely(len < vma->vm_start));
@@ -142,14 +123,12 @@ bottomup:
 	 * can happen with large stack limits and large mmap()
 	 * allocations.
 	 */
-	mm->cached_hole_size = ~0UL;
   	mm->free_area_cache = TASK_UNMAPPED_BASE;
 	addr = arch_get_unmapped_area(filp, addr0, len, pgoff, flags);
 	/*
 	 * Restore the topdown base:
 	 */
 	mm->free_area_cache = mm->mmap_base;
-	mm->cached_hole_size = ~0UL;
 
 	return addr;
 }
diff --git a/arch/tile/mm/hugetlbpage.c b/arch/tile/mm/hugetlbpage.c
index 42cfcba..5e05e49 100644
--- a/arch/tile/mm/hugetlbpage.c
+++ b/arch/tile/mm/hugetlbpage.c
@@ -161,12 +161,7 @@ static unsigned long hugetlb_get_unmapped_area_bottomup(struct file *file,
 	struct vm_area_struct *vma;
 	unsigned long start_addr;
 
-	if (len > mm->cached_hole_size) {
-		start_addr = mm->free_area_cache;
-	} else {
-		start_addr = TASK_UNMAPPED_BASE;
-		mm->cached_hole_size = 0;
-	}
+	start_addr = mm->free_area_cache;
 
 full_search:
 	addr = ALIGN(start_addr, huge_page_size(h));
@@ -180,7 +175,6 @@ full_search:
 			 */
 			if (start_addr != TASK_UNMAPPED_BASE) {
 				start_addr = TASK_UNMAPPED_BASE;
-				mm->cached_hole_size = 0;
 				goto full_search;
 			}
 			return -ENOMEM;
@@ -189,8 +183,6 @@ full_search:
 			mm->free_area_cache = addr + len;
 			return addr;
 		}
-		if (addr + mm->cached_hole_size < vma->vm_start)
-			mm->cached_hole_size = vma->vm_start - addr;
 		addr = ALIGN(vma->vm_end, huge_page_size(h));
 	}
 }
@@ -203,17 +195,12 @@ static unsigned long hugetlb_get_unmapped_area_topdown(struct file *file,
 	struct mm_struct *mm = current->mm;
 	struct vm_area_struct *vma, *prev_vma;
 	unsigned long base = mm->mmap_base, addr = addr0;
-	unsigned long largest_hole = mm->cached_hole_size;
 	int first_time = 1;
 
 	/* don't allow allocations above current base */
 	if (mm->free_area_cache > base)
 		mm->free_area_cache = base;
 
-	if (len <= largest_hole) {
-		largest_hole = 0;
-		mm->free_area_cache  = base;
-	}
 try_again:
 	/* make sure it can fit in the remaining address space */
 	if (mm->free_area_cache < len)
@@ -239,21 +226,14 @@ try_again:
 		if (addr + len <= vma->vm_start &&
 			    (!prev_vma || (addr >= prev_vma->vm_end))) {
 			/* remember the address as a hint for next time */
-			mm->cached_hole_size = largest_hole;
 			mm->free_area_cache = addr;
 			return addr;
 		} else {
 			/* pull free_area_cache down to the first hole */
-			if (mm->free_area_cache == vma->vm_end) {
+			if (mm->free_area_cache == vma->vm_end)
 				mm->free_area_cache = vma->vm_start;
-				mm->cached_hole_size = largest_hole;
-			}
 		}
 
-		/* remember the largest hole we saw so far */
-		if (addr + largest_hole < vma->vm_start)
-			largest_hole = vma->vm_start - addr;
-
 		/* try just below the current vma->vm_start */
 		addr = (vma->vm_start - len) & huge_page_mask(h);
 
@@ -266,7 +246,6 @@ fail:
 	 */
 	if (first_time) {
 		mm->free_area_cache = base;
-		largest_hole = 0;
 		first_time = 0;
 		goto try_again;
 	}
@@ -277,7 +256,6 @@ fail:
 	 * allocations.
 	 */
 	mm->free_area_cache = TASK_UNMAPPED_BASE;
-	mm->cached_hole_size = ~0UL;
 	addr = hugetlb_get_unmapped_area_bottomup(file, addr0,
 			len, pgoff, flags);
 
@@ -285,7 +263,6 @@ fail:
 	 * Restore the topdown base:
 	 */
 	mm->free_area_cache = base;
-	mm->cached_hole_size = ~0UL;
 
 	return addr;
 }
diff --git a/arch/x86/ia32/ia32_aout.c b/arch/x86/ia32/ia32_aout.c
index fd84387..c4d3e3b 100644
--- a/arch/x86/ia32/ia32_aout.c
+++ b/arch/x86/ia32/ia32_aout.c
@@ -313,7 +313,7 @@ static int load_aout_binary(struct linux_binprm *bprm, struct pt_regs *regs)
 	current->mm->brk = ex.a_bss +
 		(current->mm->start_brk = N_BSSADDR(ex));
 	current->mm->free_area_cache = TASK_UNMAPPED_BASE;
-	current->mm->cached_hole_size = 0;
+	current->mm->freed_area = 0;
 
 	install_exec_creds(bprm);
 	current->flags &= ~PF_FORKNOEXEC;
diff --git a/arch/x86/kernel/sys_x86_64.c b/arch/x86/kernel/sys_x86_64.c
index 1a3fa81..8254e3a 100644
--- a/arch/x86/kernel/sys_x86_64.c
+++ b/arch/x86/kernel/sys_x86_64.c
@@ -144,11 +144,9 @@ arch_get_unmapped_area(struct file *filp, unsigned long addr,
 		    (!vma || addr + len <= vma->vm_start))
 			return addr;
 	}
-	if (((flags & MAP_32BIT) || test_thread_flag(TIF_IA32))
-	    && len <= mm->cached_hole_size) {
-		mm->cached_hole_size = 0;
+	if (((flags & MAP_32BIT) || test_thread_flag(TIF_IA32)))
 		mm->free_area_cache = begin;
-	}
+
 	addr = mm->free_area_cache;
 	if (addr < begin)
 		addr = begin;
@@ -167,7 +165,6 @@ full_search:
 			 */
 			if (start_addr != begin) {
 				start_addr = addr = begin;
-				mm->cached_hole_size = 0;
 				goto full_search;
 			}
 			return -ENOMEM;
@@ -179,8 +176,6 @@ full_search:
 			mm->free_area_cache = addr + len;
 			return addr;
 		}
-		if (addr + mm->cached_hole_size < vma->vm_start)
-			mm->cached_hole_size = vma->vm_start - addr;
 
 		addr = vma->vm_end;
 		addr = align_addr(addr, filp, 0);
@@ -217,13 +212,6 @@ arch_get_unmapped_area_topdown(struct file *filp, const unsigned long addr0,
 			return addr;
 	}
 
-	/* check if free_area_cache is useful for us */
-	if (len <= mm->cached_hole_size) {
-		mm->cached_hole_size = 0;
-		mm->free_area_cache = mm->mmap_base;
-	}
-
-	/* either no address requested or can't fit in requested address hole */
 	addr = mm->free_area_cache;
 
 	/* make sure it can fit in the remaining address space */
@@ -253,10 +241,6 @@ arch_get_unmapped_area_topdown(struct file *filp, const unsigned long addr0,
 			/* remember the address as a hint for next time */
 			return mm->free_area_cache = addr;
 
-		/* remember the largest hole we saw so far */
-		if (addr + mm->cached_hole_size < vma->vm_start)
-			mm->cached_hole_size = vma->vm_start - addr;
-
 		/* try just below the current vma->vm_start */
 		addr = vma->vm_start-len;
 	} while (len < vma->vm_start);
@@ -268,14 +252,12 @@ bottomup:
 	 * can happen with large stack limits and large mmap()
 	 * allocations.
 	 */
-	mm->cached_hole_size = ~0UL;
 	mm->free_area_cache = TASK_UNMAPPED_BASE;
 	addr = arch_get_unmapped_area(filp, addr0, len, pgoff, flags);
 	/*
 	 * Restore the topdown base:
 	 */
 	mm->free_area_cache = mm->mmap_base;
-	mm->cached_hole_size = ~0UL;
 
 	return addr;
 }
diff --git a/arch/x86/mm/hugetlbpage.c b/arch/x86/mm/hugetlbpage.c
index f581a18..84f2346 100644
--- a/arch/x86/mm/hugetlbpage.c
+++ b/arch/x86/mm/hugetlbpage.c
@@ -268,12 +268,7 @@ static unsigned long hugetlb_get_unmapped_area_bottomup(struct file *file,
 	struct vm_area_struct *vma;
 	unsigned long start_addr;
 
-	if (len > mm->cached_hole_size) {
-	        start_addr = mm->free_area_cache;
-	} else {
-	        start_addr = TASK_UNMAPPED_BASE;
-	        mm->cached_hole_size = 0;
-	}
+	start_addr = mm->free_area_cache;
 
 full_search:
 	addr = ALIGN(start_addr, huge_page_size(h));
@@ -287,7 +282,6 @@ full_search:
 			 */
 			if (start_addr != TASK_UNMAPPED_BASE) {
 				start_addr = TASK_UNMAPPED_BASE;
-				mm->cached_hole_size = 0;
 				goto full_search;
 			}
 			return -ENOMEM;
@@ -296,8 +290,6 @@ full_search:
 			mm->free_area_cache = addr + len;
 			return addr;
 		}
-		if (addr + mm->cached_hole_size < vma->vm_start)
-		        mm->cached_hole_size = vma->vm_start - addr;
 		addr = ALIGN(vma->vm_end, huge_page_size(h));
 	}
 }
@@ -310,17 +302,12 @@ static unsigned long hugetlb_get_unmapped_area_topdown(struct file *file,
 	struct mm_struct *mm = current->mm;
 	struct vm_area_struct *vma, *prev_vma;
 	unsigned long base = mm->mmap_base, addr = addr0;
-	unsigned long largest_hole = mm->cached_hole_size;
 	int first_time = 1;
 
 	/* don't allow allocations above current base */
 	if (mm->free_area_cache > base)
 		mm->free_area_cache = base;
 
-	if (len <= largest_hole) {
-	        largest_hole = 0;
-		mm->free_area_cache  = base;
-	}
 try_again:
 	/* make sure it can fit in the remaining address space */
 	if (mm->free_area_cache < len)
@@ -342,21 +329,13 @@ try_again:
 		 */
 		if (addr + len <= vma->vm_start &&
 		            (!prev_vma || (addr >= prev_vma->vm_end))) {
-			/* remember the address as a hint for next time */
-		        mm->cached_hole_size = largest_hole;
 		        return (mm->free_area_cache = addr);
 		} else {
 			/* pull free_area_cache down to the first hole */
-		        if (mm->free_area_cache == vma->vm_end) {
+		        if (mm->free_area_cache == vma->vm_end)
 				mm->free_area_cache = vma->vm_start;
-				mm->cached_hole_size = largest_hole;
-			}
 		}
 
-		/* remember the largest hole we saw so far */
-		if (addr + largest_hole < vma->vm_start)
-		        largest_hole = vma->vm_start - addr;
-
 		/* try just below the current vma->vm_start */
 		addr = (vma->vm_start - len) & huge_page_mask(h);
 	} while (len <= vma->vm_start);
@@ -368,7 +347,6 @@ fail:
 	 */
 	if (first_time) {
 		mm->free_area_cache = base;
-		largest_hole = 0;
 		first_time = 0;
 		goto try_again;
 	}
@@ -379,7 +357,6 @@ fail:
 	 * allocations.
 	 */
 	mm->free_area_cache = TASK_UNMAPPED_BASE;
-	mm->cached_hole_size = ~0UL;
 	addr = hugetlb_get_unmapped_area_bottomup(file, addr0,
 			len, pgoff, flags);
 
@@ -387,7 +364,6 @@ fail:
 	 * Restore the topdown base:
 	 */
 	mm->free_area_cache = base;
-	mm->cached_hole_size = ~0UL;
 
 	return addr;
 }
diff --git a/fs/binfmt_aout.c b/fs/binfmt_aout.c
index a6395bd..d1fe7ea 100644
--- a/fs/binfmt_aout.c
+++ b/fs/binfmt_aout.c
@@ -257,7 +257,7 @@ static int load_aout_binary(struct linux_binprm * bprm, struct pt_regs * regs)
 	current->mm->brk = ex.a_bss +
 		(current->mm->start_brk = N_BSSADDR(ex));
 	current->mm->free_area_cache = current->mm->mmap_base;
-	current->mm->cached_hole_size = 0;
+	current->mm->freed_area = 0;
 
 	install_exec_creds(bprm);
  	current->flags &= ~PF_FORKNOEXEC;
diff --git a/fs/binfmt_elf.c b/fs/binfmt_elf.c
index bcb884e..dc1c780 100644
--- a/fs/binfmt_elf.c
+++ b/fs/binfmt_elf.c
@@ -729,7 +729,7 @@ static int load_elf_binary(struct linux_binprm *bprm, struct pt_regs *regs)
 	/* Do this so that we can load the interpreter, if need be.  We will
 	   change some of these later */
 	current->mm->free_area_cache = current->mm->mmap_base;
-	current->mm->cached_hole_size = 0;
+	current->mm->freed_area = 0;
 	retval = setup_arg_pages(bprm, randomize_stack_top(STACK_TOP),
 				 executable_stack);
 	if (retval < 0) {
diff --git a/include/linux/mm_types.h b/include/linux/mm_types.h
index 3cc3062..2737578 100644
--- a/include/linux/mm_types.h
+++ b/include/linux/mm_types.h
@@ -297,7 +297,7 @@ struct mm_struct {
 #endif
 	unsigned long mmap_base;		/* base of mmap area */
 	unsigned long task_size;		/* size of task vm space */
-	unsigned long cached_hole_size; 	/* if non-zero, the largest hole below free_area_cache */
+	unsigned long freed_area;		/* amount of recently unmapped space */
 	unsigned long free_area_cache;		/* first hole of size cached_hole_size or larger */
 	pgd_t * pgd;
 	atomic_t mm_users;			/* How many users with user space? */
diff --git a/kernel/fork.c b/kernel/fork.c
index b77fd55..9c336fa 100644
--- a/kernel/fork.c
+++ b/kernel/fork.c
@@ -326,7 +326,7 @@ static int dup_mmap(struct mm_struct *mm, struct mm_struct *oldmm)
 	mm->mmap = NULL;
 	mm->mmap_cache = NULL;
 	mm->free_area_cache = oldmm->mmap_base;
-	mm->cached_hole_size = ~0UL;
+	mm->freed_area = 0;
 	mm->map_count = 0;
 	cpumask_clear(mm_cpumask(mm));
 	mm->mm_rb = RB_ROOT;
@@ -496,7 +496,7 @@ static struct mm_struct *mm_init(struct mm_struct *mm, struct task_struct *p)
 	memset(&mm->rss_stat, 0, sizeof(mm->rss_stat));
 	spin_lock_init(&mm->page_table_lock);
 	mm->free_area_cache = TASK_UNMAPPED_BASE;
-	mm->cached_hole_size = ~0UL;
+	mm->freed_area = 0;
 	mm_init_aio(mm);
 	mm_init_owner(mm, p);
 
diff --git a/mm/mmap.c b/mm/mmap.c
index 5eafe26..8864eab 100644
--- a/mm/mmap.c
+++ b/mm/mmap.c
@@ -1381,12 +1381,7 @@ arch_get_unmapped_area(struct file *filp, unsigned long addr,
 		    (!vma || addr + len <= vma->vm_start))
 			return addr;
 	}
-	if (len > mm->cached_hole_size) {
-	        start_addr = addr = mm->free_area_cache;
-	} else {
-	        start_addr = addr = TASK_UNMAPPED_BASE;
-	        mm->cached_hole_size = 0;
-	}
+	start_addr = addr = mm->free_area_cache;
 
 full_search:
 	for (vma = find_vma(mm, addr); ; vma = vma->vm_next) {
@@ -1399,7 +1394,6 @@ full_search:
 			if (start_addr != TASK_UNMAPPED_BASE) {
 				addr = TASK_UNMAPPED_BASE;
 			        start_addr = addr;
-				mm->cached_hole_size = 0;
 				goto full_search;
 			}
 			return -ENOMEM;
@@ -1411,8 +1405,6 @@ full_search:
 			mm->free_area_cache = addr + len;
 			return addr;
 		}
-		if (addr + mm->cached_hole_size < vma->vm_start)
-		        mm->cached_hole_size = vma->vm_start - addr;
 		addr = vma->vm_end;
 	}
 }
@@ -1421,11 +1413,12 @@ full_search:
 void arch_unmap_area(struct mm_struct *mm, unsigned long addr)
 {
 	/*
-	 * Is this a new hole at the lowest possible address?
+	 * Go back to first-fit search once more than 1/8th of normal
+	 * process memory has been unmapped.
 	 */
-	if (addr >= TASK_UNMAPPED_BASE && addr < mm->free_area_cache) {
-		mm->free_area_cache = addr;
-		mm->cached_hole_size = ~0UL;
+	if (mm->freed_area > (mm->total_vm - mm->reserved_vm) / 8) {
+		mm->free_area_cache = TASK_UNMAPPED_BASE;
+		mm->freed_area = 0;
 	}
 }
 
@@ -1459,13 +1452,7 @@ arch_get_unmapped_area_topdown(struct file *filp, const unsigned long addr0,
 			return addr;
 	}
 
-	/* check if free_area_cache is useful for us */
-	if (len <= mm->cached_hole_size) {
- 	        mm->cached_hole_size = 0;
- 		mm->free_area_cache = mm->mmap_base;
- 	}
-
-	/* either no address requested or can't fit in requested address hole */
+	/* use a next fit algorithm to quickly find a free area */
 	addr = mm->free_area_cache;
 
 	/* make sure it can fit in the remaining address space */
@@ -1490,10 +1477,6 @@ arch_get_unmapped_area_topdown(struct file *filp, const unsigned long addr0,
 			/* remember the address as a hint for next time */
 			return (mm->free_area_cache = addr);
 
- 		/* remember the largest hole we saw so far */
- 		if (addr + mm->cached_hole_size < vma->vm_start)
- 		        mm->cached_hole_size = vma->vm_start - addr;
-
 		/* try just below the current vma->vm_start */
 		addr = vma->vm_start-len;
 	} while (len < vma->vm_start);
@@ -1505,14 +1488,12 @@ bottomup:
 	 * can happen with large stack limits and large mmap()
 	 * allocations.
 	 */
-	mm->cached_hole_size = ~0UL;
   	mm->free_area_cache = TASK_UNMAPPED_BASE;
 	addr = arch_get_unmapped_area(filp, addr0, len, pgoff, flags);
 	/*
 	 * Restore the topdown base:
 	 */
 	mm->free_area_cache = mm->mmap_base;
-	mm->cached_hole_size = ~0UL;
 
 	return addr;
 }
@@ -1521,14 +1502,13 @@ bottomup:
 void arch_unmap_area_topdown(struct mm_struct *mm, unsigned long addr)
 {
 	/*
-	 * Is this a new hole at the highest possible address?
+	 * Go back to first-fit search once more than 1/8th of normal
+	 * process memory has been unmapped.
 	 */
-	if (addr > mm->free_area_cache)
-		mm->free_area_cache = addr;
-
-	/* dont allow allocations above current base */
-	if (mm->free_area_cache > mm->mmap_base)
+	if (mm->freed_area > (mm->total_vm - mm->reserved_vm) / 8) {
 		mm->free_area_cache = mm->mmap_base;
+		mm->freed_area = 0;
+	}
 }
 
 unsigned long
@@ -1839,6 +1819,7 @@ static void remove_vma_list(struct mm_struct *mm, struct vm_area_struct *vma)
 		long nrpages = vma_pages(vma);
 
 		mm->total_vm -= nrpages;
+		mm->freed_area += nrpages;
 		vm_stat_account(mm, vma->vm_flags, vma->vm_file, -nrpages);
 		vma = remove_vma(vma);
 	} while (vma);


* Re: [PATCH -mm 1/2] mm: fix quadratic behaviour in get_unmapped_area_topdown
  2012-02-23 19:56 ` [PATCH -mm 1/2] mm: fix quadratic behaviour in get_unmapped_area_topdown Rik van Riel
@ 2012-02-23 21:50   ` Andrew Morton
  2012-02-23 21:57   ` Johannes Weiner
  1 sibling, 0 replies; 12+ messages in thread
From: Andrew Morton @ 2012-02-23 21:50 UTC (permalink / raw)
  To: Rik van Riel
  Cc: linux-mm, linux-kernel, Mel Gorman, Johannes Weiner,
	KOSAKI Motohiro, Andrea Arcangeli, hughd, Xiao Guangrong

On Thu, 23 Feb 2012 14:56:36 -0500
Rik van Riel <riel@redhat.com> wrote:

> When we look for a VMA smaller than the cached_hole_size, we set the
> starting search address to mm->mmap_base, to try to find our hole.
> 
> However, even in the case where we fall through and find nothing at
> mm->free_area_cache, we still reset the search address to mm->mmap_base.
> This bug results in quadratic behaviour, with observed mmap times of 0.4
> seconds for processes that have very fragmented memory.
> 
> If there is no hole small enough to fit the VMA, and no good spot
> right at mm->free_area_cache, we are much better off continuing the
> search down from mm->free_area_cache, instead of all the way from
> the top.

This has been at least partially addressed in recent patches from Xiao
Guangrong.  Please review his five-patch series starting with "[PATCH
1/5] hugetlbfs: fix hugetlb_get_unmapped_area".

I've already merged those patches and we need to work out which way
to go.



* Re: [PATCH -mm 2/2] mm: do not reset mm->free_area_cache on every single munmap
  2012-02-23 20:00 ` [PATCH -mm 2/2] mm: do not reset mm->free_area_cache on every single munmap Rik van Riel
@ 2012-02-23 21:56   ` Andrew Morton
  2012-02-27 16:12     ` Rik van Riel
                       ` (2 more replies)
  2012-02-23 21:57   ` Andi Kleen
  1 sibling, 3 replies; 12+ messages in thread
From: Andrew Morton @ 2012-02-23 21:56 UTC (permalink / raw)
  To: Rik van Riel
  Cc: linux-mm, linux-kernel, Mel Gorman, Johannes Weiner,
	KOSAKI Motohiro, Andrea Arcangeli, hughd

On Thu, 23 Feb 2012 15:00:34 -0500
Rik van Riel <riel@redhat.com> wrote:

> Some programs have a large number of VMAs, and make frequent calls
> to mmap and munmap. Having munmap constantly reset the search
> pointer for get_unmapped_area can cause a significant slowdown
> for such programs.
> 
> Likewise, starting all the way from the top any time we mmap a small
> VMA can greatly increase the amount of time spent in
> arch_get_unmapped_area_topdown.
> 
> For programs with many VMAs, a next-fit algorithm would be fastest;
> however, that could waste a lot of virtual address space, and
> potentially page table memory.
> 
> A compromise is to reset the search pointer for get_unmapped_area
> after we have unmapped 1/8th of the normal memory in a process.

ick!

> For
> a process with 1000 similar sized VMAs, that means the search pointer
> will only be reset once every 125 or so munmaps.  The cost is that
> the program may use about 1/8th more virtual space for these VMAs,
> and up to 1/8th more page tables.
> 
> We do not count special mappings, since there are programs that
> use a large fraction of their address space mapping device memory,
> etc.
> 
> The benefit is that things scale a lot better, and we remove about
> 200 lines of code.

We've been playing whack-a-mole with this search for many years.  What
about developing a proper data structure with which to locate a
suitable-sized hole in O(log(N)) time?



* Re: [PATCH -mm 1/2] mm: fix quadratic behaviour in get_unmapped_area_topdown
  2012-02-23 19:56 ` [PATCH -mm 1/2] mm: fix quadratic behaviour in get_unmapped_area_topdown Rik van Riel
  2012-02-23 21:50   ` Andrew Morton
@ 2012-02-23 21:57   ` Johannes Weiner
  1 sibling, 0 replies; 12+ messages in thread
From: Johannes Weiner @ 2012-02-23 21:57 UTC (permalink / raw)
  To: Rik van Riel
  Cc: linux-mm, linux-kernel, akpm, Mel Gorman, KOSAKI Motohiro,
	Andrea Arcangeli, hughd

On Thu, Feb 23, 2012 at 02:56:36PM -0500, Rik van Riel wrote:
> When we look for a VMA smaller than the cached_hole_size, we set the
> starting search address to mm->mmap_base, to try to find our hole.
> 
> However, even in the case where we fall through and find nothing at
> mm->free_area_cache, we still reset the search address to mm->mmap_base.
> This bug results in quadratic behaviour, with observed mmap times of 0.4
> seconds for processes that have very fragmented memory.
> 
> If there is no hole small enough to fit the VMA, and no good spot
> right at mm->free_area_cache, we are much better off continuing the
> search down from mm->free_area_cache, instead of all the way from
> the top.

Would it make sense to retain the restart for the case where we _know_
that the remaining address space can not fit the desired area?

	/* make sure it can fit in the remaining address space */
	if (addr > len) {
		vma = find_vma(mm, addr-len);
		if (!vma || addr <= vma->vm_start)
			/* remember the address as a hint for next time */
			return (mm->free_area_cache = addr-len);
	} else /* like this */
		addr = mm->mmap_base - len;

It would save one pointless find_vma() further down.  I don't feel too
strongly about it, though.  Either way:

> Signed-off-by: Rik van Riel <riel@redhat.com>

Acked-by: Johannes Weiner <hannes@cmpxchg.org>


* Re: [PATCH -mm 2/2] mm: do not reset mm->free_area_cache on every single munmap
  2012-02-23 20:00 ` [PATCH -mm 2/2] mm: do not reset mm->free_area_cache on every single munmap Rik van Riel
  2012-02-23 21:56   ` Andrew Morton
@ 2012-02-23 21:57   ` Andi Kleen
  2012-02-27 16:13     ` Rik van Riel
  1 sibling, 1 reply; 12+ messages in thread
From: Andi Kleen @ 2012-02-23 21:57 UTC (permalink / raw)
  To: Rik van Riel
  Cc: linux-mm, linux-kernel, akpm, Mel Gorman, Johannes Weiner,
	KOSAKI Motohiro, Andrea Arcangeli, hughd

Rik van Riel <riel@redhat.com> writes:

> Some programs have a large number of VMAs, and make frequent calls
> to mmap and munmap. Having munmap constantly reset the search
> pointer for get_unmapped_area can cause a significant slowdown
> for such programs.

This would be a much nicer patch if you split it into one that merges
all the copy'n'paste code and another one that actually implements
the new algorithm.

-Andi

-- 
ak@linux.intel.com -- Speaking for myself only


* Re: [PATCH -mm 2/2] mm: do not reset mm->free_area_cache on every single munmap
  2012-02-23 21:56   ` Andrew Morton
@ 2012-02-27 16:12     ` Rik van Riel
  2012-03-20 18:32     ` Rik van Riel
  2012-03-20 19:00     ` Andrea Arcangeli
  2 siblings, 0 replies; 12+ messages in thread
From: Rik van Riel @ 2012-02-27 16:12 UTC (permalink / raw)
  To: Andrew Morton
  Cc: linux-mm, linux-kernel, Mel Gorman, Johannes Weiner,
	KOSAKI Motohiro, Andrea Arcangeli, hughd

On 02/23/2012 04:56 PM, Andrew Morton wrote:
> On Thu, 23 Feb 2012 15:00:34 -0500
> Rik van Riel<riel@redhat.com>  wrote:

>> The benefit is that things scale a lot better, and we remove about
>> 200 lines of code.
>
> We've been playing whack-a-mole with this search for many years.  What
> about developing a proper data structure with which to locate a
> suitable-sized hole in O(log(N)) time?

I have thought about this, and see a few different
possibilities:

1) Allocate a new (smaller) structure to keep track
    of free areas; this creates the possibility of
    munmap failing due to a memory allocation failure.
    It looks like it can already do that, but I do not
    like the idea of adding another failure path like
    that.

2) Use the vma_struct to keep track of free areas.
    Somewhat bloated, and may still not fix (1), because
    munmap can end up splitting a VMA.

I guess the free areas could be maintained in a prio tree,
sorted by both free area size and address, so we can fill
in the memory in the desired direction.

What I do not know is whether it will be worthwhile,
because the code I have now seems to behave well even
in what is essentially a worst case scenario.

-- 
All rights reversed


* Re: [PATCH -mm 2/2] mm: do not reset mm->free_area_cache on every single munmap
  2012-02-23 21:57   ` Andi Kleen
@ 2012-02-27 16:13     ` Rik van Riel
  0 siblings, 0 replies; 12+ messages in thread
From: Rik van Riel @ 2012-02-27 16:13 UTC (permalink / raw)
  To: Andi Kleen
  Cc: linux-mm, linux-kernel, akpm, Mel Gorman, Johannes Weiner,
	KOSAKI Motohiro, Andrea Arcangeli, hughd

On 02/23/2012 04:57 PM, Andi Kleen wrote:
> Rik van Riel<riel@redhat.com>  writes:
>
>> Some programs have a large number of VMAs, and make frequent calls
>> to mmap and munmap. Having munmap constantly reset the search
>> pointer for get_unmapped_area can cause a significant
>> slowdown for such programs.
>
> This would be a much nicer patch if you split it into one that merges
> all the copy'n'paste code and another one that actually implements
> the new algorithm.

The copy'n'pasted functions are not quite the same, though.

All the ones that could be unified already have been, leaving
a few functions with actual differences around.

-- 
All rights reversed


* Re: [PATCH -mm 2/2] mm: do not reset mm->free_area_cache on every single munmap
  2012-02-23 21:56   ` Andrew Morton
  2012-02-27 16:12     ` Rik van Riel
@ 2012-03-20 18:32     ` Rik van Riel
  2012-03-20 19:00     ` Andrea Arcangeli
  2 siblings, 0 replies; 12+ messages in thread
From: Rik van Riel @ 2012-03-20 18:32 UTC (permalink / raw)
  To: Andrew Morton
  Cc: linux-mm, linux-kernel, Mel Gorman, Johannes Weiner,
	KOSAKI Motohiro, Andrea Arcangeli, hughd

On 02/23/2012 04:56 PM, Andrew Morton wrote:

> We've been playing whack-a-mole with this search for many years.  What
> about developing a proper data structure with which to locate a
> suitable-sized hole in O(log(N)) time?

I got around to looking at this, and the more I look, the
worse things get.  The obvious (and probably highest
reasonable complexity) solution looks like this:

struct free_area {
	unsigned long address;
	struct rb_node rb_addr;
	unsigned long size;
	struct rb_node rb_size;
};

This works in a fairly obvious way for normal mmap
and munmap calls, inserting the free area into the tree
at the desired location, or expanding one that is already
there.
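
For the normal case, insertion into the size-sorted side could use
the standard kernel rbtree idiom. A minimal sketch under that
assumption (the function name and tree root are hypothetical, and an
analogous helper would maintain the address-sorted tree through
rb_addr):

	static void free_area_insert_by_size(struct rb_root *root,
					     struct free_area *new)
	{
		struct rb_node **link = &root->rb_node, *parent = NULL;

		while (*link) {
			struct free_area *fa;

			parent = *link;
			fa = rb_entry(parent, struct free_area, rb_size);
			if (new->size < fa->size)
				link = &parent->rb_left;
			else
				link = &parent->rb_right;
		}
		rb_link_node(&new->rb_size, parent, link);
		rb_insert_color(&new->rb_size, root);
	}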

However, it totally falls apart when we need to get
aligned areas, e.g. for hugetlb or cache coloring on
architectures with virtually indexed caches.

For those kinds of allocations, we are back to tree
walking just like today, giving us a fairly large amount
of additional complexity for no obvious gain.

Is this really the path we want to go down?

-- 
All rights reversed


* Re: [PATCH -mm 2/2] mm: do not reset mm->free_area_cache on every single munmap
  2012-02-23 21:56   ` Andrew Morton
  2012-02-27 16:12     ` Rik van Riel
  2012-03-20 18:32     ` Rik van Riel
@ 2012-03-20 19:00     ` Andrea Arcangeli
  2012-03-20 19:06       ` Rik van Riel
  2 siblings, 1 reply; 12+ messages in thread
From: Andrea Arcangeli @ 2012-03-20 19:00 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Rik van Riel, linux-mm, linux-kernel, Mel Gorman,
	Johannes Weiner, KOSAKI Motohiro, hughd

On Thu, Feb 23, 2012 at 01:56:14PM -0800, Andrew Morton wrote:
> We've been playing whack-a-mole with this search for many years.  What
> about developing a proper data structure with which to locate a
> suitable-sized hole in O(log(N)) time?

I intended to implement it a couple of years ago.

It takes a change to the rbtree code so that when rb_erase and
rb_insert_color are called, proper methods are called to notify the
caller that there's been a rotation (probably calling a new
rb_insert_color_with_metadata(&method(left_rot, right_rot))), so
that these methods can update the new status of the tree. You can
then keep the "max hole" information for the left and right side of
the tree at each node: if the left side of the tree from the top
doesn't have a big enough max hole, you take the right immediately
(if it fits), skipping over everything that isn't interesting, and
you keep doing so until the max hole on the right or left fits the
size of the allocation request (and then you find what you were
searching for in vma and vma->vm_next). It's very tricky but it
should be possible. Still, it would remain generic code in rbtree.c,
not actually knowing it's the max hole info we're collecting for the
left and right subtrees of each node. Maybe only the left side's max
hole needs to be collected; not having the right side only means a
worst case O(log(N)) walk on the tree (taking ->right all the time
'till the leaf), so it'd be perfectly ok and it may simplify things
a lot having only the max hole on the left.
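
A minimal sketch of the descent this metadata would enable; the
struct, field and helper names here are illustrative assumptions,
not existing kernel API (hole_here would be the gap just below the
node's vma, and the rotation callbacks described above would keep
the cached subtree values current):

	struct hole_node {
		struct rb_node rb;
		unsigned long hole_here;	/* gap just below this vma */
		unsigned long max_hole_left;	/* largest hole in left subtree */
		unsigned long max_hole_right;	/* largest hole in right subtree */
	};

	static struct hole_node *find_fitting_hole(struct rb_node *rb,
						   unsigned long len)
	{
		while (rb) {
			struct hole_node *n =
				rb_entry(rb, struct hole_node, rb);

			if (n->max_hole_left >= len)	/* prefer low addresses */
				rb = rb->rb_left;
			else if (n->hole_here >= len)
				return n;		/* hole at this node fits */
			else if (n->max_hole_right >= len)
				rb = rb->rb_right;
			else
				return NULL;		/* no hole anywhere fits */
		}
		return NULL;
	}

Each step descends one level, so the search is O(log(N)) as long as
the cached values are kept accurate across rotations.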

I'm too busy optimizing AutoNUMA even further to delve into this but I
hope somebody implements it. I thought about exactly this when I saw
these patches floating around, so I'm glad you mentioned it.


* Re: [PATCH -mm 2/2] mm: do not reset mm->free_area_cache on every single munmap
  2012-03-20 19:00     ` Andrea Arcangeli
@ 2012-03-20 19:06       ` Rik van Riel
  0 siblings, 0 replies; 12+ messages in thread
From: Rik van Riel @ 2012-03-20 19:06 UTC (permalink / raw)
  To: Andrea Arcangeli
  Cc: Andrew Morton, linux-mm, linux-kernel, Mel Gorman,
	Johannes Weiner, KOSAKI Motohiro, hughd

On 03/20/2012 03:00 PM, Andrea Arcangeli wrote:
> On Thu, Feb 23, 2012 at 01:56:14PM -0800, Andrew Morton wrote:
>> We've been playing whack-a-mole with this search for many years.  What
>> about developing a proper data structure with which to locate a
>> suitable-sized hole in O(log(N)) time?
>
> I intended to implement it a couple of years ago.
>
> It takes a change to the rbtree code so that when rb_erase and
> rb_insert_color are called, proper methods are called to notify the
> caller that there's been a rotation (probably calling a new
> rb_insert_color_with_metadata(&method(left_rot, right_rot)) )

There are two issues here.

1) We also need the ability to search by address, so we can
    merge free areas that are adjacent.

2) Hugetlb, and shared mappings on architectures with virtually
    indexed caches (eg. MIPS), need holes that not only have a
    certain size, but also fit a certain alignment.

To get (2) we are essentially back to tree walking. I am not
convinced that that is a lot better than what we are doing
today, or worth the extra complexity...

-- 
All rights reversed

