> Given that mremap is holding mmap_sem exclusively, how about userspace
> malloc implementation taking some exclusive malloc lock and doing
> normal mremap followed by mmap with MAP_FIXED to fill the hole ? It
> might end up having largely same overhead. Well, modulo some extra TLB
> flushing. But arguably, reducing TLB flushes for sequence of page
> table updates could be usefully addressed separately (e.g. maybe by
> matching those syscalls, maybe via syslets).

You can't use MAP_FIXED because it has a race with other users of mmap:
another thread can map into the hole first, and MAP_FIXED would silently
replace that mapping. The address hint will *usually* work, but you need
to deal with the case where it fails and then cope with the fallout of
the fragmentation. PaX ASLR ignores address hints, so that's something
else to consider if you care about running on PaX/Grsecurity patched
kernels.

I'm doing this in my own allocator that's heavily based on the jemalloc
design. It just unmaps the memory given by the hinted mmap call if it
fails to get back the hole:

https://github.com/thestinger/allocator/blob/e80d2d0c2863c490b650ecffeb33beaccfcfdc46/huge.c#L167-L180

On 64-bit, it relies on 1TiB of reserved address space (works even with
overcommit disabled) to do per-CPU allocation for chunks and huge
(>= chunk size) allocations via address range checks, so it needs this
ugly workaround too:

https://github.com/thestinger/allocator/blob/e80d2d0c2863c490b650ecffeb33beaccfcfdc46/huge.c#L67-L75

I'm convinced that the mmap_sem writer lock can be avoided for the
MREMAP_FIXED case via a good heuristic though. It just needs to check
that dst is a single VMA that matches the src properties and fall back
to the writer lock if that's not the case. This will have the same
performance as a separate syscall to move pages in all the cases where
that syscall would work.
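
For illustration, the hinted-mmap fallback can be sketched roughly as follows. This is a minimal sketch and not the code at the linked huge.c lines; the helper name refill_hole is hypothetical, and it assumes a Linux system with anonymous private mappings.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <sys/mman.h>

/* Hypothetical helper: try to map `size` bytes back at `hole`, passing the
 * address as a hint only (no MAP_FIXED), so a mapping that another thread
 * raced into the range is never clobbered.
 * Returns true only if the kernel placed the mapping exactly at `hole`. */
static bool refill_hole(void *hole, size_t size) {
    assert(hole != NULL && size > 0);

    void *p = mmap(hole, size, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (p == MAP_FAILED) {
        return false;
    }
    if (p != hole) {
        /* The hint was not honoured (range already taken, or hints are
         * ignored entirely, e.g. under PaX ASLR): give the memory back and
         * let the caller cope with the fragmentation. */
        munmap(p, size);
        return false;
    }
    return true;
}
```

The caller would invoke this right after a plain mremap has moved the pages out of the range, and has to treat a false return as the normal slow path rather than an error.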