* [RFC][PATCH] mm: madvise: MADV_POPULATE for quick pre-faulting
From: Dave Hansen @ 2013-06-27 23:16 UTC
To: linux-mm; +Cc: linux-kernel, Dave Hansen
I've been doing some testing involving large amounts of
page cache. It's quite painful to get hundreds of GB
of page cache mapped in, especially when I am trying to
do it in parallel threads. This is true even when the
page cache is already allocated and I only need to map
it in. The test:
1. take 160 16MB files
2. clone 160 threads, mmap the 16MB files, and either
a. walk through the file touching each page
b. run MADV_POPULATE on the file
3. MADV_DONTNEED on the mmap()'d area
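As a rough illustration of steps 2a and 3, one thread's work in the "faulting" variant could look like the user-space sketch below. It is not the benchmark that produced the numbers that follow. `fault_in_by_touching()` is a hypothetical name, and the file is assumed to be already in the page cache, so every touch is a minor fault that only has to map the page.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <fcntl.h>
#include <stddef.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

/*
 * Hypothetical helper for one iteration of the "faulting" variant:
 * map `path` shared and read-only, touch one byte per page (step 2a),
 * then zap the page-table entries again with MADV_DONTNEED (step 3).
 * Each touch of a not-yet-mapped page is a separate page fault.
 * Returns the number of pages touched, or -1 on error.
 */
static long fault_in_by_touching(const char *path)
{
    long page = sysconf(_SC_PAGESIZE);
    assert(page > 0);

    int fd = open(path, O_RDONLY);
    if (fd < 0)
        return -1;
    struct stat st;
    if (fstat(fd, &st) < 0 || st.st_size <= 0) {
        close(fd);
        return -1;
    }
    size_t len = (size_t)st.st_size;
    void *map = mmap(NULL, len, PROT_READ, MAP_SHARED, fd, 0);
    close(fd);                     /* the mapping holds its own file reference */
    if (map == MAP_FAILED)
        return -1;

    volatile const char *p = map;  /* volatile: the compiler must keep the reads */
    long touched = 0;
    for (size_t off = 0; off < len; off += (size_t)page) {
        (void)p[off];
        touched++;
    }

    madvise(map, len, MADV_DONTNEED);  /* drop the PTEs, keep the page cache */
    munmap(map, len);
    return touched;
}
```

Step 2b would replace the touch loop with a single madvise() call covering the whole range.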
160 threads/processes:
             faulting | MADV_POPULATE
Threads:          698 |        102239 (146x speedup)
Processes:     154247 |        297518 (1.9x speedup)
single threaded:
             faulting | MADV_POPULATE
                 1908 |          3710 (1.9x speedup)
To fix the thread suckage, this patch just walks the
VMAs and maps all the pages in. Since it does a
bunch of them in one go, it amortizes the cost of
acquiring the mmap_sem across all of those pages.
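To put the amortization in numbers, here is a back-of-the-envelope sketch that assumes 4 KiB pages and exactly one page mapped per fault. Under those assumptions each 16MB file costs 4096 faults, each of which takes mmap_sem for read. MADV_POPULATE takes the lock at least once per madvise() call; the patch may drop and re-take it while pages are brought in, so the count below is a lower bound.

```c
#include <assert.h>
#include <stdint.h>

/* Parameters of the test described above; the page size is an assumption. */
#define TEST_FILES  160u
#define FILE_BYTES  (16u << 20)   /* 16MB per file */
#define PAGE_BYTES  4096u         /* assumed 4 KiB pages */

/* Faulting variant: one fault, and so one mmap_sem read acquisition, per page. */
static uint64_t faulting_acquisitions(void)
{
    uint64_t pages_per_file = FILE_BYTES / PAGE_BYTES;   /* 4096 */
    return pages_per_file * TEST_FILES;                  /* 655360 */
}

/*
 * MADV_POPULATE variant: at least one acquisition per madvise() call,
 * and the test makes one call per file.
 */
static uint64_t populate_min_acquisitions(void)
{
    return TEST_FILES;                                   /* 160 */
}
```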
FAQ:
Why do threads suck so much?
Bouncing the mmap_sem cacheline around, plus anything
else that we write to during a fault. We do one page,
move the cachelines to another CPU, do one more page,
etc...
Does MADV_WILLNEED work for this?
No. It brings the pages into the page cache, but as
currently implemented it does not map them. I guess we'd
be within our rights to make it behave like MADV_POPULATE
if we wanted to, though.
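Without MADV_POPULATE, user space has to pair MADV_WILLNEED (which only reads pages into the page cache) with a read of each page to get the page tables filled in. The sketch below prefers MADV_POPULATE when the installed headers happen to define it and falls back otherwise; `prefault_range()` is a hypothetical name. The value 18 proposed in this RFC is deliberately not hard-coded, because mainline later assigned 18 to MADV_WIPEONFORK.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <sys/mman.h>
#include <unistd.h>

/*
 * Hypothetical helper: pre-fault the page-aligned range [addr, addr + len)
 * of a readable mapping.  MADV_POPULATE is only used if the installed
 * headers define it; its value is never hard-coded.  The fallback is
 * MADV_WILLNEED (page cache readahead, which does not map anything)
 * followed by one read per page, which is what installs the PTEs.
 * Returns 0 on success, -1 with errno set.
 */
static int prefault_range(void *addr, size_t len)
{
    long page = sysconf(_SC_PAGESIZE);
    assert(page > 0);

#ifdef MADV_POPULATE
    if (madvise(addr, len, MADV_POPULATE) == 0)
        return 0;
    if (errno != EINVAL)       /* EINVAL: e.g. the kernel does not know the advice */
        return -1;
#endif

    /* Only a hint; the touch loop below works even if this fails. */
    (void)madvise(addr, len, MADV_WILLNEED);

    volatile const char *p = addr;
    for (size_t off = 0; off < len; off += (size_t)page)
        (void)p[off];
    return 0;
}
```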
---
linux.git-davehans/include/uapi/asm-generic/mman-common.h | 1
linux.git-davehans/mm/madvise.c | 40 +++++++++++++-
2 files changed, 40 insertions(+), 1 deletion(-)
diff -puN include/uapi/asm-generic/mman-common.h~madv_populate include/uapi/asm-generic/mman-common.h
--- linux.git/include/uapi/asm-generic/mman-common.h~madv_populate 2013-06-27 15:22:35.651854196 -0700
+++ linux.git-davehans/include/uapi/asm-generic/mman-common.h 2013-06-27 15:22:35.656854418 -0700
@@ -51,6 +51,7 @@
#define MADV_DONTDUMP 16 /* Explicity exclude from the core dump,
overrides the coredump filter bits */
#define MADV_DODUMP 17 /* Clear the MADV_NODUMP flag */
+#define MADV_POPULATE 18 /* Fill in mapping like faults would */
/* compatibility flags */
#define MAP_FILE 0
diff -puN mm/madvise.c~madv_populate mm/madvise.c
--- linux.git/mm/madvise.c~madv_populate 2013-06-27 15:22:35.652854240 -0700
+++ linux.git-davehans/mm/madvise.c 2013-06-27 15:22:35.656854418 -0700
@@ -19,6 +19,7 @@
#include <linux/blkdev.h>
#include <linux/swap.h>
#include <linux/swapops.h>
+#include "internal.h"
/*
* Any behaviour which results in changes to the vma->vm_flags needs to
@@ -31,6 +32,7 @@ static int madvise_need_mmap_write(int b
case MADV_REMOVE:
case MADV_WILLNEED:
case MADV_DONTNEED:
+ case MADV_POPULATE:
return 0;
default:
/* be safe, default to 1. list exceptions explicitly */
@@ -252,6 +254,39 @@ static long madvise_willneed(struct vm_a
}
/*
+ * Do not just populate the page cache (WILLNEED), also map the pages.
+ */
+static long madvise_populate(struct vm_area_struct * vma,
+ struct vm_area_struct ** prev,
+ unsigned long start, unsigned long end)
+{
+ struct file *file = vma->vm_file;
+ int locked = 1;
+ int ret;
+
+ if (file && file->f_mapping->a_ops->get_xip_mem) {
+ /* no bad return value, but ignore advice */
+ return 0;
+ }
+
+ ret = __mlock_vma_pages_range(vma, start, end, &locked);
+ /*
+ * Make sure that our down_read() matches (read vs.
+ * write) what we did in sys_madvise.
+ */
+ BUG_ON(madvise_need_mmap_write(MADV_POPULATE));
+ if (!locked) {
+ down_read(&current->mm->mmap_sem);
+ /* tell sys_madvise we drop mmap_sem: */
+ *prev = NULL;
+ } else {
+ *prev = vma;
+ }
+
+ return ret;
+}
+
+/*
* Application no longer needs these pages. If the pages are dirty,
* it's OK to just throw them away. The app will be more careful about
* data it wants to keep. Be sure to free swap resources too. The
@@ -378,6 +413,8 @@ madvise_vma(struct vm_area_struct *vma,
return madvise_remove(vma, prev, start, end);
case MADV_WILLNEED:
return madvise_willneed(vma, prev, start, end);
+ case MADV_POPULATE:
+ return madvise_populate(vma, prev, start, end);
case MADV_DONTNEED:
return madvise_dontneed(vma, prev, start, end);
default:
@@ -407,6 +444,7 @@ madvise_behavior_valid(int behavior)
#endif
case MADV_DONTDUMP:
case MADV_DODUMP:
+ case MADV_POPULATE:
return 1;
default:
@@ -536,7 +574,7 @@ SYSCALL_DEFINE3(madvise, unsigned long,
goto out;
if (prev)
vma = prev->vm_next;
- else /* madvise_remove dropped mmap_sem */
+ else /* madvise_remove/populate dropped mmap_sem */
vma = find_vma(current->mm, start);
}
out:
_
* Re: [RFC][PATCH] mm: madvise: MADV_POPULATE for quick pre-faulting
From: Zheng Liu @ 2013-06-28 5:47 UTC
To: Dave Hansen; +Cc: linux-mm, linux-kernel
Hi Dave,
On Thu, Jun 27, 2013 at 04:16:05PM -0700, Dave Hansen wrote:
>
> I've been doing some testing involving large amounts of
> page cache. It's quite painful to get hundreds of GB
> of page cache mapped in, especially when I am trying to
> do it in parallel threads. This is true even when the
> page cache is already allocated and I only need to map
> it in. The test:
>
> 1. take 160 16MB files
> 2. clone 160 threads, mmap the 16MB files, and either
> a. walk through the file touching each page
Why not change the MAP_POPULATE flag of mmap(2) instead? It currently
only works for private mappings, but maybe we could make it support
shared mappings too.
Regards,
- Zheng
* Re: [RFC][PATCH] mm: madvise: MADV_POPULATE for quick pre-faulting
From: Dave Hansen @ 2013-06-28 15:48 UTC
To: linux-mm, linux-kernel
On 06/27/2013 10:47 PM, Zheng Liu wrote:
>> I've been doing some testing involving large amounts of
>> page cache. It's quite painful to get hundreds of GB
>> of page cache mapped in, especially when I am trying to
>> do it in parallel threads. This is true even when the
>> page cache is already allocated and I only need to map
>> it in. The test:
>>
>> 1. take 160 16MB files
>> 2. clone 160 threads, mmap the 16MB files, and either
>> a. walk through the file touching each page
>
> Why not change MAP_POPULATE flag in mmap(2)? Now it is only for private
> mappings. But maybe we could let it support shared mapping.
Adding that support to mmap() will certainly _help_ some folks. But
anything that calls mmap() takes mmap_sem for write. That means that
threaded apps doing mmap()/munmap() frequently are _not_ scalable.
IOW, a process needing to do a bunch of MAP_POPULATEs isn't
parallelizable, but one using this mechanism would be.
* Re: [RFC][PATCH] mm: madvise: MADV_POPULATE for quick pre-faulting
From: Zheng Liu @ 2013-06-29 2:20 UTC
To: Dave Hansen; +Cc: linux-mm, linux-kernel
On 06/28/2013 11:48 PM, Dave Hansen wrote:
> On 06/27/2013 10:47 PM, Zheng Liu wrote:
>>> I've been doing some testing involving large amounts of
>>> page cache. It's quite painful to get hundreds of GB
>>> of page cache mapped in, especially when I am trying to
>>> do it in parallel threads. This is true even when the
>>> page cache is already allocated and I only need to map
>>> it in. The test:
>>>
>>> 1. take 160 16MB files
>>> 2. clone 160 threads, mmap the 16MB files, and either
>>> a. walk through the file touching each page
>>
>> Why not change MAP_POPULATE flag in mmap(2)? Now it is only for private
>> mappings. But maybe we could let it support shared mapping.
>
> Adding that support to mmap() will certainly _help_ some folks. But,
> anything that mmap()s something is taking mmap_sem for write. That
> means that threaded apps doing mmap()/munmap() frequently are _not_
> scalable.
>
> IOW, a process needing to do a bunch of MAP_POPULATEs isn't
> parallelizable, but one using this mechanism would be.
I looked at the code, and it seems that the MAP_POPULATE flag is
handled after mmap_sem has been released, in vm_mmap_pgoff():
    down_write(&mm->mmap_sem);
    ret = do_mmap_pgoff(file, addr, len, prot, flag, pgoff,
                        &populate);
    up_write(&mm->mmap_sem);
    if (populate)
        mm_populate(ret, populate);
Am I missing something?
Regards,
- Zheng
* Re: [RFC][PATCH] mm: madvise: MADV_POPULATE for quick pre-faulting
From: Dave Hansen @ 2013-07-01 16:16 UTC
To: Zheng Liu; +Cc: linux-mm, linux-kernel
On 06/28/2013 07:20 PM, Zheng Liu wrote:
>> > IOW, a process needing to do a bunch of MAP_POPULATEs isn't
>> > parallelizable, but one using this mechanism would be.
> I look at the code, and it seems that we will handle MAP_POPULATE flag
> after we release mmap_sem locking in vm_mmap_pgoff():
>
> down_write(&mm->mmap_sem);
> ret = do_mmap_pgoff(file, addr, len, prot, flag, pgoff,
> &populate);
> up_write(&mm->mmap_sem);
> if (populate)
> mm_populate(ret, populate);
>
> Am I missing something?
I reran my test, comparing an mmap(MAP_POPULATE)/munmap() pair
against MADV_POPULATE, with 160 threads running in parallel.
MADV_POPULATE was about 10x faster in the threaded configuration.
With MADV_POPULATE, the biggest cost is shipping the mmap_sem cacheline
around so that we can write the reader count update into it. With
mmap(), there is a lot of _contention_ on that lock, which is much, much
more expensive than simply bouncing a cacheline around.
* Re: [RFC][PATCH] mm: madvise: MADV_POPULATE for quick pre-faulting
From: Zheng Liu @ 2013-07-02 2:37 UTC
To: Dave Hansen; +Cc: linux-mm, linux-kernel
On Mon, Jul 01, 2013 at 09:16:46AM -0700, Dave Hansen wrote:
> On 06/28/2013 07:20 PM, Zheng Liu wrote:
> >> > IOW, a process needing to do a bunch of MAP_POPULATEs isn't
> >> > parallelizable, but one using this mechanism would be.
> > I look at the code, and it seems that we will handle MAP_POPULATE flag
> > after we release mmap_sem locking in vm_mmap_pgoff():
> >
> > down_write(&mm->mmap_sem);
> > ret = do_mmap_pgoff(file, addr, len, prot, flag, pgoff,
> > &populate);
> > up_write(&mm->mmap_sem);
> > if (populate)
> > mm_populate(ret, populate);
> >
> > Am I missing something?
>
> I went and did my same test using mmap(MAP_POPULATE)/munmap() pair
> versus using MADV_POPULATE in 160 threads in parallel.
>
> MADV_POPULATE was about 10x faster in the threaded configuration.
>
> With MADV_POPULATE, the biggest cost is shipping the mmap_sem cacheline
> around so that we can write the reader count update in to it. With
> mmap(), there is a lot of _contention_ on that lock which is much, much
> more expensive than simply bouncing a cacheline around.
Thanks for your explanation.
FWIW, it would be great if we could make the MAP_POPULATE flag support
shared mappings, because in our production system there are a lot of
applications that use mmap(2) and then pre-fault the mapping. Currently
these applications need to pre-fault the mapping manually.
Regards,
- Zheng
* Re: [RFC][PATCH] mm: madvise: MADV_POPULATE for quick pre-faulting
From: Dave Hansen @ 2013-07-02 4:43 UTC
To: linux-mm, linux-kernel
On 07/01/2013 07:37 PM, Zheng Liu wrote:
> FWIW, it would be great if we can let MAP_POPULATE flag support shared
> mappings because in our product system there has a lot of applications
> that uses mmap(2) and then pre-faults this mapping. Currently these
> applications need to pre-fault the mapping manually.
Are you sure it doesn't? From a cursory look at the code, it looked to
me like it would populate both anonymous and file-backed mappings, but I
didn't double-check experimentally.
* Re: [RFC][PATCH] mm: madvise: MADV_POPULATE for quick pre-faulting
From: Zheng Liu @ 2013-07-02 6:06 UTC
To: Dave Hansen; +Cc: linux-mm, linux-kernel
On Mon, Jul 01, 2013 at 09:43:29PM -0700, Dave Hansen wrote:
> On 07/01/2013 07:37 PM, Zheng Liu wrote:
> > FWIW, it would be great if we can let MAP_POPULATE flag support shared
> > mappings because in our product system there has a lot of applications
> > that uses mmap(2) and then pre-faults this mapping. Currently these
> > applications need to pre-fault the mapping manually.
>
> Are you sure it doesn't? From a cursory look at the code, it looked to
> me like it would populate anonymous and file-backed, but I didn't
> double-check experimentally.
Thanks for pointing that out. I wrote a program to test this, and it
does seem to populate a shared mapping. But the man page describes it
as follows:
MAP_POPULATE (since Linux 2.5.46)
Populate (prefault) page tables for a mapping. For a file mapping,
this causes read-ahead on the file. Later accesses to the mapping
will not be blocked by page faults. MAP_POPULATE is only supported
for private mappings since Linux 2.6.23.
This page is part of release 3.24 of the Linux man-pages project. I am
not sure whether it has been updated or not.
Regards,
- Zheng
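A check along those lines can be written against /proc/self/pagemap, where bit 63 of each 64-bit per-page entry means "present in the page tables". The sketch below is not Zheng's program: the function names are hypothetical, it only looks at the state right after mmap() returns, and it returns -1 when pagemap cannot be read (as in some sandboxes, or on kernels that restrict it to privileged users).

```c
#define _GNU_SOURCE
#include <assert.h>
#include <fcntl.h>
#include <stdint.h>
#include <sys/mman.h>
#include <unistd.h>

/* Count how many of `npages` pages starting at `addr` are present. */
static long count_present(const void *addr, long npages)
{
    long page = sysconf(_SC_PAGESIZE);
    int fd = open("/proc/self/pagemap", O_RDONLY);
    if (fd < 0)
        return -1;
    uintptr_t first = (uintptr_t)addr / (uintptr_t)page;
    long present = 0;
    for (long i = 0; i < npages; i++) {
        uint64_t entry;
        off_t off = (off_t)((first + (uintptr_t)i) * sizeof(entry));
        if (pread(fd, &entry, sizeof(entry), off) != (ssize_t)sizeof(entry)) {
            close(fd);
            return -1;
        }
        if (entry >> 63)           /* bit 63: page present */
            present++;
    }
    close(fd);
    return present;
}

/*
 * Hypothetical test: map the first `npages` pages of an open file with
 * MAP_SHARED | MAP_POPULATE and report how many are present right after
 * mmap() returns, before anything touches them.  The file must be at
 * least `npages` pages long.
 */
static long shared_populate_present(int fd, long npages)
{
    long page = sysconf(_SC_PAGESIZE);
    size_t len = (size_t)npages * (size_t)page;
    void *p = mmap(NULL, len, PROT_READ, MAP_SHARED | MAP_POPULATE, fd, 0);
    if (p == MAP_FAILED)
        return -1;
    long present = count_present(p, npages);
    munmap(p, len);
    return present;
}
```

Without MAP_POPULATE, the same count would be expected to be 0, since nothing has touched the mapping yet.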
* Re: [RFC][PATCH] mm: madvise: MADV_POPULATE for quick pre-faulting
From: Sam Ben @ 2013-07-14 3:12 UTC
To: Dave Hansen, linux-mm, linux-kernel
On 07/02/2013 10:37 AM, Zheng Liu wrote:
> FWIW, it would be great if we can let MAP_POPULATE flag support shared
> mappings because in our product system there has a lot of applications
> that uses mmap(2) and then pre-faults this mapping. Currently these
> applications need to pre-fault the mapping manually.
How do you pre-fault the mapping manually in your production system? By
walking through the file and touching each page?
* Re: [RFC][PATCH] mm: madvise: MADV_POPULATE for quick pre-faulting
From: Zheng Liu @ 2013-07-15 0:22 UTC
To: Sam Ben; +Cc: Dave Hansen, linux-mm, linux-kernel
On Sun, Jul 14, 2013 at 11:12:58AM +0800, Sam Ben wrote:
> On 07/02/2013 10:37 AM, Zheng Liu wrote:
> >FWIW, it would be great if we can let MAP_POPULATE flag support shared
> >mappings because in our product system there has a lot of applications
> >that uses mmap(2) and then pre-faults this mapping. Currently these
> >applications need to pre-fault the mapping manually.
>
> How do you pre-fault the mapping manually in your product system? By
> walking through the file touching each page?
Yes, most applications in our production system do it this way.
Regards,
- Zheng