* [RFC PATCH] mm: madvise: Ignore repeated MADV_DONTNEED hints
@ 2015-02-02 16:55 ` Mel Gorman
  0 siblings, 0 replies; 76+ messages in thread
From: Mel Gorman @ 2015-02-02 16:55 UTC (permalink / raw)
  To: linux-mm; +Cc: Minchan Kim, Vlastimil Babka, Andrew Morton, linux-kernel

glibc malloc changed behaviour in glibc 2.10 to have per-thread arenas
instead of creating new arenas if the existing ones were contended.
The decision appears to have been made so the allocator scales better, but the
downside is that madvise(MADV_DONTNEED) is now called for these per-thread
arenas during free. This tears down pages that would previously have
remained. There is nothing wrong with this decision from a functional point
of view, but any threaded application that frequently allocates and frees the
same-sized region is going to incur the full teardown and refault costs.

This is extremely obvious in the ebizzy benchmark. At its core, threads are
frequently freeing and allocating buffers of the same size. It is much faster
on distributions with older versions of glibc. Profiles showed that a large
amount of system CPU time was spent on tearing down and refaulting pages.

This patch identifies when a thread is frequently calling MADV_DONTNEED
on the same region of memory and starts ignoring the hint. On an 8-core
single-socket machine, this was the impact on ebizzy using glibc 2.19:

ebizzy Overall Throughput
                            3.19.0-rc6            3.19.0-rc6
                               vanilla          madvise-v1r1
Hmean    Rsec-1     12619.93 (  0.00%)    34807.02 (175.81%)
Hmean    Rsec-3     33434.19 (  0.00%)   100733.77 (201.29%)
Hmean    Rsec-5     45796.68 (  0.00%)   134257.34 (193.16%)
Hmean    Rsec-7     53146.93 (  0.00%)   145512.85 (173.79%)
Hmean    Rsec-12    55132.87 (  0.00%)   145560.86 (164.02%)
Hmean    Rsec-18    54846.52 (  0.00%)   145120.79 (164.59%)
Hmean    Rsec-24    54368.95 (  0.00%)   142733.89 (162.53%)
Hmean    Rsec-30    54388.86 (  0.00%)   141424.09 (160.02%)
Hmean    Rsec-32    54047.11 (  0.00%)   139151.76 (157.46%)

And the system CPU usage was also much reduced:

          3.19.0-rc6   3.19.0-rc6
             vanilla madvise-v1r1
User         2647.19      8347.26
System       5742.90        42.42
Elapsed      1350.60      1350.65

It's even more ridiculous on a 4-socket machine:

ebizzy Overall Throughput
                             3.19.0-rc6             3.19.0-rc6
                                vanilla           madvise-v1r1
Hmean    Rsec-1       5354.37 (  0.00%)    12838.61 (139.78%)
Hmean    Rsec-4      10338.41 (  0.00%)    50514.52 (388.61%)
Hmean    Rsec-7       7766.33 (  0.00%)    88555.30 (1040.25%)
Hmean    Rsec-12      7188.40 (  0.00%)   154180.78 (2044.86%)
Hmean    Rsec-21      7001.82 (  0.00%)   266555.51 (3706.95%)
Hmean    Rsec-30      8975.08 (  0.00%)   314369.88 (3402.70%)
Hmean    Rsec-48     12136.53 (  0.00%)   358525.74 (2854.10%)
Hmean    Rsec-79     12607.37 (  0.00%)   341646.49 (2609.89%)
Hmean    Rsec-110    12563.37 (  0.00%)   338058.65 (2590.83%)
Hmean    Rsec-141    11701.85 (  0.00%)   331255.78 (2730.80%)
Hmean    Rsec-172    10987.39 (  0.00%)   312003.62 (2739.65%)
Hmean    Rsec-192    12050.46 (  0.00%)   296401.88 (2359.67%)

          3.19.0-rc6   3.19.0-rc6
             vanilla madvise-v1r1
User         4136.44     53506.65
System      50262.68       906.49
Elapsed      1802.07      1801.99

Note in both cases that the elapsed time is similar because the benchmark
is configured to run for a fixed duration.

MADV_FREE would have a lower cost if the underlying allocator used it, but
there is no guarantee that allocators will use it. Arguably the kernel
has no business preventing an application developer from shooting themselves
in the foot, but this is a case where it's relatively easy to detect the bad
behaviour and avoid it.

Signed-off-by: Mel Gorman <mgorman@suse.de>
---
 fs/exec.c             |  4 ++++
 include/linux/sched.h |  5 +++++
 kernel/fork.c         |  5 +++++
 mm/madvise.c          | 56 +++++++++++++++++++++++++++++++++++++++++++++++++++
 4 files changed, 70 insertions(+)

diff --git a/fs/exec.c b/fs/exec.c
index ad8798e26be9..5c691fcc32f4 100644
--- a/fs/exec.c
+++ b/fs/exec.c
@@ -1551,6 +1551,10 @@ static int do_execveat_common(int fd, struct filename *filename,
 	current->in_execve = 0;
 	acct_update_integrals(current);
 	task_numa_free(current);
+	if (current->madvise_state) {
+		kfree(current->madvise_state);
+		current->madvise_state = NULL;
+	}
 	free_bprm(bprm);
 	kfree(pathbuf);
 	putname(filename);
diff --git a/include/linux/sched.h b/include/linux/sched.h
index 8db31ef98d2f..b6706bdb27fd 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -1271,6 +1271,9 @@ enum perf_event_task_context {
 	perf_nr_task_contexts,
 };
 
+/* mm/madvise.c */
+struct madvise_state_info;
+
 struct task_struct {
 	volatile long state;	/* -1 unrunnable, 0 runnable, >0 stopped */
 	void *stack;
@@ -1637,6 +1640,8 @@ struct task_struct {
 
 	struct page_frag task_frag;
 
+	struct madvise_state_info *madvise_state;
+
 #ifdef	CONFIG_TASK_DELAY_ACCT
 	struct task_delay_info *delays;
 #endif
diff --git a/kernel/fork.c b/kernel/fork.c
index 4dc2ddade9f1..6d8dd1379240 100644
--- a/kernel/fork.c
+++ b/kernel/fork.c
@@ -246,6 +246,11 @@ void __put_task_struct(struct task_struct *tsk)
 	delayacct_tsk_free(tsk);
 	put_signal_struct(tsk->signal);
 
+	if (tsk->madvise_state) {
+		kfree(tsk->madvise_state);
+		tsk->madvise_state = NULL;
+	}
+
 	if (!profile_handoff_task(tsk))
 		free_task(tsk);
 }
diff --git a/mm/madvise.c b/mm/madvise.c
index a271adc93289..907bb0922711 100644
--- a/mm/madvise.c
+++ b/mm/madvise.c
@@ -19,6 +19,7 @@
 #include <linux/blkdev.h>
 #include <linux/swap.h>
 #include <linux/swapops.h>
+#include <linux/vmacache.h>
 
 /*
  * Any behaviour which results in changes to the vma->vm_flags needs to
@@ -251,6 +252,57 @@ static long madvise_willneed(struct vm_area_struct *vma,
 	return 0;
 }
 
+#define MADVISE_HASH		VMACACHE_HASH
+#define MADVISE_STATE_SIZE	VMACACHE_SIZE
+#define MADVISE_THRESHOLD	8
+
+struct madvise_state_info {
+	unsigned long start;
+	unsigned long end;
+	int count;
+	unsigned long jiffies;
+};
+
+/* Returns true if userspace is continually dropping the same address range */
+static bool ignore_madvise_hint(unsigned long start, unsigned long end)
+{
+	int i;
+
+	if (!current->madvise_state)
+		current->madvise_state = kzalloc(sizeof(struct madvise_state_info) * MADVISE_STATE_SIZE, GFP_KERNEL);
+	if (!current->madvise_state)
+		return false;
+
+	i = VMACACHE_HASH(start);
+	if (current->madvise_state[i].start != start ||
+	    current->madvise_state[i].end != end) {
+		/* cache miss */
+		current->madvise_state[i].start = start;
+		current->madvise_state[i].end = end;
+		current->madvise_state[i].count = 0;
+		current->madvise_state[i].jiffies = jiffies;
+	} else {
+		/* cache hit */
+		unsigned long reset = current->madvise_state[i].jiffies + HZ;
+		if (time_after(jiffies, reset)) {
+			/*
+			 * If it is a second since the last madvise on this
+			 * range or since madvise hints got ignored then reset
+			 * the counts and apply the hint again.
+			 */
+			current->madvise_state[i].count = 0;
+			current->madvise_state[i].jiffies = jiffies;
+		} else
+			current->madvise_state[i].count++;
+
+		if (current->madvise_state[i].count > MADVISE_THRESHOLD)
+			return true;
+		current->madvise_state[i].jiffies = jiffies;
+	}
+
+	return false;
+}
+
 /*
  * Application no longer needs these pages.  If the pages are dirty,
  * it's OK to just throw them away.  The app will be more careful about
@@ -278,6 +330,10 @@ static long madvise_dontneed(struct vm_area_struct *vma,
 	if (vma->vm_flags & (VM_LOCKED|VM_HUGETLB|VM_PFNMAP))
 		return -EINVAL;
 
+	/* Ignore hint if madvise is continually dropping the same range */
+	if (ignore_madvise_hint(start, end))
+		return 0;
+
 	if (unlikely(vma->vm_flags & VM_NONLINEAR)) {
 		struct zap_details details = {
 			.nonlinear_vma = vma,


* Re: [RFC PATCH] mm: madvise: Ignore repeated MADV_DONTNEED hints
  2015-02-02 16:55 ` Mel Gorman
@ 2015-02-02 22:05   ` Andrew Morton
  -1 siblings, 0 replies; 76+ messages in thread
From: Andrew Morton @ 2015-02-02 22:05 UTC (permalink / raw)
  To: Mel Gorman; +Cc: linux-mm, Minchan Kim, Vlastimil Babka, linux-kernel

On Mon, 2 Feb 2015 16:55:25 +0000 Mel Gorman <mgorman@suse.de> wrote:

> glibc malloc changed behaviour in glibc 2.10 to have per-thread arenas
> instead of creating new arenas if the existing ones were contended.
> The decision appears to have been made so the allocator scales better but the
> downside is that madvise(MADV_DONTNEED) is now called for these per-thread
> arenas during free. This tears down pages that would have previously
> remained. There is nothing wrong with this decision from a functional point
> of view but any threaded application that frequently allocates/frees the
> same-sized region is going to incur the full teardown and refault costs.

MADV_DONTNEED has been there for many years.  How could this problem
not have been noticed during glibc 2.10 development/testing?  Is there
some more recent kernel change which is triggering this?

> This patch identifies when a thread is frequently calling MADV_DONTNEED
> on the same region of memory and starts ignoring the hint.

That's pretty nasty-looking :(

And presumably there are all sorts of behaviours which will still
trigger the problem but which will avoid the start/end equality test in
ignore_madvise_hint()?

Really, this is a glibc problem and only a glibc problem. 
MADV_DONTNEED is unavoidably expensive and glibc is calling
MADV_DONTNEED for a region which it *does* need.  Is there something
preventing this from being addressed within glibc?


* Re: [RFC PATCH] mm: madvise: Ignore repeated MADV_DONTNEED hints
  2015-02-02 22:05   ` Andrew Morton
@ 2015-02-02 22:18     ` Mel Gorman
  -1 siblings, 0 replies; 76+ messages in thread
From: Mel Gorman @ 2015-02-02 22:18 UTC (permalink / raw)
  To: Andrew Morton; +Cc: linux-mm, Minchan Kim, Vlastimil Babka, linux-kernel

On Mon, Feb 02, 2015 at 02:05:06PM -0800, Andrew Morton wrote:
> On Mon, 2 Feb 2015 16:55:25 +0000 Mel Gorman <mgorman@suse.de> wrote:
> 
> > glibc malloc changed behaviour in glibc 2.10 to have per-thread arenas
> > instead of creating new arenas if the existing ones were contended.
> > The decision appears to have been made so the allocator scales better but the
> > downside is that madvise(MADV_DONTNEED) is now called for these per-thread
> > arenas during free. This tears down pages that would have previously
> > remained. There is nothing wrong with this decision from a functional point
> > of view but any threaded application that frequently allocates/frees the
> > same-sized region is going to incur the full teardown and refault costs.
> 
> MADV_DONTNEED has been there for many years.  How could this problem
> not have been noticed during glibc 2.10 development/testing? 

I do not know. I only spotted it due to switching distributions. Looping
allocations and frees of the same sizes is considered inefficient and it
might have been dismissed on those grounds. It's probably less noticeable
when it only affects threaded applications.

> Is there
> some more recent kernel change which is triggering this?
> 

Not that I'm aware of.

> > This patch identifies when a thread is frequently calling MADV_DONTNEED
> > on the same region of memory and starts ignoring the hint.
> 
> That's pretty nasty-looking :(
> 

Yep, it is, but we're very limited in terms of what we can do within the
kernel here.

> And presumably there are all sorts of behaviours which will still
> trigger the problem but which will avoid the start/end equality test in
> ignore_madvise_hint()?
> 

Yes. I would expect that a simple pattern of multiple allocs followed by
multiple frees in a loop would also trigger it.

> Really, this is a glibc problem and only a glibc problem. 
> MADV_DONTNEED is unavoidably expensive and glibc is calling
> MADV_DONTNEED for a region which it *does* need. 

To be fair to glibc, it calls it on a region it *thinks* it doesn't need,
only to reuse it immediately afterwards because of how the benchmark is
implemented.

> Is there something
> preventing this from being addressed within glibc?
 
I doubt it, other than that I expect they'll punt it back and blame either
the application for being stupid or the kernel for being slow.

-- 
Mel Gorman
SUSE Labs

* Re: [RFC PATCH] mm: madvise: Ignore repeated MADV_DONTNEED hints
  2015-02-02 16:55 ` Mel Gorman
@ 2015-02-02 22:22   ` Dave Hansen
  -1 siblings, 0 replies; 76+ messages in thread
From: Dave Hansen @ 2015-02-02 22:22 UTC (permalink / raw)
  To: Mel Gorman, linux-mm
  Cc: Minchan Kim, Vlastimil Babka, Andrew Morton, linux-kernel

On 02/02/2015 08:55 AM, Mel Gorman wrote:
> This patch identifies when a thread is frequently calling MADV_DONTNEED
> on the same region of memory and starts ignoring the hint. On an 8-core
> single-socket machine this was the impact on ebizzy using glibc 2.19.

The manpage, at least, claims that we zero-fill after MADV_DONTNEED is
called:

>      MADV_DONTNEED
>               Do not expect access in the near future.  (For the time
>               being, the application is finished with the given range,
>               so the kernel can free resources associated with it.)
>               Subsequent accesses of pages in this range will succeed,
>               but will result either in reloading of the memory
>               contents from the underlying mapped file (see mmap(2))
>               or zero-fill-on-demand pages for mappings without an
>               underlying file.

So if we have anything depending on the behavior that it's _always_
zero-filled after an MADV_DONTNEED, this will break it.

* Re: [RFC PATCH] mm: madvise: Ignore repeated MADV_DONTNEED hints
  2015-02-02 22:18     ` Mel Gorman
@ 2015-02-02 22:35       ` Andrew Morton
  -1 siblings, 0 replies; 76+ messages in thread
From: Andrew Morton @ 2015-02-02 22:35 UTC (permalink / raw)
  To: Mel Gorman; +Cc: linux-mm, Minchan Kim, Vlastimil Babka, linux-kernel

On Mon, 2 Feb 2015 22:18:24 +0000 Mel Gorman <mgorman@suse.de> wrote:

> > Is there something
> > preventing this from being addressed within glibc?
>  
> I doubt it other than I expect they'll punt it back and blame either the
> application for being stupid or the kernel for being slow.

*Is* the application being stupid?  What is it actually doing? 
Something like

pthread_routine()
{
	p = malloc(X);
	do_some(work);
	free(p);
	return;
}

?

If so, that doesn't seem stupid?

* Re: [RFC PATCH] mm: madvise: Ignore repeated MADV_DONTNEED hints
  2015-02-02 22:35       ` Andrew Morton
@ 2015-02-03  0:26         ` Davidlohr Bueso
  -1 siblings, 0 replies; 76+ messages in thread
From: Davidlohr Bueso @ 2015-02-03  0:26 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Mel Gorman, linux-mm, Minchan Kim, Vlastimil Babka, linux-kernel

On Mon, 2015-02-02 at 14:35 -0800, Andrew Morton wrote:
> On Mon, 2 Feb 2015 22:18:24 +0000 Mel Gorman <mgorman@suse.de> wrote:
> 
> > > Is there something
> > > preventing this from being addressed within glibc?
> >  
> > I doubt it other than I expect they'll punt it back and blame either the
> > application for being stupid or the kernel for being slow.
> 
> *Is* the application being stupid?  What is it actually doing? 
> Something like
> 
> pthread_routine()
> {
> 	p = malloc(X);
> 	do_some(work);
> 	free(p);

Ebizzy adds a time-based loop in there. But yeah, pretty much a standard
pthread model.


* MADV_DONTNEED semantics? Was: [RFC PATCH] mm: madvise: Ignore repeated MADV_DONTNEED hints
  2015-02-02 22:22   ` Dave Hansen
@ 2015-02-03  8:19     ` Vlastimil Babka
  -1 siblings, 0 replies; 76+ messages in thread
From: Vlastimil Babka @ 2015-02-03  8:19 UTC (permalink / raw)
  To: Dave Hansen, Mel Gorman, linux-mm
  Cc: Minchan Kim, Andrew Morton, linux-kernel, linux-api,
	mtk.manpages, linux-man

[CC linux-api, man pages]

On 02/02/2015 11:22 PM, Dave Hansen wrote:
> On 02/02/2015 08:55 AM, Mel Gorman wrote:
>> This patch identifies when a thread is frequently calling MADV_DONTNEED
>> on the same region of memory and starts ignoring the hint. On an 8-core
>> single-socket machine this was the impact on ebizzy using glibc 2.19.
> 
> The manpage, at least, claims that we zero-fill after MADV_DONTNEED is
> called:
> 
>>      MADV_DONTNEED
>>               Do  not  expect  access in the near future.  (For the time being, the application is finished with the given range, so the kernel can free resources
>>               associated with it.)  Subsequent accesses of pages in this range will succeed, but will result either in reloading of the memory contents  from  the
>>               underlying mapped file (see mmap(2)) or zero-fill-on-demand pages for mappings without an underlying file.
> 
> So if we have anything depending on the behavior that it's _always_
> zero-filled after an MADV_DONTNEED, this will break it.

OK, so that's a third person (including me) who understood it as a zero-fill
guarantee. I think the man page should be clarified (if it's indeed not
guaranteed), or we have a bug.
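For reference, the zero-fill behaviour being read into the man page can be checked from user space with a sketch like the following. It assumes Linux and a private anonymous mapping, and the helper name is hypothetical. A kernel that ignored the hint without clearing the pages would make it return 0.

```c
#define _GNU_SOURCE
#include <stddef.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

/* Returns 1 if every byte of a private anonymous mapping reads back as zero
 * after MADV_DONTNEED, 0 if stale data survived, -1 on a syscall failure. */
static int dontneed_zero_fills(size_t npages)
{
	long page = sysconf(_SC_PAGESIZE);
	size_t len = npages * (size_t)page;
	unsigned char *p = mmap(NULL, len, PROT_READ | PROT_WRITE,
	                        MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
	if (p == MAP_FAILED)
		return -1;

	memset(p, 0x5a, len); /* fault the pages in with non-zero data */
	if (madvise(p, len, MADV_DONTNEED) != 0) {
		munmap(p, len);
		return -1;
	}

	int zeroed = 1;
	for (size_t i = 0; i < len; i++) { /* these reads refault the pages */
		if (p[i] != 0) {
			zeroed = 0;
			break;
		}
	}
	munmap(p, len);
	return zeroed;
}
```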

The implementation actually skips MADV_DONTNEED for
VM_LOCKED|VM_HUGETLB|VM_PFNMAP vma's.

I'm not sure about VM_PFNMAP, these are probably special enough. For mlock, one
could expect that mlocking and MADV_DONTNEED would be in some opposition, but
it's not documented in the manpage AFAIK. Neither is the hugetlb case, which
could be really unexpected by the user.

Next, what the man page says about guarantees:

"The kernel is free to ignore the advice."

- that would suggest that nothing is guaranteed

"This call does not influence the semantics of the application (except in the
case of MADV_DONTNEED)"

- that depends on whether the reader understands it as "does influence by MADV_DONTNEED"
or "may influence by MADV_DONTNEED"

- btw, isn't MADV_DONTFORK another exception that does influence the semantics?
And since it's mentioned as a workaround for some hardware, is it OK to ignore
this advice?

And the part you already cited:

"Subsequent accesses of pages in this range will succeed, but will result either
in reloading of the memory contents from the underlying mapped file (see
mmap(2)) or zero-fill on-demand pages for mappings without an underlying file."

- The words "will result" did sound like a guarantee, at least to me. So here it
could be changed to "may result (unless the advice is ignored)"?

And if we agree that there is indeed no guarantee, what's the actual semantic
difference from MADV_FREE? I guess none? So there's only a possible performance
difference?

Vlastimil


^ permalink raw reply	[flat|nested] 76+ messages in thread

* Re: [RFC PATCH] mm: madvise: Ignore repeated MADV_DONTNEED hints
  2015-02-02 22:22   ` Dave Hansen
@ 2015-02-03  9:47     ` Mel Gorman
  -1 siblings, 0 replies; 76+ messages in thread
From: Mel Gorman @ 2015-02-03  9:47 UTC (permalink / raw)
  To: Dave Hansen
  Cc: linux-mm, Minchan Kim, Vlastimil Babka, Andrew Morton, linux-kernel

On Mon, Feb 02, 2015 at 02:22:36PM -0800, Dave Hansen wrote:
> On 02/02/2015 08:55 AM, Mel Gorman wrote:
> > This patch identifies when a thread is frequently calling MADV_DONTNEED
> > on the same region of memory and starts ignoring the hint. On an 8-core
> > single-socket machine this was the impact on ebizzy using glibc 2.19.
> 
> The manpage, at least, claims that we zero-fill after MADV_DONTNEED is
> called:
> 

It also claims that the kernel is free to ignore the advice.

> >      MADV_DONTNEED
> >               Do  not  expect  access in the near future.  (For the time being, the application is finished with the given range, so the kernel can free resources
> >               associated with it.)  Subsequent accesses of pages in this range will succeed, but will result either in reloading of the memory contents  from  the
> >               underlying mapped file (see mmap(2)) or zero-fill-on-demand pages for mappings without an underlying file.
> 
> So if we have anything depending on the behavior that it's _always_
> zero-filled after an MADV_DONTNEED, this will break it.

True. I'd be surprised if any application depended on that but to be safe,
an ignored hint could clear the pages. It would still be cheaper than a
full teardown and refault.
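A user-space analogue of the two options contrasted here, assuming a private anonymous mapping: drop the pages with MADV_DONTNEED, or keep them mapped and zero them in place. discard_range() and all_zero() are hypothetical helpers. The sketch only shows that both leave the application seeing zeros; it does not model the kernel patch or measure the relative cost.

```c
#define _GNU_SOURCE
#include <stdbool.h>
#include <stddef.h>
#include <string.h>
#include <sys/mman.h>

/* Hypothetical helper: discard the contents of [p, p + len).
 * keep_pages == false: drop the pages; the next access refaults zero pages.
 * keep_pages == true:  keep the pages mapped and clear them in place, which
 *                      is how an "ignored but cleared" hint would look to
 *                      the application. */
static int discard_range(void *p, size_t len, bool keep_pages)
{
	if (keep_pages) {
		memset(p, 0, len);
		return 0;
	}
	return madvise(p, len, MADV_DONTNEED);
}

/* Returns true if every byte in the range is zero. */
static bool all_zero(const unsigned char *p, size_t len)
{
	for (size_t i = 0; i < len; i++)
		if (p[i] != 0)
			return false;
	return true;
}
```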

-- 
Mel Gorman
SUSE Labs

^ permalink raw reply	[flat|nested] 76+ messages in thread

* Re: [RFC PATCH] mm: madvise: Ignore repeated MADV_DONTNEED hints
  2015-02-03  9:47     ` Mel Gorman
@ 2015-02-03 10:47       ` Kirill A. Shutemov
  -1 siblings, 0 replies; 76+ messages in thread
From: Kirill A. Shutemov @ 2015-02-03 10:47 UTC (permalink / raw)
  To: Mel Gorman
  Cc: Dave Hansen, linux-mm, Minchan Kim, Vlastimil Babka,
	Andrew Morton, linux-kernel

On Tue, Feb 03, 2015 at 09:47:18AM +0000, Mel Gorman wrote:
> On Mon, Feb 02, 2015 at 02:22:36PM -0800, Dave Hansen wrote:
> > On 02/02/2015 08:55 AM, Mel Gorman wrote:
> > > This patch identifies when a thread is frequently calling MADV_DONTNEED
> > > on the same region of memory and starts ignoring the hint. On an 8-core
> > > single-socket machine this was the impact on ebizzy using glibc 2.19.
> > 
> > The manpage, at least, claims that we zero-fill after MADV_DONTNEED is
> > called:
> > 
> 
> It also claims that the kernel is free to ignore the advice.
> 
> > >      MADV_DONTNEED
> > >               Do  not  expect  access in the near future.  (For the time being, the application is finished with the given range, so the kernel can free resources
> > >               associated with it.)  Subsequent accesses of pages in this range will succeed, but will result either in reloading of the memory contents  from  the
> > >               underlying mapped file (see mmap(2)) or zero-fill-on-demand pages for mappings without an underlying file.
> > 
> > So if we have anything depending on the behavior that it's _always_
> > zero-filled after an MADV_DONTNEED, this will break it.
> 
> True. I'd be surprised if any application depended on that 

IIUC, jemalloc depends on this[1].

[1] https://github.com/jemalloc/jemalloc/blob/dev/src/chunk_mmap.c#L117
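As a hypothetical illustration of how an allocator can come to depend on the zero-fill (not a copy of the jemalloc code linked above): once a chunk has been purged with MADV_DONTNEED, it is marked as known-zero, and a later calloc-style allocation skips the memset. struct chunk, chunk_purge() and chunk_alloc_zeroed() are invented names.

```c
#define _GNU_SOURCE
#include <stdbool.h>
#include <stddef.h>
#include <string.h>
#include <sys/mman.h>

/* Hypothetical chunk descriptor; not jemalloc's actual data structure. */
struct chunk {
	unsigned char *addr;
	size_t len;
	bool zeroed; /* contents believed to be all zero */
};

/* Return the chunk's pages to the kernel. The chunk is recorded as zeroed
 * only because MADV_DONTNEED is trusted to zero-fill private anonymous
 * memory on the next touch. */
static int chunk_purge(struct chunk *c)
{
	if (madvise(c->addr, c->len, MADV_DONTNEED) != 0)
		return -1;
	c->zeroed = true;
	return 0;
}

/* calloc-style reuse: skip the memset when the chunk is believed to be
 * zero. If the kernel silently ignored the hint, this would hand out
 * stale data. */
static void *chunk_alloc_zeroed(struct chunk *c)
{
	if (!c->zeroed)
		memset(c->addr, 0, c->len);
	c->zeroed = false; /* the caller is about to dirty it */
	return c->addr;
}
```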

-- 
 Kirill A. Shutemov

^ permalink raw reply	[flat|nested] 76+ messages in thread

* Re: [RFC PATCH] mm: madvise: Ignore repeated MADV_DONTNEED hints
  2015-02-02 22:35       ` Andrew Morton
@ 2015-02-03 10:50         ` Mel Gorman
  -1 siblings, 0 replies; 76+ messages in thread
From: Mel Gorman @ 2015-02-03 10:50 UTC (permalink / raw)
  To: Andrew Morton; +Cc: linux-mm, Minchan Kim, Vlastimil Babka, linux-kernel

On Mon, Feb 02, 2015 at 02:35:41PM -0800, Andrew Morton wrote:
> On Mon, 2 Feb 2015 22:18:24 +0000 Mel Gorman <mgorman@suse.de> wrote:
> 
> > > Is there something
> > > preventing this from being addressed within glibc?
> >  
> > I doubt it other than I expect they'll punt it back and blame either the
> > application for being stupid or the kernel for being slow.
> 
> *Is* the application being stupid?  What is it actually doing? 

Only a little. There is little simulated think time between the allocation
and the subsequent free, so the cost of alloc/free dominates. A "real"
application would either reuse buffers it needs constantly, or its think
time would mask the cost of the free.

> Something like
> 
> pthread_routine()
> {
> 	p = malloc(X);
> 	do_some(work);
> 	free(p);
> 	return;
> }
> 

Pretty much. There is a search_mem() function that

alloc(copy_size)
memcpy
search
free(copy)

A real application might try to avoid the copy or reuse buffers if it
encountered this particular problem.
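The search_mem() outline above could look roughly like this in C. It is a hypothetical reconstruction rather than ebizzy's code, and memchr() stands in for whatever search the benchmark really performs.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical reconstruction: copy a region into a fresh buffer, search
 * the copy, free it. Returns 1 if `key` occurs in src[0..copy_size),
 * 0 if not, -1 if the allocation failed. */
static int search_mem(const unsigned char *src, size_t copy_size, unsigned char key)
{
	unsigned char *copy = malloc(copy_size);          /* alloc(copy_size) */
	if (copy == NULL)
		return -1;
	memcpy(copy, src, copy_size);                     /* memcpy */
	int found = memchr(copy, key, copy_size) != NULL; /* search */
	free(copy);                                       /* free(copy) */
	return found;
}
```

Because the buffer is freed on every call, each iteration pays for a fresh allocation of the same size, which is the cost the thread is discussing.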

-- 
Mel Gorman
SUSE Labs

^ permalink raw reply	[flat|nested] 76+ messages in thread

* Re: MADV_DONTNEED semantics? Was: [RFC PATCH] mm: madvise: Ignore repeated MADV_DONTNEED hints
@ 2015-02-03 10:53       ` Kirill A. Shutemov
  0 siblings, 0 replies; 76+ messages in thread
From: Kirill A. Shutemov @ 2015-02-03 10:53 UTC (permalink / raw)
  To: Vlastimil Babka
  Cc: Dave Hansen, Mel Gorman, linux-mm, Minchan Kim, Andrew Morton,
	linux-kernel, linux-api, mtk.manpages, linux-man

On Tue, Feb 03, 2015 at 09:19:15AM +0100, Vlastimil Babka wrote:
> [CC linux-api, man pages]
> 
> On 02/02/2015 11:22 PM, Dave Hansen wrote:
> > On 02/02/2015 08:55 AM, Mel Gorman wrote:
> >> This patch identifies when a thread is frequently calling MADV_DONTNEED
> >> on the same region of memory and starts ignoring the hint. On an 8-core
> >> single-socket machine this was the impact on ebizzy using glibc 2.19.
> > 
> > The manpage, at least, claims that we zero-fill after MADV_DONTNEED is
> > called:
> > 
> >>      MADV_DONTNEED
> >>               Do  not  expect  access in the near future.  (For the time being, the application is finished with the given range, so the kernel can free resources
> >>               associated with it.)  Subsequent accesses of pages in this range will succeed, but will result either in reloading of the memory contents  from  the
> >>               underlying mapped file (see mmap(2)) or zero-fill-on-demand pages for mappings without an underlying file.
> > 
> > So if we have anything depending on the behavior that it's _always_
> > zero-filled after an MADV_DONTNEED, this will break it.
> 
> OK, so that's a third person (including me) who understood it as a zero-fill
> guarantee. I think the man page should be clarified (if it's indeed not
> guaranteed), or we have a bug.
> 
> The implementation actually skips MADV_DONTNEED for
> VM_LOCKED|VM_HUGETLB|VM_PFNMAP vma's.

It doesn't skip. It fails with -EINVAL. Or am I missing something?
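The -EINVAL behaviour can be probed with a sketch like this, assuming Linux and enough RLIMIT_MEMLOCK to lock one page. The function name is hypothetical. It returns the errno seen by madvise(MADV_DONTNEED) on an mlock()ed page (0 if the call succeeded), or -1 if the mmap() or mlock() itself was refused.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <sys/mman.h>
#include <unistd.h>

static int dontneed_errno_on_mlocked(void)
{
	size_t page = (size_t)sysconf(_SC_PAGESIZE);
	void *p = mmap(NULL, page, PROT_READ | PROT_WRITE,
	               MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
	int result;

	if (p == MAP_FAILED)
		return -1;
	if (mlock(p, page) != 0) { /* e.g. RLIMIT_MEMLOCK too small */
		munmap(p, page);
		return -1;
	}
	/* The VMA now has VM_LOCKED set. */
	result = madvise(p, page, MADV_DONTNEED) == 0 ? 0 : errno;
	munlock(p, page);
	munmap(p, page);
	return result;
}
```

When the mlock() succeeds, the result to expect is EINVAL: the call fails outright rather than being silently skipped.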

> - The words "will result" did sound like a guarantee, at least to me. So here it
> could be changed to "may result (unless the advice is ignored)"?

It's too late to fix the documentation. Applications already depend on the
behaviour.

-- 
 Kirill A. Shutemov

^ permalink raw reply	[flat|nested] 76+ messages in thread

* Re: MADV_DONTNEED semantics? Was: [RFC PATCH] mm: madvise: Ignore repeated MADV_DONTNEED hints
  2015-02-03  8:19     ` Vlastimil Babka
@ 2015-02-03 11:16       ` Mel Gorman
  -1 siblings, 0 replies; 76+ messages in thread
From: Mel Gorman @ 2015-02-03 11:16 UTC (permalink / raw)
  To: Vlastimil Babka
  Cc: Dave Hansen, linux-mm, Minchan Kim, Andrew Morton, linux-kernel,
	linux-api, mtk.manpages, linux-man

On Tue, Feb 03, 2015 at 09:19:15AM +0100, Vlastimil Babka wrote:
> [CC linux-api, man pages]
> 
> On 02/02/2015 11:22 PM, Dave Hansen wrote:
> > On 02/02/2015 08:55 AM, Mel Gorman wrote:
> >> This patch identifies when a thread is frequently calling MADV_DONTNEED
> >> on the same region of memory and starts ignoring the hint. On an 8-core
> >> single-socket machine this was the impact on ebizzy using glibc 2.19.
> > 
> > The manpage, at least, claims that we zero-fill after MADV_DONTNEED is
> > called:
> > 
> >>      MADV_DONTNEED
> >>               Do  not  expect  access in the near future.  (For the time being, the application is finished with the given range, so the kernel can free resources
> >>               associated with it.)  Subsequent accesses of pages in this range will succeed, but will result either in reloading of the memory contents  from  the
> >>               underlying mapped file (see mmap(2)) or zero-fill-on-demand pages for mappings without an underlying file.
> > 
> > So if we have anything depending on the behavior that it's _always_
> > zero-filled after an MADV_DONTNEED, this will break it.
> 
> OK, so that's a third person (including me) who understood it as a zero-fill
> guarantee. I think the man page should be clarified (if it's indeed not
> guaranteed), or we have a bug.
> 
> The implementation actually skips MADV_DONTNEED for
> VM_LOCKED|VM_HUGETLB|VM_PFNMAP vma's.
> 

This was the first reason why I did not consider the zero-filling to be a
guarantee. That said, at this point I'm also not considering pushing this
patch towards the kernel. I agree that this is a glibc bug so I've dropped
a line to some glibc people to see what they think the approach should be.

> I'm not sure about VM_PFNMAP, these are probably special enough. For mlock, one
> could expect that mlocking and MADV_DONTNEED would be in some opposition, but
> it's not documented in the manpage AFAIK. Neither is the hugetlb case, which
> could be really unexpected by the user.
> 

The equivalent POSIX page also lacks details on how exactly this flag
should behave. hugetlb is sort of special in that it's always backed by
a RAM-based file where the contents can be refaulted. It gets hairy when
the mapping has been created to look anonymous but is not really
anonymous. The semantics of hugetlb have always been fuzzy.

> Next, what the man page says about guarantees:
> 
> "The kernel is free to ignore the advice."
> 
> - that would suggest that nothing is guaranteed
> 

Yep, another reason why I did not clear the page when ignoring the hint.

> "This call does not influence the semantics of the application (except in the
> case of MADV_DONTNEED)"
> 
> - that depends on whether the reader understands it as "does influence by MADV_DONTNEED"
> or "may influence by MADV_DONTNEED"
> 
> - btw, isn't MADV_DONTFORK another exception that does influence the semantics?
> And since it's mentioned as a workaround for some hardware, is it OK to ignore
> this advice?
> 

MADV_DONTFORK is also a Linux-specific extension. It happens to be one
that, if ignored, would leave the application very surprised.

> And the part you already cited:
> 
> "Subsequent accesses of pages in this range will succeed, but will result either
> in reloading of the memory contents from the underlying mapped file (see
> mmap(2)) or zero-fill on-demand pages for mappings without an underlying file."
> 
> - The words "will result" did sound like a guarantee, at least to me. So here it
> could be changed to "may result (unless the advice is ignored)"?
> 

The wording should be "may result" as there are circumstances where it
gets ignored even without this prototype patch.

> And if we agree that there is indeed no guarantee, what's the actual semantic
> difference from MADV_FREE? I guess none? So there's only a possible performance
> difference?
> 

Timing. MADV_DONTNEED, if it has an effect, is immediate, is a heavier
operation and reduces RSS. MADV_FREE only has an impact in the future
if there is memory pressure.
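The immediate RSS effect of MADV_DONTNEED can be observed with mincore(), assuming Linux and a small private anonymous mapping. resident_pages() and resident_after_advice() are hypothetical helpers. MADV_NORMAL serves as a no-op control.

```c
#define _GNU_SOURCE
#include <stdlib.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

/* Count how many pages in [p, p + len) are resident, using mincore(). */
static long resident_pages(void *p, size_t len)
{
	size_t page = (size_t)sysconf(_SC_PAGESIZE);
	size_t n = (len + page - 1) / page;
	unsigned char *vec = malloc(n);
	long count = 0;

	if (vec == NULL || mincore(p, len, vec) != 0) {
		free(vec);
		return -1;
	}
	for (size_t i = 0; i < n; i++)
		count += vec[i] & 1; /* bit 0 set means resident */
	free(vec);
	return count;
}

/* Fault in npages of anonymous memory, apply `advice`, and report how many
 * pages are still resident immediately afterwards (-1 on error). */
static long resident_after_advice(size_t npages, int advice)
{
	size_t len = npages * (size_t)sysconf(_SC_PAGESIZE);
	unsigned char *p = mmap(NULL, len, PROT_READ | PROT_WRITE,
	                        MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
	long res;

	if (p == MAP_FAILED)
		return -1;
	memset(p, 1, len);
	res = madvise(p, len, advice) == 0 ? resident_pages(p, len) : -1;
	munmap(p, len);
	return res;
}
```

With MADV_DONTNEED the count drops to zero right away. On kernels that have MADV_FREE (it was merged later, in Linux 4.5), the same call with MADV_FREE will typically still report the pages as resident until memory pressure reclaims them, though that is not guaranteed.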

-- 
Mel Gorman
SUSE Labs

^ permalink raw reply	[flat|nested] 76+ messages in thread

* Re: MADV_DONTNEED semantics? Was: [RFC PATCH] mm: madvise: Ignore repeated MADV_DONTNEED hints
@ 2015-02-03 11:16       ` Mel Gorman
  0 siblings, 0 replies; 76+ messages in thread
From: Mel Gorman @ 2015-02-03 11:16 UTC (permalink / raw)
  To: Vlastimil Babka
  Cc: Dave Hansen, linux-mm, Minchan Kim, Andrew Morton, linux-kernel,
	linux-api, mtk.manpages, linux-man

On Tue, Feb 03, 2015 at 09:19:15AM +0100, Vlastimil Babka wrote:
> [CC linux-api, man pages]
> 
> On 02/02/2015 11:22 PM, Dave Hansen wrote:
> > On 02/02/2015 08:55 AM, Mel Gorman wrote:
> >> This patch identifies when a thread is frequently calling MADV_DONTNEED
> >> on the same region of memory and starts ignoring the hint. On an 8-core
> >> single-socket machine this was the impact on ebizzy using glibc 2.19.
> > 
> > The manpage, at least, claims that we zero-fill after MADV_DONTNEED is
> > called:
> > 
> >>      MADV_DONTNEED
> >>               Do  not  expect  access in the near future.  (For the time being, the application is finished with the given range, so the kernel can free resources
> >>               associated with it.)  Subsequent accesses of pages in this range will succeed, but will result either in reloading of the memory contents  from  the
> >>               underlying mapped file (see mmap(2)) or zero-fill-on-demand pages for mappings without an underlying file.
> > 
> > So if we have anything depending on the behavior that it's _always_
> > zero-filled after an MADV_DONTNEED, this will break it.
> 
> OK, so that's a third person (including me) who understood it as a zero-fill
> guarantee. I think the man page should be clarified (if it's indeed not
> guaranteed), or we have a bug.
> 
> The implementation actually skips MADV_DONTNEED for
> VM_LOCKED|VM_HUGETLB|VM_PFNMAP vma's.
> 

This was the first reason why I did not consider the zero-filling to be a
guarantee. That said, at this point I'm also not considering pushing this
patch towards the kernel. I agree that this is a glibc bug so I've dropped
a line to some glibc people to see what they think the approach should be.

> I'm not sure about VM_PFNMAP, these are probably special enough. For mlock, one
> could expect that mlocking and MADV_DONTNEED would be in some opposition, but
> it's not documented in the manpage AFAIK. Neither is the hugetlb case, which
> could be really unexpected by the user.
> 

The equivalent POSIX page also lacks details on how exactly this flag
should behave. hugetlb is sort of special in that it's always backed by
a RAM-based file whose contents can be refaulted. It gets hairy when
the mapping has been created to look anonymous but is not really
anonymous. The semantics of hugetlb have always been fuzzy.

> Next, what the man page says about guarantees:
> 
> "The kernel is free to ignore the advice."
> 
> - that would suggest that nothing is guaranteed
> 

Yep, another reason why I did not clear the page when ignoring the hint.

> "This call does not influence the semantics of the application (except in the
> case of MADV_DONTNEED)"
> 
> - that depends if the reader understands it as "does influence by MADV_DONTNEED"
> or "may influence by MADV_DONTNEED"
> 
> - btw, isn't MADV_DONTFORK another exception that does influence the semantics?
> And since it's mentioned as a workaround for some hardware, is it OK to ignore
> this advice?
> 

MADV_DONTFORK is also a Linux-specific extension. It happens to be one
where, if the advice were ignored, the application would be very surprised.

> And the part you already cited:
> 
> "Subsequent accesses of pages in this range will succeed, but will result either
> in reloading of the memory contents from the underlying mapped file (see
> mmap(2)) or zero-fill on-demand pages for mappings without an underlying file."
> 
> - The word "will result" did sound as a guarantee at least to me. So here it
> could be changed to "may result (unless the advice is ignored)"?
> 

The wording should be "may result" as there are circumstances where it
gets ignored even without this prototype patch.

> And if we agree that there is indeed no guarantee, what's the actual semantic
> difference from MADV_FREE? I guess none? So there's only a possible perfomance
> difference?
> 

Timing. MADV_DONTNEED, if it has an effect, is immediate, is a heavier
operation, and reduces RSS. MADV_FREE only has an impact in the future,
if there is memory pressure.

-- 
Mel Gorman
SUSE Labs

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 76+ messages in thread

* Re: [RFC PATCH] mm: madvise: Ignore repeated MADV_DONTNEED hints
  2015-02-03 10:47       ` Kirill A. Shutemov
@ 2015-02-03 11:21         ` Mel Gorman
  -1 siblings, 0 replies; 76+ messages in thread
From: Mel Gorman @ 2015-02-03 11:21 UTC (permalink / raw)
  To: Kirill A. Shutemov
  Cc: Dave Hansen, linux-mm, Minchan Kim, Vlastimil Babka,
	Andrew Morton, linux-kernel

On Tue, Feb 03, 2015 at 12:47:56PM +0200, Kirill A. Shutemov wrote:
> On Tue, Feb 03, 2015 at 09:47:18AM +0000, Mel Gorman wrote:
> > On Mon, Feb 02, 2015 at 02:22:36PM -0800, Dave Hansen wrote:
> > > On 02/02/2015 08:55 AM, Mel Gorman wrote:
> > > > This patch identifies when a thread is frequently calling MADV_DONTNEED
> > > > on the same region of memory and starts ignoring the hint. On an 8-core
> > > > single-socket machine this was the impact on ebizzy using glibc 2.19.
> > > 
> > > The manpage, at least, claims that we zero-fill after MADV_DONTNEED is
> > > called:
> > > 
> > 
> > It also claims that the kernel is free to ignore the advice.
> > 
> > > >      MADV_DONTNEED
> > > >               Do  not  expect  access in the near future.  (For the time being, the application is finished with the given range, so the kernel can free resources
> > > >               associated with it.)  Subsequent accesses of pages in this range will succeed, but will result either in reloading of the memory contents  from  the
> > > >               underlying mapped file (see mmap(2)) or zero-fill-on-demand pages for mappings without an underlying file.
> > > 
> > > So if we have anything depending on the behavior that it's _always_
> > > zero-filled after an MADV_DONTNEED, this will break it.
> > 
> > True. I'd be surprised if any application depended on that 
> 
> IIUC, jemalloc depends on this[1].
> 
> [1] https://github.com/jemalloc/jemalloc/blob/dev/src/chunk_mmap.c#L117
> 

Let's hope they never back those regions with hugetlb, then, and that they
don't fall apart if the process calls mlockall().

-- 
Mel Gorman
SUSE Labs

^ permalink raw reply	[flat|nested] 76+ messages in thread

* Re: MADV_DONTNEED semantics? Was: [RFC PATCH] mm: madvise: Ignore repeated MADV_DONTNEED hints
  2015-02-03 10:53       ` Kirill A. Shutemov
@ 2015-02-03 11:42         ` Vlastimil Babka
  -1 siblings, 0 replies; 76+ messages in thread
From: Vlastimil Babka @ 2015-02-03 11:42 UTC (permalink / raw)
  To: Kirill A. Shutemov
  Cc: Dave Hansen, Mel Gorman, linux-mm, Minchan Kim, Andrew Morton,
	linux-kernel, linux-api, mtk.manpages, linux-man

On 02/03/2015 11:53 AM, Kirill A. Shutemov wrote:
> On Tue, Feb 03, 2015 at 09:19:15AM +0100, Vlastimil Babka wrote:
>> [CC linux-api, man pages]
>> 
>> On 02/02/2015 11:22 PM, Dave Hansen wrote:
>> > On 02/02/2015 08:55 AM, Mel Gorman wrote:
>> >> This patch identifies when a thread is frequently calling MADV_DONTNEED
>> >> on the same region of memory and starts ignoring the hint. On an 8-core
>> >> single-socket machine this was the impact on ebizzy using glibc 2.19.
>> > 
>> > The manpage, at least, claims that we zero-fill after MADV_DONTNEED is
>> > called:
>> > 
>> >>      MADV_DONTNEED
>> >>               Do  not  expect  access in the near future.  (For the time being, the application is finished with the given range, so the kernel can free resources
>> >>               associated with it.)  Subsequent accesses of pages in this range will succeed, but will result either in reloading of the memory contents  from  the
>> >>               underlying mapped file (see mmap(2)) or zero-fill-on-demand pages for mappings without an underlying file.
>> > 
>> > So if we have anything depending on the behavior that it's _always_
>> > zero-filled after an MADV_DONTNEED, this will break it.
>> 
>> OK, so that's a third person (including me) who understood it as a zero-fill
>> guarantee. I think the man page should be clarified (if it's indeed not
>> guaranteed), or we have a bug.
>> 
>> The implementation actually skips MADV_DONTNEED for
>> VM_LOCKED|VM_HUGETLB|VM_PFNMAP vma's.
> 
> It doesn't skip. It fails with -EINVAL. Or I miss something.

No, I missed that. Thanks for pointing out. The manpage also explains EINVAL in
this case:

*  The application is attempting to release locked or shared pages (with
MADV_DONTNEED).

- that covers mlocking OK, but I'm not sure the rest fits the "shared pages" case.
I don't see any check for other kinds of shared pages in the code.

>> - The word "will result" did sound as a guarantee at least to me. So here it
>> could be changed to "may result (unless the advice is ignored)"?
> 
> It's too late to fix documentation. Applications already depends on the
> beheviour.

Right, so as long as they check for EINVAL, it should be safe. It appears that
jemalloc does.

I still wouldn't be sure, just from reading the man page, that the clearing is
guaranteed whenever I don't get an error return value, though.


^ permalink raw reply	[flat|nested] 76+ messages in thread

* Re: MADV_DONTNEED semantics? Was: [RFC PATCH] mm: madvise: Ignore repeated MADV_DONTNEED hints
@ 2015-02-03 15:21         ` Michal Hocko
  0 siblings, 0 replies; 76+ messages in thread
From: Michal Hocko @ 2015-02-03 15:21 UTC (permalink / raw)
  To: Mel Gorman
  Cc: Vlastimil Babka, Dave Hansen, linux-mm, Minchan Kim,
	Andrew Morton, linux-kernel, linux-api, mtk.manpages, linux-man

On Tue 03-02-15 11:16:00, Mel Gorman wrote:
> On Tue, Feb 03, 2015 at 09:19:15AM +0100, Vlastimil Babka wrote:
[...]
> > And if we agree that there is indeed no guarantee, what's the actual semantic
> > difference from MADV_FREE? I guess none? So there's only a possible perfomance
> > difference?
> > 
> 
> Timing. MADV_DONTNEED if it has an effect is immediate, is a heavier
> operations and RSS is reduced. MADV_FREE only has an impact in the future
> if there is memory pressure.

JFTR, the man page for MADV_FREE has been proposed already
(https://lkml.org/lkml/2014/12/5/63 should be the last version AFAIR). I
do not see it in the man-pages git tree, but the patch was not in time
for 3.19, so I guess it will only appear in 3.20.
-- 
Michal Hocko
SUSE Labs

^ permalink raw reply	[flat|nested] 76+ messages in thread

* Re: MADV_DONTNEED semantics? Was: [RFC PATCH] mm: madvise: Ignore repeated MADV_DONTNEED hints
  2015-02-03 11:42         ` Vlastimil Babka
@ 2015-02-03 16:20           ` Michael Kerrisk (man-pages)
  -1 siblings, 0 replies; 76+ messages in thread
From: Michael Kerrisk (man-pages) @ 2015-02-03 16:20 UTC (permalink / raw)
  To: Vlastimil Babka, Kirill A. Shutemov
  Cc: mtk.manpages, Dave Hansen, Mel Gorman, linux-mm, Minchan Kim,
	Andrew Morton, linux-kernel, linux-api, linux-man, Hugh Dickins

Hello Vlastimil

Thanks for CCing me into this thread.

On 02/03/2015 12:42 PM, Vlastimil Babka wrote:
> On 02/03/2015 11:53 AM, Kirill A. Shutemov wrote:
>> On Tue, Feb 03, 2015 at 09:19:15AM +0100, Vlastimil Babka wrote:
>>> [CC linux-api, man pages]
>>>
>>> On 02/02/2015 11:22 PM, Dave Hansen wrote:
>>>> On 02/02/2015 08:55 AM, Mel Gorman wrote:
>>>>> This patch identifies when a thread is frequently calling MADV_DONTNEED
>>>>> on the same region of memory and starts ignoring the hint. On an 8-core
>>>>> single-socket machine this was the impact on ebizzy using glibc 2.19.
>>>>
>>>> The manpage, at least, claims that we zero-fill after MADV_DONTNEED is
>>>> called:
>>>>
>>>>>      MADV_DONTNEED
>>>>>               Do  not  expect  access in the near future.  (For the time being, the application is finished with the given range, so the kernel can free resources
>>>>>               associated with it.)  Subsequent accesses of pages in this range will succeed, but will result either in reloading of the memory contents  from  the
>>>>>               underlying mapped file (see mmap(2)) or zero-fill-on-demand pages for mappings without an underlying file.
>>>>
>>>> So if we have anything depending on the behavior that it's _always_
>>>> zero-filled after an MADV_DONTNEED, this will break it.
>>>
>>> OK, so that's a third person (including me) who understood it as a zero-fill
>>> guarantee. I think the man page should be clarified (if it's indeed not
>>> guaranteed), or we have a bug.
>>>
>>> The implementation actually skips MADV_DONTNEED for
>>> VM_LOCKED|VM_HUGETLB|VM_PFNMAP vma's.
>>
>> It doesn't skip. It fails with -EINVAL. Or I miss something.
> 
> No, I missed that. Thanks for pointing out. The manpage also explains EINVAL in
> this case:
> 
> *  The application is attempting to release locked or shared pages (with
> MADV_DONTNEED).

Yes, there is that. But the page could be more explicit when discussing
MADV_DONTNEED in the main text. I've done that.

> - that covers mlocking ok, not sure if the rest fits the "shared pages" case
> though. I dont see any check for other kinds of shared pages in the code.

Agreed. "shared" here seems confused. I've removed it. And I've
added mention of "Huge TLB pages" for this error.

>>> - The word "will result" did sound as a guarantee at least to me. So here it
>>> could be changed to "may result (unless the advice is ignored)"?
>>
>> It's too late to fix documentation. Applications already depends on the
>> beheviour.
> 
> Right, so as long as they check for EINVAL, it should be safe. It appears that
> jemalloc does.

So, first a brief question: in the cases where the call does not error out,
are we agreed that in the current implementation, MADV_DONTNEED will
always result in zero-filled pages when the region is faulted back in
(when we consider pages that are not backed by a file)?

> I still wouldnt be sure just by reading the man page that the clearing is
> guaranteed whenever I dont get an error return value, though,

I'm not quite sure what you want here. I mean: if there's an error,
then the DONTNEED action didn't occur, right? Therefore, there won't
be zero-filled pages. But, for what it's worth, I added "If the
operation succeeds" at the start of that sentence beginning "Subsequent
accesses...".

Now, some history, explaining why the page is a bit of a mess,
and for that matter why I could really use more help on it from MM
folk (especially in the form of actual patches [1], rather than notes
about deficiencies in the documentation), because:

    ***I simply cannot keep up with all of the details***.

Once upon a time (Linux 2.4), there was madvise() with just 5 flags:

       MADV_NORMAL
       MADV_RANDOM
       MADV_SEQUENTIAL
       MADV_WILLNEED
       MADV_DONTNEED

And already a dozen years ago, *I* added the text about MADV_DONTNEED.
Back then, I believe it was true. I'm not sure if it's still true now,
but I assume for the moment that it is, and await feedback. And the 
text saying that the call does not affect the semantics of memory 
access dates back even further (and was then true, MADV_DONTNEED aside).

Those 5 flags have analogs in POSIX's posix_madvise() (albeit, there
is a semantic mismatch between the destructive MADV_DONTNEED and
POSIX's nondestructive POSIX_MADV_DONTNEED). They also appear
on most other implementations.

Since the original implementation, numerous pieces of cruft^W^W^W
excellent new flags have been overloaded into this one system call.
Some of those certainly violated the "does not change the semantics
of the application" statement, but, sadly, the kernel developers who
implemented MADV_REMOVE or MADV_DONTFORK did not think to send a
patch to the man page for those new flags, one that might have noted
that the semantics of the application are changed by such flags. Equally
sadly, I neglected to review the page as a whole when *I* added
documentation of these flags to it; otherwise I might have
caught that detail.

So, just to repeat, I could really use more help on it from MM
folk in the form of actual patches to the man page.

Thanks,

Michael

[1] https://www.kernel.org/doc/man-pages/patches.html

-- 
Michael Kerrisk
Linux man-pages maintainer; http://www.kernel.org/doc/man-pages/
Linux/UNIX System Programming Training: http://man7.org/training/

^ permalink raw reply	[flat|nested] 76+ messages in thread

* Re: MADV_DONTNEED semantics? Was: [RFC PATCH] mm: madvise: Ignore repeated MADV_DONTNEED hints
@ 2015-02-03 16:25           ` Michael Kerrisk (man-pages)
  0 siblings, 0 replies; 76+ messages in thread
From: Michael Kerrisk (man-pages) @ 2015-02-03 16:25 UTC (permalink / raw)
  To: Michal Hocko, Mel Gorman
  Cc: mtk.manpages, minchan Kim, Dave Hansen, linux-mm, Minchan Kim,
	Andrew Morton, linux-kernel, linux-api, linux-man

On 02/03/2015 04:21 PM, Michal Hocko wrote:
> On Tue 03-02-15 11:16:00, Mel Gorman wrote:
>> On Tue, Feb 03, 2015 at 09:19:15AM +0100, Vlastimil Babka wrote:
> [...]
>>> And if we agree that there is indeed no guarantee, what's the actual semantic
>>> difference from MADV_FREE? I guess none? So there's only a possible perfomance
>>> difference?
>>>
>>
>> Timing. MADV_DONTNEED if it has an effect is immediate, is a heavier
>> operations and RSS is reduced. MADV_FREE only has an impact in the future
>> if there is memory pressure.
> 
> JFTR. the man page for MADV_FREE has been proposed already
> (https://lkml.org/lkml/2014/12/5/63 should be the last version AFAIR). I
> do not see it in the man-pages git tree but the patch was not in time
> for 3.19 so I guess it will only appear in 3.20.
> 

Yikes! That patch was buried in the bottom of a locked filing cabinet
in a disused lavatory. I unfortunately don't read every thread that comes
my way, especially if it doesn't look like a man-pages patch (i.e., falls
in the middle of an LKML thread that starts on another topic, and doesn't 
see linux-man@). I'll respond to that patch soon. (There are some problems
that mean I could not accept it, AFAICT.)

Thanks,

Michael

-- 
Michael Kerrisk
Linux man-pages maintainer; http://www.kernel.org/doc/man-pages/
Linux/UNIX System Programming Training: http://man7.org/training/

^ permalink raw reply	[flat|nested] 76+ messages in thread

* Re: MADV_DONTNEED semantics? Was: [RFC PATCH] mm: madvise: Ignore repeated MADV_DONTNEED hints
@ 2015-02-04  0:09           ` Minchan Kim
  0 siblings, 0 replies; 76+ messages in thread
From: Minchan Kim @ 2015-02-04  0:09 UTC (permalink / raw)
  To: Vlastimil Babka
  Cc: Kirill A. Shutemov, Dave Hansen, Mel Gorman, linux-mm,
	Andrew Morton, linux-kernel, linux-api, mtk.manpages, linux-man,
	Rik van Riel

On Tue, Feb 03, 2015 at 12:42:53PM +0100, Vlastimil Babka wrote:
> On 02/03/2015 11:53 AM, Kirill A. Shutemov wrote:
> > On Tue, Feb 03, 2015 at 09:19:15AM +0100, Vlastimil Babka wrote:
> >> [CC linux-api, man pages]
> >> 
> >> On 02/02/2015 11:22 PM, Dave Hansen wrote:
> >> > On 02/02/2015 08:55 AM, Mel Gorman wrote:
> >> >> This patch identifies when a thread is frequently calling MADV_DONTNEED
> >> >> on the same region of memory and starts ignoring the hint. On an 8-core
> >> >> single-socket machine this was the impact on ebizzy using glibc 2.19.
> >> > 
> >> > The manpage, at least, claims that we zero-fill after MADV_DONTNEED is
> >> > called:
> >> > 
> >> >>      MADV_DONTNEED
> >> >>               Do  not  expect  access in the near future.  (For the time being, the application is finished with the given range, so the kernel can free resources
> >> >>               associated with it.)  Subsequent accesses of pages in this range will succeed, but will result either in reloading of the memory contents  from  the
> >> >>               underlying mapped file (see mmap(2)) or zero-fill-on-demand pages for mappings without an underlying file.
> >> > 
> >> > So if we have anything depending on the behavior that it's _always_
> >> > zero-filled after an MADV_DONTNEED, this will break it.
> >> 
> >> OK, so that's a third person (including me) who understood it as a zero-fill
> >> guarantee. I think the man page should be clarified (if it's indeed not
> >> guaranteed), or we have a bug.
> >> 
> >> The implementation actually skips MADV_DONTNEED for
> >> VM_LOCKED|VM_HUGETLB|VM_PFNMAP vma's.
> > 
> > It doesn't skip. It fails with -EINVAL. Or am I missing something?
> 
> No, I missed that. Thanks for pointing out. The manpage also explains EINVAL in
> this case:
> 
> *  The application is attempting to release locked or shared pages (with
> MADV_DONTNEED).
> 
> - that covers mlocking OK, but I'm not sure whether the rest fits the
> "shared pages" case. I don't see any check for other kinds of shared pages
> in the code.
> 
> >> - The words "will result" did sound like a guarantee, at least to me. So here
> >> they could be changed to "may result (unless the advice is ignored)"?
> > 
> > It's too late to fix documentation. Applications already depend on the
> > behaviour.
> 
> Right, so as long as they check for EINVAL, it should be safe. It appears that
> jemalloc does.
> 
> I still wouldn't be sure, just by reading the man page, that the clearing is
> guaranteed whenever I don't get an error return value, though.
> 

IMHO,

The man page says:
"MADV_DONTNEED: Subsequent accesses of pages in this range will succeed,
 but will result either in reloading of the memory contents from the
 underlying mapped file (see mmap(2)) or zero-fill-on-demand pages
 for mappings without an underlying file."

A heap allocated by malloc(3) consists of anonymous pages, i.e. it is a
mapping without an underlying file, so userspace can expect zero-fill.

The man page says:
"EINVAL: The application is attempting to release locked or
shared pages (with MADV_DONTNEED)"

So a user can expect the call on an area allocated by malloc(3) to
always succeed, as long as the area has not been mlock()ed.

The man page says:
"madvise: This call does not influence the semantics of the application
(except in the case of MADV_DONTNEED)"

So we shouldn't break the long-standing MADV_DONTNEED semantics of
freeing pages immediately. Those semantics were one of the points of
contention when Rik, a long time ago, tried to replace MADV_DONTNEED
with MADV_FREE.

-- 
Kind regards,
Minchan Kim

^ permalink raw reply	[flat|nested] 76+ messages in thread
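
Minchan's reading of the man page (a successful MADV_DONTNEED on a private anonymous range means the next access faults in zero-filled pages) can be checked from userspace. The following is a minimal sketch, not code from this thread: `dontneed_zero_fills()` is a hypothetical helper, and a fresh mmap(MAP_PRIVATE | MAP_ANONYMOUS) region stands in for the anonymous memory behind a malloc(3) arena (an assumption made to keep the example independent of allocator internals).

```c
#define _DEFAULT_SOURCE
#include <stddef.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

/* Hypothetical helper: returns 1 if every byte of a private anonymous
 * region reads back as zero after MADV_DONTNEED, 0 if stale data
 * survived, or -1 if the mapping or the madvise() call failed. */
int dontneed_zero_fills(void)
{
    long page = sysconf(_SC_PAGESIZE);
    if (page <= 0)
        return -1;
    size_t len = (size_t)page * 4;

    unsigned char *p = mmap(NULL, len, PROT_READ | PROT_WRITE,
                            MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (p == MAP_FAILED)
        return -1;

    /* Touch every page so that real pages are populated with a pattern. */
    memset(p, 0xAA, len);

    /* A failure here (e.g. EINVAL) means the pages were not discarded. */
    if (madvise(p, len, MADV_DONTNEED) != 0) {
        munmap(p, len);
        return -1;
    }

    /* The next access faults in fresh zero-filled pages. */
    int zeroed = 1;
    for (size_t i = 0; i < len; i++) {
        if (p[i] != 0) {
            zeroed = 0;
            break;
        }
    }

    munmap(p, len);
    return zeroed;
}
```

On Linux the helper is expected to return 1. If the kernel silently ignored the hint, as the RFC patch proposes for repeated calls, the 0xAA pattern would survive and the helper would return 0, which is the behaviour change Dave and Minchan object to.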

* Re: MADV_DONTNEED semantics? Was: [RFC PATCH] mm: madvise: Ignore repeated MADV_DONTNEED hints
@ 2015-02-04 13:46             ` Vlastimil Babka
  0 siblings, 0 replies; 76+ messages in thread
From: Vlastimil Babka @ 2015-02-04 13:46 UTC (permalink / raw)
  To: Michael Kerrisk (man-pages), Kirill A. Shutemov
  Cc: Dave Hansen, Mel Gorman, linux-mm, Minchan Kim, Andrew Morton,
	linux-kernel, linux-api, linux-man, Hugh Dickins

On 02/03/2015 05:20 PM, Michael Kerrisk (man-pages) wrote:
> Hello Vlastimil
>
> Thanks for CCing me into this thread.

NP

> On 02/03/2015 12:42 PM, Vlastimil Babka wrote:
>> On 02/03/2015 11:53 AM, Kirill A. Shutemov wrote:
>>> On Tue, Feb 03, 2015 at 09:19:15AM +0100, Vlastimil Babka wrote:
>>>
>>> It doesn't skip. It fails with -EINVAL. Or am I missing something?
>>
>> No, I missed that. Thanks for pointing out. The manpage also explains EINVAL in
>> this case:
>>
>> *  The application is attempting to release locked or shared pages (with
>> MADV_DONTNEED).
>
> Yes, there is that. But the page could be more explicit when discussing
> MADV_DONTNEED in the main text. I've done that.
>
>> - that covers mlocking OK, but I'm not sure whether the rest fits the
>> "shared pages" case. I don't see any check for other kinds of shared pages
>> in the code.
>
> Agreed. "shared" here seems confused. I've removed it. And I've
> added mention of "Huge TLB pages" for this error.
>

Thanks.

>>>> - The words "will result" did sound like a guarantee, at least to me. So here
>>>> they could be changed to "may result (unless the advice is ignored)"?
>>>
>>> It's too late to fix documentation. Applications already depend on the
>>> behaviour.
>>
>> Right, so as long as they check for EINVAL, it should be safe. It appears that
>> jemalloc does.
>
> So, first a brief question: in the cases where the call does not error out,
> are we agreed that in the current implementation, MADV_DONTNEED will
> always result in zero-filled pages when the region is faulted back in
> (when we consider pages that are not backed by a file)?

I'd agree at this point.
Also we should probably mention anonymously shared pages (shmem). I 
think they behave the same as file here.

>> I still wouldn't be sure, just by reading the man page, that the clearing is
>> guaranteed whenever I don't get an error return value, though.
>
> I'm not quite sure what you want here. I mean: if there's an error,

I was just reiterating that the guarantee is not clear if you consider
all the statements in the man page together.

> then the DONTNEED action didn't occur, right? Therefore, there won't
> be zero-filled pages. But, for what it's worth, I added "If the
> operation succeeds" at the start of that sentence beginning "Subsequent
> accesses...".

Yes, that should clarify it. Thanks!

> Now, some history, explaining why the page is a bit of a mess,
> and for that matter why I could really use more help on it from MM
> folk (especially in the form of actual patches [1], rather than notes
> about deficiencies in the documentation), because:
>
>      ***I simply cannot keep up with all of the details***.

I see, and expected it would be like this. I would just send a patch if
the situation were clear, but here we should agree first, and I thought
you should be involved from the beginning.

> Once upon a time (Linux 2.4), there was madvise() with just 5 flags:
>
>         MADV_NORMAL
>         MADV_RANDOM
>         MADV_SEQUENTIAL
>         MADV_WILLNEED
>         MADV_DONTNEED
>
> And already a dozen years ago, *I* added the text about MADV_DONTNEED.
> Back then, I believe it was true. I'm not sure if it's still true now,
> but I assume for the moment that it is, and await feedback. And the
> text saying that the call does not affect the semantics of memory
> access dates back even further (and was then true, MADV_DONTNEED aside).
>
> Those 5 flags have analogs in POSIX's posix_madvise() (albeit, there
> is a semantic mismatch between the destructive MADV_DONTNEED and
> POSIX's nondestructive POSIX_MADV_DONTNEED). They also appear
> on most other implementations.
>
> Since the original implementation, numerous pieces of cruft^W^W^W
> excellent new flags have been overloaded into this one system call.
> Some of those certainly violated the "does not change the semantics
> of the application" statement, but, sadly, the kernel developers who
> implemented MADV_REMOVE or MADV_DONTFORK did not think to send a
> patch to the man page for those new flags, one that might have noted
> that the semantics of the application are changed by such flags. Equally
> sadly, I neglected to scan the bigger page when *I* added
> documentation of these flags to those pages, otherwise I might have
> caught that detail.
>
> So, just to repeat, I could really use more help on it from MM
> folk in the form of actual patches to the man page.

Thanks for the background. I'll try to remember to check for the man-pages
part when I review API-changing patches.

> Thanks,
>
> Michael
>
> [1] https://www.kernel.org/doc/man-pages/patches.html
>


^ permalink raw reply	[flat|nested] 76+ messages in thread
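
The EINVAL case discussed above (Kirill's point that VM_LOCKED, VM_HUGETLB and VM_PFNMAP VMAs make MADV_DONTNEED fail rather than be skipped) can be observed for the mlock() case, which needs no special setup. A minimal sketch, assuming Linux and an RLIMIT_MEMLOCK large enough for one page; `dontneed_on_locked_errno()` is a hypothetical helper, not an existing API.

```c
#define _DEFAULT_SOURCE
#include <errno.h>
#include <stddef.h>
#include <sys/mman.h>
#include <unistd.h>

/* Hypothetical helper: returns the errno left by MADV_DONTNEED on an
 * mlock()ed private anonymous page, 0 if the call succeeded, or -1 if
 * the setup failed (e.g. mlock() refused because of RLIMIT_MEMLOCK). */
int dontneed_on_locked_errno(void)
{
    long page = sysconf(_SC_PAGESIZE);
    if (page <= 0)
        return -1;
    size_t len = (size_t)page;

    void *p = mmap(NULL, len, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (p == MAP_FAILED)
        return -1;

    /* mlock() marks the VMA VM_LOCKED and faults the page in. */
    if (mlock(p, len) != 0) {
        munmap(p, len);
        return -1;
    }

    int err = 0;
    if (madvise(p, len, MADV_DONTNEED) != 0)
        err = errno;    /* capture before munlock()/munmap() can change it */

    munlock(p, len);
    munmap(p, len);
    return err;
}
```

A nonzero return means the range was not discarded, so a caller such as an allocator must not assume the pages are zero-filled on reuse. Vlastimil notes that jemalloc appears to check for EINVAL in this way.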

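Michael's remark that POSIX_MADV_DONTNEED is nondestructive while Linux MADV_DONTNEED discards data can be demonstrated by applying each to a page holding a known pattern. A minimal sketch, assuming glibc, whose posix_madvise() returns 0 without calling the kernel for POSIX_MADV_DONTNEED because of this very mismatch; `data_survives()` is a hypothetical helper.

```c
#define _DEFAULT_SOURCE
#include <stddef.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

/* Hypothetical helper: fill one private anonymous page with 0x5A, apply
 * "don't need" advice, and report whether the pattern survived.
 *   use_posix != 0 -> posix_madvise(POSIX_MADV_DONTNEED)
 *   use_posix == 0 -> madvise(MADV_DONTNEED)
 * Returns 1 if the data survived, 0 if it was discarded, -1 on error. */
int data_survives(int use_posix)
{
    long page = sysconf(_SC_PAGESIZE);
    if (page <= 0)
        return -1;
    size_t len = (size_t)page;

    unsigned char *p = mmap(NULL, len, PROT_READ | PROT_WRITE,
                            MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (p == MAP_FAILED)
        return -1;
    memset(p, 0x5A, len);

    /* posix_madvise() returns an error number; madvise() returns -1 and
     * sets errno. Either way, 0 means the call succeeded. */
    int rc = use_posix ? posix_madvise(p, len, POSIX_MADV_DONTNEED)
                       : madvise(p, len, MADV_DONTNEED);

    int result = -1;
    if (rc == 0)
        result = (p[0] == 0x5A && p[len - 1] == 0x5A) ? 1 : 0;

    munmap(p, len);
    return result;
}
```

With glibc on Linux, `data_survives(1)` is expected to return 1 and `data_survives(0)` to return 0.
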
* Re: MADV_DONTNEED semantics? Was: [RFC PATCH] mm: madvise: Ignore repeated MADV_DONTNEED hints
@ 2015-02-04 14:00               ` Michael Kerrisk (man-pages)
  0 siblings, 0 replies; 76+ messages in thread
From: Michael Kerrisk (man-pages) @ 2015-02-04 14:00 UTC (permalink / raw)
  To: Vlastimil Babka
  Cc: Kirill A. Shutemov, Dave Hansen, Mel Gorman, linux-mm,
	Minchan Kim, Andrew Morton, lkml, Linux API, linux-man,
	Hugh Dickins

Hello Vlastimil,

On 4 February 2015 at 14:46, Vlastimil Babka <vbabka@suse.cz> wrote:
> On 02/03/2015 05:20 PM, Michael Kerrisk (man-pages) wrote:
>>
>> On 02/03/2015 12:42 PM, Vlastimil Babka wrote:
>>>
>>> On 02/03/2015 11:53 AM, Kirill A. Shutemov wrote:
>>>>
>>>> On Tue, Feb 03, 2015 at 09:19:15AM +0100, Vlastimil Babka wrote:
>>>>
>>>> It doesn't skip. It fails with -EINVAL. Or am I missing something?
>>>
>>> No, I missed that. Thanks for pointing out. The manpage also explains EINVAL in
>>> this case:
>>>
>>> *  The application is attempting to release locked or shared pages (with
>>> MADV_DONTNEED).
>>
>> Yes, there is that. But the page could be more explicit when discussing
>> MADV_DONTNEED in the main text. I've done that.
>>
>>> - that covers mlocking OK, but I'm not sure whether the rest fits the
>>> "shared pages" case. I don't see any check for other kinds of shared pages
>>> in the code.
>>
>> Agreed. "shared" here seems confused. I've removed it. And I've
>> added mention of "Huge TLB pages" for this error.
>
> Thanks.

I also added those cases for MADV_REMOVE, BTW.

>>>>> - The words "will result" did sound like a guarantee, at least to me. So here
>>>>> they could be changed to "may result (unless the advice is ignored)"?
>>>>
>>>> It's too late to fix the documentation. Applications already depend on
>>>> the behaviour.
>>>
>>> Right, so as long as they check for EINVAL, it should be safe. It appears
>>> that jemalloc does.
>>
>>
>> So, first a brief question: in the cases where the call does not error
>> out, are we agreed that in the current implementation, MADV_DONTNEED will
>> always result in zero-filled pages when the region is faulted back in
>> (when we consider pages that are not backed by a file)?
>
>
> I'd agree at this point.

Thanks for the confirmation.
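
The behaviour agreed on above can be checked with a minimal sketch. It assumes Linux and Python 3.8+, whose `mmap.mmap.madvise()` method is a thin wrapper around madvise(2); the helper name `dontneed_private_anon` is hypothetical. A private anonymous page is dirtied, discarded with MADV_DONTNEED, and read back. A failed call surfaces as `OSError` (for example EINVAL), in which case nothing was discarded and the old contents remain, which is why callers relying on zero-fill need to check for errors.

```python
import mmap

# Assumes Linux and Python 3.8+ (mmap.madvise wraps madvise(2)).


def dontneed_private_anon(payload: bytes = b"dirty") -> bytes:
    """Dirty a private anonymous page, discard it with MADV_DONTNEED,
    and return what is read back from the same offset."""
    # fileno=-1 makes Python request an anonymous mapping; MAP_PRIVATE
    # makes it private, so there is no backing object to reload from.
    m = mmap.mmap(-1, mmap.PAGESIZE, flags=mmap.MAP_PRIVATE)
    try:
        m[: len(payload)] = payload
        # Raises OSError (e.g. EINVAL) if the advice could not be applied;
        # in that case the page was not discarded.
        m.madvise(mmap.MADV_DONTNEED)
        # The next access faults in a fresh zero-filled page.
        return m[: len(payload)]
    finally:
        m.close()
```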

> Also we should probably mention shared anonymous pages (shmem). I think
> they behave the same as files here.

You mean tmpfs here, right? (I don't keep all of the synonyms straight.)

>>> I still wouldn't be sure, just by reading the man page, that the clearing
>>> is guaranteed whenever I don't get an error return value, though.
>>
>> I'm not quite sure what you want here. I mean: if there's an error,
>
> I was just reiterating that the guarantee is not clear if you consider
> all the statements in the man page.
>
>> then the DONTNEED action didn't occur, right? Therefore, there won't
>> be zero-filled pages. But, for what it's worth, I added "If the
>> operation succeeds" at the start of that sentence beginning "Subsequent
>> accesses...".
>
> Yes, that should clarify it. Thanks!

Okay.

>> Now, some history, explaining why the page is a bit of a mess,
>> and for that matter why I could really use more help on it from MM
>> folk (especially in the form of actual patches [1], rather than notes
>> about deficiencies in the documentation), because:
>>
>>      ***I simply cannot keep up with all of the details***.
>
> I see, and I expected it would be like this. I would just send a patch if
> the situation were clear, but here we should agree first, and I thought you
> should be involved from the beginning.

Sorry -- I should have made it clearer: this statement was not
targeted at you personally, or even necessarily at this particular
thread. It was a general comment that struck me sharply as I
looked at how much cruft there is in the madvise() page.

>> Once upon a time (Linux 2.4), there was madvise() with just 5 flags:
>>
>>         MADV_NORMAL
>>         MADV_RANDOM
>>         MADV_SEQUENTIAL
>>         MADV_WILLNEED
>>         MADV_DONTNEED
>>
>> And already a dozen years ago, *I* added the text about MADV_DONTNEED.
>> Back then, I believe it was true. I'm not sure if it's still true now,
>> but I assume for the moment that it is, and await feedback. And the
>> text saying that the call does not affect the semantics of memory
>> access dates back even further (and was then true, MADV_DONTNEED aside).
>>
>> Those 5 flags have analogs in POSIX's posix_madvise() (albeit with a
>> semantic mismatch between the destructive MADV_DONTNEED and
>> POSIX's nondestructive POSIX_MADV_DONTNEED). They also appear
>> on most other implementations.
>>
>> Since the original implementation, numerous pieces of cruft^W^W^W
>> excellent new flags have been overloaded into this one system call.
>> Some of those certainly violated the "does not change the semantics
>> of the application" statement, but, sadly, the kernel developers who
>> implemented MADV_REMOVE or MADV_DONTFORK did not think to send a
>> patch to the man page for those new flags, one that might have noted
>> that the semantics of the application are changed by such flags. Equally
>> sadly, I neglected to scan the bigger page when *I* added
>> documentation of these flags to those pages; otherwise I might have
>> caught that detail.
>>
>> So, just to repeat, I could really use more help on it from MM
>> folk in the form of actual patches to the man page.
>
> Thanks for the background. I'll try to remember to check for the man-pages
> part when I review API-changing patches.

That would be great.

Thanks,

Michael



-- 
Michael Kerrisk
Linux man-pages maintainer; http://www.kernel.org/doc/man-pages/
Linux/UNIX System Programming Training: http://man7.org/training/

^ permalink raw reply	[flat|nested] 76+ messages in thread

* Re: MADV_DONTNEED semantics? Was: [RFC PATCH] mm: madvise: Ignore repeated MADV_DONTNEED hints
  2015-02-04 14:00               ` Michael Kerrisk (man-pages)
@ 2015-02-04 17:02                 ` Vlastimil Babka
  -1 siblings, 0 replies; 76+ messages in thread
From: Vlastimil Babka @ 2015-02-04 17:02 UTC (permalink / raw)
  To: mtk.manpages
  Cc: Kirill A. Shutemov, Dave Hansen, Mel Gorman, linux-mm,
	Minchan Kim, Andrew Morton, lkml, Linux API, linux-man,
	Hugh Dickins

On 02/04/2015 03:00 PM, Michael Kerrisk (man-pages) wrote:
> Hello Vlastimil,
>
> On 4 February 2015 at 14:46, Vlastimil Babka <vbabka@suse.cz> wrote:
>>>> - that covers mlocking OK, but I'm not sure the rest fits the "shared
>>>> pages" case, though. I don't see any check for other kinds of shared
>>>> pages in the code.
>>>
>>> Agreed. "shared" here seems confused. I've removed it. And I've
>>> added mention of "Huge TLB pages" for this error.
>>
>> Thanks.
>
> I also added those cases for MADV_REMOVE, BTW.

Right. There's also the following for MADV_REMOVE that needs updating:

"Currently, only shmfs/tmpfs supports this; other filesystems return 
with the error ENOSYS."

- it's not just shmem/tmpfs anymore. It would be best to refer to the
fallocate(2) FALLOC_FL_PUNCH_HOLE option, which seems to be (more) up to
date.

- AFAICS it doesn't return ENOSYS but EOPNOTSUPP. Also, neither error
code is listed in the ERRORS section.
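
A minimal sketch of the MADV_REMOVE behaviour discussed above, assuming Linux and Python 3.8+ (`mmap.mmap.madvise()` wraps madvise(2)); the helper names `remove_shared_anon` and `remove_private_anon_error` are hypothetical. On a writable shared shmem mapping, MADV_REMOVE frees the backing store much as fallocate(2) with FALLOC_FL_PUNCH_HOLE would, so a reread is expected to return zeros. On a private anonymous mapping there is no backing file, and the kernel is expected to reject the call with EINVAL. The EOPNOTSUPP case for filesystems without hole punching is not exercised here.

```python
import errno
import mmap

# Assumes Linux and Python 3.8+ (mmap.madvise wraps madvise(2)).


def remove_shared_anon(payload: bytes = b"gone") -> bytes:
    """Write to a shmem-backed page, free it with MADV_REMOVE, and
    return what is read back from the same offset."""
    # fileno=-1 with MAP_SHARED gives a MAP_SHARED | MAP_ANONYMOUS mapping.
    m = mmap.mmap(-1, mmap.PAGESIZE, flags=mmap.MAP_SHARED)
    try:
        m[: len(payload)] = payload
        # Punches a hole in the backing shmem object, so the data is freed
        # rather than merely unmapped.
        m.madvise(mmap.MADV_REMOVE)
        # The refault allocates a fresh zero-filled page.
        return m[: len(payload)]
    finally:
        m.close()


def remove_private_anon_error() -> str:
    """Return the errno name from MADV_REMOVE on a private anonymous
    mapping (empty string if the call unexpectedly succeeds)."""
    m = mmap.mmap(-1, mmap.PAGESIZE, flags=mmap.MAP_PRIVATE)
    try:
        m.madvise(mmap.MADV_REMOVE)
        return ""
    except OSError as exc:
        # No backing file to punch a hole in: expected to be EINVAL.
        return errno.errorcode.get(exc.errno, str(exc.errno))
    finally:
        m.close()
```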

>>>>>> - The wording "will result" did sound like a guarantee, at least to
>>>>>> me. So here it could be changed to "may result (unless the advice is
>>>>>> ignored)"?
>>>>>
>>>>> It's too late to fix the documentation. Applications already depend on
>>>>> the behaviour.
>>>>
>>>> Right, so as long as they check for EINVAL, it should be safe. It appears
>>>> that jemalloc does.
>>>
>>>
>>> So, first a brief question: in the cases where the call does not error
>>> out, are we agreed that in the current implementation, MADV_DONTNEED will
>>> always result in zero-filled pages when the region is faulted back in
>>> (when we consider pages that are not backed by a file)?
>>
>>
>> I'd agree at this point.
>
> Thanks for the confirmation.
>
>> Also we should probably mention shared anonymous pages (shmem). I think
>> they behave the same as files here.
>
> You mean tmpfs here, right? (I don't keep all of the synonyms straight.)

shmem is tmpfs (which by itself would fit under "files" just fine), but it
also backs System V segments created by shmget(2) and mappings created by
mmap(2) with MAP_SHARED | MAP_ANONYMOUS. I'm not sure if there's a single
manpage to refer to for the full list.
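
For comparison with the private case, a minimal sketch of a MAP_SHARED | MAP_ANONYMOUS mapping, assuming Linux and Python 3.8+ (`mmap.mmap.madvise()` wraps madvise(2); `dontneed_shared_anon` is a hypothetical helper). Python creates such a mapping for `mmap.mmap(-1, ...)` with MAP_SHARED. Because it is backed by shmem, MADV_DONTNEED only drops the page-table entries, and the next access is expected to fault the data back in from shmem, just as for a shared file mapping.

```python
import mmap

# Assumes Linux and Python 3.8+ (mmap.madvise wraps madvise(2)).


def dontneed_shared_anon(payload: bytes = b"kept") -> bytes:
    """Write to a shared anonymous (shmem-backed) page, discard it with
    MADV_DONTNEED, and return what is read back."""
    # fileno=-1 plus MAP_SHARED gives MAP_SHARED | MAP_ANONYMOUS.
    m = mmap.mmap(-1, mmap.PAGESIZE, flags=mmap.MAP_SHARED)
    try:
        m[: len(payload)] = payload
        # Only the page-table entries are zapped; the page stays in shmem.
        m.madvise(mmap.MADV_DONTNEED)
        # The fault reloads the page from shmem, so the payload survives.
        return m[: len(payload)]
    finally:
        m.close()
```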

Thanks,
Vlastimil

^ permalink raw reply	[flat|nested] 76+ messages in thread

* Re: MADV_DONTNEED semantics? Was: [RFC PATCH] mm: madvise: Ignore repeated MADV_DONTNEED hints
@ 2015-02-04 19:24                   ` Michael Kerrisk (man-pages)
  0 siblings, 0 replies; 76+ messages in thread
From: Michael Kerrisk (man-pages) @ 2015-02-04 19:24 UTC (permalink / raw)
  To: Vlastimil Babka
  Cc: Kirill A. Shutemov, Dave Hansen, Mel Gorman, linux-mm,
	Minchan Kim, Andrew Morton, lkml, Linux API, linux-man,
	Hugh Dickins

On 4 February 2015 at 18:02, Vlastimil Babka <vbabka@suse.cz> wrote:
> On 02/04/2015 03:00 PM, Michael Kerrisk (man-pages) wrote:
>>
>> Hello Vlastimil,
>>
>> On 4 February 2015 at 14:46, Vlastimil Babka <vbabka@suse.cz> wrote:
>>>>>
>>>>> - that covers mlocking OK, but I'm not sure the rest fits the "shared
>>>>> pages" case, though. I don't see any check for other kinds of shared
>>>>> pages in the code.
>>>>
>>>>
>>>> Agreed. "shared" here seems confused. I've removed it. And I've
>>>> added mention of "Huge TLB pages" for this error.
>>>
>>>
>>> Thanks.
>>
>>
>> I also added those cases for MADV_REMOVE, BTW.
>
>
> Right. There's also the following for MADV_REMOVE that needs updating:
>
> "Currently, only shmfs/tmpfs supports this; other filesystems return with
> the error ENOSYS."
>
> - it's not just shmem/tmpfs anymore. It would be best to refer to the
> fallocate(2) FALLOC_FL_PUNCH_HOLE option, which seems to be (more) up to
> date.
>
> - AFAICS it doesn't return ENOSYS but EOPNOTSUPP. Also, neither error code
> is listed in the ERRORS section.

Yup, I recently added that as well, based on a patch from Jan Chaloupka.

>>>>>>> - The wording "will result" did sound like a guarantee, at least to
>>>>>>> me. So here it could be changed to "may result (unless the advice is
>>>>>>> ignored)"?
>>>>>>
>>>>>> It's too late to fix the documentation. Applications already depend on
>>>>>> the behaviour.
>>>>>
>>>>> Right, so as long as they check for EINVAL, it should be safe. It
>>>>> appears that jemalloc does.
>>>>
>>>> So, first a brief question: in the cases where the call does not error
>>>> out, are we agreed that in the current implementation, MADV_DONTNEED will
>>>> always result in zero-filled pages when the region is faulted back in
>>>> (when we consider pages that are not backed by a file)?
>>>
>>> I'd agree at this point.
>>
>> Thanks for the confirmation.
>>
>>> Also we should probably mention shared anonymous pages (shmem). I think
>>> they behave the same as files here.
>>
>> You mean tmpfs here, right? (I don't keep all of the synonyms straight.)
>
> shmem is tmpfs (which by itself would fit under "files" just fine), but it
> also backs System V segments created by shmget(2) and mappings created by
> mmap(2) with MAP_SHARED | MAP_ANONYMOUS. I'm not sure if there's a single
> manpage to refer to for the full list.

So, how about this text:

              After a successful MADV_DONTNEED operation, the
              semantics of memory access in the specified region are
              changed: subsequent accesses of pages in the range will
              succeed, but will result in either reloading of the memory
              contents from the underlying mapped file (for shared file
              mappings, shared anonymous mappings, and shmem-based
              techniques such as System V shared memory segments) or
              zero-fill-on-demand pages for anonymous private mappings.
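
The first case in the proposed text can be sketched the same way, assuming Linux and Python 3.8+ (`mmap.mmap.madvise()` wraps madvise(2); `dontneed_shared_file` is a hypothetical helper). A write goes through a MAP_SHARED file mapping, MADV_DONTNEED discards the mapped pages, and the data is then read back both through the mapping and with `os.pread()`. Both are expected to show the payload, because the refault reloads the page from the file's page cache.

```python
import mmap
import os
import tempfile

# Assumes Linux and Python 3.8+ (mmap.madvise wraps madvise(2)).


def dontneed_shared_file(payload: bytes = b"hello"):
    """Write through a MAP_SHARED file mapping, discard the mapped pages
    with MADV_DONTNEED, and return (bytes via mapping, bytes in file)."""
    with tempfile.TemporaryFile() as f:
        # Extend the file to one page so the whole mapping is file-backed.
        f.truncate(mmap.PAGESIZE)
        m = mmap.mmap(f.fileno(), mmap.PAGESIZE, flags=mmap.MAP_SHARED)
        try:
            m[: len(payload)] = payload
            # Drops the page-table entries; the dirty page stays in the
            # file's page cache.
            m.madvise(mmap.MADV_DONTNEED)
            # The refault reloads the page from the file, write included.
            via_mapping = m[: len(payload)]
        finally:
            m.close()
        via_file = os.pread(f.fileno(), len(payload), 0)
    return via_mapping, via_file
```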

Thanks,

Michael

^ permalink raw reply	[flat|nested] 76+ messages in thread

* Re: MADV_DONTNEED semantics? Was: [RFC PATCH] mm: madvise: Ignore repeated MADV_DONTNEED hints
@ 2015-02-04 19:24                   ` Michael Kerrisk (man-pages)
  0 siblings, 0 replies; 76+ messages in thread
From: Michael Kerrisk (man-pages) @ 2015-02-04 19:24 UTC (permalink / raw)
  To: Vlastimil Babka
  Cc: Kirill A. Shutemov, Dave Hansen, Mel Gorman,
	linux-mm-Bw31MaZKKs3YtjvyW6yDsg, Minchan Kim, Andrew Morton,
	lkml, Linux API, linux-man, Hugh Dickins

On 4 February 2015 at 18:02, Vlastimil Babka <vbabka-AlSwsSmVLrQ@public.gmane.org> wrote:
> On 02/04/2015 03:00 PM, Michael Kerrisk (man-pages) wrote:
>>
>> Hello Vlastimil,
>>
>> On 4 February 2015 at 14:46, Vlastimil Babka <vbabka-AlSwsSmVLrQ@public.gmane.org> wrote:
>>>>>
>>>>> - that covers mlocking ok, not sure if the rest fits the "shared pages"
>>>>> case
>>>>> though. I dont see any check for other kinds of shared pages in the
>>>>> code.
>>>>
>>>>
>>>> Agreed. "shared" here seems confused. I've removed it. And I've
>>>> added mention of "Huge TLB pages" for this error.
>>>
>>>
>>> Thanks.
>>
>>
>> I also added those cases for MADV_REMOVE, BTW.
>
>
> Right. There's also the following for MADV_REMOVE that needs updating:
>
> "Currently, only shmfs/tmpfs supports this; other filesystems return with
> the error ENOSYS."
>
> - it's not just shmem/tmpfs anymore. It should be best to refer to
> fallocate(2) option FALLOC_FL_PUNCH_HOLE which seems to be (more) up to
> date.
>
> - AFAICS it doesn't return ENOSYS but EOPNOTSUPP. Also neither error code is
> listed in the ERRORS section.

Yup, I recently added that as well, based on a patch from Jan Chaloupka.

>>>>>>> - The word "will result" did sound as a guarantee at least to me. So
>>>>>>> here it
>>>>>>> could be changed to "may result (unless the advice is ignored)"?
>>>>>>
>>>>>> It's too late to fix documentation. Applications already depends on
>>>>>> the
>>>>>> beheviour.
>>>>>
>>>>> Right, so as long as they check for EINVAL, it should be safe. It
>>>>> appears
>>>>> that
>>>>> jemalloc does.
>>>>
>>>> So, first a brief question: in the cases where the call does not error
>>>> out,
>>>> are we agreed that in the current implementation, MADV_DONTNEED will
>>>> always result in zero-filled pages when the region is faulted back in
>>>> (when we consider pages that are not backed by a file)?
>>>
>>> I'd agree at this point.
>>
>> Thanks for the confirmation.
>>
>>> Also we should probably mention anonymously shared pages (shmem). I think
>>> they behave the same as file here.
>>
>> You mean tmpfs here, right? (I don't keep all of the synonyms straight.)
>
> shmem is tmpfs (that by itself would fit under "files" just fine), but also
> sys V segments created by shmget(2) and also mappings created by mmap with
> MAP_SHARED | MAP_ANONYMOUS. I'm not sure if there's a single manpage to
> refer to the full list.

So, how about this text:

              After a successful MADV_DONTNEED operation, the seman‐
              tics  of  memory  access  in  the specified region are
              changed: subsequent accesses of  pages  in  the  range
              will  succeed,  but will result in either reloading of
              the memory contents from the  underlying  mapped  file
              (for  shared file mappings, shared anonymous mappings,
              and shmem-based techniques such  as  System  V  shared
              memory  segments)  or  zero-fill-on-demand  pages  for
              anonymous private mappings.

Thanks,

Michael
--
To unsubscribe from this list: send the line "unsubscribe linux-man" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 76+ messages in thread

* Re: MADV_DONTNEED semantics? Was: [RFC PATCH] mm: madvise: Ignore repeated MADV_DONTNEED hints
@ 2015-02-04 19:24                   ` Michael Kerrisk (man-pages)
  0 siblings, 0 replies; 76+ messages in thread
From: Michael Kerrisk (man-pages) @ 2015-02-04 19:24 UTC (permalink / raw)
  To: Vlastimil Babka
  Cc: Kirill A. Shutemov, Dave Hansen, Mel Gorman, linux-mm,
	Minchan Kim, Andrew Morton, lkml, Linux API, linux-man,
	Hugh Dickins

On 4 February 2015 at 18:02, Vlastimil Babka <vbabka@suse.cz> wrote:
> On 02/04/2015 03:00 PM, Michael Kerrisk (man-pages) wrote:
>>
>> Hello Vlastimil,
>>
>> On 4 February 2015 at 14:46, Vlastimil Babka <vbabka@suse.cz> wrote:
>>>>>
>>>>> - that covers mlocking ok, not sure if the rest fits the "shared pages"
>>>>> case
>>>>> though. I dont see any check for other kinds of shared pages in the
>>>>> code.
>>>>
>>>>
>>>> Agreed. "shared" here seems confused. I've removed it. And I've
>>>> added mention of "Huge TLB pages" for this error.
>>>
>>>
>>> Thanks.
>>
>>
>> I also added those cases for MADV_REMOVE, BTW.
>
>
> Right. There's also the following for MADV_REMOVE that needs updating:
>
> "Currently, only shmfs/tmpfs supports this; other filesystems return with
> the error ENOSYS."
>
> - it's not just shmem/tmpfs anymore. It should be best to refer to
> fallocate(2) option FALLOC_FL_PUNCH_HOLE which seems to be (more) up to
> date.
>
> - AFAICS it doesn't return ENOSYS but EOPNOTSUPP. Also neither error code is
> listed in the ERRORS section.

Yup, I recently added that as well, based on a patch from Jan Chaloupka.

>>>>>>> - The word "will result" did sound as a guarantee at least to me. So
>>>>>>> here it
>>>>>>> could be changed to "may result (unless the advice is ignored)"?
>>>>>>
>>>>>> It's too late to fix documentation. Applications already depends on
>>>>>> the
>>>>>> beheviour.
>>>>>
>>>>> Right, so as long as they check for EINVAL, it should be safe. It
>>>>> appears
>>>>> that
>>>>> jemalloc does.
>>>>
>>>> So, first a brief question: in the cases where the call does not error
>>>> out,
>>>> are we agreed that in the current implementation, MADV_DONTNEED will
>>>> always result in zero-filled pages when the region is faulted back in
>>>> (when we consider pages that are not backed by a file)?
>>>
>>> I'd agree at this point.
>>
>> Thanks for the confirmation.
>>
>>> Also we should probably mention anonymously shared pages (shmem). I think
>>> they behave the same as file here.
>>
>> You mean tmpfs here, right? (I don't keep all of the synonyms straight.)
>
> shmem is tmpfs (that by itself would fit under "files" just fine), but also
> sys V segments created by shmget(2) and also mappings created by mmap with
> MAP_SHARED | MAP_ANONYMOUS. I'm not sure if there's a single manpage to
> refer to the full list.

So, how about this text:

              After a successful MADV_DONTNEED operation, the seman‐
              tics  of  memory  access  in  the specified region are
              changed: subsequent accesses of  pages  in  the  range
              will  succeed,  but will result in either reloading of
              the memory contents from the  underlying  mapped  file
              (for  shared file mappings, shared anonymous mappings,
              and shmem-based techniques such  as  System  V  shared
              memory  segments)  or  zero-fill-on-demand  pages  for
              anonymous private mappings.
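
The two anonymous cases in the proposed text can be illustrated with a minimal Python sketch. It assumes Linux and Python 3.8+, where `mmap.mmap.madvise()` wraps madvise(2). The helper `contents_after_dontneed` is hypothetical. A private anonymous mapping comes back zero-filled. A shared anonymous mapping is shmem-backed, so it keeps its contents.

```python
import mmap


def contents_after_dontneed(shared):
    """Fill two pages, apply MADV_DONTNEED, and return what reads back."""
    size = 2 * mmap.PAGESIZE
    # fileno -1 makes this an anonymous mapping (Python adds MAP_ANONYMOUS).
    flags = mmap.MAP_SHARED if shared else mmap.MAP_PRIVATE
    m = mmap.mmap(-1, size, flags=flags)
    try:
        m[:] = b"\xab" * size
        m.madvise(mmap.MADV_DONTNEED)
        return m[:]  # touching the range again refaults the pages
    finally:
        m.close()


private = contents_after_dontneed(shared=False)
shared = contents_after_dontneed(shared=True)
print("private anonymous zero-filled:", private == bytes(len(private)))
print("shared anonymous preserved:", shared == b"\xab" * len(shared))
```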

Thanks,

Michael

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

* Re: MADV_DONTNEED semantics? Was: [RFC PATCH] mm: madvise: Ignore repeated MADV_DONTNEED hints
  2015-02-04 19:24                   ` Michael Kerrisk (man-pages)
  (?)
@ 2015-02-05  1:07                     ` Minchan Kim
  -1 siblings, 0 replies; 76+ messages in thread
From: Minchan Kim @ 2015-02-05  1:07 UTC (permalink / raw)
  To: Michael Kerrisk (man-pages)
  Cc: Vlastimil Babka, Kirill A. Shutemov, Dave Hansen, Mel Gorman,
	linux-mm, Andrew Morton, lkml, Linux API, linux-man,
	Hugh Dickins

Hello,

On Wed, Feb 04, 2015 at 08:24:27PM +0100, Michael Kerrisk (man-pages) wrote:
> On 4 February 2015 at 18:02, Vlastimil Babka <vbabka@suse.cz> wrote:
> > On 02/04/2015 03:00 PM, Michael Kerrisk (man-pages) wrote:
> >>
> >> Hello Vlastimil,
> >>
> >> On 4 February 2015 at 14:46, Vlastimil Babka <vbabka@suse.cz> wrote:
> >>>>>
> >>>>> - that covers mlocking ok; not sure if the rest fits the "shared
> >>>>> pages" case, though. I don't see any check for other kinds of shared
> >>>>> pages in the code.
> >>>>
> >>>>
> >>>> Agreed. "shared" here seems confused. I've removed it. And I've
> >>>> added mention of "Huge TLB pages" for this error.
> >>>
> >>>
> >>> Thanks.
> >>
> >>
> >> I also added those cases for MADV_REMOVE, BTW.
> >
> >
> > Right. There's also the following for MADV_REMOVE that needs updating:
> >
> > "Currently, only shmfs/tmpfs supports this; other filesystems return with
> > the error ENOSYS."
> >
> > - it's not just shmem/tmpfs anymore. It would be best to refer to the
> > fallocate(2) FALLOC_FL_PUNCH_HOLE option, which seems to be (more) up to
> > date.
> >
> > - AFAICS it doesn't return ENOSYS but EOPNOTSUPP. Also neither error code is
> > listed in the ERRORS section.
> 
> Yup, I recently added that as well, based on a patch from Jan Chaloupka.
> 
> >>>>>>> - The word "will result" did sound like a guarantee, at least to me.
> >>>>>>> So here it could be changed to "may result (unless the advice is
> >>>>>>> ignored)"?
> >>>>>>
> >>>>>> It's too late to fix the documentation. Applications already depend
> >>>>>> on the behaviour.
> >>>>>
> >>>>> Right, so as long as they check for EINVAL, it should be safe. It
> >>>>> appears that jemalloc does.
> >>>>
> >>>> So, first a brief question: in the cases where the call does not error
> >>>> out, are we agreed that in the current implementation, MADV_DONTNEED
> >>>> will always result in zero-filled pages when the region is faulted
> >>>> back in (when we consider pages that are not backed by a file)?
> >>>
> >>> I'd agree at this point.
> >>
> >> Thanks for the confirmation.
> >>
> >>> Also we should probably mention anonymously shared pages (shmem). I think
> >>> they behave the same as file here.
> >>
> >> You mean tmpfs here, right? (I don't keep all of the synonyms straight.)
> >
> > shmem is tmpfs (that by itself would fit under "files" just fine), but also
> > sys V segments created by shmget(2) and also mappings created by mmap with
> > MAP_SHARED | MAP_ANONYMOUS. I'm not sure if there's a single manpage to
> > refer to the full list.
> 
> So, how about this text:
> 
>               After a successful MADV_DONTNEED operation, the seman‐
>               tics  of  memory  access  in  the specified region are
>               changed: subsequent accesses of  pages  in  the  range
>               will  succeed,  but will result in either reloading of
>               the memory contents from the  underlying  mapped  file
>               (for  shared file mappings, shared anonymous mappings,
>               and shmem-based techniques such  as  System  V  shared
>               memory  segments)  or  zero-fill-on-demand  pages  for
>               anonymous private mappings.

Hmm, I'd like to clarify.

Whether it was intended or not, some userspace developers have assumed
that the syscall drops pages immediately if it returns without error, so
that they will see more free pages (i.e., the RSS of the process will
decrease) while keeping the VMA. Can we rely on that?

We should also update the ERRORS section. "locked" covers mlock(2), and
you said you will add hugetlb. Then what about VM_PFNMAP? In that case,
the call fails. How should we describe VM_PFNMAP: as a special mapping
used by some drivers?
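
For the ERRORS discussion, a minimal Python sketch shows that MADV_DONTNEED on an mlock()ed range is rejected rather than silently ignored. It assumes Linux and Python 3.8+. `ctypes` calls mlock(2) and munlock(2) from libc because the `mmap` module does not wrap them. The helper name is hypothetical. The hugetlb and VM_PFNMAP cases are not covered, and behaviour for those may differ between kernel versions.

```python
import ctypes
import ctypes.util
import errno
import mmap

# Load libc; if find_library() returns None, CDLL(None) exposes the symbols
# already loaded into the process (including libc on Linux).
libc = ctypes.CDLL(ctypes.util.find_library("c"), use_errno=True)
libc.mlock.argtypes = (ctypes.c_void_p, ctypes.c_size_t)
libc.munlock.argtypes = (ctypes.c_void_p, ctypes.c_size_t)


def dontneed_on_locked_page():
    """Return the errno MADV_DONTNEED fails with on an mlock()ed page,
    0 if it succeeds, or None if mlock() itself was refused
    (for example because RLIMIT_MEMLOCK is too small)."""
    size = mmap.PAGESIZE
    m = mmap.mmap(-1, size, flags=mmap.MAP_PRIVATE)
    view = ctypes.c_char.from_buffer(m)  # only used to learn the address
    addr = ctypes.addressof(view)
    try:
        if libc.mlock(addr, size) != 0:
            return None
        try:
            m.madvise(mmap.MADV_DONTNEED)
            return 0
        except OSError as e:
            return e.errno
        finally:
            libc.munlock(addr, size)
    finally:
        del view  # release the buffer export so the mapping can be closed
        m.close()


result = dontneed_on_locked_page()
print("mlock refused" if result is None else errno.errorcode.get(result, result))
```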

One more thing: "The kernel is free to ignore the advice" conflicts
with "This call does not influence the semantics of the application
(except in the case of MADV_DONTNEED)". So is it okay to read it as
"The kernel is free to ignore the advice, except for MADV_DONTNEED"?

Thanks.
-- 
Kind regards,
Minchan Kim

* Re: MADV_DONTNEED semantics? Was: [RFC PATCH] mm: madvise: Ignore repeated MADV_DONTNEED hints
  2015-02-04 19:24                   ` Michael Kerrisk (man-pages)
  (?)
@ 2015-02-05 15:41                     ` Michal Hocko
  -1 siblings, 0 replies; 76+ messages in thread
From: Michal Hocko @ 2015-02-05 15:41 UTC (permalink / raw)
  To: Michael Kerrisk (man-pages)
  Cc: Vlastimil Babka, Kirill A. Shutemov, Dave Hansen, Mel Gorman,
	linux-mm, Minchan Kim, Andrew Morton, lkml, Linux API, linux-man,
	Hugh Dickins

On Wed 04-02-15 20:24:27, Michael Kerrisk wrote:
[...]
> So, how about this text:
> 
>               After a successful MADV_DONTNEED operation, the seman‐
>               tics  of  memory  access  in  the specified region are
>               changed: subsequent accesses of  pages  in  the  range
>               will  succeed,  but will result in either reloading of
>               the memory contents from the  underlying  mapped  file

"
result in either providing the up-to-date contents of the underlying
mapped file
"

That would be more precise, IMO, because "reloading" might be interpreted
as a major fault, which is not necessarily the case (see below).

>               (for  shared file mappings, shared anonymous mappings,
>               and shmem-based techniques such  as  System  V  shared
>               memory  segments)  or  zero-fill-on-demand  pages  for
>               anonymous private mappings.

Yes, this wording is better, because many users are not aware that
MAP_ANON|MAP_SHARED mappings are in fact file-backed, and the mmap man
page doesn't mention that.
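
The phrase "the up-to-date contents of the underlying mapped file" can be illustrated with a minimal Python sketch. It assumes Linux and Python 3.8+. The helper name is hypothetical. For a MAP_SHARED file mapping, data written through the mapping is already in the page cache, so that data comes back after MADV_DONTNEED. For a MAP_PRIVATE file mapping, the private copy-on-write pages are discarded and the file's own contents reappear.

```python
import mmap
import tempfile


def file_mapping_after_dontneed(private):
    """Write through a file mapping, apply MADV_DONTNEED, and return the
    first four bytes as seen after the pages are faulted back in."""
    size = mmap.PAGESIZE
    with tempfile.TemporaryFile() as f:
        f.write(b"A" * size)  # the file's own contents
        f.flush()
        flags = mmap.MAP_PRIVATE if private else mmap.MAP_SHARED
        m = mmap.mmap(f.fileno(), size, flags=flags)
        try:
            m[:4] = b"BBBB"  # modify through the mapping
            m.madvise(mmap.MADV_DONTNEED)
            return m[:4]
        finally:
            m.close()


print(file_mapping_after_dontneed(private=False))  # b'BBBB'
print(file_mapping_after_dontneed(private=True))   # b'AAAA'
```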

I am just wondering whether it makes sense to mention that MADV_DONTNEED
on shared mappings might be surprising: it does not free the backing
pages, and thus does not really free memory until there is memory
pressure. But maybe this is too implementation-specific for a man
page. What about the following wording on top of yours?
"
Please note that the MADV_DONTNEED hint on shared mappings might not
lead to immediate freeing of pages in the range. The kernel is free to
delay this until an appropriate moment. The RSS of the calling process
will be reduced, however.
"
-- 
Michal Hocko
SUSE Labs
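
Michal's note above can be observed with a minimal Python sketch: RSS drops even though shared pages may not be freed immediately. It assumes Linux and Python 3.8+. It also assumes that the resident field of /proc/self/statm (counted in pages) is accurate enough at this scale. The helper names are hypothetical. For a shared anonymous mapping, MADV_DONTNEED removes the pages from the process's RSS. The data is still there when the range is touched again, because the shmem pages themselves were not freed.

```python
import mmap


def resident_pages():
    """Resident set size of this process, in pages (second statm field)."""
    with open("/proc/self/statm") as f:
        return int(f.read().split()[1])


def shared_anon_dontneed(npages=8192):
    """Return (RSS drop in pages, whether the data survived) for a shared
    anonymous mapping that is fully populated and then given MADV_DONTNEED."""
    size = npages * mmap.PAGESIZE
    m = mmap.mmap(-1, size)  # default flags: MAP_SHARED (+ MAP_ANONYMOUS)
    try:
        for off in range(0, size, mmap.PAGESIZE):
            m[off] = 0x5A  # fault in every page
        before = resident_pages()
        m.madvise(mmap.MADV_DONTNEED)
        after = resident_pages()
        survived = m[0] == 0x5A and m[size - mmap.PAGESIZE] == 0x5A
        return before - after, survived
    finally:
        m.close()


drop, survived = shared_anon_dontneed()
print(f"RSS dropped by {drop} pages; data survived: {survived}")
```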

* Re: [RFC PATCH] mm: madvise: Ignore repeated MADV_DONTNEED hints
  2015-02-02 22:18     ` Mel Gorman
@ 2015-02-05 21:44       ` Rik van Riel
  -1 siblings, 0 replies; 76+ messages in thread
From: Rik van Riel @ 2015-02-05 21:44 UTC (permalink / raw)
  To: Mel Gorman, Andrew Morton
  Cc: linux-mm, Minchan Kim, Vlastimil Babka, linux-kernel

On 02/02/2015 05:18 PM, Mel Gorman wrote:
> On Mon, Feb 02, 2015 at 02:05:06PM -0800, Andrew Morton wrote:
>> On Mon, 2 Feb 2015 16:55:25 +0000 Mel Gorman <mgorman@suse.de> wrote:
>>
>>> glibc malloc changed behaviour in glibc 2.10 to have per-thread arenas
>>> instead of creating new arenas if the existing ones were contended.
>>> The decision appears to have been made so the allocator scales better but the
>>> downside is that madvise(MADV_DONTNEED) is now called for these per-thread
>>> arenas during free. This tears down pages that would have previously
>>> remained. There is nothing wrong with this decision from a functional point
>>> of view but any threaded application that frequently allocates/frees the
>>> same-sized region is going to incur the full teardown and refault costs.
>>
>> MADV_DONTNEED has been there for many years.  How could this problem
>> not have been noticed during glibc 2.10 development/testing? 
> 
> I do not know. I only spotted it due to switching distributions. Looping
> allocations and frees of the same sizes is considered inefficient and it
> might have been dismissed on those grounds. It's probably less noticeable
> when it only affects threaded applications.
> 
>> Is there
>> some more recent kernel change which is triggering this?
>>
> 
> Not that I'm aware of.
> 
>>> This patch identifies when a thread is frequently calling MADV_DONTNEED
>>> on the same region of memory and starts ignoring the hint.
>>
>> That's pretty nasty-looking :(
>>
> 
> Yep, it is but we're very limited in terms of what we can do within the
> kernel here.
> 
>> And presumably there are all sorts of behaviours which will still
>> trigger the problem but which will avoid the start/end equality test in
>> ignore_madvise_hint()?
>>
> 
> Yes. I would expect that a simple pattern of multiple allocs followed by
> multiple frees in a loop would also trigger it.
> 
>> Really, this is a glibc problem and only a glibc problem. 
>> MADV_DONTNEED is unavoidably expensive and glibc is calling
>> MADV_DONTNEED for a region which it *does* need. 
> 
> To be fair to glibc, it calls it on a region it *thinks* it doesn't need only
> to reuse it immediately afterwards because of how the benchmark is
> implemented.
> 
>> Is there something
>> preventing this from being addressed within glibc?
>  
> I doubt it, other than that I expect they'll punt it back and blame either
> the application for being stupid or the kernel for being slow.

This sounds like something that could benefit from Minchan's
MADV_FREE, instead of MADV_DONTNEED.

If non page aligned malloc/free does not depend on pages
being zeroed, I suspect an MADV_DONTNEED resulting from
a malloc/free loop also does not depend on it.
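
A minimal Python sketch shows how an allocator might apply Rik's suggestion. It assumes Linux and Python 3.8+. `release_pages` is a hypothetical helper. MADV_FREE was still a proposed patch at the time of this thread and was later merged in Linux 4.5. The sketch therefore treats EINVAL (or a missing constant) as "not available" and falls back to MADV_DONTNEED, in the spirit of the EINVAL check mentioned for jemalloc earlier in the thread. After MADV_FREE, the old contents may or may not still be there. That is only acceptable when, as Rik notes, the allocator does not rely on zeroed pages.

```python
import errno
import mmap


def release_pages(m, start, length):
    """Tell the kernel the range is free, preferring lazy MADV_FREE.

    Falls back to MADV_DONTNEED when this Python lacks mmap.MADV_FREE or
    the kernel/mapping type rejects it with EINVAL (for example a shared
    mapping, or a kernel older than 4.5)."""
    advice = getattr(mmap, "MADV_FREE", None)
    if advice is not None:
        try:
            m.madvise(advice, start, length)
            return "MADV_FREE"
        except OSError as e:
            if e.errno != errno.EINVAL:
                raise
    m.madvise(mmap.MADV_DONTNEED, start, length)
    return "MADV_DONTNEED"


arena = mmap.mmap(-1, 4 * mmap.PAGESIZE, flags=mmap.MAP_PRIVATE)
arena[:] = b"\x11" * len(arena)
used = release_pages(arena, 0, len(arena))
arena[0] = 0x22  # reuse: a write keeps this page from being reclaimed
print(used, arena[0])
```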


* Re: MADV_DONTNEED semantics? Was: [RFC PATCH] mm: madvise: Ignore repeated MADV_DONTNEED hints
  2015-02-05  1:07                     ` Minchan Kim
  (?)
@ 2015-02-06 15:41                       ` Michael Kerrisk (man-pages)
  -1 siblings, 0 replies; 76+ messages in thread
From: Michael Kerrisk (man-pages) @ 2015-02-06 15:41 UTC (permalink / raw)
  To: Minchan Kim
  Cc: mtk.manpages, Vlastimil Babka, Kirill A. Shutemov, Dave Hansen,
	Mel Gorman, linux-mm, Andrew Morton, lkml, Linux API, linux-man,
	Hugh Dickins

On 02/05/2015 02:07 AM, Minchan Kim wrote:
> Hello,
> 
> On Wed, Feb 04, 2015 at 08:24:27PM +0100, Michael Kerrisk (man-pages) wrote:
>> On 4 February 2015 at 18:02, Vlastimil Babka <vbabka@suse.cz> wrote:
>>> On 02/04/2015 03:00 PM, Michael Kerrisk (man-pages) wrote:
>>>>
>>>> Hello Vlastimil,
>>>>
>>>> On 4 February 2015 at 14:46, Vlastimil Babka <vbabka@suse.cz> wrote:
>>>>>>>
>>>>>>> - that covers mlocking ok; not sure if the rest fits the "shared
>>>>>>> pages" case, though. I don't see any check for other kinds of shared
>>>>>>> pages in the code.
>>>>>>
>>>>>>
>>>>>> Agreed. "shared" here seems confused. I've removed it. And I've
>>>>>> added mention of "Huge TLB pages" for this error.
>>>>>
>>>>>
>>>>> Thanks.
>>>>
>>>>
>>>> I also added those cases for MADV_REMOVE, BTW.
>>>
>>>
>>> Right. There's also the following for MADV_REMOVE that needs updating:
>>>
>>> "Currently, only shmfs/tmpfs supports this; other filesystems return with
>>> the error ENOSYS."
>>>
>>> - it's not just shmem/tmpfs anymore. It would be best to refer to the
>>> fallocate(2) FALLOC_FL_PUNCH_HOLE option, which seems to be (more) up to
>>> date.
>>>
>>> - AFAICS it doesn't return ENOSYS but EOPNOTSUPP. Also neither error code is
>>> listed in the ERRORS section.
>>
>> Yup, I recently added that as well, based on a patch from Jan Chaloupka.
>>
>>>>>>>>> - The word "will result" did sound like a guarantee, at least to me.
>>>>>>>>> So here it could be changed to "may result (unless the advice is
>>>>>>>>> ignored)"?
>>>>>>>>
>>>>>>>> It's too late to fix the documentation. Applications already depend
>>>>>>>> on the behaviour.
>>>>>>>
>>>>>>> Right, so as long as they check for EINVAL, it should be safe. It
>>>>>>> appears that jemalloc does.
>>>>>>
>>>>>> So, first a brief question: in the cases where the call does not error
>>>>>> out, are we agreed that in the current implementation, MADV_DONTNEED
>>>>>> will always result in zero-filled pages when the region is faulted
>>>>>> back in (when we consider pages that are not backed by a file)?
>>>>>
>>>>> I'd agree at this point.
>>>>
>>>> Thanks for the confirmation.
>>>>
>>>>> Also we should probably mention anonymously shared pages (shmem). I think
>>>>> they behave the same as file here.
>>>>
>>>> You mean tmpfs here, right? (I don't keep all of the synonyms straight.)
>>>
>>> shmem is tmpfs (that by itself would fit under "files" just fine), but also
>>> sys V segments created by shmget(2) and also mappings created by mmap with
>>> MAP_SHARED | MAP_ANONYMOUS. I'm not sure if there's a single manpage to
>>> refer to the full list.
>>
>> So, how about this text:
>>
>>               After a successful MADV_DONTNEED operation, the seman‐
>>               tics  of  memory  access  in  the specified region are
>>               changed: subsequent accesses of  pages  in  the  range
>>               will  succeed,  but will result in either reloading of
>>               the memory contents from the  underlying  mapped  file
>>               (for  shared file mappings, shared anonymous mappings,
>>               and shmem-based techniques such  as  System  V  shared
>>               memory  segments)  or  zero-fill-on-demand  pages  for
>>               anonymous private mappings.
> 
> Hmm, I'd like to clarify.
> 
> Whether it was intention or not, some of userspace developers thought
> about that syscall drop pages instantly if was no-error return so that
> they will see more free pages(ie, rss for the process will be decreased)
> with keeping the VMA. Can we rely on it?

I do not know. Michael?

> And we should make error section, too.
> "locked" covers mlock(2), and you said you will add hugetlb. What about
> VM_PFNMAP? The call fails in that case too. How should we describe
> VM_PFNMAP: "special mappings used by some drivers"?

I'm open for offers on what to add.
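For the EINVAL cases listed above (mlock()ed ranges, Huge TLB pages, VM_PFNMAP mappings), the earlier remark that applications stay safe as long as they check for EINVAL can be made concrete. This is a minimal sketch, not code from jemalloc or any other allocator: the helper name discard_private_anon() is hypothetical, and the memset() fallback assumes the range is private anonymous memory, the only case where zeroing the bytes by hand matches what a successful MADV_DONTNEED would have produced.

```c
#define _DEFAULT_SOURCE
#include <assert.h>
#include <errno.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

/*
 * Hypothetical helper: discard a private anonymous range with
 * MADV_DONTNEED and, if the kernel refuses with EINVAL (for example
 * because the range is mlock()ed), zero it by hand so that the caller
 * still observes zero-filled memory afterwards.
 *
 * Returns 0 if madvise() succeeded, 1 if the memset() fallback was used,
 * and -1 (with errno set by madvise()) for any other failure.
 */
static int discard_private_anon(void *addr, size_t len)
{
    if (madvise(addr, len, MADV_DONTNEED) == 0)
        return 0;
    if (errno == EINVAL) {
        /* Only equivalent to MADV_DONTNEED for private anonymous memory. */
        memset(addr, 0, len);
        return 1;
    }
    return -1;
}
```

A caller that only wants to give memory back, and does not rely on the zero-fill semantics, could simply ignore the EINVAL instead of falling back to memset().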
 
> One more thing: "The kernel is free to ignore the advice."
> This conflicts with "This call does not influence the semantics of the
> application (except in the case of MADV_DONTNEED)", so can we read it
> as "The kernel is free to ignore the advice, except for
> MADV_DONTNEED"?

I decided to just drop the sentence

     The kernel is free to ignore the advice.

It creates misunderstandings, and does not really add information.

Cheers,

Michael

-- 
Michael Kerrisk
Linux man-pages maintainer; http://www.kernel.org/doc/man-pages/
Linux/UNIX System Programming Training: http://man7.org/training/

^ permalink raw reply	[flat|nested] 76+ messages in thread

* Re: MADV_DONTNEED semantics? Was: [RFC PATCH] mm: madvise: Ignore repeated MADV_DONTNEED hints
  2015-02-05 15:41                     ` Michal Hocko
  (?)
@ 2015-02-06 15:57                       ` Michael Kerrisk (man-pages)
  -1 siblings, 0 replies; 76+ messages in thread
From: Michael Kerrisk (man-pages) @ 2015-02-06 15:57 UTC (permalink / raw)
  To: Michal Hocko
  Cc: mtk.manpages, Vlastimil Babka, Kirill A. Shutemov, Dave Hansen,
	Mel Gorman, linux-mm, Minchan Kim, Andrew Morton, lkml,
	Linux API, linux-man, Hugh Dickins

Hi Michal

On 02/05/2015 04:41 PM, Michal Hocko wrote:
> On Wed 04-02-15 20:24:27, Michael Kerrisk wrote:
> [...]
>> So, how about this text:
>>
>>               After a successful MADV_DONTNEED operation, the seman‐
>>               tics  of  memory  access  in  the specified region are
>>               changed: subsequent accesses of  pages  in  the  range
>>               will  succeed,  but will result in either reloading of
>>               the memory contents from the  underlying  mapped  file
> 
> "
> result in either providing the up-to-date contents of the underlying
> mapped file
> "

Thanks! I did something like that. See below.

> Would be more precise IMO because reload might be interpreted as a major
> fault which is not necessarily the case (see below).
> 
>>               (for  shared file mappings, shared anonymous mappings,
>>               and shmem-based techniques such  as  System  V  shared
>>               memory  segments)  or  zero-fill-on-demand  pages  for
>>               anonymous private mappings.
> 
> Yes, this wording is better because many users are not aware that
> MAP_ANON|MAP_SHARED mappings are in fact file-backed, and the mmap man
> page doesn't mention that.

(Michal, would you have a text to propose to add to the mmap(2) page?
Maybe it would be useful to add something there.)
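To illustrate why MAP_SHARED|MAP_ANONYMOUS belongs with the file-backed cases, here is a small sketch comparing the two kinds of anonymous mapping under MADV_DONTNEED. Assuming the semantics agreed above, the private mapping should read back as zero-filled, while the shared one should still return the data written before the call, because the page survives in its shmem backing. The function name first_byte_after_dontneed() is invented for this example.

```c
#define _DEFAULT_SOURCE
#include <assert.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

/*
 * Map one page of anonymous memory with the given sharing flag
 * (MAP_PRIVATE or MAP_SHARED), fill it with 'x', apply MADV_DONTNEED,
 * and return the first byte seen when the page is faulted back in.
 * Returns -1 if mmap() or madvise() fails.
 */
static int first_byte_after_dontneed(int share_flag)
{
    long pg = sysconf(_SC_PAGESIZE);
    unsigned char *p = mmap(NULL, (size_t)pg, PROT_READ | PROT_WRITE,
                            share_flag | MAP_ANONYMOUS, -1, 0);
    if (p == MAP_FAILED)
        return -1;

    memset(p, 'x', (size_t)pg);

    int ret = -1;
    if (madvise(p, (size_t)pg, MADV_DONTNEED) == 0)
        ret = p[0];             /* this read faults the page back in */

    munmap(p, (size_t)pg);
    return ret;
}
```

Under those assumptions, first_byte_after_dontneed(MAP_PRIVATE) is expected to return 0 and first_byte_after_dontneed(MAP_SHARED) to return 'x'.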

> 
> I am just wondering whether it makes sense to mention that MADV_DONTNEED
> on shared mappings might be surprising: it does not free the backing
> pages, and thus does not really free memory until there is memory
> pressure. But maybe this is too implementation-specific for a man
> page. What about the following wording on top of yours?
> "
> Please note that the MADV_DONTNEED hint on shared mappings might not
> lead to immediate freeing of pages in the range. The kernel is free to
> delay this until an appropriate moment. RSS of the calling process will
> be reduced however.
> "

Thanks! I added this, but dropped in the word "immediately" in the last 
sentence, since I assume that was implied. So now we have:

              After  a  successful MADV_DONTNEED operation, the seman‐
              tics of  memory  access  in  the  specified  region  are
              changed:  subsequent accesses of pages in the range will
              succeed, but will result in either repopulating the mem‐
              ory  contents from the up-to-date contents of the under‐
              lying mapped file  (for  shared  file  mappings,  shared
              anonymous  mappings,  and shmem-based techniques such as
              System V shared memory segments) or  zero-fill-on-demand
              pages for anonymous private mappings.

              Note  that,  when applied to shared mappings, MADV_DONT‐
              NEED might not lead to immediate freeing of the pages in
              the  range.   The  kernel  is  free to delay freeing the
              pages until an appropriate  moment.   The  resident  set
              size  (RSS)  of  the calling process will be immediately
              reduced however.
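The RSS statement in the draft text can be checked empirically. A rough sketch, assuming the Linux /proc/self/statm layout (the second field is the resident set size in pages): it faults in a number of anonymous pages, applies MADV_DONTNEED, and reports how far the RSS fell. The helper names are hypothetical, and other allocations in the process can add a little noise, so only the sign of the result is meaningful.

```c
#define _DEFAULT_SOURCE
#include <assert.h>
#include <stdio.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

/* Resident set size in pages: second field of /proc/self/statm. */
static long resident_pages(void)
{
    long size, resident;
    FILE *f = fopen("/proc/self/statm", "r");
    if (!f)
        return -1;
    int n = fscanf(f, "%ld %ld", &size, &resident);
    fclose(f);
    return n == 2 ? resident : -1;
}

/*
 * Fault in npages of anonymous memory mapped with share_flag
 * (MAP_PRIVATE or MAP_SHARED), apply MADV_DONTNEED, and return by how
 * many pages the RSS dropped, or -1 on error.
 */
static long rss_drop_after_dontneed(int share_flag, size_t npages)
{
    size_t len = npages * (size_t)sysconf(_SC_PAGESIZE);
    char *p = mmap(NULL, len, PROT_READ | PROT_WRITE,
                   share_flag | MAP_ANONYMOUS, -1, 0);
    if (p == MAP_FAILED)
        return -1;

    memset(p, 1, len);                  /* touch every page */
    long before = resident_pages();
    long after = -1;
    if (madvise(p, len, MADV_DONTNEED) == 0)
        after = resident_pages();

    munmap(p, len);
    if (before < 0 || after < 0)
        return -1;
    return before - after;
}
```

With MAP_SHARED the drop should also be positive even though, as the note says, the shmem pages themselves need not be freed at that point; RSS only reflects what is currently mapped into the calling process.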

The current draft of the page can be found in a branch,
http://git.kernel.org/cgit/docs/man-pages/man-pages.git/log/?h=draft_madvise

Thanks,

Michael



-- 
Michael Kerrisk
Linux man-pages maintainer; http://www.kernel.org/doc/man-pages/
Linux/UNIX System Programming Training: http://man7.org/training/

^ permalink raw reply	[flat|nested] 76+ messages in thread

* Re: MADV_DONTNEED semantics? Was: [RFC PATCH] mm: madvise: Ignore repeated MADV_DONTNEED hints
@ 2015-02-06 20:45                         ` Michal Hocko
  0 siblings, 0 replies; 76+ messages in thread
From: Michal Hocko @ 2015-02-06 20:45 UTC (permalink / raw)
  To: Michael Kerrisk (man-pages)
  Cc: Vlastimil Babka, Kirill A. Shutemov, Dave Hansen, Mel Gorman,
	linux-mm, Minchan Kim, Andrew Morton, lkml, Linux API, linux-man,
	Hugh Dickins

On Fri 06-02-15 16:57:50, Michael Kerrisk wrote:
[...]
> > Yes, this wording is better because many users are not aware that
> > MAP_ANON|MAP_SHARED mappings are in fact file-backed, and the mmap man
> > page doesn't mention that.
> 
> (Michal, would you have a text to propose to add to the mmap(2) page?
> Maybe it would be useful to add something there.)

I am already half on vacation, but I can cook up a patch when I am back
in a week.
 
> > I am just wondering whether it makes sense to mention that MADV_DONTNEED
> > on shared mappings might be surprising: it does not free the backing
> > pages, and thus does not really free memory until there is memory
> > pressure. But maybe this is too implementation-specific for a man
> > page. What about the following wording on top of yours?
> > "
> > Please note that the MADV_DONTNEED hint on shared mappings might not
> > lead to immediate freeing of pages in the range. The kernel is free to
> > delay this until an appropriate moment. RSS of the calling process will
> > be reduced however.
> > "
> 
> Thanks! I added this, but dropped in the word "immediately" in the last 
> sentence, since I assume that was implied. So now we have:
> 
>               After  a  successful MADV_DONTNEED operation, the seman‐
>               tics of  memory  access  in  the  specified  region  are
>               changed:  subsequent accesses of pages in the range will
>               succeed, but will result in either repopulating the mem‐
>               ory  contents from the up-to-date contents of the under‐
>               lying mapped file  (for  shared  file  mappings,  shared
>               anonymous  mappings,  and shmem-based techniques such as
>               System V shared memory segments) or  zero-fill-on-demand
>               pages for anonymous private mappings.
> 
>               Note  that,  when applied to shared mappings, MADV_DONT‐
>               NEED might not lead to immediate freeing of the pages in
>               the  range.   The  kernel  is  free to delay freeing the
>               pages until an appropriate  moment.   The  resident  set
>               size  (RSS)  of  the calling process will be immediately
>               reduced however.

This sounds good to me and it is definitely much better than the current
state. Thanks!

> The current draft of the page can be found in a branch,
> http://git.kernel.org/cgit/docs/man-pages/man-pages.git/log/?h=draft_madvise
> 
> Thanks,
> 
> Michael
> 
> 
> 
> -- 
> Michael Kerrisk
> Linux man-pages maintainer; http://www.kernel.org/doc/man-pages/
> Linux/UNIX System Programming Training: http://man7.org/training/

-- 
Michal Hocko
SUSE Labs

^ permalink raw reply	[flat|nested] 76+ messages in thread

* Re: MADV_DONTNEED semantics? Was: [RFC PATCH] mm: madvise: Ignore repeated MADV_DONTNEED hints
@ 2015-02-06 20:45                         ` Michal Hocko
  0 siblings, 0 replies; 76+ messages in thread
From: Michal Hocko @ 2015-02-06 20:45 UTC (permalink / raw)
  To: Michael Kerrisk (man-pages)
  Cc: Vlastimil Babka, Kirill A. Shutemov, Dave Hansen, Mel Gorman,
	linux-mm, Minchan Kim, Andrew Morton, lkml, Linux API, linux-man,
	Hugh Dickins

On Fri 06-02-15 16:57:50, Michael Kerrisk wrote:
[...]
> > Yes, this wording is better because many users are not aware of
> > MAP_ANON|MAP_SHARED being file backed in fact and mmap man page doesn't
> > mention that.
> 
> (Michal, would you have a text to propose to add to the mmap(2) page?
> Maybe it would be useful to add something there.)

I am half way on vacation, but I can cook a patch after I am back after
week.
 
> > I am just wondering whether it makes sense to mention that MADV_DONTNEED
> > for shared mappings might be surprising and not freeing the backing
> > pages thus not really freeing memory until there is a memory
> > pressure. But maybe this is too implementation specific for a man
> > page. What about the following wording on top of yours?
> > "
> > Please note that the MADV_DONTNEED hint on shared mappings might not
> > lead to immediate freeing of pages in the range. The kernel is free to
> > delay this until an appropriate moment. RSS of the calling process will
> > be reduced however.
> > "
> 
> Thanks! I added this, but dropped in the word "immediately" in the last 
> sentence, since I assume that was implied. So now we have:
> 
>               After  a  successful MADV_DONTNEED operation, the seman‐
>               tics of  memory  access  in  the  specified  region  are
>               changed:  subsequent accesses of pages in the range will
>               succeed, but will result in either repopulating the mem‐
>               ory  contents from the up-to-date contents of the under‐
>               lying mapped file  (for  shared  file  mappings,  shared
>               anonymous  mappings,  and shmem-based techniques such as
>               System V shared memory segments) or  zero-fill-on-demand
>               pages for anonymous private mappings.
> 
>               Note  that,  when applied to shared mappings, MADV_DONT‐
>               NEED might not lead to immediate freeing of the pages in
>               the  range.   The  kernel  is  free to delay freeing the
>               pages until an appropriate  moment.   The  resident  set
>               size  (RSS)  of  the calling process will be immediately
>               reduced however.

This sounds good to me and it is definitely much better than the current
state. Thanks!
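For illustration, a minimal sketch of the two cases the text above
distinguishes, assuming Linux and glibc (byte_after_dontneed() is a
hypothetical helper written for this example, not part of any API): a
private anonymous page comes back zero-filled after MADV_DONTNEED, while
a shared anonymous (shmem-backed) page comes back with its previous
contents.

```c
#define _DEFAULT_SOURCE
#include <stddef.h>
#include <sys/mman.h>
#include <unistd.h>

/*
 * Hypothetical helper: map one anonymous page with the given mmap()
 * sharing flag (MAP_PRIVATE or MAP_SHARED), write 'A' into it, apply
 * MADV_DONTNEED, and return the first byte as seen afterwards.
 * Returns -1 if any system call fails.
 */
static int byte_after_dontneed(int map_flags)
{
    size_t len = (size_t) sysconf(_SC_PAGESIZE);
    unsigned char *p = mmap(NULL, len, PROT_READ | PROT_WRITE,
                            map_flags | MAP_ANONYMOUS, -1, 0);
    if (p == MAP_FAILED)
        return -1;

    p[0] = 'A';                     /* fault the page in and dirty it */

    int result = -1;
    if (madvise(p, len, MADV_DONTNEED) == 0)
        result = p[0];              /* this access faults the page back in */

    munmap(p, len);
    return result;
}
```

With this sketch, byte_after_dontneed(MAP_PRIVATE) is expected to return
0 (zero-fill-on-demand), and byte_after_dontneed(MAP_SHARED) to return
'A', because the shmem page survives in the page cache and is simply
mapped again on the next fault.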

> The current draft of the page can be found in a branch,
> http://git.kernel.org/cgit/docs/man-pages/man-pages.git/log/?h=draft_madvise
> 
> Thanks,
> 
> Michael
> 
> 
> 
> -- 
> Michael Kerrisk
> Linux man-pages maintainer; http://www.kernel.org/doc/man-pages/
> Linux/UNIX System Programming Training: http://man7.org/training/

-- 
Michal Hocko
SUSE Labs

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 76+ messages in thread

* Re: MADV_DONTNEED semantics? Was: [RFC PATCH] mm: madvise: Ignore repeated MADV_DONTNEED hints
@ 2015-02-09  6:46                         ` Minchan Kim
  0 siblings, 0 replies; 76+ messages in thread
From: Minchan Kim @ 2015-02-09  6:46 UTC (permalink / raw)
  To: Michael Kerrisk (man-pages)
  Cc: Vlastimil Babka, Kirill A. Shutemov, Dave Hansen, Mel Gorman,
	linux-mm, Andrew Morton, lkml, Linux API, linux-man,
	Hugh Dickins

Hello, Michael

On Fri, Feb 06, 2015 at 04:41:12PM +0100, Michael Kerrisk (man-pages) wrote:
> On 02/05/2015 02:07 AM, Minchan Kim wrote:
> > Hello,
> > 
> > On Wed, Feb 04, 2015 at 08:24:27PM +0100, Michael Kerrisk (man-pages) wrote:
> >> On 4 February 2015 at 18:02, Vlastimil Babka <vbabka@suse.cz> wrote:
> >>> On 02/04/2015 03:00 PM, Michael Kerrisk (man-pages) wrote:
> >>>>
> >>>> Hello Vlastimil,
> >>>>
> >>>> On 4 February 2015 at 14:46, Vlastimil Babka <vbabka@suse.cz> wrote:
> >>>>>>>
> >>>>>>> - that covers mlocking ok, not sure if the rest fits the "shared pages"
> >>>>>>> case though. I don't see any check for other kinds of shared pages in
> >>>>>>> the code.
> >>>>>>
> >>>>>>
> >>>>>> Agreed. "shared" here seems confused. I've removed it. And I've
> >>>>>> added mention of "Huge TLB pages" for this error.
> >>>>>
> >>>>>
> >>>>> Thanks.
> >>>>
> >>>>
> >>>> I also added those cases for MADV_REMOVE, BTW.
> >>>
> >>>
> >>> Right. There's also the following for MADV_REMOVE that needs updating:
> >>>
> >>> "Currently, only shmfs/tmpfs supports this; other filesystems return with
> >>> the error ENOSYS."
> >>>
> >>> - it's not just shmem/tmpfs anymore. It would be best to refer to the
> >>> fallocate(2) option FALLOC_FL_PUNCH_HOLE, which seems to be (more) up to
> >>> date.
> >>>
> >>> - AFAICS it doesn't return ENOSYS but EOPNOTSUPP. Also neither error code is
> >>> listed in the ERRORS section.
> >>
> >> Yup, I recently added that as well, based on a patch from Jan Chaloupka.
> >>
> >>>>>>>>> - The word "will result" did sound like a guarantee, at least to me. So
> >>>>>>>>> here it could be changed to "may result (unless the advice is ignored)"?
> >>>>>>>>
> >>>>>>>> It's too late to fix the documentation. Applications already depend on
> >>>>>>>> the behaviour.
> >>>>>>>
> >>>>>>> Right, so as long as they check for EINVAL, it should be safe. It
> >>>>>>> appears that jemalloc does.
> >>>>>>
> >>>>>> So, first a brief question: in the cases where the call does not error
> >>>>>> out, are we agreed that in the current implementation, MADV_DONTNEED will
> >>>>>> always result in zero-filled pages when the region is faulted back in
> >>>>>> (when we consider pages that are not backed by a file)?
> >>>>>
> >>>>> I'd agree at this point.
> >>>>
> >>>> Thanks for the confirmation.
> >>>>
> >>>>> Also we should probably mention anonymously shared pages (shmem). I think
> >>>>> they behave the same as file here.
> >>>>
> >>>> You mean tmpfs here, right? (I don't keep all of the synonyms straight.)
> >>>
> >>> shmem is tmpfs (that by itself would fit under "files" just fine), but also
> >>> sys V segments created by shmget(2) and also mappings created by mmap with
> >>> MAP_SHARED | MAP_ANONYMOUS. I'm not sure if there's a single manpage to
> >>> refer to the full list.
> >>
> >> So, how about this text:
> >>
> >>               After a successful MADV_DONTNEED operation, the seman‐
> >>               tics  of  memory  access  in  the specified region are
> >>               changed: subsequent accesses of  pages  in  the  range
> >>               will  succeed,  but will result in either reloading of
> >>               the memory contents from the  underlying  mapped  file
> >>               (for  shared file mappings, shared anonymous mappings,
> >>               and shmem-based techniques such  as  System  V  shared
> >>               memory  segments)  or  zero-fill-on-demand  pages  for
> >>               anonymous private mappings.
> > 
> > Hmm, I'd like to clarify.
> > 
> > Whether or not it was intentional, some userspace developers assumed
> > that, when the syscall returns without error, it drops the pages
> > instantly, so they will see more free pages (i.e., the RSS of the
> > process will decrease) while the VMA is kept. Can we rely on that?
> 
> I do not know. Michael?

It's important to pin down the difference between MADV_DONTNEED and
MADV_FREE, so it would be better to clear this up now.

> 
> > We should update the ERRORS section, too.
> > "locked" covers mlock(2), and you said you will add hugetlb. What
> > about VM_PFNMAP? The call fails in that case. How should we describe
> > VM_PFNMAP? As a special mapping used by some drivers?
> 
> I'm open for offers on what to add.

I suggest quoting LWN (http://lwn.net/Articles/162860/):
"*special mapping* which is not made up of "normal" pages.
It is usually created by device drivers which map special memory areas
into user space"

>  
> > One more thing: "The kernel is free to ignore the advice"
> > conflicts with "This call does not influence the semantics of the
> > application (except in the case of MADV_DONTNEED)". So is it okay to
> > read it as "The kernel is free to ignore the advice, except for
> > MADV_DONTNEED"?
> 
> I decided to just drop the sentence
> 
>      The kernel is free to ignore the advice.
> 
> It creates misunderstandings, and does not really add information.

Sounds good.

> 
> Cheers,
> 
> Michael
> 
> -- 
> Michael Kerrisk
> Linux man-pages maintainer; http://www.kernel.org/doc/man-pages/
> Linux/UNIX System Programming Training: http://man7.org/training/

-- 
Kind regards,
Minchan Kim

^ permalink raw reply	[flat|nested] 76+ messages in thread

* Re: MADV_DONTNEED semantics? Was: [RFC PATCH] mm: madvise: Ignore repeated MADV_DONTNEED hints
@ 2015-02-09  6:50                         ` Minchan Kim
  0 siblings, 0 replies; 76+ messages in thread
From: Minchan Kim @ 2015-02-09  6:50 UTC (permalink / raw)
  To: Michael Kerrisk (man-pages)
  Cc: Michal Hocko, Vlastimil Babka, Kirill A. Shutemov, Dave Hansen,
	Mel Gorman, linux-mm, Andrew Morton, lkml, Linux API, linux-man,
	Hugh Dickins

On Fri, Feb 06, 2015 at 04:57:50PM +0100, Michael Kerrisk (man-pages) wrote:
> Hi Michael
> 
> On 02/05/2015 04:41 PM, Michal Hocko wrote:
> > On Wed 04-02-15 20:24:27, Michael Kerrisk wrote:
> > [...]
> >> So, how about this text:
> >>
> >>               After a successful MADV_DONTNEED operation, the seman‐
> >>               tics  of  memory  access  in  the specified region are
> >>               changed: subsequent accesses of  pages  in  the  range
> >>               will  succeed,  but will result in either reloading of
> >>               the memory contents from the  underlying  mapped  file
> > 
> > "
> > result in either providing the up-to-date contents of the underlying
> > mapped file
> > "
> 
> Thanks! I did something like that. See below.
> 
> > Would be more precise IMO because reload might be interpreted as a major
> > fault which is not necessarily the case (see below).
> > 
> >>               (for  shared file mappings, shared anonymous mappings,
> >>               and shmem-based techniques such  as  System  V  shared
> >>               memory  segments)  or  zero-fill-on-demand  pages  for
> >>               anonymous private mappings.
> > 
> > Yes, this wording is better because many users are not aware of
> > MAP_ANON|MAP_SHARED being file backed in fact and mmap man page doesn't
> > mention that.
> 
> (Michal, would you have a text to propose to add to the mmap(2) page?
> Maybe it would be useful to add something there.)
> 
> > 
> > I am just wondering whether it makes sense to mention that MADV_DONTNEED
> > for shared mappings might be surprising: it does not free the backing
> > pages, so memory is not really freed until there is memory
> > pressure. But maybe this is too implementation-specific for a man
> > page. What about the following wording on top of yours?
> > "
> > Please note that the MADV_DONTNEED hint on shared mappings might not
> > lead to immediate freeing of pages in the range. The kernel is free to
> > delay this until an appropriate moment. RSS of the calling process will
> > be reduced however.
> > "
> 
> Thanks! I added this, but dropped in the word "immediately" in the last 
> sentence, since I assume that was implied. So now we have:
> 
>               After  a  successful MADV_DONTNEED operation, the seman‐
>               tics of  memory  access  in  the  specified  region  are
>               changed:  subsequent accesses of pages in the range will
>               succeed, but will result in either repopulating the mem‐
>               ory  contents from the up-to-date contents of the under‐
>               lying mapped file  (for  shared  file  mappings,  shared
>               anonymous  mappings,  and shmem-based techniques such as
>               System V shared memory segments) or  zero-fill-on-demand
>               pages for anonymous private mappings.
> 
>               Note  that,  when applied to shared mappings, MADV_DONT‐
>               NEED might not lead to immediate freeing of the pages in
>               the  range.   The  kernel  is  free to delay freeing the
>               pages until an appropriate  moment.   The  resident  set
>               size  (RSS)  of  the calling process will be immediately
>               reduced however.

Looks good. So I read it as saying that, for anonymous private mappings,
the pages in the range are freed immediately, which makes it clearly
different from MADV_FREE.
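A rough way to observe the RSS side of this for anonymous private
memory, assuming Linux with /proc mounted (resident_pages() and
rss_drop_after_dontneed() are hypothetical helpers for illustration):
populate a region, apply MADV_DONTNEED, and compare the "resident" field
(the second field, in pages) of /proc/self/statm before and after. The
kernel's RSS counters are not guaranteed to be exact at every instant,
so the result should be treated as approximate.

```c
#define _DEFAULT_SOURCE
#include <stdio.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

/* Hypothetical helper: resident set size of the calling process, in
 * pages, read from the second field of /proc/self/statm.
 * Returns -1 on error. */
static long resident_pages(void)
{
    long size, resident;
    FILE *f = fopen("/proc/self/statm", "r");
    if (f == NULL)
        return -1;
    int n = fscanf(f, "%ld %ld", &size, &resident);
    fclose(f);
    return n == 2 ? resident : -1;
}

/* Hypothetical helper: populate 'npages' private anonymous pages, apply
 * MADV_DONTNEED to all of them, and return by how many pages the RSS
 * dropped.  Returns -1 on error. */
static long rss_drop_after_dontneed(size_t npages)
{
    size_t len = npages * (size_t) sysconf(_SC_PAGESIZE);
    char *p = mmap(NULL, len, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (p == MAP_FAILED)
        return -1;

    memset(p, 1, len);              /* fault in every page of the region */
    long before = resident_pages();

    long drop = -1;
    if (before >= 0 && madvise(p, len, MADV_DONTNEED) == 0) {
        long after = resident_pages();
        if (after >= 0)
            drop = before - after;
    }

    munmap(p, len);
    return drop;
}
```

On a typical system, rss_drop_after_dontneed(4096) should report a drop
close to 4096 pages right after the call returns, with no memory
pressure involved; a small deviation is possible because of counter
batching and unrelated allocations.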

> 
> The current draft of the page can be found in a branch,
> http://git.kernel.org/cgit/docs/man-pages/man-pages.git/log/?h=draft_madvise
> 
> Thanks,
> 
> Michael
> 
> 
> 
> -- 
> Michael Kerrisk
> Linux man-pages maintainer; http://www.kernel.org/doc/man-pages/
> Linux/UNIX System Programming Training: http://man7.org/training/

-- 
Kind regards,
Minchan Kim

^ permalink raw reply	[flat|nested] 76+ messages in thread

* Re: MADV_DONTNEED semantics? Was: [RFC PATCH] mm: madvise: Ignore repeated MADV_DONTNEED hints
@ 2015-02-09  6:50                         ` Minchan Kim
  0 siblings, 0 replies; 76+ messages in thread
From: Minchan Kim @ 2015-02-09  6:50 UTC (permalink / raw)
  To: Michael Kerrisk (man-pages)
  Cc: Michal Hocko, Vlastimil Babka, Kirill A. Shutemov, Dave Hansen,
	Mel Gorman, linux-mm-Bw31MaZKKs3YtjvyW6yDsg, Andrew Morton, lkml,
	Linux API, linux-man, Hugh Dickins

On Fri, Feb 06, 2015 at 04:57:50PM +0100, Michael Kerrisk (man-pages) wrote:
> Hi Michael
> 
> On 02/05/2015 04:41 PM, Michal Hocko wrote:
> > On Wed 04-02-15 20:24:27, Michael Kerrisk wrote:
> > [...]
> >> So, how about this text:
> >>
> >>               After a successful MADV_DONTNEED operation, the seman‐
> >>               tics  of  memory  access  in  the specified region are
> >>               changed: subsequent accesses of  pages  in  the  range
> >>               will  succeed,  but will result in either reloading of
> >>               the memory contents from the  underlying  mapped  file
> > 
> > "
> > result in either providing the up-to-date contents of the underlying
> > mapped file
> > "
> 
> Thanks! I did something like that. See below.
> 
> > Would be more precise IMO because reload might be interpreted as a major
> > fault which is not necessarily the case (see below).
> > 
> >>               (for  shared file mappings, shared anonymous mappings,
> >>               and shmem-based techniques such  as  System  V  shared
> >>               memory  segments)  or  zero-fill-on-demand  pages  for
> >>               anonymous private mappings.
> > 
> > Yes, this wording is better because many users are not aware of
> > MAP_ANON|MAP_SHARED being file backed in fact and mmap man page doesn't
> > mention that.
> 
> (Michal, would you have a text to propose to add to the mmap(2) page?
> Maybe it would be useful to add something there.)
> 
> > 
> > I am just wondering whether it makes sense to mention that MADV_DONTNEED
> > for shared mappings might be surprising and not freeing the backing
> > pages thus not really freeing memory until there is a memory
> > pressure. But maybe this is too implementation specific for a man
> > page. What about the following wording on top of yours?
> > "
> > Please note that the MADV_DONTNEED hint on shared mappings might not
> > lead to immediate freeing of pages in the range. The kernel is free to
> > delay this until an appropriate moment. RSS of the calling process will
> > be reduced however.
> > "
> 
> Thanks! I added this, but dropped in the word "immediately" in the last 
> sentence, since I assume that was implied. So now we have:
> 
>               After  a  successful MADV_DONTNEED operation, the seman‐
>               tics of  memory  access  in  the  specified  region  are
>               changed:  subsequent accesses of pages in the range will
>               succeed, but will result in either repopulating the mem‐
>               ory  contents from the up-to-date contents of the under‐
>               lying mapped file  (for  shared  file  mappings,  shared
>               anonymous  mappings,  and shmem-based techniques such as
>               System V shared memory segments) or  zero-fill-on-demand
>               pages for anonymous private mappings.
> 
>               Note  that,  when applied to shared mappings, MADV_DONT‐
>               NEED might not lead to immediate freeing of the pages in
>               the  range.   The  kernel  is  free to delay freeing the
>               pages until an appropriate  moment.   The  resident  set
>               size  (RSS)  of  the calling process will be immediately
>               reduced however.

Looks good. So, I can parse it that anonymous private mappings will lead
to immediate freeing of the pages in the range so it's clearly different
with MADV_FREE.

> 
> The current draft of the page can be found in a branch,
> http://git.kernel.org/cgit/docs/man-pages/man-pages.git/log/?h=draft_madvise
> 
> Thanks,
> 
> Michael
> 
> 
> 
> -- 
> Michael Kerrisk
> Linux man-pages maintainer; http://www.kernel.org/doc/man-pages/
> Linux/UNIX System Programming Training: http://man7.org/training/

-- 
Kind regards,
Minchan Kim
--
To unsubscribe from this list: send the line "unsubscribe linux-man" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 76+ messages in thread

* Re: MADV_DONTNEED semantics? Was: [RFC PATCH] mm: madvise: Ignore repeated MADV_DONTNEED hints
@ 2015-02-09  6:50                         ` Minchan Kim
  0 siblings, 0 replies; 76+ messages in thread
From: Minchan Kim @ 2015-02-09  6:50 UTC (permalink / raw)
  To: Michael Kerrisk (man-pages)
  Cc: Michal Hocko, Vlastimil Babka, Kirill A. Shutemov, Dave Hansen,
	Mel Gorman, linux-mm, Andrew Morton, lkml, Linux API, linux-man,
	Hugh Dickins

On Fri, Feb 06, 2015 at 04:57:50PM +0100, Michael Kerrisk (man-pages) wrote:
> Hi Michael
> 
> On 02/05/2015 04:41 PM, Michal Hocko wrote:
> > On Wed 04-02-15 20:24:27, Michael Kerrisk wrote:
> > [...]
> >> So, how about this text:
> >>
> >>               After a successful MADV_DONTNEED operation, the semana??
> >>               tics  of  memory  access  in  the specified region are
> >>               changed: subsequent accesses of  pages  in  the  range
> >>               will  succeed,  but will result in either reloading of
> >>               the memory contents from the  underlying  mapped  file
> > 
> > "
> > result in either providing the up-to-date contents of the underlying
> > mapped file
> > "
> 
> Thanks! I did something like that. See below.
> 
> > Would be more precise IMO because reload might be interpreted as a major
> > fault which is not necessarily the case (see below).
> > 
> >>               (for  shared file mappings, shared anonymous mappings,
> >>               and shmem-based techniques such  as  System  V  shared
> >>               memory  segments)  or  zero-fill-on-demand  pages  for
> >>               anonymous private mappings.
> > 
> > Yes, this wording is better because many users are not aware of
> > MAP_ANON|MAP_SHARED being file backed in fact and mmap man page doesn't
> > mention that.
> 
> (Michal, would you have a text to propose to add to the mmap(2) page?
> Maybe it would be useful to add something there.)
> 
> > 
> > I am just wondering whether it makes sense to mention that MADV_DONTNEED
> > for shared mappings might be surprising and not freeing the backing
> > pages thus not really freeing memory until there is a memory
> > pressure. But maybe this is too implementation specific for a man
> > page. What about the following wording on top of yours?
> > "
> > Please note that the MADV_DONTNEED hint on shared mappings might not
> > lead to immediate freeing of pages in the range. The kernel is free to
> > delay this until an appropriate moment. RSS of the calling process will
> > be reduced however.
> > "
> 
> Thanks! I added this, but dropped in the word "immediately" in the last 
> sentence, since I assume that was implied. So now we have:
> 
>               After  a  successful MADV_DONTNEED operation, the semana??
>               tics of  memory  access  in  the  specified  region  are
>               changed:  subsequent accesses of pages in the range will
>               succeed, but will result in either repopulating the mema??
>               ory  contents from the up-to-date contents of the undera??
>               lying mapped file  (for  shared  file  mappings,  shared
>               anonymous  mappings,  and shmem-based techniques such as
>               System V shared memory segments) or  zero-fill-on-demand
>               pages for anonymous private mappings.
> 
>               Note  that,  when applied to shared mappings, MADV_DONTa??
>               NEED might not lead to immediate freeing of the pages in
>               the  range.   The  kernel  is  free to delay freeing the
>               pages until an appropriate  moment.   The  resident  set
>               size  (RSS)  of  the calling process will be immediately
>               reduced however.

Looks good. So, I can parse it that anonymous private mappings will lead
to immediate freeing of the pages in the range so it's clearly different
with MADV_FREE.

> 
> The current draft of the page can be found in a branch,
> http://git.kernel.org/cgit/docs/man-pages/man-pages.git/log/?h=draft_madvise
> 
> Thanks,
> 
> Michael
> 
> 
> 
> -- 
> Michael Kerrisk
> Linux man-pages maintainer; http://www.kernel.org/doc/man-pages/
> Linux/UNIX System Programming Training: http://man7.org/training/

-- 
Kind regards,
Minchan Kim

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>


* Re: MADV_DONTNEED semantics? Was: [RFC PATCH] mm: madvise: Ignore repeated MADV_DONTNEED hints
  2015-02-09  6:46                         ` Minchan Kim
@ 2015-02-09  9:13                           ` Michael Kerrisk (man-pages)
  -1 siblings, 0 replies; 76+ messages in thread
From: Michael Kerrisk (man-pages) @ 2015-02-09  9:13 UTC (permalink / raw)
  To: Minchan Kim
  Cc: mtk.manpages, Vlastimil Babka, Kirill A. Shutemov, Dave Hansen,
	Mel Gorman, linux-mm, Andrew Morton, lkml, Linux API, linux-man,
	Hugh Dickins

Hello Minchan

On 02/09/2015 07:46 AM, Minchan Kim wrote:
> Hello, Michael
> 
> On Fri, Feb 06, 2015 at 04:41:12PM +0100, Michael Kerrisk (man-pages) wrote:
>> On 02/05/2015 02:07 AM, Minchan Kim wrote:
>>> Hello,
>>>
>>> On Wed, Feb 04, 2015 at 08:24:27PM +0100, Michael Kerrisk (man-pages) wrote:
>>>> On 4 February 2015 at 18:02, Vlastimil Babka <vbabka@suse.cz> wrote:
>>>>> On 02/04/2015 03:00 PM, Michael Kerrisk (man-pages) wrote:

[...]

>>> We should also cover this in the ERRORS section.
>>> "locked" covers mlock(2), and you said you will add hugetlb. What
>>> about VM_PFNMAP? In that case, it fails. How should we describe
>>> VM_PFNMAP? As a special mapping used by some drivers?
>>
>> I'm open for offers on what to add.
> 
> I suggest quoting LWN (http://lwn.net/Articles/162860/):
> "*special mapping* which is not made up of "normal" pages.
> It is usually created by device drivers which map special memory areas
> into user space"

Thanks. I've added mention of VM_PFNMAP in the discussion of both 
MADV_DONTNEED and MADV_REMOVE, and noted that both of those
operations will give an error when applied to VM_PFNMAP pages.
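The locked-memory case from this error discussion can be checked from unprivileged userspace; a VM_PFNMAP mapping usually requires a driver, so it is not covered here. The following is a minimal C sketch, assuming Linux and glibc; `dontneed_errno_on_locked_page` is a hypothetical helper. It mlock()s one anonymous page and reports the errno from madvise(MADV_DONTNEED), which the Linux madvise(2) man page lists as EINVAL for locked pages.

```c
#define _DEFAULT_SOURCE
#include <errno.h>
#include <sys/mman.h>
#include <unistd.h>

/* Hypothetical helper: returns the errno set by madvise(MADV_DONTNEED)
 * on an mlock()ed page, 0 if the call unexpectedly succeeded, or -1 if
 * setup failed (mmap() error, or mlock() refused, e.g. by RLIMIT_MEMLOCK). */
static int dontneed_errno_on_locked_page(void)
{
    size_t page = (size_t)sysconf(_SC_PAGESIZE);
    void *p = mmap(NULL, page, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (p == MAP_FAILED)
        return -1;

    if (mlock(p, page) != 0) {            /* marks the range VM_LOCKED */
        munmap(p, page);
        return -1;
    }

    int err = 0;
    if (madvise(p, page, MADV_DONTNEED) != 0)
        err = errno;                      /* save before cleanup calls */

    munlock(p, page);
    munmap(p, page);
    return err;
}
```

A return of -1 means the check could not run (for instance under a tight RLIMIT_MEMLOCK), so a caller should treat it as "not tested" rather than as a result.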

Cheers,

Michael


-- 
Michael Kerrisk
Linux man-pages maintainer; http://www.kernel.org/doc/man-pages/
Linux/UNIX System Programming Training: http://man7.org/training/


end of thread, latest message 2015-02-09  9:13 UTC

Thread overview: 76+ messages (cross-posted copies counted separately; each message listed once):
2015-02-02 16:55 [RFC PATCH] mm: madvise: Ignore repeated MADV_DONTNEED hints Mel Gorman
2015-02-02 22:05 ` Andrew Morton
2015-02-02 22:18   ` Mel Gorman
2015-02-02 22:35     ` Andrew Morton
2015-02-03  0:26       ` Davidlohr Bueso
2015-02-03 10:50       ` Mel Gorman
2015-02-05 21:44     ` Rik van Riel
2015-02-02 22:22 ` Dave Hansen
2015-02-03  8:19   ` MADV_DONTNEED semantics? Was: " Vlastimil Babka
2015-02-03 10:53     ` Kirill A. Shutemov
2015-02-03 11:42       ` Vlastimil Babka
2015-02-03 16:20         ` Michael Kerrisk (man-pages)
2015-02-04 13:46           ` Vlastimil Babka
2015-02-04 14:00             ` Michael Kerrisk (man-pages)
2015-02-04 17:02               ` Vlastimil Babka
2015-02-04 19:24                 ` Michael Kerrisk (man-pages)
2015-02-05  1:07                   ` Minchan Kim
2015-02-06 15:41                     ` Michael Kerrisk (man-pages)
2015-02-09  6:46                       ` Minchan Kim
2015-02-09  9:13                         ` Michael Kerrisk (man-pages)
2015-02-05 15:41                   ` Michal Hocko
2015-02-06 15:57                     ` Michael Kerrisk (man-pages)
2015-02-06 20:45                       ` Michal Hocko
2015-02-09  6:50                       ` Minchan Kim
2015-02-04  0:09         ` Minchan Kim
2015-02-03 11:16     ` Mel Gorman
2015-02-03 15:21       ` Michal Hocko
2015-02-03 16:25         ` Michael Kerrisk (man-pages)
2015-02-03  9:47   ` Mel Gorman
2015-02-03 10:47     ` Kirill A. Shutemov
2015-02-03 11:21       ` Mel Gorman
