* [PATCH 0/2] man-pages: clarify MAP_LOCKED semantic
@ 2015-05-13 14:38 ` Michal Hocko
  0 siblings, 0 replies; 32+ messages in thread
From: Michal Hocko @ 2015-05-13 14:38 UTC (permalink / raw)
  To: Michael Kerrisk
  Cc: Andrew Morton, Linus Torvalds, David Rientjes, LKML, Linux API, linux-mm

Hi,
during the previous discussion http://marc.info/?l=linux-mm&m=143022313618001&w=2
it became clear that giving mmap(MAP_LOCKED) true mlock() semantics is
too dangerous. Even though we can try to reduce the failure space, the
mmap man page should spell out the subtle distinctions between the two.
That is what the first patch does.
The second patch is a small clarification for MAP_POPULATE based on
David Rientjes' feedback.
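
As a rough illustration of the distinction between the two (a sketch only,
not part of the patches; the helper name map_locked_strict is hypothetical),
an application that cannot tolerate later major faults can create the mapping
first and then call mlock(2) explicitly, so that a failure to populate or
lock the range is reported to the caller:

```c
#define _GNU_SOURCE
#include <errno.h>
#include <stddef.h>
#include <sys/mman.h>

/*
 * Hypothetical helper (an assumption for illustration): map `len` bytes of
 * anonymous memory, then lock it with a separate mlock(2) call.
 * Unlike mmap(MAP_LOCKED), a failure to lock the range is visible here.
 * Returns the mapping on success, or NULL with errno set.
 */
static void *map_locked_strict(size_t len)
{
    void *p = mmap(NULL, len, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (p == MAP_FAILED)
        return NULL;

    if (mlock(p, len) != 0) {
        int saved = errno;      /* munmap() may clobber errno */
        munmap(p, len);
        errno = saved;
        return NULL;
    }
    return p;
}
```

A caller would treat a NULL return as "could not lock the whole range" and
fall back or abort. With mmap(MAP_LOCKED) alone, no such error is reported.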


* [PATCH 1/2] mmap.2: clarify MAP_LOCKED semantic
@ 2015-05-13 14:38   ` Michal Hocko
  0 siblings, 0 replies; 32+ messages in thread
From: Michal Hocko @ 2015-05-13 14:38 UTC (permalink / raw)
  To: Michael Kerrisk
  Cc: Andrew Morton, Linus Torvalds, David Rientjes, LKML, Linux API,
	linux-mm, Michal Hocko

From: Michal Hocko <mhocko@suse.cz>

MAP_LOCKED has had subtly different semantics from mmap(2)+mlock(2) ever
since it was introduced.
mlock(2) fails if the memory range cannot be populated, which guarantees
that no major faults will happen on the range later. mmap(MAP_LOCKED), on
the other hand, silently succeeds even if the range was only partially
populated.

Fixing this subtle difference in the kernel is rather awkward because
the memory population happens after the mm locks have been dropped, so
the cleanup before returning failure (munlock) could operate on something
other than the originally mapped area.

E.g. a speculative userspace page fault handler catching SEGV and doing
mmap(fault_addr, MAP_FIXED|MAP_LOCKED) might discard a portion of a racing
mmap and lead to lost data. Although it is not clear whether such
usage would be valid, the mmap man page doesn't explicitly describe
requirements for threaded applications, so we cannot exclude this
possibility.

This patch makes the semantics of MAP_LOCKED explicit and suggests using
mmap + mlock as the only way to guarantee no later major page faults.

Signed-off-by: Michal Hocko <mhocko@suse.cz>
---
 man2/mmap.2 | 13 ++++++++++++-
 1 file changed, 12 insertions(+), 1 deletion(-)

diff --git a/man2/mmap.2 b/man2/mmap.2
index 54d68cf87e9e..1486be2e96b3 100644
--- a/man2/mmap.2
+++ b/man2/mmap.2
@@ -235,8 +235,19 @@ See the Linux kernel source file
 for further information.
 .TP
 .BR MAP_LOCKED " (since Linux 2.5.37)"
-Lock the pages of the mapped region into memory in the manner of
+Mark the mmaped region to be locked in the same way as
 .BR mlock (2).
+This implementation will try to populate (prefault) the whole range but
+the mmap call doesn't fail with
+.B ENOMEM
+if this fails. Therefore major faults might happen later on. So the semantic
+is not as strong as
+.BR mlock (2).
+.BR mmap (2)
++
+.BR mlock (2)
+should be used when major faults are not acceptable after the initialization
+of the mapping.
 This flag is ignored in older kernels.
 .\" If set, the mapped pages will not be swapped out.
 .TP
-- 
2.1.4


* [PATCH 2/2] mmap2: clarify MAP_POPULATE
  2015-05-13 14:38 ` Michal Hocko
@ 2015-05-13 14:38   ` Michal Hocko
  -1 siblings, 0 replies; 32+ messages in thread
From: Michal Hocko @ 2015-05-13 14:38 UTC (permalink / raw)
  To: Michael Kerrisk
  Cc: Andrew Morton, Linus Torvalds, David Rientjes, LKML, Linux API,
	linux-mm, Michal Hocko

From: Michal Hocko <mhocko@suse.cz>

David Rientjes has noticed that the MAP_POPULATE wording might promise much
more than the kernel actually provides or intends to provide. The
primary use of the flag is to pre-fault the range. There is no
guarantee that no major faults will happen later on: the pages might
have been reclaimed by the time the process tries to access them.

Signed-off-by: Michal Hocko <mhocko@suse.cz>
---
 man2/mmap.2 | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/man2/mmap.2 b/man2/mmap.2
index 1486be2e96b3..dcf306f2f730 100644
--- a/man2/mmap.2
+++ b/man2/mmap.2
@@ -284,7 +284,7 @@ private writable mappings.
 .BR MAP_POPULATE " (since Linux 2.5.46)"
 Populate (prefault) page tables for a mapping.
 For a file mapping, this causes read-ahead on the file.
-Later accesses to the mapping will not be blocked by page faults.
+This will help to reduce blocking on page faults later.
 .BR MAP_POPULATE
 is supported for private mappings only since Linux 2.6.23.
 .TP
-- 
2.1.4


* Re: [PATCH 1/2] mmap.2: clarify MAP_LOCKED semantic
  2015-05-13 14:38   ` Michal Hocko
  (?)
  (?)
@ 2015-05-13 14:45   ` Eric B Munson
  2015-05-13 14:48       ` Eric B Munson
  2015-05-14  8:01       ` Michal Hocko
  -1 siblings, 2 replies; 32+ messages in thread
From: Eric B Munson @ 2015-05-13 14:45 UTC (permalink / raw)
  To: Michal Hocko
  Cc: Michael Kerrisk, Andrew Morton, Linus Torvalds, David Rientjes,
	LKML, Linux API, linux-mm, Michal Hocko

On Wed, 13 May 2015, Michal Hocko wrote:

> From: Michal Hocko <mhocko@suse.cz>
> 
> MAP_LOCKED had a subtly different semantic from mmap(2)+mlock(2) since
> it has been introduced.
> mlock(2) fails if the memory range cannot get populated to guarantee
> that no future major faults will happen on the range. mmap(MAP_LOCKED) on
> the other hand silently succeeds even if the range was populated only
> partially.
> 
> Fixing this subtle difference in the kernel is rather awkward because
> the memory population happens after mm locks have been dropped and so
> the cleanup before returning failure (munlock) could operate on something
> else than the originally mapped area.
> 
> E.g. speculative userspace page fault handler catching SEGV and doing
> mmap(fault_addr, MAP_FIXED|MAP_LOCKED) might discard portion of a racing
> mmap and lead to lost data. Although it is not clear whether such a
> usage would be valid, mmap page doesn't explicitly describe requirements
> for threaded applications so we cannot exclude this possibility.
> 
> This patch makes the semantic of MAP_LOCKED explicit and suggest using
> mmap + mlock as the only way to guarantee no later major page faults.
> 
> Signed-off-by: Michal Hocko <mhocko@suse.cz>

Does the problem still happen when MAP_POPULATE | MAP_LOCKED is used?
(AFAICT MAP_POPULATE will cause the mmap to fail if all the pages cannot
be made present.)

Either way this is a good catch.

Acked-by: Eric B Munson <emunson@akamai.com>


* Re: [PATCH 2/2] mmap2: clarify MAP_POPULATE
@ 2015-05-13 14:47     ` Eric B Munson
  0 siblings, 0 replies; 32+ messages in thread
From: Eric B Munson @ 2015-05-13 14:47 UTC (permalink / raw)
  To: Michal Hocko
  Cc: Michael Kerrisk, Andrew Morton, Linus Torvalds, David Rientjes,
	LKML, Linux API, linux-mm, Michal Hocko

On Wed, 13 May 2015, Michal Hocko wrote:

> From: Michal Hocko <mhocko@suse.cz>
> 
> David Rientjes has noticed that MAP_POPULATE wording might promise much
> more than the kernel actually provides and intend to provide. The
> primary usage of the flag is to pre-fault the range. There is no
> guarantee that no major faults will happen later on. The pages might
> have been reclaimed by the time the process tries to access them.
> 
> Signed-off-by: Michal Hocko <mhocko@suse.cz>

Reviewed-by: Eric B Munson <emunson@akamai.com>

* Re: [PATCH 1/2] mmap.2: clarify MAP_LOCKED semantic
@ 2015-05-13 14:48       ` Eric B Munson
  0 siblings, 0 replies; 32+ messages in thread
From: Eric B Munson @ 2015-05-13 14:48 UTC (permalink / raw)
  To: Michal Hocko
  Cc: Michael Kerrisk, Andrew Morton, Linus Torvalds, David Rientjes,
	LKML, Linux API, linux-mm, Michal Hocko

On Wed, 13 May 2015, Eric B Munson wrote:

> On Wed, 13 May 2015, Michal Hocko wrote:
> 
> > From: Michal Hocko <mhocko@suse.cz>
> > 
> > MAP_LOCKED had a subtly different semantic from mmap(2)+mlock(2) since
> > it has been introduced.
> > mlock(2) fails if the memory range cannot get populated to guarantee
> > that no future major faults will happen on the range. mmap(MAP_LOCKED) on
> > the other hand silently succeeds even if the range was populated only
> > partially.
> > 
> > Fixing this subtle difference in the kernel is rather awkward because
> > the memory population happens after mm locks have been dropped and so
> > the cleanup before returning failure (munlock) could operate on something
> > else than the originally mapped area.
> > 
> > E.g. speculative userspace page fault handler catching SEGV and doing
> > mmap(fault_addr, MAP_FIXED|MAP_LOCKED) might discard portion of a racing
> > mmap and lead to lost data. Although it is not clear whether such a
> > usage would be valid, mmap page doesn't explicitly describe requirements
> > for threaded applications so we cannot exclude this possibility.
> > 
> > This patch makes the semantic of MAP_LOCKED explicit and suggest using
> > mmap + mlock as the only way to guarantee no later major page faults.
> > 
> > Signed-off-by: Michal Hocko <mhocko@suse.cz>
> 
> Does the problem still happen when MAP_POPULATE | MAP_LOCKED is used?
> (AFAICT MAP_POPULATE will cause the mmap to fail if all the pages cannot
> be made present.)
> 
> Either way this is a good catch.
> 
> Acked-by: Eric B Munson <emunson@akamai.com>
> 
Sorry for the noise, this should have been a

Reviewed-by: Eric B Munson <emunson@akamai.com>


* Re: [PATCH 1/2] mmap.2: clarify MAP_LOCKED semantic
@ 2015-05-14  8:01       ` Michal Hocko
  0 siblings, 0 replies; 32+ messages in thread
From: Michal Hocko @ 2015-05-14  8:01 UTC (permalink / raw)
  To: Eric B Munson
  Cc: Michael Kerrisk, Andrew Morton, Linus Torvalds, David Rientjes,
	LKML, Linux API, linux-mm

On Wed 13-05-15 10:45:06, Eric B Munson wrote:
> On Wed, 13 May 2015, Michal Hocko wrote:
> 
> > From: Michal Hocko <mhocko@suse.cz>
> > 
> > MAP_LOCKED had a subtly different semantic from mmap(2)+mlock(2) since
> > it has been introduced.
> > mlock(2) fails if the memory range cannot get populated to guarantee
> > that no future major faults will happen on the range. mmap(MAP_LOCKED) on
> > the other hand silently succeeds even if the range was populated only
> > partially.
> > 
> > Fixing this subtle difference in the kernel is rather awkward because
> > the memory population happens after mm locks have been dropped and so
> > the cleanup before returning failure (munlock) could operate on something
> > else than the originally mapped area.
> > 
> > E.g. speculative userspace page fault handler catching SEGV and doing
> > mmap(fault_addr, MAP_FIXED|MAP_LOCKED) might discard portion of a racing
> > mmap and lead to lost data. Although it is not clear whether such a
> > usage would be valid, mmap page doesn't explicitly describe requirements
> > for threaded applications so we cannot exclude this possibility.
> > 
> > This patch makes the semantic of MAP_LOCKED explicit and suggest using
> > mmap + mlock as the only way to guarantee no later major page faults.
> > 
> > Signed-off-by: Michal Hocko <mhocko@suse.cz>
> 
> Does the problem still happen when MAP_POPULATE | MAP_LOCKED is used?
> (AFAICT MAP_POPULATE will cause the mmap to fail if all the pages cannot
> be made present.)

No, there is no difference because MAP_POPULATE is implicit when
MAP_LOCKED is used and, as pointed out in the cover letter, we cannot fail
after the vma is created and the locks are dropped. The second patch tries
to clarify that MAP_POPULATE is just best effort.
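
To illustrate the best-effort nature (a sketch only, not part of the patches;
resident_pages is a hypothetical helper, and a mincore(2) result is itself
only a snapshot that can be stale by the time it is read), userspace could
check how much of a range is actually resident after mmap returns:

```c
#define _GNU_SOURCE
#include <stdlib.h>
#include <sys/mman.h>
#include <unistd.h>

/*
 * Hypothetical helper (an assumption for illustration): count the pages of
 * [addr, addr + len) that mincore(2) reports as resident.
 * `addr` must be page-aligned.
 * Returns the number of resident pages, or -1 on error (errno set).
 */
static long resident_pages(void *addr, size_t len)
{
    size_t page = (size_t)sysconf(_SC_PAGESIZE);
    size_t npages = (len + page - 1) / page;
    unsigned char *vec = malloc(npages);
    long count = 0;

    if (vec == NULL)
        return -1;

    if (mincore(addr, len, vec) != 0) {
        free(vec);
        return -1;
    }

    /* The least significant bit of each byte is set if the page is resident. */
    for (size_t i = 0; i < npages; i++)
        count += vec[i] & 1;

    free(vec);
    return count;
}
```

A result smaller than the number of pages in the mapping means that some
accesses may still fault, even though the mmap call itself succeeded.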

> Either way this is a good catch.
> 
> Acked-by: Eric B Munson <emunson@akamai.com>
 
Thanks!


-- 
Michal Hocko
SUSE Labs

* Re: [PATCH 1/2] mmap.2: clarify MAP_LOCKED semantic
  2015-05-13 14:38   ` Michal Hocko
@ 2015-05-14 13:36     ` Michael Kerrisk (man-pages)
  -1 siblings, 0 replies; 32+ messages in thread
From: Michael Kerrisk (man-pages) @ 2015-05-14 13:36 UTC (permalink / raw)
  To: Michal Hocko
  Cc: mtk.manpages, Andrew Morton, Linus Torvalds, David Rientjes,
	LKML, Linux API, linux-mm, Michal Hocko, Eric B Munson

On 05/13/2015 04:38 PM, Michal Hocko wrote:
> From: Michal Hocko <mhocko@suse.cz>
> 
> MAP_LOCKED had a subtly different semantic from mmap(2)+mlock(2) since
> it has been introduced.
> mlock(2) fails if the memory range cannot get populated to guarantee
> that no future major faults will happen on the range. mmap(MAP_LOCKED) on
> the other hand silently succeeds even if the range was populated only
> partially.
> 
> Fixing this subtle difference in the kernel is rather awkward because
> the memory population happens after mm locks have been dropped and so
> the cleanup before returning failure (munlock) could operate on something
> else than the originally mapped area.
> 
> E.g. speculative userspace page fault handler catching SEGV and doing
> mmap(fault_addr, MAP_FIXED|MAP_LOCKED) might discard portion of a racing
> mmap and lead to lost data. Although it is not clear whether such a
> usage would be valid, mmap page doesn't explicitly describe requirements
> for threaded applications so we cannot exclude this possibility.
> 
> This patch makes the semantic of MAP_LOCKED explicit and suggest using
> mmap + mlock as the only way to guarantee no later major page faults.

Thanks, Michal. Applied, with Reviewed-by: from Eric added.

Cheers,

Michael


> Signed-off-by: Michal Hocko <mhocko@suse.cz>
> ---
>  man2/mmap.2 | 13 ++++++++++++-
>  1 file changed, 12 insertions(+), 1 deletion(-)
> 
> diff --git a/man2/mmap.2 b/man2/mmap.2
> index 54d68cf87e9e..1486be2e96b3 100644
> --- a/man2/mmap.2
> +++ b/man2/mmap.2
> @@ -235,8 +235,19 @@ See the Linux kernel source file
>  for further information.
>  .TP
>  .BR MAP_LOCKED " (since Linux 2.5.37)"
> -Lock the pages of the mapped region into memory in the manner of
> +Mark the mmaped region to be locked in the same way as
>  .BR mlock (2).
> +This implementation will try to populate (prefault) the whole range but
> +the mmap call doesn't fail with
> +.B ENOMEM
> +if this fails. Therefore major faults might happen later on. So the semantic
> +is not as strong as
> +.BR mlock (2).
> +.BR mmap (2)
> ++
> +.BR mlock (2)
> +should be used when major faults are not acceptable after the initialization
> +of the mapping.
>  This flag is ignored in older kernels.
>  .\" If set, the mapped pages will not be swapped out.
>  .TP
> 


-- 
Michael Kerrisk
Linux man-pages maintainer; http://www.kernel.org/doc/man-pages/
Linux/UNIX System Programming Training: http://man7.org/training/

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH 2/2] mmap2: clarify MAP_POPULATE
  2015-05-13 14:38   ` Michal Hocko
@ 2015-05-14 13:36     ` Michael Kerrisk (man-pages)
  -1 siblings, 0 replies; 32+ messages in thread
From: Michael Kerrisk (man-pages) @ 2015-05-14 13:36 UTC (permalink / raw)
  To: Michal Hocko
  Cc: mtk.manpages, Andrew Morton, Linus Torvalds, David Rientjes,
	LKML, Linux API, linux-mm, Michal Hocko

On 05/13/2015 04:38 PM, Michal Hocko wrote:
> From: Michal Hocko <mhocko@suse.cz>
> 
> David Rientjes has noticed that MAP_POPULATE wording might promise much
> more than the kernel actually provides and intend to provide. The
> primary usage of the flag is to pre-fault the range. There is no
> guarantee that no major faults will happen later on. The pages might
> have been reclaimed by the time the process tries to access them.

Yes, thanks, Michal -- that's a good point to make clearer.
Applied, with Reviewed-by: from Eric added.

Cheers,

Michael

> Signed-off-by: Michal Hocko <mhocko@suse.cz>
> ---
>  man2/mmap.2 | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/man2/mmap.2 b/man2/mmap.2
> index 1486be2e96b3..dcf306f2f730 100644
> --- a/man2/mmap.2
> +++ b/man2/mmap.2
> @@ -284,7 +284,7 @@ private writable mappings.
>  .BR MAP_POPULATE " (since Linux 2.5.46)"
>  Populate (prefault) page tables for a mapping.
>  For a file mapping, this causes read-ahead on the file.
> -Later accesses to the mapping will not be blocked by page faults.
> +This will help to reduce blocking on page faults later.
>  .BR MAP_POPULATE
>  is supported for private mappings only since Linux 2.6.23.
>  .TP
> 
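
To make the weaker MAP_POPULATE wording concrete: a caller can ask for
prefaulting and then check how much of the range is actually resident.
The sketch below does that with mincore(2). The names map_populated()
and resident_pages() are invented for this example, and an anonymous
private mapping is assumed. The count is only a snapshot, since pages
might be reclaimed again at any later time.

```c
#define _DEFAULT_SOURCE
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <sys/mman.h>
#include <unistd.h>

/* Hypothetical helper: map `len` anonymous bytes and request prefaulting. */
void *map_populated(size_t len)
{
    assert(len > 0);

    void *p = mmap(NULL, len, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANONYMOUS | MAP_POPULATE, -1, 0);
    return p == MAP_FAILED ? NULL : p;
}

/*
 * Hypothetical helper: number of pages in [addr, addr + len) that
 * mincore(2) reports as resident right now, or -1 on error.
 * `addr` must be page-aligned.
 */
long resident_pages(void *addr, size_t len)
{
    assert(len > 0);

    size_t page = (size_t)sysconf(_SC_PAGESIZE);
    size_t npages = (len + page - 1) / page;
    unsigned char *vec = malloc(npages);
    long count = -1;

    if (vec == NULL)
        return -1;

    if (mincore(addr, len, vec) == 0) {
        count = 0;
        for (size_t i = 0; i < npages; i++)
            count += vec[i] & 1;    /* bit 0 set: page is resident */
    }
    free(vec);
    return count;
}
```

A result smaller than the number of pages mapped would mean that later
accesses can still block on page faults, which is what the reworded
man page text allows for.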


-- 
Michael Kerrisk
Linux man-pages maintainer; http://www.kernel.org/doc/man-pages/
Linux/UNIX System Programming Training: http://man7.org/training/

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH 2/2] mmap2: clarify MAP_POPULATE
  2015-05-13 14:38   ` Michal Hocko
@ 2015-05-15  0:13     ` David Rientjes
  -1 siblings, 0 replies; 32+ messages in thread
From: David Rientjes @ 2015-05-15  0:13 UTC (permalink / raw)
  To: Michal Hocko
  Cc: Michael Kerrisk, Andrew Morton, Linus Torvalds, LKML, Linux API,
	linux-mm, Michal Hocko

On Wed, 13 May 2015, Michal Hocko wrote:

> From: Michal Hocko <mhocko@suse.cz>
> 
> David Rientjes has noticed that MAP_POPULATE wording might promise much
> more than the kernel actually provides and intend to provide. The
> primary usage of the flag is to pre-fault the range. There is no
> guarantee that no major faults will happen later on. The pages might
> have been reclaimed by the time the process tries to access them.
> 
> Signed-off-by: Michal Hocko <mhocko@suse.cz>

Acked-by: David Rientjes <rientjes@google.com>

Thanks for following up!

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH 0/2] man-pages: clarify MAP_LOCKED semantic
  2015-05-13 14:38 ` Michal Hocko
  (?)
@ 2015-05-18  9:12   ` Michal Hocko
  -1 siblings, 0 replies; 32+ messages in thread
From: Michal Hocko @ 2015-05-18  9:12 UTC (permalink / raw)
  To: Michael Kerrisk
  Cc: Andrew Morton, Linus Torvalds, David Rientjes, LKML, Linux API, linux-mm

On Wed 13-05-15 16:38:10, Michal Hocko wrote:
> Hi,
> during the previous discussion http://marc.info/?l=linux-mm&m=143022313618001&w=2
> it was made clear that making mmap(MAP_LOCKED) semantic really have
> mlock() semantic is too dangerous. Even though we can try to reduce the
> failure space the mmap man page should make it really clear about the
> subtle distinctions between the two. This is what that first patch does.
> The second patch is a small clarification for MAP_POPULATE based on
> David Rientjes feedback.

I completely forgot about the in-kernel documentation.
---
From 9d1478ccd036f84e50da906e39cd1e7bcb94cecd Mon Sep 17 00:00:00 2001
From: Michal Hocko <mhocko@suse.cz>
Date: Mon, 18 May 2015 11:07:00 +0200
Subject: [PATCH] Documentation/vm/unevictable-lru.txt: clarify MAP_LOCKED
 behavior

There is a very subtle difference between the semantics of mmap()+mlock()
and mmap(MAP_LOCKED). The former fails if population of the area fails,
while the latter does not. This basically means that mmap(MAP_LOCKED)
areas might see major faults after the mmap syscall returns, which is not
the case for mlock. The mmap man page has already been altered, but
Documentation/vm/unevictable-lru.txt deserves a clarification as well.

Reported-by: David Rientjes <rientjes@google.com>
Signed-off-by: Michal Hocko <mhocko@suse.cz>
---
 Documentation/vm/unevictable-lru.txt | 8 +++++++-
 1 file changed, 7 insertions(+), 1 deletion(-)

diff --git a/Documentation/vm/unevictable-lru.txt b/Documentation/vm/unevictable-lru.txt
index 3be0bfc4738d..32ee3a67dba2 100644
--- a/Documentation/vm/unevictable-lru.txt
+++ b/Documentation/vm/unevictable-lru.txt
@@ -467,7 +467,13 @@ mmap(MAP_LOCKED) SYSTEM CALL HANDLING
 
 In addition the mlock()/mlockall() system calls, an application can request
 that a region of memory be mlocked supplying the MAP_LOCKED flag to the mmap()
-call.  Furthermore, any mmap() call or brk() call that expands the heap by a
+call. There is one important and subtle difference here, though. mmap() + mlock()
+will fail if the range cannot be faulted in (e.g. because mm_populate fails)
+and returns with ENOMEM while mmap(MAP_LOCKED) will not fail. The mmaped
+area will still have properties of the locked area - aka. pages will not get
+swapped out - but major page faults to fault memory in might still happen.
+
+Furthermore, any mmap() call or brk() call that expands the heap by a
 task that has previously called mlockall() with the MCL_FUTURE flag will result
 in the newly mapped memory being mlocked.  Before the unevictable/mlock
 changes, the kernel simply called make_pages_present() to allocate pages and
-- 
2.1.4
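
One way to observe the behavior described in the patch is to count major
faults around the first access to a region. The sketch below uses
getrusage(2) and its ru_majflt field. The function name
major_faults_while_touching() is hypothetical, and the counter is
process-wide, so faults taken by other threads in the same window would
be included as well.

```c
#define _DEFAULT_SOURCE
#include <assert.h>
#include <stddef.h>
#include <sys/resource.h>
#include <unistd.h>

/*
 * Hypothetical helper: read one byte from every page of [p, p + len)
 * and return how many major faults the process took while doing so,
 * or -1 if getrusage(2) fails.
 */
long major_faults_while_touching(volatile unsigned char *p, size_t len)
{
    assert(p != NULL && len > 0);

    struct rusage before, after;
    size_t page = (size_t)sysconf(_SC_PAGESIZE);

    if (getrusage(RUSAGE_SELF, &before) != 0)
        return -1;

    /* A volatile read forces the access, faulting the page in if absent. */
    for (size_t off = 0; off < len; off += page)
        (void)p[off];

    if (getrusage(RUSAGE_SELF, &after) != 0)
        return -1;

    return after.ru_majflt - before.ru_majflt;
}
```

For a range that mlock() locked successfully the expected result is 0,
whereas for a mmap(MAP_LOCKED) range whose population failed a nonzero
count is possible.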


-- 
Michal Hocko
SUSE Labs

^ permalink raw reply related	[flat|nested] 32+ messages in thread

* Re: [PATCH 1/2] mmap.2: clarify MAP_LOCKED semantic
@ 2016-05-11 11:07     ` Peter Zijlstra
  0 siblings, 0 replies; 32+ messages in thread
From: Peter Zijlstra @ 2016-05-11 11:07 UTC (permalink / raw)
  To: Michal Hocko, Michael Kerrisk
  Cc: Andrew Morton, Linus Torvalds, David Rientjes, LKML, Linux API,
	linux-mm, Michal Hocko



On 05/13/2015 04:38 PM, Michal Hocko wrote:
> From: Michal Hocko <mhocko@suse.cz>
>
> MAP_LOCKED had a subtly different semantic from mmap(2)+mlock(2) since
> it has been introduced.
> mlock(2) fails if the memory range cannot get populated to guarantee
> that no future major faults will happen on the range. mmap(MAP_LOCKED) on
> the other hand silently succeeds even if the range was populated only
> partially.
>
> Fixing this subtle difference in the kernel is rather awkward because
> the memory population happens after mm locks have been dropped and so
> the cleanup before returning failure (munlock) could operate on something
> else than the originally mapped area.
>
> E.g. speculative userspace page fault handler catching SEGV and doing
> mmap(fault_addr, MAP_FIXED|MAP_LOCKED) might discard portion of a racing
> mmap and lead to lost data. Although it is not clear whether such a
> usage would be valid, mmap page doesn't explicitly describe requirements
> for threaded applications so we cannot exclude this possibility.
>
> This patch makes the semantic of MAP_LOCKED explicit and suggest using
> mmap + mlock as the only way to guarantee no later major page faults.
>

URGH, this really blows chunks. It basically means MAP_LOCKED is 
pointless cruft and we might as well remove it.

Why not fix it proper?

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH 1/2] mmap.2: clarify MAP_LOCKED semantic
  2016-05-11 11:07     ` Peter Zijlstra
@ 2016-05-11 11:18       ` Peter Zijlstra
  -1 siblings, 0 replies; 32+ messages in thread
From: Peter Zijlstra @ 2016-05-11 11:18 UTC (permalink / raw)
  To: Michal Hocko, Michael Kerrisk
  Cc: Andrew Morton, Linus Torvalds, David Rientjes, LKML, Linux API,
	linux-mm, Michal Hocko



On 05/11/2016 01:07 PM, Peter Zijlstra wrote:
> On 05/13/2015 04:38 PM, Michal Hocko wrote:
>>
>> This patch makes the semantic of MAP_LOCKED explicit and suggest using
>> mmap + mlock as the only way to guarantee no later major page faults.
>>
>
> URGH, this really blows chunks. It basically means MAP_LOCKED is 
> pointless cruft and we might as well remove it.
>
> Why not fix it proper?

OK; after having been pointed at this discussion, it seems I reacted rather
too hastily, in that I didn't read all the previous threads.

From that it appears that fixing this properly is indeed rather hard, and
we should indeed consider MAP_LOCKED broken. At that point I would have
worded the manpage update more strongly, but alas.

Sorry for the noise.

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH 1/2] mmap.2: clarify MAP_LOCKED semantic
@ 2016-05-11 11:32       ` Michal Hocko
  0 siblings, 0 replies; 32+ messages in thread
From: Michal Hocko @ 2016-05-11 11:32 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Michael Kerrisk, Andrew Morton, Linus Torvalds, David Rientjes,
	LKML, Linux API, linux-mm

On Wed 11-05-16 13:07:33, Peter Zijlstra wrote:
> 
> 
> On 05/13/2015 04:38 PM, Michal Hocko wrote:
> > From: Michal Hocko <mhocko@suse.cz>
> > 
> > MAP_LOCKED had a subtly different semantic from mmap(2)+mlock(2) since
> > it has been introduced.
> > mlock(2) fails if the memory range cannot get populated to guarantee
> > that no future major faults will happen on the range. mmap(MAP_LOCKED) on
> > the other hand silently succeeds even if the range was populated only
> > partially.
> > 
> > Fixing this subtle difference in the kernel is rather awkward because
> > the memory population happens after mm locks have been dropped and so
> > the cleanup before returning failure (munlock) could operate on something
> > else than the originally mapped area.
> > 
> > E.g. speculative userspace page fault handler catching SEGV and doing
> > mmap(fault_addr, MAP_FIXED|MAP_LOCKED) might discard portion of a racing
> > mmap and lead to lost data. Although it is not clear whether such a
> > usage would be valid, mmap page doesn't explicitly describe requirements
> > for threaded applications so we cannot exclude this possibility.
> > 
> > This patch makes the semantic of MAP_LOCKED explicit and suggest using
> > mmap + mlock as the only way to guarantee no later major page faults.
> > 
> 
> URGH, this really blows chunks. It basically means MAP_LOCKED is pointless
> cruft and we might as well remove it.

Yeah, the usefulness of MAP_LOCKED is somewhat reduced. Everybody who
wants the full semantics really has to use mlock(2).

> Why not fix it proper?

I have tried, but it turned out to be a problem because we drop mmap_sem
after the VMA has been initialized and, as Linus pointed out, there are
multithreaded applications which do opportunistic memory management[1].
So we would have to hold mmap_sem for write during the whole VMA setup +
population, and that doesn't seem to be worth all the trouble when we are
not even sure whether anybody relies on MAP_LOCKED having the hard mlock
semantics.

---
[1] http://lkml.kernel.org/r/CA+55aFydkG-BgZzry5DrTzueVh9VvEcVJdLV8iOyUphQk=0vpw@mail.gmail.com
-- 
Michal Hocko
SUSE Labs

^ permalink raw reply	[flat|nested] 32+ messages in thread

end of thread, other threads:[~2016-05-11 11:32 UTC | newest]

Thread overview: 32+ messages
2015-05-13 14:38 [PATCH 0/2] man-pages: clarify MAP_LOCKED semantic Michal Hocko
2015-05-13 14:38 ` Michal Hocko
2015-05-13 14:38 ` [PATCH 1/2] mmap.2: " Michal Hocko
2015-05-13 14:38   ` Michal Hocko
2015-05-13 14:38   ` Michal Hocko
2015-05-13 14:45   ` Eric B Munson
2015-05-13 14:48     ` Eric B Munson
2015-05-13 14:48       ` Eric B Munson
2015-05-14  8:01     ` Michal Hocko
2015-05-14  8:01       ` Michal Hocko
2015-05-14  8:01       ` Michal Hocko
2015-05-14 13:36   ` Michael Kerrisk (man-pages)
2015-05-14 13:36     ` Michael Kerrisk (man-pages)
2016-05-11 11:07   ` Peter Zijlstra
2016-05-11 11:07     ` Peter Zijlstra
2016-05-11 11:07     ` Peter Zijlstra
2016-05-11 11:18     ` Peter Zijlstra
2016-05-11 11:18       ` Peter Zijlstra
2016-05-11 11:32     ` Michal Hocko
2016-05-11 11:32       ` Michal Hocko
2016-05-11 11:32       ` Michal Hocko
2015-05-13 14:38 ` [PATCH 2/2] mmap2: clarify MAP_POPULATE Michal Hocko
2015-05-13 14:38   ` Michal Hocko
2015-05-13 14:47   ` Eric B Munson
2015-05-13 14:47     ` Eric B Munson
2015-05-14 13:36   ` Michael Kerrisk (man-pages)
2015-05-14 13:36     ` Michael Kerrisk (man-pages)
2015-05-15  0:13   ` David Rientjes
2015-05-15  0:13     ` David Rientjes
2015-05-18  9:12 ` [PATCH 0/2] man-pages: clarify MAP_LOCKED semantic Michal Hocko
2015-05-18  9:12   ` Michal Hocko
2015-05-18  9:12   ` Michal Hocko
