* [PATCH v1] madvise.2: Document MADV_POPULATE_READ and MADV_POPULATE_WRITE
@ 2021-07-12 8:39 David Hildenbrand
2021-07-12 9:58 ` Pankaj Gupta
2021-07-12 11:05 ` Pankaj Gupta
0 siblings, 2 replies; 10+ messages in thread
From: David Hildenbrand @ 2021-07-12 8:39 UTC (permalink / raw)
To: linux-man
Cc: David Hildenbrand, Alejandro Colomar, Michael Kerrisk,
Andrew Morton, Michal Hocko, Oscar Salvador, Jann Horn,
Mike Rapoport, Linux API, linux-mm
MADV_POPULATE_READ and MADV_POPULATE_WRITE have been merged into
upstream Linux via commit 4ca9b3859dac ("mm/madvise: introduce
MADV_POPULATE_(READ|WRITE) to prefault page tables"), part of v5.14-rc1.
Let's document the behavior and error conditions of these new madvise()
options.
Cc: Alejandro Colomar <alx.manpages@gmail.com>
Cc: Michael Kerrisk <mtk.manpages@gmail.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Oscar Salvador <osalvador@suse.de>
Cc: Jann Horn <jannh@google.com>
Cc: Mike Rapoport <rppt@kernel.org>
Cc: Linux API <linux-api@vger.kernel.org>
Cc: linux-mm@kvack.org
Signed-off-by: David Hildenbrand <david@redhat.com>
---
man2/madvise.2 | 80 ++++++++++++++++++++++++++++++++++++++++++++++++++
1 file changed, 80 insertions(+)
diff --git a/man2/madvise.2 b/man2/madvise.2
index f1f384c0c..3ec8c53a7 100644
--- a/man2/madvise.2
+++ b/man2/madvise.2
@@ -469,6 +469,59 @@ If a page is file-backed and dirty, it will be written back to the backing
storage.
The advice might be ignored for some pages in the range when it is not
applicable.
+.TP
+.BR MADV_POPULATE_READ " (since Linux 5.14)
+Populate (prefault) page tables readable for the whole range without actually
+reading. Depending on the underlying mapping, map the shared zeropage,
+preallocate memory or read the underlying file; files with holes might or
+might not preallocate blocks.
+Do not generate
+.B SIGBUS
+when populating fails, return an error instead.
+.IP
+If
+.B MADV_POPULATE_READ
+succeeds, all page tables have been populated (prefaulted) readable once.
+If
+.B MADV_POPULATE_READ
+fails, some page tables might have been populated.
+.IP
+.B MADV_POPULATE_READ
+cannot be applied to mappings without read permissions
+and special mappings marked with the kernel-internal
+.B VM_PFNMAP
+and
+.BR VM_IO .
+.IP
+Note that with
+.BR MADV_POPULATE_READ ,
+the process can be killed at any moment when the system runs out of memory.
+.TP
+.BR MADV_POPULATE_WRITE " (since Linux 5.14)
+Populate (prefault) page tables writable for the whole range without actually
+writing. Depending on the underlying mapping, preallocate memory or read the
+underlying file; files with holes will preallocate blocks.
+Do not generate
+.B SIGBUS
+when populating fails, return an error instead.
+.IP
+If
+.B MADV_POPULATE_WRITE
+succeeds, all page tables have been populated (prefaulted) writable once.
+If
+.B MADV_POPULATE_WRITE
+fails, some page tables might have been populated.
+.IP
+.B MADV_POPULATE_WRITE
+cannot be applied to mappings without write permissions
+and special mappings marked with the kernel-internal
+.B VM_PFNMAP
+and
+.BR VM_IO .
+.IP
+Note that with
+.BR MADV_POPULATE_WRITE ,
+the process can be killed at any moment when the system runs out of memory.
.SH RETURN VALUE
On success,
.BR madvise ()
@@ -533,6 +586,17 @@ or
.BR VM_PFNMAP
ranges.
.TP
+.B EINVAL
+.I advice
+is
+.B MADV_POPULATE_READ
+or
+.BR MADV_POPULATE_WRITE ,
+but the specified address range includes ranges with insufficient permissions,
+.B VM_IO
+or
+.BR VM_PFNMAP .
+.TP
.B EIO
(for
.BR MADV_WILLNEED )
@@ -548,6 +612,14 @@ Not enough memory: paging in failed.
Addresses in the specified range are not currently
mapped, or are outside the address space of the process.
.TP
+.B ENOMEM
+.I advice
+is
+.B MADV_POPULATE_READ
+or
+.BR MADV_POPULATE_WRITE ,
+but populating (prefaulting) page tables failed.
+.TP
.B EPERM
.I advice
is
@@ -555,6 +627,14 @@ is
but the caller does not have the
.B CAP_SYS_ADMIN
capability.
+.TP
+.B EHWPOISON
+.I advice
+is
+.B MADV_POPULATE_READ
+or
+.BR MADV_POPULATE_WRITE ,
+and a HW poisoned page is encountered.
.SH VERSIONS
Since Linux 3.18,
.\" commit d3ac21cacc24790eb45d735769f35753f5b56ceb
--
2.31.1
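For readers following the thread, the error conditions documented in the patch above might be handled from C roughly as follows. This is only a sketch and not part of the patch: prefault_writable() is a hypothetical helper name, and the fallback values 22 and 23 for the two advice constants are assumed (they match the kernel's generic uapi header) for systems whose libc headers predate Linux 5.14.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <stddef.h>
#include <stdio.h>
#include <sys/mman.h>

/* Fallbacks for libc headers older than Linux 5.14 (assumed values,
 * as found in the generic uapi mman-common.h). */
#ifndef MADV_POPULATE_READ
#define MADV_POPULATE_READ 22
#endif
#ifndef MADV_POPULATE_WRITE
#define MADV_POPULATE_WRITE 23
#endif

/*
 * Hypothetical helper: prefault [addr, addr + len) writable.
 * Returns 0 on success or a negative errno value on failure.
 * On failure, part of the range may already have been populated.
 */
static int prefault_writable(void *addr, size_t len)
{
    if (madvise(addr, len, MADV_POPULATE_WRITE) == 0)
        return 0;

    int err = errno;

    switch (err) {
    case EINVAL:
        /* Insufficient permissions, a VM_IO/VM_PFNMAP mapping, or a
         * kernel that does not know this advice value. */
        fprintf(stderr, "populate: range not eligible or unsupported\n");
        break;
    case ENOMEM:
        /* Unmapped addresses in the range, or prefaulting failed. */
        fprintf(stderr, "populate: out of memory or range not mapped\n");
        break;
#ifdef EHWPOISON
    case EHWPOISON:
        fprintf(stderr, "populate: hit a hardware-poisoned page\n");
        break;
#endif
    default:
        fprintf(stderr, "populate: unexpected errno %d\n", err);
        break;
    }
    return -err;
}
```

A caller would typically mmap() a region with PROT_READ | PROT_WRITE and call prefault_writable(p, len) once instead of touching every page by hand. A return value of -EINVAL on a pre-5.14 kernel would mean falling back to that manual loop.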
* Re: [PATCH v1] madvise.2: Document MADV_POPULATE_READ and MADV_POPULATE_WRITE
2021-07-12 8:39 [PATCH v1] madvise.2: Document MADV_POPULATE_READ and MADV_POPULATE_WRITE David Hildenbrand
@ 2021-07-12 9:58 ` Pankaj Gupta
2021-07-12 11:05 ` Pankaj Gupta
1 sibling, 0 replies; 10+ messages in thread
From: Pankaj Gupta @ 2021-07-12 9:58 UTC (permalink / raw)
To: David Hildenbrand
Cc: linux-man, Alejandro Colomar, Michael Kerrisk, Andrew Morton,
Michal Hocko, Oscar Salvador, Jann Horn, Mike Rapoport,
Linux API, Linux MM
> MADV_POPULATE_READ and MADV_POPULATE_WRITE have been merged into
> upstream Linux via commit 4ca9b3859dac ("mm/madvise: introduce
> MADV_POPULATE_(READ|WRITE) to prefault page tables"), part of v5.14-rc1.
>
> Let's document the behavior and error conditions of these new madvise()
> options.
>
> Cc: Alejandro Colomar <alx.manpages@gmail.com>
> Cc: Michael Kerrisk <mtk.manpages@gmail.com>
> Cc: Andrew Morton <akpm@linux-foundation.org>
> Cc: Michal Hocko <mhocko@suse.com>
> Cc: Oscar Salvador <osalvador@suse.de>
> Cc: Jann Horn <jannh@google.com>
> Cc: Mike Rapoport <rppt@kernel.org>
> Cc: Linux API <linux-api@vger.kernel.org>
> Cc: linux-mm@kvack.org
> Signed-off-by: David Hildenbrand <david@redhat.com>
> ---
> man2/madvise.2 | 80 ++++++++++++++++++++++++++++++++++++++++++++++++++
> 1 file changed, 80 insertions(+)
>
> diff --git a/man2/madvise.2 b/man2/madvise.2
> index f1f384c0c..3ec8c53a7 100644
> --- a/man2/madvise.2
> +++ b/man2/madvise.2
> @@ -469,6 +469,59 @@ If a page is file-backed and dirty, it will be written back to the backing
> storage.
> The advice might be ignored for some pages in the range when it is not
> applicable.
> +.TP
> +.BR MADV_POPULATE_READ " (since Linux 5.14)
> +Populate (prefault) page tables readable for the whole range without actually
> +reading. Depending on the underlying mapping, map the shared zeropage,
> +preallocate memory or read the underlying file; files with holes might or
> +might not preallocate blocks.
> +Do not generate
> +.B SIGBUS
> +when populating fails, return an error instead.
> +.IP
> +If
> +.B MADV_POPULATE_READ
> +succeeds, all page tables have been populated (prefaulted) readable once.
> +If
> +.B MADV_POPULATE_READ
> +fails, some page tables might have been populated.
> +.IP
> +.B MADV_POPULATE_READ
> +cannot be applied to mappings without read permissions
> +and special mappings marked with the kernel-internal
> +.B VM_PFNMAP
> +and
> +.BR VM_IO .
> +.IP
> +Note that with
> +.BR MADV_POPULATE_READ ,
> +the process can be killed at any moment when the system runs out of memory.
> +.TP
> +.BR MADV_POPULATE_WRITE " (since Linux 5.14)
> +Populate (prefault) page tables writable for the whole range without actually
> +writing. Depending on the underlying mapping, preallocate memory or read the
Is this read or write?
just reading and trying to understand :)
> +underlying file; files with holes will preallocate blocks.
> +Do not generate
> +.B SIGBUS
> +when populating fails, return an error instead.
> +.IP
> +If
> +.B MADV_POPULATE_WRITE
> +succeeds, all page tables have been populated (prefaulted) writable once.
> +If
> +.B MADV_POPULATE_WRITE
> +fails, some page tables might have been populated.
> +.IP
> +.B MADV_POPULATE_WRITE
> +cannot be applied to mappings without write permissions
> +and special mappings marked with the kernel-internal
> +.B VM_PFNMAP
> +and
> +.BR VM_IO .
> +.IP
> +Note that with
> +.BR MADV_POPULATE_WRITE ,
> +the process can be killed at any moment when the system runs out of memory.
> .SH RETURN VALUE
> On success,
> .BR madvise ()
> @@ -533,6 +586,17 @@ or
> .BR VM_PFNMAP
> ranges.
> .TP
> +.B EINVAL
> +.I advice
> +is
> +.B MADV_POPULATE_READ
> +or
> +.BR MADV_POPULATE_WRITE ,
> +but the specified address range includes ranges with insufficient permissions,
> +.B VM_IO
> +or
> +.BR VM_PFNMAP .
> +.TP
> .B EIO
> (for
> .BR MADV_WILLNEED )
> @@ -548,6 +612,14 @@ Not enough memory: paging in failed.
> Addresses in the specified range are not currently
> mapped, or are outside the address space of the process.
> .TP
> +.B ENOMEM
> +.I advice
> +is
> +.B MADV_POPULATE_READ
> +or
> +.BR MADV_POPULATE_WRITE ,
> +but populating (prefaulting) page tables failed.
> +.TP
> .B EPERM
> .I advice
> is
> @@ -555,6 +627,14 @@ is
> but the caller does not have the
> .B CAP_SYS_ADMIN
> capability.
> +.TP
> +.B EHWPOISON
> +.I advice
> +is
> +.B MADV_POPULATE_READ
> +or
> +.BR MADV_POPULATE_WRITE ,
> +and a HW poisoned page is encountered.
> .SH VERSIONS
> Since Linux 3.18,
> .\" commit d3ac21cacc24790eb45d735769f35753f5b56ceb
> --
> 2.31.1
>
>
* Re: [PATCH v1] madvise.2: Document MADV_POPULATE_READ and MADV_POPULATE_WRITE
2021-07-12 9:58 ` Pankaj Gupta
(?)
@ 2021-07-12 10:03 ` David Hildenbrand
2021-07-12 10:17 ` Pankaj Gupta
-1 siblings, 1 reply; 10+ messages in thread
From: David Hildenbrand @ 2021-07-12 10:03 UTC (permalink / raw)
To: Pankaj Gupta
Cc: linux-man, Alejandro Colomar, Michael Kerrisk, Andrew Morton,
Michal Hocko, Oscar Salvador, Jann Horn, Mike Rapoport,
Linux API, Linux MM
On 12.07.21 11:58, Pankaj Gupta wrote:
>> MADV_POPULATE_READ and MADV_POPULATE_WRITE have been merged into
>> upstream Linux via commit 4ca9b3859dac ("mm/madvise: introduce
>> MADV_POPULATE_(READ|WRITE) to prefault page tables"), part of v5.14-rc1.
>>
>> Let's document the behavior and error conditions of these new madvise()
>> options.
>>
>> Cc: Alejandro Colomar <alx.manpages@gmail.com>
>> Cc: Michael Kerrisk <mtk.manpages@gmail.com>
>> Cc: Andrew Morton <akpm@linux-foundation.org>
>> Cc: Michal Hocko <mhocko@suse.com>
>> Cc: Oscar Salvador <osalvador@suse.de>
>> Cc: Jann Horn <jannh@google.com>
>> Cc: Mike Rapoport <rppt@kernel.org>
>> Cc: Linux API <linux-api@vger.kernel.org>
>> Cc: linux-mm@kvack.org
>> Signed-off-by: David Hildenbrand <david@redhat.com>
>> ---
>> man2/madvise.2 | 80 ++++++++++++++++++++++++++++++++++++++++++++++++++
>> 1 file changed, 80 insertions(+)
>>
>> diff --git a/man2/madvise.2 b/man2/madvise.2
>> index f1f384c0c..3ec8c53a7 100644
>> --- a/man2/madvise.2
>> +++ b/man2/madvise.2
>> @@ -469,6 +469,59 @@ If a page is file-backed and dirty, it will be written back to the backing
>> storage.
>> The advice might be ignored for some pages in the range when it is not
>> applicable.
>> +.TP
>> +.BR MADV_POPULATE_READ " (since Linux 5.14)
>> +Populate (prefault) page tables readable for the whole range without actually
>> +reading. Depending on the underlying mapping, map the shared zeropage,
>> +preallocate memory or read the underlying file; files with holes might or
>> +might not preallocate blocks.
>> +Do not generate
>> +.B SIGBUS
>> +when populating fails, return an error instead.
>> +.IP
>> +If
>> +.B MADV_POPULATE_READ
>> +succeeds, all page tables have been populated (prefaulted) readable once.
>> +If
>> +.B MADV_POPULATE_READ
>> +fails, some page tables might have been populated.
>> +.IP
>> +.B MADV_POPULATE_READ
>> +cannot be applied to mappings without read permissions
>> +and special mappings marked with the kernel-internal
>> +.B VM_PFNMAP
>> +and
>> +.BR VM_IO .
>> +.IP
>> +Note that with
>> +.BR MADV_POPULATE_READ ,
>> +the process can be killed at any moment when the system runs out of memory.
>> +.TP
>> +.BR MADV_POPULATE_WRITE " (since Linux 5.14)
>> +Populate (prefault) page tables writable for the whole range without actually
>> +writing. Depending on the underlying mapping, preallocate memory or read the
>
> Is this read or write?
> just reading and trying to understand :)
It's reading. Assume you have a file with existing content mapped into a
process. Once you touch a page (read/write/execute) that maps to blocks
with existing content, you'll have to load these blocks from disk first.
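As a rough illustration of that case (a sketch only, not code from the kernel or from the patch): create a file with existing content, map it shared, and prefault the mapping writable. make_scratch_file() and map_file_prefaulted() are made-up helper names, and the fallback value 23 for MADV_POPULATE_WRITE is an assumption for libc headers older than Linux 5.14. The madvise() call stores nothing itself; the pages it maps writable have to be filled from the file's existing blocks.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

/* Fallback for libc headers older than Linux 5.14 (assumed value). */
#ifndef MADV_POPULATE_WRITE
#define MADV_POPULATE_WRITE 23
#endif

/* Hypothetical helper: create an unlinked scratch file in /tmp holding
 * len bytes of 'A'. Returns an open descriptor, or -1 on failure. */
static int make_scratch_file(size_t len)
{
    char path[] = "/tmp/populate-XXXXXX";
    char buf[4096];
    int fd = mkstemp(path);

    if (fd < 0)
        return -1;
    unlink(path); /* the open descriptor keeps the file alive */
    memset(buf, 'A', sizeof(buf));
    for (size_t done = 0; done < len; ) {
        size_t chunk = len - done < sizeof(buf) ? len - done : sizeof(buf);
        ssize_t n = write(fd, buf, chunk);

        if (n <= 0) {
            close(fd);
            return -1;
        }
        done += (size_t)n;
    }
    return fd;
}

/* Hypothetical helper: map the first len bytes of fd shared and
 * read/write, then ask the kernel to prefault the range writable.
 * Returns the mapping, or MAP_FAILED if mmap() fails. *populate_err
 * receives 0, or the errno reported by madvise(). */
static void *map_file_prefaulted(int fd, size_t len, int *populate_err)
{
    void *p = mmap(NULL, len, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);

    *populate_err = 0;
    if (p == MAP_FAILED)
        return MAP_FAILED;
    /* No store happens here, yet the existing file content must be
     * brought in before the pages can be mapped writable. */
    if (madvise(p, len, MADV_POPULATE_WRITE) != 0)
        *populate_err = errno;
    return p;
}
```

In this small demo the freshly written data will usually still sit in the page cache, so no disk I/O is likely to be observed; with a large, cold file the same call would have to load the blocks from storage.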
Thanks! :)
--
Thanks,
David / dhildenb
* Re: [PATCH v1] madvise.2: Document MADV_POPULATE_READ and MADV_POPULATE_WRITE
2021-07-12 10:03 ` David Hildenbrand
@ 2021-07-12 10:17 ` Pankaj Gupta
0 siblings, 0 replies; 10+ messages in thread
From: Pankaj Gupta @ 2021-07-12 10:17 UTC (permalink / raw)
To: David Hildenbrand
Cc: linux-man, Alejandro Colomar, Michael Kerrisk, Andrew Morton,
Michal Hocko, Oscar Salvador, Jann Horn, Mike Rapoport,
Linux API, Linux MM
> >> +.BR MADV_POPULATE_WRITE " (since Linux 5.14)
> >> +Populate (prefault) page tables writable for the whole range without actually
> >> +writing. Depending on the underlying mapping, preallocate memory or read the
> >
> > Is this read or write?
> > just reading and trying to understand :)
>
> It's reading. Assume you have a file with existing content mapped into a
> process. Once you touch a page (read/write/execute) that maps to blocks
> with existing content, you'll have to load these blocks from disk first.
Got it. Thanks for explaining!
Best regards,
Pankaj
* Re: [PATCH v1] madvise.2: Document MADV_POPULATE_READ and MADV_POPULATE_WRITE
2021-07-12 8:39 [PATCH v1] madvise.2: Document MADV_POPULATE_READ and MADV_POPULATE_WRITE David Hildenbrand
@ 2021-07-12 11:05 ` Pankaj Gupta
2021-07-12 11:05 ` Pankaj Gupta
1 sibling, 0 replies; 10+ messages in thread
From: Pankaj Gupta @ 2021-07-12 11:05 UTC (permalink / raw)
To: David Hildenbrand
Cc: linux-man, Alejandro Colomar, Michael Kerrisk, Andrew Morton,
Michal Hocko, Oscar Salvador, Jann Horn, Mike Rapoport,
Linux API, Linux MM
> MADV_POPULATE_READ and MADV_POPULATE_WRITE have been merged into
> upstream Linux via commit 4ca9b3859dac ("mm/madvise: introduce
> MADV_POPULATE_(READ|WRITE) to prefault page tables"), part of v5.14-rc1.
>
> Let's document the behavior and error conditions of these new madvise()
> options.
>
> Cc: Alejandro Colomar <alx.manpages@gmail.com>
> Cc: Michael Kerrisk <mtk.manpages@gmail.com>
> Cc: Andrew Morton <akpm@linux-foundation.org>
> Cc: Michal Hocko <mhocko@suse.com>
> Cc: Oscar Salvador <osalvador@suse.de>
> Cc: Jann Horn <jannh@google.com>
> Cc: Mike Rapoport <rppt@kernel.org>
> Cc: Linux API <linux-api@vger.kernel.org>
> Cc: linux-mm@kvack.org
> Signed-off-by: David Hildenbrand <david@redhat.com>
> ---
> man2/madvise.2 | 80 ++++++++++++++++++++++++++++++++++++++++++++++++++
> 1 file changed, 80 insertions(+)
>
> diff --git a/man2/madvise.2 b/man2/madvise.2
> index f1f384c0c..3ec8c53a7 100644
> --- a/man2/madvise.2
> +++ b/man2/madvise.2
> @@ -469,6 +469,59 @@ If a page is file-backed and dirty, it will be written back to the backing
> storage.
> The advice might be ignored for some pages in the range when it is not
> applicable.
> +.TP
> +.BR MADV_POPULATE_READ " (since Linux 5.14)
> +Populate (prefault) page tables readable for the whole range without actually
> +reading. Depending on the underlying mapping, map the shared zeropage,
> +preallocate memory or read the underlying file; files with holes might or
> +might not preallocate blocks.
> +Do not generate
> +.B SIGBUS
> +when populating fails, return an error instead.
> +.IP
> +If
> +.B MADV_POPULATE_READ
> +succeeds, all page tables have been populated (prefaulted) readable once.
> +If
> +.B MADV_POPULATE_READ
> +fails, some page tables might have been populated.
> +.IP
> +.B MADV_POPULATE_READ
> +cannot be applied to mappings without read permissions
> +and special mappings marked with the kernel-internal
> +.B VM_PFNMAP
> +and
> +.BR VM_IO .
> +.IP
> +Note that with
> +.BR MADV_POPULATE_READ ,
> +the process can be killed at any moment when the system runs out of memory.
> +.TP
> +.BR MADV_POPULATE_WRITE " (since Linux 5.14)
> +Populate (prefault) page tables writable for the whole range without actually
> +writing. Depending on the underlying mapping, preallocate memory or read the
> +underlying file; files with holes will preallocate blocks.
> +Do not generate
> +.B SIGBUS
> +when populating fails, return an error instead.
> +.IP
> +If
> +.B MADV_POPULATE_WRITE
> +succeeds, all page tables have been populated (prefaulted) writable once.
> +If
> +.B MADV_POPULATE_WRITE
> +fails, some page tables might have been populated.
> +.IP
> +.B MADV_POPULATE_WRITE
> +cannot be applied to mappings without write permissions
> +and special mappings marked with the kernel-internal
> +.B VM_PFNMAP
> +and
> +.BR VM_IO .
> +.IP
> +Note that with
> +.BR MADV_POPULATE_WRITE ,
> +the process can be killed at any moment when the system runs out of memory.
> .SH RETURN VALUE
> On success,
> .BR madvise ()
> @@ -533,6 +586,17 @@ or
> .BR VM_PFNMAP
> ranges.
> .TP
> +.B EINVAL
> +.I advice
> +is
> +.B MADV_POPULATE_READ
> +or
> +.BR MADV_POPULATE_WRITE ,
> +but the specified address range includes ranges with insufficient permissions,
> +.B VM_IO
> +or
> +.BR VM_PFNMAP .
> +.TP
> .B EIO
> (for
> .BR MADV_WILLNEED )
> @@ -548,6 +612,14 @@ Not enough memory: paging in failed.
> Addresses in the specified range are not currently
> mapped, or are outside the address space of the process.
> .TP
> +.B ENOMEM
> +.I advice
> +is
> +.B MADV_POPULATE_READ
> +or
> +.BR MADV_POPULATE_WRITE ,
> +but populating (prefaulting) page tables failed.
> +.TP
> .B EPERM
> .I advice
> is
> @@ -555,6 +627,14 @@ is
> but the caller does not have the
> .B CAP_SYS_ADMIN
> capability.
> +.TP
> +.B EHWPOISON
> +.I advice
> +is
> +.B MADV_POPULATE_READ
> +or
> +.BR MADV_POPULATE_WRITE ,
> +and a HW poisoned page is encountered.
> .SH VERSIONS
> Since Linux 3.18,
> .\" commit d3ac21cacc24790eb45d735769f35753f5b56ceb
From the end user's point of view, I find the document simple and easy to understand.
I have not gone deep into the implementation yet, just skimmed it a bit.
Acked-by: Pankaj Gupta <pankaj.gupta@ionos.com>
* Re: [PATCH v1] madvise.2: Document MADV_POPULATE_READ and MADV_POPULATE_WRITE
2021-07-12 9:58 ` Pankaj Gupta
(?)
(?)
@ 2021-07-25 20:15 ` Alejandro Colomar (man-pages)
2021-07-26 7:11 ` David Hildenbrand
-1 siblings, 1 reply; 10+ messages in thread
From: Alejandro Colomar (man-pages) @ 2021-07-25 20:15 UTC (permalink / raw)
To: Pankaj Gupta, David Hildenbrand
Cc: linux-man, Michael Kerrisk, Andrew Morton, Michal Hocko,
Oscar Salvador, Jann Horn, Mike Rapoport, Linux API, Linux MM
Hi David, Pankaj,
On 7/12/21 11:58 AM, Pankaj Gupta wrote:
>> MADV_POPULATE_READ and MADV_POPULATE_WRITE have been merged into
>> upstream Linux via commit 4ca9b3859dac ("mm/madvise: introduce
>> MADV_POPULATE_(READ|WRITE) to prefault page tables"), part of v5.14-rc1.
>>
>> Let's document the behavior and error conditions of these new madvise()
>> options.
Please see a couple of comments below.
>>
>> Cc: Alejandro Colomar <alx.manpages@gmail.com>
>> Cc: Michael Kerrisk <mtk.manpages@gmail.com>
>> Cc: Andrew Morton <akpm@linux-foundation.org>
>> Cc: Michal Hocko <mhocko@suse.com>
>> Cc: Oscar Salvador <osalvador@suse.de>
>> Cc: Jann Horn <jannh@google.com>
>> Cc: Mike Rapoport <rppt@kernel.org>
>> Cc: Linux API <linux-api@vger.kernel.org>
>> Cc: linux-mm@kvack.org
>> Signed-off-by: David Hildenbrand <david@redhat.com>
> Acked-by: Pankaj Gupta <pankaj.gupta@ionos.com>
Thanks for the Acked-by!
Cheers,
Alex
>> ---
>> man2/madvise.2 | 80 ++++++++++++++++++++++++++++++++++++++++++++++++++
>> 1 file changed, 80 insertions(+)
>>
>> diff --git a/man2/madvise.2 b/man2/madvise.2
>> index f1f384c0c..3ec8c53a7 100644
>> --- a/man2/madvise.2
>> +++ b/man2/madvise.2
>> @@ -469,6 +469,59 @@ If a page is file-backed and dirty, it will be written back to the backing
>> storage.
>> The advice might be ignored for some pages in the range when it is not
>> applicable.
>> +.TP
>> +.BR MADV_POPULATE_READ " (since Linux 5.14)
s/$/"/
>> +Populate (prefault) page tables readable for the whole range without actually
See the following extract from man-pages(7):
$ man 7 man-pages | sed -n '/Use semantic newlines/,/^$/p';
Use semantic newlines
In the source of a manual page, new sentences should be
started on new lines, and long sentences should split into
lines at clause breaks (commas, semicolons, colons, and so
on). This convention, sometimes known as "semantic new‐
lines", makes it easier to see the effect of patches, which
often operate at the level of individual sentences or sen‐
tence clauses.
>> +reading. Depending on the underlying mapping, map the shared zeropage,
>> +preallocate memory or read the underlying file; files with holes might or
>> +might not preallocate blocks.
>> +Do not generate
>> +.B SIGBUS
>> +when populating fails, return an error instead.
>> +.IP
>> +If
>> +.B MADV_POPULATE_READ
>> +succeeds, all page tables have been populated (prefaulted) readable once.
>> +If
>> +.B MADV_POPULATE_READ
>> +fails, some page tables might have been populated.
>> +.IP
>> +.B MADV_POPULATE_READ
>> +cannot be applied to mappings without read permissions
>> +and special mappings marked with the kernel-internal
>> +.B VM_PFNMAP
>> +and
>> +.BR VM_IO .
>> +.IP
>> +Note that with
>> +.BR MADV_POPULATE_READ ,
>> +the process can be killed at any moment when the system runs out of memory.
>> +.TP
>> +.BR MADV_POPULATE_WRITE " (since Linux 5.14)
s/$/"/
>> +Populate (prefault) page tables writable for the whole range without actually
>> +writing. Depending on the underlying mapping, preallocate memory or read the
>
> Is this read or write?
> just reading and trying to understand :)
>
>> +underlying file; files with holes will preallocate blocks.
>> +Do not generate
>> +.B SIGBUS
>> +when populating fails, return an error instead.
>> +.IP
>> +If
>> +.B MADV_POPULATE_WRITE
>> +succeeds, all page tables have been populated (prefaulted) writable once.
>> +If
>> +.B MADV_POPULATE_WRITE
>> +fails, some page tables might have been populated.
>> +.IP
>> +.B MADV_POPULATE_WRITE
>> +cannot be applied to mappings without write permissions
>> +and special mappings marked with the kernel-internal
>> +.B VM_PFNMAP
>> +and
>> +.BR VM_IO .
>> +.IP
>> +Note that with
>> +.BR MADV_POPULATE_WRITE ,
>> +the process can be killed at any moment when the system runs out of memory.
>> .SH RETURN VALUE
>> On success,
>> .BR madvise ()
>> @@ -533,6 +586,17 @@ or
>> .BR VM_PFNMAP
>> ranges.
>> .TP
>> +.B EINVAL
>> +.I advice
>> +is
>> +.B MADV_POPULATE_READ
>> +or
>> +.BR MADV_POPULATE_WRITE ,
>> +but the specified address range includes ranges with insufficient permissions,
>> +.B VM_IO
>> +or
>> +.BR VM_PFNMAP .
>> +.TP
>> .B EIO
>> (for
>> .BR MADV_WILLNEED )
>> @@ -548,6 +612,14 @@ Not enough memory: paging in failed.
>> Addresses in the specified range are not currently
>> mapped, or are outside the address space of the process.
>> .TP
>> +.B ENOMEM
>> +.I advice
>> +is
>> +.B MADV_POPULATE_READ
>> +or
>> +.BR MADV_POPULATE_WRITE ,
>> +but populating (prefaulting) page tables failed.
>> +.TP
>> .B EPERM
>> .I advice
>> is
>> @@ -555,6 +627,14 @@ is
>> but the caller does not have the
>> .B CAP_SYS_ADMIN
>> capability.
>> +.TP
>> +.B EHWPOISON
>> +.I advice
>> +is
>> +.B MADV_POPULATE_READ
>> +or
>> +.BR MADV_POPULATE_WRITE ,
>> +and a HW poisoned page is encountered.
>> .SH VERSIONS
>> Since Linux 3.18,
>> .\" commit d3ac21cacc24790eb45d735769f35753f5b56ceb
>> --
>> 2.31.1
>>
>>
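For readers following along, here is a minimal sketch in C of how a
caller might use the new advice values and map the documented errors
to messages. It is not part of the patch. The helper names
prefault_writable() and describe_populate_error() are hypothetical,
and the fallback #define values (22 and 23) are assumed from the Linux
UAPI headers for libc versions that do not expose the constants yet.
On kernels older than 5.14 the call is expected to fail with EINVAL.

```c
#define _DEFAULT_SOURCE
#include <errno.h>
#include <stddef.h>
#include <sys/mman.h>
#include <unistd.h>

/* Assumption: older libc headers may lack these; values as in the Linux UAPI headers. */
#ifndef MADV_POPULATE_READ
#define MADV_POPULATE_READ 22
#endif
#ifndef MADV_POPULATE_WRITE
#define MADV_POPULATE_WRITE 23
#endif

/*
 * Hypothetical helper: prefault page tables writable for [addr, addr + len).
 * Returns 0 on success or a negative errno value on failure.
 * On failure, some page tables might already have been populated.
 */
int prefault_writable(void *addr, size_t len)
{
    if (madvise(addr, len, MADV_POPULATE_WRITE) == 0)
        return 0;
    return -errno;
}

/* Hypothetical helper: map a (positive) errno to the conditions described in the patch. */
const char *describe_populate_error(int err)
{
    switch (err) {
    case 0:
        return "page tables populated";
    case EINVAL:
        return "insufficient permissions, VM_IO or VM_PFNMAP range, "
               "or advice not supported by this kernel";
    case ENOMEM:
        return "range not fully mapped, or populating page tables failed";
#ifdef EHWPOISON
    case EHWPOISON:
        return "a HW poisoned page was encountered";
#endif
    default:
        return "unexpected error";
    }
}
```

A caller would typically mmap() a region with PROT_READ | PROT_WRITE,
call prefault_writable() on it, and report
describe_populate_error(-rc) when rc is nonzero. Because failures are
returned as errors rather than delivered as SIGBUS, no signal handler
is needed for this path.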
--
Alejandro Colomar
Linux man-pages comaintainer; https://www.kernel.org/doc/man-pages/
http://www.alejandro-colomar.es/
* Re: [PATCH v1] madvise.2: Document MADV_POPULATE_READ and MADV_POPULATE_WRITE
2021-07-25 20:15 ` Alejandro Colomar (man-pages)
@ 2021-07-26 7:11 ` David Hildenbrand
0 siblings, 0 replies; 10+ messages in thread
From: David Hildenbrand @ 2021-07-26 7:11 UTC (permalink / raw)
To: Alejandro Colomar (man-pages), Pankaj Gupta
Cc: linux-man, Michael Kerrisk, Andrew Morton, Michal Hocko,
Oscar Salvador, Jann Horn, Mike Rapoport, Linux API, Linux MM
Hi Alex,
>>> ---
>>> man2/madvise.2 | 80 ++++++++++++++++++++++++++++++++++++++++++++++++++
>>> 1 file changed, 80 insertions(+)
>>>
>>> diff --git a/man2/madvise.2 b/man2/madvise.2
>>> index f1f384c0c..3ec8c53a7 100644
>>> --- a/man2/madvise.2
>>> +++ b/man2/madvise.2
>>> @@ -469,6 +469,59 @@ If a page is file-backed and dirty, it will be written back to the backing
>>> storage.
>>> The advice might be ignored for some pages in the range when it is not
>>> applicable.
>>> +.TP
>>> +.BR MADV_POPULATE_READ " (since Linux 5.14)
>
> s/$/"/
Thanks!
>
>>> +Populate (prefault) page tables readable for the whole range without actually
>
> See the following extract from man-pages(7):
>
> $ man 7 man-pages | sed -n '/Use semantic newlines/,/^$/p';
> Use semantic newlines
> In the source of a manual page, new sentences should be
> started on new lines, and long sentences should split into
> lines at clause breaks (commas, semicolons, colons, and so
> on). This convention, sometimes known as "semantic new‐
> lines", makes it easier to see the effect of patches, which
> often operate at the level of individual sentences or sen‐
> tence clauses.
Thanks. Would something like the following (also limiting lines to
80 characters) work?
"
Populate (prefault) page tables readable for the whole range without actually
reading.
Depending on the underlying mapping,
map the shared zeropage,
preallocate memory or read the underlying file;
files with holes might or might not preallocate blocks.
"
>
>>> +reading. Depending on the underlying mapping, map the shared zeropage,
>>> +preallocate memory or read the underlying file; files with holes might or
>>> +might not preallocate blocks.
>>> +Do not generate
>>> +.B SIGBUS
>>> +when populating fails, return an error instead.
>>> +.IP
>>> +If
>>> +.B MADV_POPULATE_READ
>>> +succeeds, all page tables have been populated (prefaulted) readable once.
>>> +If
>>> +.B MADV_POPULATE_READ
>>> +fails, some page tables might have been populated.
>>> +.IP
>>> +.B MADV_POPULATE_READ
>>> +cannot be applied to mappings without read permissions
>>> +and special mappings marked with the kernel-internal
>>> +.B VM_PFNMAP
>>> +and
>>> +.BR VM_IO .
>>> +.IP
>>> +Note that with
>>> +.BR MADV_POPULATE_READ ,
>>> +the process can be killed at any moment when the system runs out of memory.
>>> +.TP
>>> +.BR MADV_POPULATE_WRITE " (since Linux 5.14)
>
> s/$/"/
Thanks!
--
Thanks,
David / dhildenb
Thread overview: 10+ messages
2021-07-12 8:39 [PATCH v1] madvise.2: Document MADV_POPULATE_READ and MADV_POPULATE_WRITE David Hildenbrand
2021-07-12 9:58 ` Pankaj Gupta
2021-07-12 10:03 ` David Hildenbrand
2021-07-12 10:17 ` Pankaj Gupta
2021-07-25 20:15 ` Alejandro Colomar (man-pages)
2021-07-26 7:11 ` David Hildenbrand
2021-07-12 11:05 ` Pankaj Gupta