linux-kernel.vger.kernel.org archive mirror
* [PATCH 1/2] [RFC] proc: pagemap: Expose whether a PTE is writable
@ 2024-03-06 23:23 Richard Weinberger
  2024-03-06 23:23 ` [PATCH 2/2] [RFC] pagemap.rst: Document write bit Richard Weinberger
                   ` (3 more replies)
  0 siblings, 4 replies; 15+ messages in thread
From: Richard Weinberger @ 2024-03-06 23:23 UTC (permalink / raw)
  To: linux-mm
  Cc: linux-fsdevel, linux-kernel, linux-doc, upstream+pagemap,
	adobriyan, wangkefeng.wang, ryan.roberts, hughd, peterx, david,
	avagin, lstoakes, vbabka, akpm, usama.anjum, corbet,
	Richard Weinberger

If a PTE is present and writable, bit 58 will be set.
This allows detecting CoW memory mappings and other mappings
where a write access will cause a page fault.

Signed-off-by: Richard Weinberger <richard@nod.at>
---
 fs/proc/task_mmu.c | 8 ++++++++
 1 file changed, 8 insertions(+)

diff --git a/fs/proc/task_mmu.c b/fs/proc/task_mmu.c
index 3f78ebbb795f..7c7e0e954c02 100644
--- a/fs/proc/task_mmu.c
+++ b/fs/proc/task_mmu.c
@@ -1341,6 +1341,7 @@ struct pagemapread {
 #define PM_SOFT_DIRTY		BIT_ULL(55)
 #define PM_MMAP_EXCLUSIVE	BIT_ULL(56)
 #define PM_UFFD_WP		BIT_ULL(57)
+#define PM_WRITE		BIT_ULL(58)
 #define PM_FILE			BIT_ULL(61)
 #define PM_SWAP			BIT_ULL(62)
 #define PM_PRESENT		BIT_ULL(63)
@@ -1417,6 +1418,8 @@ static pagemap_entry_t pte_to_pagemap_entry(struct pagemapread *pm,
 			flags |= PM_SOFT_DIRTY;
 		if (pte_uffd_wp(pte))
 			flags |= PM_UFFD_WP;
+		if (pte_write(pte))
+			flags |= PM_WRITE;
 	} else if (is_swap_pte(pte)) {
 		swp_entry_t entry;
 		if (pte_swp_soft_dirty(pte))
@@ -1483,6 +1486,8 @@ static int pagemap_pmd_range(pmd_t *pmdp, unsigned long addr, unsigned long end,
 				flags |= PM_SOFT_DIRTY;
 			if (pmd_uffd_wp(pmd))
 				flags |= PM_UFFD_WP;
+			if (pmd_write(pmd))
+				flags |= PM_WRITE;
 			if (pm->show_pfn)
 				frame = pmd_pfn(pmd) +
 					((addr & ~PMD_MASK) >> PAGE_SHIFT);
@@ -1586,6 +1591,9 @@ static int pagemap_hugetlb_range(pte_t *ptep, unsigned long hmask,
 		if (huge_pte_uffd_wp(pte))
 			flags |= PM_UFFD_WP;
 
+		if (pte_write(pte))
+			flags |= PM_WRITE;
+
 		flags |= PM_PRESENT;
 		if (pm->show_pfn)
 			frame = pte_pfn(pte) +
-- 
2.35.3



* [PATCH 2/2] [RFC] pagemap.rst: Document write bit
  2024-03-06 23:23 [PATCH 1/2] [RFC] proc: pagemap: Expose whether a PTE is writable Richard Weinberger
@ 2024-03-06 23:23 ` Richard Weinberger
  2024-03-07 10:52   ` David Hildenbrand
  2024-03-10 22:14   ` Lorenzo Stoakes
  2024-03-07 10:44 ` [PATCH 1/2] [RFC] proc: pagemap: Expose whether a PTE is writable Muhammad Usama Anjum
                   ` (2 subsequent siblings)
  3 siblings, 2 replies; 15+ messages in thread
From: Richard Weinberger @ 2024-03-06 23:23 UTC (permalink / raw)
  To: linux-mm
  Cc: linux-fsdevel, linux-kernel, linux-doc, upstream+pagemap,
	adobriyan, wangkefeng.wang, ryan.roberts, hughd, peterx, david,
	avagin, lstoakes, vbabka, akpm, usama.anjum, corbet,
	Richard Weinberger

Bit 58 denotes that a PTE is writable.
The main use case is detecting CoW mappings.

Signed-off-by: Richard Weinberger <richard@nod.at>
---
 Documentation/admin-guide/mm/pagemap.rst | 8 +++++++-
 1 file changed, 7 insertions(+), 1 deletion(-)

diff --git a/Documentation/admin-guide/mm/pagemap.rst b/Documentation/admin-guide/mm/pagemap.rst
index f5f065c67615..81ffe3601b96 100644
--- a/Documentation/admin-guide/mm/pagemap.rst
+++ b/Documentation/admin-guide/mm/pagemap.rst
@@ -21,7 +21,8 @@ There are four components to pagemap:
     * Bit  56    page exclusively mapped (since 4.2)
     * Bit  57    pte is uffd-wp write-protected (since 5.13) (see
       Documentation/admin-guide/mm/userfaultfd.rst)
-    * Bits 58-60 zero
+    * Bit  58    pte is writable (since 6.10)
+    * Bits 59-60 zero
     * Bit  61    page is file-page or shared-anon (since 3.5)
     * Bit  62    page swapped
     * Bit  63    page present
@@ -37,6 +38,11 @@ There are four components to pagemap:
    precisely which pages are mapped (or in swap) and comparing mapped
    pages between processes.
 
+   Bit 58 is useful to detect CoW mappings; however, it does not indicate
+   whether the page mapping is writable or not. If an anonymous mapping is
+   writable but the write bit is not set, it means that the next write access
+   will cause a page fault, and copy-on-write will happen.
+
    Efficient users of this interface will use ``/proc/pid/maps`` to
    determine which areas of memory are actually mapped and llseek to
    skip over unmapped regions.
-- 
2.35.3



* Re: [PATCH 1/2] [RFC] proc: pagemap: Expose whether a PTE is writable
  2024-03-06 23:23 [PATCH 1/2] [RFC] proc: pagemap: Expose whether a PTE is writable Richard Weinberger
  2024-03-06 23:23 ` [PATCH 2/2] [RFC] pagemap.rst: Document write bit Richard Weinberger
@ 2024-03-07 10:44 ` Muhammad Usama Anjum
  2024-03-07 10:52 ` David Hildenbrand
  2024-03-10 21:55 ` Lorenzo Stoakes
  3 siblings, 0 replies; 15+ messages in thread
From: Muhammad Usama Anjum @ 2024-03-07 10:44 UTC (permalink / raw)
  To: Richard Weinberger, linux-mm
  Cc: Muhammad Usama Anjum, linux-fsdevel, linux-kernel, linux-doc,
	upstream+pagemap, adobriyan, wangkefeng.wang, ryan.roberts,
	hughd, peterx, david, avagin, lstoakes, vbabka, akpm, corbet

On 3/7/24 4:23 AM, Richard Weinberger wrote:
> If a PTE is present and writable, bit 58 will be set.
> This allows detecting CoW memory mappings and other mappings
> where a write access will cause a page fault.
> 
> Signed-off-by: Richard Weinberger <richard@nod.at>
> ---
>  fs/proc/task_mmu.c | 8 ++++++++
>  1 file changed, 8 insertions(+)
> 
> diff --git a/fs/proc/task_mmu.c b/fs/proc/task_mmu.c
> index 3f78ebbb795f..7c7e0e954c02 100644
> --- a/fs/proc/task_mmu.c
> +++ b/fs/proc/task_mmu.c
> @@ -1341,6 +1341,7 @@ struct pagemapread {
>  #define PM_SOFT_DIRTY		BIT_ULL(55)
>  #define PM_MMAP_EXCLUSIVE	BIT_ULL(56)
>  #define PM_UFFD_WP		BIT_ULL(57)
> +#define PM_WRITE		BIT_ULL(58)
The name doesn't mention present from its "present and writable"
definition. Maybe some other name like PM_PRESENT_WRITE?

>  #define PM_FILE			BIT_ULL(61)
>  #define PM_SWAP			BIT_ULL(62)
>  #define PM_PRESENT		BIT_ULL(63)
> @@ -1417,6 +1418,8 @@ static pagemap_entry_t pte_to_pagemap_entry(struct pagemapread *pm,
>  			flags |= PM_SOFT_DIRTY;
>  		if (pte_uffd_wp(pte))
>  			flags |= PM_UFFD_WP;
> +		if (pte_write(pte))
> +			flags |= PM_WRITE;
>  	} else if (is_swap_pte(pte)) {
>  		swp_entry_t entry;
>  		if (pte_swp_soft_dirty(pte))
> @@ -1483,6 +1486,8 @@ static int pagemap_pmd_range(pmd_t *pmdp, unsigned long addr, unsigned long end,
>  				flags |= PM_SOFT_DIRTY;
>  			if (pmd_uffd_wp(pmd))
>  				flags |= PM_UFFD_WP;
> +			if (pmd_write(pmd))
> +				flags |= PM_WRITE;
>  			if (pm->show_pfn)
>  				frame = pmd_pfn(pmd) +
>  					((addr & ~PMD_MASK) >> PAGE_SHIFT);
> @@ -1586,6 +1591,9 @@ static int pagemap_hugetlb_range(pte_t *ptep, unsigned long hmask,
>  		if (huge_pte_uffd_wp(pte))
>  			flags |= PM_UFFD_WP;
>  
> +		if (pte_write(pte))
> +			flags |= PM_WRITE;
> +
>  		flags |= PM_PRESENT;
>  		if (pm->show_pfn)
>  			frame = pte_pfn(pte) +

-- 
BR,
Muhammad Usama Anjum


* Re: [PATCH 1/2] [RFC] proc: pagemap: Expose whether a PTE is writable
  2024-03-06 23:23 [PATCH 1/2] [RFC] proc: pagemap: Expose whether a PTE is writable Richard Weinberger
  2024-03-06 23:23 ` [PATCH 2/2] [RFC] pagemap.rst: Document write bit Richard Weinberger
  2024-03-07 10:44 ` [PATCH 1/2] [RFC] proc: pagemap: Expose whether a PTE is writable Muhammad Usama Anjum
@ 2024-03-07 10:52 ` David Hildenbrand
  2024-03-07 11:10   ` Richard Weinberger
  2024-03-10 21:55 ` Lorenzo Stoakes
  3 siblings, 1 reply; 15+ messages in thread
From: David Hildenbrand @ 2024-03-07 10:52 UTC (permalink / raw)
  To: Richard Weinberger, linux-mm
  Cc: linux-fsdevel, linux-kernel, linux-doc, upstream+pagemap,
	adobriyan, wangkefeng.wang, ryan.roberts, hughd, peterx, avagin,
	lstoakes, vbabka, akpm, usama.anjum, corbet

On 07.03.24 00:23, Richard Weinberger wrote:
> If a PTE is present and writable, bit 58 will be set.
> This allows detecting CoW memory mappings and other mappings
> where a write access will cause a page fault.
> 

But why is that required? What is the target use case? (I did not get 
the cover letter in my inbox)

We're running slowly but steadily out of bits, so we better make wise 
decisions.

Also, consider: Architectures where the dirty/access bit is not HW 
managed could indicate "writable" here although we *will* get a page 
fault to set the page dirty/accessed.

So the best this can universally do is say "this PTE currently has write 
permissions".

> Signed-off-by: Richard Weinberger <richard@nod.at>
> ---
>   fs/proc/task_mmu.c | 8 ++++++++
>   1 file changed, 8 insertions(+)
> 
> diff --git a/fs/proc/task_mmu.c b/fs/proc/task_mmu.c
> index 3f78ebbb795f..7c7e0e954c02 100644
> --- a/fs/proc/task_mmu.c
> +++ b/fs/proc/task_mmu.c
> @@ -1341,6 +1341,7 @@ struct pagemapread {
>   #define PM_SOFT_DIRTY		BIT_ULL(55)
>   #define PM_MMAP_EXCLUSIVE	BIT_ULL(56)
>   #define PM_UFFD_WP		BIT_ULL(57)
> +#define PM_WRITE		BIT_ULL(58)
>   #define PM_FILE			BIT_ULL(61)
>   #define PM_SWAP			BIT_ULL(62)
>   #define PM_PRESENT		BIT_ULL(63)
> @@ -1417,6 +1418,8 @@ static pagemap_entry_t pte_to_pagemap_entry(struct pagemapread *pm,
>   			flags |= PM_SOFT_DIRTY;
>   		if (pte_uffd_wp(pte))
>   			flags |= PM_UFFD_WP;
> +		if (pte_write(pte))
> +			flags |= PM_WRITE;
>   	} else if (is_swap_pte(pte)) {
>   		swp_entry_t entry;
>   		if (pte_swp_soft_dirty(pte))
> @@ -1483,6 +1486,8 @@ static int pagemap_pmd_range(pmd_t *pmdp, unsigned long addr, unsigned long end,
>   				flags |= PM_SOFT_DIRTY;
>   			if (pmd_uffd_wp(pmd))
>   				flags |= PM_UFFD_WP;
> +			if (pmd_write(pmd))
> +				flags |= PM_WRITE;
>   			if (pm->show_pfn)
>   				frame = pmd_pfn(pmd) +
>   					((addr & ~PMD_MASK) >> PAGE_SHIFT);
> @@ -1586,6 +1591,9 @@ static int pagemap_hugetlb_range(pte_t *ptep, unsigned long hmask,
>   		if (huge_pte_uffd_wp(pte))
>   			flags |= PM_UFFD_WP;
>   
> +		if (pte_write(pte))
> +			flags |= PM_WRITE;
> +
>   		flags |= PM_PRESENT;
>   		if (pm->show_pfn)
>   			frame = pte_pfn(pte) +

-- 
Cheers,

David / dhildenb



* Re: [PATCH 2/2] [RFC] pagemap.rst: Document write bit
  2024-03-06 23:23 ` [PATCH 2/2] [RFC] pagemap.rst: Document write bit Richard Weinberger
@ 2024-03-07 10:52   ` David Hildenbrand
  2024-03-07 11:10     ` Richard Weinberger
  2024-03-10 22:14   ` Lorenzo Stoakes
  1 sibling, 1 reply; 15+ messages in thread
From: David Hildenbrand @ 2024-03-07 10:52 UTC (permalink / raw)
  To: Richard Weinberger, linux-mm
  Cc: linux-fsdevel, linux-kernel, linux-doc, upstream+pagemap,
	adobriyan, wangkefeng.wang, ryan.roberts, hughd, peterx, avagin,
	lstoakes, vbabka, akpm, usama.anjum, corbet

On 07.03.24 00:23, Richard Weinberger wrote:
> Bit 58 denotes that a PTE is writable.
> The main use case is detecting CoW mappings.
> 
> Signed-off-by: Richard Weinberger <richard@nod.at>
> ---
>   Documentation/admin-guide/mm/pagemap.rst | 8 +++++++-
>   1 file changed, 7 insertions(+), 1 deletion(-)
> 
> diff --git a/Documentation/admin-guide/mm/pagemap.rst b/Documentation/admin-guide/mm/pagemap.rst
> index f5f065c67615..81ffe3601b96 100644
> --- a/Documentation/admin-guide/mm/pagemap.rst
> +++ b/Documentation/admin-guide/mm/pagemap.rst
> @@ -21,7 +21,8 @@ There are four components to pagemap:
>       * Bit  56    page exclusively mapped (since 4.2)
>       * Bit  57    pte is uffd-wp write-protected (since 5.13) (see
>         Documentation/admin-guide/mm/userfaultfd.rst)
> -    * Bits 58-60 zero
> +    * Bit  58    pte is writable (since 6.10)
> +    * Bits 59-60 zero
>       * Bit  61    page is file-page or shared-anon (since 3.5)
>       * Bit  62    page swapped
>       * Bit  63    page present
> @@ -37,6 +38,11 @@ There are four components to pagemap:
>      precisely which pages are mapped (or in swap) and comparing mapped
>      pages between processes.
>   
> +   Bit 58 is useful to detect CoW mappings; however, it does not indicate
> +   whether the page mapping is writable or not. If an anonymous mapping is
> +   writable but the write bit is not set, it means that the next write access
> +   will cause a page fault, and copy-on-write will happen.

That is not true.

-- 
Cheers,

David / dhildenb



* Re: [PATCH 1/2] [RFC] proc: pagemap: Expose whether a PTE is writable
  2024-03-07 10:52 ` David Hildenbrand
@ 2024-03-07 11:10   ` Richard Weinberger
  2024-03-07 11:20     ` David Hildenbrand
  0 siblings, 1 reply; 15+ messages in thread
From: Richard Weinberger @ 2024-03-07 11:10 UTC (permalink / raw)
  To: David Hildenbrand
  Cc: linux-mm, linux-fsdevel, linux-kernel, Linux Doc Mailing List,
	upstream+pagemap, adobriyan, wangkefeng wang, ryan roberts,
	hughd, peterx, avagin, lstoakes, vbabka, Andrew Morton,
	usama anjum, Jonathan Corbet

----- Original Message -----
> From: "David Hildenbrand" <david@redhat.com>
> But why is that required? What is the target use case? (I did not get
> the cover letter in my inbox)
> 
> We're running slowly but steadily out of bits, so we better make wise
> decisions.
> 
> Also, consider: Architectures where the dirty/access bit is not HW
> managed could indicate "writable" here although we *will* get a page
> fault to set the page dirty/accessed.

I'm currently investigating why a real-time application faces unexpected
page faults. Page faults are usually fatal for real-time workloads because
the latency constraints are no longer met.

So, I wrote a small tool to inspect the memory mappings of a process to find
areas which are not correctly pre-faulted. While doing so I noticed that
there is currently no way to detect CoW mappings.
Exposing the writable property of a PTE seemed like a good start to me.
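
As a rough illustration of how such an inspection tool could consume the proposed bit, here is a minimal sketch. It assumes the bit layout from the patch above; `PM_WRITE` (bit 58) exists only with this RFC applied and carries no such meaning on other kernels. The helper names `pagemap_read_entry` and `pagemap_write_may_fault` are hypothetical, not part of any kernel or library API.

```c
#define _GNU_SOURCE
#define _FILE_OFFSET_BITS 64
#include <assert.h>
#include <stdint.h>
#include <sys/types.h>
#include <unistd.h>

/* Flag bits of a 64-bit pagemap entry, as listed in the patch above. */
#define PM_SOFT_DIRTY     (1ULL << 55)
#define PM_MMAP_EXCLUSIVE (1ULL << 56)
#define PM_UFFD_WP        (1ULL << 57)
#define PM_WRITE          (1ULL << 58) /* proposed in this RFC only */
#define PM_FILE           (1ULL << 61)
#define PM_SWAP           (1ULL << 62)
#define PM_PRESENT        (1ULL << 63)

/*
 * Read the pagemap entry that covers vaddr. fd must be an open
 * /proc/<pid>/pagemap; the file holds one 64-bit entry per virtual page.
 * Returns 0 on success, -1 on error.
 */
int pagemap_read_entry(int fd, uintptr_t vaddr, uint64_t *entry)
{
	long page_size = sysconf(_SC_PAGESIZE);
	off_t offset;

	assert(page_size > 0);
	offset = (off_t)(vaddr / (uintptr_t)page_size) * (off_t)sizeof(*entry);
	return pread(fd, entry, sizeof(*entry), offset) ==
	       (ssize_t)sizeof(*entry) ? 0 : -1;
}

/*
 * Hypothetical helper: returns 1 if a write through this entry is expected
 * to fault (page not present, or PTE not writable), 0 otherwise. A result
 * of 0 is only a hint: the PTE currently has write permission, nothing more.
 */
int pagemap_write_may_fault(uint64_t entry)
{
	if (!(entry & PM_PRESENT))
		return 1;
	return !(entry & PM_WRITE);
}
```

A caller would open `/proc/<pid>/pagemap` read-only, walk the ranges listed in `/proc/<pid>/maps`, and report every page for which `pagemap_write_may_fault()` returns 1.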

> So the best this can universally do is say "this PTE currently has write
> permissions".

Ok.

Thanks,
//richard


* Re: [PATCH 2/2] [RFC] pagemap.rst: Document write bit
  2024-03-07 10:52   ` David Hildenbrand
@ 2024-03-07 11:10     ` Richard Weinberger
  2024-03-07 11:15       ` David Hildenbrand
  0 siblings, 1 reply; 15+ messages in thread
From: Richard Weinberger @ 2024-03-07 11:10 UTC (permalink / raw)
  To: David Hildenbrand
  Cc: linux-mm, linux-fsdevel, linux-kernel, Linux Doc Mailing List,
	upstream+pagemap, adobriyan, wangkefeng wang, ryan roberts,
	hughd, peterx, avagin, lstoakes, vbabka, Andrew Morton,
	usama anjum, Jonathan Corbet

----- Original Message -----
> From: "David Hildenbrand" <david@redhat.com>
> To: "richard" <richard@nod.at>, "linux-mm" <linux-mm@kvack.org>
>> +   Bit 58 is useful to detect CoW mappings; however, it does not indicate
>> +   whether the page mapping is writable or not. If an anonymous mapping is
>> +   writable but the write bit is not set, it means that the next write access
>> +   will cause a page fault, and copy-on-write will happen.
> 
> That is not true.

Can you please help me correct my obvious misunderstanding?




* Re: [PATCH 2/2] [RFC] pagemap.rst: Document write bit
  2024-03-07 11:10     ` Richard Weinberger
@ 2024-03-07 11:15       ` David Hildenbrand
  0 siblings, 0 replies; 15+ messages in thread
From: David Hildenbrand @ 2024-03-07 11:15 UTC (permalink / raw)
  To: Richard Weinberger
  Cc: linux-mm, linux-fsdevel, linux-kernel, Linux Doc Mailing List,
	upstream+pagemap, adobriyan, wangkefeng wang, ryan roberts,
	hughd, peterx, avagin, lstoakes, vbabka, Andrew Morton,
	usama anjum, Jonathan Corbet

On 07.03.24 12:10, Richard Weinberger wrote:
> ----- Original Message -----
>> From: "David Hildenbrand" <david@redhat.com>
>> To: "richard" <richard@nod.at>, "linux-mm" <linux-mm@kvack.org>
>>> +   Bit 58 is useful to detect CoW mappings; however, it does not indicate
>>> +   whether the page mapping is writable or not. If an anonymous mapping is
>>> +   writable but the write bit is not set, it means that the next write access
>>> +   will cause a page fault, and copy-on-write will happen.
>>
>> That is not true.
> 
> Can you please help me correct my obvious misunderstanding?

We'll perform a page copy of an anonymous page only if the page is not 
detected as exclusive to the process.

So a better description could be:

"In a private mapping, having the writable bit clear can indicate that the 
next write access will result in copy-on-write during a page fault. Note 
that exclusive anonymous pages can be mapped read-only, and they might 
simply get remapped writable during the next write fault, avoiding a 
page copy."

-- 
Cheers,

David / dhildenb



* Re: [PATCH 1/2] [RFC] proc: pagemap: Expose whether a PTE is writable
  2024-03-07 11:10   ` Richard Weinberger
@ 2024-03-07 11:20     ` David Hildenbrand
  2024-03-07 11:51       ` Richard Weinberger
  0 siblings, 1 reply; 15+ messages in thread
From: David Hildenbrand @ 2024-03-07 11:20 UTC (permalink / raw)
  To: Richard Weinberger
  Cc: linux-mm, linux-fsdevel, linux-kernel, Linux Doc Mailing List,
	upstream+pagemap, adobriyan, wangkefeng wang, ryan roberts,
	hughd, peterx, avagin, lstoakes, vbabka, Andrew Morton,
	usama anjum, Jonathan Corbet

On 07.03.24 12:10, Richard Weinberger wrote:
> ----- Original Message -----
>> From: "David Hildenbrand" <david@redhat.com>
>> But why is that required? What is the target use case? (I did not get
>> the cover letter in my inbox)
>>
>> We're running slowly but steadily out of bits, so we better make wise
>> decisions.
>>
>> Also, consider: Architectures where the dirty/access bit is not HW
>> managed could indicate "writable" here although we *will* get a page
>> fault to set the page dirty/accessed.
> 
> I'm currently investigating why a real-time application faces unexpected
> page faults. Page faults are usually fatal for real-time workloads because
> the latency constraints are no longer met.

Are you concerned about any type of page fault, or are things like a 
simple remapping of the same page from "read-only to writable" 
acceptable? ("very minor fault")

> 
> So, I wrote a small tool to inspect the memory mappings of a process to find
> areas which are not correctly pre-faulted. While doing so I noticed that
> there is currently no way to detect CoW mappings.
> Exposing the writable property of a PTE seemed like a good start to me.

Is it just about "detection" for debugging purposes or about "fixup" in 
running applications?

If it's the latter, MADV_POPULATE_WRITE might do what you want (in 
writable mappings).
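
For the "fixup" case, a minimal sketch of prefaulting a range with `MADV_POPULATE_WRITE` might look as follows. It assumes a Linux 5.14 or later kernel and a writable, page-aligned mapping; the wrapper name `prefault_writable` is hypothetical, and the fallback definition of the constant mirrors the value in the Linux UAPI headers for toolchains whose libc headers predate it.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <stddef.h>
#include <sys/mman.h>

#ifndef MADV_POPULATE_WRITE
#define MADV_POPULATE_WRITE 23 /* value from the Linux UAPI headers */
#endif

/*
 * Ask the kernel to populate [addr, addr + len) as if every page had been
 * written, so that later writes do not take a page fault for CoW or for
 * a missing page. Returns 0 on success, -1 with errno set on failure
 * (for example EINVAL on kernels that do not know this advice).
 */
int prefault_writable(void *addr, size_t len)
{
	assert(addr != NULL && len > 0);
	return madvise(addr, len, MADV_POPULATE_WRITE);
}
```

This only helps in mappings that are writable at the VMA level; it does not address the corner cases discussed elsewhere in this thread, such as architectures that fault to set the dirty/accessed bits.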

-- 
Cheers,

David / dhildenb



* Re: [PATCH 1/2] [RFC] proc: pagemap: Expose whether a PTE is writable
  2024-03-07 11:20     ` David Hildenbrand
@ 2024-03-07 11:51       ` Richard Weinberger
  2024-03-07 11:59         ` David Hildenbrand
  0 siblings, 1 reply; 15+ messages in thread
From: Richard Weinberger @ 2024-03-07 11:51 UTC (permalink / raw)
  To: David Hildenbrand
  Cc: linux-mm, linux-fsdevel, linux-kernel, Linux Doc Mailing List,
	upstream+pagemap, adobriyan, wangkefeng wang, ryan roberts,
	hughd, peterx, avagin, lstoakes, vbabka, Andrew Morton,
	usama anjum, Jonathan Corbet

----- Original Message -----
> From: "David Hildenbrand" <david@redhat.com>
>> I'm currently investigating why a real-time application faces unexpected
>> page faults. Page faults are usually fatal for real-time workloads because
>> the latency constraints are no longer met.
> 
> Are you concerned about any type of page fault, or are things like a
> simple remapping of the same page from "read-only to writable"
> acceptable? ("very minor fault")

Any page fault has to be avoided.
To give you more background, the real time application runs on Xenomai,
a real time extension for Linux.
Xenomai already applies many tweaks to the kernel to trigger pre-faulting of
memory areas. But sometimes the application does not use the Xenomai API
correctly or there is a bug in Xenomai itself.
Currently I'm suspecting the latter.
 
>> 
>> So, I wrote a small tool to inspect the memory mappings of a process to find
>> areas which are not correctly pre-faulted. While doing so I noticed that
>> there is currently no way to detect CoW mappings.
>> Exposing the writable property of a PTE seemed like a good start to me.
> 
> Is it just about "detection" for debugging purposes or about "fixup" in
> running applications?

It's only about debugging. If an application fails a test I want to have
a tool which tells me what memory mappings are wonky or could cause a fault
at runtime.

I fully understand that my use case is a corner case and anything but mainline.
While developing my debug tool I thought that improving the pagemap interface
might help others too.

Thanks,
//richard


* Re: [PATCH 1/2] [RFC] proc: pagemap: Expose whether a PTE is writable
  2024-03-07 11:51       ` Richard Weinberger
@ 2024-03-07 11:59         ` David Hildenbrand
  2024-03-07 12:09           ` David Hildenbrand
  0 siblings, 1 reply; 15+ messages in thread
From: David Hildenbrand @ 2024-03-07 11:59 UTC (permalink / raw)
  To: Richard Weinberger
  Cc: linux-mm, linux-fsdevel, linux-kernel, Linux Doc Mailing List,
	upstream+pagemap, adobriyan, wangkefeng wang, ryan roberts,
	hughd, peterx, avagin, lstoakes, vbabka, Andrew Morton,
	usama anjum, Jonathan Corbet

On 07.03.24 12:51, Richard Weinberger wrote:
> ----- Original Message -----
>> From: "David Hildenbrand" <david@redhat.com>
>>> I'm currently investigating why a real-time application faces unexpected
>>> page faults. Page faults are usually fatal for real-time workloads because
>>> the latency constraints are no longer met.
>>
>> Are you concerned about any type of page fault, or are things like a
>> simple remapping of the same page from "read-only to writable"
>> acceptable? ("very minor fault")
> 
> Any page fault has to be avoided.
> To give you more background, the real time application runs on Xenomai,
> a real time extension for Linux.
> Xenomai already applies many tweaks to the kernel to trigger pre-faulting of
> memory areas. But sometimes the application does not use the Xenomai API
> correctly or there is a bug in Xenomai itself.
> Currently I'm suspecting the latter.
>   

Thanks for the details!

>>>
>>> So, I wrote a small tool to inspect the memory mappings of a process to find
>>> areas which are not correctly pre-faulted. While doing so I noticed that
>>> there is currently no way to detect CoW mappings.
>>> Exposing the writable property of a PTE seemed like a good start to me.
>>
>> Is it just about "detection" for debugging purposes or about "fixup" in
>> running applications?
> 
> It's only about debugging. If an application fails a test I want to have
> a tool which tells me what memory mappings are wonky or could cause a fault
> at runtime.

One destructive way to find out in a writable mapping if the page would 
actually get remapped:

a) Read the PFN of a virtual address using pagemap
b) Write to the virtual address using /proc/pid/mem
c) Read the PFN of a virtual address using pagemap to see if it changed

If the application can be paused, you could read+write a single byte, 
turning it non-destructive.

But that would still "hide" the remap-writable-type faults.
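
The three steps above could be sketched roughly as below. This is only an illustration under several assumptions: the caller has `CAP_SYS_ADMIN` (otherwise pagemap reports the PFN as zero), the target is paused so that reading a byte and writing the same byte back is non-destructive, and the address lies in a writable mapping. The function names `pfn_changed` and `write_remaps_page` are hypothetical.

```c
#define _GNU_SOURCE
#define _FILE_OFFSET_BITS 64
#include <assert.h>
#include <fcntl.h>
#include <stdint.h>
#include <stdio.h>
#include <sys/types.h>
#include <unistd.h>

#define PM_PRESENT  (1ULL << 63)
#define PM_PFN_MASK ((1ULL << 55) - 1) /* bits 0-54 hold the PFN if present */

/* Compare two pagemap entries: 1 = PFN changed, 0 = same PFN, -1 = unknown. */
int pfn_changed(uint64_t before, uint64_t after)
{
	if (!(before & PM_PRESENT) || !(after & PM_PRESENT))
		return -1;
	/* Without CAP_SYS_ADMIN the kernel reports the PFN as zero. */
	if (!(before & PM_PFN_MASK) || !(after & PM_PFN_MASK))
		return -1;
	return (before & PM_PFN_MASK) != (after & PM_PFN_MASK);
}

/* Read the 64-bit pagemap entry covering vaddr; 0 on success, -1 on error. */
static int read_entry(int fd, uintptr_t vaddr, uint64_t *entry)
{
	long page_size = sysconf(_SC_PAGESIZE);
	off_t offset;

	assert(page_size > 0);
	offset = (off_t)(vaddr / (uintptr_t)page_size) * (off_t)sizeof(*entry);
	return pread(fd, entry, sizeof(*entry), offset) ==
	       (ssize_t)sizeof(*entry) ? 0 : -1;
}

/*
 * Returns 1 if writing to vaddr in process pid moved the page to a new
 * PFN (a copy happened), 0 if the PFN stayed the same, -1 on any error.
 */
int write_remaps_page(pid_t pid, uintptr_t vaddr)
{
	char path[64];
	uint64_t before, after;
	unsigned char byte;
	int pm_fd, mem_fd, ret = -1;

	snprintf(path, sizeof(path), "/proc/%d/pagemap", (int)pid);
	pm_fd = open(path, O_RDONLY);
	snprintf(path, sizeof(path), "/proc/%d/mem", (int)pid);
	mem_fd = open(path, O_RDWR);
	if (pm_fd < 0 || mem_fd < 0)
		goto out;

	if (read_entry(pm_fd, vaddr, &before))            /* step a */
		goto out;
	if (pread(mem_fd, &byte, 1, (off_t)vaddr) != 1 || /* step b */
	    pwrite(mem_fd, &byte, 1, (off_t)vaddr) != 1)
		goto out;
	if (read_entry(pm_fd, vaddr, &after))             /* step c */
		goto out;
	ret = pfn_changed(before, after);
out:
	if (pm_fd >= 0)
		close(pm_fd);
	if (mem_fd >= 0)
		close(mem_fd);
	return ret;
}
```

A result of 0 means the same page is still mapped; it does not tell whether the write went through a remap-writable fault.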

> 
> I fully understand that my use case is a corner case and anything but mainline.
> While developing my debug tool I thought that improving the pagemap interface
> might help others too.

I'm fine with this (can be a helpful debugging tool for some other cases 
as well, and IIRC we don't have another interface to introspect this), 
as long as we properly document the corner case that there could still 
be write faults on some architectures when the page would not be 
accessed/dirty yet.

-- 
Cheers,

David / dhildenb



* Re: [PATCH 1/2] [RFC] proc: pagemap: Expose whether a PTE is writable
  2024-03-07 11:59         ` David Hildenbrand
@ 2024-03-07 12:09           ` David Hildenbrand
  2024-03-07 14:42             ` Richard Weinberger
  0 siblings, 1 reply; 15+ messages in thread
From: David Hildenbrand @ 2024-03-07 12:09 UTC (permalink / raw)
  To: Richard Weinberger
  Cc: linux-mm, linux-fsdevel, linux-kernel, Linux Doc Mailing List,
	upstream+pagemap, adobriyan, wangkefeng wang, ryan roberts,
	hughd, peterx, avagin, lstoakes, vbabka, Andrew Morton,
	usama anjum, Jonathan Corbet

On 07.03.24 12:59, David Hildenbrand wrote:
> On 07.03.24 12:51, Richard Weinberger wrote:
>> ----- Original Message -----
>>> From: "David Hildenbrand" <david@redhat.com>
>>>> I'm currently investigating why a real-time application faces unexpected
>>>> page faults. Page faults are usually fatal for real-time workloads because
>>>> the latency constraints are no longer met.
>>>
>>> Are you concerned about any type of page fault, or are things like a
>>> simple remapping of the same page from "read-only to writable"
>>> acceptable? ("very minor fault")
>>
>> Any page fault has to be avoided.
>> To give you more background, the real time application runs on Xenomai,
>> a real time extension for Linux.
>> Xenomai already applies many tweaks to the kernel to trigger pre-faulting of
>> memory areas. But sometimes the application does not use the Xenomai API
>> correctly or there is a bug in Xenomai itself.
>> Currently I'm suspecting the latter.
>>    
> 
> Thanks for the details!
> 
>>>>
>>>> So, I wrote a small tool to inspect the memory mappings of a process to find
>>>> areas which are not correctly pre-faulted. While doing so I noticed that
>>>> there is currently no way to detect CoW mappings.
>>>> Exposing the writable property of a PTE seemed like a good start to me.
>>>
>>> Is it just about "detection" for debugging purposes or about "fixup" in
>>> running applications?
>>
>> It's only about debugging. If an application fails a test I want to have
>> a tool which tells me what memory mappings are wonky or could cause a fault
>> at runtime.
> 
> One destructive way to find out in a writable mapping if the page would
> actually get remapped:
> 
> a) Read the PFN of a virtual address using pagemap
> b) Write to the virtual address using /proc/pid/mem
> c) Read the PFN of a virtual address using pagemap to see if it changed
> 
> If the application can be paused, you could read+write a single byte,
> turning it non-destructive.
> 
> But that would still "hide" the remap-writable-type faults.
> 
>>
>> I fully understand that my use case is a corner case and anything but mainline.
>> While developing my debug tool I thought that improving the pagemap interface
>> might help others too.
> 
> I'm fine with this (can be a helpful debugging tool for some other cases
> as well, and IIRC we don't have another interface to introspect this),
> as long as we properly document the corner case that there could still
> be write faults on some architectures when the page would not be
> accessed/dirty yet.
> 

[and I just recall, there are some other corner cases. For example, 
pages in a shadow stack can be pte_write(), but they can only be written 
by HW indirectly when modifying the stack, and ordinary write access 
would still fault]

-- 
Cheers,

David / dhildenb



* Re: [PATCH 1/2] [RFC] proc: pagemap: Expose whether a PTE is writable
  2024-03-07 12:09           ` David Hildenbrand
@ 2024-03-07 14:42             ` Richard Weinberger
  0 siblings, 0 replies; 15+ messages in thread
From: Richard Weinberger @ 2024-03-07 14:42 UTC (permalink / raw)
  To: David Hildenbrand
  Cc: linux-mm, linux-fsdevel, linux-kernel, Linux Doc Mailing List,
	upstream+pagemap, adobriyan, wangkefeng wang, ryan roberts,
	hughd, peterx, avagin, lstoakes, vbabka, Andrew Morton,
	usama anjum, Jonathan Corbet

----- Original Message -----
> From: "David Hildenbrand" <david@redhat.com>
>> One destructive way to find out in a writable mapping if the page would
>> actually get remapped:
>> 
>> a) Read the PFN of a virtual address using pagemap
>> b) Write to the virtual address using /proc/pid/mem
>> c) Read the PFN of a virtual address using pagemap to see if it changed
>> 
>> If the application can be paused, you could read+write a single byte,
>> turning it non-destructive.

I'm not so sure whether this works well if a mapping is device memory or such.
 
>> But that would still "hide" the remap-writable-type faults.

Xenomai will tell me anyway when there was a page fault while a real time thread
had the CPU.
My idea was to have a tool to check before the application enters the critical phase.

>>> I fully understand that my use case is a corner case and anything but mainline.
>>> While developing my debug tool I thought that improving the pagemap interface
>>> might help others too.
>> 
>> I'm fine with this (can be a helpful debugging tool for some other cases
>> as well, and IIRC we don't have another interface to introspect this),
>> as long as we properly document the corner case that there could still
>> be write faults on some architectures when the page would not be
>> accessed/dirty yet.

Cool. :)
 
> 
> [and I just recall, there are some other corner cases. For example,
> pages in a shadow stack can be pte_write(), but they can only be written
> by HW indirectly when modifying the stack, and ordinary write access
> would still fault]

Yeah, I noticed this while browsing through various pte_write() implementations.
That's a tradeoff I can live with.

Thanks,
//richard


* Re: [PATCH 1/2] [RFC] proc: pagemap: Expose whether a PTE is writable
  2024-03-06 23:23 [PATCH 1/2] [RFC] proc: pagemap: Expose whether a PTE is writable Richard Weinberger
                   ` (2 preceding siblings ...)
  2024-03-07 10:52 ` David Hildenbrand
@ 2024-03-10 21:55 ` Lorenzo Stoakes
  3 siblings, 0 replies; 15+ messages in thread
From: Lorenzo Stoakes @ 2024-03-10 21:55 UTC (permalink / raw)
  To: Richard Weinberger
  Cc: linux-mm, linux-fsdevel, linux-kernel, linux-doc,
	upstream+pagemap, adobriyan, wangkefeng.wang, ryan.roberts,
	hughd, peterx, david, avagin, vbabka, akpm, usama.anjum, corbet

On Thu, Mar 07, 2024 at 12:23:38AM +0100, Richard Weinberger wrote:
> If a PTE is present and writable, bit 58 will be set.
> This allows detecting CoW memory mappings and other mappings
> where a write access will cause a page fault.

I think David has highlighted it elsewhere in the thread, but this
explanation definitely needs bulking up.

Need to emphasise that we detect cases where a fault will occur (_possibly_
CoW, _possibly_ write notify clean file-backed page, _possibly_ other cases
where we need write fault tracking).

Very important to differentiate between a _page table_ read/write flag
being set and the mapping being read-only, it's a concern that being loose
on this might confuse people somewhat.

>
> Signed-off-by: Richard Weinberger <richard@nod.at>
> ---
>  fs/proc/task_mmu.c | 8 ++++++++
>  1 file changed, 8 insertions(+)
>
> diff --git a/fs/proc/task_mmu.c b/fs/proc/task_mmu.c
> index 3f78ebbb795f..7c7e0e954c02 100644
> --- a/fs/proc/task_mmu.c
> +++ b/fs/proc/task_mmu.c
> @@ -1341,6 +1341,7 @@ struct pagemapread {
>  #define PM_SOFT_DIRTY		BIT_ULL(55)
>  #define PM_MMAP_EXCLUSIVE	BIT_ULL(56)
>  #define PM_UFFD_WP		BIT_ULL(57)
> +#define PM_WRITE		BIT_ULL(58)

As an extension of the above comment re: confusion, I really dislike
PM_WRITE. Something like PM_PTE_WRITABLE might be better?

>  #define PM_FILE			BIT_ULL(61)
>  #define PM_SWAP			BIT_ULL(62)
>  #define PM_PRESENT		BIT_ULL(63)
> @@ -1417,6 +1418,8 @@ static pagemap_entry_t pte_to_pagemap_entry(struct pagemapread *pm,
>  			flags |= PM_SOFT_DIRTY;
>  		if (pte_uffd_wp(pte))
>  			flags |= PM_UFFD_WP;
> +		if (pte_write(pte))
> +			flags |= PM_WRITE;
>  	} else if (is_swap_pte(pte)) {
>  		swp_entry_t entry;
>  		if (pte_swp_soft_dirty(pte))
> @@ -1483,6 +1486,8 @@ static int pagemap_pmd_range(pmd_t *pmdp, unsigned long addr, unsigned long end,
>  				flags |= PM_SOFT_DIRTY;
>  			if (pmd_uffd_wp(pmd))
>  				flags |= PM_UFFD_WP;
> +			if (pmd_write(pmd))
> +				flags |= PM_WRITE;
>  			if (pm->show_pfn)
>  				frame = pmd_pfn(pmd) +
>  					((addr & ~PMD_MASK) >> PAGE_SHIFT);
> @@ -1586,6 +1591,9 @@ static int pagemap_hugetlb_range(pte_t *ptep, unsigned long hmask,
>  		if (huge_pte_uffd_wp(pte))
>  			flags |= PM_UFFD_WP;
>
> +		if (pte_write(pte))

This should be huge_pte_write(). It amounts to the same thing, but for
consistency :)

> +			flags |= PM_WRITE;
> +
>  		flags |= PM_PRESENT;
>  		if (pm->show_pfn)
>  			frame = pte_pfn(pte) +
> --
> 2.35.3
>

Overall I _really_ like the idea of exposing this. Not long ago I wanted to
be able to assess whether private mappings were CoW'd or not 'at a glance'
and couldn't find any means of doing this (of course I might have missed
something but I don't think there is anything).

So I think a single bit in /proc/$pid/pagemap is absolutely worthwhile to
get this information.
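
For reference, a rough sketch of how userspace would fetch the entry that
carries this bit. The pagemap file holds one 64-bit entry per virtual page,
so the entry for an address lives at (vaddr / page_size) * 8. The helper
names are invented for this example, and on kernels without the proposed
change bit 58 simply reads back as zero.

```c
#define _XOPEN_SOURCE 700
#define _FILE_OFFSET_BITS 64

#include <assert.h>
#include <fcntl.h>
#include <stdint.h>
#include <unistd.h>

/* Each virtual page is described by one 8-byte pagemap entry. */
#define PM_ENTRY_BYTES	8ULL

/* Byte offset in the pagemap file of the entry describing vaddr. */
static uint64_t pagemap_offset(uint64_t vaddr, uint64_t page_size)
{
	assert(page_size != 0);
	return (vaddr / page_size) * PM_ENTRY_BYTES;
}

/*
 * Read the entry for addr from a pagemap file such as /proc/self/pagemap.
 * Returns 0 on success and -1 on any error.
 */
static int pagemap_read_entry(const char *path, const void *addr,
			      uint64_t *entry)
{
	long page_size = sysconf(_SC_PAGESIZE);
	ssize_t n;
	int fd;

	if (page_size <= 0)
		return -1;

	fd = open(path, O_RDONLY);
	if (fd < 0)
		return -1;

	n = pread(fd, entry, sizeof(*entry),
		  (off_t)pagemap_offset((uint64_t)(uintptr_t)addr,
					(uint64_t)page_size));
	close(fd);

	return n == (ssize_t)sizeof(*entry) ? 0 : -1;
}
```

With that, checking the proposed flag would be a matter of testing
entry & (1ULL << 58) after a successful pagemap_read_entry() call.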

I'd like to see a non-RFC version submitted :) as discussed on irc,
probably best after merge window!


* Re: [PATCH 2/2] [RFC] pagemap.rst: Document write bit
  2024-03-06 23:23 ` [PATCH 2/2] [RFC] pagemap.rst: Document write bit Richard Weinberger
  2024-03-07 10:52   ` David Hildenbrand
@ 2024-03-10 22:14   ` Lorenzo Stoakes
  1 sibling, 0 replies; 15+ messages in thread
From: Lorenzo Stoakes @ 2024-03-10 22:14 UTC (permalink / raw)
  To: Richard Weinberger
  Cc: linux-mm, linux-fsdevel, linux-kernel, linux-doc,
	upstream+pagemap, adobriyan, wangkefeng.wang, ryan.roberts,
	hughd, peterx, david, avagin, vbabka, akpm, usama.anjum, corbet

On Thu, Mar 07, 2024 at 12:23:39AM +0100, Richard Weinberger wrote:
> Bit 58 denotes that a PTE is writable.
> The main use case is detecting CoW mappings.
>
> Signed-off-by: Richard Weinberger <richard@nod.at>
> ---
>  Documentation/admin-guide/mm/pagemap.rst | 8 +++++++-
>  1 file changed, 7 insertions(+), 1 deletion(-)
>
> diff --git a/Documentation/admin-guide/mm/pagemap.rst b/Documentation/admin-guide/mm/pagemap.rst
> index f5f065c67615..81ffe3601b96 100644
> --- a/Documentation/admin-guide/mm/pagemap.rst
> +++ b/Documentation/admin-guide/mm/pagemap.rst
> @@ -21,7 +21,8 @@ There are four components to pagemap:
>      * Bit  56    page exclusively mapped (since 4.2)
>      * Bit  57    pte is uffd-wp write-protected (since 5.13) (see
>        Documentation/admin-guide/mm/userfaultfd.rst)
> -    * Bits 58-60 zero
> +    * Bit  58    pte is writable (since 6.10)

I really think we need to be careful about talking about 'writable' again
because people are easily confused about the difference between a writable
_mapping_ and a writable _page table entry_.

Of course you mention PTE here, but I think it might be better to say
something like:

    * Bit  58    raw pte r/w flag (since 6.10)

> +    * Bits 59-60 zero
>      * Bit  61    page is file-page or shared-anon (since 3.5)
>      * Bit  62    page swapped
>      * Bit  63    page present
> @@ -37,6 +38,11 @@ There are four components to pagemap:
>     precisely which pages are mapped (or in swap) and comparing mapped
>     pages between processes.
>
> +   Bit 58 is useful to detect CoW mappings; however, it does not indicate
> +   whether the page mapping is writable or not. If an anonymous mapping is
> +   writable but the write bit is not set, it means that the next write access
> +   will cause a page fault, and copy-on-write will happen.
> +

David has addressed the copy vs. anon exclusive remap issue, but I also
feel this needs some bulking out.

I would simply rephrase this in terms of whether a write fault occurs or
not e.g.:

   Bit 58 indicates whether the PTE has the write flag set. If this flag is
   unset, then write accesses for this mapping will cause a fault for this
   page. If the mapping is private (whether anonymous or file-backed), this
   can result in a Copy-on-Write (though if the page is anonymous-exclusive,
   the flag will simply be set). If file-backed, this being cleared may
   simply indicate that this file page is clean.
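
The wording above could be read by a tool roughly as follows. This is only
a sketch under assumptions: PM_PTE_WRITE is a made-up name for the proposed
bit 58, the enum and helpers are invented for illustration, and the caller
is assumed to have already checked (e.g. via /proc/<pid>/maps) that the
mapping itself permits writes and whether it is private.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define PM_PRESENT	(1ULL << 63)
#define PM_PTE_WRITE	(1ULL << 58)	/* proposed bit; name is hypothetical */

enum pm_write_hint {
	PM_HINT_NOT_PRESENT,	/* no present PTE, bit 58 says nothing */
	PM_HINT_NO_FAULT,	/* write flag set */
	PM_HINT_FAULT_PRIVATE,	/* may CoW, or just set the flag */
	PM_HINT_FAULT_SHARED,	/* e.g. clean file page */
	PM_HINT_COUNT,
};

/* Classify what a write to a page in a writable mapping would do. */
static enum pm_write_hint pm_write_hint(uint64_t entry, bool private_mapping)
{
	if (!(entry & PM_PRESENT))
		return PM_HINT_NOT_PRESENT;
	if (entry & PM_PTE_WRITE)
		return PM_HINT_NO_FAULT;
	return private_mapping ? PM_HINT_FAULT_PRIVATE : PM_HINT_FAULT_SHARED;
}

/* Human-readable text for a hint, mirroring the proposed documentation. */
static const char *pm_write_hint_str(enum pm_write_hint hint)
{
	static const char *const text[PM_HINT_COUNT] = {
		"not present",
		"write will not fault on this PTE",
		"write faults: may CoW, or only set the flag if anon-exclusive",
		"write faults: file page may simply be clean",
	};

	assert((unsigned int)hint < PM_HINT_COUNT);
	return text[hint];
}
```

Note that the hint cannot say which of the fault outcomes will happen; the
bit only tells you that a write fault occurs.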

>     Efficient users of this interface will use ``/proc/pid/maps`` to
>     determine which areas of memory are actually mapped and llseek to
>     skip over unmapped regions.
> --
> 2.35.3
>


end of thread, other threads:[~2024-03-10 22:16 UTC | newest]

Thread overview: 15+ messages
2024-03-06 23:23 [PATCH 1/2] [RFC] proc: pagemap: Expose whether a PTE is writable Richard Weinberger
2024-03-06 23:23 ` [PATCH 2/2] [RFC] pagemap.rst: Document write bit Richard Weinberger
2024-03-07 10:52   ` David Hildenbrand
2024-03-07 11:10     ` Richard Weinberger
2024-03-07 11:15       ` David Hildenbrand
2024-03-10 22:14   ` Lorenzo Stoakes
2024-03-07 10:44 ` [PATCH 1/2] [RFC] proc: pagemap: Expose whether a PTE is writable Muhammad Usama Anjum
2024-03-07 10:52 ` David Hildenbrand
2024-03-07 11:10   ` Richard Weinberger
2024-03-07 11:20     ` David Hildenbrand
2024-03-07 11:51       ` Richard Weinberger
2024-03-07 11:59         ` David Hildenbrand
2024-03-07 12:09           ` David Hildenbrand
2024-03-07 14:42             ` Richard Weinberger
2024-03-10 21:55 ` Lorenzo Stoakes
