All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH v1 0/2] mm/hugetlb: fix write-fault handling for shared mappings
@ 2022-08-05 11:03 David Hildenbrand
  2022-08-05 11:03 ` [PATCH v1 1/2] mm/hugetlb: fix hugetlb not supporting write-notify David Hildenbrand
  2022-08-05 11:03 ` [PATCH v1 2/2] mm/hugetlb: support write-faults in shared mappings David Hildenbrand
  0 siblings, 2 replies; 30+ messages in thread
From: David Hildenbrand @ 2022-08-05 11:03 UTC (permalink / raw)
  To: linux-kernel
  Cc: linux-mm, David Hildenbrand, Andrew Morton, Mike Kravetz,
	Muchun Song, Peter Xu, Peter Feiner, Kirill A . Shutemov

I observed that hugetlb does not support/expect write-faults in shared
mappings that would have to map the R/O-mapped page writable -- and I
found one case where we could currently get such faults and would
erroneously map an anon page into a shared mapping, by triggering
clear_refs to clear soft-dirty tracking at the wrong point in time on a
process.

I propose to backport the fix to stable trees, as the fix for write-notify
should be straight forward.

Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Mike Kravetz <mike.kravetz@oracle.com>
Cc: Muchun Song <songmuchun@bytedance.com>
Cc: Peter Xu <peterx@redhat.com>
Cc: Peter Feiner <pfeiner@google.com>
Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>

David Hildenbrand (2):
  mm/hugetlb: fix hugetlb not supporting write-notify
  mm/hugetlb: support write-faults in shared mappings

 mm/hugetlb.c | 21 ++++++++++++++-------
 mm/mmap.c    |  7 +++++++
 2 files changed, 21 insertions(+), 7 deletions(-)

-- 
2.35.3


^ permalink raw reply	[flat|nested] 30+ messages in thread

* [PATCH v1 1/2] mm/hugetlb: fix hugetlb not supporting write-notify
  2022-08-05 11:03 [PATCH v1 0/2] mm/hugetlb: fix write-fault handling for shared mappings David Hildenbrand
@ 2022-08-05 11:03 ` David Hildenbrand
  2022-08-05 18:14   ` Peter Xu
  2022-08-05 11:03 ` [PATCH v1 2/2] mm/hugetlb: support write-faults in shared mappings David Hildenbrand
  1 sibling, 1 reply; 30+ messages in thread
From: David Hildenbrand @ 2022-08-05 11:03 UTC (permalink / raw)
  To: linux-kernel
  Cc: linux-mm, David Hildenbrand, Andrew Morton, Mike Kravetz,
	Muchun Song, Peter Xu, Peter Feiner, Kirill A . Shutemov, stable

Staring at hugetlb_wp(), one might wonder where all the logic for shared
mappings is when stumbling over a write-protected page in a shared
mapping. In fact, there is none, and so far we thought we could get
away with that because e.g., mprotect() should always do the right thing
and map all pages directly writable.

Looks like we were wrong:

--------------------------------------------------------------------------
 #include <stdio.h>
 #include <stdlib.h>
 #include <string.h>
 #include <fcntl.h>
 #include <unistd.h>
 #include <errno.h>
 #include <sys/mman.h>

 #define HUGETLB_SIZE (2 * 1024 * 1024u)

 static void clear_softdirty(void)
 {
         int fd = open("/proc/self/clear_refs", O_WRONLY);
         const char *ctrl = "4";
         int ret;

         if (fd < 0) {
                 fprintf(stderr, "open(clear_refs) failed\n");
                 exit(1);
         }
         ret = write(fd, ctrl, strlen(ctrl));
         if (ret != strlen(ctrl)) {
                 fprintf(stderr, "write(clear_refs) failed\n");
                 exit(1);
         }
         close(fd);
 }

 int main(int argc, char **argv)
 {
         char *map;
         int fd;

         fd = open("/dev/hugepages/tmp", O_RDWR | O_CREAT);
         if (!fd) {
                 fprintf(stderr, "open() failed\n");
                 return -errno;
         }
         if (ftruncate(fd, HUGETLB_SIZE)) {
                 fprintf(stderr, "ftruncate() failed\n");
                 return -errno;
         }

         map = mmap(NULL, HUGETLB_SIZE, PROT_READ|PROT_WRITE, MAP_SHARED, fd, 0);
         if (map == MAP_FAILED) {
                 fprintf(stderr, "mmap() failed\n");
                 return -errno;
         }

         *map = 0;

         if (mprotect(map, HUGETLB_SIZE, PROT_READ)) {
                 fprintf(stderr, "mmprotect() failed\n");
                 return -errno;
         }

         clear_softdirty();

         if (mprotect(map, HUGETLB_SIZE, PROT_READ|PROT_WRITE)) {
                 fprintf(stderr, "mmprotect() failed\n");
                 return -errno;
         }

         *map = 0;

         return 0;
 }
--------------------------------------------------------------------------

Above test fails with SIGBUS when there is only a single free hugetlb page.
 # echo 1 > /sys/kernel/mm/hugepages/hugepages-2048kB/nr_hugepages
 # ./test
 Bus error (core dumped)

And worse, with sufficient free hugetlb pages it will map an anonymous page
into a shared mapping, for example, messing up accounting during unmap
and breaking MAP_SHARED semantics:
 # echo 2 > /sys/kernel/mm/hugepages/hugepages-2048kB/nr_hugepages
 # ./test
 # cat /proc/meminfo | grep HugePages_
 HugePages_Total:       2
 HugePages_Free:        1
 HugePages_Rsvd:    18446744073709551615
 HugePages_Surp:        0

Reason in this particular case is that vma_wants_writenotify() will
return "true", removing VM_SHARED in vma_set_page_prot() to map pages
write-protected. Let's teach vma_wants_writenotify() that hugetlb does not
support write-notify, including softdirty tracking.

Fixes: 64e455079e1b ("mm: softdirty: enable write notifications on VMAs after VM_SOFTDIRTY cleared")
Cc: <stable@vger.kernel.org> # v3.18+
Signed-off-by: David Hildenbrand <david@redhat.com>
---
 mm/mmap.c | 7 +++++++
 1 file changed, 7 insertions(+)

diff --git a/mm/mmap.c b/mm/mmap.c
index 61e6135c54ef..462a6b0344ac 100644
--- a/mm/mmap.c
+++ b/mm/mmap.c
@@ -1683,6 +1683,13 @@ int vma_wants_writenotify(struct vm_area_struct *vma, pgprot_t vm_page_prot)
 	if ((vm_flags & (VM_WRITE|VM_SHARED)) != ((VM_WRITE|VM_SHARED)))
 		return 0;
 
+	/*
+	 * Hugetlb does not require/support writenotify; especially, it does not
+	 * support softdirty tracking.
+	 */
+	if (is_vm_hugetlb_page(vma))
+		return 0;
+
 	/* The backer wishes to know when pages are first written to? */
 	if (vm_ops && (vm_ops->page_mkwrite || vm_ops->pfn_mkwrite))
 		return 1;
-- 
2.35.3


^ permalink raw reply related	[flat|nested] 30+ messages in thread

* [PATCH v1 2/2] mm/hugetlb: support write-faults in shared mappings
  2022-08-05 11:03 [PATCH v1 0/2] mm/hugetlb: fix write-fault handling for shared mappings David Hildenbrand
  2022-08-05 11:03 ` [PATCH v1 1/2] mm/hugetlb: fix hugetlb not supporting write-notify David Hildenbrand
@ 2022-08-05 11:03 ` David Hildenbrand
  2022-08-05 18:12   ` Peter Xu
  1 sibling, 1 reply; 30+ messages in thread
From: David Hildenbrand @ 2022-08-05 11:03 UTC (permalink / raw)
  To: linux-kernel
  Cc: linux-mm, David Hildenbrand, Andrew Morton, Mike Kravetz,
	Muchun Song, Peter Xu, Peter Feiner, Kirill A . Shutemov

Let's add a safety net if we ever get (again) a write-fault on a R/O-mapped
page in a shared mapping, in which case we simply have to map the
page writable.

VM_MAYSHARE handling in hugetlb_fault() for FAULT_FLAG_WRITE
indicates that this was at least envisioned, but could never have worked
as expected. This theoretically paves the way for softdirty tracking
support in hugetlb.

Tested without the fix for softdirty tracking.

Note that there is no need to do any kind of reservation in hugetlb_fault()
in this case ... because we already have a hugetlb page mapped R/O
that we will simply map writable and we are not dealing with COW/unsharing.

Signed-off-by: David Hildenbrand <david@redhat.com>
---
 mm/hugetlb.c | 21 ++++++++++++++-------
 1 file changed, 14 insertions(+), 7 deletions(-)

diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index a18c071c294e..bbab7aa9d8f8 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -5233,6 +5233,16 @@ static vm_fault_t hugetlb_wp(struct mm_struct *mm, struct vm_area_struct *vma,
 	VM_BUG_ON(unshare && (flags & FOLL_WRITE));
 	VM_BUG_ON(!unshare && !(flags & FOLL_WRITE));
 
+	/* Let's take out shared mappings first, this should be a rare event. */
+	if (unlikely(vma->vm_flags & VM_MAYSHARE)) {
+		if (unshare)
+			return 0;
+		if (WARN_ON_ONCE(!(vma->vm_flags & VM_WRITE)))
+			return VM_FAULT_SIGSEGV;
+		set_huge_ptep_writable(vma, haddr, ptep);
+		return 0;
+	}
+
 	pte = huge_ptep_get(ptep);
 	old_page = pte_page(pte);
 
@@ -5767,12 +5777,11 @@ vm_fault_t hugetlb_fault(struct mm_struct *mm, struct vm_area_struct *vma,
 	 * If we are going to COW/unshare the mapping later, we examine the
 	 * pending reservations for this page now. This will ensure that any
 	 * allocations necessary to record that reservation occur outside the
-	 * spinlock. For private mappings, we also lookup the pagecache
-	 * page now as it is used to determine if a reservation has been
-	 * consumed.
+	 * spinlock. Also lookup the pagecache page now as it is used to
+	 * determine if a reservation has been consumed.
 	 */
 	if ((flags & (FAULT_FLAG_WRITE|FAULT_FLAG_UNSHARE)) &&
-	    !huge_pte_write(entry)) {
+	    !(vma->vm_flags & VM_MAYSHARE) && !huge_pte_write(entry)) {
 		if (vma_needs_reservation(h, vma, haddr) < 0) {
 			ret = VM_FAULT_OOM;
 			goto out_mutex;
@@ -5780,9 +5789,7 @@ vm_fault_t hugetlb_fault(struct mm_struct *mm, struct vm_area_struct *vma,
 		/* Just decrements count, does not deallocate */
 		vma_end_reservation(h, vma, haddr);
 
-		if (!(vma->vm_flags & VM_MAYSHARE))
-			pagecache_page = hugetlbfs_pagecache_page(h,
-								vma, haddr);
+		pagecache_page = hugetlbfs_pagecache_page(h, vma, haddr);
 	}
 
 	ptl = huge_pte_lock(h, mm, ptep);
-- 
2.35.3


^ permalink raw reply related	[flat|nested] 30+ messages in thread

* Re: [PATCH v1 2/2] mm/hugetlb: support write-faults in shared mappings
  2022-08-05 11:03 ` [PATCH v1 2/2] mm/hugetlb: support write-faults in shared mappings David Hildenbrand
@ 2022-08-05 18:12   ` Peter Xu
  2022-08-05 18:20     ` David Hildenbrand
  2022-08-05 23:08     ` Mike Kravetz
  0 siblings, 2 replies; 30+ messages in thread
From: Peter Xu @ 2022-08-05 18:12 UTC (permalink / raw)
  To: David Hildenbrand
  Cc: linux-kernel, linux-mm, Andrew Morton, Mike Kravetz, Muchun Song,
	Peter Feiner, Kirill A . Shutemov

On Fri, Aug 05, 2022 at 01:03:29PM +0200, David Hildenbrand wrote:
> Let's add a safety net if we ever get (again) a write-fault on a R/O-mapped
> page in a shared mapping, in which case we simply have to map the
> page writable.
> 
> VM_MAYSHARE handling in hugetlb_fault() for FAULT_FLAG_WRITE
> indicates that this was at least envisioned, but could never have worked
> as expected. This theoretically paves the way for softdirty tracking
> support in hugetlb.
> 
> Tested without the fix for softdirty tracking.
> 
> Note that there is no need to do any kind of reservation in hugetlb_fault()
> in this case ... because we already have a hugetlb page mapped R/O
> that we will simply map writable and we are not dealing with COW/unsharing.
> 
> Signed-off-by: David Hildenbrand <david@redhat.com>
> ---
>  mm/hugetlb.c | 21 ++++++++++++++-------
>  1 file changed, 14 insertions(+), 7 deletions(-)
> 
> diff --git a/mm/hugetlb.c b/mm/hugetlb.c
> index a18c071c294e..bbab7aa9d8f8 100644
> --- a/mm/hugetlb.c
> +++ b/mm/hugetlb.c
> @@ -5233,6 +5233,16 @@ static vm_fault_t hugetlb_wp(struct mm_struct *mm, struct vm_area_struct *vma,
>  	VM_BUG_ON(unshare && (flags & FOLL_WRITE));
>  	VM_BUG_ON(!unshare && !(flags & FOLL_WRITE));
>  
> +	/* Let's take out shared mappings first, this should be a rare event. */
> +	if (unlikely(vma->vm_flags & VM_MAYSHARE)) {

Should we check VM_SHARED instead?

> +		if (unshare)
> +			return 0;

Curious when will this happen especially if we switch to VM_SHARED above.
Shouldn't "unshare" not happen at all on a shared region?

> +		if (WARN_ON_ONCE(!(vma->vm_flags & VM_WRITE)))
> +			return VM_FAULT_SIGSEGV;

I had a feeling that you just want to double check we have write
permission, but IIUC this should be checked far earlier or we'll have
problem.  No strong opinion if so, but I'd suggest dropping this one,
otherwise we could add tons of WARN_ON_ONCE() in anywhere in the page fault
stack and they mostly won't trigger at all.

> +		set_huge_ptep_writable(vma, haddr, ptep);

Do we wanna set dirty bits too?

> +		return 0;
> +	}

-- 
Peter Xu


^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH v1 1/2] mm/hugetlb: fix hugetlb not supporting write-notify
  2022-08-05 11:03 ` [PATCH v1 1/2] mm/hugetlb: fix hugetlb not supporting write-notify David Hildenbrand
@ 2022-08-05 18:14   ` Peter Xu
  2022-08-05 18:22     ` David Hildenbrand
  2022-08-05 18:23     ` Mike Kravetz
  0 siblings, 2 replies; 30+ messages in thread
From: Peter Xu @ 2022-08-05 18:14 UTC (permalink / raw)
  To: David Hildenbrand
  Cc: linux-kernel, linux-mm, Andrew Morton, Mike Kravetz, Muchun Song,
	Peter Feiner, Kirill A . Shutemov, stable

On Fri, Aug 05, 2022 at 01:03:28PM +0200, David Hildenbrand wrote:
> diff --git a/mm/mmap.c b/mm/mmap.c
> index 61e6135c54ef..462a6b0344ac 100644
> --- a/mm/mmap.c
> +++ b/mm/mmap.c
> @@ -1683,6 +1683,13 @@ int vma_wants_writenotify(struct vm_area_struct *vma, pgprot_t vm_page_prot)
>  	if ((vm_flags & (VM_WRITE|VM_SHARED)) != ((VM_WRITE|VM_SHARED)))
>  		return 0;
>  
> +	/*
> +	 * Hugetlb does not require/support writenotify; especially, it does not
> +	 * support softdirty tracking.
> +	 */
> +	if (is_vm_hugetlb_page(vma))
> +		return 0;

I'm kind of confused here..  you seems to be fixing up soft-dirty for
hugetlb but here it's explicitly forbidden.

Could you explain a bit more on why this patch is needed if (assume
there'll be a working) patch 2 being provided?

Thanks,

-- 
Peter Xu


^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH v1 2/2] mm/hugetlb: support write-faults in shared mappings
  2022-08-05 18:12   ` Peter Xu
@ 2022-08-05 18:20     ` David Hildenbrand
  2022-08-08 16:05       ` Peter Xu
  2022-08-05 23:08     ` Mike Kravetz
  1 sibling, 1 reply; 30+ messages in thread
From: David Hildenbrand @ 2022-08-05 18:20 UTC (permalink / raw)
  To: Peter Xu
  Cc: linux-kernel, linux-mm, Andrew Morton, Mike Kravetz, Muchun Song,
	Peter Feiner, Kirill A . Shutemov

On 05.08.22 20:12, Peter Xu wrote:
> On Fri, Aug 05, 2022 at 01:03:29PM +0200, David Hildenbrand wrote:
>> Let's add a safety net if we ever get (again) a write-fault on a R/O-mapped
>> page in a shared mapping, in which case we simply have to map the
>> page writable.
>>
>> VM_MAYSHARE handling in hugetlb_fault() for FAULT_FLAG_WRITE
>> indicates that this was at least envisioned, but could never have worked
>> as expected. This theoretically paves the way for softdirty tracking
>> support in hugetlb.
>>
>> Tested without the fix for softdirty tracking.
>>
>> Note that there is no need to do any kind of reservation in hugetlb_fault()
>> in this case ... because we already have a hugetlb page mapped R/O
>> that we will simply map writable and we are not dealing with COW/unsharing.
>>
>> Signed-off-by: David Hildenbrand <david@redhat.com>
>> ---
>>  mm/hugetlb.c | 21 ++++++++++++++-------
>>  1 file changed, 14 insertions(+), 7 deletions(-)
>>
>> diff --git a/mm/hugetlb.c b/mm/hugetlb.c
>> index a18c071c294e..bbab7aa9d8f8 100644
>> --- a/mm/hugetlb.c
>> +++ b/mm/hugetlb.c
>> @@ -5233,6 +5233,16 @@ static vm_fault_t hugetlb_wp(struct mm_struct *mm, struct vm_area_struct *vma,
>>  	VM_BUG_ON(unshare && (flags & FOLL_WRITE));
>>  	VM_BUG_ON(!unshare && !(flags & FOLL_WRITE));
>>  
>> +	/* Let's take out shared mappings first, this should be a rare event. */
>> +	if (unlikely(vma->vm_flags & VM_MAYSHARE)) {
> 
> Should we check VM_SHARED instead?

Relying on VM_SHARED to detect MAP_PRIVATE vs. MAP_SHARED is
unfortunately wrong.

If you're curious, take a look at f83a275dbc5c ("mm: account for
MAP_SHARED mappings using VM_MAYSHARE and not VM_SHARED in hugetlbfs")
and mmap() code.

Long story short: if the file is read-only, we only have VM_MAYSHARE but
not VM_SHARED (and consequently also not VM_MAYWRITE).

> 
>> +		if (unshare)
>> +			return 0;
> 
> Curious when will this happen especially if we switch to VM_SHARED above.
> Shouldn't "unshare" not happen at all on a shared region?

FAULT_FLAG_UNSHARE is documented to behave like:

"FAULT_FLAG_UNSHARE is ignored and treated like an ordinary read fault
when no existing R/O-mapped anonymous page is encountered."

It should currently not happen. Focus on should ;)

> 
>> +		if (WARN_ON_ONCE(!(vma->vm_flags & VM_WRITE)))
>> +			return VM_FAULT_SIGSEGV;
> 
> I had a feeling that you just want to double check we have write
> permission, but IIUC this should be checked far earlier or we'll have
> problem.  No strong opinion if so, but I'd suggest dropping this one,
> otherwise we could add tons of WARN_ON_ONCE() in anywhere in the page fault
> stack and they mostly won't trigger at all.

Not quite. We usually (!hugetlb) have maybe_mkwrite() all over the
place. This is just an indication that we don't have maybe semantics
here. But as we also don't have it for hugetlb anon code below, maybe I
can just drop it. (or check it for both call paths)

> 
>> +		set_huge_ptep_writable(vma, haddr, ptep);
> 
> Do we wanna set dirty bits too?

set_huge_ptep_writable() handles that.

Thanks!


-- 
Thanks,

David / dhildenb


^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH v1 1/2] mm/hugetlb: fix hugetlb not supporting write-notify
  2022-08-05 18:14   ` Peter Xu
@ 2022-08-05 18:22     ` David Hildenbrand
  2022-08-05 18:23     ` Mike Kravetz
  1 sibling, 0 replies; 30+ messages in thread
From: David Hildenbrand @ 2022-08-05 18:22 UTC (permalink / raw)
  To: Peter Xu
  Cc: linux-kernel, linux-mm, Andrew Morton, Mike Kravetz, Muchun Song,
	Peter Feiner, Kirill A . Shutemov, stable

On 05.08.22 20:14, Peter Xu wrote:
> On Fri, Aug 05, 2022 at 01:03:28PM +0200, David Hildenbrand wrote:
>> diff --git a/mm/mmap.c b/mm/mmap.c
>> index 61e6135c54ef..462a6b0344ac 100644
>> --- a/mm/mmap.c
>> +++ b/mm/mmap.c
>> @@ -1683,6 +1683,13 @@ int vma_wants_writenotify(struct vm_area_struct *vma, pgprot_t vm_page_prot)
>>  	if ((vm_flags & (VM_WRITE|VM_SHARED)) != ((VM_WRITE|VM_SHARED)))
>>  		return 0;
>>  
>> +	/*
>> +	 * Hugetlb does not require/support writenotify; especially, it does not
>> +	 * support softdirty tracking.
>> +	 */
>> +	if (is_vm_hugetlb_page(vma))
>> +		return 0;
> 
> I'm kind of confused here..  you seems to be fixing up soft-dirty for
> hugetlb but here it's explicitly forbidden.
> 
> Could you explain a bit more on why this patch is needed if (assume
> there'll be a working) patch 2 being provided?

I want something simple to backport. And even with patch #2 in place, as
long as we don't support softdirty tracking, it doesn't make sense to
enable it here as of now.

-- 
Thanks,

David / dhildenb


^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH v1 1/2] mm/hugetlb: fix hugetlb not supporting write-notify
  2022-08-05 18:14   ` Peter Xu
  2022-08-05 18:22     ` David Hildenbrand
@ 2022-08-05 18:23     ` Mike Kravetz
  2022-08-05 18:25       ` David Hildenbrand
  1 sibling, 1 reply; 30+ messages in thread
From: Mike Kravetz @ 2022-08-05 18:23 UTC (permalink / raw)
  To: Peter Xu
  Cc: David Hildenbrand, linux-kernel, linux-mm, Andrew Morton,
	Muchun Song, Peter Feiner, Kirill A . Shutemov, stable

On 08/05/22 14:14, Peter Xu wrote:
> On Fri, Aug 05, 2022 at 01:03:28PM +0200, David Hildenbrand wrote:
> > diff --git a/mm/mmap.c b/mm/mmap.c
> > index 61e6135c54ef..462a6b0344ac 100644
> > --- a/mm/mmap.c
> > +++ b/mm/mmap.c
> > @@ -1683,6 +1683,13 @@ int vma_wants_writenotify(struct vm_area_struct *vma, pgprot_t vm_page_prot)
> >  	if ((vm_flags & (VM_WRITE|VM_SHARED)) != ((VM_WRITE|VM_SHARED)))
> >  		return 0;
> >  
> > +	/*
> > +	 * Hugetlb does not require/support writenotify; especially, it does not
> > +	 * support softdirty tracking.
> > +	 */
> > +	if (is_vm_hugetlb_page(vma))
> > +		return 0;
> 
> I'm kind of confused here..  you seems to be fixing up soft-dirty for
> hugetlb but here it's explicitly forbidden.
> 
> Could you explain a bit more on why this patch is needed if (assume
> there'll be a working) patch 2 being provided?
> 

No comments on the patch, but ...

Since it required little thought, I ran the test program on next-20220802 and
was surprised that the issue did not recreate.  Even added a simple printk
to make sure we were getting into vma_wants_writenotify with a hugetlb vma.
We were.

I can easily recreate with an older Fedora kernel.

Will be trying to understand why this is the case.
-- 
Mike Kravetz

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH v1 1/2] mm/hugetlb: fix hugetlb not supporting write-notify
  2022-08-05 18:23     ` Mike Kravetz
@ 2022-08-05 18:25       ` David Hildenbrand
  2022-08-05 18:33         ` Mike Kravetz
  0 siblings, 1 reply; 30+ messages in thread
From: David Hildenbrand @ 2022-08-05 18:25 UTC (permalink / raw)
  To: Mike Kravetz, Peter Xu
  Cc: linux-kernel, linux-mm, Andrew Morton, Muchun Song, Peter Feiner,
	Kirill A . Shutemov, stable

On 05.08.22 20:23, Mike Kravetz wrote:
> On 08/05/22 14:14, Peter Xu wrote:
>> On Fri, Aug 05, 2022 at 01:03:28PM +0200, David Hildenbrand wrote:
>>> diff --git a/mm/mmap.c b/mm/mmap.c
>>> index 61e6135c54ef..462a6b0344ac 100644
>>> --- a/mm/mmap.c
>>> +++ b/mm/mmap.c
>>> @@ -1683,6 +1683,13 @@ int vma_wants_writenotify(struct vm_area_struct *vma, pgprot_t vm_page_prot)
>>>  	if ((vm_flags & (VM_WRITE|VM_SHARED)) != ((VM_WRITE|VM_SHARED)))
>>>  		return 0;
>>>  
>>> +	/*
>>> +	 * Hugetlb does not require/support writenotify; especially, it does not
>>> +	 * support softdirty tracking.
>>> +	 */
>>> +	if (is_vm_hugetlb_page(vma))
>>> +		return 0;
>>
>> I'm kind of confused here..  you seems to be fixing up soft-dirty for
>> hugetlb but here it's explicitly forbidden.
>>
>> Could you explain a bit more on why this patch is needed if (assume
>> there'll be a working) patch 2 being provided?
>>
> 
> No comments on the patch, but ...
> 
> Since it required little thought, I ran the test program on next-20220802 and
> was surprised that the issue did not recreate.  Even added a simple printk
> to make sure we were getting into vma_wants_writenotify with a hugetlb vma.
> We were.


... does your config have CONFIG_MEM_SOFT_DIRTY enabled?


-- 
Thanks,

David / dhildenb


^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH v1 1/2] mm/hugetlb: fix hugetlb not supporting write-notify
  2022-08-05 18:25       ` David Hildenbrand
@ 2022-08-05 18:33         ` Mike Kravetz
  2022-08-05 18:57           ` David Hildenbrand
  0 siblings, 1 reply; 30+ messages in thread
From: Mike Kravetz @ 2022-08-05 18:33 UTC (permalink / raw)
  To: David Hildenbrand
  Cc: Peter Xu, linux-kernel, linux-mm, Andrew Morton, Muchun Song,
	Peter Feiner, Kirill A . Shutemov, stable

On 08/05/22 20:25, David Hildenbrand wrote:
> On 05.08.22 20:23, Mike Kravetz wrote:
> > On 08/05/22 14:14, Peter Xu wrote:
> >> On Fri, Aug 05, 2022 at 01:03:28PM +0200, David Hildenbrand wrote:
> >>> diff --git a/mm/mmap.c b/mm/mmap.c
> >>> index 61e6135c54ef..462a6b0344ac 100644
> >>> --- a/mm/mmap.c
> >>> +++ b/mm/mmap.c
> >>> @@ -1683,6 +1683,13 @@ int vma_wants_writenotify(struct vm_area_struct *vma, pgprot_t vm_page_prot)
> >>>  	if ((vm_flags & (VM_WRITE|VM_SHARED)) != ((VM_WRITE|VM_SHARED)))
> >>>  		return 0;
> >>>  
> >>> +	/*
> >>> +	 * Hugetlb does not require/support writenotify; especially, it does not
> >>> +	 * support softdirty tracking.
> >>> +	 */
> >>> +	if (is_vm_hugetlb_page(vma))
> >>> +		return 0;
> >>
> >> I'm kind of confused here..  you seems to be fixing up soft-dirty for
> >> hugetlb but here it's explicitly forbidden.
> >>
> >> Could you explain a bit more on why this patch is needed if (assume
> >> there'll be a working) patch 2 being provided?
> >>
> > 
> > No comments on the patch, but ...
> > 
> > Since it required little thought, I ran the test program on next-20220802 and
> > was surprised that the issue did not recreate.  Even added a simple printk
> > to make sure we were getting into vma_wants_writenotify with a hugetlb vma.
> > We were.
> 
> 
> ... does your config have CONFIG_MEM_SOFT_DIRTY enabled?
> 

No, Duh!

FYI - Some time back, I started looking at adding soft dirty support for
hugetlb mappings.  I did not finish that work.  But, I seem to recall
places where code was operating on hugetlb mappings when perhaps it should
not.

Perhaps, it would also be good to just disable soft dirty for hugetlb at
the source?
-- 
Mike Kravetz

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH v1 1/2] mm/hugetlb: fix hugetlb not supporting write-notify
  2022-08-05 18:33         ` Mike Kravetz
@ 2022-08-05 18:57           ` David Hildenbrand
  2022-08-05 20:48             ` Mike Kravetz
  0 siblings, 1 reply; 30+ messages in thread
From: David Hildenbrand @ 2022-08-05 18:57 UTC (permalink / raw)
  To: Mike Kravetz
  Cc: Peter Xu, linux-kernel, linux-mm, Andrew Morton, Muchun Song,
	Peter Feiner, Kirill A . Shutemov, stable

On 05.08.22 20:33, Mike Kravetz wrote:
> On 08/05/22 20:25, David Hildenbrand wrote:
>> On 05.08.22 20:23, Mike Kravetz wrote:
>>> On 08/05/22 14:14, Peter Xu wrote:
>>>> On Fri, Aug 05, 2022 at 01:03:28PM +0200, David Hildenbrand wrote:
>>>>> diff --git a/mm/mmap.c b/mm/mmap.c
>>>>> index 61e6135c54ef..462a6b0344ac 100644
>>>>> --- a/mm/mmap.c
>>>>> +++ b/mm/mmap.c
>>>>> @@ -1683,6 +1683,13 @@ int vma_wants_writenotify(struct vm_area_struct *vma, pgprot_t vm_page_prot)
>>>>>  	if ((vm_flags & (VM_WRITE|VM_SHARED)) != ((VM_WRITE|VM_SHARED)))
>>>>>  		return 0;
>>>>>  
>>>>> +	/*
>>>>> +	 * Hugetlb does not require/support writenotify; especially, it does not
>>>>> +	 * support softdirty tracking.
>>>>> +	 */
>>>>> +	if (is_vm_hugetlb_page(vma))
>>>>> +		return 0;
>>>>
>>>> I'm kind of confused here..  you seems to be fixing up soft-dirty for
>>>> hugetlb but here it's explicitly forbidden.
>>>>
>>>> Could you explain a bit more on why this patch is needed if (assume
>>>> there'll be a working) patch 2 being provided?
>>>>
>>>
>>> No comments on the patch, but ...
>>>
>>> Since it required little thought, I ran the test program on next-20220802 and
>>> was surprised that the issue did not recreate.  Even added a simple printk
>>> to make sure we were getting into vma_wants_writenotify with a hugetlb vma.
>>> We were.
>>
>>
>> ... does your config have CONFIG_MEM_SOFT_DIRTY enabled?
>>
> 
> No, Duh!
> 
> FYI - Some time back, I started looking at adding soft dirty support for
> hugetlb mappings.  I did not finish that work.  But, I seem to recall
> places where code was operating on hugetlb mappings when perhaps it should
> not.
> 
> Perhaps, it would also be good to just disable soft dirty for hugetlb at
> the source?

I thought about that as well. But I came to the conclusion that without
patch #2, hugetlb VMAs cannot possibly support write-notify, so there is
no need to bother in vma_wants_writenotify() at all.

The "root" would be places where we clear VM_SOFTDIRTY. That should only
be fs/proc/task_mmu.c:clear_refs_write() IIRC.

So I don't particularly care, I consider this patch a bit cleaner and
more generic, but I can adjust clear_refs_write() instead of there is a
preference.

-- 
Thanks,

David / dhildenb


^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH v1 1/2] mm/hugetlb: fix hugetlb not supporting write-notify
  2022-08-05 18:57           ` David Hildenbrand
@ 2022-08-05 20:48             ` Mike Kravetz
  2022-08-05 23:13               ` Peter Xu
  0 siblings, 1 reply; 30+ messages in thread
From: Mike Kravetz @ 2022-08-05 20:48 UTC (permalink / raw)
  To: David Hildenbrand
  Cc: Peter Xu, linux-kernel, linux-mm, Andrew Morton, Muchun Song,
	Peter Feiner, Kirill A . Shutemov, stable

On 08/05/22 20:57, David Hildenbrand wrote:
> On 05.08.22 20:33, Mike Kravetz wrote:
> > On 08/05/22 20:25, David Hildenbrand wrote:
> >> On 05.08.22 20:23, Mike Kravetz wrote:
> >>> On 08/05/22 14:14, Peter Xu wrote:
> >>>> On Fri, Aug 05, 2022 at 01:03:28PM +0200, David Hildenbrand wrote:
> >>>>> diff --git a/mm/mmap.c b/mm/mmap.c
> >>>>> index 61e6135c54ef..462a6b0344ac 100644
> >>>>> --- a/mm/mmap.c
> >>>>> +++ b/mm/mmap.c
> >>>>> @@ -1683,6 +1683,13 @@ int vma_wants_writenotify(struct vm_area_struct *vma, pgprot_t vm_page_prot)
> >>>>>  	if ((vm_flags & (VM_WRITE|VM_SHARED)) != ((VM_WRITE|VM_SHARED)))
> >>>>>  		return 0;
> >>>>>  
> >>>>> +	/*
> >>>>> +	 * Hugetlb does not require/support writenotify; especially, it does not
> >>>>> +	 * support softdirty tracking.
> >>>>> +	 */
> >>>>> +	if (is_vm_hugetlb_page(vma))
> >>>>> +		return 0;
> >>>>
> >>>> I'm kind of confused here..  you seems to be fixing up soft-dirty for
> >>>> hugetlb but here it's explicitly forbidden.
> >>>>
> >>>> Could you explain a bit more on why this patch is needed if (assume
> >>>> there'll be a working) patch 2 being provided?
> >>>>
> >>>
> >>> No comments on the patch, but ...
> >>>
> >>> Since it required little thought, I ran the test program on next-20220802 and
> >>> was surprised that the issue did not recreate.  Even added a simple printk
> >>> to make sure we were getting into vma_wants_writenotify with a hugetlb vma.
> >>> We were.
> >>
> >>
> >> ... does your config have CONFIG_MEM_SOFT_DIRTY enabled?
> >>
> > 
> > No, Duh!
> > 
> > FYI - Some time back, I started looking at adding soft dirty support for
> > hugetlb mappings.  I did not finish that work.  But, I seem to recall
> > places where code was operating on hugetlb mappings when perhaps it should
> > not.
> > 
> > Perhaps, it would also be good to just disable soft dirty for hugetlb at
> > the source?
> 
> I thought about that as well. But I came to the conclusion that without
> patch #2, hugetlb VMAs cannot possibly support write-notify, so there is
> no need to bother in vma_wants_writenotify() at all.
> 
> The "root" would be places where we clear VM_SOFTDIRTY. That should only
> be fs/proc/task_mmu.c:clear_refs_write() IIRC.
> 
> So I don't particularly care, I consider this patch a bit cleaner and
> more generic, but I can adjust clear_refs_write() instead of there is a
> preference.
> 

After a closer look, I agree that this may be the simplest/cleanest way to
proceed.  I was going to suggest that you note hugetlb does not support
softdirty, but see you did in the comment.

Acked-by: Mike Kravetz <mike.kravetz@oracle.com>

-- 
Mike Kravetz

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH v1 2/2] mm/hugetlb: support write-faults in shared mappings
  2022-08-05 18:12   ` Peter Xu
  2022-08-05 18:20     ` David Hildenbrand
@ 2022-08-05 23:08     ` Mike Kravetz
  1 sibling, 0 replies; 30+ messages in thread
From: Mike Kravetz @ 2022-08-05 23:08 UTC (permalink / raw)
  To: Peter Xu
  Cc: David Hildenbrand, linux-kernel, linux-mm, Andrew Morton,
	Muchun Song, Peter Feiner, Kirill A . Shutemov

On 08/05/22 14:12, Peter Xu wrote:
> On Fri, Aug 05, 2022 at 01:03:29PM +0200, David Hildenbrand wrote:
> > Let's add a safety net if we ever get (again) a write-fault on a R/O-mapped
> > page in a shared mapping, in which case we simply have to map the
> > page writable.
> > 

> > +		if (WARN_ON_ONCE(!(vma->vm_flags & VM_WRITE)))
> > +			return VM_FAULT_SIGSEGV;
> 
> I had a feeling that you just want to double check we have write
> permission, but IIUC this should be checked far earlier or we'll have
> problem.

Back when I was exploring hugetlb softdirty support, this patch handled
this condition by not calling into hugetlb_wp (was hugetlb_cow).

https://lore.kernel.org/linux-mm/20210211000322.159437-3-mike.kravetz@oracle.com/

Here is a quickly updated version that was only tested with David's program.
-- 
Mike Kravetz

diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index a23f1cffaf49..86d38d41fddf 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -5758,9 +5758,10 @@ vm_fault_t hugetlb_fault(struct mm_struct *mm, struct vm_area_struct *vma,
 	 * spinlock. For private mappings, we also lookup the pagecache
 	 * page now as it is used to determine if a reservation has been
 	 * consumed.
+	 * Only private/non-shared mappings are sent to hugetlb_wp.
 	 */
 	if ((flags & (FAULT_FLAG_WRITE|FAULT_FLAG_UNSHARE)) &&
-	    !huge_pte_write(entry)) {
+	    !huge_pte_write(entry) && !(vma->vm_flags & VM_MAYSHARE)) {
 		if (vma_needs_reservation(h, vma, haddr) < 0) {
 			ret = VM_FAULT_OOM;
 			goto out_mutex;
@@ -5768,8 +5769,7 @@ vm_fault_t hugetlb_fault(struct mm_struct *mm, struct vm_area_struct *vma,
 		/* Just decrements count, does not deallocate */
 		vma_end_reservation(h, vma, haddr);
 
-		if (!(vma->vm_flags & VM_MAYSHARE))
-			pagecache_page = hugetlbfs_pagecache_page(h,
+		pagecache_page = hugetlbfs_pagecache_page(h,
 								vma, haddr);
 	}
 
@@ -5815,9 +5815,14 @@ vm_fault_t hugetlb_fault(struct mm_struct *mm, struct vm_area_struct *vma,
 
 	if (flags & (FAULT_FLAG_WRITE|FAULT_FLAG_UNSHARE)) {
 		if (!huge_pte_write(entry)) {
-			ret = hugetlb_wp(mm, vma, address, ptep, flags,
+			if (!(vma->vm_flags & VM_MAYSHARE)) {
+				ret = hugetlb_wp(mm, vma, address, ptep, flags,
 					 pagecache_page, ptl);
-			goto out_put_page;
+				goto out_put_page;
+			}
+
+			entry = huge_pte_mkwrite(entry);
+			entry = huge_pte_mkdirty(entry);
 		} else if (likely(flags & FAULT_FLAG_WRITE)) {
 			entry = huge_pte_mkdirty(entry);
 		}

^ permalink raw reply related	[flat|nested] 30+ messages in thread

* Re: [PATCH v1 1/2] mm/hugetlb: fix hugetlb not supporting write-notify
  2022-08-05 20:48             ` Mike Kravetz
@ 2022-08-05 23:13               ` Peter Xu
  2022-08-05 23:33                 ` Mike Kravetz
  2022-08-08 16:36                 ` David Hildenbrand
  0 siblings, 2 replies; 30+ messages in thread
From: Peter Xu @ 2022-08-05 23:13 UTC (permalink / raw)
  To: Mike Kravetz
  Cc: David Hildenbrand, linux-kernel, linux-mm, Andrew Morton,
	Muchun Song, Peter Feiner, Kirill A . Shutemov, stable

On Fri, Aug 05, 2022 at 01:48:35PM -0700, Mike Kravetz wrote:
> On 08/05/22 20:57, David Hildenbrand wrote:
> > On 05.08.22 20:33, Mike Kravetz wrote:
> > > On 08/05/22 20:25, David Hildenbrand wrote:
> > >> On 05.08.22 20:23, Mike Kravetz wrote:
> > >>> On 08/05/22 14:14, Peter Xu wrote:
> > >>>> On Fri, Aug 05, 2022 at 01:03:28PM +0200, David Hildenbrand wrote:
> > >>>>> diff --git a/mm/mmap.c b/mm/mmap.c
> > >>>>> index 61e6135c54ef..462a6b0344ac 100644
> > >>>>> --- a/mm/mmap.c
> > >>>>> +++ b/mm/mmap.c
> > >>>>> @@ -1683,6 +1683,13 @@ int vma_wants_writenotify(struct vm_area_struct *vma, pgprot_t vm_page_prot)
> > >>>>>  	if ((vm_flags & (VM_WRITE|VM_SHARED)) != ((VM_WRITE|VM_SHARED)))
> > >>>>>  		return 0;
> > >>>>>  
> > >>>>> +	/*
> > >>>>> +	 * Hugetlb does not require/support writenotify; especially, it does not
> > >>>>> +	 * support softdirty tracking.
> > >>>>> +	 */
> > >>>>> +	if (is_vm_hugetlb_page(vma))
> > >>>>> +		return 0;
> > >>>>
> > >>>> I'm kind of confused here..  you seems to be fixing up soft-dirty for
> > >>>> hugetlb but here it's explicitly forbidden.
> > >>>>
> > >>>> Could you explain a bit more on why this patch is needed if (assume
> > >>>> there'll be a working) patch 2 being provided?
> > >>>>
> > >>>
> > >>> No comments on the patch, but ...
> > >>>
> > >>> Since it required little thought, I ran the test program on next-20220802 and
> > >>> was surprised that the issue did not recreate.  Even added a simple printk
> > >>> to make sure we were getting into vma_wants_writenotify with a hugetlb vma.
> > >>> We were.
> > >>
> > >>
> > >> ... does your config have CONFIG_MEM_SOFT_DIRTY enabled?
> > >>
> > > 
> > > No, Duh!
> > > 
> > > FYI - Some time back, I started looking at adding soft dirty support for
> > > hugetlb mappings.  I did not finish that work.  But, I seem to recall
> > > places where code was operating on hugetlb mappings when perhaps it should
> > > not.
> > > 
> > > Perhaps, it would also be good to just disable soft dirty for hugetlb at
> > > the source?
> > 
> > I thought about that as well. But I came to the conclusion that without
> > patch #2, hugetlb VMAs cannot possibly support write-notify, so there is
> > no need to bother in vma_wants_writenotify() at all.
> > 
> > The "root" would be places where we clear VM_SOFTDIRTY. That should only
> > be fs/proc/task_mmu.c:clear_refs_write() IIRC.
> > 
> > So I don't particularly care, I consider this patch a bit cleaner and
> > more generic, but I can adjust clear_refs_write() instead of there is a
> > preference.
> > 
> 
> After a closer look, I agree that this may be the simplest/cleanest way to
> proceed.  I was going to suggest that you note hugetlb does not support
> softdirty, but see you did in the comment.
> 
> Acked-by: Mike Kravetz <mike.kravetz@oracle.com>

Filtering out hugetlbfs in vma_wants_writenotify() is still a bit hard to
follow to me, since it's not clear why hugetlbfs never wants writenotify.

If it's only about soft-dirty, we could have added the hugetlbfs check into
vma_soft_dirty_enabled(), then I think it'll achieve the same thing and
much clearer - with the soft-dirty check constantly returning false for it,
hugetlbfs shared vmas should have vma_wants_writenotify() naturally return
0 already.

For the long term - shouldn't we just enable soft-dirty for hugetlbfs?  I
remember Mike used to have that in todo.  Since we've got patch 2 already,
I feel like that's really much close (is the only missing piece the clear
refs write part? or maybe some more that I didn't notice).

Then patch 1 (or IMHO equivalant check in vma_soft_dirty_enabled(), but
maybe in stable trees we don't have vma_soft_dirty_enabled then it's
exactly patch 1) can be a stable-only backport just to avoid the bug from
triggering.

Thanks,

-- 
Peter Xu


^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH v1 1/2] mm/hugetlb: fix hugetlb not supporting write-notify
  2022-08-05 23:13               ` Peter Xu
@ 2022-08-05 23:33                 ` Mike Kravetz
  2022-08-08 16:10                   ` Peter Xu
  2022-08-08 16:36                 ` David Hildenbrand
  1 sibling, 1 reply; 30+ messages in thread
From: Mike Kravetz @ 2022-08-05 23:33 UTC (permalink / raw)
  To: Peter Xu
  Cc: David Hildenbrand, linux-kernel, linux-mm, Andrew Morton,
	Muchun Song, Peter Feiner, Kirill A . Shutemov, stable

On 08/05/22 19:13, Peter Xu wrote:
> On Fri, Aug 05, 2022 at 01:48:35PM -0700, Mike Kravetz wrote:
> > On 08/05/22 20:57, David Hildenbrand wrote:
> > > On 05.08.22 20:33, Mike Kravetz wrote:
> > > > On 08/05/22 20:25, David Hildenbrand wrote:
> > > >> On 05.08.22 20:23, Mike Kravetz wrote:
> > > >>> On 08/05/22 14:14, Peter Xu wrote:
> > > >>>> On Fri, Aug 05, 2022 at 01:03:28PM +0200, David Hildenbrand wrote:
> > > >>>>> diff --git a/mm/mmap.c b/mm/mmap.c
> > > >>>>> index 61e6135c54ef..462a6b0344ac 100644
> > > >>>>> --- a/mm/mmap.c
> > > >>>>> +++ b/mm/mmap.c
> > > >>>>> @@ -1683,6 +1683,13 @@ int vma_wants_writenotify(struct vm_area_struct *vma, pgprot_t vm_page_prot)
> > > >>>>>  	if ((vm_flags & (VM_WRITE|VM_SHARED)) != ((VM_WRITE|VM_SHARED)))
> > > >>>>>  		return 0;
> > > >>>>>  
> > > >>>>> +	/*
> > > >>>>> +	 * Hugetlb does not require/support writenotify; especially, it does not
> > > >>>>> +	 * support softdirty tracking.
> > > >>>>> +	 */
> > > >>>>> +	if (is_vm_hugetlb_page(vma))
> > > >>>>> +		return 0;
> > > >>>>
> > > >>>> I'm kind of confused here..  you seems to be fixing up soft-dirty for
> > > >>>> hugetlb but here it's explicitly forbidden.
> > > >>>>
> > > >>>> Could you explain a bit more on why this patch is needed if (assume
> > > >>>> there'll be a working) patch 2 being provided?
> > > >>>>
> > > >>>
> > > >>> No comments on the patch, but ...
> > > >>>
> > > >>> Since it required little thought, I ran the test program on next-20220802 and
> > > >>> was surprised that the issue did not recreate.  Even added a simple printk
> > > >>> to make sure we were getting into vma_wants_writenotify with a hugetlb vma.
> > > >>> We were.
> > > >>
> > > >>
> > > >> ... does your config have CONFIG_MEM_SOFT_DIRTY enabled?
> > > >>
> > > > 
> > > > No, Duh!
> > > > 
> > > > FYI - Some time back, I started looking at adding soft dirty support for
> > > > hugetlb mappings.  I did not finish that work.  But, I seem to recall
> > > > places where code was operating on hugetlb mappings when perhaps it should
> > > > not.
> > > > 
> > > > Perhaps, it would also be good to just disable soft dirty for hugetlb at
> > > > the source?
> > > 
> > > I thought about that as well. But I came to the conclusion that without
> > > patch #2, hugetlb VMAs cannot possibly support write-notify, so there is
> > > no need to bother in vma_wants_writenotify() at all.
> > > 
> > > The "root" would be places where we clear VM_SOFTDIRTY. That should only
> > > be fs/proc/task_mmu.c:clear_refs_write() IIRC.
> > > 
> > > So I don't particularly care, I consider this patch a bit cleaner and
> > > more generic, but I can adjust clear_refs_write() instead of there is a
> > > preference.
> > > 
> > 
> > After a closer look, I agree that this may be the simplest/cleanest way to
> > proceed.  I was going to suggest that you note hugetlb does not support
> > softdirty, but see you did in the comment.
> > 
> > Acked-by: Mike Kravetz <mike.kravetz@oracle.com>
> 
> Filtering out hugetlbfs in vma_wants_writenotify() is still a bit hard to
> follow to me, since it's not clear why hugetlbfs never wants writenotify.
> 
> If it's only about soft-dirty, we could have added the hugetlbfs check into
> vma_soft_dirty_enabled(), then I think it'll achieve the same thing and
> much clearer - with the soft-dirty check constantly returning false for it,
> hugetlbfs shared vmas should have vma_wants_writenotify() naturally return
> 0 already.
> 
> For the long term - shouldn't we just enable soft-dirty for hugetlbfs?  I
> remember Mike used to have that in todo.  Since we've got patch 2 already,
> I feel like that's really much close (is the only missing piece the clear
> refs write part? or maybe some more that I didn't notice).
> 
> Then patch 1 (or IMHO equivalant check in vma_soft_dirty_enabled(), but
> maybe in stable trees we don't have vma_soft_dirty_enabled then it's
> exactly patch 1) can be a stable-only backport just to avoid the bug from
> triggering.

It looks like vma_soft_dirty_enabled is recent and not in any stable
trees (or even 5.19).

Yes, I did start working on hugetlb softdirty support in the past.
https://lore.kernel.org/lkml/20210211000322.159437-1-mike.kravetz@oracle.com/

Unfortunately, it got preempted by other things.  I will try to move it up
the priority list.
-- 
Mike Kravetz

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH v1 2/2] mm/hugetlb: support write-faults in shared mappings
  2022-08-05 18:20     ` David Hildenbrand
@ 2022-08-08 16:05       ` Peter Xu
  2022-08-08 16:25         ` David Hildenbrand
  0 siblings, 1 reply; 30+ messages in thread
From: Peter Xu @ 2022-08-08 16:05 UTC (permalink / raw)
  To: David Hildenbrand
  Cc: linux-kernel, linux-mm, Andrew Morton, Mike Kravetz, Muchun Song,
	Peter Feiner, Kirill A . Shutemov

On Fri, Aug 05, 2022 at 08:20:52PM +0200, David Hildenbrand wrote:
> On 05.08.22 20:12, Peter Xu wrote:
> > On Fri, Aug 05, 2022 at 01:03:29PM +0200, David Hildenbrand wrote:
> >> Let's add a safety net if we ever get (again) a write-fault on a R/O-mapped
> >> page in a shared mapping, in which case we simply have to map the
> >> page writable.
> >>
> >> VM_MAYSHARE handling in hugetlb_fault() for FAULT_FLAG_WRITE
> >> indicates that this was at least envisioned, but could never have worked
> >> as expected. This theoretically paves the way for softdirty tracking
> >> support in hugetlb.
> >>
> >> Tested without the fix for softdirty tracking.
> >>
> >> Note that there is no need to do any kind of reservation in hugetlb_fault()
> >> in this case ... because we already have a hugetlb page mapped R/O
> >> that we will simply map writable and we are not dealing with COW/unsharing.
> >>
> >> Signed-off-by: David Hildenbrand <david@redhat.com>
> >> ---
> >>  mm/hugetlb.c | 21 ++++++++++++++-------
> >>  1 file changed, 14 insertions(+), 7 deletions(-)
> >>
> >> diff --git a/mm/hugetlb.c b/mm/hugetlb.c
> >> index a18c071c294e..bbab7aa9d8f8 100644
> >> --- a/mm/hugetlb.c
> >> +++ b/mm/hugetlb.c
> >> @@ -5233,6 +5233,16 @@ static vm_fault_t hugetlb_wp(struct mm_struct *mm, struct vm_area_struct *vma,
> >>  	VM_BUG_ON(unshare && (flags & FOLL_WRITE));
> >>  	VM_BUG_ON(!unshare && !(flags & FOLL_WRITE));
> >>  
> >> +	/* Let's take out shared mappings first, this should be a rare event. */
> >> +	if (unlikely(vma->vm_flags & VM_MAYSHARE)) {
> > 
> > Should we check VM_SHARED instead?
> 
> Relying on VM_SHARED to detect MAP_PRIVATE vs. MAP_SHARED is
> unfortunately wrong.
> 
> If you're curious, take a look at f83a275dbc5c ("mm: account for
> MAP_SHARED mappings using VM_MAYSHARE and not VM_SHARED in hugetlbfs")
> and mmap() code.
> 
> Long story short: if the file is read-only, we only have VM_MAYSHARE but
> not VM_SHARED (and consequently also not VM_MAYWRITE).

To ask in another way: if file is RO but mapped RW (mmap() will have
VM_SHARED cleared but VM_MAYSHARE set), then if we check VM_MAYSHARE here
won't we grant write bit errornously while we shouldn't? As the user
doesn't really have write permission to the file.

> 
> > 
> >> +		if (unshare)
> >> +			return 0;
> > 
> > Curious when will this happen especially if we switch to VM_SHARED above.
> > Shouldn't "unshare" not happen at all on a shared region?
> 
> FAULT_FLAG_UNSHARE is documented to behave like:
> 
> "FAULT_FLAG_UNSHARE is ignored and treated like an ordinary read fault
> when no existing R/O-mapped anonymous page is encountered."
> 
> It should currently not happen. Focus on should ;)

OK. :)

Then does it also mean that it should be better to turn into
WARN_ON_ONCE()?  It's just that it looks like a valid path if without it.

> 
> > 
> >> +		if (WARN_ON_ONCE(!(vma->vm_flags & VM_WRITE)))
> >> +			return VM_FAULT_SIGSEGV;
> > 
> > I had a feeling that you just want to double check we have write
> > permission, but IIUC this should be checked far earlier or we'll have
> > problem.  No strong opinion if so, but I'd suggest dropping this one,
> > otherwise we could add tons of WARN_ON_ONCE() in anywhere in the page fault
> > stack and they mostly won't trigger at all.
> 
> Not quite. We usually (!hugetlb) have maybe_mkwrite() all over the
> place. This is just an indication that we don't have maybe semantics
> here. But as we also don't have it for hugetlb anon code below, maybe I
> can just drop it. (or check it for both call paths)

Hmm, this reminded me to wonder how hugetlb handles FOLL_FORCE|FOLL_WRITE
on RO regions.

Maybe that check is needed, but however instead of warning and sigbus, we
need to handle it?

I'll need to read more on this later, but raise this up.

> 
> > 
> >> +		set_huge_ptep_writable(vma, haddr, ptep);
> > 
> > Do we wanna set dirty bits too?
> 
> set_huge_ptep_writable() handles that.

Right, I noticed it right after I sent too..

Thanks,

-- 
Peter Xu


^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH v1 1/2] mm/hugetlb: fix hugetlb not supporting write-notify
  2022-08-05 23:33                 ` Mike Kravetz
@ 2022-08-08 16:10                   ` Peter Xu
  0 siblings, 0 replies; 30+ messages in thread
From: Peter Xu @ 2022-08-08 16:10 UTC (permalink / raw)
  To: Mike Kravetz
  Cc: David Hildenbrand, linux-kernel, linux-mm, Andrew Morton,
	Muchun Song, Peter Feiner, Kirill A . Shutemov, stable

On Fri, Aug 05, 2022 at 04:33:56PM -0700, Mike Kravetz wrote:
> It looks like vma_soft_dirty_enabled is recent and not in any stable
> trees (or even 5.19).
> 
> Yes, I did start working on hugetlb softdirty support in the past.
> https://lore.kernel.org/lkml/20210211000322.159437-1-mike.kravetz@oracle.com/
> 
> Unfortunately, it got preempted by other things.  I will try to move it up
> the priority list.

Thanks, Mike.

It'll also makes sense to forbid it if it may take time to finish, so we
don't need to push ourselves.

-- 
Peter Xu


^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH v1 2/2] mm/hugetlb: support write-faults in shared mappings
  2022-08-08 16:05       ` Peter Xu
@ 2022-08-08 16:25         ` David Hildenbrand
  2022-08-08 20:21           ` Peter Xu
  0 siblings, 1 reply; 30+ messages in thread
From: David Hildenbrand @ 2022-08-08 16:25 UTC (permalink / raw)
  To: Peter Xu
  Cc: linux-kernel, linux-mm, Andrew Morton, Mike Kravetz, Muchun Song,
	Peter Feiner, Kirill A . Shutemov

>> Relying on VM_SHARED to detect MAP_PRIVATE vs. MAP_SHARED is
>> unfortunately wrong.
>>
>> If you're curious, take a look at f83a275dbc5c ("mm: account for
>> MAP_SHARED mappings using VM_MAYSHARE and not VM_SHARED in hugetlbfs")
>> and mmap() code.
>>
>> Long story short: if the file is read-only, we only have VM_MAYSHARE but
>> not VM_SHARED (and consequently also not VM_MAYWRITE).
> 
> To ask in another way: if file is RO but mapped RW (mmap() will have
> VM_SHARED cleared but VM_MAYSHARE set), then if we check VM_MAYSHARE here
> won't we grant write bit errornously while we shouldn't? As the user
> doesn't really have write permission to the file.

Thus the VM_WRITE check. :)

I wonder if we should just do it cleanly and introduce the maybe_mkwrite
semantics here as well. Then there is no need for additional VM_WRITE
checks and hugetlb will work just like !hugetlb.

Thoughts?

> 
>>
>>>
>>>> +		if (unshare)
>>>> +			return 0;
>>>
>>> Curious when will this happen especially if we switch to VM_SHARED above.
>>> Shouldn't "unshare" not happen at all on a shared region?
>>
>> FAULT_FLAG_UNSHARE is documented to behave like:
>>
>> "FAULT_FLAG_UNSHARE is ignored and treated like an ordinary read fault
>> when no existing R/O-mapped anonymous page is encountered."
>>
>> It should currently not happen. Focus on should ;)
> 
> OK. :)
> 
> Then does it also mean that it should be better to turn into
> WARN_ON_ONCE()?  It's just that it looks like a valid path if without it.

Well, it should work (and we handle the !hugetlb path) like that as
well. So I'd want to avoid WARN_ON_ONCE() at least for that check.


> 
>>
>>>
>>>> +		if (WARN_ON_ONCE(!(vma->vm_flags & VM_WRITE)))
>>>> +			return VM_FAULT_SIGSEGV;
>>>
>>> I had a feeling that you just want to double check we have write
>>> permission, but IIUC this should be checked far earlier or we'll have
>>> problem.  No strong opinion if so, but I'd suggest dropping this one,
>>> otherwise we could add tons of WARN_ON_ONCE() in anywhere in the page fault
>>> stack and they mostly won't trigger at all.
>>
>> Not quite. We usually (!hugetlb) have maybe_mkwrite() all over the
>> place. This is just an indication that we don't have maybe semantics
>> here. But as we also don't have it for hugetlb anon code below, maybe I
>> can just drop it. (or check it for both call paths)
> 
> Hmm, this reminded me to wonder how hugetlb handles FOLL_FORCE|FOLL_WRITE
> on RO regions.
> 
> Maybe that check is needed, but however instead of warning and sigbus, we
> need to handle it?

We don't support FOLL_FORCE|FOLL_WRITE for hugetlb, but if we would,
we'd need the maybe_mkwrite semantics.

Fortunately I detest private hugetlb mappings / anon hugetlb pages and
couldn't care less about debug access until it's actually a problem for
someone :)

-- 
Thanks,

David / dhildenb


^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH v1 1/2] mm/hugetlb: fix hugetlb not supporting write-notify
  2022-08-05 23:13               ` Peter Xu
  2022-08-05 23:33                 ` Mike Kravetz
@ 2022-08-08 16:36                 ` David Hildenbrand
  2022-08-08 19:28                   ` Peter Xu
  1 sibling, 1 reply; 30+ messages in thread
From: David Hildenbrand @ 2022-08-08 16:36 UTC (permalink / raw)
  To: Peter Xu, Mike Kravetz
  Cc: linux-kernel, linux-mm, Andrew Morton, Muchun Song, Peter Feiner,
	Kirill A . Shutemov, stable

On 06.08.22 01:13, Peter Xu wrote:
> On Fri, Aug 05, 2022 at 01:48:35PM -0700, Mike Kravetz wrote:
>> On 08/05/22 20:57, David Hildenbrand wrote:
>>> On 05.08.22 20:33, Mike Kravetz wrote:
>>>> On 08/05/22 20:25, David Hildenbrand wrote:
>>>>> On 05.08.22 20:23, Mike Kravetz wrote:
>>>>>> On 08/05/22 14:14, Peter Xu wrote:
>>>>>>> On Fri, Aug 05, 2022 at 01:03:28PM +0200, David Hildenbrand wrote:
>>>>>>>> diff --git a/mm/mmap.c b/mm/mmap.c
>>>>>>>> index 61e6135c54ef..462a6b0344ac 100644
>>>>>>>> --- a/mm/mmap.c
>>>>>>>> +++ b/mm/mmap.c
>>>>>>>> @@ -1683,6 +1683,13 @@ int vma_wants_writenotify(struct vm_area_struct *vma, pgprot_t vm_page_prot)
>>>>>>>>  	if ((vm_flags & (VM_WRITE|VM_SHARED)) != ((VM_WRITE|VM_SHARED)))
>>>>>>>>  		return 0;
>>>>>>>>  
>>>>>>>> +	/*
>>>>>>>> +	 * Hugetlb does not require/support writenotify; especially, it does not
>>>>>>>> +	 * support softdirty tracking.
>>>>>>>> +	 */
>>>>>>>> +	if (is_vm_hugetlb_page(vma))
>>>>>>>> +		return 0;
>>>>>>>
>>>>>>> I'm kind of confused here..  you seems to be fixing up soft-dirty for
>>>>>>> hugetlb but here it's explicitly forbidden.
>>>>>>>
>>>>>>> Could you explain a bit more on why this patch is needed if (assume
>>>>>>> there'll be a working) patch 2 being provided?
>>>>>>>
>>>>>>
>>>>>> No comments on the patch, but ...
>>>>>>
>>>>>> Since it required little thought, I ran the test program on next-20220802 and
>>>>>> was surprised that the issue did not recreate.  Even added a simple printk
>>>>>> to make sure we were getting into vma_wants_writenotify with a hugetlb vma.
>>>>>> We were.
>>>>>
>>>>>
>>>>> ... does your config have CONFIG_MEM_SOFT_DIRTY enabled?
>>>>>
>>>>
>>>> No, Duh!
>>>>
>>>> FYI - Some time back, I started looking at adding soft dirty support for
>>>> hugetlb mappings.  I did not finish that work.  But, I seem to recall
>>>> places where code was operating on hugetlb mappings when perhaps it should
>>>> not.
>>>>
>>>> Perhaps, it would also be good to just disable soft dirty for hugetlb at
>>>> the source?
>>>
>>> I thought about that as well. But I came to the conclusion that without
>>> patch #2, hugetlb VMAs cannot possibly support write-notify, so there is
>>> no need to bother in vma_wants_writenotify() at all.
>>>
>>> The "root" would be places where we clear VM_SOFTDIRTY. That should only
>>> be fs/proc/task_mmu.c:clear_refs_write() IIRC.
>>>
>>> So I don't particularly care, I consider this patch a bit cleaner and
>>> more generic, but I can adjust clear_refs_write() instead of there is a
>>> preference.
>>>
>>
>> After a closer look, I agree that this may be the simplest/cleanest way to
>> proceed.  I was going to suggest that you note hugetlb does not support
>> softdirty, but see you did in the comment.
>>
>> Acked-by: Mike Kravetz <mike.kravetz@oracle.com>
> 
> Filtering out hugetlbfs in vma_wants_writenotify() is still a bit hard to
> follow to me, since it's not clear why hugetlbfs never wants writenotify.

Well, because the write-fault handler as is cannot deal with
write-faults in shared mappings. It cannot possibly work or ever have
worked.

> 
> If it's only about soft-dirty, we could have added the hugetlbfs check into
> vma_soft_dirty_enabled(), then I think it'll achieve the same thing and
> much clearer - with the soft-dirty check constantly returning false for it,
> hugetlbfs shared vmas should have vma_wants_writenotify() naturally return
> 0 already.

I considered that an option, but went with this approach for simplicity
and because I don't see a need to check for hugetlb in other code paths
(especially, in the !hugetlb mprotect variant).

I mean, with patch #2 we can remove it anytime we do support softdirty
tracking -- or if there is need for write-notify we can move it into the
softdirty check only.

> 
> For the long term - shouldn't we just enable soft-dirty for hugetlbfs?  I
> remember Mike used to have that in todo.  Since we've got patch 2 already,
> I feel like that's really much close (is the only missing piece the clear
> refs write part? or maybe some more that I didn't notice).

My gut feeling is that there isn't too much missing to have it working.
Define a PTE bit, implement hugetlb variant of clearing, and set it when
setting the PTE dirty.

Thanks!

-- 
Thanks,

David / dhildenb


^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH v1 1/2] mm/hugetlb: fix hugetlb not supporting write-notify
  2022-08-08 16:36                 ` David Hildenbrand
@ 2022-08-08 19:28                   ` Peter Xu
  2022-08-10  9:29                     ` David Hildenbrand
  0 siblings, 1 reply; 30+ messages in thread
From: Peter Xu @ 2022-08-08 19:28 UTC (permalink / raw)
  To: David Hildenbrand
  Cc: Mike Kravetz, linux-kernel, linux-mm, Andrew Morton, Muchun Song,
	Peter Feiner, Kirill A . Shutemov, stable

On Mon, Aug 08, 2022 at 06:36:58PM +0200, David Hildenbrand wrote:
> Well, because the write-fault handler as is cannot deal with
> write-faults in shared mappings. It cannot possibly work or ever have
> worked.

Trivially - maybe drop the word "require" in "Hugetlb does not
require/support writenotify"?

-- 
Peter Xu


^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH v1 2/2] mm/hugetlb: support write-faults in shared mappings
  2022-08-08 16:25         ` David Hildenbrand
@ 2022-08-08 20:21           ` Peter Xu
  2022-08-08 22:08             ` Peter Xu
  0 siblings, 1 reply; 30+ messages in thread
From: Peter Xu @ 2022-08-08 20:21 UTC (permalink / raw)
  To: David Hildenbrand
  Cc: linux-kernel, linux-mm, Andrew Morton, Mike Kravetz, Muchun Song,
	Peter Feiner, Kirill A . Shutemov

On Mon, Aug 08, 2022 at 06:25:21PM +0200, David Hildenbrand wrote:
> >> Relying on VM_SHARED to detect MAP_PRIVATE vs. MAP_SHARED is
> >> unfortunately wrong.
> >>
> >> If you're curious, take a look at f83a275dbc5c ("mm: account for
> >> MAP_SHARED mappings using VM_MAYSHARE and not VM_SHARED in hugetlbfs")
> >> and mmap() code.
> >>
> >> Long story short: if the file is read-only, we only have VM_MAYSHARE but
> >> not VM_SHARED (and consequently also not VM_MAYWRITE).
> > 
> > To ask in another way: if file is RO but mapped RW (mmap() will have
> > VM_SHARED cleared but VM_MAYSHARE set), then if we check VM_MAYSHARE here
> > won't we grant write bit errornously while we shouldn't? As the user
> > doesn't really have write permission to the file.
> 
> Thus the VM_WRITE check. :)
> 
> I wonder if we should just do it cleanly and introduce the maybe_mkwrite
> semantics here as well. Then there is no need for additional VM_WRITE
> checks and hugetlb will work just like !hugetlb.

Hmm yeah I think the VM_MAYSHARE check is correct, since we'll need to fail
the cases where MAYSHARE && !SHARE - we used to silently let it pass.

But then OTOH using WARN_ON_ONCE on the VM_WRITE check is probably not
right, because iiuc it can be triggered easily by the userspace. E.g. as
simple as mapping hugetlb as RO+shared then write to it?

So maybe_mkwrite() seems not an option now - IIUC we really need that
!VM_WRITE check to fail properly, but just without the warning to pollute
dmesg?

Thanks,

-- 
Peter Xu


^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH v1 2/2] mm/hugetlb: support write-faults in shared mappings
  2022-08-08 20:21           ` Peter Xu
@ 2022-08-08 22:08             ` Peter Xu
  2022-08-10  9:37               ` David Hildenbrand
  0 siblings, 1 reply; 30+ messages in thread
From: Peter Xu @ 2022-08-08 22:08 UTC (permalink / raw)
  To: David Hildenbrand
  Cc: linux-kernel, linux-mm, Andrew Morton, Mike Kravetz, Muchun Song,
	Peter Feiner, Kirill A . Shutemov

On Mon, Aug 08, 2022 at 04:21:39PM -0400, Peter Xu wrote:
> On Mon, Aug 08, 2022 at 06:25:21PM +0200, David Hildenbrand wrote:
> > >> Relying on VM_SHARED to detect MAP_PRIVATE vs. MAP_SHARED is
> > >> unfortunately wrong.
> > >>
> > >> If you're curious, take a look at f83a275dbc5c ("mm: account for
> > >> MAP_SHARED mappings using VM_MAYSHARE and not VM_SHARED in hugetlbfs")
> > >> and mmap() code.
> > >>
> > >> Long story short: if the file is read-only, we only have VM_MAYSHARE but
> > >> not VM_SHARED (and consequently also not VM_MAYWRITE).
> > > 
> > > To ask in another way: if file is RO but mapped RW (mmap() will have
> > > VM_SHARED cleared but VM_MAYSHARE set), then if we check VM_MAYSHARE here
> > > won't we grant write bit errornously while we shouldn't? As the user
> > > doesn't really have write permission to the file.
> > 
> > Thus the VM_WRITE check. :)
> > 
> > I wonder if we should just do it cleanly and introduce the maybe_mkwrite
> > semantics here as well. Then there is no need for additional VM_WRITE
> > checks and hugetlb will work just like !hugetlb.
> 
> Hmm yeah I think the VM_MAYSHARE check is correct, since we'll need to fail
> the cases where MAYSHARE && !SHARE - we used to silently let it pass.

Sorry I think this is a wrong statement I made..  IIUC we'll fail correctly
with/without the patch on any write to hugetlb RO regions.

Then I just don't see a difference on checking VM_SHARED or VM_MAYSHARE
here, it's just that VM_MAYSHARE check should work too like VM_SHARED so I
don't see a problem.

It also means I can't think of any valid case of having VM_WRITE when
reaching here, then the WARN_ON_ONCE() is okay but maybe also redundant.
Using maybe_mkwrite() seems misleading to me if FOLL_FORCE not ready for
hugetlbfs after all.

-- 
Peter Xu


^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH v1 1/2] mm/hugetlb: fix hugetlb not supporting write-notify
  2022-08-08 19:28                   ` Peter Xu
@ 2022-08-10  9:29                     ` David Hildenbrand
  0 siblings, 0 replies; 30+ messages in thread
From: David Hildenbrand @ 2022-08-10  9:29 UTC (permalink / raw)
  To: Peter Xu
  Cc: Mike Kravetz, linux-kernel, linux-mm, Andrew Morton, Muchun Song,
	Peter Feiner, Kirill A . Shutemov, stable

On 08.08.22 21:28, Peter Xu wrote:
> On Mon, Aug 08, 2022 at 06:36:58PM +0200, David Hildenbrand wrote:
>> Well, because the write-fault handler as is cannot deal with
>> write-faults in shared mappings. It cannot possibly work or ever have
>> worked.
> 
> Trivially - maybe drop the word "require" in "Hugetlb does not
> require/support writenotify"?
> 

Sure, can do.

-- 
Thanks,

David / dhildenb


^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH v1 2/2] mm/hugetlb: support write-faults in shared mappings
  2022-08-08 22:08             ` Peter Xu
@ 2022-08-10  9:37               ` David Hildenbrand
  2022-08-10  9:45                 ` David Hildenbrand
  2022-08-10 19:29                 ` Peter Xu
  0 siblings, 2 replies; 30+ messages in thread
From: David Hildenbrand @ 2022-08-10  9:37 UTC (permalink / raw)
  To: Peter Xu
  Cc: linux-kernel, linux-mm, Andrew Morton, Mike Kravetz, Muchun Song,
	Peter Feiner, Kirill A . Shutemov

On 09.08.22 00:08, Peter Xu wrote:
> On Mon, Aug 08, 2022 at 04:21:39PM -0400, Peter Xu wrote:
>> On Mon, Aug 08, 2022 at 06:25:21PM +0200, David Hildenbrand wrote:
>>>>> Relying on VM_SHARED to detect MAP_PRIVATE vs. MAP_SHARED is
>>>>> unfortunately wrong.
>>>>>
>>>>> If you're curious, take a look at f83a275dbc5c ("mm: account for
>>>>> MAP_SHARED mappings using VM_MAYSHARE and not VM_SHARED in hugetlbfs")
>>>>> and mmap() code.
>>>>>
>>>>> Long story short: if the file is read-only, we only have VM_MAYSHARE but
>>>>> not VM_SHARED (and consequently also not VM_MAYWRITE).
>>>>
>>>> To ask in another way: if file is RO but mapped RW (mmap() will have
>>>> VM_SHARED cleared but VM_MAYSHARE set), then if we check VM_MAYSHARE here
>>>> won't we grant write bit errornously while we shouldn't? As the user
>>>> doesn't really have write permission to the file.
>>>
>>> Thus the VM_WRITE check. :)
>>>
>>> I wonder if we should just do it cleanly and introduce the maybe_mkwrite
>>> semantics here as well. Then there is no need for additional VM_WRITE
>>> checks and hugetlb will work just like !hugetlb.
>>
>> Hmm yeah I think the VM_MAYSHARE check is correct, since we'll need to fail
>> the cases where MAYSHARE && !SHARE - we used to silently let it pass.
> 
> Sorry I think this is a wrong statement I made..  IIUC we'll fail correctly
> with/without the patch on any write to hugetlb RO regions.
> 
> Then I just don't see a difference on checking VM_SHARED or VM_MAYSHARE
> here, it's just that VM_MAYSHARE check should work too like VM_SHARED so I
> don't see a problem.
> 
> It also means I can't think of any valid case of having VM_WRITE when
> reaching here, then the WARN_ON_ONCE() is okay but maybe also redundant.
> Using maybe_mkwrite() seems misleading to me if FOLL_FORCE not ready for
> hugetlbfs after all.
> 

The main reason we'd have it would be to scream out lout and fail
gracefully if someone would -- for example -- use it for something like
FOLL_FORCE. I mean triggering a write fault without VM_WRITE on !hugetlb
works, so it would be easy to assume that it similarly simply works for
hugetlb as well. And the code most probably wouldn't even blow up
immediately :)

-- 
Thanks,

David / dhildenb


^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH v1 2/2] mm/hugetlb: support write-faults in shared mappings
  2022-08-10  9:37               ` David Hildenbrand
@ 2022-08-10  9:45                 ` David Hildenbrand
  2022-08-10 19:29                 ` Peter Xu
  1 sibling, 0 replies; 30+ messages in thread
From: David Hildenbrand @ 2022-08-10  9:45 UTC (permalink / raw)
  To: Peter Xu
  Cc: linux-kernel, linux-mm, Andrew Morton, Mike Kravetz, Muchun Song,
	Peter Feiner, Kirill A . Shutemov

On 10.08.22 11:37, David Hildenbrand wrote:
> On 09.08.22 00:08, Peter Xu wrote:
>> On Mon, Aug 08, 2022 at 04:21:39PM -0400, Peter Xu wrote:
>>> On Mon, Aug 08, 2022 at 06:25:21PM +0200, David Hildenbrand wrote:
>>>>>> Relying on VM_SHARED to detect MAP_PRIVATE vs. MAP_SHARED is
>>>>>> unfortunately wrong.
>>>>>>
>>>>>> If you're curious, take a look at f83a275dbc5c ("mm: account for
>>>>>> MAP_SHARED mappings using VM_MAYSHARE and not VM_SHARED in hugetlbfs")
>>>>>> and mmap() code.
>>>>>>
>>>>>> Long story short: if the file is read-only, we only have VM_MAYSHARE but
>>>>>> not VM_SHARED (and consequently also not VM_MAYWRITE).
>>>>>
>>>>> To ask in another way: if file is RO but mapped RW (mmap() will have
>>>>> VM_SHARED cleared but VM_MAYSHARE set), then if we check VM_MAYSHARE here
>>>>> won't we grant write bit errornously while we shouldn't? As the user
>>>>> doesn't really have write permission to the file.
>>>>
>>>> Thus the VM_WRITE check. :)
>>>>
>>>> I wonder if we should just do it cleanly and introduce the maybe_mkwrite
>>>> semantics here as well. Then there is no need for additional VM_WRITE
>>>> checks and hugetlb will work just like !hugetlb.
>>>
>>> Hmm yeah I think the VM_MAYSHARE check is correct, since we'll need to fail
>>> the cases where MAYSHARE && !SHARE - we used to silently let it pass.
>>
>> Sorry I think this is a wrong statement I made..  IIUC we'll fail correctly
>> with/without the patch on any write to hugetlb RO regions.
>>
>> Then I just don't see a difference on checking VM_SHARED or VM_MAYSHARE
>> here, it's just that VM_MAYSHARE check should work too like VM_SHARED so I
>> don't see a problem.
>>
>> It also means I can't think of any valid case of having VM_WRITE when
>> reaching here, then the WARN_ON_ONCE() is okay but maybe also redundant.
>> Using maybe_mkwrite() seems misleading to me if FOLL_FORCE not ready for
>> hugetlbfs after all.
>>
> 
> The main reason we'd have it would be to scream out lout and fail
> gracefully if someone would -- for example -- use it for something like
> FOLL_FORCE. I mean triggering a write fault without VM_WRITE on !hugetlb
> works, so it would be easy to assume that it similarly simply works for
> hugetlb as well. And the code most probably wouldn't even blow up
> immediately :)
> 

I propose the following change to hugetlb_wp():

diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index a18c071c294e..b92d30d3b33b 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -5233,6 +5233,21 @@ static vm_fault_t hugetlb_wp(struct mm_struct *mm, struct vm_area_struct *vma,
        VM_BUG_ON(unshare && (flags & FOLL_WRITE));
        VM_BUG_ON(!unshare && !(flags & FOLL_WRITE));
 
+       /*
+        * hugetlb does not support FOLL_FORCE-style write faults that keep the
+        * PTE mapped R/O such as maybe_mkwrite() would do.
+        */
+       if (WARN_ON_ONCE(!unshare && !(vma->vm_flags & VM_WRITE)))
+               return VM_FAULT_SIGSEGV;
+
+       /* Let's take out shared mappings first, this should be a rare event. */
+       if (unlikely(vma->vm_flags & VM_MAYSHARE)) {
+               if (unlikely(unshare))
+                       return 0;
+               set_huge_ptep_writable(vma, haddr, ptep);
+               return 0;
+       }
+



-- 
Thanks,

David / dhildenb


^ permalink raw reply related	[flat|nested] 30+ messages in thread

* Re: [PATCH v1 2/2] mm/hugetlb: support write-faults in shared mappings
  2022-08-10  9:37               ` David Hildenbrand
  2022-08-10  9:45                 ` David Hildenbrand
@ 2022-08-10 19:29                 ` Peter Xu
  2022-08-10 19:40                   ` David Hildenbrand
  1 sibling, 1 reply; 30+ messages in thread
From: Peter Xu @ 2022-08-10 19:29 UTC (permalink / raw)
  To: David Hildenbrand
  Cc: linux-kernel, linux-mm, Andrew Morton, Mike Kravetz, Muchun Song,
	Peter Feiner, Kirill A . Shutemov

On Wed, Aug 10, 2022 at 11:37:13AM +0200, David Hildenbrand wrote:
> On 09.08.22 00:08, Peter Xu wrote:
> > On Mon, Aug 08, 2022 at 04:21:39PM -0400, Peter Xu wrote:
> >> On Mon, Aug 08, 2022 at 06:25:21PM +0200, David Hildenbrand wrote:
> >>>>> Relying on VM_SHARED to detect MAP_PRIVATE vs. MAP_SHARED is
> >>>>> unfortunately wrong.
> >>>>>
> >>>>> If you're curious, take a look at f83a275dbc5c ("mm: account for
> >>>>> MAP_SHARED mappings using VM_MAYSHARE and not VM_SHARED in hugetlbfs")
> >>>>> and mmap() code.
> >>>>>
> >>>>> Long story short: if the file is read-only, we only have VM_MAYSHARE but
> >>>>> not VM_SHARED (and consequently also not VM_MAYWRITE).
> >>>>
> >>>> To ask in another way: if file is RO but mapped RW (mmap() will have
> >>>> VM_SHARED cleared but VM_MAYSHARE set), then if we check VM_MAYSHARE here
> >>>> won't we grant write bit errornously while we shouldn't? As the user
> >>>> doesn't really have write permission to the file.
> >>>
> >>> Thus the VM_WRITE check. :)
> >>>
> >>> I wonder if we should just do it cleanly and introduce the maybe_mkwrite
> >>> semantics here as well. Then there is no need for additional VM_WRITE
> >>> checks and hugetlb will work just like !hugetlb.
> >>
> >> Hmm yeah I think the VM_MAYSHARE check is correct, since we'll need to fail
> >> the cases where MAYSHARE && !SHARE - we used to silently let it pass.
> > 
> > Sorry I think this is a wrong statement I made..  IIUC we'll fail correctly
> > with/without the patch on any write to hugetlb RO regions.
> > 
> > Then I just don't see a difference on checking VM_SHARED or VM_MAYSHARE
> > here, it's just that VM_MAYSHARE check should work too like VM_SHARED so I
> > don't see a problem.
> > 
> > It also means I can't think of any valid case of having VM_WRITE when
> > reaching here, then the WARN_ON_ONCE() is okay but maybe also redundant.
> > Using maybe_mkwrite() seems misleading to me if FOLL_FORCE not ready for
> > hugetlbfs after all.
> > 
> 
> The main reason we'd have it would be to scream out lout and fail
> gracefully if someone would -- for example -- use it for something like
> FOLL_FORCE.

Having that WARN_ON_ONCE() is okay to me, but just to double check we're on
the same page: why there's concern on using FOLL_FORCE? IIUC we're talking
about shared mappings here, then no FOLL_FORCE possible at all?  IOW,
"!is_cow_mapping()" should fail in check_vma_flags() already.

The other thing is I'm wondering whether patch 2 should be postponed anyway
so that we can wait for a full resolution of the problem from Mike.

Thanks,

-- 
Peter Xu


^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH v1 2/2] mm/hugetlb: support write-faults in shared mappings
  2022-08-10 19:29                 ` Peter Xu
@ 2022-08-10 19:40                   ` David Hildenbrand
  2022-08-10 19:52                     ` Peter Xu
  0 siblings, 1 reply; 30+ messages in thread
From: David Hildenbrand @ 2022-08-10 19:40 UTC (permalink / raw)
  To: Peter Xu
  Cc: linux-kernel, linux-mm, Andrew Morton, Mike Kravetz, Muchun Song,
	Peter Feiner, Kirill A . Shutemov

On 10.08.22 21:29, Peter Xu wrote:
> On Wed, Aug 10, 2022 at 11:37:13AM +0200, David Hildenbrand wrote:
>> On 09.08.22 00:08, Peter Xu wrote:
>>> On Mon, Aug 08, 2022 at 04:21:39PM -0400, Peter Xu wrote:
>>>> On Mon, Aug 08, 2022 at 06:25:21PM +0200, David Hildenbrand wrote:
>>>>>>> Relying on VM_SHARED to detect MAP_PRIVATE vs. MAP_SHARED is
>>>>>>> unfortunately wrong.
>>>>>>>
>>>>>>> If you're curious, take a look at f83a275dbc5c ("mm: account for
>>>>>>> MAP_SHARED mappings using VM_MAYSHARE and not VM_SHARED in hugetlbfs")
>>>>>>> and mmap() code.
>>>>>>>
>>>>>>> Long story short: if the file is read-only, we only have VM_MAYSHARE but
>>>>>>> not VM_SHARED (and consequently also not VM_MAYWRITE).
>>>>>>
>>>>>> To ask in another way: if file is RO but mapped RW (mmap() will have
>>>>>> VM_SHARED cleared but VM_MAYSHARE set), then if we check VM_MAYSHARE here
>>>>>> won't we grant write bit errornously while we shouldn't? As the user
>>>>>> doesn't really have write permission to the file.
>>>>>
>>>>> Thus the VM_WRITE check. :)
>>>>>
>>>>> I wonder if we should just do it cleanly and introduce the maybe_mkwrite
>>>>> semantics here as well. Then there is no need for additional VM_WRITE
>>>>> checks and hugetlb will work just like !hugetlb.
>>>>
>>>> Hmm yeah I think the VM_MAYSHARE check is correct, since we'll need to fail
>>>> the cases where MAYSHARE && !SHARE - we used to silently let it pass.
>>>
>>> Sorry I think this is a wrong statement I made..  IIUC we'll fail correctly
>>> with/without the patch on any write to hugetlb RO regions.
>>>
>>> Then I just don't see a difference on checking VM_SHARED or VM_MAYSHARE
>>> here, it's just that VM_MAYSHARE check should work too like VM_SHARED so I
>>> don't see a problem.
>>>
>>> It also means I can't think of any valid case of having VM_WRITE when
>>> reaching here, then the WARN_ON_ONCE() is okay but maybe also redundant.
>>> Using maybe_mkwrite() seems misleading to me if FOLL_FORCE not ready for
>>> hugetlbfs after all.
>>>
>>
>> The main reason we'd have it would be to scream out lout and fail
>> gracefully if someone would -- for example -- use it for something like
>> FOLL_FORCE.
> 
> Having that WARN_ON_ONCE() is okay to me, but just to double check we're on
> the same page: why there's concern on using FOLL_FORCE? IIUC we're talking
> about shared mappings here, then no FOLL_FORCE possible at all?  IOW,
> "!is_cow_mapping()" should fail in check_vma_flags() already.

This code path also covers the anon case.
> 
> The other thing is I'm wondering whether patch 2 should be postponed anyway
> so that we can wait for a full resolution of the problem from Mike.

To make the code more robust and avoid any other such surprises I prefer
to have this in rather earlier than later.

As the commit says "Let's add a safety net ..."

-- 
Thanks,

David / dhildenb


^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH v1 2/2] mm/hugetlb: support write-faults in shared mappings
  2022-08-10 19:40                   ` David Hildenbrand
@ 2022-08-10 19:52                     ` Peter Xu
  2022-08-10 23:55                       ` Mike Kravetz
  0 siblings, 1 reply; 30+ messages in thread
From: Peter Xu @ 2022-08-10 19:52 UTC (permalink / raw)
  To: David Hildenbrand
  Cc: linux-kernel, linux-mm, Andrew Morton, Mike Kravetz, Muchun Song,
	Peter Feiner, Kirill A . Shutemov

On Wed, Aug 10, 2022 at 09:40:11PM +0200, David Hildenbrand wrote:
> On 10.08.22 21:29, Peter Xu wrote:
> > On Wed, Aug 10, 2022 at 11:37:13AM +0200, David Hildenbrand wrote:
> >> On 09.08.22 00:08, Peter Xu wrote:
> >>> On Mon, Aug 08, 2022 at 04:21:39PM -0400, Peter Xu wrote:
> >>>> On Mon, Aug 08, 2022 at 06:25:21PM +0200, David Hildenbrand wrote:
> >>>>>>> Relying on VM_SHARED to detect MAP_PRIVATE vs. MAP_SHARED is
> >>>>>>> unfortunately wrong.
> >>>>>>>
> >>>>>>> If you're curious, take a look at f83a275dbc5c ("mm: account for
> >>>>>>> MAP_SHARED mappings using VM_MAYSHARE and not VM_SHARED in hugetlbfs")
> >>>>>>> and mmap() code.
> >>>>>>>
> >>>>>>> Long story short: if the file is read-only, we only have VM_MAYSHARE but
> >>>>>>> not VM_SHARED (and consequently also not VM_MAYWRITE).
> >>>>>>
> >>>>>> To ask in another way: if file is RO but mapped RW (mmap() will have
> >>>>>> VM_SHARED cleared but VM_MAYSHARE set), then if we check VM_MAYSHARE here
> >>>>>> won't we grant write bit errornously while we shouldn't? As the user
> >>>>>> doesn't really have write permission to the file.
> >>>>>
> >>>>> Thus the VM_WRITE check. :)
> >>>>>
> >>>>> I wonder if we should just do it cleanly and introduce the maybe_mkwrite
> >>>>> semantics here as well. Then there is no need for additional VM_WRITE
> >>>>> checks and hugetlb will work just like !hugetlb.
> >>>>
> >>>> Hmm yeah I think the VM_MAYSHARE check is correct, since we'll need to fail
> >>>> the cases where MAYSHARE && !SHARE - we used to silently let it pass.
> >>>
> >>> Sorry I think this is a wrong statement I made..  IIUC we'll fail correctly
> >>> with/without the patch on any write to hugetlb RO regions.
> >>>
> >>> Then I just don't see a difference on checking VM_SHARED or VM_MAYSHARE
> >>> here, it's just that VM_MAYSHARE check should work too like VM_SHARED so I
> >>> don't see a problem.
> >>>
> >>> It also means I can't think of any valid case of having VM_WRITE when
> >>> reaching here, then the WARN_ON_ONCE() is okay but maybe also redundant.
> >>> Using maybe_mkwrite() seems misleading to me if FOLL_FORCE not ready for
> >>> hugetlbfs after all.
> >>>
> >>
> >> The main reason we'd have it would be to scream out lout and fail
> >> gracefully if someone would -- for example -- use it for something like
> >> FOLL_FORCE.
> > 
> > Having that WARN_ON_ONCE() is okay to me, but just to double check we're on
> > the same page: why there's concern on using FOLL_FORCE? IIUC we're talking
> > about shared mappings here, then no FOLL_FORCE possible at all?  IOW,
> > "!is_cow_mapping()" should fail in check_vma_flags() already.
> 
> This code path also covers the anon case.

But this specific warning is under the VM_MAYSHARE if clause just added in
this patch?

My understanding is any FOLL_FORCE will always constantly fail before this
patch, and it should keep failing as usual and I don't see any case it'll
be failing at the warn_on_once here.

So again, I'm fine with having the warning, but I just want to make sure
what you want to capture is what you expected..

> > 
> > The other thing is I'm wondering whether patch 2 should be postponed anyway
> > so that we can wait for a full resolution of the problem from Mike.
> 
> To make the code more robust and avoid any other such surprises I prefer
> to have this in rather earlier than later.
> 
> As the commit says "Let's add a safety net ..."

Sure, no strong opinion.  I'll leave that to Mike.  Thanks,

-- 
Peter Xu


^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH v1 2/2] mm/hugetlb: support write-faults in shared mappings
  2022-08-10 19:52                     ` Peter Xu
@ 2022-08-10 23:55                       ` Mike Kravetz
  2022-08-11  8:48                         ` David Hildenbrand
  0 siblings, 1 reply; 30+ messages in thread
From: Mike Kravetz @ 2022-08-10 23:55 UTC (permalink / raw)
  To: Peter Xu
  Cc: David Hildenbrand, linux-kernel, linux-mm, Andrew Morton,
	Muchun Song, Peter Feiner, Kirill A . Shutemov

On 08/10/22 15:52, Peter Xu wrote:
> On Wed, Aug 10, 2022 at 09:40:11PM +0200, David Hildenbrand wrote:
> > On 10.08.22 21:29, Peter Xu wrote:
> > > On Wed, Aug 10, 2022 at 11:37:13AM +0200, David Hildenbrand wrote:
> > >> On 09.08.22 00:08, Peter Xu wrote:
> > >>> On Mon, Aug 08, 2022 at 04:21:39PM -0400, Peter Xu wrote:
> > >>>> On Mon, Aug 08, 2022 at 06:25:21PM +0200, David Hildenbrand wrote:
> > >>>>>>> Relying on VM_SHARED to detect MAP_PRIVATE vs. MAP_SHARED is
> > >>>>>>> unfortunately wrong.
> > >>>>>>>
> > >>>>>>> If you're curious, take a look at f83a275dbc5c ("mm: account for
> > >>>>>>> MAP_SHARED mappings using VM_MAYSHARE and not VM_SHARED in hugetlbfs")
> > >>>>>>> and mmap() code.
> > >>>>>>>
> > >>>>>>> Long story short: if the file is read-only, we only have VM_MAYSHARE but
> > >>>>>>> not VM_SHARED (and consequently also not VM_MAYWRITE).
> > >>>>>>
> > >>>>>> To ask in another way: if file is RO but mapped RW (mmap() will have
> > >>>>>> VM_SHARED cleared but VM_MAYSHARE set), then if we check VM_MAYSHARE here
> > >>>>>> won't we grant write bit errornously while we shouldn't? As the user
> > >>>>>> doesn't really have write permission to the file.
> > >>>>>
> > >>>>> Thus the VM_WRITE check. :)
> > >>>>>
> > >>>>> I wonder if we should just do it cleanly and introduce the maybe_mkwrite
> > >>>>> semantics here as well. Then there is no need for additional VM_WRITE
> > >>>>> checks and hugetlb will work just like !hugetlb.
> > >>>>
> > >>>> Hmm yeah I think the VM_MAYSHARE check is correct, since we'll need to fail
> > >>>> the cases where MAYSHARE && !SHARE - we used to silently let it pass.
> > >>>
> > >>> Sorry I think this is a wrong statement I made..  IIUC we'll fail correctly
> > >>> with/without the patch on any write to hugetlb RO regions.
> > >>>
> > >>> Then I just don't see a difference on checking VM_SHARED or VM_MAYSHARE
> > >>> here, it's just that VM_MAYSHARE check should work too like VM_SHARED so I
> > >>> don't see a problem.
> > >>>
> > >>> It also means I can't think of any valid case of having VM_WRITE when
> > >>> reaching here, then the WARN_ON_ONCE() is okay but maybe also redundant.
> > >>> Using maybe_mkwrite() seems misleading to me if FOLL_FORCE not ready for
> > >>> hugetlbfs after all.
> > >>>
> > >>
> > >> The main reason we'd have it would be to scream out lout and fail
> > >> gracefully if someone would -- for example -- use it for something like
> > >> FOLL_FORCE.
> > > 
> > > Having that WARN_ON_ONCE() is okay to me, but just to double check we're on
> > > the same page: why there's concern on using FOLL_FORCE? IIUC we're talking
> > > about shared mappings here, then no FOLL_FORCE possible at all?  IOW,
> > > "!is_cow_mapping()" should fail in check_vma_flags() already.
> > 
> > This code path also covers the anon case.
> 
> But this specific warning is under the VM_MAYSHARE if clause just added in
> this patch?
> 
> My understanding is any FOLL_FORCE will always constantly fail before this
> patch, and it should keep failing as usual and I don't see any case it'll
> be failing at the warn_on_once here.
> 
> So again, I'm fine with having the warning, but I just want to make sure
> what you want to capture is what you expected..
> 
> > > 
> > > The other thing is I'm wondering whether patch 2 should be postponed anyway
> > > so that we can wait for a full resolution of the problem from Mike.
> > 
> > To make the code more robust and avoid any other such surprises I prefer
> > to have this in rather earlier than later.
> > 
> > As the commit says "Let's add a safety net ..."
> 
> Sure, no strong opinion.  I'll leave that to Mike.  Thanks,
> 

Sorry that I am not contributing to this thread more.  Need to spend
some time educating myself on the relatively new semantics here.

As mentioned, softdirty is on my todo list but has been there for over a
year.  So, better to add a safety net until that code moves forward.

However, just for clarification.  The only way we KNOW of to encounter
these situations today via softdirty.  Patch 1 takes care of that.  This
patch catches any unknown ways we may get here.  Correct?  i.e. We don't
really expect to exercise these code paths.
-- 
Mike Kravetz

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH v1 2/2] mm/hugetlb: support write-faults in shared mappings
  2022-08-10 23:55                       ` Mike Kravetz
@ 2022-08-11  8:48                         ` David Hildenbrand
  0 siblings, 0 replies; 30+ messages in thread
From: David Hildenbrand @ 2022-08-11  8:48 UTC (permalink / raw)
  To: Mike Kravetz, Peter Xu
  Cc: linux-kernel, linux-mm, Andrew Morton, Muchun Song, Peter Feiner,
	Kirill A . Shutemov

On 11.08.22 01:55, Mike Kravetz wrote:
> On 08/10/22 15:52, Peter Xu wrote:
>> On Wed, Aug 10, 2022 at 09:40:11PM +0200, David Hildenbrand wrote:
>>> On 10.08.22 21:29, Peter Xu wrote:
>>>> On Wed, Aug 10, 2022 at 11:37:13AM +0200, David Hildenbrand wrote:
>>>>> On 09.08.22 00:08, Peter Xu wrote:
>>>>>> On Mon, Aug 08, 2022 at 04:21:39PM -0400, Peter Xu wrote:
>>>>>>> On Mon, Aug 08, 2022 at 06:25:21PM +0200, David Hildenbrand wrote:
>>>>>>>>>> Relying on VM_SHARED to detect MAP_PRIVATE vs. MAP_SHARED is
>>>>>>>>>> unfortunately wrong.
>>>>>>>>>>
>>>>>>>>>> If you're curious, take a look at f83a275dbc5c ("mm: account for
>>>>>>>>>> MAP_SHARED mappings using VM_MAYSHARE and not VM_SHARED in hugetlbfs")
>>>>>>>>>> and mmap() code.
>>>>>>>>>>
>>>>>>>>>> Long story short: if the file is read-only, we only have VM_MAYSHARE but
>>>>>>>>>> not VM_SHARED (and consequently also not VM_MAYWRITE).
>>>>>>>>>
>>>>>>>>> To ask in another way: if file is RO but mapped RW (mmap() will have
>>>>>>>>> VM_SHARED cleared but VM_MAYSHARE set), then if we check VM_MAYSHARE here
>>>>>>>>> won't we grant write bit errornously while we shouldn't? As the user
>>>>>>>>> doesn't really have write permission to the file.
>>>>>>>>
>>>>>>>> Thus the VM_WRITE check. :)
>>>>>>>>
>>>>>>>> I wonder if we should just do it cleanly and introduce the maybe_mkwrite
>>>>>>>> semantics here as well. Then there is no need for additional VM_WRITE
>>>>>>>> checks and hugetlb will work just like !hugetlb.
>>>>>>>
>>>>>>> Hmm yeah I think the VM_MAYSHARE check is correct, since we'll need to fail
>>>>>>> the cases where MAYSHARE && !SHARE - we used to silently let it pass.
>>>>>>
>>>>>> Sorry I think this is a wrong statement I made..  IIUC we'll fail correctly
>>>>>> with/without the patch on any write to hugetlb RO regions.
>>>>>>
>>>>>> Then I just don't see a difference on checking VM_SHARED or VM_MAYSHARE
>>>>>> here, it's just that VM_MAYSHARE check should work too like VM_SHARED so I
>>>>>> don't see a problem.
>>>>>>
>>>>>> It also means I can't think of any valid case of having VM_WRITE when
>>>>>> reaching here, then the WARN_ON_ONCE() is okay but maybe also redundant.
>>>>>> Using maybe_mkwrite() seems misleading to me if FOLL_FORCE not ready for
>>>>>> hugetlbfs after all.
>>>>>>
>>>>>
>>>>> The main reason we'd have it would be to scream out lout and fail
>>>>> gracefully if someone would -- for example -- use it for something like
>>>>> FOLL_FORCE.
>>>>
>>>> Having that WARN_ON_ONCE() is okay to me, but just to double check we're on
>>>> the same page: why there's concern on using FOLL_FORCE? IIUC we're talking
>>>> about shared mappings here, then no FOLL_FORCE possible at all?  IOW,
>>>> "!is_cow_mapping()" should fail in check_vma_flags() already.
>>>
>>> This code path also covers the anon case.
>>
>> But this specific warning is under the VM_MAYSHARE if clause just added in
>> this patch?
>>
>> My understanding is any FOLL_FORCE will always constantly fail before this
>> patch, and it should keep failing as usual and I don't see any case it'll
>> be failing at the warn_on_once here.
>>
>> So again, I'm fine with having the warning, but I just want to make sure
>> what you want to capture is what you expected..
>>
>>>>
>>>> The other thing is I'm wondering whether patch 2 should be postponed anyway
>>>> so that we can wait for a full resolution of the problem from Mike.
>>>
>>> To make the code more robust and avoid any other such surprises I prefer
>>> to have this in rather earlier than later.
>>>
>>> As the commit says "Let's add a safety net ..."
>>
>> Sure, no strong opinion.  I'll leave that to Mike.  Thanks,
>>
> 
> Sorry that I am not contributing to this thread more.  Need to spend
> some time educating myself on the relatively new semantics here.
> 
> As mentioned, softdirty is on my todo list but has been there for over a
> year.  So, better to add a safety net until that code moves forward.
> 
> However, just for clarification.  The only way we KNOW of to encounter
> these situations today via softdirty.  Patch 1 takes care of that.  This
> patch catches any unknown ways we may get here.  Correct?  i.e. We don't
> really expect to exercise these code paths.

While I do love a good challenge on a Thursday morning, I wish I could
spend less time writing reproducers and arguing about obviously shaky
code ;) . Having that said, there is a flaw in uffd-wp that will end up
in the same code path and similarly mess up.

I'll resend including the reproducer. Note that I'll be on vacation for
~ 1.5 weeks.

-- 
Thanks,

David / dhildenb


^ permalink raw reply	[flat|nested] 30+ messages in thread

end of thread, other threads:[~2022-08-11  8:48 UTC | newest]

Thread overview: 30+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2022-08-05 11:03 [PATCH v1 0/2] mm/hugetlb: fix write-fault handling for shared mappings David Hildenbrand
2022-08-05 11:03 ` [PATCH v1 1/2] mm/hugetlb: fix hugetlb not supporting write-notify David Hildenbrand
2022-08-05 18:14   ` Peter Xu
2022-08-05 18:22     ` David Hildenbrand
2022-08-05 18:23     ` Mike Kravetz
2022-08-05 18:25       ` David Hildenbrand
2022-08-05 18:33         ` Mike Kravetz
2022-08-05 18:57           ` David Hildenbrand
2022-08-05 20:48             ` Mike Kravetz
2022-08-05 23:13               ` Peter Xu
2022-08-05 23:33                 ` Mike Kravetz
2022-08-08 16:10                   ` Peter Xu
2022-08-08 16:36                 ` David Hildenbrand
2022-08-08 19:28                   ` Peter Xu
2022-08-10  9:29                     ` David Hildenbrand
2022-08-05 11:03 ` [PATCH v1 2/2] mm/hugetlb: support write-faults in shared mappings David Hildenbrand
2022-08-05 18:12   ` Peter Xu
2022-08-05 18:20     ` David Hildenbrand
2022-08-08 16:05       ` Peter Xu
2022-08-08 16:25         ` David Hildenbrand
2022-08-08 20:21           ` Peter Xu
2022-08-08 22:08             ` Peter Xu
2022-08-10  9:37               ` David Hildenbrand
2022-08-10  9:45                 ` David Hildenbrand
2022-08-10 19:29                 ` Peter Xu
2022-08-10 19:40                   ` David Hildenbrand
2022-08-10 19:52                     ` Peter Xu
2022-08-10 23:55                       ` Mike Kravetz
2022-08-11  8:48                         ` David Hildenbrand
2022-08-05 23:08     ` Mike Kravetz

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.