From: David Hildenbrand <david@redhat.com>
To: linux-kernel@vger.kernel.org
Cc: linux-mm@kvack.org, David Hildenbrand <david@redhat.com>,
	Andrew Morton <akpm@linux-foundation.org>,
	Mike Kravetz <mike.kravetz@oracle.com>,
	Muchun Song <songmuchun@bytedance.com>,
	Peter Xu <peterx@redhat.com>, Peter Feiner <pfeiner@google.com>,
	"Kirill A . Shutemov" <kirill.shutemov@linux.intel.com>,
	stable@vger.kernel.org
Subject: [PATCH v1 1/2] mm/hugetlb: fix hugetlb not supporting write-notify
Date: Fri,  5 Aug 2022 13:03:28 +0200
Message-ID: <20220805110329.80540-2-david@redhat.com>
In-Reply-To: <20220805110329.80540-1-david@redhat.com>

Staring at hugetlb_wp(), one might wonder where all the logic for shared
mappings is when stumbling over a write-protected page in a shared
mapping. In fact, there is none, and so far we thought we could get
away with that because e.g., mprotect() should always do the right thing
and map all pages directly writable.

Looks like we were wrong:

--------------------------------------------------------------------------
 #include <stdio.h>
 #include <stdlib.h>
 #include <string.h>
 #include <fcntl.h>
 #include <unistd.h>
 #include <errno.h>
 #include <sys/mman.h>

 #define HUGETLB_SIZE (2 * 1024 * 1024u)

 static void clear_softdirty(void)
 {
         int fd = open("/proc/self/clear_refs", O_WRONLY);
         const char *ctrl = "4";
         int ret;

         if (fd < 0) {
                 fprintf(stderr, "open(clear_refs) failed\n");
                 exit(1);
         }
         ret = write(fd, ctrl, strlen(ctrl));
         if (ret != strlen(ctrl)) {
                 fprintf(stderr, "write(clear_refs) failed\n");
                 exit(1);
         }
         close(fd);
 }

 int main(int argc, char **argv)
 {
         char *map;
         int fd;

         /* O_CREAT requires a mode argument; open() returns -1 on failure. */
         fd = open("/dev/hugepages/tmp", O_RDWR | O_CREAT, 0600);
         if (fd < 0) {
                 fprintf(stderr, "open() failed\n");
                 return -errno;
         }
         if (ftruncate(fd, HUGETLB_SIZE)) {
                 fprintf(stderr, "ftruncate() failed\n");
                 return -errno;
         }

         map = mmap(NULL, HUGETLB_SIZE, PROT_READ|PROT_WRITE, MAP_SHARED, fd, 0);
         if (map == MAP_FAILED) {
                 fprintf(stderr, "mmap() failed\n");
                 return -errno;
         }

         *map = 0;

         if (mprotect(map, HUGETLB_SIZE, PROT_READ)) {
                 fprintf(stderr, "mprotect() failed\n");
                 return -errno;
         }

         clear_softdirty();

         if (mprotect(map, HUGETLB_SIZE, PROT_READ|PROT_WRITE)) {
                 fprintf(stderr, "mprotect() failed\n");
                 return -errno;
         }

         *map = 0;

         return 0;
 }
--------------------------------------------------------------------------

The above test fails with SIGBUS when there is only a single free hugetlb page:
 # echo 1 > /sys/kernel/mm/hugepages/hugepages-2048kB/nr_hugepages
 # ./test
 Bus error (core dumped)

Worse, with sufficient free hugetlb pages, it will map an anonymous page
into a shared mapping, which, for example, messes up accounting during
unmap and breaks MAP_SHARED semantics:
 # echo 2 > /sys/kernel/mm/hugepages/hugepages-2048kB/nr_hugepages
 # ./test
 # cat /proc/meminfo | grep HugePages_
 HugePages_Total:       2
 HugePages_Free:        1
 HugePages_Rsvd:    18446744073709551615
 HugePages_Surp:        0
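
The HugePages_Rsvd value above is what -1 looks like when printed as an
unsigned 64-bit number, i.e., the reservation counter was decremented once
more than it was incremented. A minimal sketch of that wraparound follows;
it assumes a 64-bit unsigned counter, and the helper names are hypothetical
and not taken from the kernel:

```c
#include <stdint.h>
#include <stdio.h>

/*
 * Hypothetical helper (not kernel code): model a reservation counter that
 * is stored as an unsigned 64-bit value and gets decremented once too often.
 */
static uint64_t rsvd_after_extra_decrement(uint64_t rsvd)
{
        /* Unsigned arithmetic wraps modulo 2^64, so 0 - 1 becomes 2^64 - 1. */
        return rsvd - 1;
}

/*
 * Print the counter as an unsigned decimal number into buf and return the
 * number of characters written (as snprintf() does).
 */
static int format_rsvd(char *buf, size_t len, uint64_t rsvd)
{
        return snprintf(buf, len, "%llu", (unsigned long long)rsvd);
}
```

Formatting rsvd_after_extra_decrement(0) with format_rsvd() yields the same
20-digit value that shows up for HugePages_Rsvd above.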

The reason in this particular case is that vma_wants_writenotify() will
return "true", removing VM_SHARED in vma_set_page_prot() to map pages
write-protected. Let's teach vma_wants_writenotify() that hugetlb does not
support write-notify, including softdirty tracking.
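
The decision described above can be modeled in userspace roughly as follows.
This is a simplified sketch, not the kernel implementation: the flag values,
struct model_vma and the hugetlb_fix switch are made up for illustration, and
the other checks in vma_wants_writenotify() (for example the page_mkwrite
handling visible in the diff below) are left out.

```c
#include <stdbool.h>

/* Illustrative flag values for this model only; not taken from kernel headers. */
#define MODEL_VM_WRITE     0x1u
#define MODEL_VM_SHARED    0x2u
#define MODEL_VM_SOFTDIRTY 0x4u

/* Hypothetical, stripped-down stand-in for a VMA. */
struct model_vma {
        unsigned int flags;
        bool is_hugetlb;
};

/*
 * Model of the write-notify decision. With hugetlb_fix == false this mimics
 * the behavior before the patch, with hugetlb_fix == true the behavior after.
 */
static bool model_wants_writenotify(const struct model_vma *vma, bool hugetlb_fix)
{
        const unsigned int ws = MODEL_VM_WRITE | MODEL_VM_SHARED;

        /* Only shared, writable mappings can want write-notify. */
        if ((vma->flags & ws) != ws)
                return false;

        /* The fix: hugetlb never wants write-notify. */
        if (hugetlb_fix && vma->is_hugetlb)
                return false;

        /* Softdirty was cleared: the next write should be noticed. */
        return !(vma->flags & MODEL_VM_SOFTDIRTY);
}

/*
 * Model of the flags used to derive the page protection: when write-notify
 * is wanted, the shared bit is dropped so that pages get mapped
 * write-protected.
 */
static unsigned int model_prot_flags(const struct model_vma *vma, bool hugetlb_fix)
{
        unsigned int flags = vma->flags;

        if (model_wants_writenotify(vma, hugetlb_fix))
                flags &= ~MODEL_VM_SHARED;
        return flags;
}
```

For a shared, writable hugetlb VMA whose softdirty bit has been cleared, the
model drops the shared bit without the fix and keeps it with the fix.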

Fixes: 64e455079e1b ("mm: softdirty: enable write notifications on VMAs after VM_SOFTDIRTY cleared")
Cc: <stable@vger.kernel.org> # v3.18+
Signed-off-by: David Hildenbrand <david@redhat.com>
---
 mm/mmap.c | 7 +++++++
 1 file changed, 7 insertions(+)

diff --git a/mm/mmap.c b/mm/mmap.c
index 61e6135c54ef..462a6b0344ac 100644
--- a/mm/mmap.c
+++ b/mm/mmap.c
@@ -1683,6 +1683,13 @@ int vma_wants_writenotify(struct vm_area_struct *vma, pgprot_t vm_page_prot)
 	if ((vm_flags & (VM_WRITE|VM_SHARED)) != ((VM_WRITE|VM_SHARED)))
 		return 0;
 
+	/*
+	 * Hugetlb does not require/support writenotify; especially, it does not
+	 * support softdirty tracking.
+	 */
+	if (is_vm_hugetlb_page(vma))
+		return 0;
+
 	/* The backer wishes to know when pages are first written to? */
 	if (vm_ops && (vm_ops->page_mkwrite || vm_ops->pfn_mkwrite))
 		return 1;
-- 
2.35.3

