From: David Hildenbrand
To: linux-kernel@vger.kernel.org
Cc: linux-mm@kvack.org, David Hildenbrand, Andrew Morton, Mike Kravetz, Muchun Song, Peter Xu, Peter Feiner, "Kirill A. Shutemov", stable@vger.kernel.org
Subject: [PATCH v1 1/2] mm/hugetlb: fix hugetlb not supporting write-notify
Date: Fri, 5 Aug 2022 13:03:28 +0200
Message-Id: <20220805110329.80540-2-david@redhat.com>
In-Reply-To: <20220805110329.80540-1-david@redhat.com>
References: <20220805110329.80540-1-david@redhat.com>

Staring at hugetlb_wp(), one might wonder where all the logic for shared
mappings is when stumbling over a write-protected page in a shared mapping.
In fact, there is none, and so far we thought we could get away with that
because, e.g., mprotect() should always do the right thing and map all
pages directly writable.
Looks like we were wrong:

--------------------------------------------------------------------------
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <fcntl.h>
#include <unistd.h>
#include <errno.h>
#include <sys/mman.h>

#define HUGETLB_SIZE (2 * 1024 * 1024u)

/* Writing "4" to clear_refs clears the soft-dirty state of this process. */
static void clear_softdirty(void)
{
	int fd = open("/proc/self/clear_refs", O_WRONLY);
	const char *ctrl = "4";
	ssize_t ret;

	if (fd < 0) {
		fprintf(stderr, "open(clear_refs) failed\n");
		exit(1);
	}
	ret = write(fd, ctrl, strlen(ctrl));
	if (ret != (ssize_t)strlen(ctrl)) {
		fprintf(stderr, "write(clear_refs) failed\n");
		exit(1);
	}
	close(fd);
}

int main(int argc, char **argv)
{
	char *map;
	int fd;

	/* Assumes hugetlbfs is mounted at /dev/hugepages. */
	fd = open("/dev/hugepages/tmp", O_RDWR | O_CREAT, 0600);
	if (fd < 0) {
		fprintf(stderr, "open() failed\n");
		return -errno;
	}
	if (ftruncate(fd, HUGETLB_SIZE)) {
		fprintf(stderr, "ftruncate() failed\n");
		return -errno;
	}

	map = mmap(NULL, HUGETLB_SIZE, PROT_READ|PROT_WRITE, MAP_SHARED, fd, 0);
	if (map == MAP_FAILED) {
		fprintf(stderr, "mmap() failed\n");
		return -errno;
	}

	/* Populate the hugetlb page through the shared mapping. */
	*map = 0;

	/* Write-protect, clear soft-dirty state, then make writable again. */
	if (mprotect(map, HUGETLB_SIZE, PROT_READ)) {
		fprintf(stderr, "mprotect() failed\n");
		return -errno;
	}

	clear_softdirty();

	if (mprotect(map, HUGETLB_SIZE, PROT_READ|PROT_WRITE)) {
		fprintf(stderr, "mprotect() failed\n");
		return -errno;
	}

	/* This write faults on the still write-protected page. */
	*map = 0;

	return 0;
}
--------------------------------------------------------------------------

The above test fails with SIGBUS when there is only a single free hugetlb
page:
 # echo 1 > /sys/kernel/mm/hugepages/hugepages-2048kB/nr_hugepages
 # ./test
 Bus error (core dumped)

And worse, with sufficient free hugetlb pages it will map an anonymous page
into a shared mapping, for example messing up accounting during unmap and
breaking MAP_SHARED semantics:

 # echo 2 > /sys/kernel/mm/hugepages/hugepages-2048kB/nr_hugepages
 # ./test
 # cat /proc/meminfo | grep HugePages_
 HugePages_Total: 2
 HugePages_Free: 1
 HugePages_Rsvd: 18446744073709551615
 HugePages_Surp: 0

Note that HugePages_Rsvd has underflowed: 18446744073709551615 is -1
printed as an unsigned 64-bit value.

The reason in this particular case is that, once clear_refs has cleared
VM_SOFTDIRTY, vma_wants_writenotify() returns "true" during the second
mprotect(). vma_set_page_prot() then removes VM_SHARED when computing the
page protection, so pages get mapped write-protected, and the next write
fault ends up in hugetlb_wp(), which has no logic for shared mappings.

Let's teach vma_wants_writenotify() that hugetlb does not support
write-notify, including softdirty tracking.

Fixes: 64e455079e1b ("mm: softdirty: enable write notifications on VMAs after VM_SOFTDIRTY cleared")
Cc: # v3.18+
Signed-off-by: David Hildenbrand
---
 mm/mmap.c | 7 +++++++
 1 file changed, 7 insertions(+)

diff --git a/mm/mmap.c b/mm/mmap.c
index 61e6135c54ef..462a6b0344ac 100644
--- a/mm/mmap.c
+++ b/mm/mmap.c
@@ -1683,6 +1683,13 @@ int vma_wants_writenotify(struct vm_area_struct *vma, pgprot_t vm_page_prot)
 	if ((vm_flags & (VM_WRITE|VM_SHARED)) != ((VM_WRITE|VM_SHARED)))
 		return 0;
 
+	/*
+	 * Hugetlb does not require/support writenotify; especially, it does not
+	 * support softdirty tracking.
+	 */
+	if (is_vm_hugetlb_page(vma))
+		return 0;
+
 	/* The backer wishes to know when pages are first written to? */
 	if (vm_ops && (vm_ops->page_mkwrite || vm_ops->pfn_mkwrite))
 		return 1;
-- 
2.35.3
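For readers new to the write-notify path, the decision this patch changes can be modeled in a few lines of user-space C. This is a simplified sketch under stated assumptions, not kernel code: the MODEL_* flag values, struct model_vma and the model_* functions are hypothetical, and several other checks performed by the real vma_wants_writenotify() are omitted.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical flag bits for this model only; NOT the kernel's VM_* values. */
#define MODEL_VM_WRITE     0x1u
#define MODEL_VM_SHARED    0x2u
#define MODEL_VM_SOFTDIRTY 0x4u

/* Hypothetical, heavily simplified stand-in for struct vm_area_struct. */
struct model_vma {
	unsigned int flags;
	bool is_hugetlb;        /* models is_vm_hugetlb_page() */
	bool has_page_mkwrite;  /* models vm_ops->page_mkwrite/pfn_mkwrite */
	bool softdirty_enabled; /* models CONFIG_MEM_SOFT_DIRTY being set */
};

/*
 * Simplified model of vma_wants_writenotify(); with_fix selects whether
 * the hugetlb check added by this patch is applied. Other checks of the
 * real function are left out.
 */
static bool model_wants_writenotify(const struct model_vma *vma, bool with_fix)
{
	unsigned int f = vma->flags;

	/* Only shared, writable mappings are candidates. */
	if ((f & (MODEL_VM_WRITE | MODEL_VM_SHARED)) !=
	    (MODEL_VM_WRITE | MODEL_VM_SHARED))
		return false;

	/* The fix: hugetlb never wants write-notify. */
	if (with_fix && vma->is_hugetlb)
		return false;

	/* The backer wishes to know when pages are first written to. */
	if (vma->has_page_mkwrite)
		return true;

	/* Soft-dirty tracking is active once VM_SOFTDIRTY has been cleared. */
	if (vma->softdirty_enabled && !(f & MODEL_VM_SOFTDIRTY))
		return true;

	return false;
}

/*
 * Model of vma_set_page_prot(): when write-notify is wanted, the page
 * protection is computed without VM_SHARED, i.e. as for a private
 * mapping, so pages are mapped write-protected.
 */
static unsigned int model_page_prot_flags(const struct model_vma *vma,
					  bool with_fix)
{
	unsigned int f = vma->flags;

	if (model_wants_writenotify(vma, with_fix))
		f &= ~MODEL_VM_SHARED;
	return f;
}
```

For a shared, writable hugetlb VMA whose VM_SOFTDIRTY bit was cleared, model_wants_writenotify() returns true without the new check, so model_page_prot_flags() drops MODEL_VM_SHARED; with the check it returns false and MODEL_VM_SHARED is kept. An otherwise identical non-hugetlb VMA still wants write-notify, which is why the check is keyed on hugetlb alone.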