From: David Hildenbrand
To: linux-kernel@vger.kernel.org
Cc: linux-mm@kvack.org, linux-kselftest@vger.kernel.org, sparclinux@vger.kernel.org, David Hildenbrand, Andrew Morton, "David S. Miller", Peter Xu, Hugh Dickins, Shuah Khan, Sam Ravnborg, Yu Zhao, Anshuman Khandual
Subject: [PATCH v1 RESEND 3/6] sparc/mm: don't unconditionally set HW writable bit when setting PTE dirty on 64bit
Date: Tue, 11 Apr 2023 16:25:09 +0200
Message-Id: <20230411142512.438404-4-david@redhat.com>
In-Reply-To: <20230411142512.438404-1-david@redhat.com>
References: <20230411142512.438404-1-david@redhat.com>

On sparc64, there is no HW modified bit; therefore, SW tracks whether the PTE is dirty using a SW bit, which pte_mkdirty() sets. However, pte_mkdirty() currently also unconditionally sets the HW writable bit, which is wrong. pte_mkdirty() is not supposed to make a PTE actually writable unless the SW writable bit (pte_write()) indicates that the PTE is not write-protected. Fortunately, sparc64 also defines a SW writable bit.

For example, this already turned into a problem in the context of THP splitting, as documented in commit 624a2c94f5b7 ("Partly revert "mm/thp: carry over dirty bit when thp splits on pmd""), and for page migration, as documented in commit 96a9c287e25d ("mm/migrate: fix wrongly apply write bit after mkdirty on sparc64").

Also, we might want to use the dirty PTE bit in the context of KSM with the shared zeropage [1], where setting the page writable would be problematic.
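To make the intended semantics concrete, here is a minimal user-space model in C. It is a sketch, not kernel code: the masks SW_DIRTY, SW_WRITE and HW_W and the helpers old_mkdirty(), new_mkdirty(), new_mkwrite() and check_model() are hypothetical stand-ins for the sparc64 _PAGE_MODIFIED_*, _PAGE_WRITE_* and _PAGE_W_* bits and the pte_mk*() helpers, with arbitrarily chosen bit positions. It shows that the old pte_mkdirty() turns a write-protected PTE HW-writable, while the fixed variants set the HW writable bit only once both SW bits are set.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical single-bit masks for illustration only; the real sparc64
 * _PAGE_* constants differ between sun4u and sun4v.
 */
#define SW_DIRTY (UINT64_C(1) << 11) /* plays the role of _PAGE_MODIFIED_* */
#define SW_WRITE (UINT64_C(1) << 8)  /* plays the role of _PAGE_WRITE_*    */
#define HW_W     (UINT64_C(1) << 1)  /* plays the role of _PAGE_W_*        */

typedef uint64_t model_pte;

/* Old pte_mkdirty(): marking a PTE dirty also sets the HW writable bit. */
model_pte old_mkdirty(model_pte pte)
{
	return pte | SW_DIRTY | HW_W;
}

/* Fixed pte_mkdirty(): set HW writable only if the SW writable bit is set. */
model_pte new_mkdirty(model_pte pte)
{
	pte |= SW_DIRTY;
	return (pte & SW_WRITE) ? (pte | HW_W) : pte;
}

/* Fixed pte_mkwrite(): set HW writable only if the SW dirty bit is set. */
model_pte new_mkwrite(model_pte pte)
{
	pte |= SW_WRITE;
	return (pte & SW_DIRTY) ? (pte | HW_W) : pte;
}

/* Check the rule starting from a clean, write-protected PTE. */
void check_model(void)
{
	model_pte wrprotected = 0;

	/* Old behaviour: a write-protected PTE silently becomes HW writable. */
	assert(old_mkdirty(wrprotected) & HW_W);
	/* Fixed behaviour: the SW dirty bit alone does not grant HW write access... */
	assert(!(new_mkdirty(wrprotected) & HW_W));
	/* ...and neither does the SW writable bit alone... */
	assert(!(new_mkwrite(wrprotected) & HW_W));
	/* ...but both together do, in either order. */
	assert(new_mkwrite(new_mkdirty(wrprotected)) & HW_W);
	assert(new_mkdirty(new_mkwrite(wrprotected)) & HW_W);
}
```

Calling check_model() exercises both orders (mkdirty then mkwrite, and the reverse). The model deliberately ignores the sun4u/sun4v split, which the real code resolves by patching instructions at boot.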
More generally, any code that might end up setting a PTE/PMD dirty inside a VMA without write permissions is possibly broken.

Before this commit (sun4u in QEMU):

  root@debian:~/linux/tools/testing/selftests/mm# ./mkdirty
  # [INFO] detected THP size: 8192 KiB
  TAP version 13
  1..6
  # [INFO] PTRACE write access
  not ok 1 SIGSEGV generated, page not modified
  # [INFO] PTRACE write access to THP
  not ok 2 SIGSEGV generated, page not modified
  # [INFO] Page migration
  ok 3 SIGSEGV generated, page not modified
  # [INFO] Page migration of THP
  ok 4 SIGSEGV generated, page not modified
  # [INFO] PTE-mapping a THP
  ok 5 SIGSEGV generated, page not modified
  # [INFO] UFFDIO_COPY
  not ok 6 SIGSEGV generated, page not modified
  Bail out! 3 out of 6 tests failed
  # Totals: pass:3 fail:3 xfail:0 xpass:0 skip:0 error:0

Tests #3, #4 and #5 have passed ever since we added some MM workarounds, but the underlying issue remains.

Let's fix the remaining issues and prepare for reverting the workarounds by setting the HW writable bit only if both the SW dirty bit and the SW writable bit are set.

We have to move pte_dirty() and pte_write() up so that pte_mkdirty() and pte_mkwrite() can use them. The code patching mechanism and the handling of constants wider than 22 bits are a bit special on sparc64. The ASM logic in pte_mkdirty() and pte_mkwrite() matches the logic in pte_mkold() to create the mask depending on the machine type. The ASM logic in __pte_mkhwwrite() matches the logic in pte_present(), just using an "or" instead of an "and" instruction.
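The sun4v variant of these masks is built with "sethi %%uhi(X), r" followed by "sllx r, 32, r". The C sketch below emulates that pair, assuming SPARC V9 semantics: sethi places a 22-bit immediate in bits 31..10 and clears all other bits, and %uhi selects bits 63..42 of a constant. It shows why the pair only reproduces constants whose bits 41..0 are all clear, and why a single mov/or, which takes a signed 13-bit immediate, suffices for small constants. The function names are hypothetical, and no particular _PAGE_* value is assumed.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* %uhi(x): bits 63..42 of a 64-bit constant, as a 22-bit field. */
uint64_t uhi(uint64_t x)
{
	return (x >> 42) & 0x3fffff;
}

/* sethi imm22, rd: imm22 lands in bits 31..10, all other bits are zeroed. */
uint64_t sethi(uint64_t imm22)
{
	return (imm22 & 0x3fffff) << 10;
}

/* sllx rs, 32, rd */
uint64_t sllx32(uint64_t x)
{
	return x << 32;
}

/* The two-instruction sun4v sequence: sethi %uhi(c), r; sllx r, 32, r */
uint64_t build_mask_sun4v(uint64_t c)
{
	return sllx32(sethi(uhi(c)));
}

/* The sequence reproduces c exactly only if bits 41..0 of c are clear. */
bool sun4v_sequence_exact(uint64_t c)
{
	return build_mask_sun4v(c) == c;
}

/* Does c fit the signed 13-bit immediate of a single "mov" or "or"? */
bool fits_simm13(int64_t c)
{
	return c >= -4096 && c <= 4095;
}

/* Sanity checks of the emulation on a few sample constants. */
void check_sequences(void)
{
	assert(sun4v_sequence_exact(UINT64_C(1) << 61)); /* high bit only: exact */
	assert(!sun4v_sequence_exact(UINT64_C(0x40)));   /* low bit: lost */
	assert(fits_simm13(0x800) && !fits_simm13(0x2000));
}
```

For example, a single high bit such as 1 << 61 survives the round trip, whereas any bit below bit 42 is lost. This appears to be why the old pte_mkdirty() needed an extra "or %1, %%lo(%4), %1" to merge the low HW writable bit of its combined constant back in.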
With this commit (sun4u in QEMU):

  root@debian:~/linux/tools/testing/selftests/mm# ./mkdirty
  # [INFO] detected THP size: 8192 KiB
  TAP version 13
  1..6
  # [INFO] PTRACE write access
  ok 1 SIGSEGV generated, page not modified
  # [INFO] PTRACE write access to THP
  ok 2 SIGSEGV generated, page not modified
  # [INFO] Page migration
  ok 3 SIGSEGV generated, page not modified
  # [INFO] Page migration of THP
  ok 4 SIGSEGV generated, page not modified
  # [INFO] PTE-mapping a THP
  ok 5 SIGSEGV generated, page not modified
  # [INFO] UFFDIO_COPY
  ok 6 SIGSEGV generated, page not modified
  # Totals: pass:6 fail:0 xfail:0 xpass:0 skip:0 error:0

Unconditionally setting the HW writable bit in pte_mkdirty() seems to have been in place forever.

[1] https://lkml.kernel.org/r/533a7c3d-3a48-b16b-b421-6e8386e0b142@redhat.com

Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2")
Signed-off-by: David Hildenbrand
---
 arch/sparc/include/asm/pgtable_64.h | 116 ++++++++++++++++------------
 1 file changed, 66 insertions(+), 50 deletions(-)

diff --git a/arch/sparc/include/asm/pgtable_64.h b/arch/sparc/include/asm/pgtable_64.h
index 2dc8d4641734..5563efa1a19f 100644
--- a/arch/sparc/include/asm/pgtable_64.h
+++ b/arch/sparc/include/asm/pgtable_64.h
@@ -357,6 +357,42 @@ static inline pgprot_t pgprot_noncached(pgprot_t prot)
  */
 #define pgprot_noncached pgprot_noncached
 
+static inline unsigned long pte_dirty(pte_t pte)
+{
+	unsigned long mask;
+
+	__asm__ __volatile__(
+	"\n661:	mov		%1, %0\n"
+	"	nop\n"
+	"	.section	.sun4v_2insn_patch, \"ax\"\n"
+	"	.word		661b\n"
+	"	sethi		%%uhi(%2), %0\n"
+	"	sllx		%0, 32, %0\n"
+	"	.previous\n"
+	: "=r" (mask)
+	: "i" (_PAGE_MODIFIED_4U), "i" (_PAGE_MODIFIED_4V));
+
+	return (pte_val(pte) & mask);
+}
+
+static inline unsigned long pte_write(pte_t pte)
+{
+	unsigned long mask;
+
+	__asm__ __volatile__(
+	"\n661:	mov		%1, %0\n"
+	"	nop\n"
+	"	.section	.sun4v_2insn_patch, \"ax\"\n"
+	"	.word		661b\n"
+	"	sethi		%%uhi(%2), %0\n"
+	"	sllx		%0, 32, %0\n"
+	"	.previous\n"
+	: "=r" (mask)
+	: "i" (_PAGE_WRITE_4U), "i" (_PAGE_WRITE_4V));
+
+	return (pte_val(pte) & mask);
+}
+
 #if defined(CONFIG_HUGETLB_PAGE) || defined(CONFIG_TRANSPARENT_HUGEPAGE)
 pte_t arch_make_huge_pte(pte_t entry, unsigned int shift, vm_flags_t flags);
 #define arch_make_huge_pte arch_make_huge_pte
@@ -418,28 +454,43 @@ static inline bool is_hugetlb_pte(pte_t pte)
 }
 #endif
 
+static inline pte_t __pte_mkhwwrite(pte_t pte)
+{
+	unsigned long val = pte_val(pte);
+
+	/*
+	 * Note: we only want to set the HW writable bit if the SW writable bit
+	 * and the SW dirty bit are set.
+	 */
+	__asm__ __volatile__(
+	"\n661:	or		%0, %2, %0\n"
+	"	.section	.sun4v_1insn_patch, \"ax\"\n"
+	"	.word		661b\n"
+	"	or		%0, %3, %0\n"
+	"	.previous\n"
+	: "=r" (val)
+	: "0" (val), "i" (_PAGE_W_4U), "i" (_PAGE_W_4V));
+
+	return __pte(val);
+}
+
 static inline pte_t pte_mkdirty(pte_t pte)
 {
-	unsigned long val = pte_val(pte), tmp;
+	unsigned long val = pte_val(pte), mask;
 
 	__asm__ __volatile__(
-	"\n661:	or		%0, %3, %0\n"
-	"	nop\n"
-	"\n662:	nop\n"
+	"\n661:	mov		%1, %0\n"
 	"	nop\n"
 	"	.section	.sun4v_2insn_patch, \"ax\"\n"
 	"	.word		661b\n"
-	"	sethi		%%uhi(%4), %1\n"
-	"	sllx		%1, 32, %1\n"
-	"	.word		662b\n"
-	"	or		%1, %%lo(%4), %1\n"
-	"	or		%0, %1, %0\n"
+	"	sethi		%%uhi(%2), %0\n"
+	"	sllx		%0, 32, %0\n"
 	"	.previous\n"
-	: "=r" (val), "=r" (tmp)
-	: "0" (val), "i" (_PAGE_MODIFIED_4U | _PAGE_W_4U),
-	  "i" (_PAGE_MODIFIED_4V | _PAGE_W_4V));
+	: "=r" (mask)
+	: "i" (_PAGE_MODIFIED_4U), "i" (_PAGE_MODIFIED_4V));
 
-	return __pte(val);
+	pte = __pte(val | mask);
+	return pte_write(pte) ? __pte_mkhwwrite(pte) : pte;
 }
 
 static inline pte_t pte_mkclean(pte_t pte)
@@ -481,7 +532,8 @@ static inline pte_t pte_mkwrite(pte_t pte)
 	: "=r" (mask)
 	: "i" (_PAGE_WRITE_4U), "i" (_PAGE_WRITE_4V));
 
-	return __pte(val | mask);
+	pte = __pte(val | mask);
+	return pte_dirty(pte) ? __pte_mkhwwrite(pte) : pte;
 }
 
 static inline pte_t pte_wrprotect(pte_t pte)
@@ -584,42 +636,6 @@ static inline unsigned long pte_young(pte_t pte)
 	return (pte_val(pte) & mask);
 }
 
-static inline unsigned long pte_dirty(pte_t pte)
-{
-	unsigned long mask;
-
-	__asm__ __volatile__(
-	"\n661:	mov		%1, %0\n"
-	"	nop\n"
-	"	.section	.sun4v_2insn_patch, \"ax\"\n"
-	"	.word		661b\n"
-	"	sethi		%%uhi(%2), %0\n"
-	"	sllx		%0, 32, %0\n"
-	"	.previous\n"
-	: "=r" (mask)
-	: "i" (_PAGE_MODIFIED_4U), "i" (_PAGE_MODIFIED_4V));
-
-	return (pte_val(pte) & mask);
-}
-
-static inline unsigned long pte_write(pte_t pte)
-{
-	unsigned long mask;
-
-	__asm__ __volatile__(
-	"\n661:	mov		%1, %0\n"
-	"	nop\n"
-	"	.section	.sun4v_2insn_patch, \"ax\"\n"
-	"	.word		661b\n"
-	"	sethi		%%uhi(%2), %0\n"
-	"	sllx		%0, 32, %0\n"
-	"	.previous\n"
-	: "=r" (mask)
-	: "i" (_PAGE_WRITE_4U), "i" (_PAGE_WRITE_4V));
-
-	return (pte_val(pte) & mask);
-}
-
 static inline unsigned long pte_exec(pte_t pte)
 {
 	unsigned long mask;
-- 
2.39.2