From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by smtp.lore.kernel.org (Postfix) with ESMTP id 15AC2C76196 for ; Tue, 11 Apr 2023 19:35:55 +0000 (UTC) Received: by kanga.kvack.org (Postfix) id 5C6E6900002; Tue, 11 Apr 2023 15:35:55 -0400 (EDT) Received: by kanga.kvack.org (Postfix, from userid 40) id 577A36B0075; Tue, 11 Apr 2023 15:35:55 -0400 (EDT) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id 43E3B900002; Tue, 11 Apr 2023 15:35:55 -0400 (EDT) X-Delivered-To: linux-mm@kvack.org Received: from relay.hostedemail.com (smtprelay0011.hostedemail.com [216.40.44.11]) by kanga.kvack.org (Postfix) with ESMTP id 32B0B6B0074 for ; Tue, 11 Apr 2023 15:35:55 -0400 (EDT) Received: from smtpin11.hostedemail.com (a10.router.float.18 [10.200.18.1]) by unirelay07.hostedemail.com (Postfix) with ESMTP id 0304D1602D9 for ; Tue, 11 Apr 2023 19:35:54 +0000 (UTC) X-FDA: 80670115470.11.0C3BABF Received: from mailrelay1-1.pub.mailoutpod2-cph3.one.com (mailrelay1-1.pub.mailoutpod2-cph3.one.com [46.30.211.176]) by imf10.hostedemail.com (Postfix) with ESMTP id A2A1FC0004 for ; Tue, 11 Apr 2023 19:35:52 +0000 (UTC) Authentication-Results: imf10.hostedemail.com; dkim=pass header.d=ravnborg.org header.s=rsa2 header.b=NXSgIozX; dkim=pass header.d=ravnborg.org header.s=ed2 header.b=S8Q7BbEh; spf=none (imf10.hostedemail.com: domain of sam@ravnborg.org has no SPF policy when checking 46.30.211.176) smtp.mailfrom=sam@ravnborg.org; dmarc=none ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=hostedemail.com; s=arc-20220608; t=1681241753; h=from:from:sender:reply-to:subject:subject:date:date: message-id:message-id:to:to:cc:cc:mime-version:mime-version: content-type:content-type:content-transfer-encoding: in-reply-to:in-reply-to:references:references:dkim-signature; bh=6R6glNsSpLFyjginUD6NcowqNY9ur8a9ugp/eLkcD5w=; b=7un1qonTe7c6Y5iuRljYL7f4KCV+VTHdU4Wa8qLORSeULr/ymdxJ3phmBGjqrDqm7MoGYR 8R3wNztVi/++MStRBJS3e2tbMT8NQLj/7yN2fZHd+7XR3jqzJqCtNo34sF8eQCxxXtIzfY INWLOuT+5+dBDqauk4x0gWeYsuewL6M= ARC-Authentication-Results: i=1; imf10.hostedemail.com; dkim=pass header.d=ravnborg.org header.s=rsa2 header.b=NXSgIozX; dkim=pass header.d=ravnborg.org header.s=ed2 header.b=S8Q7BbEh; spf=none (imf10.hostedemail.com: domain of sam@ravnborg.org has no SPF policy when checking 46.30.211.176) smtp.mailfrom=sam@ravnborg.org; dmarc=none ARC-Seal: i=1; s=arc-20220608; d=hostedemail.com; t=1681241753; a=rsa-sha256; cv=none; b=X4/HuBXKlsHo445KU1gDsiAG51ywIfjZlNH3SxR3OXOJqATeeJ02pGle649YxOtzdpJbHY 4ynptayV5JbcoVq/AOtX1NoPzYwGZL9JAeetKBwTkk+OvHYZGInZRdvoHjFS2NQdXS2lZe rcuQzUHVpYl1Cz5Sbnmb28OilHTaCAQ= DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=ravnborg.org; s=rsa2; h=in-reply-to:content-type:mime-version:references:message-id:subject:cc:to: from:date:from; bh=6R6glNsSpLFyjginUD6NcowqNY9ur8a9ugp/eLkcD5w=; b=NXSgIozXgT72RAvH5CBS8iHcb9fF9jNF9TfKmCxRQft9xe/HII+ZkuZAaNOlUd31IFqSXvziCr4wA /2RVyfYB38mztBu82uRXJRtKI1UAwUybyQQn0gfUnxxpfkuufrGPtQFiDuJrLv/7rQXPLGEWxlAHjh pTt57iy7vvluPgEnnzxXLOtTgFyO2xhA2Y38a0vPA5kaCi7i7ru9Qz8yGRtxmCBn+vlsnEFwH6n7WT /Sc4ACkKASHuR8S3daqRMVRlVsMInxYINy35X8ERxDqaE9kYyviA5I82glJUwreIH4ih1b1IeMKl9o /9Li7ny/x8s0SZYyUqgejG6xEnG/Kzw== DKIM-Signature: v=1; a=ed25519-sha256; c=relaxed/relaxed; d=ravnborg.org; s=ed2; h=in-reply-to:content-type:mime-version:references:message-id:subject:cc:to: from:date:from; bh=6R6glNsSpLFyjginUD6NcowqNY9ur8a9ugp/eLkcD5w=; b=S8Q7BbEh6oAOXaKST+6JUuVXYmYG+NQKPGlDTJozbNQCqWLjgfQKqJUYXdE3DTtdB/Dm4w8mNa77O 0BtXVXKCg== X-HalOne-ID: 0f4363a0-d8a0-11ed-9613-99461c6a3fe8 Received: from ravnborg.org (2-105-2-98-cable.dk.customer.tdc.net [2.105.2.98]) by mailrelay1 (Halon) with ESMTPSA id 0f4363a0-d8a0-11ed-9613-99461c6a3fe8; Tue, 11 Apr 2023 19:35:50 +0000 (UTC) Date: Tue, 11 Apr 2023 21:35:48 +0200 From: Sam Ravnborg To: David Hildenbrand Cc: linux-kernel@vger.kernel.org, linux-mm@kvack.org, linux-kselftest@vger.kernel.org, sparclinux@vger.kernel.org, Andrew Morton , "David S. Miller" , Peter Xu , Hugh Dickins , Shuah Khan , Yu Zhao , Anshuman Khandual Subject: Re: [PATCH v1 RESEND 3/6] sparc/mm: don't unconditionally set HW writable bit when setting PTE dirty on 64bit Message-ID: <20230411193548.GA2094947@ravnborg.org> References: <20230411142512.438404-1-david@redhat.com> <20230411142512.438404-4-david@redhat.com> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <20230411142512.438404-4-david@redhat.com> X-Rspam-User: X-Rspamd-Server: rspam04 X-Rspamd-Queue-Id: A2A1FC0004 X-Stat-Signature: 6c3h8iikbupiegfw84ax6btckyq5thd5 X-HE-Tag: 1681241752-649669 X-HE-Meta: U2FsdGVkX18FA+j0jWzk1W0jLayTk6xKG6Ab65MJVxiidJXRqdnxcwF0qMPRicQA/58x1nKJwBb5vr+MXPczxHw6Lo77cjiOeg4ezORHanYkpoCBD/8/6jU2wcosDt+ISXTibf3EzgHV5Js/UtO/3gbthgHpMEOPM8QlasT9g+dAKxc9WONmlW6acFamTq7WIF/qiInpqFOuGs8g7NxLiYCR18zuGmj1TvHyq4tSyU1auDJUVyRZhhF6maJtLR3kDYuK6l2HUaSs4/909SDjCYdgmTkMNARftl3OcSyZCDBhRiuk5Zsnx0OzWcG6sv7FCRhO27/mlIswO5PXZp6e1RAeSZ9bIPqWY5APicn0VDEsBlmt9fm4Epl0GyRtafMFCWbczmf18KWK/Y2d5EjxZNec+TeZvzQlkOJuAyAVo7mvBZQC5nkwax1HkFj+MCxj+rN3BrWCLi3X7QrAASsF9Q6A3ZuGUhK3nQve/Pzh30mKQvT7PpOR4PGAloRKRUMM8EqKCSj7nB2SqFxo5+SHBmRnR8uEA8oF2J2vekfJQiR5y5Hr42Z/G0FIcB3OFerjOualcdQ/ms581uiyxW8bDaQy9Af9M6kvKQ6T5B3lBiQSolm3W5qmvE34vAtQYtM9bPRP17aY/rAabH/GZsiR2ZosDIVP36IIRamdN76yQXElAOR9m7u6HuEHA2RlUWe9PeXCqO2t41lksgyBi4A8BrRjpIcsgi+be09pH3Ga7r9wBfjNd98SrUFszJqiEH+sCJ0jgjKYHhAyq3TFuGOCGD2gx6Sx6dmQXl3r1OZxVggAuGWDCynTJISda/soz0LEzgKIqEeW0emqpeqNgBhP4REcAkgKiIWvssjHyq7OnMPEKstgrfSzw4wq/LaTxIWYL8Em9TzyF4KruuE1tAZSQxaxh2/oOI6dxwuFbNvxjSUZFqli1ZiLa6HUgFuXauZiY0KTEhGpOrtosPCReM/ k9AVtxhm d7CItHTfBDWCbu88wDu7bHAzLvZtfqYDhyd0HNl+XYk1zgt5q+K3rnFwBYgKIAK3jWMzH+d33kG7OV0xttcnzXlKLyPJtVzqu6Yia18ZG5wBLrdciPLWl3Ghng5ieRvJBekObxfOCOJI0ljM1BGe+pG9oin4xwY3wCjybX/nViMzBeYG8Xmzn9LMq8U44FGDr0/HTPg0N3LliQy8kfipWi2plgRqZIugUZT/dN4KGfPrcCvpkqyRWIQEodtpmtGqLb+LlO6vxGF+uzkYXJpEB0xnST623Nh1BF4+rSNXO0EaqzfzvA5EePHcxqBGmTQy4/BSgjJ60R6rQalQdvbdV6WYLNweNly3O43ianSgZVQu6G0mJy+cwqACVKLXK1shRxGoeobuoZd8YMhJ6U6ShHEDjmbKt6chDrexFus0M3tWJu6MeRTSZ2MWXVMPt6c6o2cBpEF+n8tCQwKI= X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: Hi David. On Tue, Apr 11, 2023 at 04:25:09PM +0200, David Hildenbrand wrote: > On sparc64, there is no HW modified bit, therefore, SW tracks via a SW > bit if the PTE is dirty via pte_mkdirty(). However, pte_mkdirty() > currently also unconditionally sets the HW writable bit, which is wrong. > > pte_mkdirty() is not supposed to make a PTE actually writable, unless the > SW writable bit -- pte_write() -- indicates that the PTE is not > write-protected. Fortunately, sparc64 also defines a SW writable bit. > > For example, this already turned into a problem in the context of > THP splitting as documented in commit 624a2c94f5b7 ("Partly revert "mm/thp: > carry over dirty bit when thp splits on pmd""), and for page migration, > as documented in commit 96a9c287e25d ("mm/migrate: fix wrongly apply write > bit after mkdirty on sparc64"). > > Also, we might want to use the dirty PTE bit in the context of KSM with > shared zeropage [1], whereby setting the page writable would be > problematic. > > But more general, any code that might end up setting a PTE/PMD dirty > inside a VM without write permissions is possibly broken, > > Before this commit (sun4u in QEMU): > root@debian:~/linux/tools/testing/selftests/mm# ./mkdirty > # [INFO] detected THP size: 8192 KiB > TAP version 13 > 1..6 > # [INFO] PTRACE write access > not ok 1 SIGSEGV generated, page not modified > # [INFO] PTRACE write access to THP > not ok 2 SIGSEGV generated, page not modified > # [INFO] Page migration > ok 3 SIGSEGV generated, page not modified > # [INFO] Page migration of THP > ok 4 SIGSEGV generated, page not modified > # [INFO] PTE-mapping a THP > ok 5 SIGSEGV generated, page not modified > # [INFO] UFFDIO_COPY > not ok 6 SIGSEGV generated, page not modified > Bail out! 3 out of 6 tests failed > # Totals: pass:3 fail:3 xfail:0 xpass:0 skip:0 error:0 > > Test #3,#4,#5 pass ever since we added some MM workarounds, the > underlying issue remains. > > Let's fix the remaining issues and prepare for reverting the workarounds > by setting the HW writable bit only if both, the SW dirty bit and the SW > writable bit are set. > > We have to move pte_dirty() and pte_dirty() up. The code patching One of the pte_dirty() should be replaced with pte_write(). It would have been nice to separate moving and changes in two patches, but keeping it together works too. > mechanism and handling constants > 22bit is a bit special on sparc64. > > The ASM logic in pte_mkdirty() and pte_mkwrite() match the logic in > pte_mkold() to create the mask depending on the machine type. The ASM > logic in __pte_mkhwwrite() matches the logic in pte_present(), just > using an "or" instead of an "and" instruction. > > With this commit (sun4u in QEMU): > root@debian:~/linux/tools/testing/selftests/mm# ./mkdirty > # [INFO] detected THP size: 8192 KiB > TAP version 13 > 1..6 > # [INFO] PTRACE write access > ok 1 SIGSEGV generated, page not modified > # [INFO] PTRACE write access to THP > ok 2 SIGSEGV generated, page not modified > # [INFO] Page migration > ok 3 SIGSEGV generated, page not modified > # [INFO] Page migration of THP > ok 4 SIGSEGV generated, page not modified > # [INFO] PTE-mapping a THP > ok 5 SIGSEGV generated, page not modified > # [INFO] UFFDIO_COPY > ok 6 SIGSEGV generated, page not modified > # Totals: pass:6 fail:0 xfail:0 xpass:0 skip:0 error:0 Nice! > > This handling seems to have been in place forever. > > [1] https://lkml.kernel.org/r/533a7c3d-3a48-b16b-b421-6e8386e0b142@redhat.com > > Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2") > Signed-off-by: David Hildenbrand I tried to follow your changes, but my knowledge of gcc assembler failed me. But based on the nice and detailed change log and the code I managed to understand: Acked-by: Sam Ravnborg > --- > arch/sparc/include/asm/pgtable_64.h | 116 ++++++++++++++++------------ > 1 file changed, 66 insertions(+), 50 deletions(-) > > diff --git a/arch/sparc/include/asm/pgtable_64.h b/arch/sparc/include/asm/pgtable_64.h > index 2dc8d4641734..5563efa1a19f 100644 > --- a/arch/sparc/include/asm/pgtable_64.h > +++ b/arch/sparc/include/asm/pgtable_64.h > @@ -357,6 +357,42 @@ static inline pgprot_t pgprot_noncached(pgprot_t prot) > */ > #define pgprot_noncached pgprot_noncached > > +static inline unsigned long pte_dirty(pte_t pte) > +{ > + unsigned long mask; > + > + __asm__ __volatile__( > + "\n661: mov %1, %0\n" > + " nop\n" > + " .section .sun4v_2insn_patch, \"ax\"\n" > + " .word 661b\n" > + " sethi %%uhi(%2), %0\n" > + " sllx %0, 32, %0\n" > + " .previous\n" > + : "=r" (mask) > + : "i" (_PAGE_MODIFIED_4U), "i" (_PAGE_MODIFIED_4V)); > + > + return (pte_val(pte) & mask); > +} > + > +static inline unsigned long pte_write(pte_t pte) > +{ > + unsigned long mask; > + > + __asm__ __volatile__( > + "\n661: mov %1, %0\n" > + " nop\n" > + " .section .sun4v_2insn_patch, \"ax\"\n" > + " .word 661b\n" > + " sethi %%uhi(%2), %0\n" > + " sllx %0, 32, %0\n" > + " .previous\n" > + : "=r" (mask) > + : "i" (_PAGE_WRITE_4U), "i" (_PAGE_WRITE_4V)); > + > + return (pte_val(pte) & mask); > +} > + > #if defined(CONFIG_HUGETLB_PAGE) || defined(CONFIG_TRANSPARENT_HUGEPAGE) > pte_t arch_make_huge_pte(pte_t entry, unsigned int shift, vm_flags_t flags); > #define arch_make_huge_pte arch_make_huge_pte > @@ -418,28 +454,43 @@ static inline bool is_hugetlb_pte(pte_t pte) > } > #endif > > +static inline pte_t __pte_mkhwwrite(pte_t pte) > +{ > + unsigned long val = pte_val(pte); > + > + /* > + * Note: we only want to set the HW writable bit if the SW writable bit > + * and the SW dirty bit are set. > + */ > + __asm__ __volatile__( > + "\n661: or %0, %2, %0\n" > + " .section .sun4v_1insn_patch, \"ax\"\n" > + " .word 661b\n" > + " or %0, %3, %0\n" > + " .previous\n" > + : "=r" (val) > + : "0" (val), "i" (_PAGE_W_4U), "i" (_PAGE_W_4V)); > + > + return __pte(val); > +} > + > static inline pte_t pte_mkdirty(pte_t pte) > { > - unsigned long val = pte_val(pte), tmp; > + unsigned long val = pte_val(pte), mask; > > __asm__ __volatile__( > - "\n661: or %0, %3, %0\n" > - " nop\n" > - "\n662: nop\n" > + "\n661: mov %1, %0\n" > " nop\n" > " .section .sun4v_2insn_patch, \"ax\"\n" > " .word 661b\n" > - " sethi %%uhi(%4), %1\n" > - " sllx %1, 32, %1\n" > - " .word 662b\n" > - " or %1, %%lo(%4), %1\n" > - " or %0, %1, %0\n" > + " sethi %%uhi(%2), %0\n" > + " sllx %0, 32, %0\n" > " .previous\n" > - : "=r" (val), "=r" (tmp) > - : "0" (val), "i" (_PAGE_MODIFIED_4U | _PAGE_W_4U), > - "i" (_PAGE_MODIFIED_4V | _PAGE_W_4V)); > + : "=r" (mask) > + : "i" (_PAGE_MODIFIED_4U), "i" (_PAGE_MODIFIED_4V)); > > - return __pte(val); > + pte = __pte(val | mask); > + return pte_write(pte) ? __pte_mkhwwrite(pte) : pte; > } > > static inline pte_t pte_mkclean(pte_t pte) > @@ -481,7 +532,8 @@ static inline pte_t pte_mkwrite(pte_t pte) > : "=r" (mask) > : "i" (_PAGE_WRITE_4U), "i" (_PAGE_WRITE_4V)); > > - return __pte(val | mask); > + pte = __pte(val | mask); > + return pte_dirty(pte) ? __pte_mkhwwrite(pte) : pte; > } > > static inline pte_t pte_wrprotect(pte_t pte) > @@ -584,42 +636,6 @@ static inline unsigned long pte_young(pte_t pte) > return (pte_val(pte) & mask); > } > > -static inline unsigned long pte_dirty(pte_t pte) > -{ > - unsigned long mask; > - > - __asm__ __volatile__( > - "\n661: mov %1, %0\n" > - " nop\n" > - " .section .sun4v_2insn_patch, \"ax\"\n" > - " .word 661b\n" > - " sethi %%uhi(%2), %0\n" > - " sllx %0, 32, %0\n" > - " .previous\n" > - : "=r" (mask) > - : "i" (_PAGE_MODIFIED_4U), "i" (_PAGE_MODIFIED_4V)); > - > - return (pte_val(pte) & mask); > -} > - > -static inline unsigned long pte_write(pte_t pte) > -{ > - unsigned long mask; > - > - __asm__ __volatile__( > - "\n661: mov %1, %0\n" > - " nop\n" > - " .section .sun4v_2insn_patch, \"ax\"\n" > - " .word 661b\n" > - " sethi %%uhi(%2), %0\n" > - " sllx %0, 32, %0\n" > - " .previous\n" > - : "=r" (mask) > - : "i" (_PAGE_WRITE_4U), "i" (_PAGE_WRITE_4V)); > - > - return (pte_val(pte) & mask); > -} > - > static inline unsigned long pte_exec(pte_t pte) > { > unsigned long mask; > -- > 2.39.2