From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-2.2 required=3.0 tests=HEADER_FROM_DIFFERENT_DOMAINS, MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS,URIBL_BLOCKED,USER_AGENT_SANE_2 autolearn=no autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 02F65C2BA80 for ; Tue, 7 Apr 2020 15:54:57 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by mail.kernel.org (Postfix) with ESMTP id C3B3520768 for ; Tue, 7 Apr 2020 15:54:56 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1727703AbgDGPy4 (ORCPT ); Tue, 7 Apr 2020 11:54:56 -0400 Received: from mx0a-001b2d01.pphosted.com ([148.163.156.1]:7950 "EHLO mx0a-001b2d01.pphosted.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1727078AbgDGPyz (ORCPT ); Tue, 7 Apr 2020 11:54:55 -0400 Received: from pps.filterd (m0098394.ppops.net [127.0.0.1]) by mx0a-001b2d01.pphosted.com (8.16.0.42/8.16.0.42) with SMTP id 037Fahnv072846 for ; Tue, 7 Apr 2020 11:54:54 -0400 Received: from e06smtp03.uk.ibm.com (e06smtp03.uk.ibm.com [195.75.94.99]) by mx0a-001b2d01.pphosted.com with ESMTP id 3082pekgns-1 (version=TLSv1.2 cipher=AES256-GCM-SHA384 bits=256 verify=NOT) for ; Tue, 07 Apr 2020 11:54:54 -0400 Received: from localhost by e06smtp03.uk.ibm.com with IBM ESMTP SMTP Gateway: Authorized Use Only! Violators will be prosecuted for from ; Tue, 7 Apr 2020 16:54:39 +0100 Received: from b06cxnps4075.portsmouth.uk.ibm.com (9.149.109.197) by e06smtp03.uk.ibm.com (192.168.101.133) with IBM ESMTP SMTP Gateway: Authorized Use Only! Violators will be prosecuted; (version=TLSv1/SSLv3 cipher=AES256-GCM-SHA384 bits=256/256) Tue, 7 Apr 2020 16:54:32 +0100 Received: from d06av21.portsmouth.uk.ibm.com (d06av21.portsmouth.uk.ibm.com [9.149.105.232]) by b06cxnps4075.portsmouth.uk.ibm.com (8.14.9/8.14.9/NCO v10.0) with ESMTP id 037Fshw958654742 (version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-GCM-SHA384 bits=256 verify=OK); Tue, 7 Apr 2020 15:54:43 GMT Received: from d06av21.portsmouth.uk.ibm.com (unknown [127.0.0.1]) by IMSVA (Postfix) with ESMTP id 17FFE52059; Tue, 7 Apr 2020 15:54:43 +0000 (GMT) Received: from thinkpad (unknown [9.145.186.71]) by d06av21.portsmouth.uk.ibm.com (Postfix) with ESMTP id 2847C52050; Tue, 7 Apr 2020 15:54:42 +0000 (GMT) Date: Tue, 7 Apr 2020 17:54:40 +0200 From: Gerald Schaefer To: Anshuman Khandual Cc: linux-mm@kvack.org, christophe.leroy@c-s.fr, Jonathan Corbet , Andrew Morton , Mike Rapoport , Vineet Gupta , Catalin Marinas , Will Deacon , Benjamin Herrenschmidt , Paul Mackerras , Michael Ellerman , Heiko Carstens , Vasily Gorbik , Christian Borntraeger , Thomas Gleixner , Ingo Molnar , Borislav Petkov , "H. Peter Anvin" , "Kirill A . Shutemov" , Paul Walmsley , Palmer Dabbelt , linux-snps-arc@lists.infradead.org, linux-arm-kernel@lists.infradead.org, linuxppc-dev@lists.ozlabs.org, linux-s390@vger.kernel.org, linux-riscv@lists.infradead.org, x86@kernel.org, linux-doc@vger.kernel.org, linux-arch@vger.kernel.org, linux-kernel@vger.kernel.org, Gerald Schaefer Subject: Re: [PATCH V2 0/3] mm/debug: Add more arch page table helper tests In-Reply-To: References: <1585027375-9997-1-git-send-email-anshuman.khandual@arm.com> <20200331143059.29fca8fa@thinkpad> X-Mailer: Claws Mail 3.17.4 (GTK+ 2.24.32; x86_64-redhat-linux-gnu) MIME-Version: 1.0 Content-Type: text/plain; charset=US-ASCII Content-Transfer-Encoding: 7bit X-TM-AS-GCONF: 00 x-cbid: 20040715-0012-0000-0000-000003A01446 X-IBM-AV-DETECTION: SAVI=unused REMOTE=unused XFE=unused x-cbparentid: 20040715-0013-0000-0000-000021DD3634 Message-Id: <20200407175440.41cc00a5@thinkpad> X-Proofpoint-Virus-Version: vendor=fsecure engine=2.50.10434:6.0.138,18.0.676 definitions=2020-04-07_07:2020-04-07,2020-04-07 signatures=0 X-Proofpoint-Spam-Details: rule=outbound_notspam policy=outbound score=0 impostorscore=0 mlxscore=0 clxscore=1015 spamscore=0 lowpriorityscore=0 malwarescore=0 phishscore=0 adultscore=0 mlxlogscore=999 suspectscore=0 bulkscore=0 priorityscore=1501 classifier=spam adjust=0 reason=mlx scancount=1 engine=8.12.0-2003020000 definitions=main-2004070128 Sender: linux-kernel-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On Sun, 5 Apr 2020 17:58:14 +0530 Anshuman Khandual wrote: [...] > > > > Could be fixed like this (the first de-reference is a bit special, > > because at that point *ptep does not really point to a large (pmd) entry > > yet, it is initially an invalid pte entry, which breaks our huge_ptep_get() > > There seems to be an inconsistency on s390 platform. Even though it defines > a huge_ptep_get() override, it does not subscribe __HAVE_ARCH_HUGE_PTEP_GET > which should have forced it fallback on generic huge_ptep_get() but it does > not :) Then I realized that __HAVE_ARCH_HUGE_PTEP_GET only makes sense when > an arch uses . s390 does not use that and hence gets > away with it's own huge_ptep_get() without __HAVE_ARCH_HUGE_PTEP_GET. Sounds > confusing ? But I might not have the entire context here. Yes, that sounds very confusing. Also a bit ironic, since huge_ptep_get() was initially introduced because of s390, and now we don't select __HAVE_ARCH_HUGE_PTEP_GET... As you realized, I guess this is because we do not use generic hugetlb.h. And when __HAVE_ARCH_HUGE_PTEP_GET was introduced with commit 544db7597ad ("hugetlb: introduce generic version of huge_ptep_get"), that was probably the reason why we did not get our share of __HAVE_ARCH_HUGE_PTEP_GET. Nothing really wrong with that, but yes, very confusing. Maybe we could also select it for s390, even though it wouldn't have any functional impact (so far), just for less confusion. Maybe also thinking about using the generic hugetlb.h, not sure if the original reasons for not doing so would still apply. Now I only need to find the time... > > > conversion logic. I also added PMD_MASK alignment for RANDOM_ORVALUE, > > because we do have some special bits there in our large pmds. It seems > > to also work w/o that alignment, but it feels a bit wrong): > > Sure, we can accommodate that. > > > > > @@ -731,26 +731,26 @@ static void __init hugetlb_advanced_test > > unsigned long vaddr, pgprot_t prot) > > { > > struct page *page = pfn_to_page(pfn); > > - pte_t pte = READ_ONCE(*ptep); > > + pte_t pte; > > > > - pte = __pte(pte_val(pte) | RANDOM_ORVALUE); > > + pte = pte_mkhuge(mk_pte_phys(RANDOM_ORVALUE & PMD_MASK, prot)); > > So that keeps the existing value in 'ptep' pointer at bay and instead > construct a PTE from scratch. I would rather have READ_ONCE(*ptep) at > least provide the seed that can be ORed with RANDOM_ORVALUE before > being masked with PMD_MASK. Do you see any problem ? Yes, unfortunately. The problem is that the resulting pte is not marked as present. The conversion pte -> (huge) pmd, which is done in set_huge_pte_at() for s390, will establish an empty pmd for non-present ptes, all the RANDOM_ORVALUE stuff is lost. And a subsequent huge_ptep_get() will not result in the same original pte value. If you want to preserve and check the RANDOM_ORVALUE, it has to be a present pte, hence the mk_pte(_phys). > > Some thing like this instead. > > pte_t pte = READ_ONCE(*ptep); > pte = pte_mkhuge(__pte((pte_val(pte) | RANDOM_ORVALUE) & PMD_MASK)); > > We cannot use mk_pte_phys() as it is defined only on some platforms > without any generic fallback for others. Oh, didn't know that, sorry. What about using mk_pte() instead, at least it would result in a present pte: pte = pte_mkhuge(mk_pte(phys_to_page(RANDOM_ORVALUE & PMD_MASK), prot)); And if you also want to do some with the existing value, which seems to be an empty pte, then maybe just check if writing and reading that value with set_huge_pte_at() / huge_ptep_get() returns the same, i.e. initially w/o RANDOM_ORVALUE. So, in combination, like this (BTW, why is the barrier() needed, it is not used for the other set_huge_pte_at() calls later?): @@ -733,24 +733,28 @@ static void __init hugetlb_advanced_test struct page *page = pfn_to_page(pfn); pte_t pte = READ_ONCE(*ptep); - pte = __pte(pte_val(pte) | RANDOM_ORVALUE); + set_huge_pte_at(mm, vaddr, ptep, pte); + WARN_ON(!pte_same(pte, huge_ptep_get(ptep))); + + pte = pte_mkhuge(mk_pte(phys_to_page(RANDOM_ORVALUE & PMD_MASK), prot)); set_huge_pte_at(mm, vaddr, ptep, pte); barrier(); WARN_ON(!pte_same(pte, huge_ptep_get(ptep))); This would actually add a new test "write empty pte with set_huge_pte_at(), then verify with huge_ptep_get()", which happens to trigger a warning on s390 :-) That (new) warning actually points to misbehavior on s390, we do not write a correct empty pmd in this case, but one that is empty and also marked as large. huge_ptep_get() will then not correctly recognize it as empty and do wrong conversion. It is also not consistent with huge_ptep_get_and_clear(), where we write the empty pmd w/o marking as large. Last but not least it would also break our pmd_protnone() logic (see below). Another nice finding on s390 :-) I don't think this has any effect in practice (yet), but I will post a fix for that, just in case you will add / change this test. > > > set_huge_pte_at(mm, vaddr, ptep, pte); > > barrier(); > > WARN_ON(!pte_same(pte, huge_ptep_get(ptep))); > > huge_pte_clear(mm, vaddr, ptep, PMD_SIZE); > > - pte = READ_ONCE(*ptep); > > + pte = huge_ptep_get(ptep); > > WARN_ON(!huge_pte_none(pte)); > > > > pte = mk_huge_pte(page, prot); > > set_huge_pte_at(mm, vaddr, ptep, pte); > > huge_ptep_set_wrprotect(mm, vaddr, ptep); > > - pte = READ_ONCE(*ptep); > > + pte = huge_ptep_get(ptep); > > WARN_ON(huge_pte_write(pte)); > > > > pte = mk_huge_pte(page, prot); > > set_huge_pte_at(mm, vaddr, ptep, pte); > > huge_ptep_get_and_clear(mm, vaddr, ptep); > > - pte = READ_ONCE(*ptep); > > + pte = huge_ptep_get(ptep); > > WARN_ON(!huge_pte_none(pte)); > > > > pte = mk_huge_pte(page, prot); > > @@ -759,7 +759,7 @@ static void __init hugetlb_advanced_test > > pte = huge_pte_mkwrite(pte); > > pte = huge_pte_mkdirty(pte); > > huge_ptep_set_access_flags(vma, vaddr, ptep, pte, 1); > > - pte = READ_ONCE(*ptep); > > + pte = huge_ptep_get(ptep); > > WARN_ON(!(huge_pte_write(pte) && huge_pte_dirty(pte))); > > } > > #else > > > > 3) The pmd_protnone_tests() has an issue, because it passes a pmd to > > pmd_protnone() which has not been marked as large. We check for large > > pmd in the s390 implementation of pmd_protnone(), and will fail if a > > pmd is not large. We had similar issues before, in other helpers, where > > I changed the logic on s390 to not require the pmd large check, but I'm > > not so sure in this case. Is there a valid use case for doing > > pmd_protnone() on "normal" pmds? Or could this be changed like this: > > That is a valid question. IIUC, all existing callers for pmd_protnone() > ensure that it is indeed a huge PMD. But even assuming otherwise should > not the huge PMD requirement get checked in the caller itself rather than > in the arch helper which is just supposed to check the existence of the > dedicated PTE bit(s) for this purpose. Purely from a helper perspective > pmd_protnone() should not really care about being large even though it > might never get used without one. > > Also all platforms (except s390) derive the pmd_protnone() from their > respective pte_protnone(). I wonder why should s390 be any different > unless it is absolutely necessary. This is again because of our different page table entry layouts for pte/pmd and (large) pmd. The bits we check for pmd_protnone() are not valid for normal pmd/pte, and we would return undefined result for normal entries. Of course, we could rely on nobody calling pmd_protnone() on normal pmds, but in this case we also use pmd_large() check in pmd_protnone() for indication if the pmd is present. W/o that, we would return true for empty pmds, that doesn't sound right. Not sure if we also want to rely on nobody calling pmd_protnone() on empty pmds. Anyway, if in practice it is not correct to use pmd_protnone() on normal pmds, then I would suggest that your tests should also not do / test it. And I strongly assume that it is not correct, at least I cannot think of a valid case, and of course s390 would already be broken if there was such a case. Regards, Gerald From mboxrd@z Thu Jan 1 00:00:00 1970 From: Gerald Schaefer Subject: Re: [PATCH V2 0/3] mm/debug: Add more arch page table helper tests Date: Tue, 7 Apr 2020 17:54:40 +0200 Message-ID: <20200407175440.41cc00a5@thinkpad> References: <1585027375-9997-1-git-send-email-anshuman.khandual@arm.com> <20200331143059.29fca8fa@thinkpad> Mime-Version: 1.0 Content-Type: text/plain; charset=US-ASCII Content-Transfer-Encoding: 7bit Return-path: In-Reply-To: Sender: linux-doc-owner@vger.kernel.org To: Anshuman Khandual Cc: linux-mm@kvack.org, christophe.leroy@c-s.fr, Jonathan Corbet , Andrew Morton , Mike Rapoport , Vineet Gupta , Catalin Marinas , Will Deacon , Benjamin Herrenschmidt , Paul Mackerras , Michael Ellerman , Heiko Carstens , Vasily Gorbik , Christian Borntraeger , Thomas Gleixner , Ingo Molnar , Borislav Petkov , "H. Peter Anvin" , "Kirill A . Shutemov" , Paul Walmsley , Palmer List-Id: linux-arch.vger.kernel.org On Sun, 5 Apr 2020 17:58:14 +0530 Anshuman Khandual wrote: [...] > > > > Could be fixed like this (the first de-reference is a bit special, > > because at that point *ptep does not really point to a large (pmd) entry > > yet, it is initially an invalid pte entry, which breaks our huge_ptep_get() > > There seems to be an inconsistency on s390 platform. Even though it defines > a huge_ptep_get() override, it does not subscribe __HAVE_ARCH_HUGE_PTEP_GET > which should have forced it fallback on generic huge_ptep_get() but it does > not :) Then I realized that __HAVE_ARCH_HUGE_PTEP_GET only makes sense when > an arch uses . s390 does not use that and hence gets > away with it's own huge_ptep_get() without __HAVE_ARCH_HUGE_PTEP_GET. Sounds > confusing ? But I might not have the entire context here. Yes, that sounds very confusing. Also a bit ironic, since huge_ptep_get() was initially introduced because of s390, and now we don't select __HAVE_ARCH_HUGE_PTEP_GET... As you realized, I guess this is because we do not use generic hugetlb.h. And when __HAVE_ARCH_HUGE_PTEP_GET was introduced with commit 544db7597ad ("hugetlb: introduce generic version of huge_ptep_get"), that was probably the reason why we did not get our share of __HAVE_ARCH_HUGE_PTEP_GET. Nothing really wrong with that, but yes, very confusing. Maybe we could also select it for s390, even though it wouldn't have any functional impact (so far), just for less confusion. Maybe also thinking about using the generic hugetlb.h, not sure if the original reasons for not doing so would still apply. Now I only need to find the time... > > > conversion logic. I also added PMD_MASK alignment for RANDOM_ORVALUE, > > because we do have some special bits there in our large pmds. It seems > > to also work w/o that alignment, but it feels a bit wrong): > > Sure, we can accommodate that. > > > > > @@ -731,26 +731,26 @@ static void __init hugetlb_advanced_test > > unsigned long vaddr, pgprot_t prot) > > { > > struct page *page = pfn_to_page(pfn); > > - pte_t pte = READ_ONCE(*ptep); > > + pte_t pte; > > > > - pte = __pte(pte_val(pte) | RANDOM_ORVALUE); > > + pte = pte_mkhuge(mk_pte_phys(RANDOM_ORVALUE & PMD_MASK, prot)); > > So that keeps the existing value in 'ptep' pointer at bay and instead > construct a PTE from scratch. I would rather have READ_ONCE(*ptep) at > least provide the seed that can be ORed with RANDOM_ORVALUE before > being masked with PMD_MASK. Do you see any problem ? Yes, unfortunately. The problem is that the resulting pte is not marked as present. The conversion pte -> (huge) pmd, which is done in set_huge_pte_at() for s390, will establish an empty pmd for non-present ptes, all the RANDOM_ORVALUE stuff is lost. And a subsequent huge_ptep_get() will not result in the same original pte value. If you want to preserve and check the RANDOM_ORVALUE, it has to be a present pte, hence the mk_pte(_phys). > > Some thing like this instead. > > pte_t pte = READ_ONCE(*ptep); > pte = pte_mkhuge(__pte((pte_val(pte) | RANDOM_ORVALUE) & PMD_MASK)); > > We cannot use mk_pte_phys() as it is defined only on some platforms > without any generic fallback for others. Oh, didn't know that, sorry. What about using mk_pte() instead, at least it would result in a present pte: pte = pte_mkhuge(mk_pte(phys_to_page(RANDOM_ORVALUE & PMD_MASK), prot)); And if you also want to do some with the existing value, which seems to be an empty pte, then maybe just check if writing and reading that value with set_huge_pte_at() / huge_ptep_get() returns the same, i.e. initially w/o RANDOM_ORVALUE. So, in combination, like this (BTW, why is the barrier() needed, it is not used for the other set_huge_pte_at() calls later?): @@ -733,24 +733,28 @@ static void __init hugetlb_advanced_test struct page *page = pfn_to_page(pfn); pte_t pte = READ_ONCE(*ptep); - pte = __pte(pte_val(pte) | RANDOM_ORVALUE); + set_huge_pte_at(mm, vaddr, ptep, pte); + WARN_ON(!pte_same(pte, huge_ptep_get(ptep))); + + pte = pte_mkhuge(mk_pte(phys_to_page(RANDOM_ORVALUE & PMD_MASK), prot)); set_huge_pte_at(mm, vaddr, ptep, pte); barrier(); WARN_ON(!pte_same(pte, huge_ptep_get(ptep))); This would actually add a new test "write empty pte with set_huge_pte_at(), then verify with huge_ptep_get()", which happens to trigger a warning on s390 :-) That (new) warning actually points to misbehavior on s390, we do not write a correct empty pmd in this case, but one that is empty and also marked as large. huge_ptep_get() will then not correctly recognize it as empty and do wrong conversion. It is also not consistent with huge_ptep_get_and_clear(), where we write the empty pmd w/o marking as large. Last but not least it would also break our pmd_protnone() logic (see below). Another nice finding on s390 :-) I don't think this has any effect in practice (yet), but I will post a fix for that, just in case you will add / change this test. > > > set_huge_pte_at(mm, vaddr, ptep, pte); > > barrier(); > > WARN_ON(!pte_same(pte, huge_ptep_get(ptep))); > > huge_pte_clear(mm, vaddr, ptep, PMD_SIZE); > > - pte = READ_ONCE(*ptep); > > + pte = huge_ptep_get(ptep); > > WARN_ON(!huge_pte_none(pte)); > > > > pte = mk_huge_pte(page, prot); > > set_huge_pte_at(mm, vaddr, ptep, pte); > > huge_ptep_set_wrprotect(mm, vaddr, ptep); > > - pte = READ_ONCE(*ptep); > > + pte = huge_ptep_get(ptep); > > WARN_ON(huge_pte_write(pte)); > > > > pte = mk_huge_pte(page, prot); > > set_huge_pte_at(mm, vaddr, ptep, pte); > > huge_ptep_get_and_clear(mm, vaddr, ptep); > > - pte = READ_ONCE(*ptep); > > + pte = huge_ptep_get(ptep); > > WARN_ON(!huge_pte_none(pte)); > > > > pte = mk_huge_pte(page, prot); > > @@ -759,7 +759,7 @@ static void __init hugetlb_advanced_test > > pte = huge_pte_mkwrite(pte); > > pte = huge_pte_mkdirty(pte); > > huge_ptep_set_access_flags(vma, vaddr, ptep, pte, 1); > > - pte = READ_ONCE(*ptep); > > + pte = huge_ptep_get(ptep); > > WARN_ON(!(huge_pte_write(pte) && huge_pte_dirty(pte))); > > } > > #else > > > > 3) The pmd_protnone_tests() has an issue, because it passes a pmd to > > pmd_protnone() which has not been marked as large. We check for large > > pmd in the s390 implementation of pmd_protnone(), and will fail if a > > pmd is not large. We had similar issues before, in other helpers, where > > I changed the logic on s390 to not require the pmd large check, but I'm > > not so sure in this case. Is there a valid use case for doing > > pmd_protnone() on "normal" pmds? Or could this be changed like this: > > That is a valid question. IIUC, all existing callers for pmd_protnone() > ensure that it is indeed a huge PMD. But even assuming otherwise should > not the huge PMD requirement get checked in the caller itself rather than > in the arch helper which is just supposed to check the existence of the > dedicated PTE bit(s) for this purpose. Purely from a helper perspective > pmd_protnone() should not really care about being large even though it > might never get used without one. > > Also all platforms (except s390) derive the pmd_protnone() from their > respective pte_protnone(). I wonder why should s390 be any different > unless it is absolutely necessary. This is again because of our different page table entry layouts for pte/pmd and (large) pmd. The bits we check for pmd_protnone() are not valid for normal pmd/pte, and we would return undefined result for normal entries. Of course, we could rely on nobody calling pmd_protnone() on normal pmds, but in this case we also use pmd_large() check in pmd_protnone() for indication if the pmd is present. W/o that, we would return true for empty pmds, that doesn't sound right. Not sure if we also want to rely on nobody calling pmd_protnone() on empty pmds. Anyway, if in practice it is not correct to use pmd_protnone() on normal pmds, then I would suggest that your tests should also not do / test it. And I strongly assume that it is not correct, at least I cannot think of a valid case, and of course s390 would already be broken if there was such a case. Regards, Gerald From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-2.4 required=3.0 tests=DKIMWL_WL_HIGH,DKIM_SIGNED, DKIM_VALID,HEADER_FROM_DIFFERENT_DOMAINS,MAILING_LIST_MULTI,SPF_HELO_NONE, SPF_PASS,URIBL_BLOCKED,USER_AGENT_SANE_2 autolearn=no autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 0EEE0C2BA80 for ; Tue, 7 Apr 2020 15:55:02 +0000 (UTC) Received: from bombadil.infradead.org (bombadil.infradead.org [198.137.202.133]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPS id BC0022076E for ; Tue, 7 Apr 2020 15:55:01 +0000 (UTC) Authentication-Results: mail.kernel.org; dkim=pass (2048-bit key) header.d=lists.infradead.org header.i=@lists.infradead.org header.b="ojhYzHdo" DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org BC0022076E Authentication-Results: mail.kernel.org; dmarc=fail (p=none dis=none) header.from=de.ibm.com Authentication-Results: mail.kernel.org; spf=none smtp.mailfrom=linux-riscv-bounces+infradead-linux-riscv=archiver.kernel.org@lists.infradead.org DKIM-Signature: v=1; a=rsa-sha256; q=dns/txt; c=relaxed/relaxed; d=lists.infradead.org; s=bombadil.20170209; h=Sender:Cc:List-Subscribe: List-Help:List-Post:List-Archive:List-Unsubscribe:List-Id:Message-Id: Content-Transfer-Encoding:Content-Type:MIME-Version:References:In-Reply-To: Subject:To:From:Date:Reply-To:Content-ID:Content-Description:Resent-Date: Resent-From:Resent-Sender:Resent-To:Resent-Cc:Resent-Message-ID:List-Owner; bh=w4ikfm8QhYh6nR9GhNo4/qCSugkX5G7xhs+sge1bwLU=; b=ojhYzHdomiOF0jhPba9p2TD5o UvPJDGNbGucbIglGu4wHGNxmY7cE6JRL6v2i3hVP9lfABZNuZ6fnq50zlydenx3nC7Qqu/k+75QTD LJ72vsSdTO50ph5iK4EAZAJLpLGMPTqJ2uJp34S6eKQzCyCntxnhXVwpH7LSVEvc4IC7tjUBL0kXC O/QMBZOptsBrZm+bskgVDbPfNFV/P4Pz//fopnb/Q/tR8rJwd1T46c4aLhAqZy6C37uv1jb0kbdMq i64iUTdS4WdukhYTJ+6XS4xKy6wE5B8nZZiGk4KT0CYbeCjkrHAtSLgTfcideZvm61KLZE7zSdtxY F+Ys5xmvw==; Received: from localhost ([127.0.0.1] helo=bombadil.infradead.org) by bombadil.infradead.org with esmtp (Exim 4.92.3 #3 (Red Hat Linux)) id 1jLqZ2-0002ML-CB; Tue, 07 Apr 2020 15:55:00 +0000 Received: from mx0b-001b2d01.pphosted.com ([148.163.158.5] helo=mx0a-001b2d01.pphosted.com) by bombadil.infradead.org with esmtps (Exim 4.92.3 #3 (Red Hat Linux)) id 1jLqYz-0002K5-9E for linux-riscv@lists.infradead.org; Tue, 07 Apr 2020 15:54:59 +0000 Received: from pps.filterd (m0098414.ppops.net [127.0.0.1]) by mx0b-001b2d01.pphosted.com (8.16.0.42/8.16.0.42) with SMTP id 037FYMuU070543 for ; Tue, 7 Apr 2020 11:54:54 -0400 Received: from e06smtp03.uk.ibm.com (e06smtp03.uk.ibm.com [195.75.94.99]) by mx0b-001b2d01.pphosted.com with ESMTP id 308eu8f4gu-1 (version=TLSv1.2 cipher=AES256-GCM-SHA384 bits=256 verify=NOT) for ; Tue, 07 Apr 2020 11:54:53 -0400 Received: from localhost by e06smtp03.uk.ibm.com with IBM ESMTP SMTP Gateway: Authorized Use Only! Violators will be prosecuted for from ; Tue, 7 Apr 2020 16:54:39 +0100 Received: from b06cxnps4075.portsmouth.uk.ibm.com (9.149.109.197) by e06smtp03.uk.ibm.com (192.168.101.133) with IBM ESMTP SMTP Gateway: Authorized Use Only! Violators will be prosecuted; (version=TLSv1/SSLv3 cipher=AES256-GCM-SHA384 bits=256/256) Tue, 7 Apr 2020 16:54:32 +0100 Received: from d06av21.portsmouth.uk.ibm.com (d06av21.portsmouth.uk.ibm.com [9.149.105.232]) by b06cxnps4075.portsmouth.uk.ibm.com (8.14.9/8.14.9/NCO v10.0) with ESMTP id 037Fshw958654742 (version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-GCM-SHA384 bits=256 verify=OK); Tue, 7 Apr 2020 15:54:43 GMT Received: from d06av21.portsmouth.uk.ibm.com (unknown [127.0.0.1]) by IMSVA (Postfix) with ESMTP id 17FFE52059; Tue, 7 Apr 2020 15:54:43 +0000 (GMT) Received: from thinkpad (unknown [9.145.186.71]) by d06av21.portsmouth.uk.ibm.com (Postfix) with ESMTP id 2847C52050; Tue, 7 Apr 2020 15:54:42 +0000 (GMT) Date: Tue, 7 Apr 2020 17:54:40 +0200 From: Gerald Schaefer To: Anshuman Khandual Subject: Re: [PATCH V2 0/3] mm/debug: Add more arch page table helper tests In-Reply-To: References: <1585027375-9997-1-git-send-email-anshuman.khandual@arm.com> <20200331143059.29fca8fa@thinkpad> X-Mailer: Claws Mail 3.17.4 (GTK+ 2.24.32; x86_64-redhat-linux-gnu) MIME-Version: 1.0 Content-Type: text/plain; charset=US-ASCII Content-Transfer-Encoding: 7bit X-TM-AS-GCONF: 00 x-cbid: 20040715-0012-0000-0000-000003A01446 X-IBM-AV-DETECTION: SAVI=unused REMOTE=unused XFE=unused x-cbparentid: 20040715-0013-0000-0000-000021DD3634 Message-Id: <20200407175440.41cc00a5@thinkpad> X-Proofpoint-Virus-Version: vendor=fsecure engine=2.50.10434:6.0.138, 18.0.676 definitions=2020-04-07_07:2020-04-07, 2020-04-07 signatures=0 X-Proofpoint-Spam-Details: rule=outbound_notspam policy=outbound score=0 phishscore=0 malwarescore=0 clxscore=1015 priorityscore=1501 impostorscore=0 spamscore=0 mlxlogscore=999 suspectscore=0 lowpriorityscore=0 adultscore=0 mlxscore=0 bulkscore=0 classifier=spam adjust=0 reason=mlx scancount=1 engine=8.12.0-2003020000 definitions=main-2004070128 X-CRM114-Version: 20100106-BlameMichelson ( TRE 0.8.0 (BSD) ) MR-646709E3 X-CRM114-CacheID: sfid-20200407_085457_443073_5BC0559C X-CRM114-Status: GOOD ( 49.63 ) X-BeenThere: linux-riscv@lists.infradead.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: linux-doc@vger.kernel.org, Benjamin Herrenschmidt , Heiko Carstens , linux-mm@kvack.org, Paul Mackerras , "H. Peter Anvin" , linux-riscv@lists.infradead.org, Will Deacon , linux-arch@vger.kernel.org, linux-s390@vger.kernel.org, Jonathan Corbet , Michael Ellerman , x86@kernel.org, Mike Rapoport , Christian Borntraeger , Ingo Molnar , Gerald Schaefer , Catalin Marinas , linux-snps-arc@lists.infradead.org, Vasily Gorbik , Borislav Petkov , Paul Walmsley , "Kirill A . Shutemov" , Thomas Gleixner , linux-arm-kernel@lists.infradead.org, christophe.leroy@c-s.fr, Vineet Gupta , linux-kernel@vger.kernel.org, Palmer Dabbelt , Andrew Morton , linuxppc-dev@lists.ozlabs.org Sender: "linux-riscv" Errors-To: linux-riscv-bounces+infradead-linux-riscv=archiver.kernel.org@lists.infradead.org On Sun, 5 Apr 2020 17:58:14 +0530 Anshuman Khandual wrote: [...] > > > > Could be fixed like this (the first de-reference is a bit special, > > because at that point *ptep does not really point to a large (pmd) entry > > yet, it is initially an invalid pte entry, which breaks our huge_ptep_get() > > There seems to be an inconsistency on s390 platform. Even though it defines > a huge_ptep_get() override, it does not subscribe __HAVE_ARCH_HUGE_PTEP_GET > which should have forced it fallback on generic huge_ptep_get() but it does > not :) Then I realized that __HAVE_ARCH_HUGE_PTEP_GET only makes sense when > an arch uses . s390 does not use that and hence gets > away with it's own huge_ptep_get() without __HAVE_ARCH_HUGE_PTEP_GET. Sounds > confusing ? But I might not have the entire context here. Yes, that sounds very confusing. Also a bit ironic, since huge_ptep_get() was initially introduced because of s390, and now we don't select __HAVE_ARCH_HUGE_PTEP_GET... As you realized, I guess this is because we do not use generic hugetlb.h. And when __HAVE_ARCH_HUGE_PTEP_GET was introduced with commit 544db7597ad ("hugetlb: introduce generic version of huge_ptep_get"), that was probably the reason why we did not get our share of __HAVE_ARCH_HUGE_PTEP_GET. Nothing really wrong with that, but yes, very confusing. Maybe we could also select it for s390, even though it wouldn't have any functional impact (so far), just for less confusion. Maybe also thinking about using the generic hugetlb.h, not sure if the original reasons for not doing so would still apply. Now I only need to find the time... > > > conversion logic. I also added PMD_MASK alignment for RANDOM_ORVALUE, > > because we do have some special bits there in our large pmds. It seems > > to also work w/o that alignment, but it feels a bit wrong): > > Sure, we can accommodate that. > > > > > @@ -731,26 +731,26 @@ static void __init hugetlb_advanced_test > > unsigned long vaddr, pgprot_t prot) > > { > > struct page *page = pfn_to_page(pfn); > > - pte_t pte = READ_ONCE(*ptep); > > + pte_t pte; > > > > - pte = __pte(pte_val(pte) | RANDOM_ORVALUE); > > + pte = pte_mkhuge(mk_pte_phys(RANDOM_ORVALUE & PMD_MASK, prot)); > > So that keeps the existing value in 'ptep' pointer at bay and instead > construct a PTE from scratch. I would rather have READ_ONCE(*ptep) at > least provide the seed that can be ORed with RANDOM_ORVALUE before > being masked with PMD_MASK. Do you see any problem ? Yes, unfortunately. The problem is that the resulting pte is not marked as present. The conversion pte -> (huge) pmd, which is done in set_huge_pte_at() for s390, will establish an empty pmd for non-present ptes, all the RANDOM_ORVALUE stuff is lost. And a subsequent huge_ptep_get() will not result in the same original pte value. If you want to preserve and check the RANDOM_ORVALUE, it has to be a present pte, hence the mk_pte(_phys). > > Some thing like this instead. > > pte_t pte = READ_ONCE(*ptep); > pte = pte_mkhuge(__pte((pte_val(pte) | RANDOM_ORVALUE) & PMD_MASK)); > > We cannot use mk_pte_phys() as it is defined only on some platforms > without any generic fallback for others. Oh, didn't know that, sorry. What about using mk_pte() instead, at least it would result in a present pte: pte = pte_mkhuge(mk_pte(phys_to_page(RANDOM_ORVALUE & PMD_MASK), prot)); And if you also want to do some with the existing value, which seems to be an empty pte, then maybe just check if writing and reading that value with set_huge_pte_at() / huge_ptep_get() returns the same, i.e. initially w/o RANDOM_ORVALUE. So, in combination, like this (BTW, why is the barrier() needed, it is not used for the other set_huge_pte_at() calls later?): @@ -733,24 +733,28 @@ static void __init hugetlb_advanced_test struct page *page = pfn_to_page(pfn); pte_t pte = READ_ONCE(*ptep); - pte = __pte(pte_val(pte) | RANDOM_ORVALUE); + set_huge_pte_at(mm, vaddr, ptep, pte); + WARN_ON(!pte_same(pte, huge_ptep_get(ptep))); + + pte = pte_mkhuge(mk_pte(phys_to_page(RANDOM_ORVALUE & PMD_MASK), prot)); set_huge_pte_at(mm, vaddr, ptep, pte); barrier(); WARN_ON(!pte_same(pte, huge_ptep_get(ptep))); This would actually add a new test "write empty pte with set_huge_pte_at(), then verify with huge_ptep_get()", which happens to trigger a warning on s390 :-) That (new) warning actually points to misbehavior on s390, we do not write a correct empty pmd in this case, but one that is empty and also marked as large. huge_ptep_get() will then not correctly recognize it as empty and do wrong conversion. It is also not consistent with huge_ptep_get_and_clear(), where we write the empty pmd w/o marking as large. Last but not least it would also break our pmd_protnone() logic (see below). Another nice finding on s390 :-) I don't think this has any effect in practice (yet), but I will post a fix for that, just in case you will add / change this test. > > > set_huge_pte_at(mm, vaddr, ptep, pte); > > barrier(); > > WARN_ON(!pte_same(pte, huge_ptep_get(ptep))); > > huge_pte_clear(mm, vaddr, ptep, PMD_SIZE); > > - pte = READ_ONCE(*ptep); > > + pte = huge_ptep_get(ptep); > > WARN_ON(!huge_pte_none(pte)); > > > > pte = mk_huge_pte(page, prot); > > set_huge_pte_at(mm, vaddr, ptep, pte); > > huge_ptep_set_wrprotect(mm, vaddr, ptep); > > - pte = READ_ONCE(*ptep); > > + pte = huge_ptep_get(ptep); > > WARN_ON(huge_pte_write(pte)); > > > > pte = mk_huge_pte(page, prot); > > set_huge_pte_at(mm, vaddr, ptep, pte); > > huge_ptep_get_and_clear(mm, vaddr, ptep); > > - pte = READ_ONCE(*ptep); > > + pte = huge_ptep_get(ptep); > > WARN_ON(!huge_pte_none(pte)); > > > > pte = mk_huge_pte(page, prot); > > @@ -759,7 +759,7 @@ static void __init hugetlb_advanced_test > > pte = huge_pte_mkwrite(pte); > > pte = huge_pte_mkdirty(pte); > > huge_ptep_set_access_flags(vma, vaddr, ptep, pte, 1); > > - pte = READ_ONCE(*ptep); > > + pte = huge_ptep_get(ptep); > > WARN_ON(!(huge_pte_write(pte) && huge_pte_dirty(pte))); > > } > > #else > > > > 3) The pmd_protnone_tests() has an issue, because it passes a pmd to > > pmd_protnone() which has not been marked as large. We check for large > > pmd in the s390 implementation of pmd_protnone(), and will fail if a > > pmd is not large. We had similar issues before, in other helpers, where > > I changed the logic on s390 to not require the pmd large check, but I'm > > not so sure in this case. Is there a valid use case for doing > > pmd_protnone() on "normal" pmds? Or could this be changed like this: > > That is a valid question. IIUC, all existing callers for pmd_protnone() > ensure that it is indeed a huge PMD. But even assuming otherwise should > not the huge PMD requirement get checked in the caller itself rather than > in the arch helper which is just supposed to check the existence of the > dedicated PTE bit(s) for this purpose. Purely from a helper perspective > pmd_protnone() should not really care about being large even though it > might never get used without one. > > Also all platforms (except s390) derive the pmd_protnone() from their > respective pte_protnone(). I wonder why should s390 be any different > unless it is absolutely necessary. This is again because of our different page table entry layouts for pte/pmd and (large) pmd. The bits we check for pmd_protnone() are not valid for normal pmd/pte, and we would return undefined result for normal entries. Of course, we could rely on nobody calling pmd_protnone() on normal pmds, but in this case we also use pmd_large() check in pmd_protnone() for indication if the pmd is present. W/o that, we would return true for empty pmds, that doesn't sound right. Not sure if we also want to rely on nobody calling pmd_protnone() on empty pmds. Anyway, if in practice it is not correct to use pmd_protnone() on normal pmds, then I would suggest that your tests should also not do / test it. And I strongly assume that it is not correct, at least I cannot think of a valid case, and of course s390 would already be broken if there was such a case. Regards, Gerald From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-2.2 required=3.0 tests=HEADER_FROM_DIFFERENT_DOMAINS, MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS,URIBL_BLOCKED,USER_AGENT_SANE_2 autolearn=no autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id A2431C2BA80 for ; Tue, 7 Apr 2020 15:57:38 +0000 (UTC) Received: from lists.ozlabs.org (lists.ozlabs.org [203.11.71.2]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPS id 34EC42072A for ; Tue, 7 Apr 2020 15:57:38 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org 34EC42072A Authentication-Results: mail.kernel.org; dmarc=fail (p=none dis=none) header.from=de.ibm.com Authentication-Results: mail.kernel.org; spf=pass smtp.mailfrom=linuxppc-dev-bounces+linuxppc-dev=archiver.kernel.org@lists.ozlabs.org Received: from lists.ozlabs.org (lists.ozlabs.org [IPv6:2401:3900:2:1::3]) by lists.ozlabs.org (Postfix) with ESMTP id 48xX9W5v6zzDqyB for ; Wed, 8 Apr 2020 01:57:35 +1000 (AEST) Authentication-Results: lists.ozlabs.org; spf=pass (sender SPF authorized) smtp.mailfrom=de.ibm.com (client-ip=148.163.156.1; helo=mx0a-001b2d01.pphosted.com; envelope-from=gerald.schaefer@de.ibm.com; receiver=) Authentication-Results: lists.ozlabs.org; dmarc=none (p=none dis=none) header.from=de.ibm.com Received: from mx0a-001b2d01.pphosted.com (mx0a-001b2d01.pphosted.com [148.163.156.1]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by lists.ozlabs.org (Postfix) with ESMTPS id 48xX6f6rkfzDqCs for ; Wed, 8 Apr 2020 01:55:06 +1000 (AEST) Received: from pps.filterd (m0187473.ppops.net [127.0.0.1]) by mx0a-001b2d01.pphosted.com (8.16.0.42/8.16.0.42) with SMTP id 037FXt3T016091 for ; Tue, 7 Apr 2020 11:55:04 -0400 Received: from e06smtp03.uk.ibm.com (e06smtp03.uk.ibm.com [195.75.94.99]) by mx0a-001b2d01.pphosted.com with ESMTP id 306nvvw7dj-1 (version=TLSv1.2 cipher=AES256-GCM-SHA384 bits=256 verify=NOT) for ; Tue, 07 Apr 2020 11:55:02 -0400 Received: from localhost by e06smtp03.uk.ibm.com with IBM ESMTP SMTP Gateway: Authorized Use Only! Violators will be prosecuted for from ; Tue, 7 Apr 2020 16:54:39 +0100 Received: from b06cxnps4075.portsmouth.uk.ibm.com (9.149.109.197) by e06smtp03.uk.ibm.com (192.168.101.133) with IBM ESMTP SMTP Gateway: Authorized Use Only! Violators will be prosecuted; (version=TLSv1/SSLv3 cipher=AES256-GCM-SHA384 bits=256/256) Tue, 7 Apr 2020 16:54:32 +0100 Received: from d06av21.portsmouth.uk.ibm.com (d06av21.portsmouth.uk.ibm.com [9.149.105.232]) by b06cxnps4075.portsmouth.uk.ibm.com (8.14.9/8.14.9/NCO v10.0) with ESMTP id 037Fshw958654742 (version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-GCM-SHA384 bits=256 verify=OK); Tue, 7 Apr 2020 15:54:43 GMT Received: from d06av21.portsmouth.uk.ibm.com (unknown [127.0.0.1]) by IMSVA (Postfix) with ESMTP id 17FFE52059; Tue, 7 Apr 2020 15:54:43 +0000 (GMT) Received: from thinkpad (unknown [9.145.186.71]) by d06av21.portsmouth.uk.ibm.com (Postfix) with ESMTP id 2847C52050; Tue, 7 Apr 2020 15:54:42 +0000 (GMT) Date: Tue, 7 Apr 2020 17:54:40 +0200 From: Gerald Schaefer To: Anshuman Khandual Subject: Re: [PATCH V2 0/3] mm/debug: Add more arch page table helper tests In-Reply-To: References: <1585027375-9997-1-git-send-email-anshuman.khandual@arm.com> <20200331143059.29fca8fa@thinkpad> X-Mailer: Claws Mail 3.17.4 (GTK+ 2.24.32; x86_64-redhat-linux-gnu) MIME-Version: 1.0 Content-Type: text/plain; charset=US-ASCII Content-Transfer-Encoding: 7bit X-TM-AS-GCONF: 00 x-cbid: 20040715-0012-0000-0000-000003A01446 X-IBM-AV-DETECTION: SAVI=unused REMOTE=unused XFE=unused x-cbparentid: 20040715-0013-0000-0000-000021DD3634 Message-Id: <20200407175440.41cc00a5@thinkpad> X-Proofpoint-Virus-Version: vendor=fsecure engine=2.50.10434:6.0.138, 18.0.676 definitions=2020-04-07_07:2020-04-07, 2020-04-07 signatures=0 X-Proofpoint-Spam-Details: rule=outbound_notspam policy=outbound score=0 impostorscore=0 clxscore=1015 malwarescore=0 phishscore=0 mlxscore=0 priorityscore=1501 adultscore=0 suspectscore=0 lowpriorityscore=0 mlxlogscore=999 spamscore=0 bulkscore=0 classifier=spam adjust=0 reason=mlx scancount=1 engine=8.12.0-2003020000 definitions=main-2004070128 X-BeenThere: linuxppc-dev@lists.ozlabs.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: Linux on PowerPC Developers Mail List List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: linux-doc@vger.kernel.org, Heiko Carstens , linux-mm@kvack.org, Paul Mackerras , "H. Peter Anvin" , linux-riscv@lists.infradead.org, Will Deacon , linux-arch@vger.kernel.org, linux-s390@vger.kernel.org, Jonathan Corbet , x86@kernel.org, Mike Rapoport , Christian Borntraeger , Ingo Molnar , Gerald Schaefer , Catalin Marinas , linux-snps-arc@lists.infradead.org, Vasily Gorbik , Borislav Petkov , Paul Walmsley , "Kirill A . Shutemov" , Thomas Gleixner , linux-arm-kernel@lists.infradead.org, Vineet Gupta , linux-kernel@vger.kernel.org, Palmer Dabbelt , Andrew Morton , linuxppc-dev@lists.ozlabs.org Errors-To: linuxppc-dev-bounces+linuxppc-dev=archiver.kernel.org@lists.ozlabs.org Sender: "Linuxppc-dev" On Sun, 5 Apr 2020 17:58:14 +0530 Anshuman Khandual wrote: [...] > > > > Could be fixed like this (the first de-reference is a bit special, > > because at that point *ptep does not really point to a large (pmd) entry > > yet, it is initially an invalid pte entry, which breaks our huge_ptep_get() > > There seems to be an inconsistency on s390 platform. Even though it defines > a huge_ptep_get() override, it does not subscribe __HAVE_ARCH_HUGE_PTEP_GET > which should have forced it fallback on generic huge_ptep_get() but it does > not :) Then I realized that __HAVE_ARCH_HUGE_PTEP_GET only makes sense when > an arch uses . s390 does not use that and hence gets > away with it's own huge_ptep_get() without __HAVE_ARCH_HUGE_PTEP_GET. Sounds > confusing ? But I might not have the entire context here. Yes, that sounds very confusing. Also a bit ironic, since huge_ptep_get() was initially introduced because of s390, and now we don't select __HAVE_ARCH_HUGE_PTEP_GET... As you realized, I guess this is because we do not use generic hugetlb.h. And when __HAVE_ARCH_HUGE_PTEP_GET was introduced with commit 544db7597ad ("hugetlb: introduce generic version of huge_ptep_get"), that was probably the reason why we did not get our share of __HAVE_ARCH_HUGE_PTEP_GET. Nothing really wrong with that, but yes, very confusing. Maybe we could also select it for s390, even though it wouldn't have any functional impact (so far), just for less confusion. Maybe also thinking about using the generic hugetlb.h, not sure if the original reasons for not doing so would still apply. Now I only need to find the time... > > > conversion logic. I also added PMD_MASK alignment for RANDOM_ORVALUE, > > because we do have some special bits there in our large pmds. It seems > > to also work w/o that alignment, but it feels a bit wrong): > > Sure, we can accommodate that. > > > > > @@ -731,26 +731,26 @@ static void __init hugetlb_advanced_test > > unsigned long vaddr, pgprot_t prot) > > { > > struct page *page = pfn_to_page(pfn); > > - pte_t pte = READ_ONCE(*ptep); > > + pte_t pte; > > > > - pte = __pte(pte_val(pte) | RANDOM_ORVALUE); > > + pte = pte_mkhuge(mk_pte_phys(RANDOM_ORVALUE & PMD_MASK, prot)); > > So that keeps the existing value in 'ptep' pointer at bay and instead > construct a PTE from scratch. I would rather have READ_ONCE(*ptep) at > least provide the seed that can be ORed with RANDOM_ORVALUE before > being masked with PMD_MASK. Do you see any problem ? Yes, unfortunately. The problem is that the resulting pte is not marked as present. The conversion pte -> (huge) pmd, which is done in set_huge_pte_at() for s390, will establish an empty pmd for non-present ptes, all the RANDOM_ORVALUE stuff is lost. And a subsequent huge_ptep_get() will not result in the same original pte value. If you want to preserve and check the RANDOM_ORVALUE, it has to be a present pte, hence the mk_pte(_phys). > > Some thing like this instead. > > pte_t pte = READ_ONCE(*ptep); > pte = pte_mkhuge(__pte((pte_val(pte) | RANDOM_ORVALUE) & PMD_MASK)); > > We cannot use mk_pte_phys() as it is defined only on some platforms > without any generic fallback for others. Oh, didn't know that, sorry. What about using mk_pte() instead, at least it would result in a present pte: pte = pte_mkhuge(mk_pte(phys_to_page(RANDOM_ORVALUE & PMD_MASK), prot)); And if you also want to do some with the existing value, which seems to be an empty pte, then maybe just check if writing and reading that value with set_huge_pte_at() / huge_ptep_get() returns the same, i.e. initially w/o RANDOM_ORVALUE. So, in combination, like this (BTW, why is the barrier() needed, it is not used for the other set_huge_pte_at() calls later?): @@ -733,24 +733,28 @@ static void __init hugetlb_advanced_test struct page *page = pfn_to_page(pfn); pte_t pte = READ_ONCE(*ptep); - pte = __pte(pte_val(pte) | RANDOM_ORVALUE); + set_huge_pte_at(mm, vaddr, ptep, pte); + WARN_ON(!pte_same(pte, huge_ptep_get(ptep))); + + pte = pte_mkhuge(mk_pte(phys_to_page(RANDOM_ORVALUE & PMD_MASK), prot)); set_huge_pte_at(mm, vaddr, ptep, pte); barrier(); WARN_ON(!pte_same(pte, huge_ptep_get(ptep))); This would actually add a new test "write empty pte with set_huge_pte_at(), then verify with huge_ptep_get()", which happens to trigger a warning on s390 :-) That (new) warning actually points to misbehavior on s390, we do not write a correct empty pmd in this case, but one that is empty and also marked as large. huge_ptep_get() will then not correctly recognize it as empty and do wrong conversion. It is also not consistent with huge_ptep_get_and_clear(), where we write the empty pmd w/o marking as large. Last but not least it would also break our pmd_protnone() logic (see below). Another nice finding on s390 :-) I don't think this has any effect in practice (yet), but I will post a fix for that, just in case you will add / change this test. > > > set_huge_pte_at(mm, vaddr, ptep, pte); > > barrier(); > > WARN_ON(!pte_same(pte, huge_ptep_get(ptep))); > > huge_pte_clear(mm, vaddr, ptep, PMD_SIZE); > > - pte = READ_ONCE(*ptep); > > + pte = huge_ptep_get(ptep); > > WARN_ON(!huge_pte_none(pte)); > > > > pte = mk_huge_pte(page, prot); > > set_huge_pte_at(mm, vaddr, ptep, pte); > > huge_ptep_set_wrprotect(mm, vaddr, ptep); > > - pte = READ_ONCE(*ptep); > > + pte = huge_ptep_get(ptep); > > WARN_ON(huge_pte_write(pte)); > > > > pte = mk_huge_pte(page, prot); > > set_huge_pte_at(mm, vaddr, ptep, pte); > > huge_ptep_get_and_clear(mm, vaddr, ptep); > > - pte = READ_ONCE(*ptep); > > + pte = huge_ptep_get(ptep); > > WARN_ON(!huge_pte_none(pte)); > > > > pte = mk_huge_pte(page, prot); > > @@ -759,7 +759,7 @@ static void __init hugetlb_advanced_test > > pte = huge_pte_mkwrite(pte); > > pte = huge_pte_mkdirty(pte); > > huge_ptep_set_access_flags(vma, vaddr, ptep, pte, 1); > > - pte = READ_ONCE(*ptep); > > + pte = huge_ptep_get(ptep); > > WARN_ON(!(huge_pte_write(pte) && huge_pte_dirty(pte))); > > } > > #else > > > > 3) The pmd_protnone_tests() has an issue, because it passes a pmd to > > pmd_protnone() which has not been marked as large. We check for large > > pmd in the s390 implementation of pmd_protnone(), and will fail if a > > pmd is not large. We had similar issues before, in other helpers, where > > I changed the logic on s390 to not require the pmd large check, but I'm > > not so sure in this case. Is there a valid use case for doing > > pmd_protnone() on "normal" pmds? Or could this be changed like this: > > That is a valid question. IIUC, all existing callers for pmd_protnone() > ensure that it is indeed a huge PMD. But even assuming otherwise should > not the huge PMD requirement get checked in the caller itself rather than > in the arch helper which is just supposed to check the existence of the > dedicated PTE bit(s) for this purpose. Purely from a helper perspective > pmd_protnone() should not really care about being large even though it > might never get used without one. > > Also all platforms (except s390) derive the pmd_protnone() from their > respective pte_protnone(). I wonder why should s390 be any different > unless it is absolutely necessary. This is again because of our different page table entry layouts for pte/pmd and (large) pmd. The bits we check for pmd_protnone() are not valid for normal pmd/pte, and we would return undefined result for normal entries. Of course, we could rely on nobody calling pmd_protnone() on normal pmds, but in this case we also use pmd_large() check in pmd_protnone() for indication if the pmd is present. W/o that, we would return true for empty pmds, that doesn't sound right. Not sure if we also want to rely on nobody calling pmd_protnone() on empty pmds. Anyway, if in practice it is not correct to use pmd_protnone() on normal pmds, then I would suggest that your tests should also not do / test it. And I strongly assume that it is not correct, at least I cannot think of a valid case, and of course s390 would already be broken if there was such a case. Regards, Gerald From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-2.4 required=3.0 tests=DKIMWL_WL_HIGH,DKIM_SIGNED, DKIM_VALID,HEADER_FROM_DIFFERENT_DOMAINS,MAILING_LIST_MULTI,SPF_HELO_NONE, SPF_PASS,URIBL_BLOCKED,USER_AGENT_SANE_2 autolearn=no autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 0FB9FC2BA1A for ; Tue, 7 Apr 2020 15:54:59 +0000 (UTC) Received: from bombadil.infradead.org (bombadil.infradead.org [198.137.202.133]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPS id CE6932072A for ; Tue, 7 Apr 2020 15:54:58 +0000 (UTC) Authentication-Results: mail.kernel.org; dkim=pass (2048-bit key) header.d=lists.infradead.org header.i=@lists.infradead.org header.b="JvlT/hjb" DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org CE6932072A Authentication-Results: mail.kernel.org; dmarc=fail (p=none dis=none) header.from=de.ibm.com Authentication-Results: mail.kernel.org; spf=none smtp.mailfrom=linux-snps-arc-bounces+linux-snps-arc=archiver.kernel.org@lists.infradead.org DKIM-Signature: v=1; a=rsa-sha256; q=dns/txt; c=relaxed/relaxed; d=lists.infradead.org; s=bombadil.20170209; h=Sender: Content-Transfer-Encoding:Content-Type:Cc:List-Subscribe:List-Help:List-Post: List-Archive:List-Unsubscribe:List-Id:Message-Id:MIME-Version:References: In-Reply-To:Subject:To:From:Date:Reply-To:Content-ID:Content-Description: Resent-Date:Resent-From:Resent-Sender:Resent-To:Resent-Cc:Resent-Message-ID: List-Owner; bh=t3MzG0XrkjQFjIZBj/Dnd/7H4TI3ifGv3Ck+bocTmRo=; b=JvlT/hjbeTTqlJ wKav6g47HvIwf9bJXp6OJY4oDCaRhXF6AG+QlYljKh4nGNfkSF09kiUJtjc2L8itNduGsnoKtAmjW ANf7wTiQlXYJoDI1p0L9ElrxNilnynx72BLvoe1pTytyzlLkAGp+VKAWXu/f8gqOY+ibg9lCW6eNC Yzgg5jPs9BeitEb2p7ZkZsrc7P0zaeMQHiazNZCUBOSF/7WM6BRhHnJvQn6gSFBCblZCIGT8We20p w4LNHH0EN6udKBv/Z81lT9S7zm+icIJ+smEeeirElLEJBKbv9VkDM0kRqdLuSjBW6VBpMHTxzCo1/ aaDHRYFYdhVCc//rb+7A==; Received: from localhost ([127.0.0.1] helo=bombadil.infradead.org) by bombadil.infradead.org with esmtp (Exim 4.92.3 #3 (Red Hat Linux)) id 1jLqZ0-0002L4-0B; Tue, 07 Apr 2020 15:54:58 +0000 Received: from mx0b-001b2d01.pphosted.com ([148.163.158.5] helo=mx0a-001b2d01.pphosted.com) by bombadil.infradead.org with esmtps (Exim 4.92.3 #3 (Red Hat Linux)) id 1jLqYx-0002K4-NN for linux-snps-arc@lists.infradead.org; Tue, 07 Apr 2020 15:54:57 +0000 Received: from pps.filterd (m0098416.ppops.net [127.0.0.1]) by mx0b-001b2d01.pphosted.com (8.16.0.42/8.16.0.42) with SMTP id 037FYjvb050763 for ; Tue, 7 Apr 2020 11:54:54 -0400 Received: from e06smtp03.uk.ibm.com (e06smtp03.uk.ibm.com [195.75.94.99]) by mx0b-001b2d01.pphosted.com with ESMTP id 306kuwpkrk-1 (version=TLSv1.2 cipher=AES256-GCM-SHA384 bits=256 verify=NOT) for ; Tue, 07 Apr 2020 11:54:53 -0400 Received: from localhost by e06smtp03.uk.ibm.com with IBM ESMTP SMTP Gateway: Authorized Use Only! Violators will be prosecuted for from ; Tue, 7 Apr 2020 16:54:39 +0100 Received: from b06cxnps4075.portsmouth.uk.ibm.com (9.149.109.197) by e06smtp03.uk.ibm.com (192.168.101.133) with IBM ESMTP SMTP Gateway: Authorized Use Only! Violators will be prosecuted; (version=TLSv1/SSLv3 cipher=AES256-GCM-SHA384 bits=256/256) Tue, 7 Apr 2020 16:54:32 +0100 Received: from d06av21.portsmouth.uk.ibm.com (d06av21.portsmouth.uk.ibm.com [9.149.105.232]) by b06cxnps4075.portsmouth.uk.ibm.com (8.14.9/8.14.9/NCO v10.0) with ESMTP id 037Fshw958654742 (version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-GCM-SHA384 bits=256 verify=OK); Tue, 7 Apr 2020 15:54:43 GMT Received: from d06av21.portsmouth.uk.ibm.com (unknown [127.0.0.1]) by IMSVA (Postfix) with ESMTP id 17FFE52059; Tue, 7 Apr 2020 15:54:43 +0000 (GMT) Received: from thinkpad (unknown [9.145.186.71]) by d06av21.portsmouth.uk.ibm.com (Postfix) with ESMTP id 2847C52050; Tue, 7 Apr 2020 15:54:42 +0000 (GMT) Date: Tue, 7 Apr 2020 17:54:40 +0200 From: Gerald Schaefer To: Anshuman Khandual Subject: Re: [PATCH V2 0/3] mm/debug: Add more arch page table helper tests In-Reply-To: References: <1585027375-9997-1-git-send-email-anshuman.khandual@arm.com> <20200331143059.29fca8fa@thinkpad> X-Mailer: Claws Mail 3.17.4 (GTK+ 2.24.32; x86_64-redhat-linux-gnu) MIME-Version: 1.0 X-TM-AS-GCONF: 00 x-cbid: 20040715-0012-0000-0000-000003A01446 X-IBM-AV-DETECTION: SAVI=unused REMOTE=unused XFE=unused x-cbparentid: 20040715-0013-0000-0000-000021DD3634 Message-Id: <20200407175440.41cc00a5@thinkpad> X-Proofpoint-Virus-Version: vendor=fsecure engine=2.50.10434:6.0.138, 18.0.676 definitions=2020-04-07_07:2020-04-07, 2020-04-07 signatures=0 X-Proofpoint-Spam-Details: rule=outbound_notspam policy=outbound score=0 mlxscore=0 adultscore=0 malwarescore=0 priorityscore=1501 impostorscore=0 suspectscore=0 phishscore=0 mlxlogscore=999 clxscore=1015 spamscore=0 lowpriorityscore=0 bulkscore=0 classifier=spam adjust=0 reason=mlx scancount=1 engine=8.12.0-2003020000 definitions=main-2004070128 X-CRM114-Version: 20100106-BlameMichelson ( TRE 0.8.0 (BSD) ) MR-646709E3 X-CRM114-CacheID: sfid-20200407_085455_885363_7315951A X-CRM114-Status: GOOD ( 49.22 ) X-BeenThere: linux-snps-arc@lists.infradead.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: Linux on Synopsys ARC Processors List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: linux-doc@vger.kernel.org, Benjamin Herrenschmidt , Heiko Carstens , linux-mm@kvack.org, Paul Mackerras , "H. Peter Anvin" , linux-riscv@lists.infradead.org, Will Deacon , linux-arch@vger.kernel.org, linux-s390@vger.kernel.org, Jonathan Corbet , Michael Ellerman , x86@kernel.org, Mike Rapoport , Christian Borntraeger , Ingo Molnar , Gerald Schaefer , Catalin Marinas , linux-snps-arc@lists.infradead.org, Vasily Gorbik , Borislav Petkov , Paul Walmsley , "Kirill A . Shutemov" , Thomas Gleixner , linux-arm-kernel@lists.infradead.org, christophe.leroy@c-s.fr, Vineet Gupta , linux-kernel@vger.kernel.org, Palmer Dabbelt , Andrew Morton , linuxppc-dev@lists.ozlabs.org Content-Type: text/plain; charset="us-ascii" Content-Transfer-Encoding: 7bit Sender: "linux-snps-arc" Errors-To: linux-snps-arc-bounces+linux-snps-arc=archiver.kernel.org@lists.infradead.org On Sun, 5 Apr 2020 17:58:14 +0530 Anshuman Khandual wrote: [...] > > > > Could be fixed like this (the first de-reference is a bit special, > > because at that point *ptep does not really point to a large (pmd) entry > > yet, it is initially an invalid pte entry, which breaks our huge_ptep_get() > > There seems to be an inconsistency on s390 platform. Even though it defines > a huge_ptep_get() override, it does not subscribe __HAVE_ARCH_HUGE_PTEP_GET > which should have forced it fallback on generic huge_ptep_get() but it does > not :) Then I realized that __HAVE_ARCH_HUGE_PTEP_GET only makes sense when > an arch uses . s390 does not use that and hence gets > away with it's own huge_ptep_get() without __HAVE_ARCH_HUGE_PTEP_GET. Sounds > confusing ? But I might not have the entire context here. Yes, that sounds very confusing. Also a bit ironic, since huge_ptep_get() was initially introduced because of s390, and now we don't select __HAVE_ARCH_HUGE_PTEP_GET... As you realized, I guess this is because we do not use generic hugetlb.h. And when __HAVE_ARCH_HUGE_PTEP_GET was introduced with commit 544db7597ad ("hugetlb: introduce generic version of huge_ptep_get"), that was probably the reason why we did not get our share of __HAVE_ARCH_HUGE_PTEP_GET. Nothing really wrong with that, but yes, very confusing. Maybe we could also select it for s390, even though it wouldn't have any functional impact (so far), just for less confusion. Maybe also thinking about using the generic hugetlb.h, not sure if the original reasons for not doing so would still apply. Now I only need to find the time... > > > conversion logic. I also added PMD_MASK alignment for RANDOM_ORVALUE, > > because we do have some special bits there in our large pmds. It seems > > to also work w/o that alignment, but it feels a bit wrong): > > Sure, we can accommodate that. > > > > > @@ -731,26 +731,26 @@ static void __init hugetlb_advanced_test > > unsigned long vaddr, pgprot_t prot) > > { > > struct page *page = pfn_to_page(pfn); > > - pte_t pte = READ_ONCE(*ptep); > > + pte_t pte; > > > > - pte = __pte(pte_val(pte) | RANDOM_ORVALUE); > > + pte = pte_mkhuge(mk_pte_phys(RANDOM_ORVALUE & PMD_MASK, prot)); > > So that keeps the existing value in 'ptep' pointer at bay and instead > construct a PTE from scratch. I would rather have READ_ONCE(*ptep) at > least provide the seed that can be ORed with RANDOM_ORVALUE before > being masked with PMD_MASK. Do you see any problem ? Yes, unfortunately. The problem is that the resulting pte is not marked as present. The conversion pte -> (huge) pmd, which is done in set_huge_pte_at() for s390, will establish an empty pmd for non-present ptes, all the RANDOM_ORVALUE stuff is lost. And a subsequent huge_ptep_get() will not result in the same original pte value. If you want to preserve and check the RANDOM_ORVALUE, it has to be a present pte, hence the mk_pte(_phys). > > Some thing like this instead. > > pte_t pte = READ_ONCE(*ptep); > pte = pte_mkhuge(__pte((pte_val(pte) | RANDOM_ORVALUE) & PMD_MASK)); > > We cannot use mk_pte_phys() as it is defined only on some platforms > without any generic fallback for others. Oh, didn't know that, sorry. What about using mk_pte() instead, at least it would result in a present pte: pte = pte_mkhuge(mk_pte(phys_to_page(RANDOM_ORVALUE & PMD_MASK), prot)); And if you also want to do some with the existing value, which seems to be an empty pte, then maybe just check if writing and reading that value with set_huge_pte_at() / huge_ptep_get() returns the same, i.e. initially w/o RANDOM_ORVALUE. So, in combination, like this (BTW, why is the barrier() needed, it is not used for the other set_huge_pte_at() calls later?): @@ -733,24 +733,28 @@ static void __init hugetlb_advanced_test struct page *page = pfn_to_page(pfn); pte_t pte = READ_ONCE(*ptep); - pte = __pte(pte_val(pte) | RANDOM_ORVALUE); + set_huge_pte_at(mm, vaddr, ptep, pte); + WARN_ON(!pte_same(pte, huge_ptep_get(ptep))); + + pte = pte_mkhuge(mk_pte(phys_to_page(RANDOM_ORVALUE & PMD_MASK), prot)); set_huge_pte_at(mm, vaddr, ptep, pte); barrier(); WARN_ON(!pte_same(pte, huge_ptep_get(ptep))); This would actually add a new test "write empty pte with set_huge_pte_at(), then verify with huge_ptep_get()", which happens to trigger a warning on s390 :-) That (new) warning actually points to misbehavior on s390, we do not write a correct empty pmd in this case, but one that is empty and also marked as large. huge_ptep_get() will then not correctly recognize it as empty and do wrong conversion. It is also not consistent with huge_ptep_get_and_clear(), where we write the empty pmd w/o marking as large. Last but not least it would also break our pmd_protnone() logic (see below). Another nice finding on s390 :-) I don't think this has any effect in practice (yet), but I will post a fix for that, just in case you will add / change this test. > > > set_huge_pte_at(mm, vaddr, ptep, pte); > > barrier(); > > WARN_ON(!pte_same(pte, huge_ptep_get(ptep))); > > huge_pte_clear(mm, vaddr, ptep, PMD_SIZE); > > - pte = READ_ONCE(*ptep); > > + pte = huge_ptep_get(ptep); > > WARN_ON(!huge_pte_none(pte)); > > > > pte = mk_huge_pte(page, prot); > > set_huge_pte_at(mm, vaddr, ptep, pte); > > huge_ptep_set_wrprotect(mm, vaddr, ptep); > > - pte = READ_ONCE(*ptep); > > + pte = huge_ptep_get(ptep); > > WARN_ON(huge_pte_write(pte)); > > > > pte = mk_huge_pte(page, prot); > > set_huge_pte_at(mm, vaddr, ptep, pte); > > huge_ptep_get_and_clear(mm, vaddr, ptep); > > - pte = READ_ONCE(*ptep); > > + pte = huge_ptep_get(ptep); > > WARN_ON(!huge_pte_none(pte)); > > > > pte = mk_huge_pte(page, prot); > > @@ -759,7 +759,7 @@ static void __init hugetlb_advanced_test > > pte = huge_pte_mkwrite(pte); > > pte = huge_pte_mkdirty(pte); > > huge_ptep_set_access_flags(vma, vaddr, ptep, pte, 1); > > - pte = READ_ONCE(*ptep); > > + pte = huge_ptep_get(ptep); > > WARN_ON(!(huge_pte_write(pte) && huge_pte_dirty(pte))); > > } > > #else > > > > 3) The pmd_protnone_tests() has an issue, because it passes a pmd to > > pmd_protnone() which has not been marked as large. We check for large > > pmd in the s390 implementation of pmd_protnone(), and will fail if a > > pmd is not large. We had similar issues before, in other helpers, where > > I changed the logic on s390 to not require the pmd large check, but I'm > > not so sure in this case. Is there a valid use case for doing > > pmd_protnone() on "normal" pmds? Or could this be changed like this: > > That is a valid question. IIUC, all existing callers for pmd_protnone() > ensure that it is indeed a huge PMD. But even assuming otherwise should > not the huge PMD requirement get checked in the caller itself rather than > in the arch helper which is just supposed to check the existence of the > dedicated PTE bit(s) for this purpose. Purely from a helper perspective > pmd_protnone() should not really care about being large even though it > might never get used without one. > > Also all platforms (except s390) derive the pmd_protnone() from their > respective pte_protnone(). I wonder why should s390 be any different > unless it is absolutely necessary. This is again because of our different page table entry layouts for pte/pmd and (large) pmd. The bits we check for pmd_protnone() are not valid for normal pmd/pte, and we would return undefined result for normal entries. Of course, we could rely on nobody calling pmd_protnone() on normal pmds, but in this case we also use pmd_large() check in pmd_protnone() for indication if the pmd is present. W/o that, we would return true for empty pmds, that doesn't sound right. Not sure if we also want to rely on nobody calling pmd_protnone() on empty pmds. Anyway, if in practice it is not correct to use pmd_protnone() on normal pmds, then I would suggest that your tests should also not do / test it. And I strongly assume that it is not correct, at least I cannot think of a valid case, and of course s390 would already be broken if there was such a case. Regards, Gerald _______________________________________________ linux-snps-arc mailing list linux-snps-arc@lists.infradead.org http://lists.infradead.org/mailman/listinfo/linux-snps-arc From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-2.4 required=3.0 tests=DKIMWL_WL_HIGH,DKIM_SIGNED, DKIM_VALID,HEADER_FROM_DIFFERENT_DOMAINS,MAILING_LIST_MULTI,SPF_HELO_NONE, SPF_PASS,URIBL_BLOCKED,USER_AGENT_SANE_2 autolearn=no autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id C2C0AC2BA1A for ; Tue, 7 Apr 2020 15:55:31 +0000 (UTC) Received: from bombadil.infradead.org (bombadil.infradead.org [198.137.202.133]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPS id 8C8E620730 for ; Tue, 7 Apr 2020 15:55:31 +0000 (UTC) Authentication-Results: mail.kernel.org; dkim=pass (2048-bit key) header.d=lists.infradead.org header.i=@lists.infradead.org header.b="jIEKTow7" DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org 8C8E620730 Authentication-Results: mail.kernel.org; dmarc=fail (p=none dis=none) header.from=de.ibm.com Authentication-Results: mail.kernel.org; spf=none smtp.mailfrom=linux-arm-kernel-bounces+infradead-linux-arm-kernel=archiver.kernel.org@lists.infradead.org DKIM-Signature: v=1; a=rsa-sha256; q=dns/txt; c=relaxed/relaxed; d=lists.infradead.org; s=bombadil.20170209; h=Sender: Content-Transfer-Encoding:Content-Type:Cc:List-Subscribe:List-Help:List-Post: List-Archive:List-Unsubscribe:List-Id:Message-Id:MIME-Version:References: In-Reply-To:Subject:To:From:Date:Reply-To:Content-ID:Content-Description: Resent-Date:Resent-From:Resent-Sender:Resent-To:Resent-Cc:Resent-Message-ID: List-Owner; bh=MOUCFmKPzT2VHvaP2Suo3ABKZUzhcdYGz/UWNPOzsZ0=; b=jIEKTow7te928O 4WSLXeqP0DypwabS6wHgreBG00INyg/gLxd6fnGLNjDDUUSqlifmtO2MojPUGBG6S4iJxaJgufTNI /HwoYe+ENulA+WszRAY/EtwzrrqxeDvt4aaWbWjQADeUO2LE/ONGnZw7g5uRqYLVsWXXbzrXRws0/ 5bNCqxixtZw6VS5M9nrNFMHiq8Jw6FtcYbH6fcBxzwfb3zGMBzkUZQUEuLVBGZZKbZGU4ufA17nk1 bbJ72Yykaiiy3BmPvmU37edycEfHKckf5S2lVcN/fSFM4+tUybHeYaohs4qijzXVcd8hZbxAciyPG feoK7b3jatXeOv6IScPQ==; Received: from localhost ([127.0.0.1] helo=bombadil.infradead.org) by bombadil.infradead.org with esmtp (Exim 4.92.3 #3 (Red Hat Linux)) id 1jLqZX-0004w4-6H; Tue, 07 Apr 2020 15:55:31 +0000 Received: from mx0b-001b2d01.pphosted.com ([148.163.158.5] helo=mx0a-001b2d01.pphosted.com) by bombadil.infradead.org with esmtps (Exim 4.92.3 #3 (Red Hat Linux)) id 1jLqZT-0002K2-Tw for linux-arm-kernel@lists.infradead.org; Tue, 07 Apr 2020 15:55:29 +0000 Received: from pps.filterd (m0098416.ppops.net [127.0.0.1]) by mx0b-001b2d01.pphosted.com (8.16.0.42/8.16.0.42) with SMTP id 037Fm3ml109830 for ; Tue, 7 Apr 2020 11:54:53 -0400 Received: from e06smtp03.uk.ibm.com (e06smtp03.uk.ibm.com [195.75.94.99]) by mx0b-001b2d01.pphosted.com with ESMTP id 306kuwpkrg-1 (version=TLSv1.2 cipher=AES256-GCM-SHA384 bits=256 verify=NOT) for ; Tue, 07 Apr 2020 11:54:53 -0400 Received: from localhost by e06smtp03.uk.ibm.com with IBM ESMTP SMTP Gateway: Authorized Use Only! Violators will be prosecuted for from ; Tue, 7 Apr 2020 16:54:39 +0100 Received: from b06cxnps4075.portsmouth.uk.ibm.com (9.149.109.197) by e06smtp03.uk.ibm.com (192.168.101.133) with IBM ESMTP SMTP Gateway: Authorized Use Only! Violators will be prosecuted; (version=TLSv1/SSLv3 cipher=AES256-GCM-SHA384 bits=256/256) Tue, 7 Apr 2020 16:54:32 +0100 Received: from d06av21.portsmouth.uk.ibm.com (d06av21.portsmouth.uk.ibm.com [9.149.105.232]) by b06cxnps4075.portsmouth.uk.ibm.com (8.14.9/8.14.9/NCO v10.0) with ESMTP id 037Fshw958654742 (version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-GCM-SHA384 bits=256 verify=OK); Tue, 7 Apr 2020 15:54:43 GMT Received: from d06av21.portsmouth.uk.ibm.com (unknown [127.0.0.1]) by IMSVA (Postfix) with ESMTP id 17FFE52059; Tue, 7 Apr 2020 15:54:43 +0000 (GMT) Received: from thinkpad (unknown [9.145.186.71]) by d06av21.portsmouth.uk.ibm.com (Postfix) with ESMTP id 2847C52050; Tue, 7 Apr 2020 15:54:42 +0000 (GMT) Date: Tue, 7 Apr 2020 17:54:40 +0200 From: Gerald Schaefer To: Anshuman Khandual Subject: Re: [PATCH V2 0/3] mm/debug: Add more arch page table helper tests In-Reply-To: References: <1585027375-9997-1-git-send-email-anshuman.khandual@arm.com> <20200331143059.29fca8fa@thinkpad> X-Mailer: Claws Mail 3.17.4 (GTK+ 2.24.32; x86_64-redhat-linux-gnu) MIME-Version: 1.0 X-TM-AS-GCONF: 00 x-cbid: 20040715-0012-0000-0000-000003A01446 X-IBM-AV-DETECTION: SAVI=unused REMOTE=unused XFE=unused x-cbparentid: 20040715-0013-0000-0000-000021DD3634 Message-Id: <20200407175440.41cc00a5@thinkpad> X-Proofpoint-Virus-Version: vendor=fsecure engine=2.50.10434:6.0.138, 18.0.676 definitions=2020-04-07_07:2020-04-07, 2020-04-07 signatures=0 X-Proofpoint-Spam-Details: rule=outbound_notspam policy=outbound score=0 mlxscore=0 adultscore=0 malwarescore=0 priorityscore=1501 impostorscore=0 suspectscore=0 phishscore=0 mlxlogscore=999 clxscore=1015 spamscore=0 lowpriorityscore=0 bulkscore=0 classifier=spam adjust=0 reason=mlx scancount=1 engine=8.12.0-2003020000 definitions=main-2004070128 X-CRM114-Version: 20100106-BlameMichelson ( TRE 0.8.0 (BSD) ) MR-646709E3 X-CRM114-CacheID: sfid-20200407_085528_118925_93A61AD9 X-CRM114-Status: GOOD ( 50.77 ) X-BeenThere: linux-arm-kernel@lists.infradead.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: linux-doc@vger.kernel.org, Benjamin Herrenschmidt , Heiko Carstens , linux-mm@kvack.org, Paul Mackerras , "H. Peter Anvin" , linux-riscv@lists.infradead.org, Will Deacon , linux-arch@vger.kernel.org, linux-s390@vger.kernel.org, Jonathan Corbet , Michael Ellerman , x86@kernel.org, Mike Rapoport , Christian Borntraeger , Ingo Molnar , Gerald Schaefer , Catalin Marinas , linux-snps-arc@lists.infradead.org, Vasily Gorbik , Borislav Petkov , Paul Walmsley , "Kirill A . Shutemov" , Thomas Gleixner , linux-arm-kernel@lists.infradead.org, christophe.leroy@c-s.fr, Vineet Gupta , linux-kernel@vger.kernel.org, Palmer Dabbelt , Andrew Morton , linuxppc-dev@lists.ozlabs.org Content-Type: text/plain; charset="us-ascii" Content-Transfer-Encoding: 7bit Sender: "linux-arm-kernel" Errors-To: linux-arm-kernel-bounces+infradead-linux-arm-kernel=archiver.kernel.org@lists.infradead.org On Sun, 5 Apr 2020 17:58:14 +0530 Anshuman Khandual wrote: [...] > > > > Could be fixed like this (the first de-reference is a bit special, > > because at that point *ptep does not really point to a large (pmd) entry > > yet, it is initially an invalid pte entry, which breaks our huge_ptep_get() > > There seems to be an inconsistency on s390 platform. Even though it defines > a huge_ptep_get() override, it does not subscribe __HAVE_ARCH_HUGE_PTEP_GET > which should have forced it fallback on generic huge_ptep_get() but it does > not :) Then I realized that __HAVE_ARCH_HUGE_PTEP_GET only makes sense when > an arch uses . s390 does not use that and hence gets > away with it's own huge_ptep_get() without __HAVE_ARCH_HUGE_PTEP_GET. Sounds > confusing ? But I might not have the entire context here. Yes, that sounds very confusing. Also a bit ironic, since huge_ptep_get() was initially introduced because of s390, and now we don't select __HAVE_ARCH_HUGE_PTEP_GET... As you realized, I guess this is because we do not use generic hugetlb.h. And when __HAVE_ARCH_HUGE_PTEP_GET was introduced with commit 544db7597ad ("hugetlb: introduce generic version of huge_ptep_get"), that was probably the reason why we did not get our share of __HAVE_ARCH_HUGE_PTEP_GET. Nothing really wrong with that, but yes, very confusing. Maybe we could also select it for s390, even though it wouldn't have any functional impact (so far), just for less confusion. Maybe also thinking about using the generic hugetlb.h, not sure if the original reasons for not doing so would still apply. Now I only need to find the time... > > > conversion logic. I also added PMD_MASK alignment for RANDOM_ORVALUE, > > because we do have some special bits there in our large pmds. It seems > > to also work w/o that alignment, but it feels a bit wrong): > > Sure, we can accommodate that. > > > > > @@ -731,26 +731,26 @@ static void __init hugetlb_advanced_test > > unsigned long vaddr, pgprot_t prot) > > { > > struct page *page = pfn_to_page(pfn); > > - pte_t pte = READ_ONCE(*ptep); > > + pte_t pte; > > > > - pte = __pte(pte_val(pte) | RANDOM_ORVALUE); > > + pte = pte_mkhuge(mk_pte_phys(RANDOM_ORVALUE & PMD_MASK, prot)); > > So that keeps the existing value in 'ptep' pointer at bay and instead > construct a PTE from scratch. I would rather have READ_ONCE(*ptep) at > least provide the seed that can be ORed with RANDOM_ORVALUE before > being masked with PMD_MASK. Do you see any problem ? Yes, unfortunately. The problem is that the resulting pte is not marked as present. The conversion pte -> (huge) pmd, which is done in set_huge_pte_at() for s390, will establish an empty pmd for non-present ptes, all the RANDOM_ORVALUE stuff is lost. And a subsequent huge_ptep_get() will not result in the same original pte value. If you want to preserve and check the RANDOM_ORVALUE, it has to be a present pte, hence the mk_pte(_phys). > > Some thing like this instead. > > pte_t pte = READ_ONCE(*ptep); > pte = pte_mkhuge(__pte((pte_val(pte) | RANDOM_ORVALUE) & PMD_MASK)); > > We cannot use mk_pte_phys() as it is defined only on some platforms > without any generic fallback for others. Oh, didn't know that, sorry. What about using mk_pte() instead, at least it would result in a present pte: pte = pte_mkhuge(mk_pte(phys_to_page(RANDOM_ORVALUE & PMD_MASK), prot)); And if you also want to do some with the existing value, which seems to be an empty pte, then maybe just check if writing and reading that value with set_huge_pte_at() / huge_ptep_get() returns the same, i.e. initially w/o RANDOM_ORVALUE. So, in combination, like this (BTW, why is the barrier() needed, it is not used for the other set_huge_pte_at() calls later?): @@ -733,24 +733,28 @@ static void __init hugetlb_advanced_test struct page *page = pfn_to_page(pfn); pte_t pte = READ_ONCE(*ptep); - pte = __pte(pte_val(pte) | RANDOM_ORVALUE); + set_huge_pte_at(mm, vaddr, ptep, pte); + WARN_ON(!pte_same(pte, huge_ptep_get(ptep))); + + pte = pte_mkhuge(mk_pte(phys_to_page(RANDOM_ORVALUE & PMD_MASK), prot)); set_huge_pte_at(mm, vaddr, ptep, pte); barrier(); WARN_ON(!pte_same(pte, huge_ptep_get(ptep))); This would actually add a new test "write empty pte with set_huge_pte_at(), then verify with huge_ptep_get()", which happens to trigger a warning on s390 :-) That (new) warning actually points to misbehavior on s390, we do not write a correct empty pmd in this case, but one that is empty and also marked as large. huge_ptep_get() will then not correctly recognize it as empty and do wrong conversion. It is also not consistent with huge_ptep_get_and_clear(), where we write the empty pmd w/o marking as large. Last but not least it would also break our pmd_protnone() logic (see below). Another nice finding on s390 :-) I don't think this has any effect in practice (yet), but I will post a fix for that, just in case you will add / change this test. > > > set_huge_pte_at(mm, vaddr, ptep, pte); > > barrier(); > > WARN_ON(!pte_same(pte, huge_ptep_get(ptep))); > > huge_pte_clear(mm, vaddr, ptep, PMD_SIZE); > > - pte = READ_ONCE(*ptep); > > + pte = huge_ptep_get(ptep); > > WARN_ON(!huge_pte_none(pte)); > > > > pte = mk_huge_pte(page, prot); > > set_huge_pte_at(mm, vaddr, ptep, pte); > > huge_ptep_set_wrprotect(mm, vaddr, ptep); > > - pte = READ_ONCE(*ptep); > > + pte = huge_ptep_get(ptep); > > WARN_ON(huge_pte_write(pte)); > > > > pte = mk_huge_pte(page, prot); > > set_huge_pte_at(mm, vaddr, ptep, pte); > > huge_ptep_get_and_clear(mm, vaddr, ptep); > > - pte = READ_ONCE(*ptep); > > + pte = huge_ptep_get(ptep); > > WARN_ON(!huge_pte_none(pte)); > > > > pte = mk_huge_pte(page, prot); > > @@ -759,7 +759,7 @@ static void __init hugetlb_advanced_test > > pte = huge_pte_mkwrite(pte); > > pte = huge_pte_mkdirty(pte); > > huge_ptep_set_access_flags(vma, vaddr, ptep, pte, 1); > > - pte = READ_ONCE(*ptep); > > + pte = huge_ptep_get(ptep); > > WARN_ON(!(huge_pte_write(pte) && huge_pte_dirty(pte))); > > } > > #else > > > > 3) The pmd_protnone_tests() has an issue, because it passes a pmd to > > pmd_protnone() which has not been marked as large. We check for large > > pmd in the s390 implementation of pmd_protnone(), and will fail if a > > pmd is not large. We had similar issues before, in other helpers, where > > I changed the logic on s390 to not require the pmd large check, but I'm > > not so sure in this case. Is there a valid use case for doing > > pmd_protnone() on "normal" pmds? Or could this be changed like this: > > That is a valid question. IIUC, all existing callers for pmd_protnone() > ensure that it is indeed a huge PMD. But even assuming otherwise should > not the huge PMD requirement get checked in the caller itself rather than > in the arch helper which is just supposed to check the existence of the > dedicated PTE bit(s) for this purpose. Purely from a helper perspective > pmd_protnone() should not really care about being large even though it > might never get used without one. > > Also all platforms (except s390) derive the pmd_protnone() from their > respective pte_protnone(). I wonder why should s390 be any different > unless it is absolutely necessary. This is again because of our different page table entry layouts for pte/pmd and (large) pmd. The bits we check for pmd_protnone() are not valid for normal pmd/pte, and we would return undefined result for normal entries. Of course, we could rely on nobody calling pmd_protnone() on normal pmds, but in this case we also use pmd_large() check in pmd_protnone() for indication if the pmd is present. W/o that, we would return true for empty pmds, that doesn't sound right. Not sure if we also want to rely on nobody calling pmd_protnone() on empty pmds. Anyway, if in practice it is not correct to use pmd_protnone() on normal pmds, then I would suggest that your tests should also not do / test it. And I strongly assume that it is not correct, at least I cannot think of a valid case, and of course s390 would already be broken if there was such a case. Regards, Gerald _______________________________________________ linux-arm-kernel mailing list linux-arm-kernel@lists.infradead.org http://lists.infradead.org/mailman/listinfo/linux-arm-kernel