All of lore.kernel.org
 help / color / mirror / Atom feed
From: David Hildenbrand <david@redhat.com>
To: Gerald Schaefer <gerald.schaefer@linux.ibm.com>,
	Christian Borntraeger <borntraeger@linux.ibm.com>
Cc: linux-kernel@vger.kernel.org,
	Andrew Morton <akpm@linux-foundation.org>,
	Hugh Dickins <hughd@google.com>,
	Linus Torvalds <torvalds@linux-foundation.org>,
	David Rientjes <rientjes@google.com>,
	Shakeel Butt <shakeelb@google.com>,
	John Hubbard <jhubbard@nvidia.com>,
	Jason Gunthorpe <jgg@nvidia.com>,
	Mike Kravetz <mike.kravetz@oracle.com>,
	Mike Rapoport <rppt@linux.ibm.com>,
	Yang Shi <shy828301@gmail.com>,
	"Kirill A . Shutemov" <kirill.shutemov@linux.intel.com>,
	Matthew Wilcox <willy@infradead.org>,
	Vlastimil Babka <vbabka@suse.cz>, Jann Horn <jannh@google.com>,
	Michal Hocko <mhocko@kernel.org>, Nadav Amit <namit@vmware.com>,
	Rik van Riel <riel@surriel.com>, Roman Gushchin <guro@fb.com>,
	Andrea Arcangeli <aarcange@redhat.com>,
	Peter Xu <peterx@redhat.com>, Donald Dutile <ddutile@redhat.com>,
	Christoph Hellwig <hch@lst.de>, Oleg Nesterov <oleg@redhat.com>,
	Jan Kara <jack@suse.cz>, Liang Zhang <zhangliang5@huawei.com>,
	Pedro Gomes <pedrodemargomes@gmail.com>,
	Oded Gabbay <oded.gabbay@gmail.com>,
	Catalin Marinas <catalin.marinas@arm.com>,
	Will Deacon <will@kernel.org>,
	Michael Ellerman <mpe@ellerman.id.au>,
	Benjamin Herrenschmidt <benh@kernel.crashing.org>,
	Paul Mackerras <paulus@samba.org>,
	Heiko Carstens <hca@linux.ibm.com>,
	Vasily Gorbik <gor@linux.ibm.com>,
	Alexander Gordeev <agordeev@linux.ibm.com>,
	Thomas Gleixner <tglx@linutronix.de>,
	Ingo Molnar <mingo@redhat.com>, Borislav Petkov <bp@alien8.de>,
	Dave Hansen <dave.hansen@linux.intel.com>,
	linux-mm@kvack.org, x86@kernel.org,
	linux-arm-kernel@lists.infradead.org,
	linuxppc-dev@lists.ozlabs.org, linux-s390@vger.kernel.org
Subject: Re: [PATCH v1 5/7] s390/pgtable: support __HAVE_ARCH_PTE_SWP_EXCLUSIVE
Date: Wed, 16 Mar 2022 15:00:56 +0100	[thread overview]
Message-ID: <75d6cccd-ce22-bdf9-68d5-0792cec39ab7@redhat.com> (raw)
In-Reply-To: <20220316142722.76c691d2@thinkpad>

On 16.03.22 14:27, Gerald Schaefer wrote:
> On Wed, 16 Mar 2022 14:01:07 +0100
> Christian Borntraeger <borntraeger@linux.ibm.com> wrote:
> 
>>
>>
>> Am 16.03.22 um 11:56 schrieb Gerald Schaefer:
>>> On Tue, 15 Mar 2022 18:12:16 +0100
>>> David Hildenbrand <david@redhat.com> wrote:
>>>
>>>> On 15.03.22 17:58, David Hildenbrand wrote:
>>>>>
>>>>>>> This would mean that it is not OK to have bit 52 not zero for swap PTEs.
>>>>>>> But if I read the POP correctly, all bits except for the DAT-protection
>>>>>>> would be ignored for invalid PTEs, so maybe this comment needs some update
>>>>>>> (for both bits 52 and also 55).
>>>>>>>
>>>>>>> Heiko might also have some more insight.
>>>>>>
>>>>>> Indeed, I wonder why we should get a specification exception when the
>>>>>> PTE is invalid. I'll dig a bit into the PoP.
>>>>>
>>>>> SA22-7832-12 6-46 ("Translation-Specification Exception") is clearer
>>>>>
>>>>> "The page-table entry used for the translation is
>>>>> valid, and bit position 52 does not contain zero."
>>>>>
>>>>> "The page-table entry used for the translation is
>>>>> valid, EDAT-1 does not apply, the instruction-exe-
>>>>> cution-protection facility is not installed, and bit
>>>>> position 55 does not contain zero. It is model
>>>>> dependent whether this condition is recognized."
>>>>>
>>>>
>>>> I wonder if the following matches reality:
>>>>
>>>> diff --git a/arch/s390/include/asm/pgtable.h b/arch/s390/include/asm/pgtable.h
>>>> index 008a6c856fa4..6a227a8c3712 100644
>>>> --- a/arch/s390/include/asm/pgtable.h
>>>> +++ b/arch/s390/include/asm/pgtable.h
>>>> @@ -1669,18 +1669,16 @@ static inline int has_transparent_hugepage(void)
>>>>   /*
>>>>    * 64 bit swap entry format:
>>>>    * A page-table entry has some bits we have to treat in a special way.
>>>> - * Bits 52 and bit 55 have to be zero, otherwise a specification
>>>> - * exception will occur instead of a page translation exception. The
>>>> - * specification exception has the bad habit not to store necessary
>>>> - * information in the lowcore.
>>>>    * Bits 54 and 63 are used to indicate the page type.
>>>>    * A swap pte is indicated by bit pattern (pte & 0x201) == 0x200
>>>> - * This leaves the bits 0-51 and bits 56-62 to store type and offset.
>>>> - * We use the 5 bits from 57-61 for the type and the 52 bits from 0-51
>>>> - * for the offset.
>>>> - * |                     offset                        |01100|type |00|
>>>> + * |                     offset                        |XX1XX|type |S0|
>>>>    * |0000000000111111111122222222223333333333444444444455|55555|55566|66|
>>>>    * |0123456789012345678901234567890123456789012345678901|23456|78901|23|
>>>> + *
>>>> + * Bits 0-51 store the offset.
>>>> + * Bits 57-62 store the type.
>>>> + * Bit 62 (S) is used for softdirty tracking.
>>>> + * Bits 52, 53, 55 and 56 (X) are unused.
>>>>    */
>>>>   
>>>>   #define __SWP_OFFSET_MASK      ((1UL << 52) - 1)
>>>>
>>>>
>>>> I'm not sure why bit 53 was indicated as "1" and bit 55 was indicated as
>>>> "0". At least for 52 and 55 there was a clear description.
>>>
>>> Bit 53 is the invalid bit, and that is always 1 for swap ptes, in addition
>>> to protection bit 54. Bit 55, along with bit 52, has to be zero according
>>> to the (potentially deprecated) comment.
>>>
>>> It is interesting that bit 56 seems to be unused, at least according
>>> to the comment, but that would also mention bit 62 as unused, so that
>>> clearly needs some update.
>>>
>>> If bit 56 could be used for _PAGE_SWP_EXCLUSIVE, that would be better
>>> than stealing a bit from the offset, or using potentially dangerous
>>> bit 52. It is defined as _PAGE_UNUSED and only used for kvm, not sure
>>> if this is also relevant for swap ptes, similar to bit 62.
>>>
>>> Adding Christian on cc, maybe he has some insight on _PAGE_UNUSED
>>> bit 56 and swap ptes.
>>
>> I think _PAGE_UNUSED is not used for swap ptes. It is used _before_ swapping
>> to decide whether we swap or discard the page.
>>
>> Regarding bit 52, the POP says in chapter 3 for the page table entry
>>
>> [..]
>> Page-Invalid Bit (I): Bit 53 controls whether the
>> page associated with the page-table entry is avail-
>> able. When the bit is zero, address translation pro-
>> ceeds by using the page-table entry. When the bit is
>> one, the page-table entry cannot be used for transla-
>> tion.
>>
>>
>> -->When the page-invalid bit is one, all other bits in the
>> -->page-table entry are available for use by program-
>> -->ming.
>>
>> this was added with the z14 POP, but I guess it was just a clarification
>> and should be valid for older machines as well.
>> So 52 and 56 should be ok, with 52 probably the better choice.
> 
> Ok, bit 55 would then also be an option IIUC, since execution protection
> should not be relevant for swap ptes. And Davids clean-up removing the
> restriction for bit 52 and 55 in the comment would make sense.
> 
> I would also favor bit 52 though (PAGE_LARGE), as in Davids initial patch
> version, since this is never used for any real ptes. The PAGE_LARGE flag
> is only set in the "virtual" large ptes that the hugetlb code is seeing
> from huge_ptep_get(). But it will (and must) never be written as a valid
> pte, or else it will generate an exception. IIRC, we only set it to detect
> such possible bugs, e.g. hugetlb code writing a pte (which really is a
> pmd/pud) directly, instead of using set_huge_pte_at().
> 

Agreed. I'll include the doc cleanup patch and a fixed-up version of
this patch (still using bit 52, not messing with the offset bits) in the
next version.

Thanks all!

-- 
Thanks,

David / dhildenb


WARNING: multiple messages have this Message-ID (diff)
From: David Hildenbrand <david@redhat.com>
To: Gerald Schaefer <gerald.schaefer@linux.ibm.com>,
	Christian Borntraeger <borntraeger@linux.ibm.com>
Cc: x86@kernel.org, Jan Kara <jack@suse.cz>,
	Catalin Marinas <catalin.marinas@arm.com>,
	Yang Shi <shy828301@gmail.com>,
	Dave Hansen <dave.hansen@linux.intel.com>,
	Peter Xu <peterx@redhat.com>, Michal Hocko <mhocko@kernel.org>,
	linux-mm@kvack.org, Donald Dutile <ddutile@redhat.com>,
	Liang Zhang <zhangliang5@huawei.com>,
	Borislav Petkov <bp@alien8.de>,
	Alexander Gordeev <agordeev@linux.ibm.com>,
	Will Deacon <will@kernel.org>, Christoph Hellwig <hch@lst.de>,
	Paul Mackerras <paulus@samba.org>,
	Andrea Arcangeli <aarcange@redhat.com>,
	linux-s390@vger.kernel.org, Vasily Gorbik <gor@linux.ibm.com>,
	Rik van Riel <riel@surriel.com>, Hugh Dickins <hughd@google.com>,
	Matthew Wilcox <willy@infradead.org>,
	Mike Rapoport <rppt@linux.ibm.com>,
	Ingo Molnar <mingo@redhat.com>,
	linux-arm-kernel@lists.infradead.org,
	Jason Gunthorpe <jgg@nvidia.com>,
	David Rientjes <rientjes@google.com>,
	Pedro Gomes <pedrodemargomes@gmail.com>,
	Jann Horn <jannh@google.com>, John Hubbard <jhubbard@nvidia.com>,
	Heiko Carstens <hca@linux.ibm.com>,
	Shakeel Butt <shakeelb@google.com>,
	Oleg Nesterov <oleg@redhat.com>,
	Thomas Gleixner <tglx@linutronix.de>,
	Vlastimil Babka <vbabka@suse.cz>,
	Oded Gabbay <oded.gabbay@gmail.com>,
	linuxppc-dev@lists.ozlabs.org, linux-kernel@vger.kernel.org,
	Nadav Amit <namit@vmware.com>,
	Andrew Morton <akpm@linux-foundation.org>,
	Linus Torvalds <torvalds@linux-foundation.org>,
	Roman Gushchin <guro@fb.com>,
	"Kirill A . Shutemov" <kirill.shutemov@linux.intel.com>,
	Mike Kravetz <mike.kravetz@oracle.com>
Subject: Re: [PATCH v1 5/7] s390/pgtable: support __HAVE_ARCH_PTE_SWP_EXCLUSIVE
Date: Wed, 16 Mar 2022 15:00:56 +0100	[thread overview]
Message-ID: <75d6cccd-ce22-bdf9-68d5-0792cec39ab7@redhat.com> (raw)
In-Reply-To: <20220316142722.76c691d2@thinkpad>

On 16.03.22 14:27, Gerald Schaefer wrote:
> On Wed, 16 Mar 2022 14:01:07 +0100
> Christian Borntraeger <borntraeger@linux.ibm.com> wrote:
> 
>>
>>
>> Am 16.03.22 um 11:56 schrieb Gerald Schaefer:
>>> On Tue, 15 Mar 2022 18:12:16 +0100
>>> David Hildenbrand <david@redhat.com> wrote:
>>>
>>>> On 15.03.22 17:58, David Hildenbrand wrote:
>>>>>
>>>>>>> This would mean that it is not OK to have bit 52 not zero for swap PTEs.
>>>>>>> But if I read the POP correctly, all bits except for the DAT-protection
>>>>>>> would be ignored for invalid PTEs, so maybe this comment needs some update
>>>>>>> (for both bits 52 and also 55).
>>>>>>>
>>>>>>> Heiko might also have some more insight.
>>>>>>
>>>>>> Indeed, I wonder why we should get a specification exception when the
>>>>>> PTE is invalid. I'll dig a bit into the PoP.
>>>>>
>>>>> SA22-7832-12 6-46 ("Translation-Specification Exception") is clearer
>>>>>
>>>>> "The page-table entry used for the translation is
>>>>> valid, and bit position 52 does not contain zero."
>>>>>
>>>>> "The page-table entry used for the translation is
>>>>> valid, EDAT-1 does not apply, the instruction-exe-
>>>>> cution-protection facility is not installed, and bit
>>>>> position 55 does not contain zero. It is model
>>>>> dependent whether this condition is recognized."
>>>>>
>>>>
>>>> I wonder if the following matches reality:
>>>>
>>>> diff --git a/arch/s390/include/asm/pgtable.h b/arch/s390/include/asm/pgtable.h
>>>> index 008a6c856fa4..6a227a8c3712 100644
>>>> --- a/arch/s390/include/asm/pgtable.h
>>>> +++ b/arch/s390/include/asm/pgtable.h
>>>> @@ -1669,18 +1669,16 @@ static inline int has_transparent_hugepage(void)
>>>>   /*
>>>>    * 64 bit swap entry format:
>>>>    * A page-table entry has some bits we have to treat in a special way.
>>>> - * Bits 52 and bit 55 have to be zero, otherwise a specification
>>>> - * exception will occur instead of a page translation exception. The
>>>> - * specification exception has the bad habit not to store necessary
>>>> - * information in the lowcore.
>>>>    * Bits 54 and 63 are used to indicate the page type.
>>>>    * A swap pte is indicated by bit pattern (pte & 0x201) == 0x200
>>>> - * This leaves the bits 0-51 and bits 56-62 to store type and offset.
>>>> - * We use the 5 bits from 57-61 for the type and the 52 bits from 0-51
>>>> - * for the offset.
>>>> - * |                     offset                        |01100|type |00|
>>>> + * |                     offset                        |XX1XX|type |S0|
>>>>    * |0000000000111111111122222222223333333333444444444455|55555|55566|66|
>>>>    * |0123456789012345678901234567890123456789012345678901|23456|78901|23|
>>>> + *
>>>> + * Bits 0-51 store the offset.
>>>> + * Bits 57-62 store the type.
>>>> + * Bit 62 (S) is used for softdirty tracking.
>>>> + * Bits 52, 53, 55 and 56 (X) are unused.
>>>>    */
>>>>   
>>>>   #define __SWP_OFFSET_MASK      ((1UL << 52) - 1)
>>>>
>>>>
>>>> I'm not sure why bit 53 was indicated as "1" and bit 55 was indicated as
>>>> "0". At least for 52 and 55 there was a clear description.
>>>
>>> Bit 53 is the invalid bit, and that is always 1 for swap ptes, in addition
>>> to protection bit 54. Bit 55, along with bit 52, has to be zero according
>>> to the (potentially deprecated) comment.
>>>
>>> It is interesting that bit 56 seems to be unused, at least according
>>> to the comment, but that would also mention bit 62 as unused, so that
>>> clearly needs some update.
>>>
>>> If bit 56 could be used for _PAGE_SWP_EXCLUSIVE, that would be better
>>> than stealing a bit from the offset, or using potentially dangerous
>>> bit 52. It is defined as _PAGE_UNUSED and only used for kvm, not sure
>>> if this is also relevant for swap ptes, similar to bit 62.
>>>
>>> Adding Christian on cc, maybe he has some insight on _PAGE_UNUSED
>>> bit 56 and swap ptes.
>>
>> I think _PAGE_UNUSED is not used for swap ptes. It is used _before_ swapping
>> to decide whether we swap or discard the page.
>>
>> Regarding bit 52, the POP says in chapter 3 for the page table entry
>>
>> [..]
>> Page-Invalid Bit (I): Bit 53 controls whether the
>> page associated with the page-table entry is avail-
>> able. When the bit is zero, address translation pro-
>> ceeds by using the page-table entry. When the bit is
>> one, the page-table entry cannot be used for transla-
>> tion.
>>
>>
>> -->When the page-invalid bit is one, all other bits in the
>> -->page-table entry are available for use by program-
>> -->ming.
>>
>> this was added with the z14 POP, but I guess it was just a clarification
>> and should be valid for older machines as well.
>> So 52 and 56 should be ok, with 52 probably the better choice.
> 
> Ok, bit 55 would then also be an option IIUC, since execution protection
> should not be relevant for swap ptes. And Davids clean-up removing the
> restriction for bit 52 and 55 in the comment would make sense.
> 
> I would also favor bit 52 though (PAGE_LARGE), as in Davids initial patch
> version, since this is never used for any real ptes. The PAGE_LARGE flag
> is only set in the "virtual" large ptes that the hugetlb code is seeing
> from huge_ptep_get(). But it will (and must) never be written as a valid
> pte, or else it will generate an exception. IIRC, we only set it to detect
> such possible bugs, e.g. hugetlb code writing a pte (which really is a
> pmd/pud) directly, instead of using set_huge_pte_at().
> 

Agreed. I'll include the doc cleanup patch and a fixed-up version of
this patch (still using bit 52, not messing with the offset bits) in the
next version.

Thanks all!

-- 
Thanks,

David / dhildenb


WARNING: multiple messages have this Message-ID (diff)
From: David Hildenbrand <david@redhat.com>
To: Gerald Schaefer <gerald.schaefer@linux.ibm.com>,
	Christian Borntraeger <borntraeger@linux.ibm.com>
Cc: linux-kernel@vger.kernel.org,
	Andrew Morton <akpm@linux-foundation.org>,
	Hugh Dickins <hughd@google.com>,
	Linus Torvalds <torvalds@linux-foundation.org>,
	David Rientjes <rientjes@google.com>,
	Shakeel Butt <shakeelb@google.com>,
	John Hubbard <jhubbard@nvidia.com>,
	Jason Gunthorpe <jgg@nvidia.com>,
	Mike Kravetz <mike.kravetz@oracle.com>,
	Mike Rapoport <rppt@linux.ibm.com>,
	Yang Shi <shy828301@gmail.com>,
	"Kirill A . Shutemov" <kirill.shutemov@linux.intel.com>,
	Matthew Wilcox <willy@infradead.org>,
	Vlastimil Babka <vbabka@suse.cz>, Jann Horn <jannh@google.com>,
	Michal Hocko <mhocko@kernel.org>, Nadav Amit <namit@vmware.com>,
	Rik van Riel <riel@surriel.com>, Roman Gushchin <guro@fb.com>,
	Andrea Arcangeli <aarcange@redhat.com>,
	Peter Xu <peterx@redhat.com>, Donald Dutile <ddutile@redhat.com>,
	Christoph Hellwig <hch@lst.de>, Oleg Nesterov <oleg@redhat.com>,
	Jan Kara <jack@suse.cz>, Liang Zhang <zhangliang5@huawei.com>,
	Pedro Gomes <pedrodemargomes@gmail.com>,
	Oded Gabbay <oded.gabbay@gmail.com>,
	Catalin Marinas <catalin.marinas@arm.com>,
	Will Deacon <will@kernel.org>,
	Michael Ellerman <mpe@ellerman.id.au>,
	Benjamin Herrenschmidt <benh@kernel.crashing.org>,
	Paul Mackerras <paulus@samba.org>,
	Heiko Carstens <hca@linux.ibm.com>,
	Vasily Gorbik <gor@linux.ibm.com>,
	Alexander Gordeev <agordeev@linux.ibm.com>,
	Thomas Gleixner <tglx@linutronix.de>,
	Ingo Molnar <mingo@redhat.com>, Borislav Petkov <bp@alien8.de>,
	Dave Hansen <dave.hansen@linux.intel.com>,
	linux-mm@kvack.org, x86@kernel.org,
	linux-arm-kernel@lists.infradead.org,
	linuxppc-dev@lists.ozlabs.org, linux-s390@vger.kernel.org
Subject: Re: [PATCH v1 5/7] s390/pgtable: support __HAVE_ARCH_PTE_SWP_EXCLUSIVE
Date: Wed, 16 Mar 2022 15:00:56 +0100	[thread overview]
Message-ID: <75d6cccd-ce22-bdf9-68d5-0792cec39ab7@redhat.com> (raw)
In-Reply-To: <20220316142722.76c691d2@thinkpad>

On 16.03.22 14:27, Gerald Schaefer wrote:
> On Wed, 16 Mar 2022 14:01:07 +0100
> Christian Borntraeger <borntraeger@linux.ibm.com> wrote:
> 
>>
>>
>> Am 16.03.22 um 11:56 schrieb Gerald Schaefer:
>>> On Tue, 15 Mar 2022 18:12:16 +0100
>>> David Hildenbrand <david@redhat.com> wrote:
>>>
>>>> On 15.03.22 17:58, David Hildenbrand wrote:
>>>>>
>>>>>>> This would mean that it is not OK to have bit 52 not zero for swap PTEs.
>>>>>>> But if I read the POP correctly, all bits except for the DAT-protection
>>>>>>> would be ignored for invalid PTEs, so maybe this comment needs some update
>>>>>>> (for both bits 52 and also 55).
>>>>>>>
>>>>>>> Heiko might also have some more insight.
>>>>>>
>>>>>> Indeed, I wonder why we should get a specification exception when the
>>>>>> PTE is invalid. I'll dig a bit into the PoP.
>>>>>
>>>>> SA22-7832-12 6-46 ("Translation-Specification Exception") is clearer
>>>>>
>>>>> "The page-table entry used for the translation is
>>>>> valid, and bit position 52 does not contain zero."
>>>>>
>>>>> "The page-table entry used for the translation is
>>>>> valid, EDAT-1 does not apply, the instruction-exe-
>>>>> cution-protection facility is not installed, and bit
>>>>> position 55 does not contain zero. It is model
>>>>> dependent whether this condition is recognized."
>>>>>
>>>>
>>>> I wonder if the following matches reality:
>>>>
>>>> diff --git a/arch/s390/include/asm/pgtable.h b/arch/s390/include/asm/pgtable.h
>>>> index 008a6c856fa4..6a227a8c3712 100644
>>>> --- a/arch/s390/include/asm/pgtable.h
>>>> +++ b/arch/s390/include/asm/pgtable.h
>>>> @@ -1669,18 +1669,16 @@ static inline int has_transparent_hugepage(void)
>>>>   /*
>>>>    * 64 bit swap entry format:
>>>>    * A page-table entry has some bits we have to treat in a special way.
>>>> - * Bits 52 and bit 55 have to be zero, otherwise a specification
>>>> - * exception will occur instead of a page translation exception. The
>>>> - * specification exception has the bad habit not to store necessary
>>>> - * information in the lowcore.
>>>>    * Bits 54 and 63 are used to indicate the page type.
>>>>    * A swap pte is indicated by bit pattern (pte & 0x201) == 0x200
>>>> - * This leaves the bits 0-51 and bits 56-62 to store type and offset.
>>>> - * We use the 5 bits from 57-61 for the type and the 52 bits from 0-51
>>>> - * for the offset.
>>>> - * |                     offset                        |01100|type |00|
>>>> + * |                     offset                        |XX1XX|type |S0|
>>>>    * |0000000000111111111122222222223333333333444444444455|55555|55566|66|
>>>>    * |0123456789012345678901234567890123456789012345678901|23456|78901|23|
>>>> + *
>>>> + * Bits 0-51 store the offset.
>>>> + * Bits 57-62 store the type.
>>>> + * Bit 62 (S) is used for softdirty tracking.
>>>> + * Bits 52, 53, 55 and 56 (X) are unused.
>>>>    */
>>>>   
>>>>   #define __SWP_OFFSET_MASK      ((1UL << 52) - 1)
>>>>
>>>>
>>>> I'm not sure why bit 53 was indicated as "1" and bit 55 was indicated as
>>>> "0". At least for 52 and 55 there was a clear description.
>>>
>>> Bit 53 is the invalid bit, and that is always 1 for swap ptes, in addition
>>> to protection bit 54. Bit 55, along with bit 52, has to be zero according
>>> to the (potentially deprecated) comment.
>>>
>>> It is interesting that bit 56 seems to be unused, at least according
>>> to the comment, but that would also mention bit 62 as unused, so that
>>> clearly needs some update.
>>>
>>> If bit 56 could be used for _PAGE_SWP_EXCLUSIVE, that would be better
>>> than stealing a bit from the offset, or using potentially dangerous
>>> bit 52. It is defined as _PAGE_UNUSED and only used for kvm, not sure
>>> if this is also relevant for swap ptes, similar to bit 62.
>>>
>>> Adding Christian on cc, maybe he has some insight on _PAGE_UNUSED
>>> bit 56 and swap ptes.
>>
>> I think _PAGE_UNUSED is not used for swap ptes. It is used _before_ swapping
>> to decide whether we swap or discard the page.
>>
>> Regarding bit 52, the POP says in chapter 3 for the page table entry
>>
>> [..]
>> Page-Invalid Bit (I): Bit 53 controls whether the
>> page associated with the page-table entry is avail-
>> able. When the bit is zero, address translation pro-
>> ceeds by using the page-table entry. When the bit is
>> one, the page-table entry cannot be used for transla-
>> tion.
>>
>>
>> -->When the page-invalid bit is one, all other bits in the
>> -->page-table entry are available for use by program-
>> -->ming.
>>
>> this was added with the z14 POP, but I guess it was just a clarification
>> and should be valid for older machines as well.
>> So 52 and 56 should be ok, with 52 probably the better choice.
> 
> Ok, bit 55 would then also be an option IIUC, since execution protection
> should not be relevant for swap ptes. And Davids clean-up removing the
> restriction for bit 52 and 55 in the comment would make sense.
> 
> I would also favor bit 52 though (PAGE_LARGE), as in Davids initial patch
> version, since this is never used for any real ptes. The PAGE_LARGE flag
> is only set in the "virtual" large ptes that the hugetlb code is seeing
> from huge_ptep_get(). But it will (and must) never be written as a valid
> pte, or else it will generate an exception. IIRC, we only set it to detect
> such possible bugs, e.g. hugetlb code writing a pte (which really is a
> pmd/pud) directly, instead of using set_huge_pte_at().
> 

Agreed. I'll include the doc cleanup patch and a fixed-up version of
this patch (still using bit 52, not messing with the offset bits) in the
next version.

Thanks all!

-- 
Thanks,

David / dhildenb


_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

  reply	other threads:[~2022-03-16 14:01 UTC|newest]

Thread overview: 96+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2022-03-15 14:18 [PATCH v1 0/7] mm: COW fixes part 3: reliable GUP R/W FOLL_GET of anonymous pages David Hildenbrand
2022-03-15 14:18 ` David Hildenbrand
2022-03-15 14:18 ` David Hildenbrand
2022-03-15 14:18 ` [PATCH v1 1/7] mm/swap: remember PG_anon_exclusive via a swp pte bit David Hildenbrand
2022-03-15 14:18   ` David Hildenbrand
2022-03-15 14:18   ` David Hildenbrand
2022-03-15 14:18 ` [PATCH v1 2/7] mm/debug_vm_pgtable: add tests for __HAVE_ARCH_PTE_SWP_EXCLUSIVE David Hildenbrand
2022-03-15 14:18   ` David Hildenbrand
2022-03-15 14:18   ` David Hildenbrand
2022-03-15 14:18 ` [PATCH v1 3/7] x86/pgtable: support __HAVE_ARCH_PTE_SWP_EXCLUSIVE David Hildenbrand
2022-03-15 14:18   ` David Hildenbrand
2022-03-15 14:18   ` David Hildenbrand
2022-03-15 14:18 ` [PATCH v1 4/7] arm64/pgtable: " David Hildenbrand
2022-03-15 14:18   ` David Hildenbrand
2022-03-15 14:18   ` David Hildenbrand
2022-03-16 18:27   ` Catalin Marinas
2022-03-16 18:27     ` Catalin Marinas
2022-03-16 18:27     ` Catalin Marinas
2022-03-17 10:04     ` David Hildenbrand
2022-03-17 10:04       ` David Hildenbrand
2022-03-17 10:04       ` David Hildenbrand
2022-03-17 17:58       ` Catalin Marinas
2022-03-17 17:58         ` Catalin Marinas
2022-03-17 17:58         ` Catalin Marinas
2022-03-18  9:59         ` David Hildenbrand
2022-03-18  9:59           ` David Hildenbrand
2022-03-18  9:59           ` David Hildenbrand
2022-03-18 11:33           ` Catalin Marinas
2022-03-18 11:33             ` Catalin Marinas
2022-03-18 11:33             ` Catalin Marinas
2022-03-18 14:14             ` David Hildenbrand
2022-03-18 14:14               ` David Hildenbrand
2022-03-18 14:14               ` David Hildenbrand
2022-03-21 14:38     ` Will Deacon
2022-03-21 14:38       ` Will Deacon
2022-03-21 14:38       ` Will Deacon
2022-03-21 14:39       ` Will Deacon
2022-03-21 14:39         ` Will Deacon
2022-03-21 14:39         ` Will Deacon
2022-03-21 15:07       ` David Hildenbrand
2022-03-21 15:07         ` David Hildenbrand
2022-03-21 15:07         ` David Hildenbrand
2022-03-21 17:44         ` Will Deacon
2022-03-21 17:44           ` Will Deacon
2022-03-21 17:44           ` Will Deacon
2022-03-21 18:27           ` Catalin Marinas
2022-03-21 18:27             ` Catalin Marinas
2022-03-21 18:27             ` Catalin Marinas
2022-03-22  9:46             ` David Hildenbrand
2022-03-22  9:46               ` David Hildenbrand
2022-03-22  9:46               ` David Hildenbrand
2022-03-15 14:18 ` [PATCH v1 5/7] s390/pgtable: " David Hildenbrand
2022-03-15 14:18   ` David Hildenbrand
2022-03-15 14:18   ` David Hildenbrand
2022-03-15 16:21   ` Gerald Schaefer
2022-03-15 16:21     ` Gerald Schaefer
2022-03-15 16:21     ` Gerald Schaefer
2022-03-15 16:37     ` David Hildenbrand
2022-03-15 16:37       ` David Hildenbrand
2022-03-15 16:37       ` David Hildenbrand
2022-03-15 16:58       ` David Hildenbrand
2022-03-15 16:58         ` David Hildenbrand
2022-03-15 16:58         ` David Hildenbrand
2022-03-15 17:12         ` David Hildenbrand
2022-03-15 17:12           ` David Hildenbrand
2022-03-15 17:12           ` David Hildenbrand
2022-03-15 17:14           ` David Hildenbrand
2022-03-15 17:14             ` David Hildenbrand
2022-03-15 17:14             ` David Hildenbrand
2022-03-16 10:56           ` Gerald Schaefer
2022-03-16 10:56             ` Gerald Schaefer
2022-03-16 10:56             ` Gerald Schaefer
2022-03-16 11:06             ` David Hildenbrand
2022-03-16 11:06               ` David Hildenbrand
2022-03-16 11:06               ` David Hildenbrand
2022-03-16 13:01             ` Christian Borntraeger
2022-03-16 13:01               ` Christian Borntraeger
2022-03-16 13:01               ` Christian Borntraeger
2022-03-16 13:27               ` Gerald Schaefer
2022-03-16 13:27                 ` Gerald Schaefer
2022-03-16 13:27                 ` Gerald Schaefer
2022-03-16 14:00                 ` David Hildenbrand [this message]
2022-03-16 14:00                   ` David Hildenbrand
2022-03-16 14:00                   ` David Hildenbrand
2022-03-15 14:18 ` [PATCH v1 6/7] powerpc/pgtable: remove _PAGE_BIT_SWAP_TYPE for book3s David Hildenbrand
2022-03-15 14:18   ` David Hildenbrand
2022-03-15 14:18   ` David Hildenbrand
2022-03-15 14:18 ` [PATCH v1 7/7] powerpc/pgtable: support __HAVE_ARCH_PTE_SWP_EXCLUSIVE " David Hildenbrand
2022-03-15 14:18   ` David Hildenbrand
2022-03-15 14:18   ` David Hildenbrand
2022-03-18 23:48 ` [PATCH v1 0/7] mm: COW fixes part 3: reliable GUP R/W FOLL_GET of anonymous pages Jason Gunthorpe
2022-03-18 23:48   ` Jason Gunthorpe
2022-03-18 23:48   ` Jason Gunthorpe
2022-03-19 11:17   ` David Hildenbrand
2022-03-19 11:17     ` David Hildenbrand
2022-03-19 11:17     ` David Hildenbrand

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=75d6cccd-ce22-bdf9-68d5-0792cec39ab7@redhat.com \
    --to=david@redhat.com \
    --cc=aarcange@redhat.com \
    --cc=agordeev@linux.ibm.com \
    --cc=akpm@linux-foundation.org \
    --cc=benh@kernel.crashing.org \
    --cc=borntraeger@linux.ibm.com \
    --cc=bp@alien8.de \
    --cc=catalin.marinas@arm.com \
    --cc=dave.hansen@linux.intel.com \
    --cc=ddutile@redhat.com \
    --cc=gerald.schaefer@linux.ibm.com \
    --cc=gor@linux.ibm.com \
    --cc=guro@fb.com \
    --cc=hca@linux.ibm.com \
    --cc=hch@lst.de \
    --cc=hughd@google.com \
    --cc=jack@suse.cz \
    --cc=jannh@google.com \
    --cc=jgg@nvidia.com \
    --cc=jhubbard@nvidia.com \
    --cc=kirill.shutemov@linux.intel.com \
    --cc=linux-arm-kernel@lists.infradead.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=linux-s390@vger.kernel.org \
    --cc=linuxppc-dev@lists.ozlabs.org \
    --cc=mhocko@kernel.org \
    --cc=mike.kravetz@oracle.com \
    --cc=mingo@redhat.com \
    --cc=mpe@ellerman.id.au \
    --cc=namit@vmware.com \
    --cc=oded.gabbay@gmail.com \
    --cc=oleg@redhat.com \
    --cc=paulus@samba.org \
    --cc=pedrodemargomes@gmail.com \
    --cc=peterx@redhat.com \
    --cc=riel@surriel.com \
    --cc=rientjes@google.com \
    --cc=rppt@linux.ibm.com \
    --cc=shakeelb@google.com \
    --cc=shy828301@gmail.com \
    --cc=tglx@linutronix.de \
    --cc=torvalds@linux-foundation.org \
    --cc=vbabka@suse.cz \
    --cc=will@kernel.org \
    --cc=willy@infradead.org \
    --cc=x86@kernel.org \
    --cc=zhangliang5@huawei.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.