linux-kernel.vger.kernel.org archive mirror
* [PATCH] mm: define pte_add_end for consistency
@ 2020-06-30  3:18 Wei Yang
  2020-06-30 12:44 ` David Hildenbrand
  0 siblings, 1 reply; 9+ messages in thread
From: Wei Yang @ 2020-06-30  3:18 UTC (permalink / raw)
  To: dave.hansen, luto, peterz, tglx, mingo, bp, akpm
  Cc: x86, linux-kernel, kasan-dev, linux-mm, Wei Yang

When walking page tables, we define several helpers to get the address of
the next boundary. But we don't have one for pte level.

Let's define it and consolidate the code in several places.

Signed-off-by: Wei Yang <richard.weiyang@linux.alibaba.com>
---
 arch/x86/mm/init_64.c   | 6 ++----
 include/linux/pgtable.h | 7 +++++++
 mm/kasan/init.c         | 4 +---
 3 files changed, 10 insertions(+), 7 deletions(-)

diff --git a/arch/x86/mm/init_64.c b/arch/x86/mm/init_64.c
index dbae185511cd..f902fbd17f27 100644
--- a/arch/x86/mm/init_64.c
+++ b/arch/x86/mm/init_64.c
@@ -973,9 +973,7 @@ remove_pte_table(pte_t *pte_start, unsigned long addr, unsigned long end,
 
 	pte = pte_start + pte_index(addr);
 	for (; addr < end; addr = next, pte++) {
-		next = (addr + PAGE_SIZE) & PAGE_MASK;
-		if (next > end)
-			next = end;
+		next = pte_addr_end(addr, end);
 
 		if (!pte_present(*pte))
 			continue;
@@ -1558,7 +1556,7 @@ void register_page_bootmem_memmap(unsigned long section_nr,
 		get_page_bootmem(section_nr, pud_page(*pud), MIX_SECTION_INFO);
 
 		if (!boot_cpu_has(X86_FEATURE_PSE)) {
-			next = (addr + PAGE_SIZE) & PAGE_MASK;
+			next = pte_addr_end(addr, end);
 			pmd = pmd_offset(pud, addr);
 			if (pmd_none(*pmd))
 				continue;
diff --git a/include/linux/pgtable.h b/include/linux/pgtable.h
index 32b6c52d41b9..0de09c6c89d2 100644
--- a/include/linux/pgtable.h
+++ b/include/linux/pgtable.h
@@ -706,6 +706,13 @@ static inline pgprot_t pgprot_modify(pgprot_t oldprot, pgprot_t newprot)
 })
 #endif
 
+#ifndef pte_addr_end
+#define pte_addr_end(addr, end)						\
+({	unsigned long __boundary = ((addr) + PAGE_SIZE) & PAGE_MASK;	\
+	(__boundary - 1 < (end) - 1) ? __boundary : (end);		\
+})
+#endif
+
 /*
  * When walking page tables, we usually want to skip any p?d_none entries;
  * and any p?d_bad entries - reporting the error before resetting to none.
diff --git a/mm/kasan/init.c b/mm/kasan/init.c
index fe6be0be1f76..89f748601f74 100644
--- a/mm/kasan/init.c
+++ b/mm/kasan/init.c
@@ -349,9 +349,7 @@ static void kasan_remove_pte_table(pte_t *pte, unsigned long addr,
 	unsigned long next;
 
 	for (; addr < end; addr = next, pte++) {
-		next = (addr + PAGE_SIZE) & PAGE_MASK;
-		if (next > end)
-			next = end;
+		next = pte_addr_end(addr, end);
 
 		if (!pte_present(*pte))
 			continue;
-- 
2.20.1 (Apple Git-117)

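As background, the proposed helper follows the same pattern as the
existing p?d_addr_end() macros. Below is a minimal userspace sketch
(illustration only; 4 KiB pages and 64-bit unsigned long are assumed,
and the scaffolding macros are re-defined here rather than taken from
the kernel) of what it computes. It also shows why both sides of the
comparison subtract 1: when addr lies in the last page of the address
space, the rounded-up __boundary wraps to 0, and the subtraction makes
that wrapped value compare as larger than any end, so the result is
clamped to end instead of returning a bogus 0.

	/* Userspace illustration only -- not kernel code. */
	#include <assert.h>

	#define PAGE_SHIFT	12			/* assume 4 KiB pages */
	#define PAGE_SIZE	(1UL << PAGE_SHIFT)
	#define PAGE_MASK	(~(PAGE_SIZE - 1))

	/* The helper as proposed in the patch (GCC statement expression). */
	#define pte_addr_end(addr, end)						\
	({	unsigned long __boundary = ((addr) + PAGE_SIZE) & PAGE_MASK;	\
		(__boundary - 1 < (end) - 1) ? __boundary : (end);		\
	})

	int main(void)
	{
		/* Usual case: the next page boundary, clamped to 'end'. */
		assert(pte_addr_end(0x1000UL, 0x3000UL) == 0x2000UL);
		assert(pte_addr_end(0x2800UL, 0x2900UL) == 0x2900UL);

		/* addr in the last page: __boundary wraps to 0, but the
		 * "- 1" comparison still clamps the result to 'end'. */
		assert(pte_addr_end(-0x800UL, -0x100UL) == -0x100UL);
		return 0;
	}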

^ permalink raw reply related	[flat|nested] 9+ messages in thread

* Re: [PATCH] mm: define pte_add_end for consistency
  2020-06-30  3:18 [PATCH] mm: define pte_add_end for consistency Wei Yang
@ 2020-06-30 12:44 ` David Hildenbrand
  2020-07-01  2:11   ` Wei Yang
  0 siblings, 1 reply; 9+ messages in thread
From: David Hildenbrand @ 2020-06-30 12:44 UTC (permalink / raw)
  To: Wei Yang, dave.hansen, luto, peterz, tglx, mingo, bp, akpm
  Cc: x86, linux-kernel, kasan-dev, linux-mm

On 30.06.20 05:18, Wei Yang wrote:
> When walking page tables, we define several helpers to get the address of
> the next boundary. But we don't have one for pte level.
> 
> Let's define it and consolidate the code in several places.
> 
> Signed-off-by: Wei Yang <richard.weiyang@linux.alibaba.com>
> ---
>  arch/x86/mm/init_64.c   | 6 ++----
>  include/linux/pgtable.h | 7 +++++++
>  mm/kasan/init.c         | 4 +---
>  3 files changed, 10 insertions(+), 7 deletions(-)
> 
> diff --git a/arch/x86/mm/init_64.c b/arch/x86/mm/init_64.c
> index dbae185511cd..f902fbd17f27 100644
> --- a/arch/x86/mm/init_64.c
> +++ b/arch/x86/mm/init_64.c
> @@ -973,9 +973,7 @@ remove_pte_table(pte_t *pte_start, unsigned long addr, unsigned long end,
>  
>  	pte = pte_start + pte_index(addr);
>  	for (; addr < end; addr = next, pte++) {
> -		next = (addr + PAGE_SIZE) & PAGE_MASK;
> -		if (next > end)
> -			next = end;
> +		next = pte_addr_end(addr, end);
>  
>  		if (!pte_present(*pte))
>  			continue;
> @@ -1558,7 +1556,7 @@ void register_page_bootmem_memmap(unsigned long section_nr,
>  		get_page_bootmem(section_nr, pud_page(*pud), MIX_SECTION_INFO);
>  
>  		if (!boot_cpu_has(X86_FEATURE_PSE)) {
> -			next = (addr + PAGE_SIZE) & PAGE_MASK;
> +			next = pte_addr_end(addr, end);
>  			pmd = pmd_offset(pud, addr);
>  			if (pmd_none(*pmd))
>  				continue;
> diff --git a/include/linux/pgtable.h b/include/linux/pgtable.h
> index 32b6c52d41b9..0de09c6c89d2 100644
> --- a/include/linux/pgtable.h
> +++ b/include/linux/pgtable.h
> @@ -706,6 +706,13 @@ static inline pgprot_t pgprot_modify(pgprot_t oldprot, pgprot_t newprot)
>  })
>  #endif
>  
> +#ifndef pte_addr_end
> +#define pte_addr_end(addr, end)						\
> +({	unsigned long __boundary = ((addr) + PAGE_SIZE) & PAGE_MASK;	\
> +	(__boundary - 1 < (end) - 1) ? __boundary : (end);		\
> +})
> +#endif
> +
>  /*
>   * When walking page tables, we usually want to skip any p?d_none entries;
>   * and any p?d_bad entries - reporting the error before resetting to none.
> diff --git a/mm/kasan/init.c b/mm/kasan/init.c
> index fe6be0be1f76..89f748601f74 100644
> --- a/mm/kasan/init.c
> +++ b/mm/kasan/init.c
> @@ -349,9 +349,7 @@ static void kasan_remove_pte_table(pte_t *pte, unsigned long addr,
>  	unsigned long next;
>  
>  	for (; addr < end; addr = next, pte++) {
> -		next = (addr + PAGE_SIZE) & PAGE_MASK;
> -		if (next > end)
> -			next = end;
> +		next = pte_addr_end(addr, end);
>  
>  		if (!pte_present(*pte))
>  			continue;
> 

I'm not really a friend of this, I have to say. We're simply iterating
over single pages, not much magic ....

What would definitely make sense is replacing (addr + PAGE_SIZE) &
PAGE_MASK with PAGE_ALIGN() ...

-- 
Thanks,

David / dhildenb


^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH] mm: define pte_add_end for consistency
  2020-06-30 12:44 ` David Hildenbrand
@ 2020-07-01  2:11   ` Wei Yang
  2020-07-01  8:29     ` David Hildenbrand
  0 siblings, 1 reply; 9+ messages in thread
From: Wei Yang @ 2020-07-01  2:11 UTC (permalink / raw)
  To: David Hildenbrand
  Cc: Wei Yang, dave.hansen, luto, peterz, tglx, mingo, bp, akpm, x86,
	linux-kernel, kasan-dev, linux-mm

On Tue, Jun 30, 2020 at 02:44:00PM +0200, David Hildenbrand wrote:
>On 30.06.20 05:18, Wei Yang wrote:
>> When walking page tables, we define several helpers to get the address of
>> the next boundary. But we don't have one for pte level.
>> 
>> Let's define it and consolidate the code in several places.
>> 
>> Signed-off-by: Wei Yang <richard.weiyang@linux.alibaba.com>
>> ---
>>  arch/x86/mm/init_64.c   | 6 ++----
>>  include/linux/pgtable.h | 7 +++++++
>>  mm/kasan/init.c         | 4 +---
>>  3 files changed, 10 insertions(+), 7 deletions(-)
>> 
>> diff --git a/arch/x86/mm/init_64.c b/arch/x86/mm/init_64.c
>> index dbae185511cd..f902fbd17f27 100644
>> --- a/arch/x86/mm/init_64.c
>> +++ b/arch/x86/mm/init_64.c
>> @@ -973,9 +973,7 @@ remove_pte_table(pte_t *pte_start, unsigned long addr, unsigned long end,
>>  
>>  	pte = pte_start + pte_index(addr);
>>  	for (; addr < end; addr = next, pte++) {
>> -		next = (addr + PAGE_SIZE) & PAGE_MASK;
>> -		if (next > end)
>> -			next = end;
>> +		next = pte_addr_end(addr, end);
>>  
>>  		if (!pte_present(*pte))
>>  			continue;
>> @@ -1558,7 +1556,7 @@ void register_page_bootmem_memmap(unsigned long section_nr,
>>  		get_page_bootmem(section_nr, pud_page(*pud), MIX_SECTION_INFO);
>>  
>>  		if (!boot_cpu_has(X86_FEATURE_PSE)) {
>> -			next = (addr + PAGE_SIZE) & PAGE_MASK;
>> +			next = pte_addr_end(addr, end);
>>  			pmd = pmd_offset(pud, addr);
>>  			if (pmd_none(*pmd))
>>  				continue;
>> diff --git a/include/linux/pgtable.h b/include/linux/pgtable.h
>> index 32b6c52d41b9..0de09c6c89d2 100644
>> --- a/include/linux/pgtable.h
>> +++ b/include/linux/pgtable.h
>> @@ -706,6 +706,13 @@ static inline pgprot_t pgprot_modify(pgprot_t oldprot, pgprot_t newprot)
>>  })
>>  #endif
>>  
>> +#ifndef pte_addr_end
>> +#define pte_addr_end(addr, end)						\
>> +({	unsigned long __boundary = ((addr) + PAGE_SIZE) & PAGE_MASK;	\
>> +	(__boundary - 1 < (end) - 1) ? __boundary : (end);		\
>> +})
>> +#endif
>> +
>>  /*
>>   * When walking page tables, we usually want to skip any p?d_none entries;
>>   * and any p?d_bad entries - reporting the error before resetting to none.
>> diff --git a/mm/kasan/init.c b/mm/kasan/init.c
>> index fe6be0be1f76..89f748601f74 100644
>> --- a/mm/kasan/init.c
>> +++ b/mm/kasan/init.c
>> @@ -349,9 +349,7 @@ static void kasan_remove_pte_table(pte_t *pte, unsigned long addr,
>>  	unsigned long next;
>>  
>>  	for (; addr < end; addr = next, pte++) {
>> -		next = (addr + PAGE_SIZE) & PAGE_MASK;
>> -		if (next > end)
>> -			next = end;
>> +		next = pte_addr_end(addr, end);
>>  
>>  		if (!pte_present(*pte))
>>  			continue;
>> 
>
>I'm not really a friend of this I have to say. We're simply iterating
>over single pages, not much magic ....

Hmm... yes, we are iterating on page boundaries, but we may have cases
where addr or end is not page-aligned.

>
>What would definitely make sense is replacing (addr + PAGE_SIZE) &
>PAGE_MASK; by PAGE_ALIGN() ...
>

No, PAGE_ALIGN() expands to

	(addr + PAGE_SIZE - 1) & PAGE_MASK;

If we change the code to PAGE_ALIGN(), we would end up with an infinite loop.

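To make the stall concrete, here is a tiny userspace sketch
(illustrative only; the macros below merely mimic the kernel
definitions for 4 KiB pages): once addr is already page aligned,
PAGE_ALIGN(addr) returns addr itself, so next never advances.

	/* Userspace illustration; macros mimic the kernel ones for 4K pages. */
	#include <assert.h>

	#define PAGE_SIZE	4096UL
	#define PAGE_MASK	(~(PAGE_SIZE - 1))
	#define PAGE_ALIGN(x)	(((x) + PAGE_SIZE - 1) & PAGE_MASK)

	int main(void)
	{
		unsigned long addr = 0x2000UL;	/* already page aligned */

		/* The open-coded expression advances to the next page ... */
		assert(((addr + PAGE_SIZE) & PAGE_MASK) == 0x3000UL);

		/* ... but PAGE_ALIGN() rounds *up*, so an aligned addr maps to
		 * itself; with next == addr, "addr = next" loops forever. */
		assert(PAGE_ALIGN(addr) == addr);
		return 0;
	}
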
>-- 
>Thanks,
>
>David / dhildenb

-- 
Wei Yang
Help you, Help me

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH] mm: define pte_add_end for consistency
  2020-07-01  2:11   ` Wei Yang
@ 2020-07-01  8:29     ` David Hildenbrand
  2020-07-01 11:54       ` Wei Yang
  0 siblings, 1 reply; 9+ messages in thread
From: David Hildenbrand @ 2020-07-01  8:29 UTC (permalink / raw)
  To: Wei Yang
  Cc: dave.hansen, luto, peterz, tglx, mingo, bp, akpm, x86,
	linux-kernel, kasan-dev, linux-mm

On 01.07.20 04:11, Wei Yang wrote:
> On Tue, Jun 30, 2020 at 02:44:00PM +0200, David Hildenbrand wrote:
>> On 30.06.20 05:18, Wei Yang wrote:
>>> When walking page tables, we define several helpers to get the address of
>>> the next boundary. But we don't have one for pte level.
>>>
>>> Let's define it and consolidate the code in several places.
>>>
>>> Signed-off-by: Wei Yang <richard.weiyang@linux.alibaba.com>
>>> ---
>>>  arch/x86/mm/init_64.c   | 6 ++----
>>>  include/linux/pgtable.h | 7 +++++++
>>>  mm/kasan/init.c         | 4 +---
>>>  3 files changed, 10 insertions(+), 7 deletions(-)
>>>
>>> diff --git a/arch/x86/mm/init_64.c b/arch/x86/mm/init_64.c
>>> index dbae185511cd..f902fbd17f27 100644
>>> --- a/arch/x86/mm/init_64.c
>>> +++ b/arch/x86/mm/init_64.c
>>> @@ -973,9 +973,7 @@ remove_pte_table(pte_t *pte_start, unsigned long addr, unsigned long end,
>>>  
>>>  	pte = pte_start + pte_index(addr);
>>>  	for (; addr < end; addr = next, pte++) {
>>> -		next = (addr + PAGE_SIZE) & PAGE_MASK;
>>> -		if (next > end)
>>> -			next = end;
>>> +		next = pte_addr_end(addr, end);
>>>  
>>>  		if (!pte_present(*pte))
>>>  			continue;
>>> @@ -1558,7 +1556,7 @@ void register_page_bootmem_memmap(unsigned long section_nr,
>>>  		get_page_bootmem(section_nr, pud_page(*pud), MIX_SECTION_INFO);
>>>  
>>>  		if (!boot_cpu_has(X86_FEATURE_PSE)) {
>>> -			next = (addr + PAGE_SIZE) & PAGE_MASK;
>>> +			next = pte_addr_end(addr, end);
>>>  			pmd = pmd_offset(pud, addr);
>>>  			if (pmd_none(*pmd))
>>>  				continue;
>>> diff --git a/include/linux/pgtable.h b/include/linux/pgtable.h
>>> index 32b6c52d41b9..0de09c6c89d2 100644
>>> --- a/include/linux/pgtable.h
>>> +++ b/include/linux/pgtable.h
>>> @@ -706,6 +706,13 @@ static inline pgprot_t pgprot_modify(pgprot_t oldprot, pgprot_t newprot)
>>>  })
>>>  #endif
>>>  
>>> +#ifndef pte_addr_end
>>> +#define pte_addr_end(addr, end)						\
>>> +({	unsigned long __boundary = ((addr) + PAGE_SIZE) & PAGE_MASK;	\
>>> +	(__boundary - 1 < (end) - 1) ? __boundary : (end);		\
>>> +})
>>> +#endif
>>> +
>>>  /*
>>>   * When walking page tables, we usually want to skip any p?d_none entries;
>>>   * and any p?d_bad entries - reporting the error before resetting to none.
>>> diff --git a/mm/kasan/init.c b/mm/kasan/init.c
>>> index fe6be0be1f76..89f748601f74 100644
>>> --- a/mm/kasan/init.c
>>> +++ b/mm/kasan/init.c
>>> @@ -349,9 +349,7 @@ static void kasan_remove_pte_table(pte_t *pte, unsigned long addr,
>>>  	unsigned long next;
>>>  
>>>  	for (; addr < end; addr = next, pte++) {
>>> -		next = (addr + PAGE_SIZE) & PAGE_MASK;
>>> -		if (next > end)
>>> -			next = end;
>>> +		next = pte_addr_end(addr, end);
>>>  
>>>  		if (!pte_present(*pte))
>>>  			continue;
>>>
>>
>> I'm not really a friend of this I have to say. We're simply iterating
>> over single pages, not much magic ....
> 
> Hmm... yes, we are iterating on Page boundary, while we many have the case
> when addr or end is not PAGE_ALIGN.

I really do wonder if not having page aligned addresses actually happens
in real life. Page tables operate on page granularity, and
adding/removing unaligned parts feels wrong ... and that's also why I
dislike such a helper.

1. kasan_add_zero_shadow()/kasan_remove_zero_shadow(). If I understand
the logic (WARN_ON()) correctly, we bail out in case we would ever end
up in such a scenario, where we would want to add/remove things not
aligned to PAGE_SIZE.

2. remove_pagetable()...->remove_pte_table()

vmemmap_free() should never try to de-populate sub-pages. Even with
sub-section hot-add/remove (2MB / 512 pages), with valid struct page
sizes (56, 64, 72, 80), we always end up with full pages.

kernel_physical_mapping_remove() is only called via
arch_remove_memory(). That will never remove unaligned parts.

3. register_page_bootmem_memmap()

It operates on full pages only.


This needs in-depth analysis, but my gut feeling is that this alignment
is unnecessary.

> 
>>
>> What would definitely make sense is replacing (addr + PAGE_SIZE) &
>> PAGE_MASK; by PAGE_ALIGN() ...
>>
> 
> No, PAGE_ALIGN() is expanded to be 
> 
> 	(addr + PAGE_SIZE - 1) & PAGE_MASK;
> 
> If we change the code to PAGE_ALIGN(), we would end up with infinite loop.

Very right, it would have to be PAGE_ALIGN(addr + 1).

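Expanding the macro shows the equivalence: for any addr,
PAGE_ALIGN(addr + 1) reduces to exactly the expression that is
currently open-coded (again a userspace sketch with illustrative
re-definitions of the macros):

	/* Userspace check of the identity; macros mimic the kernel ones. */
	#include <assert.h>

	#define PAGE_SIZE	4096UL
	#define PAGE_MASK	(~(PAGE_SIZE - 1))
	#define PAGE_ALIGN(x)	(((x) + PAGE_SIZE - 1) & PAGE_MASK)

	int main(void)
	{
		/* PAGE_ALIGN(addr + 1) == ((addr + 1) + PAGE_SIZE - 1) & PAGE_MASK
		 *                      == (addr + PAGE_SIZE) & PAGE_MASK */
		for (unsigned long addr = 0x2000UL; addr < 0x3000UL; addr += 0x123UL)
			assert(PAGE_ALIGN(addr + 1) ==
			       ((addr + PAGE_SIZE) & PAGE_MASK));
		return 0;
	}
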
-- 
Thanks,

David / dhildenb


^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH] mm: define pte_add_end for consistency
  2020-07-01  8:29     ` David Hildenbrand
@ 2020-07-01 11:54       ` Wei Yang
  2020-07-02 16:28         ` David Hildenbrand
  0 siblings, 1 reply; 9+ messages in thread
From: Wei Yang @ 2020-07-01 11:54 UTC (permalink / raw)
  To: David Hildenbrand
  Cc: Wei Yang, dave.hansen, luto, peterz, tglx, mingo, bp, akpm, x86,
	linux-kernel, kasan-dev, linux-mm

On Wed, Jul 01, 2020 at 10:29:08AM +0200, David Hildenbrand wrote:
>On 01.07.20 04:11, Wei Yang wrote:
>> On Tue, Jun 30, 2020 at 02:44:00PM +0200, David Hildenbrand wrote:
>>> On 30.06.20 05:18, Wei Yang wrote:
>>>> When walking page tables, we define several helpers to get the address of
>>>> the next boundary. But we don't have one for pte level.
>>>>
>>>> Let's define it and consolidate the code in several places.
>>>>
>>>> Signed-off-by: Wei Yang <richard.weiyang@linux.alibaba.com>
>>>> ---
>>>>  arch/x86/mm/init_64.c   | 6 ++----
>>>>  include/linux/pgtable.h | 7 +++++++
>>>>  mm/kasan/init.c         | 4 +---
>>>>  3 files changed, 10 insertions(+), 7 deletions(-)
>>>>
>>>> diff --git a/arch/x86/mm/init_64.c b/arch/x86/mm/init_64.c
>>>> index dbae185511cd..f902fbd17f27 100644
>>>> --- a/arch/x86/mm/init_64.c
>>>> +++ b/arch/x86/mm/init_64.c
>>>> @@ -973,9 +973,7 @@ remove_pte_table(pte_t *pte_start, unsigned long addr, unsigned long end,
>>>>  
>>>>  	pte = pte_start + pte_index(addr);
>>>>  	for (; addr < end; addr = next, pte++) {
>>>> -		next = (addr + PAGE_SIZE) & PAGE_MASK;
>>>> -		if (next > end)
>>>> -			next = end;
>>>> +		next = pte_addr_end(addr, end);
>>>>  
>>>>  		if (!pte_present(*pte))
>>>>  			continue;
>>>> @@ -1558,7 +1556,7 @@ void register_page_bootmem_memmap(unsigned long section_nr,
>>>>  		get_page_bootmem(section_nr, pud_page(*pud), MIX_SECTION_INFO);
>>>>  
>>>>  		if (!boot_cpu_has(X86_FEATURE_PSE)) {
>>>> -			next = (addr + PAGE_SIZE) & PAGE_MASK;
>>>> +			next = pte_addr_end(addr, end);
>>>>  			pmd = pmd_offset(pud, addr);
>>>>  			if (pmd_none(*pmd))
>>>>  				continue;
>>>> diff --git a/include/linux/pgtable.h b/include/linux/pgtable.h
>>>> index 32b6c52d41b9..0de09c6c89d2 100644
>>>> --- a/include/linux/pgtable.h
>>>> +++ b/include/linux/pgtable.h
>>>> @@ -706,6 +706,13 @@ static inline pgprot_t pgprot_modify(pgprot_t oldprot, pgprot_t newprot)
>>>>  })
>>>>  #endif
>>>>  
>>>> +#ifndef pte_addr_end
>>>> +#define pte_addr_end(addr, end)						\
>>>> +({	unsigned long __boundary = ((addr) + PAGE_SIZE) & PAGE_MASK;	\
>>>> +	(__boundary - 1 < (end) - 1) ? __boundary : (end);		\
>>>> +})
>>>> +#endif
>>>> +
>>>>  /*
>>>>   * When walking page tables, we usually want to skip any p?d_none entries;
>>>>   * and any p?d_bad entries - reporting the error before resetting to none.
>>>> diff --git a/mm/kasan/init.c b/mm/kasan/init.c
>>>> index fe6be0be1f76..89f748601f74 100644
>>>> --- a/mm/kasan/init.c
>>>> +++ b/mm/kasan/init.c
>>>> @@ -349,9 +349,7 @@ static void kasan_remove_pte_table(pte_t *pte, unsigned long addr,
>>>>  	unsigned long next;
>>>>  
>>>>  	for (; addr < end; addr = next, pte++) {
>>>> -		next = (addr + PAGE_SIZE) & PAGE_MASK;
>>>> -		if (next > end)
>>>> -			next = end;
>>>> +		next = pte_addr_end(addr, end);
>>>>  
>>>>  		if (!pte_present(*pte))
>>>>  			continue;
>>>>
>>>
>>> I'm not really a friend of this I have to say. We're simply iterating
>>> over single pages, not much magic ....
>> 
>> Hmm... yes, we are iterating on Page boundary, while we many have the case
>> when addr or end is not PAGE_ALIGN.
>
>I really do wonder if not having page aligned addresses actually happens
>in real life. Page tables operate on page granularity, and
>adding/removing unaligned parts feels wrong ... and that's also why I
>dislike such a helper.
>
>1. kasan_add_zero_shadow()/kasan_remove_zero_shadow(). If I understand
>the logic (WARN_ON()) correctly, we bail out in case we would ever end
>up in such a scenario, where we would want to add/remove things not
>aligned to PAGE_SIZE.
>
>2. remove_pagetable()...->remove_pte_table()
>
>vmemmap_free() should never try to de-populate sub-pages. Even with
>sub-section hot-add/remove (2MB / 512 pages), with valid struct page
>sizes (56, 64, 72, 80), we always end up with full pages.
>
>kernel_physical_mapping_remove() is only called via
>arch_remove_memory(). That will never remove unaligned parts.
>

My mind is not very clear right now, but if you look into
remove_pte_table(), it has two cases based on the alignment of addr and
next.

If we always remove whole pages, the second case won't happen, will it?

>3. register_page_bootmem_memmap()
>
>It operates on full pages only.
>
>
>This needs in-depth analysis, but my gut feeling is that this alignment
>is unnecessary.
>
>> 
>>>
>>> What would definitely make sense is replacing (addr + PAGE_SIZE) &
>>> PAGE_MASK; by PAGE_ALIGN() ...
>>>
>> 
>> No, PAGE_ALIGN() is expanded to be 
>> 
>> 	(addr + PAGE_SIZE - 1) & PAGE_MASK;
>> 
>> If we change the code to PAGE_ALIGN(), we would end up with infinite loop.
>
>Very right, it would have to be PAGE_ALIGN(addr + 1).
>
>-- 
>Thanks,
>
>David / dhildenb

-- 
Wei Yang
Help you, Help me

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH] mm: define pte_add_end for consistency
  2020-07-01 11:54       ` Wei Yang
@ 2020-07-02 16:28         ` David Hildenbrand
  2020-07-03  1:34           ` Wei Yang
  0 siblings, 1 reply; 9+ messages in thread
From: David Hildenbrand @ 2020-07-02 16:28 UTC (permalink / raw)
  To: Wei Yang
  Cc: dave.hansen, luto, peterz, tglx, mingo, bp, akpm, x86,
	linux-kernel, kasan-dev, linux-mm

On 01.07.20 13:54, Wei Yang wrote:
> On Wed, Jul 01, 2020 at 10:29:08AM +0200, David Hildenbrand wrote:
>> On 01.07.20 04:11, Wei Yang wrote:
>>> On Tue, Jun 30, 2020 at 02:44:00PM +0200, David Hildenbrand wrote:
>>>> On 30.06.20 05:18, Wei Yang wrote:
>>>>> When walking page tables, we define several helpers to get the address of
>>>>> the next boundary. But we don't have one for pte level.
>>>>>
>>>>> Let's define it and consolidate the code in several places.
>>>>>
>>>>> Signed-off-by: Wei Yang <richard.weiyang@linux.alibaba.com>
>>>>> ---
>>>>>  arch/x86/mm/init_64.c   | 6 ++----
>>>>>  include/linux/pgtable.h | 7 +++++++
>>>>>  mm/kasan/init.c         | 4 +---
>>>>>  3 files changed, 10 insertions(+), 7 deletions(-)
>>>>>
>>>>> diff --git a/arch/x86/mm/init_64.c b/arch/x86/mm/init_64.c
>>>>> index dbae185511cd..f902fbd17f27 100644
>>>>> --- a/arch/x86/mm/init_64.c
>>>>> +++ b/arch/x86/mm/init_64.c
>>>>> @@ -973,9 +973,7 @@ remove_pte_table(pte_t *pte_start, unsigned long addr, unsigned long end,
>>>>>  
>>>>>  	pte = pte_start + pte_index(addr);
>>>>>  	for (; addr < end; addr = next, pte++) {
>>>>> -		next = (addr + PAGE_SIZE) & PAGE_MASK;
>>>>> -		if (next > end)
>>>>> -			next = end;
>>>>> +		next = pte_addr_end(addr, end);
>>>>>  
>>>>>  		if (!pte_present(*pte))
>>>>>  			continue;
>>>>> @@ -1558,7 +1556,7 @@ void register_page_bootmem_memmap(unsigned long section_nr,
>>>>>  		get_page_bootmem(section_nr, pud_page(*pud), MIX_SECTION_INFO);
>>>>>  
>>>>>  		if (!boot_cpu_has(X86_FEATURE_PSE)) {
>>>>> -			next = (addr + PAGE_SIZE) & PAGE_MASK;
>>>>> +			next = pte_addr_end(addr, end);
>>>>>  			pmd = pmd_offset(pud, addr);
>>>>>  			if (pmd_none(*pmd))
>>>>>  				continue;
>>>>> diff --git a/include/linux/pgtable.h b/include/linux/pgtable.h
>>>>> index 32b6c52d41b9..0de09c6c89d2 100644
>>>>> --- a/include/linux/pgtable.h
>>>>> +++ b/include/linux/pgtable.h
>>>>> @@ -706,6 +706,13 @@ static inline pgprot_t pgprot_modify(pgprot_t oldprot, pgprot_t newprot)
>>>>>  })
>>>>>  #endif
>>>>>  
>>>>> +#ifndef pte_addr_end
>>>>> +#define pte_addr_end(addr, end)						\
>>>>> +({	unsigned long __boundary = ((addr) + PAGE_SIZE) & PAGE_MASK;	\
>>>>> +	(__boundary - 1 < (end) - 1) ? __boundary : (end);		\
>>>>> +})
>>>>> +#endif
>>>>> +
>>>>>  /*
>>>>>   * When walking page tables, we usually want to skip any p?d_none entries;
>>>>>   * and any p?d_bad entries - reporting the error before resetting to none.
>>>>> diff --git a/mm/kasan/init.c b/mm/kasan/init.c
>>>>> index fe6be0be1f76..89f748601f74 100644
>>>>> --- a/mm/kasan/init.c
>>>>> +++ b/mm/kasan/init.c
>>>>> @@ -349,9 +349,7 @@ static void kasan_remove_pte_table(pte_t *pte, unsigned long addr,
>>>>>  	unsigned long next;
>>>>>  
>>>>>  	for (; addr < end; addr = next, pte++) {
>>>>> -		next = (addr + PAGE_SIZE) & PAGE_MASK;
>>>>> -		if (next > end)
>>>>> -			next = end;
>>>>> +		next = pte_addr_end(addr, end);
>>>>>  
>>>>>  		if (!pte_present(*pte))
>>>>>  			continue;
>>>>>
>>>>
>>>> I'm not really a friend of this I have to say. We're simply iterating
>>>> over single pages, not much magic ....
>>>
>>> Hmm... yes, we are iterating on Page boundary, while we many have the case
>>> when addr or end is not PAGE_ALIGN.
>>
>> I really do wonder if not having page aligned addresses actually happens
>> in real life. Page tables operate on page granularity, and
>> adding/removing unaligned parts feels wrong ... and that's also why I
>> dislike such a helper.
>>
>> 1. kasan_add_zero_shadow()/kasan_remove_zero_shadow(). If I understand
>> the logic (WARN_ON()) correctly, we bail out in case we would ever end
>> up in such a scenario, where we would want to add/remove things not
>> aligned to PAGE_SIZE.
>>
>> 2. remove_pagetable()...->remove_pte_table()
>>
>> vmemmap_free() should never try to de-populate sub-pages. Even with
>> sub-section hot-add/remove (2MB / 512 pages), with valid struct page
>> sizes (56, 64, 72, 80), we always end up with full pages.
>>
>> kernel_physical_mapping_remove() is only called via
>> arch_remove_memory(). That will never remove unaligned parts.
>>
> 
> I don't have a very clear mind now, while when you look into
> remove_pte_table(), it has two cases based on alignment of addr and next.
> 
> If we always remove a page, the second case won't happen?

So, the code comments state that the second case can only happen for
vmemmap, never for direct mappings.

I don't see how this could ever happen with current page sizes, even
with sub-section hotadd (2MB). Maybe that is a legacy leftover, or it
was never relevant? Or I am missing something important and we could
have sub-4k-page vmemmap data.

-- 
Thanks,

David / dhildenb


^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH] mm: define pte_add_end for consistency
  2020-07-02 16:28         ` David Hildenbrand
@ 2020-07-03  1:34           ` Wei Yang
  2020-07-03  7:23             ` David Hildenbrand
  0 siblings, 1 reply; 9+ messages in thread
From: Wei Yang @ 2020-07-03  1:34 UTC (permalink / raw)
  To: David Hildenbrand
  Cc: Wei Yang, dave.hansen, luto, peterz, tglx, mingo, bp, akpm, x86,
	linux-kernel, kasan-dev, linux-mm

On Thu, Jul 02, 2020 at 06:28:19PM +0200, David Hildenbrand wrote:
>On 01.07.20 13:54, Wei Yang wrote:
>> On Wed, Jul 01, 2020 at 10:29:08AM +0200, David Hildenbrand wrote:
>>> On 01.07.20 04:11, Wei Yang wrote:
>>>> On Tue, Jun 30, 2020 at 02:44:00PM +0200, David Hildenbrand wrote:
>>>>> On 30.06.20 05:18, Wei Yang wrote:
>>>>>> When walking page tables, we define several helpers to get the address of
>>>>>> the next boundary. But we don't have one for pte level.
>>>>>>
>>>>>> Let's define it and consolidate the code in several places.
>>>>>>
>>>>>> Signed-off-by: Wei Yang <richard.weiyang@linux.alibaba.com>
>>>>>> ---
>>>>>>  arch/x86/mm/init_64.c   | 6 ++----
>>>>>>  include/linux/pgtable.h | 7 +++++++
>>>>>>  mm/kasan/init.c         | 4 +---
>>>>>>  3 files changed, 10 insertions(+), 7 deletions(-)
>>>>>>
>>>>>> diff --git a/arch/x86/mm/init_64.c b/arch/x86/mm/init_64.c
>>>>>> index dbae185511cd..f902fbd17f27 100644
>>>>>> --- a/arch/x86/mm/init_64.c
>>>>>> +++ b/arch/x86/mm/init_64.c
>>>>>> @@ -973,9 +973,7 @@ remove_pte_table(pte_t *pte_start, unsigned long addr, unsigned long end,
>>>>>>  
>>>>>>  	pte = pte_start + pte_index(addr);
>>>>>>  	for (; addr < end; addr = next, pte++) {
>>>>>> -		next = (addr + PAGE_SIZE) & PAGE_MASK;
>>>>>> -		if (next > end)
>>>>>> -			next = end;
>>>>>> +		next = pte_addr_end(addr, end);
>>>>>>  
>>>>>>  		if (!pte_present(*pte))
>>>>>>  			continue;
>>>>>> @@ -1558,7 +1556,7 @@ void register_page_bootmem_memmap(unsigned long section_nr,
>>>>>>  		get_page_bootmem(section_nr, pud_page(*pud), MIX_SECTION_INFO);
>>>>>>  
>>>>>>  		if (!boot_cpu_has(X86_FEATURE_PSE)) {
>>>>>> -			next = (addr + PAGE_SIZE) & PAGE_MASK;
>>>>>> +			next = pte_addr_end(addr, end);
>>>>>>  			pmd = pmd_offset(pud, addr);
>>>>>>  			if (pmd_none(*pmd))
>>>>>>  				continue;
>>>>>> diff --git a/include/linux/pgtable.h b/include/linux/pgtable.h
>>>>>> index 32b6c52d41b9..0de09c6c89d2 100644
>>>>>> --- a/include/linux/pgtable.h
>>>>>> +++ b/include/linux/pgtable.h
>>>>>> @@ -706,6 +706,13 @@ static inline pgprot_t pgprot_modify(pgprot_t oldprot, pgprot_t newprot)
>>>>>>  })
>>>>>>  #endif
>>>>>>  
>>>>>> +#ifndef pte_addr_end
>>>>>> +#define pte_addr_end(addr, end)						\
>>>>>> +({	unsigned long __boundary = ((addr) + PAGE_SIZE) & PAGE_MASK;	\
>>>>>> +	(__boundary - 1 < (end) - 1) ? __boundary : (end);		\
>>>>>> +})
>>>>>> +#endif
>>>>>> +
>>>>>>  /*
>>>>>>   * When walking page tables, we usually want to skip any p?d_none entries;
>>>>>>   * and any p?d_bad entries - reporting the error before resetting to none.
>>>>>> diff --git a/mm/kasan/init.c b/mm/kasan/init.c
>>>>>> index fe6be0be1f76..89f748601f74 100644
>>>>>> --- a/mm/kasan/init.c
>>>>>> +++ b/mm/kasan/init.c
>>>>>> @@ -349,9 +349,7 @@ static void kasan_remove_pte_table(pte_t *pte, unsigned long addr,
>>>>>>  	unsigned long next;
>>>>>>  
>>>>>>  	for (; addr < end; addr = next, pte++) {
>>>>>> -		next = (addr + PAGE_SIZE) & PAGE_MASK;
>>>>>> -		if (next > end)
>>>>>> -			next = end;
>>>>>> +		next = pte_addr_end(addr, end);
>>>>>>  
>>>>>>  		if (!pte_present(*pte))
>>>>>>  			continue;
>>>>>>
>>>>>
>>>>> I'm not really a friend of this I have to say. We're simply iterating
>>>>> over single pages, not much magic ....
>>>>
>>>> Hmm... yes, we are iterating on Page boundary, while we many have the case
>>>> when addr or end is not PAGE_ALIGN.
>>>
>>> I really do wonder if not having page aligned addresses actually happens
>>> in real life. Page tables operate on page granularity, and
>>> adding/removing unaligned parts feels wrong ... and that's also why I
>>> dislike such a helper.
>>>
>>> 1. kasan_add_zero_shadow()/kasan_remove_zero_shadow(). If I understand
>>> the logic (WARN_ON()) correctly, we bail out in case we would ever end
>>> up in such a scenario, where we would want to add/remove things not
>>> aligned to PAGE_SIZE.
>>>
>>> 2. remove_pagetable()...->remove_pte_table()
>>>
>>> vmemmap_free() should never try to de-populate sub-pages. Even with
>>> sub-section hot-add/remove (2MB / 512 pages), with valid struct page
>>> sizes (56, 64, 72, 80), we always end up with full pages.
>>>
>>> kernel_physical_mapping_remove() is only called via
>>> arch_remove_memory(). That will never remove unaligned parts.
>>>
>> 
>> I don't have a very clear mind now, while when you look into
>> remove_pte_table(), it has two cases based on alignment of addr and next.
>> 
>> If we always remove a page, the second case won't happen?
>
>So, the code talks about that the second case can only happen for
>vmemmap, never for direct mappings.
>
>I don't see a way how this could ever happen with current page sizes,
>even with sub-section hotadd (2MB). Maybe that is a legacy leftover or
>was never relevant? Or I am missing something important, where we could
>have sub-4k-page vmemmap data.
>

I did the calculation for the sub-section page struct size; it is page
size (4K) aligned. This means you are right that we won't depopulate a
sub-page.

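As a quick sketch of that arithmetic (assuming the usual 2 MiB
sub-section, i.e. 512 pages, and the struct page sizes mentioned
earlier in the thread): 512 * s is a multiple of 4096 whenever s is a
multiple of 8, and 56, 64, 72 and 80 all are.

	/* Sketch: sub-section vmemmap always works out to whole 4K pages. */
	#include <assert.h>

	int main(void)
	{
		const unsigned long pages_per_subsection = 512;	/* 2 MiB / 4 KiB */
		const unsigned long struct_page_sizes[] = { 56, 64, 72, 80 };

		for (int i = 0; i < 4; i++) {
			/* 28672, 32768, 36864 and 40960 bytes respectively. */
			unsigned long vmemmap_bytes =
				pages_per_subsection * struct_page_sizes[i];
			assert(vmemmap_bytes % 4096 == 0);
		}
		return 0;
	}
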
And yes, I am not sure all those variants would fit this case, so I
would like to leave it as it is for now. What is your opinion?

>-- 
>Thanks,
>
>David / dhildenb

-- 
Wei Yang
Help you, Help me

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH] mm: define pte_add_end for consistency
  2020-07-03  1:34           ` Wei Yang
@ 2020-07-03  7:23             ` David Hildenbrand
  2020-07-03  8:33               ` Wei Yang
  0 siblings, 1 reply; 9+ messages in thread
From: David Hildenbrand @ 2020-07-03  7:23 UTC (permalink / raw)
  To: Wei Yang
  Cc: dave.hansen, luto, peterz, tglx, mingo, bp, akpm, x86,
	linux-kernel, kasan-dev, linux-mm

On 03.07.20 03:34, Wei Yang wrote:
> On Thu, Jul 02, 2020 at 06:28:19PM +0200, David Hildenbrand wrote:
>> On 01.07.20 13:54, Wei Yang wrote:
>>> On Wed, Jul 01, 2020 at 10:29:08AM +0200, David Hildenbrand wrote:
>>>> On 01.07.20 04:11, Wei Yang wrote:
>>>>> On Tue, Jun 30, 2020 at 02:44:00PM +0200, David Hildenbrand wrote:
>>>>>> On 30.06.20 05:18, Wei Yang wrote:
>>>>>>> When walking page tables, we define several helpers to get the address of
>>>>>>> the next boundary. But we don't have one for pte level.
>>>>>>>
>>>>>>> Let's define it and consolidate the code in several places.
>>>>>>>
>>>>>>> Signed-off-by: Wei Yang <richard.weiyang@linux.alibaba.com>
>>>>>>> ---
>>>>>>>  arch/x86/mm/init_64.c   | 6 ++----
>>>>>>>  include/linux/pgtable.h | 7 +++++++
>>>>>>>  mm/kasan/init.c         | 4 +---
>>>>>>>  3 files changed, 10 insertions(+), 7 deletions(-)
>>>>>>>
>>>>>>> diff --git a/arch/x86/mm/init_64.c b/arch/x86/mm/init_64.c
>>>>>>> index dbae185511cd..f902fbd17f27 100644
>>>>>>> --- a/arch/x86/mm/init_64.c
>>>>>>> +++ b/arch/x86/mm/init_64.c
>>>>>>> @@ -973,9 +973,7 @@ remove_pte_table(pte_t *pte_start, unsigned long addr, unsigned long end,
>>>>>>>  
>>>>>>>  	pte = pte_start + pte_index(addr);
>>>>>>>  	for (; addr < end; addr = next, pte++) {
>>>>>>> -		next = (addr + PAGE_SIZE) & PAGE_MASK;
>>>>>>> -		if (next > end)
>>>>>>> -			next = end;
>>>>>>> +		next = pte_addr_end(addr, end);
>>>>>>>  
>>>>>>>  		if (!pte_present(*pte))
>>>>>>>  			continue;
>>>>>>> @@ -1558,7 +1556,7 @@ void register_page_bootmem_memmap(unsigned long section_nr,
>>>>>>>  		get_page_bootmem(section_nr, pud_page(*pud), MIX_SECTION_INFO);
>>>>>>>  
>>>>>>>  		if (!boot_cpu_has(X86_FEATURE_PSE)) {
>>>>>>> -			next = (addr + PAGE_SIZE) & PAGE_MASK;
>>>>>>> +			next = pte_addr_end(addr, end);
>>>>>>>  			pmd = pmd_offset(pud, addr);
>>>>>>>  			if (pmd_none(*pmd))
>>>>>>>  				continue;
>>>>>>> diff --git a/include/linux/pgtable.h b/include/linux/pgtable.h
>>>>>>> index 32b6c52d41b9..0de09c6c89d2 100644
>>>>>>> --- a/include/linux/pgtable.h
>>>>>>> +++ b/include/linux/pgtable.h
>>>>>>> @@ -706,6 +706,13 @@ static inline pgprot_t pgprot_modify(pgprot_t oldprot, pgprot_t newprot)
>>>>>>>  })
>>>>>>>  #endif
>>>>>>>  
>>>>>>> +#ifndef pte_addr_end
>>>>>>> +#define pte_addr_end(addr, end)						\
>>>>>>> +({	unsigned long __boundary = ((addr) + PAGE_SIZE) & PAGE_MASK;	\
>>>>>>> +	(__boundary - 1 < (end) - 1) ? __boundary : (end);		\
>>>>>>> +})
>>>>>>> +#endif
>>>>>>> +
>>>>>>>  /*
>>>>>>>   * When walking page tables, we usually want to skip any p?d_none entries;
>>>>>>>   * and any p?d_bad entries - reporting the error before resetting to none.
>>>>>>> diff --git a/mm/kasan/init.c b/mm/kasan/init.c
>>>>>>> index fe6be0be1f76..89f748601f74 100644
>>>>>>> --- a/mm/kasan/init.c
>>>>>>> +++ b/mm/kasan/init.c
>>>>>>> @@ -349,9 +349,7 @@ static void kasan_remove_pte_table(pte_t *pte, unsigned long addr,
>>>>>>>  	unsigned long next;
>>>>>>>  
>>>>>>>  	for (; addr < end; addr = next, pte++) {
>>>>>>> -		next = (addr + PAGE_SIZE) & PAGE_MASK;
>>>>>>> -		if (next > end)
>>>>>>> -			next = end;
>>>>>>> +		next = pte_addr_end(addr, end);
>>>>>>>  
>>>>>>>  		if (!pte_present(*pte))
>>>>>>>  			continue;
>>>>>>>
>>>>>>
>>>>>> I'm not really a friend of this I have to say. We're simply iterating
>>>>>> over single pages, not much magic ....
>>>>>
>>>>> Hmm... yes, we are iterating on Page boundary, while we many have the case
>>>>> when addr or end is not PAGE_ALIGN.
>>>>
>>>> I really do wonder if not having page aligned addresses actually happens
>>>> in real life. Page tables operate on page granularity, and
>>>> adding/removing unaligned parts feels wrong ... and that's also why I
>>>> dislike such a helper.
>>>>
>>>> 1. kasan_add_zero_shadow()/kasan_remove_zero_shadow(). If I understand
>>>> the logic (WARN_ON()) correctly, we bail out in case we would ever end
>>>> up in such a scenario, where we would want to add/remove things not
>>>> aligned to PAGE_SIZE.
>>>>
>>>> 2. remove_pagetable()...->remove_pte_table()
>>>>
>>>> vmemmap_free() should never try to de-populate sub-pages. Even with
>>>> sub-section hot-add/remove (2MB / 512 pages), with valid struct page
>>>> sizes (56, 64, 72, 80), we always end up with full pages.
>>>>
>>>> kernel_physical_mapping_remove() is only called via
>>>> arch_remove_memory(). That will never remove unaligned parts.
>>>>
>>>
>>> I don't have a very clear mind now, while when you look into
>>> remove_pte_table(), it has two cases based on alignment of addr and next.
>>>
>>> If we always remove a page, the second case won't happen?
>>
>> So, the code talks about that the second case can only happen for
>> vmemmap, never for direct mappings.
>>
>> I don't see a way how this could ever happen with current page sizes,
>> even with sub-section hotadd (2MB). Maybe that is a legacy leftover or
>> was never relevant? Or I am missing something important, where we could
>> have sub-4k-page vmemmap data.
>>
> 
> I took a calculation on the sub-section page struct size, it is page size (4K)
> aligned. This means you are right, which we won't depopulate a sub-page.
> 
> And yes, I am not sure all those variants would fit this case. So I would like
> to leave as it now. How about your opinion?

I'd say we clean this up and protect it with WARN_ON_ONCE(). Then it
won't take another round of investigation to find out that handling
sub-pages is irrelevant.

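One possible shape of such a cleanup, purely as a sketch (the signature
below is abbreviated, and the placement of the WARN_ON_ONCE() as well
as the simplified loop body are assumptions, not something that was
actually posted): require page-aligned inputs and drop the
per-iteration clamping.

	/* Sketch only -- assumes remove_pte_table() may rely on aligned inputs. */
	static void remove_pte_table(pte_t *pte_start, unsigned long addr,
				     unsigned long end, bool direct)
	{
		pte_t *pte = pte_start + pte_index(addr);

		if (WARN_ON_ONCE(!PAGE_ALIGNED(addr) || !PAGE_ALIGNED(end)))
			return;

		for (; addr < end; addr += PAGE_SIZE, pte++) {
			if (!pte_present(*pte))
				continue;
			/* ... free the fully covered page, as today ... */
		}
	}
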
If you don't want to tackle this, I can have a look. Just let me know.

-- 
Thanks,

David / dhildenb


^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH] mm: define pte_add_end for consistency
  2020-07-03  7:23             ` David Hildenbrand
@ 2020-07-03  8:33               ` Wei Yang
  0 siblings, 0 replies; 9+ messages in thread
From: Wei Yang @ 2020-07-03  8:33 UTC (permalink / raw)
  To: David Hildenbrand
  Cc: Wei Yang, dave.hansen, luto, peterz, tglx, mingo, bp, akpm, x86,
	linux-kernel, kasan-dev, linux-mm

On Fri, Jul 03, 2020 at 09:23:30AM +0200, David Hildenbrand wrote:
>On 03.07.20 03:34, Wei Yang wrote:
>> On Thu, Jul 02, 2020 at 06:28:19PM +0200, David Hildenbrand wrote:
>>> On 01.07.20 13:54, Wei Yang wrote:
>>>> On Wed, Jul 01, 2020 at 10:29:08AM +0200, David Hildenbrand wrote:
>>>>> On 01.07.20 04:11, Wei Yang wrote:
>>>>>> On Tue, Jun 30, 2020 at 02:44:00PM +0200, David Hildenbrand wrote:
>>>>>>> On 30.06.20 05:18, Wei Yang wrote:
>>>>>>>> When walking page tables, we define several helpers to get the address of
>>>>>>>> the next boundary. But we don't have one for pte level.
>>>>>>>>
>>>>>>>> Let's define it and consolidate the code in several places.
>>>>>>>>
>>>>>>>> Signed-off-by: Wei Yang <richard.weiyang@linux.alibaba.com>
>>>>>>>> ---
>>>>>>>>  arch/x86/mm/init_64.c   | 6 ++----
>>>>>>>>  include/linux/pgtable.h | 7 +++++++
>>>>>>>>  mm/kasan/init.c         | 4 +---
>>>>>>>>  3 files changed, 10 insertions(+), 7 deletions(-)
>>>>>>>>
>>>>>>>> diff --git a/arch/x86/mm/init_64.c b/arch/x86/mm/init_64.c
>>>>>>>> index dbae185511cd..f902fbd17f27 100644
>>>>>>>> --- a/arch/x86/mm/init_64.c
>>>>>>>> +++ b/arch/x86/mm/init_64.c
>>>>>>>> @@ -973,9 +973,7 @@ remove_pte_table(pte_t *pte_start, unsigned long addr, unsigned long end,
>>>>>>>>  
>>>>>>>>  	pte = pte_start + pte_index(addr);
>>>>>>>>  	for (; addr < end; addr = next, pte++) {
>>>>>>>> -		next = (addr + PAGE_SIZE) & PAGE_MASK;
>>>>>>>> -		if (next > end)
>>>>>>>> -			next = end;
>>>>>>>> +		next = pte_addr_end(addr, end);
>>>>>>>>  
>>>>>>>>  		if (!pte_present(*pte))
>>>>>>>>  			continue;
>>>>>>>> @@ -1558,7 +1556,7 @@ void register_page_bootmem_memmap(unsigned long section_nr,
>>>>>>>>  		get_page_bootmem(section_nr, pud_page(*pud), MIX_SECTION_INFO);
>>>>>>>>  
>>>>>>>>  		if (!boot_cpu_has(X86_FEATURE_PSE)) {
>>>>>>>> -			next = (addr + PAGE_SIZE) & PAGE_MASK;
>>>>>>>> +			next = pte_addr_end(addr, end);
>>>>>>>>  			pmd = pmd_offset(pud, addr);
>>>>>>>>  			if (pmd_none(*pmd))
>>>>>>>>  				continue;
>>>>>>>> diff --git a/include/linux/pgtable.h b/include/linux/pgtable.h
>>>>>>>> index 32b6c52d41b9..0de09c6c89d2 100644
>>>>>>>> --- a/include/linux/pgtable.h
>>>>>>>> +++ b/include/linux/pgtable.h
>>>>>>>> @@ -706,6 +706,13 @@ static inline pgprot_t pgprot_modify(pgprot_t oldprot, pgprot_t newprot)
>>>>>>>>  })
>>>>>>>>  #endif
>>>>>>>>  
>>>>>>>> +#ifndef pte_addr_end
>>>>>>>> +#define pte_addr_end(addr, end)						\
>>>>>>>> +({	unsigned long __boundary = ((addr) + PAGE_SIZE) & PAGE_MASK;	\
>>>>>>>> +	(__boundary - 1 < (end) - 1) ? __boundary : (end);		\
>>>>>>>> +})
>>>>>>>> +#endif
>>>>>>>> +
>>>>>>>>  /*
>>>>>>>>   * When walking page tables, we usually want to skip any p?d_none entries;
>>>>>>>>   * and any p?d_bad entries - reporting the error before resetting to none.
>>>>>>>> diff --git a/mm/kasan/init.c b/mm/kasan/init.c
>>>>>>>> index fe6be0be1f76..89f748601f74 100644
>>>>>>>> --- a/mm/kasan/init.c
>>>>>>>> +++ b/mm/kasan/init.c
>>>>>>>> @@ -349,9 +349,7 @@ static void kasan_remove_pte_table(pte_t *pte, unsigned long addr,
>>>>>>>>  	unsigned long next;
>>>>>>>>  
>>>>>>>>  	for (; addr < end; addr = next, pte++) {
>>>>>>>> -		next = (addr + PAGE_SIZE) & PAGE_MASK;
>>>>>>>> -		if (next > end)
>>>>>>>> -			next = end;
>>>>>>>> +		next = pte_addr_end(addr, end);
>>>>>>>>  
>>>>>>>>  		if (!pte_present(*pte))
>>>>>>>>  			continue;
>>>>>>>>
>>>>>>>
>>>>>>> I'm not really a friend of this I have to say. We're simply iterating
>>>>>>> over single pages, not much magic ....
>>>>>>
>>>>>> Hmm... yes, we are iterating on Page boundary, while we many have the case
>>>>>> when addr or end is not PAGE_ALIGN.
>>>>>
>>>>> I really do wonder if not having page aligned addresses actually happens
>>>>> in real life. Page tables operate on page granularity, and
>>>>> adding/removing unaligned parts feels wrong ... and that's also why I
>>>>> dislike such a helper.
>>>>>
>>>>> 1. kasan_add_zero_shadow()/kasan_remove_zero_shadow(). If I understand
>>>>> the logic (WARN_ON()) correctly, we bail out in case we would ever end
>>>>> up in such a scenario, where we would want to add/remove things not
>>>>> aligned to PAGE_SIZE.
>>>>>
>>>>> 2. remove_pagetable()...->remove_pte_table()
>>>>>
>>>>> vmemmap_free() should never try to de-populate sub-pages. Even with
>>>>> sub-section hot-add/remove (2MB / 512 pages), with valid struct page
>>>>> sizes (56, 64, 72, 80), we always end up with full pages.
>>>>>
>>>>> kernel_physical_mapping_remove() is only called via
>>>>> arch_remove_memory(). That will never remove unaligned parts.
>>>>>
>>>>
>>>> I don't have a very clear mind now, while when you look into
>>>> remove_pte_table(), it has two cases based on alignment of addr and next.
>>>>
>>>> If we always remove a page, the second case won't happen?
>>>
>>> So, the code talks about that the second case can only happen for
>>> vmemmap, never for direct mappings.
>>>
>>> I don't see a way how this could ever happen with current page sizes,
>>> even with sub-section hotadd (2MB). Maybe that is a legacy leftover or
>>> was never relevant? Or I am missing something important, where we could
>>> have sub-4k-page vmemmap data.
>>>
>> 
>> I took a calculation on the sub-section page struct size, it is page size (4K)
>> aligned. This means you are right, which we won't depopulate a sub-page.
>> 
>> And yes, I am not sure all those variants would fit this case. So I would like
>> to leave as it now. How about your opinion?
>
>I'd say we clean this up and protect it by WARN_ON_ONCE(). Then, it
>won't need another round of investigation to find out that handling
>sub-pages is irrelevant.
>
>If you don't want to tackle this, I can have a look. Just let me know.
>

Actually, I don't quite get what you are trying to do, so go ahead;
maybe I can review your change.

>-- 
>Thanks,
>
>David / dhildenb

-- 
Wei Yang
Help you, Help me

^ permalink raw reply	[flat|nested] 9+ messages in thread

end of thread, other threads:[~2020-07-03  8:33 UTC | newest]

Thread overview: 9+ messages
2020-06-30  3:18 [PATCH] mm: define pte_add_end for consistency Wei Yang
2020-06-30 12:44 ` David Hildenbrand
2020-07-01  2:11   ` Wei Yang
2020-07-01  8:29     ` David Hildenbrand
2020-07-01 11:54       ` Wei Yang
2020-07-02 16:28         ` David Hildenbrand
2020-07-03  1:34           ` Wei Yang
2020-07-03  7:23             ` David Hildenbrand
2020-07-03  8:33               ` Wei Yang
