linux-kernel.vger.kernel.org archive mirror
* [PATCH v1 0/2] mm/page_alloc: memmap_init_zone() cleanups
@ 2020-01-13 14:40 David Hildenbrand
  2020-01-13 14:40 ` [PATCH v1 1/2] mm/page_alloc: fix and rework pfn handling in memmap_init_zone() David Hildenbrand
                   ` (2 more replies)
  0 siblings, 3 replies; 17+ messages in thread
From: David Hildenbrand @ 2020-01-13 14:40 UTC (permalink / raw)
  To: linux-kernel
  Cc: linux-mm, David Hildenbrand, Andrew Morton, Kirill A. Shutemov,
	Michal Hocko, Oscar Salvador, Pavel Tatashin

Two cleanups for "[PATCH] mm/page_alloc: Skip non present sections on zone
initialization" [1], whereby one cleanup seems to also be a fix for a
(theoretical?) kernelcore=mirror case - unless I am messing something up :)

[1] https://lkml.kernel.org/r/20191230093828.24613-1-kirill.shutemov@linux.intel.com

David Hildenbrand (2):
  mm/page_alloc: fix and rework pfn handling in memmap_init_zone()
  mm: factor out next_present_section_nr()

 include/linux/mmzone.h | 10 ++++++++++
 mm/page_alloc.c        | 20 ++++++++------------
 mm/sparse.c            | 10 ----------
 3 files changed, 18 insertions(+), 22 deletions(-)

-- 
2.24.1


* [PATCH v1 1/2] mm/page_alloc: fix and rework pfn handling in memmap_init_zone()
  2020-01-13 14:40 [PATCH v1 0/2] mm/page_alloc: memmap_init_zone() cleanups David Hildenbrand
@ 2020-01-13 14:40 ` David Hildenbrand
  2020-02-03 21:35   ` Alexander Duyck
  2020-01-13 14:40 ` [PATCH v1 2/2] mm: factor out next_present_section_nr() David Hildenbrand
  2020-01-31  4:30 ` [PATCH v1 0/2] mm/page_alloc: memmap_init_zone() cleanups Andrew Morton
  2 siblings, 1 reply; 17+ messages in thread
From: David Hildenbrand @ 2020-01-13 14:40 UTC (permalink / raw)
  To: linux-kernel
  Cc: linux-mm, David Hildenbrand, Pavel Tatashin, Andrew Morton,
	Michal Hocko, Oscar Salvador, Kirill A . Shutemov

Let's update the pfn manually whenever we continue the loop. This makes
the code easier to read but also less error prone (and we can directly
fix one issue).

When overlap_memmap_init() returns true, pfn is updated to
"memblock_region_memory_end_pfn(r)". So it already points at the *next*
pfn to process. Incrementing the pfn another time is wrong, we might
leave one uninitialized. I spotted this by inspecting the code, so I have
no idea if this is relevant in practice (with kernelcore=mirror).
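
To illustrate (a minimal sketch of the old control flow, not the actual
kernel code):

	for (pfn = start_pfn; pfn < end_pfn; pfn++) {
		...
		if (overlap_memmap_init(zone, &pfn))
			/* pfn was already advanced to the next pfn to
			 * process, but "continue" runs pfn++ once more,
			 * so one pfn is left uninitialized */
			continue;
		...
	}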

Fixes: a9a9e77fbf27 ("mm: move mirrored memory specific code outside of memmap_init_zone")
Cc: Pavel Tatashin <pasha.tatashin@oracle.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Oscar Salvador <osalvador@suse.de>
Cc: Kirill A. Shutemov <kirill@shutemov.name>
Signed-off-by: David Hildenbrand <david@redhat.com>
---
 mm/page_alloc.c | 9 ++++++---
 1 file changed, 6 insertions(+), 3 deletions(-)

diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index a41bd7341de1..a92791512077 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -5905,18 +5905,20 @@ void __meminit memmap_init_zone(unsigned long size, int nid, unsigned long zone,
 	}
 #endif
 
-	for (pfn = start_pfn; pfn < end_pfn; pfn++) {
+	for (pfn = start_pfn; pfn < end_pfn; ) {
 		/*
 		 * There can be holes in boot-time mem_map[]s handed to this
 		 * function.  They do not exist on hotplugged memory.
 		 */
 		if (context == MEMMAP_EARLY) {
 			if (!early_pfn_valid(pfn)) {
-				pfn = next_pfn(pfn) - 1;
+				pfn = next_pfn(pfn);
 				continue;
 			}
-			if (!early_pfn_in_nid(pfn, nid))
+			if (!early_pfn_in_nid(pfn, nid)) {
+				pfn++;
 				continue;
+			}
 			if (overlap_memmap_init(zone, &pfn))
 				continue;
 			if (defer_init(nid, pfn, end_pfn))
@@ -5944,6 +5946,7 @@ void __meminit memmap_init_zone(unsigned long size, int nid, unsigned long zone,
 			set_pageblock_migratetype(page, MIGRATE_MOVABLE);
 			cond_resched();
 		}
+		pfn++;
 	}
 }
 
-- 
2.24.1


* [PATCH v1 2/2] mm: factor out next_present_section_nr()
  2020-01-13 14:40 [PATCH v1 0/2] mm/page_alloc: memmap_init_zone() cleanups David Hildenbrand
  2020-01-13 14:40 ` [PATCH v1 1/2] mm/page_alloc: fix and rework pfn handling in memmap_init_zone() David Hildenbrand
@ 2020-01-13 14:40 ` David Hildenbrand
  2020-01-13 22:41   ` Kirill A. Shutemov
  2020-01-31  4:30 ` [PATCH v1 0/2] mm/page_alloc: memmap_init_zone() cleanups Andrew Morton
  2 siblings, 1 reply; 17+ messages in thread
From: David Hildenbrand @ 2020-01-13 14:40 UTC (permalink / raw)
  To: linux-kernel
  Cc: linux-mm, David Hildenbrand, Andrew Morton, Michal Hocko,
	Oscar Salvador, Kirill A . Shutemov

Let's move it to the header and use the shorter variant from
mm/page_alloc.c (the original one will also check
"__highest_present_section_nr + 1", which is not necessary). While at it,
make the section_nr in next_pfn() const.

In next_pfn(), we now return section_nr_to_pfn(-1) instead of -1 once
we exceed __highest_present_section_nr, which doesn't make a difference in
the caller as it is big enough (>= all sane end_pfn).
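
For illustration (a sketch, assuming 64-bit unsigned long and the x86-64
value PFN_SECTION_SHIFT == 15):

	/* section_nr_to_pfn(nr) is simply (nr) << PFN_SECTION_SHIFT */
	unsigned long pfn = section_nr_to_pfn(-1UL);
	/* pfn == 0xffffffffffff8000, well above any sane end_pfn, so
	 * the caller's "pfn < end_pfn" check still terminates the loop */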

Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Oscar Salvador <osalvador@suse.de>
Cc: Kirill A. Shutemov <kirill@shutemov.name>
Signed-off-by: David Hildenbrand <david@redhat.com>
---
 include/linux/mmzone.h | 10 ++++++++++
 mm/page_alloc.c        | 11 ++---------
 mm/sparse.c            | 10 ----------
 3 files changed, 12 insertions(+), 19 deletions(-)

diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h
index c2bc309d1634..462f6873905a 100644
--- a/include/linux/mmzone.h
+++ b/include/linux/mmzone.h
@@ -1379,6 +1379,16 @@ static inline int pfn_present(unsigned long pfn)
 	return present_section(__nr_to_section(pfn_to_section_nr(pfn)));
 }
 
+static inline unsigned long next_present_section_nr(unsigned long section_nr)
+{
+	while (++section_nr <= __highest_present_section_nr) {
+		if (present_section_nr(section_nr))
+			return section_nr;
+	}
+
+	return -1;
+}
+
 /*
  * These are _only_ used during initialisation, therefore they
  * can use __initdata ...  They could have names to indicate
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index a92791512077..26e8044e9848 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -5852,18 +5852,11 @@ overlap_memmap_init(unsigned long zone, unsigned long *pfn)
 /* Skip PFNs that belong to non-present sections */
 static inline __meminit unsigned long next_pfn(unsigned long pfn)
 {
-	unsigned long section_nr;
+	const unsigned long section_nr = pfn_to_section_nr(++pfn);
 
-	section_nr = pfn_to_section_nr(++pfn);
 	if (present_section_nr(section_nr))
 		return pfn;
-
-	while (++section_nr <= __highest_present_section_nr) {
-		if (present_section_nr(section_nr))
-			return section_nr_to_pfn(section_nr);
-	}
-
-	return -1;
+	return section_nr_to_pfn(next_present_section_nr(section_nr));
 }
 #else
 static inline __meminit unsigned long next_pfn(unsigned long pfn)
diff --git a/mm/sparse.c b/mm/sparse.c
index 3822ecbd8a1f..ac4a2bfae514 100644
--- a/mm/sparse.c
+++ b/mm/sparse.c
@@ -198,16 +198,6 @@ static void section_mark_present(struct mem_section *ms)
 	ms->section_mem_map |= SECTION_MARKED_PRESENT;
 }
 
-static inline unsigned long next_present_section_nr(unsigned long section_nr)
-{
-	do {
-		section_nr++;
-		if (present_section_nr(section_nr))
-			return section_nr;
-	} while ((section_nr <= __highest_present_section_nr));
-
-	return -1;
-}
 #define for_each_present_section_nr(start, section_nr)		\
 	for (section_nr = next_present_section_nr(start-1);	\
 	     ((section_nr != -1) &&				\
-- 
2.24.1


* Re: [PATCH v1 2/2] mm: factor out next_present_section_nr()
  2020-01-13 14:40 ` [PATCH v1 2/2] mm: factor out next_present_section_nr() David Hildenbrand
@ 2020-01-13 22:41   ` Kirill A. Shutemov
  2020-01-13 22:57     ` David Hildenbrand
  0 siblings, 1 reply; 17+ messages in thread
From: Kirill A. Shutemov @ 2020-01-13 22:41 UTC (permalink / raw)
  To: David Hildenbrand
  Cc: linux-kernel, linux-mm, Andrew Morton, Michal Hocko, Oscar Salvador

On Mon, Jan 13, 2020 at 03:40:35PM +0100, David Hildenbrand wrote:
> Let's move it to the header and use the shorter variant from
> mm/page_alloc.c (the original one will also check
> "__highest_present_section_nr + 1", which is not necessary). While at it,
> make the section_nr in next_pfn() const.
> 
> In next_pfn(), we now return section_nr_to_pfn(-1) instead of -1 once
> we exceed __highest_present_section_nr, which doesn't make a difference in
> the caller as it is big enough (>= all sane end_pfn).
> 
> Cc: Andrew Morton <akpm@linux-foundation.org>
> Cc: Michal Hocko <mhocko@kernel.org>
> Cc: Oscar Salvador <osalvador@suse.de>
> Cc: Kirill A. Shutemov <kirill@shutemov.name>
> Signed-off-by: David Hildenbrand <david@redhat.com>
> ---
>  include/linux/mmzone.h | 10 ++++++++++
>  mm/page_alloc.c        | 11 ++---------
>  mm/sparse.c            | 10 ----------
>  3 files changed, 12 insertions(+), 19 deletions(-)
> 
> diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h
> index c2bc309d1634..462f6873905a 100644
> --- a/include/linux/mmzone.h
> +++ b/include/linux/mmzone.h
> @@ -1379,6 +1379,16 @@ static inline int pfn_present(unsigned long pfn)
>  	return present_section(__nr_to_section(pfn_to_section_nr(pfn)));
>  }
>  
> +static inline unsigned long next_present_section_nr(unsigned long section_nr)
> +{
> +	while (++section_nr <= __highest_present_section_nr) {
> +		if (present_section_nr(section_nr))
> +			return section_nr;
> +	}
> +
> +	return -1;
> +}
> +
>  /*
>   * These are _only_ used during initialisation, therefore they
>   * can use __initdata ...  They could have names to indicate
> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> index a92791512077..26e8044e9848 100644
> --- a/mm/page_alloc.c
> +++ b/mm/page_alloc.c
> @@ -5852,18 +5852,11 @@ overlap_memmap_init(unsigned long zone, unsigned long *pfn)
>  /* Skip PFNs that belong to non-present sections */
>  static inline __meminit unsigned long next_pfn(unsigned long pfn)
>  {
> -	unsigned long section_nr;
> +	const unsigned long section_nr = pfn_to_section_nr(++pfn);
>  
> -	section_nr = pfn_to_section_nr(++pfn);
>  	if (present_section_nr(section_nr))
>  		return pfn;
> -
> -	while (++section_nr <= __highest_present_section_nr) {
> -		if (present_section_nr(section_nr))
> -			return section_nr_to_pfn(section_nr);
> -	}
> -
> -	return -1;
> +	return section_nr_to_pfn(next_present_section_nr(section_nr));

This changes behaviour in the corner case: if next_present_section_nr()
returns -1, we call section_nr_to_pfn() for it. It's unlikely to give
any valid pfn, but I can't say for sure for all archs. I guess the worst
case scenario would be an endless loop over the same sections/pfns.

Have you considered the case?

-- 
 Kirill A. Shutemov

* Re: [PATCH v1 2/2] mm: factor out next_present_section_nr()
  2020-01-13 22:41   ` Kirill A. Shutemov
@ 2020-01-13 22:57     ` David Hildenbrand
  2020-01-13 23:02       ` David Hildenbrand
  0 siblings, 1 reply; 17+ messages in thread
From: David Hildenbrand @ 2020-01-13 22:57 UTC (permalink / raw)
  To: Kirill A. Shutemov
  Cc: David Hildenbrand, linux-kernel, linux-mm, Andrew Morton,
	Michal Hocko, Oscar Salvador



> On 13.01.2020 at 23:41, Kirill A. Shutemov <kirill@shutemov.name> wrote:
> 
> On Mon, Jan 13, 2020 at 03:40:35PM +0100, David Hildenbrand wrote:
>> Let's move it to the header and use the shorter variant from
>> mm/page_alloc.c (the original one will also check
>> "__highest_present_section_nr + 1", which is not necessary). While at it,
>> make the section_nr in next_pfn() const.
>> 
>> In next_pfn(), we now return section_nr_to_pfn(-1) instead of -1 once
>> we exceed __highest_present_section_nr, which doesn't make a difference in
>> the caller as it is big enough (>= all sane end_pfn).
>> 
>> Cc: Andrew Morton <akpm@linux-foundation.org>
>> Cc: Michal Hocko <mhocko@kernel.org>
>> Cc: Oscar Salvador <osalvador@suse.de>
>> Cc: Kirill A. Shutemov <kirill@shutemov.name>
>> Signed-off-by: David Hildenbrand <david@redhat.com>
>> ---
>> include/linux/mmzone.h | 10 ++++++++++
>> mm/page_alloc.c        | 11 ++---------
>> mm/sparse.c            | 10 ----------
>> 3 files changed, 12 insertions(+), 19 deletions(-)
>> 
>> diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h
>> index c2bc309d1634..462f6873905a 100644
>> --- a/include/linux/mmzone.h
>> +++ b/include/linux/mmzone.h
>> @@ -1379,6 +1379,16 @@ static inline int pfn_present(unsigned long pfn)
>>    return present_section(__nr_to_section(pfn_to_section_nr(pfn)));
>> }
>> 
>> +static inline unsigned long next_present_section_nr(unsigned long section_nr)
>> +{
>> +    while (++section_nr <= __highest_present_section_nr) {
>> +        if (present_section_nr(section_nr))
>> +            return section_nr;
>> +    }
>> +
>> +    return -1;
>> +}
>> +
>> /*
>>  * These are _only_ used during initialisation, therefore they
>>  * can use __initdata ...  They could have names to indicate
>> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
>> index a92791512077..26e8044e9848 100644
>> --- a/mm/page_alloc.c
>> +++ b/mm/page_alloc.c
>> @@ -5852,18 +5852,11 @@ overlap_memmap_init(unsigned long zone, unsigned long *pfn)
>> /* Skip PFNs that belong to non-present sections */
>> static inline __meminit unsigned long next_pfn(unsigned long pfn)
>> {
>> -    unsigned long section_nr;
>> +    const unsigned long section_nr = pfn_to_section_nr(++pfn);
>> 
>> -    section_nr = pfn_to_section_nr(++pfn);
>>    if (present_section_nr(section_nr))
>>        return pfn;
>> -
>> -    while (++section_nr <= __highest_present_section_nr) {
>> -        if (present_section_nr(section_nr))
>> -            return section_nr_to_pfn(section_nr);
>> -    }
>> -
>> -    return -1;
>> +    return section_nr_to_pfn(next_present_section_nr(section_nr));
> 
> This changes behaviour in the corner case: if next_present_section_nr()
> returns -1, we call section_nr_to_pfn() for it. It's unlikely to give
> any valid pfn, but I can't say for sure for all archs. I guess the worst
> case scenario would be an endless loop over the same sections/pfns.
> 
> Have you considered the case?

Yes, see the patch description. We return -1 << PFN_SECTION_SHIFT, so a number close to the end of the address space (0xfff...000). (Will double check tomorrow if any 32bit arch could be problematic here)

Thanks!

> 
> -- 
> Kirill A. Shutemov
> 

* Re: [PATCH v1 2/2] mm: factor out next_present_section_nr()
  2020-01-13 22:57     ` David Hildenbrand
@ 2020-01-13 23:02       ` David Hildenbrand
  2020-01-14 10:41         ` Kirill A. Shutemov
  0 siblings, 1 reply; 17+ messages in thread
From: David Hildenbrand @ 2020-01-13 23:02 UTC (permalink / raw)
  To: Kirill A. Shutemov
  Cc: David Hildenbrand, linux-kernel, linux-mm, Andrew Morton,
	Michal Hocko, Oscar Salvador



> On 13.01.2020 at 23:57, David Hildenbrand <dhildenb@redhat.com> wrote:
> 
> 
> 
>>> On 13.01.2020 at 23:41, Kirill A. Shutemov <kirill@shutemov.name> wrote:
>>> 
>>> On Mon, Jan 13, 2020 at 03:40:35PM +0100, David Hildenbrand wrote:
>>> Let's move it to the header and use the shorter variant from
>>> mm/page_alloc.c (the original one will also check
>>> "__highest_present_section_nr + 1", which is not necessary). While at it,
>>> make the section_nr in next_pfn() const.
>>> 
>>> In next_pfn(), we now return section_nr_to_pfn(-1) instead of -1 once
>>> we exceed __highest_present_section_nr, which doesn't make a difference in
>>> the caller as it is big enough (>= all sane end_pfn).
>>> 
>>> Cc: Andrew Morton <akpm@linux-foundation.org>
>>> Cc: Michal Hocko <mhocko@kernel.org>
>>> Cc: Oscar Salvador <osalvador@suse.de>
>>> Cc: Kirill A. Shutemov <kirill@shutemov.name>
>>> Signed-off-by: David Hildenbrand <david@redhat.com>
>>> ---
>>> include/linux/mmzone.h | 10 ++++++++++
>>> mm/page_alloc.c        | 11 ++---------
>>> mm/sparse.c            | 10 ----------
>>> 3 files changed, 12 insertions(+), 19 deletions(-)
>>> 
>>> diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h
>>> index c2bc309d1634..462f6873905a 100644
>>> --- a/include/linux/mmzone.h
>>> +++ b/include/linux/mmzone.h
>>> @@ -1379,6 +1379,16 @@ static inline int pfn_present(unsigned long pfn)
>>>   return present_section(__nr_to_section(pfn_to_section_nr(pfn)));
>>> }
>>> 
>>> +static inline unsigned long next_present_section_nr(unsigned long section_nr)
>>> +{
>>> +    while (++section_nr <= __highest_present_section_nr) {
>>> +        if (present_section_nr(section_nr))
>>> +            return section_nr;
>>> +    }
>>> +
>>> +    return -1;
>>> +}
>>> +
>>> /*
>>> * These are _only_ used during initialisation, therefore they
>>> * can use __initdata ...  They could have names to indicate
>>> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
>>> index a92791512077..26e8044e9848 100644
>>> --- a/mm/page_alloc.c
>>> +++ b/mm/page_alloc.c
>>> @@ -5852,18 +5852,11 @@ overlap_memmap_init(unsigned long zone, unsigned long *pfn)
>>> /* Skip PFNs that belong to non-present sections */
>>> static inline __meminit unsigned long next_pfn(unsigned long pfn)
>>> {
>>> -    unsigned long section_nr;
>>> +    const unsigned long section_nr = pfn_to_section_nr(++pfn);
>>> 
>>> -    section_nr = pfn_to_section_nr(++pfn);
>>>   if (present_section_nr(section_nr))
>>>       return pfn;
>>> -
>>> -    while (++section_nr <= __highest_present_section_nr) {
>>> -        if (present_section_nr(section_nr))
>>> -            return section_nr_to_pfn(section_nr);
>>> -    }
>>> -
>>> -    return -1;
>>> +    return section_nr_to_pfn(next_present_section_nr(section_nr));
>> 
>> This changes behaviour in the corner case: if next_present_section_nr()
>> returns -1, we call section_nr_to_pfn() for it. It's unlikely to give
>> any valid pfn, but I can't say for sure for all archs. I guess the worst
>> case scenario would be an endless loop over the same sections/pfns.
>> 
>> Have you considered the case?
> 
> Yes, see the patch description. We return -1 << PFN_SECTION_SHIFT, so a number close to the end of the address space (0xfff...000). (Will double check tomorrow if any 32bit arch could be problematic here)

... but thinking again, 0xfff... is certainly an invalid PFN, so this should work just fine.

(biggest possible pfn is -1 >> PFN_SHIFT)

But it's late in Germany, will double check tomorrow :)

* Re: [PATCH v1 2/2] mm: factor out next_present_section_nr()
  2020-01-13 23:02       ` David Hildenbrand
@ 2020-01-14 10:41         ` Kirill A. Shutemov
  2020-01-14 10:49           ` David Hildenbrand
  0 siblings, 1 reply; 17+ messages in thread
From: Kirill A. Shutemov @ 2020-01-14 10:41 UTC (permalink / raw)
  To: David Hildenbrand
  Cc: linux-kernel, linux-mm, Andrew Morton, Michal Hocko, Oscar Salvador

On Tue, Jan 14, 2020 at 12:02:00AM +0100, David Hildenbrand wrote:
> 
> 
> > On 13.01.2020 at 23:57, David Hildenbrand <dhildenb@redhat.com> wrote:
> > 
> > 
> > 
> >>> On 13.01.2020 at 23:41, Kirill A. Shutemov <kirill@shutemov.name> wrote:
> >>> 
> >>> On Mon, Jan 13, 2020 at 03:40:35PM +0100, David Hildenbrand wrote:
> >>> Let's move it to the header and use the shorter variant from
> >>> mm/page_alloc.c (the original one will also check
> >>> "__highest_present_section_nr + 1", which is not necessary). While at it,
> >>> make the section_nr in next_pfn() const.
> >>> 
> >>> In next_pfn(), we now return section_nr_to_pfn(-1) instead of -1 once
> >>> we exceed __highest_present_section_nr, which doesn't make a difference in
> >>> the caller as it is big enough (>= all sane end_pfn).
> >>> 
> >>> Cc: Andrew Morton <akpm@linux-foundation.org>
> >>> Cc: Michal Hocko <mhocko@kernel.org>
> >>> Cc: Oscar Salvador <osalvador@suse.de>
> >>> Cc: Kirill A. Shutemov <kirill@shutemov.name>
> >>> Signed-off-by: David Hildenbrand <david@redhat.com>
> >>> ---
> >>> include/linux/mmzone.h | 10 ++++++++++
> >>> mm/page_alloc.c        | 11 ++---------
> >>> mm/sparse.c            | 10 ----------
> >>> 3 files changed, 12 insertions(+), 19 deletions(-)
> >>> 
> >>> diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h
> >>> index c2bc309d1634..462f6873905a 100644
> >>> --- a/include/linux/mmzone.h
> >>> +++ b/include/linux/mmzone.h
> >>> @@ -1379,6 +1379,16 @@ static inline int pfn_present(unsigned long pfn)
> >>>   return present_section(__nr_to_section(pfn_to_section_nr(pfn)));
> >>> }
> >>> 
> >>> +static inline unsigned long next_present_section_nr(unsigned long section_nr)
> >>> +{
> >>> +    while (++section_nr <= __highest_present_section_nr) {
> >>> +        if (present_section_nr(section_nr))
> >>> +            return section_nr;
> >>> +    }
> >>> +
> >>> +    return -1;
> >>> +}
> >>> +
> >>> /*
> >>> * These are _only_ used during initialisation, therefore they
> >>> * can use __initdata ...  They could have names to indicate
> >>> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> >>> index a92791512077..26e8044e9848 100644
> >>> --- a/mm/page_alloc.c
> >>> +++ b/mm/page_alloc.c
> >>> @@ -5852,18 +5852,11 @@ overlap_memmap_init(unsigned long zone, unsigned long *pfn)
> >>> /* Skip PFNs that belong to non-present sections */
> >>> static inline __meminit unsigned long next_pfn(unsigned long pfn)
> >>> {
> >>> -    unsigned long section_nr;
> >>> +    const unsigned long section_nr = pfn_to_section_nr(++pfn);
> >>> 
> >>> -    section_nr = pfn_to_section_nr(++pfn);
> >>>   if (present_section_nr(section_nr))
> >>>       return pfn;
> >>> -
> >>> -    while (++section_nr <= __highest_present_section_nr) {
> >>> -        if (present_section_nr(section_nr))
> >>> -            return section_nr_to_pfn(section_nr);
> >>> -    }
> >>> -
> >>> -    return -1;
> >>> +    return section_nr_to_pfn(next_present_section_nr(section_nr));
> >> 
> >> This changes behaviour in the corner case: if next_present_section_nr()
> >> returns -1, we call section_nr_to_pfn() for it. It's unlikely to give
> >> any valid pfn, but I can't say for sure for all archs. I guess the worst
> >> case scenario would be an endless loop over the same sections/pfns.
> >> 
> >> Have you considered the case?
> > 
> > Yes, see the patch description. We return -1 << PFN_SECTION_SHIFT, so a number close to the end of the address space (0xfff...000). (Will double check tomorrow if any 32bit arch could be problematic here)
> 
> ... but thinking again, 0xfff... is certainly an invalid PFN, so this should work just fine.
> 
> (biggest possible pfn is -1 >> PFN_SHIFT)
> 
> But it's late in Germany, will double check tomorrow :)

If the end_pfn happens to be more than -1UL << PFN_SECTION_SHIFT we are
screwed: the pfn is invalid, next_present_section_nr() returns -1, the
next iteration is on the same pfn and we have an endless loop.

The question is whether we can prove end_pfn is always less than
-1UL << PFN_SECTION_SHIFT in any configuration of any arch.

It is not obvious to me.
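
Spelled out (a hypothetical sketch of this corner case):

	/* pfn lies beyond the last present section, so next_pfn()
	 * returns section_nr_to_pfn(-1) */
	pfn = next_pfn(pfn);
	/* if end_pfn > (-1UL << PFN_SECTION_SHIFT), "pfn < end_pfn"
	 * still holds and every following iteration computes the very
	 * same pfn again - an endless loop */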

-- 
 Kirill A. Shutemov

* Re: [PATCH v1 2/2] mm: factor out next_present_section_nr()
  2020-01-14 10:41         ` Kirill A. Shutemov
@ 2020-01-14 10:49           ` David Hildenbrand
  2020-01-14 15:52             ` Kirill A. Shutemov
  0 siblings, 1 reply; 17+ messages in thread
From: David Hildenbrand @ 2020-01-14 10:49 UTC (permalink / raw)
  To: Kirill A. Shutemov
  Cc: linux-kernel, linux-mm, Andrew Morton, Michal Hocko, Oscar Salvador

On 14.01.20 11:41, Kirill A. Shutemov wrote:
> On Tue, Jan 14, 2020 at 12:02:00AM +0100, David Hildenbrand wrote:
>>
>>
>>> On 13.01.2020 at 23:57, David Hildenbrand <dhildenb@redhat.com> wrote:
>>>
>>> 
>>>
>>>>> On 13.01.2020 at 23:41, Kirill A. Shutemov <kirill@shutemov.name> wrote:
>>>>>
>>>>> On Mon, Jan 13, 2020 at 03:40:35PM +0100, David Hildenbrand wrote:
>>>>> Let's move it to the header and use the shorter variant from
>>>>> mm/page_alloc.c (the original one will also check
>>>>> "__highest_present_section_nr + 1", which is not necessary). While at it,
>>>>> make the section_nr in next_pfn() const.
>>>>>
>>>>> In next_pfn(), we now return section_nr_to_pfn(-1) instead of -1 once
>>>>> we exceed __highest_present_section_nr, which doesn't make a difference in
>>>>> the caller as it is big enough (>= all sane end_pfn).
>>>>>
>>>>> Cc: Andrew Morton <akpm@linux-foundation.org>
>>>>> Cc: Michal Hocko <mhocko@kernel.org>
>>>>> Cc: Oscar Salvador <osalvador@suse.de>
>>>>> Cc: Kirill A. Shutemov <kirill@shutemov.name>
>>>>> Signed-off-by: David Hildenbrand <david@redhat.com>
>>>>> ---
>>>>> include/linux/mmzone.h | 10 ++++++++++
>>>>> mm/page_alloc.c        | 11 ++---------
>>>>> mm/sparse.c            | 10 ----------
>>>>> 3 files changed, 12 insertions(+), 19 deletions(-)
>>>>>
>>>>> diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h
>>>>> index c2bc309d1634..462f6873905a 100644
>>>>> --- a/include/linux/mmzone.h
>>>>> +++ b/include/linux/mmzone.h
>>>>> @@ -1379,6 +1379,16 @@ static inline int pfn_present(unsigned long pfn)
>>>>>   return present_section(__nr_to_section(pfn_to_section_nr(pfn)));
>>>>> }
>>>>>
>>>>> +static inline unsigned long next_present_section_nr(unsigned long section_nr)
>>>>> +{
>>>>> +    while (++section_nr <= __highest_present_section_nr) {
>>>>> +        if (present_section_nr(section_nr))
>>>>> +            return section_nr;
>>>>> +    }
>>>>> +
>>>>> +    return -1;
>>>>> +}
>>>>> +
>>>>> /*
>>>>> * These are _only_ used during initialisation, therefore they
>>>>> * can use __initdata ...  They could have names to indicate
>>>>> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
>>>>> index a92791512077..26e8044e9848 100644
>>>>> --- a/mm/page_alloc.c
>>>>> +++ b/mm/page_alloc.c
>>>>> @@ -5852,18 +5852,11 @@ overlap_memmap_init(unsigned long zone, unsigned long *pfn)
>>>>> /* Skip PFNs that belong to non-present sections */
>>>>> static inline __meminit unsigned long next_pfn(unsigned long pfn)
>>>>> {
>>>>> -    unsigned long section_nr;
>>>>> +    const unsigned long section_nr = pfn_to_section_nr(++pfn);
>>>>>
>>>>> -    section_nr = pfn_to_section_nr(++pfn);
>>>>>   if (present_section_nr(section_nr))
>>>>>       return pfn;
>>>>> -
>>>>> -    while (++section_nr <= __highest_present_section_nr) {
>>>>> -        if (present_section_nr(section_nr))
>>>>> -            return section_nr_to_pfn(section_nr);
>>>>> -    }
>>>>> -
>>>>> -    return -1;
>>>>> +    return section_nr_to_pfn(next_present_section_nr(section_nr));
>>>>
>>>> This changes behaviour in the corner case: if next_present_section_nr()
>>>> returns -1, we call section_nr_to_pfn() for it. It's unlikely to give
>>>> any valid pfn, but I can't say for sure for all archs. I guess the worst
>>>> case scenario would be an endless loop over the same sections/pfns.
>>>>
>>>> Have you considered the case?
>>>
>>> Yes, see the patch description. We return -1 << PFN_SECTION_SHIFT, so a number close to the end of the address space (0xfff...000). (Will double check tomorrow if any 32bit arch could be problematic here)
>>
>> ... but thinking again, 0xfff... is certainly an invalid PFN, so this should work just fine.
>>
>> (biggest possible pfn is -1 >> PFN_SHIFT)
>>
>> But it's late in Germany, will double check tomorrow :)
> 
> If the end_pfn happens to be more than -1UL << PFN_SECTION_SHIFT we are
> screwed: the pfn is invalid, next_present_section_nr() returns -1, the
> next iteration is on the same pfn and we have an endless loop.
> 
> The question is whether we can prove end_pfn is always less than
> -1UL << PFN_SECTION_SHIFT in any configuration of any arch.
> 
> It is not obvious to me.

memmap_init_zone() is called for a physical memory region: pfn + size
(nr_pages)

The highest possible PFN you can have is "-1(unsigned long) >>
PFN_SHIFT". So even if you would want to add the very last section, the
PFN would still be smaller than -1UL << PFN_SECTION_SHIFT.

-- 
Thanks,

David / dhildenb


* Re: [PATCH v1 2/2] mm: factor out next_present_section_nr()
  2020-01-14 10:49           ` David Hildenbrand
@ 2020-01-14 15:52             ` Kirill A. Shutemov
  2020-01-14 16:50               ` David Hildenbrand
  0 siblings, 1 reply; 17+ messages in thread
From: Kirill A. Shutemov @ 2020-01-14 15:52 UTC (permalink / raw)
  To: David Hildenbrand
  Cc: linux-kernel, linux-mm, Andrew Morton, Michal Hocko, Oscar Salvador

On Tue, Jan 14, 2020 at 11:49:19AM +0100, David Hildenbrand wrote:
> memmap_init_zone() is called for a physical memory region: pfn + size
> (nr_pages)
> 
> The highest possible PFN you can have is "-1(unsigned long) >>
> PFN_SHIFT". So even if you would want to add the very last section, the
> PFN would still be smaller than -1UL << PFN_SECTION_SHIFT.

PFN_SHIFT? I guess you mean PAGE_SHIFT.

Of course PFN can be more than -1UL >> PAGE_SHIFT. Like on 32-bit x86 with
PAE it is ((1ULL << 36) - 1) >> PAGE_SHIFT. That's the whole reason for
PAE.

The highest possible PFN must fit into phys_addr_t when shifted left by
PAGE_SHIFT and must fit into unsigned long. It can be -1UL if
phys_addr_t is 64-bit.

Any other limitation I miss?
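
For example (a sketch of the 32-bit x86 PAE numbers, assuming
PAGE_SHIFT == 12):

	/* 36-bit physical address space, 4 KiB pages */
	unsigned long max_pae_pfn = ((1ULL << 36) - 1) >> 12; /* 0xffffff */
	/* on 32-bit, -1UL >> 12 is only 0xfffff, so the valid PFN range
	 * is clearly not bounded by -1UL >> PAGE_SHIFT */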

-- 
 Kirill A. Shutemov

* Re: [PATCH v1 2/2] mm: factor out next_present_section_nr()
  2020-01-14 15:52             ` Kirill A. Shutemov
@ 2020-01-14 16:50               ` David Hildenbrand
  2020-01-14 16:52                 ` David Hildenbrand
  0 siblings, 1 reply; 17+ messages in thread
From: David Hildenbrand @ 2020-01-14 16:50 UTC (permalink / raw)
  To: Kirill A. Shutemov
  Cc: linux-kernel, linux-mm, Andrew Morton, Michal Hocko, Oscar Salvador

On 14.01.20 16:52, Kirill A. Shutemov wrote:
> On Tue, Jan 14, 2020 at 11:49:19AM +0100, David Hildenbrand wrote:
>> memmap_init_zone() is called for a physical memory region: pfn + size
>> (nr_pages)
>>
>> The highest possible PFN you can have is "-1(unsigned long) >>
>> PFN_SHIFT". So even if you would want to add the very last section, the
>> PFN would still be smaller than -1UL << PFN_SECTION_SHIFT.
> 
> PFN_SHIFT? I guess you mean PAGE_SHIFT.

Yes :)

> 
> Of course PFN can be more than -1UL >> PAGE_SHIFT. Like on 32-bit x86 with
> PAE it is ((1ULL << 36) - 1) >> PAGE_SHIFT. That's the whole reason for
> PAE.

You are right about PAE, but I think you agree that it is a special case.

> 
> The highest possible PFN must fit into phys_addr_t when shifted left by
> PAGE_SHIFT and must fit into unsigned long. It can be -1UL if
> phys_addr_t is 64-bit.
> 

Right, and for 32-bit, that would mean (assuming something like a 12-bit
PAGE_SHIFT) that with a PFN of -1 (0xffffffff) the biggest possible
address is 0xfffffffffff (44-bit). In that case, the existing code would
already break because "end_pfn" (which is actually +1, pointing after the
one to initialize) would overflow to 0 and you would have an endless loop
in memmap_init_zone().

Now, after this change you not only get an endless loop when trying to
init the very last PFN, but also when trying to init a PFN in the very
last section (section_nr == -1 - e.g., the last 128 MB).

I don't think there is any sane use case where you initialize something
partially in the last section, even with any hardware address extension
mechanism.

-- 
Thanks,

David / dhildenb


* Re: [PATCH v1 2/2] mm: factor out next_present_section_nr()
  2020-01-14 16:50               ` David Hildenbrand
@ 2020-01-14 16:52                 ` David Hildenbrand
  0 siblings, 0 replies; 17+ messages in thread
From: David Hildenbrand @ 2020-01-14 16:52 UTC (permalink / raw)
  To: Kirill A. Shutemov
  Cc: linux-kernel, linux-mm, Andrew Morton, Michal Hocko, Oscar Salvador

On 14.01.20 17:50, David Hildenbrand wrote:
> On 14.01.20 16:52, Kirill A. Shutemov wrote:
>> On Tue, Jan 14, 2020 at 11:49:19AM +0100, David Hildenbrand wrote:
>>> memmap_init_zone() is called for a physical memory region: pfn + size
>>> (nr_pages)
>>>
>>> The highest possible PFN you can have is "-1(unsigned long) >>
>>> PFN_SHIFT". So even if you would want to add the very last section, the
>>> PFN would still be smaller than -1UL << PFN_SECTION_SHIFT.
>>
>> PFN_SHIFT? I guess you mean PAGE_SHIFT.
> 
> Yes :)
> 
>>
>> Of course PFN can be more than -1UL >> PAGE_SHIFT. Like on 32-bit x86 with
>> PAE it is ((1ULL << 36) - 1) >> PAGE_SHIFT. That's the whole reason for
>> PAE.
> 
> You are right about PAE, but I think you agree that it is a special case.
> 
>>
>> The highest possible PFN must fit into phys_addr_t when shifted left by
>> PAGE_SHIFT and must fit into unsigned long. It can be -1UL if
>> phys_addr_t is 64-bit.
>>
> 
> Right, and for 32-bit, that would mean (assuming something like a 12-bit
> PAGE_SHIFT) that with a PFN of -1 (0xffffffff) the biggest possible
> address is 0xfffffffffff (44-bit). In that case, the existing code would
> already break because "end_pfn" (which is actually +1, pointing after the
> one to initialize) would overflow to 0 and you would have an endless loop
> in memmap_init_zone().

Correction: If end_pfn overflows to 0, you would get no loop iteration
at all.
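
In code terms (a sketch, assuming 32-bit unsigned long):

	unsigned long end_pfn = 0xffffffffUL + 1;	/* wraps to 0 */
	for (pfn = start_pfn; pfn < end_pfn; )	/* pfn < 0: always false */
		...;	/* the body is never entered */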

-- 
Thanks,

David / dhildenb


* Re: [PATCH v1 0/2] mm/page_alloc: memmap_init_zone() cleanups
  2020-01-13 14:40 [PATCH v1 0/2] mm/page_alloc: memmap_init_zone() cleanups David Hildenbrand
  2020-01-13 14:40 ` [PATCH v1 1/2] mm/page_alloc: fix and rework pfn handling in memmap_init_zone() David Hildenbrand
  2020-01-13 14:40 ` [PATCH v1 2/2] mm: factor out next_present_section_nr() David Hildenbrand
@ 2020-01-31  4:30 ` Andrew Morton
  2020-02-03 14:49   ` Kirill A. Shutemov
  2 siblings, 1 reply; 17+ messages in thread
From: Andrew Morton @ 2020-01-31  4:30 UTC (permalink / raw)
  To: David Hildenbrand
  Cc: linux-kernel, linux-mm, Kirill A. Shutemov, Michal Hocko,
	Oscar Salvador, Pavel Tatashin

On Mon, 13 Jan 2020 15:40:33 +0100 David Hildenbrand <david@redhat.com> wrote:

> Two cleanups for "[PATCH] mm/page_alloc: Skip non present sections on zone
> initialization" [1], whereby one cleanup seems to also be a fix for a
> (theoretical?) kernelcore=mirror case - unless I am messing something up :)
> 

I'm not seeing any acks or reviewed-by's on these two?

* Re: [PATCH v1 0/2] mm/page_alloc: memmap_init_zone() cleanups
  2020-01-31  4:30 ` [PATCH v1 0/2] mm/page_alloc: memmap_init_zone() cleanups Andrew Morton
@ 2020-02-03 14:49   ` Kirill A. Shutemov
  0 siblings, 0 replies; 17+ messages in thread
From: Kirill A. Shutemov @ 2020-02-03 14:49 UTC (permalink / raw)
  To: Andrew Morton
  Cc: David Hildenbrand, linux-kernel, linux-mm, Michal Hocko,
	Oscar Salvador, Pavel Tatashin

On Thu, Jan 30, 2020 at 08:30:59PM -0800, Andrew Morton wrote:
> On Mon, 13 Jan 2020 15:40:33 +0100 David Hildenbrand <david@redhat.com> wrote:
> 
> > Two cleanups for "[PATCH] mm/page_alloc: Skip non present sections on zone
> > initialization" [1], whereby one cleanup seems to also be a fix for a
> > (theoretical?) kernelcore=mirror case - unless I am messing something up :)
> > 
> 
> I'm not seeing any acks or reviewed-by's on these two?

You can use mine:

Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>

-- 
 Kirill A. Shutemov

* Re: [PATCH v1 1/2] mm/page_alloc: fix and rework pfn handling in memmap_init_zone()
  2020-01-13 14:40 ` [PATCH v1 1/2] mm/page_alloc: fix and rework pfn handling in memmap_init_zone() David Hildenbrand
@ 2020-02-03 21:35   ` Alexander Duyck
  2020-02-03 21:44     ` David Hildenbrand
  0 siblings, 1 reply; 17+ messages in thread
From: Alexander Duyck @ 2020-02-03 21:35 UTC (permalink / raw)
  To: David Hildenbrand
  Cc: LKML, linux-mm, Pavel Tatashin, Andrew Morton, Michal Hocko,
	Oscar Salvador, Kirill A . Shutemov

On Mon, Jan 13, 2020 at 6:40 AM David Hildenbrand <david@redhat.com> wrote:
>
> Let's update the pfn manually whenever we continue the loop. This makes
> the code easier to read but also less error prone (and we can directly
> fix one issue).
>
> When overlap_memmap_init() returns true, pfn is updated to
> "memblock_region_memory_end_pfn(r)". So it already points at the *next*
> pfn to process. Incrementing the pfn another time is wrong, we might
> leave one uninitialized. I spotted this by inspecting the code, so I have
> no idea if this is relevant in practice (with kernelcore=mirror).
>
> Fixes: a9a9e77fbf27 ("mm: move mirrored memory specific code outside of memmap_init_zone")
> Cc: Pavel Tatashin <pasha.tatashin@oracle.com>
> Cc: Andrew Morton <akpm@linux-foundation.org>
> Cc: Michal Hocko <mhocko@kernel.org>
> Cc: Oscar Salvador <osalvador@suse.de>
> Cc: Kirill A. Shutemov <kirill@shutemov.name>
> Signed-off-by: David Hildenbrand <david@redhat.com>
> ---
>  mm/page_alloc.c | 9 ++++++---
>  1 file changed, 6 insertions(+), 3 deletions(-)
>
> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> index a41bd7341de1..a92791512077 100644
> --- a/mm/page_alloc.c
> +++ b/mm/page_alloc.c
> @@ -5905,18 +5905,20 @@ void __meminit memmap_init_zone(unsigned long size, int nid, unsigned long zone,
>         }
>  #endif
>
> -       for (pfn = start_pfn; pfn < end_pfn; pfn++) {
> +       for (pfn = start_pfn; pfn < end_pfn; ) {
>                 /*
>                  * There can be holes in boot-time mem_map[]s handed to this
>                  * function.  They do not exist on hotplugged memory.
>                  */
>                 if (context == MEMMAP_EARLY) {
>                         if (!early_pfn_valid(pfn)) {
> -                               pfn = next_pfn(pfn) - 1;
> +                               pfn = next_pfn(pfn);
>                                 continue;
>                         }
> -                       if (!early_pfn_in_nid(pfn, nid))
> +                       if (!early_pfn_in_nid(pfn, nid)) {
> +                               pfn++;
>                                 continue;
> +                       }
>                         if (overlap_memmap_init(zone, &pfn))
>                                 continue;
>                         if (defer_init(nid, pfn, end_pfn))

I'm pretty sure this is a bit broken. overlap_memmap_init() is going
to return memblock_region_memory_end_pfn() instead of the start of the
next region. I think that is going to stick you in a mirrored region
without advancing in that case. You would also need to have that case
do a pfn++ before the continue;

> @@ -5944,6 +5946,7 @@ void __meminit memmap_init_zone(unsigned long size, int nid, unsigned long zone,
>                         set_pageblock_migratetype(page, MIGRATE_MOVABLE);
>                         cond_resched();
>                 }
> +               pfn++;
>         }
>  }
>
> --
> 2.24.1
>
>

* Re: [PATCH v1 1/2] mm/page_alloc: fix and rework pfn handling in memmap_init_zone()
  2020-02-03 21:35   ` Alexander Duyck
@ 2020-02-03 21:44     ` David Hildenbrand
  2020-02-03 23:17       ` Alexander Duyck
  0 siblings, 1 reply; 17+ messages in thread
From: David Hildenbrand @ 2020-02-03 21:44 UTC (permalink / raw)
  To: Alexander Duyck
  Cc: David Hildenbrand, LKML, linux-mm, Pavel Tatashin, Andrew Morton,
	Michal Hocko, Oscar Salvador, Kirill A . Shutemov



> On 03.02.2020 at 22:35, Alexander Duyck <alexander.duyck@gmail.com> wrote:
> 
> On Mon, Jan 13, 2020 at 6:40 AM David Hildenbrand <david@redhat.com> wrote:
>> 
>> Let's update the pfn manually whenever we continue the loop. This makes
>> the code easier to read but also less error prone (and we can directly
>> fix one issue).
>> 
>> When overlap_memmap_init() returns true, pfn is updated to
>> "memblock_region_memory_end_pfn(r)". So it already points at the *next*
>> pfn to process. Incrementing the pfn another time is wrong, we might
>> leave one uninitialized. I spotted this by inspecting the code, so I have
>> no idea if this is relevant in practice (with kernelcore=mirror).
>> 
>> Fixes: a9a9e77fbf27 ("mm: move mirrored memory specific code outside of memmap_init_zone")
>> Cc: Pavel Tatashin <pasha.tatashin@oracle.com>
>> Cc: Andrew Morton <akpm@linux-foundation.org>
>> Cc: Michal Hocko <mhocko@kernel.org>
>> Cc: Oscar Salvador <osalvador@suse.de>
>> Cc: Kirill A. Shutemov <kirill@shutemov.name>
>> Signed-off-by: David Hildenbrand <david@redhat.com>
>> ---
>> mm/page_alloc.c | 9 ++++++---
>> 1 file changed, 6 insertions(+), 3 deletions(-)
>> 
>> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
>> index a41bd7341de1..a92791512077 100644
>> --- a/mm/page_alloc.c
>> +++ b/mm/page_alloc.c
>> @@ -5905,18 +5905,20 @@ void __meminit memmap_init_zone(unsigned long size, int nid, unsigned long zone,
>>        }
>> #endif
>> 
>> -       for (pfn = start_pfn; pfn < end_pfn; pfn++) {
>> +       for (pfn = start_pfn; pfn < end_pfn; ) {
>>                /*
>>                 * There can be holes in boot-time mem_map[]s handed to this
>>                 * function.  They do not exist on hotplugged memory.
>>                 */
>>                if (context == MEMMAP_EARLY) {
>>                        if (!early_pfn_valid(pfn)) {
>> -                               pfn = next_pfn(pfn) - 1;
>> +                               pfn = next_pfn(pfn);
>>                                continue;
>>                        }
>> -                       if (!early_pfn_in_nid(pfn, nid))
>> +                       if (!early_pfn_in_nid(pfn, nid)) {
>> +                               pfn++;
>>                                continue;
>> +                       }
>>                        if (overlap_memmap_init(zone, &pfn))
>>                                continue;
>>                        if (defer_init(nid, pfn, end_pfn))
> 
> I'm pretty sure this is a bit broken. overlap_memmap_init() is going
> to return memblock_region_memory_end_pfn() instead of the start of the
> next region. I think that is going to stick you in a mirrored region
> without advancing in that case. You would also need to have that case
> do a pfn++ before the continue;

Thanks for having a look.

Did you read the description regarding this change?

* Re: [PATCH v1 1/2] mm/page_alloc: fix and rework pfn handling in memmap_init_zone()
  2020-02-03 21:44     ` David Hildenbrand
@ 2020-02-03 23:17       ` Alexander Duyck
  2020-02-04  8:40         ` David Hildenbrand
  0 siblings, 1 reply; 17+ messages in thread
From: Alexander Duyck @ 2020-02-03 23:17 UTC (permalink / raw)
  To: David Hildenbrand
  Cc: LKML, linux-mm, Pavel Tatashin, Andrew Morton, Michal Hocko,
	Oscar Salvador, Kirill A . Shutemov

On Mon, Feb 3, 2020 at 1:44 PM David Hildenbrand <david@redhat.com> wrote:
>
>
>
> > On 03.02.2020 at 22:35, Alexander Duyck <alexander.duyck@gmail.com> wrote:
> >
> > On Mon, Jan 13, 2020 at 6:40 AM David Hildenbrand <david@redhat.com> wrote:
> >>
> >> Let's update the pfn manually whenever we continue the loop. This makes
> >> the code easier to read but also less error prone (and we can directly
> >> fix one issue).
> >>
> >> When overlap_memmap_init() returns true, pfn is updated to
> >> "memblock_region_memory_end_pfn(r)". So it already points at the *next*
> >> pfn to process. Incrementing the pfn another time is wrong, we might
> >> leave one uninitialized. I spotted this by inspecting the code, so I have
> >> no idea if this is relevant in practice (with kernelcore=mirror).
> >>
> >> Fixes: a9a9e77fbf27 ("mm: move mirrored memory specific code outside of memmap_init_zone")
> >> Cc: Pavel Tatashin <pasha.tatashin@oracle.com>
> >> Cc: Andrew Morton <akpm@linux-foundation.org>
> >> Cc: Michal Hocko <mhocko@kernel.org>
> >> Cc: Oscar Salvador <osalvador@suse.de>
> >> Cc: Kirill A. Shutemov <kirill@shutemov.name>
> >> Signed-off-by: David Hildenbrand <david@redhat.com>
> >> ---
> >> mm/page_alloc.c | 9 ++++++---
> >> 1 file changed, 6 insertions(+), 3 deletions(-)
> >>
> >> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> >> index a41bd7341de1..a92791512077 100644
> >> --- a/mm/page_alloc.c
> >> +++ b/mm/page_alloc.c
> >> @@ -5905,18 +5905,20 @@ void __meminit memmap_init_zone(unsigned long size, int nid, unsigned long zone,
> >>        }
> >> #endif
> >>
> >> -       for (pfn = start_pfn; pfn < end_pfn; pfn++) {
> >> +       for (pfn = start_pfn; pfn < end_pfn; ) {
> >>                /*
> >>                 * There can be holes in boot-time mem_map[]s handed to this
> >>                 * function.  They do not exist on hotplugged memory.
> >>                 */
> >>                if (context == MEMMAP_EARLY) {
> >>                        if (!early_pfn_valid(pfn)) {
> >> -                               pfn = next_pfn(pfn) - 1;
> >> +                               pfn = next_pfn(pfn);
> >>                                continue;
> >>                        }
> >> -                       if (!early_pfn_in_nid(pfn, nid))
> >> +                       if (!early_pfn_in_nid(pfn, nid)) {
> >> +                               pfn++;
> >>                                continue;
> >> +                       }
> >>                        if (overlap_memmap_init(zone, &pfn))
> >>                                continue;
> >>                        if (defer_init(nid, pfn, end_pfn))
> >
> > I'm pretty sure this is a bit broken. overlap_memmap_init() is going
> > to return memblock_region_memory_end_pfn() instead of the start of the
> > next region. I think that is going to stick you in a mirrored region
> > without advancing in that case. You would also need to have that case
> > do a pfn++ before the continue;
>
> Thanks for having a look.
>
> Did you read the description regarding this change?

Actually I hadn't read it all that closely, so my bad on that. The
part that had caught my attention though was that
memblock_region_memory_end_pfn() is using PFN_DOWN to identify the end
of the memory region. Given that we probably shouldn't be messing with
the PFNs that may contain any of this memory, it might make more sense
to use memblock_region_reserved_end_pfn(), which uses PFN_UP, so that we
exclude all memory that is in the mirrored region just in case
something doesn't end on a page-aligned boundary.

If we know that the mirrored region is always going to be page-size
aligned, then I guess you are good to go. That was the only thing I
wasn't sure about.
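
For reference, the two rounding helpers (as defined in
include/linux/pfn.h):

	#define PFN_UP(x)	(((x) + PAGE_SIZE - 1) >> PAGE_SHIFT)
	#define PFN_DOWN(x)	((x) >> PAGE_SHIFT)

	/* for a region ending mid-page, the PFN_DOWN-based
	 * memblock_region_memory_end_pfn() excludes the partial page,
	 * while the PFN_UP-based memblock_region_reserved_end_pfn()
	 * includes it */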

Reviewed-by: Alexander Duyck <alexander.h.duyck@linux.intel.com>

* Re: [PATCH v1 1/2] mm/page_alloc: fix and rework pfn handling in memmap_init_zone()
  2020-02-03 23:17       ` Alexander Duyck
@ 2020-02-04  8:40         ` David Hildenbrand
  0 siblings, 0 replies; 17+ messages in thread
From: David Hildenbrand @ 2020-02-04  8:40 UTC (permalink / raw)
  To: Alexander Duyck
  Cc: LKML, linux-mm, Pavel Tatashin, Andrew Morton, Michal Hocko,
	Oscar Salvador, Kirill A . Shutemov

On 04.02.20 00:17, Alexander Duyck wrote:
> On Mon, Feb 3, 2020 at 1:44 PM David Hildenbrand <david@redhat.com> wrote:
>>
>>
>>
>>> On 03.02.2020 at 22:35, Alexander Duyck <alexander.duyck@gmail.com> wrote:
>>>
>>> On Mon, Jan 13, 2020 at 6:40 AM David Hildenbrand <david@redhat.com> wrote:
>>>>
>>>> Let's update the pfn manually whenever we continue the loop. This makes
>>>> the code easier to read but also less error prone (and we can directly
>>>> fix one issue).
>>>>
>>>> When overlap_memmap_init() returns true, pfn is updated to
>>>> "memblock_region_memory_end_pfn(r)". So it already points at the *next*
>>>> pfn to process. Incrementing the pfn another time is wrong, we might
>>>> leave one uninitialized. I spotted this by inspecting the code, so I have
>>>> no idea if this is relevant in practice (with kernelcore=mirror).
>>>>
>>>> Fixes: a9a9e77fbf27 ("mm: move mirrored memory specific code outside of memmap_init_zone")
>>>> Cc: Pavel Tatashin <pasha.tatashin@oracle.com>
>>>> Cc: Andrew Morton <akpm@linux-foundation.org>
>>>> Cc: Michal Hocko <mhocko@kernel.org>
>>>> Cc: Oscar Salvador <osalvador@suse.de>
>>>> Cc: Kirill A. Shutemov <kirill@shutemov.name>
>>>> Signed-off-by: David Hildenbrand <david@redhat.com>
>>>> ---
>>>> mm/page_alloc.c | 9 ++++++---
>>>> 1 file changed, 6 insertions(+), 3 deletions(-)
>>>>
>>>> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
>>>> index a41bd7341de1..a92791512077 100644
>>>> --- a/mm/page_alloc.c
>>>> +++ b/mm/page_alloc.c
>>>> @@ -5905,18 +5905,20 @@ void __meminit memmap_init_zone(unsigned long size, int nid, unsigned long zone,
>>>>        }
>>>> #endif
>>>>
>>>> -       for (pfn = start_pfn; pfn < end_pfn; pfn++) {
>>>> +       for (pfn = start_pfn; pfn < end_pfn; ) {
>>>>                /*
>>>>                 * There can be holes in boot-time mem_map[]s handed to this
>>>>                 * function.  They do not exist on hotplugged memory.
>>>>                 */
>>>>                if (context == MEMMAP_EARLY) {
>>>>                        if (!early_pfn_valid(pfn)) {
>>>> -                               pfn = next_pfn(pfn) - 1;
>>>> +                               pfn = next_pfn(pfn);
>>>>                                continue;
>>>>                        }
>>>> -                       if (!early_pfn_in_nid(pfn, nid))
>>>> +                       if (!early_pfn_in_nid(pfn, nid)) {
>>>> +                               pfn++;
>>>>                                continue;
>>>> +                       }
>>>>                        if (overlap_memmap_init(zone, &pfn))
>>>>                                continue;
>>>>                        if (defer_init(nid, pfn, end_pfn))
>>>
>>> I'm pretty sure this is a bit broken. overlap_memmap_init() is going
>>> to return memblock_region_memory_end_pfn() instead of the start of the
>>> next region. I think that is going to stick you in a mirrored region
>>> without advancing in that case. You would also need to have that case
>>> do a pfn++ before the continue;
>>
>> Thanks for having a look.
>>
>> Did you read the description regarding this change?
> 
> Actually I hadn't read it all that closely, so my bad on that. The
> part that had caught my attention though was that
> memblock_region_memory_end is using PFN_DOWN to identify the end of
> the memory region, Given that we probably shouldn't be messing with
> the PFNs that may contain any of this memory it might make more sense
> to use memblock_region_reserved_end_pfn which uses PFN_UP so that we
> exclude all memory that is in the mirrored region just in case
> something doesn't end on a PFN aligned boundary.
> 
> If we know that the mirrored region is going to always be page size
> aligned then I guess you are good to go. That was the only thing I
> wasn't sure about.

I think we can safely assume this for now. But I *think* we are fine
either way:

We are using memblock_region_memory_end_pfn() in all cases I spotted
(especially consistently in overlap_memmap_init()) - so there is never a
mismatch that could result in an endless loop.

Anyhow, having mirrored sub-page regions would be weird either way :)
(just like any zone that would end on sub-pages)

> 
> Reviewed-by: Alexander Duyck <alexander.h.duyck@linux.intel.com>
> 

Thanks!

-- 
Thanks,

David / dhildenb

