* [PATCH v1 0/2] mm/page_alloc: memmap_init_zone() cleanups @ 2020-01-13 14:40 David Hildenbrand 2020-01-13 14:40 ` [PATCH v1 1/2] mm/page_alloc: fix and rework pfn handling in memmap_init_zone() David Hildenbrand ` (2 more replies) 0 siblings, 3 replies; 19+ messages in thread From: David Hildenbrand @ 2020-01-13 14:40 UTC (permalink / raw) To: linux-kernel Cc: linux-mm, David Hildenbrand, Andrew Morton, Kirill A. Shutemov, Michal Hocko, Oscar Salvador, Pavel Tatashin Two cleanups for "[PATCH] mm/page_alloc: Skip non present sections on zone initialization" [1], whereby one cleanup seems to also be a fix for a (theoretical?) kernelcore=mirror case - unless I am messing something up :) [1] https://lkml.kernel.org/r/20191230093828.24613-1-kirill.shutemov@linux.intel.com David Hildenbrand (2): mm/page_alloc: fix and rework pfn handling in memmap_init_zone() mm: factor out next_present_section_nr() include/linux/mmzone.h | 10 ++++++++++ mm/page_alloc.c | 20 ++++++++------------ mm/sparse.c | 10 ---------- 3 files changed, 18 insertions(+), 22 deletions(-) -- 2.24.1 ^ permalink raw reply [flat|nested] 19+ messages in thread
* [PATCH v1 1/2] mm/page_alloc: fix and rework pfn handling in memmap_init_zone() 2020-01-13 14:40 [PATCH v1 0/2] mm/page_alloc: memmap_init_zone() cleanups David Hildenbrand @ 2020-01-13 14:40 ` David Hildenbrand 2020-02-03 21:35 ` Alexander Duyck 2020-01-13 14:40 ` [PATCH v1 2/2] mm: factor out next_present_section_nr() David Hildenbrand 2020-01-31 4:30 ` [PATCH v1 0/2] mm/page_alloc: memmap_init_zone() cleanups Andrew Morton 2 siblings, 1 reply; 19+ messages in thread From: David Hildenbrand @ 2020-01-13 14:40 UTC (permalink / raw) To: linux-kernel Cc: linux-mm, David Hildenbrand, Pavel Tatashin, Andrew Morton, Michal Hocko, Oscar Salvador, Kirill A . Shutemov Let's update the pfn manually whenever we continue the loop. This makes the code easier to read but also less error prone (and we can directly fix one issue). When overlap_memmap_init() returns true, pfn is updated to "memblock_region_memory_end_pfn(r)". So it already points at the *next* pfn to process. Incrementing the pfn another time is wrong; we might leave one uninitialized. I spotted this by inspecting the code, so I have no idea if this is relevant in practice (with kernelcore=mirror). Fixes: a9a9e77fbf27 ("mm: move mirrored memory specific code outside of memmap_init_zone") Cc: Pavel Tatashin <pasha.tatashin@oracle.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Michal Hocko <mhocko@kernel.org> Cc: Oscar Salvador <osalvador@suse.de> Cc: Kirill A.
Shutemov <kirill@shutemov.name> Signed-off-by: David Hildenbrand <david@redhat.com> --- mm/page_alloc.c | 9 ++++++--- 1 file changed, 6 insertions(+), 3 deletions(-) diff --git a/mm/page_alloc.c b/mm/page_alloc.c index a41bd7341de1..a92791512077 100644 --- a/mm/page_alloc.c +++ b/mm/page_alloc.c @@ -5905,18 +5905,20 @@ void __meminit memmap_init_zone(unsigned long size, int nid, unsigned long zone, } #endif - for (pfn = start_pfn; pfn < end_pfn; pfn++) { + for (pfn = start_pfn; pfn < end_pfn; ) { /* * There can be holes in boot-time mem_map[]s handed to this * function. They do not exist on hotplugged memory. */ if (context == MEMMAP_EARLY) { if (!early_pfn_valid(pfn)) { - pfn = next_pfn(pfn) - 1; + pfn = next_pfn(pfn); continue; } - if (!early_pfn_in_nid(pfn, nid)) + if (!early_pfn_in_nid(pfn, nid)) { + pfn++; continue; + } if (overlap_memmap_init(zone, &pfn)) continue; if (defer_init(nid, pfn, end_pfn)) @@ -5944,6 +5946,7 @@ void __meminit memmap_init_zone(unsigned long size, int nid, unsigned long zone, set_pageblock_migratetype(page, MIGRATE_MOVABLE); cond_resched(); } + pfn++; } } -- 2.24.1 ^ permalink raw reply related [flat|nested] 19+ messages in thread
* Re: [PATCH v1 1/2] mm/page_alloc: fix and rework pfn handling in memmap_init_zone() 2020-01-13 14:40 ` [PATCH v1 1/2] mm/page_alloc: fix and rework pfn handling in memmap_init_zone() David Hildenbrand @ 2020-02-03 21:35 ` Alexander Duyck 0 siblings, 0 replies; 19+ messages in thread From: Alexander Duyck @ 2020-02-03 21:35 UTC (permalink / raw) To: David Hildenbrand Cc: LKML, linux-mm, Pavel Tatashin, Andrew Morton, Michal Hocko, Oscar Salvador, Kirill A . Shutemov On Mon, Jan 13, 2020 at 6:40 AM David Hildenbrand <david@redhat.com> wrote: > > Let's update the pfn manually whenever we continue the loop. This makes > the code easier to read but also less error prone (and we can directly > fix one issue). > > When overlap_memmap_init() returns true, pfn is updated to > "memblock_region_memory_end_pfn(r)". So it already points at the *next* > pfn to process. Incrementing the pfn another time is wrong, we might > leave one uninitialized. I spotted this by inspecting the code, so I have > no idea if this is relevant in practise (with kernelcore=mirror). > > Fixes: a9a9e77fbf27 ("mm: move mirrored memory specific code outside of memmap_init_zone") > Cc: Pavel Tatashin <pasha.tatashin@oracle.com> > Cc: Andrew Morton <akpm@linux-foundation.org> > Cc: Michal Hocko <mhocko@kernel.org> > Cc: Oscar Salvador <osalvador@suse.de> > Cc: Kirill A. Shutemov <kirill@shutemov.name> > Signed-off-by: David Hildenbrand <david@redhat.com> > --- > mm/page_alloc.c | 9 ++++++--- > 1 file changed, 6 insertions(+), 3 deletions(-) > > diff --git a/mm/page_alloc.c b/mm/page_alloc.c > index a41bd7341de1..a92791512077 100644 > --- a/mm/page_alloc.c > +++ b/mm/page_alloc.c > @@ -5905,18 +5905,20 @@ void __meminit memmap_init_zone(unsigned long size, int nid, unsigned long zone, > } > #endif > > - for (pfn = start_pfn; pfn < end_pfn; pfn++) { > + for (pfn = start_pfn; pfn < end_pfn; ) { > /* > * There can be holes in boot-time mem_map[]s handed to this > * function. 
They do not exist on hotplugged memory. > */ > if (context == MEMMAP_EARLY) { > if (!early_pfn_valid(pfn)) { > - pfn = next_pfn(pfn) - 1; > + pfn = next_pfn(pfn); > continue; > } > - if (!early_pfn_in_nid(pfn, nid)) > + if (!early_pfn_in_nid(pfn, nid)) { > + pfn++; > continue; > + } > if (overlap_memmap_init(zone, &pfn)) > continue; > if (defer_init(nid, pfn, end_pfn)) I'm pretty sure this is a bit broken. The overlap_memmap_init is going to return memblock_region_memory_end_pfn instead of the start of the next region. I think that is going to stick you in a mirrored region without advancing in that case. You would also need to have that case do a pfn++ before the continue; > @@ -5944,6 +5946,7 @@ void __meminit memmap_init_zone(unsigned long size, int nid, unsigned long zone, > set_pageblock_migratetype(page, MIGRATE_MOVABLE); > cond_resched(); > } > + pfn++; > } > } > > -- > 2.24.1 > > ^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: [PATCH v1 1/2] mm/page_alloc: fix and rework pfn handling in memmap_init_zone() 2020-02-03 21:35 ` Alexander Duyck (?) @ 2020-02-03 21:44 ` David Hildenbrand 2020-02-03 23:17 ` Alexander Duyck -1 siblings, 1 reply; 19+ messages in thread From: David Hildenbrand @ 2020-02-03 21:44 UTC (permalink / raw) To: Alexander Duyck Cc: David Hildenbrand, LKML, linux-mm, Pavel Tatashin, Andrew Morton, Michal Hocko, Oscar Salvador, Kirill A . Shutemov > Am 03.02.2020 um 22:35 schrieb Alexander Duyck <alexander.duyck@gmail.com>: > > On Mon, Jan 13, 2020 at 6:40 AM David Hildenbrand <david@redhat.com> wrote: >> >> Let's update the pfn manually whenever we continue the loop. This makes >> the code easier to read but also less error prone (and we can directly >> fix one issue). >> >> When overlap_memmap_init() returns true, pfn is updated to >> "memblock_region_memory_end_pfn(r)". So it already points at the *next* >> pfn to process. Incrementing the pfn another time is wrong, we might >> leave one uninitialized. I spotted this by inspecting the code, so I have >> no idea if this is relevant in practise (with kernelcore=mirror). >> >> Fixes: a9a9e77fbf27 ("mm: move mirrored memory specific code outside of memmap_init_zone") >> Cc: Pavel Tatashin <pasha.tatashin@oracle.com> >> Cc: Andrew Morton <akpm@linux-foundation.org> >> Cc: Michal Hocko <mhocko@kernel.org> >> Cc: Oscar Salvador <osalvador@suse.de> >> Cc: Kirill A. 
Shutemov <kirill@shutemov.name> >> Signed-off-by: David Hildenbrand <david@redhat.com> >> --- >> mm/page_alloc.c | 9 ++++++--- >> 1 file changed, 6 insertions(+), 3 deletions(-) >> >> diff --git a/mm/page_alloc.c b/mm/page_alloc.c >> index a41bd7341de1..a92791512077 100644 >> --- a/mm/page_alloc.c >> +++ b/mm/page_alloc.c >> @@ -5905,18 +5905,20 @@ void __meminit memmap_init_zone(unsigned long size, int nid, unsigned long zone, >> } >> #endif >> >> - for (pfn = start_pfn; pfn < end_pfn; pfn++) { >> + for (pfn = start_pfn; pfn < end_pfn; ) { >> /* >> * There can be holes in boot-time mem_map[]s handed to this >> * function. They do not exist on hotplugged memory. >> */ >> if (context == MEMMAP_EARLY) { >> if (!early_pfn_valid(pfn)) { >> - pfn = next_pfn(pfn) - 1; >> + pfn = next_pfn(pfn); >> continue; >> } >> - if (!early_pfn_in_nid(pfn, nid)) >> + if (!early_pfn_in_nid(pfn, nid)) { >> + pfn++; >> continue; >> + } >> if (overlap_memmap_init(zone, &pfn)) >> continue; >> if (defer_init(nid, pfn, end_pfn)) > > I'm pretty sure this is a bit broken. The overlap_memmap_init is going > to return memblock_region_memory_end_pfn instead of the start of the > next region. I think that is going to stick you in a mirrored region > without advancing in that case. You would also need to have that case > do a pfn++ before the continue; Thanks for having a look. Did you read the description regarding this change? ^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: [PATCH v1 1/2] mm/page_alloc: fix and rework pfn handling in memmap_init_zone() 2020-02-03 21:44 ` David Hildenbrand @ 2020-02-03 23:17 ` Alexander Duyck 0 siblings, 0 replies; 19+ messages in thread From: Alexander Duyck @ 2020-02-03 23:17 UTC (permalink / raw) To: David Hildenbrand Cc: LKML, linux-mm, Pavel Tatashin, Andrew Morton, Michal Hocko, Oscar Salvador, Kirill A . Shutemov On Mon, Feb 3, 2020 at 1:44 PM David Hildenbrand <david@redhat.com> wrote: > > > > > Am 03.02.2020 um 22:35 schrieb Alexander Duyck <alexander.duyck@gmail.com>: > > > > On Mon, Jan 13, 2020 at 6:40 AM David Hildenbrand <david@redhat.com> wrote: > >> > >> Let's update the pfn manually whenever we continue the loop. This makes > >> the code easier to read but also less error prone (and we can directly > >> fix one issue). > >> > >> When overlap_memmap_init() returns true, pfn is updated to > >> "memblock_region_memory_end_pfn(r)". So it already points at the *next* > >> pfn to process. Incrementing the pfn another time is wrong, we might > >> leave one uninitialized. I spotted this by inspecting the code, so I have > >> no idea if this is relevant in practise (with kernelcore=mirror). > >> > >> Fixes: a9a9e77fbf27 ("mm: move mirrored memory specific code outside of memmap_init_zone") > >> Cc: Pavel Tatashin <pasha.tatashin@oracle.com> > >> Cc: Andrew Morton <akpm@linux-foundation.org> > >> Cc: Michal Hocko <mhocko@kernel.org> > >> Cc: Oscar Salvador <osalvador@suse.de> > >> Cc: Kirill A. 
Shutemov <kirill@shutemov.name> > >> Signed-off-by: David Hildenbrand <david@redhat.com> > >> --- > >> mm/page_alloc.c | 9 ++++++--- > >> 1 file changed, 6 insertions(+), 3 deletions(-) > >> > >> diff --git a/mm/page_alloc.c b/mm/page_alloc.c > >> index a41bd7341de1..a92791512077 100644 > >> --- a/mm/page_alloc.c > >> +++ b/mm/page_alloc.c > >> @@ -5905,18 +5905,20 @@ void __meminit memmap_init_zone(unsigned long size, int nid, unsigned long zone, > >> } > >> #endif > >> > >> - for (pfn = start_pfn; pfn < end_pfn; pfn++) { > >> + for (pfn = start_pfn; pfn < end_pfn; ) { > >> /* > >> * There can be holes in boot-time mem_map[]s handed to this > >> * function. They do not exist on hotplugged memory. > >> */ > >> if (context == MEMMAP_EARLY) { > >> if (!early_pfn_valid(pfn)) { > >> - pfn = next_pfn(pfn) - 1; > >> + pfn = next_pfn(pfn); > >> continue; > >> } > >> - if (!early_pfn_in_nid(pfn, nid)) > >> + if (!early_pfn_in_nid(pfn, nid)) { > >> + pfn++; > >> continue; > >> + } > >> if (overlap_memmap_init(zone, &pfn)) > >> continue; > >> if (defer_init(nid, pfn, end_pfn)) > > > > I'm pretty sure this is a bit broken. The overlap_memmap_init is going > > to return memblock_region_memory_end_pfn instead of the start of the > > next region. I think that is going to stick you in a mirrored region > > without advancing in that case. You would also need to have that case > > do a pfn++ before the continue; > > Thanks for having a look. > > Did you read the description regarding this change? Actually I hadn't read it all that closely, so my bad on that. 
The part that had caught my attention though was that memblock_region_memory_end_pfn() is using PFN_DOWN to identify the end of the memory region. Given that we probably shouldn't be messing with the PFNs that may contain any of this memory it might make more sense to use memblock_region_reserved_end_pfn which uses PFN_UP so that we exclude all memory that is in the mirrored region just in case something doesn't end on a PFN aligned boundary. If we know that the mirrored region is going to always be page size aligned then I guess you are good to go. That was the only thing I wasn't sure about. Reviewed-by: Alexander Duyck <alexander.h.duyck@linux.intel.com> ^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: [PATCH v1 1/2] mm/page_alloc: fix and rework pfn handling in memmap_init_zone() 2020-02-03 23:17 ` Alexander Duyck (?) @ 2020-02-04 8:40 ` David Hildenbrand -1 siblings, 0 replies; 19+ messages in thread From: David Hildenbrand @ 2020-02-04 8:40 UTC (permalink / raw) To: Alexander Duyck Cc: LKML, linux-mm, Pavel Tatashin, Andrew Morton, Michal Hocko, Oscar Salvador, Kirill A . Shutemov On 04.02.20 00:17, Alexander Duyck wrote: > On Mon, Feb 3, 2020 at 1:44 PM David Hildenbrand <david@redhat.com> wrote: >> >> >> >>> Am 03.02.2020 um 22:35 schrieb Alexander Duyck <alexander.duyck@gmail.com>: >>> >>> On Mon, Jan 13, 2020 at 6:40 AM David Hildenbrand <david@redhat.com> wrote: >>>> >>>> Let's update the pfn manually whenever we continue the loop. This makes >>>> the code easier to read but also less error prone (and we can directly >>>> fix one issue). >>>> >>>> When overlap_memmap_init() returns true, pfn is updated to >>>> "memblock_region_memory_end_pfn(r)". So it already points at the *next* >>>> pfn to process. Incrementing the pfn another time is wrong, we might >>>> leave one uninitialized. I spotted this by inspecting the code, so I have >>>> no idea if this is relevant in practise (with kernelcore=mirror). >>>> >>>> Fixes: a9a9e77fbf27 ("mm: move mirrored memory specific code outside of memmap_init_zone") >>>> Cc: Pavel Tatashin <pasha.tatashin@oracle.com> >>>> Cc: Andrew Morton <akpm@linux-foundation.org> >>>> Cc: Michal Hocko <mhocko@kernel.org> >>>> Cc: Oscar Salvador <osalvador@suse.de> >>>> Cc: Kirill A. 
Shutemov <kirill@shutemov.name> >>>> Signed-off-by: David Hildenbrand <david@redhat.com> >>>> --- >>>> mm/page_alloc.c | 9 ++++++--- >>>> 1 file changed, 6 insertions(+), 3 deletions(-) >>>> >>>> diff --git a/mm/page_alloc.c b/mm/page_alloc.c >>>> index a41bd7341de1..a92791512077 100644 >>>> --- a/mm/page_alloc.c >>>> +++ b/mm/page_alloc.c >>>> @@ -5905,18 +5905,20 @@ void __meminit memmap_init_zone(unsigned long size, int nid, unsigned long zone, >>>> } >>>> #endif >>>> >>>> - for (pfn = start_pfn; pfn < end_pfn; pfn++) { >>>> + for (pfn = start_pfn; pfn < end_pfn; ) { >>>> /* >>>> * There can be holes in boot-time mem_map[]s handed to this >>>> * function. They do not exist on hotplugged memory. >>>> */ >>>> if (context == MEMMAP_EARLY) { >>>> if (!early_pfn_valid(pfn)) { >>>> - pfn = next_pfn(pfn) - 1; >>>> + pfn = next_pfn(pfn); >>>> continue; >>>> } >>>> - if (!early_pfn_in_nid(pfn, nid)) >>>> + if (!early_pfn_in_nid(pfn, nid)) { >>>> + pfn++; >>>> continue; >>>> + } >>>> if (overlap_memmap_init(zone, &pfn)) >>>> continue; >>>> if (defer_init(nid, pfn, end_pfn)) >>> >>> I'm pretty sure this is a bit broken. The overlap_memmap_init is going >>> to return memblock_region_memory_end_pfn instead of the start of the >>> next region. I think that is going to stick you in a mirrored region >>> without advancing in that case. You would also need to have that case >>> do a pfn++ before the continue; >> >> Thanks for having a look. >> >> Did you read the description regarding this change? > > Actually I hadn't read it all that closely, so my bad on that. 
The > part that had caught my attention though was that > memblock_region_memory_end is using PFN_DOWN to identify the end of > the memory region, Given that we probably shouldn't be messing with > the PFNs that may contain any of this memory it might make more sense > to use memblock_region_reserved_end_pfn which uses PFN_UP so that we > exclude all memory that is in the mirrored region just in case > something doesn't end on a PFN aligned boundary. > > If we know that the mirrored region is going to always be page size > aligned then I guess you are good to go. That was the only thing I > wasn't sure about. I think we can safely assume this for now. But I *think* we are fine either way: We are using memblock_region_memory_end() in all cases I spotted (especially consistently in overlap_memmap_init()) - so there is never a mis-match that could result in an endless loop. Anyhow, having mirrored sub-page regions would be weird either way :) (just like any zone that would end on sub-pages) > > Reviewed-by: Alexander Duyck <alexander.h.duyck@linux.intel.com> > Thanks! -- Thanks, David / dhildenb ^ permalink raw reply [flat|nested] 19+ messages in thread
* [PATCH v1 2/2] mm: factor out next_present_section_nr() 2020-01-13 14:40 [PATCH v1 0/2] mm/page_alloc: memmap_init_zone() cleanups David Hildenbrand 2020-01-13 14:40 ` [PATCH v1 1/2] mm/page_alloc: fix and rework pfn handling in memmap_init_zone() David Hildenbrand @ 2020-01-13 14:40 ` David Hildenbrand 2020-01-13 22:41 ` Kirill A. Shutemov 2020-01-31 4:30 ` [PATCH v1 0/2] mm/page_alloc: memmap_init_zone() cleanups Andrew Morton 2 siblings, 1 reply; 19+ messages in thread From: David Hildenbrand @ 2020-01-13 14:40 UTC (permalink / raw) To: linux-kernel Cc: linux-mm, David Hildenbrand, Andrew Morton, Michal Hocko, Oscar Salvador, Kirill A . Shutemov Let's move it to the header and use the shorter variant from mm/page_alloc.c (the original one will also check "__highest_present_section_nr + 1", which is not necessary). While at it, make the section_nr in next_pfn() const. In next_pfn(), we now return section_nr_to_pfn(-1) instead of -1 once we exceed __highest_present_section_nr, which doesn't make a difference in the caller as it is big enough (>= all sane end_pfn). Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Michal Hocko <mhocko@kernel.org> Cc: Oscar Salvador <osalvador@suse.de> Cc: Kirill A. 
Shutemov <kirill@shutemov.name> Signed-off-by: David Hildenbrand <david@redhat.com> --- include/linux/mmzone.h | 10 ++++++++++ mm/page_alloc.c | 11 ++--------- mm/sparse.c | 10 ---------- 3 files changed, 12 insertions(+), 19 deletions(-) diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h index c2bc309d1634..462f6873905a 100644 --- a/include/linux/mmzone.h +++ b/include/linux/mmzone.h @@ -1379,6 +1379,16 @@ static inline int pfn_present(unsigned long pfn) return present_section(__nr_to_section(pfn_to_section_nr(pfn))); } +static inline unsigned long next_present_section_nr(unsigned long section_nr) +{ + while (++section_nr <= __highest_present_section_nr) { + if (present_section_nr(section_nr)) + return section_nr; + } + + return -1; +} + /* * These are _only_ used during initialisation, therefore they * can use __initdata ... They could have names to indicate diff --git a/mm/page_alloc.c b/mm/page_alloc.c index a92791512077..26e8044e9848 100644 --- a/mm/page_alloc.c +++ b/mm/page_alloc.c @@ -5852,18 +5852,11 @@ overlap_memmap_init(unsigned long zone, unsigned long *pfn) /* Skip PFNs that belong to non-present sections */ static inline __meminit unsigned long next_pfn(unsigned long pfn) { - unsigned long section_nr; + const unsigned long section_nr = pfn_to_section_nr(++pfn); - section_nr = pfn_to_section_nr(++pfn); if (present_section_nr(section_nr)) return pfn; - - while (++section_nr <= __highest_present_section_nr) { - if (present_section_nr(section_nr)) - return section_nr_to_pfn(section_nr); - } - - return -1; + return section_nr_to_pfn(next_present_section_nr(section_nr)); } #else static inline __meminit unsigned long next_pfn(unsigned long pfn) diff --git a/mm/sparse.c b/mm/sparse.c index 3822ecbd8a1f..ac4a2bfae514 100644 --- a/mm/sparse.c +++ b/mm/sparse.c @@ -198,16 +198,6 @@ static void section_mark_present(struct mem_section *ms) ms->section_mem_map |= SECTION_MARKED_PRESENT; } -static inline unsigned long next_present_section_nr(unsigned long 
section_nr) -{ - do { - section_nr++; - if (present_section_nr(section_nr)) - return section_nr; - } while ((section_nr <= __highest_present_section_nr)); - - return -1; -} #define for_each_present_section_nr(start, section_nr) \ for (section_nr = next_present_section_nr(start-1); \ ((section_nr != -1) && \ -- 2.24.1 ^ permalink raw reply related [flat|nested] 19+ messages in thread
* Re: [PATCH v1 2/2] mm: factor out next_present_section_nr() 2020-01-13 14:40 ` [PATCH v1 2/2] mm: factor out next_present_section_nr() David Hildenbrand @ 2020-01-13 22:41 ` Kirill A. Shutemov 2020-01-13 22:57 ` David Hildenbrand 0 siblings, 1 reply; 19+ messages in thread From: Kirill A. Shutemov @ 2020-01-13 22:41 UTC (permalink / raw) To: David Hildenbrand Cc: linux-kernel, linux-mm, Andrew Morton, Michal Hocko, Oscar Salvador On Mon, Jan 13, 2020 at 03:40:35PM +0100, David Hildenbrand wrote: > Let's move it to the header and use the shorter variant from > mm/page_alloc.c (the original one will also check > "__highest_present_section_nr + 1", which is not necessary). While at it, > make the section_nr in next_pfn() const. > > In next_pfn(), we now return section_nr_to_pfn(-1) instead of -1 once > we exceed __highest_present_section_nr, which doesn't make a difference in > the caller as it is big enough (>= all sane end_pfn). > > Cc: Andrew Morton <akpm@linux-foundation.org> > Cc: Michal Hocko <mhocko@kernel.org> > Cc: Oscar Salvador <osalvador@suse.de> > Cc: Kirill A. Shutemov <kirill@shutemov.name> > Signed-off-by: David Hildenbrand <david@redhat.com> > --- > include/linux/mmzone.h | 10 ++++++++++ > mm/page_alloc.c | 11 ++--------- > mm/sparse.c | 10 ---------- > 3 files changed, 12 insertions(+), 19 deletions(-) > > diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h > index c2bc309d1634..462f6873905a 100644 > --- a/include/linux/mmzone.h > +++ b/include/linux/mmzone.h > @@ -1379,6 +1379,16 @@ static inline int pfn_present(unsigned long pfn) > return present_section(__nr_to_section(pfn_to_section_nr(pfn))); > } > > +static inline unsigned long next_present_section_nr(unsigned long section_nr) > +{ > + while (++section_nr <= __highest_present_section_nr) { > + if (present_section_nr(section_nr)) > + return section_nr; > + } > + > + return -1; > +} > + > /* > * These are _only_ used during initialisation, therefore they > * can use __initdata ... 
They could have names to indicate > diff --git a/mm/page_alloc.c b/mm/page_alloc.c > index a92791512077..26e8044e9848 100644 > --- a/mm/page_alloc.c > +++ b/mm/page_alloc.c > @@ -5852,18 +5852,11 @@ overlap_memmap_init(unsigned long zone, unsigned long *pfn) > /* Skip PFNs that belong to non-present sections */ > static inline __meminit unsigned long next_pfn(unsigned long pfn) > { > - unsigned long section_nr; > + const unsigned long section_nr = pfn_to_section_nr(++pfn); > > - section_nr = pfn_to_section_nr(++pfn); > if (present_section_nr(section_nr)) > return pfn; > - > - while (++section_nr <= __highest_present_section_nr) { > - if (present_section_nr(section_nr)) > - return section_nr_to_pfn(section_nr); > - } > - > - return -1; > + return section_nr_to_pfn(next_present_section_nr(section_nr)); This changes behaviour in the corner case: if next_present_section_nr() returns -1, we call section_nr_to_pfn() for it. It's unlikely it would give any valid pfn, but I can't say for sure for all archs. I guess the worst case scenario would be an endless loop over the same sections/pfns. Have you considered the case? -- Kirill A. Shutemov ^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: [PATCH v1 2/2] mm: factor out next_present_section_nr() 2020-01-13 22:41 ` Kirill A. Shutemov @ 2020-01-13 22:57 ` David Hildenbrand 2020-01-13 23:02 ` David Hildenbrand 0 siblings, 1 reply; 19+ messages in thread From: David Hildenbrand @ 2020-01-13 22:57 UTC (permalink / raw) To: Kirill A. Shutemov Cc: David Hildenbrand, linux-kernel, linux-mm, Andrew Morton, Michal Hocko, Oscar Salvador > Am 13.01.2020 um 23:41 schrieb Kirill A. Shutemov <kirill@shutemov.name>: > > On Mon, Jan 13, 2020 at 03:40:35PM +0100, David Hildenbrand wrote: >> Let's move it to the header and use the shorter variant from >> mm/page_alloc.c (the original one will also check >> "__highest_present_section_nr + 1", which is not necessary). While at it, >> make the section_nr in next_pfn() const. >> >> In next_pfn(), we now return section_nr_to_pfn(-1) instead of -1 once >> we exceed __highest_present_section_nr, which doesn't make a difference in >> the caller as it is big enough (>= all sane end_pfn). >> >> Cc: Andrew Morton <akpm@linux-foundation.org> >> Cc: Michal Hocko <mhocko@kernel.org> >> Cc: Oscar Salvador <osalvador@suse.de> >> Cc: Kirill A. 
Shutemov <kirill@shutemov.name> >> Signed-off-by: David Hildenbrand <david@redhat.com> >> --- >> include/linux/mmzone.h | 10 ++++++++++ >> mm/page_alloc.c | 11 ++--------- >> mm/sparse.c | 10 ---------- >> 3 files changed, 12 insertions(+), 19 deletions(-) >> >> diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h >> index c2bc309d1634..462f6873905a 100644 >> --- a/include/linux/mmzone.h >> +++ b/include/linux/mmzone.h >> @@ -1379,6 +1379,16 @@ static inline int pfn_present(unsigned long pfn) >> return present_section(__nr_to_section(pfn_to_section_nr(pfn))); >> } >> >> +static inline unsigned long next_present_section_nr(unsigned long section_nr) >> +{ >> + while (++section_nr <= __highest_present_section_nr) { >> + if (present_section_nr(section_nr)) >> + return section_nr; >> + } >> + >> + return -1; >> +} >> + >> /* >> * These are _only_ used during initialisation, therefore they >> * can use __initdata ... They could have names to indicate >> diff --git a/mm/page_alloc.c b/mm/page_alloc.c >> index a92791512077..26e8044e9848 100644 >> --- a/mm/page_alloc.c >> +++ b/mm/page_alloc.c >> @@ -5852,18 +5852,11 @@ overlap_memmap_init(unsigned long zone, unsigned long *pfn) >> /* Skip PFNs that belong to non-present sections */ >> static inline __meminit unsigned long next_pfn(unsigned long pfn) >> { >> - unsigned long section_nr; >> + const unsigned long section_nr = pfn_to_section_nr(++pfn); >> >> - section_nr = pfn_to_section_nr(++pfn); >> if (present_section_nr(section_nr)) >> return pfn; >> - >> - while (++section_nr <= __highest_present_section_nr) { >> - if (present_section_nr(section_nr)) >> - return section_nr_to_pfn(section_nr); >> - } >> - >> - return -1; >> + return section_nr_to_pfn(next_present_section_nr(section_nr)); > > This changes behaviour in the corner case: if next_present_section_nr() > returns -1, we call section_nr_to_pfn() for it. It's unlikely would give > any valid pfn, but I can't say for sure for all archs. 
I guess the worst > case scenrio would be endless loop over the same secitons/pfns. > > Have you considered the case? Yes, see the patch description. We return -1 << PFN_SECTION_SHIFT, so a number close to the end of the address space (0xfff...000). (Will double check tomorrow if any 32bit arch could be problematic here) Thanks! > > -- > Kirill A. Shutemov > ^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: [PATCH v1 2/2] mm: factor out next_present_section_nr() 2020-01-13 22:57 ` David Hildenbrand @ 2020-01-13 23:02 ` David Hildenbrand 2020-01-14 10:41 ` Kirill A. Shutemov 0 siblings, 1 reply; 19+ messages in thread From: David Hildenbrand @ 2020-01-13 23:02 UTC (permalink / raw) To: Kirill A. Shutemov Cc: David Hildenbrand, linux-kernel, linux-mm, Andrew Morton, Michal Hocko, Oscar Salvador > Am 13.01.2020 um 23:57 schrieb David Hildenbrand <dhildenb@redhat.com>: > > > >>> Am 13.01.2020 um 23:41 schrieb Kirill A. Shutemov <kirill@shutemov.name>: >>> >>> On Mon, Jan 13, 2020 at 03:40:35PM +0100, David Hildenbrand wrote: >>> Let's move it to the header and use the shorter variant from >>> mm/page_alloc.c (the original one will also check >>> "__highest_present_section_nr + 1", which is not necessary). While at it, >>> make the section_nr in next_pfn() const. >>> >>> In next_pfn(), we now return section_nr_to_pfn(-1) instead of -1 once >>> we exceed __highest_present_section_nr, which doesn't make a difference in >>> the caller as it is big enough (>= all sane end_pfn). >>> >>> Cc: Andrew Morton <akpm@linux-foundation.org> >>> Cc: Michal Hocko <mhocko@kernel.org> >>> Cc: Oscar Salvador <osalvador@suse.de> >>> Cc: Kirill A. 
Shutemov <kirill@shutemov.name> >>> Signed-off-by: David Hildenbrand <david@redhat.com> >>> --- >>> include/linux/mmzone.h | 10 ++++++++++ >>> mm/page_alloc.c | 11 ++--------- >>> mm/sparse.c | 10 ---------- >>> 3 files changed, 12 insertions(+), 19 deletions(-) >>> >>> diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h >>> index c2bc309d1634..462f6873905a 100644 >>> --- a/include/linux/mmzone.h >>> +++ b/include/linux/mmzone.h >>> @@ -1379,6 +1379,16 @@ static inline int pfn_present(unsigned long pfn) >>> return present_section(__nr_to_section(pfn_to_section_nr(pfn))); >>> } >>> >>> +static inline unsigned long next_present_section_nr(unsigned long section_nr) >>> +{ >>> + while (++section_nr <= __highest_present_section_nr) { >>> + if (present_section_nr(section_nr)) >>> + return section_nr; >>> + } >>> + >>> + return -1; >>> +} >>> + >>> /* >>> * These are _only_ used during initialisation, therefore they >>> * can use __initdata ... They could have names to indicate >>> diff --git a/mm/page_alloc.c b/mm/page_alloc.c >>> index a92791512077..26e8044e9848 100644 >>> --- a/mm/page_alloc.c >>> +++ b/mm/page_alloc.c >>> @@ -5852,18 +5852,11 @@ overlap_memmap_init(unsigned long zone, unsigned long *pfn) >>> /* Skip PFNs that belong to non-present sections */ >>> static inline __meminit unsigned long next_pfn(unsigned long pfn) >>> { >>> - unsigned long section_nr; >>> + const unsigned long section_nr = pfn_to_section_nr(++pfn); >>> >>> - section_nr = pfn_to_section_nr(++pfn); >>> if (present_section_nr(section_nr)) >>> return pfn; >>> - >>> - while (++section_nr <= __highest_present_section_nr) { >>> - if (present_section_nr(section_nr)) >>> - return section_nr_to_pfn(section_nr); >>> - } >>> - >>> - return -1; >>> + return section_nr_to_pfn(next_present_section_nr(section_nr)); >> >> This changes behaviour in the corner case: if next_present_section_nr() >> returns -1, we call section_nr_to_pfn() for it. 
It's unlikely would give >> any valid pfn, but I can't say for sure for all archs. I guess the worst >> case scenrio would be endless loop over the same secitons/pfns. >> >> Have you considered the case? > > Yes, see the patch description. We return -1 << PFN_SECTION_SHIFT, so a number close to the end of the address space (0xfff...000). (Will double check tomorrow if any 32bit arch could be problematic here) ... but thinking again, 0xfff... is certainly an invalid PFN, so this should work just fine. (biggest possible pfn is -1 >> PFN_SHIFT) But it's late in Germany, will double check tomorrow :) ^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: [PATCH v1 2/2] mm: factor out next_present_section_nr() 2020-01-13 23:02 ` David Hildenbrand @ 2020-01-14 10:41 ` Kirill A. Shutemov 2020-01-14 10:49 ` David Hildenbrand 0 siblings, 1 reply; 19+ messages in thread From: Kirill A. Shutemov @ 2020-01-14 10:41 UTC (permalink / raw) To: David Hildenbrand Cc: linux-kernel, linux-mm, Andrew Morton, Michal Hocko, Oscar Salvador On Tue, Jan 14, 2020 at 12:02:00AM +0100, David Hildenbrand wrote: > > > > Am 13.01.2020 um 23:57 schrieb David Hildenbrand <dhildenb@redhat.com>: > > > > > > > >>> Am 13.01.2020 um 23:41 schrieb Kirill A. Shutemov <kirill@shutemov.name>: > >>> > >>> On Mon, Jan 13, 2020 at 03:40:35PM +0100, David Hildenbrand wrote: > >>> Let's move it to the header and use the shorter variant from > >>> mm/page_alloc.c (the original one will also check > >>> "__highest_present_section_nr + 1", which is not necessary). While at it, > >>> make the section_nr in next_pfn() const. > >>> > >>> In next_pfn(), we now return section_nr_to_pfn(-1) instead of -1 once > >>> we exceed __highest_present_section_nr, which doesn't make a difference in > >>> the caller as it is big enough (>= all sane end_pfn). > >>> > >>> Cc: Andrew Morton <akpm@linux-foundation.org> > >>> Cc: Michal Hocko <mhocko@kernel.org> > >>> Cc: Oscar Salvador <osalvador@suse.de> > >>> Cc: Kirill A. 
Shutemov <kirill@shutemov.name> > >>> Signed-off-by: David Hildenbrand <david@redhat.com> > >>> --- > >>> include/linux/mmzone.h | 10 ++++++++++ > >>> mm/page_alloc.c | 11 ++--------- > >>> mm/sparse.c | 10 ---------- > >>> 3 files changed, 12 insertions(+), 19 deletions(-) > >>> > >>> diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h > >>> index c2bc309d1634..462f6873905a 100644 > >>> --- a/include/linux/mmzone.h > >>> +++ b/include/linux/mmzone.h > >>> @@ -1379,6 +1379,16 @@ static inline int pfn_present(unsigned long pfn) > >>> return present_section(__nr_to_section(pfn_to_section_nr(pfn))); > >>> } > >>> > >>> +static inline unsigned long next_present_section_nr(unsigned long section_nr) > >>> +{ > >>> + while (++section_nr <= __highest_present_section_nr) { > >>> + if (present_section_nr(section_nr)) > >>> + return section_nr; > >>> + } > >>> + > >>> + return -1; > >>> +} > >>> + > >>> /* > >>> * These are _only_ used during initialisation, therefore they > >>> * can use __initdata ... 
They could have names to indicate > >>> diff --git a/mm/page_alloc.c b/mm/page_alloc.c > >>> index a92791512077..26e8044e9848 100644 > >>> --- a/mm/page_alloc.c > >>> +++ b/mm/page_alloc.c > >>> @@ -5852,18 +5852,11 @@ overlap_memmap_init(unsigned long zone, unsigned long *pfn) > >>> /* Skip PFNs that belong to non-present sections */ > >>> static inline __meminit unsigned long next_pfn(unsigned long pfn) > >>> { > >>> - unsigned long section_nr; > >>> + const unsigned long section_nr = pfn_to_section_nr(++pfn); > >>> > >>> - section_nr = pfn_to_section_nr(++pfn); > >>> if (present_section_nr(section_nr)) > >>> return pfn; > >>> - > >>> - while (++section_nr <= __highest_present_section_nr) { > >>> - if (present_section_nr(section_nr)) > >>> - return section_nr_to_pfn(section_nr); > >>> - } > >>> - > >>> - return -1; > >>> + return section_nr_to_pfn(next_present_section_nr(section_nr)); > >> > >> This changes behaviour in the corner case: if next_present_section_nr() > >> returns -1, we call section_nr_to_pfn() for it. It's unlikely would give > >> any valid pfn, but I can't say for sure for all archs. I guess the worst > >> case scenrio would be endless loop over the same secitons/pfns. > >> > >> Have you considered the case? > > > > Yes, see the patch description. We return -1 << PFN_SECTION_SHIFT, so a number close to the end of the address space (0xfff...000). (Will double check tomorrow if any 32bit arch could be problematic here) > > ... but thinking again, 0xfff... is certainly an invalid PFN, so this should work just fine. > > (biggest possible pfn is -1 >> PFN_SHIFT) > > But it‘s late in Germany, will double check tomorrow :) If the end_pfn happens the be more than -1UL << PFN_SECTION_SHIFT we are screwed: the pfn is invalid, next_present_section_nr() returns -1, the next iterartion is on the same pfn and we have endless loop. The question is whether we can prove end_pfn is always less than -1UL << PFN_SECTION_SHIFT in any configuration of any arch. 
It is not obvious to me. -- Kirill A. Shutemov ^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: [PATCH v1 2/2] mm: factor out next_present_section_nr() 2020-01-14 10:41 ` Kirill A. Shutemov @ 2020-01-14 10:49 ` David Hildenbrand 2020-01-14 15:52 ` Kirill A. Shutemov 0 siblings, 1 reply; 19+ messages in thread From: David Hildenbrand @ 2020-01-14 10:49 UTC (permalink / raw) To: Kirill A. Shutemov Cc: linux-kernel, linux-mm, Andrew Morton, Michal Hocko, Oscar Salvador On 14.01.20 11:41, Kirill A. Shutemov wrote: > On Tue, Jan 14, 2020 at 12:02:00AM +0100, David Hildenbrand wrote: >> >> >>> Am 13.01.2020 um 23:57 schrieb David Hildenbrand <dhildenb@redhat.com>: >>> >>> >>> >>>>> Am 13.01.2020 um 23:41 schrieb Kirill A. Shutemov <kirill@shutemov.name>: >>>>> >>>>> On Mon, Jan 13, 2020 at 03:40:35PM +0100, David Hildenbrand wrote: >>>>> Let's move it to the header and use the shorter variant from >>>>> mm/page_alloc.c (the original one will also check >>>>> "__highest_present_section_nr + 1", which is not necessary). While at it, >>>>> make the section_nr in next_pfn() const. >>>>> >>>>> In next_pfn(), we now return section_nr_to_pfn(-1) instead of -1 once >>>>> we exceed __highest_present_section_nr, which doesn't make a difference in >>>>> the caller as it is big enough (>= all sane end_pfn). >>>>> >>>>> Cc: Andrew Morton <akpm@linux-foundation.org> >>>>> Cc: Michal Hocko <mhocko@kernel.org> >>>>> Cc: Oscar Salvador <osalvador@suse.de> >>>>> Cc: Kirill A. 
Shutemov <kirill@shutemov.name> >>>>> Signed-off-by: David Hildenbrand <david@redhat.com> >>>>> --- >>>>> include/linux/mmzone.h | 10 ++++++++++ >>>>> mm/page_alloc.c | 11 ++--------- >>>>> mm/sparse.c | 10 ---------- >>>>> 3 files changed, 12 insertions(+), 19 deletions(-) >>>>> >>>>> diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h >>>>> index c2bc309d1634..462f6873905a 100644 >>>>> --- a/include/linux/mmzone.h >>>>> +++ b/include/linux/mmzone.h >>>>> @@ -1379,6 +1379,16 @@ static inline int pfn_present(unsigned long pfn) >>>>> return present_section(__nr_to_section(pfn_to_section_nr(pfn))); >>>>> } >>>>> >>>>> +static inline unsigned long next_present_section_nr(unsigned long section_nr) >>>>> +{ >>>>> + while (++section_nr <= __highest_present_section_nr) { >>>>> + if (present_section_nr(section_nr)) >>>>> + return section_nr; >>>>> + } >>>>> + >>>>> + return -1; >>>>> +} >>>>> + >>>>> /* >>>>> * These are _only_ used during initialisation, therefore they >>>>> * can use __initdata ... 
They could have names to indicate >>>>> diff --git a/mm/page_alloc.c b/mm/page_alloc.c >>>>> index a92791512077..26e8044e9848 100644 >>>>> --- a/mm/page_alloc.c >>>>> +++ b/mm/page_alloc.c >>>>> @@ -5852,18 +5852,11 @@ overlap_memmap_init(unsigned long zone, unsigned long *pfn) >>>>> /* Skip PFNs that belong to non-present sections */ >>>>> static inline __meminit unsigned long next_pfn(unsigned long pfn) >>>>> { >>>>> - unsigned long section_nr; >>>>> + const unsigned long section_nr = pfn_to_section_nr(++pfn); >>>>> >>>>> - section_nr = pfn_to_section_nr(++pfn); >>>>> if (present_section_nr(section_nr)) >>>>> return pfn; >>>>> - >>>>> - while (++section_nr <= __highest_present_section_nr) { >>>>> - if (present_section_nr(section_nr)) >>>>> - return section_nr_to_pfn(section_nr); >>>>> - } >>>>> - >>>>> - return -1; >>>>> + return section_nr_to_pfn(next_present_section_nr(section_nr)); >>>> >>>> This changes behaviour in the corner case: if next_present_section_nr() >>>> returns -1, we call section_nr_to_pfn() for it. It's unlikely would give >>>> any valid pfn, but I can't say for sure for all archs. I guess the worst >>>> case scenrio would be endless loop over the same secitons/pfns. >>>> >>>> Have you considered the case? >>> >>> Yes, see the patch description. We return -1 << PFN_SECTION_SHIFT, so a number close to the end of the address space (0xfff...000). (Will double check tomorrow if any 32bit arch could be problematic here) >> >> ... but thinking again, 0xfff... is certainly an invalid PFN, so this should work just fine. >> >> (biggest possible pfn is -1 >> PFN_SHIFT) >> >> But it‘s late in Germany, will double check tomorrow :) > > If the end_pfn happens the be more than -1UL << PFN_SECTION_SHIFT we are > screwed: the pfn is invalid, next_present_section_nr() returns -1, the > next iterartion is on the same pfn and we have endless loop. 
> > The question is whether we can prove end_pfn is always less than > -1UL << PFN_SECTION_SHIFT in any configuration of any arch. > > It is not obvious for me. memmap_init_zone() is called for a physical memory region: pfn + size (nr_pages) The highest possible PFN you can have is "-1(unsigned long) >> PFN_SHIFT". So even if you would want to add the very last section, the PFN would still be smaller than -1UL << PFN_SECTION_SHIFT. -- Thanks, David / dhildenb ^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: [PATCH v1 2/2] mm: factor out next_present_section_nr() 2020-01-14 10:49 ` David Hildenbrand @ 2020-01-14 15:52 ` Kirill A. Shutemov 2020-01-14 16:50 ` David Hildenbrand 0 siblings, 1 reply; 19+ messages in thread From: Kirill A. Shutemov @ 2020-01-14 15:52 UTC (permalink / raw) To: David Hildenbrand Cc: linux-kernel, linux-mm, Andrew Morton, Michal Hocko, Oscar Salvador On Tue, Jan 14, 2020 at 11:49:19AM +0100, David Hildenbrand wrote: > memmap_init_zone() is called for a physical memory region: pfn + size > (nr_pages) > > The highest possible PFN you can have is "-1(unsigned long) >> > PFN_SHIFT". So even if you would want to add the very last section, the > PFN would still be smaller than -1UL << PFN_SECTION_SHIFT. PFN_SHIFT? I guess you mean PAGE_SHIFT. Of course PFN can be more than -1UL >> PAGE_SHIFT. Like on 32-bit x86 with PAE it is ((1ULL << 36) - 1) >> PAGE_SHIFT. That's the whole reason for PAE. The highest possible PFN must fit into phys_addr_t when shifted left by PAGE_SHIFT and must fit into unsigned long. It can be -1UL if phys_addr_t is 64-bit. Any other limitation I miss? -- Kirill A. Shutemov ^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: [PATCH v1 2/2] mm: factor out next_present_section_nr() 2020-01-14 15:52 ` Kirill A. Shutemov @ 2020-01-14 16:50 ` David Hildenbrand 2020-01-14 16:52 ` David Hildenbrand 0 siblings, 1 reply; 19+ messages in thread From: David Hildenbrand @ 2020-01-14 16:50 UTC (permalink / raw) To: Kirill A. Shutemov Cc: linux-kernel, linux-mm, Andrew Morton, Michal Hocko, Oscar Salvador On 14.01.20 16:52, Kirill A. Shutemov wrote: > On Tue, Jan 14, 2020 at 11:49:19AM +0100, David Hildenbrand wrote: >> memmap_init_zone() is called for a physical memory region: pfn + size >> (nr_pages) >> >> The highest possible PFN you can have is "-1(unsigned long) >> >> PFN_SHIFT". So even if you would want to add the very last section, the >> PFN would still be smaller than -1UL << PFN_SECTION_SHIFT. > > PFN_SHIFT? I guess you mean PAGE_SHIFT. Yes :) > > Of course PFN can be more than -1UL >> PAGE_SHIFT. Like on 32-bit x86 with > PAE it is ((1ULL << 36) - 1) >> PAGE_SHIFT. That's the whole reason for > PAE. You are right about PAE, but I think you agree that it is a special case. > > The highest possible PFN must fit into phys_addr_t when shifted left by > PAGE_SHIFT and must fit into unsigned long. It's can be -1UL if > phys_addr_t is 64-bit. > Right, and for 32bit, that would mean (assuming something like 12bit PAGE_SHIFT) if you have -1 (0xffffffff) that the biggest possible address is 0xfffffffffff (44bit). In that case, the existing code would already break because "end_pfn" (is actually +1, pointing after the one to initialize), would overflow to 0 and you would have an endless loop in memmap_init_zone(). Now, after this change you not only get an endless loop when trying to init the very last PFN, but when trying to init a PFN in the very last section (section_nr = -1 - e.g., the last 128MB). I don't think there is any sane use case where you initialize something partially in the last section that is possible with any hardware address extension mechanism. 
-- Thanks, David / dhildenb ^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: [PATCH v1 2/2] mm: factor out next_present_section_nr() 2020-01-14 16:50 ` David Hildenbrand @ 2020-01-14 16:52 ` David Hildenbrand 0 siblings, 0 replies; 19+ messages in thread From: David Hildenbrand @ 2020-01-14 16:52 UTC (permalink / raw) To: Kirill A. Shutemov Cc: linux-kernel, linux-mm, Andrew Morton, Michal Hocko, Oscar Salvador On 14.01.20 17:50, David Hildenbrand wrote: > On 14.01.20 16:52, Kirill A. Shutemov wrote: >> On Tue, Jan 14, 2020 at 11:49:19AM +0100, David Hildenbrand wrote: >>> memmap_init_zone() is called for a physical memory region: pfn + size >>> (nr_pages) >>> >>> The highest possible PFN you can have is "-1(unsigned long) >> >>> PFN_SHIFT". So even if you would want to add the very last section, the >>> PFN would still be smaller than -1UL << PFN_SECTION_SHIFT. >> >> PFN_SHIFT? I guess you mean PAGE_SHIFT. > > Yes :) > >> >> Of course PFN can be more than -1UL >> PAGE_SHIFT. Like on 32-bit x86 with >> PAE it is ((1ULL << 36) - 1) >> PAGE_SHIFT. That's the whole reason for >> PAE. > > You are right about PAE, but I think you agree that is is a special case. > >> >> The highest possible PFN must fit into phys_addr_t when shifted left by >> PAGE_SHIFT and must fit into unsigned long. It's can be -1UL if >> phys_addr_t is 64-bit. >> > > Right, and for 32bit, that would mean (assuming something like 12bit > PAGE_SHIFT) if you have -1 (0xffffffff) that the biggest possible > address is 0xfffffffffff (44bit). In that case, the existing code would > already break because "end_pfn" (is actually +1, pointing after the one > to initialize), would overflow to 0 and you would have an endless loop > in memmap_init_zone(). Correction: If end_pfn overflows to 0, you would get no loop iteration at all. -- Thanks, David / dhildenb ^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: [PATCH v1 0/2] mm/page_alloc: memmap_init_zone() cleanups 2020-01-13 14:40 [PATCH v1 0/2] mm/page_alloc: memmap_init_zone() cleanups David Hildenbrand 2020-01-13 14:40 ` [PATCH v1 1/2] mm/page_alloc: fix and rework pfn handling in memmap_init_zone() David Hildenbrand 2020-01-13 14:40 ` [PATCH v1 2/2] mm: factor out next_present_section_nr() David Hildenbrand @ 2020-01-31 4:30 ` Andrew Morton 2020-02-03 14:49 ` Kirill A. Shutemov 2 siblings, 1 reply; 19+ messages in thread From: Andrew Morton @ 2020-01-31 4:30 UTC (permalink / raw) To: David Hildenbrand Cc: linux-kernel, linux-mm, Kirill A. Shutemov, Michal Hocko, Oscar Salvador, Pavel Tatashin On Mon, 13 Jan 2020 15:40:33 +0100 David Hildenbrand <david@redhat.com> wrote: > Two cleanups for "[PATCH] mm/page_alloc: Skip non present sections on zone > initialization" [1], whereby one cleanup seems to also be a fix for a > (theoretial?) kernelcore=mirror case - unless I am messing something up :) > I'm not seeing any acks or reviewed-by's on these two? ^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: [PATCH v1 0/2] mm/page_alloc: memmap_init_zone() cleanups 2020-01-31 4:30 ` [PATCH v1 0/2] mm/page_alloc: memmap_init_zone() cleanups Andrew Morton @ 2020-02-03 14:49 ` Kirill A. Shutemov 0 siblings, 0 replies; 19+ messages in thread From: Kirill A. Shutemov @ 2020-02-03 14:49 UTC (permalink / raw) To: Andrew Morton Cc: David Hildenbrand, linux-kernel, linux-mm, Michal Hocko, Oscar Salvador, Pavel Tatashin On Thu, Jan 30, 2020 at 08:30:59PM -0800, Andrew Morton wrote: > On Mon, 13 Jan 2020 15:40:33 +0100 David Hildenbrand <david@redhat.com> wrote: > > > Two cleanups for "[PATCH] mm/page_alloc: Skip non present sections on zone > > initialization" [1], whereby one cleanup seems to also be a fix for a > > (theoretial?) kernelcore=mirror case - unless I am messing something up :) > > > > I'm not seeing any acks or reviewed-by's on these two? You can use mine: Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> -- Kirill A. Shutemov ^ permalink raw reply [flat|nested] 19+ messages in thread
end of thread, other threads:[~2020-02-04 8:40 UTC | newest] Thread overview: 19+ messages (download: mbox.gz / follow: Atom feed) -- links below jump to the message on this page -- 2020-01-13 14:40 [PATCH v1 0/2] mm/page_alloc: memmap_init_zone() cleanups David Hildenbrand 2020-01-13 14:40 ` [PATCH v1 1/2] mm/page_alloc: fix and rework pfn handling in memmap_init_zone() David Hildenbrand 2020-02-03 21:35 ` Alexander Duyck 2020-02-03 21:35 ` Alexander Duyck 2020-02-03 21:44 ` David Hildenbrand 2020-02-03 23:17 ` Alexander Duyck 2020-02-03 23:17 ` Alexander Duyck 2020-02-04 8:40 ` David Hildenbrand 2020-01-13 14:40 ` [PATCH v1 2/2] mm: factor out next_present_section_nr() David Hildenbrand 2020-01-13 22:41 ` Kirill A. Shutemov 2020-01-13 22:57 ` David Hildenbrand 2020-01-13 23:02 ` David Hildenbrand 2020-01-14 10:41 ` Kirill A. Shutemov 2020-01-14 10:49 ` David Hildenbrand 2020-01-14 15:52 ` Kirill A. Shutemov 2020-01-14 16:50 ` David Hildenbrand 2020-01-14 16:52 ` David Hildenbrand 2020-01-31 4:30 ` [PATCH v1 0/2] mm/page_alloc: memmap_init_zone() cleanups Andrew Morton 2020-02-03 14:49 ` Kirill A. Shutemov