From: Mike Rapoport <rppt@linux.ibm.com>
To: Michal Hocko <mhocko@kernel.org>
Cc: Baoquan He <bhe@redhat.com>,
	Hoan Tran <Hoan@os.amperecomputing.com>,
	Catalin Marinas <catalin.marinas@arm.com>,
	Will Deacon <will.deacon@arm.com>,
	Andrew Morton <akpm@linux-foundation.org>,
	Vlastimil Babka <vbabka@suse.cz>,
	Oscar Salvador <osalvador@suse.de>,
	Pavel Tatashin <pavel.tatashin@microsoft.com>,
	Alexander Duyck <alexander.h.duyck@linux.intel.com>,
	Benjamin Herrenschmidt <benh@kernel.crashing.org>,
	Paul Mackerras <paulus@samba.org>,
	Michael Ellerman <mpe@ellerman.id.au>,
	Thomas Gleixner <tglx@linutronix.de>,
	Ingo Molnar <mingo@redhat.com>, Borislav Petkov <bp@alien8.de>,
	"H. Peter Anvin" <hpa@zytor.com>,
	"David S. Miller" <davem@davemloft.net>,
	Heiko Carstens <heiko.carstens@de.ibm.com>,
	Vasily Gorbik <gor@linux.ibm.com>,
	Christian Borntraeger <borntraeger@de.ibm.com>,
	"open list:MEMORY MANAGEMENT" <linux-mm@kvack.org>,
	linux-arm-kernel@lists.infradead.org, linux-s390@vger.kernel.org,
	sparclinux@vger.kernel.org, x86@kernel.org,
	linuxppc-dev@lists.ozlabs.org, linux-kernel@vger.kernel.org,
	lho@amperecomputing.com, mmorana@amperecomputing.com
Subject: Re: [PATCH v3 0/5] mm: Enable CONFIG_NODES_SPAN_OTHER_NODES by default for NUMA
Date: Thu, 9 Apr 2020 19:27:41 +0300	[thread overview]
Message-ID: <20200409162741.GA9387@linux.ibm.com> (raw)
In-Reply-To: <20200331142138.GL30449@dhcp22.suse.cz>

On Tue, Mar 31, 2020 at 04:21:38PM +0200, Michal Hocko wrote:
> On Tue 31-03-20 22:03:32, Baoquan He wrote:
> > Hi Michal,
> > 
> > On 03/31/20 at 10:55am, Michal Hocko wrote:
> > > On Tue 31-03-20 11:14:23, Mike Rapoport wrote:
> > > > Maybe I misread the code, but I don't see how this could happen. In the
> > > > HAVE_MEMBLOCK_NODE_MAP=y case, free_area_init_node() calls
> > > > calculate_node_totalpages() that ensures that node->node_zones are entirely
> > > > within the node because this is checked in zone_spanned_pages_in_node().
> > > 
> > > zone_spanned_pages_in_node does check that the zone boundaries are within
> > > the node boundaries. But that doesn't really tell us anything about other
> > > potential zones interleaving with the physical memory range.
> > > zone->spanned_pages simply gives the physical range for the zone
> > > including holes. Interleaving nodes are essentially a hole
> > > (__absent_pages_in_range is going to skip those).
> > > 
> > > That means that free_area_init_core simply goes over the whole
> > > physical zone range, including holes, and that is why we need to check
> > > both for physical and logical holes (aka other nodes).
> > > 
> > > Life would be so much easier if the whole thing simply iterated
> > > over memblocks...
> > 
> > Iterating over memblocks sounds like a great idea. I tried putting the
> > memblock iteration in the upper layer, memmap_init(), which is used for
> > boot memory only anyway. Do you think it's doable and OK? If yes, I can
> > work out a formal patch to make this simpler, as you said. The draft code
> > is below; it reuses the existing code and involves little change.
> 
> Doing this would be a step in the right direction! I haven't checked the
> code very closely though. The below sounds way too simple to be true, I
> am afraid. First, for_each_mem_pfn_range is available only for
> CONFIG_HAVE_MEMBLOCK_NODE_MAP (which is one of the reasons why I keep
> saying that I really hate it being conditional). Also, I haven't really
> checked the deferred initialization path - I have a very vague
> recollection that it has been converted to the memblock API, but I have
> happily dropped all that memory.

Baoquan's patch almost did it, at least for the simple case of qemu with
two nodes. It's only missing an adjustment of the size passed to
memmap_init_zone(), which may change because of the clamping.
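
The adjustment I mean is to pass the size of each clamped intersection
rather than the size of the whole zone. Roughly like this, as an untested
sketch on top of the draft below:

void __meminit __weak memmap_init(unsigned long size, int nid,
				  unsigned long zone, unsigned long range_start_pfn)
{
	unsigned long start_pfn, end_pfn;
	unsigned long range_end_pfn = range_start_pfn + size;
	int i;

	for_each_mem_pfn_range(i, nid, &start_pfn, &end_pfn, NULL) {
		/* intersect this memblock region with the zone range */
		start_pfn = clamp(start_pfn, range_start_pfn, range_end_pfn);
		end_pfn = clamp(end_pfn, range_start_pfn, range_end_pfn);

		if (end_pfn > start_pfn) {
			/* pass the size of the intersection, not the zone */
			size = end_pfn - start_pfn;
			memmap_init_zone(size, nid, zone, start_pfn,
					 MEMMAP_EARLY, NULL);
		}
	}
}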

I've drafted a series that removes HAVE_MEMBLOCK_NODE_MAP and added this
patch there [1]. It worked for the several memory configurations I could
emulate with qemu.
I'm going to wait a bit to see if kbuild is happy, and then I'll send the
patches.
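
For the record, by "qemu with two nodes" I mean a setup along these
lines (the exact options and sizes are only an illustration):

	qemu-system-x86_64 -smp 4 -m 4G \
		-numa node,nodeid=0,mem=2G \
		-numa node,nodeid=1,mem=2G \
		...

and several variations of the memory split on top of that.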

Baoquan, I took the liberty of adding your SoB, hope you don't mind.

[1] https://git.kernel.org/pub/scm/linux/kernel/git/rppt/linux.git/log/?h=memblock/all-have-node-map

> > diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> > index 138a56c0f48f..558d421f294b 100644
> > --- a/mm/page_alloc.c
> > +++ b/mm/page_alloc.c
> > @@ -6007,14 +6007,6 @@ void __meminit memmap_init_zone(unsigned long size, int nid, unsigned long zone,
> >  		 * function.  They do not exist on hotplugged memory.
> >  		 */
> >  		if (context == MEMMAP_EARLY) {
> > -			if (!early_pfn_valid(pfn)) {
> > -				pfn = next_pfn(pfn);
> > -				continue;
> > -			}
> > -			if (!early_pfn_in_nid(pfn, nid)) {
> > -				pfn++;
> > -				continue;
> > -			}
> >  			if (overlap_memmap_init(zone, &pfn))
> >  				continue;
> >  			if (defer_init(nid, pfn, end_pfn))
> > @@ -6130,9 +6122,17 @@ static void __meminit zone_init_free_lists(struct zone *zone)
> >  }
> >  
> >  void __meminit __weak memmap_init(unsigned long size, int nid,
> > -				  unsigned long zone, unsigned long start_pfn)
> > +				  unsigned long zone, unsigned long range_start_pfn)
> >  {
> > -	memmap_init_zone(size, nid, zone, start_pfn, MEMMAP_EARLY, NULL);
> > +	unsigned long start_pfn, end_pfn;
> > +	unsigned long range_end_pfn = range_start_pfn + size;
> > +	int i;
> > +	for_each_mem_pfn_range(i, nid, &start_pfn, &end_pfn, NULL) {
> > +		start_pfn = clamp(start_pfn, range_start_pfn, range_end_pfn);
> > +		end_pfn = clamp(end_pfn, range_start_pfn, range_end_pfn);
> > +		if (end_pfn > start_pfn)
> > +			memmap_init_zone(size, nid, zone, start_pfn, MEMMAP_EARLY, NULL);
> > +	}
> >  }
> >  
> >  static int zone_batchsize(struct zone *zone)
> 
> -- 
> Michal Hocko
> SUSE Labs

-- 
Sincerely yours,
Mike.

