* [patch] mm: speedup in __early_pfn_to_nid
@ 2013-03-18 15:56 ` Russ Anderson
  0 siblings, 0 replies; 41+ messages in thread
From: Russ Anderson @ 2013-03-18 15:56 UTC (permalink / raw)
  To: linux-mm; +Cc: linux-kernel, tglx, mingo, hpa, Russ Anderson

When booting on a large memory system, the kernel spends
considerable time in memmap_init_zone() setting up memory zones.
Analysis shows significant time spent in __early_pfn_to_nid().

The routine memmap_init_zone() checks each PFN to verify the
nid is valid.  __early_pfn_to_nid() sequentially scans the list of
pfn ranges to find the right range and returns the nid.  This does
not scale well.  On a 4 TB (single rack) system there are 308
memory ranges to scan.  The higher the PFN the more time spent
sequentially spinning through memory ranges.

Since memmap_init_zone() increments pfn, it will almost always be
looking for the same range as the previous pfn, so check that
range first.  If it is in the same range, return that nid.
If not, scan the list as before.

A 4 TB (single rack) UV1 system takes 512 seconds to get through
the zone code.  This performance optimization reduces the time
by 189 seconds, a 36% improvement.

A 2 TB (single rack) UV2 system goes from 212.7 seconds to 99.8 seconds,
a 112.9 second (53%) reduction.

Signed-off-by: Russ Anderson <rja@sgi.com>
---
 mm/page_alloc.c |   11 ++++++++++-
 1 file changed, 10 insertions(+), 1 deletion(-)

Index: linux/mm/page_alloc.c
===================================================================
--- linux.orig/mm/page_alloc.c	2013-03-18 10:52:11.510988843 -0500
+++ linux/mm/page_alloc.c	2013-03-18 10:52:14.214931348 -0500
@@ -4161,10 +4161,19 @@ int __meminit __early_pfn_to_nid(unsigne
 {
 	unsigned long start_pfn, end_pfn;
 	int i, nid;
+	static unsigned long last_start_pfn, last_end_pfn;
+	static int last_nid;
+
+	if (last_start_pfn <= pfn && pfn < last_end_pfn)
+		return last_nid;
 
 	for_each_mem_pfn_range(i, MAX_NUMNODES, &start_pfn, &end_pfn, &nid)
-		if (start_pfn <= pfn && pfn < end_pfn)
+		if (start_pfn <= pfn && pfn < end_pfn) {
+			last_nid = nid;
+			last_start_pfn = start_pfn;
+			last_end_pfn = end_pfn;
 			return nid;
+		}
 	/* This is a memory hole */
 	return -1;
 }
-- 
Russ Anderson, OS RAS/Partitioning Project Lead
SGI - Silicon Graphics Inc          rja@sgi.com

^ permalink raw reply	[flat|nested] 41+ messages in thread
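For readers who want to exercise the caching idea outside the kernel, the patch's logic can be modeled in isolation. The sketch below is a hypothetical user-space stand-in (the range table and its values are made up, not a real memory map); it keeps the same `last_start_pfn`/`last_end_pfn`/`last_nid` statics and only falls back to the linear scan on a cache miss.

```c
#include <stddef.h>

/* Hypothetical stand-in for the kernel's early pfn-range table:
 * each entry maps [start_pfn, end_pfn) to a NUMA node id. */
struct pfn_range { unsigned long start_pfn, end_pfn; int nid; };

static const struct pfn_range ranges[] = {
	{ 0,    1000, 0 },
	{ 1000, 2000, 1 },
	{ 3000, 4000, 2 },	/* note the hole at [2000, 3000) */
};

/* Same technique as the patch: remember the last matching range and
 * test it before falling back to the sequential scan. */
static int early_pfn_to_nid(unsigned long pfn)
{
	static unsigned long last_start_pfn, last_end_pfn;
	static int last_nid;
	size_t i;

	if (last_start_pfn <= pfn && pfn < last_end_pfn)
		return last_nid;

	for (i = 0; i < sizeof(ranges) / sizeof(ranges[0]); i++)
		if (ranges[i].start_pfn <= pfn && pfn < ranges[i].end_pfn) {
			last_nid = ranges[i].nid;
			last_start_pfn = ranges[i].start_pfn;
			last_end_pfn = ranges[i].end_pfn;
			return last_nid;
		}
	/* This is a memory hole */
	return -1;
}
```

As in the patch, a pfn that falls in a hole returns -1 and leaves the cached range untouched, which is safe because the next in-range lookup repopulates the cache.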
* Re: [patch] mm: speedup in __early_pfn_to_nid
  2013-03-18 15:56 ` Russ Anderson
@ 2013-03-19  3:56 ` David Rientjes
  -1 siblings, 0 replies; 41+ messages in thread
From: David Rientjes @ 2013-03-19 3:56 UTC (permalink / raw)
  To: Russ Anderson; +Cc: linux-mm, linux-kernel, tglx, mingo, hpa

On Mon, 18 Mar 2013, Russ Anderson wrote:

> When booting on a large memory system, the kernel spends
> considerable time in memmap_init_zone() setting up memory zones.
> Analysis shows significant time spent in __early_pfn_to_nid().
> 
> The routine memmap_init_zone() checks each PFN to verify the
> nid is valid.  __early_pfn_to_nid() sequentially scans the list of
> pfn ranges to find the right range and returns the nid.  This does
> not scale well.  On a 4 TB (single rack) system there are 308
> memory ranges to scan.  The higher the PFN the more time spent
> sequentially spinning through memory ranges.
> 
> Since memmap_init_zone() increments pfn, it will almost always be
> looking for the same range as the previous pfn, so check that
> range first.  If it is in the same range, return that nid.
> If not, scan the list as before.
> 
> A 4 TB (single rack) UV1 system takes 512 seconds to get through
> the zone code.  This performance optimization reduces the time
> by 189 seconds, a 36% improvement.
> 
> A 2 TB (single rack) UV2 system goes from 212.7 seconds to 99.8 seconds,
> a 112.9 second (53%) reduction.
> 
> Signed-off-by: Russ Anderson <rja@sgi.com>

Acked-by: David Rientjes <rientjes@google.com>

Very nice improvement!

^ permalink raw reply	[flat|nested] 41+ messages in thread
* Re: [patch] mm: speedup in __early_pfn_to_nid
  2013-03-18 15:56 ` Russ Anderson
  (?)
@ 2013-03-20 22:32 ` Andrew Morton
  -1 siblings, 0 replies; 41+ messages in thread
From: Andrew Morton @ 2013-03-20 22:32 UTC (permalink / raw)
  To: Russ Anderson; +Cc: linux-mm, linux-kernel, tglx, mingo, hpa, linux-ia64

On Mon, 18 Mar 2013 10:56:19 -0500 Russ Anderson <rja@sgi.com> wrote:

> When booting on a large memory system, the kernel spends
> considerable time in memmap_init_zone() setting up memory zones.
> Analysis shows significant time spent in __early_pfn_to_nid().
> 
> The routine memmap_init_zone() checks each PFN to verify the
> nid is valid.  __early_pfn_to_nid() sequentially scans the list of
> pfn ranges to find the right range and returns the nid.  This does
> not scale well.  On a 4 TB (single rack) system there are 308
> memory ranges to scan.  The higher the PFN the more time spent
> sequentially spinning through memory ranges.
> 
> Since memmap_init_zone() increments pfn, it will almost always be
> looking for the same range as the previous pfn, so check that
> range first.  If it is in the same range, return that nid.
> If not, scan the list as before.
> 
> A 4 TB (single rack) UV1 system takes 512 seconds to get through
> the zone code.  This performance optimization reduces the time
> by 189 seconds, a 36% improvement.
> 
> A 2 TB (single rack) UV2 system goes from 212.7 seconds to 99.8 seconds,
> a 112.9 second (53%) reduction.
> 
> ...
> 
> --- linux.orig/mm/page_alloc.c	2013-03-18 10:52:11.510988843 -0500
> +++ linux/mm/page_alloc.c	2013-03-18 10:52:14.214931348 -0500
> @@ -4161,10 +4161,19 @@ int __meminit __early_pfn_to_nid(unsigne
> {
> 	unsigned long start_pfn, end_pfn;
> 	int i, nid;
> +	static unsigned long last_start_pfn, last_end_pfn;
> +	static int last_nid;
> +
> +	if (last_start_pfn <= pfn && pfn < last_end_pfn)
> +		return last_nid;
> 
> 	for_each_mem_pfn_range(i, MAX_NUMNODES, &start_pfn, &end_pfn, &nid)
> -		if (start_pfn <= pfn && pfn < end_pfn)
> +		if (start_pfn <= pfn && pfn < end_pfn) {
> +			last_nid = nid;
> +			last_start_pfn = start_pfn;
> +			last_end_pfn = end_pfn;
> 			return nid;
> +		}
> 	/* This is a memory hole */
> 	return -1;

lol.  And yes, it seems pretty safe to assume that the kernel is running
single-threaded at this time.

arch/ia64/mm/numa.c's __early_pfn_to_nid might benefit from the same
treatment.

In fact if this had been implemented as a caching wrapper around an
unchanged __early_pfn_to_nid(), no ia64 edits would be needed?

^ permalink raw reply	[flat|nested] 41+ messages in thread
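Andrew's caching-wrapper idea can be sketched as follows. This is an illustrative user-space model, not kernel code: the toy `__early_pfn_to_nid()` stands in for the arch-specific linear scan, and one small signature change is assumed so the wrapper can learn the bounds of the matching range (without that, the wrapper would have nothing to cache against). Names and the range table are made up for the sketch.

```c
#include <stddef.h>

struct pfn_range { unsigned long start, end; int nid; };

/* Toy range table standing in for the arch-maintained pfn ranges. */
static const struct pfn_range table[] = {
	{ 0, 100, 0 }, { 100, 200, 1 }, { 300, 400, 2 },
};

/* Arch-specific scan, logically unchanged except that it reports the
 * matching range back to the caller (an assumed signature extension). */
static int __early_pfn_to_nid(unsigned long pfn,
			      unsigned long *start, unsigned long *end)
{
	size_t i;

	for (i = 0; i < sizeof(table) / sizeof(table[0]); i++)
		if (table[i].start <= pfn && pfn < table[i].end) {
			*start = table[i].start;
			*end = table[i].end;
			return table[i].nid;
		}
	return -1;	/* memory hole */
}

/* Common caching wrapper: the scan itself stays arch-specific, so an
 * ia64-style variant would pick up the speedup without being edited. */
static int early_pfn_to_nid(unsigned long pfn)
{
	static unsigned long last_start, last_end;
	static int last_nid;
	int nid;

	if (last_start <= pfn && pfn < last_end)
		return last_nid;

	nid = __early_pfn_to_nid(pfn, &last_start, &last_end);
	if (nid >= 0)
		last_nid = nid;
	else
		last_start = last_end = 0;	/* never cache a hole */
	return nid;
}
```

The cache lives in the wrapper only, which keeps each architecture's scan free of caching state, at the cost of widening the scan's interface.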
* Re: [patch] mm: speedup in __early_pfn_to_nid
  2013-03-18 15:56 ` Russ Anderson
@ 2013-03-21 10:55 ` Ingo Molnar
  -1 siblings, 0 replies; 41+ messages in thread
From: Ingo Molnar @ 2013-03-21 10:55 UTC (permalink / raw)
  To: Russ Anderson; +Cc: linux-mm, linux-kernel, tglx, mingo, hpa

* Russ Anderson <rja@sgi.com> wrote:

> When booting on a large memory system, the kernel spends
> considerable time in memmap_init_zone() setting up memory zones.
> Analysis shows significant time spent in __early_pfn_to_nid().
> 
> The routine memmap_init_zone() checks each PFN to verify the
> nid is valid.  __early_pfn_to_nid() sequentially scans the list of
> pfn ranges to find the right range and returns the nid.  This does
> not scale well.  On a 4 TB (single rack) system there are 308
> memory ranges to scan.  The higher the PFN the more time spent
> sequentially spinning through memory ranges.
> 
> Since memmap_init_zone() increments pfn, it will almost always be
> looking for the same range as the previous pfn, so check that
> range first.  If it is in the same range, return that nid.
> If not, scan the list as before.
> 
> A 4 TB (single rack) UV1 system takes 512 seconds to get through
> the zone code.  This performance optimization reduces the time
> by 189 seconds, a 36% improvement.
> 
> A 2 TB (single rack) UV2 system goes from 212.7 seconds to 99.8 seconds,
> a 112.9 second (53%) reduction.

Nice speedup!

A minor nit, in addition to Andrew's suggestion about wrapping
__early_pfn_to_nid():

> Index: linux/mm/page_alloc.c
> ===================================================================
> --- linux.orig/mm/page_alloc.c	2013-03-18 10:52:11.510988843 -0500
> +++ linux/mm/page_alloc.c	2013-03-18 10:52:14.214931348 -0500
> @@ -4161,10 +4161,19 @@ int __meminit __early_pfn_to_nid(unsigne
> {
> 	unsigned long start_pfn, end_pfn;
> 	int i, nid;
> +	static unsigned long last_start_pfn, last_end_pfn;
> +	static int last_nid;

Please move these globals out of function local scope, to make it more
apparent that they are not on-stack.  I only noticed it in the second pass.

Thanks,

	Ingo

^ permalink raw reply	[flat|nested] 41+ messages in thread
* Re: [patch] mm: speedup in __early_pfn_to_nid
  2013-03-21 10:55 ` Ingo Molnar
@ 2013-03-21 12:35 ` Michal Hocko
  -1 siblings, 0 replies; 41+ messages in thread
From: Michal Hocko @ 2013-03-21 12:35 UTC (permalink / raw)
  To: Ingo Molnar; +Cc: Russ Anderson, linux-mm, linux-kernel, tglx, mingo, hpa

On Thu 21-03-13 11:55:16, Ingo Molnar wrote:
> 
> * Russ Anderson <rja@sgi.com> wrote:
> 
> > When booting on a large memory system, the kernel spends
> > considerable time in memmap_init_zone() setting up memory zones.
> > Analysis shows significant time spent in __early_pfn_to_nid().
> > 
> > The routine memmap_init_zone() checks each PFN to verify the
> > nid is valid.  __early_pfn_to_nid() sequentially scans the list of
> > pfn ranges to find the right range and returns the nid.  This does
> > not scale well.  On a 4 TB (single rack) system there are 308
> > memory ranges to scan.  The higher the PFN the more time spent
> > sequentially spinning through memory ranges.
> > 
> > Since memmap_init_zone() increments pfn, it will almost always be
> > looking for the same range as the previous pfn, so check that
> > range first.  If it is in the same range, return that nid.
> > If not, scan the list as before.
> > 
> > A 4 TB (single rack) UV1 system takes 512 seconds to get through
> > the zone code.  This performance optimization reduces the time
> > by 189 seconds, a 36% improvement.
> > 
> > A 2 TB (single rack) UV2 system goes from 212.7 seconds to 99.8 seconds,
> > a 112.9 second (53%) reduction.
> 
> Nice speedup!
> 
> A minor nit, in addition to Andrew's suggestion about wrapping
> __early_pfn_to_nid():
> 
> > Index: linux/mm/page_alloc.c
> > ===================================================================
> > --- linux.orig/mm/page_alloc.c	2013-03-18 10:52:11.510988843 -0500
> > +++ linux/mm/page_alloc.c	2013-03-18 10:52:14.214931348 -0500
> > @@ -4161,10 +4161,19 @@ int __meminit __early_pfn_to_nid(unsigne
> > {
> > 	unsigned long start_pfn, end_pfn;
> > 	int i, nid;
> > +	static unsigned long last_start_pfn, last_end_pfn;
> > +	static int last_nid;
> 
> Please move these globals out of function local scope, to make it more
> apparent that they are not on-stack.  I only noticed it in the second pass.

Wouldn't this just add more confusion with other _pfn variables? (e.g.
{min,max}_low_pfn and others)

IMO the local scope is more obvious as this is and should only be used
for caching purposes.

-- 
Michal Hocko
SUSE Labs

^ permalink raw reply	[flat|nested] 41+ messages in thread
* Re: [patch] mm: speedup in __early_pfn_to_nid
  2013-03-21 12:35 ` Michal Hocko
@ 2013-03-21 18:03 ` Ingo Molnar
  -1 siblings, 0 replies; 41+ messages in thread
From: Ingo Molnar @ 2013-03-21 18:03 UTC (permalink / raw)
  To: Michal Hocko; +Cc: Russ Anderson, linux-mm, linux-kernel, tglx, mingo, hpa

* Michal Hocko <mhocko@suse.cz> wrote:

> On Thu 21-03-13 11:55:16, Ingo Molnar wrote:
> > 
> > * Russ Anderson <rja@sgi.com> wrote:
> > 
> > > When booting on a large memory system, the kernel spends
> > > considerable time in memmap_init_zone() setting up memory zones.
> > > Analysis shows significant time spent in __early_pfn_to_nid().
> > > 
> > > The routine memmap_init_zone() checks each PFN to verify the
> > > nid is valid.  __early_pfn_to_nid() sequentially scans the list of
> > > pfn ranges to find the right range and returns the nid.  This does
> > > not scale well.  On a 4 TB (single rack) system there are 308
> > > memory ranges to scan.  The higher the PFN the more time spent
> > > sequentially spinning through memory ranges.
> > > 
> > > Since memmap_init_zone() increments pfn, it will almost always be
> > > looking for the same range as the previous pfn, so check that
> > > range first.  If it is in the same range, return that nid.
> > > If not, scan the list as before.
> > > 
> > > A 4 TB (single rack) UV1 system takes 512 seconds to get through
> > > the zone code.  This performance optimization reduces the time
> > > by 189 seconds, a 36% improvement.
> > > 
> > > A 2 TB (single rack) UV2 system goes from 212.7 seconds to 99.8 seconds,
> > > a 112.9 second (53%) reduction.
> > 
> > Nice speedup!
> > 
> > A minor nit, in addition to Andrew's suggestion about wrapping
> > __early_pfn_to_nid():
> > 
> > > Index: linux/mm/page_alloc.c
> > > ===================================================================
> > > --- linux.orig/mm/page_alloc.c	2013-03-18 10:52:11.510988843 -0500
> > > +++ linux/mm/page_alloc.c	2013-03-18 10:52:14.214931348 -0500
> > > @@ -4161,10 +4161,19 @@ int __meminit __early_pfn_to_nid(unsigne
> > > {
> > > 	unsigned long start_pfn, end_pfn;
> > > 	int i, nid;
> > > +	static unsigned long last_start_pfn, last_end_pfn;
> > > +	static int last_nid;
> > 
> > Please move these globals out of function local scope, to make it more
> > apparent that they are not on-stack.  I only noticed it in the second pass.
> 
> Wouldn't this just add more confusion with other _pfn variables? (e.g.
> {min,max}_low_pfn and others)

I don't think so.

> IMO the local scope is more obvious as this is and should only be used
> for caching purposes.

It's a pattern we actively avoid in kernel code.

Thanks,

	Ingo

^ permalink raw reply	[flat|nested] 41+ messages in thread
* Re: [patch] mm: speedup in __early_pfn_to_nid
  2013-03-21 18:03 ` Ingo Molnar
@ 2013-03-25 21:26 ` Andrew Morton
  -1 siblings, 0 replies; 41+ messages in thread
From: Andrew Morton @ 2013-03-25 21:26 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Michal Hocko, Russ Anderson, linux-mm, linux-kernel, tglx, mingo, hpa

On Thu, 21 Mar 2013 19:03:21 +0100 Ingo Molnar <mingo@kernel.org> wrote:

> > IMO the local scope is more obvious as this is and should only be used
> > for caching purposes.
> 
> It's a pattern we actively avoid in kernel code.

On the contrary, I always encourage people to move the static
definitions into function scope if possible.  So the reader can see the
identifier's scope without having to search the whole file.

Unnecessarily giving the identifier file-scope seems weird.

^ permalink raw reply	[flat|nested] 41+ messages in thread
* Re: [patch] mm: speedup in __early_pfn_to_nid
  2013-03-25 21:26 ` Andrew Morton
@ 2013-03-26  8:05 ` Ingo Molnar
  0 siblings, 0 replies; 41+ messages in thread
From: Ingo Molnar @ 2013-03-26  8:05 UTC (permalink / raw)
To: Andrew Morton
Cc: Michal Hocko, Russ Anderson, linux-mm, linux-kernel, tglx, mingo, hpa

* Andrew Morton <akpm@linux-foundation.org> wrote:

> On Thu, 21 Mar 2013 19:03:21 +0100 Ingo Molnar <mingo@kernel.org> wrote:
> 
> > > IMO the local scope is more obvious as this is and should only be 
> > > used for caching purposes.
> > 
> > It's a pattern we actively avoid in kernel code.
> 
> On the contrary, I always encourage people to move the static 
> definitions into function scope if possible.  So the reader can see the 
> identifier's scope without having to search the whole file.  
> Unnecessarily giving the identifier file-scope seems weird.

A common solution I use is to move such variables right before the 
function itself.  That makes the "this function's scope only" aspect 
pretty apparent - without the risks of hiding globals amongst local 
variables.

The other approach is to comment the variables very clearly to say that 
they are really globals, as the 'static' keyword is easy to miss while 
reading email.

Both solutions are basically just as visible as the solution you prefer - 
but more robust.

Anyway, I guess we have to agree to disagree on that; we have probably 
already spent more energy on discussing this than any worst-case problem 
the placement of these variables could ever cause in the future ;-)

Thanks,

	Ingo

^ permalink raw reply	[flat|nested] 41+ messages in thread
* Re: [patch] mm: speedup in __early_pfn_to_nid
  2013-03-21 10:55 ` Ingo Molnar
@ 2013-03-21 18:40 ` David Rientjes
  0 siblings, 0 replies; 41+ messages in thread
From: David Rientjes @ 2013-03-21 18:40 UTC (permalink / raw)
To: Ingo Molnar; +Cc: Russ Anderson, linux-mm, linux-kernel, tglx, mingo, hpa

On Thu, 21 Mar 2013, Ingo Molnar wrote:

> > Index: linux/mm/page_alloc.c
> > ===================================================================
> > --- linux.orig/mm/page_alloc.c	2013-03-18 10:52:11.510988843 -0500
> > +++ linux/mm/page_alloc.c	2013-03-18 10:52:14.214931348 -0500
> > @@ -4161,10 +4161,19 @@ int __meminit __early_pfn_to_nid(unsigne
> >  {
> >  	unsigned long start_pfn, end_pfn;
> >  	int i, nid;
> > +	static unsigned long last_start_pfn, last_end_pfn;
> > +	static int last_nid;
> 
> Please move these globals out of function local scope, to make it more 
> apparent that they are not on-stack. I only noticed it in the second pass.
> 

The way they're currently defined places these in meminit.data as 
appropriate; if they are moved out, please make sure to annotate their 
definitions with __meminitdata.

^ permalink raw reply	[flat|nested] 41+ messages in thread
* Re: [patch] mm: speedup in __early_pfn_to_nid
  2013-03-21 18:40 ` David Rientjes
@ 2013-03-22  7:25 ` Ingo Molnar
  0 siblings, 0 replies; 41+ messages in thread
From: Ingo Molnar @ 2013-03-22  7:25 UTC (permalink / raw)
To: David Rientjes; +Cc: Russ Anderson, linux-mm, linux-kernel, tglx, mingo, hpa

* David Rientjes <rientjes@google.com> wrote:

> On Thu, 21 Mar 2013, Ingo Molnar wrote:
> 
> > > Index: linux/mm/page_alloc.c
> > > ===================================================================
> > > --- linux.orig/mm/page_alloc.c	2013-03-18 10:52:11.510988843 -0500
> > > +++ linux/mm/page_alloc.c	2013-03-18 10:52:14.214931348 -0500
> > > @@ -4161,10 +4161,19 @@ int __meminit __early_pfn_to_nid(unsigne
> > >  {
> > >  	unsigned long start_pfn, end_pfn;
> > >  	int i, nid;
> > > +	static unsigned long last_start_pfn, last_end_pfn;
> > > +	static int last_nid;
> > 
> > Please move these globals out of function local scope, to make it more 
> > apparent that they are not on-stack. I only noticed it in the second pass.
> 
> The way they're currently defined places these in meminit.data as 
> appropriate; if they are moved out, please make sure to annotate their 
> definitions with __meminitdata.

I'm fine with having them within the function as well in this special 
case, as long as a heavy /* NOTE: ... */ warning is put before them - 
which explains why these SMP-unsafe globals are safe.

( That warning will also act as a visual delimiter that breaks the 
  normally confusing and misleading 'globals mixed amongst stack 
  variables' pattern. )

Thanks,

	Ingo

^ permalink raw reply	[flat|nested] 41+ messages in thread
* Re: [patch] mm: speedup in __early_pfn_to_nid
  2013-03-22  7:25 ` Ingo Molnar
@ 2013-03-23 15:29 ` Russ Anderson
  0 siblings, 0 replies; 41+ messages in thread
From: Russ Anderson @ 2013-03-23 15:29 UTC (permalink / raw)
To: Ingo Molnar; +Cc: David Rientjes, linux-mm, linux-kernel, tglx, mingo, hpa

On Fri, Mar 22, 2013 at 08:25:32AM +0100, Ingo Molnar wrote:
> 
> * David Rientjes <rientjes@google.com> wrote:
> 
> > On Thu, 21 Mar 2013, Ingo Molnar wrote:
> > 
> > > > Index: linux/mm/page_alloc.c
> > > > ===================================================================
> > > > --- linux.orig/mm/page_alloc.c	2013-03-18 10:52:11.510988843 -0500
> > > > +++ linux/mm/page_alloc.c	2013-03-18 10:52:14.214931348 -0500
> > > > @@ -4161,10 +4161,19 @@ int __meminit __early_pfn_to_nid(unsigne
> > > >  {
> > > >  	unsigned long start_pfn, end_pfn;
> > > >  	int i, nid;
> > > > +	static unsigned long last_start_pfn, last_end_pfn;
> > > > +	static int last_nid;
> > > 
> > > Please move these globals out of function local scope, to make it more 
> > > apparent that they are not on-stack. I only noticed it in the second pass.
> > 
> > The way they're currently defined places these in meminit.data as 
> > appropriate; if they are moved out, please make sure to annotate their 
> > definitions with __meminitdata.
> 
> I'm fine with having them within the function as well in this special 
> case, as long as a heavy /* NOTE: ... */ warning is put before them - 
> which explains why these SMP-unsafe globals are safe.
> 
> ( That warning will also act as a visual delimiter that breaks the 
>   normally confusing and misleading 'globals mixed amongst stack 
>   variables' pattern. )

Thanks Ingo.  Here is an updated patch with the heavy warning added.

As for the wrapper function, I was unable to find an obvious way to add
a wrapper without significantly changing both versions of
__early_pfn_to_nid().  It seems cleaner to add the change in both
versions.  I'm sure someone will point out if this conclusion is
wrong.  :-)

------------------------------------------------------------
When booting on a large memory system, the kernel spends
considerable time in memmap_init_zone() setting up memory zones.
Analysis shows significant time spent in __early_pfn_to_nid().

The routine memmap_init_zone() checks each PFN to verify the
nid is valid.  __early_pfn_to_nid() sequentially scans the list of
pfn ranges to find the right range and returns the nid.  This does
not scale well.  On a 4 TB (single rack) system there are 308
memory ranges to scan.  The higher the PFN the more time spent
sequentially spinning through memory ranges.

Since memmap_init_zone() increments pfn, it will almost always be
looking for the same range as the previous pfn, so check that
range first.  If it is in the same range, return that nid.
If not, scan the list as before.

A 4 TB (single rack) UV1 system takes 512 seconds to get through
the zone code.  This performance optimization reduces the time
by 189 seconds, a 36% improvement.

A 2 TB (single rack) UV2 system goes from 212.7 seconds to 99.8 seconds,
a 112.9 second (53%) reduction.

Signed-off-by: Russ Anderson <rja@sgi.com>
---
 arch/ia64/mm/numa.c |   15 ++++++++++++++-
 mm/page_alloc.c     |   15 ++++++++++++++-
 2 files changed, 28 insertions(+), 2 deletions(-)

Index: linux/mm/page_alloc.c
===================================================================
--- linux.orig/mm/page_alloc.c	2013-03-19 16:09:03.736450861 -0500
+++ linux/mm/page_alloc.c	2013-03-22 17:07:43.895405617 -0500
@@ -4161,10 +4161,23 @@ int __meminit __early_pfn_to_nid(unsigne
 {
 	unsigned long start_pfn, end_pfn;
 	int i, nid;
+	/*
+	   NOTE: The following SMP-unsafe globals are only used early
+	   in boot when the kernel is running single-threaded.
+	 */
+	static unsigned long last_start_pfn, last_end_pfn;
+	static int last_nid;
+
+	if (last_start_pfn <= pfn && pfn < last_end_pfn)
+		return last_nid;
 
 	for_each_mem_pfn_range(i, MAX_NUMNODES, &start_pfn, &end_pfn, &nid)
-		if (start_pfn <= pfn && pfn < end_pfn)
+		if (start_pfn <= pfn && pfn < end_pfn) {
+			last_start_pfn = start_pfn;
+			last_end_pfn = end_pfn;
+			last_nid = nid;
 			return nid;
+		}
 	/* This is a memory hole */
 	return -1;
 }
Index: linux/arch/ia64/mm/numa.c
===================================================================
--- linux.orig/arch/ia64/mm/numa.c	2013-02-25 15:49:44.000000000 -0600
+++ linux/arch/ia64/mm/numa.c	2013-03-22 16:09:44.662268239 -0500
@@ -61,13 +61,26 @@ paddr_to_nid(unsigned long paddr)
 int __meminit __early_pfn_to_nid(unsigned long pfn)
 {
 	int i, section = pfn >> PFN_SECTION_SHIFT, ssec, esec;
+	/*
+	   NOTE: The following SMP-unsafe globals are only used early
+	   in boot when the kernel is running single-threaded.
+	 */
+	static unsigned long last_start_pfn, last_end_pfn;
+	static int last_nid;
+
+	if (section >= last_ssec && section < last_esec)
+		return last_nid;
 
 	for (i = 0; i < num_node_memblks; i++) {
 		ssec = node_memblk[i].start_paddr >> PA_SECTION_SHIFT;
 		esec = (node_memblk[i].start_paddr + node_memblk[i].size +
 			((1L << PA_SECTION_SHIFT) - 1)) >> PA_SECTION_SHIFT;
-		if (section >= ssec && section < esec)
+		if (section >= ssec && section < esec) {
+			last_ssec = ssec;
+			last_esec = esec;
+			last_nid = node_memblk[i].nid
 			return node_memblk[i].nid;
+		}
 	}
 
 	return -1;

-- 
Russ Anderson, OS RAS/Partitioning Project Lead  
SGI - Silicon Graphics Inc          rja@sgi.com

^ permalink raw reply	[flat|nested] 41+ messages in thread
* Re: [patch] mm: speedup in __early_pfn_to_nid
  2013-03-23 15:29 ` Russ Anderson
@ 2013-03-23 20:37 ` Yinghai Lu
  2013-03-25  2:11   ` Lin Feng
  0 siblings, 1 reply; 41+ messages in thread
From: Yinghai Lu @ 2013-03-23 20:37 UTC (permalink / raw)
To: Russ Anderson, Tejun Heo, Andrew Morton
Cc: Ingo Molnar, David Rientjes, linux-mm, linux-kernel, tglx, mingo, hpa

[-- Attachment #1: Type: text/plain, Size: 4440 bytes --]

On Sat, Mar 23, 2013 at 8:29 AM, Russ Anderson <rja@sgi.com> wrote:
> On Fri, Mar 22, 2013 at 08:25:32AM +0100, Ingo Molnar wrote:
> ------------------------------------------------------------
> When booting on a large memory system, the kernel spends
> considerable time in memmap_init_zone() setting up memory zones.
> Analysis shows significant time spent in __early_pfn_to_nid().
>
> The routine memmap_init_zone() checks each PFN to verify the
> nid is valid.  __early_pfn_to_nid() sequentially scans the list of
> pfn ranges to find the right range and returns the nid.  This does
> not scale well.  On a 4 TB (single rack) system there are 308
> memory ranges to scan.  The higher the PFN the more time spent
> sequentially spinning through memory ranges.
>
> Since memmap_init_zone() increments pfn, it will almost always be
> looking for the same range as the previous pfn, so check that
> range first.  If it is in the same range, return that nid.
> If not, scan the list as before.
>
> A 4 TB (single rack) UV1 system takes 512 seconds to get through
> the zone code.  This performance optimization reduces the time
> by 189 seconds, a 36% improvement.
>
> A 2 TB (single rack) UV2 system goes from 212.7 seconds to 99.8 seconds,
> a 112.9 second (53%) reduction.

Interesting, but there are only 308 entries in memblock...

Did you try to extend memblock_search() to search the nid back?
Something like the attached patch.  That should save more time.

> Signed-off-by: Russ Anderson <rja@sgi.com>
> ---
>  arch/ia64/mm/numa.c |   15 ++++++++++++++-
>  mm/page_alloc.c     |   15 ++++++++++++++-
>  2 files changed, 28 insertions(+), 2 deletions(-)
>
> Index: linux/mm/page_alloc.c
> ===================================================================
> --- linux.orig/mm/page_alloc.c	2013-03-19 16:09:03.736450861 -0500
> +++ linux/mm/page_alloc.c	2013-03-22 17:07:43.895405617 -0500
> @@ -4161,10 +4161,23 @@ int __meminit __early_pfn_to_nid(unsigne
>  {
>  	unsigned long start_pfn, end_pfn;
>  	int i, nid;
> +	/*
> +	   NOTE: The following SMP-unsafe globals are only used early
> +	   in boot when the kernel is running single-threaded.
> +	 */
> +	static unsigned long last_start_pfn, last_end_pfn;
> +	static int last_nid;
> +
> +	if (last_start_pfn <= pfn && pfn < last_end_pfn)
> +		return last_nid;
>
>  	for_each_mem_pfn_range(i, MAX_NUMNODES, &start_pfn, &end_pfn, &nid)
> -		if (start_pfn <= pfn && pfn < end_pfn)
> +		if (start_pfn <= pfn && pfn < end_pfn) {
> +			last_start_pfn = start_pfn;
> +			last_end_pfn = end_pfn;
> +			last_nid = nid;
>  			return nid;
> +		}
>  	/* This is a memory hole */
>  	return -1;
>  }
> Index: linux/arch/ia64/mm/numa.c
> ===================================================================
> --- linux.orig/arch/ia64/mm/numa.c	2013-02-25 15:49:44.000000000 -0600
> +++ linux/arch/ia64/mm/numa.c	2013-03-22 16:09:44.662268239 -0500
> @@ -61,13 +61,26 @@ paddr_to_nid(unsigned long paddr)
>  int __meminit __early_pfn_to_nid(unsigned long pfn)
>  {
>  	int i, section = pfn >> PFN_SECTION_SHIFT, ssec, esec;
> +	/*
> +	   NOTE: The following SMP-unsafe globals are only used early
> +	   in boot when the kernel is running single-threaded.
> +	 */
> +	static unsigned long last_start_pfn, last_end_pfn;

last_ssec, last_esec?

> +	static int last_nid;
> +
> +	if (section >= last_ssec && section < last_esec)
> +		return last_nid;
>
>  	for (i = 0; i < num_node_memblks; i++) {
>  		ssec = node_memblk[i].start_paddr >> PA_SECTION_SHIFT;
>  		esec = (node_memblk[i].start_paddr + node_memblk[i].size +
>  			((1L << PA_SECTION_SHIFT) - 1)) >> PA_SECTION_SHIFT;
> -		if (section >= ssec && section < esec)
> +		if (section >= ssec && section < esec) {
> +			last_ssec = ssec;
> +			last_esec = esec;
> +			last_nid = node_memblk[i].nid
>  			return node_memblk[i].nid;
> +		}
>  	}
>
>  	return -1;
>

Also it looks like you forgot to put the ia64 maintainers in the To
list.  Maybe just put the ia64 part in a separate patch?

Thanks

Yinghai

[-- Attachment #2: memblock_search_pfn_nid.patch --]
[-- Type: application/octet-stream, Size: 2370 bytes --]

---
 include/linux/memblock.h |    2 ++
 mm/memblock.c            |   18 ++++++++++++++++++
 mm/page_alloc.c          |   14 ++++++++------
 3 files changed, 28 insertions(+), 6 deletions(-)

Index: linux-2.6/include/linux/memblock.h
===================================================================
--- linux-2.6.orig/include/linux/memblock.h
+++ linux-2.6/include/linux/memblock.h
@@ -60,6 +60,8 @@ int memblock_reserve(phys_addr_t base, p
 void memblock_trim_memory(phys_addr_t align);
 
 #ifdef CONFIG_HAVE_MEMBLOCK_NODE_MAP
+int memblock_search_pfn_nid(unsigned long pfn, unsigned long *start_pfn,
+			    unsigned long *end_pfn);
 void __next_mem_pfn_range(int *idx, int nid, unsigned long *out_start_pfn,
 			  unsigned long *out_end_pfn, int *out_nid);
 
Index: linux-2.6/mm/memblock.c
===================================================================
--- linux-2.6.orig/mm/memblock.c
+++ linux-2.6/mm/memblock.c
@@ -910,6 +910,24 @@ int __init_memblock memblock_is_memory(p
 	return memblock_search(&memblock.memory, addr) != -1;
 }
 
+#ifdef CONFIG_HAVE_MEMBLOCK_NODE_MAP
+int __init_memblock memblock_search_pfn_nid(unsigned long pfn,
+			unsigned long *start_pfn, unsigned long *end_pfn)
+{
+	struct memblock_type *type = &memblock.memory;
+	int mid = memblock_search(type, (phys_addr_t)pfn << PAGE_SHIFT);
+
+	if (mid == -1)
+		return -1;
+
+	*start_pfn = type->regions[mid].base >> PAGE_SHIFT;
+	*end_pfn = (type->regions[mid].base + type->regions[mid].size)
+			>> PAGE_SHIFT;
+
+	return type->regions[mid].nid;
+}
+#endif
+
 /**
  * memblock_is_region_memory - check if a region is a subset of memory
  * @base: base of region to check
Index: linux-2.6/mm/page_alloc.c
===================================================================
--- linux-2.6.orig/mm/page_alloc.c
+++ linux-2.6/mm/page_alloc.c
@@ -4160,13 +4160,15 @@ int __meminit init_currently_empty_zone(
 int __meminit __early_pfn_to_nid(unsigned long pfn)
 {
 	unsigned long start_pfn, end_pfn;
-	int i, nid;
+	int nid;
 
-	for_each_mem_pfn_range(i, MAX_NUMNODES, &start_pfn, &end_pfn, &nid)
-		if (start_pfn <= pfn && pfn < end_pfn)
-			return nid;
-	/* This is a memory hole */
-	return -1;
+	nid = memblock_search_pfn_nid(pfn, &start_pfn, &end_pfn);
+
+	if (nid != -1) {
+		/* save start_pfn, and end_pfn ?*/
+	}
+
+	return nid;
 }
 
 #endif /* CONFIG_HAVE_ARCH_EARLY_PFN_TO_NID */

^ permalink raw reply	[flat|nested] 41+ messages in thread
* Re: [patch] mm: speedup in __early_pfn_to_nid
  2013-03-23 20:37 ` Yinghai Lu
@ 2013-03-25  2:11 ` Lin Feng
  0 siblings, 0 replies; 41+ messages in thread
From: Lin Feng @ 2013-03-25  2:11 UTC (permalink / raw)
To: Yinghai Lu, Russ Anderson
Cc: Tejun Heo, Andrew Morton, Ingo Molnar, David Rientjes, linux-mm,
    linux-kernel, tglx, mingo, hpa

On 03/24/2013 04:37 AM, Yinghai Lu wrote:
> +#ifdef CONFIG_HAVE_MEMBLOCK_NODE_MAP
> +int __init_memblock memblock_search_pfn_nid(unsigned long pfn,
> +			unsigned long *start_pfn, unsigned long *end_pfn)
> +{
> +	struct memblock_type *type = &memblock.memory;
> +	int mid = memblock_search(type, (phys_addr_t)pfn << PAGE_SHIFT);

I'm really eager to see how much time we can save using binary search
compared to linear search in this case :)

(quote)
> A 4 TB (single rack) UV1 system takes 512 seconds to get through
> the zone code.  This performance optimization reduces the time
> by 189 seconds, a 36% improvement.
> 
> A 2 TB (single rack) UV2 system goes from 212.7 seconds to 99.8 seconds,
> a 112.9 second (53%) reduction.
(quote)

thanks,
linfeng

^ permalink raw reply	[flat|nested] 41+ messages in thread
* Re: [patch] mm: speedup in __early_pfn_to_nid
  2013-03-25  2:11 ` Lin Feng
@ 2013-03-25 21:56 ` Russ Anderson
  0 siblings, 0 replies; 41+ messages in thread
From: Russ Anderson @ 2013-03-25 21:56 UTC (permalink / raw)
To: Lin Feng
Cc: Yinghai Lu, Tejun Heo, Andrew Morton, Ingo Molnar, David Rientjes,
    linux-mm, linux-kernel, tglx, mingo, hpa

On Mon, Mar 25, 2013 at 10:11:27AM +0800, Lin Feng wrote:
> On 03/24/2013 04:37 AM, Yinghai Lu wrote:
> > +#ifdef CONFIG_HAVE_MEMBLOCK_NODE_MAP
> > +int __init_memblock memblock_search_pfn_nid(unsigned long pfn,
> > +			unsigned long *start_pfn, unsigned long *end_pfn)
> > +{
> > +	struct memblock_type *type = &memblock.memory;
> > +	int mid = memblock_search(type, (phys_addr_t)pfn << PAGE_SHIFT);
> 
> I'm really eager to see how much time we can save using binary search
> compared to linear search in this case :)

I have machine time tonight to measure the difference.

Based on earlier testing, a system with 9 TB of memory calls
__early_pfn_to_nid() 2,377,198,300 times while booting, but only 6,815
times does it fail to find that the memory range is the same as the
previous one and have to search the table.  Caching the previous range
avoids searching the table 2,377,191,485 times, saving a significant
amount of time.

Of the remaining 6,815 times when it searches the table, a binary
search may help, but with relatively few calls it may not make much of
an overall difference.  Testing will show how much.

> (quote)
> > A 4 TB (single rack) UV1 system takes 512 seconds to get through
> > the zone code.  This performance optimization reduces the time
> > by 189 seconds, a 36% improvement.
> > 
> > A 2 TB (single rack) UV2 system goes from 212.7 seconds to 99.8 seconds,
> > a 112.9 second (53%) reduction.
> (quote)

-- 
Russ Anderson, OS RAS/Partitioning Project Lead  
SGI - Silicon Graphics Inc          rja@sgi.com

^ permalink raw reply	[flat|nested] 41+ messages in thread
* Re: [patch] mm: speedup in __early_pfn_to_nid
From: Yinghai Lu @ 2013-03-25 22:17 UTC
To: Russ Anderson
Cc: Lin Feng, Tejun Heo, Andrew Morton, Ingo Molnar, David Rientjes, linux-mm, linux-kernel, tglx, mingo, hpa

On Mon, Mar 25, 2013 at 2:56 PM, Russ Anderson <rja@sgi.com> wrote:
> On Mon, Mar 25, 2013 at 10:11:27AM +0800, Lin Feng wrote:
> > On 03/24/2013 04:37 AM, Yinghai Lu wrote:
> > > +#ifdef CONFIG_HAVE_MEMBLOCK_NODE_MAP
> > > +int __init_memblock memblock_search_pfn_nid(unsigned long pfn,
> > > +			unsigned long *start_pfn, unsigned long *end_pfn)
> > > +{
> > > +	struct memblock_type *type = &memblock.memory;
> > > +	int mid = memblock_search(type, (phys_addr_t)pfn << PAGE_SHIFT);
> >
> > I'm really eager to see how much time we can save using binary search
> > compared to linear search in this case :)
>
> I have machine time tonight to measure the difference.
>
> Based on earlier testing, a system with 9TB memory calls
> __early_pfn_to_nid() 2,377,198,300 times while booting, but
> only 6815 times does it not find that the memory range is
> the same as previous and search the table. Caching the
> previous range avoids searching the table 2,377,191,485 times,
> saving a significant amount of time.
>
> Of the remaining 6815 times when it searches the table, a binary
> search may help, but with relatively few calls it may not
> make much of an overall difference. Testing will show how much.

Please check the attached patch, which can be applied on top of your patch in -mm.

Thanks
Yinghai

[-- Attachment #2: memblock_search_pfn_nid.patch --]

 include/linux/memblock.h |  2 ++
 mm/memblock.c            | 18 ++++++++++++++++++
 mm/page_alloc.c          | 19 +++++++++----------
 3 files changed, 29 insertions(+), 10 deletions(-)

Index: linux-2.6/include/linux/memblock.h
===================================================================
--- linux-2.6.orig/include/linux/memblock.h
+++ linux-2.6/include/linux/memblock.h
@@ -63,6 +63,8 @@ int __memblock_reserve(phys_addr_t base,
 void memblock_trim_memory(phys_addr_t align);
 
 #ifdef CONFIG_HAVE_MEMBLOCK_NODE_MAP
+int memblock_search_pfn_nid(unsigned long pfn, unsigned long *start_pfn,
+			    unsigned long *end_pfn);
 void __next_mem_pfn_range(int *idx, int nid, unsigned long *out_start_pfn,
 			  unsigned long *out_end_pfn, int *out_nid);
Index: linux-2.6/mm/memblock.c
===================================================================
--- linux-2.6.orig/mm/memblock.c
+++ linux-2.6/mm/memblock.c
@@ -954,6 +954,24 @@ int __init_memblock memblock_is_memory(p
 	return memblock_search(&memblock.memory, addr) != -1;
 }
 
+#ifdef CONFIG_HAVE_MEMBLOCK_NODE_MAP
+int __init_memblock memblock_search_pfn_nid(unsigned long pfn,
+			 unsigned long *start_pfn, unsigned long *end_pfn)
+{
+	struct memblock_type *type = &memblock.memory;
+	int mid = memblock_search(type, (phys_addr_t)pfn << PAGE_SHIFT);
+
+	if (mid == -1)
+		return -1;
+
+	*start_pfn = type->regions[mid].base >> PAGE_SHIFT;
+	*end_pfn = (type->regions[mid].base + type->regions[mid].size)
+			>> PAGE_SHIFT;
+
+	return type->regions[mid].nid;
+}
+#endif
+
 /**
  * memblock_is_region_memory - check if a region is a subset of memory
  * @base: base of region to check
Index: linux-2.6/mm/page_alloc.c
===================================================================
--- linux-2.6.orig/mm/page_alloc.c
+++ linux-2.6/mm/page_alloc.c
@@ -4166,7 +4166,7 @@ int __meminit init_currently_empty_zone(
 int __meminit __early_pfn_to_nid(unsigned long pfn)
 {
 	unsigned long start_pfn, end_pfn;
-	int i, nid;
+	int nid;
 	/*
 	 * NOTE: The following SMP-unsafe globals are only used early
 	 * in boot when the kernel is running single-threaded.
@@ -4177,15 +4177,14 @@ int __meminit __early_pfn_to_nid(unsigne
 	if (last_start_pfn <= pfn && pfn < last_end_pfn)
 		return last_nid;
 
-	for_each_mem_pfn_range(i, MAX_NUMNODES, &start_pfn, &end_pfn, &nid)
-		if (start_pfn <= pfn && pfn < end_pfn) {
-			last_start_pfn = start_pfn;
-			last_end_pfn = end_pfn;
-			last_nid = nid;
-			return nid;
-		}
-	/* This is a memory hole */
-	return -1;
+	nid = memblock_search_pfn_nid(pfn, &start_pfn, &end_pfn);
+	if (nid != -1) {
+		last_start_pfn = start_pfn;
+		last_end_pfn = end_pfn;
+		last_nid = nid;
+	}
+
+	return nid;
 }
 #endif /* CONFIG_HAVE_ARCH_EARLY_PFN_TO_NID */
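The memblock_search() call in the attached patch is a binary search over the sorted region array. A self-contained sketch of the same idea follows; the table contents and the name `search_pfn_nid` are made up for illustration, and the real kernel code searches memblock.memory by physical address rather than a pfn table:

```c
#include <assert.h>

/* Illustrative analogue of memblock_search(): binary search over a
 * sorted array of non-overlapping [start_pfn, end_pfn) ranges. */
struct range { unsigned long start_pfn, end_pfn; int nid; };

static const struct range table[] = {
	{    0, 1000, 0 },
	{ 1000, 2000, 1 },
	{ 4000, 6000, 2 },
};

int search_pfn_nid(unsigned long pfn, unsigned long *start_pfn,
		   unsigned long *end_pfn)
{
	int lo = 0, hi = (int)(sizeof(table) / sizeof(table[0])) - 1;

	while (lo <= hi) {
		int mid = lo + (hi - lo) / 2;

		if (pfn < table[mid].start_pfn)
			hi = mid - 1;		/* pfn is below this range */
		else if (pfn >= table[mid].end_pfn)
			lo = mid + 1;		/* pfn is above this range */
		else {
			*start_pfn = table[mid].start_pfn;
			*end_pfn = table[mid].end_pfn;
			return table[mid].nid;	/* O(log n) per lookup */
		}
	}
	return -1;	/* memory hole */
}
```

Each miss of the last-range cache then costs O(log n) instead of the O(n) walk of for_each_mem_pfn_range(), which matters mostly on machines with many memory ranges, although, as Russ notes above, the cache already absorbs nearly all calls.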
* Re: [patch] mm: speedup in __early_pfn_to_nid
From: KOSAKI Motohiro @ 2013-03-23 22:24 UTC
To: Russ Anderson
Cc: Ingo Molnar, David Rientjes, linux-mm, LKML, Thomas Gleixner, Ingo Molnar, H. Peter Anvin

> --- linux.orig/mm/page_alloc.c	2013-03-19 16:09:03.736450861 -0500
> +++ linux/mm/page_alloc.c	2013-03-22 17:07:43.895405617 -0500
> @@ -4161,10 +4161,23 @@ int __meminit __early_pfn_to_nid(unsigne
>  {
>  	unsigned long start_pfn, end_pfn;
>  	int i, nid;
> +	/*
> +	NOTE: The following SMP-unsafe globals are only used early
> +	in boot when the kernel is running single-threaded.
> +	*/
> +	static unsigned long last_start_pfn, last_end_pfn;
> +	static int last_nid;

Why don't you mark them __meminitdata? They seem freeable.
* Re: [patch] mm: speedup in __early_pfn_to_nid
From: David Rientjes @ 2013-03-25  0:28 UTC
To: KOSAKI Motohiro
Cc: Russ Anderson, Ingo Molnar, linux-mm, LKML, Thomas Gleixner, Ingo Molnar, H. Peter Anvin

On Sat, 23 Mar 2013, KOSAKI Motohiro wrote:

> > +	/*
> > +	NOTE: The following SMP-unsafe globals are only used early
> > +	in boot when the kernel is running single-threaded.
> > +	*/
> > +	static unsigned long last_start_pfn, last_end_pfn;
> > +	static int last_nid;
>
> Why don't you mark them __meminitdata? They seem freeable.

Um, defining them in a __meminit function places them in .meminit.data already.
* Re: [patch] mm: speedup in __early_pfn_to_nid
From: Andrew Morton @ 2013-03-25 21:34 UTC
To: David Rientjes
Cc: KOSAKI Motohiro, Russ Anderson, Ingo Molnar, linux-mm, LKML, Thomas Gleixner, Ingo Molnar, H. Peter Anvin

On Sun, 24 Mar 2013 17:28:12 -0700 (PDT) David Rientjes <rientjes@google.com> wrote:

> On Sat, 23 Mar 2013, KOSAKI Motohiro wrote:
>
> > Why don't you mark them __meminitdata? They seem freeable.
>
> Um, defining them in a __meminit function places them in .meminit.data
> already.

I wish it did, but it doesn't.
* Re: [patch] mm: speedup in __early_pfn_to_nid
From: David Rientjes @ 2013-03-25 22:36 UTC
To: Andrew Morton
Cc: KOSAKI Motohiro, Russ Anderson, Ingo Molnar, linux-mm, LKML, Thomas Gleixner, Ingo Molnar, H. Peter Anvin

On Mon, 25 Mar 2013, Andrew Morton wrote:

> > Um, defining them in a __meminit function places them in .meminit.data
> > already.
>
> I wish it did, but it doesn't.

$ objdump -t mm/page_alloc.o | grep last_start_pfn
0000000000000240 l     O .meminit.data	0000000000000008 last_start_pfn.34345

What version of gcc are you using?
* Re: [patch] mm: speedup in __early_pfn_to_nid
From: Andrew Morton @ 2013-03-25 22:42 UTC
To: David Rientjes
Cc: KOSAKI Motohiro, Russ Anderson, Ingo Molnar, linux-mm, LKML, Thomas Gleixner, Ingo Molnar, H. Peter Anvin

On Mon, 25 Mar 2013 15:36:54 -0700 (PDT) David Rientjes <rientjes@google.com> wrote:

> $ objdump -t mm/page_alloc.o | grep last_start_pfn
> 0000000000000240 l     O .meminit.data	0000000000000008 last_start_pfn.34345
>
> What version of gcc are you using?

4.4.4
* Re: [patch] mm: speedup in __early_pfn_to_nid
From: Ingo Molnar @ 2013-03-24  7:43 UTC
To: Russ Anderson
Cc: David Rientjes, linux-mm, linux-kernel, tglx, mingo, hpa

* Russ Anderson <rja@sgi.com> wrote:

> +	/*
> +	NOTE: The following SMP-unsafe globals are only used early
> +	in boot when the kernel is running single-threaded.
> +	*/
> +	static unsigned long last_start_pfn, last_end_pfn;
> +	static int last_nid;

I guess I'm the nitpicker of the week: please use the customary (multi-line) comment style:

  /*
   * Comment .....
   * ...... goes here.
   */

specified in Documentation/CodingStyle.

Thanks,

	Ingo