* [PATCH 1/3] percpu: make pcpu_build_alloc_info() clear static buffers
@ 2009-09-24 12:55 Tejun Heo
  2009-09-24 12:56 ` [PATCH 2/3] sparc64: implement page mapping percpu first chunk allocator Tejun Heo
  0 siblings, 1 reply; 7+ messages in thread
From: Tejun Heo @ 2009-09-24 12:55 UTC (permalink / raw)
  To: Linux Kernel, David Miller, Rusty Russell, Christoph Lameter,
	Ingo Molnar, H. Peter Anvin

pcpu_build_alloc_info() may be called multiple times when percpu
falls back to a different first chunk allocator.  Make it clear its
static buffers so that they don't contain values from previous runs.

Signed-off-by: Tejun Heo <tj@kernel.org>
---
These three patches are scheduled for mainline and aim to work around
the cases where the distance between the two farthest units is too
large compared to the vmalloc area size.  This happens on sparc64
because the vmalloc area is relatively small there and nodes can
easily be placed such that they are too far apart.

This patchset implements the page mapping first chunk allocator for
sparc64 and makes the embedding allocator fall back to it when the
vmalloc area doesn't seem large enough.  This should make the percpu
allocator more robust on other archs which implement the page mapping
allocator (only x86 currently) and make diagnosing problems easier on
other archs.

Thanks.

 mm/percpu.c |    4 ++++
 1 file changed, 4 insertions(+)

Index: work/mm/percpu.c
===================================================================
--- work.orig/mm/percpu.c
+++ work/mm/percpu.c
@@ -1347,6 +1347,10 @@ struct pcpu_alloc_info * __init pcpu_bui
 	struct pcpu_alloc_info *ai;
 	unsigned int *cpu_map;

+	/* this function may be called multiple times */
+	memset(group_map, 0, sizeof(group_map));
+	memset(group_cnt, 0, sizeof(group_cnt));
+
 	/*
 	 * Determine min_unit_size, alloc_size and max_upa such that
 	 * alloc_size is multiple of atom_size and is the smallest
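
Editorial sketch, not part of the patch: the user-space C below
illustrates why static scratch buffers need clearing when a function
runs more than once.  count_groups(), NR_UNITS and the "clear" flag are
hypothetical stand-ins; only the names group_map and group_cnt come
from the hunk above.  Each buffer is cleared with its own sizeof.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define NR_UNITS 4	/* hypothetical stand-in for the number of CPUs */

/*
 * Hypothetical stand-in for pcpu_build_alloc_info(): records the group
 * of each unit and counts units per group in static buffers.  "clear"
 * selects whether the buffers are reset before use.
 */
static const int *count_groups(const int *unit_to_group, size_t n, int clear)
{
	static int group_map[NR_UNITS];
	static int group_cnt[NR_UNITS];
	size_t i;

	assert(n <= NR_UNITS);

	if (clear) {
		/* this function may be called multiple times */
		memset(group_map, 0, sizeof(group_map));
		memset(group_cnt, 0, sizeof(group_cnt));
	}

	for (i = 0; i < n; i++) {
		assert(unit_to_group[i] >= 0 && unit_to_group[i] < NR_UNITS);
		group_map[i] = unit_to_group[i];
		group_cnt[unit_to_group[i]]++;
	}
	return group_cnt;
}
```

Calling count_groups() a second time with clear == 0 adds the new
counts on top of the ones left by the first call, which is the kind of
stale state the patch is meant to rule out.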

* [PATCH 2/3] sparc64: implement page mapping percpu first chunk allocator
  2009-09-24 12:55 [PATCH 1/3] percpu: make pcpu_build_alloc_info() clear static buffers Tejun Heo
@ 2009-09-24 12:56 ` Tejun Heo
  2009-09-24 12:57   ` [PATCH 3/3] percpu: make embedding first chunk allocator check vmalloc space size Tejun Heo
                     ` (2 more replies)
  0 siblings, 3 replies; 7+ messages in thread
From: Tejun Heo @ 2009-09-24 12:56 UTC (permalink / raw)
  To: Linux Kernel, David Miller, Rusty Russell, Christoph Lameter,
	Ingo Molnar, H. Peter Anvin

Implement the page mapping percpu first chunk allocator as a fallback
to the embedding allocator.  The next patch will make the embedding
allocator check distances between units to determine whether it fits
within the vmalloc area so that this fallback can be used in such
cases.

sparc64 currently has a relatively small vmalloc area, which makes it
impossible to create any dynamic chunks on certain configurations,
leading to percpu allocation failures.  This and the next patch should
allow those configurations to keep working until a proper solution is
found.

Signed-off-by: Tejun Heo <tj@kernel.org>
---
David, can you please ack this after reviewing?

Thanks.

 arch/sparc/Kconfig         |    3 ++
 arch/sparc/kernel/smp_64.c |   51 +++++++++++++++++++++++++++++++++++++--------
 2 files changed, 46 insertions(+), 8 deletions(-)

Index: work/arch/sparc/Kconfig
===================================================================
--- work.orig/arch/sparc/Kconfig
+++ work/arch/sparc/Kconfig
@@ -102,6 +102,9 @@ config HAVE_SETUP_PER_CPU_AREA
 config NEED_PER_CPU_EMBED_FIRST_CHUNK
 	def_bool y if SPARC64

+config NEED_PER_CPU_PAGE_FIRST_CHUNK
+	def_bool y if SPARC64
+
 config GENERIC_HARDIRQS_NO__DO_IRQ
 	bool
 	def_bool y if SPARC64
Index: work/arch/sparc/kernel/smp_64.c
===================================================================
--- work.orig/arch/sparc/kernel/smp_64.c
+++ work/arch/sparc/kernel/smp_64.c
@@ -1420,7 +1420,7 @@ static void __init pcpu_free_bootmem(voi
 	free_bootmem(__pa(ptr), size);
 }

-static int pcpu_cpu_distance(unsigned int from, unsigned int to)
+static int __init pcpu_cpu_distance(unsigned int from, unsigned int to)
 {
 	if (cpu_to_node(from) == cpu_to_node(to))
 		return LOCAL_DISTANCE;
@@ -1428,18 +1428,53 @@ static int pcpu_cpu_distance(unsigned in
 		return REMOTE_DISTANCE;
 }

+static void __init pcpu_populate_pte(unsigned long addr)
+{
+	pgd_t *pgd = pgd_offset_k(addr);
+	pud_t *pud;
+	pmd_t *pmd;
+
+	pud = pud_offset(pgd, addr);
+	if (pud_none(*pud)) {
+		pmd_t *new;
+
+		new = __alloc_bootmem(PAGE_SIZE, PAGE_SIZE, PAGE_SIZE);
+		pud_populate(&init_mm, pud, new);
+	}
+
+	pmd = pmd_offset(pud, addr);
+	if (!pmd_present(*pmd)) {
+		pte_t *new;
+
+		new = __alloc_bootmem(PAGE_SIZE, PAGE_SIZE, PAGE_SIZE);
+		pmd_populate_kernel(&init_mm, pmd, new);
+	}
+}
+
 void __init setup_per_cpu_areas(void)
 {
 	unsigned long delta;
 	unsigned int cpu;
-	int rc;
+	int rc = -EINVAL;

-	rc = pcpu_embed_first_chunk(PERCPU_MODULE_RESERVE,
-				    PERCPU_DYNAMIC_RESERVE, 4 << 20,
-				    pcpu_cpu_distance, pcpu_alloc_bootmem,
-				    pcpu_free_bootmem);
-	if (rc)
-		panic("failed to initialize first chunk (%d)", rc);
+	if (pcpu_chosen_fc != PCPU_FC_PAGE) {
+		rc = pcpu_embed_first_chunk(PERCPU_MODULE_RESERVE,
+					    PERCPU_DYNAMIC_RESERVE, 4 << 20,
+					    pcpu_cpu_distance,
+					    pcpu_alloc_bootmem,
+					    pcpu_free_bootmem);
+		if (rc)
+			pr_warning("PERCPU: %s allocator failed (%d), "
+				   "falling back to page size\n",
+				   pcpu_fc_names[pcpu_chosen_fc], rc);
+	}
+	if (rc < 0)
+		rc = pcpu_page_first_chunk(PERCPU_MODULE_RESERVE,
+					   pcpu_alloc_bootmem,
+					   pcpu_free_bootmem,
+					   pcpu_populate_pte);
+	if (rc < 0)
+		panic("cannot initialize percpu area (err=%d)", rc);

 	delta = (unsigned long)pcpu_base_addr - (unsigned long)__per_cpu_start;
 	for_each_possible_cpu(cpu)
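
Editorial sketch, not part of the patch: the control flow that the new
setup_per_cpu_areas() follows can be modelled in user space as below.
The enum, setup_first_chunk() and the two stub allocators are
hypothetical; they stand in for pcpu_chosen_fc, pcpu_embed_first_chunk()
and pcpu_page_first_chunk() and only mirror the order of the attempts.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical stand-ins for the first chunk allocator choices. */
enum fc_choice { FC_AUTO, FC_EMBED, FC_PAGE };

typedef int (*first_chunk_fn)(void);

/*
 * Try the embedding allocator unless the page allocator was requested,
 * then fall back to the page allocator on failure.  Returns 0 and sets
 * *used on success, or a negative errno when both attempts fail (the
 * kernel code panics at that point).
 */
static int setup_first_chunk(enum fc_choice chosen, first_chunk_fn embed,
			     first_chunk_fn page, enum fc_choice *used)
{
	int rc = -EINVAL;

	assert(embed && page && used);

	if (chosen != FC_PAGE) {
		rc = embed();
		if (rc == 0)
			*used = FC_EMBED;
	}
	if (rc < 0) {
		rc = page();
		if (rc == 0)
			*used = FC_PAGE;
	}
	return rc;
}

/* Stub allocators used to exercise the flow. */
static int always_ok(void)   { return 0; }
static int always_fail(void) { return -EINVAL; }
```

Initialising rc to -EINVAL is what lets a single "rc < 0" test cover
both the skipped and the failed embedding attempt.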

* [PATCH 3/3] percpu: make embedding first chunk allocator check vmalloc space size
  2009-09-24 12:56 ` [PATCH 2/3] sparc64: implement page mapping percpu first chunk allocator Tejun Heo
@ 2009-09-24 12:57   ` Tejun Heo
  2009-09-24 22:51   ` [PATCH 2/3] sparc64: implement page mapping percpu first chunk allocator David Miller
  2009-09-28 21:36   ` David Miller
  2 siblings, 0 replies; 7+ messages in thread
From: Tejun Heo @ 2009-09-24 12:57 UTC (permalink / raw)
  To: Linux Kernel, David Miller, Rusty Russell, Christoph Lameter,
	Ingo Molnar, H. Peter Anvin

The embedding first chunk allocator maintains the distances between
units in the vmalloc area and thus needs the vmalloc space to be larger
than the maximum distance between units; otherwise, it wouldn't be able
to create any dynamic chunks.  This patch makes the embedding first
chunk allocator check the vmalloc space size and, if the maximum
distance between units is larger than 75% of it, print a warning and,
if the page mapping allocator is available, fail initialization so that
the system falls back onto it.

This should work around percpu allocation failure problems on certain
sparc64 configurations where distances between NUMA nodes are larger
than the vmalloc area, and make the percpu allocator more robust for
future configurations.

Signed-off-by: Tejun Heo <tj@kernel.org>
---
 mm/percpu.c |   20 ++++++++++++++++++--
 1 file changed, 18 insertions(+), 2 deletions(-)

Index: work/mm/percpu.c
===================================================================
--- work.orig/mm/percpu.c
+++ work/mm/percpu.c
@@ -1786,7 +1786,7 @@ int __init pcpu_embed_first_chunk(size_t
 	void *base = (void *)ULONG_MAX;
 	void **areas = NULL;
 	struct pcpu_alloc_info *ai;
-	size_t size_sum, areas_size;
+	size_t size_sum, areas_size, max_distance;
 	int group, i, rc;

 	ai = pcpu_build_alloc_info(reserved_size, dyn_size, atom_size,
@@ -1836,8 +1836,24 @@ int __init pcpu_embed_first_chunk(size_t
 	}

 	/* base address is now known, determine group base offsets */
-	for (group = 0; group < ai->nr_groups; group++)
+	max_distance = 0;
+	for (group = 0; group < ai->nr_groups; group++) {
 		ai->groups[group].base_offset = areas[group] - base;
+		max_distance = max(max_distance, ai->groups[group].base_offset);
+	}
+	max_distance += ai->unit_size;
+
+	/* warn if maximum distance is further than 75% of vmalloc space */
+	if (max_distance > (VMALLOC_END - VMALLOC_START) * 3 / 4) {
+		pr_warning("PERCPU: max_distance=0x%lx too large for vmalloc "
+			   "space 0x%lx\n",
+			   max_distance, VMALLOC_END - VMALLOC_START);
+#ifdef CONFIG_NEED_PER_CPU_PAGE_FIRST_CHUNK
+		/* and fail if we have fallback */
+		rc = -EINVAL;
+		goto out_free;
+#endif
+	}

 	pr_info("PERCPU: Embedded %zu pages/cpu @%p s%zu r%zu d%zu u%zu\n",
 		PFN_DOWN(size_sum), base, ai->static_size, ai->reserved_size,
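
Editorial sketch, not part of the patch: the 75% check can be tried
out in user space as below.  max_unit_distance() and fits_vmalloc() are
hypothetical helpers; the VMALLOC_START and VMALLOC_END values are the
sparc64 ones quoted later in this thread, and the 4 MiB unit size used
in the usage note is an assumption.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* sparc64 values as quoted in this thread (before the proposed change) */
#define VMALLOC_START	UINT64_C(0x0000000100000000)
#define VMALLOC_END	UINT64_C(0x0000000200000000)

/* Largest group base offset plus one unit, as the patch computes it. */
static uint64_t max_unit_distance(const uint64_t *base_offset,
				  size_t nr_groups, uint64_t unit_size)
{
	uint64_t max_distance = 0;
	size_t group;

	for (group = 0; group < nr_groups; group++)
		if (base_offset[group] > max_distance)
			max_distance = base_offset[group];
	return max_distance + unit_size;
}

/* Non-zero when max_distance is within 75% of the vmalloc space. */
static int fits_vmalloc(uint64_t max_distance, uint64_t vmalloc_start,
			uint64_t vmalloc_end)
{
	assert(vmalloc_end > vmalloc_start);
	return max_distance <= (vmalloc_end - vmalloc_start) * 3 / 4;
}
```

With a 4 GiB vmalloc space the threshold is 3 GiB (0xc0000000), so two
groups whose bases lie 3 GiB apart already fail the check once a 4 MiB
unit is added.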

* Re: [PATCH 2/3] sparc64: implement page mapping percpu first chunk allocator
  2009-09-24 12:56 ` [PATCH 2/3] sparc64: implement page mapping percpu first chunk allocator Tejun Heo
  2009-09-24 12:57   ` [PATCH 3/3] percpu: make embedding first chunk allocator check vmalloc space size Tejun Heo
@ 2009-09-24 22:51   ` David Miller
  2009-09-28 21:36   ` David Miller
  2 siblings, 0 replies; 7+ messages in thread
From: David Miller @ 2009-09-24 22:51 UTC (permalink / raw)
  To: tj; +Cc: linux-kernel, rusty, cl, mingo, hpa

From: Tejun Heo <tj@kernel.org>
Date: Thu, 24 Sep 2009 21:56:32 +0900

> Implement the page mapping percpu first chunk allocator as a fallback
> to the embedding allocator.  The next patch will make the embedding
> allocator check distances between units to determine whether it fits
> within the vmalloc area so that this fallback can be used in such
> cases.
> 
> sparc64 currently has a relatively small vmalloc area, which makes it
> impossible to create any dynamic chunks on certain configurations,
> leading to percpu allocation failures.  This and the next patch should
> allow those configurations to keep working until a proper solution is
> found.
> 
> Signed-off-by: Tejun Heo <tj@kernel.org>
> ---
> David, can you please ack this after reviewing?

This looks fine to me:

Acked-by: David S. Miller <davem@davemloft.net>

* Re: [PATCH 2/3] sparc64: implement page mapping percpu first chunk allocator
  2009-09-24 12:56 ` [PATCH 2/3] sparc64: implement page mapping percpu first chunk allocator Tejun Heo
  2009-09-24 12:57   ` [PATCH 3/3] percpu: make embedding first chunk allocator check vmalloc space size Tejun Heo
  2009-09-24 22:51   ` [PATCH 2/3] sparc64: implement page mapping percpu first chunk allocator David Miller
@ 2009-09-28 21:36   ` David Miller
  2009-09-29  0:14     ` Tejun Heo
  2 siblings, 1 reply; 7+ messages in thread
From: David Miller @ 2009-09-28 21:36 UTC (permalink / raw)
  To: tj; +Cc: linux-kernel, rusty, cl, mingo, hpa

From: Tejun Heo <tj@kernel.org>
Date: Thu, 24 Sep 2009 21:56:32 +0900

> Implement the page mapping percpu first chunk allocator as a fallback
> to the embedding allocator.  The next patch will make the embedding
> allocator check distances between units to determine whether it fits
> within the vmalloc area so that this fallback can be used in such
> cases.
> 
> sparc64 currently has a relatively small vmalloc area, which makes it
> impossible to create any dynamic chunks on certain configurations,
> leading to percpu allocation failures.  This and the next patch should
> allow those configurations to keep working until a proper solution is
> found.
> 
> Signed-off-by: Tejun Heo <tj@kernel.org>
> ---
> David, can you please ack this after reviewing?

Tejun, I am testing out the following patch, which will make these
patches of yours basically unnecessary:

diff --git a/arch/sparc/include/asm/pgtable_64.h b/arch/sparc/include/asm/pgtable_64.h
index 0ff92fa..f3cb790 100644
--- a/arch/sparc/include/asm/pgtable_64.h
+++ b/arch/sparc/include/asm/pgtable_64.h
@@ -41,8 +41,8 @@
 #define LOW_OBP_ADDRESS		_AC(0x00000000f0000000,UL)
 #define HI_OBP_ADDRESS		_AC(0x0000000100000000,UL)
 #define VMALLOC_START		_AC(0x0000000100000000,UL)
-#define VMALLOC_END		_AC(0x0000000200000000,UL)
-#define VMEMMAP_BASE		_AC(0x0000000200000000,UL)
+#define VMALLOC_END		_AC(0x0000010000000000,UL)
+#define VMEMMAP_BASE		_AC(0x0000010000000000,UL)
 
 #define vmemmap			((struct page *)VMEMMAP_BASE)
 
diff --git a/arch/sparc/kernel/ktlb.S b/arch/sparc/kernel/ktlb.S
index 3ea6e8c..1d36147 100644
--- a/arch/sparc/kernel/ktlb.S
+++ b/arch/sparc/kernel/ktlb.S
@@ -280,8 +280,8 @@ kvmap_dtlb_nonlinear:
 
 #ifdef CONFIG_SPARSEMEM_VMEMMAP
 	/* Do not use the TSB for vmemmap.  */
-	mov		(VMEMMAP_BASE >> 24), %g5
-	sllx		%g5, 24, %g5
+	mov		(VMEMMAP_BASE >> 40), %g5
+	sllx		%g5, 40, %g5
 	cmp		%g4,%g5
 	bgeu,pn		%xcc, kvmap_vmemmap
 	 nop
@@ -293,8 +293,8 @@ kvmap_dtlb_tsbmiss:
 	sethi		%hi(MODULES_VADDR), %g5
 	cmp		%g4, %g5
 	blu,pn		%xcc, kvmap_dtlb_longpath
-	 mov		(VMALLOC_END >> 24), %g5
-	sllx		%g5, 24, %g5
+	 mov		(VMALLOC_END >> 40), %g5
+	sllx		%g5, 40, %g5
 	cmp		%g4, %g5
 	bgeu,pn		%xcc, kvmap_dtlb_longpath
 	 nop
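
Editorial sketch, not part of the patch: the arithmetic behind the two
changes can be checked in user space as below.  The constants are taken
from the diff above; fits_simm13(), shift_roundtrips() and
vmalloc_size() are hypothetical helpers.  The reading that the shift
moves from 24 to 40 so that the shifted constant still fits the 13-bit
signed immediate of the SPARC "mov" instruction is an assumption about
the intent, not something stated in the message.

```c
#include <assert.h>
#include <stdint.h>

/* Constants from the diff above. */
#define VMALLOC_START	UINT64_C(0x0000000100000000)
#define OLD_VMALLOC_END	UINT64_C(0x0000000200000000)
#define NEW_VMALLOC_END	UINT64_C(0x0000010000000000)

/* Non-negative values that fit a 13-bit signed immediate: 0..4095. */
static int fits_simm13(uint64_t value)
{
	return value <= 4095;
}

/* Non-zero when shifting right then left by "shift" loses no bits. */
static int shift_roundtrips(uint64_t addr, unsigned int shift)
{
	assert(shift < 64);
	return ((addr >> shift) << shift) == addr;
}

/* Size of the vmalloc area for a given end address. */
static uint64_t vmalloc_size(uint64_t vmalloc_end)
{
	assert(vmalloc_end > VMALLOC_START);
	return vmalloc_end - VMALLOC_START;
}
```

Under these constants the vmalloc area grows from 0x100000000 bytes
(4 GiB) to 0xff00000000 bytes (1020 GiB).  The old end shifted right by
24 is 0x200, which fits the immediate; the new end shifted by 24 is
0x10000, which does not, while shifted by 40 it is 1.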

* Re: [PATCH 2/3] sparc64: implement page mapping percpu first chunk allocator
  2009-09-28 21:36   ` David Miller
@ 2009-09-29  0:14     ` Tejun Heo
  2009-09-29  0:16       ` David Miller
  0 siblings, 1 reply; 7+ messages in thread
From: Tejun Heo @ 2009-09-29  0:14 UTC (permalink / raw)
  To: David Miller; +Cc: linux-kernel, rusty, cl, mingo, hpa

Hello, David.

David Miller wrote:
> From: Tejun Heo <tj@kernel.org>
> Date: Thu, 24 Sep 2009 21:56:32 +0900
> 
>> Implement the page mapping percpu first chunk allocator as a fallback
>> to the embedding allocator.  The next patch will make the embedding
>> allocator check distances between units to determine whether it fits
>> within the vmalloc area so that this fallback can be used in such
>> cases.
>>
>> sparc64 currently has a relatively small vmalloc area, which makes it
>> impossible to create any dynamic chunks on certain configurations,
>> leading to percpu allocation failures.  This and the next patch should
>> allow those configurations to keep working until a proper solution is
>> found.
>>
>> Signed-off-by: Tejun Heo <tj@kernel.org>
>> ---
>> David, can you please ack this after reviewing?
> 
> Tejun, I am testing out the following patch, which will make these
> patches of yours basically unnecessary:

Ah... great but unless you object, I think it would be better to push
it out anyway just to make things a bit more robust and ease tracking
and debugging when something goes wrong.  The added code is small and
ditched once boot is complete.  What do you think?

Thanks.

-- 
tejun

* Re: [PATCH 2/3] sparc64: implement page mapping percpu first chunk allocator
  2009-09-29  0:14     ` Tejun Heo
@ 2009-09-29  0:16       ` David Miller
  0 siblings, 0 replies; 7+ messages in thread
From: David Miller @ 2009-09-29  0:16 UTC (permalink / raw)
  To: tj; +Cc: linux-kernel, rusty, cl, mingo, hpa

From: Tejun Heo <tj@kernel.org>
Date: Tue, 29 Sep 2009 09:14:58 +0900

> Ah... great but unless you object, I think it would be better to push
> it out anyway just to make things a bit more robust and ease tracking
> and debugging when something goes wrong.  The added code is small and
> ditched once boot is complete.  What do you think?

That's fine with me:

Acked-by: David S. Miller <davem@davemloft.net>
