linux-mm.kvack.org archive mirror
 help / color / mirror / Atom feed
* [PATCH] mm/percpu.c: use smarter memory allocation for struct pcpu_alloc_info
@ 2017-10-03 20:57 Nicolas Pitre
  2017-10-03 21:05 ` Tejun Heo
  0 siblings, 1 reply; 29+ messages in thread
From: Nicolas Pitre @ 2017-10-03 20:57 UTC (permalink / raw)
  To: Tejun Heo, Christoph Lameter; +Cc: linux-mm, linux-kernel

This can be much smaller than a page on very small memory systems. 
Always rounding up the size to a page is wasteful in that case, and 
required alignment is smaller than the memblock default. Let's round 
things up to a page size only when the actual size is >= page size, and 
then it makes sense to page-align for a nicer allocation pattern.

Signed-off-by: Nicolas Pitre <nico@linaro.org>

diff --git a/mm/percpu.c b/mm/percpu.c
index 434844415d..fe37f85cc2 100644
--- a/mm/percpu.c
+++ b/mm/percpu.c
@@ -1410,13 +1410,17 @@ struct pcpu_alloc_info * __init pcpu_alloc_alloc_info(int nr_groups,
 	struct pcpu_alloc_info *ai;
 	size_t base_size, ai_size;
 	void *ptr;
-	int unit;
+	int unit, align;
 
-	base_size = ALIGN(sizeof(*ai) + nr_groups * sizeof(ai->groups[0]),
-			  __alignof__(ai->groups[0].cpu_map[0]));
+	align = __alignof__(ai->groups[0].cpu_map[0]);
+	base_size = ALIGN(sizeof(*ai) + nr_groups * sizeof(ai->groups[0]), align);
 	ai_size = base_size + nr_units * sizeof(ai->groups[0].cpu_map[0]);
+	if (ai_size >= PAGE_SIZE) {
+		ai_size = PFN_ALIGN(ai_size);
+		align = PAGE_SIZE;
+	}
 
-	ptr = memblock_virt_alloc_nopanic(PFN_ALIGN(ai_size), 0);
+	ptr = memblock_virt_alloc_nopanic(ai_size, align);
 	if (!ptr)
 		return NULL;
 	ai = ptr;
@@ -1428,7 +1432,7 @@ struct pcpu_alloc_info * __init pcpu_alloc_alloc_info(int nr_groups,
 		ai->groups[0].cpu_map[unit] = NR_CPUS;
 
 	ai->nr_groups = nr_groups;
-	ai->__ai_size = PFN_ALIGN(ai_size);
+	ai->__ai_size = ai_size;
 
 	return ai;
 }

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply related	[flat|nested] 29+ messages in thread

* Re: [PATCH] mm/percpu.c: use smarter memory allocation for struct pcpu_alloc_info
  2017-10-03 20:57 [PATCH] mm/percpu.c: use smarter memory allocation for struct pcpu_alloc_info Nicolas Pitre
@ 2017-10-03 21:05 ` Tejun Heo
  2017-10-03 22:29   ` Nicolas Pitre
  0 siblings, 1 reply; 29+ messages in thread
From: Tejun Heo @ 2017-10-03 21:05 UTC (permalink / raw)
  To: Nicolas Pitre; +Cc: Christoph Lameter, linux-mm, linux-kernel

On Tue, Oct 03, 2017 at 04:57:44PM -0400, Nicolas Pitre wrote:
> This can be much smaller than a page on very small memory systems. 
> Always rounding up the size to a page is wasteful in that case, and 
> required alignment is smaller than the memblock default. Let's round 
> things up to a page size only when the actual size is >= page size, and 
> then it makes sense to page-align for a nicer allocation pattern.

Isn't that a temporary area which gets freed later during boot?

Thanks.

-- 
tejun

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH] mm/percpu.c: use smarter memory allocation for struct pcpu_alloc_info
  2017-10-03 21:05 ` Tejun Heo
@ 2017-10-03 22:29   ` Nicolas Pitre
  2017-10-03 22:36     ` Tejun Heo
                       ` (2 more replies)
  0 siblings, 3 replies; 29+ messages in thread
From: Nicolas Pitre @ 2017-10-03 22:29 UTC (permalink / raw)
  To: Tejun Heo; +Cc: Christoph Lameter, linux-mm, linux-kernel

On Tue, 3 Oct 2017, Tejun Heo wrote:

> On Tue, Oct 03, 2017 at 04:57:44PM -0400, Nicolas Pitre wrote:
> > This can be much smaller than a page on very small memory systems. 
> > Always rounding up the size to a page is wasteful in that case, and 
> > required alignment is smaller than the memblock default. Let's round 
> > things up to a page size only when the actual size is >= page size, and 
> > then it makes sense to page-align for a nicer allocation pattern.
> 
> Isn't that a temporary area which gets freed later during boot?

Hmmm...

It may get freed through 3 different paths where 2 of them are error 
paths. What looks like a non-error path is in pcpu_embed_first_chunk() 
called from setup_per_cpu_areas(). But there are two versions of 
setup_per_cpu_areas(): one for SMP and one for !SMP. And the !SMP case 
never calls pcpu_free_alloc_info() currently.

I'm not sure i understand that code fully, but maybe the following patch 
could be a better fit:

----- >8
Subject: [PATCH] percpu: don't forget to free the temporary struct pcpu_alloc_info

Unlike the SMP case, the !SMP case does not free the memory for struct 
pcpu_alloc_info allocated in setup_per_cpu_areas(). And to give it a 
chance of being reused by the page allocator later, align it to a page 
boundary just like its size.

Signed-off-by: Nicolas Pitre <nico@linaro.org>

diff --git a/mm/percpu.c b/mm/percpu.c
index 434844415d..caab63375b 100644
--- a/mm/percpu.c
+++ b/mm/percpu.c
@@ -1416,7 +1416,7 @@ struct pcpu_alloc_info * __init pcpu_alloc_alloc_info(int nr_groups,
 			  __alignof__(ai->groups[0].cpu_map[0]));
 	ai_size = base_size + nr_units * sizeof(ai->groups[0].cpu_map[0]);
 
-	ptr = memblock_virt_alloc_nopanic(PFN_ALIGN(ai_size), 0);
+	ptr = memblock_virt_alloc_nopanic(PFN_ALIGN(ai_size), PAGE_SIZE);
 	if (!ptr)
 		return NULL;
 	ai = ptr;
@@ -2295,6 +2295,7 @@ void __init setup_per_cpu_areas(void)
 
 	if (pcpu_setup_first_chunk(ai, fc) < 0)
 		panic("Failed to initialize percpu areas.");
+	pcpu_free_alloc_info(ai);
 }
 
 #endif	/* CONFIG_SMP */

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply related	[flat|nested] 29+ messages in thread

* Re: [PATCH] mm/percpu.c: use smarter memory allocation for struct pcpu_alloc_info
  2017-10-03 22:29   ` Nicolas Pitre
@ 2017-10-03 22:36     ` Tejun Heo
  2017-10-03 23:48       ` Dennis Zhou
  2017-10-04 14:15     ` Tejun Heo
  2017-11-18 18:25     ` mm/percpu.c: use smarter memory allocation for struct pcpu_alloc_info (crisv32 hang) Guenter Roeck
  2 siblings, 1 reply; 29+ messages in thread
From: Tejun Heo @ 2017-10-03 22:36 UTC (permalink / raw)
  To: Nicolas Pitre; +Cc: Christoph Lameter, linux-mm, linux-kernel

Hello,

On Tue, Oct 03, 2017 at 06:29:49PM -0400, Nicolas Pitre wrote:
> I'm not sure i understand that code fully, but maybe the following patch 
> could be a better fit:
> 
> ----- >8
> Subject: [PATCH] percpu: don't forget to free the temporary struct pcpu_alloc_info

So, IIRC, the error path is either boot fail or some serious bug in
arch code.  It really doesn't matter whether we free a page or not.

Thanks.

-- 
tejun

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH] mm/percpu.c: use smarter memory allocation for struct pcpu_alloc_info
  2017-10-03 22:36     ` Tejun Heo
@ 2017-10-03 23:48       ` Dennis Zhou
  2017-10-04  0:13         ` Nicolas Pitre
  0 siblings, 1 reply; 29+ messages in thread
From: Dennis Zhou @ 2017-10-03 23:48 UTC (permalink / raw)
  To: Tejun Heo; +Cc: Nicolas Pitre, Christoph Lameter, linux-mm, linux-kernel

Hi Tejun,

On Tue, Oct 03, 2017 at 03:36:42PM -0700, Tejun Heo wrote:
> > Subject: [PATCH] percpu: don't forget to free the temporary struct pcpu_alloc_info
> 
> So, IIRC, the error path is either boot fail or some serious bug in
> arch code.  It really doesn't matter whether we free a page or not.
>

In setup_per_cpu_area, a call to either pcpu_embed_first_chunk,
pcpu_page_first_chunk, or pcpu_setup_first_chunk is made. The first two
eventually call pcpu_setup_first_chunk with a pairing call to
pcpu_free_alloc_info right after. This occurs in all implementations. It
happens we don't have a pairing call to pcpu_free_alloc_info in the UP
setup_per_cpu_area.

Thanks,
Dennis

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH] mm/percpu.c: use smarter memory allocation for struct pcpu_alloc_info
  2017-10-03 23:48       ` Dennis Zhou
@ 2017-10-04  0:13         ` Nicolas Pitre
  0 siblings, 0 replies; 29+ messages in thread
From: Nicolas Pitre @ 2017-10-04  0:13 UTC (permalink / raw)
  To: Dennis Zhou; +Cc: Tejun Heo, Christoph Lameter, linux-mm, linux-kernel

On Tue, 3 Oct 2017, Dennis Zhou wrote:

> Hi Tejun,
> 
> On Tue, Oct 03, 2017 at 03:36:42PM -0700, Tejun Heo wrote:
> > > Subject: [PATCH] percpu: don't forget to free the temporary struct pcpu_alloc_info
> > 
> > So, IIRC, the error path is either boot fail or some serious bug in
> > arch code.  It really doesn't matter whether we free a page or not.
> >
> 
> In setup_per_cpu_area, a call to either pcpu_embed_first_chunk,
> pcpu_page_first_chunk, or pcpu_setup_first_chunk is made. The first two
> eventually call pcpu_setup_first_chunk with a pairing call to
> pcpu_free_alloc_info right after. This occurs in all implementations. It
> happens we don't have a pairing call to pcpu_free_alloc_info in the UP
> setup_per_cpu_area.

That was my conclusion too (albeit not stated as clearly) and what my 
second patch fixed.


Nicolas

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH] mm/percpu.c: use smarter memory allocation for struct pcpu_alloc_info
  2017-10-03 22:29   ` Nicolas Pitre
  2017-10-03 22:36     ` Tejun Heo
@ 2017-10-04 14:15     ` Tejun Heo
  2017-11-18 18:25     ` mm/percpu.c: use smarter memory allocation for struct pcpu_alloc_info (crisv32 hang) Guenter Roeck
  2 siblings, 0 replies; 29+ messages in thread
From: Tejun Heo @ 2017-10-04 14:15 UTC (permalink / raw)
  To: Nicolas Pitre; +Cc: Christoph Lameter, linux-mm, linux-kernel

Hello,

On Tue, Oct 03, 2017 at 06:29:49PM -0400, Nicolas Pitre wrote:
> Subject: [PATCH] percpu: don't forget to free the temporary struct pcpu_alloc_info
> 
> Unlike the SMP case, the !SMP case does not free the memory for struct 
> pcpu_alloc_info allocated in setup_per_cpu_areas(). And to give it a 
> chance of being reused by the page allocator later, align it to a page 
> boundary just like its size.
> 
> Signed-off-by: Nicolas Pitre <nico@linaro.org>

Applied to percpu/for-4.15 w/ Dennis's ack.

Thanks.

-- 
tejun

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: mm/percpu.c: use smarter memory allocation for struct pcpu_alloc_info (crisv32 hang)
  2017-10-03 22:29   ` Nicolas Pitre
  2017-10-03 22:36     ` Tejun Heo
  2017-10-04 14:15     ` Tejun Heo
@ 2017-11-18 18:25     ` Guenter Roeck
  2017-11-19 20:36       ` Nicolas Pitre
  2017-11-27 19:41       ` Tejun Heo
  2 siblings, 2 replies; 29+ messages in thread
From: Guenter Roeck @ 2017-11-18 18:25 UTC (permalink / raw)
  To: Nicolas Pitre
  Cc: Tejun Heo, Christoph Lameter, linux-mm, linux-kernel,
	Mikael Starvik, Jesper Nilsson, linux-cris-kernel

Hi,

On Tue, Oct 03, 2017 at 06:29:49PM -0400, Nicolas Pitre wrote:
> On Tue, 3 Oct 2017, Tejun Heo wrote:
> 
> > On Tue, Oct 03, 2017 at 04:57:44PM -0400, Nicolas Pitre wrote:
> > > This can be much smaller than a page on very small memory systems. 
> > > Always rounding up the size to a page is wasteful in that case, and 
> > > required alignment is smaller than the memblock default. Let's round 
> > > things up to a page size only when the actual size is >= page size, and 
> > > then it makes sense to page-align for a nicer allocation pattern.
> > 
> > Isn't that a temporary area which gets freed later during boot?
> 
> Hmmm...
> 
> It may get freed through 3 different paths where 2 of them are error 
> paths. What looks like a non-error path is in pcpu_embed_first_chunk() 
> called from setup_per_cpu_areas(). But there are two versions of 
> setup_per_cpu_areas(): one for SMP and one for !SMP. And the !SMP case 
> never calls pcpu_free_alloc_info() currently.
> 
> I'm not sure i understand that code fully, but maybe the following patch 
> could be a better fit:
> 
> ----- >8
> Subject: [PATCH] percpu: don't forget to free the temporary struct pcpu_alloc_info
> 
> Unlike the SMP case, the !SMP case does not free the memory for struct 
> pcpu_alloc_info allocated in setup_per_cpu_areas(). And to give it a 
> chance of being reused by the page allocator later, align it to a page 
> boundary just like its size.
> 
> Signed-off-by: Nicolas Pitre <nico@linaro.org>

This patch causes my crisv32 qemu emulation to hang with no console output.

> 
> diff --git a/mm/percpu.c b/mm/percpu.c
> index 434844415d..caab63375b 100644
> --- a/mm/percpu.c
> +++ b/mm/percpu.c
> @@ -1416,7 +1416,7 @@ struct pcpu_alloc_info * __init pcpu_alloc_alloc_info(int nr_groups,
>  			  __alignof__(ai->groups[0].cpu_map[0]));
>  	ai_size = base_size + nr_units * sizeof(ai->groups[0].cpu_map[0]);
>  
> -	ptr = memblock_virt_alloc_nopanic(PFN_ALIGN(ai_size), 0);
> +	ptr = memblock_virt_alloc_nopanic(PFN_ALIGN(ai_size), PAGE_SIZE);
>  	if (!ptr)
>  		return NULL;
>  	ai = ptr;
> @@ -2295,6 +2295,7 @@ void __init setup_per_cpu_areas(void)
>  
>  	if (pcpu_setup_first_chunk(ai, fc) < 0)
>  		panic("Failed to initialize percpu areas.");
> +	pcpu_free_alloc_info(ai);

This is the culprit. Everything works fine if I remove this line.

No idea if the problem is here or in the cris core.
Copying cris maintainers for input.

Guenter

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: mm/percpu.c: use smarter memory allocation for struct pcpu_alloc_info (crisv32 hang)
  2017-11-18 18:25     ` mm/percpu.c: use smarter memory allocation for struct pcpu_alloc_info (crisv32 hang) Guenter Roeck
@ 2017-11-19 20:36       ` Nicolas Pitre
  2017-11-20  2:03         ` Guenter Roeck
  2017-11-27 19:41       ` Tejun Heo
  1 sibling, 1 reply; 29+ messages in thread
From: Nicolas Pitre @ 2017-11-19 20:36 UTC (permalink / raw)
  To: Guenter Roeck
  Cc: Tejun Heo, Christoph Lameter, linux-mm, linux-kernel,
	Mikael Starvik, Jesper Nilsson, linux-cris-kernel

On Sat, 18 Nov 2017, Guenter Roeck wrote:

> Hi,
> 
> On Tue, Oct 03, 2017 at 06:29:49PM -0400, Nicolas Pitre wrote:
> > On Tue, 3 Oct 2017, Tejun Heo wrote:
> > 
> > > On Tue, Oct 03, 2017 at 04:57:44PM -0400, Nicolas Pitre wrote:
> > > > This can be much smaller than a page on very small memory systems. 
> > > > Always rounding up the size to a page is wasteful in that case, and 
> > > > required alignment is smaller than the memblock default. Let's round 
> > > > things up to a page size only when the actual size is >= page size, and 
> > > > then it makes sense to page-align for a nicer allocation pattern.
> > > 
> > > Isn't that a temporary area which gets freed later during boot?
> > 
> > Hmmm...
> > 
> > It may get freed through 3 different paths where 2 of them are error 
> > paths. What looks like a non-error path is in pcpu_embed_first_chunk() 
> > called from setup_per_cpu_areas(). But there are two versions of 
> > setup_per_cpu_areas(): one for SMP and one for !SMP. And the !SMP case 
> > never calls pcpu_free_alloc_info() currently.
> > 
> > I'm not sure i understand that code fully, but maybe the following patch 
> > could be a better fit:
> > 
> > ----- >8
> > Subject: [PATCH] percpu: don't forget to free the temporary struct pcpu_alloc_info
> > 
> > Unlike the SMP case, the !SMP case does not free the memory for struct 
> > pcpu_alloc_info allocated in setup_per_cpu_areas(). And to give it a 
> > chance of being reused by the page allocator later, align it to a page 
> > boundary just like its size.
> > 
> > Signed-off-by: Nicolas Pitre <nico@linaro.org>
> 
> This patch causes my crisv32 qemu emulation to hang with no console output.
> 
> > 
> > diff --git a/mm/percpu.c b/mm/percpu.c
> > index 434844415d..caab63375b 100644
> > --- a/mm/percpu.c
> > +++ b/mm/percpu.c
> > @@ -1416,7 +1416,7 @@ struct pcpu_alloc_info * __init pcpu_alloc_alloc_info(int nr_groups,
> >  			  __alignof__(ai->groups[0].cpu_map[0]));
> >  	ai_size = base_size + nr_units * sizeof(ai->groups[0].cpu_map[0]);
> >  
> > -	ptr = memblock_virt_alloc_nopanic(PFN_ALIGN(ai_size), 0);
> > +	ptr = memblock_virt_alloc_nopanic(PFN_ALIGN(ai_size), PAGE_SIZE);
> >  	if (!ptr)
> >  		return NULL;
> >  	ai = ptr;
> > @@ -2295,6 +2295,7 @@ void __init setup_per_cpu_areas(void)
> >  
> >  	if (pcpu_setup_first_chunk(ai, fc) < 0)
> >  		panic("Failed to initialize percpu areas.");
> > +	pcpu_free_alloc_info(ai);
> 
> This is the culprit. Everything works fine if I remove this line.

Without this line, the memory at the ai pointer is leaked. Maybe this is 
modifying the memory allocation pattern and that triggers a bug later on 
in your case.

At that point the console driver is not yet initialized and any error 
message won't be printed. You should enable the early console mechanism 
in your kernel (see arch/cris/arch-v32/kernel/debugport.c) and see what 
that might tell you.


Nicolas

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: mm/percpu.c: use smarter memory allocation for struct pcpu_alloc_info (crisv32 hang)
  2017-11-19 20:36       ` Nicolas Pitre
@ 2017-11-20  2:03         ` Guenter Roeck
  2017-11-20  4:08           ` Nicolas Pitre
  0 siblings, 1 reply; 29+ messages in thread
From: Guenter Roeck @ 2017-11-20  2:03 UTC (permalink / raw)
  To: Nicolas Pitre
  Cc: Tejun Heo, Christoph Lameter, linux-mm, linux-kernel,
	Mikael Starvik, Jesper Nilsson, linux-cris-kernel

On 11/19/2017 12:36 PM, Nicolas Pitre wrote:
> On Sat, 18 Nov 2017, Guenter Roeck wrote:
> 
>> Hi,
>>
>> On Tue, Oct 03, 2017 at 06:29:49PM -0400, Nicolas Pitre wrote:
>>> On Tue, 3 Oct 2017, Tejun Heo wrote:
>>>
>>>> On Tue, Oct 03, 2017 at 04:57:44PM -0400, Nicolas Pitre wrote:
>>>>> This can be much smaller than a page on very small memory systems.
>>>>> Always rounding up the size to a page is wasteful in that case, and
>>>>> required alignment is smaller than the memblock default. Let's round
>>>>> things up to a page size only when the actual size is >= page size, and
>>>>> then it makes sense to page-align for a nicer allocation pattern.
>>>>
>>>> Isn't that a temporary area which gets freed later during boot?
>>>
>>> Hmmm...
>>>
>>> It may get freed through 3 different paths where 2 of them are error
>>> paths. What looks like a non-error path is in pcpu_embed_first_chunk()
>>> called from setup_per_cpu_areas(). But there are two versions of
>>> setup_per_cpu_areas(): one for SMP and one for !SMP. And the !SMP case
>>> never calls pcpu_free_alloc_info() currently.
>>>
>>> I'm not sure i understand that code fully, but maybe the following patch
>>> could be a better fit:
>>>
>>> ----- >8
>>> Subject: [PATCH] percpu: don't forget to free the temporary struct pcpu_alloc_info
>>>
>>> Unlike the SMP case, the !SMP case does not free the memory for struct
>>> pcpu_alloc_info allocated in setup_per_cpu_areas(). And to give it a
>>> chance of being reused by the page allocator later, align it to a page
>>> boundary just like its size.
>>>
>>> Signed-off-by: Nicolas Pitre <nico@linaro.org>
>>
>> This patch causes my crisv32 qemu emulation to hang with no console output.
>>
>>>
>>> diff --git a/mm/percpu.c b/mm/percpu.c
>>> index 434844415d..caab63375b 100644
>>> --- a/mm/percpu.c
>>> +++ b/mm/percpu.c
>>> @@ -1416,7 +1416,7 @@ struct pcpu_alloc_info * __init pcpu_alloc_alloc_info(int nr_groups,
>>>   			  __alignof__(ai->groups[0].cpu_map[0]));
>>>   	ai_size = base_size + nr_units * sizeof(ai->groups[0].cpu_map[0]);
>>>   
>>> -	ptr = memblock_virt_alloc_nopanic(PFN_ALIGN(ai_size), 0);
>>> +	ptr = memblock_virt_alloc_nopanic(PFN_ALIGN(ai_size), PAGE_SIZE);
>>>   	if (!ptr)
>>>   		return NULL;
>>>   	ai = ptr;
>>> @@ -2295,6 +2295,7 @@ void __init setup_per_cpu_areas(void)
>>>   
>>>   	if (pcpu_setup_first_chunk(ai, fc) < 0)
>>>   		panic("Failed to initialize percpu areas.");
>>> +	pcpu_free_alloc_info(ai);
>>
>> This is the culprit. Everything works fine if I remove this line.
> 
> Without this line, the memory at the ai pointer is leaked. Maybe this is
> modifying the memory allocation pattern and that triggers a bug later on
> in your case.
> 
> At that point the console driver is not yet initialized and any error
> message won't be printed. You should enable the early console mechanism
> in your kernel (see arch/cris/arch-v32/kernel/debugport.c) and see what
> that might tell you.
> 

The problem is that BUG() on crisv32 does not yield useful output.
Anyway, here is the culprit.

diff --git a/mm/bootmem.c b/mm/bootmem.c
index 6aef64254203..2bcc8901450c 100644
--- a/mm/bootmem.c
+++ b/mm/bootmem.c
@@ -382,7 +382,8 @@ static int __init mark_bootmem(unsigned long start, unsigned long end,
                         return 0;
                 pos = bdata->node_low_pfn;
         }
-       BUG();
+       WARN(1, "mark_bootmem(): memory range 0x%lx-0x%lx not found\n", start, end);
+       return -ENOMEM;
  }

  /**
diff --git a/mm/percpu.c b/mm/percpu.c
index 79e3549cab0f..c75622d844f1 100644
--- a/mm/percpu.c
+++ b/mm/percpu.c
@@ -1881,6 +1881,7 @@ struct pcpu_alloc_info * __init pcpu_alloc_alloc_info(int nr_groups,
   */
  void __init pcpu_free_alloc_info(struct pcpu_alloc_info *ai)
  {
+       printk("pcpu_free_alloc_info(%p (0x%lx))\n", ai, __pa(ai));
         memblock_free_early(__pa(ai), ai->__ai_size);
  }

results in:

pcpu_free_alloc_info(c0534000 (0x40534000))
------------[ cut here ]------------
WARNING: CPU: 0 PID: 0 at mm/bootmem.c:385 mark_bootmem+0x9a/0xaa
mark_bootmem(): memory range 0x2029a-0x2029b not found

and the system keeps booting.

If I drop the __pa() from the memblock_free_early() parameter, everything works
as expected. Off to the cris maintainers ... I have no idea how to fix the problem.

Guenter

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply related	[flat|nested] 29+ messages in thread

* Re: mm/percpu.c: use smarter memory allocation for struct pcpu_alloc_info (crisv32 hang)
  2017-11-20  2:03         ` Guenter Roeck
@ 2017-11-20  4:08           ` Nicolas Pitre
  2017-11-20  5:05             ` Guenter Roeck
  0 siblings, 1 reply; 29+ messages in thread
From: Nicolas Pitre @ 2017-11-20  4:08 UTC (permalink / raw)
  To: Guenter Roeck
  Cc: Tejun Heo, Christoph Lameter, linux-mm, linux-kernel,
	Mikael Starvik, Jesper Nilsson, linux-cris-kernel

On Sun, 19 Nov 2017, Guenter Roeck wrote:
> On 11/19/2017 12:36 PM, Nicolas Pitre wrote:
> > On Sat, 18 Nov 2017, Guenter Roeck wrote:
> > > On Tue, Oct 03, 2017 at 06:29:49PM -0400, Nicolas Pitre wrote:
> > > > @@ -2295,6 +2295,7 @@ void __init setup_per_cpu_areas(void)
> > > >     	if (pcpu_setup_first_chunk(ai, fc) < 0)
> > > >   		panic("Failed to initialize percpu areas.");
> > > > +	pcpu_free_alloc_info(ai);
> > > 
> > > This is the culprit. Everything works fine if I remove this line.
> > 
> > Without this line, the memory at the ai pointer is leaked. Maybe this is
> > modifying the memory allocation pattern and that triggers a bug later on
> > in your case.
> > 
> > At that point the console driver is not yet initialized and any error
> > message won't be printed. You should enable the early console mechanism
> > in your kernel (see arch/cris/arch-v32/kernel/debugport.c) and see what
> > that might tell you.
> > 
> 
> The problem is that BUG() on crisv32 does not yield useful output.
> Anyway, here is the culprit.
> 
> diff --git a/mm/bootmem.c b/mm/bootmem.c
> index 6aef64254203..2bcc8901450c 100644
> --- a/mm/bootmem.c
> +++ b/mm/bootmem.c
> @@ -382,7 +382,8 @@ static int __init mark_bootmem(unsigned long start,
> unsigned long end,
>                         return 0;
>                 pos = bdata->node_low_pfn;
>         }
> -       BUG();
> +       WARN(1, "mark_bootmem(): memory range 0x%lx-0x%lx not found\n", start,
> end);
> +       return -ENOMEM;
>  }
> 
>  /**
> diff --git a/mm/percpu.c b/mm/percpu.c
> index 79e3549cab0f..c75622d844f1 100644
> --- a/mm/percpu.c
> +++ b/mm/percpu.c
> @@ -1881,6 +1881,7 @@ struct pcpu_alloc_info * __init
> pcpu_alloc_alloc_info(int nr_groups,
>   */
>  void __init pcpu_free_alloc_info(struct pcpu_alloc_info *ai)
>  {
> +       printk("pcpu_free_alloc_info(%p (0x%lx))\n", ai, __pa(ai));
>         memblock_free_early(__pa(ai), ai->__ai_size);

The problem here is that there is two possibilities for 
memblock_free_early(). From include/linux/bootmem.h:

#if defined(CONFIG_HAVE_MEMBLOCK) && defined(CONFIG_NO_BOOTMEM)

static inline void __init memblock_free_early(
                                        phys_addr_t base, phys_addr_t size)
{
        __memblock_free_early(base, size);
}

#else

static inline void __init memblock_free_early(
                                        phys_addr_t base, phys_addr_t size)
{
        free_bootmem(base, size);
}

#endif

It looks like most architectures use the memblock variant, including all 
the ones I have access to.

> results in:
> 
> pcpu_free_alloc_info(c0534000 (0x40534000))
> ------------[ cut here ]------------
> WARNING: CPU: 0 PID: 0 at mm/bootmem.c:385 mark_bootmem+0x9a/0xaa
> mark_bootmem(): memory range 0x2029a-0x2029b not found

Well... PFN_UP(0x40534000) should give 0x40534. How you might end up 
with 0x2029a in mark_bootmem(), let alone not exit on the first "if (max 
== end) return 0;" within the loop is rather weird.


Nicolas

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: mm/percpu.c: use smarter memory allocation for struct pcpu_alloc_info (crisv32 hang)
  2017-11-20  4:08           ` Nicolas Pitre
@ 2017-11-20  5:05             ` Guenter Roeck
  2017-11-20 18:18               ` Nicolas Pitre
  0 siblings, 1 reply; 29+ messages in thread
From: Guenter Roeck @ 2017-11-20  5:05 UTC (permalink / raw)
  To: Nicolas Pitre
  Cc: Tejun Heo, Christoph Lameter, linux-mm, linux-kernel,
	Mikael Starvik, Jesper Nilsson, linux-cris-kernel

On 11/19/2017 08:08 PM, Nicolas Pitre wrote:
> On Sun, 19 Nov 2017, Guenter Roeck wrote:
>> On 11/19/2017 12:36 PM, Nicolas Pitre wrote:
>>> On Sat, 18 Nov 2017, Guenter Roeck wrote:
>>>> On Tue, Oct 03, 2017 at 06:29:49PM -0400, Nicolas Pitre wrote:
>>>>> @@ -2295,6 +2295,7 @@ void __init setup_per_cpu_areas(void)
>>>>>      	if (pcpu_setup_first_chunk(ai, fc) < 0)
>>>>>    		panic("Failed to initialize percpu areas.");
>>>>> +	pcpu_free_alloc_info(ai);
>>>>
>>>> This is the culprit. Everything works fine if I remove this line.
>>>
>>> Without this line, the memory at the ai pointer is leaked. Maybe this is
>>> modifying the memory allocation pattern and that triggers a bug later on
>>> in your case.
>>>
>>> At that point the console driver is not yet initialized and any error
>>> message won't be printed. You should enable the early console mechanism
>>> in your kernel (see arch/cris/arch-v32/kernel/debugport.c) and see what
>>> that might tell you.
>>>
>>
>> The problem is that BUG() on crisv32 does not yield useful output.
>> Anyway, here is the culprit.
>>
>> diff --git a/mm/bootmem.c b/mm/bootmem.c
>> index 6aef64254203..2bcc8901450c 100644
>> --- a/mm/bootmem.c
>> +++ b/mm/bootmem.c
>> @@ -382,7 +382,8 @@ static int __init mark_bootmem(unsigned long start,
>> unsigned long end,
>>                          return 0;
>>                  pos = bdata->node_low_pfn;
>>          }
>> -       BUG();
>> +       WARN(1, "mark_bootmem(): memory range 0x%lx-0x%lx not found\n", start,
>> end);
>> +       return -ENOMEM;
>>   }
>>
>>   /**
>> diff --git a/mm/percpu.c b/mm/percpu.c
>> index 79e3549cab0f..c75622d844f1 100644
>> --- a/mm/percpu.c
>> +++ b/mm/percpu.c
>> @@ -1881,6 +1881,7 @@ struct pcpu_alloc_info * __init
>> pcpu_alloc_alloc_info(int nr_groups,
>>    */
>>   void __init pcpu_free_alloc_info(struct pcpu_alloc_info *ai)
>>   {
>> +       printk("pcpu_free_alloc_info(%p (0x%lx))\n", ai, __pa(ai));
>>          memblock_free_early(__pa(ai), ai->__ai_size);
> 
> The problem here is that there is two possibilities for
> memblock_free_early(). From include/linux/bootmem.h:
> 
> #if defined(CONFIG_HAVE_MEMBLOCK) && defined(CONFIG_NO_BOOTMEM)
> 
> static inline void __init memblock_free_early(
>                                          phys_addr_t base, phys_addr_t size)
> {
>          __memblock_free_early(base, size);
> }
> 
> #else
> 
> static inline void __init memblock_free_early(
>                                          phys_addr_t base, phys_addr_t size)
> {
>          free_bootmem(base, size);
> }
> 
> #endif
> 
> It looks like most architectures use the memblock variant, including all
> the ones I have access to.
> 
>> results in:
>>
>> pcpu_free_alloc_info(c0534000 (0x40534000))
>> ------------[ cut here ]------------
>> WARNING: CPU: 0 PID: 0 at mm/bootmem.c:385 mark_bootmem+0x9a/0xaa
>> mark_bootmem(): memory range 0x2029a-0x2029b not found
> 
> Well... PFN_UP(0x40534000) should give 0x40534. How you might end up
> with 0x2029a in mark_bootmem(), let alone not exit on the first "if (max
> == end) return 0;" within the loop is rather weird.
> 
pcpu_free_alloc_info: ai=c0536000, __pa(ai)=0x40536000, PFN_UP(__pa(ai))=0x2029b, PFN_UP(ai)=0x6029b

bootmem range is 0x60000..0x61000. It doesn't get to "if (max == end)"
because "pos (=0x2029b) < bdata->node_min_pfn (=0x60000)".

Guenter

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: mm/percpu.c: use smarter memory allocation for struct pcpu_alloc_info (crisv32 hang)
  2017-11-20  5:05             ` Guenter Roeck
@ 2017-11-20 18:18               ` Nicolas Pitre
  2017-11-20 18:51                 ` Guenter Roeck
  0 siblings, 1 reply; 29+ messages in thread
From: Nicolas Pitre @ 2017-11-20 18:18 UTC (permalink / raw)
  To: Guenter Roeck
  Cc: Tejun Heo, Christoph Lameter, linux-mm, linux-kernel,
	Mikael Starvik, Jesper Nilsson, linux-cris-kernel

On Sun, 19 Nov 2017, Guenter Roeck wrote:

> On 11/19/2017 08:08 PM, Nicolas Pitre wrote:
> > On Sun, 19 Nov 2017, Guenter Roeck wrote:
> > > On 11/19/2017 12:36 PM, Nicolas Pitre wrote:
> > > > On Sat, 18 Nov 2017, Guenter Roeck wrote:
> > > > > On Tue, Oct 03, 2017 at 06:29:49PM -0400, Nicolas Pitre wrote:
> > > > > > @@ -2295,6 +2295,7 @@ void __init setup_per_cpu_areas(void)
> > > > > >      	if (pcpu_setup_first_chunk(ai, fc) < 0)
> > > > > >    		panic("Failed to initialize percpu areas.");
> > > > > > +	pcpu_free_alloc_info(ai);
> > > > > 
> > > > > This is the culprit. Everything works fine if I remove this line.
> > > > 
> > > > Without this line, the memory at the ai pointer is leaked. Maybe this is
> > > > modifying the memory allocation pattern and that triggers a bug later on
> > > > in your case.
> > > > 
> > > > At that point the console driver is not yet initialized and any error
> > > > message won't be printed. You should enable the early console mechanism
> > > > in your kernel (see arch/cris/arch-v32/kernel/debugport.c) and see what
> > > > that might tell you.
> > > > 
> > > 
> > > The problem is that BUG() on crisv32 does not yield useful output.
> > > Anyway, here is the culprit.
> > > 
> > > diff --git a/mm/bootmem.c b/mm/bootmem.c
> > > index 6aef64254203..2bcc8901450c 100644
> > > --- a/mm/bootmem.c
> > > +++ b/mm/bootmem.c
> > > @@ -382,7 +382,8 @@ static int __init mark_bootmem(unsigned long start,
> > > unsigned long end,
> > >                          return 0;
> > >                  pos = bdata->node_low_pfn;
> > >          }
> > > -       BUG();
> > > +       WARN(1, "mark_bootmem(): memory range 0x%lx-0x%lx not found\n",
> > > start,
> > > end);
> > > +       return -ENOMEM;
> > >   }
> > > 
> > >   /**
> > > diff --git a/mm/percpu.c b/mm/percpu.c
> > > index 79e3549cab0f..c75622d844f1 100644
> > > --- a/mm/percpu.c
> > > +++ b/mm/percpu.c
> > > @@ -1881,6 +1881,7 @@ struct pcpu_alloc_info * __init
> > > pcpu_alloc_alloc_info(int nr_groups,
> > >    */
> > >   void __init pcpu_free_alloc_info(struct pcpu_alloc_info *ai)
> > >   {
> > > +       printk("pcpu_free_alloc_info(%p (0x%lx))\n", ai, __pa(ai));
> > >          memblock_free_early(__pa(ai), ai->__ai_size);
> > 
> > The problem here is that there is two possibilities for
> > memblock_free_early(). From include/linux/bootmem.h:
> > 
> > #if defined(CONFIG_HAVE_MEMBLOCK) && defined(CONFIG_NO_BOOTMEM)
> > 
> > static inline void __init memblock_free_early(
> >                                          phys_addr_t base, phys_addr_t size)
> > {
> >          __memblock_free_early(base, size);
> > }
> > 
> > #else
> > 
> > static inline void __init memblock_free_early(
> >                                          phys_addr_t base, phys_addr_t size)
> > {
> >          free_bootmem(base, size);
> > }
> > 
> > #endif
> > 
> > It looks like most architectures use the memblock variant, including all
> > the ones I have access to.
> > 
> > > results in:
> > > 
> > > pcpu_free_alloc_info(c0534000 (0x40534000))
> > > ------------[ cut here ]------------
> > > WARNING: CPU: 0 PID: 0 at mm/bootmem.c:385 mark_bootmem+0x9a/0xaa
> > > mark_bootmem(): memory range 0x2029a-0x2029b not found
> > 
> > Well... PFN_UP(0x40534000) should give 0x40534. How you might end up
> > with 0x2029a in mark_bootmem(), let alone not exit on the first "if (max
> > == end) return 0;" within the loop is rather weird.
> > 
> pcpu_free_alloc_info: ai=c0536000, __pa(ai)=0x40536000,
> PFN_UP(__pa(ai))=0x2029b, PFN_UP(ai)=0x6029b
> 
> bootmem range is 0x60000..0x61000. It doesn't get to "if (max == end)"
> because "pos (=0x2029b) < bdata->node_min_pfn (=0x60000)".

OK. the 0x2029b is the result of PAGE_SIZE being 8192 in your case.
However the bootmem allocator deals with physical addresses not virtual 
ones. So it shouldn't give you a 0x60000..0x61000 range.

Would be interesting to see what result you get on line 860 of 
mm/bootmem.c.


Nicolas

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: mm/percpu.c: use smarter memory allocation for struct pcpu_alloc_info (crisv32 hang)
  2017-11-20 18:18               ` Nicolas Pitre
@ 2017-11-20 18:51                 ` Guenter Roeck
  2017-11-20 20:21                   ` Nicolas Pitre
  0 siblings, 1 reply; 29+ messages in thread
From: Guenter Roeck @ 2017-11-20 18:51 UTC (permalink / raw)
  To: Nicolas Pitre
  Cc: Tejun Heo, Christoph Lameter, linux-mm, linux-kernel,
	Mikael Starvik, Jesper Nilsson, linux-cris-kernel

On Mon, Nov 20, 2017 at 01:18:38PM -0500, Nicolas Pitre wrote:
> On Sun, 19 Nov 2017, Guenter Roeck wrote:
> 
> > On 11/19/2017 08:08 PM, Nicolas Pitre wrote:
> > > On Sun, 19 Nov 2017, Guenter Roeck wrote:
> > > > On 11/19/2017 12:36 PM, Nicolas Pitre wrote:
> > > > > On Sat, 18 Nov 2017, Guenter Roeck wrote:
> > > > > > On Tue, Oct 03, 2017 at 06:29:49PM -0400, Nicolas Pitre wrote:
> > > > > > > @@ -2295,6 +2295,7 @@ void __init setup_per_cpu_areas(void)
> > > > > > >      	if (pcpu_setup_first_chunk(ai, fc) < 0)
> > > > > > >    		panic("Failed to initialize percpu areas.");
> > > > > > > +	pcpu_free_alloc_info(ai);
> > > > > > 
> > > > > > This is the culprit. Everything works fine if I remove this line.
> > > > > 
> > > > > Without this line, the memory at the ai pointer is leaked. Maybe this is
> > > > > modifying the memory allocation pattern and that triggers a bug later on
> > > > > in your case.
> > > > > 
> > > > > At that point the console driver is not yet initialized and any error
> > > > > message won't be printed. You should enable the early console mechanism
> > > > > in your kernel (see arch/cris/arch-v32/kernel/debugport.c) and see what
> > > > > that might tell you.
> > > > > 
> > > > 
> > > > The problem is that BUG() on crisv32 does not yield useful output.
> > > > Anyway, here is the culprit.
> > > > 
> > > > diff --git a/mm/bootmem.c b/mm/bootmem.c
> > > > index 6aef64254203..2bcc8901450c 100644
> > > > --- a/mm/bootmem.c
> > > > +++ b/mm/bootmem.c
> > > > @@ -382,7 +382,8 @@ static int __init mark_bootmem(unsigned long start,
> > > > unsigned long end,
> > > >                          return 0;
> > > >                  pos = bdata->node_low_pfn;
> > > >          }
> > > > -       BUG();
> > > > +       WARN(1, "mark_bootmem(): memory range 0x%lx-0x%lx not found\n",
> > > > start,
> > > > end);
> > > > +       return -ENOMEM;
> > > >   }
> > > > 
> > > >   /**
> > > > diff --git a/mm/percpu.c b/mm/percpu.c
> > > > index 79e3549cab0f..c75622d844f1 100644
> > > > --- a/mm/percpu.c
> > > > +++ b/mm/percpu.c
> > > > @@ -1881,6 +1881,7 @@ struct pcpu_alloc_info * __init
> > > > pcpu_alloc_alloc_info(int nr_groups,
> > > >    */
> > > >   void __init pcpu_free_alloc_info(struct pcpu_alloc_info *ai)
> > > >   {
> > > > +       printk("pcpu_free_alloc_info(%p (0x%lx))\n", ai, __pa(ai));
> > > >          memblock_free_early(__pa(ai), ai->__ai_size);
> > > 
> > > The problem here is that there is two possibilities for
> > > memblock_free_early(). From include/linux/bootmem.h:
> > > 
> > > #if defined(CONFIG_HAVE_MEMBLOCK) && defined(CONFIG_NO_BOOTMEM)
> > > 
> > > static inline void __init memblock_free_early(
> > >                                          phys_addr_t base, phys_addr_t size)
> > > {
> > >          __memblock_free_early(base, size);
> > > }
> > > 
> > > #else
> > > 
> > > static inline void __init memblock_free_early(
> > >                                          phys_addr_t base, phys_addr_t size)
> > > {
> > >          free_bootmem(base, size);
> > > }
> > > 
> > > #endif
> > > 
> > > It looks like most architectures use the memblock variant, including all
> > > the ones I have access to.
> > > 
> > > > results in:
> > > > 
> > > > pcpu_free_alloc_info(c0534000 (0x40534000))
> > > > ------------[ cut here ]------------
> > > > WARNING: CPU: 0 PID: 0 at mm/bootmem.c:385 mark_bootmem+0x9a/0xaa
> > > > mark_bootmem(): memory range 0x2029a-0x2029b not found
> > > 
> > > Well... PFN_UP(0x40534000) should give 0x40534. How you might end up
> > > with 0x2029a in mark_bootmem(), let alone not exit on the first "if (max
> > > == end) return 0;" within the loop is rather weird.
> > > 
> > pcpu_free_alloc_info: ai=c0536000, __pa(ai)=0x40536000,
> > PFN_UP(__pa(ai))=0x2029b, PFN_UP(ai)=0x6029b
> > 
> > bootmem range is 0x60000..0x61000. It doesn't get to "if (max == end)"
> > because "pos (=0x2029b) < bdata->node_min_pfn (=0x60000)".
> 
> OK. the 0x2029b is the result of PAGE_SIZE being 8192 in your case.
> However the bootmem allocator deals with physical addresses not virtual 
> ones. So it shouldn't give you a 0x60000..0x61000 range.
> 
> Would be interesting to see what result you get on line 860 of 
> mm/bootmem.c.
> 
Nothing; __alloc_bootmem_low_node() is not called.

Call chain is:
  pcpu_alloc_alloc_info
    memblock_virt_alloc_nopanic
      __alloc_bootmem_nopanic
        ___alloc_bootmem_nopanic

and returns 0xc0536000.

Guenter

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: mm/percpu.c: use smarter memory allocation for struct pcpu_alloc_info (crisv32 hang)
  2017-11-20 18:51                 ` Guenter Roeck
@ 2017-11-20 20:21                   ` Nicolas Pitre
  2017-11-20 21:11                     ` Guenter Roeck
  0 siblings, 1 reply; 29+ messages in thread
From: Nicolas Pitre @ 2017-11-20 20:21 UTC (permalink / raw)
  To: Guenter Roeck
  Cc: Tejun Heo, Christoph Lameter, linux-mm, linux-kernel,
	Mikael Starvik, Jesper Nilsson, linux-cris-kernel

On Mon, 20 Nov 2017, Guenter Roeck wrote:

> On Mon, Nov 20, 2017 at 01:18:38PM -0500, Nicolas Pitre wrote:
> > On Sun, 19 Nov 2017, Guenter Roeck wrote:
> > 
> > > On 11/19/2017 08:08 PM, Nicolas Pitre wrote:
> > > > On Sun, 19 Nov 2017, Guenter Roeck wrote:
> > > > > On 11/19/2017 12:36 PM, Nicolas Pitre wrote:
> > > > > > On Sat, 18 Nov 2017, Guenter Roeck wrote:
> > > > > > > On Tue, Oct 03, 2017 at 06:29:49PM -0400, Nicolas Pitre wrote:
> > > > > > > > @@ -2295,6 +2295,7 @@ void __init setup_per_cpu_areas(void)
> > > > > > > >      	if (pcpu_setup_first_chunk(ai, fc) < 0)
> > > > > > > >    		panic("Failed to initialize percpu areas.");
> > > > > > > > +	pcpu_free_alloc_info(ai);
> > > > > > > 
> > > > > > > This is the culprit. Everything works fine if I remove this line.
> > > > > > 
> > > > > > Without this line, the memory at the ai pointer is leaked. Maybe this is
> > > > > > modifying the memory allocation pattern and that triggers a bug later on
> > > > > > in your case.
> > > > > > 
> > > > > > At that point the console driver is not yet initialized and any error
> > > > > > message won't be printed. You should enable the early console mechanism
> > > > > > in your kernel (see arch/cris/arch-v32/kernel/debugport.c) and see what
> > > > > > that might tell you.
> > > > > > 
> > > > > 
> > > > > The problem is that BUG() on crisv32 does not yield useful output.
> > > > > Anyway, here is the culprit.
> > > > > 
> > > > > diff --git a/mm/bootmem.c b/mm/bootmem.c
> > > > > index 6aef64254203..2bcc8901450c 100644
> > > > > --- a/mm/bootmem.c
> > > > > +++ b/mm/bootmem.c
> > > > > @@ -382,7 +382,8 @@ static int __init mark_bootmem(unsigned long start,
> > > > > unsigned long end,
> > > > >                          return 0;
> > > > >                  pos = bdata->node_low_pfn;
> > > > >          }
> > > > > -       BUG();
> > > > > +       WARN(1, "mark_bootmem(): memory range 0x%lx-0x%lx not found\n",
> > > > > start,
> > > > > end);
> > > > > +       return -ENOMEM;
> > > > >   }
> > > > > 
> > > > >   /**
> > > > > diff --git a/mm/percpu.c b/mm/percpu.c
> > > > > index 79e3549cab0f..c75622d844f1 100644
> > > > > --- a/mm/percpu.c
> > > > > +++ b/mm/percpu.c
> > > > > @@ -1881,6 +1881,7 @@ struct pcpu_alloc_info * __init
> > > > > pcpu_alloc_alloc_info(int nr_groups,
> > > > >    */
> > > > >   void __init pcpu_free_alloc_info(struct pcpu_alloc_info *ai)
> > > > >   {
> > > > > +       printk("pcpu_free_alloc_info(%p (0x%lx))\n", ai, __pa(ai));
> > > > >          memblock_free_early(__pa(ai), ai->__ai_size);
> > > > 
> > > > The problem here is that there is two possibilities for
> > > > memblock_free_early(). From include/linux/bootmem.h:
> > > > 
> > > > #if defined(CONFIG_HAVE_MEMBLOCK) && defined(CONFIG_NO_BOOTMEM)
> > > > 
> > > > static inline void __init memblock_free_early(
> > > >                                          phys_addr_t base, phys_addr_t size)
> > > > {
> > > >          __memblock_free_early(base, size);
> > > > }
> > > > 
> > > > #else
> > > > 
> > > > static inline void __init memblock_free_early(
> > > >                                          phys_addr_t base, phys_addr_t size)
> > > > {
> > > >          free_bootmem(base, size);
> > > > }
> > > > 
> > > > #endif
> > > > 
> > > > It looks like most architectures use the memblock variant, including all
> > > > the ones I have access to.
> > > > 
> > > > > results in:
> > > > > 
> > > > > pcpu_free_alloc_info(c0534000 (0x40534000))
> > > > > ------------[ cut here ]------------
> > > > > WARNING: CPU: 0 PID: 0 at mm/bootmem.c:385 mark_bootmem+0x9a/0xaa
> > > > > mark_bootmem(): memory range 0x2029a-0x2029b not found
> > > > 
> > > > Well... PFN_UP(0x40534000) should give 0x40534. How you might end up
> > > > with 0x2029a in mark_bootmem(), let alone not exit on the first "if (max
> > > > == end) return 0;" within the loop is rather weird.
> > > > 
> > > pcpu_free_alloc_info: ai=c0536000, __pa(ai)=0x40536000,
> > > PFN_UP(__pa(ai))=0x2029b, PFN_UP(ai)=0x6029b
> > > 
> > > bootmem range is 0x60000..0x61000. It doesn't get to "if (max == end)"
> > > because "pos (=0x2029b) < bdata->node_min_pfn (=0x60000)".
> > 
> > OK. the 0x2029b is the result of PAGE_SIZE being 8192 in your case.
> > However the bootmem allocator deals with physical addresses not virtual 
> > ones. So it shouldn't give you a 0x60000..0x61000 range.
> > 
> > Would be interesting to see what result you get on line 860 of 
> > mm/bootmem.c.
> > 
> Nothing; __alloc_bootmem_low_node() is not called.
> 
> Call chain is:
>   pcpu_alloc_alloc_info
>     memblock_virt_alloc_nopanic
>       __alloc_bootmem_nopanic
>         ___alloc_bootmem_nopanic

But from there it should continue with: 

	alloc_bootmem_core() -->
	  alloc_bootmem_bdata() -->
	    [...]
	    region = phys_to_virt(PFN_PHYS(bdata->node_min_pfn) + start_off);

That's line 585, not 860 as I mentioned. Sorry for the confusion.


Nicolas

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: mm/percpu.c: use smarter memory allocation for struct pcpu_alloc_info (crisv32 hang)
  2017-11-20 20:21                   ` Nicolas Pitre
@ 2017-11-20 21:11                     ` Guenter Roeck
  2017-11-21  0:28                       ` Nicolas Pitre
  0 siblings, 1 reply; 29+ messages in thread
From: Guenter Roeck @ 2017-11-20 21:11 UTC (permalink / raw)
  To: Nicolas Pitre
  Cc: Tejun Heo, Christoph Lameter, linux-mm, linux-kernel,
	Mikael Starvik, Jesper Nilsson, linux-cris-kernel

On Mon, Nov 20, 2017 at 03:21:32PM -0500, Nicolas Pitre wrote:
> On Mon, 20 Nov 2017, Guenter Roeck wrote:
> 
> > On Mon, Nov 20, 2017 at 01:18:38PM -0500, Nicolas Pitre wrote:
> > > On Sun, 19 Nov 2017, Guenter Roeck wrote:
> > > 
> > > > On 11/19/2017 08:08 PM, Nicolas Pitre wrote:
> > > > > On Sun, 19 Nov 2017, Guenter Roeck wrote:
> > > > > > On 11/19/2017 12:36 PM, Nicolas Pitre wrote:
> > > > > > > On Sat, 18 Nov 2017, Guenter Roeck wrote:
> > > > > > > > On Tue, Oct 03, 2017 at 06:29:49PM -0400, Nicolas Pitre wrote:
> > > > > > > > > @@ -2295,6 +2295,7 @@ void __init setup_per_cpu_areas(void)
> > > > > > > > >      	if (pcpu_setup_first_chunk(ai, fc) < 0)
> > > > > > > > >    		panic("Failed to initialize percpu areas.");
> > > > > > > > > +	pcpu_free_alloc_info(ai);
> > > > > > > > 
> > > > > > > > This is the culprit. Everything works fine if I remove this line.
> > > > > > > 
> > > > > > > Without this line, the memory at the ai pointer is leaked. Maybe this is
> > > > > > > modifying the memory allocation pattern and that triggers a bug later on
> > > > > > > in your case.
> > > > > > > 
> > > > > > > At that point the console driver is not yet initialized and any error
> > > > > > > message won't be printed. You should enable the early console mechanism
> > > > > > > in your kernel (see arch/cris/arch-v32/kernel/debugport.c) and see what
> > > > > > > that might tell you.
> > > > > > > 
> > > > > > 
> > > > > > The problem is that BUG() on crisv32 does not yield useful output.
> > > > > > Anyway, here is the culprit.
> > > > > > 
> > > > > > diff --git a/mm/bootmem.c b/mm/bootmem.c
> > > > > > index 6aef64254203..2bcc8901450c 100644
> > > > > > --- a/mm/bootmem.c
> > > > > > +++ b/mm/bootmem.c
> > > > > > @@ -382,7 +382,8 @@ static int __init mark_bootmem(unsigned long start,
> > > > > > unsigned long end,
> > > > > >                          return 0;
> > > > > >                  pos = bdata->node_low_pfn;
> > > > > >          }
> > > > > > -       BUG();
> > > > > > +       WARN(1, "mark_bootmem(): memory range 0x%lx-0x%lx not found\n",
> > > > > > start,
> > > > > > end);
> > > > > > +       return -ENOMEM;
> > > > > >   }
> > > > > > 
> > > > > >   /**
> > > > > > diff --git a/mm/percpu.c b/mm/percpu.c
> > > > > > index 79e3549cab0f..c75622d844f1 100644
> > > > > > --- a/mm/percpu.c
> > > > > > +++ b/mm/percpu.c
> > > > > > @@ -1881,6 +1881,7 @@ struct pcpu_alloc_info * __init
> > > > > > pcpu_alloc_alloc_info(int nr_groups,
> > > > > >    */
> > > > > >   void __init pcpu_free_alloc_info(struct pcpu_alloc_info *ai)
> > > > > >   {
> > > > > > +       printk("pcpu_free_alloc_info(%p (0x%lx))\n", ai, __pa(ai));
> > > > > >          memblock_free_early(__pa(ai), ai->__ai_size);
> > > > > 
> > > > > The problem here is that there is two possibilities for
> > > > > memblock_free_early(). From include/linux/bootmem.h:
> > > > > 
> > > > > #if defined(CONFIG_HAVE_MEMBLOCK) && defined(CONFIG_NO_BOOTMEM)
> > > > > 
> > > > > static inline void __init memblock_free_early(
> > > > >                                          phys_addr_t base, phys_addr_t size)
> > > > > {
> > > > >          __memblock_free_early(base, size);
> > > > > }
> > > > > 
> > > > > #else
> > > > > 
> > > > > static inline void __init memblock_free_early(
> > > > >                                          phys_addr_t base, phys_addr_t size)
> > > > > {
> > > > >          free_bootmem(base, size);
> > > > > }
> > > > > 
> > > > > #endif
> > > > > 
> > > > > It looks like most architectures use the memblock variant, including all
> > > > > the ones I have access to.
> > > > > 
> > > > > > results in:
> > > > > > 
> > > > > > pcpu_free_alloc_info(c0534000 (0x40534000))
> > > > > > ------------[ cut here ]------------
> > > > > > WARNING: CPU: 0 PID: 0 at mm/bootmem.c:385 mark_bootmem+0x9a/0xaa
> > > > > > mark_bootmem(): memory range 0x2029a-0x2029b not found
> > > > > 
> > > > > Well... PFN_UP(0x40534000) should give 0x40534. How you might end up
> > > > > with 0x2029a in mark_bootmem(), let alone not exit on the first "if (max
> > > > > == end) return 0;" within the loop is rather weird.
> > > > > 
> > > > pcpu_free_alloc_info: ai=c0536000, __pa(ai)=0x40536000,
> > > > PFN_UP(__pa(ai))=0x2029b, PFN_UP(ai)=0x6029b
> > > > 
> > > > bootmem range is 0x60000..0x61000. It doesn't get to "if (max == end)"
> > > > because "pos (=0x2029b) < bdata->node_min_pfn (=0x60000)".
> > > 
> > > OK. the 0x2029b is the result of PAGE_SIZE being 8192 in your case.
> > > However the bootmem allocator deals with physical addresses not virtual 
> > > ones. So it shouldn't give you a 0x60000..0x61000 range.
> > > 
> > > Would be interesting to see what result you get on line 860 of 
> > > mm/bootmem.c.
> > > 
> > Nothing; __alloc_bootmem_low_node() is not called.
> > 
> > Call chain is:
> >   pcpu_alloc_alloc_info
> >     memblock_virt_alloc_nopanic
> >       __alloc_bootmem_nopanic
> >         ___alloc_bootmem_nopanic
> 
> But from there it should continue with: 
> 
> 	alloc_bootmem_core() -->
> 	  alloc_bootmem_bdata() -->
> 	    [...]
> 	    region = phys_to_virt(PFN_PHYS(bdata->node_min_pfn) + start_off);
> 
> That's line 585, not 860 as I mentioned. Sorry for the confusion.
> 
bdata->node_min_pfn=60000 PFN_PHYS(bdata->node_min_pfn)=c0000000 start_off=536000 region=c0536000

Guenter

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: mm/percpu.c: use smarter memory allocation for struct pcpu_alloc_info (crisv32 hang)
  2017-11-20 21:11                     ` Guenter Roeck
@ 2017-11-21  0:28                       ` Nicolas Pitre
  2017-11-21  1:48                         ` Guenter Roeck
  0 siblings, 1 reply; 29+ messages in thread
From: Nicolas Pitre @ 2017-11-21  0:28 UTC (permalink / raw)
  To: Guenter Roeck
  Cc: Tejun Heo, Christoph Lameter, linux-mm, linux-kernel,
	Mikael Starvik, Jesper Nilsson, linux-cris-kernel

On Mon, 20 Nov 2017, Guenter Roeck wrote:

> On Mon, Nov 20, 2017 at 03:21:32PM -0500, Nicolas Pitre wrote:
> > On Mon, 20 Nov 2017, Guenter Roeck wrote:
> > 
> > > On Mon, Nov 20, 2017 at 01:18:38PM -0500, Nicolas Pitre wrote:
> > > > On Sun, 19 Nov 2017, Guenter Roeck wrote:
> > > > 
> > > > > On 11/19/2017 08:08 PM, Nicolas Pitre wrote:
> > > > > > On Sun, 19 Nov 2017, Guenter Roeck wrote:
> > > > > > > On 11/19/2017 12:36 PM, Nicolas Pitre wrote:
> > > > > > > > On Sat, 18 Nov 2017, Guenter Roeck wrote:
> > > > > > > > > On Tue, Oct 03, 2017 at 06:29:49PM -0400, Nicolas Pitre wrote:
> > > > > > > > > > @@ -2295,6 +2295,7 @@ void __init setup_per_cpu_areas(void)
> > > > > > > > > >      	if (pcpu_setup_first_chunk(ai, fc) < 0)
> > > > > > > > > >    		panic("Failed to initialize percpu areas.");
> > > > > > > > > > +	pcpu_free_alloc_info(ai);
> > > > > > > > > 
> > > > > > > > > This is the culprit. Everything works fine if I remove this line.
> > > > > > > > 
> > > > > > > > Without this line, the memory at the ai pointer is leaked. Maybe this is
> > > > > > > > modifying the memory allocation pattern and that triggers a bug later on
> > > > > > > > in your case.
> > > > > > > > 
> > > > > > > > At that point the console driver is not yet initialized and any error
> > > > > > > > message won't be printed. You should enable the early console mechanism
> > > > > > > > in your kernel (see arch/cris/arch-v32/kernel/debugport.c) and see what
> > > > > > > > that might tell you.
> > > > > > > > 
> > > > > > > 
> > > > > > > The problem is that BUG() on crisv32 does not yield useful output.
> > > > > > > Anyway, here is the culprit.
> > > > > > > 
> > > > > > > diff --git a/mm/bootmem.c b/mm/bootmem.c
> > > > > > > index 6aef64254203..2bcc8901450c 100644
> > > > > > > --- a/mm/bootmem.c
> > > > > > > +++ b/mm/bootmem.c
> > > > > > > @@ -382,7 +382,8 @@ static int __init mark_bootmem(unsigned long start,
> > > > > > > unsigned long end,
> > > > > > >                          return 0;
> > > > > > >                  pos = bdata->node_low_pfn;
> > > > > > >          }
> > > > > > > -       BUG();
> > > > > > > +       WARN(1, "mark_bootmem(): memory range 0x%lx-0x%lx not found\n",
> > > > > > > start,
> > > > > > > end);
> > > > > > > +       return -ENOMEM;
> > > > > > >   }
> > > > > > > 
> > > > > > >   /**
> > > > > > > diff --git a/mm/percpu.c b/mm/percpu.c
> > > > > > > index 79e3549cab0f..c75622d844f1 100644
> > > > > > > --- a/mm/percpu.c
> > > > > > > +++ b/mm/percpu.c
> > > > > > > @@ -1881,6 +1881,7 @@ struct pcpu_alloc_info * __init
> > > > > > > pcpu_alloc_alloc_info(int nr_groups,
> > > > > > >    */
> > > > > > >   void __init pcpu_free_alloc_info(struct pcpu_alloc_info *ai)
> > > > > > >   {
> > > > > > > +       printk("pcpu_free_alloc_info(%p (0x%lx))\n", ai, __pa(ai));
> > > > > > >          memblock_free_early(__pa(ai), ai->__ai_size);
> > > > > > >
> > > > > > > results in:
> > > > > > > 
> > > > > > > pcpu_free_alloc_info(c0534000 (0x40534000))
> > > > > > > ------------[ cut here ]------------
> > > > > > > WARNING: CPU: 0 PID: 0 at mm/bootmem.c:385 mark_bootmem+0x9a/0xaa
> > > > > > > mark_bootmem(): memory range 0x2029a-0x2029b not found
> > > > > > 
> > > > > > Well... PFN_UP(0x40534000) should give 0x40534. How you might end up
> > > > > > with 0x2029a in mark_bootmem(), let alone not exit on the first "if (max
> > > > > > == end) return 0;" within the loop is rather weird.
> > > > > > 
> > > > > pcpu_free_alloc_info: ai=c0536000, __pa(ai)=0x40536000,
> > > > > PFN_UP(__pa(ai))=0x2029b, PFN_UP(ai)=0x6029b
> > > > > 
> > > > > bootmem range is 0x60000..0x61000. It doesn't get to "if (max == end)"
> > > > > because "pos (=0x2029b) < bdata->node_min_pfn (=0x60000)".
> > > > 
> > > > OK. the 0x2029b is the result of PAGE_SIZE being 8192 in your case.
> > > > However the bootmem allocator deals with physical addresses not virtual 
> > > > ones. So it shouldn't give you a 0x60000..0x61000 range.
> > > > 
> > > > Would be interesting to see what result you get on line 860 of 
> > > > mm/bootmem.c.
> > > > 
> > > Nothing; __alloc_bootmem_low_node() is not called.
> > > 
> > > Call chain is:
> > >   pcpu_alloc_alloc_info
> > >     memblock_virt_alloc_nopanic
> > >       __alloc_bootmem_nopanic
> > >         ___alloc_bootmem_nopanic
> > 
> > But from there it should continue with: 
> > 
> > 	alloc_bootmem_core() -->
> > 	  alloc_bootmem_bdata() -->
> > 	    [...]
> > 	    region = phys_to_virt(PFN_PHYS(bdata->node_min_pfn) + start_off);
> > 
> > That's line 585, not 860 as I mentioned. Sorry for the confusion.
> > 
> bdata->node_min_pfn=60000 PFN_PHYS(bdata->node_min_pfn)=c0000000 start_off=536000 region=c0536000

If PFN_PHYS(bdata->node_min_pfn)=c0000000 and
region=c0536000 that means phys_to_virt() is a no-op.

However, from your result above, __pa(0xc0534000) = 0x40534000.

So, why is it that phys_to_virt() is a no-op and __pa() is not?

virt_to_phys() and __pa() are meant to be the reverse of phys_to_virt() 
and __va().


Nicolas

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: mm/percpu.c: use smarter memory allocation for struct pcpu_alloc_info (crisv32 hang)
  2017-11-21  0:28                       ` Nicolas Pitre
@ 2017-11-21  1:48                         ` Guenter Roeck
  2017-11-21  3:50                           ` Nicolas Pitre
  0 siblings, 1 reply; 29+ messages in thread
From: Guenter Roeck @ 2017-11-21  1:48 UTC (permalink / raw)
  To: Nicolas Pitre
  Cc: Tejun Heo, Christoph Lameter, linux-mm, linux-kernel,
	Mikael Starvik, Jesper Nilsson, linux-cris-kernel

On Mon, Nov 20, 2017 at 07:28:21PM -0500, Nicolas Pitre wrote:
> On Mon, 20 Nov 2017, Guenter Roeck wrote:
> 
> > On Mon, Nov 20, 2017 at 03:21:32PM -0500, Nicolas Pitre wrote:
> > > On Mon, 20 Nov 2017, Guenter Roeck wrote:
> > > 
> > > > On Mon, Nov 20, 2017 at 01:18:38PM -0500, Nicolas Pitre wrote:
> > > > > On Sun, 19 Nov 2017, Guenter Roeck wrote:
> > > > > 
> > > > > > On 11/19/2017 08:08 PM, Nicolas Pitre wrote:
> > > > > > > On Sun, 19 Nov 2017, Guenter Roeck wrote:
> > > > > > > > On 11/19/2017 12:36 PM, Nicolas Pitre wrote:
> > > > > > > > > On Sat, 18 Nov 2017, Guenter Roeck wrote:
> > > > > > > > > > On Tue, Oct 03, 2017 at 06:29:49PM -0400, Nicolas Pitre wrote:
> > > > > > > > > > > @@ -2295,6 +2295,7 @@ void __init setup_per_cpu_areas(void)
> > > > > > > > > > >      	if (pcpu_setup_first_chunk(ai, fc) < 0)
> > > > > > > > > > >    		panic("Failed to initialize percpu areas.");
> > > > > > > > > > > +	pcpu_free_alloc_info(ai);
> > > > > > > > > > 
> > > > > > > > > > This is the culprit. Everything works fine if I remove this line.
> > > > > > > > > 
> > > > > > > > > Without this line, the memory at the ai pointer is leaked. Maybe this is
> > > > > > > > > modifying the memory allocation pattern and that triggers a bug later on
> > > > > > > > > in your case.
> > > > > > > > > 
> > > > > > > > > At that point the console driver is not yet initialized and any error
> > > > > > > > > message won't be printed. You should enable the early console mechanism
> > > > > > > > > in your kernel (see arch/cris/arch-v32/kernel/debugport.c) and see what
> > > > > > > > > that might tell you.
> > > > > > > > > 
> > > > > > > > 
> > > > > > > > The problem is that BUG() on crisv32 does not yield useful output.
> > > > > > > > Anyway, here is the culprit.
> > > > > > > > 
> > > > > > > > diff --git a/mm/bootmem.c b/mm/bootmem.c
> > > > > > > > index 6aef64254203..2bcc8901450c 100644
> > > > > > > > --- a/mm/bootmem.c
> > > > > > > > +++ b/mm/bootmem.c
> > > > > > > > @@ -382,7 +382,8 @@ static int __init mark_bootmem(unsigned long start,
> > > > > > > > unsigned long end,
> > > > > > > >                          return 0;
> > > > > > > >                  pos = bdata->node_low_pfn;
> > > > > > > >          }
> > > > > > > > -       BUG();
> > > > > > > > +       WARN(1, "mark_bootmem(): memory range 0x%lx-0x%lx not found\n",
> > > > > > > > start,
> > > > > > > > end);
> > > > > > > > +       return -ENOMEM;
> > > > > > > >   }
> > > > > > > > 
> > > > > > > >   /**
> > > > > > > > diff --git a/mm/percpu.c b/mm/percpu.c
> > > > > > > > index 79e3549cab0f..c75622d844f1 100644
> > > > > > > > --- a/mm/percpu.c
> > > > > > > > +++ b/mm/percpu.c
> > > > > > > > @@ -1881,6 +1881,7 @@ struct pcpu_alloc_info * __init
> > > > > > > > pcpu_alloc_alloc_info(int nr_groups,
> > > > > > > >    */
> > > > > > > >   void __init pcpu_free_alloc_info(struct pcpu_alloc_info *ai)
> > > > > > > >   {
> > > > > > > > +       printk("pcpu_free_alloc_info(%p (0x%lx))\n", ai, __pa(ai));
> > > > > > > >          memblock_free_early(__pa(ai), ai->__ai_size);
> > > > > > > >
> > > > > > > > results in:
> > > > > > > > 
> > > > > > > > pcpu_free_alloc_info(c0534000 (0x40534000))
> > > > > > > > ------------[ cut here ]------------
> > > > > > > > WARNING: CPU: 0 PID: 0 at mm/bootmem.c:385 mark_bootmem+0x9a/0xaa
> > > > > > > > mark_bootmem(): memory range 0x2029a-0x2029b not found
> > > > > > > 
> > > > > > > Well... PFN_UP(0x40534000) should give 0x40534. How you might end up
> > > > > > > with 0x2029a in mark_bootmem(), let alone not exit on the first "if (max
> > > > > > > == end) return 0;" within the loop is rather weird.
> > > > > > > 
> > > > > > pcpu_free_alloc_info: ai=c0536000, __pa(ai)=0x40536000,
> > > > > > PFN_UP(__pa(ai))=0x2029b, PFN_UP(ai)=0x6029b
> > > > > > 
> > > > > > bootmem range is 0x60000..0x61000. It doesn't get to "if (max == end)"
> > > > > > because "pos (=0x2029b) < bdata->node_min_pfn (=0x60000)".
> > > > > 
> > > > > OK. the 0x2029b is the result of PAGE_SIZE being 8192 in your case.
> > > > > However the bootmem allocator deals with physical addresses not virtual 
> > > > > ones. So it shouldn't give you a 0x60000..0x61000 range.
> > > > > 
> > > > > Would be interesting to see what result you get on line 860 of 
> > > > > mm/bootmem.c.
> > > > > 
> > > > Nothing; __alloc_bootmem_low_node() is not called.
> > > > 
> > > > Call chain is:
> > > >   pcpu_alloc_alloc_info
> > > >     memblock_virt_alloc_nopanic
> > > >       __alloc_bootmem_nopanic
> > > >         ___alloc_bootmem_nopanic
> > > 
> > > But from there it should continue with: 
> > > 
> > > 	alloc_bootmem_core() -->
> > > 	  alloc_bootmem_bdata() -->
> > > 	    [...]
> > > 	    region = phys_to_virt(PFN_PHYS(bdata->node_min_pfn) + start_off);
> > > 
> > > That's line 585, not 860 as I mentioned. Sorry for the confusion.
> > > 
> > bdata->node_min_pfn=60000 PFN_PHYS(bdata->node_min_pfn)=c0000000 start_off=536000 region=c0536000
> 
> If PFN_PHYS(bdata->node_min_pfn)=c0000000 and
> region=c0536000 that means phys_to_virt() is a no-op.
> 
No, it is |= 0x80000000

> However, from your result above, __pa(0xc0534000) = 0x40534000.
> 
> So, why is it that phys_to_virt() is a no-op and __pa() is not?
> 
> virt_to_phys() and __pa() are meant to be the reverse of phys_to_virt() 
> and __va().
> 
I think the problem is the 0x60000 in bdata->node_min_pfn. It is shifted
left by PFN_PHYS, making it 0xc0000000, which in my understanding is
a virtual address. So something is wrong ... presumably node_min_pfn
should be 0x20000, not 0x60000. init_bootmem_node() definitely passes
virtual pfns as parameters.

That doesn't seem to be easy to fix. It seems there is a mixup of physical
and  virtual addresses in the architecture.

Guenter

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: mm/percpu.c: use smarter memory allocation for struct pcpu_alloc_info (crisv32 hang)
  2017-11-21  1:48                         ` Guenter Roeck
@ 2017-11-21  3:50                           ` Nicolas Pitre
  2017-11-22 15:34                             ` Jesper Nilsson
  0 siblings, 1 reply; 29+ messages in thread
From: Nicolas Pitre @ 2017-11-21  3:50 UTC (permalink / raw)
  To: Guenter Roeck
  Cc: Tejun Heo, Christoph Lameter, linux-mm, linux-kernel,
	Mikael Starvik, Jesper Nilsson, linux-cris-kernel

On Mon, 20 Nov 2017, Guenter Roeck wrote:

> On Mon, Nov 20, 2017 at 07:28:21PM -0500, Nicolas Pitre wrote:
> > On Mon, 20 Nov 2017, Guenter Roeck wrote:
> > 
> > > bdata->node_min_pfn=60000 PFN_PHYS(bdata->node_min_pfn)=c0000000 start_off=536000 region=c0536000
> > 
> > If PFN_PHYS(bdata->node_min_pfn)=c0000000 and
> > region=c0536000 that means phys_to_virt() is a no-op.
> > 
> No, it is |= 0x80000000

Then the bootmem registration looks very fishy. If you have:

> I think the problem is the 0x60000 in bdata->node_min_pfn. It is shifted
> left by PFN_PHYS, making it 0xc0000000, which in my understanding is
> a virtual address.

Exact.

#define __pa(x)                 ((unsigned long)(x) & 0x7fffffff)
#define __va(x)                 ((void *)((unsigned long)(x) | 0x80000000))

With that, the only possible physical address range you may have is 
0x40000000 - 0x7fffffff, and it better start at 0x40000000. If that's 
not where your RAM is then something is wrong.

This is in fact a very bad idea to define __va() and __pa() using 
bitwise operations as this hides mistakes like defining physical RAM 
address at 0xc0000000. Instead, it should look like:

#define __pa(x)                 ((unsigned long)(x) - 0x80000000)
#define __va(x)                 ((void *)((unsigned long)(x) + 0x80000000))

This way, bad physical RAM address definitions will be caught 
immediately.

> That doesn't seem to be easy to fix. It seems there is a mixup of physical
> and  virtual addresses in the architecture.

Well... I don't think there is much else to say other than this needs 
fixing.


Nicolas

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: mm/percpu.c: use smarter memory allocation for struct pcpu_alloc_info (crisv32 hang)
  2017-11-21  3:50                           ` Nicolas Pitre
@ 2017-11-22 15:34                             ` Jesper Nilsson
  2017-11-22 20:17                               ` Nicolas Pitre
  0 siblings, 1 reply; 29+ messages in thread
From: Jesper Nilsson @ 2017-11-22 15:34 UTC (permalink / raw)
  To: Nicolas Pitre
  Cc: Guenter Roeck, Tejun Heo, Christoph Lameter, linux-mm,
	linux-kernel, Mikael Starvik, Jesper Nilsson, linux-cris-kernel

On Mon, Nov 20, 2017 at 10:50:46PM -0500, Nicolas Pitre wrote:
> On Mon, 20 Nov 2017, Guenter Roeck wrote:
> > On Mon, Nov 20, 2017 at 07:28:21PM -0500, Nicolas Pitre wrote:
> > > On Mon, 20 Nov 2017, Guenter Roeck wrote:
> > > 
> > > > bdata->node_min_pfn=60000 PFN_PHYS(bdata->node_min_pfn)=c0000000 start_off=536000 region=c0536000
> > > 
> > > If PFN_PHYS(bdata->node_min_pfn)=c0000000 and
> > > region=c0536000 that means phys_to_virt() is a no-op.
> > > 
> > No, it is |= 0x80000000
> 
> Then the bootmem registration looks very fishy. If you have:
> 
> > I think the problem is the 0x60000 in bdata->node_min_pfn. It is shifted
> > left by PFN_PHYS, making it 0xc0000000, which in my understanding is
> > a virtual address.
> 
> Exact.
> 
> #define __pa(x)                 ((unsigned long)(x) & 0x7fffffff)
> #define __va(x)                 ((void *)((unsigned long)(x) | 0x80000000))
> 
> With that, the only possible physical address range you may have is 
> 0x40000000 - 0x7fffffff, and it better start at 0x40000000. If that's 
> not where your RAM is then something is wrong.
> 
> This is in fact a very bad idea to define __va() and __pa() using 
> bitwise operations as this hides mistakes like defining physical RAM 
> address at 0xc0000000. Instead, it should look like:
> 
> #define __pa(x)                 ((unsigned long)(x) - 0x80000000)
> #define __va(x)                 ((void *)((unsigned long)(x) + 0x80000000))
> 
> This way, bad physical RAM address definitions will be caught 
> immediately.
> 
> > That doesn't seem to be easy to fix. It seems there is a mixup of physical
> > and  virtual addresses in the architecture.
> 
> Well... I don't think there is much else to say other than this needs 
> fixing.

The memory map for the ETRAX FS has the SDRAM mapped at both 0x40000000-0x7fffffff
and 0xc0000000-0xffffffff, and the difference is cached and non-cached.
That is actively (ab)used in the port, unfortunately, allthough I'm
uncertain if this is the problem in this case.

I get the same behaviour in my QEMU, but I've not been able to make
sense of anything yet...

> Nicolas

/^JN - Jesper Nilsson
-- 
               Jesper Nilsson -- jesper.nilsson@axis.com

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: mm/percpu.c: use smarter memory allocation for struct pcpu_alloc_info (crisv32 hang)
  2017-11-22 15:34                             ` Jesper Nilsson
@ 2017-11-22 20:17                               ` Nicolas Pitre
  2017-11-23  7:56                                 ` Jesper Nilsson
  0 siblings, 1 reply; 29+ messages in thread
From: Nicolas Pitre @ 2017-11-22 20:17 UTC (permalink / raw)
  To: Jesper Nilsson
  Cc: Guenter Roeck, Tejun Heo, Christoph Lameter, linux-mm,
	linux-kernel, Mikael Starvik, Jesper Nilsson, linux-cris-kernel

On Wed, 22 Nov 2017, Jesper Nilsson wrote:

> On Mon, Nov 20, 2017 at 10:50:46PM -0500, Nicolas Pitre wrote:
> > On Mon, 20 Nov 2017, Guenter Roeck wrote:
> > > On Mon, Nov 20, 2017 at 07:28:21PM -0500, Nicolas Pitre wrote:
> > > > On Mon, 20 Nov 2017, Guenter Roeck wrote:
> > > > 
> > > > > bdata->node_min_pfn=60000 PFN_PHYS(bdata->node_min_pfn)=c0000000 start_off=536000 region=c0536000
> > > > 
> > > > If PFN_PHYS(bdata->node_min_pfn)=c0000000 and
> > > > region=c0536000 that means phys_to_virt() is a no-op.
> > > > 
> > > No, it is |= 0x80000000
> > 
> > Then the bootmem registration looks very fishy. If you have:
> > 
> > > I think the problem is the 0x60000 in bdata->node_min_pfn. It is shifted
> > > left by PFN_PHYS, making it 0xc0000000, which in my understanding is
> > > a virtual address.
> > 
> > Exact.
> > 
> > #define __pa(x)                 ((unsigned long)(x) & 0x7fffffff)
> > #define __va(x)                 ((void *)((unsigned long)(x) | 0x80000000))
> > 
> > With that, the only possible physical address range you may have is 
> > 0x40000000 - 0x7fffffff, and it better start at 0x40000000. If that's 
> > not where your RAM is then something is wrong.
> > 
> > This is in fact a very bad idea to define __va() and __pa() using 
> > bitwise operations as this hides mistakes like defining physical RAM 
> > address at 0xc0000000. Instead, it should look like:
> > 
> > #define __pa(x)                 ((unsigned long)(x) - 0x80000000)
> > #define __va(x)                 ((void *)((unsigned long)(x) + 0x80000000))
> > 
> > This way, bad physical RAM address definitions will be caught 
> > immediately.
> > 
> > > That doesn't seem to be easy to fix. It seems there is a mixup of physical
> > > and  virtual addresses in the architecture.
> > 
> > Well... I don't think there is much else to say other than this needs 
> > fixing.
> 
> The memory map for the ETRAX FS has the SDRAM mapped at both 0x40000000-0x7fffffff
> and 0xc0000000-0xffffffff, and the difference is cached and non-cached.
> That is actively (ab)used in the port, unfortunately, allthough I'm
> uncertain if this is the problem in this case.

It certainly is a problem. If your cached RAM is physically mapped at 
0xc0000000 and you want it to be virtually mapped at 0xc0000000 then you 
should have:

#define __pa(x)                 ((unsigned long)(x))
#define __va(x)                 ((void *)(x))

i.e. no translation. For non-cached RAM access, there are specific 
interfaces for that. For example, you could have dma_alloc_coherent() 
take advantage of the fact that memory with the top bit cleared becomes 
uncached. But __pa() is the wrong interface for obtaining uncached 
memory.


Nicolas

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: mm/percpu.c: use smarter memory allocation for struct pcpu_alloc_info (crisv32 hang)
  2017-11-22 20:17                               ` Nicolas Pitre
@ 2017-11-23  7:56                                 ` Jesper Nilsson
  0 siblings, 0 replies; 29+ messages in thread
From: Jesper Nilsson @ 2017-11-23  7:56 UTC (permalink / raw)
  To: Nicolas Pitre
  Cc: Jesper Nilsson, Guenter Roeck, Tejun Heo, Christoph Lameter,
	linux-mm, linux-kernel, Mikael Starvik, linux-cris-kernel

On Wed, Nov 22, 2017 at 03:17:00PM -0500, Nicolas Pitre wrote:
> On Wed, 22 Nov 2017, Jesper Nilsson wrote:
> 
> > On Mon, Nov 20, 2017 at 10:50:46PM -0500, Nicolas Pitre wrote:
> > > On Mon, 20 Nov 2017, Guenter Roeck wrote:
> > > > On Mon, Nov 20, 2017 at 07:28:21PM -0500, Nicolas Pitre wrote:
> > > > > On Mon, 20 Nov 2017, Guenter Roeck wrote:
> > > > > 
> > > > > > bdata->node_min_pfn=60000 PFN_PHYS(bdata->node_min_pfn)=c0000000 start_off=536000 region=c0536000
> > > > > 
> > > > > If PFN_PHYS(bdata->node_min_pfn)=c0000000 and
> > > > > region=c0536000 that means phys_to_virt() is a no-op.
> > > > > 
> > > > No, it is |= 0x80000000
> > > 
> > > Then the bootmem registration looks very fishy. If you have:
> > > 
> > > > I think the problem is the 0x60000 in bdata->node_min_pfn. It is shifted
> > > > left by PFN_PHYS, making it 0xc0000000, which in my understanding is
> > > > a virtual address.
> > > 
> > > Exact.
> > > 
> > > #define __pa(x)                 ((unsigned long)(x) & 0x7fffffff)
> > > #define __va(x)                 ((void *)((unsigned long)(x) | 0x80000000))
> > > 
> > > With that, the only possible physical address range you may have is 
> > > 0x40000000 - 0x7fffffff, and it better start at 0x40000000. If that's 
> > > not where your RAM is then something is wrong.
> > > 
> > > This is in fact a very bad idea to define __va() and __pa() using 
> > > bitwise operations as this hides mistakes like defining physical RAM 
> > > address at 0xc0000000. Instead, it should look like:
> > > 
> > > #define __pa(x)                 ((unsigned long)(x) - 0x80000000)
> > > #define __va(x)                 ((void *)((unsigned long)(x) + 0x80000000))
> > > 
> > > This way, bad physical RAM address definitions will be caught 
> > > immediately.
> > > 
> > > > That doesn't seem to be easy to fix. It seems there is a mixup of physical
> > > > and  virtual addresses in the architecture.
> > > 
> > > Well... I don't think there is much else to say other than this needs 
> > > fixing.
> > 
> > The memory map for the ETRAX FS has the SDRAM mapped at both 0x40000000-0x7fffffff
> > and 0xc0000000-0xffffffff, and the difference is cached and non-cached.
> > That is actively (ab)used in the port, unfortunately, allthough I'm
> > uncertain if this is the problem in this case.
> 
> It certainly is a problem. If your cached RAM is physically mapped at 
> 0xc0000000 and you want it to be virtually mapped at 0xc0000000 then you 
> should have:
> 
> #define __pa(x)                 ((unsigned long)(x))
> #define __va(x)                 ((void *)(x))
> 
> i.e. no translation.

Sorry, it's the other way around, cached memory is at 0x40000000 and
non-cached is at 0xc0000000, so the translation is right, even if
as you pointed out earlier, it should be performed differently.

> For non-cached RAM access, there are specific 
> interfaces for that. For example, you could have dma_alloc_coherent() 
> take advantage of the fact that memory with the top bit cleared becomes 
> uncached. But __pa() is the wrong interface for obtaining uncached 
> memory.
> 
> Nicolas

/^JN - Jesper Nilsson
-- 
               Jesper Nilsson -- jesper.nilsson@axis.com

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: mm/percpu.c: use smarter memory allocation for struct pcpu_alloc_info (crisv32 hang)
  2017-11-18 18:25     ` mm/percpu.c: use smarter memory allocation for struct pcpu_alloc_info (crisv32 hang) Guenter Roeck
  2017-11-19 20:36       ` Nicolas Pitre
@ 2017-11-27 19:41       ` Tejun Heo
  2017-11-27 20:31         ` Nicolas Pitre
  1 sibling, 1 reply; 29+ messages in thread
From: Tejun Heo @ 2017-11-27 19:41 UTC (permalink / raw)
  To: Guenter Roeck
  Cc: Nicolas Pitre, Christoph Lameter, linux-mm, linux-kernel,
	Mikael Starvik, Jesper Nilsson, linux-cris-kernel

Hello,

I'm reverting the offending commit till we figure out what's going on.

Thanks.

-- 
tejun

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: mm/percpu.c: use smarter memory allocation for struct pcpu_alloc_info (crisv32 hang)
  2017-11-27 19:41       ` Tejun Heo
@ 2017-11-27 20:31         ` Nicolas Pitre
  2017-11-27 20:33           ` Tejun Heo
  0 siblings, 1 reply; 29+ messages in thread
From: Nicolas Pitre @ 2017-11-27 20:31 UTC (permalink / raw)
  To: Tejun Heo
  Cc: Guenter Roeck, Christoph Lameter, linux-mm, linux-kernel,
	Mikael Starvik, Jesper Nilsson, linux-cris-kernel

On Mon, 27 Nov 2017, Tejun Heo wrote:

> Hello,
> 
> I'm reverting the offending commit till we figure out what's going on.

It is figured out. The cris port is wrongly initializing the bootmem 
allocator with virtual memory addresses rather than physical addresses. 
And because its __va() definition reads like this:

#define __va(x) ((void *)((unsigned long)(x) | 0x80000000))

then things just work out because the end result is the same whether you 
give this a physical or a virtual address.

Untill you call memblock_free_early(__pa(address)) that is, because 
values from __pa() don't match with the virtual addresses stuffed in the 
bootmem allocator anymore.

So IMHO I don't think reverting the commit is the right thing to do. 
That commit is clearly not at fault here.


Nicolas

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: mm/percpu.c: use smarter memory allocation for struct pcpu_alloc_info (crisv32 hang)
  2017-11-27 20:31         ` Nicolas Pitre
@ 2017-11-27 20:33           ` Tejun Heo
  2017-11-27 20:51             ` Nicolas Pitre
  0 siblings, 1 reply; 29+ messages in thread
From: Tejun Heo @ 2017-11-27 20:33 UTC (permalink / raw)
  To: Nicolas Pitre
  Cc: Guenter Roeck, Christoph Lameter, linux-mm, linux-kernel,
	Mikael Starvik, Jesper Nilsson, linux-cris-kernel

Hello,

On Mon, Nov 27, 2017 at 03:31:52PM -0500, Nicolas Pitre wrote:
> So IMHO I don't think reverting the commit is the right thing to do. 
> That commit is clearly not at fault here.

It's not about the blame.  We just want to avoid breaking boot in a
way which is difficult to debug.  Once cris is fixed, we can re-apply
the patch.

Thanks.

-- 
tejun

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: mm/percpu.c: use smarter memory allocation for struct pcpu_alloc_info (crisv32 hang)
  2017-11-27 20:33           ` Tejun Heo
@ 2017-11-27 20:51             ` Nicolas Pitre
  2017-11-27 20:54               ` Tejun Heo
  2017-11-28  8:19               ` Jesper Nilsson
  0 siblings, 2 replies; 29+ messages in thread
From: Nicolas Pitre @ 2017-11-27 20:51 UTC (permalink / raw)
  To: Tejun Heo
  Cc: Guenter Roeck, Christoph Lameter, linux-mm, linux-kernel,
	Mikael Starvik, Jesper Nilsson, linux-cris-kernel

On Mon, 27 Nov 2017, Tejun Heo wrote:

> Hello,
> 
> On Mon, Nov 27, 2017 at 03:31:52PM -0500, Nicolas Pitre wrote:
> > So IMHO I don't think reverting the commit is the right thing to do. 
> > That commit is clearly not at fault here.
> 
> It's not about the blame.  We just want to avoid breaking boot in a
> way which is difficult to debug.  Once cris is fixed, we can re-apply
> the patch.

In that case I suggest the following instead. No point penalizing 
everyone for a single architecture's fault. And this will serve as a 
visible reminder to the cris people that they need to clean up.

----- >8
Subject: percpu: hack to let the CRIS architecture to boot until they clean up

Commit 438a506180 ("percpu: don't forget to free the temporary struct 
pcpu_alloc_info") uncovered a problem on the CRIS architecture where
the bootmem allocator is initialized with virtual addresses. Given it 
has:

    #define __va(x) ((void *)((unsigned long)(x) | 0x80000000))

then things just work out because the end result is the same whether you
give this a physical or a virtual address.

Untill you call memblock_free_early(__pa(address)) that is, because
values from __pa() don't match with the virtual addresses stuffed in the
bootmem allocator anymore.

Avoid freeing the temporary pcpu_alloc_info memory on that architecture
until they fix things up to let the kernel boot like it did before.

Signed-off-by: Nicolas Pitre <nico@linaro.org>

diff --git a/mm/percpu.c b/mm/percpu.c
index 79e3549cab..50e7fdf840 100644
--- a/mm/percpu.c
+++ b/mm/percpu.c
@@ -2719,7 +2719,11 @@ void __init setup_per_cpu_areas(void)
 
 	if (pcpu_setup_first_chunk(ai, fc) < 0)
 		panic("Failed to initialize percpu areas.");
+#ifdef CONFIG_CRIS
+#warning "the CRIS architecture has physical and virtual addresses confused"
+#else
 	pcpu_free_alloc_info(ai);
+#endif
 }
 
 #endif	/* CONFIG_SMP */

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply related	[flat|nested] 29+ messages in thread

* Re: mm/percpu.c: use smarter memory allocation for struct pcpu_alloc_info (crisv32 hang)
  2017-11-27 20:51             ` Nicolas Pitre
@ 2017-11-27 20:54               ` Tejun Heo
  2017-11-27 21:11                 ` Guenter Roeck
  2017-11-28  8:19               ` Jesper Nilsson
  1 sibling, 1 reply; 29+ messages in thread
From: Tejun Heo @ 2017-11-27 20:54 UTC (permalink / raw)
  To: Nicolas Pitre
  Cc: Guenter Roeck, Christoph Lameter, linux-mm, linux-kernel,
	Mikael Starvik, Jesper Nilsson, linux-cris-kernel

Hello, Nicolas.

On Mon, Nov 27, 2017 at 03:51:04PM -0500, Nicolas Pitre wrote:
> Subject: percpu: hack to let the CRIS architecture to boot until they clean up
> 
> Commit 438a506180 ("percpu: don't forget to free the temporary struct 
> pcpu_alloc_info") uncovered a problem on the CRIS architecture where
> the bootmem allocator is initialized with virtual addresses. Given it 
> has:
> 
>     #define __va(x) ((void *)((unsigned long)(x) | 0x80000000))
> 
> then things just work out because the end result is the same whether you
> give this a physical or a virtual address.
> 
> Untill you call memblock_free_early(__pa(address)) that is, because
> values from __pa() don't match with the virtual addresses stuffed in the
> bootmem allocator anymore.
> 
> Avoid freeing the temporary pcpu_alloc_info memory on that architecture
> until they fix things up to let the kernel boot like it did before.
> 
> Signed-off-by: Nicolas Pitre <nico@linaro.org>

This totally works for me.  Replaced the revert with this one.

Thanks!

-- 
tejun

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: mm/percpu.c: use smarter memory allocation for struct pcpu_alloc_info (crisv32 hang)
  2017-11-27 20:54               ` Tejun Heo
@ 2017-11-27 21:11                 ` Guenter Roeck
  0 siblings, 0 replies; 29+ messages in thread
From: Guenter Roeck @ 2017-11-27 21:11 UTC (permalink / raw)
  To: Tejun Heo
  Cc: Nicolas Pitre, Christoph Lameter, linux-mm, linux-kernel,
	Mikael Starvik, Jesper Nilsson, linux-cris-kernel

On Mon, Nov 27, 2017 at 12:54:21PM -0800, Tejun Heo wrote:
> Hello, Nicolas.
> 
> On Mon, Nov 27, 2017 at 03:51:04PM -0500, Nicolas Pitre wrote:
> > Subject: percpu: hack to let the CRIS architecture to boot until they clean up
> > 
> > Commit 438a506180 ("percpu: don't forget to free the temporary struct 
> > pcpu_alloc_info") uncovered a problem on the CRIS architecture where
> > the bootmem allocator is initialized with virtual addresses. Given it 
> > has:
> > 
> >     #define __va(x) ((void *)((unsigned long)(x) | 0x80000000))
> > 
> > then things just work out because the end result is the same whether you
> > give this a physical or a virtual address.
> > 
> > Untill you call memblock_free_early(__pa(address)) that is, because
> > values from __pa() don't match with the virtual addresses stuffed in the
> > bootmem allocator anymore.
> > 
> > Avoid freeing the temporary pcpu_alloc_info memory on that architecture
> > until they fix things up to let the kernel boot like it did before.
> > 
> > Signed-off-by: Nicolas Pitre <nico@linaro.org>
> 
> This totally works for me.  Replaced the revert with this one.
> 
Same here.

Thanks,
Guenter

> Thanks!
> 
> -- 
> tejun

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: mm/percpu.c: use smarter memory allocation for struct pcpu_alloc_info (crisv32 hang)
  2017-11-27 20:51             ` Nicolas Pitre
  2017-11-27 20:54               ` Tejun Heo
@ 2017-11-28  8:19               ` Jesper Nilsson
  1 sibling, 0 replies; 29+ messages in thread
From: Jesper Nilsson @ 2017-11-28  8:19 UTC (permalink / raw)
  To: Nicolas Pitre
  Cc: Tejun Heo, Guenter Roeck, Christoph Lameter, linux-mm,
	linux-kernel, Mikael Starvik, Jesper Nilsson, linux-cris-kernel

On Mon, Nov 27, 2017 at 03:51:04PM -0500, Nicolas Pitre wrote:
> On Mon, 27 Nov 2017, Tejun Heo wrote:
> 
> > Hello,
> > 
> > On Mon, Nov 27, 2017 at 03:31:52PM -0500, Nicolas Pitre wrote:
> > > So IMHO I don't think reverting the commit is the right thing to do. 
> > > That commit is clearly not at fault here.
> > 
> > It's not about the blame.  We just want to avoid breaking boot in a
> > way which is difficult to debug.  Once cris is fixed, we can re-apply
> > the patch.
> 
> In that case I suggest the following instead. No point penalizing 
> everyone for a single architecture's fault. And this will serve as a 
> visible reminder to the cris people that they need to clean up.
> 
> ----- >8
> Subject: percpu: hack to let the CRIS architecture to boot until they clean up
> 
> Commit 438a506180 ("percpu: don't forget to free the temporary struct 
> pcpu_alloc_info") uncovered a problem on the CRIS architecture where
> the bootmem allocator is initialized with virtual addresses. Given it 
> has:
> 
>     #define __va(x) ((void *)((unsigned long)(x) | 0x80000000))
> 
> then things just work out because the end result is the same whether you
> give this a physical or a virtual address.
> 
> Untill you call memblock_free_early(__pa(address)) that is, because
> values from __pa() don't match with the virtual addresses stuffed in the
> bootmem allocator anymore.
> 
> Avoid freeing the temporary pcpu_alloc_info memory on that architecture
> until they fix things up to let the kernel boot like it did before.
> 
> Signed-off-by: Nicolas Pitre <nico@linaro.org>
> 
> diff --git a/mm/percpu.c b/mm/percpu.c
> index 79e3549cab..50e7fdf840 100644
> --- a/mm/percpu.c
> +++ b/mm/percpu.c
> @@ -2719,7 +2719,11 @@ void __init setup_per_cpu_areas(void)
>  
>  	if (pcpu_setup_first_chunk(ai, fc) < 0)
>  		panic("Failed to initialize percpu areas.");
> +#ifdef CONFIG_CRIS
> +#warning "the CRIS architecture has physical and virtual addresses confused"
> +#else
>  	pcpu_free_alloc_info(ai);
> +#endif
>  }
>  
>  #endif	/* CONFIG_SMP */

Works for me, and thanks.

/^JN - Jesper Nilsson
-- 
               Jesper Nilsson -- jesper.nilsson@axis.com

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 29+ messages in thread

end of thread, other threads:[~2017-11-28  8:19 UTC | newest]

Thread overview: 29+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2017-10-03 20:57 [PATCH] mm/percpu.c: use smarter memory allocation for struct pcpu_alloc_info Nicolas Pitre
2017-10-03 21:05 ` Tejun Heo
2017-10-03 22:29   ` Nicolas Pitre
2017-10-03 22:36     ` Tejun Heo
2017-10-03 23:48       ` Dennis Zhou
2017-10-04  0:13         ` Nicolas Pitre
2017-10-04 14:15     ` Tejun Heo
2017-11-18 18:25     ` mm/percpu.c: use smarter memory allocation for struct pcpu_alloc_info (crisv32 hang) Guenter Roeck
2017-11-19 20:36       ` Nicolas Pitre
2017-11-20  2:03         ` Guenter Roeck
2017-11-20  4:08           ` Nicolas Pitre
2017-11-20  5:05             ` Guenter Roeck
2017-11-20 18:18               ` Nicolas Pitre
2017-11-20 18:51                 ` Guenter Roeck
2017-11-20 20:21                   ` Nicolas Pitre
2017-11-20 21:11                     ` Guenter Roeck
2017-11-21  0:28                       ` Nicolas Pitre
2017-11-21  1:48                         ` Guenter Roeck
2017-11-21  3:50                           ` Nicolas Pitre
2017-11-22 15:34                             ` Jesper Nilsson
2017-11-22 20:17                               ` Nicolas Pitre
2017-11-23  7:56                                 ` Jesper Nilsson
2017-11-27 19:41       ` Tejun Heo
2017-11-27 20:31         ` Nicolas Pitre
2017-11-27 20:33           ` Tejun Heo
2017-11-27 20:51             ` Nicolas Pitre
2017-11-27 20:54               ` Tejun Heo
2017-11-27 21:11                 ` Guenter Roeck
2017-11-28  8:19               ` Jesper Nilsson

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).