linux-kernel.vger.kernel.org archive mirror
* [PATCH] mm: nobootmem: Correct alloc_bootmem semantics.
@ 2012-04-25 20:10 David Miller
  2012-04-25 20:12 ` Tejun Heo
  2012-04-25 22:46 ` Yinghai Lu
  0 siblings, 2 replies; 10+ messages in thread
From: David Miller @ 2012-04-25 20:10 UTC (permalink / raw)
  To: linux-kernel; +Cc: yinghai, tj, torvalds


The comments above __alloc_bootmem_node() claim that the code will
first try the allocation using 'goal' and if that fails it will
try again but with the 'goal' requirement dropped.

Unfortunately, this is not what the code does, so fix it to do so.

This is important for nobootmem conversions to architectures such
as sparc where MAX_DMA_ADDRESS is infinity.

On such architectures all of the allocations done by generic spots,
such as the sparse-vmemmap implementation, will pass in:

	__pa(MAX_DMA_ADDRESS)

as the goal, and with the limit given as "-1" this will always fail
unless we add the appropriate fallback logic here.

Signed-off-by: David S. Miller <davem@davemloft.net>

diff --git a/mm/nobootmem.c b/mm/nobootmem.c
index 24f0fc1..e53bb8a 100644
--- a/mm/nobootmem.c
+++ b/mm/nobootmem.c
@@ -298,13 +298,19 @@ void * __init __alloc_bootmem_node(pg_data_t *pgdat, unsigned long size,
 	if (WARN_ON_ONCE(slab_is_available()))
 		return kzalloc_node(size, GFP_NOWAIT, pgdat->node_id);
 
+again:
 	ptr = __alloc_memory_core_early(pgdat->node_id, size, align,
 					 goal, -1ULL);
 	if (ptr)
 		return ptr;
 
-	return __alloc_memory_core_early(MAX_NUMNODES, size, align,
-					 goal, -1ULL);
+	ptr = __alloc_memory_core_early(MAX_NUMNODES, size, align,
+					goal, -1ULL);
+	if (!ptr && goal) {
+		goal = 0;
+		goto again;
+	}
+	return ptr;
 }
 
 void * __init __alloc_bootmem_node_high(pg_data_t *pgdat, unsigned long size,
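
The retry logic in the patch can be modelled in userspace. This is only a sketch under stated assumptions, not kernel code: `toy_map`, `toy_alloc_core()` and `toy_alloc_node()` are hypothetical names, `ANY_NODE` stands in for MAX_NUMNODES, `toy_alloc_core()` is a simplified stand-in for __alloc_memory_core_early() that treats the goal as a minimum address, and the limit argument (always -1ULL in the patch) is left out.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define ANY_NODE (-1)	/* assumption: stand-in for MAX_NUMNODES */

/* Hypothetical memory map: one free physical range per node. */
struct toy_range {
	int nid;
	uint64_t start, end;
};

static const struct toy_range toy_map[] = {
	{ 0, 0x00100000ULL, 0x40000000ULL },
	{ 1, 0x40000000ULL, 0x80000000ULL },
};

/*
 * Toy stand-in for __alloc_memory_core_early(): return the lowest aligned
 * address at or above 'goal' that fits in a free range of node 'nid'
 * (or of any node for ANY_NODE). Returns 0 on failure.
 */
static uint64_t toy_alloc_core(int nid, uint64_t size, uint64_t align,
			       uint64_t goal)
{
	assert(size > 0 && align > 0 && (align & (align - 1)) == 0);

	for (size_t i = 0; i < sizeof(toy_map) / sizeof(toy_map[0]); i++) {
		const struct toy_range *r = &toy_map[i];
		uint64_t base;

		if (nid != ANY_NODE && r->nid != nid)
			continue;
		if (goal >= r->end)	/* goal lies above this whole range */
			continue;
		base = r->start > goal ? r->start : goal;
		base = (base + align - 1) & ~(align - 1);
		if (base >= r->end || r->end - base < size)
			continue;
		return base;
	}
	return 0;
}

/* Same control flow as the patched __alloc_bootmem_node(). */
static uint64_t toy_alloc_node(int nid, uint64_t size, uint64_t align,
			       uint64_t goal)
{
	uint64_t addr;

again:
	addr = toy_alloc_core(nid, size, align, goal);
	if (addr)
		return addr;

	addr = toy_alloc_core(ANY_NODE, size, align, goal);
	if (!addr && goal) {
		goal = 0;	/* drop the goal and retry, node first */
		goto again;
	}
	return addr;
}
```

With a goal of UINT64_MAX, which plays the role of an "infinite" __pa(MAX_DMA_ADDRESS) here, `toy_alloc_core()` fails on every node, while `toy_alloc_node()` falls back to a goal of 0 and still returns memory on the requested node.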


* Re: [PATCH] mm: nobootmem: Correct alloc_bootmem semantics.
  2012-04-25 20:10 [PATCH] mm: nobootmem: Correct alloc_bootmem semantics David Miller
@ 2012-04-25 20:12 ` Tejun Heo
  2012-04-25 22:46 ` Yinghai Lu
  1 sibling, 0 replies; 10+ messages in thread
From: Tejun Heo @ 2012-04-25 20:12 UTC (permalink / raw)
  To: David Miller; +Cc: linux-kernel, yinghai, torvalds

On Wed, Apr 25, 2012 at 04:10:50PM -0400, David Miller wrote:
> 
> The comments above __alloc_bootmem_node() claim that the code will
> first try the allocation using 'goal' and if that fails it will
> try again but with the 'goal' requirement dropped.
> 
> Unfortunately, this is not what the code does, so fix it to do so.
> 
> This is important for nobootmem conversions to architectures such
> as sparc where MAX_DMA_ADDRESS is infinity.
> 
> On such architectures all of the allocations done by generic spots,
> such as the sparse-vmemmap implementation, will pass in:
> 
> 	__pa(MAX_DMA_ADDRESS)
> 
> as the goal, and with the limit given as "-1" this will always fail
> unless we add the appropriate fallback logic here.
> 
> Signed-off-by: David S. Miller <davem@davemloft.net>

Acked-by: Tejun Heo <tj@kernel.org>

Thanks.

-- 
tejun


* Re: [PATCH] mm: nobootmem: Correct alloc_bootmem semantics.
  2012-04-25 20:10 [PATCH] mm: nobootmem: Correct alloc_bootmem semantics David Miller
  2012-04-25 20:12 ` Tejun Heo
@ 2012-04-25 22:46 ` Yinghai Lu
  2012-04-25 23:00   ` David Miller
  1 sibling, 1 reply; 10+ messages in thread
From: Yinghai Lu @ 2012-04-25 22:46 UTC (permalink / raw)
  To: David Miller; +Cc: linux-kernel, tj, torvalds

On Wed, Apr 25, 2012 at 1:10 PM, David Miller <davem@davemloft.net> wrote:
>
> The comments above __alloc_bootmem_node() claim that the code will
> first try the allocation using 'goal' and if that fails it will
> try again but with the 'goal' requirement dropped.
>
> Unfortunately, this is not what the code does, so fix it to do so.
>
> This is important for nobootmem conversions to architectures such
> as sparc where MAX_DMA_ADDRESS is infinity.
>
> On such architectures all of the allocations done by generic spots,
> such as the sparse-vmemmap implementation, will pass in:
>
>        __pa(MAX_DMA_ADDRESS)
>
> as the goal, and with the limit given as "-1" this will always fail
> unless we add the appropriate fallback logic here.
>
> Signed-off-by: David S. Miller <davem@davemloft.net>
>
> diff --git a/mm/nobootmem.c b/mm/nobootmem.c
> index 24f0fc1..e53bb8a 100644
> --- a/mm/nobootmem.c
> +++ b/mm/nobootmem.c
> @@ -298,13 +298,19 @@ void * __init __alloc_bootmem_node(pg_data_t *pgdat, unsigned long size,
>        if (WARN_ON_ONCE(slab_is_available()))
>                return kzalloc_node(size, GFP_NOWAIT, pgdat->node_id);
>
> +again:
>        ptr = __alloc_memory_core_early(pgdat->node_id, size, align,
>                                         goal, -1ULL);
>        if (ptr)
>                return ptr;

If you want to be consistent to bootmem version.

again label should be here instead.

>
> -       return __alloc_memory_core_early(MAX_NUMNODES, size, align,
> -                                        goal, -1ULL);
> +       ptr = __alloc_memory_core_early(MAX_NUMNODES, size, align,
> +                                       goal, -1ULL);
> +       if (!ptr && goal) {
> +               goal = 0;
> +               goto again;
> +       }
> +       return ptr;
>  }

Thanks

Yinghai


* Re: [PATCH] mm: nobootmem: Correct alloc_bootmem semantics.
  2012-04-25 22:46 ` Yinghai Lu
@ 2012-04-25 23:00   ` David Miller
  2012-04-25 23:14     ` Yinghai Lu
  2012-05-03 15:28     ` Johannes Weiner
  0 siblings, 2 replies; 10+ messages in thread
From: David Miller @ 2012-04-25 23:00 UTC (permalink / raw)
  To: yinghai; +Cc: linux-kernel, tj, torvalds

From: Yinghai Lu <yinghai@kernel.org>
Date: Wed, 25 Apr 2012 15:46:42 -0700

> On Wed, Apr 25, 2012 at 1:10 PM, David Miller <davem@davemloft.net> wrote:
>> @@ -298,13 +298,19 @@ void * __init __alloc_bootmem_node(pg_data_t *pgdat, unsigned long size,
>>        if (WARN_ON_ONCE(slab_is_available()))
>>                return kzalloc_node(size, GFP_NOWAIT, pgdat->node_id);
>>
>> +again:
>>        ptr = __alloc_memory_core_early(pgdat->node_id, size, align,
>>                                         goal, -1ULL);
>>        if (ptr)
>>                return ptr;
> 
> If you want to be consistent to bootmem version.
> 
> again label should be here instead.

It is merely an artifact of implementation that the bootmem version
doesn't try to respect the given node if the goal cannot be satisfied,
and in fact I would classify that as a bug that needs to be fixed.

Therefore, I believe the bootmem case is what needs to be adjusted
instead.


* Re: [PATCH] mm: nobootmem: Correct alloc_bootmem semantics.
  2012-04-25 23:00   ` David Miller
@ 2012-04-25 23:14     ` Yinghai Lu
  2012-04-25 23:15       ` David Miller
  2012-05-03 15:28     ` Johannes Weiner
  1 sibling, 1 reply; 10+ messages in thread
From: Yinghai Lu @ 2012-04-25 23:14 UTC (permalink / raw)
  To: David Miller; +Cc: linux-kernel, tj, torvalds

On Wed, Apr 25, 2012 at 4:00 PM, David Miller <davem@davemloft.net> wrote:
> From: Yinghai Lu <yinghai@kernel.org>
> Date: Wed, 25 Apr 2012 15:46:42 -0700
>
>> On Wed, Apr 25, 2012 at 1:10 PM, David Miller <davem@davemloft.net> wrote:
>>> @@ -298,13 +298,19 @@ void * __init __alloc_bootmem_node(pg_data_t *pgdat, unsigned long size,
>>>        if (WARN_ON_ONCE(slab_is_available()))
>>>                return kzalloc_node(size, GFP_NOWAIT, pgdat->node_id);
>>>
>>> +again:
>>>        ptr = __alloc_memory_core_early(pgdat->node_id, size, align,
>>>                                         goal, -1ULL);
>>>        if (ptr)
>>>                return ptr;
>>
>> If you want to be consistent to bootmem version.
>>
>> again label should be here instead.
>
> It is merely an artifact of implementation that the bootmem version
> doesn't try to respect the given node if the goal cannot be satisfied,
> and in fact I would classify that as a bug that needs to be fixed.
>
> Therefore, I believe the bootmem case is what needs to be adjusted
> instead.

Yes.

Acked-by: Yinghai Lu <yinghai@kernel.org>

Linus will pick it directly or through your sparc nobootmem conversion?

Yinghai


* Re: [PATCH] mm: nobootmem: Correct alloc_bootmem semantics.
  2012-04-25 23:14     ` Yinghai Lu
@ 2012-04-25 23:15       ` David Miller
  0 siblings, 0 replies; 10+ messages in thread
From: David Miller @ 2012-04-25 23:15 UTC (permalink / raw)
  To: yinghai; +Cc: linux-kernel, tj, torvalds

From: Yinghai Lu <yinghai@kernel.org>
Date: Wed, 25 Apr 2012 16:14:00 -0700

> On Wed, Apr 25, 2012 at 4:00 PM, David Miller <davem@davemloft.net> wrote:
>> From: Yinghai Lu <yinghai@kernel.org>
>> Date: Wed, 25 Apr 2012 15:46:42 -0700
>>
>>> On Wed, Apr 25, 2012 at 1:10 PM, David Miller <davem@davemloft.net> wrote:
>>>> @@ -298,13 +298,19 @@ void * __init __alloc_bootmem_node(pg_data_t *pgdat, unsigned long size,
>>>>        if (WARN_ON_ONCE(slab_is_available()))
>>>>                return kzalloc_node(size, GFP_NOWAIT, pgdat->node_id);
>>>>
>>>> +again:
>>>>        ptr = __alloc_memory_core_early(pgdat->node_id, size, align,
>>>>                                         goal, -1ULL);
>>>>        if (ptr)
>>>>                return ptr;
>>>
>>> If you want to be consistent to bootmem version.
>>>
>>> again label should be here instead.
>>
>> It is merely an artifact of implementation that the bootmem version
>> doesn't try to respect the given node if the goal cannot be satisfied,
>> and in fact I would classify that as a bug that needs to be fixed.
>>
>> Therefore, I believe the bootmem case is what needs to be adjusted
>> instead.
> 
> Yes.
> 
> Acked-by: Yinghai Lu <yinghai@kernel.org>
> 
> Linus will pick it directly or through your sparc nobootmem conversion?

I was hoping Linus would take this directly.


* Re: [PATCH] mm: nobootmem: Correct alloc_bootmem semantics.
  2012-04-25 23:00   ` David Miller
  2012-04-25 23:14     ` Yinghai Lu
@ 2012-05-03 15:28     ` Johannes Weiner
  2012-05-03 17:04       ` David Miller
  1 sibling, 1 reply; 10+ messages in thread
From: Johannes Weiner @ 2012-05-03 15:28 UTC (permalink / raw)
  To: David Miller; +Cc: yinghai, linux-kernel, tj, torvalds

On Wed, Apr 25, 2012 at 07:00:34PM -0400, David Miller wrote:
> From: Yinghai Lu <yinghai@kernel.org>
> Date: Wed, 25 Apr 2012 15:46:42 -0700
> 
> > On Wed, Apr 25, 2012 at 1:10 PM, David Miller <davem@davemloft.net> wrote:
> >> @@ -298,13 +298,19 @@ void * __init __alloc_bootmem_node(pg_data_t *pgdat, unsigned long size,
> >>        if (WARN_ON_ONCE(slab_is_available()))
> >>                return kzalloc_node(size, GFP_NOWAIT, pgdat->node_id);
> >>
> >> +again:
> >>        ptr = __alloc_memory_core_early(pgdat->node_id, size, align,
> >>                                         goal, -1ULL);
> >>        if (ptr)
> >>                return ptr;
> > 
> > If you want to be consistent to bootmem version.
> > 
> > again label should be here instead.
> 
> It is merely an artifact of implementation that the bootmem version
> doesn't try to respect the given node if the goal cannot be satisfied,
> and in fact I would classify that as a bug that needs to be fixed.
> 
> Therefore, I believe the bootmem case is what needs to be adjusted
> instead.

Now it does: node+goal, goal, node, anywhere

whereas the memblock version of __alloc_bootmem_node_nopanic() also
still does: node+goal, goal, anywhere

Your description suggests that the node should be higher prioritized
than the goal, which I understand as: node+goal, node, anywhere.

Which do we actually want?


* Re: [PATCH] mm: nobootmem: Correct alloc_bootmem semantics.
  2012-05-03 15:28     ` Johannes Weiner
@ 2012-05-03 17:04       ` David Miller
  2012-05-04  9:41         ` Johannes Weiner
  0 siblings, 1 reply; 10+ messages in thread
From: David Miller @ 2012-05-03 17:04 UTC (permalink / raw)
  To: hannes; +Cc: yinghai, linux-kernel, tj, torvalds

From: Johannes Weiner <hannes@cmpxchg.org>
Date: Thu, 3 May 2012 17:28:41 +0200

> On Wed, Apr 25, 2012 at 07:00:34PM -0400, David Miller wrote:
>> From: Yinghai Lu <yinghai@kernel.org>
>> Date: Wed, 25 Apr 2012 15:46:42 -0700
>> 
>> > On Wed, Apr 25, 2012 at 1:10 PM, David Miller <davem@davemloft.net> wrote:
>> >> @@ -298,13 +298,19 @@ void * __init __alloc_bootmem_node(pg_data_t *pgdat, unsigned long size,
>> >>        if (WARN_ON_ONCE(slab_is_available()))
>> >>                return kzalloc_node(size, GFP_NOWAIT, pgdat->node_id);
>> >>
>> >> +again:
>> >>        ptr = __alloc_memory_core_early(pgdat->node_id, size, align,
>> >>                                         goal, -1ULL);
>> >>        if (ptr)
>> >>                return ptr;
>> > 
>> > If you want to be consistent to bootmem version.
>> > 
>> > again label should be here instead.
>> 
>> It is merely an artifact of implementation that the bootmem version
>> doesn't try to respect the given node if the goal cannot be satisfied,
>> and in fact I would classify that as a bug that needs to be fixed.
>> 
>> Therefore, I believe the bootmem case is what needs to be adjusted
>> instead.
> 
> Now it does: node+goal, goal, node, anywhere
> 
> whereas the memblock version of __alloc_bootmem_node_nopanic() also
> still does: node+goal, goal, anywhere
> 
> Your description suggests that the node should be higher prioritized
> than the goal, which I understand as: node+goal, node, anywhere.
> 
> Which do we actually want?

I think the goal is what needs to be prioritized.  An explicit goal usually
has a requirement, like "I need physical memory in the low 32-bits" and if
they specified an explicit node they really mean "and give me it on NUMA
node X if you can."  Hence the sequence:

	node+goal, goal, node, any

the only other reasonable option would be:

	node+goal, node, goal, any

but I think that doesn't match what people want when an explicit goal
is specified.  Do you?


* Re: [PATCH] mm: nobootmem: Correct alloc_bootmem semantics.
  2012-05-03 17:04       ` David Miller
@ 2012-05-04  9:41         ` Johannes Weiner
  2012-05-04 14:46           ` David Miller
  0 siblings, 1 reply; 10+ messages in thread
From: Johannes Weiner @ 2012-05-04  9:41 UTC (permalink / raw)
  To: David Miller; +Cc: yinghai, linux-kernel, tj, torvalds

On Thu, May 03, 2012 at 01:04:16PM -0400, David Miller wrote:
> From: Johannes Weiner <hannes@cmpxchg.org>
> Date: Thu, 3 May 2012 17:28:41 +0200
> 
> > On Wed, Apr 25, 2012 at 07:00:34PM -0400, David Miller wrote:
> >> From: Yinghai Lu <yinghai@kernel.org>
> >> Date: Wed, 25 Apr 2012 15:46:42 -0700
> >> 
> >> > On Wed, Apr 25, 2012 at 1:10 PM, David Miller <davem@davemloft.net> wrote:
> >> >> @@ -298,13 +298,19 @@ void * __init __alloc_bootmem_node(pg_data_t *pgdat, unsigned long size,
> >> >>        if (WARN_ON_ONCE(slab_is_available()))
> >> >>                return kzalloc_node(size, GFP_NOWAIT, pgdat->node_id);
> >> >>
> >> >> +again:
> >> >>        ptr = __alloc_memory_core_early(pgdat->node_id, size, align,
> >> >>                                         goal, -1ULL);
> >> >>        if (ptr)
> >> >>                return ptr;
> >> > 
> >> > If you want to be consistent to bootmem version.
> >> > 
> >> > again label should be here instead.
> >> 
> >> It is merely an artifact of implementation that the bootmem version
> >> doesn't try to respect the given node if the goal cannot be satisfied,
> >> and in fact I would classify that as a bug that needs to be fixed.
> >> 
> >> Therefore, I believe the bootmem case is what needs to be adjusted
> >> instead.
> > 
> > Now it does: node+goal, goal, node, anywhere
> > 
> > whereas the memblock version of __alloc_bootmem_node_nopanic() also
> > still does: node+goal, goal, anywhere
> > 
> > Your description suggests that the node should be higher prioritized
> > than the goal, which I understand as: node+goal, node, anywhere.
> > 
> > Which do we actually want?
> 
> I think the goal is what needs to be prioritized.  An explicit goal usually
> has a requirement, like "I need physical memory in the low 32-bits" and if
> they specified an explicit node they really mean "and give me it on NUMA
> node X if you can."  Hence the sequence:
> 
> 	node+goal, goal, node, any
> 
> the only other reasonable option would be:
> 
> 	node+goal, node, goal, any
> 
> but I think that doesn't match what people want when an explicit goal
> is specified.  Do you?

Oh I think that's what limit is for.  The goal is usually to allocate
high address memory for users that can deal with it and keep lowmem
for users that can't.

For example, I can imagine sparsemem usemap allocation in the memory
hotplug case would prefer having the usemap on the same node as the
corresponding pgdat descriptor than allocating on any node above the
goal and possibly create circular dependencies.

But that is quite rare/unlikely anyway, and I guess in most other
cases it's better to go for preventing lowmem exhaustion than to
preserve node locality.

So I'm fine with this priority order, but it's a judgement call.

I'll send patches to make everything use the same policy.
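
The two candidate orders discussed above can be compared with a small table-driven model. This is a hedged sketch, not the kernel implementation: `toy_map`, `toy_find()`, `toy_alloc_policy()` and the `goal_first` / `node_first` tables are hypothetical names, the goal is treated as a minimum physical address, and alignment and the limit are ignored.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define ANY_NODE (-1)	/* assumption: stand-in for MAX_NUMNODES */

struct toy_range {
	int nid;
	uint64_t start, end;
};

/* Hypothetical layout: node 0 only has memory below 1 GiB, node 1 above. */
static const struct toy_range toy_map[] = {
	{ 0, 0x00100000ULL, 0x40000000ULL },
	{ 1, 0x40000000ULL, 0x80000000ULL },
};

/* One step of a fallback policy: honour the node and/or the goal. */
struct toy_attempt {
	int use_node;
	int use_goal;
};

/* node+goal, goal, node, any */
static const struct toy_attempt goal_first[4] = {
	{ 1, 1 }, { 0, 1 }, { 1, 0 }, { 0, 0 },
};

/* node+goal, node, goal, any */
static const struct toy_attempt node_first[4] = {
	{ 1, 1 }, { 1, 0 }, { 0, 1 }, { 0, 0 },
};

/* Lowest address at or above 'goal' with 'size' bytes free; 0 on failure. */
static uint64_t toy_find(int nid, uint64_t size, uint64_t goal)
{
	assert(size > 0);

	for (size_t i = 0; i < sizeof(toy_map) / sizeof(toy_map[0]); i++) {
		const struct toy_range *r = &toy_map[i];
		uint64_t base;

		if (nid != ANY_NODE && r->nid != nid)
			continue;
		if (goal >= r->end)
			continue;
		base = r->start > goal ? r->start : goal;
		if (r->end - base >= size)
			return base;
	}
	return 0;
}

/* Walk the four attempts of a policy in order; first success wins. */
static uint64_t toy_alloc_policy(const struct toy_attempt *policy, int nid,
				 uint64_t size, uint64_t goal)
{
	for (size_t i = 0; i < 4; i++) {
		uint64_t addr = toy_find(policy[i].use_node ? nid : ANY_NODE,
					 size,
					 policy[i].use_goal ? goal : 0);
		if (addr)
			return addr;
	}
	return 0;
}
```

For a request on node 0 with a 1 GiB goal, `goal_first` returns an address on node 1 above the goal and so keeps low memory free, while `node_first` stays on node 0 and hands out low memory. That is the trade-off between lowmem preservation and node locality described in this message.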


* Re: [PATCH] mm: nobootmem: Correct alloc_bootmem semantics.
  2012-05-04  9:41         ` Johannes Weiner
@ 2012-05-04 14:46           ` David Miller
  0 siblings, 0 replies; 10+ messages in thread
From: David Miller @ 2012-05-04 14:46 UTC (permalink / raw)
  To: hannes; +Cc: yinghai, linux-kernel, tj, torvalds

From: Johannes Weiner <hannes@cmpxchg.org>
Date: Fri, 4 May 2012 11:41:05 +0200

> I'll send patches to make everything use the same policy.

Thanks for doing this Johannes.


Thread overview: 10+ messages
2012-04-25 20:10 [PATCH] mm: nobootmem: Correct alloc_bootmem semantics David Miller
2012-04-25 20:12 ` Tejun Heo
2012-04-25 22:46 ` Yinghai Lu
2012-04-25 23:00   ` David Miller
2012-04-25 23:14     ` Yinghai Lu
2012-04-25 23:15       ` David Miller
2012-05-03 15:28     ` Johannes Weiner
2012-05-03 17:04       ` David Miller
2012-05-04  9:41         ` Johannes Weiner
2012-05-04 14:46           ` David Miller
