linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH] mm: bootmem: use phys_addr_t for physical addresses
@ 2012-09-12 16:06 Cyril Chemparathy
  2012-09-12 20:39 ` Tejun Heo
  0 siblings, 1 reply; 6+ messages in thread
From: Cyril Chemparathy @ 2012-09-12 16:06 UTC (permalink / raw)
  To: linux-kernel, linux-mm
  Cc: akpm, davem, eric.dumazet, hannes, shangw, tj, vitalya,
	Cyril Chemparathy

From: Vitaly Andrianov <vitalya@ti.com>

On a physical address extended (PAE) systems physical memory may be located
outside the first 4GB address range.  In particular, on TI Keystone devices,
all memory (including lowmem) is located outside the 4G address space. Many
functions in the bootmem.c use unsigned long as a type for physical addresses,
and this breaks badly on such PAE systems.

This patch intensively mangles the bootmem allocator to use phys_addr_t where
necessary.  We are aware that this is most certainly not the way to go
considering that the ARM architecture appears to be moving towards memblock.
Memblock may be a better solution, and fortunately it looks a lot more PAE
savvy than bootmem is.

However, we do not fully understand the motivations and restrictions behind
the mixed bootmem + memblock model in current ARM code. We hope for a
meaningful discussion and useful guidance towards a better solution to this
problem.

Signed-off-by: Vitaly Andrianov <vitalya@ti.com>
Signed-off-by: Cyril Chemparathy <cyril@ti.com>
---
 include/linux/bootmem.h |   30 ++++++++++++------------
 mm/bootmem.c            |   59 ++++++++++++++++++++++++-----------------------
 2 files changed, 45 insertions(+), 44 deletions(-)

diff --git a/include/linux/bootmem.h b/include/linux/bootmem.h
index 6d6795d..e43c463 100644
--- a/include/linux/bootmem.h
+++ b/include/linux/bootmem.h
@@ -49,10 +49,10 @@ extern unsigned long free_all_bootmem_node(pg_data_t *pgdat);
 extern unsigned long free_all_bootmem(void);
 
 extern void free_bootmem_node(pg_data_t *pgdat,
-			      unsigned long addr,
+			      phys_addr_t addr,
 			      unsigned long size);
-extern void free_bootmem(unsigned long addr, unsigned long size);
-extern void free_bootmem_late(unsigned long addr, unsigned long size);
+extern void free_bootmem(phys_addr_t addr, unsigned long size);
+extern void free_bootmem_late(phys_addr_t addr, unsigned long size);
 
 /*
  * Flags for reserve_bootmem (also if CONFIG_HAVE_ARCH_BOOTMEM_NODE,
@@ -65,44 +65,44 @@ extern void free_bootmem_late(unsigned long addr, unsigned long size);
 #define BOOTMEM_DEFAULT		0
 #define BOOTMEM_EXCLUSIVE	(1<<0)
 
-extern int reserve_bootmem(unsigned long addr,
+extern int reserve_bootmem(phys_addr_t addr,
 			   unsigned long size,
 			   int flags);
 extern int reserve_bootmem_node(pg_data_t *pgdat,
-				unsigned long physaddr,
+				phys_addr_t physaddr,
 				unsigned long size,
 				int flags);
 
 extern void *__alloc_bootmem(unsigned long size,
 			     unsigned long align,
-			     unsigned long goal);
+			     phys_addr_t goal);
 extern void *__alloc_bootmem_nopanic(unsigned long size,
 				     unsigned long align,
-				     unsigned long goal);
+				     phys_addr_t goal);
 extern void *__alloc_bootmem_node(pg_data_t *pgdat,
 				  unsigned long size,
 				  unsigned long align,
-				  unsigned long goal);
+				  phys_addr_t goal);
 void *__alloc_bootmem_node_high(pg_data_t *pgdat,
 				  unsigned long size,
 				  unsigned long align,
-				  unsigned long goal);
+				  phys_addr_t goal);
 extern void *__alloc_bootmem_node_nopanic(pg_data_t *pgdat,
 				  unsigned long size,
 				  unsigned long align,
-				  unsigned long goal);
+				  phys_addr_t goal);
 void *___alloc_bootmem_node_nopanic(pg_data_t *pgdat,
 				  unsigned long size,
 				  unsigned long align,
-				  unsigned long goal,
-				  unsigned long limit);
+				  phys_addr_t goal,
+				  phys_addr_t limit);
 extern void *__alloc_bootmem_low(unsigned long size,
 				 unsigned long align,
-				 unsigned long goal);
+				 phys_addr_t goal);
 extern void *__alloc_bootmem_low_node(pg_data_t *pgdat,
 				      unsigned long size,
 				      unsigned long align,
-				      unsigned long goal);
+				      phys_addr_t goal);
 
 #ifdef CONFIG_NO_BOOTMEM
 /* We are using top down, so it is safe to use 0 here */
@@ -137,7 +137,7 @@ extern void *__alloc_bootmem_low_node(pg_data_t *pgdat,
 #define alloc_bootmem_low_pages_node(pgdat, x) \
 	__alloc_bootmem_low_node(pgdat, x, PAGE_SIZE, 0)
 
-extern int reserve_bootmem_generic(unsigned long addr, unsigned long size,
+extern int reserve_bootmem_generic(phys_addr_t addr, unsigned long size,
 				   int flags);
 
 #ifdef CONFIG_HAVE_ARCH_ALLOC_REMAP
diff --git a/mm/bootmem.c b/mm/bootmem.c
index f468185..3a4ee17 100644
--- a/mm/bootmem.c
+++ b/mm/bootmem.c
@@ -154,7 +154,7 @@ unsigned long __init init_bootmem(unsigned long start, unsigned long pages)
  * down, but we are still initializing the system.  Pages are given directly
  * to the page allocator, no bootmem metadata is updated because it is gone.
  */
-void __init free_bootmem_late(unsigned long addr, unsigned long size)
+void __init free_bootmem_late(phys_addr_t addr, unsigned long size)
 {
 	unsigned long cursor, end;
 
@@ -362,7 +362,7 @@ static int __init mark_bootmem(unsigned long start, unsigned long end,
  *
  * The range must reside completely on the specified node.
  */
-void __init free_bootmem_node(pg_data_t *pgdat, unsigned long physaddr,
+void __init free_bootmem_node(pg_data_t *pgdat, phys_addr_t physaddr,
 			      unsigned long size)
 {
 	unsigned long start, end;
@@ -384,7 +384,7 @@ void __init free_bootmem_node(pg_data_t *pgdat, unsigned long physaddr,
  *
  * The range must be contiguous but may span node boundaries.
  */
-void __init free_bootmem(unsigned long addr, unsigned long size)
+void __init free_bootmem(phys_addr_t addr, unsigned long size)
 {
 	unsigned long start, end;
 
@@ -407,7 +407,7 @@ void __init free_bootmem(unsigned long addr, unsigned long size)
  *
  * The range must reside completely on the specified node.
  */
-int __init reserve_bootmem_node(pg_data_t *pgdat, unsigned long physaddr,
+int __init reserve_bootmem_node(pg_data_t *pgdat, phys_addr_t physaddr,
 				 unsigned long size, int flags)
 {
 	unsigned long start, end;
@@ -428,7 +428,7 @@ int __init reserve_bootmem_node(pg_data_t *pgdat, unsigned long physaddr,
  *
  * The range must be contiguous but may span node boundaries.
  */
-int __init reserve_bootmem(unsigned long addr, unsigned long size,
+int __init reserve_bootmem(phys_addr_t addr, unsigned long size,
 			    int flags)
 {
 	unsigned long start, end;
@@ -439,7 +439,7 @@ int __init reserve_bootmem(unsigned long addr, unsigned long size,
 	return mark_bootmem(start, end, 1, flags);
 }
 
-int __weak __init reserve_bootmem_generic(unsigned long phys, unsigned long len,
+int __weak __init reserve_bootmem_generic(phys_addr_t phys, unsigned long len,
 				   int flags)
 {
 	return reserve_bootmem(phys, len, flags);
@@ -461,7 +461,7 @@ static unsigned long __init align_idx(struct bootmem_data *bdata,
 static unsigned long __init align_off(struct bootmem_data *bdata,
 				      unsigned long off, unsigned long align)
 {
-	unsigned long base = PFN_PHYS(bdata->node_min_pfn);
+	phys_addr_t base = PFN_PHYS(bdata->node_min_pfn);
 
 	/* Same as align_idx for byte offsets */
 
@@ -470,14 +470,14 @@ static unsigned long __init align_off(struct bootmem_data *bdata,
 
 static void * __init alloc_bootmem_bdata(struct bootmem_data *bdata,
 					unsigned long size, unsigned long align,
-					unsigned long goal, unsigned long limit)
+					phys_addr_t goal, phys_addr_t limit)
 {
 	unsigned long fallback = 0;
 	unsigned long min, max, start, sidx, midx, step;
 
-	bdebug("nid=%td size=%lx [%lu pages] align=%lx goal=%lx limit=%lx\n",
+	bdebug("nid=%td size=%lx [%lu pages] align=%lx goal=%llx limit=%llx\n",
 		bdata - bootmem_node_data, size, PAGE_ALIGN(size) >> PAGE_SHIFT,
-		align, goal, limit);
+		align, (u64)goal, (u64)limit);
 
 	BUG_ON(!size);
 	BUG_ON(align & (align - 1));
@@ -519,7 +519,8 @@ static void * __init alloc_bootmem_bdata(struct bootmem_data *bdata,
 	while (1) {
 		int merge;
 		void *region;
-		unsigned long eidx, i, start_off, end_off;
+		unsigned long eidx, i;
+		phys_addr_t   start_off, end_off;
 find_block:
 		sidx = find_next_zero_bit(bdata->node_bootmem_map, midx, sidx);
 		sidx = align_idx(bdata, sidx, step);
@@ -577,7 +578,7 @@ find_block:
 
 static void * __init alloc_arch_preferred_bootmem(bootmem_data_t *bdata,
 					unsigned long size, unsigned long align,
-					unsigned long goal, unsigned long limit)
+					phys_addr_t goal, phys_addr_t limit)
 {
 	if (WARN_ON_ONCE(slab_is_available()))
 		return kzalloc(size, GFP_NOWAIT);
@@ -598,8 +599,8 @@ static void * __init alloc_arch_preferred_bootmem(bootmem_data_t *bdata,
 
 static void * __init alloc_bootmem_core(unsigned long size,
 					unsigned long align,
-					unsigned long goal,
-					unsigned long limit)
+					phys_addr_t goal,
+					phys_addr_t limit)
 {
 	bootmem_data_t *bdata;
 	void *region;
@@ -624,8 +625,8 @@ static void * __init alloc_bootmem_core(unsigned long size,
 
 static void * __init ___alloc_bootmem_nopanic(unsigned long size,
 					      unsigned long align,
-					      unsigned long goal,
-					      unsigned long limit)
+					      phys_addr_t goal,
+					      phys_addr_t limit)
 {
 	void *ptr;
 
@@ -655,15 +656,15 @@ restart:
  * Returns NULL on failure.
  */
 void * __init __alloc_bootmem_nopanic(unsigned long size, unsigned long align,
-					unsigned long goal)
+					phys_addr_t goal)
 {
-	unsigned long limit = 0;
+	phys_addr_t limit = 0;
 
 	return ___alloc_bootmem_nopanic(size, align, goal, limit);
 }
 
 static void * __init ___alloc_bootmem(unsigned long size, unsigned long align,
-					unsigned long goal, unsigned long limit)
+					phys_addr_t goal, phys_addr_t limit)
 {
 	void *mem = ___alloc_bootmem_nopanic(size, align, goal, limit);
 
@@ -691,16 +692,16 @@ static void * __init ___alloc_bootmem(unsigned long size, unsigned long align,
  * The function panics if the request can not be satisfied.
  */
 void * __init __alloc_bootmem(unsigned long size, unsigned long align,
-			      unsigned long goal)
+			      phys_addr_t goal)
 {
-	unsigned long limit = 0;
+	phys_addr_t limit = 0;
 
 	return ___alloc_bootmem(size, align, goal, limit);
 }
 
 void * __init ___alloc_bootmem_node_nopanic(pg_data_t *pgdat,
 				unsigned long size, unsigned long align,
-				unsigned long goal, unsigned long limit)
+				phys_addr_t goal, phys_addr_t limit)
 {
 	void *ptr;
 
@@ -731,7 +732,7 @@ again:
 }
 
 void * __init __alloc_bootmem_node_nopanic(pg_data_t *pgdat, unsigned long size,
-				   unsigned long align, unsigned long goal)
+				   unsigned long align, phys_addr_t goal)
 {
 	if (WARN_ON_ONCE(slab_is_available()))
 		return kzalloc_node(size, GFP_NOWAIT, pgdat->node_id);
@@ -740,8 +741,8 @@ void * __init __alloc_bootmem_node_nopanic(pg_data_t *pgdat, unsigned long size,
 }
 
 void * __init ___alloc_bootmem_node(pg_data_t *pgdat, unsigned long size,
-				    unsigned long align, unsigned long goal,
-				    unsigned long limit)
+				    unsigned long align, phys_addr_t goal,
+				    phys_addr_t limit)
 {
 	void *ptr;
 
@@ -770,7 +771,7 @@ void * __init ___alloc_bootmem_node(pg_data_t *pgdat, unsigned long size,
  * The function panics if the request can not be satisfied.
  */
 void * __init __alloc_bootmem_node(pg_data_t *pgdat, unsigned long size,
-				   unsigned long align, unsigned long goal)
+				   unsigned long align, phys_addr_t goal)
 {
 	if (WARN_ON_ONCE(slab_is_available()))
 		return kzalloc_node(size, GFP_NOWAIT, pgdat->node_id);
@@ -779,7 +780,7 @@ void * __init __alloc_bootmem_node(pg_data_t *pgdat, unsigned long size,
 }
 
 void * __init __alloc_bootmem_node_high(pg_data_t *pgdat, unsigned long size,
-				   unsigned long align, unsigned long goal)
+				   unsigned long align, phys_addr_t goal)
 {
 #ifdef MAX_DMA32_PFN
 	unsigned long end_pfn;
@@ -825,7 +826,7 @@ void * __init __alloc_bootmem_node_high(pg_data_t *pgdat, unsigned long size,
  * The function panics if the request can not be satisfied.
  */
 void * __init __alloc_bootmem_low(unsigned long size, unsigned long align,
-				  unsigned long goal)
+				  phys_addr_t goal)
 {
 	return ___alloc_bootmem(size, align, goal, ARCH_LOW_ADDRESS_LIMIT);
 }
@@ -846,7 +847,7 @@ void * __init __alloc_bootmem_low(unsigned long size, unsigned long align,
  * The function panics if the request can not be satisfied.
  */
 void * __init __alloc_bootmem_low_node(pg_data_t *pgdat, unsigned long size,
-				       unsigned long align, unsigned long goal)
+				       unsigned long align, phys_addr_t goal)
 {
 	if (WARN_ON_ONCE(slab_is_available()))
 		return kzalloc_node(size, GFP_NOWAIT, pgdat->node_id);
-- 
1.7.9.5


^ permalink raw reply related	[flat|nested] 6+ messages in thread

* Re: [PATCH] mm: bootmem: use phys_addr_t for physical addresses
  2012-09-12 16:06 [PATCH] mm: bootmem: use phys_addr_t for physical addresses Cyril Chemparathy
@ 2012-09-12 20:39 ` Tejun Heo
  2012-09-13  0:08   ` Cyril Chemparathy
  0 siblings, 1 reply; 6+ messages in thread
From: Tejun Heo @ 2012-09-12 20:39 UTC (permalink / raw)
  To: Cyril Chemparathy
  Cc: linux-kernel, linux-mm, akpm, davem, eric.dumazet, hannes,
	shangw, vitalya

Hello,

On Wed, Sep 12, 2012 at 12:06:48PM -0400, Cyril Chemparathy wrote:
>  static void * __init alloc_bootmem_core(unsigned long size,
>  					unsigned long align,
> -					unsigned long goal,
> -					unsigned long limit)
> +					phys_addr_t goal,
> +					phys_addr_t limit)

So, a function which takes phys_addr_t for goal and limit but returns
void * doesn't make much sense unless the function creates directly
addressable mapping somewhere.

The right thing to do would be converting to nobootmem (ie. memblock)
and use the memblock interface.  Have no idea at all whether that
would be a realistic short-term solution for arm.

Thanks.

-- 
tejun

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [PATCH] mm: bootmem: use phys_addr_t for physical addresses
  2012-09-12 20:39 ` Tejun Heo
@ 2012-09-13  0:08   ` Cyril Chemparathy
  2012-09-13  0:34     ` Tejun Heo
  0 siblings, 1 reply; 6+ messages in thread
From: Cyril Chemparathy @ 2012-09-13  0:08 UTC (permalink / raw)
  To: Tejun Heo
  Cc: linux-kernel, linux-mm, akpm, davem, eric.dumazet, hannes,
	shangw, vitalya

Hi Tejun,

On 9/12/2012 4:39 PM, Tejun Heo wrote:
> Hello,
>
> On Wed, Sep 12, 2012 at 12:06:48PM -0400, Cyril Chemparathy wrote:
>>   static void * __init alloc_bootmem_core(unsigned long size,
>>   					unsigned long align,
>> -					unsigned long goal,
>> -					unsigned long limit)
>> +					phys_addr_t goal,
>> +					phys_addr_t limit)
>
> So, a function which takes phys_addr_t for goal and limit but returns
> void * doesn't make much sense unless the function creates directly
> addressable mapping somewhere.
>

On the 32-bit PAE platform in question, physical memory is located 
outside the 4GB range.  Therefore phys_to_virt takes a 64-bit physical 
address and returns a 32-bit kernel mapped lowmem pointer.

> The right thing to do would be converting to nobootmem (ie. memblock)
> and use the memblock interface.  Have no idea at all whether that
> would be a realistic short-term solution for arm.
>

I must plead ignorance and let wiser souls chime in on ARM architecture 
plans w.r.t. nobootmem.  As far as I can tell, the only thing that 
blocks us from using nobootmem at present is the need for sparsemem on 
some platforms.

-- 
Thanks
- Cyril

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [PATCH] mm: bootmem: use phys_addr_t for physical addresses
  2012-09-13  0:08   ` Cyril Chemparathy
@ 2012-09-13  0:34     ` Tejun Heo
  2012-09-13  0:40       ` Cyril Chemparathy
  0 siblings, 1 reply; 6+ messages in thread
From: Tejun Heo @ 2012-09-13  0:34 UTC (permalink / raw)
  To: Cyril Chemparathy
  Cc: linux-kernel, linux-mm, akpm, davem, eric.dumazet, hannes,
	shangw, vitalya

Hello,

On Wed, Sep 12, 2012 at 08:08:30PM -0400, Cyril Chemparathy wrote:
> >So, a function which takes phys_addr_t for goal and limit but returns
> >void * doesn't make much sense unless the function creates directly
> >addressable mapping somewhere.
> 
> On the 32-bit PAE platform in question, physical memory is located
> outside the 4GB range.  Therefore phys_to_virt takes a 64-bit
> physical address and returns a 32-bit kernel mapped lowmem pointer.

Yes but phys_to_virt() can return the vaddr only if the physical
address is already mapped in the kernel address space; otherwise, you
need one of the kmap*() calls which may not be online early in the
boot and consumes either the vmalloc area or fixmaps.  bootmem
interface can't handle unmapped memory.

Thanks.

-- 
tejun

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [PATCH] mm: bootmem: use phys_addr_t for physical addresses
  2012-09-13  0:34     ` Tejun Heo
@ 2012-09-13  0:40       ` Cyril Chemparathy
  2012-09-13 19:32         ` Tejun Heo
  0 siblings, 1 reply; 6+ messages in thread
From: Cyril Chemparathy @ 2012-09-13  0:40 UTC (permalink / raw)
  To: Tejun Heo
  Cc: linux-kernel, linux-mm, akpm, davem, eric.dumazet, hannes,
	shangw, vitalya

Hi Tejun,

On 9/12/2012 8:34 PM, Tejun Heo wrote:
> Hello,
>
> On Wed, Sep 12, 2012 at 08:08:30PM -0400, Cyril Chemparathy wrote:
>>> So, a function which takes phys_addr_t for goal and limit but returns
>>> void * doesn't make much sense unless the function creates directly
>>> addressable mapping somewhere.
>>
>> On the 32-bit PAE platform in question, physical memory is located
>> outside the 4GB range.  Therefore phys_to_virt takes a 64-bit
>> physical address and returns a 32-bit kernel mapped lowmem pointer.
>
> Yes but phys_to_virt() can return the vaddr only if the physical
> address is already mapped in the kernel address space; otherwise, you
> need one of the kmap*() calls which may not be online early in the
> boot and consumes either the vmalloc area or fixmaps.  bootmem
> interface can't handle unmapped memory.
>

You probably missed the lowmem bit from my response?

This system has all of its memory outside the 4GB physical address 
space.  This includes lowmem, which is permanently mapped into the 
kernel virtual address space as usual.

-- 
Thanks
- Cyril

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [PATCH] mm: bootmem: use phys_addr_t for physical addresses
  2012-09-13  0:40       ` Cyril Chemparathy
@ 2012-09-13 19:32         ` Tejun Heo
  0 siblings, 0 replies; 6+ messages in thread
From: Tejun Heo @ 2012-09-13 19:32 UTC (permalink / raw)
  To: Cyril Chemparathy
  Cc: linux-kernel, linux-mm, akpm, davem, eric.dumazet, hannes,
	shangw, vitalya

Hello, Cyril.

On Wed, Sep 12, 2012 at 08:40:58PM -0400, Cyril Chemparathy wrote:
> You probably missed the lowmem bit from my response?
> 
> This system has all of its memory outside the 4GB physical address
> space.  This includes lowmem, which is permanently mapped into the
> kernel virtual address space as usual.

Yeah, I understand that and as a short-term solution we maybe can add
a check to verify that the goal and limits are under lowmem and fail
with NULL if not, but it still is a broken interface and I'd rather
not mess with it when memblock is already there.  Converting to
memblock usually isn't too much work although it expectedly involves
some subtleties and fallouts for a while.

Do you recall what the problem was with sparsemem and memblock?  I
don't think I'll directly work on arm but I'll be happy to help on
memblock issues.

Thanks.

-- 
tejun

^ permalink raw reply	[flat|nested] 6+ messages in thread

end of thread, other threads:[~2012-09-13 19:32 UTC | newest]

Thread overview: 6+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2012-09-12 16:06 [PATCH] mm: bootmem: use phys_addr_t for physical addresses Cyril Chemparathy
2012-09-12 20:39 ` Tejun Heo
2012-09-13  0:08   ` Cyril Chemparathy
2012-09-13  0:34     ` Tejun Heo
2012-09-13  0:40       ` Cyril Chemparathy
2012-09-13 19:32         ` Tejun Heo

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).