From: Yinghai Lu <yinghai@kernel.org>
To: Tejun Heo <tj@kernel.org>
Cc: Gavin Shan <shangw@linux.vnet.ibm.com>,
	Sasha Levin <levinsasha928@gmail.com>,
	Andrew Morton <akpm@linux-foundation.org>,
	David Miller <davem@davemloft.net>,
	hpa@linux.intel.com, linux-mm <linux-mm@kvack.org>,
	"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>
Subject: Re: Early boot panic on machine with lots of memory
Date: Thu, 21 Jun 2012 18:47:24 -0700	[thread overview]
Message-ID: <CAE9FiQXubmnKHjnqOxVeoJknJZFNuStCcW=1XC6jLE7eznkTmg@mail.gmail.com> (raw)
In-Reply-To: <20120621201728.GB4642@google.com>

[-- Attachment #1: Type: text/plain, Size: 3421 bytes --]

On Thu, Jun 21, 2012 at 1:17 PM, Tejun Heo <tj@kernel.org> wrote:
> Hello, Yinghai.
>
> On Tue, Jun 19, 2012 at 07:57:45PM -0700, Yinghai Lu wrote:
>> if that is the case, that change could fix another problem too:
>> --- during that free, reserved.regions could get doubled (reallocated).
>
> Yeah, that sounds much more attractive to me too.  Some comments on
> the patch tho.
>
>>  /**
>>   * memblock_double_array - double the size of the memblock regions array
>>   * @type: memblock type of the regions array being doubled
>> @@ -216,7 +204,7 @@ static int __init_memblock memblock_doub
>>
>>       /* Calculate new doubled size */
>>       old_size = type->max * sizeof(struct memblock_region);
>> -     new_size = old_size << 1;
>> +     new_size = PAGE_ALIGN(old_size << 1);
>
> We definitely can use some comments explaining why we want page
> alignment.  It's kinda subtle.

yes.

>
> This is a bit confusing here because old_size is the proper size
> without padding while new_size is page aligned size with possible
> padding.  Maybe discerning {old|new}_alloc_size is clearer?  Also, I
> think adding @new_cnt variable which is calculated together would make
> the code easier to follow.  So, sth like,
>
>        /* explain why page aligning is necessary */
>        old_size = type->max * sizeof(struct memblock_region);
>        old_alloc_size = PAGE_ALIGN(old_size);
>
>        new_max = type->max << 1;
>        new_size = new_max * sizeof(struct memblock_region);
>        new_alloc_size = PAGE_ALIGN(new_size);
>
> and use alloc_sizes for alloc/frees and sizes for everything else.

ok, will add new_alloc_size, old_alloc_size.
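
Something like this for the relevant lines (rough sketch only; the
attached patch has the real thing, and start/end here just stand in for
the actual range arguments):

	old_size = type->max * sizeof(struct memblock_region);
	old_alloc_size = PAGE_ALIGN(old_size);
	new_size = (type->max << 1) * sizeof(struct memblock_region);
	new_alloc_size = PAGE_ALIGN(new_size);

	/* alloc/free/reserve work on the page-aligned sizes ... */
	addr = memblock_find_in_range(start, end, new_alloc_size, PAGE_SIZE);
	memblock_free(__pa(old_array), old_alloc_size);
	BUG_ON(memblock_reserve(addr, new_alloc_size));

	/* ... while copy/clear keep using the exact sizes */
	memcpy(new_array, type->regions, old_size);
	memset(new_array + type->max, 0, old_size);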

>
>>  unsigned long __init free_low_memory_core_early(int nodeid)
>>  {
>>       unsigned long count = 0;
>> -     phys_addr_t start, end;
>> +     phys_addr_t start, end, size;
>>       u64 i;
>>
>> -     /* free reserved array temporarily so that it's treated as free area */
>> -     memblock_free_reserved_regions();
>> +     for_each_free_mem_range(i, MAX_NUMNODES, &start, &end, NULL)
>> +             count += __free_memory_core(start, end);
>>
>> -     for_each_free_mem_range(i, MAX_NUMNODES, &start, &end, NULL) {
>> -             unsigned long start_pfn = PFN_UP(start);
>> -             unsigned long end_pfn = min_t(unsigned long,
>> -                                           PFN_DOWN(end), max_low_pfn);
>> -             if (start_pfn < end_pfn) {
>> -                     __free_pages_memory(start_pfn, end_pfn);
>> -                     count += end_pfn - start_pfn;
>> -             }
>> -     }
>> +     /* free range that is used for reserved array if we allocate it */
>> +     size = get_allocated_memblock_reserved_regions_info(&start);
>> +     if (size)
>> +             count += __free_memory_core(start, start + size);
>
> I'm afraid this is too early.  We don't want the region to be unmapped
> yet.  This should only happen after all memblock usages are finished
> which I don't think is the case yet.

No, it is not too early. At that point, memblock usage is done.

Also, I tested on one system with huge memory and reproduced on KVM the
problem that Sasha hit; my patch fixes it.

Please check the attached patch.

I also added another patch to double-check whether anything still
references memblock.reserved after it is cleared; so far no such
reference has been found.

Thanks

Yinghai

[-- Attachment #2: fix_free_memblock_reserve_v4_5.patch --]
[-- Type: application/octet-stream, Size: 6770 bytes --]

Subject: [PATCH] memblock: free allocated memblock_reserved_regions later

memblock_free_reserved_regions() calls memblock_free(), but
memblock_free() can itself double reserved.regions, which could end up
freeing the old range used for reserved.regions.

Also, tj said there is another bug that could be related to this:

| I don't think we're saving any noticeable
| amount by doing this "free - give it to page allocator - reserve
| again" dancing.  We should just allocate regions aligned to page
| boundaries and free them later when memblock is no longer in use.

So allocate the regions array with PAGE_SIZE alignment, and free it
later once memblock is no longer in use.
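
Roughly, the failure mode with the old free/reserve dance was (sketch of
the call chain as I understand it, not exact code):

	free_low_memory_core_early()
	  memblock_free_reserved_regions()
	    memblock_free(__pa(reserved.regions), ...)
	      /* freeing a sub-range can split a region and grow
	       * reserved.cnt, which may trigger ...
	       */
	      memblock_double_array(&memblock.reserved, ...)
	        /* ... which moves the array and frees the old range
	         * while we are still in the middle of handing that
	         * same range back as free memory
	         */
	  ...
	  memblock_reserve_reserved_regions()	/* can act on stale info */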

-v5: Use new_alloc_size and old_alloc_size to simplify things, per tj.

Cc: Tejun Heo <tj@kernel.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Yinghai Lu <yinghai@kernel.org>

---
 include/linux/memblock.h |    4 ---
 mm/memblock.c            |   51 +++++++++++++++++++++--------------------------
 mm/nobootmem.c           |   36 ++++++++++++++++++++-------------
 3 files changed, 46 insertions(+), 45 deletions(-)

Index: linux-2.6/include/linux/memblock.h
===================================================================
--- linux-2.6.orig/include/linux/memblock.h
+++ linux-2.6/include/linux/memblock.h
@@ -50,9 +50,7 @@ phys_addr_t memblock_find_in_range_node(
 				phys_addr_t size, phys_addr_t align, int nid);
 phys_addr_t memblock_find_in_range(phys_addr_t start, phys_addr_t end,
 				   phys_addr_t size, phys_addr_t align);
-int memblock_free_reserved_regions(void);
-int memblock_reserve_reserved_regions(void);
-
+phys_addr_t get_allocated_memblock_reserved_regions_info(phys_addr_t *addr);
 void memblock_allow_resize(void);
 int memblock_add_node(phys_addr_t base, phys_addr_t size, int nid);
 int memblock_add(phys_addr_t base, phys_addr_t size);
Index: linux-2.6/mm/memblock.c
===================================================================
--- linux-2.6.orig/mm/memblock.c
+++ linux-2.6/mm/memblock.c
@@ -143,30 +143,6 @@ phys_addr_t __init_memblock memblock_fin
 					   MAX_NUMNODES);
 }
 
-/*
- * Free memblock.reserved.regions
- */
-int __init_memblock memblock_free_reserved_regions(void)
-{
-	if (memblock.reserved.regions == memblock_reserved_init_regions)
-		return 0;
-
-	return memblock_free(__pa(memblock.reserved.regions),
-		 sizeof(struct memblock_region) * memblock.reserved.max);
-}
-
-/*
- * Reserve memblock.reserved.regions
- */
-int __init_memblock memblock_reserve_reserved_regions(void)
-{
-	if (memblock.reserved.regions == memblock_reserved_init_regions)
-		return 0;
-
-	return memblock_reserve(__pa(memblock.reserved.regions),
-		 sizeof(struct memblock_region) * memblock.reserved.max);
-}
-
 static void __init_memblock memblock_remove_region(struct memblock_type *type, unsigned long r)
 {
 	type->total_size -= type->regions[r].size;
@@ -184,6 +160,18 @@ static void __init_memblock memblock_rem
 	}
 }
 
+phys_addr_t __init_memblock get_allocated_memblock_reserved_regions_info(
+					phys_addr_t *addr)
+{
+	if (memblock.reserved.regions == memblock_reserved_init_regions)
+		return 0;
+
+	*addr = __pa(memblock.reserved.regions);
+
+	return PAGE_ALIGN(sizeof(struct memblock_region) *
+			  memblock.reserved.max);
+}
+
 /**
  * memblock_double_array - double the size of the memblock regions array
  * @type: memblock type of the regions array being doubled
@@ -204,6 +192,7 @@ static int __init_memblock memblock_doub
 						phys_addr_t new_area_size)
 {
 	struct memblock_region *new_array, *old_array;
+	phys_addr_t old_alloc_size, new_alloc_size;
 	phys_addr_t old_size, new_size, addr;
 	int use_slab = slab_is_available();
 	int *in_slab;
@@ -217,6 +206,12 @@ static int __init_memblock memblock_doub
 	/* Calculate new doubled size */
 	old_size = type->max * sizeof(struct memblock_region);
 	new_size = old_size << 1;
+	/*
+	 * We need to allocate the new array aligned to PAGE_SIZE,
+	 * so that it can be freed completely later.
+	 */
+	old_alloc_size = PAGE_ALIGN(old_size);
+	new_alloc_size = PAGE_ALIGN(new_size);
 
 	/* Retrieve the slab flag */
 	if (type == &memblock.memory)
@@ -245,11 +240,11 @@ static int __init_memblock memblock_doub
 
 		addr = memblock_find_in_range(new_area_start + new_area_size,
 						memblock.current_limit,
-						new_size, sizeof(phys_addr_t));
+						new_alloc_size, PAGE_SIZE);
 		if (!addr && new_area_size)
 			addr = memblock_find_in_range(0,
 					min(new_area_start, memblock.current_limit),
-					new_size, sizeof(phys_addr_t));
+					new_alloc_size, PAGE_SIZE);
 
 		new_array = addr ? __va(addr) : 0;
 	}
@@ -279,13 +274,13 @@ static int __init_memblock memblock_doub
 		kfree(old_array);
 	else if (old_array != memblock_memory_init_regions &&
 		 old_array != memblock_reserved_init_regions)
-		memblock_free(__pa(old_array), old_size);
+		memblock_free(__pa(old_array), old_alloc_size);
 
 	/* Reserve the new array if that comes from the memblock.
 	 * Otherwise, we needn't do it
 	 */
 	if (!use_slab)
-		BUG_ON(memblock_reserve(addr, new_size));
+		BUG_ON(memblock_reserve(addr, new_alloc_size));
 
 	/* Update slab flag */
 	*in_slab = use_slab;
Index: linux-2.6/mm/nobootmem.c
===================================================================
--- linux-2.6.orig/mm/nobootmem.c
+++ linux-2.6/mm/nobootmem.c
@@ -105,27 +105,35 @@ static void __init __free_pages_memory(u
 		__free_pages_bootmem(pfn_to_page(i), 0);
 }
 
+static unsigned long __init __free_memory_core(phys_addr_t start,
+				 phys_addr_t end)
+{
+	unsigned long start_pfn = PFN_UP(start);
+	unsigned long end_pfn = min_t(unsigned long,
+				      PFN_DOWN(end), max_low_pfn);
+
+	if (start_pfn > end_pfn)
+		return 0;
+
+	__free_pages_memory(start_pfn, end_pfn);
+
+	return end_pfn - start_pfn;
+}
+
 unsigned long __init free_low_memory_core_early(int nodeid)
 {
 	unsigned long count = 0;
-	phys_addr_t start, end;
+	phys_addr_t start, end, size;
 	u64 i;
 
-	/* free reserved array temporarily so that it's treated as free area */
-	memblock_free_reserved_regions();
+	for_each_free_mem_range(i, MAX_NUMNODES, &start, &end, NULL)
+		count += __free_memory_core(start, end);
 
-	for_each_free_mem_range(i, MAX_NUMNODES, &start, &end, NULL) {
-		unsigned long start_pfn = PFN_UP(start);
-		unsigned long end_pfn = min_t(unsigned long,
-					      PFN_DOWN(end), max_low_pfn);
-		if (start_pfn < end_pfn) {
-			__free_pages_memory(start_pfn, end_pfn);
-			count += end_pfn - start_pfn;
-		}
-	}
+	/* free range that is used for reserved array if we allocate it */
+	size = get_allocated_memblock_reserved_regions_info(&start);
+	if (size)
+		count += __free_memory_core(start, start + size);
 
-	/* put region array back? */
-	memblock_reserve_reserved_regions();
 	return count;
 }
 

[-- Attachment #3: memblock_reserved_clear_check.patch --]
[-- Type: application/octet-stream, Size: 3926 bytes --]

Subject: [PATCH] memblock: Add checks to catch illegal use of memblock.reserved

After memblock is no longer in use, clear memblock.reserved so that any
later (incorrect) use of it will be caught.
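
For illustration, the kind of late use this should catch (hypothetical
caller, not code in the tree):

	/* after mem_init() has called memblock_clear_reserved() ... */
	memblock_reserve(base, size);
		/* memblock.reserved.max is now 0, so the new
		 * WARN_ONCE(!memblock.reserved.max, ...) in
		 * memblock_reserve() fires and flags the bad call.
		 */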

Signed-off-by: Yinghai Lu <yinghai@kernel.org>

---
 arch/x86/mm/init_32.c    |    3 +++
 arch/x86/mm/init_64.c    |    2 ++
 include/linux/memblock.h |    1 +
 mm/memblock.c            |   15 +++++++++++++++
 4 files changed, 21 insertions(+)

Index: linux-2.6/arch/x86/mm/init_32.c
===================================================================
--- linux-2.6.orig/arch/x86/mm/init_32.c
+++ linux-2.6/arch/x86/mm/init_32.c
@@ -759,6 +759,9 @@ void __init mem_init(void)
 		if (page_is_ram(tmp) && PageReserved(pfn_to_page(tmp)))
 			reservedpages++;
 
+	/* clear reserved to catch wrong usage */
+	memblock_clear_reserved();
+
 	codesize =  (unsigned long) &_etext - (unsigned long) &_text;
 	datasize =  (unsigned long) &_edata - (unsigned long) &_etext;
 	initsize =  (unsigned long) &__init_end - (unsigned long) &__init_begin;
Index: linux-2.6/arch/x86/mm/init_64.c
===================================================================
--- linux-2.6.orig/arch/x86/mm/init_64.c
+++ linux-2.6/arch/x86/mm/init_64.c
@@ -699,6 +699,8 @@ void __init mem_init(void)
 
 	absent_pages = absent_pages_in_range(0, max_pfn);
 	reservedpages = max_pfn - totalram_pages - absent_pages;
+	/* clear reserved to catch wrong usage */
+	memblock_clear_reserved();
 	after_bootmem = 1;
 
 	codesize =  (unsigned long) &_etext - (unsigned long) &_text;
Index: linux-2.6/include/linux/memblock.h
===================================================================
--- linux-2.6.orig/include/linux/memblock.h
+++ linux-2.6/include/linux/memblock.h
@@ -46,6 +46,7 @@ extern int memblock_debug;
 #define memblock_dbg(fmt, ...) \
 	if (memblock_debug) printk(KERN_INFO pr_fmt(fmt), ##__VA_ARGS__)
 
+void memblock_clear_reserved(void);
 phys_addr_t memblock_find_in_range_node(phys_addr_t start, phys_addr_t end,
 				phys_addr_t size, phys_addr_t align, int nid);
 phys_addr_t memblock_find_in_range(phys_addr_t start, phys_addr_t end,
Index: linux-2.6/mm/memblock.c
===================================================================
--- linux-2.6.orig/mm/memblock.c
+++ linux-2.6/mm/memblock.c
@@ -101,6 +101,8 @@ phys_addr_t __init_memblock memblock_fin
 	phys_addr_t this_start, this_end, cand;
 	u64 i;
 
+	WARN_ONCE(!memblock.reserved.max, "memblock.reserved was cleared already!");
+
 	/* pump up @end */
 	if (end == MEMBLOCK_ALLOC_ACCESSIBLE)
 		end = memblock.current_limit;
@@ -143,6 +145,14 @@ phys_addr_t __init_memblock memblock_fin
 					   MAX_NUMNODES);
 }
 
+/*
+ * Clear memblock.reserved
+ */
+void __init_memblock memblock_clear_reserved(void)
+{
+	memset(&memblock.reserved, 0, sizeof(memblock.reserved));
+}
+
 static void __init_memblock memblock_remove_region(struct memblock_type *type, unsigned long r)
 {
 	type->total_size -= type->regions[r].size;
@@ -535,6 +545,8 @@ int __init_memblock memblock_remove(phys
 
 int __init_memblock memblock_free(phys_addr_t base, phys_addr_t size)
 {
+	WARN_ONCE(!memblock.reserved.max, "memblock.reserved was cleared already!");
+
 	memblock_dbg("   memblock_free: [%#016llx-%#016llx] %pF\n",
 		     (unsigned long long)base,
 		     (unsigned long long)base + size,
@@ -547,6 +559,7 @@ int __init_memblock memblock_reserve(phy
 {
 	struct memblock_type *_rgn = &memblock.reserved;
 
+	WARN_ONCE(!memblock.reserved.max, "memblock.reserved was cleared already!");
 	memblock_dbg("memblock_reserve: [%#016llx-%#016llx] %pF\n",
 		     (unsigned long long)base,
 		     (unsigned long long)base + size,
@@ -587,6 +600,8 @@ void __init_memblock __next_free_mem_ran
 	int mi = *idx & 0xffffffff;
 	int ri = *idx >> 32;
 
+	WARN_ONCE(!rsv->max, "memblock.reserved was cleared already!");
+
 	for ( ; mi < mem->cnt; mi++) {
 		struct memblock_region *m = &mem->regions[mi];
 		phys_addr_t m_start = m->base;

Thread overview: 71+ messages
2012-06-13 21:38 Early boot panic on machine with lots of memory Sasha Levin
2012-06-14  3:20 ` Tejun Heo
2012-06-14  9:50   ` Sasha Levin
2012-06-14 20:56     ` Yinghai Lu
2012-06-14 21:34       ` Sasha Levin
2012-06-14 23:57         ` Yinghai Lu
2012-06-15  0:59           ` Sasha Levin
2012-06-15  2:21             ` Yinghai Lu
2012-06-15  7:41               ` Sasha Levin
2012-06-18 22:32     ` Tejun Heo
2012-06-18 22:50       ` Sasha Levin
2012-06-19  4:11         ` Gavin Shan
2012-06-19  5:43           ` Yinghai Lu
2012-06-19  6:09             ` Gavin Shan
2012-06-19 18:12               ` Yinghai Lu
2012-06-19 21:20           ` Tejun Heo
2012-06-19 21:26             ` Tejun Heo
2012-06-20  2:57               ` Yinghai Lu
2012-06-21 20:17                 ` Tejun Heo
2012-06-22  1:47                   ` Yinghai Lu [this message]
2012-06-22  1:58                     ` Yinghai Lu
2012-06-22 18:51                     ` Tejun Heo
2012-06-22 19:23                       ` Yinghai Lu
2012-06-22 19:29                         ` Tejun Heo
2012-06-22 20:01                           ` Yinghai Lu
2012-06-22 20:14                             ` Tejun Heo
2012-06-22 20:23                               ` Yinghai Lu
2012-06-23  2:14                           ` Yinghai Lu
2012-06-27 18:13                             ` Tejun Heo
2012-06-27 19:22                               ` Yinghai Lu
2012-06-27 19:26                                 ` Tejun Heo
2012-06-27 21:15                                   ` Yinghai Lu
2012-06-29 18:27                                     ` [PATCH for -3.5] memblock: free allocated memblock_reserved_regions later Yinghai Lu
2012-06-29 18:32                                       ` Tejun Heo
2012-06-29 18:38                                         ` Yinghai Lu
2012-06-21 20:19             ` Early boot panic on machine with lots of memory Tejun Heo
2012-06-22 10:29               ` Sasha Levin
2012-06-22 18:15                 ` Yinghai Lu
