* [PATCH] mm: page_ext: allocate page extension though first PFN is invalid
       [not found] <CGME20171107094311epcas1p4a5dd975d6e9f3618a26a0a5d68c68b55@epcas1p4.samsung.com>
@ 2017-11-07  9:44   ` Jaewon Kim
  0 siblings, 0 replies; 16+ messages in thread
From: Jaewon Kim @ 2017-11-07  9:44 UTC (permalink / raw)
  To: akpm
  Cc: mhocko, vbabka, minchan, linux-mm, linux-kernel, jaewon31.kim,
	Jaewon Kim

online_page_ext and page_ext_init allocate page_ext for each section, but
they skip the allocation if the section's first PFN is !pfn_present(pfn) or
!pfn_valid(pfn).

Even though the first page is not valid, page_ext could still be useful for
other pages in the section. But checking all PFNs in a section may be a
time-consuming job. Let's check every (PAGES_PER_SECTION / 16)th PFN, then
prepare page_ext if any of the sampled PFNs is present or valid.

Signed-off-by: Jaewon Kim <jaewon31.kim@samsung.com>
---
 mm/page_ext.c | 25 ++++++++++++++++++++++---
 1 file changed, 22 insertions(+), 3 deletions(-)

diff --git a/mm/page_ext.c b/mm/page_ext.c
index 32f18911deda..634f9c5a8b9b 100644
--- a/mm/page_ext.c
+++ b/mm/page_ext.c
@@ -312,7 +312,17 @@ static int __meminit online_page_ext(unsigned long start_pfn,
 	}
 
 	for (pfn = start; !fail && pfn < end; pfn += PAGES_PER_SECTION) {
-		if (!pfn_present(pfn))
+		unsigned long t_pfn = pfn;
+		bool present = false;
+
+		while (t_pfn <	ALIGN(pfn + 1, PAGES_PER_SECTION)) {
+			if (pfn_present(t_pfn)) {
+				present = true;
+				break;
+			}
+			t_pfn = ALIGN(pfn + 1, PAGES_PER_SECTION >> 4);
+		}
+		if (!present)
 			continue;
 		fail = init_section_page_ext(pfn, nid);
 	}
@@ -391,8 +401,17 @@ void __init page_ext_init(void)
 		 */
 		for (pfn = start_pfn; pfn < end_pfn;
 			pfn = ALIGN(pfn + 1, PAGES_PER_SECTION)) {
-
-			if (!pfn_valid(pfn))
+			unsigned long t_pfn = pfn;
+			bool valid = false;
+
+			while (t_pfn <	ALIGN(pfn + 1, PAGES_PER_SECTION)) {
+				if (pfn_valid(t_pfn)) {
+					valid = true;
+					break;
+				}
+				t_pfn = ALIGN(pfn + 1, PAGES_PER_SECTION >> 4);
+			}
+			if (!valid)
 				continue;
 			/*
 			 * Nodes's pfns can be overlapping.
-- 
2.13.0
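
The sampling loop above, pulled out as a minimal standalone sketch (an
editorial sketch, not part of the patch; the PAGES_PER_SECTION value and
the toy pfn_present() below are assumptions for illustration). Note that
the posted step expression ALIGN(pfn + 1, PAGES_PER_SECTION >> 4) does not
depend on t_pfn, so t_pfn stops advancing after the first step and the
while loop can spin forever on a sparse section; the sketch steps from
t_pfn instead, which appears to be the intent:

#include <stdbool.h>
#include <stdio.h>

#define PAGES_PER_SECTION	0x8000UL	/* assumed example value */
#define ALIGN(x, a)	(((x) + (a) - 1) & ~((a) - 1))

/* toy stand-in for the kernel's pfn_present(): pretend only the upper
 * half of each section has a memory map */
static bool pfn_present(unsigned long pfn)
{
	return (pfn & (PAGES_PER_SECTION - 1)) >= PAGES_PER_SECTION / 2;
}

/* probe every (PAGES_PER_SECTION / 16)-th PFN of the section at 'pfn' */
static bool section_has_present_pfn(unsigned long pfn)
{
	unsigned long t_pfn = pfn;
	unsigned long end = ALIGN(pfn + 1, PAGES_PER_SECTION);

	while (t_pfn < end) {
		if (pfn_present(t_pfn))
			return true;
		t_pfn = ALIGN(t_pfn + 1, PAGES_PER_SECTION >> 4);
	}
	return false;
}

int main(void)
{
	printf("%d\n", section_has_present_pfn(0x100000));	/* prints 1 */
	return 0;
}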

* Re: [PATCH] mm: page_ext: allocate page extension though first PFN is invalid
  2017-11-07  9:44   ` Jaewon Kim
@ 2017-11-08  7:52     ` Joonsoo Kim
  -1 siblings, 0 replies; 16+ messages in thread
From: Joonsoo Kim @ 2017-11-08  7:52 UTC (permalink / raw)
  To: Jaewon Kim
  Cc: akpm, mhocko, vbabka, minchan, linux-mm, linux-kernel, jaewon31.kim

On Tue, Nov 07, 2017 at 06:44:47PM +0900, Jaewon Kim wrote:
> online_page_ext and page_ext_init allocate page_ext for each section, but
> they do not allocate if the first PFN is !pfn_present(pfn) or
> !pfn_valid(pfn).
> 
> Though the first page is not valid, page_ext could be useful for other
> pages in the section. But checking all PFNs in a section may be time
> consuming job. Let's check each (section count / 16) PFN, then prepare
> page_ext if any PFN is present or valid.

I guess there are not many sections like this. And this is for
debugging, so completeness would be important. It's better to check
all PFNs in the section.

Thanks.

* Re: [PATCH] mm: page_ext: allocate page extension though first PFN is invalid
  2017-11-08  7:52     ` Joonsoo Kim
@ 2017-11-08 13:33       ` Jaewon Kim
  -1 siblings, 0 replies; 16+ messages in thread
From: Jaewon Kim @ 2017-11-08 13:33 UTC (permalink / raw)
  To: Joonsoo Kim
  Cc: Jaewon Kim, Andrew Morton, mhocko, vbabka, minchan, linux-mm,
	linux-kernel

2017-11-08 16:52 GMT+09:00 Joonsoo Kim <iamjoonsoo.kim@lge.com>:
> On Tue, Nov 07, 2017 at 06:44:47PM +0900, Jaewon Kim wrote:
>> online_page_ext and page_ext_init allocate page_ext for each section, but
>> they do not allocate if the first PFN is !pfn_present(pfn) or
>> !pfn_valid(pfn).
>>
>> Though the first page is not valid, page_ext could be useful for other
>> pages in the section. But checking all PFNs in a section may be time
>> consuming job. Let's check each (section count / 16) PFN, then prepare
>> page_ext if any PFN is present or valid.
>
> I guess that this kind of section is not so many. And, this is for
> debugging so completeness would be important. It's better to check
> all pfn in the section.
Thank you for your comment.

AFAIK the physical memory layout depends on the HW SoC. Sometimes a SoC
leaves an address hole of a few GB between one DRAM region and another,
such as 2GB below the 4GB address boundary and 2GB above it with a hole
in between. If a SoC puts such a big hole between the actual mappings, I
thought too much time would be spent just checking all the PFNs.

Anyway, if we decide to check all PFNs, I can change the patch to t_pfn++
as below. Please comment again.


while (t_pfn <  ALIGN(pfn + 1, PAGES_PER_SECTION)) {
        if (pfn_valid(t_pfn)) {
                valid = true;
                break;
        }
-        t_pfn = ALIGN(pfn + 1, PAGES_PER_SECTION >> 4);
+        t_pfn++;


Thank you
Jaewon Kim

>
> Thanks.
>

* Re: [PATCH] mm: page_ext: allocate page extension though first PFN is invalid
  2017-11-08 13:33       ` Jaewon Kim
@ 2017-11-09  4:33         ` Joonsoo Kim
  -1 siblings, 0 replies; 16+ messages in thread
From: Joonsoo Kim @ 2017-11-09  4:33 UTC (permalink / raw)
  To: Jaewon Kim
  Cc: Jaewon Kim, Andrew Morton, mhocko, vbabka, minchan, linux-mm,
	linux-kernel

On Wed, Nov 08, 2017 at 10:33:51PM +0900, Jaewon Kim wrote:
> 2017-11-08 16:52 GMT+09:00 Joonsoo Kim <iamjoonsoo.kim@lge.com>:
> > On Tue, Nov 07, 2017 at 06:44:47PM +0900, Jaewon Kim wrote:
> >> online_page_ext and page_ext_init allocate page_ext for each section, but
> >> they do not allocate if the first PFN is !pfn_present(pfn) or
> >> !pfn_valid(pfn).
> >>
> >> Though the first page is not valid, page_ext could be useful for other
> >> pages in the section. But checking all PFNs in a section may be time
> >> consuming job. Let's check each (section count / 16) PFN, then prepare
> >> page_ext if any PFN is present or valid.
> >
> > I guess that this kind of section is not so many. And, this is for
> > debugging so completeness would be important. It's better to check
> > all pfn in the section.
> Thank you for your comment.
> 
> AFAIK physical memory address depends on HW SoC.
> Sometimes a SoC remains few GB address region hole between few GB DRAM
> and other few GB DRAM
> such as 2GB under 4GB address and 2GB beyond 4GB address and holes between them.
> If SoC designs so big hole between actual mapping, I thought too much
> time will be spent on just checking all the PFNs.

I don't think that it is painful because it is done just once at the
initialization step. However, if you worry about it, we can use
pfn_present() to skip the whole section at once: !pfn_present()
guarantees that there is no valid PFN in the section. If pfn_present()
returns true, we need to search all the pages in the section in order
to find a valid PFN.

And I think that we don't need to change online_page_ext(). AFAIK,
hotplug always adds section-aligned memory, so the pfn_present() check
should be enough.

Thanks.
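
Sketched against the v4.14-era helpers, the scheme described above could
look like this (an editorial sketch, not code from the thread;
section_has_valid_pfn is a hypothetical name and start_pfn is assumed to
be section-aligned):

/*
 * Skip a section cheaply when it has no memory map at all; only when
 * pfn_present() says the section is there do we pay for the exhaustive
 * pfn_valid() scan, keeping the debugging coverage complete.
 */
static bool __init section_has_valid_pfn(unsigned long start_pfn)
{
	unsigned long pfn;
	unsigned long end = start_pfn + PAGES_PER_SECTION;

	if (!pfn_present(start_pfn))
		return false;	/* no valid PFN anywhere in this section */

	for (pfn = start_pfn; pfn < end; pfn++)
		if (pfn_valid(pfn))
			return true;

	return false;
}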

* Re: [PATCH] mm: page_ext: allocate page extension though first PFN is invalid
  2017-11-06 22:30       ` Jaewon Kim
@ 2017-11-07  7:47         ` Michal Hocko
  -1 siblings, 0 replies; 16+ messages in thread
From: Michal Hocko @ 2017-11-07  7:47 UTC (permalink / raw)
  To: Jaewon Kim
  Cc: Jaewon Kim, Andrew Morton, vbabka, minchan, linux-mm, linux-kernel

On Tue 07-11-17 07:30:05, Jaewon Kim wrote:
> I wonder if you want me to split and resend the 2 patches, or if you
> will use this mail thread for the further discussion.

Please resend
-- 
Michal Hocko
SUSE Labs

* Re: [PATCH] mm: page_ext: allocate page extension though first PFN is invalid
  2017-11-02  8:02     ` Michal Hocko
@ 2017-11-06 22:30       ` Jaewon Kim
  -1 siblings, 0 replies; 16+ messages in thread
From: Jaewon Kim @ 2017-11-06 22:30 UTC (permalink / raw)
  To: Michal Hocko
  Cc: Jaewon Kim, Andrew Morton, vbabka, minchan, linux-mm, linux-kernel

2017-11-02 17:02 GMT+09:00 Michal Hocko <mhocko@kernel.org>:
> On Thu 02-11-17 15:35:07, Jaewon Kim wrote:
>> online_page_ext and page_ext_init allocate page_ext for each section, but
>> they do not allocate if the first PFN is !pfn_present(pfn) or
>> !pfn_valid(pfn). Then section->page_ext remains as NULL. lookup_page_ext
>> checks NULL only if CONFIG_DEBUG_VM is enabled. For a valid PFN,
>> __set_page_owner will try to get page_ext through lookup_page_ext.
>> Without CONFIG_DEBUG_VM lookup_page_ext will misuse NULL pointer as value
>> 0. This incurs invalid address access.
>>
>> This is the panic example when PFN 0x100000 is not valid but PFN 0x13FC00
>> is being used for page_ext. section->page_ext is NULL, get_entry returned
>> invalid page_ext address as 0x1DFA000 for a PFN 0x13FC00.
>>
>> <1>[   11.618085] Unable to handle kernel paging request at virtual address 01dfa014
>> <1>[   11.618140] pgd = ffffffc0c6dc9000
>> <1>[   11.618174] [01dfa014] *pgd=0000000000000000, *pud=0000000000000000
>> <4>[   11.618240] ------------[ cut here ]------------
>> <2>[   11.618278] Kernel BUG at ffffff80082371e0 [verbose debug info unavailable]
>> <0>[   11.618338] Internal error: Oops: 96000045 [#1] PREEMPT SMP
>> <4>[   11.618381] Modules linked in:
>> <4>[   11.618524] task: ffffffc0c6ec9180 task.stack: ffffffc0c6f40000
>> <4>[   11.618569] PC is at __set_page_owner+0x48/0x78
>> <4>[   11.618607] LR is at __set_page_owner+0x44/0x78
>> <4>[   11.626025] [<ffffff80082371e0>] __set_page_owner+0x48/0x78
>> <4>[   11.626071] [<ffffff80081df9f0>] get_page_from_freelist+0x880/0x8e8
>> <4>[   11.626118] [<ffffff80081e00a4>] __alloc_pages_nodemask+0x14c/0xc48
>> <4>[   11.626165] [<ffffff80081e610c>] __do_page_cache_readahead+0xdc/0x264
>> <4>[   11.626214] [<ffffff80081d8824>] filemap_fault+0x2ac/0x550
>> <4>[   11.626259] [<ffffff80082e5cf8>] ext4_filemap_fault+0x3c/0x58
>> <4>[   11.626305] [<ffffff800820a2f8>] __do_fault+0x80/0x120
>> <4>[   11.626347] [<ffffff800820eb4c>] handle_mm_fault+0x704/0xbb0
>> <4>[   11.626393] [<ffffff800809ba70>] do_page_fault+0x2e8/0x394
>> <4>[   11.626437] [<ffffff8008080be4>] do_mem_abort+0x88/0x124
>>
>> Though the first page is not valid, page_ext could be useful for other
>> pages in the section. But checking all PFNs in a section may be time
>> consuming job. Let's check each (section count / 16) PFN, then prepare
>> page_ext if any PFN is present or valid. And remove the CONFIG_DEBUG_VM in
>> lookup_page_ext to avoid panic.
>
> So I would split this patch into two. First one to address the panic
> which sounds like a stable material and then the enhancement which will
> most likely need a further discussion.
Hello Michal Hocko,
Thank you for your comment.
I think the NULL check enabled by removing #if defined(CONFIG_DEBUG_VM) is
the stable material.
I wonder if you want me to split and resend the two patches, or if you
will use this mail thread for further discussion.

Thank you
Jaewon Kim
>
>> Signed-off-by: Jaewon Kim <jaewon31.kim@samsung.com>
>> ---
>>  mm/page_ext.c | 29 ++++++++++++++++++++++-------
>>  1 file changed, 22 insertions(+), 7 deletions(-)
>>
>> diff --git a/mm/page_ext.c b/mm/page_ext.c
>> index 32f18911deda..bf9c99beb312 100644
>> --- a/mm/page_ext.c
>> +++ b/mm/page_ext.c
>> @@ -124,7 +124,6 @@ struct page_ext *lookup_page_ext(struct page *page)
>>       struct page_ext *base;
>>
>>       base = NODE_DATA(page_to_nid(page))->node_page_ext;
>> -#if defined(CONFIG_DEBUG_VM)
>>       /*
>>        * The sanity checks the page allocator does upon freeing a
>>        * page can reach here before the page_ext arrays are
>> @@ -133,7 +132,6 @@ struct page_ext *lookup_page_ext(struct page *page)
>>        */
>>       if (unlikely(!base))
>>               return NULL;
>> -#endif
>>       index = pfn - round_down(node_start_pfn(page_to_nid(page)),
>>                                       MAX_ORDER_NR_PAGES);
>>       return get_entry(base, index);
>> @@ -198,7 +196,6 @@ struct page_ext *lookup_page_ext(struct page *page)
>>  {
>>       unsigned long pfn = page_to_pfn(page);
>>       struct mem_section *section = __pfn_to_section(pfn);
>> -#if defined(CONFIG_DEBUG_VM)
>>       /*
>>        * The sanity checks the page allocator does upon freeing a
>>        * page can reach here before the page_ext arrays are
>> @@ -207,7 +204,6 @@ struct page_ext *lookup_page_ext(struct page *page)
>>        */
>>       if (!section->page_ext)
>>               return NULL;
>> -#endif
>>       return get_entry(section->page_ext, pfn);
>>  }
>>
>> @@ -312,7 +308,17 @@ static int __meminit online_page_ext(unsigned long start_pfn,
>>       }
>>
>>       for (pfn = start; !fail && pfn < end; pfn += PAGES_PER_SECTION) {
>> -             if (!pfn_present(pfn))
>> +             unsigned long t_pfn = pfn;
>> +             bool present = false;
>> +
>> +             while (t_pfn <  ALIGN(pfn + 1, PAGES_PER_SECTION)) {
>> +                     if (pfn_present(t_pfn)) {
>> +                             present = true;
>> +                             break;
>> +                     }
>> +                     t_pfn = ALIGN(pfn + 1, PAGES_PER_SECTION >> 4);
>> +             }
>> +             if (!present)
>>                       continue;
>>               fail = init_section_page_ext(pfn, nid);
>>       }
>> @@ -391,8 +397,17 @@ void __init page_ext_init(void)
>>                */
>>               for (pfn = start_pfn; pfn < end_pfn;
>>                       pfn = ALIGN(pfn + 1, PAGES_PER_SECTION)) {
>> -
>> -                     if (!pfn_valid(pfn))
>> +                     unsigned long t_pfn = pfn;
>> +                     bool valid = false;
>> +
>> +                     while (t_pfn <  ALIGN(pfn + 1, PAGES_PER_SECTION)) {
>> +                             if (pfn_valid(t_pfn)) {
>> +                                     valid = true;
>> +                                     break;
>> +                             }
>> +                             t_pfn = ALIGN(pfn + 1, PAGES_PER_SECTION >> 4);
>> +                     }
>> +                     if (!valid)
>>                               continue;
>>                       /*
>>                        * Nodes's pfns can be overlapping.
>> --
>> 2.13.0
>
> --
> Michal Hocko
> SUSE Labs

* Re: [PATCH] mm: page_ext: allocate page extension though first PFN is invalid
  2017-11-02  6:35   ` Jaewon Kim
@ 2017-11-02  8:02     ` Michal Hocko
  -1 siblings, 0 replies; 16+ messages in thread
From: Michal Hocko @ 2017-11-02  8:02 UTC (permalink / raw)
  To: Jaewon Kim; +Cc: akpm, vbabka, minchan, linux-mm, linux-kernel, jaewon31.kim

On Thu 02-11-17 15:35:07, Jaewon Kim wrote:
> online_page_ext and page_ext_init allocate page_ext for each section, but
> they do not allocate if the first PFN is !pfn_present(pfn) or
> !pfn_valid(pfn). Then section->page_ext remains as NULL. lookup_page_ext
> checks NULL only if CONFIG_DEBUG_VM is enabled. For a valid PFN,
> __set_page_owner will try to get page_ext through lookup_page_ext.
> Without CONFIG_DEBUG_VM lookup_page_ext will misuse NULL pointer as value
> 0. This incurs invalid address access.
> 
> This is the panic example when PFN 0x100000 is not valid but PFN 0x13FC00
> is being used for page_ext. section->page_ext is NULL, get_entry returned
> invalid page_ext address as 0x1DFA000 for a PFN 0x13FC00.
> 
> <1>[   11.618085] Unable to handle kernel paging request at virtual address 01dfa014
> <1>[   11.618140] pgd = ffffffc0c6dc9000
> <1>[   11.618174] [01dfa014] *pgd=0000000000000000, *pud=0000000000000000
> <4>[   11.618240] ------------[ cut here ]------------
> <2>[   11.618278] Kernel BUG at ffffff80082371e0 [verbose debug info unavailable]
> <0>[   11.618338] Internal error: Oops: 96000045 [#1] PREEMPT SMP
> <4>[   11.618381] Modules linked in:
> <4>[   11.618524] task: ffffffc0c6ec9180 task.stack: ffffffc0c6f40000
> <4>[   11.618569] PC is at __set_page_owner+0x48/0x78
> <4>[   11.618607] LR is at __set_page_owner+0x44/0x78
> <4>[   11.626025] [<ffffff80082371e0>] __set_page_owner+0x48/0x78
> <4>[   11.626071] [<ffffff80081df9f0>] get_page_from_freelist+0x880/0x8e8
> <4>[   11.626118] [<ffffff80081e00a4>] __alloc_pages_nodemask+0x14c/0xc48
> <4>[   11.626165] [<ffffff80081e610c>] __do_page_cache_readahead+0xdc/0x264
> <4>[   11.626214] [<ffffff80081d8824>] filemap_fault+0x2ac/0x550
> <4>[   11.626259] [<ffffff80082e5cf8>] ext4_filemap_fault+0x3c/0x58
> <4>[   11.626305] [<ffffff800820a2f8>] __do_fault+0x80/0x120
> <4>[   11.626347] [<ffffff800820eb4c>] handle_mm_fault+0x704/0xbb0
> <4>[   11.626393] [<ffffff800809ba70>] do_page_fault+0x2e8/0x394
> <4>[   11.626437] [<ffffff8008080be4>] do_mem_abort+0x88/0x124
> 
> Though the first page is not valid, page_ext could be useful for other
> pages in the section. But checking all PFNs in a section may be time
> consuming job. Let's check each (section count / 16) PFN, then prepare
> page_ext if any PFN is present or valid. And remove the CONFIG_DEBUG_VM in
> lookup_page_ext to avoid panic.

So I would split this patch into two: the first one to address the panic,
which sounds like stable material, and then the enhancement, which will
most likely need further discussion.

> Signed-off-by: Jaewon Kim <jaewon31.kim@samsung.com>
> ---
>  mm/page_ext.c | 29 ++++++++++++++++++++++-------
>  1 file changed, 22 insertions(+), 7 deletions(-)
> 
> diff --git a/mm/page_ext.c b/mm/page_ext.c
> index 32f18911deda..bf9c99beb312 100644
> --- a/mm/page_ext.c
> +++ b/mm/page_ext.c
> @@ -124,7 +124,6 @@ struct page_ext *lookup_page_ext(struct page *page)
>  	struct page_ext *base;
>  
>  	base = NODE_DATA(page_to_nid(page))->node_page_ext;
> -#if defined(CONFIG_DEBUG_VM)
>  	/*
>  	 * The sanity checks the page allocator does upon freeing a
>  	 * page can reach here before the page_ext arrays are
> @@ -133,7 +132,6 @@ struct page_ext *lookup_page_ext(struct page *page)
>  	 */
>  	if (unlikely(!base))
>  		return NULL;
> -#endif
>  	index = pfn - round_down(node_start_pfn(page_to_nid(page)),
>  					MAX_ORDER_NR_PAGES);
>  	return get_entry(base, index);
> @@ -198,7 +196,6 @@ struct page_ext *lookup_page_ext(struct page *page)
>  {
>  	unsigned long pfn = page_to_pfn(page);
>  	struct mem_section *section = __pfn_to_section(pfn);
> -#if defined(CONFIG_DEBUG_VM)
>  	/*
>  	 * The sanity checks the page allocator does upon freeing a
>  	 * page can reach here before the page_ext arrays are
> @@ -207,7 +204,6 @@ struct page_ext *lookup_page_ext(struct page *page)
>  	 */
>  	if (!section->page_ext)
>  		return NULL;
> -#endif
>  	return get_entry(section->page_ext, pfn);
>  }
>  
> @@ -312,7 +308,17 @@ static int __meminit online_page_ext(unsigned long start_pfn,
>  	}
>  
>  	for (pfn = start; !fail && pfn < end; pfn += PAGES_PER_SECTION) {
> -		if (!pfn_present(pfn))
> +		unsigned long t_pfn = pfn;
> +		bool present = false;
> +
> +		while (t_pfn <	ALIGN(pfn + 1, PAGES_PER_SECTION)) {
> +			if (pfn_present(t_pfn)) {
> +				present = true;
> +				break;
> +			}
> +			t_pfn = ALIGN(pfn + 1, PAGES_PER_SECTION >> 4);
> +		}
> +		if (!present)
>  			continue;
>  		fail = init_section_page_ext(pfn, nid);
>  	}
> @@ -391,8 +397,17 @@ void __init page_ext_init(void)
>  		 */
>  		for (pfn = start_pfn; pfn < end_pfn;
>  			pfn = ALIGN(pfn + 1, PAGES_PER_SECTION)) {
> -
> -			if (!pfn_valid(pfn))
> +			unsigned long t_pfn = pfn;
> +			bool valid = false;
> +
> +			while (t_pfn <	ALIGN(pfn + 1, PAGES_PER_SECTION)) {
> +				if (pfn_valid(t_pfn)) {
> +					valid = true;
> +					break;
> +				}
> +				t_pfn = ALIGN(pfn + 1, PAGES_PER_SECTION >> 4);
> +			}
> +			if (!valid)
>  				continue;
>  			/*
>  			 * Nodes's pfns can be overlapping.
> -- 
> 2.13.0

-- 
Michal Hocko
SUSE Labs

* [PATCH] mm: page_ext: allocate page extension though first PFN is invalid
       [not found] <CGME20171102063347epcas2p2ce3e91597de3bf68e818130ea44ac769@epcas2p2.samsung.com>
@ 2017-11-02  6:35   ` Jaewon Kim
  0 siblings, 0 replies; 16+ messages in thread
From: Jaewon Kim @ 2017-11-02  6:35 UTC (permalink / raw)
  To: akpm
  Cc: mhocko, vbabka, minchan, linux-mm, linux-kernel, jaewon31.kim,
	Jaewon Kim

online_page_ext and page_ext_init allocate page_ext for each section, but
they skip the allocation if the section's first PFN is !pfn_present(pfn) or
!pfn_valid(pfn). Then section->page_ext remains NULL. lookup_page_ext
checks for NULL only if CONFIG_DEBUG_VM is enabled. For a valid PFN,
__set_page_owner will try to get the page_ext through lookup_page_ext.
Without CONFIG_DEBUG_VM, lookup_page_ext will misuse the NULL pointer as
base address 0. This incurs an invalid address access.

This is a panic example where PFN 0x100000 is not valid but PFN 0x13FC00
is in use: section->page_ext is NULL, and get_entry returned the invalid
page_ext address 0x1DFA000 for PFN 0x13FC00.

<1>[   11.618085] Unable to handle kernel paging request at virtual address 01dfa014
<1>[   11.618140] pgd = ffffffc0c6dc9000
<1>[   11.618174] [01dfa014] *pgd=0000000000000000, *pud=0000000000000000
<4>[   11.618240] ------------[ cut here ]------------
<2>[   11.618278] Kernel BUG at ffffff80082371e0 [verbose debug info unavailable]
<0>[   11.618338] Internal error: Oops: 96000045 [#1] PREEMPT SMP
<4>[   11.618381] Modules linked in:
<4>[   11.618524] task: ffffffc0c6ec9180 task.stack: ffffffc0c6f40000
<4>[   11.618569] PC is at __set_page_owner+0x48/0x78
<4>[   11.618607] LR is at __set_page_owner+0x44/0x78
<4>[   11.626025] [<ffffff80082371e0>] __set_page_owner+0x48/0x78
<4>[   11.626071] [<ffffff80081df9f0>] get_page_from_freelist+0x880/0x8e8
<4>[   11.626118] [<ffffff80081e00a4>] __alloc_pages_nodemask+0x14c/0xc48
<4>[   11.626165] [<ffffff80081e610c>] __do_page_cache_readahead+0xdc/0x264
<4>[   11.626214] [<ffffff80081d8824>] filemap_fault+0x2ac/0x550
<4>[   11.626259] [<ffffff80082e5cf8>] ext4_filemap_fault+0x3c/0x58
<4>[   11.626305] [<ffffff800820a2f8>] __do_fault+0x80/0x120
<4>[   11.626347] [<ffffff800820eb4c>] handle_mm_fault+0x704/0xbb0
<4>[   11.626393] [<ffffff800809ba70>] do_page_fault+0x2e8/0x394
<4>[   11.626437] [<ffffff8008080be4>] do_mem_abort+0x88/0x124
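
As an editorial cross-check of these numbers (the 24-byte per-page entry
size is inferred from the addresses in this log, not stated in the
thread): with section->page_ext == NULL, get_entry() effectively computes

    0 + entry_size * pfn = 24 * 0x13FC00 = 0x1DFA000

so the faulting address 0x01DFA014 is a field access 0x14 bytes into that
bogus page_ext entry.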

Even though the first page is not valid, page_ext could still be useful
for other pages in the section. But checking all PFNs in a section may be
a time-consuming job. Let's check every (PAGES_PER_SECTION / 16)th PFN,
then prepare page_ext if any of the sampled PFNs is present or valid. And
remove the CONFIG_DEBUG_VM guard in lookup_page_ext to avoid the panic.

Signed-off-by: Jaewon Kim <jaewon31.kim@samsung.com>
---
 mm/page_ext.c | 29 ++++++++++++++++++++++-------
 1 file changed, 22 insertions(+), 7 deletions(-)

diff --git a/mm/page_ext.c b/mm/page_ext.c
index 32f18911deda..bf9c99beb312 100644
--- a/mm/page_ext.c
+++ b/mm/page_ext.c
@@ -124,7 +124,6 @@ struct page_ext *lookup_page_ext(struct page *page)
 	struct page_ext *base;
 
 	base = NODE_DATA(page_to_nid(page))->node_page_ext;
-#if defined(CONFIG_DEBUG_VM)
 	/*
 	 * The sanity checks the page allocator does upon freeing a
 	 * page can reach here before the page_ext arrays are
@@ -133,7 +132,6 @@ struct page_ext *lookup_page_ext(struct page *page)
 	 */
 	if (unlikely(!base))
 		return NULL;
-#endif
 	index = pfn - round_down(node_start_pfn(page_to_nid(page)),
 					MAX_ORDER_NR_PAGES);
 	return get_entry(base, index);
@@ -198,7 +196,6 @@ struct page_ext *lookup_page_ext(struct page *page)
 {
 	unsigned long pfn = page_to_pfn(page);
 	struct mem_section *section = __pfn_to_section(pfn);
-#if defined(CONFIG_DEBUG_VM)
 	/*
 	 * The sanity checks the page allocator does upon freeing a
 	 * page can reach here before the page_ext arrays are
@@ -207,7 +204,6 @@ struct page_ext *lookup_page_ext(struct page *page)
 	 */
 	if (!section->page_ext)
 		return NULL;
-#endif
 	return get_entry(section->page_ext, pfn);
 }
 
@@ -312,7 +308,17 @@ static int __meminit online_page_ext(unsigned long start_pfn,
 	}
 
 	for (pfn = start; !fail && pfn < end; pfn += PAGES_PER_SECTION) {
-		if (!pfn_present(pfn))
+		unsigned long t_pfn = pfn;
+		bool present = false;
+
+		while (t_pfn <	ALIGN(pfn + 1, PAGES_PER_SECTION)) {
+			if (pfn_present(t_pfn)) {
+				present = true;
+				break;
+			}
+			t_pfn = ALIGN(pfn + 1, PAGES_PER_SECTION >> 4);
+		}
+		if (!present)
 			continue;
 		fail = init_section_page_ext(pfn, nid);
 	}
@@ -391,8 +397,17 @@ void __init page_ext_init(void)
 		 */
 		for (pfn = start_pfn; pfn < end_pfn;
 			pfn = ALIGN(pfn + 1, PAGES_PER_SECTION)) {
-
-			if (!pfn_valid(pfn))
+			unsigned long t_pfn = pfn;
+			bool valid = false;
+
+			while (t_pfn <	ALIGN(pfn + 1, PAGES_PER_SECTION)) {
+				if (pfn_valid(t_pfn)) {
+					valid = true;
+					break;
+				}
+				t_pfn = ALIGN(pfn + 1, PAGES_PER_SECTION >> 4);
+			}
+			if (!valid)
 				continue;
 			/*
 			 * Nodes's pfns can be overlapping.
-- 
2.13.0
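
With the #ifdefs removed, lookup_page_ext() can return NULL even in
production builds, so every caller has to handle that. A sketch of the
expected caller pattern, abbreviated from memory of the v4.14 page owner
code:

noinline void __set_page_owner(struct page *page, unsigned int order,
			       gfp_t gfp_mask)
{
	struct page_ext *page_ext = lookup_page_ext(page);

	if (unlikely(!page_ext))
		return;		/* section without page_ext: skip tracking */

	/* ... record order, gfp_mask and the allocation stack ... */
}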
