* [RFC] Tight check of pfn_valid on sparsemem
@ 2010-07-12 15:53 ` Minchan Kim
  0 siblings, 0 replies; 58+ messages in thread
From: Minchan Kim @ 2010-07-12 15:53 UTC (permalink / raw)
  To: linux, Yinghai Lu, H. Peter Anvin, Andrew Morton, Shaohua Li,
	Yakui Zhao, linux-kernel, linux-mm, arm-kernel, kgene.kim,
	Mel Gorman, KAMEZAWA Hiroyuki

Kukjin, could you test the patch below?
I don't have any sparsemem system, sorry.

-- CUT DOWN HERE --

Kukjin reported an oops that happens while he changes min_free_kbytes:
http://www.spinics.net/lists/arm-kernel/msg92894.html
It is caused by the memory map layout on sparsemem.

The system has the following memory map:
     section 0             section 1              section 2
0x20000000-0x25000000, 0x40000000-0x50000000, 0x50000000-0x58000000
SECTION_SIZE_BITS = 28 (256MB)

It means section 0 is an incompletely filled section.
Nonetheless, the current pfn_valid of sparsemem checks the pfn loosely.

It only checks the mem_section's validity.
So in the case above, a pfn in the unpopulated tail of the section (at and
above 0x25000000) can still pass pfn_valid's check. That's not what we want.
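
For illustration (not part of the patch), a standalone user-space calculation
shows why the loose check passes for the hole; PAGE_SHIFT = 12 is assumed here:

#include <stdio.h>

#define PAGE_SHIFT         12
#define SECTION_SIZE_BITS  28
#define PFN_SECTION_SHIFT  (SECTION_SIZE_BITS - PAGE_SHIFT)

int main(void)
{
	unsigned long present = 0x24fff000UL >> PAGE_SHIFT; /* last populated page */
	unsigned long hole    = 0x25000000UL >> PAGE_SHIFT; /* first page of the hole */

	/* Both pfns fall into the same sparsemem section, so a check that
	 * only looks at the section cannot reject the hole pfn. */
	printf("section of populated pfn: %lu\n", present >> PFN_SECTION_SHIFT);
	printf("section of hole pfn:      %lu\n", hole >> PFN_SECTION_SHIFT);
	return 0;
}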

The following patch adds a valid pfn range check to pfn_valid on sparsemem.

Signed-off-by: Minchan Kim <minchan.kim@gmail.com>
Reported-by: Kukjin Kim <kgene.kim@samsung.com>

P.S.)
This is just an RFC. If we agree on it, I will make the patch against mmotm.

--

diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h
index b4d109e..6c2147a 100644
--- a/include/linux/mmzone.h
+++ b/include/linux/mmzone.h
@@ -979,6 +979,8 @@ struct mem_section {
        struct page_cgroup *page_cgroup;
        unsigned long pad;
 #endif
+       unsigned long start_pfn;
+       unsigned long end_pfn;
 };
 
 #ifdef CONFIG_SPARSEMEM_EXTREME
@@ -1039,6 +1041,12 @@ static inline int valid_section(struct mem_section *section)
        return (section && (section->section_mem_map & SECTION_HAS_MEM_MAP));
 }
 
+static inline int valid_section_pfn(struct mem_section *section, unsigned long pfn)
+{
+       return ((section && (section->section_mem_map & SECTION_HAS_MEM_MAP)) &&
+               (section->start_pfn <= pfn && pfn < section->end_pfn));
+}
+
 static inline int valid_section_nr(unsigned long nr)
 {
        return valid_section(__nr_to_section(nr));
@@ -1053,7 +1061,7 @@ static inline int pfn_valid(unsigned long pfn)
 {
        if (pfn_to_section_nr(pfn) >= NR_MEM_SECTIONS)
                return 0;
-       return valid_section(__nr_to_section(pfn_to_section_nr(pfn)));
+       return valid_section_pfn(__nr_to_section(pfn_to_section_nr(pfn)), pfn);
 }

diff --git a/mm/sparse.c b/mm/sparse.c
index 95ac219..bde9090 100644
--- a/mm/sparse.c
+++ b/mm/sparse.c
@@ -195,6 +195,8 @@ void __init memory_present(int nid, unsigned long start, unsigned long end)
                if (!ms->section_mem_map)
                        ms->section_mem_map = sparse_encode_early_nid(nid) |
                                                        SECTION_MARKED_PRESENT;
+               ms->start_pfn = start;
+               ms->end_pfn = end;
        }
 }



-- 
Kind regards,
Minchan Kim

^ permalink raw reply related	[flat|nested] 58+ messages in thread

* RE: [RFC] Tight check of pfn_valid on sparsemem
  2010-07-12 15:53 ` Minchan Kim
@ 2010-07-12 23:59   ` Kukjin Kim
  -1 siblings, 0 replies; 58+ messages in thread
From: Kukjin Kim @ 2010-07-12 23:59 UTC (permalink / raw)
  To: 'Minchan Kim', linux, 'Yinghai Lu',
	'H. Peter Anvin', 'Andrew Morton',
	'Shaohua Li', 'Yakui Zhao',
	linux-kernel, linux-mm, arm-kernel, 'Mel Gorman',
	'KAMEZAWA Hiroyuki'

Minchan Kim wrote:
> 
Hi :-)

> Kukjin, Could you test below patch?

Sure.

> I don't have any sparsemem system. Sorry.

No problem...
And in the same test, there was no problem ;-)

That is, there is no kernel panic with your patch applied.

If you need any other test on a sparsemem system, please let me know.

Thanks.

Best regards,
Kgene.
--
Kukjin Kim <kgene.kim@samsung.com>, Senior Engineer,
SW Solution Development Team, Samsung Electronics Co., Ltd.
> 
> -- CUT DOWN HERE --
> 
> Kukjin reported oops happen while he change min_free_kbytes
> http://www.spinics.net/lists/arm-kernel/msg92894.html
> It happen by memory map on sparsemem.
> 
> The system has a memory map following as.
>      section 0             section 1              section 2
> 0x20000000-0x25000000, 0x40000000-0x50000000, 0x50000000-0x58000000
> SECTION_SIZE_BITS 28(256M)
> 
> It means section 0 is an incompletely filled section.
> Nontheless, current pfn_valid of sparsemem checks pfn loosely.
> 
> It checks only mem_section's validation.
> So in above case, pfn on 0x25000000 can pass pfn_valid's validation check.
> It's not what we want.
> 
> The Following patch adds check valid pfn range check on pfn_valid of
sparsemem.
> 
> Signed-off-by: Minchan Kim <minchan.kim@gmail.com>
> Reported-by: Kukjin Kim <kgene.kim@samsung.com>
> 
> P.S)
> It is just RFC. If we agree with this, I will make the patch on mmotm.
> 
> --
> 
> diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h
> index b4d109e..6c2147a 100644
> --- a/include/linux/mmzone.h
> +++ b/include/linux/mmzone.h
> @@ -979,6 +979,8 @@ struct mem_section {
>         struct page_cgroup *page_cgroup;
>         unsigned long pad;
>  #endif
> +       unsigned long start_pfn;
> +       unsigned long end_pfn;
>  };
> 
>  #ifdef CONFIG_SPARSEMEM_EXTREME
> @@ -1039,6 +1041,12 @@ static inline int valid_section(struct mem_section
> *section)
>         return (section && (section->section_mem_map &
> SECTION_HAS_MEM_MAP));
>  }
> 
> +static inline int valid_section_pfn(struct mem_section *section, unsigned
long pfn)
> +{
> +       return ((section && (section->section_mem_map &
> SECTION_HAS_MEM_MAP)) &&
> +               (section->start_pfn <= pfn && pfn < section->end_pfn));
> +}
> +
>  static inline int valid_section_nr(unsigned long nr)
>  {
>         return valid_section(__nr_to_section(nr));
> @@ -1053,7 +1061,7 @@ static inline int pfn_valid(unsigned long pfn)
>  {
>         if (pfn_to_section_nr(pfn) >= NR_MEM_SECTIONS)
>                 return 0;
> -       return valid_section(__nr_to_section(pfn_to_section_nr(pfn)));
> +       return valid_section_pfn(__nr_to_section(pfn_to_section_nr(pfn)),
pfn);
>  }
> 
> diff --git a/mm/sparse.c b/mm/sparse.c
> index 95ac219..bde9090 100644
> --- a/mm/sparse.c
> +++ b/mm/sparse.c
> @@ -195,6 +195,8 @@ void __init memory_present(int nid, unsigned long
start,
> unsigned long end)
>                 if (!ms->section_mem_map)
>                         ms->section_mem_map =
> sparse_encode_early_nid(nid) |
> 
> SECTION_MARKED_PRESENT;
> +               ms->start_pfn = start;
> +               ms->end_pfn = end;
>         }
>  }
> 
> 
> 
> --
> Kind regards,
> Minchan Kim


^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: [RFC] Tight check of pfn_valid on sparsemem
  2010-07-12 15:53 ` Minchan Kim
@ 2010-07-13  3:19   ` KAMEZAWA Hiroyuki
  -1 siblings, 0 replies; 58+ messages in thread
From: KAMEZAWA Hiroyuki @ 2010-07-13  3:19 UTC (permalink / raw)
  To: Minchan Kim
  Cc: linux, Yinghai Lu, H. Peter Anvin, Andrew Morton, Shaohua Li,
	Yakui Zhao, linux-kernel, linux-mm, arm-kernel, kgene.kim,
	Mel Gorman

On Tue, 13 Jul 2010 00:53:48 +0900
Minchan Kim <minchan.kim@gmail.com> wrote:

> Kukjin, Could you test below patch?
> I don't have any sparsemem system. Sorry. 
> 
> -- CUT DOWN HERE --
> 
> Kukjin reported oops happen while he change min_free_kbytes
> http://www.spinics.net/lists/arm-kernel/msg92894.html
> It happen by memory map on sparsemem. 
> 
> The system has a memory map following as. 
>      section 0             section 1              section 2
> 0x20000000-0x25000000, 0x40000000-0x50000000, 0x50000000-0x58000000
> SECTION_SIZE_BITS 28(256M)
> 
> It means section 0 is an incompletely filled section.
> Nontheless, current pfn_valid of sparsemem checks pfn loosely. 
> 
> It checks only mem_section's validation.
> So in above case, pfn on 0x25000000 can pass pfn_valid's validation check.
> It's not what we want. 
> 
> The Following patch adds check valid pfn range check on pfn_valid of sparsemem.
> 
> Signed-off-by: Minchan Kim <minchan.kim@gmail.com>
> Reported-by: Kukjin Kim <kgene.kim@samsung.com>
> 
> P.S) 
> It is just RFC. If we agree with this, I will make the patch on mmotm.
> 
> --
> 
> diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h
> index b4d109e..6c2147a 100644
> --- a/include/linux/mmzone.h
> +++ b/include/linux/mmzone.h
> @@ -979,6 +979,8 @@ struct mem_section {
>         struct page_cgroup *page_cgroup;
>         unsigned long pad;
>  #endif
> +       unsigned long start_pfn;
> +       unsigned long end_pfn;
>  };
>  

I have 2 concerns.
 1. This doubles the size of mem_section, which wastes too much memory and is not
    good for the cache. But yes, you could put this under some CONFIG for
    configurations with a small number of mem_section[] entries.

 2. This can't help in a case where a section has multiple small holes.


Then, my proposal for HOLES_IN_MEMMAP sparsemem is below.
==
Some architectures unmap memmap[] for memory holes even with SPARSEMEM.
To handle that, pfn_valid() should check whether the memmap is really there or not.
For that purpose, __get_user() can be used.
This idea comes from ia64_pfn_valid().

Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
---
 include/linux/mmzone.h |   12 ++++++++++++
 mm/sparse.c            |   17 +++++++++++++++++
 2 files changed, 29 insertions(+)

Index: mmotm-2.6.35-0701/include/linux/mmzone.h
===================================================================
--- mmotm-2.6.35-0701.orig/include/linux/mmzone.h
+++ mmotm-2.6.35-0701/include/linux/mmzone.h
@@ -1047,12 +1047,24 @@ static inline struct mem_section *__pfn_
 	return __nr_to_section(pfn_to_section_nr(pfn));
 }
 
+#ifndef CONFIG_ARCH_HAS_HOLES_IN_MEMMAP
 static inline int pfn_valid(unsigned long pfn)
 {
 	if (pfn_to_section_nr(pfn) >= NR_MEM_SECTIONS)
 		return 0;
 	return valid_section(__nr_to_section(pfn_to_section_nr(pfn)));
 }
+#else
+extern int pfn_valid_mapped(unsigned long pfn);
+static inline int pfn_valid(unsigned long pfn)
+{
+	if (pfn_to_section_nr(pfn) >= NR_MEM_SECTIONS)
+		return 0;
+	if (!valid_section(__nr_to_section(pfn_to_section_nr(pfn))))
+		return 0;
+	return pfn_valid_mapped(pfn);
+}
+#endif
 
 static inline int pfn_present(unsigned long pfn)
 {
Index: mmotm-2.6.35-0701/mm/sparse.c
===================================================================
--- mmotm-2.6.35-0701.orig/mm/sparse.c
+++ mmotm-2.6.35-0701/mm/sparse.c
@@ -799,3 +799,20 @@ void sparse_remove_one_section(struct zo
 	free_section_usemap(memmap, usemap);
 }
 #endif
+
+#ifdef CONFIG_ARCH_HAS_HOLES_IN_MEMMAP
+int pfn_valid_mapped(unsigned long pfn)
+{
+	struct page *page = pfn_to_page(pfn);
+	char *lastbyte = (char *)(page+1)-1;
+	char byte;
+
+	if(__get_user(byte, page) != 0)
+		return 0;
+
+	if ((((unsigned long)page) & PAGE_MASK) ==
+	    (((unsigned long)lastbyte) & PAGE_MASK))
+		return 1;
+	return (__get_user(byte,lastbyte) == 0);
+}
+#endif






^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: [RFC] Tight check of pfn_valid on sparsemem
  2010-07-13  3:19   ` KAMEZAWA Hiroyuki
@ 2010-07-13  4:11     ` Minchan Kim
  -1 siblings, 0 replies; 58+ messages in thread
From: Minchan Kim @ 2010-07-13  4:11 UTC (permalink / raw)
  To: KAMEZAWA Hiroyuki
  Cc: linux, Yinghai Lu, H. Peter Anvin, Andrew Morton, Shaohua Li,
	Yakui Zhao, linux-kernel, linux-mm, arm-kernel, kgene.kim,
	Mel Gorman

On Tue, Jul 13, 2010 at 12:19 PM, KAMEZAWA Hiroyuki
<kamezawa.hiroyu@jp.fujitsu.com> wrote:
> On Tue, 13 Jul 2010 00:53:48 +0900
> Minchan Kim <minchan.kim@gmail.com> wrote:
>
>> Kukjin, Could you test below patch?
>> I don't have any sparsemem system. Sorry.
>>
>> -- CUT DOWN HERE --
>>
>> Kukjin reported oops happen while he change min_free_kbytes
>> http://www.spinics.net/lists/arm-kernel/msg92894.html
>> It happen by memory map on sparsemem.
>>
>> The system has a memory map following as.
>>      section 0             section 1              section 2
>> 0x20000000-0x25000000, 0x40000000-0x50000000, 0x50000000-0x58000000
>> SECTION_SIZE_BITS 28(256M)
>>
>> It means section 0 is an incompletely filled section.
>> Nontheless, current pfn_valid of sparsemem checks pfn loosely.
>>
>> It checks only mem_section's validation.
>> So in above case, pfn on 0x25000000 can pass pfn_valid's validation check.
>> It's not what we want.
>>
>> The Following patch adds check valid pfn range check on pfn_valid of sparsemem.
>>
>> Signed-off-by: Minchan Kim <minchan.kim@gmail.com>
>> Reported-by: Kukjin Kim <kgene.kim@samsung.com>
>>
>> P.S)
>> It is just RFC. If we agree with this, I will make the patch on mmotm.
>>
>> --
>>
>> diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h
>> index b4d109e..6c2147a 100644
>> --- a/include/linux/mmzone.h
>> +++ b/include/linux/mmzone.h
>> @@ -979,6 +979,8 @@ struct mem_section {
>>         struct page_cgroup *page_cgroup;
>>         unsigned long pad;
>>  #endif
>> +       unsigned long start_pfn;
>> +       unsigned long end_pfn;
>>  };
>>
>
> I have 2 concerns.
>  1. This makes mem_section twice. Wasting too much memory and not good for cache.
>    But yes, you can put this under some CONFIG which has small number of mem_section[].
>

I think the memory usage isn't a big deal, but for the cache, we can move the
fields to just after section_mem_map.

>  2. This can't be help for a case where a section has multiple small holes.

I agree. But this problem (a not-completely-filled section, not a punched
hole) isn't such a case. Still, it would be better to handle them all together. :)

>
> Then, my proposal for HOLES_IN_MEMMAP sparsemem is below.
> ==
> Some architectures unmap memmap[] for memory holes even with SPARSEMEM.
> To handle that, pfn_valid() should check there are really memmap or not.
> For that purpose, __get_user() can be used.

Look at free_unused_memmap. We don't unmap the ptes of the hole's memmap.
Is __get_user() still effective?




-- 
Kind regards,
Minchan Kim

^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: [RFC] Tight check of pfn_valid on sparsemem
  2010-07-13  4:11     ` Minchan Kim
@ 2010-07-13  4:23       ` KAMEZAWA Hiroyuki
  -1 siblings, 0 replies; 58+ messages in thread
From: KAMEZAWA Hiroyuki @ 2010-07-13  4:23 UTC (permalink / raw)
  To: Minchan Kim
  Cc: linux, Yinghai Lu, H. Peter Anvin, Andrew Morton, Shaohua Li,
	Yakui Zhao, linux-kernel, linux-mm, arm-kernel, kgene.kim,
	Mel Gorman

On Tue, 13 Jul 2010 13:11:14 +0900
Minchan Kim <minchan.kim@gmail.com> wrote:

> On Tue, Jul 13, 2010 at 12:19 PM, KAMEZAWA Hiroyuki
> <kamezawa.hiroyu@jp.fujitsu.com> wrote:
> > On Tue, 13 Jul 2010 00:53:48 +0900
> > Minchan Kim <minchan.kim@gmail.com> wrote:
> >
> >> Kukjin, Could you test below patch?
> >> I don't have any sparsemem system. Sorry.
> >>
> >> -- CUT DOWN HERE --
> >>
> >> Kukjin reported oops happen while he change min_free_kbytes
> >> http://www.spinics.net/lists/arm-kernel/msg92894.html
> >> It happen by memory map on sparsemem.
> >>
> >> The system has a memory map following as.
> >>      section 0             section 1              section 2
> >> 0x20000000-0x25000000, 0x40000000-0x50000000, 0x50000000-0x58000000
> >> SECTION_SIZE_BITS 28(256M)
> >>
> >> It means section 0 is an incompletely filled section.
> >> Nontheless, current pfn_valid of sparsemem checks pfn loosely.
> >>
> >> It checks only mem_section's validation.
> >> So in above case, pfn on 0x25000000 can pass pfn_valid's validation check.
> >> It's not what we want.
> >>
> >> The Following patch adds check valid pfn range check on pfn_valid of sparsemem.
> >>
> >> Signed-off-by: Minchan Kim <minchan.kim@gmail.com>
> >> Reported-by: Kukjin Kim <kgene.kim@samsung.com>
> >>
> >> P.S)
> >> It is just RFC. If we agree with this, I will make the patch on mmotm.
> >>
> >> --
> >>
> >> diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h
> >> index b4d109e..6c2147a 100644
> >> --- a/include/linux/mmzone.h
> >> +++ b/include/linux/mmzone.h
> >> @@ -979,6 +979,8 @@ struct mem_section {
> >>         struct page_cgroup *page_cgroup;
> >>         unsigned long pad;
> >>  #endif
> >> +       unsigned long start_pfn;
> >> +       unsigned long end_pfn;
> >>  };
> >>
> >
> > I have 2 concerns.
> >  1. This makes mem_section twice. Wasting too much memory and not good for cache.
> >    But yes, you can put this under some CONFIG which has small number of mem_section[].
> >
> 
> I think memory usage isn't a big deal. but for cache, we can move
> fields into just after section_mem_map.
> 
I don't think so. These additional fields can eat up the amount of memory you
saved by unmapping.

> >  2. This can't be help for a case where a section has multiple small holes.
> 
> I agree. But this(not punched hole but not filled section problem)
> isn't such case. But it would be better to handle it altogether. :)
> 
> >
> > Then, my proposal for HOLES_IN_MEMMAP sparsemem is below.
> > ==
> > Some architectures unmap memmap[] for memory holes even with SPARSEMEM.
> > To handle that, pfn_valid() should check there are really memmap or not.
> > For that purpose, __get_user() can be used.
> 
> Look at free_unused_memmap. We don't unmap pte of hole memmap.
> Is __get_use effective, still?
> 
__get_user() works through the TLB and page tables, so it tells whether the vaddr
is really mapped or not. If the access would SEGV, __get_user() returns -EFAULT.
It works at page granularity.

Thanks,
-Kame


^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: [RFC] Tight check of pfn_valid on sparsemem
  2010-07-13  4:23       ` KAMEZAWA Hiroyuki
@ 2010-07-13  6:04         ` Minchan Kim
  -1 siblings, 0 replies; 58+ messages in thread
From: Minchan Kim @ 2010-07-13  6:04 UTC (permalink / raw)
  To: KAMEZAWA Hiroyuki
  Cc: linux, Yinghai Lu, H. Peter Anvin, Andrew Morton, Shaohua Li,
	Yakui Zhao, linux-kernel, linux-mm, arm-kernel, kgene.kim,
	Mel Gorman

On Tue, Jul 13, 2010 at 1:23 PM, KAMEZAWA Hiroyuki
<kamezawa.hiroyu@jp.fujitsu.com> wrote:
> On Tue, 13 Jul 2010 13:11:14 +0900
> Minchan Kim <minchan.kim@gmail.com> wrote:
>
>> On Tue, Jul 13, 2010 at 12:19 PM, KAMEZAWA Hiroyuki
>> <kamezawa.hiroyu@jp.fujitsu.com> wrote:
>> > On Tue, 13 Jul 2010 00:53:48 +0900
>> > Minchan Kim <minchan.kim@gmail.com> wrote:
>> >
>> >> Kukjin, Could you test below patch?
>> >> I don't have any sparsemem system. Sorry.
>> >>
>> >> -- CUT DOWN HERE --
>> >>
>> >> Kukjin reported oops happen while he change min_free_kbytes
>> >> http://www.spinics.net/lists/arm-kernel/msg92894.html
>> >> It happen by memory map on sparsemem.
>> >>
>> >> The system has a memory map following as.
>> >>      section 0             section 1              section 2
>> >> 0x20000000-0x25000000, 0x40000000-0x50000000, 0x50000000-0x58000000
>> >> SECTION_SIZE_BITS 28(256M)
>> >>
>> >> It means section 0 is an incompletely filled section.
>> >> Nontheless, current pfn_valid of sparsemem checks pfn loosely.
>> >>
>> >> It checks only mem_section's validation.
>> >> So in above case, pfn on 0x25000000 can pass pfn_valid's validation check.
>> >> It's not what we want.
>> >>
>> >> The Following patch adds check valid pfn range check on pfn_valid of sparsemem.
>> >>
>> >> Signed-off-by: Minchan Kim <minchan.kim@gmail.com>
>> >> Reported-by: Kukjin Kim <kgene.kim@samsung.com>
>> >>
>> >> P.S)
>> >> It is just RFC. If we agree with this, I will make the patch on mmotm.
>> >>
>> >> --
>> >>
>> >> diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h
>> >> index b4d109e..6c2147a 100644
>> >> --- a/include/linux/mmzone.h
>> >> +++ b/include/linux/mmzone.h
>> >> @@ -979,6 +979,8 @@ struct mem_section {
>> >>         struct page_cgroup *page_cgroup;
>> >>         unsigned long pad;
>> >>  #endif
>> >> +       unsigned long start_pfn;
>> >> +       unsigned long end_pfn;
>> >>  };
>> >>
>> >
>> > I have 2 concerns.
>> >  1. This makes mem_section twice. Wasting too much memory and not good for cache.
>> >    But yes, you can put this under some CONFIG which has small number of mem_section[].
>> >
>>
>> I think memory usage isn't a big deal. but for cache, we can move
>> fields into just after section_mem_map.
>>
> I don't think so. This addtional field can eat up the amount of memory you saved
> by unmap.

Agree.

>
>> >  2. This can't be help for a case where a section has multiple small holes.
>>
>> I agree. But this(not punched hole but not filled section problem)
>> isn't such case. But it would be better to handle it altogether. :)
>>
>> >
>> > Then, my proposal for HOLES_IN_MEMMAP sparsemem is below.
>> > ==
>> > Some architectures unmap memmap[] for memory holes even with SPARSEMEM.
>> > To handle that, pfn_valid() should check there are really memmap or not.
>> > For that purpose, __get_user() can be used.
>>
>> Look at free_unused_memmap. We don't unmap pte of hole memmap.
>> Is __get_use effective, still?
>>
> __get_user() works with TLB and page table, the vaddr is really mapped or not.
> If you got SEGV, __get_user() returns -EFAULT. It works per page granule.

I mean the following.
For example, suppose there is a struct page at 0x20000000.

int pfn_valid_mapped(unsigned long pfn)
{
       struct page *page = pfn_to_page(pfn); /* hole struct page is at 0x20000000 */
       char *lastbyte = (char *)(page+1)-1;  /* lastbyte is 0x2000001f */
       char byte;

       /* We pass this test since free_unused_memmap doesn't unmap pte */
       if(__get_user(byte, page) != 0)				
               return 0;
       /*
        * (0x20000000 & PAGE_MASK) == (0x2000001f & PAGE_MASK)
        * So we return 1, which is the wrong result.
        */
       if ((((unsigned long)page) & PAGE_MASK) ==
           (((unsigned long)lastbyte) & PAGE_MASK))
               return 1;
       return (__get_user(byte,lastbyte) == 0);
}

Am I missing something?


-- 
Kind regards,
Minchan Kim

^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: [RFC] Tight check of pfn_valid on sparsemem
  2010-07-13  6:04         ` Minchan Kim
@ 2010-07-13  6:40           ` KAMEZAWA Hiroyuki
  -1 siblings, 0 replies; 58+ messages in thread
From: KAMEZAWA Hiroyuki @ 2010-07-13  6:40 UTC (permalink / raw)
  To: Minchan Kim
  Cc: linux, Yinghai Lu, H. Peter Anvin, Andrew Morton, Shaohua Li,
	Yakui Zhao, linux-kernel, linux-mm, arm-kernel, kgene.kim,
	Mel Gorman

On Tue, 13 Jul 2010 15:04:00 +0900
Minchan Kim <minchan.kim@gmail.com> wrote:

> >> >  2. This can't be help for a case where a section has multiple small holes.
> >>
> >> I agree. But this(not punched hole but not filled section problem)
> >> isn't such case. But it would be better to handle it altogether. :)
> >>
> >> >
> >> > Then, my proposal for HOLES_IN_MEMMAP sparsemem is below.
> >> > ==
> >> > Some architectures unmap memmap[] for memory holes even with SPARSEMEM.
> >> > To handle that, pfn_valid() should check there are really memmap or not.
> >> > For that purpose, __get_user() can be used.
> >>
> >> Look at free_unused_memmap. We don't unmap pte of hole memmap.
> >> Is __get_use effective, still?
> >>
> > __get_user() works with TLB and page table, the vaddr is really mapped or not.
> > If you got SEGV, __get_user() returns -EFAULT. It works per page granule.
> 
> I mean following as.
> For example, there is a struct page in on 0x20000000.
> 
> int pfn_valid_mapped(unsigned long pfn)
> {
>        struct page *page = pfn_to_page(pfn); /* hole page is 0x2000000 */
>        char *lastbyte = (char *)(page+1)-1;  /* lastbyte is 0x2000001f */
>        char byte;
> 
>        /* We pass this test since free_unused_memmap doesn't unmap pte */
>        if(__get_user(byte, page) != 0)				
>                return 0;

Why? When the page size is 4096 bytes:

      0x1ffff000 - 0x1fffffff
      0x20000000 - 0x20000fff

are each a single page, and memory is mapped per page.

What the __get_user() above accesses is a byte in [0x20000000, 0x20000001),
and it's unmapped if the page at 0x20000000 is unmapped.
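
The page-boundary arithmetic in question, as a standalone illustration
(assuming PAGE_SHIFT = 12 and a 32-byte struct page, as in Minchan's example):

#include <stdio.h>

#define PAGE_SHIFT 12
#define PAGE_MASK  (~((1UL << PAGE_SHIFT) - 1))

int main(void)
{
	unsigned long page     = 0x20000000UL;	/* vaddr of the hole's struct page */
	unsigned long lastbyte = page + 32 - 1;	/* 0x2000001f */

	/* Both probed bytes fall on the same 4KB page, so whether the probe
	 * faults depends only on whether that one page is mapped. */
	printf("page of first byte: 0x%08lx\n", page & PAGE_MASK);
	printf("page of last byte:  0x%08lx\n", lastbyte & PAGE_MASK);
	return 0;
}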

Thanks,
-Kame


^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: [RFC] Tight check of pfn_valid on sparsemem
  2010-07-13  6:04         ` Minchan Kim
@ 2010-07-13  7:20           ` Russell King - ARM Linux
  -1 siblings, 0 replies; 58+ messages in thread
From: Russell King - ARM Linux @ 2010-07-13  7:20 UTC (permalink / raw)
  To: Minchan Kim
  Cc: KAMEZAWA Hiroyuki, Yinghai Lu, H. Peter Anvin, Andrew Morton,
	Shaohua Li, Yakui Zhao, linux-kernel, linux-mm, arm-kernel,
	kgene.kim, Mel Gorman

On Tue, Jul 13, 2010 at 03:04:00PM +0900, Minchan Kim wrote:
> > __get_user() works with TLB and page table, the vaddr is really mapped or not.
> > If you got SEGV, __get_user() returns -EFAULT. It works per page granule.

Not in kernel space.  It works on 1MB sections there.

Testing whether a page is mapped by __get_user is a hugely expensive
way to test whether a PFN is valid.  It'd be cheaper to use our
flatmem implementation of pfn_valid() instead.
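
For context, the FLATMEM-style approach referred to here is an arch-provided
pfn_valid() that consults the actual memory bank layout instead of sparsemem
sections. A minimal self-contained sketch of the idea (hypothetical bank table
and names, not the actual arch/arm implementation; pfn values assume
PAGE_SHIFT = 12):

#include <stdio.h>

struct membank {
	unsigned long start_pfn;
	unsigned long nr_pages;
};

/* Hypothetical bank table; on real hardware it would be filled in at boot
 * from the bootloader-provided memory map. */
static const struct membank banks[] = {
	{ 0x20000, 0x5000  },	/* 0x20000000 - 0x25000000 */
	{ 0x40000, 0x10000 },	/* 0x40000000 - 0x50000000 */
	{ 0x50000, 0x8000  },	/* 0x50000000 - 0x58000000 */
};

static int pfn_valid(unsigned long pfn)
{
	unsigned int i;

	/* A pfn is valid only if it falls inside a registered bank, so the
	 * unpopulated tail of a sparsemem section is rejected. */
	for (i = 0; i < sizeof(banks) / sizeof(banks[0]); i++)
		if (pfn >= banks[i].start_pfn &&
		    pfn < banks[i].start_pfn + banks[i].nr_pages)
			return 1;
	return 0;
}

int main(void)
{
	printf("pfn_valid(0x24fff) = %d\n", pfn_valid(0x24fff)); /* populated -> 1 */
	printf("pfn_valid(0x25000) = %d\n", pfn_valid(0x25000)); /* hole      -> 0 */
	return 0;
}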

^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: [RFC] Tight check of pfn_valid on sparsemem
  2010-07-13  7:20           ` Russell King - ARM Linux
@ 2010-07-13  7:34             ` KAMEZAWA Hiroyuki
  -1 siblings, 0 replies; 58+ messages in thread
From: KAMEZAWA Hiroyuki @ 2010-07-13  7:34 UTC (permalink / raw)
  To: Russell King - ARM Linux
  Cc: Minchan Kim, Yinghai Lu, H. Peter Anvin, Andrew Morton,
	Shaohua Li, Yakui Zhao, linux-kernel, linux-mm, arm-kernel,
	kgene.kim, Mel Gorman

On Tue, 13 Jul 2010 08:20:09 +0100
Russell King - ARM Linux <linux@arm.linux.org.uk> wrote:

> On Tue, Jul 13, 2010 at 03:04:00PM +0900, Minchan Kim wrote:
> > > __get_user() works with TLB and page table, the vaddr is really mapped or not.
> > > If you got SEGV, __get_user() returns -EFAULT. It works per page granule.
> 
> Not in kernel space.  It works on 1MB sections there.
> 
> Testing whether a page is mapped by __get_user is a hugely expensive
> way to test whether a PFN is valid.

Note: pfn_valid() is for checking "there is memmap". 

> It'd be cheaper to use our flatmem implementation of pfn_valid() instead.
> 
Hmm. IIUC, pfn_valid() succeeds in almost all cases if there is a section.
But yes, I'm not familiar with ARM.

I love another idea, which I've already shown: preparing _a_ page filled with
0x00004000 and mapping it into all the holes. PG_reserved will help in almost
all cases even if it's ugly.

Anyway, sparsemem is designed around a memmap aligned to SECTION_SIZE.
Please avoid adding new spaghetti code without proper configs.
Thanks,
-Kame


^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: [RFC] Tight check of pfn_valid on sparsemem
  2010-07-13  7:34             ` KAMEZAWA Hiroyuki
@ 2010-07-13  7:58               ` KAMEZAWA Hiroyuki
  -1 siblings, 0 replies; 58+ messages in thread
From: KAMEZAWA Hiroyuki @ 2010-07-13  7:58 UTC (permalink / raw)
  To: KAMEZAWA Hiroyuki
  Cc: Russell King - ARM Linux, Minchan Kim, Yinghai Lu,
	H. Peter Anvin, Andrew Morton, Shaohua Li, Yakui Zhao,
	linux-kernel, linux-mm, arm-kernel, kgene.kim, Mel Gorman

On Tue, 13 Jul 2010 16:34:17 +0900
KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> wrote:

> Anyway, sparsemem is designed to be aligned to SECTION_SIZE of memmap.
> Please avoid adding new Spaghetti code without proper configs.
> Thanks,

Ok, I realized I misunderstood it all. ARM doesn't unmap the memmap; it just
reuses the memmap's pages without modifying the ptes. My routine only works
when ARM uses sparsemem_vmemmap. But yes, it doesn't.

Hmm... How about using the FLATMEM pfn_valid(), or avoiding SPARSEMEM?
If you want control at a granularity lower than SPARSEMEM, FLATMEM works better because ARM unmaps memmap.
What is the reason for SPARSEMEM?

Thanks,
-Kame


^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: [RFC] Tight check of pfn_valid on sparsemem
  2010-07-13  7:58               ` KAMEZAWA Hiroyuki
@ 2010-07-13  8:02                 ` KAMEZAWA Hiroyuki
  -1 siblings, 0 replies; 58+ messages in thread
From: KAMEZAWA Hiroyuki @ 2010-07-13  8:02 UTC (permalink / raw)
  To: KAMEZAWA Hiroyuki
  Cc: Russell King - ARM Linux, Minchan Kim, Yinghai Lu,
	H. Peter Anvin, Andrew Morton, Shaohua Li, Yakui Zhao,
	linux-kernel, linux-mm, arm-kernel, kgene.kim, Mel Gorman

On Tue, 13 Jul 2010 16:58:08 +0900
KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> wrote:

> On Tue, 13 Jul 2010 16:34:17 +0900
> KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> wrote:
> 
> > Anyway, sparsemem is designed to be aligned to SECTION_SIZE of memmap.
> > Please avoid adding new Spaghetti code without proper configs.
> > Thanks,
> 
> Ok, I realized I misunderstand all. Arm doesn't unmap memmap but reuse the page
> for memmap without modifing ptes. My routine only works when ARM uses sparsemem_vmemmap.
> But yes, it isn't.
> 
> Hmm...How about using pfn_valid() for FLATMEM or avoid using SPARSEMEM ?
> If you want conrols lower than SPARSEMEM, FLATMEM works better because ARM unmaps memmap.
                      (correction: allocation of the memmap at a lower granularity than SPARSEMEM.)


How about stopping the use of SPARSEMEM? What's the benefit? It just eats up memory for
mem_section[].

Sorry,
-Kame


^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: [RFC] Tight check of pfn_valid on sparsemem
  2010-07-13  8:06             ` Minchan Kim
@ 2010-07-13  8:03               ` KAMEZAWA Hiroyuki
  -1 siblings, 0 replies; 58+ messages in thread
From: KAMEZAWA Hiroyuki @ 2010-07-13  8:03 UTC (permalink / raw)
  To: Minchan Kim
  Cc: linux, Yinghai Lu, H. Peter Anvin, Andrew Morton, Shaohua Li,
	Yakui Zhao, linux-kernel, linux-mm, arm-kernel, kgene.kim,
	Mel Gorman

On Tue, 13 Jul 2010 17:06:56 +0900
Minchan Kim <minchan.kim@gmail.com> wrote:

> On Tue, Jul 13, 2010 at 3:40 PM, KAMEZAWA Hiroyuki
> <kamezawa.hiroyu@jp.fujitsu.com> wrote:
> > On Tue, 13 Jul 2010 15:04:00 +0900
> > Minchan Kim <minchan.kim@gmail.com> wrote:
> >
> >> >> >  2. This can't be help for a case where a section has multiple small holes.
> >> >>
> >> >> I agree. But this(not punched hole but not filled section problem)
> >> >> isn't such case. But it would be better to handle it altogether. :)
> >> >>
> >> >> >
> >> >> > Then, my proposal for HOLES_IN_MEMMAP sparsemem is below.
> >> >> > ==
> >> >> > Some architectures unmap memmap[] for memory holes even with SPARSEMEM.
> >> >> > To handle that, pfn_valid() should check there are really memmap or not.
> >> >> > For that purpose, __get_user() can be used.
> >> >>
> >> >> Look at free_unused_memmap. We don't unmap pte of hole memmap.
> >> >> Is __get_use effective, still?
> >> >>
> >> > __get_user() works with TLB and page table, the vaddr is really mapped or not.
> >> > If you got SEGV, __get_user() returns -EFAULT. It works per page granule.
> >>
> >> I mean following as.
> >> For example, there is a struct page in on 0x20000000.
> >>
> >> int pfn_valid_mapped(unsigned long pfn)
> >> {
> >>        struct page *page = pfn_to_page(pfn); /* hole page is 0x2000000 */
> >>        char *lastbyte = (char *)(page+1)-1;  /* lastbyte is 0x2000001f */
> >>        char byte;
> >>
> >>        /* We pass this test since free_unused_memmap doesn't unmap pte */
> >>        if(__get_user(byte, page) != 0)
> >>                return 0;
> >
> > why ? When the page size is 4096 byte.
> >
> >      0x1ffff000 - 0x1ffffffff
> >      0x20000000 - 0x200000fff are on the same page. And memory is mapped per page.
> 
> sizeof(struct page) is 32 byte.
> So lastbyte is address of struct page + 32 byte - 1.
> 
> > What we access by above __get_user() is a byte at [0x20000000, 0x20000001)
> 
> Right.
> 
> > and it's unmapped if 0x20000000 is unmapped.
> 
> free_unused_memmap doesn't unmap pte although it returns the page to
> free list of buddy.
> 
OK, I understood. Please see my latest mail and ignore all the others.

-kame


^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: [RFC] Tight check of pfn_valid on sparsemem
  2010-07-13  6:40           ` KAMEZAWA Hiroyuki
@ 2010-07-13  8:06             ` Minchan Kim
  -1 siblings, 0 replies; 58+ messages in thread
From: Minchan Kim @ 2010-07-13  8:06 UTC (permalink / raw)
  To: KAMEZAWA Hiroyuki
  Cc: linux, Yinghai Lu, H. Peter Anvin, Andrew Morton, Shaohua Li,
	Yakui Zhao, linux-kernel, linux-mm, arm-kernel, kgene.kim,
	Mel Gorman

On Tue, Jul 13, 2010 at 3:40 PM, KAMEZAWA Hiroyuki
<kamezawa.hiroyu@jp.fujitsu.com> wrote:
> On Tue, 13 Jul 2010 15:04:00 +0900
> Minchan Kim <minchan.kim@gmail.com> wrote:
>
>> >> >  2. This can't be help for a case where a section has multiple small holes.
>> >>
>> >> I agree. But this(not punched hole but not filled section problem)
>> >> isn't such case. But it would be better to handle it altogether. :)
>> >>
>> >> >
>> >> > Then, my proposal for HOLES_IN_MEMMAP sparsemem is below.
>> >> > ==
>> >> > Some architectures unmap memmap[] for memory holes even with SPARSEMEM.
>> >> > To handle that, pfn_valid() should check there are really memmap or not.
>> >> > For that purpose, __get_user() can be used.
>> >>
>> >> Look at free_unused_memmap. We don't unmap pte of hole memmap.
>> >> Is __get_use effective, still?
>> >>
>> > __get_user() works with TLB and page table, the vaddr is really mapped or not.
>> > If you got SEGV, __get_user() returns -EFAULT. It works per page granule.
>>
>> I mean following as.
>> For example, there is a struct page in on 0x20000000.
>>
>> int pfn_valid_mapped(unsigned long pfn)
>> {
>>        struct page *page = pfn_to_page(pfn); /* hole page is 0x2000000 */
>>        char *lastbyte = (char *)(page+1)-1;  /* lastbyte is 0x2000001f */
>>        char byte;
>>
>>        /* We pass this test since free_unused_memmap doesn't unmap pte */
>>        if(__get_user(byte, page) != 0)
>>                return 0;
>
> why ? When the page size is 4096 byte.
>
>      0x1ffff000 - 0x1ffffffff
>      0x20000000 - 0x200000fff are on the same page. And memory is mapped per page.

sizeof(struct page) is 32 bytes,
so lastbyte is the address of the struct page + 32 bytes - 1.

> What we access by above __get_user() is a byte at [0x20000000, 0x20000001)

Right.

> and it's unmapped if 0x20000000 is unmapped.

free_unused_memmap() doesn't unmap the PTE, although it returns the page to
the buddy free list.
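
To make the intent concrete, a completed version of that earlier sketch could
look like this (pfn_valid_mapped() is not an existing helper; the lastbyte
probe only matters for a struct page that straddles a page boundary):

int pfn_valid_mapped(unsigned long pfn)
{
	struct page *page = pfn_to_page(pfn);
	char *firstbyte = (char *)page;
	char *lastbyte = (char *)(page + 1) - 1;
	char byte;

	/* -EFAULT means the memmap virtual address itself is unmapped */
	if (__get_user(byte, firstbyte) != 0)
		return 0;
	/* also probe the last byte, in case the entry crosses into an unmapped page */
	if (((unsigned long)firstbyte & PAGE_MASK) != ((unsigned long)lastbyte & PAGE_MASK) &&
	    __get_user(byte, lastbyte) != 0)
		return 0;
	return 1;
}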

>
> Thanks,
> -Kame
>
>



-- 
Kind regards,
Minchan Kim

^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: [RFC] Tight check of pfn_valid on sparsemem
  2010-07-12 15:53 ` Minchan Kim
@ 2010-07-13  9:30   ` Johannes Weiner
  -1 siblings, 0 replies; 58+ messages in thread
From: Johannes Weiner @ 2010-07-13  9:30 UTC (permalink / raw)
  To: Minchan Kim
  Cc: linux, Yinghai Lu, H. Peter Anvin, Andrew Morton, Shaohua Li,
	Yakui Zhao, linux-kernel, linux-mm, arm-kernel, kgene.kim,
	Mel Gorman, KAMEZAWA Hiroyuki

On Tue, Jul 13, 2010 at 12:53:48AM +0900, Minchan Kim wrote:
> Kukjin, Could you test below patch?
> I don't have any sparsemem system. Sorry. 
> 
> -- CUT DOWN HERE --
> 
> Kukjin reported oops happen while he change min_free_kbytes
> http://www.spinics.net/lists/arm-kernel/msg92894.html
> It happen by memory map on sparsemem. 
> 
> The system has a memory map following as. 
>      section 0             section 1              section 2
> 0x20000000-0x25000000, 0x40000000-0x50000000, 0x50000000-0x58000000
> SECTION_SIZE_BITS 28(256M)
> 
> It means section 0 is an incompletely filled section.
> Nontheless, current pfn_valid of sparsemem checks pfn loosely. 
> 
> It checks only mem_section's validation.
> So in above case, pfn on 0x25000000 can pass pfn_valid's validation check.
> It's not what we want.
> 
> The Following patch adds check valid pfn range check on pfn_valid of sparsemem.

Look at the declaration of struct mem_section for a second.  It is
meant to partition address space uniformly into backed and unbacked
areas.

It comes with implicit size and offset information by means of
SECTION_SIZE_BITS and the section's index in the section array.

Now you are not okay with the _granularity_ but propose to change _the
model_ by introducing a subsection within each section and at the same
time make the concept of a section completely meaningless: its size
becomes arbitrary and its associated mem_map and flags will apply to
the subsection only.

My question is: if the sections are not fine-grained enough, why not
just make them finer?

The biggest possible section size to describe the memory population on
this machine accurately is 16M.  Why not set SECTION_SIZE_BITS to 24?
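
For illustration, something along these lines in the platform's sparsemem
definitions (the header location is machine-specific, and the values below
are only what this particular memory map needs):

#define SECTION_SIZE_BITS	24	/* 16M sections fit the 80M/256M/128M banks exactly */
#define MAX_PHYSMEM_BITS	31	/* the highest bank ends below 0x80000000 */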

^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: [RFC] Tight check of pfn_valid on sparsemem
  2010-07-12 15:53 ` Minchan Kim
@ 2010-07-13  9:37   ` Mel Gorman
  -1 siblings, 0 replies; 58+ messages in thread
From: Mel Gorman @ 2010-07-13  9:37 UTC (permalink / raw)
  To: Minchan Kim
  Cc: linux, Yinghai Lu, H. Peter Anvin, Andrew Morton, Shaohua Li,
	Yakui Zhao, linux-kernel, linux-mm, arm-kernel, kgene.kim,
	KAMEZAWA Hiroyuki

On Tue, Jul 13, 2010 at 12:53:48AM +0900, Minchan Kim wrote:
> Kukjin, Could you test below patch?
> I don't have any sparsemem system. Sorry. 
> 
> -- CUT DOWN HERE --
> 
> Kukjin reported oops happen while he change min_free_kbytes
> http://www.spinics.net/lists/arm-kernel/msg92894.html
> It happen by memory map on sparsemem. 
> 
> The system has a memory map following as. 
>      section 0             section 1              section 2
> 0x20000000-0x25000000, 0x40000000-0x50000000, 0x50000000-0x58000000
> SECTION_SIZE_BITS 28(256M)
> 
> It means section 0 is an incompletely filled section.
> Nontheless, current pfn_valid of sparsemem checks pfn loosely. 
> 
> It checks only mem_section's validation.

Note that this is meant to be fine as far as sparsemem is concerned.  This is
how it functions by design - if a section is valid, all pages within that
section are valid. Hence, I consider the comment "sparsemem checks pfn loosely"
unfair as it's as strong as required until someone breaks the model.

It's ARM that is punching holes here, and as the hole is at the beginning
of a section, the zone boundaries could have been adjusted after the hole
was punched.

Anyway...

> So in above case, pfn on 0x25000000 can pass pfn_valid's validation check.
> It's not what we want. 
> 
> The Following patch adds check valid pfn range check on pfn_valid of sparsemem.
> 
> Signed-off-by: Minchan Kim <minchan.kim@gmail.com>
> Reported-by: Kukjin Kim <kgene.kim@samsung.com>
> 
> P.S) 
> It is just RFC. If we agree with this, I will make the patch on mmotm.
> 
> --
> 
> diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h
> index b4d109e..6c2147a 100644
> --- a/include/linux/mmzone.h
> +++ b/include/linux/mmzone.h
> @@ -979,6 +979,8 @@ struct mem_section {
>         struct page_cgroup *page_cgroup;
>         unsigned long pad;
>  #endif
> +       unsigned long start_pfn;
> +       unsigned long end_pfn;
>  };

ouch, an 8 to 16 byte increase in a commonly-used structure. Minimally, I
would only expect this to be defined for
CONFIG_ARCH_HAS_HOLES_MEMORYMODEL. Similarly for the rest of the patch,
I'd expect the "strong" pfn_valid checks to only exist on ARM because
nothing else needs them.
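
i.e. roughly this shape (only a sketch of the suggestion, not a tested patch;
the two new fields would sit under the same #ifdef):

#ifdef CONFIG_ARCH_HAS_HOLES_MEMORYMODEL
static inline int valid_section_pfn(struct mem_section *section, unsigned long pfn)
{
	return valid_section(section) &&
	       section->start_pfn <= pfn && pfn < section->end_pfn;
}
#else
static inline int valid_section_pfn(struct mem_section *section, unsigned long pfn)
{
	/* no holes inside sections, so the plain section check is enough */
	return valid_section(section);
}
#endif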

>  
>  #ifdef CONFIG_SPARSEMEM_EXTREME
> @@ -1039,6 +1041,12 @@ static inline int valid_section(struct mem_section *section)
>         return (section && (section->section_mem_map & SECTION_HAS_MEM_MAP));
>  }
>  
> +static inline int valid_section_pfn(struct mem_section *section, unsigned long pfn)
> +{
> +       return ((section && (section->section_mem_map & SECTION_HAS_MEM_MAP)) &&
> +               (section->start_pfn <= pfn && pfn < section->end_pfn));
> +}
> +
>  static inline int valid_section_nr(unsigned long nr)
>  {
>         return valid_section(__nr_to_section(nr));
> @@ -1053,7 +1061,7 @@ static inline int pfn_valid(unsigned long pfn)
>  {
>         if (pfn_to_section_nr(pfn) >= NR_MEM_SECTIONS)
>                 return 0;
> -       return valid_section(__nr_to_section(pfn_to_section_nr(pfn)));
> +       return valid_section_pfn(__nr_to_section(pfn_to_section_nr(pfn)), pfn);
>  }
> 
> diff --git a/mm/sparse.c b/mm/sparse.c
> index 95ac219..bde9090 100644
> --- a/mm/sparse.c
> +++ b/mm/sparse.c
> @@ -195,6 +195,8 @@ void __init memory_present(int nid, unsigned long start, unsigned long end)
>                 if (!ms->section_mem_map)
>                         ms->section_mem_map = sparse_encode_early_nid(nid) |
>                                                         SECTION_MARKED_PRESENT;
> +               ms->start_pfn = start;
> +               ms->end_pfn = end;
>         }
>  }
> 

I'd also expect the patch to then delete memmap_valid_within() and replace
it with a pfn_valid_within() check (which in turn should call the strong
version of pfn_valid()).

I prefer Kamezawa's suggestion of mapping on a ZERO_PAGE-like page full
of PageReserved struct pages because it would have better performance
and be more in line with maintaining the assumptions of the memory
model. If we go in this direction, I would strongly prefer it was an
ARM-only thing.

-- 
Mel Gorman
Part-time Phd Student                          Linux Technology Center
University of Limerick                         IBM Dublin Software Lab

^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: [RFC] Tight check of pfn_valid on sparsemem
  2010-07-13  9:37   ` Mel Gorman
@ 2010-07-13  9:46     ` Russell King - ARM Linux
  -1 siblings, 0 replies; 58+ messages in thread
From: Russell King - ARM Linux @ 2010-07-13  9:46 UTC (permalink / raw)
  To: Mel Gorman
  Cc: Minchan Kim, Yinghai Lu, H. Peter Anvin, Andrew Morton,
	Shaohua Li, Yakui Zhao, linux-kernel, linux-mm, arm-kernel,
	kgene.kim, KAMEZAWA Hiroyuki

On Tue, Jul 13, 2010 at 10:37:00AM +0100, Mel Gorman wrote:
> I prefer Kamezawa's suggestion of mapping on a ZERO_PAGE-like page full
> of PageReserved struct pages because it would have better performance
> and be more in line with maintaining the assumptions of the memory
> model. If we go in this direction, I would strongly prefer it was an
> ARM-only thing.

As I've said, this is not possible without doing some serious page
manipulation.

Plus the pages that were there become unusable, as they don't correspond
to a PFN or obey phys_to_virt().  So there's absolutely no point to
this.

Now, why do we free the holes in the mem_map - because these holes can
be extremely large.  Every 512K of hole equates to one page of mem_map
array.  Balance that against memory placed at 0xc0000000 physical on
some platforms, and with PHYSMEM_BITS at 32 and SECTION_SIZE_BITS at
19 - well, you do the maths.  The result is certainly not pretty.
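
(For the record, the arithmetic: 512K of hole is 128 pages of 4K, and at 32
bytes per struct page that is 128 * 32 = 4096 bytes - exactly one page of
mem_map.  And 32-bit physical addressing with 512K sections means
2^(32-19) = 8192 entries in mem_section[] before a single byte of memmap.)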

^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: [RFC] Tight check of pfn_valid on sparsemem
  2010-07-13  9:46     ` Russell King - ARM Linux
@ 2010-07-13 10:00       ` Mel Gorman
  -1 siblings, 0 replies; 58+ messages in thread
From: Mel Gorman @ 2010-07-13 10:00 UTC (permalink / raw)
  To: Russell King - ARM Linux
  Cc: Minchan Kim, Yinghai Lu, H. Peter Anvin, Andrew Morton,
	Shaohua Li, Yakui Zhao, linux-kernel, linux-mm, arm-kernel,
	kgene.kim, KAMEZAWA Hiroyuki

On Tue, Jul 13, 2010 at 10:46:12AM +0100, Russell King - ARM Linux wrote:
> On Tue, Jul 13, 2010 at 10:37:00AM +0100, Mel Gorman wrote:
> > I prefer Kamezawa's suggestion of mapping on a ZERO_PAGE-like page full
> > of PageReserved struct pages because it would have better performance
> > and be more in line with maintaining the assumptions of the memory
> > model. If we go in this direction, I would strongly prefer it was an
> > ARM-only thing.
> 
> As I've said, this is not possible without doing some serious page
> manipulation.
> 

Yep, x86 used to do something like this for discontig. It wasn't pretty.

> Plus the pages that where there become unusable as they don't correspond
> with a PFN or obey phys_to_virt().  So there's absolutely no point to
> this.
> 
> Now, why do we free the holes in the mem_map - because these holes can
> be extremely large.  Every 512K of hole equates to one page of mem_map
> array.

Sure, the holes might be large but at least they are contiguous. Is
there ever a case where you have

512K_Valid 512K_Hole 512K_Valid 512K_Hole

or is it typically

512K_hole 512K_hole ...... 512K_Valid 512K_Valid etc

If holes are typically contiguous, the memmap is not allocated in the first place
and the savings from punching holes in the memmap in the latter configuration
are minimal.

I recognise that if you have a 2M section with a hole in it, you are
potentially wasting 3 pages on unused memmap. If this is really a problem,
Minchan's modification to sparsemem to increase the size of mem_section on
CONFIG_ARCH_HAS_HOLES_MEMORYMODEL is a messy option. I say messy because
it only works if the hole is at either end of the section and it adds
quirks to the memory model.

>  Balance that against memory placed at 0xc0000000 physical on
> some platforms, and with PHYSMEM_BITS at 32 and SECTION_SIZE_BITS at
> 19 - well, you do the maths.  The result is certainly not pretty.
> 

-- 
Mel Gorman
Part-time Phd Student                          Linux Technology Center
University of Limerick                         IBM Dublin Software Lab

^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: [RFC] Tight check of pfn_valid on sparsemem
  2010-07-13  9:30   ` Johannes Weiner
@ 2010-07-13 15:43     ` Minchan Kim
  -1 siblings, 0 replies; 58+ messages in thread
From: Minchan Kim @ 2010-07-13 15:43 UTC (permalink / raw)
  To: Johannes Weiner
  Cc: linux, Yinghai Lu, H. Peter Anvin, Andrew Morton, Shaohua Li,
	Yakui Zhao, linux-kernel, linux-mm, arm-kernel, kgene.kim,
	Mel Gorman, KAMEZAWA Hiroyuki

On Tue, Jul 13, 2010 at 11:30:06AM +0200, Johannes Weiner wrote:
> On Tue, Jul 13, 2010 at 12:53:48AM +0900, Minchan Kim wrote:
> > Kukjin, Could you test below patch?
> > I don't have any sparsemem system. Sorry. 
> > 
> > -- CUT DOWN HERE --
> > 
> > Kukjin reported oops happen while he change min_free_kbytes
> > http://www.spinics.net/lists/arm-kernel/msg92894.html
> > It happen by memory map on sparsemem. 
> > 
> > The system has a memory map following as. 
> >      section 0             section 1              section 2
> > 0x20000000-0x25000000, 0x40000000-0x50000000, 0x50000000-0x58000000
> > SECTION_SIZE_BITS 28(256M)
> > 
> > It means section 0 is an incompletely filled section.
> > Nontheless, current pfn_valid of sparsemem checks pfn loosely. 
> > 
> > It checks only mem_section's validation.
> > So in above case, pfn on 0x25000000 can pass pfn_valid's validation check.
> > It's not what we want.
> > 
> > The Following patch adds check valid pfn range check on pfn_valid of sparsemem.
> 
> Look at the declaration of struct mem_section for a second.  It is
> meant to partition address space uniformly into backed and unbacked
> areas.
> 
> It comes with implicit size and offset information by means of
> SECTION_SIZE_BITS and the section's index in the section array.
> 
> Now you are not okay with the _granularity_ but propose to change _the
> model_ by introducing a subsection within each section and at the same
> time make the concept of a section completely meaningless: its size
> becomes arbitrary and its associated mem_map and flags will apply to
> the subsection only.
> 
> My question is: if the sections are not fine-grained enough, why not
> just make them?
> 
> The biggest possible section size to describe the memory population on
> this machine accurately is 16M.  Why not set SECTION_SIZE_BITS to 24?

You're right. AFAIK, Kukjin tried it but Russell and others rejected it. 
Let's wrap it up. 

First of all, Thanks for joining good discussion, Kame, Hannes, Mel and 
Russell. 

The system has the following memory map:
  0x20000000-0x25000000    80M
  0x25000000-0x40000000   432M (hole)
  0x40000000-0x50000000   256M
  0x50000000-0x58000000   128M

1) FLATMEM
If it uses FLATMEM, it wastes 864 (432M/512K) pages of memmap on the hole.
That's horrible.

2) SPARSEMEM(16M)
It makes 56 mem_sections, which cost 448 (56 * 8) bytes.
It doesn't create any unused memmap. So good.

3) SPARSEMEM(256M)
It makes 3 mem_sections, which cost 24 (3 * 8) bytes.
And if we free the unused memmap for 176M (256M - 80M), we can save 352 pages.

Option 3 is the best for memory usage, but for it we have to check pfn_valid more tightly.
That can be done with my patch, but the mm guys didn't like it since it makes the memory
model messy for the sake of some funny architectures (i.e., sparsemem is designed not to
contain holes), and it still has a problem if there is a hole in the middle of a section.

Option 3 is not a big win over option 2 in memory usage.
Even if the system used the memory space fully (MAX_PHYSMEM_BITS 31), option 2 would only
consume 1024 (128 * 8) bytes. So now I think the best solution is 2.
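(The 1024 bytes is 2^(31-24) = 128 sections times 8 bytes of struct mem_section each.)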

Russell. What do you think about it?

-- 
Kind regards,
Minchan Kim

^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: [RFC] Tight check of pfn_valid on sparsemem
  2010-07-13 15:43     ` Minchan Kim
@ 2010-07-13 16:35       ` Dave Hansen
  -1 siblings, 0 replies; 58+ messages in thread
From: Dave Hansen @ 2010-07-13 16:35 UTC (permalink / raw)
  To: Minchan Kim
  Cc: Johannes Weiner, linux, Yinghai Lu, H. Peter Anvin,
	Andrew Morton, Shaohua Li, Yakui Zhao, linux-kernel, linux-mm,
	arm-kernel, kgene.kim, Mel Gorman, KAMEZAWA Hiroyuki

On Wed, 2010-07-14 at 00:43 +0900, Minchan Kim wrote:
> 3 is not a big deal than 2 about memory usage.
> If the system use memory space fully(MAX_PHYSMEM_BITS 31), it just consumes
> 1024(128 * 8) byte. So now I think best solution is 2. 
> 
> Russell. What do you think about it? 

I'm not Russell, but I'll tell you what I think. :)

Make the sections 16MB.  Your suggestion to add the start/end pfns
_doubles_ the size of the structure, and its size overhead.  We have
systems with a pretty tremendous amount of memory with 16MB sections.

If you _really_ can't make the section size smaller, and the vast
majority of the sections are fully populated, you could hack something
in.  We could, for instance, have a global list that's mostly read-only
which tells you which sections need to have their sizes closely
inspected.  That would work OK if, for instance, you only needed to
check a couple of memory sections in the system.  It'll start to suck if
you made the lists very long.
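
Very roughly, something like the sketch below (all of these names are made
up, and this board's single partial section is hard-coded only to show the
shape - in practice the list would be filled in at boot):

struct partial_section {
	unsigned long section_nr;
	unsigned long start_pfn;
	unsigned long end_pfn;
};

/* sections not listed here are assumed fully populated */
static struct partial_section partial_sections[] __read_mostly = {
	/* 0x20000000-0x25000000 with 256M sections */
	{ .section_nr = 2, .start_pfn = 0x20000, .end_pfn = 0x25000 },
};

static inline int pfn_in_partial_section(unsigned long pfn)
{
	unsigned long nr = pfn_to_section_nr(pfn);
	int i;

	for (i = 0; i < ARRAY_SIZE(partial_sections); i++) {
		if (partial_sections[i].section_nr != nr)
			continue;
		return pfn >= partial_sections[i].start_pfn &&
		       pfn < partial_sections[i].end_pfn;
	}
	return 1;
}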

-- Dave


^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: [RFC] Tight check of pfn_valid on sparsemem
  2010-07-13 16:35       ` Dave Hansen
@ 2010-07-13 16:44         ` Minchan Kim
  -1 siblings, 0 replies; 58+ messages in thread
From: Minchan Kim @ 2010-07-13 16:44 UTC (permalink / raw)
  To: Dave Hansen
  Cc: Johannes Weiner, linux, Yinghai Lu, H. Peter Anvin,
	Andrew Morton, Shaohua Li, Yakui Zhao, linux-kernel, linux-mm,
	arm-kernel, kgene.kim, Mel Gorman, KAMEZAWA Hiroyuki

Hi, Dave. 

On Tue, Jul 13, 2010 at 09:35:33AM -0700, Dave Hansen wrote:
> On Wed, 2010-07-14 at 00:43 +0900, Minchan Kim wrote:
> > 3 is not a big deal than 2 about memory usage.
> > If the system use memory space fully(MAX_PHYSMEM_BITS 31), it just consumes
> > 1024(128 * 8) byte. So now I think best solution is 2. 
> > 
> > Russell. What do you think about it? 
> 
> I'm not Russell, but I'll tell you what I think. :)
> 

No problem. :)

> Make the sections 16MB.  You suggestion to add the start/end pfns

I hope so. 

> _doubles_ the size of the structure, and its size overhead.  We have
> systems with a pretty tremendous amount of memory with 16MB sections.

Yes, it does on multi-GB server systems.

> 
> If you _really_ can't make the section size smaller, and the vast
> majority of the sections are fully populated, you could hack something
> in.  We could, for instance, have a global list that's mostly readonly
> which tells you which sections need to be have their sizes closely
> inspected.  That would work OK if, for instance, you only needed to
> check a couple of memory sections in the system.  It'll start to suck if
> you made the lists very long.

Thanks for the advice. As I said, I hope Russell accepts the 16M section size. 

> 
> -- Dave
> 

-- 
Kind regards,
Minchan Kim

^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: [RFC] Tight check of pfn_valid on sparsemem
  2010-07-13  8:02                 ` KAMEZAWA Hiroyuki
@ 2010-07-13 18:39                   ` Russell King - ARM Linux
  -1 siblings, 0 replies; 58+ messages in thread
From: Russell King - ARM Linux @ 2010-07-13 18:39 UTC (permalink / raw)
  To: KAMEZAWA Hiroyuki
  Cc: Minchan Kim, Yinghai Lu, H. Peter Anvin, Andrew Morton,
	Shaohua Li, Yakui Zhao, linux-kernel, linux-mm, arm-kernel,
	kgene.kim, Mel Gorman

On Tue, Jul 13, 2010 at 05:02:22PM +0900, KAMEZAWA Hiroyuki wrote:
> How about stop using SPARSEMEM ? What's the benefit ? It just eats up
> memory for mem_section[].

The problem with that approach is that sometimes the mem_map array
doesn't fit into any memory banks.

We've gone around the loop of using flatmem with holes punched in it,
to using discontigmem, and now to using sparsemem.  It seems none of
these solutions does what we need for ARM.  I guess that's the price
we pay for not having memory architected to be at any particular place
in the physical memory map.

We're even seeing setups lately where system memory is split into
two areas, and the second (higher physical address) area is populated
before the lower bank...  These kinds of games are getting rather
stupid and idiotic, but we're not the hardware designers and so we have
to live with it - or just tell the folk who are porting the kernel to
these platforms that we'll never take their patches.

^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: [RFC] Tight check of pfn_valid on sparsemem
  2010-07-13 18:39                   ` Russell King - ARM Linux
@ 2010-07-13 20:46                     ` Dave Hansen
  -1 siblings, 0 replies; 58+ messages in thread
From: Dave Hansen @ 2010-07-13 20:46 UTC (permalink / raw)
  To: Russell King - ARM Linux
  Cc: KAMEZAWA Hiroyuki, Minchan Kim, Yinghai Lu, H. Peter Anvin,
	Andrew Morton, Shaohua Li, Yakui Zhao, linux-kernel, linux-mm,
	arm-kernel, kgene.kim, Mel Gorman

On Tue, 2010-07-13 at 19:39 +0100, Russell King - ARM Linux wrote:
> On Tue, Jul 13, 2010 at 05:02:22PM +0900, KAMEZAWA Hiroyuki wrote:
> > How about stop using SPARSEMEM ? What's the benefit ? It just eats up
> > memory for mem_section[].
> 
> The problem with that approach is that sometimes the mem_map array
> doesn't fit into any memory banks.
> 
> We've gone around the loop of using flatmem with holes punched in it,
> to using discontigmem, and now to using sparsemem.  It seems none of
> these solutions does what we need for ARM.  I guess that's the price
> we pay for not having memory architected to be at any particular place
> in the physical memory map.

What's the ARM hardware's maximum addressable memory these days? 4GB?

A 4GB system would have 256 sections, which means 256*2*sizeof(unsigned
long) for the mem_section[].  That's a pretty small amount of RAM.
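(That is 4GB / 16MB = 256 sections, and 256 * 2 * 4 bytes is a 2K array on a
32-bit kernel.)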

What sizes are the holes that are being punched these days?  Smaller
than 16MB?

-- Dave


^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: [RFC] Tight check of pfn_valid on sparsemem
  2010-07-13 16:44         ` Minchan Kim
@ 2010-07-14  0:23           ` KAMEZAWA Hiroyuki
  -1 siblings, 0 replies; 58+ messages in thread
From: KAMEZAWA Hiroyuki @ 2010-07-14  0:23 UTC (permalink / raw)
  To: Minchan Kim
  Cc: Dave Hansen, Johannes Weiner, linux, Yinghai Lu, H. Peter Anvin,
	Andrew Morton, Shaohua Li, Yakui Zhao, linux-kernel, linux-mm,
	arm-kernel, kgene.kim, Mel Gorman

On Wed, 14 Jul 2010 01:44:23 +0900
Minchan Kim <minchan.kim@gmail.com> wrote:
 
> > If you _really_ can't make the section size smaller, and the vast
> > majority of the sections are fully populated, you could hack something
> > in.  We could, for instance, have a global list that's mostly readonly
> > which tells you which sections need to be have their sizes closely
> > inspected.  That would work OK if, for instance, you only needed to
> > check a couple of memory sections in the system.  It'll start to suck if
> > you made the lists very long.
> 
> Thanks for advise. As I say, I hope Russell accept 16M section. 
> 

It seems what I needed was good sleep....
How about this if 16M section is not acceptable ?

== NOT TESTED AT ALL, EVEN NOT COMPILED ==

register address of mem_section to memmap itself's page struct's pg->private field.
This means the page is used for memmap of the section.
Otherwise, the page is used for other purpose and memmap has a hole.

---
 arch/arm/mm/init.c     |   11 ++++++++++-
 include/linux/mmzone.h |   19 ++++++++++++++++++-
 mm/sparse.c            |   37 +++++++++++++++++++++++++++++++++++++
 3 files changed, 65 insertions(+), 2 deletions(-)

Index: mmotm-2.6.35-0701/include/linux/mmzone.h
===================================================================
--- mmotm-2.6.35-0701.orig/include/linux/mmzone.h
+++ mmotm-2.6.35-0701/include/linux/mmzone.h
@@ -1047,11 +1047,28 @@ static inline struct mem_section *__pfn_
 	return __nr_to_section(pfn_to_section_nr(pfn));
 }
 
+#ifdef CONFIG_SPARSEMEM_HAS_PIT
+void mark_memmap_pit(unsigned long start, unsigned long end, bool valid);
+static inline int page_valid(struct mem_section *ms, unsigned long pfn)
+{
+	struct page *page = pfn_to_page(pfn);
+	struct page *__pg = virt_to_page(page);
+	return __pg->private == ms;
+}
+#else
+static inline int page_valid(struct mem_section *ms, unsigned long pfn)
+{
+	return 1;
+}
+#endif
+
 static inline int pfn_valid(unsigned long pfn)
 {
+	struct mem_section *ms;
 	if (pfn_to_section_nr(pfn) >= NR_MEM_SECTIONS)
 		return 0;
-	return valid_section(__nr_to_section(pfn_to_section_nr(pfn)));
+	ms = __nr_to_section(pfn_to_section_nr(pfn));
+	return valid_section(ms) && page_valid(ms, pfn);
 }
 
 static inline int pfn_present(unsigned long pfn)
Index: mmotm-2.6.35-0701/mm/sparse.c
===================================================================
--- mmotm-2.6.35-0701.orig/mm/sparse.c
+++ mmotm-2.6.35-0701/mm/sparse.c
@@ -615,6 +615,43 @@ void __init sparse_init(void)
 	free_bootmem(__pa(usemap_map), size);
 }
 
+#ifdef CONFIG_SPARSEMEM_HAS_PIT
+/*
+ * Fill memmap's pg->private with a pointer to mem_section.
+ * pfn_valid() will check this later. (see include/linux/mmzone.h)
+ * The caller should call 
+ * 	mark_memmap_pit(start, end, true) # for all allocated mem_map
+ * 	and, after that,
+ * 	mark_memmap_pit(start, end, false) # for all pits in mem_map.
+ * please see usage in ARM.
+ */
+void mark_memmap_pit(unsigned long start, unsigned long end, bool valid)
+{
+	struct mem_section *ms;
+	unsigned long pos, next;
+	struct page *pg;
+	void *memmap, *end;
+	unsigned long mapsize = sizeof(struct page) * PAGES_PER_SECTION;
+
+	for (pos = start;
+	     pos < end; pos = next) {
+		next = (pos + PAGES_PER_SECTION) & PAGE_SECTION_MASK;
+		ms = __pfn_to_section(pos);
+		if (!valid_section(ms))
+			continue;
+		for (memmap = pfn_to_page(pos), end = pfn_to_page(next-1);
+		     memmap != end + 1;
+		     memmap += PAGE_SIZE) {
+			pg = virt_to_page(memmap);
+			if (valid)
+				pg->private = ms;
+			else
+				pg->private = NULL;
+		}
+	}
+}
+#endif
+
 #ifdef CONFIG_MEMORY_HOTPLUG
 #ifdef CONFIG_SPARSEMEM_VMEMMAP
 static inline struct page *kmalloc_section_memmap(unsigned long pnum, int nid,
Index: mmotm-2.6.35-0701/arch/arm/mm/init.c
===================================================================
--- mmotm-2.6.35-0701.orig/arch/arm/mm/init.c
+++ mmotm-2.6.35-0701/arch/arm/mm/init.c
@@ -234,6 +234,13 @@ static void __init arm_bootmem_free(stru
 	arch_adjust_zones(zone_size, zhole_size);
 
 	free_area_init_node(0, zone_size, min, zhole_size);
+
+#ifdef CONFIG_SPARSEMEM
+	for_each_bank(i, mi) {
+		mark_memmap_pit(bank_start_pfn(mi->bank[i]),
+				bank_end_pfn(mi->bank[i]), true);
+	}
+#endif
 }
 
 #ifndef CONFIG_SPARSEMEM
@@ -386,8 +393,10 @@ free_memmap(unsigned long start_pfn, uns
 	 * If there are free pages between these,
 	 * free the section of the memmap array.
 	 */
-	if (pg < pgend)
+	if (pg < pgend) {
+		mark_memmap_pit(pg >> PAGE_SHIFT, pgend >> PAGE_SHIFT, false);
 		free_bootmem(pg, pgend - pg);
+	}
 }
 
 /*


^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: [RFC] Tight check of pfn_valid on sparsemem
  2010-07-14  0:23           ` KAMEZAWA Hiroyuki
@ 2010-07-14  6:44             ` Minchan Kim
  -1 siblings, 0 replies; 58+ messages in thread
From: Minchan Kim @ 2010-07-14  6:44 UTC (permalink / raw)
  To: KAMEZAWA Hiroyuki
  Cc: Dave Hansen, Johannes Weiner, linux, Yinghai Lu, H. Peter Anvin,
	Andrew Morton, Shaohua Li, Yakui Zhao, linux-kernel, linux-mm,
	arm-kernel, kgene.kim, Mel Gorman

Hi, Kame.

On Wed, Jul 14, 2010 at 9:23 AM, KAMEZAWA Hiroyuki
<kamezawa.hiroyu@jp.fujitsu.com> wrote:
> On Wed, 14 Jul 2010 01:44:23 +0900
> Minchan Kim <minchan.kim@gmail.com> wrote:
>
>> > If you _really_ can't make the section size smaller, and the vast
>> > majority of the sections are fully populated, you could hack something
>> > in.  We could, for instance, have a global list that's mostly readonly
>> > which tells you which sections need to be have their sizes closely
>> > inspected.  That would work OK if, for instance, you only needed to
>> > check a couple of memory sections in the system.  It'll start to suck if
>> > you made the lists very long.
>>
>> Thanks for advise. As I say, I hope Russell accept 16M section.
>>
>
> It seems what I needed was good sleep....
> How about this if 16M section is not acceptable ?
>
> == NOT TESTED AT ALL, EVEN NOT COMPILED ==
>
> register address of mem_section to memmap itself's page struct's pg->private field.
> This means the page is used for memmap of the section.
> Otherwise, the page is used for other purpose and memmap has a hole.

It's a very good idea. :)
But can this handle the case where a page of the memmap holds struct page
descriptors for a hole?
I mean, one page can hold 128 page descriptors (4096 / 32).
If 64 of those descriptors are valid but the remaining 64 cover a hole,
free_memmap doesn't free the page.

I think most systems will have memory aligned to 512K (4K * 128).
But I am not sure.
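
(For reference, the granularity behind those numbers, assuming 4K pages and a
32-byte struct page:

	4096 / 32 = 128 page descriptors per page of memmap
	128 descriptors * 4KB described by each = 512KB of RAM per memmap page

so free_memmap() can only give back whole memmap pages, and any bank or hole
that is not 512KB-aligned leaves a partially used memmap page behind.)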
-- 
Kind regards,
Minchan Kim

^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: [RFC] Tight check of pfn_valid on sparsemem
  2010-07-14  6:44             ` Minchan Kim
@ 2010-07-14  7:10               ` KAMEZAWA Hiroyuki
  -1 siblings, 0 replies; 58+ messages in thread
From: KAMEZAWA Hiroyuki @ 2010-07-14  7:10 UTC (permalink / raw)
  To: Minchan Kim
  Cc: Dave Hansen, Johannes Weiner, linux, Yinghai Lu, H. Peter Anvin,
	Andrew Morton, Shaohua Li, Yakui Zhao, linux-kernel, linux-mm,
	arm-kernel, kgene.kim, Mel Gorman

On Wed, 14 Jul 2010 15:44:41 +0900
Minchan Kim <minchan.kim@gmail.com> wrote:

> Hi, Kame.
> 
> On Wed, Jul 14, 2010 at 9:23 AM, KAMEZAWA Hiroyuki
> <kamezawa.hiroyu@jp.fujitsu.com> wrote:
> > On Wed, 14 Jul 2010 01:44:23 +0900
> > Minchan Kim <minchan.kim@gmail.com> wrote:
> >
> >> > If you _really_ can't make the section size smaller, and the vast
> >> > majority of the sections are fully populated, you could hack something
> >> > in.  We could, for instance, have a global list that's mostly readonly
> >> > which tells you which sections need to be have their sizes closely
> >> > inspected.  That would work OK if, for instance, you only needed to
> >> > check a couple of memory sections in the system.  It'll start to suck if
> >> > you made the lists very long.
> >>
> >> Thanks for advise. As I say, I hope Russell accept 16M section.
> >>
> >
> > It seems what I needed was good sleep....
> > How about this if 16M section is not acceptable ?
> >
> > == NOT TESTED AT ALL, EVEN NOT COMPILED ==
> >
> > register address of mem_section to memmap itself's page struct's pg->private field.
> > This means the page is used for memmap of the section.
> > Otherwise, the page is used for other purpose and memmap has a hole.
> 
> It's a very good idea. :)
> But can this handle case that a page on memmap pages have struct page
> descriptor of hole?
> I mean one page can include 128 page descriptor(4096 / 32).
yes.

> In there, 64 page descriptor is valid but remain 64 page descriptor is on hole.
> In this case, free_memmap doesn't free the page.

Yes, but in that case there are valid page descriptors for the 64 pages of hole.
pfn_valid() should return true, but PG_reserved is set on them.
(This is the usual behavior.)

My intention is that:

 - When all 128 page descriptors are unused, free_memmap() will free the page.
   In that case, clear page->private of the page that held the freed descriptors.

 - When some of the page descriptors are used, free_memmap() can't free the page,
   and its page->private keeps pointing to the mem_section. We may keep memmap
   for a memory hole, but pfn_valid() is only there to check whether memmap
   exists, so its behavior is still correct. Anyway, you can't free only half
   of a page.

If my code doesn't work as described above, it's a bug.
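
(To make that concrete, a minimal sketch of the pattern a pfn walker would
follow under this scheme; walk_pfn_range() is only an illustrative name here,
not an existing helper:)

	/*
	 * Sketch only: pfn_valid() says "a struct page exists for this pfn",
	 * while PG_reserved marks descriptors that merely cover a hole.
	 */
	static void walk_pfn_range(unsigned long start_pfn, unsigned long end_pfn)
	{
		unsigned long pfn;

		for (pfn = start_pfn; pfn < end_pfn; pfn++) {
			struct page *page;

			if (!pfn_valid(pfn))
				continue;	/* no memmap backs this pfn at all */
			page = pfn_to_page(pfn);
			if (PageReserved(page))
				continue;	/* memmap exists but covers a hole */
			/* ... operate on a real, usable page here ... */
		}
	}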


Thanks,
-Kame


^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: [RFC] Tight check of pfn_valid on sparsemem
  2010-07-14  7:10               ` KAMEZAWA Hiroyuki
@ 2010-07-14  7:35                 ` Minchan Kim
  -1 siblings, 0 replies; 58+ messages in thread
From: Minchan Kim @ 2010-07-14  7:35 UTC (permalink / raw)
  To: KAMEZAWA Hiroyuki
  Cc: Dave Hansen, Johannes Weiner, linux, Yinghai Lu, H. Peter Anvin,
	Andrew Morton, Shaohua Li, Yakui Zhao, linux-kernel, linux-mm,
	arm-kernel, kgene.kim, Mel Gorman

On Wed, Jul 14, 2010 at 4:10 PM, KAMEZAWA Hiroyuki
<kamezawa.hiroyu@jp.fujitsu.com> wrote:
> On Wed, 14 Jul 2010 15:44:41 +0900
> Minchan Kim <minchan.kim@gmail.com> wrote:
>
>> Hi, Kame.
>>
>> On Wed, Jul 14, 2010 at 9:23 AM, KAMEZAWA Hiroyuki
>> <kamezawa.hiroyu@jp.fujitsu.com> wrote:
>> > On Wed, 14 Jul 2010 01:44:23 +0900
>> > Minchan Kim <minchan.kim@gmail.com> wrote:
>> >
>> >> > If you _really_ can't make the section size smaller, and the vast
>> >> > majority of the sections are fully populated, you could hack something
>> >> > in.  We could, for instance, have a global list that's mostly readonly
>> >> > which tells you which sections need to be have their sizes closely
>> >> > inspected.  That would work OK if, for instance, you only needed to
>> >> > check a couple of memory sections in the system.  It'll start to suck if
>> >> > you made the lists very long.
>> >>
>> >> Thanks for advise. As I say, I hope Russell accept 16M section.
>> >>
>> >
>> > It seems what I needed was good sleep....
>> > How about this if 16M section is not acceptable ?
>> >
>> > == NOT TESTED AT ALL, EVEN NOT COMPILED ==
>> >
>> > register address of mem_section to memmap itself's page struct's pg->private field.
>> > This means the page is used for memmap of the section.
>> > Otherwise, the page is used for other purpose and memmap has a hole.
>>
>> It's a very good idea. :)
>> But can this handle case that a page on memmap pages have struct page
>> descriptor of hole?
>> I mean one page can include 128 page descriptor(4096 / 32).
> yes.
>
>> In there, 64 page descriptor is valid but remain 64 page descriptor is on hole.
>> In this case, free_memmap doesn't free the page.
>
> yes. but in that case, there are valid page decriptor for 64pages of holes.
> pfn_valid() should return true but PG_reserved is set.
> (This is usual behavior.)
>
> My intention is that
>
>  - When all 128 page descriptors are unused, free_memmap() will free it.
>   In that case, clear page->private of a page for freed page descriptors.
>
>  - When some of page descriptors are used, free_memmap() can't free it
>   and page->private points to &mem_section. We may have memmap for memory
>   hole but pfn_valid() is a function to check there is memmap or not.
>   The bahavior of pfn_valid() is valid.
>   Anyway, you can't free only half of page.

Okay. I missed PageReserved.
Your idea seems good. :)

I looked at pagetypeinfo_showblockcount_print.
It doesn't check PageReserved; instead it does the ugly memmap_valid_within check.
Can't we remove that and replace it with a PageReserved check?
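
(Roughly the swap being suggested, as an untested sketch against
pagetypeinfo_showblockcount_print(); this is from memory, so the exact
context lines may differ:)

	 		page = pfn_to_page(pfn);
	-		if (!memmap_valid_within(pfn, page, zone))
	-			continue;
	+		if (PageReserved(page))
	+			continue;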


-- 
Kind regards,
Minchan Kim

^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: [RFC] Tight check of pfn_valid on sparsemem
  2010-07-14  7:35                 ` Minchan Kim
@ 2010-07-14  7:39                   ` KAMEZAWA Hiroyuki
  -1 siblings, 0 replies; 58+ messages in thread
From: KAMEZAWA Hiroyuki @ 2010-07-14  7:39 UTC (permalink / raw)
  To: Minchan Kim
  Cc: Dave Hansen, Johannes Weiner, linux, Yinghai Lu, H. Peter Anvin,
	Andrew Morton, Shaohua Li, Yakui Zhao, linux-kernel, linux-mm,
	arm-kernel, kgene.kim, Mel Gorman

On Wed, 14 Jul 2010 16:35:22 +0900
Minchan Kim <minchan.kim@gmail.com> wrote:

> On Wed, Jul 14, 2010 at 4:10 PM, KAMEZAWA Hiroyuki
> <kamezawa.hiroyu@jp.fujitsu.com> wrote:
> > On Wed, 14 Jul 2010 15:44:41 +0900
> > Minchan Kim <minchan.kim@gmail.com> wrote:
> >
> >> Hi, Kame.
> >>
> >> On Wed, Jul 14, 2010 at 9:23 AM, KAMEZAWA Hiroyuki
> >> <kamezawa.hiroyu@jp.fujitsu.com> wrote:
> >> > On Wed, 14 Jul 2010 01:44:23 +0900
> >> > Minchan Kim <minchan.kim@gmail.com> wrote:
> >> >
> >> >> > If you _really_ can't make the section size smaller, and the vast
> >> >> > majority of the sections are fully populated, you could hack something
> >> >> > in.  We could, for instance, have a global list that's mostly readonly
> >> >> > which tells you which sections need to be have their sizes closely
> >> >> > inspected.  That would work OK if, for instance, you only needed to
> >> >> > check a couple of memory sections in the system.  It'll start to suck if
> >> >> > you made the lists very long.
> >> >>
> >> >> Thanks for advise. As I say, I hope Russell accept 16M section.
> >> >>
> >> >
> >> > It seems what I needed was good sleep....
> >> > How about this if 16M section is not acceptable ?
> >> >
> >> > == NOT TESTED AT ALL, EVEN NOT COMPILED ==
> >> >
> >> > register address of mem_section to memmap itself's page struct's pg->private field.
> >> > This means the page is used for memmap of the section.
> >> > Otherwise, the page is used for other purpose and memmap has a hole.
> >>
> >> It's a very good idea. :)
> >> But can this handle case that a page on memmap pages have struct page
> >> descriptor of hole?
> >> I mean one page can include 128 page descriptor(4096 / 32).
> > yes.
> >
> >> In there, 64 page descriptor is valid but remain 64 page descriptor is on hole.
> >> In this case, free_memmap doesn't free the page.
> >
> > yes. but in that case, there are valid page decriptor for 64pages of holes.
> > pfn_valid() should return true but PG_reserved is set.
> > (This is usual behavior.)
> >
> > My intention is that
> >
> >  - When all 128 page descriptors are unused, free_memmap() will free it.
> >   In that case, clear page->private of a page for freed page descriptors.
> >
> >  - When some of page descriptors are used, free_memmap() can't free it
> >   and page->private points to &mem_section. We may have memmap for memory
> >   hole but pfn_valid() is a function to check there is memmap or not.
> >   The bahavior of pfn_valid() is valid.
> >   Anyway, you can't free only half of page.
> 
> Okay. I missed PageReserved.
> Your idea seems to be good. :)
> 
> I looked at pagetypeinfo_showblockcount_print.
> It doesn't check PageReserved. Instead of it, it does ugly memmap_valid_within.
> Can't we remove it and change it with PageReserved?
> 
Maybe, but I'm not sure how many archs use CONFIG_ARCH_HAS_HOLES_MEMORYMODEL.
Because my idea requires adding an arch-dependent hook, the enhanced pfn_valid()
takes effect only when an arch supports it, so you may need a conservative path.

Anyway, I can't test the patch myself, so I pass the ball to the ARM folks.
Feel free to reuse my idea if you like.
Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>

Thanks,
-Kame





^ permalink raw reply	[flat|nested] 58+ messages in thread

* RE: [RFC] Tight check of pfn_valid on sparsemem
  2010-07-14  0:23           ` KAMEZAWA Hiroyuki
@ 2010-07-14  7:50             ` Kukjin Kim
  -1 siblings, 0 replies; 58+ messages in thread
From: Kukjin Kim @ 2010-07-14  7:50 UTC (permalink / raw)
  To: 'KAMEZAWA Hiroyuki', 'Minchan Kim'
  Cc: 'Dave Hansen', 'Johannes Weiner',
	linux, 'Yinghai Lu', 'H. Peter Anvin',
	'Andrew Morton', 'Shaohua Li',
	'Yakui Zhao',
	linux-kernel, linux-mm, arm-kernel, 'Mel Gorman'

KAMEZAWA Hiroyuki wrote:
> 
> On Wed, 14 Jul 2010 01:44:23 +0900
> Minchan Kim <minchan.kim@gmail.com> wrote:
> 
> > > If you _really_ can't make the section size smaller, and the vast
> > > majority of the sections are fully populated, you could hack something
> > > in.  We could, for instance, have a global list that's mostly readonly
> > > which tells you which sections need to be have their sizes closely
> > > inspected.  That would work OK if, for instance, you only needed to
> > > check a couple of memory sections in the system.  It'll start to suck if
> > > you made the lists very long.
> >
> > Thanks for advise. As I say, I hope Russell accept 16M section.
> >
Hi,

Thanks for your inputs.
> 
> It seems what I needed was good sleep....
> How about this if 16M section is not acceptable ?
> 
> == NOT TESTED AT ALL, EVEN NOT COMPILED ==

Yeah...

I couldn't build with s5pv210_defconfig when using the mmotm tree,
and I couldn't apply your patch against the latest mainline 35-rc5.

Could you please remake your patch against mainline 35-rc5?
Or please let me know how I can test it on my board (smdkv210).

> 
> register address of mem_section to memmap itself's page struct's pg->private
> field.
> This means the page is used for memmap of the section.
> Otherwise, the page is used for other purpose and memmap has a hole.
> 
> ---
>  arch/arm/mm/init.c     |   11 ++++++++++-
>  include/linux/mmzone.h |   19 ++++++++++++++++++-
>  mm/sparse.c            |   37 +++++++++++++++++++++++++++++++++++++
>  3 files changed, 65 insertions(+), 2 deletions(-)
> 
> Index: mmotm-2.6.35-0701/include/linux/mmzone.h
> ===================================================================
> --- mmotm-2.6.35-0701.orig/include/linux/mmzone.h
> +++ mmotm-2.6.35-0701/include/linux/mmzone.h
> @@ -1047,11 +1047,28 @@ static inline struct mem_section *__pfn_
>  	return __nr_to_section(pfn_to_section_nr(pfn));
>  }
> 
> +#ifdef CONFIG_SPARSEMEM_HAS_PIT
> +void mark_memmap_pit(unsigned long start, unsigned long end, bool valid);
> +static inline int page_valid(struct mem_section *ms, unsigned long pfn)
> +{
> +	struct page *page = pfn_to_page(pfn);
> +	struct page *__pg = virt_to_page(page);
> +	return __pg->private == ms;
> +}
> +#else
> +static inline int page_valid(struct mem_section *ms, unsigned long pfn)
> +{
> +	return 1;
> +}
> +#endif
> +
>  static inline int pfn_valid(unsigned long pfn)
>  {
> +	struct mem_section *ms;
>  	if (pfn_to_section_nr(pfn) >= NR_MEM_SECTIONS)
>  		return 0;
> -	return valid_section(__nr_to_section(pfn_to_section_nr(pfn)));
> +	ms = __nr_to_section(pfn_to_section_nr(pfn));
> +	return valid_section(ms) && page_valid(ms, pfn);
>  }
> 
>  static inline int pfn_present(unsigned long pfn)
> Index: mmotm-2.6.35-0701/mm/sparse.c
> ===================================================================
> --- mmotm-2.6.35-0701.orig/mm/sparse.c
> +++ mmotm-2.6.35-0701/mm/sparse.c
> @@ -615,6 +615,43 @@ void __init sparse_init(void)
>  	free_bootmem(__pa(usemap_map), size);
>  }
> 
> +#ifdef CONFIG_SPARSEMEM_HAS_PIT
> +/*
> + * Fill memmap's pg->private with a pointer to mem_section.
> + * pfn_valid() will check this later. (see include/linux/mmzone.h)
> + * The caller should call
> + * 	mark_memmap_pit(start, end, true) # for all allocated mem_map
> + * 	and, after that,
> + * 	mark_memmap_pit(start, end, false) # for all pits in mem_map.
> + * please see usage in ARM.
> + */
> +void mark_memmap_pit(unsigned long start, unsigned long end, bool valid)
> +{
> +	struct mem_section *ms;
> +	unsigned long pos, next;
> +	struct page *pg;
> +	void *memmap, *end;
> +	unsigned long mapsize = sizeof(struct page) * PAGES_PER_SECTION;
> +
> +	for (pos = start;
> +	     pos < end; pos = next) {
> +		next = (pos + PAGES_PER_SECTION) & PAGE_SECTION_MASK;
> +		ms = __pfn_to_section(pos);
> +		if (!valid_section(ms))
> +			continue;
> +		for (memmap = pfn_to_page(pos), end = pfn_to_page(next-1);
> +		     memmap != end + 1;
> +		     memmap += PAGE_SIZE) {
> +			pg = virt_to_page(memmap);
> +			if (valid)
> +				pg->private = ms;
> +			else
> +				pg->private = NULL;
> +		}
> +	}
> +}
> +#endif
> +
>  #ifdef CONFIG_MEMORY_HOTPLUG
>  #ifdef CONFIG_SPARSEMEM_VMEMMAP
>  static inline struct page *kmalloc_section_memmap(unsigned long pnum, int nid,
> Index: mmotm-2.6.35-0701/arch/arm/mm/init.c
> ===================================================================
> --- mmotm-2.6.35-0701.orig/arch/arm/mm/init.c
> +++ mmotm-2.6.35-0701/arch/arm/mm/init.c
> @@ -234,6 +234,13 @@ static void __init arm_bootmem_free(stru
>  	arch_adjust_zones(zone_size, zhole_size);
> 
>  	free_area_init_node(0, zone_size, min, zhole_size);
> +
> +#ifdef CONFIG_SPARSEMEM
> +	for_each_bank(i, mi) {
> +		mark_memmap_pit(bank_start_pfn(mi->bank[i]),
> +				bank_end_pfn(mi->bank[i]), true);
> +	}
> +#endif
>  }
> 
>  #ifndef CONFIG_SPARSEMEM
> @@ -386,8 +393,10 @@ free_memmap(unsigned long start_pfn, uns
>  	 * If there are free pages between these,
>  	 * free the section of the memmap array.
>  	 */
> -	if (pg < pgend)
> +	if (pg < pgend) {
> +		mark_memmap_pit(pg >> PAGE_SHIFT, pgend >> PAGE_SHIFT, false);
>  		free_bootmem(pg, pgend - pg);
> +	}
>  }
> 
>  /*



Thanks.

Best regards,
Kgene.
--
Kukjin Kim <kgene.kim@samsung.com>, Senior Engineer,
SW Solution Development Team, Samsung Electronics Co., Ltd.


^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: [RFC] Tight check of pfn_valid on sparsemem
  2010-07-14  7:50             ` Kukjin Kim
@ 2010-07-14  8:09               ` KAMEZAWA Hiroyuki
  -1 siblings, 0 replies; 58+ messages in thread
From: KAMEZAWA Hiroyuki @ 2010-07-14  8:09 UTC (permalink / raw)
  To: Kukjin Kim
  Cc: 'Minchan Kim', 'Dave Hansen',
	'Johannes Weiner', linux, 'Yinghai Lu',
	'H. Peter Anvin', 'Andrew Morton',
	'Shaohua Li', 'Yakui Zhao',
	linux-kernel, linux-mm, arm-kernel, 'Mel Gorman'

On Wed, 14 Jul 2010 16:50:25 +0900
Kukjin Kim <kgene.kim@samsung.com> wrote:

> KAMEZAWA Hiroyuki wrote:
> > 
> > On Wed, 14 Jul 2010 01:44:23 +0900
> > Minchan Kim <minchan.kim@gmail.com> wrote:
> > 
> > > > If you _really_ can't make the section size smaller, and the vast
> > > > majority of the sections are fully populated, you could hack something
> > > > in.  We could, for instance, have a global list that's mostly readonly
> > > > which tells you which sections need to be have their sizes closely
> > > > inspected.  That would work OK if, for instance, you only needed to
> > > > check a couple of memory sections in the system.  It'll start to suck
> if
> > > > you made the lists very long.
> > >
> > > Thanks for advise. As I say, I hope Russell accept 16M section.
> > >
> Hi,
> 
> Thanks for your inputs.
> > 
> > It seems what I needed was good sleep....
> > How about this if 16M section is not acceptable ?
> > 
> > == NOT TESTED AT ALL, EVEN NOT COMPILED ==
> 
> Yeah...
> 
> Couldn't build with s5pv210_defconfig when used mmotm tree,
> And couldn't apply your patch against latest mainline 35-rc5.
> 
> Could you please remake your patch against mainline 35-rc5?
> Or...please let me know how I can test on my board(smdkv210).
> 
Ahh..how brave you are.

My patch was against mmotm-07-01.
Select SPARSEMEM_HAS_PIT in your config
(in menuconfig it will appear under "Processor type and features").

This is a fixed one; maybe mm/sparse.o can now be compiled.
At least the arch-generic part compiles.

(This config should be selected automatically via ARM's Kconfig;
 this patch is just for testing.)
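
(For a quick test on the board, a minimal .config fragment would then be
something like the following; the exact dependencies are only an assumption:)

	CONFIG_SPARSEMEM=y
	CONFIG_SPARSEMEM_HAS_PIT=y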

Signed-off-by: KAMEZAWA hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
---
 arch/arm/mm/init.c     |   11 ++++++++++-
 include/linux/mmzone.h |   19 ++++++++++++++++++-
 mm/Kconfig             |    5 +++++
 mm/sparse.c            |   38 +++++++++++++++++++++++++++++++++++++-
 4 files changed, 70 insertions(+), 3 deletions(-)

Index: mmotm-2.6.35-0701/include/linux/mmzone.h
===================================================================
--- mmotm-2.6.35-0701.orig/include/linux/mmzone.h
+++ mmotm-2.6.35-0701/include/linux/mmzone.h
@@ -1047,11 +1047,28 @@ static inline struct mem_section *__pfn_
 	return __nr_to_section(pfn_to_section_nr(pfn));
 }
 
+#ifdef CONFIG_SPARSEMEM_HAS_PIT
+void mark_memmap_pit(unsigned long start, unsigned long end, bool valid);
+static inline int page_valid(struct mem_section *ms, unsigned long pfn)
+{
+	struct page *page = pfn_to_page(pfn);
+	struct page *__pg = virt_to_page(page);
+	return __pg->private == ms;
+}
+#else
+static inline int page_valid(struct mem_section *ms, unsigned long pfn)
+{
+	return 1;
+}
+#endif
+
 static inline int pfn_valid(unsigned long pfn)
 {
+	struct mem_section *ms;
 	if (pfn_to_section_nr(pfn) >= NR_MEM_SECTIONS)
 		return 0;
-	return valid_section(__nr_to_section(pfn_to_section_nr(pfn)));
+	ms = __nr_to_section(pfn_to_section_nr(pfn));
+	return valid_section(ms) && page_valid(ms, pfn);
 }
 
 static inline int pfn_present(unsigned long pfn)
Index: mmotm-2.6.35-0701/mm/sparse.c
===================================================================
--- mmotm-2.6.35-0701.orig/mm/sparse.c
+++ mmotm-2.6.35-0701/mm/sparse.c
@@ -13,7 +13,6 @@
 #include <asm/dma.h>
 #include <asm/pgalloc.h>
 #include <asm/pgtable.h>
-
 /*
  * Permanent SPARSEMEM data:
  *
@@ -615,6 +614,43 @@ void __init sparse_init(void)
 	free_bootmem(__pa(usemap_map), size);
 }
 
+#ifdef CONFIG_SPARSEMEM_HAS_PIT
+/*
+ * Fill memmap's pg->private with a pointer to mem_section.
+ * pfn_valid() will check this later. (see include/linux/mmzone.h)
+ * The caller should call
+ * 	mark_memmap_pit(start, end, true) # for all allocated mem_map
+ * 	and, after that,
+ * 	mark_memmap_pit(start, end, false) # for all pits in mem_map.
+ * please see usage in ARM.
+ */
+void mark_memmap_pit(unsigned long start, unsigned long end, bool valid)
+{
+	struct mem_section *ms;
+	unsigned long pos, next;
+	struct page *pg;
+	void *memmap, *mapend;
+
+	for (pos = start;
+	     pos < end; pos = next) {
+		next = (pos + PAGES_PER_SECTION) & PAGE_SECTION_MASK;
+		ms = __pfn_to_section(pos);
+		if (!valid_section(ms))
+			continue;
+		for (memmap = (void*)pfn_to_page(pos),
+		     mapend = pfn_to_page(next-1); /* the last page in section*/
+		     memmap < mapend;
+		     memmap += PAGE_SIZE) {
+			pg = virt_to_page(memmap);
+			if (valid)
+				pg->private = (unsigned long)ms;
+			else
+				pg->private = 0;
+		}
+	}
+}
+#endif
+
 #ifdef CONFIG_MEMORY_HOTPLUG
 #ifdef CONFIG_SPARSEMEM_VMEMMAP
 static inline struct page *kmalloc_section_memmap(unsigned long pnum, int nid,
Index: mmotm-2.6.35-0701/arch/arm/mm/init.c
===================================================================
--- mmotm-2.6.35-0701.orig/arch/arm/mm/init.c
+++ mmotm-2.6.35-0701/arch/arm/mm/init.c
@@ -234,6 +234,13 @@ static void __init arm_bootmem_free(stru
 	arch_adjust_zones(zone_size, zhole_size);
 
 	free_area_init_node(0, zone_size, min, zhole_size);
+
+#ifdef CONFIG_SPARSEMEM
+	for_each_bank(i, mi) {
+		mark_memmap_pit(bank_start_pfn(mi->bank[i]),
+				bank_end_pfn(mi->bank[i]), true);
+	}
+#endif
 }
 
 #ifndef CONFIG_SPARSEMEM
@@ -386,8 +393,10 @@ free_memmap(unsigned long start_pfn, uns
 	 * If there are free pages between these,
 	 * free the section of the memmap array.
 	 */
-	if (pg < pgend)
+	if (pg < pgend) {
+		mark_memmap_pit(pg >> PAGE_SHIFT, pgend >> PAGE_SHIFT, false);
 		free_bootmem(pg, pgend - pg);
+	}
 }
 
 /*
Index: mmotm-2.6.35-0701/mm/Kconfig
===================================================================
--- mmotm-2.6.35-0701.orig/mm/Kconfig
+++ mmotm-2.6.35-0701/mm/Kconfig
@@ -128,6 +128,11 @@ config SPARSEMEM_VMEMMAP
 	 pfn_to_page and page_to_pfn operations.  This is the most
 	 efficient option when sufficient kernel resources are available.
 
+config SPARSEMEM_HAS_PIT
+	bool "allow holes in sparsemem's memmap"
+	depends on SPARSEMEM && !SPARSEMEM_VMEMMAP
+	default n
+
 # eventually, we can have this option just 'select SPARSEMEM'
 config MEMORY_HOTPLUG
 	bool "Allow for memory hot-add"


^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: [RFC] Tight check of pfn_valid on sparsemem
@ 2010-07-14  8:09               ` KAMEZAWA Hiroyuki
  0 siblings, 0 replies; 58+ messages in thread
From: KAMEZAWA Hiroyuki @ 2010-07-14  8:09 UTC (permalink / raw)
  To: Kukjin Kim
  Cc: 'Minchan Kim', 'Dave Hansen',
	'Johannes Weiner', linux, 'Yinghai Lu',
	'H. Peter Anvin', 'Andrew Morton',
	'Shaohua Li', 'Yakui Zhao',
	linux-kernel, linux-mm, arm-kernel, 'Mel Gorman'

On Wed, 14 Jul 2010 16:50:25 +0900
Kukjin Kim <kgene.kim@samsung.com> wrote:

> KAMEZAWA Hiroyuki wrote:
> > 
> > On Wed, 14 Jul 2010 01:44:23 +0900
> > Minchan Kim <minchan.kim@gmail.com> wrote:
> > 
> > > > If you _really_ can't make the section size smaller, and the vast
> > > > majority of the sections are fully populated, you could hack something
> > > > in.  We could, for instance, have a global list that's mostly readonly
> > > > which tells you which sections need to be have their sizes closely
> > > > inspected.  That would work OK if, for instance, you only needed to
> > > > check a couple of memory sections in the system.  It'll start to suck
> if
> > > > you made the lists very long.
> > >
> > > Thanks for advise. As I say, I hope Russell accept 16M section.
> > >
> Hi,
> 
> Thanks for your inputs.
> > 
> > It seems what I needed was good sleep....
> > How about this if 16M section is not acceptable ?
> > 
> > == NOT TESTED AT ALL, EVEN NOT COMPILED ==
> 
> Yeah...
> 
> Couldn't build with s5pv210_defconfig when used mmotm tree,
> And couldn't apply your patch against latest mainline 35-rc5.
> 
> Could you please remake your patch against mainline 35-rc5?
> Or...please let me know how I can test on my board(smdkv210).
> 
Ahh..how brave you are.

my patch was against mmotm-07-01.
select SPARSEMEM_HAS_PIT in config.
(in menuconfig it will appear under Processor type and features.)

This is a fixed one, maybe mm/sparce.o can be compiled.
At least, arch-generic part is compiled.

(This config should be selected automatically via arm's config.
 this patch is just for test.)

Signed-off-by: KAMEZAWA hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
---
 arch/arm/mm/init.c     |   11 ++++++++++-
 include/linux/mmzone.h |   19 ++++++++++++++++++-
 mm/Kconfig             |    5 +++++
 mm/sparse.c            |   38 +++++++++++++++++++++++++++++++++++++-
 4 files changed, 70 insertions(+), 3 deletions(-)

Index: mmotm-2.6.35-0701/include/linux/mmzone.h
===================================================================
--- mmotm-2.6.35-0701.orig/include/linux/mmzone.h
+++ mmotm-2.6.35-0701/include/linux/mmzone.h
@@ -1047,11 +1047,28 @@ static inline struct mem_section *__pfn_
 	return __nr_to_section(pfn_to_section_nr(pfn));
 }
 
+#ifdef CONFIG_SPARSEMEM_HAS_PIT
+void mark_memmap_pit(unsigned long start, unsigned long end, bool valid);
+static inline int page_valid(struct mem_section *ms, unsigned long pfn)
+{
+	struct page *page = pfn_to_page(pfn);
+	struct page *__pg = virt_to_page(page);
+	return __pg->private == ms;
+}
+#else
+static inline int page_valid(struct mem_section *ms, unsigned long pfn)
+{
+	return 1;
+}
+#endif
+
 static inline int pfn_valid(unsigned long pfn)
 {
+	struct mem_section *ms;
 	if (pfn_to_section_nr(pfn) >= NR_MEM_SECTIONS)
 		return 0;
-	return valid_section(__nr_to_section(pfn_to_section_nr(pfn)));
+	ms = __nr_to_section(pfn_to_section_nr(pfn));
+	return valid_section(ms) && page_valid(ms, pfn);
 }
 
 static inline int pfn_present(unsigned long pfn)
Index: mmotm-2.6.35-0701/mm/sparse.c
===================================================================
--- mmotm-2.6.35-0701.orig/mm/sparse.c
+++ mmotm-2.6.35-0701/mm/sparse.c
@@ -13,7 +13,6 @@
 #include <asm/dma.h>
 #include <asm/pgalloc.h>
 #include <asm/pgtable.h>
-
 /*
  * Permanent SPARSEMEM data:
  *
@@ -615,6 +614,43 @@ void __init sparse_init(void)
 	free_bootmem(__pa(usemap_map), size);
 }
 
+#ifdef CONFIG_SPARSEMEM_HAS_PIT
+/*
+ * Fill memmap's pg->private with a pointer to mem_section.
+ * pfn_valid() will check this later. (see include/linux/mmzone.h)
+ * The caller should call
+ * 	mark_memmap_pit(start, end, true) # for all allocated mem_map
+ * 	and, after that,
+ * 	mark_memmap_pit(start, end, false) # for all pits in mem_map.
+ * please see usage in ARM.
+ */
+void mark_memmap_pit(unsigned long start, unsigned long end, bool valid)
+{
+	struct mem_section *ms;
+	unsigned long pos, next;
+	struct page *pg;
+	void *memmap, *mapend;
+
+	for (pos = start;
+	     pos < end; pos = next) {
+		next = (pos + PAGES_PER_SECTION) & PAGE_SECTION_MASK;
+		ms = __pfn_to_section(pos);
+		if (!valid_section(ms))
+			continue;
+		for (memmap = (void*)pfn_to_page(pos),
+		     mapend = pfn_to_page(next-1); /* the last page in section*/
+		     memmap < mapend;
+		     memmap += PAGE_SIZE) {
+			pg = virt_to_page(memmap);
+			if (valid)
+				pg->private = (unsigned long)ms;
+			else
+				pg->private = 0;
+		}
+	}
+}
+#endif
+
 #ifdef CONFIG_MEMORY_HOTPLUG
 #ifdef CONFIG_SPARSEMEM_VMEMMAP
 static inline struct page *kmalloc_section_memmap(unsigned long pnum, int nid,
Index: mmotm-2.6.35-0701/arch/arm/mm/init.c
===================================================================
--- mmotm-2.6.35-0701.orig/arch/arm/mm/init.c
+++ mmotm-2.6.35-0701/arch/arm/mm/init.c
@@ -234,6 +234,13 @@ static void __init arm_bootmem_free(stru
 	arch_adjust_zones(zone_size, zhole_size);
 
 	free_area_init_node(0, zone_size, min, zhole_size);
+
+#ifdef CONFIG_SPARSEMEM
+	for_each_bank(i, mi) {
+		mark_memmap_pit(bank_pfn_start(&mi->bank[i]),
+				bank_pfn_end(&mi->bank[i]), true);
+	}
+#endif
 }
 
 #ifndef CONFIG_SPARSEMEM
@@ -386,8 +393,10 @@ free_memmap(unsigned long start_pfn, uns
 	 * If there are free pages between these,
 	 * free the section of the memmap array.
 	 */
-	if (pg < pgend)
+	if (pg < pgend) {
+		mark_memmap_pit(pg >> PAGE_SHIFT, pgend >> PAGE_SHIFT, false);
 		free_bootmem(pg, pgend - pg);
+	}
 }
 
 /*
Index: mmotm-2.6.35-0701/mm/Kconfig
===================================================================
--- mmotm-2.6.35-0701.orig/mm/Kconfig
+++ mmotm-2.6.35-0701/mm/Kconfig
@@ -128,6 +128,11 @@ config SPARSEMEM_VMEMMAP
 	 pfn_to_page and page_to_pfn operations.  This is the most
 	 efficient option when sufficient kernel resources are available.
 
+config SPARSEMEM_HAS_PIT
+	bool "allow holes in sparsemem's memmap"
+	depends on SPARSEMEM && !SPARSEMEM_VMEMMAP
+	default n
+
 # eventually, we can have this option just 'select SPARSEMEM'
 config MEMORY_HOTPLUG
 	bool "Allow for memory hot-add"
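
If this builds, one quick way to check the result on the board (rather than
reproducing the min_free_kbytes oops) would be a throw-away initcall like
the one below.  It just walks the pfns around the end of the first bank and
prints what pfn_valid() returns; the range is taken from the layout in the
original report, so adjust it for the real memory map.  This is only a test
idea, not part of the patch.

#include <linux/kernel.h>
#include <linux/init.h>
#include <linux/mm.h>

static int __init pfn_valid_selftest(void)
{
	unsigned long pfn;

	/* 0x25000 is the first pfn of the hole in the reported layout */
	for (pfn = 0x24ff0; pfn < 0x25010; pfn++)
		pr_info("pfn_valid(0x%lx) = %d\n", pfn, pfn_valid(pfn));

	return 0;
}
late_initcall(pfn_valid_selftest);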


^ permalink raw reply	[flat|nested] 58+ messages in thread

Thread overview: 58+ messages
2010-07-12 15:53 [RFC] Tight check of pfn_valid on sparsemem Minchan Kim
2010-07-12 23:59 ` Kukjin Kim
2010-07-13  3:19 ` KAMEZAWA Hiroyuki
2010-07-13  4:11   ` Minchan Kim
2010-07-13  4:23     ` KAMEZAWA Hiroyuki
2010-07-13  6:04       ` Minchan Kim
2010-07-13  6:40         ` KAMEZAWA Hiroyuki
2010-07-13  8:06           ` Minchan Kim
2010-07-13  8:03             ` KAMEZAWA Hiroyuki
2010-07-13  7:20         ` Russell King - ARM Linux
2010-07-13  7:34           ` KAMEZAWA Hiroyuki
2010-07-13  7:58             ` KAMEZAWA Hiroyuki
2010-07-13  8:02               ` KAMEZAWA Hiroyuki
2010-07-13 18:39                 ` Russell King - ARM Linux
2010-07-13 20:46                   ` Dave Hansen
2010-07-13  9:30 ` Johannes Weiner
2010-07-13 15:43   ` Minchan Kim
2010-07-13 16:35     ` Dave Hansen
2010-07-13 16:44       ` Minchan Kim
2010-07-14  0:23         ` KAMEZAWA Hiroyuki
2010-07-14  6:44           ` Minchan Kim
2010-07-14  7:10             ` KAMEZAWA Hiroyuki
2010-07-14  7:35               ` Minchan Kim
2010-07-14  7:39                 ` KAMEZAWA Hiroyuki
2010-07-14  7:50           ` Kukjin Kim
2010-07-14  8:09             ` KAMEZAWA Hiroyuki
2010-07-13  9:37 ` Mel Gorman
2010-07-13  9:46   ` Russell King - ARM Linux
2010-07-13 10:00     ` Mel Gorman
