* [PATCH 0/2] page_cgroup cleanups
@ 2013-04-05 10:01 ` Glauber Costa
  0 siblings, 0 replies; 16+ messages in thread
From: Glauber Costa @ 2013-04-05 10:01 UTC (permalink / raw)
  To: cgroups; +Cc: linux-mm, kamezawa.hiroyu, Johannes Weiner, Michal Hocko

Hi,

Last time I sent the mem cgroup bypass patches, Kame and Michal pointed out
that some of it was really cleanup work, specifically on the page_cgroup
side. I've decided to split those patches out and send them separately.
After these patches are applied, page_cgroup will be initialized together
with the root cgroup, instead of from init/main.c.

When we later move cgroup initialization to the first non-root cgroup
created, all we'll have to do on the page_cgroup side is move the
initialization that now happens at root creation to the first child.

Glauber Costa (2):
  memcg: consistently use vmalloc for page_cgroup allocations
  memcg: defer page_cgroup initialization

 include/linux/page_cgroup.h | 21 +------------------
 init/main.c                 |  2 --
 mm/memcontrol.c             |  2 ++
 mm/page_cgroup.c            | 51 +++++++++++++++------------------------------
 4 files changed, 20 insertions(+), 56 deletions(-)

-- 
1.8.1.4

^ permalink raw reply	[flat|nested] 16+ messages in thread

* [PATCH 1/2] memcg: consistently use vmalloc for page_cgroup allocations
@ 2013-04-05 10:01   ` Glauber Costa
  0 siblings, 0 replies; 16+ messages in thread
From: Glauber Costa @ 2013-04-05 10:01 UTC (permalink / raw)
  To: cgroups
  Cc: linux-mm, kamezawa.hiroyu, Johannes Weiner, Michal Hocko, Glauber Costa

Right now, allocation for page_cgroup is a bit complicated and depends on
a variety of system conditions:

For flat memory, we are likely to need quite large contiguous allocations,
so the page allocator won't cut it. We are forced to initialize flatmem
mappings very early, because once the page allocator is in place those
allocations would be denied. Flatmem mappings thus resort to the bootmem
allocator.

We can fix this by using vmalloc for flatmem mappings. However, we would
then have a situation in which flatmem mappings always allocate with
vmalloc, while sparsemem may or may not: it tries the page allocator
first, and falls back to vmalloc if that fails.

With that change in place, not only *can* we move page_cgroup_init_flatmem,
we absolutely must: it now needs to run after vmalloc is up. Instead of
just moving it past vmalloc initialization, we move it together with the
normal page_cgroup initialization. It then becomes natural to merge the
two under a single name.

Signed-off-by: Glauber Costa <glommer@parallels.com>
---
 include/linux/page_cgroup.h | 15 ---------------
 init/main.c                 |  1 -
 mm/page_cgroup.c            | 24 ++++++++++--------------
 3 files changed, 10 insertions(+), 30 deletions(-)

diff --git a/include/linux/page_cgroup.h b/include/linux/page_cgroup.h
index 777a524..4860eca 100644
--- a/include/linux/page_cgroup.h
+++ b/include/linux/page_cgroup.h
@@ -29,17 +29,7 @@ struct page_cgroup {
 
 void __meminit pgdat_page_cgroup_init(struct pglist_data *pgdat);
 
-#ifdef CONFIG_SPARSEMEM
-static inline void __init page_cgroup_init_flatmem(void)
-{
-}
 extern void __init page_cgroup_init(void);
-#else
-void __init page_cgroup_init_flatmem(void);
-static inline void __init page_cgroup_init(void)
-{
-}
-#endif
 
 struct page_cgroup *lookup_page_cgroup(struct page *page);
 struct page *lookup_cgroup_page(struct page_cgroup *pc);
@@ -97,11 +87,6 @@ static inline struct page_cgroup *lookup_page_cgroup(struct page *page)
 static inline void page_cgroup_init(void)
 {
 }
-
-static inline void __init page_cgroup_init_flatmem(void)
-{
-}
-
 #endif /* CONFIG_MEMCG */
 
 #include <linux/swap.h>
diff --git a/init/main.c b/init/main.c
index cee4b5c..494774f 100644
--- a/init/main.c
+++ b/init/main.c
@@ -457,7 +457,6 @@ static void __init mm_init(void)
 	 * page_cgroup requires contiguous pages,
 	 * bigger than MAX_ORDER unless SPARSEMEM.
 	 */
-	page_cgroup_init_flatmem();
 	mem_init();
 	kmem_cache_init();
 	percpu_init_late();
diff --git a/mm/page_cgroup.c b/mm/page_cgroup.c
index 6d757e3..84bca4b 100644
--- a/mm/page_cgroup.c
+++ b/mm/page_cgroup.c
@@ -53,9 +53,7 @@ static int __init alloc_node_page_cgroup(int nid)
 		return 0;
 
 	table_size = sizeof(struct page_cgroup) * nr_pages;
-
-	base = __alloc_bootmem_node_nopanic(NODE_DATA(nid),
-			table_size, PAGE_SIZE, __pa(MAX_DMA_ADDRESS));
+	base = vzalloc_node(table_size, nid);
 	if (!base)
 		return -ENOMEM;
 	NODE_DATA(nid)->node_page_cgroup = base;
@@ -63,7 +61,7 @@ static int __init alloc_node_page_cgroup(int nid)
 	return 0;
 }
 
-void __init page_cgroup_init_flatmem(void)
+void __init page_cgroup_init(void)
 {
 
 	int nid, fail;
@@ -105,38 +103,37 @@ struct page_cgroup *lookup_page_cgroup(struct page *page)
 	return section->page_cgroup + pfn;
 }
 
-static void *__meminit alloc_page_cgroup(size_t size, int nid)
+static void *alloc_page_cgroup(int nid)
 {
 	gfp_t flags = GFP_KERNEL | __GFP_ZERO | __GFP_NOWARN;
 	void *addr = NULL;
+	size_t table_size = sizeof(struct page_cgroup) * PAGES_PER_SECTION;
 
-	addr = alloc_pages_exact_nid(nid, size, flags);
+	addr = alloc_pages_exact_nid(nid, table_size, flags);
 	if (addr) {
-		kmemleak_alloc(addr, size, 1, flags);
+		kmemleak_alloc(addr, table_size, 1, flags);
 		return addr;
 	}
 
 	if (node_state(nid, N_HIGH_MEMORY))
-		addr = vzalloc_node(size, nid);
+		addr = vzalloc_node(table_size, nid);
 	else
-		addr = vzalloc(size);
+		addr = vzalloc(table_size);
 
 	return addr;
 }
 
-static int __meminit init_section_page_cgroup(unsigned long pfn, int nid)
+static int init_section_page_cgroup(unsigned long pfn, int nid)
 {
 	struct mem_section *section;
 	struct page_cgroup *base;
-	unsigned long table_size;
 
 	section = __pfn_to_section(pfn);
 
 	if (section->page_cgroup)
 		return 0;
 
-	table_size = sizeof(struct page_cgroup) * PAGES_PER_SECTION;
-	base = alloc_page_cgroup(table_size, nid);
+	base = alloc_page_cgroup(nid);
 
 	/*
 	 * The value stored in section->page_cgroup is (base - pfn)
@@ -156,7 +153,6 @@ static int __meminit init_section_page_cgroup(unsigned long pfn, int nid)
 	 */
 	pfn &= PAGE_SECTION_MASK;
 	section->page_cgroup = base - pfn;
-	total_usage += table_size;
 	return 0;
 }
 #ifdef CONFIG_MEMORY_HOTPLUG
-- 
1.8.1.4

^ permalink raw reply related	[flat|nested] 16+ messages in thread

* [PATCH 2/2] memcg: defer page_cgroup initialization
@ 2013-04-05 10:01   ` Glauber Costa
  0 siblings, 0 replies; 16+ messages in thread
From: Glauber Costa @ 2013-04-05 10:01 UTC (permalink / raw)
  To: cgroups
  Cc: linux-mm, kamezawa.hiroyu, Johannes Weiner, Michal Hocko, Glauber Costa

We have now reached the point where there is no real need to allocate
page_cgroup at system boot. We can defer it to the first memcg
initialization, and if that allocation fails, we treat it like any other
memcg memory failure (as we would if, for instance, allocation of the
mem_cgroup structure itself failed). In the future, we may want to defer
this further, to the first non-root cgroup initialization, but we are not
there yet.

With that, page_cgroup can also be a lot quieter during its
initialization.

Signed-off-by: Glauber Costa <glommer@parallels.com>
---
 include/linux/page_cgroup.h |  6 +-----
 init/main.c                 |  1 -
 mm/memcontrol.c             |  2 ++
 mm/page_cgroup.c            | 29 ++++++++---------------------
 4 files changed, 11 insertions(+), 27 deletions(-)

diff --git a/include/linux/page_cgroup.h b/include/linux/page_cgroup.h
index 4860eca..121b17b 100644
--- a/include/linux/page_cgroup.h
+++ b/include/linux/page_cgroup.h
@@ -29,7 +29,7 @@ struct page_cgroup {
 
 void __meminit pgdat_page_cgroup_init(struct pglist_data *pgdat);
 
-extern void __init page_cgroup_init(void);
+extern bool page_cgroup_init(void);
 
 struct page_cgroup *lookup_page_cgroup(struct page *page);
 struct page *lookup_cgroup_page(struct page_cgroup *pc);
@@ -83,10 +83,6 @@ static inline struct page_cgroup *lookup_page_cgroup(struct page *page)
 {
 	return NULL;
 }
-
-static inline void page_cgroup_init(void)
-{
-}
 #endif /* CONFIG_MEMCG */
 
 #include <linux/swap.h>
diff --git a/init/main.c b/init/main.c
index 494774f..1fb3ec0 100644
--- a/init/main.c
+++ b/init/main.c
@@ -591,7 +591,6 @@ asmlinkage void __init start_kernel(void)
 		initrd_start = 0;
 	}
 #endif
-	page_cgroup_init();
 	debug_objects_mem_init();
 	kmemleak_init();
 	setup_per_cpu_pageset();
diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index f608546..59a5b1f 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -6357,6 +6357,8 @@ mem_cgroup_css_alloc(struct cgroup *cont)
 		res_counter_init(&memcg->res, NULL);
 		res_counter_init(&memcg->memsw, NULL);
 		res_counter_init(&memcg->kmem, NULL);
+		if (page_cgroup_init())
+			goto free_out;
 	}
 
 	memcg->last_scanned_node = MAX_NUMNODES;
diff --git a/mm/page_cgroup.c b/mm/page_cgroup.c
index 84bca4b..0256658 100644
--- a/mm/page_cgroup.c
+++ b/mm/page_cgroup.c
@@ -61,27 +61,20 @@ static int __init alloc_node_page_cgroup(int nid)
 	return 0;
 }
 
-void __init page_cgroup_init(void)
+bool page_cgroup_init(void)
 {
 
 	int nid, fail;
 
 	if (mem_cgroup_disabled())
-		return;
+		return false;
 
 	for_each_online_node(nid)  {
 		fail = alloc_node_page_cgroup(nid);
 		if (fail)
-			goto fail;
+			return true;
 	}
-	printk(KERN_INFO "allocated %ld bytes of page_cgroup\n", total_usage);
-	printk(KERN_INFO "please try 'cgroup_disable=memory' option if you"
-	" don't want memory cgroups\n");
-	return;
-fail:
-	printk(KERN_CRIT "allocation of page_cgroup failed.\n");
-	printk(KERN_CRIT "please try 'cgroup_disable=memory' boot option\n");
-	panic("Out of memory");
+	return false;
 }
 
 #else /* CONFIG_FLAT_NODE_MEM_MAP */
@@ -262,13 +255,13 @@ static int __meminit page_cgroup_callback(struct notifier_block *self,
 
 #endif
 
-void __init page_cgroup_init(void)
+bool page_cgroup_init(void)
 {
 	unsigned long pfn;
 	int nid;
 
 	if (mem_cgroup_disabled())
-		return;
+		return false;
 
 	for_each_node_state(nid, N_MEMORY) {
 		unsigned long start_pfn, end_pfn;
@@ -295,17 +288,11 @@ void __init page_cgroup_init(void)
 			if (pfn_to_nid(pfn) != nid)
 				continue;
 			if (init_section_page_cgroup(pfn, nid))
-				goto oom;
+				return true;
 		}
 	}
 	hotplug_memory_notifier(page_cgroup_callback, 0);
-	printk(KERN_INFO "allocated %ld bytes of page_cgroup\n", total_usage);
-	printk(KERN_INFO "please try 'cgroup_disable=memory' option if you "
-			 "don't want memory cgroups\n");
-	return;
-oom:
-	printk(KERN_CRIT "try 'cgroup_disable=memory' boot option\n");
-	panic("Out of memory");
+	return false;
 }
 
 void __meminit pgdat_page_cgroup_init(struct pglist_data *pgdat)
-- 
1.8.1.4

^ permalink raw reply related	[flat|nested] 16+ messages in thread

* Re: [PATCH 0/2] page_cgroup cleanups
@ 2013-04-05 11:32   ` Glauber Costa
  0 siblings, 0 replies; 16+ messages in thread
From: Glauber Costa @ 2013-04-05 11:32 UTC (permalink / raw)
  To: cgroups; +Cc: linux-mm, kamezawa.hiroyu, Johannes Weiner, Michal Hocko

On 04/05/2013 02:01 PM, Glauber Costa wrote:
> Hi,
> 
> Last time I sent the mem cgroup bypass patches, Kame and Michal pointed out
> that some of it was really cleanup work, specifically on the page_cgroup
> side. I've decided to split those patches out and send them separately.
> After these patches are applied, page_cgroup will be initialized together
> with the root cgroup, instead of from init/main.c.
> 
> When we later move cgroup initialization to the first non-root cgroup
> created, all we'll have to do on the page_cgroup side is move the
> initialization that now happens at root creation to the first child.
> 
> Glauber Costa (2):
>   memcg: consistently use vmalloc for page_cgroup allocations
>   memcg: defer page_cgroup initialization
> 
>  include/linux/page_cgroup.h | 21 +------------------
>  init/main.c                 |  2 --
>  mm/memcontrol.c             |  2 ++
>  mm/page_cgroup.c            | 51 +++++++++++++++------------------------------
>  4 files changed, 20 insertions(+), 56 deletions(-)
> 
FYI: there are kbuild warnings with this. I wanted to send it out early to
see what people think. If no changes are requested, please let me know and
I will send a new version with just the kbuild fixes folded in.

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH 1/2] memcg: consistently use vmalloc for page_cgroup allocations
@ 2013-04-05 12:06     ` Johannes Weiner
  0 siblings, 0 replies; 16+ messages in thread
From: Johannes Weiner @ 2013-04-05 12:06 UTC (permalink / raw)
  To: Glauber Costa; +Cc: cgroups, linux-mm, kamezawa.hiroyu, Michal Hocko

On Fri, Apr 05, 2013 at 02:01:11PM +0400, Glauber Costa wrote:
> Right now, allocation for page_cgroup is a bit complicated and depends on
> a variety of system conditions:
> 
> For flat memory, we are likely to need quite large contiguous allocations,
> so the page allocator won't cut it. We are forced to initialize flatmem
> mappings very early, because once the page allocator is in place those
> allocations would be denied. Flatmem mappings thus resort to the bootmem
> allocator.
> 
> We can fix this by using vmalloc for flatmem mappings. However, we would
> then have a situation in which flatmem mappings always allocate with
> vmalloc, while sparsemem may or may not: it tries the page allocator
> first, and falls back to vmalloc if that fails.

Vmalloc space is a precious resource on 32-bit systems and is harder on
the TLB than the identity mapping.

It's a last resort for when you need an unusually large chunk of
contiguously addressable memory at runtime, like when loading a module,
for buffers shared with userspace, etc.  But here we know, at boot time,
the exact amount of memory we need for the page_cgroup array.

Code cleanup is not a good reason to use vmalloc in this case, IMO.

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH 1/2] memcg: consistently use vmalloc for page_cgroup allocations
@ 2013-04-05 12:27       ` Glauber Costa
  0 siblings, 0 replies; 16+ messages in thread
From: Glauber Costa @ 2013-04-05 12:27 UTC (permalink / raw)
  To: Johannes Weiner; +Cc: cgroups, linux-mm, kamezawa.hiroyu, Michal Hocko

On 04/05/2013 04:06 PM, Johannes Weiner wrote:
> On Fri, Apr 05, 2013 at 02:01:11PM +0400, Glauber Costa wrote:
>> Right now, allocation for page_cgroup is a bit complicated and depends on
>> a variety of system conditions:
>>
>> For flat memory, we are likely to need quite large contiguous allocations,
>> so the page allocator won't cut it. We are forced to initialize flatmem
>> mappings very early, because once the page allocator is in place those
>> allocations would be denied. Flatmem mappings thus resort to the bootmem
>> allocator.
>>
>> We can fix this by using vmalloc for flatmem mappings. However, we would
>> then have a situation in which flatmem mappings always allocate with
>> vmalloc, while sparsemem may or may not: it tries the page allocator
>> first, and falls back to vmalloc if that fails.
> 
> Vmalloc space is a precious resource on 32-bit systems and is harder on
> the TLB than the identity mapping.
> 
> It's a last resort for when you need an unusually large chunk of
> contiguously addressable memory at runtime, like when loading a module,
> for buffers shared with userspace, etc.  But here we know, at boot time,
> the exact amount of memory we need for the page_cgroup array.
> 
> Code cleanup is not a good reason to use vmalloc in this case, IMO.
> 
This is indeed a code cleanup, but a code cleanup with a side goal:
freeing us from having to register page_cgroup unconditionally at init
time. That requirement exists because page_cgroup_init_flatmem uses the
bootmem allocator to avoid the page allocator's limitations.

What I can try, and would happily do, is attempt a normal page allocation
first and resort to vmalloc only if the request is too big.

Would that be okay with you?
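
To be concrete, a minimal sketch of the idea, modeled on the sparsemem
alloc_page_cgroup() from patch 1 (illustrative only; the helper name is
made up, but alloc_pages_exact_nid() and vzalloc_node() are the existing
kernel APIs):

static void *alloc_node_page_cgroup_table(size_t table_size, int nid)
{
	gfp_t flags = GFP_KERNEL | __GFP_ZERO | __GFP_NOWARN;
	void *base;

	/* physically contiguous memory is kinder to the TLB: try it first */
	base = alloc_pages_exact_nid(nid, table_size, flags);
	if (base)
		return base;

	/* request too large for the page allocator: resort to vmalloc */
	return vzalloc_node(table_size, nid);
}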

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH 1/2] memcg: consistently use vmalloc for page_cgroup allocations
@ 2013-04-05 16:25         ` Johannes Weiner
  0 siblings, 0 replies; 16+ messages in thread
From: Johannes Weiner @ 2013-04-05 16:25 UTC (permalink / raw)
  To: Glauber Costa; +Cc: cgroups, linux-mm, kamezawa.hiroyu, Michal Hocko

On Fri, Apr 05, 2013 at 04:27:56PM +0400, Glauber Costa wrote:
> On 04/05/2013 04:06 PM, Johannes Weiner wrote:
> > On Fri, Apr 05, 2013 at 02:01:11PM +0400, Glauber Costa wrote:
> >> Right now, allocation for page_cgroup is a bit complicated and depends on
> >> a variety of system conditions:
> >>
> >> For flat memory, we are likely to need quite large contiguous allocations,
> >> so the page allocator won't cut it. We are forced to initialize flatmem
> >> mappings very early, because once the page allocator is in place those
> >> allocations would be denied. Flatmem mappings thus resort to the bootmem
> >> allocator.
> >>
> >> We can fix this by using vmalloc for flatmem mappings. However, we would
> >> then have a situation in which flatmem mappings always allocate with
> >> vmalloc, while sparsemem may or may not: it tries the page allocator
> >> first, and falls back to vmalloc if that fails.
> > 
> > Vmalloc space is a precious resource on 32-bit systems and is harder on
> > the TLB than the identity mapping.
> > 
> > It's a last resort for when you need an unusually large chunk of
> > contiguously addressable memory at runtime, like when loading a module,
> > for buffers shared with userspace, etc.  But here we know, at boot time,
> > the exact amount of memory we need for the page_cgroup array.
> > 
> > Code cleanup is not a good reason to use vmalloc in this case, IMO.
> > 
> This is indeed a code cleanup, but a code cleanup with a side goal:
> freeing us from having to register page_cgroup unconditionally at init
> time. That requirement exists because page_cgroup_init_flatmem uses the
> bootmem allocator to avoid the page allocator's limitations.
> 
> What I can try, and would happily do, is attempt a normal page allocation
> first and resort to vmalloc only if the request is too big.
> 
> Would that be okay with you?

With the size of page_cgroup right now (2 words), we need half a page
per MB of represented memory on 32 bit, so booting on a 4GB 32-bit
machine needs an order-11 (MAX_ORDER) allocation and thus falls back to
using 8MB of the 128MB vmalloc space.  A 16GB machine falls back to 32MB,
a quarter of the vmalloc space.
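
(To make the arithmetic explicit, a quick standalone userspace check of
those figures, assuming 4KB pages; this is illustrative, not kernel code:)

#include <stdio.h>

/* page_cgroup table size needed for a memory range, with 4KB pages */
static unsigned long long table_bytes(unsigned long long range_bytes,
				      unsigned int pc_size)
{
	return range_bytes / 4096 * pc_size;
}

int main(void)
{
	/* 32-bit: struct page_cgroup is 2 words = 8 bytes */
	printf("4GB, 32-bit : %lluMB\n", table_bytes(4ULL << 30, 8) >> 20);
	printf("16GB, 32-bit: %lluMB\n", table_bytes(16ULL << 30, 8) >> 20);
	/* 64-bit: 2 words = 16 bytes; one 128MB sparsemem section */
	printf("128MB section, 64-bit: %lluKB\n",
	       table_bytes(128ULL << 20, 16) >> 10);
	return 0;	/* prints 8MB, 32MB, 512KB; 512KB = 128 pages = order-7 */
}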

Now, I think we all agree that these are not necessarily recommended
configurations, but we should not be breaking them for the hell of it
either.

How about leaving flatmem as it is and having an on-demand allocation
model that just works with sparsemem?  A 128MB section on 64 bit "only"
needs order-7 pages, but we satisfy order-9 THP allocations all the time
during runtime, so this may just work.
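
For illustration only, one possible shape of that on-demand model,
reusing init_section_page_cgroup() from mm/page_cgroup.c (the function
name below is made up, and synchronization against concurrent
initializers is deliberately omitted):

static struct page_cgroup *lookup_page_cgroup_ondemand(struct page *page)
{
	unsigned long pfn = page_to_pfn(page);
	struct mem_section *section = __pfn_to_section(pfn);

	/* populate the section's table the first time anyone needs it */
	if (unlikely(!section->page_cgroup) &&
	    init_section_page_cgroup(pfn, page_to_nid(page)))
		return NULL;	/* like any other memcg allocation failure */

	return section->page_cgroup + pfn;
}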

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH 1/2] memcg: consistently use vmalloc for page_cgroup allocations
@ 2013-04-09  2:41           ` Kamezawa Hiroyuki
  0 siblings, 0 replies; 16+ messages in thread
From: Kamezawa Hiroyuki @ 2013-04-09  2:41 UTC (permalink / raw)
  To: Johannes Weiner; +Cc: Glauber Costa, cgroups, linux-mm, Michal Hocko

(2013/04/06 1:25), Johannes Weiner wrote:
> On Fri, Apr 05, 2013 at 04:27:56PM +0400, Glauber Costa wrote:
>> On 04/05/2013 04:06 PM, Johannes Weiner wrote:
>>> On Fri, Apr 05, 2013 at 02:01:11PM +0400, Glauber Costa wrote:
>>>> Right now, allocation for page_cgroup is a bit complicated and depends on
>>>> a variety of system conditions:
>>>>
>>>> For flat memory, we are likely to need quite large contiguous allocations,
>>>> so the page allocator won't cut it. We are forced to initialize flatmem
>>>> mappings very early, because once the page allocator is in place those
>>>> allocations would be denied. Flatmem mappings thus resort to the bootmem
>>>> allocator.
>>>>
>>>> We can fix this by using vmalloc for flatmem mappings. However, we would
>>>> then have a situation in which flatmem mappings always allocate with
>>>> vmalloc, while sparsemem may or may not: it tries the page allocator
>>>> first, and falls back to vmalloc if that fails.
>>>
>>> Vmalloc space is a precious resource on 32-bit systems and is harder on
>>> the TLB than the identity mapping.
>>>
>>> It's a last resort for when you need an unusually large chunk of
>>> contiguously addressable memory at runtime, like when loading a module,
>>> for buffers shared with userspace, etc.  But here we know, at boot time,
>>> the exact amount of memory we need for the page_cgroup array.
>>>
>>> Code cleanup is not a good reason to use vmalloc in this case, IMO.
>>>
>> This is indeed a code cleanup, but a code cleanup with a side goal:
>> freeing us from having to register page_cgroup unconditionally at init
>> time. That requirement exists because page_cgroup_init_flatmem uses the
>> bootmem allocator to avoid the page allocator's limitations.
>>
>> What I can try, and would happily do, is attempt a normal page allocation
>> first and resort to vmalloc only if the request is too big.
>>
>> Would that be okay with you?
>
> With the size of page_cgroup right now (2 words), we need half a page
> per MB of represented memory on 32 bit, so booting on a 4GB 32-bit
> machine needs an order-11 (MAX_ORDER) allocation and thus falls back to
> using 8MB of the 128MB vmalloc space.  A 16GB machine falls back to 32MB,
> a quarter of the vmalloc space.
>
> Now, I think we all agree that these are not necessarily recommended
> configurations, but we should not be breaking them for the hell of it
> either.
>
> How about leaving flatmem as it is and having an on-demand allocation
> model that just works with sparsemem?  A 128MB section on 64 bit "only"
> needs order-7 pages, but we satisfy order-9 THP allocations all the time
> during runtime, so this may just work.
>

I agree with Johannes' suggestion.

Thanks,
-Kame

^ permalink raw reply	[flat|nested] 16+ messages in thread

end of thread, other threads:[~2013-04-09  2:42 UTC | newest]

Thread overview: 8 messages
2013-04-05 10:01 [PATCH 0/2] page_cgroup cleanups Glauber Costa
2013-04-05 10:01 ` [PATCH 1/2] memcg: consistently use vmalloc for page_cgroup allocations Glauber Costa
2013-04-05 12:06   ` Johannes Weiner
2013-04-05 12:27     ` Glauber Costa
2013-04-05 16:25       ` Johannes Weiner
2013-04-09  2:41         ` Kamezawa Hiroyuki
2013-04-05 10:01 ` [PATCH 2/2] memcg: defer page_cgroup initialization Glauber Costa
2013-04-05 11:32 ` [PATCH 0/2] page_cgroup cleanups Glauber Costa