* [PATCH 0/3] Unmapped Page Cache Control (v4)
From: Balbir Singh @ 2011-01-25  5:04 UTC
  To: linux-mm, akpm
  Cc: npiggin, kvm, linux-kernel, kosaki.motohiro, cl, kamezawa.hiroyu

The following series implements page cache control. It is a split-out
version of patch 1 of version 3 of the page cache optimization patches
posted earlier (previous posting: http://lwn.net/Articles/419564/).

The previous few revisions received a lot of comments; I've tried to
address as many of them as possible in this revision.

Detailed Description
====================
This patch implements unmapped page cache control via preferred
page cache reclaim. It hooks into kswapd and reclaims page cache if
the user has requested unmapped page cache control. This is useful
in the following scenarios:
- In a virtualized environment with cache=writethrough, we see
  double caching (one copy in the host and one in the guest). As
  we scale the number of guests, cache usage across the system grows.
  The goal of this patch is to reclaim page cache when Linux is running
  as a guest and let the host hold and manage the page cache.
  There might be temporary duplication, but in the long run, memory
  in the guests would be used for mapped pages.
- The option is controlled via a boot option, and the administrator
  can selectively turn it on, on an as-needed basis.

A lot of the code is borrowed from the zone_reclaim_mode logic in
__zone_reclaim(). One might argue that with ballooning and KSM this
feature is not very useful, but even with ballooning, we need extra
logic to balloon multiple VMs, and it is hard to figure out the
correct amount of memory to balloon. With these patches applied, each
guest has a sufficient amount of free memory available that can be
easily seen and reclaimed by the balloon driver. The additional memory
in the guest can be reused for additional applications, or used to
start additional guests or balance memory in the host.

KSM currently does not de-duplicate host and guest page cache. The goal
of this patch is to help automatically balance unmapped page cache when
instructed to do so.

The min_unmapped_ratio sysctl provides further control from within
the guest over the amount of unmapped pages to reclaim. A similar
max_unmapped_ratio sysctl is added to help decide when reclaim should
occur. It is tunable and set by default to 16 (based on tradeoffs
seen between aggressiveness in balancing and the amount of unmapped
page cache). Distros and administrators can further tweak this for
the desired control.

Data from the previous patchsets can be found at
https://lkml.org/lkml/2010/11/30/79


---

Balbir Singh (3):
      Move zone_reclaim() outside of CONFIG_NUMA
      Refactor zone_reclaim code
      Provide control over unmapped pages


 Documentation/kernel-parameters.txt |    8 ++
 include/linux/mmzone.h              |    9 ++-
 include/linux/swap.h                |   23 +++++--
 init/Kconfig                        |   12 +++
 kernel/sysctl.c                     |   29 ++++++--
 mm/page_alloc.c                     |   31 ++++++++-
 mm/vmscan.c                         |  122 +++++++++++++++++++++++++++++++----
 7 files changed, 202 insertions(+), 32 deletions(-)

-- 
Balbir Singh


* [PATCH 1/3] Move zone_reclaim() outside of CONFIG_NUMA (v4)
From: Balbir Singh @ 2011-01-25  5:05 UTC
  To: linux-mm, akpm
  Cc: npiggin, kvm, linux-kernel, kosaki.motohiro, cl, kamezawa.hiroyu

This patch moves zone_reclaim and associated helpers
outside CONFIG_NUMA. This infrastructure is reused
in the patches for page cache control that follow.

Signed-off-by: Balbir Singh <balbir@linux.vnet.ibm.com>
---
 include/linux/mmzone.h |    4 ++--
 include/linux/swap.h   |    4 ++--
 kernel/sysctl.c        |   18 +++++++++---------
 mm/page_alloc.c        |    6 +++---
 mm/vmscan.c            |    2 --
 5 files changed, 16 insertions(+), 18 deletions(-)

diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h
index 02ecb01..2485acc 100644
--- a/include/linux/mmzone.h
+++ b/include/linux/mmzone.h
@@ -303,12 +303,12 @@ struct zone {
 	 */
 	unsigned long		lowmem_reserve[MAX_NR_ZONES];
 
-#ifdef CONFIG_NUMA
-	int node;
 	/*
 	 * zone reclaim becomes active if more unmapped pages exist.
 	 */
 	unsigned long		min_unmapped_pages;
+#ifdef CONFIG_NUMA
+	int node;
 	unsigned long		min_slab_pages;
 #endif
 	struct per_cpu_pageset __percpu *pageset;
diff --git a/include/linux/swap.h b/include/linux/swap.h
index 5e3355a..7b75626 100644
--- a/include/linux/swap.h
+++ b/include/linux/swap.h
@@ -255,11 +255,11 @@ extern int vm_swappiness;
 extern int remove_mapping(struct address_space *mapping, struct page *page);
 extern long vm_total_pages;
 
+extern int sysctl_min_unmapped_ratio;
+extern int zone_reclaim(struct zone *, gfp_t, unsigned int);
 #ifdef CONFIG_NUMA
 extern int zone_reclaim_mode;
-extern int sysctl_min_unmapped_ratio;
 extern int sysctl_min_slab_ratio;
-extern int zone_reclaim(struct zone *, gfp_t, unsigned int);
 #else
 #define zone_reclaim_mode 0
 static inline int zone_reclaim(struct zone *z, gfp_t mask, unsigned int order)
diff --git a/kernel/sysctl.c b/kernel/sysctl.c
index bc86bb3..12e8f26 100644
--- a/kernel/sysctl.c
+++ b/kernel/sysctl.c
@@ -1224,15 +1224,6 @@ static struct ctl_table vm_table[] = {
 		.extra1		= &zero,
 	},
 #endif
-#ifdef CONFIG_NUMA
-	{
-		.procname	= "zone_reclaim_mode",
-		.data		= &zone_reclaim_mode,
-		.maxlen		= sizeof(zone_reclaim_mode),
-		.mode		= 0644,
-		.proc_handler	= proc_dointvec,
-		.extra1		= &zero,
-	},
 	{
 		.procname	= "min_unmapped_ratio",
 		.data		= &sysctl_min_unmapped_ratio,
@@ -1242,6 +1233,15 @@ static struct ctl_table vm_table[] = {
 		.extra1		= &zero,
 		.extra2		= &one_hundred,
 	},
+#ifdef CONFIG_NUMA
+	{
+		.procname	= "zone_reclaim_mode",
+		.data		= &zone_reclaim_mode,
+		.maxlen		= sizeof(zone_reclaim_mode),
+		.mode		= 0644,
+		.proc_handler	= proc_dointvec,
+		.extra1		= &zero,
+	},
 	{
 		.procname	= "min_slab_ratio",
 		.data		= &sysctl_min_slab_ratio,
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index aede3a4..7b56473 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -4167,10 +4167,10 @@ static void __paginginit free_area_init_core(struct pglist_data *pgdat,
 
 		zone->spanned_pages = size;
 		zone->present_pages = realsize;
-#ifdef CONFIG_NUMA
-		zone->node = nid;
 		zone->min_unmapped_pages = (realsize*sysctl_min_unmapped_ratio)
 						/ 100;
+#ifdef CONFIG_NUMA
+		zone->node = nid;
 		zone->min_slab_pages = (realsize * sysctl_min_slab_ratio) / 100;
 #endif
 		zone->name = zone_names[j];
@@ -5084,7 +5084,6 @@ int min_free_kbytes_sysctl_handler(ctl_table *table, int write,
 	return 0;
 }
 
-#ifdef CONFIG_NUMA
 int sysctl_min_unmapped_ratio_sysctl_handler(ctl_table *table, int write,
 	void __user *buffer, size_t *length, loff_t *ppos)
 {
@@ -5101,6 +5100,7 @@ int sysctl_min_unmapped_ratio_sysctl_handler(ctl_table *table, int write,
 	return 0;
 }
 
+#ifdef CONFIG_NUMA
 int sysctl_min_slab_ratio_sysctl_handler(ctl_table *table, int write,
 	void __user *buffer, size_t *length, loff_t *ppos)
 {
diff --git a/mm/vmscan.c b/mm/vmscan.c
index 47a5096..5899f2f 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -2868,7 +2868,6 @@ static int __init kswapd_init(void)
 
 module_init(kswapd_init)
 
-#ifdef CONFIG_NUMA
 /*
  * Zone reclaim mode
  *
@@ -3078,7 +3077,6 @@ int zone_reclaim(struct zone *zone, gfp_t gfp_mask, unsigned int order)
 
 	return ret;
 }
-#endif
 
 /*
  * page_evictable - test whether a page is evictable



* Re: [PATCH 1/3] Move zone_reclaim() outside of CONFIG_NUMA (v4)
From: Christoph Lameter @ 2011-01-26 16:56 UTC
  To: Balbir Singh
  Cc: linux-mm, akpm, npiggin, kvm, linux-kernel, kosaki.motohiro,
	kamezawa.hiroyu


Reviewed-by: Christoph Lameter <cl@linux.com>

* Re: [PATCH 1/3] Move zone_reclaim() outside of CONFIG_NUMA (v4)
From: Balbir Singh @ 2011-01-26 17:43 UTC
  To: Christoph Lameter
  Cc: linux-mm, akpm, npiggin, kvm, linux-kernel, kosaki.motohiro,
	kamezawa.hiroyu

* Christoph Lameter <cl@linux.com> [2011-01-26 10:56:56]:

> 
> Reviewed-by: Christoph Lameter <cl@linux.com>
>

Thanks for the review! 

-- 
	Three Cheers,
	Balbir